Query on a large database

  • EDIT :- BUMP UP three days on still no data from any other tests.

    Jayanth Kurup[/url]

  • BUMPity BUMP BUMP , still no data from ?

    For what its worth there has been a lot of positive feedback for the scripts and video posted on linkedin.

    Jayanth Kurup[/url]

  • Jayanth_Kurup (10/23/2015)


    Just wondering if anybody was able to finish testing the scripts and if there are any new results ?

    EDIT : BUMP, hoping for some independently verified results , also created a complete video with more scenarios in case any body is looking for a deep dive.

    To be honest, the scripts are terrible. Everything is either a hard-code path name or geared specifically to SQL Server 2014, which I won't even install on my laptop until the next SP is successful and has been out for at least 2 months. And I'm not one to store data with programs. You weren't even thoughtful enough to write a bloody Bulk Insert for the test data. Not exactly "readily consumable" code. It's a whole lot deeper dive than it could/should have been. You didn't even change the default growth settings. And now you've taken to bumping your own posts. Give it a break while we fix your "right click" code. :Whistling:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks for taking the time to do this Jeff , I'll try an have a more easily usable script ready in a few minutes and send them to you.

    Jayanth Kurup[/url]

  • Jayanth_Kurup (10/26/2015)


    Thanks for taking the time to do this Jeff , I'll try an have a more easily usable script ready in a few minutes and send them to you.

    Looking at your current script, there are no indexes on the tables, either. If that's what you really setup for your tests, it's no wonder why partitioning seemed faster to you. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • that was on purpose so that the tester can compare

    heap no index - (script provided) - for baseline

    vs

    clustered index table ( they create the clustered index )

    vs

    heap partitioned table ( script provider)

    the other script is ready just testing it , will upload in a few minutes

    One of the original arguments was that partitioning doesnt provide any benefit , so the comparison between partitioned and partitioned heap should clear that up.

    Jayanth Kurup[/url]

  • Updated , cleaned up scripts attached.

    I have written it so that you can simply copy paste and execute ( after reviewing the code ie).

    Jayanth Kurup[/url]

  • The final video

    https://www.youtube.com/watch?v=Eyjokb1o7Go

    Jayanth Kurup[/url]

  • OK, populated 2 tables from the scripts.

    Here is the outcome:

    set statistics time on

    Select REF_AGE , COUNT(*)

    from Partitiontest123.dbo.partitionedtbl

    where ref_age between '27' and '30'

    group by REF_AGE

    set statistics time off

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 303 ms.

    set statistics time on

    Select REF_AGE , COUNT(*)

    from Partitiontest122.dbo.unpartitionedtbl

    where ref_age between '27' and '30'

    group by REF_AGE

    set statistics time off

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 115 ms.

    Clear win for clustered index.

    Almost 3 times faster.

    _____________
    Code for TallyGenerator

  • Jayanth_Kurup (10/27/2015)


    Updated , cleaned up scripts attached.

    I have written it so that you can simply copy paste and execute ( after reviewing the code ie).

    I'm guessing that you didn't actually check anything after you ran that. ALL of the data went into the PRIMARY partition because each field in the CSV is encapsulated in quotes. There's no such thing in the Ref_Age column as 24 because it's actually "24" including the quotes.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I've got the data loading now.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Alright then... after getting around the double-quote problem with loading your data by changing the delimiter on the BULK INSERT from ',' to '","', I ran some performance tests using YOUR test code and 1 variation of your test code.

    The first code I ran on both databases looked like this. It's a variation of your exact code in that I changed the last GO to GO 5 so I wouldn't have to babysit each run.

    Here's the first test I did, which drops the buffers on every iteration.

    PRINT '========== Partitioned table, drop buffers every iteration. ===================='

    GO

    dbcc dropcleanbuffers

    set statistics time on

    Select REF_AGE , COUNT(*)

    from Partitiontest123.dbo.partitionedtbl

    where ref_age between '27' and '30'

    group by REF_AGE

    set statistics time off

    GO 5

    PRINT '========== NON Partitioned table, drop buffers every iteration. ===================='

    GO

    dbcc dropcleanbuffers

    set statistics time on

    Select REF_AGE , COUNT(*)

    from Partitiontest122.dbo.unpartitionedtbl

    where ref_age between '27' and '30'

    group by REF_AGE

    set statistics time off

    GO 5

    And, after deleting the rows that didn't mean anything to make the times more obvious, here's what I got for run results...

    ========== Partitioned table, drop buffers every iteration. ====================

    Beginning execution loop

    CPU time = 1623 ms, elapsed time = 15008 ms.

    CPU time = 1669 ms, elapsed time = 15228 ms.

    CPU time = 1232 ms, elapsed time = 15044 ms.

    CPU time = 1716 ms, elapsed time = 15400 ms.

    CPU time = 1764 ms, elapsed time = 15593 ms.

    Batch execution completed 5 times.

    ========== NON Partitioned table, drop buffers every iteration. ====================

    Beginning execution loop

    CPU time = 873 ms, elapsed time = 23586 ms.

    CPU time = 1077 ms, elapsed time = 24508 ms.

    CPU time = 843 ms, elapsed time = 24171 ms.

    CPU time = 982 ms, elapsed time = 24743 ms.

    CPU time = 920 ms, elapsed time = 25257 ms.

    Batch execution completed 5 times.

    By golly!!! It sure does look like the partitioned table beats the tar out of the monolithic table with an index just like you said, doesn't it? Well my young friend, don't bet your life on it. 😉

    While the partitioned table certainly beat the unpartitioned indexed table FOR THIS TEST, it's the wrong bloody test. Who in their right mind would ever essentially clear cache on a production machine between runs? Hopefully, no one or only the severely insane. :hehe:

    Let's see what happens if we only clear cache for the first iteration as if we just got done building each table. We'll use the "variation" that I was talking about where we put a GO in after the DBCC command so cache is cleared only before the first iteration.

    Here's the code for that...

    PRINT '========== Partitioned table, drop buffers only on first iteration. ===================='

    GO

    dbcc dropcleanbuffers

    GO --Added this

    set statistics time on

    Select REF_AGE , COUNT(*)

    from Partitiontest123.dbo.partitionedtbl

    where ref_age between '27' and '30'

    group by REF_AGE

    set statistics time off

    GO 5

    PRINT '========== NON Partitioned table, drop buffers only on first iteration. ===================='

    GO

    dbcc dropcleanbuffers

    GO --Added this

    set statistics time on

    Select REF_AGE , COUNT(*)

    from Partitiontest122.dbo.unpartitionedtbl

    where ref_age between '27' and '30'

    group by REF_AGE

    set statistics time off

    GO 5

    And here are the cleaned up results from that run...

    ========== Partitioned table, drop buffers only on first iteration. ====================

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    Beginning execution loop

    CPU time = 1747 ms, elapsed time = 15148 ms.

    CPU time = 1982 ms, elapsed time = 726 ms.

    CPU time = 1966 ms, elapsed time = 708 ms.

    CPU time = 1934 ms, elapsed time = 725 ms.

    CPU time = 1934 ms, elapsed time = 717 ms.

    Batch execution completed 5 times.

    ========== NON Partitioned table, drop buffers only on first iteration. ====================

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    Beginning execution loop

    CPU time = 1030 ms, elapsed time = 22598 ms.

    CPU time = 1138 ms, elapsed time = 317 ms.

    CPU time = 1248 ms, elapsed time = 313 ms.

    CPU time = 999 ms, elapsed time = 321 ms.

    CPU time = 1248 ms, elapsed time = 311 ms.

    Batch execution completed 5 times.

    Look at that. If you actually test the code as it would actually be used, the monolithic non-partitioned index table is roughly twice as fast for both CPU and Duration... just like some of us have been saying all along.

    So, please stop saying that partitioned tables provides better performance because, using YOUR test data and YOUR testing method, it clearly doesn't. If you say it does in the movie, you need to change that so that you don't mislead people into thinking it does. AND, you should go to that other LinkedIn thread that you started and save some face by telling folks that you made a mistake during testing and partitioned tables are NOT a panacea of performance.

    If you don't, I will and that's a promise. 😉

    Now... enquiring minds want to know... do you like pork chops?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Sergiy (10/31/2015)


    OK, populated 2 tables from the scripts.

    Here is the outcome:

    set statistics time on

    Select REF_AGE , COUNT(*)

    from Partitiontest123.dbo.partitionedtbl

    where ref_age between '27' and '30'

    group by REF_AGE

    set statistics time off

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 303 ms.

    set statistics time on

    Select REF_AGE , COUNT(*)

    from Partitiontest122.dbo.unpartitionedtbl

    where ref_age between '27' and '30'

    group by REF_AGE

    set statistics time off

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 115 ms.

    Clear win for clustered index.

    Almost 3 times faster.

    Not to mention, that's a really nice machine you've got there, Sergiy. Didn't notice your post before but we proved the same thing. Partitioning is not a panacea of performance and, as we just proved, is actually a bit (a lot, actually) of a performance problem. At least for how this table was partitioned.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Well, colour me (not) surprised Jeff. 😉

    Looks like I might need to write a part 2 of the 'query performance tuning methodology' article....

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/31/2015)


    Well, colour me (not) surprised Jeff. 😉

    Looks like I might need to write a part 2 of the 'query performance tuning methodology' article....

    Gosh, I'd love to be in on that one, Gail. I've got the "chops" for it. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 61 through 75 (of 104 total)

You must be logged in to reply to this topic. Login to reply