T-SQL: Why “It Depends”

  • timothyawiseman (3/16/2010)


    Excellent Article. Thank you for providing it. I will be passing this off to some of my friends soon.

    You mean "passing it on"...right? 😉

    Passing it off would be a breach of Bob's rights as an author :laugh: :laugh: :laugh:

  • GabyYYZ (3/16/2010)


    ... (or is someone willing to defend keeping Autoshrink on?) 🙂

    ...

    I can't really defend keeping Autoshrink on but I can say that it has been left on in our production environment for a number of years with no issues.

    I've seen it said more than once here that turning it off is an absolute but with no explanation. Can anyone defend turning Autoshrink off?

    Dave

  • Bob,

    Thanks for a well written consise and much needed article. I've book marked it for the next time someone asks me why their query worked fine in the test enviornment with only a partial dataset and it needed to be reworked in production with 100's of millions of rows. Please write more articles.

    ---------------------------------------------------------------------
    Use Full Links:
    KB Article from Microsoft on how to ask a question on a Forum

  • GabyYYZ (3/16/2010)


    (or is someone willing to defend keeping Autoshrink on?)

    Hey Gaby 😀

    Auto-close and Auto-shrink default to true on MSDE (2000). I worked at a large international retail chain once, with close to 3000 MSDE instances running (one on each cashier's till in every store). The application created many 'snapshot' databases as bulk changes to prices etc. were applied. Auto-shrink and auto-close were pretty handy there.

    See? It depends! 😀

    Paul

  • Hi Bob,

    This is Venkat.

    You must be knowing by this time.

    It is really a million dollars worth article.

    In fact we discussed couple of times at our work place on the same topic.

    Keep posting.

    Cheers..

    Venkat Munjeti

    Vmunjeti@gmail.com

  • CirquedeSQLeil (3/16/2010)


    TedS (3/16/2010)


    Greg Edwards-268690 (3/16/2010)


    TedS (3/16/2010)


    Am I the only one who copied the code directly and got 3 different result sets for each query?

    Granted I used RedGate to generate 500,000 rows in the example table, but I got 699 Rows with the first example, 101 Rows with the 2nd and 102 rows with the 3rd.

    I limited the generation of data to have 100 products and dates to reside in 2 years.

    Please go back and look at Paul's comment on the format of the code and see if that explains your issue.

    Greg E

    I did, it doesn't.

    My generated data should work just fine, the results of the 3 queries (both seperate and together) are simply different.

    While I do appreciate the effort and agree with the premise that there isn't a one size fits all soloution (it does really depend) after I fix the (formatting related) syntax errors the results aren't the same.

    I have run the code and did not come up with 3 different result sets.

    Interesting. Perhaps my random data generator did something evil, but I'm getting 3 very distinct result sets (still). how did you populate the table? I have 500,000 semi-random records with some boundries. I used Red-Gate data generator to populate the table. I am very interested in the findings as I have a couple of databases that have crossed a "magical" theshold and are now generating different execution plans AND sometimes they cause issues such as IO performance drain, or filling up TempDB etc.

  • GabyYYZ (3/16/2010)


    1. Do NOT turn autoshrink on.

    (or is someone willing to defend keeping Autoshrink on?) 🙂

    I'll defend it! 😛

    ALL my developers have to put up with it.

    Frankly I want to give them the worst performing instances/databases in the organization. Not so bad that they can't do their jobs, but bad enough that they appreciate the benefits of getting it right!

    When their code moves to production, it's less work for me.

    but once again 'it depends' on how they have obtained the performance gains...

    *had one guy reduce his test-data set - no help to anyone.

  • A SUPER READ!!!

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • In the past I have used this technique to find the "latest" record for each Product:

    SELECT a.Product, a.SaleQuantity, a.SalePrice, a.SaleDate, a.SaleComments

    FROM dbo.SalesDetail a

    WHERE NOT EXISTS (

    SELECT 1 FROM dbo.SalesDetail b

    WHERE b.Product = a.Product AND

    b.SaleDate > a.SaleDate )

    Against my test data it out-performs the other three methods at the 10,000 and 100,000 record levels.

    Admittedly I haven't tried the 1,000,000 record level.

    The execution plan shows the major cost is a Hash Match (Right Anti Semi Join).

    Is this OK as an approach or will I run into fish-hooks?

  • Evan Yates (3/16/2010)


    In the past I have used this technique to find the "latest" record for each Product:

    ...NOT EXISTS...

    Against my test data it out-performs the other three methods at the 10,000 and 100,000 record levels.

    Admittedly I haven't tried the 1,000,000 record level.

    The execution plan shows the major cost is a Hash Match (Right Anti Semi Join).

    Is this OK as an approach or will I run into fish-hooks?

    It depends, of course.

    The 'weakness' of NOT EXISTS is that every row must be scanned to check the NOT EXISTS predicate. A semi join (EXISTS) can stop looking as soon as it finds a single match. Also, remember that a hash operation requires a memory grant, and so may not be the optimal choice for OLTP (the query might have to wait for the memory grant, or a poor estimate for the memory required might lead to the hash spilling to disk).

    In general, ROW_NUMBER and the APPLY/TOP methods will often out-perform the anti semi join (assuming optimal indexing for all). The reason being that ROW_NUMBER only scans the input once (no joins), and APPLY/TOP can be even faster if there are a larger number of detail records per group. It does depend on the indexing and data distribution though.

    Paul

  • There is another dimension to the "It Depends"--Parallelization.

    Some time ago I challenged my staff to demonstrate their SQL knowledge by solving the problem of identifying gaps in monotonic sequence of reference/key numbers typically found in an identity column. (I had already had a solution in hand to address an auditing issue.)

    So I asked:

    Given a table Transaction with the field ID defined as integer identity( 1, 1 ) NOT NULL,

    write a single SELECT statement that will return a result set containing the last number in a run of numbers, the first number of the next run, and the number of missing records between them.

    If Transaction had records containing the following values for ID: 1, 2, 3, 4, 8, 9, 10, 19, 20, 21, 22

    Your result set would be

    4 8 3

    10 19 8

    What resulted surprised me. One submission did better on a single core and one submission did better on a multi-core CPU.

    The multicore (8-way) winner was

    SELECT t1.ID AS StartGap

    ( SELECT MIN( ID ) FROM Transaction WHERE ID > t1.ID ) AS EndGap,

    ( SELECT MIN( ID ) FROM Transaction WHERE ID > t1.ID )

    - ( t1.ID + 1 ) AS GapSize

    FROM Transaction t1

    WHERE ( t1.ID + 1 ) != ( SELECT MIN( ID ) FROM Transaction WHERE ID > t1.ID )

    The single core winner was

    SELECT StartGap,

    EndGap ,

    StartGap - EndGap AS GapSize

    FROM ( SELECT t1.ID AS StartGap,

    ( SELECT MIN( CONVERT( int, t2.ID ) )

    FROM Transaction t2

    WHERE t2.ID > t1.ID

    ) AS EndGap

    FROM Transaction t1 ) tbl

    WHERE EndGap - StartGap > 1

    While I don't have the results handy to post, what was equally more disturbing was that the code that won on the single core had an execution time better than when it was run on the the 8-way. My take was that the optimizer goofed trying to parallelize it.

    Bill.

  • ith the last couple of posts in mind, I hope everyone realizes that the big brother to "It Depends" is "Test it!". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • webatxcent (3/16/2010)


    Some time ago I challenged my staff to demonstrate their SQL knowledge by solving the problem of identifying gaps in monotonic sequence of reference/key numbers typically found in an identity column.

    Itzik did an excellent series on 'gaps and islands' problems for SQL Server Magazine. If you have a subscription to that site, I would encourage you to read them. There are even better methods than the ones you posted...for some given value of 'better'.

  • Jeff Moden (3/16/2010)


    ith the last couple of posts in mind, I hope everyone realizes that the big brother to "It Depends" is "Test it!". 😉

    I think this needs a little more emphasis. TEST IT!

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • [font="Arial Black"]TEST IT?[/font]

Viewing 15 posts - 46 through 60 (of 98 total)

You must be logged in to reply to this topic. Login to reply