Coincidence or Causality

  • Comments posted to this topic are about the item Coincidence or Causality

  • There is at least one case where the written order of the tables does matter. Ordinarily, the optimizer only considers plans where at least one input to each join comes from a base table. It won't normally consider doing a join on the results of two previous joins. Sometimes, our knowledge of the data makes it obvious that joining the results of prior joins should result in a better plan. A quick demonstration:

    USE tempdb;

    SET NOCOUNT ON;

    CREATE TABLE #A (A INTEGER PRIMARY KEY);

    CREATE TABLE #B (B INTEGER PRIMARY KEY);

    CREATE TABLE #C (C INTEGER PRIMARY KEY);

    CREATE TABLE #D (D INTEGER PRIMARY KEY);

    INSERT #A (A)

    SELECT TOP (10000)

    ROW_NUMBER() OVER (ORDER BY AU1.allocation_unit_id)

    FROM master.sys.allocation_units AU1,

    master.sys.allocation_units AU2;

    INSERT #B (B)

    SELECT TOP (10000)

    7 * ROW_NUMBER() OVER (ORDER BY AU1.allocation_unit_id)

    FROM master.sys.allocation_units AU1,

    master.sys.allocation_units AU2;

    INSERT #C (C)

    SELECT TOP (10000)

    ROW_NUMBER() OVER (ORDER BY AU1.allocation_unit_id)

    FROM master.sys.allocation_units AU1,

    master.sys.allocation_units AU2;

    INSERT #D (D)

    SELECT TOP (10000)

    28 * ROW_NUMBER() OVER (ORDER BY AU1.allocation_unit_id)

    FROM master.sys.allocation_units AU1,

    master.sys.allocation_units AU2;

    SET STATISTICS IO ON;

    PRINT 'Run 1';

    -- Leave it to the optimizer

    SELECT D.D

    FROM #A A

    JOIN #B B ON B.B = A.A

    JOIN #C C ON C.C = B.B

    JOIN #D D ON D.D = C.C;

    PRINT 'Run 2';

    -- Bushy plan

    SELECT D.D

    FROM (#B B JOIN #A A ON A.A = B.B)

    JOIN (#D D JOIN #C C ON D.D = C.C)

    ON D.D = A.A

    OPTION (FORCE ORDER);

    SET STATISTICS IO OFF;

    DROP TABLE #A, #B, #C, #D;

    Default:

    Bushy:

  • Its a moot point as to whether you consider deadlocks a performance issue (I would) but should there not at least be consistency in the order objects are accessed to help minimise deadlocking

    ---------------------------------------------------------------------

  • Deadlocks don't occur within a query, they are within a transaction when you are updating two objects.

  • ah, have i misunderstood it, it matters when you have more than one query within a transaction?

    ---------------------------------------------------------------------

  • A deadlock needs two sessions, and it needs multiple objects being accessed. AFAIK, if each person ran a single query you could not have a deadlock. It requires each person to access two objects, in the wrong order. So for deadlocks order is important, but that's among queries.

    Paul, that's interesting. I've not seen people writing joins as groups like you have. I think those parens are affecting things.

  • Steve Jones - Editor (12/13/2009)


    A deadlock needs two sessions, and it needs multiple objects being accessed. AFAIK, if each person ran a single query you could not have a deadlock. It requires each person to access two objects, in the wrong order. So for deadlocks order is important, but that's among queries.

    yes, thats the point I was trying to make Steve. When multiple queries are running on a database, the order in which objects are accessed can be important. But thats not the same as the order you write tables down in a single query, so I'll shutup now 🙂

    ---------------------------------------------------------------------

  • Steve Jones - Editor (12/13/2009)


    Paul, that's interesting. I've not seen people writing joins as groups like you have. I think those parens are affecting things.

    No parentheses version:

    SELECT D.D

    FROM #B B

    JOIN #A A

    ON A.A = B.B

    JOIN #D D

    JOIN #C C

    ON D.D = C.C

    ON C.C = B.B

    OPTION (FORCE ORDER);

    This produces a the same 'bushy' plan. My point is just to highlight the fact that this sort of plan is never normally considered by the optimizer: if you want to join the results of two joins, you have to perform tricks like this - where the written order does have an impact. The important thing is not the parentheses (though they do make the intention clearer - at least to my way of thinking); it is the OPTION clause that helps us specify an particular plan.

    I ought to mention too that the written order of the tables is important if an OBJECT or SQL plan guide exists which requires the textual form of the submitted query to match the guide. Re-arranging the query in this case would prevent the plan guide from being applied, potentially resulting in a suboptimal query plan.

  • One more, just to labour the point. This one features no OPTION clause (same bushy plan again):

    SELECT D.D

    FROM #B B

    INNER

    MERGE

    JOIN #A A

    ON A.A = B.B

    JOIN #D D

    JOIN #C C

    ON D.D = C.C

    ON C.C = B.B

    Not everyone realises that including a join hint forces the order for all tables in the query, just as if OPTION (FORCE ORDER) had been specified.

  • This is an example of where true DBAs rule.

    I mention this as this form of optimisation is usually beyond the day to day best practices of the database developer role. Whilst some DBAs do this role as well, many developers (code monkeys like me) do the database developer role. Unfortunately, too many think that if they can do one they can do the other. I like to think that one of my greatest strengths is knowing when to defer to others' expertise.

    It is good to lay down that myth though but also good to understand that in some circumstances it may make a difference. As with all performance issued, perceived or real, the only try test is measurement. This is why I tend to go for the declarative development of stored procedures and leave optimisation to the point where it is either a problem, there is enough of the system and data to measure or a highly skilled DBA comes to tut at my naievity 😉

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Steve Jones - Editor (12/13/2009)


    A deadlock needs two sessions, and it needs multiple objects being accessed. AFAIK, if each person ran a single query you could not have a deadlock. It requires each person to access two objects, in the wrong order. So for deadlocks order is important, but that's among queries.

    Paul, that's interesting. I've not seen people writing joins as groups like you have. I think those parens are affecting things.

    It's essentially turning them into derived tables. Pretty common thing, just looks odd because it's all that's there in that particular query.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks for the compliment in the article, Steve.

    There are a lot of weird ideas out there on performance in T-SQL, but the single biggest factor in most of the questions I field on the subject is very simply an inability to read execution plans, or simply not knowing they even exist.

    Phil had a good editorial a few weeks ago (if I remember the timing), on the subject of SQL speed, and someone asked for a set of "rules" for fast queries in the discussion. The fact that the rules are so subject to vagaries, exceptions, circumstances, etc., is what makes it such a hard subject to learn or teach.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Great post, Steve. I like the use of the word "myth" for issues like this. They always seem to originate with tuning tips that were appropriate umpteen versions ago but are no longer appropriate today. Yet, with the persistence of documents across the internet, it's almost impossible to stamp out these things.

    There was a time, back in the day, when table order used to matter. I think that it dropped off the list of things to worry about around SQL7 or maybe as late as SQL2000. I'd like to know for sure, but can't recall off the top of my head. I'd bet even money that Kalen or Itzik or Kimberly remember for sure.

    One thing that -DOES- still matter, though, is whether you include or exclude all the columns in a compound index. I still run in to LOTS of people who don't understand those ramifications.

    Best regards,

    -Kev

  • GSquared (12/14/2009)


    It's essentially turning them into derived tables. Pretty common thing, just looks odd because it's all that's there in that particular query.

    Sort of. A derived table attempt might look like this:

    SELECT F.D

    FROM (

    SELECT A.A

    FROM #B B

    JOIN #A A ON A.A = B.B

    ) E

    JOIN (

    SELECT D.D

    FROM #D D

    JOIN #C C ON D.D = C.C

    ) F

    ON F.D = E.A

    OPTION (FORCE ORDER)

    The optimizer will still not produce the desired 'bushy' plan unless OPTION (FORCE ORDER) or a JOIN hint is used. It's also a bit more typing, though arguably clearer. Anyway, this is just another way of demonstrating that rewriting the table order like this can, and does, affect the execution plan.

  • kkline (12/14/2009)


    There was a time, back in the day, when table order used to matter. I think that it dropped off the list of things to worry about around SQL7 or maybe as late as SQL2000. I'd like to know for sure, but can't recall off the top of my head. I'd bet even money that Kalen or Itzik or Kimberly remember for sure.

    I too remember that back in the day when it did matter. In the early 90's I was using DB2 on a mainframe and IBM had a tuning course that mentioned this. At that time, the optimizer only dealt with 4 tables at a time so it would determine the first table to access based on the first 4 tables in the from clause, then it would concider the remaining 3 tables with the 5th table etc. So it only mattered if you joined more than 4 tables. Then in the mid 90's I moved on to Sybase and it had similar limitations on what the optimizer could look at in each pass. This of course would have been SQL Server 4.x timeframe. If I were a betting man I would guess that SQL 2000 was the release that got rid of that optimizer limitation, but others may know better.

    Now all we have to do is find all this misinformation living on the web and clean it up 😀

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply