Coincidence or Causality

Question

Post reply

Coincidence or Causality

Steve Jones - SSC Editor

SSC Guru

Points: 728159
More actions
December 11, 2009 at 6:13 pm

#218641

Comments posted to this topic are about the item Coincidence or Causality

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply

Paul White SSC Guru Points: 150457 More actions · Answer 1

There is at least one case where the written order of the tables does matter. Ordinarily, the optimizer only considers plans where at least one input to each join comes from a base table. It won't normally consider doing a join on the results of two previous joins. Sometimes, our knowledge of the data makes it obvious that joining the results of prior joins should result in a better plan. A quick demonstration:



USE tempdb;

SET NOCOUNT ON;

CREATE TABLE #A (A INTEGER PRIMARY KEY);

CREATE TABLE #B (B INTEGER PRIMARY KEY);

CREATE TABLE #C (C INTEGER PRIMARY KEY);

CREATE TABLE #D (D INTEGER PRIMARY KEY);

INSERT  #A (A)

SELECT  TOP (10000)

        ROW_NUMBER() OVER (ORDER BY AU1.allocation_unit_id)

FROM    master.sys.allocation_units AU1,

        master.sys.allocation_units AU2;

INSERT  #B (B)

SELECT  TOP (10000)

        7 * ROW_NUMBER() OVER (ORDER BY AU1.allocation_unit_id)

FROM    master.sys.allocation_units AU1,

        master.sys.allocation_units AU2;

INSERT  #C (C)

SELECT  TOP (10000)

        ROW_NUMBER() OVER (ORDER BY AU1.allocation_unit_id)

FROM    master.sys.allocation_units AU1,

        master.sys.allocation_units AU2;

INSERT  #D (D)

SELECT  TOP (10000)

        28 * ROW_NUMBER() OVER (ORDER BY AU1.allocation_unit_id)

FROM    master.sys.allocation_units AU1,

        master.sys.allocation_units AU2;

SET STATISTICS IO ON;

PRINT 'Run 1';

-- Leave it to the optimizer

SELECT  D.D

FROM    #A A

JOIN    #B B ON B.B = A.A

JOIN    #C C ON C.C = B.B

JOIN    #D D ON D.D = C.C;

PRINT 'Run 2';

-- Bushy plan

SELECT  D.D

FROM    (#B B JOIN #A A ON A.A = B.B)

JOIN    (#D D JOIN #C C ON D.D = C.C)

        ON D.D = A.A

OPTION  (FORCE ORDER);

SET STATISTICS IO OFF;

DROP TABLE #A, #B, #C, #D;

Default:

Bushy:

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

george sibbald SSC Guru Points: 104210 More actions · Answer 2

george sibbald

SSC Guru

Points: 104210

December 13, 2009 at 4:26 am

#1091624

Its a moot point as to whether you consider deadlocks a performance issue (I would) but should there not at least be consistency in the order objects are accessed to help minimise deadlocking

---------------------------------------------------------------------

Steve Jones - SSC Editor SSC Guru Points: 728159 More actions · Answer 3

Steve Jones - SSC Editor

SSC Guru

Points: 728159

December 13, 2009 at 8:43 am

#1091667

Deadlocks don't occur within a query, they are within a transaction when you are updating two objects.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

george sibbald SSC Guru Points: 104210 More actions · Answer 4

george sibbald

SSC Guru

Points: 104210

December 13, 2009 at 8:58 am

#1091668

ah, have i misunderstood it, it matters when you have more than one query within a transaction?

---------------------------------------------------------------------

Steve Jones - SSC Editor SSC Guru Points: 728159 More actions · Answer 5

A deadlock needs two sessions, and it needs multiple objects being accessed. AFAIK, if each person ran a single query you could not have a deadlock. It requires each person to access two objects, in the wrong order. So for deadlocks order is important, but that's among queries.

Paul, that's interesting. I've not seen people writing joins as groups like you have. I think those parens are affecting things.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

george sibbald SSC Guru Points: 104210 More actions · Answer 6

Steve Jones - Editor (12/13/2009)

A deadlock needs two sessions, and it needs multiple objects being accessed. AFAIK, if each person ran a single query you could not have a deadlock. It requires each person to access two objects, in the wrong order. So for deadlocks order is important, but that's among queries.

yes, thats the point I was trying to make Steve. When multiple queries are running on a database, the order in which objects are accessed can be important. But thats not the same as the order you write tables down in a single query, so I'll shutup now 🙂

---------------------------------------------------------------------

Paul White SSC Guru Points: 150457 More actions · Answer 7

Steve Jones - Editor (12/13/2009)

Paul, that's interesting. I've not seen people writing joins as groups like you have. I think those parens are affecting things.

No parentheses version:



SELECT  D.D

FROM    #B B

JOIN    #A A

        ON A.A = B.B

JOIN    #D D

JOIN    #C C

        ON D.D = C.C

        ON C.C = B.B

OPTION  (FORCE ORDER);

This produces a the same 'bushy' plan. My point is just to highlight the fact that this sort of plan is never normally considered by the optimizer: if you want to join the results of two joins, you have to perform tricks like this - where the written order does have an impact. The important thing is not the parentheses (though they do make the intention clearer - at least to my way of thinking); it is the OPTION clause that helps us specify an particular plan.

I ought to mention too that the written order of the tables is important if an OBJECT or SQL plan guide exists which requires the textual form of the submitted query to match the guide. Re-arranging the query in this case would prevent the plan guide from being applied, potentially resulting in a suboptimal query plan.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Paul White SSC Guru Points: 150457 More actions · Answer 8

One more, just to labour the point. This one features no OPTION clause (same bushy plan again):



SELECT  D.D

FROM    #B B

INNER

MERGE

JOIN    #A A

        ON A.A = B.B

JOIN    #D D

JOIN    #C C

        ON D.D = C.C

        ON C.C = B.B

Not everyone realises that including a join hint forces the order for all tables in the query, just as if OPTION (FORCE ORDER) had been specified.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Gary Varga SSC Guru Points: 82166 More actions · Answer 9

This is an example of where true DBAs rule.

I mention this as this form of optimisation is usually beyond the day to day best practices of the database developer role. Whilst some DBAs do this role as well, many developers (code monkeys like me) do the database developer role. Unfortunately, too many think that if they can do one they can do the other. I like to think that one of my greatest strengths is knowing when to defer to others' expertise.

It is good to lay down that myth though but also good to understand that in some circumstances it may make a difference. As with all performance issued, perceived or real, the only try test is measurement. This is why I tend to go for the declarative development of stored procedures and leave optimisation to the point where it is either a problem, there is enough of the system and data to measure or a highly skilled DBA comes to tut at my naievity 😉

Gaz

-- Stop your grinnin' and drop your linen...they're everywhere!!!

GSquared SSC Guru Points: 260824 More actions · Answer 10

Steve Jones - Editor (12/13/2009)

A deadlock needs two sessions, and it needs multiple objects being accessed. AFAIK, if each person ran a single query you could not have a deadlock. It requires each person to access two objects, in the wrong order. So for deadlocks order is important, but that's among queries.

Paul, that's interesting. I've not seen people writing joins as groups like you have. I think those parens are affecting things.

It's essentially turning them into derived tables. Pretty common thing, just looks odd because it's all that's there in that particular query.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 11

Thanks for the compliment in the article, Steve.

There are a lot of weird ideas out there on performance in T-SQL, but the single biggest factor in most of the questions I field on the subject is very simply an inability to read execution plans, or simply not knowing they even exist.

Phil had a good editorial a few weeks ago (if I remember the timing), on the subject of SQL speed, and someone asked for a set of "rules" for fast queries in the discussion. The fact that the rules are so subject to vagaries, exceptions, circumstances, etc., is what makes it such a hard subject to learn or teach.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

KEKline SSC Enthusiast Points: 193 More actions · Answer 12

Great post, Steve. I like the use of the word "myth" for issues like this. They always seem to originate with tuning tips that were appropriate umpteen versions ago but are no longer appropriate today. Yet, with the persistence of documents across the internet, it's almost impossible to stamp out these things.

There was a time, back in the day, when table order used to matter. I think that it dropped off the list of things to worry about around SQL7 or maybe as late as SQL2000. I'd like to know for sure, but can't recall off the top of my head. I'd bet even money that Kalen or Itzik or Kimberly remember for sure.

One thing that -DOES- still matter, though, is whether you include or exclude all the columns in a compound index. I still run in to LOTS of people who don't understand those ramifications.

Best regards,

-Kev

Paul White SSC Guru Points: 150457 More actions · Answer 13

GSquared (12/14/2009)

It's essentially turning them into derived tables. Pretty common thing, just looks odd because it's all that's there in that particular query.

Sort of. A derived table attempt might look like this:



SELECT  F.D

FROM    (

        SELECT  A.A

        FROM    #B B

        JOIN    #A A ON A.A = B.B

        ) E

JOIN    (

        SELECT  D.D

        FROM    #D D

        JOIN    #C C ON D.D = C.C

        ) F

        ON F.D = E.A

OPTION  (FORCE ORDER)

The optimizer will still not produce the desired 'bushy' plan unless OPTION (FORCE ORDER) or a JOIN hint is used. It's also a bit more typing, though arguably clearer. Anyway, this is just another way of demonstrating that rewriting the table order like this can, and does, affect the execution plan.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

DBA_Rob Default port Points: 1494 More actions · Answer 14

kkline (12/14/2009)

There was a time, back in the day, when table order used to matter. I think that it dropped off the list of things to worry about around SQL7 or maybe as late as SQL2000. I'd like to know for sure, but can't recall off the top of my head. I'd bet even money that Kalen or Itzik or Kimberly remember for sure.

I too remember that back in the day when it did matter. In the early 90's I was using DB2 on a mainframe and IBM had a tuning course that mentioned this. At that time, the optimizer only dealt with 4 tables at a time so it would determine the first table to access based on the first 4 tables in the from clause, then it would concider the remaining 3 tables with the 5th table etc. So it only mattered if you joined more than 4 tables. Then in the mid 90's I moved on to Sybase and it had similar limitations on what the optimizer could look at in each pass. This of course would have been SQL Server 4.x timeframe. If I were a betting man I would guess that SQL 2000 was the release that got rid of that optimizer limitation, but others may know better.

Now all we have to do is find all this misinformation living on the web and clean it up 😀