Which way is better? To join or not to join or does it matter?

Question

Post reply

Which way is better? To join or not to join or does it matter?

Mike.Hahn

SSCarpal Tunnel

Points: 4053
More actions
October 25, 2012 at 12:38 pm

#263790

I'm writing a script that will probably join a bit more then 20 tables.

It's my understanding that joins are costly.

Would it cost less to write certain parts like my sample2 below?

I'm only returning that 1 field from Table2.

All ID fields are unique. I read about this a while back but can't remember anything about it.

Sample1

select t2.MyName PersonName , t1.* from Table1 t1 inner join Table2 t2 on t1.SalesPersonID = t2.ID where t1.ID= 12345

Sample2

select (select t2.MyName from Table2 t2 where t2.ID = t1.SalesPersonID) PersonName , t1.* from Table1 t1 where frqr.tril_gid = 12345

Thanks a bunch.

---------------------------------------------------------------
Mike Hahn - MCSomething someday:-)
Right way to ask for help!!
http://www.sqlservercentral.com/articles/Best+Practices/61537/
I post so I can see my avatar :hehe:
I want a personal webpage 😎
I want to win the lotto 😀
I want a gf like Tiffa :w00t: Oh wait I'm married!:-D

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Ray M SSC-Insane Points: 21093 More actions · Answer 1

Ray M

SSC-Insane

Points: 21093

October 25, 2012 at 12:42 pm

#1552873

My $.02.

If your tables are modeled and indexed correctly then joins are a necessary evil.

Your sample 2 uses a correlated subquery, and I believe this is more costly than a join.

CapnHector SSCoach Points: 16143 More actions · Answer 2

Want a cool sig (10/25/2012)

I'm writing a script that will probably join a bit more then 20 tables.

It's my understanding that joins are costly.

Would it cost less to write certain parts like my sample2 below?

I'm only returning that 1 field from Table2.

All ID fields are unique. I read about this a while back but can't remember anything about it.

Sample1

select t2.MyName PersonName , t1.* from Table1 t1 inner join Table2 t2 on t1.SalesPersonID = t2.ID where t1.ID= 12345

Sample2

select (select t2.MyName from Table2 t2 where t2.ID = t1.SalesPersonID) PersonName , t1.* from Table1 t1 where frqr.tril_gid = 12345

Thanks a bunch.

its worth testing but in my experience joins are faster than correlated sub queries (however not all correlated sub queries can be rewritten as joins). reason being the sub query has to be run for every row instead of being able to use a join to get the same results in one pass. in our production code we try to eliminate sub queries because of this added cost. if we can join we join. if its a big long complicated query we test both ways in our dev environment and use which ever is faster. but the key there is we test test and test again.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

Mike.Hahn SSCarpal Tunnel Points: 4053 More actions · Answer 3

Thanks a bunch. Correlated sub queries. Forgot what it was called. Forgot what was said about it. Been forgetting a lot of things. Must be getting old. :doze:

The DB is not designed properly. The app is OO VB6 that allowed the users to create DB objects (table/column) etc from the app interface (pretty much any user can add an object--column/table).

Basically forcing an OO model on a Relational DB. It's a nightmare. 200+ columns when only 25 related objects exists. Not to mention that the same data can potentially be written across 30+ different tables. Makes reporting fun. 🙂

---------------------------------------------------------------
Mike Hahn - MCSomething someday:-)
Right way to ask for help!!
http://www.sqlservercentral.com/articles/Best+Practices/61537/
I post so I can see my avatar :hehe:
I want a personal webpage 😎
I want to win the lotto 😀
I want a gf like Tiffa :w00t: Oh wait I'm married!:-D

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 4

Correlated subqueries do not run per row of the outer query (unless triangular joins or top 1 ... order by).

Correlated subqueries are not usually more expensive than joins, they usually get compiled to exactly the same query plan.

Joins are not expensive operations to be avoided

Unless joins have been tested and are unacceptably slow, a more complex alternative should not be considered.

Write queries the simplest way, test performance, if unacceptable then consider complex rewrites

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

CapnHector SSCoach Points: 16143 More actions · Answer 5

GilaMonster (10/25/2012)

Correlated subqueries do not run per row of the outer query (unless triangular joins or top 1 ... order by).

Correlated subqueries are not usually more expensive than joins, they usually get compiled to exactly the same query plan.

Joins are not expensive operations to be avoided

Unless joins have been tested and are unacceptably slow, a more complex alternative should not be considered.

Write queries the simplest way, test performance, if unacceptable then consider complex rewrites

looks like im off to hit the books on correlated subqueries now. then go yell at the guy i heard it from.

For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]

Thinknook SSCrazy Points: 2531 More actions · Answer 6

looks like im off to hit the books on correlated subqueries now. then go yell at the guy i heard it from.

+1 to that...

Could've swore I've read in the official MCTS exam book that correlated subqueries are performed on a row by row basis if used in the SELECT, and perform exactly like a join if used in the EXISTS part of a WHERE clause.

Update: Iv actually looked it up and tested it, the book wrongly says:

A correlated subquery is typically equal in performance compared to using a JOIN when used in combination with the EXISTS operator to filter the outer query... While this type of correlated subquery is typically a good implementation, the use of correlated subqueries in the SELECT clause often has a negative effect on performance compared to JOINs. Of course, this depends on the number of rows returned by the outer query. If a large number of rows are returned, each query in the SELECT clause would be executed for each row

Damn, can't even trust books anymore. Thanks for clearing things up Gail

-----------------
... Then again, I could be totally wrong! Check the answer.
Check out posting guidelines here for faster more precise answers[/url].

I believe in Codd
... and Thinknook is my Chamber of Understanding

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 7

That's doubly wrong, because EXISTS with correlated subqueries does not perform equally with joins, it's usually very, very slightly better (and actually they're not even equivalent query forms)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 8

GilaMonster (10/26/2012)

That's doubly wrong, because EXISTS with correlated subqueries does not perform equally with joins, it's usually very, very slightly better (and actually they're not even equivalent query forms)

Agree with Gail on the fact that EXISTS with correlated subqueries is not equivalent to JOIN's. However in terms of performance - it depends. From my experience I wouldn't say that usually one is better than another. I've seen all three permutations of EXISTS vs OUTER JOIN's .

I do usually test both where appropriate.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Bhuvnesh SSC Guru Points: 59351 More actions · Answer 9

GilaMonster (10/26/2012)

That's doubly wrong, because EXISTS with correlated subqueries does not perform equally with joins, it's usually very, very slightly better (and actually they're not even equivalent query forms)

Good learning for me.. glad i spent one hour after my hectic office hours at Home ..thanks gail :-). can you please put some light on its explanation

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

SomewhereSomehow SSCommitted Points: 1586 More actions · Answer 10

GilaMonster (10/25/2012)

Correlated subqueries do not run per row of the outer query (unless triangular joins or top 1 ... order by).

One more case when optimizer is not able to decorrelate subquery is an assert impact.

use tempdb;



go

create table t1 (a int);

create table t2 (a int);

go

insert t1 values (1),(2),(3),(4),(5);

insert t2 values (1),(2),(3),(4),(5);

go

set statistics profile on

go

select

	a,

	other_a = (select a from t2 where t1.a = t2.a)

from

	t1

;

go

set statistics profile off

go

drop table t1,t2;

5 1 select a, other_a = (select a from t2 where t1.a = t2.a) from t1 ;



0	0	  |--Compute Scalar(DEFINE:([Expr1009]=[Expr1011]))

5	1	       |--Nested Loops(Left Outer Join, OUTER REFERENCES:([tempdb].[dbo].[t1].[a]))

5	1	            |--Table Scan(OBJECT:([tempdb].[dbo].[t1]))

5	5	            |--Assert(WHERE:(CASE WHEN [Expr1010]>(1) THEN (0) ELSE NULL END))

5	5	                 |--Stream Aggregate(DEFINE:([Expr1010]=Count(*), [Expr1011]=ANY([tempdb].[dbo].[t2].[a])))

5 5 |--Table Scan(OBJECT:([tempdb].[dbo].[t2]), WHERE:([tempdb].[dbo].[t1].[a]=[tempdb].[dbo].[t2].[a]))

upd. fixed formatting.

I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
Blog: http://somewheresomehow.ru[/url]
Twitter: @SomewereSomehow

SomewhereSomehow SSCommitted Points: 1586 More actions · Answer 11

SomewhereSomehow

SSCommitted

Points: 1586

October 26, 2012 at 9:40 am

#1553265

It still showing sad smiles after fixing and putting plan text into |code| tag instead of |quote|.

Yes, that is sad query =))

I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
Blog: http://somewheresomehow.ru[/url]
Twitter: @SomewereSomehow

Thinknook SSCrazy Points: 2531 More actions · Answer 12

One more case when optimizer is not able to decorrelate subquery is an assert impact.

Interesting! so what is a scenario when the optimizer compiles both a correlated subquery query and a join query to the same query plan?, Am trying to come up with one but not managing, even if the query plan looks "almost" the same, SQL is using different join operators (for example Nested Loop for correlated subquery vs Hash Match for join).

/>L

-----------------
... Then again, I could be totally wrong! Check the answer.
Check out posting guidelines here for faster more precise answers[/url].

I believe in Codd
... and Thinknook is my Chamber of Understanding

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 13

SELECT sod.SalesOrderID ,



        SalesOrderDetailID ,

        AccountNumber = (SELECT AccountNumber FROM Sales.SalesOrderHeader soh WHERE soh.SalesOrderID = sod.SalesOrderID)

FROM Sales.SalesOrderDetail AS sod

SELECT  sod.SalesOrderID ,

        SalesOrderDetailID ,

        AccountNumber

FROM Sales.SalesOrderDetail AS sod LEFT OUTER JOIN Sales.SalesOrderHeader soh ON soh.SalesOrderID = sod.SalesOrderID

Near-identical plans (an extra compute scalar in the first)

Or take SomewhereSomehow's example and add a unique constraint (or primary key) to the second table. If SQL knows the column is unique, it doesn't need an assert and hence doesn't need the nested loop join but can use other joins as it likes.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass