Which SQL would run faster?

  • Which of the following do you think would run the quickest (I am thinking the 3rd, but I wanted to check)? Each of x and y has around 10 million rows.

    Query 1
    =======

    SELECT
        x.c1,
        x.c2,
        MIN(CASE WHEN y.c3 = '' THEN '9' ELSE COALESCE(y.c3, '*') END) AS c3,
        MIN(CASE WHEN y.c4 = '' THEN '9' ELSE COALESCE(y.c4, '*') END) AS c4,
        MIN(CASE WHEN y.c5 = '' THEN '9' ELSE COALESCE(y.c5, '*') END) AS c5,
        MIN(CASE WHEN y.c6 = '' THEN '9' ELSE COALESCE(y.c6, '*') END) AS c6,
        MIN(CASE WHEN y.c7 = '' THEN '9' ELSE COALESCE(y.c7, '*') END) AS c7
    FROM x
    INNER JOIN y
        ON x.c1 = y.c1 AND x.c2 = y.c2
    GROUP BY x.c1, x.c2

    Query 2
    =======

    SELECT
        x.c1,
        x.c2,
        MIN(y.c3) AS c3,
        MIN(y.c4) AS c4,
        MIN(y.c5) AS c5,
        MIN(y.c6) AS c6,
        MIN(y.c7) AS c7
    FROM x
    INNER JOIN
        (SELECT y.c1,
                y.c2,
                CASE WHEN y.c3 = '' THEN '9' ELSE COALESCE(y.c3, '*') END AS c3,
                CASE WHEN y.c4 = '' THEN '9' ELSE COALESCE(y.c4, '*') END AS c4,
                CASE WHEN y.c5 = '' THEN '9' ELSE COALESCE(y.c5, '*') END AS c5,
                CASE WHEN y.c6 = '' THEN '9' ELSE COALESCE(y.c6, '*') END AS c6,
                CASE WHEN y.c7 = '' THEN '9' ELSE COALESCE(y.c7, '*') END AS c7
         FROM y) AS y -- FROM clause and derived-table alias added; both were missing as posted
        ON x.c1 = y.c1 AND x.c2 = y.c2
    GROUP BY x.c1, x.c2

    Query 3
    =======

    With a scalar user-defined function dbo.udf_coal(x):

    SELECT
        x.c1,
        x.c2,
        MIN(y.c3) AS c3,
        MIN(y.c4) AS c4,
        MIN(y.c5) AS c5,
        MIN(y.c6) AS c6,
        MIN(y.c7) AS c7
    FROM x
    INNER JOIN
        (SELECT y.c1,
                y.c2,
                dbo.udf_coal(y.c3) AS c3,
                dbo.udf_coal(y.c4) AS c4,
                dbo.udf_coal(y.c5) AS c5,
                dbo.udf_coal(y.c6) AS c6,
                dbo.udf_coal(y.c7) AS c7
         FROM y) AS y -- stray END keywords removed; FROM clause and derived-table alias added
        ON x.c1 = y.c1 AND x.c2 = y.c2
    GROUP BY x.c1, x.c2
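    The body of dbo.udf_coal was not posted. Inferred from the CASE expressions in Queries 1 and 2, it would presumably look something like the sketch below (the function body is an assumption, not the poster's actual code):

    ```sql
    -- Hypothetical definition of dbo.udf_coal, mirroring the CASE/COALESCE
    -- logic in Queries 1 and 2. Note that a scalar UDF like this is invoked
    -- once per row per column, which matters on 10-million-row tables.
    CREATE FUNCTION dbo.udf_coal (@val VARCHAR(10))
    RETURNS VARCHAR(10)
    AS
    BEGIN
        RETURN CASE WHEN @val = '' THEN '9' ELSE COALESCE(@val, '*') END;
    END
    ```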

  • Why don't you just run it and tell us? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Well, running them would be a far quicker way of seeing which is faster than asking.

    I can say that the third will probably be the slowest. Don't know about the other 2.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I'd guess 1; 3 will be slowest.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • Make it a poll instead... 🙂 I think 1 and 2 would be equal in performance (it seems the same number of rows are touched, and every row is COALESCEd just once and then aggregated with MIN), with the third slowest.

    Ronald Hensbergen

    Help us, help yourself... Post data so we can read and use it: http://www.sqlservercentral.com/articles/Best+Practices/61537/

    2+2=5 for significantly large values of 2

  • I think queries 1 & 2 are similar, so I'd bet on both, with more on 1... 3 is out of the question here for me. :)

    --Ramesh


  • r.hensbergen (1/16/2009)


    Make it a poll instead... 🙂 I think 1 and 2 would be equal in performance (it seems the same number of rows are touched, and every row is COALESCEd just once and then aggregated with MIN), with the third slowest.

    Heh... no... please don't make it a poll. I'm trying to be as nice as pie... I don't understand why people post code and ask which will be faster... that's pretty lazy... just run the bloody code and see. Then, if you don't understand why, post the results, the code, and ask why. 😉

    --Jeff Moden



  • I guess number one.

    What is the prize if I get this right?

  • alrite ... first thanks for 'all the guidance' ....

    second .. while i admit to being a little lazy, I do not have access to the database itself that this 'representative query' is to process ..... would have had to build 2 tables with around 10 million rows (not that it is a big deal) ... I just assumed the "experts" here would have been able to tell this without having to run the SQL and also generous enough to share their opinion - guess I was wrong ....

  • Jeff Moden (1/16/2009)


    I'm trying to be as nice as pie... I don't understand why people post code and ask which will be faster... that's pretty lazy...

    Cause it's easier than setting up a test environment?

    Then, if you don't understand why, post the results, the code, and ask why. 😉

    I wonder how many of the people who think that one or the other will be faster can explain why they say that. 😉

    Gail Shaw
  • leonp (1/16/2009)


    second .. while i admit to being a little lazy, I do not have access to the database itself that this 'representative query' is to process

    Got a local copy of SQL and a test DB? Taking half an hour or so and building up a small sample DB (or just taking adventureworks) can save you massively in the future when you need to test stuff out and aren't allowed to do so on the production server. It's also a great learning environment for fooling around with SQL and seeing how it behaves.

    I just assumed the "experts" here would have been able to tell this without having to run the SQL and also generous enough to share their opinion - guess I was wrong ....

    It's the fishing and teaching to fish thing again. I can (and did) tell you that 3 would probably be slower. If you don't know why, that fact in itself is useless as it doesn't help in the future.

    Run the stuff, see how it works, see if you can figure it out, and next time you won't have to ask the 'experts'; you'll be able to figure it out yourself.

    Gail Shaw
  • ok ok ..... spare me the barrage, smart, sweet people ..... you were right .... 1st and 2nd were quicker than 3rd and in fact 1st was quicker than 2nd a little bit ...

    I changed the coalesce in the 1st to ISNULL and that was even quicker ..... (not asking why - just an FYI)

  • leonp (1/16/2009)


    alrite ... first thanks for 'all the guidance' ....

    second .. while i admit to being a little lazy, I do not have access to the database itself that this 'representative query' is to process ..... would have had to build 2 tables with around 10 million rows (not that it is a big deal) ... I just assumed the "experts" here would have been able to tell this without having to run the SQL and also generous enough to share their opinion - guess I was wrong ....

    Sorry, Leon... I didn't even consider that. I've recently been involved with a rash of posts where people just don't try, and think that folks on this forum actually get paid for their answers. We don't, by the way. Anyway, please accept my apologies and a small "gift" for my bit of rudeness (don't know what else to call it)... dunno if you have one already, but I've found the following data generator to be a big help for things like this. Feel free to warp the code to your needs...

    --===== Create and populate a 1,000,000 row test table.
         -- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
         -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
         -- Column "SomeMoney" has a range of 0.0000 to 99.9999 non-unique numbers
         -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
         -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
         -- for all rows.
         -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
         -- Jeff Moden
     SELECT TOP 1000000
            RowNum       = IDENTITY(INT,1,1),
            SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
            SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                         + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
            SomeCSV      = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
            SomeMoney    = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
            SomeDate     = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
            SomeHex12    = RIGHT(NEWID(),12)
       INTO dbo.JBMTest
       FROM Master.dbo.SysColumns t1
      CROSS JOIN Master.dbo.SysColumns t2

    --===== Add a clustered index to keep it from being a "heap". This one happens to also be a PK.
      ALTER TABLE dbo.JBMTest
        ADD PRIMARY KEY CLUSTERED (RowNum)

    --Jeff Moden



  • leonp (1/16/2009)


    ok ok ..... spare me the barrage, smart, sweet people ..... you were right .... 1st and 2nd were quicker than 3rd and in fact 1st was quicker than 2nd a little bit ...

    I changed the coalesce in the 1st to ISNULL and that was even quicker ..... (not asking why - just an FYI)

    I obviously haven't reviewed the source code for ISNULL or COALESCE, but I've found the same thing to be true especially in concatenation functions. The only thing I can figure is that ISNULL is designed to take two and only two operands. COALESCE is designed to take at least two up to some relatively unknown number. I can only imagine that the code to determine how many operands COALESCE has and to "split them out" is a wee bit longer than the code to split only two operands that are required.
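    A small illustration of the two points above (a sketch only; the truncation behavior is general T-SQL behavior, not something measured in this thread):

    ```sql
    -- ISNULL takes exactly two operands; COALESCE takes two or more.
    -- SELECT ISNULL(c3, '*')      FROM y;   -- two operands only
    -- SELECT COALESCE(c3, c4, '*') FROM y;  -- first non-NULL of any number of operands

    -- A related behavioral difference: ISNULL returns the datatype of its
    -- FIRST argument, while COALESCE returns the highest-precedence datatype
    -- of ALL its arguments, so replacement values can be truncated differently.
    DECLARE @s CHAR(2) = NULL;
    SELECT ISNULL(@s, 'abcde');    -- 'ab'    (result typed as CHAR(2), truncated)
    SELECT COALESCE(@s, 'abcde');  -- 'abcde' (result type widens to the literal's)
    ```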

    --Jeff Moden



  • Truly appreciate it Jeff .....
