T-SQL: Why “It Depends”


Viewing 15 posts - 46 through 60 (of 98 total)


Paul White SSC Guru Points: 150457 More actions · Answer 1

March 16, 2010 at 11:16 am

#1134660

timothyawiseman (3/16/2010)

Excellent Article. Thank you for providing it. I will be passing this off to some of my friends soon.

You mean "passing it on"...right? 😉

Passing it off would be a breach of Bob's rights as an author :laugh: :laugh: :laugh:

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Dave62 SSCertifiable Points: 7066 More actions · Answer 2

GabyYYZ (3/16/2010)

... (or is someone willing to defend keeping Autoshrink on?) 🙂

...

I can't really defend keeping Autoshrink on but I can say that it has been left on in our production environment for a number of years with no issues.

I've seen it said more than once here that turning it off is an absolute but with no explanation. Can anyone defend turning Autoshrink off?
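The usual argument against it is that shrinking moves pages around and tends to fragment indexes, after which the files often just grow again. For anyone who wants to see where their own servers stand, here is a minimal sketch. It assumes SQL Server 2005 or later (where the sys.databases catalog view exists), and the database name SalesDemo is hypothetical.

```sql
-- List databases that currently have auto-shrink or auto-close enabled.
SELECT name, is_auto_shrink_on, is_auto_close_on
FROM sys.databases
WHERE is_auto_shrink_on = 1
   OR is_auto_close_on = 1;

-- Turn auto-shrink off for one database (SalesDemo is a hypothetical name).
ALTER DATABASE [SalesDemo] SET AUTO_SHRINK OFF;
```

Whether turning it off is the right call still depends on the workload, as the replies in this thread show.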

Dave

Trey Staker SSCarpal Tunnel Points: 4766 More actions · Answer 3

Bob,

Thanks for a well-written, concise, and much-needed article. I've bookmarked it for the next time someone asks me why their query worked fine in the test environment with only a partial dataset but needed to be reworked in production with hundreds of millions of rows. Please write more articles.

---------------------------------------------------------------------
Use Full Links:
KB Article from Microsoft on how to ask a question on a Forum

Paul White SSC Guru Points: 150457 More actions · Answer 4

GabyYYZ (3/16/2010)

(or is someone willing to defend keeping Autoshrink on?)

Hey Gaby 😀

Auto-close and Auto-shrink default to true on MSDE (2000). I worked at a large international retail chain once, with close to 3000 MSDE instances running (one on each cashier's till in every store). The application created many 'snapshot' databases as bulk changes to prices etc. were applied. Auto-shrink and auto-close were pretty handy there.

See? It depends! 😀

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

vmunjeti Newbie Points: 9 More actions · Answer 5

March 16, 2010 at 12:36 pm

#1134727

Hi Bob,

This is Venkat.

You probably know that by now.

This article is really worth a million dollars.

In fact, we have discussed this same topic a couple of times at our workplace.

Keep posting.

Cheers..

Venkat Munjeti

Vmunjeti@gmail.com

TedS Valued Member Points: 51 More actions · Answer 6

CirquedeSQLeil (3/16/2010)

TedS (3/16/2010)

Greg Edwards-268690 (3/16/2010)

TedS (3/16/2010)

Am I the only one who copied the code directly and got 3 different result sets for each query?

Granted I used RedGate to generate 500,000 rows in the example table, but I got 699 Rows with the first example, 101 Rows with the 2nd and 102 rows with the 3rd.

I limited the generation of data to have 100 products and dates to reside in 2 years.

Please go back and look at Paul's comment on the format of the code and see if that explains your issue.

Greg E

I did, it doesn't.

My generated data should work just fine; the results of the 3 queries (both separate and together) are simply different.

While I do appreciate the effort and agree with the premise that there isn't a one-size-fits-all solution (it really does depend), after I fix the (formatting-related) syntax errors the results aren't the same.

I have run the code and did not come up with 3 different result sets.

Interesting. Perhaps my random data generator did something evil, but I'm (still) getting 3 very distinct result sets. How did you populate the table? I have 500,000 semi-random records with some boundaries. I used the Red-Gate data generator to populate the table. I am very interested in the findings, as I have a couple of databases that have crossed a "magical" threshold and are now generating different execution plans, AND sometimes they cause issues such as IO performance drain or filling up TempDB.

rob.lobbe-964963 Old Hand Points: 307 More actions · Answer 7

GabyYYZ (3/16/2010)

1. Do NOT turn autoshrink on.

(or is someone willing to defend keeping Autoshrink on?) 🙂

I'll defend it! 😛

ALL my developers have to put up with it.

Frankly I want to give them the worst performing instances/databases in the organization. Not so bad that they can't do their jobs, but bad enough that they appreciate the benefits of getting it right!

When their code moves to production, it's less work for me.

But once again, 'it depends' on how they have obtained the performance gains...

*I had one guy reduce his test data set - no help to anyone.

Tom Garth SSCertifiable Points: 6193 More actions · Answer 8

March 16, 2010 at 4:01 pm

#1134889

A SUPER READ!!!

Tom Garth
Vertical Solutions

"There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers

Evan Yates Valued Member Points: 61 More actions · Answer 9

In the past I have used this technique to find the "latest" record for each Product:

SELECT a.Product, a.SaleQuantity, a.SalePrice, a.SaleDate, a.SaleComments
FROM dbo.SalesDetail a
WHERE NOT EXISTS (
	SELECT 1
	FROM dbo.SalesDetail b
	WHERE b.Product = a.Product
	AND b.SaleDate > a.SaleDate )

Against my test data it out-performs the other three methods at the 10,000 and 100,000 record levels.

Admittedly I haven't tried the 1,000,000 record level.

The execution plan shows the major cost is a Hash Match (Right Anti Semi Join).

Is this OK as an approach or will I run into fish-hooks?

Paul White SSC Guru Points: 150457 More actions · Answer 10

Evan Yates (3/16/2010)

In the past I have used this technique to find the "latest" record for each Product:

...NOT EXISTS...

Against my test data it out-performs the other three methods at the 10,000 and 100,000 record levels.

Admittedly I haven't tried the 1,000,000 record level.

The execution plan shows the major cost is a Hash Match (Right Anti Semi Join).

Is this OK as an approach or will I run into fish-hooks?

It depends, of course.

The 'weakness' of NOT EXISTS is that every row must be scanned to check the NOT EXISTS predicate. A semi join (EXISTS) can stop looking as soon as it finds a single match. Also, remember that a hash operation requires a memory grant, and so may not be the optimal choice for OLTP (the query might have to wait for the memory grant, or a poor estimate for the memory required might lead to the hash spilling to disk).

In general, ROW_NUMBER and the APPLY/TOP methods will often out-perform the anti semi join (assuming optimal indexing for all). The reason being that ROW_NUMBER only scans the input once (no joins), and APPLY/TOP can be even faster if there are a larger number of detail records per group. It does depend on the indexing and data distribution though.
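To make the comparison concrete, here is a rough sketch of both alternatives. It uses the dbo.SalesDetail columns from the NOT EXISTS query quoted above and is not necessarily identical to the queries in the article. It assumes SQL Server 2005 or later, and both forms will generally want an index leading on (Product, SaleDate), which is an assumption about the table rather than something stated in the thread.

```sql
-- ROW_NUMBER: number each product's rows from newest to oldest, keep row 1.
-- Unlike NOT EXISTS, this returns exactly one row per product even when
-- several rows share the latest SaleDate (which of them is arbitrary).
WITH Ranked AS
(
    SELECT Product, SaleQuantity, SalePrice, SaleDate, SaleComments,
           ROW_NUMBER() OVER (PARTITION BY Product
                              ORDER BY SaleDate DESC) AS rn
    FROM dbo.SalesDetail
)
SELECT Product, SaleQuantity, SalePrice, SaleDate, SaleComments
FROM Ranked
WHERE rn = 1;

-- APPLY/TOP: for each distinct product, fetch its single newest row.
SELECT s.Product, s.SaleQuantity, s.SalePrice, s.SaleDate, s.SaleComments
FROM (SELECT DISTINCT Product FROM dbo.SalesDetail) AS p
CROSS APPLY
(
    SELECT TOP (1) d.Product, d.SaleQuantity, d.SalePrice,
                   d.SaleDate, d.SaleComments
    FROM dbo.SalesDetail AS d
    WHERE d.Product = p.Product
    ORDER BY d.SaleDate DESC
) AS s;
```

Note that the three forms only return the same rows when no product has two rows with the same latest SaleDate; with ties, NOT EXISTS returns every tied row while the other two return one.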

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

webatxcent Grasshopper Points: 12 More actions · Answer 11

There is another dimension to the "It Depends"--Parallelization.

Some time ago I challenged my staff to demonstrate their SQL knowledge by solving the problem of identifying gaps in a monotonic sequence of reference/key numbers, as typically found in an identity column. (I already had a solution in hand to address an auditing issue.)

So I asked:

Given a table Transaction with the field ID defined as integer identity( 1, 1 ) NOT NULL,

write a single SELECT statement that will return a result set containing the last number in a run of numbers, the first number of the next run, and the number of missing records between them.

If Transaction had records containing the following values for ID: 1, 2, 3, 4, 8, 9, 10, 19, 20, 21, 22

Your result set would be

4 8 3

10 19 8
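For anyone who wants to try this at home, a small setup script along these lines reproduces the sample data. The primary key is an assumption (the challenge only specified the identity column), the multi-row VALUES syntax needs SQL Server 2008 or later, and the table name is bracketed because TRANSACTION is a reserved word in T-SQL.

```sql
-- Sample table for the gaps challenge. The PRIMARY KEY is an assumption.
CREATE TABLE dbo.[Transaction]
(
    ID integer IDENTITY(1, 1) NOT NULL PRIMARY KEY
);

-- Explicit identity values are only allowed while IDENTITY_INSERT is ON.
SET IDENTITY_INSERT dbo.[Transaction] ON;

INSERT INTO dbo.[Transaction] (ID)
VALUES (1), (2), (3), (4), (8), (9), (10), (19), (20), (21), (22);

SET IDENTITY_INSERT dbo.[Transaction] OFF;
```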

What resulted surprised me. One submission did better on a single core and one submission did better on a multi-core CPU.

The multicore (8-way) winner was



-- Transaction is a reserved word in T-SQL, so the table name is bracketed.
SELECT 	t1.ID AS StartGap,
	( SELECT MIN( ID ) FROM [Transaction] WHERE ID > t1.ID ) AS EndGap,
	( SELECT MIN( ID ) FROM [Transaction] WHERE ID > t1.ID )
		- ( t1.ID + 1 ) AS GapSize
FROM 	[Transaction] t1
WHERE ( t1.ID + 1 ) != ( SELECT MIN( ID ) FROM [Transaction] WHERE ID > t1.ID )

The single core winner was



-- GapSize counts the missing IDs between two runs (4 and 8 gives 3).
SELECT 	StartGap,
	EndGap,
	EndGap - StartGap - 1 AS GapSize
FROM	( SELECT	t1.ID AS StartGap,
			(	SELECT MIN( CONVERT( int, t2.ID ) )
				FROM [Transaction] t2
				WHERE t2.ID > t1.ID
			) AS EndGap
	FROM	[Transaction] t1 ) tbl
WHERE	EndGap - StartGap > 1

While I don't have the results handy to post, what was even more disturbing was that the code that won on the single core ran faster there than it did on the 8-way. My take was that the optimizer goofed trying to parallelize it.
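One way to check a suspicion like that is to force a serial plan on the multi-core box and compare it with the default plan. The sketch below applies the MAXDOP query hint to the single-core winner. It assumes the table lives in the dbo schema, and it does not confirm that parallelism was the cause here.

```sql
-- Force a serial plan for this statement only, to compare against the
-- plan the optimizer picks by default on a multi-core server.
SELECT  StartGap,
        EndGap,
        EndGap - StartGap - 1 AS GapSize
FROM    ( SELECT t1.ID AS StartGap,
                 ( SELECT MIN( t2.ID )
                   FROM dbo.[Transaction] t2
                   WHERE t2.ID > t1.ID ) AS EndGap
          FROM dbo.[Transaction] t1 ) tbl
WHERE   EndGap - StartGap > 1
OPTION  ( MAXDOP 1 );
```

If the hinted version is consistently faster, that points toward the parallel plan being the more expensive one on that server.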

Bill.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 12

With the last couple of posts in mind, I hope everyone realizes that the big brother to "It Depends" is "Test it!". 😉
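A minimal way to do that in Management Studio is to turn on the I/O and timing statistics around each candidate and compare the figures on the Messages tab. The sketch below uses the NOT EXISTS query posted earlier in the thread as the candidate. Running it against production-sized data is the part that matters.

```sql
-- Report logical reads and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Candidate under test: swap in each competing query in turn.
SELECT a.Product, a.SaleQuantity, a.SalePrice, a.SaleDate, a.SaleComments
FROM dbo.SalesDetail a
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.SalesDetail b
    WHERE b.Product = a.Product
    AND b.SaleDate > a.SaleDate );

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```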

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Paul White SSC Guru Points: 150457 More actions · Answer 13

webatxcent (3/16/2010)

Some time ago I challenged my staff to demonstrate their SQL knowledge by solving the problem of identifying gaps in a monotonic sequence of reference/key numbers, as typically found in an identity column.

Itzik did an excellent series on 'gaps and islands' problems for SQL Server Magazine. If you have a subscription to that site, I would encourage you to read them. There are even better methods than the ones you posted...for some given value of 'better'.
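As one illustration of a different approach (not taken from that series, and only available from SQL Server 2012 onward, so later than this thread), the LEAD window function can pair each ID with the next one in a single pass. The sketch assumes the table from the challenge lives in the dbo schema.

```sql
-- Pair every ID with the next ID in order, then keep the pairs that
-- are more than one apart. Requires SQL Server 2012 or later for LEAD.
SELECT  ID AS StartGap,
        NextID AS EndGap,
        NextID - ID - 1 AS GapSize
FROM    ( SELECT ID,
                 LEAD( ID ) OVER ( ORDER BY ID ) AS NextID
          FROM dbo.[Transaction] ) AS t
WHERE   NextID - ID > 1;
```

The last row has no next ID, so NextID is NULL there and the WHERE clause filters it out.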

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

SQLRNNR SSC Guru Points: 281324 More actions · Answer 14

Jeff Moden (3/16/2010)

With the last couple of posts in mind, I hope everyone realizes that the big brother to "It Depends" is "Test it!". 😉

I think this needs a little more emphasis. TEST IT!

Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events

Paul White SSC Guru Points: 150457 More actions · Answer 15

March 16, 2010 at 10:50 pm

#1134987

TEST IT?

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi