CASE vs JOIN

  • I have a left join. My left table has over 1 billion rows. My right table has about 1,000. Not all of the values in my right table are used in this join; in fact, when the join is queried, only 201 distinct values from my right table are actually used.

    If I change this out to a CASE statement, will it improve performance? This is not something I'm actually planning to do; I'm just curious. Putting in 200+ different conditions is a bit tedious, and if the right table gets updated in the future, the change won't be reflected in my code, which makes it unreliable.

  • In short: no. You probably will not have significant performance gains if you use a "mutant" CASE statement.

    It's all about the megabytes the database engine needs to read to get your results. 1 logical read = 1 page = 8 KB.

    Run SET STATISTICS IO ON, and look at the "Messages" tab for the number of logical reads.
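
    As a minimal sketch of that workflow (the table and column names here are placeholders, not from the original question):

    -- Sketch only: object names (dbo.BigTable, dbo.SmallLookup) are placeholders.
    SET STATISTICS IO ON;

    SELECT bt.SomeColumn, sl.LookupValue
    FROM dbo.BigTable AS bt
    LEFT JOIN dbo.SmallLookup AS sl
        ON bt.LookupID = sl.LookupID;

    SET STATISTICS IO OFF;
    -- The Messages tab then shows one line per table, e.g.:
    -- Table 'BigTable'. Scan count 1, logical reads ..., physical reads ...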

    Look at the execution plan and the join type (nested loops, hash join, or merge join). Nested loops will probably give you the worst result here, because the small table will be read many times. You can change the join type via a hint.
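
    For example, a query-level hint forces a particular physical join (again, a sketch with placeholder object names):

    -- Sketch only: OPTION (HASH JOIN) forces a hash join for the whole query.
    SELECT bt.SomeColumn, sl.LookupValue
    FROM dbo.BigTable AS bt
    LEFT JOIN dbo.SmallLookup AS sl
        ON bt.LookupID = sl.LookupID
    OPTION (HASH JOIN);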

    Do not "think" and assume - measure, so you know the facts. It is really easy.

    The execution time is not so important because it depends on various caches (database cache, OS cache, storage system cache), but logical reads are a consistent indicator for comparing whether something is better or not, and they do not depend on the cache or the current server load.

    After measuring, you can do at least four things:

    1) create a covering index (multi-column keys + included columns) - see the sketch after this list

    2) think of a way to shorten your table by partitioning, or by separating the majority of the data into another table: does it have date-related values, a kind of history where only recent months are needed in most of the queries?

    3) if the data is relatively static, you can do the summation in advance, like materialized views or an indexed view, and combine that pre-computed summary with the more current data that has not been summed up yet

    4) if you use SQL 2008, a filtered index that is much shorter (has far fewer rows) than the table could do the trick - also sketched below
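
    A sketch of points 1) and 4), using made-up object names - adjust the keys, included columns, and filter to your actual query:

    -- Sketch only: placeholder table and column names.
    -- 1) Covering index: join/filter columns as keys, output columns as INCLUDEs.
    CREATE INDEX IX_BigTable_LookupID
        ON dbo.BigTable (LookupID)
        INCLUDE (SomeColumn);

    -- 4) Filtered index (SQL 2008+): indexes only the rows most queries touch.
    CREATE INDEX IX_BigTable_Recent
        ON dbo.BigTable (LookupID)
        INCLUDE (SomeColumn)
        WHERE CreatedDate >= '20100101';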

    You did not provide any sample script. Without it, you can only get a general answer.

    _____________________________________________________
    Microsoft Certified Master: SQL Server 2008
    XDetails Addin - for SQL Developers
    blog.sqlxdetails.com - Transaction log myths
  • Thank you, you've been very helpful.

  • Vedran Kesegic (1/11/2011)


    In short: no. You probably will not have significant performance gains if you use a "mutant" CASE statement.

    I'll agree with this, but add a few things.

    Regarding joins, check out the following MSDN links to figure out what they're doing and why, at the 10,000-foot view:

    Joins in general: http://msdn.microsoft.com/en-us/library/ms191517.aspx

    Merge Join: http://msdn.microsoft.com/en-us/library/ms190967.aspx

    Loop Join: http://msdn.microsoft.com/en-us/library/ms191318.aspx

    Hash Join: http://msdn.microsoft.com/en-us/library/ms189313.aspx

    Of specific interest from the Loop Join:

    A nested loops join is particularly effective if the outer input is small and the inner input is preindexed and large. In many small transactions, such as those affecting only a small set of rows, index nested loops joins are superior to both merge joins and hash joins. In large queries, however, nested loops joins are often not the optimal choice.

    Vedran Kesegic (1/11/2011)


    Do not "think" and assume - measure, so you know the facts. It is really easy. The execution time is not so important because it depends on various caches (database cache, OS cache, storage system cache), but logical reads are a consistent indicator for comparing whether something is better or not, and they do not depend on the cache or the current server load.

    And yet, with a little DBCC FREEPROCCACHE magic, along with a few other commands, you can benchmark quite effectively, especially on quieter machines. We code for accuracy; we optimize for speed. Logical reads do not take into account the CPU needed for calculations and the like. Like anything else, they're a handy indicator that you're going in the right direction.
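
    The kind of cache-clearing alluded to here, sketched out - run it on a test server only, since it flushes caches for the whole instance:

    -- Test servers only: these commands flush caches instance-wide.
    CHECKPOINT;            -- write dirty pages so the buffer pool can be fully emptied
    DBCC DROPCLEANBUFFERS; -- clear the data cache (buffer pool)
    DBCC FREEPROCCACHE;    -- clear the plan cache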

    Why do I point this out? The CASE statement will do fewer reads. 😀 No data pages are needed for an in-memory logical conclusion calculated on a per-row basis.

    Here's a little test I did, mostly out of curiosity, because I was pretty sure of the answer, but not completely:

    Sample Data: (If you need a Tally table, check my link in the sig)

    CREATE TABLE BigTable
        (btID INT IDENTITY( 1, 1) NOT NULL,
        btVal VARCHAR(150) NULL,
        fkID INT,
        CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED ( btID)
        )

    INSERT INTO BigTable
        (btVal, fkID)
    SELECT
        CONVERT( VARCHAR(50), NEWID()) AS btVal,
        t.N%40 AS fkID
    FROM
        tempdb..Tally AS t,
        tempdb..Tally AS t2
    WHERE
        t.N <= 25000
        AND t2.N <= 150

    CREATE TABLE LittleLookup
        (lkID INT IDENTITY( 1,1) NOT NULL,
        lkVal VARCHAR(50),
        CONSTRAINT PK_LittleLookup PRIMARY KEY CLUSTERED (lkID)
        )

    INSERT INTO LittleLookup
        (lkVal)
    SELECT
        CONVERT( VARCHAR(50), NEWID()) AS lkVal
    FROM
        tempdb..Tally AS t
    WHERE
        t.N <= 32

    Now, test code:

    The first statement in the test code generated the WHEN ... THEN clauses for the CASE statement in the second query. Yours will be different from mine, but it'll be close enough.

    SELECT
        'WHEN ' + CONVERT( VARCHAR(10), lkID) + ' THEN ''' + lkVal + ''''
    FROM
        LittleLookup

    SET STATISTICS IO, TIME ON

    SELECT
        btID,
        btVal,
        lkVal
    FROM
        BigTable AS bt
        LEFT JOIN LittleLookup AS lk
            ON bt.fkID = lk.lkID

    SELECT
        btID,
        btVal,
        CASE fkID
            WHEN 1 THEN '0A6D87EA-852A-47D2-8BF9-8DA3B93ABC61'
            WHEN 2 THEN '365634F3-D47D-4B2F-BA4D-3181BA3B0C3F'
            WHEN 3 THEN 'DB672700-9BAE-4F14-8E2F-3A4B0056B11F'
            WHEN 4 THEN '61FDEC0A-CD47-43DA-B6CD-92286DACE812'
            WHEN 5 THEN '0D1E4BC4-1FE9-4D75-B421-E656DF48A2D8'
            WHEN 6 THEN 'DAA51EEF-736E-4CF7-A883-1BA81160EE6F'
            WHEN 7 THEN '8A1FCC8D-9727-4E78-BE11-44817B17055C'
            WHEN 8 THEN 'FC8091FF-FF42-4C5B-B10B-FE937B468D01'
            WHEN 9 THEN 'D47B4CAC-7DE8-4D87-9F9E-99D37C4AD9FB'
            WHEN 10 THEN '40935839-8A03-4161-9B01-13A20C8BFB1B'
            WHEN 11 THEN '6E5D65BC-A198-49A7-BB81-5CDD2CFFD734'
            WHEN 12 THEN '19862EA1-B966-4E2F-BBA0-D6C71F51A8C5'
            WHEN 13 THEN 'A1F44E8A-0C3A-49EE-9BDC-57A3EC9F5CC7'
            WHEN 14 THEN '0CAFDF3B-5EEA-42E3-9A4B-A0F92CB86DA5'
            WHEN 15 THEN 'B585597A-A48F-4903-9C0F-425145B9560F'
            WHEN 16 THEN '82FDBF00-DDF4-4F98-9F92-D7129AE750BA'
            WHEN 17 THEN '15D2A8F0-3A13-4B29-ADBA-B894E6B6F829'
            WHEN 18 THEN '72B9A4CD-8640-4980-A1D3-E05C85FC2859'
            WHEN 19 THEN '1B80401A-E498-4C7A-ACF4-0D580F030140'
            WHEN 20 THEN '3928EE94-A63B-4ABA-80FD-C53A97D46130'
            WHEN 21 THEN '060F9282-18F7-4324-8050-A320F48AA030'
            WHEN 22 THEN '02559DC1-04CB-4D04-BA9D-83D26AB7021B'
            WHEN 23 THEN '970BE449-BEFC-4E73-B935-2B7D8F939771'
            WHEN 24 THEN 'C629785F-C8BF-42CB-BBAE-8C16F4C4D32F'
            WHEN 25 THEN 'ADF929C5-0108-480C-89DD-7836249A4E59'
            WHEN 26 THEN '8765C91A-682F-4E6B-8DC2-07438F1F6A78'
            WHEN 27 THEN 'B99C9F54-2FA2-48C3-B783-2BE4F838A279'
            WHEN 28 THEN '2331CD14-38BF-4EA1-9082-06C34DEC3279'
            WHEN 29 THEN '5D4DC2D3-D544-46CF-A6E5-EF03A9321037'
            WHEN 30 THEN 'A2A946DA-FD13-42A0-9D4B-3B8D0DD7C3D5'
            WHEN 31 THEN '95959DE1-FB1E-42AF-95E6-16881B5C0975'
            WHEN 32 THEN '229224BC-6A2A-4DFA-8503-B92EED312893'
            ELSE NULL END
    FROM
        BigTable

    SET STATISTICS IO, TIME OFF

    From the first query:

    Table 'LittleLookup'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'BigTable'. Scan count 3, logical reads 29090, physical reads 0, read-ahead reads 4, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 4187 ms, elapsed time = 429465 ms.

    The execution plan is attached as BigTable_LeftLookup; it does a Hash Join for me and uses some parallelism. I altered it to MAXDOP 1 out of curiosity and got a cleaner run:

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'BigTable'. Scan count 1, logical reads 26508, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'LittleLookup'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    (1 row(s) affected)

    SQL Server Execution Times:

    CPU time = 2579 ms, elapsed time = 423347 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Attached that SQLPlan as the _MAXDOP1 version.

    From the second one:

    Table 'BigTable'. Scan count 1, logical reads 26508, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1687 ms, elapsed time = 421770 ms.

    Plan attached as BigTableOnly.

    This saved a few seconds, took 1/4 of the CPU, didn't need a worktable, and generally ran cleaner.

    I would directly test this with your data. I would also look into indexing better than I did, and into what your WHERE clauses do to the two possibilities. There is a chance this will run better for you with the CASE statement - a maintenance nightmare, but possible. You'll have to make that decision based on how transient these values are in the lookup table.
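
    As one hedged example of "indexing better than I did" against the sample schema above (not something I tested here, and it only helps if the optimizer picks it over the clustered index scan):

    -- Sketch only: a narrower covering index for the join/CASE column.
    CREATE INDEX IX_BigTable_fkID
        ON BigTable (fkID)
        INCLUDE (btVal);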


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA
