Metrics Table or Performance Tuning

ChrisM@Work SSC Guru Points: 186127

ScottPletcher (8/21/2015)

ChrisM@Work (8/21/2015)

The inner select can be accelerated with this index:

CREATE INDEX ix_Helper ON [cts].[exception_Main]
	(productArea, reportable, reportYear, reportMonth, queueID)
	INCLUDE (volume, cost, exceptionDateTime);

But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time (possibly even better, depending on the details) from a clustered index on exceptionDateTime. That's the huge advantage of the best clustered index: you don't have to create gazillions of covering indexes for so many queries.

We now have a test harness, Scott, if you'd like to put the theory to work. I'm out of time today for something of this scale, but I'll have time tomorrow.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ScottPletcher SSC Guru Points: 100262

ChrisM@Work (8/21/2015)

But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one or two queries. A clustered index is, by definition, a covering index for all queries, and its buffer space is shared among all users of that data. There are usually many people reading the most recent data.

Edit: Can you get better performance for a given query by building a custom table (which is what the proposed index effectively is) specifically for that query? Sure. But is it really worth rewriting half the table on every table modification just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here, with vastly better overall performance (orders of magnitude in some cases), by first identifying and creating the best clustered index on each table. The vast majority of those tables had the clustered index on an identity column, because of the horribly mistaken and misguided belief that identity is some kind of "default" clustered index key for any table.
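One way to put numbers on the maintenance cost described above is to compare how often each nonclustered index is read with how often it has to be updated. This is an illustrative sketch, not the poster's method; it assumes SQL Server's sys.indexes catalog view and the sys.dm_db_index_usage_stats DMV. Those counters reset when the instance restarts, so after a short uptime a useful index can look idle.

```sql
-- Sketch: read activity versus write maintenance for every nonclustered
-- index in the current database. Counters reset on instance restart,
-- so treat the figures as indicative only.
SELECT
	table_name = OBJECT_NAME(i.object_id),
	index_name = i.name,
	reads      = ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0),
	writes     = ISNULL(s.user_updates, 0)
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
	ON  s.object_id   = i.object_id
	AND s.index_id    = i.index_id
	AND s.database_id = DB_ID()            -- the DMV spans all databases
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
	AND i.type_desc = 'NONCLUSTERED'       -- skip heaps and clustered indexes
ORDER BY writes DESC, reads ASC;           -- heavily written, rarely read first
```

Indexes near the top of this list, with many writes and few reads, are the ones paying the insert/update/delete cost without earning it back.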

SQL DBA, SQL Server MVP (07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

ChrisM@Work SSC Guru Points: 186127

ScottPletcher (8/21/2015)

The clustered index on Nick's table wasn't the best choice since it was VARCHAR(400), so I ran a few tests to see how a surrogate key might fare against the suggestion of using the exceptionDateTime column. First I set up a test table containing little more than the columns used by the query, and with the same number of rows, about 14.5 million. I also set up the smaller table. Then I played about with indexing. Here's the code for the sample data:

-- set up sample data
IF OBJECT_ID('tempdb..#Exception_main') IS NOT NULL DROP TABLE #Exception_main;

SELECT
	ID, -- 8 bytes
	[exceptionID] = CAST(REPLICATE(CAST(NEWID() AS VARCHAR(36))+' ',7) AS VARCHAR(400)),
	reportYear = YEAR(ReportDate),
	reportMonth = MONTH(ReportDate),
	reportable = CAST(CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'Y' ELSE 'N' END AS CHAR(1)),
	QueueID = CAST(ABS(CHECKSUM(NEWID()))%420 AS VARCHAR(256)),
	exceptionDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID()))%3,ReportDate), -- 8 bytes
	productArea = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'FXMM' ELSE 'N' END,
	volume = ABS(CHECKSUM(NEWID()))%20,
	cost = ABS(CHECKSUM(NEWID()))%30
INTO #Exception_main
FROM (
	SELECT ID, ReportDate = DATEADD(MINUTE, 0-ID/20,GETDATE())
	FROM (
		SELECT TOP(14500000) -- 00:04:02 / 14 000 000
			ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
		FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c, SYS.COLUMNS d, SYS.COLUMNS e
	) d
) e;

IF OBJECT_ID('tempdb..#Map_Exception') IS NOT NULL DROP TABLE #Map_Exception;

SELECT TOP(420)
	QueueID = CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS VARCHAR(40)),
	[Service] = ABS(CHECKSUM(NEWID()))%100
INTO #Map_Exception
FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c;

CREATE UNIQUE CLUSTERED INDEX ucx_Map_Exception ON #Map_Exception (QueueID);

This takes about four minutes to run on a steam-powered dev box.

Having built the data, I had a play with indexes. To be sure that the results weren't skewed by other processes, I ran through the whole lot four times.

Here's the query:

SET STATISTICS IO, TIME ON;

SELECT m.reportMonth,
	m.reportYear,
	ex.[service],
	SUM(m.Vol) AS Vol,
	SUM(m.Effort) AS effort
FROM (
	SELECT reportMonth,
		reportYear,
		queueID,
		SUM(volume) AS Vol,
		SUM(cost) AS Effort
	FROM #exception_Main -- 14.5M rows
	WHERE exceptionDateTime >= GETDATE() - 365
		AND productArea = 'FXMM'
		AND reportable = 'Y'
	GROUP BY
		reportYear,
		reportMonth,
		queueID
) m
LEFT JOIN #map_Exception ex -- 420 rows
	ON m.queueID = ex.queueID
GROUP BY
	m.reportYear,
	m.reportMonth,
	ex.[service];

SET STATISTICS IO, TIME OFF;

On this particular set the query returned 1,287 aggregated rows from 2,633,895 qualifying rows.

Here are the summarised results from the indexing tests:

--====================================================================================================
-- 1. Baseline
CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([exceptionID]);
CREATE NONCLUSTERED INDEX [idx_ctsTrend] ON #exception_Main
	([productArea] ASC, [reportable] ASC, [exceptionDateTime] ASC)
	INCLUDE ([queueID], [cost], [reportMonth], [reportYear], [volume]);
EXEC sp_spaceused '#exception_Main';
--Reserved   = 9,446,096 KB
--Data       = 4,640,032 KB
--Index_size = 4,805,352 KB
-- Best result from 6 runs: logical reads 131639, elapsed time = 656 ms.
-- Index seek, no residual predicate, hash matches for aggregates

--====================================================================================================
-- 2. Unique clustered index on surrogate key ID
DROP INDEX idx_ctsTrend ON #Exception_main;
DROP INDEX ucx_Sample ON #Exception_main;
CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([ID]);
EXEC sp_spaceused '#exception_Main';
--Reserved   = 4,652,128 KB
--Data       = 4,640,024 KB
--Index_size = 11,672 KB
-- Best result from 6 runs: logical reads 581450, elapsed time = 28288 ms.
-- Clustered index scan, hash matches for aggregates

--====================================================================================================
-- 2.1 Unique clustered index on surrogate key ID with supporting nonclustered index
CREATE INDEX ix_Helper ON #Exception_main
	(productArea, reportable, reportYear, reportMonth, queueID)
	INCLUDE (volume, cost, exceptionDateTime);
EXEC sp_spaceused '#exception_Main';
--Reserved   = 5,393,856 KB
--Data       = 4,640,024 KB
--Index_size = 753,208 KB
-- Best result from 6 runs: logical reads 24503, elapsed time = 332 ms.
-- Index seek (productarea, reportable), residual predicate for exceptionDateTime, Stream Aggregates

--====================================================================================================
-- 3.0 Clustered index on exceptionDateTime
DROP INDEX ix_Helper ON #Exception_main;
DROP INDEX ucx_Sample ON #Exception_main;
CREATE CLUSTERED INDEX ucx_Sample ON #Exception_main (exceptionDateTime);
EXEC sp_spaceused '#exception_Main';
--Reserved   = 4,656,544 KB
--Data       = 4,640,032 KB
--Index_size = 16,024 KB
-- Best result from 6 runs: logical reads 424515, physical reads 275, read-ahead reads 420270, elapsed time = 23276 ms.
-- Clustered index seek (exceptionDateTime) residual predicate for productArea and Reportable, hash matches for aggregates

--====================================================================================================
-- 3.1 Clustered index on exceptionDateTime & recommended non-clustered index
CREATE NONCLUSTERED INDEX ix_Recommended ON [dbo].[#Exception_main]
	([reportable],[productArea],[exceptionDateTime])
	INCLUDE ([reportYear],[reportMonth],[QueueID],[volume],[cost]);
EXEC sp_spaceused '#exception_Main';
--Reserved   = 4,656,544 KB
--Data       = 4,640,032 KB
--Index_size = 16,024 KB
-- Best result from 6 runs: logical reads 16916, elapsed time = 495 ms.
-- Clustered index seek, no residual predicate, hash matches for aggregates

--====================================================================================================
DROP INDEX ix_Recommended ON [dbo].[#Exception_main];
DROP INDEX ucx_Sample ON #Exception_main;
-- back where we started, whizz around for another go to ensure results aren't skewed by local activity

The clustered index on exceptionDateTime was only marginally faster than the surrogate key (23,276 ms against 28,288 ms) because so many rows still had to be filtered by the residual predicate on productArea and reportable. The ratio would certainly shift in favour of a cluster on exceptionDateTime with a smaller number of qualifying rows, but this dataset wasn't deliberately tipped in favour of a natural key; it's just a very rough approximation of a real-world situation.
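The size of that residual-predicate workload can be estimated from the harness parameters alone. This back-of-envelope sketch assumes the generator shown earlier: 14.5 million rows, ReportDate stepping back one minute every 20 rows, a random 0 to 2 day shift for exceptionDateTime, and independent 50/50 splits for productArea and reportable. The observed row count and the logical reads are copied from the results above, not re-measured.

```sql
-- Back-of-envelope selectivity estimate for the sample data (assumptions as stated).
DECLARE @total_rows   bigint = 14500000;
DECLARE @rows_per_day bigint = 20 * 1440;   -- 20 rows per minute of ReportDate
-- Rows inside the 365-day window, plus about one extra day's worth pulled across
-- the boundary by the random +0..2 day shift (2/3 of the day before the window
-- plus 1/3 of the day before that).
DECLARE @date_rows    bigint = 365 * @rows_per_day + @rows_per_day;
-- productArea = 'FXMM' and reportable = 'Y' each keep roughly half of those rows.
DECLARE @expected     bigint = @date_rows / 4;

SELECT
	date_rows         = @date_rows,
	date_pct_of_table = CAST(100.0 * @date_rows / @total_rows AS decimal(5,1)),
	expected_rows     = @expected,
	observed_rows     = 2633895,                                        -- reported above
	seek_vs_scan_pct  = CAST(100.0 * 424515 / 581450 AS decimal(5,1));  -- test 3.0 vs test 2 logical reads
```

On these assumptions the date predicate alone keeps about 72.7% of the table (10,540,800 rows), and the estimate of 2,635,200 qualifying rows sits close to the observed 2,633,895. That lines up with the clustered index seek in test 3.0 doing about 73% of the logical reads of the full scan in test 2.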

ScottPletcher SSC Guru Points: 100262

ChrisM@Work (8/25/2015)

That seems to prove my point: the best clustered index eliminated the need for an extra nonclustered index, with better overall performance. And since datetime is used as a filter in (almost) every query, almost every query will perform better overall and more consistently.

Moreover, there's no dreaded "tipping point" when using the clustered index. Lastly, when you add a column to the query, such as including the customer in the grouping as the OP's original query did, there's no need to rebuild or refactor a covering index: no constant reshuffling of nonclustered indexes and, inevitably, no constant growth in their size.

Don't get me wrong. Some covering indexes will almost certainly still be needed, but far fewer of them.
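To make the point about adding a grouping column concrete: if the report later also grouped by customer, a covering index like ix_Helper would have to be redefined. This sketch is illustrative only. It uses a small, empty stand-in table (#demo) and a hypothetical customerID column, neither of which comes from the thread. CREATE INDEX ... WITH (DROP_EXISTING = ON) redefines the index in one statement, but on a 14.5 million row table it still rebuilds the whole index.

```sql
-- Hypothetical stand-in table with the columns ix_Helper uses, plus customerID.
IF OBJECT_ID('tempdb..#demo') IS NOT NULL DROP TABLE #demo;
CREATE TABLE #demo (
	productArea       varchar(10),
	reportable        char(1),
	reportYear        int,
	reportMonth       int,
	queueID           varchar(256),
	customerID        int,            -- hypothetical new grouping column
	volume            int,
	cost              int,
	exceptionDateTime datetime
);

-- The covering index as originally proposed.
CREATE INDEX ix_Helper ON #demo
	(productArea, reportable, reportYear, reportMonth, queueID)
	INCLUDE (volume, cost, exceptionDateTime);

-- The query now also groups by customerID, so the index must be redefined;
-- DROP_EXISTING rebuilds it under the same name in a single statement.
CREATE INDEX ix_Helper ON #demo
	(productArea, reportable, reportYear, reportMonth, queueID, customerID)
	INCLUDE (volume, cost, exceptionDateTime)
	WITH (DROP_EXISTING = ON);
```

A clustered index, by contrast, already carries every column, so the same query change needs no index work.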

ChrisM@Work SSC Guru Points: 186127

ScottPletcher (8/25/2015)

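How is 23 seconds better than 300 milliseconds? The clustered index on your chosen column (test 3.0, 23,276 ms) was only marginally better than a clustered surrogate key on ID (test 2), which returned in 28 seconds, and far slower than the surrogate key plus ix_Helper (test 2.1, 332 ms).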
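To put the five configurations side by side, this sketch loads the best-of-six figures reported in the test script into a table variable and expresses each elapsed time as a multiple of test 2.1 (surrogate clustered key plus ix_Helper, 332 ms). The figures are copied from the thread; nothing is re-measured, and the short labels are only descriptive.

```sql
DECLARE @results TABLE (
	test          varchar(4),
	indexing      varchar(60),
	logical_reads int,
	elapsed_ms    int
);

-- Best result from 6 runs, as reported in the test script.
INSERT INTO @results (test, indexing, logical_reads, elapsed_ms) VALUES
	('1',   'clustered exceptionID + idx_ctsTrend',          131639,   656),
	('2',   'clustered ID only',                             581450, 28288),
	('2.1', 'clustered ID + ix_Helper',                       24503,   332),
	('3.0', 'clustered exceptionDateTime only',              424515, 23276),
	('3.1', 'clustered exceptionDateTime + ix_Recommended',   16916,   495);

SELECT
	test,
	indexing,
	logical_reads,
	elapsed_ms,
	x_test_2_1 = CAST(elapsed_ms / 332.0 AS decimal(6,1))   -- multiple of test 2.1's elapsed time
FROM @results
ORDER BY elapsed_ms;
```

On these figures test 3.0 takes about 70 times as long as test 2.1 and test 2 about 85 times, while test 3.1 stays within about 1.5 times of it.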

ScottPletcher SSC Guru Points: 100262 · Answer 6

ChrisM@Work (8/25/2015)


How is 23 seconds better than 300 milliseconds? The clustered index on your chosen column was only marginally better than a surrogate key (ID), which returned in 28 seconds.

I was busy and probably didn't read closely enough. I've tuned tens of thousands of tables, and well over half the time there is a better clustered index than one on an identity column. It gives better overall performance while letting me delete thousands of nonclustered indexes. The idea that there should be a "default" clustering index of identity is just false, period.
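For anyone wanting to try the same kind of clean-up, one common starting point (a sketch, not necessarily how Scott does it) is sys.dm_db_index_usage_stats: nonclustered indexes that are written far more often than they are read are candidates for review. The counters reset when the instance restarts, so the numbers only mean something after a representative period of activity. Run it in the database being tuned.

```sql
-- Sketch: nonclustered indexes in the current database, ranked by how much
-- more they are written than read. Indexes with no usage row since the last
-- restart show up with zero reads and zero writes (never used).
SELECT  TableName = OBJECT_NAME(i.object_id),
        IndexName = i.name,
        v.Reads,
        v.Writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON  s.database_id = DB_ID()
       AND s.object_id   = i.object_id
       AND s.index_id    = i.index_id
CROSS APPLY (VALUES (
        ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0),
        ISNULL(s.user_updates, 0)
     )) AS v (Reads, Writes)
WHERE i.type_desc = 'NONCLUSTERED'
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY v.Writes - v.Reads DESC;
```

The CROSS APPLY is there because SQL Server won't accept column aliases inside an ORDER BY expression; computing Reads and Writes once in the applied row makes them usable in both the select list and the sort.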

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

ChrisM@Work SSC Guru Points: 186127 · Answer 7

ScottPletcher (8/25/2015)


Busy, probably didn't read closely enough. I've tuned tens of thousands of tables and well over half the time there is a better clustered index than one on an identity column. It gives better overall performance while deleting thousands of nonclus indexes. The idea that there should be a "default" clustering index of identity is just false, period.

"The idea that there should be a "default" clustering index of identity is just false, period." Yes, agreed. But as a very famous guy who lurks around here is fond of saying "It depends", and in this particular case, purely by accident, the identity column fares quite well. Not only that, but the clustered index of your choice is next to useless without a supporting non-clustered index. Now here's something else to think about. The best choice of clustered index, if it's to be a natural key, won't be known until the database has been live for long enough to pick up decent usage stats.

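The usage stats Chris mentions can be gathered from the missing-index DMVs, which accumulate suggestions as the workload runs. A sketch of one heuristic: if most suggestions for a table share the same inequality column (a date-range filter such as exceptionDateTime would appear in inequality_columns), that column is worth considering as the clustering key. Treat it as a heuristic only: the DMVs are cleared on restart and say nothing about what an index would cost to maintain.

```sql
-- Sketch: missing-index suggestions for the current database, grouped by table,
-- most "valuable" first (seeks that could have used the index x estimated impact).
SELECT  TableName = d.[statement],
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        gs.user_seeks,
        gs.avg_user_impact
FROM sys.dm_db_missing_index_details     AS d
JOIN sys.dm_db_missing_index_groups      AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs
  ON gs.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY d.[statement], gs.user_seeks * gs.avg_user_impact DESC;
```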

ScottPletcher SSC Guru Points: 100262 · Answer 8

ChrisM@Work (8/25/2015)


"The idea that there should be a "default" clustering index of identity is just false, period." Yes, agreed. But as a very famous guy who lurks around here is fond of saying "It depends", and in this particular case, purely by accident, the identity column fares quite well. Not only that, but the clustered index of your choice is next to useless without a supporting non-clustered index. Now here's something else to think about. The best choice of clustered index, if it's to be a natural key, won't be known until the database has been live for long enough to pick up decent usage stats.

It's not at all next to useless. Typically you don't use an entire year's worth of data at once. You're also not including the overhead of maintaining custom indexes for every query. Again, yes, a custom table built and maintained for just that specific query will almost always outperform the general table, but that's not the whole picture.
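Scott's point about not reading a whole year at once can be tried on the harness by narrowing the window. A sketch, assuming ucx_Sample is clustered on exceptionDateTime as in test 3.0; the 30-day window and the @From variable are arbitrary choices for illustration. At the generator's roughly 20 rows per minute, that puts somewhere under a million rows in the seek range instead of about 10.5 million, so the STATISTICS IO output can be compared directly with the 424,515 logical reads from test 3.0.

```sql
-- Assumes #Exception_main and #Map_Exception from the harness, with ucx_Sample
-- clustered on exceptionDateTime (test 3.0). The 30-day window is arbitrary.
DECLARE @From DATETIME = DATEADD(DAY, -30, GETDATE());

SET STATISTICS IO, TIME ON;

SELECT  m.reportMonth, m.reportYear, ex.[Service],
        SUM(m.Vol) AS Vol, SUM(m.Effort) AS Effort
FROM (
        SELECT  reportMonth, reportYear, QueueID,
                SUM(volume) AS Vol, SUM(cost) AS Effort
        FROM #Exception_main
        WHERE exceptionDateTime >= @From   -- range seek on the clustered key
          AND productArea = 'FXMM'         -- residual predicate
          AND reportable  = 'Y'            -- residual predicate
        GROUP BY reportYear, reportMonth, QueueID
     ) AS m
LEFT JOIN #Map_Exception AS ex
       ON ex.QueueID = m.QueueID
GROUP BY m.reportYear, m.reportMonth, ex.[Service]
OPTION (RECOMPILE);  -- lets the optimizer use the actual value of @From

SET STATISTICS IO, TIME OFF;
```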

You also didn't replicate the actual clustered index the requester created. Given their knowledge of the data, they included additional columns in the index that you ignored. As I noted above, only they would know whether to add columns beyond the datetime. But in every real-life case I've seen, datetime will be a vastly better clustered index for this table than identity, and you will save yourself duplicating the entire table -- or more -- in added nonclus indexes.
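On the point that the best natural clustering key only becomes clear once the system has live usage stats, here is a minimal sketch of where one might look in SQL Server. It assumes the table is [cts].[exception_Main] as in the OP's DDL and is run in that table's database. The index usage DMV shows how often each existing index is read versus maintained; the missing-index DMVs show which column combinations the optimizer kept asking for. Both sets of counters are reset when the instance restarts, so they only cover activity since then. A date column that keeps turning up under inequality_columns would support clustering on it, but treat this as a starting point rather than a verdict.

```sql
-- Assumed table: cts.exception_Main (from the OP's DDL). Run in that table's database.

-- 1) How often is each existing index read (seeks/scans/lookups) versus maintained (updates)?
--    A NULL row from the DMV means the index hasn't been touched since the last restart.
SELECT
    i.index_id,
    i.name          AS index_name,
    i.type_desc     AS index_type,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON  us.database_id = DB_ID()
    AND us.object_id   = i.object_id
    AND us.index_id    = i.index_id
WHERE i.object_id = OBJECT_ID(N'cts.exception_Main')
ORDER BY i.index_id;

-- 2) Which key columns has the optimizer wanted on this table but not found?
SELECT
    mid.equality_columns,
    mid.inequality_columns,        -- range predicates such as exceptionDateTime >= ...
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact           -- estimated % benefit if such an index existed
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()
  AND mid.object_id   = OBJECT_ID(N'cts.exception_Main')
ORDER BY migs.user_seeks DESC;
```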

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

ChrisM@Work SSC Guru Points: 186127 · Answer 9

ScottPletcher (8/25/2015)

ChrisM@Work (8/25/2015)

ScottPletcher (8/25/2015)

ChrisM@Work (8/25/2015)

ScottPletcher (8/25/2015)

ChrisM@Work (8/25/2015)

ScottPletcher (8/21/2015)

ChrisM@Work (8/21/2015)

ScottPletcher (8/21/2015)

ChrisM@Work (8/21/2015)

The inner select can be accelerated with this index:

CREATE INDEX ix_Helper ON [cts].[exception_Main]
(productArea, reportable, reportYear, reportMonth, queueID)
INCLUDE (volume, cost, exceptionDateTime)

But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depending on the details -- from a clustered index on exceptionDateTime. That's the huge advantage of the best clustered index: you don't have to create gazillions of covering indexes for so many queries.

We now have a test harness, Scott, if you'd like to put the theory to work. I'm out of time today for something of this scale, but I'll have time tomorrow.

But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one or two queries. A clustered index is by definition a covering index for all queries, and its buffer space is shared among all users of that data. There are usually many people reading the most recent data.

Edit: Can you get better performance for a given query by building a custom table -- which is what the proposed index effectively is -- specifically for that query? Sure. But is it really worth rewriting half the table on every table modification just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here, with vastly better performance overall (orders of magnitude in some cases), by first identifying and creating the best clustered index on tables. The vast majority of those tables had the clustered index on identity, because of the horribly mistaken and misguided belief that identity is some type of "default" clustered index for any table.
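The storage side of that cost can be measured rather than argued. sp_spaceused lumps every index into a single index_size figure; the sketch below breaks it down per index using sys.dm_db_partition_stats. It assumes the table is cts.exception_Main; pages are 8 KB, so page counts are multiplied by 8 to give KB. It measures storage only; the insert/update/delete overhead of each extra index would need a separate write-heavy test.

```sql
-- Storage used by each index on the table (assumed: cts.exception_Main from the OP's DDL).
-- index_id 0 = heap, 1 = clustered index (i.e. the table itself), > 1 = nonclustered indexes.
-- For a temp table, the same query should work against tempdb.sys.dm_db_partition_stats and
-- tempdb.sys.indexes with OBJECT_ID(N'tempdb..#Exception_main').
SELECT
    i.index_id,
    i.name                           AS index_name,
    i.type_desc                      AS index_type,
    SUM(ps.row_count)                AS approx_row_count,
    SUM(ps.used_page_count) * 8      AS used_kb,      -- 8 KB per page
    SUM(ps.reserved_page_count) * 8  AS reserved_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'cts.exception_Main')
GROUP BY i.index_id, i.name, i.type_desc
ORDER BY i.index_id;
```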

The clustered index on Nick's table wasn't the best choice since it was VARCHAR(400), so I ran a few tests to see how a surrogate key might fare against the suggestion of using the exceptionDateTime column. First I set up a test table containing little more than the columns used by the query, with the same number of rows (about 14.5 million). I also set up the smaller mapping table. Then I played about with indexing. Here's the code for the sample data:

-- set up sample data
IF OBJECT_ID('tempdb..#Exception_main') IS NOT NULL DROP TABLE #Exception_main

SELECT
    ID, -- 8 bytes
    [exceptionID] = CAST(REPLICATE(CAST(NEWID() AS VARCHAR(36))+' ',7) AS VARCHAR(400)),
    reportYear = YEAR(ReportDate),
    reportMonth = MONTH(ReportDate),
    reportable = CAST(CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'Y' ELSE 'N' END AS CHAR(1)),
    QueueID = CAST(ABS(CHECKSUM(NEWID()))%420 AS VARCHAR(256)),
    exceptionDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID()))%3,ReportDate), -- 8 bytes
    productArea = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'FXMM' ELSE 'N' END,
    volume = ABS(CHECKSUM(NEWID()))%20,
    cost = ABS(CHECKSUM(NEWID()))%30
INTO #Exception_main
FROM (
    SELECT ID, ReportDate = DATEADD(MINUTE, 0-ID/20,GETDATE())
    FROM (
        SELECT TOP(14500000) -- 00:04:02 / 14 000 000
            ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
        FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c, SYS.COLUMNS d, SYS.COLUMNS e
    ) d
) e

IF OBJECT_ID('tempdb..#Map_Exception') IS NOT NULL DROP TABLE #Map_Exception

SELECT TOP(420)
    QueueID = CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS VARCHAR(40)),
    [Service] = ABS(CHECKSUM(NEWID()))%100
INTO #Map_Exception
FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c

CREATE UNIQUE CLUSTERED INDEX ucx_Map_Exception ON #Map_Exception (QueueID)

This takes about four minutes to run on a steam-powered dev box.

Having built the data, I had a play with indexes. To be sure that the results weren't skewed by other processes, I ran through the whole lot four times.

Here's the query:

SET STATISTICS IO, TIME ON

SELECT
    m.reportMonth, m.reportYear, ex.[service],
    SUM(m.Vol) AS Vol,
    SUM(m.Effort) AS effort
FROM (
    SELECT reportMonth, reportYear, queueID,
        SUM(volume) AS Vol,
        SUM(cost) AS Effort
    FROM #exception_Main -- 14.5M rows
    WHERE exceptionDateTime >= GETDATE() - 365
        AND productArea = ('FXMM')
        AND reportable = 'Y'
    GROUP BY reportYear, reportMonth, queueID
) m
LEFT JOIN #map_Exception ex -- 420 rows
    ON m.queueID = ex.queueID
GROUP BY m.reportYear, m.reportMonth, ex.[service]

SET STATISTICS IO, TIME OFF

On this particular set the query returned 1,287 aggregated rows from 2,633,895 qualifying rows.

Here are the summarised results from the indexing tests:

--====================================================================================================
-- 1. Baseline
CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([exceptionID])
CREATE NONCLUSTERED INDEX [idx_ctsTrend] ON #exception_Main ([productArea] ASC, [reportable] ASC, [exceptionDateTime] ASC)
    INCLUDE ([queueID], [cost], [reportMonth], [reportYear], [volume])
EXEC sp_spaceused '#exception_Main'
--Reserved = 9,446,096 KB
--Data = 4,640,032 KB
--Index_size= 4,805,352 KB
-- Best result from 6 runs: logical reads 131639, elapsed time = 656 ms.
-- Index seek, no residual predicate, hash matches for aggregates
--====================================================================================================
-- 2. Unique clustered index on surrogate key ID
DROP INDEX idx_ctsTrend ON #Exception_main
DROP INDEX ucx_Sample ON #Exception_main
CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([ID])
EXEC sp_spaceused '#exception_Main'
--Reserved = 4,652,128 KB
--Data = 4,640,024 KB
--Index_size= 11,672 KB
-- Best result from 6 runs: logical reads 581450, elapsed time = 28288 ms.
-- Clustered index scan, hash matches for aggregates
--====================================================================================================
-- 2.1 Unique clustered index on surrogate key ID with supporting nonclustered index
CREATE INDEX ix_Helper ON #Exception_main (productArea, reportable, reportYear, reportMonth, queueID)
    INCLUDE (volume, cost, exceptionDateTime);
EXEC sp_spaceused '#exception_Main'
--Reserved = 5,393,856 KB
--Data = 4,640,024 KB
--Index_size= 753,208 KB
-- Best result from 6 runs: logical reads 24503, elapsed time = 332 ms.
-- Index seek (productarea, reportable), residual predicate for exceptionDateTime, Stream Aggregates
--====================================================================================================
-- 3.0 Clustered index on exceptionDateTime
DROP INDEX ix_Helper ON #Exception_main
DROP INDEX ucx_Sample ON #Exception_main
CREATE CLUSTERED INDEX ucx_Sample ON #Exception_main (exceptionDateTime)
EXEC sp_spaceused '#exception_Main'
--Reserved = 4,656,544 KB
--Data = 4,640,032 KB
--Index_size= 16,024 KB
-- Best result from 6 runs: logical reads 424515, physical reads 275, read-ahead reads 420270, elapsed time = 23276 ms.
-- Clustered index seek (exceptionDateTime) residual predicate for productArea and Reportable, hash matches for aggregates
--====================================================================================================
-- 3.1 Clustered index on exceptionDateTime & recommended non-clustered index
CREATE NONCLUSTERED INDEX ix_Recommended ON [dbo].[#Exception_main] ([reportable],[productArea],[exceptionDateTime])
    INCLUDE ([reportYear],[reportMonth],[QueueID],[volume],[cost])
EXEC sp_spaceused '#exception_Main'
--Reserved = 4,656,544 KB
--Data = 4,640,032 KB
--Index_size= 16,024 KB
-- Best result from 6 runs: logical reads 16916, elapsed time = 495 ms.
-- Clustered index seek, no residual predicate, hash matches for aggregates
--====================================================================================================
DROP INDEX ix_Recommended ON [dbo].[#Exception_main]
DROP INDEX ucx_Sample ON #Exception_main
-- back where we started, whizz around for another go to ensure results aren't skewed by local activity
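For scale, the figures above can be set side by side. This is plain arithmetic on the reported numbers and covers storage and read time only, not the insert/update/delete overhead Scott raises, which these runs didn't measure. (Run 3.1's sp_spaceused output repeats run 3.0's figures, so the size of ix_Recommended can't be read from it.)

$$
\begin{aligned}
\text{ix\_Helper size (2.1 vs 2)} &\approx 753{,}208 - 11{,}672 = 741{,}536\ \text{KB} \approx 16\%\ \text{of the } 4{,}640{,}024\ \text{KB of data} \\
\text{elapsed, 3.0 vs 2.1} &= 23{,}276 / 332 \approx 70\times\ \text{slower} \\
\text{elapsed, 2 vs 3.0} &= 28{,}288 / 23{,}276 \approx 1.22\times\ \text{slower}
\end{aligned}
$$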

The clustered index on exceptionDateTime was only marginally faster than the surrogate key because so many rows had to be filtered by the residual predicate. For sure the ratio would change in favour of a cluster on exceptionDateTime with a smaller number of qualifying rows, but this dataset wasn't deliberately tipped in favour of a natural key, it's just a very rough approximation of a real world situation.
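The generator itself shows why so many rows reach the residual predicate. It assigns 20 rows per minute counting back from GETDATE() (ID/20 is integer division), and productArea and reportable are each a roughly 50/50 random split. Ignoring the random 0 to 2 day offset added to exceptionDateTime:

$$
\begin{aligned}
\text{rows per day} &= 20 \times 1440 = 28{,}800 \\
\text{days in the table} &= 14{,}500{,}000 / 28{,}800 \approx 503.5 \\
\text{rows passing the date filter} &\approx 365 \times 28{,}800 = 10{,}512{,}000 \approx 72.5\%\ \text{of the table} \\
\text{qualifying rows} &\approx 10{,}512{,}000 \times \tfrac{1}{2} \times \tfrac{1}{2} = 2{,}628{,}000
\end{aligned}
$$

That is close to the 2,633,895 rows reported (the day offset nudges it up slightly). So the seek on exceptionDateTime still reads roughly 72.5% of the table, in line with 424,515 / 581,450 ≈ 0.73 of the full scan's logical reads, and then discards about three rows in four with the residual predicate on productArea and reportable.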


That seems to prove my point: the best clustered index eliminated the need for an extra nonclustered index, with better overall performance. And, since datetime is used as a filter in (almost) every query, almost every query will perform better overall and more consistently.

Moreover, there's no dreaded "tipping point" when using the clustered index. Lastly, when you add one column to the query, such as including the customer in the grouping as the OP's original query did, there's no need to rebuild/refactor a covering index. No constant reshuffling of nonclus indexes and, inevitably, no constant increase in their size.

Don't get me wrong. Some covering indexes will almost certainly still be needed. But they are drastically reduced.

How is 23 seconds better than 300 milliseconds? The clustered index on your chosen column was only marginally better than a surrogate key (ID), which returned in 28 seconds.

I was busy and probably didn't read closely enough. I've tuned tens of thousands of tables, and well over half the time there is a better clustered index than one on an identity column. It gives better overall performance while allowing thousands of nonclus indexes to be deleted. The idea that there should be a "default" clustering index of identity is just false, period.

"The idea that there should be a "default" clustering index of identity is just false, period." Yes, agreed. But as a very famous guy who lurks around here is fond of saying "It depends", and in this particular case, purely by accident, the identity column fares quite well. Not only that, but the clustered index of your choice is next to useless without a supporting non-clustered index. Now here's something else to think about. The best choice of clustered index, if it's to be a natural key, won't be known until the database has been live for long enough to pick up decent usage stats.

It's not at all next to useless. Typically you don't use an entire year's worth of data at once. You're also not including the overhead of maintaining custom indexes for every query. Again, yes, a custom table built and maintained for just that specific query will almost always outperform the general table, but that's not the whole picture.

You also didn't replicate the actual clustered index the requester created. Given their knowledge of the data, they included additional columns in the index that you ignored. As I noted above, only they would know whether to add columns beyond the datetime. But in every real-life case I've seen, datetime will be a vastly better clustered index for this table than identity, and you will save yourself duplicating the entire table -- or more -- in added nonclus indexes.

Typically you'd distinguish between OLTP and reporting databases before making sweeping generalisations 😀

The clustered index I created is exactly the one in the DDL the OP posted:

CONSTRAINT [idx_ctsException] PRIMARY KEY CLUSTERED
(
    [exceptionID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

Eric M Russell SSC Guru Points: 125255 · Answer 10

You've probably already considered this, but if you do go the route of summary tables, then perform incremental builds. In other words, if the day's load inserts transaction records only for the July 2015 period, then that day's summary build should delete and re-aggregate only the July 2015 records, rather than truncating the summary table and re-aggregating the entire fact table.

Coincidentally, at this very moment I'm taking a break from refactoring a legacy process that wasn't originally coded that way. Based on some initial unit tests, it looks like the nightly summary build will shrink from 12 hours to less than one hour. In this case I'm dealing with 10 TB of fact tables with a daily ingest rate of a few GB, so it's amazing how much I/O has been wasted reprocessing the same data day after day.
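The delete-and-re-aggregate pattern can be sketched in T-SQL as below. The temp tables #Fact and #Summary and their columns are hypothetical stand-ins, not the OP's schema; the point is only that the rebuild is bounded to one period. Doing the delete and insert in one transaction means readers shouldn't see the month half-rebuilt, and on a real fact table the period filter benefits from an index (clustered or otherwise) leading on the date column.

```sql
-- Hypothetical fact and summary tables; names and columns are illustrative only.
IF OBJECT_ID('tempdb..#Fact') IS NOT NULL DROP TABLE #Fact;
IF OBJECT_ID('tempdb..#Summary') IS NOT NULL DROP TABLE #Summary;

CREATE TABLE #Fact (
    TxnDate DATE          NOT NULL,
    QueueID INT           NOT NULL,
    Volume  INT           NOT NULL,
    Cost    DECIMAL(18,2) NOT NULL
);

CREATE TABLE #Summary (
    PeriodYear  INT           NOT NULL,
    PeriodMonth INT           NOT NULL,
    QueueID     INT           NOT NULL,
    Volume      INT           NOT NULL,
    Cost        DECIMAL(18,2) NOT NULL,
    PRIMARY KEY (PeriodYear, PeriodMonth, QueueID)
);

INSERT INTO #Fact (TxnDate, QueueID, Volume, Cost) VALUES
    ('2015-06-30', 1, 5, 10.00),
    ('2015-07-01', 1, 3,  6.00),
    ('2015-07-02', 2, 4,  8.00);

-- Assume June was already summarised by an earlier nightly run.
INSERT INTO #Summary (PeriodYear, PeriodMonth, QueueID, Volume, Cost)
VALUES (2015, 6, 1, 5, 10.00);

-- Today's load only touched July 2015, so rebuild only that period.
DECLARE @PeriodStart DATE = '2015-07-01';
DECLARE @PeriodEnd   DATE = DATEADD(MONTH, 1, @PeriodStart);

BEGIN TRANSACTION;

DELETE FROM #Summary
WHERE PeriodYear  = YEAR(@PeriodStart)
  AND PeriodMonth = MONTH(@PeriodStart);

INSERT INTO #Summary (PeriodYear, PeriodMonth, QueueID, Volume, Cost)
SELECT YEAR(@PeriodStart), MONTH(@PeriodStart), QueueID, SUM(Volume), SUM(Cost)
FROM #Fact
WHERE TxnDate >= @PeriodStart
  AND TxnDate <  @PeriodEnd      -- half-open range on the date column, sargable
GROUP BY QueueID;

COMMIT TRANSACTION;
```

Because the period is deleted before it is re-aggregated, re-running the batch for the same month gives the same result, which makes a rerun after a failed load straightforward; the June row is never touched.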

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho