Metrics Table or Performance Tuning

  • ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)

    The inner select can be accelerated with this index:

    CREATE INDEX ix_Helper ON [cts].[exception_Main]

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime)

    But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depends on the details -- from a clustered index on exceptionDateTime. That's the huge advantage --edit: of the best clus index -- that you don't have to create the gazillions on covering indexes for so many queries.

    We now have a test harness Scott if you'd like to put the theory to work. I'm out of time for today for something of this scale but I'll have time tomorrow.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/21/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)

    The inner select can be accelerated with this index:

    CREATE INDEX ix_Helper ON [cts].[exception_Main]

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime)

    But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depends on the details -- from a clustered index on exceptionDateTime. That's the huge advantage --edit: of the best clus index -- that you don't have to create the gazillions on covering indexes for so many queries.

    We now have a test harness Scott if you'd like to put the theory to work. I'm out of time for today for something of this scale but I'll have time tomorrow.

    But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one query or two. A clustered index is by definition a covering index for all queries, and the buffer space is shared among all users of that data. There's usually many people reading the most recent data.

    Edit: Can you get better performance for a given table by building a custom table -- which is what the proposed index effectively is -- specifically for that query? Sure. But is it really worth rewriting half the table for every table mod just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here with vastly better performance overall (orders of magnitude in some cases) by first identifying and creating the best clustered index on tables. The vast majority of those tables had the clustered index on identity, because of the horribly mistaken and misguided belief that is some type of "default" clustered index for any table.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)

    The inner select can be accelerated with this index:

    CREATE INDEX ix_Helper ON [cts].[exception_Main]

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime)

    But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depends on the details -- from a clustered index on exceptionDateTime. That's the huge advantage --edit: of the best clus index -- that you don't have to create the gazillions on covering indexes for so many queries.

    We now have a test harness Scott if you'd like to put the theory to work. I'm out of time for today for something of this scale but I'll have time tomorrow.

    But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one query or two. A clustered index is by definition a covering index for all queries, and the buffer space is shared among all users of that data. There's usually many people reading the most recent data.

    Edit: Can you get better performance for a given table by building a custom table -- which is what the proposed index effectively is -- specifically for that query? Sure. But is it really worth rewriting half the table for every table mod just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here with vastly better performance overall (orders of magnitude in some cases) by first identifying and creating the best clustered index on tables. The vast majority of those tables had the clustered index on identity, because of the horribly mistaken and misguided belief that is some type of "default" clustered index for any table.

    The clustered index on Nick's table wasn't the best choice since it was VARCHAR(400), so I ran a few tests to see how a surrogate key might fare against the suggestion of using the exceptionDateTime column. First I set up a test table containing little more than the columns used by the query, and with the same number of rows, about 14.5 million. I also set up the smaller table. Then I played about with indexing. Here's the code for the sample data:

    -- set up sample data

    IF OBJECT_ID('tempdb..#Exception_main') IS NOT NULL DROP TABLE #Exception_main

    SELECT

    ID, -- 8 bytes

    [exceptionID] = CAST(REPLICATE(CAST(NEWID() AS VARCHAR(36))+' ',7) AS VARCHAR(400)),

    reportYear = YEAR(ReportDate),

    reportMonth = MONTH(ReportDate),

    reportable = CAST(CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'Y' ELSE 'N' END AS CHAR(1)),

    QueueID = CAST(ABS(CHECKSUM(NEWID()))%420 AS VARCHAR(256)),

    exceptionDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID()))%3,ReportDate), -- 8 bytes

    productArea = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'FXMM' ELSE 'N' END,

    volume = ABS(CHECKSUM(NEWID()))%20,

    cost = ABS(CHECKSUM(NEWID()))%30

    INTO #Exception_main

    FROM (

    SELECT ID, ReportDate = DATEADD(MINUTE, 0-ID/20,GETDATE())

    FROM (

    SELECT TOP(14500000) -- 00:04:02 / 14 000 000

    ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c, SYS.COLUMNS d, SYS.COLUMNS e

    ) d

    ) e

    IF OBJECT_ID('tempdb..#Map_Exception') IS NOT NULL DROP TABLE #Map_Exception

    SELECT TOP(420)

    QueueID = CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS VARCHAR(40)),

    [Service] = ABS(CHECKSUM(NEWID()))%100

    INTO #Map_Exception

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c

    CREATE UNIQUE CLUSTERED INDEX ucx_Map_Exception ON #Map_Exception (QueueID)

    This takes about four minutes to run on a steam-powered dev box.

    Having built the data, I had a play with indexes. To be sure that the results weren't skewed by other processes, I ran through the whole lot four times.

    Here's the query:

    SET STATISTICS IO, TIME ON

    SELECT m.reportMonth,

    m.reportYear,

    ex.[service],

    SUM(m.Vol) AS Vol,

    SUM(m.Effort) AS effort

    FROM (

    SELECT reportMonth,

    reportYear,

    queueID,

    SUM(volume) AS Vol,

    SUM(cost) AS Effort

    FROM #exception_Main -- 14.5M rows

    WHERE exceptionDateTime >= GETDATE() - 365

    AND productArea = ('FXMM')

    AND reportable = 'Y'

    GROUP BY

    reportYear,

    reportMonth,

    queueID

    ) m

    LEFT JOIN #map_Exception ex -- 420 rows

    ON m.queueID = ex.queueID

    GROUP BY

    m.reportYear,

    m.reportMonth,

    ex.[service]

    SET STATISTICS IO, TIME OFF

    On this particular set the query returned 1,287 aggregated rows from 2,633,895 qualifying rows.

    Here are the summarised results from the indexing tests:

    --====================================================================================================

    -- 1. Baseline

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([exceptionID])

    CREATE NONCLUSTERED INDEX [idx_ctsTrend] ON #exception_Main

    ([productArea] ASC, [reportable] ASC, [exceptionDateTime] ASC)

    INCLUDE ([queueID], [cost], [reportMonth], [reportYear], [volume])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 9,446,096 KB

    --Data = 4,640,032 KB

    --Index_size= 4,805,352 KB

    -- Best result from 6 runs: logical reads 131639, elapsed time = 656 ms.

    -- Index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    -- 2. Unique clustered index on surrogate key ID

    DROP INDEX idx_ctsTrend ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([ID])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,652,128 KB

    --Data = 4,640,024 KB

    --Index_size= 11,672 KB

    -- Best result from 6 runs: logical reads 581450, elapsed time = 28288 ms.

    -- Clustered index scan, hash matches for aggregates

    --====================================================================================================

    -- 2.1 Unique clustered index on surrogate key ID with supporting nonclustered index

    CREATE INDEX ix_Helper ON #Exception_main

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime);

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 5,393,856 KB

    --Data = 4,640,024 KB

    --Index_size= 753,208 KB

    -- Best result from 6 runs: logical reads 24503, elapsed time = 332 ms.

    -- Index seek (productarea, reportable), residual predicate for exceptionDateTime, Stream Aggregates

    --====================================================================================================

    -- 3.0 Clustered index on exceptionDateTime

    DROP INDEX ix_Helper ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE CLUSTERED INDEX ucx_Sample ON #Exception_main (exceptionDateTime)

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 424515, physical reads 275, read-ahead reads 420270, elapsed time = 23276 ms.

    -- Clustered index seek (exceptionDateTime) residual predicate for productArea and Reportable, hash matches for aggregates

    --====================================================================================================

    -- 3.1 Clustered index on exceptionDateTime & recommended non-clustered index

    CREATE NONCLUSTERED INDEX ix_Recommended ON [dbo].[#Exception_main]

    ([reportable],[productArea],[exceptionDateTime])

    INCLUDE ([reportYear],[reportMonth],[QueueID],[volume],[cost])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 16916, elapsed time = 495 ms.

    -- Clustered index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    DROP INDEX ix_Recommended ON [dbo].[#Exception_main]

    DROP INDEX ucx_Sample ON #Exception_main

    -- back where we started, whizz around for another go to ensure results aren't skewed by local activity

    The clustered index on exceptionDateTime was only marginally faster than the surrogate key because so many rows had to be filtered by the residual predicate. For sure the ratio would change in favour of a cluster on exceptionDateTime with a smaller number of qualifying rows, but this dataset wasn't deliberately tipped in favour of a natural key, it's just a very rough approximation of a real world situation.

    xx

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/25/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)

    The inner select can be accelerated with this index:

    CREATE INDEX ix_Helper ON [cts].[exception_Main]

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime)

    But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depends on the details -- from a clustered index on exceptionDateTime. That's the huge advantage --edit: of the best clus index -- that you don't have to create the gazillions on covering indexes for so many queries.

    We now have a test harness Scott if you'd like to put the theory to work. I'm out of time for today for something of this scale but I'll have time tomorrow.

    But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one query or two. A clustered index is by definition a covering index for all queries, and the buffer space is shared among all users of that data. There's usually many people reading the most recent data.

    Edit: Can you get better performance for a given table by building a custom table -- which is what the proposed index effectively is -- specifically for that query? Sure. But is it really worth rewriting half the table for every table mod just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here with vastly better performance overall (orders of magnitude in some cases) by first identifying and creating the best clustered index on tables. The vast majority of those tables had the clustered index on identity, because of the horribly mistaken and misguided belief that is some type of "default" clustered index for any table.

    The clustered index on Nick's table wasn't the best choice since it was VARCHAR(400), so I ran a few tests to see how a surrogate key might fare against the suggestion of using the exceptionDateTime column. First I set up a test table containing little more than the columns used by the query, and with the same number of rows, about 14.5 million. I also set up the smaller table. Then I played about with indexing. Here's the code for the sample data:

    -- set up sample data

    IF OBJECT_ID('tempdb..#Exception_main') IS NOT NULL DROP TABLE #Exception_main

    SELECT

    ID, -- 8 bytes

    [exceptionID] = CAST(REPLICATE(CAST(NEWID() AS VARCHAR(36))+' ',7) AS VARCHAR(400)),

    reportYear = YEAR(ReportDate),

    reportMonth = MONTH(ReportDate),

    reportable = CAST(CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'Y' ELSE 'N' END AS CHAR(1)),

    QueueID = CAST(ABS(CHECKSUM(NEWID()))%420 AS VARCHAR(256)),

    exceptionDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID()))%3,ReportDate), -- 8 bytes

    productArea = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'FXMM' ELSE 'N' END,

    volume = ABS(CHECKSUM(NEWID()))%20,

    cost = ABS(CHECKSUM(NEWID()))%30

    INTO #Exception_main

    FROM (

    SELECT ID, ReportDate = DATEADD(MINUTE, 0-ID/20,GETDATE())

    FROM (

    SELECT TOP(14500000) -- 00:04:02 / 14 000 000

    ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c, SYS.COLUMNS d, SYS.COLUMNS e

    ) d

    ) e

    IF OBJECT_ID('tempdb..#Map_Exception') IS NOT NULL DROP TABLE #Map_Exception

    SELECT TOP(420)

    QueueID = CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS VARCHAR(40)),

    [Service] = ABS(CHECKSUM(NEWID()))%100

    INTO #Map_Exception

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c

    CREATE UNIQUE CLUSTERED INDEX ucx_Map_Exception ON #Map_Exception (QueueID)

    This takes about four minutes to run on a steam-powered dev box.

    Having built the data, I had a play with indexes. To be sure that the results weren't skewed by other processes, I ran through the whole lot four times.

    Here's the query:

    SET STATISTICS IO, TIME ON

    SELECT m.reportMonth,

    m.reportYear,

    ex.[service],

    SUM(m.Vol) AS Vol,

    SUM(m.Effort) AS effort

    FROM (

    SELECT reportMonth,

    reportYear,

    queueID,

    SUM(volume) AS Vol,

    SUM(cost) AS Effort

    FROM #exception_Main -- 14.5M rows

    WHERE exceptionDateTime >= GETDATE() - 365

    AND productArea = ('FXMM')

    AND reportable = 'Y'

    GROUP BY

    reportYear,

    reportMonth,

    queueID

    ) m

    LEFT JOIN #map_Exception ex -- 420 rows

    ON m.queueID = ex.queueID

    GROUP BY

    m.reportYear,

    m.reportMonth,

    ex.[service]

    SET STATISTICS IO, TIME OFF

    On this particular set the query returned 1,287 aggregated rows from 2,633,895 qualifying rows.

    Here are the summarised results from the indexing tests:

    --====================================================================================================

    -- 1. Baseline

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([exceptionID])

    CREATE NONCLUSTERED INDEX [idx_ctsTrend] ON #exception_Main

    ([productArea] ASC, [reportable] ASC, [exceptionDateTime] ASC)

    INCLUDE ([queueID], [cost], [reportMonth], [reportYear], [volume])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 9,446,096 KB

    --Data = 4,640,032 KB

    --Index_size= 4,805,352 KB

    -- Best result from 6 runs: logical reads 131639, elapsed time = 656 ms.

    -- Index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    -- 2. Unique clustered index on surrogate key ID

    DROP INDEX idx_ctsTrend ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([ID])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,652,128 KB

    --Data = 4,640,024 KB

    --Index_size= 11,672 KB

    -- Best result from 6 runs: logical reads 581450, elapsed time = 28288 ms.

    -- Clustered index scan, hash matches for aggregates

    --====================================================================================================

    -- 2.1 Unique clustered index on surrogate key ID with supporting nonclustered index

    CREATE INDEX ix_Helper ON #Exception_main

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime);

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 5,393,856 KB

    --Data = 4,640,024 KB

    --Index_size= 753,208 KB

    -- Best result from 6 runs: logical reads 24503, elapsed time = 332 ms.

    -- Index seek (productarea, reportable), residual predicate for exceptionDateTime, Stream Aggregates

    --====================================================================================================

    -- 3.0 Clustered index on exceptionDateTime

    DROP INDEX ix_Helper ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE CLUSTERED INDEX ucx_Sample ON #Exception_main (exceptionDateTime)

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 424515, physical reads 275, read-ahead reads 420270, elapsed time = 23276 ms.

    -- Clustered index seek (exceptionDateTime) residual predicate for productArea and Reportable, hash matches for aggregates

    --====================================================================================================

    -- 3.1 Clustered index on exceptionDateTime & recommended non-clustered index

    CREATE NONCLUSTERED INDEX ix_Recommended ON [dbo].[#Exception_main]

    ([reportable],[productArea],[exceptionDateTime])

    INCLUDE ([reportYear],[reportMonth],[QueueID],[volume],[cost])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 16916, elapsed time = 495 ms.

    -- Clustered index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    DROP INDEX ix_Recommended ON [dbo].[#Exception_main]

    DROP INDEX ucx_Sample ON #Exception_main

    -- back where we started, whizz around for another go to ensure results aren't skewed by local activity

    The clustered index on exceptionDateTime was only marginally faster than the surrogate key because so many rows had to be filtered by the residual predicate. For sure the ratio would change in favour of a cluster on exceptionDateTime with a smaller number of qualifying rows, but this dataset wasn't deliberately tipped in favour of a natural key, it's just a very rough approximation of a real world situation.

    xx

    That seems to prove my point: the best clustered index eliminated the need for an extra, nonclustered index with better overall performance. And, since datetime is used as a filter in (almost) every query, almost every query will perform overall better and with more consistency.

    Moreover, there's no dreaded "tipping point" when using the clustered index. Lastly, when you add one column to the query, such as including the customer in the grouping as did the original query the OP posted, there's no need to rebuild/refactor a covering index. No constant reshuffling nonclus indexes, and, inevitably, constantly increasing their size.

    Don't get me wrong. Some covering indexes will almost certainly still be needed. But they are drastically reduced.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher (8/25/2015)


    ChrisM@Work (8/25/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)

    The inner select can be accelerated with this index:

    CREATE INDEX ix_Helper ON [cts].[exception_Main]

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime)

    But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depends on the details -- from a clustered index on exceptionDateTime. That's the huge advantage --edit: of the best clus index -- that you don't have to create the gazillions on covering indexes for so many queries.

    We now have a test harness Scott if you'd like to put the theory to work. I'm out of time for today for something of this scale but I'll have time tomorrow.

    But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one query or two. A clustered index is by definition a covering index for all queries, and the buffer space is shared among all users of that data. There's usually many people reading the most recent data.

    Edit: Can you get better performance for a given table by building a custom table -- which is what the proposed index effectively is -- specifically for that query? Sure. But is it really worth rewriting half the table for every table mod just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here with vastly better performance overall (orders of magnitude in some cases) by first identifying and creating the best clustered index on tables. The vast majority of those tables had the clustered index on identity, because of the horribly mistaken and misguided belief that is some type of "default" clustered index for any table.

    The clustered index on Nick's table wasn't the best choice since it was VARCHAR(400), so I ran a few tests to see how a surrogate key might fare against the suggestion of using the exceptionDateTime column. First I set up a test table containing little more than the columns used by the query, and with the same number of rows, about 14.5 million. I also set up the smaller table. Then I played about with indexing. Here's the code for the sample data:

    -- set up sample data

    IF OBJECT_ID('tempdb..#Exception_main') IS NOT NULL DROP TABLE #Exception_main

    SELECT

    ID, -- 8 bytes

    [exceptionID] = CAST(REPLICATE(CAST(NEWID() AS VARCHAR(36))+' ',7) AS VARCHAR(400)),

    reportYear = YEAR(ReportDate),

    reportMonth = MONTH(ReportDate),

    reportable = CAST(CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'Y' ELSE 'N' END AS CHAR(1)),

    QueueID = CAST(ABS(CHECKSUM(NEWID()))%420 AS VARCHAR(256)),

    exceptionDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID()))%3,ReportDate), -- 8 bytes

    productArea = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'FXMM' ELSE 'N' END,

    volume = ABS(CHECKSUM(NEWID()))%20,

    cost = ABS(CHECKSUM(NEWID()))%30

    INTO #Exception_main

    FROM (

    SELECT ID, ReportDate = DATEADD(MINUTE, 0-ID/20,GETDATE())

    FROM (

    SELECT TOP(14500000) -- 00:04:02 / 14 000 000

    ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c, SYS.COLUMNS d, SYS.COLUMNS e

    ) d

    ) e

    IF OBJECT_ID('tempdb..#Map_Exception') IS NOT NULL DROP TABLE #Map_Exception

    SELECT TOP(420)

    QueueID = CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS VARCHAR(40)),

    [Service] = ABS(CHECKSUM(NEWID()))%100

    INTO #Map_Exception

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c

    CREATE UNIQUE CLUSTERED INDEX ucx_Map_Exception ON #Map_Exception (QueueID)

    This takes about four minutes to run on a steam-powered dev box.

    Having built the data, I had a play with indexes. To be sure that the results weren't skewed by other processes, I ran through the whole lot four times.

    Here's the query:

    SET STATISTICS IO, TIME ON

    SELECT m.reportMonth,

    m.reportYear,

    ex.[service],

    SUM(m.Vol) AS Vol,

    SUM(m.Effort) AS effort

    FROM (

    SELECT reportMonth,

    reportYear,

    queueID,

    SUM(volume) AS Vol,

    SUM(cost) AS Effort

    FROM #exception_Main -- 14.5M rows

    WHERE exceptionDateTime >= GETDATE() - 365

    AND productArea = ('FXMM')

    AND reportable = 'Y'

    GROUP BY

    reportYear,

    reportMonth,

    queueID

    ) m

    LEFT JOIN #map_Exception ex -- 420 rows

    ON m.queueID = ex.queueID

    GROUP BY

    m.reportYear,

    m.reportMonth,

    ex.[service]

    SET STATISTICS IO, TIME OFF

    On this particular set the query returned 1,287 aggregated rows from 2,633,895 qualifying rows.

    Here are the summarised results from the indexing tests:

    --====================================================================================================

    -- 1. Baseline

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([exceptionID])

    CREATE NONCLUSTERED INDEX [idx_ctsTrend] ON #exception_Main

    ([productArea] ASC, [reportable] ASC, [exceptionDateTime] ASC)

    INCLUDE ([queueID], [cost], [reportMonth], [reportYear], [volume])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 9,446,096 KB

    --Data = 4,640,032 KB

    --Index_size= 4,805,352 KB

    -- Best result from 6 runs: logical reads 131639, elapsed time = 656 ms.

    -- Index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    -- 2. Unique clustered index on surrogate key ID

    DROP INDEX idx_ctsTrend ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([ID])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,652,128 KB

    --Data = 4,640,024 KB

    --Index_size= 11,672 KB

    -- Best result from 6 runs: logical reads 581450, elapsed time = 28288 ms.

    -- Clustered index scan, hash matches for aggregates

    --====================================================================================================

    -- 2.1 Unique clustered index on surrogate key ID with supporting nonclustered index

    CREATE INDEX ix_Helper ON #Exception_main

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime);

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 5,393,856 KB

    --Data = 4,640,024 KB

    --Index_size= 753,208 KB

    -- Best result from 6 runs: logical reads 24503, elapsed time = 332 ms.

    -- Index seek (productarea, reportable), residual predicate for exceptionDateTime, Stream Aggregates

    --====================================================================================================

    -- 3.0 Clustered index on exceptionDateTime

    DROP INDEX ix_Helper ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE CLUSTERED INDEX ucx_Sample ON #Exception_main (exceptionDateTime)

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 424515, physical reads 275, read-ahead reads 420270, elapsed time = 23276 ms.

    -- Clustered index seek (exceptionDateTime) residual predicate for productArea and Reportable, hash matches for aggregates

    --====================================================================================================

    -- 3.1 Clustered index on exceptionDateTime & recommended non-clustered index

    CREATE NONCLUSTERED INDEX ix_Recommended ON [dbo].[#Exception_main]

    ([reportable],[productArea],[exceptionDateTime])

    INCLUDE ([reportYear],[reportMonth],[QueueID],[volume],[cost])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 16916, elapsed time = 495 ms.

    -- Clustered index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    DROP INDEX ix_Recommended ON [dbo].[#Exception_main]

    DROP INDEX ucx_Sample ON #Exception_main

    -- back where we started, whizz around for another go to ensure results aren't skewed by local activity

    The clustered index on exceptionDateTime was only marginally faster than the surrogate key because so many rows had to be filtered by the residual predicate. For sure the ratio would change in favour of a cluster on exceptionDateTime with a smaller number of qualifying rows, but this dataset wasn't deliberately tipped in favour of a natural key, it's just a very rough approximation of a real world situation.

    xx

    That seems to prove my point: the best clustered index eliminated the need for an extra, nonclustered index with better overall performance. And, since datetime is used as a filter in (almost) every query, almost every query will perform overall better and with more consistency.

    Moreover, there's no dreaded "tipping point" when using the clustered index. Lastly, when you add one column to the query, such as including the customer in the grouping as did the original query the OP posted, there's no need to rebuild/refactor a covering index. No constant reshuffling nonclus indexes, and, inevitably, constantly increasing their size.

    Don't get me wrong. Some covering indexes will almost certainly still be needed. But they are drastically reduced.

    How is 23 seconds better than 300 milliseconds? The clustered index on your chosen column was only marginally better than a surrogate key (ID), which returned in 28 seconds.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/25/2015)


    ScottPletcher (8/25/2015)


    ChrisM@Work (8/25/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)

    The inner select can be accelerated with this index:

    CREATE INDEX ix_Helper ON [cts].[exception_Main]

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime)

    But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depends on the details -- from a clustered index on exceptionDateTime. That's the huge advantage --edit: of the best clus index -- that you don't have to create the gazillions on covering indexes for so many queries.

    We now have a test harness Scott if you'd like to put the theory to work. I'm out of time for today for something of this scale but I'll have time tomorrow.

    But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one query or two. A clustered index is by definition a covering index for all queries, and the buffer space is shared among all users of that data. There's usually many people reading the most recent data.

    Edit: Can you get better performance for a given table by building a custom table -- which is what the proposed index effectively is -- specifically for that query? Sure. But is it really worth rewriting half the table for every table mod just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here with vastly better performance overall (orders of magnitude in some cases) by first identifying and creating the best clustered index on tables. The vast majority of those tables had the clustered index on identity, because of the horribly mistaken and misguided belief that is some type of "default" clustered index for any table.

    The clustered index on Nick's table wasn't the best choice since it was VARCHAR(400), so I ran a few tests to see how a surrogate key might fare against the suggestion of using the exceptionDateTime column. First I set up a test table containing little more than the columns used by the query, and with the same number of rows, about 14.5 million. I also set up the smaller table. Then I played about with indexing. Here's the code for the sample data:

    -- set up sample data

    IF OBJECT_ID('tempdb..#Exception_main') IS NOT NULL DROP TABLE #Exception_main

    SELECT

    ID, -- 8 bytes

    [exceptionID] = CAST(REPLICATE(CAST(NEWID() AS VARCHAR(36))+' ',7) AS VARCHAR(400)),

    reportYear = YEAR(ReportDate),

    reportMonth = MONTH(ReportDate),

    reportable = CAST(CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'Y' ELSE 'N' END AS CHAR(1)),

    QueueID = CAST(ABS(CHECKSUM(NEWID()))%420 AS VARCHAR(256)),

    exceptionDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID()))%3,ReportDate), -- 8 bytes

    productArea = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'FXMM' ELSE 'N' END,

    volume = ABS(CHECKSUM(NEWID()))%20,

    cost = ABS(CHECKSUM(NEWID()))%30

    INTO #Exception_main

    FROM (

    SELECT ID, ReportDate = DATEADD(MINUTE, 0-ID/20,GETDATE())

    FROM (

    SELECT TOP(14500000) -- 00:04:02 / 14 000 000

    ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c, SYS.COLUMNS d, SYS.COLUMNS e

    ) d

    ) e

    IF OBJECT_ID('tempdb..#Map_Exception') IS NOT NULL DROP TABLE #Map_Exception

    SELECT TOP(420)

    QueueID = CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS VARCHAR(40)),

    [Service] = ABS(CHECKSUM(NEWID()))%100

    INTO #Map_Exception

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c

    CREATE UNIQUE CLUSTERED INDEX ucx_Map_Exception ON #Map_Exception (QueueID)

    This takes about four minutes to run on a steam-powered dev box.

    Having built the data, I had a play with indexes. To be sure that the results weren't skewed by other processes, I ran through the whole lot four times.

    Here's the query:

    SET STATISTICS IO, TIME ON

    SELECT m.reportMonth,

    m.reportYear,

    ex.[service],

    SUM(m.Vol) AS Vol,

    SUM(m.Effort) AS effort

    FROM (

    SELECT reportMonth,

    reportYear,

    queueID,

    SUM(volume) AS Vol,

    SUM(cost) AS Effort

    FROM #exception_Main -- 14.5M rows

    WHERE exceptionDateTime >= GETDATE() - 365

    AND productArea = ('FXMM')

    AND reportable = 'Y'

    GROUP BY

    reportYear,

    reportMonth,

    queueID

    ) m

    LEFT JOIN #map_Exception ex -- 420 rows

    ON m.queueID = ex.queueID

    GROUP BY

    m.reportYear,

    m.reportMonth,

    ex.[service]

    SET STATISTICS IO, TIME OFF

    On this particular set the query returned 1,287 aggregated rows from 2,633,895 qualifying rows.

    Here are the summarised results from the indexing tests:

    --====================================================================================================

    -- 1. Baseline

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([exceptionID])

    CREATE NONCLUSTERED INDEX [idx_ctsTrend] ON #exception_Main

    ([productArea] ASC, [reportable] ASC, [exceptionDateTime] ASC)

    INCLUDE ([queueID], [cost], [reportMonth], [reportYear], [volume])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 9,446,096 KB

    --Data = 4,640,032 KB

    --Index_size= 4,805,352 KB

    -- Best result from 6 runs: logical reads 131639, elapsed time = 656 ms.

    -- Index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    -- 2. Unique clustered index on surrogate key ID

    DROP INDEX idx_ctsTrend ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([ID])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,652,128 KB

    --Data = 4,640,024 KB

    --Index_size= 11,672 KB

    -- Best result from 6 runs: logical reads 581450, elapsed time = 28288 ms.

    -- Clustered index scan, hash matches for aggregates

    --====================================================================================================

    -- 2.1 Unique clustered index on surrogate key ID with supporting nonclustered index

    CREATE INDEX ix_Helper ON #Exception_main

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime);

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 5,393,856 KB

    --Data = 4,640,024 KB

    --Index_size= 753,208 KB

    -- Best result from 6 runs: logical reads 24503, elapsed time = 332 ms.

    -- Index seek (productarea, reportable), residual predicate for exceptionDateTime, Stream Aggregates

    --====================================================================================================

    -- 3.0 Clustered index on exceptionDateTime

    DROP INDEX ix_Helper ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE CLUSTERED INDEX ucx_Sample ON #Exception_main (exceptionDateTime)

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 424515, physical reads 275, read-ahead reads 420270, elapsed time = 23276 ms.

    -- Clustered index seek (exceptionDateTime) residual predicate for productArea and Reportable, hash matches for aggregates

    --====================================================================================================

    -- 3.1 Clustered index on exceptionDateTime & recommended non-clustered index

    CREATE NONCLUSTERED INDEX ix_Recommended ON [dbo].[#Exception_main]

    ([reportable],[productArea],[exceptionDateTime])

    INCLUDE ([reportYear],[reportMonth],[QueueID],[volume],[cost])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 16916, elapsed time = 495 ms.

    -- Clustered index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    DROP INDEX ix_Recommended ON [dbo].[#Exception_main]

    DROP INDEX ucx_Sample ON #Exception_main

    -- back where we started, whizz around for another go to ensure results aren't skewed by local activity

    The clustered index on exceptionDateTime was only marginally faster than the surrogate key because so many rows had to be filtered by the residual predicate. For sure the ratio would change in favour of a cluster on exceptionDateTime with a smaller number of qualifying rows, but this dataset wasn't deliberately tipped in favour of a natural key, it's just a very rough approximation of a real world situation.

    xx

    That seems to prove my point: the best clustered index eliminated the need for an extra, nonclustered index with better overall performance. And, since datetime is used as a filter in (almost) every query, almost every query will perform overall better and with more consistency.

    Moreover, there's no dreaded "tipping point" when using the clustered index. Lastly, when you add one column to the query, such as including the customer in the grouping as did the original query the OP posted, there's no need to rebuild/refactor a covering index. No constant reshuffling nonclus indexes, and, inevitably, constantly increasing their size.

    Don't get me wrong. Some covering indexes will almost certainly still be needed. But they are drastically reduced.

    How is 23 seconds better than 300 milliseconds? The clustered index on your chosen column was only marginally better than a surrogate key (ID), which returned in 28 seconds.

    Busy, probably didn't read closely enough. I've tuned tens of thousands of tables and well over half the time there is a better clustered index than one on an identity column. It gives better overall performance while deleting thousands of nonclus indexes. The idea that there should be a "default" clustering index of identity is just false, period.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher (8/25/2015)


    ChrisM@Work (8/25/2015)


    ScottPletcher (8/25/2015)


    ChrisM@Work (8/25/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)

    The inner select can be accelerated with this index:

    CREATE INDEX ix_Helper ON [cts].[exception_Main]

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime)

    But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depends on the details -- from a clustered index on exceptionDateTime. That's the huge advantage --edit: of the best clus index -- that you don't have to create the gazillions on covering indexes for so many queries.

    We now have a test harness Scott if you'd like to put the theory to work. I'm out of time for today for something of this scale but I'll have time tomorrow.

    But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one query or two. A clustered index is by definition a covering index for all queries, and the buffer space is shared among all users of that data. There's usually many people reading the most recent data.

    Edit: Can you get better performance for a given table by building a custom table -- which is what the proposed index effectively is -- specifically for that query? Sure. But is it really worth rewriting half the table for every table mod just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here with vastly better performance overall (orders of magnitude in some cases) by first identifying and creating the best clustered index on tables. The vast majority of those tables had the clustered index on identity, because of the horribly mistaken and misguided belief that is some type of "default" clustered index for any table.

    The clustered index on Nick's table wasn't the best choice since it was VARCHAR(400), so I ran a few tests to see how a surrogate key might fare against the suggestion of using the exceptionDateTime column. First I set up a test table containing little more than the columns used by the query, and with the same number of rows, about 14.5 million. I also set up the smaller table. Then I played about with indexing. Here's the code for the sample data:

    -- set up sample data

    IF OBJECT_ID('tempdb..#Exception_main') IS NOT NULL DROP TABLE #Exception_main

    SELECT

    ID, -- 8 bytes

    [exceptionID] = CAST(REPLICATE(CAST(NEWID() AS VARCHAR(36))+' ',7) AS VARCHAR(400)),

    reportYear = YEAR(ReportDate),

    reportMonth = MONTH(ReportDate),

    reportable = CAST(CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'Y' ELSE 'N' END AS CHAR(1)),

    QueueID = CAST(ABS(CHECKSUM(NEWID()))%420 AS VARCHAR(256)),

    exceptionDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID()))%3,ReportDate), -- 8 bytes

    productArea = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'FXMM' ELSE 'N' END,

    volume = ABS(CHECKSUM(NEWID()))%20,

    cost = ABS(CHECKSUM(NEWID()))%30

    INTO #Exception_main

    FROM (

    SELECT ID, ReportDate = DATEADD(MINUTE, 0-ID/20,GETDATE())

    FROM (

    SELECT TOP(14500000) -- 00:04:02 / 14 000 000

    ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c, SYS.COLUMNS d, SYS.COLUMNS e

    ) d

    ) e

    IF OBJECT_ID('tempdb..#Map_Exception') IS NOT NULL DROP TABLE #Map_Exception

    SELECT TOP(420)

    QueueID = CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS VARCHAR(40)),

    [Service] = ABS(CHECKSUM(NEWID()))%100

    INTO #Map_Exception

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c

    CREATE UNIQUE CLUSTERED INDEX ucx_Map_Exception ON #Map_Exception (QueueID)

    This takes about four minutes to run on a steam-powered dev box.

    Having built the data, I had a play with indexes. To be sure that the results weren't skewed by other processes, I ran through the whole lot four times.

    Here's the query:

    SET STATISTICS IO, TIME ON

    SELECT m.reportMonth,

    m.reportYear,

    ex.[service],

    SUM(m.Vol) AS Vol,

    SUM(m.Effort) AS effort

    FROM (

    SELECT reportMonth,

    reportYear,

    queueID,

    SUM(volume) AS Vol,

    SUM(cost) AS Effort

    FROM #exception_Main -- 14.5M rows

    WHERE exceptionDateTime >= GETDATE() - 365

    AND productArea = ('FXMM')

    AND reportable = 'Y'

    GROUP BY

    reportYear,

    reportMonth,

    queueID

    ) m

    LEFT JOIN #map_Exception ex -- 420 rows

    ON m.queueID = ex.queueID

    GROUP BY

    m.reportYear,

    m.reportMonth,

    ex.[service]

    SET STATISTICS IO, TIME OFF

    On this particular set the query returned 1,287 aggregated rows from 2,633,895 qualifying rows.

    Here are the summarised results from the indexing tests:

    --====================================================================================================

    -- 1. Baseline

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([exceptionID])

    CREATE NONCLUSTERED INDEX [idx_ctsTrend] ON #exception_Main

    ([productArea] ASC, [reportable] ASC, [exceptionDateTime] ASC)

    INCLUDE ([queueID], [cost], [reportMonth], [reportYear], [volume])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 9,446,096 KB

    --Data = 4,640,032 KB

    --Index_size= 4,805,352 KB

    -- Best result from 6 runs: logical reads 131639, elapsed time = 656 ms.

    -- Index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    -- 2. Unique clustered index on surrogate key ID

    DROP INDEX idx_ctsTrend ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([ID])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,652,128 KB

    --Data = 4,640,024 KB

    --Index_size= 11,672 KB

    -- Best result from 6 runs: logical reads 581450, elapsed time = 28288 ms.

    -- Clustered index scan, hash matches for aggregates

    --====================================================================================================

    -- 2.1 Unique clustered index on surrogate key ID with supporting nonclustered index

    CREATE INDEX ix_Helper ON #Exception_main

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime);

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 5,393,856 KB

    --Data = 4,640,024 KB

    --Index_size= 753,208 KB

    -- Best result from 6 runs: logical reads 24503, elapsed time = 332 ms.

    -- Index seek (productarea, reportable), residual predicate for exceptionDateTime, Stream Aggregates

    --====================================================================================================

    -- 3.0 Clustered index on exceptionDateTime

    DROP INDEX ix_Helper ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE CLUSTERED INDEX ucx_Sample ON #Exception_main (exceptionDateTime)

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 424515, physical reads 275, read-ahead reads 420270, elapsed time = 23276 ms.

    -- Clustered index seek (exceptionDateTime) residual predicate for productArea and Reportable, hash matches for aggregates

    --====================================================================================================

    -- 3.1 Clustered index on exceptionDateTime & recommended non-clustered index

    CREATE NONCLUSTERED INDEX ix_Recommended ON [dbo].[#Exception_main]

    ([reportable],[productArea],[exceptionDateTime])

    INCLUDE ([reportYear],[reportMonth],[QueueID],[volume],[cost])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 16916, elapsed time = 495 ms.

    -- Clustered index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    DROP INDEX ix_Recommended ON [dbo].[#Exception_main]

    DROP INDEX ucx_Sample ON #Exception_main

    -- back where we started, whizz around for another go to ensure results aren't skewed by local activity

    The clustered index on exceptionDateTime was only marginally faster than the surrogate key because so many rows had to be filtered by the residual predicate. For sure the ratio would change in favour of a cluster on exceptionDateTime with a smaller number of qualifying rows, but this dataset wasn't deliberately tipped in favour of a natural key, it's just a very rough approximation of a real world situation.

    xx

    That seems to prove my point: the best clustered index eliminated the need for an extra, nonclustered index with better overall performance. And, since datetime is used as a filter in (almost) every query, almost every query will perform overall better and with more consistency.

    Moreover, there's no dreaded "tipping point" when using the clustered index. Lastly, when you add one column to the query, such as including the customer in the grouping as did the original query the OP posted, there's no need to rebuild/refactor a covering index. No constant reshuffling nonclus indexes, and, inevitably, constantly increasing their size.

    Don't get me wrong. Some covering indexes will almost certainly still be needed. But they are drastically reduced.

    How is 23 seconds better than 300 milliseconds? The clustered index on your chosen column was only marginally better than a surrogate key (ID), which returned in 28 seconds.

    Busy, probably didn't read closely enough. I've tuned tens of thousands of tables and well over half the time there is a better clustered index than one on an identity column. It gives better overall performance while deleting thousands of nonclus indexes. The idea that there should be a "default" clustering index of identity is just false, period.

    "The idea that there should be a "default" clustering index of identity is just false, period." Yes, agreed. But as a very famous guy who lurks around here is fond of saying "It depends", and in this particular case, purely by accident, the identity column fares quite well. Not only that, but the clustered index of your choice is next to useless without a supporting non-clustered index. Now here's something else to think about. The best choice of clustered index, if it's to be a natural key, won't be known until the database has been live for long enough to pick up decent usage stats.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (8/25/2015)


    ScottPletcher (8/25/2015)


    ChrisM@Work (8/25/2015)


    ScottPletcher (8/25/2015)


    ChrisM@Work (8/25/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)

    The inner select can be accelerated with this index:

    CREATE INDEX ix_Helper ON [cts].[exception_Main]

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime)

    But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depends on the details -- from a clustered index on exceptionDateTime. That's the huge advantage --edit: of the best clus index -- that you don't have to create the gazillions on covering indexes for so many queries.

    We now have a test harness Scott if you'd like to put the theory to work. I'm out of time for today for something of this scale but I'll have time tomorrow.

    But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one query or two. A clustered index is by definition a covering index for all queries, and the buffer space is shared among all users of that data. There's usually many people reading the most recent data.

    Edit: Can you get better performance for a given table by building a custom table -- which is what the proposed index effectively is -- specifically for that query? Sure. But is it really worth rewriting half the table for every table mod just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here with vastly better performance overall (orders of magnitude in some cases) by first identifying and creating the best clustered index on tables. The vast majority of those tables had the clustered index on identity, because of the horribly mistaken and misguided belief that is some type of "default" clustered index for any table.

    The clustered index on Nick's table wasn't the best choice since it was VARCHAR(400), so I ran a few tests to see how a surrogate key might fare against the suggestion of using the exceptionDateTime column. First I set up a test table containing little more than the columns used by the query, and with the same number of rows, about 14.5 million. I also set up the smaller table. Then I played about with indexing. Here's the code for the sample data:

    -- set up sample data

    IF OBJECT_ID('tempdb..#Exception_main') IS NOT NULL DROP TABLE #Exception_main

    SELECT

    ID, -- 8 bytes

    [exceptionID] = CAST(REPLICATE(CAST(NEWID() AS VARCHAR(36))+' ',7) AS VARCHAR(400)),

    reportYear = YEAR(ReportDate),

    reportMonth = MONTH(ReportDate),

    reportable = CAST(CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'Y' ELSE 'N' END AS CHAR(1)),

    QueueID = CAST(ABS(CHECKSUM(NEWID()))%420 AS VARCHAR(256)),

    exceptionDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID()))%3,ReportDate), -- 8 bytes

    productArea = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'FXMM' ELSE 'N' END,

    volume = ABS(CHECKSUM(NEWID()))%20,

    cost = ABS(CHECKSUM(NEWID()))%30

    INTO #Exception_main

    FROM (

    SELECT ID, ReportDate = DATEADD(MINUTE, 0-ID/20,GETDATE())

    FROM (

    SELECT TOP(14500000) -- 00:04:02 / 14 000 000

    ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c, SYS.COLUMNS d, SYS.COLUMNS e

    ) d

    ) e

    IF OBJECT_ID('tempdb..#Map_Exception') IS NOT NULL DROP TABLE #Map_Exception

    SELECT TOP(420)

    QueueID = CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS VARCHAR(40)),

    [Service] = ABS(CHECKSUM(NEWID()))%100

    INTO #Map_Exception

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c

    CREATE UNIQUE CLUSTERED INDEX ucx_Map_Exception ON #Map_Exception (QueueID)

    This takes about four minutes to run on a steam-powered dev box.

    Having built the data, I had a play with indexes. To be sure that the results weren't skewed by other processes, I ran through the whole lot four times.

    Here's the query:

    SET STATISTICS IO, TIME ON

    SELECT m.reportMonth,

    m.reportYear,

    ex.[service],

    SUM(m.Vol) AS Vol,

    SUM(m.Effort) AS effort

    FROM (

    SELECT reportMonth,

    reportYear,

    queueID,

    SUM(volume) AS Vol,

    SUM(cost) AS Effort

    FROM #exception_Main -- 14.5M rows

    WHERE exceptionDateTime >= GETDATE() - 365

    AND productArea = ('FXMM')

    AND reportable = 'Y'

    GROUP BY

    reportYear,

    reportMonth,

    queueID

    ) m

    LEFT JOIN #map_Exception ex -- 420 rows

    ON m.queueID = ex.queueID

    GROUP BY

    m.reportYear,

    m.reportMonth,

    ex.[service]

    SET STATISTICS IO, TIME OFF

    On this particular set the query returned 1,287 aggregated rows from 2,633,895 qualifying rows.

    Here are the summarised results from the indexing tests:

    --====================================================================================================

    -- 1. Baseline

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([exceptionID])

    CREATE NONCLUSTERED INDEX [idx_ctsTrend] ON #exception_Main

    ([productArea] ASC, [reportable] ASC, [exceptionDateTime] ASC)

    INCLUDE ([queueID], [cost], [reportMonth], [reportYear], [volume])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 9,446,096 KB

    --Data = 4,640,032 KB

    --Index_size= 4,805,352 KB

    -- Best result from 6 runs: logical reads 131639, elapsed time = 656 ms.

    -- Index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    -- 2. Unique clustered index on surrogate key ID

    DROP INDEX idx_ctsTrend ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([ID])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,652,128 KB

    --Data = 4,640,024 KB

    --Index_size= 11,672 KB

    -- Best result from 6 runs: logical reads 581450, elapsed time = 28288 ms.

    -- Clustered index scan, hash matches for aggregates

    --====================================================================================================

    -- 2.1 Unique clustered index on surrogate key ID with supporting nonclustered index

    CREATE INDEX ix_Helper ON #Exception_main

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime);

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 5,393,856 KB

    --Data = 4,640,024 KB

    --Index_size= 753,208 KB

    -- Best result from 6 runs: logical reads 24503, elapsed time = 332 ms.

    -- Index seek (productarea, reportable), residual predicate for exceptionDateTime, Stream Aggregates

    --====================================================================================================

    -- 3.0 Clustered index on exceptionDateTime

    DROP INDEX ix_Helper ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE CLUSTERED INDEX ucx_Sample ON #Exception_main (exceptionDateTime)

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 424515, physical reads 275, read-ahead reads 420270, elapsed time = 23276 ms.

    -- Clustered index seek (exceptionDateTime) residual predicate for productArea and Reportable, hash matches for aggregates

    --====================================================================================================

    -- 3.1 Clustered index on exceptionDateTime & recommended non-clustered index

    CREATE NONCLUSTERED INDEX ix_Recommended ON [dbo].[#Exception_main]

    ([reportable],[productArea],[exceptionDateTime])

    INCLUDE ([reportYear],[reportMonth],[QueueID],[volume],[cost])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 16916, elapsed time = 495 ms.

    -- Clustered index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    DROP INDEX ix_Recommended ON [dbo].[#Exception_main]

    DROP INDEX ucx_Sample ON #Exception_main

    -- back where we started, whizz around for another go to ensure results aren't skewed by local activity

    The clustered index on exceptionDateTime was only marginally faster than the surrogate key because so many rows had to be filtered by the residual predicate. For sure the ratio would change in favour of a cluster on exceptionDateTime with a smaller number of qualifying rows, but this dataset wasn't deliberately tipped in favour of a natural key, it's just a very rough approximation of a real world situation.

    xx

    That seems to prove my point: the best clustered index eliminated the need for an extra, nonclustered index with better overall performance. And, since datetime is used as a filter in (almost) every query, almost every query will perform overall better and with more consistency.

    Moreover, there's no dreaded "tipping point" when using the clustered index. Lastly, when you add one column to the query, such as including the customer in the grouping as did the original query the OP posted, there's no need to rebuild/refactor a covering index. No constant reshuffling nonclus indexes, and, inevitably, constantly increasing their size.

    Don't get me wrong. Some covering indexes will almost certainly still be needed. But they are drastically reduced.

    How is 23 seconds better than 300 milliseconds? The clustered index on your chosen column was only marginally better than a surrogate key (ID), which returned in 28 seconds.

    Busy, probably didn't read closely enough. I've tuned tens of thousands of tables and well over half the time there is a better clustered index than one on an identity column. It gives better overall performance while deleting thousands of nonclus indexes. The idea that there should be a "default" clustering index of identity is just false, period.

    "The idea that there should be a "default" clustering index of identity is just false, period." Yes, agreed. But as a very famous guy who lurks around here is fond of saying "It depends", and in this particular case, purely by accident, the identity column fares quite well. Not only that, but the clustered index of your choice is next to useless without a supporting non-clustered index. Now here's something else to think about. The best choice of clustered index, if it's to be a natural key, won't be known until the database has been live for long enough to pick up decent usage stats.

    It's not at all next to useless. Typically you don't use an entire year's worth of data at once. You're also not including the overhead of maintaining custom indexes for every query. Again, yes, a custom table built and maintained for just that specific query will almost always outperform the general table, but that's not the total part of it.

    You also didn't replicate the actual clustered index the requester created. Given their knowledge of the data, they included additional columns in the index that you ignored. As I noted above, only they would know whether to add columns beyond the datetime. But in every real-life case I've seen, datetime will be a vastly better clustered index for this table than identity, and you will save yourself duplicating the entire table -- or more -- in added nonclus indexes.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher (8/25/2015)


    ChrisM@Work (8/25/2015)


    ScottPletcher (8/25/2015)


    ChrisM@Work (8/25/2015)


    ScottPletcher (8/25/2015)


    ChrisM@Work (8/25/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)


    ScottPletcher (8/21/2015)


    ChrisM@Work (8/21/2015)

    The inner select can be accelerated with this index:

    CREATE INDEX ix_Helper ON [cts].[exception_Main]

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime)

    But with the best clustered index now on the table, you don't need this extra, fairly large index. You should get as good a response time -- possibly even better, depends on the details -- from a clustered index on exceptionDateTime. That's the huge advantage --edit: of the best clus index -- that you don't have to create the gazillions on covering indexes for so many queries.

    We now have a test harness Scott if you'd like to put the theory to work. I'm out of time for today for something of this scale but I'll have time tomorrow.

    But you're looking at one query in isolation. You have to consider all the processing going on against that table. All these custom indexes require insert/update/delete maintenance, and they take buffer space to satisfy only one query or two. A clustered index is by definition a covering index for all queries, and the buffer space is shared among all users of that data. There's usually many people reading the most recent data.

    Edit: Can you get better performance for a given table by building a custom table -- which is what the proposed index effectively is -- specifically for that query? Sure. But is it really worth rewriting half the table for every table mod just to get, say, 5% more performance for this query? It doesn't take long before the covering indexes cost you vastly more than they are worth. I've removed literally many thousands of nonclustered indexes here with vastly better performance overall (orders of magnitude in some cases) by first identifying and creating the best clustered index on tables. The vast majority of those tables had the clustered index on identity, because of the horribly mistaken and misguided belief that is some type of "default" clustered index for any table.

    The clustered index on Nick's table wasn't the best choice since it was VARCHAR(400), so I ran a few tests to see how a surrogate key might fare against the suggestion of using the exceptionDateTime column. First I set up a test table containing little more than the columns used by the query, and with the same number of rows, about 14.5 million. I also set up the smaller table. Then I played about with indexing. Here's the code for the sample data:

    -- set up sample data

    IF OBJECT_ID('tempdb..#Exception_main') IS NOT NULL DROP TABLE #Exception_main

    SELECT

    ID, -- 8 bytes

    [exceptionID] = CAST(REPLICATE(CAST(NEWID() AS VARCHAR(36))+' ',7) AS VARCHAR(400)),

    reportYear = YEAR(ReportDate),

    reportMonth = MONTH(ReportDate),

    reportable = CAST(CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'Y' ELSE 'N' END AS CHAR(1)),

    QueueID = CAST(ABS(CHECKSUM(NEWID()))%420 AS VARCHAR(256)),

    exceptionDateTime = DATEADD(DAY,ABS(CHECKSUM(NEWID()))%3,ReportDate), -- 8 bytes

    productArea = CASE WHEN ABS(CHECKSUM(NEWID()))%2 = 1 THEN 'FXMM' ELSE 'N' END,

    volume = ABS(CHECKSUM(NEWID()))%20,

    cost = ABS(CHECKSUM(NEWID()))%30

    INTO #Exception_main

    FROM (

    SELECT ID, ReportDate = DATEADD(MINUTE, 0-ID/20,GETDATE())

    FROM (

    SELECT TOP(14500000) -- 00:04:02 / 14 000 000

    ID = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c, SYS.COLUMNS d, SYS.COLUMNS e

    ) d

    ) e

    IF OBJECT_ID('tempdb..#Map_Exception') IS NOT NULL DROP TABLE #Map_Exception

    SELECT TOP(420)

    QueueID = CAST(ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS VARCHAR(40)),

    [Service] = ABS(CHECKSUM(NEWID()))%100

    INTO #Map_Exception

    FROM SYS.COLUMNS a, SYS.COLUMNS b, SYS.COLUMNS c

    CREATE UNIQUE CLUSTERED INDEX ucx_Map_Exception ON #Map_Exception (QueueID)

    This takes about four minutes to run on a steam-powered dev box.

    Having built the data, I had a play with indexes. To be sure that the results weren't skewed by other processes, I ran through the whole lot four times.

    Here's the query:

    SET STATISTICS IO, TIME ON

    SELECT m.reportMonth,

    m.reportYear,

    ex.[service],

    SUM(m.Vol) AS Vol,

    SUM(m.Effort) AS effort

    FROM (

    SELECT reportMonth,

    reportYear,

    queueID,

    SUM(volume) AS Vol,

    SUM(cost) AS Effort

    FROM #exception_Main -- 14.5M rows

    WHERE exceptionDateTime >= GETDATE() - 365

    AND productArea = ('FXMM')

    AND reportable = 'Y'

    GROUP BY

    reportYear,

    reportMonth,

    queueID

    ) m

    LEFT JOIN #map_Exception ex -- 420 rows

    ON m.queueID = ex.queueID

    GROUP BY

    m.reportYear,

    m.reportMonth,

    ex.[service]

    SET STATISTICS IO, TIME OFF

    On this particular set the query returned 1,287 aggregated rows from 2,633,895 qualifying rows.

    Here are the summarised results from the indexing tests:

    --====================================================================================================

    -- 1. Baseline

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([exceptionID])

    CREATE NONCLUSTERED INDEX [idx_ctsTrend] ON #exception_Main

    ([productArea] ASC, [reportable] ASC, [exceptionDateTime] ASC)

    INCLUDE ([queueID], [cost], [reportMonth], [reportYear], [volume])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 9,446,096 KB

    --Data = 4,640,032 KB

    --Index_size= 4,805,352 KB

    -- Best result from 6 runs: logical reads 131639, elapsed time = 656 ms.

    -- Index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    -- 2. Unique clustered index on surrogate key ID

    DROP INDEX idx_ctsTrend ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE UNIQUE CLUSTERED INDEX ucx_Sample ON #Exception_main ([ID])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,652,128 KB

    --Data = 4,640,024 KB

    --Index_size= 11,672 KB

    -- Best result from 6 runs: logical reads 581450, elapsed time = 28288 ms.

    -- Clustered index scan, hash matches for aggregates

    --====================================================================================================

    -- 2.1 Unique clustered index on surrogate key ID with supporting nonclustered index

    CREATE INDEX ix_Helper ON #Exception_main

    (productArea, reportable, reportYear, reportMonth, queueID)

    INCLUDE (volume, cost, exceptionDateTime);

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 5,393,856 KB

    --Data = 4,640,024 KB

    --Index_size= 753,208 KB

    -- Best result from 6 runs: logical reads 24503, elapsed time = 332 ms.

    -- Index seek (productarea, reportable), residual predicate for exceptionDateTime, Stream Aggregates

    --====================================================================================================

    -- 3.0 Clustered index on exceptionDateTime

    DROP INDEX ix_Helper ON #Exception_main

    DROP INDEX ucx_Sample ON #Exception_main

    CREATE CLUSTERED INDEX ucx_Sample ON #Exception_main (exceptionDateTime)

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 424515, physical reads 275, read-ahead reads 420270, elapsed time = 23276 ms.

    -- Clustered index seek (exceptionDateTime) residual predicate for productArea and Reportable, hash matches for aggregates

    --====================================================================================================

    -- 3.1 Clustered index on exceptionDateTime & recommended non-clustered index

    CREATE NONCLUSTERED INDEX ix_Recommended ON [dbo].[#Exception_main]

    ([reportable],[productArea],[exceptionDateTime])

    INCLUDE ([reportYear],[reportMonth],[QueueID],[volume],[cost])

    EXEC sp_spaceused '#exception_Main'

    --Reserved = 4,656,544 KB

    --Data = 4,640,032 KB

    --Index_size= 16,024 KB

    -- Best result from 6 runs: logical reads 16916, elapsed time = 495 ms.

    -- Clustered index seek, no residual predicate, hash matches for aggregates

    --====================================================================================================

    DROP INDEX ix_Recommended ON [dbo].[#Exception_main]

    DROP INDEX ucx_Sample ON #Exception_main

    -- back where we started, whizz around for another go to ensure results aren't skewed by local activity

    The clustered index on exceptionDateTime was only marginally faster than the surrogate key because so many rows had to be filtered by the residual predicate. For sure the ratio would change in favour of a cluster on exceptionDateTime with a smaller number of qualifying rows, but this dataset wasn't deliberately tipped in favour of a natural key, it's just a very rough approximation of a real world situation.

    xx

    That seems to prove my point: the best clustered index eliminated the need for an extra, nonclustered index with better overall performance. And, since datetime is used as a filter in (almost) every query, almost every query will perform overall better and with more consistency.

    Moreover, there's no dreaded "tipping point" when using the clustered index. Lastly, when you add one column to the query, such as including the customer in the grouping as did the original query the OP posted, there's no need to rebuild/refactor a covering index. No constant reshuffling nonclus indexes, and, inevitably, constantly increasing their size.

    Don't get me wrong. Some covering indexes will almost certainly still be needed. But they are drastically reduced.

    How is 23 seconds better than 300 milliseconds? The clustered index on your chosen column was only marginally better than a surrogate key (ID), which returned in 28 seconds.

    Busy, probably didn't read closely enough. I've tuned tens of thousands of tables and well over half the time there is a better clustered index than one on an identity column. It gives better overall performance while deleting thousands of nonclus indexes. The idea that there should be a "default" clustering index of identity is just false, period.

    "The idea that there should be a "default" clustering index of identity is just false, period." Yes, agreed. But as a very famous guy who lurks around here is fond of saying "It depends", and in this particular case, purely by accident, the identity column fares quite well. Not only that, but the clustered index of your choice is next to useless without a supporting non-clustered index. Now here's something else to think about. The best choice of clustered index, if it's to be a natural key, won't be known until the database has been live for long enough to pick up decent usage stats.

    It's not at all next to useless. Typically you don't use an entire year's worth of data at once. You're also not including the overhead of maintaining custom indexes for every query. Again, yes, a custom table built and maintained for just that specific query will almost always outperform the general table, but that's not the total part of it.

    You also didn't replicate the actual clustered index the requester created. Given their knowledge of the data, they included additional columns in the index that you ignored. As I noted above, only they would know whether to add columns beyond the datetime. But in every real-life case I've seen, datetime will be a vastly better clustered index for this table than identity, and you will save yourself duplicating the entire table -- or more -- in added nonclus indexes.

    Typically you'd distinguish between OLTP and reporting databases before making sweeping generalisations 😀

    The clustered index I created is exactly the same as the OP posted DDL for:

    CONSTRAINT [idx_ctsException] PRIMARY KEY CLUSTERED

    (

    [exceptionID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]

    ) ON [PRIMARY]

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • You've probably already considered this, but if you do go the route of summary tables, then perform incremental builds. In other words, if the fact table is inserted with daily transaction records related only to period July 2015, then that day's summary build process should delete and re-aggregate only records for July 2015, rather than truncating the summary table and re-aggregating the entire fact table.

    Coincidentally at this very moment I'm taking a break from refactoring a legacy process that wasn't originally coded that way. Based on some initial unit tests, it's looking like the nightly summary build process will shrink from 12 hours to less than one hour. In this case I'm dealing with 10 TB of fact tables with a daily ingest rate of a few GB, so it's amazing how much I/O has been wasted re-processing the same data day after day.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply