The "Numbers" or "Tally" Table: What it is and how it replaces a loop.

  • Hugo Kornelis (9/24/2009)


    The error occurs because SQL Server has to calculate the result of the DATEADD expression for each row in the Tally table, and with your extra huge amount of rows, this causes overflow at some point. But even without that, you are ruining performance.

    Try changing the WHERE clause to

    WHERE t.N <= DATEDIFF(hh, @DateStart, @DateEnd) / 8 + 1

    Oh and by the way - creating the tally table in tempdb is not a good option. The tempdb database is dropped and recreated every time the server restarts. You should either create it in a permanent database (maybe you have a single central database for objects used by all other databases?), or you can create it in the model database so that it will henceforth exist in every newly created database (including tempdb after every server restart).

    Heh... sorry Hugo... started reading from the bottom up... Yes, that's the problem. With a large Tally table, the original code is a lot less than optimized because I put a formula around t.N in the WHERE clause. Every value in the Tally table had to be evaluated and, of course a 16 million row Tally table is going to drive that formula over the edge. Your formula is the correct way to over come that problem.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • DavidL (9/24/2009)


    A reasonably sized (12000 rows) tally table took care of

    DECLARE @DateStart DATETIME

    DECLARE @DateEnd DATETIME

    --===== Find the min and max dates in the range of data

    SELECT @DateStart = MIN(SalesDate),

    @DateEnd = MAX(SalesDate)

    FROM dbo.vwTableau_GP_Shipments

    PRINT CONVERT(varchar(25),@datestart)

    PRINT CONVERT(varchar(25),@dateend)

    SELECT t.N-1+@DateStart AS ShippedDate

    FROM tempdb.dbo.Tally t

    WHERE t.N-1+@DateStart <= @DateEnd

    not running correctly.

    See the post just before this one. I had a less than optimal WHERE clause in the article. Hugo's WHERE clause will solve the problem even for a 16M row Tally table. It will also greatly improve the performance because his change will allow for an INDEX SEEK instead of scanning the whole table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Almost forgot... if you're just playing with a Tally table, then TempDB is fine. If you want one for a production table, then Hugo is correct there... TempDB is a bad place for it. Find a permanent database you can put it in. I generally keep these kinds of aux tables in a UTIL database so I don't have to back it up in every database.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi guys: Thanks for hanging in there through all my bumbling. I'm not a sql server pro, and as a result find myself in over my head quite often. Jeff, I'm sure you never expected anyone to up and create a tally table with 16m records, but, there I was with a slow afternoon on my hands...

    In your write up, if you could devote a few sentences to how to size a tally table, I would appreciate it. It seems to me that it needs to be as big as the record set you expect to deal with (a 12K tally table can only substitute for a cursor that counts to 12K, no? If I have to count to 15K, I'm out of luck?). If that is a misconception, I'd like to hear why.

    Also, for what it's worth, as I examined the 16m tally table yesterday (since dropped) it appeared that the values in it did not ascend steadily. It went up to 4000 or so, then the next value jumped to 20k, from which it stepped up to 25K or so, then it went to some other value like 5K, etc. It occurred to me that perhaps that was one reason why your sample queries failed with my large tally table, because row number 4001 ended up using an 'N' of 20K. At any rate, thanks everyone for lending your time and expertise. D. Lewis

  • DavidL (9/25/2009)


    Hi guys: Thanks for hanging in there through all my bumbling. I'm not a sql server pro, and as a result find myself in over my head quite often. Jeff, I'm sure you never expected anyone to up and create a tally table with 16m records, but, there I was with a slow afternoon on my hands...

    In your write up, if you could devote a few sentences to how to size a tally table, I would appreciate it. It seems to me that it needs to be as big as the record set you expect to deal with (a 12K tally table can only substitute for a cursor that counts to 12K, no? If I have to count to 15K, I'm out of luck?). If that is a misconception, I'd like to hear why.

    Also, for what it's worth, as I examined the 16m tally table yesterday (since dropped) it appeared that the values in it did not ascend steadily. It went up to 4000 or so, then the next value jumped to 20k, from which it stepped up to 25K or so, then it went to some other value like 5K, etc. It occurred to me that perhaps that was one reason why your sample queries failed with my large tally table, because row number 4001 ended up using an 'N' of 20K. At any rate, thanks everyone for lending your time and expertise. D. Lewis

    Hi David,

    As I already said before - I guess that if you had examined your tally table with an ORDER BY clause in the query, you would have seen all numbers.

    In principle, there is no reason for large tally tables to work. If you change your query as I suggested, SQL Server can weed out the high numbers before starting the date artihmetic and you won't get errors.

    The reason that you got the results out of order is probably that SQL Server realised you wanted all rows from a pretty large table, and to save time it decided to create a parallel plan - that means that every one of your processors was busy scanning a part of the table. And since no ORDER BY was requested, the results were just returned to the client in the order that the processors produced them.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • In principle, there is no reason for large tally tables to work. If you change your query as I suggested, SQL Server can weed out the high numbers before starting the date artihmetic and you won't get errors.

    The reason that you got the results out of order is probably that SQL Server realised you wanted all rows from a pretty large table, and to save time it decided to create a parallel plan - that means that every one of your processors was busy scanning a part of the table. And since no ORDER BY was requested, the results were just returned to the client in the order that the processors produced them.

    Hi Hugo: Did you mean to write "... there is no reason for large tally tables NOT to work"?

    Also, in order to ensure that the queries that use a tally table work, wouldn't I need to add an ORDER BY clause every time I use it? As in:

    SELECT (t.N-1)%3 + 1 AS ShiftNumber,

    DATEADD(hh,8*(t.N-1),@DateStart) AS ShiftStart,

    DATEADD(hh,8*(t.N ),@DateStart) AS ShiftEnd

    FROM tempdb.dbo.Tally t

    ORDER BY t.N ASC

    WHERE DATEADD(hh,8*(t.N-1),@DateStart) <= @DateEnd

    David

  • DavidL (9/25/2009)


    Jeff, I'm sure you never expected anyone to up and create a tally table with 16m records, but, there I was with a slow afternoon on my hands...

    Heh... it's not that at all. The real fact of the matter is that I screwed up on that particular query in the article. I've actually had to create a Billion row Tally table for some folks to do some experiments with (some kind of chemistry experiment, don't know the details) so 16M isn't that bad.

    I did previously include some rules for a general purpose Tally table in the article. There are also some fine methods for creating a Tally CTE based on the methods Ben-Gan wrote about in his book that, when properly written, can be every bit as fast (and sometimes faster on the really big stuff) as a permanent Tally table.

    Hugo's observations on the use of ORDER BY are important.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • DavidL (9/25/2009)


    In principle, there is no reason for large tally tables to work. If you change your query as I suggested, SQL Server can weed out the high numbers before starting the date artihmetic and you won't get errors.

    The reason that you got the results out of order is probably that SQL Server realised you wanted all rows from a pretty large table, and to save time it decided to create a parallel plan - that means that every one of your processors was busy scanning a part of the table. And since no ORDER BY was requested, the results were just returned to the client in the order that the processors produced them.

    Hi Hugo: Did you mean to write "... there is no reason for large tally tables NOT to work"?

    Also, in order to ensure that the queries that use a tally table work, wouldn't I need to add an ORDER BY clause every time I use it? As in:

    SELECT (t.N-1)%3 + 1 AS ShiftNumber,

    DATEADD(hh,8*(t.N-1),@DateStart) AS ShiftStart,

    DATEADD(hh,8*(t.N ),@DateStart) AS ShiftEnd

    FROM tempdb.dbo.Tally t

    ORDER BY t.N ASC

    WHERE DATEADD(hh,8*(t.N-1),@DateStart) <= @DateEnd

    David

    Hi David,

    Yes - that's what I meant to write. As Jeff indicates, you can make your tally table as large as you want to / need it to be.

    And no - there is no reason to use ORDER BY every time you use the tally table. There is reason to use ORDER BY every time you expect ordered results. If you see a gap in the data from a query that uses an ORDER BY, you have uncovered either missing rows or a SQL Server bug. If you see the same gap from a query that does not use ORDER BY, it might be the result of missing rows, but more probable is that those rows are just somewhere else in the result set. The sequence 1, 2, 4, 5 misses the number 3, but the set 1, 2, 4, 5, 3 does not (even though it appears to be missing before you've read to the end).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • And no - there is no reason to use ORDER BY every time you use the tally table. There is reason to use ORDER BY every time you expect ordered results. If you see a gap in the data from a query that uses an ORDER BY, you have uncovered either missing rows or a SQL Server bug. If you see the same gap from a query that does not use ORDER BY, it might be the result of missing rows, but more probable is that those rows are just somewhere else in the result set. The sequence 1, 2, 4, 5 misses the number 3, but the set 1, 2, 4, 5, 3 does not (even though it appears to be missing before you've read to the end).

    OK, thanks.

    However, yesterday the following crashed and burned:

    DECLARE @DateStart DATETIME

    DECLARE @DateEnd DATETIME

    SELECT @DateStart = '2008-01-01 06:00',

    @DateEnd = DATEADD(yy,5,@DateStart)

    --===== Display the shift number and date/times

    SELECT (t.N-1)%3 + 1 AS ShiftNumber,

    DATEADD(hh,8*(t.N-1),@DateStart) AS ShiftStart,

    DATEADD(hh,8*(t.N ),@DateStart) AS ShiftEnd

    FROM tempdb.dbo.Tally t

    WHERE DATEADD(hh,8*(t.N-1),@DateStart) <= @DateEnd

    Up until 2011-12-15 22:00:00 all was fine, then sql server threw the following error:

    Msg 517, Level 16, State 1, Line 7

    Adding a value to a 'datetime' column caused overflow.

    I assumed this was caused by the query running into an extremely large N. According to ms, "the number argument [for DATEADD] cannot exceed the range of int" so 8*N must be greater than 2147483647. So far so good.

    But, given a tally table with values in ascending order starting at "1", a query such as that above (with those start and end dates) would never encounter an N such that 8*N would exceed 2147483647. However using a 16m row tally table, the query DOES encounter that problem, and did throw the error. So, it seems to me that some kind of ORDER BY clause is needed, or, failing that, an orderly tally table is needed. dl

  • DavidL (9/25/2009)


    And no - there is no reason to use ORDER BY every time you use the tally table. There is reason to use ORDER BY every time you expect ordered results. If you see a gap in the data from a query that uses an ORDER BY, you have uncovered either missing rows or a SQL Server bug. If you see the same gap from a query that does not use ORDER BY, it might be the result of missing rows, but more probable is that those rows are just somewhere else in the result set. The sequence 1, 2, 4, 5 misses the number 3, but the set 1, 2, 4, 5, 3 does not (even though it appears to be missing before you've read to the end).

    OK, thanks.

    However, yesterday the following crashed and burned:

    DECLARE @DateStart DATETIME

    DECLARE @DateEnd DATETIME

    SELECT @DateStart = '2008-01-01 06:00',

    @DateEnd = DATEADD(yy,5,@DateStart)

    --===== Display the shift number and date/times

    SELECT (t.N-1)%3 + 1 AS ShiftNumber,

    DATEADD(hh,8*(t.N-1),@DateStart) AS ShiftStart,

    DATEADD(hh,8*(t.N ),@DateStart) AS ShiftEnd

    FROM tempdb.dbo.Tally t

    WHERE DATEADD(hh,8*(t.N-1),@DateStart) <= @DateEnd

    Up until 2011-12-15 22:00:00 all was fine, then sql server threw the following error:

    Msg 517, Level 16, State 1, Line 7

    Adding a value to a 'datetime' column caused overflow.

    I assumed this was caused by the query running into an extremely large N. According to ms, "the number argument [for DATEADD] cannot exceed the range of int" so 8*N must be greater than 2147483647. So far so good.

    But, given a tally table with values in ascending order starting at "1", a query such as that above (with those start and end dates) would never encounter an N such that 8*N would exceed 2147483647. However using a 16m row tally table, the query DOES encounter that problem, and did throw the error. So, it seems to me that some kind of ORDER BY clause is needed, or, failing that, an orderly tally table is needed. dl

    That's already been explained and Huge made the change to fix that. The problem was that t.N was used in a calculation in the predicate. Hugo's good code fixed that. Have you tried Hugo's fix?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • No, I didn't (at least not on using the large tally table). Yesterday I dropped the large tally tables and recreated small ones. dl

  • That's a shame... Hugo's code fixes the problem and you would have seen it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • DavidL (9/25/2009)


    And no - there is no reason to use ORDER BY every time you use the tally table. There is reason to use ORDER BY every time you expect ordered results. If you see a gap in the data from a query that uses an ORDER BY, you have uncovered either missing rows or a SQL Server bug. If you see the same gap from a query that does not use ORDER BY, it might be the result of missing rows, but more probable is that those rows are just somewhere else in the result set. The sequence 1, 2, 4, 5 misses the number 3, but the set 1, 2, 4, 5, 3 does not (even though it appears to be missing before you've read to the end).

    OK, thanks.

    However, yesterday the following crashed and burned:

    DECLARE @DateStart DATETIME

    DECLARE @DateEnd DATETIME

    SELECT @DateStart = '2008-01-01 06:00',

    @DateEnd = DATEADD(yy,5,@DateStart)

    --===== Display the shift number and date/times

    SELECT (t.N-1)%3 + 1 AS ShiftNumber,

    DATEADD(hh,8*(t.N-1),@DateStart) AS ShiftStart,

    DATEADD(hh,8*(t.N ),@DateStart) AS ShiftEnd

    FROM tempdb.dbo.Tally t

    WHERE DATEADD(hh,8*(t.N-1),@DateStart) <= @DateEnd

    Up until 2011-12-15 22:00:00 all was fine, then sql server threw the following error:

    Msg 517, Level 16, State 1, Line 7

    Adding a value to a 'datetime' column caused overflow.

    I assumed this was caused by the query running into an extremely large N. According to ms, "the number argument [for DATEADD] cannot exceed the range of int" so 8*N must be greater than 2147483647. So far so good.

    But, given a tally table with values in ascending order starting at "1", a query such as that above (with those start and end dates) would never encounter an N such that 8*N would exceed 2147483647. However using a 16m row tally table, the query DOES encounter that problem, and did throw the error. So, it seems to me that some kind of ORDER BY clause is needed, or, failing that, an orderly tally table is needed. dl

    Adding on to Jeff's great responses (Jeff, have you ever considered a career in tag team wrestling? ;-)), you are partly right in your assessment that the very large values of N cause the error (though the error message indicates that the problem is not caused when 8*N exceeds the maximum integer number, but when DATEADD(hh,8*(t.N-1),@DateStart) exceeds the maximum permittable datetime value, i.e. Dec 31st, 9999.

    However, adding the ORDER BY will not help. It is true that, for some value of N, SQL Server will find the result of the DATEADD expression to exceed @DateEnd. But there is nothing in the query to indicate that there will not be a row found later that does not exceed this value. You and I both know, because of the nature of the function used, that this can't happen if the input is processed in ascending order, but SQL Server doesn't do that logic. It will still evaluate the DATEADD expression for each value of N and compare it against @DateEnd. Until it runs in the date overflow error. It might take a bit longer before it errors out, but that's the only difference you'll find.

    I do hope that you tested my query against your smaller table anyway. (And make sure I got the calculation right for getting the exact boundaries you need - I may be off a tiny bit). Your version still forces SQL Server to process each row, and do the datetime maths for each row. My version enables SQL Server to process only the rows that will actually be needed. Even with a smaller tally table, that will not cause errors, you'll probably find that my version is significantly faster than your original code.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Got it! thanks guys!

  • Jeff, great job, I like your articles.

    I have discovered the Tally concept reading the Transact-SQL by Ales Spetic & Jonathan Gennick. They call it Pivot.

    I think that you should add 0 value to Tally, to generate an array of dates including a starting date:

    declare @interval int --days number from pay to pay

    declare @myDate char(8) --the first pay date

    select @interval = 14

    , @myDate = '20090925'

    select convert(char(8),dateadd(d, @interval * n, @myDate),112) payDay

    from Tally

    where n between 0 and 25 --will cover the year ahead

    Waiting for your next articles, cheers

    Greg

Viewing 15 posts - 271 through 285 (of 497 total)

You must be logged in to reply to this topic. Login to reply