The "Numbers" or "Tally" Table: What it is and how it replaces a loop.

Question

Post reply

The "Numbers" or "Tally" Table: What it is and how it replaces a loop.

Viewing 15 posts - 31 through 45 (of 497 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 1

You folks are absolutely the Best! I'm humbled by your awesome compliments and the testing that alot of you have taken the time to do and post.

I've got a lot of replies to make to all of you and I'll jump on those tonight... contrary to popular myth, I actually do work between answering posts 😛 Thanks again, folks! Keep 'em coming! 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 2

What - only 12 useful tips? in a single article?:hehe:

Stop it now - you're setting the bar too high for the rest of us....

(superlative as usual).....

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Marcus Grasshopper Points: 22 More actions · Answer 3

I am using the Tally table explanation from this article to parse out an incoming string that consists of whole records. By design, I know that the record has 3 values: an int, a datetime, and a float. What I need to get out are sets of those records parsed from the input string. The following SQL works fine (but I have questions):

[font="Courier New"]declare @param varchar(8000)

set @param = '101,2008.04.02,1.1,102,2008.04.03,2.2,103,2008.04.04,3.3'

declare @result table(idx int identity(1,1), val int, ord datetime, fval float)

declare @raw table (idx int identity(1,1), val varchar(50))

set @param = ',' + @param + ','

insert into @raw (val)

select substring(@param, N+1, charindex(',',@param,N+1)-N-1)

from dbo.Tally where N<len(@param) and substring(@param,N,1)=','

insert into @result(val,ord,fval)

select cast(R1.val as int), convert(datetime, R2.val, 102), convert(float, R3.val)

from dbo.Tally T

join @raw R1 on T.N=R1.idx

join @raw R2 on T.N+1=R2.idx

join @raw R3 on T.N+2=R3.idx

where (T.N-1)%3 = 0

select * from @result[/font]

As you can see I am parsing into a raw table that is varchar data with an index, then using that table to move sets of 3 items into a new table with the correct data type conversion.

My questions are:

1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?

2) If I remove the identity index from the @result table this stops working. The data conversions fail unless the @result table has the first column as an identity. All I do is insert the 3 converted items into @result, so I can't even begin to imagine why I need the index column. Does anyone else know why that happens?

Thanks for the extremely interesting and useful article, and for any assistance with these questions.

curbina SSC-Addicted Points: 411 More actions · Answer 4

curbina

SSC-Addicted

Points: 411

May 7, 2008 at 8:45 am

#811850

Thanks for taking the time to explain this, Jeff. It was very helpful.

Wayne West SSC-Insane Points: 22586 More actions · Answer 5

Carl Federl (5/7/2008)

Another great article ! Thanks

The first reference for a tally table that I can recall is in "Guide to Sybase and SQL Server" by C. J. Date and D.McGoveran published in June of 1992.

Does anyone know of an earlier reference ?

Carl, what is the page number of that? As it happens, I have the '92 edition in front of me (no idea why!) and couldn't find it readily.

It was a great book for its time, it was one of the first two books that I bought when we got 4.21a in the door.

Thanks!

-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

Wayne West SSC-Insane Points: 22586 More actions · Answer 6

Excellent article, Jeff! I came across an interesting date table article on Tech Republic that was linked here yesterday after it re-surfaced at BuilderAU.Com.au, I updated the loader to run from a tally table and the speed diff was just amazing.

I have not yet gotten in to string parsing with a tally table, but I'll get there eventually. I did have an amusing little time re-writing the Bizz/Buzz test using it:

[font="Courier New"]select case when (Number % 15 = 0) then 'BizzBuzz'

when (Number % 3 = 0) then 'Bizz'

when (Number % 5 = 0) then 'Buzz'

else cast(Number as char)

end

from zNumbers

where Number < 100[/font]

Keep up the good work, Jeff!

-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

James Goodwin Hall of Fame Points: 3691 More actions · Answer 7

For the use of a tally table or other mechanism, populating a variable with sample data like the above is completely wrong! Those delimited values would never be passed by an application. The proper approach for generating sample data is to use RANDOM values.

John,

Since the only important bit of the data for the performance test is the number of commas using random data is overkill for this. I would, however, agree that using an ordered set is a mistake since you don't notice that the routine ends up sorting the results instead of returning them in the order they were passed as would be necessary for parameter passing

--

JimFive

Luke L SSC-Dedicated Points: 35545 More actions · Answer 8

Luke L

SSC-Dedicated

Points: 35545

May 7, 2008 at 8:57 am

#811858

Gotta throw one more on the pile. Greta Article Jeff. I've been using Numbers tables for a while now, but this is the best explanation I've ever seen o nthe topic.

Thanks.

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

James Goodwin Hall of Fame Points: 3691 More actions · Answer 9

1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?

Probably, but I haven't figured it out yet. One thing I would consider is, instead of splitting everything out at once. First, split everything into groups of 3 and then split the groups of 3 into your final table. That is, instead of selecting every comma, select every 3rd comma (using N%3 = 0) then use that query as the source for your actual query. This would allow you to get rid of the identity column as well since you are keeping your data grouped together. (See next response)

2) If I remove the identity index from the @result table this stops working. The data conversions fail unless the @result table has the first column as an identity. All I do is insert the 3 converted items into @result, so I can't even begin to imagine why I need the index column. Does anyone else know why that happens?

Since tables are unordered, if you remove the identity then your data won't be kept adjacent in the intermediate table.

--

JimFive

Mark Cowne One Orange Chip Points: 26865 More actions · Answer 10

Jeff Marcus (5/7/2008)

I am using the Tally table explanation from this article to parse out an incoming string that consists of whole records. By design, I know that the record has 3 values: an int, a datetime, and a float. What I need to get out are sets of those records parsed from the input string. The following SQL works fine (but I have questions):

[font="Courier New"]declare @param varchar(8000)

set @param = '101,2008.04.02,1.1,102,2008.04.03,2.2,103,2008.04.04,3.3'

declare @result table(idx int identity(1,1), val int, ord datetime, fval float)

declare @raw table (idx int identity(1,1), val varchar(50))

set @param = ',' + @param + ','

insert into @raw (val)

select substring(@param, N+1, charindex(',',@param,N+1)-N-1)

from dbo.Tally where N<len(@param) and substring(@param,N,1)=','

insert into @result(val,ord,fval)

select cast(R1.val as int), convert(datetime, R2.val, 102), convert(float, R3.val)

from dbo.Tally T

join @raw R1 on T.N=R1.idx

join @raw R2 on T.N+1=R2.idx

join @raw R3 on T.N+2=R3.idx

where (T.N-1)%3 = 0

select * from @result[/font]

As you can see I am parsing into a raw table that is varchar data with an index, then using that table to move sets of 3 items into a new table with the correct data type conversion.

My questions are:

1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while still being somewhat readable and understandable)?

2) If I remove the identity index from the @result table this stops working. The data conversions fail unless the @result table has the first column as an identity. All I do is insert the 3 converted items into @result, so I can't even begin to imagine why I need the index column. Does anyone else know why that happens?

Thanks for the extremely interesting and useful article, and for any assistance with these questions.

> 1) Do I need to use the intermediate table or is there a way to do this going straight from the parameter string to the final table (while

> still being somewhat readable and understandable)?

You're using the intermediate table to generate ordinal values. If you are using SQL Server 2005 you

can use ROW_NUMBER instead. Also you can replace the three joins with a single "group by" [in my experience this is more efficient].

with cte as (



select row_number() over(order by N) - 1 as num,

       substring(@param, N+1, charindex(',',@param,N+1)-N-1) as val

from dbo.Tally

where N<len(@param) and substring(@param,N,1)=',')

insert into @result(val,ord,fval)

select cast(max(case when num%3 = 0 then val end) as int) as val,

       convert(datetime, max(case when num%3 = 1 then val end), 102) as ord,

       convert(float, max(case when num%3 = 2 then val end)) as fval

from cte

group by num/3

select * from @result

____________________________________________________

Deja View - The strange feeling that somewhere, sometime you've optimised this query before

How to get the best help on a forum

http://www.sqlservercentral.com/articles/Best+Practices/61537

Mauve SSChampion Points: 11316 More actions · Answer 11

John,

Since the only important bit of the data for the performance test is the number of commas using random data is overkill for this.

I disagree to a bit. It also affects the amount of parsing that is being done. Using random values of varying lengths is the proper approach. In addition, the length of the string will also affect the results, particularly when you cross over the 8,000 character (4,000 for UNICODE) limit and get into [N]VARCHAR(MAX).

I've been involved in this tally table vs. a table-valued function discussion before. I did some benchmarking and found some very interesting results when the list of values was not random data and/or the length of the data was short vs. long.

I'm not knocking the Tally table. I'm just stating that proper performance testing requires the use of good random data. I'm been around the block for a long time and have seen too many instances where the application worked fine in the performance tests with "generated" data but had major performance problems when "real" data was used -- as in customer data. In the "post-mortem", we'd find that the "generated" data was of such a nature that it was "helping out" the database.

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

buktseat SSChasing Mays Points: 611 More actions · Answer 12

buktseat

SSChasing Mays

Points: 611

May 7, 2008 at 9:21 am

#811883

This is probably a rediculous question, but how can any of your "using the tally table" examples work if you don't declare N? I guess I just don't see how N is getting created/assigned? Thanks!

Hugo Kornelis SSC Guru Points: 64748 More actions · Answer 13

buktseat (5/7/2008)

This is probably a rediculous question, but how can any of your "using the tally table" examples work if you don't declare N? I guess I just don't see how N is getting created/assigned? Thanks!

N is a column (or rather: the column) in the Tally table.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/