The Numbers Table

  • Jeff Moden (11/24/2008)


    jacroberts (11/24/2008)


    ... queried the table with 4000 rows in less than 1 second.

    That's pretty slow, and performance should always be an issue...

    With all due respect, Jeff, it really does matter what work environment you're in. I have to agree with jacroberts that 'query' performance is not always the issue. In many cases, worker performance is more important than query performance. If my boss knew I was spending time trying to make a query that already runs in less than a second go even faster, I'm guessing he'd be disappointed in my use of time.

  • Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    --
    Adam Machanic
    whoisactive
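
    For illustration, a minimal sketch of the inline TVF approach might look like the following. The function name dbo.GetNums and its exact shape are assumptions for this sketch, not something from the posts; it simply wraps a stacked-CTE number generator (the same pattern shown in a batch later in this thread) in an inline table-valued function:

    CREATE FUNCTION dbo.GetNums (@n AS BIGINT)
    RETURNS TABLE
    AS
    RETURN
        WITH
        t0(c) AS (SELECT 1 UNION ALL SELECT 1),               -- 2 rows
        t1(c) AS (SELECT 1 FROM t0 AS a CROSS JOIN t0 AS b),  -- 4 rows
        t2(c) AS (SELECT 1 FROM t1 AS a CROSS JOIN t1 AS b),  -- 16 rows
        t3(c) AS (SELECT 1 FROM t2 AS a CROSS JOIN t2 AS b),  -- 256 rows
        t4(c) AS (SELECT 1 FROM t3 AS a CROSS JOIN t3 AS b),  -- 65,536 rows
        t5(c) AS (SELECT 1 FROM t4 AS a CROSS JOIN t4 AS b),  -- ~4.3 billion rows
        nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY c) FROM t5)
        SELECT TOP (@n) n
        FROM nums
        ORDER BY n;
    GO

    Usage would then be, for example, SELECT n FROM dbo.GetNums(1000000); because the function is inline, its body is folded into the calling query, so no rows need to be materialized up front.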

  • Adam Machanic (11/24/2008)


    Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    Adam Machanic: Have you ever tested the time it takes to build out the CTE with millions of numbers? I'm not sure how you use numbers tables, but we have had the need to have over 8M numbers. A physical table is faster.

  • CheeseMan316 (11/24/2008)


    Adam Machanic (11/24/2008)


    Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    Adam Machanic: Have you ever tested the time it takes to build out the CTE with millions of numbers? I'm not sure how you use numbers tables, but we have had the need to have over 8M numbers. A physical table is faster.

    Yes, I've done some pretty massive tests. You seem to be thinking about the time to insert the numbers into a table, not the time for the engine to internally produce and use the numbers; these are not the same. But if you're routinely dealing with over 8,000,000 numbers, it sounds like you have much bigger problems than performance of a CTE vs. a physical table... What are you actually using these things for?

    --
    Adam Machanic
    whoisactive

  • I can't really get into what we use it for. But rest assured, I have the need for large numbers tables. I'm not saying it's every day practice, but we have used them.

  • CheeseMan316 (11/24/2008)


    Adam Machanic (11/24/2008)


    Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    Adam Machanic: Have you ever tested the time it takes to build out the CTE with millions of numbers? I'm not sure how you use numbers tables, but we have had the need to have over 8M numbers. A physical table is faster.

    I frequently use the inline TVF approach that Adam described, and I have done lots of performance tests. In the vast majority of cases I have seen, the TVF outperforms the physical table; the optimizer seems very good at inlining the TVF into the containing query. (In showplan view you see this readily.) Also, I have found that the optimizer is generally very good at detecting opportunities to apply row-by-row filtering strategically into set-based operations; thus, even when the TVF is set to generate billions of rows, only the rows needed are actually generated (and not sent to a temp table).

    Where I have seen this approach break down is when the containing query is very complex, or when there are multiple CTEs or TVFs; I suspect that the optimizer has some strict limits on how much complexity it can work on, or perhaps a ceiling on time it is willing to invest. Curiously, I have also found that when the same expression is stuffed into a view, the optimizer works harder....


    The End.
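
    A hedged way to see the "only the rows needed are generated" behaviour described above, reusing the assumed dbo.GetNums sketch from earlier in the thread, is to ask for a nominally huge range but consume only a slice, and then check the actual row counts in the execution plan:

    -- Request a huge range but take only the first 5,000 values.
    -- In the actual plan, the cross joins typically produce only the rows
    -- needed to satisfy the TOP, not the full billion-row range.
    SELECT TOP (5000) n
    FROM dbo.GetNums(1000000000)
    ORDER BY n;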

  • jacroberts (11/24/2008)


    Performance isn't always an issue as in my previous example.

    I have to disagree. Even if the performance of that particular query "doesn't matter", you have to ask why you'd deliberately use something that takes more server resources and runs longer than an alternative that takes less and is faster, in any case.

    Sure, in the middle of the night, a query that takes less than a second is no big deal. But, you can't re-use that query for something that will run during the day; other midnight tasks are slowed down by just that much; you end up with the habit of "it doesn't matter"; and so on.

    Why ever use something worse when using something better is just as easy or easier and has no higher cost?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Itzik uses RECURSION, even faster.

    With nums as
    (
        Select 1 as n
        Union all
        Select n+1 from nums where n < 10
    )
    Select n
    From nums
  • As long as you don't need to generate more than 100 numbers, that will work.

    Otherwise, you will get:

    Msg 530, Level 16, State 1, Line 4
    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
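
    For reference, that cap is a query hint rather than a hard limit: OPTION (MAXRECURSION 0) removes it entirely, and explicit values can go up to 32,767. A minimal sketch (the 1,000-row target is just an example):

    WITH nums AS
    (
        SELECT 1 AS n
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < 1000
    )
    SELECT n
    FROM nums
    OPTION (MAXRECURSION 0);  -- or a specific value, e.g. OPTION (MAXRECURSION 1000)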

  • jeffery.baynard (11/24/2008)


    Itzik uses RECURSION, even faster.

    With nums as
    (
        Select 1 as n
        Union all
        Select n+1 from nums where n < 10
    )
    Select n
    From nums

    Sorry, but he doesn't. Recursion is actually much, much slower and often involves intermediate serialization to TempDB. Try the following two batches:

    With nums as
    (
        Select 0 as n
        Union all
        Select n+1 from nums where n < 999999
    )
    Select n
    From nums
    option (maxrecursion 0)

    select internal_objects_alloc_page_count
    from sys.dm_db_task_space_usage
    where session_id = @@spid
    go

    with t0(n) as (
        select 1 union all select 1
    ), t1(n) as (
        select 1 from t0 as a, t0 as b
    ), t2(n) as (
        select 1 from t1 as a, t1 as b
    ), t3(n) as (
        select 1 from t2 as a, t2 as b
    ), t4(n) as (
        select 1 from t3 as a, t3 as b
    ), t5(n) as (
        select 1 from t4 as a, t4 as b
    ), Numbers(n) as (
        select row_number() over (order by n) as n
        from t5
    )
    select n - 1 as n
    from Numbers
    where n <= 1000000

    select internal_objects_alloc_page_count
    from sys.dm_db_task_space_usage
    where session_id = @@spid
    go

    --
    Adam Machanic
    whoisactive

  • jeffery.baynard (11/24/2008)


    Itzik uses RECURSION, even faster.

    With nums as
    (
        Select 1 as n
        Union all
        Select n+1 from nums where n < 10
    )
    Select n
    From nums

    I changed the 10 to 1000 and got this error message:

    Msg 530, Level 16, State 1, Line 1
    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

    This might work for my "I" section of the CTE I supplied, but I doubt that it works beyond "99"; that was where the results appeared to die.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • As Adam Machanic pointed out, you can set MAXRECURSION, but one-by-one recursion won't be very fast.

  • Adam Machanic (11/24/2008)


    Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    I've done extensive testing on Numbers table performance, and a 10,000-row numbers table performs better for things in that range (on my machines), while a series of cross joins of a 1,000-row table against itself performs better for higher ranges.

    I've tested it against the CTEs listed, and these two solutions performed better in every test I could come up with.

    For example:

    select row_number() over (order by t1.number, t2.number)
    from dbo.smallnumbers t1 -- table with numbers 1 to 1,000
    cross join dbo.smallnumbers t2

    This will generate the numbers 1 to 1-million faster than any other method I've tested.

    Add another cross join, and get 1-billion very rapidly.

    Limit one of the cross joins, and get more specific multiples of 1-million.

    select row_number() over (order by t1.number, t2.number, t3.number)
    from dbo.smallnumbers t1 -- table with numbers 1 to 1,000
    cross join dbo.smallnumbers t2
    cross join dbo.smallnumbers t3
    where t3.number between 1 and 10

    This will give 1 to 10-million.

    Add a "top X" to the select to get even more specific.

    select top (7158961) row_number() over (order by t1.number, t2.number, t3.number)
    from dbo.smallnumbers t1 -- table with numbers 1 to 1,000
    cross join dbo.smallnumbers t2
    cross join dbo.smallnumbers t3
    where t3.number between 1 and 8

    This will give numbers between 1 and 7,158,961 (to pull a meaningless number out of thin air).

    In my case, because I use my Numbers table for DATEADD functions more often than just about anything else (a sketch of that use follows after this post), I start them at 0 instead of at 1, but keep the number of rows specific to these kinds of functions. Thus, I have one with 0-999 and another with 0-9999, called SmallNumbers and Numbers, respectively.

    Another slight twist that I've found quite useful is to keep things like this in a database I call Common, which I then keep one copy of on each server I administer. That way, I don't have to have a Numbers table in every database. Since I don't like adding things like this to the master database, Common works quite well. (Some other things I keep in there are a calendar table, tables with the US states, counties, and ZIP codes, and a series of functions for string parsing, formatting phone numbers, custom date and time formats, and some boilerplate procs, like one for error handling in CATCH blocks.) Every database can then use this as a "code library".

    As always, the performance testing I've done has been in specific environments. Test it before you take my word for it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • So it would seem that there are two debates going on here. The first is whether or not a numbers table SHOULD exist at all, and the second is the best method for creating it.

    IMHO: The need for a numbers table is up to the environment. There are obvious circumstances where it is not needed. And I'm not saying that everyone needs one. But I need one. I've come up with several uses for a numbers table.

    The second is what is the best way to generate the table. I didn't, and don't, claim that the different ways I showed to create the table are the only ways. I just wanted to show a few different methods and what is good and bad about each!

    B

  • Joe Celko (11/24/2008)


    Here is another method, which uses a table of digits and some math:

    CREATE TABLE Sequence (seq INTEGER NOT NULL PRIMARY KEY);

    WITH Digits (i)
    AS
    (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9))
    INSERT INTO Sequence (seq)
    SELECT 1 + D0.i
         + (10 * D1.i)
         + (100 * D2.i)
         + (1000 * D3.i)
         + (10000 * D4.i)
         + (100000 * D5.i)
         + (1000000 * D6.i)
    FROM Digits AS D0, Digits AS D1, Digits AS D2, Digits AS D3,
         Digits AS D4, Digits AS D5, Digits AS D6;

    Cute -- but I can't get VALUES to work properly in a CTE in SQL Server 2008. Have you tested this there? I'm not sure why it doesn't work as-is; it seems to work just fine in a derived table, or in the following reconstruction of your code:

    WITH Digits (i)
    AS
    (SELECT i FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) x(i))
    INSERT INTO Sequence (seq)
    SELECT 1 + D0.i
         + (10 * D1.i)
         + (100 * D2.i)
         + (1000 * D3.i)
         + (10000 * D4.i)
         + (100000 * D5.i)
         + (1000000 * D6.i)
    FROM Digits AS D0, Digits AS D1, Digits AS D2, Digits AS D3,
         Digits AS D4, Digits AS D5, Digits AS D6;

    --
    Adam Machanic
    whoisactive
