The Numbers Table

  • Jeff Moden (11/24/2008)


    jacroberts (11/24/2008)


    ... queried the table with 4000 rows in less than 1 second.

    That's pretty slow, and performance should always be an issue...

    With all due respect, Jeff, it really does matter what work environment you're in. I have to agree with jacroberts that 'query' performance is not always the issue. In many cases, worker performance is more important than query performance. If my boss knew I was spending time trying to make a query that already runs in less than a second go even faster, I'm guessing he'd be disappointed in my use of time.

  • Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    --
    Adam Machanic
    whoisactive
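
    For illustration, a minimal sketch of the inline TVF approach might look like the following. The function name dbo.GetNums and its exact shape are assumptions for this sketch, not something from the posts; it simply wraps a stacked-CTE number generator (the same pattern shown in a batch later in this thread) in an inline table-valued function:

    CREATE FUNCTION dbo.GetNums (@n AS BIGINT)
    RETURNS TABLE
    AS
    RETURN
        WITH
        t0(c) AS (SELECT 1 UNION ALL SELECT 1),               -- 2 rows
        t1(c) AS (SELECT 1 FROM t0 AS a CROSS JOIN t0 AS b),  -- 4 rows
        t2(c) AS (SELECT 1 FROM t1 AS a CROSS JOIN t1 AS b),  -- 16 rows
        t3(c) AS (SELECT 1 FROM t2 AS a CROSS JOIN t2 AS b),  -- 256 rows
        t4(c) AS (SELECT 1 FROM t3 AS a CROSS JOIN t3 AS b),  -- 65,536 rows
        t5(c) AS (SELECT 1 FROM t4 AS a CROSS JOIN t4 AS b),  -- ~4.3 billion rows
        nums(n) AS (SELECT ROW_NUMBER() OVER (ORDER BY c) FROM t5)
        SELECT TOP (@n) n
        FROM nums
        ORDER BY n;
    GO

    Usage would then be, for example, SELECT n FROM dbo.GetNums(1000000); because the function is inline, its body is folded into the calling query, so no rows need to be materialized up front.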

  • Adam Machanic (11/24/2008)


    Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    Adam Machanic: Have you ever tested the time it takes to build out the CTE with millions of numbers? I'm not sure how you use numbers tables, but we have had the need to have over 8M numbers. A physical table is faster.

  • CheeseMan316 (11/24/2008)


    Adam Machanic (11/24/2008)


    Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    Adam Machanic: Have you ever tested the time it takes to build out the CTE with millions of numbers? I'm not sure how you use numbers tables, but we have had the need to have over 8M numbers. A physical table is faster.

    Yes, I've done some pretty massive tests. You seem to be thinking about the time to insert the numbers into a table, not the time for the engine to internally produce and use the numbers; these are not the same. But if you're routinely dealing with over 8,000,000 numbers, it sounds like you have much bigger problems than performance of a CTE vs. a physical table... What are you actually using these things for?

    --
    Adam Machanic
    whoisactive

  • I can't really get into what we use it for. But rest assured, I have the need for large numbers tables. I'm not saying it's every day practice, but we have used them.

  • CheeseMan316 (11/24/2008)


    Adam Machanic (11/24/2008)


    Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    Adam Machanic: Have you ever tested the time it takes to build out the CTE with millions of numbers? I'm not sure how you use numbers tables, but we have had the need to have over 8M numbers. A physical table is faster.

    I frequently use the inline TVF approach that Adam described, and I have done lots of performance tests. In the vast majority of cases I have seen, the TVF outperforms the physical table; the optimizer seems very good at inlining the TVF into the containing query. (In showplan view you see this readily.) Also, I have found that the optimizer is generally very good at detecting opportunities to apply row-by-row filtering strategically into set-based operations; thus, even when the TVF is set to generate billions of rows, only the rows needed are actually generated (and not sent to a temp table).

    Where I have seen this approach break down is when the containing query is very complex, or when there are multiple CTEs or TVFs; I suspect that the optimizer has some strict limits on how much complexity it can work on, or perhaps a ceiling on time it is willing to invest. Curiously, I have also found that when the same expression is stuffed into a view, the optimizer works harder....


    The End.
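
    A hedged way to see the "only the rows needed are generated" behaviour described above, reusing the assumed dbo.GetNums sketch from earlier in the thread, is to ask for a nominally huge range but consume only a slice, and then check the actual row counts in the execution plan:

    -- Request a huge range but take only the first 5,000 values.
    -- In the actual plan, the cross joins typically produce only the rows
    -- needed to satisfy the TOP, not the full billion-row range.
    SELECT TOP (5000) n
    FROM dbo.GetNums(1000000000)
    ORDER BY n;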

  • jacroberts (11/24/2008)


    Performance isn't always an issue as in my previous example.

    I have to disagree. Even if the performance of that particular query "doesn't matter", you have to ask why you'd deliberately use something that takes more server resources and runs longer than an alternative that takes less and is faster, in any case.

    Sure, in the middle of the night, a query that takes less than a second is no big deal. But, you can't re-use that query for something that will run during the day; other midnight tasks are slowed down by just that much; you end up with the habit of "it doesn't matter"; and so on.

    Why ever use something worse when using something better is just as easy or easier and has no higher cost?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Itzik uses RECURSION, even faster.

    With nums as
    (
        Select 1 as n
        Union all
        Select n+1 from nums where n < 10
    )
    Select n
    From nums
  • As long as you don't need to generate more than 100 numbers, that will work.

    Otherwise, you will get:

    Msg 530, Level 16, State 1, Line 4
    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
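
    For reference, that cap is a query hint rather than a hard limit: OPTION (MAXRECURSION 0) removes it entirely, and explicit values can go up to 32,767. A minimal sketch (the 1,000-row target is just an example):

    WITH nums AS
    (
        SELECT 1 AS n
        UNION ALL
        SELECT n + 1 FROM nums WHERE n < 1000
    )
    SELECT n
    FROM nums
    OPTION (MAXRECURSION 0);  -- or a specific value, e.g. OPTION (MAXRECURSION 1000)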

  • jeffery.baynard (11/24/2008)


    Itzik uses RECURSION, even faster.

    With nums as
    (
        Select 1 as n
        Union all
        Select n+1 from nums where n < 10
    )
    Select n
    From nums

    Sorry, but he doesn't. Recursion is actually much, much slower and often involves intermediate serialization to TempDB. Try the following two batches:

    With nums as
    (
        Select 0 as n
        Union all
        Select n+1 from nums where n < 999999
    )
    Select n
    From nums
    option (maxrecursion 0)

    select internal_objects_alloc_page_count
    from sys.dm_db_task_space_usage
    where session_id = @@spid
    go

    with t0(n) as (
        select 1 union all select 1
    ), t1(n) as (
        select 1 from t0 as a, t0 as b
    ), t2(n) as (
        select 1 from t1 as a, t1 as b
    ), t3(n) as (
        select 1 from t2 as a, t2 as b
    ), t4(n) as (
        select 1 from t3 as a, t3 as b
    ), t5(n) as (
        select 1 from t4 as a, t4 as b
    ), Numbers(n) as (
        select row_number() over (order by n) as n
        from t5
    )
    select n - 1 as n
    from Numbers
    where n <= 1000000

    select internal_objects_alloc_page_count
    from sys.dm_db_task_space_usage
    where session_id = @@spid
    go

    --
    Adam Machanic
    whoisactive

  • jeffery.baynard (11/24/2008)


    Itzik uses RECURSION, even faster.

    With nums as
    (
        Select 1 as n
        Union all
        Select n+1 from nums where n < 10
    )
    Select n
    From nums

    I changed the 10 to 1000 and got this error message:

    Msg 530, Level 16, State 1, Line 1
    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

    This might work for my "I" section of the CTE I supplied, but I doubt that it works beyond "99"; that was where the results appeared to die.

    Beer's Law: Absolutum obsoletum
    "if it works it's out-of-date"

  • As Adam Machanic pointed out, you can set MAXRECURSION, but one-by-one recursion won't be very fast.

  • Adam Machanic (11/24/2008)


    Just throw the CTE into an inline TVF. In my tests it almost always performs equally fast, and sometimes is even faster, than using a table.

    I've done extensive testing on Numbers table performance, and a 10,000-row numbers table performs better for things in that range (on my machines), while a series of cross joins of a 1,000-row table against itself performs better for higher ranges.

    I've tested it against the CTEs listed, and these two solutions performed better in every test I could come up with.

    For example:

    select row_number() over (order by t1.number, t2.number)
    from dbo.smallnumbers t1 -- table with numbers 1 to 1,000
    cross join dbo.smallnumbers t2

    This will generate the numbers 1 to 1-million faster than any other method I've tested.

    Add another cross join, and get 1-billion very rapidly.

    Limit one of the cross joins, and get more specific multiples of 1-million.

    select row_number() over (order by t1.number, t2.number, t3.number)
    from dbo.smallnumbers t1 -- table with numbers 1 to 1,000
    cross join dbo.smallnumbers t2
    cross join dbo.smallnumbers t3
    where t3.number between 1 and 10

    This will give 1 to 10-million.

    Add a "top X" to the select to get even more specific.

    select top (7158961) row_number() over (order by t1.number, t2.number, t3.number)
    from dbo.smallnumbers t1 -- table with numbers 1 to 1,000
    cross join dbo.smallnumbers t2
    cross join dbo.smallnumbers t3
    where t3.number between 1 and 8

    This will give numbers between 1 and 7,158,961 (to pull a meaningless number out of thin air).

    In my case, because I use my Numbers table for DATEADD functions more often than just about anything else (a sketch of that use follows after this post), I start them at 0 instead of at 1, but keep the number of rows specific to these kinds of functions. Thus, I have one with 0-999 and another with 0-9999, called SmallNumbers and Numbers, respectively.

    Another slight twist that I've found quite useful is to keep things like this in a database I call Common, which I then keep one copy of on each server I administer. That way, I don't have to have a Numbers table in every database. Since I don't like adding things like this to the master database, Common works quite well. (Some other things I keep in there are a calendar table, tables with the US states, counties, and ZIP codes, and a series of functions for string parsing, formatting phone numbers, custom date and time formats, and some boilerplate procs, like one for error handling in CATCH blocks.) Every database can then use this as a "code library".

    As always, the performance testing I've done has been in specific environments. Test it before you take my word for it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • So it would seem that there are two debates going on here. The first is whether or not a numbers table SHOULD exist at all, and the second is the best method for creating it.

    IMHO: The need for a numbers table is up to the environment. There are obvious circumstances where it is not needed. And I'm not saying that everyone needs one. But I need one. I've come up with several uses for a numbers table.

    The second is what is the best way to generate the table. I didn't, and don't, claim that the different ways I showed to create the table are the only ways. I just wanted to show a few different methods and what is good and bad about each!

    B

  • Joe Celko (11/24/2008)


    Here is another method, which uses a table of digits and some math:

    CREATE TABLE Sequence (seq INTEGER NOT NULL PRIMARY KEY);

    WITH Digits (i)
    AS
    (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9))
    INSERT INTO Sequence (seq)
    SELECT 1 + D0.i
         + (10 * D1.i)
         + (100 * D2.i)
         + (1000 * D3.i)
         + (10000 * D4.i)
         + (100000 * D5.i)
         + (1000000 * D6.i)
    FROM Digits AS D0, Digits AS D1, Digits AS D2, Digits AS D3,
         Digits AS D4, Digits AS D5, Digits AS D6;

    Cute -- but I can't get VALUES to work properly in a CTE in SQL Server 2008. Have you tested this there? I'm not sure why it doesn't work as-is; it seems to work just fine in a derived table, or in the following reconstruction of your code:

    WITH Digits (i)
    AS
    (SELECT i FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) x(i))
    INSERT INTO Sequence (seq)
    SELECT 1 + D0.i
         + (10 * D1.i)
         + (100 * D2.i)
         + (1000 * D3.i)
         + (10000 * D4.i)
         + (100000 * D5.i)
         + (1000000 * D6.i)
    FROM Digits AS D0, Digits AS D1, Digits AS D2, Digits AS D3,
         Digits AS D4, Digits AS D5, Digits AS D6;

    --
    Adam Machanic
    whoisactive
