FizzBuzz

  • OK then, let's take a look and up the ante to 4 billion rows at the same time.

    Database in SIMPLE recovery mode.

    ZERO growth in database files.

    ZERO growth in tempdb files.

    set statistics time on
    ;with
    cte1 as
        -- 1681 rows: 1681^3 = 4,750,104,241, comfortably over 4 billion
        (select TOP 1681 1 as col from master..spt_values),
    cte2 as
        (select row_number() over (order by (select 0)) as row
         from cte1
         cross join cte1 AS T2
         cross join cte1 AS T3)
    SELECT count_big(all case
               when row%15 = 0 then 'FIZZBUZZ'
               when row%5 = 0 then 'BUZZ'
               when row%3 = 0 then 'FIZZ'
               else cast(row as varchar(10))
           end)
    --from cte2 where row < 150000001
    from cte2 where row < 4000000001
    set statistics time off

  • steve-893342 (2/28/2010)


    OK then, let's take a look and up the ante to 4 billion rows at the same time. [...]

    That is roughly what I had a week or so ago, before I realised that Jason's method using no table access was inevitably better. The only differences are that I used a three-way join on a subset of master.sys.all_columns (I also tried master.dbo.Tally instead, which on my system is a public-read 11,000-row Tally table), and that I was working with a smaller base row count, because the cube root of 4,294,967,296 is a bit under 1626, so that is big enough for a three-way join. It ran several times more slowly than a six-way join that didn't read any tables at all. The trouble appears to be that reading more than 1000 rows from a system view uses more CPU than doing the whole thing without reading from storage.

    I thought about using a five-way self-join, but it requires 85 elements in the base CTE and the gain (if any) seemed unlikely to be worth the extra typing.
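    For illustration, the no-table-access shape might look something like this (a sketch, not Jason's actual code: 41 constant rows suffice for a six-way join because 41^6 = 4,750,104,241, just over 2^32):

    with
    cte1(col) as (
        -- 41 constant rows, read from no table at all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1
    ),
    cte2 as (
        -- six-way cross join of the 41-row CTE, then number the rows
        select row_number() over (order by (select 0)) as row
        from cte1 a
        cross join cte1 b cross join cte1 c
        cross join cte1 d cross join cte1 e cross join cte1 f
    )
    select count_big(all case
               when row%15 = 0 then 'FIZZBUZZ'
               when row%5 = 0 then 'BUZZ'
               when row%3 = 0 then 'FIZZ'
               else cast(row as varchar(10))
           end)
    from cte2
    where row < 4000000001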

    Tom

  • steve-893342 (2/28/2010)


    OK then, let's take a look and up the ante to 4 billion rows at the same time.

    Database in SIMPLE recovery mode.

    ZERO growth in database files.

    ZERO growth in tempdb files.

    [...]

    You got zero growth everywhere? Cool! I'll have to give the CTE version of the table join a shot at it. How long did it take on your system, and how fast is the processor? I ask because in the test I did about 6 months ago, it took 40 minutes just for a billion rows on my poor ol' 8-year-old 1.8GHz box.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

    Well I ran it on my laptop (2.13 GHz, 1 GB RAM) and it took 47 minutes for the 4 billion rows. On a more conventional server setup it took 20 minutes.

    Have you got a really beefy server you can run this on?

    I'm interested in how fast this will go!

    OK, so how far can we push this one then with the tools at my disposal? 100 billion is a nice round number and is just a billion times greater than the modest 100 used at the inception of the FIZZBUZZ challenge. On this SQL Server 2008 box I am currently using, the 4 billion row query took 17 minutes. If this query scales linearly to the 100 billion level, it should take roughly 25 x 17 minutes, which is just over 7 hours. The actual time taken for the 100 billion row query was 6 hours 56 minutes: scalability almost exactly as predicted!

    with
    cte1 as
        -- 4642 rows: 4642^3 is just over 100 billion
        (select TOP 4642 1 as col from master.sys.all_columns),
    cte2 as
        (select row_number() over (order by (select 0)) as row
         from cte1
         cross join cte1 AS T2
         cross join cte1 AS T3)
    SELECT count_big(all case
               when row%15 = 0 then 'FIZZBUZZ'
               when row%5 = 0 then 'BUZZ'
               when row%3 = 0 then 'FIZZ'
               else cast(row as varchar(12))
           end)
    from cte2 where row < 100000000001

    I have posted a solution on ASK that does the COUNT_BIG select for 4 billion rows in 16 minutes on SQL Express on my laptop, while I am working in Outlook and have a VM running an ERP package, so I am sure it would be quicker on a server with lots of super-fast RAM.

    Surprisingly (maybe), instead of using CTEs I used good old views, and in all my tests it is slightly quicker than using CTEs...

    If only there were a quick alternative to the row_number() function that also allowed parallelism, then we could use more than one thread and speed it up even more...

    I basically use a simple union of 32 "select 1" statements in the first view, then cross join that with itself in two more views to get over a million potential rows.

    I then use a fourth cross join view to do the ROW_NUMBER thing and make available over a trillion rows.

    The last view does the fizzbuzz thing using the row numbers generated in view 4.
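    Roughly, the view stack looks like this - a sketch with illustrative names, not the exact code posted on ASK:

    create view dbo.v1 as -- view 1: a union of 32 "select 1" statements
    select 1 as n union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1
    go
    create view dbo.v2 as -- view 2: 32 x 32 = 1,024 rows
    select 1 as n from dbo.v1 a cross join dbo.v1 b
    go
    create view dbo.v3 as -- view 3: 1,024 x 1,024 = 1,048,576 rows (over a million)
    select 1 as n from dbo.v2 a cross join dbo.v2 b
    go
    create view dbo.v4 as -- view 4: ROW_NUMBER over 1,048,576^2, over a trillion potential rows
    select row_number() over (order by (select 0)) as row
    from dbo.v3 a cross join dbo.v3 b
    go
    create view dbo.v5 as -- view 5: the FizzBuzz projection over the generated numbers
    select row,
           case when row%15 = 0 then 'FIZZBUZZ'
                when row%5 = 0 then 'BUZZ'
                when row%3 = 0 then 'FIZZ'
                else cast(row as varchar(14)) end as fizzbuzz
    from dbo.v4
    go

    The timed run would then be something like: select count_big(fizzbuzz) from dbo.v5 where row < 4000000001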

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • nice solution

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

    Why thank you Mister Magoo, that's very interesting feedback. I have most recently been looking at my version of the 4 billion on a virtual server running SQL Server 2005, and it's taking 15 minutes. My record is 13 minutes on a slightly higher-spec virtual machine. Your timings are really interesting, because it's difficult to gauge what they mean when taken in isolation.

    BTW do you intend to go for the trillion any time soon?

    Here's my version

    THE TRILLION

    with
    cte1 as
        --(select TOP 10000 1 as col from master..syscolumns)
        -- 10000 rows: 10000^3 = one trillion exactly
        (select TOP 10000 1 as col from master..tally),
    cte2 as
        (select row_number() over (order by (select 0)) as row
         from cte1
         cross join cte1 AS T2
         cross join cte1 AS T3)
    SELECT count_big(all case
               when row%15 = 0 then 'FIZZBUZZ'
               when row%5 = 0 then 'BUZZ'
               when row%3 = 0 then 'FIZZ'
               else cast(row as varchar(14))
           end)
    from cte2 where row < 1000000000001

  • @jason - thanks.

    @steve-2 - yes - just about to try it, but I am not sure if my laptop will have enough memory - have you done any monitoring to see what is required for the trillion?

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Too late now!

    I experimented with those union queries but couldn't find the right one, probably having too many unions or too few. Did you arrive at 32 by a trial and error process or did you have an inkling in advance?

  • Tom

    I think you'll find my query is written in a different way which speeds it up by a factor of 4.

    Using an XML PATH clause, it's possible to perform the modulo operation only for 3 and 5, because the output for 15 is the concatenation of the other two values.

    ;with cte(num) as (
        select 1 union all
        select num+1 from cte where num < 100
    )
    select isnull(
        -- when num is divisible by 15 both rows of T match, so FOR XML PATH('')
        -- concatenates 'Fizz' and 'Buzz' into 'FizzBuzz'; when neither matches,
        -- the subquery returns NULL and ISNULL falls back to the number itself
        (select ltrim(string)
         from (select 3 module, 'Fizz' string
               union all
               select 5, 'Buzz') T
         where num % module = 0
         for xml path(''))
        , num)
    from cte
    option (maxrecursion 100)

  • steve-893342 (3/2/2010)


    Tom

    I think you'll find my query is written in a different way which speeds it up by a factor of 4.

    Yes, "order by (select 0)" is a nice. It's a "you can't do that" error that you get if you try to use "order by NULL" or "order by 0" in that context in a UDF. Using (select 0) (or (select NULL) - I wonder if it makes a difference) eliminates an expensive sort. I'm surprised it's a factor 4, though, and I don't see any other differences (apart from not being wrapped in a UDF).

    Tom

  • steve-893342 (3/2/2010)


    Too late now!

    I experimented with those union queries but couldn't find the right one, probably having too many unions or too few. Did you arrive at 32 by a trial and error process or did you have an inkling in advance?

    Re the unions: I kind of played around with various things - mostly based on vague hunches about what might be scalable.

    Along the way I tried variants of what I selected inside the first view (int, bit, char), variants of how many rows to select (2, 4, 8, 16, 32 etc), and variants of how many times to cross join, but in the end it all came down to testing the performance of the various options - rather than intelligent design...

    I wanted to try views just because I thought it might (possibly) save a bit of compilation time, and I was originally hoping for some parallelism (I don't know much about how the engine decides to parallelise, obviously!), but it seems row_number() anywhere in this query prevents that...

    Re the TRILLION: my first attempt (overnight) did not perform as I would like (it was still running after 6.5 hours), so I think I will have a think about the maths of the cross joins and maybe try tweaking the row counts in my views to suit the larger set - but maybe I have found the limit of the views approach?

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

    Yes Mister Magoo, that really is a very elegant solution indeed.

    I've had a look at comparing it against my query and it sure is a close one to call.

    Re the TRILLION - I think you'll have to give it a lot longer than that!
