FizzBuzz

  • OK then, let's take a look and up the ante to 4 billion rows at the same time.

    Database in SIMPLE recovery mode.

    ZERO growth in database files.

    ZERO growth in tempdb files.

    set statistics time on
    ;with
    cte1 as
        -- 1681 rows: 1681^3 = 4,750,104,241, comfortably over 4 billion
        (select TOP 1681 1 as col from master..spt_values),
    cte2 as
        (select row_number() over (order by (select 0)) as row
         from cte1
         cross join cte1 AS T2
         cross join cte1 AS T3)
    SELECT count_big(all case
               when row%15 = 0 then 'FIZZBUZZ'
               when row%5 = 0 then 'BUZZ'
               when row%3 = 0 then 'FIZZ'
               else cast(row as varchar(10))
           end)
    --from cte2 where row < 150000001
    from cte2 where row < 4000000001
    set statistics time off

  • steve-893342 (2/28/2010)


    OK then, let's take a look and up the ante to 4 billion rows at the same time. [...]

    That is roughly what I had a week or so ago, before I realised that Jason's method using no table access was inevitably better. The only differences are that I used a three-way join on a subset of master.sys.all_columns (I also tried master.dbo.Tally instead, which on my system is a public-read 11,000-row Tally table), and that I was working with a smaller base row count, because the cube root of 4,294,967,296 is a bit under 1626, so that is big enough for a three-way join. It ran several times more slowly than a six-way join that didn't read any tables at all. The trouble appears to be that reading more than 1000 rows from a system view uses more CPU than doing the whole thing without reading from storage.

    I thought about using a five-way self-join, but it requires 85 elements in the base CTE and the gain (if any) seemed unlikely to be worth the extra typing.
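    For illustration, the no-table-access shape might look something like this (a sketch, not Jason's actual code: 41 constant rows suffice for a six-way join because 41^6 = 4,750,104,241, just over 2^32):

    with
    cte1(col) as (
        -- 41 constant rows, read from no table at all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1 union all select 1 union all select 1 union all select 1 union all
        select 1
    ),
    cte2 as (
        -- six-way cross join of the 41-row CTE, then number the rows
        select row_number() over (order by (select 0)) as row
        from cte1 a
        cross join cte1 b cross join cte1 c
        cross join cte1 d cross join cte1 e cross join cte1 f
    )
    select count_big(all case
               when row%15 = 0 then 'FIZZBUZZ'
               when row%5 = 0 then 'BUZZ'
               when row%3 = 0 then 'FIZZ'
               else cast(row as varchar(10))
           end)
    from cte2
    where row < 4000000001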

    Tom

  • steve-893342 (2/28/2010)


    OK then, let's take a look and up the ante to 4 billion rows at the same time.

    Database in SIMPLE recovery mode.

    ZERO growth in database files.

    ZERO growth in tempdb files.

    [...]

    You got zero growth everywhere? Cool! I'll have to give the CTE version of the table join a shot at it. How long did it take on your system, and how fast is the processor? I ask because in the test I did about 6 months ago, it took 40 minutes just for a billion rows on my poor ol' 8-year-old 1.8GHz box.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

    Well I ran it on my laptop (2.13 GHz, 1 GB RAM) and it took 47 minutes for the 4 billion rows. On a more conventional server setup it took 20 minutes.

    Have you got a really beefy server you can run this on?

    I'm interested in how fast this will go!

    OK, so how far can we push this one then with the tools at my disposal? 100 billion is a nice round number and is just a billion times greater than the modest 100 used at the inception of the FIZZBUZZ challenge. On this SQL Server 2008 box I am currently using, the 4 billion row query took 17 minutes. If this query scales linearly to the 100 billion level, it should take roughly 25 x 17 minutes, which is just over 7 hours. The actual time taken for the 100 billion row query was 6 hours 56 minutes: scalability almost exactly as predicted!

    with
    cte1 as
        -- 4642 rows: 4642^3 is just over 100 billion
        (select TOP 4642 1 as col from master.sys.all_columns),
    cte2 as
        (select row_number() over (order by (select 0)) as row
         from cte1
         cross join cte1 AS T2
         cross join cte1 AS T3)
    SELECT count_big(all case
               when row%15 = 0 then 'FIZZBUZZ'
               when row%5 = 0 then 'BUZZ'
               when row%3 = 0 then 'FIZZ'
               else cast(row as varchar(12))
           end)
    from cte2 where row < 100000000001

    I have posted a solution on ASK that does the COUNT_BIG select for 4 billion rows in 16 minutes on SQL Express on my laptop, while I am working in Outlook and have a VM running an ERP package, so I am sure it would be quicker on a server with lots of super-fast RAM.

    Surprisingly (maybe), instead of using CTEs I used good old views, and in all my tests it is slightly quicker than using CTEs...

    If only there were a quick alternative to the row_number() function that also allowed parallelism, then we could use more than one thread and speed it up even more...

    I basically use a simple union of 32 "select 1" statements in the first view, then cross join that with itself in two more views to get over a million potential rows.

    I then use a fourth cross join view to do the ROW_NUMBER thing and make available over a trillion rows.

    The last view does the fizzbuzz thing using the row numbers generated in view 4.
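    Roughly, the view stack looks like this - a sketch with illustrative names, not the exact code posted on ASK:

    create view dbo.v1 as -- view 1: a union of 32 "select 1" statements
    select 1 as n union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1 union all
    select 1 union all select 1 union all select 1 union all select 1
    go
    create view dbo.v2 as -- view 2: 32 x 32 = 1,024 rows
    select 1 as n from dbo.v1 a cross join dbo.v1 b
    go
    create view dbo.v3 as -- view 3: 1,024 x 1,024 = 1,048,576 rows (over a million)
    select 1 as n from dbo.v2 a cross join dbo.v2 b
    go
    create view dbo.v4 as -- view 4: ROW_NUMBER over 1,048,576^2, over a trillion potential rows
    select row_number() over (order by (select 0)) as row
    from dbo.v3 a cross join dbo.v3 b
    go
    create view dbo.v5 as -- view 5: the FizzBuzz projection over the generated numbers
    select row,
           case when row%15 = 0 then 'FIZZBUZZ'
                when row%5 = 0 then 'BUZZ'
                when row%3 = 0 then 'FIZZ'
                else cast(row as varchar(14)) end as fizzbuzz
    from dbo.v4
    go

    The timed run would then be something like: select count_big(fizzbuzz) from dbo.v5 where row < 4000000001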

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • nice solution

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

    Why thank you Mister Magoo, that's very interesting feedback. I have most recently been looking at my version of the 4 billion on a virtual server running SQL Server 2005, and it's taking 15 minutes. My record is 13 minutes on a slightly higher-spec virtual machine. Your timings are really interesting, because it's difficult to gauge what they mean when taken in isolation.

    BTW do you intend to go for the trillion any time soon?

    Here's my version

    THE TRILLION

    with
    cte1 as
        --(select TOP 10000 1 as col from master..syscolumns)
        -- 10000 rows: 10000^3 = one trillion exactly
        (select TOP 10000 1 as col from master..tally),
    cte2 as
        (select row_number() over (order by (select 0)) as row
         from cte1
         cross join cte1 AS T2
         cross join cte1 AS T3)
    SELECT count_big(all case
               when row%15 = 0 then 'FIZZBUZZ'
               when row%5 = 0 then 'BUZZ'
               when row%3 = 0 then 'FIZZ'
               else cast(row as varchar(14))
           end)
    from cte2 where row < 1000000000001

  • @jason - thanks.

    @steve-2 - yes - just about to try it, but I am not sure if my laptop will have enough memory - have you done any monitoring to see what is required for the trillion?

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Too late now!

    I experimented with those union queries but couldn't find the right one, probably having too many unions or too few. Did you arrive at 32 by a trial and error process or did you have an inkling in advance?

  • Tom

    I think you'll find my query is written in a different way which speeds it up by a factor of 4.

    Using an XML PATH clause, it's possible to perform the modulo operation only for 3 and 5, because the output for 15 is the concatenation of the other two values.

    ;with cte(num) as (
        select 1 union all
        select num+1 from cte where num < 100
    )
    select isnull(
        -- when num is divisible by 15 both rows of T match, so FOR XML PATH('')
        -- concatenates 'Fizz' and 'Buzz' into 'FizzBuzz'; when neither matches,
        -- the subquery returns NULL and ISNULL falls back to the number itself
        (select ltrim(string)
         from (select 3 module, 'Fizz' string
               union all
               select 5, 'Buzz') T
         where num % module = 0
         for xml path(''))
        , num)
    from cte
    option (maxrecursion 100)

  • steve-893342 (3/2/2010)


    Tom

    I think you'll find my query is written in a different way which speeds it up by a factor of 4.

    Yes, "order by (select 0)" is a nice. It's a "you can't do that" error that you get if you try to use "order by NULL" or "order by 0" in that context in a UDF. Using (select 0) (or (select NULL) - I wonder if it makes a difference) eliminates an expensive sort. I'm surprised it's a factor 4, though, and I don't see any other differences (apart from not being wrapped in a UDF).

    Tom

  • steve-893342 (3/2/2010)


    Too late now!

    I experimented with those union queries but couldn't find the right one, probably having too many unions or too few. Did you arrive at 32 by a trial and error process or did you have an inkling in advance?

    Re the unions: I kind of played around with various things - mostly based on vague hunches about what might be scalable.

    Along the way I tried variants of what I selected inside the first view (int, bit, char), variants of how many rows to select (2, 4, 8, 16, 32 etc), and variants of how many times to cross join, but in the end it all came down to testing the performance of the various options - rather than intelligent design...

    I wanted to try views just because I thought it might (possibly) save a bit of compilation time, and I was originally hoping for some parallelism (I don't know much about how the engine decides to parallelise, obviously!), but it seems row_number() anywhere in this query prevents that...

    Re the TRILLION: my first attempt (overnight) did not perform as I would like (it was still running after 6.5 hours), so I think I will have a think about the maths of the cross joins and maybe try tweaking the row counts in my views to suit the larger set - but maybe I have found the limit of the views approach?

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

    Yes Mister Magoo, that really is a very elegant solution indeed.

    I've had a look at comparing it against my query and it sure is a close one to call.

    Re the TRILLION - I think you'll have to give it a lot longer than that!
