Temp number list table

Question

Post reply

Temp number list table

Mark Green

SSCrazy

Points: 2085
More actions
September 4, 2006 at 10:17 am

#115334

What is the most efficient way of creating a tempory number list table where there is only a int key column and the number of rows in the the column = @variable?

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 1

Here's one way... nasty fast... takes 60-123 milliseconds for 11,000 rows (more than 30 years worth of dates if needed) and 5.8 to 9.2 seconds for a million rows (if TempDB doesn't need to grow)...

--===== If the temporary Tally (numbers) table exists, drop it

IF OBJECT_ID('TempDB..#Tally') IS NOT NULL

BEGIN

TRUNCATE TABLE #Tally --Very fast and makes the drop a bit faster

DROP TABLE #Tally

END

--===== Declare and set a variable to hold the number of desired rows

DECLARE @Rows INT

SET @Rows = 11000

--===== Limit the processing to the number of rows desired

SET ROWCOUNT @Rows

SET NOCOUNT ON

--===== Create and populate the Tally table on the fly

SELECT IDENTITY(INT,1,1) AS N

INTO #Tally

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2

--===== Change the processing limit back to normal

SET ROWCOUNT 0

--===== Add a Primary Key to maximize performance

ALTER TABLE #Tally

ADD CONSTRAINT PK_tmpTally_N PRIMARY KEY CLUSTERED (N)

My question would be (because there might be something better)... why don't you just create a permanent tally table? What is it that you are doing?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Mark Green SSCrazy Points: 2085 More actions · Answer 2

As new data is added to the base table the number of rows required may change.

The statement used is like:

INSERT INTO Deaths (

Year_of_death,

Ageyears,

Sex,

Undcause,

ICD,

OAcode)

select Year_ref,

Age_start,

Sex,

ICD_ref,

ICD_number,

'XXXXXXXXXX'

from Eng_Wales_Deaths t

join Number_list n

on n.number < t.Person_count

WHERE n.number < @Max_person_count

Update:

I have just found that I don't think that I have enough memory to do this statement. Can anyone offer an alternative approach?

Mark Green SSCrazy Points: 2085 More actions · Answer 3

Mark Green

SSCrazy

Points: 2085

September 4, 2006 at 11:07 am

#658401

Is the column name in the #Tally table: PK_tmpTally_N?

Update:

I can see now that it is N

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

I guess I don't have enough information... first, I'm not sure why you need this...

join Number_list n

on n.number < t.Person_count --This makes a triangular join!!!

WHERE n.number < @Max_person_count

Second, are you aware that this makes triangular join (a bit more than half a cross join)??? Depending on the number of records you have, this could make millions of apparent joins (sometimes, billions).... the formula for the number of apparent records in a triangular join is...

(X²+X)/2

... where X would be 1 less than t.Person_Count in this case. If that number is, for example, 10,000 (very small table), then the number of apparent records processed will be (9,999² + 9,999)/2 or a whopping big 49,995,000 records!!! If you have 100,000 records (considered to be small to medium sized table), then number becomes insanely large at 4,999,950,000!!!!!!!!!! That's right... almost 5 billion apparent joined records will need to be resolved!

Like I said, I think you may have a flaw in your logic but I don't have enough info to help you figure it out. Help me help you... please explain why you think the join to the numbers table is necessary...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Ninja's_RGR'us

SSC Guru

Points: 294069

September 4, 2006 at 12:01 pm

#658414

What's the difference between a triangular and half cross join...

and when the heck do I need to use that !?!?

Mark Green SSCrazy Points: 2085 More actions · Answer 6

The system was designed for small geographical area statistics. But my boss wants it to perform the same operation for England and Wales data. The small area statistics has a seperate row for each death. The England and Wales data uses Person count instead. I need to get the England and Wales data in to the same form as the local area data. This is why I perform the triangular join. After I perform the triangular join it offers me, for 2003, 363049 rows. But the number of rows should be 538254. I changed a few less-than signs in the equation above to <=. Maybe I should try again but limit the number of years I filter on. This would tell me if I have a memory problem or not.

Update:

Reducing the size of the table does not make any difference. I don't therefore think that it is a capacity problem. There must be somthing wrong with the code or the base table. Update B : It is a base table problem. I'm still using your code above though. Very useful. Thanks.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 7

X/2 records is the difference

Not sure why Mark thinks one is necessary here even with his latest explanation. Still trying to figure it out...

So far as triangular joins go, have seen lot's of folks make the mistake of using it to make running totals, running counts, and the like... get's real slow if more than 10k records are in the source table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 8

Update B : It is a base table problem. I'm still using your code above though. Very useful. Thanks.

Thanks Mark... I'm still trying to figure out why the triangular join is necessary at all...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 9

Ninja's_RGR'us

SSC Guru

Points: 294069

September 4, 2006 at 1:10 pm

#658422

Any exemple of when it should be used??

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 10

Sure, Remi... thanks for the interest...

Personally, I think triangular joins should never be used because it's an absolutely insane form of RBAR (a "Modenism" for "Row By Agonizing Row"). Here's an example that calculates both a running total and a running count... this won't look bad in the estimated execution plan (another trap! The correlated subqueries hide the problem!!) but takes huge amounts of time after you approach the 10k record mark as you'll see if you take a look at the record counts in the actual execution plan (the row counts are, in fact, (X²+X)/2) ...

USE NorthWind

SELECT x.OrderID,

x.Freight,

(SELECT SUM(y.Freight) FROM dbo.Orders y WHERE y.OrderID <= x.OrderID) AS RunningTotal,

(SELECT COUNT(y.Freight) FROM dbo.Orders y WHERE y.OrderID <= x.OrderID) AS RunningCount

FROM dbo.Orders X

ORDER BY x.OrderID

... as you can see (triangular joins in bold blue), each correlated subquery must be executed for each row in the outer select. A triangular join is included in each subquery which will really bog things down on larger tables. It IS a very simple way to code running totals/counts and even ranking (2005 has a better way) but it is NOT scalable... the number of apparent rows grows exponentionally...

If you have a Tally or Numbers table that goes to 10K, you can try this...

--===== This example "only" takes 1 second to process...

-- ... I think 1000 rows per second is horribly slow ... but wait for it...

SELECT x.N,

(SELECT SUM(y.N) FROM dbo.Tally y WHERE y.N <= x.N) AS RunningTotal,

(SELECT COUNT(y.N) FROM dbo.Tally y WHERE y.N <= x.N) AS RunningCount

FROM dbo.Tally X

WHERE x.N<=1000

ORDER BY x.N

--===== This example takes 56 times longer but only has 10 times the number of records

SELECT x.N,

(SELECT SUM(y.N) FROM dbo.Tally y WHERE y.N <= x.N) AS RunningTotal,

(SELECT COUNT(y.N) FROM dbo.Tally y WHERE y.N <= x.N) AS RunningCount

FROM dbo.Tally X

WHERE x.N<=10000

ORDER BY x.N

And those are just normal running totals/counts... not even grouped by user or anything... you can just imagine what the next order of magnitude would be

By the way... the last row of the RunningTotal column, in this case, just happens to contain the value of (X²+X)/2 (the number of apparent records processed), but because we have two triangular joins, you have to double that number to appreciate how hard these darned triangular joins make the computer work.

In fact, I'm writing an article for Steve Jones on how to get around this hidden performance sapper as we speak... The method I'm working on for the article will do a grouped running total on 1 MILLION records in about 13 seconds or so and 4 MILLION records in about 58 seconds. Takes about the same amount of time to do a flat running total, too. And, it's scalable... the duration grows in a close to linear fashion when compared to the number of records.

I won't even get into how bad triangular joins are when they are used in a WHERE clause in a single select 'cause we kind of covered that in the correlated subqueries above.

(UPDATE) Hope this doesn't sound to much like diatribe but these forms of hidden RBAR are a very serious problem in most Developer's code and (1) they never see it coming and (2) they either don't know how to fix it or will resort to a cursor to fix it. Like I said, I'm writing an article on the problem and the fix (will be part II of a performance series of articles), but here's a tease...

SQL Server 2005 has some very neat tricks for doing running counts using the new "Ranking" functions, but what about those of us still using SQL Server 2000? Are we stuck with Cursors or the horrendously slow triangular joins that use "<" to create running totals or running counts? Not hardly…

SQL Server has a very unique form of UPDATE that most never give a second thought to. Here’s the syntax…

UPDATE sometablename

SET @somevariable = somecolumnname = someexpresion/calculation

Yep, there it is in black and white… you can set the value of a variable and a column at the same time!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 11

Ninja's_RGR'us

SSC Guru

Points: 294069

September 4, 2006 at 3:13 pm

#658427

Ok I confess I wrote a few of those... Can't wait for the other solution now .

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 12

Yap... we've all written "those"... see the update I just posted to my previous message...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 13

Ninja's_RGR'us

SSC Guru

Points: 294069

September 4, 2006 at 3:27 pm

#658429

Ain't that update based on the clustered index??? I already wrote that for something else and I was always wary that something could go wrong if I didn't have exclusive access on the table...

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 14

Yep... and you've nailed the key to it... exclusive access... now, what table(s) will a developer ALWAYS have exclusive access to?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions