Running total (cummulative Sum)

Question

Post reply

Running total (cummulative Sum)

tommey152

Right there with Babe

Points: 772
More actions
July 6, 2011 at 4:02 am

#239187

Hi ,

Have a table that looks like this:

id count

1 100

2 50

3 10

I want to add a new column called cumulative_sum, so the table would look like this:

id count cumulative_sum

1 100 100

2 50 150

3 10 160

Is there any SQL statement that can do this easily? What's the best way to accomplish this?

_______________________________________________________________

Need help? Help us help you.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

taybre SSC Eights! Points: 916 More actions · Answer 1



--drop table #data

create table #data

(

id int identity(1,1),

num numeric(18,0),

cnum numeric(18,0)

)

insert into #data (num, cnum)

select 100,0

union

select 50,0

union

select 10,0

union

select 65,0

declare @cnt int

set @cnt = 1

while (@cnt <= (select COUNT(*) from #data))

begin

	update #data

	set cnum = (select SUM(num) from #data

				where id <= @cnt)

	where [id] = @cnt

	set @cnt = @cnt +1

end

select * from #data

HTH :-):-)
taybre

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

tommey152 (7/6/2011)

Hi ,

.....

Is there any SQL statement that can do this easily? What's the best way to accomplish this?

Easiest is the recursive CTE:

DROP TABLE #SourceData



CREATE TABLE #SourceData (id INT IDENTITY(1,1), [count] INT, cumulative_sum INT)

INSERT INTO #SourceData (count) VALUES (100),(50),(10)

;WITH Accumulator AS (

	SELECT id,

		[count],

		cumulative_sum = [count]

	FROM #SourceData

	WHERE id = 1

	UNION ALL

	SELECT s.id,

		s.[count],

		a.cumulative_sum+s.[count]

	FROM #SourceData s

	INNER JOIN Accumulator a

		ON a.id + 1 = s.id

) SELECT id,

	[count],

	cumulative_sum

FROM Accumulator OPTION (MAXRECURSION 0)

This method can calculate values for a million row in around ten seconds providing there's a unique clustered index on the column which holds the sequential values

Best is probably the "Running Totals Update"[/url] - it's fastest anyway.

Worst are those methods using a triangular join - where the value for say row 10 is calculated by summing the values for rows 1 to 10, then the value for row 11 is calculated by summing the values for rows 1 to 11...

Cheers

ChrisM

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 3

Chris,

Do you know of a way to use the rCTE to do a running total without the ID's being perfectly sequential?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 4

Jeff Moden (7/6/2011)

Chris,

Do you know of a way to use the rCTE to do a running total without the ID's being perfectly sequential?

There's a way using Cross Apply by taking the Min of the greater IDs as your next connection, but it's data volume and index dependent on if you're better off slapping a Row_Number() on it at that point.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

Craig Farrell (7/6/2011)

Jeff Moden (7/6/2011)

Chris,

Do you know of a way to use the rCTE to do a running total without the ID's being perfectly sequential?

There's a way using Cross Apply by taking the Min of the greater IDs as your next connection, but it's data volume and index dependent on if you're better off slapping a Row_Number() on it at that point.

You'd think so, but it's not straightforward:

DROP TABLE #SourceData



CREATE TABLE #SourceData (id INT IDENTITY(1,1), [count] INT, cumulative_sum INT)

INSERT INTO #SourceData (count) VALUES (100),(50),(10) 

;WITH Accumulator AS (

        SELECT id,

                [count],

                cumulative_sum = [count]

        FROM #SourceData

        WHERE id = 1

        UNION ALL

        SELECT s.id,

                s.[count],

                a.cumulative_sum+s.[count]

        FROM Accumulator a

        CROSS APPLY (SELECT NextID = MIN(id) FROM #SourceData WHERE id > a.id) n

        INNER JOIN #SourceData s ON s.id = n.NextID

) SELECT id,

        [count],

        cumulative_sum

FROM Accumulator OPTION (MAXRECURSION 0)

Msg 467, Level 16, State 1, Line 5

GROUP BY, HAVING, or aggregate functions are not allowed in the recursive part of a recursive common table expression 'Accumulator'.

Jeff, thanks for asking this. I've been working on a rCTE running totals article for a few months (when not wrestling with SSIS) but it's too bland to offer as it stands. Cracking this particular nut could make it more interesting and worthwhile.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Paul White SSC Guru Points: 150457 More actions · Answer 6

ChrisM@Work (7/7/2011)

You'd think so, but it's not straightforward...

MIN and MAX may not be allowed in the recursive part, but MIN and MAX are no more than the first row (TOP 1) ordered in some particular way. Now, TOP isn't allowed in the recursive part either, but nobody told the query optimizer that, so we can use ROW_NUMBER = 1 to achieve a TOP operator in the plan.



DROP TABLE #SourceData;

GO

CREATE TABLE #SourceData

(

	id				INTEGER IDENTITY(1,1) PRIMARY KEY,

	[count]			INTEGER NOT NULL,

	cumulative_sum	INTEGER NULL,

);

GO

INSERT #SourceData

	(count)

VALUES

	(100),

	(50),

	(10);



WITH Accumulator AS

(

	SELECT

		sd.id,

		sd.[count],

		cumulative_sum = sd.[count]

	FROM #SourceData AS sd

	WHERE

		sd.id = 1

	UNION ALL

	SELECT

		iTVF.id,

		iTVF.[count],

		a.cumulative_sum + iTVF.[count]

	FROM Accumulator AS a

	CROSS APPLY

	(

		SELECT

			r.id,

			r.[count]

		FROM

		(

			SELECT

				sd.id,

				sd.[count],

				rn = ROW_NUMBER() OVER (

					ORDER BY sd.id)

			FROM #SourceData AS sd

			WHERE

				sd.id > a.id

		) AS r

		WHERE

			r.rn = 1

	) AS iTVF

)

SELECT

	a.id,

    a.[count],

    a.cumulative_sum

FROM Accumulator AS a

OPTION (MAXRECURSION 0);

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 7

Nice one, Paul. The triangular join put me off writing and testing it, but in practice the performance is stunningly good - half a million rows return in 12s on this box.

INSERT INTO #SourceData ([count]) SELECT a.[count] FROM #SourceData a, #SourceData b, #SourceData c, #SourceData d



INSERT INTO #SourceData ([count]) SELECT a.[count] FROM #SourceData a, #SourceData b, #SourceData c

go

;WITH Accumulator AS

(

        SELECT

                sd.id,

                sd.[count],

                cumulative_sum = sd.[count]

        FROM #SourceData AS sd

        WHERE

                sd.id = 1

        UNION ALL

        SELECT

                s.id,

                s.[count],

                a.cumulative_sum+s.[count]

        FROM Accumulator AS a

        CROSS APPLY

        (

                SELECT

                        NextID = r.id

                FROM

                (

                        SELECT

                                sd.id,

                                rn = ROW_NUMBER() OVER (ORDER BY sd.id)

                        FROM #SourceData AS sd

                        WHERE

                                sd.id > a.id

                ) AS r

                WHERE

                        r.rn = 1

        ) AS n

        INNER JOIN #SourceData s ON

                s.id = n.NextID

)

SELECT

        a.id,

    a.[count],

    a.cumulative_sum

FROM Accumulator AS a

OPTION (MAXRECURSION 0);

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Paul White SSC Guru Points: 150457 More actions · Answer 8

Hi Chris,

There's no need for a join at all - a point I realised after editing the TOP trick into your code. I have updated the code and query plan to show what I'm getting at here. Eight seconds on my machine. The only real issue I have is that recursive query plans cannot use parallelism directly. Shame.

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

The new version runs faster but not by much.

So the optimizer converts what looks like a triangular join (the entire CROSS APPLY part) into a TOP query?

Interestingly, you can put SUM OVER () into the iTVF without an error message, and so long as you don't reference the product in the outer query, the optimizer will still recognise the iTVF as a TOP query.

SELECT

sd.id,

sd.[count],

SUMcount = SUM([count]) OVER (PARTITION BY 1),

rn = ROW_NUMBER() OVER (ORDER BY sd.id)

FROM #SourceData AS sd

WHERE

sd.id > a.id

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 10

SQLkiwi (7/7/2011)

Hi Chris,

There's no need for a join at all - a point I realised after editing the TOP trick into your code. I have updated the code and query plan to show what I'm getting at here. Eight seconds on my machine. The only real issue I have is that recursive query plans cannot use parallelism directly. Shame.

Paul

This looks similar to your DISTINCT trick...

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Paul White SSC Guru Points: 150457 More actions · Answer 11

ChrisM@Work (7/7/2011)

So the optimizer converts what looks like a triangular join (the entire CROSS APPLY part) into a TOP query?

There's a bit more going on than that, but essentially yes.

Interestingly, you can put SUM OVER () into the iTVF without an error message, and so long as you don't reference the product in the outer query, the optimizer will still recognise the iTVF as a TOP query.

You can do that, but why would you want to? By the way, a windowing function in the recursive part of a CTE has a weird semantic (quite counter-intuitive, in fact) but works the way it does because they are intended for use in hierarchy queries, where apparently the implemented semantic is the desired one.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 12

SQLkiwi (7/7/2011)

... By the way, a windowing function in the recursive part of a CTE has a weird semantic (quite counter-intuitive, in fact) but works the way it does because they are intended for use in hierarchy queries, where apparently the implemented semantic is the desired one.

I heard that too but the full text of the article hasn't reached this end of the solar system yet!

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ColdCoffee SSC-Dedicated Points: 39971 More actions · Answer 13

ChrisM@Work (7/7/2011)

SQLkiwi (7/7/2011)

Hi Chris,

There's no need for a join at all - a point I realised after editing the TOP trick into your code. I have updated the code and query plan to show what I'm getting at here. Eight seconds on my machine. The only real issue I have is that recursive query plans cannot use parallelism directly. Shame.

Paul

This looks similar to your DISTINCT trick...

Hey Chris, the DISTINCT trick you are mentioning is the one Paul coded as an altenative to DISTINCT operation ? I have been searching for that piece of code but in vain. Can you point me to that piece?

Paul, if u still on this thread, can you point me to your altenative to DISTINCT function?

Paul White SSC Guru Points: 150457 More actions · Answer 14

Paul White

SSC Guru

Points: 150457

July 7, 2011 at 10:56 am

#1349847

ColdCoffee (7/7/2011)

Paul, if u still on this thread, can you point me to your altenative to DISTINCT function?

Sure.

http://qa.sqlservercentral.com/Forums/FindPost1013407.aspx

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi