top 80 percent from sum

Question

Post reply

top 80 percent from sum

cometav2011

SSC Veteran

Points: 285
More actions
December 29, 2011 at 8:00 am

#254118

I tried function top 80 percent i sorted desd, but it still get some unwanted records.

60 20%

50 17%

50 17%

45 15%

30 10% 80%

25 8%

15 5%

10 3%

10 3%

total 295 100%

i can do in Excel, i need to use all records >=80 percent and other just deleted. any good advice how to do inSQL 2008? thanks

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 1

Select top 80 percent * from <sometable>

will get you 80 percent of the total number of records. Since you have 9 records in the table you show, it returns the top 8 records in your code.

If you want to get the top 80% of a cumulative sum of a numeric field, you have to do something like this (assuming you want them ordered in descending order):



--Create data

select * into databkup.dbo.testtbl

from (

select 60 [num] union all

select 50 [num] union all

select 50 [num] union all

select 45 [num] union all

select 30 [num] union all

select 25 [num] union all

select 15 [num] union all

select 10 [num] union all

select 10 [num]

 ) num

--Start code

;with cumsum as

(

select num, ROW_NUMBER() over (order by num desc) id

from testtbl

)

select c1.num, SUM(c2.num) csum from cumsum c1

join cumsum c2 on c1.id >= c2.id

group by c1.id, c1.num

having SUM(c2.num) <= .8 * (select SUM(num) from testtbl)

order by c1.num desc

cometav2011 SSC Veteran Points: 285 More actions · Answer 2

cometav2011

SSC Veteran

Points: 285

December 29, 2011 at 9:04 am

#1426498

I want to get the top 80% of a cumulative sum of a numeric field. Thanks but i have more than 1 mln rows and i am newbees. i am looking for something more simple.

thanks

drew.allen SSC Guru Points: 76776 More actions · Answer 3

roryp 96873 (12/29/2011)

If you want to get the top 80% of a cumulative sum of a numeric field, you have to do something like this (assuming you want them ordered in descending order):

--Create data select * into databkup.dbo.testtbl from ( select 60 [num] union all select 50 [num] union all select 50 [num] union all select 45 [num] union all select 30 [num] union all select 25 [num] union all select 15 [num] union all select 10 [num] union all select 10 [num] ) num --Start code ;with cumsum as ( select num, ROW_NUMBER() over (order by num desc) id from testtbl ) select c1.num, SUM(c2.num) csum from cumsum c1 join cumsum c2 on c1.id >= c2.id group by c1.id, c1.num having SUM(c2.num) <= .8 * (select SUM(num) from testtbl) order by c1.num desc

This uses a triangular join which grows geometrically and will quickly grind your server to halt. For a much better approach to calculating running totals read Jeff Moden's article Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url]. Be sure to read all of the warnings and the discussion for the article as well.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 4

roryp 96873

SSCertifiable

Points: 6649

December 29, 2011 at 9:14 am

#1426506

Thanks Drew, I will check out that article. I am pretty new to SQL Server myself and that method was the first thing that popped into my head.

I am currently running it on a million rows on my local machine and can see what you mean about bringing things to a grinding halt.

drew.allen SSC Guru Points: 76776 More actions · Answer 5

cometav2011 (12/29/2011)

I want to get the top 80% of a cumulative sum of a numeric field. Thanks but i have more than 1 mln rows and i am newbees. i am looking for something more simple.

thanks

Running totals are inherently complex in T-SQL (2008). You can't expect a simple solution to an inherently complex problem. If you want a simple solution, you'll have to wait for SQL 2012, because they're introducing functions that will make this much simpler.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 6

Cometav, perhaps this will help you.

Drew and/or others, I would appriciate any feedback on this method.

After reading Jeff Moden's article that Drew suggested (which was straightforward and very helpful), I came up with this solution. The problem is that it will only work with a clustered index on the table which may not be cometav's case, I am not sure. I did run this method on a table with a million rows and it finished in about 4 seconds. The only change I had to make was switching the csum column from an int to a bigint as the numbers started to get fairly large in my example.



select num, null csum into databkup.dbo.testtbl

from (

select 60 [num] union all

select 50 [num] union all

select 25 [num] union all

select 45 [num] union all

select 30 [num] union all

select 50 [num] union all

select 15 [num] union all

select 10 [num] union all

select 10 [num]

 ) num

 create clustered index ixc_num

	on dbo.testtbl (num)

--Using the "Quirky Update" from Jeff Moden's article to populate the csum column.

declare @csum bigint

declare @counter int

set @csum = 0

set @counter = 0

update testtbl

set @csum = csum = case when @counter = 1 then num

					else @csum + num end,

	@counter = @counter + 1

from testtbl

option (maxdop 1)

go

--Selecting only the rows with a csum under 80% of the total when in descending order.

select num

from testtbl

where ((select SUM(num) from testtbl)-csum)/convert(float, (select SUM(num) from testtbl)) <= .8

order by num desc

Again, any feedback on this would be greatly appriciated. As I said before I am still pretty new to SQL Server and I can manage to get most things done in there, but my efficiency at a lot of those things could use plenty of work.

SQL Padawan Say Hey Kid Points: 712 More actions · Answer 7

Here is my solution...*edit had to remove numbers CTE :-P*

;WITH NUMBERS(value)



	AS(	SELECT 60

		UNION ALL

		SELECT 50

		UNION ALL

		SELECT 50

		UNION ALL

		SELECT 45

		UNION ALL

		SELECT 30

		UNION ALL

		SELECT 25

		UNION ALL

		SELECT 15

		UNION ALL

		SELECT 10

		UNION ALL

		SELECT 10

		)

, RANKED

	AS (SELECT rc = ROW_NUMBER() OVER (ORDER BY VALUE desc),

				value,total = cast(value as float)/cast(x.total as float),

				maxper = CAST(x.total * .8  as float)

		FROM NUMBERS

		CROSS APPLY (SELECT total = SUM(value) FROM NUMBERS) x )

, maxrecords

	AS (SELECT TOP 1 record,min,max, perc = SUM(r2.total)

		FROM (	SELECT record = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

					min = R1.rc,

					max = R2.rc

				FROM RANKED R1 CROSS JOIN RANKED R2

				WHERE R1.rc < R2.rc AND R1.rc = 1

		) NEWRANK CROSS JOIN RANKED r1 CROSS JOIN RANKED r2    --== All the magic is done here

		WHERE min = r1.rc and max >= r2.rc

		GROUP BY record,min,max

		HAVING SUM(r2.total) <= .80 --This of course can be changed

		ORDER BY record DESC

		)

SELECT value,total FROM RANKED WHERE rc BETWEEN 1 AND (SELECT MAX FROM maxrecords)

cometav2011 SSC Veteran Points: 285 More actions · Answer 8

cometav2011

SSC Veteran

Points: 285

December 29, 2011 at 2:02 pm

#1426708

thanks all

Paul White SSC Guru Points: 150457 More actions · Answer 9

A SQL Server 2012 solution, for comparison's sake:



DECLARE @Data TABLE

(

    item INTEGER NOT NULL

)

;

INSERT @Data

    (item)

VALUES

    (60),

    (50),

    (50),

    (45),

    (30),

    (25),

    (15),

    (10),

    (10)

;



SELECT

    item,

    pct_sum

FROM

(

    SELECT

        d.item,

        pct_sum =

            SUM(d.item) OVER (ORDER BY d.item DESC) * 100 /

            SUM(d.item) OVER ()

    FROM @Data AS d

) AS ps

WHERE

    ps.pct_sum <= 80

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

Paul White SSC Guru Points: 150457 More actions · Answer 10

roryp 96873 (12/29/2011)

After reading Jeff Moden's article that Drew suggested (which was straightforward and very helpful), I came up with this solution. The problem is that it will only work with a clustered index on the table which may not be cometav's case, I am not sure. I did run this method on a table with a million rows and it finished in about 4 seconds. The only change I had to make was switching the csum column from an int to a bigint as the numbers started to get fairly large in my example.

When I try this code on SQL Server 2008 R2 or SQL Server 2012, I get this error:

[font="Courier New"]Msg 425, Level 16, State 1, Line 7

Data type bigint of receiving variable is not equal to the data type int of column 'csum'.[/font]

Anyway, the 'quirky update' is problematic for a number of reasons. It does require, as you say, a particular clustered index on the table. Also, it persists the running total in a separate column, which again may not be suitable for the case at hand. The biggest objection to the method though is that it relies on undocumented (and therefore unsupported) behaviour. This is generally enough to dissuade people from using it in production. Alternative methods are available, as shown in the (rather long) comment thread associated with Jeff's article.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi

roryp 96873 SSCertifiable Points: 6649 More actions · Answer 11

roryp 96873

SSCertifiable

Points: 6649

December 30, 2011 at 7:59 am

#1426965

Thanks Paul, I haven't made it through the comments in that article yet, but I will be sure to read through them. That error came from me switching the cumulative sum column from an int to bigint. I think I left it as a bigint in the code I posted, switching it back to an int should make it work.

Paul White SSC Guru Points: 150457 More actions · Answer 12

roryp 96873 (12/30/2011)

Thanks Paul, I haven't made it through the comments in that article yet, but I will be sure to read through them. That error came from me switching the cumulative sum column from an int to bigint. I think I left it as a bigint in the code I posted, switching it back to an int should make it work.

Yep, no worries.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi