top 80 percent from sum

  • I tried function top 80 percent i sorted desd, but it still get some unwanted records.

    60 20%

    50 17%

    50 17%

    45 15%

    30 10% 80%

    25 8%

    15 5%

    10 3%

    10 3%

    total 295 100%

    i can do in Excel, i need to use all records >=80 percent and other just deleted. any good advice how to do inSQL 2008? thanks

  • Select top 80 percent * from <sometable>

    will get you 80 percent of the total number of records. Since you have 9 records in the table you show, it returns the top 8 records in your code.

    If you want to get the top 80% of a cumulative sum of a numeric field, you have to do something like this (assuming you want them ordered in descending order):

    --Create data

    select * into databkup.dbo.testtbl

    from (

    select 60 [num] union all

    select 50 [num] union all

    select 50 [num] union all

    select 45 [num] union all

    select 30 [num] union all

    select 25 [num] union all

    select 15 [num] union all

    select 10 [num] union all

    select 10 [num]

    ) num

    --Start code

    ;with cumsum as

    (

    select num, ROW_NUMBER() over (order by num desc) id

    from testtbl

    )

    select c1.num, SUM(c2.num) csum from cumsum c1

    join cumsum c2 on c1.id >= c2.id

    group by c1.id, c1.num

    having SUM(c2.num) <= .8 * (select SUM(num) from testtbl)

    order by c1.num desc

  • I want to get the top 80% of a cumulative sum of a numeric field. Thanks but i have more than 1 mln rows and i am newbees. i am looking for something more simple.

    thanks

  • roryp 96873 (12/29/2011)


    If you want to get the top 80% of a cumulative sum of a numeric field, you have to do something like this (assuming you want them ordered in descending order):

    --Create data

    select * into databkup.dbo.testtbl

    from (

    select 60 [num] union all

    select 50 [num] union all

    select 50 [num] union all

    select 45 [num] union all

    select 30 [num] union all

    select 25 [num] union all

    select 15 [num] union all

    select 10 [num] union all

    select 10 [num]

    ) num

    --Start code

    ;with cumsum as

    (

    select num, ROW_NUMBER() over (order by num desc) id

    from testtbl

    )

    select c1.num, SUM(c2.num) csum from cumsum c1

    join cumsum c2 on c1.id >= c2.id

    group by c1.id, c1.num

    having SUM(c2.num) <= .8 * (select SUM(num) from testtbl)

    order by c1.num desc

    This uses a triangular join which grows geometrically and will quickly grind your server to halt. For a much better approach to calculating running totals read Jeff Moden's article Solving the Running Total and Ordinal Rank Problems (Rewritten)[/url]. Be sure to read all of the warnings and the discussion for the article as well.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks Drew, I will check out that article. I am pretty new to SQL Server myself and that method was the first thing that popped into my head.

    I am currently running it on a million rows on my local machine and can see what you mean about bringing things to a grinding halt.

  • cometav2011 (12/29/2011)


    I want to get the top 80% of a cumulative sum of a numeric field. Thanks but i have more than 1 mln rows and i am newbees. i am looking for something more simple.

    thanks

    Running totals are inherently complex in T-SQL (2008). You can't expect a simple solution to an inherently complex problem. If you want a simple solution, you'll have to wait for SQL 2012, because they're introducing functions that will make this much simpler.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Cometav, perhaps this will help you.

    Drew and/or others, I would appriciate any feedback on this method.

    After reading Jeff Moden's article that Drew suggested (which was straightforward and very helpful), I came up with this solution. The problem is that it will only work with a clustered index on the table which may not be cometav's case, I am not sure. I did run this method on a table with a million rows and it finished in about 4 seconds. The only change I had to make was switching the csum column from an int to a bigint as the numbers started to get fairly large in my example.

    select num, null csum into databkup.dbo.testtbl

    from (

    select 60 [num] union all

    select 50 [num] union all

    select 25 [num] union all

    select 45 [num] union all

    select 30 [num] union all

    select 50 [num] union all

    select 15 [num] union all

    select 10 [num] union all

    select 10 [num]

    ) num

    create clustered index ixc_num

    on dbo.testtbl (num)

    --Using the "Quirky Update" from Jeff Moden's article to populate the csum column.

    declare @csum bigint

    declare @counter int

    set @csum = 0

    set @counter = 0

    update testtbl

    set @csum = csum = case when @counter = 1 then num

    else @csum + num end,

    @counter = @counter + 1

    from testtbl

    option (maxdop 1)

    go

    --Selecting only the rows with a csum under 80% of the total when in descending order.

    select num

    from testtbl

    where ((select SUM(num) from testtbl)-csum)/convert(float, (select SUM(num) from testtbl)) <= .8

    order by num desc

    Again, any feedback on this would be greatly appriciated. As I said before I am still pretty new to SQL Server and I can manage to get most things done in there, but my efficiency at a lot of those things could use plenty of work.

  • Here is my solution...*edit had to remove numbers CTE :-P*

    ;WITH NUMBERS(value)

    AS( SELECT 60

    UNION ALL

    SELECT 50

    UNION ALL

    SELECT 50

    UNION ALL

    SELECT 45

    UNION ALL

    SELECT 30

    UNION ALL

    SELECT 25

    UNION ALL

    SELECT 15

    UNION ALL

    SELECT 10

    UNION ALL

    SELECT 10

    )

    , RANKED

    AS (SELECT rc = ROW_NUMBER() OVER (ORDER BY VALUE desc),

    value,total = cast(value as float)/cast(x.total as float),

    maxper = CAST(x.total * .8 as float)

    FROM NUMBERS

    CROSS APPLY (SELECT total = SUM(value) FROM NUMBERS) x )

    , maxrecords

    AS (SELECT TOP 1 record,min,max, perc = SUM(r2.total)

    FROM ( SELECT record = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    min = R1.rc,

    max = R2.rc

    FROM RANKED R1 CROSS JOIN RANKED R2

    WHERE R1.rc < R2.rc AND R1.rc = 1

    ) NEWRANK CROSS JOIN RANKED r1 CROSS JOIN RANKED r2 --== All the magic is done here

    WHERE min = r1.rc and max >= r2.rc

    GROUP BY record,min,max

    HAVING SUM(r2.total) <= .80 --This of course can be changed

    ORDER BY record DESC

    )

    SELECT value,total FROM RANKED WHERE rc BETWEEN 1 AND (SELECT MAX FROM maxrecords)

  • thanks all

  • A SQL Server 2012 solution, for comparison's sake:

    DECLARE @Data TABLE

    (

    item INTEGER NOT NULL

    )

    ;

    INSERT @Data

    (item)

    VALUES

    (60),

    (50),

    (50),

    (45),

    (30),

    (25),

    (15),

    (10),

    (10)

    ;

    SELECT

    item,

    pct_sum

    FROM

    (

    SELECT

    d.item,

    pct_sum =

    SUM(d.item) OVER (ORDER BY d.item DESC) * 100 /

    SUM(d.item) OVER ()

    FROM @Data AS d

    ) AS ps

    WHERE

    ps.pct_sum <= 80

  • roryp 96873 (12/29/2011)


    After reading Jeff Moden's article that Drew suggested (which was straightforward and very helpful), I came up with this solution. The problem is that it will only work with a clustered index on the table which may not be cometav's case, I am not sure. I did run this method on a table with a million rows and it finished in about 4 seconds. The only change I had to make was switching the csum column from an int to a bigint as the numbers started to get fairly large in my example.

    When I try this code on SQL Server 2008 R2 or SQL Server 2012, I get this error:

    [font="Courier New"]Msg 425, Level 16, State 1, Line 7

    Data type bigint of receiving variable is not equal to the data type int of column 'csum'.[/font]

    Anyway, the 'quirky update' is problematic for a number of reasons. It does require, as you say, a particular clustered index on the table. Also, it persists the running total in a separate column, which again may not be suitable for the case at hand. The biggest objection to the method though is that it relies on undocumented (and therefore unsupported) behaviour. This is generally enough to dissuade people from using it in production. Alternative methods are available, as shown in the (rather long) comment thread associated with Jeff's article.

  • Thanks Paul, I haven't made it through the comments in that article yet, but I will be sure to read through them. That error came from me switching the cumulative sum column from an int to bigint. I think I left it as a bigint in the code I posted, switching it back to an int should make it work.

  • roryp 96873 (12/30/2011)


    Thanks Paul, I haven't made it through the comments in that article yet, but I will be sure to read through them. That error came from me switching the cumulative sum column from an int to bigint. I think I left it as a bigint in the code I posted, switching it back to an int should make it work.

    Yep, no worries.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply