Eliminate top and bottom percent of records

  • I have a temp table that is being populated and below is a representation of the data.  Basically, for each mile_code and rep combination there will be ‘X’ number of records.  (In the example below there are 30 records.)  What I need to do is eliminate the top and bottom 10% (This could be any percentage and I want to be able to change that easily if I have to.) of DeltaT for each mile_code and rep combination and then average the remaining values in DeltaT for each mile_code and rep combination. 

     

    I’ve tried a bunch of different approaches and none give me what I’m looking for.  Any ideas or solutions?

     

    mile_code                 DeltaT                    rep                            close_date

    Approach                 1                            Coleman                    6/5/2007 16:52

    Approach                 1                            COLEMAN                 6/5/2007 14:32

    Approach                 1                            COLEMAN                 6/5/2007 9:26

    Approach                 21                          COLEMAN                 5/31/2007 14:07

    Approach                 14                          COLEMAN                 5/31/2007 9:53

    Approach                 7                            Coleman                    5/30/2007 16:12

    Approach                 36                          COLEMAN                 5/29/2007 15:53

    Approach                 8                            COLEMAN                 5/29/2007 13:54

    Approach                 8                            COLEMAN                 5/29/2007 13:23

    Approach                 8                            COLEMAN                 5/29/2007 9:00

    … (20 more records)

    Interview                  1                            Coleman                    6/5/2007 16:52

    Interview                  1                            COLEMAN                 6/5/2007 14:32

    Interview                  1                            COLEMAN                 6/5/2007 9:26

    Interview                  8                            COLEMAN                 5/29/2007 13:54

    Interview                  8                            COLEMAN                 5/29/2007 9:00

    Interview                  14                          COLEMAN                 5/25/2007 8:37

    Interview                  61                          COLEMAN                 5/24/2007 10:57

    Interview                  14                          COLEMAN                 5/23/2007 9:32

    Interview                  26                          COLEMAN                 5/18/2007 9:39

    Interview                  23                          Coleman                    5/14/2007 13:25

    … (20 more records)

    Followed by 30 each of the other 3 milecodes for Coleman

     

    Approach                 0                            DOLAN                      6/6/2007 9:27

    Approach                 1                            DOLAN                      6/5/2007 13:39

    Approach                 1                            DOLAN                      6/5/2007 12:20

    Approach                 1                            DOLAN                      6/5/2007 11:37

    Approach                 92                          DOLAN                      6/5/2007 11:12

    Approach                 91                          DOLAN                      6/5/2007 9:44

    Approach                 2                            Dolan                         6/4/2007 11:53

    Approach                 2                            DOLAN                      6/4/2007 11:11

    Approach                 5                            Dolan                         6/1/2007 16:07

    Approach                 177                        DOLAN                      6/1/2007 10:58

    … (20 more records)

    Interview                  96                          DOLAN                      6/5/2007 16:38

    Interview                  167                        DOLAN                      6/5/2007 14:24

    Interview                  2                            Dolan                         6/4/2007 14:54

    Interview                  2                            DOLAN                      6/4/2007 11:11

    Interview                  5                            Dolan                         6/1/2007 16:07

    Interview                  27                          DOLAN                      6/1/2007 14:48

    Interview                  323                        DOLAN                      6/1/2007 13:49

    Interview                  15                          Dolan                         6/1/2007 11:48

    Interview                  224                        DOLAN                      5/31/2007 16:40

    Interview                  126                        Dolan                         5/30/2007 15:02

    … (20 more records)

  • Top by what?

    DeltaT?

    What if 50% of lines have DeltaT = 1?

    What if there are 25 lines? How many to cut off?

    _____________
    Code for TallyGenerator

  • I thought of this on the way to work this morning and realized the way I described what I wanted was incorrect.  If the data is sorted by DeltaT for each mile_code and rep combination then in this example (30 records for each group and 10% gone from top to bottom) the first 3 records and last 3 records should be removed.  So it doesn't matter what the values of DeltaT are.  If there were fractions caused by the number of records chosen and/or the percentage then I'd want it to just round to the nearest whole number.

  • Have a look at SELECT TOP * 10 PERCENT ... ORDER BY and SELECT * TOP 10 PERCENT .... ORDER BY DESC


    * Noel

  • does the table have a key?


  • If it has a key (and I called mine uid2 here) this may help.  Not sure how fast it would run on a lot of rows.

    select * from testperc t

    left join

    (select uid2 from (select top 15 percent uid2,deltat from testperc order by deltat desc) a

    union

    select uid2 from (select top 15 percent uid2,deltat from testperc order by deltat asc) b

    ) c

    on t.uid2=c.uid2

    where c.uid2 is null


  • It has a key and I tried this but ended up with some strange results.  I should have gotten back 120 records per rep (24 records each for 5 mile_codes) and for some reps I'd get way over that and others way under.

    At any rate, I got some help from some other resources and this is what accomplished it in case anyone else has these issues.  Requires SQL 2005 and I realize I probably should have posted this on the other forum.

    Still have a little more work to do to get the final result before putting it into producton (Have to add some other calculations but will do that by joining in another temp table or something.) Thanks to the help of others this is what I ended up with:

     

    DECLARE @percent int,

                    @recentindex int

     

    SET @percent = 10

    SET @recentindex = 45

     

    Then crate temp table #miledata to get the last 'x' number of records for each rep in each mile_code that they have completed.  Do this based off of the @recentindex variable.

     

    Then use either of the following to get results.  Haven't decided which I'll put into production yet.

     

    OPTION A

    /*This creates a common table expression to run our aggregate query against.*/

    ;WITH cte_AveRecords AS

                    (

                                    SELECT row_number() OVER(partition BY mile_code, rep ORDER BY deltaT) AS rowNum, --gives a row number per group, ordered by deltaT

                                                    COUNT(*) OVER (partition BY mile_code, rep) AS totalRows, --gets total rows by mile_code and rep.

                                                    mile_code,

                                                    rep,

                                                    deltaT,

                                                    recentindex

                                    FROM  #miledata

                    )

     

    /*This deterimines the average per mile_code and rep by selecting the rows that fall above the low percent and less than or equal to the upper percent.*/

    SELECT c.mile_code,

                    c.rep,

                    CAST(AVG(CAST(a.deltaT AS Decimal(5,1))) AS Decimal(5,1)) AS AvgTime

    FROM

                    (

                    SELECT mile_code, rep, deltaT, recentindex, rowNum, totalRows

                    FROM cte_AveRecords

                    WHERE rowNum > totalRows * (@percent / 100.0) --removes the bottom 'x' percent

                                    AND rowNum < totalRows - (totalRows * (@percent / 100.0)) -- removes the top

                    )a

    JOIN cte_AveRecords c ON a.rep = c.rep AND a.mile_code = c.mile_code

    GROUP BY c.mile_code,

                    c.rep

    ORDER BY c.rep,

                    c.mile_code

     

    OPTION B

    /*This creates a common table expression to run our aggregate query against.*/

    ;WITH cte_AveRecords AS

                    (

                    SELECT mile_code,

                    rep,

                    deltat,

    /*This assigns a percentage to each row number in the mile_code and rep group.  See explaination A below*/

                     ((row_number() over(partition by mile_code, rep order by deltat)) * 100.00) / (count(*) over(partition by mile_code, rep)) as pct

                    FROM #miledata

                    )

     /*This deterimines the average per mile_code and rep based on what we populated the CTE with..*/

    SELECT mile_code,

                    rep,

                    CAST(AVG(CAST(deltat AS Decimal(5,1))) AS Decimal(5,1)) AS avg_deltat

    FROM cte_AveRecords

    WHERE pct > @percent and pct < (100.00 - @percent)

    GROUP BY mile_code,

                    rep

    ORDER BY rep,

                    mile_code

    GO

  • DELETE T

    FROM Table T

    INNER JOIN (SELECT TOP 10 PERCENT KeyCol

    FROM Table ORDER BY DelteT ASC) T1 ON T.KeyCol = T1.KeyCol

    INNER JOIN (SELECT TOP 10 PERCENT KeyCol

    FROM Table ORDER BY DelteT DESC) T2 ON T.KeyCol = T2.KeyCol

    _____________
    Code for TallyGenerator

  • Other queries did top/bottom 10% of the whole table which is why you got such a disparity in rows... Serqiy's does it correctly by the key column...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • *If* you could add an Identity column to your data when you import it (ordered properly), you could:

    Declare @Lo Int,@Hi Int

    Select @Lo=Ident_Current('dbo.Data')*0.10,@Hi=Ident_Current('dbo.Data')-@Lo

    Select * from dbo.Data where Ident between Ident_Current('dbo.Data')*0.10 and Ident_Current('dbo.Data')

    Adjust things to suit...



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • Hmmm, forgot to adjust the code for the variables...

    *If* you could add an Identity column to your data when you import it (ordered properly), you could:

    Declare @Lo Int,@Hi Int

    Select @Lo=Ident_Current('dbo.Data')*0.10,@Hi=Ident_Current('dbo.Data')-@Lo

    Select * from dbo.Data where Ident between @Lo and @Hi

    Adjust things to suit...



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply