Getting the nth record for each group

  • Candidate for QOD

    I have a table with 3.3m rows, the table is well indexed to support the queries.

    There are 500 records for each IssuerID.

    CREATE TABLE [dbo].[FactPV](

     [ID] [int] NOT NULL,

     [ScenarioDate] [datetime] NOT NULL,

     [IssuerID] [int] NOT NULL,

     [PortfolioID] [int] NULL,

     [PV01h] [numeric](38, 10) NULL,

     [PV01l] [numeric](38, 10) NULL,

     [CSh] [numeric](25, 10) NULL,

     [CSl] [numeric](25, 10) NULL,

     [PVTotal] [numeric](25, 10) NULL,

    ) ON [PRIMARY]

    For each Issuerid in a subset I need to get

    6th highest PVTotal

    6th lowest PVTotal

    6th highest [CSh] + [CSl]

    I currently load all the records for the issuers into a table var

    I create a second table var for the results

    I use the following within a cursor for each IssuerID

    1 SET ROWCOUNT 6

     Insert into @tblFcVaR

     Select DataDate, F.IssuerID, PVTotal, 0

     From @tbltemp F

     where  F.IssuerId= @IssuerID

     Order by PVTotal

    2 Select top 1 @DwnPV = PVTotal from @tblFcVaR order by PVTotal desc

     delete from @tblFcVaR

    This is repeated for lowest PVTotal and highest [CSh] + [CSl]

    and the 3 vars are inserted into the result table.

    This is repeated for each Issuer (6-200 times)

    The Question!

    Will I get better performance from selecting directly against the table and therefore getting the benefit of the indexes or selecting against the table var with less data (500k vs 3.3m) and no IO.

    I presume there is no way to get the 6th record in 1 step.

     

     

  • Well, to get the sixth highest (or lowest with a very slight mod..) in every group, a simple way would be to skim off the keys for the top five of each group into a table variable and use a NOT IN () to get the sixth. This doesn't really work too well if there is a tie  but otherwise would do the trick. I've done a sketch using dummy data to try to illustrate what I mean.....

    SET nocount ON
    DECLARE  @DummyData TABLE(MyID INT IDENTITY(1,1), name VARCHAR(20), score INT, Team VARCHAR(5))
    INSERT INTO @DummyData SELECT 'fred',5,'blue' INSERT INTO @DummyData SELECT 'mary',2,'blue'
    INSERT INTO @DummyData SELECT 'jane',1,'blue' INSERT INTO @DummyData SELECT 'paul',3,'blue'
    INSERT INTO @DummyData SELECT 'bill',2,'blue' INSERT INTO @DummyData SELECT 'barry',4,'blue'
    INSERT INTO @DummyData SELECT 'garry',6,'blue' INSERT INTO @DummyData SELECT 'Raj',2,'blue'
    INSERT INTO @DummyData SELECT 'brad',5,'blue' INSERT INTO @DummyData SELECT 'Mo',7,'blue'
    INSERT INTO @DummyData SELECT 'joe',12,'blue' INSERT INTO @DummyData SELECT 'mell',8,'blue'
    INSERT INTO @DummyData SELECT 'keith',13,'blue' INSERT INTO @DummyData SELECT 'Shane',9,'blue'
    INSERT INTO @DummyData SELECT 'ian',7,'red' INSERT INTO @DummyData SELECT 'mary',5,'red'
    INSERT INTO @DummyData SELECT 'jane',12,'red' INSERT INTO @DummyData SELECT 'paul',13,'red'
    INSERT INTO @DummyData SELECT 'damian',5,'red' INSERT INTO @DummyData SELECT 'barry',7,'red'
    INSERT INTO @DummyData SELECT 'garry',8,'red' INSERT INTO @DummyData SELECT 'pete',2,'red'
    INSERT INTO @DummyData SELECT 'keith',6,'red' INSERT INTO @DummyData SELECT 'sanji',9,'red'
    INSERT INTO @DummyData SELECT 'joe',15,'red' INSERT INTO @DummyData SELECT 'mell',18,'red'
    INSERT INTO @DummyData SELECT 'keith',13,'red' INSERT INTO @DummyData SELECT 'Shane',19,'red'
    INSERT INTO @DummyData SELECT 'andy',4,'red' INSERT INTO @DummyData SELECT 'Saun',5,'red'
    
    DECLARE  @topRankers TABLE (TheirID INT, iteration INT)
    DECLARE @ii INT
    DECLARE @iiMax INT
    
    SELECT @ii=6
    WHILE @ii>0
           BEGIN
           INSERT INTO @TopRankers(TheirID, iteration)
           SELECT MyID,@ii FROM @DummyData g INNER JOIN
                   (
                   SELECT [highscore]=MAX(score), team FROM @DummyData
                   WHERE myID NOT IN 
                           (
                           SELECT TheirID FROM @TopRankers
                           )
                    GROUP BY team
                   )f
           ON f.highScore=Score AND f.team=g.team
           SELECT @ii=@ii-1
           END
    SELECT name, score, team FROM @dummyData INNER JOIN @TopRankers ON theirID=myID 
            WHERE iteration=1

    Best wishes,
    Phil Factor

  • Mark

    In the absence of any sample data, I've used the Northwind database.  This query finds the second highest value of Freight for each CustomerID in the Orders table.  You should be able to tweak it fairly easily to suit your needs.  No guarantees given on performance - although it doesn't use an explicit cursor or WHILE loop, I think it will have to run the TOP query once for each CustomerID.

    John

    select distinct o1.customerid,

     (select top 1 oo.freight from

      (select top 2 o2.freight from Orders o2

       where o2.CustomerID = o1.CustomerID

       order by o2.Freight desc) oo

      order by oo.Freight asc)

      as SecondFreight

    from orders o1

  • You should be able to write a version of this query using the new ROW_NUMBER intrinsic as well..


    Take care..

    - Ward Pond
    blogs.technet.com/wardpond

  • Outcome from the ideas - thanks.

    Stage 1 was to select directly against the heavily indexed FactPV table. This reduced the processing from 6:14 to 3:24

    I then created a summary table which reduced the row count from 3.3m to 1.25m and selected directly against the indexed summary table. This reduced the processing from 3:24 to 1:55

    I then implemented the Row_Number method and reduced the processing to 1:24. I had to create a view to do the sum outside the Row_Number method. While it probably could have been done using another select I dislike too many nested select statements.

     WITH PVData AS

     (SELECT PVTotal,

     ROW_NUMBER() OVER (order by PVTotal)as RowNumber

     FROM vwFactPVSum01

     where IssuerId= @IssuerID

    &nbsp

     SELECT @DwnPV = PVTotal

     FROM PVData

     WHERE RowNumber = 6;

    Thanks you all and especially Ward Pond for the Row_Number hint.

    Mark Firth

     

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply