using LAG() to calculate cumulative days but with pauses and resets

  • Guys,

    On 2012.. I have a data set that looks like the following

    CREATE TABLE Action
    (
    Action varchar(100)
    ,ActionDate DATE
    )

    INSERT INTO Action VALUES ('Start','2019-08-01')
    INSERT INTO Action VALUES ('Pause','2019-08-05')
    INSERT INTO Action VALUES ('Start','2019-08-10')
    INSERT INTO Action VALUES ('Reset','2019-08-20')
    INSERT INTO Action VALUES ('Reset','2019-08-25')



    I need to calculate for each date the cumulative number of days before and after the event.  The sequence will always begin with a Start event.  I can use LAG(ActionDate,1) OVER(ORDER BY ActionDate) to bring the current and previous dates together and do a datediff to calculate the days to be added, what I am struggling with is whether to add the difference to the cumulative or set the cumulative to zero because that has to be done in the context of the same window which is not possilble.

    For any given row the cumulative total needs to be one of

    • Previous Action = Started: Prior = Previous cumulative + this row datediff  , Post = Previous cumulative + this row datediff
    • Previous Action = Paused: Prior = Previous cumulative only, Post =  Previous cumulative only
    • Previous Action = Reset: Prior =  Previous cumulative + this row datediff , Post = 0

     

    The above data should yield the following results

     

    Action   Date            PriorClock        PostClock    explain       
    Start 2019-08-01 0 0 Clock Starts at 0
    Pause 2019-08-05 5 5 runs for 5 days and is then stopped
    Start 2019-08-10 5 5 clock has been paused and now restarts
    Reset 2019-08-20 15 0 cumulative got to 15 and now resets but still runs
    Reset 2019-08-25 5 0 cumulative got to 5 and now resets but still runs

    I simply can't work out how to apply the conditional logic to the cumulative so that it is picked up on the next row as it is not a direct LAG() calculation. I am beginning to fear that I will need to resort to cursors!

     

    • This topic was modified 4 years, 9 months ago by  aaron.reese.
    • This topic was modified 4 years, 9 months ago by  aaron.reese.
  • In the absence of any responses.  I have managed to fashion the following code which looks like is working for the specified data set and should work generically

    --drop table #Action

    --CREATE TABLE #Action
    --(
    --Action varchar(100)
    --,ActionDate DATE
    ----,isReset INT
    --)

    --INSERT INTO #Action VALUES('Start','2019-08-01')
    --INSERT INTO #Action VALUES('Pause','2019-08-05')
    --INSERT INTO #Action VALUES('Start','2019-08-10')
    --INSERT INTO #Action VALUES('Reset','2019-08-20')
    --INSERT INTO #Action VALUES('Reset','2019-08-25')
    ;
    WITH CTE AS
    (
    select
    RANK() OVER(ORDER BY ActionDate) AS 'SEQ'
    ,RANK() OVER(ORDER BY ActionDate DESC) AS 'SEQRev'
    ,*
    FROM
    #Action
    )
    ,
    CTE2 AS
    (
    select
    CTE.*
    ,CASE
    WHEN SEQ = 1 THEN 1
    WHEN Action = 'Reset' THEN 1
    ELSE 0
    END AS 'ClockResetNow'
    ,CASE
    WHEN SEQRev = 1 THEN 1
    WHEN LEAD(Action,1) OVER(ORDER BY ActionDate) = 'Reset' THEN 1
    ELSE 0
    END AS 'LastOfClock'
    from
    CTE
    )
    ,
    CTE3 AS
    (
    SELECT
    CTE2.*
    ,LAG(Action,1) OVER(ORDER BY ActionDate) AS 'PrevAction'
    ,LAG(ActionDate,1) OVER(ORDER BY ActionDate) AS 'PrevActionDate'
    ,SUM(CLockResetNow) OVER(ORDER BY ActionDate) AS 'ClockInstance'
    ,CASE
    WHEN Action = 'Pause' AND LAG(Action,1) OVER(ORDER BY ActionDate) <> 'Pause' THEN DATEDIFF(DAY,LAG(ActionDate,1) OVER(ORDER BY ActionDate),ActionDate)
    WHEN Action = 'Start' THEN 0
    WHEN Action = 'Reset' AND LAG(Action,1) OVER(ORDER BY ActionDate) <> 'Pause' THEN DATEDIFF(DAY,LAG(ActionDate,1) OVER(ORDER BY ActionDate),ActionDate)
    else -999
    END AS 'ClockDaysToAdd'

    FROM
    CTE2
    )
    ,CTE3A AS
    (
    SELECT
    *
    ,COALESCE(LAG(ClockInstance,1) OVER(ORDER BY ActionDate),1) AS 'ClockInstancePrior'

    FROM
    CTE3
    )
    ,
    CTE4 AS
    (
    SELECT
    *
    ,SUM(ClockDaysToAdd) OVER(PARTITION BY ClockInstancePrior ORDER BY ActionDate) AS 'CumulativeClockPrior'
    ,CASE
    WHEN Action = 'Reset' THEN 0
    ELSE SUM(ClockDaysToAdd) OVER(PARTITION BY ClockInstancePrior ORDER BY ActionDate)
    END AS 'CumulativeClockPost'
    FROM
    CTE3A
    )

    select * from CTE4

    The issue is that you can't use windowed functions inside their calling query or inside an aggregate with OVER clause so this requires a number of discreet common table expressions to make it work.

    First requirement is to RANK() the records forwards and backwards by date because I need to know the first and last records in the data set: First should always be a START.  the last could be anything

    The next step (CTE2) identifies all the clock resetting events. This will be the first START and any RESET. I also need to know if this is the last record in the clock instance which we do with LEAD() and the reverse sequence counter

    Now that I know how many instances of the clock I have and which records are in which instance I can safely identify whether this event and the previous event should contribute to the cumulative clock total (CTE3)

    Because I need to know the cumulative before and after the event I need to offset the clock instances by 1 (CTE3A)

    Now that I know the Pre and Post clock instances, and I have calculated whether the event contributes to the cumulative clocks, I can use SUM() OVER() to only sum up the contributing times for each clock.

    This seems WAAAAY harder than it needs to be and is probably a performance hog.  Am I missing something?  Cursors would be one option, but I doubt they will be any faster.

     

     

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply