Extracting Episode Start and End Dates From Multiple Assignments

  • Hello,

    We have a Staff/Program Assignment that captures to whom and what program a particular patient is assigned.  It's a historical table.  Sometimes a new staff person is assigned but the program remains the same.  I need to extract the start and end date for each episode that a patient was assigned to DG.  Below I have provided some sample data.  For example, the first 3 rows reflect that patient # 000001 was assigned to 3 different staff all during their first episode in DG.

    For Account # 000001, the episodes they were assigned to DG should be: 3/24/17 through 4/2/17; 7/31/17 through 8/9/17
    For Account # 000002, the episodes they were assigned to DG should be: 3/21/17 through 3/24/17


    CREATE TABLE #TEMP(ACCOUNT VARCHAR(15),STAFF VARCHAR(10),PROGRAM VARCHAR(10),STARTDATE DATETIME,ENDDATE DATETIME)
    INSERT INTO #TEMP VALUES ('000001','JOHN','DG','3/24/17','3/26/17')
    INSERT INTO #TEMP VALUES ('000001','JILL','DG','3/27/17','3/28/17')
    INSERT INTO #TEMP VALUES ('000001','STEVE','DG','3/29/17','4/2/17')
    INSERT INTO #TEMP VALUES ('000001','','INACTIVE','4/3/17','4/23/17')
    INSERT INTO #TEMP VALUES ('000001','JOHN','AA','6/1/17','6/20/17')
    INSERT INTO #TEMP VALUES ('000001','JILL','DH','6/30/17','7/30/17')
    INSERT INTO #TEMP VALUES ('000001','CHRIS','DG','7/31/17','8/09/17')
    INSERT INTO #TEMP VALUES ('000001','PHILLIP','AD','8/10/17',NULL)
    INSERT INTO #TEMP VALUES ('000002','STEVE','DG','3/21/17','3/24/17')

    SELECT * FROM #TEMP ORDER BY ACCOUNT,STARTDATE

    DROP TABLE #TEMP

    Thanks in advance for your help.

  • This is a classic Islands problem - You can Google for "SQL Gaps and Islands" for other examples

    Test Data

    CREATE TABLE #TEMP(ACCOUNT VARCHAR(15),STAFF VARCHAR(10),PROGRAM VARCHAR(10),STARTDATE DATETIME,ENDDATE DATETIME)

    INSERT INTO #TEMP VALUES ('000001','JOHN','DG','3/24/17','3/26/17')
    INSERT INTO #TEMP VALUES ('000001','JILL','DG','3/27/17','3/28/17')
    INSERT INTO #TEMP VALUES ('000001','STEVE','DG','3/29/17','4/2/17')
    INSERT INTO #TEMP VALUES ('000001','','INACTIVE','4/3/17','4/23/17')
    INSERT INTO #TEMP VALUES ('000001','JOHN','AA','6/1/17','6/20/17')
    INSERT INTO #TEMP VALUES ('000001','JILL','DH','6/30/17','7/30/17')
    INSERT INTO #TEMP VALUES ('000001','CHRIS','DG','7/31/17','8/09/17')
    INSERT INTO #TEMP VALUES ('000001','PHILLIP','AD','8/10/17',NULL)
    INSERT INTO #TEMP VALUES ('000002','STEVE','DG','3/21/17','3/24/17');

    The solution

    /*********************************************************************************
    NOTE: This technique was based on a solution from pg 196 of the book
       "Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions"
       by Itzik Ben-Gan.
    *********************************************************************************/
    WITH
    --=== Generate a list of numbers to use for padding the dates
    T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
               , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
               , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
               , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
               , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
               , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
               , (0),(0),(0),(0)) AS X(N))
    , Nums(n) AS (SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM T T1 -- Max =    64
         , T T2 -- Max =   4,096
        )
    , cteData AS (
    --=== Generate a padded list of all the dates between STARTDATE and ENDDATE for each record
      SELECT
       ACCOUNT
      , PROGRAM
      , CALCDATE = DATEADD(DD, f.n -1, STARTDATE)
      , DATEADD(day, -1 * DENSE_RANK() OVER(ORDER BY DATEADD(DD, f.n -1, STARTDATE)), DATEADD(DD, f.n -1, STARTDATE)) AS diff
      FROM #TEMP as d
      CROSS APPLY Nums as f
      WHERE f.n <= DATEDIFF(DD, STARTDATE, ENDDATE)+1
    )
    SELECT ACCOUNT
      , PROGRAM
      , MIN(CALCDATE) AS rangeStart
      , MAX(CALCDATE) AS rangeEnd
    FROM cteData
    -- WHERE PROGRAM = 'dg'
    GROUP BY ACCOUNT, PROGRAM, diff
    ORDER BY ACCOUNT, PROGRAM, rangeStart;

  • After re-reading my post from last night, I notice that the solution will only work if STARTTIME is always unique.
    Below is a modified query that will handle multiple ACCOUNT, PROGRAM combinations.

    /*********************************************************************************
    NOTE: This technique was based on a solution from pg 196 of the book
     "Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions"
     by Itzik Ben-Gan.
    *********************************************************************************/
    WITH
    --=== Generate a list of numbers to use for padding the dates
    T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0)) AS X(N))
    , Nums(n) AS (SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM T T1 -- Max =  64
      , T T2 -- Max = 4,096
      )
    , cteData AS (
    --=== Generate a padded list of all the dates between STARTDATE and ENDDATE for each record
    SELECT
      ACCOUNT
    , PROGRAM
    , CALCDATE = DATEADD(DD, f.n -1, STARTDATE)
    , DATEADD(day, -1 * DENSE_RANK() OVER(PARTITION BY ACCOUNT, PROGRAM
                  ORDER BY DATEADD(DD, f.n -1, STARTDATE)), DATEADD(DD, f.n -1, STARTDATE)) AS diff
    FROM #TEMP as d
    CROSS APPLY Nums as f
    WHERE f.n <= DATEDIFF(DD, STARTDATE, ENDDATE)+1
    )
    SELECT ACCOUNT
     , PROGRAM
     , MIN(CALCDATE) AS rangeStart
     , MAX(CALCDATE) AS rangeEnd
    FROM cteData
    -- WHERE PROGRAM = 'dg'
    GROUP BY ACCOUNT, PROGRAM, diff
    ORDER BY ACCOUNT, PROGRAM, rangeStart;

  • DesNorton - Saturday, May 6, 2017 12:47 AM

    After re-reading my post from last night, I notice that the solution will only work if STARTTIME is always unique.
    Below is a modified query that will handle multiple ACCOUNT, PROGRAM combinations.

    /*********************************************************************************
    NOTE: This technique was based on a solution from pg 196 of the book
     "Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions"
     by Itzik Ben-Gan.
    *********************************************************************************/
    WITH
    --=== Generate a list of numbers to use for padding the dates
    T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)
        , (0),(0),(0),(0)) AS X(N))
    , Nums(n) AS (SELECT n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM T T1 -- Max =  64
      , T T2 -- Max = 4,096
      )
    , cteData AS (
    --=== Generate a padded list of all the dates between STARTDATE and ENDDATE for each record
    SELECT
      ACCOUNT
    , PROGRAM
    , CALCDATE = DATEADD(DD, f.n -1, STARTDATE)
    , DATEADD(day, -1 * DENSE_RANK() OVER(PARTITION BY ACCOUNT, PROGRAM
                  ORDER BY DATEADD(DD, f.n -1, STARTDATE)), DATEADD(DD, f.n -1, STARTDATE)) AS diff
    FROM #TEMP as d
    CROSS APPLY Nums as f
    WHERE f.n <= DATEDIFF(DD, STARTDATE, ENDDATE)+1
    )
    SELECT ACCOUNT
     , PROGRAM
     , MIN(CALCDATE) AS rangeStart
     , MAX(CALCDATE) AS rangeEnd
    FROM cteData
    -- WHERE PROGRAM = 'dg'
    GROUP BY ACCOUNT, PROGRAM, diff
    ORDER BY ACCOUNT, PROGRAM, rangeStart;

    Des - this loses the row for program "AD".
    Here's an alternative method adapted from something I found on an Oracle site recently:
    SELECT ACCOUNT, PROGRAM, STARTDATE = MIN(STARTDATE), ENDDATE = MIN(ENDDATE)
    FROM (
     SELECT *, Grouper = DATEADD(DAY,
       0-SUM(1+DATEDIFF(DAY,STARTDATE,ENDDATE)) OVER(PARTITION BY ACCOUNT,PROGRAM ORDER BY STARTDATE),
       ENDDATE)
     FROM #TEMP
    ) d
    GROUP BY ACCOUNT, PROGRAM, Grouper

    Here's a breakdown of how it works:
    SELECT *, Grouper = DATEADD(DAY,0-SumDuration,ENDDATE)
    FROM (
     SELECT *, SumDuration = SUM(Duration) OVER(PARTITION BY ACCOUNT,PROGRAM ORDER BY STARTDATE)
     FROM (
      SELECT *, Duration = 1+DATEDIFF(DAY,STARTDATE,ENDDATE)
      FROM #TEMP
     ) d1
    ) d2
    ORDER BY ACCOUNT,STARTDATE

    This is generally around ten times faster than the IBG method you reference - but I'm fairly sure he's improved on it in the last couple of years.

    Edit: typical execution stats

    (7 row(s) affected)

    Table 'Worktable'. Scan count 15, logical reads 54 ...

    Table '#TEMP'. Scan count 1, logical reads 2 ...

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    =========================================================

    (6 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0 ...

    Table '#TEMP'. Scan count 1, logical reads 8193 ...

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 20 ms.



    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Nice Chris

    Just one small change to your script
    Need to change MIN(ENDDATE) to MAX(ENDDATE)

    SELECT ACCOUNT, PROGRAM, STARTDATE = MIN(STARTDATE), ENDDATE = MAX(ENDDATE)
    FROM (
    SELECT *, Grouper = DATEADD(DAY,
     0-SUM(1+DATEDIFF(DAY,STARTDATE,ENDDATE)) OVER(PARTITION BY ACCOUNT,PROGRAM ORDER BY STARTDATE),
     ENDDATE)
    FROM #TEMP
    ) d
    GROUP BY ACCOUNT, PROGRAM, Grouper

  • DesNorton - Sunday, May 7, 2017 2:57 AM

    Nice Chris

    Just one small change to your script
    Need to change MIN(ENDDATE) to MAX(ENDDATE)

    SELECT ACCOUNT, PROGRAM, STARTDATE = MIN(STARTDATE), ENDDATE = MAX(ENDDATE)
    FROM (
    SELECT *, Grouper = DATEADD(DAY,
     0-SUM(1+DATEDIFF(DAY,STARTDATE,ENDDATE)) OVER(PARTITION BY ACCOUNT,PROGRAM ORDER BY STARTDATE),
     ENDDATE)
    FROM #TEMP
    ) d
    GROUP BY ACCOUNT, PROGRAM, Grouper

    Many thanks Des.


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Excellent, thank you Des and Chris.  I will be studying this today!  I always love posting here, I learn something new every time.

  • Thanks for the feedback, Adam. Holler if you get stuck.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply