Convert Date List to Date Range

  • Hello,

    I have a table that a record gets inserted into every time a member subscribes or unsubscribes a list.

    Below is the sample data

    create table ##TEST (

    ID int,

    FLAG char(1),

    DT date)

    insert into ##TEST select 1, 'S', '2013-01-01' -- SUBSCRIBED

    insert into ##TEST select 1, 'U', '2013-06-01' -- UNSUBSCRIBED

    insert into ##TEST select 1, 'S', '2014-07-01' -- SUBSCRIBED

    insert into ##TEST select 1, 'U', '2014-12-01' -- UNSUBSCRIBED

    insert into ##TEST select 1, 'S', '2016-01-01' -- SUBSCRIBED

    It looks as below

    ID FLAG DT

    ----------- ---- ----------

    1 S 2013-01-01

    1 U 2013-06-01

    1 S 2014-07-01

    1 U 2014-12-01

    1 S 2016-01-01

    I'd like to convert the data list into a range as below

    ID EffectiveDate TermDate

    -----------------------------------

    1 2013-01-01 2013-06-01

    1 2014-07-01 2014-12-01

    1 2016-01-01 null

    Can someone please help me how to do this? I'd greatly appreciate your help.

  • This is what I currently have but it will have a performance issue as the source table might have millions records.

    select

    a.ID,

    a.EfftDT,

    isnull(a.TermDT,'9999-01-01')

    from (

    select

    a.ID,

    a.DT as EfftDT,

    b.DT as TermDT,

    row_number() over (partition by a.ID, a.DT order by datediff(day,a.DT,b.DT)) as RNK

    from ##TEST a with (nolock)

    left join ##TEST b with (nolock) on a.ID = b.ID and b.FLAG = 'U' and a.DT < b.DT

    where

    a.FLAG = 'S') a

    where

    a.RNK = 1

  • Another version. I think this will minimize the number of records returned from the join. I am wondering if there is a better way.

    with CTE (ID, DT, RNK)

    as (

    select

    a.ID,

    a.DT,

    row_number() over (partition by ID order by a.DT) as RNK

    from ##TEST a

    where

    a.FLAG = 'U')

    select

    a.ID,

    a.DT as CURR_DT,

    isnull(b.DT,'1900-01-01') as PREV_DT

    into ##TEMP

    from CTE a

    left join CTE b on a.ID = b.ID and a.RNK = b.RNK + 1

    select

    a.ID,

    a.DT as EFFT_DT,

    isnull(b.CURR_DT,'9999-01-01') as TERM_DT

    from ##TEST a

    left join ##TEMP b on a.ID = b.ID and a.DT between b.PREV_DT and b.CURR_DT

    where

    a.FLAG = 'S'

  • If you're using 2012 or later, you can use LAG.

    SELECT ID

    , FLAG

    , LAG(DT, 1) OVER (PARTITION BY ID ORDER BY DT) AS EffectiveDate

    , DT AS TermDate

    FROM ##Test;

    Note that I'm assuming (and that's a dangerous thing to do) that the Flag alternates between 'S' and 'U'. If that assumption can ever be false, then LAG won't work for you, and you'd probably have to use a CTE to get the latest previous record that was not equal to the current status/flag.

  • @pietlinden: your query returns this:

    ID EffectiveDate TermDate

    ----------- ------------- ----------

    1 NULL 2013-01-01

    1 2013-01-01 2013-06-01

    1 2013-06-01 2014-07-01

    1 2014-07-01 2014-12-01

    1 2014-12-01 2016-01-01

    (5 row(s) affected)

    The first row, ( 1, NULL, 2013-01-01) it totally off. 2013-01-01 is the first S date, and it cannot be TerminationDate. Then second row says that ID=1 started subscription on 2013-06-01 as EffectiveDate, and ended it on 2013-06-01.

    The third row is again incorrect - it seems that ID=1 subscribed on 2013-06-01, then unsubscribed on 2014-07-01. the opposite is true. In other words, every second row returned by your query is incorrect. To make it work, you should simply hide odd rows (1,3,5) and leave even rows (2,4,6). Still, subscription on 2016 will not be in the result set.

    Instead of trying to fix the query, I suggest this solution. First, add some more rows, to be closer to reality:

    insert into ##TEST select 2, 'S', '2013-02-01' -- SUBSCRIBED

    insert into ##TEST select 2, 'U', '2013-08-01' -- UNSUBSCRIBED

    insert into ##TEST select 2, 'S', '2014-09-01' -- SUBSCRIBED

    insert into ##TEST select 2, 'U', '2014-12-01' -- UNSUBSCRIBED

    insert into ##TEST select 2, 'S', '2016-01-01' -- SUBSCRIBED

    insert into ##TEST select 2, 'U', '2016-01-29' -- SUBSCRIBED

    ID = 2 has all subscriptions terminated, unlike ID 1 where the last subscription is still in effect.

    Here is the query that does the work:

    WITH SubscribeDates AS

    (

    SELECT

    ID, FLAG, DT

    , Seq = row_number() OVER (ORDER BY DT)

    FROM ##TEST

    WHERE Flag = 'S'

    ) , UnsubscribeDAtes AS

    (

    SELECT

    ID, FLAG, DT

    , Seq = row_number() OVER (ORDER BY DT)

    FROM ##TEST

    WHERE Flag = 'U'

    )

    SELECT S.ID, S.DT AS EffectiveDate

    , U.DT AS TermDate

    FROM SubscribeDates AS S

    LEFT JOIN UnsubscribeDAtes AS U ON S.Seq = U.Seq

    ORDER BY ID, EffectiveDate

    ;

    which returns result:

    ID EffectiveDate TermDate

    ----------- ------------- ----------

    1 2013-01-01 2013-06-01

    1 2014-07-01 2014-12-01

    1 2016-01-01 NULL

    2 2013-02-01 2013-08-01

    2 2014-09-01 2014-12-01

    2 2016-01-01 2016-01-29

    (6 row(s) affected)

    Returned rows are telling us:

    ID = 1 subscribed on 2013-01-01 then terminated subscription on 2013-06-01.

    ID = 1 subscribed on 2014-07-01 then terminated subscription on 2014-12-01.

    ID = 1 subscribed on 2016-01-01 and the subscription has been active since.

    ID = 2 subscribed on 2013-02-01, then terminated subscription on 2013-08-01.

    and so on.

    We must assume that the system will somehow enforce 'S' being first FLAG for each new ID, and after that, 'S' and 'F' alternate each other. In other words, start with S and never allow two S or two U for two adjacent rows for any given ID.

    🙂

  • Thank you for your reply. It was very helpful.

  • Thank you.

  • If you have the ability to change your transactional code I would suggest this, only because subscriptions are always paired with cancellations.

    create table ##TEST (ID int, SubscribeDate date, UnsubscribeDate date)

    New subscriptions would be inserted, leaving UnsubscribeDate null.

    When someone unsubscribes, this row is updated to populate the UnsubscribeDate.

    Since subscriptions, unsubscriptions are paired, you would find this far more convenient for reporting purposes.

    - Active subscriptions are those where UnsubscribeDate is null.

    - Cancelled subscriptions have Unsubscribe date populated.

    - Counts of active subscriptions over time become simpler.

    - Calculating the length of subscriptions (and average length) become simpler

    - Reporting PERFORMANCE will be better than any of the solutions using your current schema.

    Just a thought.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Proposed flat design does not solve any of probable issues with existing design. I mentioned the assumptions:

    We must assume that the system will somehow enforce 'S' being first FLAG for each new ID, and after that, 'S' and 'F' alternate each other. In other words, start with S and never allow two S or two U for two adjacent rows for any given ID.[\quote]

    The same applies to flat design:

    - subscription date must not be NULL

    - unsubscribe date must be after subscription date, or NULL

    Even if the designer is aware of these constraints and is capable of enforcing them, information gathered from flat design will be incomplete. Multiple terminations and re-subscriptions are possible with suggested design, but no history is kept. Existing design, separate rows for each transaction (subscribe, unsubscribe) can be improved, without touching what is already there, by adding few more elements to enforce given constraints. Joe Celko has written several good articles on the topic of state transitions, so has Alex Kuznetsov and Dwaine Champs, either on this site or Small Talk.

    Forum posts are not convenient place to discuss serious design topics. If I find some time I might write an article on state transitions and their proper use. Maybe then flat designs would not look as appealing.

    🙂

  • I don't have time to write a paper at the moment, so this will have to suffice. 🙂

    I've read Mr. Celko's books, and I have seen some truly execrable flat designs that failed miserably at trying to track a complicated process on a single row. I wholeheartedly agree that they are unsuitable for tracking a process of multiple steps. But consider the characteristics of a subscription. They include:

    An identifier (SubscriptionID)

    Who is subscribing (SubscriberID)

    What service(s) they are subscribing to (omitted in the example below)

    When the subscription begins (SubscribedDate) and

    When the subscription ends (UnsubscribedDate)

    -- when a subscription ends might even be known in advance, as in a one-year magazine subscription

    When someone resubscribes after a subscription has ended, it is in fact a new subscription. So in this case I do not consider the inclusion of both start and stop times on the same row as being a "flat" table.

    Perhaps my first post was too simplistic. Using a schema dictated by the above characteristics.

    declare @SubscriberID int, @UnsubDate date

    Create table #Subscriptions (SubscriptionID int identity(1,1), SubscriberID int, SubscribedDate date, UnSubscribedDate date)

    Insert into #Subscriptions(SubscriberID, SubscribedDate)

    values (1,'2/23/2015'), (2,'2/26/2015'), (3, '3/3/2015')

    set @SubscriberID = 2

    set @UnsubDate = '11/1/2015'

    ;with cte as ( select top (1) -- with proper constraints, the top(1

    SubscriptionID, UnSubscribedDate

    from #Subscriptions

    where SubscriberID = @SubscriberID

    and SubscribedDate < = @UnsubDate

    order by SubscriptionID desc)

    update cte

    set UnSubscribedDate = @UnsubDate

    insert into #Subscriptions(SubscriberID, SubscribedDate)

    values (4,'5/23/2016'), (2,'5/26/2016'), (5, '6/29/2016')

    set @SubscriberID = 1

    set @UnsubDate = '3/1/2016'

    ;with cte as ( select top (1) SubscriptionID, UnSubscribedDate

    from #Subscriptions

    where SubscriberID = @SubscriberID

    and SubscribedDate < = @UnsubDate

    order by SubscriptionID desc)

    update cte

    set UnSubscribedDate = @UnsubDate

    select * from #Subscriptions

    select * from #Subscriptions order by SubscriberID, SubscribedDate

    select sum(1) as ActiveSubscriptions

    from #Subscriptions

    where UnSubscribedDate is null

    select SubscriberID as ActiveSubscriber, SubscriptionID as ActiveSubscription, SubscribedDate

    from #Subscriptions

    where UnSubscribedDate is null

    order by SubscribedDate

    select SubscriberID as Resubscriber, SubscriptionID as LastestSubscription, SubscribedDate as DateOfLatestSubscription

    from #Subscriptions s1

    where UnSubscribedDate is null

    and exists(select 1 from #Subscriptions s2 where s1.SubscriberID = s2.SubscriberID and s1.SubscriptionID > s2.SubscriptionID)

    select *, datediff(Day, SubscribedDate, COALESCE(UnsubScribedDate,Getdate())) as SubscriptionLength

    from #Subscriptions

    select Avg(datediff(Day, SubscribedDate, COALESCE(UnsubScribedDate,Getdate()))) as AverageSubscriptionLength

    from #Subscriptions

    Given, proper constraints should ensure that a Subscriber doesn't wind up with multiple active subscriptions. Not only is history preserved, but look at how simple the queries against that history become. All the select statements together took less than five minutes to create. They will also run efficiently. This is valuable, because simple questions should not take complex queries to answer. We write once and read a hundred times.

    Sorry for being long-winded here. I just hate to see doctrine applied blindly.

    Best regards, all.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • I agree with The Dixie Flatline. Flat design does make answering the question "What is the subscription status of any customer at any given time". Queries are definitely simpler than the ones provided for the original solution. Update when unsubscribing looks as bad as CTE queries for the original solution. In order to have easy query, we made updates more difficult.

    The design is founded on assumption: "With proper constraints....". What would be those constraints? How can we implement them? In no particular order, I could think of:

    1) There cannot be un-subscription without subscription (SubscribedDate IS NULL AND UnSubscribedDate IS NOT NULL)

    2) UnSubscribedDate must be after SubscribedDate

    3) At any given moment, each customer can have only one subscription (no overlap)

    4) New subscription for a customer can be added only if the previous one has been canceled (this follows from 1)

    Let's see what happens if we do not implement any constraints:

    /*---

    This is what we have if we run the script:

    SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

    -------------- ------------ -------------- ----------------

    1 1 2015-02-23 2016-03-01

    2 2 2015-02-26 2015-11-01

    3 3 2015-03-03 NULL

    4 4 2016-05-23 NULL

    5 2 2016-05-26 NULL

    6 5 2016-06-29 NULL

    (6 row(s) affected)

    ---*/

    After adding a few records I got this:

    SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

    -------------- ------------ -------------- ----------------

    1 1 2015-02-23 2016-03-01

    8 2 NULL 2015-10-24

    2 2 2015-02-26 2015-11-01

    7 2 2015-05-26 NULL

    5 2 2016-05-26 NULL

    3 3 2015-03-03 NULL

    4 4 2016-05-23 NULL

    6 5 2016-06-29 NULL

    (8 row(s) affected)

    We see 4 rows for customer ID = 2. The first one is OK - SubscriptionID = 2, subscribed on 2015-02-26, unsubscribed on 2015-11-01. Then we have one row where customer unsubscribed before any subscription at all, then one with overlapping subscription, and one subscription beginning before the previous finished. We don't want bad data, even if querying for reports is easy, wrong report is worse than no report at all. Therefore, we must implement given constraints.

    How to implement given constraints?

    1) There cannot be un-subscription without subscription:

    CHECK ( SubscribedDate IS NOT NULL)

    2) UnSubscribedDate must be after SubscribedDate :

    CHECK ( SubscribedDate < UnSubscribedDate OR UnSubscribedDate IS NULL)

    Constraints 2 and 3 are actually the same one. Good, one less to worry about. However, our life is not twice as easy because of that. How to enforce 2 or 3? That is where Joe's articles offer some hope. I'll add next post for that part.

  • Thank you for putting in the effort to create the constraints.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • How to enforce 2 or 3?

    CREATE UNIQUE INDEX IX_#subscriptions ON #subscriptions (SubscriberID, UnsubScribedDate)

    will work for our example where the definition of an active subscription is one without an UnsubscribeDate. The unique index prevents insertion of a second row with a NULL UnsubscribeDate for a given user.

    With constraints in place, the update statement is simply

    update #subscriptions

    set UnSubscribedDate = @UnsubDate

    where SubscriberID = @SubscriberID

    and UnsubScribeDate is null

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Very nice touch with UNIQUE (SubscriberID, UnsubScribedDate). That does solve part of the problem, but not everything. To recap, this is what happened so far:

    -- Fresh start, with all correct updates:

    -- Table, with all constraints so far:

    IF Object_ID('tempdb..#Subscriptions') IS NOT NULL DROP TABLE #Subscriptions

    GO

    Create table #Subscriptions

    ( SubscriptionID int identity(1,1) NOT NULL

    , SubscriberID int NOT NULL

    , SubscribedDate date NOT NULL

    , UnSubscribedDate date

    , CONSTRAINT [PK_#Subscriptions] PRIMARY KEY (SubscriberID,SubscribedDate)

    , CONSTRAINT [UNIQUE #Subscriptions (SubscriberID,UnSubscribedDate)]

    UNIQUE (SubscriberID,UnSubscribedDate)

    , CONSTRAINT [CK #Subscriptions (SubscribedDate < UnSubscribedDate)]

    CHECK (SubscribedDate < UnSubscribedDate)

    )

    ;

    GO

    Insert into #Subscriptions(SubscriberID, SubscribedDate)

    values (1,'2/23/2015'), (2,'2/26/2015'), (3, '3/3/2015')

    ;

    -- (3 row(s) affected)

    ;

    UPDATE #Subscriptions

    set UnSubscribedDate = '20151101'

    WHERE SubscriptionID = 2

    ;

    insert into #Subscriptions(SubscriberID, SubscribedDate)

    values (4,'5/23/2016'), (2,'5/26/2016'), (5, '6/29/2016')

    ;

    UPDATE #Subscriptions

    set UnSubscribedDate = '20151101'

    WHERE SubscriberID = 1 AND UnSubscribedDate IS NULL

    ;

    -- So far so good. Cool.

    SELECT *

    FROM #Subscriptions

    ORDER BY SubscriberID, SubscribedDate

    ;

    /*--

    SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

    -------------- ------------ -------------- ----------------

    1 1 2015-02-23 2015-11-01

    2 2 2015-02-26 2015-11-01

    5 2 2016-05-26 NULL

    3 3 2015-03-03 NULL

    4 4 2016-05-23 NULL

    6 5 2016-06-29 NULL

    (6 row(s) affected)

    --*/

    Now let's try to break some rules - add subscription before previous one expired, for SubscriberID=2, SubscribedDate = '2016-05-26':

    insert into #Subscriptions(SubscriberID, SubscribedDate)

    values (2,'6/01/2016')

    ;

    /*--

    Msg 2627, Level 14, State 1, Line 1

    Violation of UNIQUE KEY constraint

    'UNIQUE #Subscriptions (SubscriberID,UnSubscribedDate)'.

    Cannot insert duplicate key in object 'dbo.#Subscriptions'.

    The duplicate key value is (2, <NULL>).

    The statement has been terminated.

    --*/

    Cool, constraint [UNIQUE #Subscriptions (SubscriberID,UnSubscribedDate)]

    prevented additional subscription before ending the previous one.

    Let's close the subscription and try again.

    -- This will work:

    UPDATE #Subscriptions -- THsi should work because

    set UnSubscribedDate = '20160610' -- UnSubscribedDate after '2016-05-26'

    WHERE SubscriberID = 2 AND UnSubscribedDate IS NULL

    ;

    /*--

    SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

    -------------- ------------ -------------- ----------------

    1 1 2015-02-23 2015-11-01

    2 2 2015-02-26 2015-11-01

    5 2 2016-05-26 2016-06-10

    3 3 2015-03-03 NULL

    4 4 2016-05-23 NULL

    6 5 2016-06-29 NULL

    (6 row(s) affected)

    --*/

    Let's try to add a subscription for SubscriberID, starting on 2016-05-30 which is before end of previous subscription ( 2016-06-10). The following UPADTE will work, no existing constraints validated. However, it will create illegal entry - new subscription starting before UnSubscribedDate from the previous one.

    insert into #Subscriptions(SubscriberID, SubscribedDate)

    values (2,'2016-05-30')

    ;

    /*--

    SubscriptionID SubscriberID SubscribedDate UnSubscribedDate

    -------------- ------------ -------------- ----------------

    1 1 2015-02-23 2015-11-01

    2 2 2015-02-26 2015-11-01

    5 2 2016-05-26 2016-06-10 = Jun 10, 2016

    8 2 2016-05-30 NULL , started on May 30 2016

    3 3 2015-03-03 NULL

    4 4 2016-05-23 NULL

    6 5 2016-06-29 NULL

    (7 row(s) affected)

    --*/

    Obviously, we need more constraints. What we have so far is OK, but we need more. Time permitting, new post is coming this afternoon, if not, then on Monday I guess.

  • Zidar (7/8/2016)


    Obviously, we need more constraints. What we have so far is OK, but we need more. Time permitting, new post is coming this afternoon, if not, then on Monday I guess.

    maybe someone will knock up a decent test harness ??

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply