Finding Start and End Dates

  • I have a series of dates with values assigned to them. I want to pull only the records that have the same value for 3 consecutive days.

    CREATE TABLE [dbo].[TestTable](

    [keyfield] [int] IDENTITY(1,1) NOT NULL,

    [datefield] [datetime] NOT NULL,

    [valuefield] [int] NOT NULL,

    CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED

    (

    [keyfield] ASC

    )WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    SET IDENTITY_INSERT [dbo].[TestTable] ON

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (1, '20080713 00:00:00.000', 1)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (2, '20080714 00:00:00.000', 4)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (3, '20080715 00:00:00.000', 6)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (4, '20080720 00:00:00.000', 6)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (5, '20080717 00:00:00.000', 6)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (6, '20080718 00:00:00.000', 4)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (7, '20080721 00:00:00.000', 2)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (8, '20080716 00:00:00.000', 6)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (9, '20080801 00:00:00.000', 7)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (10, '20080805 00:00:00.000', 8)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (11, '20080806 00:00:00.000', 8)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (12, '20080807 00:00:00.000', 8)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (13, '20080808 00:00:00.000', 4)

    SET IDENTITY_INSERT [dbo].[TestTable] OFF

    I should be returning the following records

    Date Value

    7/15/2008 6

    7/16/2008 6

    7/17/2008 6

    8/05/2008 8

    8/06/2008 8

    8/07/2008 8

    My thought was that if I could figure out the start and end date for each range I would be good. But I can't seem to get there.

    This is what I have so far. I'm trying not to use any loops or cursors.

    ;with TableWRowNum

    as

    (

    select row_number() over ( Order By Datefield,valuefield) as row,datefield,valuefield

    from testtable

    )

    select TableWRowNum.row,TableWRowNum.datefield curRecDay,

    PrevRow.datefield prevRecDay,NextRow.datefield nextRecDay,

    TableWRowNum.valuefield,

    datediff(d,PrevRow.datefield,TableWRowNum.datefield) daysBefore,

    datediff(d,TableWRowNum.datefield,NextRow.datefield) daysAfter,

    case when datediff(d,PrevRow.datefield,TableWRowNum.datefield)=1 or datediff(d,TableWRowNum.datefield,NextRow.datefield) = 1 then 1 else 0 end sequential

    from TableWRowNum left join TableWRowNum PrevRow on TableWRowNum.row=PrevRow.row+1

    left join TableWRowNum NextRow on TableWRowNum.row = NextRow.row-1

    Any help would be appreciated.

  • (Your sample data doesn't seem to match your requirements. Take a look at the dates where the value is 6, they go 15, 17, 20, but you have a 4 assigned on the 18th.)

    Here's how I'd do it, at least as a rough draft:

    ;with

    CTE1 (Row, Date, Val) as -- Date sequence

    (select row_number() over (order by datefield),

    datefield, valuefield

    from dbo.TestTable),

    CTE2 (SDate, EDate) as -- Same value 3 in a row

    (select c1.date, c3.date

    from cte1 c1

    inner join cte1 c2

    on c1.val = c2.val

    and c1.row = c2.row-1

    inner join cte1 c3

    on c1.val = c3.val

    and c1.row = c3.row-2)

    select datefield, valuefield -- Final select

    from dbo.TestTable

    inner join cte2

    on datefield between sdate and edate;

    Now, that doesn't quite handle situations like four (or more) in a row properly. Is that a possibility? If so, how do you want it to work? Give all four row, give overlapping sets of 3, something else?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • The data is correct. On the 15,16,17,20 there is a 6. I don't want to pull the 20th because it is not sequential. The 18th is a 4 because I wanted there to be a gap so the 20th won't be part of the result.

    I can't base it on always pulling where there are 3 sequential dates with the same results. That number will be variable. It would be 4...20...or even 100.

    Your SQL did give me the correct results. But I will need to now make the "3" a variable number.

    I don't know if this simplifies or complicates things more.

    Date Value Start End Days

    7/13/2008 1 7/13/2008 7/13/2008 1

    7/14/2008 4 7/14/2008 7/14/2008 1

    7/15/2008 6 7/15/2008 7/17/2008 3

    7/16/2008 6 7/15/2008 7/17/2008 3

    7/17/2008 6 7/15/2008 7/17/2008 3

    7/18/2008 4 7/18/2008 7/18/2008 1

    7/20/2008 6 7/20/2008 7/20/2008 1

    7/21/2008 1 7/21/2008 7/21/2008 1

    8/01/2008 7 8/01/2008 8/01/2008 1

    8/05/2008 8 8/05/2008 8/07/2008 3

    8/06/2008 8 8/05/2008 8/07/2008 3

    8/07/2008 8 8/05/2008 8/07/2008 3

    8/08/2008 4 8/08/2008 8/08/2008 1

    I need all records with a 3...but 3 could be a variable. I could be looking for 2 or 5 or 12 etc.

    I hope that explains it better.

  • I didn't have time to test this, but give it a shot and tell me what's broken, and I'll have time to get back to it in a little while.

    ;with

    CTE1 (Row, Date, Val) as -- Date sequence

    (select row_number() over (order by datefield),

    datefield, valuefield

    from dbo.TestTable),

    CTE2 (Row, Val, Date, ValSet) as -- Sets

    (select c1.row, c1.val, c1.date,

    row_number() over (order by c1.row)

    from cte1 c1

    inner join cte1 c2

    on c1.val != c2.val

    and c1.row = c2.row+1

    union all

    select c3.row, c3.val, c3.valset, cte2.valset

    from cte1 c3

    inner join cte2

    on c3.val = cte2.val

    and c3.row = cte2.row+1),

    CTE3 (SDate, EDate, Dates) as -- Ranges and Qty

    (select min(date), max(date), count(*)

    from cte2

    group by valset)

    select datefield, valuefield, dates -- Final select

    from dbo.TestTable

    inner join cte3

    on datefield between sdate and edate;

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Msg 207, Level 16, State 1, Line 38

    Invalid column name 'valset'.

    You are calling cte2 from withing cte2. I'm trying to figure out what you are trying to do.

  • You can do that with CTEs (Common Table Expression). They can be recursive.

    Not sure why it would give you an invalid column name. I'll test it.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • In the second part of CTE2 (after Union All), change "c3.valset" to "c3.date". That should give you what you need, regardless of the number of dates in a row. I tested it on your test data and it seems to do what's needed.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I finally got it. Yours worked but when I added some more data it broke it.

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (14, '20080902 00:00:00.000', 4)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (15, '20080903 00:00:00.000', 4)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (16, '20080904 00:00:00.000', 4)

    INSERT INTO [dbo].[TestTable] ([keyfield], [datefield], [valuefield]) VALUES (17, '20080910 00:00:00.000', 4)

    But this works!

    ;with TableWRowNum

    as

    (

    select row_number() over ( Order By Datefield,valuefield) as row,datefield,valuefield

    from testtable

    )

    ,

    CTE2 as (

    select TableWRowNum.row,TableWRowNum.datefield curRecDay,

    PrevRow.datefield prevRecDay,NextRow.datefield nextRecDay,

    TableWRowNum.valuefield,

    case when PrevRow.ValueField != TableWRowNum.ValueField or PrevRow.ValueField is null then TableWRowNum.datefield

    when datediff(d,PrevRow.datefield,TableWRowNum.datefield) > 1 then TableWRowNum.datefield

    else ''

    end as StartGroup,

    case when NextRow.ValueField != TableWRowNum.ValueField or NextRow.ValueField is null then TableWRowNum.datefield

    when datediff(d,TableWRowNum.datefield,NextRow.datefield) > 1 then TableWRowNum.datefield

    else ''

    end as EndGroup

    from TableWRowNum left join TableWRowNum PrevRow on TableWRowNum.row=PrevRow.row+1

    left join TableWRowNum NextRow on TableWRowNum.row = NextRow.row-1

    )

    ,

    CTE3 as (

    select row,curRecDay,valuefield,startgroup,endgroup,

    case startgroup when '1900-01-01 00:00:00.000' then (select max(startgroup) from cte2 where row < c2.row) else startgroup end as StartDate,

    case endgroup when '1900-01-01 00:00:00.000' then (select min(endgroup) from cte2 where row > c2.row and endgroup != '1900-01-01 00:00:00.000') else endgroup end as EndDate

    from CTE2 c2

    )

    select row,curRecDay,valuefield,startdate,enddate,datediff(d,startdate,enddate)+1 as Days

    from cte3

    Returns:

    row curRecDay valuefield startdate enddate Days

    --------------- ------------------------ -------- ----------------------- ------------------------ ----------

    1 2008-07-13 00:00:00.000 1 2008-07-13 00:00:00.000 2008-07-13 00:00:00.000 1

    2 2008-07-14 00:00:00.000 4 2008-07-14 00:00:00.000 2008-07-14 00:00:00.000 1

    3 2008-07-15 00:00:00.000 6 2008-07-15 00:00:00.000 2008-07-17 00:00:00.000 3

    4 2008-07-16 00:00:00.000 6 2008-07-15 00:00:00.000 2008-07-17 00:00:00.000 3

    5 2008-07-17 00:00:00.000 6 2008-07-15 00:00:00.000 2008-07-17 00:00:00.000 3

    6 2008-07-18 00:00:00.000 4 2008-07-18 00:00:00.000 2008-07-18 00:00:00.000 1

    7 2008-07-20 00:00:00.000 6 2008-07-20 00:00:00.000 2008-07-20 00:00:00.000 1

    8 2008-07-21 00:00:00.000 2 2008-07-21 00:00:00.000 2008-07-21 00:00:00.000 1

    9 2008-08-01 00:00:00.000 7 2008-08-01 00:00:00.000 2008-08-01 00:00:00.000 1

    10 2008-08-05 00:00:00.000 8 2008-08-05 00:00:00.000 2008-08-07 00:00:00.000 3

    11 2008-08-06 00:00:00.000 8 2008-08-05 00:00:00.000 2008-08-07 00:00:00.000 3

    12 2008-08-07 00:00:00.000 8 2008-08-05 00:00:00.000 2008-08-07 00:00:00.000 3

    13 2008-08-08 00:00:00.000 4 2008-08-08 00:00:00.000 2008-08-08 00:00:00.000 1

    14 2008-09-02 00:00:00.000 4 2008-09-02 00:00:00.000 2008-09-04 00:00:00.000 3

    15 2008-09-03 00:00:00.000 4 2008-09-02 00:00:00.000 2008-09-04 00:00:00.000 3

    16 2008-09-04 00:00:00.000 4 2008-09-02 00:00:00.000 2008-09-04 00:00:00.000 3

    17 2008-09-10 00:00:00.000 4 2008-09-10 00:00:00.000 2008-09-10 00:00:00.000 1

    Thanks for helping today.

  • Glad I could help.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply