Special Grouping with ROW_NUMBER

  • I am trying to figure out what I am doing wrong when trying to create a grouping for each pair of "To Be Reviewed - CM" and "Completed" status.

    CREATE TABLE [dbo].[DIARY_STATUSES](

    [diary_id] [int] NOT NULL,

    [row_begin_date] [datetime] NOT NULL,

    [row_end_date] [datetime] NOT NULL,

    [diary_status_text] [varchar] (50) NOT NULL,

    CONSTRAINT [PK_DIARY_STATUSES] PRIMARY KEY CLUSTERED

    (

    [diary_id] ASC,

    [row_begin_date] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    INSERT INTO dbo.DIARY_STATUSES

    VALUES(139644,'2009-12-21 00:00:00.000','2009-12-21 23:59:59.997','To Be Reviewed-CM'),

    (139644,'2009-12-22 00:00:00.000','2010-01-07 23:59:59.997','Completed'),

    (139644,'2010-01-08 00:00:00.000','2010-01-08 23:59:59.997','To Be Reviewed-CM'),

    (139644,'2010-01-09 00:00:00.000','2010-02-10 23:59:59.997','Completed'),

    (139644,'2010-02-11 00:00:00.000','2010-02-13 23:59:59.997','To Be Reviewed-CM'),

    (139644,'2010-02-14 00:00:00.000','2010-02-28 23:59:59.997','Not Completed'),

    (139644,'2010-03-01 00:00:00.000','9999-12-31 00:00:00.000','Completed');

    Here is my attempt at the solution.

    with cteGroupData AS

    (

    select row_number () over (order by diary_id, row_begin_date,diary_status_text desc)

    - row_number () over(partition by diary_id, diary_status_text order by diary_id, diary_status_text) as MyGroup

    ,diary_id

    ,row_begin_date

    ,diary_status_text

    from dbo.diary_statuses

    order by row_begin_date

    )

    select row_number() over (partition by MyGroup order by diary_id, row_begin_date, diary_status_text desc) as status_count

    ,*

    from cteGroupData

    order by row_begin_date

    Here are my desired results: These are the results of the three "To Be Reviewed - CM" and "Completed" status groupings.

    MyGroup begin_date end_date

    ----------- ----------------------- -----------------------

    1 2009-12-21 00:00:00.000 2009-12-22 00:00:00.000

    2 2010-01-08 00:00:00.000 2010-01-09 00:00:00.000

    3 2010-02-11 00:00:00.000 2010-03-01 00:00:00.000

  • There might be a better way of doing this, but this does get you your desired results:

    with cteGroupData AS

    (

    SELECT Row1 = row_number () over (order by diary_id, row_begin_date,diary_status_text desc),

    diary_id,

    row_begin_date,

    diary_status_text

    FROM #diary_statuses

    WHERE diary_status_text in ('To Be Reviewed-CM','Completed')

    )

    , cteGroupLevel2 AS

    (

    SELECT *,

    NewGrp = CASE WHEN diary_status_text = 'Completed' THEN Row1-1 ELSE Row1 END

    FROM cteGroupData

    )

    , cteLevel3 AS

    (

    SELECT *,

    MyGroup = DENSE_RANK() OVER (ORDER BY NewGrp)

    FROM cteGroupLevel2

    )

    SELECT MyGroup,

    begin_date = MAX(CASE WHEN diary_status_text = 'To Be Reviewed-CM' THEN row_begin_date ELSE NULL END),

    end_date = MAX(CASE WHEN diary_status_text = 'Completed' THEN row_begin_date ELSE NULL END)

    FROM cteLevel3

    GROUP BY MyGroup

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Another way:

    SELECT my_group = ROW_NUMBER() OVER (ORDER BY DS.row_begin_date),

    DS.row_begin_date,

    iTVF.row_begin_date

    FROM dbo.DIARY_STATUSES DS

    CROSS

    APPLY (

    SELECT TOP (1) *

    FROM dbo.DIARY_STATUSES DS2

    WHERE DS2.diary_id = DS.diary_id

    AND DS2.diary_status_text = 'Completed'

    AND DS2.row_begin_date > DS.row_begin_date

    ORDER BY

    DS2.row_begin_date

    ) iTVF

    WHERE DS.diary_status_text = 'To Be Reviewed-CM'

    ORDER BY

    my_group ASC;

  • Thanks to both Wayne and Paul for their solutions. I ended up using Paul's solution because when run against my larger data set it kept out those records that were not yet Complete.

    Paul,

    The artices on APPLY were great.

  • I think I'm going to have to start forcing myself to try using CROSS APPLY until I get the hang of it.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • dataczar (4/27/2010)


    The articles on APPLY were great.

    Thanks!

  • WayneS (4/27/2010)


    I think I'm going to have to start forcing myself to try using CROSS APPLY until I get the hang of it.

    It's all about different logical processes. Often, it's better to think of an all-at-once solution to a problem. Sometimes, breaking it down into a 'for each thing X' do this thing 'Y' helps - which is where APPLY can help. If you can solve the problem for one member of the input set, it's usually very easy to extend the idea...

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply