row_num/rank over date dimension table

  • I have a date dimension table with calendar date, year_num and month_num columns. This table is populated with calendar dates for 100 years.

    I am executing below query and not getting what i wanted in fourth column.

    select day_date

    ,year_num

    ,month_num

    ,row_number() over (PARTITION BY year_num, month_num order by month_num)

    from date_dim

    where day_date between '2009-11-1' and '2010-1-31'

    IN FOURTH COLUMN I AM EXPECTING 1 FOR NOV 2009, 2 FOR DECEMBER 2009 ADN 3 FOR JAN 2010 RECORDS.

  • You tell us what you want, but you left out what you are getting currently.

  • PARTITION BY means to start counting all over again from 1 when this value changes. It isn't required.

    Try this:

    ,row_number() over (ORDER BY day_date)

    But, if you have one row per day, instead of one row per month, ROW_NUMBER() isn't going to do what you want at all. Let me know if that's the case, and I'll post a solution for you.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Here is what I am expecting. I am listing only 2-3 records per month, in real I get 30/31 records per month.

    (day_date) (year_num) (month_num) (month_sequence)

    ---------------------------------------------------------

    (2009-11-1) (2009) (11) (1)

    .

    .

    (2009-11-30) (2009) (11) (1)

    (2009-12-1) (2009) (12) (2)

    .

    .

    (2009-12-31) (2009) (12) (2)

    (2010-1-1) (2010) (1) (3)

    .

    .

    (2010-1-31) (2010) (1) (3)

  • Look at the following, and try using the dense_rank function.

    select *, dense_rank() over (order by MyYear, MyMonth) from @TestTable;

  • If i've understood your requirements correctly you might want to try DENSE_RANK()

    SELECT

    day_date,

    year_num,

    month_num,

    DENSE_RANK() OVER(ORDER BY year_num, month_num)

    FROM date_dim

    WHERE day_date BETWEEN '2009-11-01' AND '2010-01-31'

    Dave

  • Isn't DENSE_RANK going to force a sort?

    declare @startDate datetime

    set @startDate = '11/1/2009'

    select DATEDIFF(month,0,@startdate)-1317 -- this datediff calculation is the solution

    -- proof

    ;with tally (N) as (select row_number() over(order by id) from master..syscolumns)

    ,cte1 as (select N, dateadd(day,N-1,@startDate) as someDate from tally where N <= 120)

    select N,somedate,DATEDIFF(month,0,somedate)-1317 as Column4

    from cte1

    Okay, Lynn: I admit it. I didn't even think of DENSE_RANK.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Thanks for everyone who is invloved.

    dense_rank did work.

  • Here is the code I wrote to test:

    declare @TestTable table (

    MyDate datetime,

    MyYear int,

    MyMonth int

    );

    insert into @TestTable

    select dateadd(mm, 2, dateadd(dd, datediff(dd, 0, getdate()),0)) + (N - 1), year(dateadd(mm, 2, dateadd(dd, datediff(dd, 0, getdate()),0)) + (N - 1)), month(dateadd(mm, 2, dateadd(dd, datediff(dd, 0, getdate()),0)) + (N - 1))

    from dbo.Tally

    where N < 93;

    select *, dense_rank() over (order by MyYear, MyMonth) from @TestTable;

  • HAH! I was right!! (cruel chuckle)

    The DENSE_RANK solution adds a sort and segment to the execution plan, so the DATEDIFF solution runs twice as fast in the test below. BUT, DENSE_RANK slows down a lot more as the rows get wider, so the difference in efficiency widens as well. Of course, it takes a million rows or so to really notice the difference.

    declare @startDate datetime

    set @startDate = '1/1/2009'

    -- populate a million row test table from pregenerated tally table

    if OBJECT_ID(N'TempDB..#test') is not null drop table #test

    select N,DATEADD(DAY,N-1,@startDate) as someDate,

    DATEPART(Year,DATEADD(DAY,N-1,@startDate)) as someYear,

    DATEPART(Month,DATEADD(DAY,N-1,@startDate)) as someMonth

    --, 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' as filler

    into #test

    from tally

    if OBJECT_ID(N'TempDB..#catch1') is not null drop table #catch1

    if OBJECT_ID(N'TempDB..#catch2') is not null drop table #catch2

    print '----- DENSE RANK SOLUTION'

    set statistics time on;

    select *, dense_rank() over (order by someYear, someMonth) as Column4

    into #catch1

    from #test;

    set statistics time off;

    print '----- DATEDIFF SOLUTION'

    set statistics time on;

    select *,DATEDIFF(month,0,somedate)-1317 as Column4

    into #catch2

    from #test

    set statistics time off;

    [Without Filler]

    ----- DENSE RANK SOLUTION

    SQL Server Execution Times:

    CPU time = 3198 ms, elapsed time = 3268 ms.

    (1000000 row(s) affected)

    ----- DATEDIFF SOLUTION

    SQL Server Execution Times:

    CPU time = 1575 ms, elapsed time = 1568 ms.

    (1000000 row(s) affected)

    [WITH Filler]

    ----- DENSE RANK SOLUTION

    SQL Server Execution Times:

    CPU time = 4508 ms, elapsed time = 7628 ms.

    (1000000 row(s) affected)

    ----- DATEDIFF SOLUTION

    SQL Server Execution Times:

    CPU time = 2106 ms, elapsed time = 2411 ms.

    (1000000 row(s) affected)

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Cool!

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply