Spreading out record over a date range to be analyzed on a day by day basis

  • I'm not sure I have accomplished this task using best coding practises and I was just wondering if anyone has an idea of how to better solve this problem. So I have a bunch of reading records. Each record has a Date1 and Date2 column. I need to take this one record and generate mutiple records for the span of days from date1 to date2 which each date having its time set to zero (At this point I really wish there was a 'small date' format in SQL server which excludes time.. anyhow)

    So heres an example

    Reading2 = 3

    Date1 = 2010-01-01 10:23:01

    Date2 = 2010-01-03 18:26:04

    So from this one record I would need to generate 3 days worth of records with the Reading2 value equally spread between the three days. So I would want:

    MyDate = 2010-01-01 00:00:00 Reading = 1

    MyDate = 2010-01-02 00:00:00 Reading = 1

    MyDate = 2010-01-03 00:00:00 Reading = 1

    As an output. The only thing I don't like about my way is how it uses a tally table of only a certain amount of days and I cross join to it which is a massive join then I throw a bunch away using a WHERE clause. This completely kills the ability to ever use an index on the date column because of all of my manipulation before I would be able to query based on the date. Here is a sample of what I'm doing. Please let me know if this is ok practise or if you see any better/faster way to do this.

    I know the code looks ugly, but I tried to comment as best as possible. Its all 100% executible if you want to try it for yourself.

    --#################################################

    -- Section 1 create test data

    --#################################################

    --This is just the simplified test data from my example above

    --Generally this would include many reocords like this, in my case around a few hundred thousand

    DECLARE @RawData TABLE

    (

    MethodID INT,

    Reading2 float,

    Date1 datetime,

    Date2 datetime

    )

    INSERT INTO @RawData (MethodID,Reading2,Date1,Date2)

    --First record is example from above, second is just for showing how it works when date1 and date2 on same day

    SELECT 1,3,'2010-01-01 10:23:01','2010-01-03 18:26:04' UNION ALL

    SELECT 2,10,'2010-01-01 10:23:01','2010-01-01 18:26:04'

    --#################################################

    -- Section 2 Declare Tally table to use in cross join to make mutiple records

    --#################################################

    --This is a dynamically created tally table that allows for a date

    --span of up to 10,000 days

    ;WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N))-1 FROM E4)

    --#################################################

    -- Section 3 Code to generate desired result set

    --#################################################

    SELECT

    MethodID,

    MyDate = DateAdd(day,cteTally.N,Temp.Date1NoTime),

    Reading = (Temp.Reading2/(Temp.DaysSpanned))

    FROM

    (

    SELECT

    RawData.MethodID,

    RawData.Reading2,

    --Setting time portion of date to 0

    [Date1NoTime] = DATEADD(day,DATEDIFF(day,0,RawData.Date1),0),

    [Date2NoTime] = DATEADD(day,DATEDIFF(day,0,RawData.Date2),0),

    --Calculating the days spanned

    [DaysSpanned] = (CASE DATEADD(day,DATEDIFF(day,0,RawData.Date1),0)

    WHEN DATEADD(day,DATEDIFF(day,0,RawData.Date2),0) THEN 1

    ELSE 1 + DateDiff(day,DATEADD(day,DATEDIFF(day,0,RawData.Date1),0),DATEADD(day,DATEDIFF(day,0,RawData.Date2),0))END)

    FROM

    @RawData RawData

    )AS Temp

    CROSS JOIN

    cteTally

    WHERE

    --Constraining the cross join to return only days required

    --This is where the huge join happens to my 100,000+ record table so I assume its hard on the CPU

    DateAdd(day,cteTally.N,Temp.Date1NoTime) BETWEEN Temp.Date1NoTime AND Temp.Date2NoTime

    ORDER BY MethodID

  • Add this line to your where clause:

    AND cteTally.N <= DATEDIFF( dd, Temp.Date1NoTime, Temp.Date2NoTime)

    This will restrict the crossjoin to just the # of Tally records you require.

    FYI: the code above loses 1/1/2010, btw. Not sure if that was the intent.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Ah great thanks, thats just want I was looking for. Cuts the run time down from 24 sec to 8 sec. Thats a good start.

    And what do you mean I lose 2010-01-01 ? I seem to get it in my tests.

  • loki1049 (12/7/2010)


    Ah great thanks, thats just want I was looking for. Cuts the run time down from 24 sec to 8 sec. Thats a good start.

    And what do you mean I lose 2010-01-01 ? I seem to get it in my tests.

    Ah, found the difference. I have a hard saved Tally table locally I swapped off to, instead of the on the fly one. However, I'm 1 based in the array, you're 0 based. Sorry for the confusion.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (12/7/2010)


    Add this line to your where clause:

    AND cteTally.N <= DATEDIFF( dd, Temp.Date1NoTime, Temp.Date2NoTime)

    This will restrict the crossjoin to just the # of Tally records you require.

    FYI: the code above loses 1/1/2010, btw. Not sure if that was the intent.

    you can even avoid the datediff by replacing it with the already calculated column [DaysSpanned].

    Keep in mind to replace <= by < because of the 1 offset of this column

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Craig Farrell (12/7/2010)


    Ah, found the difference. I have a hard saved Tally table locally I swapped off to, instead of the on the fly one. However, I'm 1 based in the array, you're 0 based. Sorry for the confusion.

    Makes sense. Thanks.

  • Instead of a "CROSS JOIN" on the tally table, use an INNER JOIN, specifying the WHERE conditions relating to tally values as ON conditions instead.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply