Date Range within a Date Range

  • aktikt (8/5/2009)


    Jeff, thanks for your patronizing complements... Just kidding.

    By the way Aktikt, just to be clear... I never patronize... it's either praise or pork chops. No in between. Heh... and I'm not kidding. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Ok... here's the code that actually does the split by month segment instead of by day. The reason this had to be done is that the 11 row example spawned 3,009 internal rows (1 for each day in the range of each row). Just imagine what would happen if we tried that with a large number of rows. :sick:

    This new way "only" spawned 105 rows... 1 for each month in the range for each row. Still, for a 12 million row problem, that may still be way too much. This may have to be done by year of StartDate.

    As always, the details are in the code...

    --===== Outer SELECT does the display conversions and the sorting.

    SELECT Year(TotalData.TheMonth) AS Year,

    Month(TotalData.TheMonth) AS Month,

    TotalData.TotalAmount AS TotalAmount

    FROM (--==== Calculate the total amount for each month

    SELECT DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0) AS TheMonth,

    SUM(

    c.Amount/(DATEDIFF(mm,c.StartDate,c.EndDate)+1) --Amount divided by the total number of days

    * DATEDIFF(dd, --===== Multiplied by the Number of days in a month segment

    CASE --===== Month start date or row start date, which ever is latest

    WHEN c.StartDate > DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)

    THEN c.StartDate

    ELSE DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)

    END

    ,

    CASE --===== Month end date or row end date, which ever is earliest

    WHEN c.EndDate < DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N),0)-1

    THEN c.EndDate

    ELSE DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N),0)-1

    END

    ) +1

    ) AS TotalAmount

    FROM #Claims c

    INNER JOIN Tally t ON t.N <= DATEDIFF(mm,c.StartDate,c.EndDate)+1 --Intentional/required triangular join

    GROUP BY DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)

    ) AS TotalData

    ORDER BY TotalData.TheMonth

    ________________________________________________________________________________________________________________________

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (8/5/2009)


    Nigel will probably beat me to it, though.

    Not a chance, I was sleeping while you were working on this.

    Nice solution Jeff, I think I just about get it.

    I thought that the number of intermediate rows generated may be a bottleneck but couldn't figure out a way to reduce them. Good one!

  • Jeff, condolences to you and your friend's family.

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Jeff,

    I have listed the results of running your query against existing Claims data containing

    35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007:

    WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. If this scales linearly then it would take 11 minutes to run. Besides, it turns out that long claims are rather rare and 1 day for claim length is very common. Thus, it should perform even better on this type of data.

    Thanks for your help. You too Nigel. 🙂

    Great work!

    aktikt

    SQL Server parse and compile time:

    CPU time = 19 ms, elapsed time = 19 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Year Month TotalAmount

    ----------- ----------- ----------------------

    2006 1 447093.718447179

    2006 2 401313.289622182

    2006 3 440011.516648319

    2006 4 419537.0008398

    2006 5 466245.122821026

    2006 6 450941.492471082

    2006 7 426732.216494343

    2006 8 434883.843448311

    2006 9 436754.21254134

    2006 10 464639.761466278

    2006 11 488523.837852463

    2006 12 462196.566039379

    2007 1 483823.293872574

    2007 2 502495.751410013

    2007 3 580603.76534816

    2007 4 1040816.52684632

    2007 5 1053798.30161442

    2007 6 1029035.60165523

    2007 7 977611.117479759

    2007 8 843939.072468995

    2007 9 835551.49081171

    2007 10 864416.608813132

    2007 11 856591.992495634

    2007 12 871064.222941644

    (24 row(s) affected)

    SQL Server Execution Times:

    CPU time = 7296 ms, elapsed time = 1974 ms.

  • Think you might need to take a step back from all the fun code and ask a couple questions.

    First, (you already know this but we don't) what kind of 'claims' are you looking at? If medical claims, what date you want to allocate to may depend on the type of services. For example, if Inpatient hospital claims, you probably want to assign everything to the discharge, and therefore would want to use the last date, not split the cost between months. Just to make it fun, most utilization reports (non-cost related) look at inpatient claims on admission, and therefore the number of cases would change the allocation of claims to months.

    If you're looking at physician claims, then you are probably right to split cost between months, but would be better off looking at the detail line level rather than the whole claim, as a service won't have a date span for physician services, and you can just sum the dollars.

    If you're looking at non-healthcare 'claims', then do considerations like the above apply, where the business pays based on a determination at either the start or end of the service rendered? If so, you'll need to allocate appropriately.

    food for thought.

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • aktikt (8/6/2009)


    Jeff,

    I have listed the results of running your query against existing Claims data containing

    35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007:

    WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. aktikt

    Check out Gail Shaw's Dirty Dozen presentation from her blog, SQL In The Wild[/url].

    You'll see that using a date function in the WHERE like that will actually perform worse than WHERE #Claims.startdate > '12/31/2005' AND #Claims.startdate < '1/1/2008'

    You would change the above to #Claims.enddate if that was what you wanted to use, etc.

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • nigel (8/6/2009)


    Jeff Moden (8/5/2009)


    Nigel will probably beat me to it, though.

    Not a chance, I was sleeping while you were working on this.

    Nice solution Jeff, I think I just about get it.

    I thought that the number of intermediate rows generated may be a bottleneck but couldn't figure out a way to reduce them. Good one!

    Thanks Nigel. It may still hurl pretty badly on a 12 million row solution... but so would a simple update.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • jcrawf02 (8/6/2009)


    Jeff, condolences to you and your friend's family.

    Thanks. I very much appreciate that. I don't normally air my personal stuff but this one really got to me and I let it slip... he was 2 years younger than me and, from what I can tell, in better physical condition than me.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • aktikt (8/6/2009)


    Jeff,

    I have listed the results of running your query against existing Claims data containing

    35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007:

    WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. If this scales linearly then it would take 11 minutes to run. Besides, it turns out that long claims are rather rare and 1 day for claim length is very common. Thus, it should perform even better on this type of data.

    Thanks for your help. You too Nigel. 🙂

    Great work!

    aktikt

    SQL Server parse and compile time:

    CPU time = 19 ms, elapsed time = 19 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    Year Month TotalAmount

    ----------- ----------- ----------------------

    2006 1 447093.718447179

    2006 2 401313.289622182

    2006 3 440011.516648319

    2006 4 419537.0008398

    2006 5 466245.122821026

    2006 6 450941.492471082

    2006 7 426732.216494343

    2006 8 434883.843448311

    2006 9 436754.21254134

    2006 10 464639.761466278

    2006 11 488523.837852463

    2006 12 462196.566039379

    2007 1 483823.293872574

    2007 2 502495.751410013

    2007 3 580603.76534816

    2007 4 1040816.52684632

    2007 5 1053798.30161442

    2007 6 1029035.60165523

    2007 7 977611.117479759

    2007 8 843939.072468995

    2007 9 835551.49081171

    2007 10 864416.608813132

    2007 11 856591.992495634

    2007 12 871064.222941644

    (24 row(s) affected)

    SQL Server Execution Times:

    CPU time = 7296 ms, elapsed time = 1974 ms.

    That's cool... thanks for the performance feedback. How many rows did it actually process?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • jcrawf02 (8/6/2009)


    aktikt (8/6/2009)


    Jeff,

    I have listed the results of running your query against existing Claims data containing

    35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007:

    WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. aktikt

    Check out Gail Shaw's Dirty Dozen presentation from her blog, SQL In The Wild[/url].

    You'll see that using a date function in the WHERE like that will actually perform worse than WHERE #Claims.startdate > '12/31/2005' AND #Claims.startdate < '1/1/2008'

    You would change the above to #Claims.enddate if that was what you wanted to use, etc.

    I absolutely agree... as you said, it also depends on where such a limit is placed in the code as to whether or not the entire 12 million rows will still be processed or not.

    I also agree that Gail's blogs are some of the best.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jcrawf02,

    Your comment,

    Check out Gail Shaw's Dirty Dozen presentation from her blog, SQL In The Wild.

    You'll see that using a date function in the WHERE like that will actually perform worse than WHERE #Claims.startdate > '12/31/2005' AND #Claims.startdate 0

    THEN ' ' + p.MiddleName + ' '

    ELSE ' '

    END +

    p.lastName as FullName FROM Person

    Thanks,

    aktikt

  • Caveat Emptor - I'm hardly an expert, and would therefore suggest that you continue to read up on the articles and forums here to gain more knowledge. I just happened to see that what you were attempting fit nicely into Gail's explanation.

    However, I am happy to play around with SQL statements to make them work better. In your first example, I'm not quite sure what you're trying to accomplish? We're inflating the siteCount by 100 times, and then rounding to what precision? Casting to what datatype?

    SELECT Cast(Cast(Round(100 * Sum(SiteCount))))

    FROM Sites

    In your second, I can see that you're trying to concatenate a full name out of first, middle, lastname fields, and want to avoid the extra space when there is no middlename.

    SELECT

    p.FirstName +

    CASE WHEN Len(p.Middlename) > 0

    THEN ' ' + p.MiddleName + ' '

    ELSE ' '

    END +

    p.lastName as FullName FROM Person

    I haven't tried this, but my first thought was that coalesce() or isnull() were the way to do this cleanly. If that function is giving you headaches, then your CASE seems to do the job. I've got a situation like this at work, gonna play around and see what I find.

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply