Date Range within a Date Range

Question

Post reply

Date Range within a Date Range

Viewing 13 posts - 16 through 27 (of 27 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 1

aktikt (8/5/2009)

Jeff, thanks for your patronizing complements... Just kidding.

By the way Aktikt, just to be clear... I never patronize... it's either praise or pork chops. No in between. Heh... and I'm not kidding. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 2

Ok... here's the code that actually does the split by month segment instead of by day. The reason this had to be done is that the 11 row example spawned 3,009 internal rows (1 for each day in the range of each row). Just imagine what would happen if we tried that with a large number of rows. :sick:

This new way "only" spawned 105 rows... 1 for each month in the range for each row. Still, for a 12 million row problem, that may still be way too much. This may have to be done by year of StartDate.

As always, the details are in the code...

--===== Outer SELECT does the display conversions and the sorting.



 SELECT Year(TotalData.TheMonth)  AS Year,

        Month(TotalData.TheMonth) AS Month,

        TotalData.TotalAmount     AS TotalAmount

   FROM (--==== Calculate the total amount for each month

         SELECT DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0) AS TheMonth,

                SUM(

                    c.Amount/(DATEDIFF(mm,c.StartDate,c.EndDate)+1) --Amount divided by the total number of days

                  * DATEDIFF(dd, --===== Multiplied by the Number of days in a month segment

                            CASE --===== Month start date or row start date, which ever is latest

                                 WHEN c.StartDate > DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)

                                 THEN c.StartDate

                                 ELSE DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)

                            END

                            ,

                            CASE --===== Month end date or row end date, which ever is earliest

                                 WHEN c.EndDate < DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N),0)-1

                                 THEN c.EndDate

                                 ELSE DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N),0)-1

                            END

                            ) +1

                ) AS TotalAmount

           FROM #Claims c

          INNER JOIN Tally t ON t.N <= DATEDIFF(mm,c.StartDate,c.EndDate)+1 --Intentional/required triangular join

          GROUP BY DATEADD(mm,DATEDIFF(mm,0,c.StartDate) + (t.N-1),0)

        ) AS TotalData

  ORDER BY TotalData.TheMonth

________________________________________________________________________________________________________________________

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

nigel. SSChampion Points: 11636 More actions · Answer 3

Jeff Moden (8/5/2009)

Nigel will probably beat me to it, though.

Not a chance, I was sleeping while you were working on this.

Nice solution Jeff, I think I just about get it.

I thought that the number of intermediate rows generated may be a bottleneck but couldn't figure out a way to reduce them. Good one!

Nigel
How to post questions

jcrawf02 SSC-Insane Points: 24198 More actions · Answer 4

Jeff, condolences to you and your friend's family.

---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

aktikt Ten Centuries Points: 1305 More actions · Answer 5

Jeff,

I have listed the results of running your query against existing Claims data containing

35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007:

WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. If this scales linearly then it would take 11 minutes to run. Besides, it turns out that long claims are rather rare and 1 day for claim length is very common. Thus, it should perform even better on this type of data.

Thanks for your help. You too Nigel. 🙂

Great work!

aktikt

SQL Server parse and compile time:

CPU time = 19 ms, elapsed time = 19 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

Year Month TotalAmount

----------- ----------- ----------------------

2006 1 447093.718447179

2006 2 401313.289622182

2006 3 440011.516648319

2006 4 419537.0008398

2006 5 466245.122821026

2006 6 450941.492471082

2006 7 426732.216494343

2006 8 434883.843448311

2006 9 436754.21254134

2006 10 464639.761466278

2006 11 488523.837852463

2006 12 462196.566039379

2007 1 483823.293872574

2007 2 502495.751410013

2007 3 580603.76534816

2007 4 1040816.52684632

2007 5 1053798.30161442

2007 6 1029035.60165523

2007 7 977611.117479759

2007 8 843939.072468995

2007 9 835551.49081171

2007 10 864416.608813132

2007 11 856591.992495634

2007 12 871064.222941644

(24 row(s) affected)

SQL Server Execution Times:

CPU time = 7296 ms, elapsed time = 1974 ms.

jcrawf02 SSC-Insane Points: 24198 More actions · Answer 6

Think you might need to take a step back from all the fun code and ask a couple questions.

First, (you already know this but we don't) what kind of 'claims' are you looking at? If medical claims, what date you want to allocate to may depend on the type of services. For example, if Inpatient hospital claims, you probably want to assign everything to the discharge, and therefore would want to use the last date, not split the cost between months. Just to make it fun, most utilization reports (non-cost related) look at inpatient claims on admission, and therefore the number of cases would change the allocation of claims to months.

If you're looking at physician claims, then you are probably right to split cost between months, but would be better off looking at the detail line level rather than the whole claim, as a service won't have a date span for physician services, and you can just sum the dollars.

If you're looking at non-healthcare 'claims', then do considerations like the above apply, where the business pays based on a determination at either the start or end of the service rendered? If so, you'll need to allocate appropriately.

food for thought.

---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

jcrawf02 SSC-Insane Points: 24198 More actions · Answer 7

aktikt (8/6/2009)

Jeff,

I have listed the results of running your query against existing Claims data containing

35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007:

WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. aktikt

Check out Gail Shaw's Dirty Dozen presentation from her blog, SQL In The Wild[/url].

You'll see that using a date function in the WHERE like that will actually perform worse than WHERE #Claims.startdate > '12/31/2005' AND #Claims.startdate < '1/1/2008'

You would change the above to #Claims.enddate if that was what you wanted to use, etc.

---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 8

nigel (8/6/2009)

Jeff Moden (8/5/2009)

Nigel will probably beat me to it, though.

Not a chance, I was sleeping while you were working on this.

Nice solution Jeff, I think I just about get it.

I thought that the number of intermediate rows generated may be a bottleneck but couldn't figure out a way to reduce them. Good one!

Thanks Nigel. It may still hurl pretty badly on a 12 million row solution... but so would a simple update.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

jcrawf02 (8/6/2009)

Jeff, condolences to you and your friend's family.

Thanks. I very much appreciate that. I don't normally air my personal stuff but this one really got to me and I let it slip... he was 2 years younger than me and, from what I can tell, in better physical condition than me.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 10

aktikt (8/6/2009)

Jeff,

I have listed the results of running your query against existing Claims data containing

35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007:

WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. If this scales linearly then it would take 11 minutes to run. Besides, it turns out that long claims are rather rare and 1 day for claim length is very common. Thus, it should perform even better on this type of data.

Thanks for your help. You too Nigel. 🙂

Great work!

aktikt

SQL Server parse and compile time:

CPU time = 19 ms, elapsed time = 19 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

Year Month TotalAmount

----------- ----------- ----------------------

2006 1 447093.718447179

2006 2 401313.289622182

2006 3 440011.516648319

2006 4 419537.0008398

2006 5 466245.122821026

2006 6 450941.492471082

2006 7 426732.216494343

2006 8 434883.843448311

2006 9 436754.21254134

2006 10 464639.761466278

2006 11 488523.837852463

2006 12 462196.566039379

2007 1 483823.293872574

2007 2 502495.751410013

2007 3 580603.76534816

2007 4 1040816.52684632

2007 5 1053798.30161442

2007 6 1029035.60165523

2007 7 977611.117479759

2007 8 843939.072468995

2007 9 835551.49081171

2007 10 864416.608813132

2007 11 856591.992495634

2007 12 871064.222941644

(24 row(s) affected)

SQL Server Execution Times:

CPU time = 7296 ms, elapsed time = 1974 ms.

That's cool... thanks for the performance feedback. How many rows did it actually process?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 11

jcrawf02 (8/6/2009)

aktikt (8/6/2009)

Jeff,

I have listed the results of running your query against existing Claims data containing

35923 records, below. I have added a WHERE clause to restrict the values to 2006 and 2007:

WHERE Year(TotalData.TheMonth) BETWEEN 2006 AND 2007. aktikt

Check out Gail Shaw's Dirty Dozen presentation from her blog, SQL In The Wild[/url].

You'll see that using a date function in the WHERE like that will actually perform worse than WHERE #Claims.startdate > '12/31/2005' AND #Claims.startdate < '1/1/2008'

You would change the above to #Claims.enddate if that was what you wanted to use, etc.

I absolutely agree... as you said, it also depends on where such a limit is placed in the code as to whether or not the entire 12 million rows will still be processed or not.

I also agree that Gail's blogs are some of the best.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

aktikt Ten Centuries Points: 1305 More actions · Answer 12

Jcrawf02,

Your comment,

Check out Gail Shaw's Dirty Dozen presentation from her blog, SQL In The Wild.

You'll see that using a date function in the WHERE like that will actually perform worse than WHERE #Claims.startdate > '12/31/2005' AND #Claims.startdate 0

THEN ' ' + p.MiddleName + ' '

ELSE ' '

END +

p.lastName as FullName FROM Person

Thanks,

aktikt

jcrawf02 SSC-Insane Points: 24198 More actions · Answer 13

Caveat Emptor - I'm hardly an expert, and would therefore suggest that you continue to read up on the articles and forums here to gain more knowledge. I just happened to see that what you were attempting fit nicely into Gail's explanation.

However, I am happy to play around with SQL statements to make them work better. In your first example, I'm not quite sure what you're trying to accomplish? We're inflating the siteCount by 100 times, and then rounding to what precision? Casting to what datatype?

SELECT Cast(Cast(Round(100 * Sum(SiteCount))))

FROM Sites

In your second, I can see that you're trying to concatenate a full name out of first, middle, lastname fields, and want to avoid the extra space when there is no middlename.

SELECT



p.FirstName +

CASE WHEN Len(p.Middlename) > 0

THEN ' ' + p.MiddleName + ' '

ELSE ' '

END +

p.lastName as FullName FROM Person

I haven't tried this, but my first thought was that coalesce() or isnull() were the way to do this cleanly. If that function is giving you headaches, then your CASE seems to do the job. I've got a situation like this at work, gonna play around and see what I find.

---------------------------------------------------------
How best to post your question[/url]
How to post performance problems[/url]
Tally Table:What it is and how it replaces a loop[/url]

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."