How will you write a query for this scenario

Question

Post reply

How will you write a query for this scenario

vermarajiv

SSC Enthusiast

Points: 152
More actions
August 2, 2011 at 2:21 pm

#241682

I have a table called patientDates

PatientId int,

DivDate datetime

There can be multiple dates for a patient for eg

PatientId DivDate

1 2/1/2011

1 2/10/2011

1 6/5/2011

1 6/30/2011

1 7/8/2011

2 3/1/2011

2 5/3/2011

I need only first records that belong within 30 day intervals by patient. For eg. for the above data I need the following rows

1 2/1/2011 (Its the first record)

1 6/5/2011 (This is the min date after the first 30 day interval ie min(divDate) which is greater than first minDate + 30 days)

1 7/8/2011 (This is the min date after the 6/5/2011 + 30 days) and so on and so forth

How can we write a sql to get only these records. I have been thinking about it for hours now and cannot really figure it out. Appreciate all your help

PS - Looking for a set based solution and not cursor based.

Thanks

RK

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

Since each row is dependent on the value of the prior row, it's not a set-based dataset.

The easiest way I can think of to do this is a recursive CTE. It's just like a hierarchy-crawl, except it uses a date value instead of a parent ID. I'm not sure recursive CTEs can use the Min() function, but if they can, that'll be easy to code.

Otherwise, you'll need to write your own loop for it. Should be no big deal to code that if you need to.

But it's not a true relational set, since rows in it are ordered and dependent on prior rows, so don't worry about that aspect of it.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

eccentricDBA SSCarpal Tunnel Points: 4844 More actions · Answer 2

Here is a technique. It would need to be tested for usuability with a large record set.



-- Define Table

DECLARE @Table AS TABLE (PatientID INT, DivDate DATETIME);

-- Populate Table

INSERT INTO @Table(PatientID,DivDate)VALUES(1, '2/1/2011');

INSERT INTO @Table(PatientID,DivDate)VALUES(1, '2/10/2011');

INSERT INTO @Table(PatientID,DivDate)VALUES(1, '6/5/2011');

INSERT INTO @Table(PatientID,DivDate)VALUES(1, '6/30/2011');

INSERT INTO @Table(PatientID,DivDate)VALUES(1, '7/8/2011');

INSERT INTO @Table(PatientID,DivDate)VALUES(2,'3/1/2011');

INSERT INTO @Table(PatientID,DivDate)VALUES(2, '5/3/2011');

--Verify Table Poluation

SELECT * FROM @Table;

WITH PatientSchedule AS

(SELECT PatientID, Min(DivDate) FirstVisit, Min(DivDate) + 30 SecondVisit, Min(DivDate) + 60 ThirdVisit, Min(DivDate) + 90 FourthVisit

  FROM @Table

GROUP BY PatientID)

SELECT T.PatientID, Min(T.DivDate)

  FROM @Table as t

JOIN PatientSchedule as PS ON PS.PatientID = t.PatientID

WHERE T.DivDate BETWEEN PS.FirstVisit AND PS.SecondVisit

GROUP BY T.PatientID

UNION

SELECT T.PatientID, Min(T.DivDate)

  FROM @Table as t

JOIN PatientSchedule as PS ON PS.PatientID = t.PatientID

WHERE T.DivDate BETWEEN PS.SecondVisit AND PS.ThirdVisit

GROUP BY T.PatientID

UNION

SELECT T.PatientID, Min(T.DivDate)

  FROM @Table as t

JOIN PatientSchedule as PS ON PS.PatientID = t.PatientID

WHERE T.DivDate BETWEEN PS.ThirdVisit AND PS.FourthVisit

GROUP BY T.PatientID;

eccentricDBA SSCarpal Tunnel Points: 4844 More actions · Answer 3

eccentricDBA

SSCarpal Tunnel

Points: 4844

August 2, 2011 at 2:51 pm

#1362656

Sorry I just went back to review the data and it's not working correctly.

I'm getting the following for the results:

PatientID (No column name)

1 2011-02-01 00:00:00.000

2 2011-03-01 00:00:00.000

2 2011-05-03 00:00:00.000

vermarajiv SSC Enthusiast Points: 152 More actions · Answer 4

vermarajiv

SSC Enthusiast

Points: 152

August 2, 2011 at 4:55 pm

#1362757

Thanks GSquared for confirming what I was thinking.

I tried recursive CTE and you are right it does not allow aggregate functions

Based on your tip, I found a semi set based solution in a loop instead of using the cursor.

EccentricDBA - thanks for your effort to help me on this.

Thanks

RK

The Dixie Flatline SSC Guru Points: 53353 More actions · Answer 5

I found a semi set based solution in a loop instead of using the cursor.

Would you mind posting your solution up here?

Don't forget the warnings about performance. For high volumes of data, these kinds of problems are much better handled by procedural languages. Why do you reject a cursor solution here? With a supporting covering index ordered by patient number and date, I think you could get very satisfactory performance out of a cursor solution as opposed to a while loop.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():

http://qa.sqlservercentral.com/Forums/FindPost1137945.aspx

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

GSquared SSC Guru Points: 260824 More actions · Answer 7

ChrisM@Work (8/3/2011)

Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():

http://qa.sqlservercentral.com/Forums/FindPost1137945.aspx

It's probably going to be slower than a hand-crafted loop in this kind of situation. Row_Number and the other ranking functions can often be performance hogs, and recursive CTEs are often low-performance anyway. Mixing the two is usually slow compared to a well-built loop.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

GSquared (8/3/2011)

ChrisM@Work (8/3/2011)

Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():

http://qa.sqlservercentral.com/Forums/FindPost1137945.aspx

It's probably going to be slower than a hand-crafted loop in this kind of situation. Row_Number and the other ranking functions can often be performance hogs, and recursive CTEs are often low-performance anyway. Mixing the two is usually slow compared to a well-built loop.

That was my experience too, before PW posted this 😉

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

GSquared SSC Guru Points: 260824 More actions · Answer 9

ChrisM@Work (8/3/2011)

GSquared (8/3/2011)

ChrisM@Work (8/3/2011)

Recursive CTE's don't allow aggregate functions but they do allow ROW_NUMBER() which provides you with a means of obtaining MIN() and MAX():

http://qa.sqlservercentral.com/Forums/FindPost1137945.aspx

It's probably going to be slower than a hand-crafted loop in this kind of situation. Row_Number and the other ranking functions can often be performance hogs, and recursive CTEs are often low-performance anyway. Mixing the two is usually slow compared to a well-built loop.

That was my experience too, before PW posted this 😉

Yep. But your mileage may vary, so test multiple solutions. That's what I always do.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 10

Another possibility using quirky update.......thoughts anyone please?

regards gah



--=====

    USE TempDB

;

--===== conditionally drop the test table 

     IF OBJECT_ID('TempDB.dbo.PatientDates','U') IS NOT NULL

        DROP TABLE TempDB.dbo.PatientDates

;

----===== to test data as provided in OP -- RowId added

----===== start from here

 CREATE TABLE [dbo].[PatientDates]

 (

 	[PatientId] [int] NOT NULL,

 	[DivDate] [datetime] NOT NULL,

 	[RowId] [int] IDENTITY(1,1)

 ) 

 INSERT INTO [dbo].[PatientDates]([PatientId], [DivDate])

 SELECT 1, '20110201 00:00:00.000' UNION ALL

 SELECT 1, '20110210 00:00:00.000' UNION ALL

 SELECT 1, '20110605 00:00:00.000' UNION ALL

 SELECT 1, '20110630 00:00:00.000' UNION ALL

 SELECT 1, '20110708 00:00:00.000' UNION ALL

 SELECT 2, '20110302 00:00:00.000' UNION ALL

 SELECT 2, '20110503 00:00:00.000'

;

--== end here

--===== to test larger data sets -- create test table with 100000 records and 1000 patients --- dates from 01/01/10 to 31/12/2012  --- patient ID 1 to 1000

--===== overall runtime is for total process about 4 secs on my PC

--===== start from here

--SELECT TOP 100000

--         PatientID = CAST(RAND(CHECKSUM(NEWID()))*1000 +1 as INT) ,

--         Divdate = dateadd(dd, datediff(dd, 0, (CAST(RAND(CHECKSUM(NEWID()))*1096+40177 AS DATETIME))), 0),

--         IDENTITY(int,1,1) AS RowId

--  INTO PatientDates

--  FROM sys.All_Columns t1,

--       sys.All_Columns t2

--;    

--== end here

----===== Conditionally drop the Temp Table

     IF OBJECT_ID('TempDB..#PatientDates','U') IS NOT NULL

        DROP TABLE #PatientDates

;

--===== Create and copy the table on the fly

 SELECT PatientId, DivDate, RowId, Divdate as periodstartdate

 INTO #PatientDates

 FROM  PatientDates

;

--===== Check and make sure there's at least 1 row to work with.

    IF @@ROWCOUNT = 0

    BEGIN

        RAISERROR('No rows copied .  Exiting early.',10,1) WITH NOWAIT;

        RETURN;

    END

;

--======= Create Clustered Index....required by quirky update

CREATE UNIQUE CLUSTERED INDEX IX_#PatientDates_CI

     ON #PatientDates (PatientId, DivDate, RowId) WITH FILLFACTOR = 100

;

-----------------------------------------------------------------------------------------

--      Calculate the running total using "quirky update"  ....PLEASE READ FOLLOWING ARTICLE for detailed information.

--      http://qa.sqlservercentral.com/articles/T-SQL/68467/

---     by Jeff Moden

DECLARE @SafetyCounter  BIGINT,

        @PrevPatientID INT,

        @periodstartdate DATETIME

;

--===== Preset the Safety Counter (used as "Anchor")

 SELECT @SafetyCounter = 1

 ;

WITH

cteWork AS

(

 SELECT SafetyCounter = ROW_NUMBER() OVER (ORDER BY PatientId, DivDate),

        PatientId, DivDate, periodstartdate

   FROM #PatientDates

)

 UPDATE cteWork

 SET

     @periodstartdate

            =   periodstartdate

            =   CASE

                WHEN SafetyCounter = @SafetyCounter --Makes sure we're working on the correct row.

                   THEN    CASE

                            WHEN PatientId = @PrevPatientID

                            AND  datediff(day,@periodstartdate,divdate) < 31  --- for 30 day intervals

                            THEN @periodstartdate

                            ELSE periodstartdate

                           END

                  ELSE 1/0 --Forces error if out of sync.

                  END,

        @PrevpatientID = PatientId,

        @SafetyCounter  = @SafetyCounter + 1

 FROM cteWork WITH (TABLOCKX)

 OPTION (MAXDOP 1)

--==== RESULTS

SELECT PatientID, periodstartdate, COUNT(periodstartdate)as No_of_Visits FROM #PatientDates

GROUP BY PatientID, periodstartdate

order by PatientID, periodstartdate

;

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

J Livingston SQL SSC Guru Points: 51550 More actions · Answer 11

The Dixie Flatline (8/2/2011)

I found a semi set based solution in a loop instead of using the cursor.

Would you mind posting your solution up here?

Don't forget the warnings about performance. For high volumes of data, these kinds of problems are much better handled by procedural languages. Why do you reject a cursor solution here? With a supporting covering index ordered by patient number and date, I think you could get very satisfactory performance out of a cursor solution as opposed to a while loop.

Hi RK

I am keen to see the solution you have found as well...I have a similar problem that I need to resolve ....pls see my earlier post..would appreciate your feedback

kind regards gah

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 12

I had to do something like that for a client a couple years back. The eventual solution was partially set-based, with a while loop tossed in and came out at almost 2 pages of T-SQL by the end.

It's not an easily-solved problem, especially if the data is 'interesting' ...

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 13

Ben-Gan has some very interesting date-span code that runs like the wind. It would be perfect for this and actually negates the need for a Quirky Update. In fact, it beats the Quirky Update for performance... heh... yeah, I tried. 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 14

vermarajiv (8/2/2011)

Thanks GSquared for confirming what I was thinking.

I tried recursive CTE and you are right it does not allow aggregate functions

Based on your tip, I found a semi set based solution in a loop instead of using the cursor.

EccentricDBA - thanks for your effort to help me on this.

Thanks

RK

Please... it's a 2 way street here. Please post the solution you found so that we may all learn. Thanks. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions