using row_number() over partition by to get datediff by row

Question

Post reply

using row_number() over partition by to get datediff by row

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 1

Reran my code and in looking at the results from MM code I have to ask why the last row is being counted when it overlaps time wise with the previous record that is being excluded?

MM's results:



pmd_patient_id	endcallcd	eventid	createdat

100128137	1001	9999995	2013-04-16 12:02:14.000

100128137	1001	9999996	2013-04-16 17:02:14.000

100128138	1001	5093634	2013-04-16 17:17:14.000

100128138	1001	5099268	2013-04-26 13:48:53.000

100128138	1002	5121175	2013-05-16 16:43:51.000

100128138	1002	5133281	2013-05-23 17:53:46.000

100128138	1002	5133281	2013-05-23 19:03:46.000  << This row

Sample data:



pmd_patient_id	endcallcd	eventid	createdat

100128137	1001	9999995	2013-04-16 12:02:14.000

100128137	1001	9999996	2013-04-16 17:02:14.000

100128137	1001	9999997	2013-04-16 17:12:14.000

100128138	1001	5093634	2013-04-16 17:17:14.000

100128138	1001	5099268	2013-04-26 13:48:53.000

100128138	1002	5121175	2013-05-16 16:43:51.000

100128138	1002	5133281	2013-05-23 17:53:46.000

100128138	1002	5133281	2013-05-23 18:43:46.000

100128138	1002	5133281	2013-05-23 19:03:46.000 << Overlaps with the previous row

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 2

Lynn Pettis

SSC Guru

Points: 442447

May 30, 2013 at 9:08 am

#1619750

Okay, rereading everything. The first record that is counted starts the 60 minute window, subsequent records that are overlapping do not extend the window.

This correct?

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 3

Lynn Pettis (5/30/2013)

Okay, rereading everything. The first record that is counted starts the 60 minute window, subsequent records that are overlapping do not extend the window.

This correct?



100128138	1002	5121175	2013-05-16 16:43:51.000

100128138	1002	5133281	2013-05-23 17:53:46.000

100128138	1002	5133281	2013-05-23 18:43:46.000

100128138	1002	5133281	2013-05-23 19:03:46.000

For the above four rows the count should be 3.

Even so row #4 is less than 60 min apart of row #3, it should be counted as it's more than 60 min apart since previous "counted" row #2.

I guess, it's possible to do it with triangular joins, but I have no doubt that it will perform worse than "quirky update" provided. It may be slightly better than using cursor/loop.

I'm thinking towards using data-island method, but cannot get my head around it as yet.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

ali86mola Valued Member Points: 65 More actions · Answer 4

ali86mola

Valued Member

Points: 65

May 30, 2013 at 10:20 am

#1619782

thank you is great artitial

mister.magoo SSC-Forever Points: 47068 More actions · Answer 5

Eugene, thanks for posting that QU solution.

I had always thought of QU as being a "Rolling Total" solution, and this has opened my mind a bit, so thanks again.

Of course, the QU version will outperform anything else in here, without a doubt.

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

lduvall SSCarpal Tunnel Points: 4869 More actions · Answer 6

I'm overwhelmed with gratitude for the multiple solutions posed! I thought I'd give a little more background on the business requirements to try to illustrate what we're doing. I'm a SQL back-end developer and our company has a web-based UI written in Ruby on Rails by overseas developers. We have no control over the UI or ROR pieces. There is a bug in the system that is generating multiple events in our SQL table that may or may not be "real." Sometimes, depending on the internet browser, it could be caused by double-clicking or clicking on some completely unrelated area of the screen. Then, I've been told there are "caching" issues. They haven't explained what "cache" they're referring to...local, browser, server, etc... In other words, we have no idea. It was decided to only count events that occurred more than 60 minutes apart. That's all I was told, so I'm trying to come up with the logic on my own. I think that both the QU and the loop code will work. Performance-wise, we'll be running the code at night on non-peak time, so as long as it finishes (and it doesn't appear to be an issue on our small datasets), that meets the requirement at this time. My hope is that the web GUI developers will straighten out their side soon!

Thank you again so much! I'm going to run all three sets of code and compare them all.

nimbell SSChasing Mays Points: 631 More actions · Answer 7

An alternative solution, that uses a recursive CTE instead of a WHILE loop:

--DECLARE @pmd_patient_id bigint = 100128137 --for single patient filtering



;WITH

  lteWithChronSeq AS ( --add a per patient chronological sequence number to raw data

	SELECT *

	, ROW_NUMBER() OVER(PARTITION BY pmd_patient_id ORDER BY createdat) AS ChronSeq

	FROM la_test_endcall

)

--SELECT * FROM lteWithChronSeq

, ltewcsCompare AS ( --match every event with every other event

	SELECT

	  ltewcs1.pmd_patient_id

	, ltewcs1.ChronSeq AS ChronSeq1

	, ltewcs2.ChronSeq AS ChronSeq2

	, CASE WHEN DATEDIFF(s,ltewcs2.createdat,ltewcs1.createdat) > 3600 THEN 1 ELSE 0 END AS IsGood --when time difference > 3600 secs (60 mins) then event is good

	FROM lteWithChronSeq ltewcs1

	CROSS JOIN lteWithChronSeq ltewcs2

	WHERE ltewcs1.pmd_patient_id = ltewcs2.pmd_patient_id --return only events for same patients

		AND ltewcs1.ChronSeq > ltewcs2.ChronSeq --only compare events with ones that occured earlier

		--AND ltewcs1.pmd_patient_id = @pmd_patient_id --single patient filter

)

--SELECT * FROM ltewcsCompare

, AllGoodEventPairs AS ( --first subsequent good event for any given event

	SELECT

	  pmd_patient_id

	, ChronSeq2 AS ChronSeq

	, MIN(ChronSeq1) AS NextGoodChronSeq

	FROM ltewcsCompare

	WHERE IsGood = 1

	GROUP BY pmd_patient_id, ChronSeq2

)

--SELECT * FROM AllGoodEventPairs

, GoodEvents AS ( --create a 'chain' of good events

	--anchor

	SELECT pmd_patient_id, ChronSeq

	FROM AllGoodEventPairs

	WHERE ChronSeq = 1 --first event is always good

	UNION ALL

	--recursive

	SELECT ge.pmd_patient_id, NextGoodChronSeq

	FROM GoodEvents ge

	INNER JOIN AllGoodEventPairs agep ON ge.ChronSeq = agep.ChronSeq AND ge.pmd_patient_id = agep.pmd_patient_id

)

--finally, join the chain of good events back to the original data:

SELECT

  ltewcs.pmd_patient_id

, ltewcs.endcallcd

, ltewcs.eventid

, ltewcs.createdat

FROM GoodEvents ge

INNER JOIN lteWithChronSeq ltewcs ON ltewcs.ChronSeq = ge.ChronSeq AND ltewcs.pmd_patient_id = ge.pmd_patient_id

ORDER BY ge.pmd_patient_id, ge.ChronSeq