recursive or a better way to solve this problem

  • Wayne,

    I was looking at your previous row calculation on the 3-part update thinking to myself "Cool... he even thought of the forced RN change when EmID changes, just in case adjacent rows have the same times". About that time I also realized that not everyone picks up on those types of things when writing 3-part update statements for the Quirky Update. Then, add in the fact that you didn't go for just an "=" condition... you allowed for minor shingling (overlap) errors by using "<=" instead. AND, the CASE expressions to do that are in the correct order, which a lot of people mess up even when they get the individual parts of the CASE formula right.
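
    For anyone who hasn't written one of these before, here's a minimal, hypothetical sketch of the three-part update shape being described. The table and column names (#Punch, EmID, InTime, OutTime, GroupID) are illustrative stand-ins, NOT Wayne's actual code, and all the usual Quirky Update rules (clustered index on the partitioning/ordering columns, TABLOCKX, MAXDOP 1, etc.) still apply:

    DECLARE @PrevEmID INT,
            @PrevOut  DATETIME,
            @GroupID  INT
    SELECT  @GroupID = 0

    UPDATE  tgt
    SET     @GroupID = GroupID =
                CASE WHEN EmID <> @PrevEmID  THEN @GroupID + 1 -- the EmID break MUST be tested first
                     WHEN InTime <= @PrevOut THEN @GroupID     -- "<=" absorbs minor shingling overlaps
                     ELSE @GroupID + 1                         -- a real gap starts a new group
                END,
            @PrevOut  = OutTime,
            @PrevEmID = EmID
    FROM    #Punch tgt WITH (TABLOCKX) -- assumes #Punch is clustered on (EmID, InTime)
    OPTION (MAXDOP 1)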

    I've also seen other people get things right formula-wise but then leave out one of the essential rules for the Quirky Update, and I have to get on them. Not so here, nor in your other Quirky Update examples (well... except for that one JOINed update :-P).

    Considering all the heat I've taken over time for pushing the Quirky Update, I just can't let that kind of attention to detail go by without saying "VERY well done, Wayne!" Thanks for doing it EXACTLY correctly. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • To anyone else reading this particular thread...

    The method Wayne used is affectionately known as the "Quirky Update" or the "Running Total Solution". It has rules that you MUST follow to make it work right over time. If you look at Wayne's code, it's pretty obvious that the rules are NOT difficult, but they are ESSENTIAL to protecting the data and the correctness of the end result.

    If you want to use the method, then I strongly recommend that you read the article at the link Wayne points you to in the code. If you don't read anything else, read the rules and follow them to a "T". If you don't want to follow the rules for the "Quirky Update", then don't use it; use either a Cursor or a While loop instead. Don't use the fancy correlated subqueries that you see in most code, because they're just going to slow you down... a lot! 😉
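
    To make that last warning concrete, here's the kind of correlated-subquery running total (a "triangular join") being warned against. The #Punch table and its columns are illustrative, not from this thread; the point is that every row re-reads all of the earlier rows for the same employee, so the work grows roughly with the square of the row count:

    SELECT  s.EmID,
            s.InTime,
            (SELECT SUM(DATEDIFF(MINUTE, s2.InTime, s2.OutTime))
             FROM   #Punch s2
             WHERE  s2.EmID   = s.EmID
                AND s2.InTime <= s.InTime) AS RunningMinutes -- re-scans every prior row
    FROM    #Punch s
    ORDER BY s.EmID, s.InTime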

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • thava,

    Does that answer your question for you?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (8/29/2010)


    Wayne,

    I was looking at your previous row calculation on the 3-part update thinking to myself "Cool... he even thought of the forced RN change when EmID changes, just in case adjacent rows have the same times". About that time I also realized that not everyone picks up on those types of things when writing 3-part update statements for the Quirky Update. Then, add in the fact that you didn't go for just an "=" condition... you allowed for minor shingling (overlap) errors by using "<=" instead. AND, the CASE expressions to do that are in the correct order, which a lot of people mess up even when they get the individual parts of the CASE formula right.

    I've also seen other people get things right formula-wise but then leave out one of the essential rules for the Quirky Update, and I have to get on them. Not so here, nor in your other Quirky Update examples (well... except for that one JOINed update :-P).

    Considering all the heat I've taken over time for pushing the Quirky Update, I just can't let that kind of attention to detail go by without saying "VERY well done, Wayne!" Thanks for doing it EXACTLY correctly. 🙂

    :blush: Thanks, Jeff. But - I did learn this from you.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Can one of you gurus please test this? It's something I came up with today.

    SET STATISTICS IO ON returns

    Table '#619B8048'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SET SHOWPLAN_TEXT ON returns

    |--Compute Scalar(DEFINE:([Expr1019]=datediff(minute,[Expr1016],[Expr1018])))
      |--Stream Aggregate(GROUP BY:(.[EmID], [Expr1015]) DEFINE:([Expr1016]=MIN([Expr1011]), [Expr1017]=MAX([Expr1012]), [Expr1018]=MAX([Expr1011])))
        |--Sort(ORDER BY:(.[EmID] ASC, [Expr1015] ASC))
          |--Compute Scalar(DEFINE:([Expr1015]=([Expr1013]-(1))/(2)))
            |--Sequence Project(DEFINE:([Expr1013]=row_number))
              |--Segment
                |--Sort(ORDER BY:(.[EmID] ASC, [Expr1011] ASC))
                  |--Filter(WHERE:([Expr1010]=(1) OR datediff(minute,[Expr1011],[Expr1012])>(0)))
                    |--Compute Scalar(DEFINE:([Expr1010]=CONVERT_IMPLICIT(int,[Expr1029],0)))
                      |--Stream Aggregate(GROUP BY:(.[EmID], [Expr1009]) DEFINE:([Expr1029]=Count(*), [Expr1011]=MIN([Expr1007]), [Expr1012]=MAX([Expr1007])))
                        |--Sequence Project(DEFINE:([Expr1009]=dense_rank))
                          |--Segment
                            |--Segment
                              |--Sort(ORDER BY:(.[EmID] ASC, [Expr1007] ASC))
                                |--Filter(WHERE:([Expr1007] IS NOT NULL))
                                  |--Nested Loops(Left Outer Join, OUTER REFERENCES:(.[InTime], .[OutTime]))
                                    |--Compute Scalar(DEFINE:([Expr1004]=@Sample.[EmID] as .[EmID]))
                                    |  |--Table Scan(OBJECT:(@Sample AS ))
                                    |--Constant Scan(VALUES:((@Sample.[InTime] as .[InTime]),(@Sample.[OutTime] as .[OutTime])))

    Oh... and the code is here:

    -- Prepare sample data
    DECLARE @Sample TABLE
        (
        RID     INT,
        EmID    INT,
        InTime  DATETIME,
        OutTime DATETIME
        )

    INSERT @Sample
    SELECT 62, 12, '20050818 05:02', '20050818 06:31' UNION ALL
    SELECT 63, 12, '20050818 06:31', '20050818 06:33' UNION ALL
    SELECT 64, 12, '20050818 06:33', '20050818 06:34' UNION ALL
    SELECT 66, 12, '20050818 07:02', '20050818 08:31' UNION ALL
    SELECT 67, 12, '20050818 08:31', '20050818 08:33' UNION ALL
    SELECT 68, 12, '20050818 08:33', '20050818 09:34' UNION ALL
    SELECT 72, 12, '20050819 05:02', '20050819 06:31' UNION ALL
    SELECT 73, 12, '20050819 06:31', '20050819 06:33' UNION ALL
    SELECT 74, 12, '20050819 06:33', '20050819 06:34' UNION ALL
    SELECT 76, 12, '20050819 07:02', '20050819 08:31' UNION ALL
    SELECT 77, 12, '20050819 08:31', '20050819 08:33' UNION ALL
    SELECT 78, 12, '20050819 08:33', '20050819 09:34'

    -- Unpivot every row into two rows (one per InTime/OutTime value)
    -- and number the distinct times per employee.
    ;WITH cteSource(EmID, DT, Grp)
    AS (
        SELECT  u.EmID,
                u.DT,
                DENSE_RANK() OVER (PARTITION BY u.EmID ORDER BY u.DT) AS Grp
        FROM    @Sample AS s
        UNPIVOT (
                    DT
                    FOR theCol IN (InTime, OutTime)
                ) AS u
    ),
    -- Keep only the times that start or end a contiguous block: a time
    -- shared by two adjacent rows (an OutTime equal to the next InTime)
    -- collapses into one group with COUNT(*) = 2 and no minute
    -- difference, so it is filtered out here.
    cteGrouping(EmID, MinDT, MaxDT, ColID)
    AS (
        SELECT  EmID,
                MIN(DT) AS MinDT,
                MAX(DT) AS MaxDT,
                ROW_NUMBER() OVER (PARTITION BY EmID ORDER BY MIN(DT)) - 1 AS ColID
        FROM    cteSource
        GROUP BY EmID,
                Grp
        HAVING  COUNT(*) = 1
                OR DATEDIFF(MINUTE, MIN(DT), MAX(DT)) > 0
    )
    -- Pair up consecutive boundary times (ColID / 2) to rebuild the
    -- merged intervals and their durations.
    SELECT  EmID,
            MIN(MinDT) AS FromTime,
            MAX(MaxDT) AS ToTime,
            DATEDIFF(MINUTE, MIN(MinDT), MAX(MinDT)) AS [Minutes]
    FROM    cteGrouping
    GROUP BY EmID,
            ColID / 2
    ORDER BY EmID,
            ColID / 2
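
    As a side note for anyone tracing through the code above, the ColID / 2 grouping works because integer division folds the consecutive boundary numbers 0, 1, 2, 3, ... into pairs. This throwaway snippet (not part of the solution) shows the effect:

    SELECT ColID, ColID / 2 AS PairID
    FROM  (VALUES (0),(1),(2),(3),(4),(5)) AS v(ColID)
    -- PairID comes out 0,0,1,1,2,2: each pair is one "from" time plus one "to" time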


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso (8/29/2010)


    Can one of you gurus please test this? It's something I came up with today.

    Using the 116,000 rows of data from my previous post, this generates these results:

    Table '#Test___00000000001A'. Scan count 3, logical reads 539, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 1264 ms, elapsed time = 1846 ms.

    In comparison, the method I used has these results:

    Table '#Test_____00000000000D'. Scan count 1, logical reads 540, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 187 ms, elapsed time = 187 ms.

    Table '#Test_____00000000000D'. Scan count 3, logical reads 539, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 171 ms, elapsed time = 770 ms.

    So, the method you came up with took twice as long, but with half of the reads. Pretty impressive... and it doesn't use the controversial "quirky" update. Much better than the triangular / cross joins from before.

    FYI, if I don't add the clustered index, it takes 2088 ms but produces the same I/O statistics. The only thing that changes in the execution plan is that the clustered index scan becomes a table scan. A sketch of the test setup follows below.
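
    For anyone recreating the test: the exact table and index definitions aren't shown in this thread, so the following is an assumed setup (the #Test shape, index name, and key columns are guesses) for capturing the numbers quoted above:

    CREATE CLUSTERED INDEX IX_Test ON #Test (EmID, InTime) -- assumed keys

    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    -- ... run the query under test here ...
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF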

    For anyone that wants to compare the execution plans:

    SSC1.sqlplan is this one with the unpivot.

    SSC2.sqlplan is for my earlier code.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2
