recursive or a better way to solve this problem

  • Wayne,

    I was looking at your previous row calculation on the 3-part update thinking to myself "Cool... he even thought of the forced RN change when EmID changes, just in case adjacent rows have the same times". About that time I also realized that not everyone picks up on those types of things when writing 3-part update statements for the Quirky Update. Then, add in the fact that you didn't go for just an "=" condition... you allowed for minor shingling (overlap) errors by using "<=" instead. AND, the CASE expressions to do that are in the correct order, which a lot of people mess up even when they get the individual parts of the CASE formula right.
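
    For anyone who hasn't written one of these before, here's a minimal, hypothetical sketch of the three-part update shape being described. The table and column names (#Punch, EmID, InTime, OutTime, GroupID) are illustrative stand-ins, NOT Wayne's actual code, and all the usual Quirky Update rules (clustered index on the partitioning/ordering columns, TABLOCKX, MAXDOP 1, etc.) still apply:

    DECLARE @PrevEmID INT,
            @PrevOut  DATETIME,
            @GroupID  INT
    SELECT  @GroupID = 0

    UPDATE  tgt
    SET     @GroupID = GroupID =
                CASE WHEN EmID <> @PrevEmID  THEN @GroupID + 1 -- the EmID break MUST be tested first
                     WHEN InTime <= @PrevOut THEN @GroupID     -- "<=" absorbs minor shingling overlaps
                     ELSE @GroupID + 1                         -- a real gap starts a new group
                END,
            @PrevOut  = OutTime,
            @PrevEmID = EmID
    FROM    #Punch tgt WITH (TABLOCKX) -- assumes #Punch is clustered on (EmID, InTime)
    OPTION (MAXDOP 1)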

    I've also seen other people get things right formula-wise but then leave out one of the essential rules for the Quirky Update, and I have to get on them. Not so here, nor in your other Quirky Update examples (well... except for that one JOINed update :-P).

    Considering all the heat I've taken over time for pushing the Quirky Update, I just can't let that kind of attention to detail go by without saying "VERY well done, Wayne!" Thanks for doing it EXACTLY correctly. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • To anyone else reading this particular thread...

    The method Wayne used is affectionately known as the "Quirky Update" or the "Running Total Solution". It has rules that you MUST follow to make it work right over time. If you look at Wayne's code, it's pretty obvious that the rules are NOT difficult, but they are ESSENTIAL to protecting the data and the correctness of the end result.

    If you want to use the method, then I strongly recommend that you read the article at the link Wayne points you to in the code. If you don't read anything else, read the rules and follow them to a "T". If you don't want to follow the rules for the "Quirky Update", then don't use it; use either a Cursor or a While loop instead. Don't use the fancy correlated subqueries that you see in most code, because they're just going to slow you down... a lot! 😉
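
    To make that last warning concrete, here's the kind of correlated-subquery running total (a "triangular join") being warned against. The #Punch table and its columns are illustrative, not from this thread; the point is that every row re-reads all of the earlier rows for the same employee, so the work grows roughly with the square of the row count:

    SELECT  s.EmID,
            s.InTime,
            (SELECT SUM(DATEDIFF(MINUTE, s2.InTime, s2.OutTime))
             FROM   #Punch s2
             WHERE  s2.EmID   = s.EmID
                AND s2.InTime <= s.InTime) AS RunningMinutes -- re-scans every prior row
    FROM    #Punch s
    ORDER BY s.EmID, s.InTime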

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • thava,

    Does that answer your question for you?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (8/29/2010)


    Wayne,

    I was looking at your previous row calculation on the 3-part update thinking to myself "Cool... he even thought of the forced RN change when EmID changes, just in case adjacent rows have the same times". About that time I also realized that not everyone picks up on those types of things when writing 3-part update statements for the Quirky Update. Then, add in the fact that you didn't go for just an "=" condition... you allowed for minor shingling (overlap) errors by using "<=" instead. AND, the CASE expressions to do that are in the correct order, which a lot of people mess up even when they get the individual parts of the CASE formula right.

    I've also seen other people get things right formula-wise but then leave out one of the essential rules for the Quirky Update, and I have to get on them. Not so here, nor in your other Quirky Update examples (well... except for that one JOINed update :-P).

    Considering all the heat I've taken over time for pushing the Quirky Update, I just can't let that kind of attention to detail go by without saying "VERY well done, Wayne!" Thanks for doing it EXACTLY correctly. 🙂

    :blush: Thanks, Jeff. But - I did learn this from you.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Can one of you gurus please test this? It's something I came up with today.

    SET STATISTICS IO ON returns

    Table '#619B8048'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SET SHOWPLAN_TEXT ON returns

    |--Compute Scalar(DEFINE:([Expr1019]=datediff(minute,[Expr1016],[Expr1018])))
      |--Stream Aggregate(GROUP BY:(.[EmID], [Expr1015]) DEFINE:([Expr1016]=MIN([Expr1011]), [Expr1017]=MAX([Expr1012]), [Expr1018]=MAX([Expr1011])))
        |--Sort(ORDER BY:(.[EmID] ASC, [Expr1015] ASC))
          |--Compute Scalar(DEFINE:([Expr1015]=([Expr1013]-(1))/(2)))
            |--Sequence Project(DEFINE:([Expr1013]=row_number))
              |--Segment
                |--Sort(ORDER BY:(.[EmID] ASC, [Expr1011] ASC))
                  |--Filter(WHERE:([Expr1010]=(1) OR datediff(minute,[Expr1011],[Expr1012])>(0)))
                    |--Compute Scalar(DEFINE:([Expr1010]=CONVERT_IMPLICIT(int,[Expr1029],0)))
                      |--Stream Aggregate(GROUP BY:(.[EmID], [Expr1009]) DEFINE:([Expr1029]=Count(*), [Expr1011]=MIN([Expr1007]), [Expr1012]=MAX([Expr1007])))
                        |--Sequence Project(DEFINE:([Expr1009]=dense_rank))
                          |--Segment
                            |--Segment
                              |--Sort(ORDER BY:(.[EmID] ASC, [Expr1007] ASC))
                                |--Filter(WHERE:([Expr1007] IS NOT NULL))
                                  |--Nested Loops(Left Outer Join, OUTER REFERENCES:(.[InTime], .[OutTime]))
                                    |--Compute Scalar(DEFINE:([Expr1004]=@Sample.[EmID] as .[EmID]))
                                    |  |--Table Scan(OBJECT:(@Sample AS ))
                                    |--Constant Scan(VALUES:((@Sample.[InTime] as .[InTime]),(@Sample.[OutTime] as .[OutTime])))

    Oh... and the code is here:

    -- Prepare sample data
    DECLARE @Sample TABLE
        (
        RID     INT,
        EmID    INT,
        InTime  DATETIME,
        OutTime DATETIME
        )

    INSERT @Sample
    SELECT 62, 12, '20050818 05:02', '20050818 06:31' UNION ALL
    SELECT 63, 12, '20050818 06:31', '20050818 06:33' UNION ALL
    SELECT 64, 12, '20050818 06:33', '20050818 06:34' UNION ALL
    SELECT 66, 12, '20050818 07:02', '20050818 08:31' UNION ALL
    SELECT 67, 12, '20050818 08:31', '20050818 08:33' UNION ALL
    SELECT 68, 12, '20050818 08:33', '20050818 09:34' UNION ALL
    SELECT 72, 12, '20050819 05:02', '20050819 06:31' UNION ALL
    SELECT 73, 12, '20050819 06:31', '20050819 06:33' UNION ALL
    SELECT 74, 12, '20050819 06:33', '20050819 06:34' UNION ALL
    SELECT 76, 12, '20050819 07:02', '20050819 08:31' UNION ALL
    SELECT 77, 12, '20050819 08:31', '20050819 08:33' UNION ALL
    SELECT 78, 12, '20050819 08:33', '20050819 09:34'

    -- Unpivot every row into two rows (one per InTime/OutTime value)
    -- and number the distinct times per employee.
    ;WITH cteSource(EmID, DT, Grp)
    AS (
        SELECT  u.EmID,
                u.DT,
                DENSE_RANK() OVER (PARTITION BY u.EmID ORDER BY u.DT) AS Grp
        FROM    @Sample AS s
        UNPIVOT (
                    DT
                    FOR theCol IN (InTime, OutTime)
                ) AS u
    ),
    -- Keep only the times that start or end a contiguous block: a time
    -- shared by two adjacent rows (an OutTime equal to the next InTime)
    -- collapses into one group with COUNT(*) = 2 and no minute
    -- difference, so it is filtered out here.
    cteGrouping(EmID, MinDT, MaxDT, ColID)
    AS (
        SELECT  EmID,
                MIN(DT) AS MinDT,
                MAX(DT) AS MaxDT,
                ROW_NUMBER() OVER (PARTITION BY EmID ORDER BY MIN(DT)) - 1 AS ColID
        FROM    cteSource
        GROUP BY EmID,
                Grp
        HAVING  COUNT(*) = 1
                OR DATEDIFF(MINUTE, MIN(DT), MAX(DT)) > 0
    )
    -- Pair up consecutive boundary times (ColID / 2) to rebuild the
    -- merged intervals and their durations.
    SELECT  EmID,
            MIN(MinDT) AS FromTime,
            MAX(MaxDT) AS ToTime,
            DATEDIFF(MINUTE, MIN(MinDT), MAX(MinDT)) AS [Minutes]
    FROM    cteGrouping
    GROUP BY EmID,
            ColID / 2
    ORDER BY EmID,
            ColID / 2
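
    As a side note for anyone tracing through the code above, the ColID / 2 grouping works because integer division folds the consecutive boundary numbers 0, 1, 2, 3, ... into pairs. This throwaway snippet (not part of the solution) shows the effect:

    SELECT ColID, ColID / 2 AS PairID
    FROM  (VALUES (0),(1),(2),(3),(4),(5)) AS v(ColID)
    -- PairID comes out 0,0,1,1,2,2: each pair is one "from" time plus one "to" time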


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso (8/29/2010)


    Can one of you gurus please test this? It's something I came up with today.

    Using the 116,000 rows of data from my previous post, this generates these results:

    Table '#Test___00000000001A'. Scan count 3, logical reads 539, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 1264 ms, elapsed time = 1846 ms.

    In comparison, the method I used has these results:

    Table '#Test_____00000000000D'. Scan count 1, logical reads 540, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 187 ms, elapsed time = 187 ms.

    Table '#Test_____00000000000D'. Scan count 3, logical reads 539, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
       CPU time = 171 ms, elapsed time = 770 ms.

    So, the method you came up with took twice as long, but with half of the reads. Pretty impressive... and it doesn't use the controversial "quirky" update. Much better than the triangular / cross joins from before.

    FYI, if I don't add the clustered index, it takes 2088 ms but produces the same I/O statistics. The only thing that changes in the execution plan is that the clustered index scan becomes a table scan. A sketch of the test setup follows below.
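
    For anyone recreating the test: the exact table and index definitions aren't shown in this thread, so the following is an assumed setup (the #Test shape, index name, and key columns are guesses) for capturing the numbers quoted above:

    CREATE CLUSTERED INDEX IX_Test ON #Test (EmID, InTime) -- assumed keys

    SET STATISTICS IO ON
    SET STATISTICS TIME ON
    -- ... run the query under test here ...
    SET STATISTICS TIME OFF
    SET STATISTICS IO OFF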

    For anyone that wants to compare the execution plans:

    SSC1.sqlplan is this one with the unpivot.

    SSC2.sqlplan is for my earlier code.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2
