How can I calculate how many mussels have been shucked?

  • noeld (8/27/2008)


    I felt that I had to post my version 😀

    SELECT a.ShiftID

    , a.TableID

    , a.StationID

    , a.EmployeeID

    , a.CounterDateTimeStamp

    , a.MusselCount

    , a.CounterTimeSpan

    ,(SELECT TOP 1 b.MusselCount

    FROM dbo.ShiftCountersRawData b

    WHERE b.ShiftID = a.ShiftID

    AND b.TableID = a.TableID

    AND b.StationID = a.StationID

    AND b.EmployeeID = a.EmployeeID

    AND b.CounterDateTimeStamp > a.CounterDateTimeStamp

    ORDER BY b.CounterDateTimeStamp )

    - a.MusselCount as MusselCountDiff

    FROM dbo.ShiftCountersRawData a

    ORDER BY a.ShiftID, a.TableID, a.StationID, a.EmployeeID, a.CounterDateTimeStamp

    Yeah, I've been playing around with a version like that one, Noel. Except that I think that you have the A's & B's reversed: you are reporting the count difference on the record before the count is take. I am pretty sure that it is supposed to be on the record after.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • To really make a test, I loaded up the table. After loading in the first 25 rows of the supplied data (I left out the rows that didn't change amounts), I made the employee work every day for the past year and then added 99 other employees, all working every day. This gave me a table with just shy of 1 million rows.

    if Object_ID( 'dbo.ShiftCountersRawData', 'U' ) is not null begin

    truncate table dbo.ShiftCountersRawData;

    end;

    else begin

    create table ShiftCountersRawData (

    ShiftID int NOT NULL,

    TableID int NOT NULL,

    StationID int NOT NULL,

    EmployeeID varchar(12) NOT NULL,

    CounterDateTimeStamp datetime NOT NULL,

    MusselCount int NOT NULL,

    CounterTimeSpan int NULL,

    constraint PK_ShiftCountersRawData primary key

    (ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp)

    );

    end;

    Insert dbo.ShiftCountersRawData

    (ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp, MusselCount, CounterTimeSpan)

    select dbo.GetRowCount( 'ShiftCountersRawData' );

    select 7, 1, 1, 6652, '2006-12-07 13:13:22.223', 10261, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:16:22.330', 10261, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:19:22.470', 10302, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:22:22.470', 10388, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:25:22.487', 10476, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:28:22.487', 10565, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:31:22.487', 10648, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:34:22.487', 10734, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:37:22.487', 10825, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:40:22.487', 10917, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:43:22.503', 10998, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:46:22.503', 11084, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:49:22.503', 11179, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:52:22.503', 11268, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:55:22.503', 11355, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 13:58:22.503', 11436, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 14:01:22.673', 11521, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 14:04:22.767', 11600, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 14:07:22.770', 11684, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 14:10:22.770', 11763, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 14:13:22.770', 11839, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 14:16:22.770', 11894, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 14:19:22.783', 11972, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 14:22:22.783', 12005, 180 union all

    select 7, 1, 1, 6652, '2006-12-07 14:25:22.783', 12025, 180;

    Declare @I int;

    set @I = 1;

    -- Make the poor sucker work every day for the past year.

    while @I < 365 begin

    Insert dbo.ShiftCountersRawData

    (ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp, MusselCount, CounterTimeSpan)

    select ShiftID, TableID, StationID, EmployeeID,

    DateAdd( dd, -@I, CounterDateTimeStamp ), MusselCount, CounterTimeSpan

    from dbo.ShiftCountersRawData

    where CounterDateTimeStamp > '2006-12-07';

    set @I = @I + 1;

    end;

    set @I = 1;

    -- Add 99 other employees, all working the entire year.

    while @I < 100 begin

    Insert dbo.ShiftCountersRawData

    (ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp, MusselCount, CounterTimeSpan)

    select ShiftID, TableID, StationID,

    EmployeeID + @I, CounterDateTimeStamp, MusselCount + @I, CounterTimeSpan

    from dbo.ShiftCountersRawData

    where EmployeeID = 6652;

    set @I = @I + 1;

    end;Then, with multiple employees working multiple days, I had to add a WHERE clause.declare @EmpNo int, @StartDate datetime, @EndDate datetime;

    set @EmpNo = 6700;

    set @StartDate = '2006-01-01';

    set @EndDate = DateAdd( dd, 1, @StartDate );

    select s1.ShiftID, s1.TableID, s1.StationID, s1.EmployeeID,

    s1.CounterDateTimeStamp as CounterStart, s2.CounterDateTimeStamp as CounterEnd,

    s1.MusselCount as StartingMusselCount, s2.MusselCount as EndingMusselCount,

    s2.MusselCount - s1.MusselCount as MusselCount

    from ShiftCountersRawData s1

    join ShiftCountersRawData s2

    on s1.ShiftID = s2.ShiftID

    and s1.TableID = s2.TableID

    and s1.StationID = s2.StationID

    and s1.EmployeeID = s2.EmployeeID

    and s2.CounterDateTimeStamp = (

    select Min( s3.CounterDateTimeStamp )

    from ShiftCountersRawData s3

    where s3.CounterDateTimeStamp > s1.CounterDateTimeStamp

    and s3.ShiftID = s1.ShiftID

    and s3.TableID = s1.TableID

    and s3.StationID= s1.StationID

    and s3.EmployeeID= s1.EmployeeID

    group by s3.ShiftID, s3.TableID, s3.StationID, s3.EmployeeID

    )

    where s1.EmployeeID = @EmpNo

    and s1.CounterDateTimeStamp between @StartDate and @EndDate

    and s2.CounterDateTimeStamp between @StartDate and @EndDate;

    At first I only added the s1 date check but I was getting back an extra row. Turns out the last entry for the day was joining with the first entry of the next day. The s2 date check fixed that. I ran the query multiple times with different employee numbers and different dates. The query averaged 200ms on my laptop.

    Then I ran noeld's query -- after also adding a WHERE clause, or in this case, one new WHERE clause and an addition to an existing one: SELECT a.ShiftID

    , a.TableID

    , a.StationID

    , a.EmployeeID

    , a.CounterDateTimeStamp

    , a.MusselCount

    , a.CounterTimeSpan

    ,(SELECT TOP 1 b.MusselCount

    FROM dbo.ShiftCountersRawData b

    WHERE b.ShiftID = a.ShiftID

    AND b.TableID = a.TableID

    AND b.StationID = a.StationID

    AND b.EmployeeID = a.EmployeeID

    AND b.CounterDateTimeStamp > a.CounterDateTimeStamp

    and b.CounterDateTimeStamp between @StartDate and @EndDate

    ORDER BY b.CounterDateTimeStamp )

    - a.MusselCount as MusselCountDiff

    FROM dbo.ShiftCountersRawData a

    where EmployeeID = @EmpNo

    and CounterDateTimeStamp between @StartDate and @EndDate';

    Executing this with the same inputs resulted in a 180ms average response -- a 10% improvement. Not too bad, but there is still one problem. It always returns 26 rows, with the MusselCountDiff value being NULL on the last row.

    I haven't thought of an easy way around this. Any fix will surely increase the response time. It could be allowed as is, but then subsequent code will have to work around that invalid row, which will complicate the overall application efficiency.

    "It it's not...one...thing...it's anotherrrrrrrr!" (Sung to the tune of...well, any tune will do.)

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • rbarryyoung (8/27/2008)


    Yeah, I've been playing around with a version like that one, Noel. Except that I think that you have the A's & B's reversed: you are reporting the count difference on the record before the count is take. I am pretty sure that it is supposed to be on the record after.

    I'm afraid you have the A's and B's reversed. Noel's query returns the correct result (except for that last extra row for tables with data for more than one day). In fact, your code is failing to limit the time of the subquery (B) to only the times occurring after the A record. So the subquery is always returning the same value, the MusselCount value of the first entry for that day. Your result is a running total for the day rather than a row-by-row difference because you are using the subquery result as the subtrahend rather than the minuend like Noel.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • rbarryyoung (8/26/2008)


    Jeff, dude:

    Jeff Moden (8/26/2008)


    SELECT ROW_NUMBER() OVER (PARTITION BY ShiftID, TableID, StationID, EmployeeID, DATEADD(dd,DATEDIFF(dd,0,CounterDateTimeStamp),0)

    ORDER BY ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp) AS RowNum,

    *

    ... Uhm, ... what the blankety-blank is this !?!?!

    Heh...THAT is unfinished business... I didn't complete the rest of the query (CPR'd the wrong copy)... I'll be back... I'll probably have a million row table in tow kinda like Tom did... I want to test all the methods include Noels correlated sub-query method.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Tomm Carr (8/27/2008)


    rbarryyoung (8/27/2008)


    Yeah, I've been playing around with a version like that one, Noel. Except that I think that you have the A's & B's reversed: you are reporting the count difference on the record before the count is take. I am pretty sure that it is supposed to be on the record after.

    I'm afraid you have the A's and B's reversed. Noel's query returns the correct result (except for that last extra row for tables with data for more than one day). In fact, your code is failing to limit the time of the subquery (B) to only the times occurring after the A record. So the subquery is always returning the same value, the MusselCount value of the first entry for that day. Your result is a running total for the day rather than a row-by-row difference because you are using the subquery result as the subtrahend rather than the minuend like Noel.

    You are right, I forgot to bound the inner query correctly when I changed my original query. Thanks for the catch.

    However, it is not because I have A and B reversed, and my original point is still correct: Noel's query (and I think that yours is the same) is assign the count difference to the wrong time/record. It is supposed to be assinging the count difference to the same record that has the seconds difference that it corresponds to, but it is assigning it to the one before it. All of the other queries have this correct (well, my first one does, my second one does need to be fixed, as you pointed out).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Jeff Moden (8/27/2008)


    rbarryyoung (8/26/2008)


    Jeff, dude:

    Jeff Moden (8/26/2008)


    SELECT ROW_NUMBER() OVER (PARTITION BY ShiftID, TableID, StationID, EmployeeID, DATEADD(dd,DATEDIFF(dd,0,CounterDateTimeStamp),0)

    ORDER BY ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp) AS RowNum,

    *

    ... Uhm, ... what the blankety-blank is this !?!?!

    Heh...THAT is unfinished business... I didn't complete the rest of the query (CPR'd the wrong copy)... I'll be back... I'll probably have a million row table in tow kinda like Tom did... I want to test all the methods include Noels correlated sub-query method.

    Heh. Sorry, I thought it was another one of your mysterious voodoo performance tricks.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • rbarryyoung (8/27/2008)


    ...Noel's query (and I think that yours is the same) is assign the count difference to the wrong time/record. It is supposed to be assinging the count difference to the same record that has the seconds difference that it corresponds to, but it is assigning it to the one before it.

    I think I see where you are coming from here. Look at the raw data and follow me for a minute.

    In the first time period, the one that starts 13 minutes after 1PM, the initial mussel count is 10261. At the end of that 3 minute period (the beginning of the segment starting at 16 minutes after 1PM) the count is still at 10261. So during the time period of 13 minutes to 16 minutes after the hour, no mussels were shucked. The answer is 0.

    Now we look at the time period from 16 minutes to 19 minutes after. It starts with a mussel count of 10261 (we already know that) and ends at 10302. So in the period from 16 minutes to 19 minutes after, 41 mussels were shucked.

    The only difference is that we are associating the number of mussels shucked during each period with the row that begins the period and you want to associate the value with the row that ends the period. Personally, I think the former makes more sense (now why do you suppose that would be?).

    Think about it like this. Suppose each time period was a complete day, from midnight to midnight. Wouldn't it make more sense to associate the work done during a day with the start of the day? "On May 18th, this much work was performed." The other way would associate May 18th's work with the row labeled "May 19".

    However, that is a personal preference. Which one of us is "correct" depends on what the specs require. 😛

    Btw, I forgot to mention why I like my query better (in general) than the more simpler correlated subquery. In many places where I use this pattern, I need access to more than one value from the "next" row. For example, my result set shows the timestamp at the beginning of the period, CounterStart, and the end, CounterEnd as well as the beginning and ending MusselCount values. This isn't possible with just a single subquery. I can get a join of the complete "current" row with the complete "next" row for just a 10% performance hit. I don't think that's too shabby.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Jeff Moden (8/27/2008)


    I'll probably have a million row table in tow kinda like Tom did...

    I'll make you a deal. You don't call me Tom and I won't call you Jef. 😉

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Heh... sorry Tomm... bet you've had "mo-den" enough of that 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Tomm Carr (8/27/2008)The only difference is that we are associating the number of mussels shucked during each period with the row that begins the period and you want to associate the value with the row that ends the period. Personally, I think the former makes more sense (now why do you suppose that would be?).

    But it does not make sense, nor is it what was spec'ed. The only association that makes sense is to have the Mussel Count difference associated with the row that records the information for the same time period that the Mussel count difference is calculated against, and that is the Row at the end of that time period. We know that because the column that specifically documents the time span: the CounterTimeSpan column is stored in the row at the end of the time span and not at the beginning.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • rbarryyoung (8/27/2008)


    But it does not make sense, nor is it what was spec'ed.

    Sigh! You can be so stubborn! You should be more like me and just have the courage of your convictions.

    Very well, have it your own way. To do it that way, you have to process the rows backwards. Iow, you will be working with the "current" row and the count from the "previous" row. To do that, you have to sort the subquery descending. You still have the problem with one row with a NULL for the value, but I've thrown in a quick and dirty fix for that.

    Does this give you what you want (as terribly wrong as it may be)?

    declare @EmpNo int, @StartDate datetime, @EndDate datetime;

    set @EmpNo = 6700;

    set @StartDate = '2006-01-01';

    set @EndDate = DateAdd( dd, 1, @StartDate );

    Select [ShiftID]

    ,[TableID]

    ,[StationID]

    ,[EmployeeID]

    ,[CounterDateTimeStamp]

    ,[MusselCount]

    ,[CounterTimeSpan]

    ,MusselCount - IsNull( (Select TOP 1 B.MusselCount

    From [ShiftCountersRawData] B

    Where B.ShiftID = A.ShiftID

    And B.TableID = A.TableID

    And B.StationID = A.StationID

    And B.EmployeeID = A.EmployeeID

    And B.CounterDateTimeStamp < a.CounterDateTimeStamp

    And B.CounterDateTimeStamp

    Between @StartDate and @EndDate

    order by B.CounterDateTimeStamp desc), MusselCount )

    as [MusselCountDiff]

    FROM [dbo].[ShiftCountersRawData] a

    where a.EmployeeID = @EmpNo

    and a.CounterDateTimeStamp between @StartDate and @EndDate

    Edit-- I meant to add a little jab about how I have not been privy to the spec, but I decided not to be so petty. 😀

    I will make one last point. The 25 rows that make up a day's worth of shucking define 24 time periods. If you'll look at the way I've formatted the output (in my original query, not the one above), the result set has 24 rows, each row represents a time period with a start and end timestamp, a start and end mussel count and a total count shucked during the period. Using a subquery means a result set of 25 rows. So the first or the last row (depending on if you want end or start time to be significant) will always be extraneous.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply