How can I calculate how many mussels have been shucked?

Question

Post reply

How can I calculate how many mussels have been shucked?

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply

RBarryYoung SSC Guru Points: 143329 More actions · Answer 1

noeld (8/27/2008)

I felt that I had to post my version 😀

SELECT a.ShiftID , a.TableID , a.StationID , a.EmployeeID , a.CounterDateTimeStamp , a.MusselCount , a.CounterTimeSpan ,(SELECT TOP 1 b.MusselCount FROM dbo.ShiftCountersRawData b WHERE b.ShiftID = a.ShiftID AND b.TableID = a.TableID AND b.StationID = a.StationID AND b.EmployeeID = a.EmployeeID AND b.CounterDateTimeStamp > a.CounterDateTimeStamp ORDER BY b.CounterDateTimeStamp ) - a.MusselCount as MusselCountDiff FROM dbo.ShiftCountersRawData a ORDER BY a.ShiftID, a.TableID, a.StationID, a.EmployeeID, a.CounterDateTimeStamp

Yeah, I've been playing around with a version like that one, Noel. Except that I think that you have the A's & B's reversed: you are reporting the count difference on the record before the count is take. I am pretty sure that it is supposed to be on the record after.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 2

To really make a test, I loaded up the table. After loading in the first 25 rows of the supplied data (I left out the rows that didn't change amounts), I made the employee work every day for the past year and then added 99 other employees, all working every day. This gave me a table with just shy of 1 million rows.

if Object_ID( 'dbo.ShiftCountersRawData', 'U' ) is not null begin



    truncate table dbo.ShiftCountersRawData;

    end;

else begin

    create table ShiftCountersRawData  (

        ShiftID             int         NOT NULL,

        TableID             int         NOT NULL,

        StationID           int         NOT NULL,

        EmployeeID          varchar(12) NOT NULL,

        CounterDateTimeStamp datetime   NOT NULL,

        MusselCount         int         NOT NULL,

        CounterTimeSpan     int             NULL,

        constraint PK_ShiftCountersRawData primary key

                (ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp)

    );

    end;

Insert  dbo.ShiftCountersRawData

        (ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp, MusselCount, CounterTimeSpan)

select dbo.GetRowCount( 'ShiftCountersRawData' );

select  7, 1, 1, 6652, '2006-12-07 13:13:22.223', 10261, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:16:22.330', 10261, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:19:22.470', 10302, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:22:22.470', 10388, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:25:22.487', 10476, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:28:22.487', 10565, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:31:22.487', 10648, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:34:22.487', 10734, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:37:22.487', 10825, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:40:22.487', 10917, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:43:22.503', 10998, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:46:22.503', 11084, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:49:22.503', 11179, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:52:22.503', 11268, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:55:22.503', 11355, 180        union all

select  7, 1, 1, 6652, '2006-12-07 13:58:22.503', 11436, 180        union all

select  7, 1, 1, 6652, '2006-12-07 14:01:22.673', 11521, 180        union all

select  7, 1, 1, 6652, '2006-12-07 14:04:22.767', 11600, 180        union all

select  7, 1, 1, 6652, '2006-12-07 14:07:22.770', 11684, 180        union all

select  7, 1, 1, 6652, '2006-12-07 14:10:22.770', 11763, 180        union all

select  7, 1, 1, 6652, '2006-12-07 14:13:22.770', 11839, 180        union all

select  7, 1, 1, 6652, '2006-12-07 14:16:22.770', 11894, 180        union all

select  7, 1, 1, 6652, '2006-12-07 14:19:22.783', 11972, 180        union all

select  7, 1, 1, 6652, '2006-12-07 14:22:22.783', 12005, 180        union all

select  7, 1, 1, 6652, '2006-12-07 14:25:22.783', 12025, 180;

Declare @I  int;

set  @I  = 1;

-- Make the poor sucker work every day for the past year.

while   @I < 365 begin

    Insert  dbo.ShiftCountersRawData

            (ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp, MusselCount, CounterTimeSpan)

    select  ShiftID, TableID, StationID, EmployeeID,

        DateAdd( dd, -@I, CounterDateTimeStamp ), MusselCount, CounterTimeSpan

    from    dbo.ShiftCountersRawData

    where   CounterDateTimeStamp > '2006-12-07';

    set @I = @I + 1;

end;

set  @I  = 1;

-- Add 99 other employees, all working the entire year.

while   @I < 100 begin

    Insert  dbo.ShiftCountersRawData

            (ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp, MusselCount, CounterTimeSpan)

    select  ShiftID, TableID, StationID,

            EmployeeID + @I, CounterDateTimeStamp, MusselCount + @I, CounterTimeSpan

    from    dbo.ShiftCountersRawData

    where   EmployeeID = 6652;

    set @I = @I + 1;

end;Then, with multiple employees working multiple days, I had to add a WHERE clause.declare @EmpNo int, @StartDate datetime, @EndDate datetime;



set @EmpNo = 6700;

set @StartDate = '2006-01-01';

set @EndDate = DateAdd( dd, 1, @StartDate );

select  s1.ShiftID, s1.TableID, s1.StationID, s1.EmployeeID,

        s1.CounterDateTimeStamp as CounterStart, s2.CounterDateTimeStamp as CounterEnd,

        s1.MusselCount as StartingMusselCount, s2.MusselCount as EndingMusselCount,

        s2.MusselCount - s1.MusselCount as MusselCount

from    ShiftCountersRawData   s1

join    ShiftCountersRawData   s2

    on  s1.ShiftID      = s2.ShiftID

    and s1.TableID      = s2.TableID

    and s1.StationID    = s2.StationID

    and s1.EmployeeID   = s2.EmployeeID

    and s2.CounterDateTimeStamp  = (

        select  Min( s3.CounterDateTimeStamp )

        from    ShiftCountersRawData   s3

        where  s3.CounterDateTimeStamp > s1.CounterDateTimeStamp

            and  s3.ShiftID  = s1.ShiftID

            and  s3.TableID  = s1.TableID

            and  s3.StationID= s1.StationID

            and  s3.EmployeeID= s1.EmployeeID

        group by s3.ShiftID, s3.TableID, s3.StationID, s3.EmployeeID

     )

where   s1.EmployeeID  = @EmpNo

    and s1.CounterDateTimeStamp between @StartDate and @EndDate

    and s2.CounterDateTimeStamp between @StartDate and @EndDate;

At first I only added the s1 date check but I was getting back an extra row. Turns out the last entry for the day was joining with the first entry of the next day. The s2 date check fixed that. I ran the query multiple times with different employee numbers and different dates. The query averaged 200ms on my laptop.

Then I ran noeld's query -- after also adding a WHERE clause, or in this case, one new WHERE clause and an addition to an existing one: SELECT a.ShiftID



                , a.TableID

                , a.StationID

                , a.EmployeeID

                , a.CounterDateTimeStamp

                , a.MusselCount

                , a.CounterTimeSpan

                ,(SELECT TOP 1 b.MusselCount

                   FROM dbo.ShiftCountersRawData b

                   WHERE     b.ShiftID = a.ShiftID

                         AND b.TableID = a.TableID

                         AND b.StationID = a.StationID

                         AND b.EmployeeID = a.EmployeeID

                         AND b.CounterDateTimeStamp > a.CounterDateTimeStamp

                        and b.CounterDateTimeStamp between @StartDate and @EndDate

          ORDER BY  b.CounterDateTimeStamp )

         - a.MusselCount as MusselCountDiff

  FROM dbo.ShiftCountersRawData a

  where   EmployeeID  = @EmpNo

    and CounterDateTimeStamp between @StartDate and @EndDate';

Executing this with the same inputs resulted in a 180ms average response -- a 10% improvement. Not too bad, but there is still one problem. It always returns 26 rows, with the MusselCountDiff value being NULL on the last row.

I haven't thought of an easy way around this. Any fix will surely increase the response time. It could be allowed as is, but then subsequent code will have to work around that invalid row, which will complicate the overall application efficiency.

"It it's not...one...thing...it's anotherrrrrrrr!" (Sung to the tune of...well, any tune will do.)

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 3

rbarryyoung (8/27/2008)

Yeah, I've been playing around with a version like that one, Noel. Except that I think that you have the A's & B's reversed: you are reporting the count difference on the record before the count is take. I am pretty sure that it is supposed to be on the record after.

I'm afraid you have the A's and B's reversed. Noel's query returns the correct result (except for that last extra row for tables with data for more than one day). In fact, your code is failing to limit the time of the subquery (B) to only the times occurring after the A record. So the subquery is always returning the same value, the MusselCount value of the first entry for that day. Your result is a running total for the day rather than a row-by-row difference because you are using the subquery result as the subtrahend rather than the minuend like Noel.

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

rbarryyoung (8/26/2008)

Jeff, dude:

Jeff Moden (8/26/2008)

SELECT ROW_NUMBER() OVER (PARTITION BY ShiftID, TableID, StationID, EmployeeID, DATEADD(dd,DATEDIFF(dd,0,CounterDateTimeStamp),0) ORDER BY ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp) AS RowNum, *

... Uhm, ... what the blankety-blank is this !?!?!

Heh...THAT is unfinished business... I didn't complete the rest of the query (CPR'd the wrong copy)... I'll be back... I'll probably have a million row table in tow kinda like Tom did... I want to test all the methods include Noels correlated sub-query method.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

RBarryYoung SSC Guru Points: 143329 More actions · Answer 5

Tomm Carr (8/27/2008)

rbarryyoung (8/27/2008)

Yeah, I've been playing around with a version like that one, Noel. Except that I think that you have the A's & B's reversed: you are reporting the count difference on the record before the count is take. I am pretty sure that it is supposed to be on the record after.

I'm afraid you have the A's and B's reversed. Noel's query returns the correct result (except for that last extra row for tables with data for more than one day). In fact, your code is failing to limit the time of the subquery (B) to only the times occurring after the A record. So the subquery is always returning the same value, the MusselCount value of the first entry for that day. Your result is a running total for the day rather than a row-by-row difference because you are using the subquery result as the subtrahend rather than the minuend like Noel.

You are right, I forgot to bound the inner query correctly when I changed my original query. Thanks for the catch.

However, it is not because I have A and B reversed, and my original point is still correct: Noel's query (and I think that yours is the same) is assign the count difference to the wrong time/record. It is supposed to be assinging the count difference to the same record that has the seconds difference that it corresponds to, but it is assigning it to the one before it. All of the other queries have this correct (well, my first one does, my second one does need to be fixed, as you pointed out).

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 6

Jeff Moden (8/27/2008)

rbarryyoung (8/26/2008)

Jeff, dude:

Jeff Moden (8/26/2008)

SELECT ROW_NUMBER() OVER (PARTITION BY ShiftID, TableID, StationID, EmployeeID, DATEADD(dd,DATEDIFF(dd,0,CounterDateTimeStamp),0) ORDER BY ShiftID, TableID, StationID, EmployeeID, CounterDateTimeStamp) AS RowNum, *

... Uhm, ... what the blankety-blank is this !?!?!

Heh...THAT is unfinished business... I didn't complete the rest of the query (CPR'd the wrong copy)... I'll be back... I'll probably have a million row table in tow kinda like Tom did... I want to test all the methods include Noels correlated sub-query method.

Heh. Sorry, I thought it was another one of your mysterious voodoo performance tricks.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 7

rbarryyoung (8/27/2008)

...Noel's query (and I think that yours is the same) is assign the count difference to the wrong time/record. It is supposed to be assinging the count difference to the same record that has the seconds difference that it corresponds to, but it is assigning it to the one before it.

I think I see where you are coming from here. Look at the raw data and follow me for a minute.

In the first time period, the one that starts 13 minutes after 1PM, the initial mussel count is 10261. At the end of that 3 minute period (the beginning of the segment starting at 16 minutes after 1PM) the count is still at 10261. So during the time period of 13 minutes to 16 minutes after the hour, no mussels were shucked. The answer is 0.

Now we look at the time period from 16 minutes to 19 minutes after. It starts with a mussel count of 10261 (we already know that) and ends at 10302. So in the period from 16 minutes to 19 minutes after, 41 mussels were shucked.

The only difference is that we are associating the number of mussels shucked during each period with the row that begins the period and you want to associate the value with the row that ends the period. Personally, I think the former makes more sense (now why do you suppose that would be?).

Think about it like this. Suppose each time period was a complete day, from midnight to midnight. Wouldn't it make more sense to associate the work done during a day with the start of the day? "On May 18th, this much work was performed." The other way would associate May 18th's work with the row labeled "May 19".

However, that is a personal preference. Which one of us is "correct" depends on what the specs require. 😛

Btw, I forgot to mention why I like my query better (in general) than the more simpler correlated subquery. In many places where I use this pattern, I need access to more than one value from the "next" row. For example, my result set shows the timestamp at the beginning of the period, CounterStart, and the end, CounterEnd as well as the beginning and ending MusselCount values. This isn't possible with just a single subquery. I can get a join of the complete "current" row with the complete "next" row for just a 10% performance hit. I don't think that's too shabby.

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 8

Tomm Carr

SSCertifiable

Points: 5426

August 27, 2008 at 7:53 pm

#864087

Jeff Moden (8/27/2008)

I'll probably have a million row table in tow kinda like Tom did...

I'll make you a deal. You don't call me Tom and I won't call you Jef. 😉

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

Heh... sorry Tomm... bet you've had "mo-den" enough of that 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

RBarryYoung SSC Guru Points: 143329 More actions · Answer 10

Tomm Carr (8/27/2008)The only difference is that we are associating the number of mussels shucked during each period with the row that begins the period and you want to associate the value with the row that ends the period. Personally, I think the former makes more sense (now why do you suppose that would be?).

But it does not make sense, nor is it what was spec'ed. The only association that makes sense is to have the Mussel Count difference associated with the row that records the information for the same time period that the Mussel count difference is calculated against, and that is the Row at the end of that time period. We know that because the column that specifically documents the time span: the CounterTimeSpan column is stored in the row at the end of the time span and not at the beginning.

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 11

rbarryyoung (8/27/2008)

But it does not make sense, nor is it what was spec'ed.

Sigh! You can be so stubborn! You should be more like me and just have the courage of your convictions.

Very well, have it your own way. To do it that way, you have to process the rows backwards. Iow, you will be working with the "current" row and the count from the "previous" row. To do that, you have to sort the subquery descending. You still have the problem with one row with a NULL for the value, but I've thrown in a quick and dirty fix for that.

Does this give you what you want (as terribly wrong as it may be)?

declare @EmpNo int, @StartDate datetime, @EndDate datetime;



set @EmpNo = 6700;

set @StartDate = '2006-01-01';

set @EndDate = DateAdd( dd, 1, @StartDate );

Select [ShiftID]

  ,[TableID]

  ,[StationID]

  ,[EmployeeID]

  ,[CounterDateTimeStamp]

  ,[MusselCount]

  ,[CounterTimeSpan]

  ,MusselCount - IsNull( (Select TOP 1 B.MusselCount

         From [ShiftCountersRawData] B

         Where B.ShiftID = A.ShiftID

          And B.TableID = A.TableID

          And B.StationID = A.StationID

          And B.EmployeeID = A.EmployeeID

          And B.CounterDateTimeStamp < a.CounterDateTimeStamp

          And B.CounterDateTimeStamp

                Between @StartDate and @EndDate

            order by B.CounterDateTimeStamp desc), MusselCount )

         as [MusselCountDiff]

  FROM [dbo].[ShiftCountersRawData] a

where   a.EmployeeID  = @EmpNo

    and a.CounterDateTimeStamp between @StartDate and @EndDate

Edit-- I meant to add a little jab about how I have not been privy to the spec, but I decided not to be so petty. 😀

I will make one last point. The 25 rows that make up a day's worth of shucking define 24 time periods. If you'll look at the way I've formatted the output (in my original query, not the one above), the result set has 24 rows, each row represents a time period with a start and end timestamp, a start and end mussel count and a total count shucked during the period. Using a subquery means a result set of 25 rows. So the first or the last row (depending on if you want end or start time to be significant) will always be extraneous.

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms