Expressions based on data in the SELECT statement

  • Hi all,

    I have data (gathered from eye-tracking software) that represents when a person looked at various things in a timeline e.g.

    SessionId CaptureId ThingId StartTime FinishTime

    1 1 A 0 1

    1 2 B 1 2

    1 3 C 2 3

    1 4 A 3 4

    1 5 C 4 5

    2 1 B 0 1

    etc, etc, etc, etc, etc.....

    I need to work out if the first thing looked at was a target thing (to be supplied as a parameter), or if it was seen within x seconds (another parameter) or if it was seen at all.

    I think I need to end up with a table like:

    SessionId ThingWasSeenFirst ThingWasSeenWithin4Seconds ThingWasSeen

    Eventually tis info will end up in an SSRS report (which will actually show percentage i.e. what percentage of people saw the thing immediately, etc, etc).

    Now, I don't want anyone to write the SQL for me. I'm not looking for someone else to do my work. I'm looking for advice on the best way to tackle the problem i.e. can it be done in a single statement, do I need a sproc or function, should I be using temp tables, etc.

    I've been trawling articles and books all day but seen no example of what I am trying to do. I don't even know what it's called (aggregration? subqueires?)

    Any help will be very much appreciated.

    Kind regards

    jeanangeo

  • Without writing any SQL as explanation, I can only be general about what you can try to do (there will be several methods of tackling this issue, this is only one of them)

    Thing was seen first:

    This is fairly easy. Select the MIN (aggregate) CaptureID for each session and see if it that captureID's thing matches your Target Thing.

    Thing was seen within 4 seconds:

    I don't know where your actual times live... those start and finish times look like they might be referential to another table. Regardless of this, the goal is to figure out *what* things were looked at in the first 4 seconds (could possibly be a list), and then see if your target thing is in this list.

    Thing was seen at all:

    Same as the 4 Seconds method, but with the time restrictions removed.

    This could definitely all be in one Stored Procedure. You probably won't need any outside functions for it. Exactly what you use to accomplish this(and whether or not it could all be one select statement) would depend a lot on how your data(such as actual times) is stored, and how much manipulation needs to be done to it. Chances are that with either derived tables, or common table expressions (CTE), you could get it all into one select. You could also likely use temp tables / table variables.

    Once you have all this information, you can use CASE statements to output a simple YES or NO for each column for each session.

    Without writing any of the SQL for you, or seeing your exact table structure / sample data, that's about as descriptive as I can be.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I may be missing something but sounds like you want to select the records where the Thing ID is the passed in value and (the CaptureID is 1 or the time is between some passed in start & passed in end values)

    That would get you the records for folks that meet the criteria. You can add a field to your select that is a simple CASE that returns a 1 when CaptureID is 1 and a 0 otherwise. That gives an easy reference (bit column) for whether a returned record was a first or not, assuming you don't want to use captureID alone for that.

  • Hi guys,

    Thanks very much for the replies. Very much appreciated. It's very good to know I'm no barking up the wrong tree. Now I've just got to figure out what's wrong with my syntax.

    The Start and Finish are the actual times simplified (although it looks like that actualy caused more confusion). I've simplified the initial state of the data as well. In actual fact, I have to do several joins just to get to the point of having the data in the format:

    SessionId CaptureId ThingId StartTime FinishTime

    1 1 A 0 1

    1 2 B 1 2

    1 3 C 2 3

    1 4 A 3 4

    1 5 C 4 5

    2 1 B 0 1

    I've been playing with it for a while but cannot seem to figure out the syntax for the first bit i.e. working out if the first record for a session matches my target thing. I can manage to get to where I have the id of the first selected thing for the session but can't seem to compare it to the target id and get a boolean without a syntax error.

    I've got something like:

    SELECT SessionId, (SELECT TOP 1 Things.Id FROM {AllMyJoinSQLAndJoinOnSessionId} AS FirstSeenThing) FROM {AllMyJoinSQL}

    but what I need is something like

    SELECT SessionId, (SELECT TOP 1 Thing.Id FROM {AllMyJoinSQLAndJoinOnSessionId} == @TargetThingId AS ThingWasSeenFirst) FROM {AllMyJoinSQL}

    but I get "Syntax error near =" when I try this. I've searched and searched but can't find an example of a column being calculated like this anywhere. I also tried to get it to work with a CASE statement with no luck. I must be doing the syntax barmy as Seth said this bit should be easy :(.

    I think I will set up an experiment where I can try the SQL without having to worry about the joins i.e. set up some artificial tables where I can just worry about the calculated columns. Meanwhile, if anyone could point me in the direction of an example of something similar it would also be greatly appreciated. If I have no luck with my simplfied experiment, I'll give in, post the SQL and hope maybe someone could point out what I'm doing wrong.

    Thanks again for the responses. Much appreciated.

    jeanangeo

    --------------------------------
    I found this article helpful in passing on the info the more experienced posters need to help me with my problem:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • What you have right now is an attempt at a correlated subquery. You don't want that. What you want that to be is a derived table.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I am getting somewhere with this slowly but it sure is painful.

    I set up my simple experiment with the tables:

    SessionId Name Start

    -----------------------------------------------

    1 SESSION1 30/10/2008 09:00:00

    2 SESSION2 30/10/2008 11:00:00

    3 SESSION3 30/10/2008 15:00:00

    ThingId Name

    ----------------

    1 THING1

    2 THING2

    3 THING3

    4 THING4

    5 THING5

    CaptureIdSessionIdThingId Start Finish

    ---------------------------------------------------------------------------

    1 1 3 30/10/2008 09:00:05 30/10/2008 09:00:15

    2 1 1 30/10/2008 09:00:15 30/10/2008 09:00:30

    3 1 3 30/10/2008 09:00:30 30/10/2008 09:00:31

    4 1 4 30/10/2008 09:00:31 30/10/2008 09:00:38

    5 1 5 30/10/2008 09:00:38 30/10/2008 09:00:42

    6 1 2 30/10/2008 09:00:42 30/10/2008 09:00:49

    7 1 1 30/10/2008 09:00:49 30/10/2008 09:00:54

    8 1 5 30/10/2008 09:00:54 30/10/2008 09:01:01

    9 1 3 30/10/2008 09:01:01 30/10/2008 09:01:05

    10 1 4 30/10/2008 09:01:05 30/10/2008 09:01:15

    11 2 4 30/10/2008 11:00:05 30/10/2008 11:00:15

    12 2 1 30/10/2008 11:00:15 30/10/2008 11:00:30

    13 2 2 30/10/2008 11:00:30 30/10/2008 11:00:31

    14 2 5 30/10/2008 11:00:31 30/10/2008 11:00:38

    15 2 3 30/10/2008 11:00:38 30/10/2008 11:00:42

    16 2 4 30/10/2008 11:00:42 30/10/2008 11:00:49

    17 2 5 30/10/2008 11:00:49 30/10/2008 11:00:54

    18 2 2 30/10/2008 11:00:54 30/10/2008 11:01:01

    19 2 1 30/10/2008 11:01:01 30/10/2008 11:01:05

    20 2 3 30/10/2008 11:01:05 30/10/2008 11:01:15

    21 3 5 30/10/2008 15:00:05 30/10/2008 15:00:15

    22 3 2 30/10/2008 15:00:15 30/10/2008 15:00:30

    23 3 3 30/10/2008 15:00:30 30/10/2008 15:00:31

    24 3 1 30/10/2008 15:00:31 30/10/2008 15:00:38

    25 3 4 30/10/2008 15:00:38 30/10/2008 15:00:42

    26 3 5 30/10/2008 15:00:42 30/10/2008 15:00:49

    27 3 1 30/10/2008 15:00:49 30/10/2008 15:00:54

    28 3 3 30/10/2008 15:00:54 30/10/2008 15:01:01

    29 3 5 30/10/2008 15:01:01 30/10/2008 15:01:05

    30 3 2 30/10/2008 15:01:05 30/10/2008 15:01:15

    and can now get a column that displays 1 if the target thing was seen first or 0 if not:

    DECLARE @TargetThingId AS int;

    SET @TargetThingId = 4;

    SELECT

    s1.SessionId,

    s1.Name,

    CASE

    (

    SELECT TOP 1 t.ThingId

    FROM (Sessions s2 INNER JOIN Captures c ON s2.SessionId = c.SessionId)

    INNER JOIN Things t ON c.ThingId = t.ThingId

    WHERE s2.SessionId = s1.SessionId

    )

    WHEN @TargetThingId THEN 1

    ELSE 0

    END

    AS TargetThingWasSeenFirst

    FROM Sessions s1;

    The results so far are:

    SessionIdSessionName TargetThingWasFirstSeen

    -----------------------------------------------------

    1 SESSION1 0

    2 SESSION2 1

    3 SESSION3 0

    For the next coulmn, I need to compare the start of the first capture for the target thing to the session start and return 1 if it was less that 4 seconds or 0 if not.

    However, I cannot for the life of me figure out how to do a CASE statement (or IF for that matter) that takes a SELECT statement as the input expression and compares it to a parameter in the boolean expression. E.g. the following gives a syntax error:

    SELECT {as above.....},

    CASE

    s1.Start - (

    SELECT TOP 1 c.Start

    FROM (Sessions s2 INNER JOIN Captures c ON s2.SessionId = c.SessionId)

    INNER JOIN Things t ON c.ThingId = t.ThingId

    WHERE s2.SessionId = s1.SessionId

    AND t.ThingId = @TargetThingId

    )

    WHEN > 4 THEN 1

    ELSE 0

    END

    AS TargetThingWasSeenIn4Secs

    I can see it's fundamentally wrong on a couple of levels (i.e. I know I'd need to do more than check if the result is greater than 4) but can't find an example of how to get the statement format correct in the first place. (I can worry about the date manipulation stuff after.)

    At least I have an actual question now though:

    Within a SELECT statement, how do you compare the results of a sub-SELECT statement to a parameter and return the comparison result as one of the fields returned by the original SELECT statement?

    Any help is ever appreciated

    jeanangeo

    --------------------------------
    I found this article helpful in passing on the info the more experienced posters need to help me with my problem:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Ah - just seen Seth's latest reply. Am now thinking about that. Thanks again Seth 🙂

    --------------------------------
    I found this article helpful in passing on the info the more experienced posters need to help me with my problem:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • I did what Seth suggested and investigated using derived tables (which I think are also called in-line views?) I've never used these before so this was interesting.

    So anyway, I have SQL that now gives me what I want (for my simplified experiment) i.e. I can get from a list of captures to the percentage of sessions in which the target thing was seen first, seen in time and seen at all.

    However, since I am new to derived tables, I'm not sure if I've created an efficient query or a mess. If anyone has then time and inclination, I'd apreciate comments.

    I used several derived tables to get the data where I want it. Each view brings it a little closer to what I need. I also tried to ensure I am not duplicating calculations. However, for all I know, it's better to have less views and ignore duplicate calculations. There's possibly other stuff I don't fully understand.

    Anyway, here it is:

    DECLARE @TargetThingId AS int;

    DECLARE @TargetDelayInSeconds AS int;

    SET @TargetThingId = 5;

    SET @TargetDelayInSeconds = 5;

    SELECT

    (TotalSeenFirst / TotalSessions) * 100 AS PercentageSeenFirst,

    (TotalSeenInTime / TotalSessions) * 100 AS PercentageSeenInTime,

    (TotalSeenAtAll / TotalSessions) * 100 AS PercentageSeenAtAll

    FROM

    (

    SELECT

    CONVERT(NUMERIC(5,2), COUNT(SessionId)) AS TotalSessions,

    CONVERT(NUMERIC(5,2), SUM(SeenFirst)) AS TotalSeenFirst,

    CONVERT(NUMERIC(5,2), SUM(SeenInTime)) AS TotalSeenInTime,

    CONVERT(NUMERIC(5,2), SUM(SeenAtAll)) AS TotalSeenAtAll

    FROM

    (

    SELECT

    SessionId,

    Name,

    SeenFirst,

    CASE

    WHEN DelayInSeconds <= @TargetDelayInSeconds THEN 1

    ELSE 0

    END AS SeenInTime,

    CASE

    WHEN DelayInSeconds IS NOT NULL THEN 1

    ELSE 0

    END AS SeenAtAll

    FROM

    (

    SELECT

    v.SessionId,

    v.Name,

    v.SeenFirst,

    DATEDIFF(ss, StartTime, TimeFirstSeen) AS DelayInSeconds

    FROM

    (

    SELECT

    s1.SessionId,

    s1.Name,

    s1.StartTime,

    CASE

    (

    SELECT

    TOP 1 t.ThingId

    FROM

    (Sessions s2 INNER JOIN Captures c ON s2.SessionId = c.SessionId)

    INNER JOIN Things t ON c.ThingId = t.ThingId

    WHERE s2.SessionId = s1.SessionId

    ORDER BY c.StartTime

    )

    WHEN @TargetThingId THEN 1

    ELSE 0

    END AS SeenFirst,

    (

    SELECT

    TOP 1 c.StartTime

    FROM

    (Sessions s2 INNER JOIN Captures c ON s2.SessionId = c.SessionId)

    INNER JOIN Things t ON c.ThingId = t.ThingId

    WHERE s2.SessionId = s1.SessionId

    AND c.ThingId = @TargetThingId

    ORDER BY c.StartTime

    ) AS TimeFirstSeen

    FROM

    Sessions s1

    ) v

    ) v2

    ) v3

    ) v4

    --------------------------------
    I found this article helpful in passing on the info the more experienced posters need to help me with my problem:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • A good attempt, especially if this gives you the data you were after. These aren't quite derived tables, as they make reference outside themselves and are recursive. Here's the difference. Let's say we wanted to get the MIN of StartTime for each SessionID (for simplicity), and you wanted to do this with a derived table. You could do that as so:

    SELECT S.SessionID, MinTime

    FROM SessionTable S

    INNER JOIN (SELECT SessionID, MIN(StartTime) MinTime FROM SessionTable GROUP BY SessionID) S2 ON S.SessionID = S2.SessionID

    Note that the inner join table inside of the parens (derived table) makes no reference outside itself. That statement would run completely on its own if you highlighted it and hit execute. This allows it to be executed one time only.

    Conversely, (assuming I'm reading it correctly, I haven't had coffee yet and that's a lot of nesting :P), what yours does is this:

    SELECT SessionID, (SELECT MIN(StartTime) FROM Session S2 WHERE S2.SessionID = S.SessionID) MinTime

    FROM SessionTable S

    It looks very similar... however if you highlighted that inner query and hit execute, you'd get an error. You'd get an error because it references your outer table. This means that for every row that you're returning in S, you have to rerun that SELECT MIN query. This makes it recursive. How well it runs will depend on a lot of factors.

    If you could post the DDL and some sample data for your actual tables, we could give you a lot better guidance as to how to get the data out of them(even if you don't want the TSQL posted for you, seeing the layout lets us figure out how we would get it out, and we can set you on the right path. Whether or not you'll be able to use one method as opposed to another, and the difficulty of doing so is highly dependent upon how your data is stored. If you decide to do this, please see the link in my signature for an example of how.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Thanks again Seth for your time.

    I see why what I've done is not a derived tables but have completely failed to figure out how to do what I want without referencing the outer table. I'm admitting defeat.

    I've attached the SQL to create and populate the test data as suggested in the article in your signature (which btw is a really good idea and I should have done that in the first place).

    The DDL creates a slightly more complicated scenario in that I've added a ValidSessions table which is designed to define the Sessions that can be used. This table also defines a number of Quota fields (Quota1, Quota2 and Quota3). A session will be valid for any of the Quotas, all of them or none of them. The percentages I am calculating will eventually be grouped by Quota and SessionType (a field I have added to the Sessions table). What I need at the end to to be able to list the percentages per Quota, per SessionType so I can get an SSRS report that looks like:

    Quota 1 Quota 2

    SessionType1 SessionType2 SessionType1 SessionType2

    SeenFirst 10% 20% 15% 30%

    SeenInTime 40% 60% 50% 60%

    SeenAtAll 50% 20% 35% 10%

    I'd also like to know that I've done it in an efficient sensible manner rather than whatever works (which is where my experiments will take me without outside verification). Again, I don't expect anyone to do it for me but any help/hints will be very much appreciated.

    On an aside, I've been reading SQL books until my eyes pop out these last few days but none of them seem to get into anything nitty gritty and complicated. As with most books, they illustrate a concept with as simple an example as possible. Does anyone know of any books/articles/whatever where they explain the inner workings and how to do more complicated queries?

    Kind regards

    jeanangeo

    --------------------------------
    I found this article helpful in passing on the info the more experienced posters need to help me with my problem:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • jeanangeo,

    Where does "SessionTypeX" come from in your desired results?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Jeff,

    SessionType is on the Sessions table e.g. 'Test 1', 'Test 2' or 'Control'. I've tried to keep the test data simple-ish so have just included two SessionTypes ('Test' and 'Control').

    The scenario is that a Session will be set up with Things arranged in different places to determine if arranging them in a certain way results in a higher chance of the target Thing being noticed straight off. A certain participent will do multiple Sessions, the first being the control Session followed by one or more test Sessions.

    Thanks for looking at my post.

    Kind regards

    jeanangeo

    --------------------------------
    I found this article helpful in passing on the info the more experienced posters need to help me with my problem:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

  • Hello,

    I've been looking at your sample data and it seems there is a slight mistake - Things table contains 3 rows, but 5 things appear in Captures table. Also, it isn't the best sample, because in all sessions all things were spotted (making it impossible to test "SeenAtAll" properly).

    I have inserted things 4 and 5 into Things table, and deleted row 26 from Captures table to make the data better to work with.

    I've prepared a select from the modified data that should diplay all that is necessary in one "table" (including rows for things that were not seen during certain session - these contain NULL in TimeSeen). From here it shouldn't be hard to produce the required output, although there might be something I've overlooked so far. If you think I'm missing something, let me know!

    SELECT Q.sessionId, Q.thingId, MIN(DATEDIFF(s, Q.StartTime, c.StartTime)) as time_seen,

    Q.Quota1, Q.Quota2, Q.Quota3, Q.sessionType

    FROM

    (select t.thingId, s.sessionId, v.Quota1, v.Quota2, v.Quota3, s.sessionType, s.StartTime

    from #sessions s

    cross join #things t

    join #ValidSessions v ON v.SessionId=s.SessionId) as Q

    LEFT JOIN ( select sessionId, thingId, MIN(StartTime) as StartTime

    from #captures

    group by sessionId, thingId) as c ON c.sessionId=Q.sessionId AND c.thingId=Q.thingId

    GROUP BY Q.sessionId, Q.thingId, Q.Quota1, Q.Quota2, Q.Quota3, Q.sessionType

    Unfortunately, I have to work now, but I'll try to get back to this problem some time later. In the meantime, good luck!

  • Thanks vladan for your time and trouble.

    I have corrected the issues you've raised with the ddl :blush: and have attached it again.

    Your SQL is so much simpler than my first attempt that I am very glad I came to this site. I'm fairly sure I could have come up with something that worked eventually but I seem to have completely over-complicated it.

    Thanks again for your time.

  • Actually, my first attempt could have been simpler - no grouping in the outer query is necessary, since the derived table c already handles that.

    I've removed it, and added column based on ROW_NUMBER() function that will help you determine which thing was seen first... it is the row that has "1" in order_seen column.

    SELECT Q.sessionId, Q.thingId, DATEDIFF(s, Q.StartTime, c.StartTime) as time_seen, c.order_seen,

    Q.Quota1, Q.Quota2, Q.Quota3, Q.sessionType

    FROM

    (select t.thingId, s.sessionId, v.Quota1, v.Quota2, v.Quota3, s.sessionType, s.StartTime

    from #sessions s

    cross join #things t

    join #ValidSessions v ON v.SessionId=s.SessionId) as Q

    LEFT JOIN (select sessionId, thingId, MIN(StartTime) as StartTime,

    row_number() over (partition by sessionId order by sessionId, MIN(StartTime)) as order_seen

    from #captures

    group by sessionId, thingId) as c ON c.sessionId=Q.sessionId AND c.thingId=Q.thingId

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply