How to query a one-to-many and only return 1 row on many side

  • I went through the available posts I could find on this topic and couldn't find a situation quite like mine. Essentially, I need to return one record from the left side of the join with one record on the right side. My problem is getting the correct record. I have a feeling I'm missing something obvious, but I'm not able to get it.

    The problem lies here. I need to find the number of days between an inmates booking date and their release date (inclusive). All the records are kept in a log table. There are two possible event types for a booking event (BOOK and REBK) and one for a release (RELS). Some inmates (name_id 123, below) are nice and clean and have only a single booking event and a single release event. Some are not (name_id 456). For name_id 456 the first book_id (22) is fine because there is only one booking event and one release event. The second book_id (33) is not so much because of the release and rebook events.

    Thanks for any help.

    Code is below.

    --remove temp table if necessary

    IF OBJECT_ID('tempdb..#jlog') IS NOT NULL

    DROP TABLE #jlog

    --make temp table

    CREATE TABLE #jlog (

    jlog_id int NOT NULL,

    name_id int NOT NULL,

    book_id int NOT NULL,

    eventtype char(4) NOT NULL,

    eventdate datetime NULL

    )

    --poplulate temp table

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (1,123,11,'BOOK','2009-09-10 16:21:00.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (15,123,11,'INMT','2009-09-16 16:37:59.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (65,123,11,'RELS','2009-09-17 01:23:34.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (445,456,22,'OTH ','2009-09-19 00:01:00.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (852,456,22,'BOOK','2009-09-19 04:10:00.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (22,456,22,'INMT','2009-09-19 04:20:40.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (28647,456,22,'BOND','2009-09-19 10:37:32.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (224756,456,22,'BOND','2009-09-19 10:37:33.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (4553,456,22,'BOND','2009-09-19 10:37:35.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (536,456,22,'RELS','2009-09-19 10:45:31.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (8797,456,33,'BOOK','2010-01-08 10:00:00.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (455,456,33,'INMT','2010-01-08 10:11:02.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (756,456,33,'BOND','2010-01-10 09:06:44.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (3115,456,33,'RELS','2010-01-10 09:07:03.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (8648,456,33,'REBK','2010-01-16 09:01:40.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (4389,456,33,'INMT','2010-01-16 09:09:22.000')

    INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (1315,456,33,'RELS','2010-01-17 13:59:33.000')

    --select data from temp table

    SELECT jbook.name_id,jbook.book_id,jbook.eventtype,jbook.eventdate

    ,jrels.eventtype,jrels.eventdate

    ,DATEDIFF(day, jbook.eventdate, jrels.eventdate) servedDays

    FROM #jlog jbook

    INNER JOIN #jlog jrels

    ON jbook.name_id = jrels.name_id

    AND jbook.book_id = jrels.book_id

    AND jbook.eventdate < jrels.eventdate

    WHERE (jbook.eventtype = 'BOOK' OR jbook.eventtype = 'REBK')

    AND jrels.eventtype = 'RELS'

    ORDER BY jbook.name_id, jbook.book_id

  • Here is my take on it. basically I would use Row_Number() function to create an event_ID for the BOOK/RELS events and then create the same for the RELS event. This would allow you to pair them up based on the event ID. This pairing allows you to pull the correct dates together. then you can do what you want with them.

    DECLARE @InEvent Table

    (name_id int NOT NULL,

    book_id int NOT NULL,

    eventdate datetime NULL,

    eventtype char(4) NOT NULL,

    eventID int NULL)

    insert into @InEvent

    Select

    name_id

    ,book_id

    ,eventdate

    ,eventtype

    ,ROW_NUMBER() OVER(Partition BY name_id,book_id ORDER BY eventdate) as EventCnt

    from #jlog

    where eventtype in('BOOK','REBK')

    order by eventdate

    DECLARE @OutEvent Table

    (name_id int NOT NULL,

    book_id int NOT NULL,

    eventdate datetime NULL,

    eventtype char(4) NOT NULL,

    eventID int NULL)

    insert into @OutEvent

    Select

    name_id

    ,book_id

    ,eventdate

    ,eventtype

    ,ROW_NUMBER() OVER(Partition BY name_id,book_id ORDER BY eventdate) as EventCnt

    from #jlog

    where eventtype in('RELS')

    order by eventdate

    Select a.name_id,a.book_id,a.eventdate As BookDate,b.eventdate As RelsDate

    from @InEvent a

    join @outEvent b on a.name_id=b.name_id and a.book_id=b.book_id and a.eventID=b.eventID

    Dan

    If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.

  • Looks like Dan beat me - he did this pretty much the same way I was working on.

    Here's my solution. The basic difference from Dan's solution is that I'm using CTEs instead of table variables... if the underlying tables have indexes, this could be faster.

    ;WITH jbook AS

    (

    select *, RN = ROW_NUMBER() OVER (PARTITION BY name_id, book_id ORDER BY eventdate)

    FROM #jlog

    WHERE eventtype IN ('BOOK', 'REBK')

    )

    , jrels AS

    (

    select *, RN = ROW_NUMBER() OVER (PARTITION BY name_id, book_id ORDER BY eventdate)

    FROM #jlog

    WHERE eventtype = 'RELS'

    )

    SELECT jbook.name_id,jbook.book_id,jbook.eventtype,jbook.eventdate

    ,jrels.eventtype,jrels.eventdate

    ,DATEDIFF(day, jbook.eventdate, jrels.eventdate) servedDays

    FROM jrels

    JOIN jbook

    ON jbook.name_id = jrels.name_id

    AND jbook.book_id = jrels.book_id

    AND jbook.RN = jrels.RN

    ORDER BY jbook.name_id, jbook.book_id

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • and here's the same approach, just without using intermediate tables (I'm using subqueries, or CTE's):

    Side note (valid for both solutions provided so far): Those solution will only work if you don't have "dirty data" (e.g. two consecutive 'BOOK' or 'REBK' or 'RELS' rows per name_id, book_id or any othe invalid combination of current eventtype and previous eventtype.

    As an alternative you might want to have a look at the "Quirky update" article[/url] by Jeff Moden. The concept described in this article would allow you to handle (almost?) any special scenario. But here's the downside of it: ... as long as you follow ALL the rules provided in that article (for example, having the correct clustered index, in this case I'd say it needs to be name_id, book_id, eventdate).

    This "quirky update" solution is most probably the best you can get in terms of performance...

    WITH cte AS

    (

    SELECT *,

    ROW_NUMBER() OVER(PARTITION BY name_id, book_id ORDER BY eventdate ) AS ROW

    FROM #jlog

    WHERE eventtype IN('BOOK','REBK','RELS')

    )

    SELECT

    cte1.name_id,

    cte1.book_id,

    cte1.eventdate AS book,

    cte2.eventdate AS rels,

    DATEDIFF(dd,cte1.eventdate,cte2.eventdate) AS days

    FROM cte cte1

    LEFT OUTER JOIN cte cte2

    ON cte1.name_id=cte2.name_id

    AND cte1.book_id=cte2.book_id

    AND cte1.row=cte2.row-1

    WHERE cte2.eventtype ='RELS'



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Thanks all,

    All three solutions work great and returned the same results (once I reconciled the selection). The only difference in run time seems to be that the first solution with temp tables runs slightly slower when applied to the full data table (3 seconds compared to sub 1 second). Nice to see it though because it provided for me a good comparison to the CTE solutions.

    It's apparent I need to bone up on CTE's.

    Now to make sure the data is clean enough to use these ... this is a third party application and I have no control over the data or the users...

    Again, thanks to all three,

    Steve.

  • Here is a different approach. Not sure if this will work better or worse than other methods, you'd just have to test it on your own data.

    If you don't already have an index on these columns, in order:

    book_id int NOT NULL,

    eventtype char(4) NOT NULL,

    eventdate datetime NULL

    add one, then you should be able to do this:

    SELECT *, DATEDIFF(DAY, eventdate, releasedate) AS ServedDays

    FROM (

    SELECT jbook.name_id,jbook.book_id,jbook.eventtype,jbook.eventdate

    ,'RELS' AS [eventtype2]

    ,(SELECT TOP 1 jrels.eventdate

    FROM #jlog jrels

    WHERE jrels.book_id = jbook.book_id

    AND jrels.eventtype = 'RELS'

    AND jrels.eventdate > jbook.eventdate

    ORDER BY jrels.eventdate) AS ReleaseDate

    FROM #jlog jbook

    WHERE (jbook.eventtype = 'BOOK' OR jbook.eventtype = 'REBK')

    ) AS derived

    ORDER BY name_id, book_id

    Scott Pletcher, SQL Server MVP 2008-2010

  • Yup. That works too. Time is about 1 second against the production table.

    Of the four solutions, it is the only one that returns bookings with no releases. Not a requirement, but might be handy information to have available.

    Luckily, two of the three indexes were already in place. I have not added the third because this is a third party database, and I try to be as light handed as possible when picking data from someone else's db.

    Thanks!

    Steve

  • Dan, I'm studying this solution. Please tell me where the tables are being created with syntax 'DECLARE @InEvent Table' and 'INSERT into @InEvent'. This is neither a concrete table nor a temp table....where are tables created with this syntax stored? What is this method of DDL called so that I can google it and learn. Thank you.

    -

  • They are "Table Variables".



    Clear Sky SQL
    My Blog[/url]

  • Bingo. Thanks for that. That led me to some useful links eg http://www.sql-server-performance.com/2007/temp-tables-vs-variables/

    -

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply