more complex join question

  • On 2/10 I posted a question, which a member (PW) responded to and gave me a great solution, which also follows. However, I found that the code does not work when Table1.Field1 is not unique. Here is the problem as it was originally posted:

    "I want to select several fields from 2 tables where each row in Table1 may map to 0 to n rows on Table2, but I only want to return a single row from Table2 where a date field is >= a date on Table1.

    For example:

    Select   R_ID

     , P.Field1

     , Date1

     , S.Field2

     , S.Field3

     from Table1 as P

    Left outer join Table2 as S

     on P.Field1 = S.Field1 and S.StartDate >= P.Date1

     Order by Field1, S.StartDate

    This returns all the rows on Table2 but I want only that row with the earliest S.StartDate

    I tried to use a subquery with the result set sorted by StartDate asc and specify "Top1" but I couldn't get the syntax to work.

    Any suggestions?"

    ---------

    PW's proposed solution works most of the time. It was:

    "For problems like this, you generally need to join to a derived table (note, not a sub-query), where the derived table introduces a column that resolves to just 1 row. In your case, you want to introduce MIN(StartDate) as a date to join on:

    In this example, I used a derived table named "dtEarliest"

    Select   R_ID

     , P.Field1

     , Date1

     , S.Field2

     , S.Field3

     from Table1 as P

    Left outer join

    -- Join a derived table, to get the Earliest date for each Field1 value

    (

     Select S.Field1, Min(S.StartDate) As EarliestStartDate

     From Table1 As P

     Inner Join Table2 as S

       on P.Field1 = S.Field1 and S.StartDate >= P.Date1

     Group By S.Field1

    ) dtEarliest

      On (dtEarliest.Field1 = P.Field1)

    -- Joint to Table2, this time joining on both Field1 and the Date

    -- so that there is only 1 row per Field1

    Left Outer Join Table2 As S

      On (dtEarliest.Field1 = S.Field1 And

          dtEarliest.EarliestStartDate = s.StartDate)

    Order by P.Field1, S.StartDate

    "

    ----------------------------------------

    It turns out that Table1.Field1 is not unique. I can have any number of rows with the same value of Field1; the rows would be distinct because they would differ on Date1. And because they have different values for Date1, they also map to different values for Table2.Field2.  If, for example, my Table1 has 3 rows where Field1 = '555' then all 3 rows are returned in the solution set but they are returned with identical values for Table2.Field2. And that value for Field2 is the value that should only map to the row with the earliest Table1.Date1.

    The result set might look like:

    Field1 Date1           Field2  StartDate

    555     01/01/2005 aaaa  1/3/2005

    555     05/01/2005 aaaa  1/3/2005

    555     02/24/2006 aaaa  1/3/2005

    StartDate should always be >= Date1 so the 2nd and 3rd rows are obviously wrong. What isn't obvious is that the values for Field2 are also wrong for the 2nd and 3rd rows.

    I am stumped. Please help?

    Thank you.

     

  • Select P.R_ID

    , P.Field1

    , P.Date1

    , Min(S1.StartDate) As EarliestStartDate

    , S2.Field2

    , S2.Field3

    from Table1 as P

    Left join Table2 as S1

    On P.Field1 = S1.Field1

    And S1.StartDate >= P.Date1

    Left Join Table2 S2

    On P.Field1 = S2.Field1

    Group By P.R_ID

    , P.Field1

    , P.Date1

    , S2.Field2

    , S2.Field3

    HAVING S2.StartDate = MIN(S1.StartDate)

  • Thank you for helping, Kevin.

    I am getting an error which states:

    "S2.StartDate is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause."

    I am reviewing the code to see if I left something out. Do you recommend adding S2.StartDate to the Group BY clause?

  • Kevin - thanks again for suggesting a solution!...

    I added S.StartDate to the Group BY clause. While the result set returned looks accurate, it took 1.5 minutes to return, which is significantly longer than the solution with the derived table.

    Table2 may contain several rows with the same Field1 and the same EarliestStartDate. I want to return only 1 row. I tried using "TOP 1" in the Select statement of the derived table originally suggested to me, but that did not work - all the StartDates and EndDates returned were NULL. Any suggestions?

    I am learning a lot. 

  • did you set a primary key or index for your tables? how many records in each?

    you may think about using the EXISTS statement.

    p

    inner join s1

    on ...

    where exists

    (select min() from s2 group by...)

    check out msdn.com on the usage of EXISTS

    also, are you looking for only one record from both table1 and table2? If so, what's the criteria? MAX(R_ID) ?

  • Table1 has 18 rows. Table2 (actually a view) has about 15k rows. I had a field in table1 set as identity but had not set it as PK. I set it as PK just now and reran the query you suggested and the result set came back in 17 seconds. 🙂

    The user doesn't care which of the records from table2 with the same earliest date is returned with a row from table1. As the current date approaches that earliest date they said those multiple rows will resolve into a single row, so it is irrelevant which one I select; the first one will be fine. 

    To sum up:

    I need all records from Table1 returned.

    If a record on table2 has the same value for Field1 that exists on a row in table1 AND has a StartDate >= table1's Date1, then I need to return fields from that row along with the fields from table1.

    If more than 1 row exists on table2 with those 2 criteria I want to select the one with the earliest StartDate.

    If more than 1 row have that same earliest StartDate than I select 1 row and it doesn't matter which one.

    I will look into "Exists".

    Thank you again, and again for the help and insights you provided.

    m

  • Kevin - the solution you proposed with "Having" does not return rows from table1 that map to zero rows on table2. Is this the result of using the Having clause: HAVING S2.StartDate = MIN(S1.StartDate)?

  • --I haven't forgot about you...

    --I'm pretty sure this is your expected result set

    SELECT

    P2.R_ID,

    P2.Field1,

    P2.Date1,

    S2.Field2,

    S2.Field3

    FROM

    (

    SELECT

    R_ID,

    --P.Field1,

    MIN(S1.startDate) minStartDate

    FROM Table1 as P

    LEFT JOIN Table2 as S1

    ON P.Field1 = S1.Field1

    AND S1.StartDate >= P.Date1

    GROUP BY

    R_ID

    --,P.Field1

    ) P1

    INNER JOIN Table1 P2

    ON p1.R_ID = P2.R_ID

    --Remove the join above if P.Field1 is part of the GROUP in P1

    LEFT JOIN Table2 S2

    ON P1.Field1 = S2.Field1

    AND S2.startDate = minStartDate

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply