more complex join question

LadyReader

Ten Centuries

Points: 1192
More actions
February 27, 2006 at 11:37 am

#189366

On 2/10 I posted a question, which a member (PW) responded to and gave me a great solution, which also follows. However, I found that the code does not work when Table1.Field1 is not unique. Here is the problem as it was originally posted:

"I want to select several fields from 2 tables where each row in Table1 may map to 0 to n rows on Table2, but I only want to return a single row from Table2 where a date field is >= a date on Table1.

For example:

Select   R_ID

, P.Field1

, Date1

, S.Field2

, S.Field3

from Table1 as P

Left outer join Table2 as S

on P.Field1 = S.Field1 and S.StartDate >= P.Date1

Order by Field1, S.StartDate

This returns all the rows on Table2 but I want only that row with the earliest S.StartDate

I tried to use a subquery with the result set sorted by StartDate asc and specify "Top1" but I couldn't get the syntax to work.

Any suggestions?"

---------

PW's proposed solution works most of the time. It was:

"For problems like this, you generally need to join to a derived table (note, not a sub-query), where the derived table introduces a column that resolves to just 1 row. In your case, you want to introduce MIN(StartDate) as a date to join on:

In this example, I used a derived table named "dtEarliest"

Select   R_ID

, P.Field1

, Date1

, S.Field2

, S.Field3

from Table1 as P

Left outer join

-- Join a derived table, to get the Earliest date for each Field1 value

(

Select S.Field1, Min(S.StartDate) As EarliestStartDate

From Table1 As P

Inner Join Table2 as S

   on P.Field1 = S.Field1 and S.StartDate >= P.Date1

Group By S.Field1

) dtEarliest

On (dtEarliest.Field1 = P.Field1)

-- Joint to Table2, this time joining on both Field1 and the Date

-- so that there is only 1 row per Field1

Left Outer Join Table2 As S

On (dtEarliest.Field1 = S.Field1 And

      dtEarliest.EarliestStartDate = s.StartDate)

Order by P.Field1, S.StartDate

"

----------------------------------------

It turns out that Table1.Field1 is not unique. I can have any number of rows with the same value of Field1; the rows would be distinct because they would differ on Date1. And because they have different values for Date1, they also map to different values for Table2.Field2. If, for example, my Table1 has 3 rows where Field1 = '555' then all 3 rows are returned in the solution set but they are returned with identical values for Table2.Field2. And that value for Field2 is the value that should only map to the row with the earliest Table1.Date1.

The result set might look like:

Field1 Date1          Field2 StartDate

555     01/01/2005 aaaa 1/3/2005

555     05/01/2005 aaaa 1/3/2005

555     02/24/2006 aaaa 1/3/2005

StartDate should always be >= Date1 so the 2nd and 3rd rows are obviously wrong. What isn't obvious is that the values for Field2 are also wrong for the 2nd and 3rd rows.

I am stumped. Please help?

Thank you.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

kevin mann Hall of Fame Points: 3894 More actions · Answer 1

Select P.R_ID

, P.Field1

, P.Date1

, Min(S1.StartDate) As EarliestStartDate

, S2.Field2

, S2.Field3

from Table1 as P

Left join Table2 as S1

On P.Field1 = S1.Field1

And S1.StartDate >= P.Date1

Left Join Table2 S2

On P.Field1 = S2.Field1

Group By P.R_ID

, P.Field1

, P.Date1

, S2.Field2

, S2.Field3

HAVING S2.StartDate = MIN(S1.StartDate)

LadyReader Ten Centuries Points: 1192 More actions · Answer 2

LadyReader

Ten Centuries

Points: 1192

February 27, 2006 at 3:08 pm

#623554

Thank you for helping, Kevin.

I am getting an error which states:

"S2.StartDate is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause."

I am reviewing the code to see if I left something out. Do you recommend adding S2.StartDate to the Group BY clause?

LadyReader Ten Centuries Points: 1192 More actions · Answer 3

Kevin - thanks again for suggesting a solution!...

I added S.StartDate to the Group BY clause. While the result set returned looks accurate, it took 1.5 minutes to return, which is significantly longer than the solution with the derived table.

Table2 may contain several rows with the same Field1 and the same EarliestStartDate. I want to return only 1 row. I tried using "TOP 1" in the Select statement of the derived table originally suggested to me, but that did not work - all the StartDates and EndDates returned were NULL. Any suggestions?

I am learning a lot.

kevin mann Hall of Fame Points: 3894 More actions · Answer 4

did you set a primary key or index for your tables? how many records in each?

you may think about using the EXISTS statement.

p

inner join s1

on ...

where exists

(select min() from s2 group by...)

check out msdn.com on the usage of EXISTS

also, are you looking for only one record from both table1 and table2? If so, what's the criteria? MAX(R_ID) ?

LadyReader Ten Centuries Points: 1192 More actions · Answer 5

Table1 has 18 rows. Table2 (actually a view) has about 15k rows. I had a field in table1 set as identity but had not set it as PK. I set it as PK just now and reran the query you suggested and the result set came back in 17 seconds. 🙂

The user doesn't care which of the records from table2 with the same earliest date is returned with a row from table1. As the current date approaches that earliest date they said those multiple rows will resolve into a single row, so it is irrelevant which one I select; the first one will be fine.

To sum up:

I need all records from Table1 returned.

If a record on table2 has the same value for Field1 that exists on a row in table1 AND has a StartDate >= table1's Date1, then I need to return fields from that row along with the fields from table1.

If more than 1 row exists on table2 with those 2 criteria I want to select the one with the earliest StartDate.

If more than 1 row have that same earliest StartDate than I select 1 row and it doesn't matter which one.

I will look into "Exists".

Thank you again, and again for the help and insights you provided.

m

LadyReader Ten Centuries Points: 1192 More actions · Answer 6

LadyReader

Ten Centuries

Points: 1192

February 28, 2006 at 8:38 am

#623711

Kevin - the solution you proposed with "Having" does not return rows from table1 that map to zero rows on table2. Is this the result of using the Having clause: HAVING S2.StartDate = MIN(S1.StartDate)?

kevin mann Hall of Fame Points: 3894 More actions · Answer 7

--I haven't forgot about you...

--I'm pretty sure this is your expected result set

SELECT

P2.R_ID,

P2.Field1,

P2.Date1,

S2.Field2,

S2.Field3

FROM

(

SELECT

R_ID,

--P.Field1,

MIN(S1.startDate) minStartDate

FROM Table1 as P

LEFT JOIN Table2 as S1

ON P.Field1 = S1.Field1

AND S1.StartDate >= P.Date1

GROUP BY

R_ID

--,P.Field1

) P1

INNER JOIN Table1 P2

ON p1.R_ID = P2.R_ID

--Remove the join above if P.Field1 is part of the GROUP in P1

LEFT JOIN Table2 S2

ON P1.Field1 = S2.Field1

AND S2.startDate = minStartDate