5 Table Join

  • I am trying to join 5 tables in a sql server 2k db.  Does anyone know of a good set of guidelines for doing this?  Alternately, could someone find the problem in the following query?

    The query that I am using is listed here (please let me know if I am violating any programming guidelines on this):

    SELECT p.ParticipantID, pr.Age, ir.FnlTime, e.EventDate

    FROM Participant p INNER JOIN PartRace pr ON p.ParticipantID = pr.ParticipantID

    JOIN IndResults ir ON pr.ParticipantID = ir.ParticipantID

    JOIN RaceData rd ON ir.RaceID = rd.RaceID

    JOIN Events e ON e.EventID = rd.EventID

    WHERE rd.Dist = '5_km' AND p.Gender = 'm' AND ir.FnlTime <> '00:00' AND e.EventGrp = 1

    ORDER BY ir.ParticipantID

    The problem that I am having is that if a participant shows up multiple times (which they could do since this is designed to get the performances for an event over a series of years) it does not associate the correct data from year to year.  Basically some times show up where they shouldn't.

  • Post the DDL please. For now, I recommend, you qualify your joins. SQL Qyery Optimizer interprets, true, but are your joins MEANT to be what is interpreted? e.g. INNER, LEFT OUTER etc...?


    _/_/_/ paramind _/_/_/

  • There is obviously much I don't know about this process.  What is DDL?  Also, I will qualify my joins as you suggested.

    Thanks~

  • DDL = data description language, which means the creation statements for your tables in this case.

    Is the following, what you mean? :

    On an event, multiple races take place(e). 

    For each race, some racedata (beginning, end etc. is stored)

    Each race has some participants.

    A participant has ONE individual result per race.

    The participant has further properties, eg. name, age ...

    ?

    SELECT

         p.ParticipantID,

         pr.Age,

         ir.FnlTime,

         e.EventDate

    FROM Events e

    INNER JOIN RaceData rd 

         ON e.EventID = rd.EventID 

             INNER JOIN PartRace pr 

             ON rd.RaceID = pr.RaceID

                  INNER JOIN IndResults ir 

                  ON pr.ParticipantID = pr.ParticipantID

                       INNER JOIN Participant p

                       ON pr.ParticipantID = p.ParticipantID

    WHERE

               rd.Dist = '5_km' 

        AND p.Gender = 'm' 

        AND ir.FnlTime <> '00:00' 

        AND e.EventGrp = 1

    ORDER BY 

               ir.ParticipantID


    _/_/_/ paramind _/_/_/

  • Just guessing  but I think the problem is the join between PartRace and IndResults not matching on RaceID

    SELECT p.ParticipantID, pr.Age, ir.FnlTime, e.EventDate

    FROM Participant p

      INNER JOIN PartRace pr

        ON pr.ParticipantID = p.ParticipantID

      INNER JOIN IndResults ir

        ON ir.ParticipantID = pr.ParticipantID

        AND ir.RaceID = pr.RaceID

        AND ir.FnlTime <> '00:00'

      INNER JOIN RaceData rd

        ON rd.RaceID = ir.RaceID

        AND rd.Dist = '5_km'

      INNER JOIN Events e

        ON e.EventID = rd.EventID

        AND e.EventGrp = 1

    WHERE p.Gender = 'm'

    ORDER BY p.ParticipantID

    Far away is close at hand in the images of elsewhere.
    Anon.

  • DDL is not data description language, DDL is data definition language and the last query is better because per ANSI SQL when the query processor see the FROM and ON clause in a JOIN operation the WHERE clause is ignored so the three extra evaluations introduced with the AND operator after the WHERE clause will be ignored by the query processor.  Hope this helps.

     

    Kind regards,

    Gift Peddie

    Kind regards,
    Gift Peddie

  • definition : of course - just overpaced while planning dinner 

    But with the query, it is still NOT clear, whether inner or outer joins are meant. The inner joins are just a guess, as I denoted. With regard to possible outer joins, it makes a big difference in query logic, whether the criteria are incorporated in join logic or later applied to the result set. As this is still unknown, I'd rather not worry about which query performs better, since we don't even know, which one is possibly wrong in terms of logic


    _/_/_/ paramind _/_/_/

  • From the standpoint of code itself this is how I would setup the query.

    SELECT

     p.ParticipantID,

     pr.Age,

     ir.FnlTime,

     e.EventDate

    FROM

     dbo.Participant p

    INNER JOIN

     dbo.PartRace pr

     INNER JOIN

      dbo.IndResults ir

      INNER JOIN

       dbo.RaceData rd

       INNER JOIN

        dbo.Events e

       ON

        e.EventID = rd.EventID AND

        e.EventGrp = 1 AND

        rd.Dist = '5_km'

      ON

       ir.RaceID = rd.RaceID AND

       ir.FnlTime != '00:00'

     ON

      pr.ParticipantID = ir.ParticipantID

      -- feel like you probably have a race id that needs to be reference in this join

      -- AND pr.RaceID = ir.RaceID

    ON

     p.ParticipantID = pr.ParticipantID

    WHERE

     p.Gender = 'm'

    ORDER BY

     ir.ParticipantID

  • why would you do so?


    _/_/_/ paramind _/_/_/

  • First and foremost for readability.

    Second to constrain the joins so I cannot connect on unrealted items. Queries are interpreted innermost to outtermost joins, top to bottom.

    So the join

    dbo.RaceData rd

    INNER JOIN

     dbo.Events e

    ON

     e.EventID = rd.EventID AND

     e.EventGrp = 1 AND

     rd.Dist = '5_km'

    The ON will not let me reference any table other than RaceData and Events.

    Also in general the innermost join will occurr (and many times this is what the query engine will try to do itself anyway) first and work it's way out. This hopefully with provide a small dataset comparing to a larger each time which means the query at each stage should generate the smallest possible dataset to work with before continuing, this can also be helped along when you take the where conditions that relate to that join and move them to the ON clause.' Which in turn makes it easier to read that when I look at table Events, I only want EventGrp = 1 and Dist = '5_km' at the time the join to RaceData occurs.

  • I'm pretty sure, I know and I knew, what you mean, but let me state again, what I already wrote in a prior message: 

    >it is still NOT clear, whether inner or outer joins are meant. The inner joins are just a guess, as I denoted.<

    Do you know, what will happen, when you incorporate '5 km' in the join logic and there's no data entered at some point?

    You will miss the records, not for not being there, but for being incomplete. Wouldn't you always want to adjust the "where"-filter only, when querying for different cases?, e.g. WHERE Dist IS NULL? But you'll never get those results, if already a join condition.

    As for the database's table-logic-driven join conditions, I agree. (e.g. ID etc.)

      


    _/_/_/ paramind _/_/_/

  • Ok this has been incredibly helpful and educational.  I apologize for anyone who was trying to help me with insufficient information.  The last query submitted worked very well and, yes, you were correct that the there needed to be a connection between the RaceIDs so thanks for that suggestion.

    I have a question:  Why the dbo. ?

    Thanks!

  • >I have a question:  Why the dbo. ?<

    the object owner (here dbo. = DataBaseOwner) is always an  integral part of naming in SQL-Server. This is true for all objects (tables, procedures etc.) Try the following:

    CREATE TABLE dbo.[tblCustomer] ([CustomerID] [int] NOT NULL)

    CREATE TABLE user1.[tblCustomer] ([CustomerID] [int] NOT NULL)

    CREATE TABLE user2.[tblCustomer] ([CustomerID] [int] NOT NULL)

    CREATE TABLE user3.[tblCustomer] ([CustomerID] [int] NOT NULL)

    dbo is simply implicitly assumed, as long as you don't provide another distinct name. In other words - if you use different object owners you will run into problems, if you omit the owner part. Nice feature, but beware of the pitfalls.

    PS: Replace users1 to 3 with real accounts in your db


    _/_/_/ paramind _/_/_/

  • There is also a minor performance gain in not forcing the query engine to go looking. It actually does first go looking for loggedonuser.object before assuming dbo.object. And in case someone thinks to ask curren server and current database are always assumed for their portion of the 4 part name.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply