Query on Multiple Tables. JOIN help needed

  • Hi

    I am trying to get result as (in actual, lot of columns)

    ILE.[Item No]

    Item.[Item Name]

    ILE.[Customer No]

    Customer.[Customer Name]

    ILE.[Location Code]

    Location.[Location Name]

    ILE.[Country Code]

    Country.[Country Name]

    ILE.[State Code]

    State.[State Name]

    Dimension.[dimCategory Code]

    Dim Category.[dimCategory Name]

    by selecting columns from multiple tables. The table structure is attached to this post. Can you please guide me in the right direction on linking multiple tables using JOIN or other better idea?

    I already tried with ILE, Item and Customer using INNER JOIN and ended with only ONE record. My ILE has millions of records.

  • Try this...

    select ILE.[Item No]

    Item.[Item Name]

    ILE.[Customer No]

    Customer.[Customer Name]

    ILE.[Location Code]

    Location.[Location Name]

    ILE.[Country Code]

    Country.[Country Name]

    ILE.[State Code]

    State.[State Name]

    Dimension.[dimCategory Code]

    Dim Category.[dimCategory Name]

    from ILE

    left outer join Items on (ILE.[Item No] = Items.[Item No])

    left outer join Customer on (ILE.[Customer No]= Customer.[Customer No])

    left outer join Location on (ILE.[Location Code] = Location.[Location Code])

    left outer join Country on (ILE.[Country Code] = Country.[Country Code])

    left outer join State on (ILE.[State Code] = State.[State Code])

    left outer join Dimension on (Items.[Item No]= Dimension.[Item No])

    left outer join [Dim Category] on (Dimension.[dimCategory Code] = [Dim Category].[dimCategory Code])

    Regards,

    Snigdha

  • You've made it very hard on yourself by putting spaces in the table and column names. It's not impossible, but it is not according common naming conventions to put these in the names. Because of them you need to escape the names by enclosing them in square brackets: [ and ]. Below query will do what you want. But only if all of the tables are correctly filled (do you have foreign key constraints in place on your tables?) and all columns in ILE, plus the dimCategory Code in Dimensions are mandatory (i.e. they can not be set to NULL, because they have "NOT NULL" after their declaration in the create table command). If any of the columns are optional (read: can be set to NULL), you should change 'inner join' into 'left outer join' for the join using that column. Failing to do so, will make all rows that have a NULL in that particular column disappear from your results.

    select

    l.[Item No],

    itm.[Item Name],

    l.[Customer No],

    cust.[Customer Name],

    l.[Location Code],

    loc.[Location Name],

    l.[Country Code],

    cntr.[Country Name],

    l.[State Code],

    stt.[State Name],

    dim.[dimCategory Code],

    dc.[dimCategory Name]

    from ILE l

    inner join Items itm on (itm.[Item No] = l.[Item No])

    inner join Dimension dim on (dim.[Item No] = l.[Item No])

    inner join [Dim Category] dc on (dc.[dimCategory Code] = dim.[dimCategory Code])

    inner join Customer cust on (cust.[Customer No] = l.[Customer No])

    inner join Location loc on (loc.[Location Code] = l.[Location Code])

    inner join Country cntr on (cntr.[Country Code] = l.[Country Code])

    inner join [State] stt on (stt.[State Code] = l.[State Code])

    Instead of putting spaces in your table and column names, you could better replace these by underscores: _. You won't need to escape the names and it still reads very clearly the different parts of the name. Also, you should better use ID instead of No for the unique identifiers per table, "No" is usually used to designate human readable numbers. Then there are most likely some normalization issues in your table definitions. I won't go into that now. But if you want us to, just put the DDL (data definition language, for more info see 'posting etiquette' in my footer text) online with some test data to fill it, and we'll be happy to explain it to you.

    Good luck,

    Richard Rozema



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • This was removed by the editor as SPAM

  • Thank you for all the replies. i used LEFT OUTER JOIN and solved the problem. So far i have added 9 tables and some SELECT sub queries to get the fields i wanted and all working well.

  • You really should see which tables really need an OUTER JOIN and only use the OUTER JOIN in those cases. Also you mention SELECT sub-queries, you don't mean that you are doing something like this do you:

    SELECT

    COLUMN,

    (SELECT COUNT(test) FROM table1 WHERE TABLE.COLUMN = table1.column) AS no_of_children

    FROM

    TABLE

    That's a performance killer if you are doing that.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • Dear Jack

    Actually, that's what I have done. How can overcome this issue? Any better workaround methods?

  • meelan (9/27/2011)


    Dear Jack

    Actually, that's what I have done. How can overcome this issue? Any better workaround methods?

    Here are a few ideas:

    Put the sub-query in the from clause as a derived table.

    Eliminate the subquery altogether an investigate your table structure in more detail. If every ILE that you're interested in has a customer then use an inner join otherwise leave the left join. Inspect each join and ask the same question. (When in doubt, change to an inner join one at a time and see how the result changes)

    If there is selection criteria that applies to a particular table but not the entire query (common reason devs us subqueries) consider using a left join and adding that criteria to the ON statement e.g.

    SELECT *

    FROM

    ILE

    LEFT JOIN Customer ON ILE.[Customer No]=Customer.[Customer No]

    AND [Customer Name] LIKE 'A%'

    This allows you to filter the table without filtering the entire result set. (You'll simply get a NULL rather than the undesired data.

  • You can often get the same data using a CTE or derived table. Something like this:

    CTE:

    ;WITH ItemCount AS

    (

    SELECT

    COUNT(test) AS itemCount,

    column FROM table1

    GROUP by

    table1.COLUMN

    )

    SELECT

    COLUMN,

    itemCount

    FROM

    TABLE AS T JOIN

    ItemCount AS IC

    ON T.COLUMN = IC.column

    Derived Table:

    SELECT

    COLUMN,

    itemCount

    FROM

    TABLE AS T JOIN

    (

    SELECT

    COUNT(test) AS itemCount,

    column FROM table1

    GROUP by

    table1.COLUMN

    ) AS IC

    ON T.COLUMN = IC.column

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • Jack Corbett (9/27/2011)


    You can often get the same data using a CTE or derived table. Something like this:

    CTE:

    ;WITH ItemCount AS

    (

    SELECT

    COUNT(test) AS itemCount,

    column FROM table1

    GROUP by

    table1.COLUMN

    )

    SELECT

    COLUMN,

    itemCount

    FROM

    TABLE AS T JOIN

    ItemCount AS IC

    ON T.COLUMN = IC.column

    Derived Table:

    SELECT

    COLUMN,

    itemCount

    FROM

    TABLE AS T JOIN

    (

    SELECT

    COUNT(test) AS itemCount,

    column FROM table1

    GROUP by

    table1.COLUMN

    ) AS IC

    ON T.COLUMN = IC.column

    Use CTEs with care, if the expected row count is very high, you're actually better off with a temp table. I've also read that CTEs have a hard time with aggregate functions.

  • SDB15 (9/27/2011)


    Use CTEs with care, if the expected row count is very high, you're actually better off with a temp table. I've also read that CTEs have a hard time with aggregate functions.

    I don't see how anyone can give such a generic statement. CTE's are just another logical structure, they don't dictate how a query plan's put together. There are some tasks where temp tables make sense and others (especially where the data's only referred to once) where you're better off with a CTE/sub-query.

    Just because the CTE logically refers to a large number of rows, doesn't mean it has to process them all if they're not needed in the final query. It will build a plan for the whole query, wrapping in the CTE and may well end up getting far fewer rows than the logical structure suggests it would (with the appropriate indexes).

    Do you have a reference to problems specifically relating to CTE's and aggregate functions?

  • HowardW (9/27/2011)


    SDB15 (9/27/2011)


    Use CTEs with care, if the expected row count is very high, you're actually better off with a temp table. I've also read that CTEs have a hard time with aggregate functions.

    I don't see how anyone can give such a generic statement. CTE's are just another logical structure, they don't dictate how a query plan's put together. There are some tasks where temp tables make sense and others (especially where the data's only referred to once) where you're better off with a CTE/sub-query.

    Just because the CTE logically refers to a large number of rows, doesn't mean it has to process them all if they're not needed in the final query. It will build a plan for the whole query, wrapping in the CTE and may well end up getting far fewer rows than the logical structure suggests it would (with the appropriate indexes).

    Do you have a reference to problems specifically relating to CTE's and aggregate functions?

    I was going to post a similar statement. There definitely may be instances where a temp table will work better than a CTE>

    The other thing I'd add is that a CTE and a Derived Table, in my experience, usually get the same query plan from the optimizer, so you'd have to make the same statement regarding a derived table. Here's a simple example:

    CREATE TABLE #test

    (

    id INT IDENTITY(1, 1)

    NOT NULL

    PRIMARY KEY CLUSTERED,

    some_data VARCHAR(10)

    ) ;

    CREATE TABLE #child_test

    (

    id INT IDENTITY(1, 1)

    NOT NULL

    PRIMARY KEY CLUSTERED,

    test_id INT NOT NULL,

    some_data VARCHAR(10)

    ) ;

    INSERT INTO #test

    (

    some_data

    )

    SELECT TOP 10000

    CHAR(ABS(CHECKSUM(NEWID())) % 26 + 65) + CHAR(ABS(CHECKSUM(NEWID())) %

    26 + 65) AS value

    FROM

    sys.all_columns AS AC1

    CROSS JOIN sys.all_columns AS AC ;

    INSERT INTO #child_test

    (

    test_id,

    some_data

    )

    SELECT TOP 100000

    T.ID,

    CHAR(ABS(CHECKSUM(NEWID())) % 26 + 65) + CHAR(ABS(CHECKSUM(NEWID())) %

    26 + 65) AS value

    FROM

    sys.all_columns AS AC

    CROSS JOIN #test AS T ;

    WITH ChildData

    AS (

    SELECT

    COUNT(*) AS child_count,

    test_id

    FROM

    #child_test AS CT

    GROUP BY

    CT.test_id

    )

    SELECT

    T.id,

    CD.child_count

    FROM

    #test AS T

    JOIN ChildData AS CD

    ON T.id = CD.test_id ;

    SELECT

    T.id,

    CD.child_count

    FROM

    #test AS T

    JOIN (

    SELECT

    COUNT(*) AS child_count,

    test_id

    FROM

    #child_test AS CT

    GROUP BY

    CT.test_id

    ) AS CD

    ON T.id = CD.test_id ;

    DROP TABLE #test;

    DROP TABLE #child_test;

    The 2 select statements at the end get the same plan. I think the CTE version is a little easier to read and understand, but that's just a personal preference.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • HowardW (9/27/2011)


    SDB15 (9/27/2011)


    Use CTEs with care, if the expected row count is very high, you're actually better off with a temp table. I've also read that CTEs have a hard time with aggregate functions.

    I don't see how anyone can give such a generic statement. CTE's are just another logical structure, they don't dictate how a query plan's put together. There are some tasks where temp tables make sense and others (especially where the data's only referred to once) where you're better off with a CTE/sub-query.

    Just because the CTE logically refers to a large number of rows, doesn't mean it has to process them all if they're not needed in the final query. It will build a plan for the whole query, wrapping in the CTE and may well end up getting far fewer rows than the logical structure suggests it would (with the appropriate indexes).

    Do you have a reference to problems specifically relating to CTE's and aggregate functions?

    These are just a couple rules of thumb that has been useful in my experience. As you said, CTEs, temp tables and table variables all have their uses.

    The problem with large datasets is well documented. Here are a few references:

    http://stackoverflow.com/questions/690465/which-are-more-performant-cte-or-temporary-tables

    CTE has its uses - when data in the CTE is small and there is strong readability improvement as with the case in recursive tables. However, its performance is certainly no better than table variables and when one is dealing with very large tables, temporary tables significantly outperform CTE. This is because you cannot define indices on a CTE and when you have large amount of data that requires joining with another table (CTE is simply like a macro). If you are joining multiple tables with millions of rows of records in each, CTE will perform significantly worse than temporary tables.

    http://www.simple-talk.com/community/forums/thread/72864.aspx

    I am working with common table expressions and observing what I would consider abnormal behavior. I have created a script that initializes a CTE that contains 22,000 plus rows of data. Later in the script I perform multiple (4) queries filtering the data in the cte. All the resultsets from the 4 queries are returned in one result set via union all clause. This script takes minutes to return the final dataset. I then wrote the same script using a temporary table in place of the cte. This script runs and returns the final result set in seconds.

    I would expect the script using the cte would run faster than the on using a temporoary table. the questions I have is:

    1. Is there an optimum number of rows in a cte after which performance stats to degrade?

    2. Is there a server setting that will optimize SQL server performance for cte's?

    This was also a good discussion on the subject:

    http://social.msdn.microsoft.com/Forums/en/transactsql/thread/2b977c99-f026-43f5-b062-9d976fd0fed6

    I'm looking for the article that compared CTEs, temp tables, and table variables when different aggregate functions were used. I'll post again if I find it.

  • SDB15 (9/27/2011)


    These are just a couple rules of thumb that has been useful in my experience. As you said, CTEs, temp tables and table variables all have their uses.

    The problem with large datasets is well documented. Here are a few references:

    http://stackoverflow.com/questions/690465/which-are-more-performant-cte-or-temporary-tables

    CTE has its uses - when data in the CTE is small and there is strong readability improvement as with the case in recursive tables. However, its performance is certainly no better than table variables and when one is dealing with very large tables, temporary tables significantly outperform CTE. This is because you cannot define indices on a CTE and when you have large amount of data that requires joining with another table (CTE is simply like a macro). If you are joining multiple tables with millions of rows of records in each, CTE will perform significantly worse than temporary tables.

    http://www.simple-talk.com/community/forums/thread/72864.aspx

    I am working with common table expressions and observing what I would consider abnormal behavior. I have created a script that initializes a CTE that contains 22,000 plus rows of data. Later in the script I perform multiple (4) queries filtering the data in the cte. All the resultsets from the 4 queries are returned in one result set via union all clause. This script takes minutes to return the final dataset. I then wrote the same script using a temporary table in place of the cte. This script runs and returns the final result set in seconds.

    I would expect the script using the cte would run faster than the on using a temporoary table. the questions I have is:

    1. Is there an optimum number of rows in a cte after which performance stats to degrade?

    2. Is there a server setting that will optimize SQL server performance for cte's?

    This was also a good discussion on the subject:

    http://social.msdn.microsoft.com/Forums/en/transactsql/thread/2b977c99-f026-43f5-b062-9d976fd0fed6

    I'm looking for the article that compared CTEs, temp tables, and table variables when different aggregate functions were used. I'll post again if I find it.

    I wouldn't necessarily call those 3 threads cases where performance problems with CTE's are well-documented. You got several opinions and some anecdotal evidence, but no concrete evidence, except in the last link to the msdn forum. Even in that case Erland made the point that (my interpretation) it needs to be evaluated on a case by case basis. As Jeff Moden would say, "Test it yourself".

    I'm very careful not to make blanket statements unless I have test results that prove the statement, and even in that case, I usually preface it by saying "my tests show". With SQL Server things change so often that I'm careful about that because in a change in vNext might change the behavior.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply