Sub-Query or Left-Outer Join (Performance Wise)

  • Which one is Better and why ?

    Use SQL subselects to consolidate queries on multiple tables into a single statement

    Or

    Use Left Outer Join

    Example:

    Table Employee(Employee_ID,Employee_Name,Department_ID)

    Table Department(Department_ID,Department_Name)

    • Table Employee has million records, table Department has Few Records

    • Some Of the employees does not have department

    Which Query is better for performance?

    Select

    Employee_ID,

    Employee_Name,

    (Select Department_Name From Department Where Department .Department_ID = Employee. Department_ID) as Department_Name

    From Employee

    OR

    Select

    Employee_ID,

    Employee_Name,

    Department_Name

    From Employee

    Left Outer Join

    Department

    On Department .Department_ID = Employee. Department_ID

  • There are always exceptions, but in general, you're better off using the LEFT JOIN. When in doubt, write the query both ways and look at the execution plans.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • In this case I'm pretty sure that the optimizer will just rewrite the subquery to be a join. I normally favor the join syntax just for stylistic reasons, but when in doubt, like Grant said, write it both ways and check.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Generally speaking, the query optimizer doesn't much care how you write your queries.

    Ok, so that is a bold statement - but it is at least partly true, at least to a large extent on up-to-moderately complex statements.

    The SQL you write goes through a great deal of analysis, conversion and optimization before it hit the execution engine. Your query text is essentially translated in an internal tree structure of logical operators which represents the logical equivalent of what you have asked for.

    The SQL Server Algebrizer (which replaces the 'Normalizer' in 2000) does a fair amount of this work, including helping to decide on parallelism, how to approach sub-queries and aggregations, and where to locate objects in cache memory. That is but one stage - there are many opportunities for the process to translate your written query into something equivalent, if it decides it is worth it.

    Anyway, so long as the left join and subquery are exactly equivalent I guess it comes down to a question of writing style. Personally, I almost always write it as a join.

    Check the query plans involved - I would bet dollars to doughnuts that the subquery is implemented as a logical left join 🙂

    Paul

  • Check the query plans involved - I would bet dollars to doughnuts that the subquery is implemented as a logical left join

    Actually in ANSI SQL 1999 Subquery is equal to an INNER JOIN so it is techincally faster than OUTER JOIN because of the OUTER JOIN default NULL. So if Microsoft implemented it in 2005 and 2008 there should be some difference.

    If is the operative word.

    Kind regards,
    Gift Peddie

  • Gift Peddie (6/24/2009)


    Actually in ANSI SQL 1999 Subquery is equal to an INNER JOIN so it is techincally faster than OUTER JOIN because of the OUTER JOIN default NULL. So if Microsoft implemented it in 2005 and 2008 there should be some difference.

    1999? 😉

    SQL standards are a myth.

    😀

  • Paul White (6/24/2009)


    The SQL Server Algebrizer (which replaces the 'Normalizer' in 2000) does a fair amount of this work, including helping to decide on parallelism, how to approach sub-queries and aggregations, and where to locate objects in cache memory. That is but one stage - there are many opportunities for the process to translate your written query into something equivalent, if it decides it is worth it.

    While I'm in agreement with the general statements of the post, I've got a pick a nit with this part. The algebrizer creates the parse tree, the first step in optimization. It doesn't determine parallelism or where to locate objects in cache. It does resolve object names and it does flatten operators, which is part of what we're talking about here, rearranging the query so that you can see correlated sub-queries turned into joins or LIKE statements turned into >= and < operations. It also does aggregate & grouping binding prior to the optimization process.

    Sorry. Just had to get that one off my chest.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • Grant Fritchey (6/25/2009)


    While I'm in agreement with the general statements of the post, I've got a pick a nit with this part. The algebrizer creates the parse tree, the first step in optimization. It doesn't determine parallelism or where to locate objects in cache. It does resolve object names and it does flatten operators, which is part of what we're talking about here, rearranging the query so that you can see correlated sub-queries turned into joins or LIKE statements turned into >= and < operations. It also does aggregate & grouping binding prior to the optimization process.

    Sorry. Just had to get that one off my chest.

    That's ok Grant, but I'm not so sure, and I don't have the SQL source code to hand to check 😉

    A quick Google shows several books and articles that assert that the algebrizer does in fact get involved in parallelism and caching.

    I'm not going to post a lmgtfy link 🙂 but here's the one that comes up first for me

    I don't actually care that much though!

  • Abdullah M. Al-Fararjeh (6/22/2009)


    Which one is Better and why ?

    Great question! Welcome to the board by the way.

    My rule of thumb is...don't listen to rules of thumb and find out which is better on your own. Best-practices aside, write the code in a number of different ways and figure it out using the tools at your disposal.

    I just finished up a very large query LEFT OUTER JOINing a table about a dozen times. Turns out that a UNION ALL was 10x faster. Bottom line - don't rat-hole yourself into one or two ways to do something.

  • Paul White (6/25/2009)


    That's ok Grant, but I'm not so sure, and I don't have the SQL source code to hand to check 😉

    A quick Google shows several books and articles that assert that the algebrizer does in fact get involved in parallelism and caching.

    I'm not going to post a lmgtfy link 🙂 but here's the one that comes up first for me

    I don't actually care that much though!

    I tracked down some more information. According to a Microsoft developer I spoke with, the determination for parallelism is made in the optimizer. BUT, there are functions & processes that will be marked by the algebrizer as "these can't be parallel" which will prevent the optimizer from choosing a parallel plan, but the actual determination is part of the optimizer process.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • Grant Fritchey (6/26/2009)


    I tracked down some more information. According to a Microsoft developer I spoke with, the determination for parallelism is made in the optimizer. BUT, there are functions & processes that will be marked by the algebrizer as "these can't be parallel" which will prevent the optimizer from choosing a parallel plan, but the actual determination is part of the optimizer process.

    Cool. I did say "including helping to decide on parallelism" 😉

    Thanks though.

    edit: added smiley

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply