JOIN VS CORRELATED SUBQUERIES

  • What would work best. You have two tables Customers and Orders. Orders holds the foreign key reference to Customers. To produce a report on all Customers and there respective orders, the usual approach would be to use a join e.g.

    (1) SELECT Customers.ID, Customers.Name, Orders.ID OrderID, Orders.TransDate OrderDate FROM Customers INNER JOIN Orders ON Customers.ID = Orders.CustomerID

    However isn’t the following also possible:

    (2) SELECT Customers.ID, Customers.Name, (SELECT ID FROM Orders WHERE Orders.CustomerID = Customers.ID) OrderID, (SELECT TransDate FROM Orders WHERE Orders.CustomerID = Customers.ID) OrderDate FROM Customers

    By the way, I realize the second example would not return all the records in a one to many relationship, but for the sake of argument assume that both tables were one to one. My main question is what would be best,joins or correlated queries? When should each be used?

  • I think a rule of thumb is to use joins whenever possible, but to test the plan if performance is not what you expect.

    SQL Server can perform a join in one of several ways depending on the availablity of indexes, the size of the tables, etc. It generally does a good job of choosing the best plan. By avoiding the join, you're really stearing SQL Server in a certain direction, telling it to perform a number of substeps.

  • Join is definitly your choice.

    In your specifique example join is used the most efficiently.

    The second example you've brought cannot work because you want to bring in a single column a recordset (in case of a one to many relationship) and in addition you are making anyhow a join (in fact 2 joins)

    Bye

    Gabor



    Bye
    Gabor

  • There are not to many absolute rules.

    The writing of a query should 1st and foremost return the desired results. After that, much depends on the how the query is being applied. If it's rarely used, and returns nearly instant results then tweaking becomes a waste of time except where readability is concerned.

    If on the otherhand, the query is often used, or in a loop or something, then time is well spent investigating the "best" solution.

    The two examples you post would only return the same results if we use your 1-to-1 assumtion, otherwise the queries are performing different operations.

    As for you question... because the 2nd example is SELECTing from 3 tables ([Orders] twice, once in each correlated query) it might influence performance greatly on the "bad" side.

    In most cases a JOIN is going to be the best performing option.

    Since JOIN is a common and most often used syntax to retrieve data from 2 or more tables, this is the area of the database engine that is going to get the most attention from the database engine performance geeks as MS.



    Once you understand the BITs, all the pieces come together

  • Thanks guys for the quick response. I would stick to the joins as this seems to be the more effective solution with perhaps less adverse impact on the database. Thanks again!!!

  • KARE4VB,

    You mentioned : Quote

    By the way, I realize the second example would not return all the records in a one to many relationship, but for the sake of argument assume that both tables were one to one. My main question is what would be best,joins or correlated queries? When should each be used?

    EndQuote

    I believe, in a one-to-many relationship, your second query will not even execute. Reason: Each of your subqueries returns a set as opposed to a single value - and this is not allowed by SQL.

    Thanks

    r_achar

  • As pointed out in other posts, the second example would give an error unless there was a 1 to 1 relationship between the two tables.

    On the other hand, many subqueries are in fact nothing more than joins. In most cases, SQL Server actually performs a join to resolve the query. It is possible to write two queries that do the same thing, one using the JOIN syntax and another using a subquery. In most cases that I have actually looked at the query plan they will be identical. If the query plans aren't identical, there is a good chance that the queries are not really performing the same logical operation.

    I always prefer the JOIN syntax for standardization and readability etc... but for performance it shouldn't matter.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Like Thomas mentioned there are few absolute rules.

    If you have the time I would build up a test scenario, generate sample data (more than you actually expect to have), fire both queries and compare execution plans and execution time and all that stuff.

    I think it depends also on the underlying data, which indexes are on those tables to decide which alternative to choose.

    I for myself prefer JOINS and use correlated subqueries commonly to check for nonempty sets with EXISTS.

    But that's just me

    Frank

    http://www.insidesql.de

    http://www.familienzirkus.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • In addition to all the other good points people are making, it should be noted that the first example is an inner join, and the second results in what amounts to a left outer join (at least if there is never more than one order per customer).

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply