Table Order!

  • As a general rule, is there any performance gain from the order in which I list my tables in the FROM clause?

    For example: table 1 has 14 rows, table 2 has 300k rows, table 3 has 1.5m rows.

    Do I list them small to large (1, 2, 3), or large to small (3, 2, 1), or is there no difference, so I can order them however (2, 1, 3) and let SQL Server sort out what it puts in cache memory etc.?
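
    To make it concrete, here's the shape of the query, with table and column names invented for the post:

        SELECT t1.code, t2.order_no, t3.amount
        FROM table1 t1                                     -- 14 rows
        INNER JOIN table2 t2 ON t2.code = t1.code          -- 300k rows
        INNER JOIN table3 t3 ON t3.order_no = t2.order_no  -- 1.5m rows

        -- versus listing them large to small:
        SELECT t1.code, t2.order_no, t3.amount
        FROM table3 t3
        INNER JOIN table2 t2 ON t2.order_no = t3.order_no
        INNER JOIN table1 t1 ON t1.code = t2.code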

  • Hi wildh,

    quote:


    As a general rule, is there any performance gain from the order in which I list my tables in the FROM clause?

    For example: table 1 has 14 rows, table 2 has 300k rows, table 3 has 1.5m rows.

    Do I list them small to large (1, 2, 3), or large to small (3, 2, 1), or is there no difference, so I can order them however (2, 1, 3) and let SQL Server sort out what it puts in cache memory etc.?


    I might be wrong on this, but one advantage of SQL is that you only state what you want from the DBMS and let the DBMS do the rest.

    So I think there shouldn't be a performance gain from ordering the tables in different ways, once the query optimizer has made up an execution plan.

    However, you can have a look at the execution plans and see which one produces the lowest cost, if they are different at all.
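
    In Query Analyzer you can do that with something like this (a minimal sketch, with made-up table names):

        -- Show the estimated plan as text without executing the query.
        -- SET SHOWPLAN_TEXT must be the only statement in its batch.
        SET SHOWPLAN_TEXT ON
        GO
        SELECT t1.col1, t3.col2
        FROM table1 t1
        INNER JOIN table2 t2 ON t2.t1_id = t1.id
        INNER JOIN table3 t3 ON t3.t2_id = t2.id
        GO
        SET SHOWPLAN_TEXT OFF
        GO

        -- Or execute the query and get actual per-operator row counts:
        SET STATISTICS PROFILE ON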

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/

  • Frank is correct. There is no benefit to the order of the tables in the statement. This is different from Oracle, I believe.

    The optimizer considers the request and may change the table access order based on what it decides.

    Steve Jones

    sjones@sqlservercentral.com

    http://qa.sqlservercentral.com/columnists/sjones

    http://www.dkranch.net

  • Maybe I'm dreaming this up, but I thought I had read somewhere that the query optimizer considers tables in groups of four, starting with how they are listed in the query. So, if you have eight tables in a query, it optimizes the first four and the last four joins separately... This is done to reduce the cost of generating the query plan.

    I've heard of tools that will take all possible permutations of the joins in a statement and find the best rewrite (I think they probably also add index hinting). I have never used one personally, and I tend to prefer well-formatted, easy-to-maintain code over trying to eke out a little extra performance from the optimizer, since it does a pretty good job anyway.

    I have found, however, a few cases here and there where I've had to rewrite long joins (around 10ish tables) to get the execution time from 10+ seconds to sub-second. Most of the time I use derived tables, or do part of the query separately using table variables/temp tables. I don't know that I've ever optimized a query merely by reordering the joins, but sometimes moving things between the WHERE clause and the join clauses has sped things up.

    Maybe this was all induced by drinking too much soda the night before, but I try to visualize what will help the optimizer find the smallest number of rows and put those tables together, then join everything else to that. Derived tables have been the most consistent way I've found to force the optimizer to pick the plans I want, though maybe I just don't know how to do it through ordering or clever query hints.

    The other possibility is that this just lets me control the filtering better and may not really be related to the way SQL joins the tables. Consider the case where you're joining two 100k-row tables, but you're filtering both so that only a few hundred rows are used from each... if you filter first, the join result is pretty small; if you join first, you wind up with many, many candidate rows in the join. In my experience, the optimizer only gets the filtering right about 95% of the time in long, ugly joins with a lot of filters.
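
    As a concrete (made-up) sketch of the filter-first idea with derived tables:

        -- Each derived table is cut down by its own WHERE clause first,
        -- so the join only ever sees the small row sets:
        SELECT a.id, a.val, b.val
        FROM (SELECT id, val FROM big_table_1 WHERE status = 'A') AS a
        INNER JOIN (SELECT id, val FROM big_table_2 WHERE region = 7) AS b
            ON b.id = a.id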

    Matthew Galbraith

  • I think I read a passage somewhere in Inside SQL Server where it states that, as a rule of thumb, you should not join more than 4 tables; otherwise you lose performance.

    Will investigate this evening.

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/

  • Hi Matthew,

    OK, unfortunately the following is not my own!

    It's taken from Inside SQL Server 7 (sorry, that's the one I have at home), Part IV, Chapter 14,

    'Optimizing Queries'

    ...

    For the most common type of join, an equijoin that looks for equal values in columns of two tables, the optimizer automatically decides which is the inner table and which is the outer table of a join. The table order that you specify for the join doesn't matter in the equijoin case. However, the order for outer joins must match the semantics of the query, so the resulting order is dependent on the order specified.

    ...

    Joins can be processed using one of three methods: nested iteration, hashing, or merging. For any of these methods, the optimizer decides on the order in which the tables should be accessed. Because a nested iteration is a loop, order is important. The fewer the iterations through the loops, the less processing will be required. So it is useful to start with the table (or tables) that can exclude the most rows as the outer loops. The general strategy is to make the outer table limit the search the most, which results in the fewest total iterations (scans). With hash or merge joins, SQL Server builds an in-memory structure from one of the tables. To conserve memory resources, it tries to determine which is the smaller table or which will have the smallest number of qualifying rows after the WHERE conditions are applied. This table is typically chosen as the first table to be processed. (We'll see more on each of these join strategies later in the chapter.)

    For each combination of join strategy, join order, and indexes, the query optimizer estimates a cost, taking into account the number of logical reads and the memory resources that are required and available. Then the optimizer compares the cost of each plan and chooses the plan with the lowest estimate. You can think of query optimization as happening in three main phases: query analysis, index selection, and join selection (although there is a lot of overlap between the index selection and join selection phases). The following sections discuss each phase.

    Join selection is the third major step in query optimization. If the query is a multiple-table query or a self-join, the query optimizer evaluates join selection and selects the join strategy with the lowest cost. It determines the cost using a number of factors, including the expected number of reads and the amount of memory required. It can use three basic strategies for processing joins: nested loop joins, merge joins, and hash joins. In each method, the tables to be joined (or subsets of tables that have already been restricted by applying SARGs) are called the join inputs.

    Joins are usually processed as nested iterations—a set of loops that take a row from the first table and use that row to scan the inner table, and so on, until the result that matches is used to scan the last table. The number of iterations through any of the loops equals the number of scans that must be done. (This is not a table scan, since it is usually done using an index. In fact, if no useful index is available to be used for an inner table, nested iteration probably isn't used and a hash join strategy is used instead.) The result set is narrowed down as it progresses from table to table within each iteration in the loop. If you've done programming with an ISAM-type product, this should look familiar in terms of opening one "file" and then seeking into another in a loop.

    ....

    quote:


    Maybe I'm dreaming this up, but I thought I had read somewhere that the query optimizer considers tables in groups of four, starting with how they are listed in the query. So, if you have eight tables in a query, it optimizes the first four and the last four joins separately... This is done to reduce the cost of generating the query plan.

    I've heard of tools that will take all possible permutations of the joins in a statement and find the best rewrite (I think they probably also add index hinting). I have never used one personally, and I tend to prefer well-formatted, easy-to-maintain code over trying to eke out a little extra performance from the optimizer, since it does a pretty good job anyway.


    Now this might be of interest to you.

    According to some folklore, SQL Server "does not optimize" joins of more than four tables. There was some truth to this in earlier versions of SQL Server, but in SQL Server 7 optimization doesn't happen in the same way at all. There are so many different possible permutations of join strategies and index usage that the query optimizer goes through multiple optimization passes, each considering a larger set of the possible plans. On each pass, it keeps track of the best plan so far and uses a cost estimate to reduce the search in succeeding passes. If your query contains many tables, the query optimizer is likely to interrupt the last pass before completion because the last pass is quite exhaustive. The cheapest plan found will then be used.

    Reading this cleared things up a bit for me.
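
    For what it's worth, if you ever want to experiment with overriding these decisions, SQL Server does accept hints that pin down the join strategy or the join order. A rough sketch with invented table names, not a recommendation for production code:

        -- Force a hash join between two tables:
        SELECT o.order_no, c.name
        FROM orders o
        INNER HASH JOIN customers c
            ON c.cust_id = o.cust_id

        -- Or make the optimizer keep the join order exactly as written:
        SELECT o.order_no, c.name
        FROM customers c
        INNER JOIN orders o
            ON o.cust_id = c.cust_id
        OPTION (FORCE ORDER)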

    HTH

    Frank

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/

  • Wow, thanks for looking that up, Frank. I must have read that in my 6.5 days... 🙂

    That explains why it sometimes works when I resort to using derived tables and such, since the optimizer might not get to the right plan in that last pass... In one project I work on, we have a section of the application that uses a self-referring table, and another table that is joined to itself many, many times to pull out attributes for an entity in another table. You can imagine some of these queries require a little optimization to get to work.
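
    Something of this shape, with names invented:

        -- One join back to the attribute table per attribute wanted:
        SELECT e.entity_id,
               a1.attr_value AS colour,
               a2.attr_value AS size
        FROM entity e
        INNER JOIN attribs a1
            ON a1.entity_id = e.entity_id AND a1.attr_name = 'colour'
        INNER JOIN attribs a2
            ON a2.entity_id = e.entity_id AND a2.attr_name = 'size'
        -- ...and so on, one join per attribute the query needs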

    Matthew Galbraith

  • Thanks for your help, guys.

    I did some tests on the example I gave you; ordering the tables small to large shaved 20% off the run time, and the query plan looked better.

    However, I tried the same thing with a much more complex query (12 tables with a range of inner and outer joins) and it took considerably longer. Horses for courses, I think.

    Frank's extract has helped me understand it better; now it's a case of remembering it when I come to optimising my dev code.

    Thanks.

  • Incidentally, IIRC a bookmark lookup is done when a non-clustered index is used, since the non-clustered index still has to be followed back to the clustered index to look up the actual record.
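
    If the lookup ever shows up as the expensive part of a plan, one workaround is a "covering" index that holds every column the query touches; a hypothetical example:

        -- The composite index contains cust_id, order_date and order_no,
        -- so this query can be answered from the index pages alone,
        -- with no bookmark lookup back to the clustered index:
        CREATE INDEX ix_orders_cover ON orders (cust_id, order_date, order_no)

        SELECT order_no, order_date
        FROM orders
        WHERE cust_id = 42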
