How to rewrite a LEFT OUTER JOIN query?

  • A lot of answers are mentioning that LEFT OUTER JOINs are slow and sometimes indeed they are.

    But what if we really have to write a query like this:

    SELECT A.some_fields, B.some_fields

    FROM A

    LEFT OUTER JOIN B

    ON A.f1=B.f1 AND func(A.something)=true

    is there any other way to write this query in order to make it as fas as the same query with INNER JOIN could be?

    Usually this kind of query could be faster than the first one :

    SELECT A.some_fields, B.some_fields

    FROM (SELECT some_fields FROM A WHERE func(A.something)=true)

    LEFT OUTER JOIN B

    ON A.f1=B.f1

    But it's still not as fast as the equivalent INNER JOIN query.

    Any help would be greatly appreciated - I have to write tons of queries based on LEFT OUTER JOINs .... and in 90% of them I can't use an INNER JOIN ... :-((

    By the way, the theory says that any query containing sub-queries could be re-written without subqueries and we are strongly advised to do so in order to improve performance. But for god's sake, how could we re-write this second query without subqueries and make it even faster?

    Thank you again for any help!

  • SELECT A.some_fields, B.some_fields

    FROM A

    LEFT OUTER JOIN B ON A.f1=B.f1

    WHERE func(A.something)=true

    Clause "func(A.something)=true" is big NO-NO in queries.

    Results in table scan every time.

    Rethink your query to avoid such calls.

    _____________
    Code for TallyGenerator

  • Sergiy (10/17/2007)


    SELECT A.some_fields, B.some_fields

    FROM A

    LEFT OUTER JOIN B ON A.f1=B.f1

    WHERE func(A.something)=true

    Clause "func(A.something)=true" is big NO-NO in queries.

    Results in table scan every time.

    Rethink your query to avoid such calls.

    So what if you have an index on the column, would it not just be an index scan, which isn't that bad?

  • Max Yasnytskyy (10/17/2007)


    So what if you have an index on the column, would it not just be an index scan, which isn't that bad?

    Index contains values from the column.

    And it's not the values in the column you evaluate in such WHERE clause.

    There is no way index can help here.

    So, it will be table scan anyway (or clustered index scan - it's effectively the same because leaf level of clustered index contains entire table).

    _____________
    Code for TallyGenerator

  • Depends... index scans can be worse than a table scan because even after the index scan, it still has to get the data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Just being picky, but...

    There's no such thing as a LEFT JOIN being 'equivalent' to an INNER JOIN.

    Those are quite different, different questions asking for different answers and works in different ways.

    Also, LEFT JOINS per se aren't inherently slower than anything else.

    For a given individual query, LEFT JOIN may perform better than a NOT EXISTS/EXISTS or correlated subquery construct, for others not. And over time, what used to be 'best' may change when your data grows or it's density changes. Databases are living breathing things, and what is 'good' today may be 'bad' tomorrow.

    Query-tuning and index strategies is an evergoing process as time passes. (for most db's anyway)

    In this case, the function seems to be 'the bad guy'.

    Try to write your query without the function, incorporate it's functionalty into the query directly and see if it improves?

    /Kenneth

  • I was a little misunderstood - func(A.something) is not an UDF ... just something like A.some_field = a_value or something like that, I know that UDF are not a very good idea in a WHERE clause ... 😉

  • System functions are as bad here as UDFs.

    But it does not mean you cannot use functions in WHERE clause at all.

    You should not use it to modify values in table columns.

    _____________
    Code for TallyGenerator

  • I have the same problem with a query which has many LEFT JOINs. And for some LEFT JOINs I have bad seek index. For example:

    SELECT s.nrintstoc, s.tipdoc FROM

    stocuri s

    left join nir n on s.nrintdoc=n.nrintnir and s.tipdoc='NR'

    left join bon b on s.nrintdoc=b.nrintbon and s.tipdoc in ('BP', 'bc', 'BT', 'BA')

    From Execution Plan I deducted that I have problems on seeking column tipdoc - char(2). nrintdoc, nrintnir and nrinbon are integers (int).

    How to rethink this query or structure of tables?

    In Theory, theory and practice are the same...In practice, they are not.
  • Is the 'problem column' indexed in any way?

    If it is, is the statistics up to date?

    How many rows are expected to have this value? ('NR' was it?)

    If it's too common (ie too low density) then an index won't be used anyway, because in such cases a scan is cheaper than repeatedly accessing the index.

    /Kenneth

  • Yes, i already update statistics for stocuri table and have indexes on pairs of indexes like nrintdoc and tipdoc and nothing changes.

    In Theory, theory and practice are the same...In practice, they are not.
  • Sorin,

    Your query returns columns only from table "s".

    Why you need those joins at all?

    _____________
    Code for TallyGenerator

  • I put only few columns, you right, I have many columns to return, but here I show you only involving columns in this bad issue.

    Also, the count of values for tipdoc are the following:

    'NR' - 248.000

    'BP' - 51.000

    'bc' - 401

    'BT' - 567.000

    'BA' - 0

    In Theory, theory and practice are the same...In practice, they are not.
  • Well, according to these counts (they are the complete table?) it looks like your qyery is hitting just about every row. I'd expect this table to be scanned, since you're obviously matching the entire table anyway.

    The performance will then be dictated on how fast your drives are (for the scan) and how large your data cahce is (RAM on the server)

    btw, how slow is 'slow'..? What's the current timings that you want to improve.

    Also, what is the total rowcount returned from this query?

    /Kenneth

  • I tested the execution of stored proc which contains this query on my workstation (1GB RAM) and time is about 5 minutes and returns about 180 rows.

    Also, rowcounts for tables are :

    stocuri - 870.000

    bon - 650.000

    nir - 92.000

    When it is executed on production server (4 processors and 4 GB RAM but spindles are slow - 7200 rpm) the execution is very slow according with concurential access of db server (about 250 users) - about 20 minutes or slowest.

    In Theory, theory and practice are the same...In practice, they are not.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply