How to rewrite a LEFT OUTER JOIN query?

Question

Post reply

How to rewrite a LEFT OUTER JOIN query?

virgilrucsandescu

SSCrazy

Points: 2457
More actions
October 17, 2007 at 8:39 pm

#176270

A lot of answers are mentioning that LEFT OUTER JOINs are slow and sometimes indeed they are.

But what if we really have to write a query like this:

SELECT A.some_fields, B.some_fields

FROM A

LEFT OUTER JOIN B

ON A.f1=B.f1 AND func(A.something)=true

is there any other way to write this query in order to make it as fas as the same query with INNER JOIN could be?

Usually this kind of query could be faster than the first one :

SELECT A.some_fields, B.some_fields

FROM (SELECT some_fields FROM A WHERE func(A.something)=true)

LEFT OUTER JOIN B

ON A.f1=B.f1

But it's still not as fast as the equivalent INNER JOIN query.

Any help would be greatly appreciated - I have to write tons of queries based on LEFT OUTER JOINs .... and in 90% of them I can't use an INNER JOIN ... :-((

By the way, the theory says that any query containing sub-queries could be re-written without subqueries and we are strongly advised to do so in order to improve performance. But for god's sake, how could we re-write this second query without subqueries and make it even faster?

Thank you again for any help!

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Sergiy SSC Guru Points: 110192 More actions · Answer 1

Sergiy

SSC Guru

Points: 110192

October 17, 2007 at 9:42 pm

#742974

SELECT A.some_fields, B.some_fields

FROM A

LEFT OUTER JOIN B ON A.f1=B.f1

WHERE func(A.something)=true

Clause "func(A.something)=true" is big NO-NO in queries.

Results in table scan every time.

Rethink your query to avoid such calls.

_____________
Code for TallyGenerator

Max Yasnytsky SSCrazy Points: 2462 More actions · Answer 2

Sergiy (10/17/2007)

SELECT A.some_fields, B.some_fields

FROM A

LEFT OUTER JOIN B ON A.f1=B.f1

WHERE func(A.something)=true

Clause "func(A.something)=true" is big NO-NO in queries.

Results in table scan every time.

Rethink your query to avoid such calls.

So what if you have an index on the column, would it not just be an index scan, which isn't that bad?

Sergiy SSC Guru Points: 110192 More actions · Answer 3

Max Yasnytskyy (10/17/2007)

So what if you have an index on the column, would it not just be an index scan, which isn't that bad?

Index contains values from the column.

And it's not the values in the column you evaluate in such WHERE clause.

There is no way index can help here.

So, it will be table scan anyway (or clustered index scan - it's effectively the same because leaf level of clustered index contains entire table).

_____________
Code for TallyGenerator

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

Depends... index scans can be worse than a table scan because even after the index scan, it still has to get the data.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 5

Just being picky, but...

There's no such thing as a LEFT JOIN being 'equivalent' to an INNER JOIN.

Those are quite different, different questions asking for different answers and works in different ways.

Also, LEFT JOINS per se aren't inherently slower than anything else.

For a given individual query, LEFT JOIN may perform better than a NOT EXISTS/EXISTS or correlated subquery construct, for others not. And over time, what used to be 'best' may change when your data grows or it's density changes. Databases are living breathing things, and what is 'good' today may be 'bad' tomorrow.

Query-tuning and index strategies is an evergoing process as time passes. (for most db's anyway)

In this case, the function seems to be 'the bad guy'.

Try to write your query without the function, incorporate it's functionalty into the query directly and see if it improves?

/Kenneth

virgilrucsandescu SSCrazy Points: 2457 More actions · Answer 6

virgilrucsandescu

SSCrazy

Points: 2457

October 18, 2007 at 7:57 am

#743131

I was a little misunderstood - func(A.something) is not an UDF ... just something like A.some_field = a_value or something like that, I know that UDF are not a very good idea in a WHERE clause ... 😉

Sergiy SSC Guru Points: 110192 More actions · Answer 7

Sergiy

SSC Guru

Points: 110192

October 18, 2007 at 1:58 pm

#743281

System functions are as bad here as UDFs.

But it does not mean you cannot use functions in WHERE clause at all.

You should not use it to modify values in table columns.

_____________
Code for TallyGenerator

Sorin Petcu Hall of Fame Points: 3145 More actions · Answer 8

I have the same problem with a query which has many LEFT JOINs. And for some LEFT JOINs I have bad seek index. For example:

SELECT s.nrintstoc, s.tipdoc FROM

stocuri s

left join nir n on s.nrintdoc=n.nrintnir and s.tipdoc='NR'

left join bon b on s.nrintdoc=b.nrintbon and s.tipdoc in ('BP', 'bc', 'BT', 'BA')

From Execution Plan I deducted that I have problems on seeking column tipdoc - char(2). nrintdoc, nrintnir and nrinbon are integers (int).

How to rethink this query or structure of tables?

In Theory, theory and practice are the same...In practice, they are not.

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 9

Kenneth Wilhelmsson

SSC-Dedicated

Points: 30043

October 19, 2007 at 2:08 am

#743450

Is the 'problem column' indexed in any way?

If it is, is the statistics up to date?

How many rows are expected to have this value? ('NR' was it?)

If it's too common (ie too low density) then an index won't be used anyway, because in such cases a scan is cheaper than repeatedly accessing the index.

/Kenneth

Sorin Petcu Hall of Fame Points: 3145 More actions · Answer 10

Sorin Petcu

Hall of Fame

Points: 3145

October 19, 2007 at 2:19 am

#743453

Yes, i already update statistics for stocuri table and have indexes on pairs of indexes like nrintdoc and tipdoc and nothing changes.

In Theory, theory and practice are the same...In practice, they are not.

Sergiy SSC Guru Points: 110192 More actions · Answer 11

Sergiy

SSC Guru

Points: 110192

October 19, 2007 at 2:20 am

#743454

Sorin,

Your query returns columns only from table "s".

Why you need those joins at all?

_____________
Code for TallyGenerator

Sorin Petcu Hall of Fame Points: 3145 More actions · Answer 12

I put only few columns, you right, I have many columns to return, but here I show you only involving columns in this bad issue.

Also, the count of values for tipdoc are the following:

'NR' - 248.000

'BP' - 51.000

'bc' - 401

'BT' - 567.000

'BA' - 0

In Theory, theory and practice are the same...In practice, they are not.

Kenneth Wilhelmsson SSC-Dedicated Points: 30043 More actions · Answer 13

Well, according to these counts (they are the complete table?) it looks like your qyery is hitting just about every row. I'd expect this table to be scanned, since you're obviously matching the entire table anyway.

The performance will then be dictated on how fast your drives are (for the scan) and how large your data cahce is (RAM on the server)

btw, how slow is 'slow'..? What's the current timings that you want to improve.

Also, what is the total rowcount returned from this query?

/Kenneth

Sorin Petcu Hall of Fame Points: 3145 More actions · Answer 14

I tested the execution of stored proc which contains this query on my workstation (1GB RAM) and time is about 5 minutes and returns about 180 rows.

Also, rowcounts for tables are :

stocuri - 870.000

bon - 650.000

nir - 92.000

When it is executed on production server (4 processors and 4 GB RAM but spindles are slow - 7200 rpm) the execution is very slow according with concurential access of db server (about 250 users) - about 20 minutes or slowest.

In Theory, theory and practice are the same...In practice, they are not.