Re: EXCEPT vs. NOT EXISTS

Question

Post reply

Re: EXCEPT vs. NOT EXISTS

jlp3630

Hall of Fame

Points: 3243
More actions
October 8, 2008 at 9:06 am

#382991

Are there any performance differences between EXCEPT and NOT EXISTS?

Currently, I'm using the following "simplified" query (omitting a bunch of joins):

DECLARE @data_date DATETIME;

DECLARE @compare_date DATETIME;

SET @compare_date = DATEADD(week, -2, @data_date);

SELECT COUNT(DISTINCT P.pk_product_id)

FROM tbl_products

WHERE analytic_date = @data_date

AND NOT EXISTS (SELECT P2.pk_product_id

FROM tbl_products P2

WHERE analytic_date = @compare_date)

Are there any advantages/disadvantages to switching this to an EXCEPT query?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

The Dixie Flatline SSC Guru Points: 53353 More actions · Answer 1

Well, to start with the outputs are different. Exists returns a true/false value, whereas except returns a set of 0-n rows. Why would you want to change what you have, if it's working?

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Jack Corbett SSC Guru Points: 184388 More actions · Answer 2

The best way to find out is to test it. Write the query using both methods and check the execution plan and execution statistics.

These following queries all return the same data with the NOT EXISTS and EXCEPT returning the same execution plan while the LEFT JOIN (which has been my preferred method in these situations) has a different and more expensive plan:

[font="Courier New"]USE AdventureWorks;



GO

SELECT

   P.ProductID

FROM

   Production.Product P LEFT JOIN

   Production.WorkOrder W ON

       W.productid = P.productid

WHERE

   W.productid IS NULL ;

USE AdventureWorks;

GO

SELECT

   ProductID

FROM

   Production.Product P

WHERE

   NOT EXISTS (SELECT

                   ProductID

               FROM

                   Production.WorkOrder W

               WHERE

                   W.productid = P.productid) ;

USE AdventureWorks;

GO

SELECT

   ProductID

FROM

   Production.Product

EXCEPT

SELECT

   ProductID

FROM

   Production.WorkOrder ;

[/font]

Execution Plan is attached.

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

jlp3630 Hall of Fame Points: 3243 More actions · Answer 3

bhovious (10/8/2008)

Why would you want to change what you have, if it's working?

I'm looking to change it because the "non-simplified" query is running slower than desired. I'm trying to systematically go through all possible scenarios to rewrite the query for better performance. Before I started rewriting the query, I wanted to ask if using EXCEPT would produce performance gains.

Jack Corbett SSC Guru Points: 184388 More actions · Answer 4

jlp3630 (10/8/2008)

Are there any performance differences between EXCEPT and NOT EXISTS?

Currently, I'm using the following "simplified" query (omitting a bunch of joins):

DECLARE @data_date DATETIME;

DECLARE @compare_date DATETIME;

SET @compare_date = DATEADD(week, -2, @data_date);

SELECT COUNT(DISTINCT P.pk_product_id)

FROM tbl_products

WHERE analytic_date = @data_date

AND NOT EXISTS (SELECT P2.pk_product_id

FROM tbl_products P2

WHERE analytic_date = @compare_date)

Are there any advantages/disadvantages to switching this to an EXCEPT query?

I just noticed that you are not providing any link between the sub-query in the NOT EXISTS and I believe you need that. Basically the NOT EXISTS sub-query is returning all the records. If you add this to the WHERE in the sub-query you will probably get the performance you need:



AND

P2.pk_product_id = tbl_products.pk_product_id

Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question

The Dixie Flatline SSC Guru Points: 53353 More actions · Answer 5

Jack's right. Good eye!

For what it's worth, I always use the left join format too. It's simple to remember, fast to execute, and easy to read. I prefer to see all the table relationships grouped together in a few join lines as opposed to a blob of subqueries.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 6

bhovious (10/8/2008)

Jack's right. Good eye!

For what it's worth, I always use the left join format too. It's simple to remember, fast to execute, and easy to read. I prefer to see all the table relationships grouped together in a few join lines as opposed to a blob of subqueries.

If the two tables are relatively small, they're fairly interchangeable. Once one of the tables gets bigger though, it can be a bit of a toss-up as to which is faster, but the differences get to be fairly marked.

From what I've seen, IN performs better when there is a high degree of correlation between the tables; EXISTS tends to work better when they match up less evenly. I like EXCEPT when I need to do "compound" comparisons.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

jlp3630 Hall of Fame Points: 3243 More actions · Answer 7

jlp3630

Hall of Fame

Points: 3243

October 8, 2008 at 11:55 am

#882485

I like EXCEPT when I need to do "compound" comparisons.

What do you mean by "compound" comparisons?

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 8

jlp3630 (10/8/2008)

I like EXCEPT when I need to do "compound" comparisons.

What do you mean by "compound" comparisons?

When I'm trying to find differences across multiple columns.

As in something like



SELECT name, address, phonenumber

from tableA

EXCEPT

SELECT name, address, phonenumber

from tableB

where the comparison will show any difference in any of those columns

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Johan Bijnens SSC Guru Points: 134873 More actions · Answer 9

IMO you should use "where exists (*** correlated subquery)"

EXCEPT works on the full rows columns, so it needs two result sets to compare row by row, column by column.

As Matt stated: this is good when you actually have to compare row by row, column by column

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

The Dixie Flatline SSC Guru Points: 53353 More actions · Answer 10

Left Joins or WHERE NOT EXISTS are the ways to go. Here's why.

I ran a little experiment using the LEFT JOIN technique versus EXCEPT. The code labeled "Simple Compare" below produces close to two million rows for the first table and almost 3 million for the second table. The queries produced identical result sets of just over 1.7 million rows. Both execution plans performed clustered index scans of both tables as feeds to a nested loop. The only difference was that the LEFT JOIN technique added a filter step which tested for a null in the joined table. I ran it several times, sometimes with the Left Join query launching first, sometimes with the Exists query launching first. Time results in ms follow, and as you can see the Join produced comparable, if not better performance:

Run JOIN EXCEPT

1 16445 18316

2 17551 17693

3 17636 19174

4 19416 18155*

5 21262 18843*

6 18104 18475

7 18195 19089

8 18472 19808

9 21420 19558*

10 19094 17460

Next I varied the ID/SUBID combinations in table 2 to produce a smaller number (8000) of output rows from almost two million rows in each table. In this scenario, the Left Join was clearly the loser.

Run JOIN EXCEPT

1 7568 6866

2 7334 6993

3 7366 6769

4 7317 6964

5 7153 6830

So far, it appears that if we're looking for a small number of "exceptions" out of two large tables, using the EXCEPT technique will be consistently faster.

HOWEVER, using the Left Join enables me to immediately make use of all columns in table 1, whether or not they were used for comparison to table 2. By itself, EXCEPT can only output those columns which were for used for comparison. If you want to go back and get additional data, you have to add additional code.

When doing time trials using the code listed under "Variation" below, the Left Join was clearly superior even when the number of exceptions was small (8000 out of two tables with approx 4m rows apiece). You may notice the JOIN seems to be running faster, but it's because it's 6:00 by now and the server has a lesser load on it.

Run JOIN EXCEPT

1 14911 29058

2 14698 28069

3 14669 27782

The Left Join is now running TWICE as fast. Why the sudden reversal? Because the Left Join is still doing the exact same execution plan, just producing a slightly larger output row to include the "filler" column. To get at the filler column using the Except technique, I had to join to the result set produced by the Except, and the execution plan had to add a second nested loop. Since I rarely ever want to look at exceptions without looking at additional data about them, I intend to stick with the Left Join technique.

I'll concede that Where Not Exists is going to run slightly faster because it will get the whole job done with one nested loop and no filter. I still prefer the readability of the left join but I may have to change my evil ways since performance is what matters.

=================================================================

Simple Compare

=================================================================

set nocount on;

set statistics time on;

declare @tab1 table (id int, subID int, filler varchar(20), primary key (id,subID))

declare @tab2 table (id int, subID int, filler varchar(20), primary key (id,subID))

insert into @tab1

select t1.N, t2.N, 'abcedefghijklmno'

from analytics.dbo.tally t1

cross join analytics.dbo.tally t2

where t1.N between 5 and 2000

and t2.N between 1 and 1000

insert into @tab2

select t1.N, t2.N, 'zzzzzzzzzzz'

from analytics.dbo.tally t1

cross join analytics.dbo.tally t2

where t1.N between 1 and 300

and t2.N between 10 and 10000

select 'Tab1', count(*) from @tab1

select 'Tab2', count(*) from @tab2

print '------------------------------JOIN'

select t1.id,t1.subid

from @tab1 t1

left join @tab2 t2 on t2.id = t1.id and t2.subid = t1.subid

where t2.id is null

print '------------------------------EXCEPT'

select ID,subID from @tab1

except

select ID,subID from @tab2

==================================================================

Variation

(includes additional column from table 1)

==================================================================

set nocount on;

set statistics time on;

declare @tab1 table (id int, subID int, filler varchar(20), primary key (id,subID))

declare @tab2 table (id int, subID int, filler varchar(20), primary key (id,subID))

insert into @tab1

select t1.N, t2.N, 'abcedefghijklmno'

from analytics.dbo.tally t1

cross join analytics.dbo.tally t2

where t1.N between 5 and 2000

and t2.N between 1 and 2000

insert into @tab2

select t1.N, t2.N, 'zzzzzzzzzzz'

from analytics.dbo.tally t1

cross join analytics.dbo.tally t2

where t1.N between 1 and 2000

and t2.N between 5 and 2005

select 'Tab1', count(*) from @tab1

select 'Tab2', count(*) from @tab2

print '------------------------------JOIN'

select t1.*

from @tab1 t1

left join @tab2 t2 on t2.id = t1.id and t2.subid = t1.subid

where t2.id is null

print '------------------------------EXCEPT'

select t1.*

from @tab1 t1

join (select ID,subID from @tab1

except

select ID,subID from @tab2) as x

on x.ID = t1.ID and x.subID = t1.SubID

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills