Re: EXCEPT vs. NOT EXISTS

  • Are there any performance differences between EXCEPT and NOT EXISTS?

    Currently, I'm using the following "simplified" query (omitting a bunch of joins):

    DECLARE @data_date DATETIME;

    DECLARE @compare_date DATETIME;

    SET @compare_date = DATEADD(week, -2, @data_date);

    SELECT COUNT(DISTINCT P.pk_product_id)

    FROM tbl_products

    WHERE analytic_date = @data_date

    AND NOT EXISTS (SELECT P2.pk_product_id

    FROM tbl_products P2

    WHERE analytic_date = @compare_date)

    Are there any advantages/disadvantages to switching this to an EXCEPT query?

  • Well, to start with the outputs are different. Exists returns a true/false value, whereas except returns a set of 0-n rows. Why would you want to change what you have, if it's working?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The best way to find out is to test it. Write the query using both methods and check the execution plan and execution statistics.

    These following queries all return the same data with the NOT EXISTS and EXCEPT returning the same execution plan while the LEFT JOIN (which has been my preferred method in these situations) has a different and more expensive plan:

    [font="Courier New"]USE AdventureWorks;

    GO

    SELECT

       P.ProductID

    FROM

       Production.Product P LEFT JOIN

       Production.WorkOrder W ON

           W.productid = P.productid

    WHERE

       W.productid IS NULL ;

    USE AdventureWorks;

    GO

    SELECT

       ProductID

    FROM

       Production.Product P

    WHERE

       NOT EXISTS (SELECT

                       ProductID

                   FROM

                       Production.WorkOrder W

                   WHERE

                       W.productid = P.productid) ;

    USE AdventureWorks;

    GO

    SELECT

       ProductID

    FROM

       Production.Product

    EXCEPT

    SELECT

       ProductID

    FROM

       Production.WorkOrder ;

    [/font]

    Execution Plan is attached.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • bhovious (10/8/2008)


    Why would you want to change what you have, if it's working?

    I'm looking to change it because the "non-simplified" query is running slower than desired. I'm trying to systematically go through all possible scenarios to rewrite the query for better performance. Before I started rewriting the query, I wanted to ask if using EXCEPT would produce performance gains.

  • jlp3630 (10/8/2008)


    Are there any performance differences between EXCEPT and NOT EXISTS?

    Currently, I'm using the following "simplified" query (omitting a bunch of joins):

    DECLARE @data_date DATETIME;

    DECLARE @compare_date DATETIME;

    SET @compare_date = DATEADD(week, -2, @data_date);

    SELECT COUNT(DISTINCT P.pk_product_id)

    FROM tbl_products

    WHERE analytic_date = @data_date

    AND NOT EXISTS (SELECT P2.pk_product_id

    FROM tbl_products P2

    WHERE analytic_date = @compare_date)

    Are there any advantages/disadvantages to switching this to an EXCEPT query?

    I just noticed that you are not providing any link between the sub-query in the NOT EXISTS and I believe you need that. Basically the NOT EXISTS sub-query is returning all the records. If you add this to the WHERE in the sub-query you will probably get the performance you need:

    AND

    P2.pk_product_id = tbl_products.pk_product_id

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • Jack's right. Good eye!

    For what it's worth, I always use the left join format too. It's simple to remember, fast to execute, and easy to read. I prefer to see all the table relationships grouped together in a few join lines as opposed to a blob of subqueries.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • bhovious (10/8/2008)


    Jack's right. Good eye!

    For what it's worth, I always use the left join format too. It's simple to remember, fast to execute, and easy to read. I prefer to see all the table relationships grouped together in a few join lines as opposed to a blob of subqueries.

    If the two tables are relatively small, they're fairly interchangeable. Once one of the tables gets bigger though, it can be a bit of a toss-up as to which is faster, but the differences get to be fairly marked.

    From what I've seen, IN performs better when there is a high degree of correlation between the tables; EXISTS tends to work better when they match up less evenly. I like EXCEPT when I need to do "compound" comparisons.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I like EXCEPT when I need to do "compound" comparisons.

    What do you mean by "compound" comparisons?

  • jlp3630 (10/8/2008)


    I like EXCEPT when I need to do "compound" comparisons.

    What do you mean by "compound" comparisons?

    When I'm trying to find differences across multiple columns.

    As in something like

    SELECT name, address, phonenumber

    from tableA

    EXCEPT

    SELECT name, address, phonenumber

    from tableB

    where the comparison will show any difference in any of those columns

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • IMO you should use "where exists (*** correlated subquery)"

    EXCEPT works on the full rows columns, so it needs two result sets to compare row by row, column by column.

    As Matt stated: this is good when you actually have to compare row by row, column by column

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Left Joins or WHERE NOT EXISTS are the ways to go. Here's why.

    I ran a little experiment using the LEFT JOIN technique versus EXCEPT. The code labeled "Simple Compare" below produces close to two million rows for the first table and almost 3 million for the second table. The queries produced identical result sets of just over 1.7 million rows. Both execution plans performed clustered index scans of both tables as feeds to a nested loop. The only difference was that the LEFT JOIN technique added a filter step which tested for a null in the joined table. I ran it several times, sometimes with the Left Join query launching first, sometimes with the Exists query launching first. Time results in ms follow, and as you can see the Join produced comparable, if not better performance:

    Run JOIN EXCEPT

    1 16445 18316

    2 17551 17693

    3 17636 19174

    4 19416 18155*

    5 21262 18843*

    6 18104 18475

    7 18195 19089

    8 18472 19808

    9 21420 19558*

    10 19094 17460

    Next I varied the ID/SUBID combinations in table 2 to produce a smaller number (8000) of output rows from almost two million rows in each table. In this scenario, the Left Join was clearly the loser.

    Run JOIN EXCEPT

    1 7568 6866

    2 7334 6993

    3 7366 6769

    4 7317 6964

    5 7153 6830

    So far, it appears that if we're looking for a small number of "exceptions" out of two large tables, using the EXCEPT technique will be consistently faster.

    HOWEVER, using the Left Join enables me to immediately make use of all columns in table 1, whether or not they were used for comparison to table 2. By itself, EXCEPT can only output those columns which were for used for comparison. If you want to go back and get additional data, you have to add additional code.

    When doing time trials using the code listed under "Variation" below, the Left Join was clearly superior even when the number of exceptions was small (8000 out of two tables with approx 4m rows apiece). You may notice the JOIN seems to be running faster, but it's because it's 6:00 by now and the server has a lesser load on it.

    Run JOIN EXCEPT

    1 14911 29058

    2 14698 28069

    3 14669 27782

    The Left Join is now running TWICE as fast. Why the sudden reversal? Because the Left Join is still doing the exact same execution plan, just producing a slightly larger output row to include the "filler" column. To get at the filler column using the Except technique, I had to join to the result set produced by the Except, and the execution plan had to add a second nested loop. Since I rarely ever want to look at exceptions without looking at additional data about them, I intend to stick with the Left Join technique.

    I'll concede that Where Not Exists is going to run slightly faster because it will get the whole job done with one nested loop and no filter. I still prefer the readability of the left join but I may have to change my evil ways since performance is what matters.

    =================================================================

    Simple Compare

    =================================================================

    set nocount on;

    set statistics time on;

    declare @tab1 table (id int, subID int, filler varchar(20), primary key (id,subID))

    declare @tab2 table (id int, subID int, filler varchar(20), primary key (id,subID))

    insert into @tab1

    select t1.N, t2.N, 'abcedefghijklmno'

    from analytics.dbo.tally t1

    cross join analytics.dbo.tally t2

    where t1.N between 5 and 2000

    and t2.N between 1 and 1000

    insert into @tab2

    select t1.N, t2.N, 'zzzzzzzzzzz'

    from analytics.dbo.tally t1

    cross join analytics.dbo.tally t2

    where t1.N between 1 and 300

    and t2.N between 10 and 10000

    select 'Tab1', count(*) from @tab1

    select 'Tab2', count(*) from @tab2

    print '------------------------------JOIN'

    select t1.id,t1.subid

    from @tab1 t1

    left join @tab2 t2 on t2.id = t1.id and t2.subid = t1.subid

    where t2.id is null

    print '------------------------------EXCEPT'

    select ID,subID from @tab1

    except

    select ID,subID from @tab2

    ==================================================================

    Variation

    (includes additional column from table 1)

    ==================================================================

    set nocount on;

    set statistics time on;

    declare @tab1 table (id int, subID int, filler varchar(20), primary key (id,subID))

    declare @tab2 table (id int, subID int, filler varchar(20), primary key (id,subID))

    insert into @tab1

    select t1.N, t2.N, 'abcedefghijklmno'

    from analytics.dbo.tally t1

    cross join analytics.dbo.tally t2

    where t1.N between 5 and 2000

    and t2.N between 1 and 2000

    insert into @tab2

    select t1.N, t2.N, 'zzzzzzzzzzz'

    from analytics.dbo.tally t1

    cross join analytics.dbo.tally t2

    where t1.N between 1 and 2000

    and t2.N between 5 and 2005

    select 'Tab1', count(*) from @tab1

    select 'Tab2', count(*) from @tab2

    print '------------------------------JOIN'

    select t1.*

    from @tab1 t1

    left join @tab2 t2 on t2.id = t1.id and t2.subid = t1.subid

    where t2.id is null

    print '------------------------------EXCEPT'

    select t1.*

    from @tab1 t1

    join (select ID,subID from @tab1

    except

    select ID,subID from @tab2) as x

    on x.ID = t1.ID and x.subID = t1.SubID

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply