Avoiding NOT EXISTS or NOT IN

  • I have 2 tables whith the same structure

    and i'want to find all the values from table1 where not exists in table2

    Table1(field1,field2,field3)

    Table2(field1,field2,field3)

    SQL with NOT EXISTS:

    SELECT

    field1,field2,field3

    FROM Table1, Tabl2

    WHERE NOT EXISTS

    (

    SELECT field1,field2,field3

    FROM Table1

    WHERE table1.field1=table2.field1

    AND table1.field2=table2.field2

    AND table1.field3=table2.field3

    )

    AND table1.field1=table2.field1

    how can avoid the NOT EXISTs for performance reasons?

    Thanks

    Bill

  • I prefer joins but don't know if performance is better though.

    SELECT t1.field1,t1.field2,t1.field3 
    
    FROM Table1 t1
    LEFT OUTER JOIN Table2 t2
    ON t2.field1 = t1.field1
    AND t2.field2 = t1.field2
    AND t2.field3 = t1.field3
    WHERE t2.field1 IS NULL

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I don't believe there is a performance difference. The query optimiser should take care of this.

    Bill, I would like to point out that your query is not entirely correct.

    
    
    SELECT field1,field2,field3
    FROM Table1
    WHERE NOT EXISTS
    (
    SELECT Table2.field1,
    Table2.field2,
    Table2.field3
    FROM Table2
    WHERE table1.field1=table2.field1
    AND table1.field2=table2.field2
    AND table1.field3=table2.field3
    )

    Two changes :

    1. Remove the implicit (old style join) from the outer query.

    2. Add explicitly the table qualifier in the NOT EXISTS clause.

    Don't know if this makes any difference, but it is worth a try.

  • Why do you need to do a NOT EXIST or Join queries.

    I too have a similar situation where we create labels file for our subscribers every month and if we need to know who are the new subscribers this month then we have to find out the records in the current lables file which were not in the last months.

    what is do is as follows:

    select * from labels_19_jul_2003 where id not in (select id from labels_19_jun_2003)

    and get what i want it can also work vice versa.

    Hope this helps

    Regards

    Hitendra

  • Maybe not a direct answer to your question but I've used this construction

    SELECT field1,field2,field3

    FROM

    (SELECT field1, field2, field3, 1 as mask FROM Table1

    UNION ALL

    SELECT field1, field2, field3, 2 as mask FROM Table2) AS Tab

    GROUP BY field1, field2, field3 HAVING SUM(tab.mask) = 1

    Performance is most likely not better but this can be used with more tables and

    it is very easy to find records that exists in table1 and table2 but not in table3 (see below)

    SELECT field1,field2,field3

    FROM

    (SELECT field1, field2, field3, 1 as mask FROM Table1

    UNION ALL

    SELECT field1, field2, field3, 2 as mask FROM Table2

    UNION ALL

    SELECT field1, field2, field3, 4 as mask FROM Table3) AS Tab

    GROUP BY field1, field2, field3 HAVING SUM(tab.mask) = 3

    /Fredrik

  • jollyguy4all,

    quote:


    select * from labels_19_jul_2003 where id not in (select id from labels_19_jun_2003)


    I used to do it this way but found that performance can be poor, especially the larger the data set. I found using joins a vast improvement, not always but most of the time.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • hi guys!

    concerning: "outer-joins w/is null" vs. "not in" - to my experience if you've got larger tables the performance of the outer-join w/is null soluting significantly degrades compared to the "not in" solution.

    best regards,

    chris.

  • I believe the performance of either three solutions is highly dependant on the data that is contained in the tables.

    1. The number of records in both tables

    2. The number of records that match in both tables

    E.g. the NOT IN solution probably won't yield good performance if Table2 contains a lot of records.

    The Outer Join and NOT EXISTS solution probably won't yield good performance if both tables have a lot of matching rows.

  • One note. Whenever you do an EXISTS or NOT EXISTS change your subquery to "SELECT 1" instead of referencing cells. May not be a huge gain but results in less data coming back and is always a good practice. At least you aren't doing "SELECT *". This would be especially bad on a wide table.

  • after a few tests the solution with the Left Outer Join (DavidBurrows) yields the best performance compared to NOT EXISTS...

    SELECT t1.field1,t1.field2,t1.field3 FROM Table1 t1 LEFT OUTER JOIN Table2 t2 ON t2.field1 = t1.field1 AND t2.field2 = t1.field2 AND t2.field3 = t1.field3 WHERE t2.field1 IS NULL

    thanks...

    bill

  • When I have to be sure there will not be a performance issue, I do this

    create table #notInTbl2 (id int)

    insert into #notinTbl2 select ID from tbl1

    delete from #notInTbl2 where id in

    (select id from tbl2)

    select id from #notInTbl2

    Four statements in place of one, but SQL Server, as was the case with Ingres before SQL Server, is dependably fast with the positive as contrasted with the negative. The incentive for DBMS vendors has been fast inserts and deletes, not fast selects on "not in/not exists."

    _________________
    "Look, those sheep have been shorn."
    data analyst replies, "On the sides that we can see.."

  • quote:


    One note. Whenever you do an EXISTS or NOT EXISTS change your subquery to "SELECT 1" instead of referencing cells. May not be a huge gain but results in less data coming back and is always a good practice. At least you aren't doing "SELECT *". This would be especially bad on a wide table.


    Please see my topic "EXISTS (SELECT *) or (SELECT 1)"

    From MS's internal training for SQL Server 2000 Programming, it is said that there wouldn't be a performance different using SELECT *. In fact, it is a better idea to use SELECT * than to pick a column inside the tables as the column you pick may not have the appropriate index.

    I think NPeeters replied saying that the Query Optimizer knows that you don't need the data inside an "EXISTS", thus Select * is just fine. I was wondering if this holds true if inside the EXISTS, I'm doing a complicated join of multiple tables, but I guess SELECT * is still good... it is more important to use derived tables inside the joins so to minimize the data set being joined together. (of course need to consider the index too.)

    As for NOT IN / NOT EXISTS, I think that lady (her name is Kimberly Tripp-Simonnet) said to avoid NOT if at all possible because Query Optimizer doesn't work w/ NOTs. Thus I do what DavidBurrows does. LEFT OUTER JOIN and returns where right side is NULLS.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply