Alternative to NOT IN

  • I am trying to speed up a query. I am looking for an alternative to a 'NOT IN'. Example: I have a table with 2 columns-

    IDFromTable1 IDFromTable2

    1 100

    1 102

    1 104

    2 110

    2 102

    2 100

    3 104

    3 100

    3 134

    Now, I can do something like:

    Select IDFromTable1 from Table where IDFromTable2=100 and IDFromTable1 NOT IN (Select IDFromTable1 from Table where IDFromTable2=102)

    The above query should return an IDFromTable1 of 3.

    From what I've read, an alternative might be some type of self join. Any help/suggestions appreciated.

    John

  • A self-join won't really help because you'd still need to exclude the rows with any IDFromTable1 where IDFromTable2 is excluded. If you don't already have one, try putting an index on IDFromTable2. You culd also try using NOT EXISTS rather than NOT IN, e.g.:

    
    
    SELECT IDFromTable1
    FROM [Table] o
    WHERE IDFromTable2 = 100
    AND NOT EXISTS
    (SELECT *
    FROM [Table] i
    WHERE i.IDFromTable1 = o.IDFromTable1
    AND i.IDFromTable2 = 102)

    --Jonathan



    --Jonathan

  • Thanks so much Jonathan. The NOT EXISTS worked great... performance probably 10 times faster and now acceptable.

  • For a left join you could try something like the following...

    
    
    SELECT T.IDFromTable1 , T.IDFromTable2
    FROM Table T
    LEFT JOIN Table T2 ON T.IDFromTable1 = T2.IDFromTable1 AND T2.IDFromTable2 = 102
    WHERE T2.IDFromTable1 IS NULL

    The outer join should be faster than the Not Exists and Not In when you have a large dataset.

    Gary Johnson

    Microsoft Natural Language Group

    DBA, Sr. DB Engineer




    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • quote:


    The outer join should be faster than the Not Exists


    Hmmm, interesting. I thought this and was going to post a join query. But before I did, I tried an example by matching one table with 69,281 rows against another with 321,980 rows, both with primary keys and looking for any row in the first not in the second (using primary key). When I looked at the execution plans I was surprised that they were virtually identical other then the NOT EXISTS does a 'Right Anti Semi Join'. The NOT EXISTS took 5 secs whereas the JOIN took 28 secs. Not the same as John's I know but it does no harm to experiment.

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Thanks for all the posts and information. It's been very informative. After more experimenting, the NOT EXISTS runs about 8 times faster on average. Seems the larger the number of rows in the 'NOT' side has a much bigger impact on the 'NOT IN' than the 'NOT EXISTS'.

    Now it's off to BOL, what the heck is a 'right anti semi join'...

    John

  • I just installed the Google Searchbar and decided to try the feature where you right click on text on a browser page and choose google search. The top ranked result was from MSDN:

    Right Anti Semi Join

    The Right Anti Semi Join logical operator will output each row from the second (bottom) input when a matching row in the first (top) input does not exist. A matching row is defined as a row that satisfies the predicate in the Argument column (if no predicate exists, each row is a matching row).

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply