in and not in question

  • Hey everyone, hopefully this is a simple question, but it has me puzzled!

    I have two tables (lets call them A and B)

    SELECT * from tblA

    where tblA.id in (select id from tblB)

    ^This ^ works fine and allows me to see all the entries in table A which also are in table B

    SELECT * from tblA

    where tblA.id not in (select id from tblB)

    I *thought* ^this^ would allow me to see all the IDs which occur in table A but do not appear in table B

    In my 'real example' :

    Table A has around 10,000 rows, table B has around 500 rows, there is overlap, using the first query I get say 100 results - i.e. 100 records which occur in both tables.

    Using the second query I would expect to obviously get a minimum of 10,000- 500 (if there were perfect overlap, which there is not)

    Instead I get zero!!

    I've tried the same thing on a few tables and with the same results - can anyone shed any light on this for me?!

    Thanks

  • Try this:

    SELECT * from tblA

    where tblA.id not in (select tblB.id from tblB)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I tried that previously I seem to recall, and it didn't work - for the record I tried again:

    SELECT * from tblfirm f

    Where f.firm_id not in (select a.firm_id from tbladvisor a)

    And still zero results - when as I said before there should be thousands :/

  • I believe you have nulls in tblB. If not in encounters null, the whole WHERE is yielding UNKNOWN, thus no records are returned.

    Look at this sample code:

    create table tblA(id int)

    create table tblB(id int)

    go

    --set ansi_nulls off

    insert tblA(id) values (1)

    insert tblA(id) values (2)

    insert tblA(id) values (3)

    insert tblA(id) values (4)

    insert tblA(id) values (5)

    insert tblB(id) values (1)

    insert tblB(id) values (2)

    insert tblB(id) values (3)

    insert tblB(id) values (null)

    insert tblB(id) values (6)

    select id [id exists] from tblA where id in (select id from tblB)

    select id [id doesn't exist] from tblA where id not in (select id from tblB)

    select id [id not 1 2 3] from tblA where id not in (1, 2, 3)

    select id [id in 1 null] from tblA where id not in (1, null)

    select id [id not in 1 null] from tblA where id not in (1, null)

    select id [null id to 0] from tblA where id not in (select isnull(id, 0) from tblB)

    select id [not id in tblB] from tblA where not (id in (select id from tblB))

    go

    drop table tblA

    drop table tblB

    Interesting is that if you set ansi nulls to off

    set ansi_nulls off

    everything will work. Uncomment this statement in code above and run it to verify.

    HTH

    Piotr

    ...and your only reply is slàinte mhath

  • Thanks Piotr Rodak - I tried the example you posted and it makes perfect sense to me now.

    Seems odd that it works fine with 'in' but doesn't with 'not in' - I thought the two would be very similar

    so if I were using a sub query with not in as part of a larger query if I just had

    set ansi_nulls off

    right at the top of the query window the above would work?

    If so, what other consequences might this action result in?

  • Another possibility is using the exists and not exists clause

    in place of your {not in} clause try this:

    select * from tableA a

    where not exists (select null from tableB where id = a.id)

    Dave Novak

  • peitech (3/28/2008)


    Thanks Piotr Rodak - I tried the example you posted and it makes perfect sense to me now.

    Seems odd that it works fine with 'in' but doesn't with 'not in' - I thought the two would be very similar

    so if I were using a sub query with not in as part of a larger query if I just had

    set ansi_nulls off

    right at the top of the query window the above would work?

    If so, what other consequences might this action result in?

    Try this on. When you look at the IN syntax, ONE (or more) of the values have to match for it to be true; when you look at the NOT IN syntax, ALL of the values must not match. Nulls with confuse the NOT IN, since it's a "harder question" so to speak.

    I'd steer clear of turning off ANSI_NULLS, since besides leading to lots of very confusing scenarios, it's marked as "deprecated", meaning the setting is going to be removed in the future and will not work anymore.

    Instead - change your syntax to remove NULL values from the inner SELECT statement. Should be faster too in a fair amount of cases.

    As in

    select id

    from tblA

    where id NOT IN (select id from tblB where ID IS NOT NULL)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt,

    Would not an 'EXISTS' clause be cleaner and more efficient in this situation or not? I am just asking this because I am curious, and I have been seeing your name all over this forum with some very good advice.

    Dave Novak

  • ACtually - I do find that EXISTS often enough works better.

    It's a bit of a "it depends", but I tend towards EXISTS rather than IN unless the "IN" list is very short. The bigger the inner list, the more likely EXISTS will outperform IN (in my experience.)

    Also - because of the correlation - you tend to not have the NULL issue (or you will have explicitly handled it).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks Matt,

    I couldn't shake off my head that this WHERE id IS NOT NULL is the right approach. I know about ansi_nulls off/on more that I'd wish 😉

    Might be interesting to compare execution plans for queries with EXISTS and IN subqueries.

    Piotr

    ...and your only reply is slàinte mhath

  • Just putting a solution that is new to SQL 2005. It works the same way as not in, except nulls do not throw it off.

    select id

    from tblA

    except

    select id

    from tblb

  • Thanks Adam, it's really cool to see the different ways of tackling this problem, before seeing your post I was using the is not null as part of my sub query's where clause, Ill give this method a try a little later on today 🙂

    cheers,

    Rob

  • Adam Haines (3/31/2008)


    Just putting a solution that is new to SQL 2005. It works the same way as not in, except nulls do not throw it off.

    select id

    from tblA

    except

    select id

    from tblb

    This is similar and may in some cases work faster:

    SELECT a.id

    FROM tblA a

    LEFT OUTER JOIN tblB b

    ON b.id = a.id

    WHERE b.id IS NULL

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • If there are any null values in the id field in table B, "not in" will return 0 rows. Since "null" means unknown, you're telling to select all values from table A, except those that match "unknown", which means they might all match, so you get none.

    Solve this by adding "where id is not null" to your sub-query.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • This is similar and may in some cases work faster:

    SELECT a.id

    FROM tblA a

    LEFT OUTER JOIN tblB b

    ON b.id = a.id

    WHERE b.id IS NULL

    Yes it is. This is the method I usually use but I felt like posting some SQL 2005 exclusive code :).

    You will also not that the left outer join is not always faster than using an in clause. As with most things, it depends.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply