not in vs not exists returns a different count

  • Hi

    I have 2 queries that return a different count and can't quite see why

    This query uses not exists and is very quick but returns around 4k rows

    I'm led to believe not exists is a better approach

    select count(*) FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT A

    INNER JOIN [SiriusPink]..CLAIM C

    ON A.claim_id = C.claim_id

    where not exists (--A.base_claim_payment_id NOT in (

    SELECT 1

    FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT AA

    INNER JOIN [SiriusPink]..CLAIM_PAYMENT_ITEM BB

    ON AA.claim_payment_id = BB.claim_payment_id

    INNER JOIN [SiriusPink]..CLAIM CC

    ON AA.claim_id = CC.claim_id

    WHERE CC.version_id < C.version_id

    )

    This query takes uses not in, takes about 30 minutes BUT returns around 55k rows

    The execution plan shows a very inefficient left anti-semi join Nested Loop

    select count(*) FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT A

    INNER JOIN [SiriusPink]..CLAIM C

    ON A.claim_id = C.claim_id

    where A.base_claim_payment_id NOT in (

    SELECT 1

    FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT AA

    INNER JOIN [SiriusPink]..CLAIM_PAYMENT_ITEM BB

    ON AA.claim_payment_id = BB.claim_payment_id

    INNER JOIN [SiriusPink]..CLAIM CC

    ON AA.claim_id = CC.claim_id

    WHERE CC.version_id < C.version_id

    )

    I cannot see what's causing the row count difference

    Any ideas?

    Thanks

    - Damian

  • The 2nd query is wrong:

    select count(*) FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT A

    INNER JOIN [SiriusPink]..CLAIM C

    ON A.claim_id = C.claim_id

    where A.base_claim_payment_id NOT in (

    SELECT 1

    FROM [DW_Enterprise]..E_SIP_CLAIM_PAYMENT AA

    INNER JOIN [SiriusPink]..CLAIM_PAYMENT_ITEM BB

    ON AA.claim_payment_id = BB.claim_payment_id

    INNER JOIN [SiriusPink]..CLAIM CC

    ON AA.claim_id = CC.claim_id

    WHERE CC.version_id < C.version_id

    )


    Alex Suprun

  • Some DDL for the tables and some sample data would help us more certainly answer that.

    Having said that, unless something got lost in copy/paste translation, it looks like there are a couple issues.

    The first is that your NOT IN subquery is returning a result set consisting entirely of rows with a column containing only the value 1, so it's effectively just saying return all rows where the base_claim_payment_id isn't 1. I think that's clearly not what you're wanting to do 🙂

    The second is that the NOT IN query makes it look like you're wanting to somehow relate the base_claim_payment_id from the outer query to something in the inner query, but the NOT EXISTS subquery doesn't at all tie base_claim_payment_id from the outer query to anything in the inner query.

    If you mocked up some sample data and gave us DDL for the tables, along with the sample data in the form of INSERTs and your desired results, we'd be able to help out more.

    Cheers!

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply