Any way to use "not in" with multiple columns?


Greg J

June 21, 2006 at 12:01 pm

#114560

I currently have:

INSERT INTO table1
SELECT t2.colA, t2.colB, t2.colC, t2.colD, t2.colE
FROM table2 t2
LEFT OUTER JOIN table1 t1
    ON t1.colC = t2.colC
    AND t1.colD = t2.colD
WHERE t1.colX IS NULL

I'd like to get a performance comparison between that and something like:

INSERT INTO table1
SELECT colA, colB, colC, colD, colE
FROM table2
WHERE colC, colD NOT IN (SELECT colC, colD FROM table1)

I'm dealing with around 5 million rows in each of t1 and t2. The columns being compared are CHAR(3). colX in the first query is an arbitrary column of table1. Anyone? Anyone? Other ideas entirely? Thanks...


JeffB Hall of Fame Points: 3541 More actions · Answer 1


June 21, 2006 at 1:10 pm

#645127

In my experience, OUTER JOINs generally perform better than NOT IN.

I don't believe T-SQL lets you list more than one column on the left side of NOT IN, as you are trying to do.

PW-201837 SSC-Insane Points: 20805 More actions · Answer 2


June 21, 2006 at 1:15 pm

#645129

For NOT IN with multiple join columns, you need a correlated sub-query, which means you need NOT EXISTS:

INSERT INTO table1
SELECT colA, colB, colC, colD, colE
FROM table2 AS t2
WHERE NOT EXISTS (
    SELECT *
    FROM table1 AS t1
    WHERE t1.colC = t2.colC
      AND t1.colD = t2.colD
)

stax68 SSChampion Points: 11711 More actions · Answer 3

The only way to do it as a single NOT IN subquery would be to concatenate the columns. But unless you use an indexed computed column, and probably even then, this would perform worse than an outer join.
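As a rough illustration of the concatenation approach described above, a single NOT IN over the combined columns might look like the sketch below. Table and column names follow the original question; the '|' delimiter and the IS NOT NULL filter are assumptions added here, not part of the suggestion itself. With fixed-width CHAR(3) columns the delimiter is not strictly needed, but it keeps the comparison unambiguous if the columns ever become variable-length.

```sql
-- Sketch only: rows of table2 whose (colC, colD) pair does not appear in table1.
SELECT t2.colA, t2.colB, t2.colC, t2.colD, t2.colE
FROM table2 AS t2
WHERE t2.colC + '|' + t2.colD NOT IN (
    SELECT t1.colC + '|' + t1.colD
    FROM table1 AS t1
    -- Assumption: skip NULL keys. A single NULL in the list would make
    -- NOT IN evaluate to UNKNOWN for every row and return nothing.
    WHERE t1.colC IS NOT NULL
      AND t1.colD IS NOT NULL
);
```

Because both sides of the comparison are expressions rather than plain columns, an ordinary index on (colC, colD) is unlikely to be used for a seek here. Rows in table2 with a NULL in colC or colD are also not returned by this form.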

So you should use an outer join, specifying in the where clause that a join column on the outer table IS NULL:

select t1.c1, t1.c2
from t1 left join t2 on t1.c3 = t2.c3 and t1.c4 = t2.c4
where t2.c3 is null

Tim Wilkinson

"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
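To check that the outer join and NOT EXISTS forms discussed so far pick the same rows, a small self-contained test along these lines may help. The table variables @table1 and @table2 and their sample values are hypothetical stand-ins for the real tables, reduced to the two compared columns.

```sql
-- Hypothetical miniature versions of table1 and table2.
DECLARE @table1 TABLE (colC CHAR(3), colD CHAR(3));
DECLARE @table2 TABLE (colC CHAR(3), colD CHAR(3));

INSERT INTO @table1 (colC, colD) VALUES ('AAA', 'BBB');
INSERT INTO @table2 (colC, colD) VALUES ('AAA', 'BBB');  -- already present in @table1
INSERT INTO @table2 (colC, colD) VALUES ('AAA', 'ZZZ');  -- colC matches, colD does not

-- Outer join, keeping only the rows that found no match.
SELECT t2.colC, t2.colD
FROM @table2 AS t2
LEFT OUTER JOIN @table1 AS t1
    ON t1.colC = t2.colC
   AND t1.colD = t2.colD
WHERE t1.colC IS NULL;

-- Correlated NOT EXISTS over the same pair of columns.
SELECT t2.colC, t2.colD
FROM @table2 AS t2
WHERE NOT EXISTS (
    SELECT *
    FROM @table1 AS t1
    WHERE t1.colC = t2.colC
      AND t1.colD = t2.colD
);
```

Both queries return the single row ('AAA', 'ZZZ'): the pair has to match on both columns to count as already present.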

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

I agree with JeffB... outer join, search for the nulls... haven't tested for speed but seems that it would be faster than either the correlated subquery or the concatenate method... could be wrong, though.

--Jeff Moden


PW-201837 SSC-Insane Points: 20805 More actions · Answer 5


June 22, 2006 at 8:30 am

#645271

>>haven't tested for speed but seems that it would be faster than either the correlated subquery

I generally get the same execution plan from NOT EXISTS as from a LEFT JOIN with an IS NULL check. I tend to use NOT EXISTS so that the intent of the query is still obvious a year from now, when I have to maintain or modify it.
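One way to compare the two forms on your own data is to run them back to back with I/O and timing statistics switched on, then look at the messages output and the execution plans. This is only a sketch using the table names from the original question; it runs the SELECT part alone so that nothing is inserted during the comparison.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant 1: LEFT JOIN with an IS NULL check.
SELECT t2.colA, t2.colB, t2.colC, t2.colD, t2.colE
FROM table2 AS t2
LEFT OUTER JOIN table1 AS t1
    ON t1.colC = t2.colC
   AND t1.colD = t2.colD
WHERE t1.colC IS NULL;

-- Variant 2: NOT EXISTS.
SELECT t2.colA, t2.colB, t2.colC, t2.colD, t2.colE
FROM table2 AS t2
WHERE NOT EXISTS (
    SELECT *
    FROM table1 AS t1
    WHERE t1.colC = t2.colC
      AND t1.colD = t2.colD
);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the logical reads and the plans come out the same for both variants, the choice between them comes down to readability, as described above.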

stax68 SSChampion Points: 11711 More actions · Answer 6


June 22, 2006 at 10:36 am

#645319

Bold assertion: a left anti-join (LEFT JOIN ... IS NULL) never performs worse than the equivalent NOT EXISTS...


Greg J SSCarpal Tunnel Points: 4288 More actions · Answer 7


June 22, 2006 at 2:07 pm

#645377

Thanks, fellas. I'm actually getting the quickest times using the concatenation with a NOT IN clause. Could be a fluke, but I made sure to run DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS first.

I'll revisit the perf testing tomorrow and post the times and table defs.
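A cold-cache timing run of the kind described here could be sketched as follows. This is an assumed harness, not the one actually used in the thread: the temp table #new_rows and the variable @start are hypothetical names, and the DBCC commands affect the whole instance, so this belongs on a test server only.

```sql
-- Test server only: these commands flush caches for the entire instance.
CHECKPOINT;              -- write dirty pages to disk so the buffers can be dropped
DBCC DROPCLEANBUFFERS;   -- empty the data cache
DBCC FREEPROCCACHE;      -- discard cached execution plans

DECLARE @start DATETIME;
SET @start = GETDATE();

-- Query under test: here the NOT EXISTS variant, written into a
-- hypothetical temp table so that table1 itself is left unchanged.
SELECT t2.colA, t2.colB, t2.colC, t2.colD, t2.colE
INTO #new_rows
FROM table2 AS t2
WHERE NOT EXISTS (
    SELECT *
    FROM table1 AS t1
    WHERE t1.colC = t2.colC
      AND t1.colD = t2.colD
);

SELECT DATEDIFF(ms, @start, GETDATE()) AS elapsed_ms;

DROP TABLE #new_rows;
```

Repeating the cache flush before each variant, and running each variant several times, should make it easier to tell a real difference from a fluke.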