Where Clause

  • I have two tables in my database. The first table has rows and the second

    table is empty.

    I want to return all rows from t1 where the t1.id is not existing from t2,

    when I try this statement

    SELECT t1.* FROM t1, t2 Where t1.id <> t2.id no result was returned to me. I

    know that I could use

    SELECT t1.* FROM t1 WHERE t1.id NOT EXIST (SELECT t2.id FROM t2) but i don't

    like to use

    the NOT EXIST clause. Is there any other way in doing this?

    aldo

  • Instead of the EXISTS clause, have you considered the IN clause.

    SELECT t1.*

    FROM t1

    WHERE t1.id not in(SELECT t2.id FROM t2)

  • Another way which is set based is to use a LEFT JOIN

    SELECT t1.*

    FROM

    t1

    LEFT JOIN

    t2

    WHERE t1.id = t2.id

    WHERE

    t2.id IS NULL

    With the left join when there is no match the output will generate a NULL.

    This also has a better performance since only one data read occurrs to t2 as opposed to an NOT IN or NOT EXISTS subquery which does for each row output by T1.

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • This method is great! LEFT OUTER JOIN, when there's no join on the right side, NULL is returned... no need to use <> or NOT.

    Slight mistake on the Left Join though...

    SELECT t1.*

    FROM t1

    LEFT OUTER JOIN t2

    ON t1.id = t2.id -- ON instead of WHERE

    WHERE t2.id IS NULL

    BTW aldo, this is the new syntax, u can also write it the old way:

    SELECT t1.*

    FROM t1,t2

    WHERE t1.id *= t2.id

    AND t2.id IS NULL

  • Just a comment on the NOT EXISTS clause. You might want to reevaluate how SQL 2000 optimizes this clause; it is much more intelligent and can understand what you are attempting to do with this clause. Often times, it will generate query plans that are identical to the plans for other approaches that are usually considered more economical.

    Additionally, you need to be careful with the LEFT OUTER JOIN approach. Although it is a very popular approach, you need to make sure that you check for NULL values in the correct columns, which the examples in this thread illustrate; namely, you need to check for a NULL value in one of the JOIN columns of the outer table. If you check for a NULL value in a column that is not one of the columns used for joining the tables, you may experience errant results; so, be cautious on that matter.

    Finally, you may experience better results using the NOT EXISTS statement. Consider this:

    With NOT EXISTS, the query optimizer can throw out records that don't meet the specified criteria as it's doing a join between the two tables. For example, the query optimizer will likely use a NESTED LOOP JOIN operation, first selecting from the inner table, then stepping through each row and using (hopefully - if you've done your indexing correctly) an INDEX SEEK to try and select matching rows from the OUTER TABLE. If it finds a matching row, then it knows that the row from the INNER TABLE does not meet the NOT EXISTS criteria and it can immediately remove that row from the result set. Once it has gone through this join, it has found all of the correct results and can return them immediately (unless you've added additional conditions that need to be filtered, aggregations, ordering, or other query operations).

    Now consider the LEFT OUTER JOIN approach. It too will select data from the INNER TABLE, and then it too could very likely use a NESTED LOOP JOIN, just like the query using NOT EXISTS. In this case, though, it will keep all of the data from the INNER TABLE, and join all matching rows of the OUTER TABLE. Then, after it has completed the JOIN, it will go through an additional step, working on that result set, and filtering the results based on the criteria that a specific column is NULL. So, there is an additional step involved. Consequently, this may yield slower results than using NOT EXISTS.

    Here's an example that - at least on my SQL 2000 server - illustrates this. Look at the query plans for the two SELECT queries at the end:

    
    
    CREATE TABLE Test1
    (a int not null)
    
    
    CREATE TABLE Test2
    (a int not null)
    
    
    INSERT INTO Test1 VALUES (1)
    INSERT INTO Test1 VALUES (2)
    INSERT INTO Test1 VALUES (3)
    INSERT INTO Test1 VALUES (4)
    
    
    INSERT INTO Test2 VALUES (1)
    INSERT INTO Test2 VALUES (2)
    INSERT INTO Test2 VALUES (3)
    
    
    CREATE INDEX Test1_idx ON Test1 (a)
    CREATE INDEX Test2_idx ON Test2 (a)
    
    
    SELECT *
    FROM Test1
    WHERE NOT EXISTS (SELECT *
    FROM Test2
    WHERE Test1.a = Test2.a)
    
    
    SELECT Test1.*
    FROM Test1
    LEFT OUTER JOIN
    Test2
    ON Test1.a = Test2.a
    WHERE Test2.a IS NULL

    Although, with this small amount of data, the performance difference is negligible (a side-by-side evaluation of the query plans shows that they both take the same amount of time), with a large amount of data, that additional FILTER operation could contribute substantially to execution time and CPU utilization.

    Matthew Burr

  • Further more if the t2 can have more than one row for the an id value then you have to use the not exists method. This is an area where you can't say do it one way or another, because it hugely depends on the data involved, the indexing.

    Simon Sabin

    Co-author of SQL Server 2000 XML Distilled

    http://www.amazon.co.uk/exec/obidos/ASIN/1904347088


    Simon Sabin
    SQL Server MVP

    http://sqlblogcasts.com/blogs/simons

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply