Confused About a NULL

  • Alright, I have been writing SQL a long time and never ran across this yet. I have a query that is not acting as expected because of a NULL.

    Select Distinct Location From esi_PricesTemp

    Where Location Not In (Select Distinct GridPoint From esi_PriceHistoryLocations) Order By Location

    There is 296 Records in esi_PriceHistoryLocations and 219 in esi_PricesTemp.

    When I do a

    Select Distinct Location From esi_PricesTemp Where Location In (Select GridPoint From esi_PriceHistoryLocations) Order By Location

    I get 213 records, which is the 6 records I am expecting from the problem query.

    I have noticed that a GridPoint in the esi_PriceHistoryLocations table has a NULL in it. When I fill the NULL with a value the problem query works fine. I would like to understand why this is working this way.

    Any help will be appreciated.

  • Select Distinct Location From esi_PricesTemp Where Location In (Select GridPoint From esi_PriceHistoryLocations where GridPoint is not NULL ) Order By Location

  • I understand how to fix it, I am curious to the reason the data won't come back with that NULL in there.

  • Your query:

    Select Distinct Location From esi_PricesTemp Where Location In (Select GridPoint From esi_PriceHistoryLocations) Order By Location

    In order for a row from esi_PricesTemp to be included in the resultset, the WHERE clause has to be TRUE (not false, and not NULL) for that row.

    NULL values are unknown. They are not equal to anything, not even themselves. That is why "select count(*) from mytable where null=null" will always return 0.

    In your query, for a row to be included in the result set, SQL Server has to be able to prove that the WHERE clause is true for that row. Since the list of GRIDPOINT values has a NULL in it, this means the complete list is unknown. Asking "is Location X equal to NULL?" always evaluates to UNKNOWN, and *not* to FALSE. This means that SQL Server *cannot* prove that any location is not on the list retrieved by the subquery. Make sense?

    Put another way, suppose the subquery retrieves the values 1, 2, 3, and NULL. Then suppose that SQL Server is trying to decide whether to include a row from esi_PricesTemp where LOCATION=5. Is the value 5 in the set {1, 2, 3, NULL}? The answer is MAYBE, since that NULL (which is unknown) might really be a five. Consider this:

    5 NOT IN (1, 2, 3, NULL)?

    This is the same as

    NOT (5 IN (1, 2, 3, NULL))?

    This is the same as

    NOT ((5 = 1) or (5 = 2) or (5 = 3) or (5 = NULL))

    Which in turn is the same as

    NOT (FALSE or FALSE or FALSE or NULL)

    Which can be reduced to

    NOT (FALSE or NULL)

    Which is the same as

    (NOT FALSE) AND (NOT NULL)

    Which becomes

    TRUE AND NULL

    Which is NULL. But to qualify for inclusion in the rowset, the WHERE clause must evaluate to TRUE. It is not enough for it simply to be different from FALSE.

    So, the short answer to your question is that NULL predicates do *not* satisfy boolean predicates in a WHERE clause, and the presence of a single NULL in a subquery like yours means that no row will ever be returned.

    Sorry to be long-winded, but I hope that one of these approaches makes sense for you.

    Happy New Year!

    Chris

  • One more thing, ccundy, although you probably already know this ...

    You shouldn't code DISTINCT in your subquery when it is being accessed with a set operator like IN, EXISTS, ANY, etc. The DISTINCT keyword doesn't change the meaning of your query, and it's just one more thing the optimizer could get confused about. "In the old days" (yes, I'm an old database guy ... at least some of my team members think so!), doing this could really screw up the access path chosen by the optimizer. For example, the optimizer might see DISTINCT and just assume it had to do a sort, which would be a really bad idea if esi_PriceHistoryLocations.GridPoint was unindexed and the table was large. In such a case, it's best for the optimizer to just rip through the list of values sequentially, until it finds the target value (or a NULL, in this case, as explained in my other post). Today's optimizers should be able to see this formulation and ignore your DISTINCT keyword, but ... why tempt fate? Leave it out.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply