The Semantics of NULL in SQL Server 2008

  • paulm-771594 (8/24/2010)


    I would be interested to know what the impact of using NULL is on the storage space required for a Column. i.e. does a nullable nvarchar column save space over a non-nullable nvarchar column with an empty string in it?

    If you disregard page and row compression (SQL Server 2008), SPARSE columns (also SQL 2008) and vardecimal (introduced in one of SQL Server 2005's service packs and then superceded by row compression), the short answer is:

    * For fixed length data types, there is no difference at all in storage requirements for NULL vs NOT NULL

    * For variable length data types, there is no difference in storage requirements for NULL vs NOT NULL when storing actual data values; when storing NULL values, only as much space is taken is would be needed for a zero length content.

    For instance, a varchar(20) with a string of length 5 takes 7 bytes (5 for the data; 2 for the length), whereas both the zero length string ('') and NULL take 2 bytes (0 for the data; 2 for the length).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (8/24/2010)


    paulm-771594 (8/24/2010)


    I would be interested to know what the impact of using NULL is on the storage space required for a Column. i.e. does a nullable nvarchar column save space over a non-nullable nvarchar column with an empty string in it?

    If you disregard page and row compression (SQL Server 2008), SPARSE columns (also SQL 2008) and vardecimal (introduced in one of SQL Server 2005's service packs and then superceded by row compression), the short answer is:

    * For fixed length data types, there is no difference at all in storage requirements for NULL vs NOT NULL

    * For variable length data types, there is no difference in storage requirements for NULL vs NOT NULL when storing actual data values; when storing NULL values, only as much space is taken is would be needed for a zero length content.

    For instance, a varchar(20) with a string of length 5 takes 7 bytes (5 for the data; 2 for the length), whereas both the zero length string ('') and NULL take 2 bytes (0 for the data; 2 for the length).

    Thank you Hugo - your explanation is very much appreciated!

  • Hugo Kornelis (8/24/2010)


    da-zero (8/24/2010)


    I would've liked a paragraph on the behaviour of NULL in COUNT(*) and COUNT(DISTINCT *).

    There's not much to say about that, as COUNT(*) does not care if values are NULL or not (it simply counts rows), and COUNT(DISTINCT *) is a syntax error.

    In COUNT(columnname), NULL is relevant. If you demand a COUNT(Age) from the patients table, you are not requesting the number of patients in the database, but the number of ages in the database. Since NULL represents missing data, a row with a NULL age does not represent an age in the database. These rows are correctly excluded from the COUNT() result.

    You're right, I didn't think about the syntax 🙂

    What I meant was that the author should've included the COUNT function in the article, since it behaves differently when you specify additional arguments.

    E.g. I would've liked examples of COUNT(*), COUNT(expression), COUNT(DISTINCT expression) and COUNT(ALL expression).

    That would've made the article more comprehensive.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Yes, I find that a lot of the confusion goes away by not calling NULL a "value" but seeing as a state (or state indicator).

  • da-zero (8/24/2010)


    What I meant was that the author should've included the COUNT function in the article, since it behaves differently when you specify additional arguments.

    E.g. I would've liked examples of COUNT(*), COUNT(expression), COUNT(DISTINCT expression) and COUNT(ALL expression).

    That would've made the article more comprehensive.

    I agree. Which is exactly why I included some information on this in my answer.

    To recap:

    COUNT(*) - how many rows?

    COUNT(column) - how many non-NULL values in the specified column?

    COUNT(DISTINCT column) - how many different non-NULL values in the specified column?

    COUNT(ALL column) - same as COUNT(column), since ALL is the default in the [ALL | DISTINCT] syntax.

    COUNT(expression) and COUNT(DISTINCT expression) are basicallly the same as COUNT([DISTINCT] column), except they operate on an expression rather than just a column reference.

    Example:

    If SELECT SomeColumn FROM SomeTable returns the values 1, 2, 2, 3, 5, NULL, 8, then SELECT COUNT(*), COUNT(SomeColumn), COUNT(DISTINCT SomeColumn) will return 7, 6, 5. Because there are 7 rows, 6 of which have a value for SomeColumn, but there are only 5 different values.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Alex-668179 (8/24/2010)


    Yes, I find that a lot of the confusion goes away by not calling NULL a "value" but seeing as a state (or state indicator).

    Agreed. I think that the standardization committee has made a mistake by using the term "null vlaue" and defining it as "a special value to indicate the absence of any data value."

    I think they should have simply used the term "null" and should have defined it as "a marker to indicate the absence of any value".

    That might have countered some of the misconceptions and confusion.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • This is an important topic, but the author unfortunately presents NULL as a 3-value logic value in both tables instead of UNKOWN. Joe Celko has presented this precisely in his article "SQL and the Snare of Three-Valued Logic" at http://www.simple-talk.com/sql/learn-sql-server/sql-and-the-snare-of-three-valued-logic/

    which I recommend as a must-to-read for all SQL programmers.

  • martti.laiho (8/24/2010)


    This is an important topic, but the author unfortunately presents NULL as a 3-value logic value in both tables instead of UNKOWN. Joe Celko has presented this precisely in his article "SQL and the Snare of Three-Valued Logic" at http://www.simple-talk.com/sql/learn-sql-server/sql-and-the-snare-of-three-valued-logic/

    which I recommend as a must-to-read for all SQL programmers.

    I don't endorse this recommendation. There are various errors in his article. Please see my comment below the article (to which Joe has never replied, unfortunately).

    Of course, his explanation how the ANSI standardisation committee based the decision not to implement an IMPLIES operand on completely erroneous decision tables does provide an interesting (and shocking!) insight in how SQL arrived in its current state...

    (Warning: Blatant self-promotion ahead.)

    For some really good reading on NULL, I recommend searching elsewhere. I once did a series of four articles on this subject that I am still pretty proud of. Here are the links:

    Part 1: NULL - The database's black hole

    Part 2: The logic of three-valued logic

    Part 3: Dr. Unknown, or how I learned to stop worrying and love the NULL

    Part 4: What if null if null is null null null is null?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (8/24/2010)


    I rated this article "poor". A lot of the information is correct, and relevant - but there is a very important error, right at the start of the article.

    The author writes:

    A NULL value in SQL Server is (by design) meant to indicate an unknown or undefined value

    And this is absolutely not true. (Though it is indeed described this way in Books Online).

    The implementation of NULL in SQL Server (not only SQL Server 2008!!) follows the ANSI standard. And the ANSI standard defines "the null value" as:

    null value: A special value that is used to indicate the absence of any data value.

    The key words here are "absence of any data value". Or, as I usually describe it: NULL represents missing data.

    NULL does therefore NOT represent unknown data. Of course, the reason that data is missing might be that it is unknown (patient still in coma and unidentified; name, birthdate, etc are all unknown). But it might also be that it is inapplicable (when doing prenatal surgery, the patient has no name or birthday yet). Or it might be known, but totally irrelevant (there are good reasons to register the number of children a female patients has put into the world; for male patients, it's totally irrelevant in a medical database).

    Oh jeez.... here we go with bloody semantics again. If data is missing or absent, what value is it, Hugo? The answer is UNKNOWN because you don't know!

    I can't believe you rated an article as "poor" because of opinionated semantics such as this.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thank you Adolpho - good article that showed me a new trick, the ISNULL( ) with multiple parameters! I had not seen that before, but could prove quite handy in writing ad-hoc queries.

    I see a lot of the other commentors had further questions which might be helpful as well. I hope that you answer a few of them.

    Larry

  • Hugo Kornelis (8/24/2010)


    Alex-668179 (8/24/2010)


    Yes, I find that a lot of the confusion goes away by not calling NULL a "value" but seeing as a state (or state indicator).

    Agreed. I think that the standardization committee has made a mistake by using the term "null vlaue" and defining it as "a special value to indicate the absence of any data value."

    I think they should have simply used the term "null" and should have defined it as "a marker to indicate the absence of any value".

    That might have countered some of the misconceptions and confusion.

    My recommendation would be that if you good folks really, really want to discuss the alternate definitions of and the philosphy of NULLs, there's a whole separate thread where you can do that (see below)... don't hijack this man's thread for yet another discussion on the semantics and philosphy of nulls.

    Here's the link to the thread...

    http://qa.sqlservercentral.com/Forums/Topic970493-374-1.aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Good article ...


    A topic I wish had been addressed is how to handle nulls in datetime fields. I think that you do the following. Let me know if there is a better test:

    DECLARE @MyDateTime datetime

    SELECT @MyDateTime = null

    SELECT

    CASE ISNULL(@MyDateTime,0)

    WHEN '1900-01-01 00:00:00' THEN 'Is Null'

    ELSE 'Not Null'

    END AS [Test for Date Null]

  • Tom Carnahan (8/24/2010)


    Good article ...


    A topic I wish had been addressed is how to handle nulls in datetime fields. I think that you do the following. Let me know if there is a better test:

    DECLARE @MyDateTime datetime

    SELECT @MyDateTime = null

    SELECT

    CASE ISNULL(@MyDateTime,0)

    WHEN '1900-01-01 00:00:00' THEN 'Is Null'

    ELSE 'Not Null'

    END AS [Test for Date Null]

    Try this

    DECLARE @MyDateTime datetime

    SELECT @MyDateTime = null

    SELECT

    CASE WHEN @MyDateTime IS NULL THEN 'Is Null' ELSE 'Not Null' END



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Tom Carnahan (8/24/2010)


    Good article ...


    A topic I wish had been addressed is how to handle nulls in datetime fields. I think that you do the following. Let me know if there is a better test:

    Your code will return 'Is Null' if you set @MyDateTime to '1900-01-01 00:00:00'. A better test would therefore be as follows:

    DECLARE @MyDateTime datetime

    SELECT @MyDateTime = null

    SELECT

    CASE WHEN @MyDateTime IS NULL THEN 'Is Null'

    ELSE 'Not Null'

    END AS [Test for Date Null]

    John

  • Jeff Moden (8/24/2010)


    Oh jeez.... here we go with bloody semantics again. If data is missing or absent, what value is it, Hugo?

    It's not just semantics. This statement is the root of way too many misunderstandings and misconceptions.

    I've worked for companies that did business with both natural persons and legal bodies (I hope those are the correct English terms). The differences between the two categories were completely irrrelevant to the business, so their information was kept in a single "Customers" table. But some of the attributes don't aply to both categories. A company has no gender or birthdate, and a natural person does not have a registration number at the chamber of commerce.

    The answer is UNKNOWN because you don't know!

    Building on the example above. Would you really answer "UNKNOWN" when I ask you to tell me the gender of Microsoft Incorporated, or the chamber of commerce registration number of Jeff Moden?

    I can't believe you rated an article as "poor" because of opinionated semantics such as this.

    And I can't believe that you disqualify a definition that is lifted straight from an internationally accepted ISO/ANSI standard as "opinionated semantics".

    My recommendation would be that if you good folks really, really want to discuss the alternate definitions of and the philosphy of NULLs, there's a whole separate thread where you can do that (see below)... don't hijack this man's thread for yet another discussion on the semantics and philosphy of nulls.

    I'm not hijacking any thread. I'm pointing out what I think is a flaw in an otherwise reasonable good and valuable article. This thread is the proper place for feedback on this article.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 15 posts - 16 through 30 (of 68 total)

You must be logged in to reply to this topic. Login to reply