Who Likes NULL?

  • Dennis Q Miller - Wednesday, July 11, 2018 8:23 AM

    You are illustrating how NULL is prone to coding error.  The first statement can never be true, but the second can!

    How can StartDate >= '2018-01-01' AND EndDate IS NULL never be true?  If EndDate is NULL and StartDate is greater than or equal to 2018-01-01 the statement is true.

  • Jeff Moden - Wednesday, July 11, 2018 8:24 AM

    Even such consistency "depends".  If you look at the connections for SSMS, ANSI NULLs is turned on.  For external connections, it is not.  MS was (and may still be) making noises that it may no longer be an option and that ANSI NULLs will be enabled for ALL connections.  That set the front-end world on fire because many are used to being able to relate NULL = NULL.  It would break a lot of code if MS ever forced ANSI NULLs on the front-end world.

    True but other operations will always perform differently on nulls vs magic values whether ansi nulls are on or off, for example date modifications or string concatenation or arithmetic.

  • Jeff Moden - Wednesday, July 11, 2018 8:28 AM

    Not sure how you think the first statement can never be true if NULL is used to signify that no EndDate has been registered yet.

    Because Because NULL >= '2018-01-01'   returns false 

    Logical operations on NULL do not behave like they do on other values and they do not behave consistently across platforms, languages, and even configuration settings. Logical operations on NULL do not behave like they do on values and they do not behave consistently across platforms, languages, and sometimes even configuration settings. 

    In most cases, all of the following return false:

  • NULL = NULL

  • NULL = anyvalue
  • NULL <> anyvalue

  • NULL <= anyvalue
  • Dennis Q Miller - Wednesday, July 11, 2018 9:12 AM

    Jeff Moden - Wednesday, July 11, 2018 8:28 AM

    Not sure how you think the first statement can never be true if NULL is used to signify that no EndDate has been registered yet.

    Because Because NULL >= '2018-01-01'   returns false 

    Logical operations on NULL do not behave like they do on other values and they do not behave consistently across platforms, languages, and even configuration settings. Logical operations on NULL do not behave like they do on values and they do not behave consistently across platforms, languages, and sometimes even configuration settings. 

    In most cases, all of the following return false:

  • NULL = NULL

  • NULL = anyvalue
  • NULL <> anyvalue

  • NULL <= anyvalue
  • No!  At least not in SQL.  All of those conditions return UNKNOWN which is different from FALSE.  Conflating FALSE and UNKNOWN is perhaps the most common error in tripartite logic.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Dennis Q Miller - Wednesday, July 11, 2018 9:12 AM

    Jeff Moden - Wednesday, July 11, 2018 8:28 AM

    Not sure how you think the first statement can never be true if NULL is used to signify that no EndDate has been registered yet.

    Because Because NULL >= '2018-01-01'   returns false 

    Logical operations on NULL do not behave like they do on other values and they do not behave consistently across platforms, languages, and even configuration settings. Logical operations on NULL do not behave like they do on values and they do not behave consistently across platforms, languages, and sometimes even configuration settings. 

    In most cases, all of the following return false:

  • NULL = NULL

  • NULL = anyvalue
  • NULL <> anyvalue

  • NULL <= anyvalue
  • Why would StartDate be NULL?  EndDate can be null since it may not be assigned, but StartDate should always have a value other than NULL based on its use, the start of a date interval.

    Also NULL >= '2018-01-01' does not return FALSE, it actually returns UNKNOWN.

  • roger.plowman - Wednesday, July 11, 2018 6:54 AM

    Null being a value or not is semantics. Null is a countable state, therefore it is a value, even though it represents no value. In a nullable Boolean for instance, there's true, false, and null. Thus three possible states. Calling them values may not be pedantically correct, but for all intents and purposes null is a "third value", just a very recalcitrant one.

    Domains do indeed have unused spaces. Consider the date domain as represented by the DateTime data type. The domain is 1/1/1753 to 12/31/9999. Now apply that data type to the domain of "hired persons" (employees that have ever worked for the company, regardless of current employment.

    Granted, this varies by company but let's assume the following facts. 1) The company's founding was 3/15/1992. 2) The company did not hire anyone 92 years of age at its founding, and in fact never hired anyone born on or before 1/1/1900--and now literally can not since such a person would be at least 118 years old. 🙂

    Thus for this domain the "unused" portion of the domain is certainly any date before 1/1/1900. We may not know the exact beginning of the domain "hired person" but we can guarantee any date before 1/1/1900 is not part of that domain--and this will be true forever, not just for a few thousand years. 🙂

    Now, it is true that certain companies, for certain domains, will have dates that are < 1/1/1900 so this magic value may not be universally part of the unused datetime domain. However, unless the company is dealing with genealogy or history the date 1/1/1753 almost certainly falls in the unused space.

    Magic values are frowned upon in IT because of their arbitrary nature, but when well documented (read, beaten into employees with a clue stick upon hire) then they become an extremely useful tool--one that, unlike null, follows the rules of the data type domain.

    I'm afraid we're getting mired in a quagmire of discordant definitions...

    NULL is literally defined as the state representing absence of a value. That is why NULL generally does not test equal to anything, even itself.  You may choose to use that state as if it were a value, but it still will not behave like a value.  A boolean datatype, by definition, represents one of two values. I think you should invent a new word for a "boolean" that represents three values.

    "Domain" is a logical construct, that among other things, defines all acceptable values.  In your example, '1/1/1753' is not within the domain of employment_date.  "Domain" is different from "datatype", which is a physical construct that does support that value.  

    Datatypes only approximate a domain well and that's giving the benefit of doubt. They generally allow values that are outside the domain and they often don't support values that are.  For example, "unknown" might be a legitimate value for date, but it is not supported by the Datetime datatype.

    SQL databases lack robust support for domains. That is precisely the reason we resort to techniques like assigning meaning to NULL and magic numbers.

  • Dennis Q Miller - Wednesday, July 11, 2018 10:29 AM

    I'm afraid we're getting mired in a quagmire of discordant definitions...

    NULL is literally defined as the state representing absence of a value. That is why NULL generally does not test equal to anything, even itself.  You may choose to use that state as if it were a value, but it still will not behave like a value.  A boolean datatype, by definition, represents one of two values. I think you should invent a new word for a "boolean" that represents three values.

    "Domain" is a logical construct, that among other things, defines all acceptable values.  In your example, '1/1/1753' is not within the domain of employment_date.  "Domain" is different from "datatype", which is a physical construct that does support that value.  

    Datatypes only approximate a domain well and that's giving the benefit of doubt. They generally allow values that are outside the domain and they often don't support values that are.  For example, "unknown" might be a legitimate value for date, but it is not supported by the Datetime datatype.

    SQL databases lack robust support for domains. That is precisely the reason we resort to techniques like assigning meaning to NULL and magic numbers.

    I have to disagree on a semantic level.  NULL defines the state where you don't even know if the value is missing or not.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Lynn Pettis - Wednesday, July 11, 2018 9:27 AM

    Why would StartDate be NULL?  EndDate can be null since it may not be assigned, but StartDate should always have a value other than NULL based on its use, the start of a date interval.

    Also NULL >= '2018-01-01' does not return FALSE, it actually returns UNKNOWN.

    I don't know why StartDate would be NULL, but it could be.  And FWIW, I do apologize for my erroneous claim that the first statement is always false.  My elderly brain misread it.  If you change > in the second statement to =, then those two statements are logically equivalent and your question is on point. 

    IYou are correct that the NULL comparison returns unknown.

  • Dennis Q Miller - Wednesday, July 11, 2018 10:57 AM

    I don't know why StartDate would be NULL, but it could be.  And FWIW, I do apologize for my erroneous claim that the first statement is always false.  My elderly brain misread it.  If you change > in the second statement to =, then those two statements are logically equivalent and your question is on point. 

    IYou are correct that the NULL comparison returns unknown.

    Well, knowing the usage of StartDate and EndDate, perhaps I should have defined them first which would have had StartDate NOT NULL and EndDate NULL.  Then StartDate could not be NULL.  When ever I have setup temporal date ranges like StartDate and EndDate that has always been the case as there will always be a valid StartDate.

  • All this talk about NULL makes me hungry for a burger and fries. 🙂

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Dennis Q Miller - Wednesday, July 11, 2018 10:29 AM

    roger.plowman - Wednesday, July 11, 2018 6:54 AM

    Null being a value or not is semantics. Null is a countable state, therefore it is a value, even though it represents no value. In a nullable Boolean for instance, there's true, false, and null. Thus three possible states. Calling them values may not be pedantically correct, but for all intents and purposes null is a "third value", just a very recalcitrant one.

    Domains do indeed have unused spaces. Consider the date domain as represented by the DateTime data type. The domain is 1/1/1753 to 12/31/9999. Now apply that data type to the domain of "hired persons" (employees that have ever worked for the company, regardless of current employment.

    Granted, this varies by company but let's assume the following facts. 1) The company's founding was 3/15/1992. 2) The company did not hire anyone 92 years of age at its founding, and in fact never hired anyone born on or before 1/1/1900--and now literally can not since such a person would be at least 118 years old. 🙂

    Thus for this domain the "unused" portion of the domain is certainly any date before 1/1/1900. We may not know the exact beginning of the domain "hired person" but we can guarantee any date before 1/1/1900 is not part of that domain--and this will be true forever, not just for a few thousand years. 🙂

    Now, it is true that certain companies, for certain domains, will have dates that are < 1/1/1900 so this magic value may not be universally part of the unused datetime domain. However, unless the company is dealing with genealogy or history the date 1/1/1753 almost certainly falls in the unused space.

    Magic values are frowned upon in IT because of their arbitrary nature, but when well documented (read, beaten into employees with a clue stick upon hire) then they become an extremely useful tool--one that, unlike null, follows the rules of the data type domain.

    I'm afraid we're getting mired in a quagmire of discordant definitions...

    NULL is literally defined as the state representing absence of a value. That is why NULL generally does not test equal to anything, even itself.  You may choose to use that state as if it were a value, but it still will not behave like a value.  A boolean datatype, by definition, represents one of two values. I think you should invent a new word for a "boolean" that represents three values.

    "Domain" is a logical construct, that among other things, defines all acceptable values.  In your example, '1/1/1753' is not within the domain of employment_date.  "Domain" is different from "datatype", which is a physical construct that does support that value.  

    Datatypes only approximate a domain well and that's giving the benefit of doubt. They generally allow values that are outside the domain and they often don't support values that are.  For example, "unknown" might be a legitimate value for date, but it is not supported by the Datetime datatype.

    SQL databases lack robust support for domains. That is precisely the reason we resort to techniques like assigning meaning to NULL and magic numbers.

    I'd also disagree on the literal "unknown" being a valid date value (even outside of the confines of SQL).  That's a serious poor modeling decision for any domain.(the fact that the date's value is unknown is NOT the same as saying the literal "unknown" is a valid value for a date).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4) - Thursday, July 12, 2018 2:16 PM

    Dennis Q Miller - Wednesday, July 11, 2018 10:29 AM

    roger.plowman - Wednesday, July 11, 2018 6:54 AM

    Null being a value or not is semantics. Null is a countable state, therefore it is a value, even though it represents no value. In a nullable Boolean for instance, there's true, false, and null. Thus three possible states. Calling them values may not be pedantically correct, but for all intents and purposes null is a "third value", just a very recalcitrant one.

    Domains do indeed have unused spaces. Consider the date domain as represented by the DateTime data type. The domain is 1/1/1753 to 12/31/9999. Now apply that data type to the domain of "hired persons" (employees that have ever worked for the company, regardless of current employment.

    Granted, this varies by company but let's assume the following facts. 1) The company's founding was 3/15/1992. 2) The company did not hire anyone 92 years of age at its founding, and in fact never hired anyone born on or before 1/1/1900--and now literally can not since such a person would be at least 118 years old. 🙂

    Thus for this domain the "unused" portion of the domain is certainly any date before 1/1/1900. We may not know the exact beginning of the domain "hired person" but we can guarantee any date before 1/1/1900 is not part of that domain--and this will be true forever, not just for a few thousand years. 🙂

    Now, it is true that certain companies, for certain domains, will have dates that are < 1/1/1900 so this magic value may not be universally part of the unused datetime domain. However, unless the company is dealing with genealogy or history the date 1/1/1753 almost certainly falls in the unused space.

    Magic values are frowned upon in IT because of their arbitrary nature, but when well documented (read, beaten into employees with a clue stick upon hire) then they become an extremely useful tool--one that, unlike null, follows the rules of the data type domain.

    I'm afraid we're getting mired in a quagmire of discordant definitions...

    NULL is literally defined as the state representing absence of a value. That is why NULL generally does not test equal to anything, even itself.  You may choose to use that state as if it were a value, but it still will not behave like a value.  A boolean datatype, by definition, represents one of two values. I think you should invent a new word for a "boolean" that represents three values.

    "Domain" is a logical construct, that among other things, defines all acceptable values.  In your example, '1/1/1753' is not within the domain of employment_date.  "Domain" is different from "datatype", which is a physical construct that does support that value.  

    Datatypes only approximate a domain well and that's giving the benefit of doubt. They generally allow values that are outside the domain and they often don't support values that are.  For example, "unknown" might be a legitimate value for date, but it is not supported by the Datetime datatype.

    SQL databases lack robust support for domains. That is precisely the reason we resort to techniques like assigning meaning to NULL and magic numbers.

    I'd also disagree on the literal "unknown" being a valid date value (even outside of the confines of SQL).  That's a serious poor modeling decision for any domain.(the fact that the date's value is unknown is NOT the same as saying the literal "unknown" is a valid value for a date).

    I couldn't agree more.  We debate incessantly about how to represent "unknown" with NULL or a magic number in a datetime column, but we neglect considering schema designs that don't require it in the first place.

    Right..."Unknown" is not valid for a date datatype. And a date datatype does not support a domain that includes "unknown"  as one of its legitimate cases.  Using either NULL or magic number is a trick to pretend it does.  Domains are not defined by the values of a datatype; they are defined by the business requirements at a logical level and, in most cases, can only be approximated by a datatype.

  • Dennis Q Miller - Friday, July 13, 2018 12:30 PM

    Matt Miller (4) - Thursday, July 12, 2018 2:16 PM

    Dennis Q Miller - Wednesday, July 11, 2018 10:29 AM

    roger.plowman - Wednesday, July 11, 2018 6:54 AM

    Null being a value or not is semantics. Null is a countable state, therefore it is a value, even though it represents no value. In a nullable Boolean for instance, there's true, false, and null. Thus three possible states. Calling them values may not be pedantically correct, but for all intents and purposes null is a "third value", just a very recalcitrant one.

    Domains do indeed have unused spaces. Consider the date domain as represented by the DateTime data type. The domain is 1/1/1753 to 12/31/9999. Now apply that data type to the domain of "hired persons" (employees that have ever worked for the company, regardless of current employment.

    Granted, this varies by company but let's assume the following facts. 1) The company's founding was 3/15/1992. 2) The company did not hire anyone 92 years of age at its founding, and in fact never hired anyone born on or before 1/1/1900--and now literally can not since such a person would be at least 118 years old. 🙂

    Thus for this domain the "unused" portion of the domain is certainly any date before 1/1/1900. We may not know the exact beginning of the domain "hired person" but we can guarantee any date before 1/1/1900 is not part of that domain--and this will be true forever, not just for a few thousand years. 🙂

    Now, it is true that certain companies, for certain domains, will have dates that are < 1/1/1900 so this magic value may not be universally part of the unused datetime domain. However, unless the company is dealing with genealogy or history the date 1/1/1753 almost certainly falls in the unused space.

    Magic values are frowned upon in IT because of their arbitrary nature, but when well documented (read, beaten into employees with a clue stick upon hire) then they become an extremely useful tool--one that, unlike null, follows the rules of the data type domain.

    I'm afraid we're getting mired in a quagmire of discordant definitions...

    NULL is literally defined as the state representing absence of a value. That is why NULL generally does not test equal to anything, even itself.  You may choose to use that state as if it were a value, but it still will not behave like a value.  A boolean datatype, by definition, represents one of two values. I think you should invent a new word for a "boolean" that represents three values.

    "Domain" is a logical construct, that among other things, defines all acceptable values.  In your example, '1/1/1753' is not within the domain of employment_date.  "Domain" is different from "datatype", which is a physical construct that does support that value.  

    Datatypes only approximate a domain well and that's giving the benefit of doubt. They generally allow values that are outside the domain and they often don't support values that are.  For example, "unknown" might be a legitimate value for date, but it is not supported by the Datetime datatype.

    SQL databases lack robust support for domains. That is precisely the reason we resort to techniques like assigning meaning to NULL and magic numbers.

    I'd also disagree on the literal "unknown" being a valid date value (even outside of the confines of SQL).  That's a serious poor modeling decision for any domain.(the fact that the date's value is unknown is NOT the same as saying the literal "unknown" is a valid value for a date).

    I couldn't agree more.  We debate incessantly about how to represent "unknown" with NULL or a magic number in a datetime column, but we neglect considering schema designs that don't require it in the first place.

    Right..."Unknown" is not valid for a date datatype. And a date datatype does not support a domain that includes "unknown"  as one of its legitimate cases.  Using either NULL or magic number is a trick to pretend it does.  Domains are not defined by the values of a datatype; they are defined by the business requirements at a logical level and, in most cases, can only be approximated by a datatype.

    We're quibbling about semantics. Logical domains can and must be modeled physically with ill-matching physical representations--but that's true even for DATES, which are encoded as the number of days since a milestone date. 🙂 Try storing March 3, 1610 in a DATE or DATETIME datatype for instance.

    Also, the "verified unknown" value, whatever trick you use to represent it, can and in fact must be part of the date domain. For example a purchase date for a piece of equipment that you have lost the purchase order for because it was bought 25 years ago.

  • Dennis Q Miller - Friday, July 13, 2018 12:30 PM

    I couldn't agree more.  We debate incessantly about how to represent "unknown" with NULL or a magic number in a datetime column, but we neglect considering schema designs that don't require it in the first place.

    Right..."Unknown" is not valid for a date datatype. And a date datatype does not support a domain that includes "unknown"  as one of its legitimate cases.  Using either NULL or magic number is a trick to pretend it does.  Domains are not defined by the values of a datatype; they are defined by the business requirements at a logical level and, in most cases, can only be approximated by a datatype.

    If you're going to make such a bold statement, you better be prepared to back it up with facts, specifically, you should be able to provide an example of such a schema that doesn't require representing UNKNOWN.  I think that such a schema is untenable, because there will always be gaps in our knowledge where we have to represent "unknown" values.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Heh... always fun to watch discussions concerning NULL.  😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Viewing 15 posts - 106 through 120 (of 143 total)

    You must be logged in to reply to this topic. Login to reply