Who Likes NULL?

  • 2050 may seem a long way away, but does anyone remember the millenium issue?  Assuming that a date at that point is satisfactory is an obvious problem.  It may well be true that by that time the software will no longer be running, but that was what they said about two digit years!

    NULL at least will not be affected by changing  dates or date formats...

  • chrisbray - Monday, July 9, 2018 12:48 AM

    2050 may seem a long way away, but does anyone remember the millenium issue?  Assuming that a date at that point is satisfactory is an obvious problem.  It may well be true that by that time the software will no longer be running, but that was what they said about two digit years!

    NULL at least will not be affected by changing  dates or date formats...

    Agreed.

    What is done with the materialised future date is wrong (we don't know what the real date will be), dangerous (as you mentioned above) and possibly indicative of a poor design, but you've got to work with what you're given.
     I must check to see what sort of a performance hit would occur if the validto field were to become nullable and have no future dates within it.

  • Sean Redmond - Monday, July 9, 2018 1:07 AM

    chrisbray - Monday, July 9, 2018 12:48 AM

    2050 may seem a long way away, but does anyone remember the millenium issue?  Assuming that a date at that point is satisfactory is an obvious problem.  It may well be true that by that time the software will no longer be running, but that was what they said about two digit years!

    NULL at least will not be affected by changing  dates or date formats...

    Agreed.

    What is done with the materialised future date is wrong (we don't know what the real date will be), dangerous (as you mentioned above) and possibly indicative of a poor design, but you've got to work with what you're given.
     I must check to see what sort of a performance hit would occur if the validto field were to become nullable and have no future dates within it.

    The easy way to handle it is to use the trick from the 1960's... set it to 4 nines ('9999') which will translate to 9999-01-01 and then publish it as the standard.  And, to save you some time, there WILL be performance issues (not to mention a shedload of code that you will need to change) if you change to using NULLs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Monday, July 9, 2018 6:29 AM

    The easy way to handle it is to use the trick from the 1960's... set it to 4 nines ('9999') which will translate to 9999-01-01 and then publish it as the standard.  And, to save you some time, there WILL be performance issues (not to mention a shedload of code that you will need to change) if you change to using NULLs.

    I simply cannot see what possible advantage accrues from this approach over using NULL.

  • edwardwill - Monday, July 9, 2018 6:34 AM

    Jeff Moden - Monday, July 9, 2018 6:29 AM

    The easy way to handle it is to use the trick from the 1960's... set it to 4 nines ('9999') which will translate to 9999-01-01 and then publish it as the standard.  And, to save you some time, there WILL be performance issues (not to mention a shedload of code that you will need to change) if you change to using NULLs.

    I simply cannot see what possible advantage accrues from this approach over using NULL.

    An example where I use that strategy with date columns, is a table that has a status history.  A common way our system queries this data is to find the current or future status (when they don't have a current or past status) of our field agents.  Having that end date in 9999 allows us to have an index with descending sorted data in it that is easily queried.

  • Chris Harshman - Monday, July 9, 2018 6:52 AM

    An example where I use that strategy with date columns, is a table that has a status history.  A common way our system queries this data is to find the current or future status (when they don't have a current or past status) of our field agents.  Having that end date in 9999 allows us to have an index with descending sorted data in it that is easily queried.

    That's similar to the scenario that I first mentioned earlier in the thread; the "magic" date (1 January 2050) was used to flag the current row in a table of User Details; so when a user changed her address, the old address was retained in the previous row and her new address could be found by looking for DateEntered = '1 January 2050'  As for why a bit column IsCurrent wasn't used ... well, your guess is as good as mine.

  • edwardwill - Monday, July 9, 2018 6:58 AM

    Chris Harshman - Monday, July 9, 2018 6:52 AM

    An example where I use that strategy with date columns, is a table that has a status history.  A common way our system queries this data is to find the current or future status (when they don't have a current or past status) of our field agents.  Having that end date in 9999 allows us to have an index with descending sorted data in it that is easily queried.

    That's similar to the scenario that I first mentioned earlier in the thread; the "magic" date (1 January 2050) was used to flag the current row in a table of User Details; so when a user changed her address, the old address was retained in the previous row and her new address could be found by looking for DateEntered = '1 January 2050'  As for why a bit column IsCurrent wasn't used ... well, your guess is as good as mine.

    I've found the BIT column you mention to be an extra headache because it must be maintained.  I agree that a date of 2050 is a bit short sighted and should have been '9999', which implicitly converts to 9999-01-01 00:00:00.000  (and NOT 9999-01-01 23:59:59 because it leaves no room for certain date tricks) but having such a future date makes life so easy.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • edwardwill - Monday, July 9, 2018 6:58 AM

    Chris Harshman - Monday, July 9, 2018 6:52 AM

    An example where I use that strategy with date columns, is a table that has a status history.  A common way our system queries this data is to find the current or future status (when they don't have a current or past status) of our field agents.  Having that end date in 9999 allows us to have an index with descending sorted data in it that is easily queried.

    That's similar to the scenario that I first mentioned earlier in the thread; the "magic" date (1 January 2050) was used to flag the current row in a table of User Details; so when a user changed her address, the old address was retained in the previous row and her new address could be found by looking for DateEntered = '1 January 2050'  As for why a bit column IsCurrent wasn't used ... well, your guess is as good as mine.

    Maybe IsCurrent is inappropriate since someone may have more than one active address (e.g. main home and holiday home, or home and work addresses)?  Personally if I was going to have something like that I would probably choose an Inactive flag so that historical values could be retained and reviewed but not used.

  • If you put a null in then you're saying the end date is unknown. However you're handling it in your code as though the NULL is effectively an arbitrary date sometime in the future.  So at least if you put a placeholder date in as Jeff suggests then it's very clear to the end user how the record is going to be handled. I think it definitely makes things easier and more transparent to have a future date/placeholder in this instance rather than a null.
    Arguably if you put a null in there then you shouldn't be using this record as it has no valid end date so you don't know what dates are between the startdate and an unknown end date. At least the placeholder date makes it crystal clear how the data will be handled. You can set the default for the enddate to 9999 or whatever future date is reasonable and then it's obvious to the end user what's going on.
    I've used a default placeholder date in some similar tables and it works very well for me. It simplifies a lot of the queries. I'm sure Jeff is right that there is a performance hit if you have big tables with NULL end dates as well.
    In this instance it seems a pretty reasonable compromise and definitely simplifies the queries on the object.
    Just make sure the date is well after we're all deceased and then at least if our apps and databases get that far into the future it won't be our problem!

  • edwardwill - Monday, July 9, 2018 6:34 AM

    Jeff Moden - Monday, July 9, 2018 6:29 AM

    The easy way to handle it is to use the trick from the 1960's... set it to 4 nines ('9999') which will translate to 9999-01-01 and then publish it as the standard.  And, to save you some time, there WILL be performance issues (not to mention a shedload of code that you will need to change) if you change to using NULLs.

    I simply cannot see what possible advantage accrues from this approach over using NULL.

    NULL introduces a kind of "strangeness" where columns in the NULL state do not respond normally to operations for the datatype.  This always needs to be coded around, with constructs like: if date > today or is null.  The necessary code-arounds are all-too-often overlooked, are vulnerable to errors, and are maintenance potholes.

    While magic numbers respond normally to operations, they still require code-arounds, sometimes ones that are even more insidious than NULL.  When a maintenance programmer encounters "date > today or is null" there's an implicit tip off to the strangeness.  When encountering "date > today" the implications of "12-31-9999" lurking in the data invisible.

    A status attribute (including a boolean flag) seems like the KISS approach for identifying current address changes over time.

  • Dennis Q Miller - Monday, July 9, 2018 2:23 PM

    edwardwill - Monday, July 9, 2018 6:34 AM

    Jeff Moden - Monday, July 9, 2018 6:29 AM

    The easy way to handle it is to use the trick from the 1960's... set it to 4 nines ('9999') which will translate to 9999-01-01 and then publish it as the standard.  And, to save you some time, there WILL be performance issues (not to mention a shedload of code that you will need to change) if you change to using NULLs.

    I simply cannot see what possible advantage accrues from this approach over using NULL.

    NULL introduces a kind of "strangeness" where columns in the NULL state do not respond normally to operations for the datatype.  This always needs to be coded around, with constructs like: if date > today or is null.  The necessary code-arounds are all-too-often overlooked, are vulnerable to errors, and are maintenance potholes.

    While magic numbers respond normally to operations, they still require code-arounds, sometimes ones that are even more insidious than NULL.  When a maintenance programmer encounters "date > today or is null" there's an implicit tip off to the strangeness.  When encountering "date > today" the implications of "12-31-9999" lurking in the data invisible.

    A status attribute (including a boolean flag) seems like the KISS approach for identifying current address changes over time.

    With or without a magic number, having a BIT for an "is active" type of status doesn't seem useful to me.  Checking the EndDate should be all that's needed and then you don't need to "remember" to update a bit for something that's obvious.

    But... that's just my humble opinion.  If an "is active" status bit is what works for you, then that's a whole lot better than what I've seen others do.  What could be worse?  How about storing Red, Yellow, and Green in a column to support a dashboard based on the age of the row.  Yeah... I killed that silliness with the launch of a single, very well placed, very high velocity pork chop. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Monday, July 9, 2018 2:46 PM

    Dennis Q Miller - Monday, July 9, 2018 2:23 PM

    edwardwill - Monday, July 9, 2018 6:34 AM

    Jeff Moden - Monday, July 9, 2018 6:29 AM

    The easy way to handle it is to use the trick from the 1960's... set it to 4 nines ('9999') which will translate to 9999-01-01 and then publish it as the standard.  And, to save you some time, there WILL be performance issues (not to mention a shedload of code that you will need to change) if you change to using NULLs.

    I simply cannot see what possible advantage accrues from this approach over using NULL.

    NULL introduces a kind of "strangeness" where columns in the NULL state do not respond normally to operations for the datatype.  This always needs to be coded around, with constructs like: if date > today or is null.  The necessary code-arounds are all-too-often overlooked, are vulnerable to errors, and are maintenance potholes.

    While magic numbers respond normally to operations, they still require code-arounds, sometimes ones that are even more insidious than NULL.  When a maintenance programmer encounters "date > today or is null" there's an implicit tip off to the strangeness.  When encountering "date > today" the implications of "12-31-9999" lurking in the data invisible.

    A status attribute (including a boolean flag) seems like the KISS approach for identifying current address changes over time.

    With or without a magic number, having a BIT for an "is active" type of status doesn't seem useful to me.  Checking the EndDate should be all that's needed and then you don't need to "remember" to update a bit for something that's obvious.

    But... that's just my humble opinion.  If an "is active" status bit is what works for you, then that's a whole lot better than what I've seen others do.  What could be worse?  How about storing Red, Yellow, and Green in a column to support a dashboard based on the age of the row.  Yeah... I killed that silliness with the launch of a single, very well placed, very high velocity pork chop. 😀

    I agree that a bit column for IsActive is a bad idea.  However, the opposite - a bit column for Inactive - works well if there is no EndDate or anything similar to indicate whether or not the record should be used and you want to retain the information but effectively disable it. Addresses are a good example where this is probably the best option, as are bank details and contacts, where you are not storing any form of active date ranges...  Having a flag to check *before* you check a date column is madness in my view - why not just check the date?

  • chrisbray - Monday, July 9, 2018 10:49 PM

    I agree that a bit column for IsActive is a bad idea.  However, the opposite - a bit column for Inactive - works well if there is no EndDate or anything similar to indicate whether or not the record should be used and you want to retain the information but effectively disable it. Addresses are a good example where this is probably the best option, as are bank details and contacts, where you are not storing any form of active date ranges...  Having a flag to check *before* you check a date column is madness in my view - why not just check the date?

    There is absolutely no difference between a bit column for IsActive and a bit column for Inactive.

  • edwardwill - Tuesday, July 10, 2018 1:26 AM

    chrisbray - Monday, July 9, 2018 10:49 PM

    I agree that a bit column for IsActive is a bad idea.  However, the opposite - a bit column for Inactive - works well if there is no EndDate or anything similar to indicate whether or not the record should be used and you want to retain the information but effectively disable it. Addresses are a good example where this is probably the best option, as are bank details and contacts, where you are not storing any form of active date ranges...  Having a flag to check *before* you check a date column is madness in my view - why not just check the date?

    There is absolutely no difference between a bit column for IsActive and a bit column for Inactive.

    Semantically there is a huge difference!  The requirement to activate something that logically should have a default state of active (after all, why would you add an inactive address for example) is nonsensical at least at the UI end.  I agree that the database doesn't care, but since logic dictates that the UI content reflect the database content and vice-versa then the naming is pretty critical to understanding by all concerned.

  • chrisbray - Tuesday, July 10, 2018 2:09 AM

    Semantically there is a huge difference!  The requirement to activate something that logically should have a default state of active (after all, why would you add an inactive address for example) is nonsensical at least at the UI end.  I agree that the database doesn't care, but since logic dictates that the UI content reflect the database content and vice-versa then the naming is pretty critical to understanding by all concerned.

    As a dev rather than a DBA I think your argument is entirely academic, and a straw poll of the three devs in my team who aren't currently on holiday supports my view.  Consider the LINQ to surface the data:

    var userDetails = _userService.Repository<UserDetails>().Query(x => x.IsActive == true).Select().FirstOrDefault();
    as opposed to
    var userDetails = _userService.Repository<UserDetails>().Query(x => x.InActive == false).Select().FirstOrDefault();

    I know which I'd prefer.

Viewing 15 posts - 61 through 75 (of 143 total)

You must be logged in to reply to this topic. Login to reply