Handling Null's in an End_Date Column

  • I'm looking for suggestions, practices for situations such as the one I'm about to describe. I have a table like this:

    CREATE TABLE [dbo].[Person_Unit_Relationships]

    (

    [Person_Unit_Relationship_Id] [int] NOT NULL IDENTITY(1, 1),

    [Person_ID] [int] NOT NULL,

    [Unit_ID] [int] NOT NULL,

    [Relationship_Start_Date] [smalldatetime] NOT NULL,

    [Relationship_End_Date] [smalldatetime] NULL,

    [Relationship_Type_ID] [int] NOT NULL,

    [Relationship_Status_ID] [int] NOT NULL,

    [Relationship_End_Reason_ID] [int] NULL,

    [Retire_Date] [smalldatetime] NULL,

    [Is_Primary] [bit] NULL CONSTRAINT [DF_Person_Sending_Entities_Is_Primary] DEFAULT ((0)),

    )

    ALTER TABLE [dbo].[Person_Unit_Relationships] ADD CONSTRAINT [PK_Person_Unit_Relationships] PRIMARY KEY CLUSTERED ([Person_Unit_Relationship_Id])

    GO

    CREATE UNIQUE NONCLUSTERED INDEX [UX_Person_Unit_Relationships_Person_Unit_Start_Date] ON [dbo].[Person_Unit_Relationships] ([Person_ID], [Unit_ID], [Relationship_Start_Date])

    GO

    Where over time a person could have multiple relationships with the same unit. The most common queries will only want currently active relationships (Relationship_End_Date is null or >= GetDate() AND Retire_Date is null or >= GetDate()) or relationships withing a time range.

    How do you handle this and why? The why is the important part.

    The options I know of are:

      1. Default value in the Nullable columns set to an outrageous date value or to the max datetime value

      2. Computed Column

      3. IsNull or Coalesce

      4. Use the OR syntax in the query

    And here's some sample data:

    INSERT INTO dbo.Person_Unit_Relationships (

    Person_ID,

    Unit_ID,

    Relationship_Start_Date,

    Relationship_End_Date,

    Relationship_Type_ID,

    Relationship_Status_ID,

    Relationship_End_Reason_ID,

    Retire_Date,

    Is_Primary)

    Select

    2,

    67,

    '20040825',

    '20060519',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    4,

    4,

    '20010212',

    '20011219',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    4,

    4,

    '20040526',

    '20050513',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    4,

    32,

    '20050514',

    '20060531',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    5,

    30,

    '19881201',

    '19891130',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    5,

    32,

    '19891201',

    '19900531',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    5,

    61,

    '19880101',

    '19881130',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    5,

    61,

    '19900801',

    '19950101',

    5,

    23,

    NULL,

    NULL,

    1

    Union ALL

    Select

    8,

    28,

    '19720601',

    '19730531',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    8,

    32,

    '19760501',

    '19770430',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    8,

    52,

    '19750201',

    '19760430',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    8,

    62,

    '19780501',

    '19980101',

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    8,

    62,

    '20010701',

    '20040501',

    5,

    3,

    NULL,

    NULL,

    1

    Union ALL

    Select

    8,

    67,

    '19710901',

    '19720531',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    9,

    28,

    '20030916',

    '20031222',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    11,

    61,

    '19990915',

    '20010912',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    13,

    4,

    '20030205',

    '20030824',

    3,

    4,

    NULL,

    NULL,

    1

    Union ALL

    Select

    14,

    4,

    '19860901',

    '19870331',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    14,

    62,

    '19870501',

    '19941218',

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    15,

    14,

    '20020520',

    '20040114',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    15,

    28,

    '20000902',

    '20020519',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    15,

    32,

    '20040115',

    '20041231',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    19,

    18,

    '19760901',

    '19770826',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    19,

    32,

    '19770827',

    '19780712',

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    19,

    62,

    '19780713',

    '19790214',

    5,

    4,

    NULL,

    NULL,

    1

    Union ALL

    Select

    4,

    62,

    '20060601',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    11,

    61,

    '20010913',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    15,

    62,

    '20050101',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    21,

    62,

    '20020101',

    NULL,

    5,

    28,

    NULL,

    NULL,

    1

    Union ALL

    Select

    28,

    62,

    '19950101',

    NULL,

    5,

    9,

    NULL,

    '19950101',

    1

    Union ALL

    Select

    35,

    42,

    '19940101',

    NULL,

    5,

    28,

    NULL,

    NULL,

    1

    Union ALL

    Select

    37,

    62,

    '19850801',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    39,

    28,

    '20070821',

    NULL,

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    44,

    62,

    '20080516',

    NULL,

    5,

    3,

    NULL,

    NULL,

    1

    Union ALL

    Select

    45,

    6,

    '19700101',

    NULL,

    5,

    9,

    NULL,

    '19700101',

    1

    Union ALL

    Select

    51,

    62,

    '19950201',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    58,

    62,

    '20020613',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    61,

    62,

    '19920801',

    NULL,

    5,

    28,

    NULL,

    NULL,

    1

    Union ALL

    Select

    64,

    61,

    '20040716',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    73,

    62,

    '19861201',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    74,

    5,

    '20000601',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    78,

    62,

    '19680801',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    83,

    36,

    '20050601',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    90,

    25,

    '20080609',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    96,

    62,

    '19950601',

    NULL,

    5,

    28,

    NULL,

    NULL,

    1

    Union ALL

    Select

    97,

    6,

    '19960224',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    117,

    62,

    '20030901',

    NULL,

    5,

    9,

    NULL,

    '20030901',

    1

    Union ALL

    Select

    119,

    26,

    '20090101',

    NULL,

    3,

    2,

    NULL,

    NULL,

    1

    Union ALL

    Select

    125,

    62,

    '19720801',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Union ALL

    Select

    130,

    6,

    '20051230',

    NULL,

    5,

    7,

    NULL,

    NULL,

    1

    Edit: Fixed sample code so it would work.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • Always an ISNULL for me, interested to hear other's opinions

    SELECT * FROM XXX WHERE

    BeginDate @Date

  • Jack,

    Would a "Person_Unit_Relationships_Hist" table be an option? You keeps you main table very small and you are still able to get historic information. Since your table is a many to many relationship table there shouldn't be many affected modules.

    If not, I would use a computed column (e.g. Is_Active) and an index on (Person_Id, Unit_Id, Is_Active).

    Hope this helps!

    Flo

  • Flo,

    In this particular application I could do anything I wanted, well, after discussion with the rest of the dev team, but I only provided the structure because I know I can usually understand problems better when structures are provided and this is one I currently have.

    I'm really looking at, what is the best, most performant way to handle situations like this. In the future I may have to deal with issues like this on a large database and I want to know the best way and if it's the best way for large DB's then I want to use it on my small ones too!

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • If I have to query data based on the above, I've found that the OR syntax gets me better index utilization than IsNULL. Beyond that, a computed column of active as Flo mentioned would seem to be a more interesting way to go.

    BTW your sample data no worky...

    Server: Msg 8115, Level 16, State 2, Line 1

    Arithmetic overflow error converting expression to data type smalldatetime.

    The statement has been terminated.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • I've used calculated columns for that kind of thing a few times, but I've generally found the best is a history table. Set up the indexes on them correctly, and active/historical tables can really work well. If you set them up to follow the rules of an updatable Union view (as per BOL "Create View"), it makes querying the whole thing quite convenient.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Luke L (6/1/2009)


    If I have to query data based on the above, I've found that the OR syntax gets me better index utilization than IsNULL. Beyond that, a computed column of active as Flo mentioned would seem to be a more interesting way to go.

    BTW your sample data no worky...

    Server: Msg 8115, Level 16, State 2, Line 1

    Arithmetic overflow error converting expression to data type smalldatetime.

    The statement has been terminated.

    Fixed the sample data in the original post. I generated the insert statement from actual data and left out some quotes (') around a date and when it treats it as an integer it won't fit!

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • No love for the ISNULL it seems- can you guys explain the negatives of doing it that way? Thanks 😀

    I know there is the small performance hit, is this something that will add up to something meaningful over your data, or are there other reasons? (Index was mentioned- is that a factor? I wouldn't think so, so if it is please, let me know!)

  • Mike McQueen (6/1/2009)


    No love for the ISNULL it seems- can you guys explain the negatives of doing it that way? Thanks 😀

    I know there is the small performance hit, is this something that will add up to something meaningful over your data, or are there other reasons? (Index was mentioned- is that a factor? I wouldn't think so, so if it is please, let me know!)

    The IsNull solution can end up with non-SARGable queries, which means index scans instead of index seeks. That can (almost always does) make a significant difference in performance.

    It can also have a negative effect on the execution plan SQL Server comes up with. In some cases, this will matter more than the SARGability of the Where clause. What if, for example, the first run of the proc has a null value, and that results in 10 rows of data, which then have to be joined to another table? For 10 rows, it'll probably pick Nested Loops for the join mechanism. What if the next query has a specific value, and that makes it 10,000 rows? Suddenly, Nested Loops looks like a bad idea, but the optimizer is quite possibly stuck with the prior plan. Or it has to recompile the query each time it's run, and you end up with compilation locks, which reduce efficient concurrency.

    If you split the table, then there's less question on it, and you'll generally end up with better execution plans and better index use.

    Also, if you use the "arbitrarily late date" you are, to one extent or another, setting up a Y2K-type situation.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Indexing is definitely a reason for NOT using IsNull. Any time you use a function on a column in a WHERE or JOIN clause SQL Server has to apply that function against every row in the table/index. In the example I provided, it is not a huge deal as the column(s) I mention are not in an index and if they were they would either be one of the right-most columns or an included column in the index. So if I were to have a query like this:

    SELECT

    PUR.Person_ID,

    PUR.Unit_ID

    FROM

    dbo.Person_Unit_Relationships AS PUR

    WHERE

    PUR.person_Id = 10 AND

    PUR.Unit_ID = 62 AND

    ISNULL(PUR.Relationship_End_Date, '20790602') >= GETDATE()

    I can still get a Index seek with a bookmark lookup and then a filter because I have a unique index on Person_ID, Unit_ID, Relationship_Start_Date, but if my Index was on Relationship_End_Date, Person_Id, Unit_Id I only get an index scan.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • Cardinal say to Confusedcius 😛 one day, "To avoid pork chop, must look eye.... BIG EYE." SO! When come to fork in road, use fork on pork chop. :hehe:

    [font="Courier New"]--===== Do this in a nice safe place

        USE TempDB

    --===== Conditionally drop the test table for reruns

         IF OBJECT_ID('dbo.JBMTest'IS NOT NULL

            DROP TABLE dbo.JBMTest

    --===== Create and populate a 1,000,000 row test table.

         -- Column "RowNum" has a range of 1 to 100,000 unique numbers

         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

         -- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

         -- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

         -- Column "SomeDate" has a range of  >=01/01/2000 and <01/01/2010 non-unique date/times

         -- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

         --        for all rows.

         -- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

         --Jeff Moden

     SELECT TOP 1000000

            SomeID       IDENTITY(INT,1,1),

            SomeInt      ABS(CHECKSUM(NEWID()))%50000+1,

            SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

                         + CHAR(ABS(CHECKSUM(NEWID()))%26+65),

            SomeCSV      CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

            SomeMoney    CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

            SomeDate     CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

            SomeHex12    RIGHT(NEWID(),12)

       INTO dbo.JBMTest

       FROM Master.dbo.SysColumns t1,

            Master.dbo.SysColumns t2

    --===== Make 50k of the dates the end of time date.  Since they're in random order, it should

         -- affect all time frames equally.

     UPDATE dbo.JBMTest

        SET SomeDate NULL

      WHERE SomeID  <= 50000

    --===== A table is not properly formed unless a clustered index has been assigned

      ALTER TABLE dbo.JBMTest

            ADD PRIMARY KEY CLUSTERED (SomeID)

    --===== Looky here. Big medicine for 2k5. Keeps pork chop out of nostril.

     CREATE INDEX IX_JBMTest_SomeDate

         ON dbo.JBMTest (SomeDateINCLUDE (SomeID,SomeMoney)

    PRINT '--===== OR IS NULL ==================================='

    SET STATISTICS TIME ON

     SELECT SomeID,SomeDate,SomeMoney

       FROM dbo.JBMTest

      WHERE SomeDate >= GETDATE()

         OR SomeDate IS NULL

    SET STATISTICS TIME OFF

    PRINT '--===== ISNULL ==================================='

    SET STATISTICS TIME ON

     SELECT SomeID,SomeDate,SomeMoney

       FROM dbo.JBMTest

      WHERE ISNULL(SomeDate,GETDATE()) >= GETDATE()

    SET STATISTICS TIME OFF

    PRINT '--===== Coalesce ==================================='

    SET STATISTICS TIME ON

     SELECT SomeID,SomeDate,SomeMoney

       FROM dbo.JBMTest

      WHERE COALESCE(SomeDate,GETDATE()) >= GETDATE()

    SET STATISTICS TIME OFF

    --===== Update the Nulls to the end of SQL time

     UPDATE dbo.JBMTest

        SET SomeDate '99991231'

      WHERE SomeDate IS NULL

    --===== Rebuild that index after such a large update

       DBCC DBREINDEX ('dbo.JBMTest','IX_JBMTest_SomeDate')

    PRINT '--===== End-of-Time Date ==================================='

    SET STATISTICS TIME ON

     SELECT SomeID,SomeDate,SomeMoney

       FROM dbo.JBMTest

      WHERE SomeDate >= GETDATE()

    SET STATISTICS TIME OFF

    [/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply