To NULL or not to NULL

  • I recently started a new job where the existing database tables have all fields designated as NULLS Not Allowed; instead, all fields have default values of '' (empty string), 0 (zero), or a default datetime stamp of 1/1/1901 00:00:00. I was mystified, as I have always allowed NULLs in anything that didn't require a value at record creation. I conferred with the veteran designer who originated this scheme, and he is adamantly against using NULLs. Instead, his application code interprets any date with the default value to print a blank on user screens and reports. According to him there is a definite performance gain in not allowing NULLs, but I would think the additional storage requirements would more than negate any performance gain. Is there any truth to his argument? If not, besides storage, what advantages are there to using NULLs over not using them?

    Ron Kunce

    Ron K.

    "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler

  • Zero is quite a common real value for a numeric field.

    I can hardly imagine how you would tell the difference between a 'real' 0 and a 0 that stands in for NULL.
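    For what it's worth, SQL itself does keep the two apart when the column is nullable; a quick sketch (table and column names here are made up for illustration):

    ```sql
    -- Hypothetical table: Qty is nullable, so 0 and "unknown" stay distinct.
    CREATE TABLE dbo.OrderLines (Id INT NOT NULL PRIMARY KEY, Qty INT NULL);
    INSERT INTO dbo.OrderLines VALUES (1, 0);
    INSERT INTO dbo.OrderLines VALUES (2, NULL);

    SELECT Id,
           CASE WHEN Qty IS NULL THEN 'never entered'   -- only IS NULL finds this row
                WHEN Qty = 0     THEN 'a real zero'
                ELSE 'some other value'
           END AS Meaning
      FROM dbo.OrderLines;
    ```

    With a NOT NULL column defaulted to 0, that distinction is simply gone.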

  • I realize how number fields accommodate NULLs, and these are of little concern. However, when you have many CHAR(##) fields and the default is set to '', you have allocated ## bytes of storage for that empty value where a NULL would not have. Of even more concern is using a default date of 1/1/1901 in all datetime fields. Not only are the storage requirements increased needlessly, but (though not a real problem) having an "Ending" date of 1/1/1901 within a date-span set of values, where an actual end date has not occurred yet, is disconcerting, requiring extra coding to blank it out on user screens and reports.

    My real purpose here is to determine if there really is a performance advantage to not allowing nulls, as I was told, or if our lead designer is blowing smoke, as I believe! However, I need supportive arguments to refute his claim. (Performance advantage claims are always one of the hardest to confirm or disprove without setting up and running your benchmark trials.)

    Ron K.

  • From what I understand, if NULLs are allowed, the on-disk row data includes a null bitmap that identifies which columns in a particular row are NULL. I can't really see this affecting performance.

    The real problem with NULL is that you (or, more likely, someone else on the dev team) have to understand three-valued logic and handle NULLs correctly.

    They certainly have their place, I hope my medical records have a NULL in the DOD column!
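    A quick sketch of the three-valued-logic traps in question (standalone queries, under the default ANSI_NULLS ON setting):

    ```sql
    -- NULL = NULL evaluates to UNKNOWN, not TRUE, so the ELSE branch fires:
    SELECT CASE WHEN NULL = NULL THEN 'equal' ELSE 'unknown' END;

    -- Classic trap: NOT IN against a set containing NULL returns no rows at all,
    -- because "2 <> NULL" is UNKNOWN and poisons the whole predicate:
    SELECT 1 WHERE 2 NOT IN (1, NULL);
    ```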

    CHAR fields set to nullable will be handled as VARCHAR if ANSI_PADDING is turned off. If ANSI_PADDING is NOT turned off, the same amount of space is allocated whether or not the field is null (i.e., the maximum size of the field). DATETIME is a fixed-width type, so AFAIK the space is allocated no matter what.
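    A small way to see that for yourself (illustrative table name; note that ANSI_PADDING must be set before the column is created, since the setting is captured at creation time):

    ```sql
    SET ANSI_PADDING ON;  -- the default; captured per column at CREATE time
    CREATE TABLE dbo.PadDemo (c CHAR(10) NULL);
    INSERT INTO dbo.PadDemo VALUES ('ab');
    INSERT INTO dbo.PadDemo VALUES (NULL);

    -- With padding on, 'ab' occupies the full 10 bytes; the NULL row stores
    -- nothing for c beyond its flag in the row's null bitmap.
    SELECT c, DATALENGTH(c) AS bytes_used FROM dbo.PadDemo;
    ```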

    One way or another, there's going to be some amount of code to deal with either the "dummy" values or the NULLs. Some dev platforms, including .NET to some degree, don't much appreciate receiving NULL values, so there might be something to be said about there being extra work. Of course, you're going to have the SAME amount of work to handle the dummy value, AND then you also have to handle the dummy value in all backend processing as well, so I'd often agree with you.

    Bottom line, though: it might depend on your specific scenario. Is it important to know whether a field was never entered (the usual application of a NULL), or does a default work better (say 0, though it could be any of a number of values that make the most sense as a default)?

    Note that none of that was performance-related. You might see some who advocate pushing normalization to such extremes as to not allow NULLs, but quite honestly, given current RDBMS processing, that usually NEGATIVELY impacts your app.

    I can't seem to see any performance difference between the two. I just ran a few tests on some test data, and the difference between IS NULL and = '' is so small that I'm not sure it's anything more than random (a few milliseconds over 10M records). The same applies to the converse (IS NOT NULL vs. <> '' vs. > ''). Reads and CPU cost are roughly the same as well.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right - what was my emergency again?

  • My real purpose here is to determine if there really is a performance advantage to not allowing nulls, as I was told, or if our lead designer is blowing smoke as I believe!

    Ron,

    A good friend of mine once said, "A Developer must not guess... a Developer must know." DBAs and Designers tend to stress that as they "knock their rings" on the table. Along with that, another good friend of mine suggests that "One good test is worth a thousand speculations and calculations."

    Performance advantage claims are always one of the hardest to confirm or disprove without setting up and running your benchmark trials.

    Considering the point you're trying to make and who you're trying to make it with, you should really take the time to do that 😛

    However, I need supportive arguments to refute his claim.

    No... you need supportive tests... and a pant-load of data... 😉 With that in mind... here's the pant-load of data...

    DROP TABLE JBMTest
    GO
    --===== Create and populate a 2,000,000 row test table.
         -- Column "RowNum" has a range of 1 to 2,000,000 unique numbers
         -- Column "SomeIntNN" has a range of 1 to 50,000 non-unique numbers and 0's
         -- Column "SomeDateNN" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times and 0's
         -- Column "SomeCharNN" has a range of "AA" to "ZZ" non-unique 2 character strings and blanks
         -- Column "SomeIntNull" has a range of 1 to 50,000 non-unique numbers and Nulls
         -- Column "SomeDateNull" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times and Nulls
         -- Column "SomeCharNull" has a range of "AA" to "ZZ" non-unique 2 character strings and Nulls
     SELECT TOP 2000000
            RowNum       = IDENTITY(INT,1,1),
            SomeIntNN    = ABS(CHECKSUM(NEWID()))%50000+1,
            SomeDateNN   = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
            SomeCharNN   = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                         + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
            SomeIntNull  = ABS(CHECKSUM(NEWID()))%50000+1,
            SomeDateNull = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
            SomeCharNull = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                         + CHAR(ABS(CHECKSUM(NEWID()))%26+65)
       INTO dbo.JBMTest
       FROM Master.dbo.SysColumns t1,
            Master.dbo.SysColumns t2 --Lack of join criteria makes this a CROSS JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned
     ALTER TABLE dbo.JBMTest
       ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Update even-numbered rows with 0's/blanks and Nulls
     UPDATE dbo.JBMTest
        SET SomeIntNN    = 0,
            SomeDateNN   = 0, --Same as '01/01/1900'
            SomeCharNN   = '',
            SomeIntNull  = NULL,
            SomeDateNull = NULL,
            SomeCharNull = NULL
      WHERE RowNum % 2 = 0

    ... and here are the tests... I recommend that you run them multiple times and log the results in a spreadsheet so you can show what you've learned to the "smoker"... I also recommend that you try some variations... like finding the 0 and Null values instead of the non-0 and non-Null values, adding indexes, clearing cache, etc. In other words, don't take a knife to a gun fight. 😉
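    One of those suggested variations, sketched against the test table built above (looking for the "empty" rows instead of the populated ones):

    ```sql
    -- Count the dummy-value rows vs. the NULL rows:
    SELECT COUNT(*) FROM dbo.JBMTest WHERE SomeIntNN = 0;
    SELECT COUNT(*) FROM dbo.JBMTest WHERE SomeIntNull IS NULL;

    -- Note: WHERE SomeIntNull = NULL matches nothing under ANSI_NULLS;
    -- NULL tests must use IS NULL / IS NOT NULL.
    ```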

    --===== Declare a timer variable
    DECLARE @StartTime DATETIME

    --===== Declare dummy variables so we don't have to display results,
         -- which would taint the speed tests.
    DECLARE @DummyInt  INT
    DECLARE @DummyChar CHAR(2)
    DECLARE @DummyDate DATETIME

    PRINT REPLICATE('=',78)
    --==============================================================================
    --===== Test which is faster... finding non-0 Ints or non-Null Ints
    --==============================================================================
    PRINT 'Testing lookup for non-0 Ints in non-nullable column...'
    PRINT '(Note: this test will always be slow on first test... no statistics built up)'
    SET @StartTime = GETDATE()
    SELECT @DummyInt = SomeIntNN
      FROM dbo.JBMTest
     WHERE SomeIntNN > 0
    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('-',78)
    PRINT 'Testing lookup for non-Null Ints in nullable column...'
    SET @StartTime = GETDATE()
    SELECT @DummyInt = SomeIntNull
      FROM dbo.JBMTest
     WHERE SomeIntNull IS NOT NULL
    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('-',78)
    PRINT 'Testing lookup for non-Null Ints in nullable column a "special" way...'
    SET @StartTime = GETDATE()
    SELECT @DummyInt = SomeIntNull
      FROM dbo.JBMTest
     WHERE SomeIntNull > 0 --Still excludes Nulls because they cannot be compared this way
    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('=',78)
    --==============================================================================
    --===== Test which is faster... finding non-0 Dates or non-Null Dates
    --==============================================================================
    PRINT 'Testing lookup for non-0 Dates in non-nullable column...'
    SET @StartTime = GETDATE()
    SELECT @DummyDate = SomeDateNN
      FROM dbo.JBMTest
     WHERE SomeDateNN > 0
    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('-',78)
    PRINT 'Testing lookup for non-Null Dates in nullable column...'
    SET @StartTime = GETDATE()
    SELECT @DummyDate = SomeDateNull
      FROM dbo.JBMTest
     WHERE SomeDateNull IS NOT NULL
    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('-',78)
    PRINT 'Testing lookup for non-Null Dates in nullable column a "special" way...'
    SET @StartTime = GETDATE()
    SELECT @DummyDate = SomeDateNull
      FROM dbo.JBMTest
     WHERE SomeDateNull > 0 --Still excludes Nulls because they cannot be compared this way
    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('=',78)
    --==============================================================================
    --===== Test which is faster... finding non-blank Chars or non-Null Chars
    --==============================================================================
    PRINT 'Testing lookup for non-blank Char in non-nullable column...'
    SET @StartTime = GETDATE()
    SELECT @DummyChar = SomeCharNN
      FROM dbo.JBMTest
     WHERE SomeCharNN > ''
    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('-',78)
    PRINT 'Testing lookup for non-blank Char in nullable column...'
    SET @StartTime = GETDATE()
    SELECT @DummyChar = SomeCharNull
      FROM dbo.JBMTest
     WHERE SomeCharNull IS NOT NULL
    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('-',78)
    PRINT 'Testing lookup for non-blank Char in nullable column a "special" way...'
    SET @StartTime = GETDATE()
    SELECT @DummyChar = SomeCharNull
      FROM dbo.JBMTest
     WHERE SomeCharNull > '' --Still excludes Nulls because they cannot be compared this way
    PRINT CONVERT(CHAR(12),GETDATE()-@StartTime,114) + ' Duration (hh:mi:ss:mmm)'

    PRINT REPLICATE('=',78)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • ...which yields on my system:

    ==============================================================================
    Testing lookup for non-0 Ints in non-nullable column...
    (Note: this test will always be slow on first test... no statistics built up)
    00:00:00:310 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-Null Ints in nullable column...
    00:00:00:330 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-Null Ints in nullable column a "special" way...
    00:00:00:373 Duration (hh:mi:ss:mmm)
    ==============================================================================
    Testing lookup for non-0 Dates in non-nullable column...
    00:00:00:313 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-Null Dates in nullable column...
    00:00:00:313 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-Null Dates in nullable column a "special" way...
    00:00:00:297 Duration (hh:mi:ss:mmm)
    ==============================================================================
    Testing lookup for non-blank Char in non-nullable column...
    00:00:00:373 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-blank Char in nullable column...
    00:00:00:313 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-blank Char in nullable column a "special" way...
    00:00:00:343 Duration (hh:mi:ss:mmm)
    ==============================================================================

    Again - in my mind - so small I'm not sure that's even significant.


  • Yeap... even on my 4-year-old 1.8GHz desktop... not much of a difference... remember, this is to "return" a million rows out of 2 million...

    ==============================================================================
    Testing lookup for non-0 Ints in non-nullable column...
    (Note: this test will always be slow on first test... no statistics built up)
    00:00:00:783 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-Null Ints in nullable column...
    00:00:00:783 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-Null Ints in nullable column a "special" way...
    00:00:00:810 Duration (hh:mi:ss:mmm)
    ==============================================================================
    Testing lookup for non-0 Dates in non-nullable column...
    00:00:00:797 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-Null Dates in nullable column...
    00:00:00:783 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-Null Dates in nullable column a "special" way...
    00:00:00:780 Duration (hh:mi:ss:mmm)
    ==============================================================================
    Testing lookup for non-blank Char in non-nullable column...
    00:00:01:563 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-blank Char in nullable column...
    00:00:01:440 Duration (hh:mi:ss:mmm)
    ------------------------------------------------------------------------------
    Testing lookup for non-blank Char in nullable column a "special" way...
    00:00:01:450 Duration (hh:mi:ss:mmm)
    ==============================================================================

    --Jeff Moden

  • Thanks for the ammo! :D

    Ron K.


  • allen davidson (11/28/2007)


    From what I understand if nulls are allowed the on disk data adds a supporting column consisting of a bit map that identifies which columns in a particular row are null. I can't really see this affecting performance.

    If you have the time to actually trace my posting history, you will find this:

    As I recall, the null bitmap is there even if no nullable column exists; it is just a big fat 0 which, as I recall, is 6 or maybe up to 8 bytes wide.

    With regard to space, this is the key: the offset stored for the end of variable-length data is 4 bytes wide. I found that if you have a non-nullable column, say VARCHAR(1), defined after all the other variable-length items that are nullable, you lose the savings. The key is to order their placement, if at all possible in the design, with non-nullable columns first, then from least to most likely to be NULL.
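    A sketch of the column ordering being suggested (table and columns are made up for illustration):

    ```sql
    CREATE TABLE dbo.ColumnOrderDemo
    (
        Id         INT         NOT NULL PRIMARY KEY,
        Code       VARCHAR(10) NOT NULL,  -- non-nullable variable-length first
        MiddleName VARCHAR(30) NULL,      -- sometimes NULL
        Nickname   VARCHAR(30) NULL       -- most often NULL: placed last
    );
    ```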

    I was able to figure this out with the DBCC PAGE command and have posted about it a lot in the past.

    Now, as for the reasons some folks give for not using NULL: there are many, but none that I have found (including performance) has ever been founded in documentation or other evidence; the system still has to process a third potential state somewhere to say the value is unknown. That said, you can have issues if you don't understand NULL behavior, which can cause you to get unexpected results, but that is a lack of understanding, not a performance problem with the system.

  • Ron_Kunce (11/28/2007)


    Thanks for the Ammo!:D

    Heh... ammo is one thing... learning how to shoot it straight may be another. My advice is to play with the code I gave you... add indexes and do the other things I suggested like look for the opposite condition that the code is looking for now. That way, you'll be prepared for all of the side-bars the Designer is going to throw at you.

    --Jeff Moden

  • Thanks much for your comment. It totally explains why I got what amounts to an inane excuse for not using NULLs from a person who is otherwise a very knowledgeable programmer (yet is lacking in relational database design).

    Thanks again.

    Ron K.


  • Ron,

    I suggest you follow Jeff's advice and play with indexes.

    NULLs can really affect the execution plan.
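    For example (index names are hypothetical; the table is the one from Jeff's script), an index on a nullable column lets the optimizer satisfy an IS NOT NULL predicate with a seek, and on SQL Server 2008 and later a filtered index can leave the NULLs out of the index altogether:

    ```sql
    CREATE INDEX IX_JBMTest_SomeIntNull ON dbo.JBMTest (SomeIntNull);

    -- The IS NOT NULL predicate can now be satisfied by an index seek:
    SELECT RowNum
      FROM dbo.JBMTest
     WHERE SomeIntNull IS NOT NULL AND SomeIntNull < 100;

    -- SQL Server 2008+: exclude the NULL half of the table from the index entirely.
    -- CREATE INDEX IX_JBMTest_SomeIntNull_F ON dbo.JBMTest (SomeIntNull)
    --  WHERE SomeIntNull IS NOT NULL;
    ```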
