NULL vs. Empty String

  • Hi all,

    I was wonderning what is a good entry for a database record if it has to be left blank (ie, no data is entered). Is NULL better than Empty String or vice versa.

    I apologize if this is a redundant posting.

    thank you all in advance,

    V

  • My answer is that it depends.

    Basically, if you mean using a 'space' for an empty string, then NULL takes up less space.

    But if you are using a query such as this:

    SELECT col1, col2

    FROM mytable

    WHERE col1 = @myvariable

    Then NULL will probably cause problems. To use NULL in the search you would have to change the query to WHERE col1 IS NULL. So you couldn't use the variable. But a space could be entered as:

    SET @myvariable = ' '

    and you wouldn't have to make any changes to the WHERE.

    -SQLBill

  • My thinking while writing this question was, if we discuss from a purely database maintainence (and/or performance) perspective, which is better?

  • My rule is to only make a column nullable if it truly needs to hold "unknown" values. Nullable columns will force you to use ternary predicates with NULL logic that work correctly for the actual "unknown" meaning of NULL. E.g., if you create a column for "Address Line 2", there may be many tuples where an empty string means "known to be blank," which is logically very different from "unknown."

    If you don't need NULL logic then don't use nullable columns.

    --Jonathan



    --Jonathan

  • Well, as has been said above, it depends...

    you might take a look at

    http://www.firstsql.com/inulls.htm

    http://qa.sqlservercentral.com/columnists/jtravis/understandingthedifferencebetweenisnull.asp

    http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf

    to get further information on NULL

    Frank

    http://www.insidesql.de

    Well, actually no real need to read my last link. Might actually lead to confusion . But nice to know anyway.

    Edited by - Frank Kalis on 10/21/2003 01:12:18 AM

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Keep it simple and stupid.

    Follow Jonathan's guidelines.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • quote:


    Keep it simple and stupid.


    are we talking about management ???

    Frank

    http://www.insidesql.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • hi!

    try to avoid empty strings, if you're up to creating "clean" code on whatever client side you're using. stumbling over a NULL from time to time will make your developers remember, that ther *is* something like NULL and that they'll have to check for it!

    otherwise they'll assume that empty means just "empty string" which will ease programming but make your applications do the funniest things, once some NULLs arise.

    best regards,

    chris.

  • Have to agree with Jonathan. Null can have an entirely different meaning from an empty string.

    Just think about an integer column. In that case a NULL is completely different from a zero value.

    For me every column that is not required has to be nullable, and a null value indicates that it has not been entered yet.

  • Definitely NULL. I would never even consider using a blank. In fact, it drives me nuts when a front end tool inserts a blank instead of a null in a nullable column. Because of this you should always query with IsNull(nullcol, '') = ''.

  • First off I will state I disagree with CJ Date and the others who think NULL is a waste in SQL. NULL does not apply only to SQL but to most all languages and in fact came from the others as far as concept.

    NULL is the abstract expression of the unknown.

    Without the abstract value of NULL how then do you define a completely unknown value. Some suggest defaults but give no clear idea as to how these defaults solve the NULL problem as they will still represent the unknown.

    Now consider this when a datatype is fixed length such as an int or char how is the data express in the database when it does exist.

    Here is the DBCC PAGE output from a table I created tx with column d int, c char(10), and a int.

    First off ignore the 00160010 at the begining of each hex set as it has special meaning but not important here.

    Note: int is 4 bytes char(10) is 10 and int is 4 for a total of 18 bytes

    In Slot 0 (record 1 on the page) the space from the [ and ] is 20 but the 0003 in the last set can be ignored because it has to do with end of data so subtract 2 (HEX is base on 00 thru FF as 1 byte and we are removing 2 sets).

    Thus we have 18 total bytes. Also note the order of bytes is funny in that 00000001 is actually read 00010000 and so on to read out the rows in this view.

    436B4060: 00160010 [00000001 00000000 00000000 ................

    436B4070: 00000000 00030000] 06 .........

    Now none of that is key what is key is the 06 at the very end seperate from the rest in this case.

    It is important in the fact that columns 2 and 3 are NULL in the first example. Now as you see it is 6 but the key is it is a bitmask. Column 1 = 1, 2 = 2, 3 = 4, 4 = 8 and so on up to 1,024 columns

    So 2 + 4 = 6 the bitmask tells me column 2 and 3 are NULL as this is the NULL bitmask.

    
    
    Slot 0 Offset 0x60
    ------------------
    Record Type = PRIMARY_RECORD
    Record Attributes = NULL_BITMAP
    436B4060: 00160010 00000001 00000000 00000000 ................
    436B4070: 00000000 00030000 06 .........
    d = 1
    c = [NULL]
    a = [NULL]

    Slot 1 Offset 0x79
    ------------------
    Record Type = PRIMARY_RECORD
    Record Attributes = NULL_BITMAP
    436B4079: 00160010 00000002 00000000 00000000 ................
    436B4089: 00000000 00030000 06 .........
    d = 2
    c = [NULL]
    a = [NULL]

    Slot 2 Offset 0x92
    ------------------
    Record Type = PRIMARY_RECORD
    Record Attributes = NULL_BITMAP
    436B4092: 00160010 00000003 2020207a 20202020 ........z
    436B40A2: 00002020 00030000 04 .......
    d = 3
    c = z
    a = [NULL]

    Slot 3 Offset 0xab
    ------------------
    Record Type = PRIMARY_RECORD
    Record Attributes = NULL_BITMAP
    436B40AB: 00160010 00000004 2020207a 20202020 ........z
    436B40BB: 00012020 00030000 00 .......
    d = 4
    c = z
    a = 1

    Ok so under the hood the data is still stored but with 0 and a bitmask to tell the engine which columns are NULL.

    Now from there what happens with varchars. It is a bit different.

    The table was defined as ty with column a int, b varchar(5), c tinyint, and d char(8).

    Here is the DBCC PAGE output

    
    
    Slot 0 Offset 0x60
    ------------------
    Record Type = PRIMARY_RECORD
    Record Attributes = NULL_BITMAP
    43808060: 00110010 00000001 00000000 00000000 ................
    43808070: 0e000400 ....
    a = 1
    b = [NULL]
    c = [NULL]
    d = [NULL]

    Slot 1 Offset 0x74
    ------------------
    Record Type = PRIMARY_RECORD
    Record Attributes = NULL_BITMAP
    43808074: 00110010 00000001 00000002 00000000 ................
    43808084: 0a000400 ....
    a = 1
    b = [NULL]
    c = 2
    d = [NULL]

    Slot 2 Offset 0x88
    ------------------
    Record Type = PRIMARY_RECORD
    Record Attributes = NULL_BITMAP
    43808088: 00110010 00000001 61616102 20202061 .........aaaa
    43808098: 02000420 ...
    a = 1
    b = [NULL]
    c = 2
    d = aaaa

    Slot 3 Offset 0x9c
    ------------------
    Record Type = PRIMARY_RECORD
    Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
    4380809C: 00110030 00000001 61616102 20202061 0........aaaa
    438080AC: 00000420 00190001 63 .......c
    a = 1
    b = c
    c = 2
    d = aaaa

    Now there are a few differences here that make using NULLable varchars good and tell us a bit about the engine.

    In Slot 0 we notice we are 5 bytes short of what a usual record looks like when all fixed length, this is because nothing is written for the varchar field when not filled.

    But we see the fist byte on the second row has 0e00 whih is what we are looking for here. What has happened is the engine bitmasks the fact 2,3, and 4 are NULL (in other words 2+4+8=14 or 0E in HEX) so the engine uses the NULL marker even for varchar fields.

    Proceed to SLOT 2 and SLOT 3 and you see the respective values for NULL bitmask 0a (columsn 2 and 4) and 02 (column 2 only).

    Now the fun comes when we add data to the varchar our row grows by several bytes but not neccessarily equal to the 5 for the varchar. What happens here is there is a marker added to tell the engine where the varchar data for the first non-fixed-length column begins and takes up 2 bytes itself (this is the offset from begining of record). a new one is added for each varchar column that is put into play and all come before the varchar data for all columns is written (or any non-fixed-length datatype).

    The key thou is the bitmask for columns is used under all cirumstances to note when there are NULL values.

    So in the case of a non-fixed-length column that may have NULLs using a default instead is a waste of space which is a commoditity best served otherwise.

    Also, if you are going to use a Non-NULL value how do you deal with it more effectively (or at least as effectively) compared to the NULL bitmask. You still have to dream up something that will add the overhead for NULL to be dealt with but you will rely on the developer coding a NULL or UNKNOWN value process which opens the door for many errors with a strict guideline.

    My opinion is use NULL when possible but in some situations it may make more sense to use a default for no data such as '' because it may be easier to deal with user input in char values. Just be carefull that good records do not have a overlap issue such a $0 != $0 becuase one is unknown and oen represents no value. As for non-fixed-length datatypes thou, it is still up to you, but keep in mind you save space with NULL.

    I hope all that helps a bit or leads you to some understanding of NULL and it's handling but as for the whole NULLs are not valueable I am afraid that is not the cause becuase you will always run into the need to express an unknown value as opposed to a non-existant value.

  • Oh 1 more not I would suggest if you have alots of repeating values or if the NULLable column will be NULL often then consider splitting the table and using a 1 to 1 relationship. This will still produce NULLs when joined but can be bennificial, especially with fixed length datatypes, in saving space.

  • I prefer defaults whenever possible. In some columns (such as Termination Date in an Employees table) NULL is the logical choice.

    It depends.

    Dr. Peter Venkman: Generally you don't see that kind of behavior in a major appliance.

    Patrick

    Quand on parle du loup, on en voit la queue

  • quote:


    I prefer defaults whenever possible. In some columns (such as Termination Date in an Employees table) NULL is the logical choice.

    It depends.


    That's interesting, as I would use something like 99991231 if they're not terminated. The termination date is not "unknown" in the sense that they may or may not have been terminated, and this allows normal logic and (perhaps) making the column NOT NULL, although it does preclude using the IS NOT NULL logic as a shortcut for "not terminated." Personally I think this "logic" is misguided: you may save a tiny bit of space by making the TerminationDate do double duty, but you have limited the information you can store. E.g., if you use a separate flag column for Terminated, you will be able to tell if someone is terminated whether or not their termination date has been set. (Why am I getting a mental picture of the Governor-Elect of California?)

    The nulls vs no nulls debate has been raging since Date and Codd. We're not going to settle it in this forum, although it's interesting to read some of these opinions and reasoning.

    quote:


    My opinion is use NULL when possible...


    I guess I disagree with James ("Antares686") on this; I am loathe to use nullable columns and create them only when I need the "unknown" logic. I certainly won't allow NULL to be used when the value is instead empty. E.g. I might make a City column in an address table nullable and not allow blanks, but I would make the Name column not null and not allow blanks. I might make the PhoneNumber column nullable and allow blanks (not everyone has a phone). That way one could query for entities with no phones or for entities where we don't know their phone numbers. A problem with allowing both blanks and nulls, as I wrote in another thread, is making clear the distinction in the data-entry front-end.

    --Jonathan



    --Jonathan

  • quote:


    That's interesting, as I would use something like 99991231 if they're not terminated. The termination date is not "unknown" in the sense that they may or may not have been terminated, and this allows normal logic and (perhaps) making the column NOT NULL, although it does preclude using the IS NOT NULL logic as a shortcut for "not terminated." Personally I think this "logic" is misguided: you may save a tiny bit of space by making the TerminationDate do double duty, but you have limited the information you can store. E.g., if you use a separate flag column for Terminated, you will be able to tell if someone is terminated whether or not their termination date has been set. (Why am I getting a mental picture of the Governor-Elect of California?)


    uff, and I thought while reading Patricks' post, the programmers of our human resources system ?!? are completely wrong using a termination date of 12/31/9999.

    Now that Jonathan mentioned the Californian Governor Election I may ask the question if it advisable to keep calm with critic of the government in California?

    Do you have to fear hearing 'Hasta la vista, baby' and getting terminated

    quote:


    The nulls vs no nulls debate has been raging since Date and Codd. We're not going to settle it in this forum, although it's interesting to read some of these opinions and reasoning.


    NULLs have been, are and I guess will always be an integral part of RDBMS, so help me Codd (#3).

    Frank

    http://www.insidesql.de

    Edited by - Frank Kalis on 10/22/2003 12:29:16 AM

    Edited by - Frank Kalis on 10/22/2003 12:31:20 AM

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply