First Normal Form

  • Eric M Russell (6/30/2011)


    I never contain phone or address related columns in something like the primary customer table.

    #1 An address or phone number isn't an attribute or measure of the customer; the telephone company assigns it to us for the duration of our contract, so we only have a relationship with it.

    #2 Multiple customers may share the same address or phone number, and for many applications that's useful to know.

    #3 Virtually nobody has only one phone number these days.

    #4 There are usually additional contextual information like type, priority (primary, seconday, etc.), and also effective_from_date and effective_to_date that users need to know.

    Everything you mention is a Normalization that is done after 1NF.

    Customer Phone number relations are used in many texts to explain the differant levels of Normalization.

    I bet Tom was not trying to imply this as good DB structure.

  • I liked this article. It gave an understandable definition of 1NF without having to be familiar with all the terms of domain theory or relation theory or whatever esoterica people often use to prove how smart they are. I don't mind nulls for our use. I work for an insurance company and when someone calls in a claim "I don't know" or "I'm not sure now" are legitimate responses. Making a table for each possible "I don't know"-- "I don't know the name of the guy I hit", "I don't know my policy number" seems unwieldy.

    Thanks for the article and I look forward to 2NF!

    Jim

  • SanDroid (6/30/2011)


    Eric M Russell (6/30/2011)


    I never contain phone or address related columns in something like the primary customer table.

    #1 An address or phone number isn't an attribute or measure of the customer; the telephone company assigns it to us for the duration of our contract, so we only have a relationship with it.

    #2 Multiple customers may share the same address or phone number, and for many applications that's useful to know.

    #3 Virtually nobody has only one phone number these days.

    #4 There are usually additional contextual information like type, priority (primary, seconday, etc.), and also effective_from_date and effective_to_date that users need to know.

    Everything you mention is a Normalization that is done after 1NF.

    Customer Phone number relations are used in many texts to explain the differant levels of Normalization.

    I bet Tom was not trying to imply this as good DB structure.

    Right that would actually be 2nd normal form, I think. As for a common 1NF violation, and this is probably at the top of my gripe list, is containing "date/time" values in a varchar column, especially when it's inserted by multiple applications and there is no check constraint on the format or consensus about what format to use.

    For example, a request comes in to modify an ETL extract from a legacy database that's soon to be retired. Filter out patients less than 18 years old. OK, it's a Friday afternoon, and I'm hoping I can knock it out in an hour. However, I then look at the table and see this nonsense. Got to hate that.

    birthdate

    ------------

    04/15/2007

    3/3/2004

    jan 2, 2010

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (6/30/2011)


    SanDroid (6/30/2011)


    Eric M Russell (6/30/2011)


    I never contain phone or address related columns in something like the primary customer table.

    #1 An address or phone number isn't an attribute or measure of the customer; the telephone company assigns it to us for the duration of our contract, so we only have a relationship with it.

    #2 Multiple customers may share the same address or phone number, and for many applications that's useful to know.

    #3 Virtually nobody has only one phone number these days.

    #4 There are usually additional contextual information like type, priority (primary, seconday, etc.), and also effective_from_date and effective_to_date that users need to know.

    Everything you mention is a Normalization that is done after 1NF.

    Customer Phone number relations are used in many texts to explain the differant levels of Normalization.

    I bet Tom was not trying to imply this as good DB structure.

    Right that would actually be 2nd normal form, I think. As for a common 1NF violation, and this is probably at the top of my gripe list, is containing "date/time" values in a varchar column, especially when it's inserted by multiple applications and there is no check constraint on the format or consensus about what format to use.

    For example, a request comes in to modify an ETL extract from a legacy database that's soon to be retired. Filter out patients less than 18 years old. OK, it's a Friday afternoon, and I'm hoping I can knock it out in an hour. However, I then look at the table and see this nonsense. Got to hate that.

    birthdate

    ------------

    04/15/2007

    3/3/2004

    jan 2, 2010

    Weird... Have we worked for the same people? I had the same issue with an Enterprise ETL application once. After three failed deployments by the offshore Dev team my manager begged for help. Took me 20 minutes on a friday to re-do the whole thingredoSIS, debug the location of the only two dates MS SQL could not do an implicit conversion for, get those updated from the lame UI that allowed the entry, and got home early. All three of these examples can be implicitly converted from string to date by SQL server. Test it.

    create table testdate(birthdate datetime);

    Insert Into testdate

    Select '04/15/2007' birthdate

    union

    Select '3/3/2004' birthdate

    union

    Select 'jan 2, 2010' birthdate;

    select * from testdate;

    drop table testdate;

    I can't hate what MS SQL has made easier for me to work with.

  • SanDroid (6/30/2011)


    Eric M Russell (6/30/2011)


    SanDroid (6/30/2011)


    Eric M Russell (6/30/2011)


    I never contain phone or address related columns in something like the primary customer table.

    #1 An address or phone number isn't an attribute or measure of the customer; the telephone company assigns it to us for the duration of our contract, so we only have a relationship with it.

    #2 Multiple customers may share the same address or phone number, and for many applications that's useful to know.

    #3 Virtually nobody has only one phone number these days.

    #4 There are usually additional contextual information like type, priority (primary, seconday, etc.), and also effective_from_date and effective_to_date that users need to know.

    Everything you mention is a Normalization that is done after 1NF.

    Customer Phone number relations are used in many texts to explain the differant levels of Normalization.

    I bet Tom was not trying to imply this as good DB structure.

    Right that would actually be 2nd normal form, I think. As for a common 1NF violation, and this is probably at the top of my gripe list, is containing "date/time" values in a varchar column, especially when it's inserted by multiple applications and there is no check constraint on the format or consensus about what format to use.

    For example, a request comes in to modify an ETL extract from a legacy database that's soon to be retired. Filter out patients less than 18 years old. OK, it's a Friday afternoon, and I'm hoping I can knock it out in an hour. However, I then look at the table and see this nonsense. Got to hate that.

    birthdate

    ------------

    04/15/2007

    3/3/2004

    jan 2, 2010

    Weird... Have we worked for the same people? I had the same issue with an Enterprise ETL application once. After three failed deployments by the offshore Dev team my manager begged for help. Took me 20 minutes on a friday to re-do the whole thingredoSIS, debug the location of the only two dates MS SQL could not do an implicit conversion for, get those updated from the lame UI that allowed the entry, and got home early. All three of these examples can be implicitly converted from string to date by SQL server. Test it.

    create table testdate(birthdate datetime);

    Insert Into testdate

    Select '04/15/2007' birthdate

    union

    Select '3/3/2004' birthdate

    union

    Select 'jan 2, 2010' birthdate;

    select * from testdate;

    drop table testdate;

    I can't hate what MS SQL has made easier for me to work with.

    Letting SQL Server implicitly do the datatype conversion won't always cut it. For example, is this April 3, 2002 or March 4, 2002? It matters because in one situation I came across, the DOB was manually entered by end users and Name + DOB + ZIP was used to key back to another database.

    ------------

    4/3/2002

    Attempting to let SQL Server cast this value will cause the SQL query to fail with an error.

    ------------

    June 31, 2002

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (6/30/2011)


    Letting SQL Server implicitly do the datatype conversion won't always cut it. For example, is this April 3, 2002 or March 4, 2002? It matters because in one situation I came across, the DOB was manually entered by end users and Name + DOB + ZIP was used to key back to another database.

    ------------

    4/3/2002

    Attempting to let SQL Server cast this value will cause the SQL query to fail with an error.

    ------------

    June 31, 2002

    You are right. I was just pointing out that these date conversions are not so bad when moving them to MS SQL datetime data type. This data type is so well defined that implicit or explicit conversions from valid string values IMHO happen with an ease that cause me to love, not hate them. It makes the "magic" happen easy.

    I do hate trying to convert the JDE julian date format stored on any RDBMS to any date time datatype. Would love the prince that came up with that custom data type to have read about 1NF.

    I love the wings cause that's the fly part. -- Annonymous Boeing Engineer --

  • Eric M Russell (6/30/2011)


    I never contain phone or address related columns in something like the primary customer table.

    #1 An address or phone number isn't an attribute or measure of the customer; the telephone company assigns it to us for the duration of our contract, so we only have a relationship with it.

    #2 Multiple customers may share the same address or phone number, and for many applications that's useful to know.

    #3 Virtually nobody has only one phone number these days.

    #4 There are usually additional contextual information like type, priority (primary, seconday, etc.), and also effective_from_date and effective_to_date that users need to know.

    While all true, these are illustrative of the impetus for developing the normal forms other than 1NF so they make a nice place to start the discussion in his later articles should he so choose. Ultimately, though, I think the main point he was trying to make here is that, if the data in the field has no overt relationship (in an organizational sense) to other elements of the database but simply represents a blob of data which is of importance to a user or application, etc., external to the database, then what it contains does not affect normalization, even if, in the real world whatever it represents is related to other real-world items which are represented as data elements in the database

  • JJ B (6/30/2011)


    I appreciated the article. More so, I appreciated the author's replies in these discussions. Very informational and more so, a great example of professional conduct in response to criticism. Thanks for being a good role model. I look forward to reading your future articles.

    Thank you very much for this comment. A comment like this means a lot to any author, and is a real encouragement to try to maintain a high standard.

    Tom

  • KaplanDBA (6/30/2011)


    Setting aside the fact that I'd rather normalize this further than just 1NF, I want to understand your argument better:

    Well, NULL should always mean "The database doesn't contain this value", so it would be used in the case where the value is applicable but unknown for one reason or another as well as in the case where there is no value because it's not applicable; if you want to be able to tell for certain that the value is inapplicable, you can't use NULL unless you can guarantee that there are no cases where the value is applicable but unknown.

    What you're suggesting is that in the case where "Person A: Cannot possibly have a phone number" it would be better to use a garbage value like 000-000-0000,

    but in the case where "Person A: Could have a phone number but we don't know it" you suggest to use NULL?

    No, that's not at all what I'm suggesting. See below.

    What is the fundamental reasoning behind that decision? Is it simply based off the original definitions of NULL from Codd et al.?

    Codd's best known NULL definition in fact defines two distinct NULL values, and he uses a 4-valued logic instead of a 3-valued logic, and I wouldn't dream of using his 4VL for anything so I am certainly not using that definition!

    It's really a question of whether you want to distinguish various cases - and of whether the people who take the database over from you in the future will perhaps want to make distinctins that you currently believe are unimportant. If you have allowed your schema to have some values that can be inapplicable, you need to have some way of expressing "this is inapplicable". If you have a real world situation where some data may have to be entered before all the attributes are known, you need some way of expressing "we haven't put this value in the database". You can't use NULL to represent both if you want to be able to distinguish the two cases - but if you don't want to distinguish the two cases and are sure that no-one in future will then of course you can use NULL for both (but it's a good idea if you do so to document it very clearly in very many places). Since real world situations always seem to result in values being absent for one silly reason or another, it's a good idea to reserve NULL for that case - which means you need something other than NULL for the "known to be inapplicable" case. That's really all there is to it - no amazing mathematical basis or complex logic, just something simple and obvious.

    Chris Date suggested that ALL nulls should be replaced by special values; I don't believe that's possible, but some people do.

    Tom

  • Steven993 (6/30/2011)


    The only key in relational theory is the candidate key. Or just "key". There is no notion of "primary" which indeed come from SQL.

    Indeed the notion of "primary" key does not come from SQL. It's first mention in relational theory is in Codd's internal IBM report, the first public reference in relational theory in his paper in CACM 13/6 (June 1970) both of which antedate SQL by several years. In fact the term "primary key" antedates relational theory by a long long time. All of IMS, IDS, IDMS used the term "primary key" long before 1970, and IDMSX was using it before SQL (or even SEQUEL) existed. I'm pretty sure that all the main hierarchical and network DBMSs used the term - but maybe SDC's TDMS was an exception (I have no experience of that DBMS). I think Celko may be correct in suggesting that the term originated with some sort of sequential file system and meant the key which determined the sequence, and the most probable candidate is ISAM (which certainly used the term).

    Tom

  • SanDroid (6/30/2011)


    Eric M Russell (6/30/2011)


    I never contain phone or address related columns in something like the primary customer table.

    #1 An address or phone number isn't an attribute or measure of the customer; the telephone company assigns it to us for the duration of our contract, so we only have a relationship with it.

    #2 Multiple customers may share the same address or phone number, and for many applications that's useful to know.

    #3 Virtually nobody has only one phone number these days.

    #4 There are usually additional contextual information like type, priority (primary, seconday, etc.), and also effective_from_date and effective_to_date that users need to know.

    Everything you mention is a Normalization that is done after 1NF.

    Customer Phone number relations are used in many texts to explain the differant levels of Normalization.

    I bet Tom was not trying to imply this as good DB structure.

    That was a pretty safe bet!

    In what was meant to be a short and simple description of first normal form without any reference to other normal forms there was no hope of producing a good DB structure. In my view nothing that isn't all of 2NF, 3NF, and EKNF as well as 1NF is a good database structure, and if you can get to BCNF or even 4NF without violating the representation principle a good structure will have those properties too. Also, a structure that is not future-proof is not a good structure. When considering what the business rules are when determining what is 2NF, 3NF, EKNF, BCNG or 4NF one should consider what changes in business rules may happen next year, or the year after, or even later. For 1NF one should consider what someone might produce as a new requirement on the database in the future, as a new requirement may make a list-valued attribute into a violation of 1NF.

    Tom

  • Hi Tom,

    Firstly, thanks for a great article, and an even better discussion following up from it. I'm really looking forward to parts 2 and 3.

    Just one question. When you say

    Tom.Thomson (6/30/2011)

    In my view nothing that isn't all of 2NF, 3NF, and EKNF as well as 1NF is a good database structure...

    are you implying that you can have a model that is in 2NF, 3NF etc that is not 1NF? I've always been taught that for something to me 2NF it does have to be 1NF in the first place. Or am I reading something into that sentence that isn't there?

    Regards,

    Jan

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)[/url]

  • Tom.Thomson (6/30/2011)


    ...All of IMS, IDS, IDMS used the term "primary key" long before 1970...

    Wow! Tom, that takes me back :Wow:

    IDMS was the first database I worked with :w00t:

    Nice article by by the way 🙂

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Jan Van der Eecken (7/1/2011)


    Hi Tom,

    Firstly, thanks for a great article, and an even better discussion following up from it. I'm really looking forward to parts 2 and 3.

    Thanks. I'm glad you liked it.

    Just one question. When you say

    Tom.Thomson (6/30/2011)

    In my view nothing that isn't all of 2NF, 3NF, and EKNF as well as 1NF is a good database structure...

    are you implying that you can have a model that is in 2NF, 3NF etc that is not 1NF? I've always been taught that for something to me 2NF it does have to be 1NF in the first place. Or am I reading something into that sentence that isn't there?

    Regards,

    Jan

    I didn't intend to suggest that somethiong could be in 2NF without being in 1NF, or in 3NF that isn't in 2NF - just that going to 1NF is generally not on its own going far enough to produce to produce a good structure, one has to do lots of other normalisation too. I guess I should have phrased it better.

    Tom

  • Jan Van der Eecken (7/1/2011)


    Just one question. When you say

    Tom.Thomson (6/30/2011)

    In my view nothing that isn't all of 2NF, 3NF, and EKNF as well as 1NF is a good database structure...

    are you implying that you can have a model that is in 2NF, 3NF etc that is not 1NF? I've always been taught that for something to me 2NF it does have to be 1NF in the first place. Or am I reading something into that sentence that isn't there?

    All the normal forms are concerned with dependencies satisfied by relations (or relation variables). If 1NF is understood to be exactly equivalent to the definition of a relation then 2NF, 3NF, BCNF, etc certainly also imply 1NF, just because relations are the only things that satisfy those normal forms.

    However, if you understand (misunderstand in my view) 1NF to stipulate other conditions over and above the basic requirement that a schema consists only of relations then the higher normal forms don't have to satisfy 1NF.

    For example if you think that 1NF disallows certain types of value in a relation then that's something that isn't required by any other NF. Or if you think 1NF means a relation must have a "primary" key and that "primary" means something other than a candidate key then you'd also be defining something that was not required by any other NF.

    This is why I don't think it's productive to say that 1NF disallows certain types of value - even complex values like XML that the DBMS may later deconstruct into other values; or relation values; or other "exotic" objects like multimedia. Tom's article suggests a criteria for excluding certain types of value from 1NF based on how you intend to use them in the database. I disagree. If it is useful to put some value in a relation and potentially subject it to the same dependencies as any other relation then it's perfectly reasonable to want to apply the principles of normal forms to it. In Codd's day databases were concerned with numbers and strings and little else. That's not the world we are in today. As soon as we start placing limits on the type of data values a relation is "allowed" to include then we weaken the relational model and make it less useful.

Viewing 15 posts - 31 through 45 (of 69 total)

You must be logged in to reply to this topic. Login to reply