Database Design and Reference Tables

  • No, we certainly wouldn't want to let customers modify FKs on installation.

    However, the original requirement is inescapable (and - in our market context - not individually chargeable for).

    The duplication of data, and subsequent drift, is unacceptable.

    It seems I can no longer seek best practice but 'least worst' practice.

    Of the three alternatives, despite the criticism given in this thread, the CLUTs table still looks to be the about the least worst across the life of a software product - with the considerable commercial advantage of simple initial development.

    Perhaps it is time to open a separate thread on the 'least worst' way to build a CLUTs table.

  • I think you are minimizing the risks of a MUCK table and maximizing the percieved benefits...

    The risk of drift is real, but using MUCKs does nothing to address it.  Using seperate tables you could create a job to check the two tables against each other and where differences are found send an alert to someone.  That is one example (and maybe not the best) of mitigating the risk of drift.  What can you do with a MUCK to prevent bad data from being entered and used?

    If you are absolutely determined to use a MUCK, use them as little as possible.  Using them totally negates the purpose and benefit of using a database at all; you might as well just use a file-based system for data persistence.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I probably am, but - hey - I'm not the database programmer in this project, far less the lead design authority.  I'm just the business tier programmer trying to find out how best to resolve a conflict.

    If forced down the MUCK/CLUTs route, as I fear we will be; I would embed the Category (Type) of the User defined code field in the entity table, as well as the code, and use either a check constraint or insert & update triggers to do the necessary look up (to ensure the category in the entity record matches the category configured) and a foreign key constraint to ensure the Category x Code composite  matches a row in the MUCK.

    It will undoubtedly hurt insert & update performance but leaves the cleanest read.

    Using jobs to minimise drift in copied tables does look like a possible alternative, though it does require us to maintain some storage of which tables are duplicates of which other tables.  

  • Notice that the name "Automobiles-to-Zipcodes" requires a quoted identifier.  See Joe Celko's SQL Programming Style, Chap. 1.1.2 ("Avoid All Special Characters in Names") and 1.1.3 ("Avoid Quoted Identifiers").

  • Although I agree that the design proposed in the article should be avoided, I'm a little confused by all the comments on normalization, especially this one.

    In what way does the design violate 1NF? Are you sure you understand what 1NF is?

     

    TroyK

  • I would call such a column one that is defined over an "interstingly" designed domain (data type). It would be possible for it to be correct, but unlikely.

    Regarding the table from the article -- there is only one value (a "description") at each "intersection" of row and column. Furthermore, I have a high degree of confidence that the DBMS implements the necessary operators to support the manipulation of the values of the particular domain (data type) over which the column is defined (presumably a varchar(...)). Assuming my confidence is warranted, the table is in 1NF.

    TroyK

  • To add to what Joe said:

    Use of MUCK tables violates either the Information Principle or 1NF--or both.  It is common to see a MUCK table defined with like so

    CREATE TABLE Codes (CodeName varchar(50), CodeValue varchar(50), CodeType varchar(50)) 

    In this case the code columns in the referencing tables are not wholly dependent on the key of that table as it is dependent on the CodeType column as well, and thus the table is not normalized regardless of how broadly you define your domain.

    If you leave off the CodeType, you violate the Information Principle because now instead of representing that piece of information as a column in a table, it is now hidden in the logic of the application or trigger or UDF or wherever you choose to put that logic.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I couldn't have stated it any better.  Thanks DC

    This discussion doesn't seem two sided.  We are all saying the same thing and 90% percent of us agree that MUCK and Generic Lookup tables are "Bad", "Against Best Practice", "Poor Performing", etc...

    I kinda feel sorry for the author of this discussion, and all those that still remain ignorant to generic lookup tables. 

    On a side note, according to Codd 1NF; "Make a separate table for each set of related attributes, and give each table a primary key."  If you have codes that you have unioned together to create a list, think about the union of data.  It is typically separate attributes from the beginning.  You usually get the codes from other sources.  That should tell you something.  That union should be normalized, not stored together.  Hell, why not just create one or two huge flat tables and call it good.  "Just Joking".

  • Is the quote from your side note an actual quote by Codd, or are you paraphrasing? I'm genuinely curious as I haven't yet read his original paper on Relational in its entirety.

    This sentence from your post doesn't make sense: "That union should be normalized, not stored together." The UNION operator takes as its arguments type-compatible relations, meaning each "matching" attribute in the arguments must be of the same domain. To speak of then "normalizing" the result of the operation seems a little off. Perhaps you were using the term "union" loosely... can you reformulate more precisely?

    As to the last sentence... get with the times. You only need one table with a GUID and an XML column. (Just joking, too)

     

    TroyK

  • Paraphrasing...  Got it from DataModel.org.

    The union was used somewhat loosely.  If you have CODE_A from table A and CODE_B from table B, and union the codes together, you would have a combined set of data from two separate sources.  The two datasources could be referred to as two separate Attribute types.  Therefore, you would need to break them out into two separate tables to comply with 1NF.

    In my opinion, a MUCK or Generic Lookup table can't be considered 1NF.  Basically, I would say that it doesn't fall under any of CODDs rules.  Another way to think about it is to consider that when you begin to normalize and look at you data in a table, a MUCK table is a union of different types of data; possibly different sources.  You can't look at it the same way, you would need to look at it from a horizontal partition of sorts. 

    I don't know if I cleared that up.

     

     

  • Actually, I made a typo.  I should have said 3NF since 1NF says nothing about key dependence...  Sorry for any confusion. 

    So while Troy's point about 1NF is valid (assuming an outrageously broad domain), a MUCK table is definately NOT properly normalized.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Correct - that's why I asked about the Codd 1NF paraphrase. Either Greg is misremembering, or DataModel.org has got something wrong.

    As Greg said, I think that the majority of us understand that the idea proposed in the article is to be avoided. My only concern and the reason I chimed in is that I think people are confused as to exactly why it's a bad idea. Although the table from the article may violate some normal form or another (I would contend not 1NF, though), it's not difficult to envision a MUCK table that is properly normalized. However, it would still [probably] be a bad idea. The point is that normalization only gets you part of the way to good db design... well, more accurately, a properly normalized schema is the result of a good design.

    Dare I write a short article about this for the website?

     

    TroyK

  • While the MUCK table might be properly normalized when viewed in isolation, the database that utilizes a MUCK cannot possibly be properly normalized.

    Good articles on database design are always needed.  Actually, the real need is for more people to become knowledgable about good database design.  I had  been working on a series of articles on the basics, but when Chris Date's latest book "Database In Depth" came out, I read through it and decided that I had nothing to add...

    Anyone interested in the fundamentals of data management and wants to understand WHY, not just HOW, owes it to themselves to get that book and study it.  Date really did a good job of laying out the fundamentals in an understandable manner.  It is a more concise and targeted book than his "Introduction" book, but it still isn't for rank beginners.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I have the book, but I haven't had a chance to crack it open yet -- too busy setting up the Wai'ale'ale remote office.

     I'm looking forward to the read, though, as I'm already familiar with Chris's writing style from his other books and dbdebunk foundation papers.

    TroyK

  • Ahh, I wondered if that was you.  It must be rough out there in paradise!

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

Viewing 15 posts - 61 through 75 (of 101 total)

You must be logged in to reply to this topic. Login to reply