Lookup Table Madness

  • Thanks for the feedback.  I do tend to get a bit sarcastic when I see ignorance masquerading as "elegance".  I must admit that I was frustrated at having to deal with the effects of this practice at the time I wrote it and that colored my writing.

    BTW, your suggestion is very much like those of my proof-readers, who kept pointing out things that were lightly covered in the first drafts.  In response the article kept getting longer and longer (wordier).  It is a valid point, and in retrospect I wish I had elaborated more on the subject...

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Clearly, combining multiple descriptions with type codes into a single table, as your example evolved into, would be a terrible practice unless it were a single step in a migration or something similar.

    But what would be wrong with having three individual tables with an ID and a status like the example you started with? What rule of 3rd normal form does it violate?

    Those statuses can also be used to populate a drop down list in an application and there would never be any sparsely populated rows. I fail to see the problem, or for that matter even the slightest inefficiency, with this design.

  • Nothing is wrong with it at all.  Having three separate tables (one for each type of code) is the proper approach.  Where some people go wrong is when they decide that the database would be "simpler" if those three (or however many) tables were combined.  That is the practice that I argue against.
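    To illustrate the distinction, here is a minimal sketch (table and column names are hypothetical, not from the article): with separate lookup tables, each foreign key can only reference codes of its own type, which is exactly what a combined "MUCK" table gives up.

    ```sql
    -- Three separate lookup tables: each FK references only codes of its own type.
    CREATE TABLE OrderStatus (OrderStatusID int PRIMARY KEY, Description varchar(50) NOT NULL);
    CREATE TABLE ShipMethod  (ShipMethodID  int PRIMARY KEY, Description varchar(50) NOT NULL);
    CREATE TABLE PaymentType (PaymentTypeID int PRIMARY KEY, Description varchar(50) NOT NULL);

    CREATE TABLE Orders (
        OrderID       int NOT NULL PRIMARY KEY,
        OrderStatusID int NOT NULL REFERENCES OrderStatus (OrderStatusID),
        ShipMethodID  int NOT NULL REFERENCES ShipMethod  (ShipMethodID),
        PaymentTypeID int NOT NULL REFERENCES PaymentType (PaymentTypeID)
    );

    -- A combined codes table loses that guarantee: a single FK to Codes(CodeID)
    -- cannot stop an order row from pointing at a ShipMethod code where an
    -- OrderStatus code belongs.
    ```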


  • Dear Mr. Peterson:

     

    I found your article very well written, NOT wordy, to the point, and humorous; all characteristics of a good piece. I felt compelled to write, as it seemed your article was being unjustly maligned; the criticism was uncalled for regardless of the truth of those opinions. We are grateful for your efforts. However, after reading a few critical opinions, I re-read your article and found those opinions off the mark anyway.

     

    I particularly enjoyed the crack about how the table was looking like a spreadsheet. LOL.

    Yes, I’ve been guilty of that, I knew I shouldn’t have done it, thought it was clever, and then regretted it!

    Respondents who claim that MUCK tables might sometimes be valid should acknowledge that the approach is still cheating, and they should appreciate the reminder of why it shouldn't become a habit. And yes, your article made clear to me why.

    Novices need to read your article to know that the MUCK table is cheating in the first place. How else would they know, if it weren't for good people like you who take the time to write, and get your friends to edit, these informational articles?

     

    Lastly, we do need to be reminded that there is form and purpose to relational design.

     

    Thank you.

  • Good article, I agree wholeheartedly with all the points raised. I can't believe there are people out there that would ever contemplate the use of MUCK tables.

    I have a method of my own that I use, though only under certain circumstances.

    Imagine we are storing attributes about people and each of the set of possible answers for each attribute is a small enumeration. For example:
    AgeGroup: {0-50, >50}
    Gender: {M, F}
    MaritalStatus: {Married, Single}

    I would be tempted to store this as so:

    ID | AgeGroup | Gender | MaritalStatus
    1  | 0-50     | M      | Married
    2  | 0-50     | M      | Single
    3  | 0-50     | F      | Married
    4  | 0-50     | F      | Single
    5  | >50      | M      | Married
    6  | >50      | M      | Single
    7  | >50      | F      | Married
    8  | >50      | F      | Single

    The number of possible combinations is finite. The calculation to work out how many there would be is elementary maths (2*2*2 in the above case).

    With this approach:
    1. The number of tables is reduced
    2. Only one FK is required from the referencing table to this one. It is therefore quicker to query when joined to the referencing table.
    3. It is still pretty simple to understand; arguably simpler than numerous lookup tables.
    4. Does not run the risk of creating the problems that Don outlined in his article
    5. All enumerated attribute data is stored in the same place.
     
    I would use this approach only under the following circumstances:
    1. All attributes can be enumerated (thus creating a finite list)
    2. All attributes relate to the same entity

    The above example (storing demographics) is the classic example.

    You can probably guess that I come from a data warehousing background, as this is where this approach is generally used, because querying speed is paramount. In this case the table is usually not pre-populated either; new combinations of attributes are added to the table as and when they occur.
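    The table described above can be sketched roughly as follows (names and types are my own assumptions, not from the post); a UNIQUE constraint keeps each combination to a single row, and the referencing table then carries just one FK.

    ```sql
    -- One row per combination of enumerated attributes ("junk dimension" style).
    CREATE TABLE Demographics (
        DemographicsID int IDENTITY PRIMARY KEY,
        AgeGroup       varchar(5)  NOT NULL CHECK (AgeGroup IN ('0-50', '>50')),
        Gender         char(1)     NOT NULL CHECK (Gender IN ('M', 'F')),
        MaritalStatus  varchar(10) NOT NULL CHECK (MaritalStatus IN ('Married', 'Single')),
        UNIQUE (AgeGroup, Gender, MaritalStatus)
    );

    -- The referencing table needs only a single FK:
    -- Person(PersonID, ..., DemographicsID REFERENCES Demographics (DemographicsID))
    ```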

    Comments???

     

  • I may have expressed myself poorly, or you may have misunderstood me. I am not trying to work without proper data integrity. Even if you have one table (not many), integrity can be enforced if that one table is in the same database as all the other data.

    In general, this conversation is endless. It is like Newton's and Lobachevsky's theories, which contradict each other, yet both are right under their own conditions.

    The situation may dictate the design. And if the choices are not analyzed for the situation, you may pick the "safest" one, which may not benefit the company or project in the long run. I always do the analysis before working on a design, and I am open to any balanced solution.  Only then do I move toward implementation.

     

     

  • I haven't given this any extensive thought, but the first concern I would have about this approach is that it would make updating an individual attribute a bit tricky.  Instead of merely issuing a simple update on MaritalStatus for example, you would have to figure out which one of the combinations fit the new reality (a much more complicated process).  I can see some potential upside to this approach in a read-only, warehouse or DSS system, as the ETL process could determine which demographic category a person belongs to.  But I don't think it would be the optimum for a transaction processing system for the reason given above and possibly others that I haven't thought of.
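    The complication described above can be sketched concretely (table and column names are hypothetical, carried over from the earlier example): changing one attribute means re-pointing the FK at the row for the new combination, rather than updating a column in place.

    ```sql
    -- Instead of: UPDATE Person SET MaritalStatus = 'Single' WHERE PersonID = 42;
    -- you must locate the combination row that matches the new reality.
    UPDATE Person
    SET DemographicsID =
        (SELECT d2.DemographicsID
         FROM Demographics d1
         JOIN Demographics d2
           ON  d2.AgeGroup      = d1.AgeGroup   -- unchanged attributes carry over
           AND d2.Gender        = d1.Gender
           AND d2.MaritalStatus = 'Single'      -- only this attribute changes
         WHERE d1.DemographicsID = Person.DemographicsID)
    WHERE Person.PersonID = 42;
    ```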

    Thanks for your kind words.


  • Actually, I thought it was pretty clear, although I agree with "too wordy".

    Main point being that constraints can't be properly maintained with the "MUCK" table design, and the fact that it breaks the ATOM rules. 

    I have a PRODUCTION app that uses this MUCK design.  As predicted in the article, it's caused nothing but trouble.

    So, although the author is "zealous", he has good reason.

    cl

    Signature is NULL

  • Here are two articles that argue the opposite:

    http://qa.sqlservercentral.com/columnists/dsumlin/lookupstrategy.asp

    http://qa.sqlservercentral.com/columnists/nraghavendra/storekeystoringmiscdata.asp

    You'll notice in one that I got into a battle with the author about it...

    --
    Adam Machanic
    whoisactive

  • amachanic,

    Both articles referenced talk about using check constraints and triggers instead of foreign keys.  Lots of overhead!

    cl

    Signature is NULL

  • As long as we are on the topic of lookup tables, here is a performance question related to them. Take the situation where an application has a known and unchanging set of values, for instance, Trade Type (Buy, Sell, and Hold). We could have a lookup table called TradeType[TradeTypeID INT, TradeTypeDesc VARCHAR(10)].

    Which query below is the better performing query, one that uses a lookup table and a JOIN or one that uses a CASE statement? Does the answer change as (more columns are returned, more rows are returned, more tables joined in the query, etc)?

    QUERY 1

    SELECT T.TradeNum, R.TradeTypeDesc
    FROM Trade T
    INNER JOIN TradeType R ON T.TradeTypeID = R.TradeTypeID

    QUERY 2

    SELECT
      T.TradeNum,
      CASE T.TradeTypeID
        WHEN 1 THEN 'Buy'
        WHEN 2 THEN 'Sell'
        WHEN 3 THEN 'Hold'
      END AS TradeTypeDesc
    FROM Trade T

  • Jeffrey,

    This shouldn't really be a performance question; it's more of a design question.  For example: do you want to change your code every time a new TradeTypeID is created?  If you use the CASE you'll have to, while the join will pick it up automatically.  Or say a TradeTypeDesc is changed; with the join, your code is unaffected.

    The way to do this is with a join, no question.  Keep your data in tables and your code in procs; mixing the two can cause big problems and maintenance headaches.

    If you are worried about performance:

    1)  Make TradeType.TradeTypeID a Primary Key

    2)  Put an index (nonclustered) on Trade.TradeTypeID
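    The two suggestions above amount to this sketch (constraint and index names are my own, assuming the table definitions from the question):

    ```sql
    -- 1) Make TradeType.TradeTypeID the primary key (clustered by default).
    ALTER TABLE TradeType
        ADD CONSTRAINT PK_TradeType PRIMARY KEY (TradeTypeID);

    -- 2) Nonclustered index on the FK column of the referencing table.
    CREATE NONCLUSTERED INDEX IX_Trade_TradeTypeID
        ON Trade (TradeTypeID);
    ```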

    Signature is NULL

  • I agree with Calvin Lawson on this; it's not a performance question but a maintenance question.  For instance, a CHECK constraint (CHECK x IN ('A', 'B', 'C')) will perform marginally better than a FOREIGN KEY constraint to a lookup table for x... But do you want to rewrite the CHECK constraint to add or delete a valid value, or just insert a row in a table?
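    For concreteness, here are the two alternatives side by side (constraint names are hypothetical, reusing the Trade/TradeType tables from the question); only one of them would be used on a given column:

    ```sql
    -- (1) CHECK constraint: changing the valid set means altering the constraint.
    ALTER TABLE Trade
        ADD CONSTRAINT CK_Trade_TradeTypeID CHECK (TradeTypeID IN (1, 2, 3));

    -- (2) FK to a lookup table: changing the valid set is just an INSERT or DELETE
    --     against TradeType.
    ALTER TABLE Trade
        ADD CONSTRAINT FK_Trade_TradeType
        FOREIGN KEY (TradeTypeID) REFERENCES TradeType (TradeTypeID);
    ```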

    One thing I don't agree with, Calvin, is the non-clustered index.  On a small table like that, might as well just cluster the PK.  The non-clustered index will only take up more space and will require a bookmark lookup anyway to get the value.  Most likely, the optimizer will choose a table scan instead.

    --
    Adam Machanic
    whoisactive

  • --------------------------------------------------------------

    Quote:

    One thing I don't agree with, Calvin, is the non-clustered index. 

    --------------------------------------------------------------

    You misunderstand; the nonclustered index is on the "Trade" table.  I would assume the clustered index would be on the "TradeNum" column for that table.  The nonclustered index on Trade.TradeTypeID optimizes updates/inserts for that table, and will also speed up the join from Trade to TradeType.

    Of course the primary key on "TradeType.TradeTypeID" should be clustered (and will be, by default).

    cl

    Signature is NULL

  • Ahh, yeah, I definitely agree with that.

    --
    Adam Machanic
    whoisactive

Viewing 15 posts - 31 through 45 (of 202 total)
