Lookup Table Madness

  • Good and bad are such loaded terms...like everything, it depends on what you are trying to optimize for. The article was kind of hard to understand, but what I bring down from that is that you don't like lookups (dimensions, code lists, etc.) stored in Entity-Attribute-Value formats. Don't forget, raw speed isn't always the #1 aspect to optimize for. In the clinical trial world, where you may have thousands of codelists, managing that number of tables is prohibitively expensive, whilst processing speed isn't really of much concern (i.e., if it finishes in under an hour, who cares?)

  • Instead of "Lookup Tables" we call them "Static Tables", since the content is updated relatively rarely.

    In systems with big amount of tables we even putt prefix ST_ for them.

  • I do a lot of data warehousing, where someone has made a lot of money calling lookup tables dimensions. If you want to confuse the heck out of your users, call them dimensions. Ugh

  • I'm sorry you feel that way...

    I admit that if I had it to do over again, I could improve the article significantly, and will offer in my defense that I am not a writer by trade and as much as I value SQLServerCentral.com I must say that their editiorial staff is lacking

    However, if you will take the time to actually read the article and UNDERSTAND it (I promise it isn't that hard) you will see that I did in fact provide a solution, but the main point of the article was to debunk this tragically common practice, not to write "Database Design for Dummies."

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • First of all let me say I completely agree with all your points on why not to use MUCK tables.

    However, as a point of interest, we have used muck tables in one unusual situation successfully.  The system was a multi-lingual website where every lookup in the system also had a related translation table.  The number of lookup tables alone was huge.  We solved the problem by: 1. Using one MUCK table for the lookups and one MUCK table for the translation tables.  this eliminated 100 tables from the data model and allowed us to use lots of generic code. 2. We created custom multilingual dropdown controls that were designed to intrinsically use the MUCK tables. 3. We made sure (as sure as you can be) that the lookups we added to the MUCK table were domains and not entities.  4. We had a contingency plan and an architecture that had an elegant solution in the situation when a domain got elevated to an entity (i.e. got more attributes).

    It was a trade-off but all in all not a bad one...

  • I have one simple point to clarify your argument against code tables, which you touched upon, but which I believe needs to be heavily emphasized.

    By forcing "codes" of different types into a single table, you may be slamming different domains of data into the same place. Overlaps are clearly unacceptable, even if you have different types, because the meaning of a specific key is not clear without the type, i.e. the data is not irreducibly dependent on the primary key.

    Here's a very clear example: I work in the healthcare insurance industry. We have codes for everything. If I mixed codes used to specify a diagnosis (what's wrong with you) with codes for procedures (what we did about it), I would be very confused. This means I only query one table which contains both diagnoses and actions taken to remedy diagnoses. That's awfully confusing.

  • Nice article. Flowed well, lots of good solid points made. Lots of good rules and principles that apply to *most* ordinary situation where you are building a system around known kinds of information about known kinds of things.

    The difficulties involved in using this kind of design are magnified when you are forced to used little drag-n-drop relationship builders to design queries, etc.

    As with anything, understanding the rules and the implications of bending the rules allows you to weigh the pros and cons, then make creative use of available tools to new ends. I'm always happy to split hairs, weigh pros-n-cons then boldly break rules where appropriate. It drives me nuts to work with people who aren't willing or able to do that.

    This article was clearly speaking to most situations and did a good job of illuminating some of the major negative implications of using 'MUCK' tables and went on to show how those negative implications outweigh the benefits in most situations.

    But....there are of course exceptions. Consider a situation where you are building a system that is optimized for flexibility rather than performance where you may or may not have to create new 'lookups' at runtime and perhaps attach those new lookups to new kinds of things.

    What would you do? If you (or your team-mates) had already squarely decided to forever swear off those nasty 'MUCK' tables at any cost... I suppose you could programmatically create and attach to brand new lookup tables at run-time ??? Other ideas?

    I would say that this would be a kind of situation where a 'MUCK' table would be very useful and appropriate. I've personally created several such designs which work very well for my purposes. It requires that I use one tiny extra little condition in my where clause, but I'd take that any day over a thing where I had to dynamically create a new lookup table, then query it by doing some kind of table-name replacement scheme in my query. Passing in the name of the table you want to query as a parameter is at the bottom of the list of things I want to do (especially if using stored procedures).

    *****************************

    parameter @ItemType

    Select description from MUCKTABLE

    where

    ItemType = @ItemType

    *****************************

    vs.

    ******************************

    parameter @NameOfTableToSelectFrom

    string sql "Select description from " + @NameOfTableToSelectFrom

    Execute(sql)

    ******************************

    Kenneth Papa

  • Here's a rule--clarify, don't simplify.

  • Yes and no...

    Dare I admit that I have actually used MUCK tables?  Yes I have, because I have had situations where we need to capture data that is given to us and we don't always know what it is ahead of time.  Throwing it into a MUCK table is an easy way to store that data.  However, and this is a BIG however, it must be understood that that data is for storage and retrieval only.  I call it "shoe size" data, it's data that we need, for one reason or another to keep, and possibly regurgitate, but it is also data that we don't really care about and don't do anything with.

    These tables are known by many different names, but there is a reason I focused on "Code" tables for my article.  By definition "Codes" are meaningful to a business; they tend to play a large role in logic and workflow etc...  As such, they should NEVER be treated as a MUCK.  The work of adding another table to store a new code type is trivial compared to writing the logic that will make use of it.  It's also trivial compared to the work involved in sorting out the messes created by MUCK tables in such situations.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Just to follow up, the primary objection to MUCK tables has NOTHING to do with perormance, it has everything to do with logical correctness.  For me, performance is a secondary concern because it doesn't matter how quickly I can return an answer if I can't guarantee its correctness.

    If raw physical performance is concern #1 you shouldn't be using a DBMS anyway.  That's not to say that DBMS's are incompatibale with performance, but that is not their primary purpose, and ultimately no DBMS can compete with binary data files for raw performance and efficiency.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I don't like MUCK tables even though I have used them and inherited them.  The integrity just isn't there.

    The only advantage is that they require only 1 screen in the UI to manage.

    Here is my plan for my next project when I am required to use a MUCK table.

    1. Create a MUCK table for the UI to manage.  The code type table will have a column that tells the name of the real table.

    2.  Create all of the normalized tables.

    3.  Create a trigger on the MUCK table to update the real tables.

     

  • Here's my little bit of hell...

    CREATE TABLE [dbo].[CODES] (

    [CODEID] [varchar] (20),

    [CODETYPE] [varchar] (10),

    [CODENUMBER] [int] NULL ,

    [DESCRIPT] [varchar] (50),

    [ABBREV] [varchar] (50),

    [SUBTYPE] [int] NULL ,

    CONSTRAINT [PK_CODES] PRIMARY KEY NONCLUSTERED

    (

    [CODEID]

    ) WITH FILLFACTOR = 90 ON [PRIMARY]

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[CODETYPES] (

    [CODETYPE] [varchar] (10),

    [DESCRIPTION] [varchar] (50),

    CONSTRAINT [PK_CODETYPES] PRIMARY KEY CLUSTERED

    (

    [CODETYPE]

    ) ON [PRIMARY]

    ) ON [PRIMARY]

  • What if we change your code to:

    CREATE TABLE Code (

    CodeID int IDENTITY(1,1)

    , CodeType int

    , Description varchar(255))

    ALTER TABLE Code ADD CONSTRAINT PK_Code PRIMARY KEY (CodeID)

    CREATE UNIQUE INDEX IX_CODE_1 on Code(CodeType,Description)

    CREATE TABLE Employee (

    EmployeeID int IDENTITY(1,1)

    , FirstName varchar(50)

    , LastName varchar(50)

    , CodeType int

    , CodeID int

    , etc... )

    ALTER TABLE Employee ADD CONSTRAINT FK_Code FOREIGN KEY (CodeID)

    REFERENCES Code (CodeID)

    ALTER TABLE Employee ADD CONSTRAINT CHK_CodeType CHECK(CodeType = 1)

  • I was immediately interested in the topic because I use lookup tables and am happy with the results, but a little lost in the conclusions of the article. I didn't see a solid argument illustrating the drawback of simple lookup tables which define codes used in the principal data tables, though in the exaggerated examples the problems became clear. I would like to read about any clear disadvantages, and appreciate the efforts of the author for submitting the article.

  • Spot On Target.  In fact, I was surprised to learn that any self-respecting developer would ever even consider creating one of those MUCK thingees, as it violates the most basic of principles in sound data base design to have a table where ambiguity is not only permitted, but deliberately introduced.

     

Viewing 15 posts - 121 through 135 (of 202 total)

You must be logged in to reply to this topic. Login to reply