is storing a comma-separated list symptomatic of a bad design?

  • I've created a database to manage the data for a fantasy hockey pool program. From my reading, I've determined that in general, I should strive for at least third normal form (3NF) for a good database design. The following is my naive idea, followed by my dilemma, and two possible solutions:

    Some of the tables and some of their columns:

    Players (a list of NHL hockey players)

    PlayerID (the primary key),

    FirstName,

    LastName,

    PlaceOfBirth, etc.

    GeneralManagers (a list of my friends who have created fantasy teams)

    FirstName,

    LastName,

    FantasyTeamName

    Question: How should I store the list of players each general manager has picked for their team?

    Solution 1: Do I make a column in the 'GeneralManagers' table with a comma-delimited list of PlayerIDs from the Players table? (This smells like it will result in annoying parsing of strings when I want to use this data, or a violation of 3NF).

    Solution 2: Do I make separate tables for each fantasy team, each table containing only one column, 'PlayerID', so each row is a different player picked? If so, it seems that the only way to "link" a GeneralManager with their respective FanatasyTeam table would be to use a naming convention such as JoeSmithsFantasyTeam, JohnDoesFantasyTeam etc. This design, however, does not explicitly link a general manager with their team; it only does so implicitly by the naming convention.

    Any direction would be greatly appreciated!

  • So I think I found a better solution than either of the two proposed:

    Solution 3: Create a new table called 'DraftedPlayers' with columns 'FantasyTeamName' and 'PlayerID'. Although I still don't understand 3NF very well, this smells right, so I'll go with it for now.

    My inspiration came from the OrderItems table from this page: http://www.4guysfromrolla.com/webtech/010406-1.shtml. It seems somewhat analogous to my situation. The conceptual leap I had to make was that when I want to ask "What is Joe Smith's fantasy team roster?" I have to run a query and generate the answer on the fly by joining my Players and DraftedPlayers tables, instead of having that information "hard-coded" into a table somewhere.

  • Hi,

    Glad to see that you have the answer to your question and which is perfect with the experience that i have on architecture side.

    Thanks -- Vj

    http://dotnetvj.blogspot.com

  • The third "mapping" table is usually a good solution. That being said - since you don't usually want the same player to be pickable by multiple teams, you COULD create a reference to the team within the player table.

    Of course - the purpose of this kind of stuff is often to track your stats over multiple seasons (even if virtual seasons), so you might need:

    Players

    GMs/Teams

    Seasons

    PlayerTeamSeason (which player is assigned to which team during a given season).

    (PlayerTeamSeason should likely get brokern down further, so you'd end up with TeamToSeason, to track which friends are active during a given season, and then map players to that table).

    that way - you end up with the ability to map the player multiple times, but only once per season (you really don't want to allow Dan Marino to play himself on opposite teams).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • How about something like this?

    CREATE TABLE Player

    (PlayerID INT IDENTITY(1,1) PRIMARY KEY

    ,FirstName VARCHAR(50)

    ,LastName VARCHAR(50)

    ,PlaceOfBirth VARCHAR(50))

    GO

    CREATE TABLE GeneralManager

    (ManagerID INT IDENTITY(1,1) PRIMARY KEY

    ,FirstName VARCHAR(50)

    ,LastName VARCHAR(50)

    ,TeamID INT)

    GO

    CREATE TABLE Team

    (TeamID INT IDENTITY (1,1) PRIMARY KEY

    ,TeamName VARCHAR(50))

    GO

    CREATE TABLE Squad

    (TeamID INT NOT NULL

    ,PlayerID INT NOT NULL)

    GO

    ALTER TABLE Squad ADD PRIMARY KEY (TeamID,PlayerID);

    GO

  • Matt Miller, great suggestion about different seasons.

    OleHank, that code is pretty much what I've gone with. Thanks.

    As a bit of an aside: why bother having a "TeamID" field? I mean, I understand it's necessary to have keys, but if, for instance, I know that "TeamName" will be unique, why not just have that as a primary key? I guess my question is: is it standard to include ID fields that are just ints for sake of uniformity, or is it a better idea to have primary keys that are meaningful if possible?

  • partridgetim ,

    The issue with using something like TeamName as your primary key is other tables will use the primary key to reference the team in the team table. For example, say a friend creates the "Dolphins Team". In your 'Drafted Players' table you will have a set of records consisting of "Dolphins Team" and associated players. I hope you're following me.

    Now lets say your friend recieves a cease and decist letter from the NFL and decides he wants to rename his team to "Dolfins". If you change the team name in the team table, the next time you go to look up all the memebers of the team, there will be none because your Draftedplayers Table will have a bunch of "Dolphins Team" since it was never updated. You could set up your database to cascade those changes, or do it yourself, but it's just more work.

    Hope that made sense.

    ---
    Dlongnecker

  • dlongnecker, I follow...

    so essentially using inteteger ID fields is a little more static, or permanent. That is, they are unlikely to change in the future and cause problems, whereas my friend might want to change his team name. Good advice. Thanks.

  • Yes, you do want to get the db in 3nf as much as possible. Taking it to higher levels are not necessary except in extremely specialized circumstances (which, in over 10 years working with databases, I have never actually seen in the real world). A list of values violates 1nf and you have to be in 1nf in order to get to 2nf and so forth.

    As to what to use as the primary key: the name or a number. Actually, if you design the foreign key relationships correctly, should you change the key value, that change will cascade to all the referencing tables. So this is not, to me, a convincing argument except in special circumstances. Another reason for assigning a numeric substitute for the key (this is called a surrogate key, btw) is efficiency -- a number is easier for the computer to handle in searching and sorting than a string. However, the difference (assuming the string is small, say varchar(16) or smaller) using modern computers is so slight as to not be noticeable until you start working with Very Large Tables. So this is also a rather weak argument for the general case.

    The danger of always using surrogate keys as a matter of policy is that the designers eventually forget to correctly identify the natural key(s) of a table. Even when a surrogate key is used, the natural key(s) must be identified and a NOT NULL, UNIQUE constraint defined on the column(s). In databases where surrogate key use is ubiquitous, I have found that typically 90%+ of natural keys are not identified and defined. In these shops, they may have me defend my decision not to use a surrogate key. IMMHO, that is backwards. The use of a surrogate key should be allowed only after a convincing case has been made. I have found that a well-designed database will typically have about 50% surrogate key usage.

    As for the rest, a good exercise is to identify all the entities and their relationships with each other. Your entities may be Player, Manager, Team. A Team consists of zero, one or more Players. A Player may be assigned to zero or one Team. A Manager may manage zero or one Team. A Team may have zero, one or many Managers (this is, after all, fantasy). As you create more entities, make sure you specify how they relate to other entities. If you do this, the physical layout of the tables pretty much takes care of itself. An entity that can be associated with no more than one other entity will have the foreign key as part of its attributes. Any entity that can relate to many of another entity (say, Team to Players) cannot have the foreign key as part of its attributes -- when you start thinking about comma-separated lists, you're trying to put the FK on the wrong end of the relationship. And, as you have identified on your own, when an entity can relate to many of a second entity and that entity, in turn, can relate to many of the first entity, then you have to create a separate table, called an intersection table, to hold the foreign keys.

    Forgive me if I've rambled. I used to teach Data Modeling and I sometimes act as if I'm still in the classroom. As it is, I have thrown out some terms, such as "cascade", completely undefined so you will have something to look up on your own -- which you seem to be pretty good at doing. Good luck and don't be hesitant to reply later with your finished design or ask more questions.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply