Beginning Database Design - Spot the Flaws

  • Obs (11/20/2007)


    harald_m_mueller, I think your missing the spirit of what Steve is trying to put out. Yes, there could be a better description of what requirements are, but as a generic learning exercise and discussion go, it's not that bad. If anything, this article did what it's supposed to do, and that's to generate questions. In day to day jobs, no one is going to give you a perfect requirements. There's always going to be communication happening to narrow down the focus. It's just the way it is.

    My reading into this is that you think the exercise should be to spot flaws only in the database design, and not in the requirements gathering.

    Is that a fair interpretation?

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Great idea for a series!

    I'm in the same boat with harold: what are the design goals, etc? It's hard to make decisions about data-types, table relationships and even normalization without having more information about how the data are related and thier types.

    I'm looking forward to seeing where this goes....

  • My opinion says that yeah, this was a meant to be spot the flaw in the design (hence the title). Understandably, you could say flaw #1 are the requirements or lack thereof but I surmise that this was more about technical issues.

  • I think it'd be doing a disservice to the people responsible to try to lump in requirements gather and design into a discussion about database design. Ultimately, I think this is a scope issue. It would almost be a better exercise to do a series of articles covering end to end project life cycle, just to make sure that the requirements gathering and planning addressed too. It's not practical, but I'm sure that people would be providing more specific options to the issues at hand. Really, most of these aren't bad at all for off the cuff suggestions

  • harald_m_mueller (11/20/2007)


    As a rule of thumb, clustered indices should be on the (logical or technical) primary key

    I would argue with that. The best place for the clustered index is highly (if not completely) dependent on how the table is used. In many cases, there are far better places to put the clustered index than the primary key.

    A very, very general rule of thumb I use. Few records to be retrieved - NC index. Large number of records to be retrieved - clustered index.

    You only have one clustered index. Make the best use of it possible. My general rule to the developers I work with is to put the cluster on the pk field if there is no where better to put it.

    It's fairly pointless discussing indexes. We have no idea how the table will be queried, how much data will be in there, how fast the inserts and updates will be, etc, etc. Those are needed to say anything meaningful about the indexes

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Obs (11/20/2007)


    My opinion says that yeah, this was a meant to be spot the flaw in the design (hence the title). Understandably, you could say flaw #1 are the requirements or lack thereof but I surmise that this was more about technical issues.

    Okay - thanks 🙂

    I saw this as great opportunity to discuss the problems of requirements gathering in addition to database design, and was thinking that Harald had done a great job of pointing out some of the typical issues associated with it.

    I can see how this would grate on your opinion now though.

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • I'll chime in here, with kids off to school and a cup of coffee to enjoy the comments with.

    I've started the second part, but lots of your ideas will be incorporated. I deliberately made some mistakes and you found I few things I hadn't thought of. And thank you for the great debate. This is exactly what I was hoping to provoke.

    The requirements are vague, after I trashed another page or so of prose I'd written. However, this is exactly how many applications I've started working on are built. Someone has an idea, often not fleshed out, more half baked, and they want someone to start building it. So you have to somewhat interpolate, somewhat ask questions, somewhat plan for where you think the application will go with good design.

    To answer a few questions:

    - It's a web app, meant to be out there for others to see the information.

    - Cost isn't an issue with regards to SQL Server. I can use shared hosting, low cost plans, a work server, whatever, so the cost isn't a problem.

    - Most used query will typically be a list of titles, rating, review date, sortable in any of those columns. There will be queries by author (list by author), to get the review, to get a list of books by rating.

    - This is a consumer type of site, not really looking to gather too much technical information about the books (publish date, publisher, etc.). That stuff is better left to booksellers and the like.

    - The "plan" is hundreds of books. After all, how many can I read? 😉

    - It is expected that limited number of people will use this, but we can hope for tens of thousands.

    - Books can be owned (or not) by the reviewer.

    - Image is the book cover. I wonder if I'd ever include others...

    - We'll use SQL Server 2005 for now.

    Again, somewhat vague, but isn't that what the customer does :w00t:

  • Great Idea for an article!

    A few thoughts. I would change the owned field to ownerid and create an owner table (Owner information), then relate them. This would allow access for many owners to see the reviews of their books as well. I would also put in a field containing searchable metadata that people could see reviews from books that don't have the same Author or Title (words) but have the same content.

    Steven

  • Almost 30 replies on the same day that it was posted, I think this is a topic many are interested in, good choice Steve

    Much has been said already.. but here's a few of my thoughts

    I think splitting it up into about 3 tables and 1 relation table would be about right, even this I feel is borderline over kill for such a simple app.

    Here's the tables I'd use:

    (There's a few posts on the specifics of these tables, so I won't spam you with the SQL of it)

    Books - this table would be used for tracking what books you have and their status, you can add books to the table as a reminder of what books you still need to review, furthermore by adding an additional status code for "Wish list" you can have a personal wish list built into this.

    Reviews - not all books in your book table will have reviews

    Authors - I think this is an elegant idea suggested by many, with this table you can add more functionality to your website by providing links such as "Other books I have reviewed by this author"

    RelationAuthorsToBooks - Since books can have multiple authors and authors can have multiple books, but maybe there is a better way of doing fast searches on many to many relationships. (I'd just use a two column table of "Book_PK" and "Author_PK")

    One last thought:

    Primary keys - I've personally found that its best to have full control over the primary key, not only because of the uniqueness issue, but also because so many things change. The ISBN is a good example, things that are never supposed to change always find a way to change, sometimes (like in this case) it'd be a simple change from char(10) to char(13) and if your using string types on the website no big deal, but other times your PK will be a number that now has a letter appended to the end, and all your types have to go from ints to strings.

    While the issue of important data changing will still be an issue, if that value is not the central key used by your entire db and website, it tends to be much easier to fix the issue.

    Has anyone else had problems with basing their Primary keys on something that wasn't supposed to change?

    Kris

  • I think all the database issues have been covered so i'm just bringing up requirement ideas.

    I would change the owner field to a larger size so it can acommodate the owners name. I've borrowed books and forgotten where they came from. I would also include an expected return date so you don't keep them forever.

    You may want to include a version or release field. If you are reviewing text books, they come out with different versions each year.

  • The value in this article (perhaps not the original intention) is to point out how important it is to be as specific as possible with the requirements. This is NOT to say that one should expect that requirements won't change for whatever reason -- incorrectly specified, new business opportunities, etc. The hallmark of a good design, though, is that it is based on sound fundamental data management knowledge, and through the application of those fundamentals, is able to grow and change as needed. Some people might call this "agility".

    The simple fact remains that it is impossible to fully assess a particular design absent a reference to the myriad requirements. There are too many possible "workable" designs, and the tradeoffs one makes to arrive at the optimal are directly dependent upon those requirements. This is the reason that "design-by-forum-posting" exercises are generally so fruitless. Someone who has a good grasp of the requirements that must be specified in order to derive a good design is the least likely type to need to ask for help in a forum.

    TroyK

  • Kris (11/20/2007)


    Has anyone else had problems with basing their Primary keys on something that wasn't supposed to change?

    Kris

    The criteria for choosing a candidate key are:

    - uniqueness

    - simplicity

    - irreducibility

    - stability

    - familiarity

    What you are describing is a result of incorrectly assessing the stability of a key. The unfortunate reality is that, sometimes, the design incorporates a mistake. The key (pun intended) is to be well-versed in techniques for migrating to better designs and implementations as these mistakes are discovered (and as requirements emerge).

    Like most design decisions, there are tradeoffs to be weighed as some of the criteria listed above can be at odds with each other. For instance, a familiar key may not be simple or irreducible.

    HTH,

    TroyK

  • I appreciate this discussion. Most especially, I appreciate it when people explain the *whys* for their recommendations. It helps to understand where people are coming from. Just as helpful to me, some of these posts have given me a way to articulate my own past design decisions that have been made based on intuition. My intuition may have been right, but it is always better to have technical reasons.

    As for recommendations, I agree with many of the recommendations posted here. I'll add the slant that I don't think I have ever created a SQL database without first designing it in ERwin. Thus issues like whether or not to make a column nullable are something you can't miss up front. You have to specify and think about those things. And you almost can't miss creating foreign key constraints when designing in ERwin. This discussion has helped me see how a modeling tool (any data modeling tool would do, I'm definitely not a big ERwin fan or specifically promoting ERwin) has helped me design good databases from the start.

    There are a million design decisions that ERwin can't help with. But it seems like a good data modeling tool (one that can generate the SQL code for you) can not only be very efficient, but help prevent holes in the design.

    It reminds me of when I first learned to write basic SQL queries. I started with the design view in MS Access and had MS Access show me how the SQL looked. Over time, you grow comfortable (and even prefer) doing complicated SQL without the design grid. But even then, it is often faster and more accurate (no typos) to get the root of a query started with a nice GUI interface.

  • Kris (11/20/2007)

    Has anyone else had problems with basing their Primary keys on something that wasn't supposed to change?

    We had exactly this issue at my university in regards to social security numbers being keys for tables. The numbers themselves haven't changed but identity theft issues have persuaded us to switch from using them to some fictitious value, amongst other changes we're making. So yeah, I'm more in the camp of using fake primary keys where feasible.

  • Thanks for your article. My only initial comment in the design would be that there are opportunities to normalize the tables more, by separating out authors for example from the books table. This I think would allow for growth as the amount of data increases in your database.

Viewing 15 posts - 31 through 45 (of 96 total)

You must be logged in to reply to this topic. Login to reply