To NULL or not to NULL (Design Question)

  • Professionally i'm a programmer. I also happen to be an avid cyclist. Well these worlds are colliding, and I find that I want to create a full featured application for tracking rides and nutrition.

    The first thing that needs to be done is to come up with a data model. This is where I could have two different approaches (that I know of), and I wonder what other DB folks think.

    For instance, data that needs to be collected for any given ride can include DISTANCE, RIDETIME, RIDEDATE, TEMPERATURE, HUMIDITY, WIND, LOCATION, RIDER, EQUIPMENT USED, and even RIDER COMMENTS.

    Most of the data above shouldn't be required. It's also important to note that this will be a multi user system, and as such, there could be several people doing any given ride.

    So with that in mind, started with something like the following:

    ====================

    [RIDE]

    RIDEID identity pk,

    LOCID int fk (links to a location/route table / not shown),

    RIDE_DATE smalldatetime

    [RIDE_DETAIL]

    DETAILID identity pk,

    RIDEID int fk (links to RIDE),

    USERID int fk (links to USER table / not shown),

    RIDE_TIME int (total time riding, stored in seconds)

    ====================

    Those two tables are what I would consider to be "required" information. All the other stuff, like weather, comments, and equpiment, is optional. How to store that info? I've never been a big fan of NULL fields, so I started to think of using an "ATTRIBUTE" type of setup.

    The idea is that i'd have an ATTRIBUTE_TYPE table that would contain an identity field and a description of the attribute. Things like "DISTANCE_KM", "TEMP_C", "WIND", "HUMIDITY", "COMMENT". The problem with is that the data types differ. Distance_KM, Temp_C, Humidity would need to be a double, while Wind, Comment would need to be Varchar.

    I've rambled plenty, does anyone have suggestions? Should I just allow null values for those fields?

  • The "Attribute" or "name/value pair" solution is never a good solution partially for the reason you state but it goes further than that.  The real problem is that the domain (set of all possible values) is different for each one and it is impossibly complex to enforce any type of data integrity using that method.

    Personally I try to avoid nullable columns whenever practical, but in my opinion nullable columns are preferable to the name/value pair approach.

    If you truly want to avoid NULLs and still enforce data integrity (What I recommend) you should define a seperate table for each nullable attribute.  For example:

    [Distance]
    DETAILID int REFERENCES RIDE_DETAIL(DETAILID)
    Km numeric(5.2)

    And so on...

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • BTW I'm pretty impressed; a programmer who understands that the data model is the proper starting point of application design...  There aren't enough of you out there!

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Nullable columns it is. It's not so much the data integrity that bugged me, it was scalability. I don't like the idea of having a table that has potentially 20 or more columns in it.

    As the system gets usage, there will undoubtably be requests to add columns that I never thought about.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply