DB Design question

  • Hi everybody

    I would appreciate some help with the following design problem:

    The application is about documents with paragraphs. The documents are in a table, and each document consists of various paragraphs, which are also kept in a table.

    If a document is created using paragraphs 1, 13, 27 & 54, it is easy to have a

    many-to-many relation between the document table and the paragraph table, something like:

    DocNo ParaNo

    ----- ------

    1 1

    1 13

    1 27

    1 54

    However, after pulling a paragraph into a document, the user can modify the paragraph, which must then be stored as a "custom" paragraph, bacsue we don't want to change the "original" paragraph. Note that it is not always that a user modifies a paragraph, but it happens regularly. Also, some paragraphs are almost never

    modified, while still others are modified fairly regularly.

    My problem is finding the best design. As I see it, there are basically three options:

    (1) Put all paragraphs in the paragraph table. Even if the user only changes one single word in a paragraph, it then becomes a new paragraph. The drawback of this is that the paragraph table could get huge (500000 documents with 50 paragraphs each = 25 million rows!) Also, there will eventually be many variations of the same paragraph - it might even be that two users have made the exact same modification to the same paragraph, resulting in two "different" paragraphs (with different keys) that are exactly the same, content-wise.

    (2) Keep all unmodified paragraphs in one table, and all modified paragraphs in another table. If a user modifies a standard paragraph, it gets copied to the "modified" table.

    The problem with this is that it becomes more difficult to build up the document from its different paragraphs - especially the order in which the paragraphs must occur. And it also doesn't solve the problem of two users creating the same modified paragraph.

    (3) Keep the table of "standard" paragraphs, and use that table purely as a "lookup" or "template" table, so all clauses that are selected, get copied to another table. This is a variation on the 1st option, and will have some of the same issues.

    It seems to me that option 1 is the one with the least problems. Am I perhaps missing something?

    Thanks

    Schalk

  • You are not missing anything (at least to me).  However, I would suggest option 3 and that is the truely normalized version.  The fact that you have a large number of rows and a lot of data is just the nature of your database requirements.

    Have you thought of scanning paragraphs already in the system (noting which parent paragraph they came from) to see if an identical paragrah is already in the system? 

    Make sure your server has plenty of memory and diskspace!

    If the phone doesn't ring...It's me.

  • Your paragraph table needs a revision number to keep track of the instances of the paragraph.

    Revision zero is the original.

    I would recommend that you have

    • Creation date
    • Last updated date
    • Creating user
    • Last updating user (for paragraphs modified by the system have a psuedo user.
    • CheckOut user (Who has currently checked out the paragraph).  Set to zero if not checked out.
    • CheckOut date and time.  If someone doesn't check a paragraph back in then you will need a mechanism to determine whether it was deliberate or just sloppiness.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply