BEST PRACTICE ON PRIMARY KEYS

  • I have to use primary keys but they seem to be a bit lenghty.However they all provide meaningful Identification especially at the database level.They are not friendly especially on application side in searching,retrieving records etc.eg AE001200708HQW01 this key has got five meaningful segments,to me it seem to be weird as far as database design is concerned! what is the best practice to manipulate such primary keys?? Can I just leave it as it is???

    PLEASE HELP

  • My practice (best or not) is to replace lengthy text keys like the one you describe with a system assigned key...either a four or eight byte integer or a GUID if you like them better...and store your other value as an attribute.  Your joins to child tables will be easier but to query on the text key, you'll always have to join to the root table.  Also, if you have a very large volume of rows, you will definitely save space.

    Make sure to index the text key in the root table because most end-user queries will want to use it.

  • In my databases I usually assign an identity integer as a primary key or a combination of integers. If I have a string that is a natural key, such as e-mail address, I use an identity integer instead of the e-mail address. The reason for this is that if other tables use this field as part of a composite key an integer is smaller than an e-mail address. Using an integer also means that you get more records per page than if you use large string fields which means that the primary key will be smaller and accessed faster when you try to find a row in the table.

  • You will find strong opinions on both points of view.  And in all honesty there are pro's and con's to both approaches. 

    I believe that you should NEVER use a system generated key, when you can create a natural key.  Yes, your keys will get large.  Especially as you get further and further down the Parent-Child Tree.  However, what you will find is that lookup querys are almost non-existent.  However, the downside to this is that changing the Primary key is quite difficult then.  However, this should be an EXTREMELY RARE thing to do if you designed the DB correctly. 

    I did a fairly involved study in my past company, and since I did it on company time, I don't think it would be a great idea to post it.  What I can tell you is that in the test, I was able to show that in a real functioning system, overall performance went up significantly because of the reduced load on the system.  The number of actual queries against the system could be reduced significantly. 

  • If there are 5 segments that make up the primary key it is not atomic.

    The first normal form, sometimes called 1NF, states that each attribute or column value must be atomic

  • Another consideration, if the PK is clustered, is the inclusion of the key in all non-clustered indices.  The larger the clustered key, the larger the NC indices will be.

  • True -- each attribute or column value must be atomic for 1NF. However, a key may or may not be atomic and normalization rules do not require it.

    If a PK consists of 5 attributes and it will be used in a FK relationship, sure, go ahead and demote the PK to Candidate and create a surrogate key field. Just never forget to define a unique index on the 5 columns of the now-candidate key. After all, that is what is going to uniquely identify the tuple. One often overlooked drawback of the self-generating surrogate key is that it guarantees uniqueness.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • I re-wrote this post about 5 times.  Trying to find a way to say this nicely. 

    I have been doing this for a very long time.  Although that might be how I learned it at school.   What I learned on the job helped me understand why the people that still believed that would work are still in school.    There is a place for that type of design, but you won't find it in very many environments where results are required and money isn't endless.

  • Thanks very much for your contributions...I really appreciate your support.However,to illustrate the real situation,below is the sample scenario with 3 sample tables;

    Buyer table

    BuyerID LocationID Buyer Name Address Tel

    AE100 PA APPliance Ltd Box 23300

    ME250 GE Ministry of Finance Box 0999

    Location Table

    LocationID Name

    PA PARIS

    GE GENEVA

    Tender table

    TenderNo BuyerId FinYrID TenderType ……..etc

    AE10020072008PAW30 AE100 20072008 Works

    AE10020072008PAG31 AE100 20072008 Goods

    AE10020072008PAW10 AE100 20072008 Works

    ME250020072008GE10 ME100 20072008 Goods

    Above is an example of data to be stored in the table in question. My concern is ,is it best practice to keep the whole TenderNo which basically is the combination of values already available in other columns??ie AE100 20072008 TA W 30 stands for BuyerID,FinancialYearID and TenderType respectively.

    thanks

    rgds

    noneK

  • Ah, well, in the example you give, you are violating 1NF. Break the one column into TenderNo (if needed), BuyerID, FinancialYearID and TenderType columns. The TenderNo column would be needed, and it would be the primary key, if the combination of the other three was not necessarily unique for every entry. If the other three did uniquely identify each entry, you could still add TenderNo, probably as an identity, for the pk and create a unique index on the three. This would make sense if the PK was used to form a relationship with another table.

    However, if the pk of this table will not be used in such a relationship, I would recommend not creating a surrogate key. You would be adding a level of complexity for no benefit.

    One of my favorite examples of taking the surrogate identity key idea to an extreme is this:

    TableA and TableB both have a surrogate identity key. They form a M:M relationship so there is an intersection table A2B which is defined

      A2B
      ===================================
      A2B_ID    int identity primary key
      TableA_ID int references TableA(ID)
      TableB_ID int references TableB(ID)

    I looked at this for the longest time thinking I must be missing something. Then I thought I was looking at someone's idea of a joke. But this table exists (the names have been changed to protect the innocent -- if there are any) and is currently in use.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Just curious? 

    Was there a unique constraint on TableA_ID, TableB_ID?

     

  • Well, I see you've managed to create a circular reference to yourself.

    • I don't know what database analysis tool the guy used so I can't comment on its accuracy.
    • I don't know the DDL of the table (did he define the two FK fields as the table PK?)

      (Addendum: yes, I see in rereading the original post that he did define them as the PK. But I find it interesting that no one on that thread, including Bob, asked to see any DDL.)

    • I'm not sure what was meant by "segmented" -- is it the same thing as clustered? If so, it's possible the answer to the preceding question is "No" which was why it suggested defining a unique constraint on them.
    • I don't know what other use he may be making of the table or the design of the queries he uses against it. It seems to me that this would make a huge difference in his best course of action.

    The faith you put in following the recommendations of automated analysis tools is heartwarming.

    If the composite PK of an intersection table formed by the FK references is in turn used to form a FK relation with yet another table, then creating a surrogate identity key field would make sense.

    I'm sorry, I should have stipulated...wait, I did stipulate that in an earlier post.

    What I didn't make clear, I suppose, was that the intersection table in my example was not used for anything other than to implement the M:M relation between the two tables. In all the queries against these tables, the PK field of the intersection table is never used. How could it be? Where would that value be kept and why? So why is it there? What purpose does it serve?

    To reiterate, I see no reason to ever create a third column on an intersection table (that has no other use but as an intersection table) to make into the primary key.

    If you can provide a realistic example where this makes sense, or makes some improvement on the efficiency of the database, I will thank you for it. After all, I come here mostly to learn.

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Chasing my tail, going round and round.    I have only come across one situation where someone actually designed a DB with true circular reference.  It had 4 tables to complete it, but it was there, but this is a little off topic.  Sorry about that, but I just couldn't believe that there were two posts about the topic at one time. 

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply