Worst Practices - Not Using Primary Keys and Clustered Indexes

  • I would regard having a compound primary key as terrible practice.

    By all means have a unique key on it, but the primary key?

    I have a situation where we have an application whose tables have a four-column compound key. The table structure is set in stone because it is third party; however, I can extend the schema using triggers.

    The problem comes when I want to trap updates to the table. There is no reliable way to trap a change when any of the fields in the compound key may themselves have been changed, because the trigger then has no stable value on which to match the inserted and deleted rows.

    If the table had a single field with a unique/primary key on it (why not an IDENTITY?), then updates could be tracked and I would be home free! A sketch of what I mean follows below.

    I understand that there is some hostility to identity columns, but, as Steve wrote, it's just a tool. Don't use screwdrivers to knock in nails!
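
    For what it is worth, here is a minimal sketch of that idea, assuming the schema can be extended with a surrogate IDENTITY column (the table, column, and trigger names are all made up for illustration):

        -- Hypothetical extension: a surrogate key the vendor app never touches.
        ALTER TABLE dbo.VendorOrders
            ADD RowID int IDENTITY(1, 1) NOT NULL
                CONSTRAINT UQ_VendorOrders_RowID UNIQUE
        GO

        -- Update trigger: because RowID never changes, the old and new images
        -- of a row can always be paired up, even if every column of the
        -- compound business key was modified.
        CREATE TRIGGER trg_VendorOrders_TrackUpdates
        ON dbo.VendorOrders
        FOR UPDATE
        AS
            INSERT INTO dbo.VendorOrders_Audit (RowID, ChangedAt)
            SELECT d.RowID, GETDATE()
            FROM deleted d
            JOIN inserted i ON i.RowID = d.RowID
        GO

    With only the four-column key, that join between inserted and deleted is exactly what cannot be written safely.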

  • I think the need for clustered/non-clustered indexes must come from the design. In a process-oriented system like OLTP, indexes should be kept to a minimum, because they slow down your transactions and generate locks. If your needs are data mining, OLAP and that kind of thing, heavy use of indexes is required, tuned for every common query (see the sketch below).

    In the case of an ODS, or any kind of mixture of process-oriented and report-oriented systems, it is all about fine tuning every query and weighing the benefit against the overhead that a change in the indexes involves. There is NO SILVER BULLET as far as I know.

    What I do know is that a good design should keep the DBA role on the outside.

    Bogdan
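
    To make "tuned for every common query" concrete, here is a minimal sketch against a reporting copy of the data (table, column, and index names are hypothetical):

        -- Frequent report: total amount per customer over a date range.
        --   SELECT CustomerID, SUM(Amount)
        --   FROM dbo.Orders
        --   WHERE OrderDate >= @Start AND OrderDate < @End
        --   GROUP BY CustomerID

        -- A composite non-clustered index whose key carries everything the
        -- report touches, so the query never has to visit the base rows.
        CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_CustomerID_Amount
            ON dbo.Orders (OrderDate, CustomerID, Amount)

    On the OLTP side you would carry only the indexes the transactions themselves need, since every extra index is maintained on each insert, update, and delete.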

  • I think PKs are absolutely required, as most people agree. Here is what I usually do: if a unique, typed-integer column can be found "naturally", then I will use it as the PK. However, this is seldom the case: the customer_id for a customer table may be a long string, or it may need other columns to make it unique (bad by itself), so an artificial typed-integer column, such as an IDENTITY, must be introduced. In short, I create a unique typed-integer column as the PK whenever a natural unique column cannot be found, or is found but is not a typed integer, or needs more than one column to make up the key (a sketch of that layout follows below).

    IDENTITY has some drawbacks, as others have already pointed out, when it comes to imports or INSERTs from other sources, but it is still a quick and very efficient way of introducing a PK to a table. Without using IDENTITY as the PK, what is the best way to introduce a typed-integer column as the PK? Anyone?

    Again, nothing is absolute; we are just talking in "general", excluding little "value-lookup" tables, less-than-normal uses of tables, etc.

    Ming
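
    A minimal sketch of what that looks like (the table and column names are made up):

        -- Surrogate integer PK, plus a separate UNIQUE constraint that still
        -- protects the natural business identifier.
        CREATE TABLE dbo.Customer
        (
            CustomerKey  int IDENTITY(1, 1) NOT NULL,
            CustomerID   varchar(40)  NOT NULL,   -- natural key, but a long string
            CustomerName varchar(100) NOT NULL,
            CONSTRAINT PK_Customer PRIMARY KEY (CustomerKey),
            CONSTRAINT UQ_Customer_CustomerID UNIQUE (CustomerID)
        )

    Foreign keys in other tables then reference the 4-byte CustomerKey rather than the long string.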

  • Ming has a good explanation above. I also often find there is no good natural key, especially as business rules are constantly changing. Often I need a name for something and I can't prevent the name from changing, so I need some artificial key.

    Steve Jones

    steve@dkranch.net

  • Hi Andy, There is one case where we've made notable improvements in performance by removing the primary key - high volume transaction buckets.

    I work for a profitable dotcom, imagine that! One reason we are profitable is that we make the most of the resources we have.

    In our environment, we capture and process millions of transactions per day. We increased throughput on these intense inserts by about 15% by removing our PKs (and all indexes) on the tables where we pump our transactions. Transaction GUIDs become PKs during our processing phases, and RI is maintained after the fact. This frugal use of CPU and I/O has enabled us to handle the processing with workstation-class servers. For us, eliminating the PK is appropriate.

    BTW, this was a good set of articles.

    Bill


    Bill Bertovich
    bbertovich@interchangeusa.com
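
    A rough sketch of the shape of this pattern, with all names invented for illustration:

        -- Capture table: a bare heap, with no PK and no indexes, so the
        -- high-volume inserts pay no index-maintenance cost.
        CREATE TABLE dbo.TranCapture
        (
            TranGUID   uniqueidentifier NOT NULL,
            AccountID  int              NOT NULL,
            Amount     money            NOT NULL,
            CapturedAt datetime         NOT NULL DEFAULT (GETDATE())
        )
        GO

        -- During the processing phase the transaction GUID is promoted to a
        -- primary key, and referential integrity is checked after the fact.
        ALTER TABLE dbo.TranCapture
            ADD CONSTRAINT PK_TranCapture PRIMARY KEY NONCLUSTERED (TranGUID)
        GO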

  • Thanks for the feedback! Nothing wrong with breaking any rule as long as you know you're doing it and why... more or less, anyway! We're looking over the Worst Practices idea - it has been popular, but so much of it is SOOOO subjective. Possibly not a bad thing.

    Andy

  • quote:


    In our environment, we capture and process millions of transactions per day. We increased throughput on these intense inserts by about 15% by removing our PKs (and all indexes) on the tables where we pump our transactions. Transaction GUIDs become PKs during our processing phases, and RI is maintained after the fact. This frugal use of CPU and I/O has enabled us to handle the processing with workstation-class servers. For us, eliminating the PK is appropriate.


    This sounds interesting.

    Can you go into a bit more detail?

    Does it work by having daily transaction tables that are index free and then some nightly process copies the transactions up into the main processing tables which are indexed?

  • An index improves a query only if the query uses it; on the other hand, every insert, update, or delete must also maintain that index, which slows those operations down. So it is no big surprise that an indexless table is faster when the work is mostly inserts.

    For the nightly copy, copy the rows over to the other table first and THEN create the index, rather than copying into a table that is already indexed (a sketch follows below).

    Ming
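
    A minimal sketch of that copy-then-index order, with hypothetical table names:

        -- Build the reporting copy with no indexes in the way of the load
        -- (drop and recreate it each night, or empty it and use INSERT ... SELECT).
        SELECT TranGUID, AccountID, Amount, CapturedAt
        INTO   dbo.TranReport
        FROM   dbo.TranCapture
        GO

        -- Only then build the index, once, over the whole set, instead of
        -- maintaining it row by row during the copy.
        CREATE CLUSTERED INDEX IX_TranReport_CapturedAt
            ON dbo.TranReport (CapturedAt)
        GO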

  • Hey David, More detail:

    You were on target, but we don't process nightly batches. Instead, we alternate between two small servers: when one is processing the other is capturing, and vice versa. We just flop back and forth between servers whenever the processing cycle is finished.

    Here's the beauty of it: this processing is both an audit process and an archiving process. We are updating account balances, etc. in real time; the audit process makes sure we didn't miss anything. Now I'm about to make everyone puke: we don't hold our updates in a transaction - too much overhead across servers. We use the audit process to identify discrepancies and then re-process if need be, which rarely happens (twice since December 1st, due to DNS issues). We have the ability to reset and re-process a batch with full auditability.

    Less urgent aggregations do get processed less frequently, and some are in nightly batches.

    We use a methodology similar to Data Dependent Routing to switch back and forth between servers. You can read about it at:

    http://www.microsoft.com/technet/treeview/default.asp?url=/technet/itsolutions/idc/default.asp

    Drill down to the SQL Server database section.

    Good luck,

    Bill


    Bill Bertovich
    bbertovich@interchangeusa.com

  • Sounds interesting Bill. Being able to reprocess without a problem is a great design.

    Andy

  • If you put a clustered index on the PK and the table is constantly being updated, wouldn’t this increase the possibility of record locks?

  • Generally no. One thing you have to look at is where the inserts get done. If you're using a sequential primary key then you may end up with some contention at the end of the table. Another is if you are actually updating the primary key frequently. Really I should have left out the word primary, since these are always considerations!

    Andy

  • Hi all,

    I definitely agree that a PK is needed for EVERY table except for maybe very small temp tables (since I work in v7 I can't create tables as variables). Every time I forgot (sometimes out of laziness) I had to go back and put one in.

    What I don't understand is why everyone has something against the identity column. I've used it quite a bit in my normalised databases and it works quite fast. But I use these columns for joining to other tables and not so much for filtering out certain records. I think it may be faster to join on the identity column than on a unique char(10) column that has a clustered key (I should try this some time). It is always possible to associate a code with the value of that char(10) column, but how would that be any different from the identity int? Also, if you have to put the char(10) value in a column of another table (as a foreign key) then it takes up more space than the int (I just got out of bed and don't feel like doing the math, but a char(10) is 10 bytes versus 4 for an int, so it is bigger).

    I already put indexes on all the foreign keys, an identity in EVERY table, and indexes on the columns used for querying and ordering. The identity is not always my clustered index, but it often is. And as far as I know this has never gone wrong.

    The article makes quite some sense to me… and I'm willing to try anything for a performance increase without sacrificing any integrity.

    My suggestion is: just test and test and test. Do all types of tests (select/insert/update/delete) and check the execution plan (a quick way of measuring the comparison is sketched below).

    I think the article is a good start for beginners, then it is up to them to test.

    Now I'm going for… coke 😉

    Regards,

    Michiel
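
    A quick way to run that comparison in Query Analyzer (the table and column names here are hypothetical):

        -- Report reads and elapsed time for each variant of the join.
        SET STATISTICS IO ON
        SET STATISTICS TIME ON
        GO

        -- Join on the 4-byte IDENTITY surrogate key...
        SELECT o.OrderID, c.CustomerName
        FROM dbo.Orders o
        JOIN dbo.Customer c ON c.CustomerKey = o.CustomerKey
        GO

        -- ...versus the same join on the char(10) business code.
        SELECT o.OrderID, c.CustomerName
        FROM dbo.Orders o
        JOIN dbo.Customer c ON c.CustomerCode = o.CustomerCode
        GO

        SET STATISTICS IO OFF
        SET STATISTICS TIME OFF

    Turning on the graphical execution plan for the same batch also shows which indexes each join actually used.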

  • Hi anyone,

    I only want to add that a clustered PK has an overhead:

    true, the leaf level of that index is the table rows themselves,

    but the upper levels of the balanced tree still need extra space in the database (see below).

    greetings,

    r.warnat
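
    One quick way to see that overhead is sp_spaceused; the table name here is just an example:

        -- "data" is the leaf level (the rows themselves); "index_size" shows
        -- what the upper levels of the clustered index, plus any other
        -- indexes, add on top.
        EXEC sp_spaceused 'dbo.Customer'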

  • If you're doing inserts where the value being indexed is not increasing then a clustered index is generally a BAD idea. You'll get lots of page splits. See Rob Vieira's book, "Professional SQL Server 2000 Programming" pp.283-4 and 294-7. I would like to have seen this discussed in the article. Otherwise it has good points.
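
    A quick way to watch those splits happen is a throwaway table clustered on a non-increasing value such as a GUID (all names here are made up):

        -- New GUIDs land all over the clustered index, forcing page splits.
        CREATE TABLE dbo.SplitDemo
        (
            RowGUID uniqueidentifier NOT NULL DEFAULT NEWID(),
            Filler  char(200)        NOT NULL DEFAULT ('x'),
            CONSTRAINT PK_SplitDemo PRIMARY KEY CLUSTERED (RowGUID)
        )
        GO

        -- After loading a few thousand rows (INSERT ... DEFAULT VALUES in a
        -- loop), high Logical Scan Fragmentation and low Avg. Page Density
        -- in the output are the footprint of all those splits.
        DBCC SHOWCONTIG ('dbo.SplitDemo')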
