Worst Practices - Not Using Primary Keys and Clustered Indexes

  • I would regard having a compound primary key as terrible practice.

    By all means have a unique key on it, but the primary key?

    I have a situation where we have an application whose tables have a four-column compound key. The table structure is set in stone because it is third party; however, I can extend the schema using triggers.

    The problem comes when I want to trap updates to the table. There is no reliable way to trap a change when any of the fields in the compound key may themselves have been changed, because the trigger then has no stable value on which to match the inserted and deleted rows.

    If the table had a single field with a unique/primary key on it (why not an IDENTITY?), then updates could be tracked and I would be home free! A sketch of what I mean follows below.

    I understand that there is some hostility to identity columns, but, as Steve wrote, it's just a tool. Don't use screwdrivers to knock in nails!
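
    For what it is worth, here is a minimal sketch of that idea, assuming the schema can be extended with a surrogate IDENTITY column (the table, column, and trigger names are all made up for illustration):

        -- Hypothetical extension: a surrogate key the vendor app never touches.
        ALTER TABLE dbo.VendorOrders
            ADD RowID int IDENTITY(1, 1) NOT NULL
                CONSTRAINT UQ_VendorOrders_RowID UNIQUE
        GO

        -- Update trigger: because RowID never changes, the old and new images
        -- of a row can always be paired up, even if every column of the
        -- compound business key was modified.
        CREATE TRIGGER trg_VendorOrders_TrackUpdates
        ON dbo.VendorOrders
        FOR UPDATE
        AS
            INSERT INTO dbo.VendorOrders_Audit (RowID, ChangedAt)
            SELECT d.RowID, GETDATE()
            FROM deleted d
            JOIN inserted i ON i.RowID = d.RowID
        GO

    With only the four-column key, that join between inserted and deleted is exactly what cannot be written safely.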

  • I think the need for clustered/non-clustered indexes must come from the design. In a process-oriented system like OLTP, indexes should be kept to a minimum, because they slow down your transactions and generate locks. If your needs are data mining, OLAP and that kind of thing, heavy use of indexes is required, tuned for every common query (see the sketch below).

    In the case of an ODS, or any kind of mixture of process-oriented and report-oriented systems, it is all about fine tuning every query and weighing the benefit against the overhead that a change in the indexes involves. There is NO SILVER BULLET as far as I know.

    What I do know is that a good design should keep the DBA role on the outside.

    Bogdan
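
    To make "tuned for every common query" concrete, here is a minimal sketch against a reporting copy of the data (table, column, and index names are hypothetical):

        -- Frequent report: total amount per customer over a date range.
        --   SELECT CustomerID, SUM(Amount)
        --   FROM dbo.Orders
        --   WHERE OrderDate >= @Start AND OrderDate < @End
        --   GROUP BY CustomerID

        -- A composite non-clustered index whose key carries everything the
        -- report touches, so the query never has to visit the base rows.
        CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_CustomerID_Amount
            ON dbo.Orders (OrderDate, CustomerID, Amount)

    On the OLTP side you would carry only the indexes the transactions themselves need, since every extra index is maintained on each insert, update, and delete.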

  • I think PKs are absolutely required, as most people agree. Here is what I usually do: if a unique, typed-integer column can be found "naturally", then I will use it as the PK. However, this is seldom the case: the customer_id for a customer table may be a long string, or it may need other columns to make it unique (bad by itself), so an artificial typed-integer column, such as an IDENTITY, must be introduced. In short, I create a unique typed-integer column as the PK whenever a natural unique column cannot be found, or is found but is not a typed integer, or needs more than one column to make up the key (a sketch of that layout follows below).

    IDENTITY has some drawbacks, as others have already pointed out, when it comes to imports or INSERTs from other sources, but it is still a quick and very efficient way of introducing a PK to a table. Without using IDENTITY as the PK, what is the best way to introduce a typed-integer column as the PK? Anyone?

    Again, nothing is absolute; we are just talking in "general", excluding little "value-lookup" tables, less-than-normal uses of tables, etc.

    Ming
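
    A minimal sketch of what that looks like (the table and column names are made up):

        -- Surrogate integer PK, plus a separate UNIQUE constraint that still
        -- protects the natural business identifier.
        CREATE TABLE dbo.Customer
        (
            CustomerKey  int IDENTITY(1, 1) NOT NULL,
            CustomerID   varchar(40)  NOT NULL,   -- natural key, but a long string
            CustomerName varchar(100) NOT NULL,
            CONSTRAINT PK_Customer PRIMARY KEY (CustomerKey),
            CONSTRAINT UQ_Customer_CustomerID UNIQUE (CustomerID)
        )

    Foreign keys in other tables then reference the 4-byte CustomerKey rather than the long string.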

  • Ming has a good explanation above. I also often find there is no good natural key, especially as business rules are constantly changing. Often I need a name for something and I can't prevent the name from changing, so I need some artificial key.

    Steve Jones

    steve@dkranch.net

  • Hi Andy, There is one case where we've made notable improvements in performance by removing the primary key - high volume transaction buckets.

    I work for a profitable dotcom, imagine that! One reason we are profitable is that we make the most of the resources we have.

    In our environment, we capture and process millions of transactions per day. We increased throughput on these intense inserts by about 15% by removing our PKs (and all indexes) on the tables where we pump our transactions. Transaction GUIDs become PKs during our processing phases, and RI is maintained after the fact. This frugal use of CPU and I/O has enabled us to handle the processing with workstation-class servers. For us, eliminating the PK is appropriate.

    BTW, this was a good set of articles.

    Bill


    Bill Bertovich
    bbertovich@interchangeusa.com
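
    A rough sketch of the shape of this pattern, with all names invented for illustration:

        -- Capture table: a bare heap, with no PK and no indexes, so the
        -- high-volume inserts pay no index-maintenance cost.
        CREATE TABLE dbo.TranCapture
        (
            TranGUID   uniqueidentifier NOT NULL,
            AccountID  int              NOT NULL,
            Amount     money            NOT NULL,
            CapturedAt datetime         NOT NULL DEFAULT (GETDATE())
        )
        GO

        -- During the processing phase the transaction GUID is promoted to a
        -- primary key, and referential integrity is checked after the fact.
        ALTER TABLE dbo.TranCapture
            ADD CONSTRAINT PK_TranCapture PRIMARY KEY NONCLUSTERED (TranGUID)
        GO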

  • Thanks for the feedback! Nothing wrong with breaking any rule as long as you know you're doing it and why... more or less, anyway! We're looking over the Worst Practices idea - it has been popular, but so much of it is SOOOO subjective. Possibly not a bad thing.

    Andy

  • quote:


    In our environment, we capture and process millions of transactions per day. We increased throughput on these intense inserts by about 15% by removing our PKs (and all indexes) on the tables where we pump our transactions. Transaction GUIDs become PKs during our processing phases, and RI is maintained after the fact. This frugal use of CPU and I/O has enabled us to handle the processing with workstation-class servers. For us, eliminating the PK is appropriate.


    This sounds interesting.

    Can you go into a bit more detail?

    Does it work by having daily transaction tables that are index free and then some nightly process copies the transactions up into the main processing tables which are indexed?

  • An index improves a query only if the query uses it; on the other hand, every insert, update, or delete must also maintain that index, which slows those operations down. So it is no big surprise that an indexless table is faster when the work is mostly inserts.

    For the nightly copy, copy the rows over to the other table first and THEN create the index, rather than copying into a table that is already indexed (a sketch follows below).

    Ming
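
    A minimal sketch of that copy-then-index order, with hypothetical table names:

        -- Build the reporting copy with no indexes in the way of the load
        -- (drop and recreate it each night, or empty it and use INSERT ... SELECT).
        SELECT TranGUID, AccountID, Amount, CapturedAt
        INTO   dbo.TranReport
        FROM   dbo.TranCapture
        GO

        -- Only then build the index, once, over the whole set, instead of
        -- maintaining it row by row during the copy.
        CREATE CLUSTERED INDEX IX_TranReport_CapturedAt
            ON dbo.TranReport (CapturedAt)
        GO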

  • Hey David, More detail:

    You were on target, but we don't process nightly batches. Instead, we alternate between two small servers: when one is processing the other is capturing, and vice versa. We just flop back and forth between servers whenever the processing cycle is finished.

    Here's the beauty of it: this processing is both an audit process and an archiving process. We are updating account balances, etc. in real time; the audit process makes sure we didn't miss anything. Now I'm about to make everyone puke: we don't hold our updates in a transaction - too much overhead across servers. We use the audit process to identify discrepancies and then re-process if need be, which rarely happens (twice since December 1st, due to DNS issues). We have the ability to reset and re-process a batch with full auditability.

    Less urgent aggregations do get processed less frequently, and some are in nightly batches.

    We use a methodology similar to Data Dependent Routing to switch back and forth between servers. You can read about it at:

    http://www.microsoft.com/technet/treeview/default.asp?url=/technet/itsolutions/idc/default.asp

    Drill down to the SQL Server database section.

    Good luck,

    Bill


    Bill Bertovich
    bbertovich@interchangeusa.com

  • Sounds interesting Bill. Being able to reprocess without a problem is a great design.

    Andy

  • If you put a clustered index on the PK and the table is constantly being updated, wouldn’t this increase the possibility of record locks?

  • Generally no. One thing you have to look at is where the inserts get done. If you're using a sequential primary key then you may end up with some contention at the end of the table. Another is if you are actually updating the primary key frequently. Really I should have left out the word primary, since these are always considerations!

    Andy

  • Hi all,

    I definitely agree that a PK is needed for EVERY table except for maybe very small temp tables (since I work in v7 I can't create tables as variables). Every time I forgot (sometimes out of laziness) I had to go back and put one in.

    What I don't understand is why everyone has something against the identity column. I've used it quite a bit in my normalised databases and it works quite fast. But I use these columns for joining to other tables and not so much for filtering out certain records. I think it may be faster to join on the identity column than on a unique char(10) column that has a clustered key (I should try this some time). It is always possible to associate a code with the value of that char(10) column, but how would that be any different from the identity int? Also, if you have to put the char(10) value in a column of another table (as a foreign key) then it takes up more space than the int (I just got out of bed and don't feel like doing the math, but a char(10) is 10 bytes versus 4 for an int, so it is bigger).

    I already put indexes on all the foreign keys, an identity in EVERY table, and indexes on the columns used for querying and ordering. The identity is not always my clustered index, but it often is. And as far as I know this has never gone wrong.

    The article makes quite some sense to me… and I'm willing to try anything for a performance increase without sacrificing any integrity.

    My suggestion is: just test and test and test. Do all types of tests (select/insert/update/delete) and check the execution plan (a quick way of measuring the comparison is sketched below).

    I think the article is a good start for beginners, then it is up to them to test.

    Now I'm going for… coke 😉

    Regards,

    Michiel
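
    A quick way to run that comparison in Query Analyzer (the table and column names here are hypothetical):

        -- Report reads and elapsed time for each variant of the join.
        SET STATISTICS IO ON
        SET STATISTICS TIME ON
        GO

        -- Join on the 4-byte IDENTITY surrogate key...
        SELECT o.OrderID, c.CustomerName
        FROM dbo.Orders o
        JOIN dbo.Customer c ON c.CustomerKey = o.CustomerKey
        GO

        -- ...versus the same join on the char(10) business code.
        SELECT o.OrderID, c.CustomerName
        FROM dbo.Orders o
        JOIN dbo.Customer c ON c.CustomerCode = o.CustomerCode
        GO

        SET STATISTICS IO OFF
        SET STATISTICS TIME OFF

    Turning on the graphical execution plan for the same batch also shows which indexes each join actually used.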

  • Hi anyone,

    I only want to add that a clustered PK has an overhead:

    true, the leaf level of that index is the table rows themselves,

    but the upper levels of the balanced tree still need extra space in the database (see below).

    greetings,

    r.warnat
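
    One quick way to see that overhead is sp_spaceused; the table name here is just an example:

        -- "data" is the leaf level (the rows themselves); "index_size" shows
        -- what the upper levels of the clustered index, plus any other
        -- indexes, add on top.
        EXEC sp_spaceused 'dbo.Customer'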

  • If you're doing inserts where the value being indexed is not increasing then a clustered index is generally a BAD idea. You'll get lots of page splits. See Rob Vieira's book, "Professional SQL Server 2000 Programming" pp.283-4 and 294-7. I would like to have seen this discussed in the article. Otherwise it has good points.
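
    A quick way to watch those splits happen is a throwaway table clustered on a non-increasing value such as a GUID (all names here are made up):

        -- New GUIDs land all over the clustered index, forcing page splits.
        CREATE TABLE dbo.SplitDemo
        (
            RowGUID uniqueidentifier NOT NULL DEFAULT NEWID(),
            Filler  char(200)        NOT NULL DEFAULT ('x'),
            CONSTRAINT PK_SplitDemo PRIMARY KEY CLUSTERED (RowGUID)
        )
        GO

        -- After loading a few thousand rows (INSERT ... DEFAULT VALUES in a
        -- loop), high Logical Scan Fragmentation and low Avg. Page Density
        -- in the output are the footprint of all those splits.
        DBCC SHOWCONTIG ('dbo.SplitDemo')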
