Worst Practices - Not Using Primary Keys and Clustered Indexes

  • -- PRIMARY KEY - don't CREATE TABLE without it. Doesn't matter how temp it is. Even in-memory table variables.

    -- IDENTITY - a wonderful alternative to 200+ byte PKs that aren't required to ensure data integrity. You may have a natural PK, but do you really want to use it when it's five varchar(100) fields that change? This is even more important if other tables need to reference records in the table. Always consider "lookup" tables and other "extreme relational design" techniques and super-normalized schemas -- SQL Server is tuned much better for large Star-schema queries such that performance gains from denormalization aren't as easy to acheive.

    -- and CLUSTERED INDEXES: If you are using SQL Server 2K (or even 7), you have no excuse to not apply a clustered index to each and every table. Period. Understand that SQL Server handles tables with no clustered index quite differently in many respects to clustered tables. If you have more than 100 pages of data and you have significant modifications to existing rows, you will soon die under the I/O overhead. Only the rarest of circumstance could find the sole benefit of non-clustered tables (SQL Server's ability to reuse space created by deleted rows) more compelling than the many benefits of clustering a PK.

    As a general rule, OLTP-type usage will demand the PK be clustered, and OLAP-type usage many benefit from another set of columns for range searches. But don't forget that you can never go wrong with a clustered PK, but you can go very wrong with some other index clustered. Also, remember that a covering index is just as effective in improving query performance as the best clustered index. (I have one table with an index that has every table field, but with a different order.)

    Any body have examples where a non-clustered table is the best solution, particularly on SQL2K? I'd love to see 'em...

  • Good article. I too am amazed by how many people are willing to forgo primary keys on small tables. Data integrity always preceeds performance (esp such a slight gain). Also, without this PK, you can't create the appropriate foreign keys, which is also a data integrity issue. My experience is if you don't make the relation, you end up with orphans.

    One technical point: if you create a table using TSQL in Query Analyzer, the PRIMARY KEY constraint is always clustered unless a clustered index already exists or you specify NONCLUSTERED.

    As far as no primary keys on temp tables, this position also has little validity. We key our temp tables, and have generated PK violations that would have gone unknown to us and produced erroneous results without them.

    Tables having a clustered index, however, is less certain. I recently had the experience of seeing a significant jump in performance in some temp tables by adding NONCLUSTERED to the create command. The tables were still keyed, but the absence of a CLUSTERED index helped tremendously.

    I don't recommend as a "damn good start" adding and clustering an identity column. There would almost always be a better choice for a clustered column, and these bad habits might be hard to break.

    Still, I will refer to this article anytime someone tells me I might not want to key my small tables.

  • Andy,

    I find using clustered indexes painful experience in the following scenarios:

    Replicated tables

    Large Tables w/ 15 million rows or more

    With replication you can't use a clustered index on a foreign key (logical choice lots of cases) because in replication an update gets transalated to insert&delete on the backside.

    Clustered indexes on large tables are painful anytime you have to do maintenance on them (adding or dropping columns).

  • Interesting points. On the replication one I'll try to find time to experiment, might be possible to override the behavior in the replication sequence.

    Thanks to all for their comments. The combination of the article and your responses make for pretty solid package!

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • More about clustered keys.

    Andy is right about a lot of things in this article. Ironically, a few of his mistakes are not the result of being overzealous, but of giving the opposing argument too much slack!

    For example, Andy says that some database designers will argue that a table doesn't need a primary key because "it's a temp table, so the rules don't really apply." He writes, "I think there is some validity to this position. [...] the most compelling reason [...] is that in many places where you use a temp table now you could (and probably should) use a variable of type table (available beginning in SQL 2000) which only allows you to define the structure, not indexes."

    But you *can* specify a primary key (with a clustered index, if you like) on a table variable:

    DECLARE @temp TABLE (

    a INTEGER,

    b NVARCHAR(2000),

    PRIMARY KEY CLUSTERED(a)

    )

    Another argument Andy should dismiss, but doesn't, is that a particular type of table doesn't need a clustered index. He writes: "A good example of this might be a lookup table that contains state name and state abbreviation. You're going to be doing single record lookups, never a range query which is where a clustered index excels. You're also not going to have a lot of indexes, probably one on the name and one on the abbreviation where the index will provide the data, no need to do a seek through the table to find other bits of information associated with that row. For this example, is it a fair point? I think so."

    Well, I don't think it matters because SQL Server will probably just keep the small table in memory. But let's look more closely at this.

    Suppose you don't use a clustered index, and you then query against your "states" table for a particular state code. SQL Server has two options: (1) use the index, in which case it has to load two pages of data from the disk - a data page and an index page; or (2) don't use the index, load one page of data, but then scan the whole table to find the right row. It's a trade-off.

    When you use a clustered index, the index information is built into the table data itself. So there's no trade-off. SQL Server only has to load one page from disk, and that page contains both the table data and the index data. A table with a clustered key could therefore actually perform faster.

    [Plug: I verified some of this using Lumigent's Log Explorer tool. I created one table with a clustered index and one with a non-clustered index. When I insert rows into the clustered table, SQL Server records a single write in the transaction log. When I insert a row into the other table, SQL Server records two writes on different pages: one for the table itself, and one for the non-clustered index. Log Explorer let me view these records in the log. Note - I work for Lumigent, which advertises on this site.]

    As a side note, the Books Online specifically recommend clustered indices for "[c]olumns that contain a limited number of unique values, such as a state column that contains only 50 unique state codes." But I can't tell if they are talking about a small "states" table or a large table that happens to have a "state" column.

  • On a related subject, should Int IDENTITY columns be deprecated infavour of GUID columns?

  • Another thing that should be stressed is the fact that PK works better, for various reasons if it's either identity or other simple one column-integer field, rather than 'natural PK'--which I guess someone explained it as "these x number of columns just happend to be unique, therefore, let's make it the PK". Multiple column PK make it hard to reference, or if the PK need to be passed to another, usually web application, so it can be re-queried...multiple column making up a PK simply make programming/query un-necessarily harder than has to be....

  • Hey Jorend, thats the first time anyone has said I was too easy on the opposition! Very interesting comments, thanks for reading and posting.

    David P,

    I like GUID's and have begun using them some. Nothing that was really a high useage/high volume thing, but more where I needed to build relationships on the client and wanted to avoid a lot of round trips. Biggest complaint I've had is from other IS users who work with the table via Access or EM is that the GUID's are 'harder to work with', which I think translates to 'hard to remember compared to a plain number'. They are great for object work where every object needs a key. I've also got an article posted here on the site about them as well.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • Andy:> “So, let's talk about another worst practice - not putting a primary key on EVERY table and failing to use a clustered key on EVERY table.”

    That’s a bold statement. I happen to disagree. You spend a lot of time trying to stipulate that not making the key and the index is a “mistake”. What you do not spend a lot of time doing is explaining precisely why. You eventually get around to briefly mentioning three substantive reasons for using these devices – performance considerations, data integrity, low cost – but you do not address why these things will always necessarily be of paramount concern in all situations.

    Andy:> “It's a great opportunity to make sure that your developers understand how important these issues are...”

    I think developers can blow us off with justification if we can’t provide good reasons for our pronouncements.

    Andy:> “...a gentle reminder to stay sharp is worthwhile. If they didn't know better - isn't it time they learned?”

    I can’t afford to take a posture of arrogance toward the developers I work with.

    Andy:> Come on - having a primary key is such a core concept, why would you not have one?”

    You’re trying to shift the burden of proof. The article calls not using one a “worst practice”. This is your case to make, and not ours to refute until you’ve made it.

    Andy:> “...because the table is so small, the index will not be used in the query plan. Quite possibly true. So what? ....you're making the assumption the table will always be small...."

    There are certain tables (e.g., code-lookup tables) that, when I design them, I know for an absolute fact that there are never going to be more than a couple of dozen rows. My judgment in such cases, sometimes, is that neither a key nor an index is necessary.

    Andy:> “Why have to fix a problem later that I can avoid now?”

    On the other hand, why should I spend extra time “solving” something that will never become a problem?

    Andy:> “How much time does it take to create a primary key and a clustered index....”

    I’m not going to have performance issues over a table that only has a couple of dozen rows in it.

    Andy:> “More important than that though - it's not just about query plans, it's about data integrity. Not having a primary key can provide a fatal failure point....”

    Someone might accidentally enter a duplicate row, and if that’s a significant risk, installing a constraint is a good idea. But it isn’t always a significant risk. For example, duplicates aren’t a problem if a table’s only purpose is to allow checking for inclusion in a list.

    Andy: “Another is that it's a temp table...."

    If I’m using a temp table that only has a couple of rows inserted into it within a stored procedure, then I’ve just taken care of the only two substantive objections you’re raised so far: poor performance, duplicate rows.

    Andy:> “Should we seek to minimize that overhead in our design? Always. But never at the expense of data integrity.”

    Fine, so I’ll use primary keys if the alternative is poor data integrity, but that lets me off the hook for those times when there is no chance of that happening, allowing me to minimize overhead.

    Andy:> “The nice thing about primary keys and clustered indexes is you can only have one of each per table - pretty hard to over use them!”

    There are lots of inexpensive things I don’t always use.

    Edited by - Lee Dise on 08/02/2002 1:39:37 PM

  • Hi Lee Dise,

    it seems quite obviously, that your projects will never grow into the market, and your customer never changes his/her objectives to achieve with your product(s).

    Dise:> ...You spend a lot of time trying to stipulate that not making the key and the index is a “mistake”. What you do not spend a lot of time doing is explaining precisely why...

    I say: you once forgot to precisely define your tables (including the primary key) after a couple of weeks spending with fixing another project, you will lose the idea of what has to be in every table. With a primary key you could at least identify the stored objects.

    Dise:>I think developers can blow us off with justification if we can’t provide good reasons for our pronouncements.

    Andy:> “...a gentle reminder to stay sharp is worthwhile. If they didn't know better - isn't it time they learned?”

    Dise:>I can’t afford to take a posture of arrogance toward the developers I work with.

    So, if you can't stand your position and can't phrase the importance of some development design goals, better code some simple vb-sell-and-forget-programs and be done with it.

    Andy:> “...because the table is so small, the index will not be used in the query plan. Quite possibly true. So what? ....you're making the assumption the table will always be small...."

    Dise:>There are certain tables ... when I design them, I know for an absolute fact that there are never going to be more than a couple of dozen rows

    And you know for sure, no one else enters any data in this table and you know this table can't be used by anyone else, fine, go ahaed !

    Andy:> “Why have to fix a problem later that I can avoid now?”

    Dise:>On the other hand, why should I spend extra time “solving” something that will never become a problem?

    Please take a look over the horizon of your project, or will that project never be reused ?

    Andy:> “How much time does it take to create a primary key and a clustered index....”

    Dise:>Someone might accidentally enter a duplicate row, and if that’s a significant risk, installing a constraint is a good idea.

    So, you know how to do it with a few mouse-clicks, don't you?

    Dise:>But it isn’t always a significant risk. For example, duplicates aren’t a problem if a table’s only purpose is to allow checking for inclusion in a list.

    I think you will use your tables only for that purpose, but will your customer, or will the guy, copying your project for the next one ?

    Andy:> “Another is that it's a temp table...."

    Dise:>If I’m using a temp table that only has a couple of rows inserted into it within a stored procedure, then I’ve just taken care of the only two substantive objections you’re raised so far: poor performance, duplicate rows.

    If you are destroying the temp table inside your stored procudure: go ahead, but please comment the columns, you are identifying your data with (the primary key?)

    Greetings to all system designers, who don't have to code fire and forget programs,

    R.Warnat

  • > Hi Lee Dise,

    Hi, Warnat!

    > it seems quite obviously, that your projects will never grow into the market...

    That's precisely the point: Not all products that are created on SQL Server require a deep regimen of indexes and keys precisely because they do not have a large market, and never will.

    Dise:> There are certain tables ... when I design them, I know for an absolute fact that there are never going to be more than a couple of dozen rows

    > And you know for sure, no one else enters any data in this table and you know this table can't be used by anyone else, fine, go ahaed !

    If I want to prevent others from entering data into a table, I employ the concept of not granting permissions.

    Dise:>I can’t afford to take a posture of arrogance toward the developers I work with.

    > So, if you can't stand your position and can't phrase the importance of some development design goals, better code some simple vb-sell-and-forget-programs and be done with it.

    Ooooh, flame on! I'm going to stick my neck out and guess that taking such a posture toward your developers isn't much of a problem for you.

    > Greetings to all system designers, who don't have to code fire and forget programs,

    General Curtis LeMay once said, "There's a reason for the rules. The reasons are important; the rules are not." If certain rules help you to accomplish your purposes, then they are good rules. In my shop, it pays to be flexible. All situations are not the same. Computer programming's dirty little secret is that not every job worth doing is worth doing well. Often, the schedule dictates how "well" a job is done. I tell my bosses, there are three desirable things, Good, Cheap, and Fast: Pick any two. Nine times out of ten, they want "fast and cheap".

    For this reason, I am not a design dogmatist. What we call "best practices" can become a fetish. The difference between the workaday and the fetishistic can be illustrated thusly: If a beautiful woman walks down the street wearing high heels, most people notice the woman, but the fetishist only sees the heels. His loss! If all this means I'll never get to DBA Heaven, then oh well!! It's starting to sound like a pretty Pharisitical place, anyhow.

  • Hi Lee Dise,

    W> it seems quite obviously, that your projects will never grow into the market...

    D>That's precisely the point: Not all products that are created on SQL Server require a deep regimen of indexes and keys precisely because they do not have a large market, and never will.

    I see your point, for your point of view, it seems very obviously to put everything in one single program, deliver it, and it works well.

    D>If I want to prevent others from entering data into a table, I employ the concept of not granting permissions.

    At least there are other DBA's who could enter data (but would not).

    D>Ooooh, flame on! I'm going to stick my neck out and guess that taking such a posture toward your developers isn't much of a problem for you.

    It's not a problem for me anyway.

    D> In my shop, it pays to be flexible ... "fast and cheap".

    I think the main topic was meant to show most of the people to be more careful and think of their data tables beeing used by other persons/circumstances as well. Not all of them can casulate their program's and data like you (and me).

    D> ...If all this means I'll never get to DBA Heaven, then oh well!!

    meet me in DBA-Hell,

    R.Warnat

  • Hi, Warnat!

    > At least there are other DBA's who could enter data (but would not).

    Not here. There be me, and that be all.

    D> I think the main topic was meant to show most of the people to be more careful and think of their data tables beeing used by other persons/circumstances as well.

    Fair enough, so long as one doesn't assume that absence of indexes/keys is necessarily evidence of lack of care.

    > Not all of them can casulate their program's and data like you (and me).

    I agree. I've worked in large shops with dozens of DBAs and hundreds of developers, and in small shops as the only DBA, and pretty much everything in between. My previous job was in a shop with a dozen DBAs that featured a breathtakingly careful configuration management and design review apparatus. (Today's fix would, assuming it got past Q/A, CM, stress-testing and user acceptance testing, make it into production in about six months.) Their software was very high quality and performed quickly, but it wasn't cheap.

    Where I work now, we do not have the budget for such extreme qualitative measures, and I would not have the support of my bosses if I were to unilaterally dictate coding practices. For myself, I tend to err on the side of over-engineering, anyway (meaning, among other things, I usually go ahead and add in the primary key), and occasionally take moderate amounts of heat: "Is that constraint really necessary?"

    Again, the purpose of my initial response was to object to extreme positions such as "always do this" or "never do that". If Andy would amend his thesis to, "The DBA, the vast majority of the time, needs to explicitly define the unique key in his tables," I would go along with it. (I think he's on even thinner ice by always demanding a clustered index.)

    > meet me in DBA-Hell

    I may already be there! 🙂

    Edited by - Lee Dise on 08/05/2002 12:15:56 PM

  • Well, at least it got you guys talking!

    Leen, you make some fair points. Article probably could have been a little richer in some places. As with all our content posted here, we find that the combination of the article AND the resulting feedback makes the whole cake. Not often that we'll nail it down perfectly. Part of that is it's difficult to cover any topic right in full in a page, part is our failings(!) as authors trying to get our idea across.

    For now one ;oint I'd like to address is that it's not my intent or style to be arrogant to developers. I deal with a lot of junior developers, both where I work and here on the site, and you know what? They LIKE for you to give them rules to live by and to have a short lesson explaining why to go with it. As they grow they can start to challenge those rules a bit and explore when it makes sense to break them. Death and taxes are absolutes, the rest is negotiable! But I find that using a primary key and using a clustered key is a win far more often than it's a lose.

    Thanks for participating, your comments enrich the experience for everyone.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • > For now one ;oint I'd like to address is that it's not my intent or style to be arrogant to developers....

    Fair enough. I didn't know how to take that "gentle reminder" statement.

Viewing 15 posts - 46 through 60 (of 184 total)

You must be logged in to reply to this topic. Login to reply