Index on Identity Columns

  • I have a few rambling questions (or observations) indexes on identity columns. I would like to hear from other members with their experience in this area.

    1. All the time, I believed that declaring an identity column does not add an index on it, so if any joins were made based on those identity columns they should be indexed. The unique index should also be in place on identity columns, since, it is possible (very cery very tiny possibility) to get some duplicates in there. At least this what I heard from previous versions of SQL servers with abnormal shutdowns, though I have never experienced one. Am I on the right track?

    2. Also, as a follow up, I have come across varying recommendations from venerable SQL Server expert sites, regarding cluster indexing identity columns (some I copied verbatim, refer to the links for full context):

    SQL Server Performance (http://www.sql-server-performance.com/clustered_indexes.asp) --> recommends creating a clustered index on an identity column because that eliminates page splits and improves INSERTS, UPDATES, and DELETES.

    From MSDN (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1_6583.asp) --> Clustered indexes benefit from being created on unique, nonnull, or IDENTITY columns.

    From Experts Exchange (http://www.experts-exchange.com/Databases/Microsoft_SQL_Server/Q_21148735.html) -- > Generally, I think it's a waste, to use a clustered index on ID columns (especially Identity columns).

    http://www.4guysfromrolla.com/ASPScripts/PrintPage.asp?REF=%2Fwebtech%2Fsqlguru%2Fq021700-1.shtml

    Well, placing a clustered index  on an IDENTITY column isn't all that useful.

    I do understand, that deciding clustered index columns is not always black and white, but what do you guys think of these above quite contradictory statements.

    3. BTW, the sql-server-performance site recommends "As a rule of thumb, every table should have a clustered index". Though this is true, and I firmly believed it, until one of our application vendors explianed why he does not use a clustered index, and uses non-clustered index on the heap. Since the way the app works is that they load data into the tables daily (never modify them) and use queries that read large number of rows. Having non-clustered index on a heap helps in this case, since, there is no second hop from NC index to Cl index and then to the data page. The data retrieval is from the NC index to the data page. I think this is a special case where clustered indexes are not helpful.

    I would like to hear from other members with their experiences.

  • When I Design & Develop Databases, more often than not, just about every Table has an Identity Column that is also the Primary Key. I create a NON-Clustered Index on all Identity Columns that are also Primary Keys.

    The reason I do this is that in the past I use to place Clustered Indexes on Identity Columns that were also Primary Keys and over time I discovered that I needed to create a Clustered Index on a more frequently used column like a DateColumn or a Surname Column and I had to settle for a NON-Clustered Index which turned out to be a NON OPTIMAL option. Since then I ALWAYS create a NON-Clustered Index on Primary Key Columns especially if they are Identity Columns. It's better to save the Clustered Index for a more frequently used column/s.

    The above links you have discovered, I believe that creating a Clustered Index on an Idenetity Column probably does eliminate Page Splits only because some guru has written it however, I NEED A DEEPER EXPLANATION on how that happens, not just a statement from some guru saying that it does. How bout telling us how a Clustered Index is beneficial on an Identity Column!


    Kindest Regards,

  • If you can,you might want to read "Inside SQL Server 2000" on that topic. Might be a more reliable reference than the ones you've quoted.

    While it is true that almost every table should have a clustered index and that a clustered index on a PK column with an IDENTITY property defined on it, is better than having no clustered index at all, keep in mind that upon (re) creation of a clustered index all data is sorted according to the sort order of the clustered index keys. So when all your queries mainly focus on the PK column, it might be a good choice to have a clustered index on the PK.

    However, in reality you most likely will query for other criterias more often, so that it makes more sense to use some other column for the CI.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • I wouldn't put a clustered index on a busy table where the primary key could possible contain sequential values.

    An example is a cacheing database where the primary key is a session id.

    We hit a problem a couple of years ago where we were inserting a large number of rows, in quick succession, into such a table and hit terrible locking issues primarily because when inserting a row, sql server was locking a page on the first insert which blocked the second insert which blocked the third insert and so on...As soon as we made the primary key a non clustered index the blocking stopped.

    The views expressed in this post may not coincide with those of my employer.

  • Let's looks at this one.

    1. All the time, I believed that declaring an identity column does not add an index on it, so if any joins were made based on those identity columns they should be indexed. The unique index should also be in place on identity columns, since, it is possible (very cery very tiny possibility) to get some duplicates in there. At least this what I heard from previous versions of SQL servers with abnormal shutdowns, though I have never experienced one. Am I on the right track?

    Yes you are on the right track. However no chance of duplicating the IDENTITY column in general, you'd have to override in SQL 7 and 2000 or reset the next available value to a number within the existing range to cause issues if this is supposed to a unique column by itself. But with that said I have a table which I reset the indenity column to 1 every year and the key is the year column plus the identity column. So it really depends on how you idenity column effects relates in the design of your table.

    2)

    a) SQL Server Performance (http://www.sql-server-performance.com/clustered_indexes.asp) --> recommends creating a clustered index on an identity column because that eliminates page splits and improves INSERTS, UPDATES, and DELETES.

    Not 100% sure this is correct but I would assume it is so. The problem is I have never seen an official statement from MS on this fact nor have I taken the time to test this myself. But if MS programmed it smarter then this should be true as the engine can safely assume the next value to go in will be 1 higher than the previous and thus can always safely determine a page split will not be needed.

    b) From MSDN (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/optimsql/odp_tun_1_6583.asp) --> Clustered indexes benefit from being created on unique, nonnull, or IDENTITY columns.

    The primary key to that statement is unique can only have 1 null, nonnull of course none and IDENTITY have to be non-null.

    c) From Experts Exchange (http://www.experts-exchange.com/Databases/Microsoft_SQL_Server/Q_21148735.html) -- > Generally, I think it's a waste, to use a clustered index on ID columns (especially Identity columns).

    Sorry didn't log in to see the full response but the snippet on google looks like he was support the idea that indentity columns are not the best keys for clustered index for the same reason I do next.

    d) http://www.4guysfromrolla.com/ASPScripts/PrintPage.asp?REF=%2Fwebtech%2Fsqlguru%2Fq021700-1.shtml

    Well, placing a clustered index  on an IDENTITY column isn't all that useful.

    Need to put in full context. "Well, placing a clustered index (PK constraints create a clustered index by default) on an IDENTITY column isn't all that useful." A clustered index should represent the most frequent item queried against if at all possible because this removes an additional index invovlement. Remember all non-clustered index conatin the relative clustered value to move from the non-clustered index to the clustered index to the physiacal location of the requested data. Many times an Identity column is usefull on support tables such that are used to provide text values in joins to data tables. But unless you have an extrememly wide canidate key you should go with the coulmn that is involved in the most queries executed (based on number of times not numer of individual queries). If the canidate kety is wide the effect on non-clustered indexes are that they will be wider due to the fact the related clustered value is stored with each item in the non-clustered key, thus the use of a surrogate key and identity values are great here.

    3) Deffinently a special case. But you run into these more often than one would think. For example I wrote a C++ service to monitor realtime data from several ASPECT call centers in our org. The data isn't required for reporting right away and with state changes every 3 seconds across 500+ agents I wanted maximum performance. In this situation I have no indexes at all, straight heap. Then at 1 AM every night I change the SP to point to a mirror table that is empty so I can transfer all the data out of the data table to the reporting server. Then the next night I flip the process. And all this is being done with a simple MSDE instance, not a full blown SQL Server on a 533 pentium 3 box with only IDE drives and 256 MBs of ram. ANd recently another call center was added to the process, server hasn't flenched yet and in addition the collecting service operates as a server pushing the data to the clients realtime as well. Adding indexes would cause me issues on this machine with the volume of data processed every 3 secounds and the only good key then would be an IDENTITY column if I wanted to have a realtime sort but I would see a good deal of contention while the IDENITY proceses itself occurrs. Other fields would cause page splits so and create a lot of contention.

     

    All and all just weigh your canidiate keys based on most queried first then consider size of key. If no key exists or key is extremely wide use a surrogate key of which IDENTITY columns are great if no other choice makes sense. Also support tables where you need to join more than you need to lookup are real good choices for clustered IDENTITY columns. But many times I would say it still comes down to personl choice. Some folks will still use a clustered IDENTITY key no matter.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply