Index on IDENTITY column

  • Identity gaps are only an issue if, for whatever reason, you're assuming a perfect chain of numbers. Don't make that assumption and the gap problem goes away. The idea is not to have a perfect chain of numbers, but rather to have as small as possible, but still completely unique, key on which to hang the rest of the data. That's the purpose of IDENTITY. If you want a perfect chain of numbers, use a different construct and, I'd say, avoid making that your PK (whether or not you make it the cluster is a different critter).

    As to scans & seeks, again, it has nothing to do with natural or artificial keys. It has to do with the selectivity of the index. In general, no, having an identity field as an index (clustered or not) does not lead to scans. By and large it leads to seeks because, and this is the important part, the primary key constraint makes them unique. A unique set of values is, by definition, highly selective. That selectivity is what leads to seeks.

    It's possible to have an identity value and either leave it off an index or make that index non-unique and then force data into the column using IDENTITY_INSERT in such a way that you end up with all kinds of duplicates, making the column less and less selective. But that's an unnatural construct. Using the old joke, doctor, doctor, it hurts when I do this... don't do that.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • karthikeyan-444867 (5/26/2010)


    GilaMonster (5/25/2010)


    karthikeyan-444867 (5/25/2010)


    I am sure we shouldn't create Clustered index/primary key on an IDENTITY column.

    There's a big, massive, huge difference between not using an int identity for a primary key and not using an int identity for a clustered index. Totally different reasons, totally different principals, totally different object types.

    See Joe Celko's stuff about int identity primary keys.

    As for int, identity, clustered index - http://qa.sqlservercentral.com/articles/Indexing/68563/

    int identity for a primary key is nothing but unique clustered index. I am confused...

    not using an int identity for a clustered index.

    ???

    The primary key of a table does not have to be the clustered index. Yes, that is the default behavior when you create a primary key, but you can control it. So it is possible to create an integer, mark it as an identity, and make it the primary key, but make it non-clustered. You can then create the cluster on a different set of columns.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • karthikeyan-444867 (5/26/2010)


    int identity for a primary key is nothing but unique clustered index.???

    By default only.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • First, what actually makes you say that and why are you "sure"?

    I read like 'Don't create clustered index on monotonically increasing column' some years back.

    In fact, having a Clustered Index on a natural key may be the worst thing you can do insofar as fragmentation, etc.

    Do you always ( or most of the time) go with creating index on IDENTITY column?

    What does a natural key have to do with a Clustered Index? Nothing really.

    If we have fake/surrogate key in a table, will the optimizer use 'Clustered seek' ? I believe it will use 'Clustered Scan'.

    One of the major advantages of using a Clustered Index on an IDENTITY column is it does prevent page splits on heavy OLTP tables or tables that take a lot of batch inserts.

    Incase if there is less inserts happening for this table, is it a good choice to have clustered index on an IDENTITY column? Say for example, insert is a one time activity. But UPDATE & DELETE will happen frequently for this table. will index on IDENTITY column help us to do UPDATE & DELETE very fast? I believe the optimizer will scan the entire table to do UPDATE & DELETE.

    Yes, I may also use a unique constraint/index (Alternate Key) in addition to a PK especially when humans are making entries one way or another into a table.

    is it a good choice to combine some natural keys in the fake/surrogate key?

    say for example,

    CREATE TABLE tablename

    (

    ID INT IDENTITY PRIMARY KEY,

    Strategy VARCHAR(25),

    AssetType_Code INT

    )

    create unique clustered index id1 on tablename(Strategy,AssetType_Code,ID)

    instead

    create unique clustered index id1 on tablename(ID)

    karthik

  • karthikeyan-444867 (5/26/2010)


    First, what actually makes you say that and why are you "sure"?

    I read like 'Don't create clustered index on monotonically increasing column' some years back.

    I take it you didn't read the article I referred you to earlier in this thread.

    If we have fake/surrogate key in a table, will the optimizer use 'Clustered seek' ? I believe it will use 'Clustered Scan'.

    Natural/surrogate key has NOTHING to do with clustered index seek/scan. Whether you get a clustered index seek or scan depends on the query you're writing, the where clause you specify.

    Again, primary key != clustered index. That's critical to understand. A primary key is logical construct that provides a unique way to identify a row. By definition, it doesn't even have to be an index, though in SQL Server it always will be.

    If your query has a SARGable predicate on the clustering key, you'll get a clustered index seek. A clustered index scan is a table scan, you get those when there are NO suitable indexes for the query to use (or no sargable predicates, or insufficient selectivity, or a couple other reasons).

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • karthikeyan-444867 (5/26/2010)


    If we have fake/surrogate key in a table, will the optimizer use 'Clustered seek' ? I believe it will use 'Clustered Scan'.

    If you use the "fake" key in your WHERE clause

    WHERE MyFakeID = 42

    and that column has an index (clustered or not) and that index is selective enough (clustered or not, primary key or not) then you will get a seek operation. Period.

    If, on the other hand, you use the natural key in the WHERE clause

    WHERE NaturalKey = 'My Natural Value'

    and the index is still on the other column, then, regardless of the selectivity and beauty of the other index, this will result in a scan operation because you are using innapropriate values in the WHERE clause, not because the other column/index is somehow wrong. You either need to use the appropriate column for your search or you need to change your indexing structure.

    NONE of this has anything to do with a natural or an artificial primary key.

    Incase if there is less inserts happening for this table, is it a good choice to have clustered index on an IDENTITY column? Say for example, insert is a one time activity. But UPDATE & DELETE will happen frequently for this table. will index on IDENTITY column help us to do UPDATE & DELETE very fast? I believe the optimizer will scan the entire table to do UPDATE & DELETE.

    Assuming a properly maintained set of indexes, depending on the amount of deletes, your belief is just wrong. But regardless, if you sufficiently change the parameters, then yes, the general rule of the usefulness of an IDENTITY column as a primary key and/or a clustered index will change. No one is saying ALWAYS do this or NEVER do that. We're telling you what the trade-offs are.

    is it a good choice to combine some natural keys in the fake/surrogate key?

    say for example,

    CREATE TABLE tablename

    (

    ID INT IDENTITY PRIMARY KEY,

    Strategy VARCHAR(25),

    AssetType_Code INT

    )

    create unique clustered index id1 on tablename(Strategy,AssetType_Code,ID)

    instead

    create unique clustered index id1 on tablename(ID)

    As a general rule, I'd say no, that's not a good approach, but you know what, depending on your circumstances, there might be a reason to do things that way. But probably, especially the order of columns on that clustered index, it's not.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • karthikeyan-444867 (5/26/2010)


    what will happen If I delete some rows from the table. wouldn't cause IDENTITY GAP issue? wouldn't waste the datapage space?

    I guess you really need to read up on IDENTITY and how things are stored on a page. Gaps in an IDENTITY column have no effect on how the data is stored.

    Deletes are another story but mere gaps caused by someone cancelling a transaction have no affect on the stored data any more than having an increment of 10 would.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • karthikeyan-444867 (5/26/2010)


    First, what actually makes you say that and why are you "sure"?

    I read like 'Don't create clustered index on monotonically increasing column' some years back.

    I take it that wasn't Books Online you read where the opposite is recommended. Stop just taking people's word for things especially when you read just one thing. There are a lot more people who believe and recommend just the opposite (like me). Don't take their word for it either... TEST! "A Developer must not guess... a Developer must KNOW." The only way to do that is to TEST. As long a history as you and I have, you should know that answer by now. Don't even trust me... use the information to define what and how your will TEST. 😉

    Do you always ( or most of the time) go with creating index on IDENTITY column?

    Heh... already been answered 😉 You even quoted what I said further below.

    If we have fake/surrogate key in a table, will the optimizer use 'Clustered seek' ? I believe it will use 'Clustered Scan'.

    Stop "believing". Test. It depends on what the non-clustered and clustered indexes are. Read about the "design considerations" of indexes in Books Online and discover the special relationship between non-clustered index columns and clustered index columns.

    Incase if there is less inserts happening for this table, is it a good choice to have clustered index on an IDENTITY column? Say for example, insert is a one time activity. But UPDATE & DELETE will happen frequently for this table. will index on IDENTITY column help us to do UPDATE & DELETE very fast? I believe the optimizer will scan the entire table to do UPDATE & DELETE.

    Again, it depends on the criteria and your belief means nothing. Test. Read BOL about design considerations when designing indexes. It may take an hour or two, but it'll be faster than asking these types of "belief" questions on a forum.

    Yes, I may also use a unique constraint/index (Alternate Key) in addition to a PK especially when humans are making entries one way or another into a table.

    is it a good choice to combine some natural keys in the fake/surrogate key?

    I would say generally "No" but you need to read about the design considerations in BOL and find out what the special relationship between non-clustered index columns and clustered index columns is to find out why I say so.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • karthikeyan-444867 (5/26/2010)


    First, what actually makes you say that and why are you "sure"?

    I read like 'Don't create clustered index on monotonically increasing column' some years back.

    ...

    It would be long time ago not "some years". Do you remember why it was not advised to do so (actually for both: SQLServer and Sybase)?

    Mostly it was all to do with the page locking on insert and update operations!

    For OLTP system, having clustered index on the incrementing columns could result in very bad performance of the system front-end application where the user action would insert new records into such tables as these records would be created in the same page! So, if for example, in your Call-Center system, creation of Customer record would require some long valildation (and, may be other actions/events), the data page where the record inserted would be locked for the period of transaction, making other inserts into the table to wait for quite a time...

    The one way around it was to use GUID type of ID (or generate Random ID) and have clustered index on it. It would minimize the probability of inserting new records into the same datapage.

    BUT.

    These problems dissapeared long ago. SQL Server will not lock (most likely) the datapage for inserting of one record. So, there is nothing wrong to have clustered index on Identity columns. Actually, as I prefer using artificial keys other than natural ones (until, it's absolutely silly like for some of reference type of data) , I would recommend to have clustered index on my PK Identity columns most of the time.

    BTW, creating the clustered index for PK by default was done by server designers after some thoughts...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • elutin (5/26/2010)


    BUT.

    These problems dissapeared long ago.

    They dissappeared in SQL Server 7 to be specific. 6.5 had that problem, 7 did not. 5 versions ago.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply