Primary Key and Index on same column

  • I am looking at a database with otimization in mind.

    I have found numerous instances of tables with a clustered primary key on an integer (this is good), but these same tables have an index on the same column.

    This strikes me as redundant, but could there be a reason for doing this that I am not aware of?

  • >>but these same tables have an index on the same column.

    Are the other indexes single-column indexes that are identical to the PK, or are they composite indexes that include the primary key integer plus additional columns ?

     

  • These other indexes are identical to the PK (Except they are defined as non clustered, non unique indexes).

  • If the indexes are duplicates then the non clustered or non PK's should be removed.

    You have checked the columns in the indexes are identical  using for example, sp_helpindex tablename  ?

     

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Yep, they are identical. I figured the non PKs should be removed, but I was mainly just checking to make sure there is was not some good reason for it.

    Thanks,

  • I can't think of a good reason for it.  Technically, it should not be a problem (except for extra overhead for SQL to maintain 2 indexes).  You might try comparing the Query Plans in each case.  Nevertheless, I've actually found cases in the past where SQL actually got confused and things were better (faster) after I removed the redundant index.  I tend to always remove the redundant index. 

    Paul

    - Paul

    http://paulpaivasql.blogspot.com/

  • Thanks, that's what I was thinking.

  • There are a couple reasons that I can think of where you would want a duplicate of a clustered index.

    1.  The Nonclustered index will be spread out across fewer data pages than the Clustered index, since each row will only have the indexed column and the RID for the Clustered index.   So, if you ever need to get all of the PKs from that table, and nothing else, it will be faster to scan the non-clustered index than the clustered index. 

    For example:

    Select SomeColumn from ThisTable where PKThatTable in (Select PKThatTable from ThatTable)

    should be faster if there is a non clustered index on ThatTable.PKThatTable to scan.  I'm going out on a limb to say that *normally* you won't do this kind of query, but there may be cases where it has to happen.

    2. For performace reasons, it may be designed so that the duplicate index is created on a different filegroup from the data.  This technique can be used to increase performance.

    It is probably a good idea to check before you drop those indexes.

    jg

     

     

  • Good point. I will definately discuss these issues before doing anything.

  • Very cool.  Thanks for the education, Jeff.

    - Paul

    http://paulpaivasql.blogspot.com/

  • An interesting point about the indexes, but I doubt the optimiser would look at a secondary index defined on a clustered index ( where they are both identical in columns within the index ) without a hint.

    Worth testing though !!!

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • The optimizer should use the cheapest index without any hints.  It would find that the second index has a lower I/O cost (Because it has less data pages). 

    Now, if a query requires any values that are not in the index, it would probably not consider the NC index because it would have to follow that with a bookmark lookup.  In that case, the Clustered index would be used, if there were not any other suitable "covering" index.

    I just tried it on a table with a couple million rows, by adding a duplicate of a Clustered index on an identity key.  Checking the plan with and without the second index shows that the optimizer will chose the index without a hint, and that the difference is I/O cost.

    jg

     

  • I just tried this too on a table with 1 million rows, and whilst I agree that it chooses the NC index in preference to the clustered index, the estimated IO costs were identical between the two indexes, as were the estimated subtree costs.

    Running the query with STATISTICS IO ON, the results were also identical (3 logical reads in each case).

    Don't forget that the NC index will ALSO have the clustered key column present too (not a RID pointer), so it actually contains the same column twice.

    The comparative sizes of the indexes (and hence the IO costs) are only really significant if the optimizer had to scan them both completely, and it doesn't need to do that since it's doing an index seek.

  • I have to add this, as way of an apology to Jeff, having disagreed with him I have almost as described in sets of tables used in partitioned views - the Primary key has to include the partitioning column, in my case i have a two column PK where the partitioning column always holds the same value and the first column is unique, so I have a unique index on the first column of the PK - in the selects the optimiser chooses the secondary index over the clustered index

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply