Reasons Against Clustered Index

  • Can anyone think of a valid reason against clustered index? I know there are several reasons why a table should have a clustered index. But there is one (and probably just one) good reason, for a table not to be defined with a clustered index. Can any one think of any?

    thanks.

  • The only valid reason I can think of is that the clustered index would need to be updated 10 000s times per minute (let's read extremely often here) and that the system cannot keep up with the updates. But even then I would simly suggest moving the clustered index on another unique column (either existing or identity) that would reduce the need to rearrange the indexes all the time.

    What reason did you have in mind?

  • Just thaught of another reason... I have a dts package that pumps the data from an error log every 2 minutes. The data is then spread across 4 other tables for relational integrity. The temp table doesn't need a clustered index to do its job... but even then I create one on the fly (because it failed if that field existed when doing the data pump) so that I can identity any failed step and log it accordingly.

  • Reasons against a clustered index:

    - Fragmentation. If the indexed column(s) are subject to updates this can cause page splits & fragmentation.

    - Size of non-clustered indexes. Every key value in the clustered index gets replicated into all the non-clustered indexes on the table. If the clustered index is built on several wide columns, you bloat the other indexes.

  • About the size... that would be a reason against a bad choice of Clustered index... not for a reason not to use it.

  • Many people seem to think that clustered indexes are "more expensive" to maintain than non-clustered indexes because they need to "rearrange" or reorganize the data frequently.  This is not the case.  A clustered index does not attempt to physically arrange the data on disk (something that would be outrageously expensive).  It maintains the ordering of the data via a double-linked-list, rather than by actually storing the data in a predefined order on disk.  It maintains a logical ordering rather than a physical one.

    It is true that if you define a clustered index on an IDENTITY column, for example, it can lead to I/O "hot spots" particularly in a heavy INSERT environment, but as you point out in a later post, that isn't an argument against clustered indexes, it is an argument to choose them more wisely.

    I'm not one of those people who insist that you should always have a clustered index on every table.  But you should certainly consider using one anytime you have a column, or columns, that are often searched based on ranges i.e. if your queries against them use <, >, or BETWEEN.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Thanks for your participation and sharing your thoughts on this topic. Let me add some context to my original question with an example. Consider this scenario: A very large table (few hundread million rows) with columns Col1, col2, col3, col4 and col5 (all integers). Data in this table is loaded once a year (entire table), updated throughout the year in large volumes, and archived / deleted at the end of the year. Only col4 and col5 are updated during the update operations. Primary key is defined on Col1, col2, col3. Update operations use either col1, col2 or col3 in the where clause at about the same number of times (so about 33% of time updates qualify with col1, 33% on col2 and 33% on col3). Do you suggest a clustered index or not. Why?

  • just my 2ct

    we always have a clustered index (even if on a identity or current-datetime column if no other suggested by the development team ! )  because we have noticed (with sql7) that when having no clix and heavy insert/delete rate, pagemanagement gets stuck and pages are not freed/reused as they should be.

    With a clustering index, we did not encounter this issue.

    Regarding your table questions :

    Keep this in mind :

    Your table having 5 int-columns (all 5 not null ?) : rowsize 20 bytes

    max rows / page : 403 (8060/20)

    500.000.000 rows = +- 1.241.000  pages

    1.241.000 pages = +- 9,7 Gb

    this all without the indexes !

    Your PK will take 12 bytes / row + some overhead.

    I'd calculate cardinality for each of the key-columns, and define the most filtering as clustered index.

    You may also have to play around with fillfactor and PAD_INDEX to avoid frequent reindex/indexdefrag.

    Once in a while capture the actual sql-load and have the index-wizard analyze your load and evaluate your existing indexes.

    Also have your data-upload and -maintenance applications take this clustering algorithm into consideration when adding new data ! This way you avoid unneeded page-splits.

    Didn't I read someone's statement like : There is more than one way to skin a cat 

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • http://qa.sqlservercentral.com/columnists/chedgate/clusterthatindex.asp

    http://qa.sqlservercentral.com/columnists/chedgate/clusterthatindexparttwo.asp

    http://qa.sqlservercentral.com/columnists/chedgate/clusterthatindexpart2.asp

    gives a reasonable discussion of that topic.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • On a table that you described I would use a clustered index, though not necessarily on the Primary Key... depending on how each column is most often referenced in the WHERE clauses, and how much variability there is in the values of each column, it may show to be more advantageous to define the clustered index on 1 or 2 columns only.

    Anyway, if each of the 3 columns tends to be used "alone" in the WHERE clause (not specifying the values of other 2 columns), there should be a separate index on each of the columns, (except on the column that is first in the composite clustered index)... because AFAIK if you use only the second or third column of a composite index in the WHERE clause, index is not used and it results in table scan. I can't tell, without knowing your data and most frequent queries, whether a composite or one-column clustered index would increase performance more. You'll have to test it... which is anyway the only reliable way how to find out, which indexes should be used :-))

  • - Range Scans and Inserts

    If your table contains a huge amout of columns your range scans can be very costly and in some cases Non Clustered indexes can be beneficial.

    (Inserting into a table without clustered index can be faster in some instances)

    Happy hunting! Hanslindgren

  • In this case, I am NOT using a clustered index, here is how I justify this:

    I add a non-clustered index on the primary key columns (col1, col2 and col3 for enforcing the PK constraint) and non clustered index on each of the columns used in the where clause (col4 and col5 mostly each of them is used by itself). Now when an update needs to happen, using one of the where clause columns the non-clustered index on that column would point to the actual rows in the heap (non - clustered table). So there is no intermediate step of looking up the clustered index value and then accessing the actual row, so this performs better. A clustered index would have been required if the inserts and updates resulted in page splits leading to fragmentation. Since, Without a clustered index, I would have to recreate all the non-clustered indexes every time I did reorg. But in this particular case, there are no inserts or deleted and even the updates are on integer fields, so chance of page splits or row migrations during updates are not possible (are next to nothing). I think this is a tailor made case against clustered index. I am a big proponent of clustered indexes, but my testing in this case confirmed that there is scenario where clustered index is non only more space consuming but will hurt the performance too. I appreciate your feedback.

    thanks.

  • However ... you can't specify a FillFactor on a heap.

    Since you only Update throughout the year, and since those updates are on int columns that don't cause page splitting, you could have this table at 100% FillFactor.

    Only a clustered index (created after the initial data load) will allow you to have 100% Fillactor on the data pages, and the subsequent reduction in page I/O for all Update/Select operations throughout the year.

     

     

  • PW:

    That is a good point. I guess I can make my PK index as a clustered index at 100% fill factor and then drop the clustered index. I figured, since I am not adding / deleting data or modifying the PK columns, I do not need any explicit PK constraint / indexes.

     

  • You actually have a covered index which can often perform equally with a clustered index, you might want to examine your i/o count however, often using a nc index will use 1 extra i/o per operation - this is a generalisation of course and any number of scenarios can be generated to either prove or disprove this statement < grin >  Anyway you might want to check - this is the only situation where I have changed a nc index to a clustered index - oh and where I might want to remove fragmentation 'cos you can't without a clustered index.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply