Clustered indexes and index width

  • Short question: How relevant is index width when the index is clustered?

    Longer explanation: We have a couple of tables with several hundred million rows. There is a unique index on them. One of them has several extra columns, which were added for joins. Let me make up an example:

    BigTable

    Index columns: A, B, C, D, X, Y, Z

    MediumTable

    Index Columns: A, B, C, X, Y, Z

    On the big table, A, B, C, D make it unique. X, Y, and Z are commonly used in joining BigTable to MediumTable, which is done frequently. They were added to a non-clustered index so that the join could succeed without hitting the data table (as often) and while it took more space, it did improve join time. There are not there for use in the WHERE clause to access BigTable.

    However, we've decided to make the index on BigTable a clustered index. Experimentation shows we will save space and access I/O's (this is 98% a read-only query database). It will also let us do regular defragmentation on the table which is now impractical with it in a heap (and we're getting about 50% space utilization in the table itself from fragmentation).

    Here's the question -- how are index values stored and used in a cluster index. I understand of course that the last index node is also the data node. I understand that making the index wider may make it deeper due to the amount that fits in each tree node before it gets to the bottom. What I don't know:

    - Are the index fields duplicated in the bottom node? I.e. for a given row are the index columns in there once for the index, PLUS once for data? Or just once?

    - When doing a retrieval or join with a non-clustered index, it can often do an index-only retrieval getting some data (or pruning it down) without hitting the actual data row (or hitting it only once it is pruned down). Is there any benefit to this concept in a clustered index, less overhead, etc?

    I.e. as we convert from non-clustered to clustered should we drop those X, Y, Z fields from the index.

    I know this sounds confusing, the standard answer of "make indexes as narrow as possible" contradicts what I'm asking. But despite "simple" wisdon, widening an index is occasionally useful if it avoids hitting the data table (IF the index is not clustered). Is it useful if clustered?

  • In a clustered index, the values are there once. They ARE the data.

    I build my clustered indexes as light as possible as the data values as keys will be used to retrieve full rows in the case of bookmarked lookups. (which I do my best to eliminate to start with using covered indexes.)

    I believe in your case, I would create a clustered index as you determined, dropping the join fields, and a seperate covered index for use by the joins, including the clustered index fields, and see which one it uses. It would be a simple straightforward way of checking, and would give good indication of which direction would be best for you. I believe your queries would be able to benifit from the smaller clustered index, while your joins would not suffer by being able to still use the covered index. And although I could be wrong about this, I do not believe your queries would be too much worse off by the addition of an extra write by having this extra index.

    Oh, and though you've probably thought about it already, watch for those hot spots when choosing your clustering field(s). Your disk access patterns on the table will change by adding a clustered index where one wasn't before.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply