Indexes with more writes than reads

  • Ah. Ok. Now I understand. Thanks for the feedback.

    Yes... Your last paragraph above is indeed a problem. It would appear that someone knew of the performance gains of using a clustered index for SELECTS. They may have even added the ID column because they also know the value of having a unique clustered index and adding the ID column as the second column would allow it to be unique.

    Unfortunately, and I'm sure that I'm preaching to the choir but it would appear that whomever designed the indexes had no clue as to the other attributes that a clustered index on a large volume insert-able table should follow not the least of which is the "ever increasing" attribute. And the table is NOT partitioned, correct?

    Other than the ID column and the current clustered index, are there any other unique indexes on this table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff,

    This table does not have any unique indexes. The tables that have a unique non-clustered are few and far between. The table is not partitioned, and neither are any other tables. Even when the clustered index is an INT type it is not a unique index, the table definition allows NULLs, and a NULL value is often being inserted by the app.

    In the case of one table I just looked at, there are just over 15 million rows and just over 35,000 of those records have a NULL for the column that is the clustered index. There is an IDENTITY column as the non-clustered PK. This situation is the norm in this database. I have been switching these out to make the PK into a clustered index and make the current clustered index, which is full of NULLs, to be a non-unique non-clustered. This gives us DML operations on a new clustered PK that is ever-increasing and reduces expensive page splits on what I think are poorly chosen, former clustered indexes. What are your thoughts on this approach?

  • I usually try to change my PK to nonclustered if I know that there are no reads against it. Then I can use my available clustered index on something useful.

    edit: sorry, I see Gila Monster said much the same

  • lmarkum (7/20/2015)


    Jeff,

    This table does not have any unique indexes. The tables that have a unique non-clustered are few and far between. The table is not partitioned, and neither are any other tables. Even when the clustered index is an INT type it is not a unique index, the table definition allows NULLs, and a NULL value is often being inserted by the app.

    In the case of one table I just looked at, there are just over 15 million rows and just over 35,000 of those records have a NULL for the column that is the clustered index. There is an IDENTITY column as the non-clustered PK. This situation is the norm in this database. I have been switching these out to make the PK into a clustered index and make the current clustered index, which is full of NULLs, to be a non-unique non-clustered. This gives us DML operations on a new clustered PK that is ever-increasing and reduces expensive page splits on what I think are poorly chosen, former clustered indexes. What are your thoughts on this approach?

    You seem to be the kind of person to actually think about this and know about these things. To me, that probably means that you've really done your homework in this (you even checked for NULLs which a lot of people totally miss before making a move). With that in mind, it sounds like a decent plan and it'll certainly kill the page split problem that you currently have. It will also open the door to possibly greatly reduced clustered index maintenance because of the new order to insertions (proverbial end of the table).

    I will say (before someone else says it) that having such a clustered index will sometimes make a bit of a "hot spot" at the end of the table but I'll also remind folks that such a hot spot is usually a minor annoyance compared to page splits and massive fragmentation cause by a clustered index that's not based on columns with ever-increasing values.

    Bottom line is, sounds right enough to me where I'd do the same thing if the problem were mine.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff, Thanks for your time on the forum and with my question. Just the feedback I am looking for to either confirm I am thinking correctly or steer me in a different direction. Much appreciated.

  • IMO - it is rare to "over-index" a table - I almost always find the exact opposite. However, things I look for when I see many indexes:

    1. Are there any indexes that are exact duplicates of each other, including ones that exactly duplicate the primary key? Duplicate indexes are likely candidates to remove.

    2. Are there any that include all (or nearly all) columns of the source table (don't laugh - I have found indexes that are ), or include large varchar columns ([n]varchar(500) or larger)? Indexes on large varchar columns are rarely useful in my experience.

    That said, all the comments about "beware" are very valid. There may be monthly/quarterly/annual reports that run with some queries that need the extra indexes to avoid huge table scans.

    If the problem is locking with some long-running queries, are you looking at why the queries are taking so long? Index maintenance is a relatively fixed overhead - I would be questioning why long queries are blocking the transactions. Are these queries that need to use committed reads (or worse, repeatable reads), or could they safely use "nolock" hints to avoid locking the transaction tables?

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply