altering huge table - your suggestions

  • Hi all,

    I have an audit table with +- 70 mil rows.

    I am playing around with the indexes to see if I can speed up some queries.

    One of the scenarios I would like to try is removing the clustered index from the PK, identity column(int) and placing it on a datetime column that ALWAYS gets used in the queries.

    Firstly, do you think I am wasting my time with this excersize?

    If not, what's the quickest/most efficient way of dropping the column, and adding it back as PK + identity?(but without clustered index) Or is there a way of removing the clustered index without dropping the column?

    Thanks!

     



    What's this "backup strategy" everyone is on about?

  • First off, you do not have to drop the column to drop the clustered index.  Since it is a clustered index /PK, use ALTER TABLE MyTable DROP CONSTRAINT PK_MyTable.  This will drop the Primary Key constraint along with the clustered index.  As far as building a new clustered index on your date column, you will see a performance increase when querying that table by date.  Depending on the number of audits you are inserting in your table, you could be creating an insert hotspot but you would have had one anyway with your clustered index on an identity column.  Keep in mind that moving your clustered index to the date column will force SQL Server to physically reorder the data in your table.  With 70 M rows, this will take some time. 

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thanks for the help John!

    Some more help on your terminology please

    "Depending on the number of audits you are inserting in your table, you could be creating an insert hotspot but you would have had one anyway with your clustered index on an identity column."

    Re insert hotspot:

    Are you referring to the possible decrease in performance I could recieve during periods of high number of inserts?

    Thanks



    What's this "backup strategy" everyone is on about?

  • Thinus,

    Are you actually moving your PK to the date column, or just the clustered index (leaving a NON-clustered PK on the identity field)? If you are putting the PK on the date field, just be sure that there is no way that two records can be added at the same time.

    I'd be tempted to leave my PK on the identity column (as a NON-clustered index), and simply put a clustered index on the date column. This will definitely improve your performance, BTW.

    John's right to warn you about potential hotspots. Do the values in this date column increase sequentially, or are they more random than that? If they do NOT increase sequentially, then you will still have a problem if you need to insert large numbers of records in a hurry. The reason for this is that because the entire table would then be sorted by date, if a row needs to be inserted which would place it somewhere in the middle of the table, if there is no free space on the target page then a page-split operation will be required. This isn't terminal, but it is a performance hit, and if a large number of pagesplits are required then the insert performance will suffer.

    To resolve this, you will need to specify a larger than normal FILLFACTOR (free space left on the data pages when the index is built that will accomodate new rows later on). Once you've used up the intial fillfactor space, the only way to get it back is to rebuild the clustered index again.

    If this is your situation (if your date field values are added in random order and not sequential order), the you will need to perform regular rebuilds of the clustered index to maintain performance. Something to bear in mind!

    Phil

  • Hi Phil,

    Thanks for the easy to understand advice

    I tested the following configuration as per John/Phil's suggestions/advice:

    Leaving the PK on the identity column with a non-clustered index.

    Because the date column will be inserted sequentially(most of the time), I created the clustered index on the datecolumn with a fairly high fillfactor(90)

    The reduction in query execution time was AMAZING!

    Here are the results of the 2 most used queries:

    From an average of 68 seconds to 7 seconds

    From an average of 58 seconds to less than 1!

    Thanks guys!



    What's this "backup strategy" everyone is on about?

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply