How do you reindex a very large database?

  • Jonathan Kehayias (1/5/2009)


    noeld (1/5/2009)


    I subscribe to the "FREQUENT REORGANIZE" idea... On my 24x7 servers, reindexing is banned!

    At some point I would expect that you have to do a rebuild to get it back in order. Even Microsoft recommends a rebuild once fragmentation climbs over 30%, as it is better for performance. This is why Enterprise Edition offers online index operations, so that the impact on 24x7 servers is minimized. If you never rebuild indexes, do you ever run DBCC CHECKDB against those servers? That is going to be more invasive than spreading index rebuilds out over time.

    I haven't run into the full reindex issue, but I have been defragging weekly. What are the symptoms of an index that needs to be rebuilt rather than defragmented? We don't see any performance problems (it's been about two years since we built the indexes). Maybe I'm missing something.

  • We have a large Oracle environment here as well. One thing we do in the multi-TB data warehouses is track index rebuilds and reorganizations in a table in the DBA schema. It contains a row for every index in the database, and a nightly process updates their fragmentation levels if they haven't been rebuilt or reorganized in the last X days. It then works through them in reverse order of size/fragmentation/impact. The impact column is one a DBA can set to a value from 1 to 10 to mark how important the index(es) are to actual functionality; for instance, every index on the base fact table is extremely important. The other thing tracked is the start/finish time of the last rebuild, so the job manager can determine whether it can complete a rebuild within the established rebuild window; if not, it picks another index that will fit in the remaining time. Granted, this is a large implementation, but it works really well to keep things straight.

    Wow. We are trying to accomplish the same thing using a fragmentation_history table, but never thought about an Impact column. I like that idea. I also like the intelligence you built into the job manager, checking the history table to see if a reindex can be performed within a given window of time. Do you generate a report indicating all indexes that were not rebuilt?

    Would you be willing to share some of the code/logic you implemented? I'm always looking to improve upon existing processes.

    Thanks, Dave

  • Jim Johnston (1/5/2009)


    Jonathan Kehayias (1/5/2009)


    noeld (1/5/2009)


    I subscribe to the "FREQUENT REORGANIZE" idea... On my 24x7 servers, reindexing is banned!

    At some point I would expect that you have to do a rebuild to get it back in order. Even Microsoft recommends a rebuild once fragmentation climbs over 30%, as it is better for performance. This is why Enterprise Edition offers online index operations, so that the impact on 24x7 servers is minimized. If you never rebuild indexes, do you ever run DBCC CHECKDB against those servers? That is going to be more invasive than spreading index rebuilds out over time.

    I haven't run into the full reindex issue, but I have been defragging weekly. What are the symptoms of an index that needs to be rebuilt rather than defragmented? We don't see any performance problems (it's been about two years since we built the indexes). Maybe I'm missing something.

    That is a really good question. I haven't ever encountered that kind of situation, because we rebuild anything over 30% fragmentation following Microsoft's recommendations. I'll find out, though. There may not be any benefit to a full rebuild if you keep them reorganized enough, but reorganizing a highly fragmented index (>50-60% fragmentation) won't be faster than a rebuild, at least not in my experience.
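    As a rough illustration of that guidance, something like the following (a minimal sketch, assuming SQL Server 2005 or later; the 5% and 30% cutoffs follow Microsoft's general advice, and ONLINE rebuilds require Enterprise Edition) will classify indexes into rebuild or reorganize candidates:

    -- Classify every index in the current database by fragmentation level.
    SELECT  OBJECT_NAME(ips.object_id)          AS table_name,
            i.name                              AS index_name,
            ips.avg_fragmentation_in_percent,
            CASE
                WHEN ips.avg_fragmentation_in_percent > 30 THEN 'REBUILD (ONLINE = ON where available)'
                WHEN ips.avg_fragmentation_in_percent > 5  THEN 'REORGANIZE'
                ELSE 'leave alone'
            END                                 AS suggested_action
    FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN    sys.indexes AS i
              ON  i.object_id = ips.object_id
              AND i.index_id  = ips.index_id
    WHERE   ips.index_id > 0                    -- skip heaps
    ORDER BY ips.avg_fragmentation_in_percent DESC;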

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • DBADave (1/5/2009)


    We have a large Oracle environment here as well. One thing we do in the multi-TB data warehouses is track index rebuilds and reorganizations in a table in the DBA schema. It contains a row for every index in the database, and a nightly process updates their fragmentation levels if they haven't been rebuilt or reorganized in the last X days. It then works through them in reverse order of size/fragmentation/impact. The impact column is one a DBA can set to a value from 1 to 10 to mark how important the index(es) are to actual functionality; for instance, every index on the base fact table is extremely important. The other thing tracked is the start/finish time of the last rebuild, so the job manager can determine whether it can complete a rebuild within the established rebuild window; if not, it picks another index that will fit in the remaining time. Granted, this is a large implementation, but it works really well to keep things straight.

    Wow. We are trying to accomplish the same thing using a fragmentation_history table, but never thought about an Impact column. I like that idea. I also like the intelligence you built into the job manager, checking the history table to see if a reindex can be performed within a given window of time. Do you generate a report indicating all indexes that were not rebuilt?

    Would you be willing to share some of the code/logic you implemented? I'm always looking to improve upon existing processes.

    Thanks, Dave

    If it were in SQL Server, I would gladly post it, but it is HP-UX (KSH scripting) and Oracle 10g PL/SQL. No report is really needed, because we can query the table at any point. There are always indexes that won't get rebuilt for a few days because the rebuild window isn't large enough for the engine to rebuild them all; it would take a week or more on the largest database, I would expect. Letting it run daily keeps things in good enough order that the performance impact of fragmentation is minimized.

    I guess I could rewrite the majority of the code in T-SQL tonight and post it if you are interested.
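    Roughly, the skeleton would look something like this in T-SQL (a minimal sketch only; the table and column names, the assumed four-hour window, and the duration estimate are all illustrative, not the production code):

    -- Hypothetical tracking table: one row per index, with a DBA-assigned
    -- impact score (1-10) plus fragmentation and rebuild-timing history.
    CREATE TABLE dbo.IndexMaintenanceLog
    (
        object_id          int      NOT NULL,
        index_id           int      NOT NULL,
        impact             tinyint  NOT NULL CONSTRAINT DF_IML_impact DEFAULT (5), -- 1-10, set by the DBA
        frag_percent       float    NULL,     -- refreshed by the nightly process
        page_count         bigint   NULL,
        last_rebuild_start datetime NULL,
        last_rebuild_end   datetime NULL,
        CONSTRAINT PK_IndexMaintenanceLog PRIMARY KEY (object_id, index_id)
    );

    -- Window-aware selection: pick the worst index (by impact, then
    -- fragmentation, then size) whose previous rebuild duration suggests
    -- it will finish before the maintenance window closes.
    DECLARE @window_end datetime;
    SET @window_end = DATEADD(HOUR, 4, GETDATE());   -- assumed four-hour window

    SELECT TOP (1) object_id, index_id
    FROM dbo.IndexMaintenanceLog
    WHERE last_rebuild_end IS NULL                   -- never rebuilt: no estimate yet
       OR DATEADD(MINUTE,
                  DATEDIFF(MINUTE, last_rebuild_start, last_rebuild_end),
                  GETDATE()) <= @window_end
    ORDER BY impact DESC, frag_percent DESC, page_count DESC;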

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • I don't want to waste your time by asking you to rewrite it in T-SQL, although I do appreciate the offer. I can figure it out, but I was just hoping to save development and testing time if you had a script or two to share.

    Thanks, Dave

  • It would make an interesting article to write up anyway. If I get it complete, I'll PM you the code; that way Steve can publish it as an article, if he thinks it is of sufficient quality, without it having been posted previously.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • I had a similar piece of code (for SQL) at my former company. I didn't take the code when I left, of course. I should probably rewrite and retest it sometime.

    Dave, read through this recent post from the Storage Engine team. It's on resolving IO bottlenecks in TempDB, but the techniques are applicable to any database. http://blogs.msdn.com/sqlserverstorageengine/archive/2009/01/05/tempdb-monitoring-and-troubleshooting-io-bottleneck.aspx

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I really appreciate this. Thanks. It's good to learn how other people approach maintenance.

    Dave

  • GilaMonster (1/5/2009)


    I had a similar piece of code (for SQL) at my former company. I didn't take the code when I left, of course. I should probably rewrite and retest it sometime.

    The above is why I blog code like that, or write it up as an article and publish it online, and only then put it into use, referencing the link to the blog post or article. It is possible to lose track of it online, but I have a distinct name and I am pretty good with Google, so I am sure I can find it. The link reference also lets me keep it up to date if I have to change anything along the way.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Jonathan Kehayias (1/5/2009)


    GilaMonster (1/5/2009)


    I had a similar piece of code (for SQL) at my former company. I didn't take the code when I left, of course. I should probably rewrite and retest it sometime.

    The above is why I blog code like that, or write it up as an article and publish it online, and only then put it into use, referencing the link to the blog post or article.

    The code I'm talking about was written and tested on company time. Hence it doesn't belong to me and I couldn't blog it or publish it. Anything I write on my own time is fair game for either.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    I guess I hadn't thought about it from a speed perspective, but more from an availability standpoint. Defrags don't prevent other processes from running the way a reindex of a clustered index does. I'm not in a situation where the window is so critical that a few extra CPU cycles will slow anything down to the point of causing performance issues while the defrag does its thing. The defrag seems to have a pretty small footprint, and stopping and restarting it has never caused any calamities for me the way a full reindex has.

    Now, if I am headed for a brick wall with this I would appreciate a heads-up, but it just seems to work well for my clients at this point: no outages, no angry calls/emails/pages so far. When I check the fragmentation levels of the indexes, they seem to be staying stable. I generally set a threshold of 20 percent, and that has worked for me so far. I don't know that any of the major tables have ever gone over 30 percent, so perhaps that has been my saving grace. When I put up a new server, or take on an existing one (since for some reason not everyone seems to be checking these things), I start by setting the threshold high and then bring it down over a period of time, so if an index is really whacked out we catch it a little at a time, starting with the most fragmented and moving down, roughly like the sketch below.
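    Something like this, greatly simplified (the 20 percent threshold is the one I mentioned; the cursor, the dynamic SQL, and the object names are just illustrative):

    -- Reorganize every index above the threshold, most fragmented first.
    -- REORGANIZE is online and can be stopped and restarted without harm.
    DECLARE @threshold float;
    SET @threshold = 20.0;   -- start higher on a new server and lower it over time

    DECLARE @sql nvarchar(max);
    DECLARE frag_cur CURSOR FAST_FORWARD FOR
        SELECT N'ALTER INDEX ' + QUOTENAME(i.name)
             + N' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(ips.object_id))
             + N'.' + QUOTENAME(OBJECT_NAME(ips.object_id)) + N' REORGANIZE;'
        FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
        JOIN sys.indexes AS i
               ON  i.object_id = ips.object_id
               AND i.index_id  = ips.index_id
        WHERE ips.avg_fragmentation_in_percent > @threshold
          AND ips.index_id > 0
        ORDER BY ips.avg_fragmentation_in_percent DESC;

    OPEN frag_cur;
    FETCH NEXT FROM frag_cur INTO @sql;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC sys.sp_executesql @sql;
        FETCH NEXT FROM frag_cur INTO @sql;
    END
    CLOSE frag_cur;
    DEALLOCATE frag_cur;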

  • DBADave (1/4/2009)


    The log drive is also showing poor performance, although not as bad, and most of it occurs during the transaction log backups that run every 15 minutes. The transaction log backups should not tax a SAN to the point of a disk bottleneck every 15 minutes.

    Thanks, Dave

    A simple thing to check, but sometimes overlooked: do you have 'Verify Backup Integrity' checked? This drives IO through the roof. I select this for full backups but not for log backups.

    The only meaningful verification of a backup is to perform a restore, e.g. once a week rebuild the Test instance from a Prod backup.
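    For reference, the maintenance plan option corresponds roughly to running RESTORE VERIFYONLY after the backup, which rereads the entire backup file (a sketch; the path and database name are hypothetical):

    -- CHECKSUM validates pages as the backup reads them.
    BACKUP DATABASE MyDB
        TO DISK = N'X:\Backups\MyDB_full.bak'
        WITH CHECKSUM;

    -- This is the extra IO the 'Verify Backup Integrity' option adds:
    -- the whole backup file is read back and checked.
    RESTORE VERIFYONLY
        FROM DISK = N'X:\Backups\MyDB_full.bak'
        WITH CHECKSUM;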

    Regards

    Karl

  • We don't use maintenance plans for backups; rather, we code our own backup routines in SQL jobs. I assume you are referring to the CHECKSUM option, which is not referenced by our backup scripts.

    Thanks

  • GilaMonster (1/5/2009)


    Jonathan Kehayias (1/5/2009)


    GilaMonster (1/5/2009)


    I had a similar piece of code (for SQL) at my former company. I didn't take the code when I left, of course. I should probably rewrite and retest it sometime.

    The above is why I blog code like that, or write it up as an article and publish it online, and only then put it into use, referencing the link to the blog post or article.

    The code I'm talking about was written and tested on company time. Hence it doesn't belong to me and I couldn't blog it or publish it. Anything I write on my own time is fair game for either.

    I guess I am blessed in that regard, since my company doesn't care about my maintenance code in that manner. It isn't considered proprietary, and I have the go-ahead from our CIO to share it online. The last company I worked for was the same way. Code for an application is a different story, as are database design and schema information.

    I do write a good bit more code at home than I do in the office, though. I never seem to have the time to actually get to it or complete it during the work day; between developers asking questions, general administration, and meetings, there just isn't enough time.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • You mentioned using MAXDOP = 1 for reindexing; what are your thoughts on using it for DBCC CHECKDB? I'm seeing more CXPACKET and PAGEIOLATCH_SH waits there than with reindexing, so I figured why not try it there as well. Apparently to do this I need to enable trace flag 2528, which comes with some warnings:

    Disables parallel checking of objects by DBCC CHECKDB, DBCC CHECKFILEGROUP, and DBCC CHECKTABLE. By default, the degree of parallelism is automatically determined by the query processor. The maximum degree of parallelism is configured just like that of parallel queries. For more information, see max degree of parallelism Option.

    Parallel DBCC should typically be left enabled. For DBCC CHECKDB, the query processor reevaluates and automatically adjusts parallelism with each table or batch of tables checked. Sometimes, checking may start while the server is almost idle. An administrator who knows that the load will increase before checking is complete may want to manually decrease or disable parallelism.

    However, disabling parallel checking can cause a decrease in overall database performance. Decreasing the degree of parallelism increases the amount of transaction log that must be scanned. This in turn increases the demand for tempdb space and causes a nonlinear increase in the time that is required for DBCC to complete its checks. If DBCC is run with the TABLOCK feature enabled and parallelism set off, tables may be locked for longer periods of time.

    I would only attempt this on our largest database(s), since running DBCC CHECKDB against the smaller databases is not an issue in terms of our maintenance window. Any thoughts?
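    If I do try it, I assume the mechanics would look something like this (the database name is hypothetical):

    -- Trace flag 2528 disables parallel checking instance-wide, so scope
    -- it to the CHECKDB run and turn it back off afterwards.
    DBCC TRACEON (2528, -1);   -- -1 = apply to all connections
    DBCC CHECKDB (N'MyLargeDB') WITH NO_INFOMSGS;
    DBCC TRACEOFF (2528, -1);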

    Dave
