re-index

  • just a quick question - I am using the command DBCC DBREINDEX (tablename, '', 80) to re-index some of the larger tables in the system.

    I know that if you drop and re-create an index on a table, you'd run UPDATE STATISTICS afterwards, so that the changes to the index are reflected in the table's statistics.

    So my question is: after I run DBCC DBREINDEX, should I run UPDATE STATISTICS on that table immediately?

    THANKS A LOT!

  • I prefer to do UPDATE STATS and recompile the stored procs with sp_recompile after the index rebuild
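
    A minimal sketch of that routine - the table name is just a made-up example, so substitute your own - would be something like:

    -- rebuild all indexes on the table at an 80% fill factor
    DBCC DBREINDEX ('dbo.Orders', '', 80)
    -- refresh the statistics on the table
    UPDATE STATISTICS dbo.Orders
    -- mark stored procs referencing the table for recompilation next time they run
    EXEC sp_recompile 'dbo.Orders'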

    Shas3

  • Good question.

    The advice I've always understood to be correct, and which I've found has been borne out in practice, is that if there have been significant changes in the values of the indexed columns, then it's a good idea to update the statistics. However, if you do this in business hours, it would put extra load onto the server immediately after the load created by dropping and re-creating the indexes. And that could potentially cause timeouts, although the index drop would be the major culprit.

    So, counter-intuitive as it seems, it might actually be a better idea to let the optimiser re-create the stats on the fly as and when it needs them. This would slow down each individual query the first time it runs, as the stats would need to be created as it ran, but at least it would spread the load out over a period of time, rather than creating one potentially large, continuous load.

    But if you have the luxury of scheduling the re-index for out-of-business hours, then I'd definitely update the stats immediately after the re-index.

    This got me thinking...

    The statistics need to be kept up-to-date because the query processor uses them to work out which index to use when executing a query. It does this by looking at all of the indexes that could potentially help it to improve the performance of a query, then picks those which give the best selectivity. It works out the selectivity by looking at the statistics which have been created for the index, and choosing those which return the fewest number of rows.
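
    Incidentally, if you want to see the figures the optimiser is working from - when the stats were last updated, how many rows were sampled, the density and the histogram - DBCC SHOW_STATISTICS will display them for a given index. The table and index names below are just made-up examples:

    -- show the statistics held for one index
    DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID')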

    >READ THE NEXT PARAGRAPH WITH CAUTION : I AM NOT TOTALLY CERTAIN THAT THIS IS EXACTLY HOW THINGS WORK<

    I used to believe that the statistics were created immediately after you (re-)create an index, and would only be changed when you either re-built the index or explicitly updated the stats. However, I'm now under the impression that my belief was wrong and that the statistics are not created until the point at which the query optimiser needs them. If the statistics don't exist, the optimiser creates them on-the-fly, in order to assess which index is the best to use. However, if they already exist for an index, then it uses the existing ones, regardless of whether or not they are usefully up-to-date.

    The thing that makes me doubt the accuracy of this statement is that if you look at the syntax of the CREATE INDEX command, there's an option in there called STATISTICS_NORECOMPUTE.

    Using this in a CREATE INDEX ensures "that out-of-date index stats are not automatically recomputed." (Books Online, TSQL Reference, CREATE INDEX)

    Which implies that if it IS omitted, then the index will be created, and the stats re-computed on it immediately afterwards.
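
    For reference, the option just gets tacked onto the CREATE INDEX statement, something along these lines (made-up table and index names again):

    -- stop SQL Server automatically recomputing out-of-date stats for this index
    CREATE INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    WITH STATISTICS_NORECOMPUTE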

    Mmmmm....

    >END OF CAUTION<

    Now, if your statistics don't reflect the 'selectivity-ness' of the data in the indexed column at least reasonably accurately, then it might cause an inappropriate index to be selected, or no index at all, when executing the query. Obviously this would slow the performance of your database. Trouble is, do any statistics exist for an index after you have dropped and re-created it, which can be updated using UPDATE STATISTICS? Answer is, I'm not certain.

    Therefore I'd say it's a good idea to update the stats if you believe that the indexes do not accurately reflect the data. Given that you are re-building the indexes, it's most likely a good idea for you to update the stats too (if they don't get re-created with your index re-build).

    However, if you have auto update statistics turned on for the database (it's on by default; check the Database Properties -> Options tab in Enterprise Manager, or run sp_dboption <database_name> from Query Analyser), then they should automatically get updated after a certain number of changes have occurred, without you having to worry about it. The downside is that this might happen when the database is at its busiest, putting even more load on an already heavily hit server.
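
    As a rough sketch, substituting your own database name, you can check and change the setting from Query Analyser like this:

    -- show the current setting
    EXEC sp_dboption 'MyDatabase', 'auto update statistics'
    -- turn it on
    EXEC sp_dboption 'MyDatabase', 'auto update statistics', 'true'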

    In summary, like the index re-build, it would be best to do it out-of-hours on a very large database, as it will have a hit on the server's performance.

    How much of a hit depends on what percentage sample of the index you specify to read; as usual with database performance, this is a trade-off - the greater the sample, the more accurate your statistics will be, but the longer it will take to sample them, and therefore the greater the load on the server. However, the update stats load will be nowhere near as heavy as actually re-building the indexes, as it mainly involves reading data.
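
    To illustrate the trade-off (again with a made-up table name), the sample size is just an option on UPDATE STATISTICS:

    -- quicker, less accurate: read a 20 percent sample
    UPDATE STATISTICS dbo.Orders WITH SAMPLE 20 PERCENT
    -- slower, most accurate: read every row
    UPDATE STATISTICS dbo.Orders WITH FULLSCAN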

    Finally, as Shas stated, it's also a good idea to run sp_recompile against your tables periodically. This is because the query plan which is held for a stored proc, and which is one of a number of factors which make stored procs faster than dynamic SQL, uses index statistics to improve the performance of the stored proc. Although SQL Server will try to re-optimise on the fly if it thinks that the plan is not the most efficient one, this obviously takes time, so it is good practice to run it out of business hours. You won't notice staggering improvements, but every bit helps.

    If you want to read up more on statistics, Books Online has some excellent help.

    There are also some useful, informative articles at http://www.sql-server-performance.com/statistics.asp

    Jon

  • To quote:

    "SQL Server knows statistical information about your data through a special object in the database (which I refer to as the statistics blob). The statistics blob contains information about how the data is distributed throughout the table and also calculates the average number of duplicates for the indexed columns. When you create an index, if the table already contains data, a statistics blob is created and populated with information about the columns that you've indexed. If you create an index on a table and then add the data (and you haven't turned off auto gathering of statistics), a statistics blob is created for you. If you turn off SQL Server's capability to automatically keep updated statistics about an index, this statistics blob isn't created. If this information doesn't exist, SQL Server must guess about how your data is distributed. When your query contains a clause, such as col1 < 3, SQL Server consults the statistics blob to guess

    how many data rows will be returned. The more current your statistics blob, the

    better job SQL Server does selecting the proper index (or deciding to use a table

    scan)." (Sams Teach Yourself Microsoft SQL Server 2000 in 21 Days)

    Would it be worthwhile running

    select stats_date(table_id, index_id)

    after doing the DBCC DBREINDEX, to check whether the stats have been updated? Since DBCC DBREINDEX doesn't offer an option to update them, that would show whether SQL Server does the update as part of the re-index, or whether it waits until the next query is run.
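
    Something like this, perhaps - the table name is made up, and it reads the SQL 2000 sysindexes system table, so treat it as a rough sketch:

    -- when were the stats for each index on the table last updated?
    SELECT name, STATS_DATE(id, indid) AS stats_last_updated
    FROM sysindexes
    WHERE id = OBJECT_ID('dbo.Orders')
    AND indid BETWEEN 1 AND 254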

  • What are the criteria SQL Server uses when deciding to auto update statistics? And what sample does it use - presumably not 100%?

    Andy

  • Andy,

    According to the article at http://www.sql-server-performance.com/statistics.asp :

    "If the number of rows in a table are greater than 6, but less than or equal to 500, then statistics are automatically updated when there have been 500 modifications made.

    If the number of rows in the table are greater than 500, then updates are automatically made when (500 plus 20 percent of the number of rows in the table) have been modified."

    Not sure about the formula for working out the sample size, but almost certain it's not 100%, at least not on very large tables.
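
    If you want to see how close a table is getting to that threshold, the rowmodctr column in sysindexes keeps a count of the modifications made since the statistics were last updated. A rough check (table name is just an example) would be:

    -- rows in each index and the number of modifications since the stats were last updated
    SELECT name, rowcnt, rowmodctr
    FROM sysindexes
    WHERE id = OBJECT_ID('dbo.Orders')
    AND indid BETWEEN 1 AND 254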

    Jon
