• Good question.

    The advice I've always understood to be correct, and which I've found has been borne out in practice, is that if there have been significant changes in the values of the indexed columns, then it's a good idea to update the statistics. However, if you do this during business hours, it puts extra load onto the server immediately after the load created by dropping and re-creating the indexes, and that could potentially cause timeouts, although the index drop would be the major culprit.

    So, counter-intuitive as it seems, it might actually be a better idea to let the optimiser re-create the stats on the fly as and when it needs them. This would slow down each individual query the first time it runs, as the stats would have to be built before it could be optimised, but at least it would spread the load out over a period of time, rather than creating one potentially large, continuous load.

    But if you have the luxury of scheduling the re-index for out-of-business hours, then I'd definitely update the stats immediately after the re-index.
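
    For what it's worth, this is the sort of thing I'd run straight after an out-of-hours re-index (the table name here is just an example, obviously):

        UPDATE STATISTICS Orders WITH FULLSCAN

        -- or, to refresh the stats on every table in the database in one go:
        EXEC sp_updatestats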

    This got me thinking...

    The statistics need to be kept up-to-date because the query processor uses them to work out which index to use when executing a query. It does this by looking at all of the indexes that could potentially help it to improve the performance of the query, then picks those which give the best selectivity. It works out the selectivity by looking at the statistics which have been created for the index, and choosing those which return the fewest rows.
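
    You can look at the same statistics yourself if you're curious (table and index names are examples):

        DBCC SHOW_STATISTICS ('Orders', 'IX_Orders_CustomerID')

    This shows the density figures and the histogram the optimiser works from, along with the date the stats were last updated.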

    >READ THE NEXT PARAGRAPH WITH CAUTION : I AM NOT TOTALLY CERTAIN THAT THIS IS EXACTLY HOW THINGS WORK<

    I used to believe that the statistics were created immediately after you (re-)create an index, and would only be changed when you either re-built the index or explicitly updated the stats. However, I'm now under the impression that my belief was wrong, and that the statistics are not created until the point at which the query optimiser needs them. If the statistics don't exist, the optimiser creates them on-the-fly in order to assess which index is the best to use. However, if they already exist for an index, then it uses the existing ones, regardless of whether or not they are usefully up-to-date.

    The thing that makes me doubt the accuracy of this statement is that if you look at the syntax of the CREATE INDEX command, there's an option in there called STATISTICS_NORECOMPUTE.

    Using this in a CREATE INDEX ensures "that out-of-date index stats are not automatically recomputed" (Books Online, TSQL Reference, CREATE INDEX), which implies that if it IS omitted, the index will be created and the stats computed on it immediately afterwards.
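
    For reference, the option looks like this (index and table names made up):

        CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID)
        WITH STATISTICS_NORECOMPUTE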

    Mmmmm....

    >END OF CAUTION<

    Now, if your statistics don't reflect the 'selectivity-ness' of the data in the indexed column at least reasonably accurately, then it might cause an inappropriate index to be selected, or no index at all, when executing the query.

    Obviously this would slow the performance of your database. Trouble is, do any statistics exist for an index after you have dropped and re-created it, which can be updated using UPDATE STATISTICS? Answer is, I'm not certain.
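
    One way to find out for yourself is to ask SQL Server when the stats on an index were last updated (again, the names are examples; the second argument to STATS_DATE is the index id from sysindexes):

        SELECT STATS_DATE(OBJECT_ID('Orders'), 1)   -- 1 is usually the clustered index

        -- or list all the statistics on a table, index-based and auto-created:
        EXEC sp_helpstats 'Orders', 'ALL'

    If STATS_DATE returns NULL, no statistics exist for that index yet.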

    Therefore I'd say it's a good idea to update the stats if you believe that they no longer accurately reflect the data. Given that you are re-building the indexes, it's most likely a good idea for you to update the stats too (if they don't get re-created as part of your index re-build).

    However, if you have auto update statistics turned on for the database (Database Properties -> Options tab from Enterprise Manager, or run sp_dboption <database_name> from Query Analyser), then they should automatically get updated after a certain number of changes have occurred, without you having to worry about it. The downside is that this might happen when the database is at its busiest, putting even more load on an already heavily hit server.
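
    To check or change the setting from Query Analyser (the database name is an example):

        EXEC sp_dboption 'MyDatabase', 'auto update statistics'           -- check current setting
        EXEC sp_dboption 'MyDatabase', 'auto update statistics', 'true'   -- turn it on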

    In summary, like the index re-build, it would be best to do it out-of-hours on a very large database, as it will have a hit on the server's performance.

    How much of a hit depends on what percentage sample of the index you specify to read; as usual with database performance, this is a trade-off. The greater the sample, the more accurate your statistics will be, but the longer it will take to sample them, and therefore the greater the load on the server. However, the update stats load will be nowhere near as heavy as actually re-building the indexes, as it mainly involves reading data.
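
    For example (names made up again):

        -- read a fifth of the rows: quicker, less accurate
        UPDATE STATISTICS Orders IX_Orders_CustomerID WITH SAMPLE 20 PERCENT

        -- read every row: slower, most accurate
        UPDATE STATISTICS Orders IX_Orders_CustomerID WITH FULLSCAN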

    Finally, as Shas stated, it's also a good idea to run sp_recompile against your tables periodically. This is because the query plan which is held for a stored proc, and which is one of a number of factors which make stored procs faster than dynamic SQL, is built using index statistics. Although SQL Server will try to re-optimise on the fly if it thinks the plan is no longer the most efficient one, this obviously takes time, so it is good practice to run it out of business hours. You won't notice staggering improvements, but every bit helps.
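
    It's a one-liner per table (table name is an example):

        EXEC sp_recompile 'Orders'

    Note that this just marks any plans that reference the table for recompilation; the actual recompile happens the next time each proc runs, so the cost is spread out.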

    If you want to read up more on statistics, Books Online has some excellent help.

    There are also some useful articles at http://www.sql-server-performance.com/statistics.asp



    Jon