Out-of-date statistics

  • I was recently alerted to some timeouts occurring in our main application. I started a trace and discovered a stored proc that was causing the problem. None of the indexes on the related tables had fragmentation above 3%, so I started looking at statistics.

    I updated stats on one table at a time as follows:

    UPDATE STATISTICS myTableName WITH FULLSCAN, ALL

    After each update, I would try to reproduce the timeout error. It was after updating stats on the 4th table that the timeout error went away. I theorized that a large number of records were involved in some DML operations sometime after stats had last been updated for that table. Here is what I found:

    Total number of records in table: 1.487 million

    DateTime of last stats update before the timeouts: 03/16/2009 6:00am

    Number of records inserted/updated since last stats update: 176

    Number of records deleted since last stats update: 0

    So... DML operations on roughly a hundredth of a percent of the table (176 of 1.487 million rows, about 0.012%) caused the statistics to be out of date. Is this a fair assessment? If statistics are out of date and causing poor performance, how would I know? Right now, the only indication is the influx of phone calls from the end users.

    Ideally, I'd like to have a SQL job run periodically throughout the day and update stats on an as-needed basis.

  • A possible sign that your statistics are out of date is if the actual and estimated row counts in the execution plans are very different.
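
    One way to surface those two counts outside of the graphical plan (a minimal illustration; myStoredProc is a placeholder name) is:

    SET STATISTICS PROFILE ON;
    EXEC myStoredProc;   -- the proc you suspect; placeholder name
    SET STATISTICS PROFILE OFF;

    Each statement then returns its plan with Rows (actual) and EstimateRows columns per operator; a large gap between the two suggests stale statistics.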

    I expect that you know, but in case you don't, there is an Auto Update statistics option for each of your databases which should normally be set to True.
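
    To check or set the option programmatically (SQL 2005 catalog view; MyDatabase is a placeholder):

    SELECT name, is_auto_update_stats_on, is_auto_update_stats_async_on
    FROM sys.databases
    WHERE name = 'MyDatabase';

    ALTER DATABASE MyDatabase SET AUTO_UPDATE_STATISTICS ON;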

    David

  • David (3/16/2009)


    A possible sign that your statistics are out of date is if the actual and estimated row counts in the execution plans are very different.

    I expect that you know, but in case you don't, there is an Auto Update statistics option for each of your databases which should normally be set to True.

    David

    Comparing actual and estimated row counts is fine when I'm sitting at my desk using SSMS. I need a solution that can be automated, though...

    Yes, Auto Update Statistics is enabled/true for the database. (It's done asynchronously on the db.)

  • We've found that SQL2k5 seems to be more sensitive to variations in statistics, in that bad (i.e. slow) query plans are more likely to be chosen than when the statistics are "fresh".

    The system in question has very skewed data (the first element in the clustered index has 2,000 different values, but 60% of rows have one value and over 99% have one of 4 values) and fields in all indices, including the clustered index, are updated relatively frequently. In the largest tables (there are around a dozen tables of 20-400 million rows) the number of rows increases by only ~0.5% per month and ~0.2% of rows in the clustered indices are updated, but if the stats are not updated weekly then by the third or fourth week performance starts to degrade significantly (NB they're vendor-supplied indices that we're contractually not allowed to change).
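
    (That kind of skew shows up directly in the statistics histogram, e.g. with a placeholder index name:

    DBCC SHOW_STATISTICS ('dbo.MyBigTable', 'IX_Clustered') WITH HISTOGRAM;

    a handful of RANGE_HI_KEY values carrying nearly all of the EQ_ROWS is exactly the 60%/99% pattern described above.)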

    The update stats job that should already be in your maintenance plan is critical to the performance of your applications, possibly more so than the rebuild indices job, and if you're not running it at least weekly then I'd make that change as soon as practical (more often is probably not necessary, depending on the system).
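
    The simplest form of such a job step, for reference (sp_updatestats is built in and only touches statistics whose underlying rows have changed; a sketch, not a complete maintenance plan):

    EXEC sp_updatestats;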

  • Do not turn Auto Update stats ON; it does affect your database performance. I would prefer running a job to update statistics now and then, after analyzing the data.

  • The update stats job that should already be in your maintenance plan is critical to the performance of your applications, possibly more so than the rebuild indices job

    This has always been my experience as well!


    * Noel

  • The update stats job that should already be in your maintenance plan is critical to the performance of your applications, possibly more so than the rebuild indices job, and if you're not running it at least weekly then I'd make that change as soon as practical (more often is probably not necessary, depending on the system).

    I already have a SQL job that updates stats and rebuilds indexes once per day. No worries there.

    What I really need is to be able to programmatically determine which particular statistics are out of date. Then I can automate the process of updating stats as needed throughout the day.

    In my production environment, updating stats is sometimes needed more than once per day. Obviously, I could schedule my SQL job to run multiple times throughout the day, but I don't want to take a performance hit for unnecessary rebuilds.
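
    A starting point for doing that programmatically (a sketch using SQL 2005's sys.stats and STATS_DATE; it shows when each statistic was last updated, though not how much the data has drifted since):

    SELECT OBJECT_NAME(s.object_id) AS table_name,
           s.name AS stats_name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
    ORDER BY last_updated;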

  • Krishna (3/16/2009)


    Do not put Autoupdate stats ON, it does affect your database performance. I would prefer running a job to update statistics now and then after analyzing the data.

    AFAIK, disabling AutoUpdate of Statistics is bad advice left over from the SQL 6.5/7.0 days;

    updated statistics are critical, and allowing AutoUpdate=true actually has a minimal impact in 2000/2005.

    Statistics get auto-updated once roughly 20 percent of the rows in the table (plus a 500-row floor) have been inserted/updated... for a million-row table that's about 200,500 modifications, which takes a long time to reach, and that's why running statistics updates on big tables more often than autoupdate would is often beneficial.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • Hi Dave,

    Can you tell me how you get the information about data inserted/updated/deleted since the last statistics update... is there any DMV or table to look at?

  • Joseph (3/17/2009)


    Hi Dave,

    Can you tell me how you get the information about data inserted/updated/deleted since the last statistics update... is there any DMV or table to look at?

    Hi Joseph,

    • Deletes are easy: there are none. (Each table has a [Deleted] BIT column, so rows are flagged rather than physically removed.)

    • Each table has a LastChangedDate DATETIME column (it defaults to GETDATE()). Every time a row is updated, LastChangedDate is set to GETDATE(). In addition, each table has an update trigger that inserts the entirety of the logical deleted table into a corresponding history table: table dbo.MyAppTable, for example, has an update trigger that inserts into dbo.MyAppTableHistory. Collectively, these mechanisms allow me to determine the number of inserts/updates since the last stats update, as sketched below.
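
    A sketch of how those pieces combine (table/column names from the example above; STATS_DATE is the SQL 2005 function):

    -- when were statistics on the table last updated?
    DECLARE @LastStatsUpdate DATETIME;
    SELECT @LastStatsUpdate = MIN(STATS_DATE(object_id, stats_id))
    FROM sys.stats
    WHERE object_id = OBJECT_ID('dbo.MyAppTable');

    -- rows inserted or updated since then (each row counted once;
    -- dbo.MyAppTableHistory holds the per-update detail)
    SELECT COUNT(*) AS InsertedOrUpdated
    FROM dbo.MyAppTable
    WHERE LastChangedDate > @LastStatsUpdate;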

  • OK, I put together a script I thought would identify statistics that might need to be updated...

    but when I run the script's output, a few of the tables remain in the next pass, usually because the number of rows modified is the same as the total number of rows.

    results look something like this:

    [font="Courier New"]Db  SN  TABLE Name        Mod      Tot    %Mod    STATISTICS SQL

    KHC dbo CENSUSTRACTS      65693  65693  100.00    UPDATE STATISTICS dbo.CENSUSTRACTS       --100% No real Stats Rule

    KHC dbo CMENTITY              2   1740    0.11    

    KHC dbo CMENTITY              3   1740    0.17    

    KHC dbo Foreign_Key_Table     1      1  100.00    UPDATE STATISTICS dbo.Foreign_Key_Table  --20% Small Table Rule

    KHC dbo GMAACCP_OLD          29  10113    0.29    

    KHC dbo GMAACCP_OLD          58  10113    0.57    

    KHC dbo GMACTATTRIBUTES    4427   4292  100.00    UPDATE STATISTICS dbo.GMACTATTRIBUTES    --100% No real Stats Rule

    KHC dbo GMACTMEASURES_OLD 10636   5318  100.00    UPDATE STATISTICS dbo.GMACTMEASURES_OLD  --100% No real Stats Rule[/font]

    Here's my script as it stands:

    --The Analysis: my crappy assumptions:
    --tables under 1000 rows, I'll leave them at 20%
    --tables with more rows than that, I'll use an arbitrary sliding-scale formula,
    --to be modified based on analysis
    SELECT X.*,
           ISNULL(CASE
                    WHEN X.[Total Rows] <= 1000
                    THEN CASE WHEN [Percent Modified] >= 20.0
                              THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' --20% Small Table Rule'
                         END
                    WHEN [Percent Modified] = 100.00
                    THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' --100% No real Stats Rule'
                    --WHEN X.[Rows Modified] > 1000
                    --THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' --1000 Rows Modified Rule'
                    ELSE CASE
                           WHEN X.[Total Rows] > 1000000000 --billion rows
                           THEN CASE WHEN [Percent Modified] > 0.1
                                     THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' -- 1B Big Table Rule'
                                END
                           WHEN X.[Total Rows] > 100000000 --hundred million rows
                           THEN CASE WHEN [Percent Modified] > 1.0
                                     THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' -- 100M Big Table Rule'
                                END
                           WHEN X.[Total Rows] > 10000000 --ten million rows
                           THEN CASE WHEN [Percent Modified] > 2.0
                                     THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' -- 10M Big Table Rule'
                                END
                           WHEN X.[Total Rows] > 1000000 --million rows
                           THEN CASE WHEN [Percent Modified] > 5.0
                                     THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' -- 1M Big Table Rule'
                                END
                           WHEN X.[Total Rows] > 100000 --hundred thousand rows
                           THEN CASE WHEN [Percent Modified] > 10.0
                                     THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' -- 100K Big Table Rule'
                                END
                           WHEN X.[Total Rows] > 10000 --ten thousand rows
                           THEN CASE WHEN [Percent Modified] > 20.0
                                     THEN 'UPDATE STATISTICS ' + [Schema Name] + '.' + [Table Name] + ' -- 10K Big Table Rule'
                                END
                         END
                  END, '') AS [Statistics SQL]
    FROM (SELECT DISTINCT
                 DB_NAME() AS [Database],
                 S.name AS [Schema Name],
                 T.name AS [Table Name],
                 I.rowmodctr AS [Rows Modified],
                 P.rows AS [Total Rows],
                 CASE WHEN I.rowmodctr > P.rows
                      THEN 100
                      ELSE CONVERT(decimal(8,2), (I.rowmodctr * 1.0) / P.rows * 100.0)
                 END AS [Percent Modified]
          FROM sys.partitions P
               INNER JOIN sys.tables T ON P.object_id = T.object_id
               INNER JOIN sys.schemas S ON T.schema_id = S.schema_id
               INNER JOIN sysindexes I ON P.object_id = I.id
                      AND I.indid IN (0, 1) --heap/clustered row only; without this the join
                                            --returns one row per nonclustered index (the
                                            --duplicate CMENTITY/GMAACCP_OLD rows above)
          WHERE P.index_id IN (0, 1)
            AND I.rowmodctr > 0
         ) X

    Lowell



  • I think we can get the combined number of rows inserted/updated/deleted from the 'rowmodctr' column of the sys.sysindexes table, for example:
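
    (A sketch; sys.sysindexes is a deprecated compatibility view in 2005 but still works, and indid 0 or 1 is the heap/clustered-index row that tracks table-level modifications.)

    SELECT OBJECT_NAME(id) AS table_name, rowmodctr
    FROM sys.sysindexes
    WHERE indid IN (0, 1)
      AND OBJECTPROPERTY(id, 'IsUserTable') = 1
    ORDER BY rowmodctr DESC;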
