What are the Potential Dangers of Too Many Statistics?

  • I'm busy creating keys and indices for a database that had no keys and basically no indices. I'm seeing a remarkable improvement so far, so I ran seven queries through the Database Tuning Advisor to see what it suggested. It suggested an imperial shipload worth of statistics to be created on some of the tables, especially the main table which contains the historical data.

    A potential danger that I see is that SQL Server will have much more work to do whenever then data is changed or selected. However, how much work is involved in updating indices if AUTO_UPDATE_STATISTICS is set to ON?

    More information:

    Concerning the usage of the data, reports might not be run at all for a week, but then at month end there are approximately 200k records inserted across several days in batches of approximately 15k each and then a bunch of reports are run.

    There's an interesting (IMO) article by Josef Richberg on statistics and indices titled "Index Primer - Just what statistics are kept?" here.

    (Why doesn't the forum automatically underline hyperlinks, as is the norm almost everywhere else on the Internet?)

  • There is another interesting article by Gail Shaw:

    http://qa.sqlservercentral.com/articles/Indexing/68439/

    The section on Index Limits may be of interest to you.

    BrainDonor.

    Steve Hall
    Linkedin
    Blog Site

  • Thanks BrainDonor. I've read it, but I'll read it again.

  • Well, if you have autocreate statistics on, I'm going to guess that SQL Server will end up creating most of the statistics recommended by DTA.

    In my opinion, indexing is more an art than a science. If your database is more of a decision support database (Reporting) than OLTP than adding indexes usually is a good thing. I would also bet that if you create covering indexes then DTA will not recommend as many statistics.

    According to Kimberly Tripp's blog post:

    Statistics are traditionally updated when roughly 20% (+ a minimum of 500 rows) of the data has changed. If you have a table with 100,000 rows then SQL Server will hit the threshold at 20,500 rows (just a tad over 20%) and a table with 10,000 rows will hit the threshold at 2,500 rows (which is 25%). Additionally, in SQL Server 200x (2000, 2005 and 2008) statistics are NOT immediately updated when the threshold is reached, instead they are invalidated. It's not until someone needs the statistic that SQL Server updates it.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • GDI Lord (11/10/2009)


    It suggested an imperial shipload worth of statistics to be created on some of the tables, especially the main table which contains the historical data.

    Don't trust DTA. It's way too fond of recommending loads and loads of statistics (and loads of unnecessarily wide indexes). Test its recommendations out carefully, if you don't see a performance improvement from adding something, then don't add it in production.

    I asked in a session that I did on statistics how many people manually created statistics. Out of a room of over 100, 2 people had. One because he was writing a book, one because DTA told him so.

    I [personally have never manually created statistics (except when playing around and writing demos for the stats session)

    A potential danger that I see is that SQL Server will have much more work to do whenever then data is changed or selected. However, how much work is involved in updating indices if AUTO_UPDATE_STATISTICS is set to ON?

    None at the time of update. Stats are invalidated after a number of changes (20%) and updated the next time they are needed.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jack Corbett (11/10/2009)


    Well, if you have autocreate statistics on, I'm going to guess that SQL Server will end up creating most of the statistics recommended by DTA.

    DTA often suggests multi-column statistics, which Auto_Create will never create. Whether those multi-column stats are truly needed or not is another matter

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail,

    Thanks for the clarification. I was, obviously, assuming single column stats.

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • Jack Corbett (11/10/2009)


    Well, if you have autocreate statistics on, I'm going to guess that SQL Server will end up creating most of the statistics recommended by DTA.

    My opinion too.

    Jack Corbett (11/10/2009)


    In my opinion, indexing is more an art than a science. If your database is more of a decision support database (Reporting) than OLTP than adding indexes usually is a good thing. I would also bet that if you create covering indexes then DTA will not recommend as many statistics.

    Thanks.

    GilaMonster (11/10/2009)


    Don't trust DTA... Test its recommendations out carefully

    And also ask people who are more experienced and knowledgeable than myself 🙂

    GilaMonster (11/10/2009)


    I asked in a session that I did on statistics how many people manually created statistics. Out of a room of over 100, 2 people had. One because he was writing a book, one because DTA told him so.

    I personally have never manually created statistics (except when playing around and writing demos for the stats session)

    I think that's pretty telling...

    GilaMonster (11/10/2009)


    DTA often suggests multi-column statistics, which Auto_Create will never create.

    Why?

    GilaMonster (11/10/2009)


    Whether those multi-column stats are truly needed or not is another matter

    Is there any way to find out other than testing them?

    Thanks again guys. Visiting SSC.com is almost always a fantastic learning experience!

  • GDI Lord (11/12/2009)


    GilaMonster (11/10/2009)


    DTA often suggests multi-column statistics, which Auto_Create will never create.

    Why?

    Why does DTA recommend them or why does auto_create never create them?

    For the first, I don't know. Personally I'd rather create composite indexes than composite stats.

    As for the second, one of the main reasons for stats is to give the optimiser info on the distribution of data in a column. Said distribution (the histogram) is only kept for the leading column on multi-column stats. The only info on the second column is the density.

    GilaMonster (11/10/2009)


    Whether those multi-column stats are truly needed or not is another matter

    Is there any way to find out other than testing them?

    Nope. Test in a dev environment, see if there's any improvement, if not don't create in prod.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Gail.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply