Statistics move from DB to DB?

  • I think the technique we use to update a data warehouse is contributing to poor performance.

    We build on one machine, then we detach the database, copy it, and reattach it on another machine, where it serves queries the next day (while the build machine is already building the next day's copy).

    This works great in terms of overlap, however...

    I think this means all the intelligence about what columns should have statistics for better queries is being figured out fresh each day on the query machine, then lost.

    There are hundreds of tables. Doing a manual Create Statistics is of course possible, but difficult.

    Has this ever come up? Has anyone seen a script to save the statistics definitions from one database and then manually create and update them on another?

    Does anyone see any reason not to take that approach, so we "remember" from day to day, and also pick up any improvements auto-create made along the way?

  • Why not just run sp_createstats on the database after it's built? Statistics will not change unless the data changes, and this is a data warehouse, so the data is static, right?
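    For example, on SQL Server 2000 sp_createstats takes optional 'indexonly', 'fullscan', and 'norecompute' arguments, so a single sampled pass after the nightly build could look like this (sampling is the default and is much faster than a full scan):

    -- Create single-column statistics on every eligible column
    -- of every user table, using the default sampling
    EXEC sp_createstats

    -- Slower alternative, if sampled statistics prove too coarse:
    -- EXEC sp_createstats @fullscan = 'fullscan'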

    --Jonathan




  • Hmmm... there are a lot of tables and a lot of columns. I guess I need to do some experimentation to find out how that would go. I'm sure 80% of the columns are never used in selection criteria. The question is whether building ALL would take days or hours.

  • So you've set 'auto create statistics' to on, so the statistics are being created on the fly as needed. You believe the auto-creation is slowing queries down when it is invoked, so you wish to pregenerate the statistics in one fell swoop?

    Perhaps you have several hours after building but before copying when the "build" server is idle, so you can use that time to create the column statistics. That isn't enough time to create statistics on all usable columns with sp_createstats, so you'd like a script that creates the same statistics as were auto-created on the "query" server over the course of a day...

    
    
    -- Generate one CREATE STATISTICS command for each statistics
    -- entry in sysindexes, naming the new statistic after its
    -- column with a leading underscore
    SELECT 'CREATE STATISTICS ' + QUOTENAME('_' + COL_NAME(k.id, k.colid))
         + ' ON ' + QUOTENAME(OBJECT_NAME(k.id))
         + '(' + QUOTENAME(COL_NAME(k.id, k.colid)) + ')'
    FROM sysindexes i
    JOIN sysindexkeys k ON i.id = k.id AND i.indid = k.indid
    WHERE i.status & 32 = 32
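    One way to use this (the server and database names below are placeholders, and the SELECT above is assumed saved as gen_stats.sql): spool the generated commands to a file from the "query" server, then replay them on the "build" server with osql. You may need to trim the "(n rows affected)" footer from the generated file first.

    rem On the query server: generate the script (no headers, wide output)
    osql -S QuerySrv -d Warehouse -E -h-1 -w 1024 -i gen_stats.sql -o create_stats.sql

    rem On the build server: apply it
    osql -S BuildSrv -d Warehouse -E -i create_stats.sql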

    --Jonathan

    Edited by - Jonathan on 11/21/2003 3:09:06 PM




  • Well, I do not know for sure how much time it would take to create all statistics, but we really have very little free time overnight.

    There are some really big tables that are not built every night, but added to. Those we could update statistics on without rebuilding them all, not necessarily all each night.

    Other tables (usually of a few hundred megabytes) are wiped and rebuilt, so they start with nothing. Maybe I could build those from scratch for all columns. Maybe.

    I'm really trying to work this out as I think it through. But it occurred to me that each day we re-learn what statistics are needed, rebuild them each morning as people start querying, and then the next build wipes them out again. So I was more or less thinking out loud, and wondering if anyone had similar problems and solutions.
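    As a sketch of the append-only case (the table name here is hypothetical), the nightly step could be a sampled refresh instead of a full rebuild:

    -- Refresh every statistic on a big append-only table,
    -- using the default sample rather than a full scan
    UPDATE STATISTICS BigFactTable

    -- Or keep whatever sampling rate each statistic used originally:
    -- UPDATE STATISTICS BigFactTable WITH RESAMPLE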

  • quote:


    Well, I do not know for sure how much time it would take to create all statistics, but we really have very little free time overnight.

    There are some really big tables that are not built every night, but added to. Those we could update statistics on without rebuilding them all, not necessarily all each night.

    Other tables (usually of a few hundred megabytes) are wiped and rebuilt, so they start with nothing. Maybe I could build those from scratch for all columns. Maybe.

    I'm really trying to work this out as I think through it. But it occurred to me that each day we re-learn what statistics are needed, rebuild them each morning as people start using them, and then wipe them out afresh. So I was more or less thinking out loud, and wondering if anyone had similar problems and solutions.


    I doubt that auto-creation (and auto-update?) as needed is much slower overall than the time taken to do this manually; that's why I was assuming you had some spare time that you were trying to utilize.

    I would not attempt to actually copy the histograms (sysindexes.statblob) from one database to another.
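    If you want to compare what the two servers actually ended up with, you can inspect a histogram directly rather than copy it (the table and statistic names here are just examples):

    -- Show the density and histogram for one statistic
    DBCC SHOW_STATISTICS ('BigFactTable', '_OrderDate')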

    --Jonathan




  • quote:


    I would not attempt to actually copy the histograms (sysindexes.statblob) from one database to another.


    Goodness no. I was more just trying to identify what statistics are worth building. The more I think about this (and I appreciate the sounding board) the more likely answer is probably just to find a way to build all of them once, and rebuild key ones nightly and update others on some regular basis.
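    For the "update others on some regular basis" part, one option is the stock procedure that refreshes statistics across all user tables in a single call:

    -- Run a sampled UPDATE STATISTICS across every user table
    EXEC sp_updatestats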

  • Just to put this to bed: building them all at once was just fine; it took only a few minutes despite this being a large (140 GB) database.

    It's the sampling that allows it to work. I tried one of the smaller of the large tables (maybe 10M rows): with a full scan it takes many minutes, but less than a minute otherwise.

    Of course this leads to the question of whether sampling is good enough, but that's a discussion for another day.
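    For that future discussion, the trade-off is easy to test per table (names hypothetical): build the statistic both ways, then compare timings and the resulting histograms:

    -- Fast, sampled
    UPDATE STATISTICS BigFactTable WITH SAMPLE 10 PERCENT

    -- Slow, exact
    -- UPDATE STATISTICS BigFactTable WITH FULLSCAN

    -- Inspect the resulting histogram either way:
    DBCC SHOW_STATISTICS ('BigFactTable', '_OrderDate')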
