Quickest daily data dump

  • Greetings!

    I just migrated a data-snapshot table from MySQL to SQL Server 2000. I basically pulled many fields from quite a few different Oracle tables into one flat table dumped into MySQL for faster display on our intranet site. With MySQL I found that, instead of deleting and re-populating the table every night, it was easier and much faster to drop the table and recreate it (with its indexes) before re-populating it.

    Now, SQL Server seems to handle indexes a bit differently. Is it still better to drop the table, re-create it, and rebuild all the indexes to accomplish the same thing, or should I simply delete all the records from the table? Please advise.

    Thanks.

  • Delete and repopulate may take more time because DELETE is a logged operation...

    You'd better test to find the optimal solution:

    1. Truncate the table, drop the indexes, and repopulate the table using bcp/DTS.

    2. Truncate the table and repopulate it without dropping the indexes.

    3. Create a new table table_new, populate table_new and create the indexes on table_new, then either drop the original table or rename it to table_old, and rename table_new to the original table name (sketched below)...

    In my opinion, 1 and 3 might be faster than 2. Make sure you use the Fast Load option if you are using DTS...
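
    For option 3, the swap could look something like this in T-SQL; the table, column and index names (dbo.Snapshot, SnapshotKey, etc.) are only placeholders for whatever your snapshot table actually contains:

        -- staging copy built each night alongside the live table the intranet reads
        CREATE TABLE dbo.Snapshot_new (
            SnapshotKey int         NOT NULL,
            Region      varchar(30) NULL,
            Amount      money       NULL
        )

        -- ... populate dbo.Snapshot_new here with bcp or DTS ...

        CREATE CLUSTERED INDEX IX_Snapshot_Key ON dbo.Snapshot_new (SnapshotKey)

        -- swap the tables; drop the old copy once you are happy with the new one
        EXEC sp_rename 'dbo.Snapshot', 'Snapshot_old'
        EXEC sp_rename 'dbo.Snapshot_new', 'Snapshot'
        DROP TABLE dbo.Snapshot_old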

     

    MohammedU
    Microsoft SQL Server MVP

  • Thanks! This is what I needed to know... all of the options available to me. I'm going to test option 2 as well, because if it's relatively fast it might be the best solution for me since it takes the fewest steps. Thanks again!

  • Using DELETE is not a practical option because it logs the deletion of each row one at a time.  Your choices are either TRUNCATE or DROP/CREATE.
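
    To see the difference side by side (dbo.Snapshot is just a placeholder name here):

        -- DELETE writes a log record for every row; slow and log-heavy on a big table
        DELETE FROM dbo.Snapshot

        -- TRUNCATE logs only the page deallocations; much faster, but it cannot be
        -- used if the table is referenced by a foreign key
        TRUNCATE TABLE dbo.Snapshot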

    It is hard to say with any confidence what the best option is because it depends on the size of the table, the number of indexes, the amount of free space in your database and in tempdb, and so on.  Try several options and see what works best.

    You'll get the fastest load performance if you load the data into a table with no indexes.  On the other hand, if you have a clustered index you may get better overall performance with it in place during the load.  Creating a clustered index after the load involves copying all the data to a new structure, so it requires a lot of extra free space.  If the clustered index already exists you will get an execution plan that sorts the data before inserting it into the table.  This also requires extra disk space, but in tempdb instead of the destination database.  I think that having the clustered index in place before the load should be quicker overall and should cause less transaction log activity.

    Another way to speed up loads into tables with a clustered index is to sort the data at the source (make Oracle do some of the work) to match the clustered index, and use the ORDER hint in bcp or BULK INSERT to tell SQL Server that the data is already sorted.
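
    For example, assuming the clustered index is on a column called SnapshotKey and the extract file from Oracle is already sorted on it (all names and paths here are placeholders), the load could look like this:

        BULK INSERT dbo.Snapshot
        FROM 'D:\feeds\snapshot.dat'
        WITH (
            FIELDTERMINATOR = '|',
            ROWTERMINATOR = '\n',
            ORDER (SnapshotKey ASC),  -- tells SQL Server the file already matches the clustered index
            TABLOCK                   -- takes a table lock for the duration of the load
        )

    The bcp equivalent is the hint switch, e.g. -h "ORDER(SnapshotKey ASC), TABLOCK".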

    Nonclustered indexes should almost certainly be dropped and recreated after the load.  If your tempdb can handle it, using the WITH SORT_IN_TEMPDB option on the CREATE INDEX statements should speed up the job.
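
    For instance (index and column names are placeholders, and this is the SQL Server 2000 WITH syntax):

        -- before the load
        DROP INDEX Snapshot.IX_Snapshot_Region

        -- after the load; the index sort is done in tempdb instead of the user database
        CREATE NONCLUSTERED INDEX IX_Snapshot_Region
            ON dbo.Snapshot (Region)
            WITH SORT_IN_TEMPDB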

    Other things that will speed up a load operation are to drop or disable all triggers, constraints, and foreign keys on the table, and to use the TABLOCK hint.
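
    In T-SQL that is roughly the following (placeholder names again); remember to turn everything back on once the load has finished:

        -- before the load
        ALTER TABLE dbo.Snapshot DISABLE TRIGGER ALL
        ALTER TABLE dbo.Snapshot NOCHECK CONSTRAINT ALL

        -- ... bulk load with the TABLOCK hint goes here ...

        -- after the load (CHECK CONSTRAINT re-enables without re-validating existing rows)
        ALTER TABLE dbo.Snapshot CHECK CONSTRAINT ALL
        ALTER TABLE dbo.Snapshot ENABLE TRIGGER ALL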

  • Out of curiosity, why didn't you try to 'tune' the queries that pulled the data from Oracle originally? That point aside, I'd like to optimize Mohammed's suggestions and also add a few additional points (the whole sequence is sketched after the list):

    • drop indexes - doing this first saves a few page deallocations in the log
    • truncate table - quicker than delete since it logs only page deallocations as opposed to rows
    • repopulate table - bcp or DTS, your testing will determine the quickest tool
    • create indexes - use FILLFACTOR=100 since it is read only and refreshed daily
    • update statistics on table - 'fresh' index statistics are needed to generate optimal query plans
    • exec sp_recompile for the table - this lets all stored procedures that use this table know that they need to be recompiled in order to take advantage of new (and optimal) statistics for the execution plan
    • exec sp_refreshview for views that use the table - same as sp_recompile except it is for views that use the table
    • update usage on database - this will help keep space reporting on the database up to date
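
    Put together, the nightly job might run something like the script below. Every object name (the Intranet database, dbo.Snapshot, the IX_Snapshot_Region index, the vw_Snapshot view) is a placeholder, and the bcp step would be executed by the job as an operating-system command rather than as T-SQL:

        -- 1. drop the nonclustered indexes
        DROP INDEX Snapshot.IX_Snapshot_Region

        -- 2. truncate the table (logs page deallocations only)
        TRUNCATE TABLE dbo.Snapshot

        -- 3. repopulate, e.g. from the command line with bcp (or a DTS Fast Load step):
        --      bcp Intranet.dbo.Snapshot in D:\feeds\snapshot.dat -c -T -h "TABLOCK"

        -- 4. recreate the indexes; FILLFACTOR = 100 because the table is read-only
        CREATE NONCLUSTERED INDEX IX_Snapshot_Region
            ON dbo.Snapshot (Region)
            WITH FILLFACTOR = 100, SORT_IN_TEMPDB

        -- 5. fresh statistics for the optimizer
        UPDATE STATISTICS dbo.Snapshot WITH FULLSCAN

        -- 6. mark stored procedures that use the table for recompilation
        EXEC sp_recompile 'dbo.Snapshot'

        -- 7. refresh the metadata of views that use the table
        EXEC sp_refreshview 'dbo.vw_Snapshot'

        -- 8. keep the database's space usage figures up to date
        DBCC UPDATEUSAGE (0)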

    Regards,
    Rudy Komacsar
    Senior Database Administrator

    "Ave Caesar! - Morituri te salutamus."
