UPDATE Best Practices

  • Hi Everybody,

    I was wondering if you guys have any opinions or know good source(s) for updating large tables(e.g 6,000,000 records and up) or importing large files into tables.

    When size of the data exceeds a certain limit then we need to reconsider some of the DML queries we've been using so far.

    Thanks

    Moe M
    Database Consultant
    http://www.cubeangle.com

  • "It depends ..."

    Updating 6mil records is not a specific enough description of the problem.

    Width of records, number of data pages in the table ?

    Type of application - OLTP with 24/7 availability required, or analysis warehouse/datamart, with several hours of overnight downtime ?

    Which columns being updated and are they included in 1 or more indexes ?

    Are updated columns foreign keys or are they referenced by foreign keys in other tables, requiring RI constraint checks on every change ?

    Does operation need to be logged, or is database in 'Simple' logging mode ?

     

  • Hi,

    Thanks for quick reply and sorry if my question was vague its just I am not very experienced with large datasets but willing to learn good techniques and practices.

    Width of Records: Each record has 5 Columns and roughly it is 50 chr long.

    Type of Application: This is a Data Warehouse application.

    Which columns being updated: There are two columns being updated. No index on the columns. Columns are not foriegn keys.

    Operation is being logged through a custom made logging stored procedure.

    Thanks,

    Moe M
    Database Consultant
    http://www.cubeangle.com

  • for importing/exporting large amounts of data from/to flat files, I use bcp.exe.

    see "bcp utility" in BOL.

    if you are importing a significant fraction of the current table size, it's a good idea to drop all indexes, import, and then recreate the indexes.

    also, use the -b flag in bcp so you don't blow out the transaction log.

    ---------------------------------------
    elsasoft.org

  • BCP is more robust.

    BULK_INSERT is the fastest but not by a lot.

    Both are nasty fast but BULK_INSERT can be run through a query without a trip through xp_CmdShell.  xp_CmdShell requires "SA" privs.

    BULK_INSERT will do about 5.4 million 20 column rows in 60 seconds.  Fast enough?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • it's not BULK_INSERT, but BULK INSERT, without the underscore.

    also BULK INSERT is one-sided: there's no BULK EXPORT.  so you have to use bcp for exporting, unless you love the GUI in which case you can use SSIS/DTS etc.

    ---------------------------------------
    elsasoft.org

  • Heh... you're right... no underscore... I shouldn't type late at night

    Don't get me wrong... I love BCP!  I love it's ability to log, isolate bad records and save them in a file, etc, etc.

    But there is another way to export especially if you have to export flat files from just a part of a table (normally means there's a design error, as well :hehe.  BCP won't let you exclude columns on an export even if you have a format file in play.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • You can exclude any columns you want in BCP if you use a query as input data source.  You can also do it using a view.

  • Thanks everybody,

    Recently I was reading an article suggesting to actually create a single index on the columns to be updated. Pre-sorting the file also helps a lot.

    I am going to test all the ideas and of course we have to take into consideration lots of stuff like does the transaction needs to be atomic or using a temp table helps or not.

    Thanks again.

    Moe M
    Database Consultant
    http://www.cubeangle.com

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply