UPDATE

  • I'm on SQL Server 2000 Standard Edition.

    I need to update 10 000 000 records in 1 Table.

    My question is, what is the best method for updating this many records in one table? Below is the UPDATE statement.

    UPDATE myTable
    SET Column = 'XYZ'


    Kindest Regards,

  • I can't quite get what you're trying to say... what is there other than the UPDATE itself?

  • If you need to update all rows, without any WHERE clause, then my guess is that updating everything at once should work best. If there is a WHERE clause, the current value of the column to be updated is part of it, and there is a good selective index that could be used for the update, then maybe you could split the update into smaller chunks... but I would probably vote for a simple, straightforward UPDATE anyway. Splitting helps with DELETE, but it should not be necessary when updating.

    Just my 2 cents... but I never really updated 10M rows in a table (at least not in a production database while users are working in it), so I can't be sure about that.

  • Read this if other processes are going to need to read from or write to the table during the update:

    Vladin's suggestion of breaking the update into chunks is worth considering. When you are attempting to update a table of that size, your process will acquire an exclusive table lock. SQL Server escalates locks when you approach (I think) 5,000 locks or when it is running out of resources. For this reason, SQL Server will override locking hints that specify lock granularity for updates of this magnitude.

    When you have an exclusive table lock, and you are using the default transaction isolation level (or higher), other processes (even read processes) will be blocked until the update completes. Depending on your horsepower, this update could take several minutes.

    If other processes will need to read while you hold an exclusive table lock, you could use the NOLOCK hint (or a transaction isolation level of READ UNCOMMITTED), but this may not always be the desired behaviour.
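    For example, a reporting query could read around the blocking like this (a minimal sketch; the table and column names are just placeholders):

    SELECT myCol
    FROM myTab WITH (NOLOCK) --dirty read: may return uncommitted values
    WHERE myCol = 'XYZ'

    --or, equivalently, for the whole connection:
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED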

    When I am updating large tables that will need to be read by other processes during the update, I sacrifice overall performance for concurrency.

    SET ROWCOUNT 3000
    WHILE 1=1 --Be careful with this, you don't want an infinite loop!
    BEGIN
        UPDATE myTab
            SET myCol = @newVal
        WHERE myCol <> @newVal

        IF @@ROWCOUNT = 0
            BREAK
    END
    SET ROWCOUNT 0
    

    This is a simple example; consider what indexes you have, and also consider using real values in your WHILE loop (count how many rows need to be updated and increment your counter by 3000 each time, as in the sketch below). Please let me know if this makes sense or if you have any questions.
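    Something like this, for example (an untested sketch; @newVal and its varchar type are placeholder assumptions):

    DECLARE @newVal varchar(10), @total int, @done int
    SET @newVal = 'XYZ'
    SELECT @total = COUNT(*) FROM myTab WHERE myCol <> @newVal --rows still to update
    SET @done = 0

    SET ROWCOUNT 3000
    WHILE @done < @total
    BEGIN
        UPDATE myTab
            SET myCol = @newVal
        WHERE myCol <> @newVal

        SET @done = @done + 3000 --increment the counter by one batch
    END
    SET ROWCOUNT 0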

    Thanks!

    Robert

    SQL guy and Houston Magician

  • You might also consider adding a WAITFOR in the loop to let other transactions execute on the table.

    Depending on the update you need to do and the power of the server, I suppose you should be able to update something like 10K rows/sec without breaking a sweat. So you could do something like this:

    SET ROWCOUNT 10000
    WHILE 1=1 --Be careful with this, you don't want an infinite loop!
    BEGIN
        UPDATE myTab
            SET myCol = @newVal
        WHERE myCol <> @newVal

        IF @@ROWCOUNT = 0
            BREAK

        WAITFOR DELAY '00:00:02' --wait 2 seconds before running again
    END
    SET ROWCOUNT 0

    Assuming that each batch takes exactly 1 second to run, that's 10,000,000 / 10,000 = 1,000 batches at 3 seconds apiece (1 second of work plus the 2-second wait), so the full update would be done in 50 minutes. That seems reasonable, since the speed of the server shouldn't be affected much by this and the table wouldn't be locked long enough to slow anything down. I would go with a straight update only if that update were critical to the business.

    You might also consider doing more frequent transaction log backups while this is running, depending on how short you are on HD space.
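    For instance (a sketch; the database name and backup path are hypothetical):

    --Back up the log between batches so it doesn't grow unchecked
    --during the long-running update
    BACKUP LOG myDatabase
    TO DISK = 'D:\Backups\myDatabase_log.bak'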

  • Good point Ninja,

    10000 would definitely be a more appropriate number. Depending on the width of the table, you could probably increase that number even further without SQL Server acquiring a table lock. As a rule of thumb, I make sure my update rowcount doesn't exceed 3000 pages of data. You can calculate a theoretical max by looking at the table definition: (8000/maxRowWidth)*3000. If you have a halfway decent spec server, SQL should never escalate to a table lock.
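    For example, with hypothetical numbers (assume the widest row in your table is 400 bytes):

    --Rule-of-thumb ceiling from above, using a hypothetical 400-byte max row width
    DECLARE @maxRowWidth int, @maxBatch int
    SET @maxRowWidth = 400
    SET @maxBatch = (8000 / @maxRowWidth) * 3000 --= 60000 rows per batch
    PRINT @maxBatch --a ROWCOUNT of 10000 stays well under this ceiling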

    I could be wrong (and I often am!) but I think the WAITFOR is unnecessary. You will only be locking (for a short time) specific pages of data that you will not touch again. Other processes that wish to read or write to rows on those pages will be blocked for (at most) seconds and then allowed to execute. Your update should not touch those pages again (and even if it does, it will be blocked by other processes that queued to hit those pages).

    SQL guy and Houston Magician

  • But doesn't it worry anyone that, because each pass has to scan past the rows that were already updated, instead of scanning 10 million records you will end up having scanned over 5 billion records?

  • Not if you index the column. You can also use the sliding window trick (just search these forums to find it).

     

    Also, the idea behind this technique is that you don't stop production for any amount of time (maybe just slow it down a bit). Sometimes this is much more important than having the query be optimal (especially if it's only a one-time deal).

  • I'm not sure the index helps when you use the '<>' operator.

  • Agreed... but we don't know the from and to values of the update, so we would just be speculating at this point.

     

    Also, the sliding window method doesn't have that limitation.

  • Can't find anything on this 'sliding window' thing. Have you got a specific link?

  • The sliding window was used to do something slightly different, but it can be modified for this case. This is untested and off the top of my head:

    create table ttemp
    (
        id int primary key clustered
    )

    insert into ttemp (id)
    select id from tmain

    set rowcount 10000

    while 1=1
    begin
        update tmain
        set col = 'whatever'
        from tmain
        inner join ttemp on tmain.id = ttemp.id

        if @@rowcount = 0
            break

        delete from ttemp
        --would have to check to make sure this process updates and deletes matching rows too!!
    end

    set rowcount 0

    --check if new rows were inserted and update if needed... or set a new rule/default beforehand
    drop table ttemp
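    A variation on the same idea that sidesteps the update/delete matching problem is to slide over the key range directly (again untested, and it assumes tmain.id is a reasonably dense int clustered primary key):

    declare @minId int, @maxId int
    select @minId = min(id), @maxId = max(id) from tmain

    while @minId <= @maxId
    begin
        --each batch is a clustered index range seek, so no rescanning
        update tmain
        set col = 'whatever'
        where id >= @minId and id < @minId + 10000

        set @minId = @minId + 10000
    end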

  • Ninja is correct, and perhaps we both should have made it more clear that the code was to demonstrate the general concept, and not a snippet to be used verbatim without looking at indexes and table usage. (In fact, if nobody is going to be touching the table then it really doesn't matter if you are blocking for several minutes.)

    Very good example with the temp table. There are a lot of variations but it is a much clearer example of how one might approach this problem in the real world.

    Nice one!

    SQL guy and Houston Magician

