bulk updates on large tables

  • I need to update a new column (a tinyint flag) on a 50+ million row table. I get heavy locks on the clustered int PK when I do this, and the only explanation I can think of is that the row is being deleted and a new one inserted. Can SQL not realise, with a fixed-size field like a tinyint/bit, that if only that field is being updated and there is no index on that column, it doesn't need to touch any other indexes? Does the column permitting NULL make a difference? Anyway, that is beside the point.

    I believe the fastest way to do this operation is to perform the update in batches. On 50+ million rows, what size batches would you use? The update involves joining to another table of 5 million rows to determine the flag value. The code I am using is as follows:

    declare @counter int
    set @counter = 1

    while @counter <= 52414880
    begin
        -- each pass touches one 10,000-row slice of the clustered key
        update d
        set d.CustomerStatus = isnull(c.status, 0)
        from dbo.Datapool d              -- 52 million rows
        left outer join dbo.Customers c  -- 5 million rows
            on d.GroupID = c.GroupID
        where d.ID >= @counter
          and d.ID < @counter + 10000    -- range filter belongs in WHERE; placed in
                                         -- the LEFT JOIN it updated every row each pass
        set @counter = @counter + 10000
    end

    Clustered PK on ID (identity int) and a non-clustered, non-unique, 60% selective index on GroupID (int); default 80% fill factor on both, no padding. No need to worry about other processes, just speed. The box only has 2GB RAM and a single 3GHz Xeon.

  • SQL does know which indexes to update and which ones not to, but your clustered index holds the data pages, so it will always be updated.

    My guess is that you are getting locks because of simple lock escalation: SQL Server escalates to a table lock once a single statement accumulates roughly 5,000 row or page locks. Your query updates every record in the table, not just the ones that have changed. If it is all in one batch, the update commits everything at the same time, so you have essentially asked for a table lock for the duration of the update. You can watch this happen while the update runs; see the query below.
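    A quick way to confirm escalation, assuming SQL Server 2005 or later where the sys.dm_tran_locks DMV is available, is to count granted locks per session from another connection; thousands of KEY/PAGE locks suddenly collapsing into a single OBJECT lock is the escalation happening:

    select request_session_id,
           resource_type,
           request_mode,
           count(*) as lock_count
    from sys.dm_tran_locks
    where request_status = 'GRANT'
    group by request_session_id, resource_type, request_mode
    order by request_session_id, lock_count desc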

    Your batch idea will get around this, and as a one-time operation it is not a terrible approach. Make sure you commit the changes in each batch if you do this. The batch size determines how many row and page locks the update takes, so smaller batches reduce contention but take longer overall. Balance it, but I would probably go with half a million rows per batch in your case; a sketch of the pattern follows.
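    Here is a minimal sketch of that pattern, assuming the Datapool/Customers schema from the original post (the explicit transaction is illustrative; in autocommit mode each UPDATE statement commits on its own anyway):

    declare @counter int, @batchsize int
    set @counter = 1
    set @batchsize = 500000          -- half a million rows per batch

    while @counter <= 52414880
    begin
        begin transaction

        update d
        set d.CustomerStatus = isnull(c.status, 0)
        from dbo.Datapool d
        left outer join dbo.Customers c
            on d.GroupID = c.GroupID
        where d.ID >= @counter
          and d.ID < @counter + @batchsize

        commit transaction           -- release this batch's locks before the next pass

        set @counter = @counter + @batchsize
    end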

    I would also only update the records that actually need to change. Remember that an ISNULL function on a filtered column in your WHERE clause will cause table scans, but writing the update so it only touches the records it needs to is a good idea; for example, see the snippet below.
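    One way to express that, reusing the variables from the sketch above. The ISNULL here wraps a column from the joined table, not the d.ID column the seek uses, so it does not block the clustered index seek. It does assume CustomerStatus holds a non-NULL value in every row; a NULL would make the <> comparison unknown and the row would be skipped:

    update d
    set d.CustomerStatus = isnull(c.status, 0)
    from dbo.Datapool d
    left outer join dbo.Customers c
        on d.GroupID = c.GroupID
    where d.ID >= @counter
      and d.ID < @counter + @batchsize
      and d.CustomerStatus <> isnull(c.status, 0)  -- skip rows that are already correct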

  • Thanks for the advice. I made your change so it only updates where different (100% are different on the first run, as they are defaulted to -1).

    I am going to try with 500k and see how it goes...

  • jb_lbm (3/20/2008)


    Thanks for the advice. I made your change so it only updates where different (100% are different on the first run, as they are defaulted to -1).

    I am going to try with 500k and see how it goes...

    I usually "walk" the clustered index in batches of 20,000 to 50,000 rows, depending on the size of the target row. 500K at a time might actually slow it down. There's definitely a sweet spot, and the further away from it in either direction, the slower the process seems to be. Something like the sketch below.
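    A rough sketch of that walk, sized from the actual key range rather than a hard-coded ceiling (same assumed schema as the earlier examples):

    declare @id int, @maxid int, @step int
    set @step = 25000                -- inside the 20,000-50,000 range above

    select @id = min(ID), @maxid = max(ID) from dbo.Datapool

    while @id <= @maxid
    begin
        -- each pass is a clustered index seek on one slice of the key range
        update d
        set d.CustomerStatus = isnull(c.status, 0)
        from dbo.Datapool d
        left outer join dbo.Customers c
            on d.GroupID = c.GroupID
        where d.ID >= @id
          and d.ID < @id + @step
          and d.CustomerStatus <> isnull(c.status, 0)

        set @id = @id + @step
    end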

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right - what was my emergency again?
