bulk updates on large tables

  • I need to update a new column (a tinyint flag) on a 50+ million row table. I get heavy locks on the clustered int PK when I do this, and the only explanation I can think of is that the row is being deleted and a new one inserted. Can SQL not realise, with a fixed-size field like a tinyint/bit, that if only that field is being updated and there is no index on that column, it doesn't need to touch any other indexes? Does the column permitting NULL make a difference? Anyway, that is beside the point.

    I believe the fastest way to do this operation is to perform the update in batches. On 50+ million rows, what size batches would you use? The update involves joining to another table of 5 million rows to determine the flag value. The code I am using is as follows:

    declare @counter int
    set @counter = 1

    while @counter <= 52414880
    begin
        -- each pass touches one 10,000-row slice of the clustered key
        update d
        set d.CustomerStatus = isnull(c.status, 0)
        from dbo.Datapool d              -- 52 million rows
        left outer join dbo.Customers c  -- 5 million rows
            on d.GroupID = c.GroupID
        where d.ID >= @counter
          and d.ID < @counter + 10000    -- range filter belongs in WHERE; placed in
                                         -- the LEFT JOIN it updated every row each pass
        set @counter = @counter + 10000
    end

    Clustered PK on ID (identity int) and a non-clustered, non-unique, 60% selective index on GroupID (int); default 80% fill factor on both, no padding. No need to worry about other processes, just speed. The box only has 2GB RAM and a single 3GHz Xeon.

  • SQL does know which indexes to update and which ones not to, but your clustered index holds the data pages, so it will always be updated.

    My guess is that you are getting locks because of simple lock escalation: SQL Server escalates to a table lock once a single statement accumulates roughly 5,000 row or page locks. Your query updates every record in the table, not just the ones that have changed. If it is all in one batch, the update commits everything at the same time, so you have essentially asked for a table lock for the duration of the update. You can watch this happen while the update runs; see the query below.
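    A quick way to confirm escalation, assuming SQL Server 2005 or later where the sys.dm_tran_locks DMV is available, is to count granted locks per session from another connection; thousands of KEY/PAGE locks suddenly collapsing into a single OBJECT lock is the escalation happening:

    select request_session_id,
           resource_type,
           request_mode,
           count(*) as lock_count
    from sys.dm_tran_locks
    where request_status = 'GRANT'
    group by request_session_id, resource_type, request_mode
    order by request_session_id, lock_count desc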

    Your batch idea will get around this, and as a one-time operation it is not a terrible approach. Make sure you commit the changes in each batch if you do this. The batch size determines how many row and page locks the update takes, so smaller batches reduce contention but take longer overall. Balance it, but I would probably go with half a million rows per batch in your case; a sketch of the pattern follows.
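    Here is a minimal sketch of that pattern, assuming the Datapool/Customers schema from the original post (the explicit transaction is illustrative; in autocommit mode each UPDATE statement commits on its own anyway):

    declare @counter int, @batchsize int
    set @counter = 1
    set @batchsize = 500000          -- half a million rows per batch

    while @counter <= 52414880
    begin
        begin transaction

        update d
        set d.CustomerStatus = isnull(c.status, 0)
        from dbo.Datapool d
        left outer join dbo.Customers c
            on d.GroupID = c.GroupID
        where d.ID >= @counter
          and d.ID < @counter + @batchsize

        commit transaction           -- release this batch's locks before the next pass

        set @counter = @counter + @batchsize
    end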

    I would also only update the records that actually need to change. Remember that an ISNULL function on a filtered column in your WHERE clause will cause table scans, but writing the update so it only touches the records it needs to is a good idea; for example, see the snippet below.
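    One way to express that, reusing the variables from the sketch above. The ISNULL here wraps a column from the joined table, not the d.ID column the seek uses, so it does not block the clustered index seek. It does assume CustomerStatus holds a non-NULL value in every row; a NULL would make the <> comparison unknown and the row would be skipped:

    update d
    set d.CustomerStatus = isnull(c.status, 0)
    from dbo.Datapool d
    left outer join dbo.Customers c
        on d.GroupID = c.GroupID
    where d.ID >= @counter
      and d.ID < @counter + @batchsize
      and d.CustomerStatus <> isnull(c.status, 0)  -- skip rows that are already correct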

  • Thanks for the advice. I made your change so it only updates where different (100% are different on the first run, as they are defaulted to -1).

    I am going to try with 500k and see how it goes...

  • jb_lbm (3/20/2008)


    Thanks for the advice. I made your change so it only updates where different (100% are different on the first run, as they are defaulted to -1).

    I am going to try with 500k and see how it goes...

    I usually "walk" the clustered index in batches of 20,000 to 50,000 rows, depending on the size of the target row. 500K at a time might actually slow it down. There's definitely a sweet spot, and the further away from it in either direction, the slower the process seems to be. Something like the sketch below.
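    A rough sketch of that walk, sized from the actual key range rather than a hard-coded ceiling (same assumed schema as the earlier examples):

    declare @id int, @maxid int, @step int
    set @step = 25000                -- inside the 20,000-50,000 range above

    select @id = min(ID), @maxid = max(ID) from dbo.Datapool

    while @id <= @maxid
    begin
        -- each pass is a clustered index seek on one slice of the key range
        update d
        set d.CustomerStatus = isnull(c.status, 0)
        from dbo.Datapool d
        left outer join dbo.Customers c
            on d.GroupID = c.GroupID
        where d.ID >= @id
          and d.ID < @id + @step
          and d.CustomerStatus <> isnull(c.status, 0)

        set @id = @id + @step
    end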

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right - what was my emergency again?
