UPDATE

  • I'm on SQL Server 2000 Standard Edition.

    I need to update 10 000 000 records in 1 Table.

    My question is, what is the best method for updating this many records in one table? Below is the UPDATE statement.

    UPDATE myTable
    SET Column = 'XYZ'


    Kindest Regards,

  • I can't quite get what you're trying to say... what is there other than the UPDATE itself?

  • If you need to update all rows, without any WHERE clause, then my guess is that updating everything at once should work best. If there is a WHERE clause, the current value of the column to be updated is part of it, and there is a good selective index that could be used for the update, then maybe you could split the update into smaller chunks... but I would probably vote for a simple, straightforward UPDATE anyway. Splitting helps with DELETE, but it should not be necessary when updating.

    Just my 2 cents... but I never really updated 10M rows in a table (at least not in a production database while users are working in it), so I can't be sure about that.

  • Read this if other processes are going to need to read from or write to the table during the update:

    Vladin's suggestion of breaking the update into chunks is worth considering. When you are attempting to update a table of that size, your process will acquire an exclusive table lock. SQL Server escalates locks when you approach (I think) 5,000 locks or when it is running out of resources. For this reason, SQL Server will override locking hints that specify lock granularity for updates of this magnitude.

    When you have an exclusive table lock, and you are using the default transaction isolation level (or higher), other processes (even read processes) will be blocked until the update completes. Depending on your horsepower, this update could take several minutes.

    If other processes will need to read while you hold an exclusive table lock, you could use the NOLOCK hint (or a transaction isolation level of READ UNCOMMITTED), but this may not always be the desired behaviour.
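    For example, a reporting query could read around the blocking like this (a minimal sketch; the table and column names are just placeholders):

    SELECT myCol
    FROM myTab WITH (NOLOCK) --dirty read: may return uncommitted values
    WHERE myCol = 'XYZ'

    --or, equivalently, for the whole connection:
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED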

    When I am updating large tables that will need to be read by other processes during the update, I sacrifice overall performance for concurrency.

    SET ROWCOUNT 3000
    WHILE 1=1 --Be careful with this, you don't want an infinite loop!
    BEGIN
        UPDATE myTab
            SET myCol = @newVal
        WHERE myCol <> @newVal

        IF @@ROWCOUNT = 0
            BREAK
    END
    SET ROWCOUNT 0
    

    This is a simple example; consider what indexes you have, and also consider using real values in your WHILE loop (count how many rows need to be updated and increment your counter by 3000 each time, as in the sketch below). Please let me know if this makes sense or if you have any questions.
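    Something like this, for example (an untested sketch; @newVal and its varchar type are placeholder assumptions):

    DECLARE @newVal varchar(10), @total int, @done int
    SET @newVal = 'XYZ'
    SELECT @total = COUNT(*) FROM myTab WHERE myCol <> @newVal --rows still to update
    SET @done = 0

    SET ROWCOUNT 3000
    WHILE @done < @total
    BEGIN
        UPDATE myTab
            SET myCol = @newVal
        WHERE myCol <> @newVal

        SET @done = @done + 3000 --increment the counter by one batch
    END
    SET ROWCOUNT 0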

    Thanks!

    Robert

    SQL guy and Houston Magician

  • You might also consider adding a WAITFOR in the loop to let other transactions execute on the table.

    Depending on the update you need to do and the power of the server, I suppose you should be able to update something like 10K rows/sec without breaking a sweat. So you could do something like this:

    SET ROWCOUNT 10000
    WHILE 1=1 --Be careful with this, you don't want an infinite loop!
    BEGIN
        UPDATE myTab
            SET myCol = @newVal
        WHERE myCol <> @newVal

        IF @@ROWCOUNT = 0
            BREAK

        WAITFOR DELAY '00:00:02' --wait 2 seconds before running again
    END
    SET ROWCOUNT 0

    Assuming that each batch takes exactly 1 second to run, that's 10,000,000 / 10,000 = 1,000 batches at 3 seconds apiece (1 second of work plus the 2-second wait), so the full update would be done in 50 minutes. That seems reasonable, since the speed of the server shouldn't be affected much by this and the table wouldn't be locked long enough to slow anything down. I would go with a straight update only if that update were critical to the business.

    You might also consider doing more frequent transaction log backups while this is running, depending on how short you are on HD space.
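    For instance (a sketch; the database name and backup path are hypothetical):

    --Back up the log between batches so it doesn't grow unchecked
    --during the long-running update
    BACKUP LOG myDatabase
    TO DISK = 'D:\Backups\myDatabase_log.bak'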

  • Good point Ninja,

    10000 would definitely be a more appropriate number. Depending on the width of the table, you could probably increase that number even further without SQL Server acquiring a table lock. As a rule of thumb, I make sure my update rowcount doesn't exceed 3000 pages of data. You can calculate a theoretical max by looking at the table definition: (8000/maxRowWidth)*3000. If you have a halfway decent spec server, SQL should never escalate to a table lock.
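    For example, with hypothetical numbers (assume the widest row in your table is 400 bytes):

    --Rule-of-thumb ceiling from above, using a hypothetical 400-byte max row width
    DECLARE @maxRowWidth int, @maxBatch int
    SET @maxRowWidth = 400
    SET @maxBatch = (8000 / @maxRowWidth) * 3000 --= 60000 rows per batch
    PRINT @maxBatch --a ROWCOUNT of 10000 stays well under this ceiling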

    I could be wrong (and I often am!) but I think the WAITFOR is unnecessary. You will only be locking (for a short time) specific pages of data that you will not touch again. Other processes that wish to read or write to rows on those pages will be blocked for (at most) seconds and then allowed to execute. Your update should not touch those pages again (and even if it does, it will be blocked by other processes that queued to hit those pages).

    SQL guy and Houston Magician

  • But doesn't it worry anyone that, because each pass has to scan past the rows that were already updated, instead of scanning 10 million records you will end up having scanned over 5 billion records?

  • Not if you index the column. You can also use the sliding window trick (just search these forums to find it).

     

    Also, the idea behind this technique is that you don't stop production for any amount of time (maybe just slow it down a bit). Sometimes this is much more important than having the query be optimal (especially if it's only a one-time deal).

  • I'm not sure the index helps when you use the '<>' operator.

  • Agreed... but we don't know the from and to values of the update, so we would just be speculating at this point.

     

    Also, the sliding window method doesn't have that limitation.

  • Can't find anything on this 'sliding window' thing. Have you got a specific link?

  • The sliding window was used to do something slightly different, but it can be modified for this case. This is untested and off the top of my head:

    create table ttemp
    (
        id int primary key clustered
    )

    insert into ttemp (id)
    select id from tmain

    set rowcount 10000

    while 1=1
    begin
        update tmain
        set col = 'whatever'
        from tmain
        inner join ttemp on tmain.id = ttemp.id

        if @@rowcount = 0
            break

        delete from ttemp
        --would have to check to make sure this process updates and deletes matching rows too!!
    end

    set rowcount 0

    --check if new rows were inserted and update if needed... or set a new rule/default beforehand
    drop table ttemp
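    A variation on the same idea that sidesteps the update/delete matching problem is to slide over the key range directly (again untested, and it assumes tmain.id is a reasonably dense int clustered primary key):

    declare @minId int, @maxId int
    select @minId = min(id), @maxId = max(id) from tmain

    while @minId <= @maxId
    begin
        --each batch is a clustered index range seek, so no rescanning
        update tmain
        set col = 'whatever'
        where id >= @minId and id < @minId + 10000

        set @minId = @minId + 10000
    end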

  • Ninja is correct, and perhaps we both should have made it more clear that the code was to demonstrate the general concept, and not a snippet to be used verbatim without looking at indexes and table usage. (In fact, if nobody is going to be touching the table then it really doesn't matter if you are blocking for several minutes.)

    Very good example with the temp table. There are a lot of variations but it is a much clearer example of how one might approach this problem in the real world.

    Nice one!

    SQL guy and Houston Magician

