Delete records from huge table

  • Performance is the main criterion. What is the best way to delete 200,000 records daily from a table with 5-7 million records?

    1. Directly use a delete statement

    2. Create a temp table, insert the 5-7 million records you want to keep into it, truncate the original table, and insert the necessary records back (a rough sketch of this follows below).

    Any other approach?

    Note: The ID column is an identity column with a clustered index
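
    For reference, a minimal sketch of option 2 (BIGTABLE is a stand-in name and the WHERE clause is a placeholder; because ID is an identity column, the reload needs IDENTITY_INSERT and an explicit column list):

    select *
    into #keepRows
    from BIGTABLE
    where ... -- (where clause for the rows to KEEP)

    truncate table BIGTABLE

    set identity_insert BIGTABLE on

    insert into BIGTABLE (ID, ...)  -- list every column explicitly so the ID values are preserved
    select ID, ...
    from #keepRows

    set identity_insert BIGTABLE off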

  • With the clustered index on ID I would propose the following:

    select into a temp table the IDs that qualify for being deleted (use NOLOCK if you can)

    delete from the main table by joining to your temp table. Depending upon how the candidate IDs are dispersed throughout the table and/or the size of the rows, you may need to limit the deletes to smaller batches using TOP (N) in a WHILE loop.

    The probability of survival is inversely proportional to the angle of arrival.

  • sturner,

    How do I limit them? Can you please provide me an example?

    Thanks

    For limiting rows you have two options. If the IDs to be deleted are pretty well grouped together and mostly contiguous, you can divide up the deletes by ID ranges. Or, in the simplest case, you can just limit each delete to a fixed number of rows and keep deleting until they are all gone.

    just as an example:

    declare @nrows int

    -- collect the IDs of the rows to be deleted
    select ID
    into #deleteList
    from BIGTABLE
    where ... -- (where clause to find candidates for deletion)

    set @nrows = @@ROWCOUNT

    -- delete in batches of 10,000 until no matching rows remain
    while @nrows > 0
    begin
        delete top(10000) A
        from BIGTABLE A
        join #deleteList B on A.ID = B.ID

        set @nrows = @@ROWCOUNT
    end

    You could also get the min and max ID from #deleteList and divvy up the deletes evenly based on the dispersion of the IDs (a rough sketch follows). The logic would be a bit more complicated, but it would allow you another level of control over the deletes using a WHERE clause versus TOP(n).
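
    As a simple variant of that idea (not a tuned solution), this walks the ID range in fixed-width slices; the 50,000 batch width is just an assumed value to adjust for your data:

    declare @minID int, @maxID int, @batchSize int

    set @batchSize = 50000   -- assumed batch width; tune to your data

    select @minID = min(ID), @maxID = max(ID) from #deleteList

    -- delete one ID-range slice per iteration instead of using top(n)
    while @minID <= @maxID
    begin
        delete A
        from BIGTABLE A
        join #deleteList B on A.ID = B.ID
        where A.ID >= @minID and A.ID < @minID + @batchSize

        set @minID = @minID + @batchSize
    end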

    The probability of survival is inversely proportional to the angle of arrival.

  • I will definitely try it out. Thanks.

  • I like sturner's code. The one thing I might change is adding a delay between loops, especially if this might take enough time to cross a log backup. That can help limit log growth.
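
    For example, the delay could be a WAITFOR at the end of the loop body (the five-second pause here is just an assumed starting point):

    while @nrows > 0
    begin
        delete top(10000) A
        from BIGTABLE A
        join #deleteList B on A.ID = B.ID

        set @nrows = @@ROWCOUNT

        -- pause between batches so log backups have a chance to run
        waitfor delay '00:00:05'
    end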

  • Good point Steve, I should have mentioned that as well. If your goal is to minimize the size of the transaction log then pausing long enough between iterations to allow the log to be backed up will do the trick.

    Even if that is not an issue, breaking it up in smaller batches will reduce the size of the footprint and reduce blocking and/or lock escalation.

    The probability of survival is inversely proportional to the angle of arrival.

