Delete records from huge table

  • Performance is the main criterion. What is the best way to delete 200,000 records daily from a table with 5-7 million records?

    1. Directly use a delete statement

    2. Create a temp table, insert the 5-7 million records you want to keep into it, truncate the original table, and insert the necessary records back (a rough sketch of this follows below).

    Any other approach?

    Note: The ID column is an identity column with a clustered index
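
    For reference, a minimal sketch of option 2 (BIGTABLE is a stand-in name and the WHERE clause is a placeholder; because ID is an identity column, the reload needs IDENTITY_INSERT and an explicit column list):

    select *
    into #keepRows
    from BIGTABLE
    where ... -- (where clause for the rows to KEEP)

    truncate table BIGTABLE

    set identity_insert BIGTABLE on

    insert into BIGTABLE (ID, ...)  -- list every column explicitly so the ID values are preserved
    select ID, ...
    from #keepRows

    set identity_insert BIGTABLE off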

  • With the clustered index on ID I would propose the following:

    select into a temp table the IDs that qualify for being deleted (use NOLOCK if you can)

    delete from the main table by joining to your temp table. Depending upon how the candidate IDs are dispersed throughout the table and/or the size of the rows, you may need to limit the deletes to smaller batches using TOP (N) in a WHILE loop.

    The probability of survival is inversely proportional to the angle of arrival.

  • sturner,

    How do I limit them? Can you please provide me an example?

    Thanks

    For limiting rows you have two options. If the IDs to be deleted are pretty well grouped together and mostly contiguous, you can divide up the deletes by ID ranges. Or, in the simplest case, you can just limit each delete to a fixed number of rows and keep deleting until they are all gone.

    just as an example:

    declare @nrows int

    -- collect the IDs of the rows to be deleted
    select ID
    into #deleteList
    from BIGTABLE
    where ... -- (where clause to find candidates for deletion)

    set @nrows = @@ROWCOUNT

    -- delete in batches of 10,000 until no matching rows remain
    while @nrows > 0
    begin
        delete top(10000) A
        from BIGTABLE A
        join #deleteList B on A.ID = B.ID

        set @nrows = @@ROWCOUNT
    end

    You could also get the min and max ID from #deleteList and divvy up the deletes evenly based on the dispersion of the IDs (a rough sketch follows). The logic would be a bit more complicated, but it would allow you another level of control over the deletes using a WHERE clause versus TOP(n).
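
    As a simple variant of that idea (not a tuned solution), this walks the ID range in fixed-width slices; the 50,000 batch width is just an assumed value to adjust for your data:

    declare @minID int, @maxID int, @batchSize int

    set @batchSize = 50000   -- assumed batch width; tune to your data

    select @minID = min(ID), @maxID = max(ID) from #deleteList

    -- delete one ID-range slice per iteration instead of using top(n)
    while @minID <= @maxID
    begin
        delete A
        from BIGTABLE A
        join #deleteList B on A.ID = B.ID
        where A.ID >= @minID and A.ID < @minID + @batchSize

        set @minID = @minID + @batchSize
    end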

    The probability of survival is inversely proportional to the angle of arrival.

  • I will definitely try it out. Thanks.

  • I like sturner's code. The one thing I might change is adding a delay between loops, especially if this might take enough time to cross a log backup. That can help limit log growth.
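
    For example, the delay could be a WAITFOR at the end of the loop body (the five-second pause here is just an assumed starting point):

    while @nrows > 0
    begin
        delete top(10000) A
        from BIGTABLE A
        join #deleteList B on A.ID = B.ID

        set @nrows = @@ROWCOUNT

        -- pause between batches so log backups have a chance to run
        waitfor delay '00:00:05'
    end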

  • Good point Steve, I should have mentioned that as well. If your goal is to minimize the size of the transaction log then pausing long enough between iterations to allow the log to be backed up will do the trick.

    Even if that is not an issue, breaking it up in smaller batches will reduce the size of the footprint and reduce blocking and/or lock escalation.

    The probability of survival is inversely proportional to the angle of arrival.

