Delete 20 million rows from a large table FAST!!!

  • Hey gurus and bright minds,

    I need a few ideas on how I can delete 20 million rows from one table FAST!!!  Last time I tried it, the transaction log filled up and rolled back the whole job.  The process took over an hour.

    I've heard suggestions to commit after every few thousand rows.  Is there any good sample code out there for that?  Also, would using TRUNCATE TABLE instead of DELETE help?

    thanks,

    JON

    DBA/OLAP Developer -
    San Francisco, CA

  • I would use the TRUNCATE TABLE command. The individual row deletions are not logged in the transaction log, e.g.

    TRUNCATE TABLE authors

    You need to be the table owner, a member of the sysadmin fixed server role, or a member of the db_owner or db_ddladmin fixed database roles to run this command.

    hope this helps

    Dave

  • Sorry, I've got to clarify that I do not want to remove all rows.

    I need to delete rows based on specific criteria.  Unless I am mistaken, TRUNCATE TABLE will not work for me... Any other ideas?

    thanks

    JON

    DBA/OLAP Developer -
    San Francisco, CA

  • You could try:

    DECLARE @x INT
    SET @x = 1
    SET ROWCOUNT 1000

    WHILE @x > 0
    BEGIN
        BEGIN TRAN

        --DELETE STATEMENT HERE

        SET @x = @@ROWCOUNT
        COMMIT TRAN
    END

    However, with such a large dataset to remove, it will still take a while and the transaction log will still grow. You can either set the recovery model to Simple ("trunc. log on chkpt." in SQL 7) or set a stopping point so you can truncate the log yourself.
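
    If you go the Simple route, here is a minimal sketch of the switch (MyDatabase is just a placeholder name):

    -- SQL Server 2000: simple recovery lets the log truncate on checkpoint
    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE

    -- SQL Server 7.0 equivalent: turn on "trunc. log on chkpt."
    EXEC sp_dboption 'MyDatabase', 'trunc. log on chkpt.', 'true'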

  • Thanks a lot!!  That works!  But I thought that even with the recovery model set to Simple, it would still log all the deletes.  Do you have any idea how big the log will grow?  Do I need to truncate the log during my big loop?

    Thanks

    DBA/OLAP Developer -
    San Francisco, CA

  • It shouldn't grow that big if you are committing as Antares suggested. You can always make the batch smaller, say 100 rows, and add a small delay of a second or so between loops, or issue a CHECKPOINT to allow the log to clear.
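
    For example, a sketch along those lines (the 100-row batch and one-second delay are arbitrary values, and the DELETE is still a placeholder):

    DECLARE @x INT
    SET @x = 1
    SET ROWCOUNT 100                 -- smaller batch

    WHILE @x > 0
    BEGIN
        BEGIN TRAN
        --DELETE STATEMENT HERE
        SET @x = @@ROWCOUNT
        COMMIT TRAN

        CHECKPOINT                   -- let the log clear (simple recovery)
        WAITFOR DELAY '00:00:01'     -- one-second pause between batches
    END

    SET ROWCOUNT 0                   -- reset when finished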

  • Another technique is to create a new table containing the rows you DON'T want to delete. You can create the new table using SELECT INTO (which is minimally logged), then drop the old table and rename the new one.

    Depending on how many rows will remain in the table after deletion, this may be quicker.

    Also, things like indexes, FK constraints, etc. may make this impractical.
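
    Roughly like this, for example (BigTable, KeepTable and the WHERE clause are placeholders; indexes, constraints, triggers and permissions would all have to be recreated afterwards):

    -- copy the rows you want to keep (SELECT INTO is minimally logged)
    SELECT *
    INTO   dbo.KeepTable
    FROM   dbo.BigTable
    WHERE  PurgeFlag = 0             -- placeholder for the "keep" criteria

    -- swap the tables
    DROP TABLE dbo.BigTable
    EXEC sp_rename 'dbo.KeepTable', 'BigTable'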

  • If you still want the delete transactions to be logged (i.e. remain in full recovery mode), but you don't want to fill up your transaction log, back it up manually from time to time, e.g. (continuing from Antares' script):

    DECLARE @x INT
    SET @x = 1
    SET ROWCOUNT 1000

    WHILE @x > 0
    BEGIN
        BEGIN TRAN

        --DELETE STATEMENT HERE

        SET @x = @@ROWCOUNT
        COMMIT TRAN

        DUMP TRAN MyDatabase TO DISK = 'D:\Microsoft SQL Server\MSSQL\BACKUP\MyDatabase.TRN'
    END

    This will dump the transaction log after every 1000 deleted rows.

    Warning: if your table is very large, the MyDatabase.TRN file may become very large.
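
    (DUMP TRAN still works in SQL Server 2000, but the equivalent in the newer syntax would be something like:)

    BACKUP LOG MyDatabase TO DISK = 'D:\Microsoft SQL Server\MSSQL\BACKUP\MyDatabase.TRN'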

    Bert

  • One technique I've used is to create a view which selects the rows you wish to keep, BCP out using the view, truncate the table, drop the indexes, BCP the rows back in, then rebuild the indexes. If you are dealing with a clustered index you can order the rows in the view when you BCP out (using TOP 100 PERCENT with an ORDER BY) and then recreate the index with the SORTED_DATA option after the BCP in.
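
    If it helps, here is a rough sketch of that approach (view, table, column, server and file names are all just examples to adapt):

    -- 1) View selecting the rows to keep, ordered for the clustered index
    CREATE VIEW dbo.vRowsToKeep
    AS
    SELECT TOP 100 PERCENT *
    FROM   dbo.BigTable
    WHERE  PurgeFlag = 0             -- placeholder "keep" criteria
    ORDER  BY ClusteredKeyCol        -- order of the clustered index
    GO

    -- 2) From the command prompt: BCP out, then after truncating the table
    --    and dropping the indexes, BCP back in:
    --    bcp MyDB.dbo.vRowsToKeep out C:\keep.dat -n -S MyServer -T
    --    bcp MyDB.dbo.BigTable    in  C:\keep.dat -n -S MyServer -T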

  • Jonathan,

    Keep in mind that the reason you are doing the delete in batches, as suggested above, is to allow a commit to happen while your delete keeps running. This allows the committed part of the log to be truncated (in simple mode) or backed up, if you have a log backup running during the process. If your process runs so fast that the truncate/backup does not happen in time, the log can still fill up.

    I don't think an explicit commit is required; just stopping at X rows per loop iteration should cause the commit to happen just fine.
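
    If you want to keep an eye on it while the loop runs, one way (not part of the script above) is to check log usage with:

    -- reports log size and percent of log space used for each database
    DBCC SQLPERF(LOGSPACE)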

    Chris

  • To tag onto ianscarlett's reply: copy the rows you DON'T want to delete into a temp table using SELECT INTO (minimally logged), TRUNCATE the original table (not logged row-by-row, deletes everything), but instead of dropping the old table and renaming the new one, copy the "saved" rows back into the original table and drop the temporary table.  This should eliminate any issues with indexes, FKs, etc., unless temporarily removing all the records will cause referential integrity problems.
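
    A rough sketch of that variation (table and column names are placeholders; note that tempdb needs enough room for the saved rows, and an identity column would need SET IDENTITY_INSERT handling):

    -- save the rows to keep (SELECT INTO a temp table is minimally logged)
    SELECT *
    INTO   #keep
    FROM   dbo.BigTable
    WHERE  PurgeFlag = 0             -- placeholder "keep" criteria

    -- wipe the original table without logging every row
    TRUNCATE TABLE dbo.BigTable

    -- put the saved rows back and clean up
    INSERT INTO dbo.BigTable SELECT * FROM #keep
    DROP TABLE #keep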

  • Thanks for all of your responses!  You guys are awesome!

    Anyway, I set my database recovery model to Simple, used SET ROWCOUNT 100000, and deleted based on my criteria.  WOW, it took less than 40 seconds to complete.  I also tested the same delete query under the Full recovery model, and it also performed really well.  So my conclusion is:

    I think my mistake was not using ROWCOUNT.  By trying to delete millions of records at once, I filled up the entire log because SQL was treating it as one big transaction.

    If I loop through the delete and commit every 100,000 rows, the process is so much faster and the log doesn't fill up at all!

    I'm just curious.  Since the delete under the Full recovery model completed pretty quickly, should I dump/truncate the log in the middle of my loop?  When does the log get truncated?  Is there a rule for how often we should truncate the log?  Does SQL Server do it automatically?  If it's Simple mode, does that mean SQL truncates the log at every checkpoint?  What about Full mode?

    Thanks!

    JON

    DBA/OLAP Developer -
    San Francisco, CA

  • Well, back on the first page it looked like simple recovery mode was an option for you?

    If so, and this is something you only have to do once in a blue moon: since the TLOG auto-truncates at 70% full in simple recovery mode, why not put the DB into simple recovery, delete your rows (if the TLOG hits 70% full it will truncate), and empty your table that way?  Then put the DB back into FULL or whatever and back up the DB.
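
    I.e., something along these lines (database name, backup path and the delete loop itself are placeholders):

    ALTER DATABASE MyDatabase SET RECOVERY SIMPLE

    -- run the batched delete loop from earlier in the thread here

    ALTER DATABASE MyDatabase SET RECOVERY FULL

    -- full backup so the log backup chain starts fresh
    BACKUP DATABASE MyDatabase
    TO DISK = 'D:\Microsoft SQL Server\MSSQL\BACKUP\MyDatabase_full.BAK'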

     

    Just a suggestion.

  • I don't have control over the delete routine (it's part of a closed-source package), so is there a setting in the database manager (SQL Server 2000) that will auto-commit after N transactions?  Time is not an issue; I'd rather get the job done in 4 hours than not at all, as long as I can keep the transaction log from filling up the disk space.

    TIA,

    Edward Hooper

    ehooper@princesscruises.com

  • To the best of my knowledge, there is no such setting to commit after N transactions.
