Deleting Millions of Records

  • Hello...

    I am in the process of cleaning up some database records. This is a yearly purge type of process for my software application.

    As it sits, my SQL database is 20 GB in size. I have executed the first of three delete statements; each statement is for a different table and will delete about a million records. The first one has been running for about 2 hours now. The transaction log is at 17 GB and is usually around 100 MB. Is there any way for me to delete these records without causing such a massive transaction log?

    I only have about 60 GB of free space on this server and I am not sure how big the transaction log is going to grow.

    I am just trying to find out if I am going about this in the right way.

    Thanks

    Garry

  • What is the recovery model of the database? If SIMPLE, I would batch my deletes and issue a CHECKPOINT after each one, which will cause the log to truncate, freeing up existing space to be reused. If FULL, I would still batch my deletes and then back up the log, which will free up space. I would try to size the batches so that each delete runs in under a minute or two, so the affected table(s) are not locked for long. Part of the reason the deletes are taking so long is the log growth.
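
    A minimal sketch of that batching approach under the SIMPLE recovery model (the table name, purge criteria, and batch size below are placeholders, not the actual schema):

    DECLARE @rows int

    SET @rows = 1

    WHILE @rows > 0
    BEGIN
        -- delete in small batches so each transaction stays short
        DELETE TOP (50000)
        FROM dbo.MyTable                   -- placeholder table name
        WHERE PurgeDate < '2008-01-01'     -- placeholder purge criteria

        SET @rows = @@ROWCOUNT

        CHECKPOINT   -- SIMPLE recovery: lets the inactive log space be reused
    END

    Under the FULL recovery model you would take a transaction log backup between batches instead of issuing a CHECKPOINT.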

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • For a large volume of data deletion, I always unload the data to be kept, truncate the table, and load the data back. You can reduce your overall time for the task considerably.
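
    A rough sketch of that keep/truncate/reload pattern (object names and the keep criteria are placeholders, and this ignores identity columns, triggers, and foreign keys, which the next reply addresses):

    -- stage the rows you want to keep
    SELECT *
    INTO   dbo.MyTable_Keep
    FROM   dbo.MyTable
    WHERE  PurgeDate >= '2008-01-01'

    -- TRUNCATE is minimally logged and frees the pages immediately
    TRUNCATE TABLE dbo.MyTable

    -- reload the keepers, then drop the staging table
    INSERT INTO dbo.MyTable WITH (TABLOCK)
    SELECT * FROM dbo.MyTable_Keep

    DROP TABLE dbo.MyTable_Keep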

  • Kaushal (7/18/2008)


    For a large volume of data deletion, I always unload the data to be kept, truncate the table, and load the data back. You can reduce your overall time for the task considerably.

    But if you have foreign key constraints, you have to remove them before you truncate and remember to reapply them afterwards. I suppose if you script it once it is not a big deal.

    If you are going to do this, you should copy the data to be kept to a new table (SELECT INTO), drop the old table, rename the new table, and apply the FK constraints, as sketched below. This would be faster than the copy out, truncate, copy back, and drop approach.

    I would also think that you would only gain performance if the volume of data you are keeping is smaller than the volume you are deleting.

    This solution also requires that the table(s) be unavailable during the process.
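
    A sketch of that new-table approach (names are placeholders; identity properties, indexes, and permissions would also need to be recreated on the new table):

    -- copy only the rows to keep into a new table
    SELECT *
    INTO   dbo.MyTable_New
    FROM   dbo.MyTable
    WHERE  PurgeDate >= '2008-01-01'

    -- any foreign keys referencing dbo.MyTable must be dropped before this
    DROP TABLE dbo.MyTable

    EXEC sp_rename 'dbo.MyTable_New', 'MyTable'

    -- reapply the foreign key constraints, for example:
    -- ALTER TABLE dbo.Child ADD CONSTRAINT FK_Child_MyTable
    --     FOREIGN KEY (MyTableID) REFERENCES dbo.MyTable (MyTableID)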

    Jack Corbett
    Consultant - Straight Path Solutions
    Check out these links on how to get faster and more accurate answers:
    Forum Etiquette: How to post data/code on a forum to get the best help
    Need an Answer? Actually, No ... You Need a Question

  • Performance-wise, it sounds like you need to break that up into much smaller chunks. You could use something like:

    DECLARE @n int
    DECLARE @rn int

    SET @n = 50000
    SET @rn = 1

    WHILE @rn > 0
    BEGIN
        DELETE TOP (@n)
        FROM delme
        -- insert your criteria here
        SET @rn = @@ROWCOUNT
    END

    Even faster would be to "walk the clustered index", so as to bunch up your deletes as much as possible. It looks fairly similar:

    DECLARE @n int
    DECLARE @batchsize int
    DECLARE @topN int

    SET @batchsize = 50000
    SET @n = 0

    SELECT @topN = MAX(id)   -- id is the clustered key in this case
    FROM delme

    WHILE @n <= @topN        -- <= so the highest id is not skipped
    BEGIN
        DELETE
        FROM delme
        WHERE id BETWEEN @n AND @n + @batchsize
        -- insert your additional criteria here
        SET @n = @n + @batchsize + 1
    END

    This will, however, still record every deletion in the transaction log. You may need to delete a few batches, take a transaction log backup, delete a few more, and back up the log again. Alternatively, you could switch to the SIMPLE recovery model during this time, but that will wreck your point-in-time restores, since it breaks the log backup chain.
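
    Under the FULL recovery model, the log backup taken between batches might look something like this (database name and backup path are placeholders):

    BACKUP LOG MyDatabase
    TO DISK = 'D:\Backups\MyDatabase_log.trn'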

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • gbargsley (7/18/2008)


    Hello...

    I am in the process of cleaning up some database records. This is a yearly purge type of process for my software application.

    As it sits, my SQL database is 20 GB in size. I have executed the first of three delete statements; each statement is for a different table and will delete about a million records. The first one has been running for about 2 hours now. The transaction log is at 17 GB and is usually around 100 MB. Is there any way for me to delete these records without causing such a massive transaction log?

    I only have about 60 GB of free space on this server and I am not sure how big the transaction log is going to grow.

    I am just trying to find out if I am going about this in the right way.

    Thanks

    Garry

    The log is recording every row that is deleted.

    Set your database to bulk-logged recovery mode while deleting the records; it might even run a bit quicker too, as no logging will take place.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Thanks all for the responses. I have decided since it is the end of the year that I will just truncate all data from the three tables and start from scratch.
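
    For reference, truncating the three tables might look like the following (table names are placeholders, and any foreign keys referencing them would have to be dropped first or the truncate will fail):

    TRUNCATE TABLE dbo.Table1
    TRUNCATE TABLE dbo.Table2
    TRUNCATE TABLE dbo.Table3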

  • The bulk-logged recovery model will not help at all, by the way. There is no such thing as a bulk-logged delete.

    I would have gone with the new-table approach as well, copying the required data to the new tables.

    Best Regards,

    Chris Büttner
