Transaction Log Growth During a Delete

  • Our company has a database containing a single table that is used to log actions in our Mailer process. This database can grow larger than I would prefer in a short period of time. I am constantly having to shrink it and the other tables used in the Mailer process. The logging table currently has 156,787,290 records in it and the database has grown to 30GB with the log file at 2GB.

    I am perplexed by how large the transaction log grows if I attempt to delete most of the records in the table. I attempted to delete 151,757,051 of the records from the table yesterday, on a dev server, to determine how long it would take and how large the database transaction log file would grow. I let the delete command run for 45 minutes, but I had to cancel it due to time constraints and because the log file grew to 84GB.

    Why would the log file grow to almost three times the size of the original database? I also noticed that most of the records in the table did NOT get deleted.

    I realize that this is not the best way to clear this table. I would normally stop the Mailer process, copy one month of data to another table in another database, truncate the original table and then restore the copied records. This process gets more tedious and difficult when I have to delete the records from the multiple databases used in the entire Mailer process.  Those databases contain numerous tables and get very large if I do not delete the records on a monthly basis.
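
    For reference, that monthly routine looks roughly like the sketch below (the database, table, and column names are only placeholders, not the real schema):

    -- copy out the month of rows to keep
    SELECT *
    INTO   ArchiveDB.dbo.tbMailerLog_LastMonth
    FROM   MailerDB.dbo.tbMailerLog
    WHERE  LogDateTime >= DATEADD(MONTH, -1, GETDATE());

    -- TRUNCATE is minimally logged, so the log file stays small
    TRUNCATE TABLE MailerDB.dbo.tbMailerLog;

    -- put the kept month back
    -- (if the table has an IDENTITY column, use an explicit column list
    --  and SET IDENTITY_INSERT ... ON around this INSERT)
    INSERT INTO MailerDB.dbo.tbMailerLog
    SELECT * FROM ArchiveDB.dbo.tbMailerLog_LastMonth;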

    Thank you in advance,

    Brian

    Part-Time DBA

  • The log can grow larger than the data itself (it also has to log index activity) because everything must be kept to make a rollback possible.

    Delete in small batches to minimize duration and locking (a long-running delete can also be blocked by other transactions).

    Perhaps you can copy the keys you want to delete into a temporary table and join to it from the delete (instead of a complex CTE); see the sketch below.

    Is there an index that helps with the deletes? (Otherwise the whole table has to be scanned to find which rows to delete.)

    It is easier to help when there is sample data/DDL information.
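
    A rough sketch of that temp-table idea combined with batching (the table and column names here are only examples, not your schema):

    -- collect the keys to delete once, so each delete batch is a simple join
    SELECT LogID
    INTO   #ToDelete
    FROM   dbo.MyLogTable
    WHERE  LogDate < DATEADD(DAY, -30, GETDATE());

    WHILE 1 = 1
    BEGIN
        DELETE TOP (100000) t
        FROM   dbo.MyLogTable AS t
               JOIN #ToDelete AS d ON d.LogID = t.LogID;

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to delete
    END;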

  • You can make a loop that deletes X records at a time and includes a transaction log backup in each iteration.

    You will need disk space for the backups, but your LDF should not get huge.

    If you do this monthly, maybe schedule it to run during a quiet activity time.

    NextDelete:

    WAITFOR DELAY '00:00:03';    -- don't hog the server

    BACKUP LOG MyDatabase        -- database name and backup path are just examples
        TO DISK = N'D:\Backups\MyDatabase_log.trn';

    DELETE TOP (100000)
    FROM   My_Table
    WHERE  ArchiveDate < DATEADD(DAY, -30, GETDATE());

    IF @@ROWCOUNT > 0 GOTO NextDelete;

  • homebrew01,

    That is an interesting suggestion that I may consider for deleting both the log records and the process records. I would like to use cascading deletes to delete the process records, but record locking has been an issue in the past. Your method should help eliminate record locking, which would allow the Mailer processes to keep running during the delete.

    Thank you very much!

  • Jo,

    Below is the info you requested:

    -- DDL for Log Table:

    CREATE TABLE [dbo].[tbNewMatchBatchLog](
        [NewMatchBatchLogID] [bigint] IDENTITY(1,1) NOT NULL,
        [BatchType] [char](1) NOT NULL,
        [NewMatchBatchID] [int] NULL,
        [NewMatchBatchRunTimeID] [int] NULL,
        [AgentSavedCriteriaID] [int] NULL,
        [ProcedureName] [varchar](100) NOT NULL,
        [Action] [varchar](100) NOT NULL,
        [StartDateTime] [datetime2](7) NOT NULL,
        [EndDateTime] [datetime2](7) NULL,
        [Records] [int] NULL,
        CONSTRAINT [PK_tbNewMatchBatchLog] PRIMARY KEY CLUSTERED
        (
            [NewMatchBatchLogID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    -- SQL to Delete every record except last 30 days

    select top 1 NewMatchBatchLogID
    from NewMatchesLog.dbo.tbNewMatchBatchLog
    where StartDateTime < dateadd(Day, -30, getdate())
    order by NewMatchBatchLogID desc

    delete
    from NewMatchesLog.dbo.tbNewMatchBatchLog
    where NewMatchBatchLogID <= 686438997

  • For best performance, make sure you have the log table properly clustered. Specifically, if you intend to delete by log time, as is quite typical, then cluster the table that way. If an identity is available on the table, add that to the clustering key to make it unique.

    CREATE TABLE <log_table> ( log_date datetime2 NOT NULL, id int IDENTITY(1, 1) NOT NULL, ... )

    CREATE CLUSTERED INDEX ... ON ... ( log_date, id ) WITH ...

    Do the deletes in batches/loops as shown above, but based on the MIN(log_date) remaining in the table going forward. That is, delete the rows in clustering key order. This will ensure a very fast start to each delete.

    Also, be sure to do a CHECKPOINT as part of your process.  The log can't be trimmed until a CHECKPOINT occurs, even if it's already been backed up.

    So, the sequence would be:

    DELETE TOP (nnnnnn) rows

    CHECKPOINT

    WAITFOR

    BACKUP log
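
    A rough sketch of that sequence against the log table posted above, assuming it has been re-clustered on StartDateTime as suggested (the batch size, delay, and backup path are just placeholders):

    USE NewMatchesLog;   -- so the CHECKPOINT applies to the log database
    GO

    DECLARE @cutoff datetime2 = DATEADD(DAY, -30, GETDATE());

    WHILE 1 = 1
    BEGIN
        DELETE TOP (100000)
        FROM   dbo.tbNewMatchBatchLog
        WHERE  StartDateTime < @cutoff;   -- deletes from the front of the date-based clustered index

        IF @@ROWCOUNT = 0 BREAK;          -- nothing older than the cutoff remains

        CHECKPOINT;

        WAITFOR DELAY '00:00:03';

        BACKUP LOG NewMatchesLog          -- only applicable in FULL/BULK_LOGGED recovery; path is an example
            TO DISK = N'D:\Backups\NewMatchesLog_log.trn';
    END;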

     

    SQL DBA, SQL Server MVP (07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!
