Deleting a few million rows

  • Hi,

    I have a production server that has a table with about 6 million rows.

    table definition is as follows:

    CREATE TABLE [dbo].[GICSPFProductUpdateDetail](
        [CompanyID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [DivisionID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [DepartmentID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [GICSPFProductUpdateDetailID] [int] NOT NULL IDENTITY (1,1),
        [GICSPFProductUpdateID] [int] NOT NULL,
        [MfrVendorID] [int] NULL,
        [ProductID] [int] NULL,
        [TransactionCode] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VendorCode] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [MfrCode] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [Catalog] [varchar] (27) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [CatalogNew] [varchar] (27) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndPrdCode] [nvarchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndPrdCodeNew] [nvarchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndCatalog] [nvarchar] (55) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndCatalogNew] [nvarchar] (55) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndCategory] [int] NULL,
        [GICUPC] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [GICUPCNew] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [UPC] [varchar] (14) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [UPCNew] [varchar] (14) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [Zone] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [OldZone] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DescriptionEnglish] [nvarchar] (57) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DescriptionFrench] [nvarchar] (57) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [SPFCategoryID] [int] NULL,
        [PurUOM] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DistrCost] [money] NULL,
        [List] [money] NULL,
        [Col1] [money] NULL,
        [Col2] [money] NULL,
        [Col3] [money] NULL,
        [CashDiscount] [numeric] (2,1) NULL,
        [TaxFlag] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [SellingUOM] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [UnitQty] [int] NULL,
        [CartonQty] [int] NULL,
        [CaseQty] [int] NULL,
        [PalletQty] [int] NULL,
        [MinToOrder] [int] NULL,
        [OrderIncrement] [int] NULL,
        [Weight] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [POA] [bit] NULL,
        [ReplVendorCode] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ReplMfrCode] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ReplGICCatalog] [nvarchar] (27) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ProductStatus] [nvarchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ProductStatusSource] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ProductStatusEffDate] [datetime] NULL,
        [PricePercentage] [numeric] (9,4) NULL,
        [PC] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [LockedBy] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [LockTS] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    ) ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[GICSPFProductUpdateDetail] ADD CONSTRAINT [PK_GICSPFProductUpdateDetail_CoDivDepUpdateHeaderUpdateDetail] PRIMARY KEY CLUSTERED
    (
        [CompanyID] ASC,
        [DivisionID] ASC,
        [DepartmentID] ASC,
        [GICSPFProductUpdateID] ASC,
        [GICSPFProductUpdateDetailID] ASC
    ) WITH
    (
        PAD_INDEX = OFF,
        STATISTICS_NORECOMPUTE = OFF,
        IGNORE_DUP_KEY = OFF,
        ALLOW_ROW_LOCKS = ON,
        ALLOW_PAGE_LOCKS = ON
    ) ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX [IX_GICSPFProductUpdateDetail_ProductID] ON [dbo].[GICSPFProductUpdateDetail]
    (
        [ProductID] ASC
    ) ON [PRIMARY]
    GO

    SET ANSI_PADDING OFF
    GO

    I would like to delete every row that does not have a headerID (GICSPFProductUpdateID) of 12, 19 or 50.

    My query is as follows:

    Begin transaction

    delete from GICSPFproductUpdateDetail where GICSPFProductUpdateID not in (12,19,50)

    But I started this query and it ran forever, locking every row; it was really not performant at all (I stopped it after 2 hours). I had to stop the server completely, because it did not want to stop.

    What is the best way to delete these millions of rows and keep the, let's say, 100k rows I want to keep?

    Thanks in advance!

    J-F

    Cheers,

    J-F

  • You can delete them in small batches in a loop like the following:

    DECLARE @BatchSize INT,
            @Criteria DATETIME,
            @RowCount INT

    SET @BatchSize = 1000
    SET @Criteria = getdate()-60
    SET @RowCount = 1000

    SET ROWCOUNT @BatchSize

    WHILE @RowCount > 0
    BEGIN
        DELETE
        FROM MYTABLE
        WHERE MYCOL < @Criteria

        SELECT @RowCount = @@rowcount
    END

    SET ROWCOUNT 0

    Someone recently pointed out to me that SET ROWCOUNT is being deprecated after SQL 2008 as documented in the Books Online, and TOP is a better method. The following shows how to do it with TOP as well:

    DECLARE @Criteria DATETIME,
            @RowCount INT

    SET @Criteria = getdate()-60
    SET @RowCount = 1000 -- prime the loop so the first iteration runs

    WHILE @RowCount = 1000
    BEGIN
        DELETE TOP(1000)
        FROM MYTABLE
        WHERE MYCOL < @Criteria

        SELECT @RowCount = @@rowcount
    END

    This will minimize locking since it is only working on 1000 rows at a time, and it will also minimize log growth if you are in Simple Recovery.
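
    Adapted to the table in the original post (an untested sketch, assuming SQL Server 2005 or later and the same NOT IN (12,19,50) filter; the 1000-row batch size is just a starting point), it would look something like this:

    DECLARE @RowCount INT
    SET @RowCount = 1000 -- prime the loop so the first iteration runs

    WHILE @RowCount = 1000
    BEGIN
        -- each batch is its own autocommit transaction, so locks are released as the loop goes
        DELETE TOP (1000)
        FROM dbo.GICSPFProductUpdateDetail
        WHERE GICSPFProductUpdateID NOT IN (12, 19, 50)

        SELECT @RowCount = @@rowcount
    END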

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Since you only want to keep a small number of the rows (approx 1 in 60), you would probably find it quickest to

    a) copy those rows that you want to keep into a new table (same definition as current table)

    b) drop the original table

    c) rename the new table

    or

    a) as above

    b) truncate the original table

    c) copy the rows to keep back to the original table

    Obviously this might require some downtime, and is dependent on constraints.

    If that isn't feasible then you need to batch the deletes (i.e. remove them in smaller groups, with a commit and possibly a log backup after each batch). You can start with http://qa.sqlservercentral.com/articles/SQL+Server+2005+-+TSQL/2612/ for information on how to do this; there are other articles and posts with additional information. A sketch of the second option above follows.
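
    A rough sketch of that copy/truncate/copy-back option, assuming a maintenance window, no foreign keys referencing the table, and a holding-table name that is made up here purely for illustration:

    -- 1) copy the ~100k rows to keep into a holding table
    SELECT *
    INTO dbo.GICSPFProductUpdateDetail_Keep
    FROM dbo.GICSPFProductUpdateDetail
    WHERE GICSPFProductUpdateID IN (12, 19, 50);

    -- 2) truncate the original table (minimally logged, keeps the PK and index definitions in place)
    TRUNCATE TABLE dbo.GICSPFProductUpdateDetail;

    -- 3) copy the kept rows back; because of the IDENTITY column this needs an explicit
    --    column list with SET IDENTITY_INSERT ON (the full column list is abbreviated here)
    SET IDENTITY_INSERT dbo.GICSPFProductUpdateDetail ON;
    INSERT INTO dbo.GICSPFProductUpdateDetail ([CompanyID], [DivisionID], [DepartmentID],
        [GICSPFProductUpdateDetailID], [GICSPFProductUpdateID] /* ...remaining columns... */)
    SELECT [CompanyID], [DivisionID], [DepartmentID],
        [GICSPFProductUpdateDetailID], [GICSPFProductUpdateID] /* ...remaining columns... */
    FROM dbo.GICSPFProductUpdateDetail_Keep;
    SET IDENTITY_INSERT dbo.GICSPFProductUpdateDetail OFF;

    -- 4) clean up
    DROP TABLE dbo.GICSPFProductUpdateDetail_Keep;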

  • Can you not just save the records you want into a temp table and then truncate ?

  • Hi, thanks for the replies,

    Unfortunately I cannot move the data and put it back, since the data has to stay available. So I think the 1000-row batch delete is a good solution. I'm running it right now.

    I would like to know why, though. I understand that batching reduces the number of locks, but what is the difference between doing this as one huge 6-million-row delete and as 1000-row batches? Are locks resource-consuming?

    And what about the log file? Yes, I have a simple recovery database, but I'm really interested in what happens in the back end with the logging: why do a lot of small batches work faster than one big one?

    Thanks in advance,

    Cheers,

    J-F

  • The difference is that you are issuing thousands of smaller transactions affecting 1000 rows at a time versus issuing a single transaction that affects millions of rows. The key is that each batched delete transaction commits its changes and releases its locks, its log records can then be truncated from the log at the next checkpoint, and another delete transaction then starts for the next batch of 1000 rows. It has a much smaller footprint, and if you kill it, the rollback segment is limited to the current 1000 rows being affected.

    Whereas with your bulk delete, the rollback, as you experienced, can take as long as or in most cases longer than the original delete ran before being stopped. This is because the log cannot be truncated past the long-running transaction, and all of that logged information now has to be rolled back to make the database consistent.
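
    If you want to see that in action (my addition, not part of the advice above), you can watch log reuse between batches in Simple Recovery, for example:

    -- shows log size and percent of log space used for every database
    DBCC SQLPERF(LOGSPACE)

    -- in Simple Recovery, a CHECKPOINT lets the space used by the committed batches be reused
    CHECKPOINT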

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Jonathan Kehayias (12/4/2008)


    It has a much smaller footprint, and if you kill it, the rollback segment is limited to the current 1000 rows being affected.

    Does that mean I should start a sub-transaction within the WHILE loop?

    Begin transaction
    While @RowCount = 1000
        Begin transaction T1
        Delete from ...
        Commit transaction T1
    END while
    Commit transaction

    Because how can it roll back only the last 1000 rows if they are within the same big transaction? There is something I do not get in this explanation.

    Thanks,

    J-F

    Cheers,

    J-F

  • Here is some code you can work with. If you have a test environment, test it there first. You do need to make some adjustments to the code before using it.

    use [yourDBNameHere]
    go

    declare @BatchSize int,
            @BatchCount int,
            @TLogBackupFile varchar(255);

    set @BatchSize = 100000; -- Delete 100,000 rows at a time
    set @BatchCount = 1;     -- Used for TLog backup file name

    while exists(select 1 from GICSPFproductUpdateDetail where GICSPFProductUpdateID not in (12,19,50))
    begin
        delete top (@BatchSize)
        from
            GICSPFproductUpdateDetail
        where
            GICSPFProductUpdateID not in (12,19,50);

        set @TLogBackupFile = 'C:\Backups\yourDBNameHere_TLog_' + cast(@BatchCount as varchar(8)) + '.trn';
        backup log [yourDBNameHere] to disk = @TLogBackupFile; -- this is to keep the Transaction Log file from growing excessively
        set @BatchCount = @BatchCount + 1;
    end
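
    Since the original post mentions the database is in Simple Recovery, one possible adjustment (mine, not part of the script above) is to drop the log backups, which are only meaningful in Full or Bulk-Logged recovery, and simply checkpoint between batches:

    while exists(select 1 from GICSPFproductUpdateDetail where GICSPFProductUpdateID not in (12,19,50))
    begin
        delete top (100000)
        from GICSPFproductUpdateDetail
        where GICSPFProductUpdateID not in (12,19,50);

        checkpoint; -- in Simple Recovery this lets the log space from the committed batch be reused
    end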

  • Simon Smith (12/4/2008)


    Can you not just save the records you want into a temp table and then truncate ?

    If you can swing this kind of operation, it is generally much faster, but I was a little confused by the original post: it described a 6 million row table and deleting millions of rows, but then said later that only about 100K rows would remain. The batched method works in either case, but if the resulting row count is significantly smaller than the number of rows to be deleted, it is much faster to recreate the table, insert the rows you need into it, drop the old table, rename the new table, and recreate any needed constraints/indexes.
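
    For completeness, a rough outline of that sequence against this table (my sketch; the _New name is made up, and it assumes a window where nothing is writing to the table):

    -- 1) create dbo.GICSPFProductUpdateDetail_New with the same definition as the original
    --    (from the CREATE TABLE script at the top of the thread), then copy the rows to keep
    --    into it (explicit column list plus SET IDENTITY_INSERT to preserve identity values)

    -- 2) swap the tables
    DROP TABLE dbo.GICSPFProductUpdateDetail;
    EXEC sp_rename 'dbo.GICSPFProductUpdateDetail_New', 'GICSPFProductUpdateDetail';

    -- 3) recreate the constraints and indexes from the original definition
    ALTER TABLE dbo.GICSPFProductUpdateDetail
        ADD CONSTRAINT [PK_GICSPFProductUpdateDetail_CoDivDepUpdateHeaderUpdateDetail] PRIMARY KEY CLUSTERED
        ([CompanyID], [DivisionID], [DepartmentID], [GICSPFProductUpdateID], [GICSPFProductUpdateDetailID]);

    CREATE NONCLUSTERED INDEX [IX_GICSPFProductUpdateDetail_ProductID]
        ON dbo.GICSPFProductUpdateDetail ([ProductID]);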

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Jean-François Bergeron (12/4/2008)


    Jonathan Kehayias (12/4/2008)


    It has a much smaller footprint, and if you kill it, the rollback segment is limited to the current 1000 rows being affected.

    Does that mean I should start a sub-transaction within the WHILE loop?

    Begin transaction
    While @RowCount = 1000
        Begin transaction T1
        Delete from ...
        Commit transaction T1
    END while
    Commit transaction

    Because how can it roll back only the last 1000 rows if they are within the same big transaction? There is something I do not get in this explanation.

    Thanks,

    J-F

    No. The DELETE is going to run in an implicit transaction on its own. That is how the last 1000 rows will be rolled back if you stop it in the middle of the script: any DELETE that completed will be gone, but the currently running DELETE will roll back.
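
    Put differently (my illustration, not from the reply above): wrapping each batch in its own explicit transaction behaves the same as letting the bare DELETE autocommit; the important thing is that there is no outer transaction wrapped around the whole loop:

    -- assumes @RowCount has been declared and primed to 1000 as in the earlier loop
    WHILE @RowCount = 1000
    BEGIN
        BEGIN TRANSACTION  -- optional: a bare DELETE would autocommit on its own
            DELETE TOP (1000)
            FROM dbo.GICSPFProductUpdateDetail
            WHERE GICSPFProductUpdateID NOT IN (12, 19, 50)

            SELECT @RowCount = @@rowcount
        COMMIT TRANSACTION -- locks are released and the log can be reused after each batch
    END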

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Thanks a lot Jonathan,

    That makes it a lot clearer in my mind. 😉

    By the way, the operation is almost done, only got 1 million rows left to delete.

    Thanks again,

    P.S. Lynn, thanks for the script; I like the idea with the log backup.

    J-F.

    Cheers,

    J-F

  • Jean-François Bergeron (12/4/2008)


    P.S. Lynn, Thanks for the script, I like the idea with the log backup,

    I agree that is a nifty script, and I will have to add it to my toolbelt. It doesn't work so well in my own environment, since I use third-party backup tools and a VLS, but it would certainly work for most places I have seen.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
