Deleting a few million rows

  • Hi,

    I have a production server that has a table with about 6 million rows.

    table definition is as follows:

    CREATE TABLE [dbo].[GICSPFProductUpdateDetail](
        [CompanyID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [DivisionID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [DepartmentID] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
        [GICSPFProductUpdateDetailID] [int] NOT NULL IDENTITY (1,1),
        [GICSPFProductUpdateID] [int] NOT NULL,
        [MfrVendorID] [int] NULL,
        [ProductID] [int] NULL,
        [TransactionCode] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VendorCode] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [MfrCode] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [Catalog] [varchar] (27) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [CatalogNew] [varchar] (27) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndPrdCode] [nvarchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndPrdCodeNew] [nvarchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndCatalog] [nvarchar] (55) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndCatalogNew] [nvarchar] (55) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [VndCategory] [int] NULL,
        [GICUPC] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [GICUPCNew] [varchar] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [UPC] [varchar] (14) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [UPCNew] [varchar] (14) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [Zone] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [OldZone] [varchar] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DescriptionEnglish] [nvarchar] (57) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DescriptionFrench] [nvarchar] (57) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [SPFCategoryID] [int] NULL,
        [PurUOM] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [DistrCost] [money] NULL,
        [List] [money] NULL,
        [Col1] [money] NULL,
        [Col2] [money] NULL,
        [Col3] [money] NULL,
        [CashDiscount] [numeric] (2,1) NULL,
        [TaxFlag] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [SellingUOM] [varchar] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [UnitQty] [int] NULL,
        [CartonQty] [int] NULL,
        [CaseQty] [int] NULL,
        [PalletQty] [int] NULL,
        [MinToOrder] [int] NULL,
        [OrderIncrement] [int] NULL,
        [Weight] [varchar] (15) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [POA] [bit] NULL,
        [ReplVendorCode] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ReplMfrCode] [varchar] (5) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ReplGICCatalog] [nvarchar] (27) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ProductStatus] [nvarchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ProductStatusSource] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [ProductStatusEffDate] [datetime] NULL,
        [PricePercentage] [numeric] (9,4) NULL,
        [PC] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [LockedBy] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [LockTS] [nvarchar] (36) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
    ) ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[GICSPFProductUpdateDetail] ADD CONSTRAINT [PK_GICSPFProductUpdateDetail_CoDivDepUpdateHeaderUpdateDetail] PRIMARY KEY CLUSTERED
    (
        [CompanyID] ASC,
        [DivisionID] ASC,
        [DepartmentID] ASC,
        [GICSPFProductUpdateID] ASC,
        [GICSPFProductUpdateDetailID] ASC
    ) WITH
    (
        PAD_INDEX = OFF,
        STATISTICS_NORECOMPUTE = OFF,
        IGNORE_DUP_KEY = OFF,
        ALLOW_ROW_LOCKS = ON,
        ALLOW_PAGE_LOCKS = ON
    ) ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX [IX_GICSPFProductUpdateDetail_ProductID] ON [dbo].[GICSPFProductUpdateDetail]
    (
        [ProductID] ASC
    ) ON [PRIMARY]
    GO

    SET ANSI_PADDING OFF
    GO

    I would like to delete every row that does not have a headerID (GICSPFProductUpdateID) of 12, 19 or 50.

    My query is as follows:

    Begin transaction

    delete from GICSPFproductUpdateDetail where GICSPFProductUpdateID not in (12,19,50)

    But I started this query and it ran forever, locking every row; it was really not performant at all (I stopped it after 2 hours). I had to stop the server completely, because it did not want to stop.

    What is the best way to delete these millions of rows and keep the, let's say, 100k rows I want to keep?

    Thanks in advance!

    J-F

    Cheers,

    J-F

  • You can delete them in small batches in a loop like the following:

    DECLARE @BatchSize INT,
            @Criteria DATETIME,
            @RowCount INT

    SET @BatchSize = 1000
    SET @Criteria = getdate()-60
    SET @RowCount = 1000

    SET ROWCOUNT @BatchSize

    WHILE @RowCount > 0
    BEGIN
        DELETE
        FROM MYTABLE
        WHERE MYCOL < @Criteria

        SELECT @RowCount = @@rowcount
    END

    SET ROWCOUNT 0

    Someone recently pointed out to me that SET ROWCOUNT is being deprecated after SQL 2008 as documented in the Books Online, and TOP is a better method. The following shows how to do it with TOP as well:

    DECLARE @Criteria DATETIME,
            @RowCount INT

    SET @Criteria = getdate()-60
    SET @RowCount = 1000 -- prime the loop so the first iteration runs

    WHILE @RowCount = 1000
    BEGIN
        DELETE TOP(1000)
        FROM MYTABLE
        WHERE MYCOL < @Criteria

        SELECT @RowCount = @@rowcount
    END

    This will minimize locking since it is only working on 1000 rows at a time, and it will also minimize log growth if you are in Simple Recovery.
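
    Adapted to the table in the original post (an untested sketch, assuming SQL Server 2005 or later and the same NOT IN (12,19,50) filter; the 1000-row batch size is just a starting point), it would look something like this:

    DECLARE @RowCount INT
    SET @RowCount = 1000 -- prime the loop so the first iteration runs

    WHILE @RowCount = 1000
    BEGIN
        -- each batch is its own autocommit transaction, so locks are released as the loop goes
        DELETE TOP (1000)
        FROM dbo.GICSPFProductUpdateDetail
        WHERE GICSPFProductUpdateID NOT IN (12, 19, 50)

        SELECT @RowCount = @@rowcount
    END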

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Since you only want to keep a small number of the rows (approx 1 in 60), you would probably find it quickest to

    a) copy those rows that you want to keep into a new table (same definition as current table)

    b) drop the original table

    c) rename the new table

    or

    a) as above

    b) truncate the original table

    c) copy the rows to keep back to the original table

    Obviously this might require some downtime, and is dependent on constraints.

    If that isn't feasible then you need to batch the deletes (i.e. remove them in smaller groups, with a commit and possibly a log backup after each batch). You can start with http://qa.sqlservercentral.com/articles/SQL+Server+2005+-+TSQL/2612/ for information on how to do this; there are other articles and posts with additional information. A sketch of the second option above follows.
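
    A rough sketch of that copy/truncate/copy-back option, assuming a maintenance window, no foreign keys referencing the table, and a holding-table name that is made up here purely for illustration:

    -- 1) copy the ~100k rows to keep into a holding table
    SELECT *
    INTO dbo.GICSPFProductUpdateDetail_Keep
    FROM dbo.GICSPFProductUpdateDetail
    WHERE GICSPFProductUpdateID IN (12, 19, 50);

    -- 2) truncate the original table (minimally logged, keeps the PK and index definitions in place)
    TRUNCATE TABLE dbo.GICSPFProductUpdateDetail;

    -- 3) copy the kept rows back; because of the IDENTITY column this needs an explicit
    --    column list with SET IDENTITY_INSERT ON (the full column list is abbreviated here)
    SET IDENTITY_INSERT dbo.GICSPFProductUpdateDetail ON;
    INSERT INTO dbo.GICSPFProductUpdateDetail ([CompanyID], [DivisionID], [DepartmentID],
        [GICSPFProductUpdateDetailID], [GICSPFProductUpdateID] /* ...remaining columns... */)
    SELECT [CompanyID], [DivisionID], [DepartmentID],
        [GICSPFProductUpdateDetailID], [GICSPFProductUpdateID] /* ...remaining columns... */
    FROM dbo.GICSPFProductUpdateDetail_Keep;
    SET IDENTITY_INSERT dbo.GICSPFProductUpdateDetail OFF;

    -- 4) clean up
    DROP TABLE dbo.GICSPFProductUpdateDetail_Keep;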

  • Can you not just save the records you want into a temp table and then truncate ?

  • Hi, thanks for the replies,

    Unfortunately I cannot move the data and put it back, since the data has to stay available. So I think the 1000-row batch delete is a good solution. I'm running it right now.

    I would like to know why, though. I understand that batching reduces the number of locks, but what is the difference between doing this as one huge 6-million-row delete and as 1000-row batches? Are locks resource-consuming?

    And what about the log file? Yes, I have a simple recovery database, but I'm really interested in what happens in the back end with the logging: why do a lot of small batches work faster than one big one?

    Thanks in advance,

    Cheers,

    J-F

  • The difference is that you are issuing thousands of smaller transactions affecting 1000 rows at a time versus issuing a single transaction that affects millions of rows. The key is that each batched delete transaction commits its changes and releases its locks, its log records can then be truncated from the log at the next checkpoint, and another delete transaction then starts for the next batch of 1000 rows. It has a much smaller footprint, and if you kill it, the rollback segment is limited to the current 1000 rows being affected.

    Whereas with your bulk delete, the rollback, as you experienced, can take as long as or in most cases longer than the original delete ran before being stopped. This is because the log cannot be truncated past the long-running transaction, and all of that logged information now has to be rolled back to make the database consistent.
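
    If you want to see that in action (my addition, not part of the advice above), you can watch log reuse between batches in Simple Recovery, for example:

    -- shows log size and percent of log space used for every database
    DBCC SQLPERF(LOGSPACE)

    -- in Simple Recovery, a CHECKPOINT lets the space used by the committed batches be reused
    CHECKPOINT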

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Jonathan Kehayias (12/4/2008)


    It has a much smaller footprint, and if you kill it, the rollback segment is limited to the current 1000 rows being affected.

    Does that mean I should start a sub-transaction within the WHILE loop?

    Begin transaction
    While @RowCount = 1000
        Begin transaction T1
        Delete from ...
        Commit transaction T1
    END while
    Commit transaction

    Because how can it roll back only the last 1000 rows if they are within the same big transaction? There is something I do not get in this explanation.

    Thanks,

    J-F

    Cheers,

    J-F

  • Here is some code you can work with. If you have a test environment, test it there first. You do need to make some adjustments to the code before using it.

    use [yourDBNameHere]
    go

    declare @BatchSize int,
            @BatchCount int,
            @TLogBackupFile varchar(255);

    set @BatchSize = 100000; -- Delete 100,000 rows at a time
    set @BatchCount = 1;     -- Used for TLog backup file name

    while exists(select 1 from GICSPFproductUpdateDetail where GICSPFProductUpdateID not in (12,19,50))
    begin
        delete top (@BatchSize)
        from
            GICSPFproductUpdateDetail
        where
            GICSPFProductUpdateID not in (12,19,50);

        set @TLogBackupFile = 'C:\Backups\yourDBNameHere_TLog_' + cast(@BatchCount as varchar(8)) + '.trn';
        backup log [yourDBNameHere] to disk = @TLogBackupFile; -- this is to keep the Transaction Log file from growing excessively
        set @BatchCount = @BatchCount + 1;
    end
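
    Since the original post mentions the database is in Simple Recovery, one possible adjustment (mine, not part of the script above) is to drop the log backups, which are only meaningful in Full or Bulk-Logged recovery, and simply checkpoint between batches:

    while exists(select 1 from GICSPFproductUpdateDetail where GICSPFProductUpdateID not in (12,19,50))
    begin
        delete top (100000)
        from GICSPFproductUpdateDetail
        where GICSPFProductUpdateID not in (12,19,50);

        checkpoint; -- in Simple Recovery this lets the log space from the committed batch be reused
    end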

  • Simon Smith (12/4/2008)


    Can you not just save the records you want into a temp table and then truncate ?

    If you can swing this kind of operation, it is generally much faster, but I was a little confused by the original post: it described a 6 million row table and deleting millions of rows, but then said later that only about 100K rows would remain. The batched method works in either case, but if the resulting row count is significantly smaller than the number of rows to be deleted, it is much faster to recreate the table, insert the rows you need into it, drop the old table, rename the new table, and recreate any needed constraints/indexes.
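
    For completeness, a rough outline of that sequence against this table (my sketch; the _New name is made up, and it assumes a window where nothing is writing to the table):

    -- 1) create dbo.GICSPFProductUpdateDetail_New with the same definition as the original
    --    (from the CREATE TABLE script at the top of the thread), then copy the rows to keep
    --    into it (explicit column list plus SET IDENTITY_INSERT to preserve identity values)

    -- 2) swap the tables
    DROP TABLE dbo.GICSPFProductUpdateDetail;
    EXEC sp_rename 'dbo.GICSPFProductUpdateDetail_New', 'GICSPFProductUpdateDetail';

    -- 3) recreate the constraints and indexes from the original definition
    ALTER TABLE dbo.GICSPFProductUpdateDetail
        ADD CONSTRAINT [PK_GICSPFProductUpdateDetail_CoDivDepUpdateHeaderUpdateDetail] PRIMARY KEY CLUSTERED
        ([CompanyID], [DivisionID], [DepartmentID], [GICSPFProductUpdateID], [GICSPFProductUpdateDetailID]);

    CREATE NONCLUSTERED INDEX [IX_GICSPFProductUpdateDetail_ProductID]
        ON dbo.GICSPFProductUpdateDetail ([ProductID]);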

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Jean-François Bergeron (12/4/2008)


    Jonathan Kehayias (12/4/2008)


    It has a much smaller footprint, and if you kill it, the rollback segment is limited to the current 1000 rows being affected.

    Does that mean I should start a sub-transaction within the WHILE loop?

    Begin transaction
    While @RowCount = 1000
        Begin transaction T1
        Delete from ...
        Commit transaction T1
    END while
    Commit transaction

    Because how can it roll back only the last 1000 rows if they are within the same big transaction? There is something I do not get in this explanation.

    Thanks,

    J-F

    No. The DELETE is going to run in an implicit transaction on its own. That is how the last 1000 rows will be rolled back if you stop it in the middle of the script: any DELETE that completed will be gone, but the currently running DELETE will roll back.
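
    Put differently (my illustration, not from the reply above): wrapping each batch in its own explicit transaction behaves the same as letting the bare DELETE autocommit; the important thing is that there is no outer transaction wrapped around the whole loop:

    -- assumes @RowCount has been declared and primed to 1000 as in the earlier loop
    WHILE @RowCount = 1000
    BEGIN
        BEGIN TRANSACTION  -- optional: a bare DELETE would autocommit on its own
            DELETE TOP (1000)
            FROM dbo.GICSPFProductUpdateDetail
            WHERE GICSPFProductUpdateID NOT IN (12, 19, 50)

            SELECT @RowCount = @@rowcount
        COMMIT TRANSACTION -- locks are released and the log can be reused after each batch
    END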

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Thanks a lot Jonathan,

    That makes it a lot clearer in my mind. 😉

    By the way, the operation is almost done, only got 1 million rows left to delete.

    Thanks again,

    P.S. Lynn, thanks for the script; I like the idea with the log backup.

    J-F.

    Cheers,

    J-F

  • Jean-François Bergeron (12/4/2008)


    P.S. Lynn, Thanks for the script, I like the idea with the log backup,

    I agree that is a nifty script, and I will have to add it to my toolbelt. It doesn't work so well in my own environment, since I use third-party backup tools and a VLS, but it would certainly work for most places I have seen.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
