Speeding Up Delete's on Large Databases

Question

Post reply

Speeding Up Delete's on Large Databases

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 1

Eirikur Eiriksson (10/31/2016)

Quick suggestion, reading the data to be deleted into a "bucket" variables can drastically speed up the delete, here is an example based on the DDL previously posted.

/quote]

Seems like an extreme case of RBAR, Eirikur. What do you do once you have the variables loaded?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Eirikur Eiriksson SSC Guru Points: 182631 More actions · Answer 2

Jeff Moden (10/31/2016)

Eirikur Eiriksson (10/31/2016)

Quick suggestion, reading the data to be deleted into a "bucket" variables can drastically speed up the delete, here is an example based on the DDL previously posted.

Seems like an extreme case of RBAR, Eirikur. What do you do once you have the variables loaded?

After running the pre-loading, then I do the delete, this can speed it up quite significantly. It's in the essence the same principal as I normally use when running a "Dry run" in my normal test harness, makes certain that the affected pages are in memory before any operation is applied.

😎

Obviously one has to do this for all indices, the easiest way is to target each index with a force'ing hint.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 3

Eirikur Eiriksson (10/31/2016)

Jeff Moden (10/31/2016)

Eirikur Eiriksson (10/31/2016)

Quick suggestion, reading the data to be deleted into a "bucket" variables can drastically speed up the delete, here is an example based on the DDL previously posted.

Seems like an extreme case of RBAR, Eirikur. What do you do once you have the variables loaded?

After running the pre-loading, then I do the delete, this can speed it up quite significantly. It's in the essence the same principal as I normally use when running a "Dry run" in my normal test harness, makes certain that the affected pages are in memory before any operation is applied.

😎

Obviously one has to do this for all indices, the easiest way is to target each index with a force'ing hint.

I guess I don't understand. I don't get how loading a set of variables for each row to be deleted and then deleting one row at a time would improve performance. Either that or I've totally misread what you mean.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Eirikur Eiriksson SSC Guru Points: 182631 More actions · Answer 4

Jeff Moden (10/31/2016)

Eirikur Eiriksson (10/31/2016)

Jeff Moden (10/31/2016)

Eirikur Eiriksson (10/31/2016)

Quick suggestion, reading the data to be deleted into a "bucket" variables can drastically speed up the delete, here is an example based on the DDL previously posted.

Seems like an extreme case of RBAR, Eirikur. What do you do once you have the variables loaded?

After running the pre-loading, then I do the delete, this can speed it up quite significantly. It's in the essence the same principal as I normally use when running a "Dry run" in my normal test harness, makes certain that the affected pages are in memory before any operation is applied.

😎

Obviously one has to do this for all indices, the easiest way is to target each index with a force'ing hint.

I guess I don't understand. I don't get how loading a set of variables for each row to be deleted and then deleting one row at a time would improve performance. Either that or I've totally misread what you mean.

Guess I haven't been clear enough, this applies for each batch, not each row.

😎

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

Jeff, the method Eirikur's describing works really well. We implemented such a method here for cleaning up a substantial table earlier in the year. The existing method was causing some blocking and was estimated to take around six weeks to complete.

We looked at locking & blocking, logical reads and physical reads for a few cycles of the old method, and noticed that the first few cycles completed in seconds, extending later to minutes. This coincided with a change from logical reads to physical reads. So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

INSERT INTO #Temp (ID)



SELECT TOP(100000) iom.ID

FROM dbo.CoffeeClubMember iom WITH (NOLOCK)

WHERE <<funky WHERE clause here>>

ORDER BY <<funky ORDER BY here>>

SELECT @id = iom1.ID

FROM #Temp s

INNER JOIN dbo.CoffeeClubMember iom1 WITH (NOLOCK, INDEX (IX_CoffeeClubMember_MemberIDEndDateStartDate_InclCoffeeClubID))

       ON s.MemberID = iom1.MemberID

       AND (s.EndDate IS NULL OR s.EndDate = iom1.EndDate)

	   AND (s.StartDate IS NULL OR s.StartDate = iom1.StartDate)

	   AND s.ID = iom1.ID

INNER JOIN dbo.CoffeeClubMember iom2 WITH (NOLOCK, INDEX (UIX_CoffeeClubMember_CoffeeClubIDMemberIDStartEndDate))

       ON s.CoffeeClubID = iom2.CoffeeClubID AND s.MemberID = iom2.MemberID AND s.ID = iom2.ID

INNER JOIN dbo.CoffeeClubMember iom3 WITH (NOLOCK, INDEX (PK_CoffeeClubMember))

       ON s.ID = iom3.ID

DELETE iom

FROM dbo.CoffeeClubMember iom

INNER JOIN #Temp s

ON s.ID = iom.ID

and it completed the same day.

The priming step takes a variable amount of time depending on what's already in cache (we have a lot of RAM). Each delete takes about half a second, every time, every pass, and there's no blocking. Ever.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 6

ChrisM@Work (11/1/2016)

Jeff, the method Eirikur's describing works really well. We implemented such a method here for cleaning up a substantial table earlier in the year. The existing method was causing some blocking and was estimated to take around six weeks to complete.

We looked at locking & blocking, logical reads and physical reads for a few cycles of the old method, and noticed that the first few cycles completed in seconds, extending later to minutes. This coincided with a change from logical reads to physical reads. So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

INSERT INTO #Temp (ID)
SELECT TOP(100000) iom.ID FROM dbo.CoffeeClubMember iom WITH (NOLOCK) WHERE <<funky WHERE clause here>> ORDER BY <<funky ORDER BY here>> SELECT @id = iom1.ID FROM #Temp s INNER JOIN dbo.CoffeeClubMember iom1 WITH (NOLOCK, INDEX (IX_CoffeeClubMember_MemberIDEndDateStartDate_InclCoffeeClubID)) ON s.MemberID = iom1.MemberID AND (s.EndDate IS NULL OR s.EndDate = iom1.EndDate) AND (s.StartDate IS NULL OR s.StartDate = iom1.StartDate) AND s.ID = iom1.ID INNER JOIN dbo.CoffeeClubMember iom2 WITH (NOLOCK, INDEX (UIX_CoffeeClubMember_CoffeeClubIDMemberIDStartEndDate)) ON s.CoffeeClubID = iom2.CoffeeClubID AND s.MemberID = iom2.MemberID AND s.ID = iom2.ID INNER JOIN dbo.CoffeeClubMember iom3 WITH (NOLOCK, INDEX (PK_CoffeeClubMember)) ON s.ID = iom3.ID DELETE iom FROM dbo.CoffeeClubMember iom INNER JOIN #Temp s
ON s.ID = iom.ID

and it completed the same day.

The priming step takes a variable amount of time depending on what's already in cache (we have a lot of RAM). Each delete takes about half a second, every time, every pass, and there's no blocking. Ever.

Interesting, I need to keep this in the back of the brain. Actually, I may want to code a sample myself or I may forget considering my short term memory sucks.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 7

Lynn Pettis (11/1/2016)

ChrisM@Work (11/1/2016)

Jeff, the method Eirikur's describing works really well. We implemented such a method here for cleaning up a substantial table earlier in the year. The existing method was causing some blocking and was estimated to take around six weeks to complete.

We looked at locking & blocking, logical reads and physical reads for a few cycles of the old method, and noticed that the first few cycles completed in seconds, extending later to minutes. This coincided with a change from logical reads to physical reads. So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

INSERT INTO #Temp (ID)
SELECT TOP(100000) iom.ID FROM dbo.CoffeeClubMember iom WITH (NOLOCK) WHERE <<funky WHERE clause here>> ORDER BY <<funky ORDER BY here>> SELECT @id = iom1.ID FROM #Temp s INNER JOIN dbo.CoffeeClubMember iom1 WITH (NOLOCK, INDEX (IX_CoffeeClubMember_MemberIDEndDateStartDate_InclCoffeeClubID)) ON s.MemberID = iom1.MemberID AND (s.EndDate IS NULL OR s.EndDate = iom1.EndDate) AND (s.StartDate IS NULL OR s.StartDate = iom1.StartDate) AND s.ID = iom1.ID INNER JOIN dbo.CoffeeClubMember iom2 WITH (NOLOCK, INDEX (UIX_CoffeeClubMember_CoffeeClubIDMemberIDStartEndDate)) ON s.CoffeeClubID = iom2.CoffeeClubID AND s.MemberID = iom2.MemberID AND s.ID = iom2.ID INNER JOIN dbo.CoffeeClubMember iom3 WITH (NOLOCK, INDEX (PK_CoffeeClubMember)) ON s.ID = iom3.ID DELETE iom FROM dbo.CoffeeClubMember iom INNER JOIN #Temp s
ON s.ID = iom.ID

and it completed the same day.

The priming step takes a variable amount of time depending on what's already in cache (we have a lot of RAM). Each delete takes about half a second, every time, every pass, and there's no blocking. Ever.

Interesting, I need to keep this in the back of the brain. Actually, I may want to code a sample myself or I may forget considering my short term memory sucks.

Heh - I have the same problem Lynn - in spades.

I've been writing it up in bits and bobs for what seems like forever, because for us it was a much more significant issue than the tran log, which is what most folks pounce on in this scenario. Showing folks an article with testable code is always going to be better than "trust me".

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 8

ChrisM@Work (11/1/2016)

So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

Seems horribly complex for what it does. Take a look at the code that I posted. Does that not "warm the cache" that you're talking about? For temporal (mostly WORM) tables, it also deletes in contiguous page blocks at the "beginning" of the table far away from the "active end" of the table according to the CI, which makes the deletes nasty fast.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

Jeff Moden (11/1/2016)

ChrisM@Work (11/1/2016)

So we decided to try cache priming - getting the required data into cache with a low-cost read before processing it. A single loop of the code we ran with back then looks something like this:

Seems horribly complex for what it does. Take a look at the code that I posted. Does that not "warm the cache" that you're talking about? For temporal (mostly WORM) tables, it also deletes in contiguous page blocks at the "beginning" of the table far away from the "active end" of the table according to the CI, which makes the deletes nasty fast.

If uod.ID is the cluster key then you're going to buffer the relevant pages of the clustered index. Ordinary indexes will not be affected by the read - so they won't be in cache.

If you've got an index on uod.ID and uod.Created, then the relevant pages of that index alone is all that will be buffered - and the clustered index and any other indexes will still have to be read from disk - during the delete.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Eirikur Eiriksson SSC Guru Points: 182631 More actions · Answer 10

In most cases, we will see up to 2 times better performance when applying this method, more commonly the difference is greater. Here is an example, results speak for themselves

😎

Simple test harness

USE TEEST;



GO

SET NOCOUNT ON;

DBCC FREEPROCCACHE WITH NO_INFOMSGS;

--/* -- UNCOMMENT THIS LINE TO SKIP THE RECREATION OF THE TEST DATA SET

--DECLARE @SAMPLE_SIZE INT = 10;

IF OBJECT_ID('dbo.TBL_TEST_DELETE') IS NOT NULL DROP TABLE dbo.TBL_TEST_DELETE;

CREATE TABLE dbo.TBL_TEST_DELETE

(

    TD_ID   INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_DELETE_TD_ID PRIMARY KEY CLUSTERED

   ,TD_TS   DATETIME          NOT NULL CONSTRAINT DFLT_DBO_TBL_TEST_DELETE_TD_TS    DEFAULT (GETDATE())

   ,TD_C001 VARCHAR(4000)     NOT NULL

   ,TD_C002 VARCHAR(4000)     NOT NULL

   ,TD_C003 VARCHAR(4000)     NOT NULL

   ,TD_C004 VARCHAR(4000)     NOT NULL

   ,TD_C005 VARCHAR(4000)     NOT NULL

   ,TD_C006 VARCHAR(4000)     NOT NULL

   ,TD_C007 VARCHAR(4000)     NOT NULL

   ,TD_C008 VARCHAR(4000)     NOT NULL

   ,TD_C009 VARCHAR(4000)     NOT NULL

   ,TD_C010 VARCHAR(4000)     NOT NULL

);

DECLARE @SAMPLE_SIZE INT = 100000;

;WITH T(N) AS (SELECT N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))

,  NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N

               FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

INSERT INTO dbo.TBL_TEST_DELETE(TD_TS,TD_C001,TD_C002,TD_C003,TD_C004,TD_C005,TD_C006,TD_C007,TD_C008,TD_C009,TD_C010)

SELECT

    DATEADD(SECOND,NM.N,GETDATE())

   ,REPLICATE('S',4000)

   ,REPLICATE('A',4000)

   ,REPLICATE('M',4000)

   ,REPLICATE('P',4000)

   ,REPLICATE('L',4000)

   ,REPLICATE('E',4000)

   ,REPLICATE('D',4000)

   ,REPLICATE('A',4000)

   ,REPLICATE('T',4000)

   ,REPLICATE('A',4000)

FROM NUMS   NM;

--*/

CHECKPOINT 1;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

DBCC FREESYSTEMCACHE ( 'ALL' ) WITH NO_INFOMSGS;

RAISERROR('COLD DELETE 1

-----------------------------------------------------------------------------------------------------',0,0) WITH NOWAIT;

SET STATISTICS IO,TIME ON;

DELETE X

FROM dbo.TBL_TEST_DELETE X

WHERE X.TD_ID < 10001;

SET STATISTICS IO,TIME OFF;

CHECKPOINT 1;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

DBCC FREESYSTEMCACHE ( 'ALL' ) WITH NO_INFOMSGS;

RAISERROR('HOT DELETE

-----------------------------------------------------------------------------------------------------',0,0) WITH NOWAIT;

SET STATISTICS IO,TIME ON;

DECLARE @TD_ID       INT          ;

DECLARE @TD_TS       DATETIME     ;

DECLARE @TD_C001     VARCHAR(4000);

DECLARE @TD_C002     VARCHAR(4000);

DECLARE @TD_C003     VARCHAR(4000);

DECLARE @TD_C004     VARCHAR(4000);

DECLARE @TD_C005     VARCHAR(4000);

DECLARE @TD_C006     VARCHAR(4000);

DECLARE @TD_C007     VARCHAR(4000);

DECLARE @TD_C008     VARCHAR(4000);

DECLARE @TD_C009     VARCHAR(4000);

DECLARE @TD_C010     VARCHAR(4000);

SELECT

     @TD_ID   = TD.TD_ID

    ,@TD_TS   = TD.TD_TS

    ,@TD_C001 = TD.TD_C001

    ,@TD_C002 = TD.TD_C002

    ,@TD_C003 = TD.TD_C003

    ,@TD_C004 = TD.TD_C004

    ,@TD_C005 = TD.TD_C005

    ,@TD_C006 = TD.TD_C006

    ,@TD_C007 = TD.TD_C007

    ,@TD_C008 = TD.TD_C008

    ,@TD_C009 = TD.TD_C009

    ,@TD_C010 = TD.TD_C010

FROM    dbo.TBL_TEST_DELETE TD

WHERE   TD.TD_ID < 20001;

DELETE X

FROM dbo.TBL_TEST_DELETE X

WHERE X.TD_ID < 20001;

SET STATISTICS IO,TIME OFF;

CHECKPOINT 1;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

DBCC FREESYSTEMCACHE ( 'ALL' ) WITH NO_INFOMSGS;

RAISERROR('COLD DELETE 2

-----------------------------------------------------------------------------------------------------',0,0) WITH NOWAIT;

SET STATISTICS IO,TIME ON;

DELETE X

FROM dbo.TBL_TEST_DELETE X

WHERE X.TD_ID < 30001;

SET STATISTICS IO,TIME OFF;

Output

COLD DELETE 1



-----------------------------------------------------------------------------------------------------

SQL Server parse and compile time:

   CPU time = 155 ms, elapsed time = 306 ms.

Table 'TBL_TEST_DELETE'. Scan count 1, logical reads 90367, physical reads 462, read-ahead reads 6223, lob logical reads 90000, lob physical reads 6756, lob read-ahead reads 0.

 SQL Server Execution Times:

   CPU time = 1125 ms,  elapsed time = 7132 ms.

HOT DELETE

-----------------------------------------------------------------------------------------------------

SQL Server parse and compile time:

   CPU time = 345 ms, elapsed time = 1219 ms.

Table 'TBL_TEST_DELETE'. Scan count 1, logical reads 12620, physical reads 1, read-ahead reads 2666, lob logical reads 90000, lob physical reads 5620, lob read-ahead reads 0.

 SQL Server Execution Times:

   CPU time = 297 ms,  elapsed time = 3144 ms.

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 1 ms.

Table 'TBL_TEST_DELETE'. Scan count 1, logical reads 107747, physical reads 0, read-ahead reads 0, lob logical reads 90000, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:

   CPU time = 906 ms,  elapsed time = 1810 ms.

COLD DELETE 2

-----------------------------------------------------------------------------------------------------

SQL Server parse and compile time:

   CPU time = 391 ms, elapsed time = 1272 ms.

Table 'TBL_TEST_DELETE'. Scan count 1, logical reads 108950, physical reads 123, read-ahead reads 1248, lob logical reads 90000, lob physical reads 5620, lob read-ahead reads 0.

 SQL Server Execution Times:

   CPU time = 1250 ms,  elapsed time = 7753 ms.

....to be continued 😉

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 11

Wow... That's just silly how well that works! And I've tried a couple of different methods to beat it and haven't been able to even in the SIMPLE Recovery Model. Well done!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions