Ways to improve record deletion speed

  • Hello

    I have a table (F_POLICY_TRANSACTION)

    This table has a couple of million rows in it

    I am using a column named POLICY_TRANSACTION_BKEY to select records to delete (approximately 750k using the code below)

    This column has a non-clustered index applied

    This is the code I have used:

    WHILE 1 = 1
    BEGIN
        DELETE TOP (50000)
        FROM F_POLICY_TRANSACTION WITH (TABLOCKX)
        WHERE [POLICY_TRANSACTION_BKEY] LIKE 'SIC%';

        IF @@ROWCOUNT < 50000 BREAK;
    END

    Problem is, it takes around 10 minutes to run

    Is there any way it can be made more efficient?

    I have tried varying the rowcount with no success

    Thanks

    - Damian

  • When I do this (in batches), I wonder if the selection criteria (i.e. "like 'SIC%'" in your case) take time on each iteration of the loop.

    I get all the clustered index keys into a #temporary table, ordered by those keys, with an IDENTITY column, and then delete in batches based on ranges of the ID. Something like this:

    SELECT IDENTITY(int, 1, 1) AS ID, ClustKey1, ClustKey2, ...
    INTO #TEMP
    FROM F_POLICY_TRANSACTION
    WHERE [POLICY_TRANSACTION_BKEY] LIKE 'SIC%'
    ORDER BY ClustKey1, ClustKey2, ...

    DECLARE @intLoop int = 1

    WHILE 1 = 1
    BEGIN
        DELETE D
        FROM #TEMP AS T
        JOIN F_POLICY_TRANSACTION AS D
            ON D.ClustKey1 = T.ClustKey1
            AND D.ClustKey2 = T.ClustKey2
            AND ...
        WHERE T.ID BETWEEN @intLoop AND @intLoop + 49999

        IF @@ROWCOUNT < 50000 BREAK;

        SELECT @intLoop = @intLoop + 50000
    END

    I'd be interested to hear if that is any more efficient for you.

  • Thanks

    That runs through a lot quicker - less than a minute!

    Interested to know why, though

    Why would the introduction of a temp table (with a join) for comparison run through faster than a straight delete?

    Is it the fact that I am using an int-based WHERE clause in the delete?

    - Damian

  • If you only have a few million rows and you are deleting about 30% of them, then the optimizer will likely use a table scan. If you delete in 50k batches, that means 15 table scans. Statistics will likely not be updated automatically between the 15 batches. Also, if your table is a heap and your non-clustered index does not include the PK, then a RID lookup will need to be done, making the NC index less likely to be used.

    I notice "Old Hand's" solution selects by clustered index and also deletes by clustered index. That way whole pages get deleted in a single DELETE statement. The alternative would be to delete a row from a page with each loop, causing much more overhead revisiting pages until a page finally gets freed.
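
    If you want to check whether the heap / statistics explanation applies, something like this should do it (a minimal sketch; I'm assuming the table lives in the dbo schema):

    -- index_id 0 = heap, 1 = clustered index
    SELECT index_id, name, type_desc
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('dbo.F_POLICY_TRANSACTION');

    -- Refreshing statistics between batches lets the optimizer
    -- see the shrinking row count
    UPDATE STATISTICS dbo.F_POLICY_TRANSACTION;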

  • For absolute max speed, you should also cluster the temp table on ID:

    SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, ClustKey1, ClustKey2 --, ...
    INTO #TEMP
    FROM F_POLICY_TRANSACTION

    CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

    INSERT INTO #TEMP
    SELECT ClustKey1, ClustKey2 --, ...
    FROM F_POLICY_TRANSACTION
    WHERE [POLICY_TRANSACTION_BKEY] LIKE 'SIC%'
    ORDER BY ClustKey1, ClustKey2 --, ...

    --...rest_of_code_as_before...


  • So, you have a table with ~2,000k rows, you're deleting ~750k rows in batches of 50k, it's taking ~10 minutes, and you want to minimize the runtime duration. I'm guessing the biggest performance hit is the I/O required for on-the-fly page reorganization and transaction logging. The problem is that deletes are the most expensive type of operation in that regard. Also, using the batch delete method, you're left with a fragmented table that could be the same size, or maybe even larger, than the original.

    Maybe I'm wrong, we never know for sure until we experiment, but I suspect that selecting the rows you need to keep into another table would require only 10 seconds or so, because selecting into a non-indexed heap table is a minimally logged operation. Once done, drop the original table, rename the temp table, and then re-create the indexes (remember to add the clustered index first, then the non-clustered ones), which might take another 10 or 20 seconds; a sketch of those steps follows the query below. Another benefit of the select-into method is that once done your table will be logically sorted with no page or index fragmentation.

    SELECT *
    INTO F_POLICY_TRANSACTION_TEMP
    FROM F_POLICY_TRANSACTION WITH (TABLOCK)
    WHERE POLICY_TRANSACTION_BKEY NOT LIKE 'SIC%';
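
    The rest of the swap might look something like this (a rough sketch; the index names and key columns are placeholders for whatever actually exists on the table):

    -- Drop the original and rename the copy into its place
    DROP TABLE F_POLICY_TRANSACTION;
    EXEC sp_rename 'F_POLICY_TRANSACTION_TEMP', 'F_POLICY_TRANSACTION';

    -- Re-create the indexes: clustered first, then non-clustered
    CREATE CLUSTERED INDEX CL_F_POLICY_TRANSACTION
        ON F_POLICY_TRANSACTION (ClustKey1, ClustKey2);
    CREATE NONCLUSTERED INDEX IX_POLICY_TRANSACTION_BKEY
        ON F_POLICY_TRANSACTION (POLICY_TRANSACTION_BKEY);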

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • ScottPletcher (10/13/2015)


    For absolute max speed, you should also cluster the temp table on ID

    Thanks Scott. I do in fact do that, declaring the ID as a PRIMARY KEY.

    CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

    Do you think it would benefit from being declared as a UNIQUE index?

  • OK, so I am very nearly there using this:

    SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, POLICY_TRANSACTION_BKEY
    INTO #TEMP
    FROM [F_POLICY_TRANSACTION]

    -- Create a clustered index for efficiency
    CREATE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

    INSERT INTO #TEMP
    SELECT POLICY_TRANSACTION_BKEY
    FROM [F_POLICY_TRANSACTION]
    WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
    ORDER BY POLICY_TRANSACTION_BKEY

    DECLARE @intLoop int = 1

    WHILE 1 = 1
    BEGIN
        DELETE D
        FROM #TEMP AS T
        JOIN [F_POLICY_TRANSACTION] AS D
            ON D.POLICY_TRANSACTION_BKEY = T.POLICY_TRANSACTION_BKEY
        WHERE T.ID BETWEEN @intLoop AND @intLoop + 49999

        IF @@ROWCOUNT < 50000 BREAK;

        SELECT @intLoop = @intLoop + 50000
    END
    GO

    It runs through pretty quickly (less than a minute)

    Now spotted an issue - it only deletes half the rows

    Can anybody spot an error?

    Possibly an issue with the loop counter, but I can't quite see it

    Thanks

    - Damian

  • DamianC (10/14/2015)


    Can anybody spot an error?

    Nope! :crying:

    I'd suggest trying this to help with debugging (the changes from your version are the UNIQUE clustered index, the extra row-count variables, and the RAISERROR progress messages):

    SELECT TOP (0) IDENTITY(int, 1, 1) AS ID, POLICY_TRANSACTION_BKEY
    INTO #TEMP
    FROM [F_POLICY_TRANSACTION]

    -- Create a UNIQUE clustered index for efficiency
    CREATE UNIQUE CLUSTERED INDEX TEMP__CL ON #TEMP ( ID ) WITH ( FILLFACTOR = 100 )

    DECLARE @intLoop int = 1,
            @intRowCount int,
            @intRowCountTotal int = 0

    INSERT INTO #TEMP
    SELECT POLICY_TRANSACTION_BKEY
    FROM [F_POLICY_TRANSACTION]
    WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
    ORDER BY POLICY_TRANSACTION_BKEY

    SELECT @intRowCount = @@ROWCOUNT
    RAISERROR (N'INSERT INTO Rows %d', 10, 1, @intRowCount) WITH NOWAIT

    WHILE 1 = 1
    BEGIN
        DELETE D
        FROM #TEMP AS T
        JOIN [F_POLICY_TRANSACTION] AS D
            ON D.POLICY_TRANSACTION_BKEY = T.POLICY_TRANSACTION_BKEY
        WHERE T.ID BETWEEN @intLoop AND @intLoop + 49999

        SELECT @intRowCount = @@ROWCOUNT,
               @intRowCountTotal = @intRowCountTotal + @@ROWCOUNT,
               @intLoop = @intLoop + 50000
        RAISERROR (N'LOOP %d, Rows %d', 10, 1, @intLoop, @intRowCount) WITH NOWAIT

        IF @intRowCount < 50000 BREAK;
    END

    RAISERROR (N'TOTAL ROWS %d', 10, 1, @intRowCountTotal) WITH NOWAIT
    GO

  • P.S. I presume the Clustered Index Key (i.e. on POLICY_TRANSACTION_BKEY) in your table [F_POLICY_TRANSACTION] is unique / PKey?

  • Ah, looks like POLICY_TRANSACTION_BKEY wasn't unique

    This table now has an identity surrogate key on it

    Using that worked perfectly

    Thanks

    - Damian

  • Might be worth adding your ID to make the Clustered Index unique (and then explicitly declaring the Clustered Index as UNIQUE) ... otherwise SQL Server will be adding a hidden uniquifier value instead.
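
    For example, something along these lines (a sketch, assuming the new identity column is called ID):

    -- Appending the surrogate ID makes the key unique, so SQL Server
    -- no longer stores a hidden uniquifier for duplicate BKEYs.
    -- DROP_EXISTING assumes the existing clustered index has this name.
    CREATE UNIQUE CLUSTERED INDEX CL_F_POLICY_TRANSACTION
        ON F_POLICY_TRANSACTION (POLICY_TRANSACTION_BKEY, ID)
        WITH (DROP_EXISTING = ON);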

  • FWIW - We have successfully used the same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.
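
    For reference, that explicit variant might look like this (a sketch; the key column's type is assumed):

    -- Create the key table with a clustered identity up front...
    CREATE TABLE #keys
    (
        ID int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
        POLICY_TRANSACTION_BKEY varchar(50) NOT NULL  -- type assumed
    );

    -- ...then populate it with a plain INSERT
    INSERT INTO #keys (POLICY_TRANSACTION_BKEY)
    SELECT POLICY_TRANSACTION_BKEY
    FROM F_POLICY_TRANSACTION
    WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
    ORDER BY POLICY_TRANSACTION_BKEY;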

  • Mike Good (10/16/2015)


    FWIW - We have successfully used the same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.

    In the presence of a large number of rows to populate the Temp Table with, you might be surprised at how fast SELECT INTO followed by the creation of the clustered index is even in the FULL Recovery Model.
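
    That ordering would be something like this (a sketch reusing the key column from earlier in the thread):

    -- Populate the temp table first with SELECT INTO...
    SELECT IDENTITY(int, 1, 1) AS ID, POLICY_TRANSACTION_BKEY
    INTO #keys
    FROM F_POLICY_TRANSACTION
    WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
    ORDER BY POLICY_TRANSACTION_BKEY;

    -- ...and only then build the clustered index
    CREATE UNIQUE CLUSTERED INDEX keys__CL ON #keys (ID);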

    --Jeff Moden



  • Jeff Moden (11/1/2015)


    Mike Good (10/16/2015)


    FWIW - We have successfully used the same method proposed by Kristen-173977 for many years, for both batched DELETEs and UPDATEs. The only difference is we explicitly CREATE the #keys table with a clustered identity ID, and then use INSERT to populate it.

    In the presence of a large number of rows to populate the Temp Table with, you might be surprised at how fast SELECT INTO followed by the creation of the clustered index is even in the FULL Recovery Model.

    I believe you can create the clustered index before loading the table, and still get the great benefits of minimal logging that SELECT INTO provides, if you code the INSERT as required, i.e., you use a TABLOCK hint on the destination table:

    INSERT INTO table_name WITH (TABLOCK) ( ... ) SELECT ... FROM ...
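
    Applied to the temp-table pattern from earlier in the thread, that might look like this (a sketch; the exact conditions for minimal logging vary by SQL Server version, and the key column's type is assumed):

    -- Create the destination with its clustered index already in place
    CREATE TABLE #TEMP
    (
        ID int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
        POLICY_TRANSACTION_BKEY varchar(50) NOT NULL  -- type assumed
    );

    -- TABLOCK on the destination is one of the prerequisites
    -- for a minimally logged insert
    INSERT INTO #TEMP WITH (TABLOCK) (POLICY_TRANSACTION_BKEY)
    SELECT POLICY_TRANSACTION_BKEY
    FROM F_POLICY_TRANSACTION
    WHERE POLICY_TRANSACTION_BKEY LIKE 'SIC%'
    ORDER BY POLICY_TRANSACTION_BKEY;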

