LCK_M_U delete 4999 rows.

  • We are trying to delete a lot of data from a large table.  We have been trying small batches, which, if I understand correctly, should prevent lock escalation.  We have tried 4999 rows per batch and even lower numbers, and we are running multiple threads, the thinking being that several DELETE statements of 4999 rows each can run at the same time.  The statements simply block each other, and we see LCK_M_U (update lock) waits.  Only the deletes are running against this table at the moment, nothing else.  Obviously this isn't going to be the most efficient way to remove data.  What would be the best way to remove as much data as possible, as quickly as possible?

    The old process was the delete loop below, which simply wasn't fast enough.  Thank you.

    -- Prime @@ROWCOUNT so the loop body runs at least once
    SELECT NULL;

    WHILE @@ROWCOUNT <> 0
    BEGIN
        -- Delete the next batch of rows older than five days
        DELETE TOP (100000)
        FROM tablename
        WHERE Column < GETDATE() - 5;
    END;

  • How is the table clustered?  If it's clustered on the date, you'll be fine: just run one DELETE at a time for a decent-sized chunk of rows, 20,000-100,000 depending on row width, log capacity, activity, etc.

    If the table is clustered on something else, it's likely the DELETEs will step on each other, and each will be slow even by itself.  How large (how many pages) is the table?  You could try a nonclustered index with a leading key of "column": that would likely help for small DELETEs, especially while the deleted row count is a relatively small percentage of the total pages of the table.
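    If it helps to picture it, here is a minimal sketch of that, assuming the table and column are literally named tablename and Column as in the earlier post (placeholder names, not the real schema):

    -- Nonclustered index with the filter column as the leading key, so the
    -- DELETE can seek to just the expired rows instead of scanning the table
    CREATE NONCLUSTERED INDEX IX_tablename_Column
        ON dbo.tablename (Column);

    -- One session, one decent-sized chunk at a time
    WHILE 1 = 1
    BEGIN
        DELETE TOP (20000)
        FROM dbo.tablename
        WHERE Column < GETDATE() - 5;

        IF @@ROWCOUNT = 0
            BREAK;  -- nothing left to delete
    END;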

    SQL DBA, SQL Server MVP (07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher - Thursday, February 16, 2017 4:38 PM

    How is the table clustered?  If it's clustered on the date, you'll be fine, just run one DELETE at a time for a decent-sized chunk of rows, 20000 - 100000 depending on row width, log capacity, activity, etc..

    If the table is clustered on something else, it's likely the DELETEs will step on each other, and each will be slow even by itself.  How large (how many pages) is the table?  You could try a nonclustered index with a leading key of "column": that would likely help for small DELETEs, more so as long as the deleted row count is a relatively small percentage of the total pages of the table.

    Thank you for the response.  I should have been clearer.  We aren't using the while loop above any longer.  Instead of a date field, we are now deleting on an ID field that is part of the clustered index.  We also have two systems: one is an archive history system, the other an active data system.  The method we are using, with batches of 4999 rows, works much better on the history system than on the active data system, where those same deletes seem to block constantly.  The systems have the same resources, and only the inserts and deletes are hitting these tables, so there are no users interfering with them.  It just seems that no matter what adjustments we make, the deletes bump up against each other on the active data system.  The only difference I can see between the systems is that the max degree of parallelism setting differs.  I guess I could adjust that to be the same.
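    In case it is useful, a quick way to compare (and, if you decide to, align) that setting on the two instances; the value 4 below is only a placeholder:

    -- Show the current instance-wide setting
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max degree of parallelism';

    -- Change it to match the other server (placeholder value; test first)
    EXEC sp_configure 'max degree of parallelism', 4;
    RECONFIGURE;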

  • If you aren't using the while loop any more, what method are you using to delete in batches?  Is the whole thing running in an explicit transaction, by any chance?

    John

  • John Mitchell-245523 - Friday, February 17, 2017 7:10 AM

    If you aren't using the while loop any more, what method are you using to delete in batches?  Is the whole thing running in an explicit transaction, by any chance?

    John

    No, it is implicit.  We have been trying to get a little fancy and run this as a service in C# code: we open a connection, perform the delete, then close the connection.  We are just experimenting with different numbers of threads to see how many we can run without blocking.  We started with 10 and dropped to 5 with little difference.  Now we are trying to run 2 or 3 at a time with just 4000-row deletes and are still getting a lot of blocking that shows up as LCK_M_U, with the first delete at the head of the blocking chain.

  • What do you get from:
    SELECT count(*) FROM tablename 
    WHERE Column >= getdate()-5

    I am wondering if, instead of doing a DELETE, you could select the rows you want to keep into a temporary location (temp table, table variable, or a real table that you will drop later), then truncate the table, and then dump the good data back into it?

    I imagine it is a much smaller data set.  The only downsides I can see are:
    1 - if somebody tries to SELECT the data after the truncate but before the insert, they may get unexpected results
    2 - if somebody tries to insert after you back up the table but before you truncate, that data will be lost

    If possible, I'd pick some downtime to do this and run with it.  A SELECT will produce a shared lock, so other things can still SELECT from it.  Unfortunately, a shared lock will block INSERTs and DELETEs, both of which require exclusive locks.
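    A rough sketch of that approach, reusing the placeholder names tablename and Column from earlier in the thread and assuming it runs during a maintenance window:

    -- Copy the rows you want to keep into a temporary holding table
    SELECT *
    INTO #keep
    FROM dbo.tablename
    WHERE Column >= GETDATE() - 5;

    -- Remove everything; minimally logged, but it will fail if other tables
    -- reference this one through foreign keys
    TRUNCATE TABLE dbo.tablename;

    -- Put the good rows back (an identity column would also need
    -- SET IDENTITY_INSERT ON and an explicit column list)
    INSERT INTO dbo.tablename WITH (TABLOCK)
    SELECT *
    FROM #keep;

    DROP TABLE #keep;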

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • bmg002 - Friday, February 17, 2017 8:04 AM

    What do you get from:
    SELECT count(*) FROM tablename 
    WHERE Column >= getdate()-5

    I am wondering if instead of doing a DELETE, you could select the rows you want to keep to a temporary location (temp table, table variable, real table that you will drop later) then truncate the table and then dump the good data back into it?

    I imagine it is a much smaller data set.  The only downsides I can see are:
    1 - if somebody tries to SELECT the data after the truncate but before the insert, they may get unexpected results
    2 - if somebody tries to insert after you back up the table but before you truncate, that data will be lost

    If possible, I'd pick some downtime to do this and run with it.  A SELECT will produce a shared lock, so other things can still SELECT from it.  Unfortunately, a shared lock will block INSERTs and DELETEs, both of which require exclusive locks.

    The data sets we are working with are unfortunately very large.  Am I crazy for thinking that deleting 4999 rows at a time prevents lock escalation?

  • charliex - Friday, February 17, 2017 6:56 AM

    Thank you for the response.  I should have been more clear.   We aren't using the while loop above any longer.   Instead of a date field we are deleting off an id field which is part of the clustered index.   We have two systems as well.  One is an archive history system, the other an active data system.   The method we are using with batches of 4999 rows works much better on the history system than it does the active data system where those same deletes seem to block constantly.   The systems have the same resources and only the insert and deletes are hitting these tables, so no users interfering with them.    It just seems no matter what adjustments we make the deletes bump up against each other on this active data system.   The only difference I see between the systems is the max degree of parallelism is different between the two.   I guess I could adjust that to be the same.

    Sorry, I should have been more specific as well.  You need the leading column(s) of the clustered index.  Presumably the identity wouldn't be the leading key in a multi-key index, since it would normally be unique by itself.  You could create a covering index on the actual lookup key(s), use it to extract the specific clustering keys to delete, then delete those rows while also specifying the min and max clustering key values to "help" SQL keep a tight range on the table being deleted from.
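    A minimal sketch of that pattern, assuming the clustered index leads on a column called ID and the lookup column is called Column (all names here are placeholders):

    -- Covering index on the lookup key; the clustering key (ID) is carried
    -- in every nonclustered index automatically
    CREATE NONCLUSTERED INDEX IX_tablename_Column_lookup
        ON dbo.tablename (Column);

    -- Extract the clustering keys that should be deleted
    SELECT ID
    INTO #ids_to_delete
    FROM dbo.tablename
    WHERE Column < GETDATE() - 5;

    DECLARE @min_id bigint = (SELECT MIN(ID) FROM #ids_to_delete),
            @max_id bigint = (SELECT MAX(ID) FROM #ids_to_delete);

    -- Delete by clustering key, bounding the range so SQL stays on a tight
    -- slice of the clustered index
    DELETE t
    FROM dbo.tablename AS t
    INNER JOIN #ids_to_delete AS d
        ON d.ID = t.ID
    WHERE t.ID BETWEEN @min_id AND @max_id;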

    SQL DBA, SQL Server MVP (07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • charliex - Friday, February 17, 2017 8:17 AM

    The data sets we are working with are very large unfortunately.   Am I crazy for thinking the 4999 prevents escalating locks?

    I could be wrong, but I am pretty sure that as soon as SQL sees a DELETE, it requests an exclusive lock on the row, and with multiple rows being deleted it has to decide whether it needs page or table locking.  Same thing with an INSERT.  I think if the delete is small enough, it should only be taking row-level locks, not a table lock.  You could check the locks and see whether you are getting any table-level IX locking.  But any delete will cause some form of locking.  I've read that some people need to do it 1000 rows at a time, and even that takes tens of seconds to complete.
    If you look at the estimated execution plan for your delete, is it suggesting an index?  If so, that may help the delete.
    If you can get some downtime, I'd recommend doing the deletes then instead.  Blocking isn't such a big deal when nothing else is running.

    Here is a fun little article on lock escalation:
    http://aboutsqlserver.com/2012/01/11/locking-in-microsoft-sql-server-part-12-lock-escalation/

    But I don't think that is your issue.  I THINK you are getting page locking, which is what is causing the blocking: if you have multiple rows in a page and you delete one of those rows, the whole page gets locked.  I might be wrong, but I think page locks may end up affecting the whole extent that contains the page.  It might be table locking, though.
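    One way to see what level the locks are actually being taken at while the deletes run is the sys.dm_tran_locks DMV, something like:

    -- Locks held or waited on by current sessions, summarised by resource type
    SELECT tl.request_session_id,
           tl.resource_type,      -- OBJECT, PAGE, KEY, ...
           tl.request_mode,       -- IX, U, X, S, ...
           tl.request_status,     -- GRANT or WAIT
           COUNT(*) AS lock_count
    FROM sys.dm_tran_locks AS tl
    GROUP BY tl.request_session_id, tl.resource_type, tl.request_mode, tl.request_status
    ORDER BY tl.request_session_id, lock_count DESC;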

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • bmg002 - Friday, February 17, 2017 8:36 AM

    I could be wrong, but I am pretty sure that as soon as SQL sees DELETE, it makes a request for an exclusive lock on the row and with multiple rows being deleted it will need to decide if it needs page or table locking.  Same thing with an INSERT.  I think if the delete is small enough, it should only be doing a row level lock not a table lock.  You could check the locks and see if you are getting any TAB_IX locking (table locks).  But any delete will cause some form of locking.  I've read that some people need to do it 1000 rows at a time and even that is taking 10's of seconds to complete.
    if you look at the estimated execution plan for your delete, is it suggesting an index?  If so, that may help the delete.
    If you can get some downtime, I'd recommend doing it during downtime instead.  Then if you block, it isn't a big deal.

    here is a fun little article on lock escalation:
    http://aboutsqlserver.com/2012/01/11/locking-in-microsoft-sql-server-part-12-lock-escalation/

    but I don't think that is your issue.  I THINK you are getting page locking which is what is causing the blocking.  This would be if you have multiple rows in a page, you delete 1 of those rows, the entire page would be blocked.  I might be wrong, but I think page locks end up locking any extent that covers that page.  It might be table locking though.

    Yes, I think you are exactly right.  We would love to get away from this method and do some sliding-window type of setup, but we aren't sure that would work for us with child tables involved.
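    For what it's worth, the sliding-window idea normally means partitioning the table on the date (or ID) and switching old partitions out instead of deleting rows.  A bare-bones sketch, with every object name a placeholder and leaving aside the foreign-key complications you mention:

    -- Assumes dbo.tablename is partitioned and dbo.tablename_stage is an empty
    -- table with identical structure on the same filegroup. The switch is a
    -- metadata-only operation, so it is nearly instant.
    ALTER TABLE dbo.tablename
        SWITCH PARTITION 1 TO dbo.tablename_stage;

    -- The old rows now sit in the staging table and can be removed cheaply
    TRUNCATE TABLE dbo.tablename_stage;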

  • charliex - Friday, February 17, 2017 8:52 AM

    Yes I think you are exactly right.   Would love to get away from this method and do some sliding window type of setup, but we aren't sure that would work for us, with child tables involved.

    Yeah... once you get child tables involved it gets messier.  I'm assuming that means you have some FKs and these are cascading deletes, eh?
    If so, making the deletes faster is beyond my knowledge.

    If you look at the estimated or actual execution plan for the delete, is it suggesting an index?  And do the table (and its child tables) have a lot of indexes already?  Each row deleted also has to be removed from every index.
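    If it's useful to check those two things, the catalog views will show them; the table name below is the same placeholder used earlier:

    -- Foreign keys pointing at the table, and whether they cascade on delete
    SELECT fk.name AS foreign_key,
           OBJECT_NAME(fk.parent_object_id) AS child_table,
           fk.delete_referential_action_desc   -- CASCADE, NO_ACTION, ...
    FROM sys.foreign_keys AS fk
    WHERE fk.referenced_object_id = OBJECT_ID('dbo.tablename');

    -- How many indexes the table carries; each one is maintained on every delete
    SELECT COUNT(*) AS index_count
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID('dbo.tablename')
      AND i.index_id > 0;    -- skip the heap entry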

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
