Large delete statement problem

  • I have an automation system that includes a SQL Server instance for reporting. The system was set up to purge data older than three months once a month. It seemed to be running fine, but on closer investigation, it turns out that while the delete is happening (it runs in a SQL Server Agent job as the second step of the monthly archive/backup), no data makes it into the database. I confirmed this by looking at scan data that gets inserted every 10 seconds or so. The delete can last up to 20 minutes, and no scan data exists for that 20-minute window when the job last ran.

    I'm wondering how I can alleviate this. It seems that the delete statement is escalating the locks it acquires to table locks and preventing anyone else from inserting data (this is somewhat of a guess on my part). My initial thought was to modify the instead-of-delete trigger (discussed below) to loop through the deleted records and manually delete them myself, starting a transaction immediately before each delete and committing it immediately after the delete finishes. My question is really about the effect this would have. The trigger itself is already wrapped in a transaction by virtue of being a trigger, so would the extra transaction begin/commit do anything to help keep the tables available for inserts and reads? I don't really care how long the delete process takes. It could last for a week, and it wouldn't matter as long as the data ultimately gets purged and the tables remain available to other users.

    I further found that the vendor set up an instead-of-delete trigger where they manually delete all child records in related tables by looping through the deleted records using a cursor. They have cascading deletes enabled on the tables, and if I remove the trigger, SQL Server handles deleting all of the necessary data from the child tables. The kicker is that the delete performs about twice as fast with their trigger disabled. I'm just wondering if anyone would have any idea why they would have done it like this. The only thing I can think of is that they were paranoid about SQL Server not handling the cascading deletes properly, but I verified that it works flawlessly. One of the child tables has two foreign keys into the main parent table, so it's possible that these two keys would cause some issues with cascading deletes. As it turns out, though, we don't use this particular table in our system, so it doesn't have any effect.

  • Hi Tim,

    Your particular situation is probably a bit too complicated to give you a cookie-cutter answer, so instead I propose a strategy rather than a solution. 

    First, if you don't care how long it takes, you could reduce your impact by deleting a day's worth of data once a day, rather than a whole month's worth once a month.  This might shorten the amount of time that your tables are locked.  Taking it a little further, you could delete an hour's worth every hour, and so on.

    This doesn't consider any business impact, though.  It might be better to confirm once a month that things worked, rather than having to monitor results every hour, so this may not work for your situation.

    Another approach would be to perform the deletes in smaller batches.  In the past, I've accomplished this by building a table of PKs that I intend to delete, and then using that table to get a list of rows to delete in a batch.  Then I grab the next set of keys and delete those rows, repeating until the list is exhausted.   I build the table of keys so that I don't have to run a potentially expensive query against a possibly large table for each batch iteration. 
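    A rough sketch of that pattern, with hypothetical table and column names (DELETE TOP requires SQL Server 2005 or later), might look like this:

    -- 1. Snapshot the qualifying keys once, so the expensive date query
    --    runs against the big table only one time
    SELECT ParentID
    INTO #KeysToDelete
    FROM ParentTable
    WHERE DateField < DATEADD(month, -3, GETDATE())

    -- 2. Delete in small batches until the key list is exhausted
    WHILE 1 = 1
    BEGIN
        DELETE TOP (500) p
        FROM ParentTable AS p
        INNER JOIN #KeysToDelete AS k ON k.ParentID = p.ParentID

        IF @@ROWCOUNT = 0 BREAK  -- nothing left to delete
    END

    DROP TABLE #KeysToDelete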

    I've tried both methods in the past, and I normally would use the second one.

    Good luck and let us know how it comes out.

    jg


  • Also make sure your date field is indexed. For a real performance boost, store the date as an integer (e.g., 20060704 = July 4, 2006), as this creates a much smaller index and much quicker scanning of ranges.
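    A quick sketch of the integer-date idea (the column and index names are hypothetical; CONVERT style 112 yields yyyymmdd):

    -- Add a yyyymmdd integer column alongside the datetime column
    ALTER TABLE MyBigTable ADD DateInt int

    -- Backfill it from the existing datetime column
    UPDATE MyBigTable SET DateInt = CONVERT(int, CONVERT(char(8), DateField, 112))

    -- A narrow index on the integer keeps range scans cheap
    CREATE INDEX IX_MyBigTable_DateInt ON MyBigTable (DateInt)

    -- Purge criteria then compare plain integers
    DELETE FROM MyBigTable WHERE DateInt < 20060401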

    Another trick to boost performance is to calculate the cutoff date first, in a separate operation, and then use that value in the criteria.

    For example, this:

    Declare @DateStart datetime
    Set @DateStart = DateAdd(month, -3, getdate())

    Select *
    From MyBigTable
    Where DateField < @DateStart

    will perform much better than this:

    Select *
    From MyBigTable
    Where DateField < DateAdd(month, -3, getdate())

    And if you do it like Jeff's second suggestion, use the nolock optimizer hint on the select step so that your process ignores existing locks and doesn't take locks itself.
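    For instance, the hint would go on the read that builds the key list (names hypothetical):

    SELECT ParentID
    INTO #KeysToDelete
    FROM ParentTable WITH (NOLOCK)  -- read uncommitted: ignores and takes no shared locks
    WHERE DateField < @DateStart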


    My blog: SQL Soldier
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management

  • Could you set the recovery model of the database to SIMPLE before deleting?
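    Something like this, assuming the database is named ReportingDB (hypothetical) and normally runs in FULL recovery:

    -- Switch to SIMPLE so the purge doesn't bloat the transaction log
    ALTER DATABASE ReportingDB SET RECOVERY SIMPLE

    -- ... run the delete job ...

    -- Switching back breaks the log backup chain, so take a full
    -- or differential backup afterwards
    ALTER DATABASE ReportingDB SET RECOVERY FULL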

  • Thanks for the suggestions.

    We've tried paring down what we delete to something very small (a handful of the parent records with all associated records), but since data is coming in every 9 seconds, this still causes data loss, which we would like to avoid.

    The suggestion about the datetime/integer column and precalculating the cutoff date is interesting. When you run the dateadd inline, does SQL Server actually recalculate the value each time it compares it against a record in the table? I'll probably do an integer-based date and change the job to precalculate. It should speed things up, but the bigger problem is still data loss. The nolock optimizer hint sounds like it might help. The vendor suggested using rowlock, but I haven't used query hints, so I need to do a little research. They're investigating further on their side as well.

    As for the trigger, they confirmed what I suspected. The trigger is supposed to be in systems that use that one table with two keys into the parent. In our installation, that table isn't used, so the cursor can go away.

    I don't think the vendor was aware of the problem whatsoever. In most other installations, the impact would be a few scans missing. We ended up doing some crazy things with the data retention policy that made the database grow considerably from its intended size. This caused the delete statement to take 7 hours, at which point we noticed the significant amount of missing data. I posted this originally at the same time I emailed them, and their response was encouraging. I've dealt with other vendors that basically swear off these types of problems.

    Does anyone know of a good overview of using query hints? I haven't looked at BOL yet, but a lot of its information tends to be concerned with the syntax of particular commands and doesn't do a great job of introducing a topic. I'll check Google, too, but if someone knows of a good introduction off the top of their head, it would help.

    Thanks again for the help!

  • Tim,

    How many cascade-related tables do you have that you can't get out of the way of an insert every 9 seconds?

    Also, why doesn't the insert just get blocked and then succeed after the deletes are done? 

    Something doesn't sound right here...

    Anyway, you can specify a lock hint by placing it inside of parens after the table name:

    Delete tblYourTable with (ROWLOCK)
    where YourTableID = @SomeTableID

    In BOL, see the topic "hints", subtopic "locking hints".

    jg


  • We probably could get out of the way for an insert every 9 seconds, but the interval isn't guaranteed to be 9 seconds. It could be less than that. I also don't want to have a setup where I am constantly deleting data as time passes. The system was designed to do a monthly backup (the "archive"), then delete any data older than 3 months. (Yes, the archives all have redundant data.) If I implement a scheme where I delete one record every time one is inserted and the backup fails for a month, I could potentially blow away data that isn't in any of the archives.

    I like your suggestion of generating a list of IDs from the parent to be deleted. It seems that SQL Server is escalating my lock to a table lock to conserve resources. I.e., it's trying to take row locks, but it acquires so many that it escalates to a table lock. I'm now thinking that instead of the delete each month, the job will just generate a list of IDs to be deleted into a table. I'll then have a SQL Server Agent job check this table every hour or so (maybe as low as 10 minutes?) and delete 5 records at a time using the rowlock hint, as in the sketch below. I would expect to run into major problems if I tried to delete all the data at once using the rowlock hint. Even with a few parent records, SQL Server escalated the lock to a table lock in testing the other day. Can SQL Server override the hints provided?
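    A rough sketch of that job step, assuming a DeleteQueue table that holds the parent IDs (all names hypothetical):

    DECLARE @ID int, @Done int
    SET @Done = 0

    WHILE @Done < 5  -- purge at most 5 parent records per run
    BEGIN
        SET @ID = NULL
        SELECT TOP 1 @ID = ParentID FROM DeleteQueue
        IF @ID IS NULL BREAK  -- queue is empty

        -- Ask for row locks; cascading FKs remove the child/grandchild rows
        DELETE FROM ParentTable WITH (ROWLOCK) WHERE ParentID = @ID
        DELETE FROM DeleteQueue WHERE ParentID = @ID

        SET @Done = @Done + 1
    END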

    For the record, there is one parent table, 8 child tables, and 1 grandchild table. Of the child tables, most have 1 record that corresponds to the parent record. There is one that has more, which also has the grandchild hanging off of it. It varies in the number of records, but I've seen it as low as 60, and as high as 200. The grandchild table will have the same number of records. The grandchild table consists of a bunch of image fields, which can be quite large.

  • Hi Tim,

    That's good information.  Now, can you observe the execution plan of the delete statements?  One thing that I've found is that a select or delete that causes a table or index scan can cause lock escalation problems even when affecting only a small number of rows.  I do believe that SQL Server will ignore lock hints if it wants to, but I'm not 100% sure.

    Anyway, check the execution plan of the deletes for any scans.  You may need to add or adjust an index or two.

    Also, if your selection criteria (rows > 3 months old) causes a table scan, and you are using that query in the same transaction as the delete, then your table scan will be part of the blocking transaction.  That is a good reason to build the key table first.


    I just ran across the link below, which confirms what I was afraid of. The rowlock hint tells SQL Server where to start with locks, but the server can escalate if it decides to. You have no control over it.

    http://groups.google.com/group/microsoft.public.sqlserver.programming/browse_thread/thread/c331c731e64bc573

    I'll run a delete for 1 record and let you know what I find.

  • Ha. The statement was generating a table scan for one table (the child with the grandchild hanging off of it). I indexed the column it was scanning, and I now see no escalations with a few records on my laptop. If I delete too many, I still see lock escalations in profiler, but really, that's in some ways expected when you bite off too much. This means I think I'm still going to have to go the route of generating a list to delete and have a job come by and purge a few at a time, but the results I'm getting now are much more along the lines of what I would have originally expected.
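    For reference, the fix was an index along these lines (names hypothetical):

    -- Index the child table's foreign-key column so the cascading
    -- delete can seek to the affected rows instead of scanning
    CREATE NONCLUSTERED INDEX IX_ChildTable_ParentID ON ChildTable (ParentID)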

    Thanks again for everyone's help. I'm going to send an email to the vendor with what I've found, and let them mull it over.

  • I purge data from heavily used OLTP tables... some of my tables are accessed, inserted into, and updated by users all the time...

    I use the following method...

    1. Create an index on the datetime column.

    2. Delete 100/500/1000 rows at a time, depending on the sweet spot you find by trial and error. Delete one row at a time if you don't want to delete in 500/1000-row chunks.

    3. After each delete, use a WAITFOR DELAY '00:00:00.001' command to yield to any waiting SPIDs.

    Here is the example code:

    SET NOCOUNT ON

    DECLARE @NumRows int, @RecordsDeleted int, @Deleted int, @From datetime

    SET @RecordsDeleted = 0
    SET @From = DATEADD(month, -3, GETDATE())  -- cutoff date

    -- Count only the rows that qualify for deletion
    SELECT @NumRows = COUNT(*) FROM TableName WHERE DtColName <= @From

    WHILE @RecordsDeleted < @NumRows
    BEGIN
        SET ROWCOUNT 500  -- cap each delete at a 500-row batch

        DELETE FROM TableName WHERE DtColName <= @From
        SET @Deleted = @@ROWCOUNT
        IF @Deleted = 0 BREAK  -- nothing left to delete

        SET @RecordsDeleted = @RecordsDeleted + @Deleted

        WAITFOR DELAY '00:00:00.001'  -- yield to waiting SPIDs between batches
    END

    SET ROWCOUNT 0  -- restore normal row-limit behavior


    MohammedU
    Microsoft SQL Server MVP
