Best delete and archive method?

Question

Post reply

Best delete and archive method?

smv929

SSCrazy

Points: 2889
More actions
January 6, 2004 at 4:00 pm

#83745

What is the best technique for deleting and arciving. I currently run a basic script that

1. Populates a tempory table with a list of orders to archive.

2. Archives those orders' records to a set of archive tables on another server.

THis is dones using an INSERT join like...

    INSERT INTO arc_server.db..arc_table

    SELECT e.* from source_table e

    join temp_table d on e.order_num = d.order_num

3. Deletes the source records. It uses a DELETE join similar to the INSERT above.

Would I be better off to use replication, dts, or bcp out to txt file and bcp back into the archive tables. I would suppose dts is the best. If so, should I be employing parallel threads, table locks or any other best practice. Thanks for any advice.

smv929

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Steve Jones - SSC Editor SSC Guru Points: 728159 More actions · Answer 1

I like your method. It's simple, fast, almost bulletproof. Other methods will be more complex and have more places for things to go wrong.

The biggest issue is to be sure there are no errors before performing the delete. Might include a check that joins the two tables and looks for xx rows before doing the delete. @@rowcount from teh insert can give you the xx to look for.

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Johan Bijnens SSC Guru Points: 134873 More actions · Answer 2

Maybe you should take locking into consideration.

Your methode will do fine, but when used with big volumes, the delete might lockout the table. You could avoid this by using smaler chunks.

First fill your archive table, then :

set nocount on

declare @rowcount int

set @rowcount = 1

while @rowcount > 0

begin

begin tran TopDelete

delete e

from source_table e

inner join (select top 5000 d.order_num

from source_table s

inner join temp_table d on s.order_num = d.order_num ) SelTop

-- I suppose Order_Num is your unique key ?

on e.order_num = SelTop.order_num

set @rowcount = @@rowcount

commit tran TopDelete

end

When using simple archive logging, this will also minimize growth of your log-files.

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

smv929 SSCrazy Points: 2889 More actions · Answer 3

alzdba,

Thanks. That's an elegant approach to batching a large result set. There's one complication: Many of the tables I delete/archive are Order-related. The order_header table has one record per order. However, the order_detail table has multiple orders per line. An order may have from 1, 5, 20, 50 (for one customer, up to 1500 lines).

So the above script will delete exactly 5000 records from the order_header record, which is great. However, the problem is when the source table is order_detail. The DELETE statement will allow many more records than 5000 to be deleted since order_detail has multiple lines per order. Is there a similar way to limit the batch to 5000 when deleting from the order_detail table? Thanks again.

smv929

Johan Bijnens SSC Guru Points: 134873 More actions · Answer 4

What I do now in such cases is I determine the average of dependant rows per dependant table for one parent row and adjust the top-statement. I also document this in the script !

So basicaly what I do is give it a number of parent-id's on which it should perform dependant deletes.

Some control is better than none.

You can, however, also perform these delete based on a "TopSel" part where you select the primary-keys of you dependant table when joined with the targeted parent table. Then your top-n will be exactly top-n and because you've deleted the previouse top-n dependant-rows, the next run will perform deletes to the next top-n rows,...

e.g.

-- replace the delete query in the while loop with this one

delete ChildTb

from T_Dependant1 ChildTb

inner join

(select top 1000 D.PkCol1, D.PkCol2 --all PK-columns

from T_Dependant1 D

inner join T_Parent P

on P.Pk = D.Related_Pk

-- ? order by D.PkCol1, D.PkCol2 --all PK-columns

) SelTop

on ChildTb.PkCol1 = SelTop.PkCol1

and ChildTb.PkCol2 = SelTop.PkCol2