Best delete and archive method?

  • What is the best technique for deleting and arciving. I currently run a basic script that

    1. Populates a tempory table with a list of orders to archive.

    2. Archives those orders' records to a set of archive tables on another server.

    THis is dones using an INSERT join like...

        INSERT INTO arc_server.db..arc_table

        SELECT e.* from source_table e

        join temp_table d on e.order_num = d.order_num

    3. Deletes the source records. It uses a DELETE join similar to the INSERT above.

    Would I be better off to use replication, dts, or bcp out to txt file and bcp back into the archive tables. I would suppose dts is the best. If so, should I be employing parallel threads, table locks or any other best practice. Thanks for any advice.

     


    smv929

  • I like your method. It's simple, fast, almost bulletproof. Other methods will be more complex and have more places for things to go wrong.

    The biggest issue is to be sure there are no errors before performing the delete. Might include a check that joins the two tables and looks for xx rows before doing the delete. @@rowcount from teh insert can give you the xx to look for.

  • Maybe you should take locking into consideration.

    Your methode will do fine, but when used with big volumes, the delete might lockout the table. You could avoid this by using smaler chunks.

    First fill your archive table, then :

    set nocount on

    declare @rowcount int

    set @rowcount = 1

    while @rowcount > 0

    begin

        begin tran TopDelete

        delete e 

        from source_table e

        inner join (select top 5000 d.order_num

                        from source_table s

                          inner join temp_table d on s.order_num = d.order_num ) SelTop

    -- I suppose Order_Num is your unique key ?

        on e.order_num = SelTop.order_num

        set @rowcount = @@rowcount

        commit tran TopDelete

    end

    When using simple archive logging, this will also minimize growth of your log-files.

     

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • alzdba,

    Thanks. That's an elegant approach to batching a large result set. There's one complication: Many of the tables I delete/archive are Order-related. The order_header table has one record per order. However, the order_detail table has multiple orders per line. An order may have from 1, 5, 20, 50 (for one customer, up to 1500 lines).

    So the above script will delete exactly 5000 records from the order_header record, which is great. However, the problem is when the source table is order_detail. The DELETE statement will allow many more records than 5000 to be deleted since  order_detail has multiple lines per order. Is there a similar way to limit the batch to 5000 when deleting from the order_detail table? Thanks again. 


    smv929

  • What I do now in such cases is I determine the average of dependant rows per dependant table for one parent row and adjust the top-statement. I also document this in the script !

    So basicaly what I do is give it a number of parent-id's on which it should perform dependant deletes.

    Some control is better than none. 

    You can, however, also perform these delete based on a "TopSel" part where you select the primary-keys of you dependant table when joined with the targeted parent table. Then your top-n will be exactly top-n and because you've deleted the previouse top-n dependant-rows, the next run will perform deletes to the next top-n rows,...

    e.g.

    -- replace the delete query in the while loop with this one

    delete ChildTb

    from T_Dependant1 ChildTb

    inner join

    (select top 1000 D.PkCol1, D.PkCol2  --all PK-columns

    from T_Dependant1 D

    inner join T_Parent P

    on P.Pk = D.Related_Pk

    -- ? order by D.PkCol1, D.PkCol2 --all PK-columns

    ) SelTop

    on ChildTb.PkCol1 = SelTop.PkCol1

    and ChildTb.PkCol2 = SelTop.PkCol2

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply