Purge and Archive using ssis packages

  • Purge and Archive using ssis 2008R2 packages

    The OLTP DB and Archive DB are of two separate servers. We have like hundreds of tables. Deletion is not a problem since I can issue a delete command (base on a condition), and this can be done on Execute SQL task job. I can loop through the table list for deletion and issue a execute sql job -- this is feasible.

    However, the archiving thing is somewhat hard to implement. For what I understood from your suggestion, I still have to create one data flow per table coz for each table it has different metadata column mappings. I cannot use OLEDB Source and destination and just loop through the table list for archive.

    I just wanna get all data with specific condition and transfer it to an Archive DB. We dont want to use linked server too (INSERT INTO ArchiveTable SELECT {Columns here} FROM ServerName.Databasename.dbo.OLTPTable WHERE {condition}).

    Kindly suggest.

  • A solution might be that you generate SSIS packages.

    This allows you to deal with the large number of tables in a flexible way.

    There is a stairway to BIML on this website:

    http://qa.sqlservercentral.com/stairway/100550/

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • jansub07 (5/26/2014)


    Purge and Archive using ssis 2008R2 packages

    The OLTP DB and Archive DB are of two separate servers. We have like hundreds of tables. Deletion is not a problem since I can issue a delete command (base on a condition), and this can be done on Execute SQL task job. I can loop through the table list for deletion and issue a execute sql job -- this is feasible.

    However, the archiving thing is somewhat hard to implement. For what I understood from your suggestion, I still have to create one data flow per table coz for each table it has different metadata column mappings. I cannot use OLEDB Source and destination and just loop through the table list for archive.

    I just wanna get all data with specific condition and transfer it to an Archive DB. We dont want to use linked server too (INSERT INTO ArchiveTable SELECT {Columns here} FROM ServerName.Databasename.dbo.OLTPTable WHERE {condition}).

    Kindly suggest.

    How many tables are you actually setting up for the archival process? And what is the "condition" that you're basing the DELETEs on? I ask because if the condition is based on (for example), a date column (especially if it has a common name like "Created_On"), this would be a relatively simple task using a bit of dynamic SQL and a Linked Server (or trusted-connection OPENDATASOURCE, etc). All you'd need is the list of tables (and, possibly, the nane of the column for the "condition") to build the dynamic SQL from. Then you could run a single stored procedure instead of multiple generated packages.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (5/26/2014)


    Then you could run a single stored procedure instead of multiple generated packages.

    But you could run the packages in parallel 😉

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (5/26/2014)


    Jeff Moden (5/26/2014)


    Then you could run a single stored procedure instead of multiple generated packages.

    But you could run the packages in parallel 😉

    As you could with the T-SQL, but true enough that it's easier to do so in SSIS. But, let's run just a handful of either in parallel for this task and see what happens to the hard drive on the other server and if they'd actually complete before a serial task. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (5/26/2014)


    Koen Verbeeck (5/26/2014)


    Jeff Moden (5/26/2014)


    Then you could run a single stored procedure instead of multiple generated packages.

    But you could run the packages in parallel 😉

    As you could with the T-SQL, but true enough that it's easier to do so in SSIS. But, let's run just a handful of either in parallel for this task and see what happens to the hard drive on the other server and if they'd actually complete before a serial task. 🙂

    True true, you get higher I/O latency when writing in parallel.

    Most of the times I do get a performance benefit if all the destinations are differerent tables.

    At my current client I have about 10 tables. If I write in serial, it takes about 15 minutes (slow Oracle source). If I write in parallel, it's about 8 minutes.

    That's enough performance improvement by just arranging things differently in SSIS.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • The nearly 50% improvement you identified definitely makes it worth the try. But as the disks continue to get bigger and bigger, the chances of the tables living on the same spindle or spindles increases. Parallel actions, such as these, were very much more effective when we had many more spindles (and, thus, R/W heads) in play.

    For grins about a year ago, I tried backing up two fairly large databases to the same drive letter at the same time. It was a dismal failure and took a lot longer than serial backups even though the test was on a SAN.

    Still definitely worth the try, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply