Transferring data between tables in chunks

  • We have implemented local partitioned views for better manageability of our 1+ billion record table. This table holds archived data for a couple of years, so we divided it into several tables, one for each quarter. Only the current quarter's table remains active for write operations; the others remain read only.

    For backup purposes we keep the archive tables in one database and the active tables in another, so that we only need to perform regular backups on the active database. Once a quarter we move the table for the just-concluded quarter to the archive database, do a full backup of the archive database, and then drop the table from the active database. (A minimal sketch of this partitioned-view layout appears at the end of this post.)

    With that background I have a couple of questions:

    The quarterly table I was talking about grows to roughly 100 million records. What is the best way to transfer this table between the databases? My concern is that if I do it in one shot and it does not complete in one transaction, a rollback of that magnitude is scary. I would prefer a way to transfer in chunks (a few thousand records at a time) and commit, so my log and recovery interval stay in check. The only way to do this now seems to be a DTS transformation, which looks more like a bcp out followed by a bcp in. I notice DTS packages display messages like "1000 rows copied", etc.

    Is there any way to do the same from T-SQL (copying, say, 1000 rows at a time) without going through a file in between?

    thanks.
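
    For context, a minimal sketch of the kind of quarterly local partitioned view described above; the table, column, and constraint names are illustrative, not the actual schema:

    -- Member tables: one per quarter, each with a CHECK constraint on the
    -- partitioning column so the optimizer can prune quarters at query time.
    -- The partitioning column must be part of the primary key.
    CREATE TABLE dbo.Orders2002Q3 (
        OrderID   int      NOT NULL,
        OrderDate datetime NOT NULL
            CHECK (OrderDate >= '20020701' AND OrderDate < '20021001'),
        CONSTRAINT PK_Orders2002Q3 PRIMARY KEY (OrderID, OrderDate)
    )

    CREATE TABLE dbo.Orders2002Q4 (
        OrderID   int      NOT NULL,
        OrderDate datetime NOT NULL
            CHECK (OrderDate >= '20021001' AND OrderDate < '20030101'),
        CONSTRAINT PK_Orders2002Q4 PRIMARY KEY (OrderID, OrderDate)
    )
    GO

    -- The local partitioned view is a UNION ALL over the member tables;
    -- archived quarters can live in another database on the same server
    -- and be referenced with three-part names.
    CREATE VIEW dbo.OrdersAll
    AS
        SELECT OrderID, OrderDate FROM dbo.Orders2002Q3
        UNION ALL
        SELECT OrderID, OrderDate FROM dbo.Orders2002Q4
    GO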

  • How about writing a SQL script to divide the data into smaller pieces by datetime, or by whatever you are using as your primary key, as long as it is an int you can increment. Once you have a way to divide the records, you can write a cursor or even a WHILE loop to step through the whole process.

    example:

    declare @i int
    set @i = 0

    while @i < 1000000
    begin
        insert into dest_table        -- destination table (placeholder name)
        select * from source_table    -- source table (placeholder name)
        where id >= @i
          and id < @i + 1000          -- upper bound exclusive so batches do not overlap

        set @i = @i + 1000
    end

    mom

  • Since you don't state what your PK is, this is a little hard to do. But I would create a script that sets ROWCOUNT to 1000 and then loops until the affected row count is 0...

    
    
    DECLARE @iNumRows int
    DECLARE @iStartID int
    DECLARE @iRowsAffected int
    DECLARE @iErr int
    SET @iNumRows = 1000      -- The number of rows you want per batch...
    SET @iStartID = 0         -- Initial value: anything below the smallest PK...
    SET @iRowsAffected = 1    -- Any non-zero number..

    SET ROWCOUNT @iNumRows
    WHILE @iRowsAffected >= 1
    BEGIN
        BEGIN TRAN

        INSERT INTO ....
        SELECT ....
        WHERE PK > @iStartID  -- strictly greater, so the boundary row is not copied twice
        ORDER BY PK

        SELECT @iErr = @@ERROR, @iRowsAffected = @@ROWCOUNT

        IF @iErr != 0
        BEGIN
            ROLLBACK TRAN     -- and handle the error as needed
            BREAK
        END

        COMMIT TRAN           -- You might even want to truncate the log here...

        SELECT @iStartID = MAX(PK)
        FROM REMOTETABLE...
    END -- While loop

    SET ROWCOUNT 0            -- Clean up after yourself...

    Please note I have not tested this; I just typed it up here. I've done things like this in the past and it has worked great, though. You may also need to change @iStartID to match whatever your PK data type is, and make sure you do an ORDER BY on your PK field(s).

    Gary Johnson

    Microsoft Natural Language Group

    DBA, Sr. DB Engineer





    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • Thanks to mom, gljjr and erasmusj for your replies; they would definitely work. But consider the situation where I do not have a PK, or my primary key consists of multiple character-based columns. I am curious how DTS handles this, because when I use the DTS Import/Export Wizard on tables with no primary key or with multiple columns in the PK, it still copies 1000 or so records at a time. My guess is that it is transferring the data to a file and loading it in chunks (some variation of BCP). Is that right? Or is it doing it in memory, copying out and loading in rows? If it is doing it in memory, how does it keep a pointer to where it left off in the source (we are talking about tables with no PK)?

    BTW, I ran a trace while executing the package, and where it does the insert I see a construct like "insert bulk [PartDB].[dbo].[newtbl]("Id" int, "State" varchar(50))". I see no reference to "insert bulk" in BOL. Does anyone know what this is?

    At this time I am leaning towards using a DTS transformation, simply because it does the transfer in chunks like I wanted and saves me from coding it, and calling the DTS package from the procedure that performs this task (a rough sketch of launching a package from T-SQL is below). Any reason why I should not do this?
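
    For illustration only, one common way at the time to launch a saved DTS package from T-SQL was to shell out to dtsrun.exe via xp_cmdshell (which must be permitted on the instance); the server and package names below are hypothetical:

    -- Run a DTS package stored on the local server using a trusted connection.
    DECLARE @cmd varchar(500)
    SET @cmd = 'dtsrun /S MYSERVER /N MoveQuarterToArchive /E'   -- placeholder names
    EXEC master.dbo.xp_cmdshell @cmd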

  • DTS would show you that it finishes 1000-row chunks at a time, but if it failed it would roll back everything, leaving you with 0 rows of data. My other thought was: is there any way to add an identity field just for the sake of keeping your place, or is there any date or time column, for that matter, that we could go by? (A rough sketch of the identity-column idea follows below.)

    mom
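
    A rough, untested sketch of the identity-column idea above, assuming a hypothetical source table dbo.Q4Data with no usable key; the database and column names are placeholders:

    -- Add a throwaway identity column purely to give each row a position.
    -- Note: on a ~100 million row table this ALTER is itself a heavy operation.
    ALTER TABLE dbo.Q4Data ADD RowMarker int IDENTITY(1, 1) NOT NULL
    GO

    DECLARE @i int, @max int
    SELECT @max = MAX(RowMarker) FROM dbo.Q4Data
    SET @i = 0

    WHILE @i < @max
    BEGIN
        INSERT INTO ArchiveDB.dbo.Q4Data (Col1, Col2)   -- the real column list goes here
        SELECT Col1, Col2
        FROM dbo.Q4Data
        WHERE RowMarker > @i
          AND RowMarker <= @i + 1000

        SET @i = @i + 1000
    END
    GO

    -- Remove the helper column once the copy is done.
    ALTER TABLE dbo.Q4Data DROP COLUMN RowMarker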

  • With the batch size configuration I can limit the number of rows per transaction, so if a batch fails only the inserts in that batch are rolled back. I do have a date field, each date has an approximately even distribution of records, and I can do the inserts by date (a sketch of that approach follows this post). But consider my question more as an investigative one: how could the DTS transformation task be handling this, given that it works against any table, whether it has a PK or not, or has multiple columns in the PK? And it does it in batches, each one a unit of transaction. That is what we should be able to do in T-SQL without having to alter the table structure, because I do not see DTS altering the table. Is DTS doing a BCP out and a bulk copy in (in memory, since there is no file to be seen)? Is it using cursors, or some other internal, undocumented method not exposed through T-SQL? As I have already stated, when I capture the DTS run in a trace I see it issuing an "insert bulk" command that is not documented anywhere.

    Anyone from Microsoft in the forum?
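
    A minimal sketch of batching by the date field, one day per committed transaction; the database, table, and column names are placeholders, and the half-open date ranges assume a datetime column:

    DECLARE @d datetime, @end datetime
    SET @d   = '20021001'   -- first day of the quarter being moved
    SET @end = '20030101'   -- first day of the following quarter

    WHILE @d < @end
    BEGIN
        BEGIN TRAN

        INSERT INTO ArchiveDB.dbo.Q4Data (Col1, Col2, EntryDate)   -- the real column list goes here
        SELECT Col1, Col2, EntryDate
        FROM ActiveDB.dbo.Q4Data
        WHERE EntryDate >= @d
          AND EntryDate < DATEADD(day, 1, @d)   -- half-open range: one day per batch

        COMMIT TRAN

        SET @d = DATEADD(day, 1, @d)
    END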

  • sxg6023,

    I am really not sure what DTS does behind the scenes. Here is what I found in BOL (a small usage example follows the syntax):

    BULK INSERT [ [ 'database_name'.] [ 'owner' ].] { 'table_name' FROM 'data_file' }
    [ WITH
        (
        [ BATCHSIZE [ = batch_size ] ]
        [ [ , ] CHECK_CONSTRAINTS ]
        [ [ , ] CODEPAGE [ = 'ACP' | 'OEM' | 'RAW' | 'code_page' ] ]
        [ [ , ] DATAFILETYPE [ =
            { 'char' | 'native' | 'widechar' | 'widenative' } ] ]
        [ [ , ] FIELDTERMINATOR [ = 'field_terminator' ] ]
        [ [ , ] FIRSTROW [ = first_row ] ]
        [ [ , ] FIRE_TRIGGERS ]
        [ [ , ] FORMATFILE = 'format_file_path' ]
        [ [ , ] KEEPIDENTITY ]
        [ [ , ] KEEPNULLS ]
        [ [ , ] KILOBYTES_PER_BATCH [ = kilobytes_per_batch ] ]
        [ [ , ] LASTROW [ = last_row ] ]
        [ [ , ] MAXERRORS [ = max_errors ] ]
        [ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,...n ] ) ]
        [ [ , ] ROWS_PER_BATCH [ = rows_per_batch ] ]
        [ [ , ] ROWTERMINATOR [ = 'row_terminator' ] ]
        [ [ , ] TABLOCK ]
        )
    ]

    mom
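
    For what it's worth, a minimal usage sketch of BULK INSERT with BATCHSIZE, so that every 1000-row batch is committed as its own transaction; the file path and table name are placeholders. Note that BULK INSERT loads from a data file, so it still implies an export step (e.g. bcp out) first:

    BULK INSERT ArchiveDB.dbo.Q4Data
    FROM 'D:\transfer\q4data.bcp'
    WITH
        (
        DATAFILETYPE = 'native',   -- matches a native-mode bcp export
        BATCHSIZE    = 1000,       -- commit every 1000 rows
        TABLOCK                    -- reduce locking overhead on the target
        )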
