Tranactional Replication - Deleting 52 million rows from subscriber in a proc

  • Hi all

    I have a DB that is a publisher with 52 million rows in a table.

    I have a subscriber which gets this data and then I insert it into a new database on the same server.

    I then don't need to keep the data in the subscriber or publisher (which updates with new daily data each day)

    So my proc

    Inserts into my database from subscriber

    where record doesn't exist in my database

    Then deletes from subscriber for the ones I just inserted into my Database

    This has not been running for many months, which is why the tables have grown so large....

    Anyway, I run the proc, it inserts the 52 mill rows into my database and then deletes from subscriber.....table has 0 row count. but proc is still running whilst the publisher rows are sllllllooooooooooowwwwwwwwyyyyyy being deleted...and I mean really slowly....at the current rate of about 10 rows a second...calculating it will take 115 days to complete!

    Should the proc not finish when the delete has occurred on the subscriber? and then the replication will delete from the publisher without being linked to the proc? Is this proc going to keep running until the rows have been deleted from the publisher?

    Many thanks in advance

  • More information required...Does your procedure delete anything from the publisher? Are you replicating delete statements at the publisher? Could you provide the text of your procedure so we can understand what it's doing?

    Is there a reason that you're not just replicating directly into your final table?

    Kendal Van Dyke
    http://kendalvandyke.blogspot.com/[/url]

  • mark.hayter (11/27/2008)


    Hi all

    I have a DB that is a publisher with 52 million rows in a table.

    I have a subscriber which gets this data and then I insert it into a new database on the same server.

    I then don't need to keep the data in the subscriber or publisher (which updates with new daily data each day)

    So my proc

    Inserts into my database from subscriber

    where record doesn't exist in my database

    Then deletes from subscriber for the ones I just inserted into my Database

    This has not been running for many months, which is why the tables have grown so large....

    Anyway, I run the proc, it inserts the 52 mill rows into my database and then deletes from subscriber.....table has 0 row count. but proc is still running whilst the publisher rows are sllllllooooooooooowwwwwwwwyyyyyy being deleted...and I mean really slowly....at the current rate of about 10 rows a second...calculating it will take 115 days to complete!

    Should the proc not finish when the delete has occurred on the subscriber? and then the replication will delete from the publisher without being linked to the proc? Is this proc going to keep running until the rows have been deleted from the publisher?

    Many thanks in advance

    My knee-jerk reaction is that there's got to be a better way to move the data. It doesn't make sense to me that you would have data at ServerA.Database, replicate it to ServerB.Database, move the data from ServerB.Database to ServerB.Database2, and then delete all of the data from ServerA and ServerB.Database. Yikes! Are you running SQL Sysiphus? 😀

    Anyway...

    For me, it's always about transaction management. Large transactions consume log space, take longer to replicate, cause blocking, etc, etc.

    So if you're going to delete every row from a table, batch it into a couple of thousand rows at a time. Depending on how polite you want to be your batches can be 10 rows or 500,000. Whatever your comfortable with. Here's how I manage my large updates and deletes:

    declare @iRowcount bigint,

    @dtThen datetime,

    @ielapsed bigint,

    @iTotalRows bigint

    select @iRowCount = 1,

    @iElapsedThreshold = 10,

    @iTotalRows = 0

    set rowcount 50000

    set nocount on

    while @iRowcount > 0

    begin

    begin tran batchupdate

    select @dtThen = getdate()

    --> your delete here <--

    select @iRowcount = @@ROWCOUNT

    commit tran batchupdate

    select @iTotalRows = @iTotalRows + @iRowcount

    select @ielapsed = datediff(ss,@dtThen,getdate())

    print 'Total Row(s) Affected: ' + convert(varchar(30), @iTotalRows) +

    ' Row(s) Affected: ' + convert(varchar(30), @iRowCount) +

    ' Elapsed: ' + convert(varchar(30),@iElapsed)

    end

    set nocount off

    set rowcount 0

    go

    In fact, I wrote a silly little wrapper sproc for stuff like this. It doesn't actually execute the command, it prints code you can shift-f6, ctrl-a, shift-f6, ctrl-v, alt-x (copy, paste, and execute).

    Usage:

    exec makebatch {rows per transaction}, {max number of seconds}, {incoming command text}

    Example:

    exec makebatch 50000, 20, 'delete from SisterAct3PetitionSignatures'

    The above will give you code to delete 50,000 records at a time and then yield if the most recent execution took more than 20 seconds to perform. I put this in so that I can be a hero without taxing my users.

    "Ok, I'll try to update all of those rows but this should really be done during off-peak hours, but let me see what I can do. {nudge, nudge, wink, wink}"

    If it's something you think you might like, have at it:

    CREATE procedure makebatch

    @iBatchSize bigint,

    @iThreshold bigint,

    @vcInText varchar(8000)

    as

    begin

    print 'declare @iRowcount bigint,'

    print ' @dtThen datetime,'

    print ' @ielapsed bigint,'

    print ' @iElapsedThreshold bigint,'

    print ' @iTotalRows bigint'

    print ''

    print 'select @iRowCount = 1,'

    print ' @iElapsedThreshold = ' + convert(varchar(30),@iThreshold) + ', '

    print ' @iTotalRows = 0'

    print ''

    print 'set rowcount ' + convert(varchar(30),@iBatchSize)

    print 'set nocount on'

    print ''

    print 'while @iRowcount > 0'

    print 'begin'

    print ' begin tran batchupdate'

    print ' select @dtThen = getdate()'

    print ''

    print ' /*'

    print ' ** your update code'

    print ' */'

    print ' ' + @vcInText

    print ''

    print ' select @iRowcount = @@ROWCOUNT'

    print ' commit tran batchupdate'

    print ' select @iTotalRows = @iTotalRows + @iRowcount'

    print ' select @ielapsed = datediff(ss,@dtThen,getdate())'

    print ''

    print ' print ''Total Row(s) Affected: '' + convert(varchar(30), @iTotalRows) +

    '' Row(s) Affected: '' + convert(varchar(30), @iRowCount) +

    '' Elapsed: '' + convert(varchar(30),@iElapsed)'

    if @iThreshold > 0

    begin

    print ' if @ielapsed > @iElapsedThreshold'

    print ' begin'

    print ' print ''Aborting due to threshold violation.'''

    print ' select @iRowcount = 0'

    print ' set nocount off'

    print ' set rowcount 0'

    print ' break'

    print ' end'

    end

    print 'end'

    print 'set nocount off'

    print 'set rowcount 0'

    print 'go'

    end

    /* If you still scream [font="Courier New"]SELECT[/font] at your SQL Server and capitalize reserved
    ** words and statements: Please, just let it go. We've had syntax highlighting
    ** for almost 9 years, now.
    */

    Love = Hungarian notation = 0 conflicts with reserved words/entities

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply