How can I commit more frequently?

  • Hi everyone. If I have an update statement that updates 30+ million rows, is there a way to make it commit every few thousand rows that it updates?  Thanks in advance.

  • Put your update statement in a loop: update only the rows that fit the current criteria, increment the criteria, and loop around to update the next batch. 

    mom

  • I think that using a loop (especially to iterate through a cursor) might be a good idea here, so I'd agree with mom ... BUT ... you don't want to commit every loop iteration if you're retrieving one row per iteration.  That would kill your performance (I could go into painful detail here, but it works better on a whiteboard).  I'm pretty sure that mom means the technique where your where clause allows the update of a bunch of rows every loop iteration, so that each commit also handles a bunch of rows.  But I thought it would be worth mentioning explicitly, because (ouch!) issuing a commit on each row update can just destroy your throughput.

    Cheers,

    Chris

  • The technique I often use to keep the log from growing too large is SET ROWCOUNT. It controls exactly how many rows each statement updates.
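    For example (a sketch only - tablename and its status column are made up, and note that the where clause has to exclude rows that were already updated, or the loop never finishes):

    set rowcount 1000              -- each statement now touches at most 1000 rows

    declare @rows int
    set @rows = 1

    while @rows > 0
    begin
       update tablename
       set    status = 'processed'
       where  status <> 'processed'   -- must exclude already-updated rows
       set @rows = @@ROWCOUNT          -- reaches 0 when nothing is left
    end

    set rowcount 0                 -- always reset when done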

     

  • That's a good technique, but it's easiest to use set rowcount on deletes (because you don't have to adjust the SQL statement - just run it repeatedly in a loop until you run out of rows to delete).  Remember, shahgols needs to do updates, and so will need something a little more sophisticated than a "vanilla" set rowcount loop.  I assume this is what mom was referring to also - in this case the where clause of the update will probably have to be dynamically modified with each iteration to produce a windowed effect, so that every row is updated at least once.  Either that, or updating through a cursor, counting the iterations, and doing a commit / begin transaction every Nth iteration, with one more cleanup commit at the end.
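    For the delete case, a minimal sketch of the vanilla loop (the audit_log table and the date cutoff are made up for the example):

    set rowcount 1000

    declare @rows int
    set @rows = 1

    while @rows > 0
    begin
       delete audit_log where logged_at < '20040101'
       set @rows = @@ROWCOUNT      -- reaches 0 when nothing is left to delete
    end

    set rowcount 0                 -- always reset when finished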

    The "windowed where clause" approach can be more efficient - sometimes - but the "cursor with commits every Nth row" approach is general and can always be used (sometimes the "windowed where clause" can be too complex for mere mortals to implement when the where clause has to be more complex than just identifying a range of integer primary key values).

    Cheers,

    Chris

  • Maybe you can use an UPDATE in combination with a SELECT TOP 1000.
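    A sketch of that idea (tablename, [key], and the flag column marking unprocessed rows are all invented for the example; each pass through the loop is its own autocommitted transaction):

    declare @rows int
    set @rows = 1

    while @rows > 0
    begin
       update t
       set    t.flag = 1
       from   tablename t
       inner join (select top 1000 [key]
                   from   tablename
                   where  flag = 0
                   order by [key]) batch
       on     t.[key] = batch.[key]
       set @rows = @@ROWCOUNT
    end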

     

  • here is what I mean:

    declare @a int, @b int

    set @a = 0
    set @b = (select max([key]) from tablename)   -- top of the key range

    while @a <= @b
    begin
       update tablename
       set    col1 = 'new value'     -- your update goes here
       where  [key] >= @a
       and    [key] < @a + 1000
       set @a = @a + 1000            -- commit every 1000 rows
    end

    We could commit every row or every so many thousand rows, depending on how you want to set it up.

    mom

  • The hard thing about this problem is tracking the work you've done and the work remaining.  Ideally you want to be able to track a logical transaction against the large table.  I think shahgols probably wants to commit 1000 updated rows, not whatever happens to get updated for every 1000 rows inspected.

    I think one good approach is to snapshot chunks of the desired table updates into a temporary table using TOP, and then update the table by joining it with the temporary table.  This batches the update size to your preferred 1000 rows at a time and allows for scanning large chunks of the table under the update constraints on each loop pass.

    This does require an amenable pk, and of course a correctly ordered select query plan by pk under the joins for the calculation of new column values, so it may cause overhead if the clustering isn't on the pk.

    Also, I wouldn't recommend this if the pks are somehow mutating.

    Another thing to think about is avoiding second instances of such an update.  So you'll probably want some form of incremental-updates semaphore to keep a second instance of the same update from running against the table concurrently.

    Here's a brief outline of what I mean by the snapshot chunking and updates.

    Any comments on this approach would be welcomed.

    Peter Evans (__PETER Peter_)

    -- Using the status column in the table approach.
    -- ([target table] and [key boundaries] below stand in for the real table
    --  being updated and a small table that tracks the last key processed.)

    declare @runtimeid BigInt
    declare @ROWSUPDATED BigInt

    -- Get semaphore
    update incrementalupdatejobids
    set    jobrunning = runtimeid + 1
    ,      runtimeid  = runtimeid + 1
    from   incrementalupdatejobids
    where  name = 'table'
    and    updatetaskname = 'updatetaskname'
    and    jobrunning is null

    select @ROWSUPDATED = @@ROWCOUNT

    if (@ROWSUPDATED > 0)  -- Got the update-task-specific record
    begin
       -- Set iterator range values to track the job.
       select @runtimeid = runtimeid
       from   incrementalupdatejobids
       where  name = 'table'
       and    updatetaskname = 'updatetaskname'
       and    jobrunning is not null

       insert [key boundaries] (runtimeid, KEYBOUNDARY)
       select @runtimeid, NULL

       -- Create TABLE [#rowids w updated values]

       -- Loop over range
       while (@ROWSUPDATED > 0)
       begin
          -- Get a calculated snapshot of the first 1000 rows to process.
          insert into [#rowids w updated values]
          select top 1000
                 [target table].[ROWID]
          ,      (desired updated value expression) as COL1
          ,      (desired updated value expression) as COL2
          -- ...
          ,      (desired updated value expression) as COLN
          from   [target table]
          -- OTHER CONSTRAINT TABLE JOINS GO HERE
          -- NOTE: move the [ROWID] constraints into the first join clause
          -- to indicate precedence for [ROWID] filtering
          -- NOTE: (TRUE OR UNKNOWN) evaluates to TRUE, which is what lets
          -- the first pass (KEYBOUNDARY still NULL) fall through to the OR
          where (
                   [target table].[ROWID] > (
                      select KEYBOUNDARY
                      from   [key boundaries]
                      where  runtimeid = @runtimeid)
                or
                   [target table].[ROWID] >= (
                      select min([target table].[ROWID])
                      from   [target table]
                      where  not exists (
                         select *
                         from   [key boundaries]
                         where  runtimeid = @runtimeid
                         and    KEYBOUNDARY is not null))
                )
          and   -- ... NORMAL UPDATE CONSTRAINTS GO HERE
          order by [target table].[ROWID]

          -- Update the table's values
          update [target table]
          set    [updated column 1] = [#rowids w updated values].[COL1]
          ,      [updated column 2] = [#rowids w updated values].[COL2]
          -- ,  ...
          from   [target table]
          inner join [#rowids w updated values]
          on     [target table].[ROWID] = [#rowids w updated values].[ROWID]

          select @ROWSUPDATED = @@ROWCOUNT

          -- Record the high-water mark for the next pass.
          update [key boundaries]
          set    KEYBOUNDARY = (select max([ROWID]) from [#rowids w updated values])
          where  runtimeid = @runtimeid

          delete [#rowids w updated values]
       end

       -- Remove job-as-running flag from the incremental updates tracking table
       update incrementalupdatejobids
       set    jobrunning = NULL
       from   incrementalupdatejobids
       where  name = 'table'
       and    updatetaskname = 'updatetaskname'
       and    jobrunning is NOT null
    end

     

  • The most interesting thing about this thread - to me - is that all of us who have posted probably understand one or more of the common ways to do this, but the thread originator - shahgols - hasn't responded yet.  Of course, it's only been about a day and a half, but since the original request was not very specific, it would be interesting to know more specifics about his or her very solvable problem.  In the meantime ... nice talking with the rest of you, I guess! 
