How to prevent a rollback?

  • I routinely perform transformations and model scoring in SQL 2005 on tables of extract data (~150 million records by several hundred columns) that does not change.

    When things go bad (network errors, bad data, what not), SQL 2005 decides to roll everything back to its original state. This process can take hours and is not needed. I am looking for any of the following:

    - Is there some way to turn this rollback behavior off? I can clean up manually afterwards

    - Is there some way to commit, say, every 5,000 records so that no more than the last "batch" is ever rolled back?

    thanks a bunch

  • Yep, you can batch your transaction if business rules allow it. You still could end up with stuff to clean up, but I wrote about this before:

    http://qa.sqlservercentral.com/articles/Miscellaneous/batching/151/

    You cannot turn off rollbacks. You need to commit transactions periodically to prevent issues.
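
    As a rough illustration of that kind of batching (the table and column names below are assumptions, not your actual schema), each small transaction commits on its own, so a failure can only undo the batch in flight:

    -- Hedged sketch: copy rows in small committed batches instead of one huge transaction.
    -- dbo.SourceTable, dbo.trans_Part and the xid key are illustrative assumptions.
    set nocount on;

    declare @BatchSize int, @RowsCopied int;
    set @BatchSize  = 50000;
    set @RowsCopied = 1;

    while @RowsCopied > 0
    begin
        begin transaction;

        insert into dbo.trans_Part (xid, C1 /* ... remaining columns ... */)
        select top (@BatchSize) s.xid, s.C1 /* ... remaining columns ... */
        from dbo.SourceTable s
        where not exists (select 1 from dbo.trans_Part t where t.xid = s.xid);

        set @RowsCopied = @@ROWCOUNT;

        commit transaction;   -- each batch is durable; nothing already committed is ever rolled back
    end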

  • Thanks Steve!

    I looked into batch-type jobs (see below), but that approach takes me from ~20 hrs of processing time to 3-4 days of processing time. Increased processing time will increase the risk of failure. How will your approach change that?

    use ;   -- database name was stripped from the original post
    go

    declare @vMin int, @vMax int, @vTotal int;

    set @vTotal = 142607203;
    set @vMin = 1;
    set @vMax = 5000;

    while @vMin <= @vTotal   -- run until every xid up to @vTotal is processed, including the final partial batch
    begin
        insert into dbo.trans_Part
        select
            C1 ... c128
        from (nolock, index(iXID))   -- source table name was stripped from the original post
        where xid between @vMin and @vMax;

        set @vMin = @vMax + 1;
        set @vMax = @vMax + 5000;
    end

  • Heh... BCP the data out to a file... use Bulk Insert to load it... amazing how fast it can be...

    Should never have any "data errors"... data should already be gleaned to not cause errors. To do otherwise is "programming by exception" and will always create a problem.
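
    A rough sketch of that route, assuming a native-format export and re-import (server name, file path, and table names are placeholders):

    -- 1) Export the source table in native format from the command line (placeholders throughout):
    --      bcp YourDb.dbo.SourceTable out D:\extract\source.dat -n -T -S YourServer
    -- 2) Re-load it in committed batches; each BATCHSIZE chunk is its own transaction,
    --    so a failure rolls back at most the batch in flight.
    bulk insert dbo.trans_Part
    from 'D:\extract\source.dat'
    with (
        datafiletype = 'native',
        batchsize    = 50000,   -- commit every 50,000 rows
        tablock                 -- helps qualify for minimal logging
    );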

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thanks for the suggestion, but at 150mln rows by a couple hundred columns it is less practical.

    Would a CLR + .NET allow some type of batching, or setting of "batch size + commit"? Anyone?

  • You could try moving your processes into SSIS. If it is as simple as your example (copying from one table to another), you could try the SQL Server Destination and the OLE DB Destination objects. They allow you to configure batch and commit sizes, and they do work pretty fast.

    Otherwise, you could take your example of processing batches broken up into pieces, but instead of running the stored procedures in a loop, send them into a service broker queue. If you do this, you can configure your broker queue to process groups of records in parallel and still retain control over how many parallel processes can run. Then, by adjusting the batch size and the number of batches running in parallel, you could probably get back to a number far closer to your original performance without having a single, large batch.

    As a simple test, run it as a loop in five query windows. In each window, have it only run every 5th batch (use mod or something to get it to skip, as in the sketch below) and see if your performance improves. Keep in mind that this parallel processing may hit your CPU or disk subsystems pretty hard.
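
    A minimal sketch of that five-window test, assuming the same xid range batching as the earlier loop (set @Stream to 0 in the first window, 1 in the second, and so on; the source table name is a placeholder):

    -- Hedged sketch: each window only runs the batches where batch number % 5 = @Stream.
    declare @Stream int, @BatchSize int, @vTotal int, @Batch int, @vMin int, @vMax int;
    set @Stream    = 0;            -- 0 in window 1, 1 in window 2, ... 4 in window 5
    set @BatchSize = 50000;
    set @vTotal    = 142607203;
    set @Batch     = 0;

    while @Batch * @BatchSize < @vTotal
    begin
        if @Batch % 5 = @Stream
        begin
            set @vMin = @Batch * @BatchSize + 1;
            set @vMax = (@Batch + 1) * @BatchSize;

            insert into dbo.trans_Part        -- each insert auto-commits as its own transaction
            select C1, /* ... */ C128
            from dbo.SourceTable with (index(iXID))   -- source table name is an assumption
            where xid between @vMin and @vMax;
        end

        set @Batch = @Batch + 1;
    end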

  • Not sure but maybe this is what you want ?

    use ;   -- database name was stripped from the original post
    go

    declare @vMin int, @vMax int, @vTotal int;

    set @vTotal = 142607203;
    set @vMin = 1;
    set @vMax = 5000;

    while @vMin <= @vTotal   -- run until every xid up to @vTotal is processed, including the final partial batch
    begin
        BEGIN TRANSACTION

        insert into dbo.trans_Part
        select
            C1 ... c128
        from (nolock, index(iXID))   -- source table name was stripped from the original post
        where xid between @vMin and @vMax;

        IF @@ERROR <> 0
            ROLLBACK TRANSACTION
        ELSE
            COMMIT TRANSACTION

        set @vMin = @vMax + 1;
        set @vMax = @vMax + 5000;
    end

    I would definitely follow Jeff's bcp advice if you can...


    * Noel

  • Pieter Huiberts (11/26/2007)


    Thanks for the suggestion, but at 150mln rows by a couple hundred columns it is less practical.

    Would a CLR + .NET allow some type of batching, or setting of "batch size + commit"? Anyone?

    Wait a minute... let's back the performance pony up a bit. Are you saying that your staging table is an exact copy of everything that the target table will eventually have in it??? :blink: I mean it has ALL of the rows whether they are new or updated?

    If that's true... then why don't you just do a "rename"... takes, oh, about 65 MILLI-SECONDS to do that... 😉

    And, if you think a CLR is going to be more practical or efficient, go ahead... when you're done testing it for performance, we'll get back to doing it the right way 😀
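
    If that really is the situation, a minimal sketch of the swap (table names here are invented; it only works when the staging table already holds the complete final data in the same shape as the target):

    -- Hedged sketch of the rename/swap idea; dbo.trans_Part_Staging is an invented name.
    begin transaction;
        exec sp_rename 'dbo.trans_Part',         'trans_Part_Old';
        exec sp_rename 'dbo.trans_Part_Staging', 'trans_Part';
    commit transaction;
    -- drop dbo.trans_Part_Old once the swap has been verified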

    --Jeff Moden



  • The number of rows does not change at this stage; I need to score a complete universe. What does change is that I pull a subset of the columns (50-70%) and modify their datatype and/or categorize them in the process. Afterwards, I take the transformed results and use them in model scoring. Those types of queries take a long time, and when they break, things roll back. I typically use CREATE TABLE followed by INSERT INTO ... SELECT. This at least permits me to see progress, compared to a SELECT INTO statement.

    So, no, I am not duplicating data. The scoring script is essentially a weighting of values via CASE statements, summing those results into one value. Picture a hundred CASE statements with plus signs connecting them into one single outcome.

    All I want is some trick to avoid the lengthy rollback. I don't need it. I can clean up the malformed results myself.
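
    For what it's worth, a stripped-down sketch of that scoring shape, run in keyed batches so each committed range survives a later failure (every table, column, category, and weight below is invented):

    -- Hedged sketch: a sum of many CASE terms into one score, inserted range by range.
    declare @vMin int, @vMax int;
    set @vMin = 1;
    set @vMax = 50000;

    while @vMin <= 142607203
    begin
        insert into dbo.model_Scores (xid, score)     -- target table is an invented name
        select xid,
               case when C1 = 'A'   then 1.5  when C1 = 'B' then 0.7 else 0 end
             + case when C2 > 100   then 2.0  else 0    end
             + case when C3 is null then -1.0 else 0.25 end
             -- ... roughly a hundred similar CASE terms summed into a single score ...
        from dbo.trans_Part
        where xid between @vMin and @vMax;            -- auto-commits per batch

        set @vMin = @vMax + 1;
        set @vMax = @vMax + 50000;
    end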

  • The only trick available to avoid lengthy rollbacks is to not have lengthy transactions.

    So what you're looking for is some way of slicing the total work into many smaller transactions, and making those slices as lean and efficient as possible.

    You might want to check out what can be gained with SELECT INTO instead of CREATE TABLE followed by INSERT. The gain, if any, is in the logging, since SELECT INTO can be minimally logged - i.e., it only logs the allocated extents instead of the entire rows, as INSERT will. If volumes are large, this may be one area that can be tuned.
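
    A minimal sketch of the SELECT INTO variant, with the datatype changes folded into the select list (names are placeholders; minimal logging also depends on the database using the SIMPLE or BULK_LOGGED recovery model):

    -- Hedged sketch: SELECT INTO creates and loads the target in one (potentially
    -- minimally logged) operation. Table and column names are placeholders.
    select xid,
           cast(C1 as int)           as C1,
           cast(C2 as decimal(9, 2)) as C2
           -- ... remaining transformed columns ...
    into dbo.trans_Part_New
    from dbo.SourceTable
    where xid between 1 and 142607203;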

    /Kenneth

  • What does change is I pull a subset of the columns (50-70%) and modify their datatype and/or...

    That would be part of the reason why things take so long... why is it necessary to change the datatype? It also opens a source of rollbacks if the data happens to be incompatible with the changed datatype.

    --Jeff Moden



  • noeld (11/26/2007)


    Not sure but maybe this is what you want ?

    use ;   -- database name was stripped from the original post
    go

    declare @vMin int, @vMax int, @vTotal int;

    set @vTotal = 142607203;
    set @vMin = 1;
    set @vMax = 5000;

    while @vMin <= @vTotal   -- run until every xid up to @vTotal is processed, including the final partial batch
    begin
        BEGIN TRANSACTION

        insert into dbo.trans_Part
        select
            C1 ... c128
        from (nolock, index(iXID))   -- source table name was stripped from the original post
        where xid between @vMin and @vMax;

        IF @@ERROR <> 0
            ROLLBACK TRANSACTION
        ELSE
            COMMIT TRANSACTION

        set @vMin = @vMax + 1;
        set @vMax = @vMax + 5000;
    end

    I would definitely follow Jeff's bcp advice if you can...

    This works best if the index you're using is the clustered index... Otherwise you will be doing multiple table scans to find all of the data (which will be VERY inefficient). Also, 5,000 is usually too small a chunk (it's going to slow the process down somewhat) for most processes. I'd start around 50,000 and perhaps work up from there to find the ideal size for your process.

    I understand that data cleanup can sometimes be a problem, but what exactly is causing the rollback? Is there no way to isolate what the issue is?
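
    One way to narrow it down: a hedged variation of the batched loop body that uses TRY/CATCH (available in SQL 2005) to record exactly which range failed instead of dying. The dbo.BatchErrors logging table and the source table name are invented:

    -- Hedged sketch: one iteration of the batch loop, with the failing range captured.
    declare @vMin int, @vMax int;
    set @vMin = 1;
    set @vMax = 50000;

    begin try
        begin transaction;

        insert into dbo.trans_Part
        select C1, /* ... */ C128
        from dbo.SourceTable                  -- source table name is an assumption
        where xid between @vMin and @vMax;

        commit transaction;
    end try
    begin catch
        if @@trancount > 0 rollback transaction;

        insert into dbo.BatchErrors (vMin, vMax, error_number, error_message, logged_at)
        values (@vMin, @vMax, error_number(), error_message(), getdate());
    end catch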

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
