SQL Server Jobs - Retry Attempts

  • I know that in SQL Server jobs, if a step fails, you can set the number of retry attempts and the retry interval.  What I want to do, though, is retry the entire job from Step 1 when a step fails, not just the individual step.

    The reason I need to do it this way is that the first step truncates the various tables, and then steps 2 - 15 load data into those tables from various files.  So if step #4 fails with half the file loaded and I retry just step #4, we will end up with duplicate records.  I realize the best solution is to rework the SQL Job so that each step can be retried safely, but until we get through the development cycles to do that, restarting the entire job would at least give us a workable stopgap.

    However, I cannot seem to find a method to retry the entire job if a step fails.

  • I can't see a way of doing that in SQL Agent.

    What do the other steps do? Is this a sequence of T-SQL / stored procs, or do the steps include calls to other technologies (SSIS/SSAS/command line etc)?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

    We get files that need to be imported daily into our SQL Server because we do not have direct connections to import the data.  The import of each file is an SSIS package, and there are about 15 files that need to be imported.  The first step in the job truncates the tables so the data is fresh each morning.  So if a step fails, restarting that individual step will reload the data for that step, resulting in duplicates.

    I know how I can rework this SQL Job to avoid the problem, and it is on my plate, but until I am able to rework the actual job and code, I was thinking that if a step fails, simply rerunning the entire SQL Job is my best option, so that I know it is not loading duplicates.  Hopefully I explained this well enough, but if not, please let me know.

  • If step 4 fails, in the settings of the step, can you simply set it to go to step 1?

    Of course, you run the risk of going into an endless loop.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • Can you set step 4 to go to step 1 on failure?

    This does not allow you to limit the number of times it will try to restart and could end up in an infinite loop, but if you have an alert that the step failed you can intervene if it keeps failing.
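
    For reference, that "go back to step 1 on failure" behaviour is set on the job step's Advanced page (On failure action: "Go to step: [1] ..."), and it can also be scripted.  A minimal sketch, assuming a job named DailyFileImport in which step 4 is one of the load steps (the job name and step numbers are assumptions, so adjust them to the real job):

    USE msdb;
    -- Make step 4 jump back to step 1 (the truncate step) when it fails.
    EXEC dbo.sp_update_jobstep
         @job_name        = N'DailyFileImport',
         @step_id         = 4,
         @on_fail_action  = 4,   -- 4 = "Go to step..."
         @on_fail_step_id = 1;   -- ...namely step 1

    On its own this does nothing to limit the number of retries, which is the endless-loop risk mentioned above.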


  • Putting the truncate as step 1 in the SSIS package seems like the way to go – sounds like you've worked that out. This should be quick to implement.

    Creating one Agent job per table is another (horrendously inelegant) possibility, with the added potential benefit of making some of the imports run in parallel.

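
    As a rough illustration of the "one Agent job per table" alternative, one such job could be created with the msdb job procedures, something like the sketch below (all names here are made up, and the second step would simply run the existing SSIS package for that file):

    USE msdb;
    -- Sketch only: a small job that owns a single table's truncate-and-load.
    EXEC dbo.sp_add_job     @job_name = N'Import - Table4';
    EXEC dbo.sp_add_jobstep @job_name = N'Import - Table4',
                            @step_name = N'Truncate table 4',
                            @subsystem = N'TSQL',
                            @command   = N'TRUNCATE TABLE dbo.ImportTable4;',
                            @on_success_action = 3;  -- go to the next step
    -- Step 2 would be the SSIS step that loads file 4 into dbo.ImportTable4.
    EXEC dbo.sp_add_jobserver @job_name = N'Import - Table4';

    Because each job truncates only its own table, any one of them can be retried (or run in parallel) without creating duplicates elsewhere.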

  • Michael L John wrote:

    If step 4 fails, in the settings of the step, can you simply set it to go to step 1?

    Of course, you run the risk of going into an endless loop.

    Same thought.

  • I understand the risk.  This is a stop gap until I can rework the SQL Job to avoid it.  The reason it would fail is due to server load and resources.

  • AMCiti wrote:

    I understand the risk.  This is a stop gap until I can rework the SQL Job to avoid it.  The reason it would fail is due to server load and resources.

    So there's no chance that one of the files could contain bad data?


  • Phil:

    "So there's no chance that one of the files could contain bad data?"

    To answer that question, I cannot say there will "never" be bad data, but in all the time this job has been running, bad data has not been an issue.  The only failures have been due to resource issues on the server caused by another team.  This started about a month ago, when a team that pushes data to our server changed their processes; their process now bogs down our server to the point of jobs failing, inability to perform basic functions, etc.  The bad thing is, their process is not predictable, and every day is a gamble on what will fail or work.  So until I can change what used to be a trouble-free routine to accommodate these newfound issues, I am just looking for a stopgap that removes the manual intervention of running this job multiple times a day until it succeeds.


  • OK, sure. Just wanted to make sure that you've considered that before setting the 'On error, go back to start and try again' option!


  • Rather than just going back to the first step, I would go to a later, isolated step in the job, and ultimately that step would go back to the first step.

    The difference being that the special step could log the fact that an error occurred, count the errors and/or make other determinations as to whether it was ok to go to Step 1 again.

    For example, say the job currently has 8 steps.  The eighth step exits the job on both good and bad completion.  Add a 9th step to do the error checking / looping.  The only way to get to that step is via an error in a previous step.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!
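
    A sketch of what that extra checking step might look like, using msdb's job history to count failures rather than a dedicated table (the job name and the failure limit are assumptions, and the step's own "on success" action would be set to "Go to step 1"):

    -- Hypothetical "step 9": count today's failed steps for this job and decide
    -- whether it is safe to loop back to step 1.
    DECLARE @failures int;

    SELECT @failures = COUNT(*)
    FROM msdb.dbo.sysjobhistory AS h
    JOIN msdb.dbo.sysjobs AS j ON j.job_id = h.job_id
    WHERE j.name = N'DailyFileImport'          -- assumed job name
      AND h.step_id > 0                        -- step rows only, not the job outcome row
      AND h.run_status = 0                     -- 0 = failed
      AND h.run_date = CONVERT(int, CONVERT(char(8), GETDATE(), 112));  -- today only

    IF @failures >= 5   -- assumed retry limit
        RAISERROR('Too many failures today - giving up on the import job.', 16, 1);
    -- If this step succeeds instead, its "on success" action sends the job back to step 1.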

    AMCiti wrote:

    I understand the risk.  This is a stop gap until I can rework the SQL Job to avoid it.  The reason it would fail is due to server load and resources.

    Here's a possible dirty hack to prevent an infinite loop in the event of bad data. It should take only a few minutes to set up.

    Create a table called dbo.retry with a single integer column, retrycount, containing a single row set to zero.

    create table dbo.retry (retrycount int);
    insert dbo.retry (retrycount) values (0);

    Job starts at step 1

    Step 1) update dbo.retry set retrycount = 0
    Step 2) update dbo.retry set retrycount += 1
    Step 3) Query retrycount and raise an error if it is too great

    declare @count int,
            @n int = 10;  -- maximum number of retries before giving up
    set @count = (select max(retrycount) from dbo.retry);
    if @count > @n
    begin
        raiserror('too many failures', 16, 1);
    end

    Step 4) Truncate tables

    Step 5) Insert 1

    Step 6) Insert 2, etc...

    On failure for all the insert steps, go to step 2. On success go to next step.

    This will allow you to restart the job if any of the insert steps fail, but you can fail the job if the number of retries gets too big. This prevents the infinite loop.

