Blog Post

Finding Failed Job Steps

,

Kendra’s query was a good starting point, and I used most of it in the first CTE shown below. This query basically looks at msdb.dbo.sysjobhistory and msdb.dbo.sysjobactivity, joining them on the job_id, which is the PK. However, we are only looking at steps, which is anything with a step_id > 0. The step_id = 0 is for the overall job.Recently a customer was asking for a way to alert on job steps that failed, but the job succeeded. They really wanted a custom metric for SQL Monitor, which I submitted, but this post looks at the query in general, trying to determine if a job step failed.

Note: Let me start by noting that this is based on work by Kendra from her post, SQL Agent Jobs: Checking for failed steps at the end of a job.

Based on Kendra’s query, I looked through what is happening in msdb.dbo.sysjobhistory and msdb.dbo.sysjobactivity. In Kendra’s query, she is looking for a specific job, but I wanted all jobs. This lead me to build a CTE that queries for the data.

As you can see in the code below, I use most of Kendra’s query to join these two tables together. First, we look for steps, so step_id != 0, and we look for failures. A status of 0 is a failure here, where 1 is success.

WITH cteJobStep (Job_ID, Step_Name)
AS (SELECT jh.job_id,
            jh.step_name AS JobStepName
     FROM msdb.dbo.sysjobhistory jh
         INNER JOIN msdb.dbo.sysjobactivity ja
             ON jh.job_id = ja.job_id
     WHERE jh.run_status = 0 --step failed
           AND jh.step_id != 0
           --         and jh.job_id = CONVERT(uniqueidentifier, '8C673935-F8C1-4E7D-94D3-1F3CAE50D7DC')
           AND --this block ensures that we just pull information from the most recent job run
         (
         -- The start time of the step, converted to datetime
         CONVERT(DATETIME, RTRIM(jh.run_date)) + (jh.run_time * 9 + jh.run_time % 10000 * 6 + jh.run_time % 100 * 10)
         / 216e4 >= ja.start_execution_date -- the time the job last started
         ))
SELECT COUNT(*) AS Failures
FROM cteJobStep c

Next, I kept most of Kendra’s query, but I commented out the line that limits this to a specific ID. I just want all step failures. I did keep the part that only checks the latest execution of the job.

The outer query just counts these up and returns a number. This lets me know how many job steps have failed during their latest execution.

This is a good first step, but this is something I could add in SQL Monitor as a custom metric or in any tool for alerting. When I have steps failing, I might want to know.

Original post (opens in new tab)
View comments in original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating