Finding Failed Job Steps

Kendra’s query was a good starting point, and I used most of it in the first CTE shown below. This query basically looks at msdb.dbo.sysjobhistory and msdb.dbo.sysjobactivity, joining them on the job_id, which is the PK. However, we are only looking at steps, which is anything with a step_id > 0. The step_id = 0 is for the overall job.Recently a customer was asking for a way to alert on job steps that failed, but the job succeeded. They really wanted a custom metric for SQL Monitor, which I submitted, but this post looks at the query in general, trying to determine if a job step failed.

Note: Let me start by noting that this is based on work by Kendra from her post, SQL Agent Jobs: Checking for failed steps at the end of a job.

Based on Kendra’s query, I looked through what is happening in msdb.dbo.sysjobhistory and msdb.dbo.sysjobactivity. In Kendra’s query, she is looking for a specific job, but I wanted all jobs. This lead me to build a CTE that queries for the data.

As you can see in the code below, I use most of Kendra’s query to join these two tables together. First, we look for steps, so step_id != 0, and we look for failures. A status of 0 is a failure here, where 1 is success.

Update: Changed the query after the comment and another bug I noticed.

WITH cteActivity (job_id, start_execution_date)
AS ( -- get the latest job execution for all jobs
    SELECT job_id,
           MAX(start_execution_date) AS start_execution_date
    FROM msdb.dbo.sysjobactivity
    GROUP BY job_id)
, cteJobStep (Job_ID, Step_Name, run_date, run_time)
    AS (SELECT jh.job_id,
               jh.step_name AS JobStepName,
               jh.run_date,
               jh.run_time
        FROM msdb.dbo.sysjobhistory jh
            INNER JOIN cteActivity ja
                ON jh.job_id = ja.job_id
        WHERE jh.run_status = 0 --step failed
              AND jh.step_id != 0
              --         and jh.job_id = CONVERT(uniqueidentifier, '8C673935-F8C1-4E7D-94D3-1F3CAE50D7DC')
              AND --this block ensures that we just pull information from the most recent job run
            (
            -- The start time of the step, converted to datetime
            CONVERT(DATETIME, RTRIM(jh.run_date))
            + (jh.run_time * 9 + jh.run_time % 10000 * 6 + jh.run_time % 100
               * 10
              ) / 216e4 >= ja.start_execution_date -- the time the job last started
            )

Next, I kept most of Kendra’s query, but I commented out the line that limits this to a specific ID. I just want all step failures. I did keep the part that only checks the latest execution of the job.

The outer query just counts these up and returns a number. This lets me know how many job steps have failed during their latest execution.

This is a good first step, but this is something I could add in SQL Monitor as a custom metric or in any tool for alerting. When I have steps failing, I might want to know.

About way0utwest

Editor, SQLServerCentral
This entry was posted in Blog and tagged , , . Bookmark the permalink.

2 Responses to Finding Failed Job Steps

  1. Steffen Jabin says:

    Thank you for the idea.
    Your selection of the last job rund doesn’t work. The start time of the last step should be computed another way:

    SELECT jh.job_id,
    jh.step_name AS JobStepName
    FROM msdb.dbo.sysjobhistory jh
    WHERE jh.run_status = 0 –step failed
    AND jh.step_id != 0
    AND –this block ensures that we just pull information from the most recent job run
    (
    — The start time of the step, converted to datetime
    CONVERT(DATETIME, RTRIM(jh.run_date)) + (jh.run_time * 9 + jh.run_time % 10000 * 6 + jh.run_time % 100 * 10)
    / 216e4 >= ( select max ( ja.start_execution_date ) from msdb.dbo.sysjobactivity ja where jh.job_id = ja.job_id ) — the time the job last started
    )

    Like

Comments are closed.