AJS Failed Job Creations Metric

My BI 4.2 system shows a very high number of Failed Job Creations in my AJS metrics. What does that indicate, and what can I do to troubleshoot or remediate these failures?


darneson :us: (BOB member since 2009-03-10)

The metric itself is pretty useless – it’s just the number of schedules that have failed since the server was started. You can get more useful information in Instance Manager. Just select status of “Failed” and the date period you want to look at.


joepeters :us: (BOB member since 2002-08-29)

Thanks for the reply. Greetings from the great white north :blue:. I typically focus on the failures in Instance Manager and the audit database to research specific failure causes and the overall success rate of our scheduled jobs. We run around 1500 jobs per day and see a handful of issues each day, typically around a 1.5 - 2% failure rate. But when I saw this metric showing 1100 Failed Job Creations against 1500 received job requests, I got alarmed. I am concerned that it indicates a problem with my cluster communications, job server configuration, or capacity that is causing these failures before the job can even start. Unfortunately I do not see much documentation around this metric. If there is a way to reduce these failures to launch, I would like to know what to do.


darneson :us: (BOB member since 2009-03-10)

Hmm.

Further down the page there is a section titled “Scheduling Services” which has the detail counts by service. The total of the “Failed Job Creation” column should match the total that you see. Which service has the bulk of the failures?


joepeters :us: (BOB member since 2002-08-29)

The vast majority of what we run each night is Webi, with a handful of schedules for Crystal 2016 and Olap For Office. Likewise, the failed job creations appear to be nearly all Webi with a few CR2016. For example, one of our two primary reporting job servers has 13 of 132 CR2016, 4202 of 5074 Webi, 0 of 2 AOS, and 0 of 5 Lumira. Besides the other primary reporting job server, we have two others for special server groups for very large Webi reports. One of those shows 1 of 21 Webi and the other 1 of 13 Webi.
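To put the counts for that first job server in perspective, here is a quick sketch that turns them into failure ratios per service. The numbers are the ones quoted above; the `counts` dictionary and the `failure_ratios` helper are just illustrative names, not anything the platform provides.

```python
# Counts quoted above for one job server:
# service -> (failed job creations, received job requests)
counts = {
    "CR2016": (13, 132),
    "Webi": (4202, 5074),
    "AOS": (0, 2),
    "Lumira": (0, 5),
}


def failure_ratios(counts):
    """Return {service: failed / received}, skipping services with no requests."""
    return {
        service: failed / received
        for service, (failed, received) in counts.items()
        if received
    }


ratios = failure_ratios(counts)

# Print the services from highest to lowest failure ratio.
for service, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{service:8s} {ratio:6.1%}")
```

On these figures Webi sits at roughly 83% and CR2016 at roughly 10%, which is far above the 1.5 - 2% failure rate I see in Instance Manager.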


darneson :us: (BOB member since 2009-03-10)

That’s interesting. So you’re not seeing the same 4,202 failures in Instance Manager? Are you seeing 5,074 successful instances? If I were you, I would enable at least “Low” logging on that AJS, and then check the log file after a day or two. Look for any “|E|” entries.
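If the log files get large, a small script can pull out the “|E|” lines for you. This is only a sketch: the `find_error_lines` helper is hypothetical, the `*.glf` file pattern is an assumption about how the trace files are named, and the logging directory has to be supplied for your own installation.

```python
from pathlib import Path


def find_error_lines(log_dir, pattern="*.glf", marker="|E|"):
    """Yield (file name, line number, text) for every line containing marker.

    The default pattern "*.glf" is an assumption about the trace file
    extension; adjust it to match the files in your logging directory.
    """
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    yield path.name, number, line.rstrip("\n")


# Example use (the directory is a placeholder, not a real default path):
# for name, number, text in find_error_lines("/path/to/logging"):
#     print(f"{name}:{number}: {text}")
```

It does a plain substring match, so it will not interpret the rest of the log line; it just narrows down where to start reading.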


joepeters :us: (BOB member since 2002-08-29)