DS RunBook

system · September 17, 2012, 3:51am

Hi

I am thinking of doing a DS Job to report the status of Daliy Execution of ETL Jobs.

So my idea is, once the ETL Window is done, I will run my Job to report the list of Jobs in PRODUCTION Repository and their Run Information for the day.

Say, if a job is missed/failed/glitched by the 3rd Party Scheduler, then raise a exception report to tell what all have not run for last 2 days or more or something like that.

I am sure some of us has already came up with the solution/already having a solution/refining the existing solution…

So just throw some ideas please.

Note: Everyone can ask about JOB LOG/CONTROL ENTRY Table. But I wanted to do it directly from the HISTORY Tables.

Ideas Welcome!

ganeshxp (BOB member since 2008-07-17)

system · October 15, 2012, 4:18pm

Did you get anywhere with this?

The history table can be a bit flaky when it comes to providing a status on jobs. If a job has a fatal failure then the history table doesn’t get updated with a failed status. It continues to look like the job is still running. If your status job checked for jobs that are still “running” but have gone far beyond their normal run time (say 200%) then that could be included in your report. Even better, if the status job changed the status of the job in the history table to failed that would be helpful. Sometimes you can abort the job from the Management Console and sometime you can’t. Doing from the Management Console is always preferable.

The history table is the only place to get this information so it’s not like you have a lot of choice.

Where are you storing the list of jobs to look for?

eganjp (BOB member since 2007-09-12)

system · October 15, 2012, 4:42pm

I have come up with a with a first version of it, a few weeks back. But never got a chance to enhance it then…

You know what, I forgot this post

I will post of that…Yes, as of now, I will just get the status of all the jobs. Good way to catch hold of jobs that are getting kicked on by some manual users than the Scheduler. Not bad at all, it works good for me. The thing now is, based on this, I should trigger on items that are having a NVL(END_TIME,SYSDATE) - START_TIME and trigger on stuffs.

ganeshxp (BOB member since 2008-07-17)

system · October 15, 2012, 7:36pm

Oh yes… I have been kicking around this idea for months. But never got to an implementation, partly because we do not have that much failing jobs at the moment.

First version:
List of jobs and their frequency (daily, weekly, which of the week). Then hold that list against the repo’s and see if something has ran succesfully that day.

This is the most basic requirement that will find failed or worse: not started jobs that should have started…

Variants:
[list=]Disable warnings when a job is deliberatly disabled.
Sum up an overview of a project when sometimes 20 different jobs run per 24 hours, you want an overview
Detect when a job has run too many times
Do not only check per period, but also the time. So a job should run every day at 22:00, then check that a job has run around that time and not at 10:00 in the morning by an administrator[/list]

I think that covers a lot. And is ambitious to implement. Not sure what I want to develop from scratch and what to incrementally implemnt yet.

Johannes Vink (BOB member since 2012-03-20)

system · October 16, 2012, 8:08am

Another thing…

New Oracle schema with rights across all repo’s and one huge view across relevant tables.

Purpose: query to find a specific PID when there is a wild running process… instead of querying every repo seperate.

Johannes Vink (BOB member since 2012-03-20)