How to check which jobs are running?

Hi folks,

How is it possible to check which jobs are running?
I have a job that should check at startup whether another instance of the same job, with exactly the same job name, is already running.

Do you have any ideas on how to solve this?


aser (BOB member since 2017-05-09)

Typically I use a job control table combined with job start and job end functions. You check the table when the job starts to see if there is another instance of the job running already in a started state. If not, you add a record and put it in the started state and then when the job completes you update the table to finished. You can also store if the job had warnings or errors, etc. Overall a convenient tool for tracking run times, status and other information about jobs past and current.
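A minimal sketch of that pattern, using SQLite from Python purely for illustration. The table name `job_control`, its columns, the status values and the function names `job_start` / `job_end` are all assumptions made for this example, not anything shipped with Data Services.

```python
import sqlite3
from datetime import datetime, timezone


def init_job_control(conn):
    # Hypothetical control table; the name and columns are illustrative only.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS job_control (
               run_id      INTEGER PRIMARY KEY AUTOINCREMENT,
               job_name    TEXT NOT NULL,
               status      TEXT NOT NULL,
               started_at  TEXT NOT NULL,
               finished_at TEXT
           )"""
    )


def job_start(conn, job_name):
    """Register a run. Returns the new run_id, or None if the same
    job name already has a record in the STARTED state."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on an exception
        (running,) = conn.execute(
            "SELECT COUNT(*) FROM job_control "
            "WHERE job_name = ? AND status = 'STARTED'",
            (job_name,),
        ).fetchone()
        if running > 0:
            return None
        cur = conn.execute(
            "INSERT INTO job_control (job_name, status, started_at) "
            "VALUES (?, 'STARTED', ?)",
            (job_name, now),
        )
        return cur.lastrowid


def job_end(conn, run_id, status="FINISHED"):
    """Close a run; status could also be e.g. 'ERROR' or 'WARNING'."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:
        conn.execute(
            "UPDATE job_control SET status = ?, finished_at = ? "
            "WHERE run_id = ?",
            (status, now, run_id),
        )
```

Note that the check and the insert are two separate statements, so two jobs starting at the same instant could in principle both pass the check; on a real database a unique constraint or a lock would be needed to close that gap.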

  • E

eepjr24 :us: (BOB member since 2005-09-16)

Adding to Ernie’s idea, you can check the AL_HISTORY table of the repository.

I’ve done some reverse engineering on how it works. Basically, every time a job is launched a new row is added to the table, and that row is updated with the status when the job finishes (either OK or with an error). So if you find more than one row without a finished status (one row belongs to the job that is checking the table), you can infer that other jobs with the same name are running and take some action (have the job terminate itself, for instance).

You can also use some of the misc functions of BD to build a custom function so the logic can be reused.
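The "more than one unfinished row" rule can be sketched as follows. This runs against a stand-in SQLite table: `history_demo` and its columns `job_name`, `start_time` and `end_time` are hypothetical and are not the real AL_HISTORY layout, which you would need to look up in your own repository.

```python
import sqlite3


def other_instance_running(conn, job_name):
    """True if more than one unfinished row exists for job_name.

    One open row is always expected: it belongs to the job doing the check.
    """
    (open_rows,) = conn.execute(
        "SELECT COUNT(*) FROM history_demo "
        "WHERE job_name = ? AND end_time IS NULL",
        (job_name,),
    ).fetchone()
    return open_rows > 1


# Stand-in data: hypothetical table and columns, NOT the AL_HISTORY schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE history_demo (job_name TEXT, start_time TEXT, end_time TEXT)"
)
conn.executemany(
    "INSERT INTO history_demo VALUES (?, ?, ?)",
    [
        ("JOB_LOAD", "2017-05-09 08:00", "2017-05-09 08:30"),  # finished run
        ("JOB_LOAD", "2017-05-09 10:00", None),                # the checking job
        ("JOB_LOAD", "2017-05-09 10:01", None),                # a second instance
    ],
)
print(other_instance_running(conn, "JOB_LOAD"))
```

With the sample rows above, two rows for `JOB_LOAD` are still open, so the check reports that another instance is running.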

Regards,

Andrés


aidelia :argentina: (BOB member since 2006-02-02)

@eepjr24 I have thought about your solution too.
The problem with it is that if a job is terminated by an error or killed by a user through the Service Management Console, it cannot set the finished status.

So I would like to use a job table like the one the SMC uses to control the jobs.

@aidelia I checked my AL_HISTORY table and found entries there.

Do you know whether it is the same table that the SMC uses?


aser (BOB member since 2017-05-09)

The problem with this is that whenever a job is killed or crashes, the repo info is not updated, i.e. even in the administration console it still appears as a running job.

The only way to tackle this problem is to store the job server’s job PID (another challenge to get that) and verify whether that process is still alive.

This is no simple logic, but it can be done.
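The "is that process still alive" half of this could look roughly like the sketch below. It assumes the check runs on the same host as the job process and on a POSIX system (Linux/Unix). The function name `pid_is_alive` is made up for this example. Do not use this on Windows: there `os.kill` with signal 0 terminates the target process instead of probing it.

```python
import os


def pid_is_alive(pid):
    """Probe a PID on a POSIX system without affecting the process.

    Signal 0 performs the existence and permission checks only;
    no signal is actually delivered.
    """
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


print(pid_is_alive(os.getpid()))  # the current process is certainly alive
```

A PID alone is not proof of identity, since the operating system can reuse it for an unrelated process later. Storing the start time alongside the PID and comparing both is one way to reduce that risk.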


lamanp :netherlands: (BOB member since 2008-09-02)

@lamanp Do you have some example ATL files or workflows?
Maybe an article on how to get started?


aser (BOB member since 2017-05-09)

Hi,

In my experience AL_HISTORY is always correct except in the case of a system crash or hard reset. Killed jobs are reported correctly (as are DS exceptions). A solution relying on a custom table could not sort out some of these situations.

Maybe if, as lamanp stated, you kill an al_engine process or force the service to shut down, you could get an incomplete row, but that is relatively easy to fix if your solution depends on the AL_HISTORY table.

Getting the PID could also solve that problem, but it is not easy to get the job name from the PID. At least it is far more difficult than solving it at the DS and repo DB level.

Regards,

Andrés


aidelia :argentina: (BOB member since 2006-02-02)

What I do as the first step in the job, is get the PID and various other job attributes (pid, internal job_run_id, job_name, repo name, jobserver host and port, time started, trace-, monitor- and errorfilename, system config used) and store that. Upon job completion, that record gets updated.

This enables me to prevent starting an already running job, or to notice when a job process is no longer present even though my table suggests it is still running.

Getting the PID on Linux isn’t that hard, on Windows it’s a bit more difficult. The easiest way is to grab it from the first line of the trace file.
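The stale-record check described above could be sketched like this. The `JobRun` fields are a subset of the attributes listed in this post; the status values, the function names and the `is_alive` callback (which would wrap whatever PID check works on your job server host) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class JobRun:
    job_run_id: int
    job_name: str
    host: str
    pid: int
    status: str = "STARTED"  # hypothetical values: STARTED, FINISHED, ABANDONED


def reconcile(runs: List[JobRun],
              is_alive: Callable[[str, int], bool]) -> List[JobRun]:
    """Mark STARTED records whose process has vanished as ABANDONED."""
    stale = []
    for run in runs:
        if run.status == "STARTED" and not is_alive(run.host, run.pid):
            run.status = "ABANDONED"
            stale.append(run)
    return stale


def may_start(runs: List[JobRun], job_name: str,
              is_alive: Callable[[str, int], bool]) -> bool:
    """Allow a start only if no live instance of job_name remains."""
    reconcile(runs, is_alive)
    return not any(
        r.job_name == job_name and r.status == "STARTED" for r in runs
    )
```

Passing the liveness check in as a callback keeps the bookkeeping independent of how the PID is actually verified, which differs between Linux and Windows.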


lamanp :netherlands: (BOB member since 2008-09-02)

As said above, the AL_HISTORY table is fairly consistent. I’ve used it for various reports as well as for writing a DIY scheduler. Keep it simple, don’t try to reinvent the wheel.


eganjp :us: (BOB member since 2007-09-12)