I have scheduled 9 Data Services jobs (nothing special, just data extracts from a database loaded into flat files) and created the necessary .bat files and associated .txt files, which are stored in a directory (not the default directory).
When I initiate the .bat files directly from the directory on the server, some of them work and some of them do not.
Every time I try it is different jobs that fail.
The error message in the AL_RWJobLauncherLog is;
Hi Ganesh,
Thanks for your suggestion.
I can confirm the CMS connection is populated and TEST works OK.
The problem I have is that it intermittently runs.
While watching the task manager, sometimes it spawns an al_engine and the job initiates OK and sometimes the AL_RWJobLauncher.exe just ends with the error message.
We have 2 job servers within a job server group.
The 2nd job server is on a 2nd Windows server.
The 2nd Windows server provides an extra Job Server only for increased capacity to run jobs.
My feeling is that the issue lies with the current setup of 2 job servers across 2 machines, as we do not have the issue in our test environment. What I do not understand is how it can work sometimes and not others.
I’ve removed the 2nd (remote) job server from the server group and exported the execution commands again, but the same issue occurs. There is even a reference in the .txt file to -CmAPP0928. APP0928 is the second Windows server, and I do not know where this is being picked up from, as the commands are exported to the primary server.
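To see exactly which arguments in an exported .txt file still point at the second server, a small script can list every token that mentions that host. This is only a sketch: the function name `find_host_references` is made up for illustration, and the sample string uses placeholder arguments. Only `-CmAPP0928` is taken from the post above.

```python
import re

def find_host_references(command_text, host):
    """Return every whitespace-separated token that mentions `host`.

    The match is case-insensitive, so APP0928 and app0928 are both found.
    """
    pattern = re.compile(re.escape(host), re.IGNORECASE)
    return [token for token in command_text.split() if pattern.search(token)]

# Placeholder argument string (hypothetical); only -CmAPP0928 comes from the thread.
sample = "-placeholder1 -CmAPP0928 -placeholder2"
print(find_host_references(sample, "APP0928"))
```

In practice you would read the contents of each exported .txt file and pass them in as `command_text`, once for each job.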
In the AL_MACHINE_INFO table I have 4 records.
1 for each of the Job Servers
1 for the Administrator
and 1 for the RepoManager
Bingo! That is why I asked about your setup. One of your servers is having problems.
Why it sometimes works and sometimes does not: I am not entirely sure anymore, but the job launcher is a local program. It could be that the job is sometimes started on one server and at other times on the other.
How is the host name written in AL_MACHINE_INFO for the job servers? Is there a difference in notation? Try, for example, the IP address instead of a host name. And if there is a host name, please change it to the fully qualified host name.
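A quick way to compare the notation of the entries is to classify each value as an IP address, a short host name, or a fully qualified name. The sketch below assumes you have already copied the MACHINE_NAME values out of AL_MACHINE_INFO by hand. The helper `classify_host` and the domain `example.com` are hypothetical.

```python
import ipaddress

def classify_host(name):
    """Classify a host entry as 'ip', 'fqdn' or 'short'."""
    name = name.strip()
    try:
        # ip_address() raises ValueError for anything that is not an IPv4/IPv6 literal.
        ipaddress.ip_address(name)
        return "ip"
    except ValueError:
        pass
    # A dot inside the name (ignoring a trailing root dot) indicates a domain part.
    return "fqdn" if "." in name.rstrip(".") else "short"

# Example values; APP0123 and APP0928 appear in the thread, the rest are made up.
for entry in ["APP0123", "APP0928", "app0123.example.com", "10.0.0.5"]:
    print(entry, "->", classify_host(entry))
```

If the entries turn out to be short names, Python's `socket.getfqdn()` can show what fully qualified name the local resolver associates with them, which may help decide what to put in the table.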
Did you check the CMS connection for both job servers?
Thanks.
The MACHINE_NAME in the table is not qualified (i.e. APP0123).
How do I amend the AL_MACHINE_INFO table? Can I do this via the DS Management Console or the CMC?
I have exported the command again, setting the "job server or Server Group" option to just the primary job server. I was therefore expecting the job to work every time I initiate it from the primary server, but again it is inconsistent, and when it fails it reports the same error in the AL_RWJobLauncher.log:
*** ERROR: RWJobLauncher failed to connect to CMS. (BODI-1250220)
Is it possible the JobLauncher is still trying to send this job to the second job server?
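Because the failure is intermittent, it can help to count how often the error appears in the launcher log after a series of runs. The following is a minimal sketch that scans log text for the BODI-1250220 code quoted above. The function name `find_cms_failures` is hypothetical, and the lines in the sample log other than the error line are invented.

```python
def find_cms_failures(log_text, code="BODI-1250220"):
    """Return the log lines that contain the given error code."""
    return [line for line in log_text.splitlines() if code in line]

# Invented sample log; only the ERROR line matches the message quoted in the thread.
sample_log = """some other log line
*** ERROR: RWJobLauncher failed to connect to CMS. (BODI-1250220)
some other log line
*** ERROR: RWJobLauncher failed to connect to CMS. (BODI-1250220)
"""
failures = find_cms_failures(sample_log)
print(len(failures), "CMS connection failures found")
```

Comparing that count with the number of times the .bat file was started gives a rough failure rate, which makes it easier to tell whether a configuration change actually helped.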
1. Your CMS may be down, as the error message "RWJobLauncher failed to connect to CMS" suggests. Make sure it is running and check again.
2. Have a look at your .bat and .txt files. In the .txt file, the server/system/host name is passed through the -S argument. Make sure this system name is correct: open your Management Console/CMC, look up the system name, and check that it is the name that follows -S.
3. If it still does not work, try replacing AL_RWJobLauncher.exe. Take it from an app server where AL_RWJobLauncher.exe runs fine.