ERROR: RWJobLauncher failed to connect to CMS. (BODI-1250220)

Good afternoon all,

I have scheduled 9 Data Services jobs (nothing special: just extracting data from a database and loading it into some flat files) and created the necessary .bat files and associated .txt files, stored in a directory (not the default directory).
When I run the .bat files directly from that directory on the server, some of them work and some of them do not.
Each time I try, different jobs fail.
The error message in AL_RWJobLauncher.log is:

(14.1.1.210) 04_03_2013 12:20:56 (2776): CRWJobLauncherApp::InitInstance called.
(14.1.1.210) 04_03_2013 12:20:57 (7096): *** RWJL_EXIT called.
(14.1.1.210) 04_03_2013 12:20:57 (7096): *** ERROR: RWJobLauncher failed to connect to CMS. (BODI-1250220)

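To see which launcher runs fail and when, it can help to pull the error entries out of the log instead of reading it by eye. Below is a minimal Python sketch. The line layout (version, date, time, a number in parentheses, message) is inferred only from the three sample lines above; the number in parentheses is assumed to be a process or thread ID, and the date is kept as raw text because the day/month order is not stated.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Layout inferred from the sample lines in this post (an assumption):
# (version) date time (number): message
LINE_RE = re.compile(
    r"^\((?P<version>[\d.]+)\)\s+"
    r"(?P<date>\d{2}_\d{2}_\d{4})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"\((?P<proc_id>\d+)\):\s*"
    r"(?P<message>.*)$"
)
ERROR_CODE_RE = re.compile(r"\((BODI-\d+)\)")
CMS_CONNECT_ERROR = "BODI-1250220"

# The three lines quoted in the post, used as sample input.
SAMPLE_LINES = [
    "(14.1.1.210) 04_03_2013 12:20:56 (2776): CRWJobLauncherApp::InitInstance called.",
    "(14.1.1.210) 04_03_2013 12:20:57 (7096): *** RWJL_EXIT called.",
    "(14.1.1.210) 04_03_2013 12:20:57 (7096): *** ERROR: RWJobLauncher failed to connect to CMS. (BODI-1250220)",
]


@dataclass
class LogEntry:
    version: str
    date: str  # kept as raw text; day/month order is not stated in the log
    time: str
    proc_id: int  # number in parentheses, assumed to be a process/thread ID
    message: str
    error_code: Optional[str]


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one launcher log line; return None if it has another layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    code = ERROR_CODE_RE.search(m.group("message"))
    return LogEntry(
        version=m.group("version"),
        date=m.group("date"),
        time=m.group("time"),
        proc_id=int(m.group("proc_id")),
        message=m.group("message"),
        error_code=code.group(1) if code else None,
    )


def cms_failures(lines: Iterable[str]) -> List[LogEntry]:
    """Return only the entries carrying the CMS connection error code."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and e.error_code == CMS_CONNECT_ERROR]


if __name__ == "__main__":
    for entry in cms_failures(SAMPLE_LINES):
        print(entry.date, entry.time, entry.proc_id, entry.error_code)
    # prints: 04_03_2013 12:20:57 7096 BODI-1250220
```

Against the real file you would pass the lines of AL_RWJobLauncher.log instead of SAMPLE_LINES; lines in any other layout are simply skipped.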
The .bat files were generated from the DS Management Console with the “Use password file” checkbox ticked.

After creating the .bat files, I generated a new password file.

Has anyone come across such a scenario, and could anyone offer some advice on resolving and understanding it?

(IPS4 and BODS4.1 SP1 running on Windows Server 2008 R2 Standard SP1)

Thanks


russec03 :uk: (BOB member since 2012-11-30)

You have to fix the CMS connection settings in the DS Management Console.

In the left pane, go to Administrator --> Management --> CMS Connection. This has to be filled in with a user who can talk to the IPS server.


ganeshxp :us: (BOB member since 2008-07-17)

Hi Ganesh,
Thanks for your suggestion.
I can confirm the CMS connection is populated and TEST works OK.
The problem is that the jobs only run intermittently.
While watching Task Manager, sometimes an al_engine is spawned and the job starts OK, and sometimes AL_RWJobLauncher.exe just ends with the error message.


russec03 :uk: (BOB member since 2012-11-30)

Do you have 1 job server or multiple job servers in a group? Or a Windows clustered install?

What kind of entries do you have in the repository table AL_MACHINE_INFO?


Johannes Vink :netherlands: (BOB member since 2012-03-20)

We have 2 job servers within a job server group.
The 2nd job server is on a 2nd Windows server.
The 2nd Windows server provides an extra job server, purely for increased capacity to run jobs.

My feeling is that the issue lies with the current setup of 2 job servers across 2 machines, as we do not have the issue in our test environment. What I do not understand is how it can work sometimes and not others.

I’ve removed the 2nd (remote) job server from the server group and exported the execution commands again, but the same issue occurs. There is even a reference to -CmAPP0928 in the .txt file (APP0928 is the second Windows server; I do not know where it is picking this up from, as the commands are exported to the primary server).

In the AL_MACHINE_INFO table I have 4 records:
1 for each of the Job Servers
1 for the Administrator
and 1 for the RepoManager

Any pointers/help is much appreciated. Thanks


russec03 :uk: (BOB member since 2012-11-30)

Bingo! That is why I asked about your setup. One of your servers is having problems.

Why it sometimes works and sometimes not: I'm not entirely sure anymore, but the job launcher is a local program. It could be that the job is sometimes started on one server and at other times on the other.

What does the host name look like in AL_MACHINE_INFO for the job servers? Is there a difference in notation? Try, for example, the IP address instead of a host name. And if there is a host name, please change it to the fully qualified host name.
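Before changing anything, it can be useful to check which notation each job server entry actually uses. The Python sketch below assumes you have already copied the MACHINE_NAME values out of AL_MACHINE_INFO by whatever means you normally use; it only classifies each value as an IP address, a fully qualified name, or a short name. The helper names and the sample domain/IP are hypothetical; APP0123 and APP0928 are taken from this thread.

```python
import ipaddress
import socket


def classify_host(name: str) -> str:
    """Classify a MACHINE_NAME-style value as 'ip', 'fqdn' or 'short'."""
    try:
        ipaddress.ip_address(name)
        return "ip"
    except ValueError:
        pass
    # A dotted name (ignoring a trailing dot) is treated as fully qualified.
    return "fqdn" if "." in name.strip(".") else "short"


def suggest_fqdn(name: str) -> str:
    """Ask the local resolver for a fully qualified name.

    socket.getfqdn() returns the input unchanged when it cannot find a
    better name, so the result still needs a human check.
    """
    return socket.getfqdn(name)


if __name__ == "__main__":
    # Hypothetical values: the domain and the IP address are made up.
    for host in ["APP0123", "APP0928.example.local", "10.0.0.15"]:
        print(host, classify_host(host))
```

Running suggest_fqdn("APP0123") on the job server machine itself shows what its resolver considers the full name, which is one candidate for the fully qualified form; whether the repository should hold that or the IP address is the judgment call described above.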

Did you check the CMS connection for both job servers?


Johannes Vink :netherlands: (BOB member since 2012-03-20)

Thanks.
The MACHINE_NAME in the table is not fully qualified (e.g. APP0123).
How do I amend the AL_MACHINE_INFO table? Can I do this via the DS Management Console or the CMC?

I have exported the command again (setting the Job Server or Server Group option to just the primary job server), so I was expecting the job to work every time I initiate it from the primary server. But again it is inconsistent, and when it fails it reports the same error in AL_RWJobLauncher.log:
*** ERROR: RWJobLauncher failed to connect to CMS. (BODI-1250220)

Is it possible the JobLauncher is still trying to send this job to the second job server?


russec03 :uk: (BOB member since 2012-11-30)

1. Your CMS may be down, as the error message 'RWJobLauncher failed to connect to CMS' suggests. Make sure it is running and check again.
2. Have a look at your .bat and .txt files. In the .txt file, the server name/system name/host name is passed through the -S parameter. Ensure that this system name is correct: open your Management Console/CMC and look up the system name. That name is what should follow -S.
3. If it still does not work, try replacing AL_RWJobLauncher.exe with a copy taken from an application server where AL_RWJobLauncher.exe runs fine.
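For point 2, and for the unexpected -CmAPP0928 mentioned earlier, a quick way to see which host names a command file refers to is to pull out the values glued to those flags. This is a minimal Python sketch under stated assumptions: the full layout of the .txt file is not shown in the thread, so it assumes whitespace-separated tokens with the value attached directly to the flag (as in -CmAPP0928), possibly in double quotes. Only the -S and -Cm prefixes come from this thread; the helper names and the sample text are hypothetical.

```python
import shlex
from typing import Dict, List, Sequence

# Flag prefixes mentioned in this thread; their exact meaning beyond that
# is an assumption, not taken from SAP documentation.
FLAGS_OF_INTEREST = ("-S", "-Cm")

# Hypothetical command-file fragment, not a real exported file.
SAMPLE = '"-SAPP0123" -CmAPP0928'


def flag_values(command_text: str,
                prefixes: Sequence[str] = FLAGS_OF_INTEREST) -> Dict[str, List[str]]:
    """Collect the values glued to each flag prefix, e.g. '-CmAPP0928'."""
    found: Dict[str, List[str]] = {p: [] for p in prefixes}
    # posix=False keeps Windows backslashes intact; quotes are stripped below.
    for raw in shlex.split(command_text, posix=False):
        token = raw.strip('"')
        # Check longer prefixes first so '-Cm' wins over a shorter '-C'.
        for prefix in sorted(prefixes, key=len, reverse=True):
            if token.startswith(prefix):
                found[prefix].append(token[len(prefix):])
                break
    return found


def mismatched_tokens(command_text: str, expected_host: str) -> List[str]:
    """List flag tokens whose value differs from the expected host name."""
    return [
        f"{prefix}{value}"
        for prefix, values in flag_values(command_text).items()
        for value in values
        if value.upper() != expected_host.upper()
    ]


if __name__ == "__main__":
    print(flag_values(SAMPLE))
    print(mismatched_tokens(SAMPLE, "APP0123"))
```

A token reported by mismatched_tokens is not necessarily wrong; it is just a reference to a host other than the one you expected, which is worth comparing against the system name shown in the Management Console/CMC.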


rajnia1 (BOB member since 2011-11-17)