CMS.exe crashes frequently in BI 4

Hi,

We are facing a very critical problem in our BI 4 production environment: CMS.exe is crashing frequently. No Windows or other patches have been installed on the servers. The environment is as follows:
BI4 SP02 Patch 15
2 CMS-tier servers, 3 processing-tier servers
Windows Server 2008 R2 64-bit, 64 GB RAM each
CMS DB: Oracle 11g

Errors in event viewer:

  1. APPCRASH, Faulting application CMS.exe
  2. CMS is unable to connect to cluster. Please check network connection and test responsiveness of system database.

Steps Taken:
1. Stopped the SIA on all 5 servers, killed any running Java executables from Task Manager, then rebooted all 5 servers
2. Pointed the CMS to a blank schema and copied the DB from the old schema

Regards,
Lokesh


lokeshborse :india: (BOB member since 2010-09-21)

[Moderator Note: Moved from About BOB to XI Server Discussion]


Marek Chladny :slovakia: (BOB member since 2003-11-27)

This issue is impacting the business, and I am unable to find the cause.

Any advice or ideas on this?

Regards,
Lokesh


lokeshborse :india: (BOB member since 2010-09-21)

You are on a totally outdated patch level. Current is either SP4 Patch 10 or SP5 Patch 3 :!:
I would also look at connectivity problems between your 4 nodes.


Andreas :de: (BOB member since 2002-06-20)

I agree; you should at least patch your platform to SP4.

SP2 is really not stable.


prima :fr: (BOB member since 2011-11-18)

Maybe map out your nodes’ connectivity - are your servers all on the same subnet, are you sure you’re communicating across that subnet, what kind of reliability do you have?
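
One quick way to answer the "same subnet" question is to group the server addresses by network. The following is only a minimal sketch: the host names, the IP addresses, and the /24 prefix length are all made-up assumptions, so substitute the real values from each server's `ipconfig` output.

```python
import ipaddress

def group_by_subnet(hosts, prefix_len=24):
    """Group host names by the network their IP falls into.

    prefix_len is an assumption; use the real subnet mask of your LAN.
    """
    groups = {}
    for name, ip in hosts.items():
        # strict=False derives the network from a host address
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups.setdefault(str(net), []).append(name)
    return groups

# Hypothetical names and addresses for the 2 CMS and 3 processing servers
hosts = {
    "cms1": "10.1.20.11",
    "cms2": "10.1.20.12",
    "proc1": "10.1.20.21",
    "proc2": "10.1.20.22",
    "proc3": "10.1.30.23",
}

for net, names in sorted(group_by_subnet(hosts).items()):
    print(net, names)
```

Any server that lands in a group of its own (here the invented `proc3`) would be the first one to look at.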

Any messages at the database, do you see anything in the listener logs? What’s happening on your cluster when it goes down? HOW frequently, nightly? During daily operations??
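
To put numbers on "how frequently", the crash times from the event viewer can be turned into gaps and an hour-of-day count. This is a rough sketch only: the timestamps below are invented, and the `YYYY-MM-DD HH:MM` format is an assumption about how you copy them out of the log.

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format

def crash_gaps(timestamps):
    """Minutes between consecutive crashes, in chronological order."""
    times = sorted(datetime.strptime(t, FMT) for t in timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]

def crashes_per_hour(timestamps):
    """Count crashes by hour of day to spot nightly or peak-time patterns."""
    return Counter(datetime.strptime(t, FMT).hour for t in timestamps)

# Invented APPCRASH times, for illustration only
crashes = ["2013-01-07 09:05", "2013-01-07 09:15", "2013-01-07 14:15"]

print(crash_gaps(crashes))
print(sorted(crashes_per_hour(crashes).items()))
```

If the crashes cluster around one hour, compare that hour with scheduled jobs, backups, or database maintenance windows.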

B


bdouglas :switzerland: (BOB member since 2002-08-29)

We have applied SP04 on a single test node, but it is still in the testing phase, and we cannot roll SP04 out until it has been tested. What puzzles me is that everything was working fine on this patch level until now, and then the issue suddenly occurred without any change!!! The CMS is crashing intermittently: sometimes after 10 minutes, sometimes after 5 hours!!!

I observed in Task Manager that the CMS crashes when its memory goes beyond 650 MB, so I created 1 extra CMS on each of the 2 CMS nodes. We now have 4 CMSs, but the issue still appears.
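
The memory observation above could be logged rather than watched by hand. The following is a sketch under assumptions: it parses the CSV that `tasklist /FO CSV` prints on an English-locale Windows system, the sample rows and PIDs are invented, and 650 MB is simply the figure observed in Task Manager, not a documented limit.

```python
import csv
import io

def process_memory_mb(tasklist_csv, image_name="CMS.exe"):
    """Return {pid: memory in MB} for one image name from `tasklist /FO CSV` output."""
    usage = {}
    for row in csv.DictReader(io.StringIO(tasklist_csv)):
        if row["Image Name"].lower() != image_name.lower():
            continue
        # "Mem Usage" looks like "675,840 K" on an English-locale system
        kb = int(row["Mem Usage"].replace(",", "").replace("K", "").strip())
        usage[int(row["PID"])] = kb / 1024
    return usage

def over_threshold(usage, threshold_mb=650):
    """PIDs whose memory is at or above the observed crash level."""
    return sorted(pid for pid, mb in usage.items() if mb >= threshold_mb)

# Invented sample; on a real server capture the output of:
#   tasklist /FO CSV /FI "IMAGENAME eq CMS.exe"
sample = '''"Image Name","PID","Session Name","Session#","Mem Usage"
"CMS.exe","4120","Services","0","675,840 K"
"CMS.exe","5288","Services","0","204,800 K"
"java.exe","6012","Services","0","1,048,576 K"
'''

usage = process_memory_mb(sample)
print(usage)
print(over_threshold(usage))
```

Run on a schedule and appended to a file, this would show whether every crash really is preceded by the same memory level.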

The event viewer logs show the error "Cluster connection with is broken. Please check your network connection and test responsiveness of your system database." The network team monitored for 2 days, but no network packets were dropped. tnsping also works fine.


lokeshborse :india: (BOB member since 2010-09-21)

Things do not just start happening, so something else must have changed…

Well, it doesn't seem it can get any worse :stuck_out_tongue: .

To be frank, most people tell me that SP5 was the one that fixes the most issues and gets things stable.


Mak 1 :uk: (BOB member since 2005-01-06)

OK… I will put the SP05 suggestion forward to our team and hope for the best… :slight_smile:


lokeshborse :india: (BOB member since 2010-09-21)

Did you look at my response on this - are you sure your cluster communication is right? Any chance one of your nodes is communicating to the other nodes off the subnet??

We had an issue where BO was using our backup LAN and not the primary. That took CORBA communications on a longer path, and it caused timing issues that quickly took the cluster down.

Maybe ping and tracert each server from each server, and make sure you get the path you expect. Same with each server to the DB server: again, maybe you’re pinging OK, but you’re not taking the best path. Maybe there is a DNS issue on one of your servers, something obscure like one box resolving addresses against a local hosts file…
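
With 5 servers plus the database host, the full set of checks is easy to lose track of, so it may help to generate it. This is only a sketch: the server and database names are placeholders, and it just builds the list of standard Windows `ping` and `tracert` commands to run on each box; it does not run them.

```python
from itertools import permutations

def path_checks(servers, db_host):
    """Map each source server to the ping/tracert commands to run from it."""
    plan = {src: [] for src in servers}
    # Every ordered pair: A->B and B->A can take different routes
    for src, dst in permutations(servers, 2):
        plan[src] += [f"ping -n 4 {dst}", f"tracert -d {dst}"]
    # Each server also needs its path to the CMS database host checked
    for src in servers:
        plan[src] += [f"ping -n 4 {db_host}", f"tracert -d {db_host}"]
    return plan

# Placeholder names; replace with the real host names
servers = ["cms1", "cms2", "proc1", "proc2", "proc3"]
plan = path_checks(servers, "oradb")

for src, commands in plan.items():
    print(f"--- run on {src} ---")
    for cmd in commands:
        print(cmd)
```

Running the same list once with host names and once with IP addresses would also expose a box that resolves a name differently from the others.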

Much luck!
B


bdouglas :switzerland: (BOB member since 2002-08-29)