I am running a 2-node XIR2 SP4 cluster on Windows Server 2003 SE and have always had an IFRS and OFRS running on both nodes.
This has left me with a nagging concern, since I vaguely recall from my training course that there should only be one “active” IFRS and OFRS in a cluster.
Since everything seemed to be running OK, I assumed the BO clusterware managed the active/standby status of the servers.
However, as the user base grows (now into the hundreds), I am seeing fairly frequent, recurring errors in the Windows application event log, such as this:
“System problem encountered: Failed to remove file \lccfp4\busobj$\FileStore\Input\a_054\057\005\342326\ from the file system. Please check for permissions or sharing violations”
The file repository is presented to both nodes in the cluster through a Windows share. The share is hosted by a virtual server, lccfp4, with an underlying SAN for the actual disk hosting (not my area - but this is my understanding of the architecture).
So would someone please be able to comment on whether my basic configuration is OK (i.e. an IFRS and OFRS on both nodes), or should I delete/disable one of the pair?
Or is there something more fundamental involved here?
Thanks in anticipation.
Chris Noble
Oracle DBA, Unix System Admin, BO Admin
BOE does not support 2 IFRS or OFRS. Stop one set and set up a scheduled copy to the location of the disabled FRSs. Then you will have some protection in case one node in the cluster goes down. Or set up new FRSs on a network drive, copy all the active FRS folders to the new location, and change the location of each FRS server through the CMC. Ask your network guys to set up a failover strategy for the remote FRS drive.
BOE does not provide failover for the nodes' FRSs in a cluster.
There are 2 issues that one must be concerned with when setting up true failover for BOE nodes in a cluster.
One is URLs and the second is FRSs. URLs are usually handled through a BIG-IP strategy or something similar; the FRSs are handled through RAID or something similar.
Business Objects does support more than one IFRS or OFRS in a cluster. Those services do not work in a distributed mode though.
The first IFRS and OFRS that start in a cluster are used as the active services, and all other IFRS and OFRS services are used as passive stand-by services that perform no work. If an active IFRS or OFRS is no longer available, the next service in order of start time (when the service registered with the CMS after its most recent start-up) is promoted from passive to active status.
You can see which FRS services are active by viewing the service metrics in the CMC - Servers area. You can view the start time and if you look at server performance metrics, you will find that only the active IFRS and OFRS are used.
FYI: There are functions in the CMS that are also only performed by the primary (first started in the cluster) CMS.
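The promotion rule described above can be illustrated with a small Python model. This is only a sketch of the rule as stated in this post (earliest registration wins, next in line takes over), not Business Objects code; the class, function and service names are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FrsService:
    name: str          # hypothetical service name, e.g. "node1.IFRS"
    start_time: float  # when the service registered with the CMS after its latest start
    available: bool = True


def active_service(services: List[FrsService]) -> Optional[FrsService]:
    """Return the available service that registered earliest, or None if none is up."""
    candidates = [s for s in services if s.available]
    return min(candidates, key=lambda s: s.start_time, default=None)


if __name__ == "__main__":
    node1 = FrsService("node1.IFRS", start_time=100.0)
    node2 = FrsService("node2.IFRS", start_time=250.0)
    cluster = [node2, node1]

    # node1 registered first, so it is the active service.
    print(active_service(cluster).name)

    # Simulate node1 going down: node2 is next by start time and takes over.
    node1.available = False
    print(active_service(cluster).name)
```

Only one service is ever returned, which matches the point that the other FRS services sit idle rather than sharing the work.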
Steve,
I’ll buy your argument that it does support more than one I/O FRS, but what I meant to say is that it is not a true failover setup. Perhaps I should have been a little clearer.
Thanks Steve and Igor for your replies. Fantastic stuff.
So if I understand you both correctly I think my configuration seems ok.
Both Windows boxes have a drive mapped to the same network share - hence I don’t need to copy files about for both to ‘see’ the same repository.
(The Windows guys have resiliency for the shared storage area.)
Steve tells me there will be an active IFRS and OFRS based on start time of the servers - so I can happily have an IFRS started on each node.
(It would be nice if BO told you which was active.)
All sounds good!
If the architecture looks ok do you have any ideas what might be causing my errors - just by-the-by?
No problems Igor. 8) Now back to the original problem.
Chris,
Your basic configuration appears to be correct, as long as you are sure that all ports in use for all servers are available from both servers. Each CMS needs the ability to talk to all the services in the cluster. If this communication is blocked, both CMSs end up working independently, which can cause all kinds of problems, including the one you are seeing.
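One quick way to test the port point is to attempt a TCP connection from each node to the other node's server ports. The following is a minimal Python sketch using only the standard library; the function name is made up, and the host name and port numbers you pass in are whatever your own servers are configured with.

```python
import socket
from typing import Dict, Iterable


def check_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Try a TCP connection to each port on host; True means the connection succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or name lookup failure
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Run it on node 1 against node 2 and then the other way round, for example `check_ports("node2", [6400, 6410])` with placeholder values. A False result only says the connection could not be made; it does not distinguish a firewall block from a service that is simply not listening.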
Other things that can cause this problem are:
Hard shutdowns of the services by powering off one of the servers or killing services, where an update makes it into the repository database but the corresponding file does not make it onto the FRS because the services are down by then.
Restores of the repository/FRS from backups that were not taken at the same time, with all the services stopped.
Network errors causing either the repository database or FRS to not be available suddenly.
For any of these, there is a command-line program called the Repository Diagnostic Tool (instructions at http://help.sap.com/content/bobj/bobj/index.htm) that you can run. It identifies all items that are in your repository but not in your FRS, and in your FRS but not in your repository. You could also ignore many of these errors, since any of the above can cause them and every Business Objects system produces them from time to time. They are only a problem if your users are complaining about them.
There is one more thing you can do to find the cause of the problem. Look at the event log on the BO server(s); you may find a pattern in the state of the system just before the error happened.
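To make the idea of that two-way check concrete, here is a rough Python sketch of the comparison. It is not the Repository Diagnostic Tool and does not talk to the CMS; it assumes you already have a list of the relative file paths the repository expects (how you obtain that list is left open), and the function name is hypothetical.

```python
import os
import tempfile
from typing import Iterable, Set, Tuple


def compare_repository_to_frs(repo_paths: Iterable[str], frs_root: str) -> Tuple[Set[str], Set[str]]:
    """Return (missing_from_frs, orphaned_in_frs) as sets of paths relative to frs_root."""
    expected = {os.path.normpath(p) for p in repo_paths}
    on_disk = set()
    for dirpath, _dirnames, filenames in os.walk(frs_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            on_disk.add(os.path.normpath(os.path.relpath(full, frs_root)))
    # Set differences give the two kinds of mismatch.
    return expected - on_disk, on_disk - expected


if __name__ == "__main__":
    # Tiny demo on a throwaway directory: one expected file is absent, one stray file exists.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a_001"))
        open(os.path.join(root, "a_001", "stray.rpt"), "w").close()
        missing, orphaned = compare_repository_to_frs(["a_001/report.rpt"], root)
        print("In repository list but not on disk:", sorted(missing))
        print("On disk but not in repository list:", sorted(orphaned))
```

The first set corresponds to database entries whose files never reached the FRS (for example after a hard shutdown), the second to files left behind on the share with no matching entry.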