BusinessObjects Board

Service Monitoring

Is it possible to share the name of the book? I definitely would like to know more ways to hack the system. 8)


substring :us: (BOB member since 2004-01-16)

Although far from an industrial-strength solution, I’ve been using Health Monitor to monitor the BO-specific services across fifteen servers. It checks the BO services every 60 seconds and sends me an email alert if any of them have stopped. It has helped me to be more aware of what’s going on in our environment, and I’ve been very pleased with it.
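For anyone wanting to roll their own check outside Health Monitor, here is a minimal Python sketch of the same idea. It assumes each service is queried with "sc \\server query <name>" and that the output contains the usual STATE line. The server and service names are up to you (none are given here), and sending the email is left out; in practice it would be wired to smtplib or whatever alerting you already have.

```python
import re
import subprocess

# Matches the STATE line of "sc query" output, for example:
#         STATE              : 1  STOPPED
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*\d+\s+(\w+)", re.MULTILINE)


def parse_sc_state(sc_output: str) -> str:
    """Return the state name (e.g. 'RUNNING' or 'STOPPED') from sc query output."""
    match = _STATE_RE.search(sc_output)
    if match is None:
        # Service not found, access denied, or an unexpected output format.
        return "UNKNOWN"
    return match.group(1)


def query_state(server: str, service: str) -> str:
    """Run 'sc \\\\server query service' (Windows only) and parse the state."""
    result = subprocess.run(
        ["sc", "\\\\" + server, "query", service],
        capture_output=True, text=True, check=False,
    )
    return parse_sc_state(result.stdout)


def services_needing_alert(states: dict) -> list:
    """Names of services whose state is anything other than RUNNING, sorted."""
    return sorted(name for name, state in states.items() if state != "RUNNING")


def check_once(server: str, services: list) -> list:
    """One polling pass; a scheduler would call this every 60 seconds."""
    states = {svc: query_state(server, svc) for svc in services}
    return services_needing_alert(states)
```

Treating UNKNOWN as "needs an alert" is deliberate: if the query itself fails, you probably want to hear about it.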

I’ve also used the sc.exe command to script bringing our environments up/down in a specific order, which has also been helpful.
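The ordered up/down idea can be sketched in Python as below. This is only an illustration: it assumes the remote form of sc.exe ("sc \\server start|stop <service>"), takes a startup order you supply, and simply reverses it for shutdown. Note that sc start/stop return as soon as the request has been sent, so any waiting between steps has to be added separately.

```python
import subprocess


def sc_command(server: str, action: str, service: str) -> list:
    """Argument list for 'sc \\\\server <action> <service>'."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return ["sc", "\\\\" + server, action, service]


def startup_commands(server: str, start_order: list) -> list:
    """Start services in the given order (dependencies first)."""
    return [sc_command(server, "start", svc) for svc in start_order]


def shutdown_commands(server: str, start_order: list) -> list:
    """Stop services in the reverse of the startup order."""
    return [sc_command(server, "stop", svc) for svc in reversed(start_order)]


def run_all(commands: list, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=False)
```

For example, a dry run of shutdown_commands("boserver1", ["ServiceA", "ServiceB"]) (hypothetical names) prints the ServiceB stop command before the ServiceA one. Keeping a single start order and deriving the stop order from it means the two can never drift apart.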

DJ


DJ06482 :us: (BOB member since 2002-11-22)

I was on the Diamond site today and I’m sure I saw something about the SDK or web services and the various XI ‘bits’. Sorry to be so vague; if you can’t find it, let me know and I’ll go looking… :yesnod:


Nick Daniels :uk: (BOB member since 2002-08-15)

DJ:
Is this with XIr2?
Is mailing the only action you take, or do you attempt a restart?

substring:
Pro Crystal Enterprise/Business Objects XI Programming by Carl Ganz, Jr.
Apress.
Excellent read :yesnod:

  • Even for non-Crystal/SDK people like moi.
    It explains things in a bit more depth than the admin guides, and there is an option to buy an e-book copy for $10 by answering a simple question (probably one that refers to the hard copy). I guess the publishers haven’t thought about wireless Palm access in a bookstore, lol.
    It would actually be very helpful for the code, but I haven’t had a chance to get around to ordering one.

MikeD :south_africa: (BOB member since 2002-06-18)

Yes, we’re currently on XIr2 SP1 (MHF1)…

Right now the mailing is the first action I take, and then I typically log onto the server to see what’s happened firsthand. Our BO-related services are all set to automatically start up if the machine is bounced, and they all have recovery settings set to “Restart the Service” after each failure. So, in some cases the service issues resolve themselves, but the email notifications have been a big help in alerting me to:

  • Database connectivity issues causing the CMS to crash (and not start back up)
  • Unscheduled reboots of our servers

The first issue was causing a lot of problems for us because the CMS would stop after losing connectivity with the system database and would never start back up on its own. More troubling, however, was that once the CMS was started manually, the other BO services had trouble registering with it. So, in several cases, we’d have to start the CMS first and then stop/start each of the individual BO services so that they would register correctly with the CMS.

PM services have been especially troublesome for us, and that’s the reason for scripting our environment shutdowns and startups. If the CMS isn’t fully up and operational when the Performance Management services are started, it causes all kinds of problems from within PM. So, we start up the CMS first, wait for a few minutes to make sure it has had time to complete the startup process, and then start the Performance Management services. That startup procedure seems to result in a more stable environment for us.
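The "start the CMS, wait until it is really up, then start PM" procedure could be sketched like this. Rather than relying only on a fixed sleep, it polls the CMS state and then still allows a settle period, since a service reporting RUNNING may not have finished initialising. The start and get_cms_state callables are placeholders (they could wrap sc.exe), and the timeout and settle values are assumptions, not BO recommendations.

```python
import time
from typing import Callable, Iterable


def wait_for_state(get_state: Callable[[], str], wanted: str, timeout_s: float,
                   poll_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_state() until it returns `wanted`; False if timeout_s elapses first.

    `waited` counts poll intervals, so the timeout is approximate.
    """
    waited = 0.0
    while True:
        if get_state() == wanted:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


def start_cms_then_pm(start: Callable[[str], None], get_cms_state: Callable[[], str],
                      cms: str, pm_services: Iterable[str],
                      timeout_s: float = 600, settle_s: float = 120,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Start the CMS, wait for RUNNING plus a settle period, then start PM services."""
    start(cms)
    if not wait_for_state(get_cms_state, "RUNNING", timeout_s, sleep=sleep):
        raise RuntimeError(f"{cms} did not reach RUNNING; PM services not started")
    sleep(settle_s)  # RUNNING can precede the CMS being fully ready
    for svc in pm_services:
        start(svc)
```

Injecting sleep makes the logic easy to dry-run without actually waiting several minutes.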

HTH -

DJ


DJ06482 :us: (BOB member since 2002-11-22)

Many thanx for that insight DJ!

I have actually noticed/experienced the same in all the instances you mention: CMS and PM.
And once in a while, the Report Server also dies.
So I was hoping to develop a script with enough logic to restart a single service (e.g. the Report Server) if it stops, and, for the CMS or any other service that has dependants, to run a series of restarts.
Hopefully that would avoid a server reboot and failover.

That raises two main questions: can this be achieved via the script option in Health Monitor (using .bat files and/or a real scripting language), or would it be more suitable to scale out the more unstable servers on the same machine?

I will certainly post my findings either way, and your comments are always appreciated.


MikeD :south_africa: (BOB member since 2002-06-18)

I would be happy to take a look at what you have, but I seem to recall seeing something very similar for XIR2 somewhere. I will post it if it pops up again.


Decisys :de: (BOB member since 2004-05-21)

Hi Substring,

Did CA get back to you on this?
We are using ClearCase in our organisation, and they want to use it for version control instead of EQM or EbiExpert.

Any suggestion?

Thanks,
Vijay


vijaykollu :us: (BOB member since 2003-05-08)

They are supposed to do a POC for me in the next couple of days. Since our ticketing system is from CA, that will be an advantage. Furthermore, I know CA can monitor all our servers, including the load balancer. With 20,000+ users, that is important.

However, the CA folks are not sure whether they can monitor the proprietary Business Objects services. But they will definitely try.

By the way, I am currently evaluating EQM4 as well. I know that none of these tools can do everything for me: version control, change management/promotion, system monitoring, trouble-ticket integration, QA testing integration, etc. Basically, I am trying to find out which software gives me the maximum benefit, and then I have to decide which capabilities matter most. For example, is versioning more important than detecting a load balancer failure, or vice versa?

I will let you know how it goes if you are interested.


substring :us: (BOB member since 2004-01-16)

Hmm - I did some digging around and believe I have something that might just resolve this - for us anyway lol.

I decided not to attempt the script from the book, as it would require purchasing a C# manual, so I reverted to something more familiar.

I have located a Perl script that runs from the command line against any server, lists the running services, and provides options to stop/start specific services or even restart the server.
I have unearthed my old Perl Black Book, installed ActivePerl and Komodo, and intend to use this script as a base for handling all XI services via the Health Monitor script option.

The fact that it also contains logic to evaluate the service status gives me an alternative option if HealthMonitor proves to be troublesome.

I now just need to work out the XI service hierarchies and dependencies so that I can include them in the script as well, as it would make no sense to restart a single service that might first require stopping and starting a few others.
:crazy_face:
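Once the service hierarchy is known, the "stop and start a few others" logic could look like the following Python sketch. It assumes you write the dependencies down yourself as a map from each service to the services it needs (any names in such a map are hypothetical; nothing here reads them from BO). It finds everything that depends on the failed service, stops that set in reverse dependency order, and starts it again dependencies-first using the standard library's graphlib (Python 3.9+).

```python
from graphlib import TopologicalSorter


def dependents_of(depends_on: dict, service: str) -> set:
    """Every service that depends, directly or indirectly, on `service`."""
    found = set()
    stack = [service]
    while stack:
        current = stack.pop()
        for svc, needs in depends_on.items():
            if current in needs and svc not in found:
                found.add(svc)
                stack.append(svc)
    return found


def restart_plan(depends_on: dict, failed: str) -> list:
    """(action, service) steps to restart `failed` together with its dependents."""
    affected = dependents_of(depends_on, failed) | {failed}
    # Keep only dependencies inside the affected set; everything else stays up.
    graph = {svc: set(depends_on.get(svc, ())) & affected for svc in affected}
    start_order = list(TopologicalSorter(graph).static_order())
    stop_order = list(reversed(start_order))
    return [("stop", s) for s in stop_order] + [("start", s) for s in start_order]
```

A service with no dependants (a Report Server, say) gets a plain stop/start of itself, while a CMS failure restarts everything hanging off it. If the map accidentally contains a cycle, TopologicalSorter raises graphlib.CycleError rather than guessing an order.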

If/when I get this right, I’ll post the script in this thread …


MikeD :south_africa: (BOB member since 2002-06-18)

Thanks Mike and Substring,

I am interested in seeing that, as I can’t decide which to use: the CA tools or EQM4.

I’m also interested in the failover mechanism.

Looking forward to seeing your analysis.

Thanks,
Vijay


vijaykollu :us: (BOB member since 2003-05-08)

Unfortunately, it’s a little tricky. When the CMS stops due to a lack of database connectivity, it’s classified as a (net) stop command, rather than a failure. Because of this, the Windows service recovery options never come into play. This makes scripting for the failures much more difficult. Most of our failures occur overnight, so if I can catch the failure first thing in the morning, I can fix the issue before it impacts our users (most start around 8AM, I’m in the system a couple of hours before that).
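Because a CMS that stops on a lost database connection counts as a stop rather than a failure, the Windows recovery options never fire, so an external check has to treat any unexpected non-RUNNING state as a problem. A minimal Python sketch of that decision logic follows. The retry limit is an assumption; capping it means that when the database itself is down, the script stops hammering the CMS and falls back to an email alert.

```python
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    """Decides what to do with a service that should always be running."""
    max_restarts: int = 3
    attempts: dict = field(default_factory=dict)

    def decide(self, service: str, state: str) -> str:
        """Return 'ok', 'wait', 'restart' or 'alert' for one observed state."""
        if state == "RUNNING":
            self.attempts[service] = 0  # healthy again: reset the counter
            return "ok"
        if state == "START_PENDING":
            return "wait"  # a restart is already under way
        # Any other state is treated as a problem, whether the service
        # crashed or was cleanly stopped, so recovery does not depend on
        # Windows classifying the event as a failure.
        tries = self.attempts.get(service, 0)
        if tries < self.max_restarts:
            self.attempts[service] = tries + 1
            return "restart"
        return "alert"  # give up restarting (e.g. DB still down) and notify
```

The caller would feed it the state from each polling pass and act on the returned string.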

One thing I looked into but never had the time to implement was the eventtriggers.exe command, which lets you run batch jobs when certain events are written to the Event Logs. When a service stops or fails, it’s recorded in the Event Log, so I was hoping to use that command to script my different types of recovery.

Another option I’m considering is to script a system reset every morning using the sc.exe command. I have a script that I run now, but I haven’t scheduled it to run daily. Timing (i.e. when you can safely run the script) might be tricky, depending on your deployment specifics.

Our deployment sees very low levels of usage and we still run into these stability issues, so I don’t believe that load on the system is causing these problems. Scaling up the services (watch out for the PM services; see below) might help, but in the case of the CMS/database connectivity issue, scaling won’t help. If the DB connection is down (due to a backup or maintenance), it’s down across the board, and both of our CMSs will crash and then eventually stop themselves after several attempts to restart.

PM services are very tricky to deal with. It’s not a well-known fact, but many of them can’t be load-balanced (AA Analytics and AA Dashboard are the two exceptions). See “pmxir2_deployment_guide.pdf”, available on the Tech Support website; the relevant section begins on page 28. I’ve seen cases where running multiple instances of a PM service caused PM repository corruption, so you’ll definitely want to avoid that.
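Given that warning, it may be worth checking a deployment description for PM services that appear on more than one server. A minimal Python sketch, assuming you maintain the server-to-services map yourself; the two load-balanceable names are taken from this post and may not match the exact Windows service names on your install.

```python
from collections import defaultdict

# Per the post above, only these PM services can be load-balanced.
LOAD_BALANCEABLE_PM = {"AA Analytics", "AA Dashboard"}


def duplicated_pm_services(deployment: dict, pm_services: set) -> dict:
    """Map each non-load-balanceable PM service found on 2+ servers to those servers."""
    hosts = defaultdict(list)
    for server, services in deployment.items():
        for svc in services:
            if svc in pm_services:
                hosts[svc].append(server)
    return {
        svc: sorted(servers)
        for svc, servers in hosts.items()
        if len(servers) > 1 and svc not in LOAD_BALANCEABLE_PM
    }
```

An empty result means no PM service that must be single-instance is deployed twice.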

If you’re restarting your PM services, you’ll want to be careful about how they’re stopped and started. In my initial script, I shut down the web servers, then the CMSs, and then the remaining services (including PM). However, I noticed that on several occasions the PM services would restart automatically, which I didn’t want to happen. I have a case open with BO regarding this behavior, but if I stop the PM services while the CMS is still running, they generally seem to respond better. So, the general order of my current script is:

1. Stop the web servers
2. Stop all other services (except CMS)
3. Stop CMS
4. Wait 2 min to be sure the CMS is down
5. Start CMS
6. Wait 2 min to make sure the CMS is running
7. Start other services
8. Start the web servers
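DJ’s eight steps translate almost directly into an action list. The Python sketch below only builds and replays that list; the stop/start callables are placeholders (for example wrappers around sc.exe), and the 120-second waits mirror the "2 min" in the steps rather than any official figure.

```python
import time
from typing import Callable, List, Tuple, Union

Step = Tuple[str, Union[str, float]]


def restart_environment_plan(web_servers: List[str], other_services: List[str],
                             cms: str, wait_s: float = 120) -> List[Step]:
    """The eight-step order: web tier down, others, CMS, then back up in reverse."""
    plan: List[Step] = [("stop", s) for s in web_servers]   # 1
    plan += [("stop", s) for s in other_services]           # 2
    plan += [("stop", cms), ("wait", wait_s)]               # 3, 4
    plan += [("start", cms), ("wait", wait_s)]              # 5, 6
    plan += [("start", s) for s in other_services]          # 7
    plan += [("start", s) for s in web_servers]             # 8
    return plan


def execute(plan: List[Step], stop: Callable[[str], None],
            start: Callable[[str], None],
            sleep: Callable[[float], None] = time.sleep) -> None:
    """Replay the plan using the supplied stop/start/sleep callables."""
    for action, arg in plan:
        if action == "stop":
            stop(arg)
        elif action == "start":
            start(arg)
        elif action == "wait":
            sleep(arg)
        else:
            raise ValueError(f"unknown action: {action}")
```

Building the plan separately from executing it makes it easy to print and review the order before anything is actually stopped.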

HTH -

DJ


DJ06482 :us: (BOB member since 2002-11-22)

Update II: I’ve completed a Perl script to handle all this and will liaise with the moderators to get the script uploaded to BOB.
Yeah, I know that most people use the net start and sc commands to do the same, but I wanted something a little more flexible, i.e. it uses an input file listing all the XI services, AND it can be modified to run against a Unix environment if required.
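The input-file approach keeps the service list out of the code. The actual file format isn’t given in the thread, so the Python sketch below assumes a hypothetical one: one service name per line, blank lines ignored, "#" starting a comment, and duplicates dropped while keeping the file order.

```python
def load_service_list(text: str) -> list:
    """Parse a service list: one name per line, '#' comments, blanks skipped."""
    services = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and edge whitespace
        if line and line not in services:
            services.append(line)
    return services


def load_service_file(path: str) -> list:
    """Read the list from a file (the path is whatever you choose)."""
    with open(path, encoding="utf-8") as fh:
        return load_service_list(fh.read())
```

Internal spaces are preserved, since Windows service display names often contain them.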

Also: Automatic Restart of CMS


MikeD :south_africa: (BOB member since 2002-06-18)

Update III:
Is there anyone out there that has created a windows service from a Perl script?
I can create the service, but it won’t run the script :hb:

The script runs just fine from a scheduling tool, and would probably be fine as a cron job, but having it as a service would make it a tad more robust/idiot-proof.


MikeD :south_africa: (BOB member since 2002-06-18)

Check out Servers Alive from www.woodstone.nu.

The demo version allows monitoring of 10 ‘events’: disk space, services running, etc.

Two-tier pricing of 199 or 299. Check it out, it works great!

Tech support is also fantastic with quick and accurate responses.


KevinI :us: (BOB member since 2003-03-04)

MikeD has supplied this “Service Monitor Script” in BOB’s Downloads.

Thanks, Mike!


Anita Craig :us: (BOB member since 2002-06-17)

Hi,
Can anyone post a script for automatically deleting log files on BO servers?

Thanks in advance


vk_0432 (BOB member since 2011-11-02)

Why not just stop creating the log files?


Nick Daniels :uk: (BOB member since 2002-08-15)

Hi,

I need the log files, but I want to delete the ones older than 5 days.
Please send me a .bat script for this.
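A minimal Python sketch for this, assuming the logs sit in a single folder and that "older than 5 days" means the last-modified time; the folder and the *.log pattern are placeholders for wherever your BO servers write their logs. If a pure .bat file is preferred, Windows’ built-in forfiles command can do a similar job, e.g. forfiles /p "D:\logs" /m *.log /d -5 /c "cmd /c del @path" (path hypothetical; try it with echo instead of del first).

```python
import time
from pathlib import Path


def delete_old_logs(log_dir, max_age_days: float = 5, pattern: str = "*.log",
                    dry_run: bool = False, now: float = None) -> list:
    """Delete files in log_dir matching pattern whose mtime is older than max_age_days.

    Returns the paths that were (or, with dry_run=True, would be) deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    deleted = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            deleted.append(path)
    return deleted
```

Running it once with dry_run=True is a cheap way to confirm the pattern only catches the files you expect before scheduling it.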


vk_0432 (BOB member since 2011-11-02)

I know BOXI 4.0 has some new/improved monitoring capabilities. I believe it may also allow scheduling a bounce of certain Servers/Services…


Captspeed :us: (BOB member since 2006-10-03)