BCA gone crazy

system · August 14, 2006, 7:39am

Hello,

Our BCA has suddenly started going crazy and re-running all jobs at every scan interval.

Our platform is Win2000 SP4. We run many other BCA agents on the same machine where this is happening, and they are working normally. Similarly, if I shift the monitor for this BCA agent over to a different host, the same problem persists!

This seems to happen only with BO Full Client reports with tasks of “refresh” and “run macro” (the macro just calls an external program via “Shell()”).

We’re running BO 6.5.1 currently. Has anyone ever seen a problem like this?

Regards,

hansend (BOB member since 2002-12-20)

system · August 14, 2006, 4:53pm

Never seen anything like this…but I’m guessing something must have changed. Do you know what? I’d be tempted to turn them all off and start monitoring them one by one…maybe even reschedule the odd job again. Try a scan, repair and compact as well…

Nick Daniels (BOB member since 2002-08-15)

system · August 14, 2006, 5:40pm

Hi,

Thanks for the tips… unfortunately nothing really changed except the content of a macro, which really just runs a Shell( ) call anyway.

“Try a scan, repair and compact as well…” – I would love to, but we have integrated BO 6.5.1 with LDAP, and if you do a scan/repair/compact it would delete all BCA tasks scheduled by LDAP users and revert all documents published by LDAP users up to the General Supervisor. We haven’t done a scan/repair/compact in 10 months and are still awaiting a fix…

hansend (BOB member since 2002-12-20)

system · August 14, 2006, 5:52pm

Do you have a server cluster where a couple of BCA servers are monitoring the same scheduler? Is the date/time correct on your servers?

ToddGustafson (BOB member since 2002-09-17)

system · August 14, 2006, 6:04pm

The only time I saw anything like that was during our migration, when the BCA server was up during the upgrade… Is it possible that your repository has a problem, something stopping the BCA from writing updates? If it can’t mark the jobs as run, it will keep running them.

I’d try rebooting your BCA server(s) and seeing if it works for a while before you see issues - there are some memory leaks with the BCA and full client jobs with macros. Maybe your Shell() call is making that worse, or making it happen faster… Any chance you’re using up memory and not releasing it somehow?

Good luck,
B.

bdouglas (BOB member since 2002-08-29)

system · August 18, 2006, 3:01pm

Hi,

(moved from a different reply – oops)

Still not resolved, but I’ve narrowed it way down. The UPDATE to the DS_PENDING_JOB table is being rolled back, causing the new record for the next day’s run to have today’s date – so it’s seen as “eligible” on the next BCA scan interval and run again.
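To illustrate the mechanism, here is a minimal Python sketch of the scan logic as I understand it. It is only a toy model: the `PendingJob` class, the `next_run` field and the `update_commits` flag are made-up stand-ins, not the real DS_PENDING_JOB columns or BCA internals.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PendingJob:
    # Hypothetical stand-in for a row in DS_PENDING_JOB; the real columns differ.
    name: str
    next_run: date


def scan(job: PendingJob, today: date, update_commits: bool) -> bool:
    """Simulate one scan interval. Returns True if the job ran."""
    if job.next_run > today:
        return False  # not yet eligible
    # ... the report would be refreshed and distributed here ...
    if update_commits:
        # UPDATE committed: the next run moves to tomorrow.
        job.next_run = today + timedelta(days=1)
    # Otherwise the UPDATE was rolled back and next_run still holds today's date.
    return True


def count_runs(update_commits: bool, scans: int = 5) -> int:
    """Count how often the job runs across several scans on the same day."""
    today = date(2006, 8, 18)
    job = PendingJob("daily_refresh", next_run=today)
    return sum(scan(job, today, update_commits) for _ in range(scans))


print(count_runs(update_commits=True))   # 1
print(count_runs(update_commits=False))  # 5
```

With the update committed, the job runs once and then waits for tomorrow. With the update rolled back, it stays eligible and runs again on every scan, which matches what we see.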

It seems related to the fact that the reports are set up to be distributed to user groups which have no internal users – they are all externally defined in our LDAP system. If we add a repository user to the target groups, the problem goes away.

I’m 100% convinced that this is a bug in BCA – yet another case where BO didn’t consider all the implications of externalising user authentication. There are many other bugs in this area (e.g. our inability to do scan/repair/compacts is another example).

I’ve raised a case with BO, now I’m just waiting for them to confirm and open an ADAPT…

Regards,

hansend (BOB member since 2002-12-20)

system · August 18, 2006, 3:22pm

Wow. I had to read that twice to get the full implications…

I would have just assumed the user specified in the bomain.key to poll the repository would have been good enough to update the pending jobs table successfully.

Please keep us up on what you get back from BO on this - could be a problem for us in the future, if I follow.

Good luck and nice work on the debug!
B.

bdouglas (BOB member since 2002-08-29)

system · August 18, 2006, 3:41pm

Try adding a Group Reference User to each group.

Check the Supervisor doc for the exact workflow.

Ottoman (BOB member since 2002-10-04)

system · August 21, 2006, 5:24pm

Hi,

I’ve been testing this further and I’m becoming more and more alarmed by how easy it is to reproduce this problem, even with a more recent version of BO (SP2 + CHF28).

Here’s how I reproduce it:

1. Create an empty group in Supervisor, “TEST_GRP”. Add NO users to it.
2. Create a simple report against e-Fashion.
3. Schedule the report to run (refresh) daily, distributing via the repository to TEST_GRP.
4. Watch the report in BCA_Console. It should run normally.
5. Add TEST_GRP to the LDAP system.
6. Add a user, “userx”, to that group in LDAP.
7. Log into any BO application as “userx” so that a hidden repository record is created for userx, marking him as a member of TEST_GRP.
8. Schedule another report or wait a day until the previously scheduled report runs again. It immediately starts looping, re-running at every scan interval.

It seems hard to believe that the problem is this general… has anyone else encountered this, or can anyone confirm that they do not have this problem, even for cases where they send reports to groups with only LDAP users?

Regards,

hansend (BOB member since 2002-12-20)

system · August 21, 2006, 5:26pm

Tried that – doesn’t work. If there are LDAP users in the group, it loops.

hansend (BOB member since 2002-12-20)

system · September 6, 2006, 12:08pm

Quick update on this issue: BO have confirmed it’s a bug in BO 6.5, and that it’s present in even the latest version (SP3 + MHF11, or even CHF47). They’re working on a fix.

It seems strange to me that nobody’s encountered this problem – all we’re doing is sending reports to a group with LDAP users via the BCA… Does nobody else do that?

Cheers,

hansend (BOB member since 2002-12-20)