For Windows servers, the Business Objects recommendation for a highly available Input/Output FRS file system is to use either a NAS or a shared folder on another server connected to a SAN. There are two problems I see with this recommendation: speed and single point-of-failure.
Both solutions are accessing files at the speed of the network. If the NAS fails or the server hosting the shared folder goes down, the entire system goes down.
It seems to me the optimal solution is to store the Input and Output FRS on local disks on a Business Objects server. Those local disks should then be replicated to other local disks on another Business Objects server.
This solution accesses files at local disk speed. The single point of failure is also removed by replicating the FileStore to another server.
I would like to hear other opinions on the optimal solution.
The biggest bottleneck with the FRS is the time to read from disk. Typically SAN/NAS drives are faster than standard hard drives (this depends on the vendor). If you're concerned with network response time, you should work with your network admins; for most people this is not an issue. Also, if you have slow network response time, you'll have problems elsewhere (response time between databases, communication with the CMS, etc.).
As far as failover goes, it depends on your NAS/SAN vendor and configuration, but typically these are highly redundant and highly available. I would expect a local disk to fail more often than a NAS/SAN.
A general recommendation: look at what your business requirements are. Does the server need to be up 24/7? If it's not a critical system, it may not be worth investing extra time and money to increase redundancy and availability.
I think the read times would have the least impact on end-user response times. Most files in the Input FileStore are less than 100 KB. It seems to me that write times for the Output FRS are more heavily affected by the limited bandwidth of the network, because those file sizes are much larger.
Assuming a Gigabit Ethernet connection between the servers and the NAS/SAN, that is still only a fraction of the bandwidth of a SCSI, SAS or SATA interface to a local disk.
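As a rough back-of-the-envelope illustration (the link speeds and file sizes below are assumptions for the sake of the comparison, not measurements from any real FRS), here is a small calculation of raw transfer times over Gigabit Ethernet versus a local SATA link:

```python
# Rough comparison of raw transfer times. Link speeds are theoretical
# maximums and ignore protocol overhead, latency and disk seek time.

GIGABIT_ETHERNET = 125 * 10**6   # ~1 Gb/s expressed in bytes per second
SATA_3G = 300 * 10**6            # ~3 Gb/s SATA II expressed in bytes per second

def transfer_time(size_bytes, bandwidth_bps):
    """Seconds needed to move size_bytes over a link of the given bandwidth."""
    return size_bytes / bandwidth_bps

input_report = 100 * 1024        # typical Input FRS object, ~100 KB
output_instance = 50 * 1024**2   # assumed large Output FRS instance, ~50 MB

for name, size in [("Input (100 KB)", input_report),
                   ("Output (50 MB)", output_instance)]:
    print(f"{name}: "
          f"network {transfer_time(size, GIGABIT_ETHERNET) * 1000:.1f} ms, "
          f"local SATA {transfer_time(size, SATA_3G) * 1000:.1f} ms")
```

The point is simply that small Input objects move quickly either way, while large Output instances are where the slower link starts to show.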
NAS and SANs still need to be patched and taken offline for maintenance occasionally, right? But if a local disk is replicated to another server, it seems to me that the system could approach five nines of uptime and have a faster pipeline for disk access.
Sure, there are trade-offs when designing a system, but what would you consider the optimal design for the FileStore?
Maybe you're right; I could have misspoken. Frequently used reports use the cache server, so that could minimize read response time as you suggest.
I will say that unless you've done a lot of stress testing, it can be hard to pinpoint exactly what's causing slow performance. You might have a perfectly tuned FRS but find out your app server configuration is slowing things down. There's an expression, "penny-wise but pound-foolish", so keep things in perspective.
Anyway, in answer to your original question I would still recommend using a SAN/NAS drive.
We’ve been using NAS for our FRS on a three-machine production cluster successfully.
We also have a dev server with a local FRS.
Replication on NAS is not uncommon, and I would guess it is probably easier to set up than replication of a local drive (though that depends on the respective environments, of course).
You might get better throughput with a local FRS (or not, depending on the NAS and network hardware), but for availability, NAS is a must. High availability requires multiple BO servers in a cluster, with at least one instance of each service on multiple machines, and that includes the FRSs. Although only one FRS service is actually active at any point, you'll want the same service enabled on another machine so that it can take over if the primary FRS fails. Given this, the FRS needs to be in a location that can be accessed by all cluster members that host FRS services.
You could, of course, set up a share on one machine's local disk and map the others to it, but what if the FRS machine fails? Even if the drive is replicated, you will need to restore it or otherwise make it available to the remaining machines, which will be a manual effort. Better to point all cluster members at a common (redundant) file share.
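To illustrate why every enabled FRS instance has to see the same FileStore, here is a deliberately simplified toy model (the class names and failover logic are my own illustration of the idea, not how the CMS actually promotes servers):

```python
# Toy model of active/passive FRS failover. The point is only that the
# standby can serve existing report objects after promotion *only if* its
# filestore_path resolves to the same (shared or replicated) content.

class FRSInstance:
    def __init__(self, host, filestore_path, enabled=True):
        self.host = host
        self.filestore_path = filestore_path
        self.enabled = enabled
        self.alive = True

def active_frs(instances):
    """Pick the first enabled, reachable instance -- roughly what happens
    when the primary fails and a standby takes over."""
    for frs in instances:
        if frs.enabled and frs.alive:
            return frs
    raise RuntimeError("No FRS available: the whole cluster is degraded")

cluster = [
    FRSInstance("boe-node1", r"\\nas01\frs\Input"),  # primary
    FRSInstance("boe-node2", r"\\nas01\frs\Input"),  # standby, same share
]

cluster[0].alive = False          # simulate node1 going down
survivor = active_frs(cluster)
print(f"FRS now active on {survivor.host}, serving {survivor.filestore_path}")
```

If the standby in this sketch pointed at a path only it could see, it would come up cleanly but serve an empty or stale FileStore, which is exactly the situation a common share avoids.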
If you only have a single machine, then I guess it doesn’t really matter – if the box goes down, then your whole environment is down anyway.
I've been looking at this too, and am thinking of using DFS with Windows 2003 R2.
The setup would be to have the DFS share's primary set to the same node as the primary FRS. Replication happens in real time to the secondary node(s), and if the FRS falls over, the system should swap to a secondary node without intervention.
Thanks, I’m glad to see I’m not the only one who thinks local disks are an optimal solution. Here’s a topic from someone who is using this configuration:
Sounds good; my only issue is that if you're only using one-way replication, what happens when the primary comes back up? It would seem that it would never have any of the changes that occurred while the backup was promoted, so if the backup then goes down, you've lost those changes.
I've modified the thread at https://bobj-board.org/t/87069 since we've had some major issues since the original post. Basically, DFS will work fine when it is properly configured up front (check, check, double-check).
The DFS settings would be Server1 (master - enabled) → Server2 (slave - disabled). When Server1 is unavailable, the FileStore on Server2 will be used.
The problems start when Server1 becomes available again: the sync may overwrite or remove newer files on Server2. So make sure that the DFS settings are changed so that Server2 is master (preferably as soon as Server1 becomes unavailable).
Server1 (slave - disabled) ← Server2 (master - enabled).
Plan the return to the first situation! Bugs in Windows or Business Objects (both are possible) gave us real problems. When the configuration is all right, make sure that it actually works (and is indeed syncing from the proper master to the slave) and isn't still using the old master.
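As a sanity check before re-syncing in the original direction, something along these lines could flag files that exist only on Server2, or are newer there, and would therefore be overwritten or removed by a sync from Server1. The UNC paths are placeholders for your own FileStore locations, not anything DFS-specific:

```python
import os

# Placeholder UNC paths to the two FileStore copies; adjust to your environment.
SERVER1 = r"\\server1\FileStore"   # old master, just back online
SERVER2 = r"\\server2\FileStore"   # promoted copy that received the changes

def snapshot(root):
    """Map each file's path (relative to root) to its last-modified time."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            files[os.path.relpath(full, root)] = os.path.getmtime(full)
    return files

old_master = snapshot(SERVER1)
new_master = snapshot(SERVER2)

# Files created or updated on Server2 while Server1 was down: a sync that
# treats Server1 as master again would clobber or delete these.
at_risk = [path for path, mtime in new_master.items()
           if path not in old_master or mtime > old_master[path]]

for path in sorted(at_risk):
    print("would be lost or overwritten:", path)
print(f"{len(at_risk)} file(s) at risk")
```

If that list is non-empty, sync from Server2 back to Server1 first (or copy those objects across by hand) before restoring the original master/slave direction.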