DS Job Fails with CON-120302 - Delay opening svr connection

system · July 20, 2016, 1:20pm

A job that has been running every night for 2 years has recently started occasionally failing with the error

ODBC call for data source failed:
[Microsoft][SQL Server Native Client 10.0] Unable to complete login process due to delay in opening server connection

This is happening connecting to our warehouse DB which is running SQL Server.
Dataflows connect to this server all the time and there are probably other dataflows already connected and running OK in parallel.

When we rerun the job, it would typically run with no issue.

I’m guessing that there may be some network issue going on.
Is there any way of setting a CONNECT TIMEOUT on this native connection to our SQL Server DB?

Can’t see anything on the Datastore configuration and there are a number of Timeout parameters in dsconfig.txt, but uncertain which one to change if this is the place to look.

We are running DS 4.2 SP6

plattypuss (BOB member since 2016-07-20)

system · October 4, 2016, 2:26pm

I have seen the same issue in DS 4.2 SP7

I’ve opened an incident with SAP:

"The error is randomly triggered when two events clash. Data services (4.2 SP7) is running a random job regardless of the processes inside as long as the process consists of making a server connection to our database which is on SQL Server 2016 (running on a windows server 2012 r2 x64). And an in-house job which makes a snapshot of the virtual server, where SQL server runs, is running the “snapshot” task. The task is randomly triggered as per changes on the server (and it takes roughly 20 seconds).

Although the error, the trace log shows that the process completed successfully.

I need to understand if this is a false message or not. The error can be triggered in DS randomly. It is not tight to a particular process in the engine. It is tight with any DS process that makes a server connection to a SQL Server db while other operations are done to the server where SQL server is running, like: Virtual Machine replication process - snapshotting the live server which has SQL server running.

Is the message valid? If so, why is the job not going to the Catch block I have placed? Why is the job ending with success?"

Will post here once i have more information.

Regards,
Bogdan

bogdanpopey (BOB member since 2016-10-04)

system · October 4, 2016, 6:37pm

Try adding the following parameter under the [AL_Engine] section in your DSconfig.txt on the job server.

SQLServerLoginTimeout=60

SAP does state the default is 60 seconds, but I don’t think it is true, as I’ve experienced the timeout occurring around 20 seconds. You can change the value to whatever you like.

Restart of the job server after adding the parameter.

-Adam

avandeweghe (BOB member since 2012-04-25)

system · October 12, 2016, 10:41pm

Yes, I can concur with the SAP message.

We also had customers using VMWare snapshots to “backup” their Microsoft SQL Server instances, that were either hosting the BO/IPS CMC, DS local repos and/or the DWH itself.

If the VMWare snapshots are configured to ALSO capture the full memory state, the entire virtual server will be completely “frozen” and become unresponsive for quite a while, depending on the size of the server etc.

When BO/IPS and/or DS are running on a separate machine, this will cause some serious issues. Especially if the CMS and/or DS repo databases “disappear”. Your ETL jobs and your BO/IPS installation will simply go turtle at that point (e.g. flip on their backs and throw their feet in the air.).

We resolved these issues by either introduce scripting to this procedure whereby BO/IPS and DS are brought down on the other server before creating the SQL Server VM snapshots or in some cases actually moving the IPS + DS stack onto that server itself. Seeing that the entire server is frozen, DS itself shouldn’t actually be aware of the snapshot process happening at all.

However, we always point out that VM snapshots are not a replacement for regular database backups.

ErikR (BOB member since 2007-01-10)

system · October 17, 2016, 2:50pm

I think most of my clients get a backup of their VM once a week, on the weekend when no jobs are scheduled. And for some clients that’s too often. The O/S of the job server itself rarely changes. The most likely change is if the Export Execution Command is used or a job is scheduled within the Management Console.

Given the prevalence of off-server network attached storage, I’m not sure how much good a backup of the VM will be. With one DBMS I’ve worked with, I know that you can’t just perform file level copies of the database files and expect everything to work right if you restore the files selectively. Heck, even restoring all of them may not work because the backup works on one file at a time and by the time you back up every file they would be horribly out of sync. Good luck recovering with that.

As Erik said, a VM backup is no substitute for an actual DBMS initiated backup.

This may be a case where the ETL group needs to coordinate better with the sysadmin group.

eganjp (BOB member since 2007-09-12)