Problem using data_transfer (DI 11.7) with timeouts?

system · December 27, 2007, 3:53pm

We have a dataflow that sometimes (but not always) gets this error in production during our nightly DW load:

DFC-250014: |Dataflow df_Stage_Stage_MW_OPTOUT_LKUP
Job or data flow <df_Stage_Stage_MW_OPTOUT_LKUP> did not receive registration requests from its children within <30> seconds.

This dataflow has several data_transfer steps, which means the dataflow gets broken up into several sub-data-flows at runtime.

This dataflow runs fine when testing it by itself in the dev/test environment.

Here’s my theory: In production in our nightly job, we run several dataflows in parallel at any given time. I think this problem could be an interaction between the sub-dataflows and parallelism. The parallelism is capped at 4 (only 4 al_engines can be running at any given time) in our environment.

Each dataflow – or sub-data-flow? – gets its own AL_ENGINE.exe process on the server, so Dataflow A might call sub-dataflow B. Sub dataflow B might try to run but not be able to because it is waiting for a free AL_ENGINE that is capped at 4. Other dataflows C, D, and E already running in parallel might make it wait for a long time… And then you get this timeout error.

Unless DI is specifically designed to give priority in the processing queue to sub-data flows belonging to a parent data flow that is already running…
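
A rough way to sanity-check this theory is a toy queueing model. The sketch below is not how DI schedules engines. It only assumes that the parent dataflow holds one of the capped slots, that a child sub-data-flow cannot start until some other running process finishes and frees a slot, and that the parent gives up if the child has not registered within the timeout. The function names and the remaining-runtime figures are hypothetical.

```python
def child_wait_seconds(other_remaining, cap):
    """Seconds a child sub-data-flow waits for a free engine slot.

    other_remaining: remaining runtimes (seconds) of the other processes
                     currently holding slots (parent dataflow excluded).
    cap:             maximum number of concurrent engine processes.
    Assumption: the parent holds one slot for the whole wait.
    """
    free_for_others = cap - 1  # one slot is taken by the parent
    if free_for_others < 1:
        return float("inf")  # the parent holds the only slot; the child never starts
    if len(other_remaining) < free_for_others:
        return 0  # a slot is already free
    # The child starts once enough other processes have finished
    # to bring the count below the number of slots left for them.
    ordered = sorted(other_remaining)
    return ordered[len(ordered) - free_for_others]


def registration_times_out(other_remaining, cap, timeout=30):
    """True if the child would register later than the parent's timeout."""
    return child_wait_seconds(other_remaining, cap) > timeout
```

With a cap of 4 and dataflows C, D, and E having 45, 120, and 300 seconds left, the child waits 45 seconds in this model, which is longer than a 30-second timeout but well inside a 300-second one.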

dnewton (BOB member since 2004-01-30)

system · December 28, 2007, 2:08pm

I don’t think this problem is specific to data_transfer - it can occur when using other features that create sub-data flows, and also when distribution level is set to ‘dataflow’. Our job servers are configured to allow 16 concurrent job engine processes and, running a job standalone in the dev. environment, there is no way I am running that number of processes.

dastocks (BOB member since 2006-12-11)

system · December 28, 2007, 2:55pm

There’s a DSCONFIG option to increase how long the parent will wait for its children to register. We’ve upped it from 30 to 300 seconds; maybe that will make a difference.

dnewton (BOB member since 2004-01-30)

system · January 2, 2008, 2:47pm

Curious, did this help? We get that error very infrequently but have always been able to handle it in the past by running that job at a time of lower server volume.

Ernie

eepjr24 (BOB member since 2005-09-16)

system · January 2, 2008, 2:48pm

It seems to have helped, but the problem is sporadic enough that I’d probably need another week or two of sampling to be sure.

dnewton (BOB member since 2004-01-30)

system · July 10, 2008, 1:54pm

Hi: We are having the same problem you had with the data transfers. A few months have passed since you changed your DSConfig file, so did raising the timeout to 300 seconds help? Did it affect anything else? We are ready to change ours to 100 seconds but want to make sure it is not going to affect anything else in DI.

mlepak (BOB member since 2007-05-02)

system · July 10, 2008, 3:07pm

Yes, this seems to have permanently fixed it.

dnewton (BOB member since 2004-01-30)

system · July 31, 2008, 5:06pm

I got this problem as well. It just started happening out of the blue, even though I’ve run the same DF before.

The DSCONFIG parameter didn’t help, but I broke up the job, restarting at the DF that failed, and it ran OK.

George (BOB member since 2003-06-27)

system · October 15, 2008, 3:45pm

Hi,

I am having the same problem. Can you please tell me which parameter in the DSCONFIG I need to change?

Thanks
Nalini

nalinikumar (BOB member since 2008-10-10)

system · October 16, 2008, 12:51am

I can’t remember offhand, but look for the word “timeout”; it will currently be set to 300.
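
If searching by eye is tedious, a small script can list every setting whose name contains “timeout”. This is only a sketch: it assumes the config file is INI-style text with `[section]` headers and `key=value` lines, and the function name is made up for illustration. It reads the text and reports matches; it does not change anything.

```python
def find_timeout_settings(config_text, needle="timeout"):
    """Return (line_number, section, key, value) for keys containing needle.

    Assumes INI-style text: "[section]" headers and "key=value" lines.
    Matching on the key name is case-insensitive.
    """
    hits = []
    section = None
    for number, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith((";", "#")):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # remember which section we are in
            continue
        key, sep, value = line.partition("=")
        if sep and needle.lower() in key.lower():
            hits.append((number, section, key.strip(), value.strip()))
    return hits
```

To use it, read the config file into a string, pass it to `find_timeout_settings`, and print the returned tuples; take a backup of the file before editing any value by hand.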

George (BOB member since 2003-06-27)

system · October 10, 2012, 4:09pm

Very old thread, but for those that are still seeing timeout issues…

Here is the error I see in the log:

It isn’t clear, but I don’t think this is actually a database issue even though the first line indicates that. The second line indicates there is a child process that it has lost communication with. I think this is the issue, but I could be wrong since the Dataflow normally executes in less than 30 seconds. The child process issue is (in my experience) often seen when you have two independent processes in a Dataflow (two Query transforms feeding into a common third Query transform). The Dataflow is waiting on one of them and somehow it goes off into La-La land.

I could find nothing in errorlog.txt that correlates to this job and there are no dump files (DSConfig.txt has the setting TURN_DUMP_ON=BOTH so dump files should get created I believe).

The property that is referenced earlier in this thread is DFRegistrationTimeoutInSeconds. Mine is currently set to 300 and I’m still seeing timeout issues. For the Dataflow in the error 300 should be more than enough but I’m going to increase it to 600 just for giggles.

This is on DS 12.1.1.4 using an Oracle Source/Target with a SQL Server repository. I can stay logged in to my SQL Server database for days without executing a query and then execute one and get no timeout error. So I don’t think the SQL Server database is the issue.

eganjp (BOB member since 2007-09-12)