BusinessObjects Board

FIL-080134: dataflow Named pipe error occurred

Hello,

We are running BODS XI 3.2 SP3 (12.2.3.0) on a Red Hat Linux 5.4 server.
One of our jobs is failing with the following error:

$ cat error_01_18_2013_12_36_49_481__d45052db_e588_4993_819c_dd9873720a5e.txt
FIL-080134: |Dataflow DF_xxxx|Pipe Listener for DF_xxxx_1 Named pipe error occurred: <EOF>

In the BODS Job Server trace file server_eventlogXXX.txt, we can indeed see the error:

"Starting data flow or sub data flow with command line .... -KdcSDF_xxxx_1 ....
JobServer:  StartJob : Data flow or sub data flow 'xxxxxxx' with pid 'xxxxx' is kicked off (BODI-850048)
StopJob : Data flow or sub data flow with pid <x> is killed. (BODI-850054)”

I have attached stack_trace.txt to this post.

Do you have any idea what the problem could be?

(Important information: the same job works correctly on another system.)
stack_trace.txt (54.0 KB)


dida (BOB member since 2013-01-23)

I can’t speak to your specific situation, and certainly not to whether it is a Linux-specific problem.

In general, the named pipe error comes up because a Dataflow is using multiple processes and one process has lost contact with the others. This could be a process that sorts data in the job server’s memory or performs a GROUP BY. There are a number of ways to spawn a separate process within a Dataflow.

For some reason the separate process has either died or stopped responding. Sometimes it’s just a timeout because something was too busy to respond.
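
For intuition, here is a minimal Linux sketch of the mechanism, in plain shell and nothing BODS-specific: when the process on the writing end of a named pipe exits, the reader simply sees end-of-file, which is what the Pipe Listener reports as <EOF>.

$ mkfifo /tmp/demo_pipe     # create a named pipe (FIFO)
$ cat /tmp/demo_pipe &      # the "listener" blocks, reading from the pipe
$ : > /tmp/demo_pipe        # the "producer" opens the pipe, then exits without writing
$ wait                      # the listener immediately receives EOF and terminates

In a Dataflow the producer is the spawned sub data flow process; if it dies or is killed, the pipe listener in the parent process gets exactly this EOF.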

When I run into these issues I have two ways to deal with it:

  1. Turn off ALL separate processes. Oftentimes this was enabled by someone who didn’t quite know what they were doing. There is a developer at one of my client sites who just loves turning on all these separate processes, thinking it makes his Dataflows run faster. It doesn’t.
  2. Improve the overall execution speed of the Dataflow. When processes take too long, they seem to be more likely to get lost.

eganjp :us: (BOB member since 2007-09-12)

Thank you for the quick answer, eganjp.

There is no “separate process” option set in this job, but there is a Data Transfer transform (used to push down the ORDER BY operation; Data Transfer number of loaders = 1), and that is most likely what is using the pipe.

And indeed, the difference between our two systems is that, even though the same job runs on both, the volume of data processed is larger on the system where the error occurs.

So a timeout could indeed be the root cause.
I could investigate shortening the duration of this dataflow.
And/or, is there any timeout parameter I could increase on the BODS server side?


dida (BOB member since 2013-01-23)

Do you have any function calls or stored procedures running?

Arun


Arun.K :us: (BOB member since 2011-10-18)

Yes, there is a timeout that can be changed. It is in the DSConfig.txt file found in the bin directory. I don’t have the exact parameter name handy and I’m off to a meeting so I can’t look it up. I’ll check back later…


eganjp :us: (BOB member since 2007-09-12)

Jim, could you please check? I looked in DSConfig.txt but could not find it…

We also have a simple DF with a table comparison transform that crashes after a few seconds. Nothing fancy, 200 MB memory usage. Once in a while it crashes…


Johannes Vink :netherlands: (BOB member since 2012-03-20)

Some additional information.
At the time of the dataflow crash, this Linux system message always appears (in /var/log/messages):

kernel: al_engine[XXXXX] ... trap invalid opcode rip:12d5f82 rsp:YYYYYYY error:0
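
These entries can be matched against the job’s run times with, for example:

$ grep al_engine /var/log/messages    # list each al_engine kernel trap with its timestamp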

Could this help in finding the root cause?


dida (BOB member since 2013-01-23)

For unrelated reasons, the source and target databases were scratched.
The source database was then re-populated,
and the same BODS jobs were executed.

We can no longer reproduce the problem…
…fingers crossed it will not appear again!

In any case, thank you for the information shared in this topic.


dida (BOB member since 2013-01-23)

For some reason, this feels like a data issue to me!

So in your Data Transfer, is the transfer type a Table or a File?


ganeshxp :us: (BOB member since 2008-07-17)

The same data was pushed into the source datastore!
I rather suspect a temporary environment issue: Oracle, OS, …

ganeshxp, the transfer type in my Data Transfer is Table.


dida (BOB member since 2013-01-23)

Dear experts,
Today we got the same error, as below. Do you know the reason for this error, and is there a permanent fix? We reran the job and the same error was thrown.

Job name: Job_XXXXX_XX_to_XX_MoXXXXXXXnt
(12.2) 07-17-14 18:42:25 (E) (5180:5500) Faa-0134: |Dataflow DF_10XXXXX_XX_to_XX_MoXXXXXXXnt|Pipe Listener for DF_10XXXXX_XX_to_XX_MoXXXXXXXnt_1
Named pipe error occurred:

(12.2) 07-17-14 18:42:25 (E) (5180:6628) Sxxx-215: |Dataflow DF_110XXXXX_XX_to_XX_MoXXXXXXXntFinal
Data flow <DF_10XXXXX_XX_to_XX_MoXXXXXXXnt_1_1> with pid failed to stop. Diagnostic information
<1-4-021-1001-Error from Job server <Unable to kill data flow or sub data flow with pid .
The parameter is incorrect.


Chilukuri.Venkat :india: (BOB member since 2007-09-15)

See post #2 of this thread.

The DSConfig.txt settings to look for are:
DFRegistrationTimeoutInSeconds=300
NamedPipeWaitTime=100

You can TRY increasing these values, but I think you are better served by redesigning the Dataflow. You may also want to take a look at this thread, as it might be related to your issue: Named pipe error occurred in push-down.
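
For example, the edit in <LINK_DIR>/bin/DSConfig.txt might look like this (a sketch only: the values below simply double the defaults, and I am quoting the [AL_Engine] section name from memory, so verify it against your own file and back the file up before editing):

[AL_Engine]
DFRegistrationTimeoutInSeconds=600
NamedPipeWaitTime=200

The Job Server typically needs to be restarted before changes to DSConfig.txt take effect.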


eganjp :us: (BOB member since 2007-09-12)

Hi all,

By changing the DF cache type property from ‘In-Memory’ to ‘Pageable’ on the production job server, we were able to run this job successfully. (Pageable cache spills intermediate data to disk instead of holding it all in the job server’s memory, which presumably relieves the pressure that was killing the sub data flow.)


Chilukuri.Venkat :india: (BOB member since 2007-09-15)