How do I set a parameter or variable inside a dataflow?
Scenario: after validation, if there is an error I need to raise an exception and set this parameter (a flag) to 1, so that later, if the flag is found to be 1, I roll back all the data in the tables, if any has been loaded.
I have taken care of the rollback process: it is a conditional workflow which executes only if this flag is 1.
So I would like to know: is there any way I can, inside a dataflow, set this parameter/variable to 1?
I already use a dummy table with one varchar(1000) column to make use of functions like raise_exception or email functions inside the dataflow, but I don't know how to set this parameter.
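The dummy-column trick mentioned above can look roughly like this as a column mapping in a query transform (a sketch only; the column name and message text are invented):

```
ifthenelse(VALIDATION.ERR_COUNT > 0,
           raise_exception('Post-validation check failed'),
           'OK')
```

If the condition is true, raise_exception stops the dataflow with the given message; otherwise the mapping just returns a harmless value into the dummy column.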
What about using the auditing feature? It is built for exactly that…
You let all error rows go down a specific path, e.g. via the Validation transform or a Case transform; then you add a counter for that path and define the audit action.
Inside a dataflow yes, but the value is not changed in the parent workflow.
You write a custom script function that has an input/output parameter, e.g. you pass the variable $counter into the function; inside the function it is incremented by 1, and hence the variable $counter of the dataflow is increased by one.
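A minimal sketch of such a function in Data Services script (the function and parameter names here are hypothetical; $p_counter must be declared as an input/output parameter of datatype int):

```
# Custom function f_increment
# $p_counter: datatype int, type input/output
$p_counter = $p_counter + 1;
Return $p_counter;
```

Calling f_increment($counter) inside the dataflow then increments the dataflow's own copy of $counter.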
I created a dataflow with an output parameter ($df_job_nr). The dataflow generates a new number which I want to capture in this parameter. I do this by calling a custom function (EXP_assign_int) in a query transform in this dataflow.
If I am interpreting your need right, you may want to use a global variable. I use these all the time to hold values for later in the job. The assigned value persists until changed or until the job ends.
A dataflow exists which has an output parameter $df_par. The idea is to assign a value to it that is available on the ‘outside’.
The dataflow has three query transforms that are linked to one another: EXP_0 --> EXP_1 --> EXP_2.
In the dataflow, a value it retrieves is assigned to this parameter in query transform EXP_1, using a custom function EXP_assign_int with the following mapping:
EXP_assign_int( $df_par, EXP_0.PAR )
This should assign the value from column EXP_0.PAR to both the output column of EXP_1 and the parameter $df_par.
Custom function EXP_assign_int has two parameters and a return value:
$Parameter_out, datatype int, type output
$Parameter_in, datatype int, type input
Return, datatype int
and contains the following code:
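The snippet did not survive in this post; based on the parameter list above, the body is presumably something like this (a sketch):

```
# Custom function EXP_assign_int:
# copy the input value to the output parameter,
# print it for debugging, and return it as well
$Parameter_out = $Parameter_in;
print('EXP_assign_int assigned: ' || $Parameter_in);
Return $Parameter_in;
```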
The batch job, which this dataflow is part of, has a global variable $bj_var. This is also assigned the same value in EXP_1:
EXP_assign_int( $bj_var, EXP_0.PAR )
The print in the custom function code demonstrates that the correct value is assigned. A subsequent query transform EXP_2 in the dataflow has two print statements to show the contents of $df_par and $bj_var. However, both appear to remain empty. The question is: why? What’s the use of a dataflow output parameter, if it can’t be assigned a value?
BTW. The Return value from the custom function contains the assigned value…
As I described above, that also remains empty (unassigned). My question remains, what’s the use of an output call parameter on a dataflow, if it cannot be assigned?
You have two questions there. Why does DI permit you to define a parameter of type output for a DF if it doesn't support it? I do not know.
As for the second one: yes, the GV should be set; I have just tried it here and it works fine. When you say you have a second query with a print statement, do you mean a script?
No, I have a subsequent query transform with some dummy columns that contain print statements in the mapping. These prove that the global variable is not set, but the EXP_assign_int function does return a value, because it is mapped to a column whose value is passed to the print statement.
I’m trying to assign a global variable in a custom function that is called in a query transform within the dataflow. I have the same issue: within the function the global variable is assigned and I can print it, but when I print it in the workflow to which the dataflow belongs, it is empty.
Also, what is interesting is that when I validate the function syntax I get a warning: “[Function:f_test] The function <f_test> contains an invalid expression. Additional information: <Referencing undeclared variable <$V1>. Please declare the variable in the context it is being used in”. But at the same time global variable is visible in the Parameters tab.
As far as I understand now, the issue is the scope in which a variable is used. The term “global variable” is somewhat misleading, because it behaves differently from global variables in many programming languages. Since a dataflow runs as a separate process, it gets a copy of the global variable. That copy can be set within the dataflow, but setting it does not change the global variable itself, which therefore remains unchanged.
A dataflow is a process of its own and variables are passed in by value, not by reference.
So when you pass the variable $X with the value 32 into the dataflow, a new memory structure is allocated inside the dataflow and the value 32 is written into it. Now you modify that value, and hence that memory structure. But that has no effect on the memory structure, and hence the value, in the workflow: there it is still 32.
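Expressed as the scripts around the dataflow in the parent workflow (a sketch; DF_modify is a hypothetical dataflow that takes $X as a parameter and overwrites it internally):

```
# Script before the dataflow, in the parent workflow:
$X = 32;

# ... DF_modify runs here as a separate process and
# receives a copy of $X; whatever it writes into that
# copy stays inside the dataflow process ...

# Script after the dataflow, in the parent workflow:
print('X is still: ' || $X);    # shows 32, unchanged
```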
Overall it might not have been the wisest decision to implement it this way in DI; on the other hand, you could otherwise run into a lot of trouble: two threads changing the same variable at the same time, parallel processing in general, the multi-process architecture, grid computing, …
In my dataflow, I have a Case transform that checks for duplicate rows in the source stream (a group-by query with record count > 1). I branch off and write these records to a duplicates file.
It seems that if I want header rows in my file, they are written even if no duplicate records end up in the file. In a script after the dataflow, I check the size of the duplicates file and send an email notification if the size > 0. However, runs that write only the header row still show about 1 KB, which is not definitive enough. Essentially we only want notifications if duplicates exist, so I considered setting a variable in the duplicate branch of the dataflow if any records come through the Case branch.
Is it possible to set an OUT parameter to this value so that the script, which I assume runs in the same context as the containing workflow, can pick it up?