How do I set a parameter or variable inside a dataflow?
Scenario: after validation, if there is an error I need to raise an exception and set this parameter (a flag) to 1, so that later, if the flag is found to be 1, I roll back all the data in the tables, if any has been loaded.
I have taken care of the rollback process: it is a conditional workflow which executes only if this flag is 1.
So I would like to know: is there any way I can, inside a dataflow, set this parameter/variable to 1?
I already use a dummy table with one varchar(1000) column to make use of functions like raise_exception or email functions inside the dataflow, but I don't know how to set this parameter.
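The dummy-column trick mentioned above can look roughly like this as a column mapping in a query transform (a sketch only; the column name and message text are invented):

```
ifthenelse(VALIDATION.ERR_COUNT > 0,
           raise_exception('Post-validation check failed'),
           'OK')
```

If the condition is true, raise_exception stops the dataflow with the given message; otherwise the mapping just returns a harmless value into the dummy column.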
What about using the auditing feature? It is built for exactly that…
You let all error rows go down a specific path, e.g. via the Validation transform or a Case transform; then you add a counter for that path and define the audit action.
Inside a dataflow yes, but the value is not changed in the parent workflow.
You write a custom script function that has an input/output parameter, e.g. you pass the variable $counter into the function; inside the function it is incremented by 1, and hence the variable $counter of the dataflow is increased by one.
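A minimal sketch of such a function in Data Services script (the function and parameter names here are hypothetical; $p_counter must be declared as an input/output parameter of datatype int):

```
# Custom function f_increment
# $p_counter: datatype int, type input/output
$p_counter = $p_counter + 1;
Return $p_counter;
```

Calling f_increment($counter) inside the dataflow then increments the dataflow's own copy of $counter.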
I created a dataflow with an output parameter ($df_job_nr). The dataflow generates a new number which I want to capture in this parameter. I do this by calling a custom function (EXP_assign_int) in a query transform in this dataflow.
If I am interpreting your need right, you may want to use a global variable. I use these all the time to hold values for later in the job. The assigned value persists until changed or until the job ends.
A dataflow exists which has an output parameter $df_par. The idea is to assign a value to it that is available on the ‘outside’.
The dataflow has three query transforms that are linked to one another: EXP_0 --> EXP_1 --> EXP_2.
In the dataflow, a value it retrieves is assigned to this parameter in query transform EXP_1, using a custom function EXP_assign_int with the following mapping:
EXP_assign_int( $df_par, EXP_0.PAR )
This should assign the value from column EXP_0.PAR to both the output column of EXP_1 and the parameter $df_par.
Custom function EXP_assign_int has two parameters and a return value:
$Parameter_out, datatype int, type output
$Parameter_in, datatype int, type input
Return, datatype int
and contains the following code:
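The snippet did not survive in this post; based on the parameter list above, the body is presumably something like this (a sketch):

```
# Custom function EXP_assign_int:
# copy the input value to the output parameter,
# print it for debugging, and return it as well
$Parameter_out = $Parameter_in;
print('EXP_assign_int assigned: ' || $Parameter_in);
Return $Parameter_in;
```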
The batch job, which this dataflow is part of, has a global variable $bj_var. This is also assigned the same value in EXP_1:
EXP_assign_int( $bj_var, EXP_0.PAR )
The print in the custom function code demonstrates that the correct value is assigned. A subsequent query transform EXP_2 in the dataflow has two print statements to show the contents of $df_par and $bj_var. However, both appear to remain empty. The question is: why? What’s the use of a dataflow output parameter, if it can’t be assigned a value?
BTW. The Return value from the custom function contains the assigned value…
As I described above, that also remains empty (unassigned). My question remains, what’s the use of an output call parameter on a dataflow, if it cannot be assigned?
You have two questions there. Why does DI permit you to define a parameter of type output for a DF if it doesn't support it? I do not know.
As for the second one: yes, the GV should be set; I have just tried it here and it works fine. When you say you have a second query with a print statement, do you mean a script?
No, I have a subsequent query transform with some dummy columns that contain print statements in the mapping. These prove that the global variable is not set, but the EXP_assign_int function does return a value, because it is mapped to a column whose value is passed to the print statement.
I’m trying to assign a global variable in a custom function that is called in a query transform within the dataflow. I have the same issue: within the function the global variable is assigned and I can print it, but when I print it in the workflow to which the dataflow belongs, it is empty.
Also, what is interesting is that when I validate the function syntax I get a warning: “[Function:f_test] The function <f_test> contains an invalid expression. Additional information: <Referencing undeclared variable <$V1>. Please declare the variable in the context it is being used in”. But at the same time global variable is visible in the Parameters tab.
As far as I understand now, the issue is the scope in which a variable is used. The term “global variable” is somewhat misleading, because it behaves differently from global variables in many programming languages. Since a dataflow runs as a separate process, it gets a copy of the global variable. That copy can be set within the dataflow, but setting it does not change the global variable itself, which therefore remains unchanged.
A dataflow is a process of its own and variables are passed in by value, not by reference.
So when you pass the variable $X with the value 32 into the dataflow, a new memory structure is allocated inside the dataflow and the value 32 is written into it. Now you modify that value, and hence that memory structure. But that has no effect on the memory structure, and hence the value, in the workflow: there it is still 32.
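Expressed as the scripts around the dataflow in the parent workflow (a sketch; DF_modify is a hypothetical dataflow that takes $X as a parameter and overwrites it internally):

```
# Script before the dataflow, in the parent workflow:
$X = 32;

# ... DF_modify runs here as a separate process and
# receives a copy of $X; whatever it writes into that
# copy stays inside the dataflow process ...

# Script after the dataflow, in the parent workflow:
print('X is still: ' || $X);    # shows 32, unchanged
```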
Overall it might not have been the wisest decision to implement it this way in DI; on the other hand, you could otherwise run into a lot of trouble: two threads changing the same variable at the same time, parallel processing in general, the multi-process architecture, grid computing, …
In my dataflow, I have a Case transform that checks for duplicate rows in the source stream (a group-by query with record count > 1). I branch off and write these records to a duplicates file.
It seems that if I want header rows in my file, they are written even if no duplicate records end up in the file. In a script after the dataflow, I check the size of the duplicates file and send an email notification if the size > 0. However, runs that write only the header row still show about 1 KB, which is not definitive enough. Essentially we only want notifications if duplicates exist, so I considered setting a variable in the duplicate branch of the dataflow if any records come through the Case branch.
Is it possible to set an OUT parameter to this value so that the script, which I assume runs in the same context as the containing workflow, can pick it up?