Execute ABAP dataflows in parallel

My job requires data from 10 ECC tables, which need to be joined. Although it is possible to create one ABAP dataflow that performs the whole join at once, that turns out to be a performance disaster.

It is also possible to create an ABAP dataflow for each table and place all of these ABAP dataflows in a single dataflow that performs the joins. The drawback is that a lot of data is retrieved from ECC and a lot of caching is needed, which also slows down performance. Also, only one ABAP dataflow is executed at a time, so there is no parallelism.

However, when I create 10 dataflows, each containing one ABAP dataflow, store everything in temporary tables and then join the data from these tables, multiple dataflows and ECC jobs are submitted in parallel. I find that this approach creates a messy job with all these dataflows. Is it possible to execute multiple ABAP dataflows in parallel when they are within one dataflow?


lamanp :netherlands: (BOB member since 2008-09-02)

The first step would be to optimize the ABAP before doing anything else. Why is it a performance disaster? Can you show the main code areas of the ABAP and come back with row counts for each table?

I am assuming it is a 10-way join in the ABAP and that this is causing the issues. But if two tables are large and the others are small, we could set the small ones to “cache” and force them to be cached in an ABAP in-memory table. That way the underlying SAP database has less to do, and it might do the trick.
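As a rough illustration of the caching idea above, here is a minimal Python sketch. It is not ABAP and not what Data Services generates; the function name and the row layout (a key followed by one value) are made up for the example. The small table is loaded into memory once and the large table is streamed past it, so only the large table needs to be read row by row.

```python
def join_with_cached_lookup(large_rows, small_rows):
    """Join a large row stream against a small table held in memory.

    Both inputs are iterables of (key, value) tuples; this layout is
    an assumption made purely for illustration.
    """
    # Load the small table into an in-memory lookup once.
    cache = {key: value for key, value in small_rows}
    # Stream the large table and probe the cache for each row.
    for key, payload in large_rows:
        if key in cache:
            yield key, payload, cache[key]


large = [(1, "x"), (2, "y"), (3, "z")]
small = [(1, "one"), (3, "three")]
print(list(join_with_cached_lookup(large, small)))
```

This only pays off when the cached tables really are small; with ten large tables the lookup itself would no longer fit comfortably in memory.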


Werner Daehn :de: (BOB member since 2004-12-17)

I attached an export of the ABAP dataflow, its generated ABAP as well as a screen shot of it. The tables that are used have the following row counts:
EANL: 2376906
EASTL: 5509353
EASTS: 6707698
EGERH: 7585500
EGERS: 7591657
ETDZ: 8694442
EUIINSTLN: 5645630
EUITRANS: 5606113
EXTRACT: 2396317
EZUZ: 2314996
TE345: 59

So in fact all tables but the last are rather large. I did not change anything in the ABAP (that’s not my cup of tea (-: ). The dataflow ran for 4 hours and 20 minutes. When I build 10 dataflows, each with one ABAP dataflow in it that retrieves data from one ECC table (and just filters the data), write the results to 10 Oracle tables and join these, it takes 16 minutes…

This is built using BODS 3.2 (12.2.3)
ABAP_Join_All.png
ADF_C4G_Extract_ECC_ABAP.zip (8.0 KB)
ABAP_Join_All.txt (24.0 KB)


lamanp :netherlands: (BOB member since 2008-09-02)

Okay, you win. There is no ABAP tweaking we can do to speed that up.

You could build one dataflow with 10 ABAP dataflows (or fewer; some of them might already join two or more tables) and put a Data_Transfer behind each?


Werner Daehn :de: (BOB member since 2004-12-17)

Werner,

I tried that, but the ABAP dataflows are not executed in parallel. So now I have 10 dataflows with one ABAP dataflow each that extract the required data into 10 Oracle tables, and I do the join there. The run time drops from 4 hours and 20 minutes to 16 minutes.

To be honest, I find this quite an ugly solution. Too many bits and pieces, while the original solution had only 1 dataflow, containing 1 ABAP dataflow.
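The shape of that workaround, sketched in plain Python rather than Data Services: each single-table extract runs in parallel, each result is staged in its own table, and the join happens in the staging database. The sample rows, the `stg_` prefix, the column names `k` and `v`, and the in-memory SQLite database standing in for Oracle are all assumptions made for the example; only the table names EANL and EASTL come from the job.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data standing in for two ECC tables.
SOURCE = {
    "EANL": [(1, "A"), (2, "B"), (3, "C")],
    "EASTL": [(1, 100), (2, 200), (4, 400)],
}


def extract(table):
    """Simulate one single-table extract: filter only, no join."""
    rows = [row for row in SOURCE[table] if row[0] is not None]
    return table, rows


def run():
    db = sqlite3.connect(":memory:")  # stand-in for the staging database
    # Submit all extracts at once so they run in parallel.
    with ThreadPoolExecutor(max_workers=len(SOURCE)) as pool:
        results = list(pool.map(extract, SOURCE))
    # Stage each result in its own table (done in the main thread).
    for table, rows in results:
        db.execute(f"CREATE TABLE stg_{table} (k INTEGER, v)")
        db.executemany(f"INSERT INTO stg_{table} VALUES (?, ?)", rows)
    # Join the staged tables inside the database.
    return db.execute(
        "SELECT a.k, a.v, b.v FROM stg_EANL a "
        "JOIN stg_EASTL b ON a.k = b.k ORDER BY a.k"
    ).fetchall()


print(run())
```

The extracts do not depend on each other, which is what allows them to be submitted together; the join only starts once every staging table is filled.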


lamanp :netherlands: (BOB member since 2008-09-02)