Performance tips for unified dataflows for initial & del

We recently had to re-initialise some of our extractors during testing of ERP support packs. One of our extractors was taking a couple of days to initialise (see old flow).

The way it works is using an if statement to replace the operation type (X) that is supplied from an initial load with an (I) so the map_CDC will still work.
The statement it uses is:

ifthenelse(“0MATERIAL_TEXT_SAP_DW”.DI_OPERATION_TYPE = ‘X’, ‘I’ , “0MATERIAL_TEXT_SAP_DW”.DI_OPERATION_TYPE)

I re-wrote this using a case transform to redirect inital load data so it avoids the map_cdc all together (see new flow). This now runs an initial load in about 20 minutes. Big improvement !

Note that in other cases (0MATERIAL_ATTR for example) I get almost no improvement from this.

I’m interested in what other people do here. Are there any other techniques to optimize a single flow for both initial and delta loads ?

Thanks,
Leigh.
0MATERIAL_NEW.jpg
0MATERIAL_OLD.jpg


Leigh Kennedy :australia: (BOB member since 2012-01-17)

We handle delta/full loads in 2 approaches:

  • Use a seperate job with the same objects, only change lower and upper date. This can be used when the performance of a full load is okay
  • Use a seperate job with specific designed objects for a full load. I.e. a delta on KNVV via CDHDR is fine for a daily delta. But for a full load transferring the full table is faster. BSEG the same, selecting on fiscal year is for a full load much faster as the index is used.

I am not too happy with coding a full/delta in one DF. It confuses code and for a full load sometimes you want to do more than a simple decode() statement to differentiate between full/delta.


Johannes Vink :netherlands: (BOB member since 2012-03-20)