We’re experiencing some memory problems running Data Integrator jobs. The target database is a Sybase database. BODI version is 11.7 (we’ve also tried with 12.1).
In our jobstream there is a table comparison (TC) where 20,000 rows are being compared against a target table of 2 million rows for insertion. This operation seems to require more than 8 GB of RAM (the total amount on the job server).
The table comparison is using the sorted input option and there are no duplicates. The BODI documentation states that TCs using sorted input should not require any memory on the job server. Could the problems we're experiencing be caused by the target database being Sybase? It almost seems as if BODI is unable to push down the TC operation to the underlying database.
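To illustrate what I mean by pushing the comparison down: ideally the database itself would work out the rows to insert in one set-based pass instead of the engine pulling rows back. A rough sketch only, with invented table and column names (staged_input, target_table, key_col, attr_col), not anything BODI actually generates:

    -- anti-join: rows present in the staged input but not yet in the target
    SELECT s.key_col, s.attr_col
    FROM   staged_input s
    LEFT OUTER JOIN target_table t ON t.key_col = s.key_col
    WHERE  t.key_col IS NULL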
I forgot to mention that we’ve already tried that (the row-by-row option); it doubles the time required for the stream to finish. Since the whole job is quite big, we cannot use this option without exceeding our time constraints.
The documentation states that neither row-by-row nor sorted input should require any memory (a rough SQL sketch of the difference follows the excerpt):
Select a method for accessing the comparison table:
Row-by-row select: Select this option to have the transform look up the target table using SQL every time it receives an input row. This option is best if the target table is large compared to the number of rows the transform will receive as input.
Cached comparison table: Select this option to load the comparison table into memory. In this case, queries to the comparison table access memory rather than the actual table. This option is best when you are comparing the entire target table. Data Integrator uses pageable cache as the default. If the table fits in the available memory, you can change the cache type to in-memory in the data flow Properties.
Sorted input: Select this option to read the comparison table in the order of the primary key column(s) using sequential read. This option improves performance because Data Integrator reads the comparison table only once.
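As I read the excerpt, the practical difference is in how the comparison table is accessed. Roughly, and in illustrative SQL only (made-up table and column names, not the exact statements DI issues):

    -- Row-by-row select: one parameterized lookup per input row
    SELECT key_col, compare_col
    FROM   target_table
    WHERE  key_col = ?

    -- Cached comparison table / sorted input: a single full read of the
    -- comparison table; sorted input reads it in primary-key order so the
    -- transform can merge it against the sorted input stream
    SELECT key_col, compare_col
    FROM   target_table
    ORDER BY key_col

That is why I expected sorted input to stream the comparison table rather than cache it.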
So with the Cached option the table is read into memory, so how could DI push this operation down? Whenever something is cached, it's all in DI's memory!
Maybe you can build indexes on the key columns of the target table?
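If you do fall back to row-by-row, something along these lines on the Sybase side might help; index, table, and column names here are just placeholders:

    -- a unique index on the comparison key speeds up the per-row lookups
    -- that the row-by-row option issues against the target table
    CREATE UNIQUE INDEX ix_target_key ON target_table (key_col)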
Werner, as far as we can determine it is the TC that is causing the memory caching. The setup is one source table (it's a view from Oracle), a query, the TC, followed by a key generation and a target table. The flag for duplicates is not turned on. We have used both sorted input (which causes caching) and row-by-row (which takes too long to finish). Target and source table are both 2 million rows, and the select statement selects 10 days' worth of data (about 2000 rows).
I presume an in-engine join would require more than one source table?