We’re experiencing some memory problems running Data Integrator jobs. The target database is a Sybase database. BODI version is 11.7 (we’ve also tried with 12.1).
In our jobstream there is a table comparison (TC) where 20,000 rows are being compared against a target table of 2 million rows for insertion. This operation seems to require more than 8 GB of RAM (the total amount on the job server).
The table comparison is using the sorted input option and there are no duplicates. The BODI documentation states that TCs using sorted input should not require any memory on the job server. Could the problems we're experiencing be caused by the target database being Sybase? It almost seems as if BODI is unable to push down the TC operation to the underlying database.
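To illustrate what I mean by pushing the comparison down: ideally the database itself would work out the rows to insert in one set-based pass instead of the engine pulling rows back. A rough sketch only, with invented table and column names (staged_input, target_table, key_col, attr_col), not anything BODI actually generates:

    -- anti-join: rows present in the staged input but not yet in the target
    SELECT s.key_col, s.attr_col
    FROM   staged_input s
    LEFT OUTER JOIN target_table t ON t.key_col = s.key_col
    WHERE  t.key_col IS NULL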
I forgot to mention that we’ve already tried that (the row-by-row option); it doubles the time required for the stream to finish. Since the whole job is quite big, we cannot use this option without exceeding our time constraints.
The documentation states that neither row-by-row nor sorted input should require any memory (a rough SQL sketch of the difference follows the excerpt):
Select a method for accessing the comparison table:
Row-by-row select: Select this option to have the transform look up the target table using SQL every time it receives an input row. This option is best if the target table is large compared to the number of rows the transform will receive as input.
Cached comparison table: Select this option to load the comparison table into memory. In this case, queries to the comparison table access memory rather than the actual table. This option is best when you are comparing the entire target table. Data Integrator uses pageable cache as the default. If the table fits in the available memory, you can change the cache type to in-memory in the data flow Properties.
Sorted input: Select this option to read the comparison table in the order of the primary key column(s) using sequential read. This option improves performance because Data Integrator reads the comparison table only once.
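As I read the excerpt, the practical difference is in how the comparison table is accessed. Roughly, and in illustrative SQL only (made-up table and column names, not the exact statements DI issues):

    -- Row-by-row select: one parameterized lookup per input row
    SELECT key_col, compare_col
    FROM   target_table
    WHERE  key_col = ?

    -- Cached comparison table / sorted input: a single full read of the
    -- comparison table; sorted input reads it in primary-key order so the
    -- transform can merge it against the sorted input stream
    SELECT key_col, compare_col
    FROM   target_table
    ORDER BY key_col

That is why I expected sorted input to stream the comparison table rather than cache it.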
So with the Cached option the table is read into memory, so how could DI push this operation down? Whenever something is cached, it's all in DI's memory!
Maybe you can build indexes on the key columns of the target table?
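If you do fall back to row-by-row, something along these lines on the Sybase side might help; index, table, and column names here are just placeholders:

    -- a unique index on the comparison key speeds up the per-row lookups
    -- that the row-by-row option issues against the target table
    CREATE UNIQUE INDEX ix_target_key ON target_table (key_col)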
Werner, as far as we can determine it is the TC that is causing the memory caching. The setup is one source table (it's a view from Oracle), a query, the TC, followed by a key generation and a target table. The flag for duplicates is not turned on. We have used both sorted input (which causes caching) and row-by-row (which takes too long to finish). Target and source table are both 2 million rows, and the select statement selects 10 days' worth of data (about 2000 rows).
I presume an in-engine join would require more than one source table?