How many lookups are too many?

Hi All,

I have a table, 14 columns, mostly decimal(28,0) with a few small varchar columns. I have 14 lookups in the ETL code, and when I run it on our job server it runs VERY, VERY slowly. Other ETL scripts that use this table without as many lookups run like a champ.

However, when I run it on my local machine, it runs 100 times faster.
Any ideas? Our job server is pretty robust.


fccc_mike :us: (BOB member since 2008-04-03)

lookup or lookup_ext?
As a mapping or as a function call?
What is the cache setting for each? 10 cached, 4 row-by-row,…


Werner Daehn :de: (BOB member since 2004-12-17)

All of them are lookup_ext with pre-load cache. None of them run as a separate process, and I don't have any Case or Table_Comparison transforms.

version 12.1.0.0

Hope this helps


fccc_mike :us: (BOB member since 2008-04-03)

What is the size of the lookup table? With 14 of them, will all of them fit comfortably into memory?


dnewton :us: (BOB member since 2004-01-30)

There are a couple of different lookup tables, 2.1 to 3.6 million rows each. However, the number of columns is small. The source table is 4.5 million rows with about 14 columns, no large varchar columns though.

hope this helps


fccc_mike :us: (BOB member since 2008-04-03)

I believe it is a memory issue related to the number of lookups. I added the last three and the performance went downhill. Are there any tricks I can use?


fccc_mike :us: (BOB member since 2008-04-03)

3M rows per lookup table
X 50 bytes per row (just a guess) for the lookup key and the results value
X 14 instances of the lookup table

= 2 GB of memory

So you are causing the overall dataflow to page to disk.
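That back-of-envelope arithmetic can be checked in a few lines (a sketch; the 50 bytes per row is the guess above, not a measured figure):

```python
# Rough estimate of total lookup cache memory, using the thread's
# assumed figures: 3M rows per table, ~50 bytes/row, 14 lookups.
rows_per_table = 3_000_000
bytes_per_row = 50           # lookup key + result value, just a guess
lookup_instances = 14

total_bytes = rows_per_table * bytes_per_row * lookup_instances
print(f"{total_bytes / 1024**3:.1f} GiB")  # roughly 2 GB
```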

Are you always reading 4.5M rows from the source table, or just a subset of them?

Out of the 3M rows in each lookup table, are you always “visiting” each row in that lookup table, each time you load? If not, and depending upon the rows being loaded from the source, a no cache, or demand-load cache, may be more appropriate.
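The demand-load idea is essentially a memoized lookup: a row is fetched only the first time its key is requested, then served from cache on repeats. A minimal Python sketch (the dict stands in for a hypothetical single-row query against the lookup table):

```python
from functools import lru_cache

# Stand-in for the lookup table; in the real job each miss would be
# a single-row query against the database. Keys/values are made up.
LOOKUP_TABLE = {"A001": 17, "A002": 42}

@lru_cache(maxsize=None)            # demand-load cache: grows as keys arrive
def lookup_demand(key):
    return LOOKUP_TABLE.get(key)    # fetched only on first request

lookup_demand("A001")   # first call hits the "database"
lookup_demand("A001")   # repeat calls are served from the cache
```

If the source rows only ever touch a small fraction of the 3M lookup rows, this cache stays far smaller than a full pre-load.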


dnewton :us: (BOB member since 2004-01-30)

Thanks

I am visiting the rows each time I load. It's a staging table for my fact table. I guess I will have to figure out something else, maybe splitting the job into two separate jobs.


fccc_mike :us: (BOB member since 2008-04-03)

lookup_ext as a mapping or as a function call? That makes quite a difference! With many lookups and many CPUs I would prefer lookup_ext as a function call. Actually, I always prefer the function call, but here I mean it from a performance perspective.

What is possible sometimes is to return multiple keys with one lookup_ext call. Imagine I have a fact table for bookings. First lookup is to find the surrogate key of the account_no, second lookup for the customer_no, third for the customer_country.

But one account is owned by one customer only and a customer can have one country only. So my account dimension could not only contain the customer_no but also the current_customer_surrogate_key. So with one lookup against the account dimension, I get the account key plus customer and country key. As said, sometimes…
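The idea can be sketched as one probe returning several surrogate keys at once, because the dimension row carries the denormalized keys (all table and column names below are made up for illustration):

```python
# One lookup against the account dimension returns the account key
# plus the denormalized customer and country surrogate keys, so a
# single probe replaces three separate lookups.
ACCOUNT_DIM = {
    # account_no: (account_sk, customer_sk, country_sk)
    "ACC-1": (101, 501, 49),
}

def resolve_keys(account_no):
    return ACCOUNT_DIM.get(account_no, (None, None, None))

acct_sk, cust_sk, ctry_sk = resolve_keys("ACC-1")
```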


Werner Daehn :de: (BOB member since 2004-12-17)

I had to do that as well, using an intermediate table to store the results to prevent the job from consuming 2 GB of memory. (Under 11.5.x there is no paging to disk; the job will just crash.)

I also found ordering through the Query transform to be very memory-consuming (under 11.5.x, the job loaded all the data into memory to sort it). After switching to a SQL transform as the source, forcing the ORDER BY and several joins down to the database level, memory usage dropped quite a bit.


ErikR :new_zealand: (BOB member since 2007-01-10)

We also faced a similar issue:
–> pre-load cache speeds up the lookup process, but when the lookup table is huge, the BODI process consumes all of the physical memory and slows down once it starts paging.
–> no-cache becomes less effective when the number of lookups performed is huge.

What we did was store the source data, after performing the transformations, in a temp table and then outer-join this temp table with the lookup table.
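The same pattern can be sketched with an in-memory SQLite database: land the transformed rows in a staging table, then let the database do one LEFT OUTER JOIN instead of millions of row-by-row lookup calls (table and column names below are invented for the example):

```python
import sqlite3

# Stage the transformed source rows, then join once in the database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stage    (account_no TEXT, amount INTEGER);
    CREATE TABLE dim_acct (account_no TEXT, account_sk INTEGER);
""")
con.executemany("INSERT INTO stage VALUES (?, ?)",
                [("ACC-1", 10), ("ACC-9", 20)])
con.execute("INSERT INTO dim_acct VALUES ('ACC-1', 101)")

rows = con.execute("""
    SELECT s.account_no, s.amount, d.account_sk
    FROM stage s
    LEFT OUTER JOIN dim_acct d ON d.account_no = s.account_no
    ORDER BY s.account_no
""").fetchall()
# unmatched source rows come back with a NULL surrogate key
```

The outer join keeps source rows with no matching dimension row, which mirrors what a lookup returning NULL would do.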


jebakaran :india: (BOB member since 2008-05-29)