1)One dataflow the incoming records restricted to 1500 always , i am doing a look up and look up table is 21 millions and it is growing on daily basis ,in look up condition compare column is 1 field and output is primary key id from look up table.
Currently , i am using no cache as lookup table is huge. the performance is very bad as it is 20 mins to complete.
I cant use pre load cache because of 21 millions rows,
i addition to lookup there is gen_row_group() function is used , i dont see any issues with this in monitor logs only look up performance is bad.
2)In another df , the same 1500 records doing look up on the table having 2.6 millions (it is growing as well on daily basis), I am doing 15 look ups in a query transform , all look ups has compare column as primary key and different result column , cache option is no cache , I tried left outer join but it is taking more time than look up, want to check with you guys any way to improve the performance for these 2 dataflows.
What version of BODS are you using and what is the source database type?
I think starting somewhere around BODS 4.2 SP5, it gained the ability to pushdown lookups, but this depends on the source type and the lookup must be configured as no cache.
Also, just because the lookup is pushed down it doesn’t mean it will be automatically faster. That will depend on how the source table is designed with regards to indexs, partitions, etc.
For 1500 records looking up into millions, generally no cache is your best option. You can look at a few things to improve performance. First, is the comparison column indexed in the lookup table? Second, are there restrictions you can put on the lookup itself to narrow the lookup set? If there are, you can put custom SQL into the lookup to narrow your scope. Lastly, you can consider pre-load caching the lookup. It sounds counter-intuitive, but if your comparison column is 10 bytes and your result column is 10 bytes, you are only talking about loading 400 MB to memory.
Why are you doing 15 lookups? Is there a reason you cannot simply us 1 lookup and multiple return columns? See above for other options.