Oracle bulk load API does not improve performance

Very simple data flow: SOURCE (1.1 million rows) => QUERY => TARGET. Source and target are Oracle 10g, DI 11.5.x. There are no indexes or keys in the target, and I am doing insert only. When loading the data with a conventional load and with the Oracle bulk loader I get almost identical run times. I changed the default rows per commit from 1000 to 5000, still no effect on performance.

Any ideas on how to speed this up?
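As a point of comparison, a pure database-to-database copy along these lines (just a rough sketch; the table and database link names are placeholders and assume a link from the target to the source exists) shows what the two Oracle instances can do on their own, without DI in the middle:

-- Hypothetical baseline: time a direct-path copy done entirely inside Oracle.
-- target_table, source_table and source_db are placeholder names.
alter session enable parallel dml;

insert /*+ APPEND */ into target_table
select * from source_table@source_db;

commit;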


Pasi Tervo :finland: (BOB member since 2002-09-03)

Increasing Rows per commit isn’t enough

Do you have any indexes or partitions on the source table?
Is the target table partitioned?
Have you used any functions or lookups in the Query transform?

There are so many parameters to consider, such as:

Increasing the DOP (degree of parallelism),
Setting the number of loaders on the target table, and
Increasing the array fetch size on the source table, etc.


However, you have chosen the Oracle bulk load conventional-path method.
I would suggest you go for the direct-path method; direct path is faster than conventional path.


gssharma :india: (BOB member since 2006-10-30)

There are no partitions.

Yes, I am very much aware of DOP, the number of target loaders and the other performance options. Those are techniques I will look into later. However, since this kind of data flow really is an ideal candidate for bulk loading, I want to see where that technique can take me performance-wise first.

By ideal candidate I mean there is no TC here, no need to update target, no indexes that need to be dropped & recreated etc. Essentially I am just copying data from one Oracle box into another.

I have chosen the API method, which really is the direct-path method. With this technique data should be written directly into the Oracle datafiles and the SQL layer on top should be bypassed altogether. At least that's what the documentation says. This technique should really be fast, but for some reason in my test load it isn't.

When you choose the file method (which I am not using), the data is written to a temp file first and Oracle loads it into the datafiles from there.

What I think really happens here is that even though I try to access Oracle's direct-path engine from DI, no bulk loading actually takes place. Maybe something needs to be set up on the Oracle side to make it work?
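One thing that could tell whether the direct-path engine is actually being used (a sketch; v$sysstat is standard Oracle, and on a quiet test instance the delta between two runs of this query is meaningful) is to compare the direct-path write statistics before and after the load:

-- Run before and after the DI load and compare the values.
-- 'physical writes direct' should increase noticeably if direct-path loading really happens.
select name, value
from   v$sysstat
where  name like '%direct%';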


Pasi Tervo :finland: (BOB member since 2002-09-03)

Is your database running in archive log mode? Execute an

alter table target nologging

and try the API bulkload performance again. http://www.adp-gmbh.ch/ora/misc/nologging.html
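To see where you stand, these standard dictionary queries show the archivelog mode and the table's logging attribute (the table name is a placeholder):

-- Archive log mode of the database:
select log_mode from v$database;

-- Logging attribute of the target table (placeholder name):
select table_name, logging
from   user_tables
where  table_name = 'TARGET';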

Also, keep in mind you would not see a difference if the reading, not the loading, is the bottleneck. For further analysis, if the above did not help, please post a snippet of each run's monitor log; just 10 lines from the middle are enough.

Oh, and increase the monitor sample rate to 65000, and if possible disable the virus scanner for the database files and the DI log files.


Werner Daehn :de: (BOB member since 2004-12-17)

Thanks for your reply Werner.

Forcing NOLOGGING on the target table does not have any effect on run time when using bulk load. As a matter of fact, using bulk load on the target slightly increases the run time of the data flow compared to a regular load.

I am beginning to think bulk load is not the way to go in our case, at least not with the DI version (11.5) we're using. We will upgrade to the latest version later this year; I may try this again then.

In our case what really improves performance is a full pushdown, where the data flow generates INSERT INTO ... SELECT type direct-path SQL (the generated SQL begins with INSERT /*+ APPEND */). Unfortunately the generated SQL is not always predictable with DI 11.5. Sometimes a surprisingly complex data flow with several joins etc. is fully pushed down. Sometimes just adding the sysdate() function to one column (a function that definitely has an Oracle equivalent) causes a partial pushdown, meaning only the SELECT part is pushed to Oracle.
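For illustration, the two shapes look roughly like this (a sketch with made-up table, column and database link names):

-- Full pushdown: one direct-path statement executed entirely by Oracle.
insert /*+ APPEND */ into tgt_table (id, amount, load_date)
select id, amount, sysdate
from   src_table@src_link;

-- Partial pushdown: only the SELECT goes to the source database;
-- the rows travel through the DI engine and are inserted by the target loader.
select id, amount
from   src_table;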


Pasi Tervo :finland: (BOB member since 2002-09-03)

When you say that the generated sql is not predictable, do you mean it is inconsistent with what it says it will be when you do Validate | Display optimized SQL? Or just that you find inconsistencies when you make relatively minor changes and DI translates them to SQL differently from what you expected?

If it is the first one, you should be filing a bug immediately. I have not encountered that one and I was on 11.5.x for a good while, but things slip through. If you mean the latter, it is entirely possible for that to happen, but generally you can work around it by using the pushdown_sql function or by making minor modifications to your queries. If you DON’T want DI to push something down, just put a map operation into the flow before the point where you want it to stop pushing to the database. A bit clunky, but it works.

  • Ernie

eepjr24 :us: (BOB member since 2005-09-16)

Just to double-check: You are using the API bulkloader, not the file bulkloader. And you have validated the writing is the bottleneck, not reading or transformation?


Werner Daehn :de: (BOB member since 2004-12-17)

Werner: yes, I am using the API bulk loader, not the file one. I cannot give a definitive answer to the bottleneck question. All I can say is that since I can manipulate the data flow to run much faster with fully pushed down SQL, I have to assume reading is not the bottleneck and that making the writing faster should speed this thing up.

Ernie: by the generated SQL not being predictable I mean your second statement. Let me be more specific by giving an example.

  1. I start by creating a simple source => Query transform => target data flow. This generates fully pushed down SQL (as it should). :smiley:

  2. I add the sysdate() function to one column in the Query transform. Now the SQL is only partially pushed down (just the SELECT part). :? Why?

  3. I delete the column with the sysdate() function from the Query transform and recreate the target Oracle table with a default SYSDATE value on the column in question. Now the SQL is again fully pushed down. The job runs much faster than in step 2, even though the loaded data looks exactly the same. :smiley:

  4. I try to recreate step 2 by adding the column with the sysdate() function back to the Query transform and recreating the target table without the SYSDATE default on that column. The SQL is still fully pushed down.

What I do not understand is why steps 2 and 4 produce different SQL even though the data flows are identical. In my opinion all four steps should produce identical, fully pushed down SQL.
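For reference, step 3 corresponds roughly to this kind of target DDL (placeholder names), which lets the database fill in the timestamp so the mapping no longer references sysdate() at all:

-- Step 3 equivalent: the column default supplies the load date,
-- so the Query transform does not need a sysdate() column.
create table tgt_table (
  id         number,
  amount     number,
  load_date  date default sysdate
);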

This is just one example. On one hand we have many data flows that generate INSERT INTO ... SELECT type SQL and run very fast. On the other we have data flows that look very much the same but where only the SELECT part is pushed down. These run much slower. It is this latter category where we have played around with bulk load, trying to make them faster.


Pasi Tervo :finland: (BOB member since 2002-09-03)

On step 2: because the DI optimizer did not get what you were trying to do. Within any complex system, this can occur.

On step 3: of course. You removed the decision from the DI optimizer's hands.

On step 4: and this time the DI optimizer understood your desire.

As for why: I don't understand either, but the data flows are not identical. You modified it twice, and once you modify it the data flow is no longer identical to DI, even though it may look the same to you. I do agree that in a perfect world they would all produce the same result. I also understand that I do not and never will live in a perfect world. If I did, I would be out of a job. :rotf:

I think you would be better served investigating the pushdown_sql option or figuring out the tricks you use to make DI do what you want. But that is just my .02.

  • Ernie

eepjr24 :us: (BOB member since 2005-09-16)

I don't wanna give up. Yes, processing things inside the database should be faster, and in most cases it is. For example, when you read data from the database through the engine and bulk load it again, the same technology is used as with an INSERT /*+ APPEND */ ... SELECT, so it should be equally fast. And in many test cases it is, but not in all.
Reasons:

  1. DI is installed remotely, so an insert-select has one network hop fewer than going via DI; depending on the setup it can even be zero hops versus two.
  2. When reading the data from the database, SQL*Net has to convert the strings from their internal format to a C++ datatype and back. This is very fast, but it adds up when the ETL process has a high throughput. It can get worse if e.g. codepage conversions have to be made within the SQL*Net layer (database codepage not matching the NLS_LANG codepage) or if DI has to add a transcoder (NLS_LANG not matching the engine codepage); see the query sketch after this list.
  3. Same with memory allocation: it takes only a few nanoseconds per row, but multiplied by millions of rows processed per second this can utilize one CPU entirely.
  4. The database is configured to execute tasks in parallel; the DI data flow might not be.
  5. Either way, our philosophy is that the database is always faster, hence we invested so much in being able to push down.
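A quick way to compare the codepages involved (a sketch; the dictionary view is standard Oracle, the NLS_LANG value comes from the DI job server's environment):

-- Database character sets:
select parameter, value
from   nls_database_parameters
where  parameter in ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');

-- Compare these against the NLS_LANG setting on the DI job server
-- (e.g. AMERICAN_AMERICA.WE8ISO8859P1) and against the DI engine codepage.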

Why does the "same" data flow get pushed down one time and not the other? Very often this is related to datatype conversions.
What happens if a varchar(10) is loaded into a varchar(5) via DI? In Designer you get a validation warning that data might get truncated, but the data flow will execute.
What happens if you write "insert into col_varchar_5 select col_varchar_10 from table"? The statement might work or fail, depending on the data. If no string is longer than the target column, everything works; otherwise you get the error "value too large".
As the policy for DI is that the data flow has to succeed, we push down only if that is guaranteed.
Unfortunately we are not doing a perfect job with the datatype length calculation. For example, just recently we wanted to load 'AB' || substr(col,1,3) into a varchar(5). This expression can never be longer than 5 characters, so it should be perfectly valid, but the optimizer did not figure that out - no full pushdown because the datatypes supposedly are not compatible.
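To make the length issue concrete, a sketch with invented table and column names:

-- May fail at runtime with "value too large for column" (ORA-12899 on 10g)
-- if any source value is longer than 5 characters:
insert into tgt (col_varchar_5)
select col_varchar_10 from src;

-- An explicit substr() caps the length, so this version can never fail
-- for that reason and would be safe to push down:
insert into tgt (col_varchar_5)
select substr(col_varchar_10, 1, 5) from src;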

btw, I know we wanted to speed up DI and not the database, but in 12.x try this:

This "alter session enable parallel dml" could speed up a full pushdown by another factor.

https://wiki.sdn.sap.com:443/wiki/display/BOBJ/Database+Session+parameters
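Roughly what that looks like when combined with a parallel direct-path insert (placeholder names, degree of 4 chosen arbitrarily):

-- Parallel DML has to be enabled per session before the insert:
alter session enable parallel dml;

insert /*+ APPEND PARALLEL(tgt_table, 4) */ into tgt_table
select /*+ PARALLEL(src_table, 4) */ *
from   src_table;

commit;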


Werner Daehn :de: (BOB member since 2004-12-17)

Ernie: obviously, but it shouldn't work this way. And I suppose the system might be complex, but our data flow is definitely not.

I quote the DI technical manual v. 11.7.2, "Maximizing Push-Down Operations", page 38:

Operations that cannot be pushed down:

  • DI functions that do not have database correspondents
  • load operations that contain triggers
  • transforms other than Query
  • joins on different database servers with no database links

In step 2) of my previous post we don’t have any of these.

But then again, we’re running 11.5 and I am quoting v.11.7 manual. Maybe that explains some of it. :wink:

Werner, I will look more closely into your post, particularly reason 2) about codepage conversion. You gave some excellent advice that definitely should be included in the performance manual.

And I want to emphasize that generally I am very happy with the performance we're getting; even in our slowest data flows, once the loading phase starts we are typically loading more than a thousand rows per second. I have done other projects with some other ETL tools (I don't want to name them :wink: :wink: ) where a thousand-plus rows per second was considered a good loading speed.

It’s just that there is always room for improvement.


Pasi Tervo :finland: (BOB member since 2002-09-03)