Bulk Loading to an Oracle Partitioned Table

I will be loading 2,500,000 rows into a table that is partitioned by day (20080801 and 20080802 are separate partitions). The Dataflow will be pulling data from the source system for a single day, so all the rows are going into the same partition.

Partitions are created each day, likely by a process outside DI.
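
For context, here is a minimal sketch of what the table and the daily partition step might look like; all object names are made up (and reused in the later sketches in this thread):

```sql
-- Hypothetical daily range-partitioned table; all names are illustrative.
CREATE TABLE sales_fact (
  load_date   NUMBER(8)    NOT NULL,  -- e.g. 20080801
  customer_id NUMBER       NOT NULL,
  amount      NUMBER(12,2)
)
PARTITION BY RANGE (load_date) (
  PARTITION p20080801 VALUES LESS THAN (20080802),
  PARTITION p20080802 VALUES LESS THAN (20080803)
);

-- The external daily process would add tomorrow's partition, e.g.:
ALTER TABLE sales_fact ADD PARTITION p20080803 VALUES LESS THAN (20080804);
```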

Since DI isn’t going to know about the new partition (it didn’t exist at design time) and all the rows are going into the same partition, does that mean that DI will do no optimizations at runtime when using the Bulk Loader feature? I’m concerned about this because I found this in the manual:

Note: If you plan to use a partitioned table as a target, the physical table partitions in the database must match the metadata table partitions in Data Integrator. If there is a mismatch, Data Integrator will not use the partition name to load partitions, which impacts processing time.


eganjp :us: (BOB member since 2007-09-12)

What partition-based optimization would you expect if all the data goes into one partition anyway? There is not much I can think of.


Werner Daehn :de: (BOB member since 2004-12-17)

I’m new to Oracle partitions so I don’t really know for sure what DI would do differently. At the very least I think DI could run multiple loaders, one for each partition. But that implies that DI knows how to segregate the data for the individual partitions. That’s not impossible, but it seems to me to be more overhead than I would want an ETL job to take on.


eganjp :us: (BOB member since 2007-09-12)

That is exactly what DI does when you check “enable partitions” on the loader: it creates one loader per partition. But that is exactly my point; all the loaders except the one for the current day will get no data. Doesn’t make a lot of sense.

Some reading material: https://boc.sdn.sap.com/node/5036


Werner Daehn :de: (BOB member since 2004-12-17)

This leads me to two questions:

  1. If I leave the “enable partitions” option off, then I should be OK, right?
  2. Is there a way at runtime to update the partition information in the repository so it is current?

eganjp :us: (BOB member since 2007-09-12)

  1. Correct. And as you are loading one partition only, performance will be the same. What you might find is that the bulk load disables all indexes and recreates them all, which might take a few minutes. It would be better if only the active partition’s index were rebuilt.

  2. No. We check whether the partition information is still up to date and raise an error/warning if it is not. The DOP setting and the partitions have to be in sync to take full advantage, hence an auto-setting was not possible. But we are aware of it and we have an idea; in Oracle 11, creating new partitions automatically is a supported feature and will happen more often, so we have to deal with it.
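
To put both replies in SQL terms, here is a rough sketch with hypothetical object names: item 1 corresponds to rebuilding only the affected partition of a local index, and the Oracle 11 feature in item 2 is interval partitioning, where the database creates partitions automatically:

```sql
-- Item 1: rebuild only the active partition of a local index, rather
-- than dropping and recreating every index on the table:
ALTER INDEX sales_fact_ix REBUILD PARTITION p20080801;

-- Item 2: Oracle 11g interval partitioning; the database adds a new
-- daily partition automatically when rows for a new day arrive:
CREATE TABLE sales_fact_11g (
  load_date DATE NOT NULL,
  amount    NUMBER(12,2)
)
PARTITION BY RANGE (load_date)
INTERVAL (NUMTODSINTERVAL(1, 'DAY')) (
  PARTITION p_initial VALUES LESS THAN (DATE '2008-08-01')
);
```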

I wonder what the performance would be if you dump the data into a stage table and do a final insert…select. So add a Data_transfer transform right before the target table.
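
In SQL terms the final step would be something like this; a sketch only, with illustrative names and an assumed APPEND hint, not the SQL DI itself generates:

```sql
-- Stage-and-insert pattern: the bulk load targets a plain stage table,
-- then one set-based statement moves the rows into the partitioned
-- target. The APPEND hint requests a direct-path insert.
INSERT /*+ APPEND */ INTO sales_fact
SELECT * FROM sales_fact_stage;
COMMIT;
```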

btw, some more ideas like partition exchange: https://boc.sdn.sap.com/node/6003


Werner Daehn :de: (BOB member since 2004-12-17)

This might be one for the enhancement thread, but since it adds to the topic of this thread I’ll see where it goes…

I have multiple processes loading data to the same partitioned table. Each process loads to a different partition using bulk load. I would like to have all the processes running at the same time, but using the bulk load feature makes this impossible (resource busy error).

However, if I could get Data Integrator to bulk load into a temp table, I could then load the data into the partitioned table using partition exchange. Because I could have 1 to n processes running simultaneously, I cannot use the same temp table for every process (bulk load wouldn’t work and neither would partition exchange). Each process would need its own temp table.

The flow would look like this:

source table -> query -> load temp table (bulk load) -> partition exchange
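
The partition exchange at the end would be a statement along these lines; a sketch with made-up names, where each process would use its own temp table name and derive the partition name from its load date:

```sql
-- Swap the loaded temp table into the day's partition. The exchange is
-- a data-dictionary operation, so it is near-instant regardless of how
-- many rows were loaded.
ALTER TABLE sales_fact
  EXCHANGE PARTITION p20080801 WITH TABLE temp_load_1
  INCLUDING INDEXES WITHOUT VALIDATION;
```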

One way I can think of to make this work would be to use a DI template table for the temp table, with the template table deriving the actual table name at runtime from a DI variable. But DI won’t accept a variable as the table name in the Create Template dialog.

Does anyone have any other ideas?


eganjp :us: (BOB member since 2007-09-12)

And what is your reason for having multiple dataflows instead of doing it all in one?


Werner Daehn :de: (BOB member since 2004-12-17)

At the time of execution there can be 1 to n processes requested by the user. The user will run a job for each request.

Even if we wanted to enable partitioned bulk loading in DI, it wouldn’t work. Every day a new partition is created, and DI doesn’t know about these in the metadata.


eganjp :us: (BOB member since 2007-09-12)

I see. You have to build “something” yourself or not use the bulk loader.


Werner Daehn :de: (BOB member since 2004-12-17)

I’m thinking that since partitioning is a big part of data warehousing on Oracle, it would be extremely cool if Data Integrator/Data Services supported partition exchange behind the scenes. Maybe a new transform, or perhaps new options on the Target Table object.

If the “Use Partition Exchange” option were checked ON, then DI would do the following:

  1. Create a temp table using the same table properties as the target table (compression, pctfree, tablespace, etc.)
  2. Bulk load to the temp table
  3. Create indexes on the temp table that match the target table
  4. Perform the partition exchange
  5. Remove the temp table

The big issues are with items 1 and 4. At run time you have to know the name of the tablespace in which to create the temp table, and the name of the partition to exchange into. If both of those properties supported the use of a variable, that would solve the issues.
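
As a rough SQL sketch of those five steps, with made-up names and assuming the tablespace and partition name come from variables:

```sql
-- 1. Create the temp table with the target's shape, in a tablespace
--    that would have to come from a variable:
CREATE TABLE temp_load_1 TABLESPACE dw_data
  AS SELECT * FROM sales_fact WHERE 1 = 0;

-- 2. Bulk load into temp_load_1 (DI's bulk loader does this step).

-- 3. Create indexes on the temp table matching the target's local
--    indexes:
CREATE INDEX temp_load_1_ix ON temp_load_1 (customer_id)
  TABLESPACE dw_idx;

-- 4. Exchange into the target partition, whose name would also have to
--    come from a variable:
ALTER TABLE sales_fact
  EXCHANGE PARTITION p20080801 WITH TABLE temp_load_1
  INCLUDING INDEXES WITHOUT VALIDATION;

-- 5. Remove the temp table (after the exchange it holds whatever the
--    partition contained before, normally nothing):
DROP TABLE temp_load_1;
```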

For this particular project, if we have to roll our own solution for partition exchange then Data Integrator becomes a rather expensive scheduler.


eganjp :us: (BOB member since 2007-09-12)

Yes, I have that in mind…


Werner Daehn :de: (BOB member since 2004-12-17)