HOW TO : PARALLEL INSERT ?

Hi all,

As far as I understand, it is impossible to insert data into a table in parallel if a data flow creates an "INSERT /*+ APPEND */ …" SQL. For example, I have a 30-million-row source table and am joining this table with 8 other tables (dimension transformation). To avoid caching the data, I am not using any DI-specific transform, so DI generates an "INSERT /*+ APPEND */ …" SQL.

Here are the conditions and figures:

1 - On TOAD or any query tool

ALTER SESSION ENABLE PARALLEL DML; -- (executed manually)

INSERT /*+ APPEND */ INTO … -- (executed manually)

In this situation:
Reading: 1 minute
Writing: 1.5 minutes
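
To give the full picture, the manual run looks roughly like this (the table/column names and the degree of 8 are placeholders, not my real objects):

ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(t, 8) */ INTO dim_target t
SELECT /*+ PARALLEL(s, 8) */ s.*
FROM stg_source s
JOIN dim_lookup d ON d.key_col = s.key_col;

COMMIT; -- a direct-path/parallel insert must be committed before the table can be queried again in that session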

2 - On Data Integrator

Before the data flow, I am executing the following SQL inside a DI script:

sql('DS', 'ALTER SESSION ENABLE PARALLEL DML');

After the script, the data flow is executed.

In this situation:
Reading: 1 minute
Writing: 18-20 minutes (since DI sends a commit and so ends the session, I can't insert the data in parallel. I can also see this by observing the CPU usage on UNIX with the top command)

I also tried to use the "ALTER SESSION…" statement as a "pre-load command", but Data Integrator AGAIN commits it.

I just don't want to deal with logon triggers, so I have to find a way (a trick) to overcome this stupid problem.

Thanks in advance


k7nixen :us: (BOB member since 2006-10-07)

That’s a good one.

ADAPT01021877 created against DI12.0

I didn’t anticipate the difference to be that big. Stupid me.


Werner Daehn :de: (BOB member since 2004-12-17)

I am convinced that I will not be able to trick Data Integrator, so I wrote that logon trigger; everything works fine.
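
For anyone who hits the same issue, the trigger is just a minimal AFTER LOGON trigger along these lines (DI_ETL_USER is a placeholder for the datastore's schema user, and the trigger name is arbitrary):

CREATE OR REPLACE TRIGGER trg_enable_pdml
AFTER LOGON ON DATABASE
BEGIN
  -- only touch sessions opened by the DI datastore user (placeholder name)
  IF SYS_CONTEXT('USERENV', 'SESSION_USER') = 'DI_ETL_USER' THEN
    EXECUTE IMMEDIATE 'ALTER SESSION ENABLE PARALLEL DML';
  END IF;
END;
/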


k7nixen :us: (BOB member since 2006-10-07)

Is the ADAPT related to the parallel insert specifically, or have you finally come around to my idea of getting DI to stop doing all of this transaction nonsense? :wink:

Leave the transaction management to the developer for pre- and post-load commands, and this would have worked, since the same database connection is re-used without being closed, right?


dnewton :us: (BOB member since 2004-01-30)

I don't think the same database connection is used for the pre- and post-load commands since, as I mentioned, it didn't work :roll_eyes:


k7nixen :us: (BOB member since 2006-10-07)

Actually, the pre- and post-load commands run in the same session. Having said that, no idea why it is not working for you.


Werner Daehn :de: (BOB member since 2004-12-17)

I’d just like to add that I agree with dnewton on this one :slight_smile:

k7nixen - the DI sql([datastore], [command]) function is executed in an entirely separate session from any data flows, so unfortunately that doesn't work.


HuwD :uk: (BOB member since 2007-04-19)

Hello all,

Can someone help me understand how to load 30 million rows this fast? I've got staging tables that load up in about an hour. Their counts are below. I'm trying to join them to produce the final DIM table but no luck; no matter how I join them in DI (11.7.2.0), the DF never finishes. The tables are all in Oracle 10g; I've also tried staging them in the DI cache.

Any help would be greatly appreciated.

STAGING TABLE    ROW COUNT
Effective Date   3,474,905
Analysis         3,474,479
Key              22,888
Header           15,190
Mol              32,329,221
Split            3,139,470

Don


donpowell58 :us: (BOB member since 2006-05-18)

You should provide more information about your environment to get help.

Please let me know whether you are doing the join in the database engine or not, then review the following questions & comments:

  • Performance tuning should be taken into consideration not only at the ETL tool level but also at the database level. You can get the biggest performance improvement by tuning your database (tables, partitions, parallelism, etc.).
  • How many CPUs do you have in the job server?
  • Are your staging tables' parallelism and partitions set accordingly?
  • Did you check the SQL statements that are actually executed against the database? What about their execution plans? Does Oracle know which table should be broadcast and which should not? Are you sure there are no pointless nested loops in the execution plan? (See the sketch after this list.)
  • Are you checking the CPU usage during the execution of the SQL?
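
As a rough sketch of what I mean by checking the plan and the tables' parallel degree (the table and column names below are placeholders, not your real objects):

ALTER SESSION ENABLE PARALLEL DML;

EXPLAIN PLAN FOR
INSERT /*+ APPEND */ INTO dim_final
SELECT /*+ PARALLEL(m, 8) */ m.*
FROM stg_mol m
JOIN stg_header h ON h.header_id = m.header_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- look for PX (parallel) operators and for unexpected NESTED LOOPS over the big tables

-- check / set the parallel degree defined on the staging tables
SELECT table_name, degree FROM user_tables WHERE table_name LIKE 'STG%';
ALTER TABLE stg_mol PARALLEL 8;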

If the transformation is done in the job server's memory, then you should also revise the DOP of the data flows, the number of loaders for the target tables, the partitions for the source tables, the rows-per-commit value for the target tables, etc. You should start by glancing at the performance (parallelism) topics in the diamond.

GL,
Z.


k7nixen :us: (BOB member since 2006-10-07)

I have been intrigued by the session-related info for the pre- and post-load commands on a table load.

Having set up an insert which captures the Oracle sessions for a given schema user during the pre-load, load and post-load steps (a sketch of such an insert is shown after the list below), I am seeing one row for the pre-load command, 2 rows for the load command and 2 rows for the post-load command.

Interestingly, one insert from each section (pre-load, load and post-load) shares a common Oracle SID and SERIAL#. Then there are the 2 additional inserts (one for the load and one for the post-load) which have different (but matching each other) SIDs and SERIAL#s.

This makes a total of 5 table inserts:
1 - pre-load command
2 - load processing entries
2 - post-load command
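
The capture itself is nothing fancy; the pre- and post-load commands contain roughly the following (SESSION_LOG, ETL_USER and the step name are placeholder names, and the capturing user needs SELECT access on V$SESSION):

INSERT INTO session_log (captured_at, step_name, sid, serial#, program)
SELECT SYSDATE, 'PRE_LOAD', s.sid, s.serial#, s.program
FROM v$session s
WHERE s.username = 'ETL_USER';
-- the same statement with a different step_name value is used for the other steps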

I am testing a fairly large DI-generated "insert /*+ APPEND */ … select …" command this weekend which applies "ALTER SESSION FORCE PARALLEL DML" in the pre-load. Will see if it has any effect on my loader performance.
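
For the record, the pre-load command itself is a one-liner (the degree of 8 is an arbitrary example; without the PARALLEL clause Oracle uses the default degree). Unlike ENABLE PARALLEL DML, which only allows parallel DML when the statement is hinted or the table has a parallel degree, FORCE makes the session's DML run in parallel at the given degree:

ALTER SESSION FORCE PARALLEL DML PARALLEL 8;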


gfrench (BOB member since 2010-03-12)

What does "ALTER SESSION FORCE PARALLEL DML" do?
Could you please explain this in detail?

Thanks,
Manny.


New2DI (BOB member since 2008-10-27)

Is it really too much to just google for it? :shock:

http://download.oracle.com/docs/cd/B10501_01/server.920/a96520/tuningpe.htm


ErikR :new_zealand: (BOB member since 2007-01-10)