We have an XML file whose structure is such that several child-node values, together with one parent ID, make up the primary key. Something like:
Course
  ID
  Item A
    Lang
    Desc
  Item B
    Lang
    Desc
  Item C
    Lang
    Desc
Here, assume ID, A.Lang, B.Lang, and C.Lang together form the composite primary key. The source XML is 8400 KB and yields 36 columns when flattened to a relational table; 10 of those 36 columns together make up the primary key. To process this file efficiently in memory I tried the XML Pipeline transform, but it is limited in that one transform can only take the values of a single child node along with the parent. I cannot use multiple transforms and join their outputs, as that is not suitable for this XML source and scenario.
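For illustration, here is a minimal Python sketch (outside Data Services) of the kind of flattening I mean. The tag names (Course, ID, ItemA, ItemB, ItemC, Lang, Desc) and the three-way cross product over the repeating children are assumptions based on the simplified structure above, not the real schema:

import itertools
import xml.etree.ElementTree as ET

def flatten(xml_path):
    # iterparse streams the document, so only one <Course> subtree
    # is held in memory at a time
    for event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag != "Course":
            continue
        course_id = elem.findtext("ID")
        # each Item element can repeat; a flat row needs one Lang/Desc
        # pair from each, hence the cross product over the child lists
        for a, b, c in itertools.product(elem.findall("ItemA"),
                                         elem.findall("ItemB"),
                                         elem.findall("ItemC")):
            yield (course_id,
                   a.findtext("Lang"), a.findtext("Desc"),
                   b.findtext("Lang"), b.findtext("Desc"),
                   c.findtext("Lang"), c.findtext("Desc"))
        elem.clear()  # free the subtree once its rows are emitted

This is exactly what a single XML Pipeline transform cannot do: carry values from several repeating children into one output row.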
Any other suggestions on how to efficiently flatten this XML file would be very helpful. Thanks. courses3.txt (7.0 KB)
Thanks Werner. That's what I did. I'm trying to find out if there are better ways of doing this.
Also, when I unnest and flatten to a file, I get duplicates. If I do a SELECT DISTINCT before loading to a table, the distinct operation is probably carried out in memory, and it is causing virtual memory issues. Any suggestions for not loading duplicates? The flattened file is loaded into an Oracle table, and I could probably use Auto Correct Load, but that would increase the run time; as it is, the load already takes 8 hours (the file has 36 columns, including one description column that is varchar(4000)).
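One approach I am considering is to drop duplicates while the flat file is being written, before Oracle ever sees the rows. A sketch, assuming the first 10 columns of each flattened row form the composite key: keep only a 16-byte digest per distinct key, so neither full rows nor the varchar(4000) description column have to stay in memory.

import csv
import hashlib

KEY_COLS = range(10)  # assumption: the first 10 columns are the composite key

def write_distinct(rows, out_path):
    seen = set()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            key_text = "|".join(str(row[i]) for i in KEY_COLS)
            digest = hashlib.md5(key_text.encode("utf-8")).digest()
            if digest in seen:
                continue  # duplicate composite key: skip the row
            seen.add(digest)
            writer.writerow(row)

The digest set grows with the number of distinct keys rather than row width, which should stay small for an 8400 KB source. On the database side, a unique index on the key columns plus Oracle DML error logging (INSERT ... LOG ERRORS ... REJECT LIMIT UNLIMITED) would also reject duplicates without the row-by-row lookups of Auto Correct Load, if changing the loader setup is an option.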