We have an XML file whose structure is such that several child-node values, together with one parent ID, make up the primary key. Something like:
Course
  ID
  Item A
    Lang
    Desc
  Item B
    Lang
    Desc
  Item C
    Lang
    Desc
Here, assume ID, A.Lang, B.Lang, and C.Lang together form the composite primary key. The source XML is 8400 KB and yields 36 columns when flattened to a relational table; 10 of those 36 columns together make up the primary key. To process this file efficiently in memory I tried the XML Pipeline transform, but it is limited in that one transform can only take the values of a single child node along with the parent. I cannot use multiple transforms and join their outputs, as that is not suitable for this XML source and scenario.
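For illustration, here is a minimal Python sketch (outside Data Services) of the kind of flattening I mean. The tag names (Course, ID, ItemA, ItemB, ItemC, Lang, Desc) and the three-way cross product over the repeating children are assumptions based on the simplified structure above, not the real schema:

import itertools
import xml.etree.ElementTree as ET

def flatten(xml_path):
    # iterparse streams the document, so only one <Course> subtree
    # is held in memory at a time
    for event, elem in ET.iterparse(xml_path, events=("end",)):
        if elem.tag != "Course":
            continue
        course_id = elem.findtext("ID")
        # each Item element can repeat; a flat row needs one Lang/Desc
        # pair from each, hence the cross product over the child lists
        for a, b, c in itertools.product(elem.findall("ItemA"),
                                         elem.findall("ItemB"),
                                         elem.findall("ItemC")):
            yield (course_id,
                   a.findtext("Lang"), a.findtext("Desc"),
                   b.findtext("Lang"), b.findtext("Desc"),
                   c.findtext("Lang"), c.findtext("Desc"))
        elem.clear()  # free the subtree once its rows are emitted

This is exactly what a single XML Pipeline transform cannot do: carry values from several repeating children into one output row.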
Any other suggestions on how to efficiently flatten this XML file would be very helpful. Thanks. courses3.txt (7.0 KB)
Thanks Werner. That's what I did. I'm trying to find out if there are better ways of doing this.
Also, when I unnest and flatten to a file, I get duplicates. If I do a SELECT DISTINCT before loading to a table, the distinct operation is probably carried out in memory, and it is causing virtual memory issues. Any suggestions for not loading duplicates? The flattened file is loaded into an Oracle table, and I could probably use Auto Correct Load, but that would increase the run time; as it is, the load already takes 8 hours (the file has 36 columns, including one description column that is varchar(4000)).
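One approach I am considering is to drop duplicates while the flat file is being written, before Oracle ever sees the rows. A sketch, assuming the first 10 columns of each flattened row form the composite key: keep only a 16-byte digest per distinct key, so neither full rows nor the varchar(4000) description column have to stay in memory.

import csv
import hashlib

KEY_COLS = range(10)  # assumption: the first 10 columns are the composite key

def write_distinct(rows, out_path):
    seen = set()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            key_text = "|".join(str(row[i]) for i in KEY_COLS)
            digest = hashlib.md5(key_text.encode("utf-8")).digest()
            if digest in seen:
                continue  # duplicate composite key: skip the row
            seen.add(digest)
            writer.writerow(row)

The digest set grows with the number of distinct keys rather than row width, which should stay small for an 8400 KB source. On the database side, a unique index on the key columns plus Oracle DML error logging (INSERT ... LOG ERRORS ... REJECT LIMIT UNLIMITED) would also reject duplicates without the row-by-row lookups of Auto Correct Load, if changing the loader setup is an option.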