Splitting Target Files

Hi

We are transforming data from our source system and pushing it to an SAP system. One of the SAP tables is receiving more than 500K records, which takes a very long time to load into SAP.
Is there a way to split the target files so that, say, every 10,000 records goes into a new file?


ganeshxp :us: (BOB member since 2008-07-17)

No, not easily; it needs to be done manually. The best approach would be to create one file with DS and then call some operating system functions to split it into multiple files.
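
A minimal sketch of that idea, assuming a UNIX job server and the standard split utility (the paths, file names and the exec() flag are illustrative):

# Hypothetical follow-up script: DS has written one big file, the OS splits it.
# exec() runs an external command; flag 8 waits and returns the return code and output.
$L_Result = exec('/bin/sh', '-c "split -l 10000 -d /data/sap_load.dat /data/sap_load_part_"', 8);
print('split result: ' || $L_Result);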


Werner Daehn :de: (BOB member since 2004-12-17)

But of course you can with DS. :slight_smile:

I assume that you create a staging table of sorts before you create the LSMW load files? (Assuming you’re using LSMW to load the data into SAP?)

If so, create a record counter (gen_row_num) in a first Query Transform. Then write a custom function in a subsequent QT, in which you check for each record number what fraction it is of your desired file size - so you can start assigning numbers to groups.

Say your desired output is 10,000 records per file. Then record 5,432 is fraction < 1 (5432/10000), thus “block 0”. And record 336,001 is block 33 (taking the floor), etc. So just divide the record number by your desired block size and take either the floor or the ceiling to get a rounded block or group number. Then store this in a separate column in your QT and into your staging table.
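
As a concrete illustration (hypothetical transform and column names; 10,000 is the desired block size), the mapping for the block-number column in that second QT can be as simple as:

floor(QRY_ROWNUM.GEN_ROW_NUM / 10000)

Using (GEN_ROW_NUM - 1) instead gives exactly 10,000 rows per block, numbered from 0.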

Now create a Work Flow with an init script to retrieve the MAX(Block_Number) and the MIN(Block_Number) from that staging table. The latter will be either 0 or 1, depending on whether you took the floor or the ceiling of your record number fraction.
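
A minimal sketch of such an init script, assuming a datastore called STAGE_DS and a staging table STG_LOAD with a BLOCK_NUM column (all names illustrative):

# sql() returns the first column of the first row as a string;
# DS converts it implicitly when assigning to int variables.
$Max_Block_Number = sql('STAGE_DS', 'SELECT MAX(BLOCK_NUM) FROM STG_LOAD');
$Block_Number = sql('STAGE_DS', 'SELECT MIN(BLOCK_NUM) FROM STG_LOAD');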

Then we add a Loop object, with a While statement to keep it stepping through all block numbers (While $Block_Number <= $Max_Block_Number), and in this Loop we need the following (rough sketches of these scripts follow the list):

  • A script that generates a file name and populates a WF-level parameter.
  • A Data Flow that accepts the filename parameter and the block number parameter, uses the block number in a QT WHERE clause to extract only a single block of data from your staging table, and then populates a predefined file format using the generated filename parameter.
  • A Try/Catch block around the DF; in the Catch block (for all errors), just include a script that sets $Block_Number to $Max_Block_Number + 1, to break the loop in case of an error.
  • A script following the Try/DF/Catch block that raises $Block_Number by 1 to push the Loop into its next cycle.
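
A rough sketch of those scripts, also from memory (variable and path names are illustrative):

# Script at the top of the loop: build the file name for this block.
$Filename = 'sap_load_block_' || $Block_Number || '.dat';

# Script in the Catch block: jump past the last block to end the loop on an error.
$Block_Number = $Max_Block_Number + 1;

# Script after the Try/DF/Catch: move on to the next block.
$Block_Number = $Block_Number + 1;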

I don’t have DS Designer in front of me right now, so this is all done from memory, but this is how I’ve done it in the past - especially to break SAP LSMW files into smaller chunks. (Although I usually split them out by site number or something like that, this should work as well.)


ErikR :new_zealand: (BOB member since 2007-01-10)

Erik

Thanks for your reply. I did it the same loop way, but I used a Key_Generation transform to do it. It worked.
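
For anyone following along, a sketch of how that variant can look (hypothetical names): with a Key_Generation transform filling a sequential ROW_KEY column in the staging table, the in-loop QT can filter on a key range instead of a precomputed block column:

ROW_KEY > ($Block_Number * 10000) AND ROW_KEY <= (($Block_Number + 1) * 10000)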

And yes for the LSMW Loads :expressionless:


ganeshxp :us: (BOB member since 2008-07-17)

Hi Ganesh,

Can you please send the whole process and the technique used for that?

Regards,
Vijay


vijay2817 (BOB member since 2010-10-28)

Hi,

Yes, you can do that.

Follow these steps.

1). Write the code below in a script object.

$G_Total_Record_Cnt=total_rows(datastorename.ownername.tablename);

print($G_Total_Record_Cnt);

$L_Records_Per_File=10000;

$G_Num_Of_Files=ceil(($G_Total_Record_Cnt/$L_Records_Per_File));

print($G_Num_Of_Files);

$G_File_Num =1;

$G_StartRecord=1;

$G_EndRecord=$L_Records_Per_File;

$G_Split_FileName = 'C:\Documents and Settings\0309\Desktop\' || 'SPLIT_FILE' || '_' || $G_StartRecord || '_' || $G_EndRecord || '_' || to_char(sysdate(), 'MMDDYYYY') || '_' || $G_File_Num || '.csv';

2). After the script object, drag in a While Loop object.

In that loop, apply a condition like ($G_File_Num <= $G_Num_Of_Files).

Now drag a DF into the While Loop.

In the first Query transform, create a new column and map it with the gen_row_num() function.

In the next Query transform, apply a condition like
((Qry_Seq_Num.Gen_RowNum >= $G_StartRecord) AND
(Qry_Seq_Num.Gen_RowNum <= $G_EndRecord))

Finally, map to the target file definition, and set the file name of the target file format to the variable $G_Split_FileName so that each pass of the loop writes to a new file.

3). Add one more script object after the DF, still inside the While Loop, and write the code below in it.

print('SPLIT_FILE' || '_' || $G_StartRecord || '_' || $G_EndRecord || '_' || to_char(sysdate(), 'MMDDYYYY') || '_' || $G_File_Num || '.csv File has been created');

$G_File_Num=$G_File_Num+1;

$G_StartRecord=$G_EndRecord+1;

$G_EndRecord=$G_EndRecord+$L_Records_Per_File;

$G_Split_FileName = 'C:\Documents and Settings\0309\Desktop\' || 'SPLIT_FILE' || '_' || $G_StartRecord || '_' || $G_EndRecord || '_' || to_char(sysdate(), 'MMDDYYYY') || '_' || $G_File_Num || '.csv';

Regards,
Bob


Ram-> :india: (BOB member since 2009-03-24)

Hi - hoping anyone here can help me.

I have 1 million records populating a .dat file, but I need to limit each file to 150,000 records, which should then produce multiple files with a sequence number for each one.

I have tried following this example using similar code, but the job keeps looping and never ends.
Is there another way of doing this?


slidder (BOB member since 2011-01-13)

Hi Ganesh,

I have a similar problem to solve - please could you send me the code you used for this? I am new to DI and don't know the syntax.

I have a million-record file that I need to split into 150,000-record chunks and save into separate files, with the run date and a sequence number in each filename.

I also need to make sure the data stops and file 2 starts at the correct point, because there are child records in the file.

I know I need to put the records into Blocks but I don’t know the code to do this.

I am really stuck on this, and I was wondering if you have an ATL file or any code samples you could kindly send me.

I really appreciate your help.


slidder (BOB member since 2011-01-13)

I had the same issue. A source file of 20 records loops endlessly and generates a couple of million records.

I know this is an old thread…but any insight will be appreciated.
Thanks.