Split a single column text to multiple columns

system · November 14, 2014, 4:37pm

Hi,

I have a single column1 with 3500 characters. I need to split this into multiple columns based on the length of field names which is in another table.

I can use substring function and split each characters to 1085 columns in the target. Any way to dynamically do the substring and generate columns?

Thanks,
Arun

Arun.K (BOB member since 2011-10-18)

system · November 17, 2014, 2:15pm

Hi,

Giving more info.

I have two tables T1 and T2

T1 =>

SL.No—FieldName—Length—Start—End
1 ---- ABC ---------1---------1-------1
2 -----DEF----------5----------1-------4
3 -----HIG----------7----------4-------6
4 -----WXZ---------10---------6-------9

T2=>

Only a single column C1 with a string like “hajjcjehjhcehcj”

The target should be the above fieldnames with the string in C1 splitted as per the length of T1

Target = >

ABC—DEF —HIG—
h -----ajjcj—ehjhceh–

I tried using row gen transform but its skipping every alternate character after splitting. I tried another way of using a variable to capture length field from T1 and split using a while loop. But the output is always a single field.

Any suggestions?

Thanks,
Arun

Arun.K (BOB member since 2011-10-18)

system · November 17, 2014, 2:37pm

You can build the ATL using a script. (I use Python)

load your column lengths up into an array in the script, and use that to generate your field mappings/data types.

It is still going to take you some time, but its going to reduce the chance for error down to virtually zero. Getting off one byte building a layout like this is not fun trying to track down.

jlynn73 (BOB member since 2009-10-27)

system · November 17, 2014, 2:49pm

When you say python do i have to use UDT (user defined transform) and code in the python editor?

Arun.K (BOB member since 2011-10-18)

system · November 17, 2014, 7:31pm

Im talking about scripting this on your desktop.

If you build a simple source > query > target, with a field mapped for reference. You export that ATL and use that as a starting point for your script.

If the positions/lengths will change … you can use a script to build the in/out mappings for a udt. Then use a… for n in range(1085): … get+process+set fields.

I greatly dislike doing things the hard way.

jlynn73 (BOB member since 2009-10-27)

system · November 17, 2014, 9:08pm

Python seems very tough to crack, not used to it. Any other suggestions with in DI.

I created a while loop to iterate but the variable will bring only one value so always it will show the first value 1.

Arun.K (BOB member since 2011-10-18)

system · November 18, 2014, 3:44pm

You could probably do this in DI somehow using a row_gen transform, and a pivot but you’re still stuck doing all the output mappings.

You could use Excel to modify the ATL. Theres not really much to it once you look at the export.

jlynn73 (BOB member since 2009-10-27)

system · November 19, 2014, 3:13am

I have to look into python seriously. If Array is a possibility in DS, my work would have been over. Wonder when they will come up with that

Arun.K (BOB member since 2011-10-18)

system · November 19, 2014, 2:27pm

If you pivot your data and concatenate it using 2 unique delimiters, you’ll have yourself a 2 dimensional array. Then you use a row generator transform, to act as your iterator.

As long as you’re stuck mapping those output fields, the solution is going to be more work than just doing each substr() mapping individually.

jlynn73 (BOB member since 2009-10-27)

system · November 19, 2014, 2:41pm

I do not have a delimiter in the record like comma etc. My delimiter is field length which is another table and to be used dynamically to split the text based on that value. I can capture the field length in a variable using the script, but only one.

Arun.K (BOB member since 2011-10-18)

system · November 20, 2014, 3:01pm

You’re not following, which is ok because this is pretty convoluted.

If you pivot your mapping columns and concatenate them all together, with delimiters of your choosing… you have essentially created a two dimensional array.

The row generator transform can then be used as your looping mechanism to grab the specific index of this array using word_ext(row_gen#).

As I said … the solution is much more complicated than just doing all the data entry. (unless you can script the atl) my .02

jlynn73 (BOB member since 2009-10-27)

system · November 20, 2014, 8:25pm

Ok i got it. Let me try your idea and see.

Arun.K (BOB member since 2011-10-18)

system · December 3, 2014, 1:52pm

Did you solve this? I ask because, if I read your initial post correctly, I’ve done what you’re trying to do.

I’ve parsed a long string of characters according to start positions and lengths stored in a table. I did this by cross-joining the two sets of data; you can cross-join by just joining two tables/queries without specifying a join. This duplicates each source record once for each record with the parsing specifics. So, in my case, I was parsing my string into seven pieces. Each of my source records contained one instance of the string. The cross join duplicated each row seven times, once for row in the table with the start position/length. I created a new output column for the parsed values and mapped it with a substring function, in which the source string was the full string value, and the start position and length came from the values of the cross-joined table. I ended up with seven new rows, each with the proper piece of the full string.

ht1815 (BOB member since 2008-05-23)

system · December 3, 2014, 9:46pm

Thank you for the solution.

I have not done yet in DS. I managed to do that in DB with a stored proc. But your solution would be better as i can do that in DI itself.
One question if i understand correctly i have my t1 with string has 978 records and t2 with start, end and length has 1085 records. So when i cross join i will have (each string record) * length number of records?
Then apply substring of length to each records and pivot to generate columns? correct?

Arun

Arun.K (BOB member since 2011-10-18)

system · December 4, 2014, 12:05pm

Yep, I think you’ve got it.

ht1815 (BOB member since 2008-05-23)

system · December 5, 2014, 8:47pm

Thanks much.

The split is working, not yet tried the pivot, the record count is 1 million. I have 978 records each needs to be split into 1085 columns.

Arun.K (BOB member since 2011-10-18)

system · December 8, 2014, 12:13pm

Wow. I wonder how long it’ll take to pivot 1 million records? I wonder, too, how it will compare with doing it in the stored procedure in the database.

ht1815 (BOB member since 2008-05-23)

system · December 10, 2014, 9:28pm

Hey i have reduced the number of records from 1 mn to 19K, still higher though. But little confused with the second part. After the cross join, i am getting like this -

FieldNo----FieldName----Start----Length----Substring
1------------A------------1----------1------------8
1------------A------------1----------1------------A
1------------A------------1----------1-----------DC
2------------B------------2----------1------------7
3------------C------------5----------3------------22
3------------C------------5----------3------------37

The final output should be Fieldname and Substring values under each fieldnames. As far as i understand the Reverse Pivot will not take dynamic values. Any suggestions appreciated.

Thanks,
Arun

Arun.K (BOB member since 2011-10-18)

system · December 11, 2014, 12:23pm

Sorry. Not following. I see what you have after the substring. What exactly are you wanting the final, pivoted, rows to look like?

ht1815 (BOB member since 2008-05-23)

system · December 11, 2014, 1:38pm

I need the output with field name and substring values -

A ----B-----C
8-----7-----22
A-----Null—37
DC—Null—Null

The above output based on my previous post.

Thanks,
Arun

Arun.K (BOB member since 2011-10-18)