Removing special characters

Hey guys

I need to remove all of the special characters on a standard keyboard leaving just the alpha-numeric ones (A to Z and 0 to 9).

Does anyone know how to do this (ideally within the Query Mapping tab)?

The following statement will replace the dash with nothing (which is good);
replace_substr_ext(Query_2.MFRPN,‘-’,‘’,1,20)

I’m thinking I’d need to repeat this statement for each special character but I cant get it to work;
replace_substr_ext(Query_2.MFRPN,‘-’,‘’,1,20) and
replace_substr_ext(Query_2.MFRPN,‘_’,‘’,1,20) and
replace_substr_ext(Query_2.MFRPN,‘+’,‘’,1,20)

One last question: How can I tell Data Services to find and replace a special character which is forming a part of a statement its self, for example a comma or a apostrophe?

Any help would be greatly appreciated! I’ve also included a screenshot of the product im using, I hope this helps.

Thanks in advance!
ScreenShot610.jpg


Jay1982 (BOB member since 2010-12-28)

Not to wory… The following thread covers it. After a few hours I was able to get it to work via the Custom Function functionality…

https://bobj-board.org/t/151294/5


Jay1982 (BOB member since 2010-12-28)

Hi,

You can’t have mutiple functions combined with ‘and’ in a column mapping, or use any boolean operators. It doesn’t make sense, because a column mapping needs a single value result. In your example the replace_substr_ext function is returning a string, so your end result is ’ and and ', which is an illogical statement. You could get this to work by nesting the function calls so in effect you are repeatedly applying the function to the result of the inner function, eg.

replace_substr_ext(replace_substr_ext(replace_substr_ext(Query_2.MFRPN,’+’,’’,1,20)
,’_’,’’,1,20),’-’,’’,1,20) ) )

But this is only going to be feasible if you have a limited number of special characters to replace otherwise you are going to end up with a monster nested mapping.

My recommendation is to create a custom function with string input and string output which loops through each character of the input string and then use the match_simple function within the loop to determine if the current character is an alphanumeric or a number (the match_simple lets you use # to denote an alphanumeric and ‘[0…9]’ to denote a number between 0 and 9, so if either function call is TRUE then concatenate the character to a new string. After the loop return the new string.

The execute the function in the column mapping.


ClintL :south_africa: (BOB member since 2011-01-06)

do yourself a favor, setup a while loop incrementing a counter from 1 to 255. Print the chr(N) value.

after you figure out which values you want to keep, then you can setup a while loop to loop through each character in your string, only accepting the values in the ascii() range that you’re interested in.

I laugh everytime I see one of these monster replace_substr() functions.

If you really need to do it that way, create an array of your replace values, using a delimited string. Then loop through the array, using the word_ext() function. You could also increment an index to perform your replace_substr() fuction on each value (since its only 1 byte), but I like using these pseudo arrays.

something similiar I use that could be easily tweaked

$idx 		= 1;
$lenString 	= length($inputString);
while($idx <= $lenString)
	begin
		$chrChar = ascii(substr($inputString,$idx,1));
		if($chrChar < 32 or $chrChar > 126)
			begin
				return('T');
			end
		$idx = $idx + 1;
	end

return('F');

so using the same idea, you could create a new empty return string, and concatenate each qualifying character to it. Using the ascii values (to qualify), not the characters themselves. If a character doesnt fall into the numeric range, you just dont || concatenate it to the return string. my .02 :smiley:

heres an example SR function for diacritics I modified from the one provided by sap.

$lstDiaSR 		= 'À|A,Á|A,Â|A,Ã|A,Ä|A,Å';
$idx 			= 1;
$SearchPair 	= word_ext($lstDiaSR,1,',');
$return_string 	= $input_string;
while($SearchPair is not null)
	begin
		$return_string 	= replace_substr($return_string,word_ext($SearchPair,1,'|'),word_ext($SearchPair,2,'|'));	
		$idx 			= $idx + 1;
		$SearchPair 	= word_ext($lstDiaSR,$idx,',');
	end
return($return_string);

gives an example on how to create a 2dimensional array using a string and word_ext. I truncated the list of diacritics since it was fairly large.


jlynn73 :us: (BOB member since 2009-10-27)

I found this on the web a while back and have used it before. I created a function and put in the values that I needed to remove. Attached is an ATL export of the function.

$return_str = search_replace([], ['SR_STRING'], ['no'], [], [$input_field], []) 

SET("internal_table" = '<?xml version="1.0" encoding="UTF-8"?>
   <searchTable>
      <entry>
         <search>!</search>
         <replace></replace>
      </entry>
      <entry>
         <search>@</search>
         <replace></replace>
      </entry>
      <entry>
         <search>#</search>
         <replace></replace>
      </entry>
      <entry>
         <search>$</search>
         <replace></replace>
      </entry>
      <entry>
         <search>%</search>
         <replace></replace>
      </entry>
      <entry>
         <search>^</search>
         <replace></replace>
      </entry>
      <entry>
         <search>&amp;amp;</search>
         <replace></replace>
      </entry>
      <entry>
         <search>*</search>
         <replace></replace>
      </entry>
      <entry>
         <search>(</search>
         <replace></replace>
      </entry>
      <entry>
         <search>)</search>
         <replace></replace>
      </entry>
      <entry>
         <search>-</search>
         <replace></replace>
      </entry>
      <entry>
         <search>_</search>
         <replace></replace>
      </entry>
      <entry>
         <search>=</search>
         <replace></replace>
      </entry>
      <entry>
         <search>+</search>
         <replace></replace>
      </entry>
      <entry>
         <search>/</search>
         <replace></replace>
      </entry>
      <entry>
         <search>\\</search>
         <replace></replace>
      </entry>
      <entry>
         <search>|</search>
         <replace></replace>
      </entry>
      <entry>
         <search>?</search>
         <replace></replace>
      </entry>
      <entry>
         <search>[</search>
         <replace></replace>
      </entry>
      <entry>
         <search>]</search>
         <replace></replace>
      </entry>
      <entry>
         <search>{</search>
         <replace></replace>
      </entry>
      <entry>
         <search>}</search>
         <replace></replace>
      </entry>
      <entry>
         <search>&amp;lt;</search>
         <replace></replace>
      </entry>
      <entry>
         <search>&amp;gt;</search>
         <replace></replace>
      </entry>
      <entry>
         <search>,</search>
         <replace></replace>
      </entry>
      <entry>
         <search>.</search>
         <replace></replace>
      </entry>
      <entry>
         <search>;</search>
         <replace></replace>
      </entry>
      <entry>
         <search>:</search>
         <replace></replace>
      </entry>
   </searchTable>', "run_as_separate_process" = 'no')
;

RETURN $return_str;

JMoroney (BOB member since 2011-07-29)

If you are using Oracle it may be worth your time to look into the REGEXP_REPLACE function that Oracle provides. You’ll have to use a SQL transform to push that function down to the database. I would be willing to bet a large sum of your salary that using the Oracle function will execute a lot faster than a custom function written within Data Services. It’s not functionality that can be used within the Query mapping tab like you requested, but it’s a lot simpler solution.


eganjp :us: (BOB member since 2007-09-12)

Haha. But Jim, by writing a Oracle Function and calling that in DI as a Database function. How effective would that be?


ganeshxp :us: (BOB member since 2008-07-17)

That’s why it needs to be in a SQL transform so it can be pushed down to the database.

select REGEXP_REPLACE('A(B)C*1\2/3', '[^A-Za-z0-9]', '') from dual

Return value: ABC123


eganjp :us: (BOB member since 2007-09-12)

a stored function imported (so you have to wrap the Oracle function into a stored procedure you wrote just calling it) and using that function in a mapping of a query that is pushed down, will generate a SQL where the stored function is included. Just to avoid using the SQL Transform…


Werner Daehn :de: (BOB member since 2004-12-17)

The nested syntax is incorrect in closing parathenses. It should be in this form:
replace_substr_ext( replace_substr_ext(replace_substr_ext(TXTMD, ‘&’, ‘AMPER’, null, null), ‘+’, ‘PLUS’, null, null), ‘%’, ‘PERCENT’, null, null)


grantdoug :us: (BOB member since 2009-10-23)

cheers jlynn73 you got me thinking again about this, the function i wrote at the end of last year was a monstosity and i was always meaning to go back and look at it but as usual i hadn’t bothered as the previous function worked.

so here is my revised function that does pretty much the same thing


# return only alphanumeric characters

#non alphanumeric ASCII Characters
#1-47, 58-64, 91-96, 123-255 
$V_LOOP       = 1;

while( $V_LOOP <= 255 )  #max-ascii-character
   begin
   	   #replace the loop character
   	  $IN_STR = replace_substr($IN_STR, chr($V_LOOP), '');
	  
	  #Increment $V_LOOP
	  $V_LOOP = decode(
					  	$V_LOOP = 47, 58,
						$V_LOOP = 64, 91,
						$V_LOOP = 96, 123,
						$V_LOOP + 1
					  );
   end

return $IN_STR;

I went for looping over the ASCII set rather than the characters as the fields i use this on are usually over 1000 characters long.

Thanks again
Dave


davc4 :uk: (BOB member since 2009-07-03)

Dave, in my query, for column with field how do i call this function in the ‘Map’ tab, i.e. how do I get the coding to act on my TABLE.COLUMN? Do I have to enter Variables for Job to run?

Also, I am unable to remove 0D0A CR/LF - i can only replace the CR. Any suggestions?

Thanks,

Doug

non alphanumeric ASCII Characters
#1-47, 58-64, 91-96, 123-255
$V_LOOP = 1;

while( $V_LOOP <= 255 ) #max-ascii-character
begin
#replace the loop character
$IN_STR = replace_substr($IN_STR, chr($V_LOOP), ‘’);

 #Increment $V_LOOP 
 $V_LOOP = decode( 
                $V_LOOP = 47, 58, 
              $V_LOOP = 64, 91, 
              $V_LOOP = 96, 123, 
              $V_LOOP + 1 
             ); 

end

return $IN_STR;


grantdoug :us: (BOB member since 2009-10-23)

What i posted was a custom function,

You need to set that up with the correct variables.
then call that function from your dataflow.

LF is ASCII character 10 and CR is ascii character 13

I haven’t done a lot of work with this and haven’t tested the function with it but it might work.

Is that enough information.


davc4 :uk: (BOB member since 2009-07-03)

I haven’t used a custom function before, can you give me the specific step-by-step implementation detail? Using your code in this post, what are the names of variables that have to be created? Are they entered in Global Variables at the job? Then, what is the code that must be typed in the map tab for my TABLE.COLUMN.


grantdoug :us: (BOB member since 2009-10-23)

Hi Grant

Go to the custom functions tab
Right Click select new just like everything else.

On the next screen type in the function name and a description for the function (use your companies naming structures)

Decide if you want the function to run in parrallel if so check the box.

Click next

Copy and paste the code below into the right hand pane

On the left hand pane choose the variables tab

Right Click on Parameters and select Insert
Name: $IN_STR
Datatype: Varchar
Parameter type Input
length whatever you want.

On the Return Parameter right click-> properties
Change the datatype to be varchar with the length the same as $IN_STR

on the Local Section right click -> Insert
Name: $V_LOOP
Datatype: int

Validate and make sure everything is okay.

and you can call this function from a dataflow or a script as normal

i.e.

in a script


print( your_function_name('!"£$%^&amp;*()_+-=@:This is left' ));

you might want to change the variable names around to suit your companies standards as this was jsut me testing out ideas,


davc4 :uk: (BOB member since 2009-07-03)

Incredibly helpful, thank for your generosity in sharing this expertise!


grantdoug :us: (BOB member since 2009-10-23)

Your hints are spot-on, I coded the Custom Function end-to-end in one step with no edits, verified, and it ran perfectly on the first Job execution. Thanks again.

My development test procedure 1) I know CR & other ASCII chars have been causing problems in >1,000,000 row R3 data flow 2) I copied 1mm row schema to Flat File Format 3) Using only 10 rows, synthetically inserted various non-alphanumeric characters in Hex editors by replacing text strings with 0A 0D etc. 4) Ran job and study the output.

As we know, the CR can cause DS to skip the row and throw an error, but load the rest of the rows. The coding shared here “re-concatenates” the text field, and all the rows are loaded.

However, when within a column of text, CR/LF is encountered together, DS will skip the row irregardless of the ASCII replacement coding… apparently instances of CR/LF are evaluated against the row itself, before our Custom Function can evaluate what is in the column.

Is there a solution to that problem?

For example, the incoming data might have an “address” column, with CR LF encoding.

Thanks all.


grantdoug :us: (BOB member since 2009-10-23)

Please take a look at this topic…

I have mentioned the way to do it in SAP function. And the way to do is that only as of now.


ganeshxp :us: (BOB member since 2008-07-17)

Thanks, Ganesh. Is there any way that we can insert your coding into in BOBJ ABAP dataflow, without creating a new development in R3 program?
In our case, we are reading raw data directly from BW PSA tables… not sure how we could even implement coding

REPLACE ALL OCCURRENCES OF cl_abap_char_utilities=>newline
IN <I_O_PARAMETER> WITH ’ '.
REPLACE ALL OCCURRENCES OF cl_abap_char_utilities=>cr_lf
IN <I_O_PARAMETER> WITH ’ '.

Thanks,

Doug


grantdoug :us: (BOB member since 2009-10-23)

On a different note. Pass this value to a sutom function and use the Oracle translate function.

Cheers,
Shazin


Shazin :india: (BOB member since 2011-07-19)