BusinessObjects Board

Text Data Processing bug in DS 4.2?

Hi all,

For one of our customers, we have developed a social media analysis/monitoring application using SAP Data Services. This solution continuously extracts data from Twitter for analysis, alerting and reporting.

This solution was developed in SAP Data Services 4.1 and has been running fine for a long time now. However, our customer recently upgraded to SAP Data Services 4.2 SP3 (14.2.3.549) and is now experiencing issues with their Social Media Analysis jobs. (The repositories were correctly upgraded to DS 4.2 using the Repository Manager, and everything else works fine.)

It seems that at random times, the Text Data Processing Transformation spits out this error:


6840       7364       DQX-058306       20/10/2014 12:56:04 p.m.             |Sub data flow DF_Twitter_Text_Analytics_1_2|Transform Base_EntityExtraction
6840       7364       DQX-058306       20/10/2014 12:56:04 p.m.             Transform <Base_EntityExtraction>: Internal format conversion error processing <8584>.

(The number in angle brackets at the end, 8584 here, is an internal ID of the tweet being processed.)
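
For anyone who wants to collect the affected tweet IDs from a trace/error log rather than reading them off by hand, here is a minimal Python sketch. It assumes the message text looks exactly like the excerpt above; the function name failing_ids and the idea of feeding it log lines as plain strings are illustrative assumptions, not part of Data Services.

```python
import re

# Pattern modelled on the error message quoted above (an assumption:
# other DS versions or locales may word the message differently).
ERROR_RE = re.compile(
    r"Transform <(?P<transform>[^>]+)>: "
    r"Internal format conversion error processing <(?P<row_id>\d+)>"
)


def failing_ids(log_lines):
    """Return the unique record IDs named in conversion-error lines,
    in order of first appearance."""
    seen = []
    for line in log_lines:
        match = ERROR_RE.search(line)
        if match and match.group("row_id") not in seen:
            seen.append(match.group("row_id"))
    return seen


sample = [
    "6840  7364  DQX-058306  20/10/2014 12:56:04 p.m.  "
    "|Sub data flow DF_Twitter_Text_Analytics_1_2|Transform Base_EntityExtraction",
    "6840  7364  DQX-058306  20/10/2014 12:56:04 p.m.  "
    "Transform <Base_EntityExtraction>: Internal format conversion error processing <8584>.",
]
print(failing_ids(sample))
```

The resulting list can then be joined back to the source table to pull the text of every tweet that triggers the error.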

Further investigation shows that the error is not random but always happens with specific tweets/text content. If we re-process the same data in our development environment, we can reproduce the same error again and again, with the same tweets.

The issue only occurs with a relatively small number of the tweets being processed - perhaps less than 5% of the data.

Here are some of the text samples of these tweets:


@Calfreezy love your vid with W2S in apartment tour ! sick set up as weell !!!!! :P
@docfreeride I’m sure it eases the pain. Whether it fixes the problem or even adds another one (hangover) is another matter :-D
@mcnfreedom good feel sick :/
RT @picture_window: the @nzherald is really on form today publishing all of the chronically irrelevant opinions of chronically irrelevant p…
The free RDU app and RDUnited - $2 off house beer, wine and spirits at Dux live.  Android https://t.co/PrFcHfDcZY
The free RDU app and RDUnited - $2 off house beer, wine and spirits at Dux live.  iPhone https://t.co/lxddJ3KDZh
The free RDU app and RDUnited – 20% off all bottled beer at @threeboysbrew  Android https://t.co/PrFcHfDcZY
The free RDU app and RDUnited – 20% off all bottled beer at @threeboysbrew Android https://t.co/PrFcHfDcZY
The free RDU app and RDUnited – 20% off all bottled beer at Three Boys Brewery.  iPhone https://t.co/lxddJ3KDZh
The free RDU app and RDUnited – Free Tequila or beer with med Mexi Lime Chicken pizza at Winnies City.  iPhone https://t.co/tuN1AxBvDA
The free RDU app and RDUnited– Free Tequila or beer with med Mexi Lime Chicken pizza at Winnies City.  Android https://t.co/PrFcHfDcZY
The picture they used was of her at a club in Auckland, taken over a year ago, while she was drunk. Personally, that's disgusting ethics.
Win free tickets to @DocEdge festival at @MiramarTheRoxy (+ wine specials for all documentary festival-goers) - http://t.co/0ZLFvlbYdz
Win free tickets to @DocEdge festival at @MiramarTheRoxy + wine &amp;amp; lunch specials for all documentary festival-goers - http://t.co/0ZLFvlbYdz

To see if this was an upgrade/conversion issue, I created a new Data Flow in DS 4.2 with a new TDP transformation that does not reference the existing Rule and Dictionary files we are using, and used one of the failing tweets, picked at random, as the source. I got the exact same error as above.

The tweet I used was:

The picture they used was of her at a club in Auckland, taken over a year ago, while she was drunk. Personally, that's disgusting ethics.

I ensured that no special characters were in the tweet but still got the error. I then only used the first four words and STILL got the error. However, when I changed “The picture[…]” to “A picture” or “Them picture” or anything else BUT “The picture”… the error went away!?
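
The trimming I did by hand can be automated. Below is a minimal Python sketch that shrinks a failing text to its shortest failing word prefix. The callback fails is hypothetical: in practice it would have to run the text through the TDP transform and report whether the job errored, which is not shown here. The stand_in_fails function only mimics the behaviour described above so that the sketch runs on its own.

```python
def shortest_failing_prefix(text, fails):
    """Return the shortest whitespace-delimited word prefix of `text`
    for which `fails(prefix)` is true, or None if no prefix fails."""
    words = text.split()
    for count in range(1, len(words) + 1):
        candidate = " ".join(words[:count])
        if fails(candidate):
            return candidate
    return None


def stand_in_fails(text):
    # Stand-in only: imitates the observation that texts starting with
    # "The picture" fail. A real check would execute the data flow.
    return text.startswith("The picture")


tweet = (
    "The picture they used was of her at a club in Auckland, taken over "
    "a year ago, while she was drunk. Personally, that's disgusting ethics."
)
print(shortest_failing_prefix(tweet, stand_in_fails))
```

A linear scan over prefixes is used instead of a binary search because there is no guarantee that every longer prefix of a failing text also fails.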

Is this a bug of sorts that was introduced in DS 4.2?

I was able to reproduce the very same error in our own SAP Data Services 4.2 SP2 environment, but I cannot reproduce it in SAP Data Services 4.1 SP2 (14.1.2.378).

Has anyone any idea what is causing this problem?


ErikR :new_zealand: (BOB member since 2007-01-10)

We raised an SAP incident for this. SAP acknowledged that this is a bug and said it should be fixed in SP4, which has just been released.

I have not yet had the opportunity to test whether SP4 does indeed resolve this issue, but I hope to be able to do so shortly.

For more information, please refer to the SP4 release notes and SAP Note 2110602 (which only says that it's a bug and it has been fixed… )


ErikR :new_zealand: (BOB member since 2007-01-10)