BusinessObjects Board

Entity Extraction - Sentiment extraction lacking context?

When using Entity Extraction to perform sentiment analysis on social media feedback, it seems that the sentiment analysis only picks up individual keywords and does not analyse the words in the overall context of the source data. Is this correct?

For instance, the phrase “it is hard not to like a video with cats” gets flagged as a negative sentiment. The word “hard” is flagged as MinorProblem, but I would expect Entity Extraction with the English Voice of the Customer rules to recognise a basic English structure such as “hard not to like” as WeakPositiveSentiment instead.

I fully realize that it will be near impossible for Entity Extraction to pick up sarcasm, but this is just basic English that should be processed properly … IF … Entity Extraction is “context aware”.

So the question remains: is Entity Extraction aware of the context of the entire source data, or is it merely flagging known keywords?

What is the recommended approach to ensure that such false alarms do not skew the overall results too much?

Edit:

And a second question: how can we make Entity Extraction deal with two different subjects within the same source/feed/context?

For instance, take this (real) tweet on Twitter:

Based on my custom dictionary, the Entity Extraction Transform picks up both Jetstar and Air NZ as two organizations of interest.

The Sentiment scores are positive and negative - but how can I associate the sentiment scores with each organization identified in this tweet?

(The VOC rule set did NOT pick up “is all class” as a positive sentiment for Air NZ. Instead, it picked up “Pure” and “Simple” as positive sentiments, even though they follow a very negative statement. This seems to show that sentiment extraction is done on keywords only and is not context aware?)


ErikR :new_zealand: (BOB member since 2007-01-10)

Our sentiment rules definitely take context into account. If it were a keyword search, how could it possibly figure out and classify the associated topics, problems and requests? It is an NLP engine that literally understands the meaning and context of information, not just the words themselves. However, some particular cases where context changes the meaning of a sentiment may be a) overlooked or b) hard to implement using the current rules. We will fix this particular example, “it is hard not to like a video with cats”, by preventing extraction of “hard” as MinorProblem in similar contexts.
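To illustrate the difference between flagging keywords and applying a context rule, here is a minimal Python sketch. It is not how the VOC rules are written (their rule syntax is not shown in this thread); the two-entry lexicon and the `hard not to <word>` pattern are hypothetical, chosen only to reproduce the cat example.

```python
import re

# Hypothetical mini-lexicon: word -> stance label (illustrative only,
# not the actual VOC dictionaries). Deliberately naive.
LEXICON = {
    "hard": "MinorProblem",
    "like": "WeakPositiveSentiment",
}

# Hypothetical context rule: in "hard not to <word>", "hard" is not a problem.
HARD_NOT_TO = re.compile(r"\bhard not to (\w+)", re.IGNORECASE)


def keyword_only(text: str) -> list:
    """Flag every lexicon word, ignoring its surroundings."""
    words = re.findall(r"\w+", text.lower())
    return [LEXICON[w] for w in words if w in LEXICON]


def context_aware(text: str) -> list:
    """Same lookup, but skip 'hard' when it starts 'hard not to <word>'."""
    # Character offsets where a "hard not to ..." match begins.
    suppressed = {m.start() for m in HARD_NOT_TO.finditer(text)}
    labels = []
    for m in re.finditer(r"\w+", text):
        word = m.group().lower()
        if word == "hard" and m.start() in suppressed:
            continue
        if word in LEXICON:
            labels.append(LEXICON[word])
    return labels
```

On the cat sentence, `keyword_only` returns both MinorProblem and WeakPositiveSentiment, while `context_aware` returns only WeakPositiveSentiment. The rule only looks a few words ahead of “hard”, which is why the distant-phrase case below is much harder to handle with patterns like this.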

However, there are cases where a distant phrase can affect the meaning of a sentiment:

I don’t agree with the fact that yesterday’s show was good.

Such cases are challenging to fix with our current VOC rules; a new VOC rule architecture we are implementing this year will address such issues.
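The distant-phrase problem can be shown with a toy polarity scorer. This Python sketch is hypothetical (toy lexicons, naive negation) and says nothing about how the new VOC architecture works; it only shows why neither a short look-back window nor a whole-sentence negation scope is right on its own.

```python
import re
from typing import Optional

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"good", "wonderful"}
NEGATORS = {"not", "never", "don't", "doesn't"}


def tokens(sentence: str) -> list:
    # Normalise typographic apostrophes so "don’t" matches "don't".
    return re.findall(r"[\w']+", sentence.lower().replace("\u2019", "'"))


def polarity(sentence: str, window: Optional[int] = None) -> int:
    """+1 per positive word, or -1 if a negator precedes it.

    window=None looks back to the start of the sentence;
    an integer looks back only that many tokens.
    """
    toks = tokens(sentence)
    score = 0
    for i, tok in enumerate(toks):
        if tok not in POSITIVE:
            continue
        start = 0 if window is None else max(0, i - window)
        negated = any(t in NEGATORS for t in toks[start:i])
        score += -1 if negated else 1
    return score
```

With `window=3`, “I don't agree with the fact that yesterday's show was good.” scores +1 because “don't” is too far from “good”. With sentence scope it scores -1, but the same broad scope wrongly flips “It was not late, and the crew was wonderful.” Telling these apart requires knowing what the negator actually governs.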

Regarding the second question: yes, the sentiment rules can extract two Topic/Sentiment pairs from the same context. It wasn’t recognized here because “all class” is not in our lexicons, as it is not a very frequent expression. If we change “all class” to something more frequent, we get two Sentiment entities, each with its own Topic/Stance:

Jetstar is an Australian embarrassment, pure and simple! Air New Zealand is wonderful! http://t.co/dcduJRcXek

Entity Mention Text: “Jetstar is an Australian embarrassment, pure and simple!”
Label Path(s): Sentiment
…
    Subentity Mention Text:   “embarrassment”
        Label Path(s):        StrongNegativeSentiment
    …
    Subentity Mention Text:   “pure”
        Label Path(s):        WeakPositiveSentiment
    …
    Subentity Mention Text:   “simple”
        Label Path(s):        WeakPositiveSentiment
    …

Entity Mention Text: “Air New Zealand is wonderful! http://t.co/dcduJRcXek”
Label Path(s): Sentiment
Source: ExtractionRule
…
    Subentity Mention Text:   “Air New Zealand”
        Label Path(s):        Topic
    …
    Subentity Mention Text:   “wonderful”
        Label Path(s):        StrongPositiveSentiment
    …

We will fix this problem, as well as “pure” and “simple” being mistakenly extracted as positive sentiments. Thanks for the feedback.


alwaite (BOB member since 2013-03-11)

Here are some more tweets that, overall, receive a negative sentiment instead of a positive one.

What can I do to get more reliable results from the sentiment analysis?


ErikR :new_zealand: (BOB member since 2007-01-10)

In addition, I have ensured that all topics are in my dictionary: I have associated the Twitter accounts (@Jetstar, @AirNZ, etc.) with the corresponding airlines.

The entity extraction transform now identifies that there are two different organisations, which is good. But it does not seem to establish which sentiment belongs to which topic: the only parent_id columns populated are those of the sentiment extractions. Or is this done per ‘sentence’?


ErikR :new_zealand: (BOB member since 2007-01-10)

Erik, just so you know: Anthony is my counterpart in product management for Text Analytics.


Werner Daehn :de: (BOB member since 2004-12-17)

  1. A very different way of demonstrating something quite boring but necessary. Well done Air New Zealand ad team! http://t.co/wgtgYB1A8K

Two sentiments are extracted from this example: “quite boring” as a Negative sentiment and “Well done” as a Positive one. One can see that this tweet is positive overall, but “quite boring” does describe the user’s attitude to some information, which is why that sentiment is extracted. What can be done in this case is to extract Air New Zealand as the Topic of “well done”, which would explicitly tie “Air New Zealand” to a positive sentiment. We will look into modifying the corresponding rules.

  2. @FlyAirNZ reports rise in earnings despite economic climate - Breaking #Travel News : http://t.co/4ousm5iVVF

Yes, it shouldn’t extract “Breaking” as a MajorProblem. Apart from that, there is no sentiment in this tweet: “rise in earnings” is not a Sentiment according to our rules; it’s a fact.

  3. Hate to be unpatriotic but Air New Zealand smashes Qantas for service and friendliness. http://t.co/vTFNAs8Scf

In this example, “Hate” is extracted as a Negative sentiment and “to be unpatriotic” as a Topic. Usually this structure represents a true sentiment, as in “I hate to watch movies”. In this case, however, it is more a figure of speech. We need to investigate this further to see how we can distinguish such cases. As for “smashes” not being extracted as a positive sentiment: this verb is ambiguous, as it can express either a negative or a positive sentiment depending on the context. We are looking into this problem.

As for your other question: “The entity extraction transform now identifies that there are two different organisations - which is good. But it is not establishing which sentiment belongs to which topic? The only parent_id columns populated are those of the sentiment extractions. Or is this done per ‘sentence’?”

Our Voice of the Customer module extracts one large Sentiment entity (for English, its span is a sentence) and several subentities: Stances (WeakPositiveSentiment, StrongPositiveSentiment, etc.) and Topics. If a Sentiment entity has only one Stance and one Topic, it’s easy to make the connection:

Ex.
Entity Mention Text: “Air New Zealand is wonderful.”
Label Path(s): Sentiment

    Subentity Mention Text:   "Air New Zealand"
        Label Path(s):        Topic

    Subentity Mention Text:   "wonderful"
        Label Path(s):        StrongPositiveSentiment
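The pairing rule described above (one Stance plus one Topic under the same Sentiment entity) can be applied to the transform's output with a parent/child join. This Python sketch assumes a hypothetical flattened row format (`id`, `parent_id`, `label`, `text`); the actual output columns of the Entity Extraction transform may be named differently. The sample rows transcribe only the subentities listed for the Jetstar / Air New Zealand example earlier in this thread.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Only the stance labels that appear in this thread; extend as needed.
STANCES = {
    "StrongPositiveSentiment",
    "WeakPositiveSentiment",
    "StrongNegativeSentiment",
}


@dataclass
class Row:
    """One extracted entity or subentity (hypothetical flattened format)."""
    id: int
    parent_id: Optional[int]  # None for a top-level Sentiment entity
    label: str
    text: str


def pair_topics(rows: List[Row]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (topic, stance) pairs for unambiguous Sentiment entities,
    plus the text of Sentiment entities that could not be paired."""
    children = defaultdict(list)
    for row in rows:
        if row.parent_id is not None:
            children[row.parent_id].append(row)

    paired, unresolved = [], []
    for row in rows:
        if row.label != "Sentiment":
            continue
        kids = children[row.id]
        topics = [k.text for k in kids if k.label == "Topic"]
        stances = [k.label for k in kids if k.label in STANCES]
        if len(topics) == 1 and len(stances) == 1:
            paired.append((topics[0], stances[0]))
        else:
            # Several stances and/or topics: leave for review instead of guessing.
            unresolved.append(row.text)
    return paired, unresolved


rows = [
    Row(1, None, "Sentiment", "Jetstar is an Australian embarrassment, pure and simple!"),
    Row(2, 1, "StrongNegativeSentiment", "embarrassment"),
    Row(3, 1, "WeakPositiveSentiment", "pure"),
    Row(4, 1, "WeakPositiveSentiment", "simple"),
    Row(5, None, "Sentiment", "Air New Zealand is wonderful! http://t.co/dcduJRcXek"),
    Row(6, 5, "Topic", "Air New Zealand"),
    Row(7, 5, "StrongPositiveSentiment", "wonderful"),
]
paired, unresolved = pair_topics(rows)
print(paired)      # [('Air New Zealand', 'StrongPositiveSentiment')]
print(unresolved)  # ['Jetstar is an Australian embarrassment, pure and simple!']
```

The Jetstar sentence has three stances and no listed Topic, so the sketch reports it as unresolved rather than guessing which organisation each stance belongs to.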

Thanks again for the feedback; we have incorporated it into our testing corpora.


alwaite (BOB member since 2013-03-11)

Here is another example of a tweet that is rated as a strongly positive sentiment:

Clearly this is not very positive. I can see how it gets confused by “Thanks”, but should VOC not pick up “Ripping us off” as very negative?

And because VOC did not pick up “lovin’” as “loving” (!), the tweet below is rated strongly negative even though it is clearly positive:

This tweet was also rated very negatively, even though “bloody brilliant” is clearly positive:

And another sign that the social media component is not really up to date :slight_smile:

Clearly WICKED means absolutely fantastic! :slight_smile:
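One possible workaround for slang and dropped g's is to normalise tweets before they reach Entity Extraction. This Python sketch is a hypothetical pre-processing step, not a VOC feature; the replacement list is illustrative only, and entries such as “wicked” are themselves ambiguous (compare the “smashes” discussion above), so a list like this would need to be curated per source.

```python
import re

# Hypothetical pre-processing, applied to tweets before Entity Extraction.
# "lovin'" / "lovin’" -> "loving"; the lookahead skips possessives like "Erin's".
G_DROPPING = re.compile(r"\b(\w+in)['\u2019](?!\w)")

# Illustrative slang rewrites (pattern -> replacement), matched case-insensitively.
PHRASES = {
    r"\bbloody brilliant\b": "brilliant",  # drop the intensifier
    r"\bwicked\b": "fantastic",            # ambiguous: only safe for slang-heavy sources
}


def normalize(text: str) -> str:
    """Rewrite dropped-g words and known slang into standard English."""
    text = G_DROPPING.sub(r"\1g", text)
    for pattern, replacement in PHRASES.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text
```

For example, `normalize("Lovin' the new seats")` returns `"Loving the new seats"`, and `normalize("That crew is bloody brilliant!")` returns `"That crew is brilliant!"`.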

And here is a very interesting issue:

It has detected a negative smiley in this part: “(the moa-mama) : ”, even though there is a space between the two characters.

Does it really consider “) :” a smiley?

I could understand it treating the reverse order, “:(”, as a negative smiley, but the bracket-space-colon construction is frequently used in regular sentences without any intention of being an emoticon.
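A detector that only accepts adjacent characters would not fire on “) :”. This hypothetical Python pattern (not the rule VOC actually uses, which is not shown in this thread) illustrates the idea. It also skips a bare “):”, because that is ordinary punctuation in text such as “(see below): ”.

```python
import re

# Hypothetical negative-emoticon pattern; the characters must be adjacent.
#   ":(" or ":-("  -> must not be followed by a word character ("note:(see" is skipped)
#   ")-:"          -> reversed frowny, only with a nose, because a bare "):"
#                     is ordinary punctuation in "(see below): ..."
NEGATIVE_EMOTICON = re.compile(r":-?\((?!\w)|\)-:")


def has_negative_emoticon(text: str) -> bool:
    """True if the text contains one of the adjacent negative emoticons above."""
    return NEGATIVE_EMOTICON.search(text) is not None
```

Dropping the bare “):” form trades a little recall for far fewer false positives in parenthetical text.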


ErikR :new_zealand: (BOB member since 2007-01-10)

This thread is giving me a huge headache!

Does that come out as positive or negative sentiment? :smiley:


eganjp :us: (BOB member since 2007-09-12)

Anthony,

  1. When will this new rule architecture be available?
  2. How difficult will it be to transition to?
  3. How difficult will it be to transition to if we have already been making manual changes to the rules?
  4. Will the VOC changes be made to TDP in Data Services as well as VOC in HANA?
  5. Can you provide any more details?

eganjp :us: (BOB member since 2007-09-12)