When using Entity Extraction to perform sentiment analysis on social media feedback, it seems that the sentiment analysis only picks up individual keywords but does not analyse the words in the overall context of the source data. Is this correct?
For instance, the phrase “it is hard not to like a video with cats” gets flagged as a negative sentiment. The word “hard” is flagged as MinorProblem, but I would expect Entity Extraction with English Voice of the Customer rules to recognise a basic English structure such as “hard not to like” as WeakPositiveSentiment instead.
I fully realize that it will be near impossible for Entity Extraction to pick up sarcasm, but this is just basic English that should be properly processed … IF … Entity Extraction is “context aware”.
So the question remains: is Entity Extraction aware of the context of the entire source data, or is it merely flagging known keywords?
What is the recommended approach to ensure that such false alarms do not skew the overall results too much?
Edit:
And a second question: how can we make Entity Extraction deal with two different subjects within the same source/feed/context?
For instance, take this (real) tweet on Twitter:
Based on my custom dictionary, the Entity Extraction Transform picks up both Jetstar and Air NZ as two organizations of interest.
The sentiment scores are positive and negative, but how can I associate the sentiment scores with each organization identified in this tweet?
(The VOC rule set did NOT pick up “all class” as positive sentiment for Air NZ. Instead it picked up “Pure” and “Simple” as positive sentiments, despite their following a very negative sentiment. This just seems to demonstrate that the sentiment extraction is done on keywords only and is not context aware?)
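In the meantime, one workaround I am considering is pairing each sentiment mention with the nearest organization mention by character offset. This is only an illustrative sketch; the (text, offset) tuples are hand-made stand-ins, not the transform’s actual output:

```python
# Illustrative workaround: associate each sentiment mention with the
# nearest organization mention by character offset within one tweet.
# The (text, offset) tuples below are hand-made examples, not real
# transform output.

def pair_sentiments_with_orgs(orgs, sentiments):
    """orgs, sentiments: lists of (text, char_offset) tuples."""
    pairs = []
    for s_text, s_off in sentiments:
        nearest_org = min(orgs, key=lambda o: abs(o[1] - s_off))
        pairs.append((nearest_org[0], s_text))
    return pairs

orgs = [("Jetstar", 0), ("Air New Zealand", 48)]
sentiments = [("embarrassment", 22), ("wonderful", 67)]
print(pair_sentiments_with_orgs(orgs, sentiments))
# → [('Jetstar', 'embarrassment'), ('Air New Zealand', 'wonderful')]
```

This heuristic obviously breaks down for long or convoluted sentences, hence the question about a proper, built-in association.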
Our sentiment rules definitely take context into account. If it were a keyword search, how could it possibly figure out and classify the associated topics, problems and requests? It is an NLP engine that understands the meaning and context of information, not just the words themselves. However, some particular cases where context changes the meaning of a sentiment may be a) overlooked or b) hard to implement using the current rules. We will fix this particular example (“it is hard not to like a video with cats”) by preventing extraction of “hard” as MinorProblem in similar contexts.
However, there are cases where a distant phrase can affect the meaning of a sentiment:
I don’t agree with the fact that yesterday’s show was good.
Such cases are challenging to fix using our current VOC rules; a new VOC rule architecture we are implementing this year will address such issues.
Regarding the second question: yes, the sentiment rules can extract two Topic/Sentiment pairs from the same context. The reason it wasn’t recognized here is that “all class” was not in our lexicons, as it is not a very frequent expression. Thus, if we change “all class” to something more frequent, we’ll get two Sentiment entities, each having its own Topic/Stance:
Jetstar is an Australian embarrassment, pure and simple! Air New Zealand is wonderful! http://t.co/dcduJRcXek
Entity Mention Text: “Jetstar is an Australian embarrassment, pure and simple!”
In addition, I have ensured that all topics are in my dictionary - I have associated the tweet accounts (@Jetstar, @AirNZ etc) with the corresponding airlines.
The entity extraction transform now identifies that there are two different organisations, which is good. But it does not establish which sentiment belongs to which topic. The only parent_id columns populated are those of the sentiment extractions. Or is this done per ‘sentence’?
A very different way of demonstrating something quite boring but necessary. Well done Air New Zealand ad team! http://t.co/wgtgYB1A8K
Two sentiments are extracted from this example: “quite boring” as a Negative sentiment and “Well done” as a Positive one. One can understand that this tweet is overall positive, but the fact is that “quite boring” does describe the user’s attitude to some information; that’s why the sentiment is extracted. What can be done in this case is extracting “Air New Zealand” as a topic for “well done”. This will explicitly tie Air New Zealand to a positive sentiment. We will look into modifying the corresponding rules.
Yes, it shouldn’t extract “Breaking” as a MajorProblem. Otherwise, there’s no other sentiment in this tweet. “rise in earnings” is not a Sentiment according to our rules; it’s a fact.
Hate to be unpatriotic but Air New Zealand smashes Qantas for service and friendliness. http://t.co/vTFNAs8Scf
In this example “Hate” is extracted as a Negative sentiment, and “to be unpatriotic” as a Topic. Usually this structure represents a true sentiment, as in “I hate to watch movies”, etc. However, in this case it’s more a figure of speech. We need to investigate this problem more to see how we can distinguish such cases. As to “smashes” not being extracted as a positive sentiment: this verb is ambiguous; it can carry both negative and positive sentiments in different contexts. We are looking into this problem.
As for: “The entity extraction transform now identifies that there are two different organisations, which is good. But it does not establish which sentiment belongs to which topic. The only parent_id columns populated are those of the sentiment extractions. Or is this done per ‘sentence’?”
Our Voice of the Customer module extracts a big Sentiment entity (in the case of the English language, its span is a sentence) and several subentities: Stances (WeakPositiveSentiment, StrongPositiveSentiment, etc.) and Topics. If a Sentiment entity has only one Stance and one Topic, it’s easy to make the connection:
Ex.
Entity Mention Text: “Air New Zealand is wonderful.”
Label Path(s): Sentiment
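When both subentities share one parent Sentiment entity, the linkage can be sketched roughly like this (a minimal illustration; the row fields id/parent_id/type/text are assumed names, not the transform’s documented output schema):

```python
from collections import defaultdict

# Minimal illustration of linking Stance and Topic subentities through
# their parent Sentiment entity. Field names (id, parent_id, type, text)
# are assumptions, not the transform's documented schema.

def link_unambiguous(rows):
    """Return (topic_text, stance_type) pairs for Sentiment entities
    that have exactly one Topic and one Stance subentity."""
    children = defaultdict(list)
    for row in rows:
        if row["parent_id"] is not None:
            children[row["parent_id"]].append(row)
    pairs = []
    for row in rows:
        if row["type"] != "Sentiment":
            continue
        subs = children[row["id"]]
        topics = [s for s in subs if s["type"] == "Topic"]
        stances = [s for s in subs if s["type"] != "Topic"]
        if len(topics) == 1 and len(stances) == 1:
            pairs.append((topics[0]["text"], stances[0]["type"]))
    return pairs

rows = [
    {"id": 1, "parent_id": None, "type": "Sentiment",
     "text": "Air New Zealand is wonderful."},
    {"id": 2, "parent_id": 1, "type": "Topic", "text": "Air New Zealand"},
    {"id": 3, "parent_id": 1, "type": "StrongPositiveSentiment",
     "text": "wonderful"},
]
print(link_unambiguous(rows))
# → [('Air New Zealand', 'StrongPositiveSentiment')]
```

With several Stances or Topics under one parent the pairing is ambiguous, which is exactly the hard case discussed above.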
Here is another example of a tweet that is rated as strong positive sentiment:
Clearly this is not very positive. I can see how it gets confused by “Thanks”, but should VOC not pick up “Ripping us off” as very negative?
And because VOC did not pick up “lovin’” as “loving” (!), the tweet below is rated strongly negative even though it is clearly positive:
This tweet was also rated very negatively, even though “bloody brilliant” is clearly positive:
And another sign that the social media component is not really of today’s age:
Clearly “WICKED” means “absolutely fantastic”!
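Until the lexicons cover such slang, a pre-processing pass that normalises spellings before extraction might reduce these misses. An illustrative sketch only; the replacement map is my own example list, not anything shipped with the product:

```python
import re

# Illustrative pre-processing: map slang spellings to forms a sentiment
# lexicon is more likely to know. The SLANG map is my own example list,
# not anything shipped with the product.
SLANG = {
    "lovin'": "loving",
    "wicked": "fantastic",
    "bloody brilliant": "brilliant",
}

def normalise(text):
    for slang, standard in SLANG.items():
        pattern = re.escape(slang)
        if slang[0].isalnum():
            pattern = r"\b" + pattern   # don't match inside other words
        if slang[-1].isalnum():
            pattern = pattern + r"\b"
        text = re.sub(pattern, standard, text, flags=re.IGNORECASE)
    return text

print(normalise("Not lovin' this ad? It's WICKED!"))
# → Not loving this ad? It's fantastic!
```

The obvious risk is over-normalising (e.g. “wicked” used literally), so such a map would need to stay small and domain-specific.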
And here is a very interesting issue:
It has detected a negative smiley in this part " (the moa-mama) : " … even though there is a space between the two characters.
Does it consider “) :” a smiley?
I could understand if it considered the characters the other way around, “: (”, a negative smiley, but the bracket-space-colon construction “) :” is frequently used in regular sentences, without intending any sort of emoticon.