When using Entity Extraction to perform sentiment analysis on social media feedback, it seems that the sentiment analysis only picks up individual keywords but does not analyse the words in the overall context of the source data. Is this correct?
For instance, the phrase “it is hard not to like a video with cats” gets flagged as a negative sentiment. The word “hard” is flagged as MinorProblem, but I would expect Entity Extraction with English Voice of the Customer rules to recognise a basic English structure such as “hard not to like” as WeakPositiveSentiment instead.
I fully realize that it will be near impossible for Entity Extraction to pick up sarcasm, but this is just basic English that should be properly processed … IF … Entity Extraction is “context aware”.
So the question remains: is Entity Extraction aware of the context of the entire source data, or is it merely flagging known keywords?
What is the recommended approach to ensure that such false alarms do not skew the overall results too much?
Edit:
And a second question: how can we make Entity Extraction deal with two different subjects within the same source/feed/context?
For instance, take this (real) tweet on Twitter:
Based on my custom dictionary, the Entity Extraction Transform picks up both Jetstar and Air NZ as two organizations of interest.
The sentiment scores are positive and negative, but how can I associate the sentiment scores with each organization identified in this tweet?
(The VOC rule set did NOT pick up “all class” as positive sentiment for Air NZ. Instead it picked up “Pure” and “Simple” as positive sentiments, despite their following a very negative sentiment. This just seems to demonstrate that the sentiment extraction is done on keywords only and is not context aware?)
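In the meantime, one workaround I am considering is pairing each sentiment mention with the nearest organization mention by character offset. This is only an illustrative sketch; the (text, offset) tuples are hand-made stand-ins, not the transform’s actual output:

```python
# Illustrative workaround: associate each sentiment mention with the
# nearest organization mention by character offset within one tweet.
# The (text, offset) tuples below are hand-made examples, not real
# transform output.

def pair_sentiments_with_orgs(orgs, sentiments):
    """orgs, sentiments: lists of (text, char_offset) tuples."""
    pairs = []
    for s_text, s_off in sentiments:
        nearest_org = min(orgs, key=lambda o: abs(o[1] - s_off))
        pairs.append((nearest_org[0], s_text))
    return pairs

orgs = [("Jetstar", 0), ("Air New Zealand", 48)]
sentiments = [("embarrassment", 22), ("wonderful", 67)]
print(pair_sentiments_with_orgs(orgs, sentiments))
# → [('Jetstar', 'embarrassment'), ('Air New Zealand', 'wonderful')]
```

This heuristic obviously breaks down for long or convoluted sentences, hence the question about a proper, built-in association.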
Our sentiment rules definitely take context into account. If it were a keyword search, how could it possibly figure out and classify the associated topics, problems and requests? It is an NLP engine that understands the meaning and context of information, not just the words themselves. However, some particular cases where context changes the meaning of a sentiment may be a) overlooked or b) hard to implement using the current rules. We will fix this particular example (“it is hard not to like a video with cats”) by preventing extraction of “hard” as MinorProblem in similar contexts.
However, there are cases where a distant phrase can affect the meaning of a sentiment:
I don’t agree with the fact that yesterday’s show was good.
Such cases are challenging to fix using our current VOC rules; a new VOC rule architecture we are implementing this year will address such issues.
Regarding the second question: yes, the sentiment rules can extract two Topic/Sentiment pairs from the same context. The reason it wasn’t recognized here is that “all class” was not in our lexicons, as it is not a very frequent expression. Thus, if we change “all class” to something more frequent, we’ll get two Sentiment entities, each having its own Topic/Stance:
Jetstar is an Australian embarrassment, pure and simple! Air New Zealand is wonderful! http://t.co/dcduJRcXek
Entity Mention Text: “Jetstar is an Australian embarrassment, pure and simple!”
In addition, I have ensured that all topics are in my dictionary - I have associated the tweet accounts (@Jetstar, @AirNZ etc) with the corresponding airlines.
The entity extraction transform now identifies that there are two different organisations, which is good. But it does not establish which sentiment belongs to which topic. The only parent_id columns populated are those of the sentiment extractions. Or is this done per ‘sentence’?
A very different way of demonstrating something quite boring but necessary. Well done Air New Zealand ad team! http://t.co/wgtgYB1A8K
Two sentiments are extracted from this example: “quite boring” as a Negative sentiment and “Well done” as a Positive one. One can understand that this tweet is overall positive, but the fact is that “quite boring” does describe the user’s attitude to some information; that’s why the sentiment is extracted. What can be done in this case is extracting “Air New Zealand” as a topic for “well done”. This will explicitly tie Air New Zealand to a positive sentiment. We will look into modifying the corresponding rules.
Yes, it shouldn’t extract “Breaking” as a MajorProblem. Otherwise, there’s no other sentiment in this tweet. “rise in earnings” is not a Sentiment according to our rules; it’s a fact.
Hate to be unpatriotic but Air New Zealand smashes Qantas for service and friendliness. http://t.co/vTFNAs8Scf
In this example “Hate” is extracted as a Negative sentiment, and “to be unpatriotic” as a Topic. Usually this structure represents a true sentiment, as in “I hate to watch movies”, etc. However, in this case it’s more a figure of speech. We need to investigate this problem more to see how we can distinguish such cases. As to “smashes” not being extracted as a positive sentiment: this verb is ambiguous; it can carry both negative and positive sentiments in different contexts. We are looking into this problem.
As for: “The entity extraction transform now identifies that there are two different organisations, which is good. But it does not establish which sentiment belongs to which topic. The only parent_id columns populated are those of the sentiment extractions. Or is this done per ‘sentence’?”
Our Voice of the Customer module extracts a big Sentiment entity (in the case of the English language, its span is a sentence) and several subentities: Stances (WeakPositiveSentiment, StrongPositiveSentiment, etc.) and Topics. If a Sentiment entity has only one Stance and one Topic, it’s easy to make the connection:
Ex.
Entity Mention Text: “Air New Zealand is wonderful.”
Label Path(s): Sentiment
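When both subentities share one parent Sentiment entity, the linkage can be sketched roughly like this (a minimal illustration; the row fields id/parent_id/type/text are assumed names, not the transform’s documented output schema):

```python
from collections import defaultdict

# Minimal illustration of linking Stance and Topic subentities through
# their parent Sentiment entity. Field names (id, parent_id, type, text)
# are assumptions, not the transform's documented schema.

def link_unambiguous(rows):
    """Return (topic_text, stance_type) pairs for Sentiment entities
    that have exactly one Topic and one Stance subentity."""
    children = defaultdict(list)
    for row in rows:
        if row["parent_id"] is not None:
            children[row["parent_id"]].append(row)
    pairs = []
    for row in rows:
        if row["type"] != "Sentiment":
            continue
        subs = children[row["id"]]
        topics = [s for s in subs if s["type"] == "Topic"]
        stances = [s for s in subs if s["type"] != "Topic"]
        if len(topics) == 1 and len(stances) == 1:
            pairs.append((topics[0]["text"], stances[0]["type"]))
    return pairs

rows = [
    {"id": 1, "parent_id": None, "type": "Sentiment",
     "text": "Air New Zealand is wonderful."},
    {"id": 2, "parent_id": 1, "type": "Topic", "text": "Air New Zealand"},
    {"id": 3, "parent_id": 1, "type": "StrongPositiveSentiment",
     "text": "wonderful"},
]
print(link_unambiguous(rows))
# → [('Air New Zealand', 'StrongPositiveSentiment')]
```

With several Stances or Topics under one parent the pairing is ambiguous, which is exactly the hard case discussed above.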
Here is another example of a tweet that is rated as strong positive sentiment:
Clearly this is not very positive. I can see how it gets confused by “Thanks”, but should VOC not pick up “Ripping us off” as very negative?
And because VOC did not pick up “lovin’” as “loving” (!), the tweet below is rated strongly negative even though it is clearly positive:
This tweet was also rated very negatively, even though “bloody brilliant” is clearly positive:
And another sign that the social media component is not really of today’s age:
Clearly “WICKED” means “absolutely fantastic”!
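Until the lexicons cover such slang, a pre-processing pass that normalises spellings before extraction might reduce these misses. An illustrative sketch only; the replacement map is my own example list, not anything shipped with the product:

```python
import re

# Illustrative pre-processing: map slang spellings to forms a sentiment
# lexicon is more likely to know. The SLANG map is my own example list,
# not anything shipped with the product.
SLANG = {
    "lovin'": "loving",
    "wicked": "fantastic",
    "bloody brilliant": "brilliant",
}

def normalise(text):
    for slang, standard in SLANG.items():
        pattern = re.escape(slang)
        if slang[0].isalnum():
            pattern = r"\b" + pattern   # don't match inside other words
        if slang[-1].isalnum():
            pattern = pattern + r"\b"
        text = re.sub(pattern, standard, text, flags=re.IGNORECASE)
    return text

print(normalise("Not lovin' this ad? It's WICKED!"))
# → Not loving this ad? It's fantastic!
```

The obvious risk is over-normalising (e.g. “wicked” used literally), so such a map would need to stay small and domain-specific.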
And here is a very interesting issue:
It has detected a negative smiley in this part " (the moa-mama) : " … even though there is a space between the two characters.
Does it consider “) :” a smiley?
I could understand if it considered the characters the other way around, “: (”, a negative smiley, but the bracket-space-colon construction “) :” is frequently used in regular sentences, without intending any sort of emoticon.