Challenges of using Twitter for sentiment analysis

Posted on 11-Feb-2017


Studying sentiment on social media

Ana Isabel Canhoto - Oxford Brookes University, www.anacanhoto.com

Canhoto 2015

Why study sentiment?

Emotions have an impact on:
• Information retrieval
• Information processing
• Information retention
• Decision-making
• Behaviour
• Assessment of consumption experiences

Image source: http://images.flatworldknowledge.com/sirgy/sirgy-fig06_x003.jpg


What are we talking about when we talk about sentiment analysis?

More: http://www.mxmindia.com/2012/03/tweets-take-wing-in-airline-social-media-study/


Traditional approaches - Experiments

More: http://www.psych.nyu.edu/amodiolab/Publications_files/Social_Psychological_Methods_of_Emotion_Elicitation.pdf


Traditional approaches – Interviews

The Social Media Promise

• Real time
• Unprompted
• No need to recall past behaviour
• Non-intrusive
• Cost-effective…

Source: http://cs-wordpress.s3.amazonaws.com/crowdsource-v4/uploads/2013/11/sentiment-analysis-ui.png

Pratik Thakar, Head of creative content for Coca-Cola Asia-Pacific:

“Every office has a listening centre listening to what people are saying about our brands, good and bad, 24 hours a day. We look at what’s trending and how we can respond [to discussions about Coca-Cola] and to anything happening in the world. (…) I believe that social media is a big focus group. It’s a good way to identify trends and what people are talking about”

Source: http://www.campaignasia.com/Article/402239,Dont+believe+everything+you+hear+Cokes+Pratik+Thakar.aspx

Many businesses are turning to third parties for automated tracking and analysis of social media (SM) conversations…

• 44% of businesses engaged in sentiment analysis (Hilpern, K. 'In it to win it?' The Marketer, July-August 2012, pp. 34-37)
• Estimated cumulative revenues of c. $2bn in 2014

Source: http://breakthroughanalysis.com/2013/12/30/aw-re-aw-text-analytics-industry-study_start-ups-and-aquisition-activities_max-breitsprecher/

How accurate are these tools?

Promotional literature: accuracy rates of 70%-80% (Davis & O’Flaherty, 2012)
– Not clear how the coefficients were calculated
– Not possible to independently verify these claims

Open access

Sources of vulnerability

• Accuracy: extent to which different researchers agree on the classification of a particular data object (Gwet, 2012)
– System vs human coders
– System A vs System B…
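Because accuracy is defined here as agreement between coders, it can be made concrete with a small calculation. The sketch below compares a human coder with a system on the same tweets using simple percent agreement and Cohen's kappa, a common chance-corrected agreement statistic. The choice of kappa is an illustrative assumption, not necessarily the measure used in this study, and the labels are invented.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of items on which two coders assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must label the same non-empty set of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    p_observed = percent_agreement(a, b)
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement if each coder assigned labels at random with their own label frequencies
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1:
        return 1.0  # both coders used one and the same label for every item
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical polarity labels for six tweets (not data from the study)
human = ["positive", "negative", "neutral", "positive", "negative", "positive"]
system = ["positive", "neutral", "neutral", "positive", "positive", "positive"]

print(percent_agreement(human, system))
print(cohens_kappa(human, system))
```

With these invented labels the coders agree on 4 of 6 tweets (about 0.67), but kappa is 10/22 (about 0.45), because part of the raw agreement would be expected by chance. A headline "accuracy rate" can therefore look better than a chance-corrected figure for the same data.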

Conversations about coffee
• Food and beverages: the most widely discussed topic on social media (Forsyth, 2011)
• ‘Charged with a wide range of cultural meanings’ (Grinshpun, 2014)
• Subject of many (netnographic) studies, e.g., Kozinets, 2002

• Sample of 200 tweets
• Search terms: ‘coffee’ plus the variants ‘latte’, ‘mocha’, ‘cappuccino’, ‘espresso’ and ‘Americano’, as well as the terms ‘flavour’, ‘aroma’ and ‘caffeine’
• Multiple users
– Manufacturers and retailers excluded
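The sampling rule above can be sketched as a simple filter, assuming the tweets have already been collected as (account, text) pairs. The account names and the exclusion list are hypothetical, taking the first 200 matches (rather than a random sample) is an assumption, and collecting tweets from the platform itself is not shown.

```python
import re

# Search terms listed in the study design: 'coffee', its variants and related terms
SEARCH_TERMS = ["coffee", "latte", "mocha", "cappuccino", "espresso",
                "americano", "flavour", "aroma", "caffeine"]

# Case-insensitive, whole-word match on any of the search terms
PATTERN = re.compile(r"\b(?:" + "|".join(SEARCH_TERMS) + r")\b", re.IGNORECASE)

# Hypothetical manufacturer and retailer accounts to exclude (not from the study)
EXCLUDED_ACCOUNTS = {"@example_roaster", "@example_cafe_chain"}


def select_tweets(tweets, sample_size=200):
    """Return up to sample_size (account, text) pairs that mention a search term
    and were not posted by an excluded account, in the order given."""
    selected = []
    for account, text in tweets:
        if account.lower() in EXCLUDED_ACCOUNTS:
            continue
        if PATTERN.search(text):
            selected.append((account, text))
            if len(selected) == sample_size:
                break
    return selected


tweets = [
    ("@user_a", "Feeling much more alive this morning now that I've had my coffee"),
    ("@example_cafe_chain", "Try our new mocha today!"),
    ("@user_b", "The early shift sucks. Oh well at least my latte is yummy :)"),
    ("@user_c", "Snow again, staying in"),
]
print(select_tweets(tweets))
```

Because the match is whole-word, ‘#Nespresso’ does not count as a hit for ‘espresso’. Whether it should is the kind of decision a sampling protocol has to fix in advance.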

Analysis - Stage 1: Polarity of emotion
• Positive vs. negative
– As per Koppel & Schler (2006): comments that did not express an emotion were given the code ‘neutral’
• Manual + 2 automated tools
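Many automated tools score polarity against a list of words. The sketch below is a deliberately naive lexicon-based classifier that applies the Koppel & Schler rule: when no emotion word is found, the tweet is coded ‘neutral’. The lexicon and the tie rule are hypothetical, and this is not how any particular commercial tool works.

```python
import re

# Tiny hypothetical polarity lexicon: +1 for positive cue words, -1 for negative ones
LEXICON = {"great": 1, "good": 1, "yummy": 1, "alive": 1, "thank": 1,
           "sucks": -1, "bad": -1, "hate": -1, "awful": -1}


def polarity(text):
    """Code a text as 'positive', 'negative' or 'neutral' from summed lexicon scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [LEXICON[token] for token in tokens if token in LEXICON]
    if not scores:
        return "neutral"  # no emotion word found: coded 'neutral', as per Koppel & Schler (2006)
    total = sum(scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"  # hypothetical tie rule: positive and negative cues cancel out


print(polarity("Found a euro cent on my walk and have a great cup of coffee in hand. "
               "Monday is already off to a good start"))
print(polarity("100 copies of Ghosts sold overnight means a definite Starbucks run this morning. "
               "Possibly coffee out twice this week! Maybe even sushi!!"))
print(polarity("The early shift sucks. Oh well at least my latte is yummy :)"))
```

This toy classifier codes the first tweet as positive and the other two as neutral. The second has no emotionally charged words, and in the third the negative and positive cues cancel out. Both are difficulties discussed later in the deck.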


Analysis - Stage 2: Type of emotion
• As per Plutchik (2001)
• Manual + 3 automated tools
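Plutchik's model has eight basic emotions: joy, trust, fear, surprise, sadness, disgust, anger and anticipation. The minimal sketch below tags a tweet with these types, assuming a small hypothetical word-to-emotion lexicon. Unlike polarity, a tweet can carry several emotion types at once, or none.

```python
import re

# Plutchik's eight basic emotions
PLUTCHIK_EMOTIONS = frozenset({"joy", "trust", "fear", "surprise",
                               "sadness", "disgust", "anger", "anticipation"})

# Hypothetical cue-word lexicon mapping words to Plutchik emotions (illustration only)
EMOTION_LEXICON = {
    "great": {"joy"}, "yummy": {"joy"}, "alive": {"joy"},
    "sucks": {"disgust", "anger"}, "hulk": {"anger"},
    "possibly": {"anticipation"}, "maybe": {"anticipation"},
}


def emotion_types(text):
    """Return the set of Plutchik emotions whose cue words occur in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    found = set()
    for token in tokens:
        found |= EMOTION_LEXICON.get(token, set())
    # Guard against lexicon entries that are not one of the eight basic emotions
    return found & PLUTCHIK_EMOTIONS


print(sorted(emotion_types("The early shift sucks. Oh well at least my latte is yummy :)")))
print(sorted(emotion_types("Having coffee with my grandma before work right now. QT")))
```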


Messages where all types of coders agreed

Examples:
“Found a euro cent on my walk and have a great cup of coffee in hand. Monday is already off to a good start”

“Feeling much more alive this morning now that I’ve had my coffee. Thank you #Nespresso”

Clearly positive!

Messages where automated tools agreed (but different from manual coding)

Example:
“In uni. I think without this cup of coffee I would hulk out”

Very short segments

The rest

Example:
“The early shift sucks. Oh well at least my latte is yummy :)”

Multiple objects

Multiple emotions
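The three groups of messages (all coders agreed; automated tools agreed with each other but not with the manual coding; the rest) can be derived mechanically once each tweet has one manual label and one label per automated tool. A minimal sketch, assuming the codes are stored per tweet in a dictionary; the tweet ids, tool names and labels are hypothetical.

```python
from collections import Counter


def agreement_group(manual_label, tool_labels):
    """Place one tweet in one of three groups by comparing manual and automated codes."""
    if not tool_labels:
        raise ValueError("at least one automated label is required")
    distinct_tool_labels = set(tool_labels.values())
    if len(distinct_tool_labels) == 1:
        if manual_label in distinct_tool_labels:
            return "all coders agreed"
        return "tools agreed, manual differed"
    return "the rest"


# Hypothetical codes: tweet id -> (manual label, {tool name: tool label})
coded = {
    "t1": ("positive", {"tool_a": "positive", "tool_b": "positive"}),
    "t2": ("positive", {"tool_a": "negative", "tool_b": "negative"}),
    "t3": ("negative", {"tool_a": "neutral", "tool_b": "negative"}),
}

groups = {tweet_id: agreement_group(manual, tools)
          for tweet_id, (manual, tools) in coded.items()}
print(groups)
print(Counter(groups.values()))
```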

Example:
“100 copies of Ghosts sold overnight means a definite Starbucks run this morning. Possibly coffee out twice this week! Maybe even sushi!!”

Lack of emotionally charged words

Example:
“How the heck am I supposed to be able to sleep well without coffee in my system? fucking snow”

Subtlety - Negative sentiment due to absence of product

Example:
“Having coffee with my grandma before work right now. QT”

Syntax and style, especially abbreviations and slang

Example:
“This coffee shop needs to change there music up every once and a while. Or maybe I should go home”

Target of emotion is not coffee!


Compounded by:
• Very short segments of text
• Rich in abbreviations and slang
• Typos or grammatical errors
• Specific culture and netiquette of the medium
• Skills of the data analyst

As a consequence:
• Inaccurate representation of the overall sentiment [towards coffee]
– Both sentiment polarity and emotional state
• Segments that should have been excluded from the analysis were retained in the corpus of data
– Might skew results
• Concerns with the quality of the insights and subsequent decisions

To improve accuracy [1/2]:
• Take into consideration the social context of the conversation
– E.g., tweets before or after the one being coded; wider patterns (e.g., Mondays); cultural connotations (e.g., Japan vs. UK)
– But what about sarcasm and highly contextualised uses of language? (e.g., ‘sick’)
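One way to use conversational context is to show the coder, human or automated, the tweets the same user posted immediately before and after the one being coded. A minimal sketch, assuming each tweet is a dictionary with hypothetical 'user', 'time' and 'text' fields; all example tweets are invented. Context of this kind can help with words like ‘sick’, but it does not solve sarcasm.

```python
from datetime import datetime


def with_context(tweets, index, window=1):
    """Return (before, target, after): up to `window` tweets by the same user
    posted immediately before and after tweets[index]."""
    target = tweets[index]
    # The target author's own tweets, oldest first
    timeline = sorted((t for t in tweets if t["user"] == target["user"]),
                      key=lambda t: t["time"])
    # Locate the target by identity, so duplicate texts do not confuse the lookup
    pos = next(i for i, t in enumerate(timeline) if t is target)
    before = timeline[max(0, pos - window):pos]
    after = timeline[pos + 1:pos + 1 + window]
    return before, target, after


# Hypothetical tweets (users, times and texts are invented for illustration)
tweets = [
    {"user": "u1", "time": datetime(2015, 1, 5, 7, 0), "text": "Snowed in again"},
    {"user": "u2", "time": datetime(2015, 1, 5, 7, 10), "text": "Great latte today"},
    {"user": "u1", "time": datetime(2015, 1, 5, 7, 30), "text": "No coffee left. Sick."},
    {"user": "u1", "time": datetime(2015, 1, 5, 8, 0), "text": "Off to bed with a cold"},
]

before, target, after = with_context(tweets, 2)
print([t["text"] for t in before], target["text"], [t["text"] for t in after])
```

Here the following tweet suggests that ‘Sick’ means unwell rather than the slang sense of ‘great’, a reading the target tweet alone does not settle.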

Pratik Thakar:
“When people say good things, you don’t just take it as it is. Someone might be asking them to say it; there might be some design mechanism working. But when people are unhappy, they go super-loud, and they are genuine at that time.”

Source: http://www.campaignasia.com/Article/402239,Dont+believe+everything+you+hear+Cokes+Pratik+Thakar.aspx

To improve accuracy [2/2]:
• Develop dictionaries that reflect the specific syntax and style
• Software solutions that “translate” commonly used abbreviations and typos
– E.g., BRB: be right back
– Changing norms, e.g., LOL
• Familiarise yourself with the software
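The “translation” step can be sketched as a dictionary lookup applied before sentiment coding, assuming a hand-built dictionary of abbreviations and shorthand. The entries are illustrative, and giving ‘lol’ a single fixed expansion is itself a hypothetical choice: as the slide notes, such norms change, so the dictionary needs maintaining rather than being built once.

```python
import re

# Hypothetical normalisation dictionary; entries must be maintained as usage changes
NORMALISE = {
    "brb": "be right back",
    "lol": "laughing out loud",
    "gr8": "great",
    "2day": "today",
    "u": "you",
}


def normalise(text):
    """Replace known abbreviations, matched as whole alphanumeric tokens, case-insensitively."""
    def expand(match):
        token = match.group(0)
        return NORMALISE.get(token.lower(), token)
    return re.sub(r"[A-Za-z0-9]+", expand, text)


print(normalise("BRB, getting a gr8 coffee 2day lol"))
```

Matching whole tokens avoids rewriting the ‘u’ inside words such as ‘sushi’, but context-dependent typos (for example ‘there’ for ‘their’) cannot be fixed by a simple lookup like this.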
