Sentiment is just a stepping stone: Getting more out of Natural Language Processing/Machine Learning

Post on 21-Jul-2015


Social Media & Web Analytics Innovation

Sentiment is just a stepping stone

Hello online viewers of this slide deck!

• A lot of the content here is visual—you’ll want to download the full presentation and read the notes fields

• You can also (soon) find the video version by looking at the Social Media & Web Analytics Innovation site

• You can also stay tuned for more content by checking out our blog: http://www.idibon.com/blog

• The case studies are, partly, covered by these blog posts btw:

• http://idibon.com/toxicity-in-reddit-communities-a-journey-to-the-darkest-depths-of-the-interwebs/

• http://idibon.com/run-fast-as-you-can-likeagirl-advocates-and-brand-campaign-roi/

• http://idibon.com/idibon-supports-unicef-provide-natural-language-processing-sms-based-social-monitoring-systems-africa/

What’s ahead

• Quick overview of sentiment analysis

• It’s tricky

• And limited

• Can we do more?

• Yep

• Case studies

• Detecting toxicity/supportiveness of Reddit communities

• Understanding the effectiveness of Always’ #LikeAGirl campaign

• Routing text messages to different groups in UNICEF

We are not robots

Though automation makes our lives easier

Referential

Persuasive

Expressive

How do you feel?

13 expert polarity lexicons

Words on 2 or more

= 10,592 affective words
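
One way to read this slide: take several hand-built polarity lexicons and keep only the words that at least two of them list. A minimal Python sketch of that filter follows. The three toy lexicons and the function name `words_in_at_least` are invented for illustration; they are not the 13 lexicons from the talk.

```python
from collections import Counter


def words_in_at_least(lexicons, min_lexicons=2):
    """Return the words that appear in at least `min_lexicons` lexicons."""
    counts = Counter()
    for lexicon in lexicons:
        # set() makes sure a word repeated inside one lexicon counts once.
        counts.update(set(lexicon))
    return {word for word, n in counts.items() if n >= min_lexicons}


# Hypothetical toy lexicons (illustration only).
toy_lexicons = [
    {"great", "awful", "snug"},
    {"great", "awful", "meh"},
    {"great", "delightful"},
]

shared = words_in_at_least(toy_lexicons)
print(sorted(shared))  # ['awful', 'great']
```

Requiring agreement between lexicons trades coverage for confidence: words only one expert list knows about are dropped.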

We don’t stand still

Yasssssss!

Snug as a bug in a rug

4 billion web pages

20 million candidates

1-10 words each

178,104 polarity phrases

(In English) (only)

Dutch

tet

“Underscores the polarity of the clause and expresses either irritation or surprise, as if he or she had expected the opposite state of affairs”

Tongan

si’i and si’a

Different determiners (~the, that, etc) express sympathy

Cantonese

-k at the end of particles

“An emotion intensifier”

95% of the world’s conversations are not in English

Different domains have different proportions

[Chart: share of Positive, Negative, Conflict, and Neutral labels for Restaurants and Laptops; axis from 0% to 70%]

“Okay, okay. Sentiment is complicated”

Real question: Can you take action?

How is sentiment for particular categories?

[Chart: share of Positive and Negative labels for Anecdotes, Ambience, Service, Price, and Food; axis from 0% to 80%]

Setting the bar, at a minimum:

Accuracy (which is tied to your training data)

+

An ability to do something

BEYOND SENTIMENT

What would you do with unlimited human analysts?

You’d ask them to classify messages into categories that enable you to take action.

Machine learning models with humans-in-the-loop can power sophisticated classification.

Toxicity > sentiment

• When people don’t like things, they talk about them

• Negative comments aren’t the same as toxic comments

• Negative can be constructive

• Finding hateful and hate-inciting speech—that’s important

• To keep people safe

• To keep communities healthy

The importance of definition

• If people can’t agree on what’s-in and what’s-out, it’s hard to train a machine

Wait a sec! Aren’t these ducks? (Can we agree to disagree?)

The importance of definition

• If people can’t agree on what’s-in and what’s-out, it’s hard to train a machine

• In our case toxicity was defined as:

• ad hominem attacks (directed at specific people)

• bigoted comments (e.g., sexist, racist, homophobic, etc)

• Set definitions

• Then see if people are consistent

• Run pilots

• Do inter-annotator agreement

• Iterate
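
The "do inter-annotator agreement" step can be made concrete with Cohen's kappa, one common agreement measure for two annotators. The slides do not say which measure was used, so treat this as a sketch under that assumption; the label lists are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items where both chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: what you'd expect from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical pilot: two annotators, six comments.
annotator_1 = ["toxic", "ok", "ok", "toxic", "ok", "ok"]
annotator_2 = ["toxic", "ok", "toxic", "toxic", "ok", "ok"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 2))  # 0.67
```

A low kappa after a pilot is a signal to tighten the definitions and run another round, which is the "iterate" step.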

Sentiment is not IRRELEVANT

• A lot of comments are Neutral

• So that doesn’t teach us much about hate speech

• And we’ll waste a lot of time and money getting training data on Neutral

• So we ran an experiment:

• Annotate random data

• Annotate stuff that our sentiment models say is Negative

Work savings!

• Items chosen for review based on our sentiment model were MUCH more likely to be toxic or supportive

• A 96% decrease in effort
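
To see how a figure like 96% can arise, assume (for illustration only; the slides do not define it) that effort is the number of items a person must review to find one toxic or supportive example. The hit rates below are hypothetical, chosen to show that a 25x better hit rate corresponds to a 96% saving.

```python
def effort_reduction(hit_rate_random, hit_rate_filtered):
    """Fraction of review effort saved by filtered sampling over random sampling.

    Effort is modeled as items reviewed per useful example found (1 / hit rate).
    """
    if not (0 < hit_rate_random <= 1 and 0 < hit_rate_filtered <= 1):
        raise ValueError("hit rates must be in (0, 1]")
    items_random = 1 / hit_rate_random
    items_filtered = 1 / hit_rate_filtered
    return 1 - items_filtered / items_random


# Hypothetical: 1% of random items are useful vs. 25% of model-selected items.
saved = effort_reduction(0.01, 0.25)
print(f"{saved:.0%}")  # 96%
```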

Analyst time savings is a key benefit

[Chart: Finding relevant business articles. Series: % analyst time saved; % accuracy (compared to humans). Categories: News category 1, News category 2, Health sciences, News category 4, Manufacturing. Data labels as extracted: 73%, 83%, 88%, 80%, 91%, 81%, 87%, 85%, 90%, 99%]

Okay back to community health

Finding healthy communities (supportive)

And unhealthy ones (toxic)

Unstructured data gets structured (bonus: a system that gets smarter over time)

[Diagram labels: Adaptive System, Machine Learning, Optimization, Human Annotation, Prediction Engine, Structured Data Reports, Action]
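
One way a system like this can get smarter over time is to send people the items the model is least sure about, then retrain on their answers. The sketch below uses uncertainty sampling, a common active-learning heuristic; the slides do not say which selection strategy was used, and the function name and scores are hypothetical.

```python
def pick_for_annotation(scored_items, budget):
    """Return the `budget` texts whose predicted probability is closest to 0.5."""
    # The smaller the distance from 0.5, the less sure the model is.
    ranked = sorted(scored_items, key=lambda pair: abs(pair[1] - 0.5))
    return [text for text, _ in ranked[:budget]]


# Hypothetical (text, P(toxic)) pairs from some prediction engine.
predictions = [
    ("thanks, this helped a lot", 0.03),
    ("you people are all idiots", 0.97),
    ("well that was a choice", 0.55),
    ("typical of someone like you", 0.42),
]

queue = pick_for_annotation(predictions, budget=2)
print(queue)  # ['well that was a choice', 'typical of someone like you']
```

Confident predictions go straight to reports; only the borderline cases cost analyst time.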

By structuring text, you can do all kinds of visualizations

Learning more about ad campaigns than just “people liked it”: #LikeAGirl

The most re-shared #LikeAGirl post

60 second ad = ~$9 million

114.4 million viewers

= ~$0.08 per viewer
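
The per-viewer figure is just the ad cost divided by the audience. A quick check in Python, using the two rounded numbers from the slide:

```python
ad_cost_usd = 9_000_000   # ~$9 million for the 60-second ad (from the slide)
viewers = 114_400_000     # 114.4 million viewers (from the slide)

cost_per_viewer = ad_cost_usd / viewers
print(f"${cost_per_viewer:.2f} per viewer")  # $0.08 per viewer
```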

Always only spent 30% of what Anheuser-Busch did

But they had twice the tweets

Not all sharers and resharers are of equal value

Influencers extend the brand a lot

Posts by brand and ad advocates reach twice as far as posts by @Always

If we lumped everyone who used #LikeAGirl together

We wouldn’t know the difference between

People talking about the ad (and products)

And people talking about the cause

Antagonists mainly posted their sexist content to #LikeABoy

Defenders overwhelmed them with 3-4 times the content (yay!)

Positive sentiment would lump everyone together

And negative sentiment would lump

Antagonists (sexists) in with

Defenders (anti-sexist)

Routing messages that matter

Processing millions of SMS in 12 African languages

• Intent of sender (e.g., report a problem, ask a question, or make a suggestion)

• Categorization (e.g., orphans and vulnerable children, violence against children, health, nutrition)

• Language detection (e.g., English, Acholi, Karamojong, Luganda, Nkole, Swahili, Lango)

• Location (e.g., village names)
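
Once each SMS carries these four tags, routing can be a simple lookup. The sketch below shows only that last step; the classifiers that would fill in the tags are out of scope, and the class name, routing table, and team names are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TaggedMessage:
    text: str
    intent: str      # e.g. "report", "question", "suggestion"
    category: str    # e.g. "health", "nutrition"
    language: str    # e.g. "Luganda", "Swahili"
    location: str    # e.g. a village name


# Hypothetical routing table: predicted category -> team that should see it.
ROUTES = {
    "health": "health team",
    "nutrition": "nutrition team",
    "violence against children": "child protection team",
}


def route(message, default="general inbox"):
    """Pick a destination from the message's predicted category."""
    return ROUTES.get(message.category, default)


msg = TaggedMessage(
    text="(message text)",
    intent="report",
    category="health",
    language="Luganda",
    location="(village name)",
)
print(route(msg))  # health team
```

Intent, language, and location could refine the lookup further, for example by sending urgent reports to a team that reads the sender's language.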

1.4%

Top 3 categories in Nigeria

[Chart: categories Employment, U-report support, Health; data labels as extracted: 9.69%, 17.68%, 39.44%]

Quick conclusion

• Sentiment analysis is pretty rudimentary

• On its own, it rarely answers key business questions

• Though it IS automatic and scalable

• Think of it as an example of natural language processing

• There’s a lot more you can do

• The key is formulating specific questions

• And training the system on RELEVANT data

• For this, you’ll need to optimize humans

email tyler@idibon.com

twitter @idibon

www idibon.com

THANK YOU!

Accuracy of ~20 teams

              Restaurant categories (F-score)   Restaurant category polarity (F-score)
Top score     88.57                             82.92
Median        74.24                             69.75
Baseline      68.89                             64.09

Baseline ~ “let’s always guess the most popular category”

We care about overall accuracy, so we need to multiply how often the category is right by how often the polarity is right.
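
Here is that multiplication applied to the scores on this slide. Treating two F-scores as independent success rates is a rough approximation, so read the results as ballpark figures rather than measured end-to-end accuracy.

```python
# F-scores from the slide, as fractions: (category, category polarity).
scores = {
    "top": (0.8857, 0.8292),
    "median": (0.7424, 0.6975),
    "baseline": (0.6889, 0.6409),
}

# Assume the two steps succeed independently and multiply the rates.
combined = {name: cat * pol for name, (cat, pol) in scores.items()}

for name, value in combined.items():
    print(f"{name}: {value:.1%}")
# top: 73.4%
# median: 51.8%
# baseline: 44.2%
```

Under this reading, even the best system gets both category and polarity right only about three times in four.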

95% of the world’s conversations are not in English. Idibon covers 99% of the world’s GDP.

Rapidly tag and filter your chosen topics and criteria in any language

Monitor how people respond to your brand differently around the world

One unified system versus data cobbled together from disparate systems

Idibon works with:

English, Portuguese (Brazilian and from Portugal), Spanish, Italian, French, Russian, German, Turkish, Arabic, Japanese, Greek, Mandarin Chinese, Persian, Polish, Dutch, Swedish, Serbian, Romanian, Korean, Hungarian, Bulgarian, Hindi, Croatian, Czech, Ukrainian, Finnish, Hebrew, Urdu, Catalan, Slovak, Indonesian, Malay, Vietnamese, Bengali, Thai, Navajo, Latvian, Estonian, Lithuanian, Kurdish, Yoruba, Amharic, Zulu, Hausa, Kazakh, Sindhi, Punjabi, Tagalog, Cebuano, Danish and Emoji.

Navajo

=go

Emotional evaluation in narrative