2016 datascience emotion analysis - english version

Date post: 12-Apr-2017
Upload: yi-shin-chen
Transcript
Page 1: 2016 datascience emotion analysis - english version

Emotion Analysis for Big Data

NTHU CS, Yi-Shin Chen

Page 2: 2016 datascience emotion analysis - english version

Hello! I am Yi-Shin Chen

Currently in NTHU CS

Intelligent Data Engineering and Application Lab (IDEA Lab)

You can find me at: [email protected]

Page 3: 2016 datascience emotion analysis - english version

We Promote Diversity at

More than 50% of students come from other countries

Belize

France

St Lucia

Honduras

India

China

Japan

Taiwan

Indonesia

São Tomé

Page 4: 2016 datascience emotion analysis - english version

1. Why Emotion Analysis

There are a few personal reasons

Page 5: 2016 datascience emotion analysis - english version

“I don’t understand women!! Their words are very vague and ambiguous”

From Carlos Argueta, my first foreign Ph.D. graduate

He is the one who selected the topic of sentiment analysis, and the first in our lab to suffer from depression.

Page 6: 2016 datascience emotion analysis - english version

Children are Bewildering

They don't say and they cannot say.

Page 7: 2016 datascience emotion analysis - english version

2. Emotion Analysis

Let's see what others did/do

Page 8: 2016 datascience emotion analysis - english version

Natural Language Processing

▷ Analyze Part-of-Speech (POS) tagging
▷ Understand word meaning
▷ Analyze the relationships between words

Need dictionaries & semantic relationships
Word positions affect statement meanings
Need different data for different languages

Example: "This is the best thing happened in my life."
[Figure: each word of the example is labeled with a POS tag; the tags shown are Det., Verb, Adj, NN, Pre., PN]

Difficult

Page 9: 2016 datascience emotion analysis - english version

Data Mining/Machine Learning

▷ Collect massive data
▷ Manually annotate training data
▷ Analyze data with classifiers

Recollect training data for different languages

Low recall rates (<<25%)

Easier?
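For reference, recall is the share of truly emotional texts that a classifier finds, while precision is the share of its predictions that are right. A minimal sketch, assuming gold and predicted labels arrive as parallel lists; the function name and the sample labels are illustrative and not taken from the talk:

```python
def precision_recall(gold, predicted, target):
    """Return (precision, recall) for one target label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels: the classifier finds only one of four angry texts.
gold = ["anger", "anger", "anger", "anger", "joy"]
predicted = ["anger", "joy", "joy", "joy", "joy"]
print(precision_recall(gold, predicted, "anger"))
```

In this toy case every "anger" prediction is correct, yet three of the four angry texts are missed, which is the low-recall situation the slide describes.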

Page 10: 2016 datascience emotion analysis - english version

3. Learning from Experience

Difference between Reality and Practice

Page 11: 2016 datascience emotion analysis - english version

Emotion Embedded in Trivia

▷ Most trivia are ignored in previous works
  • Stop words are the first batch to be removed
    → E.g., often, above, again
  • Determiners and pronouns are usually ignored
  • Most nouns are considered unimportant

My mom always said school is more important

😒 Angry 😂 Sad 👶 Joy

Page 12: 2016 datascience emotion analysis - english version

Emotional Mistakes

▷ Mistakes everywhere
  • Some are careless
    → E.g., Luve you
  • Some are intentional
    → E.g., I’m soooooooo happppppy
▷ Mistakes are not recorded in dictionaries
  • How to annotate mistakes?
    → Annotation costs A LOT!
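Intentional elongation, as in the example above, can at least be detected without a dictionary. A minimal sketch, assuming that a run of three or more identical characters signals emphasis; the helper names and the collapse-to-two rule are illustrative heuristics, not the method used in the talk:

```python
import re

# Three or more repeats of the same character, e.g. "ooo" in "sooo".
ELONGATION = re.compile(r"(.)\1{2,}")


def squeeze(text):
    """Collapse runs of 3+ identical characters to 2 (an assumed heuristic)."""
    return ELONGATION.sub(r"\1\1", text)


def elongated_tokens(text):
    """Return the whitespace-separated tokens that contain an elongated run."""
    return [tok for tok in text.split() if ELONGATION.search(tok)]


text = "I'm soooooooo happppppy"
print(elongated_tokens(text))
print(squeeze(text))
```

The elongated tokens can be kept as an intensity signal, while the squeezed form reduces the number of spelling variants a model has to see.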

Page 13: 2016 datascience emotion analysis - english version

Children are our mentors

Mumbling from a mom

▷ My one-year-old kid can detect my emotion
  • Without seeing my face
  • I did not change my tone
  • How come she is always right?
▷ Guessing
  • She did not know grammar
  • She did not memorize any dictionary
  • My statements might have a lot of mistakes

Goal

Multi-lingual

Page 14: 2016 datascience emotion analysis - english version

4. Overcome Challenges

Insufficient Research Fund

Page 15: 2016 datascience emotion analysis - english version

Free Resources

▷ Free data
  • As long as they can be legally accessed
▷ Open source software

Page 16: 2016 datascience emotion analysis - english version

Philosophy: Slow Life

▷ Our students are often delayed for various reasons
▷ Do not follow the trends
  • Usually against common sense in academia

No POS tagging
No dictionary
Multilingual

😱

Failure / Success

POS tagging
Multiple dictionaries
One language

Page 17: 2016 datascience emotion analysis - english version

Teamwork

▷ Implementation team
  • Coding
  • More coding
▷ Dreaming team
  • Reading papers
  • Design
▷ Boasting team
  • Writing papers
  • Generating presentations
▷ Anonymous

Page 18: 2016 datascience emotion analysis - english version

Crowdsourcing

Merriam-Webster: Obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially an online community

Cost $$$

Page 19: 2016 datascience emotion analysis - english version

Subconscious Crowdsourcing

▷ Crowdsourcing in the subconscious
  • Free
  • Extract the subconscious from daily-life records
    → Ex1: “computers/companies/product-support/apple” in a Delicious tag
    → Ex2: “Trump” “Nickname generator” in a search log
    → Ex3: “School day again #sad” on Twitter

Chun-Hao Chang, Elvis Saravia and Yi-Shin Chen, Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental Disorder Detection on Social Media, The 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), San Francisco, CA, USA, 18 - 21 August, 2016

Page 20: 2016 datascience emotion analysis - english version

5. Case 1: Analyze Emotions from Text

Utilize subconscious emotion patterns

Page 21: 2016 datascience emotion analysis - english version

Subconscious Emotion Big Data

▷ Twitter, a good public source

Throwing my phone always calms me down #anger

My sister always makes things look much more worse than they seem >:[ #anger

Why my brother always crabby !?!? #rude #youranadult #anger #issues

WHY DOES MY COMPUTER ALWAYS FREEZE??? NEVER FAILS. #anger

Im wanna crazy,if my life always sucks like this. #anger

Hashtags and emoticons represent emotion well; hence they can be treated as annotated answers
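Treating a hashtag as the annotated answer can be sketched as follows. This is a minimal illustration, assuming a small fixed set of emotion hashtags and discarding tweets that carry none or more than one; the tag set and the function name are assumptions, not details from the talk:

```python
import re

# Assumed label set; the slides show #anger and #sad as examples.
EMOTION_TAGS = {"anger", "sad", "joy"}
HASHTAG = re.compile(r"#(\w+)")


def distant_label(tweet):
    """Return the single emotion hashtag in the tweet, or None if absent or ambiguous."""
    tags = {tag.lower() for tag in HASHTAG.findall(tweet)}
    emotions = tags & EMOTION_TAGS
    return emotions.pop() if len(emotions) == 1 else None


tweets = [
    "Throwing my phone always calms me down #anger",
    "Why my brother always crabby !?!? #rude #youranadult #anger #issues",
    "School day again #sad",
    "3,000 killed in the world by ebola",
]
for tweet in tweets:
    print(distant_label(tweet), "|", tweet)
```

Tweets that return None carry no usable self-annotation and would need to come from a separately collected control set.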

Page 22: 2016 datascience emotion analysis - english version

Collect Emotion Data

Page 23: 2016 datascience emotion analysis - english version

Collect Emotion Data

Page 24: 2016 datascience emotion analysis - english version

Collect Emotion Data

Wait! Need Control Group

Page 25: 2016 datascience emotion analysis - english version

Not-Emotion Data

Page 26: 2016 datascience emotion analysis - english version

Not-Emotion Data

Page 27: 2016 datascience emotion analysis - english version

Not-Emotion Data

Page 28: 2016 datascience emotion analysis - english version

Preprocessing Steps

▷ Hints: Remove troublesome ones
  o Too short
    → Too short to get important features
  o Contain too many hashtags
    → Too much information to process
  o Are retweets
    → Increase the complexity
  o Have URLs
    → Too much trouble to collect the page data
  o Convert user mentions to <usermention> and hashtags to <hashtag>
    → Remove the identification. We should not peek at the answers!

Big Data anyway
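The filtering and masking rules above can be sketched in a few lines. This is a minimal illustration: the two thresholds, the "RT " prefix test for retweets, and the function names are assumptions, since the slides do not give exact values:

```python
import re

MIN_TOKENS = 3      # assumed threshold for "too short"
MAX_HASHTAGS = 3    # assumed threshold for "too many hashtags"


def keep_tweet(tweet):
    """Return False for tweets that are short, hashtag-heavy, retweets, or contain URLs."""
    tokens = tweet.split()
    if len(tokens) < MIN_TOKENS:
        return False
    if sum(tok.startswith("#") for tok in tokens) > MAX_HASHTAGS:
        return False
    if tweet.startswith("RT "):
        return False
    if re.search(r"https?://", tweet):
        return False
    return True


def mask(tweet):
    """Replace user mentions and hashtags with placeholder tokens."""
    tweet = re.sub(r"@\w+", "<usermention>", tweet)
    return re.sub(r"#\w+", "<hashtag>", tweet)


sample = "@amy school day again #sad"
print(keep_tweet(sample), mask(sample))
```

The emotion label has to be read from the hashtag before `mask` is applied, because masking is exactly what hides the answer from the model.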

Page 29: 2016 datascience emotion analysis - english version

Basic Guidelines

▷ Identify the commonalities and differences between the experimental and control groups
  • Analyze the frequency of words
    → TF•IDF (term frequency, inverse document frequency)
  • Analyze the co-occurrence between words/patterns
    → Co-occurrence
  • Analyze the importance between words
    → Centrality

Graph
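As a reminder of how TF•IDF separates shared words from distinctive ones, here is a minimal sketch using the plain tf × log(N/df) form. The whitespace tokenizer and the choice of this particular variant are assumptions; the slides do not say which weighting was used:

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["i love the new game", "3,000 killed in the world"]
weights = tf_idf(docs)
print(weights[0]["the"], weights[0]["love"])
```

A word such as "the" that occurs in both groups gets weight zero, while a word seen in only one group keeps a positive weight.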

Page 30: 2016 datascience emotion analysis - english version

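Graph Construction

▷ Construct two graphs
  • E.g.
    → Emotion one: I love the World of Warcraft new game
    → Not-emotion one: 3,000 killed in the world by ebola

[Figure: two weighted word graphs. Emotion graph nodes: I, love, the, World, of, Warcraft, new, game; edge weights shown: 0.9, 0.84, 0.65, 0.12, 0.12, 0.53, 0.67, 0.45. Not-emotion graph nodes: 3,000, killed, in, the, world, by, ebola; edge weights shown: 0.49, 0.87, 0.93, 0.83, 0.55, 0.25.]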
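One plausible way to build such word graphs is to link adjacent words and weight each edge by how often the pair occurs. The sketch below assumes exactly that; the slides do not say how their edge weights were computed, so the normalization is an assumption, and the second emotion sentence is added only to make the counts differ:

```python
from collections import Counter


def build_graph(sentences):
    """Directed word graph with one edge (a, b) per adjacent word pair.

    Weights are bigram counts divided by the largest count. This
    normalization is an assumption, not the formula behind the slides.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    if not counts:
        return {}
    top = max(counts.values())
    return {edge: n / top for edge, n in counts.items()}


emotion_graph = build_graph([
    "I love the World of Warcraft new game",
    "I love the new game",  # illustrative extra sentence
])
control_graph = build_graph(["3,000 killed in the world by ebola"])
print(emotion_graph[("i", "love")], emotion_graph[("the", "world")])
```

Note that the edge ("the", "world") appears in both graphs, which is the kind of shared structure the next step removes.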

Page 31: 2016 datascience emotion analysis - english version

Graph Processes

▷ Remove the common ones between the two graphs
  • Leave the significant ones that only appear in the emotion graph
▷ Analyze the centrality of words
  • Betweenness, Closeness, Eigenvector, Degree, Katz
    → Can use free/open software, e.g., Gephi, GraphDB
▷ Analyze the cluster degrees
  • Clustering Coefficient

Graph → Key patterns
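The three steps above can be sketched without any graph library. This is a minimal illustration on toy edge sets: it covers only degree centrality and the local clustering coefficient, treats edges as undirected, and uses function names that are not from the talk:

```python
from collections import defaultdict


def emotion_only(emotion_edges, control_edges):
    """Keep edges that appear in the emotion graph but not in the control graph."""
    return set(emotion_edges) - set(control_edges)


def neighbors(edges):
    """Undirected adjacency sets built from (a, b) edge pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree divided by (n - 1), the usual normalization."""
    n = len(adj)
    return {node: (len(nbrs) / (n - 1) if n > 1 else 0.0) for node, nbrs in adj.items()}


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))


# Toy edge sets for illustration only.
emotion_edges = {("i", "love"), ("love", "the"), ("the", "world"), ("i", "the"), ("the", "new")}
control_edges = {("the", "world")}
kept = emotion_only(emotion_edges, control_edges)
adj = neighbors(kept)
centrality = degree_centrality(adj)
print(sorted(centrality.items()), clustering_coefficient(adj, "the"))
```

For the other measures listed on the slide (betweenness, closeness, eigenvector, Katz), a dedicated graph tool is the more practical route.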

Page 32: 2016 datascience emotion analysis - english version

Essence Only

Only key phrases

→ emotion patterns

Page 33: 2016 datascience emotion analysis - english version

Ranking Emotion Patterns

▷ Ranking the emotion patterns for each emotion
  • Frequency, exclusiveness, diversity
  • One ranked list for each emotion

Sad, Joy, Anger
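The slide names three ranking criteria but not how they are combined, so the following is only one possible reading. It assumes that frequency is the pattern count within an emotion, exclusiveness is the share of a pattern's occurrences that fall in that emotion, and diversity is the number of distinct words filling the wildcard; the product of log-scaled terms is likewise an assumption:

```python
import math
from collections import Counter, defaultdict


def rank_patterns(observations):
    """Return {emotion: [patterns, best first]} from (emotion, pattern, filler) rows."""
    freq = Counter((emotion, pattern) for emotion, pattern, _ in observations)
    total = Counter(pattern for _, pattern, _ in observations)
    fillers = defaultdict(set)
    for emotion, pattern, filler in observations:
        fillers[(emotion, pattern)].add(filler)
    scores = defaultdict(dict)
    for (emotion, pattern), n in freq.items():
        exclusiveness = n / total[pattern]
        diversity = len(fillers[(emotion, pattern)])
        # Assumed combination: log-scaled frequency and diversity, times exclusiveness.
        scores[emotion][pattern] = math.log(1 + n) * exclusiveness * math.log(1 + diversity)
    return {e: sorted(ps, key=ps.get, reverse=True) for e, ps in scores.items()}


# Toy observations for illustration only.
observations = [
    ("anger", "why are *", "you"),
    ("anger", "why are *", "people"),
    ("anger", "why are *", "they"),
    ("anger", "my * always", "sister"),
    ("joy", "my * always", "mom"),
    ("joy", "* yay !", "birthday"),
    ("joy", "* yay !", "weekend"),
]
ranked = rank_patterns(observations)
print(ranked)
```

A pattern shared by two emotions, such as "my * always" here, drops below patterns that are exclusive to one emotion.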

Page 34: 2016 datascience emotion analysis - english version

Emotion Pattern Samples

Sad, Joy, Anger

finally * my tomorrow !!! * <hashtag> birthday .+ * yay ! :) * ! princess * * hehe prom dress *

memories * * without my sucks * <hashtag> * tonight :( * anymore .. felt so *. :( * * :((

my * always shut the * teachers * people say * -.- * understand why * why are * with these *
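To show how such wildcard patterns could be applied to new text, here is a minimal sketch. It assumes that `*` stands for exactly one whitespace-separated token and that matching is case-insensitive; the pattern boundaries used below are read from the flattened slide text and may not match the original layout:

```python
import re


def pattern_to_regex(pattern):
    """Turn 'shut the *' into a regex where * stands for exactly one token."""
    parts = [r"\S+" if tok == "*" else re.escape(tok) for tok in pattern.split()]
    # The lookarounds keep the match aligned to whole tokens.
    return re.compile(r"(?<!\S)" + r"\s+".join(parts) + r"(?!\S)", re.IGNORECASE)


def matching_patterns(text, patterns):
    """Return the patterns that occur somewhere in the text."""
    return [p for p in patterns if pattern_to_regex(p).search(text)]


patterns = ["shut the *", "why are *", "my * always"]
print(matching_patterns("Why are people so rude", patterns))
print(matching_patterns("My sister always makes things look worse", patterns))
```

Because the wildcard accepts any token, the same pattern matches misspelled or unseen words, which fits the goal of working without a dictionary.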

Page 35: 2016 datascience emotion analysis - english version

Precision

[Figure: bar chart of accuracy (0% to 100%) for Naïve Bayes, SVM, NRCWE, and Our Approach, with two series: LIWC and No LIWC]

Page 36: 2016 datascience emotion analysis - english version

Feedback for Products

Page 37: 2016 datascience emotion analysis - english version

Product Preference Analysis

Page 38: 2016 datascience emotion analysis - english version

5. Case 2: Analyze Emotion Status for Individuals

Who has bipolar disorder?

Who has borderline personality disorder?

Page 39: 2016 datascience emotion analysis - english version

Collect Patient Data

Support Group

Page 40: 2016 datascience emotion analysis - english version

Collect Patient Data

Followers

Page 41: 2016 datascience emotion analysis - english version

Collect Patient Data

Page 42: 2016 datascience emotion analysis - english version

Collect Patient Data

Page 43: 2016 datascience emotion analysis - english version

Collect Patient Data

Wait! Control Group Needed

Page 44: 2016 datascience emotion analysis - english version

Collect Data from Ordinary People

Page 45: 2016 datascience emotion analysis - english version

Collect Data from Ordinary People

Page 46: 2016 datascience emotion analysis - english version

Collect Data from Ordinary People

Page 47: 2016 datascience emotion analysis - english version

Basic Guidelines

▷ Identify the commonalities and differences between the experimental and control groups
  • Word/pattern frequency
  • Emotion-related data (e.g., flipping rates, occurrence rates)
  • Social interaction (e.g., retweet, reply)
  • Lifestyle (e.g., online time, stay-up or not)
  • Age and gender

Features
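The emotion-related features can be sketched once each post has an emotion label. The definitions below are assumptions: the flipping rate is taken to be the share of consecutive posts whose label changes, and the occurrence rate is each emotion's share of a user's posts; the slides do not define either term:

```python
from collections import Counter


def flipping_rate(labels):
    """Share of consecutive post pairs whose emotion label differs."""
    if len(labels) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return flips / (len(labels) - 1)


def occurrence_rates(labels):
    """Share of posts carrying each emotion label."""
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Toy timeline for one user, in posting order.
timeline = ["joy", "sad", "joy", "joy", "anger"]
print(flipping_rate(timeline), occurrence_rates(timeline))
```

Computed per user, these numbers become columns of a feature vector alongside the social, lifestyle, and demographic features.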

Page 48: 2016 datascience emotion analysis - english version

Apply Classifiers

▷ By utilizing the extracted features
▷ Various classifiers
  • Neural Networks
  • Naïve Bayes and Bayesian Belief Networks
  • Support Vector Machines
  • Random forest
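As an illustration of the classification step, here is a small Gaussian Naïve Bayes written from scratch. It is a sketch only: the two features, the labels, and all numbers are made-up toy values, not data or results from the study, and a real experiment would more likely use an established library:

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for small numeric feature vectors."""

    def fit(self, rows, labels):
        grouped = defaultdict(list)
        for row, label in zip(rows, labels):
            grouped[label].append(row)
        self.stats = {}
        for label, members in grouped.items():
            columns = list(zip(*members))
            means = [sum(c) / len(c) for c in columns]
            # A small floor keeps variances positive for constant columns.
            variances = [
                sum((x - m) ** 2 for x in c) / len(c) + 1e-6
                for c, m in zip(columns, means)
            ]
            self.stats[label] = (len(members) / len(rows), means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            prior, means, variances = self.stats[label]
            score = math.log(prior)
            for x, m, v in zip(row, means, variances):
                score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            return score

        return max(self.stats, key=log_posterior)


# Toy features per user: [flipping rate, share of late-night posts].
rows = [[0.8, 0.7], [0.7, 0.6], [0.9, 0.8], [0.2, 0.1], [0.3, 0.2], [0.1, 0.3]]
labels = ["patient", "patient", "patient", "control", "control", "control"]
model = GaussianNaiveBayes().fit(rows, labels)
print(model.predict([0.75, 0.65]), model.predict([0.2, 0.2]))
```

The same `fit` and `predict` shape applies to the other classifiers on the slide, so they can be swapped in and compared on identical features.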

Page 49: 2016 datascience emotion analysis - english version

Precisions

Page 50: 2016 datascience emotion analysis - english version

Possible Applications

Page 51: 2016 datascience emotion analysis - english version

Possible Applications

Page 52: 2016 datascience emotion analysis - english version

Possible Applications

Page 53: 2016 datascience emotion analysis - english version

Possible Applications

Page 54: 2016 datascience emotion analysis - english version

Election Analysis?

Page 55: 2016 datascience emotion analysis - english version

Election Analysis?

Page 56: 2016 datascience emotion analysis - english version

Election Analysis?

Page 57: 2016 datascience emotion analysis - english version

Election Analysis?

Page 58: 2016 datascience emotion analysis - english version

Election Analysis?

Page 59: 2016 datascience emotion analysis - english version

More in the future…

Thank you.

Contact me at: [email protected]

