Date post: | 12-Apr-2017 |
Category: | Data & Analytics |
Upload: | yi-shin-chen |
Emotion Analysis for Big Data
NTHU CS, Yi-Shin Chen
Hello! I am Yi-Shin Chen
Currently in NTHU CS
Intelligent Data Engineering and Application Lab (IDEA Lab)
You can find me at: [email protected]
We Promote Diversity at
More than 50% of students come from other countries
Belize
France
St Lucia
Honduras
India
China
Japan
Taiwan
Indonesia
São Tomé
1. Why Emotion Analysis
There are a few personal reasons
“I don’t understand woman!! Their words are very vague and ambiguous”
From Carlos Argueta, my first foreign Ph.D. graduate
He was the one who selected the topic of sentiment analysis, and the first in our lab to suffer from depression.
Children are Bewildering
They don't say and they cannot say.
2. Emotion Analysis
Let's see what others did/do
Natural Language Processing
▷Analyze Part-of-Speech (POS) tagging
▷Understand word meaning
▷Analyze the relationships between words
Need dictionaries & semantic relationships
Word positions affect statement meanings
Need different data for different languages
[Figure: the sentence "This is the best thing happened in my life." annotated with POS tags (Det., NN, PN, Pre., Verb, Adj)]
Difficult
Data Mining/Machine Learning
▷Collect massive data
▷Manually annotate training data
▷Analyze data with classifiers
Recollect training data for different languages
Low recall rates (<<25%)
Easier?
3. Learning from Experience
Difference between Reality and Practice
Emotion Embedded in Trivia
▷Most trivia are ignored in previous works
• Stop words are the first batch to be removed
→ E.g., often, above, again
• Determiners and pronouns are usually ignored
• Most nouns are considered unimportant
My mom always said school is more important
😒 Angry 😂 Sad 👶 Joy
Emotional Mistakes
▷Mistakes everywhere
• Some are careless
→ E.g., Luve you
• Some are intentional
→ E.g., I’m soooooooo happppppy
▷Mistakes are not recorded in dictionaries
• How to annotate mistakes?
→ Annotation costs A LOT!
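One cheap way to treat intentional misspellings as signal rather than noise is to flag character elongation without consulting any dictionary. The sketch below is only an illustration and is not a step described in the talk; the function name `elongation_features` and the threshold of three repeated characters are assumptions.

```python
import re

# Three or more repeats of the same word character, e.g. "ooo" in "sooo".
# The threshold of three is an assumption chosen for illustration.
_ELONGATED = re.compile(r"(\w)\1{2,}")

def elongation_features(text):
    """Return (collapsed_text, number_of_elongated_runs).

    Each run of 3+ identical characters is collapsed to a single
    character. The result is not guaranteed to be a correctly spelled
    word; the run count is kept as a rough emotion-intensity hint.
    """
    runs = _ELONGATED.findall(text)
    collapsed = _ELONGATED.sub(r"\1", text)
    return collapsed, len(runs)
```

A careless typo such as "Luve you" passes through unchanged with a count of zero, so this only captures the intentional kind of mistake.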
Children are our mentors
Mumbling from a mom
▷My one-year-old kid can detect my emotion
• Without seeing my face
• I did not change my tone
• How come she is always right?
▷Guessing
• She did not know grammar
• She did not memorize any dictionary
• My statements might have a lot of mistakes
Goal
Multi-lingual
4. Overcome Challenges
Insufficient Research Fund
Free Resources
▷Free Data
• As long as they can be legally accessed
▷Open source software
Philosophy Slow Life
▷ Our students are often delayed for various reasons
▷ Do not follow the trends
• Usually against common sense in academia
No POS Tagging
No dictionary
Multilingual
😱
Failure
Success
POS Tagging
Multiple dictionaries
One language
Teamwork
▷Implementation team
• Coding
• More coding
▷Dreaming team
• Reading papers
• Design
▷Boasting team
• Writing papers
• Generating presentations
▷Anonymous
Crowdsourcing
Merriam-Webster: Obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially an online community
Cost $$$
Subconscious Crowdsourcing
▷Crowdsourcing in subconscious
• Free
• Extract the subconscious from daily-life records
→ Ex1: “computers/companies/product-support/apple” in a Delicious tag
→ Ex2: “Trump” “Nickname generator” in a search log
→ Ex3: “School day again #sad” on Twitter
Chun-Hao Chang, Elvis Saravia, and Yi-Shin Chen, "Subconscious Crowdsourcing: A Feasible Data Collection Mechanism for Mental Disorder Detection on Social Media," The 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016), San Francisco, CA, USA, 18-21 August 2016.
5. Case 1: Analyze Emotions from Text
Utilize subconscious emotion patterns
Subconscious Emotion Big Data
▷Twitter, a good public source
Throwing my phone always calms me down #anger
My sister always makes things look much more worse than they seem >:[ #anger
Why my brother always crabby !?!? #rude #youranadult #anger #issues
WHY DOES MY COMPUTER ALWAYS FREEZE??? NEVER FAILS. #anger
Im wanna crazy,if my life always sucks like this. #anger
Hashtags and emoticons can represent emotions well; hence they can be treated as annotated answers
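Treating hashtags as annotated answers can be sketched as a small labeling function. This is a minimal illustration under stated assumptions: the `EMOTION_TAGS` mapping and the rule of rejecting tweets that carry tags for more than one emotion are hypothetical, not taken from the talk.

```python
import re

# Hypothetical mapping from emotion hashtags to labels.
EMOTION_TAGS = {"#anger": "anger", "#sad": "sad", "#joy": "joy"}

_HASHTAG = re.compile(r"#\w+")

def label_from_hashtags(tweet):
    """Return the emotion label implied by the tweet's hashtags, or None.

    Tweets with no emotion hashtag, or with hashtags for more than one
    emotion, return None so they are not used as annotated examples.
    """
    tags = [t.lower() for t in _HASHTAG.findall(tweet)]
    labels = {EMOTION_TAGS[t] for t in tags if t in EMOTION_TAGS}
    return labels.pop() if len(labels) == 1 else None
```

Non-emotion hashtags such as #rude or #issues are simply ignored, so a tweet tagged "#rude #youranadult #anger #issues" is still labeled as anger.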
Collect Emotion Data
Wait! Need Control Group
Not-Emotion Data
Preprocessing Steps
▷Hints: Remove troublesome ones
o Too short
→ Too short to get important features
o Contain too many hashtags
→ Too much information to process
o Are retweets
→ Increase the complexity
o Have URLs
→ Too much trouble to collect the page data
o Convert user mentions to <usermention> and hashtags to <hashtag>
→ Remove the identification. We should not peek at the answers!
Big Data anyway
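The filtering and masking hints above can be sketched as one function. The thresholds `min_tokens` and `max_hashtags`, and the convention that a retweet starts with the token "RT", are assumptions made for illustration; the slides do not give concrete values.

```python
import re

_URL = re.compile(r"https?://\S+")
_MENTION = re.compile(r"@\w+")
_HASHTAG = re.compile(r"#\w+")

def preprocess(tweet, min_tokens=3, max_hashtags=3):
    """Return the cleaned tweet, or None if it should be discarded.

    min_tokens and max_hashtags are hypothetical thresholds.
    """
    tokens = tweet.split()
    if len(tokens) < min_tokens:                      # too short
        return None
    if len(_HASHTAG.findall(tweet)) > max_hashtags:   # too many hashtags
        return None
    if tokens[0] == "RT":                             # assumed retweet marker
        return None
    if _URL.search(tweet):                            # has a URL
        return None
    # Mask identities and the label-bearing hashtags.
    cleaned = _MENTION.sub("<usermention>", tweet)
    cleaned = _HASHTAG.sub("<hashtag>", cleaned)
    return cleaned
```

Because the data set is large, discarding troublesome tweets outright is cheaper than trying to repair them. The label should be read from the hashtags before this step, since masking removes it.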
Basic Guidelines
▷ Identify the commonalities and differences between the experimental and control groups
• Analyze the frequency of words
→ TF•IDF (term frequency, inverse document frequency)
• Analyze the co-occurrence between words/patterns
→ Co-occurrence
• Analyze the importance between words
→ Centrality
Graph
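As a concrete instance of the word-frequency guideline, the following sketch computes one common TF-IDF variant over tokenized documents. The exact weighting used in the original work is not stated in the slides, so the formula here (relative term frequency times the natural log of inverse document frequency) is an assumption.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)
    Returns one {term: weight} dict per document.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))   # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A word that appears in every document, in both the emotion and the control data, gets a weight of zero, which is the behavior wanted when looking for what separates the two groups.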
Graph Construction
▷Construct two graphs
• E.g.
→ Emotion one: I love the World of Warcraft new game
→ Not-emotion one: 3,000 killed in the world by ebola
[Figure: two weighted word graphs. Emotion graph nodes: I, Love, the, World, of, Warcraft, new, game. Not-emotion graph nodes: 3,000, killed, in, the, world, by, ebola. Edge weights range from 0.12 to 0.93.]
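A word graph of this kind can be sketched by linking each word to the word that follows it. The slides do not say how the edge weights are computed, so the weighting below (the fraction of a word's outgoing transitions that go to a given successor) is purely an assumption, and `build_word_graph` is a hypothetical name.

```python
from collections import Counter, defaultdict

def build_word_graph(sentences):
    """Build a directed word graph from tokenized sentences.

    Nodes are words; an edge (a, b) exists when b directly follows a.
    The weight is the fraction of a's outgoing transitions that go to b
    (an assumed weighting; the slides do not specify one).
    """
    bigrams = Counter()
    out_totals = Counter()
    for tokens in sentences:
        for a, b in zip(tokens, tokens[1:]):
            bigrams[(a, b)] += 1
            out_totals[a] += 1
    graph = defaultdict(dict)
    for (a, b), count in bigrams.items():
        graph[a][b] = count / out_totals[a]
    return dict(graph)
```

One graph would be built from the emotion tweets and a second from the not-emotion tweets, so that the two can be compared.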
Graph Processes
▷Remove the common ones between the two graphs
• Leave the significant ones that only appear in the emotion graph
▷Analyze the centrality of words
• Betweenness, Closeness, Eigenvector, Degree, Katz
→ Can use free/open software, e.g., Gephi, GraphDB
▷Analyze the cluster degrees
• Clustering Coefficient
Graph
Key patterns
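The graph steps can be sketched in plain Python for small inputs. This is a stand-in for the graph tools named above, not the original pipeline: it treats edges as undirected, shows only degree centrality out of the listed measures, and all function names are hypothetical.

```python
from itertools import combinations

def emotion_only_edges(emotion_edges, control_edges):
    """Keep only the undirected edges that appear in the emotion graph alone."""
    as_sets = lambda edges: {frozenset(e) for e in edges}
    return as_sets(emotion_edges) - as_sets(control_edges)

def neighbors(edges):
    """Build an adjacency map {node: set of neighbors} from undirected edges."""
    adj = {}
    for edge in edges:
        if len(edge) != 2:          # skip self-loops
            continue
        a, b = tuple(edge)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {v: (len(nb) / (n - 1) if n > 1 else 0.0) for v, nb in adj.items()}

def clustering_coefficient(adj, node):
    """Fraction of pairs of the node's neighbors that are themselves linked."""
    nb = adj[node]
    k = len(nb)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))
```

Words with high centrality and a high clustering coefficient in the remaining emotion-only graph are candidates for the key patterns.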
Essence Only
Only key phrases
→emotion patterns
Ranking Emotion Patterns
▷ Ranking the emotion patterns for each emotion
• Frequency, exclusiveness, diversity
• One ranked list for each emotion
Sad, Joy, Anger
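The slides name three ranking criteria but do not define them, so the sketch below fills them in with assumed definitions: frequency is the pattern's share of all pattern occurrences within an emotion, exclusiveness is the share of the pattern's occurrences that fall in that emotion, and diversity grows with the number of distinct words seen in the wildcard slot. Combining them by multiplication is also an assumption.

```python
import math
from collections import Counter, defaultdict

def rank_patterns(observations):
    """Rank patterns per emotion from (emotion, pattern, filler) triples.

    filler is the word that matched the pattern's wildcard.
    All three scores below are assumed definitions for illustration.
    """
    count = Counter()             # (emotion, pattern) -> occurrences
    per_emotion = Counter()       # emotion -> occurrences
    per_pattern = Counter()       # pattern -> occurrences over all emotions
    fillers = defaultdict(set)    # (emotion, pattern) -> distinct fillers
    for emotion, pattern, filler in observations:
        count[(emotion, pattern)] += 1
        per_emotion[emotion] += 1
        per_pattern[pattern] += 1
        fillers[(emotion, pattern)].add(filler)

    ranked = defaultdict(list)
    for (emotion, pattern), c in count.items():
        frequency = c / per_emotion[emotion]
        exclusiveness = c / per_pattern[pattern]
        diversity = math.log(1 + len(fillers[(emotion, pattern)]))
        ranked[emotion].append((frequency * exclusiveness * diversity, pattern))
    # One list per emotion, best pattern first.
    return {e: [p for _, p in sorted(items, reverse=True)]
            for e, items in ranked.items()}
```

The result is one ranked list per emotion, which matches the output described on the slide even though the scoring itself is illustrative.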
Emotion Pattern Samples
Joy: finally * my tomorrow !!! * <hashtag> birthday .+ * yay ! :) * ! princess * * hehe prom dress *
Sad: memories * * without my sucks * <hashtag> * tonight :( * anymore .. felt so * . :( * * :((
Anger: my * always shut the * teachers * people say * -.- * understand why * why are * with these *
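Patterns such as `my * always` can be matched against a tokenized tweet with a sliding window. The sketch assumes that `*` stands for exactly one arbitrary token and that tokens are already lowercased; the slides do not state the wildcard semantics, and `matches` is a hypothetical helper.

```python
def matches(pattern, tokens):
    """True if the pattern occurs as a contiguous token sequence.

    Pattern tokens are separated by spaces; "*" is assumed to match
    exactly one arbitrary token.
    """
    parts = pattern.split()
    width = len(parts)
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(p == "*" or p == t for p, t in zip(parts, window)):
            return True
    return False
```

For example, the anger tweet "My sister always makes things look much more worse than they seem" contains `my * always` once it is lowercased and split on whitespace.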
Precision
[Bar chart: accuracy (0% to 100%) for Naïve Bayes, SVM, NRCWE, and Our Approach, each with LIWC and with no LIWC]
Feedback for Products
Product Preference Analysis
5. Case 2: Analyze Emotion Status for Individuals
Who has bipolar disorder?
Who has borderline personality disorder?
Collect Patient Data
Support Group
Followers
Wait! Control Group Needed
Collect Data from Ordinary People
Basic Guidelines
▷ Identify the commonalities and differences between the experimental and control groups
• Word/pattern frequency
• Emotion-related data (e.g., flipping rates, occurrence rates)
• Social interaction (e.g., retweet, reply)
• Lifestyle (e.g., online time, staying up late or not)
• Age and gender
Features
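The emotion-related features can be sketched from a user's chronological sequence of per-post emotion labels. The slides do not define "flipping rate", so the definition here (the fraction of consecutive posts whose label differs) is an assumption, as are both function names.

```python
from collections import Counter

def flipping_rate(labels):
    """Fraction of consecutive post pairs whose emotion label changes.

    This definition is an assumption; the slides do not give one.
    """
    if len(labels) < 2:
        return 0.0
    flips = sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    return flips / (len(labels) - 1)

def occurrence_rates(labels):
    """Share of posts carrying each emotion label."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}
```

Both functions return plain numbers, so their outputs can be placed next to the social-interaction and lifestyle features in one feature vector per user.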
Apply Classifiers
▷ By utilizing the extracted features
▷ Various classifiers
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines
• Random forest
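To make the classification step concrete, here is a minimal Gaussian Naive Bayes written in plain Python, standing in for the Naïve Bayes entry in the list. It is a teaching sketch rather than the classifier used in the original study, and the class name and the small variance floor are assumptions.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Tiny Gaussian Naive Bayes for numeric feature vectors."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for features, label in zip(X, y):
            by_class[label].append(features)
        self.priors, self.stats = {}, {}
        for label, rows in by_class.items():
            self.priors[label] = len(rows) / len(X)
            self.stats[label] = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                # Small floor (assumed value) avoids division by zero.
                self.stats[label].append((mean, var + 1e-9))
        return self

    def predict(self, features):
        def log_posterior(label):
            total = math.log(self.priors[label])
            for value, (mean, var) in zip(features, self.stats[label]):
                total += -0.5 * math.log(2 * math.pi * var)
                total += -((value - mean) ** 2) / (2 * var)
            return total
        return max(self.priors, key=log_posterior)
```

Each row of `X` would hold one user's extracted features and `y` the group that user was collected from; any numeric features work, since the model assumes only that each feature is roughly normally distributed within a class.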
Precisions
Possible Applications
Election Analysis?