Post on 26-Jan-2015
Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter
Hassan Saif, Yulan He, Miriam Fernandez and Harith Alani
Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom
1st Workshop on Semantic Sentiment Analysis, Crete, Greece, 2014
Outline
• Sentiment Analysis
• Sentiment Analysis Approaches
• Sentiment Lexicons on Twitter
• Sentiment Lexicon Adaptation Approach
• Evaluation
• Conclusion
“Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text”
Sentiment Analysis
Example tweets (labels on the slide: Opinion, Opinion, Fact):
yes, It is sunny, but also very humid :(
The weather is great today :)
I think its almost 30 degrees today
Conventional Text
o Rich
o Formal Language {Well-Structured Sentences}
o Domain Specific
Twitter Data
o Short (140 chars)
o Noisy {gr8, lol, :), :P}
o Open Environment
The Lexicon-based Approach
Tweet: I had nightmares all night long last night :(
Sentiment Lexicon + Text Processing Algorithm -> Sentiment Analysis -> Negative
Sentiment Lexicon (example words): great, success, sad, pretty, down, wrong, horrible, beautiful, mistake, love, good
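The lexicon-based approach above can be sketched in a few lines: tokenise the tweet, look each token up in a sentiment lexicon, and sum the scores. This is only an illustration. The scores in `LEXICON` and `EMOTICONS`, the tokeniser, and the `classify` function are assumptions made for the example; the slide lists lexicon words but not their values.

```python
import re

# Hypothetical scores for a few of the example lexicon words.
LEXICON = {
    "great": 3, "good": 2, "love": 3, "beautiful": 3,
    "sad": -2, "horrible": -4, "mistake": -2, "wrong": -2,
}
# Hypothetical emoticon scores.
EMOTICONS = {":)": 2, ":(": -2}


def tokenize(tweet):
    """Split a tweet into lowercase words and the two emoticons :) and :(."""
    return re.findall(r":\)|:\(|[a-z0-9']+", tweet.lower())


def classify(tweet):
    """Sum the lexicon scores of all tokens and map the total to a label."""
    score = 0
    for token in tokenize(tweet):
        score += LEXICON.get(token, 0) + EMOTICONS.get(token, 0)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these assumed scores, the nightmare tweet is labelled negative only because of its emoticon, which hints at why coverage of Twitter-specific tokens matters.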
Sentiment Lexicons
- Lists of opinionated:
  - Words and phrases (MPQA, SentiWordNet, etc.)
  - Common-sense concepts (SenticNet)
- Built:
  - Manually
  - Dictionary-based approach
  - Corpus-based approach
- Applied to conventional text
  - Movie reviews, news, blogs, open forums, etc.
Sentiment Lexicons on Twitter
Twitter Data
- Language variations
- New words
- Noisy nature: lol, gr8, :), :P
Traditional Lexicons
- Not tailored to noisy Twitter data
- Fixed number of words
Twitter-specific Sentiment Lexicons
- Such as: Thelwall-Lexicon
- Built specifically to work on social data
  - Contain lists of emoticons, slang, abbreviations, etc.
- Coupled with a rule-based method, SentiStrength
  - Applies a text pre-processing routine to tweets
Twitter-specific Sentiment Lexicons ... and Traditional Lexicons
Offer context-insensitive prior sentiment orientations and strengths of words
Example: "Great Problem" and "Great Smile" -> Sentiment Lexicon -> "Great" is Positive in both
Lexicon Adaptation Approaches
Supervised
- Require training from labeled corpora
Unsupervised
- Use general textual corpora (e.g., the Web) or static lexical knowledge sources (e.g., WordNet)
Contextual Semantic Adaptation Approach
Unsupervised Approach
Captures the Contextual Semantics of words
To assign Contextual Sentiment
Contextual Semantics of Words
“Words that occur in similar context tend to have similar meaning”
Wittgenstein (1953)
[Figure: the word "Great" shown among context words: Problem, Look, Smile, Concert, Song, Weather, Loss, Game, Taylor Swift, Amazing]
Capturing Contextual Semantics
Term (m) -> Context-Term Vector (C1, C2, ..., Cn)
For each context term: Degree of Correlation with m, and Prior Sentiment (from the Sentiment Lexicon)
-> SentiCircles Model
-> Contextual Sentiment Strength: [-1 (very negative), +1 (very positive)]
-> Contextual Sentiment Orientation: Positive, Negative, Neutral
Example: term "Great" with context terms "Smile" and "Look"
[Slide diagram with steps numbered (1) to (3)]
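To make the context-term vector concrete, here is a minimal sketch that collects the words co-occurring with a term across a set of tweets. The slide does not define how the degree of correlation is computed, so a plain co-occurrence count stands in for it here. The function name `context_vector` and the sample tweets are hypothetical.

```python
from collections import Counter


def context_vector(term, tweets):
    """Count, per context word, the tweets in which it co-occurs with `term`.

    The raw count is a stand-in for the slide's "degree of correlation",
    whose exact definition is not given in the deck.
    """
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        if term in tokens:
            # Each context word is counted once per tweet.
            counts.update(t for t in set(tokens) if t != term)
    return counts


# Hypothetical mini-corpus.
SAMPLE_TWEETS = ["great smile", "great look", "great smile today", "bad weather"]
GREAT_CONTEXT = context_vector("great", SAMPLE_TWEETS)
```

Here `GREAT_CONTEXT` pairs "smile", "look" and "today" with their counts, and leaves out words from tweets that never mention "great".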
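Capturing Contextual Semantics: SentiCircles Model
Example: term (m) = Great, context term C1 = Smile
Each context term Ci is placed on the SentiCircle of m using its Degree of Correlation and Prior Sentiment:
ri = TDOC(Ci)
θi = Prior_Sentiment(Ci) * π
xi = ri * cos(θi)
yi = ri * sin(θi)
[Figure: SentiCircle of "Great" with the point for "Smile" at (xi, yi), radius ri and angle θi. Axes run from -1 to +1; regions labelled Very Positive, Positive, Negative, Very Negative and Neutral Region]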
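The polar-to-Cartesian mapping above can be written out directly. In this sketch the TDOC value is taken as a given number, and the prior sentiment is assumed to be normalised to [-1, +1] so that the angle falls in [-π, +π]. The function name `senticircle_point` and the input values in the usage note are hypothetical.

```python
import math


def senticircle_point(tdoc, prior_sentiment):
    """Place one context term on the SentiCircle.

    tdoc            -- radius r_i, the term's degree of correlation (assumed given)
    prior_sentiment -- prior sentiment of the context term, assumed in [-1, +1]
    """
    theta = prior_sentiment * math.pi   # θi = Prior_Sentiment(Ci) * π
    x = tdoc * math.cos(theta)          # xi = ri * cos(θi)
    y = tdoc * math.sin(theta)          # yi = ri * sin(θi)
    return x, y
```

For example, `senticircle_point(0.8, 0.5)` gives a point on the positive Y axis, since a prior of 0.5 maps to an angle of π/2. Under these formulas a positive prior always yields y > 0 and a negative prior y < 0.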
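SentiCircles (Example)
[Figure: SentiCircle of term m with context term Ci at (xi, yi), radius ri and angle θi. Axes run from -1 to +1; regions labelled Very Positive, Positive, Negative, Very Negative and Neutral Region]
Overall Contextual Sentiment
Senti-Median of SentiCircle
Sentiment Function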
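The slide names the Senti-Median and a sentiment function without defining them. The sketch below assumes the Senti-Median is the geometric median of the SentiCircle points, approximated with Weiszfeld's iteration, and that the sentiment function reads the orientation from the sign of the median's y coordinate. The `neutral_band` threshold and both function names are hypothetical.

```python
import math


def geometric_median(points, iterations=200, eps=1e-9):
    """Approximate the geometric median of 2-D points (Weiszfeld's iteration)."""
    # Start from the centroid.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            # Clamp the distance to avoid dividing by zero when the
            # current estimate coincides with a data point.
            dist = max(math.hypot(px - x, py - y), eps)
            weight = 1.0 / dist
            num_x += weight * px
            num_y += weight * py
            denom += weight
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < eps:
            break
        x, y = new_x, new_y
    return x, y


def orientation(median_y, neutral_band=0.05):
    """Map the median's y coordinate to a label; neutral_band is an assumed width."""
    if median_y > neutral_band:
        return "positive"
    if median_y < -neutral_band:
        return "negative"
    return "neutral"
```

The geometric median is less sensitive to a few outlying context terms than the centroid, which is one plausible reason to prefer a median-style summary of the circle.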
Lexicon Adaptation Method
• A set of Antecedent-Consequent Rules
• Decides on the new sentiment of a term based on:
  – How weak/strong its prior sentiment is
  – How weak/strong its contextual sentiment is
    • Based on the position of the term’s SentiMedian
Thelwall-Lexicon Case Study
Sample entries (term, prior sentiment strength): fiery -2, fiery* -2, vex* -3, witch -1, inspir* 3, trite* -3, cunt* -4, intelligent* 2, joll* 3, suffers -4, loved 4, insidious* -3, despis* -4, hehe* 2
[Bar chart: number of terms by prior sentiment. Positive: 398, Negative: 1919, Neutral: 229]
• Consists of 2546 terms
• Coupled with prior sentiment strength between |1| and |5|
  – [-2, -5]: negative term
  – [2, 5]: positive term
  – [-1, 1]: neutral term
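The strength ranges above translate into a simple lookup. The sketch below uses a handful of the sample entries from the slide and assumes that a trailing `*` marks a prefix wildcard (so `inspir*` would match "inspired"); the deck does not spell out the wildcard semantics. The function names are hypothetical.

```python
# A few of the sample entries shown on the slide.
SAMPLE_LEXICON = {"vex*": -3, "witch": -1, "inspir*": 3, "loved": 4, "hehe*": 2}


def lookup(word, lexicon):
    """Return the prior strength of `word`, or None if it is not covered.

    Exact entries are tried first; entries ending in '*' are then treated
    as prefix wildcards (an assumption about the lexicon format).
    """
    if word in lexicon:
        return lexicon[word]
    for entry, strength in lexicon.items():
        if entry.endswith("*") and word.startswith(entry[:-1]):
            return strength
    return None


def prior_orientation(strength):
    """Apply the slide's ranges: [-5, -2] negative, [2, 5] positive, [-1, 1] neutral."""
    if strength <= -2:
        return "negative"
    if strength >= 2:
        return "positive"
    return "neutral"
```

A word that matches no entry returns `None`, which corresponds to the fixed-coverage limitation that motivates expanding the lexicon.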
Adaptation Rules on Thelwall-Lexicon
Rule 10 (example term: Revolution)
Prior Sentiment < -3 (weak negative) AND Contextual Sentiment = Neutral -> Change to Neutral
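An antecedent-consequent rule of this kind can be represented as a small table of (name, condition, action) entries. The sketch below encodes only Rule 10, with the condition exactly as written on the slide; the other rules are not shown in the deck and are not guessed at. The `adapt` function, its fallback action, and the input values in the usage note are hypothetical.

```python
# Each rule: (name, antecedent over (prior strength, contextual orientation), consequent).
# Only Rule 10 appears on the slide; its threshold is copied as written there.
RULES = [
    (
        "Rule 10",
        lambda prior, contextual: prior < -3 and contextual == "neutral",
        "change to neutral",
    ),
]


def adapt(prior_strength, contextual_orientation):
    """Return (rule name, action) for the first rule whose antecedent holds.

    The fallback of keeping the prior sentiment when no rule fires is an
    assumption made for this sketch.
    """
    for name, antecedent, consequent in RULES:
        if antecedent(prior_strength, contextual_orientation):
            return name, consequent
    return None, "keep prior sentiment"
```

For instance, `adapt(-4, "neutral")` fires Rule 10, while `adapt(-4, "negative")` falls through to the fallback. Keeping rules as data makes it straightforward to add the remaining rules once their conditions are known.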
Experiments
• Sentiment Lexicon
  – Thelwall-Lexicon
• Settings:
  – Update Setting
  – Expand Setting
  – Update + Expand Setting
• Datasets
• Binary Sentiment Classification
  – SentiStrength
    • Lexicon-based method
    • Works on Thelwall-Lexicon
Results
Adaptation Impact on Thelwall-Lexicon
Results
Cross comparison results of the original and the adapted lexicons
Adapted Lexicons on HCR: Performance
[Bar chart: Positive Sentiment Detection. Precision, Recall and F1 (axis from 35 to 45) for the Original, Updated and Updated+Expanded lexicons]
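For reference, precision, recall and F1 for one sentiment class can be computed from gold and predicted labels as in the sketch below. The function name and the label values in the usage note are hypothetical; the chart's actual figures are not reproduced here.

```python
def precision_recall_f1(gold, predicted, target="positive"):
    """Compute precision, recall and F1 for the `target` class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Calling it with `target="positive"` corresponds to the positive sentiment detection case shown in the chart; the same function with `target="negative"` covers the other class.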
Sentiment Class Distribution
[Bar chart: Positive to Negative Ratio for OMD, HCR and STS-Gold (axis from 0.35 to 0.6)]
Impact on Thelwall-Lexicon
[Bar chart: New Words Added To Thelwall-Lexicon for OMD, HCR and STS-Gold (axis from 10 to 28)]
Conclusion
• We proposed an unsupervised approach for sentiment lexicon adaptation from Twitter data.
• It updates the words’ prior sentiment orientations and/or strengths based on their contextual semantics in tweets.
• The evaluation was done on Thelwall-Lexicon using three Twitter datasets.
• Results showed that lexicons adapted by our approach improved sentiment classification performance in both accuracy and F1 on two out of three datasets.
Thank You
Email: hassan.saif@open.ac.uk
Twitter: hrsaif
Website: tweenator.com