Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter

Post on 26-Jan-2015


description

Sentiment lexicons for sentiment analysis offer a simple, yet effective way to obtain the prior sentiment information of opinionated words in texts. However, words' sentiment orientations and strengths often change throughout various contexts in which the words appear. In this paper, we propose a lexicon adaptation approach that uses the contextual semantics of words to capture their contexts in tweet messages and update their prior sentiment orientations and/or strengths accordingly. We evaluate our approach on one state-of-the-art sentiment lexicon using three different Twitter datasets. Results show that the sentiment lexicons adapted by our approach outperform the original lexicon in accuracy and F-measure in two datasets, but give similar accuracy and slightly lower F-measure in one dataset.

transcript

Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis of Twitter

Hassan Saif, Yulan He, Miriam Fernandez and Harith Alani

Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom

1st Workshop on Semantic Sentiment Analysis, Crete, Greece, 2014

Outline

• Sentiment Analysis

• Sentiment Analysis Approaches

• Sentiment Lexicons on Twitter

• Sentiment Lexicon Adaptation Approach

• Evaluation

• Conclusion

“Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text”


Opinion, Opinion, Fact

Sentiment Analysis

yes, It is sunny, but also very humid :(

The weather is great today :)

I think its almost 30 degrees today

Conventional Text

o Rich

o Formal Language {Well-Structured Sentences}

o Domain Specific

Twitter Data

o Short (140 chars)

o Noisy {gr8, lol, :), :P}

o Open Environment

The Lexicon-based Approach

I had nightmares all night long last night :(

Sentiment Lexicon + Text Processing Algorithm → Sentiment Analysis → Negative
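The lexicon-based approach can be sketched in a few lines of Python. This is only an illustration: the toy lexicon, its scores, the tokenizer and the function names are all assumptions, not entries or code from any real lexicon or tool. It sums the prior scores of the words found in a tweet and maps the total to a label.

```python
import re

# Hypothetical toy lexicon (word -> prior score); not taken from a real lexicon.
TOY_LEXICON = {
    "great": 3, "love": 3, "good": 2, ":)": 2,
    "nightmares": -3, "sad": -2, ":(": -2,
}

def tokenize(text):
    """Lowercase the text and keep simple emoticons and word tokens."""
    return re.findall(r"[:;][()dp]|[a-z0-9']+", text.lower())

def classify(text, lexicon=TOY_LEXICON):
    """Sum the prior scores of known tokens and map the total to a label."""
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

print(classify("I had nightmares all night long last night :("))
```

Words missing from the lexicon contribute nothing, which is why a fixed word list struggles with new or noisy Twitter vocabulary.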

great, success, sad, pretty, down, wrong, horrible, beautiful, mistake, love, good

Sentiment Analysis

Sentiment Lexicons

- Lists of Opinionated:
  - Words and Phrases (MPQA, SentiWordNet, etc.)
  - Common Sense Concepts (SenticNet)

- Built:
  - Manually
  - Dictionary-based Approach
  - Corpus-based Approach

- Applied to Conventional Text
  - Movie Reviews, News, Blogs, Open Forums, etc.

Sentiment Lexicons on Twitter

Twitter Data

- Language Variations

- New Words

- Noisy Nature: lol, gr8, :), :P

Traditional Lexicons

- Not tailored to Twitter noisy data

- Fixed number of words

Twitter-specific Sentiment Lexicons

- Such as: Thelwall-Lexicon

- Built to specifically work on social data
  - Contain lists of emoticons, slang, abbreviations, etc.

- Coupled with rule-based method, SentiStrength
  - Apply text pre-processing routine on tweets

Twitter-specific Sentiment Lexicons ... and Traditional Lexicons

Offer Context-Insensitive Prior Sentiment Orientations and Strengths of words

Example: "Great Problem" vs. "Great Smile"

Sentiment Lexicon

great, success, sad, pretty, down, wrong, horrible, beautiful, mistake, love, good

Positive

Lexicon Adaptation Approaches

Supervised

- Require Training from Labeled Corpora

Unsupervised

- Use General Textual Corpora (e.g., the Web) or Static Lexical Knowledge Sources (e.g., WordNet)

Contextual Semantic Adaptation Approach

Unsupervised Approach

Captures the Contextual Semantics of words

To assign Contextual Sentiment

Contextual Semantics of Words

“Words that occur in similar context tend to have similar meaning” Wittgenstein (1953)

[Figure: example words and their contexts: Great, Amazing, Problem, Look, Smile, Concert, Song, Weather, Loss, Game, Taylor Swift]

Capturing Contextual Semantics

(1) Context-Term Vector: Term (m) → C1, C2, …, Cn (example: Great → Smile, Look)

(2) Degree of Correlation between m and each Ci, and Prior Sentiment of each Ci (from the Sentiment Lexicon)

(3) SentiCircles Model

Contextual Sentiment Orientation: Positive, Negative, Neutral

Contextual Sentiment Strength: [-1 (very negative), +1 (very positive)]
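As a rough sketch of how a context-term vector and a degree of correlation might be computed, the Python below counts the words that co-occur with a term across tweets and weights them. The weighting (co-occurrence count times a log inverse-frequency factor, scaled so the largest weight is 1) is an assumption made for illustration; the slides do not define the formula. The function names and the whitespace tokenization are hypothetical as well.

```python
import math
from collections import Counter

def context_term_vector(term, tweets):
    """Count how often each other word appears in a tweet containing `term`."""
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        if term in tokens:
            counts.update(t for t in tokens if t != term)
    return counts

def correlation_weights(term, tweets):
    """Assumed weighting: count * log(N / N_c), scaled to a maximum of 1.

    N is the vocabulary size; N_c is the number of distinct words that
    co-occur with context word c somewhere in the corpus.
    """
    tokenized = [set(t.lower().split()) for t in tweets]
    vocab = set().union(*tokenized)
    neighbours = {w: set() for w in vocab}
    for tokens in tokenized:
        for w in tokens:
            neighbours[w] |= tokens - {w}
    counts = context_term_vector(term, tweets)
    raw = {c: f * math.log(len(vocab) / len(neighbours[c]))
           for c, f in counts.items()}
    top = max(raw.values(), default=0.0)
    return {c: (v / top if top > 0 else 0.0) for c, v in raw.items()}

tweets = ["great smile", "great smile today", "great problem"]
print(context_term_vector("great", tweets))
print(correlation_weights("great", tweets))
```

Scaling the weights to at most 1 keeps them usable as radii inside a unit circle.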

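Capturing Contextual Semantics

SentiCircles Model

Term (m): Great

Context term (C1): Smile, with its Degree of Correlation and Prior Sentiment

ri = TDOC(Ci)

θi = Prior_Sentiment(Ci) * π

xi = ri * cos(θi)

yi = ri * sin(θi)

[Figure: SentiCircle of "Great", with the context term "Smile" plotted at (xi, yi) using radius ri and angle θi. Both axes range from -1 to +1. Quadrants: Positive, Very Positive, Very Negative, Negative, with a Neutral Region along the X axis.]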
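The polar-to-Cartesian mapping above can be written directly in Python. This is a minimal sketch: the function name is hypothetical, and it assumes the degree of correlation has already been scaled to [0, 1] and the prior sentiment to [-1, 1].

```python
import math

def senticircle_point(tdoc, prior_sentiment):
    """Place a context term in the SentiCircle.

    tdoc            -- degree of correlation, assumed scaled to [0, 1]
    prior_sentiment -- prior sentiment, assumed scaled to [-1, 1]
    """
    r = tdoc
    theta = prior_sentiment * math.pi
    return r * math.cos(theta), r * math.sin(theta)

# A strongly correlated context term with a mildly positive prior.
x, y = senticircle_point(0.8, 0.25)
print(round(x, 3), round(y, 3))
```

Positive priors give angles in (0, π) and so land in the upper half of the circle, while negative priors land in the lower half. A lexicon with strengths in [-5, 5] would need rescaling first; dividing by 5 is one possible choice, not something the slides specify.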

SentiCircles (Example)

Overall Contextual Sentiment

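[Figure: SentiCircle of a term m, with each context term Ci plotted at (xi, yi) using radius ri and angle θi. Both axes range from -1 to +1. Quadrants: Positive, Very Positive, Very Negative, Negative, with a Neutral Region along the X axis.]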

Senti-Median of SentiCircle

Sentiment Function
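One way to summarise a SentiCircle by a single point is the geometric median of its points. The sketch below assumes that this is what the Senti-Median denotes, which the slides do not spell out, and approximates it with Weiszfeld's iteration. The sentiment function and its `neutral_band` threshold are hypothetical values chosen for illustration.

```python
import math

def geometric_median(points, iterations=100, eps=1e-9):
    """Approximate the geometric median of 2D points (Weiszfeld's iteration)."""
    x = sum(p[0] for p in points) / len(points)  # start from the centroid
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        num_x = num_y = denom = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < eps:
                continue  # simplification: skip points at the current estimate
            num_x += px / d
            num_y += py / d
            denom += 1.0 / d
        if denom == 0.0:
            break  # every point coincides with the estimate
        new_x, new_y = num_x / denom, num_y / denom
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < eps:
            break
    return x, y

def sentiment_of(median, neutral_band=0.05):
    """Map the median's position to a label; `neutral_band` is an assumed value."""
    _, y = median
    if abs(y) <= neutral_band:
        return "Neutral"
    return "Positive" if y > 0 else "Negative"

circle = [(0.1, 0.5), (0.2, 0.6), (-0.1, 0.4)]
print(sentiment_of(geometric_median(circle)))
```

A median is less sensitive than the mean to a few outlying context terms, which is a plausible reason to prefer it for noisy tweet data.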

Lexicon Adaptation Method

• A set of Antecedent-Consequent Rules

• Decides on the new sentiment of a term based on:
  – How Weak/Strong its Prior Sentiment
  – How Weak/Strong its Contextual Sentiment
    • Based on the Position of the term’s SentiMedian

Thelwall-Lexicon Case Study

Sample entries: fiery -2, fiery* -2, vex* -3, witch -1, inspir* 3, trite* -3, cunt* -4, intelligent* 2, joll* 3, suffers -4, loved 4, insidious* -3, despis* -4, hehe* 2

[Figure: number of terms per class in Thelwall-Lexicon: Positive 398, Negative 1919, Neutral 229]

• Consists of 2546 terms

• Coupled with prior sentiment strength between |1| and |5|
  – [-5, -2]: negative term
  – [2, 5]: positive term
  – [-1, 1]: neutral term
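The strength ranges above translate into a small helper. The function name is hypothetical; the thresholds are the ones listed on the slide.

```python
def prior_class(strength):
    """Map a Thelwall-Lexicon strength in [-5, 5] to its prior class."""
    if not -5 <= strength <= 5:
        raise ValueError("strength must lie between -5 and 5")
    if strength >= 2:
        return "positive"
    if strength <= -2:
        return "negative"
    return "neutral"

# Sample entries quoted in the case study.
for word, strength in [("fiery", -2), ("witch", -1), ("inspir*", 3)]:
    print(word, prior_class(strength))
```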

Adaptation Rules on Thelwall-Lexicon

Rule 10 (example term: Revolution)

Prior Sentiment < -3 (weak negative) AND Contextual Sentiment = Neutral → Change to Neutral

Experiments

• Sentiment Lexicon
  – Thelwall-Lexicon

• Settings:
  – Update Setting
  – Expand Setting
  – Update + Expand Setting

• Datasets

• Binary Sentiment Classification
  – SentiStrength
    • Lexicon-based Method
    • Works on Thelwall-Lexicon
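The three settings can be pictured as two switches on a merge step. The sketch below is an interpretation, not the authors' code: it assumes Update changes the scores of words already in the lexicon, Expand adds words that are missing from it, and that `contextual` already holds the sentiment decided for each word. All names and scores are hypothetical.

```python
def adapt_lexicon(lexicon, contextual, update=True, expand=False):
    """Return an adapted copy of `lexicon`.

    lexicon    -- word -> prior score
    contextual -- word -> new score (assumed to be already decided)
    update     -- overwrite scores of words already in the lexicon
    expand     -- add words that are not yet in the lexicon
    """
    adapted = dict(lexicon)
    for word, score in contextual.items():
        if word in lexicon:
            if update:
                adapted[word] = score
        elif expand:
            adapted[word] = score
    return adapted

original = {"fiery": -2, "loved": 4}   # toy entries
proposed = {"fiery": 0, "gr8": 3}      # hypothetical contextual scores
print(adapt_lexicon(original, proposed, update=True, expand=False))
print(adapt_lexicon(original, proposed, update=False, expand=True))
print(adapt_lexicon(original, proposed, update=True, expand=True))
```

Copying the lexicon first keeps the original intact, so the original and adapted versions can be compared on the same data.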

Results

Adaptation Impact on Thelwall-Lexicon

Results

Cross comparison results of the original and the adapted lexicons

Adapted Lexicons on HCR: Performance

[Figure: Positive Sentiment Detection on HCR, showing Precision, Recall and F1 for the Original, Updated and Updated+Expanded lexicons]
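For reference, precision, recall, F1 and accuracy for one class of a binary task can be computed as below. This is a generic sketch with hypothetical names and toy labels; it does not reproduce the evaluation code or the results behind the slides.

```python
def binary_metrics(gold, predicted, positive_label="positive"):
    """Accuracy, precision, recall and F1 with respect to `positive_label`."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = tn = 0
    for g, p in zip(gold, predicted):
        if p == positive_label and g == positive_label:
            tp += 1
        elif p == positive_label:
            fp += 1
        elif g == positive_label:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

gold = ["positive", "positive", "negative", "negative"]       # toy labels
predicted = ["positive", "negative", "negative", "positive"]  # toy predictions
print(binary_metrics(gold, predicted))
```

Calling the function once per class, with `positive_label` set to each label in turn, gives per-class figures such as positive sentiment detection.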

Sentiment Class Distribution

[Figure: Positive to Negative Ratio for the OMD, HCR and STS-Gold datasets]

Impact on Thelwall-Lexicon

[Figure: New Words Added to Thelwall-Lexicon for the OMD, HCR and STS-Gold datasets]

Conclusion

• We proposed an unsupervised approach for sentiment lexicon adaptation from Twitter data.

• It updates the words’ prior sentiment orientations and/or strengths based on their contextual semantics in tweets.

• The evaluation was done on Thelwall-Lexicon using three Twitter datasets.

• Results showed that lexicons adapted by our approach improved the sentiment classification performance in both accuracy and F1 in two out of three datasets.

Thank You

Email: hassan.saif@open.ac.uk

Twitter: hrsaif

Website: tweenator.com