
Jiang Fei f91.jiang@gmail State Key Laboratory of Intelligent Technology and Systems

Date post: 22-Feb-2016
Transcript
Page 1: Jiang  Fei f91.jiang@gmail State Key Laboratory of Intelligent Technology and Systems

Every Term Has Sentiment: Learning from Emoticon Evidences for Chinese Microblog Sentiment Analysis

Jiang Fei

State Key Laboratory of Intelligent Technology and Systems
Department of Computer Science and Technology
Tsinghua University

Page 2

Outline

• Introduction
• Main work
  • Sentiment lexicon construction
  • Feature extraction
  • Classification
• Experiments
• Conclusion
• Future work

Page 3

Introduction

• Objective
  • Automatic sentiment lexicon construction.
  • Document-level classification: positive, negative, and neutral.
• Existing problems and solutions
  • Limited coverage of human-constructed sentiment lexicons (solution: automatic lexicon construction).
  • Lack of labeled data (solution: use emoticon signals, or use noisy data provided by some websites).
• Our contribution
  • No need for a large amount of neutral corpora
  • Using proper emoticons
  • Every word has potential sentiment
  • Multi-view features

Page 4

Main work

• Sentiment lexicon construction based on emoticons

• Feature extraction based on sentiment lexicon

• Sentiment classification

Page 6

Investigation on emoticons: statistics of quantity distribution

[Figure: bar chart of the proportion of microblogs (y-axis, 0 to 0.8) by number of emoticons (x-axis, 0 to 9+)]

• With emoticons: ~32%
• With one emoticon: ~18%
• With more than one emoticon: ~14%

Page 7

Investigation on emoticons: statistics of sentiment distribution

[Figure: bar chart of the proportion (y-axis, 0 to 0.9) of positive, neutral, and negative microblogs for each emoticon on the x-axis: […], [哈哈] (haha), […], [给力] (awesome), [good], [泪] (tears), [悲伤] (sad), [弱] (weak), [鄙视] (despise), [怒] (angry)]

Page 8

Approach I: Label Propagation

$$s_{n+1} = \alpha \cdot W \cdot s_n + (1 - \alpha)\, b$$

• $s_n$: sentiment score vector after the n-th iteration
• $\alpha \in [0, 1]$: controls the impact of the seeds
• $b$: initial vector of dimension $|V|$, with 1 for seeds (the emoticons above)
• $W$: co-occurrence matrix, with -1 for negation-modified words

Based on our previous work: Emotion tokens: bridging the gap among multilingual twitter sentiment analysis. AIRS'11 (2011)
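The update rule on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `propagate`, the toy matrix, and the fixed iteration count are hypothetical, and the sketch assumes W is scaled so that the iteration converges, which the slide does not specify.

```python
def propagate(W, b, alpha, iterations=50):
    """Iterate s_{n+1} = alpha * W * s_n + (1 - alpha) * b."""
    s = list(b)  # s_0 starts from the seed vector
    for _ in range(iterations):
        s = [
            alpha * sum(w_ij * s_j for w_ij, s_j in zip(row, s)) + (1 - alpha) * b_i
            for row, b_i in zip(W, b)
        ]
    return s


# Toy vocabulary (assumed): index 0 is a seed emoticon, index 1 co-occurs
# with it, index 2 co-occurs with it only under negation (weight -1).
W = [
    [0.0, 0.5, -0.5],
    [1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
]
b = [1.0, 0.0, 0.0]
scores = propagate(W, b, alpha=0.5)
```

In this toy case the word that co-occurs with the seed ends up with a positive score and the negation-modified word with a negative one, which is the behavior the -1 entries in W are meant to produce.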

Page 9

Approach II: Frequency Statistics for Sufficient Corpus

• Set B: the negative set, microblogs containing negative emoticons ([弱], [鄙视], [怒])

Page 10

OOV/phrase extraction

• Word segmentation
• n-gram
  • Concatenate adjacent words
  • To reduce computational complexity, n ≤ 4
• Compute two metrics

[Equations: two metrics for a candidate term t, expressed in terms of count(t) and freq(t)]

Motivation: 说真的,这款手机太次了,不给力! ("Honestly, this phone is terrible, it just doesn't deliver!")
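The n-gram step (concatenating adjacent segmented words, with n ≤ 4) might look roughly like the sketch below. The function name `ngram_candidates` and the segmentation of the example clause are assumptions for illustration, and the sketch assumes the text has already been split at punctuation.

```python
def ngram_candidates(words, max_n=4):
    """Concatenate 2..max_n adjacent segmented words into candidate terms."""
    candidates = []
    for start in range(len(words)):
        for n in range(2, max_n + 1):
            if start + n > len(words):
                break
            candidates.append("".join(words[start:start + n]))
    return candidates


# Hypothetical segmentation of part of the motivating sentence.
words = ["太", "次", "了", "不", "给力"]
candidates = ngram_candidates(words)
```

Candidates such as 太次 and 不给力 appear in the output; the two metrics on the slide would then decide which candidates to keep.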

Page 11

Sentiment lexicon construction

60,000 words/OOVs/phrases/emoticons in total

Page 12

Main work

• Sentiment lexicon construction based on emoticons

• Feature extraction based on sentiment lexicon

• Sentiment classification

Page 13

Feature extraction

• Microblog structure features
  • Number of mentions (@)
  • Number of URLs
  • Number of hashtags
  • …
• Sentence structure features
  • Number of ";"
  • Number of "%"
  • Existence of continuous serial numbers
  • …

Page 14

Feature extraction

• Word segmentation/part-of-speech tagging
• Negations
  • Constructed a negation list
  • A negation word modifies the first v/a/p after it
  • Invalidation window
• Greedy longest match

Example:

Input: 这位先生,您真是站着说话不腰疼[鄙视] ("Sir, it is easy for you to talk when you are not the one suffering [鄙视]")

Segmented and tagged: 这/rzv 位/q 先生/noun ,/wd 您/rr 真/d 是/vshi 站/n 着/uzhe 说/v 话/n 不/d 腰/n 疼/v [鄙视]

Candidate terms: 真、是、站、着、说、话、不、腰、疼、您、真是、站着、说话、腰疼、这位、先生, [鄙视]

After greedy longest match and negation: 这位, 先生, 您, 真是, 站着, 说话, 腰疼 (-1), [鄙视]
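The negation rule (a negation word modifies the first v/a/p after it, within an invalidation window) can be sketched as follows. The negation list, the window size, and the function name `apply_negation` are assumptions. The sketch works on single tagged tokens, so it marks 疼 rather than the merged term 腰疼 that the slide obtains after greedy longest match.

```python
NEGATIONS = {"不", "没", "没有", "别"}  # illustrative subset, not the authors' list
TARGET_TAGS = ("v", "a", "p")


def apply_negation(tagged, window=3):
    """Return (word, polarity) pairs; polarity is -1 for a negated word, else 1.

    `window` is an assumed limit on how far a negation word reaches.
    Negation words themselves are dropped from the output.
    """
    polarity = [1] * len(tagged)
    for i, (word, _) in enumerate(tagged):
        if word not in NEGATIONS:
            continue
        # Flip only the first verb/adjective/p-tagged token inside the window.
        for j in range(i + 1, min(i + 1 + window, len(tagged))):
            if tagged[j][1].startswith(TARGET_TAGS):
                polarity[j] = -1
                break
    return [(word, pol) for (word, _), pol in zip(tagged, polarity)
            if word not in NEGATIONS]


tagged = [("说", "v"), ("话", "n"), ("不", "d"), ("腰", "n"), ("疼", "v")]
result = apply_negation(tagged)
```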

Page 15

Feature extraction

• Sentiment lexicon features
  • (Maximum, Product) of (positive, negative) score of words/phrases
• Emoticon features
  • (Maximum, Product) of (positive, negative) score of emoticons
• MDA (modified by degree adverb) features
  • (Maximum, Product) of (positive, negative) score of MDA
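Each feature group above yields four numbers: the maximum and the product of the positive scores, and the same for the negative scores. A sketch under stated assumptions: the lexicon layout (term mapped to a positive/negative score pair), the scores themselves, and the zero default for microblogs without matches are all hypothetical.

```python
import math


def lexicon_features(terms, lexicon):
    """Maximum and product of positive and negative scores over matched terms.

    `lexicon` maps a term to an assumed (positive, negative) score pair.
    """
    matched = [lexicon[t] for t in terms if t in lexicon]
    if not matched:
        # Assumed default when no term of the microblog is in the lexicon.
        return {"max_pos": 0.0, "prod_pos": 0.0, "max_neg": 0.0, "prod_neg": 0.0}
    pos = [p for p, _ in matched]
    neg = [n for _, n in matched]
    return {
        "max_pos": max(pos), "prod_pos": math.prod(pos),
        "max_neg": max(neg), "prod_neg": math.prod(neg),
    }


# Made-up scores for illustration only.
lexicon = {"给力": (0.9, 0.1), "太次": (0.2, 0.8)}
feats = lexicon_features(["手机", "太次", "给力"], lexicon)
```

The same function could be applied separately to words/phrases, emoticons, and degree-adverb-modified terms to obtain the three groups.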

Page 16

Main work

• Sentiment lexicon construction based on emoticons

• Feature extraction based on sentiment lexicon

• Sentiment classification

Page 17

Sentiment classification with SVM

• One-stage three-class classification (libsvm)
• Two-stage two-class classification (hierarchical)
  • neutral vs. non-neutral
  • positive vs. negative
• Two-stage two-class classification (parallel)
  • positive vs. non-positive
  • negative vs. non-negative
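The two two-stage schemes differ only in how binary decisions are combined. The sketch below uses stand-in threshold functions instead of trained SVMs so that it stays self-contained. The rule for merging the two parallel classifiers is an assumption, since the slide does not state it.

```python
def classify_hierarchical(x, is_neutral, is_positive):
    """Stage 1: neutral vs. non-neutral. Stage 2: positive vs. negative."""
    if is_neutral(x):
        return "neutral"
    return "positive" if is_positive(x) else "negative"


def classify_parallel(x, pos_score, neg_score):
    """Combine positive-vs-rest and negative-vs-rest decision scores.

    Assumed rule: a score > 0 means the classifier fired; if neither fires
    the result is neutral; otherwise the larger score wins.
    """
    p, n = pos_score(x), neg_score(x)
    if p <= 0 and n <= 0:
        return "neutral"
    return "positive" if p >= n else "negative"


# Stand-in binary classifiers over a hypothetical feature dict.
def is_neutral(x):
    return max(x["max_pos"], x["max_neg"]) < 0.5

def is_positive(x):
    return x["max_pos"] >= x["max_neg"]

def pos_score(x):
    return x["max_pos"] - 0.5

def neg_score(x):
    return x["max_neg"] - 0.5


strong = {"max_pos": 0.9, "max_neg": 0.1}
weak = {"max_pos": 0.2, "max_neg": 0.3}
labels = [
    classify_hierarchical(strong, is_neutral, is_positive),
    classify_parallel(strong, pos_score, neg_score),
    classify_hierarchical(weak, is_neutral, is_positive),
    classify_parallel(weak, pos_score, neg_score),
]
```

With real models, each stand-in function would be replaced by the decision of a trained binary SVM.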

Page 18

Experiments – Lexicon construction

Define the lexicon error rate $E$ as

[Equation: $E$, expressed in terms of $freq(w)$, $bias(w, NEG)$, and $bias(w, POS)$ for words $w$ in the labeled sets POS and NEG]

Explanation

• $freq(w)$: the frequency of a word.
• $bias(w, \cdot)$: the degree of sentiment bias of a word.
• POS, NEG: labeled words from 《学生褒贬义词典》 (Positive and Negative Words Dictionary for Students)

Page 19

Experiments – Lexicon construction

$$s_{n+1} = \alpha \cdot W \cdot s_n + (1 - \alpha)\, b$$

Page 20

Experiments – Sentiment classification

• Dataset
  • NLP&CC 2013 evaluation, task II, sample data
• Preprocessing
  • positive (happiness, like)
  • negative (sadness, anger, disgust)
  • neutral (none)
  • fear and surprise discarded
• Size
  • 968 for each class, a balanced set
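The preprocessing above is a label mapping followed by balancing. A hedged sketch: the `(text, emotion)` input layout, the function name `build_balanced_set`, and balancing by random down-sampling are assumptions; the slide only states the mapping and the final class size.

```python
import random

EMOTION_TO_SENTIMENT = {
    "happiness": "positive", "like": "positive",
    "sadness": "negative", "anger": "negative", "disgust": "negative",
    "none": "neutral",
}  # fear and surprise are absent, so items with those labels are dropped


def build_balanced_set(items, per_class, seed=0):
    """Map emotion labels to sentiment classes and sample per_class items each."""
    by_class = {"positive": [], "negative": [], "neutral": []}
    for text, emotion in items:
        sentiment = EMOTION_TO_SENTIMENT.get(emotion)
        if sentiment is not None:
            by_class[sentiment].append(text)
    rng = random.Random(seed)
    # random.sample raises ValueError if a class has fewer than per_class items.
    return {label: rng.sample(texts, per_class) for label, texts in by_class.items()}


items = [("a", "happiness"), ("b", "like"), ("c", "anger"), ("d", "sadness"),
         ("e", "none"), ("f", "none"), ("g", "fear")]
balanced = build_balanced_set(items, per_class=2)
```

For the set described on the slide, `per_class` would be 968.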

Page 21

Experiments – Sentiment classification

Page 22

Experiments – Sentiment classification

Method Ⅰ: our lexicon replaced with 情感词汇本体 (the affective lexicon ontology)
Method Ⅱ: Barbosa and Feng [2010]

Our model performs almost the best (0.1% below the top result) in the related task of COAE 2013

Page 23

Conclusion

• Sentiment lexicon construction
  • Different strengths of emoticon signals
  • Every term has potential sentiment
  • No need for a large amount of neutral corpora
• Sentiment features
  • Multiple, different views of microblog characteristics

Page 24

Further work

• A large amount of noisy neutral corpora may help
  • e.g., the output of the current classifier
• Syntactic/semantic features
  • Relations between words (i.e., skip-grams)

Page 25

References

• Barbosa, L., Feng, J.: Robust sentiment detection on twitter from biased and noisy data. In: Coling 2010: Posters. pp. 36–44. Beijing, China (2010)
• Cui, A., Zhang, M., Liu, Y., Ma, S.: Emotion tokens: bridging the gap among multilingual twitter sentiment analysis. In: Proceedings of the 7th Asia conference on Information Retrieval Technology. pp. 238–249. AIRS'11 (2011)
• Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of LREC. vol. 2010 (2010)
• Zhang, W., Liu, J., Guo, X.: Positive and Negative Words Dictionary for Students. Encyclopedia of China Publishing House (2004)
• Chang, C.C., Lin, C.J.: Libsvm: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (May 2011)
• Xu, L., Lin, H., Pan, Y., Ren, H., Chen, J.: Constructing the affective lexicon ontology. Journal of the China Society for Scientific and Technical Information 27(2), 180–185 (2008)

Page 26

Thanks!

