Page 1

Sentiment  Analysis

What  is  Sentiment  Analysis?

Page 2

Dan  Jurafsky

Positive  or  negative  movie  review?

• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.

2

Page 3

Dan  Jurafsky

3

Google Shopping aspects: https://www.google.com/shopping/product/14023500906804577211

Page 4

Dan Jurafsky

Twitter sentiment versus Gallup Poll of Consumer Confidence

Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010.

Page 5

Dan  Jurafsky

Twitter  sentiment:

Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market. Journal of Computational Science 2:1, 1-8. 10.1016/j.jocs.2010.12.007.

5

Page 6

Dan  Jurafsky

Target  Sentiment  on  Twitter

• Twitter Sentiment App
• Alec Go, Richa Bhayani, Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision.

6

Page 7

Dan  Jurafsky

Very  fancy  sentiment  detectors

• http://nlp.stanford.edu:8080/sentiment/rntnDemo.html

7

Page 8

Dan  Jurafsky

Sentiment  analysis  has  many  other  names

• Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis

8

Page 9

Dan  Jurafsky

Why  sentiment  analysis?

• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from sentiment

9

Page 10

Dan  Jurafsky

Scherer  Typology  of  Affective  States

• Emotion: brief organically synchronized ... evaluation of a major event
  • angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  • cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
  • friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  • liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
  • nervous, anxious, reckless, morose, hostile, jealous

Page 11

Dan  Jurafsky

Scherer  Typology  of  Affective  States

• Emotion: brief organically synchronized ... evaluation of a major event
  • angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  • cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
  • friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  • liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
  • nervous, anxious, reckless, morose, hostile, jealous

Page 12

Dan  Jurafsky

Sentiment  Analysis

• Sentiment analysis is the detection of attitudes: "enduring, affectively colored beliefs, dispositions towards objects or persons"
1. Holder (source) of attitude
2. Target (aspect) of attitude
3. Type of attitude
   • From a set of types: like, love, hate, value, desire, etc.
   • Or (more commonly) simple weighted polarity: positive, negative, neutral, together with strength
4. Text containing the attitude
   • Sentence or entire document

12
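To make the four-part definition above concrete, here is a minimal sketch of an annotation record in Python. The class and field names are illustrative only, not part of the lecture or any standard library.

```python
from dataclasses import dataclass

@dataclass
class Attitude:
    """One detected attitude, following the four-part definition above."""
    holder: str      # 1. source of the attitude, e.g. "the reviewer"
    target: str      # 2. aspect the attitude is about, e.g. "the boxing scenes"
    polarity: str    # 3. type of attitude: "positive" | "negative" | "neutral"
    strength: float  #    ... together with a strength
    text: str        # 4. the sentence or document containing the attitude
```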

Page 13

Dan  Jurafsky

Sentiment  Analysis

• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target (stance detection)
  • Detect source
  • Complex attitude types

Page 14

Dan  Jurafsky

Sentiment Analysis

• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target (stance detection)
  • Detect source
  • Complex attitude types

Page 15

Sentiment  Analysis

What  is  Sentiment  Analysis?

Page 16

Sentiment  Analysis

A  Baseline  Algorithm

Page 17

Dan  Jurafsky

Sentiment Classification in Movie Reviews

• Polarity detection:
  • Is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0: http://www.cs.cornell.edu/people/pabo/movie-review-data

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278.

Page 18

Dan  Jurafsky

IMDB  data  in  the  Pang  and  Lee  database

when _star wars_ came out some twenty years ago , the image of traveling throughout the stars has become a commonplace image . [...] when han solo goes light speed , the stars change to bright lines , going towards the viewer in lines that converge at an invisible point . cool . _october sky_ offers a much simpler image–that of a single white dot , traveling horizontally across the night sky . [. . .]

“ snake eyes ” is the most aggravating kind of movie : the kind that shows so much potential then becomes unbelievably disappointing . it’s not just because this is a brian depalma film , and since he’s a great director and one who’s films are always greeted with at least some fanfare . and it’s not even because this was a film starring nicolas cage and since he gives a brauvara performance , this film is hardly worth his talents .

✓ ✗

Page 19

Dan  Jurafsky

Baseline  Algorithm  (adapted  from  Pang  and  Lee)

• Tokenization
• Feature Extraction
• Classification using different classifiers:
  • Naive Bayes
  • MaxEnt
  • SVM
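As a rough sketch of this baseline with modern tooling (assuming scikit-learn; not part of the original lecture), tokenization and feature extraction can be handled by CountVectorizer, with logistic regression standing in for MaxEnt. The train_texts and train_labels variables are placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# One pipeline per classifier from the slide: bag-of-words features,
# then Naive Bayes, MaxEnt (logistic regression), or a linear SVM.
pipelines = {
    "Naive Bayes": make_pipeline(CountVectorizer(), MultinomialNB()),
    "MaxEnt": make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(CountVectorizer(), LinearSVC()),
}
# pipelines["Naive Bayes"].fit(train_texts, train_labels)  # placeholder data
```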

Page 20

Dan  Jurafsky

Sentiment  Tokenization  Issues

• Deal with HTML and XML markup
• Twitter mark-up (names, hash tags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
• Useful code:
  • Christopher Potts sentiment tokenizer
  • Brendan O'Connor twitter tokenizer

20

[<>]?                          # optional hat/brow
[:;=8]                         # eyes
[\-o\*\']?                     # optional nose
[\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
|                              # reverse orientation
[\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
[\-o\*\']?                     # optional nose
[:;=8]                         # eyes
[<>]?                          # optional hat/brow

Potts  emoticons
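The pattern above can be used as-is; here is a minimal sketch assuming Python's re module, where re.VERBOSE permits the per-line comments inside the pattern itself.

```python
import re

# The Potts emoticon pattern from the slide, in verbose mode.
EMOTICON_RE = re.compile(r"""
    [<>]?                          # optional hat/brow
    [:;=8]                         # eyes
    [\-o\*\']?                     # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
    |                              # reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]     # mouth
    [\-o\*\']?                     # optional nose
    [:;=8]                         # eyes
    [<>]?                          # optional hat/brow
""", re.VERBOSE)

print(EMOTICON_RE.findall("great movie :-) but the ending was bad :("))
# -> [':-)', ':(']
```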

Page 21

Dan  Jurafsky

Extracting  Features  for  Sentiment  Classification

• How to handle negation:
  • I didn't like this movie
  vs.
  • Don't dismiss this film

21

Page 22

Dan  Jurafsky

Negation

Add  NOT_  to  every  word  between  negation  and  following  punctuation:

didn’t like this movie , but I

didn’t NOT_like NOT_this NOT_movie but I

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
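A minimal sketch of this NOT_-prefixing baseline in Python (the function name and token handling are illustrative):

```python
import re

def mark_negation(tokens):
    """Prepend NOT_ to every token between a logical negation
    (n't, not, no, never) and the next punctuation mark."""
    negation = re.compile(r"^(?:not|no|never)$|n't$", re.IGNORECASE)
    punct = re.compile(r"^[.,!?;:]+$")
    out, negating = [], False
    for tok in tokens:
        if punct.match(tok):
            negating = False
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            negating = bool(negation.search(tok))
    return out

print(mark_negation("didn't like this movie , but I".split()))
# -> ["didn't", 'NOT_like', 'NOT_this', 'NOT_movie', ',', 'but', 'I']
```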

Page 23

Dan  Jurafsky

Extracting  Features  for  Sentiment  Classification

Which words to use?
• Only adjectives
• All words

All words turns out to work better, at least on this data.

23

Page 24

Dan  Jurafsky

Reminder:  Naive Bayes

24

$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in \text{positions}} P(w_i \mid c_j)$

Page 25

Dan  Jurafsky

Reminder:  Naive Bayes

25

Let N_c be the number of documents with class c
Let N_doc be the total number of documents

positions ← all word positions in the test document

$c_{NB} = \operatorname*{argmax}_{c \in C} P(c) \prod_{i \in \text{positions}} P(w_i \mid c)$   (6.9)

Naive Bayes calculations, like calculations for language modeling, are done in log space, to avoid underflow and increase speed. Thus Eq. 6.9 is generally instead expressed as

$c_{NB} = \operatorname*{argmax}_{c \in C} \left[ \log P(c) + \sum_{i \in \text{positions}} \log P(w_i \mid c) \right]$   (6.10)

By considering features in log space, Eq. 6.10 computes the predicted class as a linear function of input features. Classifiers that use a linear combination of the inputs to make a classification decision, like naive Bayes and also logistic regression, are called linear classifiers.

6.2 Training the Naive Bayes Classifier

How can we learn the probabilities P(c) and P(f_i|c)? Let's first consider the maximum likelihood estimate. We'll simply use the frequencies in the data. For the document prior P(c) we ask what percentage of the documents in our training set are in each class c. Let N_c be the number of documents in our training data with class c and N_doc be the total number of documents. Then:

$\hat{P}(c) = \frac{N_c}{N_{doc}}$   (6.11)

To learn the probability P(f_i|c), we'll assume a feature is just the existence of a word in the document's bag of words, and so we'll want P(w_i|c), which we compute as the fraction of times the word w_i appears among all words in all documents of topic c. We first concatenate all documents with category c into one big "category c" text. Then we use the frequency of w_i in this concatenated document to give a maximum likelihood estimate of the probability:

$\hat{P}(w_i \mid c) = \frac{\mathrm{count}(w_i, c)}{\sum_{w \in V} \mathrm{count}(w, c)}$   (6.13)

Here the vocabulary V consists of the union of all the word types in all classes, not just the words in one class c.

There is a problem, however, with maximum likelihood training. Imagine we are trying to estimate the likelihood of the word "fantastic" given class positive, but suppose there are no training documents that both contain the word "fantastic" and are classified as positive. Perhaps the word "fantastic" happens to occur (sarcastically?) in the class negative. In such a case the probability for this feature will be zero.

Page 26

Dan  Jurafsky

Reminder:  Naive Bayes

26

• Likelihoods

• What  about  zeros?  Suppose  "fantastic"  never  occurs?

• Add-one smoothing


$\hat{P}(\text{"fantastic"} \mid \text{positive}) = \frac{\mathrm{count}(\text{"fantastic"}, \text{positive})}{\sum_{w \in V} \mathrm{count}(w, \text{positive})} = 0$   (6.14)

But since naive Bayes naively multiplies all the feature likelihoods together, zero probabilities in the likelihood term for any class will cause the probability of the class to be zero, no matter the other evidence!

The simplest solution is the add-one (Laplace) smoothing introduced in Chapter 4. While Laplace smoothing is usually replaced by more sophisticated smoothing algorithms in language modeling, it is commonly used in naive Bayes text categorization:

$\hat{P}(w_i \mid c) = \frac{\mathrm{count}(w_i, c) + 1}{\sum_{w \in V} (\mathrm{count}(w, c) + 1)} = \frac{\mathrm{count}(w_i, c) + 1}{\left(\sum_{w \in V} \mathrm{count}(w, c)\right) + |V|}$   (6.15)

Note once again that it is crucial that the vocabulary V consists of the union of all the word types in all classes, not just the words in one class c (try to convince yourself why this must be true; see the exercise at the end of the chapter).

What do we do about words that occur in our test data but are not in our vocabulary at all because they did not occur in any training document in any class? The standard solution for such unknown words is to ignore them: remove them from the test document and not include any probability for them at all.

Finally, some systems choose to completely ignore another class of words: stop words, very frequent words like "the" and "a". This can be done by sorting the vocabulary by frequency in the training set and defining the top 10-100 vocabulary entries as stop words, or alternatively by using one of the many predefined stop word lists available online. Then every instance of these stop words is simply removed from both training and test documents as if they had never occurred. In most text classification applications, however, using a stop word list doesn't improve performance, and so it is more common to use the entire vocabulary and not use a stop word list.

Fig. 6.2 shows the final algorithm.

6.3 Worked example

Let's walk through an example of training and testing naive Bayes with add-one smoothing. We'll use a sentiment analysis domain with the two classes positive (+) and negative (-), and take the following miniature training and test documents, simplified from actual movie reviews.

Training:
  -  just plain boring
  -  entirely predictable and lacks energy
  -  no surprises and very few laughs
  +  very powerful
  +  the most fun film of the summer
Test:
  ?  predictable with no fun

The prior P(c) for the two classes is computed via Eq. 6.11 as N_c / N_doc: here P(-) = 3/5 and P(+) = 2/5.

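The worked example above can be run end to end; here is a minimal sketch in Python (variable names are illustrative). It recovers the textbook numbers: P(-)P(S|-) = 3/5 x (2 x 2 x 1)/34^3 = 6.1 x 10^-5 beats P(+)P(S|+) = 3.2 x 10^-5, so the negative class wins.

```python
import math
from collections import Counter

# The miniature training and test set from the worked example above.
train = [("-", "just plain boring"),
         ("-", "entirely predictable and lacks energy"),
         ("-", "no surprises and very few laughs"),
         ("+", "very powerful"),
         ("+", "the most fun film of the summer")]
test = "predictable with no fun"

# Priors: P(c) = N_c / N_doc  (Eq. 6.11)
n_doc = len(train)
prior = {c: sum(1 for lab, _ in train if lab == c) / n_doc for c in ("+", "-")}

# Concatenate all documents of each class into one bag of words.
bag = {"+": Counter(), "-": Counter()}
for label, doc in train:
    bag[label].update(doc.split())
vocab = set(bag["+"]) | set(bag["-"])           # |V| = 20 here

score = {}
for c in ("+", "-"):
    s = math.log(prior[c])
    for w in test.split():
        if w in vocab:                          # unknown word "with" is ignored
            # Add-one smoothed likelihood (Eq. 6.15)
            s += math.log((bag[c][w] + 1) / (sum(bag[c].values()) + len(vocab)))
    score[c] = s

print(max(score, key=score.get))                # -> "-"
```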

Page 27

Dan  Jurafsky

Binarized (Boolean  feature)    Multinomial  Naive Bayes

• Intuition:
  • For sentiment (and probably for other text classification domains)
  • Word occurrence may matter more than word frequency
    • The occurrence of the word fantastic tells us a lot
    • The fact that it occurs 5 times may not tell us much more
• "Binary Naive Bayes" clips all the word counts in each document at 1

27

Page 28

Dan  Jurafsky

Boolean  Multinomial  Naive  Bayes:  Learning

• From training corpus, extract Vocabulary
• Calculate P(c_j) terms:
  • For each c_j in C do:
    • docs_j ← all docs with class = c_j
    • P(c_j) ← |docs_j| / |total # documents|
• Calculate P(w_k | c_j) terms:
  • Remove duplicates in each doc: for each word type w in doc_j, retain only a single instance of w
  • Text_j ← single doc containing all docs_j
  • For each word w_k in Vocabulary: n_k ← # of occurrences of w_k in Text_j
  • P(w_k | c_j) ← (n_k + α) / (n + α·|Vocabulary|)
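A minimal sketch of this learning loop in Python, assuming the add-α smoothing shown above (names are illustrative):

```python
import math
from collections import Counter, defaultdict

def train_boolean_nb(docs, alpha=1.0):
    """docs: list of (class label, token list). Returns log priors and
    log likelihoods; duplicates are removed per document first."""
    n_docs = len(docs)
    by_class = defaultdict(list)
    for c_j, tokens in docs:
        by_class[c_j].append(set(tokens))     # retain one instance per word type
    vocab = {w for sets in by_class.values() for s in sets for w in s}
    log_prior, log_lik = {}, {}
    for c_j, doc_sets in by_class.items():
        log_prior[c_j] = math.log(len(doc_sets) / n_docs)     # P(c_j)
        text_j = Counter(w for s in doc_sets for w in s)      # Text_j
        n = sum(text_j.values())
        log_lik[c_j] = {w: math.log((text_j[w] + alpha) /
                                    (n + alpha * len(vocab)))  # P(w_k | c_j)
                        for w in vocab}
    return log_prior, log_lik
```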

Page 29

Dan  Jurafsky

Boolean Multinomial Naive Bayes (Binary NB) on a test document d

29

• First remove all duplicate words from d
• Then compute NB using the same equation:

$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in \text{positions}} P(w_i \mid c_j)$

Page 30

Dan  Jurafsky

Normal  vs.  Binary  NB

30

$P(+)\,P(S \mid +) = \frac{2}{5} \times \frac{1 \times 1 \times 2}{29^3} = 3.2 \times 10^{-5}$

The model thus predicts the class negative for the test sentence.

6.4 Optimizing for Sentiment Analysis

While standard naive Bayes text classification can work well for sentiment analysis, some small changes are generally employed that improve performance.

First, for sentiment classification and a number of other text classification tasks, whether a word occurs or not seems to matter more than its frequency. Thus it often improves performance to clip the word counts in each document at 1. This variant is called binary multinomial naive Bayes or binary NB. The variant uses the same Eq. 6.10 except that for each document we remove all duplicate words before concatenating them into the single big document. Fig. 6.3 shows an example in which a set of four documents (shortened and text-normalized for this example) are remapped to binary, with the modified counts shown in the table on the right. The example is worked without add-1 smoothing to make the differences clearer. Note that the resulting counts need not be 1; the word "great" has a count of 2 even for binary NB, because it appears in multiple documents.

Four original documents:
  -  it was pathetic the worst part was the boxing scenes
  -  no plot twists or great scenes
  +  and satire and great plot twists
  +  great scenes great film

After per-document binarization:
  -  it was pathetic the worst part boxing scenes
  -  no plot twists or great scenes
  +  and satire great plot twists
  +  great scenes film

              NB counts    Binary counts
  word         +    -        +    -
  and          2    0        1    0
  boxing       0    1        0    1
  film         1    0        1    0
  great        3    1        2    1
  it           0    1        0    1
  no           0    1        0    1
  or           0    1        0    1
  part         0    1        0    1
  pathetic     0    1        0    1
  plot         1    1        1    1
  satire       1    0        1    0
  scenes       1    2        1    2
  the          0    2        0    1
  twists       1    1        1    1
  was          0    2        0    1
  worst        0    1        0    1

Figure 6.3: An example of binarization for the binary naive Bayes algorithm.

A second important addition commonly made when doing text classification for sentiment is to deal with negation. Consider the difference between "I really like this movie" (positive) and "I didn't like this movie" (negative). The negation expressed by "didn't" completely alters the inferences we draw from the predicate "like". Similarly, negation can modify a negative word to produce a positive review ("don't dismiss this film", "doesn't let us get bored").

A very simple baseline that is commonly used in sentiment to deal with negation is, during text normalization, to prepend the prefix NOT to every word after a token of logical negation (n't, not, no, never) until the next punctuation mark.


Page 31

Dan  Jurafsky

Binary  NB

• Binary  works  better  than  full  word  counts  for  sentiment  classification

31

B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Wang, Sida, and Christopher D. Manning. 2012. "Baselines and bigrams: Simple, good sentiment and topic classification." Proceedings of ACL, 90-94.

Page 32

Dan  Jurafsky

Cross-Validation

• Break up data into 10 folds
  • (Equal positive and negative inside each fold?)
• For each fold:
  • Choose the fold as a temporary test set
  • Train on 9 folds, compute performance on the test fold
• Report average performance of the 10 runs

[Figure: five iterations of cross-validation. In each iteration a different fold serves as the test set while the remaining folds are used for training.]
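A minimal sketch of this procedure, assuming scikit-learn; StratifiedKFold addresses the "equal positive and negative inside each fold" question by preserving class balance. train_fn and eval_fn are placeholders for any classifier's fit and score routines.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(texts, labels, train_fn, eval_fn, n_folds=10):
    texts, labels = np.asarray(texts, dtype=object), np.asarray(labels)
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in folds.split(texts, labels):
        model = train_fn(texts[train_idx], labels[train_idx])  # train on 9 folds
        scores.append(eval_fn(model, texts[test_idx], labels[test_idx]))
    return float(np.mean(scores))   # average performance of the n_folds runs
```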

Page 33

Dan  Jurafsky

Other  issues  in  Classification

• Logistic  Regression  and  SVM  tend  to  do  better  than  Naïve Bayes

33

Page 34

Dan Jurafsky

Problems: What makes reviews hard to classify?

• Subtlety:
  • Perfume review in Perfumes: the Guide: "If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut."
  • Dorothy Parker on Katharine Hepburn: "She runs the gamut of emotions from A to B"

34

Page 35

Dan  Jurafsky

Thwarted Expectations and Ordering Effects

• “This  film  should  be  brilliant.    It  sounds  like  a  great  plot,  the  actors  are  first  grade,  and  the  supporting  cast  is  good  as  well,  and  Stallone  is  attempting  to  deliver  a  good  performance.  However,  it  can’t  hold  up.”

• Well  as  usual  Keanu  Reeves  is  nothing  special,  but  surprisingly,  the  very  talented  Laurence  Fishbourne is  not  so  good  either,  I  was  surprised.

35

Page 36

Sentiment  Analysis

A  Baseline  Algorithm

Page 37

Sentiment  Analysis

Sentiment  Lexicons

Page 38

Dan  Jurafsky

The  General  Inquirer

• Home page: http://www.wjh.harvard.edu/~inquirer
• List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
• Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
• Categories:
  • Positiv (1915 words) and Negativ (2291 words)
  • Strong vs Weak, Active vs Passive, Overstated versus Understated
  • Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc.
• Free for Research Use

Philip  J.  Stone,  Dexter  C  Dunphy,  Marshall  S.  Smith,  Daniel  M.  Ogilvie.  1966.  The  General  Inquirer:  A  Computer  Approach  to  Content  Analysis.  MIT  Press

Page 39

Dan  Jurafsky

LIWC (Linguistic Inquiry and Word Count)

Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX.

• Home page: http://www.liwc.net/
• 2300 words, >70 classes
• Affective Processes:
  • negative emotion (bad, weird, hate, problem, tough)
  • positive emotion (love, nice, sweet)
• Cognitive Processes:
  • Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
• Pronouns, Negation (no, never), Quantifiers (few, many)
• $30 or $90 fee

Page 40

Dan  Jurafsky

MPQA  Subjectivity  Cues  Lexicon

• Home page: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
• 6885 words from 8221 lemmas:
  • 2718 positive
  • 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL

40

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.

Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.

Page 41

Dan  Jurafsky

Bing  Liu  Opinion  Lexicon

• Bing Liu's Page on Opinion Mining: http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words:
  • 2006 positive
  • 4783 negative

41

Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.

Page 42

Dan  Jurafsky

Analyzing  the  polarity  of  each  word  in  IMDB

• How likely is each word to appear in each sentiment class?
• Count("bad") in 1-star, 2-star, 3-star, etc.
• But can't use raw counts; instead, likelihood:

$P(w \mid c) = \frac{f(w,c)}{\sum_{w \in c} f(w,c)}$

• Make them comparable between words with the scaled likelihood:

$\frac{P(w \mid c)}{P(w)}$

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
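A minimal sketch of the likelihood and scaled likelihood above; the input format (a mapping from rating class to tokenized documents) is hypothetical, and the code assumes the word occurs somewhere in the corpus.

```python
from collections import Counter

def scaled_likelihood(word, docs_by_class):
    """Return P(w|c) / P(w) for each class c."""
    counts = {c: Counter(w for doc in docs for w in doc)
              for c, docs in docs_by_class.items()}
    totals = {c: sum(cnt.values()) for c, cnt in counts.items()}
    p_w = sum(cnt[word] for cnt in counts.values()) / sum(totals.values())
    return {c: (counts[c][word] / totals[c]) / p_w for c in counts}
```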

Page 43

Dan  Jurafsky

Analyzing the polarity of each word in IMDB

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.

[Figure: "Potts diagrams" (Potts, Christopher. 2011. NSF workshop on restructuring adjectives). Scaled likelihood across ratings 1-10 for four word classes: positive scalars (good, great, excellent), negative scalars (disappointing, bad, terrible), emphatics (totally, absolutely, utterly), and attenuators (somewhat, fairly, pretty). Detail panels for the attenuators somewhat/r, fairly/r, and pretty/r plot rating category against scaled likelihood in IMDB, OpenTable, Goodreads, and Amazon/Tripadvisor reviews, with quadratic fits reported per panel.]

Page 44

Dan  Jurafsky

Other  sentiment  feature:  Logical  negation

• Is  logical  negation  (no,  not)  associated  with  negative  sentiment?

• Potts experiment:
  • Count negation (not, n't, no, never) in online reviews
  • Regress against the review rating

Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.

Page 45

Dan Jurafsky

Potts 2011 Results: More negation in negative sentiment

[Figure: scaled likelihood P(w|c)/P(w) of negation tokens across review rating categories; negation is more likely in negative reviews.]

Page 46

Sentiment  Analysis

Sentiment  Lexicons

Page 47

Sentiment  Analysis

Learning  Sentiment  Lexicons

Page 48

Dan  Jurafsky

Semi-supervised learning of lexicons

• What to do for domains where you don't have a lexicon?
• Learn a lexicon!
• Use a small amount of information:
  • A few labeled examples
  • A few hand-built patterns
• To bootstrap a lexicon

48

Page 49

Dan  Jurafsky

Semi-supervised learning of lexicons

The General Inquirer is a freely available web resource with lexicons of 1915 positive words and 2291 negative words (and also includes other lexicons we'll discuss in the next section).

The MPQA Subjectivity lexicon (Wilson et al., 2005) has 2718 positive and 4912 negative words drawn from a combination of sources, including the General Inquirer lists, the output of the Hatzivassiloglou and McKeown (1997) system described below, and a bootstrapped list of subjective words and phrases (Riloff and Wiebe, 2003) that was then hand-labeled for sentiment. Each phrase in the lexicon is also labeled for reliability (strongly subjective or weakly subjective). The polarity lexicon of Hu and Liu (2004) gives 2006 positive and 4783 negative words, drawn from product reviews, labeled using a bootstrapping method from WordNet described in the next section.

Positive: admire, amazing, assure, celebration, charm, eager, enthusiastic, excellent, fancy, fantastic, frolic, graceful, happy, joy, luck, majesty, mercy, nice, patience, perfect, proud, rejoice, relief, respect, satisfactorily, sensational, super, terrific, thank, vivid, wise, wonderful, zest

Negative: abominable, anger, anxious, bad, catastrophe, cheap, complaint, condescending, deceit, defective, disappointment, embarrass, fake, fear, filthy, fool, guilt, hate, idiot, inflict, lazy, miserable, mourn, nervous, objection, pest, plot, reject, scream, silly, terrible, unfriendly, vile, wicked

Figure 18.2: Some samples of words with consistent sentiment across three sentiment lexicons: the General Inquirer (Stone et al., 1966), the MPQA Subjectivity lexicon (Wilson et al., 2005), and the polarity lexicon of Hu and Liu (2004).

18.2 Semi-supervised induction of sentiment lexicons

Some affective lexicons are built by having humans assign ratings to words; this was the technique for building the General Inquirer starting in the 1960s (Stone et al., 1966), and for modern lexicons based on crowd-sourcing to be described in Section 18.5.1. But one of the most powerful ways to learn lexicons is to use semi-supervised learning.

In this section we introduce three methods for semi-supervised learning that are important in sentiment lexicon extraction. The three methods all share the same intuitive algorithm, which is sketched in Fig. 18.3:

function BuildSentimentLexicon(posseeds, negseeds) returns poslex, neglex
    poslex ← posseeds
    neglex ← negseeds
    until done:
        poslex ← poslex + FindSimilarWords(poslex)
        neglex ← neglex + FindSimilarWords(neglex)
    poslex, neglex ← PostProcess(poslex, neglex)

Figure 18.3: Schematic for semi-supervised sentiment lexicon induction. Different algorithms differ in how words of similar polarity are found, in the stopping criterion, and in the post-processing.

49
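A minimal Python sketch of the schematic in Fig. 18.3; find_similar_words is a placeholder for any expansion method (conjunction patterns, WordNet neighbors, embedding similarity), and the stopping criterion here is simply a fixed round count.

```python
def build_sentiment_lexicon(pos_seeds, neg_seeds, find_similar_words, rounds=5):
    poslex, neglex = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):                     # "until done"
        new_pos = find_similar_words(poslex) - poslex - neglex
        new_neg = find_similar_words(neglex) - poslex - neglex
        if not new_pos and not new_neg:
            break
        poslex |= new_pos
        neglex |= new_neg
    return poslex, neglex                       # post-processing omitted here
```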

Page 50

Dan  Jurafsky

Hatzivassiloglou and  McKeown intuition  for  identifying  word  polarity

• Adjectives conjoined by "and" have same polarity
  • Fair and legitimate, corrupt and brutal
  • *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by "but" do not
  • fair but brutal

50

Vasileios Hatzivassiloglou and  Kathleen  R.  McKeown.  1997.  Predicting  the  Semantic  Orientation  of  Adjectives.  ACL,  174–181

Page 51

Dan  Jurafsky

Hatzivassiloglou & McKeown 1997, Step 1

• Label seed set of 1336 adjectives (all >20 in 21 million word WSJ corpus)
  • 657 positive:
    • adequate central clever famous intelligent remarkable reputed sensitive slender thriving ...
  • 679 negative:
    • contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting ...

51

Page 52

Dan  Jurafsky

Hatzivassiloglou & McKeown 1997, Step 2

• Expand  seed  set  to  conjoined  adjectives

52

nice, helpful

nice, classy

Page 53

Dan  Jurafsky

Hatzivassiloglou & McKeown 1997, Step 3

• Supervised  classifier  assigns  “polarity  similarity”  to  each  word  pair,  resulting  in  graph:

53

[Figure: word graph over classy, nice, helpful, fair, brutal, irrational, corrupt, with edges weighted by polarity similarity.]

Page 54

Dan  Jurafsky

Hatzivassiloglou & McKeown 1997, Step 4

• Clustering  for  partitioning  the  graph  into  two

54

[Figure: the same graph partitioned by clustering into a positive cluster (+: classy, nice, helpful, fair) and a negative cluster (-: brutal, irrational, corrupt).]

Page 55

Dan  Jurafsky

Output  polarity  lexicon

• Positive: bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty ...
• Negative: ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful ...

55

Page 56

Dan  Jurafsky

Output  polarity  lexicon

• Positive: bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty ...
• Negative: ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful ...

56

Page 57

Dan  Jurafsky

Turney Algorithm

1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases

57

Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Page 58

Dan  Jurafsky

Extract  two-­‐word  phrases  with  adjectives

First Word           Second Word          Third Word (not extracted)
JJ                   NN or NNS            anything
RB, RBR, or RBS      JJ                   not NN nor NNS
JJ                   JJ                   not NN nor NNS
NN or NNS            JJ                   not NN nor NNS
RB, RBR, or RBS      VB, VBD, VBN, VBG    anything

58

Page 59

Dan  Jurafsky

How  to  measure  polarity  of  a  phrase?

• Positive phrases co-occur more with "excellent"
• Negative phrases co-occur more with "poor"
• But how to measure co-occurrence?

59

Page 60

Dan  Jurafsky

Pointwise Mutual  Information

• Mutual information between two random variables X and Y:

$I(X,Y) = \sum_x \sum_y P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)}$

• Pointwise mutual information: how much more do events x and y co-occur than if they were independent?

$\mathrm{PMI}(x,y) = \log_2 \frac{P(x,y)}{P(x)P(y)}$

Page 61

Dan  Jurafsky

Pointwise Mutual  Information

• Pointwise mutual information: how much more do events x and y co-occur than if they were independent?

$\mathrm{PMI}(x,y) = \log_2 \frac{P(x,y)}{P(x)P(y)}$

• PMI between two words: how much more do two words co-occur than if they were independent?

$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{P(word_1, word_2)}{P(word_1)\,P(word_2)}$
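From raw counts, the computation is one line; here is a sketch (counts out of n total observations, e.g. hits out of N indexed pages):

```python
import math

def pmi(count_xy, count_x, count_y, n):
    """PMI(x, y) = log2( P(x,y) / (P(x) P(y)) ), estimated from counts."""
    return math.log2((count_xy / n) / ((count_x / n) * (count_y / n)))

# Events that co-occur twice as often as chance predicts have PMI = 1:
print(pmi(count_xy=20, count_x=100, count_y=100, n=1000))   # -> 1.0
```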

Page 62

Dan  Jurafsky

How  to  Estimate  Pointwise Mutual  Information

• Query search engine
• P(word) estimated by hits(word)/N
• P(word1, word2) by hits(word1 NEAR word2)/N
• (More correctly the bigram denominator should be kN, because there are a total of N consecutive bigrams (word1, word2) but kN bigrams that are k words apart; we just use N on the rest of this slide and the next.)

$\mathrm{PMI}(word_1, word_2) = \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(word_1\ \mathrm{NEAR}\ word_2)}{\frac{1}{N}\,\mathrm{hits}(word_1)\cdot\frac{1}{N}\,\mathrm{hits}(word_2)}$

Page 63

Dan  Jurafsky

Does  phrase  appear  more  with  “poor”  or  “excellent”?

63

$\mathrm{Polarity}(phrase) = \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"})$

$= \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"excellent"})}{\frac{1}{N}\,\mathrm{hits}(phrase)\cdot\frac{1}{N}\,\mathrm{hits}(\text{"excellent"})} - \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"poor"})}{\frac{1}{N}\,\mathrm{hits}(phrase)\cdot\frac{1}{N}\,\mathrm{hits}(\text{"poor"})}$

$= \log_2 \frac{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"excellent"})\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{"poor"})\,\mathrm{hits}(\text{"excellent"})}$
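A sketch of the final form above in Python; hits and hits_near are placeholders for whatever corpus or search backend supplies the counts, and eps is a small smoothing constant to avoid zero counts (0.01 in Turney's paper).

```python
import math

def polarity(phrase, hits, hits_near, eps=0.01):
    return math.log2(
        ((hits_near(phrase, "excellent") + eps) * (hits("poor") + eps)) /
        ((hits_near(phrase, "poor") + eps) * (hits("excellent") + eps)))
```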

Page 64

Dan  Jurafsky

Phrases  from  a  thumbs-­‐up  review

64

Phrase                  POS tags   Polarity
online service          JJ NN       2.8
online experience       JJ NN       2.3
direct deposit          JJ NN       1.3
local branch            JJ NN       0.42
...
low fees                JJ NNS      0.33
true service            JJ NN      -0.73
other bank              JJ NN      -0.85
inconveniently located  JJ NN      -1.5
Average                             0.32

Page 65

Dan  Jurafsky

Phrases  from  a  thumbs-­‐down  review

65

Phrase               POS tags   Polarity
direct deposits      JJ NNS      5.8
online web           JJ NN       1.9
very handy           RB JJ       1.4
...
virtual monopoly     JJ NN      -2.0
lesser evil          RBR JJ     -2.3
other problems       JJ NNS     -2.8
low funds            JJ NNS     -6.8
unethical practices  JJ NNS     -8.5
Average                         -1.2

Page 66

Dan  Jurafsky

Results  of  Turney algorithm

• 410 reviews from Epinions
  • 170 (41%) negative
  • 240 (59%) positive
• Majority class baseline: 59%
• Turney algorithm: 74%
• Phrases rather than words
• Learns domain-specific information

66

Page 67

Dan  Jurafsky

Summary  on  Learning  Lexicons

• Why:
  • Learn a lexicon that is specific to a domain
  • Learn a lexicon with more words (more robust) than off-the-shelf
• Intuition:
  • Start with a seed set of words ('good', 'poor')
  • Find other words that have similar polarity:
    • Using "and" and "but"
    • Using words that occur nearby in the same document
  • Add them to the lexicon

Page 68

Sentiment  Analysis

Learning  Sentiment  Lexicons

Page 69

Sentiment  Analysis

Other  Sentiment  Tasks

Page 70

Dan  Jurafsky

Finding  sentiment  of  a  sentence

• Important for finding aspects or attributes
  • Target of sentiment

• The food was great but the service was awful

70

Page 71

Dan  Jurafsky

Finding  aspect/attribute/target  of  sentiment

• Frequent phrases + rules:
  • Find all highly frequent phrases across reviews ("fish tacos")
  • Filter by rules like "occurs right after sentiment word"
  • "... great fish tacos" means fish tacos is a likely aspect

Casino: casino, buffet, pool, resort, beds
Children's Barber: haircut, job, experience, kids
Greek Restaurant: food, wine, service, appetizer, lamb
Department Store: selection, department, sales, shop, clothing

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.

Page 72

Dan  Jurafsky

Finding  aspect/attribute/target  of  sentiment

• The aspect name may not be in the sentence
• For restaurants/hotels, aspects are well-understood
• Supervised classification:
  • Hand-label a small corpus of restaurant review sentences with aspect: food, décor, service, value, NONE
  • Train a classifier to assign an aspect to a sentence: "Given this sentence, is the aspect food, décor, service, value, or NONE?" (see the sketch below)

72
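A minimal sketch of such an aspect classifier, assuming scikit-learn; sentences and aspect_labels stand in for the hypothetical hand-labeled corpus described above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Labels drawn from: food, decor, service, value, NONE.
aspect_clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
# aspect_clf.fit(sentences, aspect_labels)
# aspect_clf.predict(["The wine list was superb"])   # e.g. -> "food"
```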

Page 73

Dan  Jurafsky

Putting it all together: Finding sentiment for aspects

73

Reviews → Text Extractor → Sentences & Phrases → Sentiment Classifier → Sentences & Phrases → Aspect Extractor → Sentences & Phrases → Aggregator → Final Summary

S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.

Page 74

Dan  Jurafsky

Results of Blair-Goldensohn et al. method

Rooms (3/5 stars, 41 comments)
(+) The room was clean and everything worked fine, even the water pressure ...
(+) We went because of the free room and was pleasantly pleased ...
(-) ... the worst hotel I had ever stayed at ...

Service (3/5 stars, 31 comments)
(+) Upon checking out another couple was checking early due to a problem ...
(+) Every single hotel staff member treated us great and answered every ...
(-) The food is cold and the service gives new meaning to SLOW.

Dining (3/5 stars, 18 comments)
(+) our favorite place to stay in biloxi. the food is great also the service ...
(+) Offer of free buffet for joining the Play

Page 75

Dan  Jurafsky

Summary  on  Sentiment

• Generally modeled as classification or regression task: predict a binary or ordinal label
• Features:
  • Negation is important
  • Using all words (in naive Bayes) works well for some tasks
  • Finding subsets of words may help in other tasks
  • Hand-built polarity lexicons
  • Use seeds and semi-supervised learning to induce lexicons

