Page 1: Natural Language Processing

SFU NatLangLab

Natural Language Processing

Angel Xuan Chang
angelxuanchang.github.io/nlp-class

adapted from lecture slides from Anoop Sarkar

Simon Fraser University

2020-01-23

Page 2

Natural Language Processing

Angel Xuan Chang
angelxuanchang.github.io/nlp-class

adapted from lecture slides from Anoop Sarkar

Simon Fraser University

January 23, 2020

Part 1: Classification tasks in NLP

Page 3

Classification tasks in NLP

Naive Bayes Classifier

Log linear models

Page 4

Sentiment classification: Movie reviews

- neg: unbelievably disappointing

- pos: Full of zany characters and richly applied satire, and some great plot twists

- pos: this is the greatest screwball comedy ever filmed

- neg: It was pathetic. The worst part about it was the boxing scenes.

Page 5

Intent Detection

- ADDR CHANGE: I just moved and want to change my address.

- ADDR CHANGE: Please help me update my address.

- FILE CLAIM: I just got into a terrible accident and I want to file a claim.

- CLOSE ACCOUNT: I'm moving and I want to disconnect my service.

Page 6

Prepositional Phrases

- noun attach: I bought the shirt with pockets

- verb attach: I bought the shirt with my credit card

- noun attach: I washed the shirt with mud

- verb attach: I washed the shirt with soap

- Attachment depends on the meaning of the entire sentence – needs world knowledge, etc.

- Maybe there is a simpler solution: we can attempt to solve it using heuristics or associations between words

Page 7

Ambiguity Resolution: Prepositional Phrases in English

- Learning Prepositional Phrase Attachment: Annotated Data

  v          n1           p     n2        Attachment
  join       board        as    director  V
  is         chairman     of    N.V.      N
  using      crocidolite  in    filters   V
  bring      attention    to    problem   V
  is         asbestos     in    products  N
  making     paper        for   filters   N
  including  three        with  cancer    N
  ...        ...          ...   ...       ...

Page 8

Prepositional Phrase Attachment

Method                               Accuracy
Always noun attachment               59.0
Most likely for each preposition     72.2
Average Human (4 head words only)    88.2
Average Human (whole sentence)       93.2

Page 9

Back-off Smoothing

- Random variable a represents attachment.

- a = n1 or a = v (two-class classification)

- We want to compute the probability of noun attachment: p(a = n1 | v, n1, p, n2).

- The probability of verb attachment is 1 − p(a = n1 | v, n1, p, n2).

Page 10

Back-off Smoothing

1. If f(v, n1, p, n2) > 0 and p̂ ≠ 0.5:

   p̂(a = n1 | v, n1, p, n2) = f(a = n1, v, n1, p, n2) / f(v, n1, p, n2)

2. Else if f(v, n1, p) + f(v, p, n2) + f(n1, p, n2) > 0 and p̂ ≠ 0.5:

   p̂(a = n1 | v, n1, p, n2) = [f(a = n1, v, n1, p) + f(a = n1, v, p, n2) + f(a = n1, n1, p, n2)] / [f(v, n1, p) + f(v, p, n2) + f(n1, p, n2)]

3. Else if f(v, p) + f(n1, p) + f(p, n2) > 0:

   p̂(a = n1 | v, n1, p, n2) = [f(a = n1, v, p) + f(a = n1, n1, p) + f(a = n1, p, n2)] / [f(v, p) + f(n1, p) + f(p, n2)]

4. Else if f(p) > 0 (try choosing attachment based on the preposition alone):

   p̂(a = n1 | v, n1, p, n2) = f(a = n1, p) / f(p)

5. Else p̂(a = n1 | v, n1, p, n2) = 1.0
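The back-off procedure above can be sketched in Python. This is a minimal illustration, not the original Collins–Brooks implementation; the count tables and the training-example format are assumptions made here.

```python
from collections import Counter

# f[k]:    count of sub-tuple k over all training examples
# f_n1[k]: count of sub-tuple k over noun-attachment examples only
f = Counter()
f_n1 = Counter()

def train(examples):
    """examples: iterable of (v, n1, p, n2, attach), attach in {'N', 'V'}."""
    for v, n1, p, n2, attach in examples:
        keys = [(v, n1, p, n2),
                (v, n1, p), (v, p, n2), (n1, p, n2),
                (v, p), (n1, p), (p, n2),
                (p,)]
        for k in keys:
            f[k] += 1
            if attach == 'N':
                f_n1[k] += 1

def p_noun_attach(v, n1, p, n2):
    """Backed-off estimate of p(a = n1 | v, n1, p, n2)."""
    # Each level lists its sub-tuples; the flag says whether an
    # uninformative estimate of exactly 0.5 should trigger further back-off.
    levels = [
        ([(v, n1, p, n2)], True),
        ([(v, n1, p), (v, p, n2), (n1, p, n2)], True),
        ([(v, p), (n1, p), (p, n2)], False),
        ([(p,)], False),
    ]
    for keys, skip_if_half in levels:
        denom = sum(f[k] for k in keys)
        if denom > 0:
            p_hat = sum(f_n1[k] for k in keys) / denom
            if not (skip_if_half and p_hat == 0.5):
                return p_hat
    return 1.0  # step 5: default to noun attachment
```

With the toy rows from the annotated-data slide, seen tuples return their observed proportion, and an unseen preposition falls all the way through to the noun-attachment default.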

Page 11

Prepositional Phrase Attachment: Results

- Results (Collins and Brooks 1995): 84.5% accuracy, with the use of some limited word classes for dates, numbers, etc.

- Toutanova, Manning, and Ng, 2004: use a sophisticated smoothing model for PP attachment; 86.18% with words & stems, 87.54% with word classes

- Merlo, Crocker and Berthouzoz, 1997: test on multiple PPs, generalizing disambiguation of 1 PP to 2-3 PPs; 1PP: 84.3%, 2PP: 69.6%, 3PP: 43.6%

Page 12

Natural Language Processing

Angel Xuan Chang
angelxuanchang.github.io/nlp-class

adapted from lecture slides from Anoop Sarkar, Danqi Chen and Karthik Narasimhan

Simon Fraser University

January 23, 2020

Part 2: Probabilistic Classifiers

Page 13

Classification Task

- Input:
  - A document d
  - A set of classes C = {c1, c2, ..., cm}

- Output: Predicted class c for document d

- Example:
  - neg: unbelievably disappointing
  - pos: this is the greatest screwball comedy ever filmed

Page 14

Supervised learning: Let’s use statistics!

- Inputs:
  - Set of m classes C = {c1, c2, ..., cm}
  - Set of n labeled documents: {(d1, c1), (d2, c2), ..., (dn, cn)}

- Output: Trained classifier F : d → c

- What form should F take?

- How to learn F?

Page 15

Types of supervised classifiers

Page 16

Classification tasks in NLP

Naive Bayes Classifier

Log linear models

Page 17

Naive Bayes Classifier

- x is the input, represented as d independent features fj, 1 ≤ j ≤ d

- y is the output classification

- P(y | x) = P(y) · P(x | y) / P(x)   (Bayes Rule)

- P(x | y) = ∏_{j=1}^{d} P(fj | y)

- P(y | x) ∝ P(y) · ∏_{j=1}^{d} P(fj | y)

- We can ignore P(x) in the above equation because it is a constant scaling factor for each y.

Page 18

Naive Bayes Classifier for text classification

- For text classification: input x = document d = (w1, ..., wk)

- Use as our features the words wj, 1 ≤ j ≤ |V|, where V is our vocabulary

- c is the output classification

- Assume that the position of each word is irrelevant and that the words are conditionally independent given class c:

  P(w1, w2, ..., wk | c) = P(w1 | c) · P(w2 | c) · ... · P(wk | c)

- Maximum a posteriori estimate:

  cMAP = argmax_c P(c) · P(d | c) = argmax_c P̂(c) · ∏_{i=1}^{k} P̂(wi | c)

Page 19

Bag of words

Page 20

Estimating probabilities

Maximum likelihood estimate:

  P̂(cj) = Count(cj) / n

  P̂(wi | cj) = Count(wi, cj) / Σ_{w∈V} Count(w, cj)

Smoothing:

  P̂(wi | cj) = (Count(wi, cj) + α) / Σ_{w∈V} (Count(w, cj) + α)
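As a quick numeric check of the smoothed estimate, take one class with a toy 3-word vocabulary and α = 1 (Laplace smoothing); the counts below are made up for illustration.

```python
# Hypothetical counts of words in class c.
counts = {"great": 2, "plot": 1, "dull": 0}
alpha = 1
V = len(counts)                      # vocabulary size |V| = 3
total = sum(counts.values())         # Σ_w Count(w, c) = 3

def p_hat(w):
    # (Count(w, c) + α) / Σ_{w'∈V} (Count(w', c) + α)
    return (counts[w] + alpha) / (total + alpha * V)

# The smoothed estimates sum to 1, and the unseen word "dull"
# receives nonzero probability mass (1/6) instead of 0.
```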

Page 21

Overall process

Input: Set of labeled documents {(di, ci)}, i = 1, ..., n

- Compute vocabulary V of all words

- Calculate

  P̂(cj) = Count(cj) / n

- Calculate

  P̂(wi | cj) = (Count(wi, cj) + α) / Σ_{w∈V} (Count(w, cj) + α)

- Prediction: Given document d = (w1, ..., wk),

  cMAP = argmax_c P̂(c) · ∏_{i=1}^{k} P̂(wi | c)
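The overall process can be sketched as a small implementation (an illustrative sketch, assuming pre-tokenized documents; it works in log space to avoid underflow, as the Naive Bayes summary slide recommends).

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs, alpha=1.0):
    """labeled_docs: list of (list_of_words, class_label) pairs."""
    class_counts = Counter(c for _, c in labeled_docs)
    word_counts = defaultdict(Counter)     # word_counts[c][w] = Count(w, c)
    vocab = set()
    for words, c in labeled_docs:
        word_counts[c].update(words)
        vocab.update(words)
    n = len(labeled_docs)
    # log P̂(c_j) = log(Count(c_j) / n)
    log_prior = {c: math.log(class_counts[c] / n) for c in class_counts}
    # log P̂(w_i | c_j) with add-alpha smoothing
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom)
                      for w in vocab}
    return log_prior, log_lik, vocab

def predict_nb(model, words):
    """c_MAP = argmax_c [log P̂(c) + Σ_i log P̂(w_i | c)]."""
    log_prior, log_lik, vocab = model
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for w in words:
            if w in vocab:                 # skip out-of-vocabulary words
                s += log_lik[c][w]
        scores[c] = s
    return max(scores, key=scores.get)
```

Skipping out-of-vocabulary words at prediction time is one common convention; another is to fold all unseen words into a single UNK token.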

Page 22

Naive Bayes Example

Page 23

Tokenization

Tokenization matters - it can affect your vocabulary

- aren't → aren't | arent | are n't | aren t

I Emails, URLs, phone numbers, dates, emoticons
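The variants above correspond to different tokenization choices; a small sketch (the Treebank-style split is written by hand here, not produced by a tokenizer library):

```python
import re

text = "aren't"

keep_whole      = [text]                       # ["aren't"]  keep the clitic attached
drop_apostrophe = [text.replace("'", "")]      # ["arent"]   strip punctuation in place
split_clitic    = ["are", "n't"]               # Penn Treebank style (hand-written here)
letters_only    = re.findall(r"[a-z]+", text)  # ["aren", "t"]  split on non-letters

# Four different vocabularies from one string: tokenization matters.
```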

Page 24

Features

- Remember: Naive Bayes can use any set of features

- Capitalization, subword features (ends with -ing), etc.

- Domain knowledge is crucial for performance

Top features for spam detection

[Alqatawna et al, IJCNSS 2015]

Page 25

Evaluation

- Table of predictions (binary classification)

- Ideally we want to get

Page 26

Evaluation Metrics

Accuracy = (TP + TN) / Total = 200 / 250 = 80%
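The slide's accuracy can be reproduced from confusion-matrix counts. The split of the 200 correct and 50 incorrect predictions into TP/TN and FP/FN below is an illustrative assumption, since the original table is not in the text.

```python
# Binary-classification confusion counts (illustrative split: TP + TN = 200, total = 250).
TP, FP, FN, TN = 150, 20, 30, 50

accuracy  = (TP + TN) / (TP + FP + FN + TN)   # 200 / 250 = 0.80
precision = TP / (TP + FP)                    # of predicted positives, how many are right
recall    = TP / (TP + FN)                    # of actual positives, how many are found
f1        = 2 * precision * recall / (precision + recall)
```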


Page 28

Precision and Recall

Page 29

Precision and Recall

from Wikipedia

Page 30

F-Score

Page 31

Choosing Beta

Page 32

Aggregating scores

- We have Precision, Recall, F1 for each class

- How to combine them for an overall score?
  - Macro-average: Compute for each class, then average
  - Micro-average: Collect predictions for all classes and jointly evaluate

Page 33

Macro vs Micro average

- Macroaveraged precision: (0.5 + 0.9)/2 = 0.7

- Microaveraged precision: 100/120 = 0.83

- The microaveraged score is dominated by the score on common classes
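Per-class counts that reproduce the slide's numbers (the underlying contingency tables are assumed here for illustration): class 1 with TP = 10, FP = 10 and class 2 with TP = 90, FP = 10.

```python
tp = [10, 90]    # true positives per class
fp = [10, 10]    # false positives per class

# Macro-average: compute precision per class, then average the two numbers.
macro = sum(t / (t + f) for t, f in zip(tp, fp)) / len(tp)   # (0.5 + 0.9) / 2 = 0.7

# Micro-average: pool all predictions, then compute one precision.
micro = sum(tp) / (sum(tp) + sum(fp))                        # 100 / 120 ≈ 0.83

# The micro average sits near the common class's 0.9; the macro average does not.
```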

Page 34

Validation

- Choose a metric: Precision/Recall/F1

- Optimize for the metric on the Validation (aka Development) set

- Finally, evaluate on the 'unseen' Test set

- Cross-validation:
  - Repeatedly sample several train-val splits
  - Reduces bias due to sampling errors

Page 35

Advantages of Naive Bayes

- Very fast, low storage requirements

- Robust to irrelevant features

- Very good in domains with many equally important features

- Optimal if the independence assumptions hold

- Good, dependable baseline for text classification

Page 36

When to use Naive Bayes

- Small data sizes: Naive Bayes is great! Rule-based classifiers can work well too.

- Medium size datasets: More advanced classifiers might perform better (SVM, logistic regression)

- Large datasets: Naive Bayes becomes competitive again (most learned classifiers will work well)

Page 37

Failings of Naive Bayes (1)

Independence assumptions are too strong

- XOR problem: Naive Bayes cannot learn a decision boundary

- Both variables are jointly required to predict the class. Independence assumption broken!
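The XOR failure can be seen directly with a tiny hand-rolled Naive Bayes over two binary features (an illustrative sketch, not a library implementation):

```python
import math
from collections import Counter, defaultdict

# XOR data: the label is 1 exactly when the two features differ.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def nb_fit(data):
    prior = Counter(y for _, y in data)
    cond = defaultdict(Counter)            # cond[(j, y)][value] = count
    for x, y in data:
        for j, v in enumerate(x):
            cond[(j, y)][v] += 1
    return prior, cond

def nb_score(prior, cond, x, y):
    # log P(y) + Σ_j log P(f_j | y); no smoothing needed on this data.
    s = math.log(prior[y] / sum(prior.values()))
    for j, v in enumerate(x):
        s += math.log(cond[(j, y)][v] / prior[y])
    return s

prior, cond = nb_fit(data)
# Every per-feature estimate is P(f_j = v | y) = 0.5, so both classes tie
# on every input: no Naive Bayes decision boundary separates the XOR labels.
```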

Page 38

Failings of Naive Bayes (2)

Class Imbalance

- One or more classes have more instances than others

- Data skew causes NB to prefer one class over the other

Page 39

Failings of Naive Bayes (3)

Weight magnitude errors

- Classes with larger weights are preferred

- 10 documents with class=MA and "Boston" occurring once each

- 10 documents with class=CA and "San Francisco" occurring once each

- New document d: "Boston Boston Boston San Francisco San Francisco"

  P(class = CA | d) > P(class = MA | d)

Page 40

Naive Bayes Summary

- Domain knowledge is crucial to selecting good features

- Handle class imbalance by re-weighting classes

- Use log scale operations instead of multiplying probabilities:

  cNB = argmax_{cj ∈ C} [ log P(cj) + Σ_i log P(xi | cj) ]

- The model is now just a max over sums of weights
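Why logs: multiplying many small per-word probabilities underflows floating point, while summing their logs stays well-scaled. A quick demonstration:

```python
import math

probs = [1e-5] * 100       # e.g. 100 words, each with probability 1e-5

product = 1.0
for p in probs:
    product *= p           # underflows to exactly 0.0 (true value is 1e-500)

log_score = sum(math.log(p) for p in probs)   # about -1151.29, perfectly usable
```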

Page 41

Classification tasks in NLP

Naive Bayes Classifier

Log linear models

Page 42

Log linear model

- The model classifies input x into output labels y ∈ Y

- Let there be m features, fk(x, y) for k = 1, ..., m

- Define a parameter vector v ∈ R^m

- Each (x, y) pair is mapped to a score:

  s(x, y) = Σ_k vk · fk(x, y)

- Using inner product notation: v · f(x, y) = Σ_k vk · fk(x, y), so

  s(x, y) = v · f(x, y)

- To get a probability from the score: renormalize!

  Pr(y | x; v) = exp(s(x, y)) / Σ_{y′∈Y} exp(s(x, y′))
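The score and renormalization can be written out directly. The word-label indicator features and the weights below are toy assumptions for illustration:

```python
import math

LABELS = ["pos", "neg"]

def features(x, y):
    """f_k(x, y): indicator features pairing a word in x with the label y."""
    return [1.0 if (w in x and y == lab) else 0.0
            for w in ("great", "dull") for lab in LABELS]

def prob(x, y, v):
    """Pr(y | x; v) = exp(s(x, y)) / Σ_{y'} exp(s(x, y'))."""
    def s(lab):
        return sum(vk * fk for vk, fk in zip(v, features(x, lab)))
    z = sum(math.exp(s(lab)) for lab in LABELS)
    return math.exp(s(y)) / z

# Toy weights for (great,pos), (great,neg), (dull,pos), (dull,neg).
v = [2.0, -1.0, -1.5, 1.0]
```

For x = {"great"}, s(x, pos) = 2 and s(x, neg) = -1, so Pr(pos | x; v) = e^2 / (e^2 + e^-1), a little over 0.95.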

Page 43

Log linear model

- The name 'log-linear model' comes from:

  log Pr(y | x; v) = v · f(x, y) − log Σ_{y′} exp(v · f(x, y′))

  where v · f(x, y) is the linear term and log Σ_{y′} exp(v · f(x, y′)) is the normalization term.

- Once the weights v are learned, we can perform predictions using these features.

- The goal: to find v that maximizes the log likelihood L(v) of the labeled training set containing (xi, yi) for i = 1, ..., n:

  L(v) = Σ_i log Pr(yi | xi; v)
       = Σ_i v · f(xi, yi) − Σ_i log Σ_{y′} exp(v · f(xi, y′))

Page 44

Log linear model

- Maximize:

  L(v) = Σ_i v · f(xi, yi) − Σ_i log Σ_{y′} exp(v · f(xi, y′))

- Calculate the gradient:

  dL(v)/dv = Σ_i f(xi, yi) − Σ_i [1 / Σ_{y′′} exp(v · f(xi, y′′))] · Σ_{y′} f(xi, y′) · exp(v · f(xi, y′))

           = Σ_i f(xi, yi) − Σ_i Σ_{y′} f(xi, y′) · [exp(v · f(xi, y′)) / Σ_{y′′} exp(v · f(xi, y′′))]

           = Σ_i f(xi, yi) − Σ_i Σ_{y′} f(xi, y′) · Pr(y′ | xi; v)

  The first term is the observed counts; the second term is the expected counts.

Page 45

Gradient ascent

- Init: v(0) = 0

- t ← 0

- Iterate until convergence:
  - Calculate ∆ = dL(v)/dv evaluated at v = v(t)
  - Find β* = argmax_β L(v(t) + β∆)
  - Set v(t+1) ← v(t) + β*∆
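A minimal end-to-end sketch: the gradient is observed minus expected feature counts, and the update loop below uses a small fixed step size instead of the line search over β, for brevity. The data and features are toy assumptions.

```python
import math

LABELS = [0, 1]

def f(x, y):
    """Toy feature vector: one indicator per (input bit, label) pair."""
    return [1.0 if (x == b and y == lab) else 0.0
            for b in (0, 1) for lab in LABELS]

def probs(x, v):
    """Pr(y | x; v) for all y, with max-subtraction for numerical stability."""
    s = [sum(vk * fk for vk, fk in zip(v, f(x, y))) for y in LABELS]
    m = max(s)
    e = [math.exp(si - m) for si in s]
    z = sum(e)
    return [ei / z for ei in e]

def grad(data, v):
    """dL/dv = observed feature counts minus expected feature counts."""
    g = [0.0] * len(v)
    for x, y in data:
        p = probs(x, v)
        fy = f(x, y)
        for k in range(len(v)):
            g[k] += fy[k]                        # observed
            for yp in LABELS:
                g[k] -= f(x, yp)[k] * p[yp]      # expected
    return g

data = [(0, 0), (0, 0), (1, 1)]    # x = 0 labeled 0 (twice), x = 1 labeled 1
v = [0.0] * 4
for _ in range(200):
    v = [vk + 0.5 * gk for vk, gk in zip(v, grad(data, v))]
```

After training, the model assigns high probability to each example's observed label; with separable toy data like this the weights grow without bound, which is why regularization is used in practice.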

Page 46

Learning the weights: v: Generalized Iterative Scaling

f # = maxx ,y∑

j fj(x , y)

(the maximum possible feature value; needed for scaling)

Initialize v(0)

For each iteration texpected[j] ← 0 for j = 1 .. # of featuresFor i = 1 to | training data |

For each feature fjexpected[j] += fj(xi , yi ) · P(yi | xi ; v(t))

For each feature fj(x , y)

observed[j] = fj(x , y) · c(x ,y)|training data|

For each feature fj(x , y)

v(t+1)j ← v

(t)j · f#

√observed[j]expected[j]

cf. Goodman, NIPS ’01

Page 47

Acknowledgements

Many slides borrowed or inspired from lecture notes by Anoop Sarkar, Danqi Chen, Karthik Narasimhan, Dan Jurafsky, Michael Collins, Chris Dyer, Kevin Knight, Chris Manning, Philipp Koehn, Adam Lopez, Graham Neubig, Richard Socher and Luke Zettlemoyer from their NLP course materials.

All mistakes are my own.

A big thank you to all the students who read through these notesand helped me improve them.

