Page 1

CS 562: Empirical Methods in Natural Language Processing
Fall 2011
Liang Huang ([email protected])

Unit 3: Natural Language Learning
Part 1: Unsupervised Learning
(EM, forward-backward, inside-outside)

Monday, November 7, 2011

Page 2

CS 562 - EM

Review of Noisy-Channel Model

Page 3

Example 1: Part-of-Speech Tagging

• use tag bigram as a language model

• channel model is context-independent
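The model on this slide can be sketched numerically: p(tags, words) factors into a tag-bigram language model times a context-independent channel model. All probabilities below are invented toy values, not the course's.

```python
# Noisy-channel POS model: p(t, w) = prod_i p(t_i | t_{i-1}) * p(w_i | t_i).
# Toy probabilities (hypothetical, for illustration only).
bigram = {("<s>", "DT"): 0.6, ("DT", "NN"): 0.7, ("NN", "VB"): 0.3}        # tag LM
channel = {("DT", "the"): 0.5, ("NN", "dog"): 0.01, ("VB", "runs"): 0.02}  # context-indep.

def joint_prob(tags, words):
    p, prev = 1.0, "<s>"
    for t, w in zip(tags, words):
        p *= bigram[(prev, t)] * channel[(t, w)]   # one LM step * one channel step
        prev = t
    return p

print(joint_prob(["DT", "NN", "VB"], ["the", "dog", "runs"]))  # ~1.26e-05
```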

Page 4

Ideal vs. Available Data


Page 8

Ideal vs. Available Data

HW3 (ideal): English/Japanese phoneme pairs with alignment indices:

  EY B AH L  →  A B E R U    (1 2 3 4 4)
  AH B AW T  →  A B A U T O  (1 2 3 3 4 4)
  AH L ER T  →  A R A A T O  (1 2 3 3 4 4)
  EY S       →  E E S U      (1 1 2 2)

HW5 (realistic): the same pairs, without alignments:

  EY B AH L  →  A B E R U
  AH B AW T  →  A B A U T O
  AH L ER T  →  A R A A T O
  EY S       →  E E S U

Page 9

Ideal vs. Available Data


Page 14

Incomplete Data / Model

Page 15

EM: Expectation-Maximization

Page 16

How to Change m? 1) Hard


Page 18

How to Change m? 1) Hard


Page 20

How to Change m? 2) Soft

Page 21

Fractional Counts

• distribution over all possible hallucinated hidden variables

• example: "W AY N" → "W A I N" has three possible alignments:

  z  : W → W,   AY → A,    N → I N
  z' : W → W,   AY → A I,  N → N
  z'': W → W A, AY → I,    N → N

hard-EM counts: 1, 0, 0 (all mass on a single best alignment)

fractional counts (uniform init): 0.333, 0.333, 0.333, giving

  AY → A: 0.333, A I: 0.333, I: 0.333
  W  → W: 0.667, W A: 0.333
  N  → N: 0.667, I N: 0.333

fractional counts (next iteration): 0.25, 0.5, 0.25, giving

  AY → A I: 0.500, A: 0.250, I: 0.250
  W  → W: 0.750, W A: 0.250
  N  → N: 0.750, I N: 0.250

eventually: 0, 1, 0 (all mass on z', i.e. AY → A I)
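The numbers on this slide can be checked in a few lines of Python. The (epron, jseg) encoding of the three alignments below is my own hypothetical choice, not HW5's; it reproduces the 0.333/0.333/0.333 and 0.25/0.5/0.25 posteriors.

```python
from collections import defaultdict

# the three alignments of "W AY N" -> "W A I N", as (epron, jseg) pairs
alignments = [
    [("W", ("W",)), ("AY", ("A",)), ("N", ("I", "N"))],    # z
    [("W", ("W",)), ("AY", ("A", "I")), ("N", ("N",))],    # z'
    [("W", ("W", "A")), ("AY", ("I",)), ("N", ("N",))],    # z''
]

def eval_z(z, table):
    """p(x, z): product of the model probabilities along one alignment."""
    p = 1.0
    for e, jseg in z:
        p *= table[e][jseg]
    return p

def e_step(table):
    """Posterior p(z|x) for each alignment, plus fractional counts."""
    joint = [eval_z(z, table) for z in alignments]
    px = sum(joint)                               # p(x) = sum_z p(x, z)
    post = [p / px for p in joint]
    counts = defaultdict(lambda: defaultdict(float))
    for q, z in zip(post, alignments):
        for e, jseg in z:
            counts[e][jseg] += q                  # fractional count
    return post, counts

def count_n_divide(counts):
    return {e: {jseg: c / sum(segs.values()) for jseg, c in segs.items()}
            for e, segs in counts.items()}

uniform = {"W": {("W",): 0.5, ("W", "A"): 0.5},
           "AY": {("A",): 1/3, ("A", "I"): 1/3, ("I",): 1/3},
           "N": {("N",): 0.5, ("I", "N"): 0.5}}

post1, counts1 = e_step(uniform)              # posteriors 0.333, 0.333, 0.333
post2, _ = e_step(count_n_divide(counts1))    # posteriors 0.25, 0.5, 0.25
```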

Page 22

Fractional Counts

• how about:

  W EH T → W E T O
  B IY → B I I  (two possible alignments: B → B, IY → I I; or B → B I, IY → I)

• so EM can possibly: (1) learn something correct; (2) learn something wrong; (3) learn nothing

• but with lots of data => likely to learn something good

Page 23

EM: slow version (non-DP)

• initialize the conditional prob. table to uniform

• repeat until converged:

  • E-step:

    • for each training example x (here: (e...e, j...j) pair):

      • for each hidden z: compute p(x, z) from the current model

      • p(x) = sum_z p(x, z); [debug: corpus prob p(data) *= p(x)]

      • for each hidden z = (z1 z2 ... zn): for each i:

        • fraccount(zi) += p(x, z) / p(x)

  • M-step: count-n-divide on fraccounts => new model

example hidden variables for "W AY N" → "W A I N":

  z = (z1 z2 z3): W → W,   AY → A,   N → I N
  z':             W → W,   AY → A I, N → N
  z'':            W → W A, AY → I,   N → N
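The loop above can be sketched end-to-end. The enumeration of hidden alignments and the initialization below are my own hypothetical choices (segments of length 1-3, as in HW5's forward code); on the single "W AY N" pair it converges to AY → A I, matching the "eventually 0, 1, 0" outcome on the fractional-counts slide.

```python
from collections import defaultdict

def segmentations(jprons, n, max_seg=3):
    """All ways to split jprons into n consecutive segments of length 1..max_seg."""
    if n == 0:
        if not jprons:
            yield []
        return
    for k in range(1, min(len(jprons), max_seg) + 1):
        for rest in segmentations(jprons[k:], n - 1, max_seg):
            yield [tuple(jprons[:k])] + rest

def em_slow(data, iterations=20):
    # "uniform" start: every jseg equally weighted, so all z get equal posterior
    table = defaultdict(lambda: defaultdict(lambda: 1.0))
    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))
        for eprons, jprons in data:                        # E-step
            zs = list(segmentations(jprons, len(eprons)))
            joint = []
            for z in zs:                                   # p(x, z) by enumeration
                p = 1.0
                for e, jseg in zip(eprons, z):
                    p *= table[e][jseg]
                joint.append(p)
            px = sum(joint)                                # p(x) = sum_z p(x, z)
            for p, z in zip(joint, zs):
                for e, jseg in zip(eprons, z):
                    counts[e][jseg] += p / px              # fraccount += p(z|x)
        table = defaultdict(lambda: defaultdict(float))    # M-step: count-n-divide
        for e, segs in counts.items():
            total = sum(segs.values())
            for jseg, c in segs.items():
                table[e][jseg] = c / total
    return table

table = em_slow([(["W", "AY", "N"], ["W", "A", "I", "N"])])
```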

Page 24

EM: fast version (DP)

• initialize the conditional prob. table to uniform

• repeat until converged:

  • E-step:

    • for each training example x (here: (e...e, j...j) pair):

      • forward from s to t; note: forw[t] = p(x) = sum_z p(x, z)

      • backward from t to s; note: back[t] = 1; back[s] = forw[t]

      • for each edge (u, v) in the DP graph with label(u, v) = zi:

        • fraccount(zi) += forw[u] * back[v] * prob(u, v) / p(x)

  • M-step: count-n-divide on fraccounts => new model

note: forw[u] * back[v] * prob(u, v) = sum over all z containing (u, v) of p(x, z);
forw[t] = back[s] = p(x) = sum_z p(x, z)

[diagram: a path s → u → v → t in the DP graph, with forw[u] covering the prefix up to u and back[v] the suffix from v]

Page 25

How to avoid enumeration?

• dynamic programming: the forward-backward algorithm

• forward is just like Viterbi, replacing max by sum

• backward is like reverse Viterbi (also with sum)

(applications: POS tagging, crypto, ...; alignment, edit-distance, ...; inside-outside: PCFG, SCFG, ...)

Page 26

Example Forward Code

• for HW5. this example shows forward only.

  n, m = len(eprons), len(jprons)
  forward[0][0] = 1

  for i in range(n):                            # for each English phoneme
      epron = eprons[i]
      for j in forward[i]:                      # each reachable jprons position
          for k in range(1, min(m - j, 3) + 1): # Japanese segment of length 1..3
              jseg = tuple(jprons[j:j+k])
              score = forward[i][j] * table[epron][jseg]
              forward[i+1][j+k] += score

  totalprob *= forward[n][m]                    # forward[n][m] = p(x)

[diagram: DP chart with rows W, AY, N and columns 0-4 over "W A I N"]
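Here is a self-contained Python 3 version of the slide's forward pass, run on a hypothetical toy table (the uniform "W AY N" model from the fractional-counts slide, not HW5 data):

```python
from collections import defaultdict

def forward_prob(eprons, jprons, table, max_seg=3):
    """Forward pass over the alignment lattice; returns forward[n][m] = p(x)."""
    n, m = len(eprons), len(jprons)
    forward = [defaultdict(float) for _ in range(n + 1)]
    forward[0][0] = 1.0
    for i in range(n):
        epron = eprons[i]
        for j in list(forward[i]):                      # reachable j positions
            for k in range(1, min(m - j, max_seg) + 1):
                jseg = tuple(jprons[j:j+k])
                score = forward[i][j] * table[epron].get(jseg, 0.0)
                forward[i+1][j+k] += score
    return forward[n][m]

table = {"W": {("W",): 0.5, ("W", "A"): 0.5},
         "AY": {("A",): 1/3, ("A", "I"): 1/3, ("I",): 1/3},
         "N": {("N",): 0.5, ("I", "N"): 0.5}}
p_x = forward_prob(["W", "AY", "N"], ["W", "A", "I", "N"], table)  # ~0.25
```

Each of the three alignments has probability 0.5 * 1/3 * 0.5 = 1/12 under this table, so p(x) sums to 0.25.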


Page 28

Example Forward Code

• for HW5. this example shows forward only. (same code as the previous page, here annotated on the DP chart)

[diagram: chart cell (i, j) holds forw[i][j]; the edge from (i, j) to (i+1, j+k) is labeled epron → jseg, with its endpoints playing the roles of u and v on the path s → u → v → t]

forw[s] = back[t] = 1.0
forw[t] = back[s] = p(x)
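Putting forward and backward together gives the E-step from the "fast version" slide: fraccount(zi) += forw[u] * back[v] * prob(u, v) / p(x). A sketch under the same hypothetical encoding as above (nodes are (i, j) chart cells; toy table, not HW5 data):

```python
from collections import defaultdict

def forward_backward_counts(eprons, jprons, table, max_seg=3):
    """One E-step by DP. Node (i, j) means i eprons and j jprons consumed;
    edge (i, j) -> (i+1, j+k) is labeled eprons[i] -> jprons[j:j+k]."""
    n, m = len(eprons), len(jprons)
    forw = defaultdict(float)
    forw[(0, 0)] = 1.0                           # forw[s] = 1
    for i in range(n):
        for j in range(m + 1):
            for k in range(1, min(m - j, max_seg) + 1):
                jseg = tuple(jprons[j:j+k])
                forw[(i+1, j+k)] += forw[(i, j)] * table[eprons[i]].get(jseg, 0.0)
    px = forw[(n, m)]                            # forw[t] = p(x)
    back = defaultdict(float)
    back[(n, m)] = 1.0                           # back[t] = 1
    for i in reversed(range(n)):
        for j in range(m + 1):
            for k in range(1, min(m - j, max_seg) + 1):
                jseg = tuple(jprons[j:j+k])
                back[(i, j)] += table[eprons[i]].get(jseg, 0.0) * back[(i+1, j+k)]
    counts = defaultdict(lambda: defaultdict(float))
    for i in range(n):                           # fraccounts over all edges
        for j in range(m + 1):
            for k in range(1, min(m - j, max_seg) + 1):
                jseg = tuple(jprons[j:j+k])
                edge = forw[(i, j)] * table[eprons[i]].get(jseg, 0.0) * back[(i+1, j+k)]
                if edge:
                    counts[eprons[i]][jseg] += edge / px
    return px, counts

table = {"W": {("W",): 0.5, ("W", "A"): 0.5},
         "AY": {("A",): 1/3, ("A", "I"): 1/3, ("I",): 1/3},
         "N": {("N",): 0.5, ("I", "N"): 0.5}}
px, counts = forward_backward_counts(["W", "AY", "N"], ["W", "A", "I", "N"], table)
# px ~ 0.25; counts match the slow enumeration E-step, e.g. W -> W gets 2/3
```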


Page 30

EM

Page 31

Why does EM increase p(data) iteratively?


Page 34

Why does EM increase p(data) iteratively?

[figure labels: convex auxiliary function; converge to local maxima; KL-divergence]
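The figure on this slide can be written out as the standard EM bound (a textbook derivation, not transcribed from the figure itself):

```latex
% Jensen's inequality gives an auxiliary lower bound:
\log p(x;\theta) \;=\; \log \sum_z p(x,z;\theta)
  \;\ge\; \sum_z q(z)\,\log\frac{p(x,z;\theta)}{q(z)} \;=:\; Q(\theta,q)
% the gap is exactly a KL-divergence:
\log p(x;\theta) - Q(\theta,q) \;=\; \mathrm{KL}\!\left(q(z)\,\Vert\,p(z\mid x;\theta)\right)
% E-step: q(z) := p(z \mid x;\theta_t) makes the bound tight (KL = 0);
% M-step: \theta_{t+1} := \arg\max_\theta Q(\theta,q).  Hence
\log p(x;\theta_{t+1}) \;\ge\; Q(\theta_{t+1},q) \;\ge\; Q(\theta_t,q) \;=\; \log p(x;\theta_t)
% so p(data) never decreases, and EM converges to a local maximum.
```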


Page 37

How to maximize the auxiliary?

posteriors over the three alignments of "W AY N" → "W A I N":

  z   (W → W,   AY → A,   N → I N):  p(z|x)   = 0.5
  z'  (W → W,   AY → A I, N → N):    p(z'|x)  = 0.3
  z'' (W → W A, AY → I,   N → N):    p(z''|x) = 0.2

just count-n-divide on the fractional data!

(as if MLE on complete data)

i.e., pretend the complete data contained z five times, z' three times, and z'' twice.
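The last step ("as if MLE on complete data") is plain counting: treat the posteriors 0.5/0.3/0.2 as the alignments occurring 5, 3, and 2 times, then count-n-divide. The (epron, jseg) encoding is hypothetical, as before:

```python
from collections import defaultdict

# pretend the complete data contained z five times, z' three times, z'' twice
data = (5 * [[("W", ("W",)), ("AY", ("A",)), ("N", ("I", "N"))]]     # z
      + 3 * [[("W", ("W",)), ("AY", ("A", "I")), ("N", ("N",))]]     # z'
      + 2 * [[("W", ("W", "A")), ("AY", ("I",)), ("N", ("N",))]])    # z''

counts = defaultdict(lambda: defaultdict(float))
for z in data:
    for e, jseg in z:
        counts[e][jseg] += 1

# count-n-divide: normalize the counts for each English phoneme
mle = {e: {jseg: c / sum(segs.values()) for jseg, c in segs.items()}
       for e, segs in counts.items()}
# e.g. mle["AY"] -> {("A",): 0.5, ("A", "I"): 0.3, ("I",): 0.2}
```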

