
Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Date post: 25-Jul-2015
Upload: alopezfoo
Transcript
Page 1: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

• Machine learning • Databases • Algorithms and systems • Statistics and optimization

Four year PhD programme Courses + PhD dissertation (No previous MSc required)

http://datascience.inf.ed.ac.uk/

Discovering new patterns and knowledge from data

• Big data • Natural language processing • Computer vision • Speech processing

at the University of Edinburgh

Page 2: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

The biggest revolution in the technological landscape for fifty years

Now accepting applications! Find out more and apply at:

pervasiveparallelism.inf.ed.ac.uk

• 4-year programme: MSc by Research + PhD

• Collaboration between: ▶ University of Edinburgh’s School of Informatics ✴ Ranked top in the UK by 2014 REF

▶ Edinburgh Parallel Computing Centre ✴ UK’s largest supercomputing centre

• Full funding available

• Industrial engagement programme includes internships at leading companies

• Research-focused: Work on your thesis topic from the start

• Research topics in software, hardware, theory and application of: ▶ Parallelism ▶ Concurrency ▶ Distribution

Page 3: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Latent Variable Models and Word Alignment

INFR11062, spring 2015, lecture 3

Page 4: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Goal

• Write down a model over sentence pairs.

• Learn an instance of the model from data.

• Use it to predict translations of new sentences.


Page 6: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

However , the sky remained clear under the strong north wind .

IBM Model 1

Page 7: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

However , the sky remained clear under the strong north wind .

IBM Model 1

Let’s write a simple model in terms of word-to-word alignments: $p(\mathbf{f}, \mathbf{a} \mid \mathbf{e})$

Page 11: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

IBM Model 1

p(English length|Chinese length)

Page 14: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

IBM Model 1

p(Chinese word position)

Page 17: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

IBM Model 1

However

p(English word|Chinese word)

Page 23: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

IBM Model 1

However , the sky remained clear under the strong north wind .

Page 24: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

IBM Model 1

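$$p(\mathbf{f}, \mathbf{a} \mid \mathbf{e}) = p(I \mid J) \prod_{i=1}^{I} p(a_i \mid J) \cdot p(f_i \mid e_{a_i})$$

• $p(I \mid J)$: French, English sentence lengths

• $p(a_i \mid J)$: alignment of French word at position $i$

• $p(f_i \mid e_{a_i})$: French, English word pair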
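The product above can be evaluated directly for one fixed alignment. The following is a minimal sketch, not the lecture's own code: the function name `model1_joint`, the dictionary `t`, and the toy probabilities are all assumptions made for illustration, the alignment term $p(a_i \mid J)$ is assumed uniform ($1/J$), and the length term $p(I \mid J)$ is passed in as a placeholder constant.

```python
def model1_joint(f_words, e_words, alignment, t, length_prob=1.0):
    """Return p(f, a | e) for one fixed alignment under IBM Model 1.

    alignment[i] is the position in e_words that f_words[i] aligns to.
    t maps (f_word, e_word) to p(f_word | e_word).
    length_prob is a stand-in for p(I | J), which this sketch does not model.
    """
    if len(alignment) != len(f_words):
        raise ValueError("need exactly one alignment link per generated word")
    J = len(e_words)
    prob = length_prob
    for f, a_i in zip(f_words, alignment):
        # Uniform alignment probability 1/J times the word translation probability.
        prob *= (1.0 / J) * t.get((f, e_words[a_i]), 0.0)
    return prob


# Toy, made-up translation probabilities (not taken from the lecture).
t = {("however", "虽然"): 0.5, ("north", "北"): 0.8}
p = model1_joint(["however", "north"], ["虽然", "北"], [0, 1], t)
```

Here the English words play the role of $\mathbf{f}$ and the Chinese words the role of $\mathbf{e}$, matching the running example.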

Page 27: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

IBM Model 1

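$p(\text{despite} \mid \text{虽然})$ = ???

$p(\text{however} \mid \text{虽然})$ = ???

$p(\text{although} \mid \text{虽然})$ = ???

$p(\text{northern} \mid \text{北})$ = ???

$p(\text{north} \mid \text{北})$ = ???

These unknown values are the parameters $\theta$.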

Page 28: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

• Optimization

Page 29: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

However , the sky remained clear under the strong north wind .

MLE for IBM Model 1 (observed)

Page 30: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

$$\hat{\theta} = \arg\max_{\theta} \; p(\mathbf{f}, \mathbf{a} \mid \mathbf{e})$$

Page 32: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

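$$\hat{\theta} = \arg\max_{\theta} \prod_{n=1}^{N} \left( p(I^{(n)} \mid J^{(n)}) \prod_{i=1}^{I^{(n)}} p(a_i^{(n)} \mid J^{(n)}) \cdot p(f_i^{(n)} \mid e_{a_i}^{(n)}) \right)$$

• $N$: number of sentences

• $p(I^{(n)} \mid J^{(n)})$: French, English sentence lengths

• $p(a_i^{(n)} \mid J^{(n)})$: alignment of French word at position $i$

• $p(f_i^{(n)} \mid e_{a_i}^{(n)})$: French, English word pair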

Page 33: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

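$$\hat{\theta} = \arg\max_{\theta} \prod_{n=1}^{N} \left( p(I^{(n)} \mid J^{(n)}) \prod_{i=1}^{I^{(n)}} p(a_i^{(n)} \mid J^{(n)}) \cdot p(f_i^{(n)} \mid e_{a_i}^{(n)}) \right)$$

The length term $p(I^{(n)} \mid J^{(n)})$ and the alignment term $p(a_i^{(n)} \mid J^{(n)})$ are constant (w.r.t. words)!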

Page 34: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

$$\hat{\theta} = \arg\max_{\theta} \; C \prod_{n=1}^{N} \prod_{i=1}^{I^{(n)}} p(f_i^{(n)} \mid e_{a_i}^{(n)})$$

Page 35: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

$$\hat{\theta} = \arg\max_{\theta} \; \log \left( C \prod_{n=1}^{N} \prod_{i=1}^{I^{(n)}} p(f_i^{(n)} \mid e_{a_i}^{(n)}) \right)$$

$$\log(a) < \log(b) \iff a < b$$

Page 36: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

$$\hat{\theta} = \arg\max_{\theta} \; \log \left( C \cdot \prod_{f,e} p(f \mid e)^{\mathrm{count}(\langle f, e \rangle)} \right)$$

Page 37: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

$$\hat{\theta} = \arg\max_{\theta} \; \log C + \sum_{f,e} \mathrm{count}(\langle f, e \rangle) \log p(f \mid e)$$

log of product = sum of logs

Page 38: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

$$\Lambda(\theta, \lambda) = \log C + \sum_{f,e} \mathrm{count}(\langle f, e \rangle) \log p(f \mid e) - \sum_{e} \lambda_e \left( \sum_{f} p(f \mid e) - 1 \right)$$

Lagrange multiplier expresses normalization constraint

Page 39: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

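$$\Lambda(\theta, \lambda) = \log C + \sum_{f,e} \mathrm{count}(\langle f, e \rangle) \log p(f \mid e) - \sum_{e} \lambda_e \left( \sum_{f} p(f \mid e) - 1 \right)$$

derivative:

$$\frac{\partial \Lambda(\theta, \lambda)}{\partial p(f \mid e)} = \frac{\mathrm{count}(\langle f, e \rangle)}{p(f \mid e)} - \lambda_e$$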
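Setting this derivative to zero and applying the normalization constraint yields the relative-frequency estimate. The worked steps below use only the symbols already defined; $f'$ is just a summation variable introduced here for illustration.

```latex
\begin{align*}
\frac{\mathrm{count}(\langle f, e \rangle)}{p(f \mid e)} - \lambda_e = 0
  &\;\Longrightarrow\; p(f \mid e) = \frac{\mathrm{count}(\langle f, e \rangle)}{\lambda_e} \\
\sum_{f'} p(f' \mid e) = 1
  &\;\Longrightarrow\; \lambda_e = \sum_{f'} \mathrm{count}(\langle f', e \rangle) \\
\text{so}\quad p(f \mid e)
  &= \frac{\mathrm{count}(\langle f, e \rangle)}{\sum_{f'} \mathrm{count}(\langle f', e \rangle)}
\end{align*}
```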

Page 40: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

However , the sky remained clear under the strong north wind .

MLE for IBM Model 1 (observed)

$$p(\text{however} \mid \text{虽然}) = \frac{\#\text{ of times 虽然 aligns to However}}{\#\text{ of times 虽然 aligns to any word}}$$
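When alignments are observed, this ratio is just counting and normalizing. A minimal sketch under stated assumptions: the function name `mle_from_alignments`, the corpus layout (triples of generated words, source words, and alignment positions), and the three toy sentence pairs are all hypothetical and not taken from the lecture.

```python
from collections import defaultdict


def mle_from_alignments(corpus):
    """Estimate p(f | e) by relative frequency from word-aligned data.

    corpus: iterable of (f_words, e_words, alignment) triples, where
    alignment[i] is the position in e_words that f_words[i] aligns to.
    """
    pair_counts = defaultdict(float)  # count(<f, e>)
    e_totals = defaultdict(float)     # number of times e aligns to any word
    for f_words, e_words, alignment in corpus:
        for f, a_i in zip(f_words, alignment):
            e = e_words[a_i]
            pair_counts[(f, e)] += 1
            e_totals[e] += 1
    return {(f, e): c / e_totals[e] for (f, e), c in pair_counts.items()}


# Toy aligned corpus, invented for illustration.
corpus = [
    (["however", "north"], ["虽然", "北"], [0, 1]),
    (["although", "north"], ["虽然", "北"], [0, 1]),
    (["however"], ["虽然"], [0]),
]
t = mle_from_alignments(corpus)
```

In this toy corpus 虽然 aligns to a word three times, twice to "however", so the estimate for that pair is two thirds.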

Page 41: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

However , the sky remained clear under the strong north wind .

MLE for IBM Model 1 (unobserved)

$p(\text{however} \mid \text{虽然})$ = ???

Page 42: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (observed)

$$\hat{\theta} = \arg\max_{\theta} \; \log \left( C \prod_{n=1}^{N} \prod_{i=1}^{I^{(n)}} p(f_i^{(n)} \mid e_{a_i}^{(n)}) \right)$$

Page 43: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

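MLE for IBM Model 1 (unobserved)

$$\hat{\theta} = \arg\max_{\theta} \; \log \left( C \prod_{n=1}^{N} \sum_{\mathbf{a}} \prod_{i=1}^{I^{(n)}} p(f_i^{(n)} \mid e_{a_i}^{(n)}) \right)$$

Data likelihood = marginal probability of observed data

$$p(\mathbf{f} \mid \mathbf{e}) = \sum_{\mathbf{a}} p(\mathbf{f}, \mathbf{a} \mid \mathbf{e})$$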

Page 46: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

MLE for IBM Model 1 (unobserved)

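$$\hat{\theta} = \arg\max_{\theta} \; \log \left( C \cdot \prod_{f,e} p(f \mid e)^{E[\mathrm{count}(\langle f, e \rangle)]} \right)$$

The exponent $E[\mathrm{count}(\langle f, e \rangle)]$ is not constant! It depends on the parameters, so there is no analytic solution.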

But it does strongly imply an iterative solution.

Page 50: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Likelihood Estimation for Model 1

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

However , the sky remained clear under the strong north wind .

[Figure: alignment link labelled p(English word | Chinese word), marked "unobserved!"]

Parameters and alignments are both unknown.

If we knew the alignments, we could calculate the values of the parameters.

If we knew the parameters, we could calculate the likelihood of the data.

Page 51: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

The Plan: Bootstrapping

• Arbitrarily select a set of parameters (say, uniform).

• Calculate expected counts of the unseen events.

• Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.

• Iterate.

Page 54: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

The Plan: Bootstrapping

However , the sky remained clear under the strong north wind .

If we had observed the alignment, this line would either be here (count 1) or it wouldn’t (count 0).

Since we didn’t observe the alignment, we calculate the probability that it’s there.

Page 55: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Remember: Probabilistic Primer

[Table: 6 × 6 grid of joint probabilities $p(X = x, Y = y)$, every cell equal to $\frac{1}{36}$]

$$p(Y = 1) = \sum_{x \in X} p(X = x, Y = 1) = \frac{1}{6}$$

marginal probability
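The marginal above is a sum over one row or column of the joint table. A small sketch, assuming the 6 × 6 table describes two independent fair dice (the slide does not say so explicitly); the names `joint` and `marginal_y` are made up for this example.

```python
from fractions import Fraction

# Assumed joint distribution: two fair dice, every (x, y) cell has probability 1/36.
joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}


def marginal_y(joint, y):
    """p(Y = y): sum the joint probability over every value of X."""
    return sum(p for (x_val, y_val), p in joint.items() if y_val == y)


p_y1 = marginal_y(joint, 1)
```

Using `Fraction` keeps the arithmetic exact, so the six cells of 1/36 add up to exactly 1/6.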

Page 56: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

[Figure: the sentence pair drawn three times, each with a different alignment containing the link, combined as p(alignment) + p(alignment) + p(alignment)]

Marginalize: sum all alignments containing the link

Page 58: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。

However , the sky remained clear under the strong north wind .

[Figure: the sentence pair drawn three times, each with a different alignment, combined as p(alignment) + p(alignment) + p(alignment)]

Divide by sum of all possible alignments

Is this hard? How many alignments are there?
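To get a feel for the size of this sum: if each of the $I$ generated words picks exactly one of $J$ source positions, there are $J^I$ alignments. The sketch below counts them for the running example and sums over all of them by brute force on a two-word toy pair. The function names and the toy probabilities are assumptions for illustration, and the constant terms $p(I \mid J)$ and $p(a_i \mid J)$ are left out.

```python
import itertools
import math


def num_alignments(num_f_words, num_e_words):
    """Each generated word independently picks one source position."""
    return num_e_words ** num_f_words


def brute_force_marginal(f_words, e_words, t):
    """Sum prod_i t(f_i | e_{a_i}) over every alignment a (exponential cost)."""
    total = 0.0
    for a in itertools.product(range(len(e_words)), repeat=len(f_words)):
        total += math.prod(t.get((f, e_words[j]), 0.0) for f, j in zip(f_words, a))
    return total


chinese = "虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。".split()
english = "However , the sky remained clear under the strong north wind .".split()
n = num_alignments(len(english), len(chinese))  # 11 source positions, 12 generated words

# Toy, made-up translation probabilities for a two-word example.
t = {("however", "虽然"): 0.5, ("however", "北"): 0.1,
     ("north", "虽然"): 0.2, ("north", "北"): 0.8}
z = brute_force_marginal(["however", "north"], ["虽然", "北"], t)
```

Enumeration is fine for two words but clearly infeasible for the full sentence pair, which motivates looking for structure in the sum.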

Page 61: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Computing Expectations

probability of an alignment.

$$p(F, A \mid E) = p(I \mid J) \prod_{a_i} p(a_i = j) \, p(f_i \mid e_j)$$

• $p(I \mid J)$: observed

• $p(a_i = j)$: uniform

• The product factors across words.

Page 62: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Computing Expectations

$$\sum_{a \in A:\ \text{北} \leftrightarrow \text{north}} p(\text{north} \mid \text{北}) \cdot p(\text{rest of } a)$$

marginal probability of alignments containing link

Page 63: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Computing Expectations

$$p(\text{north} \mid \text{北}) \cdot \sum_{a \in A:\ \text{北} \leftrightarrow \text{north}} p(\text{rest of } a)$$

marginal probability of alignments containing link

Page 66: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Computing Expectations

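$$\frac{p(\text{north} \mid \text{北}) \cdot \sum_{a \in A:\ \text{北} \leftrightarrow \text{north}} p(\text{rest of } a)}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c) \cdot \sum_{a \in A:\ c \leftrightarrow \text{north}} p(\text{rest of } a)}$$

Numerator: marginal probability of alignments containing link

Denominator: marginal probability of all alignments

The inner sums over $p(\text{rest of } a)$ are identical!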

Page 69: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Computing Expectations

$$\frac{p(\text{north} \mid \text{北})}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c)}$$

marginal probability (expected count) of an alignment containing the link

For each sentence, use this quantity instead of 0 or 1
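The simplified ratio can be checked against the exponential sum it replaces. This is a sketch under assumptions: `link_posterior` and `link_posterior_brute_force` are hypothetical names, the translation probabilities are made up, and at least one candidate source word is assumed to have nonzero probability so the normalizer is positive.

```python
import itertools
import math


def link_posterior(f, e_words, t):
    """Expected count of each link (f, e_j): t(f | e_j) / sum_c t(f | c)."""
    scores = [t.get((f, e), 0.0) for e in e_words]
    z = sum(scores)  # assumed > 0 in this sketch
    return [s / z for s in scores]


def link_posterior_brute_force(i, j, f_words, e_words, t):
    """Same quantity by enumerating every alignment; for checking only."""
    with_link = 0.0
    total = 0.0
    for a in itertools.product(range(len(e_words)), repeat=len(f_words)):
        p = math.prod(t.get((f, e_words[k]), 0.0) for f, k in zip(f_words, a))
        total += p
        if a[i] == j:
            with_link += p
    return with_link / total


# Toy, made-up translation probabilities.
t = {("however", "虽然"): 0.5, ("however", "北"): 0.1,
     ("north", "虽然"): 0.2, ("north", "北"): 0.8}
f_words, e_words = ["however", "north"], ["虽然", "北"]
fast = link_posterior("north", e_words, t)                 # one value per source word
slow = link_posterior_brute_force(1, 1, f_words, e_words, t)  # link north-北 only
```

The fast version touches each source word once per generated word, while the brute-force version visits every alignment.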

Page 70: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

However , the sky remained clear under the strong north wind .

Maximum Likelihood

$$p(\text{however} \mid \text{虽然}) = \frac{\#\text{ of times 虽然 aligns to However}}{\#\text{ of times 虽然 aligns to any word}}$$

Page 71: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

虽然 北 风 呼啸 , 但 天空 依然 十分 清澈 。
Although north wind howls , but sky still very clear .

However , the sky remained clear under the strong north wind .

Expectation Maximization

$$p(\text{however} \mid \text{虽然}) = \frac{\text{Expected } \#\text{ of times 虽然 aligns to However}}{\#\text{ of times 虽然 aligns to any word}}$$

Page 72: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Expectation Maximization

• Arbitrarily select a set of parameters (say, uniform).

• Calculate expected counts of the unseen events.

• Choose new parameters to maximize likelihood, using expected counts as proxy for observed counts.

• Iterate. (Until when?)
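The four steps above can be sketched as a short training loop. This is an illustrative sketch, not the lecture's implementation: the function name `train_model1`, the fixed iteration count (one simple answer to "until when?"), and the three-pair toy corpus are all assumptions.

```python
from collections import defaultdict


def train_model1(corpus, iterations=10):
    """EM for the word translation table t[(f, e)] = p(f | e).

    corpus: list of (f_words, e_words) sentence pairs without alignments.
    """
    f_vocab = {f for f_words, _ in corpus for f in f_words}
    # Step 1: start from uniform parameters over co-occurring word pairs.
    t = defaultdict(float)
    for f_words, e_words in corpus:
        for f in f_words:
            for e in e_words:
                t[(f, e)] = 1.0 / len(f_vocab)

    for _ in range(iterations):
        counts = defaultdict(float)  # expected count(<f, e>)
        totals = defaultdict(float)  # expected number of links from e
        # Step 2 (E-step): expected counts from the current parameters.
        for f_words, e_words in corpus:
            for f in f_words:
                z = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    c = t[(f, e)] / z
                    counts[(f, e)] += c
                    totals[e] += c
        # Step 3 (M-step): renormalize expected counts as if they were observed.
        t = defaultdict(float, {(f, e): c / totals[e] for (f, e), c in counts.items()})
    return t


# Toy corpus, invented for illustration.
corpus = [
    (["north", "wind"], ["北", "风"]),
    (["north", "sky"], ["北", "天空"]),
    (["clear", "sky"], ["清澈", "天空"]),
]
t = train_model1(corpus, iterations=10)
```

Because "north" co-occurs with 北 in two sentence pairs, its probability given 北 grows with each iteration while the competing candidates shrink. A fixed iteration count is the simplest stopping rule; monitoring the data likelihood is a common alternative.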

Page 73: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Textbook example

Page 78: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Expectation Maximization

$$\frac{p(\text{north} \mid \text{北})}{\sum_{c \in \text{Chinese words}} p(\text{north} \mid c)}$$

Why does this even work?

Page 82: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Expectation Maximization

Observation 1: We are still solving a maximum likelihood estimation problem.

$$p(\text{Chinese} \mid \text{English}) = \sum_{\text{alignments}} p(\text{Chinese}, \text{alignment} \mid \text{English})$$

MLE: choose parameters that maximize this expression.

Minor problem: there is no analytic solution.
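Although the maximizer has no closed form, the objective itself is cheap to evaluate for any given parameters, which makes it possible to track progress across iterations. A minimal sketch, assuming a translation table `t` keyed by word pairs and dropping the constant terms; the function name `log_likelihood` and the toy values are hypothetical.

```python
import math


def log_likelihood(corpus, t):
    """Log of the marginal likelihood, up to the constant C.

    Uses the per-word factorization: for each generated word, sum the
    translation probability over every candidate source word.
    """
    ll = 0.0
    for f_words, e_words in corpus:
        for f in f_words:
            ll += math.log(sum(t.get((f, e), 0.0) for e in e_words))
    return ll


# Toy, made-up translation probabilities and a one-pair corpus.
t = {("however", "虽然"): 0.5, ("however", "北"): 0.1,
     ("north", "虽然"): 0.2, ("north", "北"): 0.8}
corpus = [(["however", "north"], ["虽然", "北"])]
ll = log_likelihood(corpus, t)
```

One possible stopping rule is to halt the iterations once this value changes by less than a small threshold.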

Page 83: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

(from Minka ’98)

Page 84: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

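... and the likelihood is well behaved for this model: the log-likelihood is concave (equivalently, the negative log-likelihood is convex):

[Figure: plot with horizontal axis running from 0 to 1]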

Page 85: Edinburgh MT lecture 3: Latent Variable Models and Word Alignment

Summary

• Learning is optimization: choose parameters that optimize some function, such as likelihood.

• Supervised: maximum likelihood.

• Beware of overfitting.

• Unsupervised: expectation maximization.

• Many other objective functions and algorithms.

