Page 1: Semi-Supervised Learning of Sequence Models via Method of …homepages.inf.ed.ac.uk/scohen/emnlp16anchor-slides.pdf · Owoputi et al., Improved part-of-speech tagging for online conversational

Semi-Supervised Learning of Sequence Models via Method of Moments
EMNLP - Empirical Methods for Natural Language Processing

Zita Marinho, IST, University of Lisbon; Robotics Institute, CMU
Shay B. Cohen, School of Informatics, University of Edinburgh
André F. T. Martins, IT, IST, University of Lisbon; Unbabel
Noah A. Smith, Computer Science & Eng., University of Washington

November 1-6, 2016, Austin, Texas

Page 2

EMNLP 16 | Semi-supervised sequence labeling with MoM | Introduction 2

Sequence Labeling

observed data {w1, w2, w3, …, w6}: Herb fights like a ninja .
labels {y1, y2, y3, …, y6}: N V Pre Det N .

Page 3: Introduction

Sequence Labeling

observed data {w1, w2, w3, …, w6}: Herb fights like a ninja .
labels {y1, y2, y3, …, y6}: N V V Det N .

Page 4: Introduction

Sequence Labeling

observed data {w1, w2, w3, …, w6}: Herb fights like a ninja .
labels {y1, y2, y3, …, y6}: ADJ N V Det N .

Page 5: Introduction

Sequence Labeling

observed data {w1, w2, w3, …, w6}: Herb fights like a ninja .
labels {y1, y2, y3, …, y6}: ? ? ? ? ? ?

K^6 possible assignments (K = number of labels)

Page 6: Introduction

Hidden Markov Model

Each label yt depends on the previous label yt-1, and each word wt is generated from its label yt (labels y1 … y6 over words w1 … w6).

Learn parameters? transitions p(yt | yt-1) and emissions p(wt | yt)

• supervised learning
• unsupervised/semi-supervised (this talk)
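To make the two parameter sets concrete, here is a minimal Python sketch of the HMM factorization, assuming a hypothetical two-label toy model with made-up probabilities and no stop state. It scores one labeling, then sums over all K^T labelings by brute force, which is the K^6 blow-up from page 5 for a six-word sentence.

```python
from itertools import product

# Hypothetical toy HMM: K = 2 labels, 3 word types; numbers are illustrative.
LABELS = ["N", "V"]
START = {"N": 0.7, "V": 0.3}                               # p(y1)
TRANS = {"N": {"N": 0.3, "V": 0.7},                        # p(yt | yt-1)
         "V": {"N": 0.6, "V": 0.4}}
EMIT = {"N": {"Herb": 0.5, "ninja": 0.4, "fights": 0.1},   # p(wt | yt)
        "V": {"Herb": 0.0, "ninja": 0.1, "fights": 0.9}}


def joint(words, tags):
    """p(w, y) = p(y1) p(w1 | y1) * prod_{t>1} p(yt | yt-1) p(wt | yt)."""
    p = START[tags[0]] * EMIT[tags[0]][words[0]]
    for t in range(1, len(words)):
        p *= TRANS[tags[t - 1]][tags[t]] * EMIT[tags[t]][words[t]]
    return p


def marginal_brute_force(words):
    """p(w): sum of p(w, y) over all K**T label sequences y."""
    return sum(joint(words, tags) for tags in product(LABELS, repeat=len(words)))


if __name__ == "__main__":
    sentence = ["Herb", "fights", "ninja"]
    n_assignments = len(LABELS) ** len(sentence)   # K**T label sequences
    print(n_assignments, marginal_brute_force(sentence))
```

Brute force is only for illustration: the number of terms grows as K^T, which is why learning methods that avoid repeated inference passes are attractive.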

Page 7: Introduction

Hidden Markov Model (continued)

• the model can be extended to include features (Berg-Kirkpatrick et al., Painless unsupervised learning with features, NAACL HLT 2010)

Page 8: Problem Statement

Maximum Likelihood estimation (MLE)
• exact inference is hard
• EM is sensitive to local optima (depends on initialization)
• EM is expensive on large datasets (several inference passes)

Method of Moments estimation (MoM)
• computationally efficient
• no local optima
• one pass over the data

Page 9: Introduction

Hidden Markov Model: MLE vs. MoM

Settings: HMM and feature HMM; unsupervised and semi-supervised learning; estimation via Maximum Likelihood Estimation (MLE) or via Method of Moments (MoM). Five settings are marked ✓ and three are marked ?.

MoM references:
• Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013
• Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster and Lyle Ungar, Spectral Learning of Latent-Variable PCFGs: Algorithms and Sample Complexity, JMLR 2014

Page 10: Outline

Outline: Learning sequence models via MoM
1. Learn HMM models via MoM
2. Solve a QP
3. Extend to feature-based model
4. Experiments
5. Experiments

Page 11: Anchor Learning

Method of Moments: key insight
1. Conditional Independence: infer a label by looking at its context
2. Anchor Trick: learn a proxy for labels with anchors

Page 12: Anchor Learning

1. Conditional Independence

start → hehe its gonna b a good day → stop (words w1 … w7 with labels y1 … y7)
highlighted: a single word

Page 13

1. Conditional Independence

start → :) wait now I am goin 2 → stop (words w1 … w7 with labels y1 … y7)
context = { w-1, w+1 }: the words immediately before and after
Log-linear model

Page 14: Problem Statement

1. Conditional Independence

tasted like chimichangas (wt-1 wt wt+1): like is labeled adp
context = {tasted, chimichangas}

Page 15: Problem Statement

1. Conditional Independence

i like fajitas (wt-1 wt wt+1): like is labeled verb
context = {i, fajitas}

Page 16: Problem Statement

1. Conditional Independence

i like fajitas (wt-1 wt wt+1): like is labeled verb
context = {i, fajitas}

“You shall know a word by the company it keeps.” (Firth, 1957)

Page 17: Anchor Learning

1. Conditional Independence

start → hehe its gonna b a good day → stop (words w1 … w7 with labels y1 … y7)

context ⟂ word | label: given the label, the word and its context are independent.

Page 18: Anchor Learning

2. Anchor Trick

anchor word: be (wt), label: verb (yt)
p( verb | be ) = 1
p( label ≠ verb | be ) = 0
all instances of be = verb

Arora et al., A Practical Algorithm for Topic Modeling with Provable Guarantees, ICML 2013

Page 19: Anchor Learning

2. Anchor Trick

More anchors per label
verb = b, be, are, is, am, have, going
more than 1 anchor word: less biased context estimates

Page 20: Anchor Learning

2. Anchor Trick

How to find anchors?
• small labeled corpus
• small lexicon, e.g.
  noun: Austin, airport, playground
  verb: am, be, is, are, go, make, made, become
  pron: he, it, she
  adp: so, on, of
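A small labeled corpus suggests one simple anchor rule, sketched below in Python under our own assumption (not necessarily the talk's exact criterion): a word is an anchor for a label if it occurs at least `min_count` times in the labeled data and always with that label. The corpus and the `min_count` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def find_anchors(tagged_sentences, min_count=2):
    """Map each label to its anchor words.

    Assumed rule: a (lowercased) word is an anchor for a label if it occurs at
    least `min_count` times in the labeled corpus and always with that label.
    """
    label_counts = defaultdict(Counter)  # word -> Counter({label: count})
    for sentence in tagged_sentences:
        for word, label in sentence:
            label_counts[word.lower()][label] += 1

    anchors = defaultdict(list)
    for word, counts in label_counts.items():
        if len(counts) == 1 and sum(counts.values()) >= min_count:
            (label,) = counts            # the only label this word received
            anchors[label].append(word)
    return {label: sorted(words) for label, words in anchors.items()}


# Hypothetical labeled tweets as (word, label) pairs.
LABELED = [
    [("he", "pron"), ("is", "verb"), ("at", "adp"), ("the", "det"), ("airport", "noun")],
    [("she", "pron"), ("is", "verb"), ("on", "adp"), ("the", "det"), ("playground", "noun")],
    [("it", "pron"), ("is", "verb"), ("like", "adp"), ("a", "det"), ("playground", "noun")],
    [("i", "pron"), ("like", "verb"), ("it", "pron")],
]

if __name__ == "__main__":
    print(find_anchors(LABELED))
```

An ambiguous word such as like (adp in one tweet, verb in another) never becomes an anchor, which mirrors the requirement p(label | anchor) = 1.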

Page 21: Method of Moments

Method of moments: co-occurrences in unlabeled data

Andrew fights like Jet Li.
eat Fruit like cherry.
Ann sings like me.
Children like ice-cream.

word: wt (here like); context: wt-1, wt+1, wt+2

Page 22: Method of Moments

Method of moments

From the sentences on page 21, collect co-occurrence statistics into Q = p(context | word): a matrix with one row per word and one column per context (entries shown on the slide include like, Children, cherry, ice-cream, fights, a, Jet, me., love, there, will).

Page 23: Method of Moments

Method of moments

Let there be love.
Bill will be a ninja.

These sentences add a row for be to Q = p(context | word).
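Computing Q takes a single pass over unlabeled text. A minimal Python sketch using the example sentences above; treating each neighbor at a fixed offset as one context feature (w-1 and w+1 by default, with boundary padding) is our assumption about how contexts are defined.

```python
from collections import Counter, defaultdict


def context_moments(sentences, window=(-1, 1)):
    """Estimate Q[word][context] = p(context | word) from unlabeled sentences.

    Assumption: every neighbor at an offset in `window` is one context
    feature such as "-1:fights"; "<s>" and "</s>" pad sentence boundaries.
    """
    pad = max(abs(offset) for offset in window)
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        padded = ["<s>"] * pad + tokens + ["</s>"] * pad
        for i, word in enumerate(tokens, start=pad):   # i indexes `padded`
            for offset in window:
                counts[word][f"{offset}:{padded[i + offset]}"] += 1

    Q = {}
    for word, ctx_counts in counts.items():
        total = sum(ctx_counts.values())
        Q[word] = {ctx: n / total for ctx, n in ctx_counts.items()}
    return Q


UNLABELED = [
    "Andrew fights like Jet Li .",
    "eat Fruit like cherry .",
    "Ann sings like me .",
    "Children like ice-cream .",
    "Let there be love .",
    "Bill will be a ninja .",
]

if __name__ == "__main__":
    Q = context_moments(UNLABELED)
    print(Q["like"])
    print(Q["be"])
```

Passing `window=(-1, 1, 2)` reproduces the wt-1, wt+1, wt+2 context of page 21.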

Page 24: Method of Moments

Method of moments

1. Conditional Independence: context ⟂ word | label, so

$$p(\text{context} \mid \text{word}) = \sum_{\text{labels}} p(\text{label} \mid \text{word})\, p(\text{context} \mid \text{label})$$

In matrix form, Q = Γ R, where Q (word × context) holds p(context | word), Γ (word × label) holds p(label | word), and R (label × context) holds p(context | label).
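A quick numeric check of this identity, sketched in Python with a hypothetical toy model (2 labels, 3 words, 2 contexts) in which word and context are independent given the label: build the joint p(word, context), derive Q and Γ from it, and compare Q with Γ R. Word 0 only occurs with label 0, so it behaves like an anchor and its row of Q equals label 0's row of R.

```python
# Hypothetical toy model: word and context are independent given the label.
P_LABEL = [0.6, 0.4]                       # p(label)
P_WORD = [[0.5, 0.5, 0.0],                 # p(word | label 0)
          [0.0, 0.3, 0.7]]                 # p(word | label 1)
R = [[0.9, 0.1],                           # p(context | label 0)
     [0.2, 0.8]]                           # p(context | label 1)
K, V, C = 2, 3, 2                          # labels, words, contexts

# Joint p(word, context) = sum_label p(label) p(word | label) p(context | label).
joint = [[sum(P_LABEL[y] * P_WORD[y][w] * R[y][c] for y in range(K))
          for c in range(C)] for w in range(V)]
p_word = [sum(row) for row in joint]

Q = [[joint[w][c] / p_word[w] for c in range(C)] for w in range(V)]    # p(context | word)
Gamma = [[P_LABEL[y] * P_WORD[y][w] / p_word[w] for y in range(K)]    # p(label | word)
         for w in range(V)]
GammaR = [[sum(Gamma[w][y] * R[y][c] for y in range(K)) for c in range(C)]
          for w in range(V)]

if __name__ == "__main__":
    max_err = max(abs(Q[w][c] - GammaR[w][c]) for w in range(V) for c in range(C))
    print("max |Q - Gamma R| =", max_err)
    print("anchor row of Q:", Q[0], "label-0 row of R:", R[0])
```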

Page 25: Method of Moments

Method of moments

1. Conditional Independence (context ⟂ word | label):

$$p(\text{context} \mid \text{word}) = \sum_{\text{labels}} p(\text{label} \mid \text{word})\, p(\text{context} \mid \text{label})$$

2. Anchor Trick: replace p(context | label) with p(context | anchors), the context distribution of the label's anchor words:

$$p(\text{context} \mid \text{word}) = \sum_{\text{labels}} p(\text{label} \mid \text{word})\, p(\text{context} \mid \text{anchors})$$

Q = Γ R, with R now estimated from the anchors' contexts.
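With anchors, each row of R can be estimated from unlabeled data: pool the context distributions of the label's anchor words. A minimal Python sketch; weighting each anchor by its frequency p(word) is our assumption, and the moments, frequencies and anchor lists are hypothetical.

```python
def anchor_context_matrix(Q, p_word, anchors):
    """R[label][context] = p(context | anchors of label).

    Averages the anchors' context distributions, weighting each anchor by
    p(word); this matches pooling the context counts of all anchor
    occurrences when every occurrence contributes the same number of contexts.
    """
    R = {}
    for label, words in anchors.items():
        mass = sum(p_word[a] for a in words)
        row = {}
        for a in words:
            for ctx, p in Q[a].items():
                row[ctx] = row.get(ctx, 0.0) + p_word[a] * p / mass
        R[label] = row
    return R


# Hypothetical moments Q = p(context | word), frequencies and anchors.
Q = {"be": {"-1:will": 0.5, "1:a": 0.5},
     "is": {"-1:will": 0.25, "1:a": 0.75},
     "the": {"-1:on": 1.0}}
P_WORD = {"be": 0.2, "is": 0.2, "the": 0.6}
ANCHORS = {"verb": ["be", "is"], "det": ["the"]}

if __name__ == "__main__":
    print(anchor_context_matrix(Q, P_WORD, ANCHORS))
```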

Page 26: Outline

Outline: Learning sequence models via MoM
Proposed work
1. Learn HMM models via MoM
2. Solve a QP
3. Extend to feature-based model
4. Experiments
5. Experiments

Page 27: Method of Moments

Method of Moments: for each word, its row q of Q = p(context | word) is a mixture of the rows of R = p(context | label), estimated from anchors, weighted by γ, the word's row of Γ = p(label | word): q ≈ Rᵀγ.

$$\gamma = \arg\min_{\gamma} \lVert q - R^\top \gamma \rVert_2^2 \quad \text{s.t.} \quad 0 \le \gamma \le 1, \quad \sum_{\text{labels}} \gamma = 1$$

• solve one QP per word type (~ms)
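Each per-word QP is small (one variable per label). The Python sketch below solves it by projected gradient descent with a sort-based Euclidean projection onto the probability simplex; that solver is our choice for illustration, not necessarily the one used in the talk. The simplex constraint (γ ≥ 0, Σγ = 1) already implies 0 ≤ γ ≤ 1. R and q are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:          # holds for a prefix of k; keep the last one
            theta = t
    return [max(x - theta, 0.0) for x in v]


def solve_gamma(q, R, steps=2000):
    """gamma = argmin ||q - R^T gamma||^2 over the probability simplex.

    q: p(context | word) for one word (length C).
    R: K rows, row k = p(context | label k) (length C each).
    """
    K, C = len(R), len(q)
    G = [[sum(R[i][c] * R[j][c] for c in range(C)) for j in range(K)]
         for i in range(K)]                                        # R R^T
    b = [sum(R[i][c] * q[c] for c in range(C)) for i in range(K)]  # R q
    step = 1.0 / (2.0 * sum(G[i][i] for i in range(K)))  # 1 / trace bound on Lipschitz const.
    gamma = [1.0 / K] * K
    for _ in range(steps):
        grad = [2.0 * (sum(G[i][j] * gamma[j] for j in range(K)) - b[i])
                for i in range(K)]
        gamma = project_to_simplex([g - step * d for g, d in zip(gamma, grad)])
    return gamma


# Hypothetical R (2 labels x 2 contexts) and a word whose true gamma is [0.3, 0.7].
R = [[0.9, 0.1], [0.2, 0.8]]
q = [0.41, 0.59]

if __name__ == "__main__":
    print(solve_gamma(q, R))
```

For an anchor-like word (q equal to one row of R), the solution sits at a vertex of the simplex, e.g. [1, 0].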

Page 28: Method of Moments

Method of Moments (q, R and γ as on page 27), with a supervision term that keeps γ close to γsup:

$$\gamma = \arg\min_{\gamma} \lVert q - R^\top \gamma \rVert_2^2 + \lambda \lVert \gamma_{\text{sup}} - \gamma \rVert_2^2 \quad \text{s.t.} \quad 0 \le \gamma \le 1, \quad \sum_{\text{labels}} \gamma = 1$$

Page 29: Method of Moments

Same objective as page 28:
• λ ‖γsup − γ‖²: γsup is estimated from labeled data
• ‖q − Rᵀγ‖²: the moments are estimated from unlabeled data
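The supervision term only adds 2λ(γ − γsup) to the gradient, so the projected-gradient approach extends directly. Below is a self-contained Python variant under the same assumptions (our solver choice; R, q, γsup and λ are hypothetical). With λ = 0 it reduces to the unsupervised QP, and as λ grows the solution moves toward γsup.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def solve_gamma_semisup(q, R, gamma_sup, lam, steps=2000):
    """argmin ||q - R^T gamma||^2 + lam * ||gamma_sup - gamma||^2 on the simplex."""
    K, C = len(R), len(q)
    G = [[sum(R[i][c] * R[j][c] for c in range(C)) for j in range(K)]
         for i in range(K)]                                        # R R^T
    b = [sum(R[i][c] * q[c] for c in range(C)) for i in range(K)]  # R q
    step = 1.0 / (2.0 * (sum(G[i][i] for i in range(K)) + lam))   # 1 / Lipschitz bound
    gamma = project_to_simplex(list(gamma_sup))                    # warm start at gamma_sup
    for _ in range(steps):
        grad = [2.0 * (sum(G[i][j] * gamma[j] for j in range(K)) - b[i])
                + 2.0 * lam * (gamma[i] - gamma_sup[i]) for i in range(K)]
        gamma = project_to_simplex([g - step * d for g, d in zip(gamma, grad)])
    return gamma


R = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical p(context | label) rows
q = [0.41, 0.59]               # unlabeled moments alone are best fit by [0.3, 0.7]
gamma_sup = [0.5, 0.5]         # hypothetical estimate from labeled data

if __name__ == "__main__":
    for lam in (0.0, 1.0, 100.0):
        print(lam, solve_gamma_semisup(q, R, gamma_sup, lam))
```

λ trades off the two data sources: a small value trusts the unlabeled moments, a large one trusts the labeled estimate.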

Page 30: Method of Moments

HMM Learning: Learn parameters?

With the coefficients γ = p(label | word) from the QP:

$$p(\text{label}) = \sum_{\text{words}} \gamma \, p(\text{word})$$

Observation matrix, by Bayes' rule:

$$p(\text{word} \mid \text{label}) = \frac{\gamma \, p(\text{word})}{p(\text{label})}$$
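Both formulas are one-liners once γ and the unigram frequencies p(word) are available. A minimal Python sketch with a hypothetical QP output:

```python
def observation_matrix(gamma, p_word):
    """From gamma[word][label] = p(label | word) and p(word), return
    p(label) = sum_w gamma * p(w) and O[label][word] = gamma * p(w) / p(label)."""
    p_label = {}
    for word, row in gamma.items():
        for label, g in row.items():
            p_label[label] = p_label.get(label, 0.0) + g * p_word[word]
    O = {label: {word: gamma[word].get(label, 0.0) * p_word[word] / mass
                 for word in gamma}
         for label, mass in p_label.items() if mass > 0}
    return p_label, O


# Hypothetical QP output (gamma) and unigram frequencies.
GAMMA = {"be": {"verb": 1.0},
         "like": {"verb": 0.25, "adp": 0.75},
         "ninja": {"noun": 1.0}}
P_WORD = {"be": 0.2, "like": 0.4, "ninja": 0.4}

if __name__ == "__main__":
    p_label, O = observation_matrix(GAMMA, P_WORD)
    print(p_label)
    print(O["verb"])
```

Each row O[label] sums to 1 because p(label) is exactly the normalizer of γ p(word) over words.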

Page 31: Method of Moments

HMM Learning: Learn parameters?
• Observation matrix p(word | label) = γ p(word) / p(label), by Bayes' rule
• Transition matrix p(yt | yt-1): estimate from labeled data only
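Estimating the transition matrix from labeled data amounts to relative-frequency counting of adjacent labels. A minimal Python sketch, assuming start/stop states as in the sentence diagrams and no smoothing (a real tagger would likely smooth unseen transitions). The first sequence is the gold tagging of the tweet "hehe its gonna b a good day"; the second is hypothetical.

```python
from collections import Counter, defaultdict


def transition_matrix(tag_sequences):
    """p(y_t | y_{t-1}) by relative frequency, with "start"/"stop" states.

    No smoothing: transitions never seen in the labeled data get no entry.
    """
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        path = ["start"] + list(tags) + ["stop"]
        for prev, cur in zip(path, path[1:]):
            counts[prev][cur] += 1
    T = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        T[prev] = {cur: n / total for cur, n in nexts.items()}
    return T


TAGGED = [["x", "prt", "verb", "verb", "det", "adj", "noun"],
          ["pron", "verb", "noun"]]

if __name__ == "__main__":
    T = transition_matrix(TAGGED)
    print(T["verb"])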

Page 32: Outline

Outline: Learning sequence models via MoM
1. Learn HMM models via MoM
2. Relax the notion of anchors
3. Solve a QP
4. Experiments
5. Experiments

Page 33: Experiments

Semi-supervised Twitter POS tagging

Twitter dataset
• 12 universal POS tags
• 200k words
• 2.7M unlabeled tweets
• 100-1000 labeled tweets

Example: hehe/x its/prt gonna/verb b/verb a/det good/adj day/noun

Slav Petrov et al., A Universal Part-of-Speech Tagset, 2011
Owoputi et al., Improved part-of-speech tagging for online conversational text with word clusters, 2013

Page 34: Experiments

Twitter POS tagging, 150 labeled training sequences: HMM tagging accuracy (%)
• HMM: 71.7
• EM: 77.2
• self-training: 78.2
• AHMM: 84.3

Page 35: Experiments

Twitter POS tagging, 1000 labeled training sequences: HMM tagging accuracy (%)
• HMM: 81.1
• EM: 83.1
• self-training: 86.1
• AHMM: 88.0

Page 36: Outline

Outline: Learning sequence models via MoM
Proposed work
1. Learn HMM models via MoM
2. Relax the notion of anchors
3. Extend to feature HMM
4. Experiments
5. Experiments

Page 37: Extend to features

start → :) wait now I am goin 2 → stop (words w1 … w7 with labels y1 … y7)

Word features Φ(word):
• is upper
• is title
• is digit
• is url
• starts #
• is emoticon

T. Berg-Kirkpatrick et al., Painless unsupervised learning with features, NAACL HLT 2010.

Log-linear model
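The word features listed above can be written as simple predicates. A minimal Python sketch; the URL and emoticon regular expressions are simplified assumptions, and the extra word-identity feature is our own addition so that distinct words stay distinguishable, not something listed on the slide.

```python
import re

# Simplified patterns (assumptions, not the paper's exact definitions).
URL = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
EMOTICON = re.compile(r"^[:;=8][-o*']?[)\](\[dDpP/\\|@3]$")

FEATURES = {
    "is_upper": str.isupper,
    "is_title": str.istitle,
    "is_digit": str.isdigit,
    "is_url": lambda w: URL.match(w) is not None,
    "starts_#": lambda w: w.startswith("#"),
    "is_emoticon": lambda w: EMOTICON.match(w) is not None,
}


def phi(word):
    """Active binary features of Phi(word), plus a word-identity feature."""
    active = {name for name, test in FEATURES.items() if test(word)}
    active.add("word=" + word.lower())
    return active


if __name__ == "__main__":
    for w in [":)", "wait", "now", "I", "am", "goin", "2", "#emnlp", "http://t.co/x"]:
        print(w, sorted(phi(w)))
```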

Page 38: Extend to features

1. Conditional Independence

start → :) wait now I am goin 2 → stop (words w1 … w7 with labels y1 … y7)

word ⟂ context | label, now with word features Φ(word) and context features ψ(context)

Page 39: Extend to features

Log-linear model: moments with features. For each word feature Φj, conditional independence gives

$$\frac{\mathbb{E}[\psi(\text{context})\, \Phi_j(\text{word})]}{\mathbb{E}[\Phi_j(\text{word})]} = \sum_{\text{labels}} \frac{\mathbb{E}[\Phi_j(\text{word}) \mid \text{label}]\, p(\text{label})}{\mathbb{E}[\Phi_j(\text{word})]}\; \mathbb{E}[\psi(\text{context}) \mid \text{label}]$$

This is again Q = Γ R: Q replaces p(context | word) with E[ψ(context) Φ(word)] / E[Φ(word)], Γ replaces p(label | word) with E[Φ(word) | label] p(label) / E[Φ(word)], and R replaces p(context | label) with E[ψ(context) | label], estimated from anchors.

Page 40: Method of Moments

Log-linear model: solve one QP per feature dimension Φj

$$\gamma = \arg\min_{\gamma} \lVert q - R^\top \gamma \rVert_2^2 + \lambda \lVert \gamma_{\text{sup}} - \gamma \rVert_2^2 \quad \text{s.t.} \quad \sum_{\text{labels}} \gamma = 1$$

where q is row j of the feature-based Q, γ is row j of Γ, and R holds E[ψ(context) | label], estimated from anchors.

Page 41: Method of Moments

Log-linear model: Learn parameters?

Since the QP solution is γ = E[Φ(word) | label] p(label) / E[Φ(word)], the mean parameters are

$$\mu_y = \mathbb{E}[\Phi(X) \mid Y = y] = \frac{\gamma \, \mathbb{E}[\Phi(\text{word})]}{p(\text{label})}$$
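Because γ for feature j and label y equals E[Φj(word) | label] p(label) / E[Φj(word)], the mean parameters follow by rescaling. The Python sketch below assumes p(label) and E[Φ(word)] are already available (E[Φ(word)] is a plain average over unlabeled words; how p(label) is obtained is not shown here) and checks the identity on a hypothetical toy model with hypothetical features `is_title` and `is_short`.

```python
def mean_parameters(gamma, e_phi, p_label):
    """mu[y][j] = E[Phi_j(word) | label y] = gamma[j][y] * E[Phi_j(word)] / p(y)."""
    return {y: {j: gamma[j][y] * e_phi[j] / p_label[y] for j in gamma}
            for y in p_label}


# Hypothetical toy model to check the identity end to end.
P_LABEL = {"noun": 0.5, "verb": 0.5}
P_WORD = {"noun": {"Austin": 0.5, "airport": 0.5},
          "verb": {"is": 0.5, "Go": 0.5}}
PHI = {"Austin": {"is_title"}, "airport": set(),
       "is": {"is_short"}, "Go": {"is_title", "is_short"}}
FEATS = ["is_title", "is_short"]

# True E[Phi_j | y], E[Phi_j], and gamma[j][y] = E[Phi_j | y] p(y) / E[Phi_j].
true_mu = {y: {j: sum(p for w, p in P_WORD[y].items() if j in PHI[w]) for j in FEATS}
           for y in P_LABEL}
e_phi = {j: sum(P_LABEL[y] * true_mu[y][j] for y in P_LABEL) for j in FEATS}
gamma = {j: {y: true_mu[y][j] * P_LABEL[y] / e_phi[j] for y in P_LABEL} for j in FEATS}

if __name__ == "__main__":
    print(gamma)
    print(mean_parameters(gamma, e_phi, P_LABEL))
```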

Page 42: Extend to features

Log-linear model: Learn parameters?

Fenchel-Legendre duality maps the mean parameters µy = E[Φ(X) | Y = y] to the canonical parameters θy:

$$\theta_y = \arg\max_{\theta_y} \; \theta_y^\top \mu_y - \log Z_y, \qquad Z_y = \sum_{w} \exp\!\left(\theta_y^\top \Phi(w)\right)$$

where Zy is the partition function.
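The dual problem is concave in θy and its gradient is µy − Eθ[Φ(word)], so plain gradient ascent recovers canonical parameters whose expected features match µy. A minimal Python sketch with a hypothetical four-word vocabulary; fixed-step gradient ascent is our choice of solver, not necessarily the authors'. The fitted θy defines the emission model p(word | label) ∝ exp(θyᵀΦ(word)).

```python
import math


def expected_features(theta, features):
    """E_theta[Phi] under p(w) = exp(theta . Phi(w)) / Z."""
    scores = {w: sum(t * f for t, f in zip(theta, phi)) for w, phi in features.items()}
    top = max(scores.values())
    weights = {w: math.exp(s - top) for w, s in scores.items()}  # shift for stability
    Z = sum(weights.values())
    return [sum(weights[w] * features[w][j] for w in features) / Z
            for j in range(len(theta))]


def fit_maxent(mu, features, steps=2000, lr=0.5):
    """theta = argmax theta . mu - log Z(theta), Z = sum_w exp(theta . Phi(w)).

    Gradient ascent: the gradient is mu - E_theta[Phi], so at the optimum the
    model's expected features match the mean parameters mu.
    """
    theta = [0.0] * len(mu)
    for _ in range(steps):
        expected = expected_features(theta, features)
        theta = [t + lr * (m - e) for t, m, e in zip(theta, mu, expected)]
    return theta


# Hypothetical vocabulary with two binary features per word.
FEATURES = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [0.0, 0.0]}
TRUE_THETA = [1.0, -0.5]
MU = expected_features(TRUE_THETA, FEATURES)   # mean parameters for one label

if __name__ == "__main__":
    print(fit_maxent(MU, FEATURES))   # approaches TRUE_THETA
```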

Page 43: Outline

Algorithm
compute moments (Q) → find anchors (R) → solve QP (Γ) → mean parameters (µy = E[Φ(X) | Y = y]) → solve maxent problem (canonical parameters θy)

Approximate timings listed on the slide: ~tens of minutes, ~tens of seconds, ~2-3 h, ~seconds, ~tens of minutes.

Page 44: Outline

Algorithm with supervision
compute moments (Q) → find anchors (R) → solve QP (Γ) → mean parameters (µy = E[Φ(X) | Y = y]) → solve maxent problem (canonical parameters θy)

Supervision enters when finding anchors and when solving the QP.

Page 45: Outline

Outline: Learning sequence models via MoM
1. Learn HMM models via MoM
2. Relax the notion of anchors
3. Solve a QP
4. Experiments
5. Experiments

Page 46: Experiments

Twitter POS tagging, 150 labeled training sequences: feature HMM tagging accuracy (%)
• HMM: 81.8
• EM: 81.8
• self-training: 83.4
• AHMM: 85.3

Page 47: Experiments

Twitter POS tagging, 1000 labeled training sequences: feature HMM tagging accuracy (%)
• HMM: 89.1
• EM: 89.1
• self-training: 89.4
• AHMM: 89.1

Page 48: Experiments

Twitter POS tagging
[Figure: tagging accuracy (0.7 to 0.95) vs. number of labeled training sequences (0 to 1000); series: feature HMM, HMM, anchor FHMM]

Page 49: Experiments

Twitter POS tagging, 1000 training sequences: training time (h)
• Brown Clusters: 42.0
• EM: 14.9
• self-training: 10.3
• AHMM: 3.8

Page 50

Conclusions
• MoM algorithm for semi-supervised learning
• flexible method (easy to add supervision)
• fast to train (only one pass over the data)
• particularly good with little supervision

Thank you

Support for this research was provided by the Portuguese Science and Technology Foundation (FCT) and CMU Portugal Program, grant SFRH/BD/52015/2012. This work has also been partially supported by the European Union under H2020 project SUMMA, grant 688139, and by FCT, through contracts UID/EEA/50008/2013, through the LearnBig project (PTDC/EEISII/7092/2014), and the GoLocal project (grant CMUPERI/TIC/0046/2014).

