Page 1: Improved Inference for Unlexicalized Parsing Slav Petrov and Dan Klein.

Improved Inference for Unlexicalized Parsing

Slav Petrov and Dan Klein

Page 2:

Unlexicalized Parsing

Hierarchical, adaptive refinement [Petrov et al. '06]:

1,140 nonterminal symbols
531,200 rewrites
91.2 F1 score on dev set (1,600 sentences)
1,621 min parsing time

[Figure: the DT tag is split hierarchically: DT → DT1, DT2 → DT1–DT4 → DT1–DT8]

Page 3:

Parsing time: 1,621 min

Page 4:

Coarse-to-Fine Parsing [Goodman '97, Charniak & Johnson '05]

Coarse grammar: NP … VP
Refined grammar (lexicalized): NP-dog, NP-cat, NP-apple, VP-run, VP-eat, …
Refined grammar (split symbols): NP-17, NP-12, NP-1, VP-6, VP-31, …

Pipeline: Treebank → Parse with coarse grammar → Prune → Parse with refined grammar

Page 5:

Prune?

For each chart item X[i,j], compute its posterior probability under the coarse grammar:

  P(X spans [i,j] | sentence) = inside(X, i, j) × outside(X, i, j) / P(sentence)

E.g. consider the span 5 to 12 with coarse labels … QP NP VP …: any item whose posterior falls below a threshold is pruned, together with its refinements.
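The pruning test can be sketched in Python. The chart representation below is hypothetical (the actual parser's data structures differ); it only illustrates the posterior-threshold criterion:

```python
def prune_chart(inside, outside, sentence_prob, threshold=1e-4):
    """Keep only chart items whose posterior probability clears `threshold`.

    inside, outside: dicts mapping (label, i, j) -> probability from the
    coarse pass; sentence_prob: total probability of the sentence.
    """
    kept = set()
    for item, inside_p in inside.items():
        posterior = inside_p * outside.get(item, 0.0) / sentence_prob
        if posterior >= threshold:
            kept.add(item)
    return kept

# Toy chart for the span (5, 12): NP clears the threshold, QP does not.
inside = {("NP", 5, 12): 0.02, ("QP", 5, 12): 1e-9}
outside = {("NP", 5, 12): 0.5, ("QP", 5, 12): 0.4}
kept = prune_chart(inside, outside, sentence_prob=0.01)
```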

Page 6:

Parsing time: 1,621 min → 111 min with coarse-to-fine pruning (no search error)

Page 7:

Multilevel Coarse-to-Fine Parsing [Charniak et al. '06]

Coarse grammar: NP … VP
Refined grammar: NP-dog, NP-cat, NP-apple, VP-run, VP-eat, …

Add more rounds of pre-parsing with increasingly coarse grammars — but what grammars are coarser than X-bar? (e.g. collapse all symbols to a single X, or to a few classes A, B, …)

Page 8:

Hierarchical Pruning

Consider again the span 5 to 12:

coarse:          … QP  NP  VP …
split in two:    … QP1 QP2  NP1 NP2  VP1 VP2 …
split in four:   … QP1 QP2 QP3 QP4  NP1 NP2 NP3 NP4  VP1 VP2 VP3 VP4 …
split in eight:  … QP1 … QP8  NP1 … NP8  VP1 … VP8 …
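A minimal sketch of how items that survive one pass license refined candidates for the next: a symbol pruned at a coarse level removes all its substates over that span. The `split_map` representation is hypothetical, for illustration only:

```python
def expand_survivors(kept, split_map):
    """Map surviving coarse chart items to refined candidates for the next pass.

    kept: set of (label, i, j) items that survived pruning;
    split_map: e.g. {"NP": ["NP1", "NP2"]} — each surviving symbol licenses
    only its own substates over the same span.
    """
    candidates = set()
    for (label, i, j) in kept:
        for sub in split_map.get(label, [label]):
            candidates.add((sub, i, j))
    return candidates

kept = {("NP", 5, 12), ("VP", 0, 5)}
split_map = {"NP": ["NP1", "NP2"], "VP": ["VP1", "VP2"], "QP": ["QP1", "QP2"]}
candidates = expand_survivors(kept, split_map)
# QP over (5, 12) was pruned at the coarse level, so no QP1/QP2 candidates appear.
```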

Page 9:

Intermediate Grammars

Learning: X-Bar = G0 → G1 → G2 → G3 → G4 → G5 → G6 = G

[Figure: each round refines the grammar, e.g. DT → DT1, DT2 → DT1–DT4 → DT1–DT8]

Page 10:

Parsing time: 1,621 min → 111 min → 35 min with hierarchical pruning (no search error)

Page 11:

State Drift (DT tag)

[Figure: the words dominated by each DT substate (That, this, some, the, these, that) drift between EM training rounds, so the substates of one intermediate grammar need not correspond to those of the next]

Page 12: Improved Inference for Unlexicalized Parsing Slav Petrov and Dan Klein.

G1

G2

G3

G4

G5

G6

Lea

rning

G1

G2

G3

G4

G5

G6

Lea

rning

Projected Grammars

X-Bar=G0

G=

Pro

jectio

n i

0(G)

1(G)

2(G)

3(G)

4(G)

5(G)G

Page 13:

Estimating Projected Grammars

Nonterminals? Easy: each split nonterminal in G (S0, S1, NP0, NP1, VP0, VP1, …) projects to its unsplit symbol in π(G) (S, NP, VP).

Page 14:

Estimating Projected Grammars

Rules? Rules in G:

  S1 → NP1 VP1  0.20
  S1 → NP1 VP2  0.12
  S1 → NP2 VP1  0.02
  S1 → NP2 VP2  0.03
  S2 → NP1 VP1  0.11
  S2 → NP1 VP2  0.05
  S2 → NP2 VP1  0.08
  S2 → NP2 VP2  0.12

Rules in π(G):

  S → NP VP  ?

Page 15:

Estimating Projected Grammars [Corazza & Satta '06]

Estimate the projected rule probabilities from the infinite tree distribution induced by G, rather than from the treebank: the eight refined rules S1/S2 → NP1/NP2 VP1/VP2 above collapse to a single rule in π(G), e.g.

  S → NP VP  0.56
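The expectation-weighted estimate can be sketched as follows. The expectation values e(S1) = 0.6 and e(S2) = 0.4 are made up for illustration — the slide's 0.56 arises from the full grammar's expectations, which are not shown — so the result here is 0.366 rather than 0.56:

```python
def project_rule_probs(refined_rules, expectations, project):
    """Probabilities of projected rules as expectation-weighted averages.

    refined_rules: dict (parent, rhs) -> probability in the refined grammar G;
    expectations: dict parent -> expected count under G's tree distribution;
    project: maps a refined symbol (e.g. "S1") to its coarse symbol ("S").
    """
    numer, denom = {}, {}
    for (parent, rhs), p in refined_rules.items():
        coarse = (project(parent), tuple(project(x) for x in rhs))
        numer[coarse] = numer.get(coarse, 0.0) + expectations[parent] * p
    for parent, e in expectations.items():
        cp = project(parent)
        denom[cp] = denom.get(cp, 0.0) + e
    return {rule: w / denom[rule[0]] for rule, w in numer.items()}

# The eight refined S rules from the slide, with hypothetical expectations.
refined_rules = {
    ("S1", ("NP1", "VP1")): 0.20, ("S1", ("NP1", "VP2")): 0.12,
    ("S1", ("NP2", "VP1")): 0.02, ("S1", ("NP2", "VP2")): 0.03,
    ("S2", ("NP1", "VP1")): 0.11, ("S2", ("NP1", "VP2")): 0.05,
    ("S2", ("NP2", "VP1")): 0.08, ("S2", ("NP2", "VP2")): 0.12,
}
project = lambda sym: sym.rstrip("12")
probs = project_rule_probs(refined_rules, {"S1": 0.6, "S2": 0.4}, project)
```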

Page 16:

Calculating Expectations

Nonterminals: ck(X) = expected count of nonterminal X up to depth k; the fixed-point iteration converges within 25 iterations (a few seconds).

Rules: rule expectations then follow by multiplying the parent's expected count by the rule's probability.
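The fixed-point computation can be sketched on a toy grammar. This assumes the natural recurrence — each round propagates expected counts one level deeper — which is a sketch of the idea, not the paper's exact formulation:

```python
def expected_counts(rules, root, iterations=25):
    """Fixed-point computation of expected nonterminal counts under a PCFG.

    rules: dict parent -> list of (prob, rhs_tuple); symbols with no entry
    in `rules` are terminals. After k rounds, counts[X] ~ c_k(X), the
    expected occurrences of X in trees down to depth k.
    """
    counts = {root: 1.0}
    for _ in range(iterations):
        new = {root: 1.0}  # the root occurs exactly once in every tree
        for parent, expansions in rules.items():
            c = counts.get(parent, 0.0)
            if c == 0.0:
                continue
            for prob, rhs in expansions:
                for sym in rhs:
                    new[sym] = new.get(sym, 0.0) + c * prob
        counts = new
    return counts

# Toy grammar: S -> NP VP (1.0); NP -> NP PP (0.3) | n (0.7); VP -> v NP (1.0)
rules = {
    "S": [(1.0, ("NP", "VP"))],
    "NP": [(0.3, ("NP", "PP")), (0.7, ("n",))],
    "VP": [(1.0, ("v", "NP"))],
}
counts = expected_counts(rules, "S")
# counts["NP"] converges to 2/0.7: one NP from S, one from each VP,
# plus 0.3 recursive NPs per NP.
```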

Page 17:

Parsing time: 1,621 min → 111 min → 35 min → 15 min with projected grammars (no search error)

Page 18:

Parsing times

Share of total parsing time spent at each level (X-Bar = G0 through G6 = G):

  G0: 60 %   G1: 12 %   G2: 7 %   G3: 6 %   G4: 6 %   G5: 5 %   G6: 4 %

Page 19:

Bracket Posteriors (after G0)

Page 20:

Bracket Posteriors (after G1)

Page 21:

Bracket Posteriors (Final Chart)

Page 22:

Bracket Posteriors (Best Tree)

Page 23:

Parse Selection

Computing the most likely unsplit tree is NP-hard. Options:
- Settle for the best derivation.
- Rerank an n-best list.
- Use an alternative objective function.

[Figure: several derivations over split symbols can project to the same unsplit parse, so a parse's probability is the sum of its derivation probabilities]
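The parse/derivation distinction can be illustrated with made-up derivation probabilities: the single best derivation need not belong to the most probable parse once derivations are summed.

```python
from collections import defaultdict

# Hypothetical derivations, each tagged with the unsplit parse it projects to.
derivations = [
    ("parse_A", 0.30),  # the single best derivation
    ("parse_B", 0.25),
    ("parse_B", 0.20),  # parse_B's derivations sum to 0.45 > 0.30
]

# A parse's probability is the sum over its derivations.
parse_probs = defaultdict(float)
for parse, prob in derivations:
    parse_probs[parse] += prob

best_derivation = max(derivations, key=lambda d: d[1])[0]
best_parse = max(parse_probs, key=parse_probs.get)
```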

Page 24:

Parse Risk Minimization

Pick the tree minimizing the expected loss according to our beliefs:

  TP* = argmin_TP  Σ_TT  P(TT | sentence) · L(TP, TT)

TT: true tree; TP: predicted tree; L: loss function (0/1, precision, recall, F1)

[Titov & Henderson '06]: use an n-best candidate list and approximate the expectation with samples.
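Risk minimization over an n-best list can be sketched as follows; the candidate trees and their renormalized posteriors are hypothetical:

```python
def min_risk_parse(candidates, loss):
    """Pick the candidate tree minimizing expected loss under the model.

    candidates: list of (tree, posterior) approximating P(T_T | sentence),
    e.g. an n-best list with renormalized scores;
    loss: loss(predicted, true), e.g. 0/1 loss or 1 - F1.
    """
    def risk(predicted):
        return sum(p * loss(predicted, true) for true, p in candidates)
    return min((tree for tree, _ in candidates), key=risk)

# With 0/1 loss this reduces to picking the highest-posterior candidate.
candidates = [("t1", 0.5), ("t2", 0.3), ("t3", 0.2)]
zero_one = lambda a, b: 0.0 if a == b else 1.0
best = min_risk_parse(candidates, zero_one)
```

With a structured loss such as 1 - F1, the minimizer can differ from the single most probable candidate, which is the point of optimizing the alternative objectives in the reranking table.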

Page 25:

Reranking Results

  Objective              Precision  Recall  F1    Exact

  BEST DERIVATION
  Viterbi Derivation     89.6       89.4    89.5  37.4
  Exact (non-sampled)    90.8       90.8    90.8  41.7
  Exact/F1 (oracle)      95.3       94.4    95.0  63.9

  RERANKING
  Precision (sampled)    91.1       88.1    89.6  21.4
  Recall (sampled)       88.2       91.3    89.7  21.5
  F1 (sampled)           90.2       89.3    89.8  27.2
  Exact (sampled)        89.5       89.5    89.5  25.8

Page 26:

Dynamic Programming

- Approximate the posterior parse distribution [Matsuzaki et al. '05]
- Maximize the number of expected correct rules, à la [Goodman '98]

Page 27:

Dynamic Programming Results

  Objective              Precision  Recall  F1    Exact

  BEST DERIVATION
  Viterbi Derivation     89.6       89.4    89.5  37.4

  DYNAMIC PROGRAMMING
  Variational            90.7       90.9    90.8  41.4
  Max-Rule-Sum           90.5       91.3    90.9  40.4
  Max-Rule-Product       91.2       91.1    91.2  41.4

Page 28:

Final Results (Efficiency)

- Berkeley Parser: 15 min, 91.2 F-score (implemented in Java)
- Charniak & Johnson '05 parser: 19 min, 90.7 F-score (implemented in C)

Page 29:

Final Results (Accuracy)

                                             ≤ 40 words  all
                                             F1          F1
  ENG  Charniak & Johnson '05 (generative)   90.1        89.6
       This Work                             90.6        90.1
       Charniak & Johnson '05 (reranked)     92.0        91.4
  GER  Dubey '05                             76.3        -
       This Work                             80.8        80.1
  CHN  Chiang et al. '02                     80.0        76.6
       This Work                             86.3        83.4

Page 30:

Conclusions

- Hierarchical coarse-to-fine inference
- Projections
- Marginalization
- Multi-lingual unlexicalized parsing

Page 31:

Thank You!

Parser available at

http://nlp.cs.berkeley.edu

