Learning Accurate, Compact, and Interpretable Tree Annotation


Recent Advances in Parsing Technology

WS 2011/2012

Saarland University, Saarbrücken

Miloš Ercegovčević

Outline

Introduction: the EM algorithm

Latent Grammars: motivation; learning a latent PCFG

Split-Merge Adaptation

Efficient inference with Latent Grammars: pruning in multilevel coarse-to-fine parsing; parse selection

Introduction : EM Algorithm

An iterative algorithm for finding MLE or MAP estimates of parameters in statistical models.

X – observed data; Z – set of latent variables; Θ – a vector of unknown parameters.

Complete-data likelihood function: $L(\Theta; X, Z) = p(X, Z \mid \Theta)$

MLE of the marginal likelihood: $L(\Theta; X) = p(X \mid \Theta) = \sum_{Z} p(X, Z \mid \Theta)$

However, this sum is typically intractable: Z is unobserved and Θ is unknown.

Introduction : EM Algorithm

Find the MLE of the marginal likelihood by iteratively applying two steps:

Expectation step (E-step): compute the expected complete-data log-likelihood under the current posterior over Z given $\Theta^{(t)}$:

$Q(\Theta \mid \Theta^{(t)}) = E_{Z \mid X, \Theta^{(t)}}\big[\log L(\Theta; X, Z)\big]$

Maximization step (M-step): find the Θ that maximizes this quantity:

$\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)})$
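As a concrete illustration (a toy example added here, not from the slides): a minimal EM loop for a mixture of two biased coins, where the coin identity Z of each sequence of tosses is latent. The E-step computes the posterior responsibility of each coin under the current $\Theta^{(t)}$; the M-step re-estimates the head probabilities and the mixing weight from those responsibilities.

```python
import numpy as np

def em_two_coins(flips, n_iter=50):
    """Toy EM: each row of `flips` is a sequence of 0/1 tosses drawn from
    one of two coins; the coin identity Z is latent. Returns the estimated
    head probabilities and the mixing weight."""
    heads = flips.sum(axis=1)
    n = flips.shape[1]
    theta = np.array([0.6, 0.4])   # head probabilities, arbitrary init
    pi = 0.5                       # P(Z = coin 0)
    for _ in range(n_iter):
        # E-step: posterior P(Z = coin 0 | sequence i, current parameters)
        lik0 = pi * theta[0]**heads * (1 - theta[0])**(n - heads)
        lik1 = (1 - pi) * theta[1]**heads * (1 - theta[1])**(n - heads)
        resp0 = lik0 / (lik0 + lik1)
        # M-step: maximize the expected complete-data log-likelihood
        pi = resp0.mean()
        theta[0] = (resp0 * heads).sum() / (resp0 * n).sum()
        theta[1] = ((1 - resp0) * heads).sum() / ((1 - resp0) * n).sum()
    return theta, pi
```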

Latent PCFG

Standard coarse treebank tree; the unannotated treebank PCFG is the baseline for parsing: F1 = 72.6

Parent-annotated trees [Johnson ’98], [Klein & Manning ’03]: F1 = 86.3

Latent PCFG

Latent PCFG

Head-lexicalized trees [Collins ’99], [Charniak ’00]: F1 = 88.6

Latent PCFG

Automatically clustered categories [Matsuzaki et al. ’05]: F1 = 86.7

Same number of subcategories for every category

Latent PCFG

At each step, split every category in two; after 6 split iterations a category has up to 64 subcategories. Initialize EM for each grammar with the results of the previous, smaller grammar.

Learning Latent PCFG

Induce subcategories automatically

Training works like forward–backward for HMMs, but with fixed brackets: the treebank tree is observed and only the subcategory assignments are latent

[Figure: the tree for "He was right .", with its nodes relabelled by latent symbols X1–X7; the inside ("forward") and outside ("backward") passes run over this fixed tree.]

Learning Latent Grammar: Inside-Outside probabilities

$P_{IN}(r, t, A_x) = \sum_{y,z} \beta(A_x \rightarrow B_y\, C_z)\; P_{IN}(r, s, B_y)\; P_{IN}(s, t, C_z)$

$P_{OUT}(r, s, B_y) = \sum_{x,z} \beta(A_x \rightarrow B_y\, C_z)\; P_{OUT}(r, t, A_x)\; P_{IN}(s, t, C_z)$

$P_{OUT}(s, t, C_z) = \sum_{x,y} \beta(A_x \rightarrow B_y\, C_z)\; P_{OUT}(r, t, A_x)\; P_{IN}(r, s, B_y)$
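Because the treebank brackets are fixed, the inside ("forward") pass only has to walk the observed tree bottom-up. A minimal sketch of that pass, assuming latent rule probabilities stored as arrays `beta[(A, B, C)][x, y, z]` and lexical scores `emit[(tag, word)][x]`; both representations and all names are assumptions of this sketch, not from the slides.

```python
import numpy as np

def inside(node, beta, emit, n_sub):
    """Inside scores over a fixed treebank tree.
    node: (tag, word) for a leaf, or (label, left, right) for a binary node.
    Returns a vector P_IN[x] = P(yield of node | A_x) over the latent
    subcategories x of the node's label."""
    if len(node) == 2:                          # leaf: (tag, word)
        tag, word = node
        return np.array([emit[(tag, word)][x] for x in range(n_sub)])
    label, left, right = node
    p_left = inside(left, beta, emit, n_sub)    # P_IN(r, s, B_y)
    p_right = inside(right, beta, emit, n_sub)  # P_IN(s, t, C_z)
    rule = beta[(label, left[0], right[0])]     # rule[x, y, z] = beta(A_x -> B_y C_z)
    # P_IN(r, t, A_x) = sum_{y,z} beta(A_x -> B_y C_z) P_IN(r,s,B_y) P_IN(s,t,C_z)
    return np.einsum('xyz,y,z->x', rule, p_left, p_right)
```

The outside ("backward") pass is the mirror image: it walks the same tree top-down and distributes each node's outside score to its children using the second and third recursions above.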

Learning Latent Grammar

Expectation step (E-step): compute posterior probabilities of the anchored latent rules

$P\big((r, s, t, A_x \rightarrow B_y\, C_z) \mid w, T\big) \propto P_{OUT}(r, t, A_x)\; \beta(A_x \rightarrow B_y\, C_z)\; P_{IN}(r, s, B_y)\; P_{IN}(s, t, C_z)$

Maximization step (M-step): normalize the expected rule counts

$\beta(A_x \rightarrow B_y\, C_z) := \dfrac{\#\{A_x \rightarrow B_y\, C_z\}}{\sum_{y', z'} \#\{A_x \rightarrow B_{y'}\, C_{z'}\}}$
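The M-step above is just a normalization of the expected rule counts collected during the E-step. A small sketch, assuming the counts are kept in a dictionary keyed by (A, x, B, y, C, z); this layout is chosen here for illustration only.

```python
from collections import defaultdict

def m_step(expected_counts):
    """expected_counts: dict mapping (A, x, B, y, C, z) -> expected count of
    the latent rule A_x -> B_y C_z, accumulated over the treebank.
    Returns beta(A_x -> B_y C_z) = count / total count of rules with the
    same left-hand side A_x."""
    lhs_totals = defaultdict(float)
    for (A, x, B, y, C, z), c in expected_counts.items():
        lhs_totals[(A, x)] += c
    return {rule: c / lhs_totals[(rule[0], rule[1])]
            for rule, c in expected_counts.items()}
```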

Latent Grammar : Adaptive splitting

Want to split more where the data demands it, without loss in accuracy.

Solution: split everything, then merge back the splits that matter least according to the loss:

loss in likelihood from removing a split = (data likelihood with the split reversed) / (data likelihood with the split)

Latent Grammar : Adaptive splitting

The likelihood of the data for a tree T and sentence w, at a node with label A spanning (r, t):

$P(w, T) = \sum_{x} P_{IN}(r, t, A_x)\; P_{OUT}(r, t, A_x)$

Then, for two annotations $A_1$ and $A_2$, the overall loss from merging them can be estimated as:

$\Delta_{ANNOTATION}(A_1, A_2) = \prod_{i} \prod_{n \in T_i} \frac{P_{ANNOTATION}(w_i, T_i)}{P(w_i, T_i)}$
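A sketch of how the per-node term of this loss could be computed from inside/outside scores, following the merge approximation of Petrov et al. ’06 (the merged inside score mixes the two subcategories by their relative frequencies); the function signature and variable names here are assumptions of this sketch.

```python
def merge_loss_at_node(p_in, p_out, rel_freq, x1=0, x2=1):
    """Likelihood ratio P_ANNOTATION(w, T) / P(w, T) for collapsing
    subcategories x1 and x2 of one category at a single tree node.
    p_in[x], p_out[x]: inside / outside scores of the node's subcategories.
    rel_freq: (f1, f2), relative frequencies of x1 and x2 in the training
    data, used to mix their inside scores when they are merged."""
    total = sum(i * o for i, o in zip(p_in, p_out))          # P(w, T)
    f1, f2 = rel_freq
    merged_in = f1 * p_in[x1] + f2 * p_in[x2]
    merged_out = p_out[x1] + p_out[x2]
    others = sum(p_in[x] * p_out[x] for x in range(len(p_in))
                 if x not in (x1, x2))
    merged_total = merged_in * merged_out + others           # merged likelihood
    return merged_total / total
```

The overall loss of a candidate merge is then the product of this ratio over all nodes at which the category occurs; the splits whose loss is smallest are merged back.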

Number of Phrasal Subcategories

[Bar chart: number of learned subcategories (0–40) for each phrasal category. NP, VP, and PP receive the most subcategories; X and NAC receive the fewest.]

Number of Lexical Subcategories

[Bar chart: number of learned subcategories (0–70) for each part-of-speech tag. Highlighted tags: IN, DT, RB, and the VBx tags; NN, NNS, NNP, and JJ.]

Latent Grammar : Results

Parser                   | F1, ≤ 40 words | F1, all words
Klein & Manning ’03      | 86.3           | 85.7
Matsuzaki et al. ’05     | 86.7           | 86.1
Collins ’99              | 88.6           | 88.2
Charniak & Johnson ’05   | 90.1           | 89.6
Petrov et al. ’06        | 90.2           | 89.7

Efficient inference with Latent Grammars

The learned latent grammar reaches 91.2 F1 on the WSJ dev set (1600 sentences), but parsing it takes 1621 minutes, more than a minute per sentence. For use in real-world applications this is too slow.

Improve inference with: Hierarchical Pruning; Parse Selection

Intermediate Grammars

[Figure: learning proceeds through a hierarchy of grammars X-Bar = G0, G1, G2, G3, G4, G5, G6 = G; e.g. DT is refined into DT1–DT2, then DT1–DT4, then DT1–DT8.]

Projected Grammars

[Figure: projections $\pi_i$ map the final grammar G back to each intermediate level: X-Bar = G0 = $\pi_0(G)$, $\pi_1(G)$, ..., $\pi_5(G)$; G itself is estimated from the treebank.]

Rules in G:

S1 → NP1 VP1  0.20
S1 → NP1 VP2  0.12
S1 → NP2 VP1  0.02
S1 → NP2 VP2  0.03
S2 → NP1 VP1  0.11
S2 → NP1 VP2  0.05
S2 → NP2 VP1  0.08
S2 → NP2 VP2  0.12

Rule in $\pi$(G), computed from the infinite tree distribution defined by G:

S → NP VP  0.56
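The 0.56 above is the probability of the coarse rule S → NP VP under the infinite tree distribution that G defines. One way to approximate such projected probabilities, purely as an illustration (the slides and paper compute them exactly from expected rule counts), is to sample trees from the split grammar, map every split symbol back to its base label, and normalize the resulting rule counts. The grammar layout, the `coarse` mapping, and the 'ROOT' start symbol are assumptions of this sketch, and it presumes the grammar is proper, so sampling terminates.

```python
import random
from collections import defaultdict

def sample_rule_counts(grammar, symbol, counts):
    """Sample one tree top-down from `grammar` (dict: lhs -> list of
    (rhs tuple, probability)) and accumulate rule counts in `counts`.
    Symbols that are not keys of `grammar` are treated as terminals."""
    rhs_options, probs = zip(*grammar[symbol])
    rhs = random.choices(rhs_options, weights=probs)[0]
    counts[(symbol, rhs)] += 1
    for child in rhs:
        if child in grammar:
            sample_rule_counts(grammar, child, counts)

def projected_rule_prob(grammar, coarse, lhs, rhs, start='ROOT', n_samples=20000):
    """Monte Carlo estimate of a coarse rule probability (e.g. S -> NP VP):
    sample trees from the split grammar, project each split symbol to its
    base label via `coarse`, and normalize the counts of rules rewriting
    the coarse `lhs`."""
    hits, total = 0, 0
    for _ in range(n_samples):
        counts = defaultdict(int)
        sample_rule_counts(grammar, start, counts)
        for (a, children), c in counts.items():
            if coarse.get(a, a) == lhs:
                total += c
                if tuple(coarse.get(s, s) for s in children) == rhs:
                    hits += c
    return hits / total
```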

Estimating Grammars

Hierarchical Pruning

Consider a span in the chart:

coarse:         … QP   NP   VP …
split in two:   … QP1 QP2   NP1 NP2   VP1 VP2 …
split in four:  … QP1 QP2 QP3 QP4   NP1 NP2 NP3 NP4   VP1 VP2 VP3 VP4 …
split in eight: … (each category refined into eight subcategories) …
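The idea of the coarse-to-fine pass is that a constituent whose posterior probability is negligible under a coarser grammar does not need to be refined at the next level. A minimal sketch of one pruning step under that assumption; the chart and refinement representations are invented here, not taken from the slides.

```python
def prune_chart(posteriors, refinements, threshold=1e-4):
    """posteriors: dict (start, end, symbol) -> posterior probability of the
    symbol over that span under the coarser grammar.
    refinements: dict symbol -> list of its subcategories in the next,
    finer grammar (e.g. 'QP' -> ['QP1', 'QP2']).
    Returns the set of (start, end, subcategory) items the finer pass is
    allowed to build; everything else is pruned."""
    allowed = set()
    for (start, end, symbol), post in posteriors.items():
        if post >= threshold:
            for sub in refinements.get(symbol, [symbol]):
                allowed.add((start, end, sub))
    return allowed
```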

Parse Selection

Given a sentence w and a split PCFG grammar G, select the parse tree that minimizes the expected loss under our beliefs:

$T^{*} = \arg\min_{T} \sum_{T_P} P(T_P \mid w, G)\; L(T_P, T)$

Intractable: we cannot enumerate all the parses $T_P$.

Parse Selection

Possible solutions:

Best derivation

Generate n-best parses and re-rank them

Sample derivations from the grammar

Select the minimum-risk candidate based on a loss function over posterior marginals:

$q(e) = q(A \rightarrow B\,C, i, k, j) = \frac{r(A \rightarrow B\,C, i, k, j)}{P_{IN}(root, 0, n)}$

where $r(A \rightarrow B\,C, i, k, j)$ is the inside–outside score of the anchored rule, summed over the latent subcategories, and

$T^{*}_{G} = \arg\max_{T} \prod_{e \in T} q(e)$
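Given the posterior marginals q(e) of anchored rules, the selected tree maximizes their product, which can be found with a CKY-style dynamic program over the (already pruned) chart. A sketch, assuming q is supplied as two dictionaries for lexical and binary items; this data layout and the function name are assumptions of the sketch.

```python
import math
from collections import defaultdict

def max_rule_tree(q_lex, q_rule, n, root='ROOT'):
    """Select the tree maximizing the product of posterior rule marginals.
    q_lex[(A, i)]: q(e) for symbol A covering the single word at position i.
    q_rule[(A, B, C, i, k, j)]: q(e) for the anchored rule A -> B C over
    span (i, j) with split point k. Scores are combined in log space."""
    best = defaultdict(lambda: (-math.inf, None))   # (A, i, j) -> (log score, backpointer)
    for (A, i), val in q_lex.items():
        best[(A, i, i + 1)] = (math.log(val), None)
    for width in range(2, n + 1):                   # bottom-up over span widths
        for (A, B, C, i, k, j), val in q_rule.items():
            if j - i != width:
                continue
            score = math.log(val) + best[(B, i, k)][0] + best[(C, k, j)][0]
            if score > best[(A, i, j)][0]:
                best[(A, i, j)] = (score, (B, C, k))
    return best[(root, 0, n)]                       # log score and backpointer of the best tree
```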

Results

Thank You!

References

S. Petrov, L. Barrett, R. Thibaux, and D. Klein. Learning Accurate, Compact, and Interpretable Tree Annotation. COLING-ACL 2006 slides.

S. Petrov and D. Klein. Improved Inference for Unlexicalized Parsing. NAACL 2007 slides.

S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In COLING-ACL ’06, pages 433–440.

S. Petrov and D. Klein. 2007. Improved inference for unlexicalized parsing. In NAACL ’07.

T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with latent annotations. In ACL ’05, pages 75–82.