Feature Forest Models for Syntactic Parsing

Yusuke Miyao

University of Tokyo

Probabilistic models for NLP

• Widely used for disambiguation of linguistic structures

• Ex.) POS tagging

A pretty girl is crying

[Figure: tag lattice in which each word has the candidate tags NN, DT, VBZ, JJ, and VBG; a local probability such as P(NN|a/NN, pretty) is attached to a tag]

Implicit assumption

• Processing state = Primitive probability
– Efficient algorithm for searching
– Avoid exponential explosion of ambiguities

A pretty girl is crying

[Figure: tag lattice in which each word has the candidate tags NN, DT, VBZ, JJ, and VBG]

POS tag = processing state = primitive probability
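
When each tag is both the processing state and the unit that carries a probability, the best tag sequence can be found by dynamic programming. The following is a minimal Viterbi sketch of that idea. The `emit` and `trans` score tables are made-up illustrative values, not numbers from the talk.

```python
def viterbi(words, tags, emit, trans):
    """Return the highest-scoring tag sequence.

    emit[(word, tag)] and trans[(prev_tag, tag)] are hypothetical
    log-scores; missing entries get a large negative score.
    """
    low = -1e9
    # best[i][t] = (score of the best path ending in tag t at word i, backpointer)
    best = [{t: (emit.get((words[0], t), low), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + trans.get((p, t), low), p) for p in tags
            )
            column[t] = (score + emit.get((words[i], t), low), prev)
        best.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy scores (assumptions for illustration only).
WORDS = "A pretty girl is crying".split()
TAGS = ["DT", "JJ", "NN", "VBZ", "VBG"]
EMIT = {("A", "DT"): 0.0, ("pretty", "JJ"): 0.0, ("girl", "NN"): 0.0,
        ("is", "VBZ"): 0.0, ("crying", "VBG"): 0.0}
TRANS = {("DT", "JJ"): 0.0, ("JJ", "NN"): 0.0,
         ("NN", "VBZ"): 0.0, ("VBZ", "VBG"): 0.0}
print(viterbi(WORDS, TAGS, EMIT, TRANS))
```

The work grows with the number of words times the square of the number of tags, even though the number of complete tag sequences grows exponentially with sentence length.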

Is the assumption right?

• Ex.) Shallow parsing, NE recognition
– B (Begin), I (Internal), O (Other) tags are introduced to represent multi-word tags

A pretty girl is crying

[Figure: tag lattice in which each word has the candidate tags NP-B, NP-I, VP-B, VP-I, and O]

• Ex.) Syntactic parsing
– Non-local dependencies are not represented

What do you want to give?

[Figure: parse tree with VP and S nodes; a rule probability such as P(VP|VP→to give) is attached to a node]


Problem of existing models

• Processing state = Primitive probability

• How to model the probability of ambiguous structures with more flexibility?

Possible solution

• A complete structure is a primitive event
– Ex.) Shallow parsing

A pretty girl is crying

[Figure: the lattice of NP-B, NP-I, VP-B, VP-I, and O tags, followed by all possible sequences of multi-word tags over the sentence, e.g. "NP VP" and "NP VP NP"]

• Probability of the sequence of multi-word tags


Possible solution

• A complete structure is a primitive event
– Ex.) Syntactic parsing

[Figure: parse tree (VP and S nodes) and the argument structure of "what do you want to give", with dependencies labeled ARG1, ARG2, and MODIFY]

• Probability of argument structures


Problem

• Complete structures have exponentially many ambiguities

[Figure: the possible sequences of multi-word tags for the sentence, e.g. "NP VP" and "NP VP NP"; there are exponentially many sequences]
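
The growth can be checked on a simplified setting. The sketch below assumes every word belongs to exactly one chunk and each chunk takes one of a fixed set of labels. This is an illustrative assumption, not the exact tag inventory of the talk. Under it, a sentence of n words with L labels has L·(L+1)^(n-1) complete chunk sequences.

```python
def chunk_sequences(n_words, labels):
    """Yield every way to split n_words into contiguous, labeled chunks.

    Each sequence is a list of (label, length) pairs whose lengths sum to n_words.
    """
    if n_words == 0:
        yield []
        return
    for length in range(1, n_words + 1):
        for label in labels:
            for rest in chunk_sequences(n_words - length, labels):
                yield [(label, length)] + rest


def closed_form(n_words, n_labels):
    """Number of sequences under the simplifying assumption above."""
    return n_labels * (n_labels + 1) ** (n_words - 1)


count = sum(1 for _ in chunk_sequences(5, ["NP", "VP"]))
print(count, closed_form(5, 2))   # 162 162
print(closed_form(20, 2))         # 2324522934
```

A five-word sentence already has 162 sequences with two labels, and a 20-word sentence has more than two billion, so enumerating complete structures one by one is not practical.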

Proposal

• Feature forest model [Miyao and Tsujii, 2002]

[Figure: a feature forest with conjunctive nodes c1 to c10 and disjunctive nodes d1 to d7; features are attached to conjunctive nodes, e.g. (f1, f2, ...) for c6]


• Exponentially many trees are packed

• Features are assigned to each conjunctive node
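
A minimal data-structure sketch of the two node types may help. The class and field names (`Conjunctive`, `Disjunctive`, `daughters`, `choices`) are assumptions made for illustration and are not taken from the original implementation. A conjunctive node carries features and requires all of its disjunctive daughters. A disjunctive node selects exactly one of its conjunctive choices.

```python
from dataclasses import dataclass, field


@dataclass
class Conjunctive:
    name: str
    features: dict = field(default_factory=dict)   # feature name -> value
    daughters: list = field(default_factory=list)  # disjunctive nodes, all required


@dataclass
class Disjunctive:
    name: str
    choices: list = field(default_factory=list)    # conjunctive nodes, pick one


def count_trees(node, cache=None):
    """Number of unpacked trees below a node, computed without unpacking."""
    cache = {} if cache is None else cache
    key = id(node)
    if key not in cache:
        if isinstance(node, Disjunctive):
            cache[key] = sum(count_trees(c, cache) for c in node.choices)
        else:
            total = 1
            for d in node.daughters:
                total *= count_trees(d, cache)
            cache[key] = total
    return cache[key]


def chain(n_stages):
    """Toy forest: n_stages disjunctive nodes in a chain, two choices each."""
    below = None
    for i in range(n_stages, 0, -1):
        daughters = [below] if below is not None else []
        below = Disjunctive(f"d{i}", [
            Conjunctive(f"c{i}a", {"f1": 1.0}, list(daughters)),
            Conjunctive(f"c{i}b", {"f2": 1.0}, list(daughters)),
        ])
    return below


print(count_trees(chain(30)))  # 1073741824 trees packed into 60 conjunctive nodes
```

The toy chain shows the packing effect: the number of nodes grows linearly with the number of stages while the number of represented trees doubles at each stage.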

Feature forest model

• Feature forest models can be efficiently estimated without exponential explosion [Miyao and Tsujii, 2002]

• When unpacking the forest, the model is equivalent to maximum entropy models [Berger et al., 1996]
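
The two points above can be illustrated on a toy forest. In a log-linear (maximum entropy) model, each unpacked tree gets the score exp(Σ λ·f) and probabilities are normalized by the sum Z over all trees. The sketch below computes Z in two ways: by dynamic programming over the packed forest, and by brute-force unpacking. The tuple encoding of nodes, the toy forest, and the weights are assumptions for illustration. The estimation algorithm in [Miyao and Tsujii, 2002] also needs feature expectations, which this sketch does not compute.

```python
import math
from itertools import product

# Hypothetical encoding of a forest:
#   ("and", {feature: value}, [disjunctive daughters])   conjunctive node
#   ("or", [conjunctive choices])                        disjunctive node


def inside(node, weights, cache):
    """Sum of exp(score) over all trees below node, without unpacking."""
    key = id(node)
    if key not in cache:
        if node[0] == "or":
            cache[key] = sum(inside(c, weights, cache) for c in node[1])
        else:
            _, feats, daughters = node
            value = math.exp(sum(weights[f] * v for f, v in feats.items()))
            for d in daughters:
                value *= inside(d, weights, cache)
            cache[key] = value
    return cache[key]


def unpack(node):
    """Enumerate every tree below node as a list of conjunctive feature dicts."""
    if node[0] == "or":
        for c in node[1]:
            yield from unpack(c)
    else:
        _, feats, daughters = node
        for parts in product(*(list(unpack(d)) for d in daughters)):
            yield [feats] + [f for part in parts for f in part]


def brute_force_z(root, weights):
    """Normalizer computed by scoring each unpacked tree separately."""
    total = 0.0
    for tree in unpack(root):
        score = sum(weights[f] * v for feats in tree for f, v in feats.items())
        total += math.exp(score)
    return total


# Toy forest and weights (made up for illustration).
d2 = ("or", [("and", {"f1": 1.0}, []), ("and", {"f2": 1.0}, [])])
d3 = ("or", [("and", {"f3": 1.0}, []), ("and", {"f1": 1.0, "f3": 1.0}, [])])
root = ("or", [("and", {"f2": 1.0}, [d2, d3]), ("and", {"f3": 2.0}, [d2])])
weights = {"f1": 0.5, "f2": -0.3, "f3": 0.2}

print(inside(root, weights, {}), brute_force_z(root, weights))
```

Both computations give the same normalizer, but the packed version visits each node once, whereas the unpacked version visits each tree once.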

Application to parsing

• Applying a feature forest model to disambiguation of argument structures



• How to represent exponential ambiguities of argument structures with a feature forest?
– Argument structures are not trees, but DAGs (including reentrant structures)

[Figure: two candidate argument structures for "want" (ARG1: I): one whose ARG2 is argue1, which has only an ARG1, and one whose ARG2 is argue2, which has an ARG1 and an ARG2 filled by "fact". In both, the ARG1 of "argue" is reentrant (index 1) with "I".]


Packing argument structures

• An example including reentrant structures

She neglected the fact that I wanted to argue.

• Inactive parts: Argument structures whose arguments are all instantiated

• Inactive parts are packed into conjunctive nodes

[Figure: step-by-step packing of the argument structures of the example sentence; nodes for "I", "argue1", "argue2", "want", and "fact" are added one at a time together with their ARG1 (A1) and ARG2 (A2) values, and an argument that is not yet instantiated is shown as "?"]


Feature forest representation of argument structures

[Figure: feature forest for "She neglected the fact that I wanted to argue." with nodes for "she", "I", "argue1", "argue2", "want", "fact", and "neglect" and their A1/A2 arguments]

Conjunctive nodes correspond to argument structures whose arguments are all instantiated
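
The packing idea can be sketched as follows, under loud assumptions. An argument structure is encoded as a `(predicate, ((role, filler), ...))` tuple, an uninstantiated argument is `None`, and structurally equal inactive parts are treated as the same node. The names `is_inactive` and `Packer` are hypothetical, and the sketch covers only the sharing of inactive parts, not the construction of disjunctive nodes.

```python
def is_inactive(structure):
    """True if every argument is instantiated (no None fillers), recursively."""
    _, args = structure
    return all(f is not None and is_inactive(f) for _, f in args)


class Packer:
    """Stores each distinct inactive part once and returns its node id."""

    def __init__(self):
        self.nodes = {}

    def pack(self, structure):
        if not is_inactive(structure):
            raise ValueError("only inactive parts are packed")
        pred, args = structure
        # Arguments are packed first, so shared parts map to the same node id.
        key = (pred, tuple((role, self.pack(f)) for role, f in args))
        return self.nodes.setdefault(key, len(self.nodes))


I = ("I", ())
argue1 = ("argue1", (("ARG1", I),))
argue2_open = ("argue2", (("ARG1", I), ("ARG2", None)))   # ARG2 not yet filled
want = ("want", (("ARG1", I), ("ARG2", argue1)))

packer = Packer()
print(packer.pack(want))          # 2: "I" and "argue1" were packed first
print(packer.pack(argue1))        # 1: reuses the node created above
print(is_inactive(argue2_open))   # False: it stays active until ARG2 is filled
```

In this toy run "I" occurs as the ARG1 of both "want" and "argue1" but is stored once, which mirrors how a reentrant part is shared rather than copied.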

Experiments

• Grammar: a treebank grammar of HPSG [Miyao and Tsujii, 2003]
– Extracted from the Penn Treebank [Marcus et al., 1994] Section 02-21

• Training: Section 02-21 of the Penn Treebank

• Test: sentences from Section 22 covered by the grammar

• Measure: Accuracy of dependencies in argument structures
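
As a sketch of how dependency accuracy can be reported as precision and recall, the function below compares a predicted set of dependencies with a gold set. Representing a dependency as a `(head, label, dependent)` triple is an assumption, and the example triples are made up. The slides do not spell out the exact matching criteria used in the experiments.

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted dependencies against gold ones."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


gold = {("want", "ARG1", "I"), ("want", "ARG2", "argue"), ("argue", "ARG1", "I")}
predicted = {("want", "ARG1", "I"), ("want", "ARG2", "argue"),
             ("argue", "ARG2", "fact"), ("fact", "ARG1", "want")}
print(precision_recall(predicted, gold))  # precision 2/4, recall 2/3
```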

Experiments

• Features: the combinations of
– Surface strings/POS
– Labels of dependencies (ARG1, ARG2, …)
– Labels of lexical entries (head noun, transitive, …)
– Distance

• Estimation algorithm: Limited-memory BFGS algorithm [Nocedal, 1980] with MAP estimation [Chen & Rosenfeld, 1999]
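
A gradient-based optimizer such as limited-memory BFGS needs the objective value and its gradient at each step. The sketch below shows both for a toy log-linear model, assuming that the MAP estimation uses a zero-mean Gaussian prior on the weights (my reading of the citation above, stated here as an assumption). For simplicity the candidates of each training example are listed explicitly, whereas a feature forest model would obtain the same expectations from the packed forest. No optimizer is run here.

```python
import math


def neg_log_posterior(weights, data, sigma=1.0):
    """Penalized negative log-likelihood and its gradient.

    data: list of (candidates, gold_index); each candidate is a feature
    vector (list of floats). sigma is the assumed prior standard deviation.
    """
    value = sum(w * w for w in weights) / (2 * sigma ** 2)
    grad = [w / sigma ** 2 for w in weights]
    for candidates, gold in data:
        scores = [sum(w * f for w, f in zip(weights, c)) for c in candidates]
        top = max(scores)  # subtract the maximum for numerical stability
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        value -= scores[gold] - log_z
        for c, s in zip(candidates, scores):
            p = math.exp(s - log_z)
            for k, f in enumerate(c):
                grad[k] += p * f      # expected feature value under the model
        for k, f in enumerate(candidates[gold]):
            grad[k] -= f              # observed feature value
    return value, grad


# Toy data (made up): two examples with three and two candidates.
toy_data = [([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], 0),
            ([[0.5, 2.0], [1.0, 0.0]], 1)]
value, grad = neg_log_posterior([0.3, -0.2], toy_data)
print(value, grad)
```

The gradient is the difference between expected and observed feature values plus the prior term, so the weights stop growing once the prior penalty balances the remaining mismatch.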

Preliminary results

• Estimation time: 143 min.

• Accuracy (precision/recall):

                 exact          partial
Baseline         48.1 / 47.4    57.1 / 56.2
Unigram          77.3 / 77.4    81.1 / 81.3
Feature forest   85.5 / 85.3    88.4 / 88.2

Conclusion

• Feature forest models allow the probabilistic modeling of complete structures without exponential explosion

• The application to syntactic parsing resulted in high accuracy

Ongoing work

• Refinement of the grammar and tuning of estimation parameters

• Development of efficient algorithms for best-first/beam search