
Probabilistic Parsing

Ling 571

Fei Xia

Week 5: 10/25-10/27/05

Outline

• Lexicalized CFG (Recap)

• Hw5 and Project 2

• Parsing evaluation measures: ParseVal

• Collins’ parser

• TAG

• Parsing summary

Lexicalized CFG recap

Important equations

The chain rule:

$P(A_1, \ldots, A_n) = P(A_1) \prod_{i=2}^{n} P(A_i \mid A_1, \ldots, A_{i-1})$
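For instance, with $n = 3$ the chain rule expands to $P(A_1, A_2, A_3) = P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1, A_2)$.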

Lexicalized CFG

• Lexicalized rules:

$A(w) \rightarrow B_1(w_1) \ldots B_n(w_n)$

$A(h) \rightarrow L_n(l_n) \ldots L_1(l_1)\ H(h)\ R_1(r_1) \ldots R_m(r_m)$

• Sparse data problem:
  – First generate the head
  – Then generate the unlexicalized rule
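For instance, in the "he likes her" example below, the verb phrase is built by the lexicalized rule $VP(likes) \rightarrow V(likes)\ NP(her)$: the unlexicalized rule $VP \rightarrow V\ NP$ plus a head word on every node, with the VP inheriting its head word from its head child V.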

Lexicalized models

$P(T,S) = P(lr_1, \ldots, lr_n)$
$= \prod_{i=1}^{n} P(lr_i \mid lr_1, \ldots, lr_{i-1})$
$= \prod_{i=1}^{n} P(h(r_i), r_i \mid lr_1, \ldots, lr_{i-1})$
$= \prod_{i=1}^{n} P(h(r_i) \mid lr_1, \ldots, lr_{i-1}) \times P(r_i \mid h(r_i), lr_1, \ldots, lr_{i-1})$
$\approx \prod_{i=1}^{n} P(h(r_i) \mid lhs(r_i), h(m(r_i))) \times P(r_i \mid lhs(r_i), h(r_i))$

where $lr_i$ is a lexicalized rule, $r_i$ its unlexicalized rule, $h(r_i)$ its head word, $lhs(r_i)$ its left-hand side, and $m(r_i)$ the mother of $r_i$'s left-hand side.
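A minimal sketch of this factorization in Python. The table names (head_head, head_rule), the probability values, and the 1e-6 backoff for unseen events are illustrative assumptions, not part of the lecture's model; the sketch also omits the Top node shown in the second example below.

# Each tree node is a (label, head_word, children) tuple.

# P(h(r_i) | lhs(r_i), h(m(r_i))): head word given node label and mother's head
head_head = {
    ("NP", "likes", "he"): 0.02,
    ("NP", "likes", "her"): 0.03,
    ("VP", "likes", "likes"): 1.0,   # head child inherits the mother's head
    ("V", "likes", "likes"): 1.0,
    ("Pron", "he", "he"): 1.0,
    ("Pron", "her", "her"): 1.0,
}

# P(r_i | lhs(r_i), h(r_i)): unlexicalized rule given node label and head word
head_rule = {
    ("S", "likes", ("NP", "VP")): 0.7,
    ("NP", "he", ("Pron",)): 0.9,
    ("NP", "her", ("Pron",)): 0.9,
    ("VP", "likes", ("V", "NP")): 0.5,
}

def score(node, mother_head=None):
    """Product of the head-head and head-rule factors for `node`."""
    label, head, children = node
    p = 1.0
    if mother_head is not None:          # the root has no head-head factor here
        p *= head_head.get((label, mother_head, head), 1e-6)
    if children:                         # preterminals stop the recursion
        rule = tuple(c[0] for c in children)
        p *= head_rule.get((label, head, rule), 1e-6)
        for child in children:
            p *= score(child, head)
    return p

tree = ("S", "likes",
        [("NP", "he", [("Pron", "he", [])]),
         ("VP", "likes",
          [("V", "likes", []), ("NP", "her", [("Pron", "her", [])])])])
print(score(tree))   # ~1.7e-4 with these toy numbers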

An example

• he likes her

$P(T,S) = P(likes \mid S) \times P(S \rightarrow NP\ VP \mid S, likes)$
$\times P(he \mid NP, likes) \times P(NP \rightarrow Pron \mid NP, he)$
$\times P(likes \mid VP, likes) \times P(VP \rightarrow V\ NP \mid VP, likes)$
$\times P(her \mid NP, likes) \times P(NP \rightarrow Pron \mid NP, her)$

An example (cont)

• he likes her

$P(T,S) = P(likes \mid Top) \times P(Top \rightarrow S \mid Top, likes)$
$\times P(likes \mid S, likes) \times P(S \rightarrow NP\ VP \mid S, likes)$
$\times P(he \mid NP, likes) \times P(NP \rightarrow Pron \mid NP, he)$
$\times P(likes \mid VP, likes) \times P(VP \rightarrow V\ NP \mid VP, likes)$
$\times P(her \mid NP, likes) \times P(NP \rightarrow Pron \mid NP, her)$
$\times P(he \mid Pron, he) \times P(Pron \rightarrow he \mid Pron, he)$
$\times P(likes \mid V, likes) \times P(V \rightarrow likes \mid V, likes)$
$\times P(her \mid Pron, her) \times P(Pron \rightarrow her \mid Pron, her)$

Head-head probability

$P(w_2 \mid A_2, w_1) = \frac{C(A_2, w_2, w_1)}{C(A_2, w_1)} = \frac{C(X(w_1) \rightarrow \ldots A_2(w_2) \ldots)}{\sum_{w} C(X(w_1) \rightarrow \ldots A_2(w) \ldots)}$

where $X$ ranges over all mother labels.

Example:

$P(he \mid NP, likes) = \frac{C(X(likes) \rightarrow \ldots NP(he) \ldots)}{\sum_{w} C(X(likes) \rightarrow \ldots NP(w) \ldots)}$

Head-rule probability

$P(A \rightarrow A_1 \ldots A_n \mid A, w) = \frac{C(A(w) \rightarrow A_1 \ldots A_n)}{C(A(w))}$

Example:

$P(NP \rightarrow Pron \mid NP, he) = \frac{C(NP(he) \rightarrow Pron(he))}{C(NP(he))}$

Estimate parameters

$P(w_2 \mid A_2, w_1) = \frac{C(X(w_1) \rightarrow \ldots A_2(w_2) \ldots)}{\sum_{w} C(X(w_1) \rightarrow \ldots A_2(w) \ldots)}$

$P(A \rightarrow A_1 \ldots A_n \mid A, w) = \frac{C(A(w) \rightarrow A_1 \ldots A_n)}{C(A(w))}$
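A sketch of the head-rule estimator as counting code, assuming trees are nested (label, head_word, children) tuples as in the earlier sketch; the function and table names are illustrative:

from collections import Counter

rule_counts = Counter()   # C(A(w) -> A_1...A_n)
lhs_counts = Counter()    # C(A(w))

def count(node):
    label, head, children = node
    if children:
        rule = tuple(c[0] for c in children)
        rule_counts[(label, head, rule)] += 1
        lhs_counts[(label, head)] += 1
        for child in children:
            count(child)

def p_head_rule(label, head, rule):
    if lhs_counts[(label, head)] == 0:
        return 0.0                  # unseen A(w); smoothing would go here
    return rule_counts[(label, head, rule)] / lhs_counts[(label, head)]

count(("NP", "he", [("Pron", "he", [])]))
print(p_head_rule("NP", "he", ("Pron",)))   # 1.0 from this one-tree "treebank"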

Building a statistical tool

• Design a model:
  – Objective function: generative model vs. discriminative model
  – Decomposition: independence assumptions
  – The types of parameters and parameter size

• Training: estimate model parameters
  – Supervised vs. unsupervised
  – Smoothing methods

• Decoding: find the best structure given the model
  – Dynamic programming
  – Pruning

Team Project 1 (Hw5)

• Form a team: programming language, schedule, expertise, etc.

• Understand the lexicalized model

• Design the training algorithm

• Work out the decoding (parsing) algorithm: augment the CYK algorithm (a probabilistic CYK sketch follows this list).

• Illustrate the algorithms with a real example.
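Below is a minimal sketch of what "augmenting CYK" means for a plain PCFG in CNF, before any lexicalization: each chart cell keeps the best probability per label plus a backpointer. The toy grammar, probabilities, and variable names are illustrative assumptions, not part of the assignment.

from collections import defaultdict

binary = {("S", ("NP", "VP")): 1.0,
          ("VP", ("V", "NP")): 1.0}
lexical = {("NP", "he"): 0.4, ("NP", "her"): 0.4,
           ("V", "likes"): 1.0}

def cyk(words):
    n = len(words)
    chart = defaultdict(dict)              # chart[(i, j)][label] = best prob
    back = {}                              # backpointers for tree recovery
    for i, w in enumerate(words):
        for (label, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][label] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):      # split point
                for (label, (l, r)), p in binary.items():
                    if l in chart[(i, k)] and r in chart[(k, j)]:
                        prob = p * chart[(i, k)][l] * chart[(k, j)][r]
                        if prob > chart[(i, j)].get(label, 0.0):
                            chart[(i, j)][label] = prob
                            back[(i, j, label)] = (k, l, r)
    return chart[(0, n)].get("S", 0.0), back

prob, back = cyk("he likes her".split())
print(prob)   # 0.16 with the toy grammar above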

Team Project 2

• Task: parse real data with a real grammar extracted from a treebank.

• Parser: PCFG or lexicalized PCFG

• Training data: English Penn Treebank sections 02-21

• Development data: section 00

Team Project 2 (cont)

• Hw6: extract a PCFG from the treebank (see the extraction sketch after this list)

• Hw7: make sure your parser works given a real grammar and real sentences; measure parsing performance

• Hw8: improve parsing results

• Hw10: write a report and give a presentation
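As a rough sketch of the Hw6 idea under simplifying assumptions (well-formed brackets, no handling of empty categories or function tags), PCFG extraction is counting plus relative frequency. The s-expression reader below is deliberately minimal:

from collections import Counter

def parse_sexp(tokens):
    tok = tokens.pop(0)
    assert tok == "("
    label = tokens.pop(0)
    children = []
    while tokens[0] != ")":
        if tokens[0] == "(":
            children.append(parse_sexp(tokens))
        else:
            children.append(tokens.pop(0))      # a terminal word
    tokens.pop(0)                               # drop ")"
    return (label, children)

def rules(tree, rule_counts, lhs_counts):
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1
    for c in children:
        if not isinstance(c, str):
            rules(c, rule_counts, lhs_counts)

text = "(S (NP (Pron he)) (VP (V likes) (NP (Pron her))))"
tokens = text.replace("(", " ( ").replace(")", " ) ").split()
rule_counts, lhs_counts = Counter(), Counter()
rules(parse_sexp(tokens), rule_counts, lhs_counts)
for (lhs, rhs), c in rule_counts.items():
    print(lhs, "->", " ".join(rhs), c / lhs_counts[lhs])
# e.g. Pron -> he gets 0.5, since Pron expands to he once and to her once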

Parsing evaluation measures

Evaluation of parsers: ParseVal

• Labeled recall: # of correct constituents in the parser output / # of constituents in the gold standard

• Labeled precision: # of correct constituents in the parser output / # of constituents in the parser output

• Labeled F-measure: the harmonic mean of labeled precision and labeled recall

• Complete match: % of sents where recall and precision are both 100%

• Average crossing: # of crossing brackets per sent

• No crossing: % of sents which have no crossing brackets

An example

Gold standard:

(VP (V saw)

(NP (Det the) (N man))

(PP (P with) (NP (Det a) (N telescope))))

Parser output:

(VP (V saw)

(NP (NP (Det the) (N man))

(PP (P with) (NP (Det a) (N telescope)))))

ParseVal measures

• Gold standard: (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)

• System output: (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6)

• Recall=4/4, Prec=4/5, crossing=0
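A sketch of how these numbers can be computed from span lists; the function name and the set-based treatment of duplicate constituents are my simplifications:

def parseval(gold, system):
    """Constituents are (label, start, end) tuples, as on the slide."""
    gold, system = set(gold), set(system)
    correct = gold & system
    recall = len(correct) / len(gold)
    precision = len(correct) / len(system)
    f = 2 * precision * recall / (precision + recall)
    # a system bracket crosses a gold bracket if they overlap
    # but neither contains the other
    crossing = sum(1 for (_, a, b) in system
                   if any(a < c <= b < d or c < a <= d < b
                          for (_, c, d) in gold))
    return recall, precision, f, crossing

gold = [("VP", 1, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
system = [("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3),
          ("PP", 4, 6), ("NP", 5, 6)]
print(parseval(gold, system))   # recall 1.0, precision 0.8, crossing 0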

A different annotation

Gold standard: (VP (V saw) (NP (Det the) (N’ (N man))) (PP (P with) (NP (Det a) (N’ (N telescope)))))

Parser output: (VP (V saw) (NP (Det the) (N’ (N man) (PP (P with) (NP (Det a) (N’ (N telescope)))))))

ParseVal measures (cont)

• Gold standard: (VP, 1, 6), (NP, 2, 3), (N’, 3, 3), (PP, 4, 6), (NP, 5, 6), (N’, 6,6)

• System output: (VP, 1, 6), (NP, 2, 6), (N’, 3, 6), (PP, 4, 6), (NP, 5, 6), (N’, 6, 6)

• Recall=4/6, Prec=4/6, crossing=1

EVALB

• A tool that calculates ParseVal measures

• To run it: evalb -p parameter_file gold_file system_output

• A copy is available in my dropbox

• You will need it for Team Project 2

Summary of Parsing evaluation measures

• ParseVal is the most widely used; the F-measure is the most important single measure

• The results depend on annotation style

• EVALB is a tool that calculates ParseVal measures

• Other measures are used too: e.g., accuracy of dependency links

History-based models

History-based models

• History-based approaches map (T, S) into a decision sequence $d_1, \ldots, d_n$

• Probability of tree T for sentence S is:

$P(T, S) = P(d_1, \ldots, d_n) = \prod_{i=1}^{n} P(d_i \mid d_1, \ldots, d_{i-1}) \approx \prod_{i=1}^{n} P(d_i \mid f(d_1, \ldots, d_{i-1}))$

where $f$ maps a history to its equivalence class.
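A tiny sketch of the role of $f$: decisions are scored given an equivalence class of the history rather than the full history. The choice of $f$ below (keep only the last decision) and the probability table are illustrative assumptions:

def f(history):
    return history[-1] if history else None   # collapse history to last move

def p_tree(decisions, probs):
    """probs maps (f(history), decision) to a probability."""
    p, history = 1.0, []
    for d in decisions:
        p *= probs.get((f(history), d), 1e-6)
        history.append(d)
    return p

probs = {(None, "S->NP VP"): 0.9, ("S->NP VP", "NP->Pron"): 0.5}
print(p_tree(["S->NP VP", "NP->Pron"], probs))   # 0.45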

History-based models (cont)

• PCFGs can be viewed as history-based models

• There are other history-based models:
  – Magerman’s parser (1995)
  – Collins’ parsers (1996, 1997, …)
  – Charniak’s parsers (1996, 1997, …)
  – Ratnaparkhi’s parser (1997)

Collins’ models

• Model 1: Generative model of (Collins, 1996)

• Model 2: Add complement/adjunct distinction

• Model 3: Add wh-movement

Model 1

• First generate the head constituent label
• Then generate left and right dependents

$P(A(h) \rightarrow L_n(l_n) \ldots L_1(l_1)\ H(h)\ R_1(r_1) \ldots R_m(r_m) \mid A, h)$
$= P_H(H \mid A, h)$
$\times P(L_1(l_1) \ldots L_n(l_n) \mid A, H, h)$
$\times P(R_1(r_1) \ldots R_m(r_m) \mid A, H, h, L_1(l_1) \ldots L_n(l_n))$
$\approx P_H(H \mid A, h)$
$\times P(L_1(l_1) \ldots L_n(l_n) \mid A, H, h)$
$\times P(R_1(r_1) \ldots R_m(r_m) \mid A, H, h)$

Model 1 (cont)

$P(L_1(l_1) \ldots L_n(l_n) \mid A, H, h) = \prod_{i=1}^{n} P(L_i(l_i) \mid A, H, h, L_1(l_1) \ldots L_{i-1}(l_{i-1})) \approx \prod_{i=1}^{n} P_L(L_i(l_i) \mid A, H, h)$

$P(R_1(r_1) \ldots R_m(r_m) \mid A, H, h) = \prod_{i=1}^{m} P(R_i(r_i) \mid A, H, h, R_1(r_1) \ldots R_{i-1}(r_{i-1})) \approx \prod_{i=1}^{m} P_R(R_i(r_i) \mid A, H, h)$

An example

Sentence: Last week Marks bought Brooks.

rule: S(bought) → NP(week) NP(Marks) VP(bought)

$P(rule \mid S, bought)$
$= P_H(VP \mid S, bought)$
$\times P_L(NP(Marks) \mid S, VP, bought)$
$\times P_L(NP(week) \mid S, VP, bought)$
$\times P_L(STOP \mid S, VP, bought)$
$\times P_R(STOP \mid S, VP, bought)$
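A minimal sketch of this computation, assuming toy probability tables (the P_H, P_L, P_R values are made up; real models also condition on distance features, omitted here). STOP events bound the left and right dependent sequences:

P_H = {("VP", "S", "bought"): 0.6}
P_L = {(("NP", "Marks"), "S", "VP", "bought"): 0.1,
       (("NP", "week"), "S", "VP", "bought"): 0.05,
       ("STOP", "S", "VP", "bought"): 0.8}
P_R = {("STOP", "S", "VP", "bought"): 0.9}

def model1(parent, head_label, head_word, left_deps, right_deps):
    p = P_H[(head_label, parent, head_word)]
    for dep in left_deps + ["STOP"]:
        p *= P_L[(dep, parent, head_label, head_word)]
    for dep in right_deps + ["STOP"]:
        p *= P_R[(dep, parent, head_label, head_word)]
    return p

# S(bought) -> NP(week) NP(Marks) VP(bought): dependents are generated
# outward from the head, so the left sequence is NP(Marks), then NP(week).
print(model1("S", "VP", "bought",
             [("NP", "Marks"), ("NP", "week")], []))
# 0.6 * 0.1 * 0.05 * 0.8 * 0.9 = 0.00216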

Model 2

• Generate a head label H

• Choose left and right subcat frames

• Generate left and right arguments

• Generate left and right modifiers

An example

rule: S(bought) → NP(week) NP-C(Marks) VP(bought)

$P(rule \mid S, bought)$
$= P_H(VP \mid S, bought)$
$\times P_{lc}(\{NP\} \mid S, VP, bought) \times P_{rc}(\{\} \mid S, VP, bought)$
$\times P_L(NP(Marks) \mid S, VP, bought, \{NP\})$
$\times P_L(NP(week) \mid S, VP, bought, \{\})$
$\times P_L(STOP \mid S, VP, bought, \{\})$
$\times P_R(STOP \mid S, VP, bought, \{\})$

Model 3

• Add trace and wh-movement

• Given that the LHS of a rule has a gap, there are three ways to pass down the gap:
  – Head: S(+gap) → NP VP(+gap)
  – Left: S(+gap) → NP(+gap) VP
  – Right: SBAR(that)(+gap) → WHNP(that) S(+gap)

Parsing results

          LR      LP
Model 1   87.4%   88.1%
Model 2   88.1%   88.6%
Model 3   88.1%   88.6%

Tree Adjoining Grammar (TAG)

TAG

• TAG basics

• Extensions of TAG:
  – Lexicalized TAG (LTAG)
  – Synchronous TAG (STAG)
  – Multi-component TAG (MCTAG)
  – …

TAG basics

• A tree-rewriting formalism (Joshi et al., 1975)

• It can generate mildly context-sensitive languages.

• The primitive elements of a TAG are elementary trees.

• Elementary trees are combined by two operations: substitution and adjoining.

• TAG has been used in:
  – parsing, semantics, discourse, etc.
  – machine translation, summarization, generation, etc.

Two types of elementary trees

Initial tree: (S NP (VP (V draft) NP))

Auxiliary tree: (VP (ADVP (ADV still)) VP*)

Substitution operation

[Figure: NP trees are substituted into the NP leaves of the initial tree, deriving "They draft policies".]

Adjoining operation

[Figure: an auxiliary tree rooted in Y with foot node Y* is spliced into a Y node of the derived tree; adjoining the "still" auxiliary tree at the VP derives "They still draft policies".]
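A sketch of the two operations on the example trees, with trees encoded as nested lists (first element = label) and nodes addressed by paths of child indices; the encoding and helper names are my own, not the lecture's:

draft = ["S", "NP", ["VP", ["V", "draft"], "NP"]]     # initial tree
still = ["VP", ["ADVP", ["ADV", "still"]], "VP*"]     # auxiliary tree

def replace_at(tree, path, new):
    if not path:
        return new
    i, rest = path[0], path[1:]
    return tree[:i] + [replace_at(tree[i], rest, new)] + tree[i + 1:]

def get_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def substitute(tree, path, initial):
    """Substitution: plug an initial tree into a frontier nonterminal."""
    return replace_at(tree, path, initial)

def adjoin(tree, path, aux):
    """Adjoining: the subtree at `path` moves under the foot node
    (label + '*') of the auxiliary tree, which takes its place."""
    excised = get_at(tree, path)
    def plug_foot(t):
        if isinstance(t, str):
            return excised if t.endswith("*") else t
        return [plug_foot(c) for c in t]
    return replace_at(tree, path, plug_foot(aux))

t = substitute(draft, (1,), ["NP", ["PN", "they"]])    # they
t = substitute(t, (2, 2), ["NP", ["N", "policies"]])   # policies
t = adjoin(t, (2,), still)                             # still, at the VP
print(t)   # derived tree for "They still draft policies"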

Derivation tree

[Figure: the elementary trees, the derived tree, and the corresponding derivation tree for the example.]

Derived tree vs. derivation tree

• The mapping is not 1-to-1.

• Finding the best derivation is not the same as finding the best derived tree.

Wh-movement

• What do they draft ?

[Figure: the elementary trees — an auxiliary tree (S (V do) S*), initial trees (NP (PN they)) and (NP (N what)), and a wh-tree for "draft" with a fronted NP_i co-indexed with an NP_i trace in object position — and the derived tree for "What do they draft ?".]

Long-distance wh-movement

• What does John think they draft ?

[Figure: auxiliary trees (S (V does) S*) and (S NP (VP (V think) S*)) adjoin into the wh-tree for "draft", deriving "What does John think they draft ?"; adjoining lets the wh-dependency stretch across clauses.]

Who did you have dinner with?

[Figure: the initial tree (S NP (VP (V have) NP)), the tree (NP (PN who)), and an auxiliary tree (VP VP* (PP (P with) NP_i)) combine, with the fronted who_i co-indexed with the NP_i inside the PP, to derive "Who did you have dinner with?".]

TAG extension

• Lexicalized TAG (LTAG)

• Synchronous TAG (STAG)

• Multi-component TAG (MCTAG)

• ….

STAG

• The primitive elements in STAG are elementary tree pairs.

• Used for MT

Summary of TAG

• A formalism beyond CFG
• Primitive elements are trees, not rules
• Extended domain of locality
• Two operations: substitution and adjoining

• Parsing algorithm: $O(n^6)$
• Statistical parsers for TAG
• Algorithms for extracting TAGs from treebanks

Parsing summary

Types of parsers

• Phrase structure vs. dependency tree
• Statistical vs. rule-based
• Grammar-based or not
• Supervised vs. unsupervised

Our focus:
– Phrase structure
– Mainly statistical
– Mainly grammar-based: CFG, TAG
– Supervised

Grammars

• Chomsky hierarchy:
  – Unrestricted grammar (type 0)
  – Context-sensitive grammar
  – Context-free grammar
  – Regular grammar
  Human languages are beyond context-free.

• Other formalisms:
  – HPSG, LFG
  – TAG
  – Dependency grammars

Parsing algorithm for CFG

• Top-down

• Bottom-up

• Top-down with bottom-up filter

• Earley algorithm

• CYK algorithm:
  – Requires the CFG to be in CNF (a binarization sketch follows below)
  – Can be augmented to deal with PCFG, lexicalized CFG, etc.
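A sketch of the binarization step of CNF conversion (unary-rule collapsing and terminal handling are omitted; the intermediate-symbol naming scheme is an arbitrary choice). For a PCFG, the original rule's probability goes on the first binary rule and the intermediate rules get probability 1.0:

def binarize(rules):
    """rules: list of (lhs, rhs_tuple); returns an equivalent binary list."""
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            new = f"{lhs}|{'_'.join(rhs[1:])}"   # fresh intermediate symbol
            out.append((lhs, (rhs[0], new)))
            lhs, rhs = new, rhs[1:]
        out.append((lhs, rhs))
    return out

print(binarize([("S", ("NP", "NP", "VP"))]))
# [('S', ('NP', 'S|NP_VP')), ('S|NP_VP', ('NP', 'VP'))]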

Extensions of CFG

• PCFG: find the most likely parse tree

• Lexicalized CFG:
  – Uses weaker independence assumptions
  – Accounts for certain types of lexical and structural dependencies

Beyond CFG

• History-based models:
  – Collins’ parsers

• TAG:
  – Tree-rewriting formalism
  – Mildly context-sensitive grammar
  – Many extensions: LTAG, STAG, …

Statistical approach

• Modeling:
  – Choose the objective function
  – Decompose the function:
    • Common equations: joint, conditional, marginal probabilities
    • Independence assumptions

• Training:
  – Supervised vs. unsupervised
  – Smoothing

• Decoding:
  – Dynamic programming
  – Pruning

Evaluation of parsers

• Accuracy: ParseVal

• Robustness

• Resources needed

• Efficiency

• Richness

Other things

• Converting into CNF:
  – CFG
  – PCFG
  – Lexicalized CFG

• Treebank annotation:
  – Tagset: syntactic labels, POS tags, function tags, empty categories
  – Format: indentation, brackets