
Parsing I Context-free grammars and issues for parsers.

Date post: 16-Dec-2015
Upload: darcy-adams

Bibliography

• More or less all books on CL or NLP have chapters on parsing, and some are all or mostly about parsing
• Many are written for computer scientists
– They explain linguistic things like parts of speech (POSs) and phrase-structure (PS) grammars
– They go into detail about implementation (e.g. Earley’s algorithm, shift-reduce parsers)

D Jurafsky & JH Martin, Speech and Language Processing. Upper Saddle River, NJ (2000): Prentice Hall. Chs 9 & 10

RM Kaplan, ‘Syntax’ = Ch 4 of R Mitkov (ed), The Oxford Handbook of Computational Linguistics. Oxford (2003): OUP

J Allen, Natural Language Understanding (2nd ed) (1994): Addison Wesley

Parsing

• Bedrock of (almost) all NLP
• Familiar from linguistics
• But issues for NLP are practicalities rather than universality
– Implementation
– Grammar writing
– Interplay with lexicons
– Suitability of representation (what do the trees show?)

Basic issues

• Top-down vs. bottom-up
• Handling ambiguity
– Lexical ambiguity
– Structural ambiguity
• Breadth-first vs. depth-first
• Handling recursive rules
• Handling empty rules

Some terminology

• Rules written A → B c
o Terminal vs. non-terminal symbols
o Left-hand side (head): always non-terminal
o Right-hand side (body): can be a mix of terminal and non-terminal symbols, any number of them
o Unique start symbol (usually S)
o ‘→’ means “rewrites as”, but is not directional (an “=” sign would be better)
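The terminology above can be made concrete with a small data structure. This is only a sketch: the `RULES` list, the `(head, body)` tuple layout and both helper functions are illustrative assumptions, not something the slides prescribe. Word-level categories such as det, n and v count as terminals here because no rule rewrites them; a lexicon would map them to words.

```python
# Hypothetical representation: each rule is a (head, body) pair,
# where head is one non-terminal and body is a tuple of symbols.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
]
START = "S"  # the unique start symbol


def nonterminals(rules):
    """Every symbol that appears as a left-hand side (head)."""
    return {head for head, _ in rules}


def terminals(rules):
    """Every right-hand-side symbol that never appears as a head."""
    heads = nonterminals(rules)
    return {sym for _, body in rules for sym in body if sym not in heads}
```

Storing bodies as tuples keeps rules hashable, which is convenient if they are later placed in sets or used as dictionary keys.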

1. Top-down with simple grammar

Grammar:
S → NP VP
NP → det n
VP → v
VP → v NP

Lexicon:
det → {an, the}
n → {elephant, man}
v → shot

Input: the man shot an elephant

S → NP VP
NP → det n
det → {an, the}: the
n → {elephant, man}: man
VP → v (the first of the two VP rules)
v → shot: shot

No more rules, but the input is not completely accounted for… So we must backtrack, and try the other VP rule.

1. Top-down with simple grammar (after backtracking)

Same grammar and lexicon. Input: the man shot an elephant

S → NP VP
NP → det n: the man
VP → v NP (the second VP rule)
v → shot: shot
NP → det n
det → {an, the}: an
n → {elephant, man}: elephant

No more rules, and the input is completely accounted for.
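The top-down walk-through with backtracking can be sketched as a small recognizer. This is a minimal illustration under assumptions of my own: the dictionary layout and the names `expand` and `recognize` are hypothetical, and the generator simply enumerates every position at which an S could end, so moving on to the next alternative plays the role of backtracking.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "n"]],
    "VP": [["v"], ["v", "NP"]],  # rules are tried in this order
}
LEXICON = {"det": {"an", "the"}, "n": {"elephant", "man"}, "v": {"shot"}}


def expand(symbols, words, pos):
    """Yield every input position reachable after matching `symbols` from `pos`."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        # Word-level category: consume one word if the lexicon allows it.
        if pos < len(words) and words[pos] in LEXICON[first]:
            yield from expand(rest, words, pos + 1)
    else:
        # Non-terminal: try each rule in turn; moving to the next
        # alternative is the backtracking step.
        for body in GRAMMAR[first]:
            yield from expand(body + rest, words, pos)


def recognize(sentence):
    """True if some analysis of S accounts for the whole input."""
    words = sentence.split()
    return any(end == len(words) for end in expand(["S"], words, 0))
```

For "the man shot an elephant", the first analysis ends after "shot" (VP → v) and is rejected because input remains; the second (VP → v NP) reaches the end of the input.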

Breadth-first vs depth-first (1)

• When we came to the VP rule we were faced with a choice of two rules
• “Depth-first” means following the first choice through to the end
• “Breadth-first” means keeping all your options open
• We’ll see this distinction more clearly later, and also see that it is quite significant

2. Bottom-up with simple grammar

Grammar:
S → NP VP
NP → det n
VP → v
VP → v NP

Lexicon:
det → {an, the}
n → {elephant, man}
v → shot

Input: the man shot an elephant

Lexical lookup: det n v det n
NP → det n: the man
VP → v: shot
S → NP VP: the man shot

We’ve reached the top, but the input is not completely accounted for… So we must backtrack, and try the other VP rule.

NP → det n: an elephant
VP → v NP: shot an elephant
S → NP VP: the man shot an elephant

We’ve reached the top, and the input is completely accounted for.
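The bottom-up walk-through can be sketched in the same spirit. The code below is an assumption-laden illustration rather than any particular published algorithm: it starts from the word categories and repeatedly replaces a stretch of symbols that matches a rule body with the rule's head, backtracking when a sequence of reductions gets stuck. The names `reduce_to_start` and `recognize` are hypothetical, and every word is assumed to have exactly one category.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
]
LEXICON = {"an": "det", "the": "det", "elephant": "n", "man": "n", "shot": "v"}


def reduce_to_start(form, seen=None):
    """True if the tuple of symbols `form` can be reduced to ('S',)."""
    if seen is None:
        seen = set()
    if form == ("S",):
        return True
    if form in seen:  # already explored this sequence of symbols
        return False
    seen.add(form)
    for head, body in RULES:
        n = len(body)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == body:
                # Reduce: replace the matched body with its head.
                reduced = form[:i] + (head,) + form[i + n:]
                if reduce_to_start(reduced, seen):
                    return True
    return False  # dead end: the caller backtracks


def recognize(sentence):
    categories = tuple(LEXICON[word] for word in sentence.split())
    return reduce_to_start(categories)
```

On "the man shot an elephant" this sketch also builds the premature S over "the man shot" before falling back to VP → v NP, mirroring the backtracking step on the slide.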

Same again but with lexical ambiguity

Grammar:
S → NP VP
NP → det n
VP → v
VP → v NP

Lexicon:
det → {an, the}
n → {elephant, man, shot}
v → shot

shot can be v or n

3. Top-down with lexical ambiguity

Same grammar; lexicon with n → {elephant, man, shot}. Input: the man shot an elephant

S → NP VP
NP → det n
det → {an, the}: the
n → {elephant, man, shot}: man
VP → v NP
v → shot: shot
NP → det n: an elephant

Same as before: at this point, we are looking for a v, and shot fits the bill; the n reading never comes into play.

4. Bottom-up with lexical ambiguity

Same grammar; lexicon with n → {elephant, man, shot}. Input: the man shot an elephant

Lexical lookup: the = det, man = n, shot = v or n, an = det, elephant = n
NP → det n: the man; an elephant
VP → v: shot
S → NP VP: the man shot
VP → v NP: shot an elephant
S → NP VP: the man shot an elephant

The n arc for shot is built, but no rule can use it.

Terminology:
• graph
• nodes
• arcs (edges)

4. Bottom-up with lexical ambiguity (continued)

Let’s get rid of all the unused arcs

[Figure: chart over "the man shot an elephant" after pruning, keeping det n v det n, NP (the man), NP (an elephant), VP (shot an elephant) and S]

And let’s clear away all the arcs…

[Figure: the remaining parse tree, S over NP and VP]

Breadth-first vs depth-first (2)

• In chart parsing, the distinction is more clear-cut
• At any point there may be a choice of things to do: which arcs to develop
• Breadth-first vs. depth-first can be seen as the order in which they are done
• Queue (FIFO = breadth-first) vs. stack (LIFO = depth-first)
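The queue-versus-stack point can be sketched with a single agenda whose only difference between the two strategies is which end items are taken from. This is an illustration, not a chart parser: the states are simply (remaining symbols, input position) pairs of a top-down recognizer, and `successors`, `search` and the `depth_first` flag are names assumed for the example.

```python
from collections import deque

GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "n")],
    "VP": [("v",), ("v", "NP")],
}
LEXICON = {"det": {"an", "the"}, "n": {"elephant", "man"}, "v": {"shot"}}


def successors(symbols, pos, words):
    """States that follow from expanding or matching the first symbol."""
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:
        if pos < len(words) and words[pos] in LEXICON[first]:
            return [(rest, pos + 1)]
        return []
    return [(body + rest, pos) for body in GRAMMAR[first]]


def search(words, depth_first):
    """Return (end positions found, states in the order they were processed)."""
    agenda = deque([(("S",), 0)])
    ends, trace = [], []
    while agenda:
        # Stack (LIFO) takes the newest item; queue (FIFO) takes the oldest.
        state = agenda.pop() if depth_first else agenda.popleft()
        trace.append(state)
        symbols, pos = state
        if not symbols:
            ends.append(pos)
            continue
        new = successors(symbols, pos, words)
        # On a stack the last item pushed is tried first, so push in reverse
        # to keep the grammar's rule order.
        agenda.extend(reversed(new) if depth_first else new)
    return ends, trace
```

With the stack, the VP → v alternative is followed to its end before VP → v NP is touched; with the queue, both alternatives advance one step at a time.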

Same again but with structural ambiguity

Grammar:
S → NP VP
NP → det n
NP → det n PP
VP → v
VP → v NP
VP → v NP PP
PP → prep NP

Lexicon:
det → {an, the, his}
n → {elephant, man, shot, pyjamas}
v → shot
prep → in

Input: the man shot an elephant in his pyjamas

[Figure: parse tree for "the man shot an elephant", [S [NP det n] [VP v [NP det n]]], and the PP "in his pyjamas", [PP prep [NP det n]], still to be attached]

We introduce a PP in two places: NP → det n PP and VP → v NP PP


5. Top-down with structural ambiguity

Input: the man shot an elephant in his pyjamas

S → NP VP
NP → det n, or NP → det n PP
det → {an, the, his}: the
n → {elephant, man, shot, pyjamas}: man

At this point, depending on our strategy (breadth-first vs. depth-first) we may consider the NP complete and look for the VP, or we may try the second NP rule. Let’s see what happens in the latter case.

NP → det n PP
PP → prep NP
prep → in

The next word, shot, isn’t a prep, so this rule simply fails.

5. Top-down with structural ambiguity (continued)

S → NP VP
NP → det n: the man
VP → v
v → shot: shot

As before, the first VP rule works, but does not account for all the input.

VP → v NP
v → shot: shot
NP → det n: an elephant

Similarly, if we try the second VP rule and the first NP rule, we get shot an elephant, which again does not account for all the input.

5. Top-down with structural ambiguity (continued)

So far: the man (NP), then shot an elephant (VP → v NP, NP → det n).

So what do we try next? The second NP rule (NP → det n PP)? Or the third VP rule (VP → v NP PP)?

• Depth-first: it’s a stack, LIFO
• Breadth-first: it’s a queue, FIFO

5. Top-down with structural ambiguity (depth-first)

S → NP VP
NP → det n: the man
VP → v NP
v → shot: shot
NP → det n PP: an elephant, followed by a PP
PP → prep NP
prep → in: in
NP → det n: his pyjamas

Result: [S [NP the man] [VP shot [NP an elephant [PP in his pyjamas]]]]

5. Top-down with structural ambiguity (breadth-first)

S → NP VP
NP → det n: the man
VP → v NP PP
v → shot: shot
NP → det n: an elephant
PP → prep NP
prep → in: in
NP → det n: his pyjamas

Result: [S [NP the man] [VP shot [NP an elephant] [PP in his pyjamas]]]

Recognizing ambiguity

• Notice how the choice of strategy determines which result we get (first).
• In both strategies, there are often rules left untried on the list (whether queue or stack).
• If we want to know whether our input is ambiguous, at some point we do have to follow these through.
• As you will see later, trying out alternative paths can be quite computationally intensive.
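To find out whether an input is ambiguous, every alternative has to be followed through rather than stopping at the first success. The sketch below does this for the structural-ambiguity grammar by yielding a tree for each complete analysis. The tuple-based tree format and the names `parse`, `parse_seq` and `all_parses` are assumptions made for the illustration.

```python
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("det", "n"), ("det", "n", "PP")],
    "VP": [("v",), ("v", "NP"), ("v", "NP", "PP")],
    "PP": [("prep", "NP")],
}
LEXICON = {
    "det": {"an", "the", "his"},
    "n": {"elephant", "man", "shot", "pyjamas"},
    "v": {"shot"},
    "prep": {"in"},
}


def parse(symbol, words, pos):
    """Yield (tree, end) for every way `symbol` can cover words[pos:end]."""
    if symbol in LEXICON:
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            yield (symbol, words[pos]), pos + 1
        return
    for body in GRAMMAR[symbol]:
        for children, end in parse_seq(body, words, pos):
            yield (symbol, *children), end


def parse_seq(symbols, words, pos):
    """Yield (tuple of trees, end) for a sequence of symbols."""
    if not symbols:
        yield (), pos
        return
    for tree, mid in parse(symbols[0], words, pos):
        for rest, end in parse_seq(symbols[1:], words, mid):
            yield (tree, *rest), end


def all_parses(sentence):
    """Every tree for S that accounts for the whole input."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]
```

For "the man shot an elephant in his pyjamas" this returns two trees: one with the PP inside the object NP and one with the PP attached directly to the VP. The sketch relies on the grammar having no left-recursive rules.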

6. Bottom-up with structural ambiguity

Grammar:
S → NP VP
NP → det n
NP → det n PP
VP → v
VP → v NP
VP → v NP PP
PP → prep NP

Input: the man shot an elephant in his pyjamas

Lexical lookup: det n v det n prep det n

Arcs built:
NP → det n: the man; an elephant; his pyjamas
PP → prep NP: in his pyjamas
NP → det n PP: an elephant in his pyjamas
VP → v: shot
VP → v NP: shot an elephant; shot an elephant in his pyjamas (PP inside the NP)
VP → v NP PP: shot an elephant in his pyjamas (PP attached to the VP)
S → NP VP: one S for each of the four VPs; only the last two account for all the input

[Figure: the same chart with the unused arcs removed]

Recursive rules

• “Recursive” rules call themselves
• We already have some recursive rule pairs:
NP → det n PP
PP → prep NP
• Rules can be immediately recursive:
AdjG → adj AdjG
(the) big fat ugly (man)

Recursive rules

Left recursive:
AdjG → AdjG adj
AdjG → adj

[AdjG [AdjG [AdjG [AdjG big] fat] rich] old]

Right recursive:
AdjG → adj AdjG
AdjG → adj

[AdjG big [AdjG fat [AdjG rich [AdjG old]]]]

7. Top-down with left recursion

Grammar:
NP → det n
NP → det AdjG n
AdjG → AdjG adj
AdjG → adj

Input: the big fat rich old man

NP → det n: det matches the, but big is not an n
NP → det AdjG n
det: the
AdjG → AdjG adj
AdjG → AdjG adj
AdjG → AdjG adj
AdjG → AdjG adj
… and so on, without ever consuming a word

You can’t have left-recursive rules with a top-down parser, even if the non-recursive rule is first
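Since a top-down parser expands the leftmost symbol first, a rule such as AdjG → AdjG adj can re-expand AdjG indefinitely without consuming input. One possible safeguard, sketched here as an assumption rather than something the slides propose, is to check a grammar for left recursion before handing it to a top-down parser. The function name `left_recursive` is hypothetical, and the check assumes the grammar has no empty rules.

```python
def left_recursive(grammar):
    """Return the non-terminals that can reach themselves as a leftmost symbol.

    `grammar` maps each head to a list of bodies (tuples of symbols).
    Assumes no empty rules.
    """
    # Direct "leftmost symbol" relation: head -> first symbol of each body.
    first = {head: {body[0] for body in bodies if body}
             for head, bodies in grammar.items()}
    result = set()
    for start in grammar:
        seen, todo = set(), list(first[start])
        while todo:  # follow leftmost symbols transitively
            sym = todo.pop()
            if sym in seen:
                continue
            seen.add(sym)
            todo.extend(first.get(sym, ()))
        if start in seen:
            result.add(start)
    return result


LEFT = {
    "NP": [("det", "n"), ("det", "AdjG", "n")],
    "AdjG": [("AdjG", "adj"), ("adj",)],
}
RIGHT = {
    "NP": [("det", "n"), ("det", "AdjG", "n")],
    "AdjG": [("adj", "AdjG"), ("adj",)],
}
```

Following leftmost symbols transitively also catches indirect left recursion, where a symbol reaches itself only through other rules.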

7. Top-down with right recursion

Grammar:
NP → det n
NP → det AdjG n
AdjG → adj AdjG
AdjG → adj

Input: the big fat rich old man

NP → det AdjG n
det: the
AdjG → adj AdjG: big
AdjG → adj AdjG: fat
AdjG → adj AdjG: rich
AdjG → adj AdjG: old, but the following AdjG fails because man is not an adj
AdjG → adj: old
n: man

8. Bottom-up with left and right recursion

Grammar:
NP → det n
NP → det AdjG n
AdjG → AdvG adj AdjG
AdjG → adj
AdvG → AdvG adv
AdvG → adv

AdjG rule is right recursive, AdvG rule is left recursive

Input: the very very fat ugly man

Lexical lookup: det adv adv adj adj n
AdjG → adj: fat; ugly
AdvG → adv: very; very
AdvG → AdvG adv: very very
AdjG → AdvG adj AdjG: very fat ugly; very very fat ugly
NP → det AdjG n: the very very fat ugly man

Quite a few useless paths, but overall no difficulty

[Figure: the same chart with the useless arcs removed]

Empty rules

• For example
NP → det AdjG n
AdjG → adj AdjG
AdjG → ε
• Equivalent to
NP → det AdjG n
NP → det n
AdjG → adj
AdjG → adj AdjG
• Or
NP → det (AdjG) n
AdjG → adj (AdjG)
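The equivalence between the grammar with AdjG → ε and the grammar without it can be computed mechanically. The following is a sketch of the usual empty-rule elimination, with `eliminate_empty` and the `(head, body)` rule format assumed for illustration: find the symbols that can rewrite to nothing, then emit each rule once with and once without every such symbol.

```python
from itertools import product


def eliminate_empty(rules):
    """Return a set of (head, body) rules with no empty bodies."""
    # Step 1: find the nullable symbols (those that can rewrite to nothing).
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    # Step 2: each nullable symbol in a body may be kept or dropped.
    out = set()
    for head, body in rules:
        options = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for choice in product(*options):
            new_body = tuple(s for part in choice for s in part)
            if new_body:  # never emit an empty body
                out.add((head, new_body))
    return out


RULES = [
    ("NP", ("det", "AdjG", "n")),
    ("AdjG", ("adj", "AdjG")),
    ("AdjG", ()),  # AdjG → ε
]
```

Applied to `RULES`, this yields exactly the four rules listed under "Equivalent to". If the start symbol itself were nullable, the empty string would be lost, so the result is equivalent only for non-empty inputs.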

7. Top-down with empty rules

Grammar:
NP → det AdjG n
AdjG → adj AdjG
AdjG → ε

Input: the man

NP → det AdjG n
det: the
AdjG → adj AdjG: fails, man is not an adj
AdjG → ε
n: man

Input: the big fat man

NP → det AdjG n
det: the
AdjG → adj AdjG: big
AdjG → adj AdjG: fat
AdjG → ε
n: man

8. Bottom-up with empty rules

Grammar:
NP → det AdjG n
AdjG → adj AdjG
AdjG → ε

Input: the fat man

Lexical lookup: det adj n
AdjG → ε: an empty AdjG can be posited at every position (before the, before fat, before man, and at the end)
AdjG → adj AdjG: fat plus the empty AdjG that follows it
NP → det AdjG n: the fat man

Lots of useless paths, especially in a long sentence, but otherwise no difficulty

Some additions to formalism

• Recursive rules build unattractive tree structures: you’d rather have flat trees with unrestricted numbers of daughters
• Kleene star
• AdjG → adj*

[Figure: flat tree for "the big fat old ugly man": NP over det, AdjG and n, with AdjG directly over adj adj adj adj]

Some additions to formalism

• As grammars grow, the rule combinations multiply and it gets clumsy
NP → det n
NP → det AdjG n
NP → det n PP
NP → det AdjG n PP
NP → n
NP → AdjG n
NP → n PP
NP → AdjG n PP

• With brackets marking optional symbols, all eight collapse into one rule:
NP → (det) (AdjG) n (PP)

Processing implications

• Parsing with Kleene star
– Neatly combines empty rules and recursive rules

9. Top-down with Kleene star

Grammar:
NP → det adj* n

The star is processed as two alternatives:
adj* = adj adj*
adj* = ε

Input: the man

NP → det adj* n
det: the
adj* = adj adj*: fails, man is not an adj
adj* = ε
n: man

Input: the big fat man

NP → det adj* n
det: the
adj* = adj adj*: big
adj* = adj adj*: fat
adj* = ε
n: man
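The two alternatives for the star (consume one adj and stay on adj*, or treat adj* as empty) translate directly into a small matcher. This is a sketch for a single rule body only; the `'x*'` string convention, the word-to-category `LEXICON` and the names `match` and `is_np` are assumptions made for the example.

```python
LEXICON = {"the": "det", "big": "adj", "fat": "adj", "man": "n"}


def match(body, cats, pos=0):
    """Yield end positions after matching `body` against cats[pos:].

    A symbol written 'x*' stands for zero or more occurrences of x.
    """
    if not body:
        yield pos
        return
    first, rest = body[0], body[1:]
    if first.endswith("*"):
        # adj* = adj adj*: consume one item and stay on the starred symbol.
        if pos < len(cats) and cats[pos] == first[:-1]:
            yield from match(body, cats, pos + 1)
        # adj* = ε: skip the starred symbol without consuming anything.
        yield from match(rest, cats, pos)
    elif pos < len(cats) and cats[pos] == first:
        yield from match(rest, cats, pos + 1)


def is_np(phrase):
    """True if the phrase matches NP → det adj* n in full."""
    cats = [LEXICON[word] for word in phrase.split()]
    return any(end == len(cats) for end in match(["det", "adj*", "n"], cats))
```

Because the starred symbol is matched in place, the resulting structure stays flat: no intermediate AdjG nodes are needed.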

10. Bottom-up with Kleene star

Grammar:
NP → det adj* n

Input: the fat man
Lexical lookup: det adj n
NP → det adj* n: the fat man

Input: the fat ugly man
Lexical lookup: det adj adj n
NP → det adj* n: the fat ugly man

Processing implications

• Parsing with bracketed symbols
– Parser has to expand rules
• either in a single pass beforehand
• or (better) on the fly (as it comes to them)
– So bracketing convention is just a convenience for rule-writers

For example, expanding the first optional symbol of
NP → (det) (AdjG) n (PP)
gives
NP → det (AdjG) n (PP)
NP → (AdjG) n (PP)
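The "single pass beforehand" option can be sketched as follows, taking the bracketed rule to be NP → (det) (AdjG) n (PP), the form that yields the eight NP rules listed earlier. The function name `expand_optional` and the convention of writing a rule body as a space-separated string are assumptions for the illustration.

```python
from itertools import product


def expand_optional(head, body):
    """Expand symbols written '(X)' into rules with and without X."""
    options = []
    for sym in body.split():
        if sym.startswith("(") and sym.endswith(")"):
            options.append([sym[1:-1], None])  # keep the symbol or drop it
        else:
            options.append([sym])  # obligatory symbol
    return [
        (head, tuple(s for s in choice if s is not None))
        for choice in product(*options)
    ]


NP_RULES = expand_optional("NP", "(det) (AdjG) n (PP)")
```

Each optional symbol doubles the number of plain rules, so three optional symbols give the eight NP rules. That growth is one reason expanding on the fly may be preferable for larger grammars.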

Top-down vs. bottom-up

• Bottom-up builds many useless trees
• Top-down can propose false trails, sometimes quite long, which are only abandoned when they reach the word level
– Especially a problem if breadth-first
• Bottom-up is very inefficient with empty rules
• Top-down CANNOT handle left-recursion
• Top-down cannot do partial parsing
– Partial parsing is especially useful for speech
• Wouldn’t it be nice to combine them to get the advantages of both?

Left-corner parsing

• The “left corner” of a rule is the first symbol after the rewrite arrow
– e.g. in S → NP VP, the left corner is NP.
• Left-corner parsing starts bottom-up, taking the first item off the input and finding a rule for which it is the left corner.
• This provides a top-down prediction, but we continue working bottom-up until the prediction is fulfilled.
• When a rule is completed, apply the left-corner principle: is that completed constituent a left corner?
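The steps above can be sketched as a small recognizer. This is a bare illustration under several assumptions: the names `parse`, `complete`, `parse_seq` and `recognize` are hypothetical, every word has exactly one category, the grammar has no empty rules, and no top-down filtering is applied when choosing a rule for a left corner.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "n")),
    ("VP", ("v",)),
    ("VP", ("v", "NP")),
]
LEXICON = {"an": "det", "the": "det", "elephant": "n", "man": "n", "shot": "v"}


def parse(goal, words, pos):
    """Yield end positions where a `goal` starting at `pos` can end."""
    if pos < len(words):
        # Bottom-up step: take the next word off the input.
        yield from complete(LEXICON[words[pos]], goal, words, pos + 1)


def complete(cat, goal, words, pos):
    """`cat` has just been found; try to grow it into `goal`."""
    if cat == goal:
        yield pos
    for head, body in RULES:
        if body[0] == cat:  # cat is the left corner of this rule
            # Top-down prediction: the rest of the body must follow.
            for end in parse_seq(body[1:], words, pos):
                # The rule is completed; is its head a left corner in turn?
                yield from complete(head, goal, words, end)


def parse_seq(symbols, words, pos):
    """Yield end positions after finding each predicted symbol in order."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], words, pos):
        yield from parse_seq(symbols[1:], words, mid)


def recognize(sentence):
    words = sentence.split()
    return any(end == len(words) for end in parse("S", words, 0))
```

For "the man shot an elephant", det leads to NP, NP is the left corner of S → NP VP, and the predicted VP is found first as shot (leaving input unaccounted for) and then as shot an elephant.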

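9. Left-corner with simple grammar

Grammar:
S → NP VP
NP → det n
VP → v
VP → v NP

Input: the man shot an elephant

the: det
det is the left corner of NP → det n, so predict n
man: n, so NP is complete
NP is the left corner of S → NP VP, so predict VP
shot: v
v is the left corner of VP → v, so VP is complete, and so is S

But the text is not all accounted for, so try VP → v NP

v is the left corner of VP → v NP, so predict NP
an: det
det is the left corner of NP → det n, so predict n
elephant: n, so NP is complete, then VP, then S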

