CS 4705 Parsing More Efficiently and Accurately. Review Top-Down vs. Bottom-Up Parsers Left-corner...

Date post: 28-Dec-2015
Upload: brian-allison
CS 4705 Parsing More Efficiently and Accurately
Page 1: CS 4705 Parsing More Efficiently and Accurately. Review Top-Down vs. Bottom-Up Parsers Left-corner table provides more efficient look- ahead Left recursion.

CS 4705

Parsing More Efficiently and Accurately

Review

• Top-Down vs. Bottom-Up Parsers

• Left-corner table provides more efficient look-ahead

• Left recursion solutions

• Structural ambiguity…solutions?

Issues for Better Parsing

• Efficiency

• Error handling

• Control strategies

• Agreement and subcategorization

Inefficient Re-Parsing of Subtrees

Dynamic Programming

• Create table of solutions to sub-problems (e.g. subtrees) as parse proceeds

• Look up subtrees for each constituent rather than re-parsing

• Since all parses implicitly stored, all available for later disambiguation

• Examples: Cocke-Younger-Kasami (CYK) (1960s), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms
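
To make the table-of-sub-problems idea concrete, here is a minimal CYK-style recognizer sketch. The toy grammar in Chomsky normal form and the names `BINARY`, `LEXICON` and `cky_table` are illustrative assumptions, not part of the lecture; each table cell stores the categories found for one span, so no span is ever re-parsed.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (not the lecture's grammar).
BINARY = {("V", "NP"): {"VP", "S"}, ("Det", "N"): {"NP"}}
LEXICON = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}

def cky_table(words):
    """Fill table[i, j] with every category that spans words[i:j]."""
    n = len(words)
    table = defaultdict(set)
    # Width-1 spans come straight from the lexicon.
    for i, word in enumerate(words):
        table[i, i + 1] |= LEXICON.get(word, set())
    # Wider spans are built only from already-stored smaller spans.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in table[start, mid]:
                    for right in table[mid, end]:
                        table[start, end] |= BINARY.get((left, right), set())
    return table

table = cky_table(["book", "that", "flight"])
print(sorted(table[0, 3]))
```

The NP for "that flight" is computed once, stored in `table[1, 3]`, and simply looked up when the whole-sentence span is built.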

Earley’s Algorithm

• Uses dynamic programming to do parallel top-down search in (worst case) O(N^3) time

• First, a left-to-right pass fills out a chart with N+1 entries (N: the number of words in the input)

– Think of chart entries as sitting between words in the input string, keeping track of states of the parse at these positions

– For each word position, chart contains set of states representing all partial parse trees generated to date. E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence

• Chart entries represent three types of constituents:

– predicted constituents (top-down predictions)

– in-progress constituents (we’re in the midst of …)

– completed constituents (we’ve found …)

• Progress in parse represented by Dotted Rules

– Position of • indicates type of constituent

– 0 Book 1 that 2 flight 3

S --> • VP, [0,0] (predicting VP)

NP --> Det • Nom, [1,2] (finding NP)

VP --> V NP •, [0,3] (found VP)

– [x,y] tells us where the state begins (x) and where the dot lies (y) with respect to the input
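
One possible way to represent a dotted rule in code is sketched below. The `State` class, its field names and the `kind` method are assumptions made for illustration; the three example states are the ones shown above.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class State:
    lhs: str               # left-hand side of the rule
    rhs: Tuple[str, ...]   # right-hand side symbols
    dot: int               # how many rhs symbols have been recognized so far
    start: int             # x: where the constituent begins
    end: int               # y: where the dot lies in the input

    def kind(self):
        """Classify the state by the position of the dot."""
        if self.dot == len(self.rhs):
            return "completed"
        if self.dot == 0:
            return "predicted"
        return "in-progress"

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} --> {' '.join(symbols)}, [{self.start},{self.end}]"

examples = [
    State("S", ("VP",), 0, 0, 0),
    State("NP", ("Det", "Nom"), 1, 1, 2),
    State("VP", ("V", "NP"), 2, 0, 3),
]
for state in examples:
    print(state, "->", state.kind())
```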

S --> • VP, [0,0]

– First 0 means S constituent begins at the start of the input

– Second 0 means the dot is here too

– So, this is a top-down prediction

NP --> Det • Nom, [1,2]

– the NP begins at position 1

– the dot is at position 2

– so, Det has been successfully parsed

– Nom predicted next

0 Book 1 that 2 flight 3

VP --> V NP •, [0,3]

– Successful VP parse of entire input

– Graphical representation

Successful Parse

• Final answer is found by looking at last entry in chart

• If entry resembles S --> α • [0,N] (a completed S rule spanning the whole input) then input parsed successfully

• But … note that chart will also contain a record of all possible parses of input string, given the grammar -- not just the successful one(s)

– Why is this useful?

Parsing Procedure for the Earley Algorithm

• Move through each set of states in order, applying one of three operators to each state:

– predictor: add top-down predictions to the chart

– scanner: read input and add corresponding state to chart

– completer: move dot to right when new constituent found

• Results (new states) added to current or next set of states in chart

• No backtracking and no states removed: keep complete history of parse

– Why is this useful?

Predictor

• Intuition: new states represent top-down expectations

• Applied when non part-of-speech non-terminals are to the right of a dot

S --> • VP [0,0]

• Adds new states to end of current chart

– One new state for each expansion of the non-terminal in the grammar

VP --> • V [0,0]

VP --> • V NP [0,0]
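
The Predictor step can be sketched as follows. States are plain tuples `(lhs, rhs, dot, start, end)`, and the `GRAMMAR` dictionary and the function name `predictor` are assumptions for illustration; only the VP rules from the example are included.

```python
# Hypothetical grammar fragment: non-terminal -> list of expansions.
GRAMMAR = {"VP": [("V",), ("V", "NP")]}

def predictor(state, chart):
    """Add one new state per expansion of the non-terminal right of the dot."""
    lhs, rhs, dot, start, end = state
    next_symbol = rhs[dot]
    for expansion in GRAMMAR[next_symbol]:
        # A prediction begins and ends where the dot currently lies.
        new_state = (next_symbol, expansion, 0, end, end)
        if new_state not in chart[end]:
            chart[end].append(new_state)

# S --> • VP [0,0] triggers predictions for both VP rules.
chart = [[("S", ("VP",), 0, 0, 0)]]
predictor(chart[0][0], chart)
print(chart[0][1:])
```

The duplicate check matters for left-recursive rules: without it, a rule such as Nom --> Nom PP would keep predicting itself.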

Scanner

• New states for predicted part of speech.

• Applicable when part of speech is to the right of a dot

VP --> • V NP [0,0] ‘Book…’

• Looks at current word in input

• If match, adds state(s) to next chart

VP --> V • NP [0,1]

• I.e., we’ve found a piece of this constituent!
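
A minimal sketch of the Scanner as described on this slide, where the dot is moved over the matched part of speech directly. The tuple state layout, the small `LEXICON` and the function name are assumptions. The Chart[1] trace later in these slides uses a slight variant that records V --> book • and lets the Completer advance the waiting rules; both routes reach VP --> V • NP [0,1].

```python
# Hypothetical lexicon: word -> set of parts of speech.
LEXICON = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}}

def scanner(state, words, chart):
    """If the current word can be the expected part of speech, advance the dot."""
    lhs, rhs, dot, start, end = state
    expected_pos = rhs[dot]
    if end < len(words) and expected_pos in LEXICON.get(words[end].lower(), set()):
        new_state = (lhs, rhs, dot + 1, start, end + 1)
        if new_state not in chart[end + 1]:
            chart[end + 1].append(new_state)

# VP --> • V NP [0,0] with current word 'Book'
chart = [[("VP", ("V", "NP"), 0, 0, 0)], []]
scanner(chart[0][0], ["Book", "that", "flight"], chart)
print(chart[1])
```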

Completer

• Intuition: we’ve found a constituent, so tell everyone waiting for this

• Applied when dot has reached right end of rule

NP --> Det Nom • [1,3]

• Find all states w/dot at 1 and expecting an NP

VP --> V • NP [0,1]

• Adds new (completed) state(s) to current chart

VP --> V NP • [0,3]
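
The Completer example above can be sketched like this, again with states as `(lhs, rhs, dot, start, end)` tuples; the function name and chart layout are illustrative assumptions.

```python
def completer(state, chart):
    """Advance every state that was waiting for the constituent just completed."""
    lhs, rhs, dot, start, end = state
    # Waiting states live in the chart entry where the completed constituent began.
    for (w_lhs, w_rhs, w_dot, w_start, w_end) in list(chart[start]):
        if w_dot < len(w_rhs) and w_rhs[w_dot] == lhs:
            new_state = (w_lhs, w_rhs, w_dot + 1, w_start, end)
            if new_state not in chart[end]:
                chart[end].append(new_state)

# chart[1] holds VP --> V • NP [0,1]; NP --> Det Nom • [1,3] has just been completed.
chart = [[], [("VP", ("V", "NP"), 1, 0, 1)], [], []]
completer(("NP", ("Det", "Nom"), 2, 1, 3), chart)
print(chart[3])
```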

Book that flight (Chart [0])

• Seed chart with top-down predictions for S from grammar

γ --> • S [0,0] Dummy start state
S --> • NP VP [0,0] Predictor
S --> • Aux NP VP [0,0] Predictor
S --> • VP [0,0] Predictor
NP --> • Det Nom [0,0] Predictor
NP --> • PropN [0,0] Predictor
VP --> • V [0,0] Predictor
VP --> • V NP [0,0] Predictor

CFG for Fragment of English

S --> NP VP
S --> Aux NP VP
S --> VP
NP --> Det Nom
NP --> PropN
Nom --> N Nom
Nom --> N
Nom --> Nom PP
VP --> V
VP --> V NP
PP --> Prep NP

Det --> that | this | a
N --> book | flight | meal | money
V --> book | include | prefer
Aux --> does
Prep --> from | to | on
PropN --> Houston | TWA

• When dummy start state is processed, it’s passed to Predictor, which produces states representing every possible expansion of S, and adds these and every expansion of the left corners of these trees to bottom of Chart[0]

• When VP --> • V, [0,0] is reached, Scanner is called, which consults the first word of input, Book, and adds the first state to Chart[1], V --> book •, [0,1]

• Note: When VP --> • V NP, [0,0] is reached in Chart[0], Scanner does not need to add V --> book •, [0,1] again to Chart[1]

Chart[1]

V --> book • [0,1] Scanner
VP --> V • [0,1] Completer
VP --> V • NP [0,1] Completer
S --> VP • [0,1] Completer
NP --> • Det Nom [1,1] Predictor
NP --> • PropN [1,1] Predictor

V --> book • is passed to Completer, which finds 2 states in Chart[0] whose left corner is V and adds them to Chart[1], moving dots to right

• When VP --> V • is itself processed by the Completer, S --> VP • is added to Chart[1] since VP is a left corner of S

• Last 2 rules in Chart[1] are added by Predictor when VP --> V • NP is processed

• And so on….
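
Putting the three operators together, the whole trace can be reproduced with a compact recognizer sketch. This is one possible arrangement, not the lecture's own code: states are `(lhs, rhs, dot, start, end)` tuples, `GAMMA` stands in for the dummy start symbol, and the Scanner follows the Chart[1] trace by adding a completed part-of-speech state such as V --> book •. The grammar is the CFG fragment given earlier.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N", "Nom"), ("N",), ("Nom", "PP")],
    "VP": [("V",), ("V", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {  # part of speech -> words (lower-cased)
    "Det": {"that", "this", "a"},
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
}

def add(state, entry):
    """Append a state unless it is already present (no duplicates, no removal)."""
    if state not in entry:
        entry.append(state)

def earley(words):
    words = [w.lower() for w in words]
    chart = [[] for _ in range(len(words) + 1)]
    add(("GAMMA", ("S",), 0, 0, 0), chart[0])  # dummy start state
    for i in range(len(words) + 1):
        for state in chart[i]:  # the list may grow while we iterate over it
            lhs, rhs, dot, start, end = state
            if dot == len(rhs):  # Completer
                for (l2, r2, d2, s2, e2) in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2, i), chart[i])
            elif rhs[dot] in LEXICON:  # Scanner
                if i < len(words) and words[i] in LEXICON[rhs[dot]]:
                    add((rhs[dot], (words[i],), 1, i, i + 1), chart[i + 1])
            else:  # Predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], expansion, 0, i, i), chart[i])
    return chart

def recognized(chart):
    """True if the last chart entry holds a completed S spanning the input."""
    return any(lhs == "S" and dot == len(rhs) and start == 0
               for (lhs, rhs, dot, start, end) in chart[-1])

chart = earley("Book that flight".split())
for state in chart[1]:
    print(state)
print(recognized(chart))
```

In this sketch Chart[1] also receives the advanced dummy state (GAMMA --> S • [0,1]) after the six states listed in the trace, because a complete S spanning "Book" has been found at that point.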

How do we retrieve the parses at the end?

• Augment the Completer to add ptr to prior states it advances as a field in the current state

– I.e. what state did we advance here?

– Read the ptrs back from the final state
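
A small sketch of the pointer idea: each state carries the completed states that advanced its dot, and a tree is read back from the final state. The `State` class, `advance` and `tree` are hypothetical names; the states are built by hand here rather than by a full parser.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class State:
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    start: int
    end: int
    children: Tuple["State", ...] = ()  # pointers to the completed states used so far

def advance(waiting, completed):
    """What the augmented Completer does: move the dot and record a pointer."""
    return State(waiting.lhs, waiting.rhs, waiting.dot + 1, waiting.start,
                 completed.end, waiting.children + (completed,))

def tree(state):
    """Read the pointers back from a completed state into a nested tuple."""
    if not state.children:  # part-of-speech state such as V --> book •
        return (state.lhs,) + state.rhs
    return (state.lhs,) + tuple(tree(child) for child in state.children)

v = State("V", ("book",), 1, 0, 1)
det = State("Det", ("that",), 1, 1, 2)
n = State("N", ("flight",), 1, 2, 3)
nom = advance(State("Nom", ("N",), 0, 2, 2), n)
np = advance(advance(State("NP", ("Det", "Nom"), 0, 1, 1), det), nom)
vp = advance(advance(State("VP", ("V", "NP"), 0, 0, 0), v), np)
s = advance(State("S", ("VP",), 0, 0, 0), vp)
print(tree(s))
```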

Error Handling

• What happens when we look at the contents of the last table column and don't find a completed S rule?

– Is it a total loss? No...

– Chart contains every constituent and combination of constituents possible for the input given the grammar

• Also useful for partial parsing or shallow parsing used in information extraction

Alternative Control Strategies

• Change Earley top-down strategy to bottom-up or ...

• Change to best-first strategy based on the probabilities of constituents

– Compute and store probabilities of constituents in the chart as you parse

– Then instead of expanding states in fixed order, allow probabilities to control order of expansion
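
One way to let probabilities control the order of expansion is a priority queue (agenda) instead of a fixed left-to-right sweep. The sketch below shows only the agenda; the `Agenda` class and the probabilities are made up for illustration, and how constituent probabilities are computed is left open.

```python
import heapq
import itertools

class Agenda:
    """Priority queue that always hands back the most probable state first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker so states are never compared

    def push(self, prob, state):
        # heapq is a min-heap, so store the negated probability.
        heapq.heappush(self._heap, (-prob, next(self._counter), state))

    def pop(self):
        neg_prob, _, state = heapq.heappop(self._heap)
        return -neg_prob, state

    def __bool__(self):
        return bool(self._heap)

agenda = Agenda()
agenda.push(0.2, "NP --> • PropN [1,1]")     # hypothetical probabilities
agenda.push(0.7, "NP --> • Det Nom [1,1]")
agenda.push(0.5, "VP --> V • NP [0,1]")

order = []
while agenda:
    prob, state = agenda.pop()
    order.append(state)
print(order)
```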

But there are still problems…

• Several things CFGs don’t handle elegantly:

– Agreement (A cat sleeps. Cats sleep.)

S --> NP VP

NP --> Det Nom

But these rules overgenerate, allowing, e.g., *A cat sleep…

– Subcategorization (Cats dream. Cats eat cantaloupe.)

VP --> V

VP --> V NP

But these also allow *Cats dream cantaloupe.

• We need to constrain the grammar rules to enforce e.g. number agreement and subcategorization differences

CFG Solution

• Encode constraints into the non-terminals

– Noun/verb agreement

S --> SgS
S --> PlS
SgS --> SgNP SgVP
SgNP --> SgDet SgNom

– Verb subcat:

IntransVP --> IntransV
TransVP --> TransV NP
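
To see how quickly this encoding grows, the sketch below mechanically copies each rule once per feature combination. The helper name `expand` and the choice of features are assumptions for illustration only.

```python
from itertools import product

# Two base rules from the agreement example.
BASE_RULES = [("S", ("NP", "VP")), ("NP", ("Det", "Nom"))]

def expand(rules, feature_values):
    """Make one copy of every rule per combination of feature values."""
    combos = ["".join(combo) for combo in product(*feature_values)]
    return [(prefix + lhs, tuple(prefix + sym for sym in rhs))
            for prefix in combos
            for lhs, rhs in rules]

number_only = expand(BASE_RULES, [["Sg", "Pl"]])
person_and_number = expand(BASE_RULES, [["1", "2", "3"], ["Sg", "Pl"]])

for lhs, rhs in number_only:
    print(lhs, "-->", " ".join(rhs))
print(len(number_only), len(person_and_number))
```

Each additional feature multiplies the rule count by its number of values, and this copying is needed for every rule that mentions an affected category.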

• But this means huge proliferation of rules…

• An alternative:

– View terminals and non-terminals as complex objects with associated features, which take on different values

– Write grammar rules whose application is constrained by tests on these features, e.g.

S --> NP VP (only if the NP and VP agree in number)

Feature Structures

• Sets of feature-value pairs where:

– Features are atomic symbols

– Values are atomic symbols or feature structures

– Illustrated by attribute-value matrix

[ Feature_1  Value_1 ]
[ Feature_2  Value_2 ]
[ ...        ...     ]
[ Feature_n  Value_n ]

• Number feature

[ Num  SG ]

• Number-person features

[ Num   SG ]
[ Pers  3  ]

• Number-person-category features (3sgNP)

[ Cat   NP ]
[ Num   SG ]
[ Pers  3  ]
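
Feature structures map naturally onto nested dictionaries. The sketch below writes the three examples as Python dicts and adds one nested structure to show a value that is itself a feature structure; the `AGR` grouping and the `get_path` helper are assumptions for illustration.

```python
# The three attribute-value matrices as dictionaries.
number = {"Num": "SG"}
number_person = {"Num": "SG", "Pers": "3"}
np_3sg = {"Cat": "NP", "Num": "SG", "Pers": "3"}

# A value may itself be a feature structure, e.g. grouping agreement features.
np_3sg_nested = {"Cat": "NP", "AGR": {"Num": "SG", "Pers": "3"}}

def get_path(fs, path):
    """Follow a feature path such as <AGR Num> through nested structures."""
    for feature in path:
        fs = fs[feature]
    return fs

print(get_path(np_3sg, ["Num"]))
print(get_path(np_3sg_nested, ["AGR", "Pers"]))
```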

Features, Unification and Grammars

• How do we incorporate feature structures into our grammars?

– Assume that constituents are objects which have feature-structures associated with them

– Associate sets of unification constraints with grammar rules

– Constraints must be satisfied for rule to be satisfied

• To enforce subject/verb number agreement

S --> NP VP

<NP NUM> = <VP NUM>
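
A simplified sketch of how such a constraint could be checked. Feature structures are nested dicts, `unify` is a basic non-destructive unifier without shared (reentrant) values, and `project` and `s_rule_applies` are hypothetical helpers; a full unification grammar implementation would need more than this.

```python
def unify(a, b):
    """Return the merged structure, or None if the two structures clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None
                result[feature] = merged
            else:
                result[feature] = value
        return result
    return a if a == b else None  # atomic values must match exactly

def project(fs, feature):
    """Keep only one feature; a missing feature counts as unspecified."""
    return {feature: fs[feature]} if feature in fs else {}

def s_rule_applies(np, vp):
    """S --> NP VP with the constraint <NP NUM> = <VP NUM>."""
    return unify(project(np, "NUM"), project(vp, "NUM")) is not None

a_cat = {"CAT": "NP", "NUM": "SG"}
cats = {"CAT": "NP", "NUM": "PL"}
sleeps = {"CAT": "VP", "NUM": "SG"}
sleep = {"CAT": "VP", "NUM": "PL"}

print(s_rule_applies(a_cat, sleeps))  # A cat sleeps
print(s_rule_applies(a_cat, sleep))   # *A cat sleep
```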

Agreement in English

• We need to add PERS to our subj/verb agreement constraint

This cat likes kibble.

S --> NP VP

<NP AGR> = <VP AGR>

Do these cats like kibble?

S --> Aux NP VP

<Aux AGR> = <NP AGR>

• Det/Nom agreement can be handled similarly

These cats

This cat

NP --> Det Nom

<Det AGR> = <Nom AGR>

<NP AGR> = <Nom AGR>

• And so on …

Verb Subcategorization

• Recall: Different verbs take different types of argument

– Solution: SUBCAT feature, or subcategorization frames

e.g. Bill wants George to eat.

[ ORTH  want ]
[ CAT   V    ]
[ HEAD  [ SUBCAT  < [ CAT NP ], [ CAT VP, HEAD [ VFORM INF ] ] > ] ]
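
A rough sketch of checking a verb's subcategorization frame against the complements actually found. The entry for "want" assumes the frame in the example (an NP followed by an infinitival VP); the dict layout and the helpers `matches` and `satisfies_subcat` are illustrative assumptions.

```python
# Lexical entry for 'want': expects an NP and an infinitival VP.
WANT = {
    "ORTH": "want",
    "CAT": "V",
    "HEAD": {"SUBCAT": [{"CAT": "NP"},
                        {"CAT": "VP", "HEAD": {"VFORM": "INF"}}]},
}

def matches(required, found):
    """True if every feature demanded by `required` is present in `found`."""
    if isinstance(required, dict):
        return isinstance(found, dict) and all(
            key in found and matches(value, found[key])
            for key, value in required.items())
    return required == found

def satisfies_subcat(verb, complements):
    """Compare the verb's SUBCAT list with the complements, position by position."""
    frame = verb["HEAD"]["SUBCAT"]
    return len(frame) == len(complements) and all(
        matches(req, comp) for req, comp in zip(frame, complements))

george = {"CAT": "NP", "ORTH": "George"}
to_eat = {"CAT": "VP", "HEAD": {"VFORM": "INF"}}
eats = {"CAT": "VP", "HEAD": {"VFORM": "FIN"}}

print(satisfies_subcat(WANT, [george, to_eat]))  # wants George to eat
print(satisfies_subcat(WANT, [george, eats]))    # *wants George eats
```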

• But there are many phrasal types and so many types of subcategorization frames, e.g.

– believe

– believe [VPrep in] [NP ghosts]

– believe [NP my mother]

– believe [Sfin that I will pass this test]

– believe [Swh what I see] ...

• Verbs also subcategorize for subject as well as object types ([Swh What she wanted] seemed clear.)

• And other p.o.s. can be seen as subcategorizing for various arguments, such as prepositions, nouns and adjectives (It was clear [Sfin that she was exhausted])

Summing Up

• Ambiguity, left-recursion, and repeated re-parsing of subtrees present major problems for parsers

• Solutions:

– Combine top-down predictions with bottom-up look-ahead, use dynamic programming e.g. the Earley algorithm

– Feature structures and subcategorization frames help constrain parses but increase parsing complexity

• Next time: Read Ch 12