CS 4705 1
Basic Parsing with Context-Free Grammars
2
Analyzing Linguistic Units
• Morphological parsing:
– analyze words into morphemes and affixes
– rule-based, FSAs, FSTs
• Ngrams for Language Modeling
• POS Tagging
• Syntactic parsing:
– identify constituents and their relationships
– to see if a sentence is grammatical
– to assign an abstract representation of meaning
3
Syntactic Parsing
• Declarative formalisms like CFGs, FSAs define the legal strings of a language -- but only tell you ‘this is a legal string of the language X’
• Parsing algorithms specify how to recognize the strings of a language and assign each string one (or more) syntactic analyses
• Parsing useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition…almost every task in NLP…but…
4
Parsing as a Form of Search
• Searching FSAs
– Finding the right path through the automaton
– Search space defined by structure of FSA
• Searching CFGs
– Finding the right parse tree among all possible parse trees
– Search space defined by the grammar
• Constraints provided by the input sentence and the automaton or grammar
5
CFG for Fragment of English
S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nom
NP -> PropN
Nom -> N Nom
Nom -> N
Nom -> Nom PP
VP -> V NP
VP -> V
PP -> Prep NP
N -> book | flight | meal | money
V -> book | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA
Det -> that | this | a
TopD BotUp E.g. LC’s
6
[S [VP [V Book] [NP [Det that] [Nom [N flight]]]]]
Parse Tree for ‘Book that flight’ for Prior CFG
7
Rule Expansion
S -> NP VP
S -> Aux NP VP
S -> VP (1)
NP -> Det Nom (3)
NP -> PropN
Nom -> N Nom
Nom -> N (4)
Nom -> Nom PP
VP -> V NP (2)
VP -> V
PP -> Prep NP
N -> book | flight | meal | money
V -> book | include | prefer
Aux -> does
Prep -> from | to | on
PropN -> Houston | TWA
Det -> that | this | a
TopD BotUp E.g. LC’s
8
Top-Down Parser
• Builds from the root S node to the leaves
• Assuming we build all trees in parallel:
– Find all trees with root S (or all rules w/lhs S)
– Next expand all constituents in these trees/rules
– Continue until leaves are POS categories
– Candidate trees whose POS leaves fail to match the input string are rejected (e.g. Book that flight matches only one subtree)
9
Top-Down Search Space for CFG (expanding only leftmost leaves)
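Level 1: S
Level 2: [S NP VP]  [S Aux NP VP]  [S VP]
Level 3: [S [NP Det Nom] VP]  [S [NP PropN] VP]  [S Aux [NP Det Nom] VP]  [S Aux [NP PropN] VP]  [S [VP V NP]]  [S [VP V]]
Further down, under [S [VP V NP]]: [NP Det [Nom N]]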
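The level-by-level expansion above can be sketched in Python. This is a minimal illustration under stated assumptions, not the course's implementation: the names `GRAMMAR` and `next_level` are invented here, and each candidate tree is tracked only by its frontier (its sequence of leaves) rather than as a full tree.

```python
# Non-lexical rules of the fragment grammar; symbols without an entry
# (Det, N, V, Aux, Prep, PropN) are treated as POS categories.
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
    "PP":  [["Prep", "NP"]],
}

def next_level(frontiers):
    """Expand the leftmost non-terminal of every frontier with every matching rule."""
    result = []
    for frontier in frontiers:
        for i, symbol in enumerate(frontier):
            if symbol in GRAMMAR:
                for rhs in GRAMMAR[symbol]:
                    result.append(frontier[:i] + tuple(rhs) + frontier[i + 1:])
                break
        else:
            # Only POS categories remain, so there is nothing left to expand.
            result.append(frontier)
    return result

level1 = [("S",)]
level2 = next_level(level1)
level3 = next_level(level2)
for level in (level1, level2, level3):
    print([" ".join(frontier) for frontier in level])
```

Under these assumptions the three levels contain one, three, and six candidates, which is how quickly the parallel search space grows before any input word has been consulted.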
10
Bottom-Up Parsing
• Parser begins with words of input and builds up trees, applying grammar rules whose rhs match
– Book that flight has two possible POS sequences:
– N Det N: [N Book] [Det that] [N flight]
– V Det N: [V Book] [Det that] [N flight]
– ‘Book’ is ambiguous (2 POS appear in the grammar: N and V)
– Parse continues until an S root node is reached or no further node expansion is possible
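A bottom-up search of this kind can be sketched as repeated reduction: start from every POS sequence for the words, replace any substring that matches a rule's rhs with its lhs, and stop when a single S remains. The sketch below is a recognizer only (it answers yes or no and builds no tree), and the names `RULES`, `LEXICON`, `reductions`, and `bottom_up_recognize` are assumptions made for illustration.

```python
from itertools import product

RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)),
    ("Nom", ("N", "Nom")), ("Nom", ("N",)), ("Nom", ("Nom", "PP")),
    ("VP", ("V", "NP")), ("VP", ("V",)),
    ("PP", ("Prep", "NP")),
]
LEXICON = {  # word -> possible POS categories
    "book": ["N", "V"], "flight": ["N"], "meal": ["N"], "money": ["N"],
    "include": ["V"], "prefer": ["V"], "does": ["Aux"],
    "from": ["Prep"], "to": ["Prep"], "on": ["Prep"],
    "houston": ["PropN"], "twa": ["PropN"],
    "that": ["Det"], "this": ["Det"], "a": ["Det"],
}

def reductions(seq):
    """Yield each sequence obtained by replacing one rule's rhs in seq with its lhs."""
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(seq) - n + 1):
            if seq[i:i + n] == rhs:
                yield seq[:i] + (lhs,) + seq[i + n:]

def bottom_up_recognize(words):
    """Return True if some series of reductions turns the words into a single S."""
    # One starting state per POS assignment (e.g. 'book' as N or as V).
    agenda = [tuple(tags) for tags in product(*(LEXICON[w.lower()] for w in words))]
    seen = set(agenda)
    while agenda:
        seq = agenda.pop()
        if seq == ("S",):
            return True
        for new in reductions(seq):
            if new not in seen:   # never explore the same sequence twice
                seen.add(new)
                agenda.append(new)
    return False

print(bottom_up_recognize("Book that flight".split()))
```

Many of the sequences this search visits (for example the one that starts from N Det N) can never be reduced to S, which is the wasted effort the bottom-up strategy is known for.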
11
Two Candidates: One Successful Parse
Candidate 1 (fails): [VP [V Book]] [NP [Det that] [Nom [N flight]]] (no rule S -> VP NP)
Candidate 2 (succeeds): [S [VP [V Book] [NP [Det that] [Nom [N flight]]]]]
12
What’s right/wrong with….
• Top-Down parsers – they never explore illegal parses (e.g. which can’t form an S) -- but waste time on trees that can never match the input
• Bottom-Up parsers – they never explore trees inconsistent with input -- but waste time exploring illegal parses (with no S root)
• For both: find a control strategy -- how to explore the search space efficiently?
– Pursuing all parses in parallel or backtrack or …?
– Which rule to apply next?
– Which node to expand next?
13
A Possible Top-Down Parsing Strategy
• Depth-first search:
– Agenda of search states: expand search space incrementally, exploring most recently generated state (tree) each time
– When you reach a state (tree) inconsistent with input, backtrack to most recent unexplored state (tree)
• Which node to expand?
– Leftmost or rightmost
• Which grammar rule to use?
– Order in the grammar? How?
14
Top-Down, Depth-First, Left-Right Strategy
• Initialize agenda with ‘S’ tree and ptr to first word (cur)
• Loop: Until successful parse or empty agenda
– Apply next applicable grammar rule to leftmost unexpanded node (n) of current tree (t) on agenda and push resulting tree (t’) onto agenda
• If n is a POS category and matches the POS of cur, push new tree (t’’) onto agenda and increment cur
• Else pop t’ from agenda
– Final agenda contains history of successful parse
• Example input: Does this flight include a meal?
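The agenda loop can be sketched as a stack-based recognizer. This is a simplified illustration, not the slide's exact procedure: each state holds only the symbols still to be derived plus the index of cur (not a tree), the names `GRAMMAR`, `LEXICON`, and `top_down_recognize` are invented here, and one pruning test is added (marked in the code) so that the left-recursive rule Nom -> Nom PP cannot cause an endless loop.

```python
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
    "PP":  [["Prep", "NP"]],
}
LEXICON = {  # POS category -> words
    "N": {"book", "flight", "meal", "money"},
    "V": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
    "PropN": {"houston", "twa"},
    "Det": {"that", "this", "a"},
}

def top_down_recognize(words):
    """Depth-first, left-to-right, top-down recognition with backtracking."""
    words = [w.lower() for w in words]
    agenda = [(("S",), 0)]               # stack of (symbols to derive, index of cur)
    while agenda:
        symbols, cur = agenda.pop()      # most recently generated state first
        if not symbols:
            if cur == len(words):
                return True              # everything derived, all input consumed
            continue                     # input left over: backtrack
        # Assumption added for this sketch: no rule derives the empty string,
        # so a state that needs more symbols than there are words left is dead.
        if len(symbols) > len(words) - cur:
            continue
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            # Push in reverse so rules are tried in the order the grammar lists them.
            for rhs in reversed(GRAMMAR[first]):
                agenda.append((tuple(rhs) + rest, cur))
        elif words[cur] in LEXICON[first]:
            agenda.append((rest, cur + 1))   # POS matches cur: advance
    return False

print(top_down_recognize("Does this flight include a meal".split()))
```

Popping a state with no successors is the backtracking step: the loop simply continues with the most recent unexplored state left on the stack.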
15
Fig 10.7
CFG
16
Left Corners: Top-Down Parsing with Bottom-Up Filtering
• We saw: Top-Down, depth-first, L2R parsing
– Expands non-terminals along the tree’s left edge down to leftmost leaf of tree
– Moves on to expand down to next leftmost leaf…
– Note: In a successful parse, the current input word will be the first word in the derivation of the node the parser is currently processing
– So….look ahead to left-corner of the tree
• B is a left-corner of A if A =*=> Bα
• Build table with left-corners of all non-terminals in grammar and consult before applying rule
17
Left Corners
18
Left-Corner Table for CFG
Category Left Corners (POS)
S Det, PropN, Aux, V
NP Det, PropN
Nom N
VP V
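A table like this can be computed from the grammar by a fixed-point iteration: the POS left corners of a category are the POS categories that start any of its rules, plus the left corners of any non-terminal that starts one. The sketch below assumes the grammar has no empty rules; `GRAMMAR` and `left_corner_table` are names chosen for illustration.

```python
GRAMMAR = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP":  [["V", "NP"], ["V"]],
    "PP":  [["Prep", "NP"]],
}

def left_corner_table(grammar):
    """Map each non-terminal to the POS categories that can begin its derivations."""
    table = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # repeat until no entry grows
        changed = False
        for nt, rules in grammar.items():
            for rhs in rules:
                first = rhs[0]
                # A POS category is its own left corner; a non-terminal
                # contributes whatever left corners it has collected so far.
                new = table[first] if first in grammar else {first}
                if not new <= table[nt]:
                    table[nt] |= new
                    changed = True
    return table

for category, corners in left_corner_table(GRAMMAR).items():
    print(category, sorted(corners))
```

A top-down parser would consult this table before applying a rule: if the POS of the current word is not a left corner of the rule's first symbol, the rule cannot succeed and can be skipped.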
19
Left Recursion vs. Right Recursion
• Depth-first search will never terminate if grammar is left recursive (e.g. NP --> NP PP)
• A grammar is left recursive if, for some non-terminal A, A =*=> αAβ and α =*=> ε
20
• Solutions:
– Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive
e.g. The man {on the hill with the telescope…}
NP -> NP PP (wanted: Nom plus a sequence of PPs)
NP -> Nom PP
NP -> Nom
Nom -> Det N
…becomes…
NP -> Nom NP’
Nom -> Det N
NP’ -> PP NP’ (wanted: a sequence of PPs)
NP’ -> ε
• Not so obvious what these rules mean…
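The rewrite can be automated for immediate left recursion with the standard transformation A -> A α | β becomes A -> β A’, A’ -> α A’ | ε. The sketch below is a minimal version of that textbook transformation, applied to the simplified pair NP -> NP PP | Nom; the function name `remove_immediate_left_recursion` is an assumption made here, and the empty list `[]` stands for ε.

```python
def remove_immediate_left_recursion(lhs, rules):
    """Rewrite  A -> A a | b  as  A -> b A',  A' -> a A' | (empty)."""
    # Tails of the left-recursive rules (the part after the leading A).
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == lhs]
    # Rules that do not start with A.
    others = [rhs for rhs in rules if not rhs or rhs[0] != lhs]
    if not recursive:
        return {lhs: rules}              # nothing to rewrite
    new = lhs + "'"                      # fresh non-terminal A'
    return {
        lhs: [rhs + [new] for rhs in others],
        new: [tail + [new] for tail in recursive] + [[]],   # [] is the empty string
    }

rewritten = remove_immediate_left_recursion("NP", [["NP", "PP"], ["Nom"]])
for lhs, rules in rewritten.items():
    for rhs in rules:
        print(lhs, "->", " ".join(rhs) if rhs else "ε")
```

The rewritten grammar accepts the same strings but assigns right-branching trees through NP’, which is why the new rules are harder to interpret linguistically.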
21
– Harder to detect and eliminate non-immediate left recursion
– NP --> Nom PP
– Nom --> NP
– Fix depth of search explicitly
– Rule ordering: non-recursive rules first
• NP --> Det Nom
• NP --> NP PP
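Non-immediate left recursion can at least be detected mechanically. One way, sketched below under the assumption that the grammar has no empty rules, is to compute for each non-terminal the set of non-terminals that can appear first in its derivations and check whether it contains itself. The function name `left_recursive_nonterminals` and the small `EXAMPLE` grammar (the NP/Nom pair above plus a PP rule) are illustrative assumptions.

```python
EXAMPLE = {
    "NP":  [["Nom", "PP"]],
    "Nom": [["NP"]],
    "PP":  [["Prep", "NP"]],
}

def left_recursive_nonterminals(grammar):
    """Return the non-terminals A with A =*=> A ... (assumes no empty rules)."""
    # reach[A]: non-terminals that can be the first symbol of something derived from A.
    reach = {
        nt: {rhs[0] for rhs in rules if rhs[0] in grammar}
        for nt, rules in grammar.items()
    }
    changed = True
    while changed:                       # transitive closure by fixed-point iteration
        changed = False
        for nt in grammar:
            for other in list(reach[nt]):
                extra = reach[other] - reach[nt]
                if extra:
                    reach[nt] |= extra
                    changed = True
    return {nt for nt in grammar if nt in reach[nt]}

print(sorted(left_recursive_nonterminals(EXAMPLE)))
```

Here neither rule is left recursive on its own, yet NP reaches Nom and Nom reaches NP, so both are reported.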
22
An Exercise: The city hall parking lot in town
• NP -> NP NP PP
• NP -> Det Nom
• NP -> Adj Nom
• NP -> Nom Nom
• Nom -> NP Nom
• Nom -> N
• PP -> Prep NP
• N -> city | hall | lot | town
• Adj -> parking
• Prep -> to | for | in
23
Another Problem: Structural ambiguity
• Multiple legal structures
– Attachment (e.g. I saw a man on a hill with a telescope)
– Coordination (e.g. younger cats and dogs)
– NP bracketing (e.g. Spanish language teachers)
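To see how quickly attachment ambiguity multiplies, the number of parses of the attachment example can be counted directly. The grammar below is a hypothetical toy grammar written for this illustration (it is not the fragment grammar from earlier slides), and `count_parses` is an assumed name; the sketch requires that every symbol covers at least one word and that there are no cycles of unit rules.

```python
from functools import lru_cache

# Hypothetical toy grammar allowing PPs to attach to either NP or VP.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "VP": [["V", "NP"], ["VP", "PP"]],
    "NP": [["Pro"], ["Det", "N"], ["NP", "PP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {  # POS category -> words
    "Pro": {"i"}, "V": {"saw"}, "Det": {"a"},
    "N": {"man", "hill", "telescope"}, "Prep": {"on", "with"},
}

def count_parses(words, start="S"):
    """Count the distinct parse trees for words rooted in start."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        """Number of trees rooted in symbol that cover words[i:j]."""
        if symbol not in GRAMMAR:        # POS category: covers exactly one word
            return int(j - i == 1 and words[i] in LEXICON[symbol])
        return sum(count_seq(tuple(rhs), i, j) for rhs in GRAMMAR[symbol])

    @lru_cache(maxsize=None)
    def count_seq(symbols, i, j):
        """Number of ways the symbol sequence can cover words[i:j]."""
        if len(symbols) == 1:
            return count(symbols[0], i, j)
        # Each symbol covers at least one word, so leave room for the rest.
        return sum(
            count(symbols[0], i, k) * count_seq(symbols[1:], k, j)
            for k in range(i + 1, j - len(symbols) + 2)
        )

    return count(start, 0, len(words))

print(count_parses("I saw a man on a hill with a telescope".split()))
```

With this toy grammar the sentence receives five parses, one for each consistent choice of where the two PPs attach, and each extra PP adds more.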
24
NP vs. VP Attachment
25
• Solution?
– Return all possible parses and disambiguate using “other methods”
26
Summing Up
• Parsing is a search problem which may be implemented with many control strategies
– Top-Down or Bottom-Up approaches each have problems
• Combining the two solves some but not all issues
– Left recursion
– Syntactic ambiguity
• Next time: Making use of statistical information about syntactic constituents
– Read Ch 12