Parsing CFGs with Stacks
Outline
The general parsing task
Top Down Parsing
The general parsing task
The basic parsing task
◮ Given a grammar G, a category x and an input string w1 ... wn, the job of
a parser is to discover whether G categorises w1 ... wn as x,
◮ or equivalently, whether it permits any analysis tree whose topmost node
is x and whose leaves are w1 ... wn.
◮ Variants of this:
  ◮ find all parse trees, if there is more than one
  ◮ also find the x's which categorise the input, rather than assuming this is
  given
Top-down vs Bottom-up
◮ There are many ways a parser might manage the search process.
◮ If a parser expands a tree down towards its leaves it is said to be
working top-down.
◮ By contrast a bottom-up parser fuses subtrees together with the aim of
making a single encompassing tree.
Top Down Parsing
As a beginning example, take the following grammar:
s ⇒ sadv, s
s ⇒ np, vp
np ⇒ john
vp ⇒ iv
iv ⇒ walks
sadv ⇒ maybe
maybe john walks is an s according to this grammar, with the syntax analysis
(daughters indented under their mother):
s
  sadv
    maybe
  s
    np
      john
    vp
      iv
        walks
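To make the grammar and its analysis tree concrete, here is a minimal Python sketch. The representation is an assumption made for illustration, not something the slides prescribe: a rule is a `(mother, daughters)` pair, a tree is a `(label, children)` pair, and a word is a plain string. The helper names `label`, `leaves` and `licensed` are hypothetical.

```python
# Grammar from the slide: each rule is (mother, daughters).
RULES = [
    ("s", ("sadv", "s")),
    ("s", ("np", "vp")),
    ("np", ("john",)),
    ("vp", ("iv",)),
    ("iv", ("walks",)),
    ("sadv", ("maybe",)),
]

# Assumed tree encoding: (label, [children]); a word is a bare string.
TREE = ("s", [("sadv", ["maybe"]),
              ("s", [("np", ["john"]),
                     ("vp", [("iv", ["walks"])])])])


def label(node):
    """Return the category (or word) sitting at a node."""
    return node if isinstance(node, str) else node[0]


def leaves(node):
    """Return the words at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


def licensed(node, rules):
    """True if every mother-plus-daughters configuration matches some rule."""
    if isinstance(node, str):
        return True
    mother, children = node
    local = (mother, tuple(label(child) for child in children))
    return local in rules and all(licensed(child, rules) for child in children)


print(label(TREE), leaves(TREE), licensed(TREE, RULES))
```

Checking that the topmost node is `s`, that the leaves spell out the input, and that every local configuration matches a rule is exactly the condition under which the grammar categorises the string as an `s`.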
◮ A top-down parser will effectively seek to settle the question 'is the input
an x?' by deriving a tree in a succession of stages, starting with just a
single-node x-tree and ending with the complete tree.
◮ At every stage of this process of tree derivation, there are choices to be
made.
◮ One choice is which node to expand;
◮ the other choice is how to expand each node.
Parsing CFGs with Stacks
Top Down Parsing
Which node to work on ?
To illustrate the first kind of choice, consider the following two derivations
(you’ll have to zoom/magnify to see this):
Parsing CFGs with Stacks
Top Down Parsing
Which node to work on ?
To illustrate the first kind of choice, consider the following two derivations
(you’ll have to zoom/magnify to see this):
walks
np vp
maybe np vp
s
sadv s
walks
s
sadv s
maybe
s
sadv s
maybe np vp
john
s
sadv s
maybe np vp
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
iv
maybe np vp
s
sadv s
john
s
sadv s
s
sadv s
s
sadv s
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
john
Parsing CFGs with Stacks
Top Down Parsing
Which node to work on ?
To illustrate the first kind of choice, consider the following two derivations
(you’ll have to zoom/magnify to see this):
walks
np vp
maybe np vp
s
sadv s
walks
s
sadv s
maybe
s
sadv s
maybe np vp
john
s
sadv s
maybe np vp
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
iv
maybe np vp
s
sadv s
john
s
sadv s
s
sadv s
s
sadv s
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
john
Parsing CFGs with Stacks
Top Down Parsing
Which node to work on ?
To illustrate the first kind of choice, consider the following two derivations
(you’ll have to zoom/magnify to see this):
walks
np vp
maybe np vp
s
sadv s
walks
s
sadv s
maybe
s
sadv s
maybe np vp
john
s
sadv s
maybe np vp
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
iv
maybe np vp
s
sadv s
john
s
sadv s
s
sadv s
s
sadv s
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
john
◮ In the first derivation, there is a system to the way
the tree is grown
Parsing CFGs with Stacks
Top Down Parsing
Which node to work on ?
To illustrate the first kind of choice, consider the following two derivations
(you’ll have to zoom/magnify to see this):
walks
np vp
maybe np vp
s
sadv s
walks
s
sadv s
maybe
s
sadv s
maybe np vp
john
s
sadv s
maybe np vp
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
iv
maybe np vp
s
sadv s
john
s
sadv s
s
sadv s
s
sadv s
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
john
◮ In the first derivation, there is a system to the way
the tree is grown
◮ in the second derivation, the tree growth is
random.
Parsing CFGs with Stacks
Top Down Parsing
Which node to work on ?
To illustrate the first kind of choice, consider the following two derivations
(you’ll have to zoom/magnify to see this):
walks
np vp
maybe np vp
s
sadv s
walks
s
sadv s
maybe
s
sadv s
maybe np vp
john
s
sadv s
maybe np vp
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
iv
maybe np vp
s
sadv s
john
s
sadv s
s
sadv s
s
sadv s
ivjohn
s
sadv s
maybe np vp
iv
maybe np vp
s
sadv s
john
◮ In the first derivation, there is a system to the way
the tree is grown
◮ in the second derivation, the tree growth is
random.
◮ in the first derivation at every step
the leftmost expandable leaf node is
expanded
Parsing CFGs with Stacks
Top Down Parsing
Which node to work on?
To illustrate the first kind of choice, consider the following two derivations:
[Figure: two step-by-step derivations of the analysis tree for maybe john walks,
each shown as a sequence of partial trees]
◮ In the first derivation, there is a system to the way
the tree is grown;
◮ in the second derivation, the tree growth is
random.
◮ In the first derivation, at every step
the leftmost expandable leaf node is
expanded.
◮ The key fact is this:
if there is an analysis tree for some
input, then it can be generated by
applying leftmost expansion.
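Leftmost expansion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it tracks the sequence of leaf nodes rather than the whole tree, the set `NONTERMINALS` and the list `CHOICES` (the rule chosen at each step) are supplied by hand, and the function names are hypothetical.

```python
# Categories that can still be expanded (assumed, taken from the grammar).
NONTERMINALS = {"s", "sadv", "np", "vp", "iv"}


def expand_leftmost(leaf_nodes, rule):
    """Rewrite the leftmost expandable leaf using rule = (mother, daughters)."""
    mother, daughters = rule
    for pos, leaf in enumerate(leaf_nodes):
        if leaf in NONTERMINALS:
            if leaf != mother:
                raise ValueError(f"leftmost expandable leaf is {leaf}, not {mother}")
            return leaf_nodes[:pos] + list(daughters) + leaf_nodes[pos + 1:]
    raise ValueError("no expandable leaf left")


# The rule applied at each stage of the systematic (leftmost) derivation.
CHOICES = [
    ("s", ("sadv", "s")),
    ("sadv", ("maybe",)),
    ("s", ("np", "vp")),
    ("np", ("john",)),
    ("vp", ("iv",)),
    ("iv", ("walks",)),
]


def derive(start, choices):
    """Return the leaf sequence after each leftmost expansion step."""
    snapshots = [[start]]
    for rule in choices:
        snapshots.append(expand_leftmost(snapshots[-1], rule))
    return snapshots


for row in derive("s", CHOICES):
    print(" ".join(row))
```

Because the node to expand is always determined, the only input the derivation needs is the sequence of rule choices.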
◮ So an algorithm to explore the space of tree derivations can restrict
attention to the derivations which use leftmost expansion.
◮ This means the 'which node' source of choice can be eliminated: always
deterministically choose the leftmost unexpanded node.
◮ There is still the other source of choice, of non-determinism: more than
one way to expand a given node. This still has to be dealt with, but to
begin we will get familiar with the deterministic case.
The frontier
Summarising the first derivation as a
series of snapshots of the leaf nodes,
you have:
Leaf nodes
s
sadv s
maybe s
maybe np vp
maybe john vp
maybe john iv
maybe john walks
Let us use the term Frontier for the
subset of the leaf nodes which are
expandable:
Leaf nodes          Frontier
s                   s
sadv s              sadv s
maybe s             s
maybe np vp         np vp
maybe john vp       vp
maybe john iv       iv
maybe john walks    (empty)
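The Frontier column can be computed mechanically from the leaf nodes. The sketch below assumes that a leaf is expandable exactly when it is one of the grammar's categories; the names `NONTERMINALS`, `ROWS` and `frontier` are hypothetical.

```python
# Categories of the example grammar (assumed to be the expandable symbols).
NONTERMINALS = {"s", "sadv", "np", "vp", "iv"}


def frontier(leaf_nodes):
    """Keep only the leaves that can still be expanded, in left-to-right order."""
    return [leaf for leaf in leaf_nodes if leaf in NONTERMINALS]


# Leaf-node snapshots of the first derivation.
ROWS = [
    "s",
    "sadv s",
    "maybe s",
    "maybe np vp",
    "maybe john vp",
    "maybe john iv",
    "maybe john walks",
]

for row in ROWS:
    print(f"{row:<20}{' '.join(frontier(row.split()))}")
```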
The frontier as a stack
◮ Because of the choice to always
take the leftmost unexpanded
node, the frontier operates in the
fashion of a stack,
with last-in/first-out (LIFO)
behaviour.
◮ You can keep adding to the top of
a stack (pushing),
◮ and it's the most recently added
things that you can remove
(popping) and replace (more
pushing).
This leads to the idea that one can manage the search through the space of
possible tree derivations by managing a search through a space of possible
stack states.
We can now give an outline of a top-down algorithm.
Let w be an array representing the input,
let i be the index of the current word, and
use F for the frontier of nodes in the tree that are due to be expanded.
Top-down parsing algorithm (without backtracking)
set F to start symbol, progress indicator i = 0
MOVES:
  if (F is empty) { goto YES_NO }
  let A = top(F)
  loop through the rules {
    if (rule is A → w[i]) {               // LEAF CANCELLATION
      pop top of F
      set i = i+1
      goto MOVES
    }
    else if (rule is A → D1 ... Dn) {     // LEFT EXPANSION
      pop top of F
      push Dn ... push D1                 // note order
      goto MOVES
    }
  }
  // no move is possible: fall through to YES_NO
YES_NO:
  if ((F is empty) && (i == size of input)) { succeed }
  else { fail }
About the top-down algorithm
◮ The algorithm keeps looking for a move it can make to update its progress
through the input and the stack of categories F.
◮ The first kind of move, leaf cancellation, recognises that the top of the stack
represents a node which could have the current word attached
underneath it. Doing so removes a category from the stack and moves
progress through the input on by 1.
◮ The second kind of move, left expansion, recognises that the top of the stack
represents a node which could have a sequence of daughters
corresponding to the right-hand side of a rule attached underneath it.
◮ In checking whether a move is possible, the grammar rules are considered in
order from top to bottom.
◮ Note that in left expansion the daughters must be pushed in last-to-first
order, to guarantee that the first daughter ends up on top of the stack.
An example
s → np, vp
np → det, n
det → the
n → man
n → dog
vp → tv, np
tv → hit
Parsing the man hit the dog (top of stack shown
at left):
WORDS                  STACK
the man hit the dog    s
the man hit the dog    np vp
the man hit the dog    det n vp
man hit the dog        n vp
hit the dog            vp
hit the dog            tv np
the dog                np
the dog                det n
dog                    n
(empty)                (empty)
SUCCEED
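The algorithm and the trace above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name `parse` and the `(mother, daughters)` rule encoding are hypothetical, a symbol counts as a word if it never appears on the left-hand side of a rule, and the Python list keeps the top of the stack at its end (the trace reverses it so the top is shown at the left).

```python
RULES = [
    ("s", ("np", "vp")),
    ("np", ("det", "n")),
    ("det", ("the",)),
    ("n", ("man",)),
    ("n", ("dog",)),
    ("vp", ("tv", "np")),
    ("tv", ("hit",)),
]


def parse(rules, start, words):
    """Deterministic top-down recogniser: commits to the first usable rule."""
    words = list(words)
    nonterminals = {mother for mother, _ in rules}
    stack = [start]                    # top of stack is the END of the list
    i = 0                              # index of the current word
    trace = [(words[i:], stack[::-1])]
    while stack:
        top = stack[-1]
        for mother, daughters in rules:   # rules are tried from top to bottom
            if mother != top:
                continue
            lexical = len(daughters) == 1 and daughters[0] not in nonterminals
            if lexical:
                # LEAF CANCELLATION: only if the rule's word is the current word
                if i < len(words) and daughters[0] == words[i]:
                    stack.pop()
                    i += 1
                    break
            else:
                # LEFT EXPANSION: push daughters last-to-first
                stack.pop()
                stack.extend(reversed(daughters))
                break
        else:
            break                      # no move possible: stop and decide
        trace.append((words[i:], stack[::-1]))
    return (not stack and i == len(words)), trace


ok, trace = parse(RULES, "s", "the man hit the dog".split())
for remaining, stack in trace:
    print(f"{' '.join(remaining):<22}{' '.join(stack)}")
print("SUCCEED" if ok else "FAIL")
```

Because the first usable rule is always committed to, the sketch never revisits a choice; it succeeds here only because this grammar never offers a misleading expansion for this input.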
What about rule choice?
Suppose the grammar is:
s --> np,vp
s --> sadv,s
np --> [john]
np --> det,n
np --> n
vp --> iv
iv --> [walks]
sadv --> [maybe]
det --> [the]
n --> [man]
n --> [men]
maybe john walks is rejected:
WORDS               STACK
maybe john walks    s
maybe john walks    np vp
maybe john walks    det n vp
a dead end: the only det rule introduces the, which does not match maybe
◮ Often more than one move will be possible.
◮ So we need either
  ◮ a mechanism for exploring all choices (backtracking),
  ◮ or a way to guide choices correctly by referring to something other than just
  the top of the stack.
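The first option, backtracking, can be sketched as a recursive search over stack states. This is one possible reading, not the slides' own algorithm: the function name `recognise` is hypothetical, stacks and remaining words are tuples with the top of the stack at index 0, and the sketch assumes the grammar has no left-recursive rules, since otherwise the search would not terminate.

```python
RULES = [
    ("s", ("np", "vp")),
    ("s", ("sadv", "s")),
    ("np", ("john",)),
    ("np", ("det", "n")),
    ("np", ("n",)),
    ("vp", ("iv",)),
    ("iv", ("walks",)),
    ("sadv", ("maybe",)),
    ("det", ("the",)),
    ("n", ("man",)),
    ("n", ("men",)),
]
# Assumption: a symbol is a word if it never appears as a mother.
NONTERMINALS = {mother for mother, _ in RULES}


def recognise(stack, words):
    """Try every rule for the top of the stack; undo a choice if it fails."""
    if not stack:
        return not words               # succeed only if the input is used up
    top, rest = stack[0], stack[1:]
    for mother, daughters in RULES:
        if mother != top:
            continue
        if len(daughters) == 1 and daughters[0] not in NONTERMINALS:
            # leaf cancellation, then continue with the shorter stack and input
            if words and words[0] == daughters[0] and recognise(rest, words[1:]):
                return True
        elif recognise(daughters + rest, words):
            # left expansion: first daughter ends up on top
            return True
        # otherwise fall through and try the next rule (backtracking)
    return False


print(recognise(("s",), ("maybe", "john", "walks")))
```

With this grammar the search first tries s --> np,vp, reaches the same dead end as before, and only then falls back to s --> sadv,s, which leads to success.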