Post on 15-Jul-2020

Parsing CFGs with Stacks

Outline

The general parsing task

Top Down Parsing

The general parsing task

The basic parsing task

◮ Given a grammar G, a category x and an input string w1 ... wn, the job of a parser is to discover whether G categorises w1 ... wn as x,

◮ or equivalently, whether it permits any analysis tree whose topmost node is x and whose leaves are w1 ... wn.

◮ Variants of this:

  ◮ find all parse trees, if there is more than one

  ◮ find also the x's which categorise the input, rather than assuming this is given

Top-down vs Bottom-up

◮ There are many ways a parser might manage the search process.

◮ If a parser expands a tree down towards its leaves, it is said to be working top-down.

◮ By contrast, a bottom-up parser fuses subtrees together with the aim of making a single encompassing tree.

Top Down Parsing

As a beginning example, take the following grammar:

    s ⇒ sadv, s
    s ⇒ np, vp
    np ⇒ john
    vp ⇒ iv
    iv ⇒ walks
    sadv ⇒ maybe

maybe john walks is an s according to this grammar, with the syntax analysis

    s
      sadv
        maybe
      s
        np
          john
        vp
          iv
            walks

◮ A top-down parser will effectively seek to settle the question 'is the input an x?' by deriving a tree in a succession of stages, starting with just a single-node x-tree and ending with the complete tree.

◮ At every stage of this process of tree derivation, there are choices to be made.

◮ One choice is which node to expand;

◮ the other choice is how to expand each node.

Which node to work on ?

To illustrate the first kind of choice, consider the following two derivations

(you’ll have to zoom/magnify to see this):

[Figure: two derivations, each shown as a sequence of partial trees growing from the single node s to the complete tree for maybe john walks.]

◮ In the first derivation, there is a system to the way the tree is grown;

◮ in the second derivation, the tree growth is random.

◮ In the first derivation, at every step the leftmost expandable leaf node is expanded.

◮ The key fact is this: if there is an analysis tree for some input, then it can be generated by applying leftmost expansion.

◮ So an algorithm to explore the space of tree derivations can restrict attention to the derivations which use leftmost expansion.

◮ This means the 'which node' source of choice can be eliminated: always deterministically choose the leftmost unexpanded node.

◮ There is still the other source of choice, of non-determinism: more than one way to expand a given node. This still has to be dealt with, but to begin we will get familiar with the deterministic case.

The frontier

Summarising the first derivation as a series of snapshots of the leaf nodes, and using the term Frontier for the subset of the leaf nodes which are expandable, you have:

    Leaf nodes            Frontier
    s                     s
    sadv s                sadv s
    maybe s               s
    maybe np vp           np vp
    maybe john vp         vp
    maybe john iv         iv
    maybe john walks      (empty)
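The leaf-node and frontier snapshots can be reproduced mechanically. The following is a minimal Python sketch under stated assumptions: the names CATEGORIES, CHOICES and leftmost_snapshots are hypothetical and do not come from the slides, and the rule to apply at each step is simply supplied by hand rather than searched for.

```python
# Hypothetical encoding of the example: the symbols that can still be expanded.
CATEGORIES = {"s", "sadv", "np", "vp", "iv"}

# The rule used at each step of the first (leftmost) derivation.
CHOICES = [
    ("s", ["sadv", "s"]),
    ("sadv", ["maybe"]),
    ("s", ["np", "vp"]),
    ("np", ["john"]),
    ("vp", ["iv"]),
    ("iv", ["walks"]),
]


def leftmost_snapshots(start, choices, categories):
    """Return (leaf nodes, frontier) pairs for a leftmost derivation."""
    leaves = [start]

    def frontier():
        # The frontier is the subset of leaves that are still expandable.
        return [x for x in leaves if x in categories]

    snapshots = [(list(leaves), frontier())]
    for lhs, rhs in choices:
        # Find the leftmost expandable leaf and check the rule applies to it.
        k = next(j for j, x in enumerate(leaves) if x in categories)
        if leaves[k] != lhs:
            raise ValueError(f"rule for {lhs} does not fit leaf {leaves[k]}")
        leaves[k:k + 1] = rhs  # replace the leaf by its daughters, in place
        snapshots.append((list(leaves), frontier()))
    return snapshots


if __name__ == "__main__":
    for leaf_nodes, front in leftmost_snapshots("s", CHOICES, CATEGORIES):
        print(f"{' '.join(leaf_nodes):<22}{' '.join(front)}")
```

Because the expanded leaf is always the leftmost expandable one, only the left end of the frontier ever changes from one snapshot to the next.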

The frontier as a stack

◮ Because of the choice to always take the leftmost unexpanded node, the frontier operates in the fashion of a stack, with a last-in/first-out (LIFO) behaviour.

◮ You can keep adding to the top of a stack (pushing),

◮ and it's the most recently added things that you can remove (popping) and replace (more pushing).

This leads to the idea that one can manage the search through the space of possible tree derivations by managing a search through a space of possible stack states.

We can now give an outline of a top-down algorithm.

Let w be an array representing the input, let i be the index of the current word, and use F for the frontier of nodes in the tree that are due to be expanded.

Top-down parsing algorithm (without backtracking)

    set F to start symbol, progress indicator i = 0

    MOVES:
    if (F is empty) { goto YES_NO }
    let A = top(F)
    loop thru the rules {
      if (rule is A → w[i]) {              // LEAF CANCELLATION
        pop top of F
        set i = i+1
        goto MOVES
      }
      else if (rule is A → D1 ... Dn) {    // LEFT EXPANSION
        pop top of F
        push Dn ... push D1                // note order
        goto MOVES
      }
    }

    YES_NO:
    if ((F is empty) && (i == size of input)) { succeed }
    else { fail }
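The outline above can be sketched in Python. This is a minimal illustration under assumptions, not a definitive implementation: the rule encoding (a string right-hand side for a word rule, a tuple for a category rule), the names RULES and recognise, and the use of a Python list as the stack F are all choices made here for the example. The grammar is the first example grammar with the sadv rule left out.

```python
# Assumed encoding: (lhs, "word") is a word rule, (lhs, (d1, ..., dn)) a category rule.
RULES = [
    ("s", ("np", "vp")),
    ("np", "john"),
    ("vp", ("iv",)),
    ("iv", "walks"),
]


def recognise(rules, start, words):
    """Deterministic top-down recognition: take the first applicable rule, never backtrack."""
    frontier = [start]  # the stack F; its top is the END of the list
    i = 0               # index of the current word
    while frontier:
        top = frontier[-1]
        for lhs, rhs in rules:  # rules are tried in order, top to bottom
            if lhs != top:
                continue
            if isinstance(rhs, str):
                # LEAF CANCELLATION: only possible if the word matches w[i]
                if i < len(words) and rhs == words[i]:
                    frontier.pop()
                    i += 1
                    break
            else:
                # LEFT EXPANSION: push daughters last-to-first so D1 ends on top
                frontier.pop()
                frontier.extend(reversed(rhs))
                break
        else:
            break  # no rule gave a move: stop and let the final test decide
    return not frontier and i == len(words)


if __name__ == "__main__":
    print(recognise(RULES, "s", ["john", "walks"]))
    print(recognise(RULES, "s", ["walks", "john"]))
```

Note that, like the outline, this sketch commits to the first rule that yields a move, and it would not terminate on a grammar whose first applicable rule is left-recursive (for instance a hypothetical s → s, x).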

About the top-down algorithm

◮ The algorithm keeps looking for a move it can make to update its progress through the input and the stack of categories F.

◮ The first kind of move, leaf cancellation, recognises that the top of the stack represents a node which could have the current word attached underneath it. Doing so removes a category from the stack and advances progress through the input by 1.

◮ The second kind of move, left expansion, recognises that the top of the stack represents a node which could have a sequence of daughters, corresponding to the right-hand side of a rule, attached underneath it.

◮ In checking if a move is possible, the grammar rules are considered in order from top to bottom.

◮ Note that in left expansion the daughters must be pushed in last-to-first order, to guarantee that the first daughter ends up on top of the stack.

an example

E.g. with the grammar

    s → np, vp
    np → det, n
    det → the
    n → man
    n → dog
    vp → tv, np
    tv → hit

parsing the man hit the dog (top of stack shown at left):

    WORDS                  STACK
    the man hit the dog    s
    the man hit the dog    np vp
    the man hit the dog    det n vp
    man hit the dog        n vp
    hit the dog            vp
    hit the dog            tv np
    the dog                np
    the dog                det n
    dog                    n

    SUCCEED
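The WORDS/STACK trace can be generated by recording the state after every move. The following Python sketch is illustrative only: the rule encoding (string right-hand side for a word rule, tuple for a category rule) and the names RULES and trace are assumptions made for this example. It records one extra final row in which both the words and the stack are empty, which is the state the success test inspects.

```python
# Assumed encoding: (lhs, "word") is a word rule, (lhs, (d1, ..., dn)) a category rule.
RULES = [
    ("s", ("np", "vp")),
    ("np", ("det", "n")),
    ("det", "the"),
    ("n", "man"),
    ("n", "dog"),
    ("vp", ("tv", "np")),
    ("tv", "hit"),
]


def trace(rules, start, words):
    """Run the no-backtracking algorithm, returning (rows, succeeded)."""
    frontier = [start]  # top of the stack is the end of the list
    i = 0

    def row():
        # Remaining words, and the stack printed with its top at the left.
        return (" ".join(words[i:]), " ".join(reversed(frontier)))

    rows = [row()]
    while frontier:
        top = frontier[-1]
        moved = False
        for lhs, rhs in rules:
            if lhs != top:
                continue
            if isinstance(rhs, str):
                if i < len(words) and rhs == words[i]:  # leaf cancellation
                    frontier.pop()
                    i += 1
                    moved = True
                    break
            else:                                        # left expansion
                frontier.pop()
                frontier.extend(reversed(rhs))
                moved = True
                break
        if not moved:
            break  # dead end: no move available
        rows.append(row())
    return rows, (not frontier and i == len(words))


if __name__ == "__main__":
    rows, ok = trace(RULES, "s", "the man hit the dog".split())
    for remaining, stack in rows:
        print(f"{remaining:<22}{stack}")
    print("SUCCEED" if ok else "FAIL")
```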

What about rule choice ?

Suppose the grammar:

    s --> np,vp
    s --> sadv,s
    np --> [john]
    np --> det,n
    np --> n
    vp --> iv
    iv --> [walks]
    sadv --> [maybe]
    det --> [the]
    n --> [man]
    n --> [men]

maybe john walks is rejected:

    WORDS               STACK
    maybe john walks    s
    maybe john walks    np vp
    maybe john walks    det n vp

a dead end: the only rule for det is det --> [the], which does not match the current word maybe, so no move is possible.

◮ Often more than one move will be possible

◮ So need either

  ◮ a mechanism for exploring all choices – backtracking

  ◮ or a way to guide choices correctly by referring to something other than just the top of the stack
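As an illustration of the first option, here is one possible way to add backtracking, sketched in Python. It is an assumption-laden example rather than the method of these slides: the recursive formulation, the rule encoding (string right-hand side for a bracketed word rule, tuple for a category rule) and the names RULES and recognise_backtracking are all introduced here. Instead of committing to the first applicable rule, it tries each rule in turn and moves on to the next one whenever a choice leads to a dead end.

```python
# Assumed encoding of the grammar above: a string rhs stands for a bracketed word.
RULES = [
    ("s", ("np", "vp")),
    ("s", ("sadv", "s")),
    ("np", "john"),
    ("np", ("det", "n")),
    ("np", ("n",)),
    ("vp", ("iv",)),
    ("iv", "walks"),
    ("sadv", "maybe"),
    ("det", "the"),
    ("n", "man"),
    ("n", "men"),
]


def recognise_backtracking(rules, frontier, words, i=0):
    """Top-down recognition that explores every rule choice.

    frontier is a tuple of categories with the top of the stack at index 0.
    """
    if not frontier:
        return i == len(words)  # succeed only if all the input was consumed
    top, rest = frontier[0], frontier[1:]
    for lhs, rhs in rules:
        if lhs != top:
            continue
        if isinstance(rhs, str):
            # Leaf cancellation, then continue with the rest of the stack.
            if i < len(words) and rhs == words[i]:
                if recognise_backtracking(rules, rest, words, i + 1):
                    return True
        else:
            # Left expansion: the daughters replace the top of the stack.
            if recognise_backtracking(rules, tuple(rhs) + rest, words, i):
                return True
        # Falling through to the next rule is the backtracking step.
    return False


if __name__ == "__main__":
    print(recognise_backtracking(RULES, ("s",), ["maybe", "john", "walks"]))
```

With this grammar the sentence maybe john walks is accepted, because after the np expansions fail the search falls back to s --> sadv,s. The sketch still would not terminate on a left-recursive grammar, since expansion could then repeat without consuming any input.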