
Bottom-up Parsing

Date post: 22-Jan-2016
Upload: arissa
Page 1: Bottom-up Parsing

CPSC4600 1

Bottom-up Parsing

Reading Sections 4.5 and 4.7 from ASU

Page 2: Bottom-up Parsing

CPSC4600 2

Predictive Parsing Summary

First and Follow sets are used to construct predictive tables

For non-terminal A and input t, use a production A → α where t ∈ First(α)

For non-terminal A and input t, if ε ∈ First(A) and t ∈ Follow(A), then use a production A → α where ε ∈ First(α)

Recursive-descent parsers without backtracking do not need the parse table explicitly

Page 3: Bottom-up Parsing

CPSC4600 3

Bottom-Up Parsing(1)

Bottom-up parsing is more general than top-down parsing
  And just as efficient
  Builds on ideas in top-down parsing

Bottom-up is the preferred method in practice

Page 4: Bottom-up Parsing

CPSC4600 4

Bottom-Up Parsing(2)

Table-driven using an explicit stack (non-recursive)

Stack can be viewed as containing both terminals and nonterminals

Basic operations:

Shift: Move terminals from the input stream onto the stack until the right-hand side of an appropriate production rule has been identified on top of the stack.

Reduce: Replace the symbols on top of the stack that match the right-hand side of an appropriate production rule with the nonterminal on the left-hand side of that production.

Page 5: Bottom-up Parsing

CPSC4600 5

An Introductory Example

Bottom-up parsers don’t need left-factored grammars

Hence we can revert to the “natural” grammar for our example:

E → T + E | T
T → num * T | num | (E)

Consider the string: num * num + num

Page 6: Bottom-up Parsing

CPSC4600 6

The Idea

Bottom-up parsing reduces a string to the start symbol by inverting productions:

Input                Production used
num * num + num
num * T + num        T → num
T + num              T → num * T
T + T                T → num
T + E                E → T
E                    E → T + E
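The reduction sequence in the table above can be replayed mechanically. The following Python sketch is illustrative only: the helper name `reduce_at`, the token-list representation, and the explicit positions are assumptions made for this example, not part of the slides.

```python
# Replay the reductions from the table on the input "num * num + num".
# `reduce_at` is a hypothetical helper; a sentential form is a list of symbols.

def reduce_at(form, lhs, rhs, start):
    """Invert the production lhs -> rhs at index `start` of `form`."""
    end = start + len(rhs)
    if form[start:end] != rhs:
        raise ValueError(f"{rhs} does not occur at position {start}")
    return form[:start] + [lhs] + form[end:]

# (left-hand side, right-hand side, position), in the order the table uses them.
steps = [
    ("T", ["num"], 2),
    ("T", ["num", "*", "T"], 0),
    ("T", ["num"], 2),
    ("E", ["T"], 2),
    ("E", ["T", "+", "E"], 0),
]

form = "num * num + num".split()
trace = [" ".join(form)]
for lhs, rhs, start in steps:
    form = reduce_at(form, lhs, rhs, start)
    trace.append(" ".join(form))

for line in trace:
    print(line)
```

Each printed line is one row of the "Input" column, ending with the start symbol E.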

Page 7: Bottom-up Parsing

CPSC4600 7

Right-most Derivation

In a right-most derivation, the rightmost nonterminal of a sentential form is replaced at each derivation step.

Question: find the rightmost derivation of the string num * num + num

Page 8: Bottom-up Parsing

CPSC4600 8

Observation

Read the productions found by bottom-up parse in reverse (i.e., from bottom to top)

This is a rightmost derivation!

Input                Production used
num * num + num
num * T + num        T → num
T + num              T → num * T
T + T                T → num
T + E                E → T
E                    E → T + E

Page 9: Bottom-up Parsing

CPSC4600 9

Important Facts

A bottom-up parser traces a rightmost derivation in reverse

Page 10: Bottom-up Parsing

CPSC4600 10

A Bottom-up Parse

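num * num + num
num * T + num
T + num
T + T
T + E
E

(Figure: the complete parse tree. The root E has children T, + and E. The left T has children num, * and T, and that inner T derives num. The right E has a single child T, which derives num.)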

Page 11: Bottom-up Parsing

CPSC4600 11

A Bottom-up Parse in Detail (1)

num * num + num

(Figure: parse tree so far, showing only the leaves num * num + num.)

Page 12: Bottom-up Parsing

CPSC4600 12

A Bottom-up Parse in Detail (2)

num * num + num
num * T + num

(Figure: parse tree so far. A T node sits above the second num.)

Page 13: Bottom-up Parsing

CPSC4600 13

A Bottom-up Parse in Detail (3)

num * num + num
num * T + num
T + num

(Figure: parse tree so far. A new T node covers num * T; the inner T sits above the second num.)

Page 14: Bottom-up Parsing

CPSC4600 14

A Bottom-up Parse in Detail (4)

num * num + num
num * T + num
T + num
T + T

(Figure: parse tree so far. As before, plus a T node above the third num.)

Page 15: Bottom-up Parsing

CPSC4600 15

A Bottom-up Parse in Detail (5)

num * num + num
num * T + num
T + num
T + T
T + E

(Figure: parse tree so far. As before, plus an E node above the rightmost T.)

Page 16: Bottom-up Parsing

CPSC4600 16

A Bottom-up Parse in Detail (6)

num * num + num
num * T + num
T + num
T + T
T + E
E

(Figure: the complete parse tree. The root E covers T + E.)

Page 17: Bottom-up Parsing

CPSC4600 17

Bottom-up Parsing

A trivial bottom-up parsing algorithm

Let I = input string
repeat
    pick a non-empty substring β of I where X → β is a production
    if no such β, backtrack
    replace one β by X in I
until I = “S” (the start symbol) or all possibilities are exhausted
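A minimal Python sketch of this trivial algorithm follows, using the example grammar with start symbol E. The function name `naive_parse`, the tuple representation, and the `seen` set (which only avoids re-exploring forms that already failed) are assumptions of this sketch, not something the slides prescribe.

```python
# Brute-force bottom-up parsing with backtracking for
#   E -> T + E | T      T -> num * T | num | ( E )
GRAMMAR = [
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("num", "*", "T")),
    ("T", ("num",)),
    ("T", ("(", "E", ")")),
]

def naive_parse(form, start="E", seen=None):
    """Return the list of sentential forms from `form` to (start,), or None."""
    seen = set() if seen is None else seen
    if form == (start,):
        return [form]
    if form in seen:                  # already tried this form without success
        return None
    seen.add(form)
    for lhs, rhs in GRAMMAR:          # pick a production X -> beta ...
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:  # ... and an occurrence of beta in I
                reduced = form[:i] + (lhs,) + form[i + n:]
                rest = naive_parse(reduced, start, seen)
                if rest is not None:
                    return [form] + rest
                # otherwise backtrack and try the next occurrence
    return None

result = naive_parse(tuple("num * num + num".split()))
```

The search terminates on this grammar because every reduction either shortens the form or moves a symbol up the chain num, T, E. It may reduce substrings in any order, so the forms it returns need not match a rightmost derivation in reverse.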

Page 18: Bottom-up Parsing

CPSC4600 18

Observations

The termination of the algorithm (when/if)

Running time of the algorithm

If there is more than one choice for the substring to be replaced (reduced), which one should be chosen?

Page 19: Bottom-up Parsing

CPSC4600 19

Where Do Reductions Happen

Recall: a bottom-up parser traces a rightmost derivation in reverse

Let αβω be a rightmost sentential form
Assume the next reduction is by X → β
Then ω is a string of terminals

Why? Because αXω → αβω is a step in a right-most derivation

Page 20: Bottom-up Parsing

CPSC4600 20

Shift-Reduce Parsing

Bottom-up parsing uses only two kinds of actions:
  Shift
  Reduce

Page 21: Bottom-up Parsing

CPSC4600 21

Shift

Shift: Move # (marking the part of the input that has been processed) one place to the right
  Shifts a terminal to the left string

ABC#xyz ⇒ ABCx#yz

Page 22: Bottom-up Parsing

CPSC4600 22

Reduce

Apply an inverse production at the right end of the left string
  If A → xy is a production, then

Cbxy#ijk ⇒ CbA#ijk
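The two moves can be sketched in Python as operations on a pair (left string, remaining input). The function names `shift`, `reduce_right` and `show` are hypothetical, chosen for this illustration only.

```python
# A configuration is (left, right): symbols before the # marker and after it.

def shift(left, right):
    """Move # one place to the right: the next input symbol joins the left string."""
    if not right:
        raise ValueError("nothing left to shift")
    return left + right[:1], right[1:]

def reduce_right(left, lhs, rhs):
    """Apply the inverse of lhs -> rhs at the right end of the left string."""
    cut = len(left) - len(rhs)
    if cut < 0 or left[cut:] != rhs:
        raise ValueError("right-hand side is not at the right end of the left string")
    return left[:cut] + [lhs]

def show(left, right):
    return "".join(left) + "#" + "".join(right)

# The two examples from the slides.
l1, r1 = shift(list("ABC"), list("xyz"))
l2 = reduce_right(list("Cbxy"), "A", list("xy"))
print(show(l1, r1))
print(show(l2, list("ijk")))
```

Note that `reduce_right` never looks at the remaining input; deciding when to call it is the job of the parsing tables discussed later.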

Page 23: Bottom-up Parsing

CPSC4600 23

The Example with Shift-Reduce Parsing

Input                  Action
# num * num + num      shift
num # * num + num      shift
num * # num + num      shift
num * num # + num      reduce T → num
num * T # + num        reduce T → num * T
T # + num              shift
T + # num              shift
T + num #              reduce T → num
T + T #                reduce E → T
T + E #                reduce E → T + E
E #

Page 24: Bottom-up Parsing

CPSC4600 24

A Shift-Reduce Parse in Detail (1)

# num * num + num

(Figure: parse tree so far, showing only the leaves num * num + num.)

Page 25: Bottom-up Parsing

CPSC4600 25

A Shift-Reduce Parse in Detail (2)

# num * num + num
num # * num + num

(Figure: parse tree so far, showing only the leaves num * num + num.)

Page 26: Bottom-up Parsing

CPSC4600 26

A Shift-Reduce Parse in Detail (3)

# num * num + num
num # * num + num
num * # num + num

(Figure: parse tree so far, showing only the leaves num * num + num.)

Page 27: Bottom-up Parsing

CPSC4600 27

A Shift-Reduce Parse in Detail (4)

# num * num + num
num # * num + num
num * # num + num
num * num # + num

(Figure: parse tree so far, showing only the leaves num * num + num.)

Page 28: Bottom-up Parsing

CPSC4600 28

A Shift-Reduce Parse in Detail (5)

# num * num + num
num # * num + num
num * # num + num
num * num # + num
num * T # + num

(Figure: parse tree so far. A T node sits above the second num.)

Page 29: Bottom-up Parsing

CPSC4600 29

A Shift-Reduce Parse in Detail (6)

# num * num + num
num # * num + num
num * # num + num
num * num # + num
num * T # + num
T # + num

(Figure: parse tree so far. A new T node covers num * T; the inner T sits above the second num.)

Page 30: Bottom-up Parsing

CPSC4600 30

A Shift-Reduce Parse in Detail (7)

# num * num + num
num # * num + num
num * # num + num
num * num # + num
num * T # + num
T # + num
T + # num

(Figure: parse tree so far, unchanged: T over num * T, with the inner T above the second num.)

Page 31: Bottom-up Parsing

CPSC4600 31

A Shift-Reduce Parse in Detail (8)

# num * num + num
num # * num + num
num * # num + num
num * num # + num
num * T # + num
T # + num
T + # num
T + num #

(Figure: parse tree so far, unchanged: T over num * T, with the inner T above the second num.)

Page 32: Bottom-up Parsing

CPSC4600 32

A Shift-Reduce Parse in Detail (9)

# num * num + num
num # * num + num
num * # num + num
num * num # + num
num * T # + num
T # + num
T + # num
T + num #
T + T #

(Figure: parse tree so far. As before, plus a T node above the third num.)

Page 33: Bottom-up Parsing

CPSC4600 33

A Shift-Reduce Parse in Detail (10)

# num * num + num
num # * num + num
num * # num + num
num * num # + num
num * T # + num
T # + num
T + # num
T + num #
T + T #
T + E #

(Figure: parse tree so far. As before, plus an E node above the rightmost T.)

Page 34: Bottom-up Parsing

CPSC4600 34

A Shift-Reduce Parse in Detail (11)

# num * num + num
num # * num + num
num * # num + num
num * num # + num
num * T # + num
T # + num
T + # num
T + num #
T + T #
T + E #
E #

(Figure: the complete parse tree. The root E covers T + E.)

Page 35: Bottom-up Parsing

CPSC4600 35

The Stack

Left string can be implemented by a stack
  Top of the stack is the #

Shift pushes a terminal on the stack

Reduce pops 0 or more symbols off of the stack (production rhs) and pushes a non-terminal on the stack (production lhs)
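As an illustration of the stack view, the sketch below replays the shift-reduce parse of num * num + num with a Python list as the stack. The action list is supplied by hand, since nothing so far says how to choose between shifting and reducing; the function name `run` and the action encoding are assumptions of this example.

```python
# Stack-based replay: "shift" pushes the next terminal; a (lhs, rhs) pair pops
# len(rhs) symbols and pushes lhs.

def run(tokens, actions):
    stack, pos, trace = [], 0, []
    for action in actions:
        if action == "shift":
            stack.append(tokens[pos])      # push a terminal
            pos += 1
        else:
            lhs, rhs = action
            cut = len(stack) - len(rhs)
            assert cut >= 0 and stack[cut:] == rhs, "handle is not on top of the stack"
            del stack[cut:]                # pop the production's right-hand side
            stack.append(lhs)              # push the production's left-hand side
        trace.append(" ".join(stack + ["#"] + tokens[pos:]))
    return stack, trace

T_NUM = ("T", ["num"])
T_MUL = ("T", ["num", "*", "T"])
E_T = ("E", ["T"])
E_ADD = ("E", ["T", "+", "E"])

actions = ["shift", "shift", "shift", T_NUM, T_MUL,
           "shift", "shift", T_NUM, E_T, E_ADD]
stack, trace = run("num * num + num".split(), actions)
```

After the last action the stack holds only E and the input is empty, which is the accepting configuration E #.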

Page 36: Bottom-up Parsing

CPSC4600 36

Key Issue (will be resolved by algorithms)

How do we decide when to shift or reduce?
  Consider step: num # * num + num
  We could reduce by T → num giving T # * num + num
  A fatal mistake: No way to reduce to the start symbol E

Page 37: Bottom-up Parsing

CPSC4600 37

Conflicts

Generic shift-reduce strategy:
  If there is a handle on top of the stack, reduce
  Otherwise, shift

But what if there is a choice?
  If it is legal to shift or reduce, there is a shift-reduce conflict
  If it is legal to reduce by two different productions, there is a reduce-reduce conflict

Page 38: Bottom-up Parsing

CPSC4600 38

Conflict Example

Consider the ambiguous grammar:

E → E + E
  | E * E
  | (E)
  | num

Page 39: Bottom-up Parsing

CPSC4600 39

One Shift-Reduce Parse

Input                  Action
# num * num + num      shift
. . .                  . . .
E * E # + num          reduce E → E * E
E # + num              shift
E + # num              shift
E + num #              reduce E → num
E + E #                reduce E → E + E
E #

Page 40: Bottom-up Parsing

CPSC4600 40

Another Shift-Reduce Parse

Input                  Action
# num * num + num      shift
. . .                  . . .
E * E # + num          shift
E * E + # num          shift
E * E + num #          reduce E → num
E * E + E #            reduce E → E + E
E * E #                reduce E → E * E
E #

Page 41: Bottom-up Parsing

CPSC4600 41

Observations

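In the second step E * E # + num we can either shift or reduce by E → E * E

Choice determines the relative precedence of + and * (the same kind of choice between two occurrences of one operator determines its associativity)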

As noted previously, grammar can be rewritten to enforce precedence

Precedence declarations are an alternative

Page 42: Bottom-up Parsing

CPSC4600 42

Overview

LR(k) parsing
  L: scan input Left to right
  R: produce a Rightmost derivation (in reverse)
  k: tokens of lookahead

LR(0): zero tokens of look-ahead

SLR: Simple LR; like LR(0) but uses FOLLOW sets to build more “precise” parsing tables

Page 43: Bottom-up Parsing

CPSC4600 43

Basic Terminologies

Handle: a substring that matches the right side of a production, and whose reduction to that production’s left side constitutes one step along the reverse of a rightmost derivation of the string from the start nonterminal of the grammar

Page 44: Bottom-up Parsing

CPSC4600 44

Model of Shift-Reduce Parsing

- Stack + input = current right sentential form.
- Locate the handle during parsing: shift zero or more terminals (tokens) onto the stack until a handle β is on top of the stack.
- Replace the handle with the proper nonterminal (handle pruning): reduce β to A, where A → β.

Page 45: Bottom-up Parsing

CPSC4600 45

Model of an LR Parser

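(Figure: model of an LR parser. The LR parsing program reads the input a1 ... ai ... an $, maintains a stack s0 ... Xm-1 sm-1 Xm sm with sm on top, consults the action and goto tables, and produces the output.)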

Page 46: Bottom-up Parsing

CPSC4600 46

Problem: when to shift, when to reduce?

Recall grammar:
E → T + E | T
T → num * T | num | (E)

How to know when to reduce and when to shift?

Page 47: Bottom-up Parsing

CPSC4600 47

Model of Shift-Reduce Parsing

- Stack + input = current right sentential form.
- Locate the handle during parsing: shift zero or more terminals onto the stack until a handle is on top of the stack.
- Replace the handle with the proper nonterminal (handle pruning).

Page 48: Bottom-up Parsing

CPSC4600 48

What we need to know to do LR parsing

LR(0) states
  describe states in which the parser can be
  Note: LR(0) states are used by both LR(0) and SLR parsers

Parsing tables
  transitions between LR(0) states
  actions to take at transition: shift, reduce, accept, error

How to construct LR(0) states
How to construct parsing tables
How to drive the parser

Page 49: Bottom-up Parsing

CPSC4600 49

An LR(0) state = a set of LR(0) items

An LR(0) item [X → α.β] says that
  the parser is looking for an X
  it has an α on top of the stack
  it expects to find in the input a string derived from β.

Notes:
  [X → α.aβ] means that if a is on the input, it can be shifted. That is: a is a correct token to see on the input, and shifting a would not “over-shift” (the stack is still a viable prefix).
  [X → α.] means that we could reduce α to X
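An LR(0) state is obtained by closing a set of items: whenever the dot stands before a nonterminal, the items for that nonterminal's productions (with the dot at the far left) are added. Below is a minimal Python sketch for the example grammar, augmented with S' → E. The tuple encoding of items and the names `closure` and `fmt` are assumptions of this sketch.

```python
# Grammar from the slides plus the extra start production S' -> E.
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("num", "*", "T"), ("num",), ("(", "E", ")")],
}

def closure(items):
    """Close a set of LR(0) items; an item is (lhs, rhs, dot_position)."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        # Dot before a nonterminal: the parser may start looking for it here.
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def fmt(item):
    lhs, rhs, dot = item
    return f"{lhs} -> " + " ".join(rhs[:dot] + (".",) + rhs[dot:])

start_state = closure({("S'", ("E",), 0)})
for text in sorted(fmt(item) for item in start_state):
    print(text)
```

For the start item S' → . E this yields six items, one for S' and one for each production of E and T with the dot at the left end.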

Page 50: Bottom-up Parsing

CPSC4600 50

LR(0) states

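(Figure: DFA of LR(0) states for the grammar E → T + E | T, T → num * T | num | (E), with S’ → E added. The states are numbered here for reference only; state 1 is the start state.)

State 1:   S’ → . E
           E → . T
           E → . T + E
           T → . (E)
           T → . num * T
           T → . num

State 2:   S’ → E .

State 3:   E → T .
           E → T . + E

State 4:   T → num . * T
           T → num .

State 5:   T → ( . E)
           E → . T
           E → . T + E
           T → . (E)
           T → . num * T
           T → . num

State 6:   E → T + . E
           E → . T
           E → . T + E
           T → . (E)
           T → . num * T
           T → . num

State 7:   T → num * . T
           T → . (E)
           T → . num * T
           T → . num

State 8:   T → (E . )

State 9:   E → T + E .

State 10:  T → num * T .

State 11:  T → (E) .

Transitions:
  from 1:  E to 2,  T to 3,  num to 4,  ( to 5
  from 3:  + to 6
  from 4:  * to 7
  from 5:  E to 8,  T to 3,  num to 4,  ( to 5
  from 6:  E to 9,  T to 3,  num to 4,  ( to 5
  from 7:  T to 10,  num to 4,  ( to 5
  from 8:  ) to 11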

Page 51: Bottom-up Parsing

CPSC4600 51

SLR Parsing

Remember the state of the automaton on each prefix of the stack

Change stack to contain pairs ⟨Symbol, DFA State⟩

Page 52: Bottom-up Parsing

CPSC4600 52

SLR Parsing (Contd.)

For a stack ⟨sym1, state1⟩ . . . ⟨symn, staten⟩

staten is the final state of the DFA on sym1 … symn

Detail: The bottom of the stack is ⟨any, start⟩ where
  any is any dummy symbol
  start is the start state of the DFA

Page 53: Bottom-up Parsing

CPSC4600 53

Goto Table

Define Goto[i,A] = j if there is a transition from state i to state j on A, where A is a nonterminal

Goto is just the transition function of the DFA
  One of two parsing tables
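The sketch below builds the DFA of LR(0) states for the example grammar and then extracts the Goto table as the transitions on nonterminals. It is a self-contained Python illustration; the names `closure`, `move`, `build_dfa` and the 0-based state numbers (assigned in discovery order) are assumptions of this sketch and need not match any numbering used elsewhere.

```python
# Grammar from the slides, augmented with S' -> E.
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("num", "*", "T"), ("num",), ("(", "E", ")")],
}

def closure(items):
    """LR(0) closure; an item is (lhs, rhs, dot_position)."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)

def move(state, symbol):
    """DFA transition: advance the dot over `symbol`, then close the result."""
    kernel = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(kernel)

def build_dfa():
    start = closure({("S'", ("E",), 0)})
    states, transitions, work = [start], {}, [start]
    while work:
        state = work.pop()
        i = states.index(state)
        for sym in sorted({r[d] for (_, r, d) in state if d < len(r)}):
            target = move(state, sym)
            if target not in states:
                states.append(target)
                work.append(target)
            transitions[(i, sym)] = states.index(target)
    return states, transitions

states, transitions = build_dfa()
# Goto[i, A] = j: keep only the transitions labelled with a nonterminal.
goto_table = {(i, sym): j for (i, sym), j in transitions.items() if sym in GRAMMAR}
```

The remaining transitions, those on terminals, are what the shift entries of the Action table are read from.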

Page 54: Bottom-up Parsing

CPSC4600 54

Parser Moves

Shift x
  Push ⟨a, x⟩ on the stack
  a is the current input
  x is a DFA state

Reduce A → α
  As before

Accept

Error

Page 55: Bottom-up Parsing

CPSC4600 55

Action Table

For each state si and terminal a

  If si has item X → α.aβ and there is a transition on terminal a from state i to state j, then Action[i,a] = shift j

  If si has item X → α. and a ∈ Follow(X) and X != S’, then Action[i,a] = reduce X → α

  If si has item S’ → S. then Action[i,$] = accept

  Otherwise, Action[i,a] = error
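To see how these rules settle the shift-or-reduce question, the Python sketch below computes Follow sets for the example grammar and then derives the Action entries of a single state from its items. It assumes a grammar without ε-productions (true for the example), records only "shift" rather than the target state, and uses the hypothetical names `follow_sets` and `actions_for`.

```python
GRAMMAR = {
    "S'": [("E",)],
    "E": [("T", "+", "E"), ("T",)],
    "T": [("num", "*", "T"), ("num",), ("(", "E", ")")],
}

def follow_sets(grammar, start):
    """Follow sets by fixed-point iteration; assumes no epsilon-productions."""
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                head = first[rhs[0]] if rhs[0] in grammar else {rhs[0]}
                if not head <= first[a]:
                    first[a] |= head
                    changed = True
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[a]    # x ends the rhs: inherit Follow(a)
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow

def actions_for(items, follow):
    """Action entries of one state, given its items (lhs, rhs, dot)."""
    table = {}
    for lhs, rhs, dot in items:
        if dot < len(rhs):
            entries = {} if rhs[dot] in GRAMMAR else {rhs[dot]: "shift"}
        elif lhs == "S'":
            entries = {"$": "accept"}
        else:
            entries = {a: f"reduce {lhs} -> {' '.join(rhs)}" for a in follow[lhs]}
        for a, act in entries.items():
            if table.setdefault(a, act) != act:
                raise ValueError(f"conflict on {a}")
    return table

follow = follow_sets(GRAMMAR, "S'")
after_num = actions_for({("T", ("num",), 1), ("T", ("num", "*", "T"), 1)}, follow)
```

In the state reached after a num, the entry for * is a shift and the entries for +, ) and $ are reductions by T → num. This is why the parser does not reduce in the configuration num # * num + num.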

Page 56: Bottom-up Parsing

CPSC4600 56

SLR Parsing Algorithm

Let I = w$ be initial input
Let j = 0
Let DFA state 1 have item S’ → .S
Let stack = ⟨dummy, 1⟩
repeat
    case action[top_state(stack), I[j]] of
        shift k: push ⟨I[j++], k⟩
        reduce X → α: pop |α| pairs, I[--j] = X   // prepend X to input
        accept: halt normally
        error: halt and report error
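A runnable Python sketch of such a driver for the example grammar follows. Two things are assumptions of this sketch rather than statements from the slides: the tables are written out by hand for one possible numbering of the eleven LR(0) states (state 1 is the start state), and after a reduction the driver pushes Goto[top, X] directly, which is the common formulation and has the same effect as prepending X to the input and shifting it.

```python
PRODS = [
    ("E", ("T", "+", "E")),    # 0
    ("E", ("T",)),             # 1
    ("T", ("num", "*", "T")),  # 2
    ("T", ("num",)),           # 3
    ("T", ("(", "E", ")")),    # 4
]
FOLLOW_E = ["$", ")"]
FOLLOW_T = ["+", "$", ")"]

# Hand-built SLR tables (assumed state numbering, see the lead-in).
ACTION = {(2, "$"): "accept", (3, "+"): ("shift", 6),
          (4, "*"): ("shift", 7), (8, ")"): ("shift", 11)}
for s in (1, 5, 6, 7):                      # states expecting the start of a T
    ACTION[(s, "num")] = ("shift", 4)
    ACTION[(s, "(")] = ("shift", 5)
for state, prod, lookaheads in [(3, 1, FOLLOW_E), (9, 0, FOLLOW_E),
                                (4, 3, FOLLOW_T), (10, 2, FOLLOW_T),
                                (11, 4, FOLLOW_T)]:
    for a in lookaheads:
        ACTION[(state, a)] = ("reduce", prod)
GOTO = {(1, "E"): 2, (1, "T"): 3, (5, "E"): 8, (5, "T"): 3,
        (6, "E"): 9, (6, "T"): 3, (7, "T"): 10}

def parse(tokens):
    """Return the list of reductions performed, or raise SyntaxError."""
    stack = [1]                  # DFA states only; the symbols are not needed
    inp = list(tokens) + ["$"]
    j = 0
    reductions = []
    while True:
        act = ACTION.get((stack[-1], inp[j]))
        if act is None:
            raise SyntaxError(f"unexpected {inp[j]!r} at position {j}")
        if act == "accept":
            return reductions
        kind, arg = act
        if kind == "shift":
            stack.append(arg)
            j += 1
        else:
            lhs, rhs = PRODS[arg]
            del stack[len(stack) - len(rhs):]       # pop |rhs| entries
            stack.append(GOTO[(stack[-1], lhs)])    # transition on lhs
            reductions.append(f"{lhs} -> {' '.join(rhs)}")

reductions = parse("num * num + num".split())
```

The returned reductions appear in the order the parser performs them, which is a rightmost derivation read backwards.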

Page 57: Bottom-up Parsing

CPSC4600 57

Notes on SLR Parsing Algorithm

Note that the algorithm uses only the DFA states and the input
  The stack symbols are never used!

However, we still need the symbols for semantic actions

Page 58: Bottom-up Parsing

CPSC4600 58

The Compiler So Far

Lexical analysis
  Detects inputs with illegal tokens

Parsing
  Detects inputs with ill-formed parse trees

Semantic analysis
  Last “front end” phase
  Catches all remaining errors

Page 59: Bottom-up Parsing

CPSC4600 59

Typical Semantic Errors

multiple declarations: a variable should be declared (in the same scope) at most once

undeclared variable: a variable should not be used before being declared.

type mismatch: type of the left-hand side of an assignment should match the type of the right-hand side.

wrong arguments: methods should be called with the right number and types of arguments.

Page 60: Bottom-up Parsing

CPSC4600 60

Sample Semantic Analyzer

For each scope in the program:

  process the declarations
    add new entries to the symbol table (or a similar structure) and
    report any variables that are multiply declared

  process the statements
    find uses of undeclared variables,
    use the symbol-table information to determine the type of each expression, and to find type errors.
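A scoped symbol table that supports these steps can be sketched as a stack of dictionaries. The class and method names below (`SymbolTable`, `enter_scope`, `declare`, `lookup`) are hypothetical and chosen for this illustration; error handling is reduced to collecting messages.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its declared type."""

    def __init__(self):
        self.scopes = [{}]       # the outermost scope
        self.errors = []

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, typ):
        """Add an entry; report a second declaration in the same scope."""
        current = self.scopes[-1]
        if name in current:
            self.errors.append(f"multiple declaration of {name}")
        else:
            current[name] = typ

    def lookup(self, name):
        """Find the innermost declaration; report use of an undeclared name."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        self.errors.append(f"undeclared variable {name}")
        return None

table = SymbolTable()
table.declare("x", "integer")
table.declare("x", "char")          # same scope: reported as an error
table.enter_scope()
table.declare("x", "Boolean")       # inner scope: hides the outer x
inner_type = table.lookup("x")
table.exit_scope()
outer_type = table.lookup("x")
missing = table.lookup("z")         # never declared: reported as an error
```

Type checking would then compare the types returned by `lookup` for the two sides of an assignment or for the arguments of a call.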

Page 61: Bottom-up Parsing

CPSC4600 61

Scope Rules for Pascal-

Rule 6.1: All constants, types, variables, and procedures defined in the same block must have different names.

Rule 6.2: A constant, type, or variable defined in a block is normally known from the end of its declaration to the end of the block. A procedure defined in a block B is normally known from the beginning of the procedure to the end of the block B.

Rule 6.3: Consider a block Q that defines an object x. If Q contains a block R that defines another object named x, the first object is unknown in the scope of the second object.

Page 62: Bottom-up Parsing

CPSC4600 62

Pascal- Program (1)

{ Begin Standard Block }
program P;
type T = array[1..100] of integer;
var x: T;

procedure Q(x: integer);
const c = 13;
begin ... x ... end {Q};

procedure R;
var b, c: Boolean;
begin ... x ... end {R};

begin ... end. {P}
{ End Standard block }

Page 63: Bottom-up Parsing

CPSC4600 63

Pascal- Program (2)

{Constant = Numeral | ConstantName.}
procedure Constant(Stop: Symbols);
begin
  if Symbol = Numeral1 then
    Expect(Numeral, Stop)
  else if Symbol = Name1 then
    begin
      Find(Argument);
      Expect(Name1, Stop)
    end
  else
    SyntaxError(Stop)
end;

Page 64: Bottom-up Parsing

CPSC4600 64

Pascal- Program (3)

{ConstantDefinition = ConstantName '=' Constant ';'.}
procedure ConstantDefinition(stop: Symbols);
begin
  ExpectName(Name, Symbols[Equal1, Semicolon1] + ConstantSymbols + Stop);
  Expect(Equal1, ConstantSymbols + Symbols[Semicolon1] + Stop);
  Constant(Symbols[Semicolon1] + Stop);
  Define(Name);
  Expect(Semicolon1, Stop)
end;

Page 65: Bottom-up Parsing

CPSC4600 65

Pascal- Program (4)

{Program = 'program' ProgramName ';' BlockBody '.'}
procedure Programx(Stop: Symbols);
begin
  Expect(Program1, Symbols[Name1, Semicolon1, Period1] + BlockSymbols + Stop);
  Expect(Name1, Symbols[Semicolon1, Period1] + BlockSymbols + Stop);
  Expect(Semicolon1, Symbols[Period1] + BlockSymbols + Stop);
  NewBlock;
  BlockBody(Symbols[Period1] + Stop);
  EndBlock;
  Expect(Period1, Stop)
end;

Page 66: Bottom-up Parsing

CPSC4600 66

Pascal- Program (5-1)

{Constant = Numeral | ConstantName.}
procedure Constant(var Value: integer; var Typex: Pointer; Stop: Symbols);
begin
  if Symbol = Numeral1 then
    begin
      Value := Argument;
      Typex := TypeInteger;
      Expect(Numeral, Stop)
    end
  else if Symbol = Name1 then
    begin
      Find(Argument, Object);
      if [email protected] = Constantx then
        begin
          Value := [email protected];
          Typex := [email protected];
        end

Page 67: Bottom-up Parsing

CPSC4600 67

Pascal- Program (5-2)

      else
        begin
          KindError(object);
          Value := 0;
          Typex := TypeUniversal;
        end;
      Expect(Name1, Stop)
    end
  else
    begin
      SyntaxError(Stop);
      Value := 0;
      Typex := TypeUniversal;
    end;
end;

Page 68: Bottom-up Parsing

CPSC4600 68

Pascal- Program (6)

{ConstantDefinition = ConstantName '=' Constant ';'.}

procedure ConstantDefinition(stop: Symbols);
var
  Name, Value: integer;
  Constx, Typex: Pointer;
begin
  ExpectName(Name, Symbols[Equal1, Semicolon1] + ConstantSymbols + Stop);
  Expect(Equal1, ConstantSymbols + Symbols[Semicolon1] + Stop);
  Constant(Value, Typex, Symbols[Semicolon1] + Stop);
  Define(Name, Constantx, Constx);
  [email protected] := Value;
  [email protected] := Typex;
  Expect(Semicolon1, Stop)
end;

Page 69: Bottom-up Parsing

CPSC4600 69

Static and Dynamic Scope

#include <stdio.h>

int x = 1;
char y = 'a';

void p(void)
{
    double x = 2.5;
    printf("%c\n", y);
    { int y[10]; }
}

void q(void)
{
    int y = 42;
    printf("%d\n", x);
    p();
}

int main(void)
{
    char x = 'b';
    q();
    return 0;
}
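The difference between the two scoping disciplines can be made concrete with a small Python model of name lookup. The model assumes a particular reading of the program above: x = 1 and y = 'a' are global, main declares char x = 'b', q declares int y = 42, p declares double x = 2.5, and the call chain is main, then q, then p. The tables and function names are hypothetical.

```python
# Declarations visible in each scope, under the reading stated above.
DECLS = {
    "global": {"x": 1, "y": "a"},
    "main": {"x": "b"},
    "q": {"y": 42},
    "p": {"x": 2.5},
}
# Static (lexical) nesting: every function is declared at global level.
LEXICAL_PARENT = {"global": None, "main": "global", "q": "global", "p": "global"}

def lookup_static(name, scope):
    """Static scope: search the enclosing program text, innermost first."""
    while scope is not None:
        if name in DECLS[scope]:
            return DECLS[scope][name]
        scope = LEXICAL_PARENT[scope]
    raise NameError(name)

def lookup_dynamic(name, call_stack):
    """Dynamic scope: search the active calls, most recent first."""
    for scope in reversed(call_stack):
        if name in DECLS[scope]:
            return DECLS[scope][name]
    raise NameError(name)

# x as seen by the printf in q, y as seen by the printf in p.
x_static = lookup_static("x", "q")
x_dynamic = lookup_dynamic("x", ["global", "main", "q"])
y_static = lookup_static("y", "p")
y_dynamic = lookup_dynamic("y", ["global", "main", "q", "p"])
```

C uses static scope, so under this reading q sees the global x and p sees the global y. Under dynamic scope q would instead find main's x, and p would find q's y.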

