CPSC4600 1
Bottom-up Parsing
Reading Sections 4.5 and 4.7 from ASU
Predictive Parsing Summary
First and Follow sets are used to construct predictive parsing tables
For non-terminal A and input t, use a production A → α where t ∈ First(α)
For non-terminal A and input t, if ε ∈ First(A) and t ∈ Follow(A), then use a production A → α where ε ∈ First(α)
Recursive-descent parsers without backtracking do not need the parse table explicitly
Bottom-Up Parsing (1)
Bottom-up parsing is more general than top-down parsing
And just as efficient
Builds on ideas in top-down parsing
Bottom-up is the preferred method in practice
Bottom-Up Parsing (2)
Table-driven using an explicit stack (non-recursive)
Stack can be viewed as containing both terminals and nonterminals
Basic operations:
Shift: move terminals from the input stream to the stack until the right-hand side of an appropriate production rule has been identified on top of the stack
Reduce: replace the string on top of the stack that matches the right-hand side of an appropriate production rule with the nonterminal on the left-hand side of that production
An Introductory Example
Bottom-up parsers don’t need left-factored grammars
Hence we can revert to the “natural” grammar for our example:
E → T + E | T
T → num * T | num | (E)
Consider the string: num * num + num
The Idea
Bottom-up parsing reduces a string to the start symbol by inverting productions:
Input               Productions Used
num * num + num     T → num
num * T + num       T → num * T
T + num             T → num
T + T               E → T
T + E               E → T + E
E
Right-most Derivation
In a right-most derivation, the rightmost nonterminal of a sentential form is replaced at each derivation step.
Question: find the rightmost derivation of the string num * num + num
Observation
Read the productions found by bottom-up parse in reverse (i.e., from bottom to top)
This is a rightmost derivation!
num * num + num     T → num
num * T + num       T → num * T
T + num             T → num
T + T               E → T
T + E               E → T + E
E
Important Facts
A bottom-up parser traces a rightmost derivation in reverse
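The claim can be checked mechanically for the running example. The following is a minimal sketch, assuming the five reductions found by the bottom-up parse of num * num + num; the position index stored with each reduction and the names REDUCTIONS and replay are illustrative and do not come from the slides.

```python
NONTERMINALS = {"E", "T"}

# Reductions in the order the bottom-up parse applies them.
# Each entry: (lhs, rhs, index in the current sentential form where rhs starts).
REDUCTIONS = [
    ("T", ["num"], 2),            # num * num + num -> num * T + num
    ("T", ["num", "*", "T"], 0),  # num * T + num   -> T + num
    ("T", ["num"], 2),            # T + num         -> T + T
    ("E", ["T"], 2),              # T + T           -> T + E
    ("E", ["T", "+", "E"], 0),    # T + E           -> E
]

def replay(tokens):
    """Apply the reductions; return all sentential forms and a rightmost flag."""
    form = list(tokens)
    forms = [form]
    rightmost = True
    for lhs, rhs, pos in REDUCTIONS:
        assert form[pos:pos + len(rhs)] == rhs, "reduction does not apply"
        form = form[:pos] + [lhs] + form[pos + len(rhs):]
        # Read backwards, this step expands the nonterminal at `pos`.
        # It is the rightmost nonterminal if only terminals follow it.
        rightmost = rightmost and not (set(form[pos + 1:]) & NONTERMINALS)
        forms.append(form)
    return forms, rightmost

forms, rightmost = replay("num * num + num".split())
for form in reversed(forms):  # the reversed parse, i.e. a derivation from E
    print(" ".join(form))
print("rightmost derivation:", rightmost)
```

Printing the forms in reverse order gives the derivation starting from E, and the flag confirms that every step expands the rightmost nonterminal.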
A Bottom-up Parse
(read from the bottom line up)
E
T + E
T + T
T + num
num * T + num
num * num + num
[Figure: parse tree for num * num + num. The root E has children T, +, E; the left T has children num, *, T, and that inner T derives num; the right E derives T, which derives num.]
A Bottom-up Parse in Detail
[Figure, built up over six steps: each step adds one reduction from the list above to the partial parse tree, starting from the bare tokens num * num + num and ending with the complete tree rooted at E.]
Bottom-up Parsing
A trivial bottom-up parsing algorithm
Let I = input string
repeat
    pick a non-empty substring β of I where X → β is a production
    if no such β, backtrack
    replace one β by X in I
until I = “S” (the start symbol) or all possibilities are exhausted
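The trivial algorithm can be sketched as an exhaustive search with backtracking. The version below is only an illustration for the example grammar; the function name naive_parse and the seen set (which records forms already tried, so the search does not repeat work) are assumptions added here, not part of the slides.

```python
GRAMMAR = [
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("num", "*", "T")),
    ("T", ("num",)),
    ("T", ("(", "E", ")")),
]

def naive_parse(form, start="E", seen=None):
    """Return the list of sentential forms from `form` up to the start symbol,
    or None if no sequence of reductions reaches it."""
    form = tuple(form)
    seen = set() if seen is None else seen
    if form == (start,):
        return [form]
    if form in seen:          # already explored without success
        return None
    seen.add(form)
    for lhs, rhs in GRAMMAR:
        n = len(rhs)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == rhs:
                # replace this occurrence of rhs by lhs, then keep reducing
                rest = naive_parse(form[:i] + (lhs,) + form[i + n:], start, seen)
                if rest is not None:
                    return [form] + rest
                # otherwise backtrack and try the next substring
    return None

steps = naive_parse("num * num + num".split())
for step in steps:
    print(" ".join(step))
```

The search succeeds on this input but may try many useless reductions first, which is why the later slides look for a way to choose the right substring directly.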
Observations
Does the algorithm terminate, and if so, when?
What is the running time of the algorithm?
If there is more than one choice for the substring to be replaced (reduced), which one should be chosen?
Where Do Reductions Happen
Recall: a bottom-up parser traces a rightmost derivation in reverse
Let αβω be a rightmost sentential form
Assume the next reduction is by X → β
Then ω is a string of terminals
Why? Because αXω → αβω is a step in a rightmost derivation
Shift-Reduce Parsing
Bottom-up parsing uses only two kinds of actions:
Shift
Reduce
Shift
Shift: move # (which marks the end of the part of the input that has been processed) one place to the right
Shifts a terminal to the left string
ABC#xyz ⇒ ABCx#yz
Reduce
Apply an inverse production at the right end of the left string
If A → xy is a production, then
Cbxy#ijk ⇒ CbA#ijk
The Example with Shift-Reduce Parsing
Input                   Action
# num * num + num       shift
num # * num + num       shift
num * # num + num       shift
num * num # + num       reduce T → num
num * T # + num         reduce T → num * T
T # + num               shift
T + # num               shift
T + num #               reduce T → num
T + T #                 reduce E → T
T + E #                 reduce E → T + E
E #
A Shift-Reduce Parse in Detail
[Figure, built up over 11 steps: each step adds one configuration and grows the partial parse tree for num * num + num above the tokens.]
(1) # num * num + num
(2) num # * num + num
(3) num * # num + num
(4) num * num # + num
(5) num * T # + num
(6) T # + num
(7) T + # num
(8) T + num #
(9) T + T #
(10) T + E #
(11) E #
The Stack
Left string can be implemented by a stack
Top of the stack is the #
Shift pushes a terminal on the stack
Reduce pops 0 or more symbols off the stack (production rhs) and pushes a non-terminal on the stack (production lhs)
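The two stack operations can be sketched in a few lines. This is an illustrative sketch only: the ACTIONS list hard-codes the action sequence of the num * num + num example instead of deciding it, and the names shift, reduce, ACTIONS and run are assumptions made for this example.

```python
def shift(stack, rest):
    """Push the next input terminal on the stack (moves # one place right)."""
    stack.append(rest.pop(0))

def reduce(stack, lhs, rhs):
    """Pop the symbols of rhs off the top of the stack and push lhs."""
    start = len(stack) - len(rhs)
    assert start >= 0 and stack[start:] == list(rhs), "rhs is not on top of the stack"
    del stack[start:]
    stack.append(lhs)

# Action sequence of the example parse, given here rather than computed.
ACTIONS = [
    ("shift",), ("shift",), ("shift",),
    ("reduce", "T", ["num"]),
    ("reduce", "T", ["num", "*", "T"]),
    ("shift",), ("shift",),
    ("reduce", "T", ["num"]),
    ("reduce", "E", ["T"]),
    ("reduce", "E", ["T", "+", "E"]),
]

def run(tokens):
    """Replay ACTIONS and return every configuration as 'stack # input'."""
    stack, rest = [], list(tokens)
    trace = [" ".join(stack + ["#"] + rest)]
    for act in ACTIONS:
        if act[0] == "shift":
            shift(stack, rest)
        else:
            reduce(stack, act[1], act[2])
        trace.append(" ".join(stack + ["#"] + rest))
    return trace

trace = run("num * num + num".split())
print("\n".join(trace))
```

Because a reduction only ever touches the top of the stack, a Python list with append and slice deletion is enough; nothing to the left of the handle is inspected.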
Key Issue (will be resolved by algorithms)
How do we decide when to shift or reduce?
Consider step: num # * num + num
We could reduce by T → num, giving T # * num + num
A fatal mistake: there is no way to reduce to the start symbol E
Conflicts
Generic shift-reduce strategy:
If there is a handle on top of the stack, reduce
Otherwise, shift
But what if there is a choice?
If it is legal to shift or reduce, there is a shift-reduce conflict
If it is legal to reduce by two different productions, there is a reduce-reduce conflict
Conflict Example
Consider the ambiguous grammar:
E → E + E
  | E * E
  | (E)
  | num
One Shift-Reduce Parse
Input                   Action
# num * num + num       shift
. . .                   . . .
E * E # + num           reduce E → E * E
E # + num               shift
E + # num               shift
E + num #               reduce E → num
E + E #                 reduce E → E + E
E #
Another Shift-Reduce Parse
Input                   Action
# num * num + num       shift
. . .                   . . .
E * E # + num           shift
E * E + # num           shift
E * E + num #           reduce E → num
E * E + E #             reduce E → E + E
E * E #                 reduce E → E * E
E #
Observations
In the second step E * E # + num we can either shift or reduce by E → E * E
The choice determines the relative precedence of + and * (the same choice between two equal operators determines associativity)
As noted previously, grammar can be rewritten to enforce precedence
Precedence declarations are an alternative
Overview
LR(k) parsing
L: scan input Left to right
R: produce a Rightmost derivation (in reverse)
k tokens of lookahead
LR(0): zero tokens of lookahead
SLR: Simple LR, like LR(0) but uses FOLLOW sets to build more “precise” parsing tables
Basic Terminologies
Handle: a substring that matches the right side of a production and whose reduction to that production’s left side is one step of the reversed rightmost derivation of the string from the start nonterminal of the grammar
Model of Shift-Reduce Parsing
Stack + input = current right sentential form.
Locate the handle during parsing: shift zero or more terminals (tokens) onto the stack until a handle β is on top of the stack.
Replace the handle with the proper nonterminal (handle pruning): reduce β to A where A → β
Model of an LR Parser
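[Figure: model of an LR parser. The LR parsing program reads the input a1 ... ai ... an $, maintains a stack s0 ... Xm-1 sm-1 Xm sm with state sm on top, consults the action and goto tables, and produces the output.]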
Problem: when to shift, when to reduce?
Recall grammar:
E → T + E | T
T → num * T | num | (E)
How do we know when to reduce and when to shift?
What we need to know to do LR parsing
LR(0) states: describe the states the parser can be in
Note: LR(0) states are used by both LR(0) and SLR parsers
Parsing tables: transitions between LR(0) states and the action to take at each transition (shift, reduce, accept, error)
How to construct LR(0) states
How to construct parsing tables
How to drive the parser
An LR(0) state = a set of LR(0) items
An LR(0) item [X → α.β] says that
the parser is looking for an X
it has an α on top of the stack
it expects to find in the input a string derived from β
Notes:
[X → α.aβ] means that if a is on the input, it can be shifted. That is: a is a correct token to see on the input, and shifting a would not “over-shift” (the stack is still a viable prefix).
[X → α.] means that we could reduce α to X
LR(0) states
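[Figure: DFA of LR(0) states for the example grammar. States are numbered here in the order they are listed.]
State 1: S’ → .E   E → .T   E → .T + E   T → .(E)   T → .num * T   T → .num
State 2: S’ → E.
State 3: E → T.   E → T. + E
State 4: T → num. * T   T → num.
State 5: T → (.E)   E → .T   E → .T + E   T → .(E)   T → .num * T   T → .num
State 6: E → T + E.
State 7: E → T + .E   E → .T   E → .T + E   T → .(E)   T → .num * T   T → .num
State 8: T → num * .T   T → .(E)   T → .num * T   T → .num
State 9: T → num * T.
State 10: T → (E.)
State 11: T → (E).
Transitions:
1 on E → 2, 1 on T → 3, 1 on num → 4, 1 on ( → 5
3 on + → 7
4 on * → 8
5 on E → 10, 5 on T → 3, 5 on num → 4, 5 on ( → 5
7 on E → 6, 7 on T → 3, 7 on num → 4, 7 on ( → 5
8 on T → 9, 8 on num → 4, 8 on ( → 5
10 on ) → 11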
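The item sets can be computed with the usual closure and goto operations. The sketch below is one possible way to do it for the example grammar; representing an item as a (production index, dot position) pair and the names closure, goto and build_dfa are choices made for this illustration. Its state numbering starts at 0 and depends on discovery order, so it will not necessarily match the numbering of any figure.

```python
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("num", "*", "T")),
    ("T", ("num",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add X -> .gamma for every nonterminal X that appears right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)

def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved) if moved else None

def build_dfa():
    """Return (list of states, transition dict keyed by (state index, symbol))."""
    start = closure({(0, 0)})
    states, index, trans = [start], {start: 0}, {}
    i = 0
    while i < len(states):
        state = states[i]
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in sorted(symbols):
            target = goto(state, sym)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            trans[(i, sym)] = index[target]
        i += 1
    return states, trans

states, trans = build_dfa()
print(len(states), "states,", len(trans), "transitions")
```

For this grammar the construction finds 11 states and 18 transitions.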
SLR Parsing
Remember the state of the automaton on each prefix of the stack
Change stack to contain pairs ⟨Symbol, DFA State⟩
SLR Parsing (Contd.)
For a stack ⟨sym1, state1⟩ . . . ⟨symn, staten⟩
staten is the final state of the DFA on sym1 … symn
Detail: the bottom of the stack is ⟨any, start⟩ where
any is any dummy symbol
start is the start state of the DFA
Goto Table
Define Goto[i,A] = j if the DFA has a transition from state i to state j on A, where A is a nonterminal
Goto is just the transition function of the DFA
One of two parsing tables
Parser Moves
Shift x
Push ⟨a, x⟩ on the stack
a is the current input
x is a DFA state
Reduce X → α
As before
Accept
Error
Action Table
For each state si and terminal a
If si has item X → α.aβ and there is a transition on terminal a from state i to state j then Action[i,a] = shift j
If si has item X → α. and a ∈ Follow(X) and X ≠ S’ then Action[i,a] = reduce X → α
If si has item S’ → S. then Action[i,$] = accept
Otherwise, Action[i,a] = error
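To see how the Follow condition settles the shift-or-reduce question, the rules can be applied to a single state. The following is a sketch under two assumptions: the grammar has no ε-productions (true for the example grammar, and it keeps the First and Follow computation short), and an item is a (production index, dot position) pair. The function names are illustrative.

```python
GRAMMAR = [
    ("S'", ("E",)),
    ("E", ("T", "+", "E")),
    ("E", ("T",)),
    ("T", ("num", "*", "T")),
    ("T", ("num",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """First sets of the nonterminals (assumes no epsilon-productions)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets():
    """Follow sets of the nonterminals (assumes no epsilon-productions)."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    follow["S'"].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

def actions_for(items, follow):
    """Apply the SLR action rules to one state; map terminal -> set of actions."""
    actions = {}
    for prod, dot in items:
        lhs, rhs = GRAMMAR[prod]
        if dot < len(rhs):
            if rhs[dot] not in NONTERMINALS:
                actions.setdefault(rhs[dot], set()).add("shift")
        elif lhs == "S'":
            actions.setdefault("$", set()).add("accept")
        else:
            for a in follow[lhs]:
                actions.setdefault(a, set()).add(f"reduce {lhs} -> {' '.join(rhs)}")
    return actions

follow = follow_sets()
# The state reached after shifting num: { T -> num. * T,  T -> num. }
num_state = actions_for({(3, 1), (4, 1)}, follow)
print(num_state)
```

In the state reached after shifting num, the entry for * contains only a shift because * is not in Follow(T), so the parser does not make the fatal early reduction by T → num. A set with more than one action for some terminal would indicate a conflict.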
SLR Parsing Algorithm
Let I = w$ be initial input
Let j = 0
Let DFA state 1 have item S’ → .S
Let stack = ⟨dummy, 1⟩
repeat
    case action[top_state(stack), I[j]] of
        shift k: push ⟨I[j++], k⟩
        reduce X → α: pop |α| pairs, I[--j] = X   // prepend X to input
        accept: halt normally
        error: halt and report error
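A runnable sketch of the driver for the example grammar follows. It differs from the pseudocode in one labeled way: after a reduction it pushes ⟨X, Goto[state, X]⟩ directly instead of prepending X to the input, which is the more common formulation. The Follow sets are written out by hand for this grammar, states are numbered from 0 in discovery order, and all function names are assumptions of this sketch.

```python
GRAMMAR = [("S'", ("E",)), ("E", ("T", "+", "E")), ("E", ("T",)),
           ("T", ("num", "*", "T")), ("T", ("num",)), ("T", ("(", "E", ")"))]
NONTERMINALS = {"S'", "E", "T"}
FOLLOW = {"E": {"$", ")"}, "T": {"+", "$", ")"}}  # hand-computed for this grammar

def closure(items):
    items, work = set(items), list(items)
    while work:
        p, d = work.pop()
        rhs = GRAMMAR[p][1]
        if d < len(rhs) and rhs[d] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[d] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)

def goto(state, sym):
    return closure({(p, d + 1) for p, d in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym})

def build_tables():
    """Build the SLR action and goto tables; state 0 is the start state."""
    states = [closure({(0, 0)})]
    action, goto_table = {}, {}

    def set_action(key, value):
        # SLR allows at most one action per (state, terminal) pair.
        assert action.setdefault(key, value) == value, f"conflict at {key}"

    i = 0
    while i < len(states):
        for p, d in states[i]:
            lhs, rhs = GRAMMAR[p]
            if d < len(rhs):
                target = goto(states[i], rhs[d])
                if target not in states:
                    states.append(target)
                j = states.index(target)
                if rhs[d] in NONTERMINALS:
                    goto_table[i, rhs[d]] = j
                else:
                    set_action((i, rhs[d]), ("shift", j))
            elif lhs == "S'":
                set_action((i, "$"), ("accept",))
            else:
                for a in FOLLOW[lhs]:
                    set_action((i, a), ("reduce", p))
        i += 1
    return action, goto_table

def parse(tokens):
    """Return the reductions performed, or raise SyntaxError."""
    action, goto_table = build_tables()
    stack = [("dummy", 0)]            # pairs (symbol, DFA state)
    tokens = list(tokens) + ["$"]
    j, reductions = 0, []
    while True:
        act = action.get((stack[-1][1], tokens[j]))
        if act is None:
            raise SyntaxError(f"unexpected {tokens[j]!r} at position {j}")
        if act[0] == "shift":
            stack.append((tokens[j], act[1]))
            j += 1
        elif act[0] == "reduce":
            lhs, rhs = GRAMMAR[act[1]]
            del stack[len(stack) - len(rhs):]
            stack.append((lhs, goto_table[stack[-1][1], lhs]))
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
        else:
            return reductions

print(parse("num * num + num".split()))
```

The driver only ever inspects the state component of each stack pair; the symbol component is carried along for readability and for later semantic actions.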
Notes on SLR Parsing Algorithm
Note that the algorithm uses only the DFA states and the input
The stack symbols are never used!
However, we still need the symbols for semantic actions
The Compiler So Far
Lexical analysis: detects inputs with illegal tokens
Parsing: detects inputs with ill-formed parse trees
Semantic analysis: last “front end” phase; catches all remaining errors
Typical Semantic Errors
multiple declarations: a variable should be declared (in the same scope) at most once
undeclared variable: a variable should not be used before being declared.
type mismatch: type of the left-hand side of an assignment should match the type of the right-hand side.
wrong arguments: methods should be called with the right number and types of arguments.
Sample Semantic Analyzer
For each scope in the program:
process the declarations: add new entries to the symbol table (or a similar structure) and report any variables that are multiply declared
process the statements: find uses of undeclared variables, use the symbol-table information to determine the type of each expression, and find type errors
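The two passes per scope can be sketched on a toy program representation. Everything about the representation is an assumption made for this example: a scope is a dict with "decls" and "stmts", a statement is either ("assign", target, expr) or ("block", nested_scope), and an expression is a variable name or an integer or Boolean literal.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its declared type."""
    def __init__(self):
        self.scopes = []

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        """Return False if the name is already declared in the current scope."""
        if name in self.scopes[-1]:
            return False
        self.scopes[-1][name] = type_
        return True

    def lookup(self, name):
        """Search from the innermost scope outwards; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

def type_of(expr, table, errors):
    if isinstance(expr, bool):      # check bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(expr, int):
        return "integer"
    found = table.lookup(expr)
    if found is None:
        errors.append(f"undeclared variable: {expr}")
    return found

def analyze(scope, table=None, errors=None):
    table = SymbolTable() if table is None else table
    errors = [] if errors is None else errors
    table.enter_scope()
    for name, type_ in scope["decls"]:          # process the declarations
        if not table.declare(name, type_):
            errors.append(f"multiple declarations: {name}")
    for stmt in scope["stmts"]:                 # process the statements
        if stmt[0] == "block":
            analyze(stmt[1], table, errors)
        else:
            _, target, expr = stmt
            left = type_of(target, table, errors)
            right = type_of(expr, table, errors)
            if left and right and left != right:
                errors.append(f"type mismatch: {target} := {expr}")
    table.exit_scope()
    return errors

program = {
    "decls": [("x", "integer"), ("b", "Boolean"), ("x", "Boolean")],
    "stmts": [
        ("assign", "x", 1),
        ("assign", "b", "x"),
        ("assign", "y", 2),
        ("block", {"decls": [("x", "Boolean")],
                   "stmts": [("assign", "x", True)]}),
    ],
}
errors = analyze(program)
print(errors)
```

The nested block redeclares x as Boolean without an error because the declaration is in a different scope, while the second declaration of x in the outer scope is reported.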
Scope Rules for Pascal-
Rule 6.1: All constants, types, variables, and procedures defined in the same block must have different names.
Rule 6.2: A constant, type, or variable defined in a block is normally known from the end of its declaration to the end of the block. A procedure defined in a block B is normally known from the beginning of the procedure to the end of the block B.
Rule 6.3: Consider a block Q that defines an object x. If Q contains a block R that defines another object named x, the first object is unknown in the scope of the second object.
Pascal- Program (1)
{ Begin Standard Block }
program P;
  type T = array[1..100] of integer;
  var x: T;

  procedure Q(x: integer);
    const c = 13;
  begin ... x ... end {Q};

  procedure R;
    var b, c: Boolean;
  begin ... x ... end {R};

begin ... end. {P}
{ End Standard Block }
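The block structure of program P can be traced with a small block table. This is a sketch only: the Python methods new_block, end_block, define and find are hypothetical analogs of block-handling procedures in a compiler, the kind strings are free-form labels, and treating the parameter x as part of Q's block is an assumption of the sketch.

```python
class BlockTable:
    """A stack of blocks; each block maps a name to a description of the object."""
    def __init__(self):
        self.blocks = []

    def new_block(self):
        self.blocks.append({})

    def end_block(self):
        self.blocks.pop()

    def define(self, name, kind):
        # Rule 6.1: names defined in the same block must differ.
        if name in self.blocks[-1]:
            raise NameError(f"ambiguous name: {name}")
        self.blocks[-1][name] = kind

    def find(self, name):
        # Rule 6.3: the innermost definition hides the outer ones.
        for block in reversed(self.blocks):
            if name in block:
                return block[name]
        raise NameError(f"undefined name: {name}")

table = BlockTable()
table.new_block()                          # standard block
table.define("integer", "standard type")
table.define("Boolean", "standard type")

table.new_block()                          # block of program P
table.define("T", "array type")
table.define("x", "variable of type T")

table.define("Q", "procedure")             # Q is known from its beginning on
table.new_block()                          # block of procedure Q
table.define("x", "integer parameter")
table.define("c", "constant 13")
x_in_q = table.find("x")                   # the parameter hides P's x
table.end_block()

table.define("R", "procedure")
table.new_block()                          # block of procedure R
table.define("b", "Boolean variable")
table.define("c", "Boolean variable")      # legal: Q's c is no longer known
x_in_r = table.find("x")                   # P's x is visible again
c_in_r = table.find("c")
table.end_block()

print(x_in_q, "|", x_in_r, "|", c_in_r)
```

Inside Q the name x refers to the integer parameter, while inside R it refers to the array variable of P; c can be defined again in R because Q's block has already ended.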
Pascal- Program (2)
{Constant = Numeral | ConstantName.}
procedure Constant(Stop: Symbols);
begin
  if Symbol = Numeral1 then
    Expect(Numeral1, Stop)
  else if Symbol = Name1 then
    begin
      Find(Argument);
      Expect(Name1, Stop)
    end
  else
    SyntaxError(Stop)
end;
Pascal- Program (3)
{ConstantDefinition = ConstantName '=' Constant ';'.}
procedure ConstantDefinition(Stop: Symbols);
begin
  ExpectName(Name, Symbols[Equal1, Semicolon1] + ConstantSymbols + Stop);
  Expect(Equal1, ConstantSymbols + Symbols[Semicolon1] + Stop);
  Constant(Symbols[Semicolon1] + Stop);
  Define(Name);
  Expect(Semicolon1, Stop)
end;
Pascal- Program (4)
{Program = 'program' ProgramName ';' BlockBody '.'}
procedure Programx(Stop: Symbols);
begin
  Expect(Program1, Symbols[Name1, Semicolon1, Period1] + BlockSymbols + Stop);
  Expect(Name1, Symbols[Semicolon1, Period1] + BlockSymbols + Stop);
  Expect(Semicolon1, Symbols[Period1] + BlockSymbols + Stop);
  NewBlock;
  BlockBody(Symbols[Period1] + Stop);
  EndBlock;
  Expect(Period1, Stop)
end;
Pascal- Program (5)
{Constant = Numeral | ConstantName.}
procedure Constant(var Value: integer; var Typex: Pointer; Stop: Symbols);
begin
  if Symbol = Numeral1 then
    begin
      Value := Argument;
      Typex := TypeInteger;
      Expect(Numeral1, Stop)
    end
  else if Symbol = Name1 then
    begin
      Find(Argument, Object);
      if [email protected] = Constantx then
        begin
          Value := [email protected];
          Typex := [email protected];
        end
      else
        begin
          KindError(Object);
          Value := 0;
          Typex := TypeUniversal;
        end;
      Expect(Name1, Stop)
    end
  else
    begin
      SyntaxError(Stop);
      Value := 0;
      Typex := TypeUniversal;
    end;
end;
Pascal- Program (6)
{ConstantDefinition = ConstantName '=' Constant ';'.}
procedure ConstantDefinition(Stop: Symbols);
var
  Name, Value: integer;
  Constx, Typex: Pointer;
begin
  ExpectName(Name, Symbols[Equal1, Semicolon1] + ConstantSymbols + Stop);
  Expect(Equal1, ConstantSymbols + Symbols[Semicolon1] + Stop);
  Constant(Value, Typex, Symbols[Semicolon1] + Stop);
  Define(Name, Constantx, Constx);
  [email protected] := Value;
  [email protected] := Typex;
  Expect(Semicolon1, Stop)
end;
Static and Dynamic Scope
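#include <stdio.h>

int x = 1;
char y = 'a';

void p() {
    double x = 2.5;          /* hides the global x inside p */
    printf("%c\n", y);
    { int y[10]; }           /* nested block with its own y */
}

void q() {
    int y = 42;              /* hides the global y inside q */
    printf("%d\n", x);
    p();
}

int main() {
    char x = 'b';            /* hides the global x inside main */
    q();
    return 0;
}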