Date post:  23Jul2020 
Programming Language & Compiler
Top-down Parser
Hwansoo Han
Parsing Techniques
❖ Top-down parsers
▪ LL = Left-to-right input scan, Leftmost derivation
▪ Start at the root of the parse tree and grow toward the leaves
▪ Pick a production & try to match the input
▪ A bad “pick” may force the parser to backtrack
▪ Some grammars are backtrack-free (predictive parsing)
❖ Bottom-up parsers
▪ LR = Left-to-right input scan, Rightmost derivation (in reverse)
▪ Start at the leaves and grow toward the root
▪ As input is consumed, encode possibilities in an internal state
▪ Start in a state valid for legal first tokens
▪ Bottom-up parsers handle a large class of grammars
Top-down Parser
❖ Problems in top-down parsers
▪ Backtracking, which a predictive parser avoids
▪ Left recursion may result in an infinite loop
❖ Predictive parser
▪ LL(1) property
▪ Left factoring transforms some non-LL(1) grammars into LL(1)
Remember the expression grammar?
❖ Version with precedence derived last lecture
1 Goal → Expr
2 Expr → Expr + Term
3      | Expr – Term
4      | Term
5 Term → Term * Factor
6      | Term / Factor
7      | Factor
8 Factor → number
9        | id
And the input x – 2 * y
Top-down Parser – backtrack (1)
❖ Let’s try x – 2 * y :
Rule   Sentential Form     Input (↑ = current position)
—      Goal                ↑x – 2 * y
1      Expr                ↑x – 2 * y
2      Expr + Term         ↑x – 2 * y
4      Term + Term         ↑x – 2 * y
7      Factor + Term       ↑x – 2 * y
9      <id,x> + Term       ↑x – 2 * y
—      <id,x> + Term       x ↑– 2 * y
▪ This worked well, except that “–” doesn’t match “+”
▪ The parser must backtrack to the point where Expr was expanded
[Parse tree: Goal → Expr → Expr + Term; the left Expr expands to Term → Fact. → <id,x>]
Leftmost derivation, choose productions in an order that exposes problems
Top-down Parser – backtrack (2)
❖ Continuing with x – 2 * y :
Rule   Sentential Form     Input (↑ = current position)
—      Goal                ↑x – 2 * y
1      Expr                ↑x – 2 * y
3      Expr – Term         ↑x – 2 * y
4      Term – Term         ↑x – 2 * y
7      Factor – Term       ↑x – 2 * y
9      <id,x> – Term       ↑x – 2 * y
—      <id,x> – Term       x ↑– 2 * y
—      <id,x> – Term       x – ↑2 * y
▪ This time, “–” and “–” matched
▪ We can advance past “–” to look at “2”
[Parse tree: Goal → Expr → Expr – Term; the left Expr expands to Term → Fact. → <id,x>]
Now, we need to expand Term, the last NT on the fringe
Top-down Parser – backtrack (3)
❖ Trying to match the “2” in x – 2 * y :
Rule   Sentential Form     Input (↑ = current position)
—      <id,x> – Term       x – ↑2 * y
7      <id,x> – Factor     x – ↑2 * y
8      <id,x> – <num,2>    x – ↑2 * y
—      <id,x> – <num,2>    x – 2 ↑* y
❖ Where are we?
▪ “2” matches “2”
▪ We have more input, but no NTs left to expand
▪ The expansion terminated too soon
Need to backtrack
[Parse tree: Goal → Expr → Expr – Term; the left Expr expands to Term → Fact. → <id,x>, the right Term to Fact. → <num,2>]
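Top-down Parser – backtrack (4)
❖ Trying again with “2” in x – 2 * y :
Rule   Sentential Form               Input (↑ = current position)
—      <id,x> – Term                 x – ↑2 * y
5      <id,x> – Term * Factor        x – ↑2 * y
7      <id,x> – Factor * Factor      x – ↑2 * y
8      <id,x> – <num,2> * Factor     x – ↑2 * y
—      <id,x> – <num,2> * Factor     x – 2 ↑* y
—      <id,x> – <num,2> * Factor     x – 2 * ↑y
9      <id,x> – <num,2> * <id,y>     x – 2 * ↑y
—      <id,x> – <num,2> * <id,y>     x – 2 * y ↑
▪ This time, we matched & consumed all the input
Success!
[Parse tree: Goal → Expr → Expr – Term; the left Expr expands to Term → Fact. → <id,x>, the right Term to Term * Fact., with Term → Fact. → <num,2> and Fact. → <id,y>]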
Left Recursion
❖ Top-down parsers cannot handle left-recursive grammars
▪ Formally,
A grammar is left recursive if ∃ A ∈ NT such that
∃ a derivation A ⇒+ Aα, for some string α ∈ (NT ∪ T)+
❖ Our expression grammar is left recursive
▪ This can lead to non-termination in a top-down parser
▪ We would like to convert the left recursion to right recursion
Eliminating Left Recursion
❖ Remove left recursion
▪ Original grammar
F → F α
  | β
where neither α nor β starts with F
▪ Rewrite the above as
F → β P
P → α P
  | ε
where P is a new nonterminal
▪ Accepts the same language, but uses only right recursion
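The rewrite above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `eliminate_direct_left_recursion` and the list-of-symbols encoding are hypothetical, an empty list stands for ε, the new nonterminal is named by appending an apostrophe, and only direct left recursion (A → A α) is handled.

```python
def eliminate_direct_left_recursion(nt, productions):
    """Rewrite  A -> A a1 | A a2 | b1 | b2  as  A -> b P,  P -> a P | eps.

    `productions` is a list of right-hand sides, each a list of symbols.
    An empty list stands for epsilon. (Hypothetical encoding for this sketch.)
    """
    new_nt = nt + "'"
    # The alpha parts: what follows the leading A in each left-recursive rule.
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # The beta parts: alternatives that do not start with A.
    others = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: productions}  # nothing to rewrite
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }


expr = eliminate_direct_left_recursion(
    "Expr", [["Expr", "+", "Term"], ["Expr", "-", "Term"], ["Term"]]
)
for lhs, alternatives in expr.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs) or "ε")
```

Applied to the Expr productions of the expression grammar, the sketch yields one production for Expr and three for the new nonterminal, the last of which is ε.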
Eliminating Left Recursion
❖ The expression grammar contains two left recursions
Expr → Expr + Term
     | Expr – Term
     | Term
Term → Term * Factor
     | Term / Factor
     | Factor
❖ Applying the transformation yields
Expr  → Term Expr′
Expr′ → + Term Expr′
      | – Term Expr′
      | ε
Term  → Factor Term′
Term′ → * Factor Term′
      | / Factor Term′
      | ε
▪ These fragments use only right recursion
▪ They retain the original left associativity (evaluate left to right)
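Once the grammar is right recursive, each nonterminal can become one function that looks at a single token to pick its production. The following is a minimal sketch of such a recursive-descent parser, not a definitive implementation: the names `tokenize` and `Parser`, the tuple-shaped tree, and the use of an ASCII "-" for the minus sign are all assumptions made for illustration. The new nonterminals are written `expr_prime` and `term_prime`, and each passes the left operand along so that the resulting tree stays left associative.

```python
import re

TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([-+*/]))")


def tokenize(text):
    """Split the input into (kind, text) pairs, ending with an EOF token."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        if m.group(1):
            tokens.append(("number", m.group(1)))
        elif m.group(2):
            tokens.append(("id", m.group(2)))
        else:
            tokens.append((m.group(3), m.group(3)))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def goal(self):                 # Goal -> Expr
        tree = self.expr()
        if self.peek() != "EOF":
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return tree

    def expr(self):                 # Expr -> Term Expr'
        return self.expr_prime(self.term())

    def expr_prime(self, left):     # Expr' -> + Term Expr' | - Term Expr' | eps
        if self.peek() in ("+", "-"):
            op = self.advance()[0]
            right = self.term()
            return self.expr_prime((op, left, right))
        return left                 # no + or - ahead: take the eps alternative

    def term(self):                 # Term -> Factor Term'
        return self.term_prime(self.factor())

    def term_prime(self, left):     # Term' -> * Factor Term' | / Factor Term' | eps
        if self.peek() in ("*", "/"):
            op = self.advance()[0]
            right = self.factor()
            return self.term_prime((op, left, right))
        return left                 # no * or / ahead: take the eps alternative

    def factor(self):               # Factor -> number | id
        if self.peek() in ("number", "id"):
            return self.advance()[1]
        raise SyntaxError(f"expected number or id, found {self.peek()!r}")


print(Parser("x - 2 * y").goal())   # ('-', 'x', ('*', '2', 'y'))
```

No function ever undoes a choice: one token of lookahead is enough at every step, which is the backtrack-free behaviour the transformation was meant to enable.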
Predictive Parsing
❖ Basic idea
Given A → α | β, the parser should be able to choose between α & β
❖ FIRST sets
▪ For some rhs α ∈ G, define FIRST(α) as the set of tokens that appear as the first symbol in some string that derives from α
▪ That is, x ∈ FIRST(α) iff α ⇒* x γ, for some γ
❖ The LL(1) Property (first version)
▪ If A → α and A → β both appear in the grammar, we would like
FIRST(α) ∩ FIRST(β) = ∅
▪ This would allow the parser to make a correct choice with a lookahead of exactly one symbol!
Predictive Parsing – LL(1)
❖ What about ε-productions? They complicate the definition of LL(1)
▪ If A → α and A → β and ε ∈ FIRST(α), then we need to ensure that FIRST(β) is disjoint from FOLLOW(A), too
▪ Define FIRST+(α) for A → α as
FIRST(α) ∪ FOLLOW(A), if ε ∈ FIRST(α)
FIRST(α), otherwise
▪ Then, a grammar is LL(1) iff A → α and A → β implies
FIRST+(α) ∩ FIRST+(β) = ∅
▪ FOLLOW(A) is the set of all tokens in the grammar that can legally appear immediately after an A
▪ FIRST+(α) is meaningful iff α is the RHS of a rule A → α
The FIRST Set
❖ Definition
▪ x ∈ FIRST(α) iff α ⇒* x γ, for some γ
❖ Building FIRST(X)
▪ If X is a terminal (token), FIRST(X) = {X}
▪ If X → ε, then ε ∈ FIRST(X)
▪ Iterate until no more terminals or ε can be added to any FIRST(X)
if X → y1y2 … yk then
a ∈ FIRST(X) if a ∈ FIRST(yi) and ε ∈ FIRST(yh) for all 1 ≤ h < i
ε ∈ FIRST(X) if ε ∈ FIRST(yi) for all 1 ≤ i ≤ k
End iterate
❖ Note
▪ If ε ∉ FIRST(y1), then FIRST(yi) is irrelevant, for i > 1
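The iteration above is a fixed-point computation, which the following Python sketch tries to make concrete. The dictionary encoding of the grammar, the names `first_sets`, `GRAMMAR` and `TERMINALS`, and the ASCII spellings `Expr'` and `Term'` for the new nonterminals of the right-recursive expression grammar are assumptions for illustration; an empty right-hand side stands for ε.

```python
EPS = "ε"


def first_sets(grammar, terminals):
    """grammar maps each nonterminal to a list of right-hand sides.

    Each right-hand side is a list of symbols; [] stands for epsilon.
    """
    first = {t: {t} for t in terminals}       # FIRST(X) = {X} for a terminal
    for nt in grammar:
        first[nt] = set()
    changed = True
    while changed:                            # iterate until nothing is added
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                for sym in rhs:
                    first[nt] |= first[sym] - {EPS}
                    if EPS not in first[sym]:
                        break                 # later symbols are irrelevant
                else:
                    # Every y_i can derive epsilon (or the rhs is empty).
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first


GRAMMAR = {
    "Goal": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["*", "Factor", "Term'"], ["/", "Factor", "Term'"], []],
    "Factor": [["number"], ["id"]],
}
TERMINALS = {"+", "-", "*", "/", "number", "id"}

FIRST = first_sets(GRAMMAR, TERMINALS)
for nt in GRAMMAR:
    print(f"FIRST({nt}) =", sorted(FIRST[nt]))
```

For this grammar the loop needs a few passes, because FIRST(Factor) must be known before it can flow into FIRST(Term), then FIRST(Expr), then FIRST(Goal).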
The FOLLOW Set
❖ Definition
▪ FOLLOW(A) is the set of terminals that can appear immediately to the right of A in some sentential form
❖ Building FOLLOW(X) for every nonterminal X
▪ EOF ∈ FOLLOW(S), where S is the start symbol
▪ Iterate until no more terminals can be added to any FOLLOW(X)
If A → αB, then put FOLLOW(A) in FOLLOW(B)
If A → αBβ, then put FIRST(β) – {ε} in FOLLOW(B)
If A → αBβ and ε ∈ FIRST(β), then put FOLLOW(A) in FOLLOW(B)
End iterate
❖ Note
▪ FOLLOW is for nonterminals, no FOLLOW for terminals
▪ No ε in FOLLOW(X) for any nonterminal X
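The three FOLLOW rules can be folded into one loop over every occurrence of a nonterminal B in a right-hand side, as in the sketch below. It is a hedged illustration: the helper names `first_of_string`, `first_sets` and `follow_sets`, the dictionary encoding, and the ASCII names `Expr'` and `Term'` for the right-recursive expression grammar are assumptions. FIRST is recomputed here only so that the block runs on its own.

```python
EPS, EOF = "ε", "EOF"


def first_of_string(symbols, first):
    """FIRST of a symbol string; holds epsilon iff every symbol can vanish."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def first_sets(grammar, terminals):
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)                    # EOF follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:      # terminals have no FOLLOW set
                        continue
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}  # FIRST(beta) without epsilon
                    if EPS in beta_first:     # beta is empty or can vanish
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


GRAMMAR = {
    "Goal": [["Expr"]],
    "Expr": [["Term", "Expr'"]],
    "Expr'": [["+", "Term", "Expr'"], ["-", "Term", "Expr'"], []],
    "Term": [["Factor", "Term'"]],
    "Term'": [["*", "Factor", "Term'"], ["/", "Factor", "Term'"], []],
    "Factor": [["number"], ["id"]],
}
FIRST = first_sets(GRAMMAR, {"+", "-", "*", "/", "number", "id"})
FOLLOW = follow_sets(GRAMMAR, FIRST, "Goal")
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) =", sorted(FOLLOW[nt]))
```

Treating an empty β as a string whose FIRST set is {ε} lets the rule for A → αB share code with the rule for A → αBβ where β can derive ε.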
Left Factoring
❖ What if my grammar does not have the LL(1) property?
Sometimes, we can transform the grammar
∀ A ∈ NT, find the longest prefix α that occurs in two
or more right-hand sides of A
if α ≠ ε then replace all of the A productions, A → αβ1 | αβ2 | … | αβn | γ,
with A → αZ | γ and Z → β1 | β2 | … | βn
where Z is a new element of NT
Repeat until no common prefixes remain
▪ For example,
A → αβ1 | αβ2 | αβ3
becomes
A → αZ
Z → β1 | β2 | β3
Left Factoring (An example)
❖ Consider the following expression grammar
Factor → Identifier
       | Identifier [ ExprList ]
       | Identifier ( ExprList )
FIRST+(rhs1) = { Identifier }
FIRST+(rhs2) = { Identifier }
FIRST+(rhs3) = { Identifier }
All three alternatives start with Identifier, so one token of lookahead cannot choose between them
❖ After left factoring, it becomes
Factor → Identifier Arguments
Arguments → [ ExprList ]
          | ( ExprList )
          | ε
FIRST+(rhs1) = { [ }
FIRST+(rhs2) = { ( }
FIRST+(rhs3) = { ε } ∪ FOLLOW(Factor)
It has the LL(1) property
❖ This form has the same syntax, with the LL(1) property
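One pass of left factoring can be sketched as follows. The function names `common_prefix` and `left_factor_once`, the list-of-symbols encoding, and passing the new nonterminal's name as an argument are assumptions for illustration; the sketch performs a single pass, so a full implementation would repeat it until no common prefix remains.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor_once(nt, productions, new_nt):
    """Factor out the longest prefix shared by at least two alternatives."""
    best = []
    for i, a in enumerate(productions):
        for b in productions[i + 1:]:
            prefix = common_prefix(a, b)
            if len(prefix) > len(best):
                best = prefix
    if not best:
        return {nt: productions}          # no common prefix: nothing to do
    k = len(best)
    with_prefix = [rhs for rhs in productions if rhs[:k] == best]
    without = [rhs for rhs in productions if rhs[:k] != best]
    return {
        nt: [best + [new_nt]] + without,
        # What remains after the prefix; [] stands for epsilon.
        new_nt: [rhs[k:] for rhs in with_prefix],
    }


factored = left_factor_once(
    "Factor",
    [
        ["Identifier"],
        ["Identifier", "[", "ExprList", "]"],
        ["Identifier", "(", "ExprList", ")"],
    ],
    "Arguments",
)
for lhs, alternatives in factored.items():
    for rhs in alternatives:
        print(lhs, "->", " ".join(rhs) or "ε")
```

On the Factor example the sketch produces the same productions as the factored grammar, with the ε alternative of Arguments listed first because it comes from the first original alternative.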