Top-Down Parsing
Teoría de Autómatas y Lenguajes Formales (Automata Theory and Formal Languages)
M. Luisa González Díaz
Universidad de Valladolid, 2006
Task
Parsing (of course); but do it:
• Top-down
• Easily and algorithmically
• Efficiently
  – Knowing as little of the input as possible
  – Marking errors as soon as possible
Example
type → simple | ^ simple | array [ simple ] of type
simple → integer | char | num ptpt num

Input: array [ num ptpt num ] of char $

The parser starts at type and asks the Lexical Analyzer for one token at a time. The parse tree grows from the root as tokens are consumed:

Consumed                          Remaining                           Step
(nothing)                         array [ num ptpt num ] of char $    expand type → array [ simple ] of type
array                             [ num ptpt num ] of char $          match array
array [                           num ptpt num ] of char $            match [
array [ num                       ptpt num ] of char $                expand simple → num ptpt num; match num
array [ num ptpt                  num ] of char $                     match ptpt
array [ num ptpt num              ] of char $                         match num
array [ num ptpt num ]            of char $                           match ]
array [ num ptpt num ] of         char $                              match of
array [ num ptpt num ] of char    $                                   expand type → simple, simple → char; match char
array [ num ptpt num ] of char    $ (no more)                         input exhausted: accept
Code
procedure type;
  if … then
    match(array); match('['); simple; match(']'); match(of); type
  else if … then
    match('^'); simple
  else if … then
    simple
  else error
Example
type → simple | ^ simple | array [ simple ] of type
simple → integer | char | num ptpt num

Input: ^ integer $

The lookahead ^ selects type → ^ simple. After ^ is matched, the lookahead integer selects simple → integer, and integer is matched.
match
procedure match (t: token);
begin
  if lookahead = t then
    lookahead := nexttoken   { ask the lexical analyzer }
  else error
end
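The two procedures above and match can be put together into a small runnable parser. The following is only a sketch in Python rather than the slides' Pascal-style pseudocode; the names TypeParser, parse_type and ParseError are illustrative, and the "lexical analyzer" is assumed to be a plain split on whitespace. The conditions of the ifs are filled in with the lookahead tokens that start each alternative.

```python
class ParseError(Exception):
    """Raised when the lookahead does not fit any rule (assumed error policy)."""


class TypeParser:
    # Tokens that can start `simple` (used to choose type -> simple).
    SIMPLE_START = {"integer", "char", "num"}

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0
        self.lookahead = self.tokens[0]

    def match(self, t):
        # Consume the lookahead if it is the expected token, else fail.
        if self.lookahead != t:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1
        if self.pos < len(self.tokens):
            self.lookahead = self.tokens[self.pos]

    def type_(self):
        if self.lookahead == "array":            # type -> array [ simple ] of type
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        elif self.lookahead == "^":              # type -> ^ simple
            self.match("^")
            self.simple()
        elif self.lookahead in self.SIMPLE_START:  # type -> simple
            self.simple()
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in type")

    def simple(self):
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":            # simple -> num ptpt num
            self.match("num")
            self.match("ptpt")
            self.match("num")
        else:
            raise ParseError(f"unexpected {self.lookahead!r} in simple")


def parse_type(text):
    """Return True if `text` (space-separated tokens) is a valid type."""
    parser = TypeParser(text.split())
    parser.type_()
    if parser.lookahead != "$":
        raise ParseError(f"trailing input at {parser.lookahead!r}")
    return True
```

For instance, parse_type("array [ num ptpt num ] of char") succeeds, while parse_type("array [ num ] of char") raises ParseError as soon as ptpt is expected and ] is found.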
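Solving ifs
type → simple | ^ simple | array [ simple ] of type
simple → integer | char | num ptpt num

We said "that rule" because we know the next input symbol (lookahead). In type (rule 1: type → simple, rule 2: type → ^ simple, rule 3: type → array [ simple ] of type):
• array : rule 3
• ^ : rule 2
• rule 1, with two possible conditions:
  – any other symbol: an error will be detected later
  – integer, char, num: an error is detected now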
Predictive Parsing Table
PPT      integer   char     num      array    ^        [    of    ]    $
type     rule 1    rule 1   rule 1   rule 3   rule 2
simple   rule 4    rule 5   rule 6

In general, rows are nonterminals, columns are terminals, and each entry holds a rule:
PPT            …   terminal   …
…                  …
Non terminal   …   rule       …
…                  …
Generalizing
type → simple | ^ simple | array [ simple ] of type
simple → integer | char | num ptpt num

Choose rule A → α when a (the lookahead) can appear as the first symbol derived from α.

[Figure: derivation trees rooted at A in which a is the leftmost terminal derived from α]

FIRST (α) := { a ∈ Σ_T | α ⇒* a β }     (still not complete!)
Predictive Parsing Table
PPT   …   a        …
…         …
A     …   A → α    …
…         …

A → α with α ≠ ε goes to PPT [ A, a ] for every a in FIRST (α)
FIRST
FIRST (α) := { a ∈ Σ_T | α ⇒* a β }     (still not complete!)
• FIRST (a) = { a }
• FIRST (A) = ⋃ FIRST (α), taken over all rules A → α ∈ P
• FIRST (A α) = FIRST (A)
FIRST
E → T E’
E’ → + T E’ | ab
T → F T’
T’ → * F T’ | a
F → ( E ) | n

First (E) = First (TE’) = First (T)
First (E’) = First (+TE’) ∪ First (ab) = { +, a }
First (T) = First (FT’) = First (F)
First (T’) = First (*FT’) ∪ First (a) = { *, a }
First (F) = { (, n }

FIRST   E      E’     T      T’     F
        ( n    + a    ( n    * a    ( n
A → α with α ≠ ε goes to T [ A, First (α) – {ε} ]

      (      n      a     *      +      )    $
E     TE’    TE’
E’                  ab           +TE’
T     FT’    FT’
T’                  a     *FT’
F     (E)    n
A bad example
program → program id ; | program id ( par-list ) ;

Input: program id ; $

Both alternatives begin with program id, so the lookahead cannot tell them apart.

A bad example: backtracking
If the parser guesses the second alternative, it matches program id and then expects ( while the Lexical Analyzer delivers ; . Oops! The parser must undo its work and try the other alternative, program id ; , which matches the input up to $ (no more).
Not so bad: left factoring
program → program id R ;
R → ( par-list ) | ε

Input: program id ; $

After matching program id the parser expands R. The lookahead is ; , so it chooses R → ε, matches ; and reaches $ (no more).
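Another bad example
E → E + n | n

Input: n + n + n + n $

[Figure: left-leaning parse tree in which E is expanded to E + n once per + in the input, ending in n]

With lookahead n both alternatives are possible, so the parser cannot know how many times to expand E → E + n before using E → n. A recursive procedure for E would call itself again and again without consuming any input.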
Not so bad either: eliminating left recursion
E → E + n | n
becomes
E → n E’
E’ → + n E’ | ε

Input: n + n + n $

E → n E’ matches the first n; each lookahead + selects E’ → + n E’; at $ the parser chooses E’ → ε and stops.
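The transformation shown above (A → A α | β becomes A → β A’, A’ → α A’ | ε) can be sketched in Python. This is only an illustration for immediate left recursion; the function name, the list-of-symbols representation of right-hand sides and the use of the empty list for ε are assumptions of the sketch, not part of the slides. It also assumes that no alternative is just A → A and that at least one alternative is not left recursive.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | ε.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    the empty list stands for ε. Returns a dict {nonterminal: alternatives}.
    """
    new_nt = nonterminal + "'"   # name of the fresh nonterminal (assumed unused)

    # Tails α of the left-recursive alternatives A -> A α.
    recursive_tails = [alt[1:] for alt in alternatives
                       if alt and alt[0] == nonterminal]
    # Alternatives β that do not start with A.
    others = [alt for alt in alternatives
              if not alt or alt[0] != nonterminal]

    if not recursive_tails:      # nothing to do
        return {nonterminal: alternatives}

    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive_tails] + [[]],
    }


if __name__ == "__main__":
    grammar = eliminate_immediate_left_recursion("E", [["E", "+", "n"], ["n"]])
    for head, alts in grammar.items():
        print(head, "→", " | ".join(" ".join(a) if a else "ε" for a in alts))
```

Applied to E → E + n | n, the function returns the two rules of the slide: E → n E’ and E’ → + n E’ | ε.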
FIRST
FIRST (α) := { a ∈ Σ_T | α ⇒* a β } ∪ { ε }, where ε is added only if α ⇒* ε
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | n

FIRST
First (E) = First (TE’) = First (T)
First (E’) = { +, ε }
First (T) = First (FT’) = First (F)
First (T’) = { *, ε }
First (F) = { (, n }

FIRST   E      E’     T      T’     F
        ( n    + ε    ( n    * ε    ( n

For strings that begin with symbols that can derive ε:
First ( E’F ) = { +, (, n }
First ( E’T’F ) = { +, *, (, n }
First ( E’T’T’ ) = { +, *, ε }
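FIRST
FIRST (α) := { a ∈ Σ_T | α ⇒* a β } ∪ { ε }, where ε is added only if α ⇒* ε
• FIRST (a) = { a }
• FIRST (A) = ⋃ FIRST (α), taken over all rules A → α ∈ P
• FIRST (A α) = FIRST (A) if A cannot derive ε;
  FIRST (A α) = (FIRST (A) – { ε }) ∪ FIRST (α) if A ⇒* ε
• ε ∈ FIRST (X1 X2 … Xp) iff ε ∈ FIRST (Xi) ∀ i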
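The rules above can be applied repeatedly until no FIRST set grows any more. The following Python sketch does this for the expression grammar of the slides; the dictionary representation (nonterminal → list of right-hand sides, with the empty list for ε) and the names GRAMMAR, first_of_string and compute_first are choices made for this illustration only.

```python
EPS = "ε"

# Expression grammar from the slides; [] stands for an ε right-hand side.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["n"]],
}


def first_of_string(symbols, first, grammar):
    """FIRST of a string X1 X2 ... Xp, given the current FIRST sets."""
    result = set()
    for x in symbols:
        if x not in grammar:          # terminal: FIRST(a) = {a}
            result.add(x)
            return result
        result |= first[x] - {EPS}    # (FIRST(A) - {ε}) ...
        if EPS not in first[x]:       # ... and go on only if A can derive ε
            return result
    result.add(EPS)                   # every Xi can derive ε (or p = 0)
    return result


def compute_first(grammar):
    """Iterate FIRST(A) = union of FIRST(α) over A → α until nothing changes."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for alpha in alternatives:
                new = first_of_string(alpha, first, grammar)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


if __name__ == "__main__":
    FIRST = compute_first(GRAMMAR)
    for a in GRAMMAR:
        print(f"First({a}) =", sorted(FIRST[a]))
    print(sorted(first_of_string(["E'", "T'", "F"], FIRST, GRAMMAR)))
```

The fixed-point loop is used because the FIRST sets depend on each other (First (E) needs First (T), which needs First (F)); a single pass in an arbitrary order would not be enough.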
Remember
program → program id R ;
R → ( par-list ) | ε

Input: program id ; $
After program id the parser expands R. The lookahead is ; , so R → ε is chosen, ; is matched and $ (no more) is reached.

Input: program id id
If R → ε is chosen for any lookahead other than ( , the tree gets its ε and the ERROR only appears afterwards, when ; is expected and id is found.
If R → ε is chosen only for the symbols that can follow R, the ERROR is raised at R itself: "Marking errors as soon as possible".
ε-rules
Choose rule A → ε when a (the lookahead) can appear following A in a sentential form.

FOLLOW (A) := { a ∈ Σ_T | S ⇒* α A a β } ∪ { $ }, where $ is added only if S ⇒* α A

[Figure: derivation trees rooted at S in which a appears immediately after A, or A ends the sentential form and is followed by $]
Follow
• Righthand sides α A β with β ≠ ε: add First (β) – {ε} to Follow (A)
• For every rule B → α A: add Follow (B) to Follow (A)
• For every rule B → α A β with β ⇒* ε: add Follow (B) to Follow (A)
• Add $ to Follow (Start Symbol)

Follow: algorithm
0. Add $ to Follow (Start Symbol)
1. Righthand sides α A β with β ≠ ε: add First (β) – {ε} to Follow (A)
2. For every rule B → α A, or B → α A β with β ⇒* ε: add Follow (B) to Follow (A)
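The algorithm above can be sketched in Python as follows. The sketch applies steps 1 and 2 together inside one loop that repeats until no set changes, which is an implementation choice of this illustration; the grammar representation and all function names are likewise assumptions, and FIRST is recomputed here so that the block runs on its own.

```python
EPS, END = "ε", "$"

GRAMMAR = {   # [] stands for an ε right-hand side
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["n"]],
}


def first_of_string(symbols, first):
    out = set()
    for x in symbols:
        fx = first.get(x, {x})        # a terminal is its own FIRST set
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)
    return out


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for alpha in alternatives:
                new = first_of_string(alpha, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {a: set() for a in grammar}
    follow[start].add(END)                        # step 0
    changed = True
    while changed:
        changed = False
        for b, alternatives in grammar.items():
            for rhs in alternatives:
                for i, a in enumerate(rhs):
                    if a not in grammar:          # only nonterminals have FOLLOW
                        continue
                    beta_first = first_of_string(rhs[i + 1:], first)
                    new = beta_first - {EPS}      # step 1: First(β) - {ε}
                    if EPS in beta_first:         # step 2: β is empty or β ⇒* ε
                        new |= follow[b]
                    if not new <= follow[a]:
                        follow[a] |= new
                        changed = True
    return follow


if __name__ == "__main__":
    for a, symbols in compute_follow(GRAMMAR, "E").items():
        print(f"Follow({a}) =", sorted(symbols))
```

Note that an empty β yields { ε } from first_of_string, so the case B → α A and the case B → α A β with β ⇒* ε are handled by the same test.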
Follow: example
E → T E’
E’ → + T E’ | ε
T → F T’
T’ → * F T’ | ε
F → ( E ) | n

FIRST   E      E’     T      T’     F
        ( n    + ε    ( n    * ε    ( n

0) Add $ to Follow (Start Symbol): $ goes to Follow (E).

1) Righthand sides α A β with β ≠ ε: add First (β) – {ε} to Follow (A).
   T E’      (from E → T E’)       add First (E’) – {ε} = { + } to Follow (T)
   + T E’    (from E’ → + T E’)    as before
   F T’      (from T → F T’)       add First (T’) – {ε} = { * } to Follow (F)
   * F T’    (from T’ → * F T’)    as before
   ( E )     (from F → ( E ))      add ) to Follow (E)

   After 0) and 1):  Follow (E) = { $, ) },  Follow (T) = { + },  Follow (F) = { * },  Follow (E’) and Follow (T’) still empty.

2) For every rule like B → α A, or B → α A β with β ⇒* ε: add Follow (B) to Follow (A).
   E → T E’       Follow (E) goes to Follow (E’) (B → α A) and, since E’ ⇒* ε, to Follow (T) (B → α A β)
   E’ → + T E’    Follow (E’) goes to Follow (E’) (nothing new) and, since E’ ⇒* ε, to Follow (T)
   T → F T’       Follow (T) goes to Follow (T’) and, since T’ ⇒* ε, to Follow (F)
   T’ → * F T’    Follow (T’) goes to Follow (T’) (nothing new) and, since T’ ⇒* ε, to Follow (F)

FOLLOW   E      E’     T        T’       F
         $ )    $ )    + $ )    + $ )    * + $ )
Predictive Parsing Table Construction
FIRST    E      E’     T        T’       F
         ( n    + ε    ( n      * ε      ( n
FOLLOW   E      E’     T        T’       F
         $ )    $ )    + $ )    + $ )    * + $ )

1) A → α with α ≠ ε goes to T [ A, First (α) – {ε} ]
   E → T E’       on ( and n
   E’ → + T E’    on +
   T → F T’       on ( and n
   T’ → * F T’    on *
   F → ( E )      on (
   F → n          on n

2) A → ε, or A → α with α ⇒* ε, goes to T [ A, Follow (A) ]
   E’ → ε         on ) and $
   T’ → ε         on +, ) and $

      (      n      *       +       )    $
E     TE’    TE’
E’                          +TE’    ε    ε
T     FT’    FT’
T’                  *FT’    ε       ε    ε
F     (E)    n
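The two construction steps can be written as a short Python function. This sketch takes the FIRST and FOLLOW sets of the slides as given data instead of computing them; the names RULES, build_table and the representation of a table cell as a list of right-hand sides are assumptions of the illustration. Keeping a list per cell makes conflicts visible: the grammar is LL(1) exactly when no cell holds more than one rule.

```python
EPS = "ε"

# Rules of the expression grammar; [] stands for an ε right-hand side.
RULES = [
    ("E",  ["T", "E'"]),
    ("E'", ["+", "T", "E'"]),
    ("E'", []),
    ("T",  ["F", "T'"]),
    ("T'", ["*", "F", "T'"]),
    ("T'", []),
    ("F",  ["(", "E", ")"]),
    ("F",  ["n"]),
]

# FIRST and FOLLOW sets as obtained on the slides.
FIRST = {"E": {"(", "n"}, "E'": {"+", EPS}, "T": {"(", "n"},
         "T'": {"*", EPS}, "F": {"(", "n"}}
FOLLOW = {"E": {"$", ")"}, "E'": {"$", ")"}, "T": {"+", "$", ")"},
          "T'": {"+", "$", ")"}, "F": {"*", "+", "$", ")"}}


def first_of_string(symbols):
    out = set()
    for x in symbols:
        fx = FIRST.get(x, {x})        # a terminal is its own FIRST set
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)
    return out


def build_table(rules):
    """Map (nonterminal, terminal) to the list of right-hand sides placed there."""
    table = {}
    for a, alpha in rules:
        first_alpha = first_of_string(alpha)
        columns = first_alpha - {EPS}             # step 1
        if EPS in first_alpha:                    # step 2: α = ε or α ⇒* ε
            columns = columns | FOLLOW[a]
        for terminal in columns:
            table.setdefault((a, terminal), []).append(alpha)
    return table


def is_ll1(table):
    return all(len(cell) == 1 for cell in table.values())


if __name__ == "__main__":
    table = build_table(RULES)
    for (a, terminal), cell in sorted(table.items()):
        print(a, terminal, [" ".join(rhs) or EPS for rhs in cell])
    print("LL(1):", is_ll1(table))
```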
Procedure E;
  if lookahead in {'(', n} then
    T; E’
  else error

Procedure E’;
  if lookahead = '+' then
    match ('+'); T; E’
  else if lookahead in {')', $} then
    nothing
  else error

Procedure F;
  if lookahead = '(' then
    match ('('); E; match (')')
  else if lookahead = n then
    match (n)
  else error
Recursive Descent Predictive Parsing Example
Input: n + n * n $

[Figure: parse tree for n + n * n $ grown top-down while the Lexical Analyzer delivers tokens: E → T E’, T → F T’, F → n, T’ → ε, E’ → + T E’, T → F T’, F → n, T’ → * F T’, F → n, T’ → ε, E’ → ε]
Predictive Parsing Example (non recursive)
      (      n      *       +       )    $
E     TE’    TE’
E’                          +TE’    ε    ε
T     FT’    FT’
T’                  *FT’    ε       ε    ε
F     (E)    n

stack       input      output
$E          n+n*n$     E → TE’
$E’T        n+n*n$     T → FT’
$E’T’F      n+n*n$     F → n
$E’T’n      n+n*n$
$E’T’       +n*n$      T’ → ε
$E’         +n*n$      E’ → +TE’
$E’T+       +n*n$
$E’T        n*n$       T → FT’
$E’T’F      n*n$       F → n
$E’T’n      n*n$
$E’T’       *n$        T’ → *FT’
$E’T’F*     *n$
$E’T’F      n$         F → n
$E’T’n      n$
$E’T’       $          T’ → ε
$E’         $          E’ → ε
$           $          OK
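The trace above follows a simple loop: pop the top of the stack; if it is a nonterminal, look up the table and push the right-hand side in reverse; if it is a terminal, it must equal the lookahead. A minimal Python sketch of that loop follows. The table is written out by hand from the slide, tokens are assumed to be already separated, and the names TABLE and parse are illustrative.

```python
# Predictive parsing table of the slides; [] stands for an ε right-hand side.
TABLE = {
    ("E", "("): ["T", "E'"],       ("E", "n"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "("): ["F", "T'"],       ("T", "n"): ["F", "T'"],
    ("T'", "*"): ["*", "F", "T'"], ("T'", "+"): [],
    ("T'", ")"): [],               ("T'", "$"): [],
    ("F", "("): ["(", "E", ")"],   ("F", "n"): ["n"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse(tokens, start="E"):
    """Return the list of rules applied (the 'output' column of the trace)."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]            # top of the stack is the end of the list
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:         # empty table cell: error
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            output.append(f"{top} → {' '.join(rhs) if rhs else 'ε'}")
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
        elif top == lookahead:      # terminal (or $): match and advance
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
    return output


if __name__ == "__main__":
    for rule in parse(["n", "+", "n", "*", "n"]):
        print(rule)
```

Pushing the right-hand side in reverse order is what makes the derivation leftmost: the first symbol of the rule is the next one to be expanded or matched.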
Predictive Parsing Example (non recursive)
output
E → TE’
T → FT’
F → n
T’ → ε
E’ → +TE’
T → FT’
F → n
T’ → *FT’
F → n
T’ → ε
E’ → ε
OK

[Figure: the output read top to bottom is a leftmost derivation; it builds the parse tree of n + n * n $ with root E, children T E’, and the three ε leaves under T’, T’ and E’]
LL(1)
• First L: input read from the left
• Second L: leftmost derivation
• (1): 1 lookahead input symbol
A definitely bad example: ambiguous grammar
S → if c then S | if c then S else S | o

The sentence if c then if c then o else o has two parse trees: the else can belong to the inner if or to the outer if.

[Figure: the two parse trees, and the same ambiguity as a flowchart with the tests "headache" and "pill" (Yes / No) and the actions "have it" and "study"]

Left factoring gives:
1: S → if c then S R
2: S → o
3: R → else S
4: R → ε

FIRST    S       R
         if o    else ε
FOLLOW   S         R
         $ else    $ else

      if    o    else    $
S     1     2
R                3, 4    4

T [ R, else ] holds two rules: Not LL(1)
Another definitely bad example
S → P d | Q e
P → a P c | b
Q → a Q d | b

S → P d generates a^n b c^n d and S → Q e generates a^n b d^n e. While the Lexical Analyzer is still delivering a a … b, nothing seen so far tells the two alternatives apart.

First (P) = { a, b }
First (Q) = { a, b }
First (Pd) = { a, b }
First (Qe) = { a, b }

PPT   a         b         c    d    e    $
S     Pd, Qe    Pd, Qe
P     aPc       b
Q     aQd       b

Not LL(1)
Grammars
• G ambiguous => not LL(1)
• G left recursive => not LL(1)
• G not left factorised => not LL(1)

[Figure: diagram of the set of grammars; the LL(1) region does not overlap the ambiguous, left recursive and not left factorised regions]

LL(1) Grammars: the good case
• No left recursion
• No left factor problems
• No ambiguity
• And more

Predictive parsing
• Only one symbol to decide: LL(1) grammars
A complete example
S → AB | bDBE
A → aSe
E → e
B → bBd | UV
U → u | ε
V → k | ε
D → aD | k

FIRST    S      A    E    B          U      V      D
         a b    a    e    b u k ε    u ε    k ε    a k

FOLLOW   S      A            E      B        U          V        D
         $ e    b u k $ e    $ e    e d $    k e d $    e d $    b u k e

Placing each rule:
S → AB      on a
S → bDBE    on b
A → aSe     on a
E → e       on e
B → bBd     on b
B → UV      on u, k (First (UV) – {ε}) and, since UV ⇒* ε, on d, e, $ (Follow (B))
U → u       on u
U → ε       on d, e, k, $ (Follow (U))
V → k       on k
V → ε       on d, e, $ (Follow (V))
D → aD      on a
D → k       on k

      a      b       d     e     u     k     $
S     AB     bDBE
A     aSe
E                          e
B            bBd     UV    UV    UV    UV    UV
U                    ε     ε     u     ε     ε
V                    ε     ε           k     ε
D     aD                               k

No cell holds more than one rule: LL(1)