
Chapter 4 Top-Down Parsing Dr. Frank Lee. Parsing and Parsers Once we have described the syntax of...

Date post: 03-Jan-2016
Upload: joshua-aron-hunt

Chapter 4

Top-Down Parsing

Dr. Frank Lee


Parsing and Parsers

• Once we have described the syntax of our programming language using a context-free grammar, the next step is to determine whether a string of tokens returned from the lexical analyzer could be derived from that context-free grammar
• Determining whether a sequence of tokens is syntactically correct is called parsing
• Two main strategies:
  – top-down parsing: from the initial symbol and working down (modern parsers: e.g. JavaCC)
  – bottom-up parsing: from the leaves and working up (earlier parsers: e.g. yacc)


4.1 Recursive Descent Parsers

• Top-down parsers are usually implemented as a suite of mutually recursive functions that descend through a parse tree for the string, and as such are called “recursive descent parsers” (RDP).
• Recursive descent parsers fall into a class of parsers known as LL(k) parsers
• LL(k) stands for Left-to-right, Leftmost-derivation, k-symbol lookahead parsers
• We first examine LL(1) parsers – LL parsers with one symbol of lookahead
• See Fig. 4.1 (a CFG, p49) and its RDP in Fig. 4.2 (p50)
• To build the RDP, we first need to create the “First” and “Follow” sets of the non-terminals in the CFG.


4.1.1 First Sets and Follow Sets

For the following context-free grammar (Fig. 4.4, p52), $ is an end-of-file marker and ε means the empty string:

T = {e, f, g, h, i}
N = {S’, S, A, B, C, D}
R = (0) S’ → S$
    (1) S → AB | Cf
    (2) A → ef | ε
    (3) B → hg
    (4) C → DD | fi
    (5) D → g
Start Symbol = S’


4.1.1 First Sets and Follow Sets (cont.)

• First Sets
  – The First set of a non-terminal A is the set of all terminals that can begin a string derived from A
  – If the empty string ε can be derived from A, then ε is also in the First set of A
  – For instance, consider again the context-free grammar in Figure 4.4 (p52 or previous slide)
  – In this grammar, the set of all strings derivable from the non-terminal S’ is {efhg$, fif$, ggf$, hg$}
  – Thus, First(S’) = {e, f, g, h}, where e, f, g and h are the first terminals of the strings in the above set, respectively
  – Similarly, we can derive the First sets of S, A, B, C and D as in Figure 4.5 (p52)


4.1.1 First Sets and Follow Sets (cont.)

• Follow Sets
  – For each non-terminal in a grammar, we can also create a Follow set
  – The Follow set for a non-terminal A in a grammar is the set of all terminals that could appear right after A in a derivation
  – Take another look at the CFG in Fig. 4.4 (p52): what terminals can follow A in a derivation?
  – Consider the derivation S’ ⇒ S$ ⇒ AB$ ⇒ Ahg$. Since h follows A in this derivation, h is in the Follow set of A. Note: $ is the end-of-file marker.
  – What about the non-terminal D? Consider the partial derivation: S’ ⇒ S$ ⇒ Cf$ ⇒ DDf$ ⇒ Dgf$
  – Since both f and g follow a D in this derivation, f and g are in the Follow set of D
  – Fig. 4.6 (p53) lists the Follow sets of all non-terminals in the CFG in Fig. 4.4


4.1.2 Finding First and Follow Sets

• To calculate the First set of a non-terminal A, we need to calculate the First set of a string of terminals and non-terminals, since a non-terminal can derive strings which contain terminals and non-terminals

• See the two algorithms on page 54
• We now use these algorithms to find the First set of each non-terminal in the CFG in Fig. 4.4 (p52) as below:


4.1.2.1 Finding First Sets
• Assume the CFG in Fig. 4.4 (p52) is G.
• At the beginning, for each non-terminal A in G, set First(A) = { }, the empty set

(0) S’ → S$. Add { } to First(S’) = { } (no change)
(1) S → AB. Add { } to First(S) = { } (no change)
(2) S → Cf. Add { } to First(S) = { } (no change)
(3) A → ef. Add e to First(A) = {e}
(4) A → ε. Add ε to First(A) = {e, ε}
(5) B → hg. Add h to First(B) = {h}
(6) C → DD. Add { } to First(C) = { } (no change)
(7) C → fi. Add f to First(C) = {f}
(8) D → g. Add g to First(D) = {g}

• Note: Since there were 5 changes, we need another iteration


4.1.2.1 Finding First Sets

(0) S’ → S$. Add { } to First(S’) = { } (no change)
(1) S → AB. Add e,h to First(S) = {e, h}
(2) S → Cf. Add f to First(S) = {e, h, f}
(3) A → ef. Add e to First(A) = {e, ε} (no change)
(4) A → ε. Add ε to First(A) = {e, ε} (no change)
(5) B → hg. Add h to First(B) = {h} (no change)
(6) C → DD. Add g to First(C) = {f, g}
(7) C → fi. Add f to First(C) = {f, g} (no change)
(8) D → g. Add g to First(D) = {g} (no change)

• Note: Since there were 3 changes, we need another iteration


4.1.2.1 Finding First Sets

(0) S’ → S$. Add e,f,h to First(S’) = {e, f, h}
(1) S → AB. Add e,h to First(S) = {e, f, h} (no change)
(2) S → Cf. Add f,g to First(S) = {e, h, f, g}
(3) A → ef. Add e to First(A) = {e, ε} (no change)
(4) A → ε. Add ε to First(A) = {e, ε} (no change)
(5) B → hg. Add h to First(B) = {h} (no change)
(6) C → DD. Add g to First(C) = {f, g} (no change)
(7) C → fi. Add f to First(C) = {f, g} (no change)
(8) D → g. Add g to First(D) = {g} (no change)

• Note: Since there were 2 changes, we need another iteration


4.1.2.1 Finding First Sets

(0) S’ → S$. Add e,f,g,h to First(S’) = {e, h, f, g}
(1) S → AB. Add e,h to First(S) = {e, h, f, g} (no change)
(2) S → Cf. Add f,g to First(S) = {e, h, f, g} (no change)
(3) A → ef. Add e to First(A) = {e, ε} (no change)
(4) A → ε. Add ε to First(A) = {e, ε} (no change)
(5) B → hg. Add h to First(B) = {h} (no change)
(6) C → DD. Add g to First(C) = {f, g} (no change)
(7) C → fi. Add f to First(C) = {f, g} (no change)
(8) D → g. Add g to First(D) = {g} (no change)

• Note: Since there was 1 change, we need another iteration


4.1.2.1 Finding First Sets

(0) S’ → S$. Add e,f,g,h to First(S’) (no change)
(1) S → AB. Add e,h to First(S) (no change)
(2) S → Cf. Add f,g to First(S) (no change)
(3) A → ef. Add e to First(A) (no change)
(4) A → ε. Add ε to First(A) (no change)
(5) B → hg. Add h to First(B) (no change)
(6) C → DD. Add g to First(C) (no change)
(7) C → fi. Add f to First(C) (no change)
(8) D → g. Add g to First(D) (no change)

• Since there were no changes, we stop
• Note: if we examine the rules in a different order (such as 8,7,6,5,4,3,2,1,0) in each iteration, we will still get the same result
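The iterations above can be sketched as a small fixed-point loop. This is only an illustration in Python, not the algorithm text from page 54: the rule encoding, the names `GRAMMAR`, `first_of_string` and `compute_first`, and the use of an empty list for an ε right-hand side are all assumptions made for this sketch.

```python
EPS = "ε"

# Grammar of Fig. 4.4, one entry per rule; [] stands for an ε right-hand side.
GRAMMAR = [
    ("S'", ["S", "$"]),
    ("S", ["A", "B"]),
    ("S", ["C", "f"]),
    ("A", ["e", "f"]),
    ("A", []),
    ("B", ["h", "g"]),
    ("C", ["D", "D"]),
    ("C", ["f", "i"]),
    ("D", ["g"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of_string(symbols, first):
    """First set of a string of terminals and non-terminals."""
    result = set()
    for sym in symbols:
        if sym not in NONTERMINALS:      # a terminal begins the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:        # sym cannot vanish, so stop here
            return result
    result.add(EPS)                      # every symbol can derive ε
    return result


def compute_first(grammar):
    """Repeat passes over the rules until no First set changes."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

Because sets only ever grow and the loop stops when a full pass adds nothing, the order in which rules are visited affects the number of passes but not the final sets.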


4.1.2.2 Finding Follow Sets

• If the grammar contains the rule S → Aa, then a is in the Follow set of A, since it appears immediately after A
• If the grammar contains the rules:
    S → AB
    B → a │ b
  then both a and b are in the Follow set of A. Why?
• Consider the following two partial derivations:
    S ⇒ AB ⇒ Aa
    S ⇒ AB ⇒ Ab
  So both a and b are in the Follow set of A.
• If the grammar contains the rules:
    S → ABC
    B → a │ b │ ε
    C → c │ d
  then a, b, c and d are all in the Follow set of A. Why?


4.1.2.2 Finding Follow Sets
• Consider the grammar:
    S → Ax
    A → C
  then x is in the Follow sets of A and C. Why?
• Consider another grammar:
    S → Ax
    A → CD
    D → ε
  then x is in the Follow sets of A, C and D. Why?
• We now use the algorithm on page 56 to find the Follow set of each non-terminal in the CFG in Fig. 4.4 as below:
• Assume the CFG in Fig. 4.4 is G. At the beginning, for each non-terminal A in G, set Follow(A) = { }, the empty set.


4.1.2.2 Finding Follow Sets
• We can calculate the Follow sets from the First sets by using the algorithm on page 56.
• For example, consider the following CFG, for which we calculate the First sets for all non-terminals:
    Start Symbol = S’
    Non-terminals = {S’, S, T, U, V}
    Terminals = {a, b, c, d}
    Rules = (0) S’ → S$
            (1) S → TU
            (2) T → aVa
            (3) T → ε
            (4) U → bVT
            (5) V → Ub
            (6) V → d
• The First sets of the non-terminals are: S’ = {a, b}; S = {a, b}; T = {a, ε}; U = {b}; and V = {b, d}


Algorithm to Find Follow Sets (p56)

1. Find First(A) for all non-terminals A in G.
2. Set Follow(A) = { } for all non-terminals A in G.
3. For each rule A → γ in G, where γ is a string of terminals and non-terminals:
   For each non-terminal C in γ, if the rule is of the form A → α C β, where α and β are (possibly empty) strings of terminals and non-terminals:
   3.1 If First(β) does not contain ε, then add all elements of First(β) to Follow(C).
   3.2 If First(β) contains ε, then add all elements of First(β) except ε and all elements of Follow(A) to Follow(C).
4. If any changes were made in step 3, repeat step 3.

• Note: The First sets of the non-terminals are (see previous slide): S’ = {a, b}; S = {a, b}; T = {a, ε}; U = {b}; and V = {b, d}


4.1.2.2 Finding Follow Sets
• At the beginning, we set Follow(A) = { } for all non-terminals A, and then go through each rule in the CFG on page 56

(0) S’ → S$   Add {$} to Follow(S) = {$}
(1) S → TU    Add First(U), {b}, to Follow(T) = {b}
              Add Follow(S), {$}, to Follow(U) = {$}
(2) T → aVa   Add {a} to Follow(V) = {a}
(3) T → ε     (no change)
(4) U → bVT   Add First(T), {a}, to Follow(V) = {a}
              Add Follow(U), {$}, to Follow(T) = {b, $}
              Add Follow(U), {$}, to Follow(V) = {a, $} (for T → ε)
(5) V → Ub    Add {b} to Follow(U) = {b, $}
(6) V → d     (no change)

• The Follow sets of the non-terminals are: S’ = { }; S = {$}; T = {b, $}; U = {b, $}; and V = {a, $}
• Note: Since there were some changes, we need another iteration


4.1.2.2 Finding Follow Sets

(0) S’ → S$   Add $ to Follow(S) = {$} (no change)
(1) S → TU    Add First(U), b, to Follow(T) = {b, $} (no change)
              Add Follow(S), {$}, to Follow(U) = {b, $} (no change)
(2) T → aVa   Add {a} to Follow(V) = {a, $} (no change)
(3) T → ε     (no change)
(4) U → bVT   Add First(T), a, to Follow(V) = {a, $} (no change)
              Add Follow(U), {b, $}, to Follow(T) = {b, $} (no change)
              Add Follow(U), {b, $}, to Follow(V) = {a, b, $} (for T → ε)
(5) V → Ub    Add b to Follow(U) = {b, $} (no change)
(6) V → d     (no change)

• The Follow sets of the non-terminals are: S’ = { }; S = {$}; T = {b, $}; U = {b, $}; and V = {a, b, $}
• Note: Since there was 1 change, we need another iteration


4.1.2.2 Finding Follow Sets

(0) S’ → S$   Add $ to Follow(S) = {$} (no change)
(1) S → TU    Add First(U), b, to Follow(T) = {b, $} (no change)
              Add Follow(S), {$}, to Follow(U) = {b, $} (no change)
(2) T → aVa   Add a to Follow(V) = {a, b, $} (no change)
(3) T → ε     (no change)
(4) U → bVT   Add First(T), a, to Follow(V) = {a, b, $} (no change)
              Add Follow(U), {b, $}, to Follow(T) = {b, $} (no change)
              Add Follow(U), {b, $}, to Follow(V) = {a, b, $} (no change)
(5) V → Ub    Add b to Follow(U) = {b, $} (no change)
(6) V → d     (no change)

• The Follow sets of the non-terminals are: S’ = { }; S = {$}; T = {b, $}; U = {b, $}; and V = {a, b, $}
• Note: Since there were no changes, we stop.
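The three passes above can be reproduced with a short Python sketch. It follows steps 2 and 3 of the algorithm, repeated until nothing changes, and takes the First sets listed on the slides as given. The encoding (`RULES`, `FIRST`, `compute_follow`) is an assumption made for illustration, not code from the textbook.

```python
EPS = "ε"

# Example CFG from the slides; [] stands for an ε right-hand side.
RULES = [
    ("S'", ["S", "$"]),
    ("S", ["T", "U"]),
    ("T", ["a", "V", "a"]),
    ("T", []),
    ("U", ["b", "V", "T"]),
    ("V", ["U", "b"]),
    ("V", ["d"]),
]

# First sets as listed on the slides (step 1 of the algorithm).
FIRST = {
    "S'": {"a", "b"},
    "S": {"a", "b"},
    "T": {"a", EPS},
    "U": {"b"},
    "V": {"b", "d"},
}


def first_of_string(symbols):
    """First set of a (possibly empty) string of grammar symbols."""
    result = set()
    for sym in symbols:
        if sym not in FIRST:             # terminal
            result.add(sym)
            return result
        result |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return result
    result.add(EPS)                      # the whole string can derive ε
    return result


def compute_follow(rules):
    follow = {nt: set() for nt in FIRST}             # step 2
    changed = True
    while changed:                                   # repeat until stable
        changed = False
        for lhs, rhs in rules:                       # step 3
            for i, sym in enumerate(rhs):
                if sym not in FIRST:
                    continue                         # only non-terminals get Follow sets
                first_beta = first_of_string(rhs[i + 1:])
                new = first_beta - {EPS}             # steps 3.1 and 3.2
                if EPS in first_beta:
                    new |= follow[lhs]               # step 3.2 only
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


FOLLOW = compute_follow(RULES)
```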


4.1.3 LL(1) Parse Tables ($)
• Once we have First and Follow sets, we can create a Parse Table
• A parse table is a blueprint for the creation of a recursive descent parser (RDP)
• The rows in the parse table are labeled with non-terminals and the columns are labeled with terminals
• Each entry in the parse table is either empty or contains a grammar rule
• The rule located in row S, column a of a parse table tells us which rule to apply when we are trying to parse the non-terminal S, and the next symbol in the input is an a
• For instance, for the grammar in Fig. 4.1 (p49), the parse table is:


4.1.3 LL(1) Parse Tables

Columns (terminals): id  num  while  print  >  {  }  ;  (  )

Row S: [while] S → while(B) S   [print] S → print(E)   [{] S → {L}
Row E: [id] E → id   [num] E → num
Row B: [id] B → E>E   [num] B → E>E
Row L: [while] L → SL   [print] L → SL   [{] L → SL   [}] L → ε

• Find the First sets and Follow sets of all non-terminals.
• Place each rule of the form S → γ in row S in each column in First(γ), where γ is a string of terminals and non-terminals.
• Place each rule of the form S → γ in row S in each column in Follow(S), where First(γ) contains ε.
• Self-practice #1: Find the First and Follow sets (step by step) for the CFG in Figure 4.1 (p49), then use the CFG and the two sets to create the parse table as on page 57.


4.1.3 LL(1) Parse Tables

• Consider again the CFG in Fig. 4.4. (p52) We have the First and Follow sets of each non-terminal:

Non-terminal   First        Follow
S’             {e,f,g,h}    { }
S              {e,f,g,h}    {$}
A              {e,ε}        {h}
B              {h}          {$}
C              {f,g}        {f}
D              {g}          {f,g}


4.1.3 LL(1) Parse Tables

• We can now create the parse table for this grammar as below:

      e          f          g          h          i
S’    S’ → S$    S’ → S$    S’ → S$    S’ → S$
S     S → AB     S → Cf     S → Cf     S → AB
A     A → ef                           A → ε
B                                      B → hg
C                C → fi     C → DD
D                           D → g
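As a rough check of the table above, the two placement rules from the previous slide can be written as a few lines of Python. The First and Follow sets are copied from the table of sets for Fig. 4.4; the dictionary layout and the names `build_table` and `TABLE` are assumptions made for this sketch.

```python
EPS = "ε"

# Grammar of Fig. 4.4; [] stands for an ε right-hand side.
RULES = [
    ("S'", ["S", "$"]),
    ("S", ["A", "B"]),
    ("S", ["C", "f"]),
    ("A", ["e", "f"]),
    ("A", []),
    ("B", ["h", "g"]),
    ("C", ["D", "D"]),
    ("C", ["f", "i"]),
    ("D", ["g"]),
]
FIRST = {
    "S'": {"e", "f", "g", "h"}, "S": {"e", "f", "g", "h"},
    "A": {"e", EPS}, "B": {"h"}, "C": {"f", "g"}, "D": {"g"},
}
FOLLOW = {
    "S'": set(), "S": {"$"}, "A": {"h"},
    "B": {"$"}, "C": {"f"}, "D": {"f", "g"},
}


def first_of_string(symbols):
    """First set of a (possibly empty) string of grammar symbols."""
    result = set()
    for sym in symbols:
        if sym not in FIRST:             # terminal
            result.add(sym)
            return result
        result |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return result
    result.add(EPS)
    return result


def build_table(rules):
    """Map (non-terminal, terminal) to the list of right-hand sides placed there."""
    table = {}
    for lhs, rhs in rules:
        first = first_of_string(rhs)
        columns = first - {EPS}          # columns in First(γ)
        if EPS in first:
            columns |= FOLLOW[lhs]       # columns in Follow(S) when γ can derive ε
        for terminal in columns:
            table.setdefault((lhs, terminal), []).append(rhs)
    return table


TABLE = build_table(RULES)
# The grammar is LL(1) exactly when no cell holds more than one rule.
IS_LL1 = all(len(entries) == 1 for entries in TABLE.values())
```

Storing a list per cell, rather than a single rule, keeps duplicate entries visible, which is how a grammar that is not LL(1) shows up.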


4.1.4 Creating LL(1) Parsers -- Example

• We can create a sample LL(1) parser by using the CFG in Fig. 4.7 to generate the First sets, Follow sets, and parse table as on page 59.

• Once we have the parse table, creating a suite of recursive functions is trivial. For instance, the function to parse an S based on the above parse table is shown on page 60.

• Self-Practice #2: Use the CFG in Figure 4.7 to generate the First set and Follow set (step by step), and create the parse table as on page 59.
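Fig. 4.7 is not reproduced in these slides, so the sketch below illustrates the same step for the Fig. 4.4 grammar instead: each non-terminal becomes one function, and each function picks its rule from the next token exactly as the Fig. 4.4 parse table dictates. The class and method names, and the choice of one character per token, are assumptions made for this illustration.

```python
class ParseError(Exception):
    pass


class Parser:
    """Recursive descent parser for the Fig. 4.4 grammar (one char per token)."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]   # append the end-of-file marker
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.peek() != expected:
            raise ParseError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_s_prime(self):               # S' -> S $
        if self.peek() not in {"e", "f", "g", "h"}:
            raise ParseError(f"unexpected {self.peek()!r}")
        self.parse_s()
        self.match("$")

    def parse_s(self):
        if self.peek() in {"e", "h"}:      # S -> A B
            self.parse_a()
            self.parse_b()
        elif self.peek() in {"f", "g"}:    # S -> C f
            self.parse_c()
            self.match("f")
        else:
            raise ParseError(f"unexpected {self.peek()!r}")

    def parse_a(self):
        if self.peek() == "e":             # A -> e f
            self.match("e")
            self.match("f")
        elif self.peek() == "h":           # A -> ε, chosen on Follow(A)
            pass
        else:
            raise ParseError(f"unexpected {self.peek()!r}")

    def parse_b(self):                     # B -> h g
        self.match("h")
        self.match("g")

    def parse_c(self):
        if self.peek() == "f":             # C -> f i
            self.match("f")
            self.match("i")
        elif self.peek() == "g":           # C -> D D
            self.parse_d()
            self.parse_d()
        else:
            raise ParseError(f"unexpected {self.peek()!r}")

    def parse_d(self):                     # D -> g
        self.match("g")


def accepts(text):
    try:
        Parser(text).parse_s_prime()
        return True
    except ParseError:
        return False
```

For example, `accepts("efhg")` follows the S → AB row of the table, while `accepts("ef")` fails inside `parse_b` because the next token is the end-of-file marker rather than h.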


4.2 Grammars That Are Not LL(1)

• If we can build an LL(1) parse table for a grammar that has no duplicate entries, then we say that grammar is LL(1)

• Unfortunately, not all grammars are LL(1). For instance, the grammar in Fig. 4.8 yields the First and Follow sets and the parse table on page 60

• The parse table has only one row, for the non-terminal E, but it has 5 entries in the id column. Hence, the grammar in Fig. 4.8 is ambiguous and we cannot create an LL(1) parser for it

• If we wish to parse the language described by the grammar in Fig. 4.8, we need to create an equivalent grammar that is not ambiguous


4.2.1 Removing Ambiguity
• There are four ways in which ambiguity can creep into a CFG for a programming language:
  – Defining expressions: the straightforward definition of expressions will often lead to ambiguity, such as Fig. 4.8.
  – Defining complex variables: complex variables, such as instance variables in classes, fields in records or structs, array subscripts and pointer references, can also lead to ambiguity
  – Overlap between specific and general cases: For example, in Fig. 4.9 (p61), the terminal id has several leftmost derivations (and hence several parse trees): E ⇒ T, E ⇒ T ⇒ id, E ⇒ T ⇒ F ⇒ id.
  – Nesting statements: For instance, the nested “if e then if e then a else a” statement on page 62 has two parse trees
• There are some languages that are inherently ambiguous
• There is no algorithm that will always remove ambiguity from a context-free grammar


4.2.2 Removing Left Recursion

• An unambiguous grammar may still not be LL(1)

• A rule S → α (where S is a non-terminal and α is a string of terminals and non-terminals) is left-recursive if the first symbol in α is S

• The rules (1), (2), (4) and (5) in Fig. 4.10 (p63) are left-recursive

• No left-recursive grammar is LL(1)


4.2.2 Removing Left Recursion

• For example:
    S → S α
    S → β
• Consider the following partial derivation:
    S ⇒ S α ⇒ S α α ⇒ S α α α ⇒ β α α α
• Using EBNF notation, we have:
    S → β(α)*
• Using CFG notation, we have:
    S → βA
    A → αA
    A → ε
• We have removed the left recursion in the above example!


4.2.2 Removing Left Recursion

• In general, the set of rules of the form:
    S → Sα1 ; S → Sα2 ; S → Sα3 ; ..... ; S → Sαn
    S → β1 ; S → β2 ; S → β3 ; ….. ; S → βn
• Can be rewritten as:
    S → BA
    B → β1│β2│β3│…..│βn
    A → α1A│α2A│α3A│.....│αnA
    A → ε


4.2.2 Removing Left Recursion

• Let’s take a closer look at the expression grammar:
    E → E + T
    E → E – T
    E → T
• Using the above transformation, we get the following CFG, which has no left recursion:
    E → TE’
    E’ → +TE’
    E’ → -TE’
    E’ → ε
• Using EBNF notation, we have:
    E → T((+T)│(-T))*
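The transformation can be sketched as a small Python function. This is an illustrative version that handles only immediate left recursion for a single non-terminal, and it folds the helper non-terminal B of the general scheme directly into the rules for S, which yields the same form as the expression grammar result above. The function name and the list-of-symbols encoding are assumptions.

```python
def remove_left_recursion(nonterminal, productions, new_name):
    """Rewrite S -> S a1 | ... | b1 | ... as S -> b_i S', S' -> a_j S' | ε.

    productions is a list of right-hand sides, each a list of symbols;
    an empty list stands for ε. new_name is the fresh non-terminal (S').
    """
    # Left-recursive rules S -> S alpha contribute alpha.
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    # All other rules S -> beta contribute beta.
    betas = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not alphas:
        return {nonterminal: productions}      # nothing to remove
    return {
        nonterminal: [beta + [new_name] for beta in betas],
        new_name: [alpha + [new_name] for alpha in alphas] + [[]],
    }


# The expression grammar from the slide: E -> E + T | E - T | T
EXPR = [["E", "+", "T"], ["E", "-", "T"], ["T"]]
RESULT = remove_left_recursion("E", EXPR, "E'")
```

Here `RESULT` holds E → T E’ together with E’ → + T E’, E’ → - T E’ and E’ → ε, matching the grammar shown on the slide.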


4.2.3 Left Factoring

• Even if a CFG is unambiguous and has no left-recursion, it still may not be LL(1).

• Consider the following two Fortran do statements:

    do                            do
      var = initial, final          var = initial, final, inc
      loop body                     loop body
    end do                        end do


4.2.3 Left Factoring
• We can describe the Fortran do statement with the following context-free grammar fragment:
    S → do L S
    L → id = exp, exp
    L → id = exp, exp, exp
• This CFG is not LL(1). Why? Because the two rules for L begin with the same symbols, we cannot tell which rule to use by looking only at the first symbol, id.
• We can fix this problem by left-factoring the common section of the rules as follows:
    S → do L S
    L → id = exp, exp L’
    L’ → , exp
    L’ → ε
• Using EBNF notation, the Fortran do statement can also be written as follows:
    S → do L S
    L → id = exp, exp (, exp)?     // single rule for L


4.2.3 Left Factoring

• In general, if we have the following context-free grammar, where α and βk stand for strings of terminals and non-terminals:
    S → αβ1 ; S → αβ2 ; S → αβ3 ; ... ; S → αβn
• We could left-factor it to get the CFG:
    S → αB ; B → β1 ; B → β2 ; B → β3 ; … ; B → βn
• Using EBNF notation, we get:
    S → α(β1│β2│β3│…│βn)
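A minimal Python sketch of this step is given below. It covers only the simple case shown on the slide, where every alternative of the non-terminal shares one common prefix α; factoring subsets of the alternatives is not attempted. The function names and the symbol-list encoding are assumptions made for illustration.

```python
def common_prefix(sequences):
    """Longest prefix shared by every sequence in the list."""
    prefix = []
    for symbols in zip(*sequences):      # stops at the shortest sequence
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nonterminal, productions, new_name):
    """Rewrite S -> α β1 | ... | α βn as S -> α B, B -> β1 | ... | βn."""
    prefix = common_prefix(productions)
    if len(productions) < 2 or not prefix:
        return {nonterminal: productions}    # nothing to factor
    n = len(prefix)
    return {
        nonterminal: [prefix + [new_name]],
        new_name: [rhs[n:] for rhs in productions],   # [] stands for ε
    }


# The two rules for L in the Fortran do-statement fragment.
L_RULES = [
    ["id", "=", "exp", ",", "exp"],
    ["id", "=", "exp", ",", "exp", ",", "exp"],
]
FACTORED = left_factor("L", L_RULES, "L'")
```

Applied to the two rules for L, this gives L → id = exp , exp L’ with L’ → ε and L’ → , exp, the same result as the left-factored grammar for the do statement.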


4.3 LL(K) Parsers ($)
• An LL(1) parser needs to decide which rule to apply after looking at only one token
• If more than one token is required to determine which rule to apply, then the grammar is not LL(1)
• For instance, consider the following simple CFG:
    Terminals = {a, b, c}
    Non-terminals = {S}
    Rules = (1) S → abc
            (2) S → acb
    Start symbol = S
• This grammar is not LL(1), since the LL(1) parse table has duplicate entries:

         a                      b    c
    S    S → abc; S → acb


4.3 LL(K) Parsers

• We could left-factor the grammar to make it LL(1)
• We could also modify our parser so that it examines the first two elements in the string to determine which rule to apply
• The resulting parse table would be much larger, as follows:

         aa   ab         ac         ba   bb   bc   ca   cb   cc
    S         S → abc    S → acb


4.3 LL(K) Parsers

• An LL(k) parser examines the first k symbols in the input before determining which rule to apply

• In order to create an LL(k) parser, we need to generalize the definitions of First and Follow sets

• Our definitions of generalized First and Follow sets will use the concept of k-prefix

• Definition 9 k-prefix (p66): The k-prefix of a string of terminals w is a string consisting of the first k terminals in w. If │w│≤ k, then the k-prefix of w is w.

• For example, given the terminals {a, b, c, d}, the 3-prefix of the set {abcd, abcc, abdd, ab, ε} is the set {abc, abd, ab, ε}
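Definition 9 maps directly onto string slicing. In the sketch below each terminal is assumed to be a single character and ε is written as the empty string; the function name `k_prefix` is chosen here for illustration.

```python
def k_prefix(strings, k):
    """Apply the k-prefix to every string in a set of terminal strings."""
    # s[:k] keeps the first k terminals, or all of s when len(s) <= k.
    return {s[:k] for s in strings}


EXAMPLE = k_prefix({"abcd", "abcc", "abdd", "ab", ""}, 3)
```

Because the result is a set, abcd and abcc collapse into the single prefix abc, which is why the 3-prefix of the five-element set has only four elements.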


4.3 LL(K) Parsers

• Definition 10 Firstk: The Firstk set of a non-terminal S is the k-prefix of the set of all strings of terminals derivable from S. The Firstk set of a string of terminals and non-terminals γ is the k-prefix of the set of all strings of terminals derivable from γ.

• Definition 11 Followk: The Followk set of a non-terminal S is the k-prefix of the set of all strings of terminals that follow S in a partial derivation.


4.3 LL(K) Parsers

Algorithm to calculate Firstk for non-terminals

1. For each non-terminal S in G, set Firstk(S) = { }

2. For each rule S → γ in G, add all elements of k-prefix(Firstk(γ)) to Firstk(S)

3. If any changes were made in step 2, go back to step 2 and repeat


4.3 LL(K) Parsers

Algorithm to calculate Firstk for a string of terminals and non-terminals:

1. For any terminal a, Firstk(a) = {a}

2. For any string of terminals and non-terminals γ = γ1γ2γ3…γn, Firstk(γ) = k-prefix( Firstk(γ1) ○ Firstk(γ2) ○ Firstk(γ3) ○ … ○ Firstk(γn)), where ○ concatenates every string of one set with every string of the next
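The two Firstk algorithms can be combined into one short Python sketch. It assumes single-character terminals and non-terminals, writes ε as the empty string, and truncates after each concatenation, which gives the same result as truncating once at the end. The names `concat`, `first_k_of_string` and `compute_first_k` are illustrative.

```python
def concat(xs, ys, k):
    """The ○ operation followed by k-prefix: pairwise concatenation, truncated."""
    return {(x + y)[:k] for x in xs for y in ys}


def first_k_of_string(symbols, first_k, k):
    """Firstk of a string of terminals and non-terminals."""
    result = {""}                            # Firstk of the empty string is {ε}
    for sym in symbols:
        # A terminal a has Firstk(a) = {a}; a non-terminal uses its current set.
        result = concat(result, first_k.get(sym, {sym}), k)
    return result


def compute_first_k(rules, k):
    """Fixed-point computation of Firstk for every non-terminal."""
    first_k = {lhs: set() for lhs, _ in rules}          # step 1
    changed = True
    while changed:                                      # step 3
        changed = False
        for lhs, rhs in rules:                          # step 2
            new = first_k_of_string(rhs, first_k, k)
            if not new <= first_k[lhs]:
                first_k[lhs] |= new
                changed = True
    return first_k


# The grammar S -> abc | acb from the start of Section 4.3.
RULES = [("S", "abc"), ("S", "acb")]
FIRST_1 = compute_first_k(RULES, 1)
FIRST_2 = compute_first_k(RULES, 2)
```

With k = 1 both rules contribute the same prefix a, which is the duplicate entry in the LL(1) table; with k = 2 they contribute ab and ac and can be told apart.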


4.3 LL(K) Parsers

Algorithm to calculate Followk for non-terminals

1. Calculate Firstk(S) for all non-terminals S in G
2. Set Followk(S) = { } for all non-terminals S in G
3. For each rule S → γ in G:
   For each non-terminal S1 in γ where γ = α S1 β, add [k-prefix(Firstk(β) ○ Followk(S))] to Followk(S1). If Followk(S) = { }, add [k-prefix(Firstk(β))] to Followk(S1).
4. If any changes were made in step 3, go back to step 3 and repeat


4.3 LL(K) Parsers

Algorithm to create an LL(k) parse table

• For each rule of the form S → γ, place the rule in row S, in all columns labeled with an element of k-prefix(Firstk(γ) ○ Followk(S))


4.3.1 Local Lookahead in LL(k) Parsers

• If a CFG has t terminals and n non-terminals, then the number of entries in the LL(k) parse table can be n × t^k. This makes for some pretty big parse tables

• Rather than trying to create an LL(k) parser by creating a complete LL(k) parse table, we can create an LL(1) parse table, which may have duplicate entries

• When creating the parser from this parse table, we will use a local lookahead greater than 1 to disambiguate multiple parse table entries

• For instance, consider the following CFG fragment for simple Java, where S generates statements and E generates expressions (Note: GETS means “=”):
    S → <IDENTIFIER> <GETS> E <SEMICOLON>
    S → <IDENTIFIER> <IDENTIFIER> <SEMICOLON>


4.3.1 Local Lookahead in LL(k) Parsers

• The parse table for this grammar will have a duplicate entry in row S, column <IDENTIFIER>

• We can leave this duplicate entry in the parse table, and we can add an extra check when we build the parser

• See the ParserS() program on pages 67-68
• JavaCC allows for both local (for the rules of one non-terminal) and global (for the rules of all non-terminals) lookahead greater than 1.
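The ParserS() program from pages 67-68 is not reproduced here. As a rough illustration of the extra check, the Python sketch below peeks at the second token only when the first is an identifier. Token kinds are plain strings, and the rule labels "assignment" and "declaration" are hypothetical names for the two rules of S.

```python
def choose_statement_rule(tokens, pos=0):
    """Pick the rule for S using a local lookahead of two tokens.

    tokens is a list of token-kind strings such as "IDENTIFIER" or "GETS".
    """
    def kind(i):
        return tokens[i] if i < len(tokens) else None

    # One token of lookahead: both rules for S start with <IDENTIFIER>.
    if kind(pos) != "IDENTIFIER":
        raise SyntaxError(f"expected IDENTIFIER, found {kind(pos)}")

    # Duplicate table entry: look at the second token to disambiguate.
    if kind(pos + 1) == "GETS":
        return "assignment"      # S -> <IDENTIFIER> <GETS> E <SEMICOLON>
    if kind(pos + 1) == "IDENTIFIER":
        return "declaration"     # S -> <IDENTIFIER> <IDENTIFIER> <SEMICOLON>
    raise SyntaxError(f"expected GETS or IDENTIFIER, found {kind(pos + 1)}")
```

Only this one function pays for the second token of lookahead; the rest of the parser can keep deciding on a single token.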


4.4 JavaCC – An LL(k) Parser Generator

• JavaCC can be used to build lexical analyzers as well as recursive descent parsers

• JavaCC takes the EBNF grammar for a language as input, and creates a suite of mutually recursive methods that implement an LL(k) parser for that language


4.4.1 JavaCC File Format Revisited
• JavaCC .jj files have the following format:

    options { /* Code to set various option flags */ }

    PARSER_BEGIN(foo)
    public class foo {
        /* Extra parser method definitions go here,
           often a main program which drives the parser. */
    }
    PARSER_END(foo)

    TOKEN_MGR_DECLS : { /* Declarations used by the lexical analyzer */ }

    /* Token rules and actions */
    /* JavaCC rules and actions */


4.4.2 JavaCC Rules
• JavaCC uses EBNF rules to describe the grammar of the language to be parsed
• Each JavaCC rule has the form:

    void nonTerminalName() :
    { /* Java Declarations */ }
    { /* Rule Declarations */ }

• The “Java Declarations” block will be used for building abstract syntax trees; we will leave this block blank for now
• The “Rule Declarations” block defines the right-hand side of the rules that we are writing
• Non-terminals in these rules represent function calls, so each is followed by (). Terminal names are enclosed in < and >.


4.4.2 JavaCC Rules
• For instance:
    S → while (E) S
    S → V = E;
• Would be represented by the JavaCC rule:

    void statement() :
    { } /* the Java Declarations block is blank */
    {
        <WHILE> <LPAREN> expression() <RPAREN> statement()
      | variable() <GETS> expression() <SEMICOLON>
    }

• Note: JavaCC uses uppercase letters for terminals and lowercase letters for non-terminals
• All terminals are inside < and >
• Non-terminals represent method calls in the generated parser, hence the () after each non-terminal in the rule


4.4.2 JavaCC Rules

• Consider the sample prefix expressions in the table on page 69; a CFG for this language is in Fig. 4.11 (p70)
• The rule for the non-terminal expression() could also be written as on page 70
• A JavaCC .jj file for this language is in Fig. 4.12 (p71)
• When we run JavaCC on prefix.jj, the following files will be created:
  – Utility files: ParseException.java and SimpleCharStream.java
  – Token management files: Token.java, TokenMgrError.java, prefixTokenManager.java and prefixConstants.java
  – Parser implementation: prefix.java


4.4.2 JavaCC Rules
• The class prefix, defined in prefix.java, contains the code that implements the parser
• A public method is defined for each non-terminal, which will parse any string of tokens that can be derived from that non-terminal
• A main program that calls the generated parser is in Fig. 4.13 (p72)
• We could also include the main method in prefix.java, by placing its definition within the “public class prefix” section of prefix.jj
• Another parser, for infix expressions, can be found in Fig. 4.14 (p73)
• We can also mix and match anonymous tokens with standard named tokens. A version of infix.jj that uses some anonymous tokens can be found in Fig. 4.15 (p74)


4.4.4 Using LOOKAHEAD Directives
• By default, JavaCC creates an LL(1) parser for the grammar described in the .jj file.
• What if the grammar is not LL(1)?
• Consider the following JavaCC rule:

    void S() :
    { }
    {
        "a" "b" "c"
      | "a" "d" "c"
    }

• The above rule cannot appear in an LL(1) grammar, but it can appear in an LL(2) grammar
• JavaCC not only generates LL(1) parsers, but also generates LL(k) parsers
• We can set the global LOOKAHEAD to 2 instead of 1 by adding the line “LOOKAHEAD = 2;” to the options section of the .jj file. But this will make the generated parser much larger


4.4.4 Using LOOKAHEAD Directives

• JavaCC allows us to change the value of the lookahead locally, instead of globally

• This could keep the flexibility and efficiency of using a lookahead of 1 for most of the parser

• We can use the directive LOOKAHEAD(k) to look ahead k symbols locally. For example:

    void S() :
    { }
    {
        LOOKAHEAD(2) "a" "b" "c"
      | "a" "d" "c"
    }


4.4.4 Using LOOKAHEAD Directives

• Another example:

    void S() :
    { }
    {
        "A" ( LOOKAHEAD(2) ("B" "C") | ("B" "D") )
    }

• But the following example doesn’t work. Why?

    void S() :
    { }
    {
        LOOKAHEAD(2) "A" ( ("B" "C") | ("B" "D") )
    }


4.4.5 Ambiguity in JavaCC Rules

• Use a LOOKAHEAD directive to resolve the ambiguity of “if then else” statements (p77)

    void statement() :
    { }
    {
        <IF> expression() <THEN> statement()
        ( LOOKAHEAD(1) <ELSE> statement() )?
      | /* other statement definitions */
    }


Chapter 4 homework
• Due: 2/27/2013
• Do the following exercises on pp79-81: 1(a), 2(a), 3(a), 5.
• Exercise 9: Use the following example to show that removing the left recursion from the original rules is correct.
    Example: T+T+T-T+T-T
    Before removing left recursion (original rules):
        E → E+E | E-E | T
    After removing left recursion:
        E → TE’
        E’ → +TE’ | -TE’ | ε
  (Hint: Derive the example from the rules before and after removing left recursion, respectively. If the results are the same, then the new set of rules is correct. Show your conclusion.)


Project 2
• Due: 3/6/2013
• Project Topic: An LL(1) parser to parse simple Java programs
• Study Chapters 3 and 4 and go over Section 4.5 completely
• Append the parser code to your lexical analyzer (Project 1)
• Hand in and email me:
  1. Your complete parser program
  2. The 4 test programs
  3. Execution results for each test program
  4. A README file describing the steps to run and test your programs in Eclipse and DOS.

