Chap. 5, Top-Down Parsing
J. H. Wang, Oct. 26, 2015
Outline
• Overview
• LL(k) Grammars
• Recursive-Descent LL(1) Parsers
• Table-Driven LL(1) Parsers
• Obtaining LL(1) Grammars
• A Non-LL(1) Language
• Properties of LL(1) Parsers
• Parse Table Representation
• Syntactic Error Recovery and Repair
Overview
• Two forms of top-down parsers
  – Recursive-descent parsers
  – Table-driven LL parsers: LL(k), explained later
• Compiler compilers (or parser generators)
  – With a CFG as the language's definition, parsers can be constructed automatically
  – Revisions, updates, or extensions of the language are easily carried over to a new parser
  – If parser construction succeeds, the grammar is proven unambiguous
Top-Down Parsing
• Top-down
  – Grows a parse tree from the root to the leaves
• Predictive
  – Must predict which production rule to apply
• LL(k)
  – Scans the input Left to right, produces a Leftmost derivation, uses k symbols of lookahead
• Recursive descent
  – Can be implemented as a set of mutually recursive procedures
LL(k) Grammars
• Recall from Chap. 2
  – A parsing procedure for each nonterminal A
  – The procedure is responsible for accomplishing one step of the derivation, using the corresponding production
  – The production is chosen by inspecting the next k tokens
• The predict set of a production p is the set of tokens that trigger the application of p
  – The predict set is determined by the right-hand side (RHS) of p
• We need a strategy for choosing productions
  – $\mathrm{Predict}_k(p)$: the set of length-k token strings that predict the application of rule p
  – Parse in progress: $S \Rightarrow^*_{lm} \alpha A Y_1 \cdots Y_n$, where α is the part of the input already matched and a is the next input token
  – $P = \{\, p \in \mathrm{ProductionsFor}(A) \mid a \in \mathrm{Predict}(p) \,\}$
    • P is empty: syntax error
    • P contains more than one production: nondeterminism
    • P contains exactly one production: the leftmost parse can proceed deterministically
How to Compute Predict(p)
• To predict production $p: A \to X_1 \cdots X_m$, $m \ge 0$, Predict(p) contains
  – The terminal symbols that can be produced first in some derivation from $X_1 \cdots X_m$
  – If $X_1 \cdots X_m$ can derive λ, also the terminal symbols that can follow A
  – (Fig. 5.1)
• In an LL(1) grammar, the productions for each nonterminal A must have disjoint predict sets
  – Most programming languages have LL(1) grammars, but some constructs require special attention
• Not all CFGs are LL(1)
  – More lookahead may be needed: LL(k), k > 1
  – A more powerful parsing method may be required (Chap. 6)
  – The grammar may be ambiguous
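The two ingredients of Predict(p) can be computed with the usual First/Follow fixed-point iteration. The sketch below is a plain Python illustration, not a transcription of the Fig. 5.1 algorithm; the expression grammar, its encoding (a dict from nonterminal to a list of RHS tuples, with () for a λ-production), the end marker "$", and the names first_of_seq and predict_sets are all hypothetical.

```python
# Hypothetical example grammar (not from the chapter's figures):
#   S -> E $      E -> T Ep      Ep -> + T Ep | lambda      T -> ( E ) | id
# Each nonterminal maps to a list of right-hand sides; () is a lambda production.
GRAMMAR = {
    "S": [("E", "$")],
    "E": [("T", "Ep")],
    "Ep": [("+", "T", "Ep"), ()],
    "T": [("(", "E", ")"), ("id",)],
}


def first_of_seq(grammar, seq, first, nullable):
    """Return (First(seq), whether seq can derive lambda) from current First/nullable info."""
    result = set()
    for sym in seq:
        if sym not in grammar:          # terminal: it is produced first, stop here
            result.add(sym)
            return result, False
        result |= first[sym]
        if sym not in nullable:         # sym cannot vanish, so later symbols never come first
            return result, False
    return result, True                 # every symbol (or none at all) can derive lambda


def predict_sets(grammar):
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    nullable = set()
    changed = True
    while changed:                      # iterate First/nullable to a fixed point
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                f, n = first_of_seq(grammar, rhs, first, nullable)
                if not f <= first[lhs] or (n and lhs not in nullable):
                    first[lhs] |= f
                    if n:
                        nullable.add(lhs)
                    changed = True
    changed = True
    while changed:                      # iterate Follow to a fixed point
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    f, n = first_of_seq(grammar, rhs[i + 1:], first, nullable)
                    new = f | (follow[lhs] if n else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    predict = {}                        # Predict(p) = First(RHS), plus Follow(A) if RHS =>* lambda
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            f, n = first_of_seq(grammar, rhs, first, nullable)
            predict[(lhs, rhs)] = f | (follow[lhs] if n else set())
    return predict


PREDICT = predict_sets(GRAMMAR)
for (lhs, rhs), tokens in PREDICT.items():
    print(lhs, "->", " ".join(rhs) or "lambda", ":", sorted(tokens))
```

For this grammar the two Ep productions get {+} and {$, )}, and the two T productions get {(} and {id}; both pairs are disjoint, so the LL(1) condition holds.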
Recursive-Descent LL(1) Parsers
• Input: token stream ts
  – PEEK(): examines the next input token without advancing the input
  – ADVANCE(): advances the input by one token
• To construct a recursive-descent parser
  – We write a separate procedure for each nonterminal A
  – The procedure uses the next token to select the production $p_i$ for A whose predict set contains it, then processes each symbol of its RHS $X_1 \cdots X_m$ in turn
    • Terminal symbol: MATCH(ts, $X_i$)
    • Nonterminal symbol: call $X_i$(ts)
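A minimal recursive-descent sketch, assuming a hypothetical expression grammar and a hypothetical TokenStream class whose peek, advance, and match methods play the roles of PEEK, ADVANCE, and MATCH. Each nonterminal gets one procedure, and the branch conditions are exactly the predict sets of that nonterminal's productions.

```python
# Hypothetical grammar chosen for illustration:
#   S -> E $    E -> T Ep    Ep -> + T Ep | lambda    T -> ( E ) | id
class TokenStream:
    """Hypothetical token stream with PEEK/ADVANCE-style operations; "$" marks end of input."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    def peek(self):                     # PEEK(): look at the next token, do not consume it
        return self.tokens[self.pos]

    def advance(self):                  # ADVANCE(): consume one token
        self.pos += 1


def match(ts, expected):                # MATCH(ts, X): consume X or report a syntax error
    if ts.peek() != expected:
        raise SyntaxError(f"expected {expected!r}, found {ts.peek()!r}")
    ts.advance()


def S(ts):                              # S -> E $
    E(ts)
    match(ts, "$")


def E(ts):                              # E -> T Ep, Predict = { (, id }
    if ts.peek() in ("(", "id"):
        T(ts)
        Ep(ts)
    else:
        raise SyntaxError(f"unexpected {ts.peek()!r} at start of E")


def Ep(ts):
    if ts.peek() == "+":                # Ep -> + T Ep, Predict = { + }
        match(ts, "+")
        T(ts)
        Ep(ts)
    elif ts.peek() in (")", "$"):       # Ep -> lambda, Predict = { ), $ }: consume nothing
        pass
    else:
        raise SyntaxError(f"unexpected {ts.peek()!r} after an operand")


def T(ts):
    if ts.peek() == "(":                # T -> ( E )
        match(ts, "(")
        E(ts)
        match(ts, ")")
    elif ts.peek() == "id":             # T -> id
        match(ts, "id")
    else:
        raise SyntaxError(f"unexpected {ts.peek()!r} at start of T")


def parse(tokens):
    S(TokenStream(tokens))
    return True
```

For example, parse(["id", "+", "(", "id", ")"]) returns True, while parse(["id", "+"]) raises SyntaxError when T finds "$" instead of "(" or "id".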
Table-Driven LL(1) Parsers
• Creating recursive-descent parsers can be automated, but
  – The size of the parser code grows with the size of the grammar
  – Inefficiency: overhead of procedure calls and returns
• To create table-driven parsers, we use an explicit stack to simulate the actions performed by MATCH() and by calls to the nonterminals' procedures
  – Terminal symbol on top of stack: MATCH
  – Nonterminal symbol on top of stack: table lookup
  – (Fig. 5.8)
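A sketch of the stack-based driver, assuming a hypothetical hand-filled table for the same small expression grammar; it is not a transcription of Fig. 5.8. A terminal on top of the stack is matched against the next token, and a nonterminal is replaced by the RHS found in the table, pushed in reverse so that its leftmost symbol ends up on top.

```python
# Hypothetical LL(1) table for:  S -> E $   E -> T Ep   Ep -> + T Ep | lambda   T -> ( E ) | id
# Key: (nonterminal on top of stack, next input token) -> RHS to apply; () is lambda.
TABLE = {
    ("S", "("): ("E", "$"), ("S", "id"): ("E", "$"),
    ("E", "("): ("T", "Ep"), ("E", "id"): ("T", "Ep"),
    ("Ep", "+"): ("+", "T", "Ep"), ("Ep", ")"): (), ("Ep", "$"): (),
    ("T", "("): ("(", "E", ")"), ("T", "id"): ("id",),
}
NONTERMINALS = {lhs for lhs, _ in TABLE}


def ll1_parse(tokens, start="S"):
    """Return the productions applied, in leftmost-derivation order."""
    tokens = list(tokens) + ["$"]       # "$" marks end of input
    pos = 0
    stack = [start]                     # the end of the list is the top of stack
    applied = []
    while stack:
        top = stack.pop()
        tok = tokens[pos]               # PEEK
        if top in NONTERMINALS:         # nonterminal: table lookup replaces a procedure call
            rhs = TABLE.get((top, tok))
            if rhs is None:
                raise SyntaxError(f"no production for {top} on {tok!r}")
            applied.append((top, rhs))
            stack.extend(reversed(rhs)) # push RHS so its first symbol is on top
        elif top == tok:                # terminal: MATCH
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {tok!r}")
    if pos != len(tokens):
        raise SyntaxError("input remains after end marker")
    return applied
```

The returned list is the leftmost parse: for ["id", "+", "id"] it starts with S → E $, E → T Ep, T → id and ends with Ep → λ. An empty table cell is where a syntax error is detected.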
How to Build LL(1) Parse Table
• The table is indexed by the top-of-stack (TOS) symbol and the next input token
  – Row: nonterminal symbol
  – Column: next input token
  – (Fig. 5.9)
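Filling the table amounts to writing each production into the cells named by its predict set. The sketch below assumes predict sets have already been computed and stored in a hypothetical dict keyed by (lhs, rhs); a cell that would receive two different productions is reported as an LL(1) conflict. The function name fill_table and both example grammars are illustrative assumptions, not Fig. 5.9.

```python
def fill_table(predict):
    """Build an LL(1) table {(nonterminal, token): rhs} from {(lhs, rhs): predict set}.

    Raises ValueError if two productions for the same nonterminal share a lookahead,
    i.e. if their predict sets are not disjoint (the grammar is not LL(1)).
    """
    table = {}
    for (lhs, rhs), lookaheads in predict.items():
        for tok in lookaheads:
            if (lhs, tok) in table and table[(lhs, tok)] != rhs:
                raise ValueError(f"not LL(1): conflict in cell [{lhs}, {tok}]")
            table[(lhs, tok)] = rhs
    return table


# Hypothetical predict sets for:  Ep -> + T Ep | lambda   and   T -> ( E ) | id
PREDICT = {
    ("Ep", ("+", "T", "Ep")): {"+"},
    ("Ep", ()): {")", "$"},
    ("T", ("(", "E", ")")): {"("},
    ("T", ("id",)): {"id"},
}
table = fill_table(PREDICT)

# Hypothetical productions with a common prefix: both are predicted by "id".
CONFLICTING = {
    ("Stmt", ("id", ":=", "Expr")): {"id"},
    ("Stmt", ("id", "(", "Args", ")")): {"id"},
}
```

Here table[("T", "id")] is ("id",) and table[("Ep", "$")] is the λ-production (); calling fill_table(CONFLICTING) raises ValueError because both Stmt productions claim the cell [Stmt, id].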
Obtaining LL(1) Grammars
• It’s easy to violate the requirement of a unique prediction for each combination of nonterminal and lookahead symbol
  – Common prefixes
  – Left recursion
Common Prefixes
• Two productions for the same nonterminal begin with the same string of grammar symbols
  – Ex. (Fig. 5.12): not LL(k)
• Factoring transformation (Fig. 5.13)
  – $A \to \alpha\beta \mid \alpha\gamma$ becomes $A \to \alpha A'$, $A' \to \beta \mid \gamma$
  – Ex. (Fig. 5.14)
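One factoring step can be sketched as follows, using the same hypothetical dict-of-tuples grammar encoding as a stand-in for the book's representation; this is an illustrative version, not necessarily the exact procedure of Fig. 5.13. Alternatives sharing a first symbol are grouped, their longest common prefix α is pulled out, and a fresh nonterminal A′ collects the remaining suffixes.

```python
def common_prefix(seqs):
    """Longest sequence of grammar symbols that every sequence in seqs starts with."""
    prefix = []
    for column in zip(*seqs):
        if all(sym == column[0] for sym in column):
            prefix.append(column[0])
        else:
            break
    return tuple(prefix)


def left_factor(grammar, lhs):
    """One step: A -> alpha b1 | ... | alpha bn | others  ==>  A -> alpha A' | others, A' -> b1 | ... | bn."""
    rhss = grammar[lhs]
    groups = {}
    for rhs in rhss:                    # group alternatives by their first symbol
        if rhs:
            groups.setdefault(rhs[0], []).append(rhs)
    for group in groups.values():
        if len(group) < 2:
            continue
        alpha = common_prefix(group)
        new_nt = lhs + "'"
        while new_nt in grammar:        # pick a fresh nonterminal name
            new_nt += "'"
        factored = dict(grammar)        # shallow copy; the input grammar is left unchanged
        factored[lhs] = [alpha + (new_nt,)] + [r for r in rhss if r not in group]
        factored[new_nt] = [r[len(alpha):] for r in group]   # () is lambda
        return factored
    return grammar                      # no common prefixes for lhs


# Hypothetical if-then / if-then-else grammar with a common prefix.
G = {
    "S": [("if", "E", "then", "S"), ("if", "E", "then", "S", "else", "S"), ("other",)],
    "E": [("id",)],
}
F = left_factor(G, "S")
```

The result is S → if E then S S′ | other and S′ → λ | else S. Factoring removes the common prefix, but for this particular grammar the two S′ productions are still both predicted by else, which is the dangling-else situation treated in the non-LL(1) language section.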
Left Recursion
• A production is left recursive if its LHS symbol is also the first symbol of its RHS
  – E.g. $\mathit{StmtList} \to \mathit{StmtList}\ ;\ \mathit{Stmt}$
  – General form: $A \to A\alpha \mid \beta$
• Eliminating left recursion (Fig. 5.15)
  – Ex. (Fig. 5.16)
• Greibach normal form (GNF)
  – The right-hand sides of all productions begin with a terminal symbol
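A sketch of the standard rewrite for immediate left recursion, again using a hypothetical dict-of-tuples grammar encoding: $A \to A\alpha_1 \mid \cdots \mid \beta_1 \mid \cdots$ becomes $A \to \beta_i A'$ and $A' \to \alpha_j A' \mid \lambda$. This is the textbook transformation, not necessarily the exact procedure of Fig. 5.15, and it handles only immediate left recursion; indirect left recursion through other nonterminals needs a more general algorithm.

```python
def eliminate_immediate_left_recursion(grammar, lhs):
    """A -> A a1 | ... | A am | b1 | ... | bn  ==>  A -> b1 A' | ... | bn A',  A' -> a1 A' | ... | am A' | lambda."""
    rhss = grammar[lhs]
    alphas = [rhs[1:] for rhs in rhss if rhs and rhs[0] == lhs]   # tails after the leading A
    betas = [rhs for rhs in rhss if not rhs or rhs[0] != lhs]     # alternatives not starting with A
    if not alphas:
        return grammar                  # no immediate left recursion
    new_nt = lhs + "'"
    while new_nt in grammar:            # pick a fresh nonterminal name
        new_nt += "'"
    result = dict(grammar)              # shallow copy; the input grammar is left unchanged
    result[lhs] = [beta + (new_nt,) for beta in betas]
    result[new_nt] = [alpha + (new_nt,) for alpha in alphas] + [()]   # () is lambda
    return result


# StmtList example from the slide; Stmt -> s is a hypothetical placeholder.
G = {"StmtList": [("StmtList", ";", "Stmt"), ("Stmt",)], "Stmt": [("s",)]}
H = eliminate_immediate_left_recursion(G, "StmtList")
```

The result is StmtList → Stmt StmtList′ and StmtList′ → ; Stmt StmtList′ | λ, which generates the same strings but lets a top-down parser consume a Stmt before deciding whether another ; follows.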
A Non-LL(1) Language
• Almost all common programming-language constructs are LL(1)
  – One exception: if-then-else (the dangling-else problem)
  – Can be resolved by mandating that each else be matched to its closest unmatched then
  – (Fig. 5.17)
• Ambiguous (Chap. 6)
  – E.g. if expr then if expr then other else other
    • if expr then { if expr then other else other }
    • if expr then { if expr then other } else other
    • At least two distinct parses
• Dangling bracket language (DBL)
  – $\mathrm{DBL} = \{\, [^i\, ]^j \mid i \ge j \ge 0 \,\}$
  – if expr then Stmt corresponds to [ (opening bracket)
  – else Stmt corresponds to ] (optional closing bracket)
• Fig. 5.18(a)
  – $S \to [\ S\ CL \mid \lambda$
  – $CL \to\ ] \mid \lambda$
• E.g. the string [[] has two distinct parses, so this grammar is ambiguous
• Fig. 5.18(b)
  – $S \to [\ S \mid T$
  – $T \to [\ T\ ] \mid \lambda$
  – Not ambiguous, but not LL(1)
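• Reasons why it’s not LL(k) for any k
  – $[ \in \mathrm{Predict}(S \to [\,S)$ and $[ \in \mathrm{Predict}(S \to T)$
  – $[[ \in \mathrm{Predict}_2(S \to [\,S)$ and $[[ \in \mathrm{Predict}_2(S \to T)$
  – …
  – $[^k \in \mathrm{Predict}_k(S \to [\,S)$ and $[^k \in \mathrm{Predict}_k(S \to T)$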
• When there’s a conflict, the else should match the closest then
  – We therefore resolve the conflict in favor of rule 4 (Fig. 5.19)
  – Some parser generators offer mechanisms for establishing priorities when conflicts arise
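The effect of resolving the conflict toward the closing bracket can be sketched with a small recursive-descent parser for the Fig. 5.18(a) grammar $S \to [\ S\ CL \mid \lambda$, $CL \to\ ] \mid \lambda$. The sketch assumes the conflicting table entry is settled by always choosing $CL \to\ ]$ when ] is next, so each ] closes the nearest unmatched [, just as each else binds to the closest unmatched then. The function name parse_dbl and its pair-list output format are hypothetical.

```python
def parse_dbl(text):
    """Parse a string over '[' and ']' with S -> [ S CL | lambda, CL -> ] | lambda.

    The conflict on ']' in CL is resolved greedily: CL -> ] is chosen whenever ']' is next,
    so every ']' closes the nearest unmatched '['. Returns (open_index, close_index or None) pairs.
    """
    pos = 0
    pairs = []

    def peek():
        return text[pos] if pos < len(text) else "$"   # "$" marks end of input

    def S():
        nonlocal pos
        if peek() == "[":               # S -> [ S CL
            open_at = pos
            pos += 1
            S()
            CL(open_at)
        # otherwise S -> lambda: consume nothing

    def CL(open_at):
        nonlocal pos
        if peek() == "]":               # conflict cell: prefer CL -> ] over CL -> lambda
            pairs.append((open_at, pos))
            pos += 1
        else:                           # CL -> lambda: this '[' stays unmatched
            pairs.append((open_at, None))

    S()
    if peek() != "$":
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return pairs
```

For the ambiguous string [[], the parser pairs the ] with the inner [ and leaves the outer [ unmatched, giving [(1, 2), (0, None)]; a string such as []] is rejected because it has more closing than opening brackets.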
Properties of LL(1) Parsers
• A correct, leftmost parse is constructed
• All LL(1) grammars are unambiguous
• All table-driven LL(1) parsers operate in linear time and space with respect to the length of the parsed input
  – Linear time:
    • If the grammar is λ-free, no production can be applied twice without advancing the input
    • If the grammar includes λ-productions, the number of nonterminals that can be popped from the stack by applying λ-rules is proportional to the length of the input
  – Linear space:
    • The stack grows only when a production $A \to X_1 \cdots X_m$ with $m \ge 2$ is applied
    • If we regard the number and size of productions as bounded by some constant, each input token contributes at most a constant increase in stack size
    • If the stack grew superlinearly, the parser would require more than linear time to push entries onto the stack
Sections Skipped…
• Parse Table Representation
  – Compaction
  – Compression
• Syntactic Error Recovery and Repair
  – Error detection
  – Error recovery
  – Error repair
Thanks for Your Attention!