Date posted: 05-Feb-2016


Top-Down Parsing
ICOM 4036, Lecture 5

Profs. Necula CS 164 Lecture 6-7

Review
- A parser consumes a sequence of tokens s and produces a parse tree
- Issues:
  - How do we recognize that s ∈ L(G)?
  - A parse tree of s describes how s ∈ L(G)
  - Ambiguity: more than one parse tree (interpretation) for some string s
  - Error: no parse tree for some string s
  - How do we construct the parse tree?


Ambiguity
- Grammar: E → E + E | E * E | ( E ) | int
- Strings:
  - int + int + int
  - int * int + int


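Ambiguity. Example
- The string int + int + int has two parse trees: one groups it as (int + int) + int, the other as int + (int + int)

[Figure: the two parse trees for int + int + int]

- + is left-associative, so the intended tree is the one for (int + int) + int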


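Ambiguity. Example
- The string int * int + int has two parse trees: one groups it as (int * int) + int, the other as int * (int + int)

[Figure: the two parse trees for int * int + int]

- * has higher precedence than +, so the intended tree is the one for (int * int) + int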


Ambiguity (Cont.)
- A grammar is ambiguous if it has more than one parse tree for some string
  - Equivalently, there is more than one right-most or left-most derivation for some string
- Ambiguity is bad
  - It leaves the meaning of some programs ill-defined
- Ambiguity is common in programming languages
  - Arithmetic expressions
  - IF-THEN-ELSE
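
One way to make the definition concrete is to count parse trees. The sketch below is written for illustration and is not part of the course materials: it counts the parse trees of a token list under the ambiguous grammar E → E + E | E * E | ( E ) | int using memoized recursion over token spans. A count greater than 1 means the string is ambiguous. The function name `count_trees` is hypothetical.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | ( E ) | int."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of distinct parse trees deriving toks[i:j] from E."""
        total = 0
        if j - i == 1 and toks[i] == "int":
            total += 1                                   # E -> int
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            total += count(i + 1, j - 1)                 # E -> ( E )
        for k in range(i + 1, j - 1):                    # E -> E op E, op at k
            if toks[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


print(count_trees("int + int + int".split()))   # 2
print(count_trees("int * int + int".split()))   # 2
```

Each split point of a binary operator gives a different root, which is exactly where the two trees on the previous slides come from.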


Dealing with Ambiguity
- There are several ways to handle ambiguity
- The most direct method is to rewrite the grammar unambiguously:
  - E → E + T | T
  - T → T * int | int | ( E )
- This enforces precedence of * over +
- It enforces left-associativity of + and *


Ambiguity. Example
- The string int * int + int has only one parse tree now

[Figure: parse trees for int * int + int; under the rewritten grammar only the tree for (int * int) + int remains]
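
To check the claim mechanically, the following sketch applies the same span-counting idea (an illustrative approach, not the course's) to the rewritten grammar E → E + T | T, T → T * int | int | ( E ). Because it counts over token spans instead of parsing by recursive descent, the left recursion in this grammar causes no trouble here. The function name is hypothetical.

```python
from functools import lru_cache


def count_trees_unambiguous(tokens):
    """Count parse trees of `tokens` under
    E -> E + T | T  and  T -> T * int | int | ( E )."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_e(i, j):
        total = count_t(i, j)                            # E -> T
        for k in range(i + 1, j - 1):                    # E -> E + T, '+' at k
            if toks[k] == "+":
                total += count_e(i, k) * count_t(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def count_t(i, j):
        total = 0
        if j - i == 1 and toks[i] == "int":
            total += 1                                   # T -> int
        if j - i >= 3 and toks[j - 2] == "*" and toks[j - 1] == "int":
            total += count_t(i, j - 2)                   # T -> T * int
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            total += count_e(i + 1, j - 1)               # T -> ( E )
        return total

    return count_e(0, len(toks))


print(count_trees_unambiguous("int * int + int".split()))   # 1
print(count_trees_unambiguous("int + int + int".split()))   # 1
```

Both strings that had two trees under the ambiguous grammar now have exactly one.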


Ambiguity: The Dangling Else
- Consider the grammar
  - E → if E then E | if E then E else E | OTHER
- This grammar is also ambiguous


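The Dangling Else: Example
- The expression if E1 then if E2 then E3 else E4 has two parse trees: else E4 can belong either to the outer if E1 or to the inner if E2

[Figure: the two parse trees for if E1 then if E2 then E3 else E4]

- Typically we want the form in which else E4 belongs to the inner if E2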


The Dangling Else: A Fix
- else matches the closest unmatched then
- We can describe this in the grammar (distinguish between matched and unmatched then):
  - E → MIF /* all then are matched */ | UIF /* some then are unmatched */
  - MIF → if E then MIF else MIF | OTHER
  - UIF → if E then E | if E then MIF else UIF
- Describes the same set of strings


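The Dangling Else: Example Revisited
- The expression if E1 then if E2 then E3 else E4

[Figure: the two candidate parse trees]

- The tree that attaches else E4 to the outer if is not valid, because its then-branch (if E2 then E3) is not a MIF
- The tree that attaches else E4 to the inner if is a valid parse tree (for a UIF)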
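
Hand-written parsers often implement the same rule procedurally instead of through the MIF/UIF grammar. A minimal sketch, assuming every condition and every OTHER statement is a single placeholder token: after parsing a then-branch, the parser greedily consumes an else if one follows, so the else attaches to the innermost if still being parsed. The function name and the tuple tree format are illustrative.

```python
def parse_stmt(tokens, pos=0):
    """Parse E -> if E then E | if E then E else E | OTHER.

    Simplification: every condition and every OTHER is a single token.
    Returns (tree, next_position); a tree is a token or
    ("if", condition, then_branch, else_branch_or_None).
    """
    if tokens[pos] != "if":
        return tokens[pos], pos + 1                      # OTHER
    cond = tokens[pos + 1]
    if tokens[pos + 2] != "then":
        raise SyntaxError(f"expected 'then' at position {pos + 2}")
    then_branch, pos = parse_stmt(tokens, pos + 3)
    else_branch = None
    # Greedy choice: an 'else' right after a then-branch is taken by the
    # innermost 'if' still being parsed, i.e. the closest unmatched 'then'.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos


tree, _ = parse_stmt("if E1 then if E2 then E3 else E4".split())
print(tree)   # ('if', 'E1', ('if', 'E2', 'E3', 'E4'), None)
```

The result is the tree the MIF/UIF grammar accepts: the outer if has no else branch.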


Ambiguity
- There are no general techniques for handling ambiguity
- It is impossible to convert an ambiguous grammar to an unambiguous one automatically
- Used with care, ambiguity can simplify the grammar
  - It sometimes allows more natural definitions
  - We then need disambiguation mechanisms


Precedence and Associativity Declarations
- Instead of rewriting the grammar
  - Use the more natural (ambiguous) grammar
  - Along with disambiguating declarations
- Most tools allow precedence and associativity declarations to disambiguate grammars
- Examples


Associativity Declarations
- Consider the grammar E → E + E | int
- Ambiguous: two parse trees of int + int + int
- Left-associativity declaration: %left +


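Precedence Declarations
- Consider the grammar E → E + E | E * E | int
  - And the string int + int * int
- Precedence declarations:
  - %left +
  - %left *
- In yacc-style tools a later declaration has higher precedence, so these two lines make * bind tighter than +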
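
Parser generators such as yacc use these declarations to resolve conflicts in an LR (bottom-up) parser, which this lecture does not cover. A hand-written parser can be driven by the same kind of declarations through precedence climbing, a different technique shown here only as a sketch. The `PRECEDENCE` table is hypothetical and mirrors %left + followed by %left *; tokens such as "1" stand in for int.

```python
# Hypothetical declaration table mirroring "%left +" followed by "%left *":
# a larger number binds tighter, as a later yacc declaration would.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def parse_expr(tokens, pos=0, min_prec=1):
    """Precedence climbing for E -> E + E | E * E | int.

    Operands are single tokens (e.g. "1" standing in for int).
    Returns (tree, next_position) with trees as (op, left, right).
    """
    lhs, pos = tokens[pos], pos + 1                      # an int operand
    while pos < len(tokens) and tokens[pos] in PRECEDENCE:
        op = tokens[pos]
        prec, assoc = PRECEDENCE[op]
        if prec < min_prec:
            break                                        # let a caller handle op
        # Left-associative: the right operand may only contain operators that
        # bind strictly tighter, so "1 + 2 + 3" groups as (1 + 2) + 3.
        next_min = prec + 1 if assoc == "left" else prec
        rhs, pos = parse_expr(tokens, pos + 1, next_min)
        lhs = (op, lhs, rhs)
    return lhs, pos


print(parse_expr("1 + 2 * 3".split())[0])   # ('+', '1', ('*', '2', '3'))
print(parse_expr("1 + 2 + 3".split())[0])   # ('+', ('+', '1', '2'), '3')
```

The grammar stays in its natural ambiguous form; only the table decides which tree is built.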


Review
- We can specify language syntax using a CFG
- A parser will answer whether s ∈ L(G)
  - and will build a parse tree
  - and pass it on to the rest of the compiler
- Next: how do we answer s ∈ L(G) and build a parse tree?


Approach 1: Top-Down Parsing


Intro to Top-Down Parsing
- Terminals are seen in order of appearance in the token stream: t1 t2 t3 t4 t5
- The parse tree is constructed
  - From the top
  - From left to right


Recursive Descent Parsing
- Consider the grammar
  - E → T + E | T
  - T → int | int * T | ( E )
- Token stream is: int5 * int2
- Start with top-level non-terminal E
- Try the rules for E in order


Recursive Descent Parsing. Example (Cont.)
- Try E0 → T1 + E2
- Then try a rule for T1 → ( E3 )
  - But ( does not match input token int5
- Try T1 → int. Token matches.
  - But + after T1 does not match input token *
- Try T1 → int * T2
  - This will match but + after T1 will be unmatched
- Have exhausted the choices for T1
  - Backtrack to choice for E0


Recursive Descent Parsing. Example (Cont.)
- Try E0 → T1
- Follow same steps as before for T1
  - And succeed with T1 → int * T2 and T2 → int
  - With the following parse tree:

[Figure: parse tree with E0 → T1, T1 → int5 * T2, T2 → int2]


Recursive Descent Parsing. Notes
- Easy to implement by hand
  - An example implementation is provided as a supplement, "Recursive Descent Parsing"
- But does not always work


Recursive-Descent Parsing
- Parsing: given a string of tokens t1 t2 ... tn, find its parse tree
- Recursive-descent parsing: try all the productions exhaustively
  - At a given moment the fringe of the parse tree is: t1 t2 … tk A …
  - Try all the productions for A: if A → B C is a production, the new fringe is t1 t2 … tk B C …
  - Backtrack when the fringe doesn't match the string
  - Stop when there are no more non-terminals
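
The exhaustive search can be sketched with Python generators, assuming the grammar E → T + E | T, T → int | int * T | ( E ) from the earlier example. Each parse function yields every (tree, next position) pair its non-terminal can derive starting at a given position, trying productions in order. Asking a generator for its next alternative is the backtracking step. This is an illustrative sketch, not the supplement mentioned above, and the tuple tree format is an assumption.

```python
def parse_E(toks, i):
    """Yield (tree, j) for every way E can derive toks[i:j]."""
    # E -> T + E
    for t, j in parse_T(toks, i):
        if j < len(toks) and toks[j] == "+":
            for e, k in parse_E(toks, j + 1):
                yield ("E", t, "+", e), k
    # E -> T   (tried after the T + E alternatives)
    for t, j in parse_T(toks, i):
        yield ("E", t), j


def parse_T(toks, i):
    """Yield (tree, j) for every way T can derive toks[i:j]."""
    # T -> int
    if i < len(toks) and toks[i] == "int":
        yield ("T", "int"), i + 1
        # T -> int * T
        if i + 1 < len(toks) and toks[i + 1] == "*":
            for t, j in parse_T(toks, i + 2):
                yield ("T", "int", "*", t), j
    # T -> ( E )
    if i < len(toks) and toks[i] == "(":
        for e, j in parse_E(toks, i + 1):
            if j < len(toks) and toks[j] == ")":
                yield ("T", "(", e, ")"), j + 1


def parse(toks):
    """Return the first parse tree that consumes every token, or None."""
    for tree, end in parse_E(toks, 0):
        if end == len(toks):
            return tree
    return None


print(parse(["int", "*", "int"]))
# ('E', ('T', 'int', '*', ('T', 'int')))
```

As on the example slides, the E → T + E attempts fail for int * int and the parser backs up to E → T, which succeeds with T → int * T.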


When Recursive Descent Does Not Work
- Consider a production S → S a:
  - In the process of parsing S we try the above rule
  - What goes wrong?
- A left-recursive grammar has a non-terminal S such that S ⇒+ S α for some α
- Recursive descent does not work in such cases
  - It goes into an infinite loop
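
A minimal sketch of what goes wrong, assuming the hypothetical grammar S → S a | b and a naive parser that tries the left-recursive alternative first. `parse_S` calls itself at the same input position without consuming a token, so the recursion never bottoms out; in Python the infinite loop surfaces as a `RecursionError` once the interpreter's recursion limit is reached.

```python
def parse_S(tokens, pos):
    """Naive recursive descent for the left-recursive grammar S -> S a | b."""
    # Alternative S -> S a: parse an S first, at the SAME position.
    # No token is consumed before the recursive call, so it never terminates.
    end = parse_S(tokens, pos)
    if end is not None and end < len(tokens) and tokens[end] == "a":
        return end + 1
    # Alternative S -> b: never reached.
    if pos < len(tokens) and tokens[pos] == "b":
        return pos + 1
    return None


def loops_forever(tokens):
    """True if parsing hits Python's recursion limit."""
    try:
        parse_S(tokens, 0)
    except RecursionError:
        return True
    return False


print(loops_forever(["b", "a"]))   # True
```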


Elimination of Left Recursion
- Consider the left-recursive grammar
  - S → S α | β
- S generates all strings starting with a β and followed by a number of α
- Can rewrite using right-recursion:
  - S → β S'
  - S' → α S' | ε


Elimination of Left-Recursion. Example
- Consider the grammar
  - S → 1 | S 0   (β = 1 and α = 0)
- It can be rewritten as
  - S → 1 S'
  - S' → 0 S' | ε


More Elimination of Left-Recursion
- In general
  - S → S α1 | … | S αn | β1 | … | βm
- All strings derived from S start with one of β1, …, βm and continue with several instances of α1, …, αn
- Rewrite as
  - S → β1 S' | … | βm S'
  - S' → α1 S' | … | αn S' | ε
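
The rewrite is mechanical enough to code directly. The sketch below assumes a hypothetical representation: each production is a tuple of symbol names, ε is the empty tuple, and the new non-terminal is named by appending a prime. It handles only immediate left recursion (S → S α), matching the rule above.

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Rewrite S -> S a1 | ... | S an | b1 | ... | bm into
    S -> b1 S' | ... | bm S'  and  S' -> a1 S' | ... | an S' | ε."""
    alphas = [p[1:] for p in productions if p and p[0] == nt]   # the αi
    betas = [p for p in productions if not p or p[0] != nt]     # the βj
    if not alphas:
        return {nt: list(productions)}       # not left-recursive: unchanged
    new_nt = nt + "'"
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],  # () is ε
    }


# The slide's example: S -> 1 | S 0
print(eliminate_immediate_left_recursion("S", [("1",), ("S", "0")]))
# {'S': [('1', "S'")], "S'": [('0', "S'"), ()]}
```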


General Left Recursion
- The grammar
  - S → A α | δ
  - A → S β
- is also left-recursive because S ⇒+ S β α
- This left-recursion can also be eliminated
- See Dragon Book, Section 4.3 for the general algorithm
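
For reference, a compact sketch of the general algorithm in the style of the textbook's: order the non-terminals A1, …, An; for each Ai, substitute the bodies of every earlier Aj into productions Ai → Aj γ, then remove immediate left recursion from Ai. As in the textbook, it assumes the input grammar has no cycles (A ⇒+ A) and no ε-productions. The tuple encoding is an assumption, and the example uses a, b, c in place of α, δ, β.

```python
def eliminate_immediate(nt, productions):
    """Remove immediate left recursion (see the rule for S -> S α | β)."""
    alphas = [p[1:] for p in productions if p and p[0] == nt]
    betas = [p for p in productions if not p or p[0] != nt]
    if not alphas:
        return {nt: list(productions)}
    new_nt = nt + "'"
    return {nt: [b + (new_nt,) for b in betas],
            new_nt: [a + (new_nt,) for a in alphas] + [()]}   # () is ε


def eliminate_left_recursion(grammar, order):
    """Remove all left recursion. Assumes no cycles and no ε-productions."""
    g = {nt: list(prods) for nt, prods in grammar.items()}
    for i, a_i in enumerate(order):
        for a_j in order[:i]:
            # Replace A_i -> A_j γ by A_i -> δ γ for every current A_j -> δ
            replaced = []
            for p in g[a_i]:
                if p and p[0] == a_j:
                    replaced.extend(d + p[1:] for d in g[a_j])
                else:
                    replaced.append(p)
            g[a_i] = replaced
        g.update(eliminate_immediate(a_i, g[a_i]))
    return g


# S -> A a | b,  A -> S c   (the slide's grammar with α = a, δ = b, β = c)
print(eliminate_left_recursion({"S": [("A", "a"), ("b",)], "A": [("S", "c")]}, ["S", "A"]))
```

Substituting S into A turns the indirect recursion into A → A a c | b c, which the immediate rule then removes.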


Summary of Recursive Descent
- Simple and general parsing strategy
  - Left-recursion must be eliminated first
  - but that can be done automatically
- Unpopular because of backtracking
  - Thought to be too inefficient
- In practice, backtracking is eliminated by restricting the grammar


Predictive Parsers
- Like recursive-descent, but the parser can "predict" which production to use
  - By looking at the next few tokens
  - No backtracking
- Predictive parsers accept LL(k) grammars
  - The first L means left-to-right scan of input
  - The second L means leftmost derivation
  - k means predict based on k tokens of lookahead
- In practice, LL(1) is used


LL(1) Languages
- In recursive-descent, for each non-terminal and input token there may be a choice of production
- LL(1) means that for each non-terminal and token there is only one production that could lead to success
- Can be specified as a 2D table
  - One dimension for the current non-terminal to expand
  - One dimension for the next token
  - A table entry contains one production


Predictive Parsing and Left Factoring
- Recall the grammar
  - E → T + E | T
  - T → int | int * T | ( E )
- Impossible to predict because
  - For T, two productions start with int
  - For E, it is not clear how to predict
- A grammar must be left-factored before use for predictive parsing


Left-Factoring Example
- Recall the grammar
  - E → T + E | T
  - T → int | int * T | ( E )
- Factor out common prefixes of productions
  - E → T X
  - X → + E | ε
  - T → ( E ) | int Y
  - Y → * T | ε
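
A single round of left factoring can be sketched as follows, assuming productions are tuples of symbols, ε is the empty tuple, and the caller supplies names for the new non-terminals (X and Y here, matching the slide). The sketch groups alternatives by their first symbol and pulls out the longest common prefix of each group. It does not re-factor the new non-terminals; a full implementation would repeat until no two alternatives share a prefix. Both function names are hypothetical.

```python
def common_prefix(seqs):
    """Longest common prefix of a list of tuples."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(nt, productions, fresh_names):
    """One round of left factoring for non-terminal nt."""
    groups = {}                          # first symbol -> alternatives, in order
    for p in productions:
        groups.setdefault(p[:1], []).append(p)
    result = {nt: []}
    for group in groups.values():
        if len(group) == 1:
            result[nt].append(group[0])  # no shared prefix: keep as is
            continue
        prefix = common_prefix(group)
        new_nt = next(fresh_names)
        result[nt].append(prefix + (new_nt,))
        result[new_nt] = [p[len(prefix):] for p in group]   # () means ε
    return result


print(left_factor("E", [("T", "+", "E"), ("T",)], iter(["X"])))
# {'E': [('T', 'X')], 'X': [('+', 'E'), ()]}
print(left_factor("T", [("int",), ("int", "*", "T"), ("(", "E", ")")], iter(["Y"])))
# {'T': [('int', 'Y'), ('(', 'E', ')')], 'Y': [(), ('*', 'T')]}
```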


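LL(1) Parsing Table Example
- Left-factored grammar
  - E → T X
  - X → + E | ε
  - T → ( E ) | int Y
  - Y → * T | ε
- The LL(1) parsing table:

|   | int   | *   | +   | (     | )   | $   |
|---|-------|-----|-----|-------|-----|-----|
| T | int Y |     |     | ( E ) |     |     |
| E | T X   |     |     | T X   |     |     |
| X |       |     | + E |       | ε   | ε   |
| Y |       | * T | ε   |       | ε   | ε   |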


LL(1) Parsing Table Example (Cont.)
- Consider the [E, int] entry
  - "When the current non-terminal is E and the next input is int, use production E → T X"
  - This production can generate an int in the first place
- Consider the [Y, +] entry
  - "When the current non-terminal is Y and the current token is +, get rid of Y" (use Y → ε)
  - We'll see later why this is so


LL(1) Parsing Tables. Errors
- Blank entries indicate error situations
  - Consider the [E, *] entry
  - "There is no way to derive a string starting with * from non-terminal E"


Using Parsing Tables
- Method similar to recursive descent, except
  - For each non-terminal S
  - We look at the next token a
  - And choose the production shown at [S, a]
- We use a stack to keep track of pending non-terminals
- We reject when we encounter an error state
- We accept when we encounter end-of-input


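LL(1) Parsing Algorithm

    initialize stack = <S $> and next (pointer to tokens)
    repeat
      case stack of
        <X, rest> : if T[X, *next] = Y1 … Yn
                      then stack ← <Y1 … Yn rest>;
                      else error ();
        <t, rest> : if t == *next++
                      then stack ← <rest>;
                      else error ();
    until stack == < >

Here X is a non-terminal, t is a terminal, and *next is the current input token.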


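LL(1) Parsing Example

| Stack     | Input       | Action   |
|-----------|-------------|----------|
| E $       | int * int $ | T X      |
| T X $     | int * int $ | int Y    |
| int Y X $ | int * int $ | terminal |
| Y X $     | * int $     | * T      |
| * T X $   | * int $     | terminal |
| T X $     | int $       | int Y    |
| int Y X $ | int $       | terminal |
| Y X $     | $           | ε        |
| X $       | $           | ε        |
| $         | $           | ACCEPT   |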
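
The algorithm and the trace can be reproduced with a short table-driven driver. This is a sketch, assuming the parsing table is stored as a Python dict keyed by (non-terminal, token), with ε represented as an empty production and missing keys treated as blank (error) entries. The stack is a list whose last element is the top, and the function returns the list of actions so it can be compared with the trace.

```python
# LL(1) table from the slide; () is an ε-production, missing keys are errors.
TABLE = {
    ("E", "int"): ("T", "X"), ("E", "("): ("T", "X"),
    ("X", "+"): ("+", "E"), ("X", ")"): (), ("X", "$"): (),
    ("T", "int"): ("int", "Y"), ("T", "("): ("(", "E", ")"),
    ("Y", "*"): ("*", "T"), ("Y", "+"): (), ("Y", ")"): (), ("Y", "$"): (),
}
NONTERMINALS = {"E", "X", "T", "Y"}


def ll1_parse(tokens, start="E"):
    """Table-driven LL(1) recognizer; returns the actions taken, as in the trace."""
    tokens = list(tokens) + ["$"]          # end-of-input marker
    stack = ["$", start]                   # initial stack <S $>; top is the last element
    pos, actions = 0, []
    while stack:                           # until stack == < >
        top = stack.pop()
        tok = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tok))
            if rhs is None:
                raise SyntaxError(f"blank entry T[{top}, {tok}]")
            actions.append(" ".join(rhs) or "ε")
            stack.extend(reversed(rhs))    # push Y1 ... Yn so that Y1 is on top
        elif top == tok:
            actions.append("ACCEPT" if top == "$" else "terminal")
            pos += 1                       # *next++
        else:
            raise SyntaxError(f"expected {top!r}, found {tok!r}")
    return actions


print(ll1_parse(["int", "*", "int"]))
# ['T X', 'int Y', 'terminal', '* T', 'terminal', 'int Y', 'terminal', 'ε', 'ε', 'ACCEPT']
```

Pushing the right-hand side in reverse keeps the leftmost symbol on top, which is what makes the parser expand the leftmost non-terminal first.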


Constructing Parsing Tables
- LL(1) languages are those defined by a parsing table for the LL(1) algorithm
- No table entry can be multiply defined
- We want to generate parsing tables from a CFG


Top-Down Parsing. Review
- Top-down parsing expands a parse tree from the start symbol to the leaves
  - Always expand the leftmost non-terminal

[Figure: partial parse tree rooted at E for the input int * int + int]

- The leaves at any point form a string βAγ
  - β contains only terminals
  - The input string is βbδ
  - The prefix β matches
  - The next token is b


Predictive Parsing. Review
- A predictive parser is described by a table
  - For each non-terminal A and for each token b we specify a production A → α
  - When trying to expand A we use A → α if b follows next
- Once we have the table
  - The parsing algorithm is simple and fast
  - No backtracking is necessary


Constructing Predictive Parsing Tables
- Consider the state S ⇒* βAγ
  - With b the next token
  - Trying to match βbδ
- There are two possibilities:
- Possibility 1: b belongs to an expansion of A
  - Any A → α can be used if b can start a string derived from α
  - In this case we say that b ∈ First(α)

Or…


Constructing Predictive Parsing Tables (Cont.)
- Possibility 2: b does not belong to an expansion of A
  - The expansion of A is empty and b belongs to an expansion of γ
  - This means that b can appear after A in a derivation of the form S ⇒* βAbω
  - We say that b ∈ Follow(A) in this case
- What productions can we use in this case?
  - Any A → α can be used if α can expand to ε
  - We say that ε ∈ First(α) in this case


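Computing First Sets
Definition:

$$\mathrm{First}(X) = \{\, b \mid X \Rightarrow^* b\alpha \,\} \cup \{\, \varepsilon \mid X \Rightarrow^* \varepsilon \,\}$$

- First(b) = { b } for every terminal b
- For all productions X → A1 … An
  - Add First(A1) − {ε} to First(X). Stop if ε ∉ First(A1)
  - Add First(A2) − {ε} to First(X). Stop if ε ∉ First(A2)
  - …
  - Add First(An) − {ε} to First(X). Stop if ε ∉ First(An)
  - Add ε to First(X)
- Repeat over all productions until no First set changes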


First Sets. Example
- Recall the grammar
  - E → T X
  - X → + E | ε
  - T → ( E ) | int Y
  - Y → * T | ε
- First sets
  - First( ( ) = { ( }
  - First( T ) = { int, ( }
  - First( ) ) = { ) }
  - First( E ) = { int, ( }
  - First( int ) = { int }
  - First( X ) = { +, ε }
  - First( + ) = { + }
  - First( Y ) = { *, ε }
  - First( * ) = { * }
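
The rules above are applied repeatedly until nothing changes. A minimal fixed-point sketch, assuming a dict-based grammar encoding (non-terminals are keys, any other symbol is a terminal, an ε-production is the empty tuple, and ε is written "ε" inside sets), reproduces the First sets of the non-terminals in this example. Names such as `first_of_seq` are illustrative.

```python
EPS = "ε"

# Left-factored grammar from the slides: non-terminal -> list of right-hand sides.
# Any symbol that is not a key is a terminal; () is an ε-production.
GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}


def first_of_seq(seq, first, grammar):
    """First(A1 ... An), following the rules above."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in grammar else {sym}      # First(b) = {b}
        out |= f - {EPS}
        if EPS not in f:
            return out                                   # stop: Ai cannot vanish
    return out | {EPS}                                   # every Ai can derive ε


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                                       # iterate to a fixed point
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_seq(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


print(compute_first(GRAMMAR))
```

The loop is needed because First(E) depends on First(T), which is only known after an earlier pass over the T productions.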


Computing Follow Sets
Definition:

$$\mathrm{Follow}(X) = \{\, b \mid S \Rightarrow^* \beta X b \delta \,\}$$

- Compute the First sets for all non-terminals first
- Add $ to Follow(S) (if S is the start non-terminal)
- For all productions Y → … X A1 … An
  - Add First(A1) − {ε} to Follow(X). Stop if ε ∉ First(A1)
  - Add First(A2) − {ε} to Follow(X). Stop if ε ∉ First(A2)
  - …
  - Add First(An) − {ε} to Follow(X). Stop if ε ∉ First(An)
  - Add Follow(Y) to Follow(X)
- Repeat until no Follow set changes


Follow Sets. Example
- Recall the grammar
  - E → T X
  - X → + E | ε
  - T → ( E ) | int Y
  - Y → * T | ε
- Follow sets
  - Follow( + ) = { int, ( }
  - Follow( * ) = { int, ( }
  - Follow( ( ) = { int, ( }
  - Follow( E ) = { ), $ }
  - Follow( X ) = { $, ) }
  - Follow( T ) = { +, ), $ }
  - Follow( ) ) = { +, ), $ }
  - Follow( Y ) = { +, ), $ }
  - Follow( int ) = { *, +, ), $ }
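
Follow sets can be computed by a second fixed-point loop over every occurrence of a non-terminal on a right-hand side. This sketch uses the same assumed encoding as a dict-based First computation and repeats that computation so it runs on its own. It computes Follow only for non-terminals, which are the sets the table construction needs. Function names are illustrative.

```python
EPS = "ε"

GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}


def first_of_seq(seq, first, grammar):
    """First(A1 ... An); non-keys of `grammar` are terminals."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in grammar else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_seq(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    """Follow sets of the non-terminals, iterated to a fixed point."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                               # $ ∈ Follow(S)
    changed = True
    while changed:
        changed = False
        for y, productions in grammar.items():
            for rhs in productions:                      # Y -> ... X A1 ... An
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue                         # terminals are skipped here
                    rest = first_of_seq(rhs[i + 1:], first, grammar)
                    new = rest - {EPS}                   # First(A1 ... An) - {ε}
                    if EPS in rest:
                        new |= follow[y]                 # all Ai can vanish: add Follow(Y)
                    if not new <= follow[x]:
                        follow[x] |= new
                        changed = True
    return follow


first = compute_first(GRAMMAR)
print(compute_follow(GRAMMAR, "E", first))
```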


Constructing LL(1) Parsing Tables
- Construct a parsing table T for CFG G
- For each production A → α in G do:
  - For each terminal b ∈ First(α) do
    - T[A, b] = α
  - If α ⇒* ε, for each b ∈ Follow(A) do
    - T[A, b] = α
  - If α ⇒* ε and $ ∈ Follow(A) do
    - T[A, $] = α


Constructing LL(1) Tables. Example
- Recall the grammar
  - E → T X
  - X → + E | ε
  - T → ( E ) | int Y
  - Y → * T | ε
- Where in the row of Y do we put Y → * T?
  - In the columns of First( * T ) = { * }
- Where in the row of Y do we put Y → ε?
  - In the columns of Follow(Y) = { $, +, ) }
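
Putting the pieces together, here is a sketch of the construction under the same assumed encoding. First and Follow are computed in one combined fixed-point loop. Each production A → α is entered at [A, b] for every b ∈ First(α), and, when ε ∈ First(α), for every b ∈ Follow(A), which already contains $ where applicable. Any cell that would receive two different productions is reported as a conflict, which is the "multiply defined" test for LL(1). Function names are illustrative.

```python
EPS = "ε"


def first_of_seq(seq, first, grammar):
    """First(A1 ... An); non-keys of `grammar` are terminals."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in grammar else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def first_follow(grammar, start):
    """First and Follow sets of the non-terminals, in one fixed-point loop."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for a, productions in grammar.items():
            for rhs in productions:
                new_first = first_of_seq(rhs, first, grammar)
                if not new_first <= first[a]:
                    first[a] |= new_first
                    changed = True
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue
                    rest = first_of_seq(rhs[i + 1:], first, grammar)
                    new_follow = (rest - {EPS}) | (follow[a] if EPS in rest else set())
                    if not new_follow <= follow[x]:
                        follow[x] |= new_follow
                        changed = True
    return first, follow


def build_ll1_table(grammar, start):
    """Fill T[A, b] per the construction above; report multiply defined entries."""
    first, follow = first_follow(grammar, start)
    table, conflicts = {}, []
    for a, productions in grammar.items():
        for rhs in productions:
            f = first_of_seq(rhs, first, grammar)
            lookaheads = f - {EPS}                       # b ∈ First(α)
            if EPS in f:
                lookaheads |= follow[a]                  # α ⇒* ε: b ∈ Follow(A), incl. $
            for b in lookaheads:
                if (a, b) in table and table[(a, b)] != rhs:
                    conflicts.append((a, b))             # multiply defined entry
                else:
                    table[(a, b)] = rhs
    return table, conflicts


LEFT_FACTORED = {
    "E": [("T", "X")], "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")], "Y": [("*", "T"), ()],
}
UNFACTORED = {
    "E": [("T", "+", "E"), ("T",)],
    "T": [("int",), ("int", "*", "T"), ("(", "E", ")")],
}

table, conflicts = build_ll1_table(LEFT_FACTORED, "E")
print(conflicts)                  # []
print(table[("Y", "+")])          # ()
_, conflicts = build_ll1_table(UNFACTORED, "E")
print(sorted(set(conflicts)))     # [('E', '('), ('E', 'int'), ('T', 'int')]
```

The un-factored grammar from the left-factoring slides fails exactly where predicted: both E productions compete for int and (, and both T → int and T → int * T compete for int.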


Notes on LL(1) Parsing Tables
- If any entry is multiply defined then G is not LL(1)
  - If G is ambiguous
  - If G is left recursive
  - If G is not left-factored
  - And in other cases as well
- Most programming language grammars are not LL(1)
- There are tools that build LL(1) tables


Review
- For some grammars there is a simple parsing strategy
  - Predictive parsing
- Next: a more powerful parsing strategy
