Transparency No. 1 Lecture 3 Introduction to Parsing and Top-Down Parsing Cheng-Chia Chen.

Date post: 20-Jan-2016
Transparency No. 1

Lecture 3

Introduction to Parsing and

Top-Down Parsing

Cheng-Chia Chen

Transparency No. 2

Outlines

Parsing overview
Limitation of regular expressions
Context-free Grammar
Derivation and Parse Tree
Ambiguity of CFGs
Concrete Parse Tree vs. Abstract Syntax Tree
Recursive Descent Parsing
LL(1) parsing

Transparency No. 3

Parsing Overview

What is syntax? The way in which words are put together to form phrases, clauses, or sentences. --- Webster’s Dictionary

The function of a parser:
  Input: sequence of tokens from lexer
  Output: parse tree of the program

Transparency No. 4

Example

Java expr: x == y ? 1 : 2

Parser input: ID == ID ? INT : INT

Parser output (parse tree, children indented under their parent):

?:
  ==
    ID
    ID
  INT
  INT

Transparency No. 5

Comparison with Lexical Analysis

Phase    Input                     Output
Lexer    Sequence of characters    Sequence of tokens
Parser   Sequence of tokens        Parse tree

Transparency No. 6

The Role of the Parser

Not all sequences of tokens are programs . . .
. . . Parser must distinguish between valid and invalid sequences of tokens

We need
  A language for describing valid sequences of tokens
  A method for distinguishing valid from invalid sequences of tokens
  A way to construct the parse tree from the parsing process.

Transparency No. 7

Limitation of regular expression

Recall how we define lexical structures using regular expressions.

Ex: ---- (1)
  digits = [0-9]+
  sum = (digits “+”)* digits

These match sums of the form: 28 + 301 + 9.

Can we use the same way to define the syntax of a language?

Transparency No. 8

Inadequacy of regular expressions

Consider how to use regular expressions to express sums with parentheses: (109+235), 61, (1+ (250 + 34))

A solution: --- (2)
  digits = [0-9]+
  sum = expr “+” expr
  expr = “(“ sum “)” | digits

Note the difference between (1) and (2):
  In (1), digits and sum can be, and actually are, treated as abbreviations of their right-hand sides (RHS).
  In (2), the LHS names are used recursively in their RHS, and cannot be treated as abbreviations.

Transparency No. 9

Inadequacy of regular expressions

To translate (1)
  digits = [0-9]+
  sum = (digits “+”)* digits
into an FA, we first translate each definition into a normal regular expression by recursively replacing every reference to a definition by its RHS:
  digits = [0-9]+
  sum = ([0-9]+ “+”)* [0-9]+
and then apply the normal procedures to translate regular expressions into DFAs.

But for definitions with recursion like (2), this approach does not work.

Transparency No. 10

--- (2)
1. digits = [0-9]+
2. sum = expr “+” expr
3. expr = “(“ sum “)” | digits

Replace sum in 3. by its RHS from 2.:
  expr = “(“ expr “+” expr “)” | digits ---- (4)

Replace each expr in the RHS of (4) by that RHS itself, and we get:
  expr = “(“ ( “(“ expr “+” expr “)” | digits ) “+” ( “(“ expr “+” expr “)” | digits ) “)” | digits --- (5)

Conclusion: It is hopeless to eliminate names in RHS by finite substitutions if there are direct or indirect recursions.

Transparency No. 11

It can be shown that regular expressions with non-recursive abbreviations do not increase the expressive power of regular expressions, but regular expressions allowing recursive abbreviations do increase the expressive power of regular expressions.

--- This formalism is called context-free grammar, and is just what we need for parsing.

Transparency No. 12

Redundancy induced by recursion

Inner alternation (|) is not needed:
  expr = a b (c | d) e
  ==> aux = c | d
      expr = a b aux e
  ==> aux = c
      aux = d
      expr = a b aux e

Repetition (*) is not needed:
  expr = (a b c)*
  ==> expr = (a b c) expr --- right recursion
      // expr = expr (a b c) --- left recursion
      expr = ε

Transparency No. 13

Context-Free Grammars

Programming language constructs have recursive structure

An EXPR is
  if EXPR then EXPR else EXPR fi , or
  while ( EXPR ) EXPR , or
  …

Context-free grammars are a natural notation for this recursive structure

Transparency No. 14

CFGs (Cont.)

A CFG consists of
  A set of terminals T
  A set of non-terminals N
  A start symbol S (a non-terminal)
  A set P of productions, where each rule r is of the form (assuming X ∈ N):
    X → ε , or
    X → Y1 Y2 ... Yn where Yi ∈ N ∪ T

Transparency No. 15

Notational Conventions:

In these lecture notes
  Non-terminals are written upper-case (S, A, B, C, …)
  Terminals are written lower-case (a, b, c, …)
  X, Y, Z, … range over T ∪ N
  α, β, γ, … range over strings of N ∪ T.

The start symbol (S) is the left-hand side of the first production.

Each terminal symbol corresponds to a token type from the lexer.

Each non-terminal symbol corresponds to a symbol occurring at the LHS of a production rule.

Transparency No. 16

Examples of CFGs

Expr → if Expr then Expr else Expr
     | while Expr do Expr
     | id

Transparency No. 17

Examples of CFGs

Simple arithmetic expressions:

E → E * E
  | E + E
  | ( E )
  | id

Transparency No. 18

The Language of a CFG

Read productions as replacement rules:

X → Y1 … Yn
  Means X can be replaced by Y1 … Yn

X → ε
  Means X can be erased (replaced with empty string)

Transparency No. 19

Key Idea

1. Begin with a string consisting of the start symbol “S”

2. Replace any non-terminal X in the string by the right-hand side of some production X → Y1 … Yn

3. Repeat (2) until there are no non-terminals in the string

Transparency No. 20

The Language of a CFG (Cont.)

Write
  X1 X2 … Xi … Xn ⇒ X1 … Xi-1 Y1 … Yn Xi+1 … Xn
if there is a production
  Xi → Y1 … Yn

More formally: αXβ ⇒ αγβ if X → γ ∈ P,
or define ⇒ to be the binary relation { (αXβ, αγβ) | X → γ ∈ P } on (T ∪ N)*.

Transparency No. 21

The Language of a CFG (Cont.)

Write
  X1 … Xn ⇒* Y1 … Ym
if
  X1 … Xn ⇒ … ⇒ … ⇒ Y1 … Ym
in 0 or more steps.

Or formally: define ⇒* to be the reflexive and transitive closure of the relation ⇒,
i.e., α ⇒* β iff α = β or ∃ α1, α2, …, αn (n ≥ 1) such that α ⇒ α1 ⇒ … ⇒ αn = β.

Transparency No. 22

The Language of a CFG

Let G be a context-free grammar with start symbol S. Then the language L(G) of G is the set:

{ α | S ⇒* α where α is a terminal string. }

Transparency No. 23

Terminals

Terminals are so called because there are no rules for replacing them

Once generated, terminals are permanent

Terminals ought to be token types of the language

Transparency No. 24

Examples

Strings of balanced parentheses: { (^i )^i | i ≥ 0 }

Two representations of the same grammar G:

  S → ( S )
  S → ε

OR

  S → ( S ) | ε
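Membership in the language of this grammar, read here as S → ( S ) | ε, can be checked with one recursive function per non-terminal. The sketch below is illustrative only: the names `BalancedParens`, `parseS` and `inLanguage` are assumptions, and the input is taken to be a plain string of parenthesis characters rather than a token stream.

```java
class BalancedParens {
    // Tries to match S -> ( S ) | epsilon starting at pos.
    // Returns the position just after the match, or -1 if a "(" is never closed.
    static int parseS(String s, int pos) {
        if (pos < s.length() && s.charAt(pos) == '(') {   // S -> ( S )
            int after = parseS(s, pos + 1);
            if (after >= 0 && after < s.length() && s.charAt(after) == ')') {
                return after + 1;
            }
            return -1;
        }
        return pos;                                        // S -> epsilon
    }

    // A string is in L(G) when S derives all of it.
    static boolean inLanguage(String s) {
        return parseS(s, 0) == s.length();
    }

    public static void main(String[] args) {
        System.out.println(inLanguage("((()))"));
        System.out.println(inLanguage("(()"));
    }
}
```

Note that this grammar generates only fully nested strings, so a string like ()() is rejected even though its parentheses are balanced in the everyday sense.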

Transparency No. 25

Arithmetic Example

Simple arithmetic expressions:

  E → E + E | E * E | ( E ) | id

Some elements of the language:

  id            id + id
  ( id )        id * id
  ( id ) * id   id * ( id )

Transparency No. 26

Notes

The idea of a CFG is a big step. But:

  Membership in a language is just “yes” or “no”; we also need the parse tree of the input
  Must handle errors gracefully
  Need (tools for) an implementation of CFGs.

Transparency No. 27

Derivations and Parse Trees

A derivation is a sequence
  S0, S1, S2, …, Sn
of strings over terminals and nonterminals such that
  1. Si ⇒ Si+1 for 0 ≤ i < n.
  2. S0 = S is the start symbol.

A derivation can be drawn as a tree
  Start symbol is the tree’s root
  For a production X → Y1 … Yn, add children Y1 … Yn to node X

Transparency No. 28

Derivation Example

Grammar
  E → E + E | E * E | ( E ) | id

String
  id * id + id

Transparency No. 29

Derivation Example (Cont.)

E
⇒ E + E
⇒ E * E + E
⇒ id * E + E
⇒ id * id + E
⇒ id * id + id

Parse tree (children shown in parentheses): E( E( E(id) * E(id) ) + E(id) )
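The derivation of id * id + id can be replayed mechanically by always rewriting the left-most E. This is a small sketch under assumptions: sentential forms are lists of strings, E is the only non-terminal, and the class and method names (`LeftmostDerivation`, `step`, `derive`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LeftmostDerivation {
    // One derivation step: replace the left-most non-terminal "E" by rhs.
    static List<String> step(List<String> form, String... rhs) {
        int i = form.indexOf("E");
        if (i < 0) throw new IllegalStateException("no non-terminal left");
        List<String> next = new ArrayList<>(form.subList(0, i));
        next.addAll(Arrays.asList(rhs));
        next.addAll(form.subList(i + 1, form.size()));
        return next;
    }

    // Replays the derivation for id * id + id and records every sentential form.
    static List<String> derive() {
        String[][] choices = {
            {"E", "+", "E"}, {"E", "*", "E"}, {"id"}, {"id"}, {"id"}
        };
        List<String> form = new ArrayList<>(Arrays.asList("E"));
        List<String> trace = new ArrayList<>();
        trace.add(String.join(" ", form));
        for (String[] rhs : choices) {
            form = step(form, rhs);
            trace.add(String.join(" ", form));
        }
        return trace;
    }

    public static void main(String[] args) {
        for (String line : derive()) System.out.println(line);
    }
}
```

Each recorded form corresponds to one line of the derivation; the choice of production at each step is supplied by hand here, which is exactly the decision a parser has to make on its own.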

Transparency No. 30

Derivation in Detail (1)

E

Parse tree so far (children shown in parentheses): E

Transparency No. 31

Derivation in Detail (2)

E
⇒ E + E

Parse tree so far: E( E + E )

Transparency No. 32

Derivation in Detail (3)

E
⇒ E + E
⇒ E * E + E

Parse tree so far: E( E( E * E ) + E )

Transparency No. 33

Derivation in Detail (4)

E
⇒ E + E
⇒ E * E + E
⇒ id * E + E

Parse tree so far: E( E( E(id) * E ) + E )

Transparency No. 34

Derivation in Detail (5)

E
⇒ E + E
⇒ E * E + E
⇒ id * E + E
⇒ id * id + E

Parse tree so far: E( E( E(id) * E(id) ) + E )

Transparency No. 35

Derivation in Detail (6)

E
⇒ E + E
⇒ E * E + E
⇒ id * E + E
⇒ id * id + E
⇒ id * id + id

Parse tree: E( E( E(id) * E(id) ) + E(id) )

Transparency No. 36

Properties of parse trees.

A parse tree has
  the start symbol at the root
  terminals or empty nodes at the leaves
  non-terminals at the internal nodes
  if internal node X has children Y1, …, Yn, then X → Y1 Y2 … Yn is a production rule.

An in-order traversal of the leaves is the original input.

The parse tree makes explicit the structure of the input string.

Transparency No. 37

Left-most and Right-most Derivations

This example is a right-most derivation
  At each step, replace the right-most non-terminal

There is an equivalent notion of a left-most derivation

E
⇒ E + E
⇒ E + id
⇒ E * E + id
⇒ E * id + id
⇒ id * id + id

Transparency No. 38

Right-most Derivation in Detail (1)

E

Parse tree so far (children shown in parentheses): E

Transparency No. 39

Right-most Derivation in Detail (2)

E
⇒ E + E

Parse tree so far: E( E + E )

Transparency No. 40

Right-most Derivation in Detail (3)

E
⇒ E + E
⇒ E + id

Parse tree so far: E( E + E(id) )

Transparency No. 41

Right-most Derivation in Detail (4)

E
⇒ E + E
⇒ E + id
⇒ E * E + id

Parse tree so far: E( E( E * E ) + E(id) )

Transparency No. 42

Right-most Derivation in Detail (5)

E
⇒ E + E
⇒ E + id
⇒ E * E + id
⇒ E * id + id

Parse tree so far: E( E( E * E(id) ) + E(id) )

Transparency No. 43

Right-most Derivation in Detail (6)

E
⇒ E + E
⇒ E + id
⇒ E * E + id
⇒ E * id + id
⇒ id * id + id

Parse tree: E( E( E(id) * E(id) ) + E(id) )

Transparency No. 44

Derivations and Parse Trees

Note that right-most and left-most derivations have the same parse tree

The difference is the order in which branches are added

Transparency No. 45

Summary of Derivations

We are not just interested in whether s ∈ L(G)
  We need a parse tree for s

A derivation defines a parse tree
  But one parse tree may have many derivations

Left-most and right-most derivations are important in parser implementation because each can serve as the canonical derivation of a parse tree: every parse tree has a unique left-most (and a unique right-most) derivation.

Transparency No. 46

Issues

A parser consumes a sequence of tokens s and produces a parse tree

Issues:
  How to recognize that s ∈ L(G)?
  How to generate a parse tree of s once s ∈ L(G)
  Ambiguity: Is there more than one parse tree (interpretation) for some string s?
  Error handling: What should we do if no parse tree exists for an input string?

Transparency No. 47

Ambiguity

Grammar

E → E + E | E * E | ( E ) | int

String

int * int + int

Transparency No. 48

Ambiguity (Cont.)

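This string has two parse trees (children shown in parentheses):

  E( E(int) * E( E(int) + E(int) ) )     i.e. int * (int + int)

  E( E( E(int) * E(int) ) + E(int) )     i.e. (int * int) + int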

Transparency No. 49

Ambiguity (Cont.)

A grammar is ambiguous if it has more than one parse tree for some string
  Equivalently, there is more than one right-most or left-most derivation for some string

Ambiguity is bad
  Leaves the meaning of some programs ill-defined

Ambiguity is common in programming languages
  Arithmetic expressions
  IF-THEN-ELSE

Transparency No. 50

Dealing with Ambiguity

There are several ways to handle ambiguity

Most direct method is to rewrite the grammar unambiguously

E → T + E | T

T → int * T | int | ( E )

Enforces precedence of * over +

Transparency No. 51

Ambiguity: The Dangling Else

Consider the grammar
  E → if E then E
    | if E then E else E
    | OTHER

This grammar is also ambiguous

Transparency No. 52

The Dangling Else: Example

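The expression

  if E1 then if E2 then E3 else E4

has two parse trees (children shown in parentheses):

  if( E1, if( E2, E3 ), E4 )     else attached to the outer if

  if( E1, if( E2, E3, E4 ) )     else attached to the inner if

• Typically we want the second form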

Transparency No. 53

The Dangling Else: A Fix

else matches the closest unmatched then

We can describe this in the grammar

  E → MIF   /* all then are matched */
    | UIF   /* some then are unmatched */

  MIF → if E then MIF else MIF
      | OTHER

  UIF → if E then E
      | if E then MIF else UIF

Describes the same set of strings

Transparency No. 54

The Dangling Else: Example Revisited

The expression if E1 then if E2 then E3 else E4

  if(MIF)[ E1, if(UIF)[ E2, E3 ], E4 ]
  • Not valid because the then expression is not a MIF

  if(UIF)[ E1, if(MIF)[ E2, E3, E4 ] ]
  • A valid parse tree (for a UIF)

Transparency No. 55

Ambiguity

No general techniques for handling ambiguity

Impossible to convert automatically an ambiguous grammar to an unambiguous one

However, if used with care, ambiguity can simplify the grammar
  Sometimes allows more natural definitions
  We need disambiguation mechanisms

Transparency No. 56

Precedence and Associativity Declarations

Instead of rewriting the grammar
  Use the more natural (ambiguous) grammar
  Along with disambiguating declarations

Most tools allow precedence and associativity declarations to disambiguate grammars

Examples …

Transparency No. 57

Associativity Declarations

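Consider the grammar E → E - E | int

Ambiguous: two parse trees of int - int - int (children shown in parentheses):

  E( E( E(int) - E(int) ) - E(int) )     i.e. (int - int) - int

  E( E(int) - E( E(int) - E(int) ) )     i.e. int - (int - int)

• Left-associativity declaration: %left +, -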
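Why the choice of tree matters can be seen with concrete numbers. The sketch below evaluates a chain of subtractions under both groupings; the names `Associativity`, `leftAssoc` and `rightAssoc` and the sample values 5, 3, 1 are illustrative assumptions, not taken from the slides.

```java
class Associativity {
    // ((x0 - x1) - x2) - ... : the grouping selected by a left-associativity declaration.
    static int leftAssoc(int... xs) {
        int acc = xs[0];
        for (int i = 1; i < xs.length; i++) acc = acc - xs[i];
        return acc;
    }

    // x0 - (x1 - (x2 - ...)) : the other parse tree of the ambiguous grammar.
    static int rightAssoc(int... xs) {
        int acc = xs[xs.length - 1];
        for (int i = xs.length - 2; i >= 0; i--) acc = xs[i] - acc;
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(leftAssoc(5, 3, 1));  // (5 - 3) - 1
        System.out.println(rightAssoc(5, 3, 1)); // 5 - (3 - 1)
    }
}
```

For 5 - 3 - 1 the left grouping yields 1 and the right grouping yields 3, so the two parse trees really do denote different values.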

Transparency No. 58

Precedence Declarations

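Consider the grammar E → E + E | E * E | int

And the string int + int * int

Two parse trees (children shown in parentheses):

  E( E(int) + E( E(int) * E(int) ) )     i.e. int + (int * int)

  E( E( E(int) + E(int) ) * E(int) )     i.e. (int + int) * int

• Precedence declarations:
    %left +, -
    %left *, /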

Transparency No. 59

Abstract Syntax Trees

So far a parser traces the derivation of a sequence of tokens to generate a parse tree
  Such a parse tree is called a concrete parse tree since it reflects the syntax structure of the input.

The rest of the compiler needs a structure more suitable for the representation of the program
  Abstract syntax trees
  Like parse trees but ignore some details
  Abbreviated as AST

Transparency No. 60

Abstract Syntax Tree. (Cont.)

Consider the grammar E → int | ( E ) | E + E

And the string 5 + (2 + 3)

After lexical analysis (a list of tokens)

int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

During parsing we build a parse tree …

Transparency No. 61

Example of Parse Tree

Parse tree (children shown in parentheses):

  E( E(int5) + E( ‘(‘ E( E(int2) + E(int3) ) ‘)’ ) )

Characteristics of the parse tree:
  Traces the operation of the parser
  Does capture the nesting structure
  But too much info
    Parentheses
    Single-successor nodes

Transparency No. 62

Example of Abstract Syntax Tree

Also captures the nesting structure
But abstracts from the concrete syntax
  => more compact and easier to use
An important data structure in a compiler

AST (children shown in parentheses): PLUS( 5, PLUS( 2, 3 ) )

Transparency No. 63

Constructing An AST

We first define the AST class hierarchy: ASTNode, with subclasses IntNode and PlusNode

Consider an abstract tree type with two constructors:

  new PlusNode(T1, T2) = a PLUS node with subtrees T1 and T2

  new IntNode(n) = a leaf node holding n

Transparency No. 64

Semantic Actions (Syntax-directed definition )

This is what we’ll use to construct ASTs

Each grammar symbol may have attributes
  For terminal symbols (lexical tokens) attributes can be calculated by the lexer

Each production may have an action
  Written as: X → Y1 … Yn { action }
  That can refer to or compute symbol attributes

Transparency No. 65

Constructing an AST

We define an attribute ast for non-terminals
  Values of ast attributes are ASTs
  We assume that int.lexval is the value of the integer lexeme
  Computed using semantic actions

E → int        { E.ast = new IntNode(int.lexval) }
  | E1 + E2    { E.ast = new PlusNode(E1.ast, E2.ast) }
  | ( E1 )     { E.ast = E1.ast }

Transparency No. 66

Parse Tree Example

Consider the string int5 ‘+’ ‘(‘ int2 ‘+’ int3 ‘)’

A bottom-up evaluation of the ast attribute:
  E.ast = new PlusNode(new IntNode(5), new PlusNode(new IntNode(2), new IntNode(3)))

Resulting AST (children shown in parentheses): PLUS( 5, PLUS( 2, 3 ) )
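One possible shape for the AST classes used above is sketched here. The class names ASTNode, IntNode and PlusNode come from the slides; the fields, the `eval` method, the `toString` format and the `AstDemo` driver are assumptions added for illustration.

```java
abstract class ASTNode {
    // Hypothetical helper: evaluates the expression represented by this node.
    abstract int eval();
}

class IntNode extends ASTNode {
    final int value;
    IntNode(int value) { this.value = value; }
    @Override int eval() { return value; }
    @Override public String toString() { return Integer.toString(value); }
}

class PlusNode extends ASTNode {
    final ASTNode left, right;
    PlusNode(ASTNode left, ASTNode right) { this.left = left; this.right = right; }
    @Override int eval() { return left.eval() + right.eval(); }
    @Override public String toString() { return "PLUS(" + left + ", " + right + ")"; }
}

class AstDemo {
    // The AST for 5 + (2 + 3), built exactly as the semantic actions would build it.
    static ASTNode example() {
        return new PlusNode(new IntNode(5),
                new PlusNode(new IntNode(2), new IntNode(3)));
    }

    public static void main(String[] args) {
        ASTNode ast = example();
        System.out.println(ast + " = " + ast.eval());
    }
}
```

The parentheses of the source text leave no node behind: their only effect is the shape of the tree, which is the sense in which the AST abstracts from the concrete syntax.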

Transparency No. 67

Review

We can specify language syntax using CFG

A parser will answer whether s ∈ L(G) and will build a parse tree, which we convert to an AST and pass on to the rest of the compiler

Next lectures: How do we answer s ∈ L(G) and build a parse tree?

Transparency No. 68

Introduction to Top-Down Parsing

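Terminals are seen in order of appearance in the token stream:

  t2 t5 t6 t8 t9

The parse tree is constructed
  From the top
  From left to right

[Figure: parse tree whose nodes are numbered in order of construction; nodes 1, 3, 4 and 7 are non-terminals and t2, t5, t6, t8, t9 are the terminal leaves.]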

Transparency No. 69

Recursive Descent Parsing

Intuitions: for a set of productions with the same LHS X

  X → Y1 Y2 … Yn | …

  Nonterminal X => procedure definition: X() {…}
  Nonterminal Y at RHS => procedure call: Y()
  Terminal b at RHS => match(b) : boolean
  Concatenation X Y => sequence: X() ; Y()
  Choice Y1 Y2 | Z1 Z2 => if (?) { Y1(); Y2() } else if (?) { Z1(); Z2() }, using one or more lookahead symbols to help decide the branch.

Ex: E → T + E | T

  E() { if(?) { T() ; match(+); E() } else T() ; }

Transparency No. 70

Recursive Descent Parsing. Example (Cont.)

Consider the grammar
  E → T + E | T
  T → int | int * T | ( E )

Start with top-level non-terminal E
Token stream is: int5 * int2

Try the rules for E in order
Try E0 → T1 + E2
  Then try a rule for T1 → ( E3 )
    But ( does not match input token int5
  Try T1 → int. Token matches.
    But + after T1 does not match input token *
  Try T1 → int * T2
    This will match but + after T1 will be unmatched
  Have exhausted the choices for T1
    Backtrack to choice for E0

Transparency No. 71

Recursive Descent Parsing. Example (Cont.)

Try E0 → T1

Follow same steps as before for T1

And succeed with T1 → int * T2 and T2 → int

This results in the following parse tree (children shown in parentheses):

  E0( T1( int5 * T2( int2 ) ) )

Transparency No. 72

Recursive Descent Parsing. Notes.

Easy to implement by hand

But does not always work …

Transparency No. 73

Implementation of a Recursive Descent Parser

Transparency No. 74

A Recursive Descent Parser. Preliminaries

Let Token be the type of all token objects

public class Token {
  int type; // token type
  public final static int INT = 1,
      OPEN = 2, CLOSE = 3, PLUS = 4, TIMES = 5,
      … ; // constants for all token types.
  … // other fields omitted

  // Let the global tok point to the next token to be matched
  public static Token tok;

  // next() returns the token following ‘this’ from the lexer.
  public Token next() {…}

  // advance() and eat() return nothing, so they are declared void
  public static void advance() { tok = tok.next(); }

  public static void eat(int type) {
    if (tok.type == type) advance(); else error(); } … }

Transparency No. 75

A Recursive Descent Parser (2)

Define boolean functions that check the input token stream for a match of

A given token type (terminal)
  boolean term(int type) {
    if (tok.type == type) { advance(); return true; }
    return false; }

A given production of S (the nth rule)
  boolean Sn() { … } // do ‘and’ test inside the body

A non-terminal S:
  boolean S() { … } // do ‘or’ test inside the body

These functions eat tokens.

Transparency No. 76

A Recursive Descent Parser (3)

For production E → T + E
  boolean E1() { return T() && term(PLUS) && E(); }

For production E → T
  boolean E2() { return T(); }

For all productions of E (with backtracking)
  boolean E() {
    Token save = Token.tok;
    if (E1()) return true;
    // E1() fails => try next rule from stored save Token
    Token.tok = save;
    if (E2()) return true;
    …
    // En-1() fails => try last rule from stored save Token
    Token.tok = save;
    return En(); }

Transparency No. 77

A Recursive Descent Parser (4)

Functions for non-terminal T

  boolean T1() { return term(OPEN) && E() && term(CLOSE); }
  boolean T2() { return term(INT) && term(TIMES) && T(); }
  boolean T3() { return term(INT); }

  boolean T() {
    Token save = Token.tok;
    if (T1()) return true;
    // T1() fails => try next rule from where save occurs.
    Token.tok = save; // backtracking point
    if (T2()) return true;
    // T2() fails => try last rule from where save appears;
    Token.tok = save;
    return T3(); }
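The fragments for E and T can be assembled into one runnable program. The sketch below makes several assumptions not found in the slides: tokens are kept in a list with an integer cursor `pos` in place of the global Token.tok, the tiny lexer `lex` and the EOF token are added, and the whole thing lives in a hypothetical class `BacktrackingParser`.

```java
import java.util.ArrayList;
import java.util.List;

class BacktrackingParser {
    static final int INT = 1, OPEN = 2, CLOSE = 3, PLUS = 4, TIMES = 5, EOF = 6;
    static List<Integer> tokens;
    static int pos; // index of the next token to match (plays the role of Token.tok)

    // Hypothetical lexer: digit runs become INT, whitespace is skipped.
    static List<Integer> lex(String s) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (Character.isDigit(c)) {
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                out.add(INT);
                continue;
            }
            switch (c) {
                case '(': out.add(OPEN); break;
                case ')': out.add(CLOSE); break;
                case '+': out.add(PLUS); break;
                case '*': out.add(TIMES); break;
                default: throw new IllegalArgumentException("bad character " + c);
            }
            i++;
        }
        out.add(EOF);
        return out;
    }

    static boolean term(int type) {
        if (tokens.get(pos) == type) { pos++; return true; }
        return false;
    }

    static boolean E1() { return T() && term(PLUS) && E(); }   // E -> T + E
    static boolean E2() { return T(); }                        // E -> T
    static boolean E() {
        int save = pos;
        if (E1()) return true;
        pos = save;                                            // backtrack
        return E2();
    }

    static boolean T1() { return term(OPEN) && E() && term(CLOSE); } // T -> ( E )
    static boolean T2() { return term(INT) && term(TIMES) && T(); }  // T -> int * T
    static boolean T3() { return term(INT); }                        // T -> int
    static boolean T() {
        int save = pos;
        if (T1()) return true;
        pos = save;
        if (T2()) return true;
        pos = save;
        return T3();
    }

    static boolean parse(String input) {
        tokens = lex(input);
        pos = 0;
        return E() && term(EOF); // the whole input must be consumed
    }

    public static void main(String[] args) {
        System.out.println(parse("5 * 2"));
        System.out.println(parse("5 *"));
    }
}
```

The order of alternatives matters in this scheme: once T() has returned true it is never re-entered to try a later alternative, so T2 (int * T) has to be tried before T3 (int).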

Transparency No. 78

Recursive Descent Parsing. Notes.

To start the parser
  1. Invoke Token.init(lexer) to set tok to the first token:
     static init(Lexer lexer) {…} // inside Token
  2. Invoke E()

Notice how this simulates the previous backtracking example.

Easy to implement by hand
But does not always work and is not efficient …

Predictive parsing (without backtracking) is better

Transparency No. 79

Recursive descent parser with lookahead

Recursive descent parsers are unpopular because of inefficient backtracking.

In practice:
  1. Use lookahead symbols to predict which alternative rule to match and thus avoid some backtracking, though this still cannot avoid all backtracking.
  2. Restrict the grammar to a specific form so that backtracking is not needed.

Transparency No. 80

Example

Grammar 3.11
  S → if E then S else S
  S → begin S L
  S → print E
  L → end
  L → ; S L
  E → num = num

Normal procedure for S() with backtracking:
  boolean S() {
    Token save = Token.tok;
    if (S1()) return true; Token.tok = save;
    if (S2()) return true; Token.tok = save;
    return S3(); }

Observations:
  We can avoid unnecessary tries of S1() and S2() if we know Token.tok is a ‘print’ token.
  S1() and S2() can also be expanded in-line in S();
  a switch(Token.tok.type) {…} construct can be used to determine which Si() is to be tried.

Transparency No. 81

The Improvement (no backtracking)

void S() { switch(Token.tok.type) {
  case IF: eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
  case BEGIN: eat(BEGIN); S(); L(); break;
  case PRINT: eat(PRINT); E(); break;
  default: error(); }}

void L() { switch(Token.tok.type) {
  case END: eat(END); break;
  case SEMI: eat(SEMI); S(); L(); break;
  default: error(); }}

void E() { eat(NUM); eat(EQ); eat(NUM); }
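A self-contained version of this predictive parser for Grammar 3.11 might look as follows. The procedures S, L and E follow the slide; everything else is an assumption made for the sketch: the `Tok` enum, the whitespace-splitting lexer in the constructor, `error()` throwing an exception, and the `accepts` helper.

```java
import java.util.ArrayList;
import java.util.List;

class PredictiveParser {
    enum Tok { IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, EOF }

    private final List<Tok> toks = new ArrayList<>();
    private int pos = 0;

    // Hypothetical lexer: tokens must be separated by whitespace.
    PredictiveParser(String input) {
        for (String w : input.trim().split("\\s+")) {
            if (w.isEmpty()) continue;
            switch (w) {
                case "if": toks.add(Tok.IF); break;
                case "then": toks.add(Tok.THEN); break;
                case "else": toks.add(Tok.ELSE); break;
                case "begin": toks.add(Tok.BEGIN); break;
                case "end": toks.add(Tok.END); break;
                case "print": toks.add(Tok.PRINT); break;
                case ";": toks.add(Tok.SEMI); break;
                case "=": toks.add(Tok.EQ); break;
                default:
                    if (!w.matches("[0-9]+")) throw new IllegalStateException("bad token " + w);
                    toks.add(Tok.NUM);
            }
        }
        toks.add(Tok.EOF);
    }

    private Tok peek() { return toks.get(pos); }
    private void error() { throw new IllegalStateException("syntax error at token " + pos); }
    private void eat(Tok t) { if (peek() == t) pos++; else error(); }

    // One lookahead token selects the production; no backtracking is needed.
    void S() {
        switch (peek()) {
            case IF: eat(Tok.IF); E(); eat(Tok.THEN); S(); eat(Tok.ELSE); S(); break;
            case BEGIN: eat(Tok.BEGIN); S(); L(); break;
            case PRINT: eat(Tok.PRINT); E(); break;
            default: error();
        }
    }

    void L() {
        switch (peek()) {
            case END: eat(Tok.END); break;
            case SEMI: eat(Tok.SEMI); S(); L(); break;
            default: error();
        }
    }

    void E() { eat(Tok.NUM); eat(Tok.EQ); eat(Tok.NUM); }

    static boolean accepts(String input) {
        try {
            PredictiveParser p = new PredictiveParser(input);
            p.S();
            p.eat(Tok.EOF);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("begin print 1 = 1 ; print 2 = 3 end"));
    }
}
```

Each production of this grammar starts with a distinct terminal, which is what lets a single `switch` on the next token pick the right clause.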

Transparency No. 82

Try with another grammar

Grammar 3.10
  S → E $   // $ is EOF (end-of-file marker)
  E → E + T
  E → E - T
  E → T
  T → T * F
  T → T / F
  T → F
  F → id
  F → num
  F → ( E )

The translated program:

  void S() { E(); eat(EOF); }

  void E() { switch(Token.tok.type) {
    case ? : E(); eat(PLUS); T(); break;
    case ? : E(); eat(MINUS); T(); break;
    case ? : T(); break; }}

Problem: there are no apparent terminals that E() can use to decide which clause to take. In fact, all clauses are possible. Ex: (2) + 3, (2) – 3, 2 * 3.

Transparency No. 83

void T() { switch(tok.type) {
  case ? : T(); eat(TIMES); F(); break;
  case ? : T(); eat(DIV); F(); break;
  case ? : F(); break; }}

void F() { switch(tok.type) {
  case ID: eat(ID); break;
  case NUM: eat(NUM); break;
  case LPAREN: eat(LPAREN); E(); eat(RPAREN); break; }}

When will a predictive parser not work?
  There are multiple production rules for a nonterminal with the same first terminal symbol.
  Sometimes this can be resolved by factoring out common left parts.

Transparency No. 84

When Will Recursive Descent Not Work

Consider a production S → S a:
  In the process of parsing S we try the above rule:
    boolean S() { return S() && term(a); }
  or
    void S() { S(); eat(a); }
  What goes wrong? S() calls itself before consuming any token, so the recursion never terminates.

A left-recursive grammar has a non-terminal S such that S ⇒+ S α for some α

Recursive descent does not work in such cases

Transparency No. 85

Elimination of Left Recursion

Consider the left-recursive grammar
  S → S α | β

S generates all strings starting with a β and followed by a number of α, i.e., β α*

Can rewrite using right-recursion
  S → β S’
  S’ → α S’ | ε

Transparency No. 86

More Elimination of Left-Recursion

In general

  S → S α1 | … | S αn | β1 | … | βm

All strings derived from S start with one of β1, …, βm and continue with several instances of α1, …, αn,
i.e., (β1 | … | βm) (α1 | … | αn)*

Rewrite as
  S → β1 S’ | … | βm S’
  S’ → α1 S’ | … | αn S’ | ε
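The rewrite for immediate left recursion is mechanical enough to code directly. This is an illustrative sketch only: productions are represented as lists of symbol strings, an empty list stands for ε, and the names `LeftRecursion` and `eliminate` are hypothetical. It handles immediate left recursion of a single non-terminal, not the general (indirect) case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeftRecursion {
    // Rewrites  S -> S a1 | ... | S an | b1 | ... | bm
    // into      S -> b1 S' | ... | bm S'   and   S' -> a1 S' | ... | an S' | epsilon.
    static Map<String, List<List<String>>> eliminate(String nt, List<List<String>> alts) {
        List<List<String>> alphas = new ArrayList<>(); // tails of left-recursive alternatives
        List<List<String>> betas = new ArrayList<>();  // all other alternatives
        for (List<String> alt : alts) {
            if (!alt.isEmpty() && alt.get(0).equals(nt)) {
                alphas.add(alt.subList(1, alt.size()));
            } else {
                betas.add(alt);
            }
        }
        Map<String, List<List<String>>> out = new LinkedHashMap<>();
        if (alphas.isEmpty()) {            // nothing to do
            out.put(nt, alts);
            return out;
        }
        String fresh = nt + "'";           // the new non-terminal S'
        List<List<String>> head = new ArrayList<>();
        for (List<String> b : betas) {
            List<String> rhs = new ArrayList<>(b);
            rhs.add(fresh);
            head.add(rhs);
        }
        List<List<String>> tail = new ArrayList<>();
        for (List<String> a : alphas) {
            List<String> rhs = new ArrayList<>(a);
            rhs.add(fresh);
            tail.add(rhs);
        }
        tail.add(new ArrayList<>());       // the epsilon alternative
        out.put(nt, head);
        out.put(fresh, tail);
        return out;
    }

    public static void main(String[] args) {
        // E -> E + T | E - T | T
        System.out.println(eliminate("E", Arrays.asList(
                Arrays.asList("E", "+", "T"),
                Arrays.asList("E", "-", "T"),
                Arrays.asList("T"))));
    }
}
```

Applied to E → E + T | E - T | T this produces E → T E’ and E’ → + T E’ | - T E’ | ε, with the empty list printed as [].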

Transparency No. 87

General Left Recursion

The grammar
  S → A α | δ
  A → S β
is also left-recursive because
  S ⇒+ S β α

This left-recursion can also be eliminated
See [Dragon book, Section 4.3] for the general algorithm.

Transparency No. 88

Summary of Recursive Descent

Simple and general parsing strategy
  Left-recursion must be eliminated first
  … but that can be done automatically

Unpopular because of backtracking
  Thought to be too inefficient

In practice, backtracking is eliminated by restricting the grammar


Transparency No. 89

Predictive Parsers

Table-based: encode grammar rules in a parsing table instead of program code.

Like recursive descent, but the parser can “predict” which production to use
  By looking at the next few tokens
  No backtracking

Predictive parsers accept LL(k) grammars
  1st L means “left-to-right” scan of input
  2nd L means “leftmost derivation”
  k means “predict based on at most k tokens of lookahead”

In practice, LL(1) is used.


Transparency No. 90

LL(1) Languages

In recursive descent, for each non-terminal and input token there may be a choice of production

LL(1) means that for each non-terminal and token there is at most one production

Can be specified as a 2D table
  One dimension for the current non-terminal to expand
  One dimension for the next token
  A table entry contains zero or one production

Ex:
  S → if E then S else S --- (1)
  S → begin S L --- (2)
  S → print E --- (3)
Then table[S][if] = (1); table[S][begin] = (2), etc.
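One way to hold such a table in code is a map of maps keyed by non-terminal and then by token. This is only a sketch for the three S productions in the example; the class name, the use of production numbers as entries, and the -1 error value are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

final class Ll1TableDemo {
    // TABLE.get(nonterminal).get(token) is a production number; a missing key is a blank (error) entry.
    static final Map<String, Map<String, Integer>> TABLE = new HashMap<>();

    static {
        Map<String, Integer> rowS = new HashMap<>();
        rowS.put("if", 1);    // S → if E then S else S
        rowS.put("begin", 2); // S → begin S L
        rowS.put("print", 3); // S → print E
        TABLE.put("S", rowS);
    }

    /** Returns the production number to use, or -1 when the entry is blank. */
    static int predict(String nonterminal, String nextToken) {
        Map<String, Integer> row = TABLE.get(nonterminal);
        if (row == null) return -1;
        return row.getOrDefault(nextToken, -1);
    }
}
```

Because every entry holds at most one production, predict never has to choose between alternatives, which is exactly the LL(1) condition.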


Transparency No. 91

Predictive Parsing and Left Factoring

Recall the grammar

E → T + E | T
T → int | int * T | ( E )

Hard to predict because
  For T, two productions start with int
  For E, it is not clear how to predict

A grammar must be left-factored before use for predictive parsing


Transparency No. 92

Left-Factoring Example

Recall the grammar

E → T + E | T
T → int | int * T | ( E )

• Factor out common prefixes of productions

E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε
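After left-factoring, one token of lookahead is enough to pick a production in each procedure. The sketch below is a hand-written predictive recognizer for E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε. It assumes tokens arrive as a list of type names and that the end marker "$" is implicit; the class name and error handling are illustrative choices.

```java
import java.util.List;

final class PredictiveParser {
    private final List<String> tokens; // token types such as "int", "+", "*", "(", ")"
    private int pos = 0;

    private PredictiveParser(List<String> tokens) { this.tokens = tokens; }

    /** Returns true if the token sequence is derivable from E. */
    static boolean accepts(List<String> tokens) {
        PredictiveParser p = new PredictiveParser(tokens);
        try {
            p.parseE();
            return p.peek().equals("$"); // all input must be consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Next token type, or "$" once the input is exhausted.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "$"; }

    private void eat(String expected) {
        if (!peek().equals(expected))
            throw new IllegalStateException("expected " + expected + " but saw " + peek());
        pos++;
    }

    // E → T X
    private void parseE() { parseT(); parseX(); }

    // X → + E | ε
    private void parseX() {
        if (peek().equals("+")) { eat("+"); parseE(); }
        // otherwise take ε and consume nothing; a bad token is reported by the caller
    }

    // T → ( E ) | int Y
    private void parseT() {
        if (peek().equals("(")) { eat("("); parseE(); eat(")"); }
        else { eat("int"); parseY(); }
    }

    // Y → * T | ε
    private void parseY() {
        if (peek().equals("*")) { eat("*"); parseT(); }
    }
}
```

No procedure ever backtracks: each one inspects peek() once and commits to a production.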


Transparency No. 93

LL(1) Parsing Table Example

Left-factored grammar

E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε

The LL(1) parsing table:

    int      *        +        (        )        $
E   T X                        T X
X                     + E               ε        ε
T   int Y                      ( E )
Y            * T      ε                 ε        ε


Transparency No. 94

LL(1) Parsing Table Example (Cont.)

Consider the [E, int] entry
  “When current non-terminal is E and next input is int, use production E → T X”
  This production can generate an int in the first place: T X →* int …

Consider the [Y,+] entry
  “When current non-terminal is Y and current token is +, get rid of Y”
  Y can be followed by + only in a derivation in which Y → ε: … int Y + … →* … int + …


Transparency No. 95

LL(1) Parsing Tables. Errors

Blank entries indicate error situations

Consider the [E,*] entry
  “There is no way to derive a string starting with * from non-terminal E”


Transparency No. 96

Using Parsing Tables

Method similar to recursive descent, except
  For each non-terminal S
  We look at the next token a
  And choose the production shown at [S,a]

We use a stack to keep track of pending non-terminals

We reject when we encounter an error state

We accept when the stack is empty


Transparency No. 97

LL(1) Parsing Algorithm

initialize: stack = <S $> and tok (pointer to next token)

repeat
  case stack of
    <X, rest> : if (T[X, tok.type] == Y1…Yn)
                then stack ← <Y1 … Yn rest>;
                else error();
    <t, rest> : if (t == tok.type)
                then { stack ← <rest>; advance(); }
                else error();
until stack == < >
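The loop can be sketched in Java as follows, with the table for the left-factored expression grammar (E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε) written out by hand. The map-based table, the string token types, and all names are assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TableDrivenLl1 {
    // TABLE.get(X).get(tok) is the right-hand side chosen for non-terminal X on token tok.
    private static final Map<String, Map<String, List<String>>> TABLE = new HashMap<>();

    // Adds one table entry; an empty rhs stands for ε.
    private static void entry(String nonterminal, String token, String... rhs) {
        TABLE.computeIfAbsent(nonterminal, k -> new HashMap<>()).put(token, List.of(rhs));
    }

    static {
        entry("E", "int", "T", "X");   entry("E", "(", "T", "X");
        entry("X", "+", "+", "E");     entry("X", ")");  entry("X", "$");
        entry("T", "int", "int", "Y"); entry("T", "(", "(", "E", ")");
        entry("Y", "*", "*", "T");     entry("Y", "+");  entry("Y", ")");  entry("Y", "$");
    }

    /** tokens must not include the end marker; "$" is supplied once the input runs out. */
    static boolean accepts(List<String> tokens) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("$");
        stack.push("E"); // stack = <E $>
        int pos = 0;
        while (!stack.isEmpty()) {
            String top = stack.pop();
            String tok = pos < tokens.size() ? tokens.get(pos) : "$";
            if (TABLE.containsKey(top)) {
                List<String> rhs = TABLE.get(top).get(tok);
                if (rhs == null) return false;            // blank entry: error
                for (int i = rhs.size() - 1; i >= 0; i--) // push in reverse so Y1 ends up on top
                    stack.push(rhs.get(i));
            } else {
                if (!top.equals(tok)) return false;       // terminal mismatch: error
                pos++;                                    // advance()
            }
        }
        return pos == tokens.size() + 1; // every token and the final $ were matched
    }
}
```

Calling accepts on the tokens int, *, int goes through the same stack configurations as the trace on the next slide.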


Transparency No. 98

LL(1) Parsing Example

Stack         Input          Action
E $           int * int $    T X
T X $         int * int $    int Y
int Y X $     int * int $    terminal
Y X $         * int $        * T
* T X $       * int $        terminal
T X $         int $          int Y
int Y X $     int $          terminal
Y X $         $              ε
X $           $              ε
$             $              ACCEPT


Transparency No. 99

Constructing Parsing Tables

LL(1) languages are those with an LL(1) parsing table.
  No table entry can contain more than one production

How to generate the parsing table from a CFG?


Transparency No. 100

Constructing Parsing Tables (Cont.)

If A → α, where in the row of A do we place α? i.e., table[A][??] = α

1. In the column of b, where b can start a string derived from α: α →* b β. We say that b ∈ First(α)

2. In the column of b, if α can reduce to ε and b can follow an A: S →* β A b δ. We say b ∈ Follow(A)


Transparency No. 101

Computing Nullable Symbols

Definition:

A nonterminal X is nullable if it can derive the empty string:

X →* ε.

Given a grammar G, the set Nullable(G) of nullable symbols can be defined recursively as follows:

1. Basis: X is nullable if X → ε is a rule of G.

2. Recursion: if Y1,Y2,…,Yk (k > 0) are nullable and X → Y1 Y2 … Yk is a rule, then X is nullable.


Transparency No. 102

Example

Let G :

S → ACA
A → aAa | B | C
B → bB | b
C → cC | ε

By (1) C is nullable --(3)

By (2)(3) A is nullable --(4)

By (2,3,4) S is nullable --(5).

By (1,2,3,4,5) no further nullable symbol can be found.

Hence Nullable(G) = {A,C,S}.


Transparency No. 103

For each nonterminal X define a number order(X) as follows:

order(X) = 0 if X → ε is a rule of G.

If X → Y1…Yk is a rule and Max{order(Yi) | i = 1,2,…,k} = t, then order(X) = t + 1, provided there is no other rule X → Z1…Zm with Max{order(Zi) | i = 1,2,…,m} < t.

Ex: in the previous example: order(C) = 0; order(A) = 1; order(S) = 2.

Theorem:

1. X is nullable iff order(X) is defined.

2. order(X) <= #nonterminals


Transparency No. 104

Algorithm for computing Nullable(G)

input: a CFG G; output: NG, the set of all nullable symbols.

NG = {};

1. for all rules r: if r = X → ε, then NG = NG ∪ {X};

2. repeat:
     NG’ = NG;
     for each rule r:
       if r = X → Y1 Y2 … Yk and {Y1,…,Yk} ⊆ NG’, then
         NG = NG ∪ {X}.
   until NG = NG’ // no change of NG in the iteration.

Correctness:

1. Step 1 computes all nullables of order 0;

2. The kth iteration of step 2 computes nullables of order k.

3. Since all nullables are of finite order, this program must terminate.
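The fixed-point loop translates almost line for line into Java. In this sketch a rule is assumed to be a string array whose first element is the left-hand side, and an array of length 1 stands for X → ε; that encoding and the class name are illustrative only. The example grammar is G from the earlier slide (S → ACA, A → aAa | B | C, B → bB | b, C → cC | ε).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class NullableDemo {
    /** rules: each String[] is {lhs, rhs1, …, rhsk}; length 1 means lhs → ε. */
    static Set<String> nullable(List<String[]> rules) {
        Set<String> ng = new HashSet<>();
        // Step 1: every X with a rule X → ε
        for (String[] r : rules)
            if (r.length == 1) ng.add(r[0]);
        // Step 2: repeat until an iteration adds nothing
        boolean changed = true;
        while (changed) {
            Set<String> snapshot = new HashSet<>(ng); // NG' = NG
            for (String[] r : rules) {
                List<String> rhs = Arrays.asList(r).subList(1, r.length);
                if (!rhs.isEmpty() && snapshot.containsAll(rhs)) ng.add(r[0]);
            }
            changed = !ng.equals(snapshot);
        }
        return ng;
    }

    /** The example grammar G from the slides. */
    static List<String[]> exampleGrammar() {
        return List.of(
            new String[] {"S", "A", "C", "A"},
            new String[] {"A", "a", "A", "a"}, new String[] {"A", "B"}, new String[] {"A", "C"},
            new String[] {"B", "b", "B"}, new String[] {"B", "b"},
            new String[] {"C", "c", "C"}, new String[] {"C"});
    }
}
```

Testing against the snapshot rather than the growing set is what makes the kth pass add exactly the symbols of order k.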


Transparency No. 105

Computing First Sets

Definition: First(X) = { b | X →* b α }

First(X) can be defined recursively:

1. Basis: First(b) = {b};

2. Recursion: if X → A1A2…An Y β is a rule and all A1…An are nullable, then First(Y) ⊆ First(X).

We can extend First(-) to First(α) = { b | α →* b β }, where α is a sequence of symbols. Then

First(Y1…Yk) = ∪ { First(Yj) | Y1…Yj-1 are nullable }

Notes: A,B,C are nonterminals; a,b,c are terminals; X,Y,Z,… are terminals or nonterminals, and α, β, γ are strings of terminals or nonterminals.


Transparency No. 106

First Sets. Example

Recall the grammar

E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε

Nullable set: {X,Y}.

First sets

First( ( ) = { ( } First( T ) = {int, ( }

First( ) ) = { ) } First( E ) = First(T) = {int, ( }

First( int) = { int } First( X ) = {+ }

First( + ) = { + } First( Y ) = {* }

First( * ) = { * }


Transparency No. 107

Algorithm for computing First(X)

input: a CFG G; output: First(X) for all symbols X.

1. for each terminal b: First(b) = {b}; for each nonterminal X: First(X) = {};

2. repeat:
     for each rule X → Y1Y2…Yk (k > 0)
       2.1 First(X) = First(X) ∪ First(Y1);
       2.2 Let t be the index of Yt, the first non-nullable symbol on the RHS, or k+1 if all symbols are nullable.
       2.3 for j = 1 to min(t, k)
             First(X) = First(X) ∪ First(Yj);
   until no First(-) changes in this iteration.
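A compact Java sketch of this iteration for the left-factored expression grammar (E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε) is given below. It takes the nullable set {X, Y} from the slides as given and walks each right-hand side up to and including its first non-nullable symbol. The rule encoding and names are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FirstSets {
    // Each String[] is {lhs, rhs…}; length 1 means lhs → ε.
    static final List<String[]> RULES = List.of(
        new String[] {"E", "T", "X"},
        new String[] {"X", "+", "E"}, new String[] {"X"},
        new String[] {"T", "(", "E", ")"}, new String[] {"T", "int", "Y"},
        new String[] {"Y", "*", "T"}, new String[] {"Y"});
    static final Set<String> NONTERMINALS = Set.of("E", "X", "T", "Y");
    static final Set<String> NULLABLE = Set.of("X", "Y"); // taken from the slides

    static Map<String, Set<String>> first() {
        Map<String, Set<String>> first = new HashMap<>();
        for (String[] r : RULES)
            for (String sym : r)
                first.computeIfAbsent(sym, k -> new HashSet<>());
        // Step 1: First(b) = {b} for each terminal b
        for (String sym : first.keySet())
            if (!NONTERMINALS.contains(sym)) first.get(sym).add(sym);
        // Step 2: iterate until no First set changes
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] r : RULES) {
                for (int j = 1; j < r.length; j++) {
                    // add First(Yj), then stop after the first non-nullable symbol
                    if (first.get(r[0]).addAll(first.get(r[j]))) changed = true;
                    if (!NULLABLE.contains(r[j])) break;
                }
            }
        }
        return first;
    }
}
```

The returned map can be compared with the First sets listed on the example slide, for instance first().get("E").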


Transparency No. 108

Computing Follow Sets

Definition: Follow(X) = { b | S →* β X b δ }

Intuition
  If X → A B then First(B) ⊆ Follow(A) and Follow(X) ⊆ Follow(B)
  Also if B →* ε then Follow(X) ⊆ Follow(A)
  If S is the start symbol then $ ∈ Follow(S)

Recursive definition:
  Basis:
    1. $ ∈ Follow(S)
    2. if X → … Y β then First(β) ⊆ Follow(Y)
  Recursion:
    3. if X → … A β and β is nullable, then Follow(X) ⊆ Follow(A)

Note: 1,2 are used to compute following siblings while 3 is used to compute following cousins.


Transparency No. 109

Computing Follow Sets (Cont.)

Algorithm:

0. Follow(S) = {$}, Follow(X) = {} for X ≠ S;

1. For each rule of the form X → Y1Y2…Yk (k > 0)
     for i = 1 .. k-1
       for j = i+1 .. k
         Follow(Yi) = Follow(Yi) ∪ First(Yj);
         if Yj is not nullable then break;

2. Repeat: For each production A → X1 X2 … Xn
     for k = n .. 1
       Follow(Xk) = Follow(Xk) ∪ Follow(A);
       if (Xk is not nullable) break;
   until Follow does not change in this iteration.
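The two steps can be sketched in Java for the left-factored expression grammar (E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε). The nullable set and the First sets of the non-terminals are copied from the earlier slides rather than recomputed, and the inner sibling loop runs through the last symbol Yk. Class and field names are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FollowSets {
    // Each String[] is {lhs, rhs…}; length 1 means lhs → ε.
    static final List<String[]> RULES = List.of(
        new String[] {"E", "T", "X"},
        new String[] {"X", "+", "E"}, new String[] {"X"},
        new String[] {"T", "(", "E", ")"}, new String[] {"T", "int", "Y"},
        new String[] {"Y", "*", "T"}, new String[] {"Y"});
    static final Set<String> NULLABLE = Set.of("X", "Y");
    // First sets of the non-terminals, copied from the slides.
    static final Map<String, Set<String>> FIRST_NT = Map.of(
        "E", Set.of("int", "("), "T", Set.of("int", "("),
        "X", Set.of("+"), "Y", Set.of("*"));

    // First of a single symbol; a terminal's First set is the terminal itself.
    static Set<String> first(String sym) {
        return FIRST_NT.getOrDefault(sym, Set.of(sym));
    }

    static Map<String, Set<String>> follow() {
        Map<String, Set<String>> follow = new HashMap<>();
        for (String[] r : RULES)
            for (String sym : r)
                follow.computeIfAbsent(sym, k -> new HashSet<>());
        follow.get("E").add("$"); // step 0: E is the start symbol
        // Step 1 (siblings): for X → Y1…Yk add First(Yj) to Follow(Yi) for j > i
        for (String[] r : RULES)
            for (int i = 1; i < r.length; i++)
                for (int j = i + 1; j < r.length; j++) {
                    follow.get(r[i]).addAll(first(r[j]));
                    if (!NULLABLE.contains(r[j])) break;
                }
        // Step 2 (cousins): push Follow(A) into the nullable tail of each right-hand side
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] r : RULES)
                for (int k = r.length - 1; k >= 1; k--) {
                    if (follow.get(r[k]).addAll(follow.get(r[0]))) changed = true;
                    if (!NULLABLE.contains(r[k])) break;
                }
        }
        return follow;
    }
}
```

For T → ( E ), the sibling loop must reach the final ")" so that ")" lands in Follow(E); stopping one symbol early would miss it.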


Transparency No. 110

Follow Sets. Example

Recall the grammar

E → T X
X → + E | ε
T → ( E ) | int Y
Y → * T | ε

Follow sets

Follow( + ) = { int, ( } Follow( * ) = { int, ( }

Follow( ( ) = { int, ( } Follow( ) ) = {+, ) , $}

Follow( int) = {*, +, ) , $}

Follow( X ) = {$, ) } Follow( T ) = {+, ) , $}

Follow( E ) = {), $} Follow( Y ) = {+, ) , $}


Transparency No. 111

Constructing LL(1) Parsing Tables

Construct a parsing table T for CFG G

For each production A → α in G do:
  For each terminal b ∈ First(α) do
    T[A, b] = α
  If nullable(α), for each b ∈ Follow(A) do
    T[A, b] = α
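A sketch of this construction for the left-factored expression grammar (E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε) follows. To stay short it hard-codes the nullable, First, and Follow sets given on the earlier slides, stores the rule index (0 to 6, in the order listed in RULES) in each entry, and throws if two rules compete for one entry. These representation choices and names are assumptions, not part of the lecture.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class Ll1TableBuilder {
    // Each String[] is {lhs, rhs…}; length 1 means lhs → ε.
    static final List<String[]> RULES = List.of(
        new String[] {"E", "T", "X"},                                        // 0
        new String[] {"X", "+", "E"}, new String[] {"X"},                    // 1, 2
        new String[] {"T", "(", "E", ")"}, new String[] {"T", "int", "Y"},   // 3, 4
        new String[] {"Y", "*", "T"}, new String[] {"Y"});                   // 5, 6
    static final Set<String> NULLABLE = Set.of("X", "Y");
    static final Map<String, Set<String>> FIRST_NT = Map.of(
        "E", Set.of("int", "("), "T", Set.of("int", "("),
        "X", Set.of("+"), "Y", Set.of("*"));
    static final Map<String, Set<String>> FOLLOW = Map.of(
        "E", Set.of(")", "$"), "X", Set.of(")", "$"),
        "T", Set.of("+", ")", "$"), "Y", Set.of("+", ")", "$"));

    /** First(α): union of First(Yj) for as long as Y1…Yj-1 are nullable. */
    static Set<String> firstOfRhs(String[] r) {
        Set<String> out = new HashSet<>();
        for (int j = 1; j < r.length; j++) {
            out.addAll(FIRST_NT.getOrDefault(r[j], Set.of(r[j])));
            if (!NULLABLE.contains(r[j])) break;
        }
        return out;
    }

    /** True when every symbol of the right-hand side is nullable (including the empty rhs). */
    static boolean rhsNullable(String[] r) {
        for (int j = 1; j < r.length; j++)
            if (!NULLABLE.contains(r[j])) return false;
        return true;
    }

    /** Returns table[A][b] = rule index; throws if an entry would hold two rules. */
    static Map<String, Map<String, Integer>> build() {
        Map<String, Map<String, Integer>> table = new HashMap<>();
        for (int n = 0; n < RULES.size(); n++) {
            String[] r = RULES.get(n);
            Set<String> columns = firstOfRhs(r);
            if (rhsNullable(r)) columns.addAll(FOLLOW.get(r[0]));
            for (String b : columns) {
                Integer old = table.computeIfAbsent(r[0], k -> new HashMap<>()).put(b, n);
                if (old != null)
                    throw new IllegalStateException(
                        "conflict at [" + r[0] + "," + b + "]: grammar is not LL(1)");
            }
        }
        return table;
    }
}
```

For this grammar build() completes without a conflict, and a lookup such as build().get("Y").get("+") gives the index of Y → ε.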


Transparency No. 112

Notes on LL(1) Parsing Tables

If any entry contains multiple rules then G is not LL(1)
  If G is ambiguous
  If G is left recursive
  If G is not left-factored
  And in other cases as well

Most programming language grammars are not LL(1)

There are tools that build LL(1) tables


Transparency No. 113

Summary

For some grammars there is a simple parsing strategy: predictive parsing

Next time: a more powerful parsing strategy

