CSC 415: Translators and Compilers Spring 2009

Date post: 21-Jan-2016
Upload: adora
CSC 415: Translators and Compilers Spring 2009

Chapter 4

Syntactic Analysis

Chart 2

Syntactic Analysis

– Sub-phases of Syntactic Analysis
– Grammars Revisited
– Parsing
– Abstract Syntax Trees
– Scanning
– Case Study: Syntactic Analysis in the Triangle Compiler

Chart 3

Structure of a Compiler

Source code
→ Lexical Analyzer
→ tokens
→ Parser & Semantic Analyzer
→ parse tree
→ Intermediate Code Generation
→ intermediate representation
→ Optimization
→ intermediate representation
→ Assembly Code Generation
→ Assembly code

Symbol Table (used by the phases)

Chart 4

Syntactic Analysis

Main function
– Parse the source program to discover its phrase structure
– Recursive-descent parsing
– Constructing an AST
– Scanning to group characters into tokens

Chart 5

Sub-phases of Syntactic Analysis

Scanning (or lexical analysis)
– Source program transformed to a stream of tokens
  Identifiers, literals, operators, keywords, punctuation
– Comments and blank spaces discarded

Parsing
– To determine the source program's phrase structure
– Source program is input as a stream of tokens (from the scanner)
– Treats each token as a terminal symbol

Representation of phrase structure
– AST

Chart 6

Lexical Analysis – A Simple Example

Scan the file character by character, group the characters into words and punctuation (tokens), and remove white space and comments.

Source:

let var y: Integer
in !new year
   y := y+1

Tokens for this example: let, var, y, :, Integer, in, y, :=, y, +, 1

Note: !new year does not appear in the list of tokens. Comments are removed along with white space.
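As an illustration, the scanning step above can be sketched in Java, the language of the Triangle compiler. This is a minimal sketch under stated assumptions, not Triangle's own scanner. A comment is assumed to run from `!` to the end of the line, keywords are not yet distinguished from identifiers, no eot token is produced, and each token is represented only by its spelling. The class name `MiniScanner` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal scanner for the Mini-Triangle example (not Triangle's own code).
class MiniScanner {
    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // discard blanks, tabs and newlines
            } else if (c == '!') {
                while (i < src.length() && src.charAt(i) != '\n') {
                    i++;                               // discard the comment up to end of line
                }
            } else if (Character.isLetter(c)) {
                int start = i;                         // identifier or keyword
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (Character.isDigit(c)) {
                int start = i;                         // integer literal
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if (c == ':' && i + 1 < src.length() && src.charAt(i + 1) == '=') {
                tokens.add(":=");                      // two-character token "becomes"
                i += 2;
            } else {
                tokens.add(String.valueOf(c));         // single-character token such as ':' or '+'
                i++;
            }
        }
        return tokens;
    }
}
```

For the source above, `MiniScanner.scan("let var y: Integer\nin !new year\n   y := y+1")` returns let, var, y, :, Integer, in, y, :=, y, +, 1. The comment and all white space have been dropped.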

Chart 7

Creating Tokens – Mini-Triangle Example

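Source:

let var y: Integer
in !new year
   y := y+1

Input converter → buffer → scanner. The buffer holds the source as a character string (␣ = space):

l e t ␣ v a r ␣ y : ␣ I n t e g e r ␣ i n …

The scanner groups the characters into tokens (spelling, kind):

let       let
var       var
y         Ident.
:         colon
Integer   Ident.
in        in
y         Ident.
:=        becomes
y         Ident.
+         op.
1         Intlit.
          eot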

Chart 8

Tokens in Triangle

// literals, identifiers, operators...
INTLITERAL = 0, "<int>", CHARLITERAL = 1, "<char>", IDENTIFIER = 2, "<identifier>", OPERATOR = 3, "<operator>",

// reserved words - must be in alphabetical order...
ARRAY = 4, "array", BEGIN = 5, "begin", CONST = 6, "const", DO = 7, "do", ELSE = 8, "else", END = 9, "end", FUNC = 10, "func", IF = 11, "if", IN = 12, "in", LET = 13, "let", OF = 14, "of", PROC = 15, "proc", RECORD = 16, "record", THEN = 17, "then", TYPE = 18, "type", VAR = 19, "var", WHILE = 20, "while",

// punctuation...
DOT = 21, ".", COLON = 22, ":", SEMICOLON = 23, ";", COMMA = 24, ",", BECOMES = 25, ":=", IS = 26, "~",

// brackets...
LPAREN = 27, "(", RPAREN = 28, ")", LBRACKET = 29, "[", RBRACKET = 30, "]", LCURLY = 31, "{", RCURLY = 32, "}",

// special tokens...
EOT = 33, "", ERROR = 34, "<error>"

Chart 9

Grammars Revisited

Context-free grammars
– Generate a set of sentences
– Each sentence is a string of terminal symbols
– An unambiguous sentence has a unique phrase structure, embodied in its syntax tree

Develop parsers from context-free grammars

Chart 10

Regular Expressions

A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols.

Main features
– '|' separates alternatives
– '*' indicates that the previous item may be repeated zero or more times
– '(' and ')' are grouping parentheses
– ε denotes the empty string, a special string of length 0

Chart 11

Regular Expression Basics

Algebraic properties
– | is commutative and associative: r|s = s|r, r|(s|t) = (r|s)|t
– Concatenation is associative: (rs)t = r(st)
– Concatenation distributes over |: r(s|t) = rs|rt, (s|t)r = sr|tr
– ε is the identity for concatenation: εr = r, rε = r
– * is idempotent: r** = r*; also r* = (r|ε)*

Chart 12

Regular Expression Basics

Common extensions
– r+: one or more occurrences of r, same as rr*
– r^k: k repetitions of r, e.g., r^3 = rrr
– ~r: the characters not in the expression r, e.g., ~[\t\n]
– r-z: a range of characters, e.g., [0-9a-z]
– r?: zero or one occurrence of r (used for optional fields of an expression)

Chart 13

Regular Expression Example

Regular expression for representing months
– Examples of legal inputs
  January represented as 1 or 01; October represented as 10
– First try: (0|1|ε)[0-9]
  0, 1, or ε, followed by a digit between 0 and 9
  Matches all legal inputs? Yes: 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
  Matches any illegal inputs? Yes: 0, 00, 18

Chart 14

Regular Expression Example

Regular expression for representing months
– Examples of legal inputs
  January represented as 1 or 01; October represented as 10
– Second try: [1-9]|(0[1-9])|(1[0-2])
  Any digit between 1 and 9, or 0 followed by a digit between 1 and 9, or 1 followed by a digit between 0 and 2
  Matches all legal inputs? Yes: 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
  Matches any illegal inputs? No
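The two attempts can be compared mechanically with `java.util.regex`. In this sketch the first try's optional prefix (0|1|ε) is written `(0|1)?`, because Java regex syntax has no ε symbol. `matches()` succeeds only when the whole string matches. The class `MonthRegex` and its method names are hypothetical.

```java
import java.util.regex.Pattern;

// The two month expressions from the slides, in java.util.regex syntax.
// (0|1)? stands for the slide's (0|1|ε).
class MonthRegex {
    static final Pattern FIRST_TRY = Pattern.compile("(0|1)?[0-9]");
    static final Pattern SECOND_TRY = Pattern.compile("[1-9]|0[1-9]|1[0-2]");

    // matches() requires the WHOLE input string to be matched
    static boolean firstTry(String s) { return FIRST_TRY.matcher(s).matches(); }
    static boolean secondTry(String s) { return SECOND_TRY.matcher(s).matches(); }
}
```

`firstTry` accepts the illegal inputs 0, 00, and 18. `secondTry` rejects them but still accepts 1 to 9, 01 to 09, and 10 to 12.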

Chart 15

Regular Expression Example

Regular expression for floating-point numbers
– Examples of legal inputs
  1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6, -2.5e+5
  Assume that a 0 is required before numbers less than 1, and that extra leading zeros are not prevented, so numbers such as 0011 or 0003.14159 are legal
– Building the regular expression
  Assume digit → 0|1|2|3|4|5|6|7|8|9
  Handle simple decimals such as 1.0, 0.2, 3.14159:
  digit+.digit+ (1 or more digits, followed by '.', followed by 1 or more digits)
  Add an optional sign (only minus, no plus):
  (-|ε)digit+.digit+ or -?digit+.digit+

Chart 16

Regular Expression Example

Regular expression for floating-point numbers (cont.)
– Building the regular expression (cont.)
  Format for the exponent: (E|e)(+|-)?(digit+)
  Adding it as an optional expression to the decimal part:
  (-|ε)digit+.digit+((E|e)(+|-)?(digit+))?
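The final expression can be transcribed into `java.util.regex` syntax and tried on the example inputs. Two changes are needed in this sketch. The (-|ε) alternative becomes `-?`, and the `.` must be escaped, because an unescaped `.` matches any character in a Java regex. The class `FloatRegex` is hypothetical.

```java
import java.util.regex.Pattern;

// (-|ε)digit+.digit+((E|e)(+|-)?(digit+))? written in java.util.regex syntax.
class FloatRegex {
    static final Pattern FLOAT =
        Pattern.compile("-?[0-9]+\\.[0-9]+((E|e)(\\+|-)?[0-9]+)?");

    // true only if the whole string is a floating-point number in the slide's sense
    static boolean isFloat(String s) { return FLOAT.matcher(s).matches(); }
}
```

All of the legal examples match. Inputs such as `.5`, `+1.0`, and `3.25e` are rejected. The last one is the "bad format for a constant" error mentioned later under scanner error handling.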

Chart 17

Extended BNF

Extended BNF (EBNF)
– Combination of BNF and REs
– N ::= X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols
– EBNF
  The right-hand side may use |, *, (, and )
  The right-hand side may contain both terminal and nonterminal symbols

Chart 18

Example EBNF

Expression ::= primary-Expression (Operator primary-Expression)*

primary-Expression ::= Identifier| ( Expression )

Identifier ::= a|b|c|d|e

Operator ::= +|-|*|/

Generates
e
a + b
a - b - c
a + (b * c)
a + (b + c) / d
a - (b - (c - (d - e)))

Chart 19

Grammar Transformations

Left factorization: XY | XZ is equivalent to X(Y | Z)

single-Command ::= V-name := Expression
                 | if Expression then single-Command
                 | if Expression then single-Command else single-Command

After left factorization:

single-Command ::= V-name := Expression
                 | if Expression then single-Command (ε | else single-Command)

Chart 20

Grammar Transformations

Elimination of left recursion: N ::= X | NY is equivalent to N ::= X(Y)*

Identifier ::= Letter
             | Identifier Letter
             | Identifier Digit

Left-factorized:

Identifier ::= Letter
             | Identifier (Letter | Digit)

Left recursion eliminated:

Identifier ::= Letter (Letter | Digit)*
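The benefit of this transformation appears when the rule is turned into code. A method written for the left-recursive form would call itself before consuming any input, so it would never terminate. The final form maps directly onto a single check followed by a loop. This is a minimal sketch: the method name `scanIdentifier` is hypothetical, and Java's `Character.isLetter`/`isLetterOrDigit` stand in for Letter and Digit (unlike the grammar, they also accept non-ASCII letters and digits).

```java
// Identifier ::= Letter (Letter | Digit)*  transcribed as "check once, then loop".
class IdentifierRecognizer {
    // Returns the length of the identifier at the start of s, or 0 if there is none.
    static int scanIdentifier(String s) {
        if (s.isEmpty() || !Character.isLetter(s.charAt(0))) {
            return 0;                                  // must start with a Letter
        }
        int i = 1;
        while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) {
            i++;                                       // (Letter | Digit)*
        }
        return i;
    }
}
```

For example, `scanIdentifier("x1y2 := 3")` returns 4, and `scanIdentifier("9lives")` returns 0.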

Chart 21

Grammar Transformations

Substitution of nonterminal symbols: given N ::= X, we can substitute X for each occurrence of N iff N ::= X is nonrecursive and is the only production rule for N

single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command
                 | …

Control-Variable ::= Identifier

To-or-Downto ::= to
               | downto

After substitution:

single-Command ::= for Identifier := Expression (to | downto) Expression do single-Command
                 | …

Chart 22

Starter Sets

Starter set of an RE X
– starters[[X]]
– The set of terminal symbols that can start a string generated by X

Examples
– starters[[his | her | its]] = {h, i}
– starters[[(re)* set]] = {r, s}

Chart 23

Starter Sets

Precise and complete definition of starters:

starters[[ε]] = {}
starters[[t]] = {t}, where t is a terminal symbol
starters[[X Y]] = starters[[X]] ∪ starters[[Y]], if X generates ε
starters[[X Y]] = starters[[X]], if X does not generate ε
starters[[X | Y]] = starters[[X]] ∪ starters[[Y]]
starters[[X*]] = starters[[X]]

To generalize to a starter set of an extended RE, add
– starters[[N]] = starters[[X]], where N is a nonterminal symbol defined by production rule N ::= X

Chart 24

Example Starter Set

Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier
                     | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/

starters[[Expression]]
  = starters[[primary-Expression (Operator primary-Expression)*]]
  = starters[[primary-Expression]]
  = starters[[Identifier]] ∪ starters[[ ( Expression ) ]]
  = starters[[a | b | c | d | e]] ∪ { ( }
  = {a, b, c, d, e, (}

Chart 25

Scanning (Lexical Analysis)

The purpose of scanning is to recognize tokens in the source program, that is, to group input characters (the source program text) into tokens.

Difference between parsing and scanning:
– Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands, and analyzes the tokens for correctness and structure
– Scanning groups individual characters into tokens

Chart 28

What Does a Scanner Do?

Handle keywords (reserved words)
– Recognize identifiers and keywords
– Match explicitly
  Write a regular expression for each keyword
  An identifier is any alphanumeric string that is not a keyword
– Match as an identifier, then perform a lookup
  No special regular expressions for keywords
  When an identifier is found, perform a lookup in a preloaded keyword table

How does Triangle handle keywords? Discuss in terms of efficiency and ease of coding.
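The "match as an identifier, then perform a lookup" approach can be sketched as follows. The table holds Triangle's reserved words from the token list earlier in the chapter. Because they are kept in alphabetical order, a binary search works. This illustrates the technique under that assumption; it is not the Triangle compiler's own lookup code, and `KeywordTable` and `classify` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical keyword table: Triangle's reserved words from the token list,
// kept in alphabetical order so that a binary search can be used.
class KeywordTable {
    private static final String[] RESERVED = {
        "array", "begin", "const", "do", "else", "end", "func", "if", "in",
        "let", "of", "proc", "record", "then", "type", "var", "while"
    };

    // The scanner first matches a word as an identifier, then calls this lookup.
    // Returns the word itself if it is reserved, or "<identifier>" otherwise.
    static String classify(String spelling) {
        return Arrays.binarySearch(RESERVED, spelling) >= 0 ? spelling : "<identifier>";
    }
}
```

The scanner needs only one regular expression for all words, and each lookup costs O(log n) string comparisons. The price is that the keyword table must be kept sorted.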

Chart 29

What Does a Scanner Do?

Remove white space
– Tabs, spaces, new lines

Remove comments
– Single line
  -- Ada comment
– Multi-line, with start and end delimiters
  { Pascal comment }
  /* C comment */
– Nested
– Runaway comments
  Nonterminated comments can't be detected until the end of the file

Chart 30

What Does a Scanner Do?

Perform lookahead
– Multi-character tokens
  1..10 vs. 1.10
  & vs. &&
  < vs. <=
  etc.

Challenging input languages
– FORTRAN
  Keywords are not reserved
  Blanks are not delimiters
  Example (comma vs. decimal point):
  DO10I=1,5   start of a DO loop (equivalent to a C for loop)
  DO10I=1.5   an assignment statement, assigning 1.5 to the variable DO10I
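One character of lookahead is enough to separate these multi-character tokens. After reading a number, the scanner peeks at the next two characters. A '.' followed by a digit is a decimal point, while a '.' followed by another '.' is left for the ".." token. The same peek chooses between < and <=, and between & and &&. This is a hypothetical fragment, not taken from any particular compiler. For brevity, letters are returned one character at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner fragment showing one character of lookahead.
class LookaheadScanner {
    // Returns the character at position j, or '\0' past the end of the text.
    private static char peek(String s, int j) {
        return j < s.length() ? s.charAt(j) : '\0';
    }

    static List<String> scan(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isDigit(c)) {
                int start = i;
                while (Character.isDigit(peek(s, i))) i++;
                // "1.10": '.' followed by a digit is a decimal point.
                // "1..10": '.' followed by '.' is left for the ".." token.
                if (peek(s, i) == '.' && Character.isDigit(peek(s, i + 1))) {
                    i++;
                    while (Character.isDigit(peek(s, i))) i++;
                }
                out.add(s.substring(start, i));
            } else if (c == '.' && peek(s, i + 1) == '.') {
                out.add("..");
                i += 2;
            } else if (c == '<' && peek(s, i + 1) == '=') {
                out.add("<=");
                i += 2;
            } else if (c == '&' && peek(s, i + 1) == '&') {
                out.add("&&");
                i += 2;
            } else if (Character.isWhitespace(c)) {
                i++;
            } else {
                out.add(String.valueOf(c));   // single-character token, e.g. '<', '&', a letter
                i++;
            }
        }
        return out;
    }
}
```

`scan("1..10")` gives 1, .., 10, whereas `scan("1.10")` gives the single token 1.10.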

Chart 31

What Does a Scanner Do?

Challenging input languages (cont.)
– PL/I: keywords are not reserved

IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;

Chart 32

What Does a Scanner Do?

Error handling
– An error token is passed to the parser, which reports the error
– Recovery
  Delete the characters of the current token that have been read so far, and restart scanning at the next unread character
  Delete the first character of the current lexeme, and resume scanning from the next character
– Examples of lexical errors:
  3.25e: bad format for a constant
  Var#1: illegal character
– Some errors that are not lexical errors:
  Mistyped keywords, e.g., Begim
  Mismatched parentheses
  Undeclared variables

Chart 33

Scanner Implementation

Issues
– Simpler design: the parser doesn't have to worry about white space, etc.
– Improved compiler efficiency: allows the construction of a specialized and potentially more efficient processor
– Enhanced compiler portability: input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner

Chart 34

Scanner Implementation

What are the keywords in Triangle?
How are keywords and identifiers implemented in Triangle?
Is lookahead implemented in Triangle?
– If so, how?

Chart 35

Structure of a Compiler

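The same structure, with the Parser and the Semantic Analyzer shown as separate phases:

Source code
→ Lexical Analyzer
→ tokens
→ Parser
→ Semantic Analyzer
→ parse tree
→ Intermediate Code Generation
→ intermediate representation
→ Optimization
→ intermediate representation
→ Assembly Code Generation
→ Assembly code

Symbol Table (used by the phases)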

Chart 36

Parsing

Given an unambiguous, context-free grammar, parsing is
– Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar
– Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.

The grammar must be unambiguous so that every sentence of the grammar has exactly one syntax tree.

Chart 37

Parsing

The syntax of programming language constructs is described by context-free grammars.

Advantages of unambiguous, context-free grammars
– A precise, yet easy-to-understand, syntactic specification of the programming language
– For certain classes of grammars, we can automatically construct an efficient parser that determines whether a source program is syntactically well formed
– They impart a structure to a programming language that is useful for translating source programs into correct object code and for detecting errors
– It is easier to add new constructs to the language if the implementation is based on a grammatical description of the language

Chart 38

Parsing

Check the syntax (structure) of a program and create a tree representation of the program

Programming languages have non-regular constructs
– Nesting
– Recursion

Context-free grammars are used to express the syntax of programming languages

sequence of tokens → parser → syntax tree

Chart 39

Context-Free Grammars

Comprised of
– A set of tokens, or terminal symbols
– A set of non-terminal symbols
– A set of rules, or productions, which express the legal relationships between symbols
– A start, or goal, symbol

Example:
1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Tokens: -, +, 0, 1, 2, …, 9
Non-terminals: expr, digit
Start symbol: expr

Chart 40

Context-Free Grammars

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

Syntax tree (bracketed form): [expr [expr [expr [digit 3]] + [digit 8]] - [digit 2]]

Chart 41

Checking for Correct Syntax

Given a grammar for a language and a program, how do you know if the syntax of the program is legal?

A legal program can be derived from the start symbol of the grammar

The grammar must be unambiguous and context-free

Chart 42

Deriving a String

The derivation begins with the start symbol.
At each step of a derivation, the right-hand side of a grammar rule is used to replace a non-terminal symbol.
Continue replacing non-terminals until only terminal symbols remain.

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

expr ⇒ expr - digit        (Rule 1)
     ⇒ expr - 2            (Rule 4)
     ⇒ expr + digit - 2    (Rule 2)
     ⇒ expr + 8 - 2        (Rule 4)
     ⇒ digit + 8 - 2       (Rule 3)
     ⇒ 3 + 8 - 2           (Rule 4)

Chart 43

Rightmost Derivation

The rightmost non-terminal is replaced in each step

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

expr ⇒ expr - digit        (Rule 1)
     ⇒ expr - 2            (Rule 4)
     ⇒ expr + digit - 2    (Rule 2)
     ⇒ expr + 8 - 2        (Rule 4)
     ⇒ digit + 8 - 2       (Rule 3)
     ⇒ 3 + 8 - 2           (Rule 4)

Chart 44

Leftmost Derivation

The leftmost non-terminal is replaced in each step

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

expr ⇒ expr - digit            (Rule 1)
     ⇒ expr + digit - digit    (Rule 2)
     ⇒ digit + digit - digit   (Rule 3)
     ⇒ 3 + digit - digit       (Rule 4)
     ⇒ 3 + 8 - digit           (Rule 4)
     ⇒ 3 + 8 - 2               (Rule 4)

Chart 45

Leftmost Derivation

The leftmost non-terminal is replaced in each step

The leftmost derivation from the previous slide, shown alongside its syntax tree.

[Figure: the six derivation steps, numbered 1 to 6, placed next to the parts of the syntax tree they create]

Syntax tree (bracketed form): [expr [expr [expr [digit 3]] + [digit 8]] - [digit 2]]

Chart 46

Bottom-Up Parsing

The parser examines the terminal symbols of the input string in order from left to right.

It reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node).

Bottom-up parsing reduces a string w to the start symbol of the grammar.
– At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left of that production. If the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.

Chart 47

Bottom-Up Parsing

Types of bottom-up parsing algorithms
– Shift-reduce parsing
  At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left of that production. If the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.
– LR(k) parsing
  The L is for left-to-right scanning of the input, the R is for constructing a rightmost derivation in reverse, and the k is for the number of input symbols of lookahead used in making parsing decisions.
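Shift-reduce parsing can be illustrated on the expr grammar used in the earlier examples. The sketch below shifts one input character at a time and reduces whenever the top of the stack matches a right-hand side. After each reduction it records the sentential form: the stack followed by the unread input. The handle-finding rules are hand-written for this one grammar, whereas an LR(k) parser would derive them from a table. The class `ShiftReduce` is hypothetical, and the input is assumed to contain no blanks.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-coded shift-reduce recognizer for
//   1. expr -> expr - digit   2. expr -> expr + digit
//   3. expr -> digit          4. digit -> 0|1|...|9
class ShiftReduce {
    static List<String> trace(String input) {            // e.g. "3+8-2"
        List<String> stack = new ArrayList<>();
        List<String> forms = new ArrayList<>();
        forms.add(form(stack, input, 0));
        for (int pos = 0; pos < input.length(); pos++) {
            stack.add(String.valueOf(input.charAt(pos)));   // shift
            boolean reduced = true;
            while (reduced) {                               // reduce as long as a handle is on top
                reduced = false;
                int n = stack.size();
                String top = stack.get(n - 1);
                if (top.length() == 1 && Character.isDigit(top.charAt(0))) {
                    stack.set(n - 1, "digit");              // rule 4
                    reduced = true;
                } else if (n == 1 && top.equals("digit")) {
                    stack.set(0, "expr");                   // rule 3
                    reduced = true;
                } else if (n >= 3 && top.equals("digit") && stack.get(n - 3).equals("expr")
                           && (stack.get(n - 2).equals("+") || stack.get(n - 2).equals("-"))) {
                    stack.subList(n - 3, n).clear();        // rule 2 or rule 1
                    stack.add("expr");
                    reduced = true;
                }
                if (reduced) forms.add(form(stack, input, pos + 1));
            }
        }
        return forms;                                       // accepted if the last form is "expr"
    }

    // Sentential form = stack contents followed by the unread input.
    private static String form(List<String> stack, String input, int pos) {
        List<String> symbols = new ArrayList<>(stack);
        for (char c : input.substring(pos).toCharArray()) symbols.add(String.valueOf(c));
        return String.join(" ", symbols);
    }
}
```

For "3+8-2" the recorded forms are 3 + 8 - 2, digit + 8 - 2, expr + 8 - 2, expr + digit - 2, expr - 2, expr - digit, expr. This is the rightmost derivation from the "Deriving a String" slide read backwards.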

Chart 48

Bottom-Up Parsing Example: 3 + 8 - 2

1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Example input: 3 + 8 - 2

[Figure, built up in stages: the syntax tree for 3 + 8 - 2 grown from the leaves, first digit nodes over the numbers, then expr nodes above them]

Chart 49

Bottom-Up Parsing Example: 3 + 8 - 2

[Figure (cont.): further digit and expr nodes are added as the tree for 3 + 8 - 2 grows toward the root]

Chart 50

Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

Step 1: reduce b to A (rule 2): abbcde becomes aAbcde

Chart 51

Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

Step 2: reduce Abc to A (rule 2): aAbcde becomes aAde

Chart 52

Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

Step 3: reduce d to B (rule 3): aAde becomes aABe

Chart 53

Bottom-Up Parsing Example: abbcde

1. S → aABe
2. A → Abc | b
3. B → d

Example input: abbcde

Step 4: reduce aABe to S (rule 1): aABe becomes S

Chart 54

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Example input: the cat sees a rat.

Step 1: reduce cat to Noun (rule 4): the cat sees a rat. becomes the Noun sees a rat.

Chart 55

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Example input: the cat sees a rat.

Step 2: reduce the Noun to Subject (rule 2): the Noun sees a rat. becomes Subject sees a rat.

Chart 56

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Example input: the cat sees a rat.

Step 3: reduce sees to Verb (rule 5): Subject sees a rat. becomes Subject Verb a rat.

Chart 57

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Example input: the cat sees a rat.

Step 4: reduce rat to Noun (rule 4): Subject Verb a rat. becomes Subject Verb a Noun.

Chart 58

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Example input: the cat sees a rat.

Step 5: reduce a Noun to Object (rule 3): Subject Verb a Noun. becomes Subject Verb Object.

What would happen if we chose 'Subject ::= a Noun' instead of 'Object ::= a Noun'?

Chart 59

Bottom-Up Parsing Example: the cat sees a rat.

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Example input: the cat sees a rat.

Step 6: reduce Subject Verb Object . to Sentence (rule 1): Subject Verb Object. becomes Sentence

Chart 60

Top-Down Parsing

The parser examines the terminal symbols of the input string, in order from left to right.

The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes).

Top-down parsing is an attempt to find the leftmost derivation for an input string.

Chart 61

Top-Down Parsers

General rules for top-down parsers
– Start with just a stub for the root node
– At each step, the parser takes the leftmost stub
– If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)

– If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1,…, Xn (in order from left to right).

– Parsing succeeds when and if the whole input string is connected up to the syntax tree.

Chart 62

Top-Down Parsing

Two forms
– Backtracking parsers
  Guess which rule to apply, back up, and change choices if they cannot proceed
– Predictive parsers
  Predict which rule to apply by using lookahead tokens

Backtracking parsers are not very efficient. We will cover predictive parsers.

Chart 63

Predictive Parsers

Many types
– LL(1) parsing
  The first L is for scanning the input from left to right, the second L is for producing a leftmost derivation, and the 1 is for using one input symbol of lookahead
  Table driven, with an explicit stack to maintain the parse tree
– Recursive-descent parsing
  Uses recursive subroutines to traverse the parse tree

Chart 64

Predictive Parsers (Lookahead)

Lookahead in predictive parsing
– The lookahead token (the next token in the input) is used to determine which rule should be used next
– For example:
  1. term → num term'
  2. term' → '+' num term' | '-' num term' | ε
  3. num → '0'|'1'|'2'|…|'9'

Example input: 7 + 3 - 2

[Figure: term is expanded to num term'; num matches 7, and the lookahead token + selects the alternative '+' num term']

Chart 65

Predictive Parsers (Lookahead)

1. term → num term'
2. term' → '+' num term' | '-' num term' | ε
3. num → '0'|'1'|'2'|…|'9'

Example input: 7 + 3 - 2

[Figure (cont.): num matches 3, and the lookahead token - selects the alternative '-' num term']

Chart 66

Predictive Parsers (Lookahead)

1. term → num term'
2. term' → '+' num term' | '-' num term' | ε
3. num → '0'|'1'|'2'|…|'9'

Example input: 7 + 3 - 2

[Figure (cont.): num matches 2; with no input left, the last term' takes the ε alternative, completing the tree for 7 + 3 - 2]
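The lookahead decisions in this example can be sketched as a small recursive predictive parser, with one method per nonterminal. `parseTermPrime` looks at a single character: '+' or '-' selects the matching alternative, and anything else selects ε. As a hypothetical extra not on the slides, the methods also evaluate the expression by passing the running value into term'. The input is assumed to contain no blanks, and '$' is an assumed end-of-input marker.

```java
// Predictive parser for
//   term  -> num term'
//   term' -> '+' num term' | '-' num term' | ε
//   num   -> '0' | '1' | ... | '9'
class PredictiveParser {
    private final String input;   // e.g. "7+3-2" (no blanks)
    private int pos = 0;

    PredictiveParser(String input) { this.input = input; }

    // One character of lookahead; '$' marks the end of the input.
    private char lookahead() { return pos < input.length() ? input.charAt(pos) : '$'; }

    int parseTerm() {                       // term -> num term'
        int value = parseTermPrime(parseNum());
        if (lookahead() != '$') throw new IllegalStateException("unexpected " + lookahead());
        return value;
    }

    private int parseTermPrime(int left) {  // term' -> '+' num term' | '-' num term' | ε
        char la = lookahead();
        if (la == '+') { pos++; return parseTermPrime(left + parseNum()); }
        if (la == '-') { pos++; return parseTermPrime(left - parseNum()); }
        return left;                        // ε: consume nothing
    }

    private int parseNum() {                // num -> '0' | ... | '9'
        char la = lookahead();
        if (!Character.isDigit(la)) throw new IllegalStateException("digit expected at " + pos);
        pos++;
        return la - '0';
    }
}
```

`new PredictiveParser("7+3-2").parseTerm()` returns 8. The parser never has to back up, because one lookahead character always decides the alternative.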

Chart 67

Recursive-Descent Parsing

Top-down parsing algorithm
– Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar
– The task of each method parseN is to parse a single N-phrase
– These parsing methods cooperate to parse complete sentences

Chart 68

Recursive-Descent Parsing

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Example input: the cat sees a rat.

[Figure: the root node Sentence with four stubs, Subject, Verb, Object and '.', above the input the cat sees a rat.]

a. Decide which production rule to apply. There is only one, rule 1. This step creates four stubs.

Chart 69

Recursive-Descent Parsing

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Example input: the cat sees a rat.

[Figure: the Subject stub is expanded using Subject ::= the Noun, matching the input word "the" and adding a Noun stub]

Chart 72

Recursive-Descent Parsing

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Example input: the cat sees a rat.

[Figure: the Object stub is expanded in the same way, using Object ::= a Noun, so both Subject and Object now have a Noun stub]

Chart 75

Recursive-Descent Parser for Micro-English

Parsing methods:
parseSentence
parseSubject
parseObject
parseVerb
parseNoun

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Chart 76

Recursive-Descent Parser for Micro-English

parseSentence:
  parseSubject
  parseVerb
  parseObject
  parseEnd

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Chart 77

Recursive-Descent Parser for Micro-English

parseSubject:
  if input = "I"
    accept
  else if input = "a"
    accept
    parseNoun
  else if input = "the"
    accept
    parseNoun
  else error

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Chart 78

Recursive-Descent Parser for Micro-English

parseNoun:
  if input = "cat"
    accept
  else if input = "mat"
    accept
  else if input = "rat"
    accept
  else error

1. Sentence ::= Subject Verb Object .
2. Subject ::= I | a Noun | the Noun
3. Object ::= me | a Noun | the Noun
4. Noun ::= cat | mat | rat
5. Verb ::= like | is | see | sees

Chart 79

Recursive-Descent Parser for Micro-English

ParseObjectif input = “me”

acceptelse if input =“a”

acceptparseNoun

else if input = “the”acceptparseNoun

else error

1. Sentence ::= Subject Verb Object .

2. Subject ::= I | a Noun | the Noun

3. Object ::= me | a Noun | the Noun

4. Noun ::= cat | mat | rat

5. Verb ::= like | is | see | sees

Object ::= me
         | a Noun
         | the Noun
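parseObject differs from parseSubject only in its first alternative (me instead of I); the a Noun and the Noun branches are identical. A sketch of one way to share that code, assuming a hypothetical helper named parseDeterminerNoun that is not part of the original design:

```java
// Hypothetical refactoring: Subject and Object share the alternatives
// "a Noun | the Noun", so both delegate to one helper, parseDeterminerNoun.
class ObjectParser {
    private final String[] words;
    private int pos = 0;

    ObjectParser(String... words) {
        this.words = words;
    }

    private String current() {
        return pos < words.length ? words[pos] : "<eot>";
    }

    private void error(String expected) {
        throw new IllegalStateException("expected " + expected + ", found " + current());
    }

    // Subject ::= I | a Noun | the Noun
    void parseSubject() {
        if (current().equals("I")) pos++;      // accept "I"
        else parseDeterminerNoun("I, a or the");
    }

    // Object ::= me | a Noun | the Noun
    void parseObject() {
        if (current().equals("me")) pos++;     // accept "me"
        else parseDeterminerNoun("me, a or the");
    }

    // Shared part of both rules: ( a | the ) Noun
    private void parseDeterminerNoun(String expected) {
        if (current().equals("a") || current().equals("the")) {
            pos++;                             // accept the article
            parseNoun();
        } else {
            error(expected);                   // message lists the caller's starters
        }
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        String w = current();
        if (w.equals("cat") || w.equals("mat") || w.equals("rat")) pos++;
        else error("cat, mat or rat");
    }

    // True if the words form exactly one Object phrase.
    static boolean isObject(String... words) {
        ObjectParser p = new ObjectParser(words);
        try {
            p.parseObject();
            return p.pos == words.length;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isObject("a", "mat")); // true
        System.out.println(isObject("I"));        // false: "I" is only a Subject
    }
}
```

Factoring keeps the two methods in step if the article alternatives ever change, at the cost of one extra method call per phrase.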


Chart 80

Recursive-Descent Parser for Micro-English

parseVerb:
    if input = “like”
        accept
    else if input = “is”
        accept
    else if input = “see”
        accept
    else if input = “sees”
        accept
    else
        error

1. Sentence ::= Subject Verb Object .

2. Subject ::= I | a Noun | the Noun

3. Object ::= me | a Noun | the Noun

4. Noun ::= cat | mat | rat

5. Verb ::= like | is | see | sees

Verb ::= like
       | is
       | see
       | sees
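Both parseNoun and parseVerb test the current word against a fixed list of single-terminal alternatives. As an alternative to the if-chains on the slides, here is a sketch assuming a hypothetical TerminalRules class that looks the word up in a set and lists the expected words in its error message:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical table-driven variant of parseNoun and parseVerb: every
// alternative of these two rules is a single terminal, so a set lookup
// can replace the chain of if-statements.
class TerminalRules {
    // Noun ::= cat | mat | rat        Verb ::= like | is | see | sees
    static final Map<String, Set<String>> RULES = Map.of(
        "Noun", Set.of("cat", "mat", "rat"),
        "Verb", Set.of("like", "is", "see", "sees"));

    // Accepts word if it is one of the alternatives of rule; otherwise
    // reports the expected words in sorted order.
    static void parse(String rule, String word) {
        Set<String> alternatives = RULES.get(rule);
        if (alternatives == null) {
            throw new IllegalArgumentException("unknown rule: " + rule);
        }
        if (!alternatives.contains(word)) {
            throw new IllegalStateException(
                rule + ": expected one of " + new TreeSet<>(alternatives) + ", found " + word);
        }
    }

    static boolean accepts(String rule, String word) {
        try {
            parse(rule, word);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("Verb", "sees")); // true
        try {
            parse("Noun", "dog");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Noun: expected one of [cat, mat, rat], found dog
        }
    }
}
```

This table-driven form only works for rules whose alternatives are all single terminals; Subject and Object, whose alternatives include a Noun, still need code of their own.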


Chart 81

Recursive-Descent Parser for Micro-English

parseEnd:
    if input = “.”
        accept
    else
        error

1. Sentence ::= Subject Verb Object .

2. Subject ::= I | a Noun | the Noun

3. Object ::= me | a Noun | the Noun

4. Noun ::= cat | mat | rat

5. Verb ::= like | is | see | sees

.


Chart 82

Systematic Development of a Recursive-Descent Parser

Given a (suitable) context-free grammar:

– Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations:
    – Always eliminate left recursion.
    – Always left-factorize whenever possible.

– Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X.

– Make the parser consist of:
    – a private variable currentToken;
    – the private parsing methods developed in the previous step;
    – private auxiliary methods accept and acceptIt, both of which call the scanner;
    – a public parse method that calls parseS, where S is the start symbol of the grammar, having first called the scanner to store the first input token in currentToken.
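The parser organization described above can be sketched in Java. To keep the example self-contained it assumes a toy grammar, S ::= ( S ) | Identifier, and a stand-in WordScanner that splits the input on white space; the names Kind, Token, WordScanner and NestingParser are hypothetical. The end-of-text check in parse is an extra step added in this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Token kinds for a hypothetical toy grammar:  S ::= ( S ) | Identifier
enum Kind { LPAREN, RPAREN, IDENTIFIER, EOT }

final class Token {
    final Kind kind;
    final String spelling;

    Token(Kind kind, String spelling) {
        this.kind = kind;
        this.spelling = spelling;
    }
}

// Stand-in scanner: splits on white space and classifies each word.
final class WordScanner {
    private final Deque<String> words = new ArrayDeque<>();

    WordScanner(String source) {
        for (String w : source.trim().split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
    }

    Token scan() {
        if (words.isEmpty()) return new Token(Kind.EOT, "");
        String w = words.poll();
        if (w.equals("(")) return new Token(Kind.LPAREN, w);
        if (w.equals(")")) return new Token(Kind.RPAREN, w);
        return new Token(Kind.IDENTIFIER, w);
    }
}

final class NestingParser {
    private final WordScanner scanner;
    private Token currentToken;               // the token being examined

    NestingParser(WordScanner scanner) {
        this.scanner = scanner;
    }

    // accept: check the current token's kind, then fetch the next token.
    private void accept(Kind expected) {
        if (currentToken.kind != expected) {
            throw new IllegalStateException(
                "expected " + expected + ", found " + currentToken.kind);
        }
        currentToken = scanner.scan();
    }

    // acceptIt: fetch the next token without checking (kind already known).
    private void acceptIt() {
        currentToken = scanner.scan();
    }

    // S ::= ( S ) | Identifier
    private void parseS() {
        switch (currentToken.kind) {
            case LPAREN:
                acceptIt();
                parseS();
                accept(Kind.RPAREN);
                break;
            case IDENTIFIER:
                acceptIt();
                break;
            default:
                throw new IllegalStateException("S cannot start with " + currentToken.kind);
        }
    }

    // Public entry point: prime currentToken, parse the start symbol S,
    // then (an addition of this sketch) insist that no input is left over.
    public void parse() {
        currentToken = scanner.scan();
        parseS();
        if (currentToken.kind != Kind.EOT) {
            throw new IllegalStateException("unexpected " + currentToken.spelling);
        }
    }

    static boolean accepts(String source) {
        try {
            new NestingParser(new WordScanner(source)).parse();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("( ( a ) )")); // true
        System.out.println(accepts("( a"));       // false
    }
}
```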


Chart 83

Quote of the Week

“C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg.”– Bjarne Stroustrup


Chart 84

Quote of the Week

Did you really say that? Dr. Bjarne Stroustrup: Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it.  I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.


Chart 85

Converting EBNF Production Rules to Parsing Methods

For production rule N ::= X:

– Convert the production rule to a parsing method named parseN:

    private void parseN () {
        parse X
    }

– Refine parse ε (the empty string) to a dummy statement.
– Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt().
– Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method, parseN().
– Refine parse X Y to

    {
        parse X
        parse Y
    }

– Refine parse X | Y to

    switch (currentToken.kind) {
        cases in starters[[X]]:
            parse X
            break;
        cases in starters[[Y]]:
            parse Y
            break;
        default:
            report a syntax error
    }
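Applying these refinements to Mini-Triangle's two single-declaration forms (rules 1.18a and 1.18b) gives one switch with a case per alternative, where each case is a sequence of accept calls and parsing-method calls. In this hypothetical sketch, Expression is simplified to an Integer-Literal, tokens are plain strings, and the class name DeclarationParser is an assumption.

```java
// Hypothetical sketch of the refinement rules applied to a simplified
// Mini-Triangle single declaration (Expression reduced to Integer-Literal):
//   single-Declaration ::= const Identifier ~ Integer-Literal
//                        | var Identifier : Type-denoter
//   Type-denoter       ::= Identifier
class DeclarationParser {
    private final String[] tokens;
    private int pos = 0;

    DeclarationParser(String... tokens) {
        this.tokens = tokens;
    }

    // Kind of the current token; "<eot>" marks the end of the input.
    private String currentKind() {
        if (pos >= tokens.length) return "<eot>";
        String t = tokens[pos];
        if (t.equals("const") || t.equals("var") || t.equals("~") || t.equals(":")) return t;
        if (t.matches("[0-9]+")) return "Integer-Literal";
        if (t.matches("[A-Za-z][A-Za-z0-9]*")) return "Identifier";
        return "?";
    }

    // parse t  ==>  accept(t): check the kind, then move on
    private void accept(String kind) {
        if (!currentKind().equals(kind)) {
            throw new IllegalStateException("expected " + kind + ", found " + currentKind());
        }
        pos++;
    }

    // parse t when t is already known to be current  ==>  acceptIt()
    private void acceptIt() {
        pos++;
    }

    // parse X | Y  ==>  switch on the current token's kind
    // parse X Y    ==>  parse X, then parse Y
    void parseSingleDeclaration() {
        switch (currentKind()) {
            case "const":                 // starters = { const }
                acceptIt();
                accept("Identifier");
                accept("~");
                accept("Integer-Literal");
                break;
            case "var":                   // starters = { var }
                acceptIt();
                accept("Identifier");
                accept(":");
                parseTypeDenoter();       // parse N  ==>  parseN()
                break;
            default:
                throw new IllegalStateException("a declaration cannot start with " + currentKind());
        }
    }

    // Type-denoter ::= Identifier
    private void parseTypeDenoter() {
        accept("Identifier");
    }

    static boolean accepts(String source) {
        DeclarationParser p = new DeclarationParser(source.trim().split("\\s+"));
        try {
            p.parseSingleDeclaration();
            return p.pos == p.tokens.length;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("const n ~ 10"));    // true
        System.out.println(accepts("var x : Integer")); // true
        System.out.println(accepts("var x ~ 10"));      // false
    }
}
```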


Chart 86

Converting EBNF Production Rules to Parsing Methods

For X | Y:
– Choose parse X only if the current token is one that can start an X-phrase.
– Choose parse Y only if the current token is one that can start a Y-phrase.
– Therefore starters[[X]] and starters[[Y]] must be disjoint.

For X*:
– Choose

    while (currentToken.kind is in starters[[X]])
        parse X

– starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context.
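The while-loop refinement of X* can be seen on a small hypothetical grammar in which a block holds commands separated by semicolons. Here starters[[; single-Command]] = { ; } and the token that follows the repetition is end, so the second condition holds. The grammar, the simplified single-Command, and the class name BlockParser are all assumptions of this sketch.

```java
import java.util.Set;

// Hypothetical grammar used only for this sketch:
//   Block          ::= begin Command end
//   Command        ::= single-Command ( ; single-Command )*
//   single-Command ::= Identifier := Integer-Literal      (simplified)
class BlockParser {
    private static final Set<String> KEYWORDS = Set.of("begin", "end");
    private final String[] tokens;
    private int pos = 0;

    BlockParser(String... tokens) {
        this.tokens = tokens;
    }

    private String current() {
        return pos < tokens.length ? tokens[pos] : "<eot>";
    }

    private void fail(String expected) {
        throw new IllegalStateException("expected " + expected + ", found " + current());
    }

    private void accept(String spelling) {
        if (!current().equals(spelling)) fail(spelling);
        pos++;
    }

    // Block ::= begin Command end ; returns how many single-Commands it held
    int parseBlock() {
        accept("begin");
        int n = parseCommand();
        accept("end");
        return n;
    }

    // Command ::= single-Command ( ; single-Command )*
    private int parseCommand() {
        parseSingleCommand();
        int count = 1;
        // Loop while the current token is in starters[[; single-Command]] = { ; }.
        // "end", which follows the repetition, is not in that set, so the loop stops there.
        while (current().equals(";")) {
            pos++;                         // acceptIt()
            parseSingleCommand();
            count++;
        }
        return count;
    }

    // single-Command ::= Identifier := Integer-Literal
    private void parseSingleCommand() {
        if (!current().matches("[A-Za-z][A-Za-z0-9]*") || KEYWORDS.contains(current())) fail("Identifier");
        pos++;
        accept(":=");
        if (!current().matches("[0-9]+")) fail("Integer-Literal");
        pos++;
    }

    // Returns the number of commands, or -1 if the input is not a Block.
    static int countCommands(String source) {
        BlockParser p = new BlockParser(source.trim().split("\\s+"));
        try {
            int n = p.parseBlock();
            return p.pos == p.tokens.length ? n : -1;
        } catch (IllegalStateException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(countCommands("begin x := 1 ; y := 2 end")); // 2
    }
}
```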


Chart 87

Converting EBNF Production Rules to Parsing Methods

A grammar that satisfies both these conditions is called an LL(1) grammar

Recursive-descent parsing is suitable only for LL(1) grammars
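Both conditions are set-disjointness tests, so they can be checked mechanically once the starter and follower sets are known. Here is a minimal hypothetical helper, assuming terminals are represented by their names as strings:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for the two LL(1) conditions on the slides.
class LL1Check {
    // Condition for X1 | X2 | ... | Xn: the starter sets must be pairwise disjoint.
    static boolean startersDisjoint(List<Set<String>> starters) {
        Set<String> seen = new HashSet<>();
        for (Set<String> s : starters) {
            for (String terminal : s) {
                if (!seen.add(terminal)) {
                    return false;   // this terminal can start two different alternatives
                }
            }
        }
        return true;
    }

    // Condition for X*: starters[[X]] must not meet the tokens that can follow X*.
    static boolean repetitionOk(Set<String> startersOfX, Set<String> followers) {
        return Collections.disjoint(startersOfX, followers);
    }

    public static void main(String[] args) {
        // Subject ::= I | a Noun | the Noun
        System.out.println(startersDisjoint(List.of(Set.of("I"), Set.of("a"), Set.of("the"))));   // true
        // V-name := Expression | Identifier ( ... ): both start with Identifier
        System.out.println(startersDisjoint(List.of(Set.of("Identifier"), Set.of("Identifier")))); // false
    }
}
```

The second example in main corresponds to the Triangle single-Command rule, where the assignment and call alternatives both start with an Identifier; as written, that choice fails the first condition.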


Chart 88

Error Repair

Good programming languages are designed with a relatively large “distance” between syntactically correct programs, to increase the likelihood that conceptual mistakes are caught as syntactic errors.

Error repair usually occurs at two levels:
– Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables.
– Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.


Chart 89

Error Repair

Repair actions can be divided into insertions and deletions. Typically the compiler will use some lookahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair. Goals of error repair are:

– No input should cause the compiler to collapse.
– Illegal constructs are flagged.
– Frequently occurring errors are repaired gracefully.
– Minimal stuttering or cascading of errors.

LL-style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input.
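The slides do not prescribe a particular repair algorithm. One widely taught deletion-based technique is panic-mode recovery: after an error, the parser discards tokens until it reaches a synchronizing token (here ;) and then resumes. The Java sketch below applies it to a hypothetical simplified command list in which each command is Identifier := Integer-Literal. It reports an error and keeps going rather than collapsing, and skipping to ; keeps one mistake from cascading into many.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of panic-mode recovery (a deletion-based repair) for
//   Commands       ::= single-Command ( ; single-Command )*
//   single-Command ::= Identifier := Integer-Literal      (simplified)
class RecoveringParser {
    private final String[] tokens;
    private int pos = 0;
    final List<String> errors = new ArrayList<>();  // one message per error found
    int commandsParsed = 0;                          // well-formed commands seen

    RecoveringParser(String... tokens) {
        this.tokens = tokens;
    }

    private String current() {
        return pos < tokens.length ? tokens[pos] : "<eot>";
    }

    // Consume the current token if ok holds, otherwise raise a syntax error.
    private void expect(boolean ok, String what) {
        if (!ok) {
            throw new IllegalStateException(
                "token " + pos + ": expected " + what + ", found " + current());
        }
        pos++;
    }

    private void parseSingleCommand() {
        expect(current().matches("[A-Za-z][A-Za-z0-9]*"), "Identifier");
        expect(current().equals(":="), ":=");
        expect(current().matches("[0-9]+"), "Integer-Literal");
    }

    void parseCommands() {
        while (true) {
            try {
                parseSingleCommand();
                commandsParsed++;
                if (!current().equals(";") && !current().equals("<eot>")) {
                    expect(false, "; or end of input");
                }
            } catch (IllegalStateException e) {
                errors.add(e.getMessage());
                // Panic mode: delete tokens until the synchronizing token ";".
                while (!current().equals(";") && !current().equals("<eot>")) {
                    pos++;
                }
            }
            if (current().equals("<eot>")) {
                return;
            }
            pos++;  // current token is ";": accept it and parse the next command
        }
    }

    public static void main(String[] args) {
        RecoveringParser p = new RecoveringParser("x", ":=", "1", ";", "y", "2", ";", "z", ":=", "3");
        p.parseCommands();
        System.out.println(p.commandsParsed + " commands, errors: " + p.errors);
        // 2 commands, errors: [token 5: expected :=, found 2]
    }
}
```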


Chart 90

Triangle single-Command

single-Command ::=
      | V-name := Expression
      | Identifier ( Actual-Parameter-Sequence )
      | begin Command end
      | let Declaration in single-Command
      | if Expression then single-Command else single-Command
      | while Expression do single-Command

(The blank first alternative allows an empty command.)

V-name ::= Identifier | V-name . Identifier | V-name [ Expression ]

Identifier ::= Letter (Letter | Digit)*

Letter ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
         |A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z

Digit ::= 0|1|2|3|4|5|6|7|8|9
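The lexical rule Identifier ::= Letter (Letter | Digit)* translates directly into a scanning loop: one Letter, then as many Letters or Digits as follow. Here is a minimal sketch, assuming a hypothetical scanIdentifier method that returns the index just past the identifier (or -1 if none starts at the given position):

```java
// Hypothetical scanner fragment for  Identifier ::= Letter (Letter | Digit)*
// with Letter = a..z | A..Z and Digit = 0..9, as in the rules above.
class IdentifierScanner {
    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the index just past the identifier starting at start,
    // or -1 if no identifier starts there.
    static int scanIdentifier(String src, int start) {
        if (start >= src.length() || !isLetter(src.charAt(start))) {
            return -1;                               // must begin with a Letter
        }
        int i = start + 1;
        while (i < src.length() && (isLetter(src.charAt(i)) || isDigit(src.charAt(i)))) {
            i++;                                     // (Letter | Digit)*
        }
        return i;
    }

    public static void main(String[] args) {
        String src = "count2 := 0";
        int end = scanIdentifier(src, 0);
        System.out.println(src.substring(0, end)); // count2
    }
}
```

The explicit a..z and A..Z ranges mirror the Letter rule; Character.isLetter would also accept non-ASCII letters, which this grammar does not allow.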


Chart 91

Starter Sets

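Starter set for an RE X:
– starters[[X]] is the set of terminal symbols that can start a string generated by X.

Example: starters[[single-Command]] = { Identifier, begin, let, if, while }

What about V-name vs Identifier? Both the assignment and the call alternatives start with an Identifier, so their starter sets are not disjoint.
– When an Identifier is encountered, use lookahead on the next token: := (or . or [, which continue a V-name) signals an assignment, and ( signals a call.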
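The Identifier lookahead can be sketched as a function that chooses an alternative of the single-Command rule without parsing further. Assumptions of this hypothetical sketch: tokens are plain strings, only a subset of Triangle keywords is recognized, and the empty-command and error cases are lumped together as "none".

```java
import java.util.Set;

// Hypothetical sketch: pick a single-Command alternative from the first token,
// looking one token further when the first token is an Identifier.
class CommandChooser {
    // Subset of Triangle keywords relevant to this sketch (an assumption).
    private static final Set<String> KEYWORDS =
        Set.of("begin", "end", "let", "in", "if", "then", "else", "while", "do");

    static boolean isIdentifier(String t) {
        return t.matches("[A-Za-z][A-Za-z0-9]*") && !KEYWORDS.contains(t);
    }

    static String choose(String... tokens) {
        String first = tokens.length > 0 ? tokens[0] : "<eot>";
        switch (first) {
            case "begin": return "begin Command end";
            case "let":   return "let Declaration in single-Command";
            case "if":    return "if Expression then single-Command else single-Command";
            case "while": return "while Expression do single-Command";
            default:      break;
        }
        if (isIdentifier(first)) {
            String next = tokens.length > 1 ? tokens[1] : "<eot>";
            // ":=" ends a V-name; "." and "[" continue one: either way, an assignment
            if (next.equals(":=") || next.equals(".") || next.equals("[")) {
                return "V-name := Expression";
            }
            if (next.equals("(")) {
                return "Identifier ( Actual-Parameter-Sequence )";
            }
        }
        return "none";  // empty command or syntax error: not handled in this sketch
    }

    public static void main(String[] args) {
        System.out.println(choose("x", ":=", "1"));           // V-name := Expression
        System.out.println(choose("putint", "(", "x", ")"));  // Identifier ( Actual-Parameter-Sequence )
    }
}
```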


Chart 92

Mini-Triangle Production Rules

Program ::= Command                                    Program (1.14)

Command ::= V-name := Expression                       AssignCommand (1.15a)
          | Identifier ( Expression )                  CallCommand (1.15b)
          | Command ; Command                          SequentialCommand (1.15c)
          | if Expression then Command else Command    IfCommand (1.15d)
          | while Expression do Command                WhileCommand (1.15e)
          | let Declaration in Command                 LetCommand (1.15f)

Expression ::= Integer-Literal                         IntegerExpression (1.16a)
             | V-name                                  VnameExpression (1.16b)
             | Operator Expression                     UnaryExpression (1.16c)
             | Expression Operator Expression          BinaryExpression (1.16d)

V-name ::= Identifier                                  SimpleVname (1.17)

Declaration ::= const Identifier ~ Expression          ConstDeclaration (1.18a)
              | var Identifier : Type-denoter          VarDeclaration (1.18b)
              | Declaration ; Declaration              SequentialDeclaration (1.18c)

Type-denoter ::= Identifier                            SimpleTypeDenoter (1.19)
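Rules 1.15c, 1.16d and 1.18c are left-recursive, so a direct transcription would make, for example, parseExpression call itself before consuming any token. Following the advice to eliminate left recursion, Expression can be rewritten in EBNF as primary (Operator primary)*. The sketch below is one possible rewrite under stated assumptions: primary is simplified to an Identifier or Integer-Literal, and all operators share one precedence level. It builds a parenthesized string so the left-associative grouping chosen by the loop is visible.

```java
// Hypothetical sketch of left-recursion elimination. The left-recursive rule
//   Expression ::= Expression Operator Expression | primary
// is parsed in the EBNF form
//   Expression ::= primary ( Operator primary )*
// with primary simplified to Identifier | Integer-Literal and all operators
// assumed to share one precedence level.
class ExpressionParser {
    private final String[] tokens;
    private int pos = 0;

    ExpressionParser(String... tokens) {
        this.tokens = tokens;
    }

    private String current() {
        return pos < tokens.length ? tokens[pos] : "<eot>";
    }

    private static boolean isOperator(String t) {
        return t.matches("[-+*/<>=]");     // single-character operators only
    }

    // The loop folds each new operand into the tree built so far,
    // so "a - b - c" groups as ((a - b) - c), i.e. left-associatively.
    String parseExpression() {
        String left = parsePrimary();
        while (isOperator(current())) {
            String op = current();
            pos++;                          // acceptIt()
            String right = parsePrimary();
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // primary ::= Identifier | Integer-Literal   (simplified)
    private String parsePrimary() {
        String t = current();
        if (t.matches("[A-Za-z][A-Za-z0-9]*") || t.matches("[0-9]+")) {
            pos++;
            return t;
        }
        throw new IllegalStateException("expected an operand, found " + t);
    }

    // Parses a whole white-space-separated expression and shows its grouping.
    static String group(String source) {
        ExpressionParser p = new ExpressionParser(source.trim().split("\\s+"));
        String result = p.parseExpression();
        if (p.pos != p.tokens.length) {
            throw new IllegalStateException("unexpected " + p.current());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(group("a - b - c")); // ((a - b) - c)
    }
}
```

Because rule 1.16d is also ambiguous as written, the rewrite makes a choice the original rule leaves open: operators associate to the left.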


Chart 93

Abstract Syntax Trees

An explicit representation of the source program’s phrase structure

AST for Mini-Triangle


Chart 94

Abstract Syntax Trees

Program ASTs (P):

– Program (1.14): a Program node whose single child is the Command C.

Program ::= Command    Program (1.14)

Command ASTs (C):

– AssignCommand (1.15a): children V (a V-name) and E (an Expression)
– CallCommand (1.15b): children Identifier (with its spelling) and E
– SequentialCommand (1.15c): children C1 and C2

Command ::= V-name := Expression       AssignCommand (1.15a)
          | Identifier ( Expression )  CallCommand (1.15b)
          | Command ; Command          SequentialCommand (1.15c)
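One way to represent these node kinds in Java is one class per AST node, each holding its children. The sketch below is hypothetical. It covers only Program, AssignCommand, CallCommand and SequentialCommand, plus the minimal V-name and Expression nodes they need (V-name is restricted to SimpleVname), and the show method is an assumed helper that renders a subtree as text.

```java
// Hypothetical AST classes named after the Mini-Triangle node kinds.
abstract class AST {
    abstract String show();   // compact textual rendering of the subtree
}

abstract class Command extends AST {}
abstract class Expression extends AST {}

final class Identifier extends AST {
    final String spelling;
    Identifier(String spelling) { this.spelling = spelling; }
    String show() { return spelling; }
}

final class SimpleVname extends AST {                  // (1.17)
    final Identifier id;
    SimpleVname(Identifier id) { this.id = id; }
    String show() { return "SimpleVname(" + id.show() + ")"; }
}

final class IntegerExpression extends Expression {     // (1.16a)
    final String literal;
    IntegerExpression(String literal) { this.literal = literal; }
    String show() { return "IntegerExpression(" + literal + ")"; }
}

final class VnameExpression extends Expression {       // (1.16b)
    final SimpleVname v;
    VnameExpression(SimpleVname v) { this.v = v; }
    String show() { return "VnameExpression(" + v.show() + ")"; }
}

final class AssignCommand extends Command {            // (1.15a): children V and E
    final SimpleVname v;
    final Expression e;
    AssignCommand(SimpleVname v, Expression e) { this.v = v; this.e = e; }
    String show() { return "AssignCommand(" + v.show() + ", " + e.show() + ")"; }
}

final class CallCommand extends Command {              // (1.15b): children Identifier and E
    final Identifier id;
    final Expression e;
    CallCommand(Identifier id, Expression e) { this.id = id; this.e = e; }
    String show() { return "CallCommand(" + id.show() + ", " + e.show() + ")"; }
}

final class SequentialCommand extends Command {        // (1.15c): children C1 and C2
    final Command c1;
    final Command c2;
    SequentialCommand(Command c1, Command c2) { this.c1 = c1; this.c2 = c2; }
    String show() { return "SequentialCommand(" + c1.show() + ", " + c2.show() + ")"; }
}

final class Program extends AST {                      // (1.14): child C
    final Command c;
    Program(Command c) { this.c = c; }
    String show() { return "Program(" + c.show() + ")"; }
}

class AstDemo {
    // AST for:  x := 1; putint(x)
    static Program example() {
        Command c1 = new AssignCommand(new SimpleVname(new Identifier("x")), new IntegerExpression("1"));
        Command c2 = new CallCommand(new Identifier("putint"),
                                     new VnameExpression(new SimpleVname(new Identifier("x"))));
        return new Program(new SequentialCommand(c1, c2));
    }

    public static void main(String[] args) {
        System.out.println(example().show());
    }
}
```

AstDemo.example() builds the tree for x := 1; putint(x): a Program whose command is a SequentialCommand with an AssignCommand and a CallCommand as its children.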


Chart 95

Abstract Syntax Trees

Command ASTs (C):

– IfCommand (1.15d): children E, C1 and C2
– WhileCommand (1.15e): children E and C
– LetCommand (1.15f): children D and C

Command ::= if Expression then Command else Command    IfCommand (1.15d)
          | while Expression do Command                WhileCommand (1.15e)
          | let Declaration in Command                 LetCommand (1.15f)

