Date post: 14-Dec-2015
Upload: kenyon-mcallister

Translator Architecture

string of characters (source code)
  → Tokenizer → string of tokens
  → Parser → abstract program
  → Code Generator → string of integers (object code)

In focus: the Parser

Here’s the Plan

Design a context-free grammar to specify (syntactically) valid BL programs

Use the grammar to implement a recursive descent parser (i.e., an algorithm to parse BL programs and construct the corresponding Program object)

Specifying Syntax with CFGs

Context-free grammars (CFGs)

Some syntactically valid real constants: 37.044   615.22E16   99241.   18.E-93

Rewrite Rules for Real Constants

<real constant> → <digit seqnce> . <digit seqnce>
                | <digit seqnce> . <digit seqnce> <exponent>
                | <digit seqnce> .
                | <digit seqnce> . <exponent>

<exponent> → E <digit seqnce>
           | E + <digit seqnce>
           | E - <digit seqnce>

<digit seqnce> → <digit> <digit seqnce>
               | <digit>

<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Here "→" reads "can be rewritten as" and "|" reads "or".

Four Components of a CFG

Nonterminal symbols: <real constant>, <exponent>, <digit seqnce>, <digit>

Terminal symbols: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, E, +, -, .

Start symbol: <real constant>

Rewrite rules: the rules for real constants given above

A Derivation of 5.6E10

We begin with the start symbol…

<real constant>

… and choose one of its rewrite rules:

<real constant> ⇒ <digit seqnce> . <digit seqnce> <exponent>

Now we choose one of the nonterminal symbols and one of its rewrite rules (here <digit seqnce> → <digit>):

                ⇒ <digit> . <digit seqnce> <exponent>

Again we choose one of the nonterminal symbols and one of its rewrite rules (here <digit> → 5):

                ⇒ 5 . <digit seqnce> <exponent>

Continuing in the same way:

                ⇒ 5 . <digit> <exponent>
                ⇒ 5 . 6 <exponent>
                ⇒ 5 . 6 E <digit seqnce>
                ⇒ 5 . 6 E <digit> <digit seqnce>
                ⇒ 5 . 6 E 1 <digit seqnce>
                ⇒ 5 . 6 E 1 <digit>
                ⇒ 5 . 6 E 1 0

If it’s possible to find such a derivation, we write: <real constant> ⇒* 5.6E10
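The derivation of 5.6E10 always rewrites the leftmost nonterminal, so it can be replayed mechanically. The following is a minimal sketch under that assumption; the helper names is_nonterminal and rewrite_leftmost are made up for illustration, and the code does not check that each replacement is an actual rule of the grammar.

```python
def is_nonterminal(symbol):
    # In these slides, nonterminals are written in angle brackets.
    return symbol.startswith("<") and symbol.endswith(">")


def rewrite_leftmost(form, replacement):
    """Replace the leftmost nonterminal in `form` with `replacement`."""
    for i, symbol in enumerate(form):
        if is_nonterminal(symbol):
            return form[:i] + replacement + form[i + 1:]
    raise ValueError("no nonterminal left to rewrite")


# The right-hand sides chosen at each step of the derivation, in order.
steps = [
    ["<digit seqnce>", ".", "<digit seqnce>", "<exponent>"],
    ["<digit>"],
    ["5"],
    ["<digit>"],
    ["6"],
    ["E", "<digit seqnce>"],
    ["<digit>", "<digit seqnce>"],
    ["1"],
    ["<digit>"],
    ["0"],
]

form = ["<real constant>"]   # begin with the start symbol
history = [form]
for replacement in steps:
    form = rewrite_leftmost(form, replacement)
    history.append(form)

result = "".join(form)       # only terminals remain at the end
```

Each entry of history is one line of the derivation; the last one contains only terminal symbols.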

A Derivation Tree for 5.6E10

<real constant>
  <digit seqnce>
    <digit>
      5
  .
  <digit seqnce>
    <digit>
      6
  <exponent>
    E
    <digit seqnce>
      <digit>
        1
      <digit seqnce>
        <digit>
          0

Find a Derivation Tree for 5.E3

The derivation tree:

<real constant>
  <digit seqnce>
    <digit>
      5
  .
  <exponent>
    E
    <digit seqnce>
      <digit>
        3

Can you find a derivation tree for .6E10? Why not?

Language Generated by a CFG

Definition: Let G = (Nonterminals, Terminals, <start symbol>, Rewrite Rules) be a context-free grammar. The language generated (or specified) by G is denoted L(G) and is defined as:

L(G) = { x : x is a string of Terminals and <start symbol> ⇒* x }
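The definition of L(G) can be tested directly by searching for a derivation. The sketch below is a brute-force illustration, not an efficient parser: the function name derives is hypothetical, and it assumes every terminal is a single character and no rewrite rule has an empty right-hand side (both hold for the real-constant grammar), so any form longer than the target can be discarded.

```python
GRAMMAR = {
    "<real constant>": [
        ["<digit seqnce>", ".", "<digit seqnce>"],
        ["<digit seqnce>", ".", "<digit seqnce>", "<exponent>"],
        ["<digit seqnce>", "."],
        ["<digit seqnce>", ".", "<exponent>"],
    ],
    "<exponent>": [
        ["E", "<digit seqnce>"],
        ["E", "+", "<digit seqnce>"],
        ["E", "-", "<digit seqnce>"],
    ],
    "<digit seqnce>": [["<digit>", "<digit seqnce>"], ["<digit>"]],
    "<digit>": [[d] for d in "0123456789"],
}


def derives(grammar, start, target):
    """Return True if start =>* target, by searching leftmost derivations."""
    stack = [(start,)]
    seen = set()
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        # Position of the leftmost nonterminal, or None if all terminals.
        i = next((k for k, s in enumerate(form) if s in grammar), None)
        if i is None:
            if "".join(form) == target:
                return True
            continue
        # Prune: every symbol yields at least one character, and the
        # terminals already produced must be a prefix of the target.
        if len(form) > len(target) or "".join(form[:i]) != target[:i]:
            continue
        for alternative in grammar[form[i]]:
            stack.append(form[:i] + tuple(alternative) + form[i + 1:])
    return False
```

For example, derives(GRAMMAR, "<real constant>", "5.E3") is True, while the same call with ".6E10" is False: every rule for <real constant> begins with <digit seqnce>, so no derivable string can start with a period.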

Another Example: A CFG for Boolean Expressions

<bool exp> → T
           | F
           | NOT ( <bool exp> )
           | ( <bool exp> AND <bool exp> )
           | ( <bool exp> OR <bool exp> )

CFG for Boolean Expressions

What are the nonterminal symbols? <bool exp>

What are the terminal symbols? T, F, AND, OR, NOT, (, )

What is the start symbol? <bool exp>

How many rewrite rules? 5

A Derivation Tree for NOT((T OR F))

<bool exp>
  NOT
  (
  <bool exp>
    (
    <bool exp>
      T
    OR
    <bool exp>
      F
    )
  )

Find a Derivation Tree for ((T OR NOT (F)) AND T)

<bool exp>
  (
  <bool exp>
    (
    <bool exp>
      T
    OR
    <bool exp>
      NOT
      (
      <bool exp>
        F
      )
    )
  AND
  <bool exp>
    T
  )
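Because each alternative of <bool exp> starts with a different terminal (T, F, NOT, or an opening parenthesis), a recursive descent parser can pick the rule by looking at the next token. The following is a minimal sketch, not the course's parser: the function names and the nested-tuple tree format are assumptions, the tree keeps only operators and operands rather than every node of the derivation tree, and the tokenizer silently skips characters it does not recognize.

```python
import re


def tokenize(text):
    # Tokens are parentheses and upper-case words (T, F, NOT, AND, OR).
    return re.findall(r"\(|\)|[A-Z]+", text)


def expect(tokens, pos, wanted):
    if pos >= len(tokens) or tokens[pos] != wanted:
        raise SyntaxError(f"expected {wanted!r} at token {pos}")


def parse_bool_exp(tokens, pos=0):
    """Parse one <bool exp> starting at tokens[pos]; return (tree, next_pos)."""
    token = tokens[pos] if pos < len(tokens) else None
    if token in ("T", "F"):                      # <bool exp> -> T | F
        return token, pos + 1
    if token == "NOT":                           # NOT ( <bool exp> )
        expect(tokens, pos + 1, "(")
        operand, pos = parse_bool_exp(tokens, pos + 2)
        expect(tokens, pos, ")")
        return ("NOT", operand), pos + 1
    if token == "(":                             # ( <bool exp> AND/OR <bool exp> )
        left, pos = parse_bool_exp(tokens, pos + 1)
        operator = tokens[pos] if pos < len(tokens) else None
        if operator not in ("AND", "OR"):
            raise SyntaxError(f"expected AND or OR at token {pos}")
        right, pos = parse_bool_exp(tokens, pos + 1)
        expect(tokens, pos, ")")
        return (operator, left, right), pos + 1
    raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_bool_exp(tokens)
    if pos != len(tokens):
        raise SyntaxError("extra tokens after expression")
    return tree
```

Calling parse("NOT((T OR F))") returns ("NOT", ("OR", "T", "F")), which mirrors the shape of the first derivation tree; an input such as "T AND F", which lacks the required parentheses, raises SyntaxError.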

A Famous Context-free Grammar

<expression> → <expression> <addop> <term>
             | <term>

<term> → <term> <multop> <factor>
       | <factor>

<factor> → ( <expression> )
         | <digit seqnce>

<addop> → + | -

<multop> → * | DIV | MOD

<digit seqnce> → <digit> <digit seqnce>
               | <digit>

<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

What’s So Special About This CFG?

Find a derivation tree for 4 + 6 * 2

<expression>
  <expression>
    <term>
      <factor>
        <digit seqnce>
          <digit>
            4
  <addop>
    +
  <term>
    <term>
      <factor>
        <digit seqnce>
          <digit>
            6
    <multop>
      *
    <factor>
      <digit seqnce>
        <digit>
          2

What’s So Special Continued…

Find the derivation tree for (4 + 6) * 2

<expression>
  <term>
    <term>
      <factor>
        (
        <expression>
          <expression>
            <term>
              <factor>
                <digit seqnce>
                  <digit>
                    4
          <addop>
            +
          <term>
            <factor>
              <digit seqnce>
                <digit>
                  6
        )
    <multop>
      *
    <factor>
      <digit seqnce>
        <digit>
          2
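In both trees the multiplication ends up grouped the way arithmetic expects: in 4 + 6 * 2 the product 6 * 2 is a single <term>, and in (4 + 6) * 2 the parenthesized sum is a single <factor>. The sketch below evaluates expressions by following the same three levels. It is an illustration under stated assumptions: the left-recursive rules are implemented with loops (a common substitution, not something the slides prescribe), DIV and MOD are taken to be Python's // and %, and all function names are hypothetical.

```python
import re


def tokenize(text):
    # Numbers, the word operators DIV and MOD, and single-character symbols.
    return re.findall(r"\d+|DIV|MOD|[-+*()]", text)


def expression(tokens, pos):
    # <expression> -> <expression> <addop> <term> | <term>, as a loop
    value, pos = term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        operator = tokens[pos]
        right, pos = term(tokens, pos + 1)
        value = value + right if operator == "+" else value - right
    return value, pos


def term(tokens, pos):
    # <term> -> <term> <multop> <factor> | <factor>, as a loop
    value, pos = factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "DIV", "MOD"):
        operator = tokens[pos]
        right, pos = factor(tokens, pos + 1)
        if operator == "*":
            value = value * right
        elif operator == "DIV":
            value = value // right   # assumption: integer division
        else:
            value = value % right    # assumption: remainder
    return value, pos


def factor(tokens, pos):
    # <factor> -> ( <expression> ) | <digit seqnce>
    if pos < len(tokens) and tokens[pos] == "(":
        value, pos = expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected )")
        return value, pos + 1
    if pos < len(tokens) and tokens[pos].isdigit():
        return int(tokens[pos]), pos + 1
    raise SyntaxError("expected a number or (")


def evaluate(text):
    tokens = tokenize(text)
    value, pos = expression(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("extra tokens after expression")
    return value
```

Under these assumptions, evaluate("4 + 6 * 2") gives 16 and evaluate("(4 + 6) * 2") gives 20, matching the grouping shown in the two derivation trees.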

How About These Rewrite Rules?

<expression> → <expression> <op> <expression>
             | ( <expression> )
             | <digit seqnce>

<op> → + | - | * | DIV | MOD

<digit seqnce> → <digit> <digit seqnce>
               | <digit>

<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

New Rules Continued…

Find a derivation tree for 4 + 6 * 2

<expression>
  <expression>
    <digit seqnce>
      <digit>
        4
  <op>
    +
  <expression>
    <expression>
      <digit seqnce>
        <digit>
          6
    <op>
      *
    <expression>
      <digit seqnce>
        <digit>
          2

New Rules Continued…

How about this tree for 4 + 6 * 2?

<expression>
  <expression>
    <expression>
      <digit seqnce>
        <digit>
          4
    <op>
      +
    <expression>
      <digit seqnce>
        <digit>
          6
  <op>
    *
  <expression>
    <digit seqnce>
      <digit>
        2
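With the new rules, the same string 4 + 6 * 2 has two different derivation trees, which is what is usually called an ambiguous grammar. The sketch below counts the trees for a token list under the new rules. It is an illustration only: the name count_trees is hypothetical, and each number is treated as a single token, which is safe because a <digit seqnce> has exactly one derivation tree.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "DIV", "MOD"}


def count_trees(tokens):
    """Count derivation trees for `tokens` as an <expression> (new rules)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of trees deriving tokens[lo:hi] from <expression>.
        width = hi - lo
        if width <= 0:
            return 0
        if width == 1:
            # <expression> -> <digit seqnce>
            return 1 if tokens[lo].isdigit() else 0
        total = 0
        # <expression> -> ( <expression> )
        if tokens[lo] == "(" and tokens[hi - 1] == ")":
            total += count(lo + 1, hi - 1)
        # <expression> -> <expression> <op> <expression>, for every split
        for k in range(lo + 1, hi - 1):
            if tokens[k] in OPERATORS:
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))
```

For ["4", "+", "6", "*", "2"] the count is 2, one tree per choice of top-level operator, while adding parentheses as in ["(", "4", "+", "6", ")", "*", "2"] brings the count back to 1.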

