1 A Simple Syntax-Directed Translator CS308 Compiler Theory.

Date post: 19-Jan-2016
Upload: nora-wright

A Simple Syntax-Directed Translator

CS308 Compiler Theory

Lecture Outline

• We shall look at a simple programming language and describe the initial phases of compilation.

• We start off by creating a ‘simple’ syntax-directed translator that maps infix arithmetic expressions to postfix.

• This translator is then extended to cater for more elaborate programs such as (see Aho, p. 39) – while (true) { x=a[i]; a[i]=a[j]; a[j]=x; }

• The extended translator generates simplified intermediate code (as in Aho, p. 40).

Two Main Phases (Analysis and Synthesis)

• Analysis Phase :- Breaks up a source program into constituent pieces and produces an internal representation of it called intermediate code.

• Synthesis Phase :- translates the intermediate code into the target program.

• During this lecture we shall focus on the analysis phase (compiler front end … see figure next slide)

A Model of a Compiler Front End

Syntax vs. Semantics

• The syntax of a programming language describes the proper form of its programs

• The semantics of the language defines what its programs mean.

A note on Grammars (Context-free)

• A formal grammar is used to specify the syntax of a formal language (for example a programming language like C, Java)

• Grammar describes the structure (usually hierarchical) of programming languages.

– E.g. in Java an if-else statement must have the form

if ( expression ) statement else statement

– As a production: statement -> if ( expression ) statement else statement

– Note the recursive nature of statement: it appears in the body of its own production.

A CFG has four components

• A set of terminal symbols, sometimes referred to as ‘tokens’.

• A set of non-terminals, sometimes called ‘syntactic variables’.

• A set of productions.

• A designation of one of the non-terminals as the start symbol.

A Grammar for ‘list of digits separated by + or -’

list -> list + digit

list -> list - digit

list -> digit

digit -> 0 | 1 | … | 9

• Accepts strings such as 9-5+2, 3-1, or 7.

• list and digit are non-terminals

• 0 | 1 | … | 9, +, - are the terminal symbols

Parsing and derivations

• Parsing is the problem of taking a string of terminals and figuring out how to derive it from the start symbol of the grammar

• A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the body of a production

• If the string cannot be derived from the start symbol, the parser reports syntax errors within the string.

Parse Trees

• A parse tree pictorially shows how the start symbol of a grammar derives a string in the language

• A grammar can have more than one parse tree generating a given string of terminals (thus making it ambiguous)

Parse Trees

• A parse tree is a tree with the following properties:

– The root is labeled by the start symbol.

– Each leaf is labeled by a terminal or by ε.

– Each interior node is labeled by a non-terminal.

– If A is the non-terminal labeling some interior node and X1, X2, ..., Xn are the labels of the children of that node from left to right, then there must be a production A -> X1 X2 ... Xn. Here, X1, X2, ..., Xn each stand for a symbol that is either a terminal or a non-terminal.

Ambiguity

string -> string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
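This grammar is ambiguous because a string such as 9-5+2 has more than one parse tree. The Python sketch below is only an illustration: the helper names parse_trees and evaluate are assumptions, not part of the lecture. It enumerates every parse tree by trying each operator as the root, then evaluates each tree to show that the two trees give different values.

```python
def parse_trees(tokens):
    """Return every parse tree for tokens under the grammar
         string -> string + string | string - string | digit
    A tree is either a digit character or a tuple (left, op, right)."""
    if len(tokens) == 1 and "0" <= tokens[0] <= "9":
        return [tokens[0]]
    trees = []
    # Try each operator position as the root of the tree.
    for i in range(1, len(tokens) - 1):
        if tokens[i] in ("+", "-"):
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Evaluate a parse tree produced by parse_trees."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) - evaluate(right)


trees = parse_trees(list("9-5+2"))
values = sorted(evaluate(t) for t in trees)   # one value per parse tree
```

The two trees correspond to (9-5)+2 and 9-(5+2), which is why associativity rules are needed to pick one.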

Operator Associativity and Precedence

• To resolve some of the ambiguity in grammars that have operators we use:

– Operator associativity :- in most programming languages arithmetic operators have left associativity.

– E.g. 9+5-2 = (9+5)-2

– However = has right associativity, i.e. a=b=c is equivalent to a=(b=c)

– Operator precedence :- if an operator has higher precedence, then it binds to its operands first.

– E.g. * has higher precedence than +, therefore 9+5*2 is equivalent to 9+(5*2)

Syntax-Directed Translation

• Syntax-directed translation is done by attaching rules or program fragments to productions in a grammar.

Postfix Notation

• If E is a variable or constant, then the postfix notation for E is E itself.

• If E is an expression of the form E1 op E2, where op is any binary operator, then the postfix notation for E is E’1 E’2 op, where E’1 and E’2 are the postfix notations for E1 and E2, respectively.

• If E is a parenthesized expression of the form (E1), then the postfix notation for E is the same as the postfix notation for E1.

• X=Y==Z+U*V?Y:0
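The three rules of the definition translate almost directly into a recursive function. The following is a minimal Python sketch under an assumed representation: a leaf is a string, and an interior node is a tuple (op, e1, e2). The function name postfix and the tuple layout are illustrative choices, not something the slides prescribe.

```python
def postfix(e):
    """Postfix notation of an expression tree.
    A leaf (variable or constant) is a string; an interior node is a
    tuple (op, e1, e2) standing for the infix expression e1 op e2."""
    if isinstance(e, str):
        return e                                  # rule 1: variable or constant
    op, e1, e2 = e
    return f"{postfix(e1)} {postfix(e2)} {op}"    # rule 2: E'1 E'2 op


# Rule 3: parentheses leave no trace in the postfix form, because the
# shape of the tree already records the grouping.
left_grouped = ("+", ("-", "9", "5"), "2")     # (9-5)+2
right_grouped = ("-", "9", ("+", "5", "2"))    # 9-(5+2)
```

With this representation, postfix(left_grouped) yields "9 5 - 2 +" and postfix(right_grouped) yields "9 5 2 + -".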

Synthesized Attributes

• Attributes associated with non-terminals and terminals in a grammar.

• An attribute is said to be synthesized if its value at a parse-tree node N is determined from attribute values at the children of N and at N itself.

Semantic Rules for infix to postfix

A syntax-directed translation scheme

• A notation for specifying a translation by attaching program fragments to productions in a grammar.

• The program fragments are called semantic actions.

Parsing

• Parsing is the process of determining how a string of terminals can be generated by a grammar.

• Two classes :

– Bottom-up, where construction starts at the leaves and proceeds towards the root;

– Top-down, where construction starts at the root and proceeds towards the leaves.

Top-Down Parsing

• The top-down construction of a parse is done by starting with the root, labeled with the starting non-terminal stmt, and repeatedly performing the following two steps.

– At node N, labeled with non-terminal A, select one of the productions for A and construct children at N for the symbols in the production body.

– Find the next node at which a subtree is to be constructed, typically the leftmost unexpanded non-terminal of the tree.

Top-Down Parsing

Predictive Parsing

• Recursive descent parsing : a top-down method of syntax analysis in which a set of recursive procedures is used to process the input.

• Predictive parsing is a simple form of recursive-descent parsing, in which the lookahead symbol (the current terminal being scanned in the input) unambiguously determines the flow of control through the procedure body for each non-terminal. The choice relies on knowing the first symbols that can be generated by each production body.

Predictive Parsing

• Every non-terminal has such a procedure in a predictive parser.
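As an illustration of one procedure per non-terminal, here is a small Python sketch of a predictive parser. The grammar in the docstring is an assumption chosen for the example (it treats expr and other as terminals to keep things short), and the names Parser, match and accepts are hypothetical.

```python
class ParseError(Exception):
    pass


class Parser:
    """Predictive parser for the assumed grammar
         stmt    -> expr ; | if ( expr ) stmt
                  | for ( optexpr ; optexpr ; optexpr ) stmt | other
         optexpr -> expr | epsilon
    where expr and other are treated as terminals."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Advance only when the lookahead is the expected terminal.
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, saw {self.lookahead!r}")
        self.pos += 1

    def match_all(self, *terminals):
        for terminal in terminals:
            self.match(terminal)

    def stmt(self):
        # The lookahead symbol alone selects the production to apply.
        if self.lookahead == "expr":
            self.match_all("expr", ";")
        elif self.lookahead == "if":
            self.match_all("if", "(", "expr", ")")
            self.stmt()
        elif self.lookahead == "for":
            self.match_all("for", "(")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(";")
            self.optexpr()
            self.match(")")
            self.stmt()
        elif self.lookahead == "other":
            self.match("other")
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def optexpr(self):
        if self.lookahead == "expr":
            self.match("expr")
        # Otherwise apply optexpr -> epsilon and consume nothing.


def accepts(text):
    """True if the whitespace-separated tokens in text form one stmt."""
    parser = Parser(text.split())
    try:
        parser.stmt()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For example, accepts("for ( ; expr ; expr ) other") succeeds, while accepts("if expr other") fails at the missing parenthesis.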

Left Recursion

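• A recursive-descent parser can loop forever on a left-recursive production such as expr -> expr + term: since the lookahead symbol changes only when a terminal is matched, no change to the input takes place between recursive calls of expr.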

• A left-recursive production can be eliminated by rewriting the offending production.
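The usual rewrite replaces A -> A a | b with A -> b R and R -> a R | epsilon, where R is a new non-terminal. The Python sketch below applies that rewrite to the productions of one non-terminal. The function name, the list-of-symbols encoding, and the "_rest" suffix used for the new non-terminal are assumptions made for the example.

```python
def eliminate_left_recursion(head, bodies):
    """Rewrite  A -> A a1 | ... | A am | b1 | ... | bn  as
         A -> b1 R | ... | bn R
         R -> a1 R | ... | am R | epsilon
    Each body is a list of symbols; the empty list stands for epsilon."""
    rest = head + "_rest"        # fresh non-terminal R (name is an assumption)
    recursive = [b[1:] for b in bodies if b and b[0] == head]
    others = [b for b in bodies if not b or b[0] != head]
    if not recursive:
        return {head: bodies}    # no immediate left recursion to remove
    return {
        head: [b + [rest] for b in others],
        rest: [a + [rest] for a in recursive] + [[]],
    }


grammar = eliminate_left_recursion(
    "list",
    [["list", "+", "digit"], ["list", "-", "digit"], ["digit"]],
)
```

Applied to the list grammar from earlier, this gives list -> digit list_rest and list_rest -> + digit list_rest | - digit list_rest | epsilon. Note that it handles only immediate left recursion.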

Eliminating Left Recursion

Translators (using program fragments)

Left-recursion-eliminated

A Translator for Simple Expressions, formed by extending a predictive parser
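A rough Python sketch of such a translator follows. It assumes the grammar expr -> term rest, rest -> + term rest | - term rest | epsilon, term -> digit, with a semantic action that emits each digit and each operator; the tail-recursive rest is written as a loop. The names Translator and to_postfix are hypothetical.

```python
class Translator:
    """Infix-to-postfix translator for digits separated by + and -."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.out = []                 # output of the semantic actions

    def lookahead(self):
        # "" signals the end of the input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def match(self, ch):
        if self.lookahead() != ch:
            raise ValueError(f"syntax error at position {self.pos}")
        self.pos += 1

    def expr(self):
        self.term()
        # rest -> + term {emit '+'} rest | - term {emit '-'} rest | epsilon
        while self.lookahead() in ("+", "-"):
            op = self.lookahead()
            self.match(op)
            self.term()
            self.out.append(op)       # action runs after both operands
        # Falling out of the loop corresponds to rest -> epsilon.

    def term(self):
        ch = self.lookahead()
        if "0" <= ch <= "9":
            self.match(ch)
            self.out.append(ch)       # action: emit the digit
        else:
            raise ValueError(f"syntax error at position {self.pos}")


def to_postfix(text):
    translator = Translator(text)
    translator.expr()
    if translator.pos != len(text):
        raise ValueError(f"syntax error at position {translator.pos}")
    return "".join(translator.out)
```

For the input 9-5+2 this sketch produces 95-2+.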

Lexical Analysis

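• A lexical analyzer reads characters from the input and groups them into "token objects."

• Token: a terminal together with additional information in the form of attribute values.

• Terminal symbol: the token name that the parser sees in the grammar.

• Lexeme: the sequence of input characters that makes up a single token.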

Extended translation scheme

Reading Ahead

• Is it '>' or '>=' ? ... The lexer needs to read one character ahead in order to decide what token to return to the parser.

• One-character read ahead usually suffices, so a simple solution is to use a variable, call it peek, to hold the next input character.
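The following Python sketch shows the peek idea on a deliberately tiny input language that contains only '>', '>=' and whitespace. The function name relational_tokens and the use of an empty string as the end-of-input marker are assumptions for the example.

```python
def relational_tokens(text):
    """Split text made of '>', '>=' and whitespace into tokens,
    using one character of read-ahead held in the variable peek."""
    tokens = []
    chars = iter(text)
    peek = next(chars, "")            # "" signals the end of the input
    while peek != "":
        if peek == ">":
            peek = next(chars, "")    # read ahead one character
            if peek == "=":
                tokens.append(">=")
                peek = next(chars, "")
            else:
                # peek already holds the first character after '>'.
                tokens.append(">")
        elif peek.isspace():
            peek = next(chars, "")
        else:
            raise ValueError(f"unexpected character {peek!r}")
    return tokens
```

Because peek always holds the next unread character, the character read ahead after a plain '>' is not lost: it is simply examined on the next loop iteration.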

Constant

• Write tokens as tuples enclosed between <>

– 31 + 28 + 59 is transformed into the sequence

<num, 31> <+> <num, 28> <+> <num, 59>

• Simulate scanning a number:

if ( peek holds a digit ) {
    v = 0;
    do {
        v = v * 10 + integer value of digit peek;
        peek = next input character;
    } while ( peek holds a digit );
    return token <num, v>;
}
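The pseudocode can be turned into a runnable Python sketch. Here tokens are modelled as tuples, ("num", v) for numbers and a one-element tuple for any other non-blank character; that encoding and the function name scan are assumptions for illustration.

```python
def scan(text):
    """Turn a string such as '31 + 28 + 59' into a list of tokens."""
    tokens = []
    i = 0
    while i < len(text):
        peek = text[i]
        if "0" <= peek <= "9":
            v = 0
            # Accumulate the value digit by digit, as in the pseudocode.
            while i < len(text) and "0" <= text[i] <= "9":
                v = v * 10 + int(text[i])
                i += 1
            tokens.append(("num", v))
        elif peek.isspace():
            i += 1                      # skip white space
        else:
            tokens.append((peek,))      # single-character token such as <+>
            i += 1
    return tokens
```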

Keywords and Identifiers

• A character string forms an identifier only if it is not a keyword.
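One common way to enforce this rule is to keep a string table that is seeded with the keywords before scanning starts. The Python sketch below assumes that approach; the names make_word_table and scan_word, and the tuple encoding of tokens, are hypothetical.

```python
def make_word_table(keywords):
    """Reserve the keywords up front: each maps to its own token."""
    return {kw: (kw,) for kw in keywords}


def scan_word(text, start, words):
    """Read letters and digits starting at text[start], which the caller
    has already checked to be a letter. Returns (token, next_index)."""
    end = start
    while end < len(text) and text[end].isalnum():
        end += 1
    lexeme = text[start:end]
    if lexeme not in words:
        # Not reserved and not seen before: enter it as an identifier.
        words[lexeme] = ("id", lexeme)
    return words[lexeme], end


words = make_word_table(["if", "else", "while"])
keyword_token, after_keyword = scan_word("while count", 0, words)
id_token, after_id = scan_word("while count", 6, words)
```

Since keywords are in the table before any input is read, a lookup that finds an entry returns the keyword token, and only unknown lexemes become identifiers.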

Symbol Table

• Data structures that are used by compilers to hold information about the source-program constructs.

• Information is collected incrementally throughout the analysis phase and used for the synthesis phase.

• One symbol table per scope (of declaration)...

{ int x; char y; { bool y; x; y; } x; y; }

=>

{ { x:int; y:bool; } x:int; y:char; }
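One table per scope can be sketched as a chain of environments, each pointing to the table of the enclosing scope. The Python class below is a minimal illustration; the class name Env and the methods put and get are assumptions rather than a fixed interface.

```python
class Env:
    """Symbol table for one scope, chained to the enclosing scope."""

    def __init__(self, prev=None):
        self.table = {}
        self.prev = prev              # table of the enclosing scope, if any

    def put(self, name, type_):
        self.table[name] = type_

    def get(self, name):
        # Search the innermost scope first, then work outwards.
        env = self
        while env is not None:
            if name in env.table:
                return env.table[name]
            env = env.prev
        return None


# Mirrors the example: { int x; char y; { bool y; x; y; } x; y; }
outer = Env()
outer.put("x", "int")
outer.put("y", "char")
inner = Env(outer)
inner.put("y", "bool")
```

Looking up y in the inner scope finds bool, while x is found in the outer table as int; in the outer scope y is still char.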

Intermediate Code Generation

• The front end of a compiler constructs an intermediate representation of the source program from which the back end generates the target program.

• Two kinds of intermediate representations

– Trees, including parse trees and (abstract) syntax trees.

– Linear representations, especially “three-address code.”

Static Checking

• Done by a compiler front end

• To check that the program follows the syntactic and semantic rules

• Including:

– Syntactic checking

– Type checking

Syntax Trees

• For statements

stmt -> while ( expr ) stmt1

{ stmt.n = new While(expr.n, stmt1.n); }

n is a node in the syntax tree

stmts -> stmts1 stmt

{ stmts.n = new Seq(stmts1.n, stmt.n); }

Syntax Trees

• For expressions

Three-Address Code

• Three-address code is a sequence of instructions of the form

x = y op z

• Arrays will be handled by using the following two variants of instructions:

1. x [ y ] = z

2. x = y [ z ]

• Instructions for control flow:

1. ifFalse x goto L

2. ifTrue x goto L

3. goto L

• Instruction for copying value

x = y

Translation of Statements

• Use jump instructions to implement the flow of control through the statement.

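• The translation of if expr then stmt1 evaluates expr, jumps past the code for stmt1 if the result is false, and otherwise falls through into the code for stmt1.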
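The jump layout for an if statement can be sketched in a few lines of Python. The sketch assumes the condition has already been computed into a variable and that the code for the body is available as a list of instruction strings; the function name gen_if and the L1, L2, ... label scheme are illustrative assumptions.

```python
import itertools


def gen_if(cond, body_code, labels):
    """Three-address code for: if cond then body.
    cond      name of the variable holding the value of the condition
    body_code list of instructions already generated for the body
    labels    iterator that supplies fresh label numbers"""
    after = f"L{next(labels)}"
    return [
        f"ifFalse {cond} goto {after}",   # skip the body when cond is false
        *body_code,
        f"{after}:",                      # execution continues here
    ]


labels = itertools.count(1)
code = gen_if("x", ["y = 1"], labels)
```

Here code is the three-line sequence ifFalse x goto L1, y = 1, L1:.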

Translation of Statements

Functions lvalue(x:Expr) and rvalue(x:Expr)

• In a = a + 1, a is computed differently as an l-value (left side) and as an r-value (right side)

• Two functions are used to distinguish them:

– rvalue, which when applied to a nonleaf node x, generates the instructions to compute x into a temporary var, and returns a new node representing the temporary var.

– lvalue, which when applied to a nonleaf node x, generates instructions to compute the subtrees below x, and returns a node representing the “address” for x.

• R-values are what we usually think of as “values”, while l-values are “locations”

Translation of Expressions

• Expressions contain binary operators, array accesses, assignments, constants and identifiers.

• We can take the simple approach of generating one three-address instruction for each operator node in the syntax tree of an expression.

• Expression: i-j+k translates into

t1 = i-j

t2 = t1+k

• Expression: 2 * a[i] translates into

t1 = a [ i ]

t2 = 2 * t1
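The one-instruction-per-operator approach can be sketched in Python as a post-order walk of the syntax tree. The tree encoding is an assumption: leaves are strings, (op, left, right) is a binary operator node, and ("[]", array, index) is an array access. The names CodeGen, rvalue and translate are hypothetical, and the output uses spaces around operators.

```python
import itertools


class CodeGen:
    """Emits one three-address instruction per operator node."""

    def __init__(self):
        self.code = []
        self._temps = itertools.count(1)

    def rvalue(self, node):
        """Return the name that holds the value of node, emitting the
        instructions needed to compute it into a temporary."""
        if isinstance(node, str):
            return node                       # identifier or constant
        if node[0] == "[]":
            _, array, index = node
            rhs = f"{array} [ {self.rvalue(index)} ]"
        else:
            op, left, right = node
            left_name = self.rvalue(left)     # operands first, left to right
            right_name = self.rvalue(right)
            rhs = f"{left_name} {op} {right_name}"
        temp = f"t{next(self._temps)}"
        self.code.append(f"{temp} = {rhs}")
        return temp


def translate(node):
    gen = CodeGen()
    gen.rvalue(node)
    return gen.code


sum_code = translate(("+", ("-", "i", "j"), "k"))        # i-j+k
access_code = translate(("*", "2", ("[]", "a", "i")))    # 2 * a[i]
```

Both results match the two translations listed above: sum_code holds t1 = i - j followed by t2 = t1 + k, and access_code holds t1 = a [ i ] followed by t2 = 2 * t1.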

Test Yourself

• Generate three-address codes for

if (x[2*a] == y[b]) x[2*a+1] = y[b+1];

Summary

• Grammars

• Parse Trees and Syntax Tree

– Ambiguity

• Postfix notation

• Lexical analyzer

– Token

– Synthesized Attributes

• Parsing

– Predictive parsing

• Syntax-directed translation

– Attaching rules to productions

– Attaching program fragments to productions

• Intermediate code

– Abstract syntax tree

– Three-address code

