Date posted: 16-Mar-2018
Category: Engineering
Uploaded by: najmul-hassan
Compiler Construction (CS-636)
Muhammad Bilal Bashir
UIIT, Rawalpindi
Outline
1. Data Types & Type Checking
2. Intermediate Code Generation
3. Variants of Syntax Trees
4. Three-Address Code
5. Static Single-Assignment Form
6. Summary
Semantic Analysis
Lecture: 21-22
Data Types & Type Checking
One of the principal tasks of a compiler is the computation and maintenance of information on data types (type inference)
The compiler uses this information to ensure that each part of the program makes sense under the type rules of the language (type checking)
Data type information can occur in a program in several different forms
Theoretically, a data type is a set of values, or more precisely a set of values with certain operations on those values
Data Types & Type Checking (Continued…)
For instance, the data type integer in a programming language refers to a subset of the mathematical integers, together with the arithmetic operations on them
In compiler construction, these sets are described by type expressions
Type expressions can occur in several places in a program
Type Expressions & Type Constructors
A programming language always contains a number of built-in types
These predefined types correspond either to numeric data types like int or double, or to elementary types like boolean or char
Such data types are called simple types, in that their values exhibit no explicit internal structure
An interesting predefined type in C is the void type
This type has no values, and so represents the empty set
Type Expressions & Type Constructors (Continued…)
In some languages it is possible to define new simple types, such as subrange types in Pascal and enumerated types in C
In Pascal, the subrange of integers from 0 to 9 can be declared as
type Digit = 0..9;
In C, an enumerated type consisting of named values can be declared as
typedef enum {red, green, blue} Color;
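The two declarations above behave differently in practice: C has no subrange types, so a range such as 0..9 has to be enforced by hand. The following is a minimal sketch of that idea; the helper names `is_digit_value` and `next_color` are hypothetical and do not come from the slides.

```c
#include <stdbool.h>

/* Enumerated type from the slide: red = 0, green = 1, blue = 2. */
typedef enum { red, green, blue } Color;

/* C has no subrange types. A plain int plus an explicit range check is
   one (hypothetical) way to imitate Pascal's  type Digit = 0..9;  */
typedef int Digit;

/* Returns true if v lies in the subrange 0..9. */
bool is_digit_value(int v) {
    return v >= 0 && v <= 9;
}

/* Returns the next color, wrapping from blue back to red. */
Color next_color(Color c) {
    return (Color)((c + 1) % 3);
}
```

A Pascal compiler can reject an out-of-range Digit through its type rules, whereas in this C sketch the check only happens where the programmer calls it.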
Type Expressions & Type Constructors (Continued…)
Given a set of predefined types, new data types can be created using type constructors, such as array and record (or struct)
Such constructors can be viewed as functions that take existing types as parameters and return new types with a structure that depends on the constructor
Such types are called structured types
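The view of constructors as functions from types to types can be made concrete inside a compiler. The sketch below is one possible representation, not the one used in the lecture: the names `TypeExp`, `TypeKind`, `make_simple`, `make_array`, and `make_pointer` are assumptions chosen for illustration.

```c
#include <stdlib.h>

/* Kinds of type expression: two simple types and two constructors. */
typedef enum { T_INT, T_DOUBLE, T_ARRAY, T_POINTER } TypeKind;

typedef struct TypeExp {
    TypeKind kind;
    int size;              /* number of elements, used by arrays only */
    struct TypeExp *elem;  /* element or pointee type, NULL for simple types */
} TypeExp;

/* Allocates one node; returns NULL if allocation fails. */
static TypeExp *new_type(TypeKind kind, int size, TypeExp *elem) {
    TypeExp *t = malloc(sizeof *t);
    if (t != NULL) {
        t->kind = kind;
        t->size = size;
        t->elem = elem;
    }
    return t;
}

/* Each constructor takes existing types and returns a new type. */
TypeExp *make_simple(TypeKind kind)          { return new_type(kind, 0, NULL); }
TypeExp *make_array(int size, TypeExp *elem) { return new_type(T_ARRAY, size, elem); }
TypeExp *make_pointer(TypeExp *to)           { return new_type(T_POINTER, 0, to); }
```

For example, `make_pointer(make_array(10, make_simple(T_INT)))` builds the type expression "pointer to array of 10 int" as a small tree.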
Type Names, Type Declarations, and Recursive Types
Languages that have a rich set of type constructors usually also have a mechanism for a programmer to assign names to type expressions
Such type declarations (sometimes called type definitions) can be written in C as follows
struct RealIntRec {
    double r;
    int i;
};
Type Names, Type Declarations, and Recursive Types (Continued…)
Type declarations cause the declared type names to be entered into the symbol table, just as variable declarations cause variable names to be entered
Type names are associated with attributes in the symbol table in a similar way to variable declarations
These attributes include the scope and the type expression corresponding to the type name
Since type names can appear in type expressions, questions arise about the recursive use of type names
Type Names, Type Declarations, and Recursive Types (Continued…)
In C, a structure cannot contain a member of its own type, because the amount of memory required for the structure would be unknown at the point of declaration
Recursive types are therefore declared indirectly, through pointers:
struct intBST {
    int val;
    struct intBST *left, *right;
};
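Because a pointer has a fixed size, the compiler can lay out `struct intBST` even though the type refers to itself. The following sketch shows the recursive type in use; the functions `bst_insert` and `bst_contains` are hypothetical helpers written for this illustration.

```c
#include <stdlib.h>

struct intBST {
    int val;
    struct intBST *left, *right;
};

/* Inserts v and returns the (possibly new) root; duplicates are ignored.
   If allocation fails, the tree is returned unchanged. */
struct intBST *bst_insert(struct intBST *root, int v) {
    if (root == NULL) {
        struct intBST *n = malloc(sizeof *n);
        if (n != NULL) {
            n->val = v;
            n->left = n->right = NULL;
        }
        return n;
    }
    if (v < root->val)
        root->left = bst_insert(root->left, v);
    else if (v > root->val)
        root->right = bst_insert(root->right, v);
    return root;
}

/* Returns 1 if v is stored in the tree, 0 otherwise. */
int bst_contains(const struct intBST *root, int v) {
    while (root != NULL) {
        if (v == root->val)
            return 1;
        root = (v < root->val) ? root->left : root->right;
    }
    return 0;
}
```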
Type Equivalence
Given the possible type expressions of a language, a type checker must frequently answer the question of when two type expressions represent the same type
This is the question of type equivalence
There are many possible ways for type equivalence to be defined by a language
Type equivalence checking can be seen as a function in a compiler
function typeEqual( t1, t2 : TypeExp ) : Boolean
Type Equivalence (Continued…)
The typeEqual() function takes two type expressions and returns true if they represent the same type according to the type equivalence rules of the language
One issue that relates directly to the description of a type equivalence algorithm is the way type expressions are represented within a compiler
One straightforward method is to use a syntax tree representation
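With type expressions stored as syntax trees, one possible equivalence rule is structural equivalence: two types are equal when their trees have the same shape. The sketch below assumes that rule and a hypothetical `TypeExp` node layout; a language may define equivalence differently (for example by name), and this version does not handle recursive named types.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { T_INT, T_DOUBLE, T_ARRAY, T_POINTER } TypeKind;

/* Syntax-tree node for a type expression (assumed layout). */
typedef struct TypeExp {
    TypeKind kind;
    int size;              /* number of elements, used by arrays only */
    struct TypeExp *elem;  /* element or pointee type, NULL for simple types */
} TypeExp;

/* Structural equivalence: same constructor at every node of both trees. */
bool typeEqual(const TypeExp *t1, const TypeExp *t2) {
    if (t1 == NULL || t2 == NULL)
        return t1 == t2;
    if (t1->kind != t2->kind)
        return false;
    switch (t1->kind) {
    case T_INT:
    case T_DOUBLE:
        return true;                       /* simple types match by kind */
    case T_ARRAY:
        return t1->size == t2->size && typeEqual(t1->elem, t2->elem);
    case T_POINTER:
        return typeEqual(t1->elem, t2->elem);
    }
    return false;
}
```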
Type Inference & Type Checking
Type checking is described in terms of semantic actions based on a representation of types and a typeEqual() operation
The compiler also needs a symbol table for this purpose, along with its three basic operations: insert, lookup, and delete
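The three symbol table operations can be sketched with a simple linked list. This is only an illustration under stated assumptions: the names `Symbol`, `st_insert`, `st_lookup`, and `st_delete` are hypothetical, the type is stored as text for brevity, and production compilers typically use hash tables with scope handling instead.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Symbol {
    const char *name;     /* not copied; the caller keeps the string alive */
    const char *type;     /* type expression, kept as text for brevity */
    struct Symbol *next;
} Symbol;

/* Adds a binding at the front, so newer declarations shadow older ones.
   Returns the new list head, or the old head if allocation fails. */
Symbol *st_insert(Symbol *table, const char *name, const char *type) {
    Symbol *s = malloc(sizeof *s);
    if (s == NULL)
        return table;
    s->name = name;
    s->type = type;
    s->next = table;
    return s;
}

/* Returns the most recent binding for name, or NULL if there is none. */
const Symbol *st_lookup(const Symbol *table, const char *name) {
    for (; table != NULL; table = table->next)
        if (strcmp(table->name, name) == 0)
            return table;
    return NULL;
}

/* Removes the most recent binding for name; returns the new list head. */
Symbol *st_delete(Symbol *table, const char *name) {
    Symbol **link = &table;
    while (*link != NULL && strcmp((*link)->name, name) != 0)
        link = &(*link)->next;
    if (*link != NULL) {
        Symbol *dead = *link;
        *link = dead->next;
        free(dead);
    }
    return table;
}
```

Inserting at the front means that deleting an inner declaration automatically makes the outer one visible again.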
Type Inference & Type Checking (Continued…)
Consider the following grammar;
Intermediate-Code Generation
Back-end of a Compiler
Where Are We Now?
[Figure: phases of the compiler. Source code → Scanner → Tokens → Parser → Syntax Tree → Semantics Analyzer → Annotated Tree → Intermediate Code Generator → Intermediate code]
Intermediate-Code Generation
In the analysis-synthesis model of a compiler, the front end analyzes a source program and creates an intermediate representation, from which the back end generates target code
Ideally, details of the source language are confined to the front end, and details of the target machine to the back end
With a suitably defined intermediate representation, a compiler for language i and machine j can then be built by combining the front end for language i with the back end for machine j
Intermediate-Code Generation (Continued…)
The following figure shows the front-end model of a compiler
Static checking includes type checking, which ensures that operators are applied to compatible operands
Static checking also includes any syntactic checks that remain after parsing
For example, a break statement in C must be enclosed within a while, for, or switch statement
Intermediate-Code Generation (Continued…)
While translating a program, a compiler may construct a sequence of intermediate representations
High-level representations are close to the source language and low-level representations are close to the target machine
Abstract syntax trees are a high-level intermediate representation; they depict the natural hierarchical structure of the source program
[Figure: Source Program → High-Level Intermediate Representation → Low-Level Intermediate Representation → Target Code]
Intermediate-Code Generation (Continued…)
A low-level representation is suitable for machine-dependent tasks like register allocation and instruction selection
Three-address code can range from high- to low-level, depending upon the choice of operators
For expressions, the differences between syntax trees and three-address code are superficial
For looping statements, for example, a syntax tree represents the components of a statement, whereas three-address code contains labels and jump instructions to represent the flow of control, as in machine language
Intermediate-Code Generation (Continued…)
The choice or design of an intermediate representation varies from compiler to compiler
An intermediate representation may either be an actual language or it may consist of internal data structures that are shared by phases of the compiler
C is a programming language, yet it is often used as an intermediate form because it is flexible, it compiles into efficient machine code, and its compilers are widely available
The original C++ compiler consisted of a front end that generated C, treating a C compiler as a back end
Variants of Syntax Trees
Nodes in a syntax tree represent constructs in the source program
The children of a node represent the meaningful components of a construct
A directed acyclic graph (DAG) for an expression identifies the common subexpressions of the expression
Directed Acyclic Graphs for Expressions
A directed acyclic graph (DAG) is a directed graph with no directed cycles
Like the syntax tree for an expression, a DAG has leaves corresponding to atomic operands and interior nodes corresponding to operators
A node N in a DAG has more than one parent if N represents a common subexpression
A DAG not only represents expressions more succinctly, it also gives the compiler important clues regarding the generation of efficient code to evaluate the expression
Directed Acyclic Graphs for Expressions (Continued…)
Create syntax trees and DAGs for the following expressions
a = a + 10
a + b + (a + b)
a + b + a + b
a + a * (b - c) + (b - c) * d
The Value-Number Method for Constructing DAGs
Often, the nodes of a syntax tree or DAG are stored in an array of records
Each row of the array represents one record, and therefore one node
Consider the figure on the next slide, which shows a DAG along with an array for the expression i = i + 10
The Value-Number Method for Constructing DAGs (Continued…)
In the following figure, leaves have one additional field, which holds the lexical value, and interior nodes have two additional fields indicating the left and right children
The Value-Number Method for Constructing DAGs (Continued…)
In the array, we refer to nodes by giving the integer index of the record for that node within the array
This integer is called the value number for the node or for the expression represented by the node
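A minimal sketch of the value-number idea follows. Before creating a node, the array is searched for an identical record; if one exists, its index is reused, which is what turns a tree into a DAG. The record layout, the name `value_number`, and the linear search are assumptions made for brevity (a hash table is the usual choice), and value numbers here start at 0.

```c
#define MAX_NODES 64

/* One record per node. For leaves, op is 'i' (identifier) or 'n' (number),
   left holds the lexical value (symbol id or constant) and right is -1.
   For interior nodes, left and right are the value numbers of the children. */
typedef struct {
    char op;
    int left, right;
} Node;

static Node nodes[MAX_NODES];
static int node_count = 0;

/* Returns the value number of <op, left, right>, creating the record only
   if no identical one exists. Returns -1 if the array is full. */
int value_number(char op, int left, int right) {
    for (int k = 0; k < node_count; k++)
        if (nodes[k].op == op && nodes[k].left == left && nodes[k].right == right)
            return k;
    if (node_count == MAX_NODES)
        return -1;
    nodes[node_count].op = op;
    nodes[node_count].left = left;
    nodes[node_count].right = right;
    return node_count++;
}
```

For i = i + 10, the leaf for i is created once and shared by both the assignment node and the addition node, because the second request for the same record returns the existing value number.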
Three-Address Code
In three-address code, there is at most one operation on the right side of an instruction
An expression like x+y*z might be translated into the sequence of three-address instructions
t1 = y*z
t2 = x+t1
t1 and t2 are compiler-generated temporary names
The use of names for intermediate values computed by a program allows three-address code to be rearranged easily
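The translation of x+y*z can be sketched as a post-order walk over the expression tree that emits one instruction per operator and invents a fresh temporary for each result. The `Expr` layout and the function `gen` are hypothetical, and names longer than 15 characters would be truncated in this simplified version.

```c
#include <stdio.h>
#include <string.h>

typedef struct Expr {
    char op;                         /* 0 for a leaf, otherwise '+', '*', ... */
    const char *name;                /* leaf name such as "x"; unused otherwise */
    const struct Expr *left, *right;
} Expr;

static int temp_count = 0;           /* number of temporaries generated so far */

/* Appends the three-address instructions for e to the string in out
   (capacity out_size) and writes the name holding e's value into place,
   which must have room for at least 16 bytes. */
void gen(const Expr *e, char *out, size_t out_size, char *place) {
    if (e->op == 0) {                /* a leaf is its own address */
        snprintf(place, 16, "%s", e->name);
        return;
    }
    char l[16], r[16];
    gen(e->left, out, out_size, l);  /* operands first */
    gen(e->right, out, out_size, r);
    snprintf(place, 16, "t%d", ++temp_count);
    size_t used = strlen(out);
    snprintf(out + used, out_size - used, "%s = %s %c %s\n", place, l, e->op, r);
}
```

Called on the tree for x+y*z with an empty output buffer, this emits the multiplication into t1 first and then the addition into t2, matching the sequence on the slide.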
Three-Address Code (Continued…)
Exercise: Represent the following DAG as a three-address code sequence
Addresses and Instructions
Three-address code is built from two concepts: addresses and instructions
In object-oriented terms, these concepts correspond to classes, and the various kinds of addresses and instructions correspond to appropriate subclasses
Alternatively, three-address code can be implemented using records with fields for the addresses
These records are called quadruples and triples
Addresses and Instructions (Continued…)
In the three-address code scheme, an address can be one of the following
A name: The names that appear in the source program. In an implementation, a source name is replaced by a pointer to its symbol table entry, where all the information about the name is kept
A constant: In practice, a compiler must deal with many different types of constants and variables
A compiler-generated temporary: It is useful, especially in optimizing compilers, to create a distinct name each time a temporary is needed
Addresses and Instructions (Continued…)
A few examples of three-address instructions are given below
Assignment instructions of the form x = y op z
Assignments of the form x = op y
Copy instructions of the form x = y
An unconditional jump goto L
Conditional jumps of the form if x goto L
Indexed copy instructions of the form x = y[z] or y[z] = x, etc.
Addresses and Instructions (Continued…)
Consider the following statement and its three-address code in the figures
do
    i = i+1;
while( a[i]<v );
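Since the figure is not reproduced here, the sketch below shows one possible three-address translation of the loop, written as C with a label and goto so that each statement has at most one operation. The function name `scan` and the temporaries are assumptions; genuinely low-level code would also compute the byte offset of a[i] explicitly.

```c
/* Runs  do i = i+1; while (a[i] < v);  in three-address style and returns
   the final value of i. The caller must guarantee that some element at an
   index greater than the initial i is not less than v, otherwise the loop
   runs past the end of the array. */
int scan(const int a[], int i, int v) {
    int t1, t3;
L:
    t1 = i + 1;            /* t1 = i + 1            */
    i = t1;                /* i = t1                */
    t3 = a[i];             /* t3 = a[i]; C scales the index itself, whereas
                              low-level code would first compute
                              t2 = i * element_width */
    if (t3 < v) goto L;    /* if t3 < v goto L      */
    return i;
}
```

The loop structure of the source statement has disappeared: control flow is now expressed only by the label L and the conditional jump.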
Quadruples & Triples
The description of three-address instructions specifies the components of each type of instruction, but it does not specify the representation of these instructions in a data structure
In a compiler, these instructions can be implemented as objects or as records with fields for the operator and the operands
Three such representations are called “quadruples”, “triples”, and “indirect triples”
Quadruples
A quadruple, or just “quad”, has four fields, which we call op, arg1, arg2, and result
In x=y+z, ‘+’ is op, y and z are arg1 and arg2, and x is result
The following are some exceptions to this rule
Instructions with unary operators like x = minus y or x = y do not use arg2
Operators like param use neither arg2 nor result
Conditional and unconditional jumps put the target label in result
Quadruples (Continued…)
Example: Three-address code for the assignment a = b*-c+b*-c is shown below
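As a sketch of how this example could be stored, the block below lists one plausible quadruple sequence for a = b*-c+b*-c, with the unary minus written as the operator minus. The `Quad` layout and the helper `format_quad` are hypothetical, and arg2 is NULL wherever the operator does not use it.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *op, *arg1, *arg2, *result;   /* arg2 is NULL when unused */
} Quad;

/* a = b * -c + b * -c, one operator per quadruple. */
const Quad quads[] = {
    { "minus", "c",  NULL, "t1" },
    { "*",     "b",  "t1", "t2" },
    { "minus", "c",  NULL, "t3" },
    { "*",     "b",  "t3", "t4" },
    { "+",     "t2", "t4", "t5" },
    { "=",     "t5", NULL, "a"  },
};
const int quad_count = sizeof quads / sizeof quads[0];

/* Writes the quadruple as a three-address instruction into buf. */
void format_quad(const Quad *q, char *buf, size_t n) {
    if (strcmp(q->op, "=") == 0)
        snprintf(buf, n, "%s = %s", q->result, q->arg1);          /* copy */
    else if (q->arg2 == NULL)
        snprintf(buf, n, "%s = %s %s", q->result, q->op, q->arg1); /* unary */
    else
        snprintf(buf, n, "%s = %s %s %s", q->result, q->arg1, q->op, q->arg2);
}
```

Because every result has an explicit name, quadruples can be reordered without touching the other entries.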
Triples
A triple has only three fields, which we call op, arg1, and arg2
In the earlier example we saw that the result field is used primarily for temporary names
Using triples, we refer to the result of an operation x op y by its position rather than by an explicit temporary name
Consider the figure on the next slide for details
Triples (Continued…)
Example: Three-address code using triples
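As a sketch, the assignment a = b*-c+b*-c can be written as triples in which an argument is either a name or the position of an earlier triple. The types `Arg` and `Triple`, the macros, and the helper `count_refs` are assumptions made for this illustration.

```c
#include <stddef.h>

typedef enum { ARG_NONE, ARG_NAME, ARG_POS } ArgKind;

typedef struct {
    ArgKind kind;
    const char *name;   /* valid when kind == ARG_NAME */
    int pos;            /* valid when kind == ARG_POS: index of a triple */
} Arg;

typedef struct {
    const char *op;
    Arg arg1, arg2;
} Triple;

#define NAME(s) { ARG_NAME, s, -1 }
#define POS(i)  { ARG_POS, NULL, i }
#define NONE    { ARG_NONE, NULL, -1 }

/* a = b * -c + b * -c; results are referred to by position. */
const Triple triples[] = {
    { "minus", NAME("c"), NONE   },   /* 0 */
    { "*",     NAME("b"), POS(0) },   /* 1 */
    { "minus", NAME("c"), NONE   },   /* 2 */
    { "*",     NAME("b"), POS(2) },   /* 3 */
    { "+",     POS(1),    POS(3) },   /* 4 */
    { "=",     NAME("a"), POS(4) },   /* 5 */
};

/* Counts how many arguments in t[0..n-1] refer to position pos. */
int count_refs(const Triple *t, int n, int pos) {
    int refs = 0;
    for (int k = 0; k < n; k++) {
        if (t[k].arg1.kind == ARG_POS && t[k].arg1.pos == pos) refs++;
        if (t[k].arg2.kind == ARG_POS && t[k].arg2.pos == pos) refs++;
    }
    return refs;
}
```

The position references are also the weakness of plain triples: moving an instruction changes its index, so every triple that refers to it must be updated, which is the problem indirect triples are meant to avoid.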
Static Single-Assignment Form
The Static Single-Assignment Form (SSA) is an intermediate representation that facilitates certain code optimizations
Two aspects distinguish SSA from three-address code
All assignments in SSA are to variables with distinct names
SSA uses a notational convention called the Φ-function to combine two definitions of the same variable
Original code:
if( flag ) x = -1; else x = 1;
y = x + a
SSA form:
if( flag ) x1 = -1; else x2 = 1;
x3 = Φ(x1,x2)
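C has no Φ-function, so the sketch below only imitates the SSA version: every name is assigned exactly once, and a conditional expression stands in for Φ by selecting the value from the branch that was taken. The function names `y_original` and `y_ssa` are hypothetical. In a real SSA representation, Φ is resolved by which control-flow edge reached the merge point, not by testing flag again.

```c
/* Original form: x is assigned in two places. */
int y_original(int flag, int a) {
    int x;
    if (flag) x = -1; else x = 1;
    int y = x + a;
    return y;
}

/* SSA-style form: x1, x2, x3, and y1 are each assigned exactly once. */
int y_ssa(int flag, int a) {
    int x1, x2;
    if (flag) x1 = -1; else x2 = 1;
    int x3 = flag ? x1 : x2;   /* stands in for x3 = Φ(x1, x2) */
    int y1 = x3 + a;           /* later uses refer to x3, never to x1 or x2 */
    return y1;
}
```

Only the name assigned on the taken branch is ever read, so both functions compute the same result for every input.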
Summary
Any Questions?