Date posted: 15-Apr-2017
Category: Software
Uploaded by: muhammed-afsal-villan
COMPILER CONSTRUCTION
Jeena Thomas, Asst Professor, CSE, SJCET Palai
MAIN TOPICS COVERED
» Phases of a compiler
» Analysis and synthesis phases
WHAT IS A COMPILER?
» A compiler is a kind of translator.
» A translator is software that accepts text in one language (the source language) and produces text in another language (the target or object language), preserving the meaning of the text.
» A translator is a generalized form of a compiler.
» When the object language is a low-level language, such a translator is called a compiler.
» This conversion process is essential for the hardware to interpret and perform the semantics of the input program.
» As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.
Error message by compiler
» A compiler is a program that reads a program written in a source language and translates it into an equivalent program in a target language.
example
» Source code: a=(b+c)*(b+c)*2
» Target code:
MOV b,R2
ADD R2,c
MUL R2,R2
MUL R2,#2.0
MOV R2,a
The first real compiler
» FORTRAN compilers of the late 1950s
» 18 person-years to build
WHY STUDY COMPILERS?
» Writing a compiler gives a student experience with large-scale applications development. Your compiler program may be the largest program you write as a student. Experience working with really big data structures and complex interactions between algorithms will help you out on your next big programming project.
» Compiler writing is one of the shining triumphs of CS theory. It demonstrates the value of theory over the impulse to just "hack up" a solution.
» Compiler writing is a basic element of programming language research. Many language researchers write compilers for the languages they design.
» Many applications have properties similar to one or more phases of a compiler, and compiler expertise and tools can help an application programmer working on projects other than compilers.
» Throughout the 1950s, compilers were considered difficult programs to write.
» The first Fortran compiler took 18 staff-years to implement.
» Since then, systematic techniques for handling many of the important tasks that occur during compilation have been developed, along with good implementation languages, programming environments, and software tools.
» With these advances, a substantial compiler can be implemented even as a student project in a one-semester compiler-design course.
Compiler technology - applications
» Compiler technology is more broadly applicable and has been employed in rather unexpected areas:
» Text-formatting languages, preprocessor packages
» Silicon compilers for the creation of VLSI circuits
» Command languages of operating systems
» Query languages of database systems
ISSUES IN COMPILATION
» Hierarchy of operations must be maintained, to determine the correct order of evaluation of expressions.
» Data type integrity must be maintained: each part of a complex expression can be of a different type.
» User-defined data types (struct, enum, union): the compiler needs knowledge about their nature.
» Appropriate storage mappings for data structures: allocation of memory for data.
» The compiler must resolve the occurrence of each variable name in a program to determine the name space to which a referenced variable belongs (symbol table).
» The compiler should have facilities to handle different control structures such as ‘if-then-else’, ‘for’, and ‘while’, including incrementing the loop variable and terminating the loop.
PHASES OF COMPILATION
» Because the process of compilation is highly complex, it is split into a series of subprocesses called phases.
» A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation.
» The activities of compilation are split into two parts:
1) Analysis part
2) Synthesis part
The Structure of a Compiler
» Analysis of the source program is done by the front end of the compiler. It determines the meaning of the source string.
» Synthesis of the target program is done by the back end of the compiler. An equivalent target string is constructed from the output given by the front end of the compiler.
Analysis
» In compiling, analysis has three phases:
» Linear analysis: the stream of characters is read from left to right and grouped into tokens; known as lexical analysis or scanning.
» Hierarchical analysis: tokens are grouped hierarchically into collections with collective meaning; known as parsing or syntax analysis.
» Semantic analysis: checks whether the program components fit together meaningfully.
Sub-steps of synthesis part
» Optimization of code
» Allocation of memory
» Generation of code
PHASES OF A COMPILER
PHASES OF COMPILATION
1.LEXICAL ANALYSIS
» Performs the linear analysis on the source program.
» It reads the stream of characters making up the source program from left to right and groups them into tokens.
» A token is defined as a sequence of characters that has a collective meaning.
» For each token identified, this phase also determines the category of the token (identifier, constant, or reserved word) and its attribute, which identifies the symbol's position in the symbol table.
LEXICAL ANALYZER/SCANNER
Symbol Table
» Identifiers are names of variables, constants, functions, data types, etc.
» The symbol table stores information associated with identifiers.
» Information associated with different types of identifiers can be different.
» Information associated with a variable includes its name, type, address, size (for an array), etc.
» Information associated with a function includes its name, type of return value, parameters, address, etc.
EXAMPLE
» Consider the following statement:
» a=(b+c)*(b+c)*2    (1)
Tokens for expression (1)
Symbol   Category     Attribute
a        Identifier   #1
=        operator     Assignment(1)
b        Identifier   #2
+        operator     Arithmetic(1)
c        Identifier   #3
*        operator     Arithmetic(2)
(        operator     Open parenthesis(1)
)        operator     Closed parenthesis(1)
2        Constant     #4
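A scanner that yields this kind of token list can be sketched in a few lines of Python. The regular expressions, the function name `tokenize`, and the decision to number identifiers and constants in one shared table (so that the constant 2 receives #4, as in the table above) are assumptions made for illustration, not part of the slides.

```python
import re

# Token categories follow the slide; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("Identifier", r"[A-Za-z_]\w*"),
    ("Constant",   r"\d+(?:\.\d+)?"),
    ("operator",   r"[=+\-*/()]"),
    ("skip",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Scan left to right; return (tokens, table)."""
    table = {}     # lexeme -> position, shared by identifiers and constants
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        category, lexeme = m.lastgroup, m.group()
        if category == "skip":
            continue
        if category == "operator":
            attribute = lexeme
        else:
            # First occurrence gets the next free position; repeats reuse it.
            index = table.setdefault(lexeme, len(table) + 1)
            attribute = f"#{index}"
        tokens.append((lexeme, category, attribute))
    return tokens, table

tokens, table = tokenize("a=(b+c)*(b+c)*2")
for token in tokens:
    print(*token)
```

Each repeated occurrence of b and c maps back to the position assigned at its first occurrence, which is what lets later phases refer to an identifier by its table entry rather than by its spelling.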
2.SYNTAX ANALYSIS
» This phase performs hierarchical analysis on the source program.
» Here, the tokens are grouped into hierarchically nested collections with collective meaning, called expressions or statements.
» It determines the structure of the source program according to the grammar (syntax) of the language.
» These grammatical phrases are represented in the form of a parse tree.
Parse tree
» Describes the syntactic structure of the input.
» The terminal nodes represent the tokens and the interior nodes represent the expressions.
SYNTAX TREES
» Syntactic structures are also represented using syntax trees.
» A syntax tree is a compressed representation of the parse tree, in which operators appear as interior nodes and the operands of each operator appear as its children.
PARSE TREE Vs. SYNTAX TREE
» A syntax tree is a compressed representation of a parse tree.
» An interior node in a syntax tree represents an operator, whereas an interior node in a parse tree represents an expression.
» A leaf node of a syntax tree represents an operand, whereas a leaf node in a parse tree represents a token.
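As a rough illustration of how a syntax tree can be built, the sketch below uses a small recursive-descent parser. The grammar (assignment, `+`, `*`, parentheses), the function name `parse`, and the nested-tuple tree shape `(operator, left, right)` are assumptions chosen for brevity; the slides do not prescribe a parsing method.

```python
import re

def parse(source):
    """Parse 'id = expr' into a syntax tree of nested tuples (op, left, right)."""
    # Simplification: characters outside this pattern are silently skipped;
    # a real scanner would report them as lexical errors.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+*()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def factor():                        # factor -> id | num | ( expr )
        if peek() == "(":
            advance("(")
            node = expr()
            advance(")")                 # a missing parenthesis is reported here
            return node
        tok = advance()
        if tok in {"=", "+", "*", ")"}:
            raise SyntaxError(f"operand expected, found {tok!r}")
        return tok

    def term():                          # term -> factor { * factor }
        node = factor()
        while peek() == "*":
            advance("*")
            node = ("*", node, factor())
        return node

    def expr():                          # expr -> term { + term }
        node = term()
        while peek() == "+":
            advance("+")
            node = ("+", node, term())
        return node

    target = advance()
    if not target.isidentifier():
        raise SyntaxError(f"identifier expected, found {target!r}")
    advance("=")
    tree = ("=", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("a=(b+c)*(b+c)*2"))
```

Operators end up as interior nodes and operands as leaves, matching the syntax-tree description above; the parentheses of the source text leave no nodes of their own, which is one sense in which the syntax tree is "compressed".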
3.SEMANTIC ANALYSIS
» Goal: to determine the meaning of a source string.
» It checks the source program for semantic errors and gathers type information that can be used in subsequent phases of compilation.
» Type checking for operations is also performed during this phase.
» Output: an annotated tree.
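One way to picture the annotated tree is a type-checking pass that walks a syntax tree, computes a type for every node, and inserts a conversion where an integer meets a real. The sketch below is an illustration under assumptions: the tuple tree shape, the `SYMBOL_TYPES` mapping (a, b, and c declared real), and the function name `annotate` are hypothetical, and only the types int and real are handled.

```python
# Assumed declarations: a, b and c are real variables.
SYMBOL_TYPES = {"a": "real", "b": "real", "c": "real"}

def annotate(node):
    """Return (annotated_node, type), inserting inttoreal where int meets real."""
    if isinstance(node, str):                      # leaf
        if node.isdigit():
            return node, "int"
        if node not in SYMBOL_TYPES:
            raise NameError(f"undeclared identifier {node!r}")
        return node, SYMBOL_TYPES[node]
    op, left, right = node
    left, ltype = annotate(left)
    right, rtype = annotate(right)
    if op == "=":
        if ltype == "real" and rtype == "int":
            right = ("inttoreal", right)           # widen the assigned value
        elif ltype != rtype:
            raise TypeError(f"cannot assign {rtype} to {ltype}")
        return (op, left, right), ltype
    if ltype != rtype:                             # exactly one side is int
        if ltype == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
    result = "real" if "real" in (ltype, rtype) else "int"
    return (op, left, right), result

# Syntax tree for a=(b+c)*(b+c)*2, written out by hand.
tree = ("=", "a", ("*", ("*", ("+", "b", "c"), ("+", "b", "c")), "2"))
annotated, result_type = annotate(tree)
print(annotated)
```

The conversion node around the constant 2 is the information that later phases rely on when they emit an explicit int-to-real conversion.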
4.INTERMEDIATE CODE GENERATION
» It is a part of the synthesis process of the compiler.
» The intermediate code is a representation for an abstract machine.
» Using the intermediate code, optimization and code generation can be performed.
PROPERTIES OF AN INTERMEDIATE CODE
» It should be easy to generate from the semantic representation of the source program.
» It should be easy to translate the intermediate code into the target language.
» It should be capable of holding the values computed during translation.
» It should maintain the precedence ordering of the source language.
» It should be capable of holding the correct number of operands of each instruction.
example
» T1=inttoreal(2)
» T2=b+c
» T3=b+c
» T4=T2*T3
» T5=T4*T1
» a=T5
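Code of this three-address form can be produced by a post-order walk over an annotated syntax tree, allocating a fresh temporary for each interior node. The sketch below assumes a hypothetical nested-tuple tree and a function named `gen_tac`; because it visits children left to right, its temporaries are numbered in a different order from the listing above, although the computation is the same.

```python
def gen_tac(tree):
    """Post-order walk of an annotated syntax tree, emitting three-address code."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"T{counter}"

    def walk(node):
        if isinstance(node, str):            # leaf: identifier or constant
            return node
        if node[0] == "inttoreal":           # unary conversion node
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp}=inttoreal({operand})")
            return temp
        op, left, right = node               # binary operator node
        lhs = walk(left)
        rhs = walk(right)
        temp = new_temp()
        code.append(f"{temp}={lhs}{op}{rhs}")
        return temp

    _, target, expr = tree                   # assumes an assignment at the root
    result = walk(expr)
    code.append(f"{target}={result}")
    return code

# Annotated tree for a=(b+c)*(b+c)*2, written out by hand.
tree = ("=", "a",
        ("*", ("*", ("+", "b", "c"), ("+", "b", "c")), ("inttoreal", "2")))
for line in gen_tac(tree):
    print(line)
```

Because operands are always evaluated before the operator that uses them, the precedence ordering of the source expression is preserved without any parentheses in the generated code.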
5.CODE OPTIMIZATION
» The main aim of this phase is to improve the intermediate code so that the generated code runs faster and/or occupies less space.
» It involves a trade-off between compilation speed and execution speed.
EXAMPLE
» T1=inttoreal(2)
» T2=b+c
» T3=T2*T2
» T4=T3*T1
» a=T4
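The improvement in this example is the removal of the repeated computation b+c, a transformation usually called common subexpression elimination. A minimal sketch follows, assuming straight-line code in which no operand is reassigned between the two occurrences; the tuple format `(dest, op, arg1, arg2)` and the function name are illustrative choices. Unlike the listing above, the sketch does not renumber the surviving temporaries.

```python
def eliminate_common_subexpressions(code):
    """code: list of (dest, op, arg1, arg2); a plain copy uses op None.

    Assumes straight-line code where operands are not reassigned in between.
    """
    available = {}   # (op, arg1, arg2) -> name that already holds the value
    alias = {}       # eliminated temporary -> surviving temporary
    result = []
    for dest, op, a1, a2 in code:
        a1 = alias.get(a1, a1)               # rewrite uses of eliminated temps
        a2 = alias.get(a2, a2)
        key = (op, a1, a2)
        if op is not None and key in available:
            alias[dest] = available[key]     # reuse the value, emit nothing
            continue
        if op is not None:
            available[key] = dest
        result.append((dest, op, a1, a2))
    return result

unoptimized = [
    ("T1", "inttoreal", "2", None),
    ("T2", "+", "b", "c"),
    ("T3", "+", "b", "c"),                   # same value as T2
    ("T4", "*", "T2", "T3"),
    ("T5", "*", "T4", "T1"),
    ("a", None, "T5", None),
]
optimized = eliminate_common_subexpressions(unoptimized)
for instruction in optimized:
    print(instruction)
```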
6.CODE GENERATION
» The main aim of this phase is to allocate storage and generate relocatable machine or assembly code.
» Memory locations and registers are allocated for variables.
» The instructions in intermediate code format are converted into machine instructions.
example
» MOV b, R2
» ADD R2,c
» MUL R2,R2
» MUL R2, #2.0
» MOV R2, a
7.TARGET CODE OPTIMIZATION
» The compiler also attempts to improve the target code generated by the code generator by choosing proper addressing modes to improve the performance, replacing slow instructions by fast ones and eliminating redundant instructions.
» For example, MUL R2, #2.0 is replaced by SHL R2 (shift-left instruction).
Example
» MOV b, R2
» ADD R2,c
» MUL R2,R2
» SHL R2
» MOV R2, a
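Rewrites of this kind are often implemented as a peephole pass that inspects one or two instructions at a time. The sketch below is only an illustration: the tuple encoding of instructions and both rules are assumptions, and it mirrors the slide in treating multiplication by the constant 2 as a shift, which strictly holds for integer values rather than reals.

```python
def peephole(instructions):
    """Apply two local rewrite rules to straight-line code.

    Each instruction is a tuple such as ("MOV", "b", "R2").
    """
    out = []
    for ins in instructions:
        # Rule 1: multiplication by the constant 2 becomes a shift left.
        if ins[0] == "MUL" and ins[2] in ("#2", "#2.0"):
            ins = ("SHL", ins[1])
        # Rule 2: a MOV that merely undoes the previous MOV (store then
        # reload, or load then store back) is redundant.
        if (out and ins[0] == "MOV" and out[-1][0] == "MOV"
                and ins[1] == out[-1][2] and ins[2] == out[-1][1]):
            continue
        out.append(ins)
    return out

target_code = [
    ("MOV", "b", "R2"),
    ("ADD", "R2", "c"),
    ("MUL", "R2", "R2"),
    ("MUL", "R2", "#2.0"),
    ("MOV", "R2", "a"),
]
for ins in peephole(target_code):
    print(ins[0], ", ".join(ins[1:]))
```

Rule 2 is safe only inside straight-line code; if the second MOV were the target of a jump, it could not be dropped.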
OTHER MODULES INVOLVED
» 1.) Symbol Table Management
» 2.) Literal Table Management
» 3.) Error Detection and Reporting
1.SYMBOL TABLE MANAGEMENT
» A symbol table is a data structure that contains a record for each identifier with fields for the attributes of the identifier.
» This data structure has facilities to manipulate (add/delete) the elements in it.
» An identifier is detected during the lexical analysis phase and entered into the symbol table; its type information is typically filled in by the later analysis phases.
» This information is used during the intermediate code generation and code generation phases of the compiler to verify type information.
SYMBOL TABLE
Address Symbol Attribute Memory Location
1 A id,real 1000
2 B id,real 1100
3 C id,real 1110
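A table like this can be modelled as a dictionary of records keyed by identifier name. The sketch below is a minimal illustration, not a prescribed design: the class and field names are hypothetical, and memory locations are passed in explicitly rather than computed by a storage allocator.

```python
from dataclasses import dataclass

@dataclass
class Symbol:
    address: int      # position of the record in the table
    name: str
    attribute: str    # e.g. "id,real"
    location: int     # memory location assigned to the identifier

class SymbolTable:
    def __init__(self):
        self._records = {}
        self._next_address = 1

    def add(self, name, attribute, location):
        """Insert a record; a second declaration of the same name is an error."""
        if name in self._records:
            raise KeyError(f"multiply declared identifier {name!r}")
        record = Symbol(self._next_address, name, attribute, location)
        self._next_address += 1
        self._records[name] = record
        return record

    def lookup(self, name):
        """Return the record for name, or None if it was never declared."""
        return self._records.get(name)

    def delete(self, name):
        self._records.pop(name, None)

symbols = SymbolTable()
symbols.add("A", "id,real", 1000)
symbols.add("B", "id,real", 1100)
symbols.add("C", "id,real", 1110)
print(symbols.lookup("B"))
```

Keying the records by name gives constant-time lookup on average, and the duplicate check in `add` is where a "multiply declared identifier" error would be detected.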
2.LITERAL TABLE MANAGEMENT
» The literal table maintains the details of constants and strings used in the program.
» It reduces the size of a program in memory by allowing reuse of constants and strings.
» It is also needed by the code generator to construct symbolic addresses for literals.
LITERAL TABLE
Address Literal Attribute Memory Location
4 2 const,int 1200
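The reuse of constants and strings can be sketched as interning: each distinct literal is stored once, and every later occurrence receives the index of the existing entry. The class name `LiteralTable` and the zero-based indices are assumptions for illustration; they do not reproduce the addresses shown in the table above.

```python
class LiteralTable:
    """Store each distinct constant or string once and hand out its index."""

    def __init__(self):
        self._index = {}    # (type name, value) -> index into entries
        self.entries = []   # index -> literal value

    def intern(self, value):
        # The type name is part of the key so that 2 and 2.0 stay distinct
        # even though they compare equal in Python.
        key = (type(value).__name__, value)
        if key not in self._index:
            self._index[key] = len(self.entries)
            self.entries.append(value)
        return self._index[key]

literals = LiteralTable()
first = literals.intern(2)
text = literals.intern("hello")
again = literals.intern(2)        # reuses the entry created for `first`
real = literals.intern(2.0)       # a different literal from the integer 2
print(first, text, again, real, literals.entries)
```

The index returned by `intern` is the kind of handle a code generator could turn into a symbolic address for the literal.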
3.ERROR DETECTION & REPORTING
» Each phase encounters errors.
» After detecting an error, a phase must deal with it so that the process of compilation can continue.
LIST OF ERRORS ENCOUNTERED BY VARIOUS PHASES
» 1. Lexical analyzer: misspelled tokens
» 2. Syntax analyzer: syntax errors such as a missing parenthesis
» 3. Intermediate code generator: incompatible operands for an operator
» 4. Code optimizer: unreachable statements
» 5. Code generator: memory restrictions on storing a variable, for example when the value of an integer variable exceeds its size
» 6. Symbol table: multiply declared identifiers
Exercise
» Show the output of all the phases of the compiler for the following line of code:
» A[index]=4+2+index
The Grouping of Phases
COMPILER CONSTRUCTION TOOLS
» Scanner generators
» Parser generators
» Syntax-directed translation engines
» Automatic code generators
» Data-flow engines
» Scanner generators
» Generate lexical analyzers automatically from language specifications written using regular expressions.
» A scanner generator builds a finite automaton to recognize the regular expressions.
» Example: lex
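To give a feel for the finite automaton such a tool builds, the sketch below hand-codes a tiny table-driven DFA for two token classes, identifiers and integer constants. It is an illustration only: lex itself emits C code from a specification file, and the state names, character classes, and function `longest_match` here are assumptions.

```python
def char_class(ch):
    """Map a character to the input symbol used by the transition table."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# States: "start", "in_id", "in_num". A missing entry means no transition.
TRANSITIONS = {
    ("start", "letter"): "in_id",
    ("start", "digit"): "in_num",
    ("in_id", "letter"): "in_id",
    ("in_id", "digit"): "in_id",
    ("in_num", "digit"): "in_num",
}
ACCEPTING = {"in_id": "Identifier", "in_num": "Constant"}

def longest_match(text, start=0):
    """Run the DFA from `start`; return (category, lexeme) or None."""
    state, pos = "start", start
    last_accept = None
    while pos < len(text):
        nxt = TRANSITIONS.get((state, char_class(text[pos])))
        if nxt is None:
            break                         # automaton is stuck: stop scanning
        state, pos = nxt, pos + 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[start:pos])
    return last_accept

print(longest_match("index+1"))
print(longest_match("42abc"))
```

Remembering the last accepting state and returning the longest lexeme seen is the usual "maximal munch" rule applied by generated scanners.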
» Parser generators
» They produce syntax analyzers from a context-free grammar (CFG).
» As the syntax analysis phase is highly complex and consumes considerable manual effort and compilation time, parser generators are highly useful.
» Example: yacc
» Syntax-directed translation engines
» These engines have routines to traverse the parse tree and produce intermediate code.
» The basic idea is that one or more translations are associated with each node of the parse tree.
» Automatic code generators
» These tools convert the intermediate language into machine language for the target machine using a collection of rules.
» A template matching process is used.
» An intermediate language statement is replaced by its equivalent machine language statement.
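Template matching can be sketched as a lookup from the operator of an intermediate statement to a fixed instruction pattern with holes for the operands. The templates below are assumptions modelled on the MOV/ADD/MUL notation used in the earlier examples, with a single working register; they are not the rule set of any real code generator.

```python
# One template per intermediate operator. Placeholders: d = destination,
# a and b = operands, r = working register.
TEMPLATES = {
    "+": ["MOV {a},{r}", "ADD {r},{b}", "MOV {r},{d}"],
    "*": ["MOV {a},{r}", "MUL {r},{b}", "MOV {r},{d}"],
    None: ["MOV {a},{r}", "MOV {r},{d}"],        # plain copy: d = a
}

def generate(code, register="R2"):
    """Expand each (dest, op, arg1, arg2) statement through its template."""
    out = []
    for dest, op, a, b in code:
        for template in TEMPLATES[op]:
            out.append(template.format(d=dest, a=a, b=b, r=register))
    return out

intermediate = [
    ("T2", "+", "b", "c"),       # T2 = b + c
    ("a", None, "T2", None),     # a = T2
]
for line in generate(intermediate):
    print(line)
```

Expanding statements one at a time produces correct but naive code, such as storing T2 and immediately reloading it, which is why a later target code optimization pass is useful.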
» Data-flow engines
» They are used in code optimization.
» These tools perform good code optimization using “data-flow analysis”, which gathers information about how values flow from one part of the program to another.
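As one small instance of data-flow analysis, the sketch below performs a backward liveness pass over straight-line three-address code and drops assignments whose results are never used. The tuple format `(dest, op, arg1, arg2)`, the function name, and the assumption that statements have no side effects are illustrative choices; a real engine would also handle branches and loops by iterating to a fixed point.

```python
def remove_dead_code(code, live_out):
    """Backward liveness over straight-line code.

    code: list of (dest, op, arg1, arg2); operands that are names are strings.
    live_out: names whose values are needed after the last statement.
    """
    live = set(live_out)
    kept = []
    for dest, op, a1, a2 in reversed(code):
        if dest not in live:
            continue                     # value is never used afterwards
        live.discard(dest)               # this statement defines dest ...
        live.update(x for x in (a1, a2) if isinstance(x, str))  # ... and uses these
        kept.append((dest, op, a1, a2))
    kept.reverse()
    return kept

code = [
    ("T1", "+", "b", "c"),
    ("T2", "+", "b", "c"),               # T2 is never read below
    ("T3", "*", "T1", "T1"),
    ("a", None, "T3", None),
]
for statement in remove_dead_code(code, live_out={"a"}):
    print(statement)
```

The set `live` is the information that "flows" here: it is propagated from the end of the code toward the beginning, and each statement updates it according to what it defines and uses.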