Date posted: 16-Jul-2015
Category: Education
Uploaded by: adilmehmood93
Agenda of today's presentation
• What is a compiler
• Brief history of the compiler
• Tasks of a compiler
• Phases of a compiler
Source code → Compiler → Machine code
What is a Compiler
• Is a program that translates one language to another
• Takes as input a source program typically written in a high-level language
• Produces an equivalent target program typically in assembly or machine language
• Reports error messages as part of the translation process
Brief history of the Compiler
• The term “compiler” was coined in the early 1950s by Grace Murray Hopper
• The first compiler for the high-level language FORTRAN was developed between 1954 and 1957 at IBM
• The first FORTRAN compiler took 18 person-years to create
Compiler tasks
A compiler must perform two tasks:
• Analysis of the source program: the analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them. It then uses this structure to create an intermediate representation of the source program.
• Synthesis of the corresponding target program: the synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table.
The analysis part is often called the front end of the compiler; the synthesis part is the back end.
Compiler phases
• Lexical Analyzer
• Syntax Analyzer
• Semantic Analyzer
• Intermediate Code Generator
• Code Optimizer
• Code Generation
Lexical Analysis (scanner): The first phase of a compiler
• The lexical analyzer reads the stream of characters making up the source program and groups the characters into meaningful sequences called lexemes
• For each lexeme, the lexical analyzer produces a token of the form (token-name, attribute-value), which it passes on to the subsequent phase, syntax analysis
• token-name: an abstract symbol that is used during syntax analysis
• attribute-value: points to an entry in the symbol table for this token
• Tokens represent basic program entities such as identifiers, literals, reserved words, operators, delimiters, etc.
Example:
1. "position" is a lexeme mapped into the token (id, 1), where id is an abstract symbol standing for identifier and 1 points to the symbol-table entry for position. The symbol-table entry for an identifier holds information about the identifier, such as its name and type.
2. = is a lexeme that is mapped into the token (=). Since this token needs no attribute-value, the second component is omitted. For notational convenience, the lexeme itself is used as the name of the abstract symbol.
3. "initial" is a lexeme that is mapped into the token (id, 2), where 2 points to the symbol-table entry for initial.
4. + is a lexeme that is mapped into the token (+).
5. "rate" is a lexeme mapped into the token (id, 3), where 3 points to the symbol-table entry for rate.
6. * is a lexeme that is mapped into the token (*).
7. 60 is a lexeme that is mapped into the token (60).
Blanks separating the lexemes are discarded by the lexical analyzer.
position = initial + rate * 60

Symbol table
lexeme      token
position    (id, 1)
initial     (id, 2)
rate        (id, 3)
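The mapping from lexemes to tokens described above can be sketched in a few lines of Python. This is only an illustration: the pattern names, the `tokenize` function and the list-based symbol table are assumptions made for this sketch, not part of the slides.

```python
import re

# Token patterns, tried in order. The names are illustrative assumptions.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a tiny assignment language."""
    symbol_table = []  # entry i (1-based) holds the i-th distinct identifier
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # blanks separating lexemes are discarded
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # attribute-value: position of the identifier in the symbol table
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        else:
            # operators and numbers: the lexeme itself names the token
            tokens.append((lexeme,))
    return tokens, symbol_table


tokens, table = tokenize("position = initial + rate * 60")
print(tokens)
print(table)
```

Running it on the example statement yields the seven tokens listed above and a symbol table holding position, initial and rate, in that order.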
Syntax Analysis (parser) : The second phase of the compiler
• The parser uses the first components of the tokens produced by the lexical analyzer to create a tree-like intermediate representation that depicts the grammatical structure of the token stream.
• A typical representation is a syntax tree in which each interior node represents an operation and the children of the node represent the arguments of the operation
Token stream: (id, 1) (=) (id, 2) (+) (id, 3) (*) (60)
Syntax tree:
=
  (id, 1)
  +
    (id, 2)
    *
      (id, 3)
      60
Syntax Analysis Example
Pay = Base + Rate * 60
The seven tokens are grouped into a parse tree:
assignment stmt
  identifier: pay
  =
  expression
    expression
      identifier: base
    +
    expression: Rate * 60
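One possible way to build such a tree is a small recursive-descent parser. The grammar below (an assignment whose right-hand side uses + and *, with * binding tighter) and the tuple representation of tree nodes are assumptions chosen for this sketch. The slides do not prescribe a parsing method.

```python
def parse_assignment(tokens):
    """Parse  id = expr,  with  expr -> term ('+' term)*  and  term -> factor ('*' factor)*."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = advance()
        if token[0] == "id":
            return f"id{token[1]}"  # leaf: identifier, named by its table index
        if token[0].isdigit():
            return int(token[0])  # leaf: integer constant
        raise SyntaxError(f"unexpected token {token}")

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())  # interior node: operator with two children
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if not isinstance(target, str):
        raise SyntaxError("left-hand side must be an identifier")
    if peek() != "=":
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


# Token stream for: position = initial + rate * 60
tokens = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("60",)]
tree = parse_assignment(tokens)
print(tree)
```

Each interior node is a tuple (operator, left child, right child), so the multiplication ends up below the addition, as in the syntax tree for the example.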
Semantic Analysis: Third phase of the compiler
• The semantics of a program are its meaning, as opposed to its syntax or structure
• The semantics consist of:
– Runtime semantics: the behavior of the program at runtime
– Static semantics: checked by the compiler
• Static semantics include:
– Declaration of variables and constants before use
– Calling functions that exist (predefined in a library or defined by the user)
– Passing parameters properly
– Type checking
• The semantic analyzer annotates the syntax tree with type information
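As a rough illustration of two of these static checks (declaration before use and type checking), the sketch below walks a syntax tree, rejects undeclared names, and annotates mixed int/float operations with an explicit inttofloat conversion. The `SYMBOLS` table, the assumption that all three identifiers are floats, and the tuple tree format are hypothetical choices made for this example.

```python
# Hypothetical declarations: the slides give no types, so float is assumed here.
SYMBOLS = {"id1": "float", "id2": "float", "id3": "float"}


def check(node):
    """Return (annotated_node, type) or raise if a static rule is violated."""
    if isinstance(node, int):
        return node, "int"
    if isinstance(node, str):
        if node not in SYMBOLS:
            raise NameError(f"{node} is used before it is declared")
        return node, SYMBOLS[node]
    op, left, right = node
    left, left_type = check(left)
    right, right_type = check(right)
    if left_type == right_type:
        return (op, left, right), left_type
    if op == "=" and left_type == "int":
        raise TypeError("cannot assign a float value to an int variable")
    # Mixed int/float operands: widen the int side and record it in the tree.
    if left_type == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (op, left, right), "float"


# Syntax tree for: position = initial + rate * 60
tree = ("=", "id1", ("+", "id2", ("*", "id3", 60)))
annotated, result_type = check(tree)
print(annotated)
print(result_type)
```

In the annotated tree the constant 60 is wrapped in an inttofloat node, because it is multiplied by a float identifier.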
Intermediate Code Generation: three-address code
After syntax and semantic analysis of the source program, many compilers generate an explicit low-level or machine-like intermediate representation (a program for an abstract machine). This intermediate representation should have two important properties:
– it should be easy to produce, and
– it should be easy to translate into the target machine.
The intermediate form considered here is called three-address code, which consists of a sequence of assembly-like instructions with three operands per instruction. Each operand can act like a register.
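A minimal sketch of how three-address code might be produced from an annotated syntax tree is shown below. The tuple tree format, the `to_three_address` function and the temporary names t1, t2, ... are assumptions made for this illustration.

```python
def to_three_address(tree):
    """Flatten an assignment tree into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def emit(node):
        """Emit code for node and return the name or constant holding its value."""
        if not isinstance(node, tuple):
            return str(node)  # identifier or constant: no code needed
        if node[0] == "inttofloat":
            arg = emit(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        left_name = emit(left)
        right_name = emit(right)
        temp = new_temp()  # each operation stores its result in a fresh temporary
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree
    code.append(f"{target} = {emit(value)}")
    return code


# Annotated tree for: position = initial + rate * 60
tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", 60))))
code = to_three_address(tree)
for line in code:
    print(line)
```

For the example statement this produces four instructions: the conversion of 60, the multiplication, the addition, and the final copy into id1.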
Code Optimization: to generate better target code
• The machine-independent code-optimization phase attempts to improve the intermediate code so that better target code will result.
• Usually better means:
– faster, shorter code, or target code that consumes less power.
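• In the running example, the three-address code is t1 = inttofloat(60), t2 = id3 * t1, t3 = id2 + t2, id1 = t3. The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 with the floating-point number 60.0. Moreover, t3 is used only once, to transmit its value to id1, so it can be eliminated as well, leaving t1 = id3 * 60.0 and id1 = id2 + t1.
• There are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much.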
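The two improvements described for the example (converting the constant at compile time and dropping a temporary that is only copied) could be sketched as follows. The instruction format (dest, op, arg1, arg2), the "copy" pseudo-operation and the convention that temporaries start with "t" are assumptions made for this sketch. Real optimizers are considerably more general.

```python
def optimize(code):
    """Fold constant inttofloat conversions, then merge single-use temporaries into copies."""
    # Pass 1: perform inttofloat on constants at compile time.
    constants = {}
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttofloat" and isinstance(a, int):
            constants[dest] = float(a)  # remember the converted value, emit nothing
            continue
        folded.append((dest, op, a, b))

    def uses(name):
        return sum((x == name) + (y == name) for _, _, x, y in folded)

    # Pass 2: if "x = copy t" directly follows the definition of temporary t
    # and t is used nowhere else, let the defining instruction write to x.
    result = []
    for dest, op, a, b in folded:
        is_temp = isinstance(a, str) and a.startswith("t")
        if op == "copy" and is_temp and result and result[-1][0] == a and uses(a) == 1:
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))
    return result


# Three-address code for: position = initial + rate * 60
code = [
    ("t1", "inttofloat", 60, None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
optimized = optimize(code)
for instruction in optimized:
    print(instruction)
```

The result has two instructions instead of four. This sketch does not renumber temporaries, so the surviving temporary keeps the name t2.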
Code Generation: takes as input an intermediate representation of the source program and maps it into the target language
• If the target language is machine code, registers or memory locations are selected for each of the variables used by the program.
• Then, the intermediate instructions are translated into sequences of machine instructions that perform the same task.
• A crucial aspect of code generation is the judicious assignment of registers to hold variables.
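To make the mapping from intermediate instructions to machine instructions concrete, here is a deliberately naive sketch. The target is a hypothetical register machine: the mnemonics LDF, MULF, ADDF and STF, the register names R1, R2, ... and the rule that program variables start with "id" are all assumptions for illustration. Registers are handed out in order and never reused, which a real code generator would avoid.

```python
# Hypothetical floating-point mnemonics for an imaginary target machine.
MNEMONIC = {"+": "ADDF", "*": "MULF"}


def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into assembly-like lines."""
    asm = []
    reg_of = {}  # temporary name -> register currently holding its value
    next_reg = 1

    def to_register(name):
        nonlocal next_reg
        if name in reg_of:
            return reg_of[name]  # temporaries already live in a register
        reg = f"R{next_reg}"
        next_reg += 1
        asm.append(f"LDF {reg}, {name}")  # load a variable from memory
        return reg

    for dest, op, a, b in code:
        ra = to_register(a)
        # Floating-point constants become immediate operands.
        rb = f"#{b}" if isinstance(b, float) else to_register(b)
        asm.append(f"{MNEMONIC[op]} {ra}, {ra}, {rb}")
        if dest.startswith("id"):
            asm.append(f"STF {dest}, {ra}")  # program variable: store to memory
        else:
            reg_of[dest] = ra  # temporary: keep it in the register
    return asm


# Optimized intermediate code for: position = initial + rate * 60
code = [("t2", "*", "id3", 60.0), ("id1", "+", "id2", "t2")]
asm = generate(code)
for line in asm:
    print(line)
```

The sketch overwrites the register of the first operand with the result, which is only safe here because each temporary is used once.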