CADSL
Compiler Construction An Introduction
Virendra Singh
Associate Professor Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/ E-mail: [email protected]
A Short Course on Compiler Construction & Optimization Lecture 1 (22 May 2014)
Use of Computers: Problem Solving
1. Problem definition
2. Algorithm design / algorithm specification
3. Algorithm analysis
4. Implementation
5. Testing
6. [Maintenance]
22 May 2014 Compilers@EE-IITB 2
Running Program on Processor
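\[
\text{Processor Performance} = \frac{\text{Time}}{\text{Program}}
= \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}}
\times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}}
\times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}
\]
Architecture (Compiler Designer) --> Implementation (Processor Designer) --> Realization (Chip Designer)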
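As a rough worked example of the product of code size, CPI, and cycle time, the sketch below compares two versions of a program. All numbers (instruction counts, a CPI of 2.0, a 1 ns cycle) are made up for illustration and do not come from the lecture.

```python
def execution_time(instructions, cpi, cycle_time_s):
    """Time per program = (instructions/program) * (cycles/instruction) * (time/cycle)."""
    return instructions * cpi * cycle_time_s

# Hypothetical figures: a compiler optimization removes 20% of the instructions
# while CPI and cycle time stay the same.
base = execution_time(1_000_000, 2.0, 1e-9)
optimized = execution_time(800_000, 2.0, 1e-9)
speedup = base / optimized

print(f"base: {base:.6f} s, optimized: {optimized:.6f} s, speedup: {speedup:.2f}x")
```

The compiler mainly influences the first factor (and, through instruction selection and scheduling, the second); the third is set by the implementation and realization of the processor.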
From Source to Executable
[Figure: from source to executable]
source program foo.c (main(), sub1(), data)
--> Compiler --> object modules foo.o (main, sub1, data)
--> Linkage Editor, together with static library libc.a (printf, scanf, gets, fopen, exit, data, ...)
--> load module a.out (main, sub1, data, printf, exit, data)
--> Loader ("load time") --> machine memory (main, sub1, data, printf, exit, data, alongside other programs and the kernel)
At run time the program reaches the kernel through system calls.
Dynamic library case not shown.
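The figure shows that only printf and exit from libc.a end up in the load module. A toy model of that selection step is sketched below. The function `link`, the dictionaries, and the call relationships (main calling sub1, printf, and exit) are assumptions made for illustration; a real linkage editor works on object files, not Python dictionaries.

```python
def link(object_module, library):
    """Return the routine names placed in the load module.

    Both arguments map a routine name to the list of names it references.
    Everything in the object module is kept; library members are pulled in
    only when something already in the load module refers to them.
    """
    load_module = list(object_module)
    pending = [ref for refs in object_module.values() for ref in refs]
    while pending:
        name = pending.pop()
        if name in load_module:
            continue
        if name not in library:
            raise KeyError(f"undefined symbol: {name}")
        load_module.append(name)
        pending.extend(library[name])
    return load_module

# Hypothetical reference lists for foo.o and libc.a.
foo_o = {"main": ["sub1", "printf", "exit"], "sub1": []}
libc_a = {"printf": [], "scanf": [], "gets": [], "fopen": [], "exit": []}

a_out = link(foo_o, libc_a)
print(sorted(a_out))
```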
What is a compiler?
A program that reads a program written in one language and translates it into another language.
Source language --> Compiler --> Target language
Traditionally, compilers go from high-level languages to low-level languages.
Compilers
• Common compilation tasks
  – Language translation
  – Error checking and reporting
  – Performance improvement
• Fundamental compilation principles
  ✓ The compiler must preserve the meaning of the source program
  ✓ The compiler must improve the source program in some discernible way
Compilers Evolution
• In the beginning, there was machine language
  – Ugly: writing code, debugging
  – Then came textual assembly, still used on DSPs
  – High-level languages: Fortran, Pascal, C, C++
  – Machine structures became too complex and software management too difficult to continue with low-level languages
Why are Compilers Important?
• Computer architecture
  – Build processors that software can be automatically mapped to efficiently
  – Exploiting hardware features
• CAD tools
  – Behavioral synthesis / C-to-gates tools are hardware compilers
  – Use program analysis/optimization to generate cheaper hardware
• Software developers
  – How do I create a compiler?
  – How does it map my code to the hardware?
Compiler Architecture
In more detail:
Source Language --> Front End (language specific) --> Intermediate Language --> Back End (machine specific) --> Target Language
• Separation of concerns
• Retargeting
Compiler Architecture
Source language
--> Scanner (lexical analysis) --> tokens
--> Parser (syntax analysis) --> syntactic structure
--> Semantic Analysis (IC generator) --> Intermediate Language
--> Code Optimizer --> Intermediate Language
--> Code Generator --> Target language
Symbol Table: shared by the phases
Translation of an assignment statement
[Figure: translation of an assignment statement]
Lexical Analysis
• Character stream --> token stream
  – Recognize "words" of a language
• Theoretical problem: specify and recognize patterns in strings
  – Scanner as a practical application
  – Regular expressions, finite automata
  – Tools that automatically generate scanners are commonly used
Input: index := start + step * 20
After scanning: index | := | start | + | step | * | 20 (each token classified as identifier, operator, or number)
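A minimal sketch of a scanner for the example statement, using Python regular expressions. The token class names follow the slide (identifier, operator, number); the exact patterns and the `scan` function are assumptions made for illustration, not the lecture's implementation.

```python
import re

# One regular expression per token class; "skip" matches whitespace.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r":=|[+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Turn a character stream into a list of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = scan("index := start + step * 20")
for kind, lexeme in tokens:
    print(f"{kind:10} {lexeme}")
```

Scanner generators build the equivalent of `MASTER` as a finite automaton from the same kind of regular-expression specification.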
Syntactical Analysis
• Token stream --> syntax tree
  – Recognize "sentences" of a language
• Grammars and parsers
  – Context-free grammars (CFG)
• Parsers can be automatically generated
  – Top-down and bottom-up parsing
• Predictive parsing
• Driving process of compiler front ends
After scanning: index := start + step * 20
After parsing:
Assign
  ID (index)
  :=
  Exp
    Exp
      ID (start)
    +
    Exp
      Exp
        ID (step)
      *
      Exp
        Num (20)
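The tree for the example can be produced by a small predictive (recursive-descent) parser. The grammar below (Assign, Exp, Term, Factor) is an assumed one chosen so that * binds tighter than +; the token list is written out by hand so the sketch runs on its own, and the tree is encoded as nested tuples.

```python
def parse_assign(tokens):
    """Parse ID := Exp, where tokens is a list of (token class, lexeme) pairs.

    Assumed grammar:
      Assign -> ID ':=' Exp
      Exp    -> Term ('+' Term)*
      Term   -> Factor ('*' Factor)*
      Factor -> ID | Num
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take(kind, text=None):
        nonlocal pos
        tok_kind, tok_text = peek()
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, found {tok_text!r}")
        pos += 1
        return tok_text

    def factor():
        kind, _ = peek()
        if kind == "identifier":
            return ("ID", take("identifier"))
        return ("Num", int(take("number")))

    def term():
        node = factor()
        while peek() == ("operator", "*"):
            take("operator", "*")
            node = ("*", node, factor())
        return node

    def exp():
        node = term()
        while peek() == ("operator", "+"):
            take("operator", "+")
            node = ("+", node, term())
        return node

    target = ("ID", take("identifier"))
    take("operator", ":=")
    tree = ("Assign", target, exp())
    if pos != len(tokens):
        raise SyntaxError("unexpected input after expression")
    return tree

example_tokens = [
    ("identifier", "index"), ("operator", ":="), ("identifier", "start"),
    ("operator", "+"), ("identifier", "step"), ("operator", "*"), ("number", "20"),
]
tree = parse_assign(example_tokens)
print(tree)
```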
Semantic Analysis
• The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
• Gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.
• An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.
  – For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.
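The array-index rule can be sketched as a check against a symbol table. The table layout (a name mapped to either a scalar type or an ("array", element type) pair) and the function `element_type` are assumptions made for this illustration.

```python
def element_type(array_name, index_name, symbols):
    """Return the type of array_name[index_name], or raise TypeError."""
    array_info = symbols[array_name]
    if not (isinstance(array_info, tuple) and array_info[0] == "array"):
        raise TypeError(f"{array_name!r} is not an array")
    index_type = symbols[index_name]
    if index_type != "int":
        raise TypeError(
            f"array index must be int, but {index_name!r} has type {index_type}"
        )
    return array_info[1]

# Hypothetical declarations: float a[...]; int i; float x;
symbols = {"a": ("array", "float"), "i": "int", "x": "float"}

print(element_type("a", "i", symbols))      # well typed
try:
    element_type("a", "x", symbols)         # float used as an index
except TypeError as err:
    print("semantic error:", err)
```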
Semantic Analysis
• The language specification may permit some type conversions called coercions.
  – For example, a binary arithmetic operator may be applied either to a pair of integers or to a pair of floating-point numbers. If the operator is applied to a floating-point number and an integer, the compiler may convert or coerce the integer into a floating-point number.
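One way to picture coercion is as a tree rewrite that wraps the integer operand in an explicit conversion node. In the sketch below, operands are (type, expression) pairs, and the node name "inttofloat" and the assumption that step is a float are illustrative choices, not taken from the lecture.

```python
def coerce_binary(op, left, right):
    """Type a binary arithmetic node, inserting an int-to-float conversion if needed.

    left and right are (type, expression) pairs; the result has the same shape.
    """
    (l_type, l_expr), (r_type, r_expr) = left, right
    if l_type == r_type and l_type in ("int", "float"):
        return (l_type, (op, l_expr, r_expr))
    if {l_type, r_type} == {"int", "float"}:
        if l_type == "int":
            l_expr = ("inttofloat", l_expr)
        else:
            r_expr = ("inttofloat", r_expr)
        return ("float", (op, l_expr, r_expr))
    raise TypeError(f"operator {op!r} cannot be applied to {l_type} and {r_type}")

# Assumed types: step is a float variable, 20 is an integer literal.
typed = coerce_binary("*", ("float", "step"), ("int", 20))
print(typed)
```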
Semantic Analysis
• Understand/annotate meaning of the program
  – Syntax-directed translation
  – Check semantic errors
    • Inconsistent variable definitions and uses
    • Type systems
  – Collect knowledge of the input program
    • Symbol tables
    • Scopes
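Symbol tables and scopes are often combined as a chain of tables, one per scope, each pointing to its enclosing scope. The class below is a minimal sketch under that assumption; the class and method names are hypothetical.

```python
class SymbolTable:
    """One scope; lookups fall back to the enclosing (parent) scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.entries = {}

    def declare(self, name, type_):
        if name in self.entries:
            raise NameError(f"{name!r} is already declared in this scope")
        self.entries[name] = type_

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        raise NameError(f"{name!r} is used but never declared")

global_scope = SymbolTable()
global_scope.declare("index", "int")
inner_scope = SymbolTable(parent=global_scope)
inner_scope.declare("index", "float")    # shadows the outer declaration

print(inner_scope.lookup("index"))       # found in the inner scope
print(global_scope.lookup("index"))      # the outer declaration is unchanged
```

A use of an undeclared name, or a second declaration in the same scope, surfaces here as the "inconsistent variable definitions and uses" error class.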
Intermediate Code Generation
• Representation of the input program
  – Internal to the compiler
  – Encodes knowledge collected during compilation
  – Varied forms and levels
    • Typically a compiler uses more than one kind of IR
  – Desired properties
    • Easy to produce, manipulate, and translate into the target code
IR scheme
• front end produces IR
• optimizer transforms IR to more efficient program
• back end transforms IR to target code
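To make the optimizer stage concrete, here is a sketch of one simple IR-to-IR transformation, constant folding, over a list of (destination, operator, operand, operand) tuples. The tuple format and the function name are assumptions; the pass also assumes every name is assigned only once and that folded destinations are compiler temporaries whose values are needed only by later instructions.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def fold_constants(code):
    """Evaluate instructions whose operands are both constants at compile time."""
    known = {}      # temporary name -> constant value computed so far
    folded = []
    for dest, op, a, b in code:
        a = known.get(a, a)     # substitute constants discovered earlier
        b = known.get(b, b)
        if isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)
        else:
            folded.append((dest, op, a, b))
    return folded

before = [("t1", "*", 4, 5), ("index", "+", "start", "t1")]
after = fold_constants(before)
print(after)
```

The input and output use the same IR, which is what lets optimization passes be chained between the front end and the back end.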
Kinds of IR
• Abstract syntax trees (AST)
• Linear operator form of tree (e.g., postfix notation)
• Directed acyclic graphs (DAG)
• Control flow graphs (CFG)
• Program dependence graphs (PDG)
• Static single assignment form (SSA)
• 3-address code
• Hybrid combinations
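The sketch below derives two of these forms, postfix notation and 3-address code, from the same small tree for start + step * 20. The tuple encoding of the tree and the temporary names t1, t2 are assumptions made for illustration.

```python
import itertools

def to_postfix(node):
    """Linear operator form: operands first, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        return to_postfix(left) + to_postfix(right) + [op]
    return [node]

def to_three_address(node):
    """Return (instructions, name holding the result) for an expression tree."""
    code = []
    temps = itertools.count(1)

    def walk(n):
        if not isinstance(n, tuple):
            return n                      # a variable name or a constant
        op, left, right = n
        left_name = walk(left)
        right_name = walk(right)
        dest = f"t{next(temps)}"
        code.append((dest, op, left_name, right_name))
        return dest

    result = walk(node)
    return code, result

ast = ("+", "start", ("*", "step", 20))
postfix = to_postfix(ast)
code, result = to_three_address(ast)

print(postfix)
for dest, op, a, b in code:
    print(f"{dest} = {a} {op} {b}")
```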
Categories of IR
• Structural
  – graphically oriented (trees, DAGs)
  – nodes and edges tend to be large
  – heavily used in source-to-source translators
• Linear
  – pseudo-code for an abstract machine
  – large variation in level of abstraction
  – simple, compact data structures
  – easier to rearrange
• Hybrid
  – combination of graphs and linear code (e.g. CFGs)
  – attempt to achieve the best of both worlds
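A control flow graph illustrates the hybrid category: linear code inside each basic block, graph edges between blocks. The sketch below splits a non-empty instruction list into basic blocks and connects them. The instruction encoding (("label", name), ("goto", name), ("if", condition, name), ("op", text)) is invented for this example.

```python
def build_cfg(code):
    """Return (basic blocks, set of edges between block indices)."""
    # A leader starts a block: the first instruction, every label,
    # and every instruction that follows a jump.
    leaders = {0}
    for i, ins in enumerate(code):
        if ins[0] == "label":
            leaders.add(i)
        elif ins[0] in ("goto", "if") and i + 1 < len(code):
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]

    label_block = {b[0][1]: i for i, b in enumerate(blocks) if b[0][0] == "label"}
    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1]
        if last[0] in ("goto", "if"):
            edges.add((i, label_block[last[-1]]))     # jump target
        if last[0] != "goto" and i + 1 < len(blocks):
            edges.add((i, i + 1))                     # fall through
    return blocks, edges

loop_code = [
    ("op", "i = 0"),
    ("label", "loop"),
    ("if", "i >= n", "done"),
    ("op", "i = i + 1"),
    ("goto", "loop"),
    ("label", "done"),
    ("op", "return i"),
]
blocks, edges = build_cfg(loop_code)
print(len(blocks), sorted(edges))
```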
Important IR properties
• Ease of generation
• Ease of manipulation
• Cost of manipulation
• Level of abstraction
• Freedom of expression (!)
• Size of typical procedure
• Original or derivative
Subtle design decisions in the IR can have far-reaching effects on the speed and effectiveness of the compiler!
--> Degree of exposed detail can be crucial
Thank You