Compiler Principle and Technology
Prof. Dongming LU
Mar. 1st, 2021
Content
1. INTRODUCTION
2. SCANNING
3. CONTEXT-FREE GRAMMARS AND PARSING
4. TOP-DOWN PARSING
5. BOTTOM-UP PARSING
6. SEMANTIC ANALYSIS
7. RUNTIME ENVIRONMENT
8. CODE GENERATION
1. INTRODUCTION
What is a compiler?
A program that translates a program in one language into an equivalent program in another language
Source language (high-level language) -> Compiler -> Target language (machine code)
Extension: One format -> Compiler -> Another format
What is a compiler?
• A complex program
• Used in almost all forms of computing
• Of significant practical use
The purpose of this course
• Provide basic knowledge
▫ Theory
▫ Coding techniques
• Give necessary tools and practical experience
▫ A series of simple examples
▫ TINY, C-Minus
• Understand the design patterns and module division methods used in software engineering
▫ The compiler as a special piece of software
One format -> Compiler -> Another format
How to translate?
Main Topics
1.1 Why Compilers? A Brief History
1.2 Programs Related to Compilers
1.3 The Translation Process
1.4 Major Data Structures in a Compiler
1.5 Other Issues in Compiler Structure
1.6 Bootstrapping and Porting
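1.1 Why Compilers? A Brief History
Why Compilers?
• Machine language: C7 06 0000 0002
▫ Time-consuming and tedious to write, not easy to understand
• Assembly language: MOV x, 2
▫ Still not easy to write and understand, dependent on a particular machine
• High-level language: x = 2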
Brief History of Compiler
• 1954-1957: First compiler
• 1960s~1970s: Theories and algorithms (FA, CFG, et al.); the Chomsky hierarchy was also studied
• 1970s~1990s: Some tools (Yacc, Lex)
• Last 20 years: IDEs; compiler design has not changed much
Brief History of Compiler
Noam Chomsky: the structure of natural language
The Chomsky Hierarchy
Four levels of grammar (types 0-3)
Type 3: FA (Finite Automata)
Type 2: CFG (Context-Free Grammar)
1.2 Programs Related to Compilers
Programs related to compilers
• Programs run before the compiler: Editors, Preprocessors
• Programs run after the compiler: Assemblers, Linkers, Loaders
• Similar to compilers: Interpreters
• Others: Debuggers, Profilers, Project managers
Interpreters vs. Compilers
• Interpreters: execute the source program immediately; run slowly
• Compilers: generate object code; run fast
• Interpreters and compilers share many of their operations
Interpreters: source code (BASIC) -> interpreter -> execute (one interpreter each on X86, ARM, MIPS)
Compilers: source code (C++)
▫ compiler 1 -> target language 1 -> execute on X86
▫ compiler 2 -> target language 2 -> execute on ARM
▫ compiler 3 -> target language 3 -> execute on MIPS
Other Programs Related to Compilers
Editor -> Source code -> Preprocessor -> Compiler -> Assembler -> Object code -> Linker -> Executable file -> Loader
• Other programs, e.g., debugger
[Figure label: machine-dependent]
1.3 The Translation Process
The Phases of a Compiler
Source code
-> Scanner -> Tokens
-> Parser -> Syntax Tree
-> Semantic Analyzer -> Annotated Tree
-> Source Code Optimizer -> Intermediate code
-> Code Generator -> Target code
-> Target Code Optimizer -> Target code
Auxiliary components used by the phases: Literal Table and Symbol Table (important data structures), Error Handler
The phases of a compiler
• Six phases
▫ Scanner
▫ Parser
▫ Semantic Analyzer
▫ Source code optimizer
▫ Code generator
▫ Target Code Optimizer
• Three auxiliary components
▫ Literal table
▫ Symbol table
▫ Error Handler
The Scanner
• Lexical analysis: sequences of characters -> tokens
e.g.: a[index]=4+2
Lexeme   Token
a        identifier
[        left bracket
index    identifier
]        right bracket
=        assignment
4        number
+        plus sign
2        number
Scanner output: <id, 1> [ <id, 2> ] = 4 + 2
Symbol Table
1   a
2   index
3   …
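The scanning step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `TOKEN_SPEC` and `scan` are hypothetical, the token categories are the ones from the slide's table, and identifiers are entered into a dictionary that stands in for the symbol table.

```python
import re

# Hypothetical token categories for this sketch; names follow the slide's table.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("left bracket", r"\["),
    ("right bracket", r"\]"),
    ("assignment", r"="),
    ("plus sign", r"\+"),
    ("skip", r"\s+"),
]
# One alternation with a named group per category (g0, g1, ...).
MASTER = re.compile("|".join(f"(?P<g{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC)))

def scan(source):
    """Turn a character sequence into tokens plus a symbol table."""
    tokens = []
    symbol_table = {}  # identifier name -> index, numbered from 1
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind = TOKEN_SPEC[int(m.lastgroup[1:])][0]
        lexeme = m.group()
        pos = m.end()
        if kind == "skip":
            continue  # whitespace produces no token
        if kind == "identifier":
            index = symbol_table.setdefault(lexeme, len(symbol_table) + 1)
            tokens.append(("id", index))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbol_table
```

Calling `scan("a[index]=4+2")` yields eight tokens, with `a` and `index` stored as symbol table entries 1 and 2.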
The Parser
• Syntax analysis:
Tokens -> Parse tree / Syntax tree (Abstract syntax tree)
e.g.: a[index]=4+2
The Parse Tree
expression
  Assign-expression
    expression
      subscript-expression
        expression: identifier a
        [
        expression: identifier index
        ]
    =
    expression
      additive-expression
        expression: number 4
        +
        expression: number 2
The Syntax Tree
Assign-expression
  subscript-expression
    identifier a
    identifier index
  additive-expression
    number 4
    number 2
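A syntax tree like the one above can be built by a small recursive-descent parser. The sketch below is a hypothetical illustration, not the course's parser: it assumes a tiny grammar with only identifiers, numbers, subscripts, `+`, and a single `=`, and it represents tree nodes as nested tuples.

```python
import re

def parse_assignment(text):
    """Parse 'target = expression' into a nested-tuple syntax tree."""
    tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(lexeme):
        nonlocal pos
        if peek() != lexeme:
            raise SyntaxError(f"expected {lexeme!r}, found {peek()!r}")
        pos += 1

    def primary():
        # primary -> number | identifier | identifier '[' additive ']'
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        if tok.isdigit():
            return ("number", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            node = ("identifier", tok)
            if peek() == "[":
                expect("[")
                index = additive()
                expect("]")
                node = ("subscript-expression", node, index)
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def additive():
        # additive -> primary ('+' primary)*
        node = primary()
        while peek() == "+":
            expect("+")
            node = ("additive-expression", node, primary())
        return node

    target = primary()
    expect("=")
    tree = ("assign-expression", target, additive())
    if peek() is not None:
        raise SyntaxError(f"trailing input at {peek()!r}")
    return tree
```

The tuple returned for `a[index]=4+2` has the same shape as the syntax tree on the slide: an assign-expression whose children are a subscript-expression and an additive-expression.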
The Semantic Analyzer
• The semantics of a program are its “meaning”:
declarations and type checking
e.g.: a[index]=4+2
The Syntax Tree annotated with attributes
Assign-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  additive-expression : integer
    number 4 : integer
    number 2 : integer
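The type annotations above can be computed by a bottom-up walk over the syntax tree. The following is a minimal sketch under assumptions: nodes are nested tuples, the function name `check` is hypothetical, and the declared types of `a` and `index` are passed in as a dictionary that stands in for the symbol table.

```python
def check(node, declarations):
    """Return the type of a tree node, raising TypeError on a mismatch."""
    kind = node[0]
    if kind == "number":
        return "integer"
    if kind == "identifier":
        name = node[1]
        if name not in declarations:
            raise TypeError(f"undeclared identifier {name!r}")
        return declarations[name]
    if kind == "subscript-expression":
        array_type = check(node[1], declarations)
        index_type = check(node[2], declarations)
        if array_type != "array of integer" or index_type != "integer":
            raise TypeError("subscript needs an integer array and an integer index")
        return "integer"
    if kind == "additive-expression":
        left = check(node[1], declarations)
        right = check(node[2], declarations)
        if left != "integer" or right != "integer":
            raise TypeError("operands of + must be integers")
        return "integer"
    if kind == "assign-expression":
        left = check(node[1], declarations)
        right = check(node[2], declarations)
        if left != right:
            raise TypeError(f"cannot assign {right} to {left}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")
```

With `a` declared as an array of integer and `index` as an integer, the assignment checks successfully; declaring `index` as an array instead makes the subscript check fail.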
The Source Code Optimizer
• Constant folding
e.g.: a[index]=4+2
Before constant folding:
Assign-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  additive-expression : integer
    number 4 : integer
    number 2 : integer
After constant folding:
Assign-expression
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  number 6 : integer
• Three-address code is easier to optimize
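Constant folding on the tree can be sketched as a recursive rewrite. This is an illustrative sketch only: the nested-tuple node layout and the name `fold` are assumptions, and only the additive-expression of the running example is folded.

```python
def fold(node):
    """Return a copy of the tree with constant additions precomputed."""
    kind = node[0]
    if kind in ("number", "identifier"):
        return node  # leaves are already as simple as they can be
    children = tuple(fold(child) for child in node[1:])
    if kind == "additive-expression" and all(c[0] == "number" for c in children):
        return ("number", children[0][1] + children[1][1])
    return (kind,) + children
```

Applied to the tree for `a[index]=4+2`, the additive-expression subtree is replaced by the single node `("number", 6)`, while the subscript-expression is left unchanged.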
The Code Generator
• The properties of the target machine become the major factor.
eg: a[index]=4+2:
MOV R0, index ;; value of index -> R0
MUL R0, 2 ;; double value in R0
MOV R1, &a ;; address of a -> R1
ADD R1, R0 ;; add R0 to R1
MOV *R1, 6 ;; constant 6 -> address in R1
The Target Code Optimizer
• Optimizing methods:
▫ Choosing addressing modes
▫ Replacing slow instructions by faster ones
▫ Eliminating redundant operations
eg: a[index]=4+2
MOV R0, index ;; value of index -> R0
MUL R0, 2 ;; double value in R0
MOV R1, &a ;; address of a -> R1
ADD R1, R0 ;; add R0 to R1
MOV *R1, 6 ;; constant 6 -> address in R1
transform to
MOV R0, index ;; value of index -> R0
SHL R0 ;; double value in R0
MOV &a[R0], 6 ;; constant 6 -> address a + R0
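The transformation above can be imitated by a small peephole pass over the instruction list. The sketch below is hypothetical: instructions are modeled as tuples of strings, the function name `peephole` is invented for illustration, and the second rule assumes the address register is not used again after the store.

```python
def peephole(code):
    """Apply two local rewrite rules to a list of instruction tuples."""
    out = []
    i = 0
    while i < len(code):
        ins = code[i]
        # Rule 1: replace a slow multiplication by 2 with a left shift.
        if ins[0] == "MUL" and ins[2] == "2":
            out.append(("SHL", ins[1]))
            i += 1
            continue
        # Rule 2: fold "load address, add offset, store indirect" into one
        # store that uses an indexed addressing mode.
        # Assumption: the address register is dead after the store.
        window = code[i:i + 3]
        if (len(window) == 3
                and window[0][0] == "MOV" and window[0][2].startswith("&")
                and window[1][0] == "ADD" and window[1][1] == window[0][1]
                and window[2][0] == "MOV" and window[2][1] == "*" + window[0][1]):
            base, offset = window[0][2], window[1][2]
            out.append(("MOV", f"{base}[{offset}]", window[2][2]))
            i += 3
            continue
        out.append(ins)
        i += 1
    return out
```

On the five instructions generated for `a[index]=4+2`, this pass returns the three-instruction sequence shown on the slide.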
1.4 Major Data Structures in a Compiler
Major Data Structures
• TOKEN:
a single global variable
• THE SYNTAX TREE:
pointer-based structure
• THE SYMBOL TABLE:
hash table
• THE LITERAL TABLE:
reuse of constants and strings
• INTERMEDIATE CODE:
three-address code
• TEMPORARY FILES:
products of intermediate steps
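The "reuse of constants and strings" idea behind the literal table can be sketched as follows. The class name `LiteralTable` and its method `intern` are hypothetical, and a Python dictionary stands in for the hash table: each distinct literal is stored once, and later occurrences get the index of the existing entry.

```python
class LiteralTable:
    """Store each distinct constant or string once and hand out its index."""

    def __init__(self):
        self._index = {}   # (type name, value) -> position in entries
        self.entries = []  # literals in order of first appearance

    def intern(self, value):
        # The type name is part of the key so that 1 and 1.0 stay distinct.
        key = (type(value).__name__, value)
        if key not in self._index:
            self._index[key] = len(self.entries)
            self.entries.append(value)
        return self._index[key]
```

Interning the same string twice returns the same index, so the generated code can refer to a single stored copy.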
1.5 Other Issues in Compiler Structure
The Structure of Compiler
• Multiple views from different angles
▫ Logical Structure
▫ Physical Structure
▫ Sequencing of the operations, and so on.
• The structure has a major impact on
▫ Reliability
▫ Efficiency
▫ Usefulness
▫ Maintainability
Analysis and Synthesis (Logical Structure)
• Analysis part: analyzes the source program and computes its properties
▫ Lexical analysis, syntax analysis, and semantic analysis, as well as optimization
▫ More mathematical and better understood
• Synthesis part: produces the translated code
▫ Code generation, as well as optimization
▫ More specialized
• The two parts can be changed independently of each other
Front End and Back End (Physical Structure)
Source code -> Front end (Scanner, Parser, Semantic Analyzer, Intermediate code synthesis) -> Back end (Code Generator, Target Code Optimizer) -> Target code
• Front end: oriented to people; depends on the source language, independent of the target language
• Back end: oriented to the computer; depends on the target language, independent of the source language
• This separation is important for compiler portability
Passes (Sequencing of the operations)
• Pass: one processing of the entire source program; a compiler may repeat this several times before generating code
• Most compilers with optimization use more than one pass
▫ One pass for scanning and parsing
▫ One pass for semantic analysis and source-level optimization
▫ A third pass for code generation and target-level optimization
• A compiler can be one pass or multiple passes
▫ One pass: efficient compilation; less efficient target code -> Space consuming
▫ Multiple passes: efficient target code; complicated compilation -> Time consuming
Language Definition and Compilers
• The lexical and syntactic structure of a programming language is usually specified formally
▫ Regular expressions
▫ Context-free grammars
• The semantics of a programming language is usually given by English descriptions
▫ Language reference manual, or language definition
• A language definition and a compiler are often developed simultaneously
▫ The compiler techniques have a major impact on the definition
▫ The definition has a major impact on the techniques
Compiler options and interfaces
• Mechanisms for interfacing with the operating system
▫ Input and output facilities
▫ Access to the file system of the target machine
• Options to the user for various purposes
▫ Specification of listing characteristics
▫ Code optimization options
Error Handling
• Static (or compile-time) errors must be reported by a compiler
▫ Generate meaningful error messages
▫ Resume compilation after each error
▫ Different kinds of error handling for each phase of a compiler
• Exception handling
▫ Perform suitable runtime tests
▫ Exception handling mechanisms are required
1.6 Bootstrapping and Porting
Third Language for Compiler Construction
• Machine language
▫ Executes immediately
• Another language with an existing compiler on the same target machine (First Scenario)
▫ Compile the new compiler with the existing compiler
• Another language with an existing compiler on a different machine (Second Scenario)
▫ Compilation produces a cross compiler
T-Diagrams for Describing Complex Situations
• A compiler written in language H that translates language S into language T:
  S  ->  T
     H
• T-diagrams can be combined in two basic ways.
The First T-diagram Combination
(A -> B, written in H) + (B -> C, written in H) = (A -> C, written in H)
• Two compilers run on the same machine H
▫ First from A to B
▫ Second from B to C
▫ Result from A to C on H
The Second T-diagram Combination
(A -> B, written in H) compiled by (H -> K, running on M) = (A -> B, written in K)
• Translate the implementation language of a compiler from H to K
• Use another compiler from H to K
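The two combinations can be modeled with a small record type. This is only a sketch under assumptions: the class `Compiler` and the functions `chain` and `compile_with` are hypothetical names, and a T-diagram is reduced to its three labels (source, target, implementation language).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Compiler:
    source: str  # language translated from
    target: str  # language translated to
    impl: str    # language (or machine) the compiler itself is written in

def chain(first, second):
    """First combination: A->B and B->C on the same machine give A->C."""
    if first.target != second.source or first.impl != second.impl:
        raise ValueError("diagrams do not fit together")
    return Compiler(first.source, second.target, first.impl)

def compile_with(program, tool):
    """Second combination: translate a compiler's implementation language."""
    if program.impl != tool.source:
        raise ValueError("tool cannot read the implementation language")
    return Compiler(program.source, program.target, tool.target)
```

For example, `compile_with(Compiler("A", "B", "H"), Compiler("H", "K", "M"))` gives `Compiler("A", "B", "K")`: the translation A -> B is unchanged, and only the implementation language moves from H to K.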
The First Scenario
• Another language with an existing compiler on the same target machine
(A -> H, written in B) compiled by (B -> H, running on H) = (A -> H, running on H)
• Translate a compiler from A to H written in B
▫ Use an existing compiler for language B on machine H
The Second Scenario
• Another language with an existing compiler on a different machine
(A -> H, written in B) compiled by (B -> K, running on K) = (A -> H, running on K)
• Use an existing compiler for language B on a different machine K
▫ Results in a cross compiler
Process of Bootstrapping
• Write a compiler in the same language that it compiles:
  S  ->  T
     S
• No compiler for the source language exists yet
• Porting to a new host machine
The First Step in Bootstrapping
(A -> H, written in A) compiled by (A -> H, written in H) = (A -> H, running on H)
• Compiled by: a “quick and dirty” compiler written in machine language H
• Input: the compiler written in its own language A
• Result: a running but inefficient compiler
The Second Step in Bootstrapping
(A -> H, written in A) compiled by (A -> H, running on H) = (A -> H, running on H)
• Compiled by: the running but inefficient compiler
• Input: the compiler written in its own language A
• Result: the final version of the compiler
Step 1 in Porting
(A -> K, written in A) compiled by (A -> H, running on H) = (A -> K, running on H)
• Compiled by: the original compiler
• Input: the compiler source code retargeted to K
• Result: a cross compiler
Step 2 in Porting
(A -> K, written in A) compiled by (A -> K, running on H) = (A -> K, running on K)
• Compiled by: the cross compiler
• Input: the compiler source code retargeted to K
• Result: the retargeted compiler
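The bootstrapping and porting steps can be replayed as repeated applications of one rule. The sketch below is an assumption-laden illustration: each compiler is a hypothetical (source, target, implementation) triple, and the function names are invented. Note that the triples record only languages, so the "inefficient" and "final" compilers of bootstrapping look identical here even though the quality of their code differs.

```python
def compile_with(program, tool):
    """Compile a compiler's source with another compiler (both are triples)."""
    source, target, implementation = program
    tool_source, tool_target, _ = tool
    if implementation != tool_source:
        raise ValueError("the tool cannot compile this implementation language")
    return (source, target, tool_target)

def bootstrap(quick_and_dirty, compiler_source):
    """Two bootstrapping steps: first an inefficient, then the final compiler."""
    inefficient = compile_with(compiler_source, quick_and_dirty)  # step 1
    final = compile_with(compiler_source, inefficient)            # step 2
    return inefficient, final

def port(original, retargeted_source):
    """Two porting steps: first a cross compiler, then the retargeted one."""
    cross = compile_with(retargeted_source, original)    # step 1
    retargeted = compile_with(retargeted_source, cross)  # step 2
    return cross, retargeted
```

With the original compiler `("A", "H", "H")` and retargeted source `("A", "K", "A")`, `port` first produces `("A", "K", "H")`, a cross compiler running on H, and then `("A", "K", "K")`, which runs on K itself.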
1.7 The TINY Sample Language and Compiler
The Requirements
Complete source code in (ANSI) C for a compiler of a small language, simple enough that the compiler can be easily comprehended once the techniques are understood.
The TINY Language
Very simple structure:
• Just a sequence of statements separated by semicolons;
• All variables are integer variables;
• An if-statement and a repeat-statement;
• Read and write statements for input/output;
• Comments are within curly brackets.
The TINY Language
A simple example:
{ Sample program
  in TINY language - computes factorial
}
read x; { input an integer }
if 0 < x then { don’t compute if x <= 0 }
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact { output factorial of x }
end
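To make the control flow of the sample concrete, here is a hand translation into Python. It is a sketch, not compiler output: the function name `tiny_factorial` is hypothetical, `read` is modeled as a parameter, and `write` is modeled by appending to a list. The repeat-statement runs its body before testing the condition, which is why it becomes a loop with the exit test at the bottom.

```python
def tiny_factorial(x):
    """Return the list of values the TINY program would write for input x."""
    output = []
    if 0 < x:                 # if 0 < x then
        fact = 1              # fact := 1;
        while True:           # repeat: the body always runs at least once
            fact = fact * x   # fact := fact * x;
            x = x - 1         # x := x - 1
            if x == 0:        # until x = 0;
                break
        output.append(fact)   # write fact
    return output
```

For a non-positive input the if-statement is skipped, so nothing is written.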
The TINY Compiler
The TINY compiler consists of the following C files:
▫ globals.h main.c
▫ util.h util.c
▫ scan.h scan.c
▫ parse.h parse.c
▫ symtab.h symtab.c
▫ analyze.h analyze.c
▫ code.h code.c
▫ cgen.h cgen.c
The TM Machine
The target machine for the small language TINY:
▫ Just enough instructions to be an adequate target;
▫ Some of the properties of RISC machines;
▫ …
End of Chapter One
Thanks