The LANCE V2.0
C compiler system
Rainer Leupers
phone: +49 (231) 755 6151
mobile: +49 (177) 2131146
University of Dortmund, Informatik 12
44221 Dortmund, Germany
fax: +49 (231) 755 6116
http://ls12-www.cs.uni-dortmund
© 2000, R. Leupers
Overview
Functionality of LANCE
Software structure
C frontend
Intermediate representation (IR)
IR optimizations
Control and data flow analysis
Backend interface
The LANCE V2.0 compiler system
Tasks covered by LANCE:
  Source code analysis
  Generation of IR
  Machine-independent optimizations
  Data flow graph generation
Tasks not covered by LANCE:
  Assembly code generation (backend)
  Machine-specific optimizations
  Code assembly and linking
Purpose of LANCE:
  Facilitate C compiler development for new target processors
  Give insight into compiler structure
Key features
Full ANSI C coverage (C89)
Modular tool and library structure
Simple three address code IR (C subset)
Plug & play IR optimizations
Backend interface compatible with OLIVE
Proven in numerous compiler projects
LANCE software structure
[Figure: the LANCE library (header file lance2.h, C++ library liblance2.a) is used by the LANCE tools (C frontend, IR optimizations 1 ... n) and by the machine-specific backend; all of them operate on a common IR]
ANSI C frontend
Functionality:
  Lexical, syntactic, and semantic analysis of C source
  Generation of three address code IR for a C file
  Emission of error messages if required (gcc style)
  Machine-specific constants (type bitwidth, alignment) stored in a configuration file
Implementation:
  Based on a context-free C grammar, according to the K&R specification
  C source automatically generated with the attribute grammar compiling system OX (an extension of lex & yacc)
  In total approx. 26,000 lines of C source code
  Validated with a comprehensive test suite
Setup and IR generation
Environment variables:
  setenv LANCE2_CPP "gcc -E"
  setenv LANCE2_CONFIG "config.sparc"
Call the C frontend with the "compile" command:
  > compile test.c
The input file test.c is translated, using the machine configuration file config.sparc, into the IR file test.ir.c
General IR format
One IR file (*.ir.c) generated for each C source file (*.c)
External IR format: C subset (compilable!)
Internal IR format: accessible via the LANCE library
The IR contains a symbol table plus three address code (3AC) for each C function defined in the source code
3AC is a sequence of IR statements
3AC: at most two operands and one result per statement
IR statements (mostly) consist of IR expressions
Blocks of 3AC are augmented with source information (C code, source line no.) for debugging purposes
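To illustrate the 3AC rule, here is a C function next to a hand-written three address version of it. This is not actual frontend output (the real IR uses its own auxiliary variable names and numerical suffixes); it only shows that the lowered form, with at most two operands and one result per statement, is still ordinary compilable C (and C++).

```cpp
// Source form: one C statement with a nested expression.
int eval_src(int a, int b, int c) {
    return (a + b) * (c - 2);
}

// Hand-lowered three address code: each statement has at most two
// operands and one result; t1, t2, t3 are auxiliary variables.
int eval_3ac(int a, int b, int c) {
    int t1, t2, t3;
    t1 = a + b;
    t2 = c - 2;
    t3 = t1 * t2;
    return t3;
}
```

Because both versions compile, they can be run side by side on the same inputs, which is exactly what makes a C-subset IR easy to validate.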
Classes of IR statements
Assignment: a = b + c; *p = !a; x = f(y,z); cond = *x;
Jump: goto lab;
Conditional jump: if (cond) goto lab;
Label: lab:
Return void: return;
Return value: return x;
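As a hand-written sketch (not frontend output), the following function computes 1 + 2 + ... + n using only the statement classes listed above: assignments, unconditional and conditional jumps, labels and a value return. All variables are declared up front without initializers, so the gotos are legal in both C and C++.

```cpp
// Sum 1..n using only IR-style statements.
int sum_to(int n) {
    int s, i, cond;
    s = 0;              // assignment
    i = 1;              // assignment
L1:                     // label
    cond = i > n;       // assignment of a comparison result
    if (cond) goto L2;  // conditional jump
    s = s + i;
    i = i + 1;
    goto L1;            // unconditional jump
L2:
    return s;           // return value
}
```

A structured loop such as `while` or `for` has no direct counterpart in this form; it is expressed through a label, a conditional jump and a backward `goto`.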
Classes of IR expressions
Symbol: "a", "b", "main", "count", ...
Binary expression: a * b, x / 2, 3 ^ v, f & 4, q % r, ...
Unary expression: !a, *p, ~x, -z, ...
Function call: f1(), f2(a,b), f3(*x, 1, y), ...
Type cast: (char)z, (int)a, (float*)b, ...
String constant: "compiler", "design", "is", "fun", ...
Integer constant: 1000, 3456, -234, -112, ...
Float constant: "3.1415926536", "2.718281828459", ...
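Why is the LANCE IR a C subset?
Validation of the frontend (or of any IR optimization):
  C source → frontend → IR-C source
  Compile both the C source and the IR-C source with a C compiler (CC) into exe 1 and exe 2
  Run both executables on the same test input and check: output 1 = output 2?
C-to-C optimization:
  IR → IR optimization tools → optimized C source → CC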
IR data structure overview
[Figure: IR data structure]
  Global symbol table: int x1,x2,x3; double y1,y2,y3; ...
  Function list: fun 1 "name1", ..., fun n "name n"
    Each function: local symbol table (e.g. int a,b,c; ...) and IR statement list stm 1, stm 2, ..., stm m
  Statement info (examples):
    Class: assignment, ID: 4123, Left hand side: *p, Right hand side: a + b
    Class: cond. jump, ID: 4124, Target: "L1", Condition: c
  Expression info (example IR expression):
    Class: binary, ID: 10034, Left arg: a, Right arg: b, Oper: +, Type: int
The IR type class
C++ class IRType stores type info for all symbols and expressions
  Primary type: void, char, short, int, array, pointer, struct, function, ...
  Secondary type: subtype of arrays and pointers
  Storage class: extern, static, register, ...
  Qualifiers: const, volatile
Example: const int* A[100];
  Type->Class() = IRTYPE_ARRAY // primary type
  Type->IsConst() = true
  Type->Subtype()->Class() = IRTYPE_POINTER
  Type->Subtype()->Subtype()->Class() = IRTYPE_INT
  Type->ArrayDim() = 100
  Type->SizeOf() = 400 // in bytes, for 32-bit pointers
  Type->MemoryWords() = 200 // for a 16-bit word memory
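The two size values follow from simple arithmetic: 100 pointers × 4 bytes = 400 bytes, and 400 × 8 bits / 16 bits per word = 200 words. The sketch below reproduces this with a hypothetical, heavily simplified type descriptor; it is not LANCE's IRType class, and it assumes 8-bit bytes, 32-bit ints and 32-bit pointers.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for a type descriptor (NOT LANCE's IRType).
enum class Kind { Int, Pointer, Array };

struct Type {
    Kind kind;
    const Type* subtype;   // element type (arrays) or pointee type (pointers)
    std::size_t dim;       // number of elements, arrays only
};

// Assumed target parameters: 32-bit int and 32-bit pointers.
constexpr std::size_t kIntBytes = 4;
constexpr std::size_t kPtrBytes = 4;

// Size in bytes, computed recursively through the subtype chain.
std::size_t size_of(const Type& t) {
    switch (t.kind) {
    case Kind::Int:     return kIntBytes;
    case Kind::Pointer: return kPtrBytes;             // independent of the pointee
    case Kind::Array:   return t.dim * size_of(*t.subtype);
    }
    return 0;
}

// Number of memory words occupied, rounded up to whole words.
std::size_t memory_words(const Type& t, std::size_t word_bits) {
    std::size_t bits = size_of(t) * 8;
    return (bits + word_bits - 1) / word_bits;
}
```

Qualifiers and storage classes do not affect the size, so the sketch leaves them out.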
The symbol table class
Symbol table stores all relevant information for symbols/identifiers
Two hierarchy levels:
  Global symbol table: IR->GlobalSymbolTable()
  One local symbol table per function: fun->LocalSymbolTable()
All local symbols get a unique numerical suffix, e.g.
  int f(int x) { int a,b; }  →  int f(int x_1) { int a_2, b_3; }
Important access methods:
  ST->LookupSymbol(char* name)
  IRSymbol* ST->CreateSymbol(IRType* tp)
  Iterators: ST->FirstObject(), ST->NextObject()
Information stored in a table entry (class IRSymbol):
  Symbol type: IRType* sym->Type()
  Symbol name: char* sym->Name()
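The suffix scheme can be pictured with a small hypothetical local symbol table that maps each source name to a unique IR name. This is not LANCE's symbol table class or API; it assumes, as in the f example, that numbering starts at 1 for each function and follows declaration order, and it ignores nested scopes.

```cpp
#include <map>
#include <string>

// Hypothetical local symbol table (NOT the LANCE API): maps each source-level
// name to a unique IR name carrying a numerical suffix.
class LocalSymbolTable {
public:
    // Declares a local symbol and returns its unique IR name.
    std::string declare(const std::string& name) {
        std::string unique = name + "_" + std::to_string(++counter_);
        names_[name] = unique;
        return unique;
    }
    // Returns the IR name of a declared symbol, or "" if it is unknown here.
    std::string lookup(const std::string& name) const {
        auto it = names_.find(name);
        return it == names_.end() ? "" : it->second;
    }
private:
    std::map<std::string, std::string> names_;
    int counter_ = 0;
};
```

Unique suffixes let the IR flatten all local scopes of a function into one table without name clashes; an unknown name falls back to the global symbol table.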
IR generation example
[Figure: source file and the generated IR file, annotated with: forward declaration, automatic conversion, auxiliary vars, debug info, suffix 3 for parameter i]
IR optimization tools
Purpose: perform machine-independent optimizations on the IR
Identical IR format for all tools ("plug & play" concept)
Currently available tools:
  Constant folding: cfold
  Constant propagation: constprop
  Copy propagation: copyprop
  Common subexpression elimination: cse
  Dead code elimination: dce
  Jump optimization: jmpopt
  Loop invariant code motion: licm
  Induction variable elimination: ive
Automatic iteration of IR optimizations via the "iropt" shell script
IR optimization example
[Figure: C source code and the unoptimized IR generated from it by compile]
Constant folding (cfold tool)
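A minimal sketch of constant folding on a hypothetical toy 3AC representation (not the LANCE IR classes and not the cfold implementation): a binary statement whose two operands are integer literals is evaluated at compile time and replaced by a copy of the result.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy 3AC statement (hypothetical): dst = a op b, or the copy dst = a if op is empty.
struct Stmt {
    std::string dst, a, op, b;
};

// True if the operand is an integer literal such as "42" or "-7".
bool is_const(const std::string& s) {
    if (s.empty()) return false;
    std::size_t i = (s[0] == '-') ? 1 : 0;
    if (i == s.size()) return false;
    for (; i < s.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    return true;
}

// Constant folding: evaluate "literal op literal" at compile time.
// Returns true if any statement changed.
bool constant_fold(std::vector<Stmt>& code) {
    bool changed = false;
    for (Stmt& s : code) {
        if (s.op.empty() || !is_const(s.a) || !is_const(s.b)) continue;
        long x = std::stol(s.a), y = std::stol(s.b), r;
        if (s.op == "+") r = x + y;
        else if (s.op == "-") r = x - y;
        else if (s.op == "*") r = x * y;
        else continue;   // division etc. left alone in this sketch (e.g. division by zero)
        s = Stmt{s.dst, std::to_string(r), "", ""};
        changed = true;
    }
    return changed;
}
```

Folding alone does not replace later uses of the folded variable; that is the job of constant propagation, which is one reason iterating the passes (as the iropt script does) pays off.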
Constant propagation (constprop tool)
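A sketch of constant propagation restricted to one basic block, on the same hypothetical toy 3AC form (not LANCE's constprop tool): after `x = 5`, later uses of `x` are replaced by `5` until `x` is redefined. Propagating across basic blocks would additionally need data flow information.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy 3AC statement (hypothetical): dst = a op b, or the copy dst = a if op is empty.
struct Stmt {
    std::string dst, a, op, b;
};

// True if the operand is an integer literal such as "42" or "-7".
bool is_const(const std::string& s) {
    if (s.empty()) return false;
    std::size_t i = (s[0] == '-') ? 1 : 0;
    if (i == s.size()) return false;
    for (; i < s.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    return true;
}

// Constant propagation within one basic block (straight-line code).
bool constant_propagate(std::vector<Stmt>& block) {
    std::map<std::string, std::string> known;   // variable -> literal value
    bool changed = false;
    auto subst = [&](std::string& operand) {
        auto it = known.find(operand);
        if (it != known.end()) { operand = it->second; changed = true; }
    };
    for (Stmt& s : block) {
        subst(s.a);
        if (!s.op.empty()) subst(s.b);
        if (s.op.empty() && is_const(s.a)) known[s.dst] = s.a;   // dst = literal
        else known.erase(s.dst);                                 // dst now unknown
    }
    return changed;
}
```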
Copy propagation (copyprop tool)
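A block-local sketch of copy propagation on a hypothetical toy 3AC form (not the copyprop tool itself). The subtle part is invalidation: a copy `x = y` stops being usable as soon as either `x` or `y` receives a new value.

```cpp
#include <cctype>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Toy 3AC statement (hypothetical): dst = a op b, or the copy dst = a if op is empty.
struct Stmt {
    std::string dst, a, op, b;
};

// True for variable names such as "x" or "t1" (as opposed to literals).
bool is_var(const std::string& s) {
    return !s.empty() &&
           (std::isalpha(static_cast<unsigned char>(s[0])) || s[0] == '_');
}

// Copy propagation within one basic block: after "x = y", later uses of x
// are replaced by y until x or y is redefined.
bool copy_propagate(std::vector<Stmt>& block) {
    std::map<std::string, std::string> copy_of;   // x -> y for valid copies x = y
    bool changed = false;
    auto subst = [&](std::string& operand) {
        auto it = copy_of.find(operand);
        if (it != copy_of.end()) { operand = it->second; changed = true; }
    };
    for (Stmt& s : block) {
        subst(s.a);
        if (!s.op.empty()) subst(s.b);
        // s.dst gets a new value: copies into it and copies from it become invalid.
        copy_of.erase(s.dst);
        for (auto it = copy_of.begin(); it != copy_of.end();)
            it = (it->second == s.dst) ? copy_of.erase(it) : std::next(it);
        if (s.op.empty() && is_var(s.a) && s.a != s.dst)
            copy_of[s.dst] = s.a;
    }
    return changed;
}
```

After propagation, the copy itself often becomes dead and can be removed by dead code elimination.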
Common subexpression elimination (cse tool)
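A sketch of local common subexpression elimination on a hypothetical toy 3AC form (not the cse tool's algorithm): expressions computed earlier in the block are remembered, and a later identical computation becomes a copy of the variable that already holds the value. For simplicity, commutative operands are not normalized, so `a * b` and `b * a` count as different expressions.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy 3AC statement (hypothetical): dst = a op b, or the copy dst = a if op is empty.
struct Stmt {
    std::string dst, a, op, b;
};

// Local CSE: if "a op b" was already computed into v and none of a, b, v has
// been redefined since, "dst = a op b" is replaced by the copy "dst = v".
bool local_cse(std::vector<Stmt>& block) {
    using Expr = std::tuple<std::string, std::string, std::string>;   // (a, op, b)
    std::map<Expr, std::string> avail;   // available expression -> variable holding it
    bool changed = false;
    for (Stmt& s : block) {
        if (!s.op.empty()) {
            auto hit = avail.find(Expr(s.a, s.op, s.b));
            if (hit != avail.end()) {
                s = Stmt{s.dst, hit->second, "", ""};
                changed = true;
            }
        }
        // s.dst changes: drop expressions that read it or are held in it.
        for (auto it = avail.begin(); it != avail.end();) {
            const Expr& e = it->first;
            bool stale = std::get<0>(e) == s.dst || std::get<2>(e) == s.dst ||
                         it->second == s.dst;
            it = stale ? avail.erase(it) : std::next(it);
        }
        // Record the expression unless it reads its own destination (x = x + 1).
        if (!s.op.empty() && s.a != s.dst && s.b != s.dst)
            avail[Expr(s.a, s.op, s.b)] = s.dst;
    }
    return changed;
}
```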
Dead code elimination (dce tool)
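A sketch of dead code elimination for one basic block, using backward liveness on a hypothetical toy 3AC form (not the dce tool). It assumes the set of variables live after the block is given, and that statements have no side effects; real IR also contains calls and pointer stores, which must never be deleted this way.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Toy 3AC statement (hypothetical): dst = a op b, or the copy dst = a if op is empty.
struct Stmt {
    std::string dst, a, op, b;
};

// Walking backwards, an assignment whose destination is not live afterwards
// is deleted. `live` holds the variables needed after the block (live-out).
bool dead_code_elim(std::vector<Stmt>& block, std::set<std::string> live) {
    std::vector<Stmt> kept;
    bool changed = false;
    for (auto it = block.rbegin(); it != block.rend(); ++it) {
        const Stmt& s = *it;
        if (live.count(s.dst) == 0) {   // value is never read: dead assignment
            changed = true;
            continue;
        }
        live.erase(s.dst);              // defined here, so not live before this point
        live.insert(s.a);               // operands are live before (literals are harmless)
        if (!s.op.empty()) live.insert(s.b);
        kept.push_back(s);
    }
    std::reverse(kept.begin(), kept.end());
    block = kept;
    return changed;
}
```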
Jump optimization (jmpopt tool)
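Two classic jump optimizations, sketched on a hypothetical toy instruction stream (the jmpopt tool may implement a different set): a jump to a label that is immediately followed by another jump is retargeted to the final destination, and a jump to the label directly after it is deleted.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy instruction stream (hypothetical): labels, unconditional gotos, and any
// other statement kept as opaque text.
enum class Kind { Label, Goto, Other };
struct Instr {
    Kind kind;
    std::string text;   // label name for Label/Goto, statement text otherwise
};

std::vector<Instr> jump_optimize(const std::vector<Instr>& code) {
    // Map each label to the target of a goto that directly follows it.
    std::map<std::string, std::string> forward;
    for (std::size_t i = 0; i + 1 < code.size(); ++i)
        if (code[i].kind == Kind::Label && code[i + 1].kind == Kind::Goto)
            forward[code[i].text] = code[i + 1].text;

    std::vector<Instr> out;
    for (std::size_t i = 0; i < code.size(); ++i) {
        Instr ins = code[i];
        if (ins.kind == Kind::Goto) {
            // Follow chains of jumps; the step limit stops on cyclic chains.
            for (std::size_t steps = 0; steps < forward.size(); ++steps) {
                auto it = forward.find(ins.text);
                if (it == forward.end()) break;
                ins.text = it->second;
            }
            // Delete a jump to the label that immediately follows it.
            if (i + 1 < code.size() && code[i + 1].kind == Kind::Label &&
                code[i + 1].text == ins.text)
                continue;
        }
        out.push_back(ins);
    }
    return out;
}
```

Labels that are no longer jump targets could be removed afterwards; the sketch keeps them.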
Loop invariant code motion (licm tool)
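The effect of loop invariant code motion can be shown at source level. The two functions below are a hand-written before/after pair, not licm output: the product `n * k` does not change inside the loop, so it can be computed once in front of it.

```cpp
// Before: n * k is recomputed in every iteration.
int sum_before(const int* a, int len, int n, int k) {
    int s = 0;
    for (int i = 0; i < len; i++)
        s = s + a[i] * (n * k);
    return s;
}

// After loop invariant code motion: the invariant product is hoisted.
// This is safe here because computing n * k has no side effects.
int sum_after(const int* a, int len, int n, int k) {
    int s = 0;
    int inv = n * k;
    for (int i = 0; i < len; i++)
        s = s + a[i] * inv;
    return s;
}
```

Hoisting is only legal for computations without side effects that cannot fault; an expression such as a division that might trap must not be moved in front of a loop that could execute zero times.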
Induction variable elimination (ive tool)
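A hand-written before/after pair (not ive output) illustrates the idea: the derived induction variable `j = i * 4` is updated by adding 4 per iteration instead of multiplying, and the loop test is rewritten in terms of `j`, so the original induction variable `i` disappears.

```cpp
// Before: the index j is recomputed from i in every iteration.
int sum_stride_before(const int* a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        int j = i * 4;
        s = s + a[j];
    }
    return s;
}

// After: j is stepped by 4 directly and the test i < n becomes j < n * 4,
// so i is no longer needed.
int sum_stride_after(const int* a, int n) {
    int s = 0;
    int limit = n * 4;
    for (int j = 0; j < limit; j += 4)
        s = s + a[j];
    return s;
}
```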
Control flow analysis
Purpose: identify the basic block structure of a C function
Basic block (BB): IR statement sequence with a unique entry point and a unique exit point
Control flow graph (CFG): one node per BB; edge (BB1, BB2) iff BB2 may be an immediate successor of BB1 during execution
Assembly code generation is usually done BB after BB
Example:
  while (x) {
    BB1;
    if (x) BB2;
    else BB3;
    BB4;
  }
[Figure: resulting CFG in which BB1 branches to BB2 and BB3, which both lead to BB4]
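A minimal sketch of basic block construction with the classic leader rule (first statement, every label, every statement following a jump or return), plus CFG edges for jump targets and fall-through. It works on a hypothetical toy statement list, not on LANCE's ControlFlowGraph class.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy IR statement (hypothetical): kind plus an optional label / jump target.
enum class Kind { Assign, Label, Goto, CondGoto, Return };
struct Stm {
    Kind kind;
    std::string label;   // label name (Label) or jump target (Goto/CondGoto)
};

struct BasicBlock {
    std::size_t first, last;          // indices of first and last statement
    std::vector<std::size_t> succs;   // indices of successor blocks
};

std::vector<BasicBlock> build_cfg(const std::vector<Stm>& code) {
    std::vector<BasicBlock> bbs;
    for (std::size_t i = 0; i < code.size(); ++i) {
        bool leader = i == 0 || code[i].kind == Kind::Label ||
                      code[i - 1].kind == Kind::Goto ||
                      code[i - 1].kind == Kind::CondGoto ||
                      code[i - 1].kind == Kind::Return;
        if (leader) bbs.push_back({i, i, {}});
        else bbs.back().last = i;
    }
    // Which block starts with which label.
    std::map<std::string, std::size_t> block_of_label;
    for (std::size_t b = 0; b < bbs.size(); ++b)
        if (code[bbs[b].first].kind == Kind::Label)
            block_of_label[code[bbs[b].first].label] = b;
    // Edges: jump targets, plus fall-through unless the block ends in goto/return.
    for (std::size_t b = 0; b < bbs.size(); ++b) {
        const Stm& end = code[bbs[b].last];
        if (end.kind == Kind::Goto || end.kind == Kind::CondGoto)
            bbs[b].succs.push_back(block_of_label.at(end.label));
        bool falls_through = end.kind != Kind::Goto && end.kind != Kind::Return;
        if (falls_through && b + 1 < bbs.size())
            bbs[b].succs.push_back(b + 1);
    }
    return bbs;
}
```

Lowering the while/if example into labels and jumps gives six blocks: the loop test, BB1 to BB4, and the loop exit, with BB2 and BB3 both flowing into BB4 and BB4 jumping back to the loop test.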
CFG generation by LANCE
Class ControlFlowGraph contained in the LANCE library
Constructor ControlFlowGraph(Function* fun) generates the CFG for any function fun
LANCE tool showcfg exports CFGs in the VCG text format
VCG can be used to visualize generated CFGs
Tool flow: IR file → showcfg → VCG file → xvcg → CFG
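The exact output of showcfg is not reproduced here. As a rough sketch of what a VCG export involves, the function below writes a CFG in a minimal subset of VCG's graph description language, assuming only the basic `graph`, `node` and `edge` constructs with the `title`, `label`, `sourcename` and `targetname` attributes; blocks are given as a count and edges as index pairs.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Writes a CFG as a minimal VCG text graph: one node per basic block
// (named BB1, BB2, ...) and one edge per CFG edge.
std::string to_vcg(std::size_t num_blocks,
                   const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    std::ostringstream out;
    out << "graph: { title: \"CFG\"\n";
    for (std::size_t b = 0; b < num_blocks; ++b)
        out << "  node: { title: \"BB" << b + 1 << "\" label: \"BB" << b + 1 << "\" }\n";
    for (const auto& e : edges)
        out << "  edge: { sourcename: \"BB" << e.first + 1
            << "\" targetname: \"BB" << e.second + 1 << "\" }\n";
    out << "}\n";
    return out.str();
}
```

A viewer such as xvcg then handles the layout; the exporter only has to list nodes and edges.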
CFG visualization example
[Figure: CFG generated by showcfg and displayed with the VCG tool]
Data flow analysis
Goal: convert the IR into a data flow graph (DFG) representation for assembly code generation by tree pattern matching
Performed by def/use analysis between IR statements/expressions
LANCE lib class DataFlowAnalysis provides the required methods
Constructor DataFlowAnalysis(Function* fun) constructs data flow information for any function fun
Example:
  x = 5;
  goto lab;
  ...
  x = 6;
lab:
  y = x + 1;
  ...
  z = 1 - y;
  u = y / 5;
x has two definitions reaching its use in y = x + 1 (x = 5 and x = 6); y has two uses (in z = 1 - y and in u = y / 5)
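Def/use information of this kind is classically computed with reaching definitions. The sketch below uses the textbook iterative algorithm on a hypothetical statement-level graph; how the DataFlowAnalysis class works internally is not described here, so this is only an illustration of the technique.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One node per IR statement (hypothetical toy form): the variable it defines
// ("" if none) and the indices of its successor statements.
struct Node {
    std::string def;
    std::vector<std::size_t> succs;
};

// Reaching definitions, iterated to a fixpoint:
//   out[n] = gen[n] + (in[n] - kill[n]),  in[n] = union of out[p] over predecessors p.
// A definition is identified by the index of the statement that makes it.
std::vector<std::set<std::size_t>> reaching_defs(const std::vector<Node>& g) {
    std::vector<std::set<std::size_t>> in(g.size()), out(g.size());
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t n = 0; n < g.size(); ++n) {
            std::set<std::size_t> o;
            for (std::size_t d : in[n])            // in[n] minus kill[n]
                if (g[n].def.empty() || g[d].def != g[n].def) o.insert(d);
            if (!g[n].def.empty()) o.insert(n);    // gen[n]
            if (o != out[n]) { out[n] = o; changed = true; }
            for (std::size_t s : g[n].succs)       // propagate to successors
                for (std::size_t d : out[n])
                    if (in[s].insert(d).second) changed = true;
        }
    }
    return in;
}
```

Encoding the example as statements 0: x = 5, 1: goto lab, 2: x = 6, 3: y = x + 1, 4: z = 1 - y, 5: u = y / 5, both definitions 0 and 2 reach statement 3, and definition 3 of y reaches both of its uses.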
DFG visualization example
[Figure: DFG generated by showdfg and displayed with the VCG tool]
Backend interface
LANCE lib classes LANCEDataFlowTree and DFTManager provide the link between the LANCE IR and tree pattern matching
OLIVE/IBURG accept only trees instead of general DFGs
Hence: split DFGs at the common subexpressions (CSEs)
[Figure: DFG for x = a * b + 2 and y = a * b + c, in which the CSE a * b feeds both additions; after splitting, a * b is stored in an auxiliary variable t, and both additions read t]
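The splitting step can be sketched as follows, on a hypothetical DAG representation (not LANCEDataFlowTree or DFTManager): every operation node with more than one use is emitted once as its own tree that assigns an auxiliary variable, and all other trees read that variable like an ordinary leaf.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// DAG node (hypothetical): a leaf (variable or constant) if op is empty,
// otherwise a binary operation on the nodes left and right.
struct DagNode {
    std::string op;
    std::string leaf;
    int left;
    int right;
};

// assigns lists the DFG outputs as (variable, root node) pairs.
// Returns the resulting trees as text, in dependency order.
std::vector<std::string> split_at_cses(
        const std::vector<DagNode>& dag,
        const std::vector<std::pair<std::string, int>>& assigns) {
    std::vector<int> uses(dag.size(), 0);
    for (const DagNode& n : dag)
        if (!n.op.empty()) { ++uses[n.left]; ++uses[n.right]; }
    for (const auto& a : assigns) ++uses[a.second];

    std::map<int, std::string> temp;     // CSE node -> auxiliary variable
    std::vector<std::string> trees;
    // Text of node i; `top` means "print as a tree root, without cutting".
    std::function<std::string(int, bool)> expr = [&](int i, bool top) -> std::string {
        const DagNode& n = dag[i];
        if (n.op.empty()) return n.leaf;
        if (!top && uses[i] > 1) {       // CSE: cut here, emit its own tree once
            auto it = temp.find(i);
            if (it != temp.end()) return it->second;
            std::string body = expr(i, true);
            std::string t = "t" + std::to_string(temp.size() + 1);
            temp[i] = t;
            trees.push_back(t + " = " + body);
            return t;
        }
        std::string l = expr(n.left, false);
        std::string r = expr(n.right, false);
        std::string core = l + " " + n.op + " " + r;
        return top ? core : "(" + core + ")";
    };
    for (const auto& a : assigns) {
        std::string rhs = expr(a.second, uses[a.second] <= 1);
        trees.push_back(a.first + " = " + rhs);
    }
    return trees;
}
```

For x = a * b + 2 and y = a * b + c this yields the three trees t1 = a * b, x = t1 + 2 and y = t1 + c. The price is an extra store and load of t1, which is why tree-based code selection is optimal only per tree, not for the whole DFG.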
Data structure overview
Constructor DFTManager(Function* fun) generates the data flow tree (DFT) representation for an entire function fun
DFTManager contains an internal list of basic blocks
Each BB in turn is a list of DFTs
[Figure: list of basic blocks BB 1, BB 2, ..., BB n; each BB holds a list of DFTs DFT 1, DFT 2, ..., DFT m]
DFT covering with OLIVE
DFTs are directly in the format required by code generators produced by OLIVE
All DFTs consist of a fixed set of terminal symbols (e.g. cs_STORE), specified in file INCL/termlist.c
Example (only a single DFT):
[Figure: C file, IR file, and the resulting DFT representation]
Example (cont.)
[Figure: simplified OLIVE spec, the DFT in OLIVE format, and the generated assembly code for a hypothetical machine]
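OLIVE-generated code selectors find a cost-optimal cover of each DFT by dynamic programming. The sketch below swaps in the simpler maximal munch strategy (try the largest matching pattern at each node, top-down) to show what covering a tree with instruction patterns means. The operators CONST, VAR, ADD and STORE and the mnemonics li, ld, add, addi and st are hypothetical, not the terminals from INCL/termlist.c.

```cpp
#include <memory>
#include <string>
#include <vector>

// Tiny expression tree with hypothetical operators CONST, VAR, ADD and STORE.
struct Tree {
    std::string op;
    std::string value;                 // constant, variable name, or STORE target
    std::shared_ptr<Tree> kid0, kid1;
};

std::shared_ptr<Tree> mk(const std::string& op, const std::string& value,
                         std::shared_ptr<Tree> k0 = nullptr,
                         std::shared_ptr<Tree> k1 = nullptr) {
    return std::make_shared<Tree>(Tree{op, value, k0, k1});
}

// Maximal munch: at each node try the largest matching pattern first, emit
// code for the subtrees it leaves uncovered, then emit the instruction.
// Returns the (virtual) register holding the node's value.
std::string munch(const Tree& t, std::vector<std::string>& code, int& regs) {
    auto fresh = [&regs] { return "r" + std::to_string(regs++); };
    if (t.op == "STORE") {                            // STORE(x): st var, reg
        std::string v = munch(*t.kid0, code, regs);
        code.push_back("st " + t.value + ", " + v);
        return v;
    }
    if (t.op == "ADD" && t.kid1->op == "CONST") {     // ADD(x, CONST): one addi
        std::string a = munch(*t.kid0, code, regs);
        std::string r = fresh();
        code.push_back("addi " + r + ", " + a + ", " + t.kid1->value);
        return r;
    }
    if (t.op == "ADD") {                              // ADD(x, y): add
        std::string a = munch(*t.kid0, code, regs);
        std::string b = munch(*t.kid1, code, regs);
        std::string r = fresh();
        code.push_back("add " + r + ", " + a + ", " + b);
        return r;
    }
    std::string r = fresh();                          // leaves CONST and VAR
    code.push_back((t.op == "CONST" ? "li " : "ld ") + r + ", " + t.value);
    return r;
}
```

For the tree of a = b + 2 this emits `ld r0, b`, `addi r1, r0, 2` and `st a, r1`. Unlike a cost-based OLIVE cover, maximal munch is greedy and can miss the cheapest cover when patterns have different costs.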
Summary
LANCE provides you with:
  C frontend
  IR optimizations
  C++ library for IR access (+ important basic classes)
  Interface to OLIVE data flow trees
A full C compiler additionally requires:
  OLIVE-based backend for the concrete target machine
  Target-specific optimizations (e.g. scheduling, address generation)