7. Code Generation
Chih-Hung Wang
Compilers
References
1. C. N. Fischer, R. K. Cytron and R. J. LeBlanc. Crafting a Compiler. Pearson Education Inc., 2010.
2. D. Grune, H. Bal, C. Jacobs, and K. Langendoen. Modern Compiler Design. John Wiley & Sons, 2000.
3. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986 (2nd ed. 2006).
2
Overview
3
Interpretation
An interpreter is a program that considers the nodes of the AST in the correct order and performs the actions prescribed for those nodes by the semantics of the language.
Two varieties: recursive and iterative.
4
Interpretation
Recursive interpretation:
- operates directly on the AST [attribute grammar]
- simple to write
- thorough error checks
- very slow: about 1000 times slower than compiled code
Iterative interpretation:
- operates on intermediate code
- good error checking
- slow: about 100 times slower than compiled code
5
Recursive Interpretation
6
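To make the recursive scheme concrete, here is a minimal sketch of a recursive interpreter in C; the node kinds and their C representation are illustrative assumptions, not taken from the slides:

```c
#include <assert.h>

/* A value-returning recursive interpreter: each AST node is visited
   in the correct order and evaluated according to its kind. */
typedef enum { CONST, ADD, MUL } Kind;

typedef struct Node {
    Kind kind;
    int value;                 /* used when kind == CONST */
    struct Node *left, *right; /* used for ADD and MUL    */
} Node;

int interpret(const Node *n) {
    switch (n->kind) {
    case CONST: return n->value;
    case ADD:   return interpret(n->left) + interpret(n->right);
    case MUL:   return interpret(n->left) * interpret(n->right);
    }
    return 0; /* unreachable for well-formed ASTs */
}
```

Evaluating the tree for 7*(1+5), the expression used in the demo-compiler example later in these slides, yields 42.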
Self-identifying data
Must handle user-defined data types.
A value = a pointer to a type descriptor + an array of subvalues.
Example: a complex number
  re: 3.0
  im: 4.0
7
Complex number representation
8
Iterative interpretation
Operates on threaded AST
Active node pointer
Flat loop over a case statement
[Figure: threaded AST of an if-statement: an IF node with condition, THEN, and ELSE parts, joined at FI]
9
Sketch of the main loop
10
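A minimal sketch of such a main loop in C: a flat loop over a case statement, advancing an active node pointer through a linear array of intermediate-code nodes. The opcodes Add_Top2 and Mul_Top2 echo the stack-machine instructions used later in these slides; the rest of the representation is an assumption:

```c
#include <assert.h>

typedef enum { PUSH_CONST, ADD_TOP2, MUL_TOP2, HALT } Op;

typedef struct { Op op; int arg; } Instr;

int run(const Instr *active) {          /* the active node pointer  */
    int stack[64], sp = 0;
    for (;;) {                          /* the flat main loop       */
        switch (active->op) {
        case PUSH_CONST: stack[sp++] = active->arg;      break;
        case ADD_TOP2:   sp--; stack[sp-1] += stack[sp]; break;
        case MUL_TOP2:   sp--; stack[sp-1] *= stack[sp]; break;
        case HALT:       return stack[sp-1];
        }
        active++;                       /* advance to the next node */
    }
}
```

Running it on the threaded code for 7*(1+5) returns 42.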
Example for demo compiler
11
Code Generation
Compilation produces object code from the intermediate code tree through a process called code generation.
Tree rewriting:
- replace nodes and subtrees of the AST by target code segments
- produce a linear sequence of instructions from the rewritten AST
12
Example of code generation: a := (b[4*c+d]*2)+9;
13
Machine instructions
Load_Addr M[Ri], C, Rd
  Loads the address of the Ri-th element of the array at M into Rd, where the size of the elements of M is C bytes.
Load_Byte (M+Ro)[Ri], C, Rd
  Loads the byte contents of the Ri-th element of the array at M plus offset Ro into Rd, where the other parameters have the same meanings as above.
14
Two sample instructions with their ASTs
15
Code generation
Main issues:
- code selection: which template?
- register allocation: too few registers!
- instruction ordering
Optimal code generation is NP-complete, so:
- consider small parts of the AST
- simplify the target machine
- use conventions
16
Object code sequence:
Load_Byte (b+Rd)[Rc], 4, Rt
Load_Addr 9[Rt], 2, Ra
17
Trivial code generation
18
Code for (7*(1+5))
19
Partial evaluation
20
New Code
21
Simple code generation
Consider one AST node at a time.
Two simplistic target machines:
- pure register machine
- pure stack machine
[Figure: stack layout with SP and BP pointing into the stack; the stack frame holds the variables]
22
Pure stack machine
Instructions
23
Example of p := p+5
Push_Local #p
Push_Const 5
Add_Top2
Store_Local #p
24
Pure register machine
Instructions
25
Example of p := p+5
Load_Mem p, R1
Load_Const 5, R2
Add_Reg R2, R1
Store_Reg R1, p
26
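The four instructions above can be given a toy semantics; this C sketch executes the slide's sequence for p := p+5 (the flat memory array and the operand encoding are assumptions):

```c
#include <assert.h>

typedef enum { LOAD_MEM, LOAD_CONST, ADD_REG, STORE_REG } Op;
typedef struct { Op op; int a, b; } Instr;  /* operand meanings per opcode below */

int mem[16];   /* memory cells; p lives at some index */
int reg[8];    /* register file R0..R7                */

void run(const Instr *code, int n) {
    for (int i = 0; i < n; i++) {
        Instr ins = code[i];
        switch (ins.op) {
        case LOAD_MEM:   reg[ins.b] = mem[ins.a];  break; /* Load_Mem a, Rb   */
        case LOAD_CONST: reg[ins.b] = ins.a;       break; /* Load_Const a, Rb */
        case ADD_REG:    reg[ins.b] += reg[ins.a]; break; /* Add_Reg Ra, Rb   */
        case STORE_REG:  mem[ins.b] = reg[ins.a];  break; /* Store_Reg Ra, b  */
        }
    }
}
```

With p stored at mem[0] holding 10, running the four-instruction sequence leaves 15 in mem[0].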
Simple code generation for a stack machine
The AST for b*b - 4*(a*c)
27
The ASTs for the stack machine instructions
28
The AST for b*b - 4*(a*c) rewritten
29
Simple code generation for a stack machine (demo)
Example: b*b - 4*(a*c), threaded AST:

        -
      /   \
     *     *
    / \   / \
   b   b 4   *
            / \
           a   c
30
Simple code generation for a stack machine (demo)
Example: b*b - 4*(a*c), threaded AST with the instruction for each node:
- the - node: Sub_Top2
- each * node: Mul_Top2
- the b leaves: Push_Local #b
- the 4 leaf: Push_Const 4
- the a and c leaves: Push_Local #a, Push_Local #c
31
Simple code generation for a stack machine (demo)
Example: b*b - 4*(a*c), the rewritten AST linearized into the instruction sequence:
Push_Local #b
Push_Local #b
Mul_Top2
Push_Const 4
Push_Local #a
Push_Local #c
Mul_Top2
Mul_Top2
Sub_Top2
32
Depth-first code generation
33
Stack configurations
34
Simple code generation for a register machine
The ASTs for the register machine instructions
35
Code generation with register allocation
36
Code generation with register numbering
37
Register machine code for b*b - 4*(a*c)
38
Register contents
39
Weighted register allocation
It is advantageous to generate the code for the child that requires the most registers first.
Weight: the number of registers required to evaluate a node.
40
Register weight of a node
41
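The weight rule can be stated in a few lines of C (the Sethi-Ullman-style computation the slides imply; the Node shape is an assumption):

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node { struct Node *left, *right; } Node; /* NULL children = leaf */

/* registers needed to evaluate the subtree rooted at n */
int weight(const Node *n) {
    if (n->left == NULL) return 1;        /* a leaf needs one register        */
    int wl = weight(n->left), wr = weight(n->right);
    if (wl == wr) return wl + 1;          /* both children equally demanding  */
    return wl > wr ? wl : wr;             /* evaluate the heavier child first */
}
```

For b*b - 4*(a*c) this gives weight 2 to both children of the - node, hence 3 for the whole tree, matching the annotated AST.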
AST for b*b-4*(a*c) with register weights
42
Weighted register machine code
43
Example
Parameter number N:                           2  3  1
Stored weight:                                4  2  1
Registers occupied when starting parameter N: 0  1  2
Maximum per parameter:                        4  3  3
Overall maximum:                              4
44
Example: Tree representation
45
Register spilling
Too few registers? Spill register contents to memory, to be retrieved later.
Heuristic: select a subtree that uses all available registers, generate code for it first, and replace it by a temporary.
Example: b*b - 4*(a*c) with 2 registers.
[Figure: the weighted AST; the subtree b*b (weight 2) is evaluated first and stored in temporary T1]
46
[Figure: the weighted AST for b*b - 4*(a*c) with the subtree b*b replaced by temporary T1]
Register spilling
Load_Mem b, R1
Load_Mem b, R2
Mul_Reg R2, R1
Store_Mem R1, T1
Load_Mem a, R1
Load_Mem c, R2
Mul_Reg R2, R1
Load_Const 4, R2
Mul_Reg R1, R2
Load_Mem T1, R1
Sub_Reg R2, R1
47
Another example
[Figure: another weighted AST, with a different subtree spilled to temporary T1]
48
Algorithm
49
Machines with register-memory operations
An instruction
  Add_Mem X, R1
adds the contents of memory location X to R1.
50
Register-weighted tree for a memory-register machine
51
Code generation for basic blocks
Finding the optimal rewriting of the AST with the available instruction templates is NP-complete.
Three techniques:
- basic blocks
- bottom-up tree rewriting
- register allocation by graph coloring
52
Basic block
Improves the quality of the code emitted by simple code generation by considering multiple AST nodes at a time.
Generate code for maximal basic blocks: basic blocks that cannot be extended by including adjacent AST nodes.
A basic block is a part of the control-flow graph that contains no splits (jumps) or combines (labels).
53
Example of a basic block
A basic block consists of expressions and assignments.
The fixed sequence (;) limits code generation: an AST is too restrictive.
54
From AST to dependency graph
AST for the simple basic block
55
Simple algorithm to convert an AST to a data dependency graph
1. Replace arcs by downward arrows (upward for a destination under assignment).
2. Insert data dependencies from each use of a variable V to the preceding assignment to V.
3. Insert data dependencies from each assignment to a variable V to the previous assignment to V.
4. Add roots to the graph for the output variables.
5. Remove ;-nodes and their connecting arrows.
56
Simple data dependency graph
57
Cleaned-up graph
58
Exercise
{ int n;
n = a+1;
x = (b+c) * n;
n = n+1;
y = (b+c) * n;
}
Convert the above code into a data dependency graph.
59
Answer
[Figure: data dependency graph — b+c is computed once and shared by both multiplications; a+1 feeds the * rooted at x, and (a+1)+1 feeds the * rooted at y]
60
Common subexpression elimination
Simple example:
x = a*a + 2*a*b + b*b;
y = a*a - 2*a*b + b*b;
Three common subexpressions:
double quads = a*a + b*b;
double cross_prod = 2*a*b;
x = quads + cross_prod;
y = quads - cross_prod;
61
Common subexpressions
Equal subexpressions in a basic block are not necessarily common subexpressions:
x = a*a + 2*a*b + b*b;
a = b = 0;
y = a*a - 2*a*b + b*b;
62
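One way to detect common subexpressions within a basic block is value numbering, sketched here in C: each distinct (operator, operand, operand) triple gets one number, so a repeated triple is a common subexpression. The representation is an assumption; note that, per the slide above, an assignment to a or b would have to invalidate the numbers that depend on them, a step this sketch omits:

```c
#include <assert.h>

typedef struct { char op; int left, right; } Triple;

static Triple table[64];
static int count = 0;

/* returns the value number for op(left, right), reusing an existing
   entry when the same triple was seen before in this basic block */
int value_number(char op, int left, int right) {
    for (int i = 0; i < count; i++)
        if (table[i].op == op && table[i].left == left && table[i].right == right)
            return i;                 /* common subexpression found */
    table[count].op = op;
    table[count].left = left;
    table[count].right = right;
    return count++;                   /* a new, distinct expression */
}
```

Leaves get numbers too (here via a 'v' pseudo-operator on the variable name), so a*b asked for twice returns the same number, while b*a gets a fresh one unless commutativity is handled separately.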
Common subexpression example (1/3)
63
Common subexpression example (2/3)
64
Common subexpression example (3/3)
65
From dependency graph to code
Rewrite the nodes with machine instruction templates, and linearize the result.
- Instruction ordering: ladder sequences
- Register allocation: graph coloring
66
Linearization of the data dependency graph
Example:
(a+b)*c – d
Definition of a ladder sequence
- Each root node is a ladder sequence.
- A ladder sequence S ending in operator node N can be extended with the left operand of N.
- If operator N is commutative, then S may also be extended with the right operand of N.
Load_Mem a, R1
Add_Mem b, R1
Mul_Mem c, R1
Sub_Mem d, R1
67
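A sketch of code generation for a single ladder sequence on a register-memory machine, assuming for simplicity that every right operand along the ladder is a leaf variable (the Node shape and the output buffer are assumptions):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct Node {
    char op;               /* '+', '-', '*', or 0 for a leaf */
    char name;             /* leaf only: the variable name   */
    struct Node *left, *right;
} Node;

char buf[256];             /* the generated code accumulates here */

static const char *mnemonic(char op) {
    return op == '+' ? "Add_Mem" : op == '-' ? "Sub_Mem" : "Mul_Mem";
}

/* emit code for the ladder that starts at n */
void gen_ladder(const Node *n) {
    if (n->op == 0) {      /* bottom of the ladder: load the leaf  */
        sprintf(buf + strlen(buf), "Load_Mem %c, R1\n", n->name);
        return;
    }
    gen_ladder(n->left);   /* descend the left spine first         */
    sprintf(buf + strlen(buf), "%s %c, R1\n", mnemonic(n->op), n->right->name);
}
```

For (a+b)*c - d it produces exactly the four instructions shown above.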
Code generated for a given ladder sequence
Load_Mem b, R1
Add_Reg I1, R1
Add_Mem c, R1
Store_Reg R1, x
68
Heuristic ordering algorithm
To delay the issue of register allocation, use pseudo-registers during the linearization.
1. Select a ladder sequence S with no more than one incoming dependency.
2. Introduce temporary (pseudo-)registers for the non-leaf operands, which become additional roots.
3. Generate code for S, using R1 as the ladder register.
4. Remove S from the graph.
5. Repeat steps 1 through 4 until the entire data dependency graph has been consumed and rewritten to code.
69
Example of linearization
X1
70
The code for the ladder y, *, +:
Load_Reg X1, R1
Add_Const 1, R1
Mul_Mem d, R1
Store_Reg R1, y
71
Remove the ladder sequence y, *, +
72
The code for the ladder x, +, +, *:
Load_Reg X1, R1
Mul_Reg X1, R1
Add_Mem b, R1
Add_Mem c, R1
Store_Reg R1, x
73
The last step:
Load_Mem a, R1
Add_Const 1, R1
Load_Reg R1, X1
74
The results of code generation
75
Exercise
Generate code for the following dependency graph
[Figure: data dependency graph for x = a*a + 2*a*b + b*b and y = a*a - 2*a*b + b*b, with the shared subexpressions a*a, 2*a*b, and b*b]
76
Answers
(The graph's non-leaf operands become pseudo-registers: R2 = a*a, R3 = 2*a*b, R4 = b*b.)

1) ladder x, +, +:
Load_Reg R2, R1
Add_Reg R3, R1
Add_Reg R4, R1
Store_Mem R1, x

2) ladder y, +, -:
Load_Reg R2, R1
Sub_Reg R3, R1
Add_Reg R4, R1
Store_Mem R1, y

3) ladder R3, *, *:
Load_Const 2, R1
Mul_Reg Ra, R1
Mul_Reg Rb, R1
Load_Reg R1, R3

4) ladder R2, *:
Load_Reg Ra, R1
Mul_Reg Ra, R1
Load_Reg R1, R2

5) ladder R4, *:
Load_Reg Rb, R1
Mul_Reg Rb, R1
Load_Reg R1, R4
77
Register allocation for the linearized code
Map the pseudo-registers to memory locations or real registers, as e.g. the gcc compiler does.
78
Code optimization in the presence of pointers
Pointers cause two different problems for the dependency graph.

a = x * y;
*p = 3;
b = x * y;
Here x * y is not a common subexpression if p happens to point to x or y.

a = *p * y;
b = 3;
c = *p * q;
Here *p is not a common subexpression if p happens to point to b.
79
Example (1/4)
Assignment under a pointer
80
Example (2/4)
Data dependency graph with an assignment under a pointer
81
Example (3/4)
Cleaned-up graph
82
Example (4/4)
Target code
*x:=R1
83
BURS code generation
In practice, machines often have a great variety of instructions, simple ones and complicated ones, and better code can be generated if all available instructions are utilized.
Machines often have several hundred different machine instructions, often each with ten or more addressing modes, and it would be very advantageous if code generators for such machines could be derived from a concise machine description rather than written by hand.
84
BURS code generation
Simple instruction patterns (1/2)
85
BURS code generation
Simple instruction patterns (2/2)
86
Example: Input tree
87
Naïve rewrite
Its cost is 17 units: 1 + 3 + 4 + 1 + 4 + 3 + 1 = 17
88
Resulting code
89
Top-down largest-fit rewrite
90
Discussion
- How do we find all possible rewrites, and how do we represent them? Clearly we do not want to list them all!
- How do we find the best (cheapest) rewrite among all possibilities, preferably in time linear in the size of the expression to be translated?
91
Bottom-up pattern matching
The dotted trees
92
Outline code for bottom-up pattern matching
93
Resulting label set
94
Instruction selection by dynamic programming
Bottom-up pattern matching with costs:
#5->reg, #6->reg, #7.1, #8.1
Instruction selection
95
Cost evaluation
Lower *: #5->reg@7, #6->reg@8 (1+3+4)
Higher *: #6->reg@12 (1+7+4), #8->reg@9 (1+3+5)
Top + (?): exercise
96
Code generation by bottom-up matching
97
Code generation by bottom-up matching, using commutativity
98
Pattern matching and instruction selection combined
Two basic operands:
State S1: ->cst@0, #1->reg@1
State S2: ->mem@0, #2->reg@3
99
States of the BURS
100
Creating the cost-conscious next-state table
The triplet {'+', S1, S1} = S3
  S3: #4->reg@3 (1+1+1)
{'+', S1, S2} = S5
  S5: #3->reg@1+0+3=4, #4->reg@1+3+1=5
Exercise: {'+', S1, S5}
Exercise: {'*', S1, S2}
  #5->reg@1+0+6=7 (4)
  #6->reg@1+3+4=8
  #7.1@0+3+0=3 (0)
  #8.1@0+3+0=3 (0)
101
Cost-conscious next-state table
102
Code generation using cost-conscious next-state table
103
Register allocation by graph coloring
Procedure-wide register allocation: only live variables require register storage.
Two variables (values) interfere when their live ranges overlap.
Dataflow analysis: a variable is live at node N if the value it holds is used on some path further down the control-flow graph; otherwise it is dead.
104
A program segment for live analysis
105
Live range of the variables
106
Graph coloring
An NP-complete problem.
Heuristic: color easy nodes last.
1. Find the node N with the lowest degree.
2. Remove N from the graph.
3. Color the simplified graph.
4. Set the color of N to the first color that is not used by any of N's neighbors.
107
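The four-step heuristic can be sketched in C for a small interference graph stored as an adjacency matrix (the graph size and representation are assumptions):

```c
#include <assert.h>

#define N 4
int adj[N][N];     /* interference: adj[u][v] == 1 if live ranges overlap */
int color[N];      /* assigned colors; -1 = not yet colored               */

static int degree(int v, const int removed[]) {
    int d = 0;
    for (int u = 0; u < N; u++)
        if (!removed[u] && adj[v][u]) d++;
    return d;
}

void color_graph(void) {
    int removed[N] = {0}, order[N];
    for (int v = 0; v < N; v++) color[v] = -1;
    for (int i = 0; i < N; i++) {               /* remove easy (low-degree) nodes first */
        int best = -1;
        for (int v = 0; v < N; v++)
            if (!removed[v] && (best < 0 || degree(v, removed) < degree(best, removed)))
                best = v;
        removed[best] = 1;
        order[i] = best;
    }
    for (int i = N - 1; i >= 0; i--) {          /* color in reverse removal order, so  */
        int v = order[i], used[N + 1] = {0};    /* the easy nodes are colored last     */
        for (int u = 0; u < N; u++)
            if (adj[v][u] && color[u] >= 0) used[color[u]] = 1;
        int c = 0;
        while (used[c]) c++;                    /* first color no neighbor uses        */
        color[v] = c;
    }
}
```

On a triangle plus one pendant node it finds a valid 3-coloring, matching the 3-register example in the coloring-process slide.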
Coloring process
3 registers
108
Preprocessing the intermediate code
Preprocessing of expressions:
char lower_case_from_capital(char ch) { return ch + ('a' - 'A'); }
Constant expression evaluation:
char lower_case_from_capital(char ch) { return ch + 32; }
109
Arithmetic simplification
Transformations that replace an operation by a simpler one are called strength reductions.
Operations that can be removed completely are called null sequences.
110
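A sketch of both kinds of transformation on a tiny expression node: the strength reduction V * 2 -> V << 1 and the null sequence V + 0 -> V (the node representation is an assumption):

```c
#include <assert.h>

typedef enum { VAR, CONST, ADD, MUL, SHL } Kind;
typedef struct Node { Kind kind; int value; struct Node *left, *right; } Node;

Node *simplify(Node *n) {
    if (n->kind == MUL && n->right->kind == CONST && n->right->value == 2) {
        n->kind = SHL;            /* strength reduction: V * 2 -> V << 1 */
        n->right->value = 1;
        return n;
    }
    if (n->kind == ADD && n->right->kind == CONST && n->right->value == 0)
        return n->left;           /* null sequence: V + 0 disappears     */
    return n;
}
```

A real pass would table-drive many such rules, as in the transformations on the next slide; this shows only the rewrite mechanism.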
Some transformations for arithmetic simplification
111
Preprocessing of if-statements and goto statements
When the condition in an if-then-else statement turns out to be constant, we can delete the code of the branch that will never be executed. This process is called dead code elimination.
If a goto or return statement is followed by code that has no incoming data flow, that code is dead and can be eliminated.
112
Stack representations
113
Stack representations (details)
[Figure: a sequence of stack representations while processing IF condition THEN ... ELSE ... FI for the condition y > 0, with x known to hold 5; the assignment x = 7; in one branch updates the representation, the branch that cannot be taken becomes dead code, and the representations merge at FI]
114
Preprocessing of routines
In-lining method
115
In-lining result
Advanced example:
{int n=3; printf("square=%d\n", n*n);}
=> {int n=3; printf("square=%d\n", 3*3);}
=> {int n=3; printf("square=%d\n", 9);}
Resulting code:
Load_par "square=%d\n"
Load_par 9
Call printf
116
Cloning
Example:
double power_series(int n, double a[], double x)
{ int p; double result = 0.0;
  for (p = 0; p < n; p++) result += a[p] * (x**p);
  return result; }
If it is called with x set to 1.0, cloning gives
double power_series(int n, double a[])
{ int p; double result = 0.0;
  for (p = 0; p < n; p++) result += a[p] * (1.0**p);
  return result; }
which simplifies to
double power_series(int n, double a[])
{ int p; double result = 0.0;
  for (p = 0; p < n; p++) result += a[p];
  return result; }
(** denotes exponentiation in this pseudo-C.)
117
Postprocessing the target code
Stupid instruction sequences:
Load_Reg R1, R2
Load_Reg R2, R1
or
Store_Reg R1, n
Load_Mem n, R1
118
Creating replacement patterns
Examples:
Load_Reg Ra, Rb; Load_Reg Rc, Rd | Ra=Rd, Rb=Rc => Load_Reg Ra, Rb
Load_Const 1, Ra; Add_Reg Rb, Rc | Ra=Rb, is_last_use(Rb) => Increment Rc
119
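The first replacement pattern above can be applied by a small peephole pass over an instruction list, sketched here in C (the instruction representation is an assumption):

```c
#include <assert.h>
#include <string.h>

typedef struct { char op[16]; int src, dst; } Instr;

/* collapses Load_Reg Ra,Rb; Load_Reg Rc,Rd with Ra==Rd, Rb==Rc
   into the single first instruction; returns the new code length */
int peephole(Instr *code, int n) {
    int out = 0;
    for (int i = 0; i < n; i++) {
        code[out++] = code[i];
        if (out >= 2 &&
            strcmp(code[out-2].op, "Load_Reg") == 0 &&
            strcmp(code[out-1].op, "Load_Reg") == 0 &&
            code[out-2].src == code[out-1].dst &&
            code[out-2].dst == code[out-1].src)
            out--;                    /* the second load is redundant */
    }
    return out;
}
```

Checking the last two emitted instructions after every copy lets replacements cascade, which is the point of running the patterns over a sliding window rather than a single fixed pass.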
Locating and replacing instructions
- Multiple pattern matching using an FSA
- Dotted items
120
Homework
Study sections:
4.2.13 Machine code generation
4.3 Assemblers, linkers and loaders