Theory of Compilation 236360
Erez Petrank
Lecture 11: Optimizations
Register Allocation by Graph Coloring
• Address register allocation by
  – liveness analysis
  – reduction to graph coloring
  – optimizations by program transformation
• Main idea
  – register allocation = coloring of an interference graph
  – every node is a variable
  – an edge connects two variables that “interfere”, i.e., are both live at the same time
  – number of colors = number of registers
Example
[Figure: live ranges of the variables v1 … v8 along a time axis, and the interference graph on the vertices V1 … V8.]
Example

Source program:

    a = read();
    b = read();
    c = read();
    a = a + b + c;
    if (a < 10) {
        d = c + 8;
        print(c);
    } else if (a < 20) {
        e = 10;
        d = e + a;
        print(e);
    } else {
        f = 12;
        d = f + a;
        print(f);
    }
    print(d);

Control flow graph:

    B1: a = read();
        b = read();
        c = read();
        a = a + b + c;
        if (a < 10) goto B2 else goto B3
    B2: d = c + 8;
        print(c);
    B3: if (a < 20) goto B4 else goto B5
    B4: e = 10;
        d = e + a;
        print(e);
    B5: f = 12;
        d = f + a;
        print(f);
    B6: print(d);

[Figure: the control flow graph, annotated with the variables a, b, c, d, e, f next to the blocks in which they are live.]
Example: Interference Graph
[Figure: the interference graph of the example program, on the nodes f, a, b, d, e, c, shown next to its control flow graph.]
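A minimal sketch of how the interference graph of the example could be derived. It assumes the live sets after each instruction are already known; the sets below were worked out by hand for this sketch and do not appear on the slides. The names `interference_edges` and `LIVE_SETS` are hypothetical.

```python
from itertools import combinations


def interference_edges(live_sets):
    """Return the interference edges implied by a list of live sets."""
    edges = set()
    for live in live_sets:
        # Any two variables that are live at the same point interfere.
        for u, v in combinations(sorted(live), 2):
            edges.add((u, v))
    return edges


# Live sets after each instruction of the example (hand-computed assumption).
LIVE_SETS = [
    {"a"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"},  # B1
    {"c", "d"}, {"d"},                               # B2
    {"a"},                                           # B3
    {"a", "e"}, {"d", "e"}, {"d"},                   # B4
    {"a", "f"}, {"d", "f"}, {"d"},                   # B5
]

EDGES = interference_edges(LIVE_SETS)
```

Under these assumptions the graph has eight edges: a-b, a-c, b-c, c-d, a-e, d-e, a-f and d-f.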
Register Allocation by Graph Coloring
• Variables that interfere with each other cannot be allocated the same register.
• Graph coloring
  – classic problem: how to color the nodes of a graph with the lowest possible number of colors
  – bad news: the problem is NP-hard (even to approximate)
  – good news: there are pretty good heuristic approaches
Heuristic Graph Coloring
• Idea: color nodes one by one, coloring the “easiest” node last.
• The “easiest” nodes are the ones that have the lowest degree
  – fewer conflicts
• Algorithm at a high level
  – find the least connected node
  – remove the least connected node from the graph
  – color the reduced graph recursively
  – re-attach the least connected node
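The high-level algorithm can be sketched as follows. This is an illustrative sketch, not the course's reference implementation: the edge list is an assumption (the edges implied by the liveness of the six-variable example program), ties between nodes of equal degree are broken alphabetically, and the lowest free color is always chosen. The names `build_graph` and `color_graph` are hypothetical.

```python
# Assumed interference edges of the six-variable example.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("a", "e"), ("d", "e"), ("a", "f"), ("d", "f")]


def build_graph(edges):
    """Turn an edge list into a dict: node -> set of neighbours."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def color_graph(graph):
    """Remove lowest-degree nodes one by one, then color in reverse order."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        # Least connected node; ties broken by name to be deterministic.
        node = min(work, key=lambda n: (len(work[n]), n))
        for m in work[node]:
            work[m].discard(node)
        del work[node]
        stack.append(node)
    colors = {}
    while stack:
        node = stack.pop()  # re-attach the most recently removed node
        used = {colors[m] for m in graph[node] if m in colors}
        color = 1
        while color in used:
            color += 1
        colors[node] = color
    return colors


COLORS = color_graph(build_graph(EDGES))
```

With these tie-breaking choices the nodes are removed in the order b, c, a, e, d, f, and three colors are used.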
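Heuristic Graph Coloring: Example
[Animated figure: the interference graph on f, a, b, d, e, c is taken apart node by node and then rebuilt.]
• Removing nodes (the top of the stack is on the left):
  – remove b; stack: b
  – remove c; stack: cb
  – remove a; stack: acb
  – remove e; stack: eacb
  – remove d; stack: deacb
  – remove f; stack: fdeacb
• Re-attaching nodes in reverse order and coloring them:
  – f gets color 1; stack: deacb
  – d gets color 2; stack: eacb
  – e gets color 1; stack: acb
  – a gets color 2; stack: cb
  – c gets color 1; stack: b
  – b gets color 3; stack: (empty)
• Result: 3 registers for 6 variables.
• Can we do with 2 registers?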
Heuristic Graph Coloring
• Two sources of non-determinism in the algorithm:
  – choosing which of the (possibly many) nodes of lowest degree should be detached
  – choosing a free color from the available colors

Heuristic Graph Coloring
• The above heuristic gives a coloring of the graph.
• But what we really need is to color the graph with a given number of colors = number of available registers.
• Many times this is not possible.
• We’d like to find the maximum sub-graph that can be colored.
• Vertices that cannot be colored represent variables that will not be assigned a register.
Similar Heuristic
[Figure: example graph on the vertices V1 … V8.]
1. Iteratively remove any vertex whose degree < k (with all of its edges).
2. Note: no matter how we color the other vertices, this one can be colored legitimately!
4. Now all vertices are of degree >= k (or the graph is empty).
5. If the graph is empty: color the vertices one by one as in the previous slides. Otherwise,
6. Choose any vertex and remove it from the graph. Implication: this variable will not be assigned a register. Repeat this step until there is a vertex with degree < k, then go back to (1).
• Source of non-determinism: the choice of which vertex to remove in (6). This decision determines the number of spills.
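The k-register variant can be sketched as below. The slides allow any vertex to be removed in step (6); picking the vertex of highest degree is an assumption of this sketch, as are the example graph (the assumed interference graph of the six-variable program) and the names `allocate` and `GRAPH`.

```python
# Assumed interference graph of the six-variable example: node -> neighbours.
GRAPH = {
    "a": {"b", "c", "e", "f"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"a", "d"},
    "f": {"a", "d"},
}


def allocate(graph, k):
    """Return (colors, spilled) for k registers, colors numbered 0..k-1."""
    work = {n: set(adj) for n, adj in graph.items()}
    stack, spilled = [], []

    def remove(node):
        for m in work[node]:
            work[m].discard(node)
        del work[node]

    while work:
        node = min(work, key=lambda n: (len(work[n]), n))
        if len(work[node]) < k:
            # Steps 1-2: a vertex of degree < k can always be colored later.
            remove(node)
            stack.append(node)
        else:
            # Step 6: every vertex has degree >= k, so give one up (spill).
            victim = max(work, key=lambda n: (len(work[n]), n))
            remove(victim)
            spilled.append(victim)

    colors = {}
    while stack:
        node = stack.pop()
        used = {colors[m] for m in graph[node] if m in colors}
        # Fewer than k neighbours are colored at this point, so a color is free.
        colors[node] = next(c for c in range(k) if c not in used)
    return colors, spilled
```

For example, `allocate(GRAPH, 2)` spills one variable under this policy, while `allocate(GRAPH, 3)` colors all six without spilling.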
Summary: Code Generation
• Depends on the target language and platform.
  – GNU Assembly
  – IA-32 platform
• Basic blocks and the control flow graph show program execution paths.
• Determining variable liveness in a basic block.
  – Useful for many optimizations.
  – Most important use: register allocation.
• Simple code generation.
• Better register allocation via a graph coloring heuristic.
Optimizations

Optimization
• Improve performance.
• Must maintain program semantics
  – the optimized program must be equivalent.
• In contrast to the name, we seldom obtain the optimum.
• We do not improve an inefficient algorithm, and we do not fix bugs.
• Classical question: how much time should we spend on compile-time optimizations to save on running time?
  – With parameters unknown.
• Optimize running time (most popular),
• Optimize size of code,
• Optimize memory usage,
• Optimize energy consumption.
Where does inefficiency come from?
• Redundancy in the original program:
  – Sometimes the programmer uses redundancy to make programming easier, knowing that the compiler will remove it.
  – Sometimes the programmer is not very good.
• Redundancy because of the high-level language:
  – E.g., accessing an array means computing i*4 inside a loop repeatedly.
• Redundancy due to translation:
  – The initial compilation process is automatic and not very clever.
Running Time Optimization
• Need to understand the run characteristics (which are often unknown).
  – Usually the program spends most of its time in a small part of the code. If we optimize that part, we gain a lot.
  – Thus, we invest more in inner loops.
  – Example: place together functions with high coupling.
• Need to know the operating system and the architecture.
• We will survey a few simple methods first, starting with building a DAG.
Representing a basic block computation with a DAG
• Leaves are variables or constants, marked by their names or values.
• Inner vertices are marked by their operators.
• We also associate variable names with the inner vertices as the computation advances.

    (1) t1 := 4 * i
        t2 := a [ t1 ]
        t3 := 4 * i
        t4 := b [ t3 ]
        t5 := t2 * t4
        t6 := prod + t5
        prod := t6
        t7 := i + 1
        i := t7
        if i <= 20 goto (1)

[Figure: the DAG of this block. Leaves: a, b, 4, i0, 1, 20, prod0. Inner vertices: * (t1, t3), [ ] (t2), [ ] (t4), * (t5), + (t6, prod), + (t7, i), and <= with target (1).]
Building the DAG
For each instruction x := y + z
• Find the current location of y and z.
• Build a new node marked “+” and connect it as a parent to both nodes (if such a parent does not exist); associate this node with “x”.
• If x was previously associated with a different node, cancel the previous association (so that it is not used again).
• Do not create a new node for a copy assignment such as x := y. Instead, associate x with the node that y is associated with.
  – Such assignments are typically eliminated during the optimization.
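A small sketch of the construction, under simplifying assumptions: instructions are given as tuples `(dest, op, x, y)`, a copy is written `(dest, "copy", src, None)`, array reads use the operator `"[]"`, and the block contains no array stores, so aliasing is ignored. The tuple format and the name `build_dag` are hypothetical.

```python
def build_dag(block):
    """Build the DAG of a basic block; return (nodes, current)."""
    nodes = []    # node id -> (label, left child id, right child id)
    index = {}    # (label, left, right) -> node id, to reuse existing nodes
    current = {}  # name -> id of the node that currently holds its value

    def node(label, left=None, right=None):
        key = (label, left, right)
        if key not in index:          # create the node only if it is new
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def location(name):
        if name not in current:       # first mention: a leaf (initial value)
            current[name] = node(name)
        return current[name]

    for dest, op, x, y in block:
        if op == "copy":
            current[dest] = location(x)   # no new node for x := y
        else:
            current[dest] = node(op, location(x), location(y))
    return nodes, current


BLOCK = [
    ("t1", "*", "4", "i"),
    ("t2", "[]", "a", "t1"),
    ("t3", "*", "4", "i"),
    ("t4", "[]", "b", "t3"),
    ("t5", "*", "t2", "t4"),
    ("t6", "+", "prod", "t5"),
    ("prod", "copy", "t6", None),
    ("t7", "+", "i", "1"),
    ("i", "copy", "t7", None),
]

NODES, CURRENT = build_dag(BLOCK)
```

In the result, t1 and t3 share one node, as do t6 and prod, and t7 and i, which is how common expressions and redundant copies become visible.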
Using the DAG
• Regenerating the code from the DAG gives a shorter block:

    (1) t1 := 4 * i
        t2 := a [ t1 ]
        t4 := b [ t1 ]
        t5 := t2 * t4
        prod := prod + t5
        i := i + 1
        if i <= 20 goto (1)

• t3 := 4 * i is dropped, and t1 is used instead of t3.
• t6 := prod + t5 followed by prod := t6 becomes prod := prod + t5.
• t7 := i + 1 followed by i := t7 becomes i := i + 1.
Uses of DAGs
• Automatic identification of common expressions.
• Identification of variables that are used in the block.
• Identification of values that are computed but not used.
• Identifying computation dependence (allowing code movements).
• Avoiding redundant copying instructions.
Aliasing Problems
• What’s wrong with the following optimization?

    x := a [ i ]              x := a [ i ]
    a [ j ] := y      →       a [ j ] := y
    z := a [ i ]              z := x

• The problem is the side effect due to aliasing: if i = j, the assignment to a [ j ] changes a [ i ], so z := x is no longer equivalent.
• Typically, we conservatively assume aliasing: upon assignment to an array element we assume no knowledge of the array entries.
• The problem is when we do not know if aliasing exists.
• Relevant to pointers as well.
• Relevant to routine calls when we cannot determine the routine’s side effects.
• Aliasing is a major barrier in program optimizations.
Optimization Methods
• In the following slides we review various optimization methods, stressing performance optimizations.
• The main goal: eliminate redundant computations.
• Some methods are platform dependent.
  – On most platforms addition is faster than multiplication.
• Some methods do not look useful on their own, but their combination is effective.
Basic Optimizations
• Common expression elimination:
  – We have seen how to identify common expressions in a basic block and eliminate repeated computation.
  – We will later present data flow analysis, which will help us find such expressions across basic blocks.
• Copy propagation:
  – Given an assignment x := y, we attempt to use y instead of x.
  – Possible outcome: x becomes dead and we can eliminate the assignment.
Code motion
• Code motion is useful in various scenarios.
• Identify inner-loop code,
• Identify an expression whose sources do not change in the loop, and
• Move this code outside the loop!

    while (x - 3 < y) {
        // … instructions that do not change x
    }

        ↓

    t1 = x - 3;
    while (t1 < y) {
        // … instructions that do not change x or t1
    }
Induction variables & Strength Reduction
• Identify the loop variables and their relation to other variables.
• Eliminate dependence on induction variables as much as possible.

    (1)  i = 0;
    (2)  t1 = i * 4;            → t1 = t1 + 4
    (3)  t2 = a[t1]
    (4)  if (t2 > 100) goto (19)
    (5)  …
         …
    (17) i = i + 1
    (18) goto (2)
    (19) …

• Why is such code (including multiplication by 4) so widespread?
• Strength reduction: on many platforms addition is faster than multiplication, so (2) is replaced by t1 = t1 + 4.
• t1 must be initialized outside the loop.
• Not just strength reduction! We have removed the dependence of t1 on i. Thus, instructions 1 and 17 become irrelevant.
Peephole Optimization
• Going over long code is costly.
• A simple and effective alternative (though not optimal) is peephole optimization:
• Check a “small window” of code and improve only this code section.
• Some instructions can be improved even without considering their neighboring instructions.
• For example:
  – x := x * 1;
  – a := a + 0;
Peephole optimizations
• Some optimizations that do not require a global view:
• Simplifying algebraic computations:
  – x := x ^ 2 → x := x * x
  – x := x * 8 → x := x << 3
• Code rearrangement:

    (1) if x == 1 goto (3)
    (2) goto (19)
    (3) …

        ↓

    (1) if x != 1 goto (19)
    (2) …
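The algebraic rewrites can be sketched as a one-instruction peephole pass. This is an illustrative sketch: the tuple format `(dest, op, a, b)` and the name `peephole` are hypothetical, and the shift rewrite assumes integer operands.

```python
def peephole(code):
    """Apply single-instruction algebraic simplifications."""
    out = []
    for dest, op, a, b in code:
        # x := x * 1 and x := x + 0 do nothing and can be dropped.
        if (op, b) in (("*", 1), ("+", 0)) and dest == a:
            continue
        if op == "^" and b == 2:
            # Squaring becomes a multiplication.
            out.append((dest, "*", a, a))
        elif op == "*" and isinstance(b, int) and b > 1 and b & (b - 1) == 0:
            # Multiplication by a power of two becomes a left shift.
            out.append((dest, "<<", a, b.bit_length() - 1))
        else:
            out.append((dest, op, a, b))
    return out


CODE = [
    ("x", "*", "x", 1),
    ("a", "+", "a", 0),
    ("x", "^", "x", 2),
    ("x", "*", "x", 8),
    ("y", "+", "x", 3),
]
OPTIMIZED = peephole(CODE)
```

Each rule looks at one instruction only, which is why no global view of the program is needed.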
Peephole optimizations
• Eliminate redundant instructions:

    (1) a := x
    (2) x := a
    (3) a := someFunction(a);
    (4) x := someOtherFunction(a, x);
    (5) if a > x goto (2)

• Caution! If someone jumps to an instruction that we eliminated, a problem arises. Here (2) looks redundant after (1), but it is the target of the jump in (5).
• Execute peephole optimizations within a basic block only, and do not elide the first instruction.
Example: quicksort

    void quicksort ( m , n )
    int m , n ;
    {
        int i , j ;
        int v , x ;
        if ( n <= m ) return ;
        /* the code fragment starts here */
        i = m - 1 ; j = n ; v = a [ n ] ;
        while ( 1 ) {
            do i = i + 1 ; while ( a [ i ] < v ) ;
            do j = j - 1 ; while ( a [ j ] > v ) ;
            if ( i >= j ) break ;
            x = a [ i ] ; a [ i ] = a [ j ] ; a [ j ] = x ;
        }
        x = a [ i ] ; a [ i ] = a [ n ] ; a [ n ] = x ;
        /* the code fragment ends here */
        quicksort ( m , j ) ; quicksort ( i + 1 , n ) ;
    }

Three-address code of the fragment:

    (1)  i := m - 1
    (2)  j := n
    (3)  t1 := 4 * n
    (4)  v := a [ t1 ]
    (5)  i := i + 1
    (6)  t2 := 4 * i
    (7)  t3 := a [ t2 ]
    (8)  if t3 < v goto (5)
    (9)  j := j - 1
    (10) t4 := 4 * j
    (11) t5 := a [ t4 ]
    (12) if t5 > v goto (9)
    (13) if i >= j goto (23)
    (14) t6 := 4 * i
    (15) x := a [ t6 ]
    (16) t7 := 4 * i
    (17) t8 := 4 * j
    (18) t9 := a [ t8 ]
    (19) a [ t7 ] := t9
    (20) t10 := 4 * j
    (21) a [ t10 ] := x
    (22) goto (5)
    (23) t11 := 4 * i
    (24) x := a [ t11 ]
    (25) t12 := 4 * i
    (26) t13 := 4 * n
    (27) t14 := a [ t13 ]
    (28) a [ t12 ] := t14
    (29) t15 := 4 * n
    (30) a [ t15 ] := x
The flow graph of the fragment:

    B1: i := m - 1
        j := n
        t1 := 4 * n
        v := a [ t1 ]

    B2: i := i + 1
        t2 := 4 * i
        t3 := a [ t2 ]
        if t3 < v goto B2

    B3: j := j - 1
        t4 := 4 * j
        t5 := a [ t4 ]
        if t5 > v goto B3

    B4: if i >= j goto B6

    B5: t6 := 4 * i
        x := a [ t6 ]
        t7 := 4 * i
        t8 := 4 * j
        t9 := a [ t8 ]
        a [ t7 ] := t9
        t10 := 4 * j
        a [ t10 ] := x
        goto B2

    B6: t11 := 4 * i
        x := a [ t11 ]
        t12 := 4 * i
        t13 := 4 * n
        t14 := a [ t13 ]
        a [ t12 ] := t14
        t15 := 4 * n
        a [ t15 ] := x
Step A – global common expression elimination
[Animated figure: the flow graph of the fragment, in which the repeated computations in B5 and B6 are replaced one at a time: 4 * j by t4, a [ t4 ] by t5, 4 * i by t2, a [ t2 ] by t3, and 4 * n by t1.]

The flow graph after step A:

    B1: i := m - 1
        j := n
        t1 := 4 * n
        v := a [ t1 ]

    B2: i := i + 1
        t2 := 4 * i
        t3 := a [ t2 ]
        if t3 < v goto B2

    B3: j := j - 1
        t4 := 4 * j
        t5 := a [ t4 ]
        if t5 > v goto B3

    B4: if i >= j goto B6

    B5: x := t3
        a [ t2 ] := t5
        a [ t4 ] := x
        goto B2

    B6: x := t3
        t14 := a [ t1 ]
        a [ t2 ] := t14
        a [ t1 ] := x
Step B – copy propagation: with f := g, we try to use g and get rid of f.
• In B5 and B6 the copy is x := t3, so every use of x is replaced by t3. The other blocks are unchanged.

    B5: x := t3
        a [ t2 ] := t5
        a [ t4 ] := t3
        goto B2

    B6: x := t3
        t14 := a [ t1 ]
        a [ t2 ] := t14
        a [ t1 ] := t3
Step C – dead code elimination: eliminate redundant code.
• After copy propagation x is no longer used, so both assignments x := t3 are removed. The other blocks are unchanged.

    B5: a [ t2 ] := t5
        a [ t4 ] := t3
        goto B2

    B6: t14 := a [ t1 ]
        a [ t2 ] := t14
        a [ t1 ] := t3

Step D – code motion: move expressions outside the loop.
Step E – induction variables and strength reduction
• In B3, j decreases by 1 in every iteration, so t4 := 4 * j is replaced by t4 := t4 - 4.
• In B2, i increases by 1 in every iteration, so t2 := 4 * i is replaced by t2 := t2 + 4.
• t4 := 4 * j and t2 := 4 * i are now computed once, before the loops.
• The test i >= j in B4 is replaced by the equivalent test t2 >= t4.

Dead code elimination (again)
• i := i + 1 and j := j - 1 are no longer needed inside the loops and are removed.
The flow graph after all steps:

    B1: i := m - 1
        j := n
        t1 := 4 * n
        v := a [ t1 ]
        t4 := 4 * j
        t2 := 4 * i

    B2: t2 := t2 + 4
        t3 := a [ t2 ]
        if t3 < v goto B2

    B3: t4 := t4 - 4
        t5 := a [ t4 ]
        if t5 > v goto B3

    B4: if t2 >= t4 goto B6

    B5: a [ t2 ] := t5
        a [ t4 ] := t3
        goto B2

    B6: t14 := a [ t1 ]
        a [ t2 ] := t14
        a [ t1 ] := t3

Compared with the original flow graph, the steps applied were:
• Global common expression elimination
• Copy propagation
• Dead code elimination
• Code motion
• Induction variables and reduction in strength
Data Flow Analysis
Data Flow Analysis
• Global optimizations.
• We need to understand the flow of data in the program to be able to change code wisely and correctly.
• This understanding comes from an analysis called data flow analysis, or DFA.
• It is a set of algorithms that all share the same generic frame; their specifics are determined by the information we are after.
• Used for optimizations, but also for verification.
• Do not confuse abbreviations that have two meanings:

                 DFA                               CFG
    Front-end    Deterministic Finite Automaton    Context Free Grammar
    Back-end     Data Flow Analysis                Control Flow Graph
The Idea
• We are given a graph of “program constructs”
  – single instructions, basic blocks, etc.
• The algorithm works in iterations.
• In each iteration we update the information for each node in the graph according to the information in its neighbors.
  – A global view is never necessary.
• The algorithm terminates when no node gets updated.
  – Typically termination is guaranteed since knowledge is increased in each iteration and there is a limit on knowledge size.
DFA – The Generic Algorithm
• The general structure of any DFA algorithm is:

    N1…Nn – information about the n program nodes
            (could be variables, basic blocks, etc.)

    for i in {1…n}:
        initialize Ni
    bool change = true;
    while (change) {
        change = false;
        for i in {1…n}:
            M = new value for Ni (function of the neighbors of Ni).
            if (M ≠ Ni) then
                change = true;
                Ni = M
    }

• Specific instantiations of the DFA generic structure differ in the initialization of N and in the computation of the new values.
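The generic structure translates almost directly into code. The sketch below is only an illustration: `solve`, `initial` and `update` are hypothetical names, and the toy instantiation (which nodes can reach each node of a three-node graph) is an assumption chosen to keep the example small.

```python
def solve(nodes, initial, update):
    """Iterate until no node's information changes."""
    info = {n: initial(n) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_value = update(n, info)   # a function of the neighbours of n
            if new_value != info[n]:
                info[n] = new_value
                changed = True
    return info


# Toy instantiation: predecessors of each node in a small graph.
PREDS = {1: [], 2: [1, 3], 3: [2]}


def reach_update(n, info):
    """A node is reached by itself and by whatever reaches its predecessors."""
    result = {n}
    for p in PREDS[n]:
        result |= info[p]
    return result


REACH = solve(PREDS, lambda n: set(), reach_update)
```

Only `initial` and `update` change from one analysis to another; the loop itself stays the same.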
Reaching Definitions
• A definition is an assignment to a variable v.
• A definition d reaches a point p in the program if there is at least one possible execution path from the definition d to the point p such that there is no new definition of the same variable along the path.
• The influence of the assignment a = b+c:
  – It uses the variables b and c,
  – It kills any previous definition of a,
  – It generates a new definition for a.
• Similarly, the instruction “if (a<3)” uses a but neither kills nor generates any definition.
Reformulating “Reaching Definitions”
• A definition: an assignment that gives a value to a variable v.
• Reaching: a definition d reaches a point p in the program if there is at least one path from the definition d to p such that d is not killed on the path.
• Finding reaching definitions:
  – Find all definitions that reach any point in the program.
  – This information can be used for optimizations.
  – This seems to require going over all paths and all definitions in the entire program.
• (We usually think of the program as a single method (or routine). Inter-procedural analysis examines full modules, classes, or even whole programs.)
Computing Reaching Definitions with DFA
• Let Ni:A be the set of all reaching definitions of variable A in line i.
  – There is a DFA variable for each program variable and each code line.
• We start with a subset and gradually enlarge it until it contains all reaching definitions.
• When there are no more possible enlargements, we know we’re done.
• Usually, we don’t consider copying (A=B) as an assignment because A and B are usually united during the optimization.
Computing Reaching Definitions with DFA
• Recall that Ni:A is the set of all reaching definitions of variable A in line i.
• Initialization: if in line i there is an assignment of a constant, an expression, or a function to variable A, i.e., a non-copying assignment, then Ni:A = {i}. (In this case this is the final value.) Otherwise, Ni:A = ∅ (we need to compute this value in iterations).
• Iteration step:
  – If in line i variable A is not updated, then
        Ni:A = Nx:A ∪ Ny:A ∪ Nz:A ∪ …
    where x, y, z, etc. are all the lines from which we can directly go to line i.
  – If line i contains a copy A=B, we set Ni:A = Ni:B.
Reaching Definitions: an Example

    line                           Ni:a (initial)   Ni:a (final)   Ni:c (final)
    (1) if (b == 4) goto (4)       ∅                ∅              ∅
    (2) a = 5                      {2}              {2}            ∅
    (3) goto (5)                   ∅                {2}            ∅
    (4) a = 3                      {4}              {4}            ∅
    (5) if (a > 4) goto (4)        ∅                {2,4}          ∅
    (6) c = a                      ∅                {2,4}          {2,4}
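The per-line computation on this example can be sketched as follows. The predecessor table is read off the six-line program by hand and is an assumption of the sketch, as are the names `reaching`, `PREDS`, `ASSIGNS` and `COPIES`.

```python
# For each line, the lines from which control can directly go to it.
PREDS = {1: [], 2: [1], 3: [2], 4: [1, 5], 5: [3, 4], 6: [5]}
ASSIGNS = {2: "a", 4: "a"}    # line -> variable given a non-copy value
COPIES = {6: ("c", "a")}      # line -> (target, source) of a copy
VARIABLES = ["a", "c"]        # b is never assigned in the example


def reaching(preds, assigns, copies, variables):
    """Return N[(i, v)]: definitions of v that reach line i."""
    n = {(i, v): ({i} if assigns.get(i) == v else set())
         for i in preds for v in variables}
    changed = True
    while changed:
        changed = False
        for i in preds:
            for v in variables:
                if assigns.get(i) == v:
                    continue                      # {i} is already final
                if i in copies and copies[i][0] == v:
                    new = set(n[(i, copies[i][1])])   # Ni:A = Ni:B
                else:
                    # Union over all lines that lead directly to line i.
                    new = set().union(*(n[(p, v)] for p in preds[i]))
                if new != n[(i, v)]:
                    n[(i, v)] = new
                    changed = True
    return n


N = reaching(PREDS, ASSIGNS, COPIES, VARIABLES)
```

Here `N[(5, "a")]` collects both definitions of a, and the copy in line 6 passes them on to c.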
Correctness Idea
• If x gets updated in some line i, then after k iterations, a line that can be executed k steps after i is updated and “knows” that i is a definition for x (if there is no closer definition of x on the path from i to it).
• Proof by induction on the “distance” of the definition from the set being updated.
• As the program is finite, the longest (non-cyclic) path in it is finite as well.
• Note that if there is an iteration with no updates, then there will not be any updates in subsequent iterations.
• We do not provide a full proof.
A Standard Saving
• Running iterations with all instructions is costly for large programs.
• A standard solution: run the iterations for basic blocks and not for each single instruction.
• Instead of working with the program instructions graph, we work with the control flow graph of the basic blocks.
• We obtain a smaller graph and a (much) faster algorithm.
• Sometimes the operations inside the basic block cancel each other and then the computation becomes easier.
• The output is the reaching definitions for each block and not for each code line.
  – Good enough for optimizations.
  – Can be easily extended to each line inside any given block.
How Does it Look Inside a Basic Block?
• We have seen earlier the impact of a single instruction like “a = b+c”.
• The impact of a basic block is the sum of all influences in the block.
• A block uses a variable v if there exists an instruction p1 that uses v and there is no point p0 prior to p1 that defines v locally.
  – Exposed use of v.
  – Simply put: p1 uses v’s value that was set before the block started.
• A block kills a definition d of variable v if there is an instruction in the block that defines v.
• A definition d of variable v is generated in a block if the definition d is at location p1, and there is no instruction p2 subsequent to p1 that defines v as well.
  – Locally generated definition.
  – Simply put: the generated definitions are the definitions of B that do not get killed inside B.
Basic Block Reaching Definitions
• Use DFA to find the reaching definitions of all basic blocks.
• Data structure:
  – IN[B]: all definitions reaching the beginning of B
  – OUT[B]: all definitions reaching the end of B
• Each assignment gets a name di, and we compute ahead of time the two sets GEN[B] and KILL[B] for each block B.
  – GEN[B]: set of all definitions generated in B, e.g. GEN[B]={d3,d7,d8}.
  – KILL[B]: set of all definitions killed in B. In fact, the set of all program definitions that set a value to a variable v that is also assigned in B.
Computing Reaching Definitions with DFA
• DFA initialization: for each block B,
  – IN[B] = ∅
  – OUT[B] = ∅
• DFA step:
  – For each block B, re-compute OUT[B] given IN[B], based only on the instructions of B.
  – OUT[B] = ƒB(IN[B])

The DFA Step (cont’d)
• We need to compute the reaching definitions at the end of the block given the reaching definitions at the beginning.
• End-of-block reaching definitions = B’s generated definitions + (definitions that reach the beginning of B – B’s killed definitions)
• In other words:
      OUT[B] = ƒB(IN[B]) = GEN[B] ∪ (IN[B] \ KILL[B])
• To obtain IN[B] we do:
      IN[B] = OUT[b1] ∪ OUT[b2] ∪ … ∪ OUT[bk]
  where b1, b2, … , bk are the blocks that reach B directly.
• IN[B] is computed before OUT[B] (which depends on IN[B]).
An Example

    B1: d1: i = 1
        d2: m = a[0]
    B2: d3: t = a[i]
        if (t > m)
    B3: d4: m = t
    B4: d5: i = i + 1
        if (i < 10)
    B5: (exit)

    Block   KILL       GEN
    B1      {d4,d5}    {d1,d2}
    B2      ∅          {d3}
    B3      {d2}       {d4}
    B4      {d1}       {d5}
    B5      ∅          ∅

First iteration:
    IN[B1] = ∅                    OUT[B1] = {d1,d2}
    IN[B2] = {d1,d2}              OUT[B2] = {d1,d2,d3}
    IN[B3] = {d1,d2,d3}           OUT[B3] = {d1,d3,d4}
    IN[B4] = {d1,d2,d3,d4}        OUT[B4] = {d2,d3,d4,d5}
    IN[B5] = OUT[B5] = {d2,d3,d4,d5}

Second iteration:
    IN[B2] = {d1,d2,d3,d4,d5}     OUT[B2] = {d1,d2,d3,d4,d5}
    IN[B3] = {d1,d2,d3,d4,d5}     OUT[B3] = {d1,d3,d4,d5}
    IN[B4] = {d1,d2,d3,d4,d5}
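The block-level iteration on this example can be sketched as below. The predecessor lists are an assumption read off the example (B1→B2, B2→B3, B2→B4, B3→B4, B4→B2, B4→B5), since the edges of the figure are not spelled out in the text, and the function name `reaching_definitions` is hypothetical.

```python
BLOCKS = ["B1", "B2", "B3", "B4", "B5"]
PREDS = {"B1": [], "B2": ["B1", "B4"], "B3": ["B2"],
         "B4": ["B2", "B3"], "B5": ["B4"]}
GEN = {"B1": {"d1", "d2"}, "B2": {"d3"}, "B3": {"d4"},
       "B4": {"d5"}, "B5": set()}
KILL = {"B1": {"d4", "d5"}, "B2": set(), "B3": {"d2"},
        "B4": {"d1"}, "B5": set()}


def reaching_definitions(blocks, preds, gen, kill):
    """Iterate IN/OUT until no OUT set changes."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            # IN[B] is the union of OUT over the direct predecessors of B.
            in_sets[b] = set().union(*(out_sets[p] for p in preds[b]))
            # OUT[B] = GEN[B] ∪ (IN[B] \ KILL[B])
            new_out = gen[b] | (in_sets[b] - kill[b])
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                changed = True
    return in_sets, out_sets


IN, OUT = reaching_definitions(BLOCKS, PREDS, GEN, KILL)
```

Because of the loop edge from B4 back to B2, the definitions d3, d4 and d5 only show up in IN[B2] from the second pass on.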
Reaching Definitions with Basic Blocks
• As always, execution terminates when there are no modifications in one iteration.
• At the end, the reaching definitions of block B are IN[B].
• We will not prove the algorithm. Some properties:
  – The values of IN and OUT are always subsets of their real values.
  – Each update can only increase the sizes of the subsets.
  – The final size of the sets is bounded (by the number of definitions in the program) and hence termination is guaranteed.
• Correctness:
  – If a definition d reaches block B in a path of k blocks, then after k iterations, IN[B] includes d.
  – A definition that does not reach B will never enter IN[B].
Blocks versus Instructions
• Work per iteration
  – Instructions: each iteration goes over each pair of line i and variable v.
  – Basic blocks: each iteration goes over each block only.
• Number of iterations
  – Instructions: the longest instruction path between a definition and its use.
  – Basic blocks: the path length is measured in blocks, and so is the number of iterations.
• Number of variables
  – Instructions: number of program variables * number of code lines.
  – Basic blocks: number of basic blocks.
• Computation
  – Instructions: OUT[i,a] = GEN[i,a] (and if empty then IN[i,a]); IN[i,a] = OUT[i1] ∪ OUT[i2] ∪ … ∪ OUT[ik]
  – Basic blocks: OUT[B] = GEN[B] ∪ (IN[B] \ KILL[B]); IN[B] = OUT[b1] ∪ OUT[b2] ∪ … ∪ OUT[bk]
• The output
  – Instructions: accurate information for each code line and each variable.
  – Basic blocks: information for each block on entry and exit.
Uses of Reaching Definitions
• Determine that a variable has a constant value at a given point.
• Identify a variable that is not initialized:
    int i;
    if (…) i = 3;
    x = i;   ← error: i might not have been initialized
• In OOP: identify an impossible downcast.
• And more…
Using DFA for Liveness Analysis
• Definition: A variable v is live at program point p if there is an execution path starting at p in which v is used before it is defined again.
• We’ve seen previously how to determine v’s liveness inside a basic block.
– By going backwards line by line in the block.
• Now let’s do the same computation for a full procedure (or program).
– We decide which variables are live on entry to each basic block.
– Go “backwards” on the CFG.
DFA Initialization and Computation.
• IN[B] and OUT[B] are now sets of variables (and not sets of definitions).
• Initializing the DFA: for all blocks B, IN[B] = OUT[B] = ∅.
• Compute in advance:
– USE[B]: set of variables that B uses (before defining them in B).
– DEF[B]: set of variables that B generates a definition for.
• Computation step:
– OUT[B] = IN[b1] ∪ IN[b2] ∪ ... ∪ IN[bn], where b1,…,bn are all blocks directly reachable from B (its successors).
– IN[B] = ƒB(OUT[B]) = USE[B] ∪ (OUT[B] \ DEF[B])
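The backward computation can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function name `liveness`, the small loop CFG, and its USE/DEF sets are hypothetical (in particular, the assumption that the exit block B5 uses m is made up for the example).

```python
def liveness(blocks, succs, use, defs):
    """Backward DFA: IN[B] = USE[B] ∪ (OUT[B] \ DEF[B])."""
    live_in = {b: set() for b in blocks}
    live_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        # Visiting blocks in reverse order is not required for
        # correctness; it only tends to reduce the number of passes.
        for b in reversed(blocks):
            # OUT[B] is the union of IN over the direct successors of B.
            new_out = set().union(*(live_in[s] for s in succs[b]))
            new_in = use[b] | (new_out - defs[b])
            if new_in != live_in[b] or new_out != live_out[b]:
                live_in[b], live_out[b] = new_in, new_out
                changed = True
    return live_in, live_out


# Hypothetical CFG for a loop that scans an array a for its maximum m.
blocks = ["B1", "B2", "B3", "B4", "B5"]
succs = {"B1": ["B2"], "B2": ["B3", "B4"], "B3": ["B4"],
         "B4": ["B2", "B5"], "B5": []}
use = {"B1": {"a"}, "B2": {"a", "i", "m"}, "B3": {"t"},
       "B4": {"i"}, "B5": {"m"}}          # assumption: B5 reads m
defs = {"B1": {"i", "m"}, "B2": {"t"}, "B3": {"m"},
        "B4": {"i"}, "B5": set()}

live_in, live_out = liveness(blocks, succs, use, defs)
for b in blocks:
    print(b, "IN =", sorted(live_in[b]), "OUT =", sorted(live_out[b]))
```

In this sketch t is live only between its definition in B2 and its use in B3, so it could share a register with a variable that is live elsewhere.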
Another Example: Available Expression
• Typically, we assume an entry node B0 in the control flow graph, from which the computation starts.
• Definition: an expression x OP y is available at point p if each path from the entry point to p has a computation of x OP y with no subsequent update of x or y before reaching p.
• Use for optimization: do not re-compute available expressions.
• In terms of basic blocks:
– We say that a block kills the expression x OP y if the block assigns a value to x or y and does not re-compute x OP y after the assignment.
– A block generates the expression x OP y if it computes x OP y and does not update x or y after the computation.
Data, Initialization, Computation
• IN[B] and OUT[B] are sets of expressions.
• Initializing the DFA: IN[B] = OUT[B] = ∅ for all blocks B.
• Compute ahead of time:
– eKILL[B]: set of expressions that B kills by changing one of the variables in the expression.
– eGEN[B]: set of expressions that B generates.
• Computation step:
– IN[B] = OUT[b1] ∩ OUT[b2] ∩ … ∩ OUT[bn], where b1…bn are all blocks from which B is (directly) reachable.
– OUT[B] = ƒB(IN[B]) = eGEN[B] ∪ (IN[B] \ eKILL[B])
Is Everything Fine?
• Not really…
• Consider a graph in which B1 flows into B2, and B2 has an edge back to itself.
• IN[B2] = OUT[B1] ∩ OUT[B2].
• Suppose “x+y” is computed in B1 but not in B2 (and B2 does not kill it).
• Then it is available in B2, but IN[B2] will never see that: OUT[B2] starts empty, so the intersection stays empty.
• The problem: the outputs of B1 are available to B2 and should not be eliminated just because of B2’s own, initially empty, OUT set.
Solution: Proper Initialization
• Create an empty entry block B0 and set OUT[B0] = ∅.
• But for all other blocks set OUT[Bi] = U, where U is the set of all expressions computed in any basic block.
• The computation step remains IN[B] = OUT[b1] ∩ OUT[b2] ∩ … ∩ OUT[bn], where b1…bn are all blocks from which B is (directly) reachable, and OUT[B] = ƒB(IN[B]) = eGEN[B] ∪ (IN[B] \ eKILL[B]).
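The effect of the initialization can be checked with a small Python sketch on the two-block graph discussed above (B0→B1→B2, with a self-loop on B2). The function name `available_expressions` and the `optimistic` flag are hypothetical names introduced for this illustration, and the sketch assumes every block other than the entry has at least one predecessor.

```python
def available_expressions(blocks, preds, egen, ekill, universe, entry,
                          optimistic=True):
    """Forward DFA with intersection over predecessors.

    optimistic=True starts with OUT[B] = U for every non-entry block;
    optimistic=False starts with OUT[B] = ∅ everywhere.
    """
    OUT = {b: (set(universe) if optimistic else set()) for b in blocks}
    OUT[entry] = set()            # nothing is available at the entry
    IN = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            # IN[B] is the intersection of OUT over direct predecessors.
            new_in = set.intersection(*(OUT[p] for p in preds[b]))
            new_out = egen[b] | (new_in - ekill[b])
            if new_in != IN[b] or new_out != OUT[b]:
                IN[b], OUT[b] = new_in, new_out
                changed = True
    return IN, OUT


blocks = ["B0", "B1", "B2"]
preds = {"B0": [], "B1": ["B0"], "B2": ["B1", "B2"]}   # B2 loops to itself
egen = {"B0": set(), "B1": {"x+y"}, "B2": set()}
ekill = {"B0": set(), "B1": set(), "B2": set()}
U = {"x+y"}

in_empty, _ = available_expressions(blocks, preds, egen, ekill, U, "B0",
                                    optimistic=False)
in_full, _ = available_expressions(blocks, preds, egen, ekill, U, "B0",
                                   optimistic=True)
print("empty init:", in_empty["B2"])
print("U init:    ", in_full["B2"])
```

With the empty initialization, IN[B2] stays empty even though “x+y” is available; with OUT[Bi] = U the analysis finds it.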
Example
[Figure: control flow graph with entry block B0 and blocks B1–B4; statements shown: Z=x+y, W=x+y, X=7, W=x+y]
Initialization: OUT[B0] = ∅; OUT[B1] = OUT[B2] = OUT[B3] = OUT[B4] = U
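Example
[Figure: control flow graph with entry block B0 and blocks B1–B4; statements shown: Z=x+y, W=x+v, X=7, W=x+y]
Initialization: OUT[B0] = ∅; OUT[B1] = OUT[B2] = OUT[B3] = OUT[B4] = U; IN[B1] = ∅; IN[B2] = IN[B3] = IN[B4] = U
Sets shown in the figure: {x+y}, {x+y , x+v}, {x+y , x+v}, {x+y}, {x+y}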
Correctness
• Information flows from blocks to their neighbors during DFA steps.
• The values in IN[B0] and OUT[B0] are always empty and correct.
• The values of OUT[B] are always a superset of the available expressions on exit from B.
• Induction: after n iterations, OUT[B] is correct for all blocks whose distance from B0 is less than n.
• Proof idea: if there is a path of length n from B0 to block B in which “x+y” is not computed, then after n iterations OUT[B] will not include “x+y”.
• We do not provide a full proof.
• But note that the initialization is crucial.
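Comparison
• DFA variables are sets of...
– Reaching definitions: definitions
– Liveness: program variables
– Available expressions: expressions
• Computation direction
– Reaching definitions: forwards, OUT[B] = ƒB(IN[B])
– Liveness: backwards, IN[B] = ƒB(OUT[B])
– Available expressions: forwards, OUT[B] = ƒB(IN[B])
• Computation step ƒB(x)
– Reaching definitions: GEN[B] ∪ (x \ KILL[B])
– Liveness: USE[B] ∪ (x \ DEF[B])
– Available expressions: eGEN[B] ∪ (x \ eKILL[B])
• Propagation from neighbors
– Reaching definitions: union over the pred blocks
– Liveness: union over the succ blocks
– Available expressions: intersection over the pred blocks
• Initialization
– Reaching definitions: ∅
– Liveness: ∅
– Available expressions: OUT[B] = U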
Optimizations Summary
• Improve performance, while preserving semantics.
• We only mentioned running time optimization (which is common).
• Representing the program as a DAG helps identify common expressions, eliminate redundant copying, and perform other analyses.
• Often, aliasing makes things tougher.
• Basic optimizations: common subexpression elimination, copy propagation, code motion, strength reduction, dead-code elimination.
• Local optimization framework: peephole optimization.
• A generic algorithm for computing global information: Data Flow Analysis.
• DFA examples: reaching definitions (at the instruction and at the basic block level), liveness analysis, available expressions.
Course Summary
• Lexical analysis: find tokens using DFAs (here: deterministic finite automata).
• Parsing: analyze structure using context-free grammars.
– Top-down (LL), bottom-up (LR), lookahead…
• Semantic analysis computes attributes of grammar variables.
– Many times this can be done during parsing.
– Check types, and create intermediate code.
• Runtime: runtime stacks, memory management.
• Code generation: simple code generation and register allocation.
• Optimizations, and the analyses they employ.
Administration
• Exam on Thursday, February 7th.
• 20% for “don’t know” (not for a missing answer).
• Material: everything that appeared in lectures, exercises, and homework.
• During the last lecture (Thursday 24/1) Adi will run a rehearsal exercise here in Taub 2.
• Help with solving previous years’ exams is available during the TAs’ office hours.
• Moed C (third exam date): for reserve-duty soldiers only. Please notify both the lecturer and the TA in charge of a potential need for Moed C no later than two weeks before the Moed B exam.
Typical Test Questions
• Short questions:
– What happens during compile time and what during the execution?
– When are errors discovered?
• Parsing:
– Build an LR grammar for the language
– Is the following grammar in LR(0), SLR(1), LALR(1), LR(1)?
– Something else…
• Runtime
• Backpatching
• DFA
A question about reference counting
• Q: What is the number of references that can refer to a specific given location?
• A: As many as fit in the entire virtual memory. (Implication: RC size = pointer size)
• Q: Suppose RC has only 3 bits. How can an overflow happen?
• A: More pointers reference an object than the 7 that 3 bits can count, e.g., 9 pointers.
• Q: Suppose we treat an RC that has reached “111” as “stuck” and never change it again. How does that influence the execution?
– Will the program run correctly?
– Will it consume more memory?
• A: It will run correctly and consume more memory (an object with a stuck count is never freed).
A question about reference counting
• Q: Propose a way to fix all stuck counts of live objects.
• A: Run tracing (as in mark-sweep). Upon checking an edge, increment the RC of the child. (Of course, an RC may get stuck again, but those that should not be stuck will not be.)
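The idea in this answer can be sketched in Python. Everything here is a hypothetical illustration: the `Obj` class, the helper names, and two assumptions that the slide does not state, namely that each reachable object's count is reset to zero when the trace first reaches it, and that a pointer from a root counts as a reference.

```python
MAX_RC = 7  # 3-bit counter: the value "111" is treated as stuck


class Obj:
    def __init__(self, name):
        self.name = name
        self.children = []   # outgoing pointers
        self.rc = 0          # saturating reference count


def inc(obj):
    if obj.rc < MAX_RC:      # saturate instead of overflowing
        obj.rc += 1


def dec(obj):
    if obj.rc < MAX_RC:      # a stuck count is never changed
        obj.rc -= 1


def recompute_counts(roots):
    """Trace from the roots and rebuild the counts of all live objects."""
    seen = set()
    stack = []

    def check_edge(child):
        if id(child) not in seen:
            seen.add(id(child))
            child.rc = 0     # assumption: reset on first visit
            stack.append(child)
        inc(child)           # one increment per edge checked

    for r in roots:          # assumption: a root pointer is a reference
        check_edge(r)
    while stack:
        obj = stack.pop()
        for c in obj.children:
            check_edge(c)


# Nine parents point at one shared object, so its count gets stuck at 7.
shared = Obj("shared")
parents = [Obj(f"p{i}") for i in range(9)]
for p in parents:
    p.children.append(shared)
    inc(shared)
stuck_value = shared.rc

# Seven parents drop their pointer; the stuck count does not move.
for p in parents[2:]:
    p.children.remove(shared)
    dec(shared)
still_stuck = shared.rc

# Tracing from the two remaining parents restores the true count.
recompute_counts(parents[:2])
print(stuck_value, still_stuck, shared.rc)
```

An object that really has more than seven incoming pointers saturates again during the trace, which matches the remark that counts may get stuck again only where they should.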