School of EECS, Peking University
“Advanced Compiler Techniques” (Fall 2011)
Introduction to Optimizations
Guo, Yao
Outline
- Optimization Rules
- Basic Blocks
- Control Flow Graph (CFG)
- Loops
- Local Optimizations
- Peephole Optimization
Levels of Optimizations
- Local
  - inside a basic block
- Global (intraprocedural)
  - across basic blocks
  - whole-procedure analysis
- Interprocedural
  - across procedures
  - whole-program analysis
The Golden Rules of Optimization
Premature Optimization is Evil
- Donald Knuth: "premature optimization is the root of all evil"
- Optimization can introduce new, subtle bugs
- Optimization usually makes code harder to understand and maintain
- Get your code right first; then, if really needed, optimize it
- Document optimizations carefully
- Keep the non-optimized version handy, or even as a comment in your code
The Golden Rules of Optimization
The 80/20 Rule
- In general, 80% of a program's execution time is spent executing 20% of the code
  - 90%/10% for performance-hungry programs
- Spend your time optimizing the important 10-20% of your program
- Optimize the common case, even at the cost of making the uncommon case slower
The Golden Rules of Optimization
Good Algorithms Rule
- The best and most important way of optimizing a program is using good algorithms
  - E.g. O(n log n) rather than O(n^2)
- However, we still need lower-level optimization to get more out of our programs
- In addition, asymptotic complexity is not always an appropriate metric of efficiency
  - Hidden constants may be misleading
  - E.g. a linear-time algorithm that runs in 100*n + 100 time is slower than a cubic-time algorithm that runs in n^3 + 10 time if the problem size is small
Asymptotic Complexity: Hidden Constants
[Chart: execution time vs. problem size (0 to 15) for 100*n+100 and n*n*n+10; the cubic function is cheaper for small problem sizes, up to about n = 10]
General Optimization Techniques
Strength reduction
- Use the fastest version of an operation, e.g.

      x >> 2  instead of  x / 4
      x << 1  instead of  x * 2

Common subexpression elimination
- Eliminate redundant calculations, e.g.

      double x = d * (lim / max) * sx;
      double y = d * (lim / max) * sy;

  becomes

      double depth = d * (lim / max);
      double x = depth * sx;
      double y = depth * sy;
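The strength-reduction identities above are easy to check mechanically. A minimal sketch (in Python rather than the Java of the examples) verifying that the shift forms agree with the original operations for non-negative integers:

```python
def div4(x):
    # x / 4 rewritten as a shift; valid for non-negative integers
    return x >> 2

def mul2(x):
    # x * 2 rewritten as a shift
    return x << 1
```

For signed values the division rewrite is not exact; the right-shift slides later in this deck discuss why.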
General Optimization Techniques
Code motion
- Invariant expressions should be executed only once, e.g.

      for (int i = 0; i < x.length; i++)
          x[i] *= Math.PI * Math.cos(y);

  becomes

      double picosy = Math.PI * Math.cos(y);
      for (int i = 0; i < x.length; i++)
          x[i] *= picosy;
General Optimization Techniques
Loop unrolling
- The overhead of the loop control code can be reduced by executing more than one iteration in the body of the loop. E.g.

      double picosy = Math.PI * Math.cos(y);
      for (int i = 0; i < x.length; i++)
          x[i] *= picosy;

  becomes

      double picosy = Math.PI * Math.cos(y);
      for (int i = 0; i < x.length; i += 2) {
          x[i] *= picosy;
          x[i+1] *= picosy;
      }

- An efficient "+1" in array indexing is required (and the unrolled version as written assumes x.length is even)
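A sketch of the same unrolling in Python (the function name is invented; the second loop is the cleanup needed when the array length is odd, which the slide's version omits):

```python
import math

def scale_unrolled(x, y):
    picosy = math.pi * math.cos(y)
    n = len(x) - len(x) % 2          # largest even prefix
    for i in range(0, n, 2):         # main loop, unrolled by 2
        x[i] *= picosy
        x[i + 1] *= picosy
    for i in range(n, len(x)):       # at most one leftover element
        x[i] *= picosy
    return x
```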
Compiler Optimizations
- Compilers try to generate good code, i.e. fast code
- Code improvement is challenging
  - Many problems are NP-hard
- Code improvement may slow down the compilation process
  - In some domains, such as just-in-time compilation, compilation speed is critical
Phases of Compilation
- The first three phases are language-dependent
- The last two are machine-dependent
- The middle two depend on neither the language nor the machine
Phases
Outline
- Optimization Rules
- Basic Blocks
- Control Flow Graph (CFG)
- Loops
- Local Optimizations
- Peephole Optimization
Basic Blocks
- A basic block is a maximal sequence of consecutive three-address instructions with the following properties:
  - The flow of control can only enter the basic block through the first instruction.
  - Control will leave the block without halting or branching, except possibly at the last instruction.
- Basic blocks become the nodes of a flow graph, with edges indicating the order.
Examples
Source:

    for i from 1 to 10 do
        for j from 1 to 10 do
            a[i,j] = 0.0
    for i from 1 to 10 do
        a[i,i] = 0.0

Three-address code:

    1)  i = 1
    2)  j = 1
    3)  t1 = 10 * i
    4)  t2 = t1 + j
    5)  t3 = 8 * t2
    6)  t4 = t3 - 88
    7)  a[t4] = 0.0
    8)  j = j + 1
    9)  if j <= 10 goto (3)
    10) i = i + 1
    11) if i <= 10 goto (2)
    12) i = 1
    13) t5 = i - 1
    14) t6 = 88 * t5
    15) a[t6] = 1.0
    16) i = i + 1
    17) if i <= 10 goto (13)
Identifying Basic Blocks
- Input: a sequence of instructions instr(i)
- Output: a list of basic blocks
- Method:
  - Identify leaders: the first instruction of each basic block
  - Iterate: add subsequent instructions to the basic block until we reach another leader
Identifying Leaders
Rules for finding leaders in code:
- The first instruction in the code is a leader
- Any instruction that is the target of a (conditional or unconditional) jump is a leader
- Any instruction that immediately follows a (conditional or unconditional) jump is a leader
Basic Block Partition Algorithm

    leaders = {1}                              // start of program
    for i = 1 to |n|                           // all instructions
        if instr(i) is a branch
            leaders = leaders ∪ targets of instr(i) ∪ {instr(i+1)}
    worklist = leaders
    while worklist not empty
        x = first instruction in worklist
        worklist = worklist - {x}
        block(x) = {x}
        for (i = x + 1; i <= |n| && i not in leaders; i++)
            block(x) = block(x) ∪ {i}
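The pseudocode above can be sketched in Python as follows (the `branches` encoding, a map from each branch instruction to its targets, is an assumed input format, not part of the slides):

```python
def partition_blocks(n, branches):
    # branches: {instruction number -> list of jump targets}
    leaders = {1}                           # the first instruction is a leader
    for i in range(1, n + 1):
        if i in branches:                   # instr(i) is a branch
            leaders.update(branches[i])     # its targets are leaders
            if i + 1 <= n:
                leaders.add(i + 1)          # so is the next instruction
    order = sorted(leaders)
    # each block runs from its leader up to the next leader (or the end)
    return [list(range(x, order[k + 1] if k + 1 < len(order) else n + 1))
            for k, x in enumerate(order)]
```

On the 17-instruction example above this yields the six blocks {1}, {2}, {3-9}, {10-11}, {12}, {13-17}.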
Basic Block Example
- Leaders: instructions 1, 2, 3, 10, 12, and 13 of the code above
- Basic blocks: {1}, {2}, {3-9}, {10-11}, {12}, {13-17}
Outline
- Optimization Rules
- Basic Blocks
- Control Flow Graph (CFG)
- Loops
- Local Optimizations
- Peephole Optimization
Control-Flow Graphs
Control-flow graph:
- Node: an instruction or sequence of instructions (a basic block)
  - Two instructions i, j are in the same basic block iff execution of i guarantees execution of j
- Directed edge: potential flow of control
- Distinguished Entry and Exit nodes: the first and last instructions in the program
Control-Flow Edges
- Basic blocks = nodes
- Edges: add a directed edge from B1 to B2 if:
  - there is a branch from the last statement of B1 to the first statement of B2 (B2 is a leader), or
  - B2 immediately follows B1 in program order and B1 does not end with an unconditional branch (goto)
- Definition of predecessor and successor:
  - B1 is a predecessor of B2; B2 is a successor of B1
Control-Flow Edge Algorithm
Input: block(i), a sequence of basic blocks
Output: a CFG where nodes are basic blocks

    for i = 1 to the number of blocks
        x = last instruction of block(i)
        if instr(x) is a branch
            for each target y of instr(x)
                create edge (i -> y)
        if instr(x) is not an unconditional branch
            create edge (i -> i+1)
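A Python sketch of this pass (the block and branch encodings match the partition sketch and are assumptions; `unconditional` marks goto instructions):

```python
def build_cfg_edges(blocks, branches, unconditional=frozenset()):
    # blocks: list of instruction-number lists, in program order
    # branches: {instruction -> jump targets}; unconditional: set of gotos
    block_of = {b[0]: i for i, b in enumerate(blocks, start=1)}  # leader -> id
    edges = set()
    for i, b in enumerate(blocks, start=1):
        x = b[-1]                                  # last instruction of block(i)
        for y in branches.get(x, []):              # edge to each branch target
            edges.add((i, block_of[y]))
        if x not in unconditional and i < len(blocks):
            edges.add((i, i + 1))                  # fall-through edge
    return edges
```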
CFG Example
[Figure: CFG built from the basic-block example]
Loops
- Loops come from while, do-while, for, goto, ...
- Loop definition: a set of nodes L in a CFG is a loop if:
  1. There is a node called the loop entry: no other node in L has a predecessor outside L.
  2. Every node in L has a nonempty path (within L) to the entry of L.
Loop Examples
{B3}, {B6}, {B2, B3, B4}
Identifying Loops
- Motivation
  - Loops account for the majority of runtime, so focus optimization on loop bodies!
  - Remove redundant code, replace expensive operations => speed up the program
- Finding loops: easy...

      for i = 1 to 1000
          for j = 1 to 1000
              for k = 1 to 1000
                  do something

  ...or harder (with GOTOs):

      1  i = 1; j = 1; k = 1;
      2  A1: if i > 1000 goto L1;
      3  A2: if j > 1000 goto L2;
      4  A3: if k > 1000 goto L3;
      5      do something
      6      k = k + 1; goto A3;
      7  L3: j = j + 1; goto A2;
      8  L2: i = i + 1; goto A1;
      9  L1: halt
Outline
- Optimization Rules
- Basic Blocks
- Control Flow Graph (CFG)
- Loops
- Local Optimizations
- Peephole Optimization
Local Optimization
Optimization of basic blocks (Dragon §8.5)
Transformations on Basic Blocks
- Common subexpression elimination: recognize redundant computations, replace them with a single temporary
- Dead-code elimination: recognize computations not used subsequently, remove the quadruples
- Interchange statements, for better scheduling
- Renaming of temporaries, for better register usage
- All of the above require symbolic execution of the basic block to obtain def/use information
Simple Symbolic Interpretation: Next-Use Information
- If x is computed in statement i and is an operand of statement j, j > i, its value must be preserved (in a register or memory) until j.
- If x is computed at k, k > i, the value computed at i has no further use and can be discarded (i.e. the register reused).
- Next-use information is annotated over statements and the symbol table.
- Computed in one backwards pass over the statements.
Next-Use Information
Definitions:
1. Statement i assigns a value to x;
2. Statement j has x as an operand;
3. Control can flow from i to j along a path with no intervening assignments to x.
Then statement j uses the value of x computed at statement i, i.e., x is live at statement i.
Computing Next-Use
- Use the symbol table to annotate the status of variables
- Each operand in a statement carries additional information:
  - operand liveness (boolean)
  - operand next use (a later statement)
- On exit from the block, all temporaries are dead (no next use)
Algorithm
- INPUT: a basic block B
- OUTPUT: at each statement i: x = y op z in B, liveness and next-use information for x, y, z
- METHOD: for each statement in B (backward):
  1. Retrieve liveness & next-use info from a table
  2. Set x to "not live" and "no next use"
  3. Set y, z to "live" and the next uses of y, z to i
- Note: steps 2 & 3 cannot be interchanged. E.g., x = x + y
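A minimal sketch of this backward pass in Python (the statement encoding, a destination plus an operand list, is invented for illustration):

```python
def next_use_pass(block, live_on_exit):
    # block: list of (dest, operands); live_on_exit: {var -> exit next-use}
    table = {v: ('live', use) for v, use in live_on_exit.items()}
    info = {}
    for i in range(len(block), 0, -1):
        dest, operands = block[i - 1]
        # attach the table's current info to statement i
        info[i] = {v: table.get(v, ('dead', None))
                   for v in [dest] + list(operands)}
        table[dest] = ('dead', None)        # step 2: x not live, no next use
        for v in operands:                  # step 3: must follow step 2
            table[v] = ('live', i)
    return info
```

For x = x + y the ordering matters: step 3 correctly re-marks x as live (it is an operand) only after step 2 killed it as a destination.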
Example

    1. x = 1
    2. y = 1
    3. x = x + y
    4. z = y
    5. x = y + z

On exit: x live (next use 6); y not live; z not live
Computing Dependencies in a Basic Block: the DAG
- Use a directed acyclic graph (DAG) to recognize common subexpressions and remove redundant quadruples.
- Intermediate code optimization: basic block => DAG => improved block => assembly
- Leaves are labeled with identifiers and constants.
- Internal nodes are labeled with operators and identifiers.
DAG Construction
Forward pass over the basic block.
For x = y op z:
- Find a node labeled y, or create one
- Find a node labeled z, or create one
- Create a new node for op, or find an existing one with descendants y, z (need a hash scheme)
- Add x to the list of labels for the new node
- Remove label x from the node on which it appeared
For x = y:
- Add x to the list of labels of the node which currently holds y
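A sketch of the construction in Python (class and method names are invented; the `cache` dictionary plays the role of the hash scheme):

```python
class BlockDAG:
    def __init__(self):
        self.nodes = []    # (op, left, right); leaves are ('leaf', name, None)
        self.cache = {}    # node key -> node index (the "hash scheme")
        self.label = {}    # variable -> index of the node it currently labels

    def node_for(self, name):
        # find the node a variable labels, or create a leaf for its initial value
        if name not in self.label:
            key = ('leaf', name, None)
            self.cache[key] = len(self.nodes)
            self.nodes.append(key)
            self.label[name] = self.cache[key]
        return self.label[name]

    def assign(self, x, op, y, z=None):
        if op is None:                      # copy statement x = y
            self.label[x] = self.node_for(y)
            return self.label[x]
        key = (op, self.node_for(y), self.node_for(z) if z is not None else None)
        if key not in self.cache:           # reuse an existing op node if any
            self.cache[key] = len(self.nodes)
            self.nodes.append(key)
        # relabeling x implicitly removes x's label from its old node
        self.label[x] = self.cache[key]
        return self.label[x]
```

On the block of the DAG example that follows (a = b + c; b = a - d; c = b + c; d = a - d), this builds six nodes, with b and d labeling the same "-" node.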
DAG Example
Transform a basic block into a DAG:

    a = b + c
    b = a - d
    c = b + c
    d = a - d

[Figure: DAG with leaves b0, c0, d0; a "+" node labeled a over b0 and c0; a "-" node labeled b, d over a and d0; a "+" node labeled c over the "-" node and c0]
Local Common Subexpression (LCS)
Suppose b is not live on exit.

    a = b + c
    b = a - d
    c = b + c
    d = a - d

Since b and d label the same "-" node in the DAG and b is dead on exit, the block can be rewritten as:

    a = b + c
    d = a - d
    c = d + c
LCS: Another Example

    a = b + c
    b = b - d
    c = c + d
    e = b + c

[Figure: DAG with leaves b0, c0, d0; "+" labeled a over b0 and c0; "-" labeled b over b0 and d0; "+" labeled c over c0 and d0; "+" labeled e over the b and c nodes]
Here the two occurrences of b + c are not a common subexpression: b and c are redefined in between, so a and e denote different nodes.
Common Subexpressions
Programmers don't produce common subexpressions; code generators do!
Dead Code Elimination
Delete any root that has no live variables attached.

    a = b + c
    b = b - d
    c = c + d
    e = b + c

On exit: a, b live; c, e not live. Deleting the root for e, and then the root for c, leaves:

    a = b + c
    b = b - d
Outline
- Optimization Rules
- Basic Blocks
- Control Flow Graph (CFG)
- Loops
- Local Optimizations
- Peephole Optimization
Peephole Optimization (Dragon §8.7)
- Introduction to peephole optimization
- Common techniques
- Algebraic identities
- An example
Peephole Optimization
- Simple compilers do not perform machine-independent code improvement
  - They generate naive code
- It is possible to take the naive target code and optimize it
  - Sub-optimal sequences of instructions that match an optimization pattern are transformed into optimal sequences of instructions
  - This technique is known as peephole optimization
  - Peephole optimization usually works by sliding a window of several instructions (a peephole)
Peephole Optimization
Goals:
- improve performance
- reduce memory footprint
- reduce code size
Method:
1. Examine short sequences of target instructions.
2. Replace the sequence by a more efficient one.
Typical transformations:
- redundant-instruction elimination
- algebraic simplifications
- flow-of-control optimizations
- use of machine idioms
Peephole Optimization: Common Techniques
[Tables of transformation patterns]
Algebraic Identities
- Worth recognizing single instructions with a constant operand
- Eliminate computations
  - A * 1 = A
  - A * 0 = 0
  - A / 1 = A
- Reduce strength
  - A * 2 = A + A
  - A / 2 = A * 0.5
- Constant folding
  - 2 * 3.14 = 6.28
- More delicate with floating point
Is This Ever Helpful?
- Why would anyone write x * 1?
- Why bother to correct such obvious junk code?
- In fact one might write

      #define MAX_TASKS 1
      ...
      a = b * MAX_TASKS;

- Also, seemingly redundant code can be produced by other optimizations. This is an important effect.
Replace Multiply by Shift
- A := A * 4;
  - can be replaced by a 2-bit left shift (signed/unsigned)
  - but must worry about overflow if the language does
- A := A / 4;
  - if unsigned, can replace with a shift right
  - but arithmetic shift right is a well-known problem
  - the language may allow it anyway (traditional C)
The Right Shift Problem
- Arithmetic right shift: shift right and use the sign bit to fill the most significant bits

      -5       = 111111...11111011
      -5 SAR 1 = 111111...11111101

  which is -3, not -2
- In most languages -5/2 = -2
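The mismatch is easy to demonstrate (a Python sketch: Python's >> is an arithmetic shift on its arbitrary-precision integers, and int() truncates toward zero the way C's integer division does):

```python
def trunc_div2(a):
    # truncating division, as in most languages: -5 / 2 == -2
    return int(a / 2)            # exact for small integers

def sar1(a):
    # arithmetic shift right by 1: rounds toward negative infinity
    return a >> 1
```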
Addition Chains for Multiplication
- If multiply is very slow (or on a machine with no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective:
  - x * 125 = x * 128 - x * 4 + x
  - two shifts, one subtract and one add, which may be faster than one multiply
- Note the similarity with the efficient exponentiation method
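For instance, the x * 125 decomposition as a Python sketch:

```python
def times125(x):
    # x * 125 = x * 128 - x * 4 + x: two shifts, one subtract, one add
    return (x << 7) - (x << 2) + x
```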
Flow-of-Control Optimizations
Replace jumps to jumps:

    goto L1                          goto L2
    ...                    =>        ...
    L1: goto L2                      L1: goto L2

    if a < b goto L1                 if a < b goto L2
    ...                    =>        ...
    L1: goto L2                      L1: goto L2

    goto L1                          if a < b goto L2
    ...                              goto L3
    L1: if a < b goto L2   =>        ...
    L3:                              L3:
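The first two patterns (retargeting a jump whose destination is itself an unconditional goto) can be sketched on a toy instruction list; the (label, op, target) encoding here is invented for illustration:

```python
def thread_jumps(code):
    # code: list of (label, op, target); op is 'goto', 'if', or None
    dest = {label: target for label, op, target in code
            if label is not None and op == 'goto'}
    def resolve(t):
        seen = set()
        while t in dest and t not in seen:   # follow goto chains, avoid cycles
            seen.add(t)
            t = dest[t]
        return t
    return [(label, op, resolve(target)) if op in ('goto', 'if')
            else (label, op, target)
            for label, op, target in code]
```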
Peephole Optimization: an Example
Source code:

    debug = 0;
    ...
    if (debug) { print debugging information }

Intermediate code:

    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
    L1: print debugging information
    L2:
Eliminate Jump after Jump
Before:

    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
    L1: print debugging information
    L2:

After:

    debug = 0
    ...
    if debug != 1 goto L2
    print debugging information
    L2:
Constant Propagation
Before:

    debug = 0
    ...
    if debug != 1 goto L2
    print debugging information
    L2:

After:

    debug = 0
    ...
    if 0 != 1 goto L2
    print debugging information
    L2:
Unreachable Code (Dead Code Elimination)
Before:

    debug = 0
    ...
    if 0 != 1 goto L2
    print debugging information
    L2:

After:

    debug = 0
    ...
Peephole Optimization Summary
- Peephole optimization is very fast
  - Small overhead per instruction, since it uses a small, fixed-size window
- It is often easier to generate naive code and run peephole optimization than to generate good code!
Summary
- Introduction to optimization
- Basic knowledge
  - Basic blocks
  - Control-flow graphs
- Local optimizations
- Peephole optimizations
Next Time
- Dataflow analysis (Dragon §9.2)