1
Efficiently Exploring Compiler Optimization Sequences With Pairwise Pruning
Milind Chabbi, John Mellor-Crummey, Keith Cooper
Rice University, Department of Computer Science
This work is funded by the Defense Advanced Research Projects Agency (DARPA) through the Air Force Research Lab (AFRL).
2
Compiler Optimization Phase-Ordering Problem
Order of application of compiler optimizations drastically changes measured performance
• Kulkarni et al. [CGO '06] show 38% average code size reduction
• Zhao et al. [CGO '09] show up to 32% speedup
Production compilers still use a fixed order
Figure credit : Zhao et al. [CGO’09]
Exascale systems multiply the cost of poor node performance
3
Phase-Order Selection Is Hard
Selecting the best phase order is non-trivial
• Program dependent
• Relations between optimizations are complex
  • One optimization can enable/disable another
Exhaustive empirical exploration is expensive and unrealistic
• 20 optimizations → about 2.5 × 10^18 possible optimization sequences
• “Exhaustive optimization phase order space exploration.” [Kulkarni et al. CGO '06]
  • Many optimization orders lead to structurally identical function instances
Approaches
• Analytically modeling code and the effects of optimizations is non-trivial and still in its infancy
  • “A framework for exploring optimization properties.” [Zhao et al. CC '09]
• Other techniques have been tried and proven to be effective
  • Genetic algorithms [Cooper et al. SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems 1999]
4
Roadmap
Phase order selection using pairwise constraints between optimizations
Graph model
Regression model
Conditional Sampling model
Will show effectiveness on a sample numerical program, FMIN, throughout the discussion, with dynamic instruction count (DIC) as our optimization metric
5
Interaction Is Significant Between Pairs
Interaction is significant between pairs
Capture the ordering of pairs without regard to their absolute positions
[Figure: sequences in which a precedes b, at any positions, are marked Good; sequences in which b precedes a are marked Bad]
6
Pruning Using Pairwise Constraints
Generate all possible optimization sequences of length 2 and record their performance characteristics
• n(n - 1) pairs to empirically evaluate
• 20 optimizations → 380 pairs vs. 2.5 × 10^18 sequences
• For k-wise, it will be n!/(n - k)! groups to empirically evaluate
Compare performance of each pair with its reverse to build pair-ordering constraints
• If ab < ba (ab performs better, e.g. lower DIC), then the final sequence will look like: … a … b …
Reduces search space from O(n!) to O(n^2)
Not a silver-bullet strategy
Can be used to augment other search-space pruning techniques
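The pair-evaluation step can be sketched in a few lines of Python. This is only an illustration: `measure` is a hypothetical callback standing in for "compile with this two-pass sequence and record the metric", and the numbers in `TOY_COST` are made up, not measurements from the talk.

```python
from itertools import permutations

def pair_constraints(opts, measure):
    """Evaluate every ordered pair and keep the better direction.

    Returns ({(a, b): gain}, number_of_runs), where (a, b) means
    'a before b' scored lower (better) than 'b before a' by `gain`.
    """
    cost = {p: measure(p) for p in permutations(opts, 2)}  # n(n-1) runs
    constraints = {}
    for (a, b), c in cost.items():
        reverse = cost[(b, a)]
        if c < reverse:  # ties produce no constraint in either direction
            constraints[(a, b)] = reverse - c
    return constraints, len(cost)

# Hypothetical pairwise measurements (lower is better).
TOY_COST = {
    ("a", "b"): 100, ("b", "a"): 120,
    ("a", "c"): 105, ("c", "a"): 120,
    ("c", "b"): 90,  ("b", "c"): 120,
}
constraints, runs = pair_constraints("abc", lambda p: TOY_COST[p])
print(runs, constraints)
```

With n optimizations the loop performs n(n - 1) measurements, which is where the 380 runs for 20 optimizations come from.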
7
Background And Effectiveness Of Pairwise Pruning
Used by the software testing community
• In software testing, multiple input variables taking multiple values cause combinatorial explosion
• Pairwise (a.k.a. all-pairs) testing is based on the observation that most faults are caused by interactions of at most two factors
• Pairwise-generated test suites cover all combinations of two factors and are therefore much smaller than exhaustive ones, yet still very effective in finding defects
K. Burr and W. Young [STAR '98]
D. R. Wallace and D. R. Kuhn [International Journal of Reliability, Quality and Safety Engineering, 2001]
8
Roadmap
Phase order selection using pairwise constraints between optimizations
Graph model
Regression model
Conditional Sampling model
9
Graph Model
Nodes represent optimizations, e.g. {a, b, c}
Directed edges represent optimization orders
Graph construction
• Empirically evaluate all pairs to add edges
  • ab < ba → edge (a,b)
  • ac < ca → edge (a,c)
  • cb < bc → edge (c,b)
• Add weights to edges based on profitability
  • E.g. (ab) vs. (ba) has a profit of 20%
[Figure: directed graph on nodes a, b, c with edge weights 20, 15, and 30]
Graph may be cyclic or acyclic
10
Phase Order Selection For Acyclic Graphs
Topologically sort graph nodes to get a sequence
Such a sequence (if it exists) maintains all pair-ordering constraints
[Figure: graph on nodes a, b, c with edge weights 20, 15, and 30, annotated "Model found best sequence"]
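For the acyclic case, a minimal sketch using Python's standard `graphlib` module (Python 3.9+) is shown here. The edge list is the three-node example (a,b), (a,c), (c,b); the helper name `order_from_constraints` is ours, not from the talk.

```python
from graphlib import TopologicalSorter, CycleError

def order_from_constraints(nodes, edges):
    """Return a sequence honoring every (before, after) edge, or None if cyclic."""
    predecessors = {n: set() for n in nodes}
    for before, after in edges:
        predecessors[after].add(before)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None  # no sequence can satisfy all pair-ordering constraints

order = order_from_constraints("abc", [("a", "b"), ("a", "c"), ("c", "b")])
print(order)
```

When the constraints contain a cycle the function returns `None`, which signals that some constraint has to be dropped before a sequence can be produced.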
11
Phase Order Selection For Graphs With Cycles
Cyclic ordering constraints:
• ab < ba → edge (a,b)
• bc < cb → edge (b,c)
• ca < ac → edge (c,a)
Select an edge to break in each cycle
• Select edges to minimize the total weight of deleted edges (minimizes the cost of pair-ordering constraint violations)
• E.g. break edge (c,a)
Optimal sequence is: abc
[Figure: cyclic graph on nodes a, b, c with edge weights 20, 15, and 30]
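For a graph this small, the edge-breaking choice can be illustrated by brute force: score every ordering by the total weight of the constraints it violates and keep the cheapest. This is a sketch only. The assignment of the weights 20, 30, and 15 to edges (a,b), (b,c), and (c,a) is an assumption chosen so that (c,a) is the cheapest edge to break, and exhaustive enumeration is not practical for larger graphs (choosing a minimum-weight set of edges to delete is NP-hard in general).

```python
from itertools import permutations

def violation_cost(seq, weights):
    """Total weight of (before, after) edges that seq orders the wrong way."""
    pos = {opt: i for i, opt in enumerate(seq)}
    return sum(w for (before, after), w in weights.items()
               if pos[before] > pos[after])

def best_order(nodes, weights):
    """Exhaustively pick the ordering with the least violated weight."""
    return min(permutations(nodes), key=lambda s: violation_cost(s, weights))

# Assumed edge weights for the cyclic example.
weights = {("a", "b"): 20, ("b", "c"): 30, ("c", "a"): 15}
best = best_order("abc", weights)
print("".join(best), violation_cost(best, weights))
```

Under these assumed weights the cheapest ordering violates only (c,a), which matches breaking that edge and reading off abc.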
12
Graph Model On FMIN
• 13 optimizations
• ~6 billion sequences in the search space (13!)
• Measure benefit of 13 × 12 = 156 pairwise orderings
• Model-found best sequence had 1111 DIC
• 1103 DIC was the best among 5000 random sequences
• We were within 0.73% of the best
• 3.9% of the sequences in the random sampling were better than the model-found best
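The figures quoted for FMIN can be cross-checked with a few lines of arithmetic; the sketch below uses only numbers from this slide.

```python
from math import factorial

n = 13
search_space = factorial(n)               # all orderings of 13 optimizations
pairwise_runs = n * (n - 1)               # ordered pairs to measure
gap_percent = (1111 - 1103) / 1103 * 100  # model-found best vs. best random sample

print(search_space, pairwise_runs, round(gap_percent, 2))
```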
13
Performance Estimation
Want to predict performance of any random sequence
Useful to ensure that a sequence optimized for one objective function does not dramatically worsen another objective, e.g. speed vs. code size
Provides an analytical model for performance prediction
14
Graph Model For Performance Estimation
Graph model has a built-in ability to estimate the performance of a given sequence
To estimate the performance of a random sequence:
• Perform a walk on the graph using the given sequence
• Add the weights of violated ordering preferences along the walk to the performance number of the model-found best sequence (already known)
15
Example Graph Model For Performance Estimation
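Let the observed performance of the model-found best sequence (abcd) be 1200 instructions
Estimated performance of sequence dacb is: 1200 + 30 + 40 + 50 + 20 = 1340
Edges are decorated with absolute differences, not relative %
[Figure: graph on nodes a, b, c, d with edge weights 120, 20, 30, 40, 60, and 50; the walk d, a, c, b violates ordering preferences with weights 30, 40, 50, and 20]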
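The estimate can be reproduced with a short sketch. The model-found best sequence abcd implies the edges (a,b), (a,c), (a,d), (b,c), (b,d), (c,d); how the six weights map onto those edges is not recoverable from the slide text, so the assignment below is an assumption, chosen so that the four preferences violated by dacb carry the weights 30, 40, 50, and 20.

```python
def estimate(seq, best_perf, weights):
    """Best-sequence performance plus the weight of every violated preference."""
    pos = {opt: i for i, opt in enumerate(seq)}
    penalty = sum(w for (before, after), w in weights.items()
                  if pos[before] > pos[after])
    return best_perf + penalty

# Assumed weight-to-edge assignment (absolute instruction-count differences).
weights = {
    ("a", "b"): 120, ("a", "c"): 60,
    ("a", "d"): 30,  ("c", "d"): 40,
    ("b", "d"): 50,  ("b", "c"): 20,
}
print(estimate("dacb", 1200, weights))
```

In dacb, d precedes a, b, and c, and c precedes b, so four preferences are violated and 140 instructions are added to the 1200 of the best sequence.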
16
Performance Estimation With Graph Model On FMIN
6 optimizations, i.e. 6! = 720 sequences
[Plot: graph-model-predicted DIC and observed DIC; x-axis: optimization sequence sorted by DIC, y-axis: DIC (1221 to 1401)]
Divergence + phase mismatch
17
Issues With Graph Model
Considered just sequences of length 2 (pairs of optimizations)
Neglected global behavior of optimizations
Assumed weights or behaviors of pairs to be context-insensitive (i.e. the same even in a full-length sequence)
Want a model that is context-sensitive
18
Roadmap
Phase order selection using pairwise constraints between optimizations
Graph model
Regression model
Conditional Sampling model
19
Getting Context Sensitive With Regression Model
Take into account the context of the pairs by sampling full-length sequences
Represent sequences by regression equations
• Represent all possible pairs as a parameter vector
• Presence / absence of pairs in a sequence as input variables
• Observed performance of a sequence as measured value
[Input variables] × X [Parameter vector] = [Measured value]
20
Example Linear Regression Model
Optimizations: {a, b, c}
Sequence: a b c
Equation: Xab + Xac + Xbc = 1045

Sequence | Xab Xba Xac Xca Xbc Xcb | Measured value
abc      |  1   0   1   0   1   0  | 1045
bac      |  0   1   1   0   1   0  | 1050
…        |  …                      | …

The columns Xab … Xcb form the parameter vector
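The input-variable rows can be generated mechanically. The following sketch builds the 0/1 presence vector for a sequence, using the same column order as the example (Xab, Xba, Xac, Xca, Xbc, Xcb); the function names are ours.

```python
from itertools import combinations

def pair_columns(opts):
    """Column labels: each unordered pair contributes both orders (ab, ba, ...)."""
    cols = []
    for a, b in combinations(opts, 2):
        cols += [a + b, b + a]
    return cols

def indicator_row(seq, cols):
    """1 if the first optimization of the pair precedes the second in seq, else 0."""
    pos = {opt: i for i, opt in enumerate(seq)}
    return [1 if pos[c[0]] < pos[c[1]] else 0 for c in cols]

cols = pair_columns("abc")
print(cols)
print(indicator_row("abc", cols))
print(indicator_row("bac", cols))
```

Stacking one such row per sampled sequence, next to its measured DIC, gives the system that the regression solves.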
21
Analytical Model For A Sequence
Sample unique sequences
Solve the linear regression to obtain the value of each Xij
Given a sequence: c b a
Analytically projected performance is: Xcb + Xca + Xba
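Once the Xij values are known, projecting a sequence is a sum over the ordered pairs it contains. In the sketch below the values in `X` are invented for illustration (picked so that abc and bac reproduce the 1045 and 1050 of the earlier example); they are not fitted values from the talk.

```python
from itertools import combinations

def projected_performance(seq, X):
    """Sum X[ij] over every pair (i, j) where i appears before j in seq."""
    return sum(X[a + b] for a, b in combinations(seq, 2))

# Hypothetical solved parameters.
X = {"ab": 340, "ba": 345, "ac": 345, "ca": 350, "bc": 360, "cb": 365}

print(projected_performance("cba", X))  # Xcb + Xca + Xba
```

`combinations` preserves the order of the input, so for cba it yields exactly the pairs cb, ca, and ba.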
22
Regression Model On FMIN
Sequences of length 6: 6! = 720 total sequences
[Plot: observed DIC and model-predicted DIC; x-axis: optimization sequences sorted by observed DIC, y-axis: DIC (1221 to 1291)]
No phase mismatch, less divergence
23
Analysis of Regression-equation: Optimization Grouping Effect
Sequences of length 6: 6! = 720 total sequences
[Plot: observed DIC and model-predicted DIC; x-axis: optimization sequences sorted by observed DIC, y-axis: DIC (1221 to 1291). Plot annotations: "gn, ln, mn" and "lg, lm" (twice)]
24
Refined Regression Model
100% sampling to solve the regression equation
[Plot: observed DIC and model-predicted DIC after augmentation; x-axis: optimization sequence sorted by DIC, y-axis: DIC (1221 to 1291)]
Superior projections, perfect correlation
25
Regression Model With Reduced Sampling Rate
[Plot: observed DIC and model-predicted DIC with 12% sampling; x-axis: optimization sequence sorted by DIC, y-axis: DIC (1221 to 1291)]
26
Roadmap
Phase order selection using pairwise constraints between optimizations
Graph model
Regression model
Conditional Sampling model
27
Properties of Pairs Across Phase Shifts
[Plot: observed DIC; x-axis: optimization sequences sorted by observed DIC, y-axis: DIC (1221 to 1281)]
Pair frequencies annotated on the two sides of the phase shift:
• (m,n) = 0% | (m,n) = 66.6%
• (l,n) = 0% | (l,n) = 66.6%
• (g,n) = 0% | (g,n) = 66.6%
28
Properties of Pairs Across Phase Shifts
[Plot: observed DIC; x-axis: optimization sequences sorted by observed DIC, y-axis: DIC (1221 to 1281)]
mn, ln, gn shift
Pair frequencies annotated on the plot (each annotation appears twice):
• (l,g) = 0% | (l,g) = 75%
• (l,m) = 0% | (l,m) = 75%
29
Properties of Pairs Across Phase Shifts
[Plot: observed DIC; x-axis: optimization sequences sorted by observed DIC, y-axis: DIC (1221 to 1281)]
mn, ln, gn shift
lm, lg shift
Pair frequencies annotated on the plot: (c,d) = 0% | (c,d) = 100%, followed by three further unlabeled 0% | 100% annotations
30
Conditional Sampling Model
Sample k << n! full-length sequences that satisfy a set of pairwise ordering constraints C
• Initially C = {}
• We sampled 100 sequences in our implementation
Identify the largest phase shift
Obtain the pattern on either side of the largest phase shift
• e.g. pairs present with 100% or 0% on one side
Add pairwise constraints favoring better performance to C
Repeat sampling and refining C until we reach a performance plateau
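One round of this loop can be sketched as follows. Everything here is illustrative rather than the authors' implementation: rejection sampling is just the simplest way to honor C (it gets slow as C grows), `toy_dic` is a made-up stand-in for compiling and running the program, and `refine` assumes at least two results and treats "largest phase shift" as the largest jump between adjacent sorted DIC values.

```python
import random
from itertools import permutations

def satisfies(seq, C):
    """True if seq places a before b for every constraint (a, b) in C."""
    pos = {opt: i for i, opt in enumerate(seq)}
    return all(pos[a] < pos[b] for a, b in C)

def sample(opts, C, k, rng):
    """Rejection-sample k full-length sequences that satisfy C."""
    out = []
    while len(out) < k:
        seq = list(opts)
        rng.shuffle(seq)
        if satisfies(seq, C):
            out.append(tuple(seq))
    return out

def refine(results, opts, C):
    """results: list of (dic, seq). Add pairs that hold in 100% of the sequences
    on the better side of the largest shift but not in 100% of the worse side."""
    results = sorted(results)
    dics = [d for d, _ in results]
    cut = max(range(1, len(dics)), key=lambda i: dics[i] - dics[i - 1])
    if dics[cut] == dics[cut - 1]:
        return set(C)  # no shift left: performance plateau
    better = [s for _, s in results[:cut]]
    worse = [s for _, s in results[cut:]]
    new_C = set(C)
    for a, b in permutations(opts, 2):
        in_all_better = all(s.index(a) < s.index(b) for s in better)
        in_all_worse = all(s.index(a) < s.index(b) for s in worse)
        if in_all_better and not in_all_worse:
            new_C.add((a, b))
    return new_C

def toy_dic(seq):
    """Hypothetical metric: running d before o costs 100 extra instructions."""
    return 1100 + (100 if seq.index("d") < seq.index("o") else 0)

rng = random.Random(0)
C = set()
seqs = sample("abdo", C, 100, rng)
C = refine([(toy_dic(s), s) for s in seqs], "abdo", C)
print(C)
```

Repeating `sample` and `refine` with the growing C narrows the sampled region until `refine` stops finding a shift.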
31
Conditional Sampling On FMIN
13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z}
Conditions: od
[Plot: x-axis: optimization sequence sorted by DIC, y-axis: DIC (1103 to 1303). Annotations across the shift: (o,d) = 100% | (o,d) = 17%]
32
Conditional Sampling On FMIN
13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z}
Conditions: od; vd
[Plot: x-axis: optimization sequence sorted by DIC, y-axis: DIC (1103 to 1303). Annotations across the shift: (v,d) = 100% | (v,d) = 60%]
33
Conditional Sampling On FMIN
Conditions: od; vd; an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov
[Plot: x-axis: optimization sequence sorted by DIC, y-axis: DIC (1103 to 1138)]
Annotations across the shift:
• One side: an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov = 100%
• Other side: an = 39%, oa = 80%, bn = 46%, cn = 39%, dn = 43%, gn = 37%, ln = 37%, ol = 79%, mn = 40%, on = 71%, qn = 37%, tn = 37%, vn = 13%, zn = 61%, oq = 79%, ov = 100%
34
Conditional Sampling On FMIN
13 optimizations: {a, b, c, d, g, l, m, n, o, q, t, v, z}
Conditions: od; vd; an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov; cd, cv
[Plot: x-axis: optimization sequence sorted by DIC, y-axis: DIC (1103 to 1105.5)]
Annotations across the shift:
• (c,d) = 100% | (c,d) = 0%
• (c,v) = 100% | (c,v) = 0%
35
Conditional Sampling On FMIN
Conditions: od; vd; an, oa, bn, cn, dn, gn, ln, ol, mn, on, qn, tn, vn, zn, oq, ov; cd, cv
[Plot: x-axis: optimization sequence sorted by DIC, y-axis: DIC (1103 to 1105.5)]
Required 500 samples, i.e. 8 × 10^-6 % sampling
36
Summary
Order of application of compiler optimizations has a dramatic effect on performance
“Pairwise pruning” reduces the empirical search space by several orders of magnitude, yet remains effective
Three models of pairwise pruning
• Context-insensitive graph model
• Context-sensitive regression model
• Context-sensitive Conditional Sampling model
Initial results are encouraging
Technique can be used to augment other search-space pruning techniques
37
Backup slides
In our implementation we represent the presence of a pair by 1 and its absence by -1
• Reduces unknowns to n(n - 1)/2
We add a residue term Xresidue to account for the residual minimum advantage of applying each optimization i
Each Xij accounts only for the advantage/disadvantage of the ordered pair (i,j)
Standalone strengths of optimizations i and j are accounted for in Xresidue
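A minimal sketch of this encoding and of an ordinary least-squares fit is shown below, using exact `Fraction` arithmetic so no numerical library is needed. Several things are assumptions rather than details from the talk: the residue is modeled as a single constant column, the fit goes through the normal equations, and the "measurements" are synthetic values generated from made-up parameters.

```python
from fractions import Fraction
from itertools import combinations, permutations

def signed_row(seq, opts):
    """+1 if i precedes j in seq, -1 otherwise, per unordered pair; last entry is the residue column."""
    pos = {o: k for k, o in enumerate(seq)}
    return [1 if pos[i] < pos[j] else -1 for i, j in combinations(opts, 2)] + [1]

def solve(A, b):
    """Solve a non-singular square system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [v / scale for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [row[n] for row in M]

def fit(samples, opts):
    """Least squares via the normal equations (A^T A) x = A^T y."""
    A = [signed_row(seq, opts) for seq, _ in samples]
    y = [dic for _, dic in samples]
    m = len(A[0])
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(m)] for i in range(m)]
    Aty = [sum(r[i] * v for r, v in zip(A, y)) for i in range(m)]
    return solve(AtA, Aty)

# Synthetic data from made-up parameters [Xab, Xac, Xbc, Xresidue].
true_x = [-10, -5, 3, 1250]
samples = [(p, sum(s * x for s, x in zip(signed_row(p, "abc"), true_x)))
           for p in permutations("abc")]
x = fit(samples, "abc")
print([int(v) for v in x])
```

Because the synthetic data is exactly linear in the signed pair columns, the fit recovers the generating parameters; with real measurements it would return the least-squares compromise instead.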
38
Challenges And Opportunities
Not a silver-bullet strategy
Sometimes patterns may not be as distinct as 0% or 100%; we may have to choose a pattern based on the higher percentage on one side
• E.g. 90% on left vs. 30% on right
In our experiments we always took 100 samples; this can be tuned with various techniques
• Vuduc et al. [International Journal of High Performance Computing Applications, 2004] suggest a statistical early stopping criterion that indicates when sampling can be stopped
39
Graph Model On FMIN
Six optimizations : {c,d,g,l,m,n}
Model found optimal sequence : cndgml
The model-found sequence had a dynamic instruction count of 1221, which was the best among all 720 possible sequences