Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution
Hyesoon Kim, Onur Mutlu, Jared Stark*, Yale N. Patt
Electrical and Computer Engineering, The University of Texas at Austin
*Oregon Microarchitecture Lab, Intel Corporation
Talk Outline
Problem
Wish Branches
Experimental Methodology
Results
Conclusion
Predicated Execution
Convert control flow dependency to data dependency
Pro: Eliminate hard-to-predict branches
Cons: (1) Fetch blocks B and C all the time (2) Wait until p1 is resolved

Source code:
    if (cond) { b = 0; } else { b = 1; }

(normal branch code)
    A          p1 = (cond)
               branch p1, TARGET
    B          mov b, 1
               jmp JOIN
    C  TARGET: mov b, 0
    D  JOIN:   add x, b, 1
    Control flow: A to C when taken (T), A to B when not taken (N); B and C both lead to D

(predicated code)
    A  p1 = (cond)
    B  (!p1) mov b, 1
    C  (p1) mov b, 0
    D  add x, b, 1
    Control flow: A, B, C, D in sequence
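The if-conversion above can be mimicked in a few lines of Python. This is only an illustrative sketch, not the authors' tooling: the helper `pmov` is a hypothetical stand-in for a predicated `mov`, and block D is simplified to return `b + 1`.

```python
def branch_version(cond: bool) -> int:
    """Normal branch code: only one of blocks B or C is executed."""
    if cond:            # A: branch p1, TARGET
        b = 0           # C: mov b, 0
    else:
        b = 1           # B: mov b, 1
    return b + 1        # D: add x, b, 1


def pmov(pred: bool, old: int, new: int) -> int:
    """Hypothetical predicated mov: writes `new` only when the predicate is true."""
    return new if pred else old


def predicated_version(cond: bool, b: int = -1) -> int:
    """Predicated code: B and C are both fetched; the predicate picks the result."""
    p1 = bool(cond)             # A: p1 = (cond)
    b = pmov(not p1, b, 1)      # B: (!p1) mov b, 1
    b = pmov(p1, b, 0)          # C: (p1) mov b, 0
    return b + 1                # D: add x, b, 1 (must wait for p1)
```

Both versions compute the same value for either outcome of `cond`; the difference is that the predicated version always performs the work of both B and C and cannot finish D until `p1` is known.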
The Overhead of Predicated Execution
[Figure: normalized execution time, relative to non-predicated code, for gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf, and AVG. Series: PREDICATED CODE, NO-DEPENDENCY, NO-DEPENDENCY + NO-FETCH. Y-axis from 0 to 1.2; one off-scale bar is labeled 2.02. AVG annotations: -2%, 13%, 16%.]

(Predicated code)
    A  p1 = (cond)
    B  (0) mov b, 1
    C  (1) mov b, 0
    D  add x, b, 1

If all overhead is ideally eliminated, predicated execution would provide a 16% improvement in average execution time
The Problem
Due to the predication overhead, predicated execution sometimes reduces performance
Branch misprediction characteristics depend on run-time behavior: input set, control-flow path, and phase behavior
The compiler cannot accurately estimate the run-time behavior of branches
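A toy cost model makes the trade-off concrete. It is a rough sketch under stated assumptions, not a model from the talk: the function names are hypothetical, the per-branch predication overhead is a made-up input, and the 30-cycle default simply echoes the minimum misprediction penalty of the baseline processor described later.

```python
def branch_cost(mispredict_rate: float, flush_penalty: float = 30.0) -> float:
    """Expected cycles lost per branch instance to misprediction flushes."""
    return mispredict_rate * flush_penalty


def predication_cost(overhead_cycles: float) -> float:
    """Cycles lost per instance to fetching both paths and waiting on the predicate."""
    return overhead_cycles


def better_choice(mispredict_rate: float, overhead_cycles: float,
                  flush_penalty: float = 30.0) -> str:
    """Pick the cheaper option for one branch under this simplified model."""
    if branch_cost(mispredict_rate, flush_penalty) > predication_cost(overhead_cycles):
        return "predicate"
    return "branch"
```

For example, with a hypothetical 3-cycle overhead, `better_choice(0.01, 3.0)` favors the branch while `better_choice(0.2, 3.0)` favors predication. The misprediction rate is exactly the quantity the compiler cannot know ahead of time.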
Wish Branches
A new type of control-flow instruction
    Three types: wish jump, wish join, and wish loop
The compiler generates code (with wish branches) that can be executed either as predicated code or as non-predicated code (normal branch code)
The hardware decides at run time whether to execute predicated code or normal branch code, based on the confidence of the branch prediction
    Easy to predict: normal branch code
    Hard to predict: predicated code
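Wish Jump/Join

normal branch code
    A          p1 = (cond)
               branch p1, TARGET
    B          mov b, 1
               jmp JOIN
    C  TARGET: mov b, 0
    D  JOIN:
    Control flow: A to C when taken (T), A to B when not taken (N); B and C both lead to D

predicated code
    A  p1 = (cond)
    B  (!p1) mov b, 1
    C  (p1) mov b, 0
    D
    Control flow: A, B, C, D in sequence

wish jump/join code
    A          p1 = (cond)
               wish.jump p1, TARGET
    B          (!p1) mov b, 1
               wish.join !p1, JOIN
    C  TARGET: (p1) mov b, 0
    D  JOIN:
    Control flow: the wish jump ends A (Taken to C, Not-Taken to B); the wish join ends B and targets D

High Confidence: the wish jump is predicted like a normal branch (Taken or Not-Taken) and the predicates are treated as 1
    B          (1) mov b, 1
               wish.join (1) JOIN
    C  TARGET: (1) mov b, 0

Low Confidence: the wish jump and wish join are not taken (shown as nop on the slide), so B and C execute as predicated code
    B          (!p1) mov b, 1
    C  TARGET: (p1) mov b, 0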
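The two execution modes of the wish jump/join hammock can be sketched as a small Python model. This is a simplified illustration, not the hardware mechanism itself: the function name, the `predicted_taken` argument, and the list of fetched block names are all assumptions made for the example.

```python
def run_wish_jump_join(cond, high_confidence, predicted_taken=False):
    """Return (b, fetched_blocks, flushed) for one pass through blocks A..D."""
    p1 = bool(cond)                      # A: p1 = (cond)

    if not high_confidence:
        # Low confidence: neither wish branch is taken, so B and C are both
        # fetched and the predicates decide which mov takes effect.
        b = None
        if not p1:
            b = 1                        # B: (!p1) mov b, 1
        if p1:
            b = 0                        # C: (p1) mov b, 0
        return b, ["A", "B", "C", "D"], False

    # High confidence: behave like a normal branch and fetch one side only.
    fetched = ["A", "C", "D"] if predicted_taken else ["A", "B", "D"]
    flushed = predicted_taken != p1      # a wrong prediction costs a flush
    if flushed:
        fetched += ["C", "D"] if p1 else ["B", "D"]   # refetch the correct path
    b = 0 if p1 else 1
    return b, fetched, flushed
```

In this model, low-confidence mode never flushes but always fetches both B and C, while high-confidence mode fetches a single path and pays a flush only when the prediction is wrong.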
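Wish Loop

Source code:
    do {
        a++;
        i++;
    } while (i<N);

normal backward branch code
    X  LOOP: add a, a, 1
             add i, i, 1
             p1 = (i<N)
             branch p1, LOOP
    Y  EXIT:
    Control flow: X loops back to itself when taken (T) and exits to Y when not taken (N)

wish loop code
    H        mov p1, 1
    X  LOOP: (p1) add a, a, 1
             (p1) add i, i, 1
             (p1) p1 = (cond)
             wish.loop p1, LOOP
    Y  EXIT:
    Control flow: H enters X; X loops back to itself when taken (T) and exits to Y when not taken (N)

High Confidence: the three predicates in X are treated as (1)
Low Confidence: the loop body executes as predicated code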
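The low-confidence behavior of the wish loop can be modeled in Python as follows. This is a minimal sketch under assumptions: `fetched_iterations` is a hypothetical parameter standing for how many loop bodies the front end fetched, and `(cond)` is taken to be `i < n` as in the source loop.

```python
def wish_loop_predicated(a, i, n, fetched_iterations):
    """Run the predicated loop body for a fixed number of fetched iterations.

    Returns (a, i, nop_iterations).
    """
    p1 = True                       # H: mov p1, 1
    nop_iterations = 0
    for _ in range(fetched_iterations):
        if p1:
            a += 1                  # (p1) add a, a, 1
            i += 1                  # (p1) add i, i, 1
            p1 = i < n              # (p1) p1 = (cond)
        else:
            nop_iterations += 1     # predicate is false: the body has no effect
    return a, i, nop_iterations
```

With `n = 3`, fetching exactly three iterations gives the same result as the `do`/`while` loop, and fetching five iterations gives the same `a` and `i` with two iterations that act as nops.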
Mispredicted Case 1: Early-Exit
Correct execution: X1 (T) X2 (T) X3 (N) Y
Early-exit (low confidence): X1 (T) X2 (N) Y …
    Flush pipeline, then fetch X3 (N) Y
Compared to normal branch code: predicate data dependency and one extra instruction (-)
Mispredicted Case 2: Late-Exit
Correct execution: X1 (T) X2 (T) X3 (N) Y
Late-exit (low confidence): X1 (T) X2 (T) X3 (T) X4 (T) X5 (N) Y …
    The extra iterations X4 and X5 execute as nops
Compared to normal branch code:
    Pro: reduce flush penalty (+++)
    Cons: predicate data dependency and one extra instruction (-)
Mispredicted Case 3: No-Exit
Correct execution: X1 (T) X2 (T) X3 (N) Y
No-exit (low confidence): X1 (T) X2 (T) X3 (T) X4 (T) X5 (T) X6 (T) …
    Flush pipeline, then fetch Y
Compared to normal branch code: predicate data dependency and one extra instruction (-)
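The three wish-loop misprediction cases can be told apart with a simple classifier. This is a hedged sketch: `resolve_window` is a hypothetical parameter for how many extra iterations the front end can fetch before the mispredicted loop branch resolves, and the flush flag follows the "Flush pipeline" labels on the three slides above.

```python
def classify_exit(actual_iterations, predicted_iterations, resolve_window):
    """Return (case, flush_needed) for a wish loop run in low-confidence mode."""
    if predicted_iterations == actual_iterations:
        return "correct", False
    if predicted_iterations < actual_iterations:
        # Left the loop too soon: the missing iterations must be refetched.
        return "early-exit", True
    if predicted_iterations <= actual_iterations + resolve_window:
        # Left the loop late: the extra iterations are predicated off (nops).
        return "late-exit", False
    # Still fetching the loop when the mispredicted branch resolves.
    return "no-exit", True
```

For the slides' example, where the loop really runs three iterations, predicting two iterations is an early exit, predicting five is a late exit when the window covers two extra iterations, and predicting many more is a no-exit.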
Advantages/Disadvantages of Wish Branches
Advantages compared to predicated execution
    Reduce the overhead of predication
    Increase the benefits of predicated code by allowing the compiler to generate more aggressively predicated code
    Provide a mechanism to exploit predication to reduce the branch misprediction penalty for backward branches (wish loops)
    Make predicated code less dependent on machine configuration (e.g., branch predictor)
Disadvantages compared to predicated execution
    Extra branch instructions use machine resources
    Extra branch instructions increase the contention for branch predictor table entries
    May constrain the compiler's scope for code optimizations
Wish Branch Support
ISA support
    Predicated execution, wish branch instructions
Compiler support
    Wish branch generation algorithms
    The compiler needs to decide which branches are predicated, which are converted to wish branches, and which stay as normal branches
Hardware support
    Confidence estimator
    Front-end and branch misprediction detection/recovery module
Experimental Infrastructure
IA-64 provides full support for predication
Convert IA-64 traces to micro-ops to simulate an out-of-order superscalar processor model
Tool flow: Source Code -> IA-64 Compiler (ORC) -> IA-64 Binary -> Trace generation module -> IA-64 Trace -> Micro-op Translator -> µops -> Micro-op Simulator
Simulation Methodology
Nine SPEC 2000 integer benchmarks
Baseline Processor Configuration
    Front End
        Large and accurate branch predictor (64KB hybrid branch predictor: gshare + local)
        Minimum 30-cycle branch misprediction penalty
        64KB, 2-cycle latency I-cache
    Execution Core
        8-wide out-of-order processor
        512-entry instruction window
    Confidence Estimator
        1KB tagged 16-bit history JRS confidence estimator (Jacobsen et al., MICRO-29)
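A JRS-style confidence estimator keeps a table of resetting counters: a counter is incremented when the branch prediction was correct and reset to zero when it was wrong, and a branch is called high-confidence when its counter reaches a threshold. The Python sketch below illustrates that idea only. It is untagged, unlike the estimator in the configuration above, and the table size, counter width, threshold, and XOR indexing are assumed values, not the simulated parameters.

```python
class JRSConfidenceEstimator:
    """Simplified, untagged table of resetting counters (illustrative only)."""

    def __init__(self, index_bits=10, counter_max=15, threshold=15, history_bits=16):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.counter_max = counter_max
        self.threshold = threshold
        self.table = [0] * (1 << index_bits)
        self.history = 0                       # global branch outcome history

    def _index(self, pc):
        # Assumed indexing: XOR the branch address with the global history.
        return (pc ^ self.history) & self.index_mask

    def is_high_confidence(self, pc):
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc, prediction_correct, taken):
        idx = self._index(pc)
        if prediction_correct:
            self.table[idx] = min(self.table[idx] + 1, self.counter_max)
        else:
            self.table[idx] = 0                # resetting counter
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

In a wish-branch front end, `is_high_confidence` would steer a wish branch toward normal-branch behavior, and a low-confidence answer would steer it toward predicated execution.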
Performance Improvement
[Figure: normalized execution time, relative to non-predicated code, for gzip, vpr, mcf, crafty, parser, gap, vortex, bzip2, twolf, AVG, and AVGnomcf. Series: SELECTIVE-PREDICATION, AGGRESSIVE-PREDICATION, wish jump/join, wish jump/join/loop. Y-axis from 0 to 1.2; one off-scale bar is labeled 2.02. Bar annotations: 24%, 8%, 14%, -4%.]
SELECTIVE-PREDICATION: branches are selectively predicated using compile-time cost-benefit analysis
AGGRESSIVE-PREDICATION: all branches that are suitable for if-conversion are predicated
Chart callouts:
    12% over conditional branch prediction, 11% over selective-predication, 13% over aggressive-predication
    14% over conditional branch prediction, 13% over selective-predication, 16% over aggressive-predication
    16% over conditional branch prediction, 11% over selective-predication, 7% over aggressive-predication (w/o mcf)
Conclusion
New control-flow instructions: wish branches (jump/join/loop)
Wish branches improve performance by dividing the work of predication between the compiler and the microarchitecture
    Compiler: analyzes the control-flow graph and generates code
    Microarchitecture: makes the run-time decision to use predication
Wish branches provide significant performance benefits
    16% compared to conditional branch prediction
    13% compared to selectively predicated code
Wish branches can make predicated execution more viable and effective in high-performance processors
    By enabling adaptive and aggressive predicated execution