CSC 4250 Computer Architectures
October 27, 2006
Chapter 3. Instruction-Level Parallelism
& Its Dynamic Exploitation
Nested Loops
        DADDIU  R1,R0,#80
Loop1:  L.D     F2,1600(R1)
        DADDIU  R2,R0,#40
Loop2:  L.D     F0,1000(R2)
        ADD.D   F0,F0,F2
        S.D     F0,1000(R2)
        DADDIU  R2,R2,#-8
        BNEZ    R2,Loop2
        DADDIU  R1,R1,#-8
        BNEZ    R1,Loop1
How many times do Loop1 and Loop2 iterate?
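One way to check the answer is to trace the two counters in a few lines of Python. This is only a sketch: the function name `count_iterations` and its parameters are made up for illustration, and it models nothing but the DADDIU/BNEZ pairs that step R1 and R2 down by 8 until they reach zero.

```python
def count_iterations(outer_start=80, inner_start=40, step=8):
    """Trace R1 and R2 and count how often each loop body runs."""
    outer_count = 0
    inner_count = 0
    r1 = outer_start              # DADDIU R1,R0,#80
    while True:
        outer_count += 1          # one pass through the Loop1 body
        r2 = inner_start          # DADDIU R2,R0,#40
        while True:
            inner_count += 1      # one pass through the Loop2 body
            r2 -= step            # DADDIU R2,R2,#-8
            if r2 == 0:           # BNEZ R2,Loop2 falls through
                break
        r1 -= step                # DADDIU R1,R1,#-8
        if r1 == 0:               # BNEZ R1,Loop1 falls through
            break
    return outer_count, inner_count


outer, inner = count_iterations()
print(outer, inner // outer, inner)   # Loop1 passes, Loop2 passes per Loop1 pass, Loop2 total
```

Under these assumptions Loop1 runs 10 times and Loop2 runs 5 times per Loop1 iteration (50 in total), which matches the TTTTN pattern for BNEZ R2,Loop2.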
BNEZ R2,Loop2
Branch history: TTTTN|TTTTN|TTTTN|TTTTN|… (N means branch not taken.)
1-bit predictor: TTTTT|NTTTT|NTTTT|NTTTT|… → two errors per Loop1 iteration in the steady state.
2-bit predictor: TTTTT|TTTTT|TTTTT|TTTTT|… → one error per Loop1 iteration.
The error behavior for Loop1 is similar.
Put more bits in the counter to improve error behavior?
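The two prediction strings can be reproduced with a small simulation. The sketch below assumes both predictors start out predicting taken, and it models the 2-bit predictor as a saturating counter (values 2 and 3 predict taken), which is one common form; the function names are illustrative only.

```python
def predict_1bit(history, start="T"):
    """Predictions of a 1-bit predictor: always repeat the last outcome."""
    state, predictions = start, []
    for outcome in history:
        predictions.append(state)
        state = outcome               # remember only the most recent outcome
    return "".join(predictions)


def predict_2bit(history, start=3):
    """Predictions of a 2-bit saturating counter (0..3, values >= 2 mean T)."""
    counter, predictions = start, []
    for outcome in history:
        predictions.append("T" if counter >= 2 else "N")
        if outcome == "T":
            counter = min(counter + 1, 3)
        else:
            counter = max(counter - 1, 0)
    return "".join(predictions)


def count_errors(history, predictions):
    """Number of positions where the prediction differs from the outcome."""
    return sum(h != p for h, p in zip(history, predictions))


history = "TTTTN" * 4
for name, preds in (("1-bit", predict_1bit(history)),
                    ("2-bit", predict_2bit(history))):
    print(name, preds, count_errors(history, preds))
```

For these four groups the 1-bit predictor makes 7 errors (1 in the first group, then 2 per group) and the 2-bit counter makes 4 (1 per group).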
Global Branch History
Global branch history:
Outcome: TTTTN|T|TTTTN|T|TTTTN|T|TTTTN|T|…
Loop:    22222|1|22222|1|22222|1|22222|1|…
Can we use global branch history to get a better result?
(On previous slide, we looked at local branch history.)
5-Bit Global Branch History
We keep a 5-bit global branch history, and use the bit pattern to choose one of 2^5 = 32 1-bit predictors:

History   Prediction
TTTTT     N
TTTTN     T
TTTNT     T
TTNTT     T
TNTTT     T
NTTTT     T
…
NNNNN     T

We get 100% accuracy in the steady state.
This strategy works if at least 5 bits are used.
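The claim about 5 bits can be tested by simulation. The sketch below is a simplification under stated assumptions: the global outcome stream is the repeating pattern TTTTNT (five Loop2 outcomes followed by one Loop1 outcome), the initial history is all T, and an unused predictor initially predicts taken. The function name and the warm-up length are made up for illustration.

```python
def steady_state_errors(history_bits, pattern="TTTTNT", warmup=20, measured=20):
    """Count mispredictions after warm-up for 1-bit predictors chosen by global history."""
    outcomes = pattern * (warmup + measured)
    table = {}                               # history string -> last outcome seen
    history = "T" * history_bits             # assumed initial history
    errors = 0
    for i, outcome in enumerate(outcomes):
        prediction = table.get(history, "T")  # assumed initial prediction: taken
        if i >= warmup * len(pattern) and prediction != outcome:
            errors += 1
        table[history] = outcome             # 1-bit predictor: remember last outcome
        history = (history + outcome)[-history_bits:]
    return errors


for bits in (4, 5, 6):
    print(bits, steady_state_errors(bits))
```

With 4 bits the history TTTT is followed sometimes by T and sometimes by N, so its 1-bit predictor is wrong both times in every pass; with 5 or more bits every history has a unique successor and the error count drops to zero.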
Correlating Branch Predictors (p. 200)
A 2-bit predictor uses only the recent behavior of a single branch.
SPEC92 benchmark eqntott (the worst case in Figures 3.8 and 3.9 with an 18% error rate):
if (aa==2)
aa=0;
if (bb==2)
bb=0;
if (aa!=bb) {
MIPS Code
Assume that aa and bb are assigned to R1 and R2:
        DSUBUI  R3,R1,#2
        BNEZ    R3,L1       ;branch b1 (aa!=2)
        DADD    R1,R0,R0    ;aa=0
L1:     DSUBUI  R3,R2,#2
        BNEZ    R3,L2       ;branch b2 (bb!=2)
        DADD    R2,R0,R0    ;bb=0
L2:     DSUBU   R3,R1,R2    ;R3=aa-bb
        BEQZ    R3,L3       ;branch b3 (aa==bb)
Consider the branches. The behavior of branch b3 is correlated with the behavior of branches b1 and b2: if both b1 and b2 are not taken, then b3 will be taken (as aa and bb are equal).
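The correlation can be checked by brute force. The sketch below mirrors the three branches in Python; the function name `branch_outcomes` and the value range 0..3 for aa and bb are assumptions chosen only for illustration.

```python
def branch_outcomes(aa, bb):
    """Return (b1, b2, b3); True means the branch is taken."""
    b1 = aa != 2          # BNEZ R3,L1 is taken when aa != 2
    if not b1:
        aa = 0            # fall through: aa = 0
    b2 = bb != 2          # BNEZ R3,L2 is taken when bb != 2
    if not b2:
        bb = 0            # fall through: bb = 0
    b3 = aa == bb         # BEQZ R3,L3 is taken when aa == bb
    return b1, b2, b3


for aa in range(4):
    for bb in range(4):
        b1, b2, b3 = branch_outcomes(aa, bb)
        if not b1 and not b2:
            print(aa, bb, "b3 taken" if b3 else "b3 not taken")
```

The only case where both b1 and b2 fall through is aa = bb = 2; both variables are then cleared, so b3 is taken. A predictor that looks at b3 alone cannot see this.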
Simplified Example (p. 202)
Suppose that d has values 0, 1, and 2:
if (d==0) d=1;
if (d==1)
MIPS Code: Assume that d is assigned to R1:
        BNEZ    R1,L1       ;branch b1 (d!=0)
        DADDIU  R1,R0,#1    ;d==0, so d=1
L1:     DADDIU  R3,R1,#-1
        BNEZ    R3,L2       ;branch b2 (d!=1)
        …
L2:
Figure 3.10. Possible execution sequence

Initial value of d   d==0?   b1   Value of d before b2   d==1?   b2
0                    Yes     NT   1                      Yes     NT
1                    No      T    1                      Yes     NT
2                    No      T    2                      No      T
Figure 3.11. Behavior of 1-bit predictor initialized to NT
Suppose that d = 2, 0, 2, 0, …

d=?   b1 prediction   b1 action   New b1 prediction   b2 prediction   b2 action   New b2 prediction
2     NT              T           T                   NT              T           T
0     T               NT          NT                  T               NT          NT
2     NT              T           T                   NT              T           T
0     T               NT          NT                  T               NT          NT

Misprediction Rate = 100%!
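The table can be replayed with a short simulation. This is a sketch only: `branches_for` re-creates the b1/b2 outcomes from the code fragment, each branch gets its own 1-bit predictor initialized to not taken, and all names are illustrative.

```python
def branches_for(d):
    """Return (b1, b2) for one pass through the code; True means taken."""
    b1 = d != 0           # branch b1 (d != 0)
    if not b1:
        d = 1             # d == 0, so d = 1
    b2 = d != 1           # branch b2 (d != 1)
    return b1, b2


def one_bit_mispredictions(values):
    """Run one 1-bit predictor per branch; return (mispredictions, branches)."""
    state = {"b1": False, "b2": False}       # both initialized to NT
    misses = total = 0
    for d in values:
        for name, taken in zip(("b1", "b2"), branches_for(d)):
            total += 1
            if state[name] != taken:
                misses += 1
            state[name] = taken              # 1-bit: remember the last outcome
    return misses, total


print(one_bit_mispredictions([2, 0, 2, 0]))
```

Every one of the eight branches is mispredicted, because each branch alternates between taken and not taken while its predictor always repeats the previous outcome.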
Figure 3.12. Meaning of Prediction Bits

Prediction bits   Prediction if last branch not taken   Prediction if last branch taken
NT/NT             NT                                    NT
NT/T              NT                                    T
T/NT              T                                     NT
T/T               T                                     T
Fig. 3.13. Action of 1-bit predictor with 1 bit of correlation. Initialized to NT/NT

d=?   b1 prediction   b1 action   New b1 prediction   b2 prediction   b2 action   New b2 prediction
2     NT/NT           T           T/NT                NT/NT           T           NT/T
0     T/NT            NT          T/NT                NT/T            NT          NT/T
2     T/NT            T           T/NT                NT/T            T           NT/T
0     T/NT            NT          T/NT                NT/T            NT          NT/T
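The same sequence can be replayed for the predictor with 1 bit of correlation. In this sketch each branch holds two prediction bits, selected by the outcome of the most recently executed branch; it assumes the branch preceding the first b1 was not taken, and the function names are hypothetical.

```python
def branches_for(d):
    """Return (b1, b2) for one pass through the code; True means taken."""
    b1 = d != 0
    if not b1:
        d = 1
    b2 = d != 1
    return b1, b2


def simulate_1_1(values):
    """Return (list of misprediction flags, final prediction bits)."""
    # Per branch: [prediction if last branch NT, prediction if last branch T]
    bits = {"b1": [False, False], "b2": [False, False]}   # NT/NT
    last_taken = False        # assumed outcome of the branch before the first b1
    misses = []
    for d in values:
        for name, taken in zip(("b1", "b2"), branches_for(d)):
            slot = int(last_taken)                  # pick the bit by global history
            misses.append(bits[name][slot] != taken)
            bits[name][slot] = taken                # update only the bit that was used
            last_taken = taken
    return misses, bits


misses, bits = simulate_1_1([2, 0, 2, 0])
print(sum(misses), "mispredictions out of", len(misses))
print(bits)
```

Under these assumptions only the two branches of the first pass (d = 2) are mispredicted, and the bits settle at T/NT for b1 and NT/T for b2, as in the last rows of the table.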
Figure 3.14. A (2,2) Branch Prediction Buffer
This buffer uses a 2-bit global history to choose from among 2^2 = 4 predictors for each branch address. Each predictor is in turn a 2-bit predictor for that branch.
Figure 3.12 shows a (1,1) branch prediction buffer.
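A general (m,n) buffer can be sketched as a table of rows selected by branch address, each row holding 2^m n-bit counters selected by the global history. The class below is illustrative only: the class name, the modulo indexing by address, the choice of saturating counters starting at 0, and the 1024-entry size in the usage line are all assumptions, not details from the figure.

```python
class CorrelatingBuffer:
    """Sketch of an (m, n) buffer: m global history bits, n-bit counters."""

    def __init__(self, m, n, entries):
        self.m, self.n, self.entries = m, n, entries
        self.max_count = (1 << n) - 1
        # One row per entry; each row holds 2**m counters, all starting at 0.
        self.counters = [[0] * (1 << m) for _ in range(entries)]
        self.history = 0                       # last m outcomes, newest in bit 0

    def predict(self, address):
        counter = self.counters[address % self.entries][self.history]
        return counter > self.max_count // 2   # upper half of the range means taken

    def update(self, address, taken):
        row = self.counters[address % self.entries]
        if taken:
            row[self.history] = min(row[self.history] + 1, self.max_count)
        else:
            row[self.history] = max(row[self.history] - 1, 0)
        mask = (1 << self.m) - 1
        self.history = ((self.history << 1) | int(taken)) & mask

    def total_bits(self):
        return (1 << self.m) * self.n * self.entries


buffer = CorrelatingBuffer(m=2, n=2, entries=1024)   # hypothetical size
print(buffer.total_bits())
```

With m = 0 the history is ignored and the structure degenerates to a plain table of n-bit predictors; with m = n = 1 it has the shape described in Figure 3.12.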
Figure 3.15. Comparison of 2-bit Predictors
Tournament Predictors (p. 206)
Adaptively combine local and global predictors.
Alpha 21264 has a tournament predictor using 4K 2-bit counters indexed by the local branch address to choose between a global predictor and a local predictor.
The global predictor also has 4K entries and is indexed by the history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor.
The local predictor consists of a 2-level predictor. The top level is a local history table consisting of 1024 10-bit entries. Each entry is used to index a table of 1K entries consisting of 3-bit saturating counters, providing the local prediction.
(Total = 29K bits. For SPECfp95 benchmarks, fewer than 1 misprediction per 1000 completed instructions.)
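The 29K total follows from the four tables listed above. The arithmetic, written out as a small Python check (the dictionary keys are just labels):

```python
K = 1024

alpha_21264_bits = {
    "chooser: 4K entries x 2 bits": 4 * K * 2,
    "global predictor: 4K entries x 2 bits": 4 * K * 2,
    "local history table: 1024 entries x 10 bits": 1024 * 10,
    "local counters: 1K entries x 3 bits": 1 * K * 3,
}

for label, bits in alpha_21264_bits.items():
    print(f"{label}: {bits // K}K bits")

total_bits = sum(alpha_21264_bits.values())
print("total:", total_bits // K, "K bits")
```

That is 8K + 8K + 10K + 3K = 29K bits.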
Fig. 3.16. State Transition Diagram for Tournament Predictor
The counter is incremented whenever the “predicted” predictor is correct and the other predictor is incorrect, and it is decremented in the reverse situation.
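A chooser of this kind can be sketched as a 2-bit saturating counter. The version below fixes one orientation as an assumption: high values (2 and 3) select the local predictor and low values (0 and 1) select the global one, so the counter moves toward whichever predictor was right when the two disagree. The function names are illustrative and this is not a description of the Alpha 21264 hardware.

```python
def update_chooser(counter, local_correct, global_correct):
    """Move the 2-bit chooser toward the predictor that was right."""
    if local_correct and not global_correct:
        return min(counter + 1, 3)
    if global_correct and not local_correct:
        return max(counter - 1, 0)
    return counter            # both right or both wrong: no change


def selected(counter):
    """Which predictor the chooser currently trusts."""
    return "local" if counter >= 2 else "global"


counter = 3                   # strongly prefer the local predictor
for local_ok, global_ok in [(False, True), (False, True), (True, True)]:
    counter = update_chooser(counter, local_ok, global_ok)
    print(counter, selected(counter))
```

Starting from the strong state, it takes two cases where only the other predictor is right before the choice flips, the same hysteresis a 2-bit branch predictor has.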
Figure 3.17. Fraction of predictions from local predictor for a tournament predictor using SPEC89
Figure 3.18. Misprediction rates for three different predictors on SPEC89 as total # of bits is increased