Post on 06-Jan-2016
Computer Architecture
Lecture 6
Overview of Branch Prediction
Prediction accuracy of a 4096-entry 2-bit prediction buffer vs. infinite buffer
[Figure: Frequency of mispredictions (0% to 18%) for the benchmarks li, eqntott, espresso, gcc, fpppp, spice and matrix300. Series: 4096 entries with 2 bits per entry, and unlimited entries with 2 bits per entry. The recoverable value pairs (10%/10%, 5%/5%, 12%/11%, 9%/9%, 9%/9%, 0%/0%) show the two configurations within one percentage point of each other.]
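A buffer of 2-bit counters can be sketched in a few lines of Python. This is a minimal sketch, not the hardware of any particular processor. Three things are assumptions: indexing by the low-order bits of PC >> 2, starting every counter at 1 (weakly not taken), and the class and function names. Because the buffer is finite, branches whose addresses map to the same index share a counter. An unlimited buffer gives each branch its own counter, and that is the only difference between the two series in the figure.

```python
class TwoBitPredictor:
    """Branch prediction buffer of 2-bit saturating counters.

    Assumptions (not from the slides): the buffer is indexed by the low-order
    bits of PC >> 2, and every counter starts at 1 (weakly not taken).
    """

    def __init__(self, entries=4096):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.mask = entries - 1
        self.table = [1] * entries   # 0,1 -> predict not taken; 2,3 -> taken

    def _index(self, pc):
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def count_mispredictions(predictor, trace):
    """trace: iterable of (pc, taken) pairs; returns the number of wrong guesses."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical trace: a loop branch at PC 0x400, taken 9 times then not taken, run twice
trace = ([(0x400, True)] * 9 + [(0x400, False)]) * 2
print(count_mispredictions(TwoBitPredictor(), trace))  # 3
```

The three misses are one during warm-up and one at each loop exit. The 2-bit hysteresis keeps the single not-taken outcome from flipping the prediction for the next pass.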
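Comparison of 2-bit predictors
[Figure: Frequency of mispredictions (0% to 18%) for li, eqntott, espresso, gcc, fpppp, spice and matrix300 under three predictors: local, 4096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; and a 1024-entry (2,2) predictor. The first two series repeat the values of the previous figure. The recoverable values for the (2,2) predictor range from 4% to 11%.]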
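A (2,2) predictor uses the outcomes of the last 2 branches (global history) to choose among 4 two-bit counters in each entry. Below is a minimal Python sketch. The index taken from PC >> 2, counters starting at 0 and all names are assumptions.

```python
class CorrelatingPredictor:
    """(m, n) correlating predictor: the outcomes of the last m branches select
    one of 2**m n-bit counters in the entry chosen by the PC.
    The defaults model a 1024-entry (2,2) predictor.

    Assumptions: index = low-order bits of PC >> 2; counters start at 0.
    """

    def __init__(self, entries=1024, m=2, n=2):
        self.mask = entries - 1
        self.history_mask = (1 << m) - 1
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)    # counter >= threshold -> predict taken
        self.history = 0                 # bit 0 = most recent outcome (1 = taken)
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def _row(self, pc):
        return self.table[(pc >> 2) & self.mask]

    def predict(self, pc):
        return self._row(pc)[self.history] >= self.threshold

    def update(self, pc, taken):
        row = self._row(pc)
        c = row[self.history]
        row[self.history] = min(self.max_count, c + 1) if taken else max(0, c - 1)
        # shift the new outcome into the global history register
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


# Hypothetical branch that alternates taken / not taken
p = CorrelatingPredictor()
misses = 0
for taken in [True, False] * 10:
    if p.predict(0x400) != taken:
        misses += 1
    p.update(0x400, taken)
print(misses)  # 3: only warm-up predictions miss
```

A single 2-bit counter started at 1 would oscillate between 1 and 2 on this alternating pattern and mispredict every branch. Here each history value trains its own counter, so the two cases stop interfering.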
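Tournament Predictor
[Figure: State diagram of the tournament predictor's 2-bit selector. States 11 and 10 mean "use predictor P1"; states 01 and 00 mean "use predictor P2". Transitions labelled "P1 Correct" move the selector toward the P1 states, and transitions labelled "P2 Correct" move it toward the P2 states.]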
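The selector is itself a 2-bit saturating counter, sketched below in Python. One assumption is read from the "P1 Correct" / "P2 Correct" labels: the counter moves only when exactly one predictor was correct. The names are hypothetical.

```python
class TournamentSelector:
    """2-bit selector of a tournament predictor.

    States 3 (11) and 2 (10) use predictor P1; states 1 (01) and 0 (00) use P2.
    Assumption: the counter moves only when exactly one predictor was correct.
    """

    def __init__(self, state=3):
        self.state = state

    def choose(self, p1_prediction, p2_prediction):
        return p1_prediction if self.state >= 2 else p2_prediction

    def update(self, p1_correct, p2_correct):
        if p1_correct and not p2_correct:
            self.state = min(3, self.state + 1)   # toward "use P1"
        elif p2_correct and not p1_correct:
            self.state = max(0, self.state - 1)   # toward "use P2"
        # both right or both wrong: no change


s = TournamentSelector()                  # start in 11: use P1
s.update(p1_correct=False, p2_correct=True)
print(s.state, s.choose("P1", "P2"))      # 2 P1  (one P1 miss is tolerated)
s.update(p1_correct=False, p2_correct=True)
print(s.state, s.choose("P1", "P2"))      # 1 P2
```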
Misprediction rate of three predictors
• Note that predictors of equal capacity must be compared. The size of each level has to be chosen to optimize prediction accuracy. Influencing factors: the degree of interference between branches, and whether the program is likely to benefit from local or global history.
[Figure: Conditional branch misprediction rate (0% to 8%) vs. total predictor size (0 to 512 Kbits) for a local 2-bit predictor, a correlating predictor and a tournament predictor.]
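The equal-capacity rule explains the pairing in the comparison of 2-bit predictors. A 4096-entry buffer of 2-bit counters and a 1024-entry (2,2) predictor hold the same number of counter bits. A quick check in Python (the function name is hypothetical, and the 2-bit global history register is ignored):

```python
def predictor_bits(entries, counter_bits, history_bits=0):
    """Counter storage in bits: each entry holds 2**history_bits counters.

    The few bits of the global history register itself are ignored.
    """
    return entries * (2 ** history_bits) * counter_bits


local_bits = predictor_bits(4096, counter_bits=2)                  # 4096 x 2-bit
corr_bits = predictor_bits(1024, counter_bits=2, history_bits=2)   # 1024-entry (2,2)
print(local_bits, corr_bits, local_bits // 1024)  # 8192 8192 8
```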
Why Prediction
Prediction reduces branch hazards in pipelined processors.
It is used in almost all pipelined processors.
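[Figure: Next-PC selection. A branch prediction buffer gives a taken/not-taken (T/NT) prediction that drives a 2-to-1 mux. The mux chooses between PC+4 and the target from the branch target address cache. The actual next PC is also shown.]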
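A Branch Target Buffer
[Figure: The PC of the instruction to fetch is looked up in the branch target buffer, which has a fixed number of entries. Each entry holds a branch PC, its predicted PC, and prediction hardware (a counter etc.) saying whether the branch is predicted taken or untaken. The stored PC is compared (=) with the fetch PC. No: not a branch instruction; proceed normally. Yes: the instruction is a branch; use the predicted PC as the new PC.]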
Handling an instruction with a branch-target buffer
The flow runs through the IF, ID and EX stages:
IF: send the PC to memory and to the branch-target buffer. Is an entry found in the branch-target buffer?
• No: is the instruction a taken branch?
– No: normal instruction execution.
– Yes: enter the branch instruction address and next PC into the branch-target buffer.
• Yes: send out the predicted PC. Is the instruction a taken branch?
– Yes: branch correctly predicted; continue execution with no stalls.
– No: mispredicted branch; kill the fetched instruction.
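The lookup and update steps of the flowchart can be sketched as a direct-mapped table in Python. The size, the indexing and the full-PC tag are assumptions. Removing an entry after a not-taken outcome is also an assumption, since the flowchart only says to kill the fetched instruction. The names are hypothetical.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch.

    Assumptions: 512 entries (hypothetical size), index = low-order bits of
    PC >> 2, each entry tagged with the full branch PC, and only branches that
    were taken are entered.
    """

    def __init__(self, entries=512):
        self.mask = entries - 1
        self.tags = [None] * entries
        self.targets = [0] * entries

    def _index(self, pc):
        return (pc >> 2) & self.mask

    def next_pc(self, pc):
        """IF stage: predicted PC on a hit, otherwise PC + 4."""
        i = self._index(pc)
        if self.tags[i] == pc:       # the '=' comparison in the figure
            return self.targets[i]   # branch: use the predicted PC
        return pc + 4                # not found: proceed normally

    def resolve(self, pc, taken, target):
        """When the branch outcome is known, keep the buffer in step with it."""
        i = self._index(pc)
        if taken:
            self.tags[i], self.targets[i] = pc, target  # enter address and next PC
        elif self.tags[i] == pc:
            self.tags[i] = None      # assumption: drop an entry that mispredicted


btb = BranchTargetBuffer()
print(hex(btb.next_pc(0x1000)))      # 0x1004: not yet in the buffer
btb.resolve(0x1000, taken=True, target=0x2000)
print(hex(btb.next_pc(0x1000)))      # 0x2000: predicted taken
print(hex(btb.next_pc(0x1800)))      # 0x1804: same index, tag mismatch
```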
Penalties for possible combinations of whether the branch is in the buffer
Instruction in buffer | Prediction | Actual branch | Penalty cycles
Yes                   | Taken      | Taken         | 0
Yes                   | Taken      | Not taken     | 2
No                    |            | Taken         | 2
No                    |            | Not taken     | 0
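From the table, a buffer hit costs 2 cycles when the branch is not taken, and a miss costs 2 cycles when the branch is taken. The average penalty per branch is therefore hit rate * (1 - accuracy) * 2 + (1 - hit rate) * taken frequency * 2. A Python sketch of that arithmetic follows. Here "accuracy" means the fraction of buffer hits that are actually taken, and the input rates are hypothetical.

```python
def branch_penalty(hit_rate, accuracy, taken_freq, penalty=2):
    """Average penalty cycles per branch, following the BTB penalty table.

    hit_rate:   fraction of branches found in the buffer (predicted taken)
    accuracy:   fraction of those hits that really are taken
    taken_freq: fraction of buffer misses that turn out to be taken
    """
    hit_cost = hit_rate * (1 - accuracy) * penalty       # "Yes, Taken, Not taken"
    miss_cost = (1 - hit_rate) * taken_freq * penalty    # "No, Taken"
    return hit_cost + miss_cost


# Hypothetical rates: 90% hit rate, 90% accuracy on hits, 60% of misses taken
print(round(branch_penalty(0.90, 0.90, 0.60), 2))  # 0.3
```

With these rates, 0.18 cycles come from wrong predictions on hits and 0.12 cycles from taken branches that miss the buffer.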
Static superscalar pipeline in operation
Fetch 64 bits per clock cycle; integer instruction on the left, FP on the right
– Can only issue the 2nd instruction if the 1st instruction issues
– More ports are needed for the FP registers to do an FP load and an FP op in a pair
Type             | Pipe stages
Int. instruction | IF ID EX MEM WB
FP instruction   | IF ID EX MEM WB
Int. instruction |    IF ID EX MEM WB
FP instruction   |    IF ID EX MEM WB
Int. instruction |       IF ID EX MEM WB
FP instruction   |       IF ID EX MEM WB
In the superscalar, a 1-cycle load delay affects 3 instructions. The instruction in the right half cannot use the loaded value, and neither can the instructions in the next slot.
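The pairing rule can be modeled as a greedy, in-order pass over the instruction stream. In this Python sketch, loads and stores (FP ones included) and branches go to the integer (left) slot, and FP arithmetic goes to the right slot. Hazards are ignored. Those choices, the names and the example instructions are assumptions. The example uses independent instructions, because the load delay above forbids pairing a load with its consumer.

```python
def issue_packets(instrs):
    """Group an in-order instruction stream into dual-issue packets.

    instrs: list of (text, slot) pairs, slot 'int' (integer ops, branches and
    all loads/stores, FP loads/stores included) or 'fp' (FP arithmetic).
    Assumptions: hazards and stalls are ignored; a pair is formed only when an
    'int' instruction is immediately followed by an 'fp' instruction.
    """
    packets = []
    i = 0
    while i < len(instrs):
        text, slot = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if slot == "int" and nxt is not None and nxt[1] == "fp":
            packets.append((text, nxt[0]))   # int on the left, FP on the right
            i += 2
        else:
            packets.append((text,))          # issues alone; the next one waits
            i += 1
    return packets


# Hypothetical independent instructions (MUL.D and the register numbers are illustrative)
code = [
    ("L.D F0,0(R1)", "int"),
    ("ADD.D F8,F6,F2", "fp"),
    ("DADDIU R1,R1,#-8", "int"),
    ("MUL.D F10,F6,F6", "fp"),
    ("BNE R1,R2,Loop", "int"),
]
for packet in issue_packets(code):
    print(" | ".join(packet))
```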
[Figure: Dynamic superscalar pipeline. Instructions come from the instruction cache over a wider bus, and registers are read. Two ISSUE/Rename-to-RS paths check for a free reservation station (RS) and for RAW hazards. Instructions wait for operands at the integer unit, the LD/ST unit (EX/AC, memory access), the FP adder (A1-A4), the FP multiplier (M1-M7) and the divider. Results return over two common data buses (CDB #1, CDB #2) to write the registers.]
Dynamic superscalar pipeline in operation
Example 1
Loop: L.D    F0,0(R1)    ;F0=array element
      ADD.D  F4,F0,F2
      S.D    F4,0(R1)    ;store result
      DADDIU R1,R1,#-8   ;8 bytes (per DW)
      BNE    R1,R2,Loop  ;branch R1!=R2
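In high-level terms, the loop adds the value in F2 to each double-word, walking R1 down 8 bytes per iteration until it equals R2. Below is a Python model of the same steps. The dictionary standing in for memory is hypothetical.

```python
def add_scalar_loop(mem, r1, r2, f2):
    """Python model of Example 1's loop.

    mem is a hypothetical dict from byte address to a double. The body runs at
    least once (the branch is at the bottom), so R1 must start above R2 and
    differ from it by a multiple of 8.
    """
    while True:
        f0 = mem[r1]      # L.D    F0,0(R1)
        f4 = f0 + f2      # ADD.D  F4,F0,F2
        mem[r1] = f4      # S.D    F4,0(R1)
        r1 -= 8           # DADDIU R1,R1,#-8
        if r1 == r2:      # BNE    R1,R2,Loop falls through when R1 == R2
            return r1


mem = {16: 1.0, 8: 2.0, 0: 3.0}
add_scalar_loop(mem, 16, 0, 0.5)
print(mem)  # {16: 1.5, 8: 2.5, 0: 3.0}: the element at R2 itself is not touched
```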
Dual issue, 1 integer unit, FP add = 3 cc
Iter | Instruction      | Issues | Executes | Mem access | Write CDB | Comment
1    | L.D F0,0(R1)     | 1      | 2        | 3          | 4         | First issue
1    | ADD.D F4,F0,F2   | 1      | 5-7      |            | 8         | Wait for L.D
1    | S.D F4,0(R1)     | 2      | 3        | 9          |           | Wait for ADD.D
1    | DADDIU R1,R1,#-8 | 2      | 4        |            | 5         | Wait for ALU
1    | BNE R1,R2,Loop   | 3      | 6        |            |           | Wait for DADDIU
2    | L.D F0,0(R1)     | 4      | 7        | 8          | 9         | Wait for BNE
2    | ADD.D F4,F0,F2   | 4      | 10-12    |            | 13        | Wait for L.D
2    | S.D F4,0(R1)     | 5      | 8        | 14         |           | Wait for ADD.D
2    | DADDIU R1,R1,#-8 | 5      | 9        |            | 10        | Wait for ALU
2    | BNE R1,R2,Loop   | 6      | 11       |            |           | Wait for DADDIU
3    | L.D F0,0(R1)     | 7      | 12       | 13         | 14        | Wait for BNE
3    | ADD.D F4,F0,F2   | 7      | 15-17    |            | 18        | Wait for L.D
3    | S.D F4,0(R1)     | 8      | 13       | 19         |           | Wait for ADD.D
3    | DADDIU R1,R1,#-8 | 8      | 14       |            | 15        | Wait for ALU
3    | BNE R1,R2,Loop   | 9      | 16       |            |           | Wait for DADDIU
Dual issue, 2 integer units
Iter | Instruction      | Issues | Executes | Mem access | Write CDB | Comment
1    | L.D F0,0(R1)     | 1      | 2        | 3          | 4         | First issue
1    | ADD.D F4,F0,F2   | 1      | 5-7      |            | 8         | Wait for L.D
1    | S.D F4,0(R1)     | 2      | 3        | 9          |           | Wait for ADD.D
1    | DADDIU R1,R1,#-8 | 2      | 3        |            | 4         | Executes earlier
1    | BNE R1,R2,Loop   | 3      | 5        |            |           | Wait for DADDIU
2    | L.D F0,0(R1)     | 4      | 6        | 7          | 8         | Wait for BNE
2    | ADD.D F4,F0,F2   | 4      | 9-11     |            | 12        | Wait for L.D
2    | S.D F4,0(R1)     | 5      | 7        | 13         |           | Wait for ADD.D
2    | DADDIU R1,R1,#-8 | 5      | 6        |            | 7         | Executes earlier
2    | BNE R1,R2,Loop   | 6      | 8        |            |           | Wait for DADDIU
3    | L.D F0,0(R1)     | 7      | 9        | 10         | 11        | Wait for BNE
3    | ADD.D F4,F0,F2   | 7      | 12-14    |            | 15        | Wait for L.D
3    | S.D F4,0(R1)     | 8      | 10       | 16         |           | Wait for ADD.D
3    | DADDIU R1,R1,#-8 | 8      | 9        |            | 10        | Executes earlier
3    | BNE R1,R2,Loop   | 9      | 11       |            |           | Wait for DADDIU
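In both tables, each iteration's L.D waits for the previous BNE, so the spacing of the BNE executions gives the loop's rate. The BNEs execute in cycles 6, 11, 16 with one integer unit and in cycles 5, 8, 11 with two. A Python sketch of that arithmetic, assuming the three iterations shown already represent the steady state:

```python
# Cycles in which each iteration's BNE executes, copied from the two tables
bne_exec = {
    "1 integer unit": [6, 11, 16],
    "2 integer units": [5, 8, 11],
}

INSTRS_PER_ITER = 5   # L.D, ADD.D, S.D, DADDIU, BNE


def steady_state(cycles):
    """Return (cycles per iteration, instructions per cycle) from BNE spacing."""
    gaps = [b - a for a, b in zip(cycles, cycles[1:])]
    assert len(set(gaps)) == 1, "spacing is not constant"
    return gaps[0], INSTRS_PER_ITER / gaps[0]


for config, cycles in bne_exec.items():
    per_iter, ipc = steady_state(cycles)
    print(f"{config}: {per_iter} cycles/iteration, IPC = {ipc:.2f}")
# 1 integer unit: 5 cycles/iteration, IPC = 1.00
# 2 integer units: 3 cycles/iteration, IPC = 1.67
```

The second integer unit lets each DADDIU execute one cycle after it issues ("Executes earlier") instead of waiting for the ALU. That is what shortens the interval.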
Speculative Execution
Needed to overcome branch hazards and to support precise exceptions
Speculative Pipeline
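[Figure: Speculative pipeline. Instructions are issued and renamed to reservation stations, checking for a free RS and for RAW hazards. They wait for operands and execute in the integer unit, the LD/ST unit (EX/AC, memory access), the FP adder (A1-A4), the FP multiplier (M1-M7) or the divider. Results go over the CDB to a reorder buffer (ROB), from which the registers are written.]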
The Hardware: Reorder Buffer
If instructions write their results in program order, registers and memory always get the correct values.
Reorder buffer (ROB): puts out-of-order instructions back into program order at the time they write registers/memory (commit).
If an instruction goes wrong, handle it at commit time: just flush the instructions after it.
An instruction cannot write registers/memory immediately after execution, so the ROB also buffers the results.
The original Tomasulo algorithm has no such place.
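[Figure: Block diagram with a fetch unit and instruction memory (IM), decode and rename, reservation stations (RS) in front of functional units FU1 and FU2, load and store buffers (L-buf, S-buf) to data memory (DM), the register file, and the reorder buffer.]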
Speculative Tomasulo Algorithm
Issue: get an instruction from the FP Op Queue.
Condition: a free RS at the required FU.
Actions: (1) decode the instruction; (2) allocate an RS and a ROB entry; (3) do source register renaming; (4) do destination register renaming; (5) read the register file; (6) dispatch the decoded and renamed instruction to the RS and ROB.
Execution: operate on operands (EX).
Condition: at a given FU, at least one instruction is ready.
Action: select a ready instruction and send it to the FU.
Write result: finish execution (WB).
Condition: at a given FU, some instruction finishes FU execution.
Actions: (1) the FU writes to the CDB, broadcasting to all RSs and to the ROB; (2) the FU broadcasts the tag (ROB index) to all RSs; (3) de-allocate the RS. Note: there is no register status update at this time.
Commit: update the register with the reordered result.
Condition: the ROB is not empty and the instruction at the ROB head has finished execution.
Actions if there is no misprediction/exception: (1) write the result to the register/memory; (2) update the register status; (3) de-allocate the ROB entry.
Actions on a misprediction/exception: flush the pipeline, e.g. (1) flush the IFQ; (2) clear the register status; (3) flush all RSs and reset the FUs; (4) reset the ROB.
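The issue, write-result and commit rules can be sketched as a small Python class. The class and method names are hypothetical. Stores, exceptions and register-status clearing are left out. Results are buffered in the ROB when written and reach the register file only when their entry reaches the head. A mispredicted branch at the head flushes every younger entry.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ROBEntry:
    tag: int                    # ROB index, broadcast on the CDB
    dest: Optional[str]         # destination register (None for branches)
    value: Optional[int] = None
    done: bool = False          # result written (Write result stage)
    mispredicted: bool = False  # set on a branch that turned out mispredicted


class ReorderBuffer:
    """Simplified ROB: issue allocates at the tail, commit retires at the head."""

    def __init__(self):
        self.entries = deque()
        self.next_tag = 0
        self.regs = {}          # architectural registers, written only at commit

    def issue(self, dest):
        entry = ROBEntry(self.next_tag, dest)
        self.next_tag += 1
        self.entries.append(entry)
        return entry.tag

    def write_result(self, tag, value=None, mispredicted=False):
        # Results may arrive in any order; they are only buffered here.
        for entry in self.entries:
            if entry.tag == tag:
                entry.value, entry.done = value, True
                entry.mispredicted = mispredicted

    def commit(self):
        """Retire finished instructions from the head in program order."""
        retired = []
        while self.entries and self.entries[0].done:
            entry = self.entries.popleft()
            retired.append(entry.tag)
            if entry.mispredicted:
                self.entries.clear()    # flush every younger instruction
                break
            if entry.dest is not None:
                self.regs[entry.dest] = entry.value
        return retired


rob = ReorderBuffer()
ld = rob.issue("R2")
inc = rob.issue("R1")
bne = rob.issue(None)
spec = rob.issue("R2")              # issued speculatively past the branch
rob.write_result(inc, 108)          # finishes early, out of order
print(rob.commit())                 # [] : head (LD) not finished yet
rob.write_result(ld, 5)
print(rob.commit())                 # [0, 1]
rob.write_result(spec, 99)          # speculative result, never committed
rob.write_result(bne, mispredicted=True)
print(rob.commit(), rob.regs)       # [2] {'R2': 5, 'R1': 108}
```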
Loop: LD     R2,0(R1)
      DADDIU R2,R2,#1
      SD     R2,0(R1)    ;store result
      DADDIU R1,R1,#4    ;increment pointer
      BNE    R2,R3,Loop  ;branch if not last element
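Unlike Example 1, the branch here tests R2, the value just loaded and incremented. Whether the next iteration runs is therefore not known until the LD, DADDIU, BNE chain completes. Below is a Python model of the loop. The memory dictionary and the function name are hypothetical.

```python
def increment_loop(mem, r1, r3):
    """Python model of the LD/DADDIU/SD/DADDIU/BNE loop.

    mem is a hypothetical dict from byte address to an integer word. The loop
    stops after storing an incremented value equal to R3.
    """
    while True:
        r2 = mem[r1]      # LD     R2,0(R1)
        r2 += 1           # DADDIU R2,R2,#1
        mem[r1] = r2      # SD     R2,0(R1)
        r1 += 4           # DADDIU R1,R1,#4
        if r2 == r3:      # BNE    R2,R3,Loop falls through when R2 == R3
            return r1


mem = {0: 1, 4: 5, 8: 9}
print(increment_loop(mem, 0, 6), mem)  # 8 {0: 2, 4: 6, 8: 9}
```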
Non-Speculative execution: Dual issue, 2 CDB
Iteration  Instruction      Issues  Executes  Mem access  Write CDB  Comment
1          L.D R2,0(R1)     1       2         3           4          First issue
1          ADDIU R2,R2,#1   1       5         -           6          Wait for L.D
1          S.D R2,0(R1)     2       3         7           -          Wait for ADDIU
1          DADDIU R1,R1,#4  2       3         -           4          Execute directly
1          BNE R2,R3,Loop   3       7         -           -          Wait for ADDIU
2          L.D R2,0(R1)     4       8         9           10         Wait for BNE
2          ADDIU R2,R2,#1   4       11        -           12         Wait for L.D
2          S.D R2,0(R1)     5       9         13          -          Wait for ADDIU
2          DADDIU R1,R1,#4  5       8         -           9          Wait for BNE
2          BNE R2,R3,Loop   6       13        -           -          Wait for ADDIU
3          L.D R2,0(R1)     7       14        15          16         Wait for BNE
3          ADDIU R2,R2,#1   7       17        -           18         Wait for L.D
3          S.D R2,0(R1)     8       15        19          -          Wait for ADDIU
3          DADDIU R1,R1,#4  8       14        -           15         Wait for BNE
3          BNE R2,R3,Loop   9       19        -           -          Wait for ADDIU
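The 6-cycle spacing between iterations in the non-speculative table comes from a single dependence chain. The Python below is a minimal sketch, not a model of the full Tomasulo machine. It follows only the L.D -> ADDIU -> BNE chain and the no-speculation rule that the next iteration's L.D cannot execute until the preceding BNE has executed. It assumes the timings read off the table: L.D executes, accesses memory, and writes the CDB in consecutive cycles, and every other step takes one cycle. It ignores issue width and CDB conflicts. The function name `nonspeculative_bne_cycles` is hypothetical.

```python
def nonspeculative_bne_cycles(iterations, first_load_exec=2):
    """Cycle in which each iteration's BNE executes, following only the
    L.D -> ADDIU -> BNE chain (hypothetical helper, simplified model)."""
    bne_cycles = []
    load_exec = first_load_exec
    for _ in range(iterations):
        load_write = load_exec + 2   # address calc, memory access, then CDB write
        addiu_exec = load_write + 1  # ADDIU waits for the loaded R2 on the CDB
        addiu_write = addiu_exec + 1
        bne_exec = addiu_write + 1   # BNE compares the new R2 with R3
        bne_cycles.append(bne_exec)
        load_exec = bne_exec + 1     # no speculation: next L.D waits for the branch
    return bne_cycles


print(nonspeculative_bne_cycles(3))
```

Under these assumptions the BNE of each iteration executes in cycles 7, 13, and 19, matching the table. One iteration therefore completes every 6 cycles.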
Speculative execution: Dual issue, 2 CDB
Iteration  Instruction      Issues  Executes  Mem access  Write CDB  Commit
1          L.D R2,0(R1)     1       2         3           4          5
1          ADDIU R2,R2,#1   1       5         -           6          7
1          S.D R2,0(R1)     2       3         7           -          7
1          DADDIU R1,R1,#4  2       3         -           4          8
1          BNE R2,R3,Loop   3       7         -           -          8
2          L.D R2,0(R1)     4       5         6           7          9
2          ADDIU R2,R2,#1   4       8         -           9          10
2          S.D R2,0(R1)     5       6         -           -          10
2          DADDIU R1,R1,#4  5       6         -           7          11
2          BNE R2,R3,Loop   6       10        -           -          11
3          L.D R2,0(R1)     7       8         9           10         12
3          ADDIU R2,R2,#1   7       11        -           12         13
3          S.D R2,0(R1)     8       9         -           -          13
3          DADDIU R1,R1,#4  8       9         -           10         14
3          BNE R2,R3,Loop   9       13        -           -          14
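With speculation, instructions still complete out of order, but the reorder buffer commits them in program order. The Python sketch below assumes a commit width of 2 per cycle, matching the dual-issue machine. Its input is the earliest cycle at which each instruction could commit. Those ready cycles are our own reading of the table: one cycle after the CDB write, or, for S.D and BNE, one cycle after their last input is available. The name `in_order_commit` is hypothetical.

```python
def in_order_commit(ready, width=2):
    """Commit cycle of each instruction, committing in program order.

    ready[i] is the earliest cycle instruction i could commit. An instruction
    never commits before its ready cycle or before the instruction ahead of
    it, and at most `width` instructions commit in any one cycle.
    """
    commits = []
    for r in ready:
        cycle = max(r, commits[-1]) if commits else r
        # commits is non-decreasing, so count() only sees this cycle's commits.
        while commits.count(cycle) >= width:
            cycle += 1
        commits.append(cycle)
    return commits


# Assumed earliest commit cycles, read off the speculative table
# (order per iteration: L.D, ADDIU, S.D, DADDIU, BNE).
READY = [5, 7, 7, 5, 8,       # iteration 1
         8, 10, 10, 8, 11,    # iteration 2
         11, 13, 13, 11, 14]  # iteration 3

print(in_order_commit(READY))
```

Given those ready cycles, the function reproduces the Commit column. The BNE of each iteration now commits every 3 cycles (8, 11, 14). In the non-speculative schedule it executes only every 6 cycles.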
IDEAL/Perfect Processor
Register renaming: infinite virtual registers are available.
Branch prediction: all conditional branches are predicted exactly.
Jump prediction: all jumps are perfectly predicted.
Memory address alias analysis: all memory addresses are known exactly.
ILP of a perfect processor for six SPEC92 programs
Instruction issues per cycle: gcc 54.8, espresso 62.6, li 17.9, fpppp 75.2, doduc 118.7, tomcatv 150.1
Effects of reducing the size of the window
[Figure: instruction issues per cycle (0 to 160) vs. window size (Infinite, 2K, 512, 128, 32, 8, 4) for tomcatv, doduc, fpppp, and li; annotation: "Practical possibilities"]
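The window-size limit can be reproduced in miniature. The Python sketch below schedules a list of dependences on an otherwise ideal machine: perfect renaming, perfect branch and alias prediction, and one-cycle latency for every operation. It then adds one simple window model, in which instruction i can issue only after every instruction at least `window` places older has issued. That window rule is an assumption chosen for illustration, not the exact model behind the measurements above. `issue_cycles` and `ipc` are hypothetical names.

```python
def issue_cycles(deps, window=None):
    """Issue cycle (starting at 1) of each instruction on an idealized machine.

    deps[i] lists the earlier instructions whose results instruction i needs.
    window=None means an unbounded window (the perfect processor).
    """
    start = []
    latest_issue = []  # latest_issue[j] = max issue cycle among instructions 0..j
    for i, producers in enumerate(deps):
        t = 1
        for j in producers:
            t = max(t, start[j] + 1)  # a result is usable the cycle after it issues
        if window is not None and i >= window:
            # Assumed window rule: wait until everything `window` places older issued.
            t = max(t, latest_issue[i - window] + 1)
        start.append(t)
        latest_issue.append(max(t, latest_issue[-1]) if latest_issue else t)
    return start


def ipc(deps, window=None):
    """Instructions per cycle = instruction count / cycles to issue them all."""
    return len(deps) / max(issue_cycles(deps, window))


independent = [[] for _ in range(8)]   # 8 instructions, no dependences
chain = [[]] + [[i] for i in range(3)]  # 4 instructions, each needs the previous
print(ipc(independent), ipc(independent, window=2), ipc(chain))
```

Eight independent instructions reach an IPC of 8 with an unbounded window, but only 2 with a two-entry window. A chain of dependent instructions gets an IPC of 1 whatever the window size.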
Another View of Last Slide
[Figure: instruction issues per cycle (IPC) per program (gcc, espresso, li, fpppp, doduc, tomcatv) for window sizes Infinite, 2K, 512, 128, and 32. With an infinite window: gcc 55, espresso 63, li 18, fpppp 75, doduc 119, tomcatv 150]
Effect of branch-prediction schemes (1)
[Figure: instruction issues per cycle (0 to 60) under each branch-prediction scheme (Perfect, Tournament predictor, Standard 2-bit, Static, None) for fpppp, doduc, tomcatv, and li; annotation: "Practical possibilities"]
Effect of branch-prediction schemes (2)
Instruction issues per cycle:
Program    Perfect  Selective predictor  Standard 2-bit  Static  None
gcc        35       9                    6               6       2
espresso   41       12                   7               6       2
li         16       10                   6               7       2
fpppp      61       48                   46              45      29
doduc      58       15                   13              14      4
tomcatv    60       46                   45              45      19
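The "Standard 2-bit" scheme is a table of 2-bit saturating counters. Below is a minimal Python sketch of a single counter. It assumes the usual encoding: states 0 and 1 predict not taken, states 2 and 3 predict taken, and each actual outcome moves the state one step toward itself. `TwoBitCounter` and `count_mispredictions` are illustrative names.

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, state=0):
    counter = TwoBitCounter(state)
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# A loop branch taken 9 times and then not taken, with the loop run 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(count_mispredictions(loop_branch))
```

Consider a loop branch that is taken nine times and then falls through, with the counter starting in the strongly-not-taken state. The counter mispredicts twice while warming up and then only once per loop exit, for 5 mispredictions over three runs of the loop.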
Branch-prediction accuracy for conditional branches in SPEC92
Program    Profile-based  2-bit counter  Tournament
li         88%            77%            98%
espresso   86%            82%            96%
fpppp      86%            82%            98%
tomcatv    99%            99%            100%
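A tournament predictor keeps a 2-bit selector that chooses between two component predictors, P1 and P2. The Python sketch below assumes the selector encoding 3 and 2 = use P1, 1 and 0 = use P2. The selector moves toward whichever predictor was right, but only when exactly one of them was right. The component predictions are passed in as plain lists, so the sketch models only the selection logic. `TournamentSelector` and `run_tournament` are hypothetical names.

```python
class TournamentSelector:
    """2-bit selector: states 3 and 2 choose P1, states 1 and 0 choose P2."""

    def __init__(self, state=3):
        self.state = state

    def choose(self, p1, p2):
        return p1 if self.state >= 2 else p2

    def update(self, p1, p2, taken):
        p1_ok, p2_ok = p1 == taken, p2 == taken
        if p1_ok and not p2_ok:
            self.state = min(3, self.state + 1)  # move toward P1
        elif p2_ok and not p1_ok:
            self.state = max(0, self.state - 1)  # move toward P2
        # Both right or both wrong: leave the selector unchanged.


def run_tournament(p1_preds, p2_preds, outcomes, state=3):
    """Count mispredictions of the combined predictor."""
    selector = TournamentSelector(state)
    misses = 0
    for p1, p2, taken in zip(p1_preds, p2_preds, outcomes):
        if selector.choose(p1, p2) != taken:
            misses += 1
        selector.update(p1, p2, taken)
    return misses


# P1 always predicts taken, P2 always predicts not taken; the branch is never taken.
print(run_tournament([True] * 10, [False] * 10, [False] * 10))
```

The selector starts with a strong preference for P1. On a branch that is never taken, an always-taken P1 and an always-not-taken P2 therefore cost two mispredictions before the selector switches to P2 for good.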
Intel processors based on the P6 microarchitecture
Processor         First ship date  Clock rate range  L1 cache                 L2 cache
Pentium Pro       1995             100-200 MHz       8KB instr. + 8KB data    256 KB-1024 KB
Pentium II        1998             233-450 MHz       16KB instr. + 16KB data  256 KB-512 KB
Pentium II Xeon   1999             400-450 MHz       16KB instr. + 16KB data  512 KB-2 MB
Celeron           1999             500-900 MHz       16KB instr. + 16KB data  128 KB
Pentium III       1999             450-1100 MHz      16KB instr. + 16KB data  256 KB-512 KB
Pentium III Xeon  2000             700-900 MHz       16KB instr. + 16KB data  1 MB-2 MB
P6 Architecture (P-II Onwards…)
Instruction name     Pipeline stages  Repeat rate
Integer ALU          1                1
Integer load         3                1
Integer multiply     4                1
FP add               3                1
FP multiply          5                2
FP divide (64-bit)   32               32
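Pipeline stages and repeat rate together determine how long a burst of work occupies a unit. Assume that pipeline stages is the latency to the first result and repeat rate is the minimum number of cycles between successive issues to the same unit. Then n independent operations need stages + (n - 1) * repeat cycles. The Python below is a back-of-the-envelope sketch under that assumption, with no other stalls. `cycles_for_independent_ops` is a hypothetical helper.

```python
# (pipeline stages, repeat rate) per unit, copied from the P6 table.
P6_UNITS = {
    "Integer ALU": (1, 1),
    "Integer load": (3, 1),
    "Integer multiply": (4, 1),
    "FP add": (3, 1),
    "FP multiply": (5, 2),
    "FP divide (64-bit)": (32, 32),
}


def cycles_for_independent_ops(unit, n):
    """Cycles until the n-th of n independent ops on one unit has its result.

    Assumes the first result takes `stages` cycles and each later op can
    start `repeat` cycles after the previous one.
    """
    if n <= 0:
        return 0
    stages, repeat = P6_UNITS[unit]
    return stages + (n - 1) * repeat


for name in ("Integer ALU", "FP multiply", "FP divide (64-bit)"):
    print(name, cycles_for_independent_ops(name, 10))
```

For 10 independent operations this gives 10 cycles on the integer ALU, 23 on the FP multiplier, and 320 on the FP divider. The divider's repeat rate equals its stage count, so it accepts a new divide only after the previous one finishes.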
P6 processor pipeline
Instruction fetch (16 bytes per cycle) -> [16-byte buffer] -> Instruction decode (3 instructions per cycle) -> [6-uop buffer] -> Renaming (3 uops per cycle) -> Reservation stations (20 entries) -> Execution units (5 total) -> Reorder buffer (40 entries) -> Graduation unit (3 uops per cycle)
The P6 processor pipeline, showing the throughput of each stage and the total buffering provided between stages.
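Two quick numbers follow from the pipeline figures. The rename and graduation stages each handle 3 uops per cycle, so sustained throughput cannot exceed 3 uops per cycle. By Little's law (occupancy = throughput × residency), a 40-entry reorder buffer running at that rate can hold each uop for only about 40/3 ≈ 13 cycles on average. The Python sketch below does just this arithmetic. Leaving decode out, because it is quoted in instructions rather than uops, is a simplifying assumption, and the names are illustrative.

```python
from fractions import Fraction

# Peak rates in uops per cycle for the stages quoted in uops.
# Decode (3 instructions/cycle) is left out because one instruction
# may expand into several uops.
UOP_STAGE_RATES = {"rename": 3, "graduation": 3}
ROB_ENTRIES = 40


def sustained_uops_per_cycle(rates):
    """In steady state no stage can run faster than the slowest one."""
    return min(rates.values())


def max_avg_rob_residency(entries, uops_per_cycle):
    """Little's law: entries = rate * residency, so residency = entries / rate."""
    return Fraction(entries, uops_per_cycle)


rate = sustained_uops_per_cycle(UOP_STAGE_RATES)
residency = max_avg_rob_residency(ROB_ENTRIES, rate)
print(rate, residency, float(residency))
```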
Speculation factor
[Figure: percentage of instructions that do not commit in Pentium 3 (0 to 60%) for the benchmarks gcc, tomcatv, perl, compress, go, li, vortex, apsi, fpppp, and hydro2d]
Performance: Pentium 4 vs III
[Figure: SPEC ratio (0 to 1000) of the Pentium 4 and Pentium III on the SPEC2000 benchmarks gcc, mgrid, vortex, and applu]