Department of Electrical & Computer Engineering
EC 513 Computer Architecture
Prof. Michel A. Kinsy
Hazard Resolution
Instruction Interactions
§ An instruction in the pipeline may need a resource being used by another instruction in the pipeline
  § Structural hazard
§ An instruction may depend on something produced by an earlier instruction
  § Dependence may be for a data calculation
    § Data hazard
  § Dependence may be for calculating the next address
    § Control hazard (branches, interrupts)
Structural Hazard
sub x8, x6, x7
lw x1, 8(x2)
add x5, x6, x7
sw x3, 24(x4)
[Figure: RISC-V datapath (PC, instruction memory, register file, ALU, data memory, sign extension, control unit) annotated with the instruction sequence above]
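A minimal sketch of one common structural hazard, under assumptions that are not taken from the slide: a five-stage pipeline (IF, ID, EX, MA, WB) that issues one instruction per cycle without stalling, and a single memory port shared by instruction fetch and data access. The helper name `memory_port_conflicts` is hypothetical. Because some instruction is always being fetched, every load or store in MA competes with IF for that port.

```python
# Sketch: structural hazard on a single shared memory port (assumed setup).
# Five-stage pipeline, one instruction issued per cycle, no stalls, and the
# front end keeps fetching a new instruction every cycle.
STAGES = ["IF", "ID", "EX", "MA", "WB"]
MEMORY_OPS = {"lw", "sw"}  # instructions that use the data-memory port in MA


def memory_port_conflicts(program):
    """Return (cycle, instruction) pairs where an MA access collides with a fetch."""
    ma_offset = STAGES.index("MA")  # issued at cycle c -> in MA at cycle c + 3
    conflicts = []
    for issue_cycle, inst in enumerate(program):
        opcode = inst.split()[0].lower()
        if opcode in MEMORY_OPS:
            # Some instruction is always in IF, so this MA access collides with a fetch.
            conflicts.append((issue_cycle + ma_offset, inst))
    return conflicts


program = ["sub x8, x6, x7", "lw x1, 8(x2)", "add x5, x6, x7", "sw x3, 24(x4)"]
for cycle, inst in memory_port_conflicts(program):
    print(f"cycle {cycle}: '{inst}' in MA competes with IF for the memory port")
```

Separate instruction and data memories (or caches) remove this particular conflict; with a single port, the fetch has to wait a cycle instead.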
Multi-Stage RISC-V CPU
[Figure: multi-stage RISC-V datapath (PC, instruction memory, register file, ALU, data memory, sign extension, control unit)]
Stalled Stages and Pipeline
time                  t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) r1 ← (r0) + 10   IF1  ID1  EX1  MA1  WB1
(I2) r4 ← (r1) + 17        IF2  ID2  ID2  ID2  ID2  EX2  MA2
(I3)                            IF3  IF3  IF3  IF3  ID3  EX3
(I4)                                                IF4  ID4
(I5)                                                     IF5
Stalled stages: ID2 and IF3 during t3, t4, and t5

Resource Usage
time  t0   t1   t2   t3   t4   t5   t6   t7   ...
IF    I1   I2   I3   I3   I3   I3   I4   I5
ID         I1   I2   I2   I2   I2   I3   I4
EX              I1   nop  nop  nop  I2   I3
MA                   I1   nop  nop  nop  I2
WB                        I1   nop  nop  nop
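The tables above can be reproduced with a small cycle-by-cycle model. This is a sketch under stated assumptions, not the course's simulator: there is no bypassing, the instruction in ID stalls while any instruction in EX, MA, or WB still has to write one of its source registers, and I3 to I5 are treated as independent instructions (the slide does not give their operands). The `Inst` type and `simulate` function are illustrative names.

```python
# Sketch: five-stage pipeline with an interlock that stalls the instruction in ID.
# Assumptions: no bypassing; ID may read a register only after its producer has
# left WB (a stall check on the EX, MA, and WB stages). I3 to I5 are independent.
from typing import NamedTuple, Optional, Tuple


class Inst(NamedTuple):
    name: str
    dest: Optional[str]      # register written (None if no write)
    srcs: Tuple[str, ...]    # registers read


NOP = Inst("nop", None, ())
STAGES = ["IF", "ID", "EX", "MA", "WB"]


def simulate(program, cycles):
    """Return {stage: [name of occupant in each cycle]}; '' marks an empty slot."""
    pipe = dict.fromkeys(STAGES)
    history = {s: [] for s in STAGES}
    next_fetch = 0
    for _ in range(cycles):
        if pipe["IF"] is None and next_fetch < len(program):
            pipe["IF"] = program[next_fetch]
            next_fetch += 1
        for s in STAGES:
            history[s].append(pipe[s].name if pipe[s] is not None else "")
        # Interlock: ID reads a register that an instruction in EX, MA, or WB writes.
        dec = pipe["ID"]
        stall = dec is not None and any(
            pipe[s] is not None and pipe[s].dest is not None and pipe[s].dest in dec.srcs
            for s in ("EX", "MA", "WB")
        )
        # Advance from the back of the pipeline to the front.
        pipe["WB"], pipe["MA"] = pipe["MA"], pipe["EX"]
        if stall:
            pipe["EX"] = NOP  # inject a bubble; ID and IF hold their instructions
        else:
            pipe["EX"], pipe["ID"], pipe["IF"] = pipe["ID"], pipe["IF"], None
    return history


program = [
    Inst("I1", "r1", ("r0",)),  # r1 <- (r0) + 10
    Inst("I2", "r4", ("r1",)),  # r4 <- (r1) + 17
    Inst("I3", None, ()),
    Inst("I4", None, ()),
    Inst("I5", None, ()),
]
for stage, row in simulate(program, 8).items():
    print(f"{stage}: " + " ".join(f"{x or '-':>3}" for x in row))
```

Each row of the printed output matches the corresponding row of the Resource Usage table, with `-` for slots that are still empty at the start of execution.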
Resolving Data Hazards
§ Strategy 1: Wait for the result to be available by freezing earlier pipeline stages
  § Interlocks
§ Strategy 2: Route data as soon as possible after it is calculated to the earlier pipeline stage
  § Bypass
Resolving Data Hazards
§ Strategy 3: Speculate on the dependence
  § Two cases:
    § Guessed correctly: do nothing
    § Guessed incorrectly: kill and restart
Instruction Formats
§ R-type instruction
  funct7 (7) | rs2 (5) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)
§ I-type instruction & I-immediate (32 bits)
  imm[11:0] (12) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)
  § I-imm = signExtend(inst[31:20])
§ S-type instruction & S-immediate (32 bits)
  imm[11:5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4:0] (5) | opcode (7)
  § S-imm = signExtend({inst[31:25], inst[11:7]})
Instruction Formats
§ SB-type instruction & B-immediate (32 bits)
  imm[12] (1) | imm[10:5] (6) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4:1] (4) | imm[11] (1) | opcode (7)
  § B-imm = signExtend({inst[31], inst[7], inst[30:25], inst[11:8], 1'b0})
§ U-type instruction & U-immediate (32 bits)
  imm[31:12] (20) | rd (5) | opcode (7)
  § U-imm = signExtend({inst[31:12], 12'b0})
§ UJ-type instruction & J-immediate (32 bits)
  imm[20] (1) | imm[10:1] (10) | imm[11] (1) | imm[19:12] (8) | rd (5) | opcode (7)
  § J-imm = signExtend({inst[31], inst[19:12], inst[20], inst[30:21], 1'b0})
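The immediate formulas above translate directly into bit manipulation. The sketch below decodes each immediate from a 32-bit instruction word; the function names are illustrative, and the test encodings are standard RV32I encodings chosen for this example.

```python
# Sketch: decode the RISC-V immediates listed above from a 32-bit instruction word.
# Function names are illustrative, not from the slides.


def bits(inst: int, hi: int, lo: int) -> int:
    """Return inst[hi:lo] as an unsigned integer."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def i_imm(inst: int) -> int:  # signExtend(inst[31:20])
    return sign_extend(bits(inst, 31, 20), 12)


def s_imm(inst: int) -> int:  # signExtend({inst[31:25], inst[11:7]})
    return sign_extend((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)


def b_imm(inst: int) -> int:  # signExtend({inst[31], inst[7], inst[30:25], inst[11:8], 1'b0})
    v = ((bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11)
         | (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1))
    return sign_extend(v, 13)


def u_imm(inst: int) -> int:  # signExtend({inst[31:12], 12'b0})
    return sign_extend(bits(inst, 31, 12) << 12, 32)


def j_imm(inst: int) -> int:  # signExtend({inst[31], inst[19:12], inst[20], inst[30:21], 1'b0})
    v = ((bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12)
         | (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1))
    return sign_extend(v, 21)


print(i_imm(0xFFF00093))  # addi x1, x0, -1  -> -1
print(b_imm(0xFE000EE3))  # beq x0, x0, -4   -> -4
print(j_imm(0xFFDFF06F))  # jal x0, -4       -> -4
```

Because the B- and J-immediates append `1'b0`, branch and jump offsets are always multiples of two bytes, which is why bit 0 never needs to be encoded.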
Instruction Formats
R-type:   funct7 | rs2 | rs1 | funct3 | rd | opcode
I-type:   imm[11:0] | rs1 | funct3 | rd | opcode
S-type:   imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
SB-type:  imm[12|10:5] | rs2 | rs1 | funct3 | imm[4:1|11] | opcode
U-type:   imm[31:12] | rd | opcode
UJ-type:  imm[20|10:1|11|19:12] | rd | opcode
Source and Destination Registers
        semantics                                             source(s)   destination
ALU     rd ← (rs1) [funct3, funct7] (rs2)                     rs1, rs2    rd
ALUi    rd ← (rs1) [funct3] I-imm                             rs1         rd
        rd ← (rs1) [funct3, inst[30]] I-imm[4:0] (shifts)     rs1         rd
LW      rd ← M[(rs1) + imm]                                   rs1         rd
SW      M[(rs1) + imm] ← (rs2)                                rs1, rs2
LUI     rd ← U-imm                                                        rd
AUIPC   rd ← pc + U-imm                                                   rd
JAL     rd ← pc + 4; pc ← pc + J-imm                                      rd
JALR    rd ← pc + 4; pc ← ((rs1) + I-imm) & ~0x01             rs1         rd
BR      pc ← compare(funct3, rs1, rs2) ? pc + B-imm : pc + 4  rs1, rs2
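The source(s) and destination columns can be read straight off the opcode. This sketch assumes the standard RV32I major opcode values (0x03 and 0x23 cover all loads and stores, which the slide groups as LW and SW); the class names mirror the table, and `reg_usage` is a hypothetical helper.

```python
# Sketch: report which register fields an RV32I instruction reads and writes,
# following the source(s)/destination columns above. Opcode values are the
# standard RV32I major opcodes; class names mirror the slide.
OPCODE_CLASS = {
    0x33: "ALU", 0x13: "ALUi", 0x03: "LW", 0x23: "SW", 0x37: "LUI",
    0x17: "AUIPC", 0x6F: "JAL", 0x67: "JALR", 0x63: "BR",
}


def reg_usage(inst: int):
    """Return (class, source register numbers, destination register or None)."""
    cls = OPCODE_CLASS[inst & 0x7F]
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    rs2 = (inst >> 20) & 0x1F
    if cls in ("ALU", "SW", "BR"):
        sources = (rs1, rs2)
    elif cls in ("ALUi", "LW", "JALR"):
        sources = (rs1,)
    else:  # LUI, AUIPC, JAL read no registers
        sources = ()
    dest = None if cls in ("SW", "BR") else rd
    return cls, sources, dest


print(reg_usage(0x00812083))  # lw x1, 8(x2)   -> ('LW', (2,), 1)
print(reg_usage(0x00322C23))  # sw x3, 24(x4)  -> ('SW', (4, 3), None)
```

Decoding these fields never needs register values, which is why register hazards can be detected in the decode stage.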
Types of Data Hazards
§ Consider executing a sequence of instructions of the type rk ← (ri) op (rj)
Output-dependence   r3 ← (r1) op (r2)   Write-after-Write (WAW) hazard
                    r3 ← (r6) op (r7)
Data-dependence     r3 ← (r1) op (r2)   Read-after-Write (RAW) hazard
                    r5 ← (r3) op (r4)
Anti-dependence     r3 ← (r1) op (r2)   Write-after-Read (WAR) hazard
                    r1 ← (r4) op (r5)
Detecting Data Hazards
§ Range and Domain of instruction i
  § R(i) = Registers (or other storage) modified by instruction i
  § D(i) = Registers (or other storage) read by instruction i
§ Suppose instruction j follows instruction i in the program order. Executing instruction j before the effect of instruction i has taken place can cause a
  § RAW hazard if $R(i) \cap D(j) \neq \emptyset$
  § WAR hazard if $D(i) \cap R(j) \neq \emptyset$
  § WAW hazard if $R(i) \cap R(j) \neq \emptyset$
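The three set tests map one-to-one onto set intersections. A minimal sketch, with illustrative helper names, applied to the three instruction pairs from the previous slide:

```python
# Sketch: the R/D set tests above, applied to one ordered pair (i before j).
# Register names are plain strings; helper names are illustrative.


def rd_sets(dest, *srcs):
    """R and D sets for an instruction dest <- (src1) op (src2)."""
    return {dest}, set(srcs)


def hazards(i, j):
    """Hazards that can occur if j executes before i has taken effect."""
    (R_i, D_i), (R_j, D_j) = i, j
    found = []
    if R_i & D_j:
        found.append("RAW")
    if D_i & R_j:
        found.append("WAR")
    if R_i & R_j:
        found.append("WAW")
    return found


i = rd_sets("r3", "r1", "r2")                  # r3 <- (r1) op (r2)
print(hazards(i, rd_sets("r3", "r6", "r7")))   # ['WAW']
print(hazards(i, rd_sets("r5", "r3", "r4")))   # ['RAW']
print(hazards(i, rd_sets("r1", "r4", "r5")))   # ['WAR']
```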
Register vs. Memory Data Dependence
§ Data hazards due to register operands can be determined at the decode stage
§ Data hazards due to memory operands can be determined only after computing the effective address
  § store M[(r1) + disp1] ← (r2)
  § load r3 ← M[(r4) + disp2]
  § Does (r1) + disp1 = (r4) + disp2?
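The memory check needs register values, not just register names. This sketch assumes both accesses are aligned words, so equal effective addresses mean the same location; the helper names and register values are made up for illustration.

```python
# Sketch: a store/load pair can only be checked for a memory dependence once
# both effective addresses are known (after address computation).
# Assumes both accesses are aligned words, so equal addresses <=> same location.


def effective_address(regs, base, disp):
    return regs[base] + disp


def store_load_conflict(regs, store, load):
    """store, load: (base register, displacement). True if the load reads the stored word."""
    return effective_address(regs, *store) == effective_address(regs, *load)


regs = {"r1": 0x1000, "r4": 0x0FF8}
print(store_load_conflict(regs, ("r1", 8), ("r4", 16)))  # True: both are 0x1008
print(store_load_conflict(regs, ("r1", 8), ("r4", 0)))   # False
```

The first pair uses different base registers yet touches the same word, which is exactly what a decode-stage comparison of register names would miss.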
Data Hazards: An Example
I1 ADD x6, x6, x4
I2 LW  x2, 44(x3)
I3 SUB x5, x2, x4
I4 AND x8, x6, x2
I5 SUB x10, x5, x6
I6 ADD x6, x8, x2
[Figure: dependence graph over I1 to I6 marking the RAW, WAR, and WAW hazards between instruction pairs]
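Applying the R(i)/D(j) tests to every ordered pair of the example gives the edges of the dependence graph. A minimal sketch; it checks all pairs regardless of how far apart they are, so whether a given pair actually causes a pipeline hazard still depends on pipeline depth.

```python
# Sketch: enumerate the RAW, WAR, and WAW pairs in the I1 to I6 example using
# the R(i)/D(j) set tests. Every ordered pair is checked, regardless of distance.
PROGRAM = [
    ("I1", "x6", ("x6", "x4")),   # ADD x6, x6, x4
    ("I2", "x2", ("x3",)),        # LW  x2, 44(x3)
    ("I3", "x5", ("x2", "x4")),   # SUB x5, x2, x4
    ("I4", "x8", ("x6", "x2")),   # AND x8, x6, x2
    ("I5", "x10", ("x5", "x6")),  # SUB x10, x5, x6
    ("I6", "x6", ("x8", "x2")),   # ADD x6, x8, x2
]


def find_hazards(program):
    result = {"RAW": [], "WAR": [], "WAW": []}
    for a, (name_i, dest_i, srcs_i) in enumerate(program):
        R_i, D_i = {dest_i}, set(srcs_i)
        for name_j, dest_j, srcs_j in program[a + 1:]:
            R_j, D_j = {dest_j}, set(srcs_j)
            if R_i & D_j:
                result["RAW"].append((name_i, name_j))
            if D_i & R_j:
                result["WAR"].append((name_i, name_j))
            if R_i & R_j:
                result["WAW"].append((name_i, name_j))
    return result


for kind, pairs in find_hazards(PROGRAM).items():
    print(kind, pairs)
```

Run on the example, it reports RAW pairs (I1, I4), (I1, I5), (I2, I3), (I2, I4), (I2, I6), (I3, I5), (I4, I6); WAR pairs (I1, I6), (I4, I6), (I5, I6); and the single WAW pair (I1, I6).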
Interlocks to resolve Data Hazards
[Figure: pipelined datapath (PC, instruction memory, GPRs, immediate extension, ALU, data memory) with a stall condition that freezes the earlier stages and inserts a nop]
...
x1 ← x2 + 10
x4 ← x1 + 17
...
Interlock Control Logic
[Figure: pipelined datapath with interlock control logic; Cdest produces ws and we for the instructions in later stages, Cre produces re1 and re2 for the instruction in decode, and Cstall compares rs1 and rs2 against each ws to assert stall and insert a nop]
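Deriving the Stall Signal
Cdest
  ws = Case opcode
    ALU, ALUi → rd
    LW, LUI → rd
    JAL, JALR, AUIPC → rd
  we = Case opcode
    ALU, ALUi, LW, LUI → (ws != 0)
    JAL, JALR, AUIPC → on
    ... → off
Cre
  re1 = Case opcode
    ALU, ALUi, LW, SW, BR, JALR → on
    LUI, JAL, AUIPC → off
  re2 = Case opcode
    ALU, SW, BR → on
    ... → off
$$C_{stall} = \big[(rs1_D = ws_E)\cdot we_E + (rs1_D = ws_M)\cdot we_M + (rs1_D = ws_W)\cdot we_W\big]\cdot re1_D + \big[(rs2_D = ws_E)\cdot we_E + (rs2_D = ws_M)\cdot we_M + (rs2_D = ws_W)\cdot we_W\big]\cdot re2_D$$
Here $\cdot$ is logical AND, $+$ is logical OR, and a subscript names the stage (D, E, M, W) holding the instruction.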
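The Cdest, Cre, and Cstall definitions can be written as small boolean functions. In this sketch the opcode classes and cases are copied from the slide; the `Instr` type, the function names, and the use of `None` for a bubble are illustrative assumptions.

```python
# Sketch of the Cdest, Cre, and Cstall logic above. Opcode classes and cases are
# copied from the slide; the Instr type and function names are illustrative.
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    opcode: str   # "ALU", "ALUi", "LW", "SW", "BR", "LUI", "JAL", "JALR", "AUIPC"
    rd: int = 0
    rs1: int = 0
    rs2: int = 0


def ws(x: Instr) -> int:      # Cdest: every writer names rd
    return x.rd


def we(x: Instr) -> bool:     # Cdest: write enable
    if x.opcode in ("ALU", "ALUi", "LW", "LUI"):
        return ws(x) != 0
    return x.opcode in ("JAL", "JALR", "AUIPC")


def re1(x: Instr) -> bool:    # Cre: reads rs1
    return x.opcode in ("ALU", "ALUi", "LW", "SW", "BR", "JALR")


def re2(x: Instr) -> bool:    # Cre: reads rs2
    return x.opcode in ("ALU", "SW", "BR")


def c_stall(d: Instr, e: Optional[Instr], m: Optional[Instr], w: Optional[Instr]) -> bool:
    """Stall D if one of its sources matches an enabled write in E, M, or W.
    None stands for a bubble (nop) in that stage."""
    def pending_write(rs: int) -> bool:
        return any(x is not None and rs == ws(x) and we(x) for x in (e, m, w))

    return (pending_write(d.rs1) and re1(d)) or (pending_write(d.rs2) and re2(d))


i1 = Instr("ALUi", rd=1, rs1=0)   # x1 <- x0 + 10
i2 = Instr("ALUi", rd=4, rs1=1)   # x4 <- x1 + 17
print(c_stall(i2, i1, None, None))    # True: producer in E
print(c_stall(i2, None, None, None))  # False: producer has retired
```

The `re1`/`re2` gating matters: a LUI in decode has bits in its rs1 field position, but since it reads no register it never stalls on them.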
Feedback to Resolve Hazards
§ Later stages provide dependence information to earlier stages, which can stall (or kill) instructions
§ Controlling a pipeline in this manner works provided the instruction at stage i+1 can complete without any interference from instructions in stages 1 to i
  § Otherwise, deadlocks may occur
[Figure: four pipeline stages (stage1 to stage4) with feedback signals FB1 to FB4 returning to earlier stages]
Bypassing
§ Each stall or kill introduces a bubble in the pipeline → CPI > 1

time                  t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) r1 ← r0 + 10     IF1  ID1  EX1  MA1  WB1
(I2) r4 ← r1 + 17          IF2  ID2  EX2  MA2  WB2
(I3)                            IF3  ID3  EX3  MA3  WB3
(I4)                                 IF4  ID4  EX4  MA4
(I5)                                      IF5  ID5  EX5

time                  t0   t1   t2   t3   t4   t5   t6   t7   ...
(I1) r1 ← (r0) + 10   IF1  ID1  EX1  MA1  WB1
(I2) r4 ← (r1) + 17        IF2  ID2  ID2  ID2  ID2  EX2  MA2
(I3)                            IF3  IF3  IF3  IF3  ID3  EX3
(I4)                                                IF4  ID4
(I5)                                                     IF5
Stalled stages: ID2 and IF3 during t3, t4, and t5
Adding a Bypass
[Figure: pipelined datapath with a bypass path from the ALU output back to the A operand, selected by the ASrc mux; the stall condition is retained]
...
x1 ← x2 + 10
x4 ← x1 + 17
...
Deriving the Stall Signal
$$ASrc = (rs1_D = ws_E)\cdot we_E\cdot re1_D$$
Is this correct?
Deriving the Stall Signal
Load: lw x1, 10(x2)
x1 ← M[x2 + 10]
x4 ← x1 + 17
Bypass and Stall Signals
§ Split weE into two components: we-bypass and we-stall
we-bypassE = Case opcodeE
  ALU, ALUi, AUIPC → (ws != 0)
  ... → off
we-stallE = Case opcodeE
  LW → (ws != 0)
  ... → off
$$ASrc = (rs1_D = ws_E)\cdot \text{we-bypass}_E\cdot re1_D$$
$$stall = \big[(rs1_D = ws_E)\cdot \text{we-stall}_E + (rs1_D = ws_M)\cdot we_M + (rs1_D = ws_W)\cdot we_W\big]\cdot re1_D + \big[(rs2_D = ws_E)\cdot we_E + (rs2_D = ws_M)\cdot we_M + (rs2_D = ws_W)\cdot we_W\big]\cdot re2_D$$
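A sketch of the split write enable, assuming (as the equations do) that only the A operand (rs1) is bypassed from E while rs2 still uses the full weE. The opcode lists are copied from the slide; as written there, writers not listed in either class (LUI, JAL, JALR) are neither bypassed nor stalled on rs1, so a complete design would place each of them in one of the two classes. All names are illustrative.

```python
# Sketch of the split write enable for a pipeline that bypasses only the A
# operand (rs1) from E. Opcode lists are copied from the slide.
from typing import NamedTuple


class Instr(NamedTuple):
    opcode: str
    rd: int = 0
    rs1: int = 0
    rs2: int = 0


def we(x):          # full write enable (used for M, W, and the rs2 check in E)
    if x.opcode in ("ALU", "ALUi", "LW", "LUI"):
        return x.rd != 0
    return x.opcode in ("JAL", "JALR", "AUIPC")


def we_bypass(x):   # result ready at the end of E, so it can be bypassed
    return x.opcode in ("ALU", "ALUi", "AUIPC") and x.rd != 0


def we_stall(x):    # load data ready only after MA, so D must wait
    return x.opcode == "LW" and x.rd != 0


def re1(x):
    return x.opcode in ("ALU", "ALUi", "LW", "SW", "BR", "JALR")


def re2(x):
    return x.opcode in ("ALU", "SW", "BR")


def a_src(d, e):
    """True selects the bypassed E-stage result for operand A."""
    return e is not None and d.rs1 == e.rd and we_bypass(e) and re1(d)


def stall(d, e, m, w):
    def hit(rs, x, enable):
        return x is not None and rs == x.rd and enable(x)

    rs1_term = hit(d.rs1, e, we_stall) or hit(d.rs1, m, we) or hit(d.rs1, w, we)
    rs2_term = hit(d.rs2, e, we) or hit(d.rs2, m, we) or hit(d.rs2, w, we)  # no B bypass
    return (rs1_term and re1(d)) or (rs2_term and re2(d))


addi = Instr("ALUi", rd=1, rs1=2)         # x1 <- x2 + 10
lw = Instr("LW", rd=1, rs1=2)             # lw x1, 10(x2)
use_a = Instr("ALUi", rd=4, rs1=1)        # x4 <- x1 + 17 (reads x1 as operand A)
use_b = Instr("ALU", rd=5, rs1=3, rs2=1)  # reads x1 as operand B
print(a_src(use_a, addi), stall(use_a, addi, None, None))  # True False
print(a_src(use_a, lw), stall(use_a, lw, None, None))      # False True
print(stall(use_b, addi, None, None))                      # True
```

The third case shows why the B operand eventually needs its own bypass path as well.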
Fully Bypassed Datapath
[Figure: fully bypassed pipelined datapath (stages D, E, M, W) with bypass muxes ASrc and BSrc feeding both ALU operands, a PC path (PC for JAL, ...), and a stall signal]
Fully Bypassed Datapath
Is there still a need for the stall signal?
Fully Bypassed Datapath
$$stall = (rs1_D = ws_E)\cdot(opcode_E = LW)\cdot(ws_E \neq 0)\cdot re1_D + (rs2_D = ws_E)\cdot(opcode_E = LW)\cdot(ws_E \neq 0)\cdot re2_D$$
Only a load in E still forces a stall: its data is not available until the end of the memory stage, one cycle too late to bypass to the instruction in D.
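With full bypassing, the stall equation reduces to a load-use check against the E stage. The sketch below encodes that check and, under the added assumption that nothing else stalls, counts one bubble per load immediately followed by a consumer of its result (two instructions later, the load is already in M and its data can be bypassed). Names are illustrative.

```python
# Sketch of the fully bypassed stall condition above: only a load in E whose
# destination is a source of the instruction in D causes a stall.
from typing import NamedTuple


class Instr(NamedTuple):
    opcode: str
    rd: int = 0
    rs1: int = 0
    rs2: int = 0


def re1(x):
    return x.opcode in ("ALU", "ALUi", "LW", "SW", "BR", "JALR")


def re2(x):
    return x.opcode in ("ALU", "SW", "BR")


def stall(d, e):
    """d: instruction in decode; e: instruction in execute (None for a bubble)."""
    if e is None or e.opcode != "LW" or e.rd == 0:
        return False
    return (d.rs1 == e.rd and re1(d)) or (d.rs2 == e.rd and re2(d))


def load_use_bubbles(program):
    """Bubbles in straight-line code: one per load immediately followed by a
    consumer (assumes the load value is bypassed once the load reaches MA)."""
    return sum(stall(nxt, cur) for cur, nxt in zip(program, program[1:]))


lw = Instr("LW", rd=1, rs1=2)        # lw x1, 10(x2)
use = Instr("ALUi", rd=4, rs1=1)     # x4 <- x1 + 17
other = Instr("ALUi", rd=7, rs1=3)   # independent instruction
print(load_use_bubbles([lw, use]))         # 1
print(load_use_bubbles([lw, other, use]))  # 0
```

Moving one independent instruction between the load and its consumer removes the bubble, which is the same idea compilers exploit when scheduling around loads.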
Why a program may have CPI > 1
§ Why may an instruction not be dispatched every cycle (CPI > 1)?
  § Full bypassing may be too expensive to implement
    § Typically all frequently used paths are provided
    § Some infrequently used bypass paths may increase cycle time and counteract the benefit of reducing CPI
  § Loads have two-cycle latency
    § The instruction after a load cannot use the load result
    § The MIPS-I ISA defined load delay slots, a software-visible pipeline hazard (the compiler schedules an independent instruction or inserts a NOP to avoid the hazard); removed in MIPS-II
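A sketch of the simplest compiler response to a MIPS-I load delay slot: when the instruction right after a load reads the loaded register, insert a NOP. A real scheduler would first try to move an independent instruction into the slot; this sketch only shows the NOP fallback, and its (text, dest, sources) tuple format is a hypothetical representation.

```python
# Sketch: NOP fallback for MIPS-I load delay slots (illustrative pass).
# Instructions are (text, dest, sources) tuples; the format is hypothetical.


def fill_load_delay_slots(program):
    out = []
    for k, (text, dest, srcs) in enumerate(program):
        out.append(text)
        is_load = text.split()[0] == "lw"
        nxt = program[k + 1] if k + 1 < len(program) else None
        if is_load and nxt is not None and dest in nxt[2]:
            out.append("nop")  # the delay-slot instruction would otherwise see the old value
    return out


code = [
    ("lw $t0, 0($a0)", "$t0", ("$a0",)),
    ("addu $t1, $t0, $t2", "$t1", ("$t0", "$t2")),
]
print(fill_load_delay_slots(code))
# ['lw $t0, 0($a0)', 'nop', 'addu $t1, $t0, $t2']
```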