Chapter 4 – The Processor EGC442 Introduction to Computer Architecture
Division of Engineering Programs – SUNY New Paltz 1
The Processor
Chapter 4 (Part II)
Baback Izadi
Division of Engineering Programs
[email protected]
SUNY – New Paltz, Elect. & Comp. Eng.
Sequential Laundry
Sequential laundry takes 8 hours for 4 loads
[Figure: loads A–D in task order vs. time from 6 PM to 2 AM; each load takes four consecutive 30-minute steps, and a load starts only after the previous load has finished.]
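Pipelined Laundry
[Figure: loads A–D in task order vs. time from 6 PM to 9:30 PM; each load takes four 30-minute steps, and a new load starts every 30 minutes, so the loads overlap.]
Pipelined laundry: overlapping execution
  Parallelism improves performance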
Pipelining Analogy
Four loads: Speedup = 8/3.5 = 2.3
Non-stop (n loads, n large): Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipe stages
Pipeline rate is limited by the slowest pipeline stage
Unbalanced lengths of pipe stages reduce speedup
Time to “fill” the pipeline and time to “drain” it reduce speedup
Stall for dependencies
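The speedup numbers above follow from a simple timing model: n tasks through k equal stages of length t take n·k·t sequentially but (k + n - 1)·t when pipelined, because the first task needs k steps to fill the pipe and each later task completes one step after the one before it. A minimal Python sketch, assuming equal stage lengths and no stalls (the function names are illustrative, not from the slides):

```python
def sequential_time(n_tasks: int, n_stages: int, stage_time: float) -> float:
    """Each task runs through every stage before the next task starts."""
    return n_tasks * n_stages * stage_time


def pipelined_time(n_tasks: int, n_stages: int, stage_time: float) -> float:
    """The first task fills the pipe (n_stages steps); each later task
    finishes one step after the task before it."""
    return (n_stages + n_tasks - 1) * stage_time


def speedup(n_tasks: int, n_stages: int, stage_time: float) -> float:
    return (sequential_time(n_tasks, n_stages, stage_time)
            / pipelined_time(n_tasks, n_stages, stage_time))


if __name__ == "__main__":
    # Laundry: 4 stages of 0.5 hour each.
    print(sequential_time(4, 4, 0.5))    # 8.0 (hours)
    print(pipelined_time(4, 4, 0.5))     # 3.5 (hours)
    print(round(speedup(4, 4, 0.5), 1))  # 2.3
    # Non-stop: 2n / (0.5n + 1.5) approaches 4, the number of stages.
    print(round(speedup(1_000_000, 4, 0.5), 3))
```

With n loads and half-hour stages, sequential_time is 2n and pipelined_time is 0.5n + 1.5, which is exactly the non-stop formula on the slide.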
Single Cycle Implementation
Calculate cycle time assuming negligible delays except: memory (200ps), ALU and adders (200ps), register file access (100ps)
[Figure: single-cycle datapath with control signals PCSrc, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, and RegWrite: PC, instruction memory, register file, sign extend, shift left 2, adders, ALU with ALU control, data memory, and multiplexers.]
Instruction  Instr. Memory  Register Read  ALU Op.  Data Memory  Reg. Write  Total
R-format     200ps          100ps          200ps    0            100ps       600ps
lw           200ps          100ps          200ps    200ps        100ps       800ps
sw           200ps          100ps          200ps    200ps        –           700ps
beq          200ps          100ps          200ps    –            –           500ps
Single-Cycle vs. Pipelined Performance
[Figure: the stage-latency table repeated, with execution timelines for a single-cycle and a pipelined implementation.]
Single-cycle: Tc = 800ps (set by the slowest instruction, lw)
Pipelined: Tc = 200ps (set by the slowest stage)
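Both clock periods can be recomputed from the table: the single-cycle clock must fit the slowest whole instruction, while the pipelined clock must fit the slowest single stage. A sketch in Python, assuming the table's latencies and ignoring pipeline-register overhead (a simplification; the dictionary layout is just one convenient encoding):

```python
# Stage latencies in picoseconds from the table; unused stages are 0.
STAGE_LATENCY_PS = {
    "R-format": {"IF": 200, "RegRead": 100, "ALU": 200, "Mem": 0,   "RegWrite": 100},
    "lw":       {"IF": 200, "RegRead": 100, "ALU": 200, "Mem": 200, "RegWrite": 100},
    "sw":       {"IF": 200, "RegRead": 100, "ALU": 200, "Mem": 200, "RegWrite": 0},
    "beq":      {"IF": 200, "RegRead": 100, "ALU": 200, "Mem": 0,   "RegWrite": 0},
}


def total_latency(instr: str) -> int:
    """Time for one instruction to pass through all of its stages."""
    return sum(STAGE_LATENCY_PS[instr].values())


def single_cycle_tc() -> int:
    """One clock cycle must be long enough for the slowest instruction."""
    return max(total_latency(i) for i in STAGE_LATENCY_PS)


def pipelined_tc() -> int:
    """One clock cycle must be long enough for the slowest stage."""
    return max(lat for stages in STAGE_LATENCY_PS.values() for lat in stages.values())


if __name__ == "__main__":
    for name in STAGE_LATENCY_PS:
        print(name, total_latency(name))        # 600, 800, 700, 500 ps
    print("single-cycle Tc:", single_cycle_tc())  # 800
    print("pipelined Tc:", pipelined_tc())        # 200
```

The 100ps register stages still occupy a full 200ps pipelined cycle; this is the loss from unbalanced stage lengths noted in the pipelining analogy.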
MIPS Pipeline
Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register
Pipelining
What makes it easy in MIPS
  All instructions are the same length
  Just a few instruction formats
  Memory operands appear only in loads and stores
What makes it hard?
  Structural hazards: suppose we had only one memory
  Control hazards: need to worry about branch instructions
  Data hazards: an instruction depends on a previous instruction
We’ll build a simple pipeline and look at these issues
What makes it really hard for modern processors
  Exception handling
  Trying to improve performance with out-of-order execution
Pipeline Speedup
The Five Stages of Load
What do we need to add to actually split the datapath into stages?
Ifetch: Instruction Fetch
  Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file
LW: Ifetch, Reg/Dec, Exec, Mem, Wr
Basic Idea
[Figure: the datapath divided into five stages. IF: Instruction fetch; ID: Instruction decode/register file read; EX: Execute/address calculation; MEM: Memory access; WB: Write back.]
Pipelining and ISA Design
MIPS ISA designed for pipelining
  All instructions are 32 bits
    Easier to fetch and decode in one cycle
    c.f. x86: 1- to 17-byte instructions
  Few and regular instruction formats
    Can decode and read registers in one step
  Load/store addressing
    Can calculate address in 3rd stage, access memory in 4th stage
  Alignment of memory operands
    Memory access takes only one cycle
Pipeline registers
  Need registers between stages
    To hold information produced in previous cycle
• Can you find a problem?
• What instructions can we execute to manifest the problem?
Pipeline Operation
Cycle-by-cycle flow of instructions through the pipelined datapath
  “Single-clock-cycle” pipeline diagram
    Shows pipeline usage in a single cycle
    Highlight resources used
  c.f. “multi-clock-cycle” diagram
    Graph of operation over time
We’ll look at “single-clock-cycle” diagrams for load & store
IF for Load, Store, …
ID for Load, Store, …
EX for Load
MEM for Load
WB for Load
Wrong register number: the write register number now in IF/ID belongs to a later instruction, not to the load
Corrected Datapath for Load
EX for Store
MEM for Store
WB for Store
Graphically Representing Pipelines
Can help with answering questions like:
  How many cycles does it take to execute this code?
  What is the ALU doing during cycle 4?
  Use this representation to help understand datapaths
[Figure: multi-clock-cycle diagram, time in clock cycles CC 1 to CC 6 vs. program execution order: lw $10, 20($1) followed by sub $11, $2, $3, each passing through IM, Reg, ALU, DM, Reg.]
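Both questions on this slide can be answered mechanically for an ideal pipeline. A small Python sketch, assuming one instruction enters per cycle with no stalls; it labels the stages IF, ID, EX, MEM, WB, which correspond to the IM, Reg, ALU, DM, Reg boxes in the diagram (the helper names are hypothetical):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_at(instr_index: int, cycle: int):
    """Stage occupied by instruction instr_index (0-based) in clock cycle
    `cycle` (1-based), or None if it is not in the pipeline then."""
    step = cycle - 1 - instr_index
    return STAGES[step] if 0 <= step < len(STAGES) else None


def total_cycles(n_instructions: int) -> int:
    """Cycles until the last instruction leaves WB."""
    return len(STAGES) + n_instructions - 1


def diagram(program):
    """Text version of a multi-clock-cycle pipeline diagram."""
    n = total_cycles(len(program))
    header = " " * 18 + "".join(f"CC{c:<4}" for c in range(1, n + 1))
    rows = [header]
    for i, instr in enumerate(program):
        cells = "".join(f"{stage_at(i, c) or '':<6}" for c in range(1, n + 1))
        rows.append(f"{instr:<18}{cells}")
    return "\n".join(rows)


if __name__ == "__main__":
    program = ["lw $10, 20($1)", "sub $11, $2, $3"]
    print(diagram(program))
    print(total_cycles(len(program)))      # 6
    print(stage_at(0, 4), stage_at(1, 4))  # MEM EX
```

So the two instructions finish after 6 cycles, and during CC 4 the ALU is working on sub while lw is accessing data memory.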
Why Pipeline?
[Figure: Inst 0 to Inst 4 in instruction order vs. time in clock cycles; each passes through Im, Reg, ALU, Dm, Reg, starting one cycle after the previous instruction.]
Multi-Cycle Pipeline Diagram
Form showing resource usage
Single-Cycle Pipeline Diagram
State of pipeline in a given cycle
Pipelined Control (Simplified)
Pipeline Control
We have 5 stages. What needs to be controlled in each stage?
  Instruction Fetch and PC Increment
  Instruction Decode / Register Fetch
  Execution
  Memory Stage
  Write Back
How would control be handled in an automobile plant?
  A fancy control center telling everyone what to do?
  Should we use a finite state machine?
Pipeline Control
  Control signals derived from instruction
    As in single-cycle implementation
  Pass control signals along just like the data

Execution/address calculation stage control lines: RegDst, ALUOp1, ALUOp0, ALUSrc
Memory access stage control lines: Branch, MemRead, MemWrite
Write-back stage control lines: RegWrite, MemtoReg

Instruction  RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format     1       1       0       0       0       0        0         1         0
lw           0       0       0       1       0       1        0         1         1
sw           X       0       0       1       0       0        1         0         X
beq          X       0       1       0       1       0        0         0         X

[Figure: Control decodes the instruction in IF/ID into EX, M, and WB signal groups; all three enter ID/EX, M and WB continue into EX/MEM, and only WB continues into MEM/WB.]
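The control table can be read as one control word per instruction class, split into the three groups that travel with the instruction. A Python sketch of that bookkeeping, assuming the table values (None stands for X, don't care); `decode` and `advance` are hypothetical helper names, not hardware blocks from the slides:

```python
from typing import Dict, Optional

Bits = Dict[str, Optional[int]]

# One row per instruction class, copied from the table (None = X).
CONTROL: Dict[str, Bits] = {
    "R-format": dict(RegDst=1, ALUOp1=1, ALUOp0=0, ALUSrc=0, Branch=0,
                     MemRead=0, MemWrite=0, RegWrite=1, MemtoReg=0),
    "lw": dict(RegDst=0, ALUOp1=0, ALUOp0=0, ALUSrc=1, Branch=0,
               MemRead=1, MemWrite=0, RegWrite=1, MemtoReg=1),
    "sw": dict(RegDst=None, ALUOp1=0, ALUOp0=0, ALUSrc=1, Branch=0,
               MemRead=0, MemWrite=1, RegWrite=0, MemtoReg=None),
    "beq": dict(RegDst=None, ALUOp1=0, ALUOp0=1, ALUSrc=0, Branch=1,
                MemRead=0, MemWrite=0, RegWrite=0, MemtoReg=None),
}

# Which stage consumes each signal.
GROUPS = {
    "EX": ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc"),
    "M": ("Branch", "MemRead", "MemWrite"),
    "WB": ("RegWrite", "MemtoReg"),
}


def decode(instr_class: str) -> Dict[str, Bits]:
    """ID stage: generate all control bits and group them for ID/EX."""
    bits = CONTROL[instr_class]
    return {g: {name: bits[name] for name in names} for g, names in GROUPS.items()}


def advance(id_ex: Dict[str, Bits]):
    """EX uses the EX group, so EX/MEM keeps only M and WB;
    MEM uses the M group, so MEM/WB keeps only WB."""
    ex_mem = {"M": id_ex["M"], "WB": id_ex["WB"]}
    mem_wb = {"WB": ex_mem["WB"]}
    return ex_mem, mem_wb


if __name__ == "__main__":
    id_ex = decode("lw")
    ex_mem, mem_wb = advance(id_ex)
    print(id_ex["EX"])  # {'RegDst': 0, 'ALUOp1': 0, 'ALUOp0': 0, 'ALUSrc': 1}
    print(mem_wb)       # {'WB': {'RegWrite': 1, 'MemtoReg': 1}}
```

Each pipeline register carries only the groups still needed downstream, which is why ID/EX holds three groups, EX/MEM two, and MEM/WB one.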
Pipelined Control
Designing a Pipelined Processor
Go back and examine your datapath and control diagram
Associate resources with states
Ensure that flows do not conflict, or figure out how to resolve them
Assert control in appropriate stage
Pipelining Troubles? Pipeline Hazards
Situations that prevent starting the next instruction in the next cycle
  Structural hazards: attempt to use the same resource in two different ways at the same time.
  Data hazards: attempt to use an item before it is ready.
    Instruction depends on the result of a prior instruction still in the pipeline.
  Control hazards: attempt to make a decision before the condition is evaluated.
    Branch instructions
Can always resolve hazards by waiting.
  Pipeline control must detect the hazard.
  Take action (or delay action) to resolve hazards.
Structural Hazards
Conflict for use of a resource
In MIPS pipeline with a single memory
  Load/store requires data access
  Instruction fetch would have to stall for that cycle
    Would cause a pipeline “bubble”
Hence, pipelined datapaths require separate instruction/data memories
  Or separate instruction/data caches
Data Hazards
Problem with starting next instruction before first is finished
  Dependencies that “go backward in time” are data hazards
[Figure: multi-clock-cycle diagram, CC 1 to CC 9, program execution order:
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Value of register $2: 10 in CC 1 to CC 4, 10/–20 in CC 5, –20 in CC 6 to CC 9.]
Data Hazard Solution
An instruction depends on completion of data access by a previous instruction:
add $s0, $t0, $t1
sub $t2, $s0, $t3
Stall by inserting two nop bubbles between add and sub
Software Solution
Have compiler guarantee no hazards
  Where should compiler insert “nop” instructions?
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Problem: this happens too often to rely on the compiler
  It really slows us down!
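One way to answer the question is a simple scheduling pass. The Python sketch below assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so a result can be read by the third instruction after its producer; the `Instr` tuple and `insert_nops` are hypothetical names for illustration:

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]     # register written (None for sw, nop)
    srcs: Tuple[str, ...]   # registers read


NOP = Instr("nop", None, ())
# Producer at position i: a consumer at i + 3 or later reads the new value.
MIN_DISTANCE = 3


def insert_nops(program):
    out = []
    for instr in program:
        needed = 0
        # Look back at the last MIN_DISTANCE - 1 emitted instructions.
        recent = out[-(MIN_DISTANCE - 1):]
        for distance, prev in enumerate(reversed(recent), start=1):
            if prev.dest not in (None, "$0") and prev.dest in instr.srcs:
                needed = max(needed, MIN_DISTANCE - distance)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


if __name__ == "__main__":
    program = [
        Instr("sub $2, $1, $3", "$2", ("$1", "$3")),
        Instr("and $12, $2, $5", "$12", ("$2", "$5")),
        Instr("or $13, $6, $2", "$13", ("$6", "$2")),
        Instr("add $14, $2, $2", "$14", ("$2", "$2")),
        Instr("sw $15, 100($2)", None, ("$15", "$2")),
    ]
    for instr in insert_nops(program):
        print(instr.text)
```

For this sequence two nops after sub are enough: once they push and down, or, add, and sw are all three or more slots behind sub. Every such nop is a wasted cycle, which is the slowdown the slide complains about.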
Data Hazards: Software Solution
[Figure: the multi-clock-cycle diagram for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2), repeated. Value of register $2: 10 in CC 1 to CC 4, 10/–20 in CC 5, –20 in CC 6 to CC 9.]
Code Scheduling to Avoid Stalls
C code for A = B + E; C = B + F;
lw $t1, 0($t0)
lw $t2, 4($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
stall (before add $t3, $t1, $t2)
stall (before add $t5, $t1, $t4)
11 cycles, not counting the dependencies
[Figure: multi-clock-cycle diagram of the seven instructions above.]
Code Scheduling to Avoid Stalls
Reorder code to avoid use of load result in the next instruction
C code for A = B + E; C = B + F;
Original order, with nops for the two stalls (15 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
nop
nop
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
nop
nop
add $t5, $t1, $t4
sw $t5, 16($t0)
Reordered (12 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
nop
lw $t4, 8($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
Forwarding (aka Bypassing)
Use result when it is computed
  Don’t wait for it to be stored in a register
  Requires extra connections in the datapath
Data Hazards in ALU Instructions
Consider this sequence:
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
We can resolve hazards with forwarding
  How do we detect when to forward?
§4.7 Data Hazards: Forwarding vs. Stalling
Data Hazard Solution: Forwarding
Use temporary results (ALU forwarding); don’t wait for them to be written
  Also, write register file during 1st half of clock and read during 2nd half
[Figure: multi-clock-cycle diagram, CC 1 to CC 9, for sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2).
Value of register $2: 10 in CC 1 to CC 4, 10/–20 in CC 5, –20 in CC 6 to CC 9.
Value of EX/MEM: –20 in CC 4, X otherwise.
Value of MEM/WB: –20 in CC 5, X otherwise.
Annotation on one use of $2: What if this $2 were $13?]
Data Hazard Solution: Forwarding
[Figure: the same sequence (sub, and, or, add, sw) with results forwarded from the pipeline registers.
Value of register $2: 10 in CC 1 to CC 4, 10/–20 in CC 5, –20 in CC 6 to CC 9.
Value of EX/MEM: –20 in CC 4, X otherwise.
Value of MEM/WB: –20 in CC 5, X otherwise.
Annotation on one use of $2: What if this $2 were $13?]
Dependencies & Forwarding
Forwarding Paths
Detecting the Need to Forward
Pass register numbers along pipeline
  e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register
ALU operand register numbers in EX stage are given by
  ID/EX.RegisterRs, ID/EX.RegisterRt
Data hazards when
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
1a, 1b: Fwd from EX/MEM pipeline reg
2a, 2b: Fwd from MEM/WB pipeline reg
Detecting the Need to Forward
But only if forwarding instruction will write to a register!
  EX/MEM.RegWrite, MEM/WB.RegWrite
And only if Rd for that instruction is not $zero
  EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0
Datapath with Forwarding
ALU input mux select (ForwardA/ForwardB):
  00 Register file
  01 Mem. or earlier ALU
  10 Prior ALU
Forwarding Conditions
EX hazard
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 10
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 10
MEM hazard
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01
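The four conditions translate almost directly into code. A Python sketch, assuming a simplified model in which each pipeline register exposes only the fields named in the conditions (the `PipeReg` type and the function names are hypothetical); it only reports whether each hazard is present and does not yet decide between them when both apply:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeReg:
    """Fields used by the forwarding conditions (simplified, hypothetical)."""
    reg_write: bool = False  # RegWrite control bit carried in this register
    rd: int = 0              # destination register number
    rs: int = 0              # source register numbers (used for ID/EX)
    rt: int = 0


def operand(id_ex: PipeReg, which: str) -> int:
    return id_ex.rs if which == "rs" else id_ex.rt


def ex_hazard(ex_mem: PipeReg, id_ex: PipeReg, which: str) -> bool:
    """Condition for Forward = 10 (take the prior ALU result from EX/MEM)."""
    return ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == operand(id_ex, which)


def mem_hazard(mem_wb: PipeReg, id_ex: PipeReg, which: str) -> bool:
    """Condition for Forward = 01 (take the value from MEM/WB)."""
    return mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == operand(id_ex, which)


if __name__ == "__main__":
    sub = PipeReg(reg_write=True, rd=2)                # sub $2, $1, $3
    and_ = PipeReg(reg_write=True, rd=12, rs=2, rt=5)  # and $12, $2, $5
    or_ = PipeReg(reg_write=True, rd=13, rs=6, rt=2)   # or $13, $6, $2
    # CC 4: sub in EX/MEM, and in ID/EX -> ForwardA = 10
    print(ex_hazard(sub, and_, "rs"))   # True
    # CC 5: sub in MEM/WB, and in EX/MEM, or in ID/EX -> ForwardB = 01
    print(mem_hazard(sub, or_, "rt"))   # True
    print(ex_hazard(and_, or_, "rt"))   # False
```

The RegWrite and Rd ≠ 0 checks keep sw, beq, and writes to $zero from triggering a forward.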
Hazard Conditions
Steer the result from previous instruction to the ALU
EX hazard
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 10
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 10
[Figure: pipelined datapath with a forwarding unit; it compares EX/MEM.RegisterRd and MEM/WB.RegisterRd with the Rs and Rt numbers carried in ID/EX and drives the two ALU input muxes (00 Register file, 01 Mem. or earlier ALU, 10 Prior ALU).]
Hazard Conditions
Steer the result from previous instruction to the ALU
MEM hazard
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01
[Figure: the same pipelined datapath with forwarding unit, highlighting the path from MEM/WB to the ALU input muxes.]
Double Data Hazard
Consider the sequence:
add $1, $1, $2
add $1, $1, $3
add $1, $1, $4
Both hazards occur
  Want to use the most recent
Revise MEM hazard condition
  Only fwd if EX hazard condition isn’t true
[Figure: multi-clock-cycle diagram of the three add instructions.]
Revised Forwarding Condition
MEM hazard
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
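Checking the EX condition first gives the priority the revised condition asks for. A Python sketch of a forwarding unit under that rule, assuming plain integers for register numbers and the mux encodings 00/01/10 from the datapath-with-forwarding slide (`forward_select` and `forwarding_unit` are hypothetical names):

```python
def forward_select(src: int,
                   ex_mem_reg_write: bool, ex_mem_rd: int,
                   mem_wb_reg_write: bool, mem_wb_rd: int) -> str:
    """Mux select for one ALU operand whose register number is `src`."""
    ex_hit = ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src
    mem_hit = mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src
    if ex_hit:
        return "10"  # prior ALU result in EX/MEM: the most recent value
    if mem_hit:
        return "01"  # MEM/WB, only when there is no EX hazard on this operand
    return "00"      # no hazard: use the register file


def forwarding_unit(id_ex_rs: int, id_ex_rt: int,
                    ex_mem_reg_write: bool, ex_mem_rd: int,
                    mem_wb_reg_write: bool, mem_wb_rd: int):
    """Return (ForwardA, ForwardB)."""
    args = (ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd)
    return forward_select(id_ex_rs, *args), forward_select(id_ex_rt, *args)


if __name__ == "__main__":
    # Third add $1,$1,$4 in EX: second add (rd = $1) in EX/MEM,
    # first add (rd = $1) in MEM/WB. Both match rs = $1.
    print(forwarding_unit(1, 4, True, 1, True, 1))  # ('10', '00')
```

If the MEM check were allowed to overwrite the EX result, ForwardA would come out as 01 and the third add would use the older sum from the first add instead of the one from the second.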
Can’t always forward
lw can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register.
Thus, we need a hazard detection unit to “stall” the instruction that uses the load result
[Figure: multi-clock-cycle diagram, CC 1 to CC 9, program execution order:
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7]
Need to stall for one cycle
Load-Use Data Hazard
Can’t always avoid stalls by forwarding
  If value not computed when needed
  Can’t forward backward in time!
Load-Use Hazard Detection
Check when using instruction is decoded in ID stage
ALU operand register numbers in ID stage are given by
  IF/ID.RegisterRs, IF/ID.RegisterRt
Load-use hazard when
  ID/EX.MemRead and
  ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))
If detected, stall and insert bubble
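The load-use check and the stall it triggers can be sketched as follows, assuming register numbers are plain integers. The returned fields mirror the stall actions described on the next slide (hold PC, hold IF/ID, zero the ID/EX control values), but the function and field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardControl:
    pc_write: bool     # False: keep the PC (following instruction is fetched again)
    if_id_write: bool  # False: keep IF/ID (using instruction is decoded again)
    bubble: bool       # True: force ID/EX control values to 0 (nop)


def load_use_hazard(id_ex_mem_read: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """ID/EX.MemRead and ID/EX.RegisterRt matches a source in IF/ID."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


def hazard_detection_unit(id_ex_mem_read: bool, id_ex_rt: int,
                          if_id_rs: int, if_id_rt: int) -> HazardControl:
    if load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
        return HazardControl(pc_write=False, if_id_write=False, bubble=True)
    return HazardControl(pc_write=True, if_id_write=True, bubble=False)


if __name__ == "__main__":
    # lw $2, 20($1) in ID/EX (rt = 2); and $4, $2, $5 in IF/ID (rs = 2, rt = 5)
    print(hazard_detection_unit(True, 2, 2, 5))
    # HazardControl(pc_write=False, if_id_write=False, bubble=True)
```

After the one-cycle bubble, lw reaches MEM/WB while and is in EX, so the ordinary MEM hazard forwarding path supplies $2.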
How to Stall the Pipeline
Force control values in ID/EX register to 0
  EX, MEM and WB do nop (no-operation)
Prevent update of PC and IF/ID register
  Using instruction is decoded again
  Following instruction is fetched again
1-cycle stall allows MEM to read data for lw
  Can subsequently forward to EX stage
Stall/Bubble in the Pipeline
Stall inserted here
Stall/Bubble in the Pipeline
Or, more accurately…
Datapath with Hazard Detection
Stall by letting an instruction that won’t write anything go forward
  Controls writing of the PC and IF/ID plus MUX
Code Scheduling to Avoid Stalls
Revisiting reordering code to avoid use of load result in the next instruction
C code for A = B + E; C = B + F;
Original order, with one stall (shown as nop) before each add that uses the preceding load (13 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
nop
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
nop
add $t5, $t1, $t4
sw $t5, 16($t0)
Reordered (11 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
lw $t4, 8($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
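The 13- and 11-cycle counts can be checked with a short calculation. The Python sketch assumes the five-stage pipeline with full forwarding, so the only stall is one cycle when an instruction uses the result of a lw immediately before it; `Instr` and `count_cycles` are hypothetical names:

```python
from typing import List, NamedTuple, Optional, Tuple

PIPELINE_DEPTH = 5


class Instr(NamedTuple):
    op: str
    dest: Optional[str]     # register written (None for sw)
    srcs: Tuple[str, ...]   # registers read


def count_cycles(program: List[Instr]) -> Tuple[int, int]:
    """Return (total cycles, load-use stalls)."""
    stalls = sum(
        1
        for prev, cur in zip(program, program[1:])
        if prev.op == "lw" and prev.dest in cur.srcs
    )
    return PIPELINE_DEPTH + (len(program) - 1) + stalls, stalls


lw_t1 = Instr("lw", "$t1", ("$t0",))
lw_t2 = Instr("lw", "$t2", ("$t0",))
lw_t4 = Instr("lw", "$t4", ("$t0",))
add_t3 = Instr("add", "$t3", ("$t1", "$t2"))
sw_t3 = Instr("sw", None, ("$t3", "$t0"))
add_t5 = Instr("add", "$t5", ("$t1", "$t4"))
sw_t5 = Instr("sw", None, ("$t5", "$t0"))

original = [lw_t1, lw_t2, add_t3, sw_t3, lw_t4, add_t5, sw_t5]
reordered = [lw_t1, lw_t2, lw_t4, add_t3, sw_t3, add_t5, sw_t5]

if __name__ == "__main__":
    print(count_cycles(original))   # (13, 2)
    print(count_cycles(reordered))  # (11, 0)
```

Moving lw $t4 up puts an independent instruction between each load and its first use, so both load-use stalls disappear without changing the result.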
Control Hazards
Branch determines flow of control
  Fetching next instruction depends on branch outcome
  Pipeline can’t always fetch correct instruction
    Still working on ID stage of branch
In MIPS pipeline
  Need to compare registers and compute target early in the pipeline
  Add hardware to do it in ID stage
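Branch Hazards
When we decide to branch, other instructions may be in the pipeline!
If branch outcome determined in MEM
§4.8 Control Hazards
[Figure: pipeline diagram for a branch whose outcome is determined in MEM, with the new PC taking effect afterward; annotation on the instructions fetched after the branch: “Flush these instructions (Set control values to 0)”.]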
Stall on Branch
  Wait until branch outcome determined before fetching next instruction
Branch Prediction
Longer pipelines can’t readily determine branch outcome early
  Stall penalty becomes unacceptable
Predict outcome of branch
  Only stall if prediction is wrong
In MIPS pipeline
  Can predict branches not taken
  Fetch instruction after branch, with no delay
  Need to add hardware for flushing instructions if we are wrong
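The cost of stalling versus predicting can be estimated with a simple count. The Python sketch assumes one instruction is fetched per cycle and that each instruction fetched before the branch outcome is known costs one cycle if it is wasted; resolving the branch in MEM (as in the branch-hazard slide) or in ID (as in the reduced-delay datapath) only changes how many such instructions there are. Function names are hypothetical:

```python
STAGE_NUMBER = {"IF": 1, "ID": 2, "EX": 3, "MEM": 4, "WB": 5}


def branch_penalty(resolve_stage: str) -> int:
    """Instructions fetched after the branch before its outcome is known."""
    return STAGE_NUMBER[resolve_stage] - 1


def extra_cycles(outcomes, resolve_stage: str, policy: str) -> int:
    """outcomes: one bool per executed branch (True = taken)."""
    penalty = branch_penalty(resolve_stage)
    if policy == "stall":
        # Always wait for the outcome, taken or not.
        return penalty * len(outcomes)
    if policy == "predict_not_taken":
        # Fall-through instructions are kept when the branch is not taken;
        # they are flushed (control values set to 0) only when it is taken.
        return penalty * sum(outcomes)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    outcomes = [True, False, False, True]
    for stage in ("MEM", "ID"):
        print(stage,
              extra_cycles(outcomes, stage, "stall"),
              extra_cycles(outcomes, stage, "predict_not_taken"))
    # MEM 12 6
    # ID 4 2
```

Predicting not taken pays only on taken branches, and moving the decision into ID shrinks the cost of each wrong prediction from three cycles to one.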
MIPS with Predict Not Taken
Prediction correct
Prediction incorrect
Our Original Datapath
[Figure: pipelined datapath with control: PC and instruction memory, IF/ID, register file, sign extend, Control producing the EX, M, and WB groups, ID/EX, ALU with ALU control, branch target adder with shift left 2, EX/MEM, data memory, MEM/WB, and the PCSrc, ALUSrc, RegDst, and MemtoReg multiplexers.]
Reduce Branch Delay
[Figure: pipelined datapath with hazard detection unit, forwarding unit, and IF.Flush; the register comparator (=) and the branch target adder (shift left 2) sit in the ID stage.]
Reducing Branch Delay
Move hardware to determine outcome to ID stage
  Target address adder
  Register comparator
Example: branch taken
36: sub $10, $4, $8
40: beq $1, $3, 7
44: and $12, $2, $5
48: or $13, $2, $6
52: add $14, $4, $2
56: slt $15, $6, $7
...
72: lw $4, 50($7)
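The branch target in this example can be checked by hand: beq’s 16-bit offset counts words relative to PC + 4, so the target is 44 + 7 × 4 = 72, the address of the lw. A Python sketch of the ID-stage target adder (sign extend, shift left 2, add), assuming 32-bit addresses; `sign_extend16` and `branch_target` are hypothetical names:

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a two's-complement number."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(branch_pc: int, imm16: int) -> int:
    """PC + 4 + (sign-extended offset << 2), wrapped to 32 bits."""
    return (branch_pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(branch_target(40, 7))       # 72: beq at 40 jumps to lw at 72
    print(branch_target(40, 0xFFFF))  # 40: offset -1 branches to itself
```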
Example: Branch Taken
Example: Branch Taken
Pipeline Summary
Pipelining improves performance by increasing instruction throughput
  Executes multiple instructions in parallel
  Each instruction has the same latency
Subject to hazards
  Structure, data, control
Instruction set design affects complexity of pipeline implementation
The BIG Picture
Stalls and Performance
Stalls reduce performance
  But are required to get correct results
Compiler can arrange code to avoid hazards and stalls
  Requires knowledge of the pipeline structure
The BIG Picture