Post on 07-Jan-2016
Lecture 9. MIPS Processor Design – Pipelined Processor Design #2
Prof. Taeweon Suh, Computer Science Education
Korea University
2010 R&E Computer System Education & Research
Korea Univ
Pipelined Datapath
[Figure: pipelined datapath with PC, instruction memory, register file, sign extend, ALU, data memory, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers]
Example for lw instruction: Instruction Fetch (IF)
[Figure: pipelined datapath, stage shown: Instruction fetch]
Example for lw instruction: Instruction Decode (ID)
[Figure: pipelined datapath, stage shown: Instruction decode]
Example for lw instruction: Execution (EX)
[Figure: pipelined datapath, stage shown: Execution]
Example for lw instruction: Memory (MEM)
[Figure: pipelined datapath, stage shown: Memory]
Example for lw instruction: Writeback (WB)
[Figure: pipelined datapath, stage shown: Writeback]
Example for sw instruction: Memory (MEM)
[Figure: pipelined datapath, stage shown: Memory]
Example for sw instruction: Writeback (WB): do nothing
[Figure: pipelined datapath, stage shown: Writeback]
Corrected Datapath (for lw)
[Figure: corrected pipelined datapath for lw]
Pipelining Example
[Figure: pipelined datapath with five instructions in flight, labeled across the stages as follows]
add $14, $5, $6
lw $13, 24($1)
add $12, $3, $4
sub $11, $2, $3
lw $10, 20($1)
Pipeline Control
[Figure: pipelined datapath with the control lines labeled: RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, RegWrite, and PCSrc]
Note that in this implementation, the branch instruction decides whether to branch in the MEM stage.
Pipeline Control
• We have 5 stages: IF, ID, EX, MEM, WB
• What needs to be controlled in each stage?
  • Instruction fetch and PC increment
  • Instruction decode / operand fetch
  • Execution stage
    • RegDst
    • ALUOp[1:0]
    • ALUSrc
  • Memory stage
    • Branch
    • MemRead
    • MemWrite
  • Writeback
    • MemtoReg
    • RegWrite (note that this signal is in the ID stage)
Pipeline Control
• Extend pipeline registers to include control information (created in ID)
• Pass control signals along just like the data
Control lines, grouped by the stage that uses them:

              Execution/address calculation    Memory access stage         Write-back stage
              stage control lines              control lines               control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc   Branch  MemRead  MemWrite   RegWrite  MemtoReg
R-format      1       1       0       0        0       0        0          1         0
lw            0       0       0       1        0       1        0          1         1
sw            X       0       0       1        0       0        1          0         X
beq           X       0       1       0        1       0        0          0         X

[Figure: the Control unit decodes the instruction in ID and writes the EX, M, and WB fields into ID/EX; M and WB continue into EX/MEM, and WB continues into MEM/WB]
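The table above can be written down as a small lookup, which makes it easier to see which control bits each pipeline register has to carry. This is only a sketch: the names `Control`, `CONTROL_TABLE`, and `carried_fields` are hypothetical, and the bit strings simply follow the column order of the table (EX: RegDst, ALUOp1, ALUOp0, ALUSrc; M: Branch, MemRead, MemWrite; WB: RegWrite, MemtoReg).

```python
from typing import Dict, NamedTuple


class Control(NamedTuple):
    """Control bits for one instruction class (names are illustrative)."""
    ex: str  # RegDst, ALUOp1, ALUOp0, ALUSrc
    m: str   # Branch, MemRead, MemWrite
    wb: str  # RegWrite, MemtoReg


# Values copied from the slide's table; "X" means don't care.
CONTROL_TABLE: Dict[str, Control] = {
    "R-format": Control(ex="1100", m="000", wb="10"),
    "lw":       Control(ex="0001", m="010", wb="11"),
    "sw":       Control(ex="X001", m="001", wb="0X"),
    "beq":      Control(ex="X010", m="100", wb="0X"),
}


def carried_fields(instr_class: str) -> Dict[str, Dict[str, str]]:
    """Return the control fields held in each pipeline register.

    Each stage consumes its own field, so later registers carry fewer fields.
    """
    c = CONTROL_TABLE[instr_class]
    return {
        "ID/EX":  {"EX": c.ex, "M": c.m, "WB": c.wb},
        "EX/MEM": {"M": c.m, "WB": c.wb},
        "MEM/WB": {"WB": c.wb},
    }


if __name__ == "__main__":
    for reg, fields in carried_fields("lw").items():
        print(reg, fields)
```

For lw this gives WB = 11, M = 010, EX = 0001 in ID/EX, and only the WB field is left by the time the instruction reaches MEM/WB.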
Datapath with Control
[Figure: pipelined datapath with the Control unit; the EX, M, and WB control fields are carried in the ID/EX, EX/MEM, and MEM/WB pipeline registers]
Datapath with Control
[Figure: pipelined datapath with control]
IF: lw $10, 9($1)
Datapath with Control
[Figure: pipelined datapath with control]
IF: sub $11, $2, $3
ID: lw $10, 9($1)
Control generated in ID for “lw”: WB = 11, M = 010, EX = 0001
Datapath with Control
[Figure: pipelined datapath with control]
IF: and $12, $4, $5
ID: sub $11, $2, $3
EX: lw $10, 9($1)
Control generated in ID for “sub”: WB = 10, M = 000, EX = 1100
Datapath with Control
[Figure: pipelined datapath with control]
IF: or $13, $6, $7
ID: and $12, $4, $5
EX: sub $11, $2, $3
MEM: lw $10, 9($1)
Control generated in ID for “and”: WB = 10, M = 000, EX = 1100
Datapath with Control
[Figure: pipelined datapath with control]
IF: add $14, $8, $9
ID: or $13, $6, $7
EX: and $12, $4, $5
MEM: sub $11, ..
WB: lw $10, 9($1)
Control generated in ID for “or”: WB = 10, M = 000, EX = 1100
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: add $14, $8, $9
EX: or $13, $6, $7
MEM: and $12…
WB: sub $11, ..
Control generated in ID for “add”: WB = 10, M = 000, EX = 1100
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: add $14, $8, $9
MEM: or $13, ..
WB: and $12…
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: xxxx
MEM: add $14, ..
WB: or $13…
Datapath with Control
[Figure: pipelined datapath with control]
IF: xxxx
ID: xxxx
EX: xxxx
MEM: xxxx
WB: add $14..
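The cycle-by-cycle walkthrough on the preceding slides can be reproduced with a few lines of code. The sketch below assumes an ideal pipeline with no stalls or flushes, in which instruction i enters IF in cycle i + 1; the helper names `occupancy` and `total_cycles` are made up for illustration, and "xxxx" marks an empty stage as on the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# The five instructions used in the walkthrough, in program order.
PROGRAM = [
    "lw $10, 9($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, cycle):
    """Map each stage to the instruction it holds in the given cycle (1-based)."""
    result = {}
    for depth, stage in enumerate(STAGES):
        # An instruction reaches stage `depth` exactly `depth` cycles after its IF.
        idx = cycle - 1 - depth
        result[stage] = program[idx] if 0 <= idx < len(program) else "xxxx"
    return result


def total_cycles(program):
    """Cycles needed to drain the pipeline when nothing stalls."""
    return len(program) + len(STAGES) - 1


if __name__ == "__main__":
    for cycle in range(1, total_cycles(PROGRAM) + 1):
        row = occupancy(PROGRAM, cycle)
        print(f"cycle {cycle}: " + " | ".join(f"{s}: {row[s]}" for s in STAGES))
```

Cycle 5 is the only cycle in which all five stages are busy, which matches the slide where lw is in WB while add is being fetched.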
Dependencies
• Dependencies
  • Problem with starting (or executing) the next instruction before the first has finished
  • Dependencies incur data and control hazards

[Figure: multi-cycle pipeline diagram (IM, Reg, ALU, DM, Reg) over clock cycles CC 1 to CC 9, program execution order top to bottom]

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Clock cycle:             CC 1   CC 2   CC 3   CC 4   CC 5     CC 6   CC 7   CC 8   CC 9
Value of register $2:    10     10     10     10     10/–20   –20    –20    –20    –20
Data Hazard - Software Solution
• Data hazards
  • Dependencies that “go backward in time”
• Have the compiler guarantee no hazards?
  • Insert nop (no operation) instructions (“0x00000000” is nop in MIPS)
  • Code scheduling
• Where do we insert the “nops”?

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

• Problem? This really slows us down!
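One way to answer "where do we insert the nops?" is to let a small script do what the compiler would do. The sketch below is an illustration under stated assumptions, not the course's tool: there is no forwarding, each instruction is described by hand as (text, destination register, source registers), and `gap` is the number of instruction slots that must separate a producer from its consumer. With the textbook register file (write in the first half of the cycle, read in the second half) `gap` is 2; with a rising-edge-only design it would be 3.

```python
from typing import List, Optional, Tuple

# (assembly text, destination register or None, source registers)
Instr = Tuple[str, Optional[int], Tuple[int, ...]]

PROGRAM: List[Instr] = [
    ("sub $2, $1, $3",  2,    (1, 3)),
    ("and $12, $2, $5", 12,   (2, 5)),
    ("or $13, $6, $2",  13,   (6, 2)),
    ("add $14, $2, $2", 14,   (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),   # sw reads $15 and $2, writes no register
]


def insert_nops(program: List[Instr], gap: int = 2) -> List[str]:
    """Insert nops so every consumer is at least `gap` slots behind its producer.

    `gap` must be at least 1.
    """
    out: List[Tuple[str, Optional[int]]] = []  # emitted (text, destination)
    for text, dest, srcs in program:
        needed = 0
        # Look back over the last `gap` emitted slots; back=1 is the previous one.
        for back, (_, prev_dest) in enumerate(reversed(out[-gap:]), start=1):
            # Register $0 is hardwired to zero, so it never creates a hazard.
            if prev_dest is not None and prev_dest != 0 and prev_dest in srcs:
                needed = max(needed, gap - back + 1)
        out.extend([("nop", None)] * needed)
        out.append((text, dest))
    return [text for text, _ in out]


if __name__ == "__main__":
    print("\n".join(insert_nops(PROGRAM)))
```

Under the `gap = 2` assumption only the `and` needs padding: two nops after `sub` push `and`, `or`, `add`, and `sw` far enough back, at the cost of two wasted cycles.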
Data Hazard - Pipeline Stalls?
[Figure: multi-cycle pipeline diagram with stall cycles and a bubble, program execution order top to bottom]

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Data Hazard - Forwarding
• Use temporary results; don’t wait for them to be written
  • Register file forwarding to handle a read and a write to the same register
  • ALU forwarding

[Figure: multi-cycle pipeline diagram (IM, Reg, ALU, DM, Reg) over clock cycles CC 1 to CC 9, program execution order top to bottom]

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Clock cycle:             CC 1   CC 2   CC 3   CC 4   CC 5     CC 6   CC 7   CC 8   CC 9
Value of register $2:    10     10     10     10     10/–20   –20    –20    –20    –20
Value of EX/MEM:         X      X      X      –20    X        X      X      X      X
Value of MEM/WB:         X      X      X      X      –20      X      X      X      X

OK, then do we have to do this register file forwarding?
1. If you are asked to design the CPU using only the rising edge of the clock, then?
  • Let’s stick to this for our project
2. If the register file write occurs in the first half of the clock cycle and the read occurs in the second half, then?
  • Our textbook follows this
Forwarding (simplified)
[Figure: simplified datapath with register file, ID/EX, ALU, EX/MEM, data memory, MEM/WB, and the write-back mux]
Forwarding (from EX/MEM)
[Figure: simplified datapath with muxes added at the ALU inputs]
Forwarding (from MEM/WB)
[Figure: simplified datapath with muxes added at the ALU inputs]
Forwarding (operand selection)
[Figure: simplified datapath with muxes at the ALU inputs and a Forwarding Unit]
Forwarding (operand propagation)
[Figure: simplified datapath with muxes at the ALU inputs and a Forwarding Unit; the register numbers Rs, Rt, and Rd are carried along, and EX/MEM Rd and MEM/WB Rd feed the Forwarding Unit]
Forwarding
[Figure: pipelined datapath with control and a forwarding unit. Signals labeled in the figure: IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, EX/MEM.RegisterRd, MEM/WB.RegisterRd]
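The forwarding unit's decision for one ALU operand can be sketched as a pure function. The slides only show the unit's inputs, so the conditions below follow the usual textbook formulation and should be read as an assumption: the function name, the argument names, and the select encoding ("00" register file, "10" from EX/MEM, "01" from MEM/WB) are illustrative, not taken from the slides.

```python
def forward_select(id_ex_src, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Return the mux select for one ALU operand (Rs or Rt held in ID/EX).

    "00": use the value read from the register file
    "10": forward the ALU result held in EX/MEM
    "01": forward the value held in MEM/WB
    """
    # The EX/MEM result is the most recent one, so it takes priority.
    # Register $0 is never forwarded because it always reads as zero.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_src:
        return "10"
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_src:
        return "01"
    return "00"


if __name__ == "__main__":
    # and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM (result sits in EX/MEM).
    print(forward_select(2, True, 2, False, 0))   # operand $2
    print(forward_select(5, True, 2, False, 0))   # operand $5
    # or $13, $6, $2 in EX: and ($12) is in EX/MEM, sub ($2) is in MEM/WB.
    print(forward_select(2, True, 12, True, 2))
```

Checking EX/MEM before MEM/WB matters when both older instructions write the same register: the younger result is the one the program order requires.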
Can't always forward
• lw (load word) can still cause a hazard
  • An instruction tries to read a register following a load instruction that writes to the same register
• Thus, we need a hazard detection unit to “stall” the pipeline after the load instruction

[Figure: multi-cycle pipeline diagram (IM, Reg, ALU, DM, Reg) over clock cycles CC 1 to CC 9, program execution order top to bottom]

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
Stalling
• We can stall the pipeline by keeping an instruction in the same stage
[Figure: multi-cycle pipeline diagram over clock cycles CC 1 to CC 10, program execution order top to bottom; a bubble follows the lw, and the figure marks a repeated ID and a repeated IF]

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
Hazard Detection Unit
• Stall by letting an instruction that won’t write anything go forward
• Stall the pipeline if the instruction in ID/EX is a load and its rt matches a source register of the instruction in IF/ID (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)

[Figure: pipelined datapath with forwarding unit and hazard detection unit. Signals labeled in the figure: ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, EX/MEM.RegisterRd, MEM/WB.RegisterRd, PCWrite, IF/IDWrite]
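The stall condition stated on this slide is small enough to write out directly. In the sketch below the function and argument names are hypothetical; the condition itself is the one from the slide, and the outputs mirror the PCWrite and IF/IDWrite signals labeled in the figure. Zeroing the control fields (so that a bubble goes forward) is how the "instruction that won’t write anything" is assumed to be produced.

```python
def must_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard: the instruction in ID/EX is a load whose destination (rt)
    is a source register of the instruction currently in IF/ID."""
    return bool(id_ex_memread and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


def stall_outputs(stall):
    """Signals driven by the hazard detection unit (names follow the figure)."""
    return {
        "PCWrite": not stall,      # hold the PC so the same instruction is fetched again
        "IF/IDWrite": not stall,   # hold IF/ID so the same instruction is decoded again
        "zero_control": stall,     # assumed: send all-zero control into ID/EX (a bubble)
    }


if __name__ == "__main__":
    # lw $2, 20($1) is in EX; and $4, $2, $5 is in ID.
    stall = must_stall(id_ex_memread=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
    print(stall, stall_outputs(stall))
```

After the one-cycle stall, the loaded value is in MEM/WB and can reach the dependent instruction through the forwarding path.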
Control Hazards - Branch
• When we decide to branch, other instructions are in the pipeline!
• Assume the branch is not taken
  • When this assumption fails, flush 3 instructions
• We are predicting “branch not taken”
  • We need to add hardware for flushing instructions if we are wrong

[Figure: multi-cycle pipeline diagram (IM, Reg, ALU, DM, Reg) over clock cycles CC 1 to CC 9, program execution order top to bottom]

40 beq $1, $3, 7
44 and $12, $2, $5
48 or $13, $6, $2
52 add $14, $2, $2
72 lw $4, 50($7)
Alleviate Branch Hazards
• Move the branch compare to the ID stage of the pipeline
• Add an adder to calculate the branch target in the ID stage
• Add an IF.Flush signal that zeros (squashes) the instruction in the IF/ID pipeline register
• Reduce the penalty to 1 cycle

[Figure: pipeline timing for a taken branch, annotated “Taken target address is known here” and “Actual condition is generated here”]

beq $1,$2,L1          IF ID EX MEM WB
add $1,$2,$3          IF, then bubble
…
L1: sub $1,$2,$3      IF ID EX MEM WB
Flushing Instructions
[Figure: pipelined datapath with IF.Flush, hazard detection unit, forwarding unit, and an equality comparator (=), sign extend, and shift left 2 for the branch in ID]
Flushing Instructions (cycle N)
[Figure: pipelined datapath with IF.Flush, hazard detection unit, and forwarding unit]
Instructions labeled in the figure: and $12, $2, $5; beq $1, $3, L2

beq $1, $3, L2
and $12, $2, $5
or $13, $12, $1
…
L2: lw $4, 40($7)
Flushing Instructions (cycle N)
[Figure: pipelined datapath with IF.Flush, hazard detection unit, and forwarding unit; the branch target L2 is marked]
Instructions labeled in the figure: and $12, $2, $5; beq $1, $3, L2

beq $1, $3, L2
and $12, $2, $5
or $13, $12, $1
…
L2: lw $4, 40($7)
Flushing Instructions (cycle N+1)
[Figure: pipelined datapath with IF.Flush, hazard detection unit, and forwarding unit]
Instructions labeled in the figure: nop; beq $1, $3, L2; lw $4, 40($7)

beq $1, $3, L2
and $12, $2, $5
or $13, $12, $1
…
L2: lw $4, 40($7)
Improving Performance
• Try to avoid stalls! E.g., reorder these instructions:

lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)

• Add a “branch delay slot”
  • The next instruction after a branch is always executed
  • Rely on the compiler to “fill” the slot with something useful
• Superscalar
  • Start more than one instruction in the same cycle
  • Almost all processors are now pipelined and superscalar
Dynamic Scheduling
• The hardware performs the “scheduling”
  • Hardware tries to find instructions to execute
  • Out-of-order (OOO) execution is possible
  • Speculative execution and dynamic branch prediction
• All modern processors are very complicated
  • DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
  • PowerPC and Pentium: branch history table
  • Compiler technology is important
• This class has given you the background you need to learn more
Exceptions & Interrupts
• The CPU has to prepare for all possible situations it could face
  • “Unexpected” events require a change in the flow of control
• Exceptions arise within the CPU
  • Undefined opcode
  • Arithmetic overflow in MIPS
    • Some other architectures (such as x86 and ARM) do not generate an exception on arithmetic overflow; instead, they set bits in a flag register inside the CPU
• Interrupts come from external I/O devices
  • Keyboard, mouse, network card, etc.
• Many architectures and authors do not distinguish between interrupts and exceptions
  • They often use the term “interrupt” to refer to both types of events
Pipelined Performance Example
• Ideally CPI = 1
• But we need to handle stalling (caused by loads and branches)
• SPECINT2000 benchmark:
  • 25% loads
  • 10% stores
  • 11% branches
  • 2% jumps
  • 52% R-type
• Suppose:
  • 40% of loads are used by the next instruction
  • 25% of branches are mispredicted
• What is the average CPI?
Pipelined Performance Example
• SPECINT2000 benchmark: 25% loads, 10% stores, 11% branches, 2% jumps, 52% R-type
• If there is no stall in the pipelined MIPS, how would you calculate CPI?
  Average CPI = (0.25)(1) + (0.10)(1) + (0.11)(1) + (0.02)(1) + (0.52)(1) = 1
• Suppose:
  • 40% of loads are used by the next instruction
  • 25% of branches are mispredicted
  • All jumps flush the next instruction
• What is the average CPI?
  • Load/branch CPI = 1 when not stalling, 2 when stalling. Thus:
    CPI_lw = 1(0.6) + 2(0.4) = 1.4
    CPI_beq = 1(0.75) + 2(0.25) = 1.25
    CPI_jump = 2(1) = 2
• Average CPI = (0.25)(1.4) + (0.1)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.1475 ≈ 1.15
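The average-CPI arithmetic above is easy to check mechanically. The sketch below uses the instruction mix and stall fractions from the slide; the function and parameter names are invented for illustration, and it assumes, as the slide does, that every stall or flush costs exactly one extra cycle.

```python
# Instruction mix from the SPECINT2000 figures on the slide.
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "rtype": 0.52}


def average_cpi(mix, load_use=0.40, mispredict=0.25):
    """Weighted average CPI, charging one extra cycle per stall or flush."""
    cpi = {
        "load": 1 * (1 - load_use) + 2 * load_use,        # 1.4 with the slide's numbers
        "store": 1.0,
        "branch": 1 * (1 - mispredict) + 2 * mispredict,  # 1.25 with the slide's numbers
        "jump": 2.0,                                      # every jump flushes one instruction
        "rtype": 1.0,
    }
    return sum(mix[kind] * cpi[kind] for kind in mix)


if __name__ == "__main__":
    print(f"average CPI = {average_cpi(MIX):.4f}")
```

The exact sum is 1.1475, which the slide rounds to 1.15.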
Pipelined Performance
• Critical path of the pipelined MIPS processor:

Tc = max {
    t_pcq + t_mem + t_setup,                                 // IF stage
    2(t_RFread + t_mux + t_eq + t_AND + t_mux + t_setup),    // ID stage
    t_pcq + t_mux + t_mux + t_ALU + t_setup,                 // EX stage
    t_pcq + t_memwrite + t_setup,                            // MEM stage
    2(t_pcq + t_mux + t_RFwrite)                             // WB stage
}

Where does this “2” come from?
1. If you are asked to design the CPU using only the rising edge of the clock, then?
  • Let’s stick to this for our project
2. If the register file write occurs in the first half of the clock cycle and the read occurs in the second half, then?
  • Our textbook follows this
Pipelined Performance Example
Tc = 2(t_RFread + t_mux + t_eq + t_AND + t_mux + t_setup) = 2[150 + 25 + 40 + 15 + 25 + 20] ps = 550 ps

Element                 Parameter     Delay (ps)
Register clock-to-Q     t_pcq_PC      30
Register setup          t_setup       20
Multiplexer             t_mux         25
ALU                     t_ALU         200
Memory read             t_mem         250
Register file read      t_RFread      150
Register file setup     t_RFsetup     20
Equality comparator     t_eq          40
AND gate                t_AND         15
Memory write            t_memwrite    220
Register file write     t_RFwrite     100
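To see why the ID stage sets the cycle time, the per-stage expressions from the critical-path slide can be evaluated with the delays in the table. This is a sketch with made-up names (`DELAYS_PS`, `stage_delays`, `cycle_time`); the clock-to-Q entry is used for t_pcq in every stage, which is an assumption about how the table's t_pcq_PC maps onto the formula.

```python
# Delays in picoseconds, copied from the table on the slide.
DELAYS_PS = {
    "pcq": 30, "setup": 20, "mux": 25, "ALU": 200, "mem": 250,
    "RFread": 150, "RFsetup": 20, "eq": 40, "AND": 15,
    "memwrite": 220, "RFwrite": 100,
}


def stage_delays(t):
    """Evaluate each term of the Tc = max{...} expression."""
    return {
        "IF": t["pcq"] + t["mem"] + t["setup"],
        # Factor of 2: the register file is accessed within half a clock cycle.
        "ID": 2 * (t["RFread"] + t["mux"] + t["eq"] + t["AND"] + t["mux"] + t["setup"]),
        "EX": t["pcq"] + t["mux"] + t["mux"] + t["ALU"] + t["setup"],
        "MEM": t["pcq"] + t["memwrite"] + t["setup"],
        "WB": 2 * (t["pcq"] + t["mux"] + t["RFwrite"]),
    }


def cycle_time(t):
    """Return (critical stage, cycle time in ps)."""
    stages = stage_delays(t)
    critical = max(stages, key=stages.get)
    return critical, stages[critical]


if __name__ == "__main__":
    print(stage_delays(DELAYS_PS))
    print(cycle_time(DELAYS_PS))
```

With these numbers the other four stages all fit in 310 ps or less, so the half-cycle register file read is what stretches the clock to 550 ps.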
Pipelined Performance Example
• For a program with 100 billion instructions executing on a pipelined MIPS processor:
  • CPI = 1.15
  • Tc = 550 ps

Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
               = (100 × 10^9)(1.15)(550 × 10^-12 s)
               ≈ 63 seconds

Processor       Execution Time (seconds)    Speedup (single-cycle is baseline)
Single-cycle    95                          1
Multicycle      133                         0.71
Pipelined       63                          1.51
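The execution-time formula and the speedup column can be checked in a few lines. The function names here are illustrative; the 95 s and 133 s figures for the single-cycle and multicycle processors are taken from the table as given, not derived.

```python
def execution_time(instructions, cpi, tc_seconds):
    """Execution time = (# instructions)(cycles/instruction)(seconds/cycle)."""
    return instructions * cpi * tc_seconds


def speedup(baseline_seconds, seconds):
    """Speedup relative to a baseline; values above 1 are faster than the baseline."""
    return baseline_seconds / seconds


if __name__ == "__main__":
    pipelined = execution_time(100e9, 1.15, 550e-12)
    print(f"pipelined: {pipelined:.2f} s")
    # The table rounds the pipelined time to 63 s before computing the speedup.
    print(f"speedup vs single-cycle: {speedup(95, 63):.2f}")
    print(f"multicycle vs single-cycle: {speedup(95, 133):.2f}")
```

The unrounded product is 63.25 s, which the slide reports as 63 seconds.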
Backup Slides
Exception Handling in MIPS and Handler Actions
• Exception handling in MIPS (hardware, CPU)
  • The CPU saves the PC of the offending (or interrupted) instruction to the “Exception Program Counter (EPC)” register
  • The CPU saves an indication of the problem to the “Cause” register
  • Jump to the handler at 0x8000 0180
• Exception handler (software)
  • Read Cause, and transfer to the relevant handler
  • If restartable:
    • Take corrective action
    • Use EPC to return to the program
  • Otherwise:
    • Terminate the program
    • Report the error using EPC, Cause, …
Exceptions in a Pipeline
• Another form of control hazard
• Consider overflow on add in the EX stage

add $1, $2, $1

  • Prevent $1 from being clobbered
  • Complete previous instructions
  • Flush add and subsequent instructions
  • Set Cause and EPC register values
  • Transfer control to the handler
• Similar to a mispredicted branch
  • Use much of the same hardware
Pipeline with Exceptions
Exception Example
• Exception on add in:

40 sub $11, $2, $4
44 and $12, $2, $5
48 or $13, $2, $6
4C add $1, $2, $1
50 slt $15, $6, $7
54 lw $16, 50($7)
…

• Handler:

80000180 sw $25, 1000($0)
80000184 sw $26, 1004($0)
…
Exception Example
Exception Example