Date posted: 17-Jan-2016
Uploaded by: ralf-willis
CMPE 421 Parallel Computer Architecture
Part 2: Hardware Solution: Forwarding

Hardware Solution: Forwarding
Idea: use intermediate data; do not wait for the result to be finally written to the destination register.
Two steps:
1. Detect the data hazard
2. Forward intermediate data to resolve the hazard
Review: MIPS Pipeline Data and Control Paths
[Figure: pipelined MIPS datapath (PC, instruction memory, register file, sign extend, ALU, data memory) with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Control signals shown: RegWrite, MemWrite, MemRead, MemtoReg, RegDst, ALUOp, ALUSrc, Branch, PCSrc.]
How many bits wide is each pipeline register?
PC: 32 bits
IF/ID: 64 bits
ID/EX: 9 + 32x4 + 10 = 147 bits
EX/MEM: 5 + 1 + 32x3 + 5 = 107 bits
MEM/WB: 2 + 32x2 + 5 = 71 bits
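The widths above can be re-derived with a short script. This is only a sketch: the slide gives the sums but not what each term stands for, so the field names below (for example, reading the 9 in ID/EX as the combined WB, M, and EX control bits) are an assumed interpretation, not something the slide states.

```python
# Assumed field breakdown that reproduces the slide's sums.
# Field names are illustrative; only the totals come from the slide.
WORD = 32      # width of a data or address value
REG_NUM = 5    # width of a register number

pipeline_registers = {
    "IF/ID": {"PC+4": WORD, "instruction": WORD},
    "ID/EX": {"control (WB+M+EX)": 9, "PC+4": WORD, "read data 1": WORD,
              "read data 2": WORD, "sign-extended imm": WORD,
              "Rt": REG_NUM, "Rd": REG_NUM},
    "EX/MEM": {"control (WB+M)": 5, "Zero": 1, "branch target": WORD,
               "ALU result": WORD, "store data": WORD, "dest reg": REG_NUM},
    "MEM/WB": {"control (WB)": 2, "memory data": WORD, "ALU result": WORD,
               "dest reg": REG_NUM},
}

def width(name):
    """Total number of bits in the named pipeline register."""
    return sum(pipeline_registers[name].values())

widths = {name: width(name) for name in pipeline_registers}
for name, bits in widths.items():
    print(f"{name}: {bits} bits")
```

Adding forwarding later would widen ID/EX, since the Rs register number also has to be carried into the EX stage.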
Pipelined Datapath with Control II (as before)
[Figure: pipelined datapath with the control unit; the EX, M, and WB control fields travel through the ID/EX, EX/MEM, and MEM/WB pipeline registers.]
Control signals emanate from the control portions of the pipeline registers.
Data Forwarding
Plan:
allow inputs to the ALU to come not just from ID/EX but also from later pipeline registers, and
use multiplexors and control signals to choose the appropriate inputs to the ALU
[Figure 6.29: pipeline diagram, time in clock cycles CC 1 to CC 9, for the sequence below]
Program execution order (in instructions):
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Clock cycle:           CC1  CC2  CC3  CC4  CC5     CC6  CC7  CC8  CC9
Value of register $2:  10   10   10   10   10/-20  -20  -20  -20  -20
Value of EX/MEM:       X    X    X    -20  X       X    X    X    X
Value of MEM/WB:       X    X    X    X    -20     X    X    X    X
Fig. 6.29: Dependencies between the pipeline registers move forward in time.
Possible hazard conditions can be detected with the following notation:
Hazard conditions:
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
E.g., in the earlier example (Fig. 6.29), the first hazard, between sub $2, $1, $3 and and $12, $2, $5, is detected when the and is in the EX stage and the sub is in the MEM stage, because EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 (1a).
Similarly, the dependency between sub and or is detected as MEM/WB.RegisterRd = ID/EX.RegisterRt = $2 (2b).
The two dependencies between sub and add are not hazards: the value is passed within the register file, which is another form of forwarding.
There is no hazard between sub and sw.
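As a rough illustration of the four conditions, the sketch below scans the Fig. 6.29 sequence and labels each match with the 1a/1b/2a/2b notation. The `Instr` tuple and `find_hazards` helper are hypothetical names introduced here. The sketch assumes that when an instruction is in EX, the instruction one ahead of it is held in EX/MEM and the one two ahead is held in MEM/WB, with no stalls.

```python
from collections import namedtuple

# rd is the register written (None if the instruction writes no register);
# rs and rt are the registers read. Hypothetical representation.
Instr = namedtuple("Instr", "text rd rs rt")

program = [
    Instr("sub $2, $1, $3", 2, 1, 3),
    Instr("and $12, $2, $5", 12, 2, 5),
    Instr("or $13, $6, $2", 13, 6, 2),
    Instr("add $14, $2, $2", 14, 2, 2),
    Instr("sw $15, 100($2)", None, 2, 15),
]

def find_hazards(program):
    """Label each hazard with the 1a/1b/2a/2b notation."""
    found = []
    for i, consumer in enumerate(program):
        # distance 1: producer is in EX/MEM; distance 2: producer is in MEM/WB
        for distance, group in ((1, "1"), (2, "2")):
            if i - distance < 0:
                continue
            producer = program[i - distance]
            # No hazard if nothing is written, or if the destination is $0.
            if producer.rd is None or producer.rd == 0:
                continue
            if producer.rd == consumer.rs:
                found.append((group + "a", producer.text, consumer.text))
            if producer.rd == consumer.rt:
                found.append((group + "b", producer.text, consumer.text))
    return found

for label, producer, consumer in find_hazards(program):
    print(label, ":", producer, "->", consumer)
```

For this sequence the scan reports only the sub-to-and dependency (1a) and the sub-to-or dependency (2b); the add and sw are three and four instructions behind the sub, so they fall outside both pipeline registers.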
Remarks:
- We do not have any WB hazard. Why? We assume the register file supplies the correct result when the instruction in the ID stage reads a register that the instruction in the WB stage writes in the same cycle.
- Whether to forward also depends on:
  whether the instruction further along the pipeline (earlier in program order) is going to write a register; if not, there is no need to forward, even if the register numbers match as in the conditions above
  whether the destination register of that instruction is $0, in which case there is no need to forward ($0 is always 0 and never overwritten)
Forwarding Hardware
To resolve the detected hazards, a forwarding unit is added, together with muxes at the ALU inputs (see Fig. 6.30).
Initially, forwarding is handled only for R-type instructions (sub, add, and, ...).
The ForwardA and ForwardB control lines select the mux inputs that go into the ALU.
The forwarding unit is in the EX stage because the ALU forwarding muxes are found in that stage.
Forwarding Hardware
FIGURE 6.31 The control values for the forwarding multiplexors in Figure 6.30. The signed immediate that is another input to the ALU is described in the Elaboration at the end of this section.
Data Forwarding (Bypassing)
Take the result from the earliest point that it exists in any of the pipeline state registers and forward it to the functional units (e.g., the ALU) that need it that cycle.
For the ALU functional unit, the inputs can come from any pipeline register rather than just from ID/EX by:
- adding multiplexors to the inputs of the ALU
- connecting the Rd write data in EX/MEM or MEM/WB to either (or both) of the EX stage's Rs and Rt ALU mux inputs
- adding the proper control hardware to control the new muxes
Other functional units may need similar forwarding logic (e.g., the DM).
With forwarding, the pipeline can achieve a CPI of 1 even in the presence of data dependencies (the load-use case discussed later is the exception).
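The multiplexor added at each ALU input can be pictured as a three-way select. The sketch below uses the ForwardA/ForwardB encodings that appear in this deck's control conditions (10 for EX/MEM, 01 for MEM/WB); treating 00 as "no forwarding, use the ID/EX value" is an assumption, and the function and parameter names are hypothetical.

```python
def alu_operand(forward, id_ex_value, ex_mem_alu_result, mem_wb_write_data):
    """Select one ALU input according to a 2-bit forwarding control value."""
    if forward == 0b10:
        return ex_mem_alu_result      # result of the previous instruction
    if forward == 0b01:
        return mem_wb_write_data      # result of the second previous instruction
    return id_ex_value                # assumed 00: value read from the register file

# and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM:
# the register file still held 10 for $2, but EX/MEM holds the new value -20.
operand_a = alu_operand(0b10, id_ex_value=10, ex_mem_alu_result=-20,
                        mem_wb_write_data=0)
print(operand_a)
```

The same function serves both ALU inputs: ForwardA drives the select for the Rs operand and ForwardB for the Rt operand.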
Forwarding Hardware
[Figure 6.30: a. No forwarding (datapath before adding forwarding hardware): the ALU with the ID/EX, EX/MEM, and MEM/WB pipeline registers. b. With forwarding (datapath after adding forwarding hardware): muxes at both ALU inputs controlled by ForwardA and ForwardB from the forwarding unit, whose inputs are Rs, Rt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd.]
FIGURE 6.30 On the top are the ALU and pipeline registers before adding forwarding. On the bottom, the multiplexors have been expanded to add the forwarding paths, and we show the forwarding unit. The new hardware is shown in color. This figure is a stylized drawing, however, leaving out details from the full datapath such as the sign extension hardware. Note that the ID/EX.RegisterRt field is shown twice, once to connect to the mux and once to the forwarding unit, but it is a single signal. As in the earlier discussion, this ignores forwarding of a store value to a store instruction. Also note that this mechanism works for slt instructions as well.
Elaboration on Forwarding Hardware
FIGURE 6.33 The datapath modified to resolve hazards via forwarding. Compared with the datapath in Figure 6.30, the additions are the multiplexors to the inputs to the ALU. This figure is a more stylized drawing, however, leaving out details from the full datapath, such as the branch hardware and the sign extension hardware.
Review: One Way to "Fix" a Data Hazard
Instr. order:
add $1,
sub $4,$1,$5
and $6,$7,$1
[Pipeline diagram: two stall cycles are inserted between the add and the sub]
Fix the data hazard by waiting (stall), but this impacts CPI.
Review: Another Way to “Fix” a Data Hazard
Instr. order:
add $1,
sub $4,$1,$5
and $6,$7,$1
sw $4,4($1)
or $8,$1,$1
[Pipeline diagram: the result of the add is forwarded to the later instructions that read $1]
Fix data hazards by forwarding results as soon as they are available to where they are needed.
Notice that for now we are showing the forwarded data coming out of the ALU. After looking at the problem more closely we will see that it really is supplied by the pipeline register EX/MEM and will depict it as such.
Data Forwarding Control Conditions
1. EX/MEM hazard (forwards the result from the previous instr. to either input of the ALU):
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 10
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 10
2. MEM/WB hazard (forwards the result from the second previous instr. to either input of the ALU):
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01
Forwarding Illustration
Instr. order:
add $1,
sub $4,$1,$5
and $6,$7,$1
[Pipeline diagram: EX/MEM hazard forwarding from the add to the sub; MEM/WB hazard forwarding from the add to the and]
Yet Another Complication!
Instr. order:
add $1,$1,$2
add $1,$1,$3
add $1,$1,$4
Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction. Which should be forwarded?
Register $1 is written by both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded.
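Corrected Data Forwarding Control Conditions
The MEM/WB result is forwarded only when the EX/MEM hazard condition does not already hold for the same operand, so the most recent result wins.
2. MEM/WB hazard:
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0)
         and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0)
         and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01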
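Putting the EX/MEM and MEM/WB conditions together, a forwarding unit might be sketched as below. This is an illustrative model, not the hardware itself: the function and parameter names are hypothetical, and it assumes 00 means "no forwarding". MEM/WB forwarding is suppressed whenever the EX/MEM hazard condition holds for the same operand, so the most recent result takes priority.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_reg_write, ex_mem_rd,
                    mem_wb_reg_write, mem_wb_rd):
    """Return (ForwardA, ForwardB) as 2-bit values for the instruction in EX."""
    def select(src):
        ex_hazard = ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src
        mem_hazard = mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src
        if ex_hazard:          # checked first: EX/MEM holds the newer value
            return 0b10
        if mem_hazard:
            return 0b01
        return 0b00            # assumed encoding for "use the ID/EX value"
    return select(id_ex_rs), select(id_ex_rt)

# Third instruction of: add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4
# It is in EX (rs=$1, rt=$4); both older adds write $1.
forward_a, forward_b = forwarding_unit(
    id_ex_rs=1, id_ex_rt=4,
    ex_mem_reg_write=True, ex_mem_rd=1,
    mem_wb_reg_write=True, mem_wb_rd=1)
print(format(forward_a, "02b"), format(forward_b, "02b"))
```

In the triple-add case ForwardA selects the EX/MEM value (the second add's result) even though MEM/WB also matches, which is the behaviour the corrected conditions are meant to guarantee.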
Datapath with Forwarding Hardware
[Figure: pipelined datapath with a forward unit in the EX stage. Forward unit inputs: ID/EX.RegisterRs, ID/EX.RegisterRt, EX/MEM.RegisterRd, MEM/WB.RegisterRd, EX/MEM.RegWrite, MEM/WB.RegWrite.]
Forwarding Hardware with Control
[Figure: pipelined datapath with control and the forwarding unit. IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are carried into ID/EX, and the forwarding unit compares Rs and Rt against EX/MEM.RegisterRd and MEM/WB.RegisterRd.]
Datapath with forwarding hardware and control wires; certain details, e.g., branching hardware, are omitted to simplify the drawing.
Note: so far we have only handled forwarding to R-type instructions!
Called forwarding unit, not hazard detection unit, because once data is forwarded there is no hazard!
Example
Consider the following code sequence, in which the dependencies have been highlighted:
sub $2, $1, $3
and $4, $2, $5
or $4, $4, $2
add $9, $4, $2
We'll try to keep the example short: we'll skip the first two cycles, since they're the same as before.
Example: Forwarding
Execution example, clock cycles 3 to 6:
sub $2, $1, $3
and $4, $2, $5
or $4, $4, $2
add $9, $4, $2
[Figures: datapath snapshots for clock cycles 3 to 6. Stage contents per clock:]
Clock  IF              ID              EX              MEM          WB
3      or $4, $4, $2   and $4, $2, $5  sub $2, $1, $3  before<1>    before<2>
4      add $9, $4, $2  or $4, $4, $2   and $4, $2, $5  sub $2, ...  before<1>
5      after<1>        add $9, $4, $2  or $4, $4, $2   and $4, ...  sub $2, ...
6      after<2>        after<1>        add $9, $4, $2  or $4, ...   and $4, ...
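To see which forwarding paths are active in each of these clock cycles, the control conditions can be applied to the sequence mechanically. The sketch below is a simplified model with hypothetical names: it assumes every instruction in the sequence writes its Rd (all are R-type), that the unnamed `before` instructions do not write any register read here, and that 00 means no forwarding.

```python
# (op, rd, rs, rt) for each instruction of the example sequence
program = [("sub", 2, 1, 3), ("and", 4, 2, 5), ("or", 4, 4, 2), ("add", 9, 4, 2)]

def forward(src, ex_mem, mem_wb):
    """2-bit forwarding control for one source register of the EX instruction."""
    if ex_mem is not None and ex_mem[1] != 0 and ex_mem[1] == src:
        return "10"                       # take the value from EX/MEM
    if mem_wb is not None and mem_wb[1] != 0 and mem_wb[1] == src:
        return "01"                       # take the value from MEM/WB
    return "00"                           # assumed: no forwarding

def trace(program):
    """Map clock cycle -> (instruction in EX, ForwardA, ForwardB)."""
    rows = {}
    for i, (op, rd, rs, rt) in enumerate(program):
        clock = i + 3                     # instruction i is fetched in clock i+1
        ex_mem = program[i - 1] if i >= 1 else None
        mem_wb = program[i - 2] if i >= 2 else None
        rows[clock] = (op, forward(rs, ex_mem, mem_wb), forward(rt, ex_mem, mem_wb))
    return rows

result = trace(program)
for clock, (op, fa, fb) in result.items():
    print(f"clock {clock}: {op:3s} in EX, ForwardA={fa}, ForwardB={fb}")
```

Under these assumptions the or in clock 5 uses both paths at once ($4 from EX/MEM, $2 from MEM/WB), while the add in clock 6 reads $2 from the register file because the sub has already completed write-back.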
Memory-to-Memory Copies
Instr. order:
lw $1,4($2)
sw $1,4($3)
[Pipeline diagram: the loaded value is forwarded from the MEM/WB register to the data memory input of the sw]
For loads immediately followed by stores (memory-to-memory copies), a stall can be avoided by adding forwarding hardware from the MEM/WB register to the data memory input.
This would need a forward unit and a mux added to the memory access stage.
What if the lw was replaced with an add that writes $1: is forwarding still needed? From where, to where?
What if $1 was used to compute the effective address? (It would be a load-use data hazard and would require a stall between the lw and the sw.)
Load-use Hazard Detection Unit
Forwarding is not the solution for all data hazard conditions. For example, a problem occurs when an instruction tries to read a register following a lw that writes the same register: in clock cycle 4 the correct value of register $2 is not yet available at the beginning of the cycle.
Therefore, in addition to the forwarding unit (FU), we need a hazard detection unit (HDU) that operates during the ID stage.
The HDU inserts a stall between the load and its use.
Load word can still cause a hazard:
- an instruction tries to read a register following a load instruction that writes to the same register
- therefore, we need a hazard detection unit to stall the pipeline after the load instruction
Data Hazards and Stalls
[Pipeline diagram, time in clock cycles CC 1 to CC 9, for the sequence below]
Program execution order (in instructions):
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
Since one pipeline dependency (lw to and) goes backward in time, forwarding alone will not solve the hazard.
Stalling Resolves a Hazard
Same instruction sequence as before, for which forwarding by itself could not resolve the hazard:
lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7
[Pipeline diagram, time in clock cycles CC 1 to CC 10: a bubble is inserted after the lw, so the and and the instructions after it run one cycle later]
• The hazard detection unit inserts a 1-cycle bubble in the pipeline, after which all pipeline register dependencies go forward in time, so the forwarding unit can handle them and there are no more hazards.
• The and instruction is turned into a nop, and all instructions beginning with the and instruction are delayed one cycle. The hazard forces the and and or instructions to repeat in clock cycle 4 what they did in clock cycle 3.
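The timing described above can be reproduced with a small scheduling sketch. It is a simplified model under stated assumptions: a five-stage pipeline, full forwarding, and exactly one stall cycle when an instruction reads the destination of the load immediately ahead of it. The `schedule` function and the tuple layout are hypothetical.

```python
# (text, destination register, source registers, is_load)
program = [
    ("lw $2, 20($1)", 2, (1,), True),
    ("and $4, $2, $5", 4, (2, 5), False),
    ("or $8, $2, $6", 8, (2, 6), False),
    ("add $9, $4, $2", 9, (4, 2), False),
    ("slt $1, $6, $7", 1, (6, 7), False),
]

def schedule(program):
    """Return [(text, {clock: stage})], repeating IF/ID during a stall."""
    rows = []
    prev = None
    for text, dest, srcs, is_load in program:
        if prev is None:
            if_start, id_start, stall = 1, 2, 0
        else:
            if_start = prev["id_start"]   # fetched when the previous one enters ID
            id_start = prev["ex"]         # enters ID when the previous one enters EX
            stall = 1 if prev["is_load"] and prev["dest"] in srcs else 0
        ex = id_start + 1 + stall
        stages = {}
        for clock in range(if_start, id_start):
            stages[clock] = "IF"
        for clock in range(id_start, ex):
            stages[clock] = "ID"
        stages[ex], stages[ex + 1], stages[ex + 2] = "EX", "MEM", "WB"
        prev = {"id_start": id_start, "ex": ex, "is_load": is_load, "dest": dest}
        rows.append((text, stages))
    return rows

rows = schedule(program)
for text, stages in rows:
    cells = [stages.get(clock, "--").ljust(4) for clock in range(1, 11)]
    print(text.ljust(16), " ".join(cells))
```

In this model the and occupies ID in clock cycles 3 and 4 and the or occupies IF in the same two cycles, and the last instruction finishes in clock cycle 10 rather than 9.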
Example: Forwarding with Load-use Data Hazards
Instr. order:
lw $1,4($2)
sub $4,$1,$5
and $6,$1,$7
xor $4,$1,$5
or $8,$1,$9
[Pipeline diagram: a one-cycle stall is inserted after the lw; sub $4,$1,$5 and the instructions after it are delayed by one cycle]
The one case where forwarding cannot avoid a stall is when an instruction tries to read a register following a load instruction that writes the same register.
Load-use Hazard Detection Unit
Need a hazard detection unit in the ID stage that inserts a stall between the load and its use.
The hazard detection unit implements the following check to decide whether to stall:
if ( ID/EX.MemRead // if the instruction in the EX stage is a load...
and ( ( ID/EX.RegisterRt = IF/ID.RegisterRs ) // and the destination register
or ( ID/EX.RegisterRt = IF/ID.RegisterRt ) ) ) // matches either source register
// of the instruction in the ID stage, then...
    stall the pipeline
The first line tests whether the instruction now in the EX stage is a lw; the next two lines check whether the destination register of the lw matches either source register of the instruction in the ID stage (the load-use instruction).
After this one-cycle stall, the forwarding logic can handle the remaining data hazards.
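The stall check translates almost directly into code. The sketch below is illustrative only; `must_stall` and its parameter names are hypothetical stand-ins for the ID/EX and IF/ID fields named in the check.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register a load in EX will write."""
    return bool(id_ex_mem_read and
                (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))

# lw $2, 20($1) is in EX (Rt = $2) and and $4, $2, $5 is in ID (Rs = $2, Rt = $5)
stall_and = must_stall(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5)

# Same register numbers, but the EX instruction is not a load: forwarding suffices
stall_non_load = must_stall(id_ex_mem_read=False, id_ex_rt=2, if_id_rs=2, if_id_rt=5)

print(stall_and, stall_non_load)
```

Only MemRead distinguishes the two calls, which reflects why the check is needed for loads alone: an ALU result is already available in EX/MEM one cycle later, while loaded data is not.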
Stall Hardware
Along with the hazard unit, we have to implement the stall.
Prevent the instructions in the IF and ID stages from progressing down the pipeline, done by preventing the PC register and the IF/ID pipeline register from changing:
- The hazard detection unit controls the writing of the PC (PC.write) and IF/ID (IF/ID.write) registers.
Insert a "bubble" between the lw instruction (in the EX stage) and the load-use instruction (in the ID stage), i.e., insert a noop in the execution stream:
- Set the control bits in the EX, MEM, and WB control fields of the ID/EX pipeline register to 0 (noop). The hazard unit controls the mux that chooses between the real control values and the 0's.
Let the lw instruction and the instructions after it in the pipeline (before it in the code) proceed normally down the pipeline.
Adding the Hazard Hardware
[Figure: pipelined datapath with the forward unit and a hazard unit. The hazard unit takes ID/EX.RegisterRt and ID/EX.MemRead as inputs and controls a mux that selects between the real control values and 0.]
Mechanics of Stalling
If the stall check is true, the pipeline needs to stall only 1 clock cycle after the load, because after that the forwarding unit can resolve the dependency.
What the hardware does to stall the pipeline 1 cycle:
- does not let the IF/ID register change (disable write!); this causes the instruction in the ID stage to repeat, i.e., stall
- therefore, the instruction just behind, in the IF stage, must be stalled as well, so the hardware does not let the PC change (disable write!); this causes the instruction in the IF stage to repeat, i.e., stall
- changes all the EX, MEM, and WB control fields in the ID/EX pipeline register to 0, so effectively the instruction just behind the load becomes a nop; a bubble is said to have been inserted into the pipeline
- note that we cannot turn that instruction into a nop by zeroing all the bits in the instruction itself (recall nop = 00...0, 32 bits) because it has already been decoded and its control signals generated
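The three actions above can be modelled as one clock edge of the front of the pipeline. This is a behavioural sketch, not the actual hardware: the function names, the dictionary of control signals, and the use of strings for instructions are all assumptions made for illustration.

```python
def hazard_unit_outputs(stall, decoded_control):
    """Write enables and the control word that enters ID/EX this cycle."""
    if stall:
        # Freeze PC and IF/ID; send all-zero control (a bubble) into ID/EX.
        return {"PCWrite": False, "IF/IDWrite": False,
                "ID/EX.control": {name: 0 for name in decoded_control}}
    return {"PCWrite": True, "IF/IDWrite": True,
            "ID/EX.control": dict(decoded_control)}

def clock_edge(pc, if_id, fetched, decoded_control, stall):
    """Return (next PC, next IF/ID contents, control written into ID/EX)."""
    out = hazard_unit_outputs(stall, decoded_control)
    next_pc = pc + 4 if out["PCWrite"] else pc
    next_if_id = fetched if out["IF/IDWrite"] else if_id
    return next_pc, next_if_id, out["ID/EX.control"]

# Control signals decoded for an R-type instruction such as and $4, $2, $5
r_type = {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0, "Branch": 0,
          "MemRead": 0, "MemWrite": 0, "RegWrite": 1, "MemtoReg": 0}

stalled = clock_edge(pc=8, if_id="and $4, $2, $5", fetched="or $4, $4, $2",
                     decoded_control=r_type, stall=True)
normal = clock_edge(pc=8, if_id="and $4, $2, $5", fetched="or $4, $4, $2",
                    decoded_control=r_type, stall=False)
print(stalled)
print(normal)
```

With `stall=True` the PC and IF/ID contents are unchanged and RegWrite and MemWrite are 0, so the bubble cannot alter any state; the decoded instruction is simply decoded again on the next cycle.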
Hazard Detection Unit + Forwarding Unit
[Figure: pipelined datapath with the forwarding unit and the hazard detection unit. Hazard detection unit inputs: ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, IF/ID.RegisterRt. Outputs: PCWrite, IF/IDWrite, and the select of the mux that zeroes the ID/EX control fields.]
Datapath with forwarding hardware, the hazard detection unit, and control wires; certain details, e.g., branching hardware, are omitted to simplify the drawing.
Stalling
Execution example:
lw $2, 20($1)
and $4, $2, $5
or $4, $4, $2
add $9, $4, $2
[Figures: clock-by-clock datapath snapshots for this sequence]