7/31/2019 Pipelined MIPS Processor
Erik Jonsson School of Engineering and Computer Science
The University of Texas at Dallas
N. B. Dodge 01/12, Lecture #21: The Pipeline MIPS Processor

The Pipelined MIPS Processor
We complete our study of computer architecture by investigating an approach providing even higher performance for the MIPS CPU.
We first saw how the MIPS CPU performance could be improved by converting the so-called single-cycle CPU to a multi-cycle design.
In the multi-cycle approach, instead of using a single clock cycle for the whole instruction, the clock is accelerated, and instructions execute in phases over several clock cycles.
Each instruction phase takes one clock cycle.
This means that as each instruction executes, only one section of the CPU will be active per clock cycle -- the one executing that phase of the instruction.
This suggests that perhaps we might redesign the CPU slightly so that every section can operate independently on an instruction at the same time.

The Laundry Example
As an introduction to the concept of pipelining, Patterson and Hennessy use the example of doing one's laundry.
Most people have, or have access to, a washer and dryer.
Assume that you need to wash several washer loads of clothing.
Would anyone divide the clothing into washer loads and then wash, dry, fold, and put away the first load before starting the second?
No, if you were washing clothes, you would finish washing the first load, move it to the dryer, and start washing the second. If there were more loads to wash, you would begin to fold and put away finished clothing while the later loads were washing and drying.
We can see this schematically on the next slide.

Graphical Example of the Laundry Cycle
[Figure: schematic of the laundry cycle.]

The Pipeline Processor
Patterson and Hennessy applied this simultaneous wash-dry-fold-put away concept to the single-cycle computer model.
The idea is to have the CPU work on several instructions simultaneously so that the number of clock cycles per instruction could be dramatically decreased, which raises instruction throughput.
In the single-cycle implementation, each instruction executes in one clock cycle, but the clock must be as slow as the slowest instruction.
In the multi-cycle implementation, the clock runs faster, but instructions take several clock cycles to complete.
What if, each time the clock ticked, we could process an instruction in each section of the multicycle processor? Then we could process several instructions at once, completing an instruction every clock cycle.

Pipeline Architecture
A pipelined computer executes instructions concurrently.
Hardware units are organized into stages:
Execution in each stage takes exactly 1 clock period.
Stages are separated by pipeline registers that preserve and pass partial results to the next stage.
Unfortunately, as noted earlier, speed = complexity + cost.
The pipeline approach brings additional expense plus its own set of problems and complications, called hazards, which we will also study.

Sequential Versus Pipelined Execution
[Figure: two timelines over clock cycles 0-10 for the sequence lw $t0, 16($a3); lw $t1, 32($a3); lw $t2, 48($a3); etc. Each instruction passes through five phases: Instruction Fetch, Register Fetch, ALU Process, Memory R/W or ALU Out, and Register Write. In the sequential timeline each instruction begins only after the previous one has finished; in the pipelined timeline a new instruction begins every clock cycle.]

Speed Advantage of the Pipeline
The multicycle, serial processor that we studied last lecture can execute n instructions in ns clock periods, or $ET_S = ns$, where s is the number of clock periods each instruction takes.
A pipelined processor with s stages can execute n instructions in $ET_P = s + (n - 1)$ clock periods.
The ideal pipeline speedup depends on the number of stages, and can be greater for more stages (hence Intel's choice of a 20-stage pipeline for the current P-IV).
Thus the speed advantage of pipeline over multicycle can be defined as:
$S = \frac{ET_S}{ET_P} = \frac{ns}{s + (n - 1)} \approx s$ for large n.
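As a quick numerical check of the two execution-time expressions, the following minimal Python sketch evaluates ET_S = ns and ET_P = s + (n - 1) for a five-stage pipeline. The function names and the sample values of n are illustrative assumptions, and the model ignores hazards and stalls.

```python
def multicycle_time(n, s):
    """Clock periods for n instructions on the serial model: ET_S = n * s."""
    return n * s


def pipeline_time(n, s):
    """Clock periods on an ideal s-stage pipeline: ET_P = s + (n - 1)."""
    return s + (n - 1)


def speedup(n, s):
    """Ideal speedup S = ET_S / ET_P (no hazards or stalls modelled)."""
    return multicycle_time(n, s) / pipeline_time(n, s)


if __name__ == "__main__":
    stages = 5  # the five MIPS stages: IF, ID/RF, EX, MEM, WB
    for n in (1, 5, 100, 10_000):
        print(f"n={n:>6}  ET_S={multicycle_time(n, stages):>6}  "
              f"ET_P={pipeline_time(n, stages):>6}  S={speedup(n, stages):.3f}")
```

For a single instruction the speedup is 1, since the pipeline must still fill; as n grows, S approaches the number of stages.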

Pipeline Stages
[Figure: one instruction occupying the five stages IF, ID/RF, ALU, MEM, WB over clock cycles 0-5.]
The MIPS R2000 pipeline processor is divided into five processing stages:
1. Instruction fetch (IF)
2. Instruction decode (ID) and register fetch (RF)
3. ALU instruction execution (ALU): ALU processing, branch condition evaluation, memory address computation, etc. This is also referred to as execution (EX)
4. Memory access (MEM)
5. Write back (WB) to register file

Overlapped Pipeline Execution
[Figure: clock cycles 0-7 on the horizontal axis, instruction execution order on the vertical axis. Instruction 1, Instruction 2, and a following instruction each pass through IF, ID/RF, ALU, MEM, WB, each starting one clock cycle after the one before it.]

Single-Cycle Datapath
[Figure: single-cycle MIPS datapath with PC, instruction memory, register block, sign extend, ALU, data memory, and control (Reg. Dest., Branch, Mem. Read, Mem. To Reg., ALU Op., Mem. Write, ALU Srce., Reg. Write).]
Lines indicate need for storage between stages if processor is converted to pipeline.

Single-Cycle Datapath with Pipeline Registers
Inter-stage registers are master-slave D flip-flops; the master can be receiving new data from the previous stage of the instruction while the slave flip-flop is providing data to the next stage.
[Figure: datapath divided by the inter-stage registers IF/ID, ID/EX, EX/MEM, and MEM/WB, with the master side and slave side of a register marked.]
Note: Control lines and logic not shown for clarity.
After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition

Instruction Process Through Pipeline (1)
Stage 1: Instruction loaded into IF/ID
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

Instruction Process Through Pipeline (2)
Stage 2: Instruction decoded, register data accessed, immediates sign-extended
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

Instruction Process Through Pipeline (3)
Stage 3: Instruction executed / branch address computed
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

Instruction Process Through Pipeline (4)
Stage 4: Memory load or store, branch taken/not taken; ALU results take the bypass to the MEM/WB register
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

Instruction Process Through Pipeline (5)
Stage 5: Result write-back to dest. register
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

Adding Control
Control information must be carried along as a part of the instruction, since this information is required at different stages of the pipeline.
This can be done by adding more inter-stage storage.
The result is very large inter-stage registers. For example, the storage capacity required between the instruction decode and ALU execution stages (ID/EX register) is more than 120 bits.
The full pipeline design with control is shown on the next slide.

Full Pipeline Design with Control
[Figure: pipelined datapath with registers IF/ID, ID/EX, EX/MEM, MEM/WB and a Control Decode block. Control signals carried through the inter-stage registers: Register Write, Memory/ALU Result, Branch, Memory Write, Memory Read, ALU Srce, ALU Op, Reg. Dst. Instruction bits 0-15 feed Sign Extend; bits 16-20 and bits 11-15 feed the Reg. Dst. multiplexer.]

The Pipeline in Action
The following instruction sequence from the P&H text illustrates the pipeline in action.
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9
Note that registers are identified by number rather than the letter ids, since that is the way they appear in the processor. As a reminder, $1=$at, $8-14=$t0-t6, $2-3=$v0-v1, $4-7=$a0-a3, etc.
7/31/2019 Pipelined MIPS Porcessor
20/48
IF: Idle ID/RF: Idle EX: Idle MEM: Idle WB:Idle
M
ID/EX EX/MEM MEM/WB
rite
ADD
ADD+4
U
X
IF/ID
0-31
Decode
rite
ead
Reg
ister
P
Instruction
Address
Memory
Left
shift
2
Inst.0-31
Reg. Block
Rs
Rt Read
Data 1Instructionbit
ALU Srce
Branch
MemoryW
MemoryR
emory/ALUResul
ALUM
U
X
Data
AddressM
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
Memory
Sign
Extend
a as -
Bits 16-20
Bits 11-15
ALU
Cont.
M
U
X
ALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition20
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
21/48
IF: lw $10 20 $1 ID/RF: Idle EX: Idle MEM: Idle WB:Idle
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emor
y/ALUResult
ALUM
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
MemorySign
Extend
32DataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition21
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
22/48
IF: sub $11 $2 $3 ID/RF: lw $10 20 $1 EX: Idle MEM: Idle WB:Idle
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
[ $1 ]
$ 1
$ 10
PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emor
y/ALUResult
X ALUM
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
$ 10
X
20
MemorySign
ExtendDataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition22
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
23/48
IF: and $12 $4 $5 MEM: Idle WB:ID/RF: sub $11 $2 $3 EX: lw $10 20 $1Idle
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
[ $ 2 ] [ $1 ]
$ 3
$ 2
PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emor
y/ALUResult
[ $ 3 ]
add20
ALUM
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
$ 11
X
X
20
$ 10
$ 10
MemorySign
Extend
DataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition23
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
24/48
IF: or $13 $6 $7 WB:ID/RF: and $12 $4 $5 EX: sub $11 $2 $3 MEM: lw $10 20 $1Idle
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
[ $2 ]
$ 4
$ 5
PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emory/ALUResult
[$3]
[ $3 ]sub
ALUM
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
X
X
$ 12 $ 11$ 11 $ 10
MemorySign
Extend
DataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition24
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
25/48
IF: add $14 $8 $9 ID/RF: or $13 $6 $7 EX: and $12 $4 $5 MEM: sub $11 $2 $3 WB: lw $1020($1)
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
[ $6 ]
$ 6
$ 7
[ $4 ]PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emory/ALUResult
[ $7 ] [$5]
[ $5 ]and
$10 ALUM
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
X
X
$ 13 $ 12$ 12 $ 11 $ 10
MemorySignExtend
32 DataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition25
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
26/48
IF: Idle ID/RF: add $14 $8 $9 EX: or $13 $6 $7 MEM: and $12 WB: sub$4, $5 $11, $2, $3
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
[ $8 ]
$ 8
$ 9
[ $6 ]PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emory/ALUResult
[ $9 ]$11 [$7]
[ $7 ] or
ALUM
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
X
X
$ 14 $ 13$ 13 $ 12 $ 11
MemorySignExtend
DataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition26
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
27/48
IF: Idle ID/RF: Idle WB EX: add $14 $8 $9$12, $4, $5
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
[ $8 ]PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emory/ALUResult
[$9]
[ $9 ]add
$12 ALUM
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
$ 14$ 14 $ 13 $ 12
MemorySignExtend
32 DataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition27
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
28/48
IF: Idle ID/RF: Idle WB EX: Idle$13, $6, $7
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emory/ALUResult
$13ALU
M
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
$ 14 $ 13
MemorySignExtend
32 DataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition28
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
29/48
IF: Idle ID/RF: Idle WB EX: Idle MEM: Idle$14, $8, $9
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emory/ALUResult
$14ALU
M
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
$ 14
MemorySignExtend
32 DataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
After David A. Patterson and John L. Hennessy,Computer Organization and Design, 2nd Edition29
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
7/31/2019 Pipelined MIPS Porcessor
30/48
IF: Idle ID/RF: Idle EX: Idle MEM: Idle WB:Idle
ID/EX EX/MEM MEM/WB
ite
ADD+4
U
X
IF/ID
-31
Control
Decode
ite d
RegisterWr
PInstruction
Address
Memory
Left
shift
2
Inst.
Reg. Block
Rs
Rt ReadData 1In
structionbits
ALU Srce
Branch
MemoryWr
MemoryRe
emory/ALUResult
ALUM
U
X
Data
Address
-
M
U
X
Rd
Write
Data
Read
Data 2
Write
Read
Data
M
MemorySignExtend
32 DataBits 0-15
Bits 16-20
Bits 11-15
ALU
Cont.
M
UALU Op
30
Reg. Dst.
Lecture #21: The Pipeline MIPS Processor
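
The clock-by-clock progression of the five-instruction sequence can be reproduced with a short simulation. This is a minimal Python sketch, not the processor's control logic: it assumes instruction i enters IF at clock i + 1, advances one stage per clock, and meets no hazards. The names `STAGES`, `PROGRAM`, `occupancy`, and `run` are illustrative.

```python
STAGES = ["IF", "ID/RF", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def occupancy(program, clock):
    """Return a dict mapping each stage to its instruction at `clock` (1-based).

    Instruction i (0-based) is in IF at clock i + 1 and moves one stage
    deeper on every later clock. Empty stages are reported as "Idle".
    """
    row = {}
    for depth, stage in enumerate(STAGES):
        i = clock - 1 - depth
        row[stage] = program[i] if 0 <= i < len(program) else "Idle"
    return row


def run(program):
    """Stage occupancy for every clock until the last instruction leaves WB."""
    total_clocks = len(program) + len(STAGES) - 1
    return [occupancy(program, c) for c in range(1, total_clocks + 1)]


if __name__ == "__main__":
    for clock, row in enumerate(run(PROGRAM), start=1):
        cells = "  ".join(f"{stage}: {row[stage]}" for stage in STAGES)
        print(f"clock {clock}: {cells}")
```

The run takes 5 + (5 - 1) = 9 clocks, and at clock 5 every stage is busy, with the lw in WB and the add in IF.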

Pipeline Processor Operation Summary
Pipelining replaces the single-cycle processor with a multi-stage processor, each stage completing one part of each instruction.
A new instruction is started every clock cycle.
Inter-process registers store instruction information (data, write register, branch conditions) between clock cycles as it passes between the pipeline stages.
When the pipeline is filled with instructions, an instruction completes every clock cycle.

Exercise 1
On the diagram on the next page, identify the following:
1. Highlight all the control lines that must be active during a load word instruction.
2. As in our exercise in Lecture 20, identify the decoder locations.
3. The ID/EX register interface stores the most bits of any of the pipeline section interfaces. Approximately how many bits is that, according to the diagram?

Print out a copy of this diagram and bring to class.
[Figure: full pipeline design with control, showing registers ID/EX, EX/MEM, MEM/WB, the Control Decode block, and the control lines Register Write, Memory/ALU Result, Branch, Memory Write, Memory Read, ALU Srce, ALU Op, Reg. Dst.]

Hazards
Hazards occur because data required for executing the current instruction may not yet be available.
An instruction in the register fetch cycle may need data from a register whose value will be changed by an instruction downstream but still in process in the pipeline (in the ALU, memory/memory bypass or writeback cycle).
Thus an upstream instruction could access a register and get incorrect data because the register data has not yet been updated by a downstream instruction.

Hazards (2)
There are two types of hazards: data hazards and control hazards.
Both occur because an instruction in the ID/RF stage of the MIPS pipeline needs register data that will be changed by an instruction still in the ALU/EX, MEM/Bypass, or WB stage.
Data hazards occur when an instruction needs register contents for an arithmetic/logical/memory instruction.
Control hazards occur when a branch instruction is pending and the data needed to decide the branch is not yet available in the same sort of scenario.

Data Hazard in the Pipeline
[Figure: pipelined timeline over clock cycles 0-10. Each instruction takes 5 clock cycles (Instruction Fetch, Register Fetch, ALU Process, Memory R/W or ALU Out, Register Write) and starts one cycle after the previous one.]
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
In the instruction sequence above, the last four instructions require data from $2, which is changed in the first instruction.
The $2 data will not be rewritten until cycle 4, so the and and or instructions will fetch incorrect data from $2.
Even the add may not get the correct information (sw is okay).
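
The timing argument above can be checked mechanically. The sketch below is a simplified Python model under stated assumptions: instruction i is fetched in cycle i, reads its registers in cycle i + 1, and writes its result in cycle i + 4, with no forwarding. A reader whose register-fetch cycle falls before the producer's write cycle is flagged as a hazard; a reader in the same cycle is flagged separately, since whether it sees the new value depends on how the register file is built. The tuple layout and names are illustrative.

```python
# Each entry: (text, destination register or None, source registers).
PROGRAM = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]

ID_OFFSET = 1  # register fetch happens one cycle after instruction fetch
WB_OFFSET = 4  # register write happens four cycles after instruction fetch


def classify(program):
    """Label each instruction 'ok', 'same-cycle', or 'hazard'."""
    results = []
    for j, (text, _dest, sources) in enumerate(program):
        status = "ok"
        read_cycle = j + ID_OFFSET
        for i in range(j):
            producer_dest = program[i][1]
            if producer_dest is None or producer_dest not in sources:
                continue
            write_cycle = i + WB_OFFSET
            if read_cycle < write_cycle:
                status = "hazard"  # reads before the new value is written
            elif read_cycle == write_cycle and status != "hazard":
                status = "same-cycle"  # read and write collide
        results.append((text, status))
    return results


if __name__ == "__main__":
    for text, status in classify(PROGRAM):
        print(f"{text:<18} {status}")
```

Under these assumptions the and and or are flagged as hazards, the add reads $2 in the same cycle it is written, and the sw is safe.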

Control Hazards in the Pipeline
[Figure: pipelined timeline over clock cycles 0-10. Each instruction takes 5 clock cycles (Instruction Fetch, Register Fetch, ALU Process, Memory R/W or ALU Out, Register Write) and starts one cycle after the previous one. The sequence shown is sub $2, $1, $3; blt $2, $8, wait; a second conditional branch comparing $2 and $7; add $14, $2, $2; sw $15, 100($2).]
Here the problem is changed, with two branch instructions added.
Neither branch instruction may be executed correctly, once again because $2 has not yet been updated when the branches read it.
This wrong data could cause an incorrect branch.

Forwarding as a Solution to Data Hazards
[Figure: two overlapped instructions over clock cycles 0-5, each passing through IF, ID/RF, ALU, MEM, WB.]
One solution to the problem of data hazards is forwarding.
Forwarding uses the fact that although instruction 2 needs register data two clock cycles before instruction 1 enters the WB stage, that data is already available as the output of the ALU.
If a mechanism were available, instruction 1 could forward required register data after its ALU cycle to the ID/RF cycle of instruction 2.

Forwarding Unit in the Pipeline
[Figure: ID/EX, EX/MEM, and MEM/WB registers with a forwarding unit. The unit receives Rs and Rt from ID/EX, the EX/MEM register Rd, and the MEM/WB register Rd, and drives the Forward A and Forward B multiplexers on the ALU inputs.]

Forwarding Unit Operation
[Figure: forwarding unit connected to the register block, ALU, and memory.]
The forwarding unit samples register ids in the EX/MEM and MEM/WB registers to determine if source registers in the ID/RF cycle are the same.
If so, source register data is replaced by pipeline (as yet unwritten) data by the forwarding unit.
The correct information is thus processed and the instruction can proceed to correct execution.
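
The comparison performed by the forwarding unit can be sketched in a few lines of Python. This is an illustrative model, not the lecture's circuit: the `PipeReg` record, the function names, and the two-bit select codes ("00" no forwarding, "10" take the EX/MEM value, "01" take the MEM/WB value) are assumptions in the style of the textbook's forwarding conditions. Register $0 is never forwarded, and the newer EX/MEM result takes priority over the older MEM/WB result.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """The fields of an inter-stage register that the forwarding unit samples."""
    reg_write: bool  # True if the instruction held here will write a register
    rd: int          # destination register number of that instruction


NO_FORWARD, FROM_MEM_WB, FROM_EX_MEM = "00", "01", "10"


def forward_select(src, ex_mem, mem_wb):
    """Choose the data source for one ALU operand whose register id is `src`."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return FROM_EX_MEM  # newest pending result wins
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return FROM_MEM_WB
    return NO_FORWARD       # use the value read from the register file


def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Return the (Forward A, Forward B) selects for source registers Rs and Rt."""
    return forward_select(rs, ex_mem, mem_wb), forward_select(rt, ex_mem, mem_wb)


if __name__ == "__main__":
    # and $12, $2, $5 right behind sub $2, $1, $3 (sub is held in EX/MEM).
    print(forwarding_unit(2, 5, PipeReg(True, 2), PipeReg(False, 0)))
    # or $13, $6, $2 one cycle later (and in EX/MEM, sub in MEM/WB).
    print(forwarding_unit(6, 2, PipeReg(True, 12), PipeReg(True, 2)))
```

In the first case operand A is taken from EX/MEM; in the second, operand B is taken from MEM/WB.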

Stalls
Forwarding will not always solve the problems of data hazards.
For example, suppose an add instruction follows a load word (lw) and the add involves the register that receives the memory data.
In this case, forwarding will not work.
The data being loaded from memory will not be available until the end of the MEM cycle. Thus the required data is not available for a forward, and the add instruction, if it proceeds, will process the wrong data.
A solution to this problem is the stall.
A stall halts the instruction awaiting data, while the key instruction (here, the lw) proceeds to the end of its MEM cycle, after which the desired data is available to the add.

Result of Stall Approach
[Figure: pipelined timeline over clock cycles 0-10. Each instruction takes 5 clock cycles (Instruction Fetch, Register Fetch, ALU Process, Memory R/W or ALU Out, Register Write).]
lw $2, 32($3)
add $14, $6, $2
sw $15, 80($2)
Consider the 3 instructions above, the last two depending on the lw.
$2 contents will be available at the beginning of the WB stage in the first instruction, but not before.
With a stall, the add and sw instructions hold place for one cycle.

Result of Stall Approach (2)
[Figure: pipelined timeline over clock cycles 0-10, with the add and sw each starting one clock cycle later than in the previous slide.]
lw $2, 32($3)
add $14, $6, $2 (delayed 1 count)
sw $15, 80($2) (delayed 1 count)
With the delay, the lw result feeds the ALU input stage of the add instruction, and the register fetch stage of the sw.
Note that forwarding is still required (this time from the MEM/WB interface, not the ALU output).
Any further instructions following a lw must also be delayed for one clock cycle.
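
The stall rule for a load followed immediately by a dependent instruction can be sketched as follows. This is a minimal Python model under stated assumptions: a stall is needed only when an instruction reads the register loaded by the lw directly ahead of it, forwarding covers every other dependency, and each stall delays all later instructions by one cycle. The tuple layout and the names `insert_stalls` and `fetch_cycles` are illustrative.

```python
STALL = "stall"

# Each entry: (text, destination register or None, source registers, is_load).
PROGRAM = [
    ("lw $2, 32($3)", 2, (3,), True),
    ("add $14, $6, $2", 14, (6, 2), False),
    ("sw $15, 80($2)", None, (15, 2), False),
]


def insert_stalls(program):
    """Return the issue order, with a one-cycle stall after each load-use pair."""
    schedule = []
    previous = None
    for instr in program:
        text, _dest, sources, _is_load = instr
        if previous is not None and previous[3] and previous[1] in sources:
            schedule.append(STALL)  # wait for the load's MEM cycle to finish
        schedule.append(text)
        previous = instr
    return schedule


def fetch_cycles(program):
    """Map each instruction to the clock cycle (0-based) in which it starts."""
    cycles = {}
    for cycle, entry in enumerate(insert_stalls(program)):
        if entry != STALL:
            cycles[entry] = cycle
    return cycles


if __name__ == "__main__":
    print(insert_stalls(PROGRAM))
    print(fetch_cycles(PROGRAM))
```

Only one stall is inserted: the add waits one cycle, and the sw, which sits behind the add, is delayed by the same cycle.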

Other Problems With Branches
A remaining problem is what to do about instructions following a branch. Even assuming forwarding and stalls, the branch/no branch decision is not made until the third stage. This means that in the MIPS pipeline, two following instructions will enter the pipe before the branch/no branch decision is made. What if:
The following instructions were for the case of branch taken and the branch was not taken.
The following instructions were for branch not taken and it was taken.
In either case, the wrong instructions are in the pipe and they must be eliminated (flushed). How can this problem be prevented?
A few approaches to the problem are shown in the following slides.

Control Hazard Approaches (1)
[Figure: MIPS R-2000 pipeline processor, stages IF, ID/RF, ALU/EX, MEM/Bypass, WB, with an arrow showing the direction of pipeline flow.]
One approach is to always assume the branch is or is not taken:
Say we assume the branch is never taken. Then if the instruction in ALU/EX is a branch, the instructions in IF and ID/RF will be those in the not taken program line (branch determination is made in ALU/EX).
If this assumption is correct, the pipeline will continue to flow without delay.
When the branch is taken, instructions in IF and ID/RF must be flushed, usually by changing the op code of those instructions to a nop and letting them proceed to the end of the pipe.
Clearly, a 2-clock time delay is involved here, and it would be worse for longer pipelines (P-IV pipeline ~ 20 stages).

Control Hazard Approaches (2)
[Figure: MIPS R-2000 pipeline processor, stages IF, ID/RF, ALU/EX, MEM/Bypass, WB, with a branch comparator added at the ID/RF stage.]
Reducing the cost of taking the branch:
In this case, a branch assumption is still made (taken or not taken).
Since the branch is identified in the ID/RF stage, a comparator can be added there to do the branch/no-branch determination.
With the branch determination made in this early stage, only one instruction must be flushed, in the IF stage (only a 1-instruction delay).
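
To compare the two flush costs, the sketch below counts the clock cycles lost to flushing under an always-assume-not-taken policy. It is a rough Python model under stated assumptions: every taken branch costs a fixed number of flushed instructions (2 when the decision is made in ALU/EX, 1 when a comparator makes it in ID/RF), no other stalls occur, and the instruction count and branch outcomes are made-up sample values.

```python
EX_RESOLVE_PENALTY = 2  # branch decided in ALU/EX: IF and ID/RF are flushed
ID_RESOLVE_PENALTY = 1  # branch decided in ID/RF: only IF is flushed


def flushed_slots(outcomes, penalty):
    """Cycles lost when every taken branch (True) flushes `penalty` instructions."""
    return sum(penalty for taken in outcomes if taken)


def total_cycles(n_instructions, outcomes, penalty, stages=5):
    """Ideal pipeline time s + (n - 1) plus the cycles lost to flushes."""
    return stages + (n_instructions - 1) + flushed_slots(outcomes, penalty)


if __name__ == "__main__":
    # Hypothetical loop: 40 instructions executed, the loop branch is taken
    # 9 times and then falls through once.
    outcomes = [True] * 9 + [False]
    print(total_cycles(40, outcomes, EX_RESOLVE_PENALTY))
    print(total_cycles(40, outcomes, ID_RESOLVE_PENALTY))
```

Moving the decision one stage earlier halves the flush cost for this sample; the saving grows with the fraction of branches that are taken.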

Control Hazard Approaches (3)
[Figure: MIPS R-2000 pipeline processor, stages IF, ID/RF, ALU/EX, MEM/Bypass, WB, with a branch history unit providing branch feedback based on history.]
Dynamic branch prediction based on recent branch history:
In this approach, an indicator bit (0/1) gives the last branch condition.
The next branch can be made according to the bit setting.
In a loop, for example, the branch typically goes the same way each time until a substantial number of calculations are complete.
Some schemes use 2 bits and do not change the prediction until the predictor is wrong twice, after which the alternate behavior is chosen.
In either case, incorrect predictions will still be made, but hopefully not as often.
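
The difference between the 1-bit and 2-bit schemes can be seen by counting mispredictions for a loop branch. The Python sketch below is illustrative only: the 2-bit predictor is modelled as a saturating counter (one common 2-bit scheme, assumed here), both predictors start out predicting not taken, and the loop length and call count are arbitrary sample values.

```python
def one_bit_mispredictions(outcomes):
    """Predict that each branch repeats the previous outcome."""
    last = False  # assumed initial state: predict not taken
    misses = 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken
    return misses


def two_bit_mispredictions(outcomes):
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 taken."""
    counter = 0  # assumed initial state: strongly not taken
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


def loop_branch(iterations, calls):
    """Outcomes of a loop-closing branch: taken until the final iteration."""
    return ([True] * (iterations - 1) + [False]) * calls


if __name__ == "__main__":
    outcomes = loop_branch(iterations=10, calls=50)
    print(len(outcomes), one_bit_mispredictions(outcomes),
          two_bit_mispredictions(outcomes))
```

With one history bit, the predictor misses twice per pass through the loop (on exit and again on re-entry). With the saturating counter, after warming up, it misses only on the loop exit.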

Exercise 2
1. Explain forwarding in your own words.
2. ... problem be solved?
3. Why could 2-bit dynamic branch prediction work to ensure about a 1% error rate in branch prediction in a subroutine that loops about 100 times before exiting? Assume that the subroutine is called frequently, and that it always executes 100 or more loop traversals before returning to the calling program.