Fall 2014, Sep 26 . . . ELEC 5200-001/6200-001 Lecture 6
ELEC 5200-001/6200-001 Computer Architecture and Design
Fall 2014
Pipelining (Chapter 6)
Vishwani D. Agrawal
James J. Danaher Professor
Department of Electrical and Computer Engineering
Auburn University, Auburn, AL 36849
http://www.eng.auburn.edu/[email protected]
ILP: Instruction Level Parallelism
Single-cycle and multi-cycle datapaths execute one instruction at a time.
How can we get better performance?
Answer: Execute multiple instructions at a time:
Pipelining – Enhance a multi-cycle datapath to fetch one instruction every cycle.
Parallelism – Fetch multiple instructions every cycle.
Automobile Team Assembly
[Figure: one team performs four 1-hour tasks on one car at a time]
1 car assembled every four hours
6 cars per day
180 cars per month
2,160 cars per year
Automobile Assembly Line
[Figure: four stations in series, 1 hour each: Task 1 (Mechanical), Task 2 (Electrical), Task 3 (Painting), Task 4 (Testing)]
First car assembled in 4 hours (pipeline latency); thereafter, 1 car completed per hour
21 cars on first day, thereafter 24 cars per day
717 cars per month
8,637 cars per year
What gives the 4X increase?
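The assembly line counts can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original slides; the function names are made up for illustration, and the 30-day month and 360-day year are assumptions inferred from the slide's totals.

```python
# Hypothetical helpers for the car-assembly analogy (names are illustrative).
N_TASKS = 4  # four 1-hour tasks per car, as on the slide


def team_cars(hours, n_tasks=N_TASKS):
    """Cars finished by a single team that builds one car at a time."""
    return hours // n_tasks


def line_cars(hours, n_tasks=N_TASKS):
    """Cars finished by an assembly line: the first car needs n_tasks hours,
    then one car is completed every hour."""
    return max(0, hours - n_tasks + 1)


# Assumed calendar: 24-hour day, 30-day month, 360-day year.
DAY, MONTH, YEAR = 24, 30 * 24, 360 * 24

for label, hours in (("day", DAY), ("month", MONTH), ("year", YEAR)):
    print(label, team_cars(hours), line_cars(hours))
```

The ratio of the two counts approaches the number of tasks (4) as the time span grows, which is the point of the slide's closing question.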
Throughput: Team Assembly
[Figure: timeline in which the red car goes through Mechanical, Electrical, Painting, and Testing from "Red car started" to "Red car completed"; only then is the blue car started and taken through the same four steps]
Time of assembling one car = n hours, where n is the number of nearly equal subtasks, each requiring 1 unit of time
Throughput = 1/n cars per unit time
Throughput: Assembly Line
[Figure: timeline in which Car 1, Car 2, Car 3, Car 4, . . . each go through Mechanical, Electrical, Painting, and Testing, each car starting one time unit after the previous one; Car 1 completes after n time units and Car 2 one time unit later]
Time to complete first car = n time units (latency)
Cars completed in time T = T – n + 1
Throughput = 1 – (n – 1)/T cars per unit time
\[
\frac{\text{Throughput (assembly line)}}{\text{Throughput (team assembly)}}
= \frac{1 - (n-1)/T}{1/n}
= n - \frac{n(n-1)}{T} \;\to\; n \quad \text{as } T \to \infty
\]
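The limit in the throughput ratio can be seen numerically. The following is a small sketch under the slide's assumptions (n equal subtasks of one time unit each); the function name is hypothetical.

```python
def throughput_ratio(n, T):
    """Ratio of assembly-line throughput to team-assembly throughput
    over an interval of T time units with n one-unit subtasks."""
    line = (T - n + 1) / T   # cars per unit time on the assembly line
    team = 1 / n             # cars per unit time with team assembly
    return line / team


# The ratio stays below n and approaches it as T grows.
for T in (4, 40, 400, 4000):
    print(T, throughput_ratio(4, T))
```

For T = n the line has only just delivered its first car, so the ratio is exactly 1; the gain appears only once the pipeline is full.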
Some Features of Assembly Line
[Figure: assembly line with four 1-hour stations: Task 1 (Mechanical), Task 2 (Electrical), Task 3 (Painting), Task 4 (Testing)]
Electrical parts delivered (JIT)
Defect found
Stall assembly line to fix the cause of defect
3 cars in the assembly line are suspects, to be removed (flush pipeline)
Pros and Cons
Advantages:
– Efficient use of labor.
– Specialists can do a better job.
– Just in time (JIT) methodology eliminates warehouse cost.
Disadvantages:
– Penalty of defect latency.
– Lack of flexibility in production.
– Assembly line work is monotonous and boring.
https://www.youtube.com/watch?v=IjarLbD9r30
https://www.youtube.com/watch?v=ANXGJe6i3G8
https://www.youtube.com/watch?v=5lp4EbfPAtI
Pipelining in a Computer
Divide datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources.
Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task.
Synchronize tasks with a clock having a cycle time that just exceeds the time required by the longest task.
Break each instruction down into the set of tasks so that instructions can be executed in a staggered fashion.
Pipelining a Single-Cycle Datapath
Instruction class | Instr. fetch (IF) | Instr. decode, also reg. file read (ID) | Execution, ALU operation (EX) | Data access (MEM) | Write back, reg. file write (WB) | Total time
lw | 2ns | 1ns | 2ns | 2ns | 1ns | 8ns
sw | 2ns | 1ns | 2ns | 2ns | idle | 8ns
R-format (add, sub, and, or, slt) | 2ns | 1ns | 2ns | idle | 1ns | 8ns
B-format (beq) | 2ns | 1ns | 2ns | idle | idle | 8ns
idle: No operation on data; idle times equalize instruction lengths.
Execution Time: Single-Cycle
lw $1, 100($0): IF ID EX MEM WB, from 0 ns to 8 ns
lw $2, 200($0): IF ID EX MEM WB, from 8 ns to 16 ns
lw $3, 300($0): IF ID EX MEM WB, from 16 ns to 24 ns
Clock cycle time = 8 ns
Total time for executing three lw instructions = 24 ns
Pipelined Datapath
Instruction class | Instr. fetch (IF) | Instr. decode, also reg. file read (ID) | Execution, ALU operation (EX) | Data access (MEM) | Write back, reg. file write (WB) | Total time
lw | 2ns | 1ns in a 2ns slot | 2ns | 2ns | 1ns in a 2ns slot | 10ns
sw | 2ns | 1ns in a 2ns slot | 2ns | 2ns | 1ns in a 2ns slot | 10ns
R-format (add, sub, and, or, slt) | 2ns | 1ns in a 2ns slot | 2ns | 2ns | 1ns in a 2ns slot | 10ns
B-format (beq) | 2ns | 1ns in a 2ns slot | 2ns | 2ns | 1ns in a 2ns slot | 10ns
No operation on data; idle time inserted to equalize instruction lengths.
Execution Time: Pipeline
Time (ns)             0    2    4    6    8    10   12   14
lw $1, 100($0)        IF   ID   EX   MEM  WB
lw $2, 200($0)             IF   ID   EX   MEM  WB
lw $3, 300($0)                  IF   ID   EX   MEM  WB
Clock cycle time = 2 ns, four times faster than single-cycle clock
Total time for executing three lw instructions = 14 ns
\[
\text{Performance ratio} = \frac{\text{Single-cycle time}}{\text{Pipeline time}} = \frac{24}{14} \approx 1.7
\]
Pipeline Performance
Clock cycle time = 2 ns
1,003 lw instructions:
Total time for executing 1,003 lw instructions = 2,014 ns
\[
\text{Performance ratio} = \frac{\text{Single-cycle time}}{\text{Pipeline time}} = \frac{8{,}024}{2{,}014} \approx 3.98
\]
10,003 lw instructions:
Performance ratio = 80,024 / 20,014 = 3.998 → Clock cycle ratio (4)
Pipeline performance approaches clock-cycle ratio for long programs.
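The totals on these slides follow from two simple formulas: a single-cycle machine needs one 8 ns cycle per instruction, while a five-stage pipeline with a 2 ns clock needs (number of instructions + 4) cycles when there are no stalls. A minimal sketch, with illustrative function names that are not from the lecture:

```python
def single_cycle_time_ns(n_instr, cycle_ns=8):
    """Total time when every instruction takes one long clock cycle."""
    return n_instr * cycle_ns


def pipeline_time_ns(n_instr, stages=5, cycle_ns=2):
    """Total time for an ideal (stall-free) pipeline: the first instruction
    takes `stages` cycles, each later one completes one cycle after the last."""
    return (n_instr + stages - 1) * cycle_ns


for n in (3, 1003, 10003):
    s = single_cycle_time_ns(n)
    p = pipeline_time_ns(n)
    print(n, s, p, round(s / p, 3))
```

The ratio is bounded by the clock-cycle ratio 8 ns / 2 ns = 4 and only approaches it once the 4-cycle fill time is negligible compared with the program length.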
Single-Cycle Datapath
[Figure: single-cycle datapath with PC, Add units, Instr. mem., Reg. File, Sign ext., Shift left 2, ALU, ALU Cont., Data mem., multiplexers, and a CONTROL block driven by the opcode (bits 26-31) that generates RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, RegWrite, and Branch. The datapath is partitioned into five sections:]
IF: Instr. fetch
ID: Instr. decode, reg. file read
EX: Execute, address calc.
MEM: mem. access
WB: write back
Pipelining of RISC Instructions
(From Lecture 3, Slide 6)
Fetch Instruction → Examine Opcode → ALU Operation → Memory Read/Write → Store Result
IF: Instruction Fetch
ID: Decode instruction and Fetch operands
EX: Execute
MEM: Memory Operation
WB: Write Back to Reg file
Although an instruction takes five clock cycles, one instruction is completed every cycle.
Pipeline Registers
[Figure: the single-cycle datapath and its CONTROL with four pipeline registers, IF/ID, ID/EX, EX/MEM, and MEM/WB, inserted at the stage boundaries]
This requires a CONTROL not too different from single-cycle
Pipeline Register Functions
Four pipeline registers are added:
Register name | Data held
IF/ID | PC+4, Instruction word (IW)
ID/EX | PC+4, R1, R2, IW(0-15) sign ext., IW(11-15)
EX/MEM | PC+4, zero, ALUResult, R2, IW(11-15) or IW(16-20)
MEM/WB | M[ALUResult], ALUResult, IW(11-15) or IW(16-20)
Pipelined Datapath
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB; the destination register number is taken from bits 11-15 for R-type and bits 16-20 for I-type lw]
Five-Cycle Pipeline
[Figure: one instruction moving through the pipeline in clock cycles CC1 to CC5: PC and IM in CC1, IF/ID and ID, REG. FILE READ in CC2, ID/EX and ALU in CC3, EX/MEM and DM in CC4, MEM/WB and REG. FILE WRITE in CC5]
Add Instruction
add $t0, $s1, $s2
Machine instruction word:
000000 (opcode) 10001 ($s1) 10010 ($s2) 01000 ($t0) 00000 100000 (function)
[Figure: the instruction in the five-cycle pipeline, CC1 to CC5]
IF (CC1): fetch instruction
ID (CC2): read $s1, read $s2
EX (CC3): add $s1+$s2
MEM (CC4): no operation
WB (CC5): write $t0
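The bit fields of the add instruction word can be assembled mechanically. This sketch is illustrative only: the function name is hypothetical, and the register numbers ($s1 = 17, $s2 = 18, $t0 = 8) are the standard MIPS register numbers, which agree with the 5-bit fields shown on the slide.

```python
def encode_r_format(rs, rt, rd, funct, shamt=0, opcode=0):
    """Pack a MIPS R-format word: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    fields = ((opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6))
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


S1, S2, T0 = 17, 18, 8      # register numbers of $s1, $s2, $t0
ADD_FUNCT = 0b100000        # function field for add

word = encode_r_format(rs=S1, rt=S2, rd=T0, funct=ADD_FUNCT)
bits = f"{word:032b}"
# Print the word split into its six fields.
print(bits[0:6], bits[6:11], bits[11:16], bits[16:21], bits[21:26], bits[26:32])
```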
Pipelined Datapath Executing add
[Figure: pipelined datapath executing add $t0, $s1, $s2. Register numbers s1 and s2 select the register file outputs $s1 and $s2, the ALU adds them, the data memory (addr, data) is bypassed, and the result is written to register t0, whose number comes from bits 11-15 for R-type (16-20 for I-type lw).]
Load Instruction
lw $t0, 1200($t1)
Machine instruction word:
100011 (opcode) 01001 ($t1) 01000 ($t0) 0000 0100 1011 0000 (1200)
[Figure: the instruction in the five-cycle pipeline, CC1 to CC5]
IF (CC1): fetch instruction
ID (CC2): read $t1, sign ext 1200
EX (CC3): add $t1+1200
MEM (CC4): read M[addr]
WB (CC5): write $t0
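The I-format words used by lw and sw can be built the same way. The sketch below is an illustration, not material from the lecture; the function name is hypothetical, and the register numbers ($t0 = 8, $t1 = 9) are the standard MIPS numbers matching the fields 01000 and 01001 on the slide. Note that 1200 in 16-bit binary is 0000 0100 1011 0000 (0x04B0).

```python
def encode_i_format(opcode, rs, rt, imm):
    """Pack a MIPS I-format word: opcode(6) rs(5) rt(5) immediate(16)."""
    if not 0 <= opcode < 64 or not 0 <= rs < 32 or not 0 <= rt < 32:
        raise ValueError("opcode or register field out of range")
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError("immediate does not fit in 16 signed bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


LW, SW = 0b100011, 0b101011   # opcodes as shown on the slides
T0, T1 = 8, 9                 # register numbers of $t0 and $t1

lw_word = encode_i_format(LW, rs=T1, rt=T0, imm=1200)   # lw $t0, 1200($t1)
sw_word = encode_i_format(SW, rs=T1, rt=T0, imm=1200)   # sw $t0, 1200($t1)
print(f"{lw_word:032b}")
print(f"{sw_word:032b}")
```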
Pipelined Datapath Executing lw
[Figure: pipelined datapath executing lw $t0, 1200($t1). Register number t1 selects the register file output $t1, the sign-extended constant 1200 is added to it in the ALU to form addr, the data memory is read at addr, and the data is written to register t0, whose number comes from bits 16-20 for I-type lw (11-15 for R-type).]
Store Instruction
sw $t0, 1200($t1)
Machine instruction word:
101011 (opcode) 01001 ($t1) 01000 ($t0) 0000 0100 1011 0000 (1200)
[Figure: the instruction in the five-cycle pipeline, CC1 to CC5]
IF (CC1): fetch instruction
ID (CC2): read $t1, sign ext 1200
EX (CC3): add $t1+1200 (addr)
MEM (CC4): write M[addr] ← $t0
WB (CC5): no operation
Pipelined Datapath Executing sw
[Figure: pipelined datapath executing sw $t0, 1200($t1). Register numbers t1 and t0 select the register file outputs $t1 and $t0, the sign-extended constant 1200 is added to $t1 in the ALU to form addr, and $t0 is written to the data memory at addr.]
Executing a Program
Consider a five-instruction segment:
lw  $10, 20($1)
sub $11, $2, $3
add $12, $3, $4
lw  $13, 24($1)
add $14, $5, $6
Program Execution
[Figure: program instructions (vertical axis) against time (horizontal axis); each instruction uses IM, REG. FILE READ, ALU, DM, and REG. FILE WRITE in successive clock cycles]
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
lw  $10, 20($1)       IF   ID   EX   MEM  WB
sub $11, $2, $3            IF   ID   EX   MEM  WB
add $12, $3, $4                 IF   ID   EX   MEM  WB
lw  $13, 24($1)                      IF   ID   EX   MEM  WB
add $14, $5, $6                           IF   ID   EX   MEM  WB
CC5
[Figure: snapshot of the pipelined datapath in clock cycle CC5, with one instruction in each stage]
IF: add $14, $5, $6
ID: lw $13, 24($1)
EX: add $12, $3, $4
MEM: sub $11, $2, $3
WB: lw $10, 20($1)
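The CC5 snapshot follows from the staggered schedule: in a stall-free pipeline, the i-th instruction (counting from 1) enters IF in cycle i and advances one stage per cycle. A minimal sketch, with hypothetical names, that reproduces the stage occupancy for any cycle:

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]


def stage_occupancy(program, cycle):
    """Map each pipeline stage to the instruction occupying it in the given
    clock cycle (1-based), assuming no stalls."""
    occupancy = {}
    for index, instr in enumerate(program):
        stage_index = cycle - 1 - index   # instruction `index` starts IF in cycle index + 1
        if 0 <= stage_index < len(STAGES):
            occupancy[STAGES[stage_index]] = instr
    return occupancy


for stage, instr in stage_occupancy(PROGRAM, cycle=5).items():
    print(f"{stage}: {instr}")
```

Calling the function with cycle=9 leaves only the last add in WB, which shows the pipeline draining.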
Advantages of Pipeline
After the fifth cycle (CC5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency of 5 cycles.
– Pipeline latency is defined as the number of stages in the pipeline, or
– The number of clock cycles after which the first instruction is completed.
The clock cycle time is about four times shorter than that of single-cycle datapath and about the same as that of multicycle datapath.
For multicycle datapath, CPI = 3. ….
So, pipelined execution is faster, but . . .
Science is always wrong. It never solves a problem without creating ten more.
George Bernard Shaw
Pipeline Hazards
Definition: Hazard in a pipeline is a situation in which the next instruction cannot complete execution one clock cycle after completion of the present instruction.
Three types of hazards:
– Structural hazard (resource conflict)
– Data hazard
– Control hazard
Structural Hazard
Two instructions cannot execute due to a resource conflict.
Example: Consider a computer with a common data and instruction memory. The fourth cycle of a lw instruction requires memory access (memory read) and at the same time the first cycle of the fourth instruction requires instruction fetch (memory read). This will cause a memory resource conflict.
Example of Structural Hazard
[Figure: program instructions against time; the IF and MEM stages both use the common memory IM/DM]
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
lw  $10, 20($1)       IF   ID   EX   MEM  WB
sub $11, $2, $3            IF   ID   EX   MEM  WB
add $12, $3, $4                 IF   ID   EX   MEM  WB
lw  $13, 24($1)                      IF   ID   EX   MEM  WB
Common data and instr. mem. needed by two instructions in CC4: MEM of lw $10, 20($1) and IF of lw $13, 24($1).
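The conflict in this example can be found mechanically: with a single memory, trouble arises whenever a load or store reaches MEM in the same cycle in which a later instruction is fetched. The sketch below assumes a stall-free five-stage schedule and uses hypothetical names; it is an illustration of the idea, not a model of the control unit.

```python
MEMORY_OPS = {"lw", "sw"}   # instructions that access data memory in MEM


def memory_conflicts(ops):
    """Return (cycle, mem_instr_index, fetch_instr_index) triples where a
    MEM-stage data access and an instruction fetch need the shared memory
    in the same clock cycle. Indices are 0-based, cycles are 1-based."""
    conflicts = []
    for index, op in enumerate(ops):
        if op in MEMORY_OPS:
            mem_cycle = index + 4        # IF in cycle index + 1, MEM three cycles later
            fetch_index = mem_cycle - 1  # instruction whose IF falls in mem_cycle
            if fetch_index < len(ops):
                conflicts.append((mem_cycle, index, fetch_index))
    return conflicts


program = ["lw", "sub", "add", "lw"]
print(memory_conflicts(program))
```

For the four-instruction example, the only reported conflict is in CC4 between the first lw (index 0) and the fetch of the fourth instruction (index 3).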
Possible Remedies for Structural Hazards
Provide duplicate hardware resources in datapath.
Control unit or compiler can insert delays (no-op cycles) between instructions. This is known as pipeline stall or bubble.
Stall (Bubble) for Structural Hazard
[Figure: program instructions against time; the IF and MEM stages both use the common memory IM/DM]
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
lw  $10, 20($1)       IF   ID   EX   MEM  WB
sub $11, $2, $3            IF   ID   EX   MEM  WB
add $12, $3, $4                 IF   ID   EX   MEM  WB
Stall (bubble)                       .    .    .    .    .
lw  $13, 24($1)                           IF   ID   EX   MEM  WB
Data Hazard
Data hazard means that an instruction cannot be completed because the needed data, being generated by another instruction in the pipeline, is not available.
Example: consider two instructions:
add $s0, $t0, $t1
sub $t2, $s0, $t3 # needs $s0
Example of Data Hazard
[Figure: program instructions against time]
                      CC1  CC2  CC3  CC4  CC5  CC6
add $s0, $t0, $t1     IF   ID   EX   MEM  WB         Write s0 in CC5
sub $t2, $s0, $t3          IF   ID   EX   MEM  WB    Read s0 and t3 in CC3
We need to read s0 from reg file in cycle 3, but s0 will not be written in reg file until cycle 5.
However, s0 will only be used in cycle 4, and it is available at the end of cycle 3.
Forwarding or Bypassing
Output of a resource used by an instruction is forwarded to the input of some resource being used by another instruction.
Forwarding can eliminate some, but not all, data hazards.
Forwarding for Data Hazard
[Figure: program instructions against time, with a forwarding path from the ALU output of add to the ALU input of sub]
                      CC1  CC2  CC3  CC4  CC5  CC6
add $s0, $t0, $t1     IF   ID   EX   MEM  WB         Write s0 in CC5
sub $t2, $s0, $t3          IF   ID   EX   MEM  WB    Read s0 and t3 in CC3
Forwarding: the result of add, available at the end of CC3, is passed directly to the ALU input used by sub in CC4.
Forwarding Unit Hardware
[Figure: forwarding unit hardware between ID/EX, EX/MEM, and MEM/WB. Two forwarding multiplexers (FORW. MUX) feed the ALU inputs. The Forwarding Unit compares the source reg. IDs from the opcode with the destination registers held in EX/MEM and MEM/WB and controls the two multiplexers. A MUX after the Data Mem. supplies the data to the reg. file.]
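The decision made by a forwarding unit can be sketched as a small selection function. The conditions below are the commonly taught textbook-style rules for a five-stage MIPS pipeline and are an assumption here, since the slide shows only the block diagram; the function and argument names are hypothetical.

```python
def forward_select(src_reg, ex_mem_reg_write, ex_mem_dest,
                   mem_wb_reg_write, mem_wb_dest):
    """Choose the source for one ALU operand of the instruction in EX.

    Returns "EX/MEM" or "MEM/WB" when the value must be forwarded from that
    pipeline register, or "REG" when the value read from the register file
    in ID can be used. Register 0 is never forwarded because it is
    hard-wired to zero in MIPS.
    """
    if ex_mem_reg_write and ex_mem_dest != 0 and ex_mem_dest == src_reg:
        return "EX/MEM"   # newest value: produced by the previous instruction
    if mem_wb_reg_write and mem_wb_dest != 0 and mem_wb_dest == src_reg:
        return "MEM/WB"   # value produced two instructions earlier
    return "REG"


S0, T3 = 16, 11   # register numbers of $s0 and $t3

# sub $t2, $s0, $t3 is in EX while add $s0, $t0, $t1 is in MEM.
print(forward_select(S0, True, S0, False, 0))   # operand $s0
print(forward_select(T3, True, S0, False, 0))   # operand $t3
```

Checking EX/MEM before MEM/WB matters when both hold the same destination register: the more recent result must win.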
Forwarding Alone May Not Work
[Figure: program instructions against time]
                      CC1  CC2  CC3  CC4  CC5  CC6
lw  $s0, 20($s1)      IF   ID   EX   MEM  WB         Write s0 in CC5
sub $t2, $s0, $t3          IF   ID   EX   MEM  WB    Read s0 and t3 in CC3
Data needed by sub in CC4 (data hazard).
Data available from memory only at the end of cycle 4.
Use Bubble and Forwarding
[Figure: program instructions against time, with a stall (bubble) between the two instructions and a forwarding path from the data memory output of lw to the ALU input of sub]
                      CC1  CC2  CC3  CC4  CC5  CC6  CC7
lw  $s0, 20($s1)      IF   ID   EX   MEM  WB              Write s0 in CC5
stall (bubble)             .    .    .    .    .
sub $t2, $s0, $t3               IF   ID   EX   MEM  WB
Forwarding: the data read from memory at the end of CC4 is passed to the ALU input used by sub in CC5.
Hazard Detection Unit Hardware
[Figure: the forwarding hardware (Forwarding Unit, two FORW. MUX at the ALU inputs, ID/EX, EX/MEM, MEM/WB, Data Mem., data to reg. file) extended with a Hazard Detection Unit. The unit receives the instruction in IF/ID and the register IDs from the prev. instr.; it can disable writing of PC and IF/ID, and a NOP MUX selects either the Control signals or 0 for the instruction entering ID/EX.]
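The check performed by a hazard detection unit for the load-use case can be written as one condition. This is a sketch of the usual textbook-style rule, assumed here because the slide gives only the block diagram; the names are hypothetical.

```python
def load_use_stall(id_ex_mem_read, id_ex_dest, if_id_src1, if_id_src2):
    """Return True when the instruction in ID must be stalled for one cycle:
    the instruction ahead of it (now in EX) is a load, and its destination
    register is one of the sources of the instruction in ID."""
    return bool(id_ex_mem_read and id_ex_dest in (if_id_src1, if_id_src2))


S0, T3 = 16, 11   # register numbers of $s0 and $t3

# lw $s0, 20($s1) is in EX while sub $t2, $s0, $t3 is in ID: stall needed.
print(load_use_stall(True, S0, S0, T3))
# add $s0, $t0, $t1 is in EX instead: forwarding suffices, no stall.
print(load_use_stall(False, S0, S0, T3))
```

When the condition holds, the unit freezes PC and IF/ID (disable write) and injects a no-op into ID/EX, which creates the bubble.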
Resolving Hazards
Hazards are resolved by Hazard detection and forwarding units.
Compiler's understanding of how these units work can improve performance.
Avoiding Stall by Code Reorder
C code:
A = B + E;
C = B + F;
MIPS code:
lw  $t1, 0($t0)       $t1 written
lw  $t2, 4($t0)       $t2 written
add $t3, $t1, $t2     $t1, $t2 needed
sw  $t3, 12($t0)
lw  $t4, 8($t0)       $t4 written
add $t5, $t1, $t4     $t4 needed
sw  $t5, 16($t0)
Reordered Code
C code:
A = B + E;
C = B + F;
MIPS code:
lw  $t1, 0($t0)
lw  $t2, 4($t0)
lw  $t4, 8($t0)
add $t3, $t1, $t2     no hazard
sw  $t3, 12($t0)
add $t5, $t1, $t4     no hazard
sw  $t5, 16($t0)
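The benefit of reordering can be counted. The sketch below assumes a pipeline with forwarding, in which the only remaining data stall is the one-cycle load-use stall (a lw immediately followed by an instruction that reads the loaded register). The tuple layout and function name are made up for illustration.

```python
# Each instruction: (opcode, destination register or None, source registers).
ORIGINAL = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3", "$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5", "$t0"]),
]

REORDERED = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3", "$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5", "$t0"]),
]


def count_load_use_stalls(program):
    """Count adjacent pairs in which a lw is followed directly by an
    instruction that reads the register being loaded."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        opcode, dest, _ = previous
        if opcode == "lw" and dest in current[2]:
            stalls += 1
    return stalls


print("original: ", count_load_use_stalls(ORIGINAL))
print("reordered:", count_load_use_stalls(REORDERED))
```

Moving the third lw ahead of the first add separates every load from its first use by at least one instruction, so forwarding alone covers all remaining dependences.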
Control Hazard
Instruction to be fetched is not known!
Example: Instruction being executed is branch-type, which will determine the next instruction:
      add $4, $5, $6
      beq $1, $2, 40
      next instruction
      . . .
40    and $7, $8, $9
Stall on Branch
[Figure: program instructions against time]
                                      CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
add $4, $5, $6                        IF   ID   EX   MEM  WB
beq $1, $2, 40                             IF   ID   EX   MEM  WB
Stall (bubble)                                  .    .    .    .    .
next instruction or and $7, $8, $9                   IF   ID   EX   MEM  WB
Why Only One Stall?
Extra hardware in ID phase:
– Additional ALU to compute branch address
– Comparator to generate zero signal
– Hazard detection unit writes the branch address in PC
Ways to Handle Branch
Stall or bubble
Branch prediction:
– Heuristics
  Next instruction
  Prediction based on statistics (dynamic)
  Hardware decision (dynamic)
– Prediction error: pipeline flush
Delayed branch
Delayed Branch Example
Stall on branch:
      add $4, $5, $6
      beq $1, $2, skip
      next instruction
      . . .
skip  or $7, $8, $9
Delayed branch:
      beq $1, $2, skip
      add $4, $5, $6        Instruction executed irrespective of branch decision
      next instruction
      . . .
skip  or $7, $8, $9
Delayed Branch
[Figure: program instructions against time; no stall is needed]
                                          CC1  CC2  CC3  CC4  CC5  CC6  CC7
beq $1, $2, skip                          IF   ID   EX   MEM  WB
add $4, $5, $6                                 IF   ID   EX   MEM  WB
next instruction or skip: or $7, $8, $9             IF   ID   EX   MEM  WB
Summary: Hazards
Structural hazards
– Cause: resource conflict
– Remedies: (i) hardware resources, (ii) stall (bubble)
Data hazards
– Cause: data unavailability
– Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering
Control hazards
– Cause: out-of-sequence execution (branch or jump)
– Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush