
Chapter 4 (Part II)bai/EGC442/Lecture Notes Comp_arch_chp4_5th(2).pdf · Chapter 4 –The Processor...

Date post: 25-Mar-2020
Transcript
Page 1: Chapter 4 (Part II)bai/EGC442/Lecture Notes Comp_arch_chp4_5th(2).pdf · Chapter 4 –The Processor EGC442 Introduction to Computer Architecture Division of Engineering Programs –SUNY

Chapter 4 – The Processor EGC442 Introduction to Computer Architecture

Division of Engineering Programs – SUNY New Paltz 1

The Processor

Chapter 4 (Part II)

Baback Izadi
Division of Engineering Programs
[email protected]

SUNY – New PaltzElect. & Comp. Eng. 2


Sequential Laundry

Sequential laundry takes 8 hours for 4 loads

[Figure: task order (loads A, B, C, D) versus time from 6 PM to 2 AM. Each load runs through four 30-minute steps and finishes before the next load starts.]


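Pipelined Laundry

[Figure: task order (loads A, B, C, D) versus time from 6 PM to 9:30 PM in 30-minute steps. Each load starts one step after the previous one, so the loads overlap.]

Pipelined laundry: overlapping execution
Parallelism improves performance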


Pipelining Analogy

Four loads: Speedup = 8/3.5 ≈ 2.3

Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages

Pipelining doesn’t help latency of single task, it helps throughput of entire workload

Multiple tasks operating simultaneously using different resources

Potential speedup = Number of pipe stages

Pipeline rate limited by slowest pipeline stage

Unbalanced lengths of pipe stages reduce speedup

Time to “fill” pipeline and time to “drain” it reduce speedup

Stall for dependencies
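
The two speedup figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming four stages of 0.5 hours each as in the laundry example; the function names are illustrative and not part of the lecture.

```python
def sequential_hours(loads, stages=4, stage_hours=0.5):
    """Each load runs through every stage before the next load starts."""
    return loads * stages * stage_hours


def pipelined_hours(loads, stages=4, stage_hours=0.5):
    """The first load fills the pipeline; each later load adds one stage time."""
    return (stages + loads - 1) * stage_hours


def speedup(loads, stages=4, stage_hours=0.5):
    """Ratio of sequential time to pipelined time for the same number of loads."""
    return sequential_hours(loads, stages, stage_hours) / pipelined_hours(
        loads, stages, stage_hours
    )


if __name__ == "__main__":
    for n in (4, 100, 10_000):
        print(n, round(speedup(n), 2))
```

With four loads the fill and drain time is a large share of the total, which is why the speedup stays well below the number of stages; as the number of loads grows, the ratio approaches 4.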


Single Cycle Implementation
Calculate cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns)

[Figure: single-cycle MIPS datapath with PC, instruction memory, registers, sign extend, ALU and ALU control, data memory, two adders, shift left 2 and multiplexers, with control signals RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg and PCSrc]

Instruction   Instr. Memory   Register Read   ALU Op.   Data Memory   Reg. Write   Total
R-format      200ps           100ps           200ps     0             100ps        600ps
lw            200ps           100ps           200ps     200ps         100ps        800ps
sw            200ps           100ps           200ps     200ps                      700ps
beq           200ps           100ps           200ps                                500ps


Single Stage vs. Pipeline Performance

Instruction   Instr. Memory   Register Read   ALU Op.   Data Memory   Reg. Write   Total
R-format      200ps           100ps           200ps     0             100ps        600ps
lw            200ps           100ps           200ps     200ps         100ps        800ps
sw            200ps           100ps           200ps     200ps                      700ps
beq           200ps           100ps           200ps                                500ps

Single-cycle (Tc = 800ps)

Pipelined (Tc = 200ps)
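
The two clock periods follow directly from the table: the single-cycle clock must fit the slowest instruction, while the pipelined clock must fit the slowest stage. The sketch below assumes the stage latencies listed in the table and an ideal five-stage pipeline with no stalls; the names are illustrative.

```python
# Stage latencies in picoseconds, copied from the table above.
# None marks a stage the instruction does not use.
STAGE_PS = {
    "R-format": [200, 100, 200, 0, 100],
    "lw": [200, 100, 200, 200, 100],
    "sw": [200, 100, 200, 200, None],
    "beq": [200, 100, 200, None, None],
}


def instruction_total(name):
    """Total latency of one instruction in a single-cycle datapath."""
    return sum(t for t in STAGE_PS[name] if t is not None)


def single_cycle_tc():
    """The single-cycle clock must be as long as the slowest instruction."""
    return max(instruction_total(name) for name in STAGE_PS)


def pipelined_tc():
    """The pipelined clock must be as long as the slowest stage."""
    return max(t for row in STAGE_PS.values() for t in row if t is not None)


def single_cycle_time(count):
    return count * single_cycle_tc()


def pipelined_time(count, stages=5):
    """Ideal pipeline: fill once, then finish one instruction per cycle."""
    return (stages + count - 1) * pipelined_tc()


if __name__ == "__main__":
    print(single_cycle_tc(), pipelined_tc())
    print(single_cycle_time(3), pipelined_time(3))
```

For a short run of three instructions the pipeline takes 1400ps against 2400ps; the ratio only approaches 800/200 = 4 for long instruction streams.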


MIPS Pipeline

Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register


Pipelining

What makes it easy in MIPS
  All instructions are the same length
  Just a few instruction formats
  Memory operands appear only in loads and stores

What makes it hard?
  Structural hazards: suppose we had only one memory
  Control hazards: need to worry about branch instructions
  Data hazards: an instruction depends on a previous instruction

We’ll build a simple pipeline and look at these issues

What makes it really hard for modern processors
  Exception handling
  Trying to improve performance with out-of-order execution


Pipeline Speedup


The Five Stages of Load

What do we need to add to actually split the datapath into stages?

Ifetch: Instruction Fetch
  Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file

[Figure: LW passing through Ifetch, Reg/Dec, Exec, Mem, Wr]


Basic Idea

[Figure: single-cycle datapath (PC, instruction memory, registers, sign extend, ALU, data memory, adders, shift left 2, multiplexers) divided from left to right into five stages]

IF: Instruction fetch
ID: Instruction decode/register file read
EX: Execute/address calculation
MEM: Memory access
WB: Write back


Pipelining and ISA Design

MIPS ISA designed for pipelining
  All instructions are 32 bits
    Easier to fetch and decode in one cycle
    cf. x86: 1- to 17-byte instructions
  Few and regular instruction formats
    Can decode and read registers in one step
  Load/store addressing
    Can calculate address in 3rd stage, access memory in 4th stage
  Alignment of memory operands
    Memory access takes only one cycle


Pipeline registers
  Need registers between stages
    To hold information produced in previous cycle

• Can you find a problem?
• What instructions can we execute to manifest the problem?


Pipeline Operation

Cycle-by-cycle flow of instructions through the pipelined datapath
  “Single-clock-cycle” pipeline diagram
    Shows pipeline usage in a single cycle
    Highlight resources used
  cf. “multi-clock-cycle” diagram
    Graph of operation over time

We’ll look at “single-clock-cycle” diagrams for load & store


IF for Load, Store, …


ID for Load, Store, …


EX for Load


MEM for Load


WB for Load

Wrong register number


Corrected Datapath for Load


EX for Store


MEM for Store


WB for Store


Graphically Representing Pipelines

Can help with answering questions like:
  How many cycles does it take to execute this code?
  What is the ALU doing during cycle 4?
Use this representation to help understand datapaths

[Figure: multi-clock-cycle diagram. Time (in clock cycles) runs across from CC 1 to CC 6; program execution order (in instructions) runs down. Each instruction uses IM, Reg, ALU, DM, Reg in successive cycles.]

lw $10, 20($1)
sub $11, $2, $3
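
The two questions on this slide can be answered mechanically for an ideal pipeline. The sketch below assumes no stalls, one instruction issued per cycle, and the five stage names IF, ID, EX, MEM, WB; the helper names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(instructions):
    """Cycles needed for an ideal pipeline to finish all instructions."""
    if not instructions:
        return 0
    return len(instructions) + len(STAGES) - 1


def who_is_in(instructions, stage, cycle):
    """Return the instruction occupying `stage` in clock cycle `cycle` (1-based)."""
    index = cycle - 1 - STAGES.index(stage)
    if 0 <= index < len(instructions):
        return instructions[index]
    return None


def render(instructions):
    """Build a text version of the multi-clock-cycle diagram."""
    cycles = total_cycles(instructions)
    header = " " * 18 + "".join(f"CC{c:<4}" for c in range(1, cycles + 1))
    lines = [header]
    for i, text in enumerate(instructions):
        cells = ["      "] * cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = f"{stage:<6}"
        lines.append(f"{text:<18}" + "".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    program = ["lw $10, 20($1)", "sub $11, $2, $3"]
    print(render(program))
    print("cycles:", total_cycles(program))
    print("EX in CC4:", who_is_in(program, "EX", 4))
```

For the two instructions on the slide, the code takes six cycles, and during cycle 4 the ALU is working on the sub while the lw is accessing data memory.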


Why Pipeline?

[Figure: instruction order (Inst 0 to Inst 4) versus time in clock cycles. Each instruction passes through Im, Reg, ALU, Dm, Reg and starts one cycle after the previous one.]


Multi-Cycle Pipeline Diagram
  Form showing resource usage


Single-Cycle Pipeline Diagram
  State of pipeline in a given cycle


Pipelined Control (Simplified)


Pipeline Control

We have 5 stages. What needs to be controlled in each stage?
  Instruction Fetch and PC Increment
  Instruction Decode / Register Fetch
  Execution
  Memory Stage
  Write Back

How would control be handled in an automobile plant?
  A fancy control center telling everyone what to do?
  Should we use a finite state machine?


Pipeline Control
  Control signals derived from instruction
    As in single-cycle implementation
  Pass control signals along just like the data

              Execution/Address Calculation     Memory access              Write-back
              stage control lines               stage control lines        stage control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc    Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format      1       1       0       0         0       0        0         1         0
lw            0       0       0       1         0       1        0         1         1
sw            X       0       0       1         0       0        1         0         X
beq           X       0       1       0         1       0        0         0         X

[Figure: the Control unit decodes the instruction held in IF/ID and writes the EX, M and WB signal groups into ID/EX; M and WB continue into EX/MEM, and WB continues into MEM/WB]
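
Passing control "just like the data" can be modelled as three signal groups that travel through the pipeline registers, with each stage consuming its own group. This is a minimal sketch built from the control table above; the dictionary layout and function names are assumptions made for illustration.

```python
EX_SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("RegWrite", "MemtoReg")

# Rows of the control table above; "X" is a don't-care.
CONTROL_TABLE = {
    "R-format": (1, 1, 0, 0, 0, 0, 0, 1, 0),
    "lw": (0, 0, 0, 1, 0, 1, 0, 1, 1),
    "sw": ("X", 0, 0, 1, 0, 0, 1, 0, "X"),
    "beq": ("X", 0, 1, 0, 1, 0, 0, 0, "X"),
}


def decode(kind):
    """Produce the EX, M and WB groups written into ID/EX during decode."""
    names = EX_SIGNALS + MEM_SIGNALS + WB_SIGNALS
    signals = dict(zip(names, CONTROL_TABLE[kind]))
    return {
        "EX": {name: signals[name] for name in EX_SIGNALS},
        "M": {name: signals[name] for name in MEM_SIGNALS},
        "WB": {name: signals[name] for name in WB_SIGNALS},
    }


def advance(id_ex):
    """Show what survives in EX/MEM and MEM/WB as the instruction moves on."""
    ex_mem = {"M": id_ex["M"], "WB": id_ex["WB"]}  # EX group already used
    mem_wb = {"WB": ex_mem["WB"]}  # M group already used
    return ex_mem, mem_wb


if __name__ == "__main__":
    id_ex = decode("lw")
    ex_mem, mem_wb = advance(id_ex)
    print(id_ex)
    print(ex_mem)
    print(mem_wb)
```

Because every instruction carries its own control bits, no central state machine has to track which instruction is in which stage.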


Pipelined Control


Designing a Pipelined Processor

Go back and examine your datapath and control diagram

Associate resources with states

Ensure that flows do not conflict, or figure out how to resolve

Assert control in appropriate stage


Pipelining Troubles?

Pipeline Hazards
  Situations that prevent starting the next instruction in the next cycle
  Structural hazards: attempt to use the same resource in two different ways at the same time.
  Data hazards: attempt to use item before it is ready.
    Instruction depends on result of prior instruction still in the pipeline.
  Control hazards: attempt to make a decision before condition is evaluated.
    Branch instructions

Can always resolve hazards by waiting.
  Pipeline control must detect the hazard.
  Take action (or delay action) to resolve hazards.


Structure Hazards

Conflict for use of a resource

In MIPS pipeline with a single memory
  Load/store requires data access
  Instruction fetch would have to stall for that cycle
    Would cause a pipeline “bubble”

Hence, pipelined datapaths require separate instruction/data memories
  Or separate instruction/data caches


Data Hazards
  Problem with starting next instruction before first is finished
  Dependencies that “go backward in time” are data hazards

[Figure: multi-clock-cycle diagram, time in clock cycles CC 1 to CC 9, program execution order in instructions; each instruction uses IM, Reg, ALU, DM, Reg]

Value of register $2 in CC 1 to CC 9: 10 10 10 10 10/–20 –20 –20 –20 –20

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)


Data Hazard Solution
  An instruction depends on completion of data access by a previous instruction

add $s0, $t0, $t1
sub $t2, $s0, $t3

[Figure: pipeline diagram in which two nops separate the add from the dependent sub]


Software Solution

Have compiler guarantee no hazards
Where should compiler insert “nop” instructions?

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

Problem: It happens too often to rely on compiler
  It really slows us down!
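
One way to answer the question of where the nops go is to track, for every register, the slot of its most recent writer. The sketch below is an illustration only: it assumes no forwarding, a register file that writes in the first half of a cycle and reads in the second half (so a consumer must sit at least three slots after its producer), and a very small parser that only knows the instruction shapes used in these slides.

```python
import re

REGISTER = re.compile(r"\$\w+")
NO_DESTINATION = ("sw", "beq")  # these opcodes write no register


def reads_and_write(instruction):
    """Return (set of source registers, destination register or None)."""
    opcode = instruction.split()[0]
    registers = REGISTER.findall(instruction)
    if opcode in NO_DESTINATION:
        return set(registers), None
    return set(registers[1:]), registers[0]


def insert_nops(program, min_distance=3):
    """Pad the program with nops so no consumer reads a stale register."""
    scheduled = []
    last_write = {}  # register -> slot index of its latest writer
    for instruction in program:
        sources, destination = reads_and_write(instruction)
        for register in sources:
            if register in last_write:
                while len(scheduled) - last_write[register] < min_distance:
                    scheduled.append("nop")
        if destination not in (None, "$0", "$zero"):
            last_write[destination] = len(scheduled)
        scheduled.append(instruction)
    return scheduled


if __name__ == "__main__":
    program = [
        "sub $2, $1, $3",
        "and $12, $2, $5",
        "or $13, $6, $2",
        "add $14, $2, $2",
        "sw $15, 100($2)",
    ]
    print("\n".join(insert_nops(program)))
```

For the sequence on this slide, two nops after the sub are enough: once the and has been pushed back, the or, add and sw are already far enough from the sub.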


Data Hazards: Software Solution

[Figure: multi-clock-cycle diagram, time in clock cycles CC 1 to CC 9, program execution order in instructions; each instruction uses IM, Reg, ALU, DM, Reg]

Value of register $2 in CC 1 to CC 9: 10 10 10 10 10/–20 –20 –20 –20 –20

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)


Code Scheduling to Avoid Stalls
  C code for A = B + E; C = B + F;

lw $t1, 0($t0)
lw $t2, 4($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)

[Figure: multi-clock-cycle diagram of the seven instructions, with two points marked “stall”]

11 cycles not counting the dependencies


Code Scheduling to Avoid Stalls
  Reorder code to avoid use of load result in the next instruction
  C code for A = B + E; C = B + F;

Original order, with nops (15 cycles):

lw $t1, 0($t0)
lw $t2, 4($t0)
nop
nop
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
nop
nop
add $t5, $t1, $t4
sw $t5, 16($t0)

Reordered (12 cycles):

lw $t1, 0($t0)
lw $t2, 4($t0)
nop
lw $t4, 8($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)


Forwarding (aka Bypassing)
  Use result when it is computed
    Don’t wait for it to be stored in a register
    Requires extra connections in the datapath


Data Hazards in ALU Instructions

Consider this sequence:

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

We can resolve hazards with forwarding
  How do we detect when to forward?

§4.7 Data Hazards: Forwarding vs. Stalling


Data Hazard Solution: Forwarding
  Use temporary results (ALU forwarding), don’t wait for them to be written
  Also, write register file during 1st half of clock and read during 2nd half

[Figure: multi-clock-cycle diagram, time in clock cycles CC 1 to CC 9, program execution order in instructions; annotation: what if this $2 was $13?]

Values in CC 1 to CC 9:
  Value of register $2: 10 10 10 10 10/–20 –20 –20 –20 –20
  Value of EX/MEM:      X X X –20 X X X X X
  Value of MEM/WB:      X X X X –20 X X X X

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)


Data Hazard Solution: Forwarding

[Figure: multi-clock-cycle diagram (CC 1 to CC 9) for the sequence sub $2, $1, $3; and $12, $2, $5; or $13, $6, $2; add $14, $2, $2; sw $15, 100($2), with the values of register $2, EX/MEM and MEM/WB in each cycle; annotation: what if this $2 was $13?]


Dependencies & Forwarding


Forwarding Paths


Detecting the Need to Forward

Pass register numbers along pipeline
  e.g., ID/EX.RegisterRs = register number for Rs sitting in ID/EX pipeline register

ALU operand register numbers in EX stage are given by
  ID/EX.RegisterRs, ID/EX.RegisterRt

Data hazards when
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
      (1a, 1b: fwd from EX/MEM pipeline reg)
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
      (2a, 2b: fwd from MEM/WB pipeline reg)


Detecting the Need to Forward

But only if forwarding instruction will write to a register!
  EX/MEM.RegWrite, MEM/WB.RegWrite

And only if Rd for that instruction is not $zero
  EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0


Datapath with Forwarding

Forwarding multiplexer control values:
  00 Register file
  01 Mem. or earlier ALU
  10 Prior ALU


Forwarding Conditions

EX hazard
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 10
  if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 10

MEM hazard
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01


Hazard Conditions
  Steer the result from previous instruction to the ALU

EX hazard
  if (EX/MEM.RegWrite
      and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 10
  if (EX/MEM.RegWrite
      and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 10

[Figure: pipelined datapath (PC, instruction memory, registers, ALU, data memory, Control, pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with a forwarding unit and multiplexers on the ALU inputs. Labeled signals: IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, Rs, Rt, Rd, EX/MEM.RegisterRd, MEM/WB.RegisterRd.]

Multiplexer control values:
  00 Register file
  01 Mem. or earlier ALU
  10 Prior ALU


Hazard Conditions
  Steer the result from previous instruction to the ALU

MEM hazard
  if (MEM/WB.RegWrite
      and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
  if (MEM/WB.RegWrite
      and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01

[Figure: pipelined datapath (PC, instruction memory, registers, ALU, data memory, Control, pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB) with a forwarding unit and multiplexers on the ALU inputs. Labeled signals: IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, Rs, Rt, Rd, EX/MEM.RegisterRd, MEM/WB.RegisterRd.]


Double Data Hazard

Consider the sequence:

add $1, $1, $2
add $1, $1, $3
add $1, $1, $4

Both hazards occur
  Want to use the most recent

Revise MEM hazard condition
  Only fwd if EX hazard condition isn’t true

[Figure: multi-clock-cycle diagram of the three add instructions, each using IM, Reg, ALU, DM, Reg]


Revised Forwarding Condition

MEM hazard
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
               and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
  if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
      and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
               and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01
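
The EX hazard condition and the revised MEM hazard condition can be combined into one small function per ALU operand: check the EX/MEM register first, and fall back to MEM/WB only when the EX condition is false. This is a software sketch of that logic, not the hardware itself; the class and function names are assumptions made for illustration, and the select values are kept as the two-bit strings used on the slides.

```python
from dataclasses import dataclass


@dataclass
class PipelineReg:
    """The two fields of EX/MEM or MEM/WB that the forwarding unit inspects."""
    reg_write: bool
    rd: int


def forward_select(source, ex_mem, mem_wb):
    """Return the mux select for one ALU operand register number."""
    ex_hazard = ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == source
    if ex_hazard:
        return "10"  # prior ALU result, taken from EX/MEM
    mem_hazard = mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == source
    if mem_hazard:
        return "01"  # memory or earlier ALU result, taken from MEM/WB
    return "00"  # no hazard: use the register file value


def forwarding_unit(rs, rt, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for ID/EX.RegisterRs and ID/EX.RegisterRt."""
    return (
        forward_select(rs, ex_mem, mem_wb),
        forward_select(rt, ex_mem, mem_wb),
    )


if __name__ == "__main__":
    # Third instruction of: add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4
    ex_mem = PipelineReg(reg_write=True, rd=1)  # second add
    mem_wb = PipelineReg(reg_write=True, rd=1)  # first add
    print(forwarding_unit(1, 4, ex_mem, mem_wb))
```

Testing the EX/MEM register first is what makes the double data hazard come out right: the operand is taken from the most recent producer even though the older one also matches.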


Can't always forward
  lw can still cause a hazard:
    An instruction tries to read a register following a load instruction that writes to the same register.
  Thus, we need a hazard detection unit to “stall” the instruction that follows the load

[Figure: multi-clock-cycle diagram, time in clock cycles CC 1 to CC 9, program execution order in instructions; each instruction uses IM, Reg, ALU, DM, Reg]

lw $2, 20($1)
and $4, $2, $5
or $8, $2, $6
add $9, $4, $2
slt $1, $6, $7

Need to stall for one cycle


Load-Use Data Hazard
  Can’t always avoid stalls by forwarding
    If value not computed when needed
    Can’t forward backward in time!


Load-Use Hazard Detection

Check when using instruction is decoded in ID stage

ALU operand register numbers in ID stage are given by
  IF/ID.RegisterRs, IF/ID.RegisterRt

Load-use hazard when
  ID/EX.MemRead and
    ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
     (ID/EX.RegisterRt = IF/ID.RegisterRt))

If detected, stall and insert bubble
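
The load-use condition above translates almost word for word into a boolean function. In the sketch below the argument names mirror the pipeline register fields; the output names `pc_write`, `if_id_write` and `zero_controls` are hypothetical labels for "hold the PC", "hold IF/ID" and "insert a bubble", and are not taken from the slides.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load now in EX."""
    return bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))


def hazard_controls(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Translate the hazard test into stall actions for one cycle."""
    stall = load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt)
    return {
        "pc_write": not stall,  # hold the PC so the next fetch repeats
        "if_id_write": not stall,  # hold IF/ID so the decode repeats
        "zero_controls": stall,  # send a bubble (all-zero control) into ID/EX
    }


if __name__ == "__main__":
    # lw $2, 20($1) is in EX (Rt = 2); and $4, $2, $5 is in ID (Rs = 2, Rt = 5)
    print(hazard_controls(True, 2, 2, 5))
    # slt $1, $6, $7 in ID does not read $2
    print(hazard_controls(True, 2, 6, 7))
```

The check runs in ID rather than EX because the stall has to be decided before the dependent instruction leaves the decode stage.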


How to Stall the Pipeline

Force control values in ID/EX register to 0
  EX, MEM and WB do nop (no-operation)

Prevent update of PC and IF/ID register
  Using instruction is decoded again
  Following instruction is fetched again

1-cycle stall allows MEM to read data for lw
  Can subsequently forward to EX stage


Stall/Bubble in the Pipeline

Stall inserted here


Stall/Bubble in the Pipeline

Or, more accurately…


Datapath with Hazard Detection
  Stall by letting an instruction that won’t write anything go forward
  Controls writing of the PC and IF/ID plus MUX


Code Scheduling to Avoid Stalls

Revisiting: reorder code to avoid use of load result in the next instruction

C code for A = B + E; C = B + F;

Original order (each nop marks a load-use stall): 13 cycles

    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    nop                   (stall)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    lw  $t4, 8($t0)
    nop                   (stall)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)

Reordered (no stalls): 11 cycles

    lw  $t1, 0($t0)
    lw  $t2, 4($t0)
    lw  $t4, 8($t0)
    add $t3, $t1, $t2
    sw  $t3, 12($t0)
    add $t5, $t1, $t4
    sw  $t5, 16($t0)
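
The 13-cycle and 11-cycle totals can be checked with a small Python sketch. It assumes a 5-stage pipeline with forwarding, where the only stall is one cycle whenever an instruction reads a register loaded by the instruction immediately before it. The tuple encoding (opcode, destination, sources) and the function count_cycles are hypothetical, chosen only for this illustration.

```python
PIPELINE_STAGES = 5


def count_cycles(program):
    """Cycles to run `program`: instructions + pipeline fill + load-use stalls."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        # One bubble when a load result is used by the very next instruction.
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return len(program) + (PIPELINE_STAGES - 1) + stalls


# (opcode, destination register or None, source registers)
original = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),   # uses $t2 right after its load
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),   # uses $t4 right after its load
    ("sw", None, ("$t5", "$t0")),
]
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(count_cycles(original))    # 13
print(count_cycles(reordered))   # 11
```

Both orders execute the same seven instructions; moving the third lw up removes both load-use pairs, so only the 4 fill cycles remain on top of the 7 instructions.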


Control Hazards

Branch determines flow of control
  Fetching next instruction depends on branch outcome
  Pipeline can’t always fetch correct instruction
    Still working on ID stage of branch

In MIPS pipeline
  Need to compare registers and compute target early in the pipeline
  Add hardware to do it in ID stage


Branch Hazards (§4.8 Control Hazards)

When we decide to branch, other instructions are already in the pipeline!

If branch outcome determined in MEM
  Flush these instructions (set control values to 0)


Stall on Branch

Wait until branch outcome determined before fetching next instruction


Branch Prediction

Longer pipelines can’t readily determine branch outcome early
  Stall penalty becomes unacceptable

Predict outcome of branch
  Only stall if prediction is wrong

In MIPS pipeline
  Can predict branches not taken
  Fetch instruction after branch, with no delay
  Need to add hardware for flushing instructions if we are wrong
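
To see why stalling only on a wrong prediction helps, here is a rough Python sketch of the usual CPI estimate: base CPI plus the average penalty paid per instruction. The function effective_cpi is hypothetical, and the numbers (20% branches, half of them taken, 1-cycle penalty) are assumptions for illustration, not figures from these notes.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction when only mispredicted branches stall."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload: 20% of instructions are branches, 1-cycle branch penalty.
# Always stalling behaves like a predictor that is wrong every time.
always_stall = effective_cpi(1.0, 0.20, 1.0, 1)

# Predict not taken is wrong only for taken branches (assumed: half of them).
predict_not_taken = effective_cpi(1.0, 0.20, 0.5, 1)

print(always_stall, predict_not_taken)
```

Under these assumed numbers the estimate drops from about 1.2 to about 1.1; a longer pipeline with a larger penalty_cycles widens the gap.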


MIPS with Predict Not Taken

Prediction correct

Prediction incorrect


Our Original Datapath

[Figure: pipelined datapath with control. Components: PC, instruction memory, registers, sign extend, shift left 2, ALU and ALU control, data memory. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. Control signals: RegDst, ALUSrc, ALUOp, Branch, MemRead, MemWrite, MemtoReg, RegWrite, PCSrc.]


Reduce Branch Delay

[Figure: datapath with reduced branch delay. Components: PC, instruction memory, registers, register comparator (=), sign extend, shift left 2, control, ALU, data memory, hazard detection unit, forwarding unit. Pipeline registers: IF/ID (with IF.Flush), ID/EX, EX/MEM, MEM/WB.]


Reducing Branch Delay

Move hardware to determine outcome to ID stage
  Target address adder
  Register comparator

Example: branch taken

    36: sub $10, $4, $8
    40: beq $1, $3, 7
    44: and $12, $2, $5
    48: or  $13, $2, $6
    52: add $14, $4, $2
    56: slt $15, $6, $7
    ...
    72: lw  $4, 50($7)
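
The target of the beq at address 40 can be worked out the way the target address adder does it: the word offset is shifted left by 2 and added to PC + 4. A short Python sketch (the function name branch_target is hypothetical):

```python
def branch_target(branch_pc, offset_words):
    """MIPS branch target: (PC + 4) + (word offset shifted left by 2)."""
    return branch_pc + 4 + (offset_words << 2)


# beq $1, $3, 7 at address 40: 40 + 4 + 7 * 4
print(branch_target(40, 7))   # 72
```

So a taken branch continues at address 72, the lw, and the and at address 44 that was fetched behind the branch must be flushed.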


Example: Branch Taken


Example: Branch Taken


Pipeline Summary

Pipelining improves performance by increasing instruction throughput
  Executes multiple instructions in parallel
  Each instruction has the same latency

Subject to hazards
  Structure, data, control

Instruction set design affects complexity of pipeline implementation

The BIG Picture


Stalls and Performance

Stalls reduce performance
  But are required to get correct results

Compiler can arrange code to avoid hazards and stalls
  Requires knowledge of the pipeline structure

The BIG Picture

