
04 Pipeline

Date post: 09-Apr-2018
Upload: nehru2010
Page 1: 04 Pipeline

8/8/2019 04 Pipeline

http://slidepdf.com/reader/full/04-pipeline 1/31

Computer Architecture

Computer Architectures

Pipelining

Giorgio Richelli


Why Pipeline?

[Figure: pipeline timing diagram. Vertical axis: instruction order (Inst 0 to Inst 4); horizontal axis: time (clock cycles). Each instruction passes through the stages Im, Reg, ALU, Dm, Reg.]

Slide courtesy of D. Patterson


Why Pipeline?

• Suppose we execute 100 instructions
• Single Cycle Machine
  – 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
• Multicycle Machine
  – 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
• Ideal pipelined machine
  – 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
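The three totals on this slide can be reproduced with a few lines of Python. This is only a sketch of the slide's arithmetic; the function name `execution_time_ns` and its parameters are illustrative and not part of the original deck.

```python
def execution_time_ns(cycle_ns, cpi, n_inst, drain_cycles=0):
    """Total time = cycle time x (CPI x instruction count + drain cycles)."""
    return cycle_ns * (cpi * n_inst + drain_cycles)


N_INST = 100  # instruction count used on the slide

single_cycle = execution_time_ns(45, 1, N_INST)
multicycle = execution_time_ns(10, 4.6, N_INST)
# The ideal pipeline needs 4 extra cycles for the last instruction to drain.
pipelined = execution_time_ns(10, 1, N_INST, drain_cycles=4)

for name, t in [("single cycle", single_cycle),
                ("multicycle", multicycle),
                ("ideal pipeline", pipelined)]:
    print(f"{name}: {t:.0f} ns")
```

The drain term is why the pipelined machine takes 1040 ns rather than 1000 ns: the last instruction still has to pass through the remaining four stages.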

Slide courtesy of D. Patterson


Benefits of Pipelining

• Before pipelining:
  – Throughput: 1 instruction per cycle
  – (or lower cycle time and CPI=5)

$$t_{clk} = t_F + t_R + t_X + t_M + t_W$$

• After pipelining (multiple instructions in pipe at one time)
  – Throughput: 1 instruction per cycle

$$t_{clk} = \max(t_F, t_R, t_X, t_M, t_W) + t_{latch}$$
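To make the two clock-period formulas concrete, here is a minimal sketch that evaluates them. The stage delays and the latch overhead are made-up values chosen only for illustration; the slide gives no numbers.

```python
def unpipelined_cycle(stage_delays):
    """Before pipelining: one long cycle covers all five stages."""
    return sum(stage_delays.values())


def pipelined_cycle(stage_delays, t_latch):
    """After pipelining: the slowest stage plus latch overhead sets the clock."""
    return max(stage_delays.values()) + t_latch


# Hypothetical delays in ns for stages F, R, X, M, W (not from the slides).
stages = {"F": 10, "R": 5, "X": 10, "M": 10, "W": 5}
T_LATCH = 1  # hypothetical pipeline-latch overhead in ns

t_unpiped = unpipelined_cycle(stages)
t_piped = pipelined_cycle(stages, T_LATCH)
print(t_unpiped, t_piped)
```

With these assumed numbers the clock period drops from 40 ns to 11 ns. The gain is less than 5x because the stages are unbalanced and each one pays the latch overhead.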


Visualizing Pipelining

[Figure: pipeline timing diagram. Vertical axis: instruction order; horizontal axis: time (clock cycles), Cycle 1 to Cycle 7. Each of the four instructions passes through the stages Ifetch, Reg, ALU, DMem, Reg.]


Data Hazards: RAW

• Read After Write (RAW): Instr_J tries to read operand before Instr_I writes it
• Caused by a “Dependence” (in compiler nomenclature).
• This hazard results from an actual need for communication between pipeline stages.

    I: add r1,r2,r3
    J: sub r4,r1,r3


Data Hazards: WAR

• Write After Read (WAR): Instr_J writes operand before Instr_I reads it
• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
• Can’t happen in the sample 5-stage pipeline because:
  – All instructions take 5 stages, and
  – Reads are always in stage 2, and
  – Writes are always in stage 5

    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7


Data Hazards: WAW

• Write After Write (WAW): Instr_J writes operand before Instr_I writes it.
• Called an “output dependence” by compiler writers. This also results from the reuse of name “r1”.

I: sub r1,r4,r3

J: add r1,r2,r3

K: mul r6,r1,r7
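The three definitions (RAW, WAR, WAW) reduce to comparing register names between an earlier and a later instruction. The sketch below classifies the I/J pairs from the slides. The `Instr` tuple and the `classify` function are illustrative names, and the code only detects name-based dependences; whether a dependence becomes an actual hazard depends on the pipeline.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def classify(earlier, later):
    """Return the dependences of `later` on `earlier`, by register name."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")   # later reads what earlier writes
    if later.dest in earlier.srcs:
        found.add("WAR")   # later writes what earlier reads
    if later.dest == earlier.dest:
        found.add("WAW")   # both write the same register
    return found


# The I/J pairs from the RAW, WAR, and WAW slides.
raw = classify(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3")))
war = classify(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3")))
waw = classify(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3")))
print(raw, war, waw)
```

Only the RAW case reflects real data flow between the two instructions. The WAR and WAW cases come from reusing the name r1, as the slides note.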


Data Hazards (RAW)

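    ADD R1, R2, R3
    ADD R4, R1, R5

[Figure: pipeline timing diagram, instruction against cycle, with both instructions passing through stages F R X M W. The W stage of the first ADD is marked "Write Data to R1 Here"; the R stage of the second ADD is marked "Read from R1 Here".]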


Types of Data Hazards

• RAW (read after write)
  – only hazard for ‘fixed’ pipelines
  – later instruction must read after earlier instruction writes

    F R A M W
      F R A M W

• WAW (write after write)
  – variable-length pipeline (e.g. FP/int)
  – later instruction must write after earlier instruction writes

    F R 1 2 3 4 W
      F R A M W

• WAR (write after read)
  – pipelines with late read
  – later instruction must write after earlier instruction reads

    F R 1 2 3 4 R 5 W
      F R A M W


Datapath vs Control

• Datapath: Storage, FU, interconnect sufficient to perform the desired functions
  – Inputs are Control Points
  – Outputs are signals
• Controller: State machine to orchestrate operation on the data path
  – Based on desired function and signals

[Figure: block diagram. The Controller drives Control Points into the Datapath (Regs, ALU); the Datapath returns signals to the Controller.]


Control Hazards

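    JR R25
    ...
    XX: ADD ...

[Figure: pipeline timing diagram, instruction against cycle, with both instructions passing through stages F R X M W. One point is marked "Destination Available Here" and another "Need Destination Here".]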


Control Hazards

[Figure: pipeline timing diagram. Vertical axis: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4); horizontal axis: time (clock cycles), Cycle 1 to Cycle 7. Each instruction passes through the stages Ifetch, Reg, ALU, DMem, Reg.]


Resolving Hazards: Pipeline Stalls

• Can resolve any type of hazard
  – data, control, or structural
• Detect the hazard
• Freeze the pipeline up to the dependent stage until the hazard is resolved


Resolving Hazards

[Figure: pipeline timing diagram. Vertical axis: instruction order (Load, Instr 1, Instr 2, Stall, Instr 3); horizontal axis: time (clock cycles), Cycle 1 to Cycle 7. Each instruction passes through the stages Ifetch, Reg, ALU, DMem, Reg; the Stall row is a row of five bubbles.]


Example Pipeline Stall (Diagram)

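    ADD R1, R2, R3
    ADD R4, R1, R5

[Figure: pipeline timing diagram, instruction against cycle, with both instructions passing through stages F R X M W. The W stage of the first ADD is marked "Write Data to R1 Here"; the R stage of the second ADD, marked "Read from R1 Here", is delayed by a bubble.]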
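How many bubbles a stall needs can be estimated with a small timing model. The sketch below assumes the five-stage F R X M W pipeline from the slides with no forwarding. It also has a switch for a further assumption the slides do not state: whether the register file can be written and read in the same cycle. The function name `raw_stall_cycles` is illustrative.

```python
def raw_stall_cycles(distance, same_cycle_write_read=True):
    """Bubbles needed before a consumer issued `distance` slots after a producer.

    Stages are F R X M W; the producer fetched in cycle 1 writes in cycle 5,
    and a consumer issued `distance` slots later would read in cycle 2 + distance.
    """
    producer_write = 5
    consumer_read = 2 + distance
    # Assumption: with a write-then-read register file the read may share the
    # producer's W cycle; otherwise it must happen one cycle later.
    earliest_read = producer_write if same_cycle_write_read else producer_write + 1
    return max(0, earliest_read - consumer_read)


# ADD R1,R2,R3 immediately followed by ADD R4,R1,R5 (distance 1).
stalls_shared = raw_stall_cycles(1, same_cycle_write_read=True)
stalls_strict = raw_stall_cycles(1, same_cycle_write_read=False)
print(stalls_shared, stalls_strict)
```

Under this model the dependent ADD waits two cycles if the register file supports a same-cycle write and read, and three cycles otherwise.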


Forwarding to Avoid Data Hazard

    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or r8,r1,r9
    xor r10,r1,r11

[Figure: pipeline timing diagram. Vertical axis: instruction order (the five instructions above); horizontal axis: time (clock cycles). Each instruction passes through the stages Ifetch, Reg, ALU, DMem, Reg.]
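As a rough illustration of forwarding, the sketch below works out where each consumer of r1 would get its value in a textbook five-stage pipeline. The pipeline-latch names (EX/MEM, MEM/WB) and the same-cycle register write and read are assumptions taken from the usual textbook design, not from this slide.

```python
def operand_source(distance):
    """Where a consumer issued `distance` slots after an ALU producer gets its operand."""
    if distance == 1:
        return "forwarded from EX/MEM latch"
    if distance == 2:
        return "forwarded from MEM/WB latch"
    if distance == 3:
        # Assumes the register file is written and then read in the same cycle.
        return "register file (same-cycle write and read)"
    return "register file"


# (opcode, destination, sources) for the sequence on the slide.
program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r1", "r3")),
    ("and", "r6", ("r1", "r7")),
    ("or", "r8", ("r1", "r9")),
    ("xor", "r10", ("r1", "r11")),
]

producer_dest = program[0][1]
sources = {
    op: operand_source(i)
    for i, (op, dest, srcs) in enumerate(program)
    if i > 0 and producer_dest in srcs
}
for op, src in sources.items():
    print(f"{op}: r1 {src}")
```

Under these assumptions none of the four dependent instructions needs a bubble, which is the point of the slide.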


Speedup Equation for Pipelining

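$$\mathrm{CPI}_{pipelined} = \text{Ideal CPI} + \text{Average stall cycles per inst}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{unpipelined}}{\text{Cycle Time}_{pipelined}}$$

For simple RISC pipeline, Ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{unpipelined}}{\text{Cycle Time}_{pipelined}}$$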


Example: Dual-port vs. Single-port

• Machine A: Dual-ported memory (“Harvard Architecture”)
• Machine B: Single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
• Ideal CPI = 1 for both
• Loads are 40% of instructions executed

    SpeedUpA = Pipeline Depth/(1 + 0) x (clock_unpipe/clock_unpipe)
             = Pipeline Depth

    SpeedUpB = Pipeline Depth/(1 + 0.4 x 1) x (clock_unpipe/(clock_unpipe/1.05))
             = (Pipeline Depth/1.4) x 1.05
             = 0.75 x Pipeline Depth

    SpeedUpA / SpeedUpB = Pipeline Depth/(0.75 x Pipeline Depth) = 1.33

• Machine A is 1.33 times faster
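The comparison can be checked numerically with the speedup equation for pipelining. This is a sketch only: the function name `pipeline_speedup` and the depth of 5 are assumptions for illustration, and the depth cancels out of the final ratio.

```python
def pipeline_speedup(depth, stall_cpi, clock_ratio=1.0, ideal_cpi=1.0):
    """Speedup = (ideal CPI x depth) / (ideal CPI + stall CPI) x clock ratio.

    `clock_ratio` is unpipelined cycle time divided by pipelined cycle time.
    """
    return (ideal_cpi * depth) / (ideal_cpi + stall_cpi) * clock_ratio


DEPTH = 5  # assumed pipeline depth; any positive value gives the same ratio

# Machine A: dual-ported memory, so no stalls from loads.
speedup_a = pipeline_speedup(DEPTH, stall_cpi=0.0)
# Machine B: 40% loads, 1 stall cycle each, but a 1.05x faster clock.
speedup_b = pipeline_speedup(DEPTH, stall_cpi=0.4 * 1, clock_ratio=1.05)

ratio = speedup_a / speedup_b
print(f"A is {ratio:.2f} times faster than B")
```

The faster clock on Machine B does not make up for stalling on 40% of its instructions.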


Software Scheduling to Avoid Load Hazards

Producing fast code for

    a = b + c;
    d = e - f;

assuming a, b, c, d, e, and f in memory.

Slow code:

    LW  Rb,b
    LW  Rc,c
    ADD Ra,Rb,Rc
    SW  a,Ra
    LW  Re,e
    LW  Rf,f
    SUB Rd,Re,Rf
    SW  d,Rd

Faster code:

    LW  Rb,b
    LW  Rc,c
    LW  Re,e
    ADD Ra,Rb,Rc
    LW  Rf,f
    SW  a,Ra
    SUB Rd,Re,Rf
    SW  d,Rd
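The benefit of the reordering can be counted with a simple model. The sketch below assumes that a load followed immediately by an instruction using the loaded register costs one stall cycle, and that nothing else stalls. That load-use penalty is an assumption taken from the usual textbook pipeline with forwarding, not a figure from the slide, and `load_use_stalls` is an illustrative name.

```python
def load_use_stalls(program):
    """Count adjacent pairs where an instruction uses the result of the load before it."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LW" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


# (opcode, destination register, source registers); stores have no destination.
slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()),
    ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]

slow_stalls = load_use_stalls(slow)
fast_stalls = load_use_stalls(fast)
print(slow_stalls, fast_stalls)
```

In the slow sequence ADD and SUB each follow the load of one of their operands. The faster sequence places an independent instruction after every load, so under this model both stalls disappear.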


Branch Stall Impact

• If CPI = 1, 30% branch, Stall 3 cycles => new CPI = 1.9
• Solution:
  – Determine branch taken or not sooner, AND
  – Compute taken branch address earlier
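The new CPI follows from adding the average stall cycles per instruction to the base CPI. A minimal sketch of that arithmetic is below. The function name is illustrative, and the one-cycle penalty in the second call is a hypothetical value used only to show the effect of resolving branches earlier.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """CPI = base CPI + (fraction of branches) x (stall cycles per branch)."""
    return base_cpi + branch_fraction * stall_cycles


# Figures from the slide: CPI = 1, 30% branches, 3 stall cycles each.
new_cpi = cpi_with_branch_stalls(1.0, 0.30, 3)

# Hypothetical: the same mix if the branch were resolved after 1 stall cycle.
improved_cpi = cpi_with_branch_stalls(1.0, 0.30, 1)

print(f"{new_cpi:.1f} {improved_cpi:.1f}")
```

Cutting the per-branch penalty from three cycles to one would bring the CPI from 1.9 down to 1.3 for this instruction mix.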


Branch Hazard Alternatives

• Stall until branch direction is clear
• Predict Branch (Taken or Not Taken)
  – Execute successor instructions in sequence (if not taken) and “squash” instructions in pipeline if branch mispredicted
  – Must have already calculated branch target address


Branch Hazard Alternatives

• Delayed Branch
  – Define branch to take place AFTER a following instruction

        branch instruction
        sequential successor 1
        sequential successor 2
        ........
        sequential successor n      (branch delay of length n)
        branch target if taken

  – MIPS, SPARC
  – Experience has shown it has issues:
    • Makes code difficult to maintain and debug
    • Compiler has to find an instruction which is safe to execute regardless of whether the branch is taken
    • If the hardware changes, old programs may not work at all


Scheduling Branch Delay Slots

• A is the best choice, fills delay slot & reduces instruction count (IC)
• In B, the sub instruction may need to be copied, increasing IC
• In B and C, must be okay to execute sub when branch fails

A. From before branch

    add $1,$2,$3
    if $2=0 then
        delay slot

  becomes

    if $2=0 then
        add $1,$2,$3

B. From branch target

    sub $4,$5,$6        (at the branch target)
    add $1,$2,$3
    if $1=0 then
        delay slot

  becomes

    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6

C. From fall through

    add $1,$2,$3
    if $1=0 then
        delay slot
    sub $4,$5,$6        (fall-through instruction)

  becomes

    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6


Delayed Branch

• Compiler effectiveness for single branch delay slot:
  – Fills about 60% of branch delay slots
  – About 80% of instructions executed in branch delay slots useful in computation
  – About 50% (60% x 80%) of slots usefully filled
• Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches
• Growth in available transistors has made dynamic approaches relatively cheaper
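The "about 50%" figure is the product of the two rates quoted on the slide. A short sketch of that arithmetic follows; the variable names are illustrative.

```python
fill_rate = 0.60     # fraction of delay slots the compiler fills
useful_rate = 0.80   # fraction of filled slots that do useful work

usefully_filled = fill_rate * useful_rate
wasted = 1.0 - usefully_filled  # slots left empty or filled with useless work

print(f"usefully filled: {usefully_filled:.0%}, wasted: {wasted:.0%}")
```

The exact product is 48%, which the slide rounds to about 50%. Roughly half of all delay slots therefore still cost a cycle.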


Exceptions and Interrupts

• Exception: An unusual event happens to an instruction during its execution
  – Examples: divide by zero, undefined opcode
• Interrupt: Hardware signal to switch processor to new instruction stream
  – Example: sound card interrupts when it needs more audio output samples (audio “click” happens if it is left waiting)


Exceptions and Interrupts

• Exception or interrupt must appear to happen between 2 instructions (I_i and I_{i+1})
• Interrupt (exception) handler either aborts program or restarts at instruction I_{i+1}


Precise Exceptions (Sequential Processor)

• When interrupt occurs, state of interrupted process is saved, including PC, registers, and memory
• Interrupt is precise if the following three conditions hold
  – All instructions preceding u have been executed, and have modified the state correctly
  – All instructions following u are unexecuted, and have not modified the state
  – If the interrupt was caused by an instruction, it was caused by instruction u, which is either completely executed (e.g.: overflow) or completely unexecuted (e.g.: VM page fault)
• Precise interrupts are desirable if software is to fix up error that caused interrupt and execution has to be resumed
  – Easy for external interrupts, could be complex and costly for internal
  – Imperative for some interrupts (VM page faults, IEEE FP standard)

