Page 1: Data Hazards

Data Hazards

1

Page 2: Data Hazards

Hazards: Key Points
• Hazards cause imperfect pipelining
• They prevent us from achieving CPI = 1
• They are generally caused by "counter-flow" data dependences in the pipeline
• Three kinds
  • Structural -- contention for hardware resources
  • Data -- a data value is not available when/where it is needed
  • Control -- the next instruction to execute is not known
• Two ways to deal with hazards
  • Removal -- add hardware and/or complexity to work around the hazard so it does not exist
    • Bypassing/forwarding
    • Speculation
  • Stall -- sacrifice performance to prevent the hazard from occurring
    • Stalling causes "bubbles"

2

Page 3: Data Hazards

Data Dependences
• A data dependence occurs whenever one instruction needs a value produced by another.
• Register values (for now)
• Also memory accesses (more on this later)

3

add $s0, $t0, $t1    # writes $s0
sub $t2, $s0, $t3    # reads $s0 (produced by the add), writes $t2
add $t3, $s0, $t4    # reads $s0, writes $t3
and $t3, $t2, $t4    # reads $t2 (produced by the sub), writes $t3
sw $t1, 0($t2)       # reads $t2 for the address
ld $t3, 0($t2)       # reads $t2 for the address, writes $t3
ld $t4, 16($s4)      # needs no value produced by the instructions above

Page 4: Data Hazards

Dependences in the pipeline
• In our simple pipeline, these instructions cause a hazard

4

[Pipeline diagram: add $s0, $t0, $t1 and sub $t2, $s0, $t3 each flow through Fetch, Decode, EX, Mem, Writeback across successive cycles; the sub needs $s0 before the add has reached Writeback.]

Page 5: Data Hazards

How can we fix it?

• Ideas?

5

Page 6: Data Hazards

Solution 1: Make the compiler deal with it.

• Expose hazards to the big-A Architecture (i.e., the ISA)
  • A result is available N instructions after the instruction that generates it.
  • In the meantime, the register file has the old value.
  • "delay slots"
• What is N?
  • Can it change?
  • What can the compiler do?

6

[Pipeline diagram: the five stages Fetch, Decode, EX, Mem, Writeback.]

Page 7: Data Hazards

Compiling for delay slots

7

add $s0, $t0, $t1
sub $t2, $s0, $t3
add $t3, $s0, $t4
and $t7, $t5, $t4

    Rearrange instructions (the independent and fills the delay slot)

add $s0, $t0, $t1
and $t7, $t5, $t4
sub $t2, $s0, $t3
add $t3, $s0, $t4

• The compiler must fill the delay slots with other instructions
• What if it can't? No-ops.

Page 8: Data Hazards

Solution 2: Stall

• When you need a value that is not ready, "stall"
  • Suspend the execution of the instruction that needs the value, and of those that follow it.
• This introduces a pipeline "bubble." A bubble is a lack of work to do. It moves through the pipeline like an instruction.

8

[Pipeline diagram: add $s0, $t0, $t1 proceeds normally through Fetch, Decode, EX, Mem, Writeback; sub $t2, $s0, $t3 is held after Fetch while a stall bubble moves down the pipeline in its place.]

Page 9: Data Hazards

Stalling the pipeline

• Freeze all pipeline stages before the stage where the hazard occurred.
  • Disable the PC update
  • Disable the pipeline registers
• This is essentially equivalent to always inserting a nop when a hazard exists
  • Insert nop control bits at the stalled stage (Decode in our example)
• How is this solution still potentially "better" than relying on the compiler?

9

The compiler can still act like there are delay slots to avoid stalls. Implementation details are not exposed in the ISA.
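As a rough behavioral sketch (Python; not the lecture's actual control logic, and the names are made up for illustration), the stall decision in a pipeline without forwarding boils down to: stall Decode whenever it reads a register that an older, still-in-flight instruction will write.

def must_stall(decode_srcs, inflight_dests):
    # decode_srcs: registers read by the instruction in Decode
    # inflight_dests: destination registers of older instructions still in EX, Mem, or Writeback
    return any(src in inflight_dests for src in decode_srcs)

# Example: sub reads $s0 while the add (still in EX) will write $s0, so Decode must stall.
print(must_stall(["$s0", "$t3"], {"$s0"}))   # True

# When stalling: hold the PC and the Fetch/Decode register, and send nop control
# bits into the stalled stage so a bubble flows through the rest of the pipe.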

Page 10: Data Hazards

The Impact of Stalling On Performance

• ET = I * CPI * CT
  • I and CT are constant
• What is the impact of stalling on CPI?
• What do we need to know to figure it out?

10

Page 11: Data Hazards

The Impact of Stalling On Performance

• ET = I * CPI * CT
  • I and CT are constant
• What is the impact of stalling on CPI?
  • Fraction of instructions that stall: 30%
  • Baseline CPI = 1
  • Stall CPI = 1 + 2 = 3
• New CPI =

11

0.3*3 + 0.7*1 = 1.6
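To make the arithmetic above concrete, here is a minimal Python sketch of the same calculation; the 30% stall fraction and 2 stall cycles are just the example numbers from this slide.

base_cpi = 1.0         # CPI with no stalls
stall_cycles = 2       # extra cycles for an instruction that stalls
stall_fraction = 0.3   # fraction of instructions that stall

new_cpi = stall_fraction * (base_cpi + stall_cycles) + (1 - stall_fraction) * base_cpi
print(round(new_cpi, 2))   # 1.6 -- a 60% increase in CPI (and in ET, since I and CT are fixed)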

Page 12: Data Hazards

Solution 3: Bypassing/Forwarding

• Data values are computed in EX and Mem but only "published" to the register file in Writeback
• The data exists! We should use it.

12

[Pipeline diagram: Fetch, Decode, EX, Mem, Writeback, annotated with where inputs are needed (EX), where results are known (EX/Mem), and where results are "published" to the registers (Writeback).]

Page 13: Data Hazards

Bypassing or Forwarding
• Take the values, wherever they are

13

[Pipeline diagram: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3; the add's EX result is forwarded directly to the sub's EX stage instead of waiting for Writeback.]

Page 14: Data Hazards

Forwarding Paths

14

[Pipeline diagram: forwarding paths from the add's EX and Mem stages into the EX stage of a following sub $t2, $s0, $t3, depending on how far behind the dependent instruction is.]

Page 15: Data Hazards

Forwarding in Hardware

[Datapath diagram: the five-stage pipeline datapath -- PC, instruction memory, register file, sign extend (16 to 32 bits), shift-left-2, ALU, data memory, and the IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB pipeline registers -- over which the forwarding paths are added.]
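As a behavioral sketch of what the forwarding logic in this datapath decides (Python, with made-up field names rather than the actual wires in the figure): each ALU operand comes from the youngest older instruction that is about to write that register, falling back to the register file.

from dataclasses import dataclass

@dataclass
class PipeReg:           # simplified view of the Exec/Mem or Mem/WB pipeline register
    reg_write: bool      # will this instruction write a register?
    dest: int            # which register it writes
    value: int           # the value it will write

def forward_operand(src, regfile, ex_mem, mem_wb):
    # Priority: the Exec/Mem result is the most recent, then Mem/WB, then the
    # register file. Register 0 ($zero) is never forwarded.
    if src != 0 and ex_mem.reg_write and ex_mem.dest == src:
        return ex_mem.value
    if src != 0 and mem_wb.reg_write and mem_wb.dest == src:
        return mem_wb.value
    return regfile[src]

# Example: $s0 (register 16) is being produced by the instruction now in Exec/Mem.
regs = [0] * 32
print(forward_operand(16, regs, PipeReg(True, 16, 42), PipeReg(False, 0, 0)))   # 42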

Page 16: Data Hazards

Forwarding for Loads

• Load values come from the Mem stage

16

[Pipeline diagram: ld $s0, 0($t0) followed by sub $t2, $s0, $t3; the load's value is not available until Mem, after the sub's EX stage has already started.]

Time travel presents significant implementation challenges.

Page 17: Data Hazards

What can we do?

• Punt to the compiler
  • Easy enough.
  • Will work.
  • Same dangers apply as before.
• Always stall.
• Forward when possible, stall otherwise (sketched below)
  • Here the compiler still has leverage
  • If the compiler can't fix it, the hardware will stall
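To make "forward when possible, stall otherwise" concrete for loads, here is a hedged Python sketch of the classic load-use check (field names invented for illustration): stall for one cycle when the instruction in EX is a load whose destination is a source of the instruction in Decode; after that cycle, the value can be forwarded from Mem.

def load_use_stall(ex_is_load, ex_dest, decode_src1, decode_src2):
    # A load's data is only available after Mem, so a dependent instruction
    # immediately behind it cannot get the value in time for its own EX stage.
    return ex_is_load and ex_dest in (decode_src1, decode_src2)

# Example: ld writes $s0 while the next instruction (sub) reads $s0 -- stall one cycle.
print(load_use_stall(True, "$s0", "$s0", "$t3"))   # True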

17

Page 18: Data Hazards

Hardware Cost of Forwarding

• In our pipeline, adding forwarding required relatively little hardware.
• For deeper pipelines it gets much more expensive
  • Roughly: the number of ALUs times the number of pipeline stages you need to forward over
  • Some modern processors have multiple ALUs (4-5)
  • And deeper pipelines (4-5 stages to forward across)
• Not all forwarding paths need to be supported.
  • If a path does not exist, the processor will need to stall.
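Plugging the slide's own rough numbers into its ALUs-times-stages estimate (this is only the back-of-the-envelope formula above, not a real design count):

num_alus = 4           # modern cores: roughly 4-5 ALUs
stages_to_forward = 4  # roughly 4-5 stages whose results may need forwarding
bypass_sources = num_alus * stages_to_forward
print(bypass_sources)  # 16 forwarding sources, per the slide's rough estimate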

18

Page 19: Data Hazards

Key Points: Control Hazards

• Control hazards occur when we don't know what the next instruction is
  • Mostly caused by branches
• Strategies for dealing with them
  • Stall
  • Guess!
    • Leads to speculation
    • Flushing the pipeline
    • Strategies for making better guesses
• Understand the difference between stall and flush

19

Page 20: Data Hazards

Control Hazards

• Computing the new PC

20

add $s1, $s3, $s2
sub $s6, $s5, $s2
beq $s6, $s7, somewhere   # the next PC depends on this branch's outcome
and $s2, $s3, $s1         # ...so is this the next instruction, or not?

[Pipeline diagram: the five stages Fetch, Decode, EX, Mem, Writeback.]

Page 21: Data Hazards

Computing the PC

• Non-branch instruction
  • PC = PC + 4

• When is PC ready?

21

[Pipeline diagram: the five stages Fetch, Decode, EX, Mem, Writeback.]

Page 22: Data Hazards

Computing the PC

• Branch instructions
  • bne $s1, $s2, offset
  • if ($s1 != $s2) { PC = PC + offset; } else { PC = PC + 4; }

• When is the value ready?

22

[Pipeline diagram: the five stages Fetch, Decode, EX, Mem, Writeback.]
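Putting the non-branch case from the previous slide and the bne case above together, here is a minimal next-PC sketch in Python, using the slide's simplified PC arithmetic (real MIPS shifts the offset left by 2 and adds it to PC + 4):

def next_pc(pc, is_bne, s1, s2, offset):
    # Branch taken: jump by the offset; otherwise fall through to the next instruction.
    if is_bne and s1 != s2:
        return pc + offset
    return pc + 4

print(hex(next_pc(0x100, True, 1, 2, 16)))    # taken: 0x110
print(hex(next_pc(0x100, False, 0, 0, 0)))    # not a branch: 0x104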

Page 23: Data Hazards

Option 2: Simple Prediction

• Can a processor tell the future?
• For non-taken branches, the new PC is ready immediately.
• Let's just assume the branch is not taken
  • Also called "branch prediction" or "control speculation"
• What if we are wrong?

23

Page 24: Data Hazards

Predict Not-taken

• We start the add, and then, when we discover the branch outcome, we squash it.
• We "flush" the pipeline.

24

bne $t2, $s4, else
...
else:
sub $t2, $s0, $t3

[Pipeline diagram: the bne flows through Fetch, Decode, EX, Mem, Writeback. On the not-taken path the following add $s0, $t0, $t1 is fetched and started; if the branch turns out to be taken, the add is squashed and fetching restarts at else: (the sub).]

Page 25: Data Hazards

Simple “static” Prediction

• "static" means before run time
• Many prediction schemes are possible
• Predict taken
  • Pros?
• Predict not-taken
  • Pros?

25

Answers: Predict taken helps because loops are common (backward loop branches are usually taken). Predict not-taken helps because not all branches are for loops. Backward taken/forward not-taken gives the best of both worlds.

Page 26: Data Hazards

Implementing Backward taken/forward not taken

• Changes in control
  • New inputs to the control unit
    • The sign of the offset
    • The result of the branch
  • New outputs from control
    • The flush signal (sketched below)
    • Inserts "noop" bits in datapath and control
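A small Python sketch of the backward-taken/forward-not-taken decision and the resulting flush signal (illustrative only; these are not the actual control signal names):

def btfn_predict_taken(offset):
    # Backward branch (negative offset) usually closes a loop: predict taken.
    # Forward branch: predict not-taken.
    return offset < 0

def need_flush(predicted_taken, actual_taken):
    # Mispredicted: squash the wrongly fetched instructions by inserting
    # "noop" bits into the datapath and control.
    return predicted_taken != actual_taken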

26

Page 27: Data Hazards

The Importance of Pipeline depth

• There are two important parameters of the pipeline that determine the impact of branches on performance
  • Branch decode time -- how many cycles it takes to identify a branch (in our case, this is less than 1)
  • Branch resolution time -- cycles until the real branch outcome is known (in our case, this is 2 cycles)

27

Page 28: Data Hazards

Pentium 4 pipeline
1. Branches take 19 cycles to resolve
2. Identifying a branch takes 4 cycles.
3. Stalling is not an option.
4. Not quite as bad now, but branch prediction (BP) is still very important.

Page 29: Data Hazards

Dynamic Branch Prediction

• Long pipes demand higher accuracy than static schemes can deliver.
• Instead of making the guess once, make it every time we see the branch.
• Predict future behavior based on past behavior
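One standard way to "predict future behavior based on past behavior" is a 2-bit saturating counter per branch; this generic textbook scheme is sketched below in Python and is not necessarily the exact predictor developed later in the lecture.

class TwoBitPredictor:
    # States 0-1 predict not-taken, 2-3 predict taken; the counter saturates,
    # so a single odd outcome does not flip a well-established prediction.
    def __init__(self):
        self.counter = 1                  # start weakly not-taken

    def predict_taken(self):
        return self.counter >= 2

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)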

29

