
Topic 10: Pipelining - Princeton University Computer Science · • Pipelining is a fundamental...

Date post: 09-Jun-2020
1 Topic 10: Pipelining COS / ELE 375 Computer Architecture and Organization Princeton University Fall 2015 Prof. David August
Transcript
Page 1: Topic 10: Pipelining - Princeton University Computer Science · • Pipelining is a fundamental concept in computers/nature • Multiple instructions in flight • Limited by length


Topic 10: Pipelining

COS / ELE 375

Computer Architecture and Organization

Princeton University Fall 2015

Prof. David August


Pipelining is Natural: Assembly Line!

Laundry Example
•  Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, fold, and put away
•  Washer takes 30 minutes
•  Dryer takes 30 minutes
•  “Folder” takes 30 minutes
•  “Stasher” takes 30 minutes to put clothes into drawers

[Figure: the four loads, labeled A, B, C, D]


Sequential Laundry

Sequential laundry takes 8 hours for 4 loads.

If they learned pipelining, how long would laundry take?

[Figure: tasks A, B, C, D in task order against a time axis from 6 PM to 2 AM; each load occupies four consecutive 30-minute steps, one load after another]


Pipelined Laundry: Start work ASAP

•  Pipelined laundry takes 3.5 hours for 4 loads!

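[Figure: tasks A, B, C, D in task order against a time axis starting at 6 PM; each load starts 30 minutes after the previous one, and all four finish after seven 30-minute steps]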


Slow Dryers

5.5 Hours. What is going on here?

[Figure: tasks A, B, C, D in task order against a time axis starting at 6 PM; with a slower dryer the timeline stretches to 5.5 hours]
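The three laundry totals (8 hours, 3.5 hours, 5.5 hours) can be reproduced with a small timing model. This is a minimal sketch, not part of the slides: the function names are made up for illustration, clothes are assumed to be able to wait between machines, and the 60-minute dryer is an assumption chosen because it reproduces the 5.5-hour figure.

```python
def sequential_finish_time(stage_minutes, num_loads):
    """Each load runs through every stage before the next load starts."""
    return num_loads * sum(stage_minutes)


def pipeline_finish_time(stage_minutes, num_loads):
    """Minutes until the last load leaves the last stage of the pipeline."""
    # stage_free[j] is the time at which stage j finishes its previous load.
    stage_free = [0] * len(stage_minutes)
    for _ in range(num_loads):
        ready = 0  # time at which this load is ready for its next stage
        for j, minutes in enumerate(stage_minutes):
            start = max(ready, stage_free[j])  # wait for the load and the machine
            ready = start + minutes
            stage_free[j] = ready
    return stage_free[-1]


balanced = [30, 30, 30, 30]    # washer, dryer, folder, stasher
slow_dryer = [30, 60, 30, 30]  # assumption: the dryer takes 60 minutes

print(sequential_finish_time(balanced, 4) / 60)  # 8.0 hours
print(pipeline_finish_time(balanced, 4) / 60)    # 3.5 hours
print(pipeline_finish_time(slow_dryer, 4) / 60)  # 5.5 hours
```

In the slow-dryer case every load after the first waits for the dryer, so the dryer alone sets the rate at which loads complete.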


Pipelining Lessons

1.  Pipelining doesn’t help latency of single task, it helps throughput of entire workload

2.  Multiple tasks operate simultaneously using different resources

3.  Potential speedup = Number of pipe stages

4.  Pipeline rate limited by slowest pipeline stage

5.  Unbalanced lengths of pipe stages reduce speedup

6.  Time to “fill” pipeline and time to “drain” it reduce speedup

7.  Stall for Dependences

[Figure: pipelined laundry timeline; tasks A, B, C, D in task order against a time axis starting at 6 PM, seven 30-minute steps in total]
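Lessons 3 and 6 can be tied together with a short worked equation. The symbols are assumptions introduced here, not taken from the slides: $k$ balanced stages, each taking time $t$, and $n$ tasks.

```latex
T_{\text{seq}} = n\,k\,t, \qquad
T_{\text{pipe}} = (k + n - 1)\,t, \qquad
\text{Speedup} = \frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{n\,k}{k + n - 1}
```

For the laundry example, $n = 4$ and $k = 4$ give $16/7 \approx 2.3$, which matches 8 hours versus 3.5 hours. As $n$ grows, the fill and drain terms matter less and the speedup approaches $k$, the number of pipe stages.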


MIPS Pipe Stages == The Five Execution Steps

1.  Instruction Fetch

2.  Instruction Decode and Register Fetch

3.  Execution, Memory Address Computation, or Branch Completion

4.  Memory Access or R-type instruction completion

5.  Write-Back Step


Pipelining in MIPS

IDEAL?


Can We Pipeline the Multicycle Datapath?


Can We Pipeline the Unicycle Datapath?


Unicycle

How do we split the datapath into stages?


Basic Idea


Slicing of Datapath

Rectangles are pipeline registers


Slicing of Datapath

Anything wrong in this picture?


Corrected Datapath

Other(?) Control Signals?


Another View: Single Cycle, Multiple Cycle, vs. Pipeline

Single Cycle Implementation (Clk: Cycle 1, Cycle 2):
•  Cycle 1: Load
•  Cycle 2: Store, Waste

Multiple Cycle Implementation (Clk: Cycle 1 through Cycle 10):
•  Load: Ifetch Reg Exec Mem Wr
•  Store: Ifetch Reg Exec Mem
•  R-type: Ifetch ...

Pipeline Implementation:
•  Load: Ifetch Reg Exec Mem Wr
•  Store: Ifetch Reg Exec Mem Wr
•  R-type: Ifetch Reg Exec Mem Wr

[Figure: timing diagram comparing the three implementations; in the pipeline, each instruction starts one cycle after the previous one]

Looks good, but….


Performance? (Is it worth the pain?)

Unicycle Machine
•  45 ns/cycle x 1 CPI x 100 inst = 4500 ns

Multicycle Machine
•  10 ns/cycle x 4.6 CPI (inst mix) x 100 inst = 4600 ns

Ideal pipelined machine with 5 pipeline stages
•  10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
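The three totals follow one formula: cycle time x (CPI x instruction count + extra cycles). A minimal sketch that redoes the arithmetic with the numbers on the slide; the function and parameter names are illustrative only.

```python
def total_time_ns(cycle_ns, cpi, instructions, extra_cycles=0):
    """Execution time = cycle time x (CPI x instruction count + extra cycles)."""
    return cycle_ns * (cpi * instructions + extra_cycles)


unicycle = total_time_ns(45, 1, 100)
multicycle = total_time_ns(10, 4.6, 100)
# The ideal 5-stage pipeline needs 4 extra cycles for the last instruction to drain.
pipelined = total_time_ns(10, 1, 100, extra_cycles=4)

print(f"unicycle:   {unicycle:.0f} ns")
print(f"multicycle: {multicycle:.0f} ns")
print(f"pipelined:  {pipelined:.0f} ns")
print(f"pipelined speedup over unicycle: {unicycle / pipelined:.2f}x")
```

With only 100 instructions the ideal pipeline comes out a little over 4 times faster than the unicycle machine, short of the 5x suggested by the stage count, because the pipeline cycle (10 ns) is longer than one fifth of the unicycle cycle (45 ns) and the drain adds 4 cycles.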


Unicycle Implementation Detail

Unpipelined System

[Figure: combinational logic (30ns) followed by a register (3ns), driven by Clock; on the time axis Op1, Op2, Op3, ... run one after another]

Delay = 33ns, Throughput = 30MHz

•  One operation must complete before next can begin
•  Operations spaced 33ns apart


3 Stage Pipeline Implementation Detail

•  Space operations 13ns apart
•  3 operations executing simultaneously

[Figure: three stages in series, each combinational logic (10ns) followed by a register (3ns), all driven by Clock; on the time axis Op1, Op2, Op3, Op4, ... overlap, each starting 13ns after the previous one]

Delay = 39ns, Throughput = 77MHz


Limitation 1: Nonuniform Pipelining

[Figure: three stages driven by Clock, with combinational logic delays of 5ns, 15ns, and 10ns, each followed by a 3ns register]

Delay = 18 * 3 = 54 ns, Throughput = 55MHz

•  Throughput limited by slowest stage
•  Delay determined by clock period * number of stages
•  Must attempt to balance stages


Limitation 2: Deep Pipelines

•  Diminishing returns as we add more pipeline stages
•  Register delays become limiting factor
•  Increased latency
•  Small throughput gains

[Figure: six stages driven by Clock, each combinational logic (5ns) followed by a register (3ns)]

Delay = 48ns, Throughput = 125MHz

Unfortunately, there are other complications…
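The delay and throughput figures on the last four slides all come from the same rule: the clock period is the slowest stage's logic delay plus the register delay, latency is the period times the number of stages, and throughput is one operation per period. A minimal sketch of that rule, with illustrative names and the 3ns register delay from the figures:

```python
def pipeline_timing(logic_ns, reg_ns=3):
    """Return (clock period in ns, latency in ns, throughput in MHz).

    logic_ns lists the combinational logic delay of each stage.
    """
    period = max(logic_ns) + reg_ns    # slowest stage sets the clock
    latency = period * len(logic_ns)   # every stage takes one full period
    throughput_mhz = 1000 / period     # one operation completes per period
    return period, latency, throughput_mhz


configs = {
    "unpipelined": [30],
    "3 balanced stages": [10, 10, 10],
    "3 unbalanced stages": [5, 15, 10],
    "6 stages": [5] * 6,
}
for name, stages in configs.items():
    period, latency, mhz = pipeline_timing(stages)
    print(f"{name}: period {period} ns, delay {latency} ns, {mhz:.1f} MHz")
```

Going from 3 balanced stages to 6 raises throughput from about 77MHz to 125MHz rather than doubling it, because the fixed 3ns register delay is now a large share of each 8ns cycle, while latency grows from 39ns to 48ns.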


Pipeline Hazards

Next instruction cannot immediately follow previous instruction in the presence of a hazard.

Three types: Structural, Control, Data

Structural Hazards
•  Resource oversubscription
•  Suppose we had only one memory
•  In laundry, think of a washer/dryer combo unit


Pipeline Hazards
Control Hazards

•  What is the next instruction?
•  Branch instructions take time to compute this.

Solution 1: Stall

Pipeline Stall (AKA Bubble)


Pipeline Hazards
Control Hazards

•  What is the next instruction?
•  Branch instructions take time to compute this.

Solution 2: Predict the Branch Target


Pipeline Hazards
Control Hazards

•  What is the next instruction?
•  Branch instructions take time to compute this.

Solution 2: (Mis)Predict the Branch Target


Pipeline Hazards
Control Hazards

•  What is the next instruction?
•  Branch instructions take time to compute this.

Solution 3: Delayed Decision (Used in MIPS)

More about Branch Prediction/Delayed Branching Later…

50% Fill Rate


Pipeline Hazards
Data Hazards

Value from prior instruction is needed before write back

Typical Instruction (new representation):


Pipeline Hazards
Data Hazards

Value from prior instruction is needed before write back

Data Hazard:

Solution: Bypassing


Pipeline Hazards
Data Hazards

Value from prior instruction is needed before write back

Load-Use Data Hazard:

Options: Delayed Load or Bubble


Summary and Real Stuff

Summary
•  Pipelining is a fundamental concept in computers/nature
•  Multiple instructions in flight
•  Limited by length of longest stage, Latency vs. Throughput
•  Hazards gum up the works

Real Stuff
•  MIPS I instruction set architecture made pipeline visible (delayed branch, delayed load)
•  More performance from deeper pipelines, parallelism to a point
•  Pentium 4 has 22 pipe stages!


Review: Pipelined Datapath

Note that all R-Type Instructions have a NULL stage!


Review: Pipeline Hazards
Structural Hazards

Resource oversubscription:


Review: Pipeline Hazards
Control Hazards

•  What is the next instruction?
•  Branch instructions take time to compute this.

Stall, Predict, or Delay:

Pipeline Stall - only 1 cycle/stage delay…


Review: Pipeline Hazards
Control Hazards

•  What is the next instruction?
•  Branch instructions take time to compute this.

Delayed Decision (Used in MIPS):

More about Branch Prediction/Delayed Branching Later…

50% Fill Rate


Review: Pipeline Hazards
Data Hazards

Value from prior instruction is needed before write back

Data Hazard:

Solution: Bypassing


Review: Pipeline Hazards
Data Hazards

Value from prior instruction is needed before write back

Load-Use Data Hazard:

Options: Delayed Load or Bubble


Compiler Avoidance of Load Stalls

% loads stalling pipeline

Benchmark   scheduled   unscheduled
tex         25%         65%
spice       14%         42%
gcc         31%         54%

[Bar chart; horizontal axis from 0% to 80%]


Pipeline Control


Pipeline Control
•  Control is divided into 5 stages
•  Signal values same as unicycle case!
•  Timing is different…


Pipeline Control
•  Signal values same as unicycle case!
•  Timing is different…
•  Simplest method: Extend pipe registers


Pipeline Control


What About Data Hazards?


What About Data Hazards?


Forwarding Unit

How does the Forwarding Unit know when to forward?


Forwarding Unit

EX Hazard: EX/MEM.RegWrite AND EX/MEM.RegisterRd != 0 AND EX/MEM.RegisterRd == ID/EX.RegisterReadRs(Rt)

MEM Hazard very similar, but prefer MEM over WB value
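The forwarding condition can be written as a small decision function. This is a sketch under stated assumptions, not the slide's hardware: `PipeReg`, `forward_source`, and the string labels are hypothetical names, the MEM/WB check is assumed to mirror the EX hazard test ("very similar"), and checking EX/MEM first is how the sketch prefers the more recent value.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical view of the fields a pipeline register carries forward."""
    reg_write: bool  # RegWrite control bit
    rd: int          # destination register number (RegisterRd)


def forward_source(ex_mem, mem_wb, src_reg):
    """Decide where an ALU operand for register src_reg (Rs or Rt) comes from."""
    # EX hazard: the instruction one ahead writes the register we read.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "EX/MEM"
    # MEM hazard: same test one stage later, used only if there is no EX hazard.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "MEM/WB"
    return "REGFILE"  # no hazard: use the value read in decode


# Both older instructions write $2; the newer result in EX/MEM wins.
print(forward_source(PipeReg(True, 2), PipeReg(True, 2), src_reg=2))
```

The `rd != 0` test is there because register 0 always reads as zero in MIPS, so a write to it must never be forwarded.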


What About Load-Use Stall?
•  Forwarding can’t save the day
•  Need to introduce stall in hardware or compiler

Time Travel Necessary!


What About Load-Use Stall?


Hazard Detection Unit

How does the Hazard Detection Unit know when to stall?

Nop is all zeros!!


Hazard Detection Unit

ID/EX.MemRead AND (ID/EX.RegisterRt == IF/ID.RegisterRs OR ID/EX.RegisterRt == IF/ID.RegisterRt)
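The load-use condition translates directly into a boolean function. A minimal sketch; the function name, parameter names, and the example instruction pair are illustrative and not taken from the slides.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in decode needs the result of a load in execute."""
    return bool(
        id_ex_mem_read
        and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)
    )


# Illustrative pair: lw $2, 20($1) followed by and $4, $2, $5.
# The load writes $2 (its Rt) and the next instruction reads $2 as Rs.
print(must_stall(id_ex_mem_read=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))  # True

# An instruction in execute that is not a load never triggers this stall.
print(must_stall(id_ex_mem_read=False, id_ex_rt=2, if_id_rs=2, if_id_rt=5))  # False
```

When the function returns true, the hardware holds the instruction in decode for one cycle and sends a bubble down the pipeline; after that, forwarding from MEM/WB can supply the loaded value.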


What About Control Hazards? (Predict Not-Taken Machine)

Architectural State Change?

We are OK, as long as we squash. Can we reduce delay?


Reduce Branch Delay

1.  Move branch address calculation to decode stage (from MEM stage)

2.  Move branch decision up (Harder)
•  Bitwise-XOR, test for zero
•  Only need Equality testing
•  Much faster: No carry

Everything is done in decode stage!!
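The equality test in step 2 can be sketched in software. XOR compares each bit position independently, so unlike subtraction no carry has to ripple across the word, and the two registers are equal exactly when every XOR bit is zero. The function name and the 32-bit width are assumptions for illustration.

```python
def regs_equal(a, b, width=32):
    """Equality via bitwise XOR and a test for zero, as a decode-stage comparator would."""
    mask = (1 << width) - 1
    return ((a ^ b) & mask) == 0


print(regs_equal(0x1234, 0x1234))  # True: a beq on these values would be taken
print(regs_equal(0x1234, 0x1235))  # False: the low bit differs
```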


What About Control Hazards?


What About Control Hazards?


Review: Exceptions

•  What happens if instruction encoding is not valid?
•  What about arithmetic overflow?

Exception: An event that disrupts program execution.

When an exception occurs:
•  Save the current PC in the EPC
•  Cause = 0 for Undefined Instruction, 1 for Overflow
•  Jump to the OS at 0xC0000000 (not vectored)
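The three steps can be sketched as a state update. This is a toy model under stated assumptions: `CpuState` and `raise_exception` are hypothetical names, and the sketch saves whatever value `pc` currently holds without deciding whether that is the excepting instruction's address or the next one.

```python
from dataclasses import dataclass

UNDEFINED_INSTRUCTION = 0   # Cause value for an invalid encoding
ARITHMETIC_OVERFLOW = 1     # Cause value for overflow
HANDLER_ADDRESS = 0xC0000000  # single OS entry point (not vectored)


@dataclass
class CpuState:
    pc: int
    epc: int = 0
    cause: int = 0


def raise_exception(state, cause):
    """Apply the three exception steps to a toy CPU state."""
    state.epc = state.pc        # save the current PC in the EPC
    state.cause = cause         # record why the exception happened
    state.pc = HANDLER_ADDRESS  # jump to the OS handler
    return state


state = raise_exception(CpuState(pc=0x00400020), ARITHMETIC_OVERFLOW)
print(hex(state.epc), state.cause, hex(state.pc))
```

Because every exception enters at the same address, the handler has to read Cause to find out what happened, which is what "not vectored" means here.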


Review: Multicycle Exception Handling


Exceptions in Pipelines

•  Exception must appear to programmer/OS as it would in unicycle/multicycle

•  Must squash in-flight instructions after excepting inst

•  Looks a lot like a branch…

Don’t Forget EPC and Cause!!!


Pipeline Exception Handling


Look at this mess!!!


Precise vs. Imprecise Exceptions

Precise Exceptions
•  EPC has value of excepting instruction PC
•  Easy for OS to handle
•  We have been looking at precise exception machine

Imprecise Exceptions
•  Reduce pipeline complexity by putting current PC or other approximation into EPC
•  OS figures it out


Summary

•  Pipelining is a fundamental concept in computers/nature
•  Multiple instructions in flight
•  Limited by length of longest stage, Latency vs. Throughput
•  Hazards gum up the works
•  Pipeline Control can be messy!

