Processor Level Parallelism. Improving the Pipeline Pipelined processor – Ideal speedup = num...

Date post: 17-Jan-2016
Upload: pauline-wade
Processor Level Parallelism
Transcript
Page 1: Processor Level Parallelism. Improving the Pipeline Pipelined processor – Ideal speedup = num stages – Branches / conflicts mean limited returns after.

Processor Level Parallelism

Improving the Pipeline

• Pipelined processor
  – Ideal speedup = number of stages
  – Branches / conflicts mean limited returns after a certain point
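The "ideal speedup = number of stages" claim can be checked with a little arithmetic. The sketch below is illustrative only: it assumes every stage takes one cycle, and the stall model (a fixed average number of stall cycles per instruction) is a common textbook simplification, not something taken from the slides. The function names are made up for this example.

```python
def pipeline_speedup(num_instructions: int, num_stages: int) -> float:
    """Speedup of a stall-free pipeline over an unpipelined design.

    Unpipelined: every instruction occupies all stages before the next starts.
    Pipelined: the first instruction needs num_stages cycles, then one
    instruction finishes per cycle.
    """
    unpipelined_cycles = num_instructions * num_stages
    pipelined_cycles = num_stages + (num_instructions - 1)
    return unpipelined_cycles / pipelined_cycles


def speedup_with_stalls(num_stages: int, stalls_per_instruction: float) -> float:
    """Long-run speedup when each instruction adds some stall cycles on average."""
    return num_stages / (1 + stalls_per_instruction)


if __name__ == "__main__":
    print(pipeline_speedup(1000, 5))     # approaches 5 as the instruction count grows
    print(speedup_with_stalls(5, 0.25))  # stalls eat into the ideal figure
```

For long instruction streams the first function approaches the stage count, while even a small stall rate in the second pulls the speedup well below it.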

ILP

• Instruction Level Parallelism
  – Ability to run multiple instructions at the same time

Superscalar

• Superscalar : capable of running multiple instructions at a time
  – Multiple execution units
    • Widen slowest part of pipeline

Superscalar

• Multi-issue : Start multiple instructions per clock
  – Parallel pipes

Superscalar

• Multi-issue pipeline feeding multiple execution units

Superscalar

• Issue: Dependency issues just got MUCH harder…
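To see why dependencies get harder with multi-issue, here is a minimal sketch of the register check needed before two instructions can start in the same clock. The `Instr` type and function names are hypothetical, and the check is deliberately conservative: real hardware can remove some of these conflicts (for example through register renaming), which this sketch does not model.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Register hazards that stop `second` from issuing alongside `first`."""
    found = set()
    if first.dest in second.srcs:
        found.add("RAW")    # second reads what first writes
    if second.dest in first.srcs:
        found.add("WAR")    # second overwrites what first still reads
    if first.dest == second.dest:
        found.add("WAW")    # both write the same register
    return found


def can_dual_issue(first: Instr, second: Instr) -> bool:
    return not hazards(first, second)


def pairwise_checks(width: int) -> int:
    """Number of instruction pairs to compare in a group of `width` instructions."""
    return width * (width - 1) // 2


if __name__ == "__main__":
    add = Instr("r1", ("r2", "r3"))
    sub = Instr("r4", ("r1", "r5"))
    print(hazards(add, sub), can_dual_issue(add, sub))
    print([pairwise_checks(w) for w in (2, 4, 8)])
```

The pair count grows roughly with the square of the issue width, which is one way to read the "MUCH harder" on the slide.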

Superscalar Pro/Con

• Good
  – The hardware solves everything:
    • Hardware solves scheduling/registers/etc…
    • Compiler can still help matters
  – Binary compatibility
    • New hardware issues old instructions in a more efficient way
• Bad
  – Complex hardware
  – Limit to scale

VLIW

• VLIW : Very Long Instruction Word
  – One instruction contains multiple ops

VLIW

• Instructions VERY large
  – 240 bits?
  – Wasted space addressed by bundles
    • No dependencies within bundle

Who does work?

• Compiler assembles long instructions
  – Reorders at compile time
• Compiler has more time, information
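As a rough illustration of the compiler's job, the sketch below greedily packs a straight-line list of operations into fixed-width bundles so that no bundle contains two operations that conflict on a register. It is a toy under stated assumptions: operations are `(dest, srcs)` tuples, only register conflicts are considered (no memory ordering, latencies, or functional-unit types), and the function name and bundle width are invented for the example. Real VLIW compilers use far more elaborate scheduling.

```python
def pack_bundles(ops, width=3):
    """Greedy packing of ops into bundles of at most `width` ops.

    ops: list of (dest, srcs) tuples in program order.
    Returns a list of bundles, each a list of op indices.
    Any op that conflicts with an earlier op (RAW, WAR or WAW on a register)
    is placed in a strictly later bundle than that earlier op.
    """
    bundles = []
    placed_in = []  # bundle index chosen for each op so far
    for i, (dest, srcs) in enumerate(ops):
        earliest = 0
        for j in range(i):
            prev_dest, prev_srcs = ops[j]
            conflict = prev_dest in srcs or dest in prev_srcs or dest == prev_dest
            if conflict:
                earliest = max(earliest, placed_in[j] + 1)
        b = earliest
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1          # bundle is full, try the next one
        if b == len(bundles):
            bundles.append([])
        bundles[b].append(i)
        placed_in.append(b)
    return bundles


if __name__ == "__main__":
    ops = [
        ("r1", ("r2", "r3")),
        ("r4", ("r5", "r6")),
        ("r7", ("r1", "r4")),   # needs the results of ops 0 and 1
        ("r8", ("r9",)),
    ]
    print(pack_bundles(ops))    # [[0, 1, 3], [2]]
```

The second bundle holds a single operation, so its other slots would go unused, which is the wasted space the earlier slide mentions.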

VLIW Uses

• Itanium :
  – EPIC : Explicitly Parallel Instruction Computing
  – 3-instruction bundles

VLIW Pro/Con

• Good
  – Simple hardware
    • Add new functional units with no new scheduling hardware
  – Better optimization in compiler
• Bad
  – Binary compatibility : compiler builds for one specific hardware
  – Good compilers are HARD to write

ARM 15

• Modern CPU:

Processor Parallelism

• Processor Parallelism : Run multiple instruction streams simultaneously

Process vs Thread

• Process : Program
  – Own memory space
  – Has at least one thread

Process vs Thread

• Thread : Instruction sequence
  – Own registers/stack
  – Share memory with other threads in process
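A small sketch of the process/thread distinction, using Python's standard `threading` module. It is not the demo from the original talk; the names `worker`, `run`, and `shared_counter` are invented here. Each thread keeps `local_total` on its own stack, while all threads update one `shared_counter` in the process's shared memory, which is why a lock is used.

```python
import threading

shared_counter = 0            # lives in the process's memory, visible to every thread
lock = threading.Lock()


def worker(iterations: int) -> None:
    global shared_counter
    local_total = 0           # local variable: private to this thread's stack
    for _ in range(iterations):
        local_total += 1
    with lock:                # shared data needs coordinated access
        shared_counter += local_total


def run(num_threads: int = 4, iterations: int = 1000) -> int:
    global shared_counter
    shared_counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared_counter


if __name__ == "__main__":
    print(run())              # 4 threads x 1000 iterations
```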

Threaded Code

• Demo…

Context Switching

• Four threads running in 4-wide pipeline
  – Can't always fill all 4 issue slots
  – Have bubbles from memory access, page faults, etc…

Context Switching

• Threads often have bubbles…

Multithreading

• Multithreading : Alternate threads to maximize hardware use
  – Coarse : run until stall, then switch
  – Fine : switch every cycle
  – Either one needs extra hardware
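The two switching policies can be compared with a toy single-issue simulation. Everything here is an assumption made for illustration: each thread is a list of instruction latencies (a latency of 3 means the thread cannot issue again for 3 cycles), switching threads costs nothing, and `cycles_to_issue` is a made-up name. It is a sketch of the idea, not a model of any real processor.

```python
def cycles_to_issue(threads, policy):
    """Cycles needed to issue every instruction of every thread.

    threads: list of threads, each a list of latencies (integers >= 1).
    policy:  "coarse" keeps issuing from the current thread until it stalls;
             "fine" moves on to the next ready thread every cycle.
    """
    n = len(threads)
    next_instr = [0] * n      # index of each thread's next instruction
    ready_at = [0] * n        # first cycle in which each thread may issue again
    remaining = sum(len(t) for t in threads)
    current = 0
    cycle = 0
    while remaining > 0:
        start = current + 1 if policy == "fine" else current
        for k in range(n):
            t = (start + k) % n
            if next_instr[t] < len(threads[t]) and ready_at[t] <= cycle:
                ready_at[t] = cycle + threads[t][next_instr[t]]
                next_instr[t] += 1
                remaining -= 1
                current = t
                break         # single-issue: at most one instruction per cycle
        cycle += 1            # if no thread was ready, this cycle is a bubble
    return cycle


if __name__ == "__main__":
    one_thread = [1, 3, 1]    # the middle instruction stalls its thread
    print(cycles_to_issue([one_thread], "coarse"))               # 5
    print(cycles_to_issue([one_thread, one_thread], "coarse"))   # 7
    print(cycles_to_issue([one_thread, one_thread], "fine"))     # 7
```

Running the two threads back to back would take 10 cycles; either policy hides most of the stall time by issuing from the other thread and finishes in 7.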

Multithreading Superscalar

• A 2-instruction wide pipeline with multithreading:
  – Still only one thread per cycle

[Figure panels: Fine grained, Coarse grained]

SMT

• SMT : Simultaneous Multithreading
  – AKA Hyperthreading
• Issue ops from multiple threads in one cycle
• Maximize use of functional units
  – But need to track registers each instruction goes with…
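The benefit of issuing from several threads in one cycle can be sketched by counting filled issue slots. The model is a simplification chosen for this example: `ready_ops[t][c]` is the number of operations thread `t` could issue in cycle `c`, unissued operations do not carry over to later cycles, and without SMT the hardware picks the single best thread each cycle. The function name and numbers are hypothetical.

```python
def issue_slots_used(ready_ops, width, smt):
    """Count issue slots filled over all cycles.

    ready_ops: one list per thread, giving ready-op counts per cycle.
    width:     issue slots available each cycle.
    smt:       True  -> ops from all threads may share a cycle;
               False -> only one thread (the one with most ready ops) issues.
    """
    num_cycles = len(ready_ops[0])
    used = 0
    for cycle in range(num_cycles):
        per_thread = [thread[cycle] for thread in ready_ops]
        available = sum(per_thread) if smt else max(per_thread)
        used += min(width, available)
    return used


if __name__ == "__main__":
    ready = [
        [2, 0, 1, 3],   # thread 0
        [1, 2, 0, 1],   # thread 1
    ]
    total_slots = 4 * len(ready[0])
    print(issue_slots_used(ready, 4, smt=False), "of", total_slots)  # 8 of 16
    print(issue_slots_used(ready, 4, smt=True), "of", total_slots)   # 10 of 16
```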

SMT Challenges

• Resources must be duplicated or split
  – Split too thin hurts performance…
  – Duplicate everything and you aren't maximizing use of hardware…

Intel vs AMD

• Variations on SMT

Getting Faster

• Pipelining helps to a point
• Superscalar/VLIW helps to a point
• SMT helps a bit
• Chips getting faster

Getting Faster

• Pipelining helps to a point
• Superscalar/VLIW helps to a point
• SMT helps a bit
• Chips getting faster
• Only so much speedup possible
  – Power = heat
  – Power ∝ C V² f
    • C = Capacitance, how well it “stores” a charge
    • V = Voltage
    • f = frequency. I.e., how fast clock is (e.g., 3 GHz)
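A short worked example of the power relation. It assumes the capacitance C stays fixed and that voltage can be lowered by the same factor s as the clock frequency; that coupling is an assumption made for illustration, not a claim from the slides.

```latex
\begin{align*}
P &\propto C V^2 f \\
\frac{P'}{P} &= \frac{C\,(sV)^2\,(sf)}{C\,V^2 f} = s^3 \\
s = 0.8 &\;\Rightarrow\; \frac{P'}{P} = 0.8^3 = 0.512
\end{align*}
```

Under these assumptions a 20% slower clock needs roughly half the power, and the reverse is why pushing frequency higher gets expensive so quickly.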

Power Density Prediction circa 2000

[Figure: power density (W/cm²), log scale from 1 to 10000, versus year, 1970 to 2010. Plotted processors: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P6, Core 2. Reference levels: Hot Plate, Nuclear Reactor, Rocket Nozzle, Sun’s Surface.]

Source: S. Borkar (Intel)

Adapted from UC Berkeley "The Beauty and Joy of Computing"

Moore's Law Related Curves

Adapted from UC Berkeley "The Beauty and Joy of Computing"

Moore's Law Related Curves

Adapted from UC Berkeley "The Beauty and Joy of Computing"

Going Multi-core Helps Energy Efficiency

• Power of typical integrated circuit ∝ C V² f
  – C = Capacitance, how well it “stores” a charge
  – V = Voltage
  – f = frequency. I.e., how fast clock is (e.g., 3 GHz)

William Holt, HOT Chips 2005

Adapted from UC Berkeley "The Beauty and Joy of Computing"
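The multi-core argument can be put into numbers with the same relation. This is a sketch with invented figures: it assumes identical cores with equal capacitance, a perfectly parallel workload (throughput scales with cores times frequency), and scaling factors picked only for illustration.

```python
def relative_power(cores: int, v_scale: float, f_scale: float) -> float:
    """Power relative to one core at nominal voltage and frequency (P ~ C V^2 f)."""
    return cores * v_scale ** 2 * f_scale


def relative_throughput(cores: int, f_scale: float) -> float:
    """Throughput relative to one nominal core, assuming perfect parallelism."""
    return cores * f_scale


if __name__ == "__main__":
    # Hypothetical design point: two cores at half the clock and 80% of the voltage.
    power = relative_power(cores=2, v_scale=0.8, f_scale=0.5)
    work = relative_throughput(cores=2, f_scale=0.5)
    print(f"throughput x{work:.2f}, power x{power:.2f}, "
          f"throughput per watt x{work / power:.2f}")
```

With these assumed numbers the two slower cores match the single fast core's throughput at about 64% of the power. Workloads that do not parallelize well would see less benefit.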

