Posted on 17-Jan-2016

Processor Level Parallelism

Improving the Pipeline

• Pipelined processor
  – Ideal speedup = number of stages
  – Branches and conflicts (hazards) cause stalls, so returns diminish beyond a certain point
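The "ideal speedup = number of stages" figure is only reached in the limit. A minimal sketch, assuming a k-stage pipeline with one-cycle stages: n instructions take k + n - 1 cycles pipelined versus n·k cycles unpipelined. The `stall_cycles` parameter is a hypothetical, crude stand-in for the branch and conflict bubbles mentioned above.

```python
def pipeline_speedup(k: int, n: int, stall_cycles: int = 0) -> float:
    """Speedup of a k-stage pipeline over an unpipelined core for n instructions.

    Assumes every stage takes one cycle and the unpipelined core needs k cycles
    per instruction; stall_cycles is a crude stand-in for hazard bubbles.
    """
    unpipelined = n * k
    pipelined = k + (n - 1) + stall_cycles  # fill the pipe, then one result per cycle
    return unpipelined / pipelined


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(f"5 stages, {n:>4} instructions: {pipeline_speedup(5, n):.2f}x")
    # Bubbles eat into the gain
    print(f"with 300 stall cycles: {pipeline_speedup(5, 1000, stall_cycles=300):.2f}x")
```

The speedup approaches 5 only as n grows, and every stall cycle pulls it back down.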

ILP

• Instruction Level Parallelism
  – Ability to run multiple instructions at the same time

Superscalar

• Superscalar: capable of running multiple instructions at a time
  – Multiple execution units
    • Widen slowest part of pipeline

Superscalar

• Multi-issue: start multiple instructions per clock
  – Parallel pipes

Superscalar

• Multi-issue pipeline feeding multiple execution units

Superscalar

• Issue: dependency checking just got MUCH harder, since each instruction issued in a cycle must be checked against the others issued with it…
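A minimal sketch of the check an in-order dual-issue front end might make each cycle. Instructions are modeled as hypothetical `(dest, srcs)` register tuples; real hardware must also track functional-unit availability and memory dependences, which this ignores. The assumption that WAR alone is harmless (both instructions read operands in the same cycle) is labeled in the code.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str              # register written
    srcs: tuple[str, ...]  # registers read


def conflicts(first: Instr, second: Instr) -> list[str]:
    """Dependences of `second` on the earlier `first` in the same issue group."""
    found = []
    if first.dest in second.srcs:
        found.append("RAW")  # second reads a value first has not produced yet
    if second.dest in first.srcs:
        found.append("WAR")  # second overwrites a register first reads
    if first.dest == second.dest:
        found.append("WAW")  # both write the same register
    return found


def can_dual_issue(first: Instr, second: Instr) -> bool:
    # Assumption: both instructions read their operands in the same cycle,
    # so WAR alone is harmless; RAW or WAW forces `second` to wait.
    found = conflicts(first, second)
    return "RAW" not in found and "WAW" not in found


def pairwise_checks(width: int) -> int:
    """Comparisons needed if every instruction in a group is checked against
    every earlier one: grows roughly with the square of the issue width."""
    return width * (width - 1) // 2


add = Instr("r1", ("r2", "r3"))
sub = Instr("r4", ("r1", "r5"))  # reads r1, which add writes
mul = Instr("r6", ("r7", "r8"))  # independent of add
print(conflicts(add, sub), can_dual_issue(add, sub))  # ['RAW'] False
print(conflicts(add, mul), can_dual_issue(add, mul))  # [] True
print([pairwise_checks(w) for w in (2, 4, 8)])        # [1, 6, 28]
```

Going from 2-wide to 8-wide multiplies the comparisons by 28, which is one concrete reason the hardware gets complex quickly.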

Superscalar Pro/Con

• Good
  – The hardware solves everything:
    • Hardware handles scheduling/registers/etc…
    • Compiler can still help matters
  – Binary compatibility
    • New hardware issues old instructions in a more efficient way

• Bad
  – Complex hardware
  – Limited scalability

VLIW

• VLIW: Very Long Instruction Word
  – One instruction contains multiple ops

VLIW

• Instructions VERY large
  – 240 bits?
  – Wasted space addressed by bundles
    • No dependencies within a bundle

Who does work?

• Compiler assembles long instructions
  – Reorders at compile time

• Compiler has more time and information

VLIW Uses

• Itanium:
  – EPIC: Explicitly Parallel Instruction Computing
  – 3-instruction bundles
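A toy version of the compiler's job: pack a straight-line instruction sequence into bundles of up to 3 slots (the Itanium bundle width above) so that no op depends on another in the same bundle. This greedy, in-order packing is a simplification assumed for illustration; the op names and registers are hypothetical, and real IA-64 bundles also carry template bits and stop markers that this ignores.

```python
from typing import NamedTuple


class Op(NamedTuple):
    name: str
    dest: str              # register written
    srcs: tuple[str, ...]  # registers read


def depends(later: Op, earlier: Op) -> bool:
    """True if `later` may not share a bundle with `earlier` (RAW, WAR or WAW)."""
    return (earlier.dest in later.srcs      # RAW
            or later.dest in earlier.srcs   # WAR
            or later.dest == earlier.dest)  # WAW


def bundle(ops: list[Op], width: int = 3) -> list[list[str]]:
    """Greedy, in-order packing: start a new bundle when the current one is
    full or the next op depends on something already in it."""
    bundles: list[list[Op]] = []
    current: list[Op] = []
    for op in ops:
        if len(current) == width or any(depends(op, prev) for prev in current):
            bundles.append(current)
            current = []
        current.append(op)
    if current:
        bundles.append(current)
    # Slots the compiler could not fill become explicit no-ops (wasted space).
    return [[op.name for op in b] + ["nop"] * (width - len(b)) for b in bundles]


program = [
    Op("load", "r1", ("r10",)),
    Op("add", "r2", ("r1", "r3")),   # needs the loaded r1
    Op("mul", "r4", ("r5", "r6")),
    Op("sub", "r7", ("r8", "r9")),
    Op("or", "r12", ("r2", "r4")),   # needs add and mul results
]
for b in bundle(program):
    print(b)
```

The chain load → add → or forces three bundles, and the four `nop`s are the wasted space the slides mention. A compiler that reorders across a larger window can look for independent ops to fill such slots; this greedy version does not.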

VLIW Pro/Con

• Good
  – Simple hardware
    • Add new functional units with no new scheduling hardware
  – Better optimization in the compiler

• Bad
  – Binary compatibility: the compiler builds for one specific hardware implementation
  – Good compilers are HARD to write

ARM 15

• Modern CPU:

Processor Parallelism

• Process Parallelism: run multiple instruction streams simultaneously

Process vs Thread

• Process: a running program
  – Own memory space
  – Has at least one thread

Process vs Thread

• Thread: instruction sequence
  – Own registers/stack
  – Shares memory with other threads in the process

Threaded Code

• Demo…
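As a stand-in for the demo, here is a minimal Python sketch of the process-vs-thread distinction: four threads update one counter and one list that live in the process's shared memory, while each thread's `local_total` lives on its own stack. The lock is there because `+=` on shared data is a read-modify-write. In the standard CPython build the global interpreter lock keeps these threads from running Python code in parallel, so this illustrates shared memory, not speedup.

```python
import threading

counter = 0               # in the process's memory, shared by all threads
results: list[str] = []   # also shared
lock = threading.Lock()


def worker(name: str, n: int) -> None:
    global counter
    local_total = 0       # local variable: private to this thread's stack
    for _ in range(n):
        local_total += 1
        with lock:        # guard the shared read-modify-write
            counter += 1
    with lock:
        results.append(f"{name} did {local_total}")


threads = [threading.Thread(target=worker, args=(f"T{i}", 10_000)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()              # wait for every thread to finish

print(counter)            # 40000: all four threads wrote the same variable
print(sorted(results))
```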

Context Switching

• Four threads running in a 4-wide pipeline
  – Can't always fill all 4 issue slots
  – Bubbles from memory accesses, page faults, etc.

Context Switching

• Threads often have bubbles…

Multithreading

• Multithreading: alternate threads to maximize hardware use
  – Coarse: run until stall, then switch
  – Fine: switch every cycle
  – Either one needs extra hardware (e.g., a register set per thread)
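A toy cycle-by-cycle model of the two policies on a single-issue core. Each thread is a string of ops: `A` is a one-cycle ALU op and `M` is a memory op that stalls its thread afterwards. `STALL_CYCLES`, `SWITCH_COST` and the op encoding are hypothetical parameters chosen for illustration, not figures from real hardware.

```python
STALL_CYCLES = 3  # hypothetical: extra cycles a thread waits after an 'M' op
SWITCH_COST = 1   # hypothetical: cycles lost each time coarse-grained MT switches


def simulate(threads: list[str], policy: str) -> int:
    """Cycles for a single-issue core to issue every op in `threads`.

    policy: "fine" (round-robin every cycle, skipping stalled threads)
            or "coarse" (stay on one thread until it stalls, then switch).
    """
    n = len(threads)
    pc = [0] * n        # index of the next op in each thread
    ready_at = [0] * n  # first cycle each thread may issue again
    current, cycle = 0, 0

    def is_ready(i: int) -> bool:
        return pc[i] < len(threads[i]) and ready_at[i] <= cycle

    while any(pc[i] < len(threads[i]) for i in range(n)):
        start = current + 1 if policy == "fine" else current
        order = [(start + k) % n for k in range(n)]
        pick = next((i for i in order if is_ready(i)), None)
        if pick is None:
            cycle += 1            # bubble: no thread can issue
            continue
        if policy == "coarse" and pick != current:
            cycle += SWITCH_COST  # drain/refill when changing threads
        op = threads[pick][pc[pick]]
        pc[pick] += 1
        ready_at[pick] = cycle + 1 + (STALL_CYCLES if op == "M" else 0)
        current = pick
        cycle += 1
    return cycle


workload = ["AMA", "AMA"]
for policy in ("fine", "coarse"):
    print(policy, simulate(workload, policy), "cycles")
print("one thread alone:", simulate(["AMA"], "fine"), "cycles")
```

With these made-up numbers, two threads finish in 8 cycles fine-grained and 10 coarse-grained, versus 6 for one thread alone (12 if run back to back). Fine-grained hides more of each stall here; coarse-grained pays the switch cost but lets a single thread run at full speed while it is not stalled.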

Multithreading Superscalar

• A 2-instruction-wide pipeline with multithreading:
  – Still only one thread issues per cycle

(Figure panels: fine-grained | coarse-grained)

SMT

• SMT: Simultaneous Multithreading
  – AKA Hyper-Threading (Intel's name)

• Issue ops from multiple threads in one cycle

• Maximize use of functional units
  – But need to track which thread's registers each instruction goes with…
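A toy comparison of issue-slot use with and without SMT on a superscalar core. The per-cycle ILP numbers are hypothetical inputs, and the model is deliberately simple: instructions that don't fit in a cycle are not carried into later cycles. Without SMT, one thread owns each cycle (fine-grained round-robin); with SMT, ready instructions from all threads share the cycle's slots.

```python
def slot_utilization(ilp: list[list[int]], width: int, smt: bool) -> float:
    """Fraction of issue slots filled.

    ilp[t][c] = instructions thread t could issue in cycle c (assumed given).
    """
    cycles = len(ilp[0])
    used = 0
    for c in range(cycles):
        if smt:
            offered = sum(thread[c] for thread in ilp)  # all threads compete
        else:
            offered = ilp[c % len(ilp)][c]              # one thread per cycle
        used += min(width, offered)                     # can't exceed the width
    return used / (width * cycles)


# Hypothetical per-cycle ILP for two threads over four cycles
ilp = [
    [2, 0, 1, 3],  # thread 0
    [1, 2, 0, 2],  # thread 1
]
print("fine-grained:", slot_utilization(ilp, width=4, smt=False))  # 0.4375
print("SMT:         ", slot_utilization(ilp, width=4, smt=True))   # 0.625
```

Fine-grained multithreading removes whole empty cycles but still leaves empty slots within a cycle; SMT attacks those too, which is why it needs per-instruction tracking of which thread's registers to use.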

SMT Challenges

• Resources must be duplicated or split
  – Split too thin hurts performance…
  – Duplicate everything and you aren't maximizing use of hardware…

Intel vs AMD

• Variations on SMT

Getting Faster

• Pipelining helps to a point
• Superscalar/VLIW helps to a point
• SMT helps a bit
• Chips getting faster
• Only so much speedup possible
  – Power = heat
  – Power $\propto C V^2 f$

• C = capacitance, how well it “stores” a charge
• V = voltage
• f = frequency, i.e., how fast the clock is (e.g., 3 GHz)
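Because voltage enters squared, pushing the clock costs more than the frequency term alone suggests. A minimal sketch using the proportionality above, with every factor expressed as a ratio to a baseline chip; the assumption that a 20% faster clock needs 20% more voltage is illustrative only.

```python
def relative_power(c: float = 1.0, v: float = 1.0, f: float = 1.0) -> float:
    """Dynamic power relative to a baseline chip, from Power ∝ C·V²·f.

    Each argument is a ratio to the baseline (1.0 = unchanged).
    """
    return c * v**2 * f


# Raise the clock 20% at the same voltage: power grows in step with f.
freq_only = relative_power(f=1.2)
# Assumption: the faster clock also needs 20% more voltage to stay reliable.
freq_and_volt = relative_power(v=1.2, f=1.2)
print(f"clock only: {freq_only:.2f}x power")          # 1.20x
print(f"clock + voltage: {freq_and_volt:.2f}x power")  # 1.73x
```

A 20% speedup costing about 73% more power (and heat) is the wall the next slide's chart illustrates.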

Power Density Prediction circa 2000

(Figure: power density in W/cm² on a log scale from 1 to 10,000 versus year, 1970 to 2010, for Intel processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc, P6 and Core 2, with reference levels for a hot plate, a nuclear reactor, a rocket nozzle and the Sun’s surface. Source: S. Borkar (Intel))

Adapted from UC Berkeley "The Beauty and Joy of Computing"

Moore's Law Related Curves

Going Multi-core Helps Energy Efficiency

• Power of a typical integrated circuit $\propto C V^2 f$
  – C = capacitance, how well it “stores” a charge
  – V = voltage
  – f = frequency, i.e., how fast the clock is (e.g., 3 GHz)

William Holt, HOT Chips 2005
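The energy argument can be made numerically. This sketch compares one core at full clock with two cores at half clock, under two loudly labeled assumptions: the slower clock lets voltage drop to 80%, and the work splits perfectly across both cores. Both numbers are illustrative, not measurements.

```python
def relative_power(c: float, v: float, f: float) -> float:
    """Dynamic power from Power ∝ C·V²·f, each factor a ratio to one baseline core."""
    return c * v**2 * f


one_core = relative_power(c=1.0, v=1.0, f=1.0)
# Two cores: switched capacitance doubles, clock halves,
# and (assumption) voltage can drop to 80% at the lower clock.
two_cores = relative_power(c=2.0, v=0.8, f=0.5)

# Assumption: perfectly parallel work, so 2 cores x 0.5 f = same throughput.
throughput_ratio = (2 * 0.5) / 1.0
print(f"power {two_cores / one_core:.2f}x at {throughput_ratio:.2f}x throughput")
```

Under these assumptions the two slower cores deliver the same throughput for about 64% of the power; the gain shrinks if the voltage cannot drop or the work does not parallelize.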
