
Microprocessors: 20 Years Back, 10 Years Ahead
Source: ftp.cs.wisc.edu/sohi/talks/2002/toronto.pdf

Guri Sohi, University of Wisconsin

Outline

• The enabler: semiconductor technology
• Role of the processor architect
• Micro-architectures of the past 20 years
    • From pipelining to speculation
• Micro-architectures of the next 10 years

Semiconductor Technology

• Many more available transistors
• Imbalances due to disparate rates of performance improvement
    • E.g., logic and memory speeds
• How does this impact the architecture of microprocessors?

Number of Transistors

[Figure: number of transistors (log scale, 1,000 to 100,000,000,000) versus year (1971 to 2016). Processors shown: 4004, 8008, 8080, 8086, 80286, 80386, 80486, Pentium, Pentium II, Pentium III, Pentium 4.]

Relative Memory Speed

[Figure: processor-memory divide (cycle time), log scale from 1 to 1000, versus year. Years on the axis: 1974, 1978, 1982, 1985, 1989, 1993, 1997, 1999, 2000. Data labels, in order: 1.4, 2.5, 3.8, 6.3, 10.7, 29, 48, 75, 120.]

Intel Microprocessors

• 386 (275 K)
• 486 (1180 K)
• Pentium (3100 K)
• Pentium II (7500 K)
• Pentium III (24000 K)
• Pentium 4 (42000 K)

What is being done with all the transistors?

Role of Computer Architect

• Get desired level of performance
• Determine functionality needed
• Determine how functionality should be implemented

Role of Computer Architect…

• Defining functionality
    • Functionality to deal with increasing latencies (e.g., caches, wires)
    • Functionality to increase parallelism and its exploitation
• Implementing functionality
    • Balancing various technology parameters
    • Ease of design / verification / testing

The Performance Equation

Time = Number of Instructions x Cycles per Instruction x Clock Cycle Time

• Not much can be done about first term in hardware
• But, …
    • Logic speed increase: decreases 3rd term
    • Watch out for possible increase in 2nd term
• Use micro-architectural innovations to decrease 2nd and 3rd terms
    • Reduce latencies
    • Exploit parallelism
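
As a worked illustration of the performance equation above, the sketch below multiplies the three terms and compares two design points. The function name and every number are hypothetical, chosen only to show how a faster clock (3rd term) can be partly offset by a higher CPI (2nd term).

```python
def execution_time(instructions, cpi, cycle_time_ns):
    """Time = Number of Instructions x Cycles per Instruction x Clock Cycle Time."""
    return instructions * cpi * cycle_time_ns

# Hypothetical baseline: 1e9 instructions, CPI of 2.0, 1.0 ns clock cycle.
baseline = execution_time(1_000_000_000, 2.0, 1.0)

# Faster logic halves the cycle time (3rd term), but memory now costs more
# cycles per access, so CPI (2nd term) rises from 2.0 to 3.0.
faster_clock = execution_time(1_000_000_000, 3.0, 0.5)

speedup = baseline / faster_clock
print(f"speedup = {speedup:.2f}")
```

With these made-up numbers the clock is twice as fast, yet the program runs only about 1.33 times faster, which is the "watch out for the 2nd term" point.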

Microarchitectural Functionality

• Functionality to cope with increasing memory latencies
• Functionality to exploit parallelism

Memory Hierarchies

• Reducing access latency and improving access bandwidth
    • Single-level caches
    • Multi-level caches
    • Non-blocking caches
    • Multi-ported and multi-banked caches
    • Trace caches
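
To make the latency argument for multi-level caches concrete, here is a minimal sketch using the standard average memory access time model (hit time plus miss rate times miss penalty, applied level by level). The model is a textbook simplification rather than something stated in the talk, and the latencies and miss rates are hypothetical.

```python
def amat(levels, memory_latency):
    """Average memory access time (in cycles) for a cache hierarchy.

    levels: list of (hit_latency_cycles, local_miss_rate), ordered from the
            cache closest to the processor outward.
    memory_latency: cycles to reach main memory once the last level misses.
    """
    penalty = memory_latency
    # Work from the outermost level inward: each level's average access
    # time becomes the miss penalty of the level in front of it.
    for hit_latency, miss_rate in reversed(levels):
        penalty = hit_latency + miss_rate * penalty
    return penalty

# Hypothetical numbers: 2-cycle L1 with a 5% miss rate, 100-cycle memory.
single_level = amat([(2, 0.05)], memory_latency=100)

# Same L1, plus a 10-cycle L2 that misses on 20% of the accesses it sees.
two_level = amat([(2, 0.05), (10, 0.20)], memory_latency=100)

print(single_level, two_level)
```

Under these assumptions the second level roughly halves the average access time (about 7 cycles down to about 3.5), even though it adds latency to every L1 miss.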

The March of Parallelism

[Figure: four processor generations. Generation 1 (1970s), Generation 2 (1980s), Generation 3 (1990s), Generation 4 (2000s).]

Exploiting Parallelism

• Little change in programming model
    • Still write programs in sequential languages
• Automatic parallelization not widely successful
• Great investment in existing software
• Resort to low-level, Instruction Level Parallelism (ILP)

Instruction Level Parallelism (ILP)

• Determine small number (e.g., < 100) of instructions to be executed
• Determine dependence relationships and create dependence graph
    • Use to determine parallel execution
• Can be done statically (VLIW / EPIC) or dynamically (out-of-order superscalar)
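
The steps above can be sketched in a few lines: take a small window of instructions, build the dependence graph, and derive which instructions could execute in the same cycle. This is a toy model under stated assumptions, not the talk's algorithm: instructions are hypothetical `(dest, sources)` tuples, only true (read-after-write) register dependences are tracked, every instruction takes one cycle, and the machine has unlimited functional units.

```python
def dependence_graph(window):
    """Return {i: set of earlier instruction indices that i depends on}.

    window: list of (dest_register, [source_registers]) in program order.
    Only true (read-after-write) dependences are tracked; this sketch
    assumes renaming has removed write-after-read and write-after-write
    hazards.
    """
    last_writer = {}   # register name -> index of its most recent producer
    deps = {}
    for i, (dest, sources) in enumerate(window):
        deps[i] = {last_writer[r] for r in sources if r in last_writer}
        last_writer[dest] = i
    return deps


def schedule(window):
    """Earliest issue cycle per instruction on an idealized machine
    (unit latency, unlimited functional units)."""
    deps = dependence_graph(window)
    cycle = {}
    for i in range(len(window)):
        # An instruction issues one cycle after its latest producer.
        cycle[i] = 1 + max((cycle[j] for j in deps[i]), default=-1)
    return cycle


window = [
    ("r1", ["r8"]),        # 0: r1 = f(r8)
    ("r2", ["r9"]),        # 1: r2 = f(r9)      independent of 0
    ("r3", ["r1", "r2"]),  # 2: r3 = r1 op r2   needs 0 and 1
    ("r4", ["r3"]),        # 3: r4 = f(r3)      needs 2
    ("r5", ["r9"]),        # 4: r5 = f(r9)      independent
]
deps = dependence_graph(window)
cycles = schedule(window)
print(cycles)
```

In this example instructions 0, 1, and 4 can issue together, so five instructions fit in three cycles. A static (VLIW / EPIC) compiler would do this analysis ahead of time; an out-of-order superscalar does the equivalent in hardware at run time.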

Limitations to ILP

• Branch instructions inhibit determination of instructions to execute: control dependences
• Imperfect analysis of memory addresses inhibits reordering of memory operations: ambiguous memory dependences
• Program/algorithm data flow inhibits parallelism: true dependences
• Increasing latencies exacerbate impact of dependences

Use speculation to overcome impact of dependences

Speculation

Speculation: “.. to assume a business risk in hope of gain”

Webster

Speculation and Computer Architecture

• Speculate outcome of event rather than waiting for outcome to be known
    • Program behavior provides rationale for high success rate
• Functionality to support speculation
• Functionality to speculate better
• Functionality to minimize mis-speculation penalty
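
One way to see why both a high success rate and a small mis-speculation penalty matter is a simple expected-cost comparison. The model below is an assumption made for illustration (a correct guess costs nothing, a wrong guess costs a fixed recovery penalty), and the cycle counts are hypothetical.

```python
def expected_cost(accuracy, misspeculation_penalty):
    """Expected cycles lost per event when speculating: a correct guess
    costs nothing, a wrong guess costs the recovery penalty."""
    return (1.0 - accuracy) * misspeculation_penalty


def speculation_pays_off(accuracy, misspeculation_penalty, wait_cycles):
    """True when speculating loses fewer cycles, on average, than waiting
    for the outcome to be known."""
    return expected_cost(accuracy, misspeculation_penalty) < wait_cycles


# Hypothetical numbers: waiting for the outcome stalls 5 cycles,
# recovering from a wrong guess costs 20 cycles.
print(speculation_pays_off(0.95, 20, 5))  # about 1 cycle lost on average
print(speculation_pays_off(0.50, 20, 5))  # about 10 cycles lost on average
```

With 95% accuracy speculation wins comfortably; at 50% accuracy the same penalty makes waiting the better choice, which is why functionality to "speculate better" and to shrink the penalty both appear in the list above.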

Control Speculation

• Predict outcome of branch instructions
• Speculatively fetch and execute instructions from predicted path
    • Increase available parallelism
• Recover if prediction is incorrect
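
As an illustration of predicting branch outcomes, the sketch below models a table of 2-bit saturating counters, a well-known textbook predictor; the talk does not say which predictor it has in mind. The class name, table size, and branch trace are all hypothetical.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # start in the weakly not-taken state

    def predict(self, pc):
        return self.counters[pc % self.entries] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of branches in the trace that were predicted correctly."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)  # train with the real outcome
    return correct / len(trace)


# Hypothetical trace: a loop-closing branch at address 0x40 that is taken
# 9 times and then falls through, with the whole loop run 10 times.
trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
loop_accuracy = accuracy(TwoBitPredictor(), trace)
print(loop_accuracy)
```

On this trace the predictor is wrong once while warming up and once at each loop exit, so 89 of the 100 branches are predicted correctly. Each wrong prediction is a point where a real processor would have to recover.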

Model for Speculative Execution

[Figure: block diagram with stages labeled instruction fetch & branch prediction; dependence checking and dispatching; instruction issue & execution; instruction reorder & commit. Other labels: static program, dynamic instruction stream, execution window, completed instructions.]

Supporting Control Speculation

• Techniques to predict branch outcome: branch predictors
    • Initiating speculation
    • Improving accuracy of speculation
• Techniques to support speculative execution: reservation stations, register renaming etc.
    • Supporting speculative execution
• Techniques to give appearance of sequential execution: reorder buffers, etc.
    • Doing it transparently
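
To show how a reorder buffer can give the appearance of sequential execution, here is a deliberately small sketch: instructions enter in program order, may complete in any order, retire only from the head, and everything younger than a mispredicted branch is discarded. The class and its method names are hypothetical, and real designs track far more state (results, exceptions, register mappings).

```python
from collections import deque


class ReorderBuffer:
    """Entries are allocated in program order, may complete in any order,
    and are committed strictly from the head."""

    def __init__(self):
        self.entries = deque()  # each entry is [tag, done]

    def dispatch(self, tag):
        self.entries.append([tag, False])

    def complete(self, tag):
        for entry in self.entries:
            if entry[0] == tag:
                entry[1] = True

    def commit(self):
        """Retire completed instructions from the head, oldest first."""
        committed = []
        while self.entries and self.entries[0][1]:
            committed.append(self.entries.popleft()[0])
        return committed

    def squash_after(self, tag):
        """Discard every entry younger than `tag`, e.g. after a
        mispredicted branch. Assumes `tag` is still in the buffer."""
        while self.entries and self.entries[-1][0] != tag:
            self.entries.pop()


rob = ReorderBuffer()
for tag in ["i0", "br1", "i2", "i3"]:
    rob.dispatch(tag)

rob.complete("i2")        # finishes early, out of program order
first = rob.commit()      # nothing retires: i0 at the head is not done

rob.complete("i0")
rob.complete("br1")
rob.squash_after("br1")   # branch was mispredicted: drop i2 and i3
second = rob.commit()     # i0 and br1 retire in program order
print(first, second)
```

Although `i2` executed first, none of its effects become visible: it never reaches the head before the squash, which is the "doing it transparently" point.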

Key observation

Basic mechanisms to support control speculation can support other forms of speculation as well

Performance-Inhibiting Constraints

• Control dependences: inhibit creation of instruction window
    • Use control speculation
• Ambiguous data dependences: inhibit parallelism recognition
    • Use data dependence speculation
• True data dependences: inhibit parallelism
    • Use value speculation
• Common mechanisms may support different forms of speculation
• Different techniques to improve accuracy of speculation
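
As one concrete instance of value speculation, the sketch below uses last-value prediction: guess that an instruction will produce the same result it produced last time, so that dependent instructions could start before the true value is known. This is one simple scheme chosen for illustration, not necessarily the one the talk intends, and the names and the result trace are hypothetical.

```python
class LastValuePredictor:
    """Predict that an instruction produces the same value as last time."""

    def __init__(self):
        self.last = {}  # instruction address -> last observed result

    def predict(self, pc):
        return self.last.get(pc)  # None means "no prediction yet"

    def train(self, pc, actual):
        self.last[pc] = actual


def run(predictor, results):
    """Count correct predictions (dependents could have started early)
    and mis-speculations (dependents would need to be re-executed)."""
    hits = misses = 0
    for pc, actual in results:
        guess = predictor.predict(pc)
        if guess is not None:
            if guess == actual:
                hits += 1
            else:
                misses += 1
        predictor.train(pc, actual)
    return hits, misses


# Hypothetical load at address 0x80 that usually returns the same value.
results = [(0x80, 7), (0x80, 7), (0x80, 7), (0x80, 9), (0x80, 9)]
hits, misses = run(LastValuePredictor(), results)
print(hits, misses)
```

The first execution makes no prediction, three of the remaining four guesses are right, and the change from 7 to 9 is the single mis-speculation that the recovery mechanism would have to handle.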

Speculation in Use Today

• Address calculation and translation (especially if 2-step process)
• Cache hit
• Memory ordering violation in multiprocessors
• Load/store dependences

Microprocessors – the next 10 years

• Factor of 30 increase in semiconductor resources
    • How to use it?
• New constraints
    • Power consumption
    • Wire delays
    • Design / verification complexity
• New applications?
    • Throughput-oriented workloads
    • Coarse-grain multithreaded applications

Technology Trends

• Design and verification of large number of transistors becoming unwieldy
• Wires getting relatively slower
    • Short wires for fast clock
    • Implies increased latencies; exploit locality of communication
• Power issues becoming very important

Architect’s Role Revisited

• Defining functionality
    • New models needed to further increase parallelism exploitation
• Implementing functionality
    • Becoming a dominating factor?

Speculation is likely to be the key to overcoming constraints

Implications of Trends

• Implementation considerations will imply computing chips with multiple (replicated?) processing cores
    • “multiprocessor” or “multiprocessor-like” or “multithreaded”
    • Will start out as “logical” replication (e.g., SMT)
    • Will move towards “physical” replication (e.g., CMP)
• How to assign work to multiple processing cores?
    • Independent programs (or threads)
    • Parts of a single program

Throughput-Oriented Processing

• Executing multiple, independent programs on underlying parallel micro-architecture
    • Similar to traditional throughput-oriented multiprocessor
    • Significant engineering challenges, but little in the way of architectural / micro-architectural innovation
• Can we use underlying “multiprocessor” to speed up execution of single program?

Parallel Processing of Single Program

• Will the promise of explicit / automatic parallelism come true?
• Will new (parallel) programming languages take over the world?

Don’t count on it!

Speculative Parallelization

• Sequential languages aren’t going away
• Use speculation to overcome inhibitors to “automatic” parallelization
    • Ambiguous dependences
• Divide program into “speculatively parallel” portions or “speculative threads”
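
The idea of speculative threads can be sketched as follows: run the portions of a sequential program as if they were independent, record what each one reads and writes, then commit them in program order and re-execute any portion that read a location an earlier portion wrote. This is a software toy under stated assumptions, not any specific model from the research literature: `run_speculatively`, the `read`/`write` callbacks, and the example loop are all hypothetical, and the "parallel" phase is simulated sequentially against a snapshot.

```python
def run_speculatively(tasks, memory):
    """Execute tasks as if in parallel, then commit them in program order.

    tasks: list of functions task(read, write), in sequential program order.
    memory: dict holding the committed (architectural) state.
    Returns the number of tasks that were squashed and re-executed.
    """
    def execute(task, snapshot):
        reads, writes = set(), {}

        def read(addr):
            if addr in writes:        # a task always sees its own writes
                return writes[addr]
            reads.add(addr)
            return snapshot[addr]

        def write(addr, value):
            writes[addr] = value      # buffered until commit

        task(read, write)
        return reads, writes

    # Speculative phase: every task runs against the same initial state,
    # ignoring possible dependences on earlier tasks.
    start = dict(memory)
    speculative = [execute(task, start) for task in tasks]

    squashes = 0
    written_by_earlier = set()
    for task, (reads, writes) in zip(tasks, speculative):
        if reads & written_by_earlier:
            # An earlier task wrote something this task read: the
            # speculation was wrong, so re-execute on committed state.
            squashes += 1
            reads, writes = execute(task, dict(memory))
        memory.update(writes)
        written_by_earlier |= set(writes)
    return squashes


def make_task(i):
    # Iteration i computes a[i] = a[i] + 1; iteration 2 also adds a[1],
    # a cross-iteration dependence that is not obvious ahead of time.
    def task(read, write):
        extra = read(("a", 1)) if i == 2 else 0
        write(("a", i), read(("a", i)) + 1 + extra)
    return task


memory = {("a", i): 10 * i for i in range(4)}
squashes = run_speculatively([make_task(i) for i in range(4)], memory)
print(squashes, memory)
```

Three of the four iterations commit without any rework, and only iteration 2 is squashed and re-executed; the final state matches what a purely sequential run would produce, so the program keeps its sequential semantics.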

Speculative Threads

• Subject of extensive research today
• Different speculative parallelization models being investigated

Generic circa 2010 Microprocessor

• 4 – 8 general-purpose processing engines on chip
    • Used to execute independent programs
    • Explicitly parallel programs (when possible)
    • Speculatively parallel threads
    • Helper threads
• Special-purpose processing units (e.g., DSP functionality)
• Elaborate memory hierarchy
• Elaborate inter-chip communication facilities
• Extensive use of different forms of speculation

Summary

• Semiconductor technology has given, and will continue to give, computer architects new opportunities
• Architects have used speculation techniques to overcome performance barriers; will likely continue to do so
• Future microprocessors are going to have capability to execute multiple threads of code
• New models of speculation (e.g., thread-level speculation) will be needed to extract more parallelism

