
Advanced Microarchitecture

Date post: 25-Feb-2016
Upload: lona
Advanced Microarchitecture Lecture 1: Introduction
Transcript
Page 1: Advanced  Microarchitecture

Advanced Microarchitecture
Lecture 1: Introduction

Page 2: Advanced  Microarchitecture

Course Floorplan

First 8.5 weeks:
• Intro/Review: 2 lectures
• Processor Front-end: 5 lectures
• Execution Core: 4 lectures
• Other topics: 6 lectures

Next 5.5 weeks:
• Processor Case Studies: 11 classes

Last week of class:
• Mini-conference: 2 classes

Course philosophy:
(1) First half: learn the details of microarchitecture concepts.
(2) Second half: study real designs, applying what we covered in part 1.

Page 3: Advanced  Microarchitecture

Course Components
• Lectures:
  – I’m not taking attendance, but since there’s no textbook, attendance (and being awake) is incredibly important.
  – There will be four homework assignments for this part.
• Supplemental Reading (required):
  – “The Pentium Chronicles” by Robert P. Colwell, published by Wiley-Interscience, ISBN: 0-471-73617-1
  – You must finish reading this before the start of the case studies
• Case studies:
  – Paper reading is mandatory… you cannot participate if you haven’t read the paper(s)

Page 4: Advanced  Microarchitecture

Course Components (2)
• Term Project
  – Microprocessor/microarchitecture-based project
  – Project must be approved
• Mini-Conference
  – We will peer-review all projects, similar to how a conference program committee reviews papers
  – Last week of class will be used to hold a mini-conference where you present your term project
  – Food and drink will be provided! :-)
• No Exams, Hooray!

Page 5: Advanced  Microarchitecture

Grading Specifics
• 4 Homeworks at 5 points each = 20 pts
• 5 TPC (“The Pentium Chronicles”) reading summaries, 3 pts each = 15 pts
• 11 case-study reading summaries and participation, 3 pts each = 33 pts
• Term project = 32 pts
  – Abstract/Proposal: 5 pts
  – Mid-project Status: 2 pts
  – Write-up: 10 pts
  – Reviews (of other people’s projects): 5 pts
  – Final Presentation: 10 pts

Page 6: Advanced  Microarchitecture

Case-Study Mechanics
• If you don’t do the readings, you’re not going to contribute anything to the discussions, therefore …
  – For each case-study session, you must do the reading before the start of class
  – You must also write a brief summary of the readings
  – You must submit the summary at the start of class…

The summary is your entrance ticket to class: if you don’t hand in the summary, I’m not going to let you enter the classroom!

Page 7: Advanced  Microarchitecture

Performance
• What metric to use?
  – CPI, IPC, MIPS, FLOPS, polygons/sec, frames/sec, …
• Absolute Runtime
  – “How long will it take to run my program?”
  – “How long will it take to run my programs?”
• Relative Performance
  – “Will my program run faster on an Intel or AMD CPU?”
  – “Will my programs run faster on an Intel or AMD CPU?”
  – “Will my typical program run faster on Intel or AMD?”

Page 8: Advanced  Microarchitecture

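Iron Law of Performance

• $\text{Runtime} = \text{Total Insts} \times \dfrac{\text{Cycles}}{\text{Instruction}} \times \dfrac{\text{Seconds}}{\text{Cycle}}$
  – Total Insts: total work in the program; determined by algorithms, compilers, and ISA extensions
  – Cycles/Instruction: CPI, or 1/IPC; determined by the microarchitecture
  – Seconds/Cycle: 1/f (clock frequency); determined by the microarchitecture and process technology

This is the only performance metric that matters (for the uniprocessor world). Everything else is just a proxy!!!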
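
As a rough illustration of the Iron Law, the sketch below multiplies the three terms for a made-up program. The function name and the numbers (1 billion instructions, CPI of 1.25 or 1.0, a 2 GHz clock) are assumptions chosen for illustration; they are not values from the lecture.

```python
def runtime_seconds(total_insts: float, cpi: float, freq_hz: float) -> float:
    """Iron Law: runtime = instructions x (cycles/instruction) x (seconds/cycle)."""
    seconds_per_cycle = 1.0 / freq_hz
    return total_insts * cpi * seconds_per_cycle


if __name__ == "__main__":
    # Hypothetical workload: 1e9 instructions on a 2 GHz clock.
    baseline = runtime_seconds(1e9, cpi=1.25, freq_hz=2e9)
    # Same program and clock, but a microarchitecture that reaches CPI = 1.0.
    improved = runtime_seconds(1e9, cpi=1.0, freq_hz=2e9)
    print(f"baseline: {baseline:.3f} s, improved: {improved:.3f} s")
```

Changing any one term while holding the other two fixed shows which layer (software, microarchitecture, or process technology) a given speedup is coming from.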

Page 9: Advanced  Microarchitecture

Multi-Core Performance
• The correct metric depends on the workload
  – Single parallel (multi-threaded) application:
    • Runtime
  – Multiple applications (multi-programmed workload):
    • Typically total system throughput
    • Latency/runtime of a given program is not so important
• Fairness and combined fairness/performance metrics are often used.

Page 10: Advanced  Microarchitecture

Power
• Which power do you mean?
  – Maximum/peak power delivery requirements
    • “450W Power Supply”
  – Average power delivery requirements
    • Battery life
    • Electricity bills

Page 11: Advanced  Microarchitecture

Dynamic Power
• Power to charge/discharge a capacitor
• $P = VI$
• $I = C \, dV/dt$

[Figure: circuit with a voltage source V and a capacitor C]

Page 12: Advanced  Microarchitecture

Dynamic Power
• $P = \tfrac{1}{2} C V^2 f a$
  – C: total capacitance switched
  – V: power supply voltage
  – f: clock frequency
  – a: activity factor
• Really, $P = \sum_{i \in \text{all blocks}} P_i = \tfrac{1}{2} f V^2 \times \sum_{i \in \text{all blocks}} C_i a_i$
• $C_i$ and $a_i$ are hard to determine
  – $C_i$ requires detailed circuit design
  – $a_i$ depends on dynamic behavior (application specific)

Page 13: Advanced  Microarchitecture

Example
• Cache Power
  – Clock frequency = 2 GHz
  – L1 Instruction Cache: C = 1.515 nF, a = 0.88
  – L1 Data Cache: C = 0.741 nF, a = 0.6
  – L2 Unified Cache: C = 12.7 nF, a = 0.07
  – Vdd = 1.5 V
• $P_{IL1}$ = ½ × 1.515 nF × (1.5 V)² × 2 GHz × 0.88
  = ½ × 1.515e-9 F × 2.25 V² × (1 / 500e-12 sec) × 0.88
  = 3 F·V²/s
  = 3 (coulombs/volt) × (volt²) / second
  = 3 coulomb × volt / second
  = 3 (Amp × sec) × (Watt/Amp) / sec
  = 3 Watts

Page 14: Advanced  Microarchitecture

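Example
• L1 Data Cache: C = 0.741 nF, a = 0.6
• $P_{DL1}$ = ½ × 0.741e-9 F × 2.25 V² × 2e9 Hz × 0.6 = 1 Watt
• L2 Unified Cache: C = 12.7 nF, a = 0.07
• $P_{UL2}$ = ½ × 12.7e-9 F × 2.25 V² × 2e9 Hz × 0.07 = 2 Watts
• Total Power of All Caches = $P_{IL1} + P_{DL1} + P_{UL2}$ = 3 + 1 + 2 = 6 Watts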
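
The cache-power arithmetic can be checked with a short script. This is a minimal sketch: the function and variable names are made up for illustration, and the capacitances are entered in farads (for example 1.515e-9 F), matching the worked arithmetic for the L1 instruction cache.

```python
def dynamic_power(c_farads: float, vdd: float, freq_hz: float, activity: float) -> float:
    """Dynamic power of one block: P = 1/2 * C * V^2 * f * a (in watts)."""
    return 0.5 * c_farads * vdd ** 2 * freq_hz * activity


VDD_VOLTS = 1.5
FREQ_HZ = 2e9

# (capacitance in farads, activity factor) for each cache in the example.
CACHES = {
    "L1 instruction": (1.515e-9, 0.88),
    "L1 data": (0.741e-9, 0.60),
    "L2 unified": (12.7e-9, 0.07),
}


def cache_powers() -> dict:
    """Per-cache dynamic power in watts."""
    return {
        name: dynamic_power(c, VDD_VOLTS, FREQ_HZ, a)
        for name, (c, a) in CACHES.items()
    }


if __name__ == "__main__":
    powers = cache_powers()
    for name, watts in powers.items():
        print(f"{name}: {watts:.2f} W")
    print(f"total: {sum(powers.values()):.2f} W")
```

Summing the per-block results is the same per-block sum of $C_i a_i$ terms used in the general dynamic power formula, applied to three blocks.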

Page 15: Advanced  Microarchitecture

Trading Power and Performance
• $P = \tfrac{1}{2} C V^2 f a$
• $f \propto V$
• $P \propto \tfrac{1}{2} C V^2 V a$
• $P \propto V^3$
• $\text{Perf} \propto f \propto V$
• Decrease V
  – Performance drops linearly
  – Power drops cubically!

A.K.A. Voltage-Frequency Scaling

Rule of thumb: a 3% power reduction corresponds to about a 1% performance drop.

Voltage can be decreased only so far... after that, you can only decrease clock frequency.
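
The 3%-to-1% rule of thumb follows directly from the cubic relationship. The sketch below assumes the idealized proportionalities from this slide (frequency, and therefore performance, scales linearly with voltage; power scales with voltage cubed); the function names are illustrative.

```python
def relative_performance(v_scale: float) -> float:
    """Perf is proportional to f, and f is proportional to V."""
    return v_scale


def relative_power(v_scale: float) -> float:
    """P is proportional to V^2 * f, i.e. V^3 when f scales with V."""
    return v_scale ** 3


if __name__ == "__main__":
    v_scale = 0.99  # lower the supply voltage (and frequency) by 1%
    perf_drop = (1.0 - relative_performance(v_scale)) * 100.0
    power_drop = (1.0 - relative_power(v_scale)) * 100.0
    print(f"performance drop: {perf_drop:.2f}%, power drop: {power_drop:.2f}%")
```

For small changes the power saving is roughly three times the performance loss; for larger voltage reductions the ratio drifts away from exactly 3:1.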

Page 16: Advanced  Microarchitecture

Static Power
• “Leakage”, “Dark Current”
  – The name “dark current” comes from the current measured in photodetectors when no light is present
• Two Kinds:
  – Channel leakage or subthreshold conductance
  – Gate leakage

Page 17: Advanced  Microarchitecture

First, a MOS transistor

[Figure: MOS transistor with source, gate, and drain; a plot of current versus applied voltage with the threshold voltage marked; and the same transistor with charge at the gate and current flowing between source and drain]

Page 18: Advanced  Microarchitecture

NMOS vs. PMOS
• P = positive, N = negative

[Figure: a PMOS transistor with its gate at 0 Volts and an NMOS transistor with its gate at Vdd, each with source, gate, and drain labeled]

Page 19: Advanced  Microarchitecture

Back to Leakage

[Figure: transistor annotated with the two leakage paths: channel leakage (subthreshold conductance) and gate leakage]

Page 20: Advanced  Microarchitecture

Leakage in MOS transistors

[Figure: MOS transistor with source, gate, and drain labeled]

• Channel length keeps shrinking (faster transistors)
  – Channel resistance decreases (leakage increases)
  – $I_{sub} = K_1 W e^{-V_{th}/(n V_\theta)} \left(1 - e^{-V/V_\theta}\right)$
• Oxide thickness keeps shrinking (faster transistors)
  – Probability of quantum tunneling increases (leakage increases)
  – $I_{ox} = K_2 W \left(V/T_{ox}\right)^2 e^{-\alpha T_{ox}/V}$

Page 21: Advanced  Microarchitecture

Quantum Tunneling
• Electrons aren’t “here” or “there”
• Location is a probability distribution
• Non-zero probability of being anywhere

[Figure: an electron (e-) next to an oxide barrier in two cases: P(Tunnel) << 1, and P(Tunnel) non-negligible]

Page 22: Advanced  Microarchitecture

Power vs. Performance
• ED product (energy × delay)
  – Lower is better
    • Lower execution latency (i.e., higher performance)
    • Lower energy consumption
  – Can lead to not-so-great configurations
    • Simple CPU → really long execution time, but very low power → lower ED product (may not be acceptable)
• ED² product
  – Performance more heavily weighted
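
To see how the two metrics can disagree, the sketch below compares two made-up designs. The energy and delay numbers are hypothetical, picked so that the slow, low-energy design wins on the ED product while the faster design wins on the ED² product; the names are illustrative only.

```python
def ed_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: lower is better."""
    return energy_joules * delay_seconds


def ed2_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product: weights delay (performance) more heavily."""
    return energy_joules * delay_seconds ** 2


# Hypothetical (energy in joules, delay in seconds) for the same task.
DESIGNS = {
    "simple CPU": (1.0, 10.0),  # very low energy, very long runtime
    "fast CPU": (4.0, 3.0),     # more energy, much shorter runtime
}


def best_design(metric) -> str:
    """Name of the design with the lowest value of the given metric."""
    return min(DESIGNS, key=lambda name: metric(*DESIGNS[name]))


if __name__ == "__main__":
    for name, (energy, delay) in DESIGNS.items():
        print(f"{name}: ED = {ed_product(energy, delay):.0f}, "
              f"ED^2 = {ed2_product(energy, delay):.0f}")
    print("best by ED:  ", best_design(ed_product))
    print("best by ED^2:", best_design(ed2_product))
```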

Page 23: Advanced  Microarchitecture

Thermals
• Temperature of the chip is determined by
  – Power/heat generation rate
  – Heat removal
• Given the two, T will settle at a steady state
  – Heat flow is a function of the temperature gradient
  – If there’s too much heat, T will increase until the gradient is large enough to remove the heat fast enough
  – So long as this steady-state T is within allowed operating conditions, everything should work fine
    • May have an impact on long-term reliability

Page 24: Advanced  Microarchitecture

Thermal Runaway
• But, leakage is a function of temperature
  – $I_{sub} = K_1 W e^{-V_{th}/(n V_\theta)} \left(1 - e^{-V/V_\theta}\right)$
• Higher temperature leads to more leakage
• Which burns more power
• Which leads to higher temperature, which …
• Positive feedback loop can melt your chip
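
The temperature dependence can be made concrete with a small numerical sketch of the subthreshold leakage expression on this slide. Several things here are assumptions rather than statements from the lecture: the voltage in the exponent is taken to be the thermal voltage kT/q, the values of Vth, n, K1, and W are placeholders, and Vth and K1 are treated as independent of temperature, which is a simplification.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23      # Boltzmann constant k
ELECTRON_CHARGE_C = 1.602176634e-19   # elementary charge q


def thermal_voltage(temp_kelvin: float) -> float:
    """Thermal voltage kT/q in volts (assumed meaning of the slide's V_theta)."""
    return BOLTZMANN_J_PER_K * temp_kelvin / ELECTRON_CHARGE_C


def subthreshold_leakage(temp_kelvin: float, vdd: float = 1.5, vth: float = 0.3,
                         n: float = 1.5, k1: float = 1.0, width: float = 1.0) -> float:
    """I_sub = K1 * W * exp(-Vth / (n * V_theta)) * (1 - exp(-V / V_theta)).

    vth, n, k1, and width are placeholder values, so the result is only
    meaningful as a ratio between two temperatures.
    """
    v_theta = thermal_voltage(temp_kelvin)
    return (k1 * width * math.exp(-vth / (n * v_theta))
            * (1.0 - math.exp(-vdd / v_theta)))


if __name__ == "__main__":
    cold = subthreshold_leakage(300.0)  # roughly room temperature
    hot = subthreshold_leakage(373.0)   # roughly 100 degrees Celsius
    print(f"leakage grows by a factor of {hot / cold:.1f} from 300 K to 373 K")
```

With these placeholder parameters the leakage grows several-fold between the two temperatures, which is the effect that drives the feedback loop.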

Page 25: Advanced  Microarchitecture

Hot Spots
• Average temperature != local temperature
• Local spots may be hotter
  – Leads to “hot spots”
  – Temp anywhere cannot exceed Tjmax (transistors stop working)
  – Possible to have a good average/global temp but still violate Tjmax locally

[Figure: simulated P4 thermals]

Page 26: Advanced  Microarchitecture

When Cooling is Insufficient

Page 27: Advanced  Microarchitecture

Coupling Noise

[Figure: capacitive coupling between two adjacent wires (Wire 1 and Wire 2)]

[Figure: inductive coupling: a current change on Wire 1 creates a magnetic field, which induces a current on Wire 2]

Page 28: Advanced  Microarchitecture

Impact on Performance

[Figure: two timing diagrams, each marked with its clock cycle time]

Extra noise margin → decrease in f

Page 29: Advanced  Microarchitecture

Power Supply Noise

[Figure: water-supply analogy. A water tank feeds the shower with flow Ishower; a flush ("Flush!") draws Ijohn, leaving Ishower - Ijohn for the shower and causing a pressure drop]

Page 30: Advanced  Microarchitecture

Power Supply Noise

Local spikes in power consumption can affect other blocks, even ones very far away, depending on the power distribution network.

[Figure: power distribution network fed from a power supply pin; most nodes are at 1.5V and one node has dropped to 1.2V]

Page 31: Advanced  Microarchitecture

Same Solution as Water Supply
• Decoupling or Debouncing Capacitors (“Decaps”)

[Figure: charged capacitors next to two 1.5V supply nodes; current labels: 2 mA, 2 mA, up to 3 mA, 1 mA, 0.5 mA, 0.5 mA; voltage label: 0.75V]

Page 32: Advanced  Microarchitecture

Fabrication Costs
• CPU (die) size greatly affects cost
  – Current CPUs: 1-2 cm²
  – Embedded: much smaller
    • Cost and footprint matter in a cell phone or iPod

[Figure: silicon wafer divided into dies]

Page 33: Advanced  Microarchitecture

Yield

[Figure: two wafers with manufacturing defects, one cut into 16 dies and one cut into 4 dies]
• 13/16 working chips: 81.25% yield
• 1/4 working chips: 25.0% yield

Page 34: Advanced  Microarchitecture

Yield
• 52 die, 81.25% yield → 42.25 working parts / wafer
• 17 die, 25.0% yield → 4.25 working parts / wafer
• Assuming $250 per wafer:
  – $5.92 per die (52-die wafer)
  – $58.82 per die (17-die wafer)
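
The per-die costs come from dividing the wafer cost by the number of working parts. A minimal sketch of that arithmetic, using the wafer cost, die counts, and yields from this slide (the function name is illustrative):

```python
def cost_per_working_die(wafer_cost: float, dies_per_wafer: int,
                         yield_fraction: float) -> float:
    """Wafer cost spread over only the dies that actually work."""
    working_parts = dies_per_wafer * yield_fraction
    return wafer_cost / working_parts


if __name__ == "__main__":
    small = cost_per_working_die(250.0, 52, 0.8125)
    large = cost_per_working_die(250.0, 17, 0.25)
    print(f"52-die wafer: ${small:.2f} per working die")
    print(f"17-die wafer: ${large:.2f} per working die")
```

The larger die pays twice: fewer dies fit on the wafer, and each one is more likely to contain a defect.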

Page 35: Advanced  Microarchitecture

Yield
• 20” Display: $600
• 30” Display: 1.5² = 2.25x area, $1800 (3x $$$)
• In 2009: $400?
• Yield applies to all sorts of fabrication technologies, not just plain old silicon.
• As technology matures, yield typically improves, which helps to reduce cost.

Prices from apple.com as of 11/26/2007

Page 36: Advanced  Microarchitecture

Complexity
• Design time (microarchitecture)
• Implementation time (circuit, layout engineers)
• Validation/Verification (test before fab)
• Debugging (test after fab)
• Repeat…

Impacts Time-to-Market
• 2x performance / 18 months = 0.893% performance / week
• Each week of product delay had better earn you at least 0.9% performance!
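
The weekly figure is the compound growth rate that doubles performance over 18 months. The sketch below reproduces it, assuming 52 weeks per year (so 18 months is 78 weeks); the function name and parameters are illustrative.

```python
def weekly_gain_percent(factor: float = 2.0, months: float = 18.0,
                        weeks_per_year: float = 52.0) -> float:
    """Compound weekly improvement (in percent) that yields `factor` over `months`."""
    weeks = months / 12.0 * weeks_per_year
    return (factor ** (1.0 / weeks) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"{weekly_gain_percent():.3f}% performance per week")
```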

Page 37: Advanced  Microarchitecture

Verification
• Intel Pentium FDIV bug
  – Verification/validation should catch this
  – It didn’t (last-minute optimization, full validation not run)
  – Cost: ~$500M
• Complexity can be costly
• Over half of the design effort is spent on verification

Page 38: Advanced  Microarchitecture

OS, Compilers, Applications, …
• Some additional direct and indirect costs
• Ex. MMX/SSE
  – Costs extra HW, design time, verification, etc.
  – Useless without cooperation from application writers
    • Intel has a lot of SW people in-house to work on new applications, or to work with 3rd parties to use new technologies in their applications
    • Danger: benefits on new computers, but compatibility issues with older computers
• Ex. Multi-Core
  – Need support from OS vendors and application writers, otherwise no one can use the extra processors
  – Some of the cost shared by others; worthwhile investment for MSFT if you have to buy Vista for full multi-core support

Page 39: Advanced  Microarchitecture

Goal of Processor Design
• Maximize performance... within the constraints of
  – Peak power, average power, die area, metal layers, thermals, implementation complexity, verification complexity, time-to-market, cost to manufacturer (Intel), cost to OEM (Dell), cost to end-customer (you)
• Huge, multi-variable optimization problem!
  – Not all variables are independent
  – Not all variables have the same weight
  – The same variable may have different weights to different customers

Page 40: Advanced  Microarchitecture

Goal of Processor Design
• Slightly different for different segments
  – Laptops: maximize performance and battery life
  – Embedded: attain “sufficient” performance and then maximize battery life
    • Your MP3 player only needs to be fast enough to run the MP3 codec; any additional performance provides no end-user benefit and just costs more/consumes more power
  – Server: throughput vs. latency
• In this course, we will be mostly focused on “high-performance” processors (desktop, server)

