High Performance Processors in a Power Limited World Sam Naffziger AMD Senior Fellow
Page 1: Sam Naffziger AMD Senior Fellow - IEEEewh.ieee.org/r5/denver/sscs/Presentations/2006_12_Naffziger.pdf · Sam Naffziger AMD Senior Fellow. Outline Today’s processor design landscape

High Performance Processors in a Power Limited World

Sam Naffziger, AMD Senior Fellow

Page 2

Outline

• Today’s processor design landscape
  – Trends
• Issues making designers’ lives difficult
  – Power limits
  – Scaling effects
• Design opportunities
  – Circuit level
  – Architectural
• Summary

Page 3

The All Consuming Quest for Greater Performance at Lower Cost

Increasing Transistor Density

Moore’s Law has served us well.

Increasing Performance

Page 4

Processor Frequency vs. Time

The amazing frequency increases of the past decade have leveled off – Why?

[Figure: MPU performance vs. time. Y axis: performance (MHz), log scale, 100 to 10000; X axis: Jan-97 to Jan-09. Annotations: Power Limits, Process Issues, 4GHz.]

Page 5

Outline

• Today’s processor design landscape
  – Trends
• Issues making designers’ lives difficult
  – Power limits
  – Scaling effects
• Design opportunities
  – Circuit level
  – Architectural
• Summary

Page 6

Power Consumption Background

Source: Holt, HotChips 17

Power has always challenged circuit integration

Bipolar → NMOS → CMOS

We’ve been bailed out by technology in the past

Page 7

Scaling Background

[Figure: 2005 ITRS projections of Vt and Vdd, in mV (0 to 2500), for 2000 to 2010.]

Source: Horowitz et al, IEDM 2005

[Figure: feature size (um), Vdd and Power/10 on a log scale (0.1 to 10), Jan-85 to Jan-03. Annotations: Dennard Scaling, Realistic power limit.]

Scaling doesn’t bail us out any more

Page 8

Power Consumption Background

[Graphic labels: The Processor Designer; Process Scaling Issues]

$$P \approx \underbrace{C_{TOT}\,\alpha\,F\,V_{dd}^{2}}_{\text{Switching Power}} + \underbrace{N_{TOT}\,\alpha\,F\,V_{dd}\,I_{CO}}_{\text{Crossover Power}} + \underbrace{N_{ON}\,I_{LEAK}\,V_{dd}}_{\text{Leakage Power}}$$

Levers: reducing Vdd, reducing C_TOT, reducing I_LEAK and I_CO, reducing α

The process guys have had the biggest impact on these

But now, not only are those improvements fading, we also have a host of new challenges:

• Variation
• Voltage droop
• Wire non-scaling
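As a rough numerical illustration of the three-term power model on this slide, the sketch below evaluates the switching, crossover and leakage terms separately. Every parameter value is a hypothetical placeholder picked only to make the arithmetic visible; none comes from the talk or from a real process. The units assumed for the crossover term are also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PowerParams:
    c_tot: float    # total switched capacitance (F)
    alpha: float    # activity factor (0..1)
    freq: float     # clock frequency (Hz)
    vdd: float      # supply voltage (V)
    n_tot: float    # total number of gates
    i_co: float     # crossover term per gate transition (assumed units: A*s)
    n_on: float     # number of powered-on, leaking devices
    i_leak: float   # leakage current per device (A)


def power_terms(p: PowerParams) -> dict:
    """Return the switching, crossover and leakage terms plus their sum."""
    switching = p.c_tot * p.alpha * p.freq * p.vdd ** 2
    crossover = p.n_tot * p.alpha * p.freq * p.vdd * p.i_co
    leakage = p.n_on * p.i_leak * p.vdd
    return {"switching": switching, "crossover": crossover,
            "leakage": leakage, "total": switching + crossover + leakage}


# Hypothetical values, for illustration only.
base = PowerParams(c_tot=100e-9, alpha=0.1, freq=2e9, vdd=1.2,
                   n_tot=1e8, i_co=1e-17, n_on=1e8, i_leak=100e-9)
terms = power_terms(base)

# Same design with Vdd lowered by 10%: the switching term falls quadratically.
lowered = PowerParams(**{**base.__dict__, "vdd": base.vdd * 0.9})
lowered_terms = power_terms(lowered)
```

Comparing `terms` with `lowered_terms` shows why Vdd is the strongest lever in the formula: it appears squared in the switching term and linearly in the other two.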

Page 9

Outline

• Today’s processor design landscape
  – Trends
• Issues making designers’ lives difficult
  – Power limits
  – Scaling effects
• Design opportunities
  – Circuit level
  – Architectural
• Summary

[Graphic label: Scaling Issues]

Page 10

The Silicon Age Still on a Roll, But …

High volume manufacturing:  2004  2006  2008  2010  2012  2014  2016  2018
Technology node (nm):       90    65    45    32    22    16    11    8
Delay = CV/I scaling:       0.7, then ~0.7, then >0.7 (delay scaling will slow down)
Energy/logic op scaling:    >0.35, then >0.5, then >0.5 (energy scaling will slow down)
Bulk planar CMOS:           high probability early, low probability later
Alternate, 3G etc:          low probability early, high probability later
Variability:                medium, then high, then very high
RC delay:                   1 in every column

But … all this scaling has some nasty side effects

[Figure: ITRS roadmap. Nodes: 65nm, 45nm, 32nm, 22nm; years: 2007, 2010, 2013, 2016. Substrate: bulk, PDSOI, FDSOI. Stressors + substrate engineering + high µ materials. Gate stack: SiON/poly, high k/metal. Electrostatic control: planar, 3D (MuGFET, MuCFET).]

Source: European Nanoelectronics Initiative Advisory Council (ENIAC)

Page 11

Device Variation Reverse Scales

Source: Pelgrom, IEEE lecture 5/11/06

Variations subtract directly off cycle time

• Power efficiency drops
• Circuit margins degrade

The Problem: Atoms don’t scale

Page 12

One impact of variation is leakage spreads

[Figure: Sidd (A) vs. Fmax (a.u.). Annotations: 1.25X Fmax, >3X SIDD spread.]

Note: Chip SIDD is set by the “smallest” gates; Fmax is set by the slowest gates.

Page 13

Scaling Intrinsically Hurts Supply Integrity

[Figure: Leakage %, Normalized Vdd and Vdd Droop % (0% to 120%) by technology node: 250nm, 180nm, 130nm, 90nm.]

Source: Bose, Hotchips 17

• With power per core staying constant but area, voltage and cycle times dropping, we have a big challenge
• Requiring a higher voltage to hit frequency is a quadratic power impact
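To make the "quadratic power impact" concrete, here is a minimal sketch. It assumes that only the switching term matters, that frequency and capacitance are held fixed, and that droop is expressed as a fraction of the nominal supply. The function names and the droop values are illustrative, not taken from the talk.

```python
def guardbanded_vdd(v_required: float, droop_fraction: float) -> float:
    """Nominal Vdd needed so that the drooped supply still reaches v_required."""
    return v_required / (1.0 - droop_fraction)


def switching_power_ratio(v_new: float, v_old: float) -> float:
    """Relative switching power at fixed frequency and capacitance (P ~ V^2)."""
    return (v_new / v_old) ** 2


# Hypothetical droop levels: how much extra switching power the guardband costs.
for droop in (0.05, 0.10):
    v_nominal = guardbanded_vdd(1.0, droop)
    ratio = switching_power_ratio(v_nominal, 1.0)
    print(f"droop {droop:.0%}: nominal Vdd x{v_nominal:.3f}, switching power x{ratio:.3f}")
```

Under these assumptions, covering a 10% droop costs roughly 23% more switching power, which is why supply integrity matters so much once the power budget is fixed.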

Page 14

Outline

• Today’s processor design landscape
  – Trends
• Issues making designers’ lives difficult
  – Power limits
  – Scaling effects
• Design opportunities
  – Circuit level
  – Architectural
• Summary

[Graphic labels: Scaling Issues; Circuit Designer]

Page 15

Some Ways to Shoulder the Variation Burden: Adaptive clocking

Empirically set the clock edge to optimize frequency

Higher granularity → more variation tolerance

LBIST and GA search algorithms show promise for per-part optimization

Programmable Delay Buffers

Page 16

Some Ways to Shoulder the Variation Burden: Self Healing Designs

Simplest example is cache ECC on memory arrays

Next level is Intel’s Pellston technology implemented on Montecito and Tulsa

Disable defective lines detected by multiple ECC errors

Future directions involve self-checking with redundant logic and retry

• Predict result through parity, residues or redundant logic
• On an error, replay calculation before committing architectural state
• If replay is correct, it was a transient error (particle strike, Vdd droop, random noise coupling etc.)
• If incorrect, can reduce frequency, increase voltage or retry with an alternate execution path
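The check-and-replay flow described above can be sketched in software terms. This is only an analogy for the hardware mechanism: the residue modulus, the `checked_add` wrapper and the fault-injecting adder are all hypothetical, and the "persistent" branch merely stands in for the slower recovery options the slide lists.

```python
from typing import Callable, Tuple

MOD = 3  # residue modulus; a hypothetical choice for this sketch


def residue_ok(a: int, b: int, result: int) -> bool:
    """Check an addition result against an independently computed residue."""
    return (a % MOD + b % MOD) % MOD == result % MOD


def checked_add(a: int, b: int,
                adder: Callable[[int, int], int]) -> Tuple[int, str]:
    """Run `adder`, verify by residue, and replay once before committing."""
    first = adder(a, b)
    if residue_ok(a, b, first):
        return first, "ok"
    second = adder(a, b)  # replay before committing architectural state
    if residue_ok(a, b, second):
        return second, "transient"
    # Persistent error: a real design might reduce frequency, raise voltage or
    # use an alternate execution path. A reference add stands in for that here.
    return a + b, "persistent"


def make_flaky_adder(bad_calls: int) -> Callable[[int, int], int]:
    """Hypothetical fault injector: the first `bad_calls` results are off by one."""
    state = {"calls": 0}

    def adder(a: int, b: int) -> int:
        state["calls"] += 1
        return a + b + 1 if state["calls"] <= bad_calls else a + b

    return adder
```

A residue check of this kind catches any error that changes the result modulo 3, but it misses errors that are multiples of 3. That gap is one reason real designs combine several prediction schemes.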

Page 17

Some Ways to Shoulder the Variation Burden: Self Healing Designs

From Fall Microprocessor Forum 2006

Page 18

Adaptive Supply Voltage

[Figure: energy / operation as a function of channel length (short, nominal, long) and Vdd (low to high).]

Per-part and dynamic voltage management are key

More range flexibility and finer grain response will provide differentiation

Page 19

Integrated Power and Thermal Management

“Fuse and forget” is no longer viable

• Too much variation in environment, manufacturing and operating conditions
• Some means of dynamic optimization needed

[Figure: Sidd (A) vs. Fmax (a.u.).]

Page 20

Integrated Power and Thermal Management

An autonomous programmable controller enables real time optimizations

An embedded controller provides the needed flexibility

• OS interfacing
• Multi-core management
• Per-part optimization

Page 21

Outline

• Today’s processor design landscape
  – Trends
• Issues making designers’ lives difficult
  – Power limits
  – Scaling effects
• Design opportunities
  – Circuit level
  – Architectural
• Summary

[Graphic labels: Scaling Issues; Chip Architect]

Page 22

Traversing the Power Contour

[Figure: power consumption as a function of channel length (short, nominal, long) and Vdd (low to high).]

$$P \approx \underbrace{C_{TOT}\,\alpha\,F\,V_{dd}^{2}}_{\text{Switching Power}} + \underbrace{N_{TOT}\,\alpha\,F\,V_{dd}\,I_{CO}}_{\text{Crossover Power}} + \underbrace{N_{ON}\,I_{LEAK}\,V_{dd}}_{\text{Leakage Power}}$$

Page 23

Traversing the Power Contour

[Figure: frequency as a function of channel length (short, nominal, long) and Vdd (low to high).]

Page 24

Traversing the Power Contour for a Given Implementation

[Figure: energy / operation as a function of channel length (short, nominal, long) and Vdd (low to high). Annotations: Max performance, Best power efficiency.]

Page 25

[Figure: Performance^3 / Watt (0 to 1.2) as a function of channel length (short, nominal, long) and Vdd (low, nominal, high).]

For comparing architectural efficiency, Performance^3/W is most effective
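One way to see why Performance^3/W is a useful yardstick is to check how it behaves when the same design is run at different voltage/frequency points. The sketch below assumes an idealized model that the slide does not spell out: voltage tracks frequency, only switching power counts, and all quantities are normalized to 1 at the nominal point.

```python
def operating_point(ipc: float, vf_scale: float, c_eff: float = 1.0):
    """Performance and power with V and F both multiplied by vf_scale."""
    freq = vf_scale
    vdd = vf_scale
    perf = ipc * freq
    power = c_eff * freq * vdd ** 2   # switching power only (assumption)
    return perf, power


def metrics(ipc: float, vf_scale: float, c_eff: float = 1.0) -> dict:
    """Compare plain perf/W with perf^3/W at one operating point."""
    perf, power = operating_point(ipc, vf_scale, c_eff)
    return {"perf_per_watt": perf / power,
            "perf3_per_watt": perf ** 3 / power}


nominal = metrics(ipc=1.0, vf_scale=1.0)
slowed = metrics(ipc=1.0, vf_scale=0.7)
```

In this model plain perf/W improves simply by slowing the part down, while perf^3/W stays the same for a given design. That makes the cubed metric reflect the architecture rather than the chosen operating point.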

Page 26

Optimal Pipeline Depth

• The industry has been moving from “hyper-pipelining” with short pipe stages to something more moderate

[Figure: pipe stages (0 to 35) for Willamette (2000), Prescott (2004), Core2 (2006), Opteron and Power6. Annotation: ~2X power efficient.]

Page 27

A Look at Mobile Processor Power

Page 28

A Look at Mobile System Power

[Figure: Mobile System Power, TDP vs. Average Power, broken down into rest of system, chipset, memory controller, CPU and memory.]

If a laptop burned TDP power all the time, battery life would be measured in minutes

How do we get mobile average power so much lower than TDP?

Page 29

The Answer: Take Advantage of Typically Low CPU Utilization

[Figure: MobileMark 2002, Tj 95, 1800MHz, 1.35V. Core power and IO power in W (0 to 35, with C1, PN! and C3 enabled) vs. time (0 to 70 min), with phases labeled Adobe Flash, MS Outlook, MS Word, MS PowerPoint, MS Excel, and MS Word and Netscape.]

Processor Utilization During Normal Laptop Usage as Approximated by MobileMark

Page 30

Reducing Power and Cooling Requirements with Processor Performance States

P-state   Frequency   Voltage   Power
P0        2600MHz     1.40V     ~95 watts
P1        2400MHz     1.35V     ~90 watts
P2        2200MHz     1.30V     ~76 watts
P3        2000MHz     1.25V     ~65 watts
P4        1800MHz     1.20V     ~55 watts
P5        1000MHz     1.10V     ~32 watts

(P-states run from P0 at HIGH processor utilization to P5 at LOW processor utilization.)

Up to 75% power savings (at idle)!

[Figure: Average CPU core power (measured at CPU), 0 to 25 W, with AMD PowerNow!™ disabled vs. enabled, for 10500 connections (~62% CPU utilization), 5000 connections (~40% CPU utilization) and idle (in OS). Labeled savings, in that order: -33%, -62%, -75%.]

Additionally, “C-states” reduce power further by cutting clocks completely and dropping voltage to retention levels
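The P-state table can be compared against a simple F·V² scaling rule. The sketch below takes the frequencies, voltages and approximate wattages from the slide and normalizes to P0. The scaling rule itself (switching power only) is an assumption of this example, not a model given in the talk.

```python
# (name, MHz, volts, listed watts) as given on the slide
P_STATES = [
    ("P0", 2600, 1.40, 95),
    ("P1", 2400, 1.35, 90),
    ("P2", 2200, 1.30, 76),
    ("P3", 2000, 1.25, 65),
    ("P4", 1800, 1.20, 55),
    ("P5", 1000, 1.10, 32),
]


def dynamic_power_estimate(mhz: float, volts: float, ref=P_STATES[0]) -> float:
    """Scale the reference state's wattage by F * V^2 (switching power only)."""
    _, ref_mhz, ref_volts, ref_watts = ref
    return ref_watts * (mhz / ref_mhz) * (volts / ref_volts) ** 2


for name, mhz, volts, listed in P_STATES:
    estimate = dynamic_power_estimate(mhz, volts)
    print(f"{name}: listed ~{listed} W, F*V^2 estimate {estimate:.1f} W")
```

The listed wattages sit above the F·V² estimate in the lower P-states. That would be consistent with power components that do not scale with frequency, such as leakage, although the slide does not break the numbers down.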

Page 31

Improving Peak Performance per Watt

Page 32

Adding Features to Increase Performance

[Figure: Watts/(Spec*Vdd*Vdd*L), log scale 0.01 to 1, vs. Spec2000*L, log scale up to 1000.]

Source: Horowitz et al, IEDM 2005

• Increasing execution efficiency has historically hurt power efficiency
• However, the cubic reduction of power with V/F scaling has tended to make this a good tradeoff

Page 33

Adding Features to Increase Performance Works with V/F Scaling

[Figure, shown twice: energy/op, frequency and performance (0 to 3.5) vs. IPC (0.5 to 2).]

If we hit VMIN, however, the game is over

• Voltage scaling has its limits
• More power efficient designs have an advantage
• High power designs get penalized due to higher di/dt, higher temperatures etc.
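The tradeoff on this slide can be sketched with a toy model. A feature raises IPC by `ipc_gain` and power by `power_gain`, then voltage and frequency are scaled down together until power returns to the original budget. The cubic power rule, the normalization to 1 at the nominal point and the `v_min` floor expressed as a fraction of nominal voltage are all assumptions of this sketch.

```python
def perf_at_constant_power(ipc_gain: float, power_gain: float,
                           v_min: float = 0.0) -> float:
    """Relative performance after re-scaling V and F to restore power = 1."""
    # Cubic regime: V and F scale together by s, so power_gain * s**3 == 1.
    s = power_gain ** (-1.0 / 3.0)
    if s < v_min:
        # Voltage is pinned at v_min and only F can drop further:
        # power_gain * s * v_min**2 == 1.
        s = 1.0 / (power_gain * v_min ** 2)
    return ipc_gain * s


# Hypothetical feature: +30% IPC for +50% power.
with_headroom = perf_at_constant_power(1.3, 1.5)            # voltage can still scale
near_vmin = perf_at_constant_power(1.3, 1.5, v_min=0.95)    # almost no voltage headroom
```

With voltage headroom the hypothetical feature is a net win at constant power. Once the design is close to VMIN, the same feature makes performance worse, which is the "game over" case.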

Page 34

How Hard is Improving Existing Processors?

[Figure: Watts/(Spec*Vdd*Vdd*L), log scale 0.01 to 1, vs. Spec2000*L, log scale up to 1000.]

Source: Horowitz et al, IEDM 2005

• Peak performance costs more energy/operation
• Most of the big hitter improvements have been heavily mined already
• Next generation AMD cores have >> 50% of clocks gated off even for high power code

[Figure: Current and Next Generation Core Comparison. C switch, split into Clock and Logic, for Gen1 Peak and Gen2 Peak.]

Despite more features, next gen core has substantially lower Cap

Page 35

Multi-Core to the Rescue?

             One core + cache   Two cores + cache
Voltage      1                  .85
Frequency    1                  .85
Area         1                  2
Power        1                  1
Perf         1                  ≈1.7
Perf/Watt    1                  ≈1.7

Sounds like a great story, what’s the catch?
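The dual-core numbers can be checked with a simple model. It assumes perfectly parallel work, switching power only, and voltage and frequency scaled by the same factor. None of these assumptions is stated on the slide, and the function names are illustrative.

```python
def multicore_at_scaled_vf(n_cores: int, scale: float) -> dict:
    """Relative throughput and power for n cores with V and F multiplied by scale."""
    perf = n_cores * scale            # perfectly parallel work (assumption)
    power = n_cores * scale ** 3      # P ~ n * F * V^2, switching power only
    return {"perf": perf, "power": power, "perf_per_watt": perf / power}


def scale_for_constant_power(n_cores: int) -> float:
    """V/F scale that keeps n * scale^3 equal to 1 in this model."""
    return n_cores ** (-1.0 / 3.0)


dual = multicore_at_scaled_vf(2, 0.85)
exact = multicore_at_scaled_vf(2, scale_for_constant_power(2))
```

The throughput matches the slide's Perf ≈ 1.7. This simple model puts power somewhat above 1 at a scale of 0.85, so the slide's Power = 1 is best read as approximate or as reflecting effects outside the model.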

Page 36

Multi-Core to the Rescue?

Some of the catches:

• What if you’re already at VMIN? Need to cut frequency in half to stay within power limit
• How much parallelizable code is really out there?
• More compute capacity means more IO and memory bandwidth demands …

[Graphic repeated from page 35: one core + cache (all values 1) vs. two cores + cache (Voltage = .85, Frequency = .85, Area = 2, Power = 1, Perf ≈ 1.7, Perf/Watt ≈ 1.7).]
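The first catch follows from the same power model once voltage can no longer drop. With Vdd pinned at VMIN, power is proportional to the number of cores times frequency, so doubling the cores at constant power means halving the clock. The sketch below assumes switching power only. The `parallel_fraction` parameter is a hypothetical addition showing how quickly the result turns into a net loss.

```python
def freq_scale_at_vmin(n_cores: int) -> float:
    """With voltage pinned at VMIN, power ~ n * F, so F must scale as 1/n."""
    return 1.0 / n_cores


def throughput_at_vmin(n_cores: int, parallel_fraction: float = 1.0) -> float:
    """Relative throughput at constant power when only frequency can be cut."""
    serial = 1.0 - parallel_fraction
    speedup = 1.0 / (serial + parallel_fraction / n_cores)
    return freq_scale_at_vmin(n_cores) * speedup


fully_parallel = throughput_at_vmin(2)                        # no gain at all
mostly_parallel = throughput_at_vmin(2, parallel_fraction=0.9)  # a net loss
```

Even with perfectly parallel code, the second core buys nothing at VMIN in this model, and any serial portion makes the dual-core part slower than the single core it replaced.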

Page 37

Multi-Core Issues: Amdahl’s Law

• There is almost always a portion of an application that cannot be parallelized
• This portion becomes a bottleneck as the number of threads is increased
• A typical value is in the range of 10%

[Graphic repeated from page 35: one core + cache (all values 1) vs. two cores + cache (Voltage = .85, Frequency = .85, Area = 2, Power = 1, Perf ≈ 1.7, Perf/Watt ≈ 1.7).]

[Figure: multi-core speedup with serial code and constant power considered. Speedup (0 to 10) vs. cores (1 to 8), for serial fractions of 0, 5%, 10%, 15% and 20%, plus the ideal line.]

Just 10% serial code drops 8 core performance improvement by 41%
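The 41% figure can be reproduced directly from Amdahl's law. In the sketch below, the constant-power variant additionally scales frequency by the cube root of 1/n. That cubic power rule is an assumption of this example, and the slide does not say how its constant-power curves were computed.

```python
def amdahl_speedup(n_cores: int, serial_fraction: float) -> float:
    """Classic Amdahl's law speedup for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)


def loss_vs_ideal(n_cores: int, serial_fraction: float) -> float:
    """Fraction of the ideal n-core speedup lost to the serial portion."""
    return 1.0 - amdahl_speedup(n_cores, serial_fraction) / amdahl_speedup(n_cores, 0.0)


def constant_power_speedup(n_cores: int, serial_fraction: float) -> float:
    """Amdahl speedup with V/F scaled so that n * scale^3 == 1 (assumption)."""
    vf_scale = n_cores ** (-1.0 / 3.0)
    return vf_scale * amdahl_speedup(n_cores, serial_fraction)


for cores in (2, 4, 8):
    print(cores, round(amdahl_speedup(cores, 0.10), 2),
          round(constant_power_speedup(cores, 0.10), 2))
```

For eight cores and 10% serial code, `loss_vs_ideal` comes out at about 0.41, in line with the slide. The loss relative to ideal is the same with or without the constant-power scaling, because that scaling multiplies both curves by the same factor.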

Page 38

Multi-Core Issues: IO Power

All those extra cores need their own data …

IO power has been pretty constant for years, in the range of 20 mW per Gb/s

[Graphic repeated from page 35: one core + cache (all values 1) vs. two cores + cache (Voltage = .85, Frequency = .85, Area = 2, Power = 1, Perf ≈ 1.7, Perf/Watt ≈ 1.7).]

• If we increase IO power accordingly, but hold total chip power constant with V/F scaling, things get worse
• Overall performance drops by another 10% or so …

[Figure: multi-core speedup with serial code, constant power + IO power considered. Speedup (0 to 10) vs. cores (1 to 8), for serial fractions of 0, 5%, 10%, 15% and 20%, plus the ideal line.]
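A rough sketch of how IO power eats into a fixed chip budget follows. The 20 mW per Gb/s figure is from the slide. The budget values, the bandwidth and the cubic core-power rule are hypothetical assumptions used only to show the mechanism, so the sketch does not attempt to reproduce the slide's "another 10%" figure.

```python
IO_MW_PER_GBPS = 20.0  # from the slide: roughly 20 mW per Gb/s


def io_power_watts(bandwidth_gbps: float) -> float:
    """IO power for a given aggregate bandwidth in Gb/s."""
    return bandwidth_gbps * IO_MW_PER_GBPS / 1000.0


def core_vf_scale(total_budget_w: float, io_w: float,
                  unscaled_core_w: float) -> float:
    """V/F scale so that cores plus IO fit the budget (core power ~ scale^3)."""
    core_budget = total_budget_w - io_w
    if core_budget <= 0:
        raise ValueError("IO power exceeds the total budget")
    return min(1.0, (core_budget / unscaled_core_w) ** (1.0 / 3.0))


# Hypothetical example: 8 GB/s (64 Gb/s) of IO bandwidth.
link_power = io_power_watts(64)
# Hypothetical budget: 100 W total, cores would draw 100 W unscaled, IO takes 20 W.
scale = core_vf_scale(total_budget_w=100.0, io_w=20.0, unscaled_core_w=100.0)
```

In this model, giving 20% of the budget to IO forces roughly a 7% cut in core frequency, which shows how extra bandwidth demand turns into lost compute performance at constant chip power.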

Page 39

The Transition to Parallel Applications

Parallel Applications

Small number of applications (worked by experts for 10+ yrs)

Awkward development, analysis and debug environments

Parallel programming is hard!

Amdahl’s law is still a law…

SW productivity is already in a crisis; this worsens things!

Single-threaded Applications

Most of today’s applications

Well understood optimization techniques

Advanced development, analysis and debug tools

Conceptually, easy to think about

Establishing an appropriate balance is key for managing this important transition

Page 40

Other Architectural Directions: Integration

Not only does the integration of more system components (e.g. memory controllers, IO etc.) improve performance, it reduces power significantly as well:

• IO communication overhead drops
• CPU integrated power management can dynamically optimize
• Power efficiency of special function components (e.g. graphics accelerators, network processors etc.) greatly exceeds that of general purpose CPUs

[Figure: Typical Server Power Breakdown. Source: Bose, HotChips 17]

Page 41

System-level Power Consumption

[Diagram labels: I/O Hub, USB, PCI, PCIe™ Bridge, PCI-E Bridge, 8 GB/S links, XMB, Memory Controller Hub, MCP, Chip X, SRQ, Crossbar, HT, Mem. Ctrlr]

Dual-Core Packages with legacy technology
• 692 watts for processors (173w each)
• 48 watts for external memory controller

95% More Power

Dual-Core AMD Opteron™ processors
• 380 watts for processors (95w each)
• Integrated memory controllers

Source: Mixture of publicly available data sheets and AMD internal estimates. Actual system power measurements may vary based on configuration and components used

Page 42

System-level Power Consumption

[Diagram labels: I/O Hub, USB, PCI, PCIe™ Bridge, PCI-E Bridge, 8 GB/S links, XMB, Memory Controller Hub, MCP, Chip X, SRQ, Crossbar, HT, Mem. Ctrlr. Power annotations: 692 watts, 380 watts, 14 watts, 8.5 watts (four times).]

Dual-Core Packages with legacy technology
• 692 watts for processors (173w each)
• 48 watts for external memory controller

95% More Power

Dual-Core AMD Opteron™ processors
• 380 watts for processors (95w each)
• Integrated memory controllers

Totals: 740 watts vs. 380 watts

Source: Mixture of publicly available data sheets and AMD internal estimates. Actual system power measurements may vary based on configuration and components used
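The totals on this slide can be cross-checked with a few lines of arithmetic. The socket count of four is inferred from the per-processor wattages. The split of the 48 W external memory controller into one 14 W part and four 8.5 W parts is an assumption based on the diagram annotations, not something the slide states.

```python
SOCKETS = 4  # assumption: 692 W / 173 W and 380 W / 95 W both imply four packages

legacy_cpu_w = SOCKETS * 173            # processors, legacy platform
legacy_mem_ctrl_w = 48                  # external memory controller (from the slide)
legacy_total_w = legacy_cpu_w + legacy_mem_ctrl_w

opteron_total_w = SOCKETS * 95          # memory controllers are integrated

extra_fraction = legacy_total_w / opteron_total_w - 1.0

# Assumed breakdown of the 48 W figure from the diagram's power annotations.
mem_ctrl_breakdown_w = 14 + 4 * 8.5

print(f"legacy: {legacy_total_w} W, Opteron: {opteron_total_w} W, "
      f"legacy draws {extra_fraction:.0%} more")
```

The numbers are self-consistent: 692 + 48 gives the 740 W total, and 740 W against 380 W is about 95% more.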

Page 43

Other Architectural Directions: Integration

Barriers?

• Integration of heterogeneous designs non-trivial
• IP barriers
• Schedule issues with multiple converging components

[Diagram: Big CPU, four Small CPUs, Special Accelerators, Memory, IO, RF.]

Integrating dual designs for the processor core enables both peak performance and throughput/watt

[Figure: Watts/(Spec*Vdd*Vdd*L), log scale 0.01 to 1, vs. Spec2000*L, log scale up to 1000.]

Page 44

Summary (1 of 2)

• Silicon process technology is unlikely to be the major engine of processor performance increases in the future
• Major circuit related challenges that we’ve only just started to address lie ahead:
  – Design for variation tolerance and mitigation
  – Maintaining dynamic voltage headroom within reliability and variation imposed limits
  – Adaptive, self-healing techniques are a key direction

[Graphic labels: Designers; Variation; Leakage; Vdd Droop; Hot spots]

Page 45

Summary (2 of 2)

• CPU architectures are converging on modest pipe length, limited issue out of order designs
• Multi-core is good, but has limits in the not too distant future
• Heterogeneous integration is a key direction

Silicon process technology is unlikely to be the major engine of processor performance increases in the future

We’re up to the challenge, but it will be a joint effort …

[Graphic labels: Moore’s Law; Design Community]

