Presentation is © 2006 IBM © 2004 IBM Corporation

IBM Research

The Limits of CMOS Scaling from a Power-Constrained Technology Optimization Perspective

D. J. Frank
IBM T.J. Watson Research Center, Yorktown Heights, NY

Purdue University seminar, Oct. 4, 2006

Purdue University NCN Seminar | © IBM | Oct. 4, 2006 © 2004 IBM Corporation

IBM Research | Silicon Technology

Acknowledgements

Wilfried Haensch

Ghavam Shahidi

Omer Dokumaci

Mary Wisniewski

Mike Scheuermann

Phillip Restle

Steve Kosonocky

Evan Colgan

Philip Wong

Yuan Taur

Paul Solomon

Bob Dennard

Outline

1. General Limitations to Scaling

2. Power-Constrained Technology Optimization

• Models and Assumptions

• Optimization Results

3. Minimum Energy from Optimization

4. Open Questions

5. Summary

1. Limitations to Scaling

1. Quantum mechanical leakage currents

2. Discreteness of matter and energy

3. Material considerations

4. Thermodynamic limitations

5. Practical and environmental constraints on power

Basic idea of Scaling:

Adjust dimensions, voltages, & doping to achieve smaller FET with same electrostatic behavior.

Quantum Mechanical Tunneling Leakage Currents

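[Figure: source-channel-drain diagrams of a FET in the 'ON' state (Gnd, Vdd, Gnd) and the 'OFF' state (Gnd, Gnd, Vdd), with electrons (e-) marking the leakage paths: gate insulator tunneling, subthreshold leakage, direct source-to-drain tunneling, and drain-to-body tunneling.]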

Discrete dopant fluctuations

[Figure: simulated device containing 249,403,263 Si atoms, 68,743 donors, and 13,042 acceptors.]

[D. J. Frank, et al., Symp. VLSI Technol., p. 169, 1999 and D. J. Frank and H.-S. P. Wong, IWCE, p. 2, May 2000]

The number of dopant atoms in the depletion layer of a MOSFET has been scaling roughly as $L_{eff}^{1.5}$.

Statistical variation in the number of dopants, N, varies as $N^{1/2}$, causing increasing VT uncertainty for small N.

Specific threshold uncertainties depend on the details of the doping profiles.

3D simulations are required to accurately evaluate these dopant fluctuations.

A preprocessor (called MCMESH3D) was written for FIELDAY:
• Checks every Si atom site to see if it is a dopant
• Transfers these dopants to the simulation mesh

3D FIELDAY simulations of subthreshold current are run on ~100 different cases to statistically evaluate σVT for any given design.
• Use constant mobility model to avoid unphysical mobility dependence on dopant positions.
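
The square-root behavior of dopant-number fluctuations can be illustrated with a short sketch. It assumes the dopant count follows Poisson statistics, so the standard deviation of N is sqrt(N); the function name and the sample counts are illustrative and are not taken from the talk.

```python
import math

def relative_dopant_fluctuation(n_dopants: float) -> float:
    """Relative spread sigma_N / N = 1 / sqrt(N), assuming Poisson statistics."""
    if n_dopants <= 0:
        raise ValueError("n_dopants must be positive")
    return 1.0 / math.sqrt(n_dopants)

# Hypothetical dopant counts: the relative spread grows as N shrinks.
for n in (10_000, 1_000, 100, 10):
    print(f"N = {n:>6}: sigma_N / N = {relative_dopant_fluctuation(n):.1%}")
```

This only captures the counting statistics; as the slide notes, the resulting VT uncertainty depends on the doping profile and needs 3D simulation.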

Simulated Contact Hole Exposure--Discreteness of photons

[Figure panels: photons absorbed; deprotected polymer; dissolved polymer.]

Monte Carlo simulation of exposure and development of an 80 nm contact hole using EUV lithography. [J. Cobb, et al., Proc. SPIE]

©2003, SPIE

Material Properties

1. Bandgap. The Si bandgap does not scale, but (a) this is not a major problem, and (b) it can be overcome by forward biasing the body.

2. Dielectric constants. The ITRS roadmap requires high-k gate insulators, but there are few materials that come close to satisfying all the demands:
• High k
• High barrier (for both electrons and holes, preferably)
• Stable on Si at anneal temperatures
• No traps or interface states
• High reliability

Hafnium silicate-based dielectrics are presently the most promising.

Thermodynamic limitations

1. The Boltzmann distribution. This causes subthreshold leakage current.

2. Irreversible computation => All switching energy is converted to heat.

3. All leakage currents and IR drops are irreversible => More heat.

4. Subthreshold slope sets fundamental limits on logic swing, but this limit is not usually important.

Thermal Current (subthreshold leakage)

$I_{subVT} = I_0 \, e^{\,e(V_G - V_T)/\eta kT}$

The Boltzmann distribution determines the subthreshold slope and leakage current, VT, and diode leakage currents, too.

VT can only be scaled by reducing the temperature, which is not acceptable for many applications.

[Figure: source-channel-drain barrier diagram with an electron (e-) thermally emitted over the channel barrier.]

Speed is very sensitive to VT/VDD ratio.
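
As a rough numerical illustration of why VT cannot be scaled freely, the sketch below takes the subthreshold current in the exponential form I0 · exp(e(VG − VT)/ηkT) and evaluates the subthreshold swing and the leakage penalty for lowering VT. The ideality factor η = 1 and the 100 mV example are assumptions chosen for illustration, not values from the talk.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C

def subthreshold_swing_mv(temp_k: float, eta: float = 1.0) -> float:
    """Gate voltage (mV) needed to change I_subVT by 10x: eta * (kT/e) * ln(10)."""
    return eta * K_B * temp_k / Q_E * math.log(10.0) * 1e3

def leakage_ratio(delta_vt: float, temp_k: float, eta: float = 1.0) -> float:
    """Factor by which I_subVT rises when VT is lowered by delta_vt volts."""
    return math.exp(Q_E * delta_vt / (eta * K_B * temp_k))

# Assumed operating point: 300 K, eta = 1 (ideal), VT lowered by 100 mV.
print(f"swing at 300 K: {subthreshold_swing_mv(300.0):.1f} mV/decade")
print(f"leakage increase for 100 mV lower VT: {leakage_ratio(0.1, 300.0):.0f}x")
```

Because the swing is proportional to temperature, the only way to steepen the slope in this model is to cool the device, which matches the slide's point about temperature.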

Practical and Environmental Issues

Power consumption and heat removal are limited by practical considerations.

Low power applications must be battery powered.

Many must be lightweight => power < ~few watts.

Disposable batteries can cost >> $500/watt over life of device.

Rechargeables can cost > $50/watt over life of device.

Home electronics is limited to <~1000W by heating of the room and cost of electricity.

High performance is limited by difficulty of heat removal from chip (~100 W/chip). (Cost of electricity is ~$5/watt over life.)
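
The order of magnitude of the electricity cost per watt can be checked with simple arithmetic. The sketch below assumes a price of $0.10 per kWh and 5 years of continuous operation; both numbers are hypothetical inputs chosen for illustration, not figures from the talk.

```python
HOURS_PER_YEAR = 24 * 365

def lifetime_energy_cost_per_watt(price_per_kwh: float, years: float) -> float:
    """Cost in dollars of running a constant 1 W load for `years` years."""
    kwh_per_watt = HOURS_PER_YEAR * years / 1000.0
    return price_per_kwh * kwh_per_watt

# Hypothetical inputs: $0.10 per kWh, 5 years of continuous operation.
cost = lifetime_energy_cost_per_watt(0.10, 5)
print(f"${cost:.2f} per watt over the assumed lifetime")
```

Under these assumptions the result lands in the same few-dollars-per-watt range as the estimate quoted on the slide.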

2. Technology Scaling Limits from System Performance Optimization

Since the end of scaling is dominated by practical considerations, it is application-dependent, requiring optimization across device, circuit, and architecture.

In the past, device, circuit and architecture design have proceeded in parallel, to increase product throughput and reduce complexity, but in an era of diminishing returns, greater performance can be achieved by optimizing across the boundaries.

As an initial exploration of this regime, a tool has been designed to capture the essential features of each complexity level in order to evaluate the impact of technology options on the performance of future systems.

Concept: Existence of an Optimal Technology

[Figure: power vs. miniaturization (large to small), showing dynamic power and leakage power; leakage increases due to tunneling effects.]

Practicality imposes power constraints.

Electrostatics imposes geometric constraints.

Thermodynamics imposes voltage constraints.

Quantum mechanics imposes miniaturization constraints due to tunneling.

Fixed architectural complexity + Fixed power constraints + Device physics = Existence of an optimal technology with maximal performance.

[Figure: log(Performance) vs. miniaturization (large to small). Decreasing available dynamic power overcomes speed improvements due to scaling.]
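
The existence of an optimum can be illustrated with a deliberately crude toy model. Everything in the sketch below is an assumption made up for illustration: speed is taken to grow linearly with a miniaturization factor, leakage power is taken to grow exponentially with it, and performance is taken as speed times the fraction of a fixed power budget left over for switching. None of the functional forms or constants come from the talk's optimizer.

```python
import math

def toy_performance(scale: float, total_power: float = 1.0) -> float:
    """Toy metric: speed grows with scale, leakage eats the fixed power budget."""
    leakage = 0.001 * math.exp(0.7 * scale)     # assumed exponential tunneling leakage
    dynamic = max(total_power - leakage, 0.0)   # power left over for switching
    return scale * dynamic / total_power        # speed gain x usable power fraction

# Scan a hypothetical miniaturization factor from 1.0 to 10.0 in steps of 0.1.
scales = [1.0 + 0.1 * i for i in range(91)]
best = max(scales, key=toy_performance)
print(f"toy optimum near scale = {best:.1f}")
```

The maximum falls strictly inside the scanned range: past it, leakage consumes the budget faster than scaling improves speed, which is the qualitative behavior the slide describes.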

Schematic organization of optimization program

[Figure: block diagram of the optimization loop. Inputs: fixed parameters; variables (initial guess). Model blocks: device structure, area model, wiring statistics, wire capacitance, IV model, leakage model, delay (adjusted for latency of long paths), leakage power, total power, thermal model, with tolerance adjustments. A constrained optimizer returns new values (improved guess).]

Assumptions and Model Details

Chip-level assumptions

Optimization metric

Device IV curves

Circuit delay

Power dissipation

Thermal model

Wire models

Accounting for variations

Models and Approximations

System Assumptions

Processor chip is assumed to have a fixed number of cores, each with a specified number of logic gates.

Only the logic within the cores is considered within the optimizations.

The clock and memory aspects of the chip are assumed to scale in the same way as the logic (delay, power, and area).

Core-to-core and core-to-memory communication is not dealt with.

[Figure: chip blocks. Logic: treat in detail. Memory, clock, and repeaters: fudge.]

Treat these by simple scaling from the logic part.

How much area do the processor cores take?

100% to 25%, generally decreasing with generation.

[Figure: die photos of Dothan (140M FETs), a 2-core chip (1.72B FETs), Prescott (125M FETs), Power4 (174M FETs), and Alpha 21264 ('96, 15M FETs, L1 cache only), with core area labels of 100%, 40%, 40%, 70%, and ~25%.]

Area usage within a processor core

[Figure: sequence of four pie charts that progressively subdivide the core area.]

Chart 1: Buffers & extra latches 9.3%; Caches (L1) 7.0%; Macros 60.5%; Caps, clock dist., unused 23.3%.

Chart 2: Buffers & extra latches 9.3%; Caches 7.0%; Register files 20.2%; Custom & RLMs 40.3%; Caps, clock dist., unused 23.3%.

Chart 3: Buffers & extra latches 9.3%; Caches 7.0%; Register files 20.2%; Latches and LCBs 14.5%; Logic 12.5%; Unused/caps 13.3%; Caps, clock dist., unused 23.3%.

Chart 4: Buffers & extra latches 9.3%; Caches 7.0%; Register files 20.2%; Latches and LCBs 14.5%; Logic in use 11.2%; Unused logic 1.2%; Unused/caps 13.3%; Caps, clock dist., unused 23.3%.

Chart annotations: 60%; 2/3, 1/3; 31%, 36%, 33%; 90%.

Approximate area fractions for a high-performance microprocessor core in leading-edge technology

data from: M. Scheuermann and M. Wisniewski

Processors built with nanotechnology are likely to have similar area usage statistics.

Nanotechnology may require additional area allocations for defective circuitry.

Estimates of power and computational densities should take into account realistic area efficiencies.

Optimization Approaches

1. Engineering approach: Maximize system performance, at fixed power.

Use total logic transition rate (LTR):

LTR = Ngates x (activity factor / logic depth) x 1/Delay

Relatively little dependence on architectural details.

2. Business approach: Maximize Return on Investment (ROI).
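
The LTR metric is a simple product, sketched below. The gate count and the loaded gate delay are hypothetical inputs chosen for illustration; the activity-per-depth value of 0.015 is the typical figure quoted later in the talk.

```python
def logic_transition_rate(n_gates: float, activity_per_depth: float, delay_s: float) -> float:
    """LTR = Ngates x (activity factor / logic depth) x 1 / Delay, in transitions/s."""
    return n_gates * activity_per_depth / delay_s

# Hypothetical inputs: 1e6 gates, activity/depth = 0.015, 10 ps loaded gate delay.
ltr = logic_transition_rate(1e6, 0.015, 10e-12)
print(f"LTR = {ltr:.2e} transitions/s")
```

Because the metric only counts gates, activity, and delay, it can be compared across designs without modeling the architecture in detail.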

FET Model

Using a general temperature-dependent short-channel FET model in which VT, tD, and tox are coupled, halo doping effects are included, and VT is set by the doping.

Modified alpha power model:

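[Equation: modified alpha-power-law expression for the drain current I_D as a function of V_GS, involving W, t_ox, ε_ox, μ_eff, L_CH, V_GS − V_T, ηkT/e, and the exponents α and γ. The expression is garbled in this transcript and is not reproduced here.]

[Figure labels: 10 W FET, Lg = 28 nm; 1 mW FET, Lg = 45 nm.]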

Calibrating FET IV Model

               Separate Fits         Joint Fit
Parameter      90nm      45nm        90nm      45nm
ND             3.26E18   6.00E18     2.29E18   4.35E18
xe             50.7      36.9        60.6      45.8
xoverlap       5.69      0.29        9.66      -0.55
beta           1.45      1.49        1.465     =
gamma          0.321     0.132       0.243     =
n1             -0.399    -0.643      -0.755    =
n2             2.33      1.84        2.14      =
tan theta      1.375     1.20        1.275     =
mult factor    2.04      1.97        1.933     =
Rc             0.0126    0.0162      0.0127    0.0161
Chi2           2.54      3.09        9.63      -

("=" marks a parameter shared between the 90nm and 45nm data in the joint fit.)

$L_{CH} = L_G - x_{ovlp}$

[Equation: empirical relation for the effective body doping N_eff in terms of N_D, L_CH, x_e, and the exponents n1 and n2; garbled in this transcript.]

$I_D \sim \mu_{mult} \, L_{CH}^{-\gamma} \, (V_G - V_T)^{\beta}$

[Figure: correlation plots for the joint fit.]

Optimizer FET model is fit to IV curves from 2D device simulator (FIELDAY).

An empirical relation for the effective body doping vs. channel length is used, which allows excellent fitting to the FIELDAY data.

Best fits occur when xovlp is an optimization variable, allowing overlap to decrease with generation.

Evaluated 10 parameter fits to 90nm and 45nm technology node IV's separately, and 14 parameter fit to the data jointly.

Circuit Delay Estimation

Basic circuit elements are:
• FI=2, FO=1.65 wire-loaded NAND gates for logic
• inverters for repeaters, FO ~ 1.2

Delay calculations:

$\tau_1 = \dfrac{V_{DD}\,(C_{parasitic} + C_{wire} + C_{gateload})}{2\, I^{*}_{Deff}}$

Current is adjusted to account for noise and variations.

$\tau_2 = R_{wire}\,(C_{wire} + C_{gateload})$

$\tau_3 = L_{wire} / (c/2)$

Propagation delay

[Equation: total delay τ, combining τ1, τ2, and τ3 with exponents 4/3 and 3/4 and a correction factor that depends on VT/VDD and α; garbled in this transcript.]

Correction for VT/VDD.

Final delay corrections are based on Eble's thesis.
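
The three delay components can be written as small functions. This is a sketch of one reading of the slide's expressions: a switching delay V_DD·C_total/(2·I_eff), a wire RC delay R_wire·(C_wire + C_gateload), and a propagation delay L_wire/(c/2). The function names and all numerical inputs are hypothetical, and the final combination with the VT/VDD correction is not attempted because that formula is not legible in this transcript.

```python
C_LIGHT = 2.998e8  # speed of light in vacuum, m/s

def switching_delay(v_dd, c_parasitic, c_wire, c_gateload, i_eff):
    """tau_1 = V_DD * (C_parasitic + C_wire + C_gateload) / (2 * I_eff)."""
    return v_dd * (c_parasitic + c_wire + c_gateload) / (2.0 * i_eff)

def wire_rc_delay(r_wire, c_wire, c_gateload):
    """tau_2 = R_wire * (C_wire + C_gateload)."""
    return r_wire * (c_wire + c_gateload)

def propagation_delay(l_wire):
    """tau_3 = L_wire / (c / 2): signal speed taken as half the speed of light."""
    return l_wire / (C_LIGHT / 2.0)

# Hypothetical gate: 1 V supply, 0.5 fF + 1 fF + 1 fF load, 50 uA effective current.
tau1 = switching_delay(1.0, 0.5e-15, 1e-15, 1e-15, 50e-6)
tau2 = wire_rc_delay(100.0, 1e-15, 1e-15)   # 100 ohm wire
tau3 = propagation_delay(1e-3)              # 1 mm wire
print(f"tau1 = {tau1 * 1e12:.1f} ps, tau2 = {tau2 * 1e12:.2f} ps, tau3 = {tau3 * 1e12:.2f} ps")
```

For short local wires the switching term dominates in this example; the RC and propagation terms only matter for the long, repeatered wires.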

Power Calculation

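$P_{TOT} = P_{DYN} + P_{subVT} + P_{OX} + P_{B2B}$

$P_{DYN} = \tfrac{1}{2}\, N_{CKT}\, \alpha_{Dl}\, C\, (V_H - V_L)\, V_{DD} / \tau$

$LTR = N_{CKT}\, \alpha_{Dl}\, (1/\tau)$

[Equations for the leakage terms, garbled in this transcript: P_subVT is 1.7 N_CKT V_DD times an off-current I_off that depends on V_T, V_DD, t_ox, L_G, and η; P_OX depends on A_core, V_DD, and the oxide tunneling current density J_ox(V_T, V_DD, t_ox, η); P_B2B depends on A_core, V_DD, and the band-to-band current density J_B2B(F_Max, V_DD).]

Note that cross-through power is not included.

The powers are computed separately for logic and for repeaters.

τ = mean delay for a single loaded logic gate

$\alpha_{Dl}$ is activity factor divided by logic depth. Usually ~0.015 in recent optimizations.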
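
The dynamic power term can be evaluated directly once the inputs are fixed. The sketch below reads the slide's expression as ½ · N_CKT · α · C · (V_H − V_L) · V_DD / τ, which is an interpretation of a garbled equation; the gate count, capacitance, voltages, and delay are hypothetical values chosen for illustration.

```python
def dynamic_power_w(n_ckt, alpha, c_farad, v_swing, v_dd, tau_s):
    """P_DYN = 0.5 * N_CKT * alpha * C * (V_H - V_L) * V_DD / tau, in watts."""
    return 0.5 * n_ckt * alpha * c_farad * v_swing * v_dd / tau_s

# Hypothetical inputs: 1e6 gates, alpha = 0.015, 1 fF per gate,
# full-rail 1 V swing on a 1 V supply, 10 ps mean loaded gate delay.
p_dyn = dynamic_power_w(1e6, 0.015, 1e-15, 1.0, 1.0, 10e-12)
print(f"P_DYN = {p_dyn:.2f} W")
```

Shortening τ raises the transition rate and the dynamic power in the same proportion, which is why performance and power trade off so directly in the optimizations.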

Generalized heat sink model

Two level heat flow model:
• Flow in the silicon wafer
• Flow in the heat sink material

In each layer, the flow can be:
• 3D (spherical) for spots smaller than thickness
• 2D (cylindrical) at distances larger than the thickness

In silicon layer, inhomogeneous power dissipation is accounted for, to estimate maximum junction temperature at hottest point.

[Figure: layer stack. Heat sources; Si wafer; interface; heat spreader (e.g., SiC or Cu); interface to final coolant (e.g., air or water).]

ρSi – thermal sheet resistance of Si wafer

ρHS – thermal sheet resistance of heat sink

RHS – thermal contact resistance of heat sink

RSi – thermal contact resistance of Si wafer

[Figure: temperature rise (K) vs. hot spot size (cm). This model is red; FE model is blue.]

Comparison of simplified analytic model with detailed numerical model.

Temperature rise constraint

[Figure: relative performance (arb. units) vs. total chip power (1 to 100 W). Labels are temperature rise at each point: 0.03, 0.10, 0.34, 1.01, 3.16, 5.86, 13.11, 23.81, 42.55, 59.73, 60.00, 60.00, 60.00, 60.00.]

[Figure: area multiplier (0.5 to 2.5) vs. total chip power (1 to 100 W).]

To prevent excessive heating, a constraint is introduced:

If the power level would cause the maximum chip temperature to exceed the constraint value, the core area is increased above its nominal estimated size (e.g., by diluting the core with cache) until the temperature rise is just equal to the constraint.

This makes longer average wire length, but prevents excess temperatures.
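
The area-dilution rule can be sketched with a much simpler thermal model than the two-level heat flow model used in the talk. The sketch assumes the temperature rise is proportional to the power per unit core area; that proportionality, the function names, and every number below are illustrative assumptions.

```python
def temperature_rise_k(power_w, area_cm2, rise_per_flux):
    """Toy model: temperature rise proportional to power density."""
    return rise_per_flux * power_w / area_cm2

def area_multiplier(power_w, nominal_area_cm2, rise_per_flux, max_rise_k):
    """Smallest core-area multiplier (>= 1) that keeps the rise within max_rise_k."""
    nominal_rise = temperature_rise_k(power_w, nominal_area_cm2, rise_per_flux)
    return max(1.0, nominal_rise / max_rise_k)

# Hypothetical inputs: 1 cm^2 nominal core, 0.5 K per (W/cm^2), 60 K limit.
for power in (50.0, 100.0, 200.0):
    m = area_multiplier(power, 1.0, 0.5, 60.0)
    rise = temperature_rise_k(power, 1.0 * m, 0.5)
    print(f"{power:5.0f} W: area multiplier = {m:.2f}, rise = {rise:.1f} K")
```

Below the limit the multiplier stays at 1; above it, the area grows just enough that the rise sits exactly at the constraint, mirroring the behavior described above.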

Communication and Wiring Models

Assume wire lengths distributed according to Rent's rule.

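[Figure: required number of wiring levels (4 to 12) vs. number of logic gates (1E+5 to 1E+8), from optimizations.]

[Figure: log(number of wires) vs. log(length).]

[Equations: the net length distribution i_net(l), written in terms of N_CKT, the fan-out FO, and the Rent exponent r, and integrals of i_net(l) over wire length l that give the lengths of nets without repeaters and the number of repeatered nets. The expressions are garbled in this transcript.]

Units are gate pitches.

r = Rent exponent, 0.6, here.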
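
For readers unfamiliar with Rent's rule, its basic form relates the number of external terminals T of a logic block to the number of gates N it contains, T = k · N^r. The sketch below uses the Rent exponent r = 0.6 quoted on the slide; the terminals-per-gate constant k = 4 is a hypothetical value, and the wire-length distribution that the talk derives from the rule is not reproduced here.

```python
def rent_terminals(n_gates: float, rent_exponent: float = 0.6,
                   terminals_per_gate: float = 4.0) -> float:
    """Rent's rule: external terminals T = k * N**r for a block of N gates."""
    return terminals_per_gate * n_gates ** rent_exponent

# Gate counts spanning the range shown on the slide's wiring-level plot.
for n in (1e5, 1e6, 1e7, 1e8):
    print(f"N = {n:.0e} gates: T ~ {rent_terminals(n):,.0f} terminals")
```

With r below 1, terminals grow more slowly than gate count, but the sublinear growth still implies a heavy tail of long wires as blocks get larger.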

Repeater Model

[Figure: repeater spacing (um, 50 to 100) and repeater width (um, 0.2 to 1) vs. latency penalty factor (0.01 to 1).]

Long wires receive repeaters with a spacing that is optimized.

Long wire delay can be absorbed into pipeline depth, but the latency causes inefficiency, so we use a latency penalty factor: γ.

[Figure: repeater spacing (cm, 0.001 to 1) vs. CPU core power (0.01 to 100 W), with curves labeled 9S, 10S, 11S, 12S, and 13S; Pecon = 10 W/cm2.]

Local Variation Modeling

Variation sources:
• Signal coupling noise
• Supply noise
• Statistical doping variations
• LER gate length variations

Consequences modeled:

Increased static power
• combine 1 sigma of doping, length, and noise

Critical path delay distribution
• yield-based, using estimated critical path distribution,
• and 1 sigma of doping and length, and worst case noise.

Single stage functionality
• use worst case (~6 sigma) of doping and length, no noise.

Optimization Results

General results

Evaluating specific possible device directions

Increasing mobility

High-k gate dielectric and metal gates

BEOL improvement only

Better heat sinks

Sub-ambient cooling

Multi-processor tradeoffs

Optimize by generation

Optimizations over 7 variables: tox, Lg, ND, <w>, Vdd, Srpt, <wrpt>

Dual core processor with aggressive air cooling

Technology node (nm)           130         90          65          45          32
Wire 1/2 pitch (nm)            175         120         90          70          50
gate overlap (nm)              19          8.74        3           -1.03       -1.5
halo scale len (nm)            61.5        53.4        46.3        40.2        34.9
Rcs (Ohm cm)                   0.0129      0.0136      0.0144      0.0152      0.0161
LER sigma Lg @W=1um (nm)       0.28        0.28        0.28        0.28        0.28
ACLV (nm)                      3.9         2.7         1.8         1.3         1
mob. enhancement               1           1.4         1.7         2           2
gate depletion (nm)            0.4         0.4         0.3         0.3         0.01
k_BEOL                         3.9         3.5         3.2         2.8         2.5
Gate insulator                 oxynitride  oxynitride  oxynitride  oxynitride  HfO2

Note that the LG, tox, VDD, VT, etc. are NOT preselected. They are solved for by the optimizations.

Optimize by generation

Dual core processor with aggressive air cooling

[Figure: performance (arb. units) vs. total chip power (1 to 100 W) for the 90 nm, 65 nm, 45 nm, and 32 nm nodes of a dual-core processor, with curves labeled oxynitride and high-k gate insulator.]

Optimize by generation

Dual core processor with aggressive air cooling

[Figure: Gate Length vs Power. Gate length (nm) vs. total chip power (0.01 to 100 W) for the 90 nm, 65 nm, 45 nm, and 32 nm nodes.]

[Figure: Oxide Thickness vs Power. Equivalent oxynitride thickness (nm) vs. total chip power (0.01 to 100 W) for the 90 nm, 65 nm, 45 nm, and 32 nm nodes, with curves labeled oxynitride and high-k (for 32nm).]

(high-k case assumes 0.3nm barrier layer, bandedge metal gate, HfO2-like insulator characteristics.)

Optimize by generation

Dual core processor with aggressive air cooling

[Figure: Voltages vs Power. Supply and threshold voltage (V) vs. total chip power (0.01 to 100 W), with Vdd and VT curves for the 90, 65, 45, and 32 nm nodes.]

Optimal Power Allocation Fractions

[Figure: power allocation (0% to 100%) vs. chip power (1 to 300 W), split into oxide power, subthreshold (SubVT) power, and dynamic power, each shown separately for repeaters and for logic.]

Impact of long-wire assumptions

[Figure: Optimized Performance vs Power. Performance (arb. units) vs. total chip power (1 to 1000 W) for three cases: 1x wires, 2x wires, and Rwire=0.]

Green case: wires with repeaters are 2x the regular wire.

Red case: all wire is the same size (63.6 nm, here, for 45nm node).

Blue case: zero wire resistance case is for comparison.

(All wires are 2:1, height to width ratio.)

Mobility dependence

[Figures: relative performance vs. mobility enhancement factor (1 to 3) for 1 W, 10 W, and 100 W chips. One panel is for 32nm technology, 8 core processor, air cooled; the other is for 45nm technology, dual core processor, water cooled.]

Enhanced mobility has greatest benefit at high power.

Even for large mobility enhancements, performance boost is modest: 10-15%.

[Figure: relative performance vs. total chip power (1 to 100 W) for mobility enhancements of 1.5x, 2.0x, and 2.5x, each with air and with water cooling; 45nm technology, dual core processor.]

Metal-gate workfunction for high-k and oxynitride

[Figure: performance relative to poly-Si vs. workfunction offset from bandedge (0 to 0.5 eV), for 5 W and 50 W chips with oxynitride and with high-k.]

[Figure: performance relative to poly-Si gate vs. total chip power (1 to 100 W), for wfn = 0, 0.2, and 0.4, with and without hi-k.]

[Figure: performance (arb. units) vs. total chip power (1 to 100 W), for wfn = 0, 0.2, and 0.4, with and without hi-k.]

45nm node, dual core processor with aggressive air cooling

Cooling scenario optimizations

Optimized over 7 variables: Lg, tox, Nd, <w>, Drptr, <wrptr>, Vdd.

[Figure: performance (arb. units) vs. total chip power (1 to 10000 W) for low-cost air, hi-perf air, 18C water, and -40C liquid cooling; 8 core processor design, 32nm technology.]

[Figure: performance (arb. units) vs. total chip power (1 to 10000 W) for -40C liquid, 18C water, hi-perf air, and low-cost air cooling; 4 core processor design, 45nm technology.]

Multi-processor trade-offs

The energy / performance tradeoff is very steep at the high end.

Lower power, more parallel processors potentially offer more computation for the same total power level.

[Figure: Energy vs Performance. Loaded switching energy (0.1 to 10 fJ) vs. total logic transitions/sec (1E+12 to 1E+16).]

These results are for 4-processor chips with micro-channel water cooling, pulling out all the stops.

9 variables: tox, Lg, ND, <w>, Vdd, wHP, Srpt, <wrpt>, xhalo
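
The link between the two axes of the energy vs. performance plot and the chip power is a simple product, sketched below. It assumes the switching energy plotted is the total energy spent per logic transition, so that power equals energy times transition rate; the function name and the sample points are illustrative.

```python
def chip_power_w(energy_fj: float, transitions_per_s: float) -> float:
    """Power (W) = energy per logic transition (fJ -> J) x logic transitions per second."""
    return energy_fj * 1e-15 * transitions_per_s

# Hypothetical points: the same 1 W can buy few expensive or many cheap transitions.
print(chip_power_w(1.0, 1e15))    # 1 fJ per transition at 1e15 transitions/s
print(chip_power_w(10.0, 1e14))   # 10 fJ per transition at 1e14 transitions/s
```

Lines of constant power are therefore straight diagonals on the log-log plot, which makes the steepness of the tradeoff at the high end easy to read off.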

Optimizations for varying number of processor cores

[Figure: performance (arb. units) vs. total chip power (1 to 1000 W) for 2, 4, 8, and 16 cores.]

32nm node optimizations

Aggressive air cooling

Assume: fixed total number of FETs, divided into varying # of cores.

Optimized over 7 variables: Lg, tox, Nd, <w>, Drptr, <wrptr>, Vdd.

3. What is the best possible?

Optimizations across all generations

These optimizations are for hypothetical 4-processor chips with micro-channel water cooling, pulling out all the stops.

9 variables: tox, Lg, ND, <w>, Vdd, Srpt, <wrpt>, wHP, xhalo

[Figure: supply voltage and threshold voltage (VTsat), in V, vs. total chip power (0.001 to 1000 W).]

[Figure: optimal dimension (0.1 to 1000 nm) vs. total chip power (0.001 to 1000 W) for mean FET width, wire half-pitch, gate length, hi-k thickness, and equivalent oxide thickness.]

Caveats: conventional MOSFET structure, high-performance design practices

2:1 minimum width-to-length ratio.

Wire half-pitch becomes VERY small at the lowest power because wire resistance has little impact at the slow speeds.

Optimal wire pitch grows for the highest-performance cases. This gives lower resistance, and the FETs are spreading out because of their width anyway.


Optimizations across all generations (continued)

[Figure, panel 1: Performance. Logic transitions/sec, 1E+12 to 1E+16, versus total chip power (W), 0.001 to 1000.]

[Figure, panel 2: Switching Energy. Energy per logic transition (fJ), 0.1 to 10, versus total chip power (W), 0.001 to 1000.]

[Figure, panel 3: Area of Each CPU Core. Core area (mm²), 0.1 to 10, versus total chip power (W), 0.001 to 1000.]

Logic reaches a maximum density of 3.4G gates/cm² within the logic part of the core.

(500K logic gates + cache, registers, latches, overhead)

9 variables: tox, Lg, ND, <w>, Vdd, Srpt, <wrpt> , wHP, xhalo

Power constraints limit conventional CMOS scaling to ~3.4G logic gates/cm². The challenge for nanotechnology is to find a way to do significantly better.
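To give a feel for the density figure, the following sketch converts it into an area per logic gate and into the area taken by the 500K logic gates of one core. Only the two input numbers come from the slide; the variable names and the square-footprint "pitch" are illustrative assumptions. The result covers the logic part only, not the cache, registers, latches, and overhead that make up the rest of the core area.

```python
import math

# Values quoted on the slide.
MAX_DENSITY_GATES_PER_CM2 = 3.4e9
LOGIC_GATES_PER_CORE = 500e3

# Unit conversions.
UM2_PER_CM2 = 1e8  # 1 cm = 1e4 um
MM2_PER_CM2 = 1e2  # 1 cm = 10 mm

# Average area available to one logic gate at maximum density.
area_per_gate_um2 = UM2_PER_CM2 / MAX_DENSITY_GATES_PER_CM2

# Side of an equivalent square footprint (an illustrative assumption).
gate_pitch_nm = math.sqrt(area_per_gate_um2) * 1e3

# Area of the logic portion of one core at maximum density.
logic_area_mm2 = LOGIC_GATES_PER_CORE / MAX_DENSITY_GATES_PER_CM2 * MM2_PER_CM2

print(f"area per gate : {area_per_gate_um2:.4f} um^2")
print(f"gate pitch    : {gate_pitch_nm:.0f} nm on a side")
print(f"logic area    : {logic_area_mm2:.4f} mm^2 per core")
```

At this density each logic gate occupies roughly 0.03 µm², which corresponds to a square of about 170 nm on a side.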


Energy vs Performance

Numbers on each point are the total chip power for that point.

[Figure: Energy / transition (fJ), 0.01 to 10, versus logic transitions/sec, 1E+12 to 1E+16. Points are labeled with total chip power: 300, 100, 30, 10, 3, 1, 0.3, 0.1, 0.03, 0.01, 0.003, 0.001. Curves: oxynitride, 4-way; high-k, 4-way; high-k, 16-way; high-k, 16-way, zero tolerance.]

Zero process tolerance is unrealistic, but it serves as a lower bound. With zero tolerance, gate lengths can be 30% smaller, yielding higher density and shorter wires.

Average loaded switching energy versus performance for cross-generational optimization (9 parameters).


Minimum Energy breakdown

Average logic load capacitance: 0.17 fF gate cap, 0.06 fF parasitic, 0.40 fF wire (3.2 µm average length)

Minimum supply voltage: ~ 360mV.

Raw logic switching energy: ½CV² = 0.04 fJ

Ratio of total logic power to active power: ~1.4

Ratio of logic + repeater power to logic power: ~ 1.2

Long wire latency penalty factor: ~1.6

All together: effective energy per ‘useful’ logic transition: ~0.1fJ
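The breakdown above can be reproduced with a short calculation. This is a sketch that assumes the three overhead ratios combine multiplicatively, which is how "all together" is read here; the variable and function names are illustrative. The capacitances, supply voltage, and ratios are the values quoted on the slide.

```python
# Average logic load capacitance components, in fF (from the slide).
C_GATE_FF = 0.17
C_PARASITIC_FF = 0.06
C_WIRE_FF = 0.40

# Minimum supply voltage, in volts (~360 mV on the slide).
VDD_V = 0.36

# Overhead ratios quoted on the slide.
TOTAL_TO_ACTIVE = 1.4   # total logic power / active power
REPEATER_FACTOR = 1.2   # (logic + repeater power) / logic power
LATENCY_PENALTY = 1.6   # long wire latency penalty factor


def raw_switching_energy_fj(c_ff, vdd_v):
    """Return (1/2) C V^2 in fJ for C in fF and V in volts."""
    return 0.5 * c_ff * vdd_v ** 2


c_total_ff = C_GATE_FF + C_PARASITIC_FF + C_WIRE_FF
raw_fj = raw_switching_energy_fj(c_total_ff, VDD_V)

# Assumption: the overhead factors multiply.
effective_fj = raw_fj * TOTAL_TO_ACTIVE * REPEATER_FACTOR * LATENCY_PENALTY

print(f"total load capacitance : {c_total_ff:.2f} fF")
print(f"raw switching energy   : {raw_fj:.3f} fJ")
print(f"effective energy       : {effective_fj:.2f} fJ")
```

The total load is 0.63 fF, the raw energy comes to about 0.04 fJ, and the combined overhead factor of about 2.7 brings the effective energy to about 0.11 fJ, consistent with the ~0.1 fJ quoted.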


4. Open questions

Many aspects of VLSI design are tied to the importance or ‘criticality’ of a net. How can the distribution of ‘importance’ be modeled? Can this be tied to a distribution of electrical and/or logical effort?

More accurate FET width distribution

Multiple VT optimization

Wiring hierarchy optimization

How should SRAM optimization be tied to logic, if at all?

How to optimize further up the design hierarchy into architecture?


5. Summary

General limitations to scaling have been summarized.

Power and temperature rise are dominant limiters.

A set of simplified models has been developed to enable fast-turnaround comparative technology optimizations in the presence of power and temperature constraints.

The dependence of optimal technology parameters on application power requirements has been investigated.

The dependence of chip performance on potential technology enhancements has also been investigated.

Performance gains can still be obtained from improved cooling and/or from lower power, slower, more parallel processors.

Minimum loaded switching energy for conventional CMOS is ~0.1 fJ.

Open questions are still under investigation.


The End of Scaling is Optimization

[Figure: log(System Performance) versus Miniaturization.]

Stop when you get to the top.

Then, try to switch to a different mountain, e.g., some form of nanotechnology.

But, each technology has its own summit, and we need to try to make sure the new peak is actually higher than the one we are already on.

