1

The implications of energetic and thermal constraints on current and future processors

Pierre Michaud

June 2008

2

Outline

1. Why temperature, power and energy must be limited

2. Some basics

3. Why power consumption became a problem

4. How the power problem has been tackled

5. The temperature problem

6. What future processors may look like

3

Why temperature, power and energy must be limited

4

Processing consumes energy

• When a processor executes a program it consumes some energy. This energy is transformed into internal energy.

• Temperature is a measure of the average kinetic energy associated with the disordered microscopic motion of particles

• A processor consuming some energy increases its own temperature and that of its environment

5

Temperature must be limited

• We don’t want the processor to burn

• A 10 ºC temperature increase halves the processor lifetime
– Several aging phenomena are at work that are exponential with temperature
– Example: electromigration

• Circuits get slower when temperature is higher

• ➔ Maximum temperature between 80 ºC and 100 ºC

6

Power consumption must be limited

• Power = energy per unit of time

• Electric power = voltage X current

• Power is limited by the power supply and by the maximum current

• A high sustained power generates a high temperature ➔ limiting power is a way to limit temperature

7

Energy consumption must be limited

• Some processors are battery-powered
– Laptop computers, hand-held devices
– A battery stores a finite amount of energy
– If you spend less energy to do a given work, you are able to do more work before you need to recharge the battery

• Energy costs money
– Will cost more and more (Google cares !)
– Also an environmental cost

• Consuming less energy decreases power and temperature

8

Example: data center

• For each watt dissipated in the room, one extra watt must be consumed for the air conditioning

• Assume machines dissipate 100 kW, of which 30% (= 30 kW) comes from CPUs

• + 100 kW for cooling ➔ 200 kW

• If we halve the power consumed by each CPU, we can …
– Decrease the electric bill (15 + 15 = 30 kW saved)
– Or put more CPUs in the room
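The arithmetic of this example can be sketched in a few lines of Python. The function name `datacenter_power_kw` and its `cooling_per_watt` parameter are hypothetical, introduced only for illustration; the one-watt-of-cooling-per-watt-dissipated rule is the assumption stated on the slide.

```python
def datacenter_power_kw(machine_power_kw, cooling_per_watt=1.0):
    """Total electrical power: machines plus the air conditioning they require."""
    return machine_power_kw * (1.0 + cooling_per_watt)

machines_kw = 100.0              # power dissipated by the machines
cpu_kw = 0.30 * machines_kw      # share of that power coming from CPUs

before_kw = datacenter_power_kw(machines_kw)
# Halving CPU power removes half of the CPU share from the machine power
after_kw = datacenter_power_kw(machines_kw - cpu_kw / 2)
saved_kw = before_kw - after_kw

print(f"before: {before_kw:.0f} kW, after: {after_kw:.0f} kW, saved: {saved_kw:.0f} kW")
```

Half of the saving comes from the CPUs themselves and half from the cooling that no longer has to remove their heat.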

9

Some basics

10

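MOSFET (Metal-Oxide Semiconductor Field-Effect Transistor)

[Figure: MOSFET cross-section showing source, drain, gate, substrate and dielectric, with channel length $L$, channel width $W$ and dielectric thickness $t_{ox}$]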

11

Switching energy

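[Figure: a capacitance $C$ charged from 0 to $V_{dd}$ by a current $i$; $v$ is the voltage across the capacitance]

Joule heating:

$$\int_0^{\infty} (V_{dd} - v)\, i\, dt = \int_0^{V_{dd}} C\,(V_{dd} - v)\, dv = \frac{1}{2} C V_{dd}^2$$

Energy is consumed when a gate output voltage switches from low to high or from high to low

The switching energy depends on capacitance and supply voltage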

12

Dynamic power

• Dynamic power = switching energy consumed per second

$$P_d = C_e \times F \times V_{dd}^2$$

where $C_e$ is the equivalent capacitance and $F$ the clock frequency

• Equivalent capacitance takes into account contributions from all switching gates on the chip
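As a minimal sketch of the dynamic power formula, the function below evaluates $C_e \times F \times V_{dd}^2$. The numeric values (20 nF, 3 GHz, 1.2 V) are hypothetical and chosen only to give a plausible order of magnitude; they do not come from the slides.

```python
def dynamic_power(c_e, freq, vdd):
    """Dynamic power P_d = C_e * F * Vdd^2 (watts when inputs are in F, Hz, V)."""
    return c_e * freq * vdd ** 2

# Hypothetical chip: 20 nF equivalent capacitance, 3 GHz clock, 1.2 V supply
p_nominal = dynamic_power(c_e=20e-9, freq=3e9, vdd=1.2)

# The quadratic dependence on voltage: halving Vdd divides P_d by four
p_half_voltage = dynamic_power(c_e=20e-9, freq=3e9, vdd=0.6)

print(f"nominal: {p_nominal:.1f} W, at half voltage: {p_half_voltage:.1f} W")
```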

13

Gate delay

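[Figure: transistor approximated as a current source charging a capacitance $C$ from 0 to $V_{dd}$]

Approximate transistor as a current source:

$$t \approx \frac{C \times V_{dd}}{I_{dsat}}$$

Shockley model:

$$I_{dsat} \approx \frac{\mu\, \varepsilon_{ox}}{t_{ox}} \times \frac{W}{L} \times \frac{(V_{dd} - V_t)^2}{2}$$

where $t_{ox}$ is the gate dielectric thickness, $L$ the channel length, $W$ the channel width, $V_t$ the threshold voltage and $\varepsilon_{ox}$ the gate dielectric permittivity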
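The two formulas of this slide can be combined into a small calculator. This is a sketch under stated assumptions: the function names are hypothetical, $\mu$ is taken to be the carrier mobility, and the device values in the usage lines are illustrative orders of magnitude rather than data from the slides.

```python
def saturation_current(mu, eps_ox, t_ox, w, l, vdd, vt):
    """Shockley model: Idsat = (mu * eps_ox / t_ox) * (W / L) * (Vdd - Vt)^2 / 2."""
    return (mu * eps_ox / t_ox) * (w / l) * (vdd - vt) ** 2 / 2

def gate_delay(c, vdd, idsat):
    """Delay of a gate seen as a current source charging C up to Vdd."""
    return c * vdd / idsat

# Illustrative (assumed) values in SI units
idsat_example = saturation_current(
    mu=0.04,                 # carrier mobility, m^2/(V s)
    eps_ox=3.9 * 8.854e-12,  # permittivity of a SiO2-like dielectric, F/m
    t_ox=2e-9, w=200e-9, l=100e-9,
    vdd=1.2, vt=0.3,
)
delay_example = gate_delay(c=1e-15, vdd=1.2, idsat=idsat_example)
print(f"Idsat = {idsat_example:.2e} A, delay = {delay_example:.2e} s")
```

Raising the threshold voltage or the load capacitance lengthens the delay, which is the trade-off the later slides on leakage and voltage scaling rely on.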

14

Why power consumption became a problem

15

Moore’s law

• The number of transistors on a processor chip doubles every 2 years

16

Classical scaling rules

• Dennard, Gaensslen, Yu, Rideout, Bassous, Leblanc, “Design of ion-implanted MOSFET’s with very small physical dimensions”, IEEE Journal of Solid-State Circuits, Oct. 1974
– On each technology generation, divide all transistor and wire dimensions by $\sqrt{2}$
– Divide all voltages by $\sqrt{2}$

• Dimensions scaling ➔ all parasitic capacitances (transistors & wires) are divided by $\sqrt{2}$

$$\text{parallel plate capacitance} = \frac{\text{dielectric permittivity} \times \text{plate area}}{\text{dielectric thickness}}$$

17

Classical scaling: impact on delay

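$$t \propto \frac{2}{\mu} \times \left(\frac{L}{W}\right) \times \left(\frac{t_{ox}}{\varepsilon_{ox}}\right) \times C \times \frac{V_{dd}}{(V_{dd} - V_t)^2}$$

$L$, $W$, $t_{ox}$, $C$ and the voltages are each multiplied by $\frac{1}{\sqrt{2}}$

Under classical scaling, clock frequency (inverse of delay) can be multiplied by $\sqrt{2} \approx 1.4$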
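As a worked check of this factor, the delay expression can be scaled term by term. The sketch assumes that the material constants $\mu$ and $\varepsilon_{ox}$ do not change between generations; primed symbols denote the next generation.

```latex
\frac{t'}{t}
  = \underbrace{\frac{L/\sqrt{2}}{W/\sqrt{2}} \cdot \frac{W}{L}}_{L/W:\ \times 1}
    \times \underbrace{\frac{1}{\sqrt{2}}}_{t_{ox}}
    \times \underbrace{\frac{1}{\sqrt{2}}}_{C}
    \times \underbrace{\frac{V_{dd}/\sqrt{2}}{(V_{dd} - V_t)^2 / 2}
           \cdot \frac{(V_{dd} - V_t)^2}{V_{dd}}}_{\text{voltage term}:\ \times \sqrt{2}}
  = \frac{1}{\sqrt{2}}
\qquad\Longrightarrow\qquad
\frac{F'}{F} = \frac{t}{t'} = \sqrt{2} \approx 1.4
```

The voltage term works against the speed-up because it has one voltage in the numerator and two in the denominator.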

18

Classical scaling: impact on power

$$P_d = C_e \times F \times V_{dd}^2 \propto \left(\frac{C_e}{C}\right) \times \left(\frac{W}{L}\right) \times \left(\frac{\varepsilon_{ox}}{t_{ox}}\right) \times V_{dd}\,(V_{dd} - V_t)^2$$

[Annotations on the slide: per-generation scale factor of each term]

• Before 1990, supply voltage was kept constant (5 V)
• In the 1990’s, power became a concern and voltage was scaled
• Since 2000, voltage keeps decreasing, but more slowly
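The effect of voltage scaling on dynamic power can be illustrated with a short sketch based on $P_d = C_e \times F \times V_{dd}^2$. It assumes that a constant chip area holds twice as many transistors per generation, so that $C_e$ is the transistor count times the per-transistor capacitance; that assumption and the function name `power_ratio` are not from the slides. The constant-voltage case keeps the frequency gain at $\sqrt{2}$ purely for illustration.

```python
import math

def power_ratio(count_scale, cap_scale, freq_scale, volt_scale):
    """Per-generation ratio of dynamic power P_d = C_e * F * Vdd^2,
    with C_e scaling as (transistor count) * (capacitance per transistor)."""
    return count_scale * cap_scale * freq_scale * volt_scale ** 2

s = 1 / math.sqrt(2)  # classical scaling factor for dimensions and voltages

# Classical scaling: capacitance / sqrt(2), frequency * sqrt(2), voltage / sqrt(2)
classical = power_ratio(count_scale=2, cap_scale=s, freq_scale=1 / s, volt_scale=s)

# Same scaling but with the supply voltage held constant (assumed simplification)
constant_voltage = power_ratio(count_scale=2, cap_scale=s, freq_scale=1 / s, volt_scale=1.0)

print(f"classical: x{classical:.2f}, constant voltage: x{constant_voltage:.2f}")
```

Under these assumptions the chip power stays constant only when the voltage is scaled along with the dimensions.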

19

Leakage currents

• Subthreshold leakage current between drain and source when the gate-to-source voltage is below the threshold voltage

• Gate leakage current due to tunneling of electrons through the dielectric layer

• Classical scaling requires that the threshold voltage be decreased when the supply voltage is decreased
– But this increases subthreshold leakage
– ➔ Difficult to scale supply voltage further

• Classical scaling requires that the gate dielectric thickness be decreased
– But this increases gate leakage
– ➔ Use gate dielectric with higher permittivity (high-K)

[Figure: MOSFET cross-section with source, drain, gate and dielectric]

20

Static power

• Leakage currents ➔ static power consumption Ps

$$\text{total power} = P_d + P_s = C_e \times F \times V_{dd}^2 + V_{dd} \times I$$

where $I$ is the total leakage current

• We spend energy doing no work ☹

• Subthreshold leakage increases with temperature !

21

Microarchitects are guilty !

• Extra transistors have been used to increase the processor performance, at the cost of more complexity and less energy efficiency
– Superscalar, out-of-order, 64-bit operations, floating-point, SIMD, multicore, etc.

• Until year 2000, clock frequency increased not only because of faster transistors but also because of pipelining

$$P_d = C_e \times F \times V_{dd}^2$$

[Annotations on the formula: constant chip area ➔ $C_e$ grows; pipelining ➔ $F$ grows]

22

Core complexity across generations

R. Kumar, D. Tullsen, N. Jouppi, P. Ranganathan, “Heterogeneous Chip Multiprocessors”, IEEE Computer, Nov. 2005

Alpha 21064 (EV4) to Alpha 21464 (EV8)

23

What happened ?

• In 2000, we thought we would have processors in 2008 clocked at 10 GHz ➔ this did not happen

• In 2004, the Intel Tejas microprocessor was cancelled ➔ too much power, too much heat
– It was very difficult to continue pushing the complexity of superscalar processors and the clock frequency together

• But Moore’s law continues, so what do we do with extra transistors ?
• ➔ big caches & multiple “simple” cores on the chip

• Power consumption has become a first-class constraint

24

How the power problem has been tackled

25

Energy-efficiency needed everywhere

• Technology
– high-threshold-voltage transistors where speed is not critical (e.g., caches)
– high-K gate dielectric
– …

• Circuit
– Sometimes, sacrificing speed a little permits saving significant energy
– fine-grained clock gating ➔ don’t clock a flip-flop unless it has valid data at its input
– …

• Microarchitecture
– Find a good balance between complexity and performance
– Disconnect parts that are not used
– Example: if a program does not perform floating-point computations, turn the FP units off

26

Voltage / frequency

• A circuit can be clocked at a frequency proportional to the supply voltage
– As long as the supply voltage is not too close to the transistor threshold voltage

• Example
– Assume 75% of power is dynamic and 25% static
– Multiply voltage and frequency simultaneously by 0.8

$$P_d + P_s = C_e \times F \times V_{dd}^2 + V_{dd} \times I$$

$F$ is multiplied by 0.8, $V_{dd}^2$ by $(0.8)^2$, and the $V_{dd}$ of the static term by 0.8:

$$0.75 \times 0.8^3 + 0.25 \times 0.8 \approx 0.6$$

➔ we get a 40% decrease of power if we decrease frequency by 20%
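The example can be reproduced numerically. This is a minimal sketch: the function name `relative_power` is hypothetical, and, as in the formula on the slide, the leakage current $I$ is assumed not to change when voltage and frequency are scaled.

```python
def relative_power(scale, dynamic_fraction=0.75):
    """Power relative to nominal when Vdd and F are both multiplied by `scale`.

    Dynamic power scales as F * Vdd^2, i.e. scale^3.
    Static power scales as Vdd (leakage current assumed constant), i.e. scale.
    """
    static_fraction = 1.0 - dynamic_fraction
    return dynamic_fraction * scale ** 3 + static_fraction * scale

ratio_at_08 = relative_power(0.8)
print(f"power at 0.8x voltage and frequency: {ratio_at_08:.3f} of nominal")
```

The exact value is 0.584, which the slide rounds to 0.6. The larger the dynamic share, the closer the saving gets to the cubic law.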

27

Parallelism is power efficient

[Figure: 1 processor at frequency F and voltage V, compared with 2 processors at frequency F/2 and voltage V/2]

• Parallelism makes it possible to get the same performance while consuming less power

• A multicore processor permits obtaining more performance with the same power consumption
– provided the application is parallel
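A short sketch makes the comparison concrete. It assumes an ideally parallel application (n cores at F/n deliver the throughput of one core at F), a voltage that can be lowered in proportion to frequency, and it ignores static power; the function name is hypothetical.

```python
def relative_dynamic_power(n_cores):
    """Dynamic power of n cores at F/n and V/n, relative to 1 core at F and V.

    Each core dissipates C_e * (F/n) * (V/n)^2, and there are n of them.
    """
    freq = 1.0 / n_cores
    volt = 1.0 / n_cores
    return n_cores * freq * volt ** 2

two_core_power = relative_dynamic_power(2)
print(f"2 cores at F/2, V/2: {two_core_power:.2f} of the single-core power")
```

In practice the voltage cannot follow the frequency all the way down, since it must stay well above the threshold voltage, so the real gain is smaller than this idealized figure.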

28

The temperature problem

It is related to the power problem, but is not strictly equivalent to it

29

Processor heat sink

[Figure: heat sink put on top of the processor chip; an air-blowing fan takes heat away from the chip]

30

Cooling a laptop

[Figure: laptop cooling system with heat pipe, heat sink and CPU]

31

Fourier’s law of heat conduction

$$\vec{q} = -K \times \vec{\nabla} T$$

where $\vec{q}$ is the heat flux (W/m²), $K$ the thermal conductivity (W/mK) and $\vec{\nabla} T$ the temperature gradient (K/m)

Heat flows from high-temperature regions to low-temperature ones at a rate proportional to the temperature difference

Thermal conductivity:
silicon     100-150 W/mK
aluminum    240 W/mK
copper      400 W/mK

32

Thermal resistance

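[Figure: bar of length $L$ and section $S$ with a thermally insulated side; temperature $T_1$ at one end and $T_2$ at the other]

power $P = Q \times S$

Fourier's law: $Q = \frac{P}{S} = K \frac{T_1 - T_2}{L}$

Thermal resistance: $R = \frac{T_1 - T_2}{P} = \frac{L}{K \times S}$ (in kelvin per watt)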
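The conduction formula translates directly into code. In this sketch the function name is hypothetical, and the usage line takes the interface-material figures that appear in the example slide of this deck (50 µm thick, 3.33 W/mK, 150 mm² section).

```python
def conduction_resistance(length_m, conductivity_w_per_mk, section_m2):
    """Thermal resistance R = L / (K * S) of a uniform slab, in kelvin per watt."""
    return length_m / (conductivity_w_per_mk * section_m2)

# Interface material: 50 um thick, 3.33 W/mK, 150 mm^2 (1 mm^2 = 1e-6 m^2)
r_interface = conduction_resistance(50e-6, 3.33, 150e-6)
print(f"R = {r_interface:.3f} K/W")
```

A thin layer can still dominate the heat path when its conductivity is low, which is the case for the interface material compared with silicon.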

33

Convection cooling

Newton’s law of cooling

[Figure: solid at temperature $T$ in contact with an ambient fluid in motion at temperature $T_0$]

$$Q = H\,(T - T_0)$$

where $Q$ is the heat flux (W/m²) and $H$ the heat transfer coefficient (W/m²K)

Forced convection: the heat transfer coefficient increases with the fluid velocity

Thermal resistance: $R = \frac{T - T_0}{P} = \frac{1}{H \times S}$ (in kelvin per watt), where $S$ is the area in contact with the fluid

34

Example

[Figure: chip stack along the primary heat path, from the circuit up to the heat sink]

– Transistors & wires: 5 µm
– Silicon die: 500 µm, $R_{si}$ = 0.02 K/W
– Interface material: 50 µm, 3.33 W/mK, $R_{im}$ = 0.1 K/W
– Heat sink: $R_{hs}$ = 0.3 K/W
– Chip area: 150 mm²

$$T_{circuit} - T_{air} = P \times (R_{si} + R_{im} + R_{hs})$$

Each watt dissipated contributes a 0.42 ºC temperature increase
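The series-resistance model of this example can be sketched as follows. The three resistances are those of the slide; the 100 W power and the 85 ºC / 40 ºC temperatures in the usage lines are hypothetical values picked for illustration, as are the function names.

```python
R_SI, R_IM, R_HS = 0.02, 0.1, 0.3  # K/W: silicon die, interface material, heat sink

def temperature_rise(power_w, resistances=(R_SI, R_IM, R_HS)):
    """Circuit temperature above ambient air for resistances in series."""
    return power_w * sum(resistances)

def max_power(t_max_c, t_air_c, resistances=(R_SI, R_IM, R_HS)):
    """Largest sustained power that keeps the circuit at or below t_max_c."""
    return (t_max_c - t_air_c) / sum(resistances)

rise_100w = temperature_rise(100.0)   # hypothetical 100 W processor
budget = max_power(85.0, 40.0)        # hypothetical 85 C limit, 40 C air
print(f"rise at 100 W: {rise_100w:.0f} K, power budget: {budget:.0f} W")
```

The heat sink term is the largest of the three, so better cooling raises the power budget more than a thinner die would.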

35

Temperature is not uniform

J.D. Warnock et al., “The circuit and physical design of the POWER4 microprocessor”, IBM Journal of Research & Development, Jan. 2002.

36

Point source

[Figure: point source dissipating 1 watt; temperature (relative to ambient) as a function of the distance from the source]

For multiple sources, add the contributions from each source

37

Impact of miniaturization on temperature

[Figure: chip whose linear dimensions are multiplied by $\frac{1}{\sqrt{2}}$]

If power remains constant, temperature increases

38

Power must be decreased !

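[Figure: a chip dissipating power $P$ is shrunk by $\frac{1}{\sqrt{2}}$ in each dimension into a chip dissipating power $P'$]

$P' = P$ ➔ temperature increases

$P' = \frac{P}{2}$ ➔ same power density (W/m²) ➔ temperature decreases

$P' = \frac{P}{\sqrt{2}}$ ➔ temperature roughly the same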

39

From single to dual-core

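[Figure: after miniaturization the core occupies half of the chip; use the freed area for a second core]

$$P_d = C_e \times F \times V^2$$

$$P'_d = \frac{C_e}{\sqrt{2}} \times F' \times V'^2$$

same total power ➔ $P'_d = \frac{P_d}{2} \Rightarrow F' V'^2 = \frac{F V^2}{\sqrt{2}}$

If we want $F' > F$, we must have $V' < \frac{V}{\sqrt[4]{2}} \approx 0.84 \times V$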
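The voltage bound can be checked numerically. This sketch assumes the per-core model used on the slide, $P'_d = \frac{C_e}{\sqrt{2}} F' V'^2$; the function name and its two parameters are hypothetical. `power_budget_ratio` is the allowed per-core power relative to the original core: 1/2 when two cores share the original power, and $1/\sqrt{2}$ when a single core is limited by temperature only.

```python
import math

def max_voltage_ratio(freq_ratio, power_budget_ratio):
    """Largest V'/V allowed for a shrunk core.

    From (C_e / sqrt(2)) * F' * V'^2 = budget * C_e * F * V^2:
    (V'/V)^2 = budget * sqrt(2) / (F'/F).
    """
    return math.sqrt(power_budget_ratio * math.sqrt(2) / freq_ratio)

# Two cores sharing the original power, frequency unchanged
dual_core_bound = max_voltage_ratio(freq_ratio=1.0, power_budget_ratio=0.5)

# A single shrunk core limited by temperature (budget P / sqrt(2))
single_core_bound = max_voltage_ratio(freq_ratio=1.0, power_budget_ratio=1 / math.sqrt(2))

print(f"dual core: V'/V = {dual_core_bound:.2f}, single core: V'/V = {single_core_bound:.2f}")
```

Any frequency gain beyond F lowers the admissible voltage further, since `freq_ratio` sits in the denominator.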

40

“Dual-core” with a single valid core

• Yield issue: only a fraction of the chips on a wafer will eventually be sold
– Other chips have defects

• Valid chips have either 1 or 2 valid cores

• Chips with a single valid core can use a higher voltage and frequency
– (ignoring commercial considerations …)

We are limited by temperature ➔ $P'_d = \frac{P_d}{\sqrt{2}} \Rightarrow F' V'^2 = F V^2$

$\sqrt{2}$ times higher than the chip with 2 valid cores

41

Dynamic voltage / frequency scaling

• Processors can vary voltage and frequency dynamically (DVFS)
– Keep frequency proportional to voltage: $V = \alpha F$

• The operating system sets frequency and voltage depending on the situation
– Thermal sensor indicates that temperature is too high ➔ decrease V & F
– System activity is low ➔ decrease V & F to save energy

• When a single core is used, put the inactive core in low power mode and increase V & F of the active core to boost performance
– Intel Penryn processor ➔ 10 % frequency boost when 2nd core inactive
– Intel “Dynamic Acceleration Technology”

42

DVFS in multicores

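2N cores active ➔ $F_{2N}, V_{2N}$

N cores active ➔ $F_N, V_N$

$$\frac{F_N V_N^2}{F_{2N} V_{2N}^2} = \sqrt{2} \;\Rightarrow\; \frac{\alpha^2 F_N^3}{\alpha^2 F_{2N}^3} = \sqrt{2} \;\Rightarrow\; \frac{F_N}{F_{2N}} = 2^{1/6}$$

$\frac{F_1}{F_2} = 2^{1/6} \approx 1.12$ (Intel Penryn)

$\frac{F_1}{F_4} = 2^{1/3} \approx 1.26$ (Intel Nehalem ?)

$\frac{F_1}{F_8} = 2^{1/2} \approx 1.41$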
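The ratios on this slide follow from applying the halving rule repeatedly, which can be sketched as below. The function name and parameters are hypothetical. The default gain of $\sqrt{2}$ per halving is the temperature-limited per-core power ratio used on the slide, and $V = \alpha F$ makes dynamic power proportional to $F^3$.

```python
import math

def frequency_boost(total_cores, active_cores, power_gain_per_halving=math.sqrt(2)):
    """Frequency with `active_cores` running, relative to all cores running.

    Each halving of the active core count multiplies the per-core power budget
    by `power_gain_per_halving`; with V = alpha * F, power scales as F^3,
    so the frequency gains the cube root of that factor per halving.
    """
    halvings = math.log2(total_cores / active_cores)
    return power_gain_per_halving ** (halvings / 3.0)

for cores in (2, 4, 8):
    print(f"{cores}-core chip, 1 core active: x{frequency_boost(cores, 1):.2f}")
```

Passing a different `power_gain_per_halving` shows how a larger per-core budget would change the attainable boost.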

43

Activity migration

• When using a single core at a time, migrating the execution periodically to a different core decreases temperature
– Spreads the same heat on a larger area

52

DVFS in multicores + activity migration

2N cores active ➔ $F_{2N}, V_{2N}$

N cores active ➔ $F_N, V_N$

$$\frac{F_N V_N^2}{F_{2N} V_{2N}^2} = 2 \;\Rightarrow\; \frac{\alpha^2 F_N^3}{\alpha^2 F_{2N}^3} = 2 \;\Rightarrow\; \frac{F_N}{F_{2N}} = 2^{1/3}$$

$\frac{F_1}{F_2} = 2^{1/3} \approx 1.26$

$\frac{F_1}{F_4} = 2^{2/3} \approx 1.59$

$\frac{F_1}{F_8} = 2$

DVFS is more efficient when it is combined with activity migration ➔ potential speed-up of 2 for sequential execution on an 8-core processor
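As a worked check of the 8-core figure, the halving rule can be chained three times. The sketch assumes, as on this slide, that with activity migration the per-core power budget doubles each time the number of active cores is halved, and that $V = \alpha F$ so that dynamic power scales as $F^3$.

```latex
\frac{F_1}{F_8}
  = \frac{F_1}{F_2} \times \frac{F_2}{F_4} \times \frac{F_4}{F_8}
  = 2^{1/3} \times 2^{1/3} \times 2^{1/3}
  = 2
\qquad\text{whereas without migration}\qquad
\frac{F_1}{F_8} = \left(2^{1/6}\right)^3 = \sqrt{2} \approx 1.41
```

The difference comes entirely from the per-core power ratio, 2 with migration against $\sqrt{2}$ without.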

53

The future ?

54

Let’s assume this scenario

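$$\text{frequency } F \propto \left(\frac{1}{C}\right) \left(\frac{W}{L}\right) \left(\frac{\varepsilon_{ox}}{t_{ox}}\right) \frac{(V_{dd} - V_t)^2}{V_{dd}}$$

$$\text{dynamic core power } P_d \propto \left(\frac{C_e}{C}\right) \left(\frac{W}{L}\right) \left(\frac{\varepsilon_{ox}}{t_{ox}}\right) (V_{dd} - V_t)^2\, V_{dd}$$

[Annotations on the slide: per-generation scale factors on some terms, and one quantity marked “constant: 0.1 V or 0.2 V”]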

55

My guess (or wish) for 2010-2020

[Figure: chip floorplan with a cache, big cores (high frequency, low $V_t$) and small cores (low frequency, high $V_t$)]

• Constant chip area

• Several big cores for high sequential performance
– Vdd x 0.9 for constant core power
– Frequency x 1.7 on each generation
– Use a single big core at a time
– Migrate periodically for temperature

• Many small cores for high parallel performance
– Vdd x 0.8 for halving core power
– Frequency increases slowly
– Parallel performance doubles on each generation

56

Conclusion

• Sequential performance must increase, it is a necessity
– Some applications have little parallelism
– Amdahl’s law
– legacy code
– software productivity
– Efficient activity migration may be the only long-term solution

• Peak parallel performance is likely to increase faster than sequential performance

57

Questions ?
