
Post on 21-Jul-2016


Power Consumption by Integrated Circuits

Lin Zhong, ELEC518, Spring 2011

Power consumption of processing

• Dynamic power

Busy power vs. delay vs. energy

$P_{dyn} = a \cdot C \cdot V_{dd}^2 \cdot f$

$t \propto \dfrac{V_{dd}}{V_{dd} - V_T}$

Analysis and Design of Digital ICs, Hodges et al


Core 2 Duo for example

• Intel® Core™2 Duo processor
– T7800 at 2.6 GHz
– T7700 at 2.4 GHz, available on ThinkPad T61p
– 0.75-1.35 V, 35 W
• Intel® Core™2 Duo Low Voltage
– L7500 at 1.6 GHz, available on ThinkPad X61
– 0.75-1.3 V, 17 W
• Intel® Core™2 Duo Ultra Low Voltage
– U7500 at 1.06 GHz, available on Dell D430
– 0.75-0.975 V, 10 W

Switching energy

$e = \frac{1}{2} \cdot C \cdot V^2$

Switching power

$P = b \cdot C \cdot V^2 = a \cdot C \cdot V^2 \cdot f$
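As a rough illustration of the two formulas above, the sketch below evaluates the switching energy and the switching power for made-up component values. The numbers (activity factor 0.1, 1 nF of switched capacitance, 1.2 V, 2 GHz) are assumptions chosen only to make the arithmetic visible; they are not taken from the slides.

```python
def switching_energy(c_farads, v_volts):
    """Energy per switching event: e = 1/2 * C * V^2 (joules)."""
    return 0.5 * c_farads * v_volts ** 2


def switching_power(a, c_farads, v_volts, f_hz):
    """Switching power: P = a * C * V^2 * f (watts)."""
    return a * c_farads * v_volts ** 2 * f_hz


# Hypothetical operating point, for illustration only.
p_full = switching_power(a=0.1, c_farads=1e-9, v_volts=1.2, f_hz=2e9)
# Same chip and clock, half the supply voltage.
p_half_v = switching_power(a=0.1, c_farads=1e-9, v_volts=0.6, f_hz=2e9)

print(p_full, p_half_v, p_full / p_half_v)
```

Because the voltage enters squared, halving V at a fixed clock cuts the switching power to a quarter in this model.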

Higher integration

• Selling the chipset (or solution or platform)
– Intel Centrino
  • Centrino Duo includes Core 2 Duo processor, 9XX Express-series chipset, and Wi-Fi adapter
– TI TCS2600 chipset

System-on-a-chip (SoC)

• TI OMAP


SiP: Multiple-chip product (MCP)

Siemens SX66 PDA Phone

Audiovox PPC6601KIT

32MB

400MHz

Source: Intel.com

SiP: Stacked-die approach

Qualcomm 3G CDMA2000 chip

Seven power regimes, 100 clock regimes

ISSCC 2004

Moore’s Law

known

Exciting Unknown


MOSFET at nanoscale

Sunlin Chou, “Extending Moore’s Law in the Nanotechnology Era” (www.intel.com).


Given workload L and deadline T

• L measured by # of CPU cycles
• Clock speed $f \ge L/T$
• Time to finish: $t = L/f$
• Energy to finish: $P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$

Effect of lower clock speed (f)

Power consumption

$P = a \cdot C \cdot V^2 \cdot f$

Energy consumption

$E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$

Effect of lower supply voltage (V)

Power consumption

$P = a \cdot C \cdot V^2 \cdot f = k \cdot V^3 = x \cdot f^3$

Energy consumption

$E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$

Maximum clock speed

$f = b \cdot V$
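The contrast between lowering only the clock and lowering the clock together with the supply voltage can be sketched numerically. The snippet assumes the simplified model used here (E = a·C·V²·L and a maximum clock of f = b·V); the constants A, C, B and the workload L are hypothetical values picked for illustration.

```python
def energy_fixed_voltage(a, c, v, workload_cycles):
    """E = a*C*V^2*L: with V held fixed, the clock speed drops out."""
    return a * c * v ** 2 * workload_cycles


def energy_scaled_voltage(a, c, b, f, workload_cycles):
    """With f = b*V, run at the lowest voltage V = f/b that sustains f."""
    v = f / b
    return a * c * v ** 2 * workload_cycles


# Hypothetical constants: activity factor, capacitance (F), Hz per volt.
A, C, B = 0.1, 1e-9, 2e9
L = 1e9  # workload in CPU cycles

for f in (2e9, 1e9):
    fixed = energy_fixed_voltage(A, C, 1.0, L)
    scaled = energy_scaled_voltage(A, C, B, f, L)
    print(f"f={f:.1e} Hz  fixed-V energy={fixed:.3f} J  scaled-V energy={scaled:.3f} J")
```

In this model, halving the clock at a fixed voltage leaves the energy unchanged, while halving the clock and the voltage together cuts it to a quarter.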

Given workload L and deadline T: single processor

• The processor can run at any frequency (voltage)
– $f = b \cdot V$
• The processor can be completely off when work is done (zero power when idle)
• To minimize energy consumption, at which frequency should the processor run?
– $f \ge L/T$ (in order to meet the deadline)
– $E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
– f = ????

[Figure: clock frequency $f$ vs. time up to deadline $T$; $f_1 = L/T$, $f_2 = L/(T/2) = 2 f_1$]

[Figure: power $P$ vs. time up to deadline $T$; $P_1 = x \cdot f^3$, $P_2 = 2^3 P_1$]
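One way to explore the frequency question is to compute the energy to finish for a few candidate clock speeds and keep only those that meet the deadline. The sketch assumes P = x·f³, zero power once the work is done, and no shutdown overhead; the constant x, the workload, the deadline, and the candidate list are hypothetical.

```python
def energy_to_finish(x, f, workload_cycles):
    """E = P * t with P = x * f^3 and t = L / f (processor is off afterwards)."""
    return (x * f ** 3) * (workload_cycles / f)


def best_frequency(x, workload_cycles, deadline, candidates):
    """Pick the feasible candidate (f >= L/T) with the lowest energy."""
    feasible = [f for f in candidates if f >= workload_cycles / deadline]
    return min(feasible, key=lambda f: energy_to_finish(x, f, workload_cycles))


# Hypothetical numbers: x in W/Hz^3, L in cycles, T in seconds.
X, L, T = 1e-27, 1e9, 1.0
CANDIDATES = [0.5e9, 1e9, 1.5e9, 2e9]

f_best = best_frequency(X, L, T, CANDIDATES)
e_slow = energy_to_finish(X, 1e9, L)   # run at f1 = L/T for the whole period
e_fast = energy_to_finish(X, 2e9, L)   # run at f2 = 2*f1 for T/2, then off
print(f_best, e_slow, e_fast)
```

Under these assumptions, running twice as fast for half the time draws 2³ times the power and costs four times the energy, so the slowest clock that still meets the deadline wins.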

Given workload L and deadline T: M processors

• The workload can be divided without overhead: $L = L_1 + L_2 + \dots + L_M$ ($L \ge L_i \ge 0$)
• To minimize energy consumption, at which frequency should processor i run?
– $f_i = L_i/T$ and $V = u \cdot L_i$
– $E_i = a \cdot C \cdot V^2 \cdot L_i = w \cdot L_i^3$

Given workload L and deadline T: M processors

• The workload can be divided without overhead: $L = L_1 + L_2 + \dots + L_M$ ($L \ge L_i \ge 0$)
• To minimize the TOTAL energy consumption, how should the workload be allocated?
– $E = E_1 + E_2 + \dots + E_M = w \cdot L_1^3 + w \cdot L_2^3 + \dots + w \cdot L_M^3$
– $= w (L_1^3 + L_2^3 + \dots + L_M^3)$

From high school

• $[(a+b)/2]^2 \le (a^2+b^2)/2$

Quadratic mean ≥ Arithmetic mean ≥ Geometric mean ≥ Harmonic mean

From high school (Contd.)

• $[(a+b)/2]^3 \le (a^3+b^3)/2$ (for $a, b \ge 0$)
– $E = w (L_1^3 + L_2^3 + \dots + L_M^3)$ ??? $(L_1 + L_2 + \dots + L_M)^3$

From college: Convex (Concave)

By definition of “convex”


Jensen’s Inequality (finite form)

• $\phi(x)$ is convex
– $\phi(t \cdot x_1 + (1-t) \cdot x_2) \le t \cdot \phi(x_1) + (1-t) \cdot \phi(x_2)$

http://en.wikipedia.org/wiki/Jensen%27s_inequality#Proof_1_.28finite_form.29

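• $a_i = 1/n$
• $\phi(x) = x^2$ (convex)
• $\phi(x) = x^3$ (convex for $x \ge 0$)
– $E = w (L_1^3 + L_2^3 + \dots + L_M^3) = w \cdot M \cdot (L_1^3 + L_2^3 + \dots + L_M^3)/M$
– $\ge w \cdot M \cdot [(L_1 + L_2 + \dots + L_M)/M]^3 = w \cdot L^3 / M^2$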
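The lower bound above can be checked numerically by comparing an even split of the workload with an uneven one. This is only a sanity check under the cubic energy model $E_i = w \cdot L_i^3$; the constant W, the workload of 12 units, and the particular skewed split are arbitrary illustrative choices.

```python
def total_energy(w, loads):
    """Total energy of M processors: E = w * (L_1^3 + ... + L_M^3)."""
    return w * sum(load ** 3 for load in loads)


def equal_split(workload, m):
    """Give each of the m processors the same share L/M."""
    return [workload / m] * m


W, L, M = 1.0, 12.0, 4  # hypothetical constant, workload, processor count

even = total_energy(W, equal_split(L, M))        # 4 * 3^3
skewed = total_energy(W, [6.0, 3.0, 2.0, 1.0])   # same total workload of 12
bound = W * L ** 3 / M ** 2                      # Jensen lower bound w*L^3/M^2

print(even, skewed, bound)  # 108.0 252.0 108.0
```

The even split meets the bound exactly, and any other split of the same workload can only cost more in this model.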

More about Convexity

[Figure: cost vs. return curve]

Example                              Cost                Return
Workload distribution                Energy              Workload finished within T
Eating                               Price of apples     Pleasure from eating apples
Helicopter engine                    Price of engine     Engine thrust
Law of diminishing marginal returns  Cost of production  Increase in production

More about Convexity

• Greedy optimization works
• Combine simpler/cheaper components

[Figure: cost vs. return curve]

Check the assumptions

• Power consumption is zero when the processor is not active

Idle power (Static power)

[Equations: static power $P_{static}$ as exponential functions of temperature $T$ and of supply voltage $V_{dd}$]

When IC is idle but not powered off, e.g. SRAM

Leakage power

Scaling down

Scaling down (Contd.)


Thermodynamics: Gas

Quantum dynamics: Individual molecules

Uniform (central limit theorem)

High variation and likely defective

Scaling: Not that simple (Contd.)

Tunneling effect

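[Figure: clock frequency $f$ vs. time up to deadline $T$; $f_1 = L/T$, $f_2 = L/(T/2) = 2 f_1$]

[Figure: power $P$ vs. time up to deadline $T$; $P_1 = x \cdot f^3$]

[Figure: power $P$ vs. time up to deadline $T$, with static power; $P_1 = x \cdot f^3 + P_{static}$]

[Figure: power $P$ vs. time up to deadline $T$, with static power; $P_1 = x \cdot f^3 + P_{static}$, $P_2 = 2^3 x \cdot f^3 + P_{static}$]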
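Once static power is added, running as slowly as the deadline allows is no longer automatically the cheapest choice. The sketch below is one possible way to work this out, not a result stated in the slides: it assumes $P = x \cdot f^3 + P_{static}$ while busy, zero power after the work is done, and no shutdown overhead. Under those assumptions $E(f) = (x \cdot f^3 + P_{static}) \cdot L/f$, and setting $dE/df = 0$ gives $f^* = (P_{static}/(2x))^{1/3}$. All numeric values are hypothetical.

```python
def energy_with_static(x, p_static, f, workload_cycles):
    """E = (x*f^3 + P_static) * L/f, with zero power once the work is done."""
    return (x * f ** 3 + p_static) * (workload_cycles / f)


def optimal_frequency(x, p_static, workload_cycles, deadline):
    """Minimize E(f) subject to f >= L/T.

    E(f) is convex for f > 0, so the constrained optimum is the larger of
    the unconstrained minimizer and the deadline-imposed minimum clock.
    """
    f_unconstrained = (p_static / (2.0 * x)) ** (1.0 / 3.0)
    return max(f_unconstrained, workload_cycles / deadline)


# Hypothetical numbers: x in W/Hz^3, P_static in W, L in cycles.
X, P_STATIC, L = 1e-27, 2.0, 1e9

f_loose = optimal_frequency(X, P_STATIC, L, deadline=4.0)  # deadline is slack
f_tight = optimal_frequency(X, P_STATIC, L, deadline=0.5)  # deadline dominates
print(f_loose, f_tight)
print(energy_with_static(X, P_STATIC, f_loose, L),
      energy_with_static(X, P_STATIC, L / 4.0, L))
```

With a slack deadline the best clock in this model sits above L/T, because finishing sooner shortens the time during which static power is paid. A nonzero shutdown overhead would shift the trade-off and is left out here.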

Why is static power important?

ITRS, 2009

Pentium II (Klamath) and III (Coppermine)

7.5M transistors, 28M transistors

Core 2 Duo (Conroe)

64KB L1 cache, 4MB L2 cache, 291M Transistors


Core 1

Core 2

Solutions to “never-enough” challenge

234M transistors

24M go to L2 cache

8 SPE, each 20.9M transistors (167M transistors)

Each has 4 64KB SRAM (12M transistors)

SRAM takes 122M transistors (>50%)

Multiple power/clock domains

TI OMAP 2 architecture, ISSCC 2005

Multimedia phone: NTT DoCoMo 3G FOMA 902i to be released with OMAP2420


Given workload L and deadline T: single processor

• One processor can run at any frequency (voltage)
– $f = b \cdot V$
• The processor can be completely off when work is done (zero power when idle)
– Given $P_{static}$
– Given energy overhead of shutting down the processor ($E_{overhead}$)
• To minimize energy consumption, at which frequency should the processor run?

[Figure: power $P$ vs. time up to deadline $T$, with static power; $P_1 = x \cdot f^3 + P_{static}$, $P_2 = 2^3 x \cdot f^3 + P_{static}$]

Why is there overhead to power off circuit?

Clock generator

• Resonant circuit + amplifier
• Resonant circuit (Oscillator)
– Crystal oscillator ($>2 \times 10^9$/yr)
  • ~10 kHz to ~10 MHz
  • Quartz, ceramics (low cost, low accuracy), surface acoustic wave (SAW) quartz crystal (expensive, accurate)
  • Real-time clocks
    – 32.768 kHz ($2^{15}$ Hz), 4.194304 MHz ($2^{22}$ Hz)
  • Application-specific
    – 4.9152 MHz (4 × 1.2288 MHz, CDMA baseband frequency) ……
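The real-time clock frequencies listed above are powers of two in hertz, which means a chain of divide-by-two stages can bring them down to exactly 1 Hz. The helper below is a small illustrative sketch (the function name is made up) that counts how many such stages are needed.

```python
def halvings_to_one_hz(crystal_hz):
    """Count divide-by-two stages needed to reach exactly 1 Hz.

    Returns None when the frequency is not a power of two in Hz.
    """
    stages = 0
    while crystal_hz > 1 and crystal_hz % 2 == 0:
        crystal_hz //= 2
        stages += 1
    return stages if crystal_hz == 1 else None


print(halvings_to_one_hz(32_768))      # 15
print(halvings_to_one_hz(4_194_304))   # 22
print(halvings_to_one_hz(4_915_200))   # None (application-specific crystal)
```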


ResA

Oscillator (Contd.)

• LC/RLC circuit
• Ring oscillator
– Application other than oscillator?
• Voltage-controlled oscillator (VCO)
– Varicap: variable capacitance diode (tuning diode)
– Phase-locked loop for high-speed clock (next slide)
– Frequency scaling of IC for energy saving

Phase-locked loop (PLL)

• High-speed clock from a master oscillator
• Digital PLL
• Clock generation, recovery, synchronization
– Digital computing, RF communication

[Figure: PLL block diagram; master oscillator, phase-frequency detector, control voltage, VCO, frequency divider (N)]
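The feedback idea behind the PLL blocks named above can be sketched with a toy model. With a divide-by-N block in the feedback path, the loop settles where the divided VCO output equals the master oscillator frequency, so the output is N times the reference. The code below is a deliberately simplified frequency-only model, assumed for illustration: a real PLL compares phase and uses a loop filter, and the gain, step count, and frequencies here are hypothetical.

```python
def locked_output_hz(f_ref_hz, n):
    """At lock, f_vco / N equals the reference, so f_vco = N * f_ref."""
    return n * f_ref_hz


def settle(f_ref_hz, n, gain, steps, f_vco_hz=0.0):
    """Toy loop: nudge the VCO in proportion to the frequency error.

    The error is measured after the divider, as the detector would see it.
    This is not a phase-accurate simulation.
    """
    for _ in range(steps):
        error_hz = f_ref_hz - f_vco_hz / n
        f_vco_hz += gain * error_hz
    return f_vco_hz


# Hypothetical setup: 10 MHz master oscillator, divide-by-100 feedback.
F_REF, N = 10e6, 100
f_final = settle(F_REF, N, gain=50.0, steps=60)
print(f_final, locked_output_hz(F_REF, N))
```

With a gain of N/2 the error halves on every step, so the toy loop converges on the 1 GHz lock frequency from a cold start.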

Given workload L and deadline T: single processor

• The processor can run at any frequency (voltage)
– $f = b \cdot V$
• The processor can be completely off when work is done (zero power when idle)
• To minimize energy consumption, at which frequency should the processor run?
– $f \ge L/T$ (in order to meet the deadline)
– $E = P \cdot t = a \cdot C \cdot V^2 \cdot f \cdot t = a \cdot C \cdot V^2 \cdot L$
– f = ????

Threshold voltage


Vdd scales slowly and Vth scales even more slowly

• Vth is limited by the thermal voltage
• Vdd needs to stay considerably higher than Vth to curb leakage current
• This ends up breaking the scaling rules
– low channel mobility

Plummer and Griffin, 2001 (Data from ITRS/NTRS)

Check the assumptions (Contd.)

• The workload can be divided without overhead: $L = L_1 + L_2 + \dots + L_M$ ($L \ge L_i \ge 0$)

• Communication cost between processors!!!

Quadrotor vs. Helicopter


De Bothezat Quadrotor, 1923.

Quadrotor vs. Helicopter

A.R. Drone, 2010

Wire power consumption


Wire power consumption

Inter-processor communication