
Innovative Power Control for Ultra Low-Power and High-Performance System LSIs

Hiroshi Nakamura (Univ. of Tokyo), Hideharu Amano (Keio Univ.), Masaaki Kondo (Univ. of Electro-Communications), Mitaro Namiki (Tokyo Univ. of Agriculture and Tech.), Kimiyoshi Usami (Shibaura Inst. of Tech.)

JST-CREST ULP Workshop (H.Nakamura)


Objective and Strategy

Objective: drastic power reduction of high-performance system LSIs

Strategy: innovative power control through tight Co-Optimization / Co-Design of system software, architecture, and circuit design

Principle:
- Performance: limited by a bottleneck
- Power: the sum over the whole system
- Low power and slow operation for unhurried / idle parts

[Figure: Co-Optimization / Co-Design spanning System Software, Compiler, Architecture, and Circuit Technology]
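The principle that performance is set by a bottleneck while power is summed over the whole system can be illustrated with a small model. This is a minimal sketch under assumptions that are not from the slides: the parts run in parallel on fixed amounts of work, completion time is set by the slowest part, and each part's power grows as the cube of its speed (the common DVFS approximation P ∝ Vdd²·f with Vdd roughly proportional to f). All names and numbers are hypothetical.

```python
# Hypothetical model: part i handles work[i] units at speed[i] units per second.
ALPHA = 3.0  # assumed exponent: P ~ Vdd^2 * f with Vdd roughly proportional to f

def completion_time(work, speed):
    """Performance is limited by the bottleneck: the part that finishes last."""
    return max(w / s for w, s in zip(work, speed))

def total_power(speed, alpha=ALPHA):
    """Power is summed over all parts, whether they are the bottleneck or not."""
    return sum(s ** alpha for s in speed)

def slow_down_non_bottleneck(work, speed):
    """Run every part just fast enough to finish together with the bottleneck."""
    t = completion_time(work, speed)
    return [w / t for w in work]

work = [4.0, 2.0, 1.0]
speed = [1.0, 1.0, 1.0]
slowed = slow_down_non_bottleneck(work, speed)
print(completion_time(work, speed), total_power(speed))    # 4.0 3.0
print(completion_time(work, slowed), total_power(slowed))  # 4.0 1.140625
```

Slowing the two unhurried parts leaves the completion time unchanged while cutting the modeled power from 3.0 to about 1.14 units.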


Role of Design Hierarchy for Low Power

[Figure: design hierarchy of OS (When?), Architecture (Where?), Circuit (How?), and Device; the circuit provides the throttle levers of power/performance, e.g. Clock Gating, Dual Vth, DVFS, Power Gating, Back-bias]

Circuit level: provides levers to throttle performance / power (DVFS, Power Gating, Back-bias, ...)

Architecture, OS level: find chances to set the levers (when and where?)
- Architecture: intra-task/process optimization
- OS: inter-task/process optimization


Preferable Throttle Lever

- Effectiveness of power reduction
- Low overhead in area, performance, and power: controlling the throttle lever itself takes time and consumes power
- Fine control granularity in both space and time: locations of busy / idle parts are small and change frequently

[Figure: a System LSI (Processor with int, fp, and cache; Reconfig. System; Memory; Network) whose busy and idle parts change over time]


Example of Throttle Levers

For dynamic power: Clock Gating, DVFS
- Both are effective, DVFS in particular (Power ∝ Vdd²)
- Clock Gating: very fine-grained control with little overhead; easily utilized within circuit-level design
- DVFS: changing Vdd through the regulator takes tens of μs; moderate granularity

For leakage power: Power Gating, Body Biasing
- Both are effective, but with large overhead in power and performance
- Body biasing: spatial granularity is limited to statically defined regions; not easy to use for fine-grained control
- Power Gating: the circuit block sits between Vdd and a virtual ground (VGND); a sleep transistor between VGND and GND, driven by the sleep signal, cuts off the block
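The slide's Power ∝ Vdd² comes from the standard dynamic (switching) power expression P = a·C·Vdd²·f, where a is the switching activity, C the switched capacitance, and f the clock frequency. A minimal sketch with hypothetical block parameters: clock gating acts on a (gated cycles stop switching), while DVFS lowers Vdd and f together, which is why it is so effective.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """Switching power P = a * C * Vdd^2 * f, in watts."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz

def dvfs_power_ratio(v_old, v_new, f_old, f_new):
    """Relative dynamic power after scaling Vdd and f together."""
    return (v_new / v_old) ** 2 * (f_new / f_old)

# Hypothetical block: 10% activity, 1 nF switched capacitance, 1.2 V, 200 MHz
p = dynamic_power(0.1, 1e-9, 1.2, 200e6)
print(f"{p * 1e3:.1f} mW")                       # 28.8 mW
print(dvfs_power_ratio(1.2, 0.6, 200e6, 100e6))  # 0.125
```

In this model, halving both Vdd and f cuts dynamic power to one eighth, whereas halving f alone only halves it.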


Role of Design Hierarchy for Low Power: The Ideal

[Figure: System/OS (When?), Architecture (When? Where?), Circuit (How?), and Device; spatial and temporal granularity is important]

Co-Design of Circuit, Architecture, and OS for Power

Co-Optimization of Throttle Lever Control:
- especially, Co-Optimization of Spatial and Temporal Granularity
- ex. activity localization by architecture/OS to make full use of the throttle levers' characteristics


Team Formation of our Research Project

Sub-theme (leader):
- Co-operative System Software with Arch. (Prof. Namiki)
- Ultra Low-Power Reconf. Architecture (Prof. Amano)
- Data Resident Architecture (Prof. Nakamura)
- Data Resident Compiler (Prof. Kondo)
- Ultra Low-Power Circuit Design (Prof. Usami)

Layers: System Software, Architecture/Compiler, Circuit Design, linked by Co-Optimization of System Software and Architecture and by Co-Optimization of Architecture and Circuit Design

[Figure: System LSI with Processor (int, fp, cache), Reconfig. system, Network, and Memory; a logic block with VddH / VddL supplies]


(Project 1) Geyser: Low Power Processor through Fine-grained Runtime Power Gating

Target: leakage power

Background: leakage reduction techniques so far
- Standby time: power gating (coarse grain)
- Runtime: cache decay, drowsy cache (coarse grain in time)

Leakage of logic parts (ALU, multiplier, etc.) is becoming serious:
- fast but leaky transistors are used
- the active ratio of those parts is not necessarily high, but the active parts change frequently, that is, cycle by cycle

Objective: reduce the runtime leakage power of logic parts

Challenge: how to optimize the granularity of power gating


Instruction Pipeline with Power-Gating

Geyser: MIPS-compatible processor with a 5-stage pipeline; straightforward PG (power gating)
- EX units are turned into active mode only when necessary
- An EX unit becomes active when an instruction that uses it enters the IF stage
- The activated EX unit returns to sleep mode after execution

[Figure: MIPS R3000 pipeline (IF, ID, EX, MEM, WB) with EX units ALU, Mult, Shift, and Div; logic that detects which unit will be used sends it a wake-up signal]
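Under the straightforward policy, every gap between two consecutive uses of an EX unit becomes a sleep period, and these gap lengths are what must be compared against the break-even time. A minimal sketch that extracts them from an EX-stage occupancy trace, assuming a hypothetical one-entry-per-cycle trace format and ignoring wake-up latency:

```python
from collections import defaultdict

UNITS = ("ALU", "Shift", "Mult", "Div")

def idle_intervals(ex_trace, units=UNITS):
    """Lengths (in cycles) of the gaps between consecutive uses of each unit.

    Under straightforward PG each gap is a sleep period: the unit sleeps
    right after executing and is woken for its next user. Leading and
    trailing gaps are ignored to keep the sketch simple.
    """
    last_use = {}
    gaps = defaultdict(list)
    for cycle, unit in enumerate(ex_trace):
        if unit is None:
            continue
        if unit in last_use and cycle - last_use[unit] > 1:
            gaps[unit].append(cycle - last_use[unit] - 1)
        last_use[unit] = cycle
    return {u: gaps[u] for u in units}

# Hypothetical EX-stage occupancy, one entry per cycle (None = no EX unit busy)
trace = ["ALU", "ALU", "Mult", None, "ALU", "Shift", None, None, "Mult", "ALU"]
print(idle_intervals(trace))
# {'ALU': [2, 4], 'Shift': [], 'Mult': [5], 'Div': []}
```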


Challenges for Run-Time Power-Gating: Energy Overhead

[Figure: power vs. time through Sleep and Wake-Up, relative to the normal leakage level]
- ①+③: energy overhead
- ②: part of the leakage saving; at the Break-Even Time (BET), ② = ①+③
- ④: net energy saving

The sleep period should be longer than the BET; otherwise, total energy consumption increases.

The BET tells the smallest granularity for power gating.
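The break-even condition can be written down directly. This sketch assumes, as a simplification, that sleeping saves a constant leakage power P_save per unit time and that the transition overheads ①+③ lump into one energy E_ovh. Then P_save·BET = E_ovh, so BET = E_ovh / P_save, and the net saving ④ of a sleep lasting T is P_save·(T - BET). The numbers are hypothetical, normalized units.

```python
def break_even_time(overhead_energy, leakage_power_saved):
    """BET: sleep time at which the leakage saved (2) equals the overhead (1)+(3)."""
    return overhead_energy / leakage_power_saved

def net_energy_saving(sleep_time, overhead_energy, leakage_power_saved):
    """Net saving (4); negative when the sleep period is shorter than BET."""
    return leakage_power_saved * sleep_time - overhead_energy

# Hypothetical normalized units: overhead 10, saving 1 per cycle of sleep
print(break_even_time(10.0, 1.0))        # 10.0
print(net_energy_saving(15, 10.0, 1.0))  # 5.0
print(net_energy_saving(4, 10.0, 1.0))   # -6.0
```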


Break-Even Time of Each Functional Unit

[Figure: BET (cycles @200MHz) of ALU, Shift, Mult, Div, and CP0 at 25℃, 65℃, 100℃, and 125℃; 90 nm technology]

BET is shortened as the chip temperature climbs, because leakage current depends heavily on temperature.

We need novel PG strategies that take BET into account.
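Why BET falls as the chip heats up follows from BET = E_ovh / P_save: the transition overhead is essentially a switching energy, while the leakage saved grows quickly with temperature. The sketch below assumes, purely for illustration, a temperature-independent overhead and leakage that doubles every 25 ℃; these parameters are hypothetical and are not fitted to the measured 90 nm data.

```python
def leakage_power(temp_c, p_ref=1.0, t_ref=25.0, doubling_c=25.0):
    """Hypothetical leakage model: doubles every `doubling_c` degrees C."""
    return p_ref * 2.0 ** ((temp_c - t_ref) / doubling_c)

def bet_cycles(temp_c, overhead_energy=100.0):
    """BET = overhead energy / leakage power saved while asleep.

    The overhead is assumed temperature-independent, in units where
    leakage at 25 C equals 1 per cycle.
    """
    return overhead_energy / leakage_power(temp_c)

for t in (25, 65, 100, 125):
    print(t, round(bet_cycles(t), 1))
```

The printed BET shrinks monotonically from 100 cycles at 25 ℃, reproducing the qualitative trend on the slide, not its values.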


Power Gating Strategies

Requirement: power off EX units for longer than BET

Static strategies
- straightforward: EX units always go to sleep after execution
- ideal compiler (ideal compiler-directed): the exact average idle time of EX units after each instruction is known (for reference only)

Dynamic strategies
- L1 miss: EX units fall asleep only when encountering L1 cache misses (L1 miss penalty = 15 cycles)
- L2 miss: EX units fall asleep only when encountering L2 cache misses (L2 miss penalty = 200 cycles)

Both static and dynamic strategies
- ideal compiler + L2 cache miss

ideal (God): ideal dynamic strategy; the exact idle time of EX units is known at any time; upper limit of PG (for reference only)
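The strategies can be compared with a toy energy model. This sketch assumes each idle interval of an EX unit is tagged with its cause, uses the L1 and L2 miss penalties (15 and 200 cycles) as stand-in interval lengths, and counts energy in units of leakage saved per cycle, so each interval a policy sleeps through contributes (length - BET). The trace is hypothetical, and the ideal compiler strategy is omitted because it needs per-instruction idle statistics.

```python
# Each policy decides, per idle interval, whether the EX unit is powered off.
# Arguments: interval length (cycles), its cause ("none", "l1", "l2"), and BET.
POLICIES = {
    "straightforward": lambda length, cause, bet: True,           # always sleep after execution
    "L1 miss": lambda length, cause, bet: cause in ("l1", "l2"),  # assumed: an L2 miss is also an L1 miss
    "L2 miss": lambda length, cause, bet: cause == "l2",
    "ideal (God)": lambda length, cause, bet: length > bet,       # oracle: exact idle time known
}

def net_saving(intervals, bet, policy):
    """Net energy saved, in units of leakage saved per cycle.

    Sleeping through an interval shorter than BET contributes a negative term.
    """
    return sum(length - bet for length, cause in intervals if policy(length, cause, bet))

# Hypothetical idle intervals of one EX unit: (length in cycles, cause)
intervals = [(3, "none"), (15, "l1"), (200, "l2"), (5, "none")]
for bet in (10, 20):
    print(bet, {name: net_saving(intervals, bet, p) for name, p in POLICIES.items()})
```

With BET = 10 the L1-miss policy saves the most among the realizable policies (195 vs. 190 for L2 miss and 183 for straightforward); with BET = 20, sleeping through the 15-cycle L1-miss gap costs energy and the L2-miss policy wins (180 vs. 175).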


Result for Frequently Used Execution Unit

[Figure: FPADD for MGRID; energy relative to non-PG vs. BET (cycles) for straightforward, ideal compiler, L1, L2, ideal comp. + L2, and ideal (God)]

- straightforward: BET is longer than the sleep time, so energy is wasted
- ideal compiler: fewer chances to sleep for longer BET
- L1: the resulting sleep time is about 15 cycles; ideal for BET < 15, but wastes energy for longer BET
- L2: the resulting sleep time is 200 cycles; ideal for longer BET
- for shorter BET, the compiler is effective


Collaboration with Compiler / OS

Suggested power gating strategy: co-optimization of the control granularity of the PG lever
- The compiler gives directions assuming a short BET, because compiler-directed PG is effective for shorter BET
- For shorter BET (high temperature), the compiler directions are put into use and the (compiler + L2-miss) strategy is taken
- For longer BET (low temperature), the L2-miss strategy is taken and the compiler directions are ignored
- The OS is expected to switch between strategies by observing changes in BET

[Figure: Power Gating Collaborated with Compiler / OS]
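The OS-level switching rule can be sketched as a small decision function. The cut-over threshold is a hypothetical parameter (the slides give no value), and the BET estimate is assumed to come from some on-chip measurement.

```python
from dataclasses import dataclass

@dataclass
class PGMode:
    use_compiler_hints: bool
    sleep_on_l2_miss: bool

def select_pg_mode(estimated_bet_cycles, short_bet_threshold):
    """Short BET (hot chip): honor compiler hints and also sleep on L2 misses.
    Long BET (cool chip): ignore compiler hints and sleep only on L2 misses."""
    if estimated_bet_cycles < short_bet_threshold:
        return PGMode(use_compiler_hints=True, sleep_on_l2_miss=True)
    return PGMode(use_compiler_hints=False, sleep_on_l2_miss=True)

THRESHOLD = 20  # hypothetical cut-over BET, in cycles
print(select_pg_mode(8, THRESHOLD))   # PGMode(use_compiler_hints=True, sleep_on_l2_miss=True)
print(select_pg_mode(60, THRESHOLD))  # PGMode(use_compiler_hints=False, sleep_on_l2_miss=True)
```

Since BET drifts on a millisecond scale with temperature, an OS would call such a function periodically rather than per instruction; a real policy might add hysteresis to avoid flapping near the threshold.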


Leakage Monitor [Koyama et al. ITC-CSCC 08][Usami et al. ISLPED2011 (poster 15)]

BET depends on the dynamic environment, such as temperature and process variation.

On-chip leakage monitoring circuit:
- more leakage results in faster charging of VGND
- leakage is estimated by measuring the rise time of VGND to VREF

The OS can select the best PG strategy by observing this monitor.

[Figure: VGND voltage (V) vs. sleep time (s) for more and less leakage; the rise time until VGND reaches the reference (VREF) is detected as a '0'/'1' output]
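A first-order reading of the monitor, assuming the leakage current stays roughly constant while it charges the VGND node capacitance C up to VREF, gives I_leak ≈ C·VREF / t_rise. The real charging current drops as VGND rises, so this is only an approximation, and the capacitance and voltage values are hypothetical. Because BET is inversely proportional to the leakage saved, one calibration point is enough to track BET from rise-time ratios alone.

```python
def estimate_leakage_current(rise_time_s, vgnd_capacitance_f, v_ref_v):
    """First-order estimate I = C * V_REF / t_rise (constant-current charging assumed)."""
    return vgnd_capacitance_f * v_ref_v / rise_time_s

def scale_bet(reference_bet, reference_rise_time_s, rise_time_s):
    """BET ~ 1 / leakage and leakage ~ 1 / t_rise, so BET scales with t_rise."""
    return reference_bet * (rise_time_s / reference_rise_time_s)

# Hypothetical values: 10 pF virtual-ground node, V_REF = 0.6 V, 1 us rise time
i_leak = estimate_leakage_current(1e-6, 10e-12, 0.6)
print(f"{i_leak * 1e6:.1f} uA")  # 6.0 uA
# A faster rise (more leakage, e.g. a hotter chip) implies a shorter BET
print(scale_bet(40.0, 1e-6, 0.5e-6))  # 20.0
```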


Co-Optimization of Throttle Lever Control in Fine-grained Runtime Power Gating

[Figure: OS: PG strategy; Architecture: PG control through activity localization; Circuit: PG lever controlled in 10~100 cycles; the best granularity changes dynamically (e.g., with temperature)]

Who should be responsible for PG control depends on the granularity of control:
- PG control granularity (BET): 10 ~ 100 cycles
- the best granularity of control changes every msec


Prototype CPU: Geyser-1 [Ikebuchi et al. ASSCC '09]

MIPS R3000, Fujitsu e-shuttle 65nm, Vdd = 1.2V

Successfully in operation: the first successful cycle-by-cycle power gating

[Figure: 2.1 mm × 4.2 mm chip photograph showing the Shifter, DIV, MULT, ALU, and leakage monitor]


Prototype CPU: Geyser-2

Geyser-2: 2nd prototype with caches and TLBs on-chip

Max working frequency: 210MHz (wakeup latency is less than 5ns)

Demonstration @ ISLPED2011 booth ④

[Figure: leakage power [mW] vs. temperature [C]]
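Putting the two Geyser-2 figures on the same scale: one clock period at the 210MHz maximum frequency is about 4.76 ns, so a wake-up latency below 5 ns corresponds to roughly one clock cycle.

```latex
T_{\mathrm{clk}} = \frac{1}{210\,\mathrm{MHz}} \approx 4.76\,\mathrm{ns},
\qquad
\frac{5\,\mathrm{ns}}{4.76\,\mathrm{ns}} \approx 1.05\ \text{cycles}
```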


(Project 2) Cool Mega Array

Reconfigurable accelerator: not for performance but for power efficiency
- The PE array consists of only combinational logic, so the power consumption of registers and clock distribution is reduced
- Low-voltage and low-power PE array (DVS region)
- PE array operation balanced with the data bandwidth of memory
- Localization of operations (operation / reg. access)

[Figure: Architecture of CMA: PE array with SE, fed by data memories (DMEM); Performance / Power]
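The idea of balancing PE-array operation with memory bandwidth can be sketched as picking the lowest array voltage whose throughput still keeps up with the rate at which data memory can feed it; running faster would only make the array wait on memory while consuming more power. The voltage/throughput table below is entirely hypothetical and is not CMA measurement data.

```python
# Hypothetical operating points: (PE-array voltage in V, achievable rate in MOPS)
OPERATING_POINTS = [(0.5, 120.0), (0.6, 260.0), (0.8, 520.0), (1.2, 900.0)]

def lowest_sufficient_voltage(memory_limited_mops, points=OPERATING_POINTS):
    """Lowest voltage whose PE-array rate still matches what memory can supply."""
    for voltage, mops in sorted(points):
        if mops >= memory_limited_mops:
            return voltage
    # Memory outpaces the array even at the top point: run at the highest voltage.
    return max(voltage for voltage, _ in points)

print(lowest_sufficient_voltage(200.0))   # 0.6
print(lowest_sufficient_voltage(1000.0))  # 1.2
```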


Prototype: CMA-1C

Fujitsu 65nm, 8x8 PE array, 12KB data memory, control part: 1.2V

Maximum power efficiency: 223.2 [MOPS/mW]

Demonstration @ ISLPED2011 booth ④

[Figure: power efficiency [MOPS/mW] vs. PE array voltage [V]]
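Restated as energy per operation, the reported maximum efficiency of 223.2 MOPS/mW is about 4.5 pJ per operation:

```latex
223.2\,\frac{\mathrm{MOPS}}{\mathrm{mW}}
= \frac{223.2\times 10^{6}\,\mathrm{op/s}}{10^{-3}\,\mathrm{W}}
= 2.232\times 10^{11}\,\mathrm{op/J}
\;\Rightarrow\;
E_{\mathrm{op}} = \frac{1}{2.232\times 10^{11}}\,\mathrm{J} \approx 4.48\,\mathrm{pJ}
```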


Summary and Future Direction

Geyser: run-time power gating processor; the first cycle-by-cycle power gating processor

Cool Mega Array: power efficiency accelerator

Other projects:
- L2 cache
- Fine-grain power gating NoCs [Matsutani et al. NOCS 2010][Matsutani et al. IEEE Trans. on CAD, 4/2011]
- Linux-based evaluation platform

Demonstration @ISLPED2011 booth ④

Towards integrated system LSIs: evaluation through real integration via 3D wireless NoCs

[Figure: Geyser CPU, CMA accelerators, L2 cache, and main memory]


Selected Publications

1. N. Seki, et al., "A Fine Grain Dynamic Sleep Control Scheme in MIPS R3000," Proc. of ICCD 2008, pp. 612-617, 2008.

2. K. Usami, et al., "Design and Implementation of Fine-grain Power Gating with Ground Bounce Suppression," Proc. of VLSI Design 2009, pp. 381-386, 2009.

3. N. Takagi, et al., "Cooperative Shared Resource Access Control for Low Power Chip Multiprocessors," ISLPED 2009, pp. 177-182, 2009.

4. S. Saito, et al., "MuCCRA-Cube: A 3D Dynamically Reconfigurable Processor with Inductive Coupling link," Proc. of FPL09, pp. 6-11, 2009.

5. D. Ikebuchi, et al., "Geyser-1: A MIPS R3000 CPU core with fine grain runtime power gating," Proc. of IEEE ASSCC 2009, pp. 281-284, 2009.

6. H. Matsutani, et al., "Ultra Fine-Grained Run-Time Power Gating of On-Chip Routers for CMPs," Proc. of NOCS'10, pp. 61-68, 2010.

7. H. Matsutani, et al., "Performance, Area, and Power Evaluations of Ultrafine-Grained Run-Time Power-Gating Routers for CMPs," IEEE Trans. on CAD (TCAD), Vol. 30, No. 4, pp. 520-533, Apr. 2011.

8. K. Usami, et al., "On-chip Detection Methodology for Break-Even Time of Power Gated Function Units," Proc. of ISLPED 2011 (to appear).

