
EE241 - Spring 2005bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s05/... · 2005. 3. 1. · •...


1

EE241 - Spring 2005: Advanced Digital Integrated Circuits

Lecture 11: Voltage Optimization

2

Power/Energy Optimization Space

Energy | Design Time | Sleep Mode | Run Time
Active | Logic design, scaled VDD, transistor sizing, multi-VDD | Clock gating | DFS, DVS
Leakage | Stack effects, + multi-VT | Sleep T’s, multi-VDD, variable VT, + input control | + Variable VT

Design time and sleep mode: constant throughput/latency. Run time: variable throughput/latency.

3

Design Time Optimization of Active Power

♦ Design Parameters
• Circuit (sizing, supply, threshold)
• Logic style (domino, pass-gate, …)
• Block topology (adder: CLA, CSA, …)
• Micro-architecture (parallel, pipelined)

[Figure: energy/op versus delay curves for topology A and topology B]

Source: B. Nikolic

4

Sizing, Supply, Threshold Optimization

Transistor sizing can yield large power savings with small delay penalties

Gate sizing

Beta-ratio adjustments

Stack resizing

IBM EinsTuner

Supply voltage affects both active and leakage energy

Threshold voltage affects primarily the leakage


5

Sizing, Supply, Threshold Optimization

There exists optimal supply + threshold for each function

At this optimum, ESw/ELk ≈ 2 (ratio of switching energy to leakage energy)

Depends on logic depth, activity, function

Technology is not optimal for all blocks

Adjust during the design: multiple supplies, thresholds

Variable-throughput applications: variable supplies, thresholds

6

Multi-dimensional search

[Figure: energy (normalized to Enom) versus delay (normalized to Dnom)]

• Well-defined optimization problem
• Can get pretty close to optimum with only 2 variables
• Getting the minimum delay (maximum speed) is very expensive


7

Example: W-VDD Optimization for LA Adder

♦ Reference: all paths are critical

♦ Internal energy ⇒ W more effective than Vdd

• W: E (-54%), 2Vdd: E (-27%) at dinc = 10%

[Figure: plot marking the nominal design (D = Dmin), sizing: E (-54%) at dinc = 10%, and 2Vdd: E (-27%) at dinc = 10%]

8

Multiple Supply Voltages

Block-level supply assignment
• Higher throughput/lower latency functions are implemented in higher VDD
• Slower functions are implemented with lower VDD
• “Voltage islands”, as IBM calls them
• Separate supply grids, level conversion performed at block boundaries

Multiple supplies inside a block
• Non-critical paths moved to lower supply voltage
• Level conversion within the block
• Physical design challenging


9

Multiple Supplies in a Block

[Figure: conventional design versus CVS structure, each drawn as logic between rows of flip-flops (FF) with the critical path marked. In the CVS structure the lower-VDD portion is shaded and level-shifting F/Fs are labeled.]

M. Takahashi, ISSCC’98, “Clustered voltage scaling”

10

Pulsed LCFF

M/S and pulsed half-latch LCFFs (MSHL, PHL)
• Smaller # of MOSFETs / clock loading
• Faster level conversion using half-latch structure
• Shorter D-Q path from pulsed circuit

[Figure: schematics of the two flip-flops, each with its level conversion stage marked]

Ishihara, ISLPED’03


11

Pulsed LCFF

Pulsed precharge LCFF (PPR)
• Fast level conversion by precharge structure
• Suppressed charge/discharge toggle by conditional capture
• Short D-Q path

Ishihara, ISLPED’03

12

Multiple Supply Voltages

Two supply voltages per block are optimal

Optimal ratio between the supply voltages is 0.7

Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)

An option is to use an asynchronous level converter

More sensitive to coupling and supply noise
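
A rough way to see why a second, lower supply pays off is to use the first-order rule that dynamic switching energy scales as C·VDD². The sketch below is only an illustration under that assumption: the function name, the `frac_low` parameter (fraction of switched capacitance moved to the low supply), and the neglect of level-converter overhead and leakage are all assumptions, not taken from the slides.

```python
def dual_supply_energy_ratio(frac_low: float, v_ratio: float) -> float:
    """Dynamic energy of a dual-supply block relative to a single-supply one.

    frac_low: assumed fraction of switched capacitance moved to the low supply.
    v_ratio:  VDDL / VDDH.
    Assumes E is proportional to C * VDD^2; level-converter overhead and
    leakage are ignored (simplifying assumptions for illustration only).
    """
    if not 0.0 <= frac_low <= 1.0:
        raise ValueError("frac_low must be in [0, 1]")
    if not 0.0 < v_ratio <= 1.0:
        raise ValueError("v_ratio must be in (0, 1]")
    return 1.0 - frac_low * (1.0 - v_ratio ** 2)


if __name__ == "__main__":
    # Sweep the (hypothetical) share of capacitance on the low supply at the
    # 0.7 supply ratio quoted in the slide.
    for frac in (0.25, 0.5, 0.75):
        ratio = dual_supply_energy_ratio(frac, 0.7)
        print(f"{frac:.0%} of capacitance at 0.7*VDDH -> relative energy {ratio:.3f}")
```

Under this model the saving is bounded by how much capacitance can be moved off the critical paths, which is consistent with the slide's point that the benefit depends on the path distribution rather than on the supply ratio alone.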

13

Three VDD’s

[Figure: power reduction ratio as a function of V2 (V) and V3 (V), each from 0.4 to 1.4, for V1 = 1.5V, VTH = 0.3V, p(t): lambda]

From Kuroda

14

Optimum Numbers of Supplies

[Figure: three panels for the supply sets { V1, V2 }, { V1, V2, V3 }, and { V1, V2, V3, V4 }. Each plots the supply voltage ratios (V2/V1, V3/V1, V4/V1) and the power dissipation ratio (P2/P1, P3/P1, P4/P1) against V1 (V) from 0.5 to 1.5]

• The more VDD’s, the less power, but the effect saturates.
• Power reduction effect will be decreased as VDD’s are scaled.
• Optimum V2/V1 is around 0.7.

[Hamada, CICC’01]

15

ALU Block Diagram

[Figure: ALU block diagram. Blocks: clock gen., 9:1 MUX (two), 5:1 MUX, 2:1 MUX, carry gen., gp gen., partial sum, sum sel., logical unit, INV1, INV2. Signals: ain, bin, ain0, carry, s0/s1, sum, sumb (long loop-back bus, 0.5pF), clk. The legend distinguishes VDDH circuits from VDDL circuits.]

16

Low Swing Bus & Level Converter

[Figure: sum drives the sumb bus through INV1 and INV2; a domino level converter (9:1 MUX) with precharge (pc) and keeper receives sumb, ain0, and sel (VDDH). VDDH and VDDL domains are marked.]

• Delay of INV1 does not increase
• INV2 is placed near 9:1 MUX to increase noise immunity
• Level conversion is done by a domino 9:1 MUX

17

Measured Results: Energy & Delay

[Figure: energy [pJ] (200 to 800) versus TCYCLE [ns] (0.6 to 1.6) at room temperature, for single-supply and shared-well (VDDH=1.8V) designs; a 1.16GHz point is marked]

• VDDL=1.4V: Energy -25.3%, Delay +2.8%
• VDDL=1.2V: Energy -33.3%, Delay +8.3%

The dual-supply technique expands the power-delay optimization space

18

Distributing Multiple Supply Voltages

[Figure: two layout styles, conventional and shared N-well. Each shows a VDDH circuit (i1, o1) and a VDDL circuit (i2, o2) with VDDH, VDDL, and VSS rails]

19

Conventional

[Figure: VDDH circuit and VDDL circuit with N-well isolation, and VDDH, VDDL, and VSS rails. Floor plans: (a) dedicated row, with separate VDDL rows and VDDH rows; (b) dedicated region, with a VDDH region and a VDDL region]

20

Shared-Well

[Figure: VDDH circuit and VDDL circuit in a shared N-well, with VDDH, VDDL, and VSS rails. (a) Floor plan image showing VDDL circuits and VDDH circuits]


21

Reducing the Supply Voltage: Concurrency versus Clock Speed

Example: reference datapath

22

Parallel Data Path


23

Pipelined Data Path

24

A Simple Data Path: Summary


25

Architecture Choices

[Figure: flexibility versus energy efficiency for embedded processor (lpArm; µP with Prog Mem), DSP (e.g. TI 320CXX; MAC Unit, Addr Gen), reconfigurable processors (Maia), embedded FPGA, and direct mapped hardware. Efficiency labels: .5-5 MIPS/mW, 10-100 MOPS/mW, 100-1000 MOPS/mW; overall span: factor of 100-1000]

Brodersen & Rabaey

26

Two Types of Processing

Fixed-rate processing (e.g. signal processing for multimedia or communications)

Stream-based computation

No advantage in obtaining throughput in excess of the real-time constraint

Variable-rate or burst-mode computation (e.g. general purpose computation)

mostly idle (or low-load) with bursts of computation

Faster is better


27

Variable-rate processing: Voltage as a design variable

[Figure: energy versus workload; labels include VTmax/VDD]

Adapting voltage to workload yields cubic reduction!
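
The cubic dependence can be recovered from the usual dynamic power expression. The short derivation below is a first-order sketch: the activity factor \alpha and switched capacitance C are introduced here for illustration, and the linear relation between clock frequency and supply voltage is an assumption that holds only roughly, for supplies well above the threshold voltage.

```latex
P = \alpha C V_{DD}^{2} f, \qquad f \propto V_{DD}
\;\Longrightarrow\; P \propto f^{3}
```

Under the same assumption, energy per operation (P/f) falls quadratically with the required speed.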

28

Common Design Approach: Fixed VDD

Compute ASAP: always high throughput

[Figure: delivered throughput versus time, with the excess throughput marked]

Clock Frequency Reduction: energy/operation remains unchanged… while throughput is scaled down with fCLK

[Figure: delivered throughput versus time with fCLK reduced]


29

Dynamic Voltage Scaling (DVS)

[Figure: delivered throughput versus time; fCLK and VDD are varied to dynamically adapt to the required throughput]

• Dynamically scale energy/operation with throughput.
• Always minimize speed → minimize average energy/operation.
• Extend battery life up to 10x with the exact same hardware!

Burd, ISSCC’00
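
The difference between the three strategies (compute ASAP, reduced fCLK at fixed VDD, and DVS) can be sketched numerically. This is a toy model, not the measured system: it assumes energy per operation is C·VDD² with leakage ignored, and it assumes clock speed scales linearly with (VDD - VT). The normalized defaults `vdd_max=1.0` and `vt=0.25` and all function names are hypothetical.

```python
def energy_per_op(vdd: float, c_eff: float = 1.0) -> float:
    """Dynamic energy per operation, E = C * VDD^2 (leakage ignored)."""
    return c_eff * vdd ** 2


def vdd_for_speed(rel_speed: float, vdd_max: float = 1.0, vt: float = 0.25) -> float:
    """Lowest supply that still delivers rel_speed (fraction of peak speed).

    Assumes speed is proportional to (VDD - VT); this is a simplifying
    assumption, and vdd_max and vt are hypothetical normalized values.
    """
    if not 0.0 < rel_speed <= 1.0:
        raise ValueError("rel_speed must be in (0, 1]")
    return vt + rel_speed * (vdd_max - vt)


def compare(rel_speed: float, vdd_max: float = 1.0, vt: float = 0.25) -> dict:
    """Energy per operation for the three strategies at a required speed."""
    fixed = energy_per_op(vdd_max)
    return {
        "compute_asap": fixed,   # full speed, then idle: VDD never drops
        "reduced_fclk": fixed,   # slower clock at fixed VDD: same energy/op
        "dvs": energy_per_op(vdd_for_speed(rel_speed, vdd_max, vt)),
    }


if __name__ == "__main__":
    for speed in (1.0, 0.5, 0.1):
        print(speed, compare(speed))
```

In this model only the DVS entry changes with the required speed, which matches the slide's observation that lowering fCLK alone leaves energy/operation unchanged.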

30

Adaptive Supply Voltages


31

Variable Algorithmic Workload

32

Typical MPEG IDCT Histogram


33

Dynamic Power Reduction Through Software-Hardware Cooperation

[Figure: software passes the required speed to a controller, which sets the clock & VDD of the processor hardware. Plot: normalized power (P ∝ ƒV²) versus required speed (∝ ƒ), both from 0 to 1; the reduction is super-linear]

If you don’t need to hustle, relax and save power.

S. Lee et al., DAC, June 2000

34

Processor: Converter Loop Sets VDD, fCLK

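[Figure: converter loop. The desired frequency FDES is held in a register set by the O.S. A counter and latch, reset (RST) by a 1MHz reference (f1MHz), measure the ring oscillator output fCLK as FMEAS. A summing node (Σ) forms FERR from FDES and FMEAS, and a digital loop filter drives the PENAB and NENAB switches of a buck converter (VBAT, L, CDD), which supplies VDD and IDD to the processor and ring oscillator]

• Feedback loop sets VDD so that FERR → 0.
• Ring oscillator delay-matched to CPU critical paths.
• Custom loop implementation → Can optimize CDD.

Burd, ISSCC’00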
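
The behavior of the loop (adjust VDD until FERR reaches zero) can be sketched as a discrete-time integrator. This is a behavioral toy, not the circuit in the figure: the buck converter and digital loop filter are collapsed into one integrating step, the sign convention FERR = FDES - FMEAS is assumed, and the linear ring-oscillator model with `vt`, `k`, `gain`, and the voltage limits are hypothetical values chosen only so the loop converges.

```python
def ring_osc_mhz(vdd: float, vt: float = 0.8, k: float = 30.0) -> float:
    """Hypothetical ring-oscillator model: f = k * (vdd - vt) MHz above VT."""
    return max(0.0, k * (vdd - vt))


def settle_vdd(f_des_mhz: float, vdd0: float = 1.2, gain: float = 0.01,
               steps: int = 200, vdd_min: float = 1.2, vdd_max: float = 3.8):
    """Iterate the loop and return (final VDD, remaining frequency error).

    Each step measures the oscillator (FMEAS), forms the error (assumed
    FERR = FDES - FMEAS), and integrates it into VDD. The clamp stands in
    for the converter's output range.
    """
    vdd = vdd0
    for _ in range(steps):
        f_meas = ring_osc_mhz(vdd)
        f_err = f_des_mhz - f_meas
        vdd += gain * f_err                      # integrating loop filter
        vdd = min(vdd_max, max(vdd_min, vdd))    # converter output limits
    return vdd, f_des_mhz - ring_osc_mhz(vdd)


if __name__ == "__main__":
    for target in (20.0, 60.0, 85.0):
        vdd, err = settle_vdd(target)
        print(f"FDES={target:5.1f} MHz -> VDD={vdd:.3f} V, FERR={err:.2e} MHz")
```

With these values the error shrinks by a factor of 1 - gain·k = 0.7 per step, so the loop settles well within the 200 iterations. Because the oscillator tracks the critical paths, the processor needs no separate voltage lookup table in this scheme; the loop finds the voltage that gives the requested frequency.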


35

Measured System Performance & Energy

[Figure: Dhrystone 2.1 MIPS (0 to 100) versus energy (mW/MIPS, 0 to 6) for static VDD and dynamic VDD. Marked points: 85 MIPS @ 5.6 mW/MIPS (3.8V) and 6 MIPS @ 0.54 mW/MIPS (1.2V)]

• Dynamic operation can increase energy efficiency > 10x.

Burd, ISSCC’00

36

DVS for Real Applications

[Figure: two traces of VDD (∝ fCLK), scale 1.0 to 3.5, at 200ms/div. Compute ASAP: alternates between max. speed and idle. With voltage scheduler: low speed & idle, with increased speed for shorter process deadlines]

• User-interface process: very bursty computation.
• High-latency computation done @ low speed/energy.

Burd, ISSCC’00

37

Measured Benchmark Energy Consumption

Benchmarks (normalized energy):

Algorithm | MPEG | UI | AUDIO
Compute ASAP | 100 % | 100 % | 100 %
Optimal | 67 % | 25 % | 16 %
ZERO | 89 % | 30 % | 22 %

• ZERO is the implemented heuristic algorithm.
• Difficult to optimize compute-intensive code (MPEG).
• Big drop in energy when less speed is required (3.3-4.5x)

Burd, ISSCC’00

38

Recent DVS-Enabled Microprocessors

XScale: 180nm 1.8V bulk-CMOS [Intel00]
0.7-1.75V, 200-1000MHz, 55-1500mW (typ)
Max. Energy Efficiency: ~23 MIPS/mW

PowerPC: 180nm 1.8V bulk-CMOS [Nowka02]
0.9-1.95V, 11-380MHz, 53-500mW (typ)
Max. Energy Efficiency: ~11 MIPS/mW

Crusoe: 130nm 1.5V bulk-CMOS [Transmeta03]
0.8-1.3V, 300-1000MHz, 0.85-7.5W (peak)

Pentium M: 130nm 1.5V bulk-CMOS [Intel03]
0.95-1.5V, 600-1600MHz, 4.2-31W (peak)


39

VDD-Hopping

[Figure: normalized power (0 to 1) for MPEG-4 encoding versus # of frequency levels (1, 2, 3, 8); transition time between ƒ levels = 200µs. Timeline: slices #n and #n+1 along the time axis, with the point where the n-th slice finished and the next milestone marked]

• Application slicing and software feedback guarantee real-time operation.
• Two hopping levels are sufficient.

40

Challenge: Design over Wide Range of Voltages

• Circuit design constraints. (Functional verification)

• Circuit delay variation. (Timing verification)

• Noise margin reduction. (Power grid, coupling)

• Delay sensitivity. (Local power distribution)

Design verification complexity similar to

high-performance processor design @ fixed VDD


41

Relative Delay Variation

[Figure: percent delay variation (-20 to +40), delay relative to ring oscillator, versus VDD (VT to 4VT), for four extreme cases of critical paths: gate, interconnect, diffusion, series]

• All vary monotonically with VDD.
• Timing verification only needed at min. & max. VDD.
• Should also consider VDD variations

Burd, ISSCC’00

42

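Delay Sensitivity

\[
\frac{\partial\,\mathrm{Delay}}{\mathrm{Delay}} \approx \frac{\partial\,\mathrm{Delay}}{\partial V_{DD}} \cdot \frac{\Delta V_{DD}}{\mathrm{Delay}(V_{DD})}, \qquad \Delta V_{DD} = I(V_{DD}) \cdot R
\]

[Figure: normalized ∂Delay/Delay (0 to 1) versus VDD (VT to 4VT)]

• Design of local power grid (for timing constraints) only needs to consider VDD ≈ 2VT.

Burd, ISSCC’00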


43

Design for Dynamically Varying VDD

• Static CMOS logic.
• Ring oscillator.
• Dynamic logic (& tri-state busses).
• Sense amp (& memory cell).

Max. allowed |dVDD/dt| → Min. CDD = 100nF (0.6µm)

Circuits continue to properly operate as VDD changes

44

Static CMOS Logic

[Figure: inverter with Vin = 0 and Vout = VDD; the PMOS on-resistance (rds|PMOS) and load CL give max. τ = 4ns]

• Static CMOS robustly operates with varying VDD.
• 0.6µm CMOS: |dVDD/dt| < 200V/µs


45

Ring Oscillator

• Output fCLK instantaneously adapts to new VDD.

[Figure: fCLK and VDD waveforms (0 to 4 volts) versus time (60 to 260 ns), simulated with dVDD/dt = 20V/µs]

46

Dynamic Logic

[Figure: dynamic gate with VDD, clk, Vin, and Vout, and waveforms of Vout and VDD versus time while clk = 1, for supply steps of ∆VDD and −∆VDD]

Errors:
• False logic low: ∆VDD > VTP
• Latch-up: ∆VDD > Vbe

• Cannot gate clock in evaluation state.
• Tri-state busses fail similarly → Use hold circuit.
• 0.6µm CMOS: |dVDD/dt| < 20V/µs


47

System-Level Issues: Reducing Waste

• Locality of reference
• Demand-driven / data-driven computation
• Application-specific processing
• Preservation of data correlations
• Distributed processing

Avoid switching any capacitance unnecessarily

Sharing increases capacitance

48

Clock gating

Requires careful skew control ...

Fortunately well handled in today’s EDA tools


49

MPEG4 decoder

[Figure: chip blocks DSP/HIF, DEU, MIF, VDE, and 896Kb SRAM. Bar chart of power [mW] (0 to 25 axis): 30.6mW without clock gating, 8.5mW with clock gating]

• 90% of F/F’s were clock-gated.
• 70% power reduction by clock-gating alone.

Clock-gating efficiently reduces power, NOW

Courtesy M. Ohashi, Matsushita, ISSCC 2002, Paper #22.1

50

Pre-computation

Other options:
• guarded evaluation
• set output directly

Inputs xi … xn are not applied if pre-computing holds
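
As an illustration of the pre-computation idea, the sketch below models an unsigned A > B comparator whose low-order input registers are loaded only when the most significant bits fail to decide the result. The comparator example, the function names, and the toggle-count metric are assumptions chosen for illustration; the slide's own circuit may differ.

```python
import random


def toggles(prev: int, new: int) -> int:
    """Number of bit positions that differ between two register values."""
    return bin(prev ^ new).count("1")


def simulate(pairs, n_bits: int = 8, precompute: bool = True):
    """Return (comparison results, total toggles in the low-bit registers)."""
    low_mask = (1 << (n_bits - 1)) - 1
    reg_a = reg_b = 0  # registers holding the n-1 low-order bits
    total = 0
    results = []
    for a, b in pairs:
        msb_a, msb_b = a >> (n_bits - 1), b >> (n_bits - 1)
        if precompute and msb_a != msb_b:
            # The MSBs alone decide A > B, so the low-bit registers are not
            # loaded and keep their previous values (no switching).
            results.append(msb_a > msb_b)
            continue
        new_a, new_b = a & low_mask, b & low_mask
        total += toggles(reg_a, new_a) + toggles(reg_b, new_b)
        reg_a, reg_b = new_a, new_b
        results.append(a > b)
    return results, total


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.randrange(256), rng.randrange(256)) for _ in range(1000)]
    _, with_pre = simulate(data, precompute=True)
    _, without = simulate(data, precompute=False)
    print(f"low-bit register toggles: {with_pre} with pre-computation, {without} without")
```

The outputs are identical in both modes; only the register activity differs. Skipping a load can never increase the toggle count, because the Hamming distance across a skipped value is at most the sum of the two distances it replaces.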


51

Circuit-level Activity Reduction

52

Circuit-Level Activity Encoding

Conditional Inversion Coding for Interconnect
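
The slide gives only the title, so the following sketch assumes it refers to the widely used bus-invert scheme: when more than half of the data lines would toggle, the inverted word is sent instead and an extra invert line is raised. Function names and the 8-bit width are illustrative choices, not taken from the lecture.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def bus_invert_encode(words, width: int = 8):
    """Encode words as (bus_value, invert_flag) pairs.

    If sending a word as-is would toggle more than width/2 data lines
    relative to the previous bus value, its complement is sent instead.
    """
    mask = (1 << width) - 1
    prev_bus = 0
    encoded = []
    for w in words:
        w &= mask
        if popcount(prev_bus ^ w) > width // 2:
            bus, inv = (~w) & mask, 1
        else:
            bus, inv = w, 0
        encoded.append((bus, inv))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width: int = 8):
    """Recover the original words from (bus_value, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [((~bus) & mask) if inv else bus for bus, inv in encoded]


def count_transitions(values) -> int:
    """Total bit toggles over a sequence of line values, starting from 0."""
    total, prev = 0, 0
    for v in values:
        total += popcount(prev ^ v)
        prev = v
    return total


if __name__ == "__main__":
    words = [0x00, 0xFF, 0x00, 0xFF]
    enc = bus_invert_encode(words)
    data_toggles = count_transitions([bus for bus, _ in enc])
    flag_toggles = count_transitions([inv for _, inv in enc])
    print("raw toggles:", count_transitions(words))
    print("encoded toggles (data + invert line):", data_toggles + flag_toggles)
```

The invert line adds its own switching, so the saving is largest on wide, heavily loaded buses with uncorrelated data.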


53

Eliminating Redundant Computations

54

Eliminating Redundant Computations


55

Number Representation

56

Number Representation - Accumulator Example


57

Two’s Complement vs Sign-Magnitude

58

Reducing Activity by Reordering Inputs


59

Resource Sharing Can Increase Activity

60

Application-Specific Processing

[Figure: datapath drawn with two Add blocks and a Register]

Application Specific Processing Reduces “Implementation Overhead”


61

The Architectural Trade-off

Architecture | Energy: 16-State Viterbi Decoder, energy per decoded bit (nJ) | Energy: 64-point FFT, energy per transform (nJ) | Area: 16-State Viterbi Decoder, decode rate per unit area (kb/s/mm2) | Area: 64-point FFT, transforms per second per unit area (Trans/ms/mm2)
High-Performance DSP | 108 | 1700 | 150 | 10
Low-Power DSP | 19.6 | 436 | 50 | 4.3
FPGA | 5.5 | 683 | 100 | 1.8
Direct-Mapped Hardware | 0.022 | 1.78 | 200,000 | 2,200

(numbers taken from vendor-published benchmarks)

Orders of magnitude lower efficiency even for an optimized processor architecture

62

Towards Heterogeneous Architectures for SOC

Xilinx Vertex Pro

Janus Chip - ST Micro and Parades

Berkeley Pleiades


63

Reducing Active Dissipation: Summary

• Voltage as a Design Variable
  Match voltage and frequency to required performance

• Minimize waste (or reduce switching capacitance)
  Match computation and architecture
  Preserve locality inherent in algorithm
  Exploit signal statistics
  Energy (performance) on demand

More easily accomplished in application-specific than programmable devices

