
LOW POWER DESIGN METHODOLOGIES

Date post: 04-Jun-2018
Upload: sakthi-velan

Transcript
  • 8/13/2019 LOW POWER DESIGN METHODOLOGIES

    1/52

    ELEN 468 Lecture 29 1

    ELEN 468 Advanced Logic Design

    Lecture 29

    Low Power Design

    Power Dissipation

    [Figure: power (watts, log scale from 0.1 to 100) versus year (1971 to 2000) for Intel processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, and P6.]

    Power increases despite Vdd decrease

    Courtesy, Intel

    Power Density

    [Figure: power density (W/cm2, log scale from 1 to 10000) versus year (1970 to 2010) for Intel processors 4004 through P6, with reference levels marked Hot Plate, Nuclear Reactor, and Rocket Nozzle.]

    Courtesy, Intel

    Why Power Increased

    Growing die size, fast frequency scaling

    [Figure: clock frequency (MHz, log scale from 10 to 10000) versus year (1985 to 2005).]


    Gate Power Dissipation

    Leakage power

    Dynamic power

    Short circuit power

    Dynamic Power

    Occurs at each switching event

    $P_d = C_L V_{dd}^2 f_p$, where $f_p$ is the switching frequency

    [Figure: two circuit diagrams with Vdd and out nodes; devices labeled Saturation and Linear.]
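As a rough numerical illustration of the dynamic power expression on this slide, the sketch below evaluates P_d = C_L * Vdd^2 * f_p. The capacitance, supply voltages, and frequency are made-up values chosen only to show the quadratic dependence on Vdd; they are not figures from the lecture.

```python
def dynamic_power(c_load: float, vdd: float, f_switch: float) -> float:
    """Return P_d = C_L * Vdd^2 * f_p in watts."""
    return c_load * vdd ** 2 * f_switch


if __name__ == "__main__":
    # Hypothetical operating point: 1 nF of switched capacitance at 1 GHz.
    c_load = 1e-9
    f_switch = 1e9
    for vdd in (1.5, 1.2):
        p = dynamic_power(c_load, vdd, f_switch)
        print(f"Vdd = {vdd} V -> P_d = {p:.2f} W")
```

Under these assumed numbers, lowering Vdd from 1.5 V to 1.2 V scales the power by (1.2/1.5)^2 = 0.64, which is why supply voltage is the strongest single knob in the expression.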

    Leakage Power

    Static

    Leakage current = a Vdd

    Leakage current = b/Vt

    Killer to CMOS technology

    [Figure: two circuit diagrams with Vdd and out nodes; devices labeled Saturation and Linear, with the leakage paths marked.]

    Short Circuit Power

    During switching, there is a short moment when both PMOS and NMOS are partially on

    $P_s = Q (V_{dd} - V_t)^3 t_r f_p$, where $t_r$ is the rising time

    [Figure: two circuit diagrams with Vdd and out nodes, one for a rising input and one for a falling input.]

    Where Does Power Go?

    [Figure: chart of power percentages (0% to 100%) split into core transistor leakage, gate leakage, cache leakage, and active power. Source label: Scalable X86 CPU Design for 90nm.]

    Low VT devices are

    Energy-Performance Space

    Every design is a point on a 2-D plane

    [Figure: plane with axes Performance and Energy.]

    Low Power Design

    Reduce dynamic power
      α (activity factor): clock gating, sleep mode
      C: small transistors (esp. on clock), short wires
      VDD: lowest suitable voltage
      f: lowest suitable frequency

    Reduce static power
      Selectively use low Vt devices
      Power gating, MTCMOS
      Stacked devices
      Body bias

    Clock Gating

    Gate off clock to idle functional units
      e.g., floating point units
      need logic to generate disable signal
        increases complexity of control logic
        consumes power
        timing critical to avoid clock glitches at OR gate output
      additional gate delay on clock signal
        gating OR gate can replace a buffer in the clock distribution tree

    [Figure: clock and disable signals feeding the register (Reg) of a functional unit.]
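The OR-gate scheme described on this slide can be illustrated with a small behavioral sketch. It models the clock and disable signals as lists of 0/1 samples and counts the rising edges that reach the register. The waveform lengths and the idle window are assumptions for illustration; note that disable is only changed while the clock is high, which is the timing condition that keeps the OR output glitch-free.

```python
def gated_clock(clock: list[int], disable: list[int]) -> list[int]:
    """OR-gate clock gating: the output is held high while disable = 1."""
    return [c | d for c, d in zip(clock, disable)]


def rising_edges(signal: list[int]) -> int:
    """Count 0 -> 1 transitions, i.e. the edges that would clock the register."""
    return sum(1 for prev, cur in zip(signal, signal[1:]) if prev == 0 and cur == 1)


if __name__ == "__main__":
    clock = [0, 1] * 8  # 8 clock cycles, two samples per cycle
    # Hypothetical idle window: disable rises at sample 3 and falls at
    # sample 11, both instants at which the clock is high.
    disable = [0] * 3 + [1] * 8 + [0] * 5
    gated = gated_clock(clock, disable)
    print("edges without gating:", rising_edges(clock))
    print("edges with gating:   ", rising_edges(gated))
```

In this toy waveform half of the clock edges are suppressed, so the register and the clock load behind the gate switch half as often.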

    Active Power Reduction - Supply Voltage Reduction

    Static
      Pros:
        Always active in saving
      Cons:
        Additional power delivery network
        Needs special care of interface between power domains: signals close to Vt lead to excessive leakage and reduced noise margins

    Dynamic
      Adjusting operation voltage and frequency to performance requirements:
        High performance: high Vdd & frequency
        Power saving: low Vdd & frequency
      Pros:
        Doesn't limit performance
      Cons:
        Penalty of transition between different power states can be high (in performance and power)
        Additional control logic

    [Figure: blocks labeled Slow, Slow, and Fast, assigned to High Supply Voltage and Low Supply Voltage.]

    Voltage Islands (Multi-Vdd)

    Allow both macro and cell voltage assignment
    Allow different voltage islands in the same circuit row
    Lift unnatural layout restrictions
    Minimal placement disturbance

    [Figure: layouts with Vddh and Vddl regions, labeled Lackey+ ICCAD'02, Usami+ JSSC'98, and GVI DAC'03.]

    Level Converter

    Interface circuit when Vddl drives Vddh, to avoid leakage

    [Figure: a VddL stage driving a VddH stage, with one device marked "weak on!".]

    [Figure: conventional dual supply level converter (Vddh, Vddl, IN, OUT) and new single supply level converter (Vddh, IN, OUT).]

    Adjacency Metrics for Clustering

    Logic adjacency metric (LAM): Vddl fanin cone of level shifter without going through Vddh

    Physical adjacency metric (PAM): for each candidate Vddl cell, compute total size of its neighbor Vddl cells

    [Figure: circuits with Vddh and Vddl regions and level converters LC1, LC2, LC3.]

    LAM to guide logic aware voltage assignment

    PAM to guide placement aware voltage re-assignment
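A minimal sketch of the physical adjacency metric, assuming that "neighbor" means any cell whose placement lies within a fixed Euclidean radius of the candidate. The slide does not define the neighborhood, so the `radius` parameter, the `Cell` record, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Cell:
    name: str
    x: float
    y: float
    size: float
    vddl: bool  # True if the cell is currently assigned to Vddl


def pam(candidate: Cell, cells: list[Cell], radius: float) -> float:
    """Total size of the Vddl cells lying within `radius` of `candidate`."""
    return sum(
        c.size
        for c in cells
        if c.vddl
        and c.name != candidate.name
        and hypot(c.x - candidate.x, c.y - candidate.y) <= radius
    )


if __name__ == "__main__":
    placed = [
        Cell("a", 0.0, 0.0, 2.0, True),
        Cell("b", 1.0, 0.0, 3.0, True),
        Cell("c", 0.0, 1.0, 4.0, False),  # Vddh neighbor, not counted
        Cell("d", 5.0, 5.0, 1.0, True),   # Vddl but too far away
    ]
    candidate = Cell("e", 0.5, 0.5, 1.0, False)
    print("PAM of candidate e:", pam(candidate, placed, radius=2.0))
```

A candidate with a high PAM value sits next to a large amount of Vddl area, so reassigning it to Vddl is less likely to fragment the voltage islands.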

    Level Converter Optimizations

    Logic replacement (or gate sizing)

    LC/Buffer co-optimization

    [Figure: circuits with cells DEC, ZMUX1, and ZMUX2, inputs A and B, and level converters (LC).]

    Placement to Form Voltage Islands with Power Grid Co-design

    Based on Vddl and Vddh cell placement after voltage assignment, define Vddl/Vddh power grids on demand

    Detailed placement to form Vddl/Vddh voltage islands that can hit their corresponding power supplies

    [Figure: power grids on demand, with alternating Vddl and Vddh rails.]

    Example of Voltage Islands

    Vddl = 1.2V

    Vddh = 1.5V

    No timing degradation, no area increase!

    - IBM Cu11
    - 0.13um
    - 400 MHz

    (courtesy IBM)

    Dynamic Frequency and Voltage Scaling

    Always run at the lowest supply voltage that meets the timing constraints
      DFS (dynamic frequency scaling) saves only power
      DVS (dynamic voltage scaling) + DFS saves both energy and power

    A DVS+DFS system requires the following
      A programmable clock generator (PLL)
        PLL from 200MHz to 700MHz in increments of 33MHz
      A supply regulation loop that sets the minimum VDD necessary for operation at the desired frequency
        32 levels of VDD from 1.1V to 1.6V
      An operating system that sets the required frequency + supply voltage to meet the task completion deadlines
        heavier load: ramp up VDD, when stable speed up clock
        lighter load: slow down clock, when PLL locks onto new rate, ramp down VDD
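The claim that DFS saves only power while DVS + DFS saves both energy and power follows from the dynamic power expression P = C * Vdd^2 * f: a task of N cycles takes N / f seconds, so its energy is P * (N / f) = C * Vdd^2 * N, which does not depend on f. The sketch below checks this numerically. The effective capacitance, the cycle count, and the pairing of 700 MHz with 1.6 V and 200 MHz with 1.1 V are assumptions for illustration; the slide gives only the ranges, not which voltage supports which frequency.

```python
def power_w(c_eff: float, vdd: float, freq_hz: float) -> float:
    """Dynamic power P = C * Vdd^2 * f."""
    return c_eff * vdd ** 2 * freq_hz


def energy_j(c_eff: float, vdd: float, cycles: float) -> float:
    """Energy of a task: P * (cycles / f) = C * Vdd^2 * cycles, independent of f."""
    return c_eff * vdd ** 2 * cycles


if __name__ == "__main__":
    c_eff = 1e-9    # hypothetical effective switched capacitance (F)
    cycles = 7e8    # hypothetical task length in clock cycles

    settings = {
        "full speed": (1.6, 700e6),
        "DFS only":   (1.6, 200e6),  # slower clock, same supply
        "DVS + DFS":  (1.1, 200e6),  # slower clock and lower supply (assumed pairing)
    }
    for name, (vdd, f) in settings.items():
        p = power_w(c_eff, vdd, f)
        e = energy_j(c_eff, vdd, cycles)
        print(f"{name:10s}  P = {p:.3f} W  E = {e:.3f} J  time = {cycles / f:.2f} s")
```

With frequency scaling alone the task simply runs longer at lower power, so the battery drains by the same amount; only lowering Vdd reduces the energy per task.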

    Leakage Reduction Techniques

    [Figure: four leakage reduction techniques.
      Stack effect: pullup (Vdd), intermediate node Vx, device widths Wu and Wl.
      Dual Vt partitioning: high Vt devices and low Vt devices.
      Variable threshold (VTCMOS): well voltages Vnwell and Vpwell shown relative to Vdd and 0.
      Multi-threshold (MTCMOS): low Vt logic between virtual Vdd and virtual Gnd, with HVT sleep transistors controlled by sleep.]


    Natural Transistor Stacks

    Reduce the leakage by stacking the devices

    Reduced Vds

    Negative Vgs

    Negative Vbs

    How?

    Design with Dual Vth

    Dual Vth design
      Two flavors of transistors: slow (high Vth) and fast (low Vth)
      Low Vth devices are faster, but have 10X leakage

    Dual Vth evaluation

    Impacts of Variable VT

    Reducing the VT increases the sub-threshold leakage current (exponentially)

    $V_T = V_{T0} + \gamma \left( \sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|} \right)$

    where $V_{T0}$ is the threshold voltage at $V_{SB} = 0$, $V_{SB}$ is the source-bulk (substrate) voltage, $\gamma$ is the body-effect coefficient, and $\phi_F$ is the Fermi potential

    But, reducing VT decreases gate delay (increases performance)
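To make the body effect concrete, the sketch below evaluates the standard textbook form of the relation, V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|)), for a few source-bulk voltages. The default parameter values (V_T0 = 0.43 V, gamma = 0.4 V^0.5, phi_F = -0.3 V) are illustrative assumptions for an NMOS device and do not come from the lecture.

```python
from math import sqrt


def threshold_voltage(
    v_sb: float,
    v_t0: float = 0.43,   # assumed zero-bias threshold voltage (V)
    gamma: float = 0.4,   # assumed body-effect coefficient (V^0.5)
    phi_f: float = -0.3,  # assumed Fermi potential (V)
) -> float:
    """Threshold voltage as a function of source-bulk voltage V_SB."""
    return v_t0 + gamma * (sqrt(abs(-2 * phi_f + v_sb)) - sqrt(abs(-2 * phi_f)))


if __name__ == "__main__":
    for v_sb in (0.0, 0.5, 1.0):
        print(f"V_SB = {v_sb:.1f} V -> V_T = {threshold_voltage(v_sb):.3f} V")
```

A positive V_SB (reverse body bias for an NMOS device) raises V_T, which cuts sub-threshold leakage at the cost of a slower gate.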

    Forward/Reverse Body Biasing

    RBB (Reverse Body Bias): zero body bias in active mode, a deep reverse bias in standby mode.
      Disadvantages:
        Increases PN junction reverse leakage
        Technology scaling worsens short channel effects and weakens the Vth modulation capability

    FBB (Forward Body Bias): high Vth in standby mode, forward body biasing to achieve better current drive in active mode.
      Disadvantages:
        Larger junction capacitance
        High body effect for stacked devices

    Implementation of Dynamic Vth Scaling (DTS)

    The lowest Vth is delivered (NBB, no body bias) if the highest performance is required.

    When the performance demand is low, clock frequency is lowered and Vth is raised via RBB to reduce the run time leakage power dissipation.

    How?
      When critical path replica frequency is less than reference CLK, adjust bias to decrease Vth.
      Otherwise adjust bias to increase Vth.
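The feedback rule on this slide (compare the critical path replica against the reference clock, then nudge Vth down or up) can be sketched as a simple bang-bang controller. Everything numeric here is an assumption: the step size, the Vth limits, and especially the linear `replica_freq_hz` model, which is only a toy stand-in for a real critical path replica.

```python
def dts_step(
    replica_freq_hz: float,
    ref_clk_hz: float,
    vth: float,
    step: float = 0.01,    # assumed Vth adjustment per iteration (V)
    vth_min: float = 0.2,  # assumed Vth with no body bias (V)
    vth_max: float = 0.5,  # assumed Vth at deepest reverse bias (V)
) -> float:
    """One controller iteration; returns the new Vth target."""
    if replica_freq_hz < ref_clk_hz:
        vth -= step  # replica too slow: reduce reverse bias to lower Vth
    else:
        vth += step  # replica fast enough: raise Vth to cut leakage
    return min(max(vth, vth_min), vth_max)


def replica_freq_hz(vth: float, f_at_zero_vth: float = 1.0e9) -> float:
    """Toy model (assumption): replica frequency falls linearly as Vth rises."""
    return f_at_zero_vth * (1.0 - vth)


if __name__ == "__main__":
    vth = 0.2
    ref_clk = 700e6  # lowered reference clock for a light workload
    for _ in range(50):
        vth = dts_step(replica_freq_hz(vth), ref_clk, vth)
    print(f"Vth settles near {vth:.2f} V for a {ref_clk / 1e6:.0f} MHz reference")
```

In this toy model the loop settles around the highest Vth at which the replica still meets the reference clock, dithering by one step around that point.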

    Power Gating Using Sleep Transistors

    Can reduce leakage by gating the supply rails when the circuit is in sleep mode
      in normal mode, sleep = 0 and the sleep transistors must present as small a resistance as possible (via sizing)
      in sleep mode, sleep = 1, the transistor stack effect reduces leakage by orders of magnitude

    Or can eliminate leakage by switching off the power supply (but lose the memory state)

    Example of Power Gating

    [Figure: layout with embedded power switches between rows of standard cells, driven by power switch control signals.]

    Can reduce power 1000X

    Smaller voltage swing (IR drop on sleep transistors)

    Lower performance

    Increased noise coupling

    Local power grid design

    Power Dissipation on Variation Tolerance

    Conventional variation tolerance
      Using large timing safety margin
      Implies aggressive timing target
      Greater power dissipation

    Observation
      Near-worst-case variations occur rarely
      Safety margin is applied continuously to guard against the small chance of variations
      Poor power efficiency

    Question...

    Can we deal with errors instead of preventing them from occurring by conservative binning/clocking?

    How fast can we speed up the circuit with the error rate in a manageable range?

    Fault tolerant system

    Begin with reference values

    Introduce redundancy
      Hardware: Triple Modular Redundancy
      Time: Repeated process
      Information: Code
      Software: various algorithms

    How about for delay fault?
      How do we detect (and maybe correct?) errors?
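As an illustration of the hardware redundancy entry, here is a minimal sketch of Triple Modular Redundancy: three copies of a function are evaluated and a bitwise majority voter masks a fault in any single copy. The fault injection arguments (`faulty_copy`, `flip_mask`) are hypothetical helpers added only to demonstrate the masking.

```python
from typing import Callable, Optional


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)


def tmr(
    func: Callable[[int], int],
    x: int,
    faulty_copy: Optional[int] = None,  # index (0..2) of the copy to corrupt
    flip_mask: int = 0,                 # bits to flip in the corrupted copy
) -> int:
    """Run three copies of `func`, optionally corrupt one, and vote."""
    outputs = [func(x) for _ in range(3)]
    if faulty_copy is not None:
        outputs[faulty_copy] ^= flip_mask
    return majority(*outputs)


if __name__ == "__main__":
    increment = lambda v: v + 1
    print("fault-free result:    ", tmr(increment, 41))
    print("one faulty copy result:", tmr(increment, 41, faulty_copy=1, flip_mask=0b1000))
```

The voter corrects any error confined to one copy, but the cost is roughly three times the hardware, which motivates the cheaper delay-fault schemes discussed next.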

    Delay fault tolerant system

    Delay fault detection
      Redundant timing margin in signal path
        +: Second sampling at increased clock period
        -: Decreased delay of reference signal between pipeline registers

    [Figure: timing diagram with sampling instants t1 and t2, the timing margin between them, and the 2nd sampling point.]
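The second-sampling idea can be sketched behaviorally: the signal is sampled once at the clock edge t1 and again at t2 = t1 + margin, and a mismatch between the two samples flags a delay fault. The model below assumes a single transition from an old to a new value after `path_delay_ns`; the function names and all timing numbers are hypothetical.

```python
def sample(path_delay_ns: float, t_sample_ns: float, old_value: int, new_value: int) -> int:
    """Value seen at the sampling instant: the new value only if the path has settled."""
    return new_value if path_delay_ns <= t_sample_ns else old_value


def detect_delay_fault(
    path_delay_ns: float,
    t1_ns: float,
    margin_ns: float,
    old_value: int,
    new_value: int,
) -> tuple[bool, int]:
    """Return (fault_detected, value_at_second_sampling)."""
    first = sample(path_delay_ns, t1_ns, old_value, new_value)
    second = sample(path_delay_ns, t1_ns + margin_ns, old_value, new_value)
    return first != second, second


if __name__ == "__main__":
    for delay in (0.9, 1.2, 1.5):
        detected, value = detect_delay_fault(delay, t1_ns=1.0, margin_ns=0.3,
                                             old_value=0, new_value=1)
        print(f"path delay {delay} ns -> detected={detected}, second sample={value}")
```

The last case shows the limit of the scheme: a path slower than t1 plus the margin is missed by both samples, so the margin bounds the delay faults that can be detected.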

    Delay fault tolerant system

    Delay fault removal
      Reference signal (SR)
      Reprocessing at slower clock period (t)

    [Figure: timing diagram with t1, t2, the timing margin, and the reference signal SR.]

    Delay fault tolerant system: Example

    RAZOR* Dynamic Voltage Scaling Design
      Reduce power: scale voltage down to a manageable failure rate

    [Figure: timing diagram with t1, t2, and the timing margin.]

    * D. Ernst et al., "Razor: a low-power pipeline based on circuit-level timing speculation," 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003

    Delay fault tolerant system: Example

    RAZOR continued
      Implemented to 120MHz clock frequency

    But for high speed circuits
      Managing two clocks
      Minimum path delay constraint
      Delay of MUX

    Delay fault tolerant system: Example

    Parity coding
      Parity generation based on output correlation
      Avoid well-correlated outputs for pairing

    [Figure: timing diagram showing the timing margin.]

    Now, let's look at delay distribution(s)


    Clock speed achieved for contained error rate

    Delay fault tolerant system: Example

    Parity coding (continued)
      Complexity

    Example: C499 ISCAS Benchmark

    Recently Proposed Design

    Fault detection
      Partial hardware and time redundancy

    [Figure: pipeline stage between latches Ln and Ln+1 containing gates g0, gi, and gm, divided into FL and BL, with BL' and latch L'n+1; the timing margin is marked.]

    Proposed Design

    Fault removal
      Pipeline flush & reprocessing at lower clock

    [Figure: pipeline stage between latches Ln and Ln+1 containing gates g0, gi, and gm, divided into FL and BL, with BL' and latch L'n+1.]

    Proposed Design

    Division of FL and BL

    [Figure: block diagram from PI to PO through FL, Latch, and BL, with a second BL block and a comparison block (CP) producing the "Error?" signal.]

    Proposed Design

    Division of FL and BL
      Considerations
        The effects on the original circuit should be minimal
        Maximize delay fault detection coverage
        Minimize added complexity

    Proposed Design

    Division of FL and BL
      First, POs to BL
      Gate with longest delay to gate with shortest delay
      For the gates connected to BL, choose the gate with maximum delay
      Then, any gate whose number of fanouts > number of fanins

    Proposed Design

    Delay fault detection coverage
      $d_{FL}$: delay from PI to any gate in FL
      $d_i$: delay from PI to any gate in original circuit

    $C_F = 1 - \dfrac{\max\{d_{FL}\}}{\max\{d_i\}}$
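A small numerical sketch of the coverage metric follows. The formula is garbled in the transcript, so the form used here, C_F = 1 - max(d_FL) / max(d_i), is a reconstruction from the two delay definitions on the slide and should be read as an assumption. The delay values are made up.

```python
def fault_coverage(fl_delays: list[float], all_delays: list[float]) -> float:
    """Assumed form: C_F = 1 - max(d_FL) / max(d_i)."""
    return 1.0 - max(fl_delays) / max(all_delays)


if __name__ == "__main__":
    # Hypothetical arrival times (ns) from the primary inputs to each gate.
    all_delays = [1.0, 2.0, 3.0, 4.0]  # every gate in the original circuit
    fl_delays = [1.0, 2.0]             # the gates assigned to FL
    print("C_F =", fault_coverage(fl_delays, all_delays))
```

Under this reading, moving the FL/BL boundary toward the primary inputs shrinks max(d_FL) and raises coverage, at the price of a larger BL to duplicate.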

    Proposed Design

    Delay simulation
      SPICE simulation
        TSMC 0.18um tech., Vcc = 1.6V
        Gate delay for rising and falling signal
        Load: inverter
        Different input combinations are considered

    Delay simulation
      Randomly generated test vectors
        $10^6$ to $10^8$ according to number of primary inputs (PI)

    Proposed Design

    Area complexity
      $N_{gate}$: number of gates in the original circuit
      $N_{ff}$: number of flip-flops in each pipeline, $(N_{PI} + N_{PO})/2$
      $N_{gate\_BL}$: number of gates in BL
      $N_{gate\_CP}$: number of gates in comparison block
      $N_{Latch}$: number of latches = number of connections between FL and BL
      $w$: complexity ratio of flip-flop to gate

    $C_A = \dfrac{N_{gate\_BL} + N_{gate\_CP} + N_{Latch}}{N_{gate} + w N_{ff}}$
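The area complexity ratio can be evaluated with a few lines of code. The operators of the formula are lost in the transcript, so the form used here, C_A = (N_gate_BL + N_gate_CP + N_Latch) / (N_gate + w * N_ff) with N_ff = (N_PI + N_PO) / 2, is a reconstruction from the listed symbols and is an assumption. The gate counts below are invented and do not correspond to any benchmark.

```python
def area_complexity(
    n_gate: int,
    n_pi: int,
    n_po: int,
    n_gate_bl: int,
    n_gate_cp: int,
    n_latch: int,
    w: float,
) -> float:
    """Assumed form: added area divided by original area (gates plus weighted flip-flops)."""
    n_ff = (n_pi + n_po) / 2
    return (n_gate_bl + n_gate_cp + n_latch) / (n_gate + w * n_ff)


if __name__ == "__main__":
    # Hypothetical circuit: 200 gates, 40 inputs, 30 outputs, flip-flop = 4 gates.
    c_a = area_complexity(n_gate=200, n_pi=40, n_po=30,
                          n_gate_bl=40, n_gate_cp=20, n_latch=8, w=4.0)
    print(f"C_A = {c_a:.2f}")
```

With these invented counts the added logic is 68 gate equivalents against an original 340, so the added complexity is 0.2.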

    Fault Coverage vs. Complexity

    [Figure: four plots of added complexity CA versus fault detection coverage CF, one each for benchmarks C499, C432, C880, and C6288.]

    Complexity

    Effective complexity penalty
      Depends on application
      More than half of area is cache
      Speed critical part: integer unit

    $C_{AE} = C_A \times \dfrac{\text{Applicable area}}{\text{Total chip area}} < 0.5\, C_A$

    Estimation of Complexity

    [Figure: Intel Pentium 4 processor on 90 nm process, with labeled blocks: Data Cache & AGU, Align Mux, Registers, ALUs.]

    Conclusion

    Delay fault tolerant design is proposed

    Possible operation clock frequency gain is estimated from modeling and experiments

    Delay fault detection coverage and complexity are analyzed for optimal implementation

    It shows that 10% clock frequency gain is possible with proposed design at a moderate (8-25%) complexity increase

