Tutorial Outline - caesr.uwaterloo.ca · Puneet Gupta (puneet@ee.ucla.edu) Electromigration •...

Post on 08-Aug-2020

Puneet Gupta (puneet@ee.ucla.edu)

Tutorial Outline

RELIABILITY AND VARIABILITY MECHANISMS

Life Time of a System

•  More transistors + high power density/temperature + smaller dimensions + unscaled voltage → high failure/wearout rate

•  Failure models are usually not accurate

[Figure: bathtub curve of failure rate vs. time over the system life time. Phases: I. Infant Mortality, II. Useful Life, III. Aging. Curves: quality failures, operation related failures, aging (wearout), and overall life characteristics.]

[B. E. Hegler, Potential 1988]

Electromigration

•  Atom flux induced in metal traces by high current densities → metal atoms experience a mechanical force and get dislodged from their position → formation of metal voids in the conductor, which eventually result in electrical opens.
–  Cu is more resistant than Al, BUT shrinking dimensions + increasing current density → ~50% lifetime degradation per technology generation

[Figure: interconnect BEFORE and AFTER electromigration.]

$$MTTF_{EM} = A_0 (J - J_{crit})^{-n} e^{E_a / (kT)}$$

$A_0$: constant
$J$: current density
$J_{crit}$: threshold current density
$k$: Boltzmann's constant ($8.62 \times 10^{-5}$ eV/K)
$E_a$: 0.7 eV for aluminum
$n$: 2 (Black's law)

MTTF: Switching activity vs. Temp

AF      T=100    T=105    T=110
0.2     13.33    10       7.55
0.3     5.93     4.44     3.36
0.4     3.33     2.5      1.89
0.5     2.13     1.6      1.21
0.6     1.48     1.11     0.84
0.7     1.09     0.82     0.62
0.8     0.83     0.63     0.47
0.9     0.66     0.49     0.37
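The MTTF values for switching activity vs. temperature can be approximated with a short script. This is a minimal sketch under stated assumptions, none of which are spelled out on the slide: current density is taken as proportional to the activity factor AF, J_crit is set to 0, temperatures are read as degrees Celsius, and values are normalized so that AF = 0.2 at 105 gives 10. The function names are hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.62e-5  # Boltzmann's constant in eV/K, as given on the slide


def mttf_em(j, temp_c, a0=1.0, j_crit=0.0, n=2.0, e_a=0.7):
    """Black's equation: MTTF = A0 * (J - Jcrit)^(-n) * exp(Ea / (k*T))."""
    temp_k = temp_c + 273.15  # assumption: temperatures are in Celsius
    return a0 * (j - j_crit) ** (-n) * math.exp(e_a / (K_BOLTZMANN_EV * temp_k))


def relative_mttf(af, temp_c, ref_af=0.2, ref_temp_c=105.0, ref_value=10.0):
    """MTTF scaled so that (ref_af, ref_temp_c) maps to ref_value.

    Assumption: current density J is proportional to the activity factor AF.
    """
    return ref_value * mttf_em(af, temp_c) / mttf_em(ref_af, ref_temp_c)


if __name__ == "__main__":
    for af in (0.2, 0.4, 0.8):
        row = [round(relative_mttf(af, t), 2) for t in (100, 105, 110)]
        print(af, row)
```

With n = 2, doubling the activity factor cuts the relative MTTF by 4x, and with E_a = 0.7 eV a 5 degree rise near 100 costs roughly a quarter of the lifetime.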

Hot Carrier Injection (HCI)

•  Hot carriers produced at the drain end of the transistor collide with lattice atoms → a few get trapped at defect sites within the oxide, where they act as fixed oxide charge → increase in Vth

•  Device dimensions shrink linearly but operating voltage does not → stronger electric field inside the MOS device → worse HCI

•  Happens during switching

Time Dependent Dielectric Breakdown (TDDB)

•  TDDB refers to the destruction of a dielectric (1) at gate oxide or (2) between metal lines

•  Oxide defects accumulate over time
•  Overlapping defects form a conductive path → soft breakdown happens
•  Conduction leads to heat → thermal damage → more defects → more conduction
•  Oxide in the breakdown spots melts
•  Conductive filament is formed
•  Hard breakdown happens

[Figure: oxide cross-sections at successive stages of breakdown.]

NBTI: Negative Bias Temperature Instability

•  Vth varies on PMOS device (PBTI for NMOS)
–  |Vth| increases with negative bias Vgs=-Vdd, i.e., happens when PMOS is ON (but not switching)
–  But recovers with zero bias, Vgs=0
–  Physical mechanism not well understood
–  Obeys power law: $\Delta V_{th} \propto t^n$
   •  n ranges from 0.13 to 2.5
   •  5%-20% delay degradation in 10 years
–  Strong voltage, temperature dependence
–  Degradation is independent of frequency at moderate to high frequencies
–  Degradation rate is steep at the beginning but slows down rapidly
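The "steep at the beginning but slows down rapidly" behavior follows directly from a power law with exponent below 1. The sketch below is illustrative only: the exponent n = 0.16 is an assumed value inside the range quoted on the slide, and the prefactor c and the function names are hypothetical.

```python
def delta_vth(t, n, c=1.0):
    """Power-law NBTI model: delta Vth = c * t^n (c is a hypothetical fitting constant)."""
    return c * t ** n


def fraction_of_final_shift(t, t_end, n):
    """Fraction of the shift at t_end that has already accumulated by time t."""
    return delta_vth(t, n) / delta_vth(t_end, n)


if __name__ == "__main__":
    n = 0.16  # assumed exponent, inside the range quoted on the slide
    for years in (0.1, 1, 5, 10):
        print(years, round(fraction_of_final_shift(years, 10, n), 2))
```

With this exponent, about 69% of the 10-year threshold shift has already accumulated after the first year, since 10^(-0.16) is roughly 0.69.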

IC Structures can be highly Variable

[Figure: variability in IC structures, shown for a transistor and an interconnect. Source: Spanos & Poolla, UCB]

Variability will impact performance

Small Dimensions don’t help!

•  < 100 discrete dopants in channel
•  Channel length < 50 atoms across

Sources of Variability

[Figure: frequency (GHz) vs. core ID (0 to 80) at 1.2V and 0.8V. Annotations: 7.3 GHz, 5.7 GHz, 25%, 50%.]

Frequency variation in an 80-core processor within a single die in Intel's 65nm technology

•  Semiconductor Manufacturing
•  Vendor Differences
•  Ambient Conditions
•  Aging

Device Parameter Variations

•  W, L variations
–  Due to photolithography proximity effect or etching
–  Layout density dependent
–  Location dependent
•  Tox variation
–  Usually well controlled
•  Vth variation
–  Doping fluctuation
–  Stress, WPE, RTA
•  Mobility variation
–  Stress, WPE

[Figure: transistor structure labeling source, drain, poly gate, STI, well, channel length L, width W, and oxide thickness t_ox.]

Interconnect Parameter Variations

•  Line width (W), spacing (S)
–  Due to photolithography proximity effect or etching
–  Layout dependent
–  Location dependent
•  Metal thickness (T)
–  Due to erosion, dishing
–  Layout density dependent
•  Dielectric thickness (H)
–  Due to CMP
•  Dielectric Constant (ro)

[Figure: interconnect cross-section labeling S, W, T, H.]

Taxonomy of Variations

•  Source
–  Process, Vendor: Typically permanent
–  Environment: Typically transient
–  Wearout: Slow but eventually permanent
•  Nature
–  Systematic: metal dishing, litho proximity effects
   •  Predictable, given enough time and models
–  Random: dopant fluctuations, material variations, LER
   •  Either not understood (yet) or truly random
•  Spatial Scale
–  Intra-die: litho proximity, CMP
–  Inter-die: material variations
   •  Includes wafer-to-wafer, lot-to-lot variations

ITRS 2009 Predictions

http://public.itrs.net

UNDERDESIGNED AND OPPORTUNISTIC COMPUTING

The Hardware-Software Interface

[Figure: software stack (applications, operating system, hardware abstraction layer) on top of overdesigned hardware; axis: time or part.]

Variation: 20x in sleep power, 50% in performance

Practice: over-design & guard-banding for illusion of rigidity

Underdesigned and Opportunistic Computing (UnO)

Variability manifestations
- faulty cache bits
- delay variation
- power variation

Sensors & models

Hardware signatures:
- cache bit map
- cpu speed-power map
- memory power
- ALU error rates

Selective use of Hardware Resources: disabling parts of the cache, cores with asymmetric reliability

Quality-Complexity Tradeoffs: codec parameters, iteration control, duty cycling

Alternate Code Paths: multiple algorithm implementations, dynamic recompilation

Do Nothing: elastic user, robust app

E.g., Variability-Aware Duty Cycling

[DATE’11, TVLSI’12]

Duty-Cycled Sensors

[Figure: timeline alternating between sleep and active states.]

DC = f (PA, PS, L, E)
DC: Duty Cycle
PA: Active Power (W)
PS: Sleep Power (W)
L: Lifetime (s)
E: Energy (J)

10% PA Variation
14X PS Variation

10 off-the-shelf ARM Cortex M3 cores in 130nm (Atmel SAM3U)
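The slide does not give the form of f. One plausible instance, shown here as a sketch, assumes a simple energy balance: the average power DC*PA + (1-DC)*PS must equal E/L, which gives DC = (E/L - PS) / (PA - PS). The function name and the numbers in the usage example are hypothetical.

```python
def allowable_duty_cycle(p_active, p_sleep, lifetime_s, energy_j):
    """Largest duty cycle that lets a budget of energy_j last lifetime_s.

    Assumed model: average power = DC*PA + (1-DC)*PS must equal E/L.
    """
    if p_active <= p_sleep:
        raise ValueError("active power must exceed sleep power")
    p_avg = energy_j / lifetime_s
    dc = (p_avg - p_sleep) / (p_active - p_sleep)
    return max(0.0, min(1.0, dc))  # clamp to a valid fraction


if __name__ == "__main__":
    year_s = 365 * 24 * 3600
    # Hypothetical parts: same active power, sleep power differing by 14x.
    for p_sleep in (10e-6, 140e-6):
        print(p_sleep, allowable_duty_cycle(30e-3, p_sleep, year_s, 10_000.0))
```

Because sleep power is subtracted from a small average power budget, a large spread in PS translates into a large spread in the duty cycle each part can afford, which is why a single worst-case number wastes active time on the better parts.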

UnO Duty Cycling (TinyOS)

[Figure: tasks and a duty cycle scheduler. Task labels: Task(pmin, pmax), Task(imin, imax), adaptable tasks, traditional task. The hardware signature supplies PA, PS, ... to the Duty Cycle Scheduler, DC = f (PA, PS, ...), which provides the allowable DC.]

Tasks with knobs to adapt period & computation time

Variability-Aware Duty Cycle Scheduler that maximizes active time with lifetime & battery constraints

Improvement over Worst-Case Duty Cycle

average: 22x improvement in active time over worst-case based duty-cycling

average: 55 days short for one year’s lifetime when using datasheet spec. instead of UnO

MONITORING VARIABILITY

Variability Monitoring: How ?

                    Replica     In-Situ     Online       Software
                    Monitors    Monitors    Self-Test    Inferences
Hardware Cost       Low         High        Medium       Low
Accuracy            Low         High        High         Low
Coverage            Low         High        High         Low
Online Operation    Yes         Yes         Possible     Possible

•  When to sample “hardware signatures”
–  Manufacturing/vendor variation → Once at fabrication
–  Manufacturing variation + aging → Once at “boot up”
–  Manufacturing variation + aging + ambient → Periodically

Need production test + software interface

Need monitors + software interface

Delay Monitors

•  Why?
–  Delay change is a signature for P, V, T, age
•  How?
–  Replicas: how accurate can we make them?
–  In-situ: how cheap can we make them?

DDRO: Smarter Replicas [ISQED’12]

[Figure: each dot represents Δdelay of a critical path under variations. Paths are extracted from an ARM Cortex-M3 core.]

•  Existing work uses inverter-based ROs or a single dedicated RO
–  Some tunable replicas but no direct connection to critical paths in design
•  Critical path delay sensitivities form natural clusters
•  Implications for replica delay monitors
–  Design dependent
–  One monitor per cluster
•  Design-dependent Ring-Oscillator (DDRO)
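The "one monitor per cluster" idea can be illustrated by grouping paths according to their delay sensitivities. The sketch below uses plain k-means as a stand-in; the slides do not say which clustering method the DDRO work uses. The two-dimensional sensitivity vectors in PATHS and all names are hypothetical.

```python
def kmeans(points, k, iters=20):
    """Plain k-means; centroids start from the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every path to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-path sensitivities to two variation sources.
PATHS = [(1.0, 0.2), (0.2, 1.0), (0.9, 0.25), (1.1, 0.15), (0.25, 0.9), (0.15, 1.1)]

if __name__ == "__main__":
    labels, centroids = kmeans(PATHS, k=2)
    print(labels)
    print(centroids)
```

Each centroid is then a target sensitivity profile that one replica monitor would be designed to match.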

DDRO overview

•  Systematic methodology to design multiple DDROs based on clustering
–  ILP to construct ROs from library gates
–  Automated P&R of DDROs
•  Statistical methodology to leverage monitors to estimate chip delay
–  Robust projection of chip delay from RO delays
–  Margin for local variation

Experimental Results

•  Simulation results
–  DDROs can monitor global variation near-perfectly
–  Accuracy floor dictated by local variation
   •  No replica works!
•  45nm testchip
–  4 DDROs
–  14 measured die
•  Significant improvement over conventional ROs

[Figure labels: ARM Cortex-M3, DDRO. Plots: delay margin (%) vs. number of monitors, for global variation only and for global and local variations.]

SlackProbe: In Situ Timing Slack Monitors [DATE’13]

•  In-situ monitors: accurate (can monitor local variation) but with large overhead
•  Rich literature, but it focuses exclusively on destination registers
•  SlackProbe
–  Allow monitors inserted at internal nodes
–  Extra delay margin for unmonitored delay

[Figure labels: Margined, Monitored, Transition Detector, Margin, Matching delay.]

Path and Monitor Location Selection

•  Path selection by opportunism window
–  Defines what is being monitored (aging, process, temperature, ...)
–  Corner (typical, worst)-based selection of paths
   •  Circuit Aging Monitor: Typical = Slow process corner; Worst = Slow process + full aging corner
•  Monitor location selection
–  Consider monitor power, monitor activation rate and ECO cost
–  Formulate and solve as Linear Programming (LP) problem

[Figure: delay of Path 1 to Path 5 against best-case chip delay, typical operating clock period, and worst-case chip delay. Annotations: opportunism window, worst-case design margin, static margin.]
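Corner-based path selection can be sketched as a simple filter. This is one reading of the slide, not the SlackProbe algorithm itself: it assumes the typical operating clock period is set by the slowest path at the typical corner plus a static margin, and that a path needs monitoring if its worst-corner delay exceeds that period. The function name, data layout, and delay values are hypothetical.

```python
def select_monitored_paths(paths, static_margin=0.0):
    """paths maps name -> (typical_corner_delay, worst_corner_delay)."""
    # Assumed: the typical clock period is set by the slowest path at the
    # typical corner plus a static margin.
    period = max(t for t, _ in paths.values()) * (1.0 + static_margin)
    # A path needs monitoring if it can exceed that period at the worst corner.
    selected = sorted(name for name, (_, w) in paths.items() if w > period)
    return period, selected


if __name__ == "__main__":
    # Hypothetical delays in ns at the typical and worst corners.
    example = {
        "path1": (1.00, 1.30),
        "path2": (0.95, 1.20),
        "path3": (0.80, 1.02),
        "path4": (0.70, 0.90),
    }
    print(select_monitored_paths(example, static_margin=0.05))
```

Paths that stay below the typical period even at the worst corner never enter the opportunism window, so they can be left unmonitored.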

SlackProbe Overview

Experimental Results

•  Baseline: insert monitors at all critical path endpoints
•  SlackProbe: insert monitors with 5% delay margin
•  Sub-32nm commercial processor benchmarks

15X-18X reduction in number of monitors!

EMULATING VARIABILITY

How can we evaluate software behavior in the presence of variability?

Binary Instrumentation
Limitations: adds cycles and energy cost to the target code; kernel code typically not supported; native arch. code only

Cycle-Accurate Simulation
Limitations: complexity, speed

VarEMU: Variability Extensions to the QEMU VMM
Fast (binary translation), supports arbitrary emulated code, several target machines & architectures
Open-source; available at https://github.com/nesl/varemu

Cycle and Time Accounting

[Figure: handling of the instruction add r1, r2, r3. Translation time: instruction info (class, # cycles, error status). Execution time: translated ops x, y, z followed by cycle counting.]

From Cycles to Energy

Power = f (V, F, T, instruction class, a, b, c)

[Figure: blocks: Cycle Counters, Emulated Software, External Monitor, Energy. Arrows: accumulate; read, adapt to; change parameters.]
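The step from per-class cycle counters to energy can be sketched as follows. The power model here is purely illustrative and is not VarEMU's actual f: it assumes a dynamic term a*V^2*F with a per-class coefficient a, plus a leakage term b*V*exp(c*T). All function and parameter names are hypothetical.

```python
import math


def power_w(v, f_hz, temp_c, a, b, c):
    """Illustrative power model (NOT VarEMU's actual f): dynamic + leakage."""
    dynamic = a * v * v * f_hz              # switching power for this class
    leakage = b * v * math.exp(c * temp_c)  # temperature-dependent static power
    return dynamic + leakage


def energy_j(cycles_by_class, a_by_class, v, f_hz, temp_c, b, c):
    """Accumulate energy from per-class cycle counters."""
    total = 0.0
    for cls, cycles in cycles_by_class.items():
        seconds = cycles / f_hz  # time spent executing this instruction class
        total += seconds * power_w(v, f_hz, temp_c, a_by_class[cls], b, c)
    return total


if __name__ == "__main__":
    counters = {"alu": 1000, "mem": 500}   # hypothetical cycle counts
    coeffs = {"alu": 1e-9, "mem": 2e-9}    # hypothetical per-class coefficients
    print(energy_j(counters, coeffs, v=1.0, f_hz=1e8, temp_c=25.0, b=1e-4, c=0.02))
```

Keeping counters per instruction class lets the parameters (V, F, T, a, b, c) change at run time without re-executing code: energy is simply re-accumulated from the counters under the new parameters.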

Error Emulation

[Figure: error emulation for the instruction add r1, r2, r3. Instruction info carries error status = PRE | POST | REPLACE. The original ops x, y, z are combined with pre( ), post( ), and replace( ) hooks, under a software controlled global error enable.]

Can call an external (e.g., RTL) simulator for more accurate error emulation in a co-simulation-like model
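The PRE | POST | REPLACE scheme can be mimicked in a few lines. This is a conceptual sketch in Python, not VarEMU code: the flag values, the execute function, and the hook signatures are all hypothetical, chosen only to show how the three hooks and the global enable would interact.

```python
PRE, POST, REPLACE = 1, 2, 4  # error-status flags, combined with bitwise OR


def execute(op, operands, error_status=0, errors_enabled=True,
            pre=None, post=None, replace=None):
    """Run one instruction, applying whichever hooks error_status selects."""
    if not errors_enabled:
        return op(*operands)  # global enable is off: run the original ops
    if error_status & PRE and pre:
        operands = pre(operands)  # perturb inputs before execution
    if error_status & REPLACE and replace:
        result = replace(operands)  # run a substitute instead of the original
    else:
        result = op(*operands)
    if error_status & POST and post:
        result = post(result)  # perturb the result after execution
    return result


if __name__ == "__main__":
    add = lambda a, b: a + b
    flip_lsb = lambda r: r ^ 1  # hypothetical single-bit error on the result
    print(execute(add, (2, 3)))
    print(execute(add, (2, 3), error_status=POST, post=flip_lsb))
```

A replace hook is the natural place to hand the operands to an external simulator, as the slide suggests for more accurate error emulation.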

Aging Emulation

•  Use existing models for NBTI
–  Approximate total aging ΔVth ∝ total active time
   •  Active time → A.C. aging
   •  Clock gating → D.C. aging
   •  Power gating → Recovery
•  Delay as alpha power law model
•  Calibrated to a commercial 45nm process
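How an aging-induced threshold shift turns into a delay change can be sketched with the alpha-power law, delay ∝ Vdd / (Vdd - Vth)^α. Everything numeric below is an assumption for illustration, not the calibrated 45nm model: the stress-time weighting (A.C. stress counted at half weight, power-gated time adding no stress and recovery not modeled), the constants c, n, and α, and the function names.

```python
def delta_vth(active_s, clock_gated_s, c=1e-3, n=0.16, ac_factor=0.5):
    """Hypothetical NBTI shift in volts; power-gated time adds no stress here."""
    # A.C. stress (active time) is counted at reduced weight versus D.C. stress.
    stress_s = ac_factor * active_s + clock_gated_s
    return c * stress_s ** n


def delay_factor(vdd, vth0, dvth, alpha=1.3):
    """Alpha-power law: delay ~ Vdd / (Vdd - Vth)^alpha, relative to fresh."""
    fresh = vdd / (vdd - vth0) ** alpha
    aged = vdd / (vdd - (vth0 + dvth)) ** alpha
    return aged / fresh


if __name__ == "__main__":
    year_s = 365 * 24 * 3600
    dvth = delta_vth(active_s=0.5 * year_s, clock_gated_s=0.25 * year_s)
    print(dvth, delay_factor(vdd=1.0, vth0=0.3, dvth=dvth))
```

The delay factor grows as Vth approaches Vdd, so the same threshold shift hurts more at low supply voltage.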

Interaction With Emulated Software

VarEMU Error Model: intercepts instruction execution and inserts errors
- @I/D Memory location
- @Instruction decoding
- @Instruction execution

App, OS: enable(), disable()

Error status is part of process context

Error Model Parameters: memory locations, probability of error, error magnitude, ...

Cycle & Energy Counters: read()

Power Model Parameters: Voltage, Frequency, A, B, C

Aging Model Parameters: power gated time, clock gated time