Puneet Gupta ([email protected])
Life Time of a System
• More transistors + high power density/temperature + smaller dimensions + unscaled voltage → high failure/wearout rate
• Failure models are usually not accurate
[Figure: overall life characteristics, failure rate vs. time over the system lifetime. Regions: I infant mortality, II useful life, III aging. Curves: quality failures, operation-related failures, aging (wearout), and the overall characteristic.]
[B. E. Hegler, Potential 1988]
Electromigration
• Atom flux induced in metal traces by high current densities → metal atoms experience a mechanical force and get dislodged from their position → formation of metal voids in the conductor, which eventually result in electrical opens.
  – Cu is more resistant than Al, BUT shrinking dimensions + increasing current density → ~50% lifetime degradation per technology generation
[Figure: metal line BEFORE and AFTER electromigration]
Black's law:

MTTF_{EM} = A_0 (J - J_{crit})^{-n} e^{E_a / (kT)}

  A_0: constant
  J: current density
  J_crit: threshold current density
  E_a: 0.7 eV for aluminum
  k: Boltzmann's constant (8.62*10^-5 eV/K)
  n: 2 (Black's law)

MTTF: Switching activity vs. Temp

  AF \ T   100     105     110
  0.2      13.33   10      7.55
  0.3      5.93    4.44    3.36
  0.4      3.33    2.5     1.89
  0.5      2.13    1.6     1.21
  0.6      1.48    1.11    0.84
  0.7      1.09    0.82    0.62
  0.8      0.83    0.63    0.47
  0.9      0.66    0.49    0.37
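The table's trends can be reproduced from Black's law. The sketch below is illustrative only and rests on assumptions that the slide does not state: current density J is taken as proportional to the activity factor AF, J_crit is set to 0, T is read as degrees Celsius, and values are normalized so that (AF = 0.2, T = 105) gives 10. The function name and the normalization point are hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.62e-5  # eV/K, value given on the slide
E_A_EV = 0.7              # activation energy for aluminum, from the slide
N_EXP = 2                 # current-density exponent in Black's law


def relative_mttf(af, temp_c, ref_af=0.2, ref_temp_c=105.0, ref_value=10.0):
    """MTTF relative to a reference point.

    Assumptions (not from the slide): J is proportional to AF, J_crit = 0,
    and temperatures are in degrees Celsius.
    """
    t = temp_c + 273.15
    t_ref = ref_temp_c + 273.15
    # (J / J_ref)^(-n): higher switching activity shortens lifetime.
    current_term = (af / ref_af) ** (-N_EXP)
    # exp(E_a/(k T)) / exp(E_a/(k T_ref)): higher temperature shortens lifetime.
    thermal_term = math.exp(E_A_EV / K_BOLTZMANN_EV * (1.0 / t - 1.0 / t_ref))
    return ref_value * current_term * thermal_term


if __name__ == "__main__":
    for af in (0.2, 0.4, 0.8):
        row = [round(relative_mttf(af, t), 2) for t in (100, 105, 110)]
        print(af, row)
```

Under these assumptions, doubling AF cuts the MTTF by 4x at a fixed temperature, which matches the 10 → 2.5 step between AF = 0.2 and AF = 0.4 in the 105 column.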
Hot Carrier Injection (HCI)
• Hot carriers produced at the drain end of the transistor collide with lattice atoms → a few get trapped at defect sites within the oxide and act as fixed oxide charge → increase in Vth
• Device dimensions shrink linearly but operating voltage does not → stronger electric field inside the MOS device → worse HCI
• Happens during switching
Time Dependent Dielectric Breakdown (TDDB)
• TDDB refers to the destruction of a dielectric (1) at gate oxide or (2) between metal lines
• Oxide defects accumulate over time
• Overlapping defects form a conductive path → soft breakdown happens
• Conduction leads to heat → thermal damage → more defects → more conduction
• Oxide in the breakdown spots melts
• Conductive filament is formed
• Hard breakdown happens
NBTI: Negative Bias Temperature Instability
• Vth varies on PMOS devices (PBTI for NMOS)
  – |Vth| increases with negative bias, Vgs = -Vdd, i.e., happens when PMOS is ON (but not switching)
  – But recovers with zero bias, Vgs = 0
  – Physical mechanism not well understood
  – Obeys power law: \Delta V_{th} \propto t^n
    • n ranges from 0.13 to 2.5
    • 5%-20% delay degradation in 10 years
  – Strong voltage, temperature dependence
  – Degradation is independent of frequency at moderate to high frequencies
  – Degradation rate is steep at the beginning but slows down rapidly
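The "steep at the beginning but slows down rapidly" behavior follows directly from a power law with a small exponent. A minimal sketch, assuming an exponent of n = 0.16 (a hypothetical value picked from the low end of the quoted range) and a 10-year horizon; the function name is illustrative:

```python
def nbti_fraction(t_years, horizon_years=10.0, n=0.16):
    """Fraction of the horizon's Vth shift already reached at time t.

    Uses only the proportionality dVth ~ t^n, so the unknown prefactor
    cancels out. n = 0.16 is an assumed, illustrative exponent.
    """
    if t_years < 0 or horizon_years <= 0:
        raise ValueError("times must be non-negative, horizon positive")
    return (t_years / horizon_years) ** n


if __name__ == "__main__":
    for years in (0.1, 1.0, 5.0, 10.0):
        print(years, round(nbti_fraction(years), 3))
```

With this exponent, roughly 69% of the 10-year shift has already accumulated after the first year, which is why early-life measurements dominate NBTI guard-banding.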
IC Structures can be highly Variable
[Figure: examples of structural variability in a transistor and in interconnect. Source: Spanos & Poolla, UCB]
Variability will impact performance
Small Dimensions don’t help!
• < 100 discrete dopants in channel
• Channel length < 50 atoms across
Sources of Variability
[Figure: frequency (GHz) vs. core ID (0 to 80) at 1.2V and 0.8V; annotations: 7.3 GHz, 5.7 GHz, 25%, 50%. Caption: frequency variation in an 80-core processor within a single die in Intel's 65nm technology]
• Semiconductor manufacturing
• Vendor differences
• Ambient conditions
• Aging
Device Parameter Variations
• W, L variations
  – Due to photolithography proximity effect or etching
  – Layout density dependent
  – Location dependent
• Tox variation
  – Usually well controlled
• Vth variation
  – Doping fluctuation
  – Stress, WPE, RTA
• Mobility variation
  – Stress, WPE
[Figure: transistor cross-section labeling source, drain, poly gate, STI, well, channel, L, W, and tox]
Interconnect Parameter Variations
• Line width (w), spacing (s)
  – Due to photolithography proximity effect or etching
  – Layout dependent
  – Location dependent
• Metal thickness (T)
  – Due to erosion, dishing
  – Layout density dependent
• Dielectric thickness (H)
  – Due to CMP
• Dielectric constant (ro)
[Figure: interconnect cross-section labeling S, W, T, and H]
Taxonomy of Variations
• Source
  – Process, Vendor: Typically permanent
  – Environment: Typically transient
  – Wearout: Slow but eventually permanent
• Nature
  – Systematic: metal dishing, litho proximity effects
    • Predictable, given enough time and models
  – Random: dopant fluctuations, material variations, LER
    • Either not understood (yet) or truly random
• Spatial Scale
  – Intra-die: litho proximity, CMP
  – Inter-die: material variations
    • Includes wafer-to-wafer, lot-to-lot variations
The Hardware-Software Interface
[Figure: software stack (applications, operating system, hardware abstraction layer) on top of overdesigned hardware; horizontal axis: time or part]
• Variation: 20x in sleep power, 50% in performance
• Practice: over-design & guard-banding for illusion of rigidity
Underdesigned and Opportunistic Computing (UnO)
• Variability manifestations
  – Faulty cache bits
  – Delay variation
  – Power variation
• Sensors & models
• Hardware signatures
  – Cache bit map
  – CPU speed-power map
  – Memory power
  – ALU error rates
• Selective use of hardware resources: disabling parts of the cache, cores with asymmetric reliability
• Quality-complexity tradeoffs: codec parameters, iteration control, duty cycling
• Alternate code paths: multiple algorithm implementations, dynamic recompilation
• Do nothing: elastic user, robust app
E.g., Variability-Aware Duty Cycling
Duty-Cycled Sensors [DATE’11, TVLSI’12]
[Figure: timeline alternating between sleep and active states]
DC = f(PA, PS, L, E)
  DC: Duty Cycle
  PA: Active Power (W)
  PS: Sleep Power (W)
  L: Lifetime (s)
  E: Energy (J)
• 10% PA variation
• 14X PS variation
• 10 off-the-shelf ARM Cortex M3 cores in 130nm (Atmel SAM3U)
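The slide leaves f unspecified. One plausible form, offered here as an assumption rather than the authors' formula, comes from a simple energy balance: the average power DC*PA + (1 - DC)*PS must not exceed the budget E/L, which gives DC = (E/L - PS) / (PA - PS). The sketch below implements that balance; the function name and the sample numbers are hypothetical.

```python
def allowable_duty_cycle(pa_w, ps_w, lifetime_s, energy_j):
    """Largest duty cycle that still meets the lifetime target.

    Assumed energy balance (not taken from the slide):
        DC * PA + (1 - DC) * PS = E / L
    The result is clamped to the valid range [0, 1].
    """
    if pa_w <= ps_w:
        raise ValueError("active power must exceed sleep power")
    if lifetime_s <= 0:
        raise ValueError("lifetime must be positive")
    budget_w = energy_j / lifetime_s          # average power budget
    dc = (budget_w - ps_w) / (pa_w - ps_w)
    return max(0.0, min(1.0, dc))


if __name__ == "__main__":
    # Hypothetical parts: same active power, sleep power differs by 14x.
    for ps in (0.0005, 0.007):
        print(ps, allowable_duty_cycle(pa_w=0.1, ps_w=ps,
                                       lifetime_s=1000.0, energy_j=10.0))
```

Because the sleep power sits in the numerator next to a small budget, a 14X spread in PS changes the allowable duty cycle far more than a 10% spread in PA, which is the motivation for reading PS from a per-part hardware signature instead of a worst-case datasheet value.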
UnO Duty Cycling (TinyOS)
[Figure: adaptable tasks, Task(pmin, pmax) and Task(imin, imax), and traditional tasks run on a duty cycle scheduler, DC = f(PA, PS, ...); the hardware signature supplies PA, PS, ... and the scheduler yields the allowable DC]
• Tasks with knobs to adapt period & computation time
• Variability-aware duty cycle scheduler that maximizes active time with lifetime & battery constraints
Improvement over Worst-Case Duty Cycle
average: 22x improvement in active time over worst-case based duty-cycling
average: 55 days short for one year’s lifetime when using datasheet spec. instead of UnO
Variability Monitoring: How ?
                   Replica    In-Situ    Online      Software
                   Monitors   Monitors   Self-Test   Inferences
Hardware Cost      Low        High       Medium      Low
Accuracy           Low        High       High        Low
Coverage           Low        High       High        Low
Online Operation   Yes        Yes        Possible    Possible

• When to sample “hardware signatures”
  – Manufacturing/vendor variation → once at fabrication
  – Manufacturing variation + aging → once at “boot up”
  – Manufacturing variation + aging + ambient → periodically
• Need production test + software interface
• Need monitors + software interface
Delay Monitors
• Why?
  – Delay change is a signature for P, V, T, age
• How?
  – Replicas: how accurate can we make them?
  – In-situ: how cheap can we make them?
DDRO: Smarter Replicas [ISQED’12]
[Figure: each dot represents Δdelay of a critical path under variations. Paths are extracted from an ARM Cortex-M3 core]
• Existing works use inverter-based RO or a single dedicated RO
  – Some tunable replicas, but no direct connection to critical paths in design
• Critical path delay sensitivities form natural clusters
• Implications for replica delay monitors
  – Design dependent
  – One monitor per cluster
• Design-dependent Ring-Oscillator (DDRO)
DDRO overview
• Systematic methodology to design multiple DDROs based on clustering
  – ILP to construct ROs from library gates
  – Automated P&R of DDROs
• Statistical methodology to leverage monitors to estimate chip delay
  – Robust projection of chip delay from RO delays
  – Margin for local variation
Experimental Results
• Simulation results
  – DDROs can monitor global variation near-perfectly
  – Accuracy floor dictated by local variation
    • No replica works!
• 45nm testchip
  – 4 DDROs
  – 14 measured die
• Significant improvement over conventional ROs
[Figure: delay margin (%) vs. number of monitors, in two panels: global variation only; global and local variations]
[Figure: testchip with the ARM Cortex-M3 and DDRO labeled]
SlackProbe: In Situ Timing Slack Monitors [DATE’13]
• In-situ monitors: accurate (can monitor local variation) but with large overhead
• Rich literature, but it focuses exclusively on destination registers
• SlackProbe
  – Allows monitors to be inserted at internal nodes
  – Extra delay margin for unmonitored delay
[Figure: SlackProbe monitor schematic; labels: margined, monitored, transition detector, margin, matching delay]
Path and Monitor Location Selection
• Path selection by opportunism window
  – Defines what is being monitored (aging, process, temperature, ...)
  – Corner (typical, worst)-based selection of paths
    • Circuit aging monitor: typical = slow process corner; worst = slow process + full aging corner
• Monitor location selection
  – Consider monitor power, monitor activation rate and ECO cost
  – Formulate and solve as a Linear Programming (LP) problem
[Figure: delays of Paths 1 to 5 relative to the best-case chip delay, the typical operating clock period, and the worst-case chip delay; annotations: opportunism window, static margin, worst-case design margin]
Experimental Results
• Baseline: insert monitors at all critical path endpoints
• SlackProbe: insert monitors with 5% delay margin
• Sub-32nm commercial processor benchmarks
• 15X-18X reduction in #monitors!
How can we evaluate software behavior in the presence of variability?
• Binary Instrumentation
  – Limitations: adds cycles and energy cost to the target code; kernel code typically not supported; native arch. code only
• Cycle-Accurate Simulation
  – Limitations: complexity, speed
• VarEMU: Variability Extensions to the QEMU VMM
  – Fast (binary translation), supports arbitrary emulated code, several target machines & architectures
  – Open-source; available at https://github.com/nesl/varemu
Cycle and Time Accounting
[Figure: an instruction such as add r1, r2, r3 is split across translation time and execution time into translated ops (x, y, z) and instruction info (cycle counting class, # cycles, error status)]
From Cycles to Energy
Power = f(V, F, T, instruction class, a, b, c)
[Figure: blocks for cycle counters, emulated software, external monitor, and energy; arrows labeled: change parameters, accumulate, read, adapt to]
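To make the cycles-to-energy step concrete, the sketch below accumulates energy from per-class cycle counters. The slide does not give the form of f, so the power model here (a dynamic term a*V^2*F plus a leakage term b*V*exp(c*T)) is purely a hypothetical stand-in, as are the function names and the class labels.

```python
import math


def power_w(v, f_hz, temp_c, a, b, c):
    """Hypothetical power model (NOT from the slide):
    dynamic a*V^2*F plus leakage b*V*exp(c*T)."""
    return a * v * v * f_hz + b * v * math.exp(c * temp_c)


def energy_j(cycles_by_class, coeffs_by_class, v, f_hz, temp_c):
    """Accumulate energy over instruction classes.

    Each class contributes (cycles / F) seconds at that class's power.
    coeffs_by_class maps a class name to its (a, b, c) fitting parameters.
    """
    total = 0.0
    for cls, cycles in cycles_by_class.items():
        a, b, c = coeffs_by_class[cls]
        total += (cycles / f_hz) * power_w(v, f_hz, temp_c, a, b, c)
    return total


if __name__ == "__main__":
    counters = {"alu": 2_000_000, "mem": 500_000}          # illustrative counts
    coeffs = {"alu": (1e-9, 1e-4, 0.01), "mem": (2e-9, 1e-4, 0.01)}
    print(energy_j(counters, coeffs, v=1.0, f_hz=1e6, temp_c=25.0))
```

Changing V, F, T, or the per-class coefficients between reads plays the role of the "change parameters" arrow in the figure, while the running total corresponds to "accumulate".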
Error Emulation
[Figure: the instruction info for add r1, r2, r3 carries error status = PRE | POST | REPLACE; the original ops (x, y, z) are shown with pre( ), post( ), and replace( ) hooks]
• Software controlled global error enable
• Can call an external (e.g., RTL) simulator for more accurate error emulation in a co-simulation-like model
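The PRE | POST | REPLACE status can be read as a set of flags that select which hooks wrap an instruction's original ops. The following is a minimal sketch of that dispatch in Python, not VarEMU's actual code: the names ErrorStatus and execute, the hook signatures, and the hook dictionary are all hypothetical.

```python
from enum import IntFlag


class ErrorStatus(IntFlag):
    NONE = 0
    PRE = 1      # perturb operands before the original ops
    POST = 2     # perturb the result after the original ops
    REPLACE = 4  # substitute the original ops entirely


def execute(original_op, operands, status, hooks, errors_enabled=True):
    """Run one emulated instruction, applying the hooks selected by status.

    hooks is a dict with callables under "pre", "post", and "replace".
    errors_enabled models the software-controlled global error enable.
    """
    if not errors_enabled or status == ErrorStatus.NONE:
        return original_op(*operands)
    if status & ErrorStatus.PRE:
        operands = hooks["pre"](operands)
    if status & ErrorStatus.REPLACE:
        result = hooks["replace"](operands)
    else:
        result = original_op(*operands)
    if status & ErrorStatus.POST:
        result = hooks["post"](result)
    return result


if __name__ == "__main__":
    demo_hooks = {
        "pre": lambda ops: (ops[0], ops[1] + 1),  # corrupt second operand
        "post": lambda r: r ^ 1,                  # flip the low result bit
        "replace": lambda ops: 0,                 # stand-in for an external model
    }
    add = lambda a, b: a + b
    print(execute(add, (2, 3), ErrorStatus.POST, demo_hooks))
```

A replace hook is the natural place to hand the operands to an external (e.g., RTL) simulator, since it bypasses the original ops altogether.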
Aging Emulation
• Use existing models for NBTI
  – Approximate total aging: ΔVth ∝ total active time
    • Active time → A.C. aging
    • Clock gating → D.C. aging
    • Power gating → recovery
• Delay as alpha power law model
• Calibrated to a commercial 45nm process
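A rough sketch of how a Vth shift turns into a delay change under the alpha power law, where delay is proportional to Vdd / (Vdd - Vth)^alpha. The linear stress model, the rate constants, alpha = 1.3, and the function names are assumptions for illustration; recovery during power gating is left out, and none of the numbers reflect the calibrated 45nm model.

```python
def delta_vth(active_s, clock_gated_s, rate_ac, rate_dc):
    """Assumed linear aging: A.C. stress while active, D.C. stress while
    clock gated. Rates are in volts per second and are hypothetical."""
    return rate_ac * active_s + rate_dc * clock_gated_s


def delay_scale(vdd, vth0, dvth, alpha=1.3):
    """Aged delay relative to fresh delay under the alpha power law.

    delay ~ Vdd / (Vdd - Vth)^alpha, so at fixed Vdd the ratio is
    ((Vdd - Vth0) / (Vdd - Vth0 - dVth))^alpha.
    """
    overdrive_fresh = vdd - vth0
    overdrive_aged = vdd - vth0 - dvth
    if overdrive_fresh <= 0 or overdrive_aged <= 0:
        raise ValueError("gate overdrive must stay positive")
    return (overdrive_fresh / overdrive_aged) ** alpha


if __name__ == "__main__":
    shift = delta_vth(active_s=100.0, clock_gated_s=50.0,
                      rate_ac=1e-4, rate_dc=2e-4)
    print(shift, delay_scale(vdd=1.0, vth0=0.3, dvth=shift))
```

Lower supply voltages shrink the gate overdrive, so the same Vth shift produces a larger relative slowdown, consistent with the strong voltage dependence noted for NBTI.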
Interaction With Emulated Software
• VarEMU error model: intercepts instruction execution and inserts errors
  – @ I/D memory location
  – @ instruction decoding
  – @ instruction execution
• App / OS interface: enable(), disable()
  – Error status is part of process context
• Error model parameters: memory locations, probability of error, error magnitude, ...
• Cycle & energy counters: read()
• Power model parameters: voltage, frequency, A, B, C
• Aging model parameters: power gated time, clock gated time