EE241 - Spring 2005: Advanced Digital Integrated Circuits
Lecture 17: Process Variations

Variability Sources
Physical
• Changes in characteristics of devices and wires.
• Caused by IC manufacturing process and wear-out (electromigration).
• Time scale: 10^9 sec (years).
Environmental
• Changes in VDD, temperature, local coupling.
• Caused by the specifics of the design implementation.
• Time scale: 10^-6 to 10^-9 sec (clock tick).
Process Variations
Control of minimum features does not track feature scaling
• Relative device/interconnect variations increase
Sources:
• Random dopant fluctuations
• Feature size, oxide thickness variations
Effects:
• Speed
• Power, primarily leakage
• Yield
Increasing Process Variations
• Increase in variation of process parameters with scaling
• Worst-case design getting more expensive
• "Better than worst-case" design must be error tolerant
[Figure: percentage of total variation accounted for by within-die variation (device and interconnect). Original source: Sani Nassif, IBM]
Vt Distribution
[Figure: histogram of number of chips vs. ∆VTn (mV), from -39.71 to 32.49; 0.18 micron, ~1000 samples; spread of ~30 mV. Regions marked: low frequency/low Isb, high frequency/medium Isb, high frequency/high Isb]
Sources of Variations
[Figure: Random Dopant Fluctuations. Mean number of dopant atoms (log scale, 10 to 10000) vs. technology node (1000 nm down to 32 nm)]
[Figure: Sub-wavelength Lithography. Lithography wavelength (365 nm, 248 nm, 193 nm, 13 nm EUV) and technology generation (180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm) vs. year (1980 to 2020), with the gap between them marked. Source: Mark Bohr, Intel]
Achieving Sub-wavelength Resolution
Causes Larger Frequency Distribution
Courtesy Intel
Frequency & SD Leakage
[Figure: normalized frequency (0.9 to 1.4) vs. normalized leakage Isb (0 to 20); 0.18 micron, ~1000 samples. Annotations: 20X spread in leakage, 30% spread in frequency. Regions marked: low frequency/low Isb, high frequency/medium Isb, high frequency/high Isb]
Variation-tolerant Design
Balance power & frequency with variation tolerance
[Figure: four panels with x-axes: number of uArch critical paths (less to more); logic depth (small to large); transistor size (small to large); low-Vt usage (low to high). Curves labeled frequency, power and target frequency probability]
Approaches
Worst-case design
• Leaves too many crumbs on the table. Huge concurrency overhead for performance.
Regular design strategies to reduce variation
Careful choice of logic styles
Self-adapting design
• Turns on-line knobs (Vdd, Vt) to guarantee operation of the design. Uses one-time correction for systematic errors.
Alternative timing approaches
Self-timed or clockless design
• Defers the decisions to the system level. Comes with large overhead.
Pseudo-synchronous design (e.g. Razor)
• Allows for occasional timing errors. Limited operation range.
30 March 2005 Slide 20
http://www.c2s2.org
Problem: Predictability ≈ (Chip Variability)^-1
Std library abstractions break: they don't "hide" the details anymore, as we scale down
[Figure: local printability problems (defocus effect, exposure variation, resist effect; shapes shrink or grow) and global effects (Cu thickness distribution, Cu thickness histogram)]
• Demise of context-free layout design rules
• Correlated random variations hit ckt level
Yesterday's Freelance Layout
[Figure: cell layouts with Vdd and Vss rails, input Ip and output Op]
No layout restrictions
Transistor Orientation Restrictions
[Figure: cell layouts with Vdd and Vss rails, input Ip and output Op]
Transistor orientation restricted to improve manufacturing control
Transistor Width Quantization
[Figure: cell layouts with Vdd and Vss rails, input Ip and output Op]
"Fabrics" Idea: Atomic Regularity (Make the Variability Small…Everywhere)
• Starting from basic manufacturing: shapes, circuits, logic, routing; everything is extremely regular
• Means radical re-architecting of flows
• How much predictability? At what cost?
• Initial motivation was "what's after ASICs", now more generally aimed at "predictability"
[Figure: path from today's designs to tomorrow's designs, with labels Regular Circuits, Regular Geometry Fabric, Regular/Structured Integrated System]
ASPDAC, Jan. 2005
Regular Fabrics – A Plethora of Choices
• FPGA
• VPGA (CMU)
• River PLA (Berkeley)
• Structured ASIC (e.g. LSI RapidChip)
Trade-off between area, performance, power and time-to-market (factors 5 to 10)
Fabric Architectures: Via Patterned Gate Array
• Configurable with 4 masks for top vias
• Base architecture can be like an FPGA, but replace expensive switches with mask-configured vias
• Many possible interconnect options: std cell routing, or fully regular top-level patterned routing
• Array offers fully predictable geom. patterning

Network switch (80k gates), VPGA regular logic + standard cell routing:
Flow                       Area (um2)   Slack (ns)
Standard ASIC flow         1752048      -2.521
Regular Logic VPGA flow    1960000      -2.982
[Pileggi, CMU]
Fabric Analysis: Enhanced Manufacturability for Regular Ckt Fabrics
• Reduced CMP effects
• Copper dishing < 40 Å
• Post-CMP copper thickness variation is less than 2-3%
• Highly promising as a manufacturable 'logic' replacement structure
[Figure panels for the CMU VPGA FPU, metal 4 (M4): M4 density, Cu dishing, final post-CMP Cu thickness, plated thickness, oxide erosion]
[Boning, MIT; Pileggi, CMU]
Fabric-level Custom Circuit Design: Limited-Switch Dynamic Logic (LSDL)
• Merges latch with every output stage
• Speed of domino, with less power
Experiment: 16-bit Kogge-Stone adders, full domino vs. LSDL
• 58 extracted 0.18um fab run models
• Monte-Carlo for chip-to-chip & mismatch
LSDL: good for size & speed
• ~20% less area; ~2X faster
LSDL: good for regularity, var tolerance
• Cells more regular in content and size. Less variation in pattern density.
• More tolerant of manufacturing variation
[Figure: poly and M1 pattern density for Domino vs. LSDL [Montoye 03]; interchip + mismatch variation for Domino vs. LSDL]
[Boning, MIT]
Fabric-Level Flow Design: "Regularizing" Cell-Based Flows
Complementary approach: regularize a library-based flow
• Every cell is identical, except for vias
• Regularity issues handled by cell layout generators, exclusively
• Extends lifetime of existing flows
Looking at impacts on performance
• Ex: granularity of available cell sizes (i.e. library size) is reduced
[Sechen, Washington]
2 Dec 2004, Slide 34, LSI Logic Confidential © LSI Logic 2004
RapidChip® Platform ASIC
Configurable Platform
• Families of pre-manufactured slices
• Sea of transistors for high density, high performance user-configurable logic
• Up to 5 layers of metal personalization
• Flexible approach to IP:
  • Diffused only when performance dictates, e.g. high-speed SerDes
  • On-demand for most other IPs, e.g. processors
• Rich portfolio of soft IPs available
Today's Reconfigurable FPGA Platform (Xilinx)
• PowerPC™ processor, 400+ MHz
• Programmable IO
• 10 Mbit Dual-Port™ RAM
• High-speed 3.125 Gbps serial transceivers
• >500 DSP datapaths
• 10 million gates
[Figure labels: 18 Bit, 18 Bit, 36 Bit; VCCIO, Z, Impedance Control]
Delay and Power Variability in CMOS
Goal: Investigate the effects of variations in Vth, Lpoly, W, tox and Vdd on the performance of a family of representative circuits.
• Quantify the statistical variability of circuit delay and power (active).
• Identify single parameter contributions to overall variability levels.
Circuits under study:
• NAND chain (six stages)
• Adders (16 bits, various architectures)
• Logic styles: Static, Dynamic Domino, Passgate
• All transistor sizes optimized for minimum delay under an area constraint
Experimental setup:
• 90nm, PD-SOI technology
• Industrial research site
• All parameter distributions set by predictive BSIMSOI models, ITRS (2003)
Monte Carlo Simulation I
Goal I: Vary all parameters simultaneously; study the statistical variability of power and delay.
• Variable parameters: Vth, Lpoly, W, tox, Vdd (mean value of Vdd: 1V)
• Temperature held at 85˚C
• Interdependencies between parameters reconciled within the simulation
• N = 200 for adders, N = 1000 for NANDs
The spatial correlation coefficient ρ defines parameter matching between adjacent transistors
• Each parameter is assigned identically to all transistors within each circuit instance
• ρ is set to 1, which indicates perfect correlation (worst case)
Flow:
1. Choose circuit under study
2. Draw all parameter values randomly from their respective distributions
3. Apply parameters to circuit to create a specific instance
4. Submit circuit instance to SPICE simulation; measure active power and delay
5. Repeat steps 2-4 N times
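The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nominal values, the relative sigmas, the Gaussian distributions and the `simulate` function (an alpha-power-law delay model standing in for the SPICE run) are all hypothetical and do not come from the lecture, which uses predictive BSIMSOI models. Energy per transition is used as a proxy for active power. One draw is applied to the whole circuit instance, which corresponds to ρ = 1.

```python
import random
import statistics

# Hypothetical nominal values and relative sigmas (assumptions for illustration).
NOMINAL = {"vth": 0.30, "lpoly": 65e-9, "w": 1e-6, "tox": 1.6e-9, "vdd": 1.0}
REL_SIGMA = {"vth": 0.10, "lpoly": 0.04, "w": 0.03, "tox": 0.03, "vdd": 0.03}
EPS_OX = 3.45e-11  # F/m, permittivity of SiO2
C_LOAD = 10e-15    # F, static capacitive load
ALPHA = 1.3        # assumed alpha-power-law exponent


def draw_all(rng):
    """Draw every parameter from its (assumed Gaussian) distribution."""
    return {k: rng.gauss(v, REL_SIGMA[k] * v) for k, v in NOMINAL.items()}


def simulate(p):
    """Stand-in for the SPICE run: returns (delay, energy) for one instance."""
    c_total = C_LOAD + EPS_OX * p["w"] * p["lpoly"] / p["tox"]
    drive = (p["w"] / p["lpoly"]) / p["tox"] * (p["vdd"] - p["vth"]) ** ALPHA
    delay = c_total * p["vdd"] / drive  # arbitrary units
    energy = c_total * p["vdd"] ** 2    # joules per transition
    return delay, energy


def variability(samples):
    """Normalized variability: sigma / mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


def monte_carlo(n, seed=0):
    """Repeat draw -> apply -> simulate N times; return (delay, energy) sigma/mean."""
    rng = random.Random(seed)
    results = [simulate(draw_all(rng)) for _ in range(n)]
    delays = [d for d, _ in results]
    energies = [e for _, e in results]
    return variability(delays), variability(energies)


if __name__ == "__main__":
    delay_cv, energy_cv = monte_carlo(1000)  # N = 1000, as for the NAND chains
    print(f"delay sigma/mean = {delay_cv:.3f}, energy sigma/mean = {energy_cv:.3f}")
```

Swapping `simulate` for a call into a real circuit simulator leaves the rest of the loop unchanged.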
Interdependencies Between Parameters
The operating value of Vth is composed of its long-channel value Vth0 modified by ∆Vth factors (BSIMSOI model):

$$V_{th,OPERATING} \cong V_{th0}(N_{channel}, \Phi_M, \Phi_S, t_{ox}) + \Delta V_{th,BIAS}(V_{bs}) - \Delta V_{th,DIBL}(V_{ds}, L) + \Delta V_{th,HALO}(N_{halo}, L) + \Delta V_{th,NarrowWidth}(W)$$

Interdependencies between parameters are reconciled within each simulation by separating Vth,OPERATING into independent and dependent components.
[Figure: Vth vs. L, showing Vth,OPERATING, Vth0, ∆Vth,HALO, ∆Vth,DIBL and ∆Vth,BIAS]
Monte Carlo Simulation II
Goal II: Isolate individual parameter contributions to overall power/delay variability
• Parameter distributions same as in previous setup
• Again, perfect spatial correlation of parameters is assumed (ρ = 1)
Flow:
1. Choose circuit under study
2. Draw one parameter value randomly from its distribution; hold the others at nominal
3. Apply parameters to circuit to create a specific instance
4. Submit circuit instance to SPICE simulation; measure active power and delay
5. Repeat steps 2-4 N times
6. Repeat steps 2-5 for all p parameters
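A minimal sketch of this one-parameter-at-a-time flow follows. As before, the nominal values, sigmas and the `delay` model (an alpha-power-law stand-in for SPICE driving a fixed load) are assumptions made for illustration. Normalizing each parameter's sigma/mean by the sum over all parameters is one plausible way to report a "contribution"; the lecture does not state its exact normalization.

```python
import random
import statistics

# Hypothetical nominal values and relative sigmas (assumptions for illustration).
NOMINAL = {"vth": 0.30, "lpoly": 65e-9, "w": 1e-6, "tox": 1.6e-9, "vdd": 1.0}
REL_SIGMA = {"vth": 0.10, "lpoly": 0.04, "w": 0.03, "tox": 0.03, "vdd": 0.03}
ALPHA = 1.3  # assumed alpha-power-law exponent


def delay(p):
    """Stand-in for SPICE: delay of a gate driving a fixed load (arbitrary units)."""
    drive = (p["w"] / p["lpoly"]) / p["tox"] * (p["vdd"] - p["vth"]) ** ALPHA
    return p["vdd"] / drive


def contributions(n=500, seed=0):
    """Vary one parameter at a time and report its share of delay variability."""
    rng = random.Random(seed)
    spread = {}
    for name in NOMINAL:                 # repeat for all p parameters
        samples = []
        for _ in range(n):               # repeat N times
            p = dict(NOMINAL)            # hold the others at nominal
            p[name] = rng.gauss(NOMINAL[name], REL_SIGMA[name] * NOMINAL[name])
            samples.append(delay(p))
        spread[name] = statistics.stdev(samples) / statistics.mean(samples)
    total = sum(spread.values())
    return {name: s / total for name, s in spread.items()}


if __name__ == "__main__":
    for name, share in contributions().items():
        print(f"{name:6s} {share:6.1%}")
```

Because the parameters are varied independently, this breakdown ignores interactions between them; it ranks sensitivities rather than decomposing the joint variance exactly.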
NAND Chains (6 stages)
[Figure: three-input NAND schematics (inputs a, b, c; output out) in four logic styles: Static CMOS, Static Passgate (LEAP), Pulsed Static, Dynamic Domino]
Two load conditions:
• Static capacitive load, CL = 10fF
• Active, FO3 load (value varies with parameter fluctuations)
Adders
Ripple carry with Manchester carry chain (passgate-based)
• Static
• Dynamic
Carry select, logarithmic configuration
• Static, Dynamic Domino, Passgate
[Figure: 16-bit carry-select adder with blocks bit0, bit1-bit2, bit3-bit5, bit6-bit9 and bit10-bit15; carries Cin, Cout0, Cout2, Cout5, Cout9, Cout15; stages labeled Cout generation, bit-level Cout and Sum selection, block-level Cout selection]
Adders: CLA Trees
[Figure: 16-bit carry-lookahead tree diagrams (inputs 00 to 15, outputs Sum 00 to Sum 15), with stages labeled P,G generation, group P,G, rippled carries and sum generation]
• Kogge-Stone, radix 2 and radix 4 (annotation: large stack height (static) = 8)
• Brent-Kung: large intermediate load capacitance along critical path (Sum07 node)
• Han-Carlson
Delay, Power Variability: NAND chains
[Figure: normalized delay variability (sigma/mean) and normalized power variability (sigma/mean), each on a 0 to 1.6 scale, for STATIC, PSCMOS, LEAP and DOMINO]
• The static CMOS implementation is the most robust to process parameter variations
• The passgate style (LEAP) displays the highest levels of delay and power variability (30% higher than static)
Delay Variability: Adders
[Figure: normalized delay variability (sigma/mean), 0 to 1.8 scale, for: Manchester static, Manchester dynamic, carry select static, carry select passgate, carry select dynamic, Kogge-Stone static radix 2, Kogge-Stone static radix 4, Kogge-Stone passgate, Kogge-Stone dynamic, Han-Carlson, Brent-Kung]
• Static carry select is the most robust
• The three most variable are passgate-based, with 31% to 67% more spread than static carry select
Power Variability: Adders
[Figure: normalized power variability (sigma/mean), 0 to 2.5 scale, for: Manchester static, Manchester dynamic, carry select static, carry select passgate, carry select dynamic, Kogge-Stone static radix 2, Kogge-Stone static radix 4, Kogge-Stone passgate, Kogge-Stone dynamic, Han-Carlson, Brent-Kung]
• Most robust: static ripple with Manchester carry chain
• Least robust: designs with large/irregular intermediate load capacitance along critical paths (radix 4 Kogge-Stone, Brent-Kung)
Single Parameter Breakdown: NAND Chains
[Figure: individual parameter contribution to delay variability (0% to 45%) from Vth, L, Vdd, tox and W for STATIC, PSCMOS, LEAP and DOMINO; two panels: static capacitive load and FO3 load]
• Results vary depending on final loading stage (static vs. FO3)
• Vth is the most significant contributor in all cases
• For active, FO3 loads:
  • Passgate design is most sensitive to Vth variations
  • Increased significance of L variations
Single Parameter Breakdown: Adders (Delay)
[Figure: normalized individual parameter contribution to delay variability (0% to 100%), split among W, tox, Vdd, L and Vth, for: Manchester static, Manchester dynamic, carry select static, carry select passgate, carry select dynamic, Kogge-Stone static radix 2, Kogge-Stone static radix 4, Kogge-Stone passgate, Kogge-Stone dynamic, Han-Carlson, Brent-Kung]
• Vth is the most significant contributor (33% average)
• Passgate designs are the most sensitive to Vth variations
• L is nearly as significant (28% average)
Single Parameter Breakdown: Adders (Power)
[Figure: normalized individual parameter contribution to power variability (0% to 100%), split among W, tox, Vdd, L and Vth, for: Manchester static, Manchester dynamic, carry select static, carry select passgate, carry select dynamic, Kogge-Stone static radix 2, Kogge-Stone static radix 4, Kogge-Stone passgate, Kogge-Stone dynamic, Han-Carlson, Brent-Kung]
• Vdd contributions dominate (41% average)
• Vth variations are also significant (30% average)
Conclusions
• Static CMOS implementations are generally the most robust to parameter variations, for both delay and power
• Passgate designs display the least amount of robustness:
  • Suffer spreads in delay and power variability between 30% and 70% higher than static designs
  • Tend to display highest sensitivity to Vth variations
• These are worst-case results, due to the assumption of perfect parameter correlation
• Vth variations account for 35% to 40% of delay variability
• Power variability trends suggest a dependence upon large or irregular intermediate load capacitances
• Vth, L and Vdd are consistently the highest contributors to both delay (85%) and power (80%) variation
A Self-adapting Approach
Motivation: Most timing variations are systematic, and can be adjusted for at start-up time using one-time calibration!
• Relevant parameters: Tclock, Vdd, Vth
• Vth control: the most effective and efficient at low voltages
• Can be easily extended to include leakage reduction and power-down in standby
[Figure: Module and Test Module blocks, with signals Vdd, Vbb, Tclock, and test inputs and responses]
• Achieves the maximum power saving under technology limit
• Inherently improves the robustness of design timing
• Minimum design overhead required over the traditional design methodology
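One way to picture the one-time calibration is as a search over body-bias settings at start-up. The sketch below is an assumption-laden illustration, not the lecture's implementation: `calibrate_body_bias`, `make_module`, the linear delay-vs-bias model and all numbers are hypothetical, and positive bias values are taken to mean forward body bias.

```python
def calibrate_body_bias(passes_test, bias_steps):
    """Return the first bias in bias_steps for which the module passes its test.

    bias_steps is ordered from most reverse bias (lowest leakage, slowest)
    to most forward bias (fastest, leakiest), so the first passing setting
    is the lowest-leakage one that still meets timing.
    """
    for vbb in bias_steps:
        if passes_test(vbb):
            return vbb
    return None  # no bias setting meets timing; Vdd or Tclock must be adjusted


def make_module(delay_at_zero_bias_ps, ps_per_volt, t_clock_ps):
    """Hypothetical module whose critical path speeds up linearly with forward bias."""
    def passes_test(vbb):
        return delay_at_zero_bias_ps - ps_per_volt * vbb <= t_clock_ps
    return passes_test


# Bias settings from -0.5 V (reverse) to +0.5 V (forward) in 0.1 V steps.
STEPS = [round(-0.5 + 0.1 * i, 1) for i in range(11)]

if __name__ == "__main__":
    slow_die = make_module(1050.0, 200.0, 1000.0)
    fast_die = make_module(850.0, 200.0, 1000.0)
    print("slow die bias:", calibrate_body_bias(slow_die, STEPS))
    print("fast die bias:", calibrate_body_bias(fast_die, STEPS))
```

Scanning from reverse to forward bias means a fast die ends up with reverse bias (saving leakage) while a slow die receives just enough forward bias to meet the clock.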
Vth Tuning via Body Bias
• Less design cost than Vdd tuning
• Vth tunable range: >150mV for a 90nm technology
[Figure: Vth (V), 0 to 0.6, vs. Vbs (V), -2 to 2, with reverse-Vbs and forward-Vbs regions marked; inset: MOSFET with terminals G, B, S, D]
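The slide does not give a formula for the Vth-versus-Vbs curve. As a first-order guide, the standard long-channel body-effect expression for an NMOS device can be used; γ (body-effect coefficient) and φ_F (Fermi potential) are symbols introduced here, not taken from the lecture, and the expression only holds for V_bs below 2φ_F:

$$V_{th} = V_{th0} + \gamma\left(\sqrt{2\phi_F - V_{bs}} - \sqrt{2\phi_F}\right)$$

With assumed values γ = 0.2 V^{1/2} and 2φ_F = 0.8 V, a reverse bias of V_bs = -0.5 V gives a shift of 0.2 × (√1.3 − √0.8) ≈ +49 mV, and a forward bias of V_bs = +0.5 V gives 0.2 × (√0.3 − √0.8) ≈ −69 mV. Reverse bias therefore raises Vth and forward bias lowers it, with diminishing returns at larger reverse bias because of the square root.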
Power and Timing Tradeoffs
Vth tuning can effectively gain performance back
[Figure: Adaptive Tuning. Switching energy Eswitching (fJ), 5 to 50, vs. path delay (ps), 1.0E+03 to 1.0E+07 (log scale), for Vdd of 200-500mV. Curves: worst case without Vth tuning, worst case with Vth tuning, nominal without Vth tuning, nominal with Vth tuning]
Adaptive Body Bias: Experiment
[Figure: 4.5 mm x 5.3 mm die with multiple subsites; each subsite (1.6 x 0.24 mm) contains CUT, PD & Counter, Delay, Resistor Network and Bias Amplifier blocks]
Technology: 150nm CMOS
Number of subsites per die: 21
Body bias range: 0.5V FBB to 0.5V RBB
Bias resolution: 32 mV
Die frequency: Min(F1..F21)
Die power: Sum(P1..P21)
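The two die-level definitions, frequency as the minimum over subsites and power as the sum over subsites, translate directly into code. In this sketch the function names and the acceptance thresholds `f_target` and `p_limit` are hypothetical; treating "too leaky" as exceeding a total power limit is a simplifying assumption.

```python
def die_metrics(subsite_freqs, subsite_powers):
    """Die frequency is set by the slowest subsite; die power is the total."""
    return min(subsite_freqs), sum(subsite_powers)


def accept(die_freq, die_power, f_target, p_limit):
    """A die is rejected if it is too slow or draws too much power."""
    return die_freq >= f_target and die_power <= p_limit


if __name__ == "__main__":
    freqs = [1.0, 0.9, 1.1]   # normalized subsite frequencies (3 shown, 21 on the die)
    powers = [2.0, 3.0, 4.0]  # subsite powers in arbitrary units
    f_die, p_die = die_metrics(freqs, powers)
    print(f_die, p_die, accept(f_die, p_die, f_target=0.95, p_limit=10.0))
```

Because the slowest subsite sets the die frequency, within-die variation directly costs speed, which is why per-subsite bias matters.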
Adaptive Body Bias: Results
[Figure: histograms of number of dies vs. frequency around ftarget, with "too slow" and "too leaky" regions; ABB applies FBB and RBB to pull dies toward ftarget. Bar chart of accepted die (0% to 100%) vs. frequency bin for noBB, ABB and within-die ABB, annotated "100% yield" and "97% highest bin"]
For given frequency and power density:
• 100% yield with ABB
• 97% in highest frequency bin with ABB for within-die variability
Adaptive Approach for Dealing with Variations
Source: Sam Naffziger, HP