Lecture 13
• Advanced Technologies on Digital Circuits
– Performance Metrics for Digital Circuits
– Impact of Advanced Technologies
Reading:
• multiple research articles (reference list at the end of this lecture)
• B. Nikolic, EE 241 course materials, UC Berkeley
[Figure: Number of transistors per die vs. year (1970 to 2010), log scale from 1.E+03 to 1.E+10, showing a 2x increase every 2 years. Labeled points: 4004, 8080, 286, 486™, Pentium® II CPU, Pentium® 4 CPU, Itanium® 2 CPU, Penryn QC. Core counts: single core, dual core, quad core, 6 core, 8 core.]
Complementary Logic Gates
11/17/2013 2Nuo Xu EE 290D, Fall 2013
• τd is reduced by increasing IEFF and reducing load capacitance C.
Intrinsic delay:
$$\tau_d = \frac{1}{2}\left(t_{pHL} + t_{pLH}\right) = \frac{C V_{DD}}{2 I_{EFF}}$$
Load capacitance includes:
• Cdiff
• (effective) fanout capacitances: CGG, Cfringe
• Cwire
[Figures: Inverter Chains; Other Combinational Logic Gates]
Energy & Delay of Digital Circuits
• $t_{delay} = L_D \cdot F \cdot C V_{DD} / (2 I_{EFF})$
• $E_{dyn} + E_{leak} = \alpha \cdot L_D \cdot F \cdot C V_{DD}^2 + L_D \cdot F \cdot I_{OFF} V_{DD} t_{delay}$
  $= \alpha \cdot L_D \cdot F \cdot C V_{DD}^2 \left[ 1 + \dfrac{L_D \cdot F / 2\alpha}{I_{EFF} / I_{OFF}} \right]$
• LD: logic depth
• F: effective fanout per stage
• α: activity factor
Approximated model for the data path (this may not be the latency-optimized design, but the energy-optimized one):
[Figure: logic chain between two latches]
Courtesy of E. Alon (UC Berkeley)
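The delay and energy expressions on this slide can be evaluated directly. The sketch below is only an illustration: the function names and the operating point (C, VDD, IEFF, IOFF) are hypothetical, not values from the lecture. It computes the two energy terms separately and checks that the factored form with the IEFF/IOFF ratio gives the same total.

```python
def chain_delay(LD, F, C, VDD, IEFF):
    """Delay of the logic chain: LD * F * C * VDD / (2 * IEFF), in seconds."""
    return LD * F * C * VDD / (2.0 * IEFF)


def chain_energy(LD, F, C, VDD, IEFF, IOFF, alpha):
    """Return (E_dyn, E_leak) in joules, term by term as in the slide's model."""
    e_dyn = alpha * LD * F * C * VDD ** 2
    e_leak = LD * F * IOFF * VDD * chain_delay(LD, F, C, VDD, IEFF)
    return e_dyn, e_leak


def chain_energy_factored(LD, F, C, VDD, IEFF, IOFF, alpha):
    """Same total energy, written with the bracketed IEFF/IOFF factor."""
    leak_share = (LD * F / (2.0 * alpha)) / (IEFF / IOFF)
    return alpha * LD * F * C * VDD ** 2 * (1.0 + leak_share)


# Hypothetical operating point, chosen only to exercise the formulas.
LD, F, alpha = 30, 4, 0.01
C, VDD, IEFF, IOFF = 1e-15, 0.7, 50e-6, 5e-9

e_dyn, e_leak = chain_energy(LD, F, C, VDD, IEFF, IOFF, alpha)
print(f"t_delay = {chain_delay(LD, F, C, VDD, IEFF):.3e} s")
print(f"E_dyn   = {e_dyn:.3e} J, E_leak = {e_leak:.3e} J")
print(f"E_total = {chain_energy_factored(LD, F, C, VDD, IEFF, IOFF, alpha):.3e} J")
```

Note that the leakage term grows with (LD · F)² because a longer chain both contains more leaking gates and takes longer to evaluate.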
Activity Factor Calculation
Energy Delivery Path during 0→1 at Output:
Energy consumed during N cycles:
Example: A 2-input NOR Gate Truth Table
Courtesy of B. Nikolic (UC Berkeley)
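As a rough sketch of the NOR example, assume the common textbook definition in which the activity factor is the probability of a 0→1 output transition per cycle, with independent inputs and independent consecutive cycles. Under those assumptions α(0→1) = P(out = 0) · P(out = 1). The helper names below are hypothetical.

```python
from itertools import product


def nor2(a, b):
    """2-input NOR: output is 1 only when both inputs are 0."""
    return int(not (a or b))


def activity_factor(gate, n_inputs, p_one=0.5):
    """P(0->1 output transition) = P(out=0) * P(out=1).

    Assumes each input is 1 with probability p_one, independently of the
    other inputs and of the previous cycle.
    """
    p_out_one = 0.0
    for bits in product((0, 1), repeat=n_inputs):
        prob = 1.0
        for b in bits:
            prob *= p_one if b else (1.0 - p_one)
        if gate(*bits):
            p_out_one += prob
    return (1.0 - p_out_one) * p_out_one


print(activity_factor(nor2, 2))  # 3/4 * 1/4 = 0.1875 for uniform inputs
```

With uniformly random inputs only one of the four truth-table rows gives a 1 at the output, so the NOR gate switches 0→1 in 3/16 of the cycles on average.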
Short-Circuit Energy
[Figure: CMOS inverter schematic with VIN, VOUT, VDD and GND]
[Figure: Impact of VTH on CMOS short-circuit energy. Energy per operation (J), 10^-17 to 10^-15, vs. threshold voltage (V), 0.10 to 0.40; curves Esc and Etot; 22 nm TriGate transistors, VDD = 0.7 V]
• Short-circuit energy can be mitigated by choosing the thresholds such that |VT,P| + |VT,N| > VDD.
H.J.M. Veendrick, JSSC (1984)
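The threshold condition can be read as closing the input-voltage window in which both transistors conduct. A minimal sketch, assuming a simple strong-inversion switch model (NMOS on for VIN > VT,N, PMOS on for VIN < VDD − |VT,P|) and ignoring sub-threshold conduction; the function name and threshold values are hypothetical:

```python
def short_circuit_window(VDD, VTN, VTP_abs):
    """Return the (low, high) input range where NMOS and PMOS are both on.

    Returns None when |VT,P| + |VT,N| >= VDD, i.e. when no input voltage
    turns both devices on at once in this simple switch model.
    """
    low = VTN             # NMOS starts conducting above VT,N
    high = VDD - VTP_abs  # PMOS stops conducting above VDD - |VT,P|
    return (low, high) if high > low else None


# VDD = 0.7 V as in the slide's plot; thresholds are illustrative.
print(short_circuit_window(0.7, 0.25, 0.25))  # window exists (sum 0.5 V < VDD)
print(short_circuit_window(0.7, 0.40, 0.40))  # None (sum 0.8 V > VDD)
```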
Energy vs. Delay Plots
L. Wei, TED (2011)
• $t_{delay} = L_D \cdot F \cdot C V_{DD} / (2 I_{EFF})$
• $E_{dyn} + E_{leak} = \alpha \cdot L_D \cdot F \cdot C V_{DD}^2 + L_D \cdot F \cdot I_{OFF} V_{DD} t_{delay}$
• There is an inherent trade-off between energy and delay (1 / performance) in logic circuits.
• Optimizations needed for:
– VDD
– ION, IOFF or ION/IOFF?
– Transistor sizing
– Logic structure (LD, α, F)
– System architecture
[Figure: Energy vs. Delay]
Optimal VDD and ION/IOFF
Total energy of a logic chain:
The Edyn/Eleak under optimum ION/IOFF (for a MOSFET):
H. Kam, TED (2012)
• Pick VDD, VTH to minimize energy for given performance (1/delay)
– Assuming work function (VTH) can be freely tuned
• Result: optimal ION/IOFF ∝ LD · F / α
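The LD · F / α scaling can be seen from the chain energy model used in this lecture, where Eleak/Edyn = (LD · F / 2α) / (IEFF/IOFF). The sketch below inverts that relation to find the on/off ratio needed to hold leakage at a chosen fraction of dynamic energy. The target fraction `leak_to_dyn` is a hypothetical design choice, not the optimum from the cited paper, and IEFF stands in for the on-current.

```python
def required_on_off_ratio(LD, F, alpha, leak_to_dyn):
    """IEFF/IOFF at which E_leak / E_dyn equals leak_to_dyn.

    From E_leak / E_dyn = (LD * F / (2 * alpha)) / (IEFF / IOFF).
    """
    return LD * F / (2.0 * alpha * leak_to_dyn)


# Illustrative numbers: deeper logic or lower activity needs a larger ratio.
base = required_on_off_ratio(LD=30, F=4, alpha=0.01, leak_to_dyn=0.5)
deeper = required_on_off_ratio(LD=60, F=4, alpha=0.01, leak_to_dyn=0.5)
busier = required_on_off_ratio(LD=30, F=4, alpha=0.02, leak_to_dyn=0.5)
print(f"base: {base:.3g}, 2x logic depth: {deeper:.3g}, 2x activity: {busier:.3g}")
```

For any fixed leakage share, the required ratio doubles when LD · F doubles and halves when α doubles.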
Derive E vs. VDD at a fixed tdelay, then set ∂E/∂VDD = 0 to find the minimum E.
[Equations: E(VDD) at fixed tdelay; ∂E/∂VDD = 0; resulting optimal VDD]
• Note that the optimal VDD lies in the MOSFET sub-threshold region.
B.H. Calhoun, JSSC (2005)
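A numerical sketch of the minimum-energy point, under loudly stated assumptions rather than the cited paper's derivation: the transistor is taken to be in sub-threshold over the whole sweep, so the on/off ratio in the chain energy model is exp(VDD / (n · vT)), and IEFF is treated as the on-current. With that substitution both VTH and tdelay drop out of the energy expression. The swing factor n, the capacitance C and the lower sweep bound are hypothetical; the bound reflects that the model stops being meaningful once VDD falls to a few thermal voltages.

```python
import math


def energy_per_op(VDD, LD=30, F=4, alpha=0.01, C=1e-15, n=1.5, vT=0.0259):
    """Chain energy with an assumed sub-threshold on/off ratio exp(VDD/(n*vT))."""
    on_off = math.exp(VDD / (n * vT))
    leak_share = (LD * F / (2.0 * alpha)) / on_off
    return alpha * LD * F * C * VDD ** 2 * (1.0 + leak_share)


def min_energy_vdd(v_lo=0.15, v_hi=0.8, steps=651, **model):
    """Grid search for the supply voltage that minimizes energy_per_op."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: energy_per_op(v, **model))


v_opt = min_energy_vdd()
print(f"minimum-energy VDD ~ {v_opt:.3f} V, E ~ {energy_per_op(v_opt):.3e} J")
```

Below the optimum, leakage dominates because the chain becomes exponentially slower. Above it, the VDD² dynamic term dominates.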
Application Oriented Energy Optimization
H. Fuketa, IEDM (2012)
Energy vs. Delay Optimizations: To Probe Further
• Downsizing transistors (↓CL)
• Reducing clock frequency (↓fclk): lowers the performance
• Reducing activity factor: requires logic restructuring
• Increasing threshold (↑VTH)
• Lowering the supply voltage (↓VDD): lowers the performance
• Improving electrostatics + mobility
• Reducing parasitics
[Figure: Pareto curve of minimum E vs. D; labels: Designs, Technology]
L. Wei, TED (2011)
Power Gating Techniques
B. Calhoun, JSSC (2004)
[Figure panels: Dynamic VTH Tuning; Sleep Transistors; Dual VDD]
Y. Shimazaki, JSSC (2004)
Micro-Architectural Design
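[Figure: logic blocks A and B arranged as reference, parallel and pipelined datapaths]
• Reference: Perf. ~ fclk, E/op ~ 1 unit
• Parallelism: Perf. ~ fclk, E/op ~ 0.5 unit
• Pipeline: Perf. ~ fclk, E/op ~ 0.5 unit
• Parallelism allows transistors to operate at a slower speed while maintaining the same throughput at a given clock frequency.
[Figure: Energy vs. Perf., from low VDD to high VDD; curve labeled "w/ parallelism"]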
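One way to read the ~0.5 unit energy figures is through the dynamic-energy relation E/op ∝ C · VDD². The sketch below assumes that parallel or pipelined hardware relaxes timing enough to lower VDD, and that the extra capacitance of duplicated logic or pipeline registers is a separate overhead factor (zero by default). Both helper names are hypothetical.

```python
import math


def energy_per_op_ratio(vdd_ratio, cap_overhead=0.0):
    """E/op relative to the reference design, assuming E/op ~ C * VDD^2."""
    return (1.0 + cap_overhead) * vdd_ratio ** 2


def vdd_ratio_for_energy(e_ratio, cap_overhead=0.0):
    """Supply scaling needed to reach a target relative E/op."""
    return math.sqrt(e_ratio / (1.0 + cap_overhead))


# To reach ~0.5 unit of energy per operation with no capacitance overhead,
# VDD has to drop to roughly 71% of the reference supply.
print(f"{vdd_ratio_for_energy(0.5):.3f}")
# With a hypothetical 10% capacitance overhead the supply must drop further.
print(f"{vdd_ratio_for_energy(0.5, cap_overhead=0.1):.3f}")
```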
Digital Circuit Design Trade-offs
S. Borkar, VLSI-T short course (2007)
Impact of Technology Scaling
Source: Intel
In a power-limited technology:
• Parasitics reduction seems the most important among all solutions (i.e. SS↓, µ↑)
• VDD scaling will be helpful only for devices with good electrostatics.
A. Khakifirooz, IEDM (2008)
Impact of Variability
• Device variability hurts in two ways:
1. Reduces effective IEFF (delay set by worst-case)
2. Increases effective IOFF (leaky devices dominate)
This forces an increase in nominal ION/IOFF and VDDmin …
[Figure: Energy vs. Delay plots under RDF. Energy per switch (10^-16 J) vs. delay (s), 200 ps to 1 ns; N-MOSFET, W/Leff = 50/28 nm; α = 0.01, F = 4, LD = 30; simulated HP and LP devices; labels: Uniform Doping, w/ RW]
[Figure: Logic VDDmin vs. logic depth]
H. Fuketa, IEDM (2012)
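Both effects can be illustrated with a small Monte Carlo sketch. The assumptions are hypothetical, not taken from the cited work: VTH shifts are Gaussian with standard deviation `sigma_vth`, and off-current scales as exp(−ΔVTH / (n · vT)). Under that model the average leakage follows a lognormal mean and always exceeds the nominal value, while the slowest gate among many is set by the largest positive VTH shift.

```python
import math
import random


def leakage_inflation(sigma_vth, n=1.5, vT=0.0259, samples=200_000, seed=0):
    """Monte Carlo mean of IOFF / IOFF_nominal under Gaussian VTH shifts."""
    rng = random.Random(seed)
    m = n * vT
    total = 0.0
    for _ in range(samples):
        total += math.exp(-rng.gauss(0.0, sigma_vth) / m)
    return total / samples


def leakage_inflation_closed_form(sigma_vth, n=1.5, vT=0.0259):
    """Lognormal mean: exp(sigma^2 / (2 * (n * vT)^2))."""
    return math.exp(0.5 * (sigma_vth / (n * vT)) ** 2)


def worst_case_vth_shift(sigma_vth, n_gates, seed=0):
    """Largest VTH shift among n_gates devices (the slowest device)."""
    rng = random.Random(seed)
    return max(rng.gauss(0.0, sigma_vth) for _ in range(n_gates))


sigma = 0.03  # 30 mV, an illustrative value
print(f"mean leakage x{leakage_inflation(sigma):.2f} "
      f"(closed form x{leakage_inflation_closed_form(sigma):.2f})")
print(f"worst VTH shift in 1000 gates: {worst_case_vth_shift(sigma, 1000) * 1e3:.0f} mV")
```

Because the exponential is convex, leaky devices add more current than tight devices save, which is the sense in which leaky devices dominate.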
Advancement of Thin-Body MOSFETs
K. Kuhn, IEDM (2012)
N. Planes, VLSI (2012)
• Benefits of thin-body MOSFETs:
– Improved ION/IOFF from higher mobility and better electrostatics
– Improved variability
• However, parasitics reduction is still challenging.
– SOI substrate will be extremely helpful in this regime
UTBB FD-SOI @ 28 nm
TriGate @ 22 nm
Bulk vs. SOI FinFETs: Inverter Benchmark
T. Chiarella, SSE (2010)
• SOI FinFETs are better suited to the low-power corner, while bulk FinFETs are better suited to the high-performance one.
• A balanced P/N design will further benefit bulk FinFETs.
References
1. H.J.M. Veendrick, “Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits,” IEEE Journal of Solid-State Circuits, vol. 19, no. 4, pp. 468-473, 1984.
2. L. Wei et al., “Technology Assessment Methodology for Complementary Logic Applications Based on Energy-Delay Optimization,” IEEE Trans. Electron Devices, vol. 58, no. 8, pp. 2430-2439, 2011.
3. H. Kam et al., “Design Requirements for Steeply Switching Logic Devices,” IEEE Trans. Electron Devices, vol. 59, no. 2, pp. 326-334, 2012.
4. B.H. Calhoun et al., “Modeling and Sizing for Minimum Energy Operation in Subthreshold Circuits,” IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp. 1778-1786, 2005.
5. H. Fuketa et al., “Device-Circuit Interactions in Extremely Low Voltage CMOS Designs,” IEEE IEDM Tech. Dig., pp. 559-562, 2011.
6. B.H. Calhoun et al., “A Leakage Reduction Methodology for Distributed MTCMOS,” IEEE Journal of Solid-State Circuits, vol. 39, no. 5, pp. 818-826, 2004.
7. Y. Shimazaki et al., “A Shared-Well Dual-Supply-Voltage 64-bit ALU,” IEEE Journal of Solid-State Circuits, vol. 39, no. 3, pp. 494-500, 2004.
8. S. Borkar, “Power/Performance Management & Challenges,” VLSI Tech. Symp., Short Course, 2007.
9. A. Khakifirooz et al., “The Future of High-Performance CMOS: Trends and Requirements,” IEEE IEDM Tech. Dig., pp. 30-33, 2008.
10. N. Planes et al., “28nm FDSOI Technology Platform for High-Speed Low-Voltage Digital Applications,” Symp. VLSI Tech. Dig., pp. 133-134, 2012.
11. K. Kuhn et al., “The Ultimate CMOS Device and Beyond,” IEEE IEDM Tech. Dig., pp. 171-174, 2012.
12. T. Chiarella et al., “Benchmarking SOI and Bulk FinFET Alternatives for Planar CMOS Scaling Succession,” Solid-State Electronics, vol. 54, pp. 855-860, 2010.