Low-Power VLSI
Seong-Ook Jung
2013. 5. 27.
VLSI SYSTEM LAB, YONSEI University
School of Electrical & Electronic Engineering
2 YONSEI Univ. School of EEE
Contents
1. Introduction
2. Power classification & Power – performance relationship
3. Low power design
   1. Architecture and algorithm level
   2. Circuit level
   3. Device level
   4. NTV operation
4. Summary
Introduction
Technology Scaling
Technology Scaling : Moore’s law
The number of transistors that can be placed on an integrated circuit has doubled approximately every two years (often quoted as every 18 months)
[1] http://en.wikipedia.org
Development Trend
Scaling (More Moore) : More devices are integrated in a chip
  New scaling road map : Not only ‘geometrical scaling’ for 2D devices, but also ‘equivalent scaling’ for 3D devices
  Beyond bulk CMOS : FinFET, SOI, …
Functional Diversification (More than Moore) : Several functions are merged in a chip
[2] ITRS (International Technology Roadmap for Semiconductors) 2011
SoC Performance
SoC Performance : Exponentially increases!!
Thanks to both device technology and design methodology
[2] ITRS (International Technology Roadmap for Semiconductors) 2011
SoC Power Consumption Problem
SoC Power Consumption : ‘Also’ severely increases
Over 15 years, the required power is projected to grow by about ×10
[2] ITRS (International Technology Roadmap for Semiconductors) 2011
Process Variation Problem
Process Variation : Result of scaling
Consists of global variation and local variation
Global variation : Comes from fabrication, lot, and wafer processes
  Different process corners (NMOS-PMOS : SS/SF/TT/FS/FF)
Local variation : Truly random variation between devices with identical layout
[3] Synopsys, 2005
[4] http://cnx.org
Process Variation Problem
Performance Variation due to Process Variation
Frequency difference ≈ 30%
Leakage current difference ≈ ×20
⇒ Process variation should be considered in SoC design
[5] A. Devgan, “Leakage Issues in IC Design: Part 3”, IBM
Effect of the Process Variation
Limitation for Low Voltage / Low Power Operation
$I_D \propto \frac{W}{L} (V_{DD} - V_{TH})^{\alpha}$
VTH variation ⇒ ID variation ⇒ Performance variation !!
More design margin is needed due to process variation ⇒ VDD ↑
Yield Limitation
Process variation ⇒ failure probability ↑ ⇒ Yield ↓
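The link between VTH variation and ID variation can be illustrated numerically. The following is a minimal sketch based on the alpha-power relation above; the values α = 1.3, VTH = 0.3 V, the ±50 mV VTH shift, and the function names are illustrative assumptions, not data from the lecture.

```python
def relative_drive_current(vdd, vth, alpha=1.3):
    """Drain current up to a constant factor: I_D ∝ (VDD - VTH)^alpha."""
    if vdd <= vth:
        raise ValueError("alpha-power law applies only for VDD > VTH")
    return (vdd - vth) ** alpha


def current_spread(vdd, vth_nominal, delta_vth, alpha=1.3):
    """Ratio of fast-corner to slow-corner current for a +/- delta_vth shift."""
    fast = relative_drive_current(vdd, vth_nominal - delta_vth, alpha)
    slow = relative_drive_current(vdd, vth_nominal + delta_vth, alpha)
    return fast / slow


if __name__ == "__main__":
    # Assumed example values: VTH = 0.3 V nominal, +/- 50 mV variation.
    for vdd in (1.2, 0.9, 0.6):
        spread = current_spread(vdd, vth_nominal=0.3, delta_vth=0.05)
        print(f"VDD = {vdd:.1f} V: I_D(fast) / I_D(slow) = {spread:.2f}")
```

Under these assumptions the same VTH shift produces a larger current spread as VDD is lowered, which is why extra design margin (a higher VDD) is needed.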
Importance of Low Power VLSI Design
Low power VLSI design !!!
Process variation tolerant design
[2] ITRS (International Technology Roadmap for Semiconductors) 2011
State-of-the-Art Low Power VLSI in Commercial Products
[6] http://www.techinsights.com
Specifications of State-of-the-Art Low Power VLSI
HKMG & DVFS?
In this class, we will study a variety of low power design techniques
[1] http://en.wikipedia.org/
Power Classification & Power – Performance Relationship
Power Consumption of CMOS Circuits
Power Classification
$P_{total} = P_{dynamic} + P_{leakage}$
$P_{dynamic} = P_{sw} + P_{sc}$
Switching Power : Psw
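Psw is due to the charging and discharging (output transitions) of the capacitors driven by the circuit in response to input transitions.
$P_{sw} = C_L V_{DD}^2 f$
Derivation :
$I = C_L \, dV/dt = C_L \Delta V f$
$P_{sw} = I V_{DD} = C_L \Delta V \, V_{DD} f$
In a digital circuit, $\Delta V = V_{DD}$, so
$P_{sw} = I V_{DD} = C_L V_{DD}^2 f$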
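As a quick numerical check of the switching power formula, the sketch below evaluates it for two supply voltages. The load capacitance (1 pF), frequency (1 GHz), and supply values are assumed example numbers, and `switching_power` is a hypothetical helper name.

```python
def switching_power(c_load, vdd, freq):
    """P_sw = C_L * VDD^2 * f, in watts (farads, volts, hertz)."""
    return c_load * vdd ** 2 * freq


if __name__ == "__main__":
    # Assumed example: 1 pF switched at 1 GHz.
    p_ref = switching_power(c_load=1e-12, vdd=1.0, freq=1e9)
    p_low = switching_power(c_load=1e-12, vdd=0.8, freq=1e9)
    print(f"P_sw at 1.0 V: {p_ref * 1e3:.3f} mW")
    print(f"P_sw at 0.8 V: {p_low * 1e3:.3f} mW")
    print(f"Ratio: {p_low / p_ref:.2f}")
```

Because Psw depends on the square of VDD, lowering VDD from 1.0 V to 0.8 V at a fixed frequency scales Psw by 0.8² = 0.64.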
Short Circuit Power : Psc
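Psc is caused by the simultaneous conduction of the PMOS and NMOS transistors during input and output transitions.
$P_{sc} = \frac{\beta}{12} (V_{DD} - 2V_{TH})^3 (t_3 - t_1)$
Both transistors conduct when $V_{TN} < V_{IN} < V_{DD} - |V_{TP}|$
$V_{IN} = V_{TN}$ at $t_1$
$V_{IN} = V_{DD} - |V_{TP}|$ at $t_3$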
Leakage Power : Psub, Pgate & Pjunc
Psub
Ideal MOSFET : Isub = 0
In a short-channel MOSFET, Isub exists when |VGS| < |VTH|
$P_{sub} \propto \exp\left[ (V_{GS} - V_{TH}) / (m v_T) \right] V_{DD}$
Pgate
Ideal MOSFET : Igate = 0
In a short-channel MOSFET, Igate exists because of the thin TOX
$P_{gate} \propto W L \, (V_{GS} / T_{OX})^2 \, V_{DD}$
Pjunc
Reverse-biased PN junction leakage
$P_{junc} \propto \left[ \exp(V_D / v_T) - 1 \right] V_{DD}$
[7] K. M. Cao, “BSIM4 Gate Leakage Model Including Source-Drain Partition”, IEDM, 2000
[8] http://www.altera.com/
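The exponential dependence of Psub on VTH can be made concrete with a short calculation. This is a sketch only: the subthreshold slope factor m = 1.5 and the temperature of 300 K are assumed values, and the function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin=300.0):
    """v_T = kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


def subthreshold_leakage_factor(delta_vth, m=1.5, temp_kelvin=300.0):
    """Factor applied to I_sub when VTH rises by delta_vth volts (VGS, VDD fixed)."""
    return math.exp(-delta_vth / (m * thermal_voltage(temp_kelvin)))


if __name__ == "__main__":
    # m = 1.5 and T = 300 K are assumptions for illustration.
    for delta_mv in (50, 100, 200):
        factor = subthreshold_leakage_factor(delta_mv / 1000.0)
        print(f"VTH + {delta_mv} mV: I_sub scaled by {factor:.4f}")
```

With these assumed parameters, raising VTH by 100 mV reduces the subthreshold leakage by more than a factor of ten, which is the effect that the dual-VTH and MTCMOS techniques later in the lecture exploit.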
Power Vs. Performance
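Case 1 : VDD ↓
All power consumption ↓
However, $\text{Delay} \propto C_L V_{DD} / I_D \propto C_L V_{DD} / (V_{DD} - V_{TH})^{\alpha}$
Thus, VDD ↓ ⇒ Delay ↑ ⇒ Performance loss
Case 2 : VTH ↑
Psc ↓ and, especially, Psub ↓
However, $\text{Delay} \propto C_L V_{DD} / I_D \propto C_L V_{DD} / (V_{DD} - V_{TH})^{\alpha}$
Thus, VTH ↑ ⇒ Delay ↑ ⇒ Performance loss
Case 3 : f ↓
Psw ↓
However, Throughput ∝ f ⇒ Performance loss
Power consumption equations
$P_{sw} = C_L V_{DD}^2 f$
$P_{sc} = \frac{\beta}{12} (V_{DD} - 2V_{TH})^3 (t_3 - t_1)$
$P_{sub} \propto \exp\left[ (V_{GS} - V_{TH}) / (m v_T) \right] V_{DD}$
$P_{gate} \propto W L \, (V_{GS} / T_{OX})^2 \, V_{DD}$
$P_{junc} \propto \left[ \exp(V_D / v_T) - 1 \right] V_{DD}$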
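The trade-off in Case 1 can be tabulated with the delay and switching power relations above. The sketch below is illustrative only: α = 1.3, VTH = 0.3 V, and the 1.2 V reference supply are assumptions, and both quantities are computed up to a constant factor.

```python
def relative_delay(vdd, vth, alpha=1.3):
    """Delay ∝ C_L * VDD / (VDD - VTH)^alpha, with C_L folded into the constant."""
    if vdd <= vth:
        raise ValueError("model applies only for VDD > VTH")
    return vdd / (vdd - vth) ** alpha


def relative_switching_power(vdd, freq=1.0):
    """P_sw ∝ VDD^2 * f, with C_L folded into the constant."""
    return vdd ** 2 * freq


if __name__ == "__main__":
    vth = 0.3        # assumed threshold voltage in volts
    vdd_ref = 1.2    # assumed reference supply in volts
    for vdd in (1.2, 1.0, 0.8, 0.6):
        power = relative_switching_power(vdd) / relative_switching_power(vdd_ref)
        delay = relative_delay(vdd, vth) / relative_delay(vdd_ref, vth)
        print(f"VDD = {vdd:.1f} V: power x{power:.2f}, delay x{delay:.2f}")
```

The same `relative_delay` function also covers Case 2: at a fixed VDD, a larger `vth` gives a larger delay.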
Tradeoff w.r.t VTH
Tradeoff between power and performance
Low power design : Power reduction without performance degradation
[9] S. Mutoh, “Review of low-voltage CMOS LSI technology as a standard in the 21st century”, 1998
Low Power Design - Architecture and Algorithm Levels
Parallelism
[10] A.P. Chandrakasan, “Minimizing power consumption in digital CMOS circuits”, Proc. of IEEE, 1995
Lower VDD and frequency are used at the expense of an area penalty
By adopting parallelism : Power ≈ ×0.36, Area ≈ ×3.4
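The ×0.36 power figure follows from the switching power relation P ∝ C · VDD² · f. The sketch below reproduces it using scaling ratios as recalled from the cited paper's datapath example (capacitance ×2.15, supply lowered from 5 V to 2.9 V, per-unit frequency halved); treat these ratios as assumptions and check them against [10]. `relative_power` is a hypothetical helper name.

```python
def relative_power(cap_ratio, vdd_ratio, freq_ratio):
    """Power relative to the reference design, from P ∝ C * VDD^2 * f."""
    return cap_ratio * vdd_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    # Assumed ratios for a two-way parallel datapath (see lead-in).
    parallel = relative_power(cap_ratio=2.15, vdd_ratio=2.9 / 5.0, freq_ratio=0.5)
    print(f"Parallel datapath power: x{parallel:.2f} of the reference")
```

Each parallel unit runs at half the clock rate, so it tolerates the longer gate delay at the lower supply while the overall throughput stays the same.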
Pipeline
[10] A.P. Chandrakasan, “Minimizing power consumption in digital CMOS circuits”, Proc. of IEEE, 1995
By inserting additional pipeline latches, the logic depth of the critical path is reduced, so the logic can be operated at a slower rate
By adopting pipelining : Power ≈ ×0.39, Area ≈ ×1.3
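The ×0.39 figure can be checked with the same relation P ∝ C · VDD² · f. The worked equation below assumes the ratios recalled from the cited paper's pipelined datapath (capacitance ×1.15 for the extra latches, supply scaled to 0.58 of the reference, unchanged clock frequency); these ratios are assumptions to verify against [10].

```latex
\frac{P_{pipe}}{P_{ref}}
  = \frac{C_{pipe}}{C_{ref}}
    \left( \frac{V_{pipe}}{V_{ref}} \right)^{2}
    \frac{f_{pipe}}{f_{ref}}
  \approx 1.15 \times 0.58^{2} \times 1
  \approx 0.39
```

The shorter logic depth per stage leaves timing slack at the original clock rate, and that slack is spent on a lower supply voltage.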
Low Power Design - Circuit Level
Critical Path
Critical Path : The worst-case delay path
Determines the SoC’s maximum performance
# of critical paths << # of non-critical paths
A fast non-critical path is just wasteful
By increasing the delay of non-critical paths, power can be reduced because of the tradeoff between power and performance
Dual VDD
Basic Idea
VDDL : Logic gates off the critical path
VDDH : Logic gates on the critical path
Reduce power without degrading the performance
Figure : Shaded gates use VDDL; non-shaded gates use VDDH
[11] K. Usami, “Automated low-power technique exploiting multiple supply voltages applied to a mediaprocessor”, JSSC, 1998
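The basic idea of dual VDD can be sketched as a greedy assignment: move a gate to VDDL only if every path through it still meets the clock period. This is a simplified illustration, not the algorithm of the cited paper; it ignores level converters and clustering constraints, and the function name, data layout, and delay numbers are all hypothetical.

```python
def assign_dual_vdd(delay_high, delay_low, paths, clock_period):
    """Greedy dual-VDD assignment; returns the set of gates moved to VDDL.

    delay_high / delay_low: dict mapping gate name -> delay at VDDH / VDDL.
    paths: list of gate-name lists; every path must fit in clock_period.
    """
    low = set()

    def path_delay(path):
        return sum(delay_low[g] if g in low else delay_high[g] for g in path)

    # Visit gates with the smallest VDDL delay penalty first.
    for gate in sorted(delay_high, key=lambda g: delay_low[g] - delay_high[g]):
        low.add(gate)
        if any(path_delay(p) > clock_period for p in paths if gate in p):
            low.discard(gate)  # timing violated: keep this gate at VDDH
    return low


if __name__ == "__main__":
    # Hypothetical netlist: a-b-c is the critical path, d is off-critical.
    delay_high = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
    delay_low = {"a": 1.5, "b": 1.5, "c": 1.5, "d": 1.5}
    paths = [["a", "b", "c"], ["d"]]
    print(sorted(assign_dual_vdd(delay_high, delay_low, paths, clock_period=3.0)))
```

In this example only gate `d` can be moved to VDDL, because any slowdown on the critical path `a-b-c` would exceed the clock period.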
Dual VTH
High-VTH
Assigned to transistors in non-critical paths
Leakage saving in both standby and active modes
Low-VTH
Assigned to transistors in critical paths
Performance is maintained
[12] J. T. Kao, “Dual-Threshold Voltage Techniques for Low-Power Digital Circuits”, JSSC, 2000
MTCMOS
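MTCMOS : Multiple Threshold voltage CMOS
Basic Circuit Scheme
Two different VTH : High-VTH (0.5~0.6V) / Low-VTH (0.2~0.3V)
Two operating modes : Active / Standby
Operation
Active mode
Complementary sleep signals SL = 1 / 0 (sleep transistors on)
VDDV ≈ VDD / VGNDV ≈ VGND
Operating frequency is set by the low-VTH logic
Standby mode
Complementary sleep signals SL = 0 / 1 (sleep transistors off)
VDDV & VGNDV = floating
Leakage is limited by the high-VTH sleep transistors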
[13] Anis, M, “Dynamic and leakage power reduction in MTCMOS circuits”, Proc. of IEEE, 2002
DVFS : Basic Concept
Basic Concept
$P_{dynamic} = C V_{DD}^2 f$
VDD and frequency are scaled simultaneously
VDD scaling : The most effective way to lower Pdynamic because $P_{dynamic} \propto V_{DD}^2$
Frequency scaling : Operating frequency = throughput
Not all tasks require maximum throughput
By controlling the frequency, the SoC improves energy efficiency
[14] G. Dhiman, “Analysis of Dynamic Voltage Scaling for System Level Energy Management”, HotPower, 2008
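The energy benefit of DVFS can be sketched by picking, for a task with a deadline, the lowest-energy operating point that still finishes in time. This is an illustrative model only: the operating points, effective capacitance, cycle count, and function names are assumptions, and leakage energy is ignored.

```python
def task_energy(c_eff, vdd, cycles):
    """Dynamic energy: E = P * t = (C * VDD^2 * f) * (cycles / f) = C * VDD^2 * cycles."""
    return c_eff * vdd ** 2 * cycles


def pick_operating_point(points, cycles, deadline, c_eff=1e-9):
    """Return the (vdd, freq) pair with the lowest energy that meets the deadline."""
    feasible = [(vdd, freq) for vdd, freq in points if cycles / freq <= deadline]
    if not feasible:
        raise ValueError("no operating point meets the deadline")
    return min(feasible, key=lambda p: task_energy(c_eff, p[0], cycles))


if __name__ == "__main__":
    # Hypothetical (VDD in volts, frequency in hertz) operating points.
    points = [(1.2, 1e9), (0.9, 5e8), (0.7, 2.5e8)]
    for deadline in (1.5e-3, 3e-3, 5e-3):
        vdd, freq = pick_operating_point(points, cycles=1e6, deadline=deadline)
        print(f"deadline {deadline * 1e3:.1f} ms -> {vdd} V at {freq / 1e6:.0f} MHz")
```

Since the dynamic energy per task depends on VDD² but not on f, lowering the frequency alone saves no dynamic energy; the saving comes from the lower VDD that the slower clock permits.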
SONY Microprocessor
[15] M.Nakai, “Dynamic Voltage and Frequency Management for a Low-Power Embedded Microprocessor”, JSSC, 2005
DVFS Block
DVFS
DVFS Block Diagram
Closed loop system
DVC : VDD control circuit
DFC : Frequency control circuit
Delay Synthesizer Structure
Composed of not only a simple transistor delay factor, but also wire delay and rise/fall delay
Gate delay component : one with nominal gate length and another with long gate length
RC delay component : wires from each of the four metal layers, with a total length of 14 mm
Delay Synthesizer Effect
Operation (DVC+DFC)
Operation Procedure
Low → High : The main logic clock frequency is changed after the DVC confirms the voltage has increased enough
High → Low : Both the DVC reference clock and the system clock are changed simultaneously
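The ordering rule in the operation procedure can be written down as a small sequencing function. This is a sketch of the ordering only, not of the actual controller: the step descriptions and the function name are hypothetical, and the first step of the Low → High case (retargeting the DVC reference clock before waiting) is an assumption inferred from the High → Low description.

```python
def transition_steps(current_freq, target_freq):
    """Return the ordered actions for a DVFS frequency change."""
    if target_freq > current_freq:
        # Low -> High: raise the voltage first, then the clock.
        return [
            "set DVC reference clock to target frequency",  # assumed step
            "wait until DVC confirms VDD has increased enough",
            "switch main logic clock to target frequency",
        ]
    if target_freq < current_freq:
        # High -> Low: the lower frequency is safe at the present voltage.
        return ["switch DVC reference clock and system clock together"]
    return []


if __name__ == "__main__":
    for step in transition_steps(100e6, 200e6):
        print(step)
```

The asymmetry exists because running a fast clock at a still-low voltage would violate timing, whereas running a slow clock at a still-high voltage only wastes some energy for a short time.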
Performance Enhancement
Low Power Design - Device Level
High-K & Metal Gate
SiO2 → High-K material
A thicker oxide can be used without performance degradation
Gate leakage is substantially decreased
Low power & high performance !!
Poly gate → Metal gate
A poly gate is not suitable for high-k material : a high switching voltage is required
A metal gate solves the problem of the poly gate
Low power !!
[16] http://www.automationnotebook.com
FinFET
Characteristics
Vertical structure
FinFET effective width = fin thickness + 2 × fin height
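The effective width formula can be evaluated directly. In the sketch below the fin dimensions are assumed example values, and the `num_fins` parameter (several identical fins in parallel, so the total width is a multiple of the single-fin width) is an addition for illustration.

```python
def finfet_effective_width(fin_thickness, fin_height, num_fins=1):
    """Effective width = num_fins * (fin thickness + 2 * fin height)."""
    return num_fins * (fin_thickness + 2 * fin_height)


if __name__ == "__main__":
    # Assumed example dimensions in nanometers.
    single = finfet_effective_width(fin_thickness=10, fin_height=30)
    triple = finfet_effective_width(fin_thickness=10, fin_height=30, num_fins=3)
    print(f"1 fin: {single} nm, 3 fins: {triple} nm")
```

The two sidewalls contribute twice the fin height, so a tall, thin fin provides more drive width than its footprint alone would suggest.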
Scaling
As scaling goes on, the variation of planar MOSFETs gets worse, so VDD scaling becomes impossible.
However, the VTH variation of FinFETs can be reduced.
Fully depleted device ⇒ Superior short channel control
Undoped body ⇒ No random dopant fluctuation
VDD scaling is possible ⇒ low power !!
[17] T. Chiarella, “Migrating from Planar to FinFET for Further CMOS Scaling: SOI or Bulk?”, ESSDERC, 2009
Figure : 3D FinFET (fin labeled)
Low Power Design - Near-Threshold Voltage (NTV) Operation
Low Voltage Operation
Low Voltage Operation
Near- and sub-VTH digital circuit design has focused on low power consumption.
Sub-VTH Operation
Sub-VTH operation is suitable only for specific applications in which performance is not a concern.
Near-VTH Operation
Near-VTH operation is suitable for applications that use DVFS, such as the application processor (AP) in a cell phone.
Balanced trade-off between power and performance
[18] R. G. Dreslinski, “Near-Threshold Computing: Reclaiming Moore’s Law Through Energy Efficient Integrated Circuits”, Proc. of IEEE, 2010
NTV Application
Samsung and Intel’s Products
Samsung : ARM Cortex-A7 processor
Intel : IA-32 processor
Samsung’s NTV
Samsung mentioned NTV operation in its press release of Dec. 19, 2012 :
Samsung Electronics Co., Ltd. announced that it reached another milestone in the development of 14nm FinFET process technology with the successful tape-out of multiple development vehicles in collaboration with its key design and IP partners.
As part of its 14nm FinFET development process, Samsung, and its ecosystem partners – ARM, Cadence, Mentor and Synopsys – taped out multiple test chips ranging from a full ARM® Cortex™-A7 processor implementation to a SRAM-based chip capable of operation near threshold voltage levels as well as an array of analog IP.
Samsung used Synopsys tools optimized for FinFET devices to implement additional IP on this vehicle, including low power SRAMs intended to operate with the power supply close to threshold voltage levels. The move from two-dimensional transistors to three-dimensional transistors introduces several new IP and EDA tool challenges including modeling. The multi-year collaboration between Samsung and Synopsys has delivered foundational modeling technologies for 3D parasitic extraction, circuit simulation and physical design-rule support of FinFET devices.
[19] http://www.samsung.com/global/business/semiconductor/news-events/press-releases/detail?newsId=12461
Intel’s NTV
Intel presented 3 papers applying NTV operation at ISSCC 2012.
[20] “Intel Labs at ISSCC 2012”, Intel Corporation, 2012
Issues of NTV operation
Performance Variation
In the near-VTH region, the dependence of the driving current on VTH, VDD, and temperature approaches exponential, which significantly increases the performance variation.
[18] R. G. Dreslinski, “Near-Threshold Computing: Reclaiming Moore’s Law Through Energy Efficient Integrated Circuits”, Proc. of IEEE, 2010
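The growth of performance variation near VTH can be illustrated with a current model that is continuous from the sub-threshold to the super-threshold region. The sketch below uses an EKV-style interpolation, which is a swapped-in model and not taken from the lecture or from [18]; the parameters m = 1.5, vT = 25.85 mV, VTH = 0.35 V, and the ±30 mV VTH shift are assumptions.

```python
import math


def ekv_drive_current(vdd, vth, m=1.5, v_t=0.02585):
    """Drain current up to a constant factor, with VGS = VDD.

    Tends to an exponential below VTH and to a square law well above VTH.
    """
    x = (vdd - vth) / (2.0 * m * v_t)
    return math.log1p(math.exp(x)) ** 2


def ntv_current_spread(vdd, vth_nominal=0.35, delta_vth=0.03):
    """Ratio of fast-corner to slow-corner current for a +/- delta_vth shift."""
    fast = ekv_drive_current(vdd, vth_nominal - delta_vth)
    slow = ekv_drive_current(vdd, vth_nominal + delta_vth)
    return fast / slow


if __name__ == "__main__":
    for vdd in (1.0, 0.6, 0.4, 0.2):
        print(f"VDD = {vdd:.1f} V: I_D(fast) / I_D(slow) = {ntv_current_spread(vdd):.2f}")
```

Under these assumptions the same VTH shift causes a modest current spread at 1.0 V but a spread of several times near and below VTH, consistent with the trend described above.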
Summary
State-of-the-art VLSI : Low power & process variation tolerant design
$P = \underbrace{P_{sw} + P_{sc}}_{P_{dynamic}} + \underbrace{P_{sub} + P_{gate} + P_{junc}}_{P_{static}}$
Power and performance : Trade-off
Low power design
Architecture and algorithm level : parallelism, pipelining
Circuit level : long channel, stacked, dual VDD, dual VTH, MTCMOS, DVFS
Device level : FinFET, HKMG
NTV operation