Post on 03-Sep-2018
Chapter 7
Optimizing Power @ Design Time – Memory
Benton H. Calhoun, Jan M. Rabaey
Slide 7.1
Role of Memory in ICs
Memory is very important
Focus in this chapter is embedded memory
Percentage of area going to memory is increasing
[Ref: V. De, Intel 2006]
Slide 7.2
Processor Area Becoming Memory Dominated
On-chip SRAM contains 50–90% of total transistor count
– Xeon: 48M/110M
– Itanium 2: 144M/220M
SRAM is a major source of chip static power dissipation
– Dominant in ultra low-power applications
– Substantial fraction in others
[Figure: Intel Penryn™ die photo (picture courtesy of Intel)]
Slide 7.3
Chapter Outline
Introduction to Memory Architectures
Power in the Cell Array
Power for Read Access
Power for Write Access
New Memory Technologies
Slide 7.4
Basic Memory Structures
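[Ref: J. Rabaey, Prentice’03]
[Figure: memory organized as Block 0 … Block i … Block P – 1, addressed by row address, column address, and block address; a block selector connects the blocks to a global data bus, which feeds a global amplifier/driver and the I/O; control circuitry alongside]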
Slide 7.5
SRAM Metrics
Functionality
– Data retention
– Readability
– Writability
– Soft Errors
Area
Power
Process variations increase with scaling
Large number of cells requires analysis of tails (out to 6σ or 7σ)
Within-die VTH variation due to Random Dopant Fluctuations (RDFs)
Why is functionality a “metric”?
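A rough way to see why the tails matter: the sketch below estimates how many cells of a large array fall beyond a given number of standard deviations. It assumes independent, Gaussian-distributed cell margins, and the array size is a hypothetical example; neither assumption comes from the slides.

```python
import math


def gaussian_tail_probability(n_sigma):
    """One-sided probability that a normal variable lies beyond n_sigma."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2.0))


def expected_failing_cells(num_cells, n_sigma):
    """Expected count of cells beyond n_sigma (assumes independent Gaussian cells)."""
    return num_cells * gaussian_tail_probability(n_sigma)


if __name__ == "__main__":
    cells = 8 * 2**20  # hypothetical 1 MB array, one cell per bit
    for n_sigma in (3, 4, 5, 6, 7):
        print(n_sigma, expected_failing_cells(cells, n_sigma))
```

Under these assumptions a margin that covers only 3σ leaves thousands of cells of a 1 MB array outside it, while the expected count drops well below one cell only at around 6σ.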
Slide 7.6
Where Does SRAM Power Go?
Numerous analytical SRAM power models
Great variety in power breakdowns
Different applications cause different components of power to dominate
Hence: Depends on applications, e.g., high speed versus low power, portable
Slide 7.7
SRAM cell
Three tasks of a cell
Hold data
– WL = 0; BLs = X
Write
– WL = 1; BLs driven with new data
Read
– WL = 1; BLs precharged and left floating
[Figure: traditional 6-transistor (6T) SRAM cell with transistors M1–M6, wordline WL, bitlines BL and BLB, and storage nodes Q and QB]
Slide 7.8
Key SRAM cell metrics
Key functionality metrics
Hold
– Static Noise Margin (SNM)
– Data retention voltage (DRV)
Read
– Static Noise Margin (SNM)
Write
– Write Margin
Metrics: Area is primary constraint
Next, Power, Delay
[Figure: traditional 6-transistor (6T) SRAM cell with transistors M1–M6, wordline WL, bitlines BL and BLB, and storage nodes Q and QB]
Slide 7.9
Static Noise Margin (SNM)
SNM gives a measure of the cell’s stability by quantifying the DC noise required to flip the cell
SNM is length of side of the largest embedded square on the butterfly curve
[Ref: E. Seevinck, JSSC’87]
[Figure: 6T cell (M1–M6, WL, BL, BLB) redrawn as cross-coupled inverters Inv 1 and Inv 2 with DC noise sources VN at nodes Q and QB; butterfly plot of QB (V) versus Q (V), axes 0 to 0.3 V, showing the VTC for Inv 2 and the inverse VTC for Inv 1, both nominal and with VN = SNM]
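The largest-embedded-square definition can be evaluated numerically. The following is a minimal sketch, not the procedure from the cited paper: it assumes a hypothetical tanh-shaped inverter transfer curve (`tanh_vtc`) in place of simulated VTCs, and it scans lines QB = Q + c, since a square with sides parallel to the axes has its diagonal on such a line.

```python
import math


def tanh_vtc(vdd, gain):
    """Hypothetical smooth inverter VTC that falls from vdd to 0 around vdd / 2."""
    def vtc(vin):
        return 0.5 * vdd * (1.0 - math.tanh(gain * (vin - 0.5 * vdd)))
    return vtc


def root_of_increasing(g, lo, hi, iterations=60):
    """Bisection for an increasing g; settles at an endpoint if g has no sign change."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def static_noise_margin(inv1, inv2, vdd, steps=2000):
    """Side of the largest axis-aligned square in the smaller butterfly lobe.

    Curve A is QB = inv1(Q) and curve B is Q = inv2(QB). For each offset c,
    both curves are intersected with the line QB = Q + c; the horizontal
    distance between the two intersections is the side of a candidate square.
    """
    best_upper = 0.0  # lobe above the Q = QB diagonal (c > 0)
    best_lower = 0.0  # lobe below the diagonal (c < 0)
    for i in range(steps + 1):
        c = -vdd + 2.0 * vdd * i / steps
        q_a = root_of_increasing(lambda q: q + c - inv1(q), 0.0, vdd)
        qb_b = root_of_increasing(lambda qb: qb - c - inv2(qb), 0.0, vdd)
        q_b = inv2(qb_b)
        side = q_a - q_b
        if c > 0.0:
            best_upper = max(best_upper, side)
        elif c < 0.0:
            best_lower = max(best_lower, -side)
    return min(best_upper, best_lower)


if __name__ == "__main__":
    sharp = tanh_vtc(1.0, 20.0)
    soft = tanh_vtc(1.0, 4.0)
    print(static_noise_margin(sharp, sharp, 1.0))
    print(static_noise_margin(soft, soft, 1.0))
```

With this toy model, a lower-gain inverter gives smaller lobes and therefore a smaller SNM, and mismatched `inv1` and `inv2` curves make one lobe smaller than the other, so the minimum of the two is reported.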
Slide 7.10
Static Noise Margin with Scaling
Typical cell SNM deteriorates with scaling
Tech and VDD scaling lower SNM
Variations lead to failure from insufficient SNM
Variations worsen tail of SNM distribution
(Results obtained from simulations with Predictive Technology Models – [Ref: PTM; Y. Cao ‘00])
Slide 7.11
Variability: Write Margin
Dominant fight (ratioed)
[Figure: 6T cell during a write, with WL, BLB, and BL marked with their logic values; three butterfly plots of normalized QB versus normalized Q (axes 0 to 1), titled “Cell stability prior to write”, “Successful write: negative ‘SNM’”, and “Write failure: positive SNM”]
Slide 7.12
Variability: Cell Writability
Write margin limits VDD scaling for 6T cells to 600 mV, best case.
65 nm process, VDD = 0.6 V
Variability and large number of cells makes this worse
[Figure: SNM (V), from –0.25 to 0.05, versus temperature (°C), from –40 to 120, at VDD = 0.6 V for process corners TT, WW, SS, WS, and SW; values above 0 are marked “Write Fails”]
Slide 7.13
Cell Array Power
Leakage Power dominates while the memory holds data
[Figure: 6T cell (WL, BL, BLB) storing ‘1’ and ‘0’]
Sub-threshold leakage
Importance of Gate tunneling and GIDL depends on technology and voltages applied
Slide 7.14
Using Threshold Voltage to Reduce Leakage
High-VTH cells necessary if all else is kept the same
To keep leakage in 1 MB memory within bounds, VTH must be kept in 0.4–0.6 V range
[Figure: 1 MB array retention current (A), log scale from 10^–8 to 10^0, versus average extrapolated VTH (V) at 25°C, from –0.2 to 1.0, with curves for Tj = 25, 50, 75, 100, and 125°C; reference levels at 0.1 µA and 10 µA; markers labelled “high speed” (0.49) and “low power” (0.71). Extrapolated VTH = VTH(nA/μm) + 0.3 V; Lg = 0.1 μm; W(QT) = 0.20 μm; W(QD) = 0.28 μm; W(QL) = 0.18 μm]
[Ref: K. Itoh, ISCAS’06]
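The strong dependence of retention current on VTH follows from the exponential sub-threshold characteristic. As a sketch under stated assumptions, the functions below use the common first-order model in which leakage falls by one decade per sub-threshold swing S of added VTH; the default of 100 mV/decade is an assumed value, not a number from the slide.

```python
import math


def leakage_reduction_factor(delta_vth, swing_v_per_decade=0.1):
    """Factor by which sub-threshold leakage drops when VTH rises by delta_vth (V)."""
    return 10.0 ** (delta_vth / swing_v_per_decade)


def vth_increase_for_reduction(reduction_factor, swing_v_per_decade=0.1):
    """VTH increase (V) needed to cut sub-threshold leakage by reduction_factor."""
    return swing_v_per_decade * math.log10(reduction_factor)


if __name__ == "__main__":
    # Moving from a 10 µA to a 0.1 µA budget is a 100x reduction.
    print(vth_increase_for_reduction(100.0))
    print(leakage_reduction_factor(0.2))
```

In this model the swing itself grows with temperature, which is one reason a VTH chosen at 25°C has to leave room for the hot corners.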
Slide 7.15
Multiple Threshold Voltages
Dual VTH cells with low-VTH access transistors provide good tradeoffs in power and delay
[Ref: Hamzaoglu, et al., TVLSI’02]
Use high-VTH devices to lower leakage for stored ‘0’, which is much more common than a stored ‘1’
[Ref: N. Azizi, TVLSI’03]
[Figure: two cell schematics (WL, BL, BLB) with high-VTH and low-VTH devices marked; the second cell is shown storing ‘0’]
Slide 7.16
Multiple Voltages
Selective usage of multiple voltages in cell array
– e.g., 16 fA/cell at 25°C in 0.13 μm technology
High VTH to lower sub-VTH leakage
Raised source, raised VDD, and lower BL reduce gate stress while maintaining SNM
[Figure: cell annotated with WL = 0 V and node voltages of 1.0 V (twice), 1.5 V, and 0.5 V]
[Ref: K. Osada, JSSC’03]
Slide 7.17
Power Breakdown During Read
Accessing correct cell
– Decoders, WL drivers
– For Lower Power:
hierarchical WLs
pulsed decoders
Performing read
– Charge and discharge large BL capacitance
– For Lower Power:
SAs and low BL swing
Lower VDD (may require read assist)
Hierarchical BLs
Lower BL precharge
[Figure: read path showing Address, WL, VDD_Prech, Mem Cell, Sense Amp, and Data]
Slide 7.18
Hierarchical Wordline Architecture
Reduces amount of switched capacitance
Saves power and lowers delay
[Ref’s: Rabaey, Prentice’03; T. Hirose, JSSC’90]
[Figure: a global word line and sub-global word line drive local word lines and their memory cells in Block 0, Block 1, Block 2, …, gated by block select and block group select signals]
Slide 7.19
Hierarchical Bitlines
Divide up bitlines hierarchically
– Many variants possible
Reduces RC delay, also decreases CV² power
Lower BL leakage seen by accessed cell
[Figure: local BLs connected to global BLs]
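To make the CV² argument concrete, here is a small sketch comparing the capacitance switched on a flat bitline with that of a segmented one. The model is a deliberate simplification: each cell adds a fixed capacitance to its bitline, and each local segment adds a fixed load to the global bitline. All capacitance values are hypothetical.

```python
def bitline_energy(capacitance_f, vdd, swing):
    """Supply energy (J) to restore a bitline of the given capacitance after a swing (V)."""
    return capacitance_f * vdd * swing


def flat_bitline_cap(rows, cap_per_cell_f):
    """Capacitance (F) of one undivided bitline shared by all rows."""
    return rows * cap_per_cell_f


def hierarchical_bitline_cap(rows, segments, cap_per_cell_f, global_cap_per_segment_f):
    """Capacitance (F) switched when one local segment and the global bitline move."""
    if rows % segments != 0:
        raise ValueError("rows must divide evenly into segments")
    local = (rows // segments) * cap_per_cell_f
    global_line = segments * global_cap_per_segment_f
    return local + global_line


if __name__ == "__main__":
    flat = flat_bitline_cap(512, 1e-15)                        # hypothetical 1 fF per cell
    split = hierarchical_bitline_cap(512, 8, 1e-15, 2e-15)     # hypothetical 2 fF per segment tap
    print(bitline_energy(flat, 1.0, 0.1), bitline_energy(split, 1.0, 0.1))
```

With these example numbers the switched capacitance falls from 512 fF to 80 fF. The real saving depends on the wiring and switch overhead of the chosen hierarchy, which this sketch does not model.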
Slide 7.20
BL Leakage During Read Access
Leakage into non-accessed cells
– Raises power and delay
– Affects BL differential
[Figure: cells storing “1”, “0”, and “0” sharing one bit-line]
Slide 7.21
Bitline Leakage Solutions
Hierarchical BLs
Raise VSS in cell
Negative WL voltage
Longer access FETs
Alternative bit-cells
Active compensation
Lower BL precharge voltage
[Figure: two cells storing “1” and “0”, one illustrating raised VSS in cell (VGND) and one illustrating negative wordline (NWL)]
[Ref: A. Agarwal, JSSC’03]
Slide 7.22
Lower Precharge Voltage
Lower BL precharge voltage decreases power and improves Read SNM
Internal bit-cell node rises less
Sharp limit due to accidental cell writing if access FET pulls internal ‘1’ low
Slide 7.23
VDD Scaling
Lower VDD (and other voltages) via classic voltage scaling
– Saves power
– Increases delay
– Limited by lost margin (read and write)
Recover Read SNM with read assist
– Lower BL precharge
– Boosted cell VDD [Ref: Bhavnagarwala’04, Zhang’06]
– Pulsed WL and/or Write-after-Read [Ref: Khellah’06]
– Lower WL [Ref: Ohbayashi’06]
Slide 7.24
Power Breakdown During Write
Accessing cell
– Similar to Read
– For Lower Power:
Hierarchical WLs
Performing write
– Traditionally drive BLs full swing
– For Lower Power:
Charge sharing
Data dependencies
Low swing BLs with amplification
[Figure: write path showing Address, WL, VDD_Prech, Mem Cell, and Data]
Slide 7.25
Charge recycling to reduce write power
Share charge between BLs or pairs of BLs
Saves for consecutive write operations
Need to assess overhead
[Figure: three steps. Old values: BL = 0 V, BLB = VDD. Connect floating BLs: BL = VDD/2, BLB = VDD/2. Disconnect and drive new values: BL = VDD, BLB = 0 V]
[Ref’s: K. Mai, JSSC’98; G. Ming, ASICON’05]
Basic charge recycling – saves 50% power in theory
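The 50% figure can be checked with a short calculation. This sketch assumes two bitlines of equal capacitance, ideal switches, and a previous write that left the pair at full swing with the opposite polarity; the capacitance value is hypothetical.

```python
def write_energy_conventional(c_bl, vdd):
    """Supply energy (J) to charge the low bitline from 0 V to VDD."""
    return c_bl * vdd * vdd


def write_energy_charge_recycled(c_bl, vdd):
    """Supply energy (J) when the pair is first shorted, then driven to full swing.

    Shorting equal capacitances at 0 V and VDD leaves both at VDD / 2, so the
    supply only has to lift one bitline from VDD / 2 to VDD.
    """
    v_shared = (0.0 + vdd) / 2.0
    return c_bl * (vdd - v_shared) * vdd


if __name__ == "__main__":
    c_bl = 200e-15  # hypothetical bitline capacitance in farads
    print(write_energy_charge_recycled(c_bl, 1.0) / write_energy_conventional(c_bl, 1.0))
```

The ratio is 0.5 regardless of the capacitance or supply chosen, which matches the theoretical saving quoted on the slide. Switch and control overhead, which the slide says needs to be assessed, reduces it in practice.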
Slide 7.26
Memory Statistics
0’s more common
– SPEC2000: 90% 0s in data
– SPEC2000: 85% 0s in instructions
Assumed write value using inverted data as necessary [Ref: Y. Chang, ISLPED’99]
New Bitcell:
– 1R, 1W port
– W0: WZ = 0, WWL = 1, WS = 1
– W1: WZ = 1, WWL = 1, WS = 0
[Figure: bitcell with BL, BLB, WL, WWL, WS, and WZ]
[Ref: Y. Chang, TVLSI’04]
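The idea of storing inverted data when that yields more 0s can be sketched in a few lines. This is only an illustration of the general principle with one flag bit per word; the word width and the flag are assumptions here, and the details of the cited schemes may differ.

```python
def encode_word(word, width=32):
    """Return (stored_word, inverted_flag), inverting when that yields more 0 bits."""
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")
    if ones > width - ones:
        return (~word) & mask, True
    return word, False


def decode_word(stored, inverted, width=32):
    """Recover the original word from the stored word and its flag."""
    mask = (1 << width) - 1
    return (~stored) & mask if inverted else stored & mask


if __name__ == "__main__":
    stored, flag = encode_word(0xFFFFFFF0)
    print(hex(stored), flag, hex(decode_word(stored, flag)))
```

A word dominated by 1s is stored as its complement, so a cell or write scheme that is cheaper for 0s sees the favourable value most of the time, at the cost of one extra bit per word.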
Slide 7.27
Low-Swing Write
Drive the BLs with low swing
Use amplification in cell to restore values
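[Ref: K. Kanda, JSSC’04]
[Figure: column circuit with VDD_Prech, EQ, WE, Din, column decoder, write supply VWR = VDD – VTH – ΔVBL, bitlines BL and BLB, and a cell with nodes Q and QB and control SLC; timing waveforms for SLC, WL, EQ, WE, BL/BLB (between VDD – VTH – ΔVBL and VDD – VTH), and Q/QB]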
Slide 7.28
Write Margin
Fundamental limit to most power-reducing techniques
Recover write margin with write assist, e.g.,
– Boosted WL
– Collapsed cell VDD [Itoh’96, Bhavnagarwala’04]
– Raised cell VSS [Yamaoka’04, Kanda’04]
– Cell with amplification [Kanda’04]
Slide 7.29
Non-traditional cells
Key tradeoff is with functional robustness
Use alternative cell to improve robustness, then trade off for power savings
– e.g. Remove read SNM
8T SRAM cell
• Register file cell
• 1R, 1W port
• Read SNM eliminated
• Allows lower VDD
• 30% area overhead
• Robust layout
[Figure: 8T SRAM cell with write bitline pair WBL, write wordline WWL, read wordline RWL, and read bitline RBL]
[Ref: L. Chang, VLSI’05]
Slide 7.30
Cells with Pseudo-Static SNM Removal
Isolate stored data during read
Dynamic storage for duration of read
[Ref: S. Kosonocky, ISCICT’06] [Ref: K. Takeda, JSSC’06]
[Figure: two cell schematics, one labelled “Differential read” and one labelled “Single-ended read”; signals shown include BL, WL, WLW, WWL, and WLB]
Slide 7.31
Emerging Devices: Double-gate MOSFET
Emerging devices allow new SRAM structures
Back-gate biasing of thin-body MOSFET provides improved control of short-channel effects, and re-instates effective dynamic control of VTH.
Double-gated (DG) MOSFET
Back-gated (BG) MOSFET
• Independent front and back gates
• One switching gate and VTH control gate
[Figure: DG MOSFET with Drain, Source, and Gate; fin height HFIN = W/2, gate length = LG, fin width = TSi. BG MOSFET with Drain, Source, Gate1 (switching gate), and Gate2 (VTH control); fin height HFIN = W, gate length = LG]
[Ref: Z. Guo, ISLPED’05]
Slide 7.32
6T SRAM Cell with Feedback
Double-Gated (DG) NMOS pull-down and PMOS load devices
Back-Gated (BG) NMOS access devices dynamically increase β-ratio
– SNM during read ~300 mV
– Area penalty ~ 19%
[Figure: cell schematic with load devices PL and PR, pull-down devices NL and NR, and access devices AL and AR, storing “1” and “0”, annotated “β ratio increased”; butterfly plots of Vsn2 (V) versus Vsn1 (V), axes 0 to 1 V, for 6T DG-MOS and 6T BG-MOS in STANDBY and READ, with marked margins of 210 mV and 300 mV]
[Ref: Z. Guo, ISLPED’05]
Slide 7.33
Summary and Perspectives
Functionality is main constraint in SRAM
– Variation makes the outlying cells limiters
– Look at hold, read, write modes
Use various methods to improve robustness, then trade off for power savings
– Cell voltages, thresholds
– Novel bit-cells
– Emerging devices
Embedded memory major threat to continued technology scaling – innovative solutions necessary
Slide 7.34
References
Books and Book Chapters
K. Itoh et al., Ultra-Low Voltage Nano-scale Memories, Springer 2007.
A. Macii, “Memory Organization for Low-Energy Embedded Systems,” in Low-Power Electronics Design, C. Piguet Ed., Chapter 26, CRC Press, 2005.
V. Moshnyaga and K. Inoue, “Low Power Cache Design,” in Low-Power Electronics Design, C. Piguet Ed., Chapter 25, CRC Press, 2005.
J. Rabaey, A. Chandrakasan and B. Nikolic, Digital Integrated Circuits, Prentice Hall, 2003.
T. Takahawara and K. Itoh, “Memory Leakage Reduction,” in Leakage in Nanometer CMOS Technologies, S. Narendra, Ed., Chapter 7, Springer 2006.
Articles
A. Agarwal, H. Li and K. Roy, “A Single-Vt low-leakage gated-ground cache for deep submicron,” IEEE Journal of Solid-State Circuits, 38(2), pp. 319–328, Feb. 2003.
N. Azizi, F. Najm and A. Moshovos, “Low-leakage asymmetric-cell SRAM,” IEEE Transactions on VLSI, 11(4), pp. 701–715, Aug. 2003.
A. Bhavnagarwala, S. Kosonocky, S. Kowalczyk, R. Joshi, Y. Chan, U. Srinivasan and J. Wadhwa, “A transregional CMOS SRAM with single, logic VDD and dynamic power rails,” in Symposium on VLSI Circuits, pp. 292–293, 2004.
Y. Cao, T. Sato, D. Sylvester, M. Orshansky and C. Hu, “New paradigm of predictive MOSFET and interconnect modeling for early circuit design,” in Custom Integrated Circuits Conference (CICC), Oct. 2000, pp. 201–204.
L. Chang, D. Fried, J. Hergenrother et al., “Stable SRAM cell design for the 32 nm node and beyond,” Symposium on VLSI Technology, pp. 128–129, June 2005.
Y. Chang, B. Park and C. Kyung, “Conforming inverted data store for low power memory,” IEEE International Symposium on Low Power Electronics and Design, 1999.
Slide 7.35
References (cont.)
Y. Chang, F. Lai and C. Yang, “Zero-aware asymmetric SRAM cell for reducing cache power in writing zero,” IEEE Transactions on VLSI Systems, 12(8), pp. 827–836, Aug. 2004.
Z. Guo, S. Balasubramanian, R. Zlatanovici, T.-J. King, and B. Nikolic, “FinFET-based SRAM design,” International Symposium on Low Power Electronics and Design, pp. 2–7, Aug. 2005.
F. Hamzaoglu, Y. Ye, A. Keshavarzi, K. Zhang, S. Narendra, S. Borkar, M. Stan, and V. De, “Analysis of Dual-VT SRAM cells with full-swing single-ended bit line sensing for on-chip cache,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 10(2), pp. 91–95, Apr. 2002.
T. Hirose, H. Kuriyama, S. Murakam, et al., “A 20-ns 4-Mb CMOS SRAM with hierarchical word decoding architecture,” IEEE Journal of Solid-State Circuits, 25(5), pp. 1068–1074, Oct. 1990.
K. Itoh, A. Fridi, A. Bellaouar and M. Elmasry, “A Deep sub-V, single power-supply SRAM cell with multi-VT, boosted storage node and dynamic load,” Symposium on VLSI Circuits, 133, June 1996.
K. Itoh, M. Horiguchi and T. Kawahara, “Ultra-low voltage nano-scale embedded RAMs,” IEEE Symposium on Circuits and Systems, May 2006.
K. Kanda, H. Sadaaki and T. Sakurai, “90% write power-saving SRAM using sense-amplifying memory cell,” IEEE Journal of Solid-State Circuits, 39(6), pp. 927–933, June 2004.
S. Kosonocky, A. Bhavnagarwala and L. Chang, International conference on solid-state and integrated circuit technology, pp. 689–692, Oct. 2006.
K. Mai, T. Mori, B. Amrutur et al., “Low-power SRAM design using half-swing pulse-mode techniques,” IEEE Journal of Solid-State Circuits, 33(11), pp. 1659–1671, Nov. 1998.
G. Ming, Y. Jun and X. Jun, “Low Power SRAM Design Using Charge Sharing Technique,” pp. 102–105, ASICON, 2005.
K. Osada, Y. Saitoh, E. Ibe and K. Ishibashi, “16.7-fA/cell tunnel-leakage-suppressed 16-Mb SRAM for handling cosmic-ray-induced multierrors,” IEEE Journal of Solid-State Circuits, 38(11), pp. 1952–1957, Nov. 2003.
PTM – Predictive Models. Available: http://www.eas.asu.edu/˜ptm
Slide 7.36
References (cont.)
E. Seevinck, F. List and J. Lohstroh, “Static noise margin analysis of MOS SRAM Cells,” IEEE Journal of Solid-State Circuits, SC-22(5), pp. 748–754, Oct. 1987.
K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii and H. Kobatake, “A read-static-noise-margin-free SRAM cell for low-vdd and high-speed applications,” IEEE International Solid-State Circuits Conference, pp. 478–479, Feb. 2005.
M. Yamaoka, Y. Shinozaki, N. Maeda, Y. Shimazaki, K. Kato, S. Shimada, K. Yanagisawa and K. Osada, “A 300 MHz 25 µA/Mb leakage on-chip SRAM module featuring process-variation immunity and low-leakage-active mode for mobile-phone application processor,” IEEE International Solid-State Circuits Conference, 2004, pp. 494–495.
Slide 7.37