FinFET-based SRAM and Monolithic
3-D Integrated Circuit Design
Abdullah Guler
A Dissertation
Presented to the Faculty
of Princeton University
in Candidacy for the Degree
of Doctor of Philosophy
Recommended for Acceptance
by the Department of
Electrical Engineering
Adviser: Professor Niraj K. Jha
September 2019
© Copyright by Abdullah Guler, 2019.
All rights reserved.
Abstract
Device miniaturization has enabled processors to become faster and more powerful for
decades. However, device scaling has become more challenging due to increasing leakage
power consumption, intolerable short-channel effects (SCEs), and rising manufacturing
costs. This thesis aims to develop new approaches for low-power and high-performance
designs for next-generation computing technologies. It focuses on two
research directions: FinFET-based static random access memory (SRAM) design and
hybrid monolithic 3-D integrated circuit (IC) design.
The first research direction is to design area-efficient, low-power, and high-
performance SRAM cells. To this end, we investigate two approaches: multi-
parameter asymmetric (MPA) FinFET-based SRAM design and 3-D transistor-level
monolithic (TLM) SRAM design. In the first approach, we use FinFETs with up
to three asymmetries to address various SRAM challenges, such as high leakage
power, the read-write conflict, and the width quantization issue, at once. We present five
new 6T SRAM cells using MPA FinFETs and provide a comprehensive evaluation
of SRAM cells based on asymmetric FinFETs. We show MPA FinFETs can achieve
high stability metrics and reduce leakage power significantly at a cost of degraded
performance. We investigate TLM technology in the second approach of SRAM
design. In 3-D TLM design, n- and p-type transistors are fabricated on different
layers. Conventional 6T/8T SRAM cells are area-inefficient when implemented
in 3-D due to the unequal numbers of n- and p-type transistors in the cell. We present
two new 3-D 8T SRAM cells that consist of four n-type and four p-type transistors
for better area efficiency. The proposed cells provide superior read performance and
lower leakage power consumption compared with other 2-D/3-D SRAM cells, at the
cost of degraded writeability.
The second research direction of this thesis is to explore the benefits of mono-
lithic 3-D design from circuit to multi-core system level. 3-D ICs can address design
challenges such as the interconnect bottleneck and memory wall. 3-D ICs reduce
power consumption, delay, and interconnect length by utilizing the vertical dimension.
Among 3-D IC solutions, monolithic 3-D technology appears to be very promising as
it provides the highest connectivity between transistor layers owing to its nanoscale
monolithic inter-tier vias (MIVs). Monolithic 3-D integration can be realized at dif-
ferent levels of granularity such as block, gate, and transistor. In this thesis, we focus
on hybrid monolithic (HM) designs, which combine modules implemented in different
monolithic styles to utilize their advantages. We develop the tools that are needed
to explore the HM design space. We develop a 3-D HM floorplanner, gate-level
placement methodology, and modeling tools for logic, memory, and NoC modules.
We integrate these tools into McPAT-monolithic, an area/timing/power architectural
modeling framework that we develop for HM multi-core systems.
Acknowledgments
First and foremost, I would like to thank my adviser Prof. Niraj K. Jha for his in-
valuable guidance and support during the past six years. I feel incredibly lucky to
have him as my adviser. I greatly appreciate his efforts in helping me improve my
communication and writing skills, guiding me in the right direction, and challenging
me to do better. I would also like to thank my thesis reading and defense
committee for their time and invaluable feedback. I would like to thank the National
Science Foundation for supporting this work under grants CCF-1318603, CCF-1714161,
and CCF-1811109.
I am indebted to all of my teachers who helped me throughout my academic
career. I would like to thank the incredible faculty at Princeton. I especially enjoyed
the courses I took with Prof. Naveen Verma, Prof. James Sturm, and Prof. Niraj
Jha. I would like to thank Bilge Aslan, Nuri Yılmaz, Muhittin Siro, and many other
teachers whose names I could not list here.
I would like to thank my group mates Sourindra Chaudhuri, Debajit Bhattacharya,
Aoxiang Tang, Xianmin Chen, Arsalan Mosenia, Jie (Lucy) Lu, Xiaoliang Dai, Ye
(Fisher) Yu, Ozge Akmandor, Hongxu Yin, Shayan Hassantabar, Tanujay Saha, and
Prerit Terway for their support. I am also grateful to Ajay Bhoj, a former group
member whom I never met, but from whose work I learned so much.
I greatly appreciate Princetonian staff both inside and outside my department
for making campus life comfortable and enjoyable. I would especially like to thank
Colleen Conrad for helping me with all the logistics and administrative work. Life in
Princeton for me has been much easier thanks to people at Equad, Transportation
and Parking, Davis International Center, McCosh Health Center, and dining halls.
I would like to thank my friends for making campus life incredibly fun and mem-
orable. Yen, Levent, Chandra Kanth, Burcin, Onur, Li-Fang, Tugce, Yasin, Murat,
Tri, Ozge, Mert, and Chinmay are only a small fraction of people whose friendship I
enjoyed immensely. I am especially thankful to Lung Yen Chen for being the greatest
housemate ever. I would like to thank my lifelong friends Faruk Gencel and Denizcan
Vanlı for their support. I also thank Bayboga, Anıl, Serkan, and Ender for the great
DotA games we enjoyed together to relieve the stress of PhD life.
I would finally like to thank my dad Abdulkadir, mom Rihıme, and siblings
Mıheme, Sanye, Xezal, Mihros, Melis, Isık, Semo, Inci, and Sekocan for their im-
measurable love and support. I am especially indebted to my parents and elder
siblings for the sacrifices they made to help younger ones get an education. I would
like to thank Melis for inspiring me to be a better person. Isık and Semo have not
only been my siblings but also my best friends. I have become more caring thanks
to my younger siblings Inci and Sekocan. I am also grateful to my dear nephews and
nieces for the joy and love they bring into my life.
My special thanks go to Eiichiro Oda, the creator of One Piece. I have immensely
enjoyed the journey of Captain Monkey D. Luffy and Straw Hat Crew. I have also
learned many invaluable lessons from my favorite characters. As Portgas D. Ace once
said, we need to live a life with no regrets.
Abbreviations
2-D Two-dimensional
3-D Three-dimensional
ADSG Asymmetric doping shorted-gate
AUSG Asymmetric underlap shorted-gate
AWSG Asymmetric workfunction shorted-gate
BEOL Back-end-of-line
BL Bitline
BLB Bitline bar
BLM Block-level monolithic
CMOS Complementary metal-oxide-semiconductor
DIBL Drain-induced barrier lowering
EDA Electronic design automation
FEOL Front-end-of-line
FU Functional unit
GLM Gate-level monolithic
GND Ground
HM Hybrid monolithic
IC Integrated circuit
IG Independent-gate
ILD Inter-layer dielectric
ILEAK Leakage current
IOFF Off-current
ION On-current
IREAD Read current
MIV Monolithic inter-tier via
MOSFET Metal-oxide-semiconductor field-effect transistor
MPA Multi-parameter asymmetric
NoC Network-on-chip
RBL Read bitline
RDF Random dopant fluctuation
RPNM Read power noise margin
RSNM Read static noise margin
RWL Read wordline
SCE Short-channel effect
SG Shorted-gate
SoC System-on-chip
SOI Silicon-on-insulator
SPA Single-parameter asymmetric
SRAM Static random access memory
TCAD Technology computer-aided design
TLM Transistor-level monolithic
TR Read time
TSV Through-silicon via
TW Write time
VDD Supply voltage
VTC Voltage transfer characteristics
Vth Threshold voltage
WL Wordline
WM Write margin
WTP Write trip power
To my family.
Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 FinFETs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 FinFET-based SRAM design . . . . . . . . . . . . . . . . . . . 6
1.1.3 SRAM characterization . . . . . . . . . . . . . . . . . . . . . . 8
1.1.4 Monolithic 3-D integration . . . . . . . . . . . . . . . . . . . . 10
1.2 Thesis contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2 Related Work 15
2.1 SPA FinFET-based SRAM design . . . . . . . . . . . . . . . . . . . . 15
2.2 TLM 3-D SRAM design . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 3-D hybrid monolithic floorplanning . . . . . . . . . . . . . . . . . . . 19
2.4 Monolithic 3-D design . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3 Ultra-low-leakage, Robust FinFET SRAM Design Using Multi-
parameter Asymmetric FinFETs 23
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Simulation setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.1 SRAM dc metrics . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2.2 SRAM transient metrics . . . . . . . . . . . . . . . . . . . . . 29
3.3 MPA FinFET-based 6T SRAM cells . . . . . . . . . . . . . . . . . . 30
3.3.1 Selection of promising SRAM cells . . . . . . . . . . . . . . . 30
3.3.2 SRAM dc metrics analysis . . . . . . . . . . . . . . . . . . . . 32
3.3.3 SRAM transient metrics analysis . . . . . . . . . . . . . . . . 36
3.4 Analysis of the SRAM cells under different gate workfunction, doping
concentration, supply voltage, and temperature values . . . . . . . . . 37
3.4.1 Different gate workfunction values . . . . . . . . . . . . . . . . 37
3.4.2 Different doping concentration values . . . . . . . . . . . . . . 41
3.4.3 Different supply voltage values . . . . . . . . . . . . . . . . . . 43
3.4.4 Different temperature values . . . . . . . . . . . . . . . . . . . 43
3.5 Process variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.7 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4 3-D Monolithic FinFET-based 8T SRAM Cell Design for Enhanced
Read Time and Low Leakage 53
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2 Simulation setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 Design of monolithic SRAM cells . . . . . . . . . . . . . . . . . . . . 58
4.3.1 Schematics of the SRAM cells . . . . . . . . . . . . . . . . . . 58
4.3.2 Layouts of the SRAM cells . . . . . . . . . . . . . . . . . . . . 60
4.3.3 Capacitance extraction . . . . . . . . . . . . . . . . . . . . . . 61
4.4 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.4.1 SRAM dc metric analysis . . . . . . . . . . . . . . . . . . . . 64
4.4.2 SRAM transient metric analysis . . . . . . . . . . . . . . . . . 67
4.5 Impact of process variations, memory array configurations, assist tech-
niques, different temperatures, and gate workfunction values . . . . . 69
4.5.1 SRAM cell analysis under process variations . . . . . . . . . . 69
4.5.2 SRAM cell analysis under different memory array configurations 71
4.5.3 SRAM cell analysis under assist techniques . . . . . . . . . . . 73
4.5.4 SRAM cell analysis under different temperature values . . . . 76
4.5.5 SRAM cell analysis under different gate workfunction values . 77
4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.7 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5 Hybrid Monolithic 3-D IC Floorplanner 82
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.3 Simulation setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.4 FinPrin-monolithic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.5 Gate-level monolithic placement . . . . . . . . . . . . . . . . . . . . . 91
5.6 CACTI-monolithic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.7 Hybrid-monolithic 3-D IC floorplanner . . . . . . . . . . . . . . . . . 97
5.7.1 Problem formulation . . . . . . . . . . . . . . . . . . . . . . . 98
5.7.2 T*-tree representation . . . . . . . . . . . . . . . . . . . . . . 98
5.7.3 Simulated annealing engine . . . . . . . . . . . . . . . . . . . 100
5.7.4 Cost function . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.7.5 Global wire power consumption . . . . . . . . . . . . . . . . . 101
5.8 HotSpot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.9 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.9.1 Floorplanning results . . . . . . . . . . . . . . . . . . . . . . . 104
5.9.2 Floorplanning results at minimum area, wirelength, and power
values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.9.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.10 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6 McPAT-monolithic: An Area/Power/Timing Framework for 3-D
Hybrid Monolithic Multi-Core Systems 113
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2 Simulation setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.3 Modeling of functional units . . . . . . . . . . . . . . . . . . . . . . . 117
6.4 Modeling of memory modules . . . . . . . . . . . . . . . . . . . . . . 118
6.5 Orion-monolithic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.6 McPAT-monolithic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.7.1 Floorplanning results of the OpenSPARC T2 core . . . . . . . 122
6.7.2 The OpenSPARC T2 SoC results . . . . . . . . . . . . . . . . 124
6.7.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.8 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
7 Conclusion and Future Work 129
7.1 Summary of findings . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Bibliography 133
List of Tables
1.1 22nm and 14nm SOI FinFET device parameter values . . . . . . . . 5
3.1 22nm SOI asymmetric FinFET device parameter values . . . . . . . . 26
3.2 Numerical representation of FinFETs . . . . . . . . . . . . . . . . . . 27
3.3 SRAM cell elimination examples . . . . . . . . . . . . . . . . . . . . . 31
3.4 6T SRAM configurations . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 6T SRAM dc and transient metric values . . . . . . . . . . . . . . . . 33
3.6 SRAM stability metric values under different drain doping concentrations 43
3.7 Comparison of SRAM cells at iso-IREAD/iso-ILEAK . . . . . . . . . . . 44
3.8 SRAM metric values at 0◦C . . . . . . . . . . . . . . . . . . . . . . . 44
3.9 SRAM metric values at 65◦C . . . . . . . . . . . . . . . . . . . . . . 45
3.10 SRAM metric values at 90◦C . . . . . . . . . . . . . . . . . . . . . . 45
3.11 Process variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.12 Distribution characteristics, µ . . . . . . . . . . . . . . . . . . . . . . 46
3.13 Distribution characteristics, σ . . . . . . . . . . . . . . . . . . . . . . 46
4.1 SRAM cell footprint area . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2 SRAM bitline and wordline capacitances . . . . . . . . . . . . . . . . 62
4.3 SRAM dc and transient metric values . . . . . . . . . . . . . . . . . . 65
4.4 Process variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.5 Distribution characteristics of SRAM dc and transient metrics . . . . 70
4.6 Impact of assist techniques on read stability and writeability . . . . . 74
4.7 Impact of assist techniques on read current and transient metrics . . 75
4.8 Gate workfunction values for designs with high stability . . . . . . . . 78
4.9 Gate workfunction values for high-performance designs . . . . . . . . 79
4.10 Gate workfunction values for low-leakage-power designs . . . . . . . . 79
4.11 Gate workfunction values for overall high-quality designs . . . . . . . 79
5.1 FGU footprint area and power values for different monolithic imple-
mentations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.2 Footprint area values assumed for the modules to be floorplanned . . 85
5.3 FinPrin-monolithic results . . . . . . . . . . . . . . . . . . . . . . . . 89
5.4 Placement results of 14 test circuits . . . . . . . . . . . . . . . . . . . 93
5.5 CACTI-monolithic input parameter values for memory modules . . . 95
5.6 CACTI-monolithic results . . . . . . . . . . . . . . . . . . . . . . . . 95
5.7 Instruction cache data array dimensions of BLM and TLM implemen-
tations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.8 Comparison of different monolithic designs based on minimum area-
power product showing the benefit of hybrid designs in terms of foot-
print area and power consumption . . . . . . . . . . . . . . . . . . . . 104
5.9 Minimum area configurations of different monolithic designs . . . . . 108
5.10 Minimum wirelength configurations of different monolithic designs . . 108
5.11 Minimum power consumption configurations of different monolithic de-
signs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
6.1 Footprint area and power consumption results for FUs . . . . . . . . 118
6.2 Memory modules in BLM vs. TLM . . . . . . . . . . . . . . . . . . . 119
6.3 Orion-monolithic results . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.4 Processor model parameter values . . . . . . . . . . . . . . . . . . . . 122
6.5 Comparison of different monolithic designs based on minimum area-
power product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
6.6 Area and power results for the SoC components implemented in BLM
and TLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.7 Total area and power consumption results for the SoC designs . . . . 126
List of Figures
1.1 Planar MOSFET model. . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Key milestones in device scaling. . . . . . . . . . . . . . . . . . . . . 3
1.3 FinFET types: (a) SG and (b) IG. . . . . . . . . . . . . . . . . . . . 4
1.4 A 2-D cross-section of a 3-D FinFET. . . . . . . . . . . . . . . . . . . 5
1.5 6T SRAM cell schematic. . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 2-D cross section of symmetric and SPA FinFETs: (a) SG FinFET,
(b) AWSG FinFET, (c) ADSG FinFET, and (d) AUSG FinFET. . . 7
1.7 SRAM stability metrics: (a) RSNM and (b) WM. . . . . . . . . . . . 9
1.8 Monolithic 3-D integration styles: (a) block-level, (b) gate-level, and
(c) transistor-level. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Monolithic 3-D floorplanning of different monolithic styles: (a) BLM,
(b) GLM/TLM, and (c) HM. . . . . . . . . . . . . . . . . . . . . . . 20
3.1 N-curve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 RSNMs of the (1,1,1) and (0,1,1) cells show how the pull-up transistor
can impact read stability. . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 RSNM under different gate workfunction values. . . . . . . . . . . . . 38
3.4 WM under different gate workfunction values. . . . . . . . . . . . . . 39
3.5 RPNM under different gate workfunction values. . . . . . . . . . . . . 39
3.6 WTP under different gate workfunction values. . . . . . . . . . . . . 40
3.7 IREAD under different gate workfunction values. . . . . . . . . . . . . 40
3.8 ILEAK under different gate workfunction values. . . . . . . . . . . . . 41
3.9 TR under different gate workfunction values. . . . . . . . . . . . . . . 42
3.10 TW under different gate workfunction values. . . . . . . . . . . . . . . 42
4.1 Simulation flow for SRAM characterization. . . . . . . . . . . . . . . 57
4.2 SRAM cell schematics: (a) 6T 4N2P 2D/6T 4N2P 3D, (b) 8T 6N2P 2D,
(c) 8T 4N4P 3D prior1, (d) 8T 4N4P 3D prior2, (e) 8T 4N4P 3D proposed1,
and (f) 8T 4N4P 3D proposed2. . . . . . . . . . . . . . . . . . . . . . 58
4.3 SRAM layouts: (a) 6T 4N2P 2D, (b) 6T 4N2P 3D, (c) 8T 6N2P 2D,
(d) 8T 4N4P 3D prior1, (e) 8T 4N4P 3D prior2, (f) 8T 4N4P 3D proposed1,
and (g) 8T 4N4P 3D proposed2. . . . . . . . . . . . . . . . . . . . . . 60
4.4 6T 4N2P 2D cell: (a) FEOL only and (b) FEOL+BEOL. . . . . . . . 62
4.5 8T 4N4P 3D proposed2 p-layer: (a) FEOL only and (b) FEOL+BEOL. 63
4.6 8T 4N4P 3D proposed2 n-layer: (a) FEOL only and (b) FEOL+BEOL. 64
4.7 TR under different array configurations. . . . . . . . . . . . . . . . . . 72
4.8 TW under different array configurations. . . . . . . . . . . . . . . . . 72
4.9 ILEAK under different temperature values. . . . . . . . . . . . . . . . . 76
5.1 Floorplanning results of different monolithic implementations: (a)
BLM, (b) GLM logic + BLM memory, and (c) GLM/BLM logic +
BLM memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.2 The hybrid monolithic design flow. . . . . . . . . . . . . . . . . . . . 87
5.3 The FinPrin-monolithic simulation flow. . . . . . . . . . . . . . . . . 88
5.4 8× NAND cell layout: (a) BLM/GLM, (b) TLM n-tier, and (c) TLM
p-tier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.5 Gate-level monolithic placement steps: (a) cell deflation, (b) deflated
2-D placement, (c) cell inflation, and (d) cell layer assignment. . . . . 91
5.6 Greedy layer assignment vs. ZOLP layer assignment showing the area
benefit of greedy method: (a) deflated 2-D placement, (b) cell inflation,
(c) cell layer assignment, and (d) legalization (just for ZOLP). . . . . 93
5.7 The CACTI-monolithic simulation flow. . . . . . . . . . . . . . . . . . 94
5.8 Area comparison: (a) BLM memory and (b) TLM memory module. . 96
5.9 Layout of horizontal and vertical H-trees of a memory module. . . . . 97
5.10 A T*-tree representation and the corresponding placement in 3-D. . . 99
5.11 Thermal model organization. . . . . . . . . . . . . . . . . . . . . . . . 103
5.12 Floorplanning results showing that the vertical constraints are met
for TLM/GLM modules: (a) 2-D, (b) BLM, (c) TLM, and (d) HM3
(GLM/BLM logic + BLM memory). . . . . . . . . . . . . . . . . . . 106
6.1 The HM SoC design flow. . . . . . . . . . . . . . . . . . . . . . . . . 117
6.2 The Orion-monolithic simulation flow. . . . . . . . . . . . . . . . . . 120
6.3 Hierarchical modeling in McPAT-monolithic. . . . . . . . . . . . . . . 121
6.4 OpenSPARC T2 floorplanning results: (a) 2-D, (b) BLM, (c) TLM,
and (d) HM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.5 OpenSPARC T2 SoC floorplans: (a) 2-D, (b) BLM, (c) TLM, and (d)
HM (HM core + BLM L2 + TLM CCX). . . . . . . . . . . . . . . . . 126
6.6 OpenSPARC T2 heat maps: (a) 2-D, (b) BLM, (c) TLM, and (d) HM
(HM core + BLM L2 + TLM CCX). . . . . . . . . . . . . . . . . . . 127
Chapter 1
Introduction
In 1965, Gordon Moore, cofounder of Intel Corporation, predicted that the number
of devices on a chip would roughly double every year [1]. His prediction later became
known as Moore's law, which has held true for more than half a century despite slight
modifications over time. Moore's law has been a roadmap for semiconductor foundries to
push the limits of innovation and cram more devices and functionality into integrated
circuits. Device scaling has been the driving force behind the exponential growth in
transistor count, increased device performance, and reduced cost per transistor [2].
Fig. 1.1 shows a planar metal-oxide-semiconductor field-effect transistor (MOS-
FET) model, which has been the fundamental device for building integrated circuits
(ICs). The current flow between the source and drain is controlled by applying different
gate voltages. As devices scale, the channel length between source and drain
decreases. This leads to improved device performance and reduced power consumption
owing to the shorter channel length and smaller capacitances. Device scaling has
continued for decades despite countless challenges over the years.
Fig. 1.2 shows some key milestones enabling the continuation of device scaling over
the years. In 2003, strained silicon was added to the 90nm technology node [3]. Use
of strained silicon improved the performance of the device by up to 25% by increasing
Figure 1.1: Planar MOSFET model.
electron and hole mobility. For sub-90nm technology nodes, gate leakage became a
serious issue and started hindering further scaling. As devices scale, the gate insulator
gets thinner to maintain gate control over the channel. However, a thinner gate insulator
leads to current leaking through the gate. At the 45nm technology node, a high-k
dielectric based on an HfO2 compound replaced SiO2, which had been the gate insulator
for decades [4]. This replacement reduced gate leakage by up to three orders of
magnitude. The polysilicon gate was also replaced with a metal gate that eliminated
the polysilicon depletion effect. Despite gate leakage being mostly eliminated, sub-
threshold leakage kept increasing with scaling as the source and drain terminals got
closer. At the 22nm technology node, FinFETs replaced planar MOSFETs to reduce
leakage power and enable better gate control over the channel [5].
Device scaling becomes even more challenging as it approaches fundamental physical
limits. Current challenges include approaching lithographic limits, intolerable
short-channel effects (SCEs), power constraints, and increasing manufacturing
costs [6]. To continue enhancements in computing technology and overcome some
of the above-mentioned challenges, we need alternative approaches, such as designing
devices that employ new integration technologies, developing new architectures and
novel computation paradigms, and optimizing existing designs [7].

Figure 1.2: Key milestones in device scaling.

In this thesis, we explore low-power and high-performance designs. Specifically, we design and
evaluate asymmetric FinFET-based static random access memory (SRAM) cells and
monolithic three-dimensional (3-D) SRAM cells, and explore monolithic 3-D designs
from module to multi-core system level.
The rest of the chapter is organized as follows. We first give a brief introduction to
FinFETs, FinFET-based SRAM design, SRAM characterization, and monolithic 3-D
ICs, followed by contributions of this thesis. An outline for the remaining chapters is
provided at the end of the chapter.
1.1 Background
This section presents background information on FinFETs, FinFET-based SRAM
design and evaluation metrics, and monolithic 3-D integration.
1.1.1 FinFETs
FinFETs, a type of multi-gate transistor, have replaced planar MOSFETs due to
their higher performance, superior short-channel behavior, and power efficiency [5].
FinFETs provide better control over the channel by surrounding it from multiple
sides. Better channel control suppresses the drain-induced barrier lowering (DIBL)
effect, improves subthreshold slope, and reduces leakage power consumption [8]. In
addition, FinFETs reduce random dopant fluctuation (RDF) by employing a lightly-
doped or undoped channel [9].
Fig. 1.3 shows two types of FinFETs: shorted-gate (SG) and independent-gate
(IG). In SG FinFETs, the gate wraps around the channel, whereas in IG FinFETs,
the front and back gates become independent of each other because the top part of
the FinFET is etched away. SG FinFETs have a better on-current (ION) to off-current
(IOFF) ratio than IG FinFETs and are preferred for high-performance designs. IG
FinFETs enable dynamic threshold voltage (Vth) control and offer new possibilities
for circuit design [10].
Figure 1.3: FinFET types: (a) SG and (b) IG.
Fig. 1.4 shows a two-dimensional (2-D) cross-section of a 3-D FinFET. The FinFET
parameters are gate length LG, fin thickness TSI, oxide thickness TOX, fin height
HFIN, spacer thickness LSP, gate underlap LUN, fin pitch FP, gate pitch GP, channel
doping concentration NCH, source/drain doping concentration NSD, and gate
workfunction ΦG.
Figure 1.4: A 2-D cross-section of a 3-D FinFET.
We use a 22nm (Chapter 3) and 14nm (Chapters 4, 5, and 6) silicon-on-insulator
(SOI) FinFET technology in our simulations. Table 1.1 shows the parameter values
we assume for our 22nm and 14nm SOI FinFET technology. We obtain parameter
values from the data released by semiconductor foundries and calibrate them via
simulations for high-performance circuits [11, 12, 13] (22nm), [14, 15, 16] (14nm).
Table 1.1: 22nm and 14nm SOI FinFET device parameter values

Parameter (unit)   Value at 22nm    Value at 14nm
LG (nm)            24               16
TSI (nm)           10               8
TOX (nm)           1                0.9
HFIN (nm)          40               30
LSP (nm)           12               8
LUN (nm)           12               8
FP (nm)            40               42
GP (nm)            90               70
NCH (cm^-3)        10^15            10^15
NSD (cm^-3)        10^20            10^20
ΦG (eV)            n: 4.4, p: 4.8   n: 4.4, p: 4.8
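When scripting device or circuit simulations, it is convenient to keep the Table 1.1 values in a small lookup structure. The sketch below is illustrative only: the dictionary layout and the `param` helper are our own, not part of any simulator's API.

```python
# Illustrative lookup of the Table 1.1 parameter values; the dictionary
# layout and key names are our own, not any simulator's API.
FINFET_PARAMS = {
    "22nm": {"LG": 24, "TSI": 10, "TOX": 1.0, "HFIN": 40,   # lengths in nm
             "LSP": 12, "LUN": 12, "FP": 40, "GP": 90,
             "NCH": 1e15, "NSD": 1e20,                      # doping in cm^-3
             "PHI_G": {"n": 4.4, "p": 4.8}},                # workfunction in eV
    "14nm": {"LG": 16, "TSI": 8, "TOX": 0.9, "HFIN": 30,
             "LSP": 8, "LUN": 8, "FP": 42, "GP": 70,
             "NCH": 1e15, "NSD": 1e20,
             "PHI_G": {"n": 4.4, "p": 4.8}},
}

def param(node, name):
    """Return one device parameter for the given technology node."""
    return FINFET_PARAMS[node][name]
```

For example, `param("14nm", "LG")` returns 16 (nm), and `param("22nm", "PHI_G")` returns the n- and p-type gate workfunctions.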
1.1.2 FinFET-based SRAM design
SRAM is the fundamental memory cell for on-chip data storage and fast access. Reg-
ister files, buffers, and caches use SRAM cells to store the data safely and access
it quickly. Design of low-power, robust, and dense memories is crucial for modern
microprocessors because SRAMs occupy more than half the die area and are respon-
sible for significant power consumption, primarily due to leakage power dissipation
[17]. Although FinFETs have less leakage than planar complementary metal-oxide-
semiconductor (CMOS) transistors, leakage power consumption is still a major issue
in FinFETs due to aggressive scaling. Besides, the width quantization issue associ-
ated with FinFETs, process variations, and read-write conflict in SRAM cells make
FinFET SRAM design even more challenging. To address these SRAM challenges, we
investigate asymmetric FinFET-based SRAM and 3-D SRAM design in Chapters 3 and
4, respectively.
Fig. 1.5 shows the schematic of a conventional 6T SRAM cell. It consists of a
pair of pFinFET pull-up (PU1, PU2), nFinFET access (AX1, AX2), and nFinFET
pull-down (PD1, PD2) transistors. The cross-coupled inverters store the data,
and the access transistors are used to read and write it.
Figure 1.5: 6T SRAM cell schematic.
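For readers who generate simulation decks programmatically, the cell's connectivity can be captured in a small netlist-like structure. The sketch below follows the device and node names of Fig. 1.5, but the data format and the helper function are hypothetical, not taken from any EDA tool:

```python
# 6T SRAM cell connectivity following Fig. 1.5 (the format is illustrative).
# Each entry: device name -> (type, drain, gate, source).
SRAM_6T = {
    "PU1": ("p", "L", "R", "VDD"),   # pull-up of INV1
    "PD1": ("n", "L", "R", "GND"),   # pull-down of INV1
    "PU2": ("p", "R", "L", "VDD"),   # pull-up of INV2
    "PD2": ("n", "R", "L", "GND"),   # pull-down of INV2
    "AX1": ("n", "BL", "WL", "L"),   # access device on the BL side
    "AX2": ("n", "BLB", "WL", "R"),  # access device on the BLB side
}

def count_by_type(cell):
    """Tally n- and p-type devices in a cell description."""
    counts = {"n": 0, "p": 0}
    for ttype, *_ in cell.values():
        counts[ttype] += 1
    return counts
```

Counting devices with `count_by_type(SRAM_6T)` gives four n-type and two p-type transistors, the imbalance that makes the conventional cell area-inefficient in transistor-level monolithic 3-D designs (Chapter 4).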
Figure 1.6: 2-D cross-section of symmetric and SPA FinFETs: (a) SG FinFET, (b) AWSG FinFET, (c) ADSG FinFET, and (d) AUSG FinFET.
One way to address various SRAM design challenges is to introduce asymmetries
in FinFETs. A single-parameter asymmetric (SPA) FinFET is created by introducing
asymmetry in a single FinFET parameter value. The SPA FinFETs we consider are
asymmetric workfunction SG (AWSG), asymmetric doping SG (ADSG), and asymmetric
underlap SG (AUSG), as shown in Fig. 1.6. SPA FinFETs have been shown to be effective
at reducing static power consumption, mitigating the read-write conflict by utiliz-
ing bidirectional current flow across access transistors, and achieving high density in
SRAM cells using single-fin FinFETs. Multi-parameter asymmetric (MPA) FinFETs
combine two or more asymmetries in a FinFET to benefit from different asymmetries.
For example, an asymmetric workfunction-doping SG (AWDSG) FinFET combines
asymmetries in gate workfunction and doping concentration, which can be useful for
both reducing leakage and mitigating read-write conflict in SRAM cells. In Chap-
ter 3, we investigate MPA FinFET-based 6T SRAM cells for robust, low-power, and
area-efficient designs.
7
1.1.3 SRAM characterization
Dc and transient metrics are used to evaluate SRAM cells. The dc metrics are read
static noise margin (RSNM), write margin (WM), read current (IREAD), and leakage
current (ILEAK). The transient metrics are read time (TR) and write time (TW).
1. RSNM: RSNM measures the stability of an SRAM cell during a read operation.
For the read simulation setup, wordline (WL), bitline (BL), and BL bar (BLB)
voltages are held at supply voltage (VDD) while a voltage source sweeps the
voltage at a storage node from ground (VGND) to VDD. RSNM is measured from
the butterfly curve obtained from the voltage transfer characteristics (VTC) of
the cross-coupled inverters (INV1, INV2). RSNM is the side length of the largest square that can fit inside the butterfly curve, as shown in Fig. 1.7a. An SRAM cell with a higher
RSNM is more resilient to read failures.
2. WM: WM measures the writeability of a cell. It is measured from the VTC of
the cross-coupled inverters during a write operation. For the write simulation
setup, BLB and WL are held at VDD and BL is tied to VGND while a voltage
source sweeps the voltage at a storage node from VGND to VDD. WM is the side length of the smallest square that can fit in the lower half of the VTC curves, as shown in Fig. 1.7b. A higher WM implies better writeability.
3. IREAD: IREAD is the current drawn from the bitline connected to the storage
node that holds a “0” during a read operation [18]. A higher IREAD implies a faster discharge of the bitline capacitance, and hence a smaller TR.
4. ILEAK: SRAMs consume a significant amount of leakage energy as they are
mostly in the hold mode. ILEAK is the current drawn from the power source
when the cell is in the hold mode. In this mode, bitlines are at VDD while the
wordline is at VGND. In other words, the transistors connected to wordlines are
OFF in the hold mode.
Figure 1.7: SRAM stability metrics: (a) RSNM and (b) WM.
5. TR: TR is measured during a read operation. It is the time interval from the point at which VWL reaches its 50% value during switching to the point at which the sense amplifiers are activated. It is assumed that the sense amplifiers are activated
when the difference between bitline voltages (VBL, VBLB) reaches 100 mV
(|VBL − VBLB| = 100 mV).
6. TW: TW is measured during a write operation. It is measured from the time
when the voltage at the beginning of the wordline reaches the 50% switch point
to the time when the voltage at the storage node that initially stores a “1” (VL)
reaches 10% of its initial value (VL = VDD × 0.1).
For transient simulations, the resistances and capacitances of the SRAM cell are extracted to compute the WL and BL capacitances. A memory array configuration is then assumed in order to measure TR and TW.
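The largest-square RSNM extraction described above can be sketched numerically. The sketch below is illustrative only: the tanh-shaped inverter model, its gain, and all voltages are assumptions, not Sentaurus-simulated device data.

```python
import numpy as np

def vtc(vin, vdd=0.9, vth=0.45, gain=20.0):
    """Idealized inverter voltage transfer curve (tanh model; an
    assumption, not a simulated characteristic)."""
    return 0.5 * vdd * (1.0 - np.tanh(gain * (vin - vth) / vdd))

def rsnm(vdd=0.9, n=400):
    """Side of the largest square that fits in each butterfly lobe
    (45-degree chord method), minimized over the two lobes."""
    p = np.linspace(0.0, vdd, n)
    s_grid = np.linspace(-vdd, vdd, 4 * n)
    best = [0.0, 0.0]
    # Curve A is INV1's VTC (VR = vtc(VL)); curve B is INV2's VTC
    # mirrored about the diagonal (VL = vtc(VR)).
    for x0, y0 in zip(p, vtc(p, vdd)):
        # The 45-degree line through (x0, y0) meets curve B where
        # x0 + s == vtc(y0 + s); h(s) is monotonic, so one crossing exists.
        h = (x0 + s_grid) - vtc(y0 + s_grid, vdd)
        sign = np.signbit(h)
        for i in np.nonzero(sign[:-1] != sign[1:])[0]:
            # Linear interpolation for the crossing; |s| is the side of
            # the inscribed square whose diagonal lies on this chord.
            s = s_grid[i] - h[i] * (s_grid[i + 1] - s_grid[i]) / (h[i + 1] - h[i])
            lobe = 0 if s < 0 else 1
            best[lobe] = max(best[lobe], abs(s))
    return min(best)  # the cell RSNM is limited by the smaller lobe
```

With identical inverters the two lobes are symmetric; with skewed inverters, taking the minimum over the lobes reports the weaker side, mirroring how RSNM is read off a measured butterfly curve.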
1.1.4 Monolithic 3-D integration
With continued technology scaling, interconnects have become the bottleneck in fur-
ther performance and power consumption improvements in modern microprocessors.
3-D ICs provide a promising approach for addressing the interconnect bottleneck
because they enable a reduction in overall interconnect length and in the number of repeaters on long interconnects [19]. They also have the potential to extend Moore's law by accommodating more transistors per unit footprint area while further reducing power consumption.
3-D ICs can be fabricated using either parallel or sequential integration. In parallel
3-D integration, layers are processed independently and connected by through-silicon
vias (TSVs) [20]. TSV-based 3-D technologies have been extensively studied and
shown to be effective at reducing interconnect length, power consumption, and delay
[21]. However, TSV-based 3-D ICs cannot fully utilize the benefits of the third di-
mension due to their large TSV diameter and layer alignment issues [22]. In addition,
parallel integration often uses 2-D block-level modules. This does not benefit from
the third dimension at the gate or transistor level.
On the other hand, in sequential 3-D integration, also known as monolithic 3-
D integration, the layers are processed sequentially and connected using monolithic
inter-tier vias (MIVs), which have a much smaller diameter (around 50 nm) than TSVs (around 1 µm). Therefore, monolithic integration offers higher density and lower parasitics, delay, and power consumption. Unlike parallel integration,
monolithic integration can also benefit from a reduction in intra-module intercon-
nect lengths to further reduce power consumption and delay. There are three types
of monolithic implementations: block-level monolithic (BLM) [23], gate-level mono-
lithic (GLM) [19], and transistor-level monolithic (TLM) [22], as shown in Fig. 1.8.
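The density gap between the two via types follows directly from the quoted diameters; a quick back-of-the-envelope check, assuming circular cross sections and ignoring keep-out spacing:

```python
# Cross-sectional area scales with the square of the diameter, so the
# ~20x diameter gap between TSVs and MIVs becomes a ~400x area gap.
d_tsv = 1.0e-6   # TSV diameter, ~1 um (from the text)
d_miv = 50.0e-9  # MIV diameter, ~50 nm (from the text)

area_ratio = (d_tsv / d_miv) ** 2
print(area_ratio)  # ~400: one TSV footprint holds roughly 400 MIVs
```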
Each monolithic integration implementation style has its advantages and draw-
backs. In BLM, 2-D modules are floorplanned on multiple transistor layers to build
Figure 1.8: Monolithic 3-D integration styles: (a) block-level, (b) gate-level, and (c) transistor-level.
the 3-D design [23]. Existing electronic design automation (EDA) tools can be used
to create 2-D BLM modules [24]. However, BLM requires a 3-D floorplanner to place modules on multiple layers efficiently. BLM designs do not benefit from potential
intra-block interconnect reduction since all the cells of the module are placed on the
same transistor layer. GLM places 2-D standard cells on multiple transistor layers to
generate 3-D modules. Unlike BLM, GLM benefits from intra-module interconnect
length reduction [19]. This leads to smaller power consumption and delay. GLM
implementations, however, require additional EDA tools, such as a 3-D gate-level
placement tool [19]. In TLM designs, n-type and p-type transistors are fabricated on
different layers. Thus, TLM enables independent optimization of the transistor layers. It
also benefits from a reduction in intra-module interconnect lengths similar to GLM.
However, it requires a new 3-D cell library in which n-type and p-type transistors of
each cell are implemented on separate layers [22].
A hybrid monolithic (HM) design consists of modules that are implemented in
different monolithic styles to utilize their advantages. For example, an HM design
may use logic modules implemented in GLM or TLM for power savings and 2-D
memory modules implemented in BLM for area efficiency. To realize an HM design,
we need a floorplanner that can handle both 2-D and 3-D modules in addition to the
tools needed to implement BLM, GLM, and TLM designs.
In Chapter 4, we design 3-D TLM SRAM cells for enhanced read performance and
reduced leakage power consumption. In Chapter 5, we present an HM floorplanner
and investigate the benefits of HM designs at processor core level. In Chapter 6, we
introduce an area/timing/power architectural modeling framework for HM designs at
multi-core system level.
1.2 Thesis contributions
The contributions of the thesis are as follows.
• Chapter 3 explores MPA FinFET-based SRAM designs [25]. Five novel MPA
FinFET-based SRAM cells are proposed and compared with symmetric and
SPA FinFET-based SRAM cells using dc and transient metrics. FinFETs
with asymmetries have been shown to be effective at reducing ILEAK and alleviating the read-write conflict in SRAM cells. We show, for the first time, how
MPA FinFETs can be used to design ultra-low-leakage and robust 6T SRAM
cells. We combine multiple asymmetries, namely asymmetry in gate workfunc-
tion, source/drain doping concentration, and gate underlap, to address various
SRAM design issues all at once. Simulation results show that the ILEAK of
MPA FinFET-based SRAM cells can be reduced by up to 58× while ensuring
reasonable read/write stability metrics by combining asymmetries in gate work-
function and doping concentration. In addition, an MPA FinFET-based SRAM
cell can achieve high stability metrics with 22× ILEAK reduction compared to the
traditional symmetric FinFET-based SRAM cell. There is no area overhead as-
sociated with MPA FinFET-based SRAM cells. We evaluate SRAM cells under
different gate workfunction, doping concentration, supply voltage, and tempera-
ture values and show that MPA FinFET-based SRAM cells are promising under
various conditions. We also discuss the effect of process variations on SRAM
cells.
• Chapter 4 proposes two new 3-D monolithic FinFET-based 8T SRAM cells and
compares them with previously reported 6T and 8T SRAM cells implemented
in 2-D/3-D [26]. Conventional 6T and 8T SRAM cells are not area-efficient
when implemented in 3-D. Thus, we investigate 8T SRAM cells with an equal number of n-type and p-type transistors to achieve area efficiency for 3-D SRAM
design. Both the proposed cells use pFinFET access transistors for better area
efficiency in 3-D and low leakage current. Using pFinFET access transistors, however, hurts cell writeability. Thus, one of the cells, in addition to using pFinFET access transistors, utilizes IG pFinFETs as pull-up transistors whose
back gates are tied to VDD for better writeability. This cell has 28.1% and 43.8%
smaller footprint area, 31.6% and 43.2% smaller leakage current, and 53.2% and
29.0% lower TR compared with conventional 2-D 6T SRAM and 2-D 8T SRAM
cells, respectively. The schematics, layouts, and bitline/wordline capacitances
of various SRAM cells are analyzed to understand the trade-offs in cell stability,
performance, and static power consumption. This chapter also investigates the
impact of process variations, memory array configurations, assist techniques,
different temperatures, and gate workfunction values on SRAM cells.
• Chapter 5 introduces the first 3-D HM floorplanner (3-D-HMFP) [27]. 3-D-HMFP is capable of handling vertical constraints imposed by 3-D modules and accounts for global interconnect power consumption. It can replace modules
with their alternative implementations to explore a large HM design space.
This chapter also presents a gate-level placement method needed to implement
GLM modules. It characterizes the OpenSPARC T2 processor core using differ-
ent monolithic implementations and compares their footprint area, wirelength,
power consumption, and temperature. Simulations show that under the same
timing constraint, an HM design offers 48.1% reduction in footprint area and
14.6% reduction in power consumption compared to those of the 2-D design at
the cost of higher power density and slightly higher temperature.
• Chapter 6 introduces McPAT-monolithic, a framework for modeling HM multi-
core architectures [28]. We develop the tools needed to model different mono-
lithic implementation styles for logic, memory, and network-on-chip (NoC) mod-
ules. The OpenSPARC T2 processor is used as a case study to compare different
monolithic implementation styles and explore the benefits of HM design. We
show that, under the same timing constraint, an HM design offers a 47.2% reduction in footprint area and a 5.3% reduction in power consumption compared to a 2-D design at the cost of slightly higher on-chip temperature.
The rest of the thesis is organized as follows. Chapter 2 presents prior work on
SRAM design and 3-D ICs. Chapter 3 shows how we combined multiple asymme-
tries in FinFETs to design ultra-low-power and robust SRAMs. Chapter 4 describes
the TLM SRAM design in 3-D and two new 8T SRAM cells we designed for en-
hanced TR and reduced ILEAK. Chapter 5 presents the HM 3-D floorplanner we
developed to explore the HM design space. Chapter 6 describes McPAT-monolithic,
an area/power/timing architectural framework for monolithic 3-D ICs at the multi-
core system level. Chapter 7 presents the concluding remarks and discusses future
directions.
This thesis covers material from the following publications: Refs. [25, 26, 27, 28].
The tools described in this thesis are available from www.princeton.edu/~jha/files/tools.
Chapter 2
Related Work
This chapter discusses prior work on asymmetric FinFET-based SRAM design, TLM
3-D SRAM design, 3-D hybrid monolithic floorplanning, and monolithic 3-D design.
2.1 SPA FinFET-based SRAM design
SPA FinFETs were shown to be promising for robust, low-power, and area-efficient SRAM design. An AWSG FinFET has different gate workfunction values for the front and back of the gate, as shown in Fig. 1.6(b). It was previously shown that asymmetry
in gate workfunction can reduce FinFET ILEAK by two orders of magnitude without
degrading performance excessively [29]. SRAM cells based on AWSG FinFETs were
also shown to be promising in terms of dc metrics and dynamic writeability with
significant ILEAK reduction [30].
An ADSG FinFET has unequally-doped source and drain regions, as shown in
Fig. 1.6(c). 6T SRAM cells designed using ADSG FinFETs were shown to have
higher RSNM and WM, and reduced cell ILEAK at the cost of higher access time [31].
Unequally-doped source and drain terminals lead to unequal current flow across the
transistor, depending on whether the drain-to-source voltage bias (VDS) is positive
or negative. If the doping concentration is lower at the drain side of the FinFET,
the VDS > 0 current is higher (smaller) than the VDS < 0 current in an nFinFET
(pFinFET) [31]. ADSG FinFETs can help mitigate the read-write conflict in a 6T
SRAM cell. A read operation requires an access transistor that is weak relative to the pull-down transistor, whereas a write operation requires an access transistor that is strong relative to the pull-up transistor. These contrasting requirements create
the read-write conflict in a 6T SRAM cell. Connecting the lower-doped terminal of
an ADSG FinFET to the storage node enables a weaker access transistor during the
read operation since the voltage bias from storage node-to-bitline is negative (VDS < 0
case). On the other hand, during the write operation, the voltage bias from storage
node-to-bitline is positive and the access transistor is stronger (VDS > 0 case) [31].
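The bias-direction dependence described above can be captured in a toy model; the 0.6 drive-strength ratio below is purely illustrative, not a simulated value.

```python
def adsg_access_current(vds, i_strong=1.0, ratio=0.6):
    """Toy ADSG nFinFET access-transistor model with the lower-doped
    terminal tied to the storage node. Drive strength depends only on
    the sign of VDS here; the 0.6 ratio is illustrative, not simulated."""
    return i_strong if vds > 0 else i_strong * ratio

# Read: storage-node-to-bitline bias is negative -> weak device,
# which protects the stored value (better read stability).
i_read = adsg_access_current(-0.1)

# Write: storage-node-to-bitline bias is positive -> strong device,
# which helps overpower the pull-up (better writeability).
i_write = adsg_access_current(+0.1)
```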
An AUSG FinFET has unequal gate underlap on the source and drain sides, as shown in Fig. 1.6(d). SRAM cells designed with AUSG FinFETs were shown to enhance
RSNM and writeability while reducing leakage power consumption with no area over-
head [32]. In [33], SRAM cells designed using asymmetric drain spacer extension
FinFETs were shown to improve RSNM and WM, and reduce cell ILEAK, at the cost
of higher access time and area. Similar to ADSG FinFETs, AUSG FinFETs with asymmetric gate underlap conduct unequal current for VDS > 0 and VDS < 0 [32, 33]. Thus, AUSG FinFETs can be used to mitigate the read-write
conflict.
An MPA FinFET combines multiple asymmetries in a single FinFET. Previ-
ously, MPA FinFETs were shown to be promising for ultra-low-leakage and high-
performance logic circuit design [34]. Specifically, asymmetric workfunction-underlap
SG (AWUSG) FinFETs were shown to provide a slightly higher ION and drastically
lower IOFF compared to their traditional symmetric SG FinFET counterparts.
In Chapter 3, we present SRAM cell designs based on MPA FinFETs. We combine
the advantages of different asymmetries to obtain ultra-low-leakage, robust, and dense
SRAM cells by exhaustively exploring the SRAM cell design space spanned by MPA
FinFETs. We specifically aim to use asymmetry in gate workfunction to reduce
leakage power while utilizing asymmetry in doping concentration and underlap to
mitigate the read-write conflict. We identify five promising MPA FinFET-based
SRAM cells and compare them with symmetric and SPA FinFET-based SRAM cells.
2.2 TLM 3-D SRAM design
3-D TLM technology provides a new approach to SRAM design. In a TLM design, the
footprint area of an SRAM cell can be reduced significantly by building n- and p-type
transistors on two separate layers. A smaller footprint area can lead to shorter WL
and BL, hence an improvement in SRAM performance. Several 4T, 6T, and 8T SRAM
cells implemented in 3-D TLM technology have been previously reported. Monolithic
3-D 4T/6T SRAM cells that exploit dynamic back-gate biasing were reported in [35].
Inter-layer coupling was shown to improve SRAM performance and stability. Batude
et al. [36] presented a 3-D load-less 4T SRAM cell consisting of two p-type access
and two n-type drive transistors. A thin inter-layer dielectric (ILD) was used to
dynamically manipulate the Vth of the devices to improve cell stability. A 3-D 6T
SRAM cell consisting of indium gallium arsenide (InGaAs) nMOSFETs and germanium (Ge)
pMOSFETs was shown to improve cell stability and performance while maintaining
the same ILEAK as its 2-D counterpart [37]. Although the electron
and hole mobilities of III-V materials and Ge can be higher than those of silicon, it is
challenging to fabricate high-quality transistors using heterogeneous integration [37].
A 3-D 6T SRAM cell, in which the back gates of access transistors are connected to
the adjacent storage nodes to improve read stability, was presented in [38]. A 3-D 6T
SRAM cell based on ultrathin-body MOSFETs was sequentially processed using a
low thermal budget for the first time [39]. No degradation of bottom-tier devices was
observed due to the process. It was shown that top-layer transistors exhibit almost
identical electrical properties as the bottom-layer transistors. Designing an area-
efficient TLM SRAM cell is challenging because n- and p-type transistors need to be
placed on two layers and connected via MIVs, which impose additional constraints on
layout. A traditional 6T SRAM cell implemented in 3-D suffers from area inefficiency
because it has four n-type and two p-type transistors. In [40], a 3-D 6T SRAM cell
consisting of three n-type and three p-type transistors was proposed to reduce the
footprint area. This cell replaces an n-type access transistor with a p-type transistor to
equalize the number of n- and p-type transistors in the SRAM cell. However, it suffers
from degraded read stability due to the weak p-type access transistor, and hence requires a single-ended read through the n-type access transistor. It also needs an additional WL for the p-type access transistors. This cell also has degraded writeability with respect to the traditional 6T SRAM cell because the pull-up transistor is as strong as the p-type access transistor. Thus, it is harder for the p-type access transistor to discharge the storage node and flip the cell during a write operation.
A conventional 8T SRAM consists of six n-type and two p-type transistors [41].
It offers a high read stability because the internal nodes are not disturbed during a
read operation. However, similar to the 6T SRAM cell, the unequal number of n- and p-type transistors leads to an inefficient footprint area for the conventional 8T SRAM cell when implemented in 3-D. Thus, previously reported 3-D 8T SRAM cells were
implemented using four n-type and four p-type transistors. A 3-D 8T SRAM cell,
constructed by adding two pFinFET read access transistors to a conventional 6T
SRAM cell, was presented in [42]. pFinFET access transistors were activated during
the read operation to increase the read stability. However, this cell suffers from a
degraded read performance due to the use of weaker pFinFET access transistors.
In addition, it has a 50% lower RSNM compared to the conventional 8T SRAM
cell because its internal nodes are still disturbed during a read operation. A 3-D
8T SRAM cell, constructed by replacing n-type read transistors of a conventional 8T
SRAM cell with p-type transistors to equalize the number of n- and p-type transistors,
was reported in [40, 43]. However, this cell can also suffer from a degraded read
performance due to the presence of weaker p-type transistors on the read path.
In Chapter 4, we present two new 3-D FinFET-based 8T SRAM cells. Our aim
is to design an area-efficient, low-power, and robust cell, along with a high read
performance. Therefore, we replace the nFinFET access transistors of a conventional
8T SRAM cell with pFinFETs for an area-efficient 3-D design and keep nFinFETs
on the read path to maintain a high read performance. The idea of replacing n-type
access transistors with p-type transistors was investigated in 2-D 6T and 8T SRAM
cells [44, 45]. Tawfik et al. [44] showed that a FinFET-based 6T SRAM cell, which
has pFinFET access transistors and IG pFinFET pull-up transistors with back gates
tied to VDD, can improve read stability by 60% and reduce ILEAK by 21% compared
to a conventional 6T SRAM cell. However, using weaker pFinFET access transistors
in our proposed cells hurts writeability. Thus, in one of the proposed cells, we replace
the SG pull-up pFinFETs with IG pFinFETs and connect their back gates to VDD
to weaken them with respect to the access transistors and improve writeability.
2.3 3-D hybrid monolithic floorplanning
An HM design consists of modules implemented in different monolithic styles. BLM
modules are implemented on a single transistor layer and hence can be viewed as 2-D
modules. GLM and TLM modules, however, are implemented on multiple transistor
layers. Thus, we consider them to be 3-D modules, which can be viewed as vertically-
aligned 2-D modules. Combining 2-D and 3-D modules imposes vertical constraints
during floorplanning because the parts of GLM and TLM modules on different layers
need to be aligned, as shown in Fig 2.1.
Figure 2.1: Monolithic 3-D floorplanning of different monolithic styles: (a) BLM, (b) GLM/TLM, and (c) HM. Dashed lines indicate the vertical constraints on GLM/TLM modules.
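The vertical constraint illustrated in Fig. 2.1 can be stated as a simple feasibility check. The data layout below is a hypothetical sketch, not 3-D-HMFP's actual internal representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Part:
    """One layer's slice of a module: footprint rectangle plus its layer."""
    x: float
    y: float
    w: float
    h: float
    layer: int

def vertically_aligned(parts):
    """A GLM/TLM module is feasible only if its slices occupy the same
    footprint rectangle on every layer; a BLM module (one slice) passes
    trivially. Hypothetical check, not 3-D-HMFP's implementation."""
    ref = parts[0]
    return all((p.x, p.y, p.w, p.h) == (ref.x, ref.y, ref.w, ref.h)
               for p in parts[1:])
```

A floorplanner enforcing this constraint effectively moves a 3-D module's stacked slices as one rigid object during packing, which is what the dashed lines in Fig. 2.1 indicate.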
To explore the HM design space, we need a floorplanner that can handle both 2-D
and 3-D modules. Several studies have been reported on floorplanning of modules un-
der vertical constraints. A 3-D floorplan representation, namely the layered transitive
closure graph, was used in [46] to handle inter-layer alignment. In [47], mixed integer
linear programming formulations were used to handle block alignment constraints in
3-D floorplanning. A 3-D floorplanner based on a sequence-pair representation and a 3-D-graph-based packing algorithm to control vertical module alignments was proposed
in [48]. A T*-tree-based 3-D floorplanner, which can handle vertically-aligned 2-D
modules, was reported in [49]. A 3-D floorplanning methodology to address intercon-
nect structures imposing module alignment constraints on TSV-based systems was
proposed in [50]. In [51], a fixed-outline 3-D floorplanner that can handle folding and
alignment of different types of modules, such as soft, hard, folded, and stacked, was
proposed. However, these floorplanners do not include global interconnect power and
do not explore the HM space. In [52], high-level synthesis was integrated into a 3-D
floorplanner. This floorplanner replaces modules with their alternatives to find better
designs. However, it cannot handle hybrid floorplanning under vertical constraints.
In Chapter 5, we introduce 3-D-HMFP, a 3-D HM floorplanner, which can both
handle vertical constraints and replace modules with their alternative implementa-
tions to find optimal hybrid solutions. 3-D-HMFP also takes the global interconnect
power consumption into account to obtain better floorplans. We use 3-D-HMFP to
characterize the OpenSPARC T2 processor core using different monolithic implemen-
tations and compare their footprint area, power consumption, wirelength, and peak
temperature values.
2.4 Monolithic 3-D design
3-D ICs have previously been demonstrated to be effective at addressing the inter-
connect bottleneck, reducing power consumption, and improving performance. A
physical design flow for 3-D monolithic circuits was presented in [53] and shown to
decrease area, reduce wirelength, and improve performance compared to the 2-D im-
plementation. In [19], the OpenSPARC T2 core was used as a case study and it
was shown that the GLM design has 50.0% smaller footprint area and 15.6% less
power consumption compared to its 2-D counterpart. Ref. [54] demonstrated that
low-power design techniques that include folding functional unit (FU) modules can
help reduce the power consumption of 3-D ICs based on TSVs. The OpenSPARC
T2 core, implemented in a two-tier 3-D design, was shown to offer up to 52.3% re-
duced footprint area, 27.9% shorter wirelength, and 27.8% less power consumption
compared to its 2-D counterpart. In [55], the OpenSPARC T2 HM system-on-chip
(SoC) consisting of GLM logic and BLM memory modules was shown to reduce the
SoC power consumption by 8.3% when compared to its 2-D counterpart design. Prior
works, however, do not explore a large hybrid design space. In addition, they often
use an RTL-to-GDSII design flow to implement modules in different monolithic styles,
which is accurate but slow. An architectural modeling tool is needed
to try different architectural parameter values and explore the HM design space more
quickly. Thus, we focus on developing tools to model monolithic designs from circuit
to system level and show the benefits of monolithic 3-D integration. We specifically
investigate HM designs, which can combine advantages of different monolithic styles.
In Chapter 6, we introduce McPAT-monolithic, an area, power, and timing ar-
chitectural modeling framework for multi-core HM designs. We develop FinPrin-
monolithic, CACTI-monolithic, and Orion-monolithic to model logic, memory, and
NoC modules, respectively, and integrate them into McPAT-monolithic. We also in-
tegrate 3-D-HMFP with McPAT-monolithic for floorplanning of the processor cores.
McPAT-monolithic integrated with 3-D-HMFP enables a speedy and efficient design
space exploration of HM multi-core systems.
Chapter 3
Ultra-low-leakage, Robust FinFET
SRAM Design Using
Multi-parameter Asymmetric
FinFETs
Memory arrays consisting of SRAM cells occupy the largest area on chip and are re-
sponsible for significant leakage power consumption in modern microprocessors. With
the transition from planar CMOS technology to FinFETs, FinFET SRAM design has
become important. However, the increasing leakage power consumption of FinFETs due to aggressive scaling, width quantization, the read-write conflict, and process variations make FinFET SRAM design challenging. In this chapter, we show how MPA FinFETs can be used to design ultra-low-leakage and robust 6T SRAM cells. We propose
five novel MPA FinFET-based SRAM cells. We show that the ILEAK of MPA FinFET-
based SRAM cells can be reduced by up to 58× while ensuring reasonable read/write
stability metrics [25]. In addition, high stability metrics can be achieved with 22×
ILEAK reduction compared to the traditional symmetric FinFET-based SRAM cell.
There is no area overhead associated with MPA FinFET-based SRAM cells.
3.1 Introduction
Planar CMOS devices have reached their scaling limits due to intolerable SCEs and
leakage power consumption, and have hence been replaced by multi-gate transistors
[56]. Among multi-gate transistors, FinFETs are the most promising owing to their
compatibility with the CMOS fabrication process [8]. In FinFETs, a gate wraps
around the channel from multiple sides. This enables better channel control, reduces
ILEAK, alleviates SCEs, and improves scalability of FinFETs. FinFETs also have
higher mobility and are less sensitive to random dopant fluctuation (RDF) owing to their lightly-doped or undoped
channel [9].
FinFETs are very promising for SRAM design due to their robustness, higher
performance, and density. FinFET-based SRAMs have better read and write cur-
rents compared to their planar CMOS-based SRAM counterparts because FinFETs
have an improved subthreshold slope, which enables a lower Vth at a given IOFF. Reduced
DIBL enables higher stability for FinFET-based SRAMs. Moreover, FinFET-based
SRAMs suffer far less from RDF, and hence have reduced Vth
and performance variation, which increases their robustness. FinFET-based SRAMs,
however, suffer from the width quantization issue, which is not an issue for planar
CMOS-based SRAMs. Overall, FinFET-based SRAMs have been shown to be supe-
rior to planar CMOS-based SRAMs. Therefore, a lot of effort has been directed at
FinFET-based SRAM design [11, 57].
Although FinFETs have less leakage than planar CMOS transistors, leakage power
consumption is still a major issue in FinFETs due to aggressive scaling. In addition, the width quantization issue associated with FinFETs, process variations, and the read-write conflict in SRAM cells make FinFET SRAM design even more challenging.
One way to address these problems is to design SRAMs based on FinFETs with
asymmetric parameters. SRAM cells based on SPA FinFETs with asymmetry in
gate workfunction [30], source/drain doping concentration [31], gate underlap [33, 32],
and fin height [58] have been reported. These works have shown that SPA FinFETs
can reduce ILEAK, improve stability metrics of the SRAM cell, and help mitigate the
SRAM read-write conflict.
We take a step further. We characterize SRAM cells based on MPA FinFETs.
Such FinFETs combine two or more asymmetries in a single FinFET to gain advan-
tage from all the asymmetries. In designing MPA FinFET-based SRAM cells, we use
FinFETs with up to three asymmetries: in gate workfunction, source/drain doping
concentration, and gate underlap. MPA FinFETs offer new trade-offs in SRAM
design among leakage power consumption, robustness, and performance. In order to
achieve the highest density so that the area occupied by SRAM cells is minimized, we
only use single-fin FinFETs in our SRAM cells. We show that using FinFETs with
combined asymmetry in gate workfunction and source/drain doping concentration
can reduce the ILEAK of the SRAM cell by 58× while maintaining reasonable SRAM
stability metric values [25]. We also show that high read stability can be achieved
while reducing leakage power by 22×.
The rest of the chapter is organized as follows. Section 3.2 describes the sim-
ulation setup and provides SRAM dc and transient metrics needed to analyze an
SRAM cell. Section 3.3 describes the design and selection of promising MPA FinFET-
based SRAM cells and includes a comparative analysis of dc and transient metrics.
Section 3.4 evaluates SRAM metrics under different gate workfunction, doping con-
centration, supply voltage, and temperature values. Section 3.5 discusses the effect
of process variations on SRAM cells. Section 3.6 presents discussion of the results.
Section 3.7 concludes the chapter.
Table 3.1: 22nm SOI asymmetric FinFET device parameter values

Parameter (unit)    Value
ΦGF (eV)            4.4
ΦGB (eV)            4.8
NS (cm−3)           10^20
ND (cm−3)           10^19
LUNS (nm)           2
LUND (nm)           12
3.2 Simulation setup
We evaluate the SRAM cells using a 22nm SOI FinFET technology. Table 1.1 shows
the parameter values for the traditional symmetric FinFETs. We perform 2-D hy-
drodynamic device simulations using Sentaurus Device Simulator [59] to measure
SRAM dc and transient metrics. We use the Philips unified mobility model together with the band-to-band tunneling, avalanche multiplication, bandgap narrowing, and Shockley-Read-Hall recombination models for accurate simulation. We
perform initial simulations at room temperature (300 K) and a VDD of 0.9 V.
We consider other temperature values later.
In this chapter, we investigate SRAM cells based on FinFETs with asymmetry
in gate workfunction, source/drain doping concentration, gate underlap, and their
combinations. We use the parameter values shown in Table 3.1 for asymmetric SG
FinFETs (these values were chosen after careful evaluation). The asymmetric param-
eters are front-gate workfunction ΦGF , back-gate workfunction ΦGB, source doping
concentration NS, drain doping concentration ND, gate underlap at source side LUNS,
and gate underlap at drain side LUND. Other parameter values are the same as those
for symmetric SG FinFETs, as shown in Table 1.1. The SPA FinFETs we consider are
AWSG, ADSG, and AUSG. Understanding the impact of each technique on FinFET
device characteristics offers valuable insights into SRAM design. AWSG FinFETs
have different ΦGF and ΦGB. They have higher Vth, which leads to a slightly lower
Table 3.2: Numerical representation of FinFETs

FinFET type   Numerical representation
SG            0 (000)
AWSG          1 (001)
ADSG          2 (010)
AWDSG         3 (011)
AUSG          4 (100)
AWUSG         5 (101)
ADUSG         6 (110)
AWDUSG        7 (111)
ION but drastically lower IOFF compared to those of SG FinFETs. Asymmetry in
workfunction, therefore, can be an enabler for ultra-low-leakage SRAM cells with
high stability. L2 and L3 caches contribute significantly towards power consump-
tion via leakage power. Asymmetric workfunction incorporated SRAM cells can help
reduce the leakage power of the higher-level caches while providing higher stability
metrics. However, the delays associated with SRAM cells worsen because asymmetry
in workfunction leads to lower ION . The ADSG FinFET we use has a lower doping
concentration on the drain side (ND), which increases the resistance on the drain side
and reduces ION and IOFF . Reduced IOFF can help reduce the leakage power of an
SRAM cell while reduced ION can degrade the access time. In an AUSG FinFET,
the gate underlap is reduced on the source side (LUNS), causing both ION and IOFF
to increase. Both ADSG and AUSG nFinFETs (pFinFETs) have greater current
flowing for VDS > 0 (VDS < 0) compared to VDS < 0 (VDS > 0). Asymmetry in
doping concentration and underlap can address applications where the stability is the
bottleneck and read-write conflict is critical.
The device footprint of a FinFET does not increase when an asymmetry is introduced, as shown in Fig. 1.6. Thus, the area of an MPA FinFET is the same as that
of symmetric and SPA FinFETs. In the rest of the chapter, for ease of reference, we
represent each FinFET type with a three-bit number, one bit for each asymmetric
parameter. Table 3.2 shows this representation.
The schematic of a conventional 6T SRAM cell is shown in Fig. 1.5. For the
sake of discussion, it is assumed that initially storage nodes L and R store “1” and
“0”, respectively. Each SRAM cell is represented with three numbers that denote
the type of pull-up, access, and pull-down transistors, respectively. For example,
a (0,1,4) SRAM cell consists of SG pull-up FinFETs, AWSG access FinFETs, and
AUSG pull-down FinFETs. This notation is used throughout the chapter.
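The encoding is mechanical enough to express in a few lines. The following Python sketch (function names are ours, not from the dissertation) decodes the Table 3.2 type codes and the cell notation:

```python
# Sketch: mapping Table 3.2 type codes to FinFET names and expanding the
# (pull-up, access, pull-down) cell notation. Bit 0 = asymmetric workfunction
# (W), bit 1 = asymmetric doping (D), bit 2 = asymmetric underlap (U).

def finfet_name(code: int) -> str:
    """E.g. 0 -> 'SG', 3 -> 'AWDSG', 7 -> 'AWDUSG'."""
    if code == 0:
        return "SG"
    asym = ("W" if code & 1 else "") + \
           ("D" if code & 2 else "") + \
           ("U" if code & 4 else "")
    return "A" + asym + "SG"

def cell_finfets(cell):
    """Expand a cell tuple such as (0, 1, 4) into its transistor types."""
    pu, ax, pd = cell
    return {"pull-up": finfet_name(pu),
            "access": finfet_name(ax),
            "pull-down": finfet_name(pd)}

# The (0,1,4) example from the text: SG pull-up, AWSG access, AUSG pull-down.
assert cell_finfets((0, 1, 4)) == {"pull-up": "SG", "access": "AWSG",
                                   "pull-down": "AUSG"}
```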
3.2.1 SRAM dc metrics
Read stability, writeability, IREAD, and ILEAK are the main dc metrics of SRAM cells.
SRAM stability is of utmost importance for successful read/write operations. In this
chapter, the two most common methods to measure SRAM stability are used: static
noise margin and N-curve. RSNM depicts the minimum noise voltage that would
flip the cell bit during a read operation, thus causing a read failure. Another way
to measure stability is through the N-curve [60], as shown in Fig. 3.1. The N-curve
can be obtained using the same read simulation setup described in Section 1.1.3, by
plotting the current flowing into node L with respect to the voltage at L. The N-curve
takes both voltage and current into account. The read power noise margin (RPNM)
can be obtained from the N-curve using the following equation:
RPNM = ∫_{VA}^{VB} IL dVL.
RPNM is used to measure read stability. It is the power needed to upset the cell
during a read operation. A higher RSNM or RPNM implies that the SRAM cell is
more robust during the read operation.
Writeability can be quantified using WM, which is defined as the length of the
smallest square that can fit in the right half of the write butterfly curve [61], as
shown in Fig. 1.7b. Another way to measure the writeability of an SRAM cell is the
Figure 3.1: N-curve (IL in µA versus VL in V; the zero crossings are labeled A, B, and C).
write-trip power (WTP), obtained from the N-curve (Fig. 3.1), using
WTP = ∫_{VB}^{VC} |IL| dVL.
A higher WM or lower WTP corresponds to better writeability.
Other dc metrics IREAD and ILEAK are explained in Section 1.1.3.
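The two N-curve integrals can be evaluated numerically from simulated (VL, IL) samples. The sketch below uses illustrative, hypothetical data, not the dissertation's simulation output:

```python
# Sketch: RPNM and WTP from sampled N-curve data by trapezoidal integration.
# VA, VB, VC are the zero crossings of IL; RPNM integrates IL over [VA, VB]
# (where IL > 0) and WTP integrates |IL| over [VB, VC] (where IL < 0).

def trapz(xs, ys):
    """Trapezoidal integral of y(x) over the given sample points."""
    return sum((xs[k + 1] - xs[k]) * (ys[k + 1] + ys[k]) / 2.0
               for k in range(len(xs) - 1))

def n_curve_metric(v, i, lo, hi, absolute=False):
    """Integrate IL (or |IL|) dVL over the window [lo, hi] of the N-curve."""
    pts = [(x, abs(y) if absolute else y) for x, y in zip(v, i) if lo <= x <= hi]
    xs, ys = zip(*pts)
    return trapz(xs, ys)

# Illustrative samples: volts and microamps, so the integrals come out in µW.
v = [0.0, 0.1, 0.2, 0.3, 0.4]
i = [0.0, 50.0, 0.0, -50.0, 0.0]    # crossings: VA = 0.0, VB = 0.2, VC = 0.4

rpnm = n_curve_metric(v, i, 0.0, 0.2)                  # -> 5.0 µW
wtp = n_curve_metric(v, i, 0.2, 0.4, absolute=True)    # -> 5.0 µW
```

In practice the (v, i) arrays would come from the read-simulation sweep described above, and VA, VB, VC would be found by locating sign changes of IL.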
3.2.2 SRAM transient metrics
TR and TW of an SRAM cell constitute its transient metrics. We measure TR and
TW of an SRAM array consisting of 256 rows (wordlines) and 128 columns (bitlines).
We use the transport-analysis-based 3-D technology computer-aided design (TCAD)
capacitance extraction technique [62] to extract the front-end-of-line (FEOL) + back-
end-of-line (BEOL) parasitic capacitance of SRAM cells. We back-annotate the ex-
tracted capacitance in mixed-mode device simulations for better accuracy in transient
simulations. We use the π3 distributed RC line model for the wordline and bitline.
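Assuming the π3 model denotes three cascaded π segments (the usual convention), its Elmore delay can be checked directly. The R and C values below are placeholders, not the extracted parasitics:

```python
# Sketch: Elmore delay of a pi-N distributed RC line model, as used for the
# wordlines and bitlines. r_total and c_total are placeholder line values,
# not the extracted FEOL+BEOL parasitics from the dissertation.

def pi_n_elmore(r_total: float, c_total: float, n: int = 3) -> float:
    """Elmore delay to the far end of a pi-N ladder.

    Each of the n segments is a pi section: a series resistance r_total/n
    with c_total/(2n) at each side, so interior nodes carry c_total/n and
    the two end nodes carry c_total/(2n).
    """
    r_seg = r_total / n
    caps = [c_total / (2 * n)] + [c_total / n] * (n - 1) + [c_total / (2 * n)]
    # Each series resistor sees all capacitance downstream of it.
    return sum(r_seg * sum(caps[k:]) for k in range(1, n + 1))

# For any n, the pi-N ladder preserves the distributed line's RC/2 Elmore delay.
assert abs(pi_n_elmore(1.0, 1.0, 3) - 0.5) < 1e-12
```

The π3 model thus matches the distributed line's RC/2 Elmore delay while adding intermediate nodes that improve waveform accuracy in mixed-mode simulation.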
3.3 MPA FinFET-based 6T SRAM cells
In order to minimize the area occupied by SRAM cells and achieve the highest density,
we investigate SRAM cells consisting of FinFETs with a single fin only. Next, we
discuss how the most promising MPA FinFET-based SRAM cells are selected.
3.3.1 Selection of promising SRAM cells
We first characterize all possible configurations of MPA FinFET-based SRAM cells
using TCAD simulations. Then, the promising cells are selected from among them
using a partially automated process. Due to layout constraints (e.g., access and pull-down
transistors share the terminal connected to the storage nodes), both access and
pull-down transistors are either symmetrically- or asymmetrically-doped (2 cases).
Pull-up transistors can be symmetric or asymmetric in terms of source/drain dop-
ing concentration, independent of other transistors (2 cases). In the context of gate
underlap, each FinFET type (pull-up, access, pull-down) can be either symmetric
or asymmetric (2³ cases). Similarly, 2³ cases arise from gate workfunction symmetry/asymmetry. Thus, in all there are 2 × 2 × 2³ × 2³ = 256 different SRAM cell
configurations. We eliminate non-promising cells through the following steps, based
on SRAM dc metrics.
1. Cells that are dominated by other cells are eliminated first. For example, cell
(0,1,5), which consists of symmetric SG (0) pull-up, asymmetric workfunction
SG (1) access, and asymmetric workfunction-underlap SG (5) pull-down Fin-
FETs, is better in every dc metric compared to cell (6,2,2), which is composed
of ADUSG pull-up, ADSG access, and ADSG pull-down FinFETs, as shown
in Table 3.3. Therefore, cell (0,1,5) dominates cell (6,2,2). Hence, the latter is
eliminated. 28 cells are eliminated in this step.
Table 3.3: SRAM cell elimination examples

SRAM     RSNM (mV)  WM (mV)  RPNM (µW)  WTP (µW)  IREAD (µA)  ILEAK (nA)
(6,2,2)  119        246       9.754     14.776    80.017      1.867
(0,1,5)  136        275      12.977     11.550    80.421      0.120
(4,0,1)   44        326       2.628      3.391    91.251      1.152
(6,6,6)  130        242      12.797     19.944    84.691      3.589
(2,3,7)  148        214      10.778     12.921    56.991      0.904
(7,3,2)  174        255      17.464     12.403    56.870      0.103
(4,1,1)  138        219      11.488     11.170    71.160      0.185
(0,1,1)  126        313       9.884      8.298    71.160      0.097
2. Cells with low RSNM (RSNM < 90 mV) are eliminated next, since a low RSNM
may lead to SRAM failure, e.g., cell (4,0,1). 84 of the remaining cells are
eliminated in this step due to their low RSNM. Most of these cells have stronger
access transistors with respect to their pull-down transistors.
3. Since leakage power is a major concern in SRAM design, cells with high leakage
(ILEAK > 2.4 nA) are eliminated, e.g., cell (6,6,6). 37 of the remaining cells are
eliminated in this step due to their high ILEAK. These are mostly the cells with
asymmetry in underlap, which leads to high ILEAK.
4. Among the remaining non-dominated cells, elimination is done manually
through the following steps. 98 of the remaining cells are eliminated in this
step.
• If a cell is nearly dominated by another cell, it is eliminated. For example,
cell (2,3,7) is eliminated because it is worse than cell (7,3,2) in every metric
but IREAD, and only slightly superior in IREAD.
• Cells with fewer asymmetries are favored since they require fewer fabri-
cation steps. For example, although cell (0,1,1) has slightly lower read
stability, it has less fabrication cost and is better in other metrics com-
pared to cell (4,1,1). Thus, cell (4,1,1) is eliminated.
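The enumeration and the step-1 dominance check can be sketched as follows, using the Table 3.3 values for two cells:

```python
# Sketch of the selection flow: enumerate the 256 legal configurations under
# the layout constraint (access and pull-down share the doping choice), then
# apply the step-1 dominance check. Metric values are from Table 3.3; higher
# is better for RSNM, WM, RPNM, IREAD and lower for WTP, ILEAK.
from itertools import product

def cell_configs():
    """Yield (pu, ax, pd) type codes; bit 0 = W, bit 1 = D, bit 2 = U asymmetry."""
    for d_axpd in (0, 1):                      # access/pull-down doping, tied
        for d_pu in (0, 1):                    # pull-up doping, independent
            for u_pu, u_ax, u_pd in product((0, 1), repeat=3):      # underlap
                for w_pu, w_ax, w_pd in product((0, 1), repeat=3):  # workfunction
                    yield (w_pu | (d_pu << 1) | (u_pu << 2),
                           w_ax | (d_axpd << 1) | (u_ax << 2),
                           w_pd | (d_axpd << 1) | (u_pd << 2))

HIGHER, LOWER = ("rsnm", "wm", "rpnm", "iread"), ("wtp", "ileak")

def dominates(a, b):
    """Step 1: cell a dominates b if it is at least as good in every dc metric."""
    return (all(a[m] >= b[m] for m in HIGHER) and
            all(a[m] <= b[m] for m in LOWER))

# Two cells from Table 3.3: (0,1,5) dominates (6,2,2), so the latter is dropped.
c015 = dict(rsnm=136, wm=275, rpnm=12.977, wtp=11.550, iread=80.421, ileak=0.120)
c622 = dict(rsnm=119, wm=246, rpnm=9.754, wtp=14.776, iread=80.017, ileak=1.867)

assert len(set(cell_configs())) == 256
assert dominates(c015, c622) and not dominates(c622, c015)
```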
Table 3.4: 6T SRAM configurations

SRAM          PU      AX      PD
(0,0,0) [30]  SG      SG      SG
(1,1,1) [30]  AWSG    AWSG    AWSG
(2,2,2) [31]  ADSG    ADSG    ADSG
(4,4,4) [32]  AUSG    AUSG    AUSG
PGFB [63]     SG      IG      SG
(0,1,1)       SG      AWSG    AWSG
(0,1,5)       SG      AWSG    AWUSG
(0,3,3)       SG      AWDSG   AWDSG
(3,3,3)       AWDSG   AWDSG   AWDSG
(4,3,7)       AUSG    AWDSG   AWDUSG
After the above eliminations, only five MPA FinFET-based SRAM cells appear to
be promising. The SRAM cell configurations selected for further analysis, including
the cell consisting of symmetric SG FinFETs, three SPA FinFET-based cells, pass-
gate feedback (PGFB) design consisting of symmetric SG and IG FinFETs, and the
five proposed MPA FinFET-based cells, are shown in Table 3.4. PGFB is a well-
known 6T SRAM cell designed for high read stability [63]. It uses IG FinFETs as
access transistors. The back gate of the access transistor is connected to the storage
node in order to weaken the access transistor to achieve high read stability.
3.3.2 SRAM dc metrics analysis
In this section, we demonstrate that MPA FinFETs offer competitive dc metrics, such
as read stability, writeability, IREAD, and ILEAK. Table 3.5 shows dc and transient
metric values for the analyzed SRAM cells.
Cell (0,0,0) consists of high-performance and low-Vth SG FinFETs. Though cell
(0,0,0) has low RSNM and RPNM, it has high writeability and IREAD. However, its
ILEAK is also high. Since cell (0,0,0) is composed of conventional FinFETs, we use
this cell as the baseline.
Table 3.5: 6T SRAM dc and transient metric values

SRAM     RSNM (mV)  WM (mV)  RPNM (µW)  WTP (µW)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
(0,0,0)   31        327       2.074      3.838    108         2.033       27.0     34.2
(1,1,1)   91        361       5.570      3.354     71         0.041       45.9     38.2
(2,2,2)  106        303       8.185     11.538     80         1.803       34.0     32.9
(4,4,4)   68        264       5.946     13.409    121         4.529       23.5     36.1
PGFB     170        159      12.738     15.788     60         2.033       49.7     64.5
(0,1,1)  126        313       9.884      8.298     71         0.097       45.9     50.8
(0,1,5)  136        275      12.977     11.550     80         0.120       44.6     52.6
(0,3,3)  173        287      14.768     10.681     54         0.091       54.7     54.8
(3,3,3)  135        339       9.647      4.996     54         0.035       54.8     38.7
(4,3,7)  188        158      19.953     18.121     57         0.197       53.6     65.2
Cell (1,1,1) reduces ILEAK by 50× since it is based on AWSG FinFETs, thus
addressing the SRAM leakage power problem very effectively. Cell (1,1,1) also has
the best writeability metric values, i.e., highest WM and lowest WTP, owing to a
weaker pull-up transistor compared to an access transistor. Although cell (1,1,1)
provides a higher RSNM and RPNM compared to cell (0,0,0) due to increased Vth,
read stability is still low since the access and pull-down FinFETs are of the same
type and hence have the same strength. IREAD decreases by 34% due to the weaker
transistors.
Cell (2,2,2) utilizes bidirectional current flow across access transistors using ADSG
FinFETs. This helps mitigate the read-write conflict of the SRAM cell. By connect-
ing the low-doped terminal to the storage node, the access transistor is made weaker
during the read operation and stronger during the write operation. Therefore, ADSG
access transistors not only enhance read stability, but also help writeability. Com-
pared to cell (0,0,0), cell (2,2,2) demonstrates a significant improvement in RSNM
and RPNM, but lower writeability. Due to the lower drain doping concentration, the
drain resistance of an ADSG FinFET is larger than that of an SG FinFET. As a
result, IREAD is degraded by 26%, but ILEAK is reduced by 11%.
Cell (4,4,4) offers the highest IREAD among all configurations owing to the high
current drive of AUSG FinFETs. Decreasing LUN on the source side enables a higher
ION , but with an increased IOFF . Therefore, ILEAK increases by 2.2×. Besides, like
the ADSG FinFETs, AUSG FinFETs can also utilize bidirectional current flow across
access transistors to help resolve the read-write conflict.
The PGFB cell provides high read stability by connecting the storage node to the
back gate of the IG access transistor, thus weakening the access transistor during the
read operation. However, due to the weaker access transistor, WM reduces by 51%,
thus degrading writeability, and IREAD decreases by 44%. ILEAK remains the same as
that of cell (0,0,0).
In the case of the chosen MPA FinFET-based SRAM cells, cell (0,1,1) demon-
strates that changing even a single FinFET type can significantly affect the SRAM
metric values. Compared to cell (1,1,1), changing the pull-up transistor from AWSG
to SG increases RSNM by 38%, as shown in Fig. 3.2, and RPNM by 77%, at the cost
of worse writeability: 13% decrease in WM and 147% increase in WTP. The effect
of the pull-up transistor on writeability is straightforward. As the pull-up transistor
gets stronger, it becomes harder for the access transistor to discharge the storage node
that stores a “1”. Therefore, more power is required to complete the write opera-
tion. As a result, WM decreases and WTP increases. On the other hand, the pull-up
transistor strength can also impact the RSNM by changing the shape of the read
butterfly curve. Although access and pull-down transistors are the main components
that determine read stability, for better read stability the pull-up transistor needs to
be selected carefully as well. Cell (0,1,1) has 34% and 21× reduction in IREAD and
ILEAK, respectively, compared to those of cell (0,0,0).
Cell (0,1,5) has better read stability and worse writeability compared to those
of cell (0,1,1) owing to the strong AWUSG pull-down FinFET. Cell (0,1,5) has the
highest IREAD among the proposed MPA FinFET-based SRAM cells. Leakage is
reduced by 17× compared to that of cell (0,0,0) as a result of the asymmetric gate
workfunction in nFinFETs.
Figure 3.2: RSNMs of the (1,1,1) and (0,1,1) cells (VR versus VL, both in V) show how the pull-up transistor can impact read stability.
Cell (0,3,3) offers high read stability together with reasonable writeability. By
combining asymmetries in gate workfunction and doping concentration, the ILEAK is
reduced by 22× compared to that of cell (0,0,0), and IREAD is decreased by 50%.
Cell (3,3,3) consists of AWDSG FinFETs, and hence offers the lowest leakage
among all cells, since leakage is reduced both due to the asymmetry in gate work-
function and in doping concentration. ILEAK is reduced by 58× compared to that of
cell (0,0,0). Yet, cell (3,3,3) has reasonable stability metric values due to its high-Vth
FinFETs, and unequal current flow through asymmetrically-doped access transistor
during read and write operations. Compared to cell (0,3,3), its RSNM is 22% and
RPNM is 35% smaller while WM is 18% higher and WTP is 53% smaller, as a result
of the weaker pull-up transistor. Cell (3,3,3) has the best writeability (highest WM,
lowest WTP) among the proposed MPA FinFET-based SRAM cells.
Cell (4,3,7) has the highest RSNM and RPNM of all SRAM cells. Adding asym-
metry in gate underlap to the pull-down transistor makes it stronger and increases
read stability. However, due to the use of a strong pull-up transistor, cell (4,3,7)
has the worst writeability. IREAD is 47% and ILEAK is 10× smaller than those of cell
(0,0,0).
Overall, an SRAM cell consisting of SG FinFETs suffers from adverse read stabil-
ity and leakage power consumption. One way to reduce the ILEAK and increase read
stability is to increase the Vth of the FinFETs. Asymmetry in gate workfunction can
be used to reduce leakage if we want to avoid using new gate workfunction values
to increase Vth. Use of asymmetry in doping concentration and gate underlap was
shown to be effective in mitigating the read-write conflict.
3.3.3 SRAM transient metrics analysis
Table 3.5 also shows the transient metric values of analyzed 6T SRAM cells. As
expected, as IREAD increases, TR decreases because it takes less time to discharge the
bitline capacitance and generate the voltage difference between the bitlines to activate
the sense amplifier. TR is the smallest for cell (4,4,4) as it provides the highest IREAD.
Cell (0,1,5) has the smallest TR among proposed MPA FinFET-based SRAM cells.
Due to their lowest IREAD, cells (0,3,3) and (3,3,3) have the worst TR, 2× larger than
that of cell (0,0,0).
TW is highly correlated with the strength of the pull-up and access transistors.
Cell (2,2,2) has a weaker pull-up transistor than its access transistor, leading to
better writeability and smaller TW. For the same reason, cell (3,3,3) has the smallest
TW among the MPA FinFET-based SRAM cells.
MPA FinFET-based SRAM cells we propose suffer from inferior transient metrics
because the strength of the MPA FinFETs is reduced due to asymmetry in gate
workfunction and doping concentration. Reduced ION increases the time to charge
and discharge capacitances. Therefore, both TR and TW have increased for SRAM
cells based on MPA FinFETs. Resizing the memory array or increasing the supply
voltage can help improve TR and TW.
3.4 Analysis of the SRAM cells under different gate workfunction, doping concentration, supply voltage, and temperature values
In this section, we demonstrate that the proposed MPA FinFET-based SRAM cells re-
main promising under different gate workfunction (hence, different Vth), source/drain
doping concentration, supply voltage, and temperature values.
3.4.1 Different gate workfunction values
We have chosen gate workfunctions such that SG nFinFETs have ΦG = (4.4+∆Φ)eV
and SG pFinFETs have ΦG = (4.8−∆Φ) eV in order to maintain approximately twice
the ION in nFinFETs as in pFinFETs. AWSG FinFETs have ΦGF = (4.4+∆Φ) eV and
ΦGB = (4.8−∆Φ)eV . ∆Φ changes from 0 to 0.2eV with a step size of 0.02eV . The
gate underlap asymmetry is chosen such that an AWUSG FinFET performs better
than an SG FinFET. To achieve this aim, we choose LUNS = (2+∆Φ/0.02)nm, which
guarantees that an AWUSG FinFET has a higher ION and lower IOFF compared
to those of an SG FinFET. At ∆Φ = 0.2eV (ΦGF = ΦGB = 4.6eV ), there is no
asymmetry in the gate workfunction. Hence, in this case, we do not introduce any
asymmetry in gate underlap. As a result, all SRAM cells, except PGFB, converge to
the behavior of the (0,0,0), (2,2,2), or (0,2,2) cells at ∆Φ = 0.2eV .
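The sweep grid implied by these formulas can be tabulated as follows (a sketch; the function name is ours):

```python
# Sketch: the Section 3.4.1 sweep. SG nFinFETs use PhiG = 4.4 + dphi eV and
# SG pFinFETs 4.8 - dphi eV; AWSG FinFETs use PhiGF = 4.4 + dphi and
# PhiGB = 4.8 - dphi; LUNS = (2 + dphi/0.02) nm for AWUSG FinFETs.

def sweep_points():
    pts = []
    for k in range(11):                        # dphi = 0.00, 0.02, ..., 0.20 eV
        d = round(0.02 * k, 2)
        pts.append({"dphi": d,
                    "phi_gf": round(4.4 + d, 2),   # eV
                    "phi_gb": round(4.8 - d, 2),   # eV
                    "luns": 2 + k})                # nm, = 2 + dphi/0.02
    return pts

pts = sweep_points()
# At dphi = 0.2 eV the workfunction asymmetry vanishes (both gates at 4.6 eV)
# and LUNS reaches 12 nm, the drain-side underlap value of Table 3.1, so the
# underlap asymmetry vanishes as well.
assert pts[-1]["phi_gf"] == pts[-1]["phi_gb"] == 4.6 and pts[-1]["luns"] == 12
```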
The RSNM for the SRAM cells for different gate workfunction values is shown in
Fig. 3.3. As ∆Φ (hence, Vth) increases, the RSNM of cell (0,0,0) increases. The PGFB
cell becomes even more advantageous as ∆Φ increases, due to its weaker access transistors.
The PGFB cell has a 72% higher RSNM, while cells (2,2,2) and (0,2,2) have a 26% higher RSNM,
compared to that of cell (0,0,0) when ∆Φ = 0.2 eV.
Figure 3.3: RSNM under different gate workfunction values (RSNM in V versus ∆Φ in eV).
Fig. 3.4 shows the plots of WM with respect to ∆Φ. As ∆Φ increases, WM
increases for all cells except for PGFB. Specifically, the WM of cell (4,3,7) increases
significantly since the impact of asymmetric gate underlap on the pull-up transistor
decreases. Cells (2,2,2) and (0,2,2) have a 2% lower WM, while PGFB has a 52% lower WM,
compared to that of cell (0,0,0), when ∆Φ = 0.2 eV.
As shown in Fig. 3.5, RPNM changes are similar to those in RSNM under different
gate workfunction values. At ∆Φ = 0.2eV , the PGFB, (0,2,2), and (2,2,2) cells have
90%, 29%, and 27% higher RPNM compared to that of cell (0,0,0), respectively.
WTP decreases as ∆Φ increases, as shown in Fig. 3.6. The PGFB cell has the
worst WTP when ∆Φ = 0.2eV , 153% higher than that of cell (0,0,0).
IREAD is plotted in Fig. 3.7. The change in IREAD for the (1,1,1) and proposed
MPA FinFET-based SRAM cells is less than 8% since they have FinFETs with an
asymmetric gate workfunction. Increasing ΦGF from 4.4 eV to 4.6 eV increases nFinFET
Vth, thus decreasing the current flow. On the other hand, the change in ΦGB
from 4.8eV to 4.6eV helps increase the nFinFET current. A similar scenario is valid
Figure 3.4: WM under different gate workfunction values (WM in V versus ∆Φ in eV).
Figure 3.5: RPNM under different gate workfunction values (RPNM in µW versus ∆Φ in eV).
for pFinFETs. Therefore, the AWSG FinFET current changes less compared to the
SG FinFET current as ∆Φ increases. This explains the small change in IREAD for
the (1,1,1) and proposed MPA FinFET-based SRAM cells. The (2,2,2), (0,2,2), and
Figure 3.6: WTP under different gate workfunction values (WTP in µW versus ∆Φ in eV).
Figure 3.7: IREAD under different gate workfunction values (IREAD in µA versus ∆Φ in eV).
PGFB cells have 25%, 25%, and 54% lower IREAD compared to that of cell (0,0,0)
when ∆Φ = 0.2eV .
Figure 3.8: ILEAK under different gate workfunction values (ILEAK in nA, log scale, versus ∆Φ in eV).
As ∆Φ increases, ILEAK decreases exponentially due to an increase in Vth, as shown
in Fig. 3.8. At ∆Φ = 0.2eV , cells (2,2,2) and (0,2,2) have 28% less, while the PGFB
cell has 19% less ILEAK compared to that of cell (0,0,0).
Fig. 3.9 shows TR under different gate workfunction values. TR changes by less
than 12% for the (1,1,1) and MPA FinFET-based SRAM cells, since IREAD change
was less than 8% for these cells. At ∆Φ = 0.2eV , TR is 82% higher for the PGFB cell
compared to that of cell (0,0,0), while the (2,2,2) and (0,2,2) cells have 15% higher
TR.
TW is 60% larger for the PGFB cell when ∆Φ = 0.2eV , while it only increases by
1% and 4% for cells (2,2,2) and (0,2,2), respectively, compared to that of cell (0,0,0),
as shown in Fig. 3.10.
3.4.2 Different doping concentration values
Next, we changed the doping concentration of asymmetrically-doped FinFETs and
observed the effect on the SRAM metric values. Changing the drain doping concen-
Figure 3.9: TR under different gate workfunction values (TR in ps versus ∆Φ in eV).
Figure 3.10: TW under different gate workfunction values (TW in ps versus ∆Φ in eV).
tration of asymmetrically-doped FinFETs can affect SRAM stability significantly, as
shown in Table 3.6. For example, cell (0,3,3) has 21% lower RSNM and 6% higher
WM when ND = 5×1019 cm−3 compared to when ND = 1019 cm−3. By changing the
Table 3.6: SRAM stability metric values under different drain doping concentrations

                RSNM (mV)                      WM (mV)
SRAM     ND = 10^19  ND = 5×10^19     ND = 10^19  ND = 5×10^19
(2,2,2)  106          47              303         321
(0,3,3)  173         137              287         304
(3,3,3)  135         101              339         355
(4,3,7)  188         156              158         171
doping concentration of the drain terminal, one may achieve the desired read stability
metric values at the cost of writeability, and vice versa.
3.4.3 Different supply voltage values
We simulated all analyzed SRAM cells under different VDD values to compare their
ILEAK at the same performance and vice versa. We changed the VDD values of the cells
to get the same performance by ensuring that each cell has the same IREAD (108 µA)
and compared their ILEAK. Similarly, we compared the IREAD of cells for the same
ILEAK (2.033 nA). The results are shown in Table 3.7. For the same performance
(iso-IREAD), cell (1,1,1) has the lowest ILEAK, which is 25× smaller than that of cell
(0,0,0), thanks to the asymmetry in gate workfunction. Cell (0,1,5) provides the
highest IREAD for the same ILEAK (iso-ILEAK) due to the asymmetric gate underlap
in the pull-down transistor. Cell (0,3,3), which was shown to be promising in terms
of stability metrics, offers 7.79× ILEAK reduction for the same performance and 33%
higher IREAD for the same ILEAK. Overall, MPA FinFET-based SRAMs offer higher
performance for the same ILEAK and lower ILEAK at the same performance compared
to the conventional SRAM cell.
3.4.4 Different temperature values
The temperature of on-chip SRAM arrays can go up to 90◦C [64]. We perform
simulations at 27◦C (300 K), for which results are given in
Table 3.7: Comparison of SRAM cells at iso-IREAD/iso-ILEAK

SRAM     ILEAK at iso-IREAD   IREAD at iso-ILEAK
(0,0,0)  1.00×                1.00×
(1,1,1)  0.04×                1.64×
(2,2,2)  0.92×                1.34×
(4,4,4)  2.16×                0.01×
PGFB     1.17×                0.56×
(0,1,1)  0.07×                1.63×
(0,1,5)  0.07×                1.78×
(0,3,3)  0.13×                1.33×
(3,3,3)  0.10×                1.33×
(4,3,7)  0.17×                1.37×
Table 3.8: SRAM metric values at 0◦C

SRAM     RSNM (mV)  WM (mV)  RPNM (µW)  WTP (µW)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
(0,0,0)   37        325       2.424      4.656    112         0.500       26.7     34.9
(1,1,1)   96        362       6.172      3.452     74         0.010       45.1     38.2
(2,2,2)  114        302       9.170     12.455     83         0.441       33.7     33.8
(4,4,4)   74        261       6.636     15.295    126         1.352       24.0     37.8
PGFB     178        156      13.948     16.693     63         0.500       46.8     66.0
(0,1,1)  131        309      10.683      8.858     74         0.021       45.1     51.5
(0,1,5)  141        270      14.064     12.287     83         0.025       43.9     53.3
(0,3,3)  179        283      15.996     11.399     55         0.019       54.2     54.7
(3,3,3)  175        340      10.634      5.111     55         0.008       54.2     38.7
(4,3,7)  195        151      21.574     19.490     59         0.042       53.1     67.0
Table 3.5, and at 0◦C, 65◦C, and 90◦C, for which results are given in Tables 3.8, 3.9, and
3.10, respectively. With an increasing temperature, read stability gets worse as RSNM
and RPNM decrease. Writeability, on the other hand, increases with an increasing
temperature. IREAD decreases at higher temperature since mobility degrades due to
impurity scattering. ILEAK increases exponentially with increasing temperature. Yet,
cell (3,3,3) delivers 39× reduction in ILEAK at 90◦C. TR increases with an increasing
temperature due to the decrease in IREAD. TW decreases slightly as temperature
increases. Overall, the proposed MPA FinFET-based SRAM cells can be seen to
maintain their advantages at corner temperatures.
Table 3.9: SRAM metric values at 65◦C

SRAM     RSNM (mV)  WM (mV)  RPNM (µW)  WTP (µW)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
(0,0,0)   25        328       1.646      2.842    102         10.122      27.3     32.1
(1,1,1)   82        360       4.748      3.202     68          0.253      46.9     38.2
(2,2,2)   95        302       6.990     10.241     76          9.025      34.3     30.8
(4,4,4)   59        268       5.046     11.004    116         20.542      23.8     33.8
PGFB     157        164      11.187     14.575     57         10.122      48.9     57.5
(0,1,1)  119        317       8.804      7.549     68          0.612      46.9     49.8
(0,1,5)  127        279      11.510     10.582     77          0.753      45.5     51.8
(0,3,3)  164        292      13.188      9.779     51          0.582      55.6     52.7
(3,3,3)  126        338       8.353      4.846     51          0.221      55.6     38.6
(4,3,7)  177        167      17.876     16.440     55          1.171      54.2     61.8
Table 3.10: SRAM metric values at 90◦C

SRAM     RSNM (mV)  WM (mV)  RPNM (µW)  WTP (µW)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
(0,0,0)   22        329       1.407      2.300     98         24.331      27.8     30.6
(1,1,1)   77        359       4.249      3.088     66          0.706      47.6     38.0
(2,2,2)   88        302       6.305      9.394     73         21.753      34.8     30.1
(4,4,4)   55        270       4.496      9.557    112         46.952      23.5     31.7
PGFB     149        167      10.257     13.779     55         24.336      49.5     54.9
(0,1,1)  114        318       8.122      7.071     66          1.699      47.5     49.1
(0,1,5)  121        281      10.589      9.963     74          2.080      46.0     51.0
(0,3,3)  158        295      12.230      9.235     50          1.617      56.1     49.1
(3,3,3)  119        337       7.593      4.748     50          0.622      56.1     38.5
(4,3,7)  171        172      16.622     15.438     53          3.121      54.6     60.8
3.5 Process variations
Process variations pose a severe challenge to SRAM performance due to the scaling
of both device parameters and VDD. SRAM cells are especially prone to process
variations because they are built from the smallest transistors to achieve high density.
Besides, SRAM operation depends on perfectly-matched transistors, which makes it
even more sensitive to process variations. We investigate variations in LG, TOX , TSI ,
and ΦG because they have been shown to affect SRAM performance and stability
metrics significantly [65]. Table 3.11 shows the nominal and [−3σ, 3σ] variation range
of these parameters. We assume the physical parameters have a normal distribution
[66] with 3σ/µ = 10% variation. We generate 100 sample points for Sobol-sequence-based
quasi-Monte Carlo simulation, which provides accuracy akin to Monte Carlo
simulation while needing several orders of magnitude fewer sample points. We show
the distribution characteristics, i.e., mean (µ) and standard deviation (σ), due to
process variations in Table 3.12 and Table 3.13.
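The sampling step can be sketched with SciPy's quasi-Monte Carlo module (`scipy.stats.qmc`, SciPy ≥ 1.7); we assume independent normal distributions whose ±3σ points match the Table 3.11 ranges:

```python
# Sketch: Sobol-sequence quasi-Monte Carlo sampling of the five varied
# parameters. Each parameter is an independent normal whose 3-sigma points
# match the Table 3.11 ranges; the uniform Sobol points are mapped to
# normals through the inverse CDF.
import numpy as np
from scipy.stats import norm, qmc

params = {                       # name: (nominal, 3*sigma), from Table 3.11
    "LG": (24.0, 2.4),           # nm
    "TSI": (10.0, 1.0),          # nm
    "TOX": (1.0, 0.1),           # nm
    "PhiGF": (4.4, 0.02),        # eV
    "PhiGB": (4.8, 0.02),        # eV
}

sampler = qmc.Sobol(d=len(params), scramble=True, seed=0)
u = sampler.random(100)          # 100 quasi-random points in [0, 1)^5
samples = np.column_stack([norm.ppf(u[:, j], loc=mu, scale=s3 / 3.0)
                           for j, (mu, s3) in enumerate(params.values())])
# Each row of 'samples' is one device-parameter set to feed to the
# TCAD simulations.
```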
Table 3.11: Process variations

Parameter    Nominal value   [−3σ, 3σ] range
LG (nm)      24              [21.6, 26.4]
TSI (nm)     10              [9, 11]
TOX (nm)     1               [0.9, 1.1]
ΦGF (eV)     4.4             [4.38, 4.42]
ΦGB (eV)     4.8             [4.78, 4.82]
Table 3.12: Distribution characteristics, µ

SRAM     RSNM (mV)  WM (mV)  RPNM (µW)  WTP (µW)  IREAD (µA)  ILEAK (nA)  TR (ps)   TW (ps)
(0,0,0)   31        328       2.122      3.870    108         2.121       27.137    33.718
(1,1,1)   91        361       5.636      3.371     72         0.042       45.798    38.223
(2,2,2)  105        305       8.229     11.481     81         1.873       34.524    32.981
(4,4,4)   68        266       6.006     13.377    122         4.969       23.781    35.928
PGFB     170        160      12.877     15.827     61         2.121       49.530    64.336
(0,1,1)  126        313       9.974      8.313     72         0.100       45.766    50.813
(0,1,5)  136        276      13.025     11.552     81         0.125       44.524    52.620
(0,3,3)  173        289      14.838     10.674     54         0.094       54.414    54.601
(3,3,3)  134        340       9.675      4.981     54         0.036       54.444    38.641
(4,3,7)  187        160      20.059     18.119     58         0.208       53.242    64.928
Table 3.13: Distribution characteristics, σ

SRAM     RSNM (mV)  WM (mV)  RPNM (µW)  WTP (µW)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
(0,0,0)  6          1        0.315      0.524     1.6         0.562       1.123    0.732
(1,1,1)  5          2        0.396      0.166     1.2         0.010       0.853    0.722
(2,2,2)  5          1        0.372      0.153     1.2         0.489       0.950    0.683
(4,4,4)  5          2        0.483      0.594     2.2         1.585       0.949    0.977
PGFB     6          3        0.551      0.143     1.6         0.562       1.324    1.514
(0,1,1)  5          3        0.486      0.168     1.3         0.020       0.853    0.746
(0,1,5)  5          3        0.608      0.227     1.7         0.026       0.860    0.642
(0,3,3)  4          3        0.467      0.151     1.0         0.019       0.842    0.783
(3,3,3)  5          1        0.426      0.136     0.9         0.008       0.842    0.753
(4,3,7)  4          5        0.650      0.409     1.1         0.059       0.834    1.106
Cells (0,3,3) and (4,3,7) have the highest µ and lowest σ for RSNM, which makes
them highly robust during read operation. We observe a 3σ RSNM value of 12-18
mV across the cells, which could negatively impact read operation for SRAM cells
with a lower RSNM.
Process variations have less impact on WM compared to RSNM. Cell (3,3,3) has
the highest µ and lowest σ among the MPA FinFET-based SRAM cells, thus offering
the best writeability performance. Cell (4,3,7) has the worst WM with the lowest µ
and highest σ.
Although cell (4,3,7) has the highest σ for RPNM, its σ/µ ratio is the smallest
together with that of cell (0,3,3), which makes them better in read stability.
In terms of WTP, cell (4,3,7) has the largest µ and largest σ among SRAM cells
based on MPA FinFETs, which confirms its inferior writeability performance.
SRAM cells that have a higher IREAD µ have a higher σ as well. The IREAD σ/µ ratio
is less than 3% for the analyzed SRAM cells.
Process variations affect ILEAK the most because gate workfunction variation is
the main contributor to Vth variation, which impacts ILEAK exponentially. Cell (3,3,3)
has both the smallest µ and σ.
TR and TW of MPA FinFET-based SRAM cells under process variations have a
σ that is smaller than 2% of µ.
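These σ/µ observations can be checked directly against the tabulated values; the numbers below are transcribed from Tables 3.12 and 3.13:

```python
# Sketch: verifying the sigma/mu (coefficient of variation) claim for TR and
# TW of the proposed MPA FinFET-based cells. Each entry is (mu, sigma) in ps,
# transcribed from Tables 3.12 and 3.13.

stats = {
    (0, 1, 1): {"TR": (45.766, 0.853), "TW": (50.813, 0.746)},
    (0, 1, 5): {"TR": (44.524, 0.860), "TW": (52.620, 0.642)},
    (0, 3, 3): {"TR": (54.414, 0.842), "TW": (54.601, 0.783)},
    (3, 3, 3): {"TR": (54.444, 0.842), "TW": (38.641, 0.753)},
    (4, 3, 7): {"TR": (53.242, 0.834), "TW": (64.928, 1.106)},
}

# Every MPA cell's TR and TW spread stays below 2% of its mean.
for cell, metrics in stats.items():
    for metric, (mu, sigma) in metrics.items():
        assert sigma / mu < 0.02, (cell, metric)
```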
3.6 Discussion
We proposed and compared five MPA FinFET-based SRAM cells to symmetric and
SPA FinFET-based SRAM cells in terms of dc and transient metrics. Results show
that by combining multiple asymmetries in FinFETs, we can design very low-leakage
and robust SRAM cells.
For better read stability, stronger pull-down transistors compared to access tran-
sistors are needed in order to keep the voltage at the internal node that stores a “0”
as small as possible during the read operation. On the other hand, strong access and
weak pull-up transistors are needed for better writeability to more easily flip the bit of
the cell during the write operation. Using transistors with multiple fins (hence, larger
width), better SRAM stability behavior can be achieved. For example, increasing the
number of fins in the pull-down transistor will increase its strength and provide better
read stability. However, increasing the number of fins increases cell area and leakage.
As SRAMs already occupy a large fraction of the on-chip area and consume an even
larger fraction of leakage power, designing SRAM cells with single-fin FinFETs saves
area and power. The MPA FinFETs we analyzed offer different effective strengths
even though they are all assumed to have a single fin. Thus, they offer a path to
high-density SRAMs with low leakage and acceptable stability values.
Requiring the access transistors to be weaker for better read stability and stronger
for better writeability leads to the so-called read-write conflict. This can be addressed
by using FinFETs with asymmetry in doping concentration or gate underlap as access
transistors. Since they exhibit an unequal current flow across the FinFET for positive
and negative voltage bias between the source and drain terminals, they can help
mitigate the read-write conflict.
For ultra-low power SRAMs, cell (3,3,3) is the most promising since it offers a
58× ILEAK reduction, assuming ΦGF = 4.4 eV and ΦGB = 4.8 eV, compared to the
SG FinFET-based SRAM cell. Due to its asymmetry in doping concentration, it
also provides comparable stability values. On the other hand, reduced current drive
increases TR and TW of cell (3,3,3).
The ILEAK improvement in cell (3,3,3) is as expected. In the hold mode, three Fin-
FETs of a 6T SRAM cell leak (PD1, AX2, and PU2). Cell ILEAK can be estimated by
adding the leakage of these FinFETs without simulating the entire cell. The ILEAK of
an SG pFinFET and nFinFET is 0.056 nA and 0.983 nA, respectively. Thus, ILEAK
of cell (0,0,0) can be estimated as 2.022 nA (0.056 + 2 × 0.983 = 2.022), which is very
close to the value (2.033 nA) obtained via simulating the entire cell. ILEAK of cell (3,3,3),
which consists of AWDSG FinFETs, can be estimated similarly. AWDSG pFinFET
(VDS < 0), AWDSG nFinFET (VDS > 0), and AWDSG nFinFET (VDS < 0) have an
ILEAK of 0.001 nA, 0.017 nA, and 0.022 nA, respectively. Although PD1 and AX2 in cell (3,3,3)
are both nFinFETs, their ILEAK is different due to an asymmetry in doping concentra-
tion. ILEAK of cell (3,3,3) can be estimated as 0.04 nA (0.001 + 0.017 + 0.022 = 0.04),
which is slightly different from the value (0.035 nA) obtained via simulating the
entire cell. A 51× reduction in ILEAK of cell (3,3,3) with respect to cell (0,0,0) can
be estimated using ILEAK values of single FinFETs, which are computationally much
cheaper to characterize compared to an entire cell. Such analytical approaches that
are based on single FinFET characteristics can be useful for evaluating SRAM cells in
early design steps. However, the entire cell needs to be simulated and characterized
in order to obtain more accurate results.
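The arithmetic above can be captured in a few lines; a minimal sketch using the single-FinFET ILEAK values quoted in this section:

```python
# Estimate 6T SRAM cell leakage from single-FinFET ILEAK values (all in nA).
# In the hold mode, three FinFETs leak: PD1, AX2, and PU2.

# Cell (0,0,0): symmetric SG FinFETs.
sg_p = 0.056   # SG pFinFET (PU2)
sg_n = 0.983   # SG nFinFET (PD1, AX2)
ileak_000 = sg_p + 2 * sg_n                       # 2.022 nA (simulated: 2.033 nA)

# Cell (3,3,3): AWDSG FinFETs; PD1 and AX2 leak differently due to the
# asymmetry in doping concentration.
awdsg_p     = 0.001  # AWDSG pFinFET, VDS < 0
awdsg_n_pos = 0.017  # AWDSG nFinFET, VDS > 0
awdsg_n_neg = 0.022  # AWDSG nFinFET, VDS < 0
ileak_333 = awdsg_p + awdsg_n_pos + awdsg_n_neg   # 0.040 nA (simulated: 0.035 nA)

print(round(ileak_000, 3), round(ileak_333, 3))
print(round(ileak_000 / ileak_333))   # estimated reduction: 51x
print(round(2.033 / 0.035))           # simulated reduction:  58x
```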
Use of cell (0,0,0) with gate workfunction values ΦGF = ΦGB = 4.6 eV reduces
ILEAK by 1079× compared to that of cell (0,0,0) with ΦG = 4.4 eV for the nFinFET
and ΦG = 4.8 eV for the pFinFET. In addition, it has good read stability metric values
(RSNM = 155 mV, RPNM = 11.686 µW) and writeability metric values (WM = 356 mV,
WTP = 3.738 µW). This, however, would require an extra gate workfunction of 4.6 eV
(since a ΦG of 4.4 eV and 4.8 eV, respectively, for an nFinFET and pFinFET will
still be required to implement high-performance logic). If an extra gate workfunction
is available, several other SRAM cells exhibit an even better performance than the
(0,0,0) cell.
For high read stability, the (0,3,3) or (4,3,7) cell can be used. The (4,3,7) cell
has the highest RSNM at the cost of weak writeability. Compared to cell (4,3,7), cell
(0,3,3) has slightly weaker read stability: its RSNM and RPNM are 8% and 26%
smaller, respectively. However, it has significantly better writeability: its WM is 82% higher and
WTP is 41% smaller. Cell (0,3,3) also consumes 2.2× less ILEAK compared to cell
(4,3,7) and requires fewer fabrication steps since it does not have an asymmetry in
gate underlap. Overall, for high stability purposes, cell (0,3,3) is the most promising
choice of all SRAM configurations we explored.
Cell (0,1,1) only has asymmetry in gate workfunction. Still, it offers good write-
ability and 21× leakage reduction. Its read stability can be improved using read-assist
techniques.
Although cell (0,1,5) has the highest IREAD among the targeted MPA FinFET-
based SRAM cells, it is less desirable because it requires an extra fabrication step due
to the use of an asymmetric gate underlap.
We have shown that MPA FinFETs can be used in designing promising SRAM
cells under different gate workfunction, source/drain doping concentration, supply
voltage, and temperature values. We have performed FinFET simulations and care-
fully selected the SRAM design parameter values while taking previously reported
data into account. Nevertheless, optimal parameter values in FinFET or SRAM design
can change depending on the designer's choices and the target applications. It is crucial to derive
an optimal design for symmetric and asymmetric FinFET devices. The search space
of an optimal FinFET design can be very large due to the large number of FinFET
parameters. An exhaustive search for an optimal FinFET design can be computa-
tionally laborious and time-consuming. Stochastic optimization techniques, such as
genetic algorithms and simulated annealing, can be used to explore the large FinFET
design search space and find near-optimal designs.
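As an illustration, a simulated-annealing search over a toy FinFET parameter space might look as follows; the parameter names, ranges, and quadratic cost function are placeholders (a real flow would score each candidate design with TCAD device or cell simulations):

```python
import math
import random

# Toy FinFET design space: parameter name -> (min, max). These names and
# ranges are illustrative only, not values used in this work.
PARAMS = {"phi_g_eV": (4.2, 5.0), "n_sd_cm3": (1e19, 1e21), "underlap_nm": (0.0, 4.0)}

def cost(design):
    # Placeholder objective (a smooth bowl), standing in for simulated
    # metrics such as an ILEAK/TR trade-off.
    return sum(((v - (lo + hi) / 2) / (hi - lo)) ** 2
               for (lo, hi), v in zip(PARAMS.values(), design.values()))

def neighbor(design):
    # Perturb one randomly chosen parameter by ~5% of its range, clamped.
    d = dict(design)
    k = random.choice(list(d))
    lo, hi = PARAMS[k]
    d[k] = min(hi, max(lo, d[k] + random.gauss(0, 0.05 * (hi - lo))))
    return d

def anneal(steps=2000, t0=1.0):
    cur = {k: random.uniform(lo, hi) for k, (lo, hi) in PARAMS.items()}
    cur_c = cost(cur)
    best, best_c = dict(cur), cur_c
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-6            # linear cooling schedule
        cand = neighbor(cur)
        cand_c = cost(cand)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability to escape local minima.
        if cand_c < cur_c or random.random() < math.exp((cur_c - cand_c) / t):
            cur, cur_c = cand, cand_c
            if cur_c < best_c:
                best, best_c = dict(cur), cur_c
    return best, best_c
```

In a device-circuit co-design loop, the cost function would be evaluated at the cell level (e.g., a weighted combination of RSNM, WM, ILEAK, TR, and TW) rather than on the device in isolation.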
Determining the degree of asymmetry in FinFETs in a systematic way is crucial
as it significantly impacts the performance of asymmetric FinFET-based designs.
The same methodology used to design optimal symmetric FinFETs can be used for
asymmetric FinFETs as well. However, finding optimal asymmetric FinFETs may
be computationally more costly since there is a larger design space to cover due
to additional asymmetric parameters. In addition, optimization of an asymmetric
FinFET by itself may not be sufficient for promising designs since the true benefit of
asymmetry can emerge at the circuit level. For example, an ADSG FinFET, which
may be inferior to an SG FinFET as a standalone device, can be especially useful for
mitigating the read-write conflict. Therefore, a device-circuit co-design approach is needed to optimize
asymmetric FinFETs.
In this work, only five MPA FinFET-based SRAM cells were analyzed in detail.
The cells that are not analyzed may not perform worse in every aspect compared
to these five cells. The cells that are very similar to one of the five proposed cells
were excluded from detailed analysis. For example, cell (4,3,3) performs similar to cell
(3,3,3). However, only cell (3,3,3) was chosen for further analysis as it performs better
than cell (4,3,3) in all aspects except RSNM and requires fewer fabrication steps. Still,
its analysis gives an idea of how a (4,3,3) cell would perform. Thus, the five proposed cells
indirectly represent a larger design space, which consists of cells that can be obtained
with slight modifications in terms of asymmetries introduced in the FinFETs.
One disadvantage of MPA FinFET-based SRAM cells is that the number of fab-
rication steps increases as we introduce asymmetries in FinFETs. AWSG FinFETs
can be fabricated using differently doped gate stacks without the need for additional
masking steps [67, 68]. ADSG FinFETs require an extra mask to dope the source
and drain unequally [31]. AUSG FinFETs also require an extra mask to create an
asymmetric underlap, which can be achieved using an asymmetric spacer [69] or tilted
ion implantation [70]. However, an AUSG FinFET is harder to fabricate than an
ADSG FinFET [33].
3.7 Chapter summary
For the first time, we showed that MPA FinFETs can be used to design ultra-low-
leakage, robust, and high-density SRAM cells. Results indicate that by using AWDSG
FinFETs in SRAM design, with combined asymmetries in gate workfunction and
doping concentration, the ILEAK of the cell can be reduced by 58× while still demon-
strating acceptable stability compared to the traditional symmetric SG FinFET-based
SRAM cell. To obtain high read stability, an SRAM cell designed with SG pull-up and
AWDSG access and pull-down transistors shows the most promise, along with 22×
ILEAK reduction. Cell TR and TW are higher for the proposed MPA FinFET-based
SRAM cells. MPA FinFET-based SRAM cells incur zero area overhead. However,
the number of fabrication steps increases for MPA FinFETs depending on the type
and number of asymmetries incorporated in them.
Chapter 4
3-D Monolithic FinFET-based 8T
SRAM Cell Design for Enhanced
Read Time and Low Leakage
FinFETs have replaced planar MOSFETs due to their superior performance, power
efficiency, and scalability. However, even FinFETs are expected to reach their scaling
limits due to physical limits, process variations, and intolerable SCEs. As an alter-
native to scaling, 3-D ICs can increase the number of transistors per unit footprint
area. Among 3-D technologies, monolithic 3-D integration promises the highest den-
sity, performance, and power efficiency owing to its high-density MIVs. In a TLM
design, n- and p-type transistors are placed on different layers. Thus, it requires a
new 3-D cell library. In this chapter, we propose two new 3-D TLM 8T SRAM cells.
Both the proposed cells use pFinFET access transistors to achieve better area and
leakage power efficiency in 3-D. One of the proposed cells utilizes IG pFinFETs as
pull-up transistors whose back gates are tied to VDD for better writeability. This cell
has 28.1% and 43.8% smaller footprint area, 31.6% and 43.2% smaller ILEAK, and
53.2% and 29.0% lower TR compared with conventional 2-D 6T and 8T SRAM cells,
respectively [26].
4.1 Introduction
3-D ICs can improve performance, decrease power consumption by reducing inter-
connect length, and fit more devices in a chip without increasing its footprint area.
Monolithic 3-D ICs are particularly attractive as they exploit the third dimension
more efficiently, thanks to their small MIVs that connect the transistor layers. In a
TLM design, n-type and p-type transistors are fabricated on two separate layers. This
enables independent optimization of these layers. However, every logic and memory
cell in the cell library needs to be reimplemented in 3-D.
SRAM cells often occupy more than half of the die area and consume a significant
amount of leakage power. Design of high-density, high-performance, and low-power
TLM SRAM cells with good stability metrics is crucial for TLM designs to succeed.
With continued scaling in device dimensions and VDD, the impact of process varia-
tions on SRAM stability increases. A conventional 6T SRAM cell can be prone to
stability issues when the internal nodes are disturbed during a read operation. In
addition, it has low area efficiency when implemented in 3-D because it has four n-
type and two p-type transistors. The conventional 8T SRAM cell can improve read
stability by isolating data retention from the read operation [41]. However, similar
to the 6T SRAM cell, the conventional 3-D 8T SRAM cell is not area-efficient due
to the asymmetry in n-type and p-type transistor count (six n-type and two p-type
transistors). We make the following contributions in an effort to design area-efficient,
low-power, and high-performance 3-D SRAM cells:
1. We propose two new 3-D 8T SRAM cells (8T 4N4P 3D proposed1,
8T 4N4P 3D proposed2) for enhanced TR, low ILEAK, and high read stability:
• 8T 4N4P 3D proposed1 replaces nFinFET access transistors of a conven-
tional 8T SRAM cell with pFinFETs to reduce the footprint area while
preventing the TR from degrading. It has a high read stability, thanks
to the isolated read operation. However, it suffers from poor writeability
due to the use of weak pFinFET access transistors. This problem can be
alleviated through write-assist techniques.
• 8T 4N4P 3D proposed2, in addition to using pFinFETs as access transis-
tors, employs IG pFinFETs as pull-up transistors and ties their back gates
to VDD to improve writeability by weakening the pull-up transistors.
2. We comprehensively evaluate the proposed cells against previously reported
2-D and 3-D SRAM cells and show that they are particularly promising for
low-power and high read performance designs.
3. We explore assist techniques to improve the writeability of the proposed cells.
We compare the proposed cells with a conventional 6T SRAM cell imple-
mented in 2-D (6T 4N2P 2D) and 3-D (6T 4N2P 3D), a conventional 2-D 8T
SRAM cell (8T 6N2P 2D), and two previously reported 3-D 8T SRAM cells
(8T 4N4P 3D prior1, 8T 4N4P 3D prior2). 8T 4N4P 3D prior1 was constructed by
adding two pFinFET read access transistors to a conventional 6T SRAM cell [42].
pFinFET access transistors were used during the read operation to improve the read
stability. 8T 4N4P 3D prior2 was constructed by replacing n-type read transistors
of a conventional 8T SRAM cell with p-type transistors [40, 43].
We implement the SRAM cells using a 14nm SOI FinFET technology and char-
acterize them via 2-D mixed-mode device simulations. We compare the cells based
on their dc and transient metrics, such as RSNM, WM, IREAD, ILEAK, TR, and TW.
8T 4N4P 3D proposed2 offers the smallest TR and lowest ILEAK compared to other
cells, along with a high RSNM. It has 28.1%, 31.6%, and 53.2% reduction in footprint
area, ILEAK, and TR, respectively, compared to those of 6T 4N2P 2D. It has 43.8%,
43.2%, and 29.0% reduction in footprint area, ILEAK, and TR, respectively, compared
to those of 8T 6N2P 2D at the cost of 8.8% and 57.1% degradation in RSNM and
WM, respectively [26].
The rest of the chapter is organized as follows. Section 4.2 describes the simulation
setup. Section 4.3 describes the design of monolithic SRAM cells by detailing their
schematics, layouts, and bitline/wordline capacitances. Section 4.4 presents the sim-
ulation results and comparison of cells based on their dc and transient metrics. Sec-
tion 4.5 investigates the impact of process variations, memory array configurations,
assist techniques, different temperatures, and gate workfunction values on SRAM
cells. Section 4.6 discusses our results in comparison with prior work and presents
key observations. Section 4.7 presents the concluding remarks.
4.2 Simulation setup
The simulation flow is shown in Fig. 4.1. First, the 3-D FEOL + BEOL structure
of each SRAM cell is synthesized based on its layout and technology parameter val-
ues. Then, parasitic capacitances are extracted using the transport-analysis-based
3-D TCAD capacitance extraction technique [62]. Sentaurus Device Simulator [59]
is used to perform 2-D hydrodynamic mixed-mode device simulations to obtain dc
and transient metrics of SRAM cells. It uses the Phillips unified mobility model
and doping-dependent Shockley-Read-Hall recombination model, along with band-
to-band tunneling and avalanche multiplication models, for accurate simulation. For
transient simulations, we use a 256×256 memory array configuration, consisting of
256 rows (wordlines) and 256 columns (bitlines). Wordlines and bitlines are modeled
using the π3 distributed RC line model. In our simulations, we assume that the top-
and bottom-layer transistors have the same quality. The simulations are performed
at 300 K with a VDD of 0.8 V. A 100 nm thick SiO2 layer is used as the ILD
to eliminate the inter-layer coupling that may alter transistor behavior [71].

[Figure 4.1: Simulation flow for SRAM characterization. The SRAM layout, netlist, array configuration, and FinFET/technology parameter values feed the 3-D FEOL+BEOL structure synthesizer; 3-D TCAD capacitance extraction and the Sentaurus Device Simulator then produce the dc metrics (RSNM, WM, IREAD, and ILEAK) and the transient metrics (TR and TW).]
Dc and transient metrics that are defined in Section 1.1.3 are used to evaluate
the SRAM cells. Obtaining metric values of an 8T SRAM cell can differ from that
of a 6T SRAM cell. During the read operation of an 8T SRAM cell with nFinFETs
that provide a separate read path, the read bitline (RBL) and read word line (RWL)
are biased at VDD (VGND if read path transistors are pFinFETs) while the access
transistors are OFF. For these cells, IREAD is measured as the current drawn from
RBL when both read path FinFETs are ON. In the hold mode, wordlines (WL,
RWL) are at VGND if the access/read transistors are nFinFETs and at VDD if
the access/read transistors are pFinFETs.

[Figure 4.2: SRAM cell schematics: (a) 6T 4N2P 2D/6T 4N2P 3D, (b) 8T 6N2P 2D, (c) 8T 4N4P 3D prior1, (d) 8T 4N4P 3D prior2, (e) 8T 4N4P 3D proposed1, and (f) 8T 4N4P 3D proposed2.]

Lastly, in the case of SRAM cells with RBL, we assume that the sense amplifiers
are activated when the read bitline voltage
(VRBL) deviates by 100 mV from the initial point (∆VRBL = 100 mV). The rest of the
evaluation metric values are obtained in the same way for 6T and 8T SRAM cells.
4.3 Design of monolithic SRAM cells
We use the 14nm SOI FinFET technology to design the SRAM cells. Table 1.1 shows
the FinFET parameter values we use in our simulations.
4.3.1 Schematics of the SRAM cells
Fig. 4.2 shows the schematics of the SRAM cells we analyze. All SRAM cells use only
single-fin FinFETs to minimize area.
Fig. 4.2a shows the schematic of a conventional 6T SRAM cell. It consists of
a cross-coupled inverter pair (INV1:PU1-PD1, INV2:PU2-PD2) to store information
and two nFinFETs (AX1, AX2) to access the storage nodes. We evaluate both
the 2-D and 3-D implementations of the 6T SRAM cell. Although 6T 4N2P 3D is
implemented on two transistor layers, it has the same schematic as 6T 4N2P 2D.
6T 4N2P 2D is the baseline we use in our comparisons, unless otherwise specified.
We also assume internal nodes L and R store “1” and “0” initially.
Fig. 4.2b shows the 8T 6N2P 2D cell schematic [41]. 8T 6N2P 2D eliminates the
read-write conflict by decoupling the read operation from data retention. The read
operation is performed via a read path consisting of two nFinFETs (RD1, RD2) with-
out accessing the storage nodes directly. We only evaluate the 2-D implementation
of the conventional 8T SRAM cell since its 3-D implementation suffers from severe
footprint area inefficiency due to the asymmetry in the number of nFinFETs and
pFinFETs.
Fig. 4.2c shows the schematic of 8T 4N4P 3D prior1, a 3-D 8T SRAM cell ob-
tained by adding two p-type access transistors to a conventional 6T SRAM cell for
read stability enhancement [42]. Its 3-D implementation is balanced because each
transistor layer has four transistors. 8T 4N4P 3D prior1 alleviates the read-write
conflict. It uses weaker pFinFET access transistors during a read operation to in-
crease read stability and stronger nFinFET access transistors during a write operation
to increase writeability. However, the internal storage nodes still get disturbed dur-
ing a read operation and the read time is degraded due to the presence of the weak
pFinFET access transistors.
Fig. 4.2d shows the schematic of 8T 4N4P 3D prior2, a 3-D 8T SRAM cell, which
replaces the n-type read path transistors of a conventional 8T SRAM cell with p-type
transistors to reduce the footprint area in 3-D [40, 43]. It, however, may suffer from
a degradation in TR if the p-type transistors are slower than the n-type transistors,
which is the case in the 14nm FinFET technology.
Fig. 4.2e shows 8T 4N4P 3D proposed1. Unlike 8T 4N4P 3D prior2, it uses nFin-
FETs on the read path to keep TR small. However, it replaces nFinFET access
transistors with pFinFETs to reduce the footprint area by equalizing the number of
nFinFETs and pFinFETs.

[Figure 4.3: SRAM layouts: (a) 6T 4N2P 2D, (b) 6T 4N2P 3D, (c) 8T 6N2P 2D, (d) 8T 4N4P 3D prior1, (e) 8T 4N4P 3D prior2, (f) 8T 4N4P 3D proposed1, and (g) 8T 4N4P 3D proposed2.]

8T 4N4P 3D proposed1 suffers from poor writeability due
to weak pFinFET access transistors.
Fig. 4.2f shows the schematic of 8T 4N4P 3D proposed2. This cell, in addition
to employing pFinFET access transistors, uses IG pFinFET pull-up transistors with
their back gates biased at VDD to improve writeability. Weakening the pull-up tran-
sistors allows access transistors to write into the cell more easily. Among all analyzed
SRAM cells, it is the only cell that uses IG FinFETs.
4.3.2 Layouts of the SRAM cells
Fig. 4.3 shows the layouts of the SRAM cells. To minimize the footprint area,
we lay out every cell using only single-fin FinFETs. We use λ-based design rules
when obtaining the layouts. In the 14nm FinFET
technology, we use λ = 7 nm. We assume the interconnect width and pitch are 4λ
and 8λ, respectively. MIVs are assumed to have a 4λ diameter and 8λ pitch. For
3-D SRAM cells, we assume the p-layer is at the bottom since the n-layer often needs
more routing.
Table 4.1 shows the footprint area values of the SRAM cells. 6T 4N2P 3D has
43.9% smaller footprint area compared to 6T 4N2P 2D. 8T 6N2P 2D has 28.1% larger
footprint area with respect to the baseline. 8T 4N4P 3D prior1 has 15.8% smaller
footprint area compared to 6T 4N2P 2D. However, its footprint area is 17.1% larger
than the footprint area of other 3-D 8T SRAM cells. The cells 8T 4N4P 3D prior2,
8T 4N4P 3D proposed1, and 8T 4N4P 3D proposed2 have the same footprint area,
which is 28.1% and 43.8% smaller than those of 6T 4N2P 2D and 8T 6N2P 2D,
respectively. Although the footprint area decreases for 3-D SRAM cells, their total
silicon area is larger. For example, the proposed cells have 12.3% larger total silicon
area compared to 8T 6N2P 2D.
Table 4.1: SRAM cell footprint area
SRAM                  W (λ)  H (λ)  Footprint Area (µm2)  Normalized Area (1×)
6T 4N2P 2D            57     20     0.056                 1.00
6T 4N2P 3D            32     20     0.031                 0.56
8T 6N2P 2D            73     20     0.072                 1.28
8T 4N4P 3D prior1     40     24     0.047                 0.84
8T 4N4P 3D prior2     41     20     0.040                 0.72
8T 4N4P 3D proposed1  41     20     0.040                 0.72
8T 4N4P 3D proposed2  41     20     0.040                 0.72
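The footprint values in Table 4.1 follow directly from the λ-based cell dimensions with λ = 7 nm; a short sketch that reproduces the table (cell names abbreviated with underscores):

```python
LAMBDA_NM = 7  # lambda in the 14nm FinFET technology (lambda-based rules)

cells = {  # cell name: (width, height) in units of lambda, from Table 4.1
    "6T_4N2P_2D": (57, 20), "6T_4N2P_3D": (32, 20), "8T_6N2P_2D": (73, 20),
    "8T_4N4P_3D_prior1": (40, 24), "8T_4N4P_3D_prior2": (41, 20),
    "8T_4N4P_3D_proposed1": (41, 20), "8T_4N4P_3D_proposed2": (41, 20),
}

base_w, base_h = cells["6T_4N2P_2D"]
for name, (w, h) in cells.items():
    area_um2 = w * h * LAMBDA_NM**2 * 1e-6  # lambda^2 (nm^2) -> um^2
    norm = (w * h) / (base_w * base_h)      # normalized to the 2-D 6T cell
    print(f"{name}: {area_um2:.3f} um^2 ({norm:.2f}x)")
```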
4.3.3 Capacitance extraction
Table 4.2 shows the single-cell bitline (CBL, CBLB, and CRBL) and wordline capaci-
tances (CWL and CRWL) of each SRAM design, extracted using a transport-analysis-
based 3-D TCAD capacitance extraction technique [62]. Fig. 4.4 shows the FEOL
only and FEOL+BEOL structures for 6T 4N2P 2D. We use the industry-standard
thin-cell layout for 6T 4N2P 2D [72]. A long and thin cell layout leads to a smaller
CBL with respect to its CWL.

Table 4.2: SRAM bitline and wordline capacitances

SRAM                  CBL (aF)  CBLB (aF)  CRBL (aF)  CWL (aF)  CRWL (aF)
6T 4N2P 2D            77.34     77.05      N/A        143.11    N/A
6T 4N2P 3D            72.76     72.74      N/A        138.62    N/A
8T 6N2P 2D            80.21     75.13      66.63      171.91    60.08
8T 4N4P 3D prior1     88.14     87.87      N/A        119.44    119.74
8T 4N4P 3D prior2     62.60     64.50      61.00      140.70    90.43
8T 4N4P 3D proposed1  51.97     58.69      64.66      176.16    56.75
8T 4N4P 3D proposed2  60.99     58.65      64.66      176.00    56.75
[Figure 4.4: 6T 4N2P 2D cell: (a) FEOL only and (b) FEOL+BEOL. Dielectric regions are not shown.]
6T 4N2P 3D has 5.9% and 3.1% smaller CBL and CWL compared to 6T 4N2P 2D.
Due to the asymmetry of the 8T 6N2P 2D layout, CBL and CBLB are different. We
use CBLavg to denote the average bitline capacitance (CBLavg = CBL/2 + CBLB/2).
8T 6N2P 2D has 0.6% larger CBLavg compared to the baseline. 8T 6N2P 2D CWL
is 20.1% larger due to its 28.1% longer cell width compared to 6T 4N2P 2D. Its
CRWL is much smaller than its CWL because RWL is connected to a single transistor.

[Figure 4.5: 8T 4N4P 3D proposed2 cell p-layer: (a) FEOL only and (b) FEOL+BEOL. Dielectric regions are not shown.]

8T 4N4P 3D prior1 has 14.0% larger CBLavg because each bitline is connected
to two access transistors (one nFinFET for write and one pFinFET for read oper-
ation) and the cell height is 20.0% larger relative to that of 6T 4N2P 2D. Its CWL
is 16.5% smaller than that of 6T 4N2P 2D since the cell width that determines the
wordline length is smaller for 8T 4N4P 3D prior1. 8T 4N4P 3D prior2 has 17.7% and
1.7% reduction in CBLavg and CWL, respectively, compared to those of 6T 4N2P 2D.
8T 4N4P 3D proposed1 has 28.3% smaller CBLavg and 23.1% larger CWL relative to
the baseline. Figs. 4.5-4.6 show the FEOL and FEOL+BEOL structures for the
p-layer and n-layer of 8T 4N4P 3D proposed2. 8T 4N4P 3D proposed2 has 22.5%
smaller CBLavg and 23.0% larger CWL with respect to 6T 4N2P 2D. Its CWL is larger
because WL is connected to both the p-layer and n-layer, as shown in Figs. 4.5-4.6.
CRWL is the smallest for the proposed cells, which helps improve their TR. It is 60.3%
and 5.5% smaller compared to CWL of 6T 4N2P 2D and CRWL of 8T 6N2P 2D, re-
spectively.
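The CBLavg comparisons above can be reproduced directly from the Table 4.2 values; a brief sketch (cell names abbreviated with underscores):

```python
# Average bitline capacitance and % difference vs. the 6T_4N2P_2D baseline,
# using the extracted capacitances (in aF) from Table 4.2.
caps = {  # cell: (CBL, CBLB)
    "6T_4N2P_2D": (77.34, 77.05),
    "8T_6N2P_2D": (80.21, 75.13),
    "8T_4N4P_3D_prior1": (88.14, 87.87),
    "8T_4N4P_3D_prior2": (62.60, 64.50),
    "8T_4N4P_3D_proposed1": (51.97, 58.69),
    "8T_4N4P_3D_proposed2": (60.99, 58.65),
}

base = sum(caps["6T_4N2P_2D"]) / 2            # CBLavg of the baseline
for name, (cbl, cblb) in caps.items():
    cblavg = (cbl + cblb) / 2                 # CBLavg = CBL/2 + CBLB/2
    delta = 100 * (cblavg - base) / base      # + means larger than baseline
    print(f"{name}: CBLavg = {cblavg:.2f} aF ({delta:+.1f}%)")
```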
[Figure 4.6: 8T 4N4P 3D proposed2 cell n-layer: (a) FEOL only and (b) FEOL+BEOL. Dielectric regions are not shown.]
4.4 Simulation results
Table 4.3 shows the dc (RSNM, WM, IREAD, and ILEAK) and transient (TR and TW)
simulation results. For transient simulations, we assume a 256×256 memory array.

Table 4.3: SRAM dc and transient metric values

SRAM                  RSNM (mV)  WM (mV)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
6T 4N2P 2D            23.69      252.21   149.32      3.57        52.43    55.87
6T 4N2P 3D            23.69      252.21   149.32      3.57        45.55    49.05
8T 6N2P 2D            264.25     252.10   149.32      4.30        34.56    75.41
8T 4N4P 3D prior1     140.28     252.26   121.06      3.65        69.85    41.74
8T 4N4P 3D prior2     264.27     252.26   58.30       3.45        57.57    47.61
8T 4N4P 3D proposed1  264.00     25.58    149.32      2.55        24.53    N/A
8T 4N4P 3D proposed2  241.00     108.11   149.32      2.44        24.53    104.27

4.4.1 SRAM dc metric analysis

In this section, we analyze the dc metrics of the different SRAM cells.

RSNM

6T SRAM cells have the worst read stability because the access and pull-down
transistors are both low-Vth FinFETs. 8T 6N2P 2D, 8T 4N4P 3D prior2, and
8T 4N4P 3D proposed1 have the highest RSNM because the read operation is
isolated from data retention. The storage nodes are not disturbed during a read
operation. 8T 4N4P 3D prior1 has a 116.59 mV increase in RSNM compared to
that of 6T 4N2P 2D owing to its weaker pFinFET read access transistors. The
voltage at the node storing a “0” (VR) is determined by the voltage divider formed
by AX2 and PD2 resistances. If VR rises above the trip voltage of INV1, the cell
value can flip. Thus, a lower VR is desired during a read operation. A weaker access
transistor has larger resistance, which leads to a smaller voltage rise at R. Therefore,
8T 4N4P 3D prior1 has much better read stability than 6T 4N2P 2D. However, its
RSNM is 46.9% smaller than that of 8T 6N2P 2D because the storage nodes are still
disturbed during the read operation. 8T 4N4P 3D proposed2 has 8.8% lower RSNM
than 8T 6N2P 2D due to the impact of the weaker pull-up transistors on the VTC
of the cross-coupled inverters.
WM
All SRAM cells, except the proposed cells, have a WM value around 252 mV.
8T 4N4P 3D proposed1 suffers significantly from writeability with a WM value of
only 25.58 mV. 8T 4N4P 3D proposed2 utilizes IG-type pFinFETs as pull-up tran-
sistors, whose back gates are biased at VDD to improve writeability. During a write
operation, AX1 tries to discharge storage node L while PU1 initially charges it. Thus,
a weakened PU1 allows AX1 to discharge L more easily during a write operation.
8T 4N4P 3D proposed2 still has 57.1% smaller WM compared to SRAM cells with
nFinFET access transistors. The writeability of the proposed cells can be improved
using write-assist techniques that are explored in Section 4.5.
IREAD
6T SRAM cells, 8T 6N2P 2D, and our 8T proposed cells have the same and highest
IREAD value of 149.32 µA because they all have two nFinFETs on the read path.
However, that does not result in an equal TR since bitline/wordline capacitances vary
among cells. 8T 4N4P 3D prior1 has 18.9% smaller IREAD compared to 6T 4N2P 2D
due to weaker pFinFET access transistors. 8T 4N4P 3D prior2 has the worst IREAD
since the read path consists of two pFinFETs. Its IREAD is 61.0% smaller than that
of 6T 4N2P 2D.
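The percentage comparisons used throughout this section reduce to a one-line helper applied to the Table 4.3 values; for example, for IREAD:

```python
def pct_change(value, baseline):
    """Percent difference of `value` relative to `baseline` (negative = smaller)."""
    return 100 * (value - baseline) / baseline

# IREAD values (in uA) from Table 4.3; baseline is 6T_4N2P_2D.
iread_base = 149.32
print(f"{pct_change(121.06, iread_base):.1f}%")  # 8T_4N4P_3D_prior1: -18.9%
print(f"{pct_change(58.30, iread_base):.1f}%")   # 8T_4N4P_3D_prior2: -61.0%
```

The same helper reproduces the TR and TW comparisons in Section 4.4.2 (e.g., a TR of 24.53 ps versus the 52.43 ps baseline gives the quoted 53.2% reduction).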
ILEAK
SRAM cells consume significant leakage energy as they are in the standby mode
most of the time. Thus, it is of utmost importance to design low-leakage SRAM
cells. 8T 6N2P 2D has the highest ILEAK among all cells. It has 20.6% higher ILEAK
compared to 6T 4N2P 2D due to two additional nFinFETs (RD1, RD2) on the iso-
lated read path. Despite the inclusion of two additional pFinFET access transistors,
8T 4N4P 3D prior1 has only 2.3% higher ILEAK compared to 6T 4N2P 2D because a
pFinFET has around 20× smaller ILEAK than an nFinFET in the technology we use.
Although 8T 4N4P 3D prior2 has two more pFinFETs, its ILEAK is surprisingly 3.2%
smaller compared to that of 6T 4N2P 2D. The reason is that, in the standby mode,
VR is slightly lower in 8T 4N4P 3D prior2 compared to that of 6T 4N2P 2D, which
reduces the ILEAK of PD1. 8T 4N4P 3D proposed1 has 28.6% and 40.8% smaller
ILEAK compared to 6T 4N2P 2D and 8T 6N2P 2D, respectively, due to the replace-
ment of the nFinFET access transistors with pFinFETs. 8T 4N4P 3D proposed2 has
the smallest ILEAK among all cells. In addition to including pFinFET access transis-
tors, biasing the back gate of pFinFET pull-up transistors at VDD also reduces ILEAK.
It has 31.6% and 43.2% smaller ILEAK compared to 6T 4N2P 2D and 8T 6N2P 2D,
respectively.
4.4.2 SRAM transient metric analysis
In this section, we analyze the transient metrics (TR and TW) of the different SRAM
cells. TR and TW do not represent read/write delays of the whole memory as we
do not consider the delays of the peripheral circuitry. They only consider wordline
and bitline delays, which are the only delay components affected by memory cell
design. TR and TW are useful to compare the performance of the cells and understand
how the wordline and bitline capacitances and the transistor strength affect SRAM
performance.
TR
TR depends strongly on IREAD and the bitline/wordline capacitances. Despite
their equal IREAD, 6T 4N2P 3D has 13.1% smaller TR due to a smaller CBL and
CWL compared to 6T 4N2P 2D. 8T 6N2P 2D has 34.1% smaller TR because it
has smaller CRBL and CRWL compared to CBL and CWL of 6T 4N2P 2D, respec-
tively. 8T 4N4P 3D prior1 suffers from a higher TR because its CBL is high and
the access transistors are pFinFETs, which leads to a small IREAD. Thus, its TR is
33.2% higher compared to that of 6T 4N2P 2D. Despite its 61.0% smaller IREAD,
8T 4N4P 3D prior2 has only 9.8% higher TR due to its small CRBL and CRWL. The
two proposed cells have the smallest TR owing both to their high IREAD and small
CRBL and CRWL. Their TR is 53.2% smaller compared to that of 6T 4N2P 2D. The
proposed cells have 57.4% smaller TR compared to 8T 4N4P 3D prior2 in which the
read path transistors are pFinFETs.
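To see why high IREAD and small bitline capacitance translate into a small TR, a first-order estimate treats the read as discharging the total bitline capacitance by ∆VRBL at a constant IREAD; this sketch ignores wordline RC delay and transistor nonlinearity, so it only lower-bounds the simulated TR:

```python
# First-order bitline-discharge estimate: t ~= (N_rows * C_cell) * dV / IREAD.
# This neglects the wordline RC delay and the nonlinear discharge, so it
# underestimates the simulated TR; it still captures the C/I scaling trend.
N_ROWS = 256   # cells per bitline in the 256x256 array
DV = 0.1       # sense-amplifier activation threshold, 100 mV

def discharge_time_ps(c_cell_aF, iread_uA):
    c_total = N_ROWS * c_cell_aF * 1e-18          # total bitline capacitance (F)
    return c_total * DV / (iread_uA * 1e-6) * 1e12

# 6T_4N2P_2D: CBL = 77.34 aF/cell, IREAD = 149.32 uA
print(round(discharge_time_ps(77.34, 149.32), 1))  # ~13 ps (simulated TR: 52.43 ps)
```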
TW
6T 4N2P 3D has 12.2% smaller TW compared to the baseline due to its smaller
CWL. 8T 6N2P 2D has a large CWL, which leads to a 35.0% higher TW.
8T 4N4P 3D prior1 has 25.3% smaller TW compared to 6T 4N2P 2D due to its
smaller CWL. 8T 4N4P 3D prior2 has 14.8% smaller TW owing to its small cell width,
leading to a smaller CWL compared to that of the baseline. 8T 4N4P 3D proposed1 is
unable to complete the write operation successfully because the pFinFET access
transistors cannot overpower the equally strong pFinFET pull-up transistors.
8T 4N4P 3D proposed1 can write the cell with the use of write-assist techniques
or stronger pFinFETs, which are explored in Section 4.5. 8T 4N4P 3D proposed2
ties the back gates of the pFinFET pull-up transistors to VDD to weaken them with
respect to access transistors and improve writeability. Despite the 82.53 mV improve-
ment in WM compared to that of 8T 4N4P 3D proposed1, 8T 4N4P 3D proposed2
still has 86.6% and 38.3% higher TW compared to 6T 4N2P 2D and 8T 6N2P 2D,
respectively.
Overall, the proposed cells have the highest IREAD and smallest ILEAK and
TR among all cells. They offer a high RSNM by isolating the read operation
from data retention. 8T 4N4P 3D proposed2 has slightly worse read stability
compared to 8T 4N4P 3D proposed1 due to the use of back-gate bias on pull-
up transistors. However, the proposed cells suffer from inferior writeability.
8T 4N4P 3D proposed1 is unable to even complete the write operation successfully,
whereas 8T 4N4P 3D proposed2 takes 1.9× the write time of 6T 4N2P 2D. The
poor writeability of the proposed
cells can be addressed through write-assist techniques, changing of the memory array
configuration, or modifying the strength of the FinFETs.
Table 4.4: Process variations

Parameter (unit)   Nominal Value   Range [−3σ, 3σ]
LG (nm)            16              [14.4, 17.6]
TSI (nm)           8               [7.2, 8.8]
TOX (nm)           0.9             [0.81, 0.99]
ΦGN (eV)           4.4             [4.38, 4.42]
ΦGP (eV)           4.8             [4.78, 4.82]
4.5 Impact of process variations, memory array
configurations, assist techniques, different
temperatures, and gate workfunction values
In this section, we explore the impact of process variations, memory array configura-
tions, assist techniques, different temperatures, and gate workfunction values on the
SRAM cells to analyze the trade-offs involved among different design metrics across
different SRAM cells.
4.5.1 SRAM cell analysis under process variations
The impact of process variations on circuits increases with continued scaling in device
dimensions and VDD. SRAM cells are particularly prone to process variations because
they are generally constructed using minimum-sized transistors to minimize area. We
analyze the impact of variations in physical parameters, such as LG, TSI , and TOX ,
that have been shown to impact SRAM performance the most [65]. We also investigate
variations in Vth by modeling it with gate workfunction variations. Table 4.4 shows
the nominal value and [−3σ, 3σ] variation range of these parameters. We assume that
the physical parameters have a normal distribution and a 3σ/µ = 10% variation [66].
We generate 100 sample points using the Sobol sequence for quasi-Monte Carlo
simulations [73], which need dramatically fewer sample points to achieve accuracy
close to that of standard Monte Carlo simulations. Process variations in the n-layer and p-layer are
independent of each other because the transistor layers are processed sequentially.
Thus, we generate two sets of sample points, one for each transistor layer in the case
of 3-D SRAM cells. Table 4.5 shows the mean (µ) and standard deviation (σ) of each
metric's distribution obtained from the process variation simulations.

Table 4.5: Distribution characteristics of SRAM dc and transient metrics

SRAM                  RSNM (mV)     WM (mV)       IREAD (µA)    ILEAK (nA)   TR (ps)      TW (ps)
                      µ      σ      µ      σ      µ      σ      µ     σ      µ     σ     µ      σ
6T 4N2P 2D            25.87  4.92   265.01 1.70   182.16 4.66   3.71  1.10   52.51 1.09  55.81  0.78
6T 4N2P 3D            25.75  5.14   265.05 3.33   182.43 4.61   3.76  1.11   45.62 1.06  48.80  1.11
8T 6N2P 2D            268.49 3.72   264.95 1.70   182.14 4.63   4.46  1.30   34.64 1.01  76.04  1.04
8T 4N4P 3D prior1     122.76 3.75   265.09 3.30   137.03 3.15   3.80  1.04   69.90 1.25  40.90  0.90
8T 4N4P 3D prior2     268.58 3.62   265.19 3.32   64.61  1.88   3.60  1.06   57.70 1.23  47.75  1.12
8T 4N4P 3D proposed1  268.33 3.76   15.65  5.28   182.39 4.70   2.68  0.77   24.61 0.98  N/A    N/A
8T 4N4P 3D proposed2  240.22 5.96   101.77 5.57   182.46 4.47   2.57  0.74   24.60 0.98  103.97 3.13
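The sampling step above can be sketched with SciPy's quasi-Monte Carlo module. The σ values follow from the [−3σ, 3σ] ranges in Table 4.4; the use of `scipy.stats.qmc`, the seed, and the parameter ordering are implementation choices of this sketch, not of the original experiments.

```python
# Quasi-Monte Carlo sampling of the five process parameters in Table 4.4.
# A Sobol low-discrepancy sequence is mapped to independent normal
# distributions through the inverse CDF; sigma is derived from the
# [-3*sigma, 3*sigma] ranges listed in the table.
import numpy as np
from scipy.stats import qmc, norm

mu = np.array([16.0, 8.0, 0.9, 4.4, 4.8])        # LG, TSI, TOX, PhiGN, PhiGP
sigma = np.array([1.6, 0.8, 0.09, 0.02, 0.02]) / 3.0

sampler = qmc.Sobol(d=5, scramble=True, seed=7)  # seed is arbitrary
u = sampler.random(100)                          # 100 points in [0, 1)^5
samples = mu + sigma * norm.ppf(u)               # shape (100, 5)

# For 3-D cells, the n-layer and p-layer are processed sequentially, so a
# second, independent set of points would be drawn for the other layer.
```

Scrambled Sobol points fill the parameter space far more evenly than pseudo-random draws, which is why 100 points suffice here where plain Monte Carlo would need many more.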
8T 4N4P 3D prior2 has the lowest σ for RSNM among all SRAM cells.
8T 4N4P 3D proposed2 has the largest σ, likely due to its weaker pull-up
transistors.
8T 4N4P 3D proposed2 has the highest σ for WM as well. However, σ/µ is the
largest for 8T 4N4P 3D proposed1. The proposed cells suffer from WM variations
the most due to their weak pFinFET access transistors.
8T 4N4P 3D prior2 has the worst σ/µ ratio for IREAD due to its small IREAD.
The IREAD σ/µ ratio is similar for the other cells.
ILEAK is affected the most by process variations as it has an exponential depen-
dence upon Vth. This leads to an average 4.6% higher µ compared to the nominal
ILEAK values shown in Table 4.3. The proposed cells have the smallest µ and σ values
for ILEAK. 8T 4N4P 3D prior2 has the smallest ILEAK σ/µ ratio.
The proposed cells have the smallest µ and σ values for TR. However, the σ/µ
ratio is also high for the proposed cells due to the small TR.
8T 4N4P 3D proposed2 suffers the most in TW. It has the highest σ and σ/µ ratio
because its pull-up transistors are the weakest and hence the most prone to variations.
Overall, the writeability of 8T 4N4P 3D proposed2 is prone to process variations
the most because it has weaker pull-up transistors. Relatively higher variations in
RSNM for 8T 4N4P 3D proposed2 should not be a major issue since its RSNM is
high enough.
4.5.2 SRAM cell analysis under different memory array con-
figurations
We explore the impact of array configurations on SRAM transient metrics since TR
and TW are determined by the bitline and wordline capacitances. We explore four
different array configurations: 256×128, 512×128, 256×256, and 512×256. The num-
bers represent the number of wordlines and bitlines, respectively. The 256×256 case
is the baseline.
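The dependence of TR on the array configuration can be captured by a first-order model in which the bitline capacitance scales with the number of rows (wordlines) and the wordline capacitance with the number of columns (bitlines). The per-cell loads, wordline resistance, and sense margin below are hypothetical placeholders; the sketch reproduces the qualitative trends, not our simulated values.

```python
# First-order scaling of T_R with array configuration: the bitline load
# grows with the number of rows, the wordline load with the number of
# columns. All device/wire numbers below are assumed for illustration.

C_BL_PER_ROW_fF = 0.10    # bitline capacitance added per row (assumed)
C_WL_PER_COL_fF = 0.15    # wordline capacitance added per column (assumed)
R_WL_OHM = 100.0          # lumped wordline resistance (assumed)

def t_read_ps(rows, cols, i_read_uA=180.0, dv_sense_mV=100.0):
    """Approximate T_R (ps): wordline RC delay + bitline discharge time."""
    t_wl = 0.69 * R_WL_OHM * cols * C_WL_PER_COL_fF * 1e-3   # fs -> ps
    t_bl = rows * C_BL_PER_ROW_fF * dv_sense_mV / i_read_uA  # C*dV/I, ps
    return t_wl + t_bl

for rows, cols in [(256, 128), (512, 128), (256, 256), (512, 256)]:
    print(f"{rows}x{cols}: {t_read_ps(rows, cols):.1f} ps")
```

The model makes the trade-off explicit: doubling the number of wordlines lengthens the bitline discharge term, while doubling the number of bitlines lengthens only the wordline term, which is why cells with small CRWL gain more in wide arrays.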
Fig. 4.7 shows TR under different array configurations. As expected, the proposed
cells have the smallest TR values under all configurations owing to their small CRBL
and CRWL. The advantage of the proposed cells grows for arrays with longer
wordlines. 8T 4N4P 3D prior1 has smaller TR than 8T 4N4P 3D prior2 when the
number of bitlines is 128 since 8T 4N4P 3D prior1 has a higher IREAD. As the word-
line length increases, its impact on TR increases. Thus, 8T 4N4P 3D prior2 has a
smaller TR than that of 8T 4N4P 3D prior1 when the number of bitlines is 256 due
to its smaller CRWL.
Fig. 4.8 shows TW under different array configurations. 8T 4N4P 3D proposed2
has a high CWL, which dominates TW. TW can be reduced for 8T 4N4P 3D proposed2
by 41.1% if the number of bitlines is halved. However, this may come at the expense of
other costs, such as inefficient utilization of memory area. As in the case of TR, some
cells may be better than others in terms of TW under specific array configurations.
For example, 8T 4N4P 3D prior2 has a smaller TW compared to that of 6T 4N2P 3D
for the 256×256 and 512×256 cases, whereas 6T 4N2P 3D has a smaller TW when
the number of bitlines is 128. 8T 4N4P 3D prior2 has a smaller wordline resistance
since its wordline is implemented on a wider metal than that of 6T 4N2P 3D. Thus,
its TW gets better relative to that of 6T 4N2P 3D as the number of bitlines increases,
despite its higher CWL.

Figure 4.7: TR (ps) under different array configurations (WL×BL). The TR values of
the proposed cells overlap as they are almost the same.

Figure 4.8: TW (ps) under different array configurations (WL×BL).
Overall, the results show that the memory array configuration can be modified to
improve transient metrics, although it may come at the expense of other costs not
explored in this work. The TW of 8T 4N4P 3D proposed2 can be improved by using
a memory array configuration with smaller wordline lengths. In addition, the impact
of modifying the memory array configuration can vary for different SRAM cells since
their bitline/wordline capacitances and resistances are different.
4.5.3 SRAM cell analysis under assist techniques
Next, we explore the impact of read/write assist techniques on the stability and
transient metrics of the SRAM cells. The read-assist techniques we explore are cell-
VDD boosting (VDD+), WL lowering (VWL-), negative BL (VBL-), and cell-GND
lowering (VGND-), and the write-assist techniques are cell-VDD lowering (VDD-), WL
boosting (VWL+), positive BL (VBL+), and cell-GND boosting (VGND+). In addition,
we investigate the impact of RWL boosting (VRWL+) and positive RBL (VRBL+) on
the read performance of the 8T SRAM cells. For each assist technique, we assume
a 0.3 V change in the target voltage level. For example, VBL+ assumes that bitline
voltages are precharged to 1.1 V. We report the most effective assist technique for each
cell for improving a specific SRAM metric. The results show that assist techniques
can help improve the writeability of the proposed cells.
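The assist techniques can be summarized compactly as ±0.3 V shifts on a single bias node. The sketch below assumes a nominal VDD of 0.8 V (consistent with VBL+ precharging the bitlines to 1.1 V) and treats every assist uniformly as one node shift, which is a simplification; for example, the negative-bitline technique actually drives the low bitline below ground during the write.

```python
# Assist techniques modeled as +/-0.3 V shifts on a bias node.
# Nominal VDD is assumed to be 0.8 V; bitlines are precharged to VDD,
# and RBL starts at 0 V for a pFinFET read path (both assumptions).
DELTA = 0.3
NOMINAL = {
    "cell_vdd": 0.8, "wl": 0.8, "bl": 0.8,
    "cell_gnd": 0.0, "rwl": 0.8, "rbl": 0.0,
}

ASSISTS = {
    "VDD+":  ("cell_vdd", +DELTA), "VDD-":  ("cell_vdd", -DELTA),
    "VWL+":  ("wl", +DELTA),       "VWL-":  ("wl", -DELTA),
    "VBL+":  ("bl", +DELTA),       "VBL-":  ("bl", -DELTA),
    "VGND+": ("cell_gnd", +DELTA), "VGND-": ("cell_gnd", -DELTA),
    "VRWL+": ("rwl", +DELTA),      "VRBL+": ("rbl", +DELTA),
}

def biased(assist):
    """Return a copy of the nominal bias point with one assist applied."""
    node, delta = ASSISTS[assist]
    point = dict(NOMINAL)
    point[node] += delta
    return point
```

For example, `biased("VBL+")` yields a bitline precharge of 1.1 V, matching the VBL+ case described above.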
Table 4.6 shows the most effective assist techniques for read stability and writeabil-
ity. It shows the RSNM and WM values after the assist technique is applied, along
with the improvement in RSNM and WM relative to the nominal case that does
not employ assist. Assist techniques help the RSNM of the 6T SRAM cells the most
since they have a small RSNM initially. Reducing VWL weakens the access transistors,
which alleviates the disturbance on the internal storage nodes. For
8T 4N4P 3D prior1, VRWL+ increases the read stability because the access
transistors are pFinFETs and hence are weakened with increasing VRWL. The
RSNM improvement for 8T SRAM cells with separate read circuitry is smaller
since they already have a high RSNM. 8T 4N4P 3D proposed2 achieves its best
RSNM when assisted with VGND-.

Table 4.6: Impact of assist techniques on read stability and writeability

SRAM                  Assist  RSNM (mV)  ∆RSNM (mV)  Assist  WM (mV)  ∆WM (mV)
6T 4N2P 2D            VWL-    176.22     152.53      VGND+   405.97   153.76
6T 4N2P 3D            VWL-    176.22     152.53      VGND+   405.97   153.76
8T 6N2P 2D            VDD+    292.92     28.67       VGND+   406.45   154.35
8T 4N4P 3D prior1     VRWL+   235.45     95.16       VGND+   405.49   153.23
8T 4N4P 3D prior2     VDD+    292.85     28.58       VGND+   405.49   153.23
8T 4N4P 3D proposed1  VDD+    292.82     28.82       VGND+   229.63   204.05
8T 4N4P 3D proposed2  VGND-   277.82     36.82       VGND+   322.73   214.62
For all cells, WM improves the most with VGND+. Increasing VGND raises the
voltage at R, which is connected to the gate of PU1. A higher voltage at the gate
weakens PU1 and allows AX1 to discharge L and complete the write operation more
easily. The proposed cells have more than 200 mV improvement in WM with VGND+.
In addition, 8T 4N4P 3D proposed1 now performs the write operation successfully
with the help of the VGND+ assist technique.
Table 4.7 shows the best assist techniques for IREAD, TR, and TW. For all cells,
except 8T 4N4P 3D prior1 and 8T 4N4P 3D prior2, IREAD increases the most with
VGND- due to the increase in voltage difference between the bitline to be discharged
and VGND. For 8T 4N4P 3D prior1, VGND- increases the drain-to-source voltage of
AX4, which already operates in the velocity saturation mode. Thus, IREAD does
not increase significantly. VBL+ increases IREAD for 8T 4N4P 3D prior1 more than
VGND- does, since it increases the gate-to-source voltage (VGS) of AX4. The increases
in IREAD for 8T 4N4P 3D prior1 and 8T 4N4P 3D prior2 are smaller compared to
other cells as they use pFinFETs to discharge the bitlines during the read operation.
For 8T 4N4P 3D prior2, IREAD increases the most with VDD+ since the pFinFET
read path transistors try to charge RBL from 0 to VDD.
Table 4.7: Impact of assist techniques on read current and transient metrics
SRAM                  Assist  IREAD (µA)  ∆IREAD (µA)  Assist  TR (ps)  ∆TR (ps)  Assist  TW (ps)  ∆TW (ps)
6T 4N2P 2D            VGND-   311.61      128.35       VGND-   24.97    -27.46    VDD-    33.97    -21.90
6T 4N2P 3D            VGND-   311.61      128.35       VGND-   20.73    -24.82    VDD-    26.71    -22.34
8T 6N2P 2D            VGND-   311.67      128.38       VGND-   13.73    -20.83    VDD-    50.46    -24.95
8T 4N4P 3D prior1     VBL+    216.14      24.41        VRWL+   43.56    -26.28    VDD-    20.73    -21.01
8T 4N4P 3D prior2     VDD+    119.45      54.56        VDD+    31.83    -25.74    VDD-    27.08    -20.53
8T 4N4P 3D proposed1  VGND-   311.67      128.38       VGND-   6.96     -17.57    VGND+   57.42    N/A
8T 4N4P 3D proposed2  VGND-   311.67      128.38       VGND-   6.97     -17.56    VGND+   52.14    -52.13
TR is improved the most by VGND- for all cells except 8T 4N4P 3D prior1 and
8T 4N4P 3D prior2, similar to IREAD. The proposed cells maintain their advantage
compared to other SRAM cells in terms of TR. For 8T 4N4P 3D prior1, TR improves
more with VRWL+ despite the higher IREAD with VBL+, which shows that IREAD may
not capture the transient behavior of the read operation and fail to represent TR
accurately. Thus, transient simulations are necessary when determining the perfor-
mance of SRAM cells. For 8T 4N4P 3D prior2, TR decreases the most with VDD+
since the read path transistors are pFinFETs.
TW improves the most with VDD- for all cells except the proposed cells. Re-
duced cell-VDD decreases the VGS of PU1 and leads to a smaller current from VDD
to L through PU1. Thus, TW decreases since PU1 is weakened with respect to
AX1. For the proposed cells, VGND+ is the best assist technique for reducing TW.
Increasing VGND also reduces the current flowing through PU1 and enables AX1
to discharge L more easily. With the VGND+ assist technique, TW is halved for
8T 4N4P 3D proposed2 relative to no assist. 8T 4N4P 3D proposed1 is able to per-
form the write operation successfully with VGND+.
Overall, the assist techniques are shown to be effective at improving noise margins
and reducing read/write times. However, the assist techniques often come at the cost
of degrading other SRAM metrics, increasing power consumption, or introducing an
area overhead as they may require extra circuitry, such as level shifters or voltage
generators. For example, although VWL- increases the RSNM of 6T 4N2P 2D, it
reduces IREAD by 50.3%, which leads to a 20.2% increase in TR.
4.5.4 SRAM cell analysis under different temperature values
The operating temperature of a microprocessor can go up to 90°C [64]. FinFET IOFF
increases exponentially with increasing temperature, whereas its ION does not change
much [14]. Thus, the impact of temperature on SRAM cell metrics is not prominent
except on leakage current. Fig. 4.9 shows ILEAK under different temperature values.
8T 6N2P 2D always has the highest ILEAK among all cells. Although the proposed
cells have the smallest leakage across different temperature values, their leakage
reduction as a percentage becomes smaller with increasing temperature. At 370 K,
8T 4N4P 3D proposed2 has 24.7% (31.6% at 300 K) and 37.3% (43.2% at 300 K)
smaller ILEAK compared to 6T 4N2P 2D and 8T 6N2P 2D, respectively.
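The exponential temperature dependence can be illustrated with a simple subthreshold leakage model, IOFF ∝ T² · exp(−q·Vth/(n·k·T)). The threshold voltage and ideality factor below are assumed values for illustration, not extracted from our devices.

```python
import math

# Subthreshold leakage grows roughly exponentially with temperature,
# while the on-current changes little. Constants below are illustrative.
Q_OVER_K = 11604.5   # q/k in K/V
VTH = 0.25           # assumed threshold voltage (V)
N = 1.5              # assumed subthreshold ideality factor

def i_off_rel(T):
    """Leakage at temperature T (K), relative to its 300 K value."""
    def raw(t):
        return t * t * math.exp(-Q_OVER_K * VTH / (N * t))
    return raw(T) / raw(300.0)

print(f"I_OFF(370 K) / I_OFF(300 K) = {i_off_rel(370.0):.2f}")
```

With these assumed constants the leakage grows several-fold between 300 K and 370 K, which is why the percentage advantage of the lowest-leakage cells shrinks at higher temperatures: the exponential term dominates every cell's leakage.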
Figure 4.9: ILEAK (nA) under different temperature values (300 K to 370 K).
4.5.5 SRAM cell analysis under different gate workfunction
values
We analyze SRAM cells under different workfunction values to determine whether the
proposed cells are promising when targeted at various design objectives, such as high
stability, high performance, or low leakage. We choose four gate workfunction values
for nFinFETs (ΦGN = 4.3, 4.4, 4.5, and 4.6 eV) and four for pFinFETs (ΦGP = 4.6,
4.7, 4.8, and 4.9 eV). We simulate only four workfunction values for each FinFET
type since our aim is not to optimize the cells further but, rather, to understand
the trade-offs among various design objectives across different SRAM cells. We will
assume that the ΦGN = 4.4 eV and ΦGP = 4.8 eV case is the baseline. The Vth of an
nFinFET increases with increasing ΦGN . This leads to a smaller ION and IOFF . On
the other hand, the Vth of a pFinFET decreases with an increasing ΦGP , leading to
a higher ION and IOFF . We use the objective function in Eq. (4.1) to determine the
best workfunction pairs for different design objectives. We exclude IREAD from the
objective function since we already have TR that represents read performance.
Objective function = (RSNM × WM)^α / ((ILEAK + c)^β × (TR × TW)^γ)        (4.1)
The exponents are determined based on the targeted design objective after a
careful exploration of the value space. ILEAK has an exponential dependence upon
Vth, which changes linearly with the gate workfunction. Thus, the range of ILEAK is
very wide under the analyzed gate workfunction values. We use constant c to prevent
ILEAK from dominating the objective function value. We use the following exponents
and c value to determine the best gate workfunction pair for maximizing the objective
function for the corresponding design objective.
• α = 3, β = 1, γ = 1, and c = 1 for high stability,
• α = 1, β = 1, γ = 3, and c = 10 for high performance,
• α = 1, β = 1, γ = 1, and c = 0 for low leakage, and
• α = 3, β = 1, γ = 3, and c = 1 for overall quality.
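Eq. (4.1) and the exponent presets above can be evaluated directly. The sketch below uses the 8T 4N4P 3D proposed2 metrics at the ΦGN = 4.6 eV, ΦGP = 4.8 eV corner from Table 4.8; the function and dictionary names are our own illustrative choices.

```python
# Eq. (4.1) with the four exponent presets (alpha, beta, gamma, c).
PRESETS = {
    "high_stability":   (3, 1, 1, 1),
    "high_performance": (1, 1, 3, 10),
    "low_leakage":      (1, 1, 1, 0),
    "overall_quality":  (3, 1, 3, 1),
}

def objective(rsnm, wm, i_leak, tr, tw, alpha, beta, gamma, c):
    """(RSNM * WM)^alpha / ((ILEAK + c)^beta * (TR * TW)^gamma)."""
    return (rsnm * wm) ** alpha / ((i_leak + c) ** beta * (tr * tw) ** gamma)

# 8T_4N4P_3D_proposed2 at PhiGN = 4.6 eV, PhiGP = 4.8 eV (Table 4.8).
metrics = dict(rsnm=351.30, wm=237.59, i_leak=0.1734, tr=50.28, tw=71.11)

for name, (a, b, g, c) in PRESETS.items():
    print(f"{name}: {objective(**metrics, alpha=a, beta=b, gamma=g, c=c):.4g}")
```

Raising α rewards the stability product, raising γ rewards speed, and the constant c caps the influence of the exponentially varying ILEAK; the best (ΦGN, ΦGP) pair for each objective is the one maximizing this value.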
Table 4.8 shows the gate workfunction values that maximize the objective
function for high stability. The cells tend to converge to high ΦGN and low ΦGP
values for better stability metrics, mostly because the RSNM generally increases
with increasing Vth. ILEAK is very small due to the use of FinFETs with high Vth.
8T 4N4P 3D proposed2 has the smallest TR along with a high RSNM. Compared
to its baseline (ΦGN = 4.4 eV, ΦGP = 4.8 eV), 8T 4N4P 3D proposed2 has better
RSNM, WM, and ILEAK at the cost of a degraded IREAD and TR.
Table 4.8: Gate workfunction values for designs with high stability
SRAM                  ΦGN (eV)  ΦGP (eV)  RSNM (mV)  WM (mV)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
6T 4N2P 2D            4.6       4.7       164.33     295.68   103.66      0.0044      83.56    77.12
6T 4N2P 3D            4.6       4.8       181.65     279.68   103.66      0.0882      74.37    72.66
8T 6N2P 2D            4.6       4.6       338.66     319.69   103.67      0.0028      61.03    96.64
8T 4N4P 3D prior1     4.5       4.6       196.71     322.62   80.40       0.0939      102.50   42.77
8T 4N4P 3D prior2     4.6       4.7       353.39     295.68   49.04       0.0065      72.73    66.07
8T 4N4P 3D proposed1  4.6       4.8       323.75     130.27   103.67      0.1734      50.29    87.19
8T 4N4P 3D proposed2  4.6       4.8       351.30     237.59   103.67      0.1734      50.28    71.11
Table 4.9 shows the gate workfunction pairs for high-performance designs. Fin-
FETs with a small Vth are favored for high performance as they are faster. However,
low-Vth FinFETs leak more and also have a degraded RSNM. 6T SRAM cells prefer
nFinFETs with ΦGN = 4.5 eV for better stability, while keeping the performance
high. Their leakage is much smaller compared to other SRAM cells due to their
higher-Vth FinFETs. 8T 4N4P 3D prior2 and the proposed cells use pFinFETs with
low Vth, which leads to a significant increase in leakage current. 8T 4N4P 3D prior2
beats all cells in terms of TR.
Table 4.9: Gate workfunction values for high-performance designs

SRAM                  ΦGN (eV)  ΦGP (eV)  RSNM (mV)  WM (mV)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
6T 4N2P 2D            4.5       4.7       93.77      291.09   143.15      0.0959      66.81    62.39
6T 4N2P 3D            4.5       4.7       93.77      291.09   143.15      0.0959      59.19    55.15
8T 6N2P 2D            4.4       4.6       242.81     323.61   183.29      4.4625      34.56    63.58
8T 4N4P 3D prior1     4.4       4.6       179.63     323.61   80.50       3.7199      102.49   31.55
8T 4N4P 3D prior2     4.4       4.9       250.93     247.95   81.13       10.5124     44.23    51.20
8T 4N4P 3D proposed1  4.6       4.9       286.04     159.60   103.67      6.7937      50.30    71.51
8T 4N4P 3D proposed2  4.4       4.9       245.33     141.63   183.29      9.3969      24.53    80.40

Table 4.10 shows results for the low-power designs. All SRAM cells, except
8T 4N4P 3D proposed1, use FinFETs with the highest Vth to minimize leakage.
8T 4N4P 3D proposed1 needs to use a slightly stronger pFinFET to be able to
complete the write operation successfully. 8T 4N4P 3D proposed2 has the smallest
ILEAK and TR.
Table 4.10: Gate workfunction values for low-leakage-power designs
SRAM                  ΦGN (eV)  ΦGP (eV)  RSNM (mV)  WM (mV)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
6T 4N2P 2D            4.6       4.6       135.74     320.32   103.66      0.0024      83.58    70.26
6T 4N2P 3D            4.6       4.6       135.74     320.32   103.66      0.0024      74.39    61.33
8T 6N2P 2D            4.6       4.6       338.66     319.69   103.67      0.0028      61.03    96.64
8T 4N4P 3D prior1     4.6       4.6       166.06     320.32   80.24       0.0024      102.52   54.20
8T 4N4P 3D prior2     4.6       4.6       329.37     320.32   33.66       0.0024      94.11    61.02
8T 4N4P 3D proposed1  4.6       4.7       356.12     87.90    103.67      0.0058      50.29    117.54
8T 4N4P 3D proposed2  4.6       4.6       282.97     117.01   103.67      0.0017      50.27    115.18
Table 4.11 shows results for designs with high quality. 8T 4N4P 3D proposed2
has the second best RSNM and TR. The leakage current of the proposed cells in this
setup is worse than that of other cells since they use stronger pFinFETs for better
WM and TW.
To summarize, we show that 8T 4N4P 3D proposed1 can use stronger pFinFETs
for successful writes to the cell. 8T 4N4P 3D proposed2 appears to be very promising
for almost all design objectives, particularly for the high-performance and low-leakage
scenarios. The results also show that optimizing the cells for a single design objective
often hurts other SRAM metrics.
Table 4.11: Gate workfunction values for overall high-quality designs
SRAM                  ΦGN (eV)  ΦGP (eV)  RSNM (mV)  WM (mV)  IREAD (µA)  ILEAK (nA)  TR (ps)  TW (ps)
6T 4N2P 2D            4.6       4.7       164.33     295.68   103.66      0.0044      83.56    77.12
6T 4N2P 3D            4.6       4.7       164.33     295.68   103.66      0.0044      74.38    67.02
8T 6N2P 2D            4.5       4.6       287.58     322.62   143.16      0.1126      47.62    77.80
8T 4N4P 3D prior1     4.5       4.6       196.71     322.62   80.40       0.0939      102.50   42.77
8T 4N4P 3D prior2     4.6       4.7       353.39     295.68   49.04       0.0065      72.73    66.07
8T 4N4P 3D proposed1  4.6       4.8       323.75     130.27   103.67      0.1734      50.29    87.19
8T 4N4P 3D proposed2  4.6       4.8       351.30     237.59   103.67      0.1734      50.28    71.11
4.6 Discussion
In this chapter, we evaluated two new 3-D SRAM cells and compared them with
their counterparts. In this section, we make some key observations, particularly on
the differences between our results and those of previous works.
The footprint area of cells varies among different works due to differences in the
design rules and technology used. For example, 8T 4N4P 3D prior2 has 40.0%, 17.3%,
and 28.1% smaller footprint area than the 6T 4N2P 2D cell analyzed in [40],
[43], and our work, respectively. Certain top- and bottom-layer transistors are aligned
in [43] to exploit inter-layer coupling, which can decrease area efficiency. In [40],
the transistors are implemented with planar CMOS technology, which makes it eas-
ier to adjust device widths and design area-efficient cells. The IREAD reduction of
8T 4N4P 3D prior2 in [40] is addressed by using larger p-type read transistors. All
FinFET designs, such as ours, suffer from the width quantization issue (i.e., the
number of fins can only be an integer). Therefore, increasing the strength of the
read-path pFinFETs may incur a disproportionate area cost.
The strength of n-type transistors with respect to p-type transistors has a large
impact on SRAM cell metrics. For example, using p-type access transistors improves
writeability in [45], whereas it degrades writeability significantly in our designs. The
difference comes from the technologies used in these two works. The p-type transistors
are stronger in the 5nm technology used in [45]. Therefore, p-type access transistors
are stronger and offer improved writeability. On the other hand, pFinFETs are weaker
than nFinFETs in the 14nm technology we use in this chapter. Thus, the proposed
cells have poor writeability.
Two SRAM cells with the same number and type of transistors may have
different footprint areas, depending on the routing requirements. For example,
8T 4N4P 3D prior1 has a 17.1% larger footprint area compared to other 8T 4N4P 3D
SRAM cells, although it does not even have an RBL signal unlike the other 8T SRAM
cells. MIV connections between the two transistor layers particularly limit the area
efficiency of 3-D cells.
We compared the SRAM cells using their area, stability, and performance metrics.
The significance of each metric depends strongly on the application and design
objectives. For example, speed is of utmost importance for a register file or an L1
cache. However, for higher cache levels, such as L2 and L3, footprint area and leakage
may be more important. Thus, we compared the SRAM cells for different design objectives.
The proposed cells were shown to be useful especially for designs requiring low-leakage
and high read speed.
4.7 Chapter summary
We proposed two new 3-D 8T SRAM cells implemented in TLM technology and
compared them with 6T and 8T SRAM cells presented in prior works. Our proposed
cells use pFinFET access transistors to achieve an area-efficient 3-D design. They
offer the smallest leakage current and read time among all cells, along with high read
stability, at the expense of poor writeability. One cell we proposed uses IG FinFETs
as pull-up transistors with the back gates tied to VDD to improve writeability. This
cell reduces the footprint area and leakage current by 28.1% and 31.6%, respectively,
while improving the read time by 53.2% compared to a conventional 2-D 6T SRAM
cell. It also has 43.8%, 43.2%, and 29.0% reduction in footprint area, leakage current,
and read time, respectively, compared to a conventional 2-D 8T SRAM cell, at the
cost of 8.8% and 57.1% degradation in RSNM and WM, respectively. We showed that
the writeability of the proposed cells can be improved with write-assist techniques,
such as cell-GND boosting.
Chapter 5
Hybrid Monolithic 3-D IC
Floorplanner
Different monolithic 3-D integration styles such as BLM, GLM, and TLM have been
shown to reduce power consumption, delay, and interconnect length of a chip. An HM
design has modules implemented in different monolithic styles to further optimize the
design objectives such as area, wirelength, and power consumption. However, a lack
of electronic design automation tools makes HM 3-D IC design quite challenging. In
this chapter, we introduce 3-D-HMFP, the first HM 3-D IC floorplanner. We charac-
terize the OpenSPARC T2 processor core using different monolithic implementations
and compare their footprint area, wirelength, power consumption, and temperature.
We show, via simulations, that under the same timing constraint, an HM design of-
fers 48.1% reduction in footprint area and 14.6% reduction in power consumption
compared to those of the 2-D design at the cost of higher power density and slightly
higher temperature [27].
5.1 Introduction
Transistors have scaled for decades and become faster in each technology generation.
Interconnects, however, have become a growing concern due to increased wire resis-
tance with scaling [74, 75]. They have become delay and power consumption bottle-
necks in modern microprocessors [76, 77]. When repeaters and flip-flops are added to
the interconnects to reduce their delay, the power consumption of the microprocessor
increases even more [78].
Monolithic 3-D integration can address the interconnect bottleneck as it reduces
interconnect length and the number of repeaters needed for long interconnects. An
HM design consists of modules implemented in different monolithic implementation
styles. It offers better trade-offs among chip performance, area, wirelength, and
power consumption. In this chapter, we introduce 3-D-HMFP to enable efficient
exploration of the HM design space. We use 3-D-HMFP to compare different mono-
lithic implementations of the OpenSPARC T2 processor core in terms of footprint
area, wirelength, power consumption, and temperature.
We first create FinFET libraries based on TCAD device simulations and generate
cell layouts. We separate modules of the OpenSPARC T2 processor core into logic and
memory modules. In order to characterize the logic and memory modules, we develop
tools called FinPrin-monolithic and CACTI-monolithic, respectively, atop our prior
tools: FinPrin [79] and CACTI-PVT [80]. They feed area, power, and wirelength
values to 3-D-HMFP for floorplanning experiments. Compared to a conventional 2-D
design, an HM design reduces the footprint area by 48.1% and power consumption
by 14.6% under the same timing constraint of 1 ns (hence, under a clock frequency
of 1 GHz) [27].
We use the thermal analysis tool HotSpot 6.0 [81] for 3-D monolithic thermal
analysis, and incorporate it into 3-D-HMFP. Although an HM design consumes less
power, its power density is higher due to the reduction in footprint area. This leads
to slightly higher temperatures on chip.
The rest of the chapter is organized as follows. Section 5.2 illustrates the poten-
tial benefits of HM designs through an example. Section 5.3 describes the simulation
setup for the HM design. Section 5.4 introduces FinPrin-monolithic, which we use to
characterize the area, power, and delay of logic modules. Section 5.5 explains how
3-D gate-level placement can be done for GLM designs. Section 5.6 describes CACTI-
monolithic, which we use to characterize memory modules. Section 5.7 describes the
proposed 3-D-HMFP in detail through problem formulation, T*-tree representation,
simulated annealing engine, cost function, and global wire power consumption. Sec-
tion 5.8 describes how HotSpot 6.0 is used for thermal analysis of the chip. Section 5.9
discusses floorplanning simulation results. Section 5.10 presents the concluding re-
marks.
5.2 Motivation
In this section, we demonstrate how the HM design style can be used to find an optimal
design. Table 5.1 shows the footprint area and power values of the OpenSPARC T2
floating-point and graphics unit (FGU) implemented in different monolithic styles us-
ing FinPrin-monolithic. The GLM implementation of FGU has the smallest footprint
area and the lowest power consumption compared to its BLM and TLM implemen-
tations. However, using GLM modules ubiquitously may not guarantee the best chip
design, given the complicated architecture of modern microprocessors that contain
tens of modules.
The following example shows how an HM design can offer trade-offs between
desired objectives, such as area and power consumption. Suppose we have five logic
and memory modules with the relative footprint area values shown in Table 5.2. A, B,
and C refer to logic modules that can be implemented in both BLM and GLM, whereas
D and E refer to memory modules that are only implemented in BLM.

Table 5.1: FGU footprint area and power values for different monolithic implementations

Monolithic type   Area (µm²)   Power (µW)
BLM               21696        22973
TLM               13821        21396
GLM               10633        19866
Table 5.2: Footprint area values assumed for the modules to be floorplanned
                   Logic           Memory
Monolithic type    A    B    C     D    E
BLM                12   6    8     15   9
GLM                6    3    4     -    -
Fig. 5.1 shows different floorplanning scenarios. The first design consists of only
BLM modules. It achieves a footprint area of 26 (12+6+8) in the best case. The
second design uses all GLM logic modules to reduce power consumption. However,
its minimum achievable footprint area is 28 (6+3+4+15). On the other hand, the
third design utilizes both BLM and GLM logic modules and achieves a footprint area
of 25 (6+4+15). Although a BLM implementation of module B has higher power
consumption than that of its GLM implementation, the footprint area reduction in
the third design may reduce silicon cost, increase yield, and decrease global wire
power consumption.
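The arithmetic behind this example can be checked by brute force: each logic module is implemented in BLM or GLM, each BLM-implemented module is assigned to one of two tiers, a GLM module occupies footprint on both tiers, and the footprint is the larger tier area. This is a toy model of the decision 3-D-HMFP automates, not its actual floorplanning algorithm.

```python
from itertools import product

# Areas from Table 5.2; memory modules D and E are BLM-only.
BLM_AREA = {"A": 12, "B": 6, "C": 8, "D": 15, "E": 9}
GLM_AREA = {"A": 6, "B": 3, "C": 4}

def best_footprint():
    """Minimum footprint over all style choices and tier assignments."""
    best = float("inf")
    for styles in product(["BLM", "GLM"], repeat=3):     # choices for A, B, C
        # GLM modules occupy footprint on both tiers.
        glm = sum(GLM_AREA[m] for m, s in zip("ABC", styles) if s == "GLM")
        blm = [BLM_AREA[m] for m, s in zip("ABC", styles) if s == "BLM"]
        blm += [BLM_AREA["D"], BLM_AREA["E"]]
        # Assign each BLM module to tier 0 or tier 1.
        for tiers in product([0, 1], repeat=len(blm)):
            area = [glm, glm]
            for a, t in zip(blm, tiers):
                area[t] += a
            best = min(best, max(area))                  # footprint = larger tier
    return best

print(best_footprint())  # the hybrid design (c) achieves a footprint of 25
```

The search confirms that mixing styles (A and C in GLM, B in BLM) beats both the all-BLM (26) and all-GLM-logic (28) designs, which is exactly the gap an HM floorplanner exploits.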
Despite its simplicity, this example shows the need for an HM floorplanner. A
processor core has tens of modules, some of which benefit from a 3-D implementation,
whereas others are better implemented in 2-D.
Figure 5.1: Floorplanning results of different monolithic implementations showing
the benefit of hybrid design: (a) BLM, (b) GLM logic + BLM memory, and (c)
GLM/BLM logic + BLM memory.
5.3 Simulation setup
Floorplanning experiments require area and power consumption values of the mod-
ules. We characterize the modules of the OpenSPARC T2 processor core using differ-
ent monolithic styles and feed the area and power values to 3-D-HMFP for floorplan-
ning. Fig. 5.2 shows the HM design flow. First, we use Sentaurus Device Simulator
[59] to perform 2-D hydrodynamic mixed-mode device simulations to characterize
FinFET logic and memory cells. We assume a 14nm SOI FinFET technology, whose
parameters are shown in Table 1.1. We generate BLM and TLM cell layouts using the
Magic VLSI layout tool [82] to obtain the area values of the cells. We characterize
the FinFET logic and memory libraries based on timing, power consumption, and
area values obtained from device simulations and cell layouts. For the FinFET logic
library, we characterize INV, NAND, NOR (sizes 1×, 2×, 4×, 8×, and 16×), and a D
flip-flop (DFF). The FinFET memory library has a 6T SRAM cell in addition to logic
cells.

Figure 5.2: The hybrid monolithic design flow.

We separate the OpenSPARC T2 processor core components into two groups:
logic and memory modules. Logic modules include components such as the decoder
and execution unit, which are synthesizable by Design Compiler [83]. We use FinPrin-
monolithic, which is built on top of FinPrin [79], to characterize the logic modules.
Memory modules have caches, register files, etc., which are not synthesizable. We
use CACTI-monolithic, which is built on top of CACTI-PVT [80], to characterize
memory modules. We obtain area and power consumption values of modules under 1
ns timing constraint from FinPrin-monolithic and CACTI-monolithic and feed these
values to 3-D-HMFP. 3-D-HMFP also calculates the global wire power consumption.
It produces overall chip area, wirelength, and power consumption values for the 2-D
design and different 3-D monolithic designs, such as BLM, TLM, and HM. Lastly,
we use HotSpot 6.0 [81] to calculate the temperature of the chip.

Figure 5.3: The FinPrin-monolithic simulation flow.

In the following
sections, we describe the tools we use in the HM design flow.
5.4 FinPrin-monolithic
We use FinPrin-monolithic to calculate the delay, area, and power consumption of
logic modules. We build FinPrin-monolithic on top of FinPrin [79], a FinFET logic
circuit analysis and optimization tool. The FinPrin-monolithic simulation flow is
shown in Fig. 5.3. The steps of this simulation flow are as follows:
Table 5.3: FinPrin-monolithic results

Monolithic  Area    Area           Wirelength  Wirelength     Dynamic     Leakage     Wire        Total       Total power
type        (µm²)   reduction (%)  (µm)        reduction (%)  power (µW)  power (µW)  power (µW)  power (µW)  reduction (%)
BLM         94659   0.0            4490727     0.0            49956       3545        41099       94600       0.0
TLM         61072   35.5           3736031     16.8           49018       3468        34192       86678       8.4
GLM         47294   50.0           3141085     30.1           48970       3462        28747       81179       14.2
1. FinPrin-monolithic takes the gate-level netlist generated by Design Compiler
and converts it into the GSRC bookshelf format for place-and-route.
2. Capo, a 2-D floorplacer [84], performs row-based placement. For GLM modules,
an intermediate step generates 3-D gate-level placement. Details of the gate-
level placement are given in the next section. FinPrin-monolithic applies global
routing to the placement.
3. FinPrin-monolithic calculates the area, power, and delay values of the circuit
and determines the critical path based on the FinFET logic library and tem-
perature. We have incorporated FLUTE, a fast lookup table based rectilinear
Steiner minimal tree algorithm [85], into FinPrin-monolithic for more accurate
wirelength and, consequently, delay and power calculations.
4. If the timing constraint is violated, FinPrin-monolithic adds repeaters to only
the interconnects on the critical path.
5. FinPrin-monolithic repeats steps 3 and 4 until the timing constraint is met or
adding repeaters on the critical path does not decrease the delay anymore.
We characterized 13 logic modules of the OpenSPARC T2 processor core using
FinPrin-monolithic. They are the instruction fetch unit consisting of three sub-units,
decode unit, two execution units, load-store unit, floating-point unit, trap logic unit,
memory management unit, gasket unit, performance monitor unit, and pick unit. We
assume a 1 GHz clock frequency and a 330 K temperature. FinPrin-monolithic results
are shown in Table 5.3. It shows the total footprint area, wirelength, and power
consumption of the 13 logic modules. TLM modules have 35.5%, 16.8%, and 8.4%
reductions in the footprint area, wirelength, and power consumption, respectively,
compared to those of BLM modules.

Figure 5.4: 8× NAND cell layout: (a) BLM/GLM, (b) TLM n-tier, and (c) TLM p-tier.

Fig. 5.4 shows the layout of an 8× NAND cell in BLM/GLM and TLM. TLM cells
have higher total silicon area compared to
BLM/GLM cells due to their use of intra-cell monolithic vias. The height of a BLM
cell is 84λ (0.588 µm) while the height of a TLM cell is 54λ (0.378 µm). Depending on
how the TLM cells are implemented, the footprint area can be smaller than the BLM
cells by 46% [86], 44.4%, or 38.8% [87]. Our TLM cells have 35.7% less footprint area
compared to the 2-D cells, which is reasonable relative to prior work. Although TLM
modules have a smaller footprint area due to their smaller cell layouts, their total
silicon area (on both layers) is 29.0% larger than that of BLM. Their power reduction
is mostly due to a decrease in wirelength. GLM modules offer the smallest footprint
area, the shortest wirelength, and the lowest power consumption. Compared to BLM
modules, their footprint area, wirelength, and power consumption are reduced by
50.0%, 30.1%, and 14.2%, respectively. The main contributor to the power reduction
Figure 5.5: Gate-level monolithic placement steps: (a) cell deflation, (b) deflated 2-D placement, (c) cell inflation, and (d) cell layer assignment.
in GLM modules is the decrease in wirelength. A 30.1% decrease in the wirelength is
responsible for a 13.1% decrease in the total power consumption. The wirelength in
GLM modules, however, is slightly optimistic because FinPrin-monolithic does not
include the wirelength in the z-dimension (which we estimate would only contribute
1-2% to the total wirelength).
5.5 Gate-level monolithic placement
2-D placement tools can be used for BLM and TLM designs. To perform GLM
placement, we modify a 2-D placement process, following an approach similar to
those reported in [53] and [19]. The placement steps are
shown in Fig. 5.5.
1. Cell deflation: The cell widths are halved.
2. Deflated 2-D placement: Capo [84] is used for row-based placement of deflated
cells.
3. Cell inflation: The cell widths are returned to their original values, leading to
cell overlaps.
4. Cell layer assignment: Cells are assigned to one of the layers in 3-D design.
A greedy algorithm is used to minimize the placement area and remove the
overlaps.
The algorithmic steps for greedy layer assignment are illustrated with the example
shown in Fig. 5.5. The algorithm starts with the first row and assigns its first cell to
the bottom layer and its second cell to the top layer. Then, it considers each cell in
the row in order and assigns it to the layer with less total cell width. Its aim is to
minimize the footprint area. It repeats this process for all rows.
We compared our cell layer assignment method with the Zero-One Linear Program
(ZOLP) method presented in [53]. ZOLP formulates cell layer assignment as a linear
programming problem. It performs cell assignment to minimize the total overlap
between cells while assuming that the cells have a fixed position. A legalization step
follows ZOLP layer assignment to remove the remaining overlaps, which may increase
the placement area and wirelength. We compared the two methods using 14 different
logic circuits, such as s713, s1196, s1488, and s9234 from the ISCAS’89 benchmark
suite, and the arithmetic-logic unit and multiplier unit from the OpenSPARC T1
benchmark. These circuits are only used to compare gate-level placement methods
and are different from the OpenSPARC T2 modules that are characterized in this
chapter. Table 5.4 shows the total area and wirelength values of the circuits for
different placements. The greedy method performs slightly better in both footprint
area and wirelength compared to ZOLP. Compared to 2-D placement, ZOLP reduces
the footprint area and wirelength by 47.5% and 25.8%, respectively. Our method,
on the other hand, reduces the footprint area and wirelength by 49.2% and 28.1%,
respectively, as shown in Table 5.4.
Fig. 5.6 shows a comparison of the above two methods on a small example to
demonstrate how the greedy method can perform better than ZOLP. In this example,
seven cells are assigned to two layers using the two methods. ZOLP leads to a larger
Table 5.4: Placement results of 14 test circuits

Design        Area (µm²)  Area reduction (%)  Wirelength (µm)  Wirelength reduction (%)
2-D           2680        0.0                 88192            0.0
Deflated 2-D  1356        49.4                62667            28.9
ZOLP 3-D      1408        47.5                65456            25.8
Greedy 3-D    1361        49.2                63443            28.1
area compared to the greedy method because of the remaining overlaps between cells
after ZOLP layer assignment. The greedy method, on the other hand, removes all the
overlap during the layer assignment. It also performs better in wirelength because
wirelength is often proportional to the area of the design. The position of a cell
after greedy layer assignment is relatively close to its original position in 2-D deflated
cell placement. Therefore, the wirelength is close to the wirelength in deflated cell
placement, as shown in Table 5.4, which has already been optimized during deflated
cell placement. The greedy method is also faster because it is simpler and does not
need a legalization step. Therefore, we choose the greedy method since it performs
slightly better in area and wirelength and is significantly faster than ZOLP. Neither
method minimizes the number of vias. To minimize the number of vias, a min-cut
partitioner can be used to split the circuit [19].
Figure 5.6: Greedy layer assignment vs. ZOLP layer assignment showing the area benefit of the greedy method: (a) deflated 2-D placement, (b) cell inflation, (c) cell layer assignment, and (d) legalization (just for ZOLP).
Figure 5.7: The CACTI-monolithic simulation flow.
5.6 CACTI-monolithic
We used CACTI-monolithic to characterize 22 memory modules of the OpenSPARC
T2 processor core. CACTI-monolithic is built on top of CACTI-PVT [80], which is
a FinFET cache/memory modeling tool. The CACTI-monolithic simulation flow is
shown in Fig. 5.7. Cache parameters, such as cache size, block size, associativity, bank
count, and technology node, are defined by the user. CACTI-monolithic investigates
different memory configurations by exploring different values for various parameters,
such as the number of segments in a bank wordline, number of segments in a bank
bitline, and number of sets mapped to each bank wordline. It computes the timing,
area, and power consumption of the memory module based on FinFET logic and
memory libraries and technology parameter values. It finds the best configuration
based on the cost function defined by the user.
We use CACTI-monolithic to model BLM and TLM modules. We modify the
area values of the cells in the FinFET logic and memory libraries in order to char-
Table 5.5: CACTI-monolithic input parameter values for memory modules

Input               ICD&ICT  DCA&DTA  IRF  FRF   DVA  DTLB  ITLB  SCM
Memory type         Cache    Cache    RAM  RAM   RAM  CAM   CAM   CAM
Cache size (bytes)  16896    9216     288  2048  128  608   304   64
Block size (bytes)  32       16       9    8     32   16    8     1
Associativity       8        4        1    1     1    FA    FA    FA
Number of banks     4        2        1    1     1    1     1     1
Tag size (bits)     30       30       -    -     -    66    66    37
acterize TLM memory modules. To the best of our knowledge, no tool is available
for characterizing GLM memory modules. Therefore, we do not evaluate such mod-
ules. Table 5.5 shows the CACTI-monolithic input parameter values for important
OpenSPARC T2 memory modules. They are instruction cache data array (ICD),
instruction cache tag array (ICT), data cache array (DCA), data cache tag array
(DTA), integer register file (IRF), floating-point register file (FRF), data valid bit
array (DVA), data translation lookaside buffer (DTLB), instruction translation looka-
side buffer (ITLB), and store buffer CAM (SCM). The rest of the memory modules
are smaller RAM arrays. We assume a 1 GHz clock frequency and a 330 K operating
temperature. FA stands for fully-associative.
Table 5.6: CACTI-monolithic results

Monolithic  Area    Area           Dynamic     Leakage     H-tree      Total       Total power
type        (µm²)   reduction (%)  power (µW)  power (µW)  power (µW)  power (µW)  reduction (%)
BLM         25444   0.0            3075        3124        4349        10548       0.0
TLM         18998   25.3           2966        3115        4222        10303       2.3
CACTI-monolithic results are shown in Table 5.6. It shows the total footprint
area, dynamic power, leakage power, H-tree power, and total power of the 22 memory
modules. In all, TLM memory modules have 2.3% less power consumption and 25.3%
smaller footprint area compared to those of BLM memory modules. TLM memory
modules do not benefit from smaller TLM cell layouts as much as TLM logic modules
do, both in terms of area and power consumption. Although the TLM cell layout
is 35.7% smaller than the BLM cell layout, TLM memory modules have only 25.3%
smaller footprint area. The footprint area benefit of the TLM cell array is reduced because
both BLM and TLM have similar routing area for the same structure, as shown in
Fig. 5.8.
Figure 5.8: Area comparison: (a) BLM memory module and (b) TLM memory module.
In addition, 25.3% smaller footprint area for TLM memory modules does not
translate into a significant reduction in total power consumption due to the organi-
zational structure and dimensions of the memory modules. CACTI-monolithic uses
horizontal and vertical H-tree structures to route data and address, as shown in
Fig. 5.9. These H-trees dominate the power of the memory modules. Their length
is significantly impacted by the width, height, and organizational structure of the
memory module. Table 5.7 shows results for an ICD [16.5KB (16KB data and 0.5KB
parity), 8-way set associative, 32B line size, 64 entries] implemented in BLM and
TLM. The module width of both implementations is similar. However, the height
of the TLM implementation is 24.4% smaller. Because the memory module width
is larger than its height, horizontal H-trees are longer and, consequently, affect the
power consumption the most. Both TLM and BLM memory modules have similar
horizontal H-tree lengths because their width is similar. This leads to only a small
reduction in power consumption for TLM memory modules, mostly due to shorter
vertical H-trees and a smaller number of repeaters.
Figure 5.9: Layout of horizontal and vertical H-trees of a memory module.
Table 5.7: Instruction cache data array dimensions of BLM and TLM implementations

Monolithic type   Width (µm)   Height (µm)   Power (µW)
BLM               151          78            5757
TLM               150          59            5613
5.7 Hybrid-monolithic 3-D IC floorplanner
3-D-HMFP is built atop 3DFP [88], a thermal-aware floorplanner developed for
TSV-based 3-D ICs. 3-D-HMFP can handle hybrid floorplanning while
taking the global interconnect power consumption into account. It is implemented in
C++. The main differences between 3DFP and 3-D-HMFP are as follows:
• 3DFP floorplans only 2-D modules on multiple layers, assuming no vertical
constraints. 3-D-HMFP handles both 2-D and 3-D modules and aligns parts of
GLM and TLM modules on two layers.
• 3DFP assumes wire power to be fixed at 30.0% of the total module power. 3-
D-HMFP does not make such an assumption since wire power is significantly
impacted by the technology node being employed. Instead, 3-D-HMFP calcu-
lates wire power based on the FinFET logic library and technology-dependent
wire resistance and capacitance values.
• 3DFP can only explore the design space of modules implemented in a single
style at a time. 3-D-HMFP explores a larger design space because it can replace
modules with their alternative implementations.
• 3DFP uses a B*-tree representation. 3-D-HMFP uses a T*-tree representation
for handling vertical constraints.
5.7.1 Problem formulation
We assume that the two parts of a GLM or TLM module on different layers are
aligned with each other to reduce the footprint area and wirelength. Not aligning
them can complicate the design and increase routing complexity. The main challenge in
HM floorplanning is to make sure that both parts of every GLM and TLM module
are aligned on the two layers, as shown in Fig. 2.1.
5.7.2 T*-tree representation
3-D-HMFP uses a T*-tree representation that is inspired by the B*-tree representa-
tion, which is an efficient and flexible data structure for non-slicing floorplans [89].
T*-tree has been used for 3-D floorplanning, considering vertically-aligned rectilinear
modules, in [49]. Fig. 5.10 shows a T*-tree representation and the corresponding
Figure 5.10: A T*-tree representation and the corresponding placement in 3-D.
placement of modules in 3-D. Modules 4, 6, and 7 are assumed to be implemented
in GLM or TLM. Hence, they have the same footprint on the two layers. Other
modules are assumed to be implemented in BLM and thus occupy space only on one
layer. The T*-tree represents each module with a node. A node can have up to three
child nodes.
3-D-HMFP uses a depth-first search algorithm to pack the modules. It starts from
the root node and places the module on the bottom layer. At each node, it visits the
middle, left, and right subtrees in order. The middle child module is placed on top
of the parent module on the upper layer. The left child module is placed to the right
of the parent module. The right child module is placed above the parent module on
the same layer. We use two linked-list data structures (one for each layer) to keep
track of the boundaries of the placement and determine the locations of the placed
modules. More details of the T*-tree representation and packing algorithm can be
found in [49] and [89].
Not every T*-tree representation corresponds to a valid placement. For example,
a GLM or TLM module cannot be the middle child of another module. 3-D-HMFP
checks the legality of the solution at each step. Any operation that leads to an invalid
solution is dismissed by the algorithm. For example, in Fig. 5.10, swapping module
1 with module 7 is not legal since the resulting state has three layers. Hence, such an
operation is rejected by the algorithm.
5.7.3 Simulated annealing engine
3-D-HMFP uses a simulated annealing engine to perturb the floorplanning solutions.
Five different operations can be performed on the T*-tree nodes in the floorplanning
algorithm for this purpose. The operations are as follows:
1. Rotate: It rotates a module by 90°. In other words, it swaps the width and
height values of a module.
2. Resize: It modifies the width and height values of a soft module while keeping
the module area the same.
3. Move: It moves a module to another position in the T*-tree.
4. Swap: It swaps the positions of two modules in the T*-tree.
5. Replace: It replaces a module with one of its alternative implementations. For
example, it can replace a GLM module with its BLM implementation.
After each operation on the T*-tree, the simulated annealing engine decides
whether to move to a new state based on a weighted cost function specified by the
user.
5.7.4 Cost function
The goal of the floorplanning algorithm is to find a solution with the smallest weighted
cost function, which is as follows for a 2-D design:
cost = α · A + β · WL,    (5.1)
where A and WL are the area and wirelength of the design. Thus, 3-D-HMFP tries
to minimize the area and wirelength of a 2-D design.
For 3-D HM floorplanning, we use Eq. (5.2):
cost = α · A + β · WL + γ · D + θ · P,    (5.2)
where D and P are the deviation and power consumption of the design. In 3-D
floorplanning, different layers need to have similar dimensions in order to fully utilize
the silicon area and minimize the dead space. We calculate the deviation as
D = |W1 − W2| · |H1 − H2| / A,    (5.3)
where W and H denote the width and height of the layers, respectively, and the
subscripts denote the layer number. D is smaller when the two layers have similar dimensions.
We also need to add power consumption to the cost function since hybrid floorplanning
can replace a module with an alternative that has a different power value. 3-D-HMFP
favors a GLM implementation of a module over its BLM implementation. Although they
have similar total silicon area, a GLM implementation has lower power consumption
due to reduced wirelength. This reduces the cost function we are trying to minimize.
5.7.5 Global wire power consumption
Interconnects have started to dominate the power consumption of modern micro-
processors. Thus, excluding interconnect power during floorplanning undermines
floorplanning quality and underestimates the peak temperature. At each stage, 3-
D-HMFP calculates the length of the global wires and determines the number of
repeaters that need to be added. It calculates the power consumption of the global
wires and repeaters based on wire resistance and capacitance values obtained from
ITRS 2013 [90] and the FinFET logic library.
3-D-HMFP only calculates the global wire power consumption. Intermediate wires
are already taken into account by FinPrin-monolithic and CACTI-monolithic when
they characterize logic and memory modules, respectively.
5.8 HotSpot
Reduced footprint area, vertically-stacked multiple active layers, and low thermal
conductivity of the inter-layer dielectric increase the power density of monolithic 3-D
ICs and lead to higher on-chip temperatures [76]. Hence, a thermal model is needed
for monolithic 3-D ICs in order to accurately estimate the
peak temperature and avoid hotspots on the chip. We incorporate HotSpot 6.0 [81] into
3-D-HMFP for thermal analysis of the chip. 3-D-HMFP provides area and power
values of the different modules to HotSpot 6.0. Since HotSpot 6.0 cannot handle
interconnect power separately, we distribute global wire power among modules in
proportion to their areas. We use HotSpot’s grid model to obtain more accurate
temperature values. HotSpot 6.0 outputs not only the temperature of each block
but also temperatures at a finer grid level specified by the user. The user can effect
a trade-off between speed and accuracy of thermal simulation by changing the grid
resolution.
Fig. 5.11 shows the thermal model organization along with the thicknesses we
assume for each layer. In addition to the floorplan, power consumption, and layer
dimensions, we also specify the layer heat capacity and thermal resistivity values.
We use thermal properties and layer thicknesses of the default thermal package in
the HotSpot 6.0 distribution [91], which is reasonable for a typical high-performance
processor. We assume 1µm thickness for silicon layers consisting of active silicon
and metal layers for the 14nm technology node [92]. We assume 100nm inter-layer
dielectric thickness between silicon layers, which is enough to eliminate the inter-layer
coupling that may alter transistor behavior [71]. We report the peak temperature of
the chip in our results.
Figure 5.11: Thermal model organization (heat sink: 6.9 mm; heat spreader: 1 mm; thermal interface material: 20 µm; bulk silicon: 150 µm; top and bottom device layers: 1 µm each; SiO2: 0.1 µm).
5.9 Results
The OpenSPARC T2 [93] processor core is characterized using six different implemen-
tations: 2-D, BLM, TLM, and three HM designs. The footprint area, global wire-
length, total power consumption, dead space, peak temperature of the chip, footprint
area reduction, global wirelength reduction, total power reduction, and simulation
run-time for each design are computed. The results are obtained through the same
methodology, using the same tools, and under the same 1 ns timing constraint. Logic
and memory modules are treated as soft and hard modules, respectively. Because
the simulated annealing algorithm is a stochastic technique, it only approximates the
globally optimum solution. Therefore, comparing the results after a single run for
each design may not be fair. Hence, we run 100 simulations for each design and use
the best case for comparison. We run floorplanning experiments on a 3.10 GHz
machine with a 64-bit quad-core Intel i5 processor, 8 GB DRAM, and the Ubuntu 12.04 LTS
operating system.
Table 5.8: Comparison of different monolithic designs based on minimum area-power product, showing the benefit of hybrid designs in terms of footprint area and power consumption

Design                            Area    Wirelength  Power   Dead       Temperature  Area           Wirelength     Power          Run
                                  (µm²)   (µm)        (µW)    space (%)  (K)          reduction (%)  reduction (%)  reduction (%)  time (s)
2-D                               120516  971365      125442  0.3        325.2        0.0            0.0            0.0            6.2
BLM                               61984   584606      116271  3.1        330.8        48.6           39.8           7.3            6.8
TLM                               81171   732676      112298  1.4        327.5        32.6           24.6           10.5           6.1
HM1 (GLM logic + TLM memory)      66420   678064      105867  0.2        329.0        44.9           30.2           15.6           6.0
HM2 (GLM logic + BLM memory)      62600   755597      107134  4.1        329.8        48.1           22.2           14.6           6.3
HM3 (GLM/BLM logic + BLM memory)  61410   727442      110498  2.8        330.3        49.0           25.1           11.9           6.7
5.9.1 Floorplanning results
Table 5.8 shows the floorplanning results. The floorplan with the minimum area-
power product value is chosen for each design because both parameters are important.
All designs exhibit very small total dead space. Results show that a hybrid monolithic
design (HM2) offers a 48.1% reduction in footprint area and a 14.6% reduction in
power consumption at the cost of a small increase in temperature.
The 2-D design has the largest footprint area and power consumption among all
designs. Its peak temperature is the lowest since temperature depends on power
density. We use this design as the baseline.
The BLM design has 48.6% lower footprint area and 7.3% lower power consump-
tion relative to the 2-D design. Its power reduction is due to a reduction in global
wire power consumption.
The TLM design has 32.6% smaller footprint area relative to the 2-D design since
TLM cell layouts have 35.7% smaller footprint area compared to those of BLM. The
power reduction is 10.5% due to shorter local and global interconnects.
The HM1 design implements all logic modules in GLM and all memory modules
in TLM. This design can be floorplanned using a 2-D floorplanner since all modules
have two layers. Its footprint area and power consumption are, respectively, 44.9%
and 15.6% lower compared to those of the 2-D design. HM1 offers the best power
value among all designs because it saves power from both logic and memory modules.
It has 8.9% less power consumption compared to the BLM design mostly due to the
intra-module wirelength power reduction of GLM logic modules.
The HM2 design implements all logic modules in GLM and all memory mod-
ules in BLM. It uses GLM logic modules to save footprint area and power. It uses
BLM memory modules since they occupy smaller total silicon area compared to TLM
memory modules. Hence, the footprint area is reduced by 48.1%. HM2 has 14.6%
reduction in power consumption. The results are similar to those of [19] in which the
OpenSPARC T2 core implemented in monolithic design has 50.0% smaller footprint
area and 15.6% less power consumption compared to its 2-D counterpart. Similarly,
the monolithic OpenSPARC T2 core design was reported to have 14.0% smaller power
consumption compared to the 2-D design [55].
The HM3 design uses logic modules from both BLM and GLM implementations.
The footprint area is reduced by 49.0% and power by 11.9% compared to those of the
2-D design. HM3 offers the best footprint area since it can use both BLM and GLM
logic modules to minimize the dead space. It has slightly worse power consumption
than that of HM2 because it sometimes uses BLM logic modules instead of GLM
logic modules to reduce the footprint area and global wirelength. HM3 offers more
flexibility than HM2 because the HM3 design space is a superset of the HM2 design
space. However, exploring a larger design space increases algorithm run-time slightly.
In our case, HM3 takes 5.3% more time, on average, to reach the solution compared
to HM2.
3-D-HMFP has the same thermal-aware floorplanning ability as its predecessor
3DFP [88]. However, thermal-aware floorplanning did not yield a significant
temperature reduction because, in our designs, the peak temperature and the
temperature range across different floorplans were already low even for
non-thermal-aware floorplans.
Fig. 5.12 shows the floorplans of the 2-D, BLM, TLM, and HM3 designs reported
in Table 5.8. The 2-D design is implemented on a single layer and the 3-D designs on
Figure 5.12: Floorplanning results showing that the vertical constraints are met for TLM/GLM modules: (a) 2-D, (b) BLM, (c) TLM, and (d) HM3 (GLM/BLM logic + BLM memory). Colors indicate the implementation type: blue: BLM, brown: TLM, and green: GLM.
two (top and bottom) layers. The TLM design has the same floorplan on both layers
since all modules have identical dimensions on both layers. The HM3 design has both
GLM and BLM modules. GLM modules have two parts and occupy the same space
on both layers.
5.9.2 Floorplanning results at minimum area, wirelength,
and power values
One may have different objectives for floorplanning, such as decreasing the footprint
area, reducing the power consumption, or reducing the wirelength for easier rout-
ing. Therefore, we compare the results for the minimum area, wirelength, or power
configurations to better understand the trade-offs.
Table 5.9 shows the minimum-area floorplanning results. The various designs
from Table 5.8 are used as the baseline for the corresponding designs from here on.
Minimum footprint area is achieved at the cost of a higher power consumption and
increase in total wirelength. The minimum-area 2-D design has a 23.5% larger wire-
length and 3.9% higher power consumption. For the minimum-area TLM design,
wirelength and power increase by 36.3% and 6.3%, respectively, with only a 1.0%
footprint area reduction. The HM1 and HM2 designs have less than 1.0% smaller
area at the cost of around 1.0% power consumption increase. The HM2 design, inter-
estingly, has 0.8% higher power despite a 2.4% decrease in wirelength. This can be
explained based on the number of repeaters on long global interconnects, since both
HM2 designs have the same modules, but different global interconnects. The higher
total global wirelength does not necessarily lead to a higher power consumption. The
design with the higher overall wirelength but fewer long interconnects can have less
power, because short interconnects do not require repeaters. The HM3 minimum-area
design has 2.1% area reduction with 7.8% increase in power. These results show that
the minimum-area design might not be the best design overall. The floorplanner may
place modules that are connected to each other with global wires far from each other
to minimize area. This can increase the wirelength and power consumption.
Table 5.10 shows the results for minimum wirelength. Reducing the global wire-
length can save power and make routing easier. However, the area may increase when
trying to minimize the wirelength. The BLM design has 26.0% less wirelength at a
Table 5.9: Minimum area configurations of different monolithic designs

Design                            Area    Wirelength  Power   Dead       Temp.  Area      Wirelength  Power
                                  (µm2)   (µm)        (µW)    space (%)  (K)    red. (%)  red. (%)    red. (%)
2-D                               120132  1199780     130388  0.0        325.4   0.3      -23.5       -3.9
BLM                                60888   729625     119870  1.4        331.5   1.8      -24.8       -3.1
TLM                                80358   998886     119333  0.4        328.3   1.0      -36.3       -6.3
HM1 (GLM logic + TLM memory)       66305   754492     107121  0.0        329.3   0.2      -11.3       -1.2
HM2 (GLM logic + BLM memory)       62150   737539     108043  3.4        329.9   0.7        2.4       -0.8
HM3 (GLM/BLM logic + BLM memory)   60114   785678     119116  0.2        330.2   2.1       -8.0       -7.8
cost of a 9.9% footprint area increase. The HM3 design can also reduce the
wirelength significantly owing to its exploration of a larger design space: it
achieves 28.1% less wirelength, but at the cost of a 4.0% footprint area increase
and, despite the shorter wirelength, 3.6% higher power. The increase in power is
due to the replacement of GLM modules with BLM modules.
Table 5.10: Minimum wirelength configurations of different monolithic designs

Design                            Area    Wirelength  Power   Dead       Temp.  Area      Wirelength  Power
                                  (µm2)   (µm)        (µW)    space (%)  (K)    red. (%)  red. (%)    red. (%)
2-D                               122000   869321     124983   1.6       325.0  -1.2      10.5         0.4
BLM                                68115   432366     113402  11.8       329.4  -9.9      26.0         2.5
TLM                                84671   672989     111393   5.4       327.1  -4.3       8.1         0.8
HM1 (GLM logic + TLM memory)       67262   628482     104832   1.4       328.9  -1.3       7.3         1.0
HM2 (GLM logic + BLM memory)       64064   613726     105918   6.3       329.5  -2.3      18.8         1.1
HM3 (GLM/BLM logic + BLM memory)   63856   523247     114516   5.9       330.3  -4.0      28.1        -3.6
Table 5.11 shows the designs with the smallest power consumption. Power reduc-
tion is generally obtained from wirelength reduction at the cost of a higher footprint
area. For the 2-D design, power consumption decreases by 1.0% owing to a 3.2%
reduction in wirelength, but the area increases by 2.1%. The BLM design has 2.5%
less power consumption at a cost of 9.9% increase in footprint area. The HM2 design
has 2.1% power reduction, but 15.4% footprint area increase. The HM3 design has
3.8% less power consumption with a 4.2% increase in footprint area. Unlike other
cases, HM3 can save power by replacing BLM modules with GLM ones at the cost of
an increase in footprint area.
Overall, aggressively optimizing a single design objective may significantly degrade
the other objectives. Therefore, the designer needs to select the cost function and
the coefficients of the individual design objectives carefully in order to avoid
over-weighting any one objective.
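One way to sketch such a cost function (a simplified stand-in, not the actual 3-D-HMFP implementation) is a weighted sum of objectives normalized against a reference design:

```python
# Sketch of a weighted multi-objective floorplanning cost. Coefficient values
# and the normalization scheme are assumptions for illustration.

def floorplan_cost(area, wirelength, power,
                   ref_area, ref_wirelength, ref_power,
                   w_area=1.0, w_wl=1.0, w_power=1.0):
    """Normalize each objective by a reference design so the coefficients
    weight relative (not absolute) changes. A very large coefficient on one
    objective effectively turns the search into a single-objective one."""
    return (w_area * area / ref_area
            + w_wl * wirelength / ref_wirelength
            + w_power * power / ref_power)

baseline = floorplan_cost(100.0, 100.0, 100.0, 100.0, 100.0, 100.0)  # = 3.0
```

With equal unit weights, a candidate that trades a 5% area reduction for a 10% power increase scores worse than the baseline, which is exactly the over-weighting hazard described above when coefficients are chosen poorly.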
Table 5.11: Minimum power consumption configurations of different monolithic designs

Design                            Area    Wirelength  Power   Dead       Temp.  Area      Wirelength  Power
                                  (µm2)   (µm)        (µW)    space (%)  (K)    red. (%)  red. (%)    red. (%)
2-D                               123057   940181     124159   2.4       324.9   -2.1       3.2       1.0
BLM                                68115   432366     113402  11.8       329.4   -9.9      26.0       2.5
TLM                                84903   688873     110570   5.7       327.2   -4.6       6.0       1.5
HM1 (GLM logic + TLM memory)       69296   631438     104236   4.3       328.4   -4.3       6.9       1.5
HM2 (GLM logic + BLM memory)       72240   657147     104907  16.9       328.1  -15.4      13.0       2.1
HM3 (GLM/BLM logic + BLM memory)   63960   666829     106284   6.7       329.4   -4.2       8.3       3.8
5.9.3 Discussion
Floorplanning results show that HM designs offer trade-offs among chip footprint
area, wirelength, and power consumption. The quality of the HM solution depends
significantly on the number and implementation style of the modules. For a 3-D design
with two layers, only 2-D modules can be on top of other 2-D modules. Therefore, for
quality hybrid solutions, 2-D modules should be grouped into two groups with similar
overall area. This was possible with our benchmark, as evidenced by dead space as
small as 3.4% in the HM2 design, in which only BLM memory modules are placed
on top of other BLM memory modules as logic modules are implemented in GLM
style. However, this may not be the case for all benchmarks. If 2-D modules cannot
be grouped into two groups with similar area values, then HM2 would not be able
to find good solutions in terms of footprint area. In that case, fortunately, HM3 can
still find quality solutions to reduce footprint area and wirelength, since it can replace
GLM logic modules with BLM logic modules. As expected, replacing GLM modules
with their alternatives in the HM3 design increases the power consumption of the
modules, since GLM modules have the smallest power consumption. On the other
hand, HM3 can reduce global wire power consumption by reducing the footprint area.
Therefore, replacing GLM modules with their alternatives may increase or decrease
the total power consumption depending on the benchmark.
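The layer-balancing problem discussed above is essentially a two-way area partition. A minimal greedy sketch (a heuristic for illustration, not the floorplanner's actual grouping step) estimates how small the dead space could get:

```python
# Greedy two-way partition of 2-D modules across two layers. The area gap
# between the two groups approximates the minimum achievable dead space.

def split_into_layers(module_areas):
    """Assign each module (largest first) to the layer with the smaller
    running total; return the two groups and the relative area gap."""
    layers = [[], []]
    totals = [0.0, 0.0]
    for area in sorted(module_areas, reverse=True):
        i = 0 if totals[0] <= totals[1] else 1
        layers[i].append(area)
        totals[i] += area
    dead_space = abs(totals[0] - totals[1]) / max(totals)
    return layers, dead_space
```

When the module areas partition evenly (as with our benchmark), the estimated dead space approaches zero; a benchmark dominated by one huge module would yield a large gap, which is the case where HM3's module replacement becomes valuable.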
The following assumptions are made in our simulations.
1. 3-D designs have two transistor layers. However, monolithic designs can also
be realized with more layers. More layers may facilitate a further reduction in
footprint area, wirelength, and power consumption. However, the peak tem-
perature may increase further, since the power density will increase and the
upper layers will be farther from the heat sink. On the other hand, HM designs
may become even more attractive as more layers are added because of the flexibility
they offer. For example, an HM floorplanner can mitigate hotspots by replac-
ing GLM modules that have high power density with their BLM alternatives.
Moreover, the 3-D-HMFP cost function can be modified to place modules with
high power density on the lower layers. 3-D-HMFP can easily be extended to
handle more layers with a few changes, such as adding a contour for each layer
to keep track of the placement on that layer during packing and updating the
legalization constraint on the number of layers.
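The per-layer contour mentioned above can be sketched as a simple skyline structure, one per layer; this toy version assumes axis-aligned placement at a caller-chosen x position and omits everything else a real packer needs:

```python
# Sketch of "one contour per layer": each layer keeps its own skyline so a
# module packed on that layer lands on top of modules already placed on the
# same layer. Not the actual 3-D-HMFP packing code.

class LayerContour:
    def __init__(self, width):
        self.heights = [0.0] * width  # skyline height at each unit x position

    def place(self, x, w, h):
        """Place a w-by-h module at horizontal position x; it rests on the
        highest point of the skyline under it. Returns its y coordinate."""
        y = max(self.heights[x:x + w])
        for i in range(x, x + w):
            self.heights[i] = y + h
        return y

# One contour per layer of a two-layer design:
contours = [LayerContour(10), LayerContour(10)]
y0 = contours[0].place(0, 4, 3)  # bottom of layer 0 -> y = 0
y1 = contours[0].place(2, 4, 2)  # rests on the first module -> y = 3
y2 = contours[1].place(0, 4, 5)  # layer 1 has its own skyline -> y = 0
```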
2. In 3-D designs, transistor quality is the same on both transistor layers. It
was previously shown that top-layer transistors, which are processed at a lower
temperature, can match the quality of the bottom-layer transistors [94].
3. BLM, GLM, and TLM cells are assumed to have the same power and timing
characteristics as they behave almost identically at the cell level. This assump-
tion holds true only if top-layer transistors have the same quality as bottom-layer
transistors.
4. Simulated annealing, a commonly used stochastic technique, is used to per-
turb floorplanning solutions. Simulated annealing worked well because there
are only 35 modules in our design. However, stochastic techniques can suffer
from scalability issues if the number of modules is significantly higher. In such
cases, algorithms based on non-stochastic approaches, such as deferred decision
making [95], can be used.
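A generic simulated-annealing loop of the kind referred to above might look as follows; the 1-D cost and perturbation are placeholders standing in for a floorplan representation and its moves:

```python
import math
import random

# Generic simulated annealing: accept improvements always, accept uphill
# moves with probability exp(-delta/t) to escape local minima, and cool the
# temperature geometrically. All schedule constants are assumptions.

def anneal(cost, perturb, x0, t0=1.0, cooling=0.95, steps_per_t=50, t_min=1e-3):
    x, best = x0, x0
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            cand = perturb(x)
            delta = cost(cand) - cost(x)
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x = cand
                if cost(x) < cost(best):
                    best = x
        t *= cooling
    return best

random.seed(0)
# Placeholder problem: minimize a quadratic instead of a floorplan cost.
best = anneal(cost=lambda x: (x - 3.0) ** 2,
              perturb=lambda x: x + random.uniform(-0.5, 0.5),
              x0=10.0)
```

The scalability concern in the text arises because each perturbation of a real floorplan requires re-packing and re-evaluating all modules, so the per-step cost grows with module count.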
5. Fabrication cost and challenges are not taken into account. However, fabrication
cost can be an issue with monolithic designs. BLM and GLM designs require
additional inter-active-layer metal layers to connect intra-module cells. On the
other hand, a TLM design requires far fewer metal layers between active
layers. A recent study [96] estimates that a TLM design is 23.0% less costly
than a GLM design, thanks to fewer metal layers and fabrication steps in
the TLM fabrication process. Although a TLM design does not seem promising
based on the objectives pursued in this work, it may become advantageous once
fabrication cost is taken into account. Thus, more work needs to be done on
incorporating fabrication cost into the floorplanning cost function of monolithic
designs.
The results may differ if some of the assumptions do not hold. For example, the
transistor quality of top-layer transistors may be worse than that of bottom-layer
transistors in high-volume production of monolithic designs. The benefits of 3-D
designs may still outweigh the shortcomings of the fabrication process despite the
degradation in top-layer transistor quality since critical modules can be fabricated on
the bottom layer. If the assumptions change, the proposed tools can be modified to
accommodate the new assumptions and re-evaluate designs using the same methodology.
5.10 Chapter summary
We introduced a 3-D HM floorplanner in this chapter. We characterized the
OpenSPARC T2 processor core in different monolithic designs and compared their
footprint area, wirelength, power, and temperature values. We showed, via simula-
tions, that HM designs offer interesting trade-offs among different design objectives,
such as footprint area, wirelength, and power consumption. We showed that, relative
to the 2-D design, a 3-D HM design can reduce footprint area by 48.1% and power
consumption by 14.6%.
Chapter 6
McPAT-monolithic: An
Area/Power/Timing Framework
for 3-D Hybrid Monolithic
Multi-Core Systems
3-D ICs have the potential to push Moore’s law further by accommodating more
transistors per unit footprint area along with a reduction in power consumption,
interconnect length, and the number of repeaters. Monolithic 3-D integration is
particularly promising in this regard as it offers a very high connectivity between
vertical transistor layers owing to its nanoscale MIVs. An HM design aims to further
optimize area, power, and performance of the chip by combining different monolithic
styles. In this chapter, we introduce McPAT-monolithic, a framework for modeling
HM multi-core architectures. We use the OpenSPARC T2 processor as a case study
to compare different monolithic implementation styles and explore the benefits of HM
design. Our simulations show that, under the same timing constraint, an HM design
offers 47.2% reduction in footprint area and 5.3% in power consumption compared to
a 2-D design at the cost of slightly higher on-chip temperature [28].
6.1 Introduction
Device scaling has become more challenging due to lithographic constraints, increasing
manufacturing costs, and worsening SCEs [6]. Moreover, the contribution of
interconnects to the delay and power consumption of the chip has been increasing
with scaling [75, 77]. 3-D ICs offer a new approach to fit more transistors in
the same footprint area while decreasing power consumption and delay by reducing
interconnect lengths [19]. Compared to other 3-D IC solutions, monolithic 3-D inte-
gration offers better performance and power efficiency due to its high-density MIVs
[22]. In addition, sequential processing of transistor layers in monolithic integration
introduces fewer parasitics and offers better alignment.
An HM design combines multiple monolithic integration styles to explore trade-
offs among chip footprint area, power consumption, and performance. In Chapter 5,
we introduced 3-D-HMFP, a 3-D HM floorplanner, to explore the HM design space
at the processor core level. However, modern processors consist of multiple cores,
multi-level caches, and a network-on-chip (NoC) to enable communication among
these modules. In this chapter, we present McPAT-monolithic, a framework for HM
design at the multi-core level.
We build McPAT-monolithic atop McPAT, an area/power/timing analysis
framework for multi-core designs implemented in planar technologies [97]. McPAT-
monolithic uses FinFET technology to implement different monolithic styles. We
develop the tools necessary to explore the HM design space and integrate them into
McPAT-monolithic. McPAT-monolithic uses FinPrin-monolithic, CACTI-monolithic,
and Orion-monolithic to model logic, memory, and NoC modules, respectively. All
the tools were evaluated based on the same 14 nm SOI FinFET design library we
created via TCAD device simulations [59]. We use 3-D-HMFP for processor core
floorplanning and HotSpot 6.0 [81], a thermal analysis tool, to compute the on-chip
temperature. We use the OpenSPARC T2 SoC as a case study to compare the
different monolithic styles, including HM at the multi-core level. We show that an
HM design reduces the footprint area by 47.2% and power consumption by 5.3%
compared to a conventional 2-D design at the cost of increasing the peak temperature
by 8 ◦C, assuming the same clock frequency of 1.4 GHz [28].
We make the following contributions.
1. We introduce McPAT-monolithic for a speedy exploration of different monolithic
designs at the multi-core system level.
2. We integrate 3-D-HMFP into McPAT-monolithic for floorplanning of HM de-
signs and computing global interconnects.
3. We analyze different monolithic implementations of OpenSPARC T2 at the
processor core and SoC levels and discuss the benefits of HM designs.
The rest of the chapter is organized as follows. Section 6.2 describes the simulation
setup and design flow. Section 6.3 describes the modeling of FUs using FinPrin-
monolithic. Section 6.4 describes characterization of memory modules using CACTI-
monolithic. Section 6.5 describes Orion-monolithic, an NoC area and power modeling
tool. Section 6.6 introduces McPAT-monolithic, a multi-core architecture modeling
framework. Section 6.7 presents the simulation results. Section 6.8 concludes the
chapter.
6.2 Simulation setup
Fig. 6.1 shows the HM SoC design flow we use. First, we characterize FinFET logic
and memory cells through 2-D hydrodynamic mixed-mode device simulations using
Sentaurus Device Simulator [83]. We use the Magic VLSI layout tool [82] for BLM,
GLM, and TLM cell layouts. We generate the FinFET design library based on the
cell area, power, and timing characteristics. The library has INV, NAND, NOR (sizes
1×, 2×, 4×, 8×, and 16×), a D flip-flop (DFF), and a 6T SRAM cell.
We use McPAT-monolithic to model area, power, and timing of the multi-core
systems. It uses Orion-monolithic for NoC and CACTI-monolithic for cache/memory
modeling. It uses macromodels derived for FUs obtained through FinPrin-monolithic.
We feed area and power consumption of the processor core modules obtained from
McPAT-monolithic to 3-D-HMFP for floorplanning of the processor core. Lastly, we
use HotSpot 6.0 [81] to compute the temperature of the chip. If the final temperature
of a module computed by HotSpot differs from its initially assumed temperature,
power is re-computed until the temperature becomes consistent. We make the
following assumptions in our simulations.
1. All 3-D designs have two transistor layers.
2. The top- and the bottom-layer transistors have the same behavior. If the top-
layer transistors are processed at a lower temperature to sustain the quality of
bottom-layer transistors, they may have slightly different characteristics. How-
ever, in [94], it was shown that the performance of top-layer transistors that are
processed at a lower temperature can match the performance of bottom-layer
transistors, which is the basis for our assumption.
3. The power and timing characteristics of the BLM, GLM, and TLM cells are the
same as they exhibit almost identical behavior at the cell level when we assume
the same transistor model on both layers.
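The power/temperature consistency loop in the flow above can be sketched as a fixed-point iteration; the linear leakage and thermal models below are invented placeholders, not HotSpot's or McPAT-monolithic's actual models:

```python
# Sketch of the power/temperature consistency loop. Leakage grows with
# temperature, and temperature grows with power, so the two are iterated
# until they agree. All model constants are assumed.

def power_at(temp_k):
    dynamic = 100.0                           # uW, temperature-independent
    leakage = 20.0 + 0.5 * (temp_k - 300.0)   # uW, grows with temperature
    return dynamic + leakage

def temperature_of(power_uw):
    return 300.0 + 0.1 * power_uw  # K, ambient plus thermal-resistance term

def converge(t0=300.0, tol=0.01, max_iters=100):
    temp = t0
    for _ in range(max_iters):
        new_temp = temperature_of(power_at(temp))
        if abs(new_temp - temp) < tol:  # consistent: stop iterating
            return new_temp
        temp = new_temp
    return temp
```

Because the feedback here is a contraction (each degree of temperature adds far less than a degree's worth of extra heating), the iteration settles in a few steps.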
[Figure omitted: design-flow diagram connecting Sentaurus TCAD, Magic VLSI, the
FinFET design library, FinPrin-monolithic, CACTI-monolithic, Orion-monolithic,
McPAT-monolithic, 3-D-HMFP, and HotSpot 6.0.]
Figure 6.1: The HM SoC design flow.
6.3 Modeling of functional units
We use FinPrin-monolithic to characterize various FUs, such as arithmetic-logic unit
(ALU), multiplier (MUL), and floating-point unit (FPU). We characterized six FUs of
the OpenSPARC T2 core using FinPrin-monolithic and integrated them into McPAT-
monolithic. These units are ALU, FPU, gasket unit (GKT), pick unit (PKU), perfor-
mance monitoring unit (PMU), and trap logic unit (TLU). The remaining modules
of the OpenSPARC T2 processor core are modeled using McPAT-monolithic. We
assume a 1.4 GHz clock frequency and a temperature of 330 K.
Table 6.1 shows FinPrin-monolithic results (footprint area and power consump-
tion) for the six FUs implemented in different monolithic styles. In all, TLM modules
have 35.7% and 5.3% reduction in the footprint area and power consumption, respec-
Table 6.1: Footprint area and power consumption results for FUs

Functional  BLM              TLM                              GLM
unit        Area    Power    Area            Power            Area            Power
            (µm2)   (µW)     (µm2)           (µW)             (µm2)           (µW)
ALU         9078    109822   6309 (-30.5%)   109728 (-0.1%)   4879 (-46.3%)   104829 (-4.5%)
FPU         23866   210513   15203 (-36.3%)  195861 (-7.0%)   11696 (-51.0%)  181329 (-13.9%)
GKT         3029    3182     1933 (-36.2%)   2971 (-6.6%)     1532 (-49.4%)   2868 (-9.8%)
PKU         4068    7145     2618 (-35.6%)   6669 (-6.7%)     2051 (-49.6%)   6372 (-10.8%)
PMU         2622    2787     1704 (-35.0%)   2568 (-7.9%)     1324 (-49.5%)   2533 (-9.1%)
TLU         18614   27367    11663 (-37.3%)  23725 (-13.3%)   9014 (-51.6%)   22164 (-19.0%)
Total       61277   360816   39430 (-35.7%)  341522 (-5.3%)   30497 (-50.2%)  320094 (-11.3%)
tively, compared to BLM modules. The area reduction is as expected since the TLM
logic cells we use have a similar footprint area reduction compared to 2-D cells [27].
The reduction in power consumption differs slightly from one logic module to another
based on how interconnects contribute to power consumption. GLM modules offer
around 50% footprint area reduction compared to BLM modules. The area reduction
is sometimes larger than 50% due to the reduction in the number of repeaters that
are needed for long interconnects. The total power consumption of GLM modules is
11.3% smaller than that of BLM modules owing to shorter interconnects and fewer
repeaters.
6.4 Modeling of memory modules
McPAT-monolithic uses CACTI-monolithic to model BLM and TLM memory mod-
ules. For the OpenSPARC T2 core, McPAT-monolithic characterizes 20 modules
that include the instruction cache, data cache array, integer register file, floating-
point register file, data translation lookaside buffer, instruction translation lookaside
buffer, and miss/fill/prefetch buffers associated with instruction and data caches.
Table 6.2 shows the total footprint area and power consumption of these memory
modules implemented in BLM and TLM. In all, TLM memory modules offer 38.1%
reduction in footprint area with respect to BLM modules, mainly because the TLM
SRAM cell is 43.9% smaller than the BLM SRAM cell [26]. TLM memory modules
have 12.8% smaller power consumption owing to shorter interconnects that are used
to route data inside the memory module.
Table 6.2: Memory modules in BLM vs. TLM
              BLM      TLM      Reduction (%)
Area (µm2)    165766   102641   38.1
Power (µW)    140252   122361   12.8
6.5 Orion-monolithic
We have built Orion-monolithic atop Orion 2.0 [98] and integrated it into McPAT-
monolithic. It characterizes the power and area of NoCs implemented in BLM and
TLM styles. Fig. 6.2 shows the Orion-monolithic simulation flow. It models various
NoC components, such as the crossbar, arbiter, buffers, and the clock/link based on
the NoC model, technology parameter values, and FinFET design library.
Table 6.3 shows the Orion-monolithic results for the OpenSPARC T2 cache cross-
bar (CCX). CCX connects the processor cores to the L2 cache banks. It is an 8 (T2
cores) by 9 (8 L2 banks + IO) matrix crossbar. When implemented in TLM, it has a
35.3% smaller footprint area and 12.7% lower power consumption. The power saving
is due to shorter interconnects in the crossbar and buffer.
Table 6.3: Orion-monolithic results
NoC         BLM               TLM               TLM vs. BLM
component   Area     Power    Area     Power    Area red.  Power red.
            (µm2)    (µW)     (µm2)    (µW)     (%)        (%)
Crossbar    504282   83127    324181   72241    35.7       13.1
Arbiter     36356    11909    23372    11909    35.7       0.0
Buffer      5334     17890    3606     14096    32.4       21.2
Clock       7121     5158     6741     4885     5.3        5.3
Total       553093   118083   357900   103131   35.3       12.7
[Figure omitted: Orion-monolithic flow from the NoC configuration (crossbar type,
input/output ports, flit width, technology) and FinFET design library through the
crossbar, arbiter, buffer, and clock/link models to area/power estimates.]
Figure 6.2: The Orion-monolithic simulation flow.
6.6 McPAT-monolithic
We build McPAT-monolithic atop McPAT [97], a framework for modeling multi-core
architectures. McPAT-monolithic models the processor in a hierarchical manner, as
shown in Fig. 6.3. It starts from the low-level circuits and models the architecture
in a bottom-up fashion. We have enhanced McPAT in the following ways to obtain
McPAT-monolithic.
• Integrating a FinFET design library characterized via TCAD simulations rather
than relying on parameter values scaled from prior technology nodes.
• Adding support for FinFET libraries by extending capacitance and resistance
models. FinFETs suffer from the width quantization issue that forces a FinFET
to only have an integer number of fins. Thus, capacitance models that are
implemented for a planar technology are not applicable to FinFET technology.
• Updating delay models based on the TCAD device simulations.
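The width-quantization constraint noted above can be illustrated with a small sketch; the fin dimensions and per-unit capacitance are assumed round numbers, not values from our 14 nm library:

```python
import math

# Illustrative sketch of width quantization: a FinFET's effective width is an
# integer number of fins times a fixed per-fin width, so gate capacitance
# scales in discrete steps rather than continuously. Constants are assumed.

FIN_HEIGHT_NM = 42.0
FIN_THICKNESS_NM = 8.0
W_FIN_NM = 2 * FIN_HEIGHT_NM + FIN_THICKNESS_NM  # effective width per fin
C_GATE_PER_NM = 0.05  # gate capacitance per nm of effective width (assumed, fF)

def quantize_width(target_width_nm):
    """Round a continuous target width up to a whole number of fins."""
    fins = max(1, math.ceil(target_width_nm / W_FIN_NM))
    return fins, fins * W_FIN_NM

def gate_cap_ff(fins):
    return fins * W_FIN_NM * C_GATE_PER_NM

# A planar capacitance model could use 150 nm directly; a FinFET model must
# snap to 2 fins (184 nm effective width) here.
fins, w_eff = quantize_width(150.0)
```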
In this work, we model the processor cores, L2 cache, and NoC. The components of
a processor core modeled in McPAT-monolithic are the instruction fetch unit (IFU),
execution unit (EXE), load store unit (LSU), memory management unit (MMU), and
FUs such as the ALU, FPU, and MUL. We were unable to model some SoC components,
such as the memory controller, for which no macromodel was available. For such
components, McPAT uses empirical data from earlier technology nodes and scales
it to newer technologies. We have excluded these modules from our analysis since
we use a 14 nm FinFET technology, and scaling empirical values from a prior planar
technology would not be accurate.
[Figure omitted: modeling hierarchy from the processor (cores: IFU, EXE, LSU,
MMU; FUs: ALU, FPU, MUL) through the cache (decoder, SRAM array, wire) to the
NoC (crossbar, arbiter, buffer, link).]
Figure 6.3: Hierarchical modeling in McPAT-monolithic.
6.7 Results
We characterize the OpenSPARC T2 [93] processor cores, L2 cache, and NoC us-
ing four different implementations: 2-D, BLM, TLM, and HM. Table 6.4 shows the
OpenSPARC T2 processor configuration and technology parameter values. We run
the simulations on a 3.10 GHz machine with 64-bit quad-core Intel i5 processor, 8
GB DRAM, and Ubuntu 12.04 LTS operating system.
Table 6.4: Processor model parameter values

Parameter              Value
Processor model        OpenSPARC T2
Processor type         In-order
Number of cores        8
Number of threads      4
Instruction width      32
L2 cache               4 MB (eight 512 KB banks)
L1 instruction cache   16 KB
L1 data cache          8 KB
Technology             14 nm SOI FinFET
Frequency              1.4 GHz
Supply voltage         0.8 V
Table 6.5: Comparison of different monolithic designs based on minimum area-power product

Design  Area     Power    Wirelength  Dead       Temp.  Area      Power     Wirelength
        (µm2)    (µW)     (µm)        space (%)  (K)    red. (%)  red. (%)  red. (%)
2-D     249864   383780   662232      1.1        331     0         0         0
BLM     126228   377604   451531      2.1        341    49.5       1.6      31.8
TLM     155727   354453   511426      0.5        338    37.7       7.6      22.8
HM      136157   359895   458939      2.7        339    45.5       6.2      30.7
6.7.1 Floorplanning results of the OpenSPARC T2 core
Table 6.5 shows the floorplanning results for a single OpenSPARC T2 processor core.
It presents the footprint area, global wirelength, power consumption, dead space,
and peak temperature of the design. Area and power values of the core modules are
obtained using McPAT-monolithic and fed into 3-D-HMFP for floorplanning. Power
consumption consists of runtime dynamic and leakage power. Global interconnects are
added among modules during floorplanning. For each design, we run 100 simulations
and select the floorplan that minimizes the area-power product since both parameters
are important for a good design.
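The selection rule can be sketched in a few lines; the `(area, power)` tuples below are made-up results standing in for actual 3-D-HMFP runs:

```python
# Sketch of the selection rule described above: run the floorplanner many
# times and keep the solution with the smallest area-power product, which
# balances the two objectives multiplicatively.

def best_by_area_power(runs):
    """runs: iterable of (area, power) results from independent floorplan
    runs; returns the one minimizing their product."""
    return min(runs, key=lambda r: r[0] * r[1])

runs = [(120000, 130000), (100000, 150000), (110000, 125000)]
print(best_by_area_power(runs))  # (110000, 125000)
```

The multiplicative form penalizes a design that is extreme in either objective, unlike a weighted sum, which can be dominated by whichever objective has the larger coefficient.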
The 2-D design has the largest footprint area and the lowest peak temperature.
Although it has the highest power consumption among all designs, it has the smallest
power density due to its larger footprint area. This leads to smaller on-chip temper-
ature values. The 2-D design forms the baseline.
The BLM design offers a 49.5% reduction in footprint area along with 1.6% lower
power consumption owing to the 31.8% reduction in global wirelength. Due to a
higher power density, its peak temperature is higher by 10 ◦C compared to that of
the baseline.
The TLM design has 37.7%, 7.6%, and 22.8% reduction in the footprint area,
power consumption, and wirelength, respectively. It offers higher power reduction
compared to the BLM design since it benefits from intra-module interconnect length
reduction. However, its global wirelength reduction is smaller than that of the BLM
design due to a higher footprint area.
The HM design, which combines modules that are implemented in all different
monolithic styles, offers a 45.5% footprint area reduction. The 6.2% power reduction
is not as much as that of the TLM design since the BLM modules used in the HM
design consume more power than TLM modules. The HM design uses
only three GLM modules out of the six GLM modules available. Although GLM
modules have a smaller footprint area and power consumption compared to other
implementations, the floorplanner may not use all of them in order to balance the
areas of the two layers.
Fig. 6.4 shows the floorplans of the OpenSPARC T2 core implemented in 2-D,
BLM, TLM, and HM, whose results were presented in Table 6.5. The 2-D design is
implemented on a single layer, whereas monolithic 3-D designs are implemented on
two (top and bottom) layers. The TLM design has the same floorplan on both layers
since all modules are implemented in 3-D. The HM design has modules implemented
in all three monolithic styles. Fig. 6.4 shows that GLM and TLM modules have the
same footprint on the bottom and top layers.
Figure 6.4: OpenSPARC T2 floorplanning results: (a) 2-D, (b) BLM, (c) TLM, and
(d) HM. Colors indicate the implementation style: blue: BLM, brown: TLM, and
green: GLM.
6.7.2 The OpenSPARC T2 SoC results
Table 6.6 shows the area and power consumption values of the SoC components im-
plemented in BLM and TLM. L2 cache and CCX results are obtained directly from
McPAT-monolithic. We do not have a GLM case since most of the modules in the
processor are not modeled in GLM. The area and power consumption of the processor
cores are corrected by 3-D-HMFP after floorplanning since McPAT-monolithic does
not obtain a floorplan and ignores global interconnects. In all, the TLM implemen-
tation offers a 36.8% and 8.0% reduction in footprint area and power consumption,
Table 6.6: Area and power results for the SoC components implemented in BLM and TLM

           BLM             TLM             TLM vs. BLM
Component  Area    Power   Area    Power   Area red.  Power red.
           (mm2)   (W)     (mm2)   (W)     (%)        (%)
Cores      2.00    2.98    1.25    2.75    37.7        7.9
L2         2.94    0.82    1.87    0.76    36.5        7.7
CCX        0.55    0.12    0.36    0.10    35.3       12.7
Total      5.49    3.92    3.47    3.61    36.8        8.0
respectively. The reduction in power consumption is the largest for CCX since its
power is dominated by interconnects in the crossbar and buffer.
Fig. 6.5 shows the SoC floorplans we generated for the different designs: 2-D,
BLM, TLM, and HM. We obtain the processor core floorplan from 3-D-HMFP and
floorplan the SoC in a similar fashion to the original OpenSPARC T2 chip. We assume
the dimensions of the L2 cache and CCX are flexible (however, their area remains the
same) for the sake of a fair area comparison among different designs. The 2-D design
only uses 2-D modules. The BLM SoC design consists of the cores that are imple-
mented in 3-D BLM (Fig. 6.4(b)), 2-D L2 cache, and 2-D CCX modules. In the BLM
SoC design, we split the L2 cache banks unequally between the two layers in order to
achieve a balanced area. The TLM SoC design assumes all modules are implemented
in TLM. The HM SoC design contains the cores that are implemented in HM by
3-D-HMFP, as shown in Fig. 6.4(d). It assumes that the L2 cache is implemented in
BLM for area efficiency and CCX in TLM for lower power consumption. Table 6.7
shows the overall footprint area and power consumption results for all SoC designs.
The 2-D SoC design has the highest footprint area and power consumption. It forms
the baseline. The BLM SoC design has 49.7% and 1.4% reduction in footprint area
and power consumption, respectively, with respect to the 2-D baseline. The reduction
in power consumption comes from shorter global interconnects. The TLM SoC design
offers 36.8% reduction in footprint area along with 8.0% lower power consumption. It
[Figure omitted: per-layer SoC floorplans showing the eight cores, the L2
tag/data banks, and the CCX on the top and bottom layers of each design.]
Figure 6.5: OpenSPARC T2 SoC floorplans: (a) 2-D, (b) BLM, (c) TLM, and (d) HM
(HM core + BLM L2 + TLM CCX).
benefits from a reduction in both global and intra-module interconnect lengths. The
HM SoC design has 47.2% smaller footprint area and 5.3% lower power consumption
with respect to the 2-D design. The HM2 SoC design uses an L2 cache implemented
in TLM to increase the power savings to 6.9%. However, the area reduction for the
HM2 design is a smaller 39.5%. This shows how HM designs can offer trade-offs among
different design objectives.
Table 6.7: Total area and power consumption results for the SoC designs
Design                    Area    Power   Area           Power
                          (mm2)   (W)     reduction (%)  reduction (%)
2-D                       5.49    3.92     0.0           0.0
BLM                       2.76    3.86    49.7           1.4
TLM                       3.47    3.61    36.8           8.0
HM (BLM L2 + TLM CCX)     2.90    3.71    47.2           5.3
HM2 (TLM L2 + TLM CCX)    3.32    3.65    39.5           6.9
Fig. 6.6 shows the heat maps of the SoC designs. As expected, 3-D designs have
higher temperatures (by 7-10 ◦C) than the 2-D design. The SoC implemented in TLM
has the highest peak temperature. The top layers have higher temperatures than the
bottom layers since they are farther away from the heat sink. They have around 1-2
◦C higher peak temperature values compared to the bottom layers.
[Figure omitted: heat maps on a 324-336 K color scale.]
Figure 6.6: OpenSPARC T2 heat maps: (a) 2-D, (b) BLM, (c) TLM, and (d) HM
(HM core + BLM L2 + TLM CCX).
6.7.3 Discussion
We have shown, via simulations, that HM designs can offer various trade-offs among
design objectives, such as footprint area, power consumption, and on-chip tempera-
ture. This section presents some key observations on these trade-offs.
Our power consumption results consist of runtime dynamic and leakage power.
Runtime dynamic power of a module depends on its activity factor. Thus, power
reduction of a 3-D design is determined by the number of calls to its modules. For
example, the FPU implemented in GLM has a 13.9% power reduction, whereas the
ALU implemented in GLM has only a 4.5% power reduction compared to the 2-D
implementations. However, the impact of an ALU on runtime dynamic power may
be larger than that of the FPU if there are significantly more arithmetic operations
than floating-point operations. Thus, the power savings of a 3-D FPU with respect
to a 2-D FPU may not be significant at runtime if the FPU is rarely used. Hence,
the power benefit of a 3-D design depends strongly on the benchmark.
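A small sketch makes the benchmark dependence concrete; all module power numbers and activity factors below are invented for illustration, not measured values:

```python
# Total runtime power weights each module's dynamic power by its activity
# factor, so a 3-D module's power savings matter only if the benchmark
# actually exercises that module. All numbers below are assumed.

def runtime_power(modules, activity):
    """modules: {name: (dynamic_power_at_full_activity, leakage)};
    activity: {name: fraction of cycles the module is active}."""
    return sum(act * modules[m][0] + modules[m][1]
               for m, act in activity.items())

modules_2d = {"ALU": (100.0, 10.0), "FPU": (200.0, 20.0)}
modules_3d = {"ALU": (95.0, 10.0), "FPU": (172.0, 20.0)}  # FPU saves more

int_heavy = {"ALU": 0.9, "FPU": 0.05}  # integer benchmark: FPU rarely used
fp_heavy = {"ALU": 0.3, "FPU": 0.7}    # floating-point-heavy benchmark

# The 3-D saving is far larger under the FP-heavy benchmark, even though the
# per-module savings are identical in both cases.
saving_int = runtime_power(modules_2d, int_heavy) - runtime_power(modules_3d, int_heavy)
saving_fp = runtime_power(modules_2d, fp_heavy) - runtime_power(modules_3d, fp_heavy)
```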
McPAT-monolithic cannot model modules in the GLM style except the FUs,
which can be characterized by FinPrin-monolithic. Thus, the power reduction of
the OpenSPARC T2 hybrid core design in this work is not as significant as it could
be if there were more GLM modules. In Chapter 5, we showed that an HM design
can reduce the power consumption of the T2 core by 14.6% since most logic modules
were implemented in GLM. However, in this work the reduction in power consump-
tion of the HM core is only 6.2% because fewer modules are implemented in GLM.
It is challenging to design a tool that is capable of modeling GLM modules at the
architecture level since it requires circuit-level 3-D gate placement. Adding GLM
modeling capability can enable McPAT-monolithic to use GLM modules more and
explore a larger HM design space.
Although we have only considered 3-D designs with two layers, monolithic designs
can be implemented on more than two layers. However, the temperature will become
a more pressing concern since the peak temperature increases by around 10 ◦C just by
adding a second transistor layer, despite the lower power consumption.
We do not use an RTL-to-GDSII physical design flow in our simulations. We use
tools that are fast in order to explore the HM design space and different architectures
more quickly. However, our tools are not as accurate as those used in
an RTL-to-GDSII design flow. After the design parameters are decided using the
McPAT-monolithic framework, an RTL-to-GDSII physical design flow can be used
for more accurate simulation results.
6.8 Chapter summary
We introduced McPAT-monolithic, an area/power/timing modeling tool for 3-D
monolithic multi-core architectures. We used it to model the OpenSPARC T2 SoC
implemented in different monolithic design styles. We demonstrated that an HM
design consisting of modules implemented in different monolithic styles can reduce
the footprint area and power consumption by 47.2% and 5.3%, respectively, compared
to the 2-D design at the cost of a higher on-chip temperature.
Chapter 7
Conclusion and Future Work
This chapter summarizes the findings of this thesis and discusses future directions.
7.1 Summary of findings
In Chapter 3, we explored the use of MPA FinFETs for low-power, robust, and
dense SRAM cell design. We investigated FinFETs with asymmetries in gate work-
function, source/drain doping concentration, gate underlap, and their combinations.
We showed that asymmetry in gate workfunction combined with asymmetry in dop-
ing concentration can reduce the leakage power by 58× and mitigate the read-write
conflict. We showed MPA FinFETs can also be effective at addressing the width-
quantization issue since the strength of a FinFET can be adjusted by introducing
asymmetries while keeping the fin count the same. Thus, different effective pull-
down-to-access and access-to-pull-up strength ratios can be obtained to achieve good
stability metrics even with single-fin FinFETs. Use of
MPA FinFETs, however, often degrades SRAM performance due to the weaker
FinFETs. In addition, adding asymmetries increases the number of fabrication steps,
which can increase manufacturing cost and degrade yield.
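The width-quantization argument above can be illustrated numerically: with symmetric FinFETs, drive strength scales with integer fin count, so the pull-down-to-access ratio is quantized, whereas an asymmetry that scales a single-fin transistor's strength by a continuous factor unlocks intermediate ratios. The 0.7× access-transistor strength factor below is a hypothetical value, not measured device data.

```python
# Sketch of width quantization: symmetric devices reach only integer
# fin-count ratios; an asymmetry-weakened single-fin access transistor
# (hypothetical 0.7x strength) reaches an intermediate ratio with no
# change in fin count.

def cell_ratio(pd_fins, ax_fins, pd_factor=1.0, ax_factor=1.0):
    """Effective pull-down-to-access strength ratio."""
    return (pd_fins * pd_factor) / (ax_fins * ax_factor)

# Symmetric devices with 1-3 pull-down fins and 1-2 access fins.
quantized = sorted({cell_ratio(pd, ax) for pd in (1, 2, 3) for ax in (1, 2)})
print(quantized)        # [0.5, 1.0, 1.5, 2.0, 3.0]

# MPA device: single-fin pull-down and single-fin weakened access transistor.
mpa = cell_ratio(1, 1, ax_factor=0.7)
print(round(mpa, 2))    # 1.43 -- unreachable with the symmetric fin counts above
```

A ratio of ~1.43 between the integer-quantized points is exactly the kind of tuning that lets single-fin cells meet stability targets without adding fins.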
In Chapter 4, we presented two new 3-D 8T SRAM cells for enhanced TR and low
ILEAK. We used pFinFET access transistors to achieve an area-efficient design in 3-D
by equalizing the number of nFinFETs and pFinFETs in the 8T SRAM cell. However,
use of pFinFET access transistors degrades writeability of the proposed cells. Thus,
in one of the proposed cells, we employed IG pull-up transistors with their back gate
tied to VDD to improve the degraded writeability. This cell has 28.1%, 31.6%, and
53.2% smaller footprint area, ILEAK, and TR, respectively, compared to a conventional
2-D 6T SRAM cell. It also has 43.8%, 43.2%, and 29.0% reduction in footprint area,
ILEAK, and TR, respectively, compared to a conventional 2-D 8T SRAM cell. We
investigated various assist techniques and showed cell-GND-boosting can be used to
improve the writeability of the proposed cells. We also investigated SRAM cells for
different design objectives such as high stability, high performance, or low leakage.
The proposed cells are shown to be particularly promising for high-read-performance
and low-leakage purposes.
In Chapter 5, we investigated the benefits of HM designs. We developed tools
needed to model logic and memory modules implemented in different monolithic styles
such as BLM, GLM, and TLM. We developed an effective and fast gate-level place-
ment method for GLM logic modules. We also presented 3-D-HMFP, the first 3-D
HM floorplanner, to explore the HM design space. We showed an HM design can
reduce the footprint area and power consumption of the OpenSPARC T2 processor
core by 48.1% and 14.6%, respectively, compared to a 2-D design.
In Chapter 6, we presented McPAT-monolithic, an architectural modeling frame-
work for HM designs at the multi-core level. We developed an NoC modeling tool
for BLM and TLM designs. We integrated 3-D-HMFP into McPAT-monolithic for
processor core floorplanning. We showed that an HM design can reduce the footprint
area of the OpenSPARC T2 SoC by 47.2% along with a 5.3% reduction in runtime
power consumption with respect to its 2-D counterpart.
7.2 Future work
In Chapter 3, we only consider FinFETs with asymmetries in gate workfunction,
doping concentration, gate underlap, and their combinations. However, there may be
other ways to introduce asymmetry in FinFET-based SRAM cells, such as using
FinFETs with different fin heights. Multiple-fin-height FinFETs can be useful for
addressing the width quantization issue. The same methodology we employ in
Chapter 3 can be used even if more asymmetries are introduced. The fabrication cost
of introducing asymmetries needs to be investigated in order to make a better
comparison among symmetric, SPA, and MPA FinFET-based SRAM cells. Another
future direction can
be investigating assist techniques for MPA FinFET-based SRAM cells to improve the
degraded performance.
In Chapter 4, we chose a thick ILD to prevent inter-layer coupling. A thin ILD
can induce additional variations and alter device characteristics. The impact of inter-
layer coupling on the proposed 8T SRAM cells can be investigated in the future. Use
of IG FinFETs in 3-D SRAM cells can also be explored in the presence of inter-layer
coupling, as they enable leakage management by modifying Vth dynamically.
For monolithic 3-D integration, we only considered designs with two transistor
layers. Designs with three or more transistor layers can be explored as they can reduce
interconnect length and power consumption even further. However, the temperature
increase can be a problem with these designs. Novel cooling techniques capable of
reducing temperatures of monolithic 3-D designs can be investigated in the future.
Modeling of memory modules implemented in GLM can be an interesting research
topic as they occupy the largest area on the chip. Folding can be a possible way to
model GLM 3-D memory modules since they have a regular structure. Adoption of
monolithic 3-D integration can alleviate the memory wall problem by providing high
connectivity between the processor core and on-chip memory. Thus, 3-D designs
with on-chip memory can be explored to bridge the communication gap between
the processor and main memory. Finally, heterogeneous monolithic 3-D designs that
integrate different components, such as digital and analog ICs, micro-electro-mechanical
systems, and sensors can be investigated in the future.
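The folding idea for regular memory arrays can be sketched as follows, under stated assumptions: rows are split evenly across transistor layers, so footprint and bitline length shrink roughly with the layer count while wordline length is unchanged. The array size and cell dimensions below are hypothetical placeholders, and `fold_array` is an illustrative helper, not a tool from this thesis.

```python
# Minimal sketch of folding a regular 2-D memory array row-wise onto
# multiple transistor layers. Cell dimensions (um) are hypothetical.

def fold_array(rows, cols, layers, cell_w=0.1, cell_h=0.2):
    """Return (footprint, wordline_len, bitline_len) after row-wise folding."""
    rows_per_layer = -(-rows // layers)       # ceiling division
    footprint = rows_per_layer * cell_h * cols * cell_w
    wordline = cols * cell_w                  # runs along a row: unchanged
    bitline = rows_per_layer * cell_h         # runs along a column: shortened
    return footprint, wordline, bitline

area_2d, wl_2d, bl_2d = fold_array(256, 128, layers=1)   # 2-D baseline
area_3d, wl_3d, bl_3d = fold_array(256, 128, layers=2)   # folded onto two layers

print(f"footprint: {area_3d / area_2d:.2f}x, bitline: {bl_3d / bl_2d:.2f}x")
# With two layers, footprint and bitline length are roughly halved.
```

Shorter bitlines are also what would drive the delay and power estimates of a folded GLM memory model, which is why the regular structure makes this style of modeling tractable.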
We have used 22nm and 14nm technology nodes in our experiments. The benefits
of the proposed designs need to be investigated for smaller nodes as scaling contin-
ues. Although we expect to see similar benefits at smaller nodes, the improvements
obtained by the proposed designs may differ. For example, we showed that AWDSG
FinFETs can lower SRAM ILEAK at 22nm by 58× for the given parameter values. It is
expected that AWDSG FinFETs will lower SRAM ILEAK at 14nm as well since both
asymmetries reduce ILEAK. However, the improvement may be different at the 14nm
node based on the design parameters. Similarly, the benefits of TLM SRAMs may
differ at smaller nodes, especially depending on the relative strengths of nFinFETs
and pFinFETs.
Thus, the proposed designs need to be re-evaluated based on the targeted node.