FinFET-based SRAM and Monolithic 3-D Integrated Circuit Design€¦ · 3-D Integrated Circuit...

FinFET-based SRAM and Monolithic

3-D Integrated Circuit Design

Abdullah Guler

A Dissertation

Presented to the Faculty

of Princeton University

in Candidacy for the Degree

of Doctor of Philosophy

Recommended for Acceptance

by the Department of

Electrical Engineering

Adviser: Professor Niraj K. Jha

September 2019

c© Copyright by Abdullah Guler, 2019.

All rights reserved.

Abstract

Device miniaturization enabled processors to become faster and more powerful for

decades. However, device scaling became more challenging due to increasing leak-

age power consumption, intolerable short-channel effects (SCEs), and manufactur-

ing costs. This thesis aims to develop newer approaches for low-power and high-

performance designs for next generation computing technologies. It focuses on two

research directions: FinFET-based static random access memory (SRAM) design and

hybrid monolithic 3-D integrated circuit (IC) design.

The first research direction is to design area-efficient, low-power, and high-

performance SRAM cells. To this end, we investigate two approaches: multi-

parameter asymmetric (MPA) FinFET-based SRAM design and 3-D transistor-level

monolithic (TLM) SRAM design. In the first approach, we use FinFETs with up

to three asymmetries to address various SRAM challenges such as high leakage

power, read-write conflict, and width quantization issue at once. We present five

new 6T SRAM cells using MPA FinFETs and provide a comprehensive evaluation

of SRAM cells based on asymmetric FinFETs. We show MPA FinFETs can achieve

high stability metrics and reduce leakage power significantly at a cost of degraded

performance. We investigate TLM technology in the second approach of SRAM

design. In 3-D TLM design, n- and p-type transistors are fabricated on different

layers. Conventional 6T/8T SRAM cells have an area inefficiency when implemented

in 3-D due to the unequal number of n- and p-type transistors in the cell. We present

two new 3-D 8T SRAM cells that consist of four n-type and four p-type transistors

for better area efficiency. The proposed cells provide superior read performance and

lower leakage power consumption when compared to other 2-D/3-D SRAM cells at a

cost of degradation in writeability.

The second research direction of this thesis is to explore the benefits of mono-

lithic 3-D design from circuit to multi-core system level. 3-D ICs can address design

iii

challenges such as the interconnect bottleneck and memory wall. 3-D ICs reduce

power consumption, delay, and interconnect length by utilizing the vertical dimension.

Among 3-D IC solutions, monolithic 3-D technology appears to be very promising as

it provides the highest connectivity between transistor layers owing to its nanoscale

monolithic inter-tier vias (MIVs). Monolithic 3-D integration can be realized at dif-

ferent levels of granularity such as block, gate, and transistor. In this thesis, we focus

on hybrid monolithic (HM) designs, which combine modules implemented in different

monolithic styles to utilize their advantages. We develop the tools that are needed

to explore the HM design space. We develop a 3-D HM floorplanner, gate-level

placement methodology, and modeling tools for logic, memory, and NoC modules.

We integrate these tools into McPAT-monolithic, an area/timing/power architectural

modeling framework we develop for HM multi-core systems.

iv

Acknowledgments

First and foremost, I would like to thank my adviser Prof. Niraj K. Jha for his in-

valuable guidance and support during the past six years. I feel incredibly lucky to

have him as my adviser. I greatly appreciate his effort on helping me improve my

communication and writing skills, guiding me towards the right direction, and chal-

lenging me to do better. I would also like to thank my thesis reading and defense

committee for their time and invaluable feedback. I would like to thank National Sci-

ence Foundation for supporting this work under grants CCF-1318603, CCF-1714161,

and CCF-1811109.

I am indebted to all of my teachers who helped me throughout my academic

career. I would like to thank the incredible faculty in Princeton. I especially enjoyed

courses I have taken from Prof. Naveen Verma, Prof. James Sturm, and Prof. Niraj

Jha. I would like to thank Bilge Aslan, Nuri Yılmaz, Muhittin Siro, and many other

teachers whose names I could not list here.

I would like to thank my group mates Sourindra Chaudhuri, Debajit Bhattacharya,

Aoxiang Tang, Xianmin Chen, Arsalan Mosenia, Jie (Lucy) Lu, Xiaoliang Dai, Ye

(Fisher) Yu, Ozge Akmandor, Hongxu Yin, Shayan Hassantabar, Tanujay Saha, and

Prerit Terway for their support. I am also grateful to Ajay Bhoj, a former group

member whom I never met, but from whose work I learned so much.

I greatly appreciate Princetonian staff both inside and outside my department

for making campus life comfortable and enjoyable. I would especially like to thank

Colleen Conrad for helping me with all the logistics and administrative work. Life in

Princeton for me has been much easier thanks to people at Equad, Transportation

and Parking, Davis International Center, McCosh Health Center, and dining halls.

I would like to thank my friends for making campus life incredibly fun and mem-

orable. Yen, Levent, Chandra Kanth, Burcin, Onur, Li-Fang, Tugce, Yasin, Murat,

Tri, Ozge, Mert, and Chinmay are only a small fraction of people whose friendship I

v

enjoyed immensely. I am especially thankful to Lung Yen Chen for being the greatest

housemate ever. I would like to thank my lifelong friends Faruk Gencel and Denizcan

Vanlı for their support. I also thank Bayboga, Anıl, Serkan, and Ender for great

DotA games we enjoyed together to relieve the stress of PhD life.

I would finally like to thank my dad Abdulkadir, mom Rihıme, and siblings

Mıheme, Sanye, Xezal, Mihros, Melis, Isık, Semo, Inci, and Sekocan for their im-

measurable love and support. I am especially indebted to my parents and elder

siblings for the sacrifices they made to help younger ones get an education. I would

like to thank Melis for inspiring me to be a better person. Isık and Semo have not

only been my siblings but also my best friends. I have become more caring thanks

to my younger siblings Inci and Sekocan. I am also grateful to my dear nephews and

nieces for the joy and love they bring into my life.

My special thanks goes to Eiichiro Oda, the creator of One Piece. I have immensely

enjoyed the journey of Captain Monkey D. Luffy and Straw Hat Crew. I have also

learned many invaluable lessons from my favorite characters. As Portgas D. Ace once

said, we need to live a life with no regrets.

vi

Abbreviations

2-D Two-dimensional

3-D Three-dimensional

ADSG Asymmetric doping shorted-gate

AUSG Asymmetric underlap shorted-gate

AWSG Asymmetric workfunction shorted-gate

BEOL Back-end-of-line

BL Bitline

BLB Bitline bar

BLM Block-level monolithic

CMOS Complementary metal-oxide-semiconductor

DIBL Drain-induced barrier lowering

EDA Electronic design automation

FEOL Front-end-of-line

FU Functional unit

GLM Gate-level monolithic

GND Ground

HM Hybrid monolithic

IC Integrated circuit

IG Independent-gate

ILD Inter-layer dielectric

ILEAK Leakage current

IOFF Off-current

ION On-current

IREAD Read current

MIV Monolithic inter-tier via

MOSFET Metal-oxide-semiconductor field-effect transistorvii

MPA Multi-parameter asymmetric

NoC Network-on-chip

RBL Read bitline

RDF Random dopant fluctuation

RPNM Read power noise margin

RSNM Read static noise margin

RWL Read wordline

SCE Short-channel effect

SG Shorted-gate

SoC System-on-chip

SOI Silicon-on-insulator

SPA Single-parameter asymmetric

SRAM Static random access memory

TCAD Technology computer-aided design

TLM Transistor-level monolithic

TR Read time

TSV Through-silicon via

TW Write time

VDD Supply voltage

VTC Voltage transfer characteristics

Vth Threshold voltage

WL Wordline

WM Write margin

WTP Write trip power

viii

To my family.

ix

Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xvii

1 Introduction 1

1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.1.1 FinFETs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.1.2 FinFET-based SRAM design . . . . . . . . . . . . . . . . . . . 6

1.1.3 SRAM characterization . . . . . . . . . . . . . . . . . . . . . . 8

1.1.4 Monolithic 3-D integration . . . . . . . . . . . . . . . . . . . . 10

1.2 Thesis contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Related Work 15

2.1 SPA FinFET-based SRAM design . . . . . . . . . . . . . . . . . . . . 15

2.2 TLM 3-D SRAM design . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.3 3-D hybrid monolithic floorplanning . . . . . . . . . . . . . . . . . . . 19

2.4 Monolithic 3-D design . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3 Ultra-low-leakage, Robust FinFET SRAM Design Using Multi-

parameter Asymmetric FinFETs 23

x

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2 Simulation setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.2.1 SRAM dc metrics . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.2.2 SRAM transient metrics . . . . . . . . . . . . . . . . . . . . . 29

3.3 MPA FinFET-based 6T SRAM cells . . . . . . . . . . . . . . . . . . 30

3.3.1 Selection of promising SRAM cells . . . . . . . . . . . . . . . 30

3.3.2 SRAM dc metrics analysis . . . . . . . . . . . . . . . . . . . . 32

3.3.3 SRAM transient metrics analysis . . . . . . . . . . . . . . . . 36

3.4 Analysis of the SRAM cells under different gate workfunction, doping

concentration, supply voltage, and temperature values . . . . . . . . . 37

3.4.1 Different gate workfunction values . . . . . . . . . . . . . . . . 37

3.4.2 Different doping concentration values . . . . . . . . . . . . . . 41

3.4.3 Different supply voltage values . . . . . . . . . . . . . . . . . . 43

3.4.4 Different temperature values . . . . . . . . . . . . . . . . . . . 43

3.5 Process variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

3.7 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4 3-D Monolithic FinFET-based 8T SRAM Cell Design for Enhanced

Read Time and Low Leakage 53

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54


4.3 Design of monolithic SRAM cells . . . . . . . . . . . . . . . . . . . . 58

4.3.1 Schematics of the SRAM cells . . . . . . . . . . . . . . . . . . 58

4.3.2 Layouts of the SRAM cells . . . . . . . . . . . . . . . . . . . . 60

4.3.3 Capacitance extraction . . . . . . . . . . . . . . . . . . . . . . 61

4.4 Simulation results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

4.4.1 SRAM dc metric analysis . . . . . . . . . . . . . . . . . . . . 64

xi

4.4.2 SRAM transient metric analysis . . . . . . . . . . . . . . . . . 67

4.5 Impact of process variations, memory array configurations, assist tech-

niques, different temperatures, and gate workfunction values . . . . . 69

4.5.1 SRAM cell analysis under process variations . . . . . . . . . . 69

4.5.2 SRAM cell analysis under different memory array configurations 71

4.5.3 SRAM cell analysis under assist techniques . . . . . . . . . . . 73

4.5.4 SRAM cell analysis under different temperature values . . . . 76

4.5.5 SRAM cell analysis under different gate workfunction values . 77

4.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4.7 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

5 Hybrid Monolithic 3-D IC Floorplanner 82

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

5.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84


5.4 FinPrin-monolithic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

5.5 Gate-level monolithic placement . . . . . . . . . . . . . . . . . . . . . 91

5.6 CACTI-monolithic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

5.7 Hybrid-monolithic 3-D IC floorplanner . . . . . . . . . . . . . . . . . 97

5.7.1 Problem formulation . . . . . . . . . . . . . . . . . . . . . . . 98

5.7.2 T*-tree representation . . . . . . . . . . . . . . . . . . . . . . 98

5.7.3 Simulated annealing engine . . . . . . . . . . . . . . . . . . . 100

5.7.4 Cost function . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.7.5 Global wire power consumption . . . . . . . . . . . . . . . . . 101

5.8 HotSpot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5.9 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

5.9.1 Floorplanning results . . . . . . . . . . . . . . . . . . . . . . . 104

xii

5.9.2 Floorplanning results at minimum area, wirelength, and power

values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5.9.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.10 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

6 McPAT-monolithic: An Area/Power/Timing Framework for 3-D

Hybrid Monolithic Multi-Core Systems 113

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114


6.3 Modeling of functional units . . . . . . . . . . . . . . . . . . . . . . . 117

6.4 Modeling of memory modules . . . . . . . . . . . . . . . . . . . . . . 118

6.5 Orion-monolithic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

6.6 McPAT-monolithic . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

6.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

6.7.1 Floorplanning results of the OpenSPARC T2 core . . . . . . . 122

6.7.2 The OpenSPARC T2 SoC results . . . . . . . . . . . . . . . . 124

6.7.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

6.8 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

7 Conclusion and Future Work 129

7.1 Summary of findings . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

7.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

Bibliography 133

xiii

List of Tables

1.1 22nm and 14nm SOI FinFET device parameter values . . . . . . . . 5

3.1 22nm SOI asymmetric FinFET device parameter values . . . . . . . . 26

3.2 Numerical representation of FinFETs . . . . . . . . . . . . . . . . . . 27

3.3 SRAM cell elimination examples . . . . . . . . . . . . . . . . . . . . . 31

3.4 6T SRAM configurations . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.5 6T SRAM dc and transient metric values . . . . . . . . . . . . . . . . 33

3.6 SRAM stability metric values under different drain doping concentrations 43

3.7 Comparison of SRAM cells at iso-IREAD/iso-ILEAK . . . . . . . . . . . 44

3.8 SRAM metric values at 0◦C . . . . . . . . . . . . . . . . . . . . . . . 44

3.9 SRAM metric values at 65◦C . . . . . . . . . . . . . . . . . . . . . . 45

3.10 SRAM metric values at 90◦C . . . . . . . . . . . . . . . . . . . . . . 45


3.12 Distribution characteristics, µ . . . . . . . . . . . . . . . . . . . . . . 46

3.13 Distribution characteristics, σ . . . . . . . . . . . . . . . . . . . . . . 46

4.1 SRAM cell footprint area . . . . . . . . . . . . . . . . . . . . . . . . . 61

4.2 SRAM bitline and wordline capacitances . . . . . . . . . . . . . . . . 62

4.3 SRAM dc and transient metric values . . . . . . . . . . . . . . . . . . 65


4.5 Distribution characteristics of SRAM dc and transient metrics . . . . 70

xiv

4.6 Impact of assist techniques on read stability and writeability . . . . . 74

4.7 Impact of assist techniques on read current and transient metrics . . 75

4.8 Gate workfunction values for designs with high stability . . . . . . . . 78

4.9 Gate workfunction values for high-performance designs . . . . . . . . 79

4.10 Gate workfunction values for low-leakage-power designs . . . . . . . . 79

4.11 Gate workfunction values for overall high-quality designs . . . . . . . 79

5.1 FGU footprint area and power values for different monolithic imple-

mentations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

5.2 Footprint area values assumed for the modules to be floorplanned . . 85

5.3 FinPrin-monolithic results . . . . . . . . . . . . . . . . . . . . . . . . 89

5.4 Placement results of 14 test circuits . . . . . . . . . . . . . . . . . . . 93

5.5 CACTI-monolithic input parameter values for memory modules . . . 95

5.6 CACTI-monolithic results . . . . . . . . . . . . . . . . . . . . . . . . 95

5.7 Instruction cache data array dimensions of BLM and TLM implemen-

tations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5.8 Comparison of different monolithic designs based on minimum area-

power product showing the benefit of hybrid designs in terms of foot-

print area and power consumption . . . . . . . . . . . . . . . . . . . . 104

5.9 Minimum area configurations of different monolithic designs . . . . . 108

5.10 Minimum wirelength configurations of different monolithic designs . . 108

5.11 Minimum power consumption configurations of different monolithic de-

signs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

6.1 Footprint area and power consumption results for FUs . . . . . . . . 118

6.2 Memory modules in BLM vs. TLM . . . . . . . . . . . . . . . . . . . 119

6.3 Orion-monolithic results . . . . . . . . . . . . . . . . . . . . . . . . . 119

6.4 Processor model parameter values . . . . . . . . . . . . . . . . . . . . 122

xv

6.5 Comparison of different monolithic designs based on minimum area-

power product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

6.6 Area and power results for the SoC components implemented in BLM

and TLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

6.7 Total area and power consumption results for the SoC designs . . . . 126

xvi

List of Figures

1.1 Planar MOSFET model. . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Key milestones in device scaling. . . . . . . . . . . . . . . . . . . . . 3

1.3 FinFET types: (a) SG and (b) IG. . . . . . . . . . . . . . . . . . . . 4

1.4 A 2-D cross-section of a 3-D FinFET. . . . . . . . . . . . . . . . . . . 5

1.5 6T SRAM cell schematic. . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.6 2-D cross section of symmetric and SPA FinFETs: (a) SG FinFET,

(b) AWSG FinFET, (c) ADSG FinFET, and (d) AUSG FinFET. . . 7

1.7 SRAM stability metrics: (a) RSNM and (b) WM. . . . . . . . . . . . 9

1.8 Monolithic 3-D integration styles: (a) block-level, (b) gate-level, and

(c) transistor-level. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.1 Monolithic 3-D floorplanning of different monolithic styles: (a) BLM,

(b) GLM/TLM, and (c) HM. . . . . . . . . . . . . . . . . . . . . . . 20

3.1 N-curve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

3.2 RSNMs of the (1,1,1) and (0,1,1) cells show how the pull-up transistor

can impact read stability. . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.3 RSNM under different gate workfunction values. . . . . . . . . . . . . 38

3.4 WM under different gate workfunction values. . . . . . . . . . . . . . 39

3.5 RPNM under different gate workfunction values. . . . . . . . . . . . . 39

3.6 WTP under different gate workfunction values. . . . . . . . . . . . . 40

xvii

3.7 IREAD under different gate workfunction values. . . . . . . . . . . . . 40

3.8 ILEAK under different gate workfunction values. . . . . . . . . . . . . 41

3.9 TR under different gate workfunction values. . . . . . . . . . . . . . . 42

3.10 TW under different gate workfunction values. . . . . . . . . . . . . . . 42

4.1 Simulation flow for SRAM characterization. . . . . . . . . . . . . . . 57

4.2 SRAM cell schematics: (a) 6T 4N2P 2D/6T 4N2P 3D, (b) 8T 6N2P 2D,

(c) 8T 4N4P 3D prior1, (d) 8T 4N4P 3D prior2, (e) 8T 4N4P 3D proposed1,

and (f) 8T 4N4P 3D proposed2. . . . . . . . . . . . . . . . . . . . . . 58

4.3 SRAM layouts: (a) 6T 4N2P 2D, (b) 6T 4N2P 3D, (c) 8T 6N2P 2D,

(d) 8T 4N4P 3D prior1, (e) 8T 4N4P 3D prior2, (f) 8T 4N4P 3D proposed1,

and (g) 8T 4N4P 3D proposed2. . . . . . . . . . . . . . . . . . . . . . 60

4.4 6T 4N2P 2D cell: (a) FEOL only and (b) FEOL+BEOL. . . . . . . . 62

4.5 8T 4N4P 3D proposed2 p-layer: (a) FEOL only and (b) FEOL+BEOL. 63

4.6 8T 4N4P 3D proposed2 n-layer: (a) FEOL only and (b) FEOL+BEOL. 64

4.7 TR under different array configurations. . . . . . . . . . . . . . . . . . 72

4.8 TW under different array configurations. . . . . . . . . . . . . . . . . 72

4.9 ILEAK under different temperature values. . . . . . . . . . . . . . . . . 76

5.1 Floorplanning results of different monolithic implementations: (a)

BLM, (b) GLM logic + BLM memory, and (c) GLM/BLM logic +

BLM memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

5.2 The hybrid monolithic design flow. . . . . . . . . . . . . . . . . . . . 87

5.3 The FinPrin-monolithic simulation flow. . . . . . . . . . . . . . . . . 88

5.4 8× NAND cell layout: (a) BLM/GLM, (b) TLM n-tier, and (c) TLM

p-tier. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5.5 Gate-level monolithic placement steps: (a) cell deflation, (b) deflated

2-D placement, (c) cell inflation, and (d) cell layer assignment. . . . . 91

xviii

5.6 Greedy layer assignment vs. ZOLP layer assignment showing the area

benefit of greedy method: (a) deflated 2-D placement, (b) cell inflation,

(c) cell layer assignment, and (d) legalization (just for ZOLP). . . . . 93

5.7 The CACTI-monolithic simulation flow. . . . . . . . . . . . . . . . . . 94

5.8 Area comparison: (a) BLM memory and (b) TLM memory module. . 96

5.9 Layout of horizontal and vertical H-trees of a memory module. . . . . 97

5.10 A T*-tree representation and the corresponding placement in 3-D. . . 99

5.11 Thermal model organization. . . . . . . . . . . . . . . . . . . . . . . . 103

5.12 Floorplanning results showing that the vertical constraints are met

for TLM/GLM modules: (a) 2-D, (b) BLM, (c) TLM, and (d) HM3

(GLM/BLM logic + BLM memory). . . . . . . . . . . . . . . . . . . 106

6.1 The HM SoC design flow. . . . . . . . . . . . . . . . . . . . . . . . . 117

6.2 The Orion-monolithic simulation flow. . . . . . . . . . . . . . . . . . 120

6.3 Hierarchical modeling in McPAT-monolithic. . . . . . . . . . . . . . . 121

6.4 OpenSPARC T2 floorplanning results: (a) 2-D, (b) BLM, (c) TLM,

and (d) HM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

6.5 OpenSPARC T2 SoC floorplans: (a) 2-D, (b) BLM, (c) TLM, and (d)

HM (HM core + BLM L2 + TLM CCX). . . . . . . . . . . . . . . . . 126

6.6 OpenSPARC T2 heat maps: (a) 2-D, (b) BLM, (c) TLM, and (d) HM

(HM core + BLM L2 + TLM CCX). . . . . . . . . . . . . . . . . . . 127

xix

Chapter 1

Introduction

In 1965, Gordon Moore, cofounder of Intel Corporation, predicted that the number

of devices in a chip would roughly double every year [1]. His prediction later became

known as Moore’s law that held true for more than half a century despite slight mod-

ifications over time. Moore’s law has been a roadmap for semiconductor foundries to

push the limits of innovation and cram more devices and functionality into integrated

circuits. Device scaling has been the driving force behind the exponential growth in

transistor count, increased device performance, and reduced cost per transistor [2].

Fig. 1.1 shows a planar metal-oxide-semiconductor field-effect transistor (MOS-

FET) model, which has been the fundamental device for building integrated circuits

(ICs). The current flow between source and drain is controlled by applying differ-

ent gate voltages. As device scales, the channel length between source and drain

decreases. This leads to an improvement in device performance and smaller power

reduction due to shorter channel length and smaller capacitances. Device scaling has

continued for decades despite countless challenges over the years.

Fig. 1.2 shows some key milestones enabling the continuation of device scaling over

the years. In 2003, strained silicon was added to the 90nm technology node [3]. Use

of strained silicon improved the performance of the device by up to 25% by increasing

1

p substrate

n+

Source Gate Drain

n+

SiO2

Figure 1.1: Planar MOSFET model.

electron and hole mobility. For sub-90nm technology nodes, gate leakage became a

serious issue and started hindering further scaling. As device scales, the gate insulator

gets thinner to enable gate control over the channel. However, thinning gate insulator

leads to current leaking through the gate. At the 45nm technology node, a high-k

dielectric based on HfO2 compound replaced SiO2, which was the gate insulator

for decades [4]. This replacement reduced the gate leakage by up to three orders of

magnitude. The polysilicon gate was also replaced with a metal gate that eliminated

the polysilicon depletion effect. Despite gate leakage being mostly eliminated, sub-

threshold leakage kept increasing with scaling as the source and drain terminals got

closer. At the 22nm technology node, FinFETs replaced planar MOSFETs to reduce

leakage power and enable better gate control over the channel [5].

Device scaling gets even more challenging as it approaches its fundamental physics

limits. Some of the current challenges are: approaching lithographic limits, intoler-

able short-channel effects (SCEs), power constraints, and increasing manufacturing

costs [6]. To continue enhancements in computing technology and overcome some

of above-mentioned challenges, we need alternative approaches such as designing de-

vices that employ new integration technologies, developing new architectures and

novel computation paradigms, as well as optimizing existing designs [7]. In this the-

2

SiGe strained silicon

High-k dielectric + Metal gate

FinFETs

1965 2003 2006 2008 2010 2012 2014

90nm 65nm 45nm 32nm 22nm 14nm 10nm

…

7nm

……

Figure 1.2: Key milestones in device scaling.

sis, we explore low-power and high-performance designs. Specifically, we design and

evaluate asymmetric FinFET-based static random access memory (SRAM) cells and

monolithic three-dimensional (3-D) SRAM cells, and explore monolithic 3-D designs

from module to multi-core system level.

The rest of the chapter is organized as follows. We first give a brief introduction to

FinFETs, FinFET-based SRAM design, SRAM characterization, and monolithic 3-D

ICs, followed by contributions of this thesis. An outline for the remaining chapters is

provided at the end of the chapter.

1.1 Background

This section presents background information on FinFETs, FinFET-based SRAM

design and evaluation metrics, and monolithic 3-D integration.

3

1.1.1 FinFETs

FinFETs, a type of multi-gate transistors, have replaced planar MOSFETs due to

their higher performance, superior short-channel behavior, and power efficiency [5].

FinFETs provide better control over the channel by surrounding it from multiple

sides. A better channel control suppresses the drain-induced barrier lowering (DIBL)

effect, improves subthreshold slope, and reduces leakage power consumption [8]. In

addition, FinFETs reduce random dopant fluctuation (RDF) by employing a lightly-

doped or undoped channel [9].

Fig. 1.3 shows two types of FinFETs: shorted-gate (SG) and independent-gate

(IG). In SG FinFETs, the gate wraps around the channel, whereas in IG FinFETs,

the front and back gates become independent of each other because the top part of

the FinFET is etched away. SG FinFETs have better on-current (ION) to off-current

(IOFF ) ratio than IG FinFETs and are preferred for high-performance designs. IG

FinFETs enable dynamic threshold voltage (Vth) control and offer new possibilities

for circuit design [10].

Gate Back

Gate

Front

Gate

Source

Drain Drain

Source

Oxide

(a) (b)

Figure 1.3: FinFET types: (a) SG and (b) IG.

Fig. 1.4 shows a two-dimensional (2-D) cross-section of a 3-D FinFET. The Fin-

FET parameters are gate length LG, fin thickness TSI , oxide thickness TOX , fin height

4

HFIN , spacer thickness LSP , gate underlap LUN , fin pitch FP , gate pitch GP , channel

doping concentration NCH , source/drain doping concentration NSD, and gate work-

function ΦG.

TSI

TOX

LSP

LG

LUN

NSD NSD

NCH

Figure 1.4: A 2-D cross-section of a 3-D FinFET.

We use a 22nm (Chapter 3) and 14nm (Chapters 4, 5, and 6) silicon-on-insulator

(SOI) FinFET technology in our simulations. Table 1.1 shows the parameter values

we assume for our 22nm and 14nm SOI FinFET technology. We obtain parameter

values from the data released by semiconductor foundries and calibrate them via

simulations for high-performance circuits [11, 12, 13] (22nm), [14, 15, 16] (14nm).

Table 1.1: 22nm and 14nm SOI FinFET device parameter values

Parameter (unit) Value at 22nm Value at 14nmLG(nm) 24 16TSI(nm) 10 8TOX(nm) 1 0.9HFIN(nm) 40 30LSP (nm) 12 8LUN(nm) 12 8FP (nm) 40 42GP (nm) 90 70

NCH(cm−3) 1015 1015

NSD(cm−3) 1020 1020

ΦG(eV ) n : 4.4, p : 4.8 n : 4.4, p : 4.8

5

1.1.2 FinFET-based SRAM design

SRAM is the fundamental memory cell for on-chip data storage and fast access. Reg-

ister files, buffers, and caches use SRAM cells to store the data safely and access

it quickly. Design of low power, robust, and dense memories is crucial for modern

microprocessors because SRAMs occupy more than half the die area and are respon-

sible for significant power consumption, primarily due to leakage power dissipation

[17]. Although FinFETs have less leakage than planar complementary metal-oxide-

semiconductor (CMOS) transistors, leakage power consumption is still a major issue

in FinFETs due to aggressive scaling. Besides, the width quantization issue associ-

ated with FinFETs, process variations, and read-write conflict in SRAM cells make

FinFET SRAM design even more challenging. To address SRAM challenges, we in-

vestigate asymmetric FinFET-based SRAM and 3-D SRAM design in Chapter 3 and

4, respectively.

Fig. 1.5 shows the schematic of a conventional 6T SRAM cell. It consists of a

pair of pFinFET pull-up (PU1, PU2), nFinFET access (AX1, AX2), and nFinFET

pull-down (PD1, PD2) transistors. Cross-coupled inverters are used to store the data

and access transistors are used to read and write the data.

WL

VDD

AX1 AX2

PU1 PU2

PD1 PD2

BL BLB

L R

INV2INV1

Figure 1.5: 6T SRAM cell schematic.

6

(a) (b)

(c) (d)

NSD NSD NSD

NS ND NSDNSD

NSD

Doping

Concentration

(cm-3)

1020

1018

1016

1012

-1013

-1015

ΦG

ΦGFΦG

ΦG

ΦG

ΦG

ΦG

ΦGB

LUN LUN

LUN LUN

LUN LUN

LUNDLUNS

Figure 1.6: 2-D cross section of symmetric and SPA FinFETs: (a) SG FinFET, (b)AWSG FinFET, (c) ADSG FinFET, and (d) AUSG FinFET.

One way to address various SRAM design challenges is to introduce asymmetries

in FinFETs. A single-parameter asymmetric (SPA) FinFET is created by introducing

asymmetry in a FinFET parameter value. The SPA FinFETs we consider are asym-

metric workfunction SG (AWSG), asymmetric doping SG (ADSG), and asymmetric

underlap SG (AUSG), as shown in Fig. 1.6. SPA FinFETs were shown to be effective

at reducing static power consumption, mitigating the read-write conflict by utiliz-

ing bidirectional current flow across access transistors, and achieving high density in

SRAM cells using single-fin FinFETs. Multi-parameter asymmetric (MPA) FinFETs

combine two or more asymmetries in a FinFET to benefit from different asymmetries.

For example, an asymmetric workfunction-doping SG (AWDSG) FinFET combines

asymmetries in gate workfunction and doping concentration, which can be useful for

both reducing leakage and mitigating read-write conflict in SRAM cells. In Chap-

ter 3, we investigate MPA FinFET-based 6T SRAM cells for robust, low-power, and

area-efficient designs.

7

1.1.3 SRAM characterization

Dc and transient metrics are used to evaluate SRAM cells. The dc metrics are read

static noise margin (RSNM), write margin (WM), read current (IREAD), and leakage

current (ILEAK). The transient metrics are read time (TR) and write time (TW).

1. RSNM: RSNM measures the stability of an SRAM cell during a read operation.

For the read simulation setup, wordline (WL), bitline (BL), and BL bar (BLB)

voltages are held at supply voltage (VDD) while a voltage source sweeps the

voltage at a storage node from ground (VGND) to VDD. RSNM is measured from

the butterfly curve obtained from the voltage transfer characteristics (VTC) of

the cross-coupled inverters (INV1, INV2). RSNM is the largest square that can

fit in the butterfly curve, as shown in Fig. 1.7a. An SRAM cell with a higher

RSNM is more resilient to read failures.

2. WM: WM measures the writeability of a cell. It is measured from the VTC of

the cross-coupled inverters during a write operation. For the write simulation

setup, BLB and WL are held at VDD and BL is tied to VGND while a voltage

source sweeps the voltage at a storage node from VGND to VDD. WM is the

smallest square that can fit in the lower half of the VTC curves, as shown in

Fig. 1.7b. A higher WM implies better writeability.

3. IREAD: IREAD is the current drawn from the bitline connected to the storage

node that holds a “0” during a read operation [18]. A higher IREAD implies a

fast discharge on the bitline capacitance, and hence a smaller TR.

4. ILEAK: SRAMs consume a significant amount of leakage energy as they are

mostly in the hold mode. ILEAK is the current drawn from the power source

when the cell is in the hold mode. In this mode, bitlines are at VDD while the

wordline is at VGND. In other words, the transistors connected to wordlines are

OFF in the hold mode.

8

0 0.2 0.4 0.6 0.80

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

VL (V)

VR

(V

)

RSNM

INV1

INV2

(a)

0 0.2 0.4 0.6 0.80

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

VL (V)

VR

(V

)

WM

INV1

INV2

(b)

Figure 1.7: SRAM stability metrics: (a) RSNM and (b) WM.

5. TR: TR is measured during a read operation. It is the time interval from the

point VWL reaches its 50% value during switching to the point when sense am-

plifiers are activated. It is assumed that the sense amplifiers are activated

when the difference between bitline voltages (VBL, VBLB) reaches 100 mV

(|VBL − VBLB| = 100 mV).

6. TW: TW is measured during a write operation. It is measured from the time

when the voltage at the beginning of the wordline reaches the 50% switch point

to the time when voltage at the storage node that initially stores a “1” (VL)

reaches 10% of its initial value (VL = VDD × 0.1).

For transient simulations, resistance and capacitances of the SRAM cell are

extracted to compute WL and BL capacitances. Then, a memory array is

assumed to measure TR and TW.

9

1.1.4 Monolithic 3-D integration

With continued technology scaling, interconnects have become the bottleneck in fur-

ther performance and power consumption improvements in modern microprocessors.

3-D ICs provide a promising approach for addressing the interconnect bottleneck

because they enable reduction in overall interconnect length and total number of re-

peaters on long interconnects [19]. They also have the potential to push Moore’s

law further by accommodating more transistors per unit footprint area along with a

reduction in power consumption, interconnect length, and the number of repeaters.

3-D ICs can be fabricated using either parallel or sequential integration. In parallel

3-D integration, layers are processed independently and connected by through-silicon

vias (TSVs) [20]. TSV-based 3-D technologies have been extensively studied and

shown to be effective at reducing interconnect length, power consumption, and delay

[21]. However, TSV-based 3-D ICs cannot fully utilize the benefits of the third di-

mension due to their large TSV diameter and layer alignment issues [22]. In addition,

parallel integration often uses 2-D block-level modules. This does not benefit from

the third dimension at the gate or transistor level.

On the other hand, in sequential 3-D integration, also known as monolithic 3-

D integration, the layers are processed sequentially and connected using monolithic

inter-tier vias (MIVs), which have much smaller diameter (around 50nm) compared

to that of TSVs (around 1µm). Therefore, monolithic integration offers higher den-

sity and less parasitics, delay, and power consumption. Unlike parallel integration,

monolithic integration can also benefit from a reduction in intra-module intercon-

nect lengths to further reduce power consumption and delay. There are three types

of monolithic implementations: block-level monolithic (BLM) [23], gate-level mono-

lithic (GLM) [19], and transistor-level monolithic (TLM) [22], as shown in Fig. 1.8.

Each monolithic integration implementation style has its advantages and draw-

backs. In BLM, 2-D modules are floorplanned on multiple transistor layers to build

10

L2 cache

Core 2Core 3

Core 1Core 4

(a) (b) (c)

Figure 1.8: Monolithic 3-D integration styles: (a) block-level, (b) gate-level, and (c)transistor-level.

the 3-D design [23]. Existing electronic design automation (EDA) tools can be used

to create 2-D BLM modules [24]. However, it requires a 3-D floorplanner to place

modules on multiple layers efficiently. BLM designs do not benefit from potential

intra-block interconnect reduction since all the cells of the module are placed on the

same transistor layer. GLM places 2-D standard cells on multiple transistor layers to

generate 3-D modules. Unlike BLM, GLM benefits from intra-module interconnect

length reduction [19]. This leads to smaller power consumption and delay. GLM

implementations, however, require additional EDA tools, such as a 3-D gate-level

placement tool [19]. In TLM designs, n-type and p-type transistors are fabricated on

different layers. Thus, it enables independent optimization of the transistor layers. It

also benefits from a reduction in intra-module interconnect lengths similar to GLM.

However, it requires a new 3-D cell library in which n-type and p-type transistors of

each cell are implemented on separate layers [22].

A hybrid monolithic (HM) design consists of modules that are implemented in

different monolithic styles to utilize their advantages. For example, an HM design

may use logic modules implemented in GLM or TLM for power savings and 2-D

memory modules implemented in BLM for area efficiency. To realize an HM design,

we need a floorplanner that can handle both 2-D and 3-D modules in addition to the

tools needed to implement BLM, GLM, and TLM designs.

11

In Chapter 4, we design 3-D TLM SRAM cells for enhanced read performance and

reduced leakage power consumption. In Chapter 5, we present an HM floorplanner

and investigate the benefits of HM designs at processor core level. In Chapter 6, we

introduce an area/timing/power architectural modeling framework for HM designs at

multi-core system level.

1.2 Thesis contributions

The contributions of the thesis are as follows.

• Chapter 3 explores MPA FinFET-based SRAM designs [25]. Five novel MPA

FinFET-based SRAM cells are proposed and compared with symmetric and

SPA FinFET-based SRAM cells using dc and transient metrics. FinFETs

with asymmetries have been shown to be effective at reducing ILEAK and al-

leviating read-write conflict in SRAM cells. We, for the first time, show how

MPA FinFETs can be used to design ultra-low-leakage and robust 6T SRAM

cells. We combine multiple asymmetries, namely asymmetry in gate workfunc-

tion, source/drain doping concentration, and gate underlap, to address various

SRAM design issues all at once. Simulation results show that the ILEAK of

MPA FinFET-based SRAM cells can be reduced by up to 58× while ensuring

reasonable read/write stability metrics by combining asymmetries in gate work-

function and doping concentration. In addition, an MPA FinFET-based SRAM

cell can achieve high stability metrics with 22× ILEAK reduction compared to the

traditional symmetric FinFET-based SRAM cell. There is no area overhead as-

sociated with MPA FinFET-based SRAM cells. We evaluate SRAM cells under

different gate workfunction, doping concentration, supply voltage, and tempera-

ture values and show that MPA FinFET-based SRAM cells are promising under

12

various conditions. We also discuss the effect of process variations on SRAM

cells.

• Chapter 4 proposes two new 3-D monolithic FinFET-based 8T SRAM cells and

compares them with previously reported 6T and 8T SRAM cells implemented

in 2-D/3-D [26]. Conventional 6T and 8T SRAM cells are not area-efficient

when implemented in 3-D. Thus, we investigate 8T SRAM cells with equal

number of n-type and p-type transistors to achieve area efficiency for 3-D SRAM

design. Both the proposed cells use pFinFET access transistors for better area

efficiency in 3-D and low leakage current. Using pFinFETs access transistors,

however, hurts cell writeability. Thus, one of the cells, in addition to using

pFinFET access transistor, utilizes IG pFinFETs as pull-up transistors whose

back gates are tied to VDD for better writeability. This cell has 28.1% and 43.8%

smaller footprint area, 31.6% and 43.2% smaller leakage current, and 53.2% and

29.0% lower TR compared with conventional 2-D 6T SRAM and 2-D 8T SRAM

cells, respectively. The schematics, layouts, and bitline/wordline capacitances

of various SRAM cells are analyzed to understand the trade-offs in cell stability,

performance, and static power consumption. This chapter also investigates the

impact of process variations, memory array configurations, assist techniques,

different temperatures, and gate workfunction values on SRAM cells.

• Chapter 5 introduces the first 3-D HM floorplanner (3-D-HMFP) [27]. 3-D-

HMFP is capable of handling vertical constraints imposed by 3-D modules

and includes global interconnect power consumption. It can replace modules

with their alternative implementations to explore a large HM design space.

This chapter also presents a gate-level placement method needed to implement

GLM modules. It characterizes the OpenSPARC T2 processor core using differ-

ent monolithic implementations and compares their footprint area, wirelength,

13

power consumption, and temperature. Simulations show that under the same

timing constraint, an HM design offers 48.1% reduction in footprint area and

14.6% reduction in power consumption compared to those of the 2-D design at

the cost of higher power density and slightly higher temperature.

• Chapter 6 introduces McPAT-monolithic, a framework for modeling HM multi-

core architectures [28]. We develop the tools needed to model different mono-

lithic implementation styles for logic, memory, and network-on-chip (NoC) mod-

ules. The OpenSPARC T2 processor is used as a case study to compare different

monolithic implementation styles and explore the benefits of HM design. We

show that, under the same timing constraint, an HM design offers 47.2% re-

duction in footprint area and 5.3% in power consumption compared to a 2-D

design at the cost of slightly higher on-chip temperature.

The rest of the thesis is organized as follows. Chapter 2 presents prior work on

SRAM design and 3-D ICs. Chapter 3 shows how we combined multiple asymme-

tries in FinFETs to design ultra-low-power and robust SRAMs. Chapter 4 describes

the TLM SRAM design in 3-D and two new 8T SRAM cells we designed for en-

hanced TR and reduced ILEAK. Chapter 5 presents the HM 3-D floorplanner we

developed to explore the HM design space. Chapter 6 describes McPAT-monolithic,

an area/power/timing architectural framework for monolithic 3-D ICs at the multi-

core system level. Chapter 7 presents the concluding remarks and discusses future

directions.

This thesis covers material from the following publications: Refs. [25, 26, 27, 28].

The tools described in this thesis are available from www.princeton.edu/∼jha/files/tools.

14

https://www.princeton.edu/~jha/files/tools.html

Chapter 2

Related Work

This chapter discusses prior work on asymmetric FinFET-based SRAM design, TLM

3-D SRAM design, 3-D hybrid monolithic floorplanning, and monolithic 3-D design.

2.1 SPA FinFET-based SRAM design

SPA FinFETs were shown to be promising in robust, low-power, and area-efficient

SRAM design. An AWSG FinFET has different gate workfunction values for the front

and back of the gate, as shown in Fig 1.3(b). It was previously shown that asymmetry

in gate workfunction can reduce FinFET ILEAK by two orders of magnitude without

degrading performance excessively [29]. SRAM cells based on AWSG FinFETs were

also shown to be promising in terms of dc metrics and dynamic writeability with

significant ILEAK reduction [30].

An ADSG FinFET has unequally-doped source and drain regions, as shown in

Fig 1.3(c). 6T SRAM cells designed using ADSG FinFETs were shown to have

higher RSNM and WM, and reduced cell ILEAK at the cost of higher access time [31].

Unequally-doped source and drain terminals lead to unequal current flow across the

transistor, depending on whether the drain-to-source voltage bias (VDS) is positive

or negative. If the doping concentration is lower at the drain side of the FinFET,

15

the VDS > 0 current is higher (smaller) than the VDS < 0 current in an nFinFET

(pFinFET) [31]. ADSG FinFETs can help mitigate the read-write conflict in a 6T

SRAM cell. A read operation requires a weaker access transistor with respect to a pull-

down transistor, whereas a write operation requires a strong access transistor with

respect to a pull-up transistor to write the cell. These contrasting requirements create

the read-write conflict in a 6T SRAM cell. Connecting the lower-doped terminal of

an ADSG FinFET to the storage node enables a weaker access transistor during the

read operation since the voltage bias from storage node-to-bitline is negative (VDS < 0

case). On the other hand, during the write operation, the voltage bias from storage

node-to-bitline is positive and the access transistor is stronger (VDS > 0 case) [31].

An AUSG FinFET has unequal gate underlap for source and drain sides, as shown

in Fig 1.3(d). SRAM cells designed with AUSG FinFETs were shown to enhance

RSNM and writeability while reducing leakage power consumption with no area over-

head [32]. In [33], SRAM cells designed using asymmetric drain spacer extension

FinFETs were shown to improve RSNM and WM, and reduce cell ILEAK, at the cost

of higher access time and area. Similar to ADSG, AUSG FinFETs with asymmetric

gate underlap provide unequal current between source and drain for VDS > 0 and

VDS < 0 [32, 33]. Thus, AUSG FinFETs can be used to mitigate the read-write

conflict.

An MPA FinFET combines multiple asymmetries in a single FinFET. Previ-

ously, MPA FinFETs were shown to be promising for ultra-low-leakage and high-

performance logic circuit design [34]. Specifically, asymmetric workfunction-underlap

SG (AWUSG) FinFETs were shown to provide a slightly higher ION and drastically

lower IOFF compared to their traditional symmetric SG FinFET counterparts.

In Chapter 3, we present SRAM cell designs based on MPA FinFETs. We combine

the advantages of different asymmetries to obtain ultra-low-leakage, robust, and dense

SRAM cells, by exhaustively exploring the SRAM cell design space spawned by MPA

16

FinFETs. We specifically aim to use asymmetry in gate workfunction to reduce

leakage power while utilizing asymmetry in doping concentration and underlap to

mitigate the read-write conflict. We identify five promising MPA FinFET-based

SRAM cells and compare them with symmetric and SPA FinFET-based SRAM cells.

2.2 TLM 3-D SRAM design

3-D TLM technology provides a new approach to SRAM design. In a TLM design, the

footprint area of an SRAM cell can be reduced significantly by building n- and p-type

transistors on two separate layers. A smaller footprint area can lead to shorter WL

and BL, hence an improvement in SRAM performance. Several 4T, 6T, and 8T SRAM

cells implemented in 3-D TLM technology have been previously reported. Monolithic

3-D 4T/6T SRAM cells that exploit dynamic back-gate biasing were reported in [35].

Inter-layer coupling was shown to improve SRAM performance and stability. Batude

et al. [36] presented a 3-D load-less 4T SRAM cell consisting of two p-type access

and two n-type drive transistors. A thin inter-layer dielectric (ILD) was used to

dynamically manipulate the Vth of the devices to improve cell stability. A 3-D 6T

SRAM cell consisting of Indium Gallium Arsenide nMOSFETs and Germanium (Ge)

pMOSFETs was shown to improve cell stability and performance while maintaining

the same ILEAK with respect to their 2-D counterparts [37]. Although the electron

and hole mobilities of III-V materials and Ge can be higher than those of silicon, it is

challenging to fabricate high-quality transistors using heterogeneous integration [37].

A 3-D 6T SRAM cell, in which the back gates of access transistors are connected to

the adjacent storage nodes to improve read stability, was presented in [38]. A 3-D 6T

SRAM cell based on ultrathin-body MOSFETs was sequentially processed using a

low thermal budget for the first time [39]. No degradation of bottom-tier devices was

observed due to the process. It was shown that top-layer transistors exhibit almost

17

identical electrical properties as the bottom-layer transistors. Designing an area-

efficient TLM SRAM cell is challenging because n- and p-type transistors need to be

placed on two layers and connected via MIVs, which impose additional constraints on

layout. A traditional 6T SRAM cell implemented in 3-D suffers from area inefficiency

because it has four n-type and two p-type transistors. In [40], a 3-D 6T SRAM cell

consisting of three n-type and three p-type transistors was proposed to reduce the

footprint area. This cell replaces an n-type access transistor with a p-type transistor to

equalize the number of n- and p-type transistors in the SRAM cell. However, it suffers

from degraded read stability due to weak p-type access transistor, and hence requires

a single-ended read through n-type access transistor. It also needs an additional WL

for p-type access transistors. This cell also has a degraded writeability with respect

to the traditional 6T SRAM cell because the pull-up transistor is as strong as p-type

access transistor. Thus, it is harder for the p-type access transistor to discharge the

storage node and flip the cell during a write operation.

A conventional 8T SRAM consists of six n-type and two p-type transistors [41].

It offers a high read stability because the internal nodes are not disturbed during a

read operation. However, similar to 6T SRAM cell, the unequal number of n- and

p-type transistors leads to an inefficient footprint area of the conventional 8T SRAM

cell when implemented in 3-D. Thus, previously reported 3-D 8T SRAM cells were

implemented using four n-type and four p-type transistors. A 3-D 8T SRAM cell,

constructed by adding two pFinFET read access transistors to a conventional 6T

SRAM cell, was presented in [42]. pFinFET access transistors were activated during

the read operation to increase the read stability. However, this cell suffers from a

degraded read performance due to the use of weaker pFinFET access transistors.

In addition, it has a 50% lower RSNM compared to the conventional 8T SRAM

cell because its internal nodes are still disturbed during a read operation. A 3-D

8T SRAM cell, constructed by replacing n-type read transistors of a conventional 8T

18

SRAM cell with p-type transistors to equalize the number of n- and p-type transistors,

was reported in [40, 43]. However, this cell can also suffer from a degraded read

performance due to the presence of weaker p-type transistors on the read path.

In Chapter 4, we present two new 3-D FinFET-based 8T SRAM cells. Our aim

is to design an area-efficient, low-power, and robust cell, along with a high read

performance. Therefore, we replace the nFinFET access transistors of a conventional

8T SRAM cell with pFinFETs for an area-efficient 3-D design and keep nFinFETs

on the read path to maintain a high read performance. The idea of replacing n-type

access transistors with p-type transistors was investigated in 2-D 6T and 8T SRAM

cells [44, 45]. Tawfik et al. [44] showed that a FinFET-based 6T SRAM cell, which

has pFinFET access transistors and IG pFinFET pull-up transistors with back gates

tied to VDD, can improve read stability by 60% and reduce ILEAK by 21% compared

to a conventional 6T SRAM cell. However, using weaker pFinFET access transistors

in our proposed cells hurts writeability. Thus, in one of the proposed cells, we replace

the SG pull-up pFinFETs with IG pFinFETs and connect their back gates to VDD

to weaken them with respect to access transistors and improve writeability.

2.3 3-D hybrid monolithic floorplanning

An HM design consists of modules implemented in different monolithic styles. BLM

modules are implemented on a single transistor layer, hence can be viewed as 2-D

modules. GLM and TLM modules, however, are implemented on multiple transistor

layers. Thus, we consider them to be 3-D modules, which can be viewed as vertically-

aligned 2-D modules. Combining 2-D and 3-D modules imposes vertical constraints

during floorplanning because the parts of GLM and TLM modules on different layers

need to be aligned, as shown in Fig 2.1.

19

(a) (b) (c)

Figure 2.1: Monolithic 3-D floorplanning of different monolithic styles: (a) BLM, (b)GLM/TLM, and (c) HM. Dashed lines indicate the vertical constraints on GLM/TLMmodules.

To explore the HM design space, we need a floorplanner that can handle both 2-D

and 3-D modules. Several studies have been reported on floorplanning of modules un-

der vertical constraints. A 3-D floorplan representation, namely the layered transitive

closure graph, was used in [46] to handle inter-layer alignment. In [47], mixed integer

linear programming formulations were used to handle block alignment constraints in

3-D floorplanning. A 3-D floorplanner based on sequence pair representation and 3-D-

graph-based packing algorithm to control vertical module alignments was proposed

in [48]. A T*tree based 3-D floorplanner, which can handle vertically-aligned 2-D

modules, was reported in [49]. A 3-D floorplanning methodology to address intercon-

nect structures imposing module alignment constraints on TSV-based systems was

proposed in [50]. In [51], a fixed-outline 3-D floorplanner that can handle folding and

alignment of different types of modules, such as soft, hard, folded, and stacked, was

proposed. However, these floorplanners do not include global interconnect power and

do not explore the HM space. In [52], high-level synthesis was integrated into a 3-D

floorplanner. This floorplanner replaces modules with their alternatives to find better

designs. However, it cannot handle hybrid floorplanning under vertical constraints.

20

In Chapter 5, we introduce 3-D-HMFP, a 3-D HM floorplanner, which can both

handle vertical constraints and replace modules with their alternative implementa-

tions to find optimal hybrid solutions. 3-D-HMFP also takes the global interconnect

power consumption into account to obtain better floorplans. We use 3-D-HMFP to

characterize the OpenSPARC T2 processor core using different monolithic implemen-

tations and compare their footprint area, power consumption, wirelength, and peak

temperature values.

2.4 Monolithic 3-D design

3-D ICs have previously been demonstrated to be effective at addressing the inter-

connect bottleneck, reducing power consumption, and improving performance. A

physical design flow for 3-D monolithic circuits was presented in [53] and shown to

decrease area, reduce wirelength, and improve performance compared to the 2-D im-

plementation. In [19], the OpenSPARC T2 core was used as a case study and it

was shown that the GLM design has 50.0% smaller footprint area and 15.6% less

power consumption compared to its 2-D counterpart. Ref. [54] demonstrated that

low-power design techniques that include folding functional unit (FU) modules can

help reduce the power consumption of 3-D ICs based on TSVs. The OpenSPARC

T2 core, implemented in a two-tier 3-D design, was shown to offer up to 52.3% re-

duced footprint area, 27.9% shorter wirelength, and 27.8% less power consumption

compared to its 2-D counterpart. In [55], the OpenSPARC T2 HM system-on-chip

(SoC) consisting of GLM logic and BLM memory modules was shown to reduce the

SoC power consumption by 8.3% when compared to its 2-D counterpart design. Prior

works, however, do not explore a large hybrid design space. In addition, they often

use an RTL-to-GDSII design flow to implement modules in different monolithic styles,

which is accurate but slow to implement. An architectural modeling tool is needed

21

to try different architectural parameter values and explore the HM design space more

quickly. Thus, we focus on developing tools to model monolithic designs from circuit

to system level and show the benefits of monolithic 3-D integration. We specifically

investigate HM designs, which can combine advantages of different monolithic styles.

In Chapter 6, we introduce McPAT-monolithic, an area, power, and timing ar-

chitectural modeling framework for multi-core HM designs. We develop FinPrin-

monolithic, CACTI-monolithic, and Orion-monolithic to model logic, memory and

NoC modules, respectively, and integrate them into McPAT-monolithic. We also in-

tegrate 3-D-HMFP with McPAT-monolithic for floorplanning of the processor cores.

McPAT-monolithic integrated with 3-D-HMFP enables a speedy and efficient design

space exploration of HM multi-core systems.

22

Chapter 3

Ultra-low-leakage, Robust FinFET

SRAM Design Using

Multi-parameter Asymmetric

FinFETs

Memory arrays consisting of SRAM cells occupy the largest area on chip and are re-

sponsible for significant leakage power consumption in modern microprocessors. With

the transition from planar CMOS technology to FinFETs, FinFET SRAM design has

become important. However, increasing leakage power consumption of FinFETs due

to aggressive scaling, width quantization, read-write conflict, and process variations

makes FinFET SRAM design challenging. In this chapter, we show how MPA Fin-

FETs can be used to design ultra-low-leakage and robust 6T SRAM cells. We propose

five novel MPA FinFET-based SRAM cells. We show that the ILEAK of MPA FinFET-

based SRAM cells can be reduced by up to 58× while ensuring reasonable read/write

stability metrics [25]. In addition, high stability metrics can be achieved with 22×

23

ILEAK reduction compared to the traditional symmetric FinFET-based SRAM cell.

There is no area overhead associated with MPA FinFET-based SRAM cells.

3.1 Introduction

Planar CMOS devices have reached their scaling limits due to intolerable SCEs and

leakage power consumption, and have been, hence, replaced by multi-gate transistors

[56]. Among multi-gate transistors, FinFETs are the most promising owing to their

compatibility with the CMOS fabrication process [8]. In FinFETs, a gate wraps

around the channel from multiple sides. This enables a better channel control, reduces

ILEAK, alleviates SCEs, and improves scalability of FinFETs. FinFETs also have

higher mobility and are less sensitive to RDF owing to their lightly-doped or undoped

channel [9].

FinFETs are very promising for SRAM design due to their robustness, higher

performance, and density. FinFET-based SRAMs have better read and write cur-

rents compared to their planar CMOS-based SRAM counterparts because FinFETs

have improved subthreshold slope, which enables lower Vth at given IOFF . Reduced

DIBL enables higher stability for FinFET-based SRAMs. Moreover, FinFET-based

SRAMs do not suffer from random dopant fluctuation, and hence have reduced Vth

and performance variation, which increases their robustness. FinFET-based SRAMs,

however, suffer from the width quantization issue, which is not an issue for planar

CMOS-based SRAMs. Overall, FinFET-based SRAMs have been shown to be supe-

rior to planar CMOS-based SRAMs. Therefore, a lot of effort has been directed at

FinFET-based SRAM design [11, 57].

Although FinFETs have less leakage than planar CMOS transistors, leakage power

consumption is still a major issue in FinFETs due to aggressive scaling. Besides,

the width quantization issue associated with FinFETs, process variations, and read-

24

write conflict in SRAM cells make FinFET SRAM design even more challenging.

One way to address these problems is to design SRAMs based on FinFETs with

asymmetric parameters. SRAM cells based on SPA FinFETs with asymmetry in

gate workfunction [30], source/drain doping concentration [31], gate underlap [33, 32],

and fin height [58] have been reported. These works have shown that SPA FinFETs

can reduce ILEAK, improve stability metrics of the SRAM cell, and help mitigate the

SRAM read-write conflict.

We take a step further. We characterize SRAM cells based on MPA FinFETs.

Such FinFETs combine two or more asymmetries in a single FinFET to gain advan-

tage from all the asymmetries. In designing MPA FinFET-based SRAM cells, we use

FinFETs with up to three asymmetries: in gate workfunction, source/drain doping

concentration, and gate underlap. MPA FinFETs offer newer tradeoffs in SRAM

design among leakage power consumption, robustness, and performance. In order to

achieve the highest density so that the area occupied by SRAM cells is minimized, we

only use single-fin FinFETs in our SRAM cells. We show that using FinFETs with

combined asymmetry in gate workfunction and source/drain doping concentration

can reduce the ILEAK of the SRAM cell by 58× while maintaining reasonable SRAM

stability metric values [25]. We also show that high read stability can be achieved

while reducing leakage power by 22×.

The rest of the chapter is organized as follows. Section 3.2 describes the sim-

ulation setup and provides SRAM dc and transient metrics needed to analyze an

SRAM cell. Section 3.3 describes the design and selection of promising MPA FinFET-

based SRAM cells and includes a comparative analysis of dc and transient metrics.

Section 3.4 evaluates SRAM metrics under different gate workfunction, doping con-

centration, supply voltage, and temperature values. Section 3.5 discusses the effect

of process variations on SRAM cells. Section 3.6 presents discussion of the results.

Section 3.7 concludes the chapter.

25

Table 3.1: 22nm SOI asymmetric FinFET device parameter values

Parameter (unit) ValueΦGF (eV ) 4.4ΦGB(eV ) 4.8NS(cm−3) 1020

ND(cm−3) 1019

LUNS(nm) 2LUND(nm) 12

3.2 Simulation setup

We evaluate the SRAM cells using a 22nm SOI FinFET technology. Table 1.1 shows

the parameter values for the traditional symmetric FinFETs. We perform 2-D hy-

drodynamic device simulations using Sentaurus Device Simulator [59] to measure

SRAM dc and transient metrics. We use Phillips unified model together with band-to-

band tunneling and avalanche multiplication models for mobility, Bandgap narrowing

model, and Shockley-Read-Hall recombination models for accurate simulation. We

perform initial simulations at the room temperature of 300◦K and a VDD of 0.9 V.

We consider other temperature values later.

In this chapter, we investigate SRAM cells based on FinFETs with asymmetry

in gate workfunction, source/drain doping concentration, gate underlap, and their

combinations. We use the parameter values shown in Table 3.1 for asymmetric SG

FinFETs (these values were chosen after careful evaluation). The asymmetric param-

eters are front-gate workfunction ΦGF , back-gate workfunction ΦGB, source doping

concentration NS, drain doping concentration ND, gate underlap at source side LUNS,

and gate underlap at drain side LUND. Other parameter values are the same as those

for symmetric SG FinFETs, as shown in Table 1.1. The SPA FinFETs we consider are

AWSG, ADSG, and AUSG. Understanding the impact of each technique on FinFET

device characteristics offers valuable insights into SRAM design. AWSG FinFETs

have different ΦGF and ΦGB. They have higher Vth, which leads to a slightly lower

26

Table 3.2: Numerical representation of FinFETs

FinFET type Numerical RepresentationSG 0 (000)

AWSG 1 (001)ADSG 2 (010)

AWDSG 3 (011)AUSG 4 (100)

AWUSG 5 (101)ADUSG 6 (110)

AWDUSG 7 (111)

ION but drastically lower IOFF compared to those of SG FinFETs. Asymmetry in

workfunction, therefore, can be an enabler for ultra-low-leakage SRAM cells with

high stability. L2 and L3 caches contribute significantly towards power consump-

tion via leakage power. Asymmetric workfunction incorporated SRAM cells can help

reduce the leakage power of the higher-level caches while providing higher stability

metrics. However, the delays associated with SRAM cells worsen because asymmetry

in workfunction leads to lower ION . The ADSG FinFET we use has a lower doping

concentration on the drain side (ND), which increases the resistance on the drain side

and reduces ION and IOFF . Reduced IOFF can help reduce the leakage power of an

SRAM cell while reduced ION can degrade the access time. In an AUSG FinFET,

the gate underlap is reduced on the source side (LUNS), causing both ION and IOFF

to increase. Both ADSG and AUSG nFinFETs (pFinFETs) have greater current

flowing for VDS > 0 (VDS < 0) compared to VDS < 0 (VDS > 0). Asymmetry in

doping concentration and underlap can address applications where the stability is the

bottleneck and read-write conflict is critical.

The device footprint of a FinFET does not increase when an asymmetry is intro-

duced, as shown in Fig. 1.6. Thus, the area of an MPA FinFET is the same as those

of symmetric and SPA FinFETs. In the rest of the chapter, for ease of reference, we

represent each FinFET type with a three-bit number, one bit for each asymmetric

parameter. Table 3.2 shows this representation.

27

The schematic of a conventional 6T SRAM cell is shown in Fig. 1.5. For the

sake of discussion, it is assumed that initially storage nodes L and R store “1” and

“0”, respectively. Each SRAM cell is represented with three numbers that denote

the type of pull-up, access, and pull-down transistors, respectively. For example,

a (0,1,4) SRAM cell consists of SG pull-up FinFETs, AWSG access FinFETs, and

AUSG pull-down FinFETs. This notation is used throughout the chapter.

3.2.1 SRAM dc metrics

Read stability, writeability, IREAD, and ILEAK are the main dc metrics of SRAM cells.

SRAM stability is of utmost importance for successful read/write operations. In this

chapter, the two most common methods to measure SRAM stability are used: static

noise margin and N-curve. RSNM depicts the minimum noise voltage that would

flip the cell bit during a read operation, thus causing a read failure. Another way

to measure stability is through the N-curve [60], as shown in Fig. 3.1. The N-curve

can be obtained using the same read simulation setup described in Section 1.1.3, by

plotting the current flowing into node L with respect to the voltage at L. The N-curve

takes both voltage and current into account. The read power noise margin (RPNM)

can be obtained from the N-curve using the following equation:

RPNM =∫ VB

VAILdVL.

RPNM is used to measure read stability. It is the power needed to upset the cell

during a read operation. A higher RSNM or RPNM implies that the SRAM cell is

more robust during the read operation.

Writeability can be quantified using WM, which is defined as the length of the

smallest square that can fit in the right half of the write butterfly curve [61], as

shown in Fig. 1.7b. Another way to measure the writeability of an SRAM cell is the

28

0 0.2 0.4 0.6 0.8−200

−150

−100

−50

0

50

100

150

A B C

VL (V)

I L (

µA)

Figure 3.1: N-curve.

write-trip power (WTP), obtained from the N-curve (Fig. 3.1), using

WTP =∫ VC

VB|IL|dVL.

A higher WM or lower WTP corresponds to better writeability.

Other dc metrics IREAD and ILEAK are explained in Section 1.1.3.

3.2.2 SRAM transient metrics

TR and TW of an SRAM cell constitute its transient metrics. We measure TR and

TW of an SRAM array consisting of 256 rows (wordlines) and 128 columns (bitlines).

We use the transport-analysis-based 3-D technology computer-aided design (TCAD)

capacitance extraction technique [62] to extract the front-end-of-line (FEOL) + back-

end-of-line (BEOL) parasitic capacitance of SRAM cells. We back-annotate the ex-

tracted capacitance in mixed-mode device simulations for better accuracy in transient

simulations. We use the π3 distributed RC line model for the wordline and bitline.

29

3.3 MPA FinFET-based 6T SRAM cells

In order to minimize the area occupied by SRAM cells and achieve the highest density,

we investigate SRAM cells consisting of FinFETs with a single fin only. Next, we

discuss how the most promising MPA FinFET-based SRAM cells are selected.

3.3.1 Selection of promising SRAM cells

We first characterize all possible configurations of MPA FinFET-based SRAM cells

using TCAD simulations. Then, the promising cells are selected from among them

using a partially automated process. Due to layout constraints, e.g., access and pull-

down transistors share the terminal connected to the storage nodes, both access and

pull-down transistors are either symmetrically or asymmetrically-doped (2 cases).

Pull-up transistors can be symmetric or asymmetric in terms of source/drain dop-

ing concentration, independent of other transistors (2 cases). In the context of gate

underlap, each FinFET type (pull-up, access, pull-down) can be either symmetric

or asymmetric (23 cases). Similarly, 23 cases arise from gate workfunction symme-

try/asymmetry. Thus, in all there are 2 × 2 × 23 × 23 = 256 different SRAM cell

configurations. We eliminate non-promising cells through the following steps, based

on SRAM dc metrics.

1. Cells that are dominated by other cells are eliminated first. For example, cell

(0,1,5), which consists of symmetric SG (0) pull-up, asymmetric workfunction

SG (1) access, and asymmetric workfunction-underlap SG (5) pull-down Fin-

FETs, is better in every dc metric compared to cell (6,2,2), which is composed

of ADUSG pull-up, ADSG access, and ADSG pull-down FinFETs, as shown

in Table 3.3. Therefore, cell (0,1,5) dominates cell (6,2,2). Hence, the latter is

eliminated. 28 cells are eliminated in this step.

30

Table 3.3: SRAM cell elimination examples

SRAM RSNM (mV) WM (mV) RPNM (µW) WTP (µW) IREAD (µA) ILEAK (nA)(6,2,2) 119 246 9.754 14.776 80.017 1.867(0,1,5) 136 275 12.977 11.550 80.421 0.120(4,0,1) 44 326 2.628 3.391 91.251 1.152(6,6,6) 130 242 12.797 19.944 84.691 3.589(2,3,7) 148 214 10.778 12.921 56.991 0.904(7,3,2) 174 255 17.464 12.403 56.870 0.103(4,1,1) 138 219 11.488 11.170 71.160 0.185(0,1,1) 126 313 9.884 8.298 71.160 0.097

2. Cells with low RSNM (RSNM < 90 mV) are eliminated next, since a low RSNM

may lead to SRAM failure, e.g., cell (4,0,1). 84 of the remaining cells are

eliminated in this step due to their low RSNM. Most of these cells have stronger

access transistors with respect to their pull-down transistors.

3. Since leakage power is a major concern in SRAM design, cells with high leakage

(ILEAK > 2.4 nA) are eliminated, e.g., cell (6,6,6). 37 of the remaining cells are

eliminated in this step due to their high ILEAK. These are mostly the cells with

asymmetry in underlap, which leads to high ILEAK.

4. Among the remaining non-dominated cells, elimination is done manually

through the following steps. 98 of the remaining cells are eliminated in this

step.

• If a cell is nearly dominated by another cell, it is eliminated. For example,

cell (2,3,7) is eliminated because it is worse than cell (7,3,2) in every metric

but IREAD, and only slightly superior in IREAD.

• Cells with fewer asymmetries are favored since they require fewer fabri-

cation steps. For example, although cell (0,1,1) has slightly lower read

stability, it has less fabrication cost and is better in other metrics com-

pared to cell (4,1,1). Thus, cell (4,1,1) is eliminated.

31

Table 3.4: 6T SRAM configurations

SRAM PU AX PD(0,0,0) [30] SG SG SG(1,1,1) [30] AWSG AWSG AWSG(2,2,2) [31] ADSG ADSG ADSG(4,4,4) [32] AUSG AUSG AUSGPGFB [63] SG IG SG

(0,1,1) SG AWSG AWSG(0,1,5) SG AWSG AWUSG(0,3,3) SG AWDSG AWDSG(3,3,3) AWDSG AWDSG AWDSG(4,3,7) AUSG AWDSG AWDUSG

After the above eliminations, only five MPA FinFET-based SRAM cells appear to

be promising. The SRAM cell configurations selected for further analysis, including

the cell consisting of symmetric SG FinFETs, three SPA FinFET-based cells, pass-

gate feedback (PGFB) design consisting of symmetric SG and IG FinFETs, and the

five proposed MPA FinFET-based cells, are shown in Table 3.4. PGFB is a well-

known 6T SRAM cell designed for high read stability [63]. It uses IG FinFETs as

access transistors. The back gate of the access transistor is connected to the storage

node in order to weaken the access transistor to achieve high read stability.

3.3.2 SRAM dc metrics analysis

In this section, we demonstrate that MPA FinFETs offer competitive dc metrics, such

as read stability, writeability, IREAD, and ILEAK. Table 3.5 shows dc and transient

metric values for the analyzed SRAM cells.

Cell (0,0,0) consists of high-performance and low-Vth SG FinFETs. Though cell

(0,0,0) has low RSNM and RPNM, it has high writeability and IREAD. However, its

ILEAK is also high. Since cell (0,0,0) is composed of conventional FinFETs, we use

this cell as the baseline.

32

Table 3.5: 6T SRAM dc and transient metric values

SRAM RSNM (mV) WM (mV) RPNM (µW) WTP (µW) IREAD (µA) ILEAK (nA) TR (ps) TW (ps)(0,0,0) 31 327 2.074 3.838 108 2.033 27.0 34.2(1,1,1) 91 361 5.570 3.354 71 0.041 45.9 38.2(2,2,2) 106 303 8.185 11.538 80 1.803 34.0 32.9(4,4,4) 68 264 5.946 13.409 121 4.529 23.5 36.1PGFB 170 159 12.738 15.788 60 2.033 49.7 64.5(0,1,1) 126 313 9.884 8.298 71 0.097 45.9 50.8(0,1,5) 136 275 12.977 11.550 80 0.120 44.6 52.6(0,3,3) 173 287 14.768 10.681 54 0.091 54.7 54.8(3,3,3) 135 339 9.647 4.996 54 0.035 54.8 38.7(4,3,7) 188 158 19.953 18.121 57 0.197 53.6 65.2

Cell (1,1,1) reduces ILEAK by 50× since it is based on AWSG FinFETs, thus

addressing the SRAM leakage power problem very effectively. Cell (1,1,1) also has

the best writeability metric values, i.e., highest WM and lowest WTP, owing to a

weaker pull-up transistor compared to an access transistor. Although cell (1,1,1)

provides a higher RSNM and RPNM compared to cell (0,0,0) due to increased Vth,

read stability is still low since the access and pull-down FinFETs are of the same

type and hence have the same strength. IREAD decreases by 34% due to the weaker

transistors.

Cell (2,2,2) utilizes bidirectional current flow across access transistors using ADSG

FinFETs. This helps mitigate the read-write conflict of the SRAM cell. By connect-

ing the low-doped terminal to the storage node, the access transistor is made weaker

during the read operation and stronger during the write operation. Therefore, ADSG

access transistors not only enhance read stability, but also help writeability. Com-

pared to cell (0,0,0), cell (2,2,2) demonstrates a significant improvement in RSNM

and RPNM, but lower writeability. Due to the lower drain doping concentration, the

drain resistance of an ADSG FinFET is larger than that of an SG FinFET. As a

result, IREAD is degraded by 26%, but ILEAK is reduced by 11%.

Cell (4,4,4) offers the highest IREAD among all configurations owing to the high

current drive of AUSG FinFETs. Decreasing LUN on the source side enables a higher

ION , but with an increased IOFF . Therefore, ILEAK increases by 2.2×. Besides, like

33

the ADSG FinFETs, AUSG FinFETs can also utilize bidirectional current flow across

access transistors to help resolve the read-write conflict.

The PGFB cell provides high read stability by connecting the storage node to the

back gate of the IG access transistor, thus weakening the access transistor during the

read operation. However, due to the weaker access transistor, WM reduces by 51%,

thus degrading writeability, and IREAD decreases by 44%. ILEAK remains the same as

that of cell (0,0,0).

In the case of the chosen MPA FinFET-based SRAM cells, cell (0,1,1) demon-

strates that changing even a single FinFET type can significantly affect the SRAM

metric values. Compared to cell (1,1,1), changing the pull-up transistor from AWSG

to SG increases RSNM by 38%, as shown in Fig. 3.2, and RPNM by 77%, at the cost

of worse writeability: 13% decrease in WM and 147% increase in WTP. The effect

of the pull-up transistor on writeability is straightforward. As the pull-up transistor

gets stronger, it becomes harder for the access transistor to discharge the storage node

that stores a “1”. Therefore, more power is required to complete the write opera-

tion. As a result, WM decreases and WTP increases. On the other hand, the pull-up

transistor strength can also impact the RSNM by changing the shape of the read

butterfly curve. Although access and pull-down transistors are the main components

that determine read stability, for better read stability the pull-up transistor needs to

be selected carefully as well. Cell (0,1,1) has 34% and 21× reduction in IREAD and

ILEAK, respectively, compared to those of cell (0,0,0).

Cell (0,1,5) has better read stability and worse writeability compared to those

of cell (0,1,1) owing to the strong AWUSG pull-down FinFET. Cell (0,1,5) has the

highest IREAD among the proposed MPA FinFET-based SRAM cells. Leakage is

reduced by 17× compared to that of cell (0,0,0) as a result of the asymmetric gate

workfunction in nFinFETs.

34

0 0.2 0.4 0.6 0.8V

L (V)

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

VR

(V

)

(0,1,1)(1,1,1)

Figure 3.2: RSNMs of the (1,1,1) and (0,1,1) cells show how the pull-up transistorcan impact read stability.

Cell (0,3,3) offers high read stability together with reasonable writeability. By

combining asymmetries in gate workfunction and doping concentration, the ILEAK is

reduced by 22× compared to that of cell (0,0,0), and IREAD is decreased by 50%.

Cell (3,3,3) consists of AWDSG FinFETs, and hence offers the lowest leakage

among all cells, since leakage is reduced both due to the asymmetry in gate work-

function and in doping concentration. ILEAK is reduced by 58× compared to that of

cell (0,0,0). Yet, cell (3,3,3) has reasonable stability metric values due to its high-Vth

FinFETs, and unequal current flow through asymmetrically-doped access transistor

during read and write operations. Compared to cell (0,3,3), its RSNM is 22% and

RPNM is 35% smaller while WM is 18% higher and WTP is 53% smaller, as a result

of the weaker pull-up transistor. Cell (3,3,3) has the best writeability (highest WM,

lowest WTP) among the proposed MPA FinFET-based SRAM cells.

Cell (4,3,7) has the highest RSNM and RPNM of all SRAM cells. Adding asym-

metry in gate underlap to the pull-down transistor makes it stronger and increases

read stability. However, due to the use of a strong pull-up transistor, cell (4,3,7)

35

has the worst writeability. IREAD is 47% and ILEAK is 10× smaller than those of cell

(0,0,0).

Overall, an SRAM cell consisting of SG FinFETs suffers from adverse read stabil-

ity and leakage power consumption. One way to reduce the ILEAK and increase read

stability is to increase the Vth of the FinFETs. Asymmetry in gate workfunction can

be used to reduce leakage if we want to avoid using new gate workfunction values

to increase Vth. Use of asymmetry in doping concentration and gate underlap was

shown to be effective in mitigating the read-write conflict.

3.3.3 SRAM transient metrics analysis

Table 3.5 also shows the transient metric values of analyzed 6T SRAM cells. As

expected, as IREAD increases, TR decreases because it takes less time to discharge the

bitline capacitance and generate the voltage difference between the bitlines to activate

the sense amplifier. TR is the smallest for cell (4,4,4) as it provides the highest IREAD.

Cell (0,1,5) has the smallest TR among proposed MPA FinFET-based SRAM cells.

Due to their lowest IREAD, cells (0,3,3) and (3,3,3) have the worst TR, 2× larger than

that of cell (0,0,0).

TW is highly correlated with the strength of the pull-up and access transistors.

Cell (2,2,2) has weaker pull-up transistor compared to access transistor, leading to

better writeability and smaller TW. For the same reason, cell (3,3,3) has the smallest

TW among the MPA FinFET-based SRAM cells.

MPA FinFET-based SRAM cells we propose suffer from inferior transient metrics

because the strength of the MPA FinFETs is reduced due to asymmetry in gate

workfunction and doping concentration. Reduced ION increases the time to charge

and discharge capacitances. Therefore, both TR and TW have increased for SRAM

cells based on MPA FinFETs. Resizing the memory array or increasing the supply

voltage can help improve TR and TW.

36

3.4 Analysis of the SRAM cells under different

gate workfunction, doping concentration, sup-

ply voltage, and temperature values

In this section, we demonstrate that the proposed MPA FinFET-based SRAM cells re-

main promising under different gate workfunction (hence, different Vth), source/drain

doping concentration, supply voltage, and temperature values.

3.4.1 Different gate workfunction values

We have chosen gate workfunctions such that SG nFinFETs have ΦG = (4.4+∆Φ)eV

and SG pFinFETs have ΦG = (4.8−∆Φ)eV in order to maintain approximately twice

ION in nFinFETs than in pFinFETs. AWSG FinFETs have ΦGF = (4.4+∆Φ)eV and

ΦGB = (4.8−∆Φ)eV . ∆Φ changes from 0 to 0.2eV with a step size of 0.02eV . The

gate underlap asymmetry is chosen such that an AWUSG FinFET performs better

than an SG FinFET. To achieve this aim, we choose LUNS = (2+∆Φ/0.02)nm, which

guarantees that an AWUSG FinFET has a higher ION and lower IOFF compared

to those of an SG FinFET. At ∆Φ = 0.2eV (ΦGF = ΦGB = 4.6eV ), there is no

asymmetry in the gate workfunction. Hence, in this case, we do not introduce any

asymmetry in gate underlap. As a result, all SRAM cells, except PGFB, converge to

the behavior of the (0,0,0), (2,2,2), or (0,2,2) cells at ∆Φ = 0.2eV .

The RSNM for the SRAM cells for different gate workfunction values is shown in

Fig. 3.3. As ∆Φ (hence, Vth) increases, the RSNM of cell (0,0,0) increases. The PGFB

cell becomes even more advantages as ∆Φ increases, due to weaker access transistors.

The PGFB cell has 72% higher while cells (2,2,2) and (0,2,2) have 26% higher RSNM

compared to that of cell (0,0,0) when ∆Φ = 0.2eV .

37

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20

RS

NM

(V

)

ΔΦ (eV)

(0,0,0) (1,1,1) (2,2,2) (4,4,4) PGFB (0,1,1) (0,1,5) (0,3,3) (3,3,3) (4,3,7)

Figure 3.3: RSNM under different gate workfunction values.

Fig. 3.4 shows the plots of WM with respect to ∆Φ. As ∆Φ increases, WM

increases for all cells except for PGFB. Specifically, the WM of cell (4,3,7) increases

significantly since the impact of asymmetric gate underlap on the pull-up transistor

decreases. Cells (2,2,2) and (0,2,2) have 2% lower while PGFB has 52% lower WM

compared to that of cell (0,0,0) when ∆Φ = 0.2eV .

As shown in Fig. 3.5, RPNM changes are similar to those in RSNM under different

gate workfunction values. At ∆Φ = 0.2eV , the PGFB, (0,2,2), and (2,2,2) cells have

90%, 29%, and 27% higher RPNM compared to that of cell (0,0,0), respectively.

WTP decreases as ∆Φ increases, as shown in Fig. 3.6. The PGFB cell has the

worst WTP when ∆Φ = 0.2eV , 153% higher than that of cell (0,0,0).

IREAD is plotted in Fig. 3.7. The change in IREAD for the (1,1,1) and proposed

MPA FinFET-based SRAM cells is less than 8% since they have FinFETs with an

asymmetric gate workfunction. Increasing ΦGF from 4.4eV to 4.6eV increases nFin-

FET Vth, thus decreases the current flow. On the other hand, the change in ΦGB

from 4.8eV to 4.6eV helps increase the nFinFET current. A similar scenario is valid

38

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20

WM

(V

)

ΔΦ (eV)

(0,0,0) (1,1,1) (2,2,2) (4,4,4) PGFB (0,1,1) (0,1,5) (0,3,3) (3,3,3) (4,3,7)

Figure 3.4: WM under different gate workfunction values.

0

5

10

15

20

25

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20

RP

NM

(μW

)

ΔΦ (eV)

(0,0,0) (1,1,1) (2,2,2) (4,4,4) PGFB (0,1,1) (0,1,5) (0,3,3) (3,3,3) (4,3,7)

Figure 3.5: RPNM under different gate workfunction values.

for pFinFETs. Therefore, the AWSG FinFET current changes less compared to the

SG FinFET current as ∆Φ increases. This explains the small change in IREAD for

the (1,1,1) and proposed MPA FinFET-based SRAM cells. The (2,2,2), (0,2,2), and

39

0

2

4

6

8

10

12

14

16

18

20

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20

WT

P (μ

W)

ΔΦ (eV)

(0,0,0) (1,1,1) (2,2,2) (4,4,4) PGFB (0,1,1) (0,1,5) (0,3,3) (3,3,3) (4,3,7)

Figure 3.6: WTP under different gate workfunction values.

0

20

40

60

80

100

120

140

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20

I RE

AD

(μA

)

ΔΦ (eV)

(0,0,0) (1,1,1) (2,2,2) (4,4,4) PGFB (0,1,1) (0,1,5) (0,3,3) (3,3,3) (4,3,7)

Figure 3.7: IREAD under different gate workfunction values.

PGFB cells have 25%, 25%, and 54% lower IREAD compared to that of cell (0,0,0)

when ∆Φ = 0.2eV .

40

0.00

0.01

0.10

1.00

10.00

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20

I LE

AK

(nA

)

ΔΦ (eV)

(0,0,0) (1,1,1) (2,2,2) (4,4,4) PGFB (0,1,1) (0,1,5) (0,3,3) (3,3,3) (4,3,7)

Figure 3.8: ILEAK under different gate workfunction values.

As ∆Φ increases, ILEAK decreases exponentially due to an increase in Vth, as shown

in Fig. 3.8. At ∆Φ = 0.2eV , cells (2,2,2) and (0,2,2) have 28% less, while the PGFB

cell has 19% less ILEAK compared to that of cell (0,0,0).

Fig. 3.9 shows TR under different gate workfunction values. TR changes by less

than 12% for the (1,1,1) and MPA FinFET-based SRAM cells, since IREAD change

was less than 8% for these cells. At ∆Φ = 0.2eV , TR is 82% higher for the PGFB cell

compared to that of cell (0,0,0), while the (2,2,2) and (0,2,2) cells have 15% higher

TR.

TW is 60% larger for the PGFB cell when ∆Φ = 0.2eV , while it only increases by

1% and 4% for cells (2,2,2) and (0,2,2), respectively, compared to that of cell (0,0,0),

as shown in Fig. 3.10.

3.4.2 Different doping concentration values

Next, we changed the doping concentration of asymmetrically-doped FinFETs and

observed the effect on the SRAM metric values. Changing the drain doping concen-

41

0

10

20

30

40

50

60

70

80

90

100

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20

TR

(ps)

ΔΦ (eV)

(0,0,0) (1,1,1) (2,2,2) (4,4,4) PGFB (0,1,1) (0,1,5) (0,3,3) (3,3,3) (4,3,7)

Figure 3.9: TR under different gate workfunction values.

0

10

20

30

40

50

60

70

80

0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20

TW

(ps)

ΔΦ (eV)

(0,0,0) (1,1,1) (2,2,2) (4,4,4) PGFB (0,1,1) (0,1,5) (0,3,3) (3,3,3) (4,3,7)

Figure 3.10: TW under different gate workfunction values.

tration of asymmetrically-doped FinFETs can affect SRAM stability significantly, as

shown in Table 3.6. For example, cell (0,3,3) has 21% lower RSNM and 6% higher

WM when ND = 5×1019 cm−3 compared to when ND = 1019 cm−3. By changing the

42

Table 3.6: SRAM stability metric values under different drain doping concentrations

SRAM RSNM (mV) WM (mV)(2,2,2) 106 47 303 321(0,3,3) 173 137 287 304(3,3,3) 135 101 339 355(4,3,7) 188 156 158 171

ND(cm−3) 1019 5× 1019 1019 5× 1019

doping concentration of the drain terminal, one may achieve the desired read stability

metric values at the cost of writeability, and vice versa.

3.4.3 Different supply voltage values

We simulated all analyzed SRAM cells under different VDD values to compare their

ILEAK at the same performance and vice versa. We changed the VDD values of the cells

to get the same performance by ensuring that each cell has the same IREAD (108 µA)

and compared their ILEAK. Similarly, we compared the IREAD of cells for the same

ILEAK (2.033 nA). The results are shown in Table 3.7. For the same performance

(iso-IREAD), cell (1,1,1) has the lowest ILEAK, which is 25× smaller than that of cell

(0,0,0), thanks to the asymmetry in gate workfunction. Cell (0,1,5) provides the

highest IREAD for the same ILEAK (iso-ILEAK) due to the asymmetric gate underlap

in the pull-down transistor. Cell (0,3,3), which was shown to be promising in terms

of stability metrics, offers 7.79× ILEAK reduction for the same performance and 33%

higher IREAD for the same ILEAK. Overall, MPA FinFET-based SRAMs offer higher

performance for the same ILEAK and lower ILEAK at the same performance compared

to the conventional SRAM cell.

3.4.4 Different temperature values

The temperature of on-chip SRAM arrays can go up to 90◦C [64]. We perform

simulations under temperature value of 27◦C (300◦K), for which results are given in

43

Table 3.7: Comparison of SRAM cells at iso-IREAD/iso-ILEAK

SRAM ILEAK at iso-IREAD IREAD at iso-ILEAK

(0,0,0) 1.00× 1.00×(1,1,1) 0.04× 1.64×(2,2,2) 0.92× 1.34×(4,4,4) 2.16× 0.01×PGFB 1.17× 0.56×(0,1,1) 0.07× 1.63×(0,1,5) 0.07× 1.78×(0,3,3) 0.13× 1.33×(3,3,3) 0.10× 1.33×(4,3,7) 0.17× 1.37×

Table 3.8: SRAM metric values at 0◦C


Table 3.5, 0◦C, 65◦C and 90◦C, for which results are given in Table 3.8, 3.9, and

3.10, respectively. With an increasing temperature, read stability gets worse as RSNM

and RPNM decrease. Writeability, on the other hand, increases with an increasing

temperature. IREAD decreases at higher temperature since mobility degrades due to

impurity scattering. ILEAK increases exponentially with increasing temperature. Yet,

cell (3,3,3) delivers 39× reduction in ILEAK at 90◦C. TR increases with an increasing

temperature due to the decrease in IREAD. TW decreases slightly as temperature

increases. Overall, the proposed MPA FinFET-based SRAM cells can be seen to

maintain their advantages at corner temperatures.

44





3.5 Process variations

Process variations pose a severe challenge to SRAM performance due to the scaling

of both device parameters and VDD. SRAM cells are especially prone to process

variations because they are built from the smallest transistors to achieve high density.

Besides, SRAM operation depends on perfectly-matched transistors, which makes it

even more sensitive to process variations. We investigate variations in LG, TOX , TSI ,

and ΦG because they have been shown to affect SRAM performance and stability

metrics significantly [65]. Table 3.11 shows the nominal and [−3σ, 3σ] variation range

of these parameters. We assume the physical parameters have a normal distribution

[66] with 3σ/µ = 10% variation. We generate 100 sample points for Sobol sequence

based quasi Monte Carlo simulation, which provides performance akin to Monte Carlo

simulation while needing several orders of magnitude fewer sample points. We show

the distribution characteristics, i.e., mean (µ) and standard deviation (σ), due to

process variations in Table 3.12 and Table 3.13.

45

Table 3.11: Process variations

Parameter Nominal Value Range[−3σ, 3σ]LG(nm) 24 [21.6, 26.4]TSI(nm) 10 [9, 11]TOX(nm) 1 [0.9, 1.1]ΦGF (eV ) 4.4 [4.38, 4.42]ΦGB(eV ) 4.8 [4.78, 4.82]

Table 3.12: Distribution characteristics, µ


Table 3.13: Distribution characteristics, σ

SRAM RSNM (mV) WM (mV) RPNM (µW) WTP (µW) IREAD (µA) ILEAK (nA) TR (ps) TW (ps)(0,0,0) 6 1 0.315 0.524 1.6 0.562 1.123 0.732(1,1,1) 5 2 0.396 0.166 1.2 0.010 0.853 0.722(2,2,2) 5 1 0.372 0.153 1.2 0.489 0.950 0.683(4,4,4) 5 2 0.483 0.594 2.2 1.585 0.949 0.977PGFB 6 3 0.551 0.143 1.6 0.562 1.324 1.514(0,1,1) 5 3 0.486 0.168 1.3 0.020 0.853 0.746(0,1,5) 5 3 0.608 0.227 1.7 0.026 0.860 0.642(0,3,3) 4 3 0.467 0.151 1.0 0.019 0.842 0.783(3,3,3) 5 1 0.426 0.136 0.9 0.008 0.842 0.753(4,3,7) 4 5 0.650 0.409 1.1 0.059 0.834 1.106

Cells (0,3,3) and (4,3,7) have the highest µ and lowest σ for RSNM, which makes

them highly robust during read operation. We observe a 3σ RSNM value of 12-18

mV across the cells, which could negatively impact read operation for SRAM cells

with a lower RSNM.

Process variations have less impact on WM compared to RSNM. Cell (3,3,3) has

the highest µ and lowest σ among the MPA FinFET-based SRAM cells, thus offering

the best writeability performance. Cell (4,3,7) has the worst WM with the lowest µ

and highest σ.

46

Although cell (4,3,7) has the highest σ for RPNM, its σ/µ ratio is the smallest

together with that of cell (0,3,3), which makes them better in read stability.

In terms of WTP, cell (4,3,7) has the largest µ and largest σ among SRAM cells

based on MPA FinFETs, which confirms its inferior writeability performance.

SRAM cells that have a higher IREAD µ have a higher σ as well. IREAD σ/µ ratio

is less than 3% for the analyzed SRAM cells.

Process variations affect ILEAK the most because gate workfunction variation is

the main contributor to Vth variation, which impacts ILEAK exponentially. Cell (3,3,3)

has both the smallest µ and σ.

TR and TW of MPA FinFET-based SRAM cells under process variations have a

σ that is smaller than 2% of µ.

3.6 Discussion

We proposed and compared five MPA FinFET-based SRAM cells to symmetric and

SPA FinFET-based SRAM cells in terms of dc and transient metrics. Results show

that by combining multiple asymmetries in FinFETs, we can design very low-leakage

and robust SRAM cells.

For better read stability, stronger pull-down transistors compared to access tran-

sistors are needed in order to keep the voltage at the internal node that stores a “0”

as small as possible during the read operation. On the other hand, strong access and

weak pull-up transistors are needed for better writeability to more easily flip the bit of

the cell during the write operation. Using transistors with multiple fins (hence, larger

width), better SRAM stability behavior can be achieved. For example, increasing the

number of fins in the pull-down transistor will increase its strength and provide better

read stability. However, increasing the number of fins increases cell area and leakage.

As SRAMs already occupy a large fraction of the on-chip area and consume an even

47

larger fraction of leakage power, designing SRAM cells with single-fin FinFETs saves

area and power. The MPA FinFETs we analyzed offer different effective strengths

even though they are all assumed to have a single fin. Thus, they offer a path to

high-density SRAMs with low leakage and acceptable stability values.

Requiring the access transistors to be weaker for better read stability and stronger

for better writeability leads to the so-called read-write conflict. This can be addressed

by using FinFETs with asymmetry in doping concentration or gate underlap as access

transistors. Since they exhibit an unequal current flow across the FinFET for positive

and negative voltage bias between the source and drain terminals, they can help

mitigate the read-write conflict.

For ultra-low power SRAMs, cell (3,3,3) is the most promising since it offers a

58× ILEAK reduction, assuming ΦGF = 4.4eV and ΦGB = 4.8eV , compared to the

SG FinFET-based SRAM cell. Due to its asymmetry in doping concentration, it

also provides comparable stability values. On the other hand, reduced current drive

increases TR and TW of cell (3,3,3).

The ILEAK improvement in cell (3,3,3) is as expected. In the hold mode, three Fin-

FETs of a 6T SRAM cell leak (PD1, AX2, and PU2). Cell ILEAK can be estimated by

adding the leakage of these FinFETs without simulating the entire cell. The ILEAK of

an SG pFinFET and nFinFET is 0.056 nA and 0.983 nA, respectively. Thus, ILEAK

of cell (0,0,0) can be estimated as 2.022 nA (0.056+2× 0.983 = 2.022), which is very

close the value (2.033 nA) obtained via simulating the entire cell. ILEAK of cell (3,3,3),

which consists of AWDSG FinFETs, can be estimated similarly. AWDSG pFinFET

(VDS < 0), AWDSG nFinFET (VDS > 0), and AWDSG nFinFET (VDS < 0) have an

ILEAK of 0.001, 0.017, and 0.022, respectively. Although PD1 and AX2 in cell (3,3,3)

are both nFinFETs, their ILEAK is different due to an asymmetry in doping concentra-

tion. ILEAK of cell (3,3,3) can be estimated as 0.04 nA (0.001 + 0.017 + 0.022 = 0.04),

which is slightly different than the value (0.035 nA) obtained via simulating the en-

48

tire cell. A 51× reduction in ILEAK of cell (3,3,3) with respect to cell (0,0,0) can

be estimated using ILEAK values of single FinFETs, which are computationally much

cheaper to characterize compared to an entire cell. Such analytical approaches that

are based on single FinFET characteristics can be useful for evaluating SRAM cells in

early design steps. However, the entire cell needs to be simulated and characterized

in order to obtain more accurate results.

Use of cell (0,0,0) with gate workfunction values ΦGF = ΦGB = 4.6eV reduces

ILEAK by 1079× compared to that of cell (0,0,0) with ΦG = 4.4eV for nFinFET

and ΦG = 4.8eV for pFinFET. Besides, it has good read stability metric values

(RSNM=155 mV, RPNM=11.686 µW) and writeability metric values (WM=356 mV,

WTP=3.738 µW). This, however, would require an extra gate workfunction of 4.6eV

(since a ΦG of 4.4eV and 4.8eV , respectively, for an nFinFET and pFinFET will

still be required to implement high-performance logic). If an extra gate workfunction

is available, several other SRAM cells exhibit an even better performance than the

(0,0,0) cell.

For high read stability, the (0,3,3) or (4,3,7) cell can be used. The (4,3,7) cell

has the highest RSNM at the cost of weak writeability. Compared to cell (4,3,7), cell

(0,3,3) has slightly weaker read stability: RSNM is 8% and RPNM is 26% smaller,

respectively. However, it has significantly better writeability: WM is 82% higher and

WTP is 41% smaller. Cell (0,3,3) also consumes 2.2× less ILEAK compared to cell

(4,3,7) and requires fewer fabrication steps since it does not have an asymmetry in

gate underlap. Overall, for high stability purposes, cell (0,3,3) is the most promising

choice of all SRAM configurations we explored.

Cell (0,1,1) only has asymmetry in gate workfunction. Still, it offers good write-

ability and 21× leakage reduction. Its read stability can be improved using read-assist

techniques.

49

Although cell (0,1,5) has the highest IREAD among the targeted MPA FinFET-

based SRAM cells, it is less desirable because it requires an extra fabrication step due

to the use of an asymmetric gate underlap.

We have shown that MPA FinFETs can be used in designing promising SRAM

cells under different gate workfunction, source/drain doping concentration, supply

voltage, and temperature values. We have performed FinFET simulations and care-

fully select the SRAM design parameter values while taking previously reported data

into account. Nevertheless, optimal parameter values in FinFET or SRAM design can

change depending on designer’s choice and the applications. It is crucial to derive

an optimal design for symmetric and asymmetric FinFET devices. The search space

of an optimal FinFET design can be very large due to the large number of FinFET

parameters. An exhaustive search for an optimal FinFET design can be computa-

tionally laborious and time-consuming. Stochastic optimization techniques such as

genetic algorithm, simulated annealing etc., can be used to explore the large FinFET

design search space and find optimal designs.

Determining the degree of asymmetry in FinFETs in a systematic way is crucial

as it significantly impacts the performance of asymmetric FinFET-based designs.

The same methodology used to design optimal symmetric FinFETs can be used for

asymmetric FinFETs as well. However, finding optimal asymmetric FinFETs may

be computationally more costly since there is a larger design space to cover due

to additional asymmetric parameters. In addition, optimization of an asymmetric

FinFET by itself may not be sufficient for promising designs since the true benefit of

asymmetry can emerge at the circuit level. For example, an ADSG FinFET, which

may be inferior to an SG FinFET, can be especially useful at mitigating the read-

write conflict. Therefore, a device-circuit co-design approach is needed to optimize

asymmetric FinFETs.

50

In this work, only five MPA FinFET-based SRAM cells were analyzed in detail.

The cells that are not analyzed may not perform worse in every aspect compared

to these five cells. The cells that are very similar to one of the five proposed cells

were excluded from detailed analysis. For example, cell (4,3,3) performs similar to cell

(3,3,3). However, only cell (3,3,3) was chosen for further analysis as it performs better

than cell (4,3,3) in all aspects except RSNM and has fewer fabrication steps. Still,

it gives an idea as to how a (4,3,3) cell would perform. Thus, the five proposed cells

indirectly represent a larger design space, which consists of cells that can be obtained

with slight modifications in terms of asymmetries introduced in the FinFETs.

One disadvantage of MPA FinFET-based SRAM cells is that the number of fab-

rication steps increases as we introduce asymmetries in FinFETs. AWSG FinFETs

can be fabricated using differently doped gate stacks without the need for additional

masking steps [67, 68]. ADSG FinFETs require an extra mask to dope the source

and drain unequally [31]. AUSG FinFETs also require an extra mask to create an

asymmetric underlap, which can be achieved using asymmetric spacer [69] or tilt ion

implantation [70]. However, an AUSG FinFET is harder to fabricate compared to an

ADSG FinFET [33].

3.7 Chapter summary

For the first time, we showed that MPA FinFETs can be used to design ultra-low-

leakage, robust, and high-density SRAM cells. Results indicate that by using AWDSG

FinFETs in SRAM design, with combined asymmetries in gate workfunction and

doping concentration, the ILEAK of the cell can be reduced by 58× while still demon-

strating acceptable stability compared to the traditional symmetric SG FinFET-based

SRAM cell. To obtain high read stability, an SRAM cell designed with SG pull-up and

AWDSG access and pull-down transistors shows the most promise, along with 22×

51

ILEAK reduction. Cell TR and TW are higher for the proposed MPA FinFET-based

SRAM cells. MPA FinFET-based SRAM cells incur zero area overhead. However,

the number of fabrication steps increases for MPA FinFETs depending on the type

and number of asymmetries incorporated in them.

52

Chapter 4

3-D Monolithic FinFET-based 8T

SRAM Cell Design for Enhanced

Read Time and Low Leakage

FinFETs have replaced planar MOSFETs due to their superior performance, power

efficiency, and scalability. However, even FinFETs are expected to reach their scaling

limits due to physical limits, process variations, and intolerable SCEs. As an alter-

native to scaling, 3-D ICs can increase the number of transistors per unit footprint

area. Among 3-D technologies, monolithic 3-D integration promises the highest den-

sity, performance, and power efficiency owing to its high-density MIVs. In a TLM

design, n- and p-type transistors are placed on different layers. Thus, it requires a

new 3-D cell library. In this chapter, we propose two new 3-D TLM 8T SRAM cells.

Both the proposed cells use pFinFET access transistors to achieve better area and

leakage power efficiency in 3-D. One of the proposed cells utilizes IG pFinFETs as

pull-up transistors whose back gates are tied to VDD for better writeability. This cell

has 28.1% and 43.8% smaller footprint area, 31.6% and 43.2% smaller ILEAK, and

53

53.2% and 29.0% lower TR compared with conventional 2-D 6T and 8T SRAM cells,

respectively [26].

4.1 Introduction

3-D ICs can improve performance, decrease power consumption by reducing inter-

connect length, and fit more devices in a chip without increasing its footprint area.

Monolithic 3-D ICs are particularly attractive as they exploit the third dimension

more efficiently, thanks to their small MIVs that connect the transistor layers. In a

TLM design, n-type and p-type transistors are fabricated on two separate layers. This

enables independent optimization of these layers. However, every logic and memory

cell in the cell library needs to be reimplemented in 3-D.

SRAM cells often occupy more than half of the die area and consume a significant

amount of leakage power. Design of high-density, high-performance, and low-power

TLM SRAM cells with good stability metrics is crucial for TLM designs to succeed.

With continued scaling in device dimensions and VDD, the impact of process varia-

tions on SRAM stability increases. A conventional 6T SRAM cell can be prone to

stability issues when the internal nodes are disturbed during a read operation. In

addition, it has low area efficiency when implemented in 3-D because it has four n-

type and two p-type transistors. The conventional 8T SRAM cell can improve read

stability by isolating data retention from the read operation [41]. However, similar

to the 6T SRAM cell, the conventional 3-D 8T SRAM cell is not area-efficient due

to the asymmetry in n-type and p-type transistor count (six n-type and two p-type

transistors). We make the following contributions in an effort to design area-efficient,

low-power, and high-performance 3-D SRAM cells:

1. We propose two new 3-D 8T SRAM cells (8T 4N4P 3D proposed1,

8T 4N4P 3D proposed2) for enhanced TR, low ILEAK, and high read stability:

54

• 8T 4N4P 3D proposed1 replaces nFinFET access transistors of a conven-

tional 8T SRAM cell with pFinFETs to reduce the footprint area while

preventing the TR from degrading. It has a high read stability, thanks

to the isolated read operation. However, it suffers from poor writeability

due to the use of weak pFinFET access transistors. This problem can be

alleviated through write-assist techniques.

• 8T 4N4P 3D proposed2, in addition to using pFinFETs as access transis-

tors, employs IG pFinFETs as pull-up transistors and ties their back gates

to VDD to improve writeability by weakening the pull-up transistors.

2. We comprehensively evaluate the proposed cells against previously reported

2-D and 3-D SRAM cells and show that they are particularly promising for

low-power and high read performance designs.

3. We explore assist techniques to improve the writeability of the proposed cells.

We compare the proposed cells with a conventional 6T SRAM cell imple-

mented in 2-D (6T 4N2P 2D) and 3-D (6T 4N2P 3D), a conventional 2-D 8T

SRAM cell (8T 6N2P 2D), and two previously reported 3-D 8T SRAM cells

(8T 4N4P 3D prior1, 8T 4N4P 3D prior2). 8T 4N4P 3D prior1 was constructed by

adding two pFinFET read access transistors to a conventional 6T SRAM cell [42].

pFinFET access transistors were used during the read operation to improve the read

stability. 8T 4N4P 3D prior2 was constructed by replacing n-type read transistors

of a conventional 8T SRAM cell with p-type transistors [40, 43].

We implement the SRAM cells using a 14nm SOI FinFET technology and char-

acterize them via 2-D mixed-mode device simulations. We compare the cells based

on their dc and transient metrics, such as RSNM, WM, IREAD, ILEAK, TR, and TW.

8T 4N4P 3D proposed2 offers the smallest TR and lowest ILEAK compared to other

cells, along with a high RSNM. It has 28.1%, 31.6%, and 53.2% reduction in footprint

55

area, ILEAK, and TR, respectively, compared to those of 6T 4N2P 2D. It has 43.8%,

43.2%, and 29.0% reduction in footprint area, ILEAK, and TR, respectively, compared

to those of 8T 6N2P 2D at the cost of 8.8% and 57.1% degradation in RSNM and

WM, respectively [26].

The rest of the chapter is organized as follows. Section 4.2 describes the simulation

setup. Section 4.3 describes the design of monolithic SRAM cells by detailing their

schematics, layouts, and bitline/wordline capacitances. Section 4.4 presents the sim-

ulation results and comparison of cells based on their dc and transient metrics. Sec-

tion 4.5 investigates the impact of process variations, memory array configurations,

assist techniques, different temperatures, and gate workfunction values on SRAM

cells. Section 4.6 discusses our results in comparison with prior work and presents

key observations. Section 4.7 presents the concluding remarks.


The simulation flow is shown in Fig. 4.1. First, the 3-D FEOL + BEOL structure

of each SRAM cell is synthesized based on its layout and technology parameter val-

ues. Then, parasitic capacitances are extracted using the transport-analysis-based

3-D TCAD capacitance extraction technique [62]. Sentaurus Device Simulator [59]

is used to perform 2-D hydrodynamic mixed-mode device simulations to obtain dc

and transient metrics of SRAM cells. It uses the Phillips unified mobility model

and doping-dependent Shockley-Read-Hall recombination model, along with band-

to-band tunneling and avalanche multiplication models, for accurate simulation. For

transient simulations, we use a 256×256 memory array configuration, consisting of

256 rows (wordlines) and 256 columns (bitlines). Wordlines and bitlines are modeled

using the π3 distributed RC line model. In our simulations, we assume that the top-

and bottom-layer transistors have the same quality. The simulations are performed

56

FinFET parameter values

SRAM

netlist

Technology

parameter values

3-D FEOL+BEOL structure

synthesizer

dc metrics: RSNM, WM,

IREAD, and ILEAK

3-D TCAD capacitance extraction

Sentaurus Device

Simulator

Transientdc

SRAM

layout

Array

configuration

Transient metrics: TR and

TW

Figure 4.1: Simulation flow for SRAM characterization.

at 300◦K temperature with a VDD of 0.8 V. A 100 nm thick SiO2 is used as the ILD

to eliminate the inter-layer coupling that may alter transistor behavior [71].

Dc and transient metrics that are defined in Section 1.1.3 are used to evaluate

the SRAM cells. Obtaining metric values of an 8T SRAM cell can differ from that

of a 6T SRAM cell. During the read operation of an 8T SRAM cell with nFinFETs

that provide a separate read path, the read bitline (RBL) and read word line (RWL)

are biased at VDD (VGND if read path transistors are pFinFETs) while the access

transistors are OFF . For these cells, IREAD is measured as the current drawn from

RBL when both read path FinFETs are ON . In the hold mode, wordlines (WL,

RWL) are at VGND if the the access/read transistors are nFinFETs and at VDD if

the access/read transistors are pFinFETs. Lastly, in the case of SRAM cells with

57

WL

VDD

AX1 AX2

PU1 PU2

PD1 PD2

BL BLB

L R

WL

VDD

AX2PU1 PU2

PD1 PD2

BL BLB

L R

AX1

AX3 AX4RWL RWL

WL

VDD

AX1 AX2

PU1 PU2

PD1 PD2

BL BLB

L R

RBL

RWL

RD1

RD2

WL

VDD

AX1 AX2

PU1 PU2

PD1 PD2

BL BLB

RD1

RD2

RWL

RBL

VDD

WL

VDD

AX1 AX2

PU1 PU2

PD1 PD2

BL BLB

RWL

RD1

RD2

RBLWL

VDD

AX1 AX2

PU1 PU2

PD1 PD2

BL BLB

RWL

RD1

RD2

RBL

(a) (b) (c)

(d) (e) (f)

L R L R L R

INV2INV1

Figure 4.2: SRAM cell schematics: (a) 6T 4N2P 2D/6T 4N2P 3D, (b) 8T 6N2P 2D,(c) 8T 4N4P 3D prior1, (d) 8T 4N4P 3D prior2, (e) 8T 4N4P 3D proposed1, and (f)8T 4N4P 3D proposed2.

RBL, we assume that the sense amplifiers are activated when the read bitline voltage

(VRBL) deviates by 100 mV from the initial point (∆VRBL = 100 mV). The rest of the

evaluation metric values are obtained in the same way for 6T and 8T SRAM cells.

4.3 Design of monolithic SRAM cells

We use the 14nm SOI FinFET technology to design the SRAM cells. Table 1.1 shows

the FinFET parameter values we use in our simulations.

4.3.1 Schematics of the SRAM cells

Fig. 4.2 shows the schematics of the SRAM cells we analyze. All SRAM cells use only

single-fin FinFETs to minimize area.

Fig. 4.2a shows the schematic of a conventional 6T SRAM cell. It consists of

a cross-coupled inverter pair (INV1:PU1-PD1, INV2:PU2-PD2) to store information

and two nFinFETs (AX1, AX2) to access the storage nodes. We evaluate both

58

the 2-D and 3-D implementations of the 6T SRAM cell. Although 6T 4N2P 3D is

implemented on two transistor layers, it has the same schematic as 6T 4N2P 2D.

6T 4N2P 2D is the baseline we use in our comparisons, unless otherwise specified.

We also assume internal nodes L and R store “1” and “0” initially.

Fig. 4.2b shows the 8T 6N2P 2D cell schematic [41]. 8T 6N2P 2D eliminates the

read-write conflict by decoupling the read operation from data retention. The read

operation is performed via a read path consisting of two nFinFETs (RD1, RD2) with-

out accessing the storage nodes directly. We only evaluate the 2-D implementation

of the conventional 8T SRAM cell since its 3-D implementation suffers from severe

footprint area inefficiency due to the asymmetry in the number of nFinFETs and

pFinFETs.

Fig. 4.2c shows the schematic of 8T 4N4P 3D prior1, a 3-D 8T SRAM cell ob-

tained by adding two p-type access transistors to a conventional 6T SRAM cell for

read stability enhancement [42]. Its 3-D implementation is balanced because each

transistor layer has four transistors. 8T 4N4P 3D prior1 alleviates the read-write

conflict. It uses weaker pFinFET access transistors during a read operation to in-

crease read stability and stronger nFinFET access transistors during a write operation

to increase writeability. However, the internal storage nodes still get disturbed dur-

ing a read operation and the read time is degraded due to the presence of the weak

pFinFET access transistors.

Fig. 4.2d shows the schematic of 8T 4N4P 3D prior2, a 3-D 8T SRAM cell, which

replaces the n-type read path transistors of a conventional 8T SRAM cell with p-type

transistors to reduce the footprint area in 3-D [40, 43]. It, however, may suffer from

a degradation in TR if the p-type transistors are slower than the n-type transistors,

which is the case in the 14nm FinFET technology.

Fig. 4.2e shows 8T 4N4P 3D proposed1. Unlike 8T 4N4P 3D prior2, it uses nFin-

FETs on the read path to keep TR small. However, it replaces nFinFET access

59

(a) (b) (c)

(d) (e)

(f) (g)

n-layer p-layer

n-layer p-layer n-layer p-layer

n-layer p-layer n-layer p-layer

GND BL VDD BLB GND VDD BL GND BLB VDD RBL GND BL VDD BLB GND

WL

RWL

WL

GND VDD GND BL BLBBL GND BLB RBL VDD

WL

RWL

WL

RWL

WL

RBL GND VDD GND BL BLB RBL GND VDD GND BL BLB

RWL

WL

RWL

WL

Figure 4.3: SRAM layouts: (a) 6T 4N2P 2D, (b) 6T 4N2P 3D, (c) 8T 6N2P 2D,(d) 8T 4N4P 3D prior1, (e) 8T 4N4P 3D prior2, (f) 8T 4N4P 3D proposed1, and(g) 8T 4N4P 3D proposed2.

transistors with pFinFETs to reduce the footprint area by equalizing the number of

nFinFETs and pFinFETs. 8T 4N4P 3D proposed1 suffers from poor writeability due

to weak pFinFET access transistors.

Fig. 4.2f shows the schematic of 8T 4N4P 3D proposed2. This cell, in addition

to employing pFinFET access transistors, uses IG pFinFET pull-up transistors with

their back gates biased at VDD to improve writeability. Weakening the pull-up tran-

sistors allows access transistors to write into the cell more easily. Among all analyzed

SRAM cells, it is the only cell that uses IG FinFETs.

4.3.2 Layouts of the SRAM cells

Fig. 4.3 shows the layout of the SRAM cells. We try to minimize the footprint

area when laying out the SRAM cells. Thus, we only used FinFETs with a single

fin. We use λ-based design rules when obtaining the layouts. In the 14nm FinFET

technology, we use λ = 7 nm. We assume the interconnect width and pitch are 4λ

60

and 8λ, respectively. MIVs are assumed to have a 4λ diameter and 8λ pitch. For

3-D SRAM cells, we assume the p-layer is at the bottom since the n-layer often needs

more routing.

Table 4.1 shows the footprint area values of the SRAM cells. 6T 4N2P 3D has

43.9% smaller footprint area compared to 6T 4N2P 2D. 8T 6N2P 2D has 28.1% larger

footprint area with respect to the baseline. 8T 4N4P 3D prior1 has 15.8% smaller

footprint area compared to 6T 4N2P 2D. However, its footprint area is 17.1% larger

than the footprint area of other 3-D 8T SRAM cells. The cells 8T 4N4P 3D prior2,

8T 4N4P 3D proposed1, and 8T 4N4P 3D proposed2 have the same footprint area,

which is 28.1% and 43.8% smaller than those of 6T 4N2P 2D and 8T 6N2P 2D,

respectively. Although the footprint area decreases for 3-D SRAM cells, their total

silicon area is larger. For example, the proposed cells have 12.3% larger total silicon

area compared to 8T 6N2P 2D.

Table 4.1: SRAM cell footprint area

SRAM W (λ) H (λ) Footprint NormalizedArea (µm2) Area (1×)

6T 4N2P 2D 57 20 0.056 1.006T 4N2P 3D 32 20 0.031 0.568T 6N2P 2D 73 20 0.072 1.28

8T 4N4P 3D prior1 40 24 0.047 0.848T 4N4P 3D prior2 41 20 0.040 0.72

8T 4N4P 3D proposed1 41 20 0.040 0.728T 4N4P 3D proposed2 41 20 0.040 0.72

4.3.3 Capacitance extraction

Table 4.2 shows the single-cell bitline (CBL, CBLB, and CRBL) and wordline capaci-

tances (CWL and CRWL) of each SRAM design, extracted using a transport-analysis-

based 3-D TCAD capacitance extraction technique [62]. Fig. 4.4 shows the FEOL

only and FEOL+BEOL structures for 6T 4N2P 2D. We use the industry-standard

61

Table 4.2: SRAM bitline and wordline capacitances

SRAM CBL (aF) CBLB (aF) CRBL (aF) CWL (aF) CRWL (aF)6T 4N2P 2D 77.34 77.05 N/A 143.11 N/A6T 4N2P 3D 72.76 72.74 N/A 138.62 N/A8T 6N2P 2D 80.21 75.13 66.63 171.91 60.08

8T 4N4P 3D prior1 88.14 87.87 N/A 119.44 119.748T 4N4P 3D prior2 62.60 64.50 61.00 140.70 90.43

8T 4N4P 3D proposed1 51.97 58.69 64.66 176.16 56.758T 4N4P 3D proposed2 60.99 58.65 64.66 176.00 56.75

thin-cell layout for 6T 4N2P 2D [72]. A long and thin cell layout leads to a smaller

CBL with respect to its CWL.

WL

AX2AX1

PU1

PU2

PD2

PD1

L

R

GND VDD GND

WL BL BLB WL

(a) (b)

WL

Figure 4.4: 6T 4N2P 2D cell: (a) FEOL only and (b) FEOL+BEOL. Dielectric re-gions are not shown.

6T 4N2P 3D has 5.9% and 3.1% smaller CBL and CWL compared to 6T 4N2P 2D.

Due to the asymmetry of the 8T 6N2P 2D layout, CBL and CBLB are different. We

use CBLavg to denote the average bitline capacitance (CBLavg = CBL/2 + CBLB/2).

8T 6N2P 2D has 0.6% larger CBLavg compared to the baseline. 8T 6N2P 2D CWL

is 20.1% larger due to its 28.1% longer cell width compared to 6T 4N2P 2D. Its

62

AX2

WL

AX1

PU1

R

L

PU2WL

VDD

VDD

(a) (b)

BL L VDD WL

WL VDD R BLB

Figure 4.5: 8T 4N4P 3D proposed2 cell p-layer: (a) FEOL only and (b)FEOL+BEOL. Dielectric regions are not shown.

CRWL is much smaller than its CWL because RWL is connected to a single transis-

tor. 8T 4N4P 3D prior1 has 14.0% larger CBLavg because each bitline is connected

to two access transistors (one nFinFET for write and one pFinFET for read oper-

ation) and the cell height is 20.0% larger relative to that of 6T 4N2P 2D. Its CWL

is 16.5% smaller than that of 6T 4N2P 2D since the cell width that determines the

wordline length is smaller for 8T 4N4P 3D prior1. 8T 4N4P 3D prior2 has 17.7% and

1.7% reduction in CBLavg and CWL, respectively, compared to those of 6T 4N2P 2D.

8T 4N4P 3D proposed1 has 28.3% smaller CBLavg and 23.1% larger CWL relative to

the baseline. Figs. 4.5-4.6 show the FEOL and FEOL+BEOL structures for the

p-layer and n-layer of 8T 4N4P 3D proposed2. 8T 4N4P 3D proposed2 has 22.5%

smaller CBLavg and 23.0% larger CWL with respect to 6T 4N2P 2D. Its CWL is larger

because WL is connected to both the p-layer and n-layer, as shown in Figs. 4.5-4.6.

CRWL is the smallest for the proposed cells, which helps improve their TR. It is 60.3%

63

and 5.5% smaller compared to CWL of 6T 4N2P 2D and CRWL of 8T 6N2P 2D, re-

spectively.

PD1

RWL

RD1

RD2

R

L

PD2

(a) (b)

RBL VDD WL

RWL GND GND

Figure 4.6: 8T 4N4P 3D proposed2 cell n-layer: (a) FEOL only and (b)FEOL+BEOL. Dielectric regions are not shown.

4.4 Simulation results

Table 4.3 shows the dc (RSNM, WM, IREAD, and ILEAK) and transient (TR and TW)

simulation results. For transient simulations, we assume a 256×256 memory array.

4.4.1 SRAM dc metric analysis

In this section, we analyze the dc metrics of the different SRAM cells.

RSNM

6T SRAM cells have the worst read stability because the access and pull-down

transistors are both low-Vth FinFETs. 8T 6N2P 2D, 8T 4N4P 3D prior2, and

64

Table 4.3: SRAM dc and transient metric values

SRAM RSNM (mV) WM (mV) IREAD (µA) ILEAK (nA) TR (ps) TW (ps)6T 4N2P 2D 23.69 252.21 149.32 3.57 52.43 55.876T 4N2P 3D 23.69 252.21 149.32 3.57 45.55 49.058T 6N2P 2D 264.25 252.10 149.32 4.30 34.56 75.41

8T 4N4P 3D prior1 140.28 252.26 121.06 3.65 69.85 41.748T 4N4P 3D prior2 264.27 252.26 58.30 3.45 57.57 47.61

8T 4N4P 3D proposed1 264.00 25.58 149.32 2.55 24.53 N/A8T 4N4P 3D proposed2 241.00 108.11 149.32 2.44 24.53 104.27

8T 4N4P 3D proposed1 have the highest RSNM because the read operation is

isolated from data retention. The storage nodes are not disturbed during a read

operation. 8T 4N4P 3D prior1 has a 116.59 mV increase in RSNM compared to

that of 6T 4N2P 2D owing to its weaker pFinFET read access transistors. The

voltage at the node storing a “0” (VR) is determined by the voltage divider formed

by AX2 and PD2 resistances. If VR rises above the trip voltage of INV1, the cell

value can flip. Thus, a lower VR is desired during a read operation. A weaker access

transistor has larger resistance, which leads to a smaller voltage rise at R. Therefore,

8T 4N4P 3D prior1 has much better read stability than 6T 4N2P 2D. However, its

RSNM is 46.9% smaller than that of 8T 6N2P 2D because the storage nodes are still

disturbed during the read operation. 8T 4N4P 3D proposed2 has 8.8% lower RSNM

than 8T 6N2P 2D due to the impact of the weaker pull-up transistors on the VTC

of the cross-coupled inverters.

WM

All SRAM cells, except the proposed cells, have a WM value around 252 mV.

8T 4N4P 3D proposed1 suffers significantly from writeability with a WM value of

only 25.58 mV. 8T 4N4P 3D proposed2 utilizes IG-type pFinFETs as pull-up tran-

sistors, whose back gates are biased at VDD to improve writeability. During a write

operation, AX1 tries to discharge storage node L while PU1 initially charges it. Thus,

a weakened PU1 allows AX1 to discharge L more easily during a write operation.

65

8T 4N4P 3D proposed2 still has 57.1% smaller WM compared to SRAM cells with

nFinFET access transistors. The writeability of the proposed cells can be improved

using write-assist techniques that are explored in Section 4.5.

IREAD

6T SRAM cells, 8T 6N2P 2D, and our 8T proposed cells have the same and highest

IREAD value of 149.32 µA because they all have two nFinFETs on the read path.

However, that does not result in an equal TR since bitline/wordline capacitances vary

among cells. 8T 4N4P 3D prior1 has 18.9% smaller IREAD compared to 6T 4N2P 2D

due to weaker pFinFET access transistors. 8T 4N4P 3D prior2 has the worst IREAD

since the read path consists of two pFinFETs. Its IREAD is 61.0% smaller than that

of 6T 4N2P 2D.

ILEAK

SRAM cells consume significant leakage energy as they are in the standby mode

most of the time. Thus, it is of utmost importance to design low-leakage SRAM

cells. 8T 6N2P 2D has the highest ILEAK among all cells. It has 20.6% higher ILEAK

compared to 6T 4N2P 2D due to two additional nFinFETs (RD1, RD2) on the iso-

lated read path. Despite the inclusion of two additional pFinFET access transistors,

8T 4N4P 3D prior1 has only 2.3% higher ILEAK compared to 6T 4N2P 2D because a

pFinFET has around 20× smaller ILEAK than an nFinFET in the technology we use.

Although 8T 4N4P 3D prior2 has two more pFinFETs, its ILEAK is surprisingly 3.2%

smaller compared to that of 6T 4N2P 2D. The reason is that, in the standby mode,

VR is slightly lower in 8T 4N4P 3D prior2 compared to that of 6T 4N2P 2D, which

reduces the ILEAK of PD1. 8T 4N4P 3D proposed1 has 28.6% and 40.8% smaller

ILEAK compared to 6T 4N2P 2D and 8T 6N2P 2D, respectively, due to the replace-

ment of the nFinFET access transistors with pFinFETs. 8T 4N4P 3D proposed2 has

66

the smallest ILEAK among all cells. In addition to including pFinFET access transis-

tors, biasing the back gate of pFinFET pull-up transistors at VDD also reduces ILEAK.

It has 31.6% and 43.2% smaller ILEAK compared to 6T 4N2P 2D and 8T 6N2P 2D,

respectively.

4.4.2 SRAM transient metric analysis

In this section, we analyze the transient metrics (TR and TW) of the different SRAM

cells. TR and TW do not represent read/write delays of the whole memory as we

do not consider the delays of the peripheral circuitry. They only consider wordline

and bitline delays, which are the only delay components affected by memory cell

design. TR and TW are useful to compare the performance of the cells and understand

how the wordline and bitline capacitances and the transistor strength affect SRAM

performance.

TR

TR depends strongly on IREAD and the bitline/wordline capacitances. Despite

their equal IREAD, 6T 4N2P 3D has 13.1% smaller TR due to a smaller CBL and

CWL compared to 6T 4N2P 2D. 8T 6N2P 2D has 34.1% smaller TR because it

has smaller CRBL and CRWL compared to CBL and CWL of 6T 4N2P 2D, respec-

tively. 8T 4N4P 3D prior1 suffers from a higher TR because its CBL is high and

the access transistors are pFinFETs, which leads to a small IREAD. Thus, its TR is

33.2% higher compared to that of 6T 4N2P 2D. Despite its 61.0% smaller IREAD,

8T 4N4P 3D prior2 has only 9.8% higher TR due to its small CRBL and CRWL. The

two proposed cells have the smallest TR owing both to their high IREAD and small

CRBL and CRWL. Their TR is 53.2% smaller compared to that of 6T 4N2P 2D. The

proposed cells have 57.4% smaller TR compared to 8T 4N4P 3D prior2 in which the

read path transistors are pFinFETs.

67

TW

6T 4N2P 3D has 12.2% smaller TW compared to the baseline due to its smaller

CWL. 8T 6N2P 2D has a large CWL, which leads to a 35.0% higher TW.

8T 4N4P 3D prior1 has 25.3% smaller TW compared to 6T 4N2P 2D due to its

smaller CWL. 8T 4N4P 3D prior2 has 14.8% smaller TW owing to its small cell width,

leading to a smaller CWL compared to that of the baseline. 8T 4N4P 3D proposed1 is

unable to complete the write operation successfully because the pFinFET access tran-

sistors are unable to win the fight against the equally strong pFinFET pull-up transis-

tors. 8T 4N4P 3D proposed1 can write the cell with the use of write-assist techniques

or stronger pFinFETs, which are explored in Section 4.5. 8T 4N4P 3D proposed2

ties the back gates of the pFinFET pull-up transistors to VDD to weaken them with

respect to access transistors and improve writeability. Despite the 82.53 mV improve-

ment in WM compared to that of 8T 4N4P 3D proposed1, 8T 4N4P 3D proposed2

still has 86.6% and 38.3% higher TW compared to 6T 4N2P 2D and 8T 6N2P 2D,

respectively.

Overall, the proposed cells have the highest IREAD and smallest ILEAK and

TR among all cells. They offer a high RSNM by isolating the read operation

from data retention. 8T 4N4P 3D proposed2 has slightly worse read stability

compared to 8T 4N4P 3D proposed1 due to the use of back-gate bias on pull-

up transistors. However, the proposed cells suffer from inferior writeability.

8T 4N4P 3D proposed1 is unable to even complete the write operation success-

fully, whereas 8T 4N4P 3D proposed2 can complete the write operation in 1.9×

worst time with respect to that of 6T 4N2P 2D. Poor writeability of the proposed

cells can be addressed through write-assist techniques, changing of the memory array

configuration, or modifying the strength of the FinFETs.

68

Table 4.4: Process variations

Parameter (unit) Nominal Value Range [−3σ, 3σ]LG (nm) 16 [14.4, 17.6]TSI (nm) 8 [7.2, 8.8]TOX (nm) 0.9 [0.81, 0.99]ΦGN (eV ) 4.4 [4.38, 4.42]ΦGP (eV ) 4.8 [4.78, 4.82]

4.5 Impact of process variations, memory array

configurations, assist techniques, different

temperatures, and gate workfunction values

In this section, we explore the impact of process variations, memory array configura-

tions, assist techniques, different temperatures, and gate workfunction values on the

SRAM cells to analyze the trade-offs involved among different design metrics across

different SRAM cells.

4.5.1 SRAM cell analysis under process variations

The impact of process variations on circuits increases with continued scaling in device

dimensions and VDD. SRAM cells are particularly prone to process variations because

they are generally constructed using minimum-sized transistors to minimize area. We

analyze the impact of variations in physical parameters, such as LG, TSI , and TOX ,

that have been shown to impact SRAM performance the most [65]. We also investigate

variations in Vth by modeling it with gate workfunction variations. Table 4.4 shows

the nominal value and [−3σ, 3σ] variation range of these parameters. We assume that

the physical parameters have a normal distribution and a 3σ/µ = 10% variation [66].

We generate 100 sample points using the Sobol sequence for quasi Monte Carlo sim-

ulations [73], which need dramatically fewer sample points to achieve a performance

close to Monte Carlo simulations. Process variations in the n-layer and p-layer are

69

Table 4.5: Distribution characteristics of SRAM dc and transient metrics

SRAM RSNM (mV) WM (mV) IREAD (µA) ILEAK (nA) TR (ps) TW (ps)µ σ µ σ µ σ µ σ µ σ µ σ

6T 4N2P 2D 25.87 4.92 265.01 1.70 182.16 4.66 3.71 1.10 52.51 1.09 55.81 0.786T 4N2P 3D 25.75 5.14 265.05 3.33 182.43 4.61 3.76 1.11 45.62 1.06 48.80 1.118T 6N2P 2D 268.49 3.72 264.95 1.70 182.14 4.63 4.46 1.30 34.64 1.01 76.04 1.04

8T 4N4P 3D prior1 122.76 3.75 265.09 3.30 137.03 3.15 3.80 1.04 69.90 1.25 40.90 0.908T 4N4P 3D prior2 268.58 3.62 265.19 3.32 64.61 1.88 3.60 1.06 57.70 1.23 47.75 1.12

8T 4N4P 3D proposed1 268.33 3.76 15.65 5.28 182.39 4.70 2.68 0.77 24.61 0.98 N/A N/A8T 4N4P 3D proposed2 240.22 5.96 101.77 5.57 182.46 4.47 2.57 0.74 24.60 0.98 103.97 3.13

independent of each other because the transistor layers are processed sequentially.

Thus, we generate two sets of sample points, one for each transistor layer in the case

of 3-D SRAM cells. Table 4.5 shows the mean (µ) and standard deviation (σ) of the

distribution characteristics obtained from process variation simulation results.

8T 4N4P 3D prior2 has the lowest σ for RSNM among all SRAM cells.

8T 4N4P 3D proposed2 has the largest σ, which can be due to its weaker pull-up

transistors.

8T 4N4P 3D proposed2 has the highest σ for WM as well. However, σ/µ is the

largest for 8T 4N4P 3D proposed1. The proposed cells suffer from WM variations

the most due to their weak pFinFET access transistors.

8T 4N4P 3D prior2 has the worst σ/µ ratio in IREAD due to its smaller IREAD.

The IREAD σ/µ ratio is similar for other cells.

ILEAK is affected the most by process variations as it has an exponential depen-

dence upon Vth. This leads to an average 4.6% higher µ compared to the nominal

ILEAK values shown in Table 4.3. The proposed cells have the smallest µ and σ values

for ILEAK. 8T 4N4P 3D prior2 has the smallest ILEAK σ/µ ratio.

The proposed cells have the smallest µ and σ values for TR. However, the σ/µ

ratio is also high for the proposed cells due to the small TR.

8T 4N4P 3D proposed2 suffers more in TW. It has the highest σ and σ/µ ratio

as it has the weakest pull-up transistors that are more prone to variations.

70

Overall, the writeability of 8T 4N4P 3D proposed2 is prone to process variations

the most because it has weaker pull-up transistors. Relatively higher variations in

RSNM for 8T 4N4P 3D proposed2 should not be a major issue since its RSNM is

high enough.

4.5.2 SRAM cell analysis under different memory array con-

figurations

We explore the impact of array configurations on SRAM transient metrics since TR

and TW are determined by the bitline and wordline capacitances. We explore four

different array configurations: 256×128, 512×128, 256×256, and 512×256. The num-

bers represent the number of wordlines and bitlines, respectively. The 256×256 case

is the baseline.

Fig. 4.7 shows TR under different array configurations. As expected, the proposed

cells have the smallest TR values under all configurations owing to their small CRBL

and CRWL. The advantage of the proposed cells grows higher for arrays with longer

wordlines. 8T 4N4P 3D prior1 has smaller TR than 8T 4N4P 3D prior2 when the

number of bitlines is 128 since 8T 4N4P 3D prior1 has a higher IREAD. As the word-

line length increases, its impact on TR increases. Thus, 8T 4N4P 3D prior2 has a

smaller TR than that of 8T 4N4P 3D prior1 when the number of bitlines is 256 due

to its smaller CRWL.

Fig. 4.8 shows TW under different array configurations. 8T 4N4P 3D proposed2

has a high CWL, which dominates TW. TW can be reduced for 8T 4N4P 3D proposed2

by 41.1% if the number of bitlines is halved. However, this may come at the expense of

other costs, such as inefficient utilization of memory area. As in the case of TR, some

cells may be better than others in terms of TW under specific array configurations.

For example, 8T 4N4P 3D prior2 has a smaller TW compared to that of 6T 4N2P 3D

for the 256×256 and 512×256 cases, whereas 6T 4N2P 3D has a smaller TW when

71

0

10

20

30

40

50

60

70

80

256x128 512x128 256x256 512x256

TR

(ps)

Array configuration (WLxBL)

6T_4N2P_2D

6T_4N2P_3D

8T_6N2P_2D

8T_4N4P_3D_prior1

8T_4N4P_3D_prior2

8T_4N4P_3D_proposed1


Figure 4.7: TR under different array configurations. TR values of the proposed cellsoverlap as they are almost the same.

0

20

40

60

80

100

120

256x128 512x128 256x256 512x256

TW

(ps)

Array configuration (WLxBL)

6T_4N2P_2D

6T_4N2P_3D

8T_6N2P_2D

8T_4N4P_3D_prior1

8T_4N4P_3D_prior2


Figure 4.8: TW under different array configurations.

the number of bitlines is 128. 8T 4N4P 3D prior2 has a smaller wordline resistance

since its wordline is implemented on a wider metal than that of 6T 4N2P 3D. Thus,

72

its TW gets better relative to that of 6T 4N2P 3D as the number of bitlines increases,

despite its higher CWL.

Overall, the results show that the memory array configuration can be modified to

improve transient metrics, although it may come at the expense of other costs not

explored in this work. The TW of 8T 4N4P 3D proposed2 can be improved by using

a memory array configuration with smaller wordline lengths. In addition, the impact

of modifying the memory array configuration can vary for different SRAM cells since

their bitline/wordline capacitances and resistances are different.

4.5.3 SRAM cell analysis under assist techniques

Next, we explore the impact of read/write assist techniques on the stability and

transient metrics of the SRAM cells. The read-assist techniques we explore are cell-

VDD boosting (VDD+), WL lowering (VWL-), negative BL (VBL-), and cell-GND

lowering (VGND-), and the write-assist techniques are cell-VDD lowering (VDD-), WL

boosting (VWL+), positive BL (VBL+), and cell-GND boosting (VGND+). In addition,

we investigate the impact of RWL boosting (VRWL+) and positive RBL (VRBL+) on

the read performance of the 8T SRAM cells. For each assist technique, we assume

a 0.3 V change in the target voltage level. For example, VBL+ assumes that bitline

voltages are precharged to 1.1 V. We report the most effective assist technique for each

cell for improving a specific SRAM metric. The results show that assist techniques

can help improve the writeability of the proposed cells.

Table 4.6 shows the most effective assist techniques for read stability and writeabil-

ity. It shows the RSNM and WM values after the assist technique is applied, along

with the improvement in RSNM and WM relative to the nominal case that does

not employ assist. Assist techniques help the RSNM of the 6T SRAM cells the most

since they have a small RSNM initially. Reducing VWL weakens the access transistors,

which alleviates the disturbance on internal storage nodes. For 8T 4N4P 3D prior1,

73

Table 4.6: Impact of assist techniques on read stability and writeability

SRAM Assist RSNM (mV) ∆ RSNM (mV) Assist WM (mV) ∆ WM (mV)6T 4N2P 2D VWL- 176.22 152.53 VGND+ 405.97 153.766T 4N2P 3D VWL- 176.22 152.53 VGND+ 405.97 153.768T 6N2P 2D VDD+ 292.92 28.67 VGND+ 406.45 154.35

8T 4N4P 3D prior1 VRWL+ 235.45 95.16 VGND+ 405.49 153.238T 4N4P 3D prior2 VDD+ 292.85 28.58 VGND+ 405.49 153.23

8T 4N4P 3D proposed1 VDD+ 292.82 28.82 VGND+ 229.63 204.058T 4N4P 3D proposed2 VGND- 277.82 36.82 VGND+ 322.73 214.62

VRWL+ increases the read stability because the access transistors are pFinFETs,

hence are weakened with increasing VRWL. The RSNM improvement for 8T SRAM

cells with separate circuitry for the read operation is smaller since they already have

a high RSNM. 8T 4N4P 3D proposed2 has its best RSNM when assisted with VGND-.

For all cells, WM improves the most with VGND+. Increasing VGND raises the

voltage at R, which is connected to the gate of PU1. A higher voltage at the gate

weakens PU1 and allows AX1 to discharge L and complete the write operation more

easily. The proposed cells have more than 200 mV improvement in WM with VGND+.

In addition, 8T 4N4P 3D proposed1 now performs the write operation successfully

with the help of the VGND+ assist technique.

Table 4.7 shows the best assist techniques for IREAD, TR, and TW. For all cells,

except 8T 4N4P 3D prior1 and 8T 4N4P 3D prior2, IREAD increases the most with

VGND- due to the increase in voltage difference between the bitline to be discharged

and VGND. For 8T 4N4P 3D prior1, VGND- increases the drain-to-source voltage of

AX4, which already operates in the velocity saturation mode. Thus, IREAD does

not increase significantly. VBL+ increases IREAD for 8T 4N4P 3D prior1 more than

VGND- does, since it increases the gate-to-source voltage (VGS) of AX4. The increases

in IREAD for 8T 4N4P 3D prior1 and 8T 4N4P 3D prior2 are smaller compared to

other cells as they use pFinFETs to discharge the bitlines during the read operation.

For 8T 4N4P 3D prior2, IREAD increases the most with VDD+ since the pFinFET

read path transistors try to charge RBL from 0 to VDD.

74

Table 4.7: Impact of assist techniques on read current and transient metrics

SRAM Assist IREAD (µA) ∆ IREAD (µA) Assist TR (ps) ∆ TR (ps) Assist TW (ps) ∆ TW (ps)6T 4N2P 2D VGND- 311.61 128.35 VGND- 24.97 -27.46 VDD- 33.97 -21.906T 4N2P 3D VGND- 311.61 128.35 VGND- 20.73 -24.82 VDD- 26.71 -22.348T 6N2P 2D VGND- 311.67 128.38 VGND- 13.73 -20.83 VDD- 50.46 -24.95

8T 4N4P 3D prior1 VBL+ 216.14 24.41 VRWL+ 43.56 -26.28 VDD- 20.73 -21.018T 4N4P 3D prior2 VDD+ 119.45 54.56 VDD+ 31.83 -25.74 VDD- 27.08 -20.53

8T 4N4P 3D proposed1 VGND- 311.67 128.38 VGND- 6.96 -17.57 VGND+ 57.42 N/A8T 4N4P 3D proposed2 VGND- 311.67 128.38 VGND- 6.97 -17.56 VGND+ 52.14 -52.13

TR is improved the most by VGND- for all cells except 8T 4N4P 3D prior1 and

8T 4N4P 3D prior2, similar to IREAD. The proposed cells maintain their advantage

compared to other SRAM cells in terms of TR. For 8T 4N4P 3D prior1, TR improves

more with VRWL+ despite the higher IREAD with VBL+, which shows that IREAD may

not capture the transient behavior of the read operation and fail to represent TR

accurately. Thus, transient simulations are necessary when determining the perfor-

mance of SRAM cells. For 8T 4N4P 3D prior2, TR decreases the most with VDD+

since the read path transistors are pFinFETs.

TW improves the most with VDD- for all cells except the proposed cells. Re-

duced cell-VDD decreases the VGS of PU1 and leads to a smaller current from VDD

to L through PU1. Thus, TW decreases since PU1 is weakened with respect to

AX1. For the proposed cells, VGND+ is the best assist technique for reducing TW.

Increasing VGND also reduces the current flowing through PU1 and enables AX1

to discharge L more easily. With the VGND+ assist technique, TW is halved for

8T 4N4P 3D proposed2 relative to no assist. 8T 4N4P 3D proposed1 is able to per-

form the write operation successfully with VGND+.

Overall, the assist techniques are shown to be effective at improving noise margins

and reducing read/write times. However, the assist techniques often come at the cost

of degrading other SRAM metrics, increasing power consumption, or introducing an

area overhead as they may require extra circuitry, such as level shifters or voltage

generators. For example, although VWL- increases the RSNM of 6T 4N2P 2D, it

reduces IREAD by 50.3%, which leads to a 20.2% increase in TR.

75

4.5.4 SRAM cell analysis under different temperature values

The operating temperature of a microprocessor can go up to 90◦C [64]. FinFET IOFF

increases exponentially with increasing temperature, whereas its ION does not change

much [14]. Thus, the impact of temperature on SRAM cell metrics is not prominent

except on leakage current. Fig. 4.9 shows ILEAK under different temperature values.

8T 6N2P 2D always has the highest ILEAK among all cells. Although the proposed

cells have the smallest leakage across different temperature values, their leakage re-

duction as a percentage becomes smaller with increasing temperature. At 370◦K,

8T 4N4P 3D proposed2 has 24.7% (31.6% at 300◦K) and 37.3% (43.2% at 300◦K)

smaller ILEAK compared to 6T 4N2P 2D and 8T 6N2P 2D, respectively.

0

10

20

30

40

50

60

70

300 310 320 330 340 350 360 370

I LE

AK

(nA

)

Temperature (K)

6T_4N2P_2D

6T_4N2P_3D

8T_6N2P_2D

8T_4N4P_3D_prior1

8T_4N4P_3D_prior2



Figure 4.9: ILEAK under different temperature values.

76

4.5.5 SRAM cell analysis under different gate workfunction

values

We analyze SRAM cells under different workfunction values to determine whether the

proposed cells are promising when targeted at various design objectives, such as high

stability, high performance, or low leakage. We choose four gate workfunction values

for nFinFETs (ΦGN = 4.3, 4.4, 4.5, and 4.6 eV) and four for pFinFETs (ΦGP = 4.6,

4.7, 4.8, and 4.9 eV). We simulate only four workfunction values for each FinFET

type since our aim is not to optimize the cells further but, rather, to understand

the trade-offs among various design objectives across different SRAM cells. We will

assume that the ΦGN = 4.4 eV and ΦGP = 4.8 eV case is the baseline. The Vth of an

nFinFET increases with increasing ΦGN . This leads to a smaller ION and IOFF . On

the other hand, the Vth of a pFinFET decreases with an increasing ΦGP , leading to

a higher ION and IOFF . We use the objective function in Eq. (4.1) to determine the

best workfunction pairs for different design objectives. We exclude IREAD from the

objective function since we already have TR that represents read performance.

Objective function =(RSNM ∗WM)α

(ILEAK + c)β ∗ (TR ∗ TW)γ(4.1)

The exponents are determined based on the targeted design objective after a

careful exploration of the value space. ILEAK has an exponential dependence upon

Vth, which changes linearly with the gate workfunction. Thus, the range of ILEAK is

very wide under the analyzed gate workfunction values. We use constant c to prevent

ILEAK from dominating the objective function value. We use the following exponents

and c value to determine the best gate workfunction pair for maximizing the objective

function for the corresponding design objective.

• α = 3, β = 1, γ = 1, and c = 1 for high stability,

• α = 1, β = 1, γ = 3, and c = 10 for high performance,

77

• α = 1, β = 1, γ = 1, and c = 0 for low leakage, and

• α = 3, β = 1, γ = 3, and c = 1 for overall quality.

Table 4.8 shows the gate workfunction values that maximize the objective

function for high stability. The cells tend to converge to high ΦGN and low ΦGP

values for better stability metrics, mostly because the RSNM generally increases

with increasing Vth. ILEAK is very small due to the use of FinFETs with high Vth.

8T 4N4P 3D proposed2 has the smallest TR along with a high RSNM. Compared

to its baseline (ΦGN = 4.4 eV, ΦGP = 4.8 eV), 8T 4N4P 3D proposed2 has better

RSNM, WM, and ILEAK at the cost of a degraded IREAD and TR.

Table 4.8: Gate workfunction values for designs with high stability

SRAM ΦGN (eV) ΦGP (eV) RSNM (mV) WM (mV) IREAD (µA) ILEAK (nA) TR (ps) TW (ps)6T 4N2P 2D 4.6 4.7 164.33 295.68 103.66 0.0044 83.56 77.126T 4N2P 3D 4.6 4.8 181.65 279.68 103.66 0.0882 74.37 72.668T 6N2P 2D 4.6 4.6 338.66 319.69 103.67 0.0028 61.03 96.64

8T 4N4P 3D prior1 4.5 4.6 196.71 322.62 80.40 0.0939 102.50 42.778T 4N4P 3D prior2 4.6 4.7 353.39 295.68 49.04 0.0065 72.73 66.07

8T 4N4P 3D proposed1 4.6 4.8 323.75 130.27 103.67 0.1734 50.29 87.198T 4N4P 3D proposed2 4.6 4.8 351.30 237.59 103.67 0.1734 50.28 71.11

Table 4.9 shows the gate workfunction pairs for high-performance designs. Fin-

FETs with a small Vth are favored for high performance as they are faster. However,

low-Vth FinFETs leak more and also have a degraded RSNM. 6T SRAM cells prefer

nFinFETs with ΦGN = 4.5 eV for better stability, while keeping the performance

high. Their leakage is much smaller compared to other SRAM cells due to their

higher-Vth FinFETs. 8T 4N4P 3D prior2 and the proposed cells use pFinFETs with

low Vth, which leads to a significant increase in leakage current. 8T 4N4P 3D prior2

beats all cells in terms of TR.

Table 4.10 shows results for the low-power designs. All SRAM cells, except

8T 4N4P 3D proposed1, use FinFETs with the highest Vth to minimize leakage.

8T 4N4P 3D proposed1 needs to use a slightly stronger pFinFET to be able to com-

78

Table 4.9: Gate workfunction values for high-performance designs




plete the write operation successfully. 8T 4N4P 3D proposed2 has the smallest ILEAK

and TR.

Table 4.10: Gate workfunction values for low-leakage-power designs




Table 4.11 shows results for designs with high quality. 8T 4N4P 3D proposed2

has the second best RSNM and TR. The leakage current of the proposed cells in this

setup is worse than that of other cells since they use stronger pFinFETs for better

WM and TW.

To summarize, we show that 8T 4N4P 3D proposed1 can use stronger pFinFETs

for successful writes to the cell. 8T 4N4P 3D proposed2 appears to be very promising

for almost all design objectives, particularly for the high-performance and low-leakage

scenarios. The results also show that optimizing the cells for a single design objective

often hurts other SRAM metrics.

Table 4.11: Gate workfunction values for overall high-quality designs




79

4.6 Discussion

In this chapter, we evaluated two new 3-D SRAM cells and compared them with

their counterparts. In this section, we make some key observations, particularly on

the differences between our results and those of previous works.

The footprint area of cells varies among different works due to differences in the de-

sign rules and technology used. For example, 8T 4N4P 3D prior2 has 40.0%, 17.3%,

and 28.1% smaller footprint area compared to 6T 4N2P 2D that is analyzed in [40],

[43], and our work, respectively. Certain top- and bottom-layer transistors are aligned

in [43] to exploit inter-layer coupling, which can decrease area efficiency. In [40],

the transistors are implemented with planar CMOS technology, which makes it eas-

ier to adjust device widths and design area-efficient cells. The IREAD reduction of

8T 4N4P 3D prior2 in [40] is addressed by using larger p-type read transistors. All

FinFET designs, such as ours, suffer from the width quantization issue (i.e., the num-

ber of fins can only be an integer). Therefore, it may incur more area to increase the

strength of the read-path pFinFETs since the number of fins has to be an integer.

The strength of n-type transistors with respect to p-type transistors has a large

impact on SRAM cell metrics. For example, using p-type access transistors improves

writeability in [45], whereas it degrades writeability significantly in our designs. The

difference comes from the technologies used in these two works. The p-type transistors

are stronger in the 5nm technology used in [45]. Therefore, p-type access transistors

are stronger and offer improved writeability. On the other hand, pFinFETs are weaker

than nFinFETs in the 14nm technology we use in this chapter. Thus, the proposed

cells have poor writeability.

Two SRAM cells with the same number and type of transistors may have

different footprint areas, depending on the routing requirements. For example,

8T 4N4P 3D prior1 has a 17.1% larger footprint area compared to other 8T 4N4P 3D

SRAM cells, although it does not even have an RBL signal unlike the other 8T SRAM

80

cells. MIV connections between the two transistor layers particularly limit the area

efficiency of 3-D cells.

SRAM cells were compared using their area, speed, and performance metrics. The

significance of each metric depends strongly on the application and design objectives.

For example, the speed is of utmost importance for a register file or L1 cache. How-

ever, for higher cache levels such L2 and L3, the footprint area and leakage may be

more important. Thus, we compared the SRAM cells for different design objectives.

The proposed cells were shown to be useful especially for designs requiring low-leakage

and high read speed.

4.7 Chapter summary

We proposed two new 3-D 8T SRAM cells implemented in TLM technology and

compared them with 6T and 8T SRAM cells presented in prior works. Our proposed

cells use pFinFET access transistors to achieve an area-efficient 3-D design. They

offer the smallest leakage current and read time among all cells, along with high read

stability, at the expense of poor writeability. One cell we proposed uses IG FinFETs

as pull-up transistors with the back gates tied to VDD to improve writeability. This

cell reduces the footprint area and leakage current by 28.1% and 31.6%, respectively,

while improving the read time by 53.2% compared to a conventional 2-D 6T SRAM

cell. It also has 43.8%, 43.2%, and 29.0% reduction in footprint area, leakage current,

and read time, respectively, compared to a conventional 2-D 8T SRAM cell, at the

cost of 8.8% and 57.1% degradation in RSNM and WM, respectively. We showed that

the writeability of the proposed cells can be improved with write-assist techniques,

such as cell-GND boosting.

81

Chapter 5

Hybrid Monolithic 3-D IC

Floorplanner

Different monolithic 3-D integration styles such as BLM, GLM, and TLM have been

shown to reduce power consumption, delay, and interconnect length of a chip. An HM

design has modules implemented in different monolithic styles to further optimize the

design objectives such as area, wirelength, and power consumption. However, a lack

of electronic design automation tools makes HM 3-D IC design quite challenging. In

this chapter, we introduce 3-D-HMFP, the first HM 3-D IC floorplanner. We charac-

terize the OpenSPARC T2 processor core using different monolithic implementations

and compare their footprint area, wirelength, power consumption, and temperature.

We show, via simulations, that under the same timing constraint, an HM design of-

fers 48.1% reduction in footprint area and 14.6% reduction in power consumption

compared to those of the 2-D design at the cost of higher power density and slightly

higher temperature [27].

82

5.1 Introduction

Transistors have scaled for decades and become faster in each technology generation.

Interconnects, however, have become a growing concern due to increased wire resis-

tance with scaling [74, 75]. They have become delay and power consumption bottle-

necks in modern microprocessors [76, 77]. When repeaters and flip-flops are added to

the interconnects to reduce their delay, the power consumption of the microprocessor

increases even more [78].

Monolithic 3-D integration can address the interconnect bottleneck as it reduces

interconnect length and the number of repeaters needed for long interconnects. An

HM design consists of modules implemented in different monolithic implementation

styles. It offers better trade-offs among chip performance, area, wirelength, and

power consumption. In this chapter, we introduce 3-D-HMFP to enable efficient

exploration of the HM design space. We use 3-D-HMFP to compare different mono-

lithic implementations of the OpenSPARC T2 processor core in terms of footprint

area, wirelength, power consumption, and temperature.

We first create FinFET libraries based on TCAD device simulations and generate

cell layouts. We separate modules of the OpenSPARC T2 processor core into logic and

memory modules. In order to characterize the logic and memory modules, we develop

tools called FinPrin-monolithic and CACTI-monolithic, respectively, atop our prior

tools: FinPrin [79] and CACTI-PVT [80]. They feed area, power, and wirelength

values to 3-D-HMFP for floorplanning experiments. Compared to a conventional 2-D

design, an HM design reduces the footprint area by 48.1% and power consumption

by 14.6% under the same timing constraint of 1 ns (hence, under a clock frequency

of 1 GHz) [27].

We use the thermal analysis tool HotSpot 6.0 [81] for 3-D monolithic thermal

analysis, and incorporate it into 3-D-HMFP. Although an HM design consumes less

83

power, its power density is higher due to the reduction in footprint area. This leads

to slightly higher temperatures on chip.

The rest of the chapter is organized as follows. Section 5.2 illustrates the poten-

tial benefits of HM designs through an example. Section 5.3 describes the simulation

setup for the HM design. Section 5.4 introduces FinPrin-monolithic, which we use to

characterize the area, power, and delay of logic modules. Section 5.5 explains how

3-D gate-level placement can be done for GLM designs. Section 5.6 describes CACTI-

monolithic, which we use to characterize memory modules. Section 5.7 describes the

proposed 3-D-HMFP in detail through problem formulation, T*-tree representation,

simulated annealing engine, cost function, and global wire power consumption. Sec-

tion 5.8 describes how HotSpot 6.0 is used for thermal analysis of the chip. Section 5.9

discusses floorplanning simulation results. Section 5.10 presents the concluding re-

marks.

5.2 Motivation

In this section, we demonstrate how the HM design style can be used to find an optimal

design. Table 5.1 shows the footprint area and power values of the OpenSPARC T2

floating-point and graphics unit (FGU) implemented in different monolithic styles us-

ing FinPrin-monolithic. The GLM implementation of FGU has the smallest footprint

area and the lowest power consumption compared to its BLM and TLM implemen-

tations. However, using GLM modules ubiquitously may not guarantee the best chip

design, given the complicated architecture of modern microprocessors that contain

tens of modules.

The following example shows how an HM design can offer trade-offs between

desired objectives, such as area and power consumption. Suppose we have five logic

and memory modules with relative footprint area values shown in Table 5.2. A, B, and

84

Table 5.1: FGU footprint area and power values for different monolithic implemen-tations

Monolithic type Area (µm2) Power (µW)BLM 21696 22973TLM 13821 21396GLM 10633 19866

C refer to logic modules that can be implemented in both BLM and GLM, whereas

D and E refer to memory modules that are only implemented in BLM.

Table 5.2: Footprint area values assumed for the modules to be floorplanned

Logic MemoryMonolithic type A B C D E

BLM 12 6 8 15 9GLM 6 3 4 - -

Fig. 5.1 shows different floorplanning scenarios. The first design consists of only

BLM modules. It achieves a footprint area of 26 (12+6+8) in the best case. The

second design uses all GLM logic modules to reduce power consumption. However,

its minimum achievable footprint area is 28 (6+3+4+15). On the other hand, the

third design utilizes both BLM and GLM logic modules and achieves a footprint area

of 25 (6+4+15). Although a BLM implementation of module B has higher power

consumption than that of its GLM implementation, the footprint area reduction in

the third design may reduce silicon cost, increase yield, and decrease global wire

power consumption.

Despite its simplicity, this example shows the need for an HM floorplanner. A

processor core has tens of modules some of which can be implemented in 3-D, whereas

others are better implemented in 2-D.

85

(a) (b) (c)

D

A

BC

B

B

C

C

A

A

A

A

B

C

DD

E

EE

C

Figure 5.1: Floorplanning results of different monolithic implementations showingthe benefit of hybrid design: (a) BLM, (b) GLM logic + BLM memory, and (c)GLM/BLM logic + BLM memory.


Floorplanning experiments require area and power consumption values of the mod-

ules. We characterize the modules of the OpenSPARC T2 processor core using differ-

ent monolithic styles and feed the area and power values to 3-D-HMFP for floorplan-

ning. Fig. 5.2 shows the HM design flow. First, we use Sentaurus Device Simulator

[59] to perform 2-D hydrodynamic mixed-mode device simulations to characterize

FinFET logic and memory cells. We assume a 14nm SOI FinFET technology, whose

parameters are shown in Table 1.1. We generate BLM and TLM cell layouts using the

Magic VLSI layout tool [82] to obtain the area values of the cells. We characterize

the FinFET logic and memory libraries based on timing, power consumption, and

area values obtained from device simulations and cell layouts. For the FinFET logic

library, we characterize INV, NAND, NOR (sizes 1×, 2×, 4×, 8×, and 16×), and a D

flip-flop (DFF). The FinFET memory library has a 6T SRAM cell in addition to logic

86

Sentaurus TCAD

Design Compiler

FinFET logic libraryDesign Compiler library

FinPrin-monolithic


RTL netlist

Gate-level netlist

Logic modulesarea/power/timing

FinFET memory library

CACTI-monolithic

3D-HMFP

HotSpot 6.0

Area/power

Temperature

Magic VLSI

Gate power/timing Gate area

Memory modulesarea/power/timing

Interconnects

Memory/cache structure

Figure 5.2: The hybrid monolithic design flow.

cells. We separate the OpenSPARC T2 processor core components into two groups:

logic and memory modules. Logic modules include components such as the decoder

and execution unit, which are synthesizable by Design Compiler [83]. We use FinPrin-

monolithic, which is built on top of FinPrin [79], to characterize the logic modules.

Memory modules have caches, register files, etc., which are not synthesizable. We

use CACTI-monolithic, which is built on top of CACTI-PVT [80], to characterize

memory modules. We obtain area and power consumption values of modules under 1

ns timing constraint from FinPrin-monolithic and CACTI-monolithic and feed these

values to 3-D-HMFP. 3-D-HMFP also calculates the global wire power consumption.

It produces overall chip area, wirelength, and power consumption values for the 2-D

design and different 3-D monolithic designs, such as BLM, TLM, and HM. Lastly,

87

Capo

Is the timing constraint met?

Gate-level netlist

GSRC format

FLUTE

Yes

Repeater insertion

No Can add repeaters?

Yes

No

FinFET logic library

Area, power, and delay

Temperature/frequency

Is it gate-level monolithic?

No

Gate-level placementYes

Delay model

Area modelPower model

Wire model

Figure 5.3: The FinPrin-monolithic simulation flow.

we use HotSpot 6.0 [81] to calculate the temperature of the chip. In the following

sections, we describe the tools we use in the HM design flow.

5.4 FinPrin-monolithic

We use FinPrin-monolithic to calculate the delay, area, and power consumption of

logic modules. We build FinPrin-monolithic on top of FinPrin [79], a FinFET logic

circuit analysis and optimization tool. The FinPrin-monolithic simulation flow is

shown in Fig. 5.3. The steps of this simulation flow are as follows:

88

Table 5.3: FinPrin-monolithic results

Monolithic type Area Area Wirelength Wirelength Dynamic Leakage Wire Total power Total power(µm2) reduction (%) (µm) reduction (%) power (µW) power (µW) power (µW) (µW) reduction (%)

BLM 94659 0.0 4490727 0.0 49956 3545 41099 94600 0.0TLM 61072 35.5 3736031 16.8 49018 3468 34192 86678 8.4GLM 47294 50.0 3141085 30.1 48970 3462 28747 81179 14.2

1. FinPrin-monolithic takes the gate-level netlist generated by Design Compiler

and converts it into the GSRC bookshelf format for place-and-route.

2. Capo, a 2-D floorplacer [84], performs row-based placement. For GLM modules,

an intermediate step generates 3-D gate-level placement. Details of the gate-

level placement are given in the next section. FinPrin-monolithic applies global

routing to the placement.

3. FinPrin-monolithic calculates the area, power, and delay values of the circuit

and determines the critical path based on the FinFET logic library and tem-

perature. We have incorporated FLUTE, a fast lookup table based rectilinear

Steiner minimal tree algorithm [85], into FinPrin-monolithic for more accurate

wirelength and, consequently, delay and power calculations.

4. If the timing constraint is violated, FinPrin-monolithic adds repeaters to only

the interconnects on the critical path.

5. FinPrin-monolithic repeats steps 3 and 4 until the timing constraint is met or

adding repeaters on the critical path does not decrease the delay anymore.

We characterized 13 logic modules of the OpenSPARC T2 processor core using

FinPrin-monolithic. They are the instruction fetch unit consisting of three sub-units,

decode unit, two execution units, load-store unit, floating-point unit, trap logic unit,

memory management unit, gasket unit, performance monitor unit, and pick unit. We

assume 1 GHz clock frequency and 330 ◦K temperature. FinPrin-monolithic results

are shown in Table 5.3. It shows the total footprint area, wirelength, and power

consumption of the 13 logic modules. TLM modules have 35.5%, 16.8%, and 8.4%

89

P diffusion

N diffusion

P FET

N FET

(a) (b) (c)

Gate

Metal 1

Metal 2

Contact

Figure 5.4: 8× NAND cell layout: (a) BLM/GLM, (b) TLM n-tier, and (c) TLMp-tier.

reduction in the footprint area, wirelength, and power consumption, respectively,

compared to those of BLM modules. Fig. 5.4 shows the layout of an 8× NAND

cell in BLM/GLM and TLM. TLM cells have higher total silicon area compared to

BLM/GLM cells due to their use of intra-cell monolithic vias. The height of a BLM

cell is 84λ (0.588 µm) while the height of a TLM cell is 54λ (0.378 µm). Depending on

how the TLM cells are implemented, the footprint area can be smaller than the BLM

cells by 46% [86], 44.4%, or 38.8% [87]. Our TLM cells have 35.7% less footprint area

compared to the 2-D cells, which is reasonable relative to prior work. Although TLM

modules have a smaller footprint area due to their smaller cell layouts, their total

silicon area (on both layers) is 29.0% larger than that of BLM. Their power reduction

is mostly due to a decrease in wirelength. GLM modules offer the smallest footprint

area, the shortest wirelength, and the lowest power consumption. Compared to BLM

modules, their footprint area, wirelength, and power consumption are reduced by

50.0%, 30.1%, and 14.2%, respectively. The main contributor to the power reduction

90

Figure 5.5: Gate-level monolithic placement steps: (a) cell deflation, (b) deflated 2-Dplacement, (c) cell inflation, and (d) cell layer assignment.

in GLM modules is the decrease in wirelength. A 30.1% decrease in the wirelength is

responsible for a 13.1% decrease in the total power consumption. The wirelength in

GLM modules, however, is slightly optimistic because FinPrin-monolithic does not

include the wirelength in the z-dimension (which we estimate would only contribute

1-2% to the total wirelength).

5.5 Gate-level monolithic placement

2-D placement tools can be used for BLM and TLM designs. We modify a 2-D

placement process to perform GLM placement. We use a similar approach to the

ones reported in [53] and [19] to perform GLM placement. The placement steps are

shown in Fig. 5.5.

1. Cell deflation: The cell widths are halved.

2. Deflated 2-D placement: Capo [84] is used for row-based placement of deflated

cells.

3. Cell inflation: The cell widths are returned to their original values, leading to

cell overlaps.

91

4. Cell layer assignment: Cells are assigned to one of the layers in 3-D design.

A greedy algorithm is used to minimize the placement area and remove the

overlaps.

The algorithmic steps for greedy layer assignment are illustrated with the example

shown in Fig. 5.5. The algorithm starts with the first row and assigns its first cell to

the bottom layer and its second cell to the top layer. Then, it considers each cell in

the row in order and assigns it to the layer with less total cell width. Its aim is to

minimize the footprint area. It repeats this process for all rows.

We compared our cell layer assignment method with the Zero-One Linear Program

(ZOLP) method presented in [53]. ZOLP formulates cell layer assignment as a linear

programming problem. It performs cell assignment to minimize the total overlap

between cells while assuming that the cells have a fixed position. A legalization step

follows ZOLP layer assignment to remove the remaining overlaps, which may increase

the placement area and wirelength. We compared the two methods using 14 different

logic circuits, such as s713, s1196, s1488, and s9234 from the ISCAS’89 benchmark

suite, and the arithmetic-logic unit and multiplier unit from the OpenSPARC T1

benchmark. These circuits are only used to compare gate-level placement methods

and are different from the OpenSPARC T2 modules that are characterized in this

chapter. Table 5.4 shows the total area and wirelength values of the circuits for

different placements. The greedy method performs slightly better in both footprint

area and wirelength compared to ZOLP. Compared to 2-D placement, ZOLP reduces

the footprint area and wirelength by 47.5% and 25.8%, respectively. Our method,

on the other hand, reduces the footprint area and wirelength by 49.2% and 28.1%,

respectively, as shown in Table 5.4.

Fig. 5.6 shows a comparison of the above two methods on a small example to

demonstrate how the greedy method can perform better than ZOLP. In this example,

seven cells are assigned to two layers using the two methods. ZOLP leads to a larger

92

Table 5.4: Placement results of 14 test circuits

Design Area (µm2) Area Wirelength (µm) Wirelengthreduction (%) reduction (%)

2-D 2680 0.0 88192 0.0Deflated 2-D 1356 49.4 62667 28.9ZOLP 3-D 1408 47.5 65456 25.8Greedy 3-D 1361 49.2 63443 28.1

area compared to the greedy method because of the remaining overlaps between cells

after ZOLP layer assignment. The greedy method, on the other hand, removes all the

overlap during the layer assignment. It also performs better in wirelength because

wirelength is often proportional to the area of the design. The position of a cell

after greedy layer assignment is relatively close to its original position in 2-D deflated

cell placement. Therefore, the wirelength is close to the wirelength in deflated cell

placement, as shown in Table 5.4, which has already been optimized during deflated

cell placement. The greedy method is also faster because it is simpler and does not

need a legalization step. Therefore, we choose the greedy method since it performs

slightly better in area and wirelength and is significantly faster than ZOLP. Neither

method minimizes the number of vias. To minimize the number of vias, a min-cut

partitioner can be used to split the circuit [19].

Figure 5.6: Greedy layer assignment vs. ZOLP layer assignment showing the areabenefit of greedy method: (a) deflated 2-D placement, (b) cell inflation, (c) cell layerassignment, and (d) legalization (just for ZOLP).

93

Area/power/timing

Cache configuration- Cache size- Block size- Associativity- Bank count- Technology

FinFET design library- Logic cells- Memory cells- Capacitance model- Resistance model- Delay model

CACTI-monolithicCost function Technology parameters

Figure 5.7: The CACTI-monolithic simulation flow.

5.6 CACTI-monolithic

We used CACTI-monolithic to characterize 22 memory modules of the OpenSPARC

T2 processor core. CACTI-monolithic is built on top of CACTI-PVT [80], which is

a FinFET cache/memory modeling tool. The CACTI-monolithic simulation flow is

shown in Fig. 5.7. Cache parameters, such as cache size, block size, associativity, bank

count, and technology node, are defined by the user. CACTI-monolithic investigates

different memory configurations by exploring different values for various parameters,

such as the number of segments in a bank wordline, number of segments in a bank

bitline, and number of sets mapped to each bank wordline. It computes the timing,

area, and power consumption of the memory module based on FinFET logic and

memory libraries and technology parameter values. It finds the best configuration

based on the cost function defined by the user.

We use CACTI-monolithic to model BLM and TLM modules. We modify the

area values of the cells in the FinFET logic and memory libraries in order to char-

94

Table 5.5: CACTI-monolithic input parameter values for memory modules

Input ICD&ICT DCA&DTA IRF FRF DVA DTLB ITLB SCMMemory type Cache Cache RAM RAM RAM CAM CAM CAM

Cache size (bytes) 16896 9216 288 2048 128 608 304 64Block size (bytes) 32 16 9 8 32 16 8 1

Associativity 8 4 1 1 1 FA FA FANumber of banks 4 2 1 1 1 1 1 1

Tag size (bits) 30 30 - - - 66 66 37

acterize TLM memory modules. To the best of our knowledge, no tool is available

for characterizing GLM memory modules. Therefore, we do not evaluate such mod-

ules. Table 5.5 shows the CACTI-monolithic input parameter values for important

OpenSPARC T2 memory modules. They are instruction cache data array (ICD),

instruction cache tag array (ICT), data cache array (DCA), data cache tag array

(DTA), integer register file (IRF), floating-point register file (FRF), data valid bit

array (DVA), data translation lookaside buffer (DTLB), instruction translation looka-

side buffer (ITLB), and store buffer CAM (SCM). The rest of the memory modules

are smaller RAM arrays. We assume 1 GHz clock frequency and 330 ◦K operating

temperature. FA stands for fully-associative.

Table 5.6: CACTI-monolithic results

Monolithic type Area Area Dynamic Leakage H-tree Total power Total power(µm2) reduction (%) power (µW) power (µW) power (µW) (µW) reduction (%)

BLM 25444 0.0 3075 3124 4349 10548 0.0TLM 18998 25.3 2966 3115 4222 10303 2.3

CACTI-monolithic results are shown in Table 5.6. It shows the total footprint

area, dynamic power, leakage power, H-tree power, and total power of the 22 memory

modules. In all, TLM memory modules have 2.3% less power consumption and 25.3%

smaller footprint area compared to those of BLM memory modules. TLM memory

modules do not benefit from smaller TLM cell layouts as much as TLM logic modules

do, both in terms of area and power consumption. Although the TLM cell layout

is 35.7% smaller than the BLM cell layout, TLM memory modules have only 25.3%

smaller footprint area. The footprint area benefit of TLM cell array is reduced because

95

both BLM and TLM have similar routing area for the same structure, as shown in

Fig. 5.8.

(a) (b)

Figure 5.8: Area comparison: (a) BLM memory and (b) TLM memory module.

In addition, 25.3% smaller footprint area for TLM memory modules does not

translate into a significant reduction in total power consumption due to the organi-

zational structure and dimensions of the memory modules. CACTI-monolithic uses

horizontal and vertical H-tree structures to route data and address, as shown in

Fig. 5.9. These H-trees dominate the power of the memory modules. Their length

is significantly impacted by the width, height, and organizational structure of the

memory module. Table 5.7 shows results for an ICD [16.5KB (16KB data and 0.5KB

parity), 8-way set associative, 32B line size, 64 entries] implemented in BLM and

TLM. The module width of both implementations is similar. However, the height

of the TLM implementation is 24.4% smaller. Because the memory module width

is larger than its height, horizontal H-trees are longer and, consequently, affect the

power consumption the most. Both TLM and BLM memory modules have similar

horizontal H-tree lengths because their width is similar. This leads to only a small

reduction in power consumption for TLM memory modules, mostly due to shorter

vertical H-trees and number of repeaters.

96

H0 H1

H2 H2

V0 V0 V0 V0

V2 V2

V2 V2

V1 V1

V2 V2

V2 V2

V1 V1

V2 V2

V2 V2

V1 V1

V2 V2

V2 V2

V1 V1

Figure 5.9: Layout of horizontal and vertical H-trees of a memory module.

Table 5.7: Instruction cache data array dimensions of BLM and TLM implementations

Monolithic type Width (µm) Height (µm) Power (µW)BLM 151 78 5757TLM 150 59 5613

5.7 Hybrid-monolithic 3-D IC floorplanner

3-D-HMFP is built atop 3DFP [88], which is a thermal-aware floorplanner tool de-

veloped for TSV-based 3-D ICs. 3-D-HMFP can handle hybrid floorplanning while

taking the global interconnect power consumption into account. It is implemented in

C++. The main differences between 3DFP and 3-D-HMFP are as follows:

97

• 3DFP floorplans only 2-D modules on multiple layers, assuming no vertical

constraints. 3-D-HMFP handles both 2-D and 3-D modules and aligns parts of

GLM and TLM modules on two layers.

• 3DFP assumes wire power to be fixed at 30.0% of the total module power. 3-

D-HMFP does not make such an assumption since wire power is significantly

impacted by the technology node being employed. Instead, 3-D-HMFP calcu-

lates wire power based on the FinFET logic library and technology-dependent

wire resistance and capacitance values.

• 3DFP, at a time, can only explore the design space of modules implemented in

the same style. 3-D-HMFP explores a larger design space because it can replace

modules with their alternative implementations.

• 3DFP uses a B*-tree representation. 3-D-HMFP uses a T*-tree representation

for handling vertical constraints.

5.7.1 Problem formulation

We assume that the two parts of a GLM or TLM module on different layers are

aligned with each other to reduce the footprint area and wirelength. Not aligning

them can make the design complicated and increase routing. The main challenge in

HM floorplanning is to make sure that both parts of every GLM and TLM module

are aligned on the two layers, as shown in Fig. 2.1.

5.7.2 T*-tree representation

3-D-HMFP uses a T*-tree representation that is inspired by the B*-tree representa-

tion, which is an efficient and flexible data structure for non-slicing floorplans [89].

T*-tree has been used for 3-D floorplanning, considering vertically-aligned rectilinear

modules, in [49]. Fig. 5.10 shows a T*-tree representation and the corresponding

98

0

5 1 4

2

05 6

47

36

7

Top layer

Bottom layer

1

326

4 7

Figure 5.10: A T*-tree representation and the corresponding placement in 3-D.

placement of modules in 3-D. Modules 4, 6, and 7 are assumed to be implemented

in GLM or TLM. Hence, they have the same footprint on the two layers. Other

modules are assumed to be implemented in BLM and thus occupy space only on one

layer. The T*-tree represents each module with a node. A node can have up to three

children nodes.

3-D-HMFP uses a depth-first search algorithm to pack the modules. It starts from

the root node and places the module on the bottom layer. At each node, it visits the

middle, left, and right subtrees in order. The middle child module is placed on top

of the parent module on the upper layer. The left child module is placed to the right

of the parent module. The right child module is placed above the parent module on

the same layer. We use two linked-list data structures (one for each layer) to keep

track of the boundaries of the placement and determine the locations of the placed

modules. More details of the T*-tree representation and packing algorithm can be

found in [49] and [89].

Not every T*-tree representation corresponds to a valid placement. For example,

a GLM or TLM module cannot be the middle child of another module. 3-D-HMFP

checks the legality of the solution at each step. Any operation that leads to an invalid

99

solution is dismissed by the algorithm. For example, in Fig. 5.10, swapping module

1 with module 7 is not legal since the resulting state has three layers. Hence, such an

operation is rejected by the algorithm.

5.7.3 Simulated annealing engine

3-D-HMFP uses a simulated annealing engine to perturb the floorplanning solutions.

Five different operations can be performed on the T*-tree nodes in the floorplanning

algorithm for this purpose. The operations are as follows:

1. Rotate: It rotates a module by 90◦. In other words, it swaps the width and

height values of a module.

2. Resize: It modifies the width and height values of a soft module while keeping

the module area the same.

3. Move: It moves a module to another position in the T*-tree.

4. Swap: It swaps the positions of two modules in the T*-tree.

5. Replace: It replaces a module with one of its alternative implementations. For

example, it can replace a GLM module with its BLM implementation.

After each operation on the T*tree, the simulated annealing engine decides

whether to move to a new state based on a weighted cost function specified by the

user.

5.7.4 Cost function

The goal of the floorplanning algorithm is to find a solution with the smallest weighted

cost function, which is as follows for a 2-D design:

cost = α ∗ A+ β ∗WL, (5.1)

100

where A and WL are the area and wirelength of the design. Thus, 3-D-HMFP tries

to minimize the area and wirelength of a 2-D design.

For 3-D HM floorplanning, we use Eq. (5.2):

cost = α ∗ A+ β ∗WL+ γ ∗D + θ ∗ P, (5.2)

where D and P are the deviation and power consumption of the design. In 3-D

floorplanning, different layers need to have similar dimensions in order to fully utilize

the silicon area and minimize the dead space. We calculate the deviation as

D = |W1 −W2| ∗ |H1 −H2|/A, (5.3)

where W and H denote the width and height of the layers, respectively, and the

subscripts the layer number. D is smaller if the two layers have similar dimensions.

We also need to add power consumption to the cost function since hybrid floorplanning

can replace a module with an alternative that has a different power value. 3-D-HMFP

favors a GLM implementation of a module to its BLM implementation. Although they

have similar total silicon area, a GLM implementation has lower power consumption

due to reduced wirelength. This reduces the cost function we are trying to minimize.

5.7.5 Global wire power consumption

Interconnects have started to dominate the power consumption of modern micro-

processors. Thus, excluding interconnect power during floorplanning undermines

floorplanning quality and underestimates the peak temperature. At each stage, 3-

D-HMFP calculates the length of the global wires and determines the number of

repeaters that need to be added. It calculates the power consumption of the global

wires and repeaters based on wire resistance and capacitance values obtained from

ITRS 2013 [90] and the FinFET logic library.

101

3-D-HMFP only calculates the global wire power consumption. Intermediate wires

are already taken into account by FinPrin-monolithic and CACTI-monolithic when

they characterize logic and memory modules, respectively.

5.8 HotSpot

Reduced footprint area, vertically-stacked multiple active layers, and low thermal

conductivity of the inter-layer dielectric increase the power density of monolithic 3-D

ICs and lead to higher on-chip temperatures [76]. Thus, the peak temperature can

become an issue for monolithic 3-D ICs due to their higher power density. Hence, a

thermal model is needed for monolithic 3-D ICs in order to accurately estimate the

peak temperature and avoid hotspots on chip. We incorporate HotSpot 6.0 [81] into

3-D-HMFP for thermal analysis of the chip. 3-D-HMFP provides area and power

values of the different modules to HotSpot 6.0. Since HotSpot 6.0 cannot handle

interconnect power separately, we distribute global wire power among modules in

proportion to their areas. We use HotSpot’s grid model to obtain more accurate

temperature values. HotSpot 6.0 outputs not only the temperature of each block

but also temperatures at a finer grid level specified by the user. The user can effect

a trade-off between speed and accuracy of thermal simulation by changing the grid

resolution.

Fig. 5.11 shows the thermal model organization along with the thicknesses we

assume for each layer. In addition to the floorplan, power consumption, and layer

dimensions, we also specify the layer heat capacity and thermal resistivity values.

We use thermal properties and layer thicknesses of the default thermal package in

the HotSpot 6.0 distribution [91], which is reasonable for a typical high-performance

processor. We assume 1µm thickness for silicon layers consisting of active silicon

and metal layers for the 14nm technology node [92]. We assume 100nm inter-layer

102

dielectric thickness between silicon layers, which is enough to eliminate the inter-layer

coupling that may alter transistor behavior [71]. We report the peak temperature of

the chip in our results.

Thermal interface material: 20 μm

Heat spreader: 1 mm

Heat sink: 6.9 mm

Bulk silicon: 150 μm

Bottom layer: 1 μmTop layer: 1 μm

SiO2:

0.1 μm

Figure 5.11: Thermal model organization.

5.9 Results

The OpenSPARC T2 [93] processor core is characterized using six different implemen-

tations: 2-D, BLM, TLM, and three HM designs. The footprint area, global wire-

length, total power consumption, dead space, peak temperature of the chip, footprint

area reduction, global wirelength reduction, total power reduction, and simulation

run-time for each design is computed. The results are obtained through the same

methodology, using the same tools, and under the same 1 ns timing constraint. Logic

and memory modules are treated as soft and hard modules, respectively. Because

the simulated annealing algorithm is a stochastic technique, it only approximates the

globally optimum solution. Therefore, comparing the results after a single run for

each design may not be fair. Hence, we run 100 simulations for each design and use

the best case for comparison. We run floorplanning experiments on a 3.10 GHz ma-

chine with 64-bit quad-core Intel i5 processor, 8 GB DRAM, and Ubuntu 12.04 LTS

operating system.

103

Table 5.8: Comparison of different monolithic designs based on minimum area-powerproduct showing the benefit of hybrid designs in terms of footprint area and powerconsumption

Design Area Wirelength Power Dead Temperature Area Wirelength Power Run(µm2) (µm) (µW) space (%) (◦K) reduction (%) reduction (%) reduction (%) time (s)

2-D 120516 971365 125442 0.3 325.2 0.0 0.0 0.0 6.2BLM 61984 584606 116271 3.1 330.8 48.6 39.8 7.3 6.8TLM 81171 732676 112298 1.4 327.5 32.6 24.6 10.5 6.1

HM1 (GLM logic + TLM memory) 66420 678064 105867 0.2 329.0 44.9 30.2 15.6 6.0HM2 (GLM logic + BLM memory) 62600 755597 107134 4.1 329.8 48.1 22.2 14.6 6.3

HM3 (GLM/BLM logic + BLM memory) 61410 727442 110498 2.8 330.3 49.0 25.1 11.9 6.7

5.9.1 Floorplanning results

Table 5.8 shows the floorplanning results. The floorplan with the minimum area-

power product value is chosen for each design because both parameters are important.

All designs exhibit very small total dead space. Results show that a hybrid monolithic

design (HM2) offers a 48.1% reduction in footprint area and a 14.6% reduction in

power consumption at the cost of a small increase in temperature.

The 2-D design has the largest footprint area and power consumption among all

designs. Its peak temperature is the lowest since temperature depends on power

density. We use this design as the baseline.

The BLM design has 48.6% lower footprint area and 7.3% lower power consump-

tion relative to the 2-D design. Its power reduction is due to a reduction in global

wire power consumption.

The TLM design has 32.6% smaller footprint area relative to the 2-D design since

TLM cell layouts have 35.7% smaller footprint area compared to those of BLM. The

power reduction is 10.5% due to shorter local and global interconnects.

The HM1 design implements all logic modules in GLM and all memory modules

in TLM. This design can be floorplanned using a 2-D floorplanner since all modules

have two layers. Its footprint area and power consumption are, respectively, 44.9%

and 15.6% lower compared to those of the 2-D design. HM1 offers the best power

value among all designs because it saves power from both logic and memory modules.

104

It has 8.9% less power consumption compared to the BLM design mostly due to the

intra-module wirelength power reduction of GLM logic modules.

The HM2 design implements all logic modules in GLM and all memory mod-

ules in BLM. It uses GLM logic modules to save footprint area and power. It uses

BLM memory modules since they occupy smaller total silicon area compared to TLM

memory modules. Hence, the footprint area is reduced by 48.1%. HM2 has 14.6%

reduction in power consumption. The results are similar to those of [19] in which the

OpenSPARC T2 core implemented in monolithic design has 50.0% smaller footprint

area and 15.6% less power consumption compared to its 2-D counterpart. Similarly,

the monolithic OpenSPARC T2 core design was reported to have 14.0% smaller power

consumption compared to the 2-D design [55].

The HM3 design uses logic modules from both BLM and GLM implementations.

The footprint area is reduced by 49.0% and power by 11.9% compared to those of the

2-D design. HM3 offers the best footprint area since it can use both BLM and GLM

logic modules to minimize the dead space. It has slightly worse power consumption

than that of HM2 because it sometimes uses BLM logic modules instead of GLM

logic modules to reduce the footprint area and global wirelength. HM3 offers more

flexibility than HM2 because the HM3 design space is a superset of the HM2 design

space. However, exploring a larger design space increases algorithm run-time slightly.

In our case, HM3 takes 5.3% more time, on an average, to reach the solution compared

to HM2.

3-D-HMFP has the same thermal-aware floorplanning ability as its predecessor

3DFP [88]. However, temperature reduction was not significant for thermal-aware

floorplanning because, in our designs, the peak temperature and the temperature

range of different floorplans were already low for non-thermal-aware floorplans.

Fig. 5.12 shows the floorplans of the 2-D, BLM, TLM, and HM3 designs reported

in Table 5.8. The 2-D design is implemented on a single layer and the 3-D designs on

105

(a) (b) (c) (d)

Figure 5.12: Floorplanning results showing that the vertical constraints are met forTLM/GLM modules: (a) 2-D, (b) BLM, (c) TLM, and (d) HM3 (GLM/BLM logic +BLM memory). Colors indicate the implementation type: blue: BLM, brown: TLM,and green: GLM

two (top and bottom) layers. The TLM design has the same floorplan on both layers

since all modules have identical dimensions on both layers. The HM3 design has both

GLM and BLM modules. GLM modules have two parts and occupy the same space

on both layers.

106

5.9.2 Floorplanning results at minimum area, wirelength,

and power values

One may have different objectives for floorplanning, such as decreasing the footprint

area, reducing the power consumption, or reducing the wirelength for easier rout-

ing. Therefore, we compare the results for the minimum area, wirelength, or power

configurations to better understand the trade-offs.

Table 5.9 shows the minimum-area floorplanning results. The various designs

from Table 5.8 are used as the baseline for the corresponding designs from here on.

Minimum footprint area is achieved at the cost of a higher power consumption and

increase in total wirelength. The minimum-area 2-D design has a 23.5% larger wire-

length and 3.9% higher power consumption. For the minimum-area TLM design,

wirelength and power increase by 36.3% and 6.3%, respectively, with only a 1.0%

footprint area reduction. The HM1 and HM2 designs have less than 1.0% smaller

area at the cost of around 1.0% power consumption increase. The HM2 design, inter-

estingly, has 0.8% higher power despite a 2.4% decrease in wirelength. This can be

explained based on the number of repeaters on long global interconnects, since both

HM2 designs have the same modules, but different global interconnects. The higher

total global wirelength does not necessarily lead to a higher power consumption. The

design with the higher overall wirelength but fewer long interconnects can have less

power, because short interconnects do not require repeaters. The HM3 minimum-area

design has 2.1% area reduction with 7.8% increase in power. These results show that

the minimum-area design might not be the best design overall. The floorplanner may

place modules that are connected to each other with global wires far from each other

to minimize area. This can increase the wirelength and power consumption.

Table 5.10 shows the results for minimum wirelength. Reducing the global wire-

length can save power and make routing easier. However, the area may increase when

trying to minimize the wirelength. The BLM design has 26.0% less wirelength at a

107

Table 5.9: Minimum area configurations of different monolithic designs

Design Area Wirelength Power Dead Temperature Area Wirelength Power(µm2) (µm) (µW) space (%) (◦K) reduction (%) reduction (%) reduction (%)

2-D 120132 1199780 130388 0.0 325.4 0.3 -23.5 -3.9BLM 60888 729625 119870 1.4 331.5 1.8 -24.8 -3.1TLM 80358 998886 119333 0.4 328.3 1.0 -36.3 -6.3

HM1 (GLM logic + TLM memory) 66305 754492 107121 0.0 329.3 0.2 -11.3 -1.2HM2 (GLM logic + BLM memory) 62150 737539 108043 3.4 329.9 0.7 2.4 -0.8

HM3 (GLM/BLM logic + BLM memory) 60114 785678 119116 0.2 330.2 2.1 -8.0 -7.8

cost of a 9.9% footprint area increase. The HM3 design can also reduce the wire-

length significantly owing to its exploration of a larger design space. It has 28.1%

less wirelength, but 4.0% footprint area and 3.6% power increase despite its shorter

wirelength. Increase in power is due to replacement of GLM modules with BLM

modules.

Table 5.10: Minimum wirelength configurations of different monolithic designs


2-D 122000 869321 124983 1.6 325.0 -1.2 10.5 0.4BLM 68115 432366 113402 11.8 329.4 -9.9 26.0 2.5TLM 84671 672989 111393 5.4 327.1 -4.3 8.1 0.8

HM1 (GLM logic + TLM memory) 67262 628482 104832 1.4 328.9 -1.3 7.3 1.0HM2 (GLM logic + BLM memory) 64064 613726 105918 6.3 329.5 -2.3 18.8 1.1

HM3 (GLM/BLM logic + BLM memory) 63856 523247 114516 5.9 330.3 -4.0 28.1 -3.6

Table 5.11 shows the designs with the smallest power consumption. Power reduc-

tion is generally obtained from wirelength reduction at the cost of a higher footprint

area. For the 2-D design, power consumption decreases by 1.0% owing to a 3.2%

reduction in wirelength, but the area increases by 2.1%. The BLM design has 2.5%

less power consumption at a cost of 9.9% increase in footprint area. The HM2 design

has 2.1% power reduction, but 15.4% footprint area increase. The HM3 design has

3.8% less power consumption with a 4.2% increase in footprint area. Unlike other

cases, HM3 can save power by replacing BLM modules with GLM ones at the cost of

an increase in footprint area.

Overall, highly optimizing a single design objective value may come at the cost of

degrading other design objective values significantly. Therefore, the designer needs to

select the cost function and the coefficients of the specific design objectives carefully

in order to avoid exaggerating the effect of only one objective.

108

Table 5.11: Minimum power consumption configurations of different monolithic de-signs


2-D 123057 940181 124159 2.4 324.9 -2.1 3.2 1.0BLM 68115 432366 113402 11.8 329.4 -9.9 26.0 2.5TLM 84903 688873 110570 5.7 327.2 -4.6 6.0 1.5

HM1 (GLM logic + TLM memory) 69296 631438 104236 4.3 328.4 -4.3 6.9 1.5HM2 (GLM logic + BLM memory) 72240 657147 104907 16.9 328.1 -15.4 13.0 2.1

HM3 (GLM/BLM logic + BLM memory) 63960 666829 106284 6.7 329.4 -4.2 8.3 3.8

5.9.3 Discussion

Floorplanning results show that HM designs offer trade-offs among chip footprint

area, wirelength, and power consumption. The quality of the HM solution depends

significantly on the number and implementation style of the modules. For a 3-D design

with two layers, only 2-D modules can be on top of other 2-D modules. Therefore, for

quality hybrid solutions, 2-D modules should be grouped into two groups with similar

overall area. This was possible with our benchmark, as evidenced by dead space as

small as 3.4% in the HM2 design, in which only BLM memory modules are placed

on top of other BLM memory modules as logic modules are implemented in GLM

style. However, this may not be the case for all benchmarks. If 2-D modules cannot

be grouped into two groups with similar area values, then HM2 would not be able

to find good solutions in terms of footprint area. In that case, fortunately, HM3 can

still find quality solutions to reduce footprint area and wirelength, since it can replace

GLM logic modules with BLM logic modules. As expected, replacing GLM modules

with their alternatives in the HM3 design increases the power consumption of the

modules, since GLM modules have the smallest power consumption. On the other

hand, HM3 can reduce global wire power consumption by reducing the footprint area.

Therefore, replacing GLM modules with their alternatives may increase or decrease

the total power consumption depending on the benchmark.

The following assumptions are made in our simulations.

109

1. 3-D designs have two transistor layers. However, monolithic designs can also

be realized with more layers. More layers may facilitate a further reduction in

footprint area, wirelength, and power consumption. However, the peak tem-

perature may increase further, since the power density will increase and the

upper layers will be farther from the heat sink. However, HM designs may be-

come even more attractive as more layers are added because of the flexibility

they offer. For example, an HM floorplanner can mitigate hotspots by replac-

ing GLM modules that have high power density with their BLM alternatives.

Moreover, the 3-D-HMFP cost function can be modified to place modules with

high power density on the lower layers. 3-D-HMFP can easily be modified to

handle more layers with a few modifications, such as adding a contour for each

layer to keep track of the placement on that layer during packing and modifying

the legalization constraint on the number of layers.

2. In 3-D designs, transistor quality is the same on both transistor layers. It

was previously shown that top-layer transistors, which are processed at a lower

temperature, can match the quality of the bottom-layer transistors [94].

3. BLM, GLM, and TLM cells are assumed to have the same power and timing

characteristics as they behave almost identically at the cell level. This assump-

tion holds true only if top-layer transistors have the same quality as bottom-layer

transistors.

4. Simulated annealing, a commonly used stochastic technique, is used to per-

turb floorplanning solutions. Simulated annealing worked well because there

are only 35 modules in our design. However, stochastic techniques can suffer

from scalability issues if the number of modules is significantly higher. In such

cases, algorithms based on non-stochastic approaches, such as deferred decision

making [95], can be used.

110

5. Fabrication cost and challenges are taken into account. However, fabrication

cost can be an issue with monolithic designs. BLM and GLM designs require

additional inter-active-layer metal layers to connect intra-module cells. On the

other hand, a TLM design requires much fewer metal layers between active

layers. A recent study [96] estimates that a TLM design is 23.0% less costly

compared to a GLM design, thanks to fewer metal layers and fabrication steps

in the TLM fabrication process. Although, based on objectives pursued in this

work, a TLM design does not seem to be promising, it may be advantageous if

fabrication cost is taken into account. Thus, more work needs to be done on

incorporating fabrication cost in the floorplanning cost function of monolithic

designs.

The results may differ if some of the assumptions do not hold. For example, the

transistor quality of top-layer transistors may be worse than that of bottom-layer

transistors in high-volume production of monolithic designs. The benefits of 3-D

designs may still outweigh the shortcomings of the fabrication process despite the

degradation in top-layer transistor quality since critical modules can be fabricated on

the bottom layer. In case assumptions change, the proposed tools can be modified to

accommodate new assumptions and re-evaluate designs using the same methodology.

5.10 Chapter summary

We introduced a 3-D HM floorplanner in this chapter. We characterized the

OpenSPARC T2 processor core in different monolithic designs and compared their

footprint area, wirelength, power, and temperature values. We showed, via simula-

tions, that HM designs offer interesting trade-offs among different design objectives,

such as footprint area, wirelength, and power consumption. We showed that, relative

111

to the 2-D design, a 3-D HM design can reduce footprint area by 48.1% and power

consumption by 14.6%.

112

Chapter 6

McPAT-monolithic: An

Area/Power/Timing Framework

for 3-D Hybrid Monolithic

Multi-Core Systems

3-D ICs have the potential to push Moore’s law further by accommodating more

transistors per unit footprint area along with a reduction in power consumption,

interconnect length, and the number of repeaters. Monolithic 3-D integration is

particularly promising in this regard as it offers a very high connectivity between

vertical transistor layers owing to its nanoscale MIVs. An HM design aims to further

optimize area, power, and performance of the chip by combining different monolithic

styles. In this chapter, we introduce McPAT-monolithic, a framework for modeling

HM multi-core architectures. We use the OpenSPARC T2 processor as a case study

to compare different monolithic implementation styles and explore the benefits of HM

design. Our simulations show that, under the same timing constraint, an HM design

113

offers 47.2% reduction in footprint area and 5.3% in power consumption compared to

a 2-D design at the cost of slightly higher on-chip temperature [28].

6.1 Introduction

Device scaling has become more challenging due to lithographic constraints, increas-

ing manufacturing costs, and exacerbating SCEs [6]. Moreover, the contribution of

interconnects towards delay and power consumption of the chip has been increas-

ing with scaling [75, 77]. 3-D ICs offer a new approach to fit more transistors in

the same footprint area while decreasing power consumption and delay by reducing

interconnect lengths [19]. Compared to other 3-D IC solutions, monolithic 3-D inte-

gration offers better performance and power efficiency due to its high-density MIVs

[22]. In addition, sequential processing of transistor layers in monolithic integration

introduces less parasitics and offers better alignment.

An HM design combines multiple monolithic integration styles to explore trade-

offs among chip footprint area, power consumption, and performance. In Chapter 5,

we introduced 3-D-HMFP, a 3-D HM floorplanner, to explore the HM design space

at the processor core level. However, modern processors consist of multiple cores,

multi-level caches, and a network-on-chip (NoC) to enable communication among

these modules. In this chapter, we present McPAT-monolithic, a framework for HM

design at the multi-core level.

We build McPAT-monolithic atop McPAT, an area/power/timing analysis

framework for multi-core designs implemented in planar technologies [97]. McPAT-

monolithic uses FinFET technology to implement different monolithic styles. We

develop the tools necessary to explore the HM design space and integrate them into

McPAT-monolithic. McPAT-monolithic uses FinPrin-monolithic, CACTI-monolithic,

and Orion-monolithic to model logic, memory, and NoC modules, respectively. All

114

the tools were evaluated based on the same 14nm SOI FinFET design library we

created via TCAD device simulations [59]. We use 3-D-HMFP for processor core

floorplanning and HotSpot 6.0 [81], a thermal analysis tool, to compute the on-chip

temperature. We use the OpenSPARC T2 SoC as a case study to compare the

different monolithic styles, including HM at the multi-core level. We show that an

HM design reduces the footprint area by 47.2% and power consumption by 5.3%

compared to a conventional 2-D design at the cost of increasing the peak temperature

by 8 ◦C, assuming the same clock frequency of 1.4 GHz [28].

We make the following contributions.

1. We introduce McPAT-monolithic for a speedy exploration of different monolithic

designs at the multi-core system level.

2. We integrate 3-D-HMFP into McPAT-monolithic for floorplanning of HM de-

signs and computing global interconnects.

3. We analyze different monolithic implementations of OpenSPARC T2 at the

processor core and SoC levels and discuss the benefits of HM designs.

The rest of the chapter is organized as follows. Section 6.2 describes the simulation

setup and design flow. Section 6.3 describes the modeling of FUs using FinPrin-

monolithic. Section 6.4 describes characterization of memory modules using CACTI-

monolithic. Section 6.5 describes Orion-monolithic, an NoC area and power modeling

tool. Section 6.6 introduces McPAT-monolithic, a multi-core architecture modeling

framework. Section 6.7 presents the simulation results. Section 6.8 concludes the

chapter.

115


Fig. 6.1 shows the HM SoC design flow we use. First, we characterize FinFET logic

and memory cells through 2-D hydrodynamic mixed-mode device simulations using

Sentaurus Device Simulator [83]. We use the Magic VLSI layout tool [82] for BLM,

GLM, and TLM cell layouts. We generate the FinFET design library based on the

cell area, power, and timing characteristics. The library has INV, NAND, NOR (sizes

1×, 2×, 4×, 8×, and 16×), a D flip-flop (DFF), and 6T SRAM cell.

We use McPAT-monolithic to model area, power, and timing of the multi-core

systems. It uses Orion-monolithic for NoC and CACTI-monolithic for cache/memory

modeling. It uses macromodels derived for FUs obtained through FinPrin-monolithic.

We feed area and power consumption of the processor core modules obtained from

McPAT-monolithic to 3-D-HMFP for floorplanning of the processor core. Lastly, we

use HotSpot 6.0 [81] to compute the temperature of the chip. If the final temperature

of a module computed by HotSpot is different than its initially assumed tempera-

ture, power is re-computed until the temperature becomes consistent. We make the

following assumptions in our simulations.

1. All 3-D designs have two transistor layers.

2. The top- and the bottom-layer transistors have the same behavior. If the top-

layer transistors are processed at a lower temperature to sustain the quality of

bottom-layer transistors, they may have slightly different characteristics. How-

ever, in [94], it was shown that the performance of top-layer transistors that are

processed at a lower temperature can match the performance of bottom-layer

transistors, which is the basis for our assumption.

3. The power and timing characteristics of the BLM, GLM, and TLM cells are the

same as they exhibit almost identical behavior at the cell level when we assume

the same transistor model on both layers.

116

Sentaurus TCAD

FinFET design library

FinPrin-monolithic


3-D-HMFP

HotSpot 6.0

Area/power

Temperature

Magic VLSI

Cell power/timing Cell area

Interconnects

Area model Power model Timing model

McPAT-monolithic

CACTI-monolithic(memory)

Orion-monolithic(NoC)

FU models Processor description

Figure 6.1: The HM SoC design flow.

6.3 Modeling of functional units

We use FinPrin-monolithic to characterize various FUs, such as arithmetic-logic unit

(ALU), multiplier (MUL), and floating-point unit (FPU). We characterized six FUs of

the OpenSPARC T2 core using FinPrin-monolithic and integrated them into McPAT-

monolithic. These units are ALU, FPU, gasket unit (GKT), pick unit (PKU), perfor-

mance monitoring unit (PMU), and trap logic unit (TLU). The remaining modules

of the OpenSPARC T2 processor core are modeled using McPAT-monolithic. We

assume 1.4 GHz clock frequency and 330 ◦K temperature.

Table 6.1 shows FinPrin-monolithic results (footprint area and power consump-

tion) for the six FUs implemented in different monolithic styles. In all, TLM modules

have 35.7% and 5.3% reduction in the footprint area and power consumption, respec-

117

Table 6.1: Footprint area and power consumption results for FUs

Monolithic type → BLM TLM GLMFunctional unit Area Power Area Power Area Power

↓ (µm2) (µW) (µm2) (µW) (µm2) (µW)ALU 9078 109822 6309 (-30.5%) 109728 (-0.1 %) 4879 (-46.3 %) 104829 (-4.5%)FPU 23866 210513 15203 (-36.3%) 195861 (-7.0 %) 11696 (-51.0 %) 181329 (-13.9%)GKT 3029 3182 1933 (-36.2%) 2971 (-6.6 %) 1532 (-49.4 %) 2868 (-9.8%)PKU 4068 7145 2618 (-35.6%) 6669 (-6.7 %) 2051 (-49.6 %) 6372 (-10.8%)PMU 2622 2787 1704 (-35.0 %) 2568 (-7.9 %) 1324 (-49.5 %) 2533 (-9.1%)TLU 18614 27367 11663 (-37.3%) 23725 (-13.3 %) 9014 (-51.6 %) 22164 (-19.0%)Total 61277 360816 39430 (-35.7%) 341522 (-5.3 %) 30497 (-50.2 %) 320094 (-11.3%)

tively, compared to BLM modules. The area reduction is as expected since the TLM

logic cells we use have a similar footprint area reduction compared to 2-D cells [27].

The reduction in power consumption differs slightly from one logic module to another

based on how interconnects contribute to power consumption. GLM modules offer

around 50% footprint area reduction compared to BLM modules. The area reduction

is sometimes larger than 50% due to the reduction in the number of repeaters that

are needed for long interconnects. The total power consumption of GLM modules is

11.3% smaller than that of BLM modules owing to shorter interconnects and fewer

repeaters.

6.4 Modeling of memory modules

McPAT-monolithic uses CACTI-monolithic to model BLM and TLM memory mod-

ules. For the OpenSPARC T2 core, McPAT-monolithic characterizes 20 modules

that include the instruction cache, data cache array, integer register file, floating-

point register file, data translation lookaside buffer, instruction translation lookaside

buffer, and miss/fill/prefetch buffers associated with instruction and data caches.

Table 6.2 shows the total footprint area and power consumption of these memory

modules implemented in BLM and TLM. In all, TLM memory modules offer 38.1%

reduction in footprint area with respect to BLM modules, mainly because the TLM

SRAM cell is 43.9% smaller than the BLM SRAM cell [26]. TLM memory modules

118

have 12.8% smaller power consumption owing to shorter interconnects that are used

to route data inside the memory module.

Table 6.2: Memory modules in BLM vs. TLM

BLM TLM Reduction (%)Area (µm2) 165766 102641 38.1Power (µW) 140252 122361 12.8

6.5 Orion-monolithic

We have built Orion-monolithic atop Orion 2.0 [98] and integrated it into McPAT-

monolithic. It characterizes the power and area of NoCs implemented in BLM and

TLM styles. Fig. 6.2 shows the Orion-monolithic simulation flow. It models various

NoC components, such as the crossbar, arbiter, buffers, and the clock/link based on

the NoC model, technology parameter values, and FinFET design library.

Table 6.3 shows the Orion-monolithic results for the OpenSPARC T2 cache cross-

bar (CCX). CCX connects the processor cores to the L2 cache banks. It is an 8 (T2

cores) by 9 (8 L2 banks + IO) matrix crossbar. When implemented in TLM, it has a

35.3% smaller footprint area and 12.7% lower power consumption. The power saving

is due to shorter interconnects in the crossbar and buffer.

Table 6.3: Orion-monolithic results

BLM TLM TLM vs. BLMNoC Area Power Area Power Area Power

component (µm2) (µW) (µm2) (µW) red. (%) red. (%)Crossbar 504282 83127 324181 72241 35.7 13.1Arbiter 36356 11909 23372 11909 35.7 0.0Buffer 5334 17890 3606 14096 32.4 21.2Clock 7121 5158 6741 4885 5.3 5.3Total 553093 118083 357900 103131 35.3 12.7

119

Area/power

NoC configuration- Crossbar type - #Input ports- #Output ports- Flit width- Technology

FinFET design library- Logic cells- Memory cells- Capacitance model- Resistance model- Delay model

Buffer model

Arbiter modelCrossbar model

Clock/link model

Figure 6.2: The Orion-monolithic simulation flow.

6.6 McPAT-monolithic

We build McPAT-monolithic atop McPAT [97], a framework for modeling multi-core

architectures. McPAT-monolithic models the processor in a hierarchical manner, as

shown in Fig. 6.3. It starts from the low-level circuits and models the architecture

in a bottom-up fashion. We have enhanced McPAT in the following ways to obtain

McPAT-monolithic.

• Integrating a FinFET design library characterized via TCAD simulations rather

than relying on parameter values scaled from prior technology nodes.

• Adding support for FinFET libraries by extending capacitance and resistance

models. FinFETs suffer from the width quantization issue that forces a FinFET

120

to only have an integer number of fins. Thus, capacitance models that are

implemented for a planar technology are not applicable to FinFET technology.

• Updating delay models based on the TCAD device simulations.

In this work, we model the processor cores, L2 cache, and NoC. The components of

a processor core modeled in McPAT-monolithic are the instruction fetch unit (IFU),

execution unit (EXE), load store unit (LSU), memory management unit (MMU), and

FUs such as ALU, FPU, and MUL. We were unable to model some SoC components,

such as the memory controller that did not have a macromodel available. For such

components, McPAT uses empirical data from earlier technology nodes and scales

them to newer technologies. We have excluded these modules in our analysis since

we use a 14nm FinFET technology and scaling empirical values from a prior planar

technology will not be accurate.

Processor

CoresEXE MMUIFU LSU

FUs ALU FPU MUL

Cache

Decoder

SRAM array

Wire

NoCCrossbar

Arbiter Buffer

Link

Figure 6.3: Hierarchical modeling in McPAT-monolithic.

6.7 Results

We characterize the OpenSPARC T2 [93] processor cores, L2 cache, and NoC us-

ing four different implementations: 2-D, BLM, TLM, and HM. Table 6.4 shows the

OpenSPARC T2 processor configuration and technology parameter values. We run

the simulations on a 3.10 GHz machine with 64-bit quad-core Intel i5 processor, 8

GB DRAM, and Ubuntu 12.04 LTS operating system.

121

Table 6.4: Processor model parameter values

Parameter ValueProcessor model OpenSPARC T2Processor type In-order

Number of cores 8Number of threads 4Instruction width 32

L2 cache 4 MB (Eight 512 KB banks)L1 instruction cache 16 KB

L1 data cache 8 KBTechnology 14 nm SOI FinFETFrequency 1.4 GHz

Supply voltage 0.8 V

Table 6.5: Comparison of different monolithic designs based on minimum area-powerproduct

Design Area Power Wirelength Dead Temperature Area Power Wirelength(µm2) (µW) (µm) space (%) (◦K) red. (%) red. (%) red. (%)

2-D 249864 383780 662232 1.1 331 0 0 0BLM 126228 377604 451531 2.1 341 49.5 1.6 31.8TLM 155727 354453 511426 0.5 338 37.7 7.6 22.8HM 136157 359895 458939 2.7 339 45.5 6.2 30.7

6.7.1 Floorplanning results of the OpenSPARC T2 core

Table 6.5 shows the floorplanning results for a single OpenSPARC T2 processor core.

It presents the footprint area, global wirelength, power consumption, dead space,

and peak temperature of the design. Area and power values of the core modules are

obtained using McPAT-monolithic and fed into 3-D-HMFP for floorplanning. Power

consumption consists of runtime dynamic and leakage power. Global interconnects are

added among modules during floorplanning. For each design, we run 100 simulations

and select the floorplan that minimizes the area-power product since both parameters

are important for a good design.

The 2-D design has the largest footprint area and the lowest peak temperature.

Although it has the highest power consumption among all designs, it has the smallest

power density due to its larger footprint area. This leads to smaller on-chip temper-

ature values. The 2-D design forms the baseline.

122

The BLM design offers a 49.5% reduction in footprint area along with 1.6% lower

power consumption owing to the 31.8% reduction in global wirelength. Due to a

higher power density, its peak temperature is higher by 10 ◦C compared to that of

the baseline.

The TLM design has 37.7%, 7.6%, and 22.8% reduction in the footprint area,

power consumption, and wirelength, respectively. It offers higher power reduction

compared to the BLM design since it benefits from intra-module interconnect length

reduction. However, its global wirelength reduction is smaller than that of the BLM

design due to a higher footprint area.

The HM design, which combines modules that are implemented in all different

monolithic styles, offers a 45.5% footprint area reduction. The 6.2% power reduction

is not as much as that of the TLM design since the BLM modules used in the HM

design have more power consumption than TLM modules. The HM design uses

only three GLM modules out of the six GLM modules available. Although GLM

modules have a smaller footprint area and power consumption compared to other

implementations, the floorplanner may not use all of them in order to balance the

areas of the two layers.

Fig. 6.4 shows the floorplans of the OpenSPARC T2 core implemented in 2-D,

BLM, TLM, and HM, whose results were presented in Table 6.5. The 2-D design is

implemented on a single layer, whereas monolithic 3-D designs are implemented on

two (top and bottom) layers. The TLM design has the same floorplan on both layers

since all modules are implemented in 3-D. The HM design has modules implemented

in all three monolithic styles. Fig. 6.4 shows that GLM and TLM modules have the

same footprint on the bottom and top layers.

123

(a) (b) (c) (d)

Figure 6.4: OpenSPARC T2 floorplanning results: (a) 2-D, (b) BLM, (c) TLM, and(d) HM. Colors indicate the implementation style: blue: BLM, brown: TLM, andgreen: GLM.

6.7.2 The OpenSPARC T2 SoC results

Table 6.6 shows the area and power consumption values of the SoC components im-

plemented in BLM and TLM. L2 cache and CCX results are obtained directly from

McPAT-monolithic. We do not have a GLM case since most of the modules in the

processor are not modeled in GLM. The area and power consumption of the processor

cores are corrected by 3-D-HMFP after floorplanning since McPAT-monolithic does

not obtain a floorplan and ignores global interconnects. In all, the TLM implemen-

tation offers a 36.8% and 8.0% reduction in footprint area and power consumption,

124

Table 6.6: Area and power results for the SoC components implemented in BLM andTLM

BLM TLM TLM vs. BLMArea Power Area Power Area Power

(mm2) (W) (mm2) (W) reduction (%) reduction (%)Cores 2.00 2.98 1.25 2.75 37.7 7.9

L2 2.94 0.82 1.87 0.76 36.5 7.7CCX 0.55 0.12 0.36 0.10 35.3 12.7Total 5.49 3.92 3.47 3.61 36.8 8.0

respectively. The reduction in power consumption is the largest for CCX since its

power is dominated by interconnects in the crossbar and buffer.

Fig. 6.5 shows the SoC floorplans we generated for the different designs: 2-D,

BLM, TLM, and HM. We obtain the processor core floorplan from 3-D-HMFP and

floorplan the SoC in a similar fashion to the original OpenSPARC T2 chip. We assume

the dimensions of the L2 cache and CCX are flexible (however, their area remains the

same) for the sake of a fair area comparison among different designs. The 2-D design

only uses 2-D modules. The BLM SoC design consists of the cores that are imple-

mented in 3-D BLM (Fig. 6.4(b)), 2-D L2 cache, and 2-D CCX modules. In the BLM

SoC design, we split the L2 cache banks unequally between the two layers in order to

achieve a balanced area. The TLM SoC design assumes all modules are implemented

in TLM. The HM SoC design contains the cores that are implemented in HM by

3-D-HMFP, as shown in Fig. 6.4(d). It assumes that the L2 cache is implemented in

BLM for area efficiency and CCX in TLM for lower power consumption. Table 6.7

shows the overall footprint area and power consumption results for all SoC designs.

The 2-D SoC design has the highest footprint area and power consumption. It forms

the baseline. The BLM SoC design has 49.7% and 1.4% reduction in footprint area

and power consumption, respectively, with respect to the 2-D baseline. The reduction

in power consumption comes from shorter global interconnects. The TLM SoC design

offers 36.8% reduction in footprint area along with 8.0% lower power consumption. It

125

Core 1

L2T

CCX

Core 2

Core 3

Core 4

L2T L2T L2T

Core 5

Core 6

Core 7

Core 8

L2T L2T L2T L2T

L2D

L2D

L2D

L2D

Core 1

Core 2

Core 3

Core 4

Core 5

Core 6

Core 7

Core 8

L2D

L2D

L2D

L2D

Core 1

L2T L2T L2T L2T

CCX

L2DCore 2 Core 3 Core 4

Core 5 Core 6 Core 7 Core 8

L2T L2T L2T L2T

L2D

L2D

L2D

L2D

L2D

L2D

L2D

Core 1

CCX



L2D

L2D

L2D

L2D

L2D

L2D

L2D

Core 1

CCX



L2D

L2D

L2D

L2D

L2D

L2D

L2D

Core 1

CCX



Top layer

Bottom layer

L2D

L2D

L2D

Core 1

CCX


Core 5 Core 6 Core 7 Core 8L2D

L2D

L2D

(a) (b) (c) (d)

Figure 6.5: OpenSPARC T2 SoC floorplans: (a) 2-D, (b) BLM, (c) TLM, and (d)HM (HM core + BLM L2 + TLM CCX).

benefits from a reduction in both global and intra-module interconnect lengths. The

HM SoC design has 47.2% smaller footprint area and 5.3% lower power consumption

with respect to the 2-D design. The HM2 SoC design uses an L2 cache implemented

in TLM to increase the power savings to 6.9%. However, the area reduction for the

HM2 design is a smaller 39.5%. It shows how HM designs can offer trade-offs among

different design objectives.

Table 6.7: Total area and power consumption results for the SoC designs

Area Power Area Power(mm2) (W) reduction (%) reduction (%)

2-D 5.49 3.92 0.0 0.0BLM 2.76 3.86 49.7 1.4TLM 3.47 3.61 36.8 8.0

HM (BLM L2 + TLM CCX) 2.90 3.71 47.2 5.3HM2 (TLM L2 + TLM CCX) 3.32 3.65 39.5 6.9

Fig. 6.6 shows the heat map of the SoC designs. As expected, 3-D designs have

higher temperatures (7-10 ◦C) than the 2-D design. The SoC implemented in TLM

has the highest peak temperature. The top layers have higher temperatures than the

bottom layers since they are farther away from the heat sink. They have around 1-2

◦C higher peak temperature values compared to the bottom layers.

126

(a) (b) (c) (d)

Tem

per

ature

(°K

)

324

326

328

330

332

334

336

Figure 6.6: OpenSPARC T2 heat maps: (a) 2-D, (b) BLM, (c) TLM, and (d) HM(HM core + BLM L2 + TLM CCX).

6.7.3 Discussion

We have shown, via simulations, that HM designs can offer various trade-offs among

design objectives, such as footprint area, power consumption, and on-chip tempera-

ture. This section presents some key observations on these trade-offs.

Our power consumption results consist of runtime dynamic and leakage power.

Runtime dynamic power of a module depends on its activity factor. Thus, power

reduction of a 3-D design is determined by the number of calls to its modules. For

example, the FPU implemented in GLM has a 13.9% power reduction, whereas the

ALU implemented in GLM has only a 4.5% power reduction compared to the 2-D

implementations. However, the impact of an ALU on runtime dynamic power may

be larger than that of the FPU if there are significantly more arithmetic operations

than floating-point operations. Thus, the power savings of a 3-D FPU with respect

to a 2-D FPU may not be significant at runtime if the FPU is rarely used. Hence,

the power benefit of a 3-D design depends strongly on the benchmark.

McPAT-monolithic cannot model modules in the GLM style except the FUs,

which can be characterized by FinPrin-monolithic. Thus, the power reduction of

the OpenSPARC T2 hybrid core design in this work is not as significant as it could

127

be if there were more GLM modules. In Chapter 5, we showed that an HM design

can reduce the power consumption of the T2 core by 14.6% since most logic modules

were implemented in GLM. However, in this work the reduction in power consump-

tion of the HM core is only 6.2% because fewer modules are implemented in GLM.

It is challenging to design a tool that is capable of modeling GLM modules at the

architecture level since it requires circuit-level 3-D gate placement. Adding GLM

modeling capability can enable McPAT-monolithic to use GLM modules more and

explore a larger HM design space.

Although we have only considered 3-D designs with two layers, monolithic designs

can be implemented on more than two layers. However, the temperature will become

a more pressing concern since the peak temperature increases around 10 ◦C just by

adding a second transistor layer, despite the lower power consumption.

We do not use an RTL-to-GDSII physical design flow in our simulations. We use

tools that are fast in order to explore the HM design space and different architectures

more quickly. However, our tools are not as accurate as the tools that are used in

an RTL-to-GDSII design flow. After the design parameters are decided using the

McPAT-monolithic framework, an RTL-to-GDSII physical design flow can be used

for more accurate simulation results.

6.8 Chapter summary

We introduced McPAT-monolithic, an area/power/timing modeling tool for 3-D

monolithic multi-core architectures. We used it to model the OpenSPARC T2 SoC

implemented in different monolithic design styles. We demonstrated that an HM

design consisting of modules implemented in different monolithic styles can reduce

the footprint area and power consumption by 47.2% and 5.3%, respectively, compared

to the 2-D design at the cost of a higher on-chip temperature.

128

Chapter 7

Conclusion and Future Work

This chapter summarizes the findings of this thesis and discusses future directions.

7.1 Summary of findings

In Chapter 3, we explored the use of MPA FinFETs for low-power, robust, and

dense SRAM cell design. We investigated FinFETs with asymmetries in gate work-

function, source/drain doping concentration, gate underlap, and their combinations.

We showed that asymmetry in gate workfunction combined with asymmetry in dop-

ing concentration can reduce the leakage power by 58× and mitigate the read-write

conflict. We showed MPA FinFETs can also be effective at addressing the width-

quantization issue since the strength of a FinFET can be adjusted by introducing

asymmetries while keeping the fin count the same. Thus, it is possible to achieve

different effective pull-down-to-access and access-to-pull-up ratios in terms of their

strengths to achieve good stability metrics even with single-fin FinFETs. Use of

MPA FinFETs, however, often degrades the SRAM performance due to weaker Fin-

FETs. In addition, adding asymmetries increases the number of fabrication steps

that can increase manufacturing cost and degrade the yield.

129

In Chapter 4, we presented two new 3-D 8T SRAM cells for enhanced TR and low

ILEAK. We used pFinFET access transistors to achieve an area-efficient design in 3-D

by equalizing the number of nFinFETs and pFinFETs in the 8T SRAM cell. However,

use of pFinFET access transistors degrades writeability of the proposed cells. Thus,

in one of the proposed cells, we employed IG pull-up transistors with their back gate

tied to VDD to improve the degraded writeability. This cell has 28.1%, 31.6%, and

53.2% smaller footprint area, ILEAK, and TR, respectively, compared to a conventional

2-D 6T SRAM cell. It also has 43.8%, 43.2%, and 29.0% reduction in footprint area,

ILEAK, and TR, respectively, compared to a conventional 2-D 8T SRAM cell. We

investigated various assist techniques and showed cell-GND-boosting can be used to

improve the writeability of the proposed cells. We also investigated SRAM cells for

different design objectives such as high stability, high performance, or low leakage.

The proposed cells are shown to be particularly promising for high-read-performance

and low-leakage purposes.

In Chapter 5, we investigated the benefits of HM designs. We developed tools

needed to model logic and memory modules implemented in different monolithic styles

such as BLM, GLM, and TLM. We developed an effective and fast gate-level place-

ment method for GLM logic modules. We also presented 3-D-HMFP, the first 3-D

HM floorplanner, to explore the HM design space. We showed an HM design can

reduce the footprint area and power consumption of the OpenSPARC T2 processor

core by 48.1% and 14.6% compared to a 2-D design.

In Chapter 6, we presented McPAT-monolithic, an architectural modeling frame-

work for HM designs at the multi-core level. We developed an NoC modeling tool

for BLM and TLM designs. We integrated 3-D-HMFP into McPAT-monolithic for

processor core floorplanning. We showed that an HM design can reduce the footprint

area of the OpenSPARC T2 SoC by 47.2% along with a 5.3% reduction in runtime

power consumption with respect to its 2-D counterpart.

130

7.2 Future work

In Chapter 3, we only consider FinFETs with asymmetries in gate workfunction,

doping concentration, gate underlap, and their combinations. However, there may be

other ways to introduce asymmetry in FinFET-based SRAM cells such as using Fin-

FETs with different heights. Multiple-fin-height FinFETs can be useful at addressing

the width quantization issue. The same methodology we employ in Chapter 3 can be

used even if more asymmetries are introduced. The fabrication cost of introducing

asymmetries needs be investigated in order to make a better comparison between

symmetric, SPA, and MPA FinFET-based SRAM cells. Another future direction can

be investigating assist techniques for MPA FinFET-based SRAM cells to improve the

degraded performance.

In Chapter 4, we chose a thick ILD to prevent inter-layer coupling. A thin ILD

can induce additional variations and alter device characteristics. The impact of inter-

layer coupling on the proposed 8T SRAM cells can be investigated in the future. Use

of IG FinFETs in 3-D SRAM cells can also be explored in the presence of inter-layer

coupling as they enable dynamic leakage management by modifying Vth dynamically.

For monolithic 3-D integration, we only considered designs with two transistor

layers. Designs with three or more transistor layers can be explored as they can reduce

the interconnect and power consumption even further. However, the temperature

increase can be a problem with these designs. Novel cooling techniques capable of

reducing temperatures of monolithic 3-D designs can be investigated in the future.

Modeling of memory modules implemented in GLM can be an interesting research

topic as they occupy the largest area on the chip. Folding can be a possible way to

model GLM 3-D memory modules since they have a regular structure. Adaptation of

monolithic 3-D integration can eliminate the memory wall problem by providing a high

connectivity between the processor core and an on-chip memory. Thus, 3-D designs

with on-chip memory can be explored to breach the communication gap between

131

the processor and main memory. Finally, heterogeneous monolithic 3-D designs that

integrate different components such digital and analog ICs, micro-electro-mechanical

systems, and sensors can be investigated in the future.

We have used 22nm and 14nm technology nodes in our experiments. The benefits

of the proposed designs need to be investigated for smaller nodes as scaling contin-

ues. Although we expect to see similar benefits at smaller nodes, the improvements

obtained by the proposed designs may differ. For example, we showed that AWDSG

FinFETs can lower SRAM ILEAK at 22nm by 58× with given parameter values. It is

expected that AWDSG FinFETs will lower SRAM ILEAK at 14nm as well since both

asymmetries reduce ILEAK. However, the improvement may be different at the 14nm

node based on the design parameters. Similarly, benefits of TLM SRAMs may differ

at smaller nodes especially based on the relative strength of nFinFET and pFinFETs.

Thus, the proposed designs need to be re-evaluated based on the targeted node.

132

Bibliography

[1] G. E. Moore, “Cramming more components onto integrated circuits,” Electron.,vol. 38, no. 8, pp. 114–117, Apr. 1965.

[2] J. M. Shalf and R. Leland, “Computing beyond Moore’s law,” IEEE Computer,vol. 48, no. 12, pp. 14–23, Dec. 2015.

[3] T. Ghani, M. Armstrong, C. Auth, M. Bost, P. Charvat, G. Glass, T. Hoffmann,K. Johnson, C. Kenyon, J. Klaus et al., “A 90nm high volume manufacturinglogic technology featuring novel 45nm gate length strained silicon CMOS tran-sistors,” in Proc. IEEE Int. Electron Device Mtg., Dec. 2003, pp. 11–6.

[4] K. Mistry, C. Allen, C. Auth, B. Beattie, D. Bergstrom, M. Bost, M. Brazier,M. Buehler, A. Cappellani, R. Chau et al., “A 45nm logic technology with high-k+ metal gate transistors, strained silicon, 9 Cu interconnect layers, 193nm drypatterning, and 100% Pb-free packaging,” in Proc. IEEE Int. Electron DeviceMtg., Dec. 2007, pp. 247–250.

[5] C. Auth, C. Allen, A. Blattner, D. Bergstrom, M. Brazier, M. Bost, M. Buehler,V. Chikarmane, T. Ghani, T. Glassman et al., “A 22nm high performance andlow-power CMOS technology featuring fully-depleted tri-gate transistors, self-aligned contacts and high density MIM capacitors,” in Proc. Int. Symp. VLSITechnol., June 2012, pp. 131–132.

[6] T. N. Theis and H. S. P. Wong, “The end of Moore’s law: A new beginning forinformation technology,” Comput. Science Engineering, vol. 19, no. 2, pp. 41–50,Mar. 2017.

[7] R. S. Williams, “What’s next? [The end of Moore’s law],” Comput. ScienceEngineering, vol. 19, no. 2, pp. 7–13, Mar. 2017.

[8] D. Hisamoto, W.-C. Lee, J. Kedzierski, H. Takeuchi, K. Asano, C. Kuo, E. An-derson, T.-J. King, J. Bokor, and C. Hu, “FinFET-a self-aligned double-gateMOSFET scalable to 20 nm,” IEEE Trans. Electron Devices, vol. 47, no. 12, pp.2320–2325, Dec. 2000.

[9] Y.-K. Choi, L. Chang, P. Ranade, J.-S. Lee, D. Ha, S. Balasubramanian, A. Agar-wal, M. Ameen, T.-J. King, and J. Bokor, “FinFET process refinements for im-proved mobility and gate work function engineering,” in Proc. IEEE Int. ElectronDevice Mtg., Dec. 2002, pp. 259–262.

133

[10] M. C. Wang, “Independent-gate FinFET circuit design methodology,” IAENGInt. J. of Comput. Science, vol. 37, no. 1, pp. 1–8, Feb. 2010.

[11] H. Kawasaki, V. S. Basker, T. Yamashita, C. Lin, Y. Zhu, J. Faltermeier,S. Schmitz, J. Cummings, S. Kanakasabapathy, H. Adhikari et al., “Challengesand solutions of FinFET integration in an SRAM cell and a logic circuit for 22nm node and beyond,” in Proc. IEEE Int. Electron Device Mtg., Dec. 2009, pp.1–4.

[12] T. Yamashita, V. S. Basker, T. Standaert, C.-C. Yeh, T. Yamamoto, K. Maitra,C.-H. Lin, J. Faltermeier, S. Kanakasabapathy, M. Wang et al., “Sub-25nm Fin-FET with advanced fin formation and short channel effect engineering,” in Proc.IEEE Symp. VLSI Technol., June 2011, pp. 14–15.

[13] C. C. Wu, D. W. Lin, A. Keshavarzi, C. H. Huang, C. T. Chan, C. H. Tseng,C. L. Chen, C. Y. Hsieh, K. Y. Wong, M. L. Cheng, T. H. Li et al., “Highperformance 22/20nm FinFET CMOS devices with advanced high-k/metal gatescheme,” in Proc. IEEE Int. Electron Device Mtg., Dec. 2010, pp. 27.1.1–27.1.4.

[14] X. Dai and N. K. Jha, “Improving convergence and simulation time of quan-tum hydrodynamic simulation: Application to extraction of best 10-nm FinFETparameter values,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25,no. 1, pp. 319–329, Jan. 2017.

[15] Intel Corp., “Intel 14nm technology,” 2014. [Online]. Available: https://www.intel.com/content/www/us/en/silicon-innovations/intel-14nm-technology.html

[16] Samsung Foundry, “14nm FinFET technology,” 2015. [Online]. Available:http://www.samsung.com/semiconductor/foundry/process-technology/14nm/

[17] S. Manne, A. Klauser, and D. Grunwald, “Pipeline gating: Speculation controlfor energy reduction,” in Proc. Int. Symp. Comput. Architecture, vol. 26, no. 3,June 1998, pp. 132–141.

[18] S. O. Toh, Z. Guo, T.-J. K. Liu, and B. Nikolic, “Characterization of dynamicSRAM stability in 45 nm CMOS,” IEEE J. Solid-State Circuits, vol. 46, no. 11,pp. 2702–2712, Nov. 2011.

[19] S. Panth, K. Samadi, Y. Du, and S. K. Lim, “Design and CAD methodologiesfor low power gate-level monolithic 3D ICs,” in Proc. Int. Symp. Low PowerElectron. Design, Aug. 2014, pp. 171–176.

[20] R. S. Patti, “Three-dimensional integrated circuits and the future of system-on-chip designs,” Proc. IEEE, vol. 94, no. 6, pp. 1214–1224, June 2006.

[21] D. H. Kim, K. Athikulwongse, and S. K. Lim, “Study of through-silicon-viaimpact on the 3-D stacked IC layout,” IEEE Trans. Very Large Scale Integr.(VLSI) Syst., vol. 21, no. 5, pp. 862–874, May 2013.

134

https://www.intel.com/content/www/us/en/silicon-innovations/intel-14nm-technology.html

https://www.intel.com/content/www/us/en/silicon-innovations/intel-14nm-technology.html

http://www.samsung.com/semiconductor/foundry/process-technology/14nm/

[22] C. Liu and S. K. Lim, “A design tradeoff study with monolithic 3D integration,”in Proc. Int. Symp. Qual. Electron. Design, Mar. 2012, pp. 529–536.

[23] S. Panth, K. Samadi, Y. Du, and S. K. Lim, “Power-performance study of block-level monolithic 3D-ICs considering inter-tier performance variations,” in Proc.Des. Autom. Conf., June 2014, pp. 1–6.

[24] Y. Lee, P. Morrow, and S. K. Lim, “Ultra high density logic designs usingtransistor-level monolithic 3D integration,” in Proc. Int. Conf. Comput.-AidedDes., Nov. 2012, pp. 539–546.

[25] A. Guler and N. K. Jha, “Ultra-low-leakage, robust FinFET SRAM design usingmultiparameter asymmetric FinFETs,” ACM J. Emerg. Technol. Comput. Syst.,vol. 13, no. 2, pp. 26:1–26:25, 2017.

[26] A. Guler and N. K. Jha, “Three-dimensional monolithic FinFET-based 8TSRAM cell design for enhanced read time and low leakage,” IEEE Trans. VeryLarge Scale Integr. (VLSI) Syst., vol. 27, no. 4, pp. 899–912, Apr. 2019.

[27] A. Guler and N. K. Jha, “Hybrid monolithic 3-D IC floorplanner,” IEEE Trans.Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 10, pp. 1868–1880, Oct. 2018.

[28] ——, “McPAT-monolithic: An area/power/timing architecture modeling frame-work for 3-D hybrid monolithic multi-core systems,” IEEE Trans. Very LargeScale Integr. (VLSI) Syst., under review.

[29] A. N. Bhoj and N. K. Jha, “Design of logic gates and flip-flops in high-performance FinFET technology,” IEEE Trans. Very Large Scale Integr. (VLSI)Syst., vol. 21, no. 11, pp. 1975–1988, Nov. 2013.

[30] ——, “Parasitics-aware design of symmetric and asymmetric gate-workfunctionFinFET SRAMs,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22,no. 3, pp. 548–561, Mar. 2014.

[31] F. Moradi, S. K. Gupta, G. Panagopoulos, D. T. Wisland, H. Mahmoodi, andK. Roy, “Asymmetrically doped FinFETs for low-power robust SRAMs,” IEEETrans. Electron Devices, vol. 58, no. 12, pp. 4241–4249, Dec. 2011.

[32] S. M. Salahuddin, H. Jiao, and V. Kursun, “A novel 6T SRAM cell with asym-metrically gate underlap engineered FinFETs for enhanced read data stabilityand write ability,” in Proc. Int. Symp. Qual. Electron. Design, Mar. 2013, pp.353–358.

[33] A. Goel, S. K. Gupta, and K. Roy, “Asymmetric drain spacer extension (ADSE)FinFETs for low-power and robust SRAMs,” IEEE Trans. Electron Devices,vol. 58, no. 2, pp. 296–308, Feb. 2011.

135

[34] S. M. Chaudhuri and N. K. Jha, “Ultra-low-leakage and high-performance logiccircuit design using multiparameter asymmetric FinFETs,” ACM J. Emerg.Technol. Comput. Syst., vol. 12, no. 4, p. 43, Mar. 2016.

[35] F. Andrieu, R. Berthelon, R. Boumchedda, G. Tricaud, L. Brunet, P. Batude,B. Mathieu, E. Avelar, A. A. de Sousa, G. Cibrario, O. Rozeau et al., “Designtechnology co-optimization of 3D-monolithic standard cells and SRAM exploitingdynamic back-bias for ultra-low-voltage operation,” in Proc. IEEE Int. ElectronDevice Mtg., Dec. 2017, pp. 20.3.1–20.3.4.

[36] P. Batude, M. A. Jaud, O. Thomas, L. Clavelier, A. Pouydebasque, M. Vinet,S. Deleonibus, and A. Amara, “3D CMOS integration: Introduction of dynamiccoupling and application to compact and robust 4T SRAM,” in Proc. Int. Conf.Integr. Circuit Design and Technol., June 2008, pp. 281–284.

[37] K. C. Yu, M. L. Fan, P. Su, and C. T. Chuang, “Evaluation of monolithic 3-Dlogic circuits and 6T SRAMs with InGaAs-n/Ge-p ultra-thin-body MOSFETs,”IEEE J. Electron Devices Society, vol. 4, no. 2, pp. 76–82, Mar. 2016.

[38] O. Thomas, M. Vinet, O. Rozeau, P. Batude, and A. Valentian, “Compact 6TSRAM cell with robust read/write stabilizing design in 45nm monolithic 3D ICtechnology,” in Proc. Int. Conf. Integr. Circuit Design and Technol., May 2009,pp. 195–198.

[39] C. H. Shen, J. M. Shieh, T. T. Wu, W. H. Huang, C. C. Yang, C. J. Wan,C. D. Lin, H. H. Wang, B. Y. Chen, G. W. Huang, Y. C. Lien, S. Wong et al.,“Monolithic 3D chip integrated with 500ns NVM, 3ps logic circuits and SRAM,”in Proc. IEEE Int. Electron Device Mtg., Dec. 2013, pp. 9.3.1–9.3.4.

[40] C. Liu and S. K. Lim, “Ultra-high density 3D SRAM cell designs for monolithic3D integration,” in Proc. IEEE Int. Interconnect Technol. Conf., June 2012, pp.1–3.

[41] L. Chang, D. M. Fried, J. Hergenrother, J. W. Sleight, R. H. Dennard, R. K.Montoye, L. Sekaric, S. J. McNab, A. W. Topol, C. D. Adams, K. W. Guarini,and W. Haensch, “Stable SRAM cell design for the 32 nm node and beyond,” inProc. Int. Symp. VLSI Technol., June 2005, pp. 128–129.

[42] D. Bhattacharya and N. K. Jha, “Ultra-high density monolithic 3-D FinFETSRAM with enhanced read stability,” IEEE Trans. Circuits Syst. I, Reg. Papers,vol. 63, no. 8, pp. 1176–1187, Aug. 2016.

[43] M. L. Fan, V. P. H. Hu, Y. N. Chen, P. Su, and C. T. Chuang, “Stability andperformance optimization of heterochannel monolithic 3-D SRAM cells consid-ering interlayer coupling,” IEEE Trans. Electron Devices, vol. 61, no. 10, pp.3448–3455, Oct. 2014.

136

[44] S. A. Tawfik and V. Kursun, “Compact FinFET memory circuits with p-typedata access transistors for low leakage and robust operation,” in Proc. Int. Symp.Qual. Electron. Design, Mar. 2008, pp. 855–860.

[45] R. Yarmand, B. Ebrahimi, H. Afzali-Kusha, A. Afzali-Kusha, and M. Pedram,“High-performance and high-yield 5 nm underlapped FinFET SRAM design us-ing p-type access transistors,” in Proc. Int. Symp. Qual. Electron. Design, Mar.2015, pp. 10–17.

[46] J. H. Law, E. F. Young, and R. L. Ching, “Block alignment in 3D floorplan usinglayered TCG,” in Proc. Great Lakes Symp. VLSI, Apr.-May 2006, pp. 376–380.

[47] X. Li, Y. Ma, and X. Hong, “A novel thermal optimization flow using incrementalfloorplanning for 3D ICs,” in Proc. Asia South Pacific Des. Automat. Conf., Jan.2009, pp. 347–352.

[48] R. K. Nain and M. Chrzanowska-Jeske, “Fast placement-aware 3-D floorplanningusing vertical constraints on sequence pairs,” IEEE Trans. Very Large ScaleIntegr. (VLSI) Syst., vol. 19, no. 9, pp. 1667–1680, Sept. 2011.

[49] A. Quiring, M. Lindenberg, M. Olbrich, and E. Barke, “3D floorplanning con-sidering vertically aligned rectilinear modules using T*-tree,” in Proc. IEEE Int.3D Syst. Integration Conf., Jan. 2012, pp. 1–5.

[50] J. Knechtel, E. F. Young, and J. Lienig, “Planning massive interconnects in 3-Dchips,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 34, no. 11,pp. 1808–1821, Nov. 2015.

[51] J.-M. Lin, P.-Y. Chiu, and Y.-F. Chang, “SAINT: Handling module folding andalignment in fixed-outline floorplans for 3D ICs,” in Proc. Int. Conf. Comput.-Aided Des., Nov. 2016, pp. 1–7.

[52] Y. Chen, G. Sun, Q. Zou, and Y. Xie, “3DHLS: Incorporating high-level synthesisin physical planning of three-dimensional (3D) ICs,” in Proc. Des. Automat. &Test Europe Conf., Mar. 2012, pp. 1185–1190.

[53] S. Bobba, A. Chakraborty, O. Thomas, P. Batude, and G. de Micheli, “Celltransformations and physical design techniques for 3D monolithic integrated cir-cuits,” ACM J. Emerg. Technol. Comput. Syst., vol. 9, no. 3, pp. 19:1–19:28,Oct. 2013.

[54] M. Jung, T. Song, Y. Peng, and S. K. Lim, “Design methodologies for low-power3-D ICs with advanced tier partitioning,” IEEE Trans. Very Large Scale Integr.(VLSI) Syst., vol. 25, no. 7, pp. 2109–2117, July 2017.

[55] S. Panth, K. Samadi, Y. Du, and S. K. Lim, “Shrunk-2D: A physical designmethodology to build commercial-quality monolithic 3-D ICs,” IEEE Trans.Comput.-Aided Design Integr. Circuits Syst., vol. 36, no. 10, pp. 1716–1724,Oct. 2017.

137

[56] E. J. Nowak, I. Aller, T. Ludwig, K. Kim, R. V. Joshi, C.-T. Chuang, K. Bern-stein, and R. Puri, “Turning silicon on its edge [double gate CMOS/FinFETtechnology],” IEEE Circuits and Devices Mag., vol. 20, no. 1, pp. 20–31, Jan.-Feb 2004.

[57] Z. Guo, S. Balasubramanian, R. Zlatanovici, T.-J. King, and B. Nikolic,“FinFET-based SRAM design,” in Proc. Int. Symp. Low Power Electron. Design,Aug. 2005, pp. 2–7.

[58] A. B. Sachid and C. Hu, “Denser and more stable SRAM using FinFETs withmultiple fin heights,” IEEE Trans. Electron Devices, vol. 59, no. 8, pp. 2037–2041, Aug. 2012.

[59] Synopsys Inc., “Sentaurus TCAD tool suite, version I-2013.12,” 2013. [Online].Available: htttp://www.synopsys.com

[60] E. Grossar, M. Stucchi, K. Maex, and W. Dehaene, “Read stability and write-ability analysis of SRAM cells for nanometer technologies,” IEEE J. Solid-StateCircuits, vol. 41, no. 11, pp. 2577–2588, Nov. 2006.

[61] A. Bhavnagarwala, S. Kosonocky, C. Radens, K. Stawiasz, R. Mann, Q. Ye, andK. Chin, “Fluctuation limits & scaling opportunities for CMOS SRAM cells,” inProc. IEEE Int. Electron Device Mtg., Dec. 2005, pp. 659–662.

[62] A. N. Bhoj, R. V. Joshi, and N. K. Jha, “3-D TCAD based parasitic capacitanceextraction for emerging multi-gate devices and circuits,” IEEE Trans. Very LargeScale Integr. (VLSI) Syst., vol. 21, no. 11, pp. 2094–2105, Nov. 2013.

[63] A. Carlson, Z. Guo, S. Balasubramanian, R. Zlatanovici, T.-J. K. Liu, andB. Nikolic, “SRAM read/write margin enhancements using FinFETs,” IEEETrans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 6, pp. 887–900, June2010.

[64] B. Black, M. Annavaram, N. Brekelbaum, J. DeVale, L. Jiang, G. H. Loh, D. Mc-Cauley, P. Morrow, D. W. Nelson, D. Pantuso, P. Reed, J. Rupley, S. Shankar,J. Shen, and C. Webb, “Die stacking (3D) microarchitecture,” in Proc. Int. Symp.Microarchitecture, Dec. 2006, pp. 469–479.

[65] S. Xiong and J. Bokor, “Sensitivity of double-gate and FinFET devices to processvariations,” IEEE Trans. Electron Devices, vol. 50, no. 11, pp. 2255–2261, Nov.2003.

[66] E. Chin, M. Dunga, and B. Nikolic, “Design trade-offs of a 6T FinFET SRAMcell in the presence of variations,” in Proc. IEEE. Symp. VLSI Circuits, 2006,pp. 445–449.

138

htttp://www.synopsys.com

[67] J. Kedzierski, D. M. Fried, E. J. Nowak, T. Kanarsky, J. H. Rankin, H. Hanafi,W. Natzle, D. Boyd, Y. Zhang, R. A. Roy, J. Newbury, C. Yu et al., “High-performance symmetric-gate and CMOS-compatible Vt asymmetric-gate Fin-FET devices,” in Proc. IEEE Int. Electron Device Mtg., Dec. 2001, pp. 19.5.1–19.5.4.

[68] L. Mathew, M. Sadd, B. E. White, A. Vandooren, S. Dakshina-Murthy, J. Cobb,T. Stephens, R. Mora, D. Pham, J. Conner, T. White et al., “Finfet with isolatedn+ and p+ gate regions strapped with metal and polysilicon,” in Proc. IEEEInt. SOI Conf., Sept.-Oct. 2003, pp. 109–110.

[69] T. Ghani, K. Mistry, P. Packan, M. Armstrong, S. Thompson, S. Tyagi, andM. Bohr, “Asymmetric source/drain extension transistor structure for high per-formance sub-50 nm gate length CMOS devices,” in Proc. IEEE Symp. VLSITechnol., June 2001, pp. 17–18.

[70] K. Ohuchi, K. Adachi, A. Hokazono, and Y. Toyoshima, “Source/drain engineer-ing for sub 100-nm technology node,” in Proc. IEEE Int. Conf. Ion ImplantationTechnol., Sept. 2002, pp. 7–12.

[71] P. Batude, M. A. Jaud, O. Thomas, L. Clavelier, A. Pouydebasque, M. Vinet,S. Deleonibus, and A. Amara, “3D CMOS integration: Introduction of dynamiccoupling and application to compact and robust 4T SRAM,” in Proc. Int. Conf.Integr. Circuit Design and Technol., June 2008, pp. 281–284.

[72] R. W. Mann and B. H. Calhoun, “New category of ultra-thin notchless 6T SRAMcell layout topologies for sub-22nm,” in Proc. Int. Symp. Qual. Electron. Design,Mar. 2011, pp. 1–6.

[73] A. Singhee and R. A. Rutenbar, “Why quasi-Monte Carlo is better than MonteCarlo or Latin hypercube sampling for statistical circuit analysis,” IEEE Trans.Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 11, pp. 1763–1776, Nov.2010.

[74] R. Ho, K. W. Mai, and M. A. Horowitz, “The future of wires,” Proc. IEEE,vol. 89, no. 4, pp. 490–504, Apr. 2001.

[75] J. S. Clarke, C. George, C. Jezewski, A. M. Caro, D. Michalak, and J. Torres,“Process technology scaling in an increasingly interconnect dominated world,”in Proc. Int. Symp. VLSI Technol., June 2014, pp. 1–2.

[76] J. Cong, J. Wei, and Y. Zhang, “A thermal-driven floorplanning algorithm for3D ICs,” in Proc. Int. Conf. Comput.-Aided Des., Nov. 2004, pp. 306–313.

[77] K. Vaidyanathan, D. H. Morris, U. E. Avci, I. S. Bhati, L. Subramanian, J. Gaur,H. Liu, S. Subramoney, T. Karnik, H. Wang, and I. A. Young, “Overcoming in-terconnect scaling challenges using novel process and design solutions to improveboth high-speed and low-power computing modes,” in Proc. IEEE Int. ElectronDevice Mtg., Dec. 2017.

139

[78] P. Kapur, G. Chandra, and K. C. Saraswat, “Power estimation in global inter-connects and its reduction using a novel repeater optimization methodology,” inProc. Des. Autom. Conf., June 2002, pp. 461–466.

[79] Y. Yang and N. K. Jha, “FinPrin: FinFET logic circuit analysis and optimizationunder PVT variations,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,vol. 22, no. 12, pp. 2462–2475, Dec. 2014.

[80] C.-Y. Lee and N. K. Jha, “FinCANON: A PVT-aware integrated delay andpower modeling framework for FinFET-based caches and on-chip networks,”IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 5, pp. 1150–1163, May 2014.

[81] R. Zhang, M. R. Stan, and K. Skadron, “HotSpot 6.0: Validation, accelerationand extension,” University of Virginia, Tech. Report, CS-2015-04, Aug. 2015.

[82] J. K. Ousterhout, G. T. Hamachi, R. N. Mayo, W. S. Scott, and G. S. Taylor,“The Magic VLSI layout system,” IEEE Des. Test Comput., vol. 2, no. 1, pp.19–30, Feb. 1985.

[83] Synopsys Inc., “Synopsys Design Compiler,” 2013. [Online]. Available:htttp://www.synopsys.com

[84] J. A. Roy, S. N. Adya, D. A. Papa, and I. L. Markov, “Min-cut floorplacement,”IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 25, no. 7, pp.1313–1326, July 2006.

[85] C. Chu and Y.-C. Wong, “FLUTE: Fast lookup table based rectilinear Steinerminimal tree algorithm for VLSI design,” IEEE Trans. Comput.-Aided DesignIntegr. Circuits Syst., vol. 27, no. 1, pp. 70–83, Jan. 2008.

[86] C. Yan and E. Salman, “Mono3D: Open source cell library for monolithic 3-Dintegrated circuits,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 3,pp. 1075–1085, Mar. 2018.

[87] B. W. Ku, T. Song, A. Nieuwoudt, and S. K. Lim, “Transistor-level monolithic3D standard cell layout optimization for full-chip static power integrity,” in Proc.Int. Symp. Low Power Electron. Design, July 2017, pp. 1–6.

[88] W.-L. Hung, G. M. Link, Y. Xie, N. Vijaykrishnan, and M. J. Irwin, “Inter-connect and thermal-aware floorplanning for 3D microprocessors,” in Proc. Int.Symp. Qual. Electron. Design, Mar. 2006, pp. 98–104.

[89] Y.-C. Chang, Y.-W. Chang, G.-M. Wu, and S.-W. Wu, “B*-Trees: A new rep-resentation for non-slicing floorplans,” in Proc. Des. Autom. Conf., June 2000,pp. 458–463.

[90] ITRS, “International Technology Roadmap for Semiconductor,” 2013. [Online].Available: http://www.itrs2.net/2013-itrs.html

140

htttp://www.synopsys.com

http://www.itrs2.net/2013-itrs.html

[91] HotSpot, “HotSpot 6.0 temperature modeling tool,” 2015. [Online]. Available:http://lava.cs.virginia.edu/HotSpot/

[92] S. K. Samal, S. Panth, K. Samadi, M. Saeidi, Y. Du, and S. K. Lim, “Adaptiveregression-based thermal modeling and optimization for monolithic 3-D ICs,”IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 35, no. 10, pp.1707–1720, Oct. 2016.

[93] Oracle, “OpenSPARC T2,” 2007. [Online]. Available: http://www.oracle.com

[94] P. Batude, M. Vinet, B. Previtali, C. Tabone, C. Xu, J. Mazurier, O. Weber,F. Andrieu, L. Tosti, L. Brevard, B. Sklenard, P. Coudrain et al., “Advances,challenges and opportunities in 3D CMOS sequential integration,” in Proc. IEEEInt. Electron Device Mtg., Dec. 2011, pp. 7.3.1–7.3.4.

[95] J. Z. Yan and C. Chu, “DeFer: Deferred decision making enabled fixed-outlinefloorplanning algorithm,” IEEE Trans. Comput.-Aided Design Integr. CircuitsSyst., vol. 29, no. 3, pp. 367–381, Mar. 2010.

[96] J. Shi, D. Nayak, S. Banna, R. Fox, S. Samavedam, S. K. Samal, and S. K.Lim, “A 14nm FinFET transistor-level 3D partitioning design to enable high-performance and low-cost monolithic 3D IC,” in Proc. IEEE Int. Electron DeviceMtg., Dec. 2016, pp. 2–5.

[97] S. Li, J. H. Ahn, R. D. Strong, J. B. Brockman, D. M. Tullsen, and N. P. Jouppi,“The McPAT framework for multicore and manycore architectures: Simultane-ously modeling power, area, and timing,” ACM Trans. Archit. Code Optim.,vol. 10, no. 1, p. 5, Apr. 2013.

[98] A. B. Kahng, B. Li, L. Peh, and K. Samadi, “Orion 2.0: A power-area simulatorfor interconnection networks,” IEEE Trans. Very Large Scale Integr. (VLSI)Syst., vol. 20, no. 1, pp. 191–196, Jan. 2012.

141

http://lava.cs.virginia.edu/HotSpot/

http://www.oracle.com

Date post:	22-Jan-2021
Category:	Documents
Upload:	others
View:	1 times
Download:	0 times

FinFET-based SRAM and Monolithic 3-D Integrated Circuit Design€¦ · 3-D Integrated Circuit...

Documents