Copyright by Jae-Seok Yang 2011


The Dissertation Committee for Jae-Seok Yang certifies that this is the approved version of the following dissertation:

Nanometer VLSI Design-Manufacturing Interface for

Large Scale Integration

Committee:

David Z. Pan, Supervisor

Jacob Abraham

Michael Orshansky

Frank Liu

Nur Touba



by

Jae-Seok Yang, B.S.; M.S.; M.S.

DISSERTATION

Presented to the Faculty of the Graduate School of

The University of Texas at Austin

in Partial Fulfillment

of the Requirements

for the Degree of

DOCTOR OF PHILOSOPHY

THE UNIVERSITY OF TEXAS AT AUSTIN

May 2011

Dedicated to my father Jun-Hwan Yang,

my mother Mrs. Soon-Hui Jeong

and my wife Mrs. Yoon-Jeong Cho

Acknowledgments

First and foremost, I would like to thank my research advisor, Prof. David Z. Pan. He has always been readily available for technical guidance and discussions, and I have learned many other skills from him, such as identifying research problems, executing research ideas, writing clear papers, and managing time during my graduate life. He has not only guided my research, but also provided psychological support during times of turmoil in my life. Without his guidance and support, this dissertation would not have been completed.

I would like to acknowledge the help of my fellow graduate students. I am thankful for the technical discussions and friendly support they have provided. Past and current graduate students without whose help this dissertation would not have been possible include: James Ban, Minsik Cho, Jiwoo Pak, Duo Ding, Jerrica Gao, Ou He, Wooyoung Jang, Anurag Kumar, Katrina Lu, Joydeep Mitra, Ashutosh Chakraborty, Kun Yuan, HyunJin Kim, Yilin Zhang and Bei Yu. It has been my privilege to work with you all. I would especially like to thank James Ban, who has provided me immense support throughout these years.

I would like to express deep gratitude to my PhD committee members, Prof. Jacob Abraham, Prof. Michael Orshansky, Dr. Frank Liu and Prof. Nur Touba, for agreeing to serve on my PhD committee despite their very busy schedules, and for their insightful comments and discussions. I would also like to thank Prof. Sung Kyu Lim and the GTCAD lab members at the School of ECE, Georgia Institute of Technology, including Krit Athikulwongse, Young-Joon Lee, Xin Zhao, Dae Hyun Kim and Moongon Jung. Collaborating with the GTCAD lab members has been a great opportunity to take on new research challenges.

Special thanks to the friendly and professional administrative staff at the ECE Department and CERC at the University of Texas at Austin (UT). In particular, I am grateful for the support and encouragement from Debi Prather and Melanie Gulick, the ECE graduate program coordinator.

I would like to specially thank Prof. Earl Swartzlander and Prof. Mark McDermott, under whom I took several graduate courses that helped me improve my research work. I would also like to thank the CAE team members at Samsung for the friendly support they have provided. Past and current CAE team members without whose help this dissertation would not have been possible include: Dr. Jeong-Taek Kong, Dr. Chul-Hong Park, Joon-Ho Choi, Sanghoon Lee, Jong-Bae Lee, Moon-Hyun Yoo and Young-Kwan Park. I would especially like to thank Dr. Jeong-Taek Kong, who gave me the opportunity to start the PhD program and provided me immense support throughout it, both in times of deep despair and in times of joy.

Finally, and most importantly, I am indebted to my family for the love and support I have received from them. In particular, I am thankful to my wife, my parents, my sister, my parents-in-law, my sister-in-law and my brother-in-law, who made many sacrifices to ensure that I could pursue my dream of earning a PhD. Before I joined the University of Texas at Austin, I was strongly inclined to go back to industry and give up pursuing a PhD. If not for the immense emotional support and constant encouragement that I got from my wife, I would not have been able to complete my PhD. I would also like to thank my son, who will be born in July 2011.




Publication No.

Jae-Seok Yang, Ph.D.

The University of Texas at Austin, 2011

Supervisor: David Z. Pan

As nanometer Very Large Scale Integration (VLSI) demands higher transistor density to fabricate multi-core processors and memory blocks within a limited die size, much research has been performed to sustain Moore’s Law in two different ways: 2D geometric shrinking and 3D vertical wafer stacking. For geometric shrinking, nano-patterning with 193nm lithography equipment is one of the most fundamental challenges beyond 22nm, while next-generation lithography such as Extreme Ultra-Violet (EUV) lithography still faces tremendous challenges for volume production in the near future. As a practical solution, Double Patterning Lithography (DPL) has become a leading candidate for the sub-20nm lithography process. Another approach to multi-core integration is 3D wafer stacking with Through Silicon Vias (TSVs). Computer-Aided Design (CAD) approaches that enable robust DPL and TSV technology are the main focus of this dissertation.


DPL poses new challenges for overlay and layout decomposition. Therefore, overlay-induced variation modeling and efficient decomposition for better manufacturability are in great demand. Since the variation of metal spacing caused by overlay results in coupling capacitance variation, we first model metal spacing variation for each individual overlay source. Then, all overlay sources are considered together to determine the worst-case timing under coupling capacitance variation. Non-parallel patterns caused by overlay are converted into parallel ones with an equivalent spacing that yields the same delay, so that a traditional RC extraction flow can be applied. Our experiments show that the delay variation due to overlay in DPL can be up to 9.1%, and that a well-decomposed layout can reduce this variability.

For DPL layout decomposition, we propose a multi-objective and flexible framework for simultaneous stitch minimization, density balancing, and overlay compensation. We use a graph-theoretic algorithm for minimum stitch insertion and balanced density. Additional decomposition constraints for overlay compensation are obtained by Integer Linear Programming (ILP), and robust contact decomposition is obtained with further constraints. With these constraints, global decomposition is performed using a modified Fiduccia-Mattheyses (FM) graph partitioning algorithm. Experimental results show that the proposed framework is highly scalable and fast: we can decompose all 15 benchmark circuits in five minutes in a density-balanced fashion, while an ILP-based approach can finish only the smallest five circuits. In addition, we can remove more than 95% of the timing variation induced by overlay for the tested structures.

Three-dimensional integration poses new manufacturing and design challenges, such as device variation due to TSV-induced stress and timing corner mismatch between different stacked dies. Since the TSV fill material and silicon have different Coefficients of Thermal Expansion (CTE), a TSV deforms the surrounding silicon because of the temperature difference between chip manufacturing and operation. Therefore, the systematic variation due to TSV-induced stress should be considered for robust 3D-IC design. We propose systematic TSV stress aware timing analysis and show how to optimize the layout for better performance. First, a stress contour map is generated with an analytical radial stress model. Then, the tensile stress is converted into hole and electron mobility variations depending on the geometric relations between TSVs and transistors. A mobility-variation-aware cell library and netlist are generated and incorporated into an industrial timing engine for 3D-IC timing analysis. TSV stress induced timing variations can be as much as ±10% for an individual cell. As an application to layout optimization, we exploit the stress-induced mobility enhancement to improve the timing of critical cells. We show that stress-aware perturbation can reduce cell delay by up to 14.0% and critical path delay by 6.5% in our test case.

Three-dimensional Clock Tree Synthesis (3D CTS) is one of the main design difficulties in 3D integration because the clock network spans all tiers. In 3D CTS, timing corner mismatch between tiers arises because each tier is manufactured in an independent process. Therefore, inter-die variation should be considered when analyzing and optimizing paths that span several tiers. In addition, mobility variation of a clock buffer due to TSV stress can cause unexpected skew, which degrades overall chip performance. Therefore, we propose clock period optimization that considers both timing corner mismatch and TSV-induced stress. In our experiments, our clock buffer tier assignment reduces clock period variation by up to 34.2%, and most of the stress-induced skew is removed by our stress-aware CTS. Overall, the proposed CTS algorithm improves performance by up to 5.7%.

As technology scaling continues toward 14nm and 3D-integration, this

dissertation addresses several key issues in the design-manufacturing interface,

and proposes unified analysis and optimization techniques for effective design

and manufacturing integration.


Table of Contents

Acknowledgments

Abstract

List of Tables

List of Figures

Chapter 1. Introduction

1.1 Challenges for Nanometer VLSI Design and Manufacturing

1.2 Overview and Contribution of this Dissertation

Chapter 2. Overlay aware Timing Variation Modeling for DPL

2.1 Introduction

2.2 Preliminaries and Related Works

2.2.1 DPT Process

2.2.2 Capacitance Variation induced by Overlay

2.2.3 Sources of Overlay

2.3 Layout Distortion Estimation

2.3.1 Problem Definition

2.3.2 Translation Overlay Consideration

2.3.3 Rotation Overlay Consideration

2.3.4 Magnification Overlay Consideration

2.3.5 Overall Capacitance Variation with Overlay

2.4 Timing Analysis with Parameterized Capacitance

2.5 Applications and Results

2.6 Discussions

Chapter 3. A Graph-Theoretic, Multi-Objective Layout Decomposition

3.1 Introduction

3.2 Decomposition Requirements and Motivation

3.2.1 Balanced Density

3.2.2 Robust Contact Decomposition

3.2.3 Overlay Compensation

3.3 Bi-partitioning Based Decomposition

3.3.1 Overall Decomposition Flow

3.3.2 Finding Neighboring Rectangles

3.3.3 Grouping and Relative Coloring

3.3.4 Group Color Assignment Problem

3.3.5 Min-Cut based Stitch Minimization

3.3.6 Modified Graph Min-Cut Partitioning

3.3.7 Complexity Analysis

3.4 Application to Contact Layer Decomposition

3.5 Application to Robust Metal Layer Decomposition

3.5.1 TDD Constraints for Overlay Compensation

3.5.2 TDD Constraints aware Decomposition

3.6 Application to Hierarchical Decomposition

3.7 Experimental Results

3.8 Discussions

Chapter 4. TSV Stress aware Timing Analysis and Layout Optimization for 3D-ICs

4.1 Introduction

4.2 Related Work and Motivation

4.3 Overall TSV Stress aware Design Flow

4.4 TSV Stress and Mobility Variation Modeling

4.4.1 Mobility Variation for a Single TSV

4.4.2 Mobility Variation for Multiple TSVs

4.5 Timing Analysis with TSV Stress Consideration

4.5.1 Timing Analysis for 3D-ICs

4.5.2 Timing Library for Mobility Variation

4.7 Discussions

Chapter 5. Robust Clock Tree Synthesis with Timing Yield Optimization for 3D-ICs

5.1 Introduction

5.2 Related Work and Motivation

5.2.1 Clock Buffer Tier Assignment

5.2.2 Clock Buffer Variation due to TSV induced Stress

5.3 Robust Clock Tree Design

5.3.1 σCP Minimization for Critical Paths

5.3.2 Buffer Variation Modeling under TSV induced Stress

5.3.3 Three-Dimensional Buffered Clock Tree Synthesis (CTS)

5.5 Discussions

Chapter 6. Conclusions

Bibliography

Vita


List of Tables

2.1 Timing Errors ignoring the dependency.

3.1 Runtime comparison (min_s = 42nm)

3.2 Decomposition results with ISCAS benchmark (min_s = 54nm, min_om = 20nm), ILP solver (exact method)

3.3 Decomposition results with ISCAS benchmark (min_s = 54nm, min_om = 20nm), Graph Partition (proposed heuristic)

3.4 Overlay compensation with TDD

4.1 TSV specification

4.2 Longest path delay and TNS comparison

4.3 Gate optimizations on the target path with perturbation

5.1 Buffer rising delay variation according to mobility changes (nominal delay: 210ps)

5.2 Circuit Information

5.3 Skew change due to TSV stress according to clock source z-location (CKT9)

5.4 Clock period analysis result. Case1 (No covariance optimization is done, and no stress is considered)

5.5 Clock period analysis result. Case2 (No covariance optimization is done, but TSV stress is considered)

5.6 Clock period analysis result. Case3 (Covariance optimization is done, but no stress is considered)

5.7 Clock period analysis result. Case4 (Covariance optimization is done, and TSV stress is considered)

5.8 No stress, no inter-die aware CTS vs. our stress and inter-die aware CTS


List of Figures

1.1 Lithography trend to follow Moore’s law [Courtesy Intel].

1.2 Mask split in DPL

1.3 Unwanted pattern distortion in DPL

1.4 An illustration showing difficulties in DPL

1.5 TSV can cause tensile stress in silicon near TSV.

2.1 DPT process and patterning result.

2.2 Effective capacitance variation.

2.3 Translation overlay variable.

2.4 Rotation overlay variable.

2.5 Magnification overlay variable.

2.6 Vector expression of overlay.

2.7 ∆S dependency on location.

2.8 ∆ST for various geometric relations.

2.9 ∆XR and ∆YR for rotation overlay.

2.10 β′ for various geometric relations.

2.11 ∆XM and ∆YM for magnification overlay.

2.12 ∆S dependency on location.

2.13 Equivalent n-π model for Seqv calculation.

2.14 Overlay aware timing analysis flow.

2.15 Test structure used to verify overlay aware timing analysis flow

2.16 Delay variation with overlay when target layout is near the center. Translation overlay is dominant.

2.17 Delay variation with overlay when target layout is near the edge of the die. Rotation and magnification overlay as well as translation overlay have an impact on delay variation.

2.18 Test structure for an alternative decomposition which has less overlay effect on timing

2.19 Comparison of delay variation between two different decomposition schemes.

2.20 Compensation trend of overlay induced variation as logic depth increases.

2.21 Timing variation over θ for multiple metal layers.

3.1 Layout decomposition.

3.2 Aerial image comparison of decomposed layout with different density.

3.3 An illustration of contact to gate space variation due to overlay.

3.4 Overlay compensated decomposition.

3.5 Overall flow of the decomposition framework.

3.6 Decomposition results after each step (AOI2 cell, Metal1).

3.7 Re-segmentation and grouping process.

3.8 An example showing group color assignment formulation after relative coloring.

3.9 An example showing how to handle a repulsive layout pair in FM partitioning algorithm.

3.10 Group color assignment with different partitioning scheme and decomposition results.

3.11 Layout division for local balance.

3.12 ∆Ion variation for various overlay cases.

3.13 An application to contact layer decomposition.

3.14 An example with four neighbors.

3.15 Different decomposition with TDD.

3.16 Decomposition with TDD constraints.

3.17 Flattened layout decomposition.

3.18 Hierarchical layout decomposition.

3.19 Metal 1 decomposition with our framework.

3.20 Graph showing runtime increase as circuits grow larger.

3.21 Extremely unbalanced decomposition (S38584).

3.22 C432 decomposed layout.

3.23 The balanced decomposed layout has less EPE than the unbalanced one (C432).

3.24 Delay variation for an inverter chain to verify the robust contact decomposition in DPL.

3.25 Reduction of timing variation as more stitches are inserted (Net3).

4.1 Thermal stress around TSV.

4.2 Mobility change due to tensile stress.

4.3 Overall flow for TSV stress aware design.

4.4 Optimal orientation of MOSFET to maximize the mobility for (001) surface, 〈110〉 channel.

4.5 Stress contour map for a single TSV with 0.5um KOZ.

4.6 Mobility contour map for a TSV.

4.7 Linear super-position of TSV stress.

4.8 Zigzag TSV placement has less (∆µ/µ)h between rows due to compensation.

4.9 Timing corner determination according to mobility variation.

4.10 Timing corner with TSV stress.

4.11 Inverter delay variation with different (∆µ/µ)h and (∆µ/µ)e.

4.12 ∆µ/µ contour map for 22 x 21 TSV array.

4.13 Cell perturbation to take advantage of mobility variation.

5.1 Clock path p with clock buffers

5.2 Thermal stress around TSV.

5.3 Buffer delay variation due to TSV stress.

5.4 Overall proposed 3D CTS flow.

5.5 An illustration of the buffer tier assignment procedure in a bottom-up manner. Note that MP1’s location is changed at step (c) to minimize #TSVs.

5.6 Necessity of #TSV control between two consecutive clock buffers.

5.7 TSV induced stress and clock buffer variation modeling.

5.8 Wire, TSV, and buffer modeling for delay calculation.

5.9 Illustrations for merging point determination.

5.10 Runtime trend.

5.11 Trade-off between covariance and # TSVs.


Chapter 1

Introduction

1.1 Challenges for Nanometer VLSI Design and Manufacturing

As nanometer very large scale integration (VLSI) becomes the mainstream of the semiconductor industry, more transistors need to be integrated into one chip. There are mainly two approaches to integrating more transistors within a limited chip size for multi-core design. The first approach is geometric shrinking of the feature size (2D integration), and the second is vertical die stacking with TSVs (3D integration).

Figure 1.1: Lithography trend to follow Moore’s law [Courtesy Intel].

Geometric shrinking has been preferred to increase chip performance as well as transistor density. However, the mainstream lithography technology using 193nm steppers has been facing severe limitations [8, 29, 50] for the sub-20nm technology node, as shown in Fig. 1.1. The smallest printable feature size is defined as K1·λ/NA, where K1 is the K-factor of a given process, λ is the wavelength of the light source, and NA is the numerical aperture determined by the lens. With immersion lithography at NA = 1.35, the K-factor would have to be about 0.22 to print a 64nm pitch (half pitch: 32nm) pattern. However, the theoretical limit of the K-factor is 0.25, even with intensive OPC [21, 82, 87]. One possible way to overcome this limitation is to use a higher-NA lithography system. Chip makers have been using immersion lithography for sub-40nm patterning, which raises NA from 0.93 (dry) to 1.35 (wet). However, it is hard to find a new immersion liquid that would increase NA beyond 1.35 in the near future [1, 27, 39]. Electron beam lithography and nano-imprint have serious limitations due to throughput and mask defects [51, 86, 98]. As an ideal solution, EUV (Extreme Ultra-Violet) lithography has been proposed. Since the wavelength of the EUV light source is 13.5nm, sub-20nm patterning is possible with EUV. However, EUV lithography equipment is not available for 20nm production due to technical barriers such as insufficient source power and the lack of suitable resists and defect-free masks.
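The resolution argument above is simple arithmetic on the Rayleigh criterion, so it can be checked directly. The sketch below assumes only the values quoted in the text (λ = 193nm, NA = 1.35, single-exposure limit K1 = 0.25) and computes the K-factor a given half pitch would require, plus the smallest half pitch reachable at the limit.

```python
# Rayleigh criterion: smallest printable half pitch = k1 * wavelength / NA
WAVELENGTH_NM = 193.0   # ArF excimer laser
NA_IMMERSION = 1.35     # water immersion
K1_LIMIT = 0.25         # theoretical single-exposure limit

def required_k1(half_pitch_nm, wavelength_nm=WAVELENGTH_NM, na=NA_IMMERSION):
    """K-factor needed to print a given half pitch."""
    return half_pitch_nm * na / wavelength_nm

def min_half_pitch_nm(k1=K1_LIMIT, wavelength_nm=WAVELENGTH_NM, na=NA_IMMERSION):
    """Smallest half pitch printable at a given K-factor."""
    return k1 * wavelength_nm / na

for hp in (32, 20, 14):
    k1 = required_k1(hp)
    verdict = "within" if k1 >= K1_LIMIT else "below"
    print(f"half pitch {hp} nm needs k1 = {k1:.3f} ({verdict} the 0.25 limit)")
print(f"smallest half pitch at k1 = 0.25: {min_half_pitch_nm():.1f} nm")
```

A single 193nm immersion exposure therefore bottoms out at roughly a 36nm half pitch. With pitch splitting, each mask carries every other line, so each exposure only has to resolve twice the final pitch.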

An alternative choice for 20nm patterning is Double Patterning Lithography (DPL) [4, 48, 67, 68], which prints two lines in one pitch as illustrated in Fig. 1.2. DPL [55] has been proposed as a strong candidate for the 20nm/14nm technology nodes. Double patterning can be implemented in three ways: Litho-Etch-Litho-Etch (LELE), Litho-Freeze-Litho-Etch (LFLE), and Self-Aligned Double Patterning (SADP). LELE uses two lithography exposures and two etches on a hard mask to create smaller chip features. LFLE works by freezing the developed resist pattern of the first exposure, then adding a second resist layer directly on top of it for the second exposure; after development, both resist patterns are etched at once. SADP works by depositing a spacer layer over the chip, covering all hard-mask features. The spacer layer is then selectively etched away, leaving two sidewalls along each ridge, after which the ridge is removed [5, 6, 76]. LELE and LFLE require more accurate overlay control to align the two exposures [18, 24, 37, 48, 57]. LFLE uses fewer processing steps [2, 78], but it requires the frozen resist to be insensitive to the second lithography step. SADP has less flexibility for 2-D patterning and requires more processing steps, such as trim masks, but its overlay requirement is less stringent than those of the other double patterning methods. Therefore, SADP is widely used for NAND flash, which has 1-D structures and requires the most advanced technology.

(a) Layout before decomposition (b) Layout after decomposition

Figure 1.2: Mask split in DPL

Since metal layers inherently have 2-D structures, we need to resolve two problems of the LELE/LFLE approach: overlay, illustrated in Fig. 1.3, and efficient layout decomposition. Advanced lithography equipment has about 5nm (3σ) of overlay, which is relatively large compared with a 14nm half pitch [48, 57]. Since overlay may change circuit behavior such as timing, noise, and power by distorting the layout pattern, as shown in Fig. 1.3(b), we need a methodology to estimate the variation due to overlay during chip design. Overlay-induced metal layer variation can seriously affect circuit performance and yield because overlay changes the coupling capacitance between metal lines. In this dissertation, we focus on a systematic method to estimate the performance variation with several overlay variables.

(a) Designed pattern (b) Printed pattern with overlay

Figure 1.3: Unwanted pattern distortion in DPL
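To see why a few nanometers of overlay matter at a 14nm half pitch, the sketch below uses a first-order parallel-plate approximation (C ∝ 1/S), which is an assumption for illustration and not the extraction model developed in Chapter 2. It considers a wire printed by one mask between two neighbors printed by the other mask, and applies a translation overlay d to the second mask so the gaps become S - d and S + d.

```python
# First-order parallel-plate model: coupling capacitance C ~ eps * t * L / S.
# Assumed geometry: a wire on mask B lies between two mask-A neighbors with
# nominal spacing S on each side. A translation overlay d shifts mask B, so
# the gaps become S - d and S + d.

def coupling_ratio(spacing_nm, overlay_nm):
    """Total coupling capacitance of the middle wire relative to zero overlay."""
    if abs(overlay_nm) >= spacing_nm:
        raise ValueError("overlay closes the gap between wires")
    nominal = 2.0 / spacing_nm
    shifted = 1.0 / (spacing_nm - overlay_nm) + 1.0 / (spacing_nm + overlay_nm)
    return shifted / nominal  # equals S**2 / (S**2 - d**2)

for d in (0.0, 2.0, 5.0):
    print(f"S = 14 nm, overlay = {d:.0f} nm -> C/C0 = {coupling_ratio(14.0, d):.3f}")
```

The total coupling rises only modestly (about 15% for d = 5nm), but the coupling to the closer neighbor alone grows by S/(S - d), roughly 1.56x here, so the timing impact depends strongly on which neighbor switches.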

All types of DPL require layout decomposition before manufacturing [16, 83, 85, 100]. When two features are located within the minimum spacing design rule, they must be assigned to two different masks for LELE and LFLE. SADP also requires layout decomposition to assign each feature to a specific sidewall. During decomposition, coloring conflicts can be resolved by inserting stitches. However, minimizing the number of stitches is preferred because each stitch requires an overlap margin, resulting in an unwanted increase in chip area. In addition, stitches may significantly degrade printability due to overlay error and the line-end effect [17], as shown in Fig. 1.4(a). Not all conflicts can be resolved by inserting stitches; an irresolvable conflict, shown in Fig. 1.4(b), is called a native conflict. The only way to remove native conflicts is to modify the layout [97], after which the layout must be decomposed again. Therefore, a fast decomposition algorithm is required to shorten each iteration of layout modification and decomposition. Another requirement is density balancing between the decomposed patterns. In addition, we observe that the overlay effect on circuit variation can be removed with a well-decomposed layout. The key challenge of DPL is to accomplish high-quality decomposition for large-scale layouts within reasonable runtime with the following objectives: a) the number of stitches is minimized, b) the balance between the two decomposed layers is maximized for further enhanced patterning, c) predefined coloring relations are kept for more robust contact decomposition, and d) the impact of overlay on coupling capacitance is reduced for less timing variation [93]. We present a scalable and flexible decomposition framework that meets these multiple objectives.

(a) Potential yield loss due to stitches (b) Irresolvable conflict

Figure 1.4: An illustration showing difficulties in DPL
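Stitch-free DPL decomposition is a 2-coloring of the conflict graph (one node per feature, one edge per pair closer than the same-mask spacing), and a conflict appears exactly when that graph contains an odd cycle. The sketch below is a generic breadth-first 2-coloring, not the decomposition framework of this dissertation; the function name `two_color` and the edge-list input format are illustrative assumptions.

```python
from collections import deque

def two_color(num_features, conflict_edges):
    """BFS 2-coloring of a DPL conflict graph.

    Returns (colors, conflicts): colors[i] is mask 0 or 1, and conflicts
    lists edges whose endpoints received the same mask. Any such edge means
    the graph has an odd cycle, so no stitch-free decomposition exists.
    """
    adj = [[] for _ in range(num_features)]
    for u, v in conflict_edges:
        adj[u].append(v)
        adj[v].append(u)
    colors = [-1] * num_features
    for start in range(num_features):
        if colors[start] != -1:
            continue  # already colored through another component member
        colors[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if colors[v] == -1:
                    colors[v] = 1 - colors[u]  # neighbor goes on the other mask
                    queue.append(v)
    conflicts = [(u, v) for u, v in conflict_edges if colors[u] == colors[v]]
    return colors, conflicts

print(two_color(3, [(0, 1), (1, 2)]))          # ([0, 1, 0], [])
print(two_color(3, [(0, 1), (1, 2), (2, 0)]))  # ([0, 1, 1], [(1, 2)])
```

BFS only detects non-bipartiteness; it does not minimize the number of conflicting edges, and deciding whether a conflict is native additionally requires knowing where stitches may legally be placed.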

Many chip makers have released 3D die integration with wire bonding. The advantages of 3D integration include heterogeneous integration and increased chip density from stacking dies vertically. 3D integration with TSVs has become the main focus for future multi-core integration because of several additional benefits over the wire bonding scheme, including wire length reduction and increased bandwidth between dies.

Figure 1.5: TSV can cause tensile stress in silicon near TSV.

However, 3D-ICs with TSVs bring new manufacturing challenges. Tungsten (W), poly-silicon, and copper (Cu) have all been considered as TSV fill materials. Since copper has low resistivity, it is a widely used TSV fill material. However, the CTE of copper differs from that of silicon, which makes it a source of silicon strain: the CTE mismatch between copper and silicon inevitably causes stress in the silicon near TSVs. Because the copper electroplating and annealing temperatures are higher than the operating temperature, tensile stress appears in the silicon [54, 74] after cooling down to room temperature, as shown in Fig. 1.5.
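The abstract refers to an analytical radial stress model. As an illustration only, the sketch below uses a commonly cited simplified 2D Lamé-type form, σr = -σθ = -(E·Δα·ΔT/2)·(R/r)², valid outside a TSV of radius R. The prefactor is an approximation, and every material value below (E, the CTEs, ΔT) is an assumed, illustrative number rather than one taken from this dissertation.

```python
# Simplified radial stress around a Cu TSV (2D Lame-type approximation):
#   sigma_r(r) = -sigma_theta(r) = -(E * d_alpha * d_T / 2) * (R / r)**2, r >= R
# All parameter values are illustrative assumptions.
E_PA = 110e9          # assumed Young's modulus (Pa)
ALPHA_CU = 17e-6      # CTE of copper (1/K)
ALPHA_SI = 2.3e-6     # CTE of silicon (1/K)
DELTA_T = -250.0      # operating minus annealing temperature (K): cooling

def radial_stress_pa(r_um, tsv_radius_um):
    """Radial stress in silicon at distance r from the TSV center.

    Positive means tensile; the tangential stress is equal and opposite.
    """
    if r_um < tsv_radius_um:
        raise ValueError("r must be outside the TSV")
    d_alpha = ALPHA_CU - ALPHA_SI
    prefactor = -E_PA * d_alpha * DELTA_T / 2.0
    return prefactor * (tsv_radius_um / r_um) ** 2

for r in (2.5, 5.0, 10.0):
    print(f"r = {r:4.1f} um: sigma_r = {radial_stress_pa(r, 2.5) / 1e6:6.1f} MPa")
```

Because α_Cu > α_Si and ΔT < 0 after cooling, the radial stress in silicon comes out tensile, consistent with the text, and it decays as (R/r)², so doubling the distance from the TSV cuts it by a factor of four.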

The tensile stress in silicon causes reliability problems such as cracking [40, 61, 65]. In addition, the stress can change carrier mobility. Therefore, TSV stress induced by CTE mismatch may cause timing violations if cells on a critical path are placed near TSVs. Tensile stress enhances electron mobility. However, hole mobility is either enhanced or degraded depending on the orientation of the transistor channel relative to the TSV: longitudinal tensile stress reduces hole mobility, while transverse tensile stress increases it [80]. When a 100MPa TSV-induced tensile stress acts in the longitudinal direction, hole mobility degradation can be up to 7.2%, which slows PMOS transitions. If the PMOS is on a critical path, it can cause an unexpected setup-time violation that the current timing analysis flow does not detect. We propose a design flow to analyze timing variation caused by TSV-induced stress, and show its implications for layout optimization during 3D-IC design.
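The direction dependence described above follows from the piezoresistive effect. The sketch below is a first-order estimate, Δµ/µ ≈ -(πL·σL + πT·σT), for a 〈110〉 channel on a (001) wafer, using Smith's bulk piezoresistive coefficients; treating these bulk coefficients as sufficient is an assumption, not necessarily the mobility model used in Chapter 4.

```python
# First-order piezoresistive estimate of mobility change for a <110> channel
# on a (001) wafer, using Smith's bulk coefficients (units: 1/Pa).
# d_mu/mu ~= -(pi_L * sigma_L + pi_T * sigma_T); tensile stress is positive.

PI_11_12_44 = {
    "hole":     (6.6e-11, -1.1e-11, 138.1e-11),
    "electron": (-102.2e-11, 53.4e-11, -13.6e-11),
}

def mobility_change(carrier, sigma_long_pa=0.0, sigma_trans_pa=0.0):
    """Relative mobility change for a <110> channel (first-order model)."""
    p11, p12, p44 = PI_11_12_44[carrier]
    pi_long = (p11 + p12 + p44) / 2.0    # stress along the channel
    pi_trans = (p11 + p12 - p44) / 2.0   # stress across the channel
    return -(pi_long * sigma_long_pa + pi_trans * sigma_trans_pa)

sigma = 100e6  # 100 MPa tensile
for carrier in ("hole", "electron"):
    for label, kwargs in (("longitudinal", {"sigma_long_pa": sigma}),
                          ("transverse", {"sigma_trans_pa": sigma})):
        change = mobility_change(carrier, **kwargs)
        print(f"{carrier:8s} {label:12s}: {100 * change:+.2f}%")
```

This first-order estimate gives about -7.2% for holes under 100MPa of longitudinal tension, in line with the figure quoted above, and positive changes for holes under transverse tension and for electrons in both directions.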

Three-dimensional Clock Tree Synthesis (3D CTS) faces new challenges such as timing corner mismatch between tiers and device variation due to Through Silicon Via (TSV) induced stress [95]. Timing corner mismatch between tiers arises because each tier is manufactured in an independent process. Therefore, inter-die variation should be considered when analyzing and optimizing paths that span several tiers. TSV-induced stress is another challenge in 3D CTS: mobility variation of a clock buffer due to TSV stress can cause unexpected skew, which degrades overall chip performance. In this dissertation, we propose a clock tree design methodology with the following objectives: (a) to minimize clock period variation by assigning the optimal z-location of clock buffers with an Integer Linear Program (ILP) formulation, and (b) to prevent unwanted skew induced by the stress.
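Why buffer tier assignment affects clock period variation can be seen from a small variance calculation. The sketch below assumes, purely for illustration, that every buffer on tier t has delay d0·(1 + δt), where δt is one inter-die shift per tier, independent across tiers. Buffers shared by the launch and capture paths cancel and are left out. Under these assumptions the skew variance depends only on the per-tier difference in buffer counts. This is not the ILP formulation itself, and all numbers are hypothetical.

```python
import math

# Illustrative model (assumption): each buffer on tier t has delay
# d0 * (1 + delta_t), with delta_t a per-tier inter-die shift of std. dev.
# sigma_t, independent across tiers. For the non-shared parts of a launch
# and a capture clock path:
#   skew = d0 * sum_t (a_t - b_t) * delta_t
#   sigma_skew = d0 * sqrt(sum_t (a_t - b_t)**2 * sigma_t**2)

def skew_sigma(launch_per_tier, capture_per_tier, sigma_per_tier, d0_ps):
    """Std. dev. of inter-die skew for given buffer counts per tier."""
    var = sum((a - b) ** 2 * s ** 2
              for a, b, s in zip(launch_per_tier, capture_per_tier, sigma_per_tier))
    return d0_ps * math.sqrt(var)

SIGMA = [0.05, 0.05]   # hypothetical 5% inter-die sigma on each of two tiers
D0 = 20.0              # hypothetical nominal buffer delay (ps)

# Both paths have four buffers, but distributed over the tiers differently.
print(skew_sigma([4, 0], [0, 4], SIGMA, D0))   # mismatched tiers: about 5.66 ps
print(skew_sigma([2, 2], [2, 2], SIGMA, D0))   # matched tiers: 0.0
```

Matching the tier distribution of buffers on the two paths makes the inter-die terms cancel, even though each path individually still shifts with its tier's process corner.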

1.2 Overview and Contribution of this Dissertation

The algorithms and models for addressing the above-mentioned challenges are described in the next four chapters. The overall flow of the dissertation is as follows:

In Chapter 2, we provide a set of analytical formulae to model coupling capacitance variation induced by overlay, and show that the timing variation due to overlay can be up to 9.1%. We also show that the choice of layout decomposition [16, 83] affects timing variation. The model in Chapter 2 serves as a cost function for overlay-tolerant layout decomposition for DPL in Chapter 3. Given timing constraints, it also provides a way to determine the maximum allowable overlay for process development.

Chapter 3 presents layout decomposition for robust DPL. We propose a two-step approach, initial coloring followed by optimization, based on showing that the number of stitches required to decompose a layout equals the cut size of a graph partitioning, so that stitch minimization can be achieved by a min-cut partitioning algorithm. Our graph-based decomposition is faster than ILP-based decomposition, with less than 1% increase in stitches. In addition, we show that layout decomposition can enhance patterning quality by enforcing balanced density and reducing overlay effects on contact and metal layers. Our work is the first to address lithography-friendly objectives such as density balancing and variability reduction during layout decomposition.

In Chapter 4, we propose compact stress and mobility modeling to capture the systematic timing variation caused by TSV stress in 3D-ICs. We show that TSV stress can change hole mobility quite significantly, e.g., from −22% to +10%, which can translate into more than 20% variation in single-cell delay. Since this can degrade overall chip performance, TSV stress must be considered during timing analysis and optimization. Finally, we propose several layout optimization techniques, including small perturbation, zigzag TSV placement, and optimal cell rotation, and show that they can improve single-cell delay by 14% and critical path delay by up to 6.5%.

Chapter 5 proposes a 3D CTS methodology that considers inter-die variation and TSV-induced stress, showing that clock buffer tier assignment can reduce clock period variation. With our clock buffer assignment, the standard deviation of the clock period is reduced by up to 34.2%. Thus, we can increase the chip operating frequency for the same timing yield, or increase the timing yield for the same operating frequency. This is also the first work to show that TSV-induced stress can cause unwanted clock skew and to propose a buffer delay variation model that accounts for the stress during CTS. With stress-aware 3D CTS, we show that most of the skew due to TSV-induced stress can be removed.

Finally, in Chapter 6, we draw conclusions based on the results of the

previous chapters as well as present a few promising research directions to

enable robust and reliable double patterning lithography and 3D integration

with TSV.


Chapter 2

Overlay aware Timing Variation Modeling for DPL

2.1 Introduction

As the minimum pitch decreases, the lithography process faces severe limitations. Conventional lithography using 193 nm wavelength light cannot print sub-20 nm patterns. Although EUV is a promising light source for future lithography equipment, it is not clear when EUV equipment will be commercially available because of the difficulty of building a sufficiently strong light source. The alternative for 20 nm technology is DPL. Since DPL patterns the layer twice, two lines can be printed within one pitch, which reduces the effective k-factor to 0.125. Even though DPT is inefficient because of its low throughput, the industry is trying to adopt DPL as a solution for 20 nm patterning, because DPL is probably the only technology that will be available before EUV equipment ships.
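For reference, the k-factor statement follows from the Rayleigh criterion for the printable half pitch, hp = k_1·λ/NA, where k_1 = 0.25 is the theoretical single-exposure limit for dense lines and splitting one pitch across two masks halves the effective k_1. A worked example, assuming an immersion numerical aperture NA = 1.35 (a value not given in the text):

$$hp = k_1 \frac{\lambda}{NA}: \qquad 0.25 \times \frac{193\ \text{nm}}{1.35} \approx 35.7\ \text{nm}, \qquad 0.125 \times \frac{193\ \text{nm}}{1.35} \approx 17.9\ \text{nm}$$

Under this assumption, single exposure stops well above a 20 nm half pitch, while the halved k_1 of DPL brings the limit below it.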

The main difficulties of DPL are stitch generation and overlay control, i.e., limiting the mask misalignment between the first and second lithography steps. Advanced lithography equipment has about 5 nm of overlay at the 3-σ level, which is relatively large compared with a 20 nm half pitch [48, 57]. Since overlay may change circuit behavior such as timing, noise, and power by distorting layout patterns [9, 10, 20], we need a methodology to estimate overlay-induced variation during design. Overlay-induced variation in metal layers can seriously affect circuit performance and yield because overlay changes the coupling capacitance between metal lines.

In this chapter, we present a systematic method to estimate performance variation in terms of several overlay variables. To the best of our knowledge, this is the first work to address the relation between overlay and timing. From overlay measurements, it is possible to distinguish the sources of overlay: translation, rotation, and magnification. Since overlay changes metal spacing, we model coupling capacitance as a function of the overlay variables. Metal spacing under overlay is not constant, because the overlay effect differs from position to position; thus, parallel interconnects become non-parallel after overlay modeling. To be able to use a traditional 2.5-D extraction tool [63], the non-parallel conductors are converted to parallel ones with an equivalent spacing in terms of timing behavior. After modeling capacitance with the overlay variables, we show how to analyze timing by sweeping the overlay variables and how to determine the variables that produce the worst timing. With these variables, we can construct a new equivalent layout that reflects overlay for effective timing analysis.

The rest of the chapter is organized as follows. Preliminaries and related work are presented in section 2.2. Section 2.3 proposes how to model timing variation, and section 2.4 presents how to determine the overlay variables that produce the worst timing. Applications and experimental results are shown in section 2.5, and section 2.6 concludes with discussions.

2.2 Preliminaries and Related Works

2.2.1 DPT Process

Figure 2.1: DPT process and patterning result.

Various double patterning methods are being developed. A common DPT approach is the Lithography-Etching-Lithography-Etching process shown in Figure 2.1 [57]. The first pattern is formed on hard mask 1 in (c), and the second photoresist is etched in (e). The final pattern is printed on hard mask 2 in (g). One of the most important challenges of DPT is the overlay between the two independent lithography steps. Overlay causes variation of the coupling capacitances between metal lines, as shown in (i), and this capacitance variation results in timing and noise variation.

2.2.2 Capacitance Variation induced by Overlay

(a) One sided cap. (b) Two sided cap.

(c) Decoupled cap.

Figure 2.2: Effective capacitance variation.

Since coupling capacitance is inversely proportional to the spacing between two conductors, a spacing variation due to overlay in DPT results in a variation of the coupling capacitance [94]. For instance, if the metal spacing without overlay is 50 nm and overlay reduces it by 10 nm, Cc,overlay equals 1.25Cc in Figure 2.2(a). This one-sided capacitance variation is easy to understand. The overlay effect in Figure 2.2(b) is less obvious, because the total capacitance changes only from 2Cc to 2.08Cc (Cc,overlay1 = 1.25Cc, Cc,overlay2 = 0.83Cc) for a spacing of 50 nm and a Y-translation overlay of 10 nm. However, if the interconnects have different slew rates, the overlay effect on timing becomes clear. A coupling capacitance can be replaced by an equivalent grounded capacitance that produces equal delay [13, 41], scaled by the Miller factor MF = 1 + Tv/Ta, where Tv is the transition time of the victim and Ta is the transition time of the aggressor. If MF1 is three and MF2 is one in Figure 2.2(c), the total decoupled capacitance is 4.58Cc, which is 14.5% larger than the 4Cc obtained when overlay is ignored. This shows that overlay in DPT has a significant effect on the effective capacitance in the multiple-aggressor case as well as in the single-aggressor case.
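The arithmetic in this example can be reproduced in a few lines. This is a minimal Python sketch that assumes, as the text does, that coupling capacitance scales as 1/spacing; the helper names are hypothetical. With unrounded values the decoupled total is about 4.583Cc, an increase of about 14.6%; the 14.5% quoted above follows from rounding Cc,overlay2 to 0.83Cc.

```python
def coupled_cap(s_nominal: float, ds: float) -> float:
    """Coupling capacitance in units of Cc when the spacing changes by ds (C ~ 1/spacing)."""
    return s_nominal / (s_nominal + ds)


def miller_factor(t_victim: float, t_aggressor: float) -> float:
    """MF = 1 + Tv/Ta, the Miller factor as defined in the text."""
    return 1.0 + t_victim / t_aggressor


s = 50.0  # nominal spacing in nm

# Figure 2.2(a): one neighbor moves 10 nm closer.
one_sided = coupled_cap(s, -10.0)  # 1.25 Cc

# Figure 2.2(b): a 10 nm Y-translation narrows one gap and widens the other.
c1 = coupled_cap(s, -10.0)  # 1.25 Cc
c2 = coupled_cap(s, +10.0)  # about 0.833 Cc
two_sided_total = c1 + c2   # about 2.083 Cc instead of 2 Cc

# Figure 2.2(c): decouple to ground with Miller factors MF1 = 3 and MF2 = 1.
mf1 = miller_factor(t_victim=2.0, t_aggressor=1.0)  # Tv = 2 Ta gives MF = 3
mf2 = 1.0                                           # MF approaches 1 as Ta grows large
decoupled = mf1 * c1 + mf2 * c2                     # about 4.583 Cc
increase = decoupled / (mf1 + mf2) - 1.0            # relative to 4 Cc without overlay

if __name__ == "__main__":
    print(f"one-sided: {one_sided:.3f} Cc")
    print(f"two-sided total: {two_sided_total:.3f} Cc")
    print(f"decoupled total: {decoupled:.3f} Cc ({increase:.1%} above 4 Cc)")
```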

2.2.3 Sources of Overlay

In this section, overlay variables are assumed to be measured from the misalignment between the first and the second patterning. For simplicity, we assume that overlay is a systematic variation and that random overlay is negligible. There are three types of overlay [50]. The first is translation overlay, which is caused by a mask shift in the horizontal or vertical direction. Translation overlay has two variables. The first is the overlay amplitude, which is in the range of 3 nm to 5 nm for advanced lithography systems; the 3-σ overlay amplitude can be extracted from wafer measurements. The second is the overlay angle, which is purely random because a translation shift can be in any direction. The overlay amplitude is denoted by α and the overlay angle by θ in Figure 2.3, with counterclockwise θ defined as positive.

(a) Translation overlay (b) Variables for translation overlay

Figure 2.3: Translation overlay variable.

The second source of overlay is mask rotation between the first and the second patterning. Even though rotation overlay is less than 0.1 µrad in advanced equipment, it can cause several nm of layout distortion at the edge of a die. The rotation overlay variable is φ, with clockwise rotation defined as positive, as shown in Figure 2.4. The distortion due to rotation overlay depends on both the pattern location and the amount of φ.

The last component of overlay is magnification overlay. If there is a temperature difference between the first and the second lithography steps, magnification overlay occurs because the mask and the wafer have different thermal expansion rates. In a projection lithography system, fluctuation of the magnification factor is another source of magnification overlay. The layout variation due to magnification overlay grows as the pattern gets farther from the center point. The magnification overlay variable is defined as M in Figure 2.5; a positive M means that the second patterning is enlarged relative to the first one. M is less than 0.1 ppm in advanced lithography equipment. All overlay variables are extracted at the 3-σ level from overlay measurements [50].

(a) Rotation overlay (b) Variable for rotation overlay

Figure 2.4: Rotation overlay variable.

(a) Magnification overlay (b) Variable for magnification overlay

Figure 2.5: Magnification overlay variable.

2.3 Layout Distortion Estimation

In this section, we propose a numerical formula to predict the coupling capacitance variation induced by overlay.

2.3.1 Problem Definition

(a) Coverlay definition (b) ∆S definition

Figure 2.6: Vector expression of overlay.

In Figure 2.6(a), C_C is the coupling capacitance obtained by a standard RC extraction tool without overlay consideration, and C_overlay is the coupling capacitance with overlay modeling. (X, Y) is a node on the second-patterned interconnect. The original spacing S is increased by ∆S, which changes C_C to C_overlay. Each overlay source shifts (X, Y): translation overlay moves it to (X_T, Y_T), rotation overlay to (X_R, Y_R), and magnification overlay to (X_M, Y_M). The overall spacing change is the sum of the changes due to each movement:

$$\Delta S = \Delta S_T + \Delta S_R + \Delta S_M \tag{2.1}$$

Since the two metal lines are horizontal and parallel, ∆S equals ∆Y. In Figure 2.6(b), the metal spacing is increased by ∆Y_T + ∆Y_R + ∆Y_M. Since coupling capacitance is inversely proportional to metal spacing, C_overlay is the following function of ∆S:

$$C_{\text{overlay}} = \frac{S}{S + \Delta S} \cdot C_C \tag{2.2}$$

To incorporate overlay measurements, we need a formula for ∆S in terms of the overlay variables.


2.3.2 Translation Overlay Consideration

To predict the metal spacing under translation overlay, assume that (X, Y) is moved to (X_T, Y_T) by α and θ:

$$(X, Y) \overset{\alpha,\,\theta}{\longleftrightarrow} (X_T, Y_T) \tag{2.3}$$

Figure 2.7: ∆S dependency on location.

From Figure 2.7, the X-translation ∆X_T is α·cos θ, and the Y-translation ∆Y_T is α·sin θ. Although ∆X_T and ∆Y_T depend only on α and θ, ∆S_T is not determined by α and θ alone, because it also depends on the geometric relation between the two patterns.

Figure 2.8: ∆S_T for various geometric relations.

Figure 2.8 shows the possible geometric relations, represented by γ, defined as the angle between the X-axis and the normal vector pointing from the first pattern to the second pattern. When the two metal lines are vertical and parallel, the spacing is changed by X-translation overlay, so ∆S_T equals ∆X_T for γ = 0. Similarly, ∆S_T equals ∆Y_T for γ = π/2. Calculating ∆S_T for every geometric relation in Figure 2.8 yields the general expression:

$$\Delta S_T = \alpha \cos(\theta - \gamma) \tag{2.4}$$

Equation 2.4 allows us to model the capacitance variation due to translation overlay. Since γ represents the geometric relation, it is fixed for a given decomposed layout. Thus, timing variation can be analyzed with only the variables θ and α.
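As a consistency check of Equation 2.4 (derived here rather than stated in the original), substituting the four axis-aligned relations recovers the translation components of Figure 2.7, with the sign flipping when the second pattern lies on the opposite side:

$$\begin{aligned}
\gamma = 0: &\quad \Delta S_T = \alpha\cos\theta = \Delta X_T \\
\gamma = \tfrac{\pi}{2}: &\quad \Delta S_T = \alpha\sin\theta = \Delta Y_T \\
\gamma = \pi: &\quad \Delta S_T = -\alpha\cos\theta = -\Delta X_T \\
\gamma = \tfrac{3\pi}{2}: &\quad \Delta S_T = -\alpha\sin\theta = -\Delta Y_T
\end{aligned}$$

For a fixed γ, the spacing shrinks most, and the coupling capacitance grows most, at θ = γ + π, where ΔS_T = −α.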


2.3.3 Rotation Overlay Consideration

To predict the metal spacing under rotation overlay, assume that (X, Y) is moved to (X_R, Y_R) by a rotation φ:

$$(X, Y) \overset{\phi}{\longleftrightarrow} (X_R, Y_R) \tag{2.5}$$

Since the effect of rotation overlay differs with pattern position, pattern locations must be considered when estimating the distortion induced by rotation overlay.

Figure 2.9: ∆XR and ∆YR for rotation overlay.

In Figure 2.9, the location of (X, Y) is expressed by its distance D from the center and its angle β from the X-axis; β is the position angle, positive in the counterclockwise direction. D′ is the distance from the center to (X_R, Y_R).

$$D = \sqrt{X^2 + Y^2}, \qquad D' = D / \cos\phi \tag{2.6}$$


Under a rotation φ, ∆X_R is |X_R − X| and ∆Y_R is |Y_R − Y|. From Figure 2.9, they are obtained as follows:

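$$\begin{aligned}
\Delta X_R &= D \left[ \frac{\cos(\beta - \phi)}{\cos\phi} - \cos\beta \right] \\
\Delta Y_R &= D \left[ \frac{\cos\!\left(\beta + \frac{\pi}{2} - \phi\right)}{\cos\phi} - \cos\!\left(\beta + \frac{\pi}{2}\right) \right]
\end{aligned} \tag{2.7}$$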

∆X_R and ∆Y_R differ by a π/2 phase shift, which is useful for deriving a general formula:

$$\Delta S_R = D \left[ \frac{\cos(\beta' - \phi)}{\cos\phi} - \cos\beta' \right] \tag{2.8}$$

In Equation 2.8, β′ is a new variable that accounts for both the geometric relation and the position angle. When γ = 0, the metal spacing increases by ∆X_R, so β′ equals β. If γ = π, ∆S_R is −∆X_R because the spacing decreases for this relation; since a π phase shift flips the sign of the cosine, β′ is β − π. Calculating β′ similarly for every relation in Figure 2.10 gives β′ = β − γ, and the general formula for ∆S_R is:

$$\Delta S_R = D \left[ \frac{\cos(\beta - \gamma - \phi)}{\cos\phi} - \cos(\beta - \gamma) \right] \tag{2.9}$$

Equation 2.9 models the capacitance variation due to rotation overlay. D and β are determined by the initial position (X, Y). Since γ is fixed for a given decomposed layout and φ is the only variable, timing variation can be analyzed by sweeping the rotation variable from −φ to +φ.
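Equation 2.9 can also be simplified, which makes its behavior under the ±φ sweep explicit. Expanding cos(β − γ − φ) with the angle-difference identity (a derivation added here, not in the original):

$$\Delta S_R = D\left[\frac{\cos(\beta-\gamma)\cos\phi + \sin(\beta-\gamma)\sin\phi}{\cos\phi} - \cos(\beta-\gamma)\right] = D\,\sin(\beta-\gamma)\tan\phi \approx D\,\phi\,\sin(\beta-\gamma)$$

Since tan φ is monotonic, ΔS_R is monotonic in φ for a fixed position and geometric relation, so its extreme values lie at the ends of the sweep range. For illustration (assumed values), a point 7 mm from the center with φ = 0.05 µrad and sin(β − γ) = 1 gives ΔS_R ≈ 7×10^6 nm × 5×10^−8 = 0.35 nm.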


Figure 2.10: β′ for various geometric relations.

2.3.4 Magnification Overlay Consideration

D″ is defined as the distance from the center to the point shifted by magnification overlay. Since M is defined as (D″ − D)/D, D″ − D equals M·D. Thus, ∆X_M and ∆Y_M are:

$$\Delta X_M = M \cdot D \cos\beta, \qquad \Delta Y_M = M \cdot D \sin\beta \tag{2.10}$$

The layout distortion due to magnification overlay generalizes to:

$$\Delta S_M = M \cdot D \cos(\beta - \gamma) \tag{2.11}$$

Since M is the only variable in Equation 2.11 for a given decomposed layout, timing variation can be analyzed by sweeping the magnification variable from −M to +M.


Figure 2.11: ∆XM and ∆YM for magnification overlay.

2.3.5 Overall Capacitance Variation with Overlay

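$$\begin{aligned}
\Delta S &= \Delta S_T + \Delta S_R + \Delta S_M \\
&= \alpha \cos(\theta - \gamma) + D \left[ \frac{\cos(\beta - \gamma - \phi)}{\cos\phi} - \cos(\beta - \gamma) \right] + M \cdot D \cos(\beta - \gamma)
\end{aligned} \tag{2.12}$$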

Equation 2.12 is our numerical expression for ∆S, parameterized by the overlay variables, location, and geometric relation. Although rotation and magnification overlay depend on the initial position, linearly summing the individual spacing variations gives an accurate result, because each shift is only a few nm and therefore barely changes the position seen by the other overlay sources.
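Equation 2.12 can be evaluated directly. The following is a minimal Python sketch, not the author's implementation: the function names are hypothetical, coordinates are in nm relative to the die center, angles are in radians, and the example mirrors the setting later used in Section 2.5 (α = 3 nm, φ = ±0.05 µrad, M = ±0.05 ppm, a wire shifted to (5000 µm, 5000 µm)) with an assumed geometric relation γ = 3π/2. For that point the translation term contributes −3 nm and the rotation and magnification terms about −0.25 nm each.

```python
import math


def delta_s(alpha, theta, phi, m, x, y, gamma):
    """Spacing change Delta S (nm) at node (x, y) following Eq. 2.12.

    x, y are nm from the die center; angles are radians; m is dimensionless
    (0.05 ppm = 5e-8).
    """
    d = math.hypot(x, y)      # Eq. 2.6: distance from the die center
    beta = math.atan2(y, x)   # position angle, counterclockwise from the X-axis
    ds_t = alpha * math.cos(theta - gamma)                       # Eq. 2.4
    ds_r = d * (math.cos(beta - gamma - phi) / math.cos(phi)
                - math.cos(beta - gamma))                        # Eq. 2.9
    ds_m = m * d * math.cos(beta - gamma)                        # Eq. 2.11
    return ds_t + ds_r + ds_m


def c_overlay(c_c, s, ds):
    """Eq. 2.2: coupling capacitance after overlay, assuming C is proportional to 1/spacing."""
    return s / (s + ds) * c_c


if __name__ == "__main__":
    gamma = 3 * math.pi / 2  # assumed: second pattern lies below the first
    ds = delta_s(alpha=3.0, theta=math.pi / 2, phi=-5e-8, m=5e-8,
                 x=5e6, y=5e6, gamma=gamma)
    print(f"Delta S = {ds:.3f} nm, C_overlay = {c_overlay(1.0, 48.0, ds):.4f} Cc")
```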

After overlay modeling, the non-parallel metal lines must be converted to parallel ones with equivalent timing behavior, so that a 2.5-D RC extractor, which cannot handle diagonal patterns, can be applied. In Figure 2.12, we assume that (X_1, Y_1) is close to the driver and (X_n, Y_n) is close to the receiver. ∆S_1 is the metal spacing variation at (X_1, Y_1), while ∆S_n is the variation at (X_n, Y_n). ∆S_T is the same at every position on the die because it is independent of location. However, since ∆S_R and ∆S_M depend on the initial position, ∆S_1 differs from ∆S_n.

Figure 2.12: ∆S dependency on location.

Figure 2.13: Equivalent n-π model for Seqv calculation.

S_eqv is defined as the metal spacing that yields an equivalent delay. To find S_eqv, the interconnect is modeled by an n-π model, which is equivalent to the one-π model in Figure 2.13. C_i and R_i are the capacitance and resistance of the i-th π section. The capacitance at the driver is C_1, and the capacitance at the receiver is C_n.

$$\frac{R}{2n} \left[ \sum_{i=1}^{n-1} \frac{i}{n} \left( C_i + C_{i+1} \right) + C_n \right] \tag{2.13}$$

Using the Elmore delay model [32, 33, 53], the delay of the n-π model is expressed by Equation 2.13. When C_1 equals C_n, the equation reduces to R·C/2, the same as the delay of the one-π model. When C_1 differs from C_n, the interconnect delay is R·C_eqv/2 because the resistance is unchanged, where C_eqv is the equivalent capacitance after conversion to a parallel pattern.

$$\begin{aligned}
\Delta C &= C_{i+1} - C_i = \frac{C_n - C_1}{n - 1} \\
C_i &= C_1 + \Delta C \cdot (i - 1) \\
C_{i+1} &= C_1 + \Delta C \cdot i
\end{aligned} \tag{2.14}$$

∆C is the step increase in capacitance between adjacent sections of the n-π model, which assumes that the capacitance varies linearly from C_1 to C_n. Expressing Equation 2.13 in terms of C_1 and ∆C and letting n go to infinity, the interconnect delay becomes:

$$\lim_{n \to \infty} \frac{R}{2n} \left[ \sum_{i=1}^{n-1} \frac{i}{n} \left( 2C_1 - \Delta C + 2 \cdot \Delta C \cdot i \right) + C_n \right] \tag{2.15}$$

After simplification, Equation 2.15 becomes R·(C_1 + 2C_n)/6, which should equal R·C_eqv/2. Hence:

$$C_{\text{eqv}} = \frac{1}{3} C_1 + \frac{2}{3} C_n, \qquad S_{\text{eqv}} = \frac{3 \cdot S_1 \cdot S_n}{2 \cdot S_1 + S_n} \tag{2.16}$$


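Here, S_1 = S + ∆S_1 and S_n = S + ∆S_n, and S_eqv follows from C_eqv because capacitance is inversely proportional to spacing. Because capacitance close to the receiver is charged through more of the wire resistance, it has a larger effect on delay, which Equation 2.16 captures with the 2/3 weight on C_n. C_overlay then follows from Equation 2.2 with S + ∆S replaced by S_eqv: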

$$C_{\text{overlay}} = \frac{S}{S_{\text{eqv}}} \cdot C_C \tag{2.17}$$

After overlay consideration, every coupling capacitance can be represented by Equation 2.17.
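The limit in Equation 2.15 and the closed forms in Equation 2.16 can be checked numerically. This Python sketch evaluates Equation 2.13 with the linear capacitance profile of Equation 2.14 for a finite n and compares it with R·(C_1 + 2C_n)/6; the function names and example values (R = 1, C_1 = 1, C_n = 2 in arbitrary units, and 48 nm narrowed to 44 nm at the receiver end) are illustrative assumptions. For this example the finite-n delay exceeds the limit by 1/(12n), so it converges as 1/n.

```python
def n_pi_delay(r, c1, cn, n):
    """Eq. 2.13 evaluated with the linear profile C_i = C1 + dC*(i - 1) of Eq. 2.14."""
    dc = (cn - c1) / (n - 1)
    total = 0.0
    for i in range(1, n):
        ci = c1 + dc * (i - 1)
        ci_next = c1 + dc * i
        total += (i / n) * (ci + ci_next)
    return r / (2 * n) * (total + cn)


def limit_delay(r, c1, cn):
    """Closed form of the n -> infinity limit: R * (C1 + 2*Cn) / 6 = R * C_eqv / 2."""
    return r * (c1 + 2 * cn) / 6


def s_eqv(s1, sn):
    """Eq. 2.16: equivalent spacing, using C_eqv = C1/3 + 2*Cn/3 and C ~ 1/S."""
    return 3 * s1 * sn / (2 * s1 + sn)


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, n_pi_delay(1.0, 1.0, 2.0, n), limit_delay(1.0, 1.0, 2.0))
    # Example: 48 nm nominal spacing narrowed to 44 nm at the receiver end only.
    print(s_eqv(48.0, 44.0))
```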

2.4 Timing Analysis with Parameterized Capacitance

This section shows how to determine the overlay variables that produce the worst timing. Once these variables are found, the layout can be modified accordingly and used for worst-corner timing simulation; the traditional timing analysis flow can then be applied to this modified layout. Four overlay variables are defined (α, θ, φ, and M). Clearly, large magnitudes of the variables maximize layout distortion. However, φ and M are signed values, and the translation angle θ is purely random. Thus, an efficient way of determining φ, M, and θ for the worst timing is required.

Since rotation and magnification overlay depend on the pattern location, a location shift caused by another type of overlay can affect the rotation and magnification shifts. However, these shifts are only several nm, which is small compared with the pattern's distance from the die center. To examine the dependency between overlay types, we construct two types of random circuits with ten logic levels, using a different RC generation method for each type. One hundred circuits of each type are generated randomly.

Table 2.1: Timing errors when ignoring the dependency.

                       Case 1   Case 2
Average (%)            0.007    0.011
Std. deviation (%)     0.020    0.036
Avg. + 3 Std. (%)      0.067    0.119

When finding the θ that produces the worst timing, φ and M are set to zero. Likewise, θ and M are set to zero when finding the worst-case φ, and θ and φ are set to zero when finding the worst-case M. The exact worst-case variables can be obtained by sweeping all variables jointly over their given ranges, but since three variables must be swept, searching the entire space is time-consuming. Table 2.1 compares the timing obtained when ignoring the dependency with the exact solution obtained by sweeping the variables jointly. The difference is at most 0.119% at the 3-σ level, which is negligible. This result shows that the worst-case value of each variable can be obtained with the other variables fixed at zero.

Figure 2.14: Overlay aware timing analysis flow.

Figure 2.14 shows the timing analysis flow for systematic overlay variation. First, geometric information and coupling capacitances without overlay are extracted, and the coupling capacitances are replaced by C_overlay. Since the variables are searched consecutively, each search can use the previously obtained variables as its starting point instead of zero. In our experiments, the overlay variables obtained this way were exactly the same as those obtained by sweeping all variables simultaneously. The number of simulations required to determine the worst-timing overlay variables is N_θ + N_φ + N_M, where N_θ, N_φ, and N_M are the numbers of simulations for sweeping θ, φ, and M, respectively, instead of the N_θ·N_φ·N_M required by a joint sweep. With these overlay variables, new metal layers that reflect the variation can be constructed for worst-corner timing analysis. Best-corner timing analysis can be done in a similar way.
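The consecutive search and its cost can be illustrated with a toy objective. The Python sketch below is an assumption-laden illustration, not the flow of Figure 2.14: instead of running timing simulations, it maximizes the normalized coupling capacitance of a single hypothetical wire (Equation 2.2 with Equation 2.12), using an assumed γ = 3π/2 and the overlay ranges of Section 2.5. Because ΔS in Equation 2.12 is a sum of separate terms in θ, φ, and M, the consecutive sweep lands on the same grid point as the joint sweep here, with 94 evaluations instead of 8712; for this γ it selects a negative φ and a positive M.

```python
import math
from itertools import product

# Hypothetical single-wire setting, loosely following Section 2.5 (lengths in nm).
S = 48.0                   # nominal spacing
ALPHA = 3.0                # translation amplitude
X, Y = 5e6, 5e6            # wire position relative to the die center
D = math.hypot(X, Y)
BETA = math.atan2(Y, X)
GAMMA = 3 * math.pi / 2    # assumed geometric relation


def delta_s(theta, phi, m):
    """Spacing change from Eq. 2.12 for the fixed wire above."""
    a = BETA - GAMMA
    return (ALPHA * math.cos(theta - GAMMA)
            + D * (math.cos(a - phi) / math.cos(phi) - math.cos(a))
            + m * D * math.cos(a))


def cost(theta, phi, m):
    """Toy worst-case metric: coupling capacitance in units of C_C (Eq. 2.2)."""
    return S / (S + delta_s(theta, phi, m))


def consecutive_search(f, grids):
    """Sweep one variable at a time, reusing earlier optima as the starting point."""
    point = [0.0] * len(grids)
    evals = 0
    for i, grid in enumerate(grids):
        best_val, best_x = -math.inf, point[i]
        for x in grid:
            trial = list(point)
            trial[i] = x
            val = f(*trial)
            evals += 1
            if val > best_val:
                best_val, best_x = val, x
        point[i] = best_x
    return tuple(point), evals


def joint_search(f, grids):
    """Exhaustive sweep over the full Cartesian grid."""
    best = max(product(*grids), key=lambda p: f(*p))
    return best, math.prod(len(g) for g in grids)


THETAS = [2 * math.pi * k / 72 for k in range(72)]  # 5-degree steps
PHIS = [-5e-8 + 1e-8 * k for k in range(11)]        # -0.05 to +0.05 urad
MS = [-5e-8 + 1e-8 * k for k in range(11)]          # -0.05 to +0.05 ppm
GRIDS = [THETAS, PHIS, MS]

if __name__ == "__main__":
    print(consecutive_search(cost, GRIDS))  # 72 + 11 + 11 evaluations
    print(joint_search(cost, GRIDS))        # 72 * 11 * 11 evaluations
```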

2.5 Applications and Results

In this section, we verify the accuracy of the model and give an application example. Figure 2.15 shows the test structure used for verification; the target interconnect for timing analysis is the one in the middle. Translation overlay is expected to dominate because the wire is located close to the die center.

Figure 2.15: Test structure used to verify overlay aware timing analysis flow

We assume that α is 3 nm, φ is ±0.05 µrad, M is ±0.05 ppm, and the metal spacing without overlay is 48 nm; these assumptions are reasonable at the 3-σ level [57]. We calculated the delay as a function of the translation angle, as shown in Figure 2.16. As expected, rotation and magnification overlay are negligible because the interconnect is close to the center. The variation due to translation overlay is up to 7.8%.

To examine the rotation and magnification overlay effects, the same layout in Figure 2.15 was shifted by (5000 µm, 5000 µm). Under the same overlay conditions, the overall timing variation is 9.1%, as shown in Figure 2.17; the additional 1.3% comes from rotation and magnification overlay. When φ is negative, the delay increases because the counterclockwise rotation of the second patterning reduces the spacing between the metal lines. Similarly, if M is positive, the second patterning moves toward the first patterning, so the worst delay occurs when M is positive. The overlay vs. delay graph shows the variation effects of both the overall overlay and each individual overlay source.

(a) Line graph along translation angle (b) Circular graph along translation angle

Figure 2.16: Delay variation with overlay when the target layout is near the center. Translation overlay is dominant.

The overlay vs. delay graph can also be used to optimize interconnects. Figure 2.18 shows an alternative DPT decomposition that switches the patterning order of the lower part of the target interconnect. The delays of the two decompositions are compared in Figure 2.19. The decomposition in Figure 2.18 is more robust than the first one: its worst-case delay is 0.87 ns, 3% less than that of the first decomposition, and its variation range is 2.7%, compared with 9.1% for the first decomposition. When a metal layer can be decomposed in two or more ways, the overlay vs. delay graph provides a way to choose the better decomposition.



Figure 2.17: Delay variation with overlay when the target layout is near the edge of the die. Rotation and magnification overlay, as well as translation overlay, have an impact on delay variation.

Figure 2.18: Test structure for an alternative decomposition which has less overlay effect on timing

Next, we verify the effect of delay propagation. RC networks and geometric relations were generated randomly, using the same overlay conditions as in the previous experiment, and each gate was assumed to have two fan-ins and one fan-out. A well-designed circuit with fully random relations, in which γ can be 0, π/2, π, or 3π/2, shows a self-compensating trend (Figure 2.20): as logic depth increases, the cumulative delay may show less variation because positive and negative variations cancel. For comparison with a poorly designed circuit, we generate circuits in which γ is restricted to 0 or π/2, so that the second pattern is always placed above or to the right of the first pattern. Beyond a logic depth of ten, the self-compensation effect of the restricted design is much weaker than that of the fully random case. Overlay-aware routing may therefore reduce the timing variation of a circuit.
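The self-compensation trend can be illustrated with a deliberately simplified model. The Python sketch below is an assumption, not the experiment behind Figure 2.20: it ignores RC values, fan-in, and delay weighting, considers translation overlay only, and treats each stage of a path as contributing an equal spacing change α·cos(θ − γ_k) (Equation 2.4) with a shared systematic θ. With γ drawn from {0, π/2, π, 3π/2}, the worst-case average shrinks roughly as 1/√depth, whereas with γ restricted to {0, π/2} it never falls below 1/√2 of the single-stage value.

```python
import math
import random


def worst_mean_shift(gammas):
    """Worst-case average spacing loss per stage, in units of alpha.

    Stage k changes its spacing by alpha*cos(theta - gamma_k) (Eq. 2.4), with one
    theta shared by all stages. Over all theta, the most negative sum has magnitude
    equal to the length of the vector sum of the unit vectors (cos g_k, sin g_k).
    """
    x = sum(math.cos(g) for g in gammas)
    y = sum(math.sin(g) for g in gammas)
    return math.hypot(x, y) / len(gammas)


def average_worst_shift(choices, depth, trials, rng):
    """Average of worst_mean_shift over randomly generated paths of a given depth."""
    total = 0.0
    for _ in range(trials):
        gammas = [rng.choice(choices) for _ in range(depth)]
        total += worst_mean_shift(gammas)
    return total / trials


FULL = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]  # fully random relations
RESTRICTED = [0.0, math.pi / 2]                      # second pattern always above or right

if __name__ == "__main__":
    rng = random.Random(0)
    for depth in (1, 5, 10, 20):
        full = average_worst_shift(FULL, depth, 2000, rng)
        restricted = average_worst_shift(RESTRICTED, depth, 2000, rng)
        print(f"depth {depth:2d}: full {full:.3f}, restricted {restricted:.3f}")
```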

Figure 2.21 shows timing results for multiple metal layers. For simplicity, only translation overlay is considered in this experiment. Assuming that overlay is independent between layers, the overlay angle that produces the worst delay for each individual layer also produces the worst delay when the two layers are considered together. For example, in Figure 2.21, the worst-delay overlay angle is 280° for metal 1 and 160° for metal 2, and the worst delay for both layers together occurs when each layer takes its own worst-case angle. Therefore, after finding the worst-delay overlay variables for each individual layer, we can analyze circuit behavior with these variables for multi-layer analysis.

Figure 2.19: Comparison of delay variation between two different decomposition schemes.

2.6 Discussions

We proposed a method to estimate the layout distortion due to overlay, which is inevitable in DPT. We defined several overlay variables, such as the amplitude of translation overlay, the angle of rotation overlay, and the magnification factor. Given the overlay variables, we modeled the parameterized coupling capacitance and showed how to determine the overlay variables that produce the worst timing of a chip. In addition, we showed that different pattern decompositions differ in overlay tolerance. This work provides a way to design robust circuits with consideration of overlay.


Figure 2.20: Compensation trend of overlay induced variation as logic depth increases.


(a) Test circuit (b) Delay vs. θ on Metal 1

(c) Delay vs. θ on Metal 2 (d) Delay vs. θ on Metal 1 and 2

Figure 2.21: Timing variation over θ for multiple metal layers.


Chapter 3

A Graph-Theoretic, Multi-Objective Layout Decomposition

3.1 Introduction

In the previous chapter, we discussed the effect of overlay on timing in DPL. Another challenge for DPL is layout decomposition, which must be performed before manufacturing. As observed in the previous chapter, robust decomposition can mitigate the effect of overlay on timing. When the distance between two patterns is less than the minimum space (min_s), the two rectangles must be assigned to different masks. Conflicts can be resolved by inserting stitches during decomposition, as shown in Fig. 3.1(a). However, the number of inserted stitches should be minimized to reduce area and improve yield. Not all conflicts can be resolved by inserting stitches; an irresolvable conflict, such as the one in Fig. 3.1(b), is called a native conflict. The only way to remove native conflicts is to modify the layout, after which the layout must be decomposed again, so layout modification and decomposition are iterated. Therefore, a fast decomposition algorithm is required to shorten this iterative loop.

Several rule-based decomposition methods have been proposed in [17, 43, 45, 90, 91, 99]. A combined approach of routing and decomposition was proposed in [17]. The works in [43, 45, 90] proposed a layout decomposition method based on constraint graph construction and ILP, and [99] presented a simultaneous optimization of conflicts and stitches with an ILP formulation. Our algorithm does not need to consider conflicts explicitly during decomposition, because our framework guarantees a result free of conflicts, except for native conflicts, after the initial (relative) coloring. In addition, since ILP is NP-hard, the algorithms in [43, 99] cannot be applied to large-scale decomposition, which balanced or overlay-compensated decomposition requires, without layout partitioning. Even when layout partitioning is used, optimality cannot be guaranteed. Furthermore, the conventional metrics are only decomposability and stitch count. Therefore, we need a framework that can handle all of these requirements within reasonable time.

(a) Decomposable layout (b) Undecomposable layout

Figure 3.1: Layout decomposition.

The proposed decomposition framework is scalable enough for large-scale layout decomposition. It consists of three steps: relative coloring, constraint insertion for robust decomposition, and group color assignment by bi-partitioning. Relative coloring removes every conflict except native conflicts, and the decomposition is optimized for minimum stitches and balanced density during the group color assignment step. This optimization is based mainly on a graph-theoretic approach extended from min-cut graph partitioning algorithms such as FM [26]. Overall, the framework runs in polynomial time.
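Because the number of stitches equals the cut size of a graph partitioning, stitch minimization reduces to min-cut bi-partitioning. The Python sketch below shows one simplified pass of an FM-style [26] refinement on a generic weighted graph; it is an illustration under stated assumptions, not the framework's implementation. The vertices stand for hypothetical groups produced by relative coloring, the edge weights for hypothetical stitch costs, and the vertex-count balance limit is a crude stand-in for density balancing. Unlike full FM, it scans all vertices for the best gain (O(n²) per pass) instead of using bucket lists.

```python
def build_graph(edges):
    """Undirected weighted adjacency map from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def cut_size(adj, side):
    """Total weight of edges whose endpoints lie on different sides (stitches, in this analogy)."""
    return sum(w for u in adj for v, w in adj[u].items()
               if u < v and side[u] != side[v])


def move_gain(adj, side, u):
    """Cut reduction if u switches sides: external weight minus internal weight."""
    return sum(w if side[v] != side[u] else -w for v, w in adj[u].items())


def fm_pass(adj, side, max_imbalance):
    """One simplified Fiduccia-Mattheyses pass.

    Moves every vertex at most once, always picking the unlocked vertex with the
    highest gain whose move keeps |#side0 - #side1| <= max_imbalance, and returns
    the best partition seen during the pass together with its cut size.
    """
    side = dict(side)
    n = len(side)
    n0 = sum(1 for s in side.values() if s == 0)
    locked = set()
    best_side, best_cut = dict(side), cut_size(adj, side)
    while True:
        candidates = []
        for u in adj:
            if u in locked:
                continue
            new_n0 = n0 - 1 if side[u] == 0 else n0 + 1
            if abs(2 * new_n0 - n) <= max_imbalance:
                candidates.append((move_gain(adj, side, u), u))
        if not candidates:
            break
        _, u = max(candidates)
        n0 += -1 if side[u] == 0 else 1
        side[u] = 1 - side[u]
        locked.add(u)
        c = cut_size(adj, side)
        if c < best_cut:
            best_side, best_cut = dict(side), c
    return best_side, best_cut


if __name__ == "__main__":
    edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1),  # first cluster
             (3, 4, 1), (3, 5, 1), (4, 5, 1),  # second cluster
             (2, 3, 1)]                        # single bridging edge
    adj = build_graph(edges)
    start = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
    print(cut_size(adj, start))
    print(fm_pass(adj, start, max_imbalance=2))
```

Repeating passes until the cut stops improving is the usual way to use such a routine.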

Another benefit of our framework is that balanced density can be achieved concurrently with stitch minimization. Uneven density can cause not only pattern distortion and hotspots [87] but also etch bias between the two patterning steps, whereas a balanced layout leaves more space for scattering bar insertion during OPC (optical proximity correction) and yields better patterning quality and uniformity. Therefore, the two decomposed layers should be balanced as much as possible. Our framework balances local density as well as global density.

In addition to the above advantages, our framework can flexibly incorporate additional constraints during decomposition. To demonstrate this flexibility, we propose contact decomposition and overlay self-compensated decomposition. When DPL is used for contact layer patterning, the source and drain contacts of a transistor need to have the same color to reduce side effects due to overlay between the first and second contact patterning. In addition, metal patterns formed by DPL can have spacing variation due to overlay, which changes the coupling capacitance between neighboring wires and results in timing variation [15, 30, 96]. To mitigate this timing variation, TDD constraints are generated by an ILP formulation whose number of variables remains tractable even for large-scale decomposition. The TDD constraints are inserted as weighted edges in our graph model for color assignment. In the experimental results, we show that overlay-induced timing variation can be reduced from between 8.1% and 9.3% to less than 0.5%.

Overall, our graph-theoretic decomposition simultaneously optimizes the layout for minimum stitch insertion, balanced density, and overlay compensation.

The rest of the chapter is organized as follows. Section 3.2 introduces several decomposition requirements. Our decomposition framework is proposed in section 3.3, and robust contact decomposition is presented in section 3.4. Section 3.5 explains how to extract TDD constraints and presents our graph-based TDD approach. Section 3.6 describes hierarchical decomposition. Experimental results are shown in section 3.7, and section 3.8 concludes with discussions.

3.2 Decomposition Requirements and Motivation

3.2.1 Balanced Density

Even when two features within the minimum space are assigned to different masks, unbalanced density can cause lithography hotspots as well as degraded CD uniformity due to irregular pitch. The aerial image of an unbalanced decomposition can have a patterning problem, as shown in Fig. 3.2(a).

(a) Unbalanced decomposition: 38% (Red) and 62% (Blue)

(b) Balanced decomposition: 48% (Red) and 52% (Blue)

Figure 3.2: Aerial image comparison of decomposed layouts with different density.

The balanced decomposition of the same circuit in Fig. 3.2(b) does not have this problem. Even though the decomposed layout in Fig. 3.2 does not represent the general case, balanced decomposition provides a more lithography-friendly layout because of its higher regularity [73]. Therefore, balanced density should be considered in decomposition algorithms.

3.2.2 Robust Contact Decomposition

In DPL, overlay between source/drain (S/D) contacts can change the contact-to-gate distance. Fig. 3.3(a) shows contact patterning without overlay. In traditional single patterning, overlay displaces the S/D contacts in the same direction, as shown in Fig. 3.3(b). However, if we assign the S/D contacts to different masks in DPL, their overlay may not be in the same direction. The two worst cases are shown in Fig. 3.3(c) and Fig. 3.3(d). When we ignore the contact stress effect, the transistor in Fig. 3.3(c) has the maximum channel current, while Fig. 3.3(d) shows the minimum current case. In section 3.4, we propose a robust decomposition for the contact layer to address the S/D contact mismatch.

(a) No overlay between S/D contacts

(b) S/D overlay with single patterning

(c) Maximum Ion case in DPL (d) Minimum Ion case in DPL

Figure 3.3: An illustration of contact-to-gate space variation due to overlay.

3.2.3 Overlay Compensation

The main problem of litho-etch-litho-etch (LELE) DPL comes from overlay between the 1st and the 2nd patterning steps. Even though manufacturing engineers are working to improve overlay control, it is impossible to remove overlay completely. Therefore, it is desirable to compensate for overlay during the design phase. Fig. 3.4 illustrates how the overlay effect can be compensated. Without overlay consideration during decomposition, the two coupling capacitances vary in the same direction (Fig. 3.4(a)). In contrast, Fig. 3.4(b) shows less timing variation because C1 decreases while C2 increases under overlay. In section 3.5, we propose a decomposition method to compensate for the overlay effect.

(a) Decomposition without overlay compensation

(b) Decomposition with overlay compensation

Figure 3.4: Overlay compensated decomposition.

3.3 Bi-partitioning Based Decomposition

3.3.1 Overall Decomposition Flow

The overall flow of our framework is shown in Fig. 3.5. The first step is to divide every shape into rectangles, which are the basic units of our framework: neighboring shapes are difficult to find directly on polygons, while dividing shapes into single grid units would increase complexity and memory usage.

Figure 3.5: Overall flow of the decomposition framework.

Fig. 3.6 shows the decomposition result after each step. A rectangle is divided into smaller ones based on the region projected from a non-touching neighboring rectangle in Fig. 3.6(b). The rectangles re-segmented according to the projection in Fig. 3.6(c) are grouped by traversing non-touching neighbors in Fig. 3.6(d). If a re-segmented rectangle does not have a non-touching neighbor, the rectangle forms an independent group by itself. A group consisting of two or more rectangles is defined as a dependent group. There are nine dependent groups in Fig. 3.6(d). In Fig. 3.6(e), initial (relative) colors are assigned to the rectangles in each group to remove every conflict. There are 23 stitches after relative color assignment, which does not consider stitch minimization. We explain grouping and relative coloring in section 3.3.3.

Layout decomposition at the relative coloring step is performed without any objective. To minimize stitches, we introduce group color determination with the min-cut partitioning based algorithm in section 3.3.5. Fig. 3.6(f) shows the final coloring result after stitch minimization. After bi-partitioning of the decomposition graph, rectangles in the same partition have the same color, so each group either flips its relative coloring or keeps it. For example, all rectangles in G1, G3, and G5 flip their initial color in Fig. 3.6(f). Notice that an independent group may flip its color as well. After color flipping, three stitches are required to resolve conflicts.

The separation of relative coloring and group color assignment enables us to find native conflicts during the relative coloring step, which reduces the design time otherwise spent iterating between native conflict removal and decomposition.

(a) Segmentation into rectangles and finding conflicts

(b) Projection to conflict rectangles within mins

(c) Re-segmentation based on projected rectangles

(d) Rectangle grouping for initial coloring

(e) Relative coloring to remove conflicts within groups

(f) Group color assignment to minimize stitch insertion

Figure 3.6: Decomposition results after each step (AOI2 cell, Metal1).

Algorithm 1 Finding neighbors in left edge

Input: set of rectangles R, minimum space mins
Sort R in increasing order of right coordinate : RR
for each rectangle r ∈ R do
    RN ← { r′ ∈ RR : r.left − mins ≤ r′.right ≤ r.left }
    for each rectangle r′ ∈ RN do
        if Distance(r, r′) == 0 then
            r → touching_neighbor.insert(r′)
        else if Distance(r, r′) ≤ mins then
            r → non_touching_neighbor.insert(r′)
        end if
    end for
end for

3.3.2 Finding Neighboring Rectangles

After converting polygons into rectangles, we obtain for every rectangle its touching neighbors as well as its non-touching neighbors, i.e., the rectangles within the given distance mins. Algorithm 1 shows the pseudo-code to find neighbors at the left edge of a rectangle; the right, bottom, and top edges are searched in a similar way. First, we sort all rectangles in increasing order of right edge coordinate. Let r.left be the left position of a rectangle r. After sorting, the non-touching neighbors of r are limited to rectangles whose right edge coordinate lies between r.left − mins and r.left. The complexity of finding neighbors is O(N log N) due to the sorting, where N is the number of rectangles.
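The sort-and-search idea of Algorithm 1 can be sketched in Python as follows. The `Rect` class, the Euclidean rectangle gap used for `distance`, and the use of `bisect` to locate the candidate slice are assumptions made for illustration, not details given in the text. Only the left-edge search is shown; the other three edges would be handled symmetrically.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rect:
    name: str
    left: int
    bottom: int
    right: int
    top: int
    touching: List[str] = field(default_factory=list)
    non_touching: List[str] = field(default_factory=list)


def distance(a: Rect, b: Rect) -> float:
    """Euclidean gap between two axis-aligned rectangles (0 if they touch or overlap)."""
    dx = max(b.left - a.right, a.left - b.right, 0)
    dy = max(b.bottom - a.top, a.bottom - b.top, 0)
    return (dx * dx + dy * dy) ** 0.5


def find_left_neighbors(rects: List[Rect], min_s: int) -> None:
    """Left-edge search: candidates have their right edge in [left - min_s, left]."""
    by_right = sorted(rects, key=lambda q: q.right)  # the O(N log N) sort
    rights = [q.right for q in by_right]
    for r in rects:
        lo = bisect.bisect_left(rights, r.left - min_s)
        hi = bisect.bisect_right(rights, r.left)
        for cand in by_right[lo:hi]:
            if cand is r:
                continue
            d = distance(r, cand)
            if d == 0:
                r.touching.append(cand.name)
            elif d <= min_s:
                r.non_touching.append(cand.name)


rs = [Rect("a", 0, 0, 10, 10), Rect("b", 10, 0, 20, 10),
      Rect("c", 25, 0, 35, 10), Rect("d", 25, 30, 35, 40)]
find_left_neighbors(rs, min_s=6)
for r in rs:
    print(r.name, r.touching, r.non_touching)
```

In this hypothetical example, b touches a, c sees b as a non-touching neighbor 5 units away, and d has b as a sort-order candidate but rejects it because the vertical gap exceeds mins.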


(a) Re-segmentation (b) Relative coloring

Figure 3.7: Re-segmentation and grouping process.

3.3.3 Grouping and Relative Coloring

A rectangle is divided into smaller rectangles based on candidate locations for stitch insertion. For example, in Fig. 3.7(a), a rectangle is divided into three smaller rectangles (r1, r2, r3). Projection from r4 forces r1 to be colored differently from r4. In addition, to obtain enough overlap margin for stitch insertion, r1 should be extended by minom/2, where minom is the minimum overlap margin for overlay. Similarly, r3 and r5 must have different colors. r2 does not have a non-touching neighbor, which means that we can assign any color to r2. The re-segmented rectangles are grouped according to their neighboring relations in Fig. 3.7(b). A relative color is assigned to every grouped rectangle, indicating its relationship to the other rectangles in the same group. Grouping and relative coloring can be performed simultaneously with Algorithm 2.


Algorithm 2 Grouping and Relative Coloring (DFS)

Input: set of re-segmented rectangles S
RelativeColor = 0
for each rectangle s ∈ S do
    if s has not been grouped then
        Create new group G
        RecursiveRelativeColoring(G, s, RelativeColor)
    end if
end for

RecursiveRelativeColoring(G, s, RelativeColor)
    s → relative_color = RelativeColor
    G → rectangle.insert(s)
    N ← set of non-touching neighbors of s
    for each rectangle n ∈ N do
        if n has not been grouped then
            RecursiveRelativeColoring(G, n, INV(RelativeColor))
        else if n → relative_color == RelativeColor then
            report a native conflict between s and n
        end if
    end for
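A minimal Python sketch of Algorithm 2 follows. It replaces the recursion with an explicit stack (to avoid Python's recursion limit), takes a hypothetical adjacency dictionary of non-touching neighbors, and reports a native conflict whenever two non-touching neighbors end up with the same relative color, which happens exactly when a group contains an odd cycle.

```python
from typing import Dict, List, Tuple


def group_and_color(neighbors: Dict[str, List[str]]):
    """Group rectangles via non-touching neighbors and assign alternating relative colors.

    Returns (group id per rectangle, relative color per rectangle, native conflicts).
    """
    group: Dict[str, int] = {}
    color: Dict[str, int] = {}
    conflicts: List[Tuple[str, str]] = []
    next_group = 0
    for start in neighbors:
        if start in group:               # already reached from an earlier seed
            continue
        group[start], color[start] = next_group, 0
        stack = [start]
        while stack:
            s = stack.pop()
            for n in neighbors[s]:
                if n not in group:
                    group[n], color[n] = next_group, 1 - color[s]   # INV(color)
                    stack.append(n)
                elif color[n] == color[s] and s < n:
                    conflicts.append((s, n))   # odd cycle: native conflict
        next_group += 1
    return group, color, conflicts


# r1-r4 and r3-r5 must differ; r2 has no non-touching neighbor (independent group).
print(group_and_color({"r1": ["r4"], "r4": ["r1"], "r2": [], "r3": ["r5"], "r5": ["r3"]}))
# Three mutually conflicting rectangles cannot be two-colored.
print(group_and_color({"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]})[2])
```

The `s < n` test only prevents the same conflicting pair from being reported twice, once from each endpoint.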

3.3.4 Group Color Assignment Problem

In a given layout, let $G = \{g_i \mid 1 \le i \le n\}$ be a set of groups, $H = \{h_w \mid 1 \le w \le x\}$ be a set of stitch candidates, and $S = \{s_j \mid 1 \le j \le m\}$ be a set of stitches to be inserted. Every boundary between two touching groups becomes a candidate for stitch insertion, so $x$ is the number of touching pairs of groups. The goal of our problem is to find the minimum set $S \subseteq H$. When two touching groups have different colors, $h_w$ becomes an element of $S$. Let $G_0 = \{g_{i_0} \mid 1 \le i_0 \le n_0\}$ be the set of groups assigned to color0 and $G_1 = \{g_{i_1} \mid 1 \le i_1 \le n_1\}$ be the set of groups assigned to color1. Group color assignment is the process of assigning each $g_i$ to either $G_0$ or $G_1$ so as to minimize $m$, the number of elements of $S$.

Let $A_w$ and $B_w$ denote the two sides of the $w$-th touching pair between two groups, each taking a binary value for its color. If $A_w$ and $B_w$ have different colors, a stitch is required and $h_w$ becomes an element of $S$. From this observation, the objective function for stitch minimization can be formulated as minimizing $\sum_w A_w \oplus B_w$, which we express as an ILP in (3.1). The exclusive-OR operation can be formulated in ILP with four constraints, as follows.

$$
\begin{aligned}
\text{Minimize:}\quad & \sum_w X_w \\
\text{Subject to:}\quad & A_w + B_w \ge X_w \\
& A_w - B_w \le X_w \\
& -A_w + B_w \le X_w \\
& -A_w - B_w + 2 \ge X_w
\end{aligned}
\tag{3.1}
$$
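The claim that the four constraints of (3.1) encode exclusive-OR can be checked by enumerating all binary values of $A_w$ and $B_w$. The short Python check below (the function name `feasible_x` is hypothetical) lists, for each input pair, the values of $X_w \in \{0, 1\}$ that satisfy all four constraints.

```python
from itertools import product


def feasible_x(a: int, b: int) -> list:
    """Values of X in {0, 1} that satisfy all four constraints of (3.1)."""
    return [
        x for x in (0, 1)
        if a + b >= x and a - b <= x and -a + b <= x and -a - b + 2 >= x
    ]


for a, b in product((0, 1), repeat=2):
    # Each input pair admits exactly one X, namely a XOR b.
    print(f"A={a} B={b} feasible X={feasible_x(a, b)}")
```

Because the constraints already pin $X_w$ to a single value, the objective $\sum_w X_w$ counts exactly the touching pairs whose colors differ.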

3.3.5 Min-Cut based Stitch Minimization

To link the Boolean objective function with the layout segmented into rectangles, a Boolean variable is assigned to each rectangle to indicate its group and relative color. For example, rectangle group A in Fig. 3.8(a) is expressed by A and Ā to indicate the group and the relative color. Independent rectangle groups represented by X, Y, and Z do not have a complementary variable.

There are five groups in Fig. 3.8(a). Our goal is to determine the five binary variables that minimize the objective function in Fig. 3.8(b). The ILP formulation in (3.1) is not an efficient way to solve the problem, because ILP is NP-hard in general: an exact search may have to examine up to $2^n$ color assignments when there are $n$ groups in a layout. Therefore, we need an efficient heuristic method for our problem.

(a) Boolean variable assignment after grouping and relative coloring

(b) Formulation for minimum stitch insertion

Figure 3.8: An example showing group color assignment formulation after relative coloring.

If we represent the objective function as a graph, then by the following theorem, group color assignment for minimum stitch insertion can be achieved with a graph bi-partitioning algorithm. Even though graph partitioning heuristics do not guarantee an optimal solution, efficient heuristics such as Fiduccia-Mattheyses (FM) have been well studied.

Theorem 1. The number of stitches in layout decomposition is equal to the

cut size of the bi-partitioning problem in graph theory.


Proof. Let $G = \{g_i \mid 1 \le i \le n\}$ be a set of groups and $S = \{s_j \mid 1 \le j \le m\}$ be a set of stitches. After group color assignment, $G_0 = \{g_{i_0} \mid 1 \le i_0 \le n_0\}$ is the set of groups assigned to color0 and $G_1 = \{g_{i_1} \mid 1 \le i_1 \le n_1\}$ is the set of groups assigned to color1. A stitch $s_j$ appears only when two touching groups are assigned to $G_0$ and $G_1$, respectively, because no stitch is needed when two touching groups have the same color. Therefore, the number of stitches $m$ is equal to the number of times that two touching groups have different colors.

Similarly, let $C = \{c_k \mid 1 \le k \le p\}$ be a set of cuts and $V = \{v_l \mid 1 \le l \le q\}$ be a set of vertices in the graph for bi-partitioning. A cut in bi-partitioning appears when two connected vertices are in different partitions. Therefore, the cut size $p$ is equal to the sum of the weights of the edges linking the two partitions. When we regard each group as a vertex in the graph, $G_0$ and $G_1$ become the two partitioned sets of vertices. Therefore, $m$ becomes $p$ in the graph partitioning expression, and the number of stitches is equal to the cut size of the bi-partitioning.

According to Theorem 1, the solution of the min-stitch formulation is the same as the result of min-cut bi-partitioning. The only difference from a conventional min-cut partitioning algorithm is that a vertex representing a group color is incompatible with the complementary vertex of the same group. We define two such nodes, which must be in different partitions, as a repulsive pair.
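To make the correspondence in Theorem 1 concrete, the following Python sketch builds the weighted decomposition graph from a list of touching pairs and evaluates both the stitch count and the cut size for a given group coloring. Each rectangle is labeled by its group and relative color, so that ("A", 1) stands for Ā; the small layout in the example is hypothetical and is not the one in Fig. 3.8. The exhaustive search enumerates all $2^n$ group colorings and is included only as an exact reference for tiny inputs.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Optional, Tuple

Vertex = Tuple[str, int]             # (group, relative color); ("A", 1) stands for A-bar
TouchPair = Tuple[Vertex, Vertex]    # one touching pair = one stitch candidate


def build_graph(touching: List[TouchPair]) -> Counter:
    """Edge weight = number of touching pairs between the two vertices."""
    return Counter(tuple(sorted(pair)) for pair in touching)


def side(color: Dict[str, int], v: Vertex) -> int:
    """Partition of a vertex; a group and its complement always land on opposite sides."""
    group, rel = v
    return color[group] ^ rel


def stitches(touching: List[TouchPair], color: Dict[str, int]) -> int:
    """Stitches: touching rectangles whose final colors differ."""
    return sum(side(color, u) != side(color, v) for u, v in touching)


def cut_size(graph: Counter, color: Dict[str, int]) -> int:
    """Total weight of edges whose endpoints lie in different partitions."""
    return sum(w for (u, v), w in graph.items() if side(color, u) != side(color, v))


def exact_min_stitches(touching: List[TouchPair]) -> Tuple[int, Dict[str, int]]:
    """Exhaustive reference over all 2^n group colorings; tiny inputs only."""
    groups = sorted({g for pair in touching for g, _ in pair})
    best: Optional[Tuple[int, Dict[str, int]]] = None
    for bits in product((0, 1), repeat=len(groups)):
        color = dict(zip(groups, bits))
        cost = stitches(touching, color)
        if best is None or cost < best[0]:
            best = (cost, color)
    return best


# Hypothetical layout: A touches X twice; A-bar touches Y and E; E-bar touches Z; X touches Y.
touching = [
    (("A", 0), ("X", 0)), (("A", 0), ("X", 0)),
    (("A", 1), ("Y", 0)), (("A", 1), ("E", 0)),
    (("E", 1), ("Z", 0)), (("X", 0), ("Y", 0)),
]
cost, color = exact_min_stitches(touching)
print(cost, cut_size(build_graph(touching), color))  # prints: 1 1
```

At least one stitch is unavoidable here, because the pairs force X to match A, Y to match Ā, and X to match Y at the same time.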


3.3.6 Modified Graph Min-Cut Partitioning

We implement the classic FM partitioning algorithm with the following modifications to support repulsive layout pairs and balanced density.

(a) A and Ā are in the same partition

(b) A and Ā are in different partitions

Figure 3.9: An example showing how to handle a repulsive layout pair in the FM partitioning algorithm.

Repulsive pairs cannot be guaranteed to end up in different partitions simply by assigning negative edge weights, because min-cut partitioning can fall into local minima. Therefore, our method processes repulsive pairs in two different ways. First, if the two nodes are in the same partition, their moving gains are calculated independently. The one with the higher gain is moved first, and upon that move, the other one is locked with a gain of zero. In Fig. 3.9(a), A and Ā form a repulsive pair but are in the same partition, and their gains are zero and eight, respectively. When Ā is moved, A is locked as a result.

On the other hand, if the two nodes are in opposite partitions, both are given the same gain, equal to the sum of their original gains. When one of them is picked as the candidate for moving, a swap is carried out so that they remain on opposite sides. If A and Ā are in opposite partitions, as shown in Fig. 3.9(b), the sum of their original gains is eight, so both gains become eight. This results in a swap of A and Ā when A is picked for moving. Regardless of the initial partition in this example, our method reaches the same final solution.
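The effect of the repulsive-pair rules can be sketched by treating each group as a single unit that flips as a whole: moving A together with Ā is the swap described above, so a repulsive pair that starts on opposite sides never ends up in the same partition. The simplified Python pass below follows the FM pattern of moving the highest-gain unlocked unit, locking it, and rolling back to the best prefix of moves. It is an illustrative sketch under these assumptions, not the implementation used in this work: gains are recomputed naively instead of being kept in bucket lists, density balance is not checked, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[str, int]             # (group, relative color); ("A", 1) stands for A-bar
TouchPair = Tuple[Vertex, Vertex]


def stitch_count(touching: List[TouchPair], color: Dict[str, int]) -> int:
    return sum((color[ga] ^ ra) != (color[gb] ^ rb) for (ga, ra), (gb, rb) in touching)


def flip_gain(touching: List[TouchPair], color: Dict[str, int], g: str) -> int:
    """Stitch reduction if group g (together with its complement) changes sides."""
    gain = 0
    for (ga, ra), (gb, rb) in touching:
        if (ga == g) != (gb == g):       # pairs inside g are unaffected by the flip
            gain += 1 if (color[ga] ^ ra) != (color[gb] ^ rb) else -1
    return gain


def fm_flip_partition(touching: List[TouchPair],
                      color: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    color = dict(color)
    while True:
        start_cost = cost = best_cost = stitch_count(touching, color)
        locked: set = set()
        moves: List[str] = []
        best_len = 0
        while len(locked) < len(color):
            g = max((h for h in color if h not in locked),
                    key=lambda h: flip_gain(touching, color, h))
            cost -= flip_gain(touching, color, g)
            color[g] ^= 1                # swap g and its complement across the cut
            locked.add(g)
            moves.append(g)
            if cost < best_cost:
                best_cost, best_len = cost, len(moves)
        for g in moves[best_len:]:       # undo the moves after the best prefix
            color[g] ^= 1
        if best_cost >= start_cost:      # stop when a full pass brings no improvement
            return color, best_cost


# Hypothetical layout (A touches X twice; A-bar touches Y and E; E-bar touches Z; X touches Y).
touching = [
    (("A", 0), ("X", 0)), (("A", 0), ("X", 0)),
    (("A", 1), ("Y", 0)), (("A", 1), ("E", 0)),
    (("E", 1), ("Z", 0)), (("X", 0), ("Y", 0)),
]
color, cost = fm_flip_partition(touching, {"A": 0, "E": 0, "X": 0, "Y": 0, "Z": 0})
print(cost, color)
```

Allowing temporarily negative gains within a pass, and keeping only the best prefix, is what lets FM-style methods climb out of some local minima that a purely greedy flip would get stuck in.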

In Fig. 3.10, we construct a graph to decompose the layout in Fig. 3.8. The pair of group colors represented by A and Ā is a repulsive pair; E and Ē also need to be in different partitions. The weight of an edge between two vertices is the number of touching pairs between the two groups. For example, since A and X have two touching points, the edge weight between A and X is two. Fig. 3.10(a) shows the min-cut partitioning and the corresponding decomposed layout with minimum stitch insertion. Red is assigned to all rectangles represented by A and E, while the rectangles represented by Ā, Ē, X, Y, and Z are assigned blue.

During partitioning, we can control both stitch count and density balance. Fig. 3.10(b) shows the balanced partitioning and the decomposed layout, which has four stitches. Global balance can be achieved by assigning a weight to each vertex in the graph. To maintain balance between the two partitions, we start with a balanced initial partitioning solution and then keep track of the total weight of both partitions. Every time before moving a vertex to the other partition, a check is carried out to ensure that the area of the original partition does not drop below a certain threshold after the move.

(a) Min-cut partitioning and the decomposed layout

(b) Balanced min-cut partitioning and the decomposed layout

Figure 3.10: Group color assignment with different partitioning schemes and the decomposition results.

To consider local balance, we divide the layout into subdivisions and assign a balance constraint to each of them, as shown in Figure 3.11. The constraints for local balanced density are presented in Figure 3.11(b). $r$ is the balance ratio; when $r$ is 0.5, the decomposed layers are evenly distributed. $W_{ji}$ is the total area of $R_{ji}$, $smax_{ji}$ is the weight of the vertex with the maximum weight in $R_{ji}$, and $A_{ji}$ is the sum of vertex weights in a partition. When a layout is divided into $i$ columns and $j$ rows, the number of constraints is $i \times j$. Local balance is guaranteed by not moving a vertex to the other partition if the move would break any of the balance constraints.

(a) Layout division (b) Local balance constraints

Figure 3.11: Layout division for local balance.
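The move-legality test used for balance can be sketched as below. The global bound `min_global` and the per-tile lower bounds `min_tile` are inputs standing in for the thresholds of Fig. 3.11(b), whose exact form is not reproduced here, and the sketch assumes for simplicity that each vertex lies entirely in one tile; all names are hypothetical.

```python
from typing import Dict, Tuple

Tile = Tuple[int, int]  # (column, row) index of a layout subdivision


def can_move(v: str, part: Dict[str, int], weight: Dict[str, float],
             tile_of: Dict[str, Tile], min_global: float,
             min_tile: Dict[Tile, float]) -> bool:
    """Return True if moving vertex v to the other partition passes both balance checks."""
    src = part[v]
    # Global check: the source partition must keep at least min_global area.
    src_area = sum(weight[u] for u in part if part[u] == src)
    if src_area - weight[v] < min_global:
        return False
    # Local check: only the tile containing v changes (v assumed to lie in one tile).
    t = tile_of[v]
    src_tile_area = sum(weight[u] for u in part if part[u] == src and tile_of[u] == t)
    return src_tile_area - weight[v] >= min_tile[t]


part = {"a": 0, "b": 0, "c": 1}
weight = {"a": 3.0, "b": 2.0, "c": 4.0}
tile_of = {"a": (0, 0), "b": (1, 0), "c": (0, 0)}
min_tile = {(0, 0): 1.0, (1, 0): 0.0}
print([v for v in part if can_move(v, part, weight, tile_of, 2.0, min_tile)])
```

Inside a partitioning pass, a candidate for which this check fails would simply be skipped, which is how a move that breaks any balance constraint is prevented.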

3.3.7 Complexity Analysis

Let N be the number of rectangles and E be the number of neighboring pairs. Segmentation of polygons into rectangles takes O(N). Finding neighbors needs O(N log N) because sorting by coordinate is required. The complexity of projection onto non-touching neighbors is O(E), and grouping and relative coloring using DFS require O(N + E). Group color assignment with min-cut partitioning can be done in linear time, O(N). Since each rectangle has only a bounded number of neighbors within mins in practice, E = O(N), and the complexity of the framework is therefore O(N log N). Since TDD constraint generation runs only for selected nets and the number of variables is usually less than one hundred, we can ignore the complexity of the TDD part.

The proposed decomposition framework runs in polynomial time, while ILP-based approaches [43, 99] are NP-hard.

3.4 Application to Contact Layer Decomposition

The early applications of DPL are contact and metal layers. As transistor size shrinks, contacts become one of the dominant variation sources [7]. As discussed in section 3.2.2, contact mismatch can cause additional transistor variation.

We use TCAD simulation to verify the current variation due to contact overlay in DPL. We assume a maximum overlay of 7.5nm, which means that the overlay case in Fig. 3.3(d) has a +15nm S/D contact space variation. Since single patterning has no space variation between S/D contacts, zero space variation in Fig. 3.12 corresponds to the single patterning result. As expected, if we form the S/D contacts with different masks in DPL, the ∆Ion variation due to overlay becomes almost twice that of single patterning. Therefore, we need to print S/D contacts on the same mask to remove the additional variation due to overlay in DPL.

Figure 3.12: ∆Ion variation for various overlay cases.

Our decomposition framework can be used to address this requirement. Fig. 3.13(a) shows the contact layer to be decomposed; the contact pairs violating the minimum spacing rule are (C,G), (F,I), (H,K), and (J,N) in the figure, because the spacing within each pair is less than mins. In the graph representation, the violating pairs are repulsive nodes, represented by fine (red) lines in Fig. 3.13(c). Since the S/D contacts of a transistor need to be assigned to the same mask, we insert strongly connected edges between S/D contacts. Strongly connected edges are not allowed to be cut during graph partitioning. To guarantee this, we group strongly connected nodes into a virtual node in Fig. 3.13(d). By grouping along strongly connected edges, we ensure that strongly connected nodes are partitioned to the same mask. The corresponding decomposed result is shown in Fig. 3.13(b). Even though we force S/D contacts to be on the same mask, the area increase is negligible if the S/D contact space is larger than mins defined in the design rules, because the proposed decomposition does not generate additional mins violations.
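The contact flow can be sketched in Python as follows: S/D pairs are merged into virtual nodes with a union-find structure, minimum-spacing violations become repulsive relations between the virtual nodes, and a breadth-first 2-coloring assigns masks. Because this example contains only hard constraints, plain 2-coloring stands in here for the min-cut partitioning used in the framework. The violating pairs in the example are those listed for Fig. 3.13(a), while the S/D pairs are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def find(parent: Dict[str, str], x: str) -> str:
    while parent[x] != x:
        parent[x] = parent[parent[x]]    # path halving
        x = parent[x]
    return x


def decompose_contacts(contacts: List[str], sd_pairs: List[Tuple[str, str]],
                       spacing_violations: List[Tuple[str, str]]) -> Optional[Dict[str, int]]:
    """Return a mask (0/1) per contact, or None if no legal assignment exists."""
    parent = {c: c for c in contacts}
    for s, d in sd_pairs:                # strongly connected edges: merge into a virtual node
        parent[find(parent, s)] = find(parent, d)
    adj: Dict[str, List[str]] = {find(parent, c): [] for c in contacts}
    for u, v in spacing_violations:      # repulsive relations between virtual nodes
        ru, rv = find(parent, u), find(parent, v)
        if ru == rv:
            return None                  # an S/D pair itself violates mins
        adj[ru].append(rv)
        adj[rv].append(ru)
    mask: Dict[str, int] = {}
    for root in adj:
        if root in mask:
            continue
        mask[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in mask:
                    mask[v] = 1 - mask[u]
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None          # odd cycle of repulsive relations
    return {c: mask[find(parent, c)] for c in contacts}


contacts = list("CFGHIJKN")
sd_pairs = [("G", "H"), ("I", "J")]      # hypothetical S/D pairs
violations = [("C", "G"), ("F", "I"), ("H", "K"), ("J", "N")]
print(decompose_contacts(contacts, sd_pairs, violations))
```

Merging S/D contacts before coloring is what makes the strongly connected edges uncuttable: once G and H share one virtual node, no assignment can separate them.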


(a) Original contact layer (b) Decomposed contact layer with overlay consideration

(c) Decomposition graph for the contact layer (Bold line: strongly connected edge, Fine (Red) line: repulsive relations)

(d) Simplified graph after contact grouping

Figure 3.13: An application to contact layer decomposition.


3.5 Application to Robust Metal Layer Decomposition

3.5.1 TDD Constraints for Overlay Compensation

We can obtain a decomposed layout that is robust against timing variation due to overlay. Delay variation due to overlay is expressed in Eqn (3.2). Since non-coupling capacitance, such as metal-to-ground capacitance and output loading, is independent of overlay, $\Delta \mathrm{Delay}_{\mathrm{overlay}}$ does not need to consider non-coupling capacitance.

$$\Delta \mathrm{Delay}_{\mathrm{overlay}} = \mathrm{Delay}_{\mathrm{no\ overlay}} - \mathrm{Delay}_{\mathrm{overlay}} \tag{3.2}$$

In Eqn (3.3), let $C_c$ be the coupling capacitance at spacing $d_0$ when there is no overlay, and let $\epsilon$ be the permittivity of an isolation material such as SiO2. When we consider spacing variation due to overlay, $C_c$ becomes $C_{c,\mathrm{overlay}}$, with spacing variation $\Delta d_0$. Let $g$ be the variation of the coupling capacitance between horizontally parallel patterns, and $h$ be the variation of the coupling capacitance between vertically parallel patterns.

$$
\begin{aligned}
C_c &= \frac{\epsilon \, \mathrm{Area}}{d_0}, \qquad \Delta C_c = \frac{d_0}{d_0 + \Delta d_0} \\
C_{c,\mathrm{overlay}} &= C_c \, \Delta C_c = \frac{\epsilon \, \mathrm{Area}}{d_0 + \Delta d_0} \\
g,\ h &= 1 - \Delta C_c = \frac{\Delta d_0}{d_0 + \Delta d_0}
\end{aligned}
\tag{3.3}
$$

Let $a_k$ be the multiplication factor of the coupling capacitance for a horizontally parallel pattern and $b_k$ be the multiplication factor for vertically parallel patterns. $a_k$ and $b_k$ are defined as the $k$-th coupling capacitance times the resistance from the driver to the location of that coupling capacitance. In Eqn (3.4), $R_d$ is the driver resistance and $R_n$ is the $n$-th resistance from the driver. We assume that the coupling capacitance is located at the $m$-th resistance. $MF_k$ is the Miller factor [13, 41] of the $k$-th coupling capacitance for equal timing, determined by the slew rates of the victim and the aggressor.

$$a_k,\ b_k = \left( R_d + \sum_{n=1}^{m-1} R_n + 0.5 R_m \right) MF_k \, C_{c_k} \tag{3.4}$$
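Eqn (3.4) is an Elmore-style product of upstream resistance and capacitance and can be evaluated directly. The Python helper below assumes resistances in ohms and capacitance in farads, treats `m` as a 1-based segment index, and uses made-up numbers in the example; the name `multiplication_factor` is hypothetical.

```python
from typing import Sequence


def multiplication_factor(r_driver: float, segment_res: Sequence[float], m: int,
                          miller: float, c_couple: float) -> float:
    """Eqn (3.4): (R_d + sum_{n=1}^{m-1} R_n + 0.5 R_m) * MF_k * Cc_k, with m 1-based."""
    if not 1 <= m <= len(segment_res):
        raise ValueError("m must index a wire segment (1-based)")
    upstream = r_driver + sum(segment_res[: m - 1]) + 0.5 * segment_res[m - 1]
    return upstream * miller * c_couple


# Coupling at the 3rd segment: (100 + 10 + 20 + 0.5 * 30) ohm * 2 * 1 fF
print(multiplication_factor(100.0, [10.0, 20.0, 30.0], m=3, miller=2.0, c_couple=1e-15))
```

In this example the upstream resistance is 145 ohms, so the factor is about 2.9e-13 s (0.29 ps).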

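$\Delta \mathrm{Delay}_{\mathrm{overlay}}$ can be rewritten as Eqn (3.5), where $i$ denotes the number of horizontal aggressors and $j$ the number of vertical aggressors.

$$
\begin{aligned}
\Delta \mathrm{Delay}_{\mathrm{overlay}} &= a_1 g_1 + \cdots + a_i g_i + b_1 h_1 + \cdots + b_j h_j = A G^T + B H^T \\
\text{where}\quad A &= [a_1\ a_2\ \cdots\ a_i], \qquad G = [g_1\ g_2\ \cdots\ g_i] \\
B &= [b_1\ b_2\ \cdots\ b_j], \qquad H = [h_1\ h_2\ \cdots\ h_j]
\end{aligned}
\tag{3.5}
$$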

Distance variation due to translation overlay is expressed in Eqn (3.6), according to [96]. For simplicity, we focus on translation overlay; our work can be extended to include rotation and magnification overlay. $\lambda$ is the amplitude of translation overlay, and $\theta$ is the translation angle, which is a completely random value. $\gamma$ represents the relative geometric relation between the 1st and the 2nd pattern. Our problem is then to assign $\gamma$ so as to reduce the overlay effect on timing.

$$\Delta d_0 = \lambda \cos(\theta - \gamma) \tag{3.6}$$

From Eqn (3.3) and (3.6), we can derive $g_i$ and $h_j$. Since $g_i$ has two possible values depending on $\gamma$ (note that a horizontally parallel pattern can have two geometric relations: $\gamma = \pi/2$ or $\gamma = 3\pi/2$), $g_i$ is modeled with a binary variable $x_i$ that indicates the geometric relation. For example, $x_i = 1$ means that $\gamma = \pi/2$ is preferred to reduce the overlay effect, while the opposite geometric relation ($\gamma = 3\pi/2$) is preferred if $x_i = 0$. We can simplify $g_i$ when $d_0$ is much larger than $\lambda$. Since the overlay amplitude should be controlled to less than 10% of the space $d_0$ between two patterns, $d_0 + \lambda\sin(\theta)$ can be approximated by $d_0$. $h_j$ has a $\pi/2$ phase difference from $g_i$. After simplification, we present $g_i$ and $h_j$ in Eqn (3.7).

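$$
\begin{aligned}
g_i &= x_i \frac{\lambda\cos(\theta - \pi/2)}{d_0 + \lambda\cos(\theta - \pi/2)} + (1 - x_i) \frac{\lambda\cos(\theta - 3\pi/2)}{d_0 + \lambda\cos(\theta - 3\pi/2)} \approx \frac{\lambda}{d_0}(2x_i - 1)\sin\theta \\
h_j &= y_j \frac{\lambda\cos\theta}{d_0 + \lambda\cos\theta} + (1 - y_j) \frac{\lambda\cos(\theta - \pi)}{d_0 + \lambda\cos(\theta - \pi)} \approx \frac{\lambda}{d_0}(2y_j - 1)\cos\theta
\end{aligned}
\tag{3.7}
$$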


From Eqn (3.5) and (3.7), our cost function for the overlay impact on timing is derived in Eqn (3.8), where the common factor $\lambda/d_0$ is omitted. To minimize the cost function, we need to minimize $\sqrt{\alpha^2 + \beta^2}$, the amplitude of $\sin(\theta + \phi)$; $\phi$ only contributes a phase difference. Since the horizontal and vertical patterns are orthogonal, we can optimize $\alpha^2$ and $\beta^2$ independently, which means that the overlay impact is minimized when $\alpha^2$ and $\beta^2$ each take their minimum values.

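$$
\begin{aligned}
&\sqrt{\alpha^2 + \beta^2}\,\sin(\theta + \phi) \\
&\text{where}\quad \alpha = 2AX^T - \sum_{n=1}^{i} a_n, \qquad \beta = 2BY^T - \sum_{n=1}^{j} b_n \\
&X = [x_1, x_2, \ldots, x_i], \qquad Y = [y_1, y_2, \ldots, y_j] \\
&\phi = \sin^{-1}\!\left( \frac{\beta}{\sqrt{\alpha^2 + \beta^2}} \right)
\end{aligned}
\tag{3.8}
$$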
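The approximation behind Eqns (3.7) and (3.8) can be checked numerically. The Python sketch below evaluates (3.5) with the exact $g_i$ and $h_j$ and compares it against $(\lambda/d_0)\sqrt{\alpha^2+\beta^2}\sin(\theta+\phi)$ over a sweep of $\theta$. It computes $\phi$ with `atan2(beta, alpha)`, which agrees with $\sin^{-1}(\beta/\sqrt{\alpha^2+\beta^2})$ when $\alpha \ge 0$ and also handles $\alpha < 0$. The coefficients and the ratio $\lambda/d_0 = 0.05$ are illustrative values, and the function names are hypothetical.

```python
import math
from typing import Sequence


def _ratio(lam: float, d0: float, theta: float, gamma: float) -> float:
    """Exact Delta d0 / (d0 + Delta d0) with Delta d0 = lambda cos(theta - gamma)."""
    dd = lam * math.cos(theta - gamma)
    return dd / (d0 + dd)


def exact_delta_delay(a: Sequence[float], b: Sequence[float], x: Sequence[int],
                      y: Sequence[int], lam: float, d0: float, theta: float) -> float:
    """Eqn (3.5) using the un-approximated g_i and h_j of Eqn (3.7)."""
    g = [_ratio(lam, d0, theta, math.pi / 2 if xi else 3 * math.pi / 2) for xi in x]
    h = [_ratio(lam, d0, theta, 0.0 if yj else math.pi) for yj in y]
    return sum(ai * gi for ai, gi in zip(a, g)) + sum(bj * hj for bj, hj in zip(b, h))


def approx_delta_delay(a: Sequence[float], b: Sequence[float], x: Sequence[int],
                       y: Sequence[int], lam: float, d0: float, theta: float) -> float:
    """(lambda / d0) * sqrt(alpha^2 + beta^2) * sin(theta + phi), following Eqn (3.8)."""
    alpha = sum(ai * (2 * xi - 1) for ai, xi in zip(a, x))
    beta = sum(bj * (2 * yj - 1) for bj, yj in zip(b, y))
    phi = math.atan2(beta, alpha)
    return lam / d0 * math.hypot(alpha, beta) * math.sin(theta + phi)


a, b, x, y = [3.0, 1.0], [2.0, 2.0], [1, 0], [1, 1]   # illustrative coefficients
lam, d0 = 0.05, 1.0
thetas = [2 * math.pi * k / 360 for k in range(361)]
max_err = max(abs(exact_delta_delay(a, b, x, y, lam, d0, t)
                  - approx_delta_delay(a, b, x, y, lam, d0, t)) for t in thetas)
bound = sum(a + b) * lam ** 2 / (d0 * (d0 - lam))
print(f"max error {max_err:.4f} <= bound {bound:.4f}")
```

Each term's error is at most its coefficient times $\lambda^2/(d_0(d_0-\lambda))$, so the discrepancy shrinks quadratically as $\lambda/d_0$ decreases.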

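Finally, we propose an ILP formulation to find the relative positions of horizontally parallel patterns that minimize $\alpha^2$.

$$
\begin{aligned}
\text{minimize}\quad & 4 \sum_{n=1}^{i} a_n \left( A W_n^T - w_{nn} \sum_{p=1}^{i} a_p \right) + \left( \sum_{p=1}^{i} a_p \right)^2 \\
\text{s.t.}\quad & w_{ii} = x_i \\
& 1 + w_{ij} \ge x_i + x_j \\
& x_i \ge w_{ij} \\
& x_j \ge w_{ij}
\end{aligned}
\tag{3.9}
$$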


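New binary variables $w_{ij}$ are introduced to convert the quadratic integer program into a linear integer program. $x_i$ and $w_{ij}$ are binary variables in (3.9), and $W_n$ is defined as follows.

$$W_n = [w_{n1}\ w_{n2}\ \cdots\ w_{ni}] \tag{3.10}$$

For binary $x_i$ and $x_j$, the last three constraints of (3.9) force $w_{ij} = x_i x_j$, so the objective of (3.9) is exactly the expansion of $\alpha^2 = \left(2AX^T - \sum_{p=1}^{i} a_p\right)^2$. Similarly, we can obtain a solution set $Y$ that minimizes $\beta^2$ by substituting $b$ for $a$ and $y$ for $x$ in the above ILP formulation.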
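For small $i$, the ILP in (3.9) can be cross-checked by brute force. The Python sketch below enumerates all $2^i$ assignments of $X$, minimizes $\alpha^2$ directly, and confirms that the linearized objective of (3.9), evaluated with $w_{np} = x_n x_p$, equals $\alpha^2$. The coefficient values are illustrative, the function names are hypothetical, and exhaustive search is a reference check rather than a substitute for the ILP solver.

```python
from itertools import product
from typing import List, Sequence, Tuple


def alpha(a: Sequence[float], x: Sequence[int]) -> float:
    """alpha = 2 A X^T - sum(a), as in Eqn (3.8)."""
    return 2 * sum(an * xn for an, xn in zip(a, x)) - sum(a)


def feasible_w(xi: int, xj: int) -> List[int]:
    """Binary w_ij allowed by 1 + w_ij >= x_i + x_j, x_i >= w_ij, x_j >= w_ij."""
    return [w for w in (0, 1) if 1 + w >= xi + xj and xi >= w and xj >= w]


def linearized_objective(a: Sequence[float], x: Sequence[int]) -> float:
    """Objective of Eqn (3.9), evaluated with w_np = x_n * x_p."""
    s = sum(a)
    w = [[xn * xp for xp in x] for xn in x]
    return 4 * sum(
        an * (sum(ap * w[n][p] for p, ap in enumerate(a)) - w[n][n] * s)
        for n, an in enumerate(a)
    ) + s * s


def best_assignment(a: Sequence[float]) -> Tuple[float, List[int]]:
    """Exhaustive minimum of alpha^2 over all 2^i assignments (small i only)."""
    best = min(product((0, 1), repeat=len(a)), key=lambda x: alpha(a, x) ** 2)
    return alpha(a, best) ** 2, list(best)


print(best_assignment([3, 1, 2]))   # (0, [0, 1, 1]): {3} balanced against {1, 2}
```

Since $\alpha$ is the difference between the sums of the $a_n$ with $x_n = 1$ and those with $x_n = 0$, minimizing $\alpha^2$ amounts to splitting the coefficients into two sets whose sums are as equal as possible.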

Figure 3.14: An example with four neighbors.

Fig. 3.14 shows an example pattern to which we apply the proposed method. $a_1$, $a_2$, $b_1$, and $b_2$, corresponding to the individual coupling capacitances, are presented in Fig. 3.14. Here, there are two horizontally parallel aggressors and two vertically parallel aggressors.

$$
\begin{aligned}
\alpha &= a_1(2x_1 - 1) + a_2(2x_2 - 1) \\
\beta &= b_1(2y_1 - 1) + b_2(2y_2 - 1)
\end{aligned}
\tag{3.11}
$$

$\alpha$ and $\beta$ for Fig. 3.14 are shown in Eqn (3.11). Using the proposed ILP formulation in Eqn (3.9), we can determine $x_1$ and $x_2$ to minimize $\alpha^2$, and $y_1$ and $y_2$ to minimize $\beta^2$, respectively. Here, $x_1$ and $x_2$ represent the relative decomposition for $C_{c1}$ and $C_{c2}$, as shown in Fig. 3.15.

(a) Overlay compensation with three stitches

(b) The same compensation with two stitches

Figure 3.15: Different decomposition with TDD.

After applying our ILP formulation, we can obtain two different decomposition results, as shown in Fig. 3.15. Three stitches are inserted in Fig. 3.15(a). However, a different solution that achieves the same overlay compensation requires only two stitches. We now present our combined approach of TDD constraints and graph-based stitch minimization, which optimizes stitches under the TDD constraints.

3.5.2 TDD Constraints aware Decomposition

From the previous subsection, we obtain the constraints for TDD: $X$ and $Y$ are determined by the ILP formulation. Let $X$ be partitioned into $X_0$ and $X_1$. In Fig. 3.15(a), $x_1$ belongs to $X_0$ while $x_2$ belongs to $X_1$. Since the complement of $x_2$ is $x_2'$, we can express the constraint as $X_0 = \{x_1, x_2'\}$. Since the elements of $X_0$ need to have the same color, $x_1$ should be forced into the same partition as $x_2'$. We can keep them in the same partition by adding a weighted edge between them. The allowed stitch increase can be controlled by adjusting the weight: decomposition with more overlay compensation requires a larger edge weight, which results in more stitches.

(a) After relative coloring (b) Group color assignment without TDD constraints

(c) TDD constraints insertion

(d) Group color assignment when the edge weight (w) is bigger than one

Figure 3.16: Decomposition with TDD constraints.

Fig. 3.16 shows the graph-based TDD procedure. The grouping and relative coloring result is presented in Fig. 3.16(a). Graph-based decomposition without TDD constraints is shown in Fig. 3.16(b). Two edges are inserted to account for the TDD constraints in Fig. 3.16(c). When the weight, denoted by w, is bigger than one, the new partition result is shown in Fig. 3.16(d). Two stitches are inserted for TDD, and the decomposition result matches Fig. 3.15(b).

3.6 Application to Hierarchical Decomposition

Another requirement for layout decomposition is support for hierarchical decomposition. Since standard cells are optimized with manual effort, changing the coloring within a standard cell may not be preferred. Our proposed method can be applied to hierarchical decomposition as well as flattened decomposition. In Fig. 3.17(a), we show the flattened decomposition of connected cells (inverter-nor-inverter). After decomposition graph generation and min-cut partitioning, we obtain the decomposed layout with minimum stitch insertion in Fig. 3.17(b). We can observe three stitches in both the decomposition graph and the decomposed layout.

We show the decomposition flow with pre-decomposed standard cells in Fig. 3.18. Let Inverter1 be pre-decomposed into I1 and I1′, the Nor cell into N1 and N1′, and Inverter2 into I2 and I2′. Then, we can construct the decomposition graph based on the pre-decomposed metal layer, which dramatically reduces the size of the decomposition graph: when N cells need to be decomposed, the number of nodes in the decomposition graph becomes 2N. After min-cut partitioning of the decomposition graph, we can generate the hierarchically decomposed layout in Fig. 3.18(c). Only Inverter2 flips its coloring to minimize the number of stitches. Our proposed decomposition preserves the pre-assigned relative colors by allowing only color flipping.

3.7 Experimental Results

We implement the decomposition framework in C++ with OpenAccess 2.2 to interface with GDSII directly. We test our algorithm on a 3.0GHz Linux machine with 4GB RAM.

First, we present the decomposed layout for the metal 1 layer in Fig. 3.19. The minimum metal width and space used in Fig. 3.19(a) are 32nm and 34nm, respectively. Our framework works well for this complicated metal patterning. We verify that there is no design rule violation after decomposition; 13 stitches are inserted to resolve rule violations.

Second, we show the scalability and runtime of our algorithm. The poly-silicon layer in the benchmark circuits is scaled down to 40nm half pitch, and ISCAS-85 and ISCAS-89 benchmark circuits are used to verify scalability. Before decomposition, the minimum space between poly-silicon shapes was 40nm. We select mins = 42nm and minom = 10nm for decomposition to avoid native conflicts, which would otherwise have to be removed by layout modification. ISCAS-89 circuits have many native conflicts when mins is bigger than 43nm, because the ISCAS-89 benchmarks we have are not designed to be double-patterning friendly. Table 3.1 shows the runtime of decomposition as the design size increases. S38584, which is the biggest circuit in the benchmark, can be decomposed evenly in 285.24s. Since mins = 42nm is close to the 40nm minimum space, only a few stitches are required to resolve the conflicts. The graph in Fig. 3.20(a) supports that our min-cut partitioning based approach has linear time complexity, as mentioned in the previous section. Fig. 3.20(b) shows the total runtime of decomposition versus circuit size.

(a) Decomposition graph with cell abutment (Inverter-Nor2-Inverter)

(b) Corresponding decomposed layout

Figure 3.17: Flattened layout decomposition.

(a) Initial decomposed standard cells (b) Hierarchical decomposition graph

(c) Hierarchically decomposed layout

Figure 3.18: Hierarchical layout decomposition.

(a) Metal 1 decomposition with 13 stitch insertions

(b) First mask (c) Second mask

Figure 3.19: Metal 1 decomposition with our framework.

(a) Graph partition time (b) Total runtime

Figure 3.20: Runtime increase as circuits become bigger.

Table 3.1: Runtime comparison (mins = 42nm); runtimes in seconds

Circuit    #Group  #Touching     Except    Min-cut    Total  Inserted   Balance
                   neighbors  partition  partition           stitches  ratio(%)
C432         1554        763       0.48       0.01     0.49         0     48.18
C499         3503       2260       1.12       0.23     1.35         0     48.08
C880         3105       1308       0.86       0.07     0.93         0     48.10
C1355        4630       2091       1.39       0.21     1.60         1     48.05
C1908        7403       3447       2.81       0.52     3.33         0     48.23
C2670       11325       5291       3.32       0.92     4.24         0     48.05
C3540       13934       6062       4.71       1.26     5.97         0     48.02
C5315       20393       9382       7.19       2.01     9.20         5     48.02
C6288       18836       7764       6.14       1.27     7.41         0     48.02
C7552       29642      13344      11.84       3.13    14.97         0     48.03
S1488        5952       2558       1.57       0.29     1.86         0     48.01
S15850       7983       3282     130.38       9.78   140.16         0     48.04
S35932     188556      75943     263.29      14.21   277.50         0     48.01
S38417     195448      74311     270.43      18.49   288.92         0     48.00
S38584     188298      72342     262.66      22.58   285.24         1     48.01

Third, we verify the quality and efficiency of our framework. Table 3.2 shows the results of our ILP formulation in (3.1), and Table 3.3 shows our heuristic results based on min-cut partitioning. We compare the runtime and the number of inserted stitches during layout decomposition. The GLPK (GNU Linear Programming Kit) solver is used for ILP solving. ILP-based decomposition becomes intractable as circuit size increases, and we cannot decompose even small circuits such as C499 with ILP in one piece, so we divide each circuit into several parts by cell row. Since the poly-silicon patterns of one cell row are isolated from the rest of the circuit, decomposition after dividing the circuit into rows still provides an exact solution with the ILP formulation. Note that our ILP implementation provides an optimal solution for minimum stitch insertion at the cost of longer runtime, because it does not use any of the speed-up techniques described in [43, 99], which may sacrifice optimality. We use mins=54nm and minom=20nm to enable more potential stitch insertions. Since ISCAS-89 benchmarks have native conflicts when mins is below 54nm, we do not show their results in Table 3.2 and Table 3.3. When mins is bigger than 54nm, there are native conflicts on several ISCAS-85 benchmarks. Therefore, we choose mins=54nm based on the availability of decomposition, which is enough to show the efficiency of our framework. The quality of the min-cut graph partitioning depends on the initial partitioning. Thus, we execute the modified min-cut partitioning twenty times and pick the result with the fewest stitches; the reported runtime is the total over all twenty runs. When we do not consider balanced density, min-cut partitioning and ILP-based decomposition insert the same number of stitches for every circuit except C1908, for which our approach inserts one more stitch. The graph theoretic decomposition is 1.4 to 9762.6 times faster than the ILP-based decomposition.

Table 3.2: Decomposition results with ISCAS benchmark (mins = 54nm, minom = 20nm), ILP solver (exact method)

No balance, ILP (exact)
Circuit   #Groups  #Touching #Partitions   Runtime  Inserted  Balanced
                   neighbors     for ILP total (s)  stitches  ratio(%)
C432         1512       1098           1      0.63         1     20.35
C499         3103       3280          12    100.85        50     24.01
C880         3758       2631          14   4525.57       198     30.09
C1355        4836       3083          18    702.40       114     18.91
C1908        7795       5472          18  37019.76       371     22.09
C2670       12863       9905           -     >24Hr         -         -
C3540       16638      12021           -     >24Hr         -         -
C5315       24483      18373           -     >24Hr         -         -
C6288       19922      11577           -     >24Hr         -         -
C7552       34309      24789           -     >24Hr         -         -

Table 3.3: Decomposition results with ISCAS benchmark (mins = 54nm, minom = 20nm), graph partition (proposed heuristic)

         No balance                              48% balance
Circuit      Speedup  Runtime  Inserted  Balanced    Speedup  Runtime  Inserted  Balanced
             vs. ILP      (s)  stitches  ratio(%)    vs. ILP      (s)  stitches  ratio(%)
C432            x1.4     0.46         1     33.60       x1.0     0.65         2     48.12
C499           x49.9     2.02        50     46.47      x49.9     2.02        50     48.50
C880         x2773.0     1.63       198     47.12    x2807.4     1.61       198     48.87
C1355         x347.4     2.02       114     36.12     x344.0     2.04       114     48.00
C1908        x9762.6     3.79       372     46.78   x10422.2     3.55       373     48.66
C2670              -     6.70       947     43.51          -     6.87       948     49.30
C3540              -     9.85      1034     41.46          -    10.07      1034     49.39
C5315              -    17.43      1546     40.87          -    18.50      1549     48.00
C6288              -    11.57       256     30.81          -    11.25       256     48.13
C7552              -    30.89      2058     41.97          -    31.52      2060     48.02
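The twenty-run procedure is a multi-start search. Because the min-cut result depends on the initial partition, the partitioner is rerun from different random starting points and the run with the fewest stitches is kept. A minimal Python sketch of that outer loop follows. `best_of_n` and the callback `partition_once(rng)`, assumed to return a `(stitch_count, assignment)` pair, are hypothetical names, and the modified min-cut partitioner itself is not reproduced here.

```python
import random
from typing import Any, Callable, Tuple

Result = Tuple[int, Any]  # (stitch_count, mask_assignment)

def best_of_n(partition_once: Callable[[random.Random], Result],
              n_runs: int = 20, seed: int = 0) -> Result:
    """Multi-start wrapper: rerun a randomized partitioner and keep the
    result with the fewest inserted stitches (ties keep the earlier run)."""
    rng = random.Random(seed)  # one seeded stream shared by all runs
    best = None
    for _ in range(n_runs):
        result = partition_once(rng)  # hypothetical partitioner callback
        if best is None or result[0] < best[0]:
            best = result
    return best

if __name__ == "__main__":
    # Toy stand-in for the partitioner: random stitch counts, illustration only.
    toy = lambda rng: (rng.randint(0, 10), "assignment")
    print(best_of_n(toy))
```

Seeding one shared `random.Random` stream makes the twenty runs reproducible while still giving each run a different initial partition.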

(a) Blue(13%), Green(87%) (b) Blue(50%), Green(50%)

Figure 3.21: Extremely unbalanced decomposition (S38584).

Fig. 3.21(a) shows an extremely unbalanced decomposition. Even though every rule violation is removed in Fig. 3.21(a), patterning quality can be further improved with balanced partitioning.

Fig. 3.22 compares the unbalanced and balanced decompositions of C432 at mins=60nm and minom=20nm. The unbalanced decomposition has nine stitches with a 27% balanced ratio, while the balanced decomposition has 17 stitches with a globally 50% balanced ratio. In addition, Fig. 3.22(b) is locally balanced because we enforce a local density balance between 40% and 60% in each cell row. The result also shows the trade-off between stitch count and density balance.

(a) Yellow(27%), Blue(73%) (b) Yellow(50%), Blue(50%)

Figure 3.22: C432 decomposed layout.
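The global and local balance criteria can be checked directly from per-row mask areas. The captions of Fig. 3.21 and Fig. 3.22 suggest that the balanced ratio is the smaller mask's share of the total pattern area, and the sketch below assumes that definition. The function names and the `(area_mask1, area_mask2)` per-row input format are hypothetical.

```python
def balance_ratio(area_mask1: float, area_mask2: float) -> float:
    """Share of the smaller mask in the total pattern area (0.5 = perfect balance)."""
    total = area_mask1 + area_mask2
    if total <= 0:
        raise ValueError("total area must be positive")
    return min(area_mask1, area_mask2) / total

def rows_locally_balanced(rows, lo: float = 0.40, hi: float = 0.60) -> bool:
    """True if mask 1 covers between lo and hi of the pattern area in every cell row.
    rows: iterable of (area_mask1, area_mask2) pairs; rows without patterns are skipped."""
    return all(lo <= a / (a + b) <= hi for a, b in rows if a + b > 0)

print(balance_ratio(27, 73))                        # the unbalanced C432 split
print(rows_locally_balanced([(50, 50), (45, 55)]))
```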

OPC and lithography simulation are executed using CALIBRE for the two cases in Fig. 3.22. Edge Placement Error (EPE) distributions after OPC are compared in Fig. 3.23. The balanced decomposition in Fig. 3.22(b) shows a lower EPE distribution than the unbalanced decomposition in Fig. 3.22(a), which indicates that the balanced decomposition has less variation than the unbalanced one.

Figure 3.23: The balanced decomposed layout has less EPE than the unbalanced one (C432).

Next, we verify how effectively the proposed method reduces the variation caused by overlay between contact layers in DPL. Fig. 3.24 compares the delay variation of a path consisting of five inverters. We run 50,000 Monte-Carlo simulations based on the TCAD simulation in Fig. 3.12. At the 3σ level, the delay variation is reduced from ±1% to approximately ±0.5% with the proposed decomposition method.

Last, we verify the effectiveness of timing driven decomposition. As test circuits, we built three net structures with a metal spacing of 32nm, assuming 3nm of overlay, as listed in Table 3.4. The number of inserted stitches can be controlled by changing the edge weight. For example, for Net3 one stitch is inserted at an edge weight of 0.2, which gives 41.4% overlay compensation. Since the maximum peak-to-peak delay variation caused by overlay is 9.0% for Net3, the timing variation due to overlay becomes 5.274% (= 9% × (1 - 0.414)) after one stitch insertion, and 1.098% (= 9% × (1 - 0.878)) after three stitch insertions. As the edge weight increases, more stitches are inserted and the overlay compensation rate rises.
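The residual-variation arithmetic above applies to every row of Table 3.4. The short Python check below uses the Net3 values (9.0% peak variation, with 41.4%, 87.8%, and 99.8% compensation). `residual_variation` is an illustrative helper, not part of the decomposition framework.

```python
def residual_variation(peak: float, compensation: float) -> float:
    """Delay variation left after stitch insertion: peak * (1 - compensation rate)."""
    if not 0.0 <= compensation <= 1.0:
        raise ValueError("compensation rate must be in [0, 1]")
    return peak * (1.0 - compensation)

NET3_PEAK = 0.090  # 9.0% peak-to-peak delay variation of Net3 (Table 3.4)
for weight, stitches, rate in [(0.2, 1, 0.414), (0.5, 3, 0.878), (1, 9, 0.998)]:
    left = residual_variation(NET3_PEAK, rate)
    print(f"edge weight {weight}: {stitches} stitch(es), residual {100 * left:.3f}%")
```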

Figure 3.24: Delay variation for an inverter chain to verify the robust contact decomposition in DPL.

Fig. 3.25 compares overlay compensation for different numbers of inserted stitches in Net3. As more stitches are inserted, the timing fluctuation along the translation angle decreases. When nine stitches are inserted, there is virtually no timing variation due to overlay. The bottom peak of the variation is not symmetric with the top peak because of the second-order terms in sin(θ) and cos(θ) in Eqn. (3.7). Note that ignoring the second-order term, under the assumption that d0 + α is approximately equal to d0, is reasonable because we achieve more than 95% overlay compensation in every test structure.

Table 3.4: Overlay compensation with TDD

Test     #Horizontal #Vertical     Timing    Edge  Inserted  Compensation
circuit    neighbors neighbors  variation  weight  stitches      rate (%)
                               on overlay
net1               3         6       9.3%     0.2         0           0.0
                                              0.5         1          76.3
                                                1         3          95.9
                                              100         3          95.9
net2              10         9       8.1%     0.2         0           0.0
                                              0.5         0           0.0
                                                1         3          54.6
                                              100        10          99.9
net3              16        16       9.0%     0.2         1          41.4
                                              0.5         3          87.8
                                                1         9          99.8
                                              100         9          99.8

3.8 Discussions

In this chapter, we propose an efficient and flexible layout decomposition framework based on a graph theoretic approach. All the benchmark circuits can be decomposed with balanced density in under five minutes. Our framework can speed up decomposition flows that require iterative runs and layout fixes to remove native conflicts. Since new constraints are easy to add to the framework, we extend it to timing driven decomposition, which reduces the timing variation due to overlay. As future work, the framework can be extended to correlation aware decomposition and to decomposition into more than two masks using a multi-way partitioning algorithm.


Figure 3.25: Reduction of timing variation as more stitches are inserted (Net3).


Chapter 4

TSV Stress aware Timing Analysis and Layout Optimization for 3D-ICs

4.1 Introduction

As discussed in the previous chapters, geometric scaling has been facing limitations. To sustain Moore's law, 3D-IC stacking has gained tremendous interest because integration with through-silicon vias (TSVs) can increase chip performance as well as chip density [14, 46, 49, 102]. In addition, two chips manufactured by different processes can be integrated into one chip with 3D integration. The coefficient of thermal expansion (CTE) of copper is 17×10^-6 K^-1 at 20°C, while that of silicon is 3×10^-6 K^-1 at 20°C [19]. The CTE mismatch between copper and silicon causes inevitable stress on silicon for both via-first and via-last approaches. This stress changes carrier mobility. Therefore, TSV stress induced by CTE mismatch may cause timing violations if cells on a critical path are placed near TSVs.

Tensile stress enhances electron mobility. However, hole mobility is either enhanced or degraded depending on the stress and the transistor channel direction: longitudinal tensile stress reduces hole mobility, while transverse tensile stress increases it [80]. When the TSV-induced tensile stress is 100MPa and acts in the longitudinal direction, hole mobility degradation can be up to 7.2%, which slows down PMOS transitions. If such a PMOS is on a critical path, it can cause an unexpected setup time violation that the current timing analysis flow does not detect.

Even though several papers have addressed TSV stress for reliability, to the best of our knowledge this is the first work addressing TSV stress from the circuit design perspective. In this chapter, we propose a design flow to analyze TSV stress induced timing variation, and show its implications for layout optimization during 3D-IC design. The first step of our framework is to generate a stress map, assuming that the TSVs are pre-placed. The stress calculation is based on an analytical model and linear superposition. The stress map is used to estimate hole and electron mobility variation. Every cell near TSVs has a different mobility depending on the stress and on the orientation between its channel and the TSV. We therefore replace each such cell with another cell that has the same topology but different timing characteristics according to the estimated hole and electron mobility change.

To show the benefit of our framework, we demonstrate that TSV stress aware design plays an important role in timing optimization. Cell locations can be adjusted to take advantage of the mobility enhancement caused by TSV stress. Since the hole mobility contour differs from the electron mobility contour, PMOS and NMOS should be optimized separately. If a PMOS in a cell is on a critical path, the cell becomes a critical cell for hole mobility optimization. An NMOS critical cell can be optimally placed using a similar procedure [3, 64].

The rest of the chapter is organized as follows. We introduce related work on strained silicon technology and stress impacts in section 4.2. The overall stress-aware timing analysis and design flow is shown in section 4.3. We propose compact mobility modeling in section 4.4. In section 4.5, we explain how to analyze timing with TSV stress. Experimental results are shown in section 4.6, and discussions are presented in section 4.7.

4.2 Related Work and Motivation

Formula (4.1) shows the relation between stress and strain. E is Young's modulus, which is 160GPa for silicon, σ is the applied stress, and ε is the strain (deformation rate). For example, a stress of 160MPa in silicon results in 0.1% strain.

$$ \sigma = E \times \varepsilon \qquad (4.1) $$
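As a check on the numerical example above, rearranging (4.1) for the strain gives:

```latex
\varepsilon = \frac{\sigma}{E}
            = \frac{160 \times 10^{6}\,\mathrm{Pa}}{160 \times 10^{9}\,\mathrm{Pa}}
            = 10^{-3} = 0.1\%
```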

Strained silicon has been used to enhance the Ion of transistors [81]. However, there are several unwanted stress sources that should be considered during the design phase. Shallow trench isolation (STI) is one such unintentional stress source [42, 62], because the SiO2 used for STI fill pushes out silicon atoms near the STI.

During 3D-IC manufacturing, another stress is caused by the CTE mismatch between the copper TSV and silicon, as shown in Fig. 4.1. Investigations [75] show that at 200°C an anneal time of 30-60 minutes is required to achieve reasonable copper layer properties. Since the CTE of copper is larger than that of silicon, copper contracts more on cooling, so at room temperature it occupies less volume than during the annealing process. Several papers [19, 54] have simulated the TSV-induced stress using finite element analysis (FEA). They show that a TSV can cause tensile stress of more than 200MPa.

Figure 4.1: Thermal stress around TSV.

The change in mobility (µ) as a function of applied stress (σ) has been modeled by the following formula [77]:

$$ \frac{\Delta\mu}{\mu} = -\Pi \times \sigma \qquad (4.2) $$

where Π is the tensor of piezo-resistive coefficients for holes and electrons, and σ is the applied stress in silicon. Tensile stress has a positive sign and compressive stress has a negative sign.

Since tensile stress increases the mean free path of electrons, it enhances NMOS performance. However, longitudinal tensile stress degrades PMOS performance, as shown in Fig. 4.2(a) [36]. With longitudinal stress, the piezo-resistive coefficient for electrons is -3.16×10^-10 Pa^-1 and the coefficient for holes is 7.18×10^-10 Pa^-1 for a (001) wafer surface and 〈110〉 channel, which is the most popular scheme in semiconductor manufacturing [77, 79]. For example, when the TSV stress is 200MPa, (∆µ/µ)e is +6.32% for NMOS and (∆µ/µ)h is -14.36% for PMOS.

(a) ∆µ/µ for longitudinal tensile stress

(b) ∆µ/µ for transverse tensile stress

Figure 4.2: Mobility change due to tensile stress.

However, if the TSV is placed perpendicular to a transistor channel, the mobility of both holes and electrons is enhanced, because the stress adds more space in the silicon lattice for carriers to move faster. For transverse stress, the piezo-resistive coefficient for electrons is -1.76×10^-10 Pa^-1 and the coefficient for holes is -6.63×10^-10 Pa^-1 for a (001) surface and 〈110〉 channel. Similarly, with σ=200MPa we can expect (∆µ/µ)e = +3.52% and (∆µ/µ)h = +13.26%. Empirically, (∆Ion/Ion)pmos is known to be 0.5∼0.9 times (∆µ/µ)h, and (∆Ion/Ion)nmos 0.4∼0.6 times (∆µ/µ)e [56, 84], because the Ion of a transistor is determined by the sum of the source, drain, and channel resistances. Therefore, TSV stress aware timing analysis and layout optimization are essential steps for 3D-IC design.
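The mobility changes quoted above follow directly from (4.2). The Python sketch below recomputes them from the four piezo-resistive coefficients and turns the empirical Ion ratios into a range. The dictionary layout and function names are illustrative, and tensile stress is taken as positive, as in (4.2).

```python
# Piezo-resistive coefficients in Pa^-1 for a (001) surface and <110> channel,
# as quoted in Section 4.2 ("n" = electrons/NMOS, "p" = holes/PMOS).
PI = {
    ("n", "longitudinal"): -3.16e-10,
    ("p", "longitudinal"): 7.18e-10,
    ("n", "transverse"): -1.76e-10,
    ("p", "transverse"): -6.63e-10,
}

# Empirical (dIon/Ion) / (dmu/mu) ratio ranges quoted in the text.
ION_RATIO = {"p": (0.5, 0.9), "n": (0.4, 0.6)}

def dmu_over_mu(carrier: str, direction: str, sigma_pa: float) -> float:
    """Eq. (4.2): dmu/mu = -Pi * sigma, with tensile stress positive."""
    return -PI[(carrier, direction)] * sigma_pa

def dion_over_ion(carrier: str, direction: str, sigma_pa: float):
    """(low, high) bounds on dIon/Ion from the empirical ratio range."""
    d = dmu_over_mu(carrier, direction, sigma_pa)
    lo, hi = ION_RATIO[carrier]
    return tuple(sorted((lo * d, hi * d)))

for carrier in ("n", "p"):
    for direction in ("longitudinal", "transverse"):
        print(carrier, direction, f"{100 * dmu_over_mu(carrier, direction, 200e6):+.2f}%")
```

At 200MPa this reproduces the four values in the text (+6.32%, +3.52%, -14.36%, +13.26%). At 100MPa longitudinal stress it gives the roughly 7.2% hole mobility degradation cited in the introduction.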

4.3 Overall TSV Stress aware Design Flow

The overall flow of our 3D-IC design methodology is shown in Fig. 4.3. Our timing analysis consists of two steps. The first step is to calculate TSV stress and mobility change. FEA simulation provides an accurate solution but takes several hours to simulate the stress of one TSV, so we use the analytical model proposed in [54]. Mobility change can be calculated by an extension of formula (4.2). We explain the process and device modeling for a single TSV in section 4.4.1, and extend it to multiple TSVs in section 4.4.2. The second step is 3D timing analysis with TSV stress. We use PrimeTime as the static timing analysis (STA) engine. In section 4.5, we explain how the Verilog netlist and timing library are handled to consider mobility variation. The timing result can be used for layout optimization. Intuitively, if a PMOS in a cell is on a critical path, the cell should be moved to a region with positive (∆µ/µ)h. We can then run timing analysis iteratively to verify the optimization effect.

Figure 4.3: Overall flow for TSV stress aware design.

4.4 TSV Stress and Mobility Variation Modeling

In this section, we present compact process and device modeling to capture the TSV stress effect on timing.

4.4.1 Mobility Variation for a Single TSV

In this section, we assume a cylindrical TSV, which is widely used for better manufacturability. FEA-based TSV simulation has been proposed [19, 54]. These simulation approaches provide accurate solutions but with long runtimes. That is not acceptable for our design flow, which must recalculate the stress of several thousand TSVs after each optimization iteration. Assuming 2-D radial plane stress, we use the following analytical solution, known as the Lamé stress solution [54].

$$ \sigma_{rr} = -\frac{B\,\Delta\alpha\,\Delta T}{2}\left(\frac{R}{r}\right)^{2} \qquad (4.3) $$

The analytical stress model provides a relatively accurate solution [54]. In formula (4.3), B is the biaxial modulus, ∆α is the CTE difference between copper and silicon, and ∆T is the temperature difference between copper annealing and the operating temperature. R is the TSV radius, and r is the distance from the TSV edge. We assume that ∆T is 175°C, corresponding to a room temperature of 25°C and a copper annealing temperature of 200°C, which is a relatively low annealing temperature [75]. The formula shows that the thermal stress near a TSV depends on the ratio of the TSV radius to the distance from the TSV edge.
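A minimal sketch of (4.3) as a direct transcription. The CTE values and ∆T come from the text. The biaxial modulus B is not given in this section, so `B_HYPOTHETICAL = 180e9` Pa is an assumed, illustrative value. The sign is kept exactly as written in (4.3): with positive ∆α and ∆T the result is negative, so the sign convention of ∆T has to match the tensile-positive convention of (4.2) before the two are combined.

```python
# Grounded in the text: CTE values (Section 4.1) and dT (Section 4.4.1).
CTE_CU = 17e-6          # 1/K
CTE_SI = 3e-6           # 1/K
DELTA_T = 175.0         # K (200C anneal minus 25C room temperature)
# Hypothetical: the biaxial modulus B is not given in this section.
B_HYPOTHETICAL = 180e9  # Pa

def sigma_rr(r: float, radius: float, b: float = B_HYPOTHETICAL,
             d_alpha: float = CTE_CU - CTE_SI, d_t: float = DELTA_T) -> float:
    """Radial stress in Pa following the form of Eq. (4.3):
    sigma_rr = -(B * d_alpha * d_t / 2) * (R / r)^2.
    r and radius must use the same length unit; the sign is kept as in (4.3)."""
    if r <= 0:
        raise ValueError("r must be positive")
    return -b * d_alpha * d_t / 2.0 * (radius / r) ** 2

# The stress falls off with the square of R/r: doubling r quarters it.
print(sigma_rr(1.0, 1.5) / sigma_rr(2.0, 1.5))
```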

(a) NMOS mobility variation (b) PMOS mobility variation

(c) Optimal placement for the best NMOS performance

(d) Optimal placement for the best PMOS performance

Figure 4.4: Optimal orientation of MOSFET to maximize the mobility for (001) surface, 〈110〉 channel.

Formula (4.2) provides an efficient way to calculate the mobility variation due to σrr. As observed in section 4.2, the mobility change depends not only on σrr but also on the orientation between the applied force and the transistor channel. Empirical values relating mobility change to channel direction have been reported in [36]. We extend formula (4.2) to consider both stress and channel direction in (4.4).

$$ \frac{\Delta\mu}{\mu}(\theta) = -\Pi \times \sigma_{rr} \times \alpha(\theta), \qquad \theta = \tan^{-1}\left|\frac{Y_{TSV} - Y_{poly}}{X_{TSV} - X_{poly}}\right| \qquad (4.4) $$

where α(θ) is an orientation factor as a function of θ, and θ is the angle between the center of the TSV and the center of the transistor channel when the transistor is placed vertically, as shown in Fig. 4.4(a) and (b). Π is the piezo-resistive coefficient at θ = 0, where the stress acts longitudinally.

In Fig. 4.4(a), if an NMOS is on the right side of the TSV, θ is zero and α(0) = 1, which maximizes the NMOS mobility enhancement. However, if the NMOS is above the TSV, α(π/2) = 0.5, which means that the NMOS mobility increase is half of the enhancement at θ = 0. PMOS shows the opposite trend, with the best mobility enhancement at θ = π/2. If θ is zero, the PMOS becomes slower than in the no-stress case. Fig. 4.4(c) and (d) show the transistor directions for the best performance. Even though mixed channel directions are not allowed due to patterning difficulty, this observation provides a way to optimize layout for 3D-ICs.
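The orientation dependence of (4.4) can be sketched as below. The text gives α(0) = 1 and α(π/2) = 0.5 for NMOS, but not the full empirical α(θ) of [36]. This sketch therefore assumes a hypothetical cos²/sin² blend between α(0) = 1 and α(π/2). It takes α(π/2) = 0.5 for NMOS, and for PMOS the ratio of the transverse to longitudinal hole coefficients from section 4.2 (-6.63/7.18). Stress is tensile positive.

```python
import math

# Longitudinal piezo-resistive coefficients (Pa^-1), Section 4.2.
PI_L = {"n": -3.16e-10, "p": 7.18e-10}
# alpha(pi/2): 0.5 for NMOS as stated in the text; for PMOS the ratio of the
# transverse to the longitudinal hole coefficient. Both serve as endpoints
# of a hypothetical interpolation, not the empirical curve of [36].
ALPHA_90 = {"n": 0.5, "p": -6.63e-10 / 7.18e-10}

def theta(x_tsv, y_tsv, x_poly, y_poly):
    """Angle of Eq. (4.4), atan(|dY/dX|) in [0, pi/2]; atan2 avoids dX = 0."""
    return math.atan2(abs(y_tsv - y_poly), abs(x_tsv - x_poly))

def alpha(th, carrier):
    """Hypothetical orientation factor: alpha(0) = 1, alpha(pi/2) = ALPHA_90."""
    return math.cos(th) ** 2 + ALPHA_90[carrier] * math.sin(th) ** 2

def dmu_over_mu(sigma_rr, th, carrier):
    """Eq. (4.4): dmu/mu(theta) = -Pi * sigma_rr * alpha(theta), tensile positive."""
    return -PI_L[carrier] * sigma_rr * alpha(th, carrier)

# TSV at the origin, 200 MPa tensile stress at the transistor.
right = theta(0, 0, 3, 0)  # transistor to the right of the TSV: theta = 0
above = theta(0, 0, 0, 3)  # transistor above the TSV: theta = pi/2
for name, th in (("right", right), ("above", above)):
    print(name, {c: round(100 * dmu_over_mu(200e6, th, c), 2) for c in ("n", "p")})
```

At 200MPa this reproduces the section 4.2 values at θ = 0 (+6.32% for NMOS, -14.36% for PMOS) and the transverse PMOS value at θ = π/2 (+13.26%). The assumed PMOS blend is also close to zero at 45°.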

We generate a stress contour map based on (4.3). Fig. 4.5 shows the contour for a TSV with a radius of 1.5um. Since the region near a TSV may have cracks or extremely high stress, we define the region within 0.5um of the TSV edge as the Keep-Out-Zone (KOZ), in which no cell is allowed to be placed. Stress of more than 200MPa appears outside the KOZ, and approximately 100MPa appears about 1um from the KOZ edge.

Figure 4.5: Stress contour map for a single TSV with 0.5um KOZ.

Fig. 4.6(a) shows a contour map of hole mobility variation. From the contour, we can see that hole mobility decreases in the horizontal direction, while it increases in the vertical direction. There is no hole mobility change along the 45° direction. The contour map of electron mobility variation is presented in Fig. 4.6(b). As seen in Fig. 4.4(a), the horizontal direction has a larger mobility enhancement zone.

(a) Contour map for hole mobility variation

(b) Contour map for electron mobility variation

Figure 4.6: Mobility contour map for a TSV.

4.4.2 Mobility Variation for Multiple TSVs

Since many TSVs are used for signaling, power/ground, and the clock network, we need to consider the stress effect of multiple TSVs. Each TSV acts as a stress source in silicon. When a position in a wafer is strained by multiple stress sources, linear superposition provides the combined stress solution [54]. We propose the following mobility variation for multiple TSVs:

$$ \left(\frac{\Delta\mu}{\mu}\right)_{total} = \sum_{i \in TSVs} \frac{\Delta\mu}{\mu}(\theta_i) = -\Pi \sum_{i \in TSVs} \left(\sigma_i \times \alpha(\theta_i)\right) \qquad (4.5) $$

where σi is the tensile stress caused by the ith TSV, α(θi) is the orientation factor of the ith TSV, and θi is the angle between the center of the ith TSV and the point at which the mobility variation is evaluated.
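A minimal sketch of the superposition in (4.5), self-contained so that it can run on its own. The per-TSV stress uses the (R/r)^2 dependence of (4.3), with r measured from the TSV edge as in the text. The prefactor `K_HYPOTHETICAL` (standing in for B∆α∆T/2, treated as tensile positive) and the cos²/sin² orientation blend are assumptions, not values or curves from the thesis. The usage lines illustrate how a vertical neighbor partly cancels the hole mobility degradation from a horizontal one.

```python
import math

PI_L = {"n": -3.16e-10, "p": 7.18e-10}            # Pa^-1, longitudinal (Section 4.2)
ALPHA_90 = {"n": 0.5, "p": -6.63e-10 / 7.18e-10}  # hypothetical alpha(pi/2) endpoints
K_HYPOTHETICAL = 220e6  # Pa, illustrative B*d_alpha*dT/2, not a value from the text

def alpha(th, carrier):
    """Hypothetical orientation factor: cos^2 blend of alpha(0) = 1 and alpha(pi/2)."""
    return math.cos(th) ** 2 + ALPHA_90[carrier] * math.sin(th) ** 2

def tsv_stress(dist_from_edge, radius, k=K_HYPOTHETICAL):
    """Tensile-positive stress with the (R/r)^2 dependence of Eq. (4.3)."""
    if dist_from_edge <= 0:
        raise ValueError("point lies inside the TSV")
    return k * (radius / dist_from_edge) ** 2

def total_dmu(point, tsvs, radius, carrier):
    """Eq. (4.5): -Pi * sum_i sigma_i * alpha(theta_i) over all TSV centers."""
    px, py = point
    acc = 0.0
    for tx, ty in tsvs:
        dx, dy = px - tx, py - ty
        sigma_i = tsv_stress(math.hypot(dx, dy) - radius, radius)  # r from TSV edge
        theta_i = math.atan2(abs(dy), abs(dx))                     # angle of Eq. (4.4)
        acc += sigma_i * alpha(theta_i, carrier)
    return -PI_L[carrier] * acc

# Hole mobility at the origin (lengths in um): one horizontal TSV vs. an extra vertical TSV.
one = total_dmu((0.0, 0.0), [(5.0, 0.0)], 1.5, "p")
two = total_dmu((0.0, 0.0), [(5.0, 0.0), (0.0, 5.0)], 1.5, "p")
print(f"{100 * one:+.2f}% -> {100 * two:+.2f}%")
```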

(a) σtotal (b) (∆µ/µ)h (c) (∆µ/µ)e

Figure 4.7: Linear super-position of TSV stress.

Fig. 4.7 shows the stress and mobility variation contours with linear superposition for a four-TSV array. More stress appears in the region between TSVs. The (∆µ/µ)e contour follows a similar trend to the stress contour, whereas (∆µ/µ)h has less variation between TSVs. In Fig. 4.8(a) and (b), we compare (∆µ/µ)h for two TSV placement schemes with the same TSV density. Zigzag TSV placement compensates positive and negative hole mobility changes between adjacent rows. As a result, Fig. 4.8(a) has a larger stress-free zone than Fig. 4.8(b), even though the mobility degradation within a row remains the same. Fig. 4.8(c) and (d) show the electron mobility contours for zigzag and regular TSV placement, respectively; they show no compensation effect. From Fig. 4.8, we can see that zigzag TSV placement is preferred for less PMOS variation, while regular TSV placement is preferred for a larger hole mobility enhancement zone.

(a) Zigzag TSV placement (b) Regular TSV placement

(c) Zigzag TSV placement (d) Regular TSV placement

Figure 4.8: Zigzag TSV placement has less (∆µ/µ)h between rows due to compensation.

4.5 Timing Analysis with TSV Stress Consideration

In this section, we explain how to incorporate the mobility variation into the cell-level STA flow.

4.5.1 Timing Analysis for 3D-ICs

Figure 4.9: Timing corner determination according to mobility variation.

Even if the topology of a cell is the same, its timing characteristics change with its position. Fig. 4.9 shows an example in which cells with the same topology and size fall into different timing corners, systematically determined by TSVs: when two TSVs are near three inverters, each inverter has different characteristics depending on its position. Using formula (4.5), we can determine ∆µ/µ at any point of a given layout. After the mobility calculation, our framework renames cells in the Verilog netlist to include their mobility variation. For example, I2 in Fig. 4.9 is renamed to INVX1_N8_P8, which means -8% hole mobility and +8% electron mobility.
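The renaming step can be sketched as quantizing each mobility change to the library step and encoding it in the cell name. The naming convention here is inferred from the examples in Fig. 4.9 and Table 4.3: hole mobility first, then electron mobility, with N for negative and P for zero or positive. An underscore separator is assumed. The helper names are hypothetical.

```python
def mobility_tag(pct: int) -> str:
    """Signed percentage tag: -8 -> 'N8', 8 -> 'P8', 0 -> 'P0'."""
    return ("N" if pct < 0 else "P") + str(abs(pct))

def quantize(frac: float, step_pct: int = 2) -> int:
    """Round a fractional mobility change (e.g. -0.079) to the nearest library step in %."""
    return step_pct * round(100.0 * frac / step_pct)

def rename_cell(base: str, dmu_h: float, dmu_e: float, step_pct: int = 2) -> str:
    """Hypothetical corner-cell name: <base>_<hole tag>_<electron tag>."""
    hole = mobility_tag(quantize(dmu_h, step_pct))
    elec = mobility_tag(quantize(dmu_e, step_pct))
    return f"{base}_{hole}_{elec}"

print(rename_cell("INVX1", -0.079, 0.081))  # I2 in Fig. 4.9
```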

We prepare a Verilog netlist and a parasitic extraction file (SPEF) per die. In addition, we create a top-level Verilog netlist that instantiates the dies and connects them with wires corresponding to the TSV connections. We then create a top-level SPEF file for the TSV connections. With a proper timing constraints file, we can run PrimeTime and obtain the 3D STA results.
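The top-level netlist described above only instantiates the per-die modules and ties their ports together with one wire per TSV. The Python sketch below emits such a module using Verilog named port connections. The module, instance, port, and net names are hypothetical, and the per-die and top-level SPEF files are not generated here.

```python
from typing import Dict, List, Tuple

def top_level_netlist(top: str, instances: Dict[str, str],
                      tsv_nets: Dict[str, List[Tuple[str, str]]]) -> str:
    """Emit a top-level Verilog module that instantiates each die and ties
    die ports together with one wire per TSV connection.
    instances: instance name -> die module name
    tsv_nets:  wire name -> [(instance name, port name), ...]"""
    conns = {inst: [] for inst in instances}
    for net, pins in tsv_nets.items():
        for inst, port in pins:
            conns[inst].append(f".{port}({net})")  # named port connection
    lines = [f"module {top};"]
    lines += [f"  wire {net};" for net in tsv_nets]
    for inst, module in instances.items():
        lines.append(f"  {module} {inst} ({', '.join(conns[inst])});")
    lines.append("endmodule")
    return "\n".join(lines)

print(top_level_netlist(
    "top3d",
    {"u_die1": "die1", "u_die2": "die2"},
    {"tsv_0": [("u_die1", "out_a"), ("u_die2", "in_a")]},
))
```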


4.5.2 Timing Library for Mobility Variation

Figure 4.10: Timing corner with TSV stress.

To consider the systematic variation during timing analysis, we characterize each cell at different mobility corners, as shown in Fig. 4.10. To cover the stress caused by the TSVs in Fig. 4.9, hole mobility variation ranges from -14% to +8%, and electron mobility variation goes up to +8%. I1 in Fig. 4.9 is matched to a corner near the FF corner, while I3 is in the FS corner. With the mobility variation aware library and the Verilog netlist containing renamed cells, we can run PrimeTime to perform timing analysis with TSV stress.

To cover the mobility variation caused by multiple TSVs, we need to extend the mobility variation range to -22% ≤ (∆µ/µ)h ≤ +10% and 0% ≤ (∆µ/µ)e ≤ +24%. With a 2% mobility step, a full grid would require characterizing 221 (= 17 × 13) libraries with different mobility values, which is not practical. However, Fig. 4.11 shows that the rising delay variation depends only on (∆µ/µ)h and the falling delay variation only on (∆µ/µ)e. When we simulate the inverter rising delay with mobility variation, electron mobility variation has no effect on the delay. Similarly, the falling delay depends only on electron mobility variation. Fig. 4.11 also shows that hole mobility variation can cause more than 20% PMOS performance variation depending on device technology, and that electron mobility variation can enhance NMOS performance by up to 7.5%. We use the inverter in the NCSU library and the PTM SPICE model [101] to obtain the tables in Fig. 4.11. Therefore, we can fix (∆µ/µ)e when we sweep (∆µ/µ)h, and 30 (= 17 + 13) library characterizations are enough to cover the entire mobility set. With a 4% mobility step, 16 (= 9 + 7) libraries are required. Since delay variation depends almost linearly on mobility variation, we can interpolate between two libraries for intermediate mobility values.

(a) Rising delay dependency on (∆µ/µ)h

(b) Falling delay dependency on (∆µ/µ)e

Figure 4.11: Inverter delay variation with different (∆µ/µ)h and (∆µ/µ)e.

Table 4.1: TSV specification

Width    Landing pad   KOZ     Height   Dielectric   Resistance   Capacitance
4.14um   4.54um        0.4um   20um     0.2um        0.1Ω         70fF
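The library counts follow from the size of an inclusive grid over each mobility range. Because rising delay depends only on holes and falling delay only on electrons, the two sweeps add instead of multiply. The Python sketch below checks the counts and shows linear interpolation between two characterized corners. The delay values in the usage are hypothetical.

```python
HOLE_RANGE = (-22, 10)  # (dmu/mu)_h in %, multiple TSVs (text)
ELEC_RANGE = (0, 24)    # (dmu/mu)_e in %, multiple TSVs (text)

def n_corners(lo_pct: int, hi_pct: int, step_pct: int) -> int:
    """Number of points on the inclusive grid lo, lo+step, ..., hi."""
    return (hi_pct - lo_pct) // step_pct + 1

def library_counts(step_pct: int):
    """(full 2-D grid, separable hole + electron sweeps)."""
    nh = n_corners(*HOLE_RANGE, step_pct)
    ne = n_corners(*ELEC_RANGE, step_pct)
    return nh * ne, nh + ne

def interp_delay(mu, mu_lo, delay_lo, mu_hi, delay_hi):
    """Linear interpolation of a cell delay between two characterized corners."""
    t = (mu - mu_lo) / (mu_hi - mu_lo)
    return delay_lo + t * (delay_hi - delay_lo)

print(library_counts(2), library_counts(4))
# Hypothetical delays (ns) at the -6% and -4% hole-mobility corners.
print(interp_delay(-5, -6, 0.50, -4, 0.48))
```

The counts match the text: 221 versus 30 libraries for a 2% step, and 16 for a 4% step.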


4.6 Experimental Results

We implement the TSV stress aware 3-D timing analysis flow in C++ and test it on a 3.0GHz Linux machine with 4GB RAM. We generate the mobility aware library based on the NCSU 45nm cell library with a 2% mobility step. The TSV used in this experiment is specified in Table 4.1.

First, we show the efficiency of our compact stress and mobility modeling. ∆µ/µ at any point on a die can be obtained promptly: generating the whole mobility contour in Fig. 4.12 (die size: 1.752mm², 462 TSVs) takes only 14.9s. The proposed timing analysis with a compact process/device model is therefore fast enough for iterative optimization. Fig. 4.12(a) also reveals an observation useful for layout optimization. The leftmost and rightmost sides have larger hole mobility enhanced zones than the middle area, because those regions suffer less mobility degradation from horizontally placed neighboring TSVs.

(a) (∆µ/µ)h (b) (∆µ/µ)e

Figure 4.12: ∆µ/µ contour map for 22 x 21 TSV array.

Second, we compare the stress-aware timing results with the no-stress case. Ten benchmark circuits are used to show the timing variation in Table 4.2. The benchmark circuits are placed for wire length minimization [46] without considering TSV stress. We assume a stack of four dies, and the number of inserted TSVs is 10% of the number of cells in each circuit. When the TSV stress effect is considered, the longest path delay of the benchmarks varies from -1.32% to 3.58%: some benchmarks gain timing while others suffer a timing penalty. If the TSV stress effect is considered during cell and TSV placement, we can expect performance improvement for every benchmark. TNS (total negative slack) varies even more, from -12.43% to 22.9%. This motivates the need for TSV stress aware layout optimization.

Table 4.2: Longest path delay and TNS comparison

                     Without TSV stress     With TSV stress        Difference
Circuit      #Cells    Longest      TNS    Longest      TNS  Longest      TNS
                    delay (ns)     (ns) delay (ns)     (ns)    delay
IDCT         14,864      12.07  -21,293      11.91  -19,652   -1.32%   -7.71%
8051         15,712       4.78   -7,868       4.94   -7,956    3.32%    1.12%
8086         19,895       9.56   -8,557       9.56   -9,045    0.00%    5.71%
MAC2         29,706       7.72  -17,561       7.72  -17,619    0.07%    0.33%
ETHERNET     77,234      18.30     -476      18.95     -482    3.58%    1.24%
RISC         88,401       8.28   -1,249       8.34   -1,535    0.74%   22.90%
B18         103,711      11.28   -2,082      11.25   -1,823   -0.27%  -12.43%
DES PERT    109,181       8.61   -2,801       8.64   -2,575    0.25%   -8.06%
VGA LCD     126,379       8.01     -543       8.14     -538    1.56%   -1.02%
B19         168,943      13.01   -5,539      12.98   -4,974   -0.20%  -10.20%

Last, we manually optimize the critical path in 8051 to show the potential benefit of TSV stress aware layout optimization. Before optimization, the path delay is 4.94ns with stress-aware timing analysis. With small layout perturbations, we reduce the delay to 4.62ns, a 6.5% improvement. This is even less than the 4.78ns path delay without stress in Table 4.2. Table 4.3 lists the gates on the path and shows the cell renaming according to the mobility variation. We adjust each cell location with a small perturbation so that almost every cell on the path gains timing. The maximum gain for a single cell is a 14% delay reduction. Fig. 4.13 shows how cell relocation works for timing optimization, using the placement result on die2 overlaid with the mobility variation contours. The cells at logic depths 2, 4, 8, and 9 are hole mobility critical because their timing arcs on the path are rising. Therefore, we perturb these cells to be placed close to the green area of the hole mobility contour. In contrast, the cells at logic depths 3 and 7 are electron mobility critical. We therefore push them toward regions with more electron mobility enhancement, as shown in Fig. 4.13(c) and (d).

Table 4.3: Gate optimizations on the target path with perturbation

Logic  Original         Optimized        Timing   Original  Optimized Reduction
depth  gate             gate             arc    delay (ns) delay (ns)     ratio
    -  DFFPOSX1         DFFPOSX1         fall        0.337      0.334     -0.6%
    1  NOR3X1_N2_P14    NOR3X1_P4_P14    rise        0.800      0.767     -4.1%
    2  AND2X1_N12_P12   AND2X1_P0_P12    rise        0.539      0.492     -8.7%
    3  INVX1_N6_P12     INVX1_N6_P16     fall        0.207      0.191     -7.9%
    4  INVX1_N12_P12    INVX1_P2_P12     rise        0.653      0.585    -10.4%
    5  AND2X1_N16_P16   AND2X1_N4_P14    rise        0.576      0.535     -7.2%
    6  BUFX2_P6_P12     BUFX2_P6_P12     rise        0.245      0.216    -11.8%
    7  AOI22X1_P4_P10   AOI22X1_P4_P14   fall        0.159      0.148     -7.3%
    8  INVX1_P0_P10     INVX1_P2_P12     rise        0.107      0.105     -1.5%
    9  OR2X1_N4_P10     OR2X1_P2_P8      rise        0.490      0.468     -4.3%
   10  OR2X2_N16_P18    OR2X2_N2_P12     rise        0.068      0.059    -13.3%
   11  NOR3X1_P0_P14    NOR3X1_P0_P16    fall        0.100      0.089    -11.6%
   12  NAND3X1_N4_P14   NAND3X1_P2_P12   rise        0.055      0.051     -7.3%
   13  BUFX2_N4_P14     BUFX2_P4_P12     rise        0.157      0.149     -4.8%
   14  OR2X2_P0_P8      OR2X2_P2_P8      rise        0.170      0.169     -1.0%
   15  AOI22X1_N16_P16  AOI22X1_N16_P16  fall        0.076      0.075     -1.7%
   16  OAI21X1_N4_P14   OAI21X1_P4_P12   rise        0.072      0.069     -4.9%
   17  NOR3X1_P2_P14    NOR3X1_P2_P16    fall        0.035      0.034     -2.3%
   18  AOI21X1_N18_P18  AOI21X1_P4_P12   rise        0.047      0.040    -14.0%
   19  INVX1_N16_P16    INVX1_N16_P18    fall        0.027      0.024     -9.6%
   20  OAI21X1_P6_P14   OAI21X1_P6_P14   rise        0.017      0.017      1.6%
Path delay                                         4.93719    4.61777     -6.5%
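The relocation rule used for the 8051 path can be expressed as a small greedy step. Rising arcs make a cell hole mobility critical and falling arcs make it electron mobility critical. Each cell is then moved to the nearby legal position with the largest relevant mobility gain. The thesis performs this step manually. The Python sketch below is a hypothetical automation of that rule, in which `mobility_at(x, y)` stands for a stress-map lookup returning `(dmu_h, dmu_e)` and the candidate positions are assumed to be already legalized.

```python
from typing import Callable, Iterable, Tuple

Point = Tuple[float, float]

def critical_carrier(arc: str) -> int:
    """Index into (dmu_h, dmu_e): rising arcs are hole-critical, falling arcs electron-critical."""
    if arc not in ("rise", "fall"):
        raise ValueError("arc must be 'rise' or 'fall'")
    return 0 if arc == "rise" else 1

def best_location(arc: str, candidates: Iterable[Point],
                  mobility_at: Callable[[float, float], Tuple[float, float]]) -> Point:
    """Among nearby legal positions, pick the one with the largest relevant mobility gain."""
    idx = critical_carrier(arc)
    return max(candidates, key=lambda p: mobility_at(*p)[idx])

# Toy stress map: hole gain grows with x, electron gain grows with y.
toy_map = lambda x, y: (0.01 * x, 0.01 * y)
print(best_location("rise", [(1, 5), (3, 0)], toy_map))  # hole-critical cell
print(best_location("fall", [(1, 5), (3, 0)], toy_map))  # electron-critical cell
```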

4.7 Discussions

3D IC stacking requires TSVs for interconnection between wafers. Cu TSVs cause thermal stress, which can lead to significant timing variations. Stress is commonly believed to have a negative impact on timing. Because it is a strongly layout dependent, systematic effect, it can actually be exploited for timing optimization. In this chapter, we develop a first-order compact model for mobility variation. We also propose a design methodology that analyzes this systematic variation and optimizes the layout by placing critical cells in mobility enhanced regions. We observe up to 24.9% delay variation for PMOS and 7.5% for NMOS, which existing timing analysis flows do not consider. Our TSV stress-aware timing analysis framework for 3D-ICs also opens the opportunity for many stress-aware layout optimizations, such as placement and TSV optimization.


(a) Hole mobility contour with original cell placement

(b) Hole mobility contour after cell perturbation

(c) Electron mobility contour with original cell placement

(d) Electron mobility contour after cell perturbation

Figure 4.13: Cell perturbation to take advantage of mobility variation.


Chapter 5

Robust Clock Tree Synthesis with Timing Yield Optimization for 3D-ICs

5.1 Introduction

As discussed in the previous chapter, 3D integration with TSVs has gained major attention for future System On Chip (SOC) integration. However, several design challenges must be resolved for robust 3D integration. Since TSV-induced stress is a source of systematic variation, it can be exploited during chip design. In this chapter, we present design optimization techniques for TSV-based 3D-ICs using the stress model presented in the previous chapter, especially for clock network design [71, 72].

There have been several works on clock tree synthesis (CTS) for 3D-ICs. BURITO [59] addresses buffered clock trees in two stacked dies, and the work in [47] describes the whole flow for 3D CTS in N stacked dies without buffer insertion. The paper in [102] proposed pre-bond testable CTS methods. However, these previous works have not considered new design challenges of 3D-ICs such as inter-die variation and TSV-induced stress.

Process variation can be decomposed into three components [52]: wafer-to-wafer (inter-die) variation, intra-die variation, and random variation. The main challenge of 3D design comes from integrating tiers in different timing corners, which means that cells along a path can have totally different variation characteristics. In addition, cells in different tiers lose their spatial correlation. In other words, only cells placed in the same tier are spatially correlated in process variation [89]. The work in [25] proposed how to select tiers for 3D integration based on pre-bond measurement data in order to maximize parametric yield. In this chapter, we propose a more aggressive clock network design that takes advantage of timing corner mismatch. After all the cells and signal TSVs are placed, we adjust the clock buffer z-locations to minimize the sum of covariances for better timing yield. We propose an ILP formulation that determines clock buffer z-locations to optimize near-critical paths.

Another design-for-manufacturing (DFM) challenge of 3D-ICs comes from the mismatch in Coefficients of Thermal Expansion (CTE) [19, 54, 92]. Because the CTE of copper is larger than that of silicon, tensile stress appears in the silicon near TSVs after the structure cools down to room temperature. This stress can change the driving capability of clock buffers through mobility variation. Since PMOS is more sensitive to silicon stress [92], rising delay is affected more by the stress, which means that a clocking scheme using positive-edge-triggered flip-flops is more susceptible to TSV-induced stress. In this chapter, we propose a buffer delay model under this stress and a stress-aware clock network design.

Initially, we generate an abstract tree. Since the abstract tree does not specify where clock buffers are inserted, the z-location of a clock buffer cannot be determined before buffer insertion is decided. To break this cycle, we construct the tree iteratively in a bottom-up manner, from the sinks to the source. At the leaf nodes, we identify whether buffer insertion is required and then determine buffer z-locations with our ILP formulation, which minimizes clock period variation. The z-locations of buffers determined in previous steps are kept fixed. At the next level, we find buffer insertion points and determine the z-locations of the new buffers. This is repeated level by level until the clock source is reached, so that clock buffer z-locations are chosen to optimize timing yield. Meanwhile, buffer delay variation due to the stress is calculated and taken into account.

The rest of this chapter is organized as follows. Section 5.2 presents related work and motivation. Section 5.3 proposes our robust clock tree construction. Experimental results are shown in Section 5.4, and Section 5.5 presents discussions.

5.2 Related Work and Motivation

Under process variation, the clock period ($CP$) at the 3σ level is given by Eq. 5.1.

$$CP = \mu_{cp} + 3\sigma_{cp} \tag{5.1}$$

The mean clock period is given by Eq. 5.2, where $T_{CtQ}$ and $T_{setup}$ are the clock-to-Q propagation delay and the setup time of a flip-flop, respectively, $T_{logic}$ is the combinational logic delay, and $T_{skew}$ is the clock skew of the clock network.

$$\mu_{cp} = T_{CtQ} + T_{logic} + T_{setup} + T_{skew} \tag{5.2}$$

There are two ways to improve chip performance, and thereby timing yield, during CTS. The first is to minimize $\sigma_{cp}$; in this chapter, we show that $\sigma_{cp}$ can be reduced during CTS for 3D-ICs, which is a reduction of the random variation component. The second is to minimize $\mu_{cp}$, which we achieve by reducing $T_{skew}$ in 3D CTS through consideration of TSV-induced stress.
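The short Python sketch below, a minimal illustration rather than part of the proposed flow, evaluates Eqs. 5.1 and 5.2 for a single path and shows the two levers described above. The function name and every delay value are hypothetical.

```python
def clock_period(t_ctq, t_logic, t_setup, t_skew, sigma_cp):
    """3-sigma clock period: CP = mu_cp + 3*sigma_cp (Eq. 5.1),
    with mu_cp = T_CtQ + T_logic + T_setup + T_skew (Eq. 5.2). Units: ps."""
    mu_cp = t_ctq + t_logic + t_setup + t_skew
    return mu_cp + 3.0 * sigma_cp

# Hypothetical path: 60 ps clock-to-Q, 300 ps logic, 40 ps setup, 10 ps skew, sigma = 15 ps.
baseline = clock_period(60.0, 300.0, 40.0, 10.0, 15.0)
less_sigma = clock_period(60.0, 300.0, 40.0, 10.0, 10.0)  # lever 1: reduce sigma_cp
less_skew = clock_period(60.0, 300.0, 40.0, 0.0, 15.0)    # lever 2: reduce T_skew (lowers mu_cp)
print(baseline, less_sigma, less_skew)
```

Because σcp is multiplied by 3, each picosecond removed from σcp shortens the 3σ clock period three times as much as a picosecond of skew.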

5.2.1 Clock Buffer Tier Assignment

Figure 5.1: Clock path p with clock buffers

Fig. 5.1 shows a clock path spanning two dies. We define a clock buffer connected to the flip-flop (F/F) at the input (launching side) of a path as a type-A buffer. Similarly, a clock buffer connected to the F/F at the output (capturing side) of a path is defined as a type-B buffer. In Fig. 5.1, buffer A is type-A and buffer B is type-B. Let F1 and B be placed in die0, and L1 and F2 be placed in die1. If the z-location of clock buffer A is flexible, we can assign A to either die0 or die1. Intuitively, we would assign A to die0 to avoid inserting a TSV between A and F1. However, we must also consider the covariance between A and the other cells when determining the buffer's z-location.

For path p in Fig. 5.1, $\mu_{cp}$ is given by Eq. 5.3, assuming that the propagation delay from the clock source to buffer A equals that from the clock source to buffer B. Here, E(F1) is the mean clock-to-Q delay of F1, E(F2) is the mean setup time of F2, and, in general, E(cell) denotes the mean delay of a cell.

$$\mu_{cp}^{(p)} = E(A) + E(F1) + E(L1) + E(F2) - E(B) \tag{5.3}$$

The variance of CP for path p is given by Eq. 5.4, where Var(cell) denotes the variance of a cell delay. $\sigma_{cp}^{(p)2}$ is the sum of the variances of all cells plus the covariance terms between pairs of cells. Since two cells in different dies lose their correlation, their covariance terms become zero. Therefore, even after cell placement, we can still choose clock buffer z-locations to minimize the sum of covariance terms, which reduces $\sigma_{cp}$ and improves operating frequency and timing yield in 3D-ICs.

$$\begin{aligned}
\sigma_{cp}^{(p)2} = \mathrm{Var}\big(CP^{(p)}\big) ={}& \mathrm{Var}(A) + \mathrm{Var}(F1) + \mathrm{Var}(L1) + \mathrm{Var}(F2) + \mathrm{Var}(B)\\
&+ 2\big[\mathrm{Cov}(A,F1) + \mathrm{Cov}(A,L1) + \mathrm{Cov}(A,F2) - \mathrm{Cov}(A,B)\\
&\quad + \mathrm{Cov}(F1,L1) + \mathrm{Cov}(F1,F2) - \mathrm{Cov}(F1,B)\\
&\quad + \mathrm{Cov}(L1,F2) - \mathrm{Cov}(L1,B) - \mathrm{Cov}(F2,B)\big]
\end{aligned} \tag{5.4}$$

In our example, let every cell have the same variance and the same pairwise covariance, denoted by VAR and COV, respectively. If buffer A is placed on die1, Cov(A,F1), Cov(A,B), Cov(F1,L1), Cov(F1,F2), Cov(L1,B), and Cov(F2,B) become zero because the corresponding cells lose their correlation; the remaining terms, Cov(A,L1) + Cov(A,F2) − Cov(F1,B) + Cov(L1,F2) = 2COV, contribute 4COV. Similarly, when buffer A is assigned to die0, the surviving terms Cov(A,F1) − Cov(A,B) − Cov(F1,B) + Cov(L1,F2) cancel, as summarized in Eq. 5.5. Hence, placing buffer A in die0 minimizes the clock period variation.

$$\sigma_{cp}^{2} = \begin{cases} 5\,VAR & \text{if buffer A is on die0}\\ 5\,VAR + 4\,COV & \text{if buffer A is on die1} \end{cases} \tag{5.5}$$

In this chapter, we propose an optimal buffer tier assignment that minimizes $\sigma_{cp}$ for near-critical paths during 3D CTS.
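The covariance bookkeeping of Eqs. 5.4 and 5.5 can be checked numerically. This Python sketch assumes, as in the example, that every cell has the same variance VAR and that two cells share covariance COV only when they are on the same die; the signs encode that buffer B enters the clock period negatively (Eq. 5.3). The numeric VAR and COV values are arbitrary.

```python
from itertools import combinations

def path_variance(tiers, signs, var, cov):
    """Var(sum_k s_k * d_k) with equal per-cell variance `var`; two cells are
    correlated (covariance `cov`) only if they sit on the same tier."""
    total = var * len(tiers)
    for i, j in combinations(range(len(tiers)), 2):
        if tiers[i] == tiers[j]:
            total += 2 * signs[i] * signs[j] * cov
    return total

# Cells of path p in Fig. 5.1, in the order A, F1, L1, F2, B.
signs = [1, 1, 1, 1, -1]   # type-B buffer B enters the clock period with a minus sign
VAR, COV = 1.0, 0.5        # illustrative values, not characterized data
for die_a in (0, 1):
    tiers = [die_a, 0, 1, 1, 0]   # F1 and B on die0, L1 and F2 on die1
    print(f"A on die{die_a}: sigma^2 = {path_variance(tiers, signs, VAR, COV)}")
```

With these values the two assignments give 5.0 and 7.0, that is, 5VAR and 5VAR + 4COV, matching Eq. 5.5.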

5.2.2 Clock Buffer Variation due to TSV induced Stress

Strained silicon has been used to enhance the on-current ($I_{on}$) of transistors [81]. However, there are also several unwanted stress sources, which should be considered during the design phase.

Figure 5.2: Thermal stress around TSV.

In 3D-IC manufacturing, unwanted stress is caused by the CTE mismatch between copper TSVs and silicon, as shown in Fig. 5.2. Investigations [75] show that an anneal time of 30 to 60 minutes at 200 °C is required to achieve reasonable copper layer properties. Since the CTE of copper is larger than that of silicon, copper shrinks more than the surrounding silicon when it cools down after annealing. Several papers have simulated TSV-induced stress [19, 54] using finite element analysis (FEA), showing that a TSV can cause tensile stress of more than 200MPa.

$$\Delta Mobility = -\Pi \times TSV_{stress} \tag{5.6}$$

The change in mobility (µ) as a function of applied stress is described by the piezo-resistance model [77] in Eq. 5.6, where Π is the tensor of piezo-resistive coefficients for holes and electrons, and $TSV_{stress}$ is the stress applied to the silicon by TSVs.

(a) Slower rising delay with longitudinal tensile stress

(b) Faster rising delay with transverse tensile stress

Figure 5.3: Buffer delay variation due to TSV stress.

Longitudinal tensile stress degrades PMOS performance, as shown in Fig. 5.3(a) [36]. Under longitudinal stress, the piezo-resistive coefficient for holes is 7.18 × 10⁻¹⁰ Pa⁻¹ for a (001) wafer surface and a 〈110〉 channel, which is the most popular configuration in semiconductor manufacturing [77, 79]. For example, when the TSV stress is 200MPa, ∆Mobility is -14.36% for PMOS.

However, if the TSV is placed perpendicular to the transistor channel as in Fig. 5.3(b), hole mobility is enhanced because the stress adds space in the silicon lattice for carriers to move faster. Under transverse stress, the piezo-resistive coefficient for holes is -6.63 × 10⁻¹⁰ Pa⁻¹ for the (001) surface and 〈110〉 channel. Similarly, we can expect ∆Mobility = +13.26% with $TSV_{stress}$ = 200MPa. Empirically, the drive current variation of PMOS is known to be 0.5 to 0.9 times ∆Mobility [56, 84]. Therefore, systematic clock buffer variation should be considered for robust clock tree construction.
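The arithmetic above can be reproduced with the following Python sketch of Eq. 5.6. It treats Π as a single scalar coefficient per stress direction (the full tensor form is not modeled) and applies the empirical 0.5 to 0.9 drive-current factor quoted above; the constant and function names are illustrative.

```python
PI_HOLE_LONGITUDINAL = 7.18e-10  # 1/Pa, holes, (001) surface, <110> channel
PI_HOLE_TRANSVERSE = -6.63e-10   # 1/Pa, holes, same orientation

def delta_mobility(pi_coeff, stress_pa):
    """Relative mobility change from the piezo-resistive model (Eq. 5.6)."""
    return -pi_coeff * stress_pa

def drive_current_change(d_mobility, factors=(0.5, 0.9)):
    """Empirical PMOS drive-current change: 0.5 to 0.9 times the mobility change."""
    return tuple(f * d_mobility for f in factors)

stress = 200e6  # 200 MPa tensile stress
for name, pi in (("longitudinal", PI_HOLE_LONGITUDINAL), ("transverse", PI_HOLE_TRANSVERSE)):
    d_mu = delta_mobility(pi, stress)
    lo, hi = drive_current_change(d_mu)
    print(f"{name}: dMobility = {d_mu:+.2%}, dIon between {lo:+.2%} and {hi:+.2%}")
```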

5.3 Robust Clock Tree Design

Figure 5.4: Overall proposed 3D CTS flow.

Fig. 5.4 shows our proposed 3D CTS flow, which addresses the new challenges presented in Section 5.2. The first step is to generate an initial abstract tree with minimum wire-length using the 3D-MMM algorithm [59]. 3D-MMM constructs a 3D abstract tree in a recursive top-down manner, deciding the z-locations of merging points. We assign the clock TSVs under a given TSV upper bound and determine the hierarchical connections among the clock sinks, internal nodes, and clock TSVs. The abstract tree contains only merging point and child node information; in other words, after abstract tree generation we do not know where clock buffers will be inserted, and therefore cannot decide their z-locations. However, to determine buffer insertion, we need to know the TSV insertion points and buffer z-locations in order to calculate downstream capacitance. To break this dependency, we propose a depth-by-depth buffered clock tree construction from sinks to source, as illustrated in Fig. 5.5.

First, as shown in Fig. 5.5(a), we identify buffer insertion points where the downstream capacitance exceeds the maximum allowed capacitance. Then, in Fig. 5.5(b), we determine buffer z-locations that minimize the covariance terms using the ILP formulation in Section 5.3.1. After the buffer z-locations are determined, we adjust the z-locations of the merging points in the upstream tree to minimize TSV insertion, as in Fig. 5.5(c). At the next level of the abstract tree, the same procedure is repeated, as in Fig. 5.5(d). Once the z-location of a clock buffer is determined, we determine its x and y location. After that, buffer variation due to TSV stress is calculated (Section 5.3.2), and wire-lengths are computed to eliminate skew (Section 5.3.3).

(a) Identification of buffer insertion points

(b) Buffer tier assignment

(c) Updating tier information for merging points (MPs)

(d) Repeating the procedures at the next level

Figure 5.5: An illustration of the buffer tier assignment procedure in a bottom-up manner. Note that MP1's location is changed at step (c) to minimize #TSVs.

5.3.1 σCP Minimization for Critical Paths

From the observation in Section 5.2.1, our goal is to minimize the sum of covariance terms by optimally assigning clock buffer z-locations.

$$\text{Minimize} \quad \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \alpha_{i,j} \cdot Cov_{i,j} \tag{5.7}$$

Our problem is defined in formulation 5.7. Every pair of cells in a clock path has a covariance weight, denoted by $\alpha_{i,j}$, and M is the number of instances in the clock path, including clock buffers, flip-flops, and logic gates.

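$$\begin{aligned}
&Cov_{i,j} = D_{i,0}D_{j,0} + D_{i,1}D_{j,1} + \cdots + D_{i,N-1}D_{j,N-1}, \quad \text{where } D \in \{0, 1\}\\
&D_{i,0} + D_{i,1} + \cdots + D_{i,N-1} = 1\\
&D_{j,0} + D_{j,1} + \cdots + D_{j,N-1} = 1
\end{aligned} \tag{5.8}$$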

Equation 5.8 expresses $Cov_{i,j}$ as a Boolean function of the cell z-locations. If cell i is on the same tier as cell j, $Cov_{i,j}$ becomes one; otherwise, $Cov_{i,j}$ becomes zero, meaning there is no spatial correlation between the two cells. $D_{i,n}$ is a binary variable that indicates the z-location of cell i; for example, if $D_{i,0}$ is one, cell i is placed on die0. N is the number of tiers stacked for 3D integration.

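$$\begin{aligned}
\text{Minimize}\quad & \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} \alpha_{i,j} \sum_{k=0}^{N-1} Y_{i,j,k}\\
\text{Subject to}\quad & D_{i,0} + D_{i,1} + \cdots + D_{i,N-1} = 1\\
& D_{j,0} + D_{j,1} + \cdots + D_{j,N-1} = 1\\
& D_{i,k} + D_{j,k} - Y_{i,j,k} \le 1\\
& D_{i,k} + D_{j,k} - Y_{i,j,k} \ge 0\\
& D_{i,k} - D_{j,k} - Y_{i,j,k} \ge -1\\
& -D_{i,k} + D_{j,k} - Y_{i,j,k} \ge -1
\end{aligned} \tag{5.9}$$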

By combining formulations 5.7 and 5.8, we obtain the ILP formulation 5.9, which minimizes covariance for the most critical path. $Y_{i,j,k}$ are auxiliary binary variables introduced to linearize the AND operation $D_{i,k}D_{j,k}$. If the z-locations of both cells in a pair are already determined during 3D placement, the pair can be skipped in formulation 5.9, which saves runtime in solving the ILP.
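To see why the four linear constraints in formulation 5.9 suffice, the sketch below enumerates every binary combination of $D_{i,k}$ and $D_{j,k}$ and lists which values of $Y_{i,j,k}$ remain feasible. It is a self-check of the linearization only, not a solver; the function name is hypothetical.

```python
from itertools import product

def feasible_y(d_i, d_j):
    """Values of the auxiliary binary Y_ijk allowed by the four constraints in (5.9)."""
    return [y for y in (0, 1)
            if d_i + d_j - y <= 1
            and d_i + d_j - y >= 0
            and d_i - d_j - y >= -1
            and -d_i + d_j - y >= -1]

for d_i, d_j in product((0, 1), repeat=2):
    print(f"D_i,k={d_i} D_j,k={d_j} -> feasible Y = {feasible_y(d_i, d_j)}")
```

Exactly one value of $Y_{i,j,k}$ survives in each case, and it equals $D_{i,k}D_{j,k}$, so the constraints pin Y regardless of the sign of $\alpha_{i,j}$.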

Clock buffers can be connected to multiple clock sinks. If the z-location of a buffer determined for one clock path differs from that determined for another clock path, the optimization results conflict.

$$D_{i,0} + D_{i,1} + \cdots + D_{i,N-1} = 1 \;\;\text{(no restriction)} \quad\Rightarrow\quad D_{i,t-1} + D_{i,t} + D_{i,t+1} = 1 \;\;\text{(with restriction)} \tag{5.10}$$

In addition, we need to prevent the insertion of multiple TSVs between consecutive clock buffers. For example, a parent buffer could be assigned to die3 while its child buffer is already fixed to die1, in which case two TSVs are required. To avoid this hopping problem, when the pre-determined child buffer is on die t, we restrict the z-location of parent buffer i to tiers t − 1 through t + 1, as shown in formulation 5.10.
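For very small instances, the tier assignment of formulations 5.7 to 5.10 can be reproduced by exhaustive search, which makes the objective and the hopping restriction easy to inspect. The sketch below is such a brute-force stand-in, not the Gurobi-based ILP used in this chapter; its cost grows as N raised to the number of free buffers, so it only suits toy examples. Function names, the cells P and Q, and the weight used in the hopping example are hypothetical.

```python
from itertools import product

def assign_tiers(free_cells, fixed, alpha, n_tiers, child_tier=None):
    """Exhaustively minimize sum(alpha[i, j] for same-tier pairs), i.e. objective (5.7)
    with Cov_ij = 1 when cells i and j share a tier and 0 otherwise (Eq. 5.8).

    free_cells: names of buffers whose tier is still open
    fixed:      dict name -> tier for already-placed cells
    alpha:      dict (name_i, name_j) -> weight alpha_ij
    child_tier: optional dict name -> tier t of the buffer's fixed child; if given,
                the buffer may only use tiers t-1, t, t+1 (restriction of Eq. 5.10)
    """
    child_tier = child_tier or {}
    options = []
    for cell in free_cells:
        if cell in child_tier:
            t = child_tier[cell]
            options.append([k for k in (t - 1, t, t + 1) if 0 <= k < n_tiers])
        else:
            options.append(list(range(n_tiers)))

    best, best_cost = None, float("inf")
    for combo in product(*options):
        tiers = {**fixed, **dict(zip(free_cells, combo))}
        cost = sum(a for (i, j), a in alpha.items() if tiers[i] == tiers[j])
        if cost < best_cost:
            best, best_cost = dict(zip(free_cells, combo)), cost
    return best, best_cost

# Fig. 5.1 example with COV = 1: alpha = +2 for same-sign pairs, -2 for pairs with B.
signs = {"A": 1, "F1": 1, "L1": 1, "F2": 1, "B": -1}
names = list(signs)
alpha = {(names[i], names[j]): 2.0 * signs[names[i]] * signs[names[j]]
         for i in range(len(names)) for j in range(i + 1, len(names))}
print(assign_tiers(["A"], {"F1": 0, "B": 0, "L1": 1, "F2": 1}, alpha, n_tiers=2))

# Hopping example: parent buffer P has its child on die1 in a 4-tier stack.
hop_alpha = {("P", "Q"): -2.0}  # hypothetical weight that rewards putting P with Q
print(assign_tiers(["P"], {"Q": 3}, hop_alpha, n_tiers=4))                       # unrestricted
print(assign_tiers(["P"], {"Q": 3}, hop_alpha, n_tiers=4, child_tier={"P": 1}))  # restricted
```

For the path of Fig. 5.1 the search places A on die0 with zero covariance cost, matching Eq. 5.5. In the hopping example, the unrestricted search sends P to die3 (two tiers away from its child on die1), while the restriction of Eq. 5.10 keeps it within one tier of the child.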

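$$\begin{aligned}
\text{Minimize}\quad & Z_1 + Z_2 + \cdots + Z_p + \cdots + Z_L\\
\text{Subject to}\quad & \sum_{i=1}^{M'-1}\sum_{j=i+1}^{M'} \alpha_{i,j} \sum_{k=0}^{N-1} Y_{i,j,k,p} = Z_p\\
& D_{i,t-1,p} + D_{i,t,p} + D_{i,t+1,p} = 1\\
& D_{j,t'-1,p} + D_{j,t',p} + D_{j,t'+1,p} = 1\\
& D_{i,k,p} + D_{j,k,p} - Y_{i,j,k,p} \le 1\\
& D_{i,k,p} + D_{j,k,p} - Y_{i,j,k,p} \ge 0\\
& D_{i,k,p} - D_{j,k,p} - Y_{i,j,k,p} \ge -1\\
& -D_{i,k,p} + D_{j,k,p} - Y_{i,j,k,p} \ge -1
\end{aligned} \tag{5.11}$$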

Formulation 5.11 extends the ILP to optimize multiple critical paths. L is the number of target paths, M′ is the number of instances in clock path p (including clock buffers, flip-flops, and logic gates), and t and t′ are the z-locations of the child nodes of clock buffers i and j, respectively. The formulation aims to minimize delay variation over the selected critical paths.

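$$\begin{aligned}
&\alpha_{i,j} = \pm 2\left(Cov(i,j) \cdot \rho_{i,j} - \beta_{i,j}\right)\\
&\rho_{i,j} = \begin{cases} 1 - \dfrac{x_{i,j}}{X_L}\,(1 - \rho_{min}) & \text{if } x_{i,j} \le X_L\\ \rho_{min} & \text{if } x_{i,j} > X_L \end{cases}
\end{aligned} \tag{5.12}$$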

We use the spatial correlation model in [28] to account for the distance dependence of spatial correlation, as shown in Eq. 5.12. Let the covariance between cells i and j be Cov(i, j), which can be characterized with HSPICE simulation. $\rho_{i,j}$ is a distance factor representing the reduction in spatial correlation as the distance between two cells increases, and $x_{i,j}$ is the geometric distance between the two cells. While $x_{i,j}$ is smaller than $X_L$, $\rho_{i,j}$ decreases linearly as $x_{i,j}$ increases; once $x_{i,j}$ reaches $X_L$, $\rho_{i,j}$ becomes $\rho_{min}$.

(a) Too many TSVs inserted (b) Fewer TSVs inserted, with a less optimal solution

Figure 5.6: Necessity of #TSV control between two consecutive clock buffers.

The proposed formulation may insert many TSVs between clock buffers, as shown in Fig. 5.6(a). To control the number of TSVs, we introduce a parameter $\beta_{i,j}$ in Eq. 5.12. Increasing $\beta_{i,j}$ decreases $\alpha_{i,j}$ and thereby raises the likelihood that clock buffers i and j are assigned to the same die, which reduces the number of inserted TSVs. We can explore the $\beta_{i,j}$ value that minimizes clock period variance for a given number of inserted TSVs. $\alpha_{i,j}$ has a minus sign only when one of the two cells is a type-B buffer (defined in Section 5.2.1), because the variation of a type-B buffer can compensate the overall clock period variation.
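A minimal Python sketch of Eq. 5.12 is shown below. The sign convention follows the text (negative when one cell is a type-B buffer); the values chosen for Cov(i, j), $X_L$, $\rho_{min}$, and β are placeholders, not characterized data.

```python
def distance_factor(x, x_l, rho_min):
    """rho_ij of Eq. 5.12: linear decay from 1 to rho_min over [0, X_L], flat beyond."""
    if x <= x_l:
        return 1.0 - (x / x_l) * (1.0 - rho_min)
    return rho_min

def alpha_weight(cov, x, x_l, rho_min, beta=0.0, has_type_b=False):
    """alpha_ij = +/-2 * (Cov(i, j) * rho_ij - beta_ij); minus sign if one cell is a type-B buffer."""
    sign = -1.0 if has_type_b else 1.0
    return sign * 2.0 * (cov * distance_factor(x, x_l, rho_min) - beta)

# Placeholder numbers: Cov = 1.0 (normalized), X_L = 100 um, rho_min = 0.2.
for beta in (0.0, 0.3):
    print(beta, [round(alpha_weight(1.0, x, 100.0, 0.2, beta), 3) for x in (0.0, 50.0, 150.0)])
```

Raising β from 0 to 0.3 lowers every positive weight by 0.6, which makes same-tier assignments cheaper in the minimization and therefore tends to reduce the TSV count.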

5.3.2 Buffer Variation Modeling under TSV induced Stress

Our stress-induced variation modeling consists of three steps: 1) compact stress modeling, 2) a piezo-resistive model to calculate ∆Mobility, and 3) buffer characterization by sweeping hole and electron mobility. Since FEA takes several hours even for a single-TSV stress simulation, we use, as a practical alternative, the analytical compact model in [54] together with linear superposition for multiple TSVs [92]. We then convert the stress to mobility variation with the piezo-resistive model in Eq. 5.6. Since mobility variation due to stress depends not only on the applied stress magnitude but also on the orientation between the TSV and the transistor channel [36], we use the modified piezo-resistive model in Eq. 5.13. Here, $O_f(\theta)$ is an orientation factor obtained from empirical data in [36], and θ is the angle between the transistor channel and the direction toward the TSV center.

$$\Delta Mobility = -\Pi \times TSV_{stress} \times O_f(\theta) \tag{5.13}$$
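The superposition step can be sketched as below. The compact model of [54] and the empirical orientation factor of [36] are not reproduced here: the single-TSV profile is a hypothetical Lamé-type $(R/r)^2$ decay with an assumed 200MPa stress at the TSV edge, stresses are superposed as scalars (ignoring their tensor directions), and $O_f(\theta)$ is passed in as a plain number. All names and values are illustrative.

```python
import math

def single_tsv_stress_mpa(r_um, radius_um=2.0, edge_stress_mpa=200.0):
    """Hypothetical radial profile: edge_stress * (R / r)^2 outside the TSV."""
    if r_um < radius_um:
        raise ValueError("point lies inside the TSV")
    return edge_stress_mpa * (radius_um / r_um) ** 2

def superposed_stress_mpa(point, tsv_centers, **profile_kwargs):
    """Linear superposition of the single-TSV profile over all TSVs."""
    px, py = point
    return sum(single_tsv_stress_mpa(math.hypot(px - cx, py - cy), **profile_kwargs)
               for cx, cy in tsv_centers)

def delta_mobility(pi_per_pa, stress_mpa, orientation_factor=1.0):
    """Eq. 5.13: -Pi * stress * Of(theta); Of = 1 reduces to Eq. 5.6."""
    return -pi_per_pa * stress_mpa * 1e6 * orientation_factor

tsvs = [(0.0, 0.0), (10.0, 0.0)]             # hypothetical TSV centers in um
s = superposed_stress_mpa((5.0, 0.0), tsvs)  # point midway between the two TSVs
print(f"stress = {s:.1f} MPa, dMobility (longitudinal, Of=1) = {delta_mobility(7.18e-10, s):+.2%}")
```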


Clock buffer delay is pre-characterized as a function of hole and electron mobility variation. Assuming rising-edge-triggered flip-flops, we can restrict our attention to the rising delay of the buffers. Table 5.1 shows how much the rising delay of a clock buffer changes with mobility variation; falling-edge-triggered cases can be handled in a similar way. As Table 5.1 shows, the rising delay variation depends mainly on hole mobility variation, because the PMOS charges the output capacitance during the rising transition. We use the NanGate library and the 45nm PTM model [101] to characterize the delay variation.

Table 5.1: Buffer rising delay variation according to mobility changes (nominal delay: 210ps)

| ∆ Electron Mobility \ ∆ Hole Mobility | -16% | -8% | 0% | 8% | 16% |
|---|---|---|---|---|---|
| 0% | 12.0% | 5.1% | 0.0% | -4.0% | -7.6% |
| 8% | 10.8% | 4.8% | -0.3% | -4.4% | -7.9% |
| 16% | 11.3% | 5.3% | 0.1% | -4.7% | -8.4% |
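Pre-characterized data such as Table 5.1 must be evaluated at arbitrary mobility changes. One simple way to do this, assumed here rather than taken from the text, is bilinear interpolation over the table grid with clamping outside the characterized range, as in the Python sketch below. Function and constant names are illustrative.

```python
from bisect import bisect_right

HOLE_AXIS = [-0.16, -0.08, 0.0, 0.08, 0.16]   # columns: hole mobility change
ELECTRON_AXIS = [0.0, 0.08, 0.16]             # rows: electron mobility change
RISE_CHANGE = [                               # Table 5.1 entries as fractions
    [0.120, 0.051, 0.000, -0.040, -0.076],
    [0.108, 0.048, -0.003, -0.044, -0.079],
    [0.113, 0.053, 0.001, -0.047, -0.084],
]
NOMINAL_RISE_PS = 210.0

def _locate(axis, value):
    """Clamp value to the characterized range; return (cell index, fraction within cell)."""
    value = min(max(value, axis[0]), axis[-1])
    k = min(bisect_right(axis, value) - 1, len(axis) - 2)
    return k, (value - axis[k]) / (axis[k + 1] - axis[k])

def rising_delay_ps(d_hole, d_electron):
    """Bilinear interpolation of Table 5.1 around the 210 ps nominal rising delay."""
    i, u = _locate(ELECTRON_AXIS, d_electron)
    j, v = _locate(HOLE_AXIS, d_hole)
    g = RISE_CHANGE
    frac = ((1 - u) * (1 - v) * g[i][j] + (1 - u) * v * g[i][j + 1]
            + u * (1 - v) * g[i + 1][j] + u * v * g[i + 1][j + 1])
    return NOMINAL_RISE_PS * (1.0 + frac)

print(f"{rising_delay_ps(-0.12, 0.04):.1f} ps")  # hole mobility -12%, electron mobility +4%
```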

To illustrate clock buffer variation under our model, Fig. 5.7 presents contours based on the proposed modeling for four TSVs. Fig. 5.7(a) shows the TSV-induced stress contour. The TSV radius is 2um, and the Keep-Out-Zone (KOZ), denoted by the gray cylindrical shape, is 1um. The stress due to the TSVs is approximately 150MPa outside the KOZ. Fig. 5.7(b) and (c) show the electron and hole mobility variation contours, respectively. Since hole mobility can be either enhanced or degraded depending on the relative orientation between a TSV and a transistor channel, hole mobility is more susceptible to the stress than electron mobility. Finally, Fig. 5.7(d) shows the buffer delay variation contour for the rising transition. As expected, the rising buffer delay depends strongly on hole mobility variation. In the four-TSV case, we observe approximately 10% delay variation across clock buffers, ranging from -3% to +7%. Therefore, TSV stress can lead to excessive skew beyond the minimum skew target if the TSV-induced stress effect is not taken into account during CTS.

(a) TSV induced stress contour (b) Electron mobility variation contour

(c) Hole mobility variation contour

(d) Rising buffer delay variation contour

Figure 5.7: TSV induced stress and clock buffer variation modeling.

5.3.3 Three-Dimensional Buffered Clock Tree Synthesis (CTS)

The major difference between 2D and 3D clock trees comes from TSVs. TSVs not only add much larger capacitances, which cause more buffer insertion than in a 2D clock tree, but also stress nearby clock buffers and change their effective resistance. Since TSVs may also lead to manufacturability problems, it is desirable to reduce the number of TSVs during 3D CTS, in addition to the fundamental goal of 2D clock trees: zero skew with minimum wire-length. The 3D CTS is done in a bottom-up manner. We assume that TSVs for logic paths are already fixed, and that TSVs for clock trees can be placed anywhere as long as they do not overlap other TSVs or cells.

Abstract Tree Generation: As briefly explained in Section 5.3, we use the 3D-MMM algorithm to obtain the abstract tree from the given sink locations under the given TSV upper bound [59]. After this step, the z-location of each merging point (MP) is determined.

For every depth, in a bottom-up manner, we perform the following steps:

a) Identify candidates for buffer insertion. A buffer is inserted if the child node capacitance exceeds a predefined capacitance.

b) Determine the z-location of each buffer using the ILP formulation to minimize covariance. The ILP formulation uses the clock tree constructed so far and the logic path information to choose the optimal z-location of each newly inserted buffer. If the z-location assigned to a buffer by the ILP differs from that of its child node, a TSV is inserted between the child node and the buffer. If the buffers on both edges are assigned to the same tier but the MP is not, we move the MP to the buffers' tier to reduce the number of TSVs.

c) Determine the (x, y) location of each clock buffer. To obtain the buffer delay variation due to TSV stress, the buffer and TSV locations must be fixed. For simplicity, we assume that a required TSV is located immediately after its child node. To determine the buffer location, we calculate the maximum allowed wire-length from the TSV to the buffer that keeps the capacitance small enough. Fig. 5.8 shows the wire, TSV, and buffer models used to calculate downstream capacitance and downstream delay. The buffer's (x, y) location is defined as a point on the straight line connecting the two child nodes that does not overlap any TSV or KOZ and lies within the maximum allowed wire-length from the TSV.
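Step c) does not spell out how the maximum TSV-to-buffer wire-length is obtained. A minimal sketch, assuming a purely capacitive budget (downstream load, plus the 28fF TSV capacitance used in Section 5.4, plus a uniform wire capacitance per um), is shown below; the capacitance limit and per-um wire capacitance are hypothetical, and wire resistance and slew are ignored.

```python
TSV_CAP_FF = 28.0  # TSV capacitance assumed in the experiments (Section 5.4)

def max_wire_length_um(c_max_ff, c_downstream_ff, c_wire_ff_per_um, has_tsv=True):
    """Longest TSV-to-buffer wire that keeps the lumped load at or below c_max_ff."""
    budget_ff = c_max_ff - c_downstream_ff - (TSV_CAP_FF if has_tsv else 0.0)
    return max(budget_ff, 0.0) / c_wire_ff_per_um

# Hypothetical limits: 150 fF max load, 60 fF downstream, 0.2 fF/um wire.
print(max_wire_length_um(150.0, 60.0, 0.2))
print(max_wire_length_um(150.0, 60.0, 0.2, has_tsv=False))
```

A zero result signals that the budget is already exhausted, so the buffer has to sit right next to the TSV (or an extra buffer is needed).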

d) Get the wire-length of each edge. Based on the downstream capacitance and downstream delay of the left and right child nodes, we calculate the wire-length from each child node to the merging point that achieves zero skew. Since the exact (x, y, z) locations of the child nodes are known, we also know the minimum wire-length between the two child nodes under the half-perimeter model. As shown in Fig. 5.9(a), we search for the merging point location on a 1-dimensional coordinate ranging from zero (child1) to totalWL (child2), where

$$totalWL = |x_2 - x_1| + |y_2 - y_1| \tag{5.14}$$

Figure 5.8: Wire, TSV, and buffer modeling for delay calculation.

We use binary search to get the wire-length of each edge. More specifically, as depicted in Fig. 5.9(a), starting from the current reference point point1 = γ for the merging point, we calculate the skew at point2 = (γ + dl) and at point3 = (γ − dl), where dl is the unit length to move. If point2 has the minimum skew of the three, we move the reference point to the right; if point3 has the minimum skew, the next reference point will be on the left. The reference point γ is updated by Eq. 5.15, where i is the iteration index of the binary search.

$$\gamma_{i+1} = \begin{cases} \gamma_i - 0.5^{(i+2)} \times totalWL & \text{if point3 has the minimum skew}\\ \gamma_i + 0.5^{(i+2)} \times totalWL & \text{if point2 has the minimum skew} \end{cases} \tag{5.15}$$

(a) Merging point on 1-D (b) Merging point on 2-D

Figure 5.9: Illustrations for merging point determination.

When the skew at a point is smaller than the skew tolerance, the calculation of the wire-length from the child node to the merging point is finished. We limit the binary search to 15 iterations, which guarantees a resolution of about 3nm for a 100um wire (100um / 2^15 ≈ 3nm). Wire elongation is needed when the minimum skew along the whole wire occurs at the left or right child node and is still larger than the skew tolerance. In such a case, the wire-length to be elongated can be calculated as explained in [70].
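The three-point search of Eq. 5.15 can be sketched in Python as follows. Several details are assumptions rather than statements from the text: the search starts at the middle of the edge, the reference point stays put when point1 already has the smallest skew, probe points are clamped to the edge, and a simple Elmore-style wire model in arbitrary units replaces the wire, TSV, and buffer models of Fig. 5.8. The function names are hypothetical.

```python
def edge_skew(gamma, total_wl, r, c, t1, c1, t2, c2):
    """Hypothetical Elmore-style skew with the merging point `gamma` from child1."""
    l1, l2 = gamma, total_wl - gamma
    d1 = t1 + r * l1 * (c * l1 / 2.0 + c1)   # delay from merging point through child1
    d2 = t2 + r * l2 * (c * l2 / 2.0 + c2)   # delay from merging point through child2
    return abs(d1 - d2)

def find_merging_point(skew, total_wl, dl, tol, max_iter=15):
    """Three-point binary search of Eq. 5.15, starting from the middle of the edge."""
    gamma = 0.5 * total_wl
    for i in range(max_iter):
        s1 = skew(gamma)
        if s1 <= tol:
            break
        s2 = skew(min(gamma + dl, total_wl))   # point2: probe to the right
        s3 = skew(max(gamma - dl, 0.0))        # point3: probe to the left
        step = 0.5 ** (i + 2) * total_wl
        if s2 < s1 and s2 <= s3:
            gamma += step
        elif s3 < s1:
            gamma -= step
        # otherwise gamma is already the best of the three points and stays put
    return gamma

def demo_skew(g):
    # Toy edge: 100 units long, child2 already 1000 time units slower than child1.
    return edge_skew(g, 100.0, r=1.0, c=1.0, t1=0.0, c1=0.0, t2=1000.0, c2=0.0)

gamma = find_merging_point(demo_skew, total_wl=100.0, dl=0.01, tol=1.0)
print(f"gamma = {gamma:.4f}, skew = {demo_skew(gamma):.4f}")
```

In this toy case the exact zero-skew point is γ = 60, and the search stops within the 15-iteration budget at γ ≈ 60.01, once the skew falls below the tolerance.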

e) Determine the (x, y) locations of the merging point and TSVs. The merging point can be placed anywhere between the two child nodes in the x-y plane. We decide the (x, y) locations of the merging point and TSVs based on the ratio of wire-lengths on the left and right edges, as described in Fig. 5.9(b). The (x, y) location of the merging point can be expressed as Eq. 5.16.

$$x_{MP} = \gamma \times \frac{a}{a+b}, \qquad y_{MP} = \gamma \times \frac{b}{a+b} \tag{5.16}$$

Similarly, the (x, y) location of each TSV can be determined in the same manner, because TSVs are evenly distributed along the edge. For example, in Fig. 5.9(b), the TSV for child1 is located midway between child1 and the MP.

f) Calculate the stress-induced buffer resistance and refine the wire-length to compensate for it. Using the stress map, we adjust the buffer delay at the current buffer location. Since the delay variation is interpreted directly as a buffer resistance variation, the buffer resistance under the stress map can be calculated as well. We then revisit step e) with the updated buffer resistance to compensate for the change. Note that this time all TSV locations are kept at their previous positions to preserve the same stress effect; only the wire-lengths are adjusted, and the (x, y) location of the merging point changes accordingly.

By performing steps a) to f) for every depth from the bottom of the clock tree, we construct a buffered 3D clock tree over N dies with minimum wire-length and with skew below the skew tolerance of the system.


5.4 Experimental Results

We implement the proposed CTS flow in C++ and test it on a 2.93GHz Linux machine with 16GB RAM. We use the NanGate library and the 45nm PTM model [101] to characterize variance and covariance, assuming 5% inter-die and 5% intra-die variation in transistor length. Gurobi [35] is used as the ILP solver.

We use several random circuits to verify the efficiency of our algorithm. Table 5.2 shows the circuit information used in our experiments. We use the same number of clock sinks and the same TSV density for all benchmarks to focus on the trend across different numbers of stacked tiers. #T.P. is the number of target paths for the optimization; for example, with #T.P. = 1, our algorithm optimizes only the most critical path. TSV density is the percentage of area occupied by TSVs. The TSV diameter is 4um and the KOZ is 1um. We assume a TSV capacitance of 28fF and a TSV resistance of 0.053Ω.

First, we show that our work can provide a design guideline to reduce the stress effect on clock skew. Table 5.3 shows the skew caused by TSV stress for different clock source z-locations. To see the stress-induced skew change, we perform zero-skew CTS without stress consideration and then measure the skew with the stress model. Since the bottom tier in 3D stacking does not need TSVs in its silicon substrate, clock buffers in Tier 0 are not affected by the stress. If the clock source is in Tier 0 (the bottom tier), clock buffers tend to be concentrated on Tier 0, which reduces the stress-induced skew variation. However, the skew increases sharply (to 62.9ps) when the clock source is placed on Tier 1. This result provides a guideline for reducing skew caused by TSV stress. For the remaining experiments, we assume that clock sources are placed in Tier 0 to show conservative results.

Table 5.2: Circuit Information

| Name | #Tier | Die Size (um²) | #T.P. | #Sink | TSV density |
|---|---|---|---|---|---|
| CKT1 | 2 | 1000² | 1 | 2000 | 10% |
| CKT2 | 3 | 1000² | 1 | 2000 | 10% |
| CKT3 | 4 | 1000² | 1 | 2000 | 10% |
| CKT4 | 2 | 2000² | 10 | 2000 | 10% |
| CKT5 | 3 | 2000² | 10 | 2000 | 10% |
| CKT6 | 4 | 2000² | 10 | 2000 | 10% |
| CKT7 | 2 | 5000² | 100 | 2000 | 10% |
| CKT8 | 3 | 5000² | 100 | 2000 | 10% |
| CKT9 | 4 | 5000² | 100 | 2000 | 10% |

Table 5.3: Skew change due to TSV stress according to clock source z-location (CKT9)

| Source Tier | Tier 0 | Tier 1 | Tier 2 | Tier 3 |
|---|---|---|---|---|
| Skew without TSV stress | < 0.1ps | < 0.1ps | < 0.1ps | < 0.1ps |
| Skew with TSV stress | 9.3ps | 62.9ps | 53.7ps | 37.0ps |

Second, we verify the usefulness of our stress-aware CTS. Table 5.4 and Table 5.5 report case1 and case2, respectively, to show the skew variation for all benchmarks. In Table 5.4, case1 is CTS without covariance optimization or stress consideration, while case2 in Table 5.5 is stress-aware CTS without covariance optimization. In the tables, Cov. is the average covariance of the optimized paths and σ is the standard deviation of CP; both are averaged over all target paths. The comparison shows that the skew due to the stress can be up to 12.8ps for CKT8 if TSV stress variation is not considered during CTS, which increases the clock period of CKT8 by 2.8%, from 454ps to 466.8ps. If the clock source is on Tier 1, Table 5.3 suggests that the overall clock period can increase by more than 10%. Table 5.4 and Table 5.5 show that stress-aware CTS incurs no penalty in clock buffers, TSVs, or wire-length.

Table 5.4: Clock period analysis result. Case1 (No covariance optimization is done, and no stress is considered)

| Circuit | Cov. | σ (ps) | Skew (ps) | µcp + 3σ (ps) | #Buf | #TSV | WL (um) | CPU (s) |
|---|---|---|---|---|---|---|---|---|
| CKT1 | 7.8 | 14.0 | 1.4 | 405.6 | 877 | 676 | 2.03e7 | 18 |
| CKT2 | -1.4 | 13.9 | 5.0 | 400.4 | 1025 | 1288 | 2.39e7 | 24 |
| CKT3 | -23.5 | 13.0 | 6.5 | 476.4 | 1168 | 1824 | 3.00e7 | 23 |
| CKT4 | -13.8 | 13.4 | 0.2 | 429.5 | 892 | 684 | 2.15e7 | 20 |
| CKT5 | 11.9 | 14.6 | 10.1 | 484.8 | 1025 | 1293 | 2.31e7 | 23 |
| CKT6 | 0.0 | 14.4 | 3.8 | 430.1 | 1195 | 1918 | 3.16e7 | 22 |
| CKT7 | 89.2 | 16.7 | 1.0 | 460.4 | 878 | 674 | 2.08e7 | 19 |
| CKT8 | 107.0 | 17.4 | 12.8 | 466.8 | 1042 | 1307 | 2.61e7 | 18 |
| CKT9 | 133.6 | 18.2 | 9.3 | 466.0 | 1185 | 1895 | 3.05e7 | 21 |

Next, our variation reduction using the ILP formulation is verified in Table 5.6 and Table 5.7. CTS without stress consideration (case3 in Table 5.6) shows relatively large skew caused by TSV stress, because our ILP formulation makes clock buffers spread more evenly over the tiers. We use β = 0 to observe the maximum variation reduction; β is the control parameter introduced in Eq. 5.12 to avoid inserting too many TSVs. In case4, CTS for CKT9 takes 143 seconds. Fig. 5.10 shows how the runtime increases with the number of optimized paths. Even though ILP is NP-hard in general, five thousand paths can be optimized in 275 minutes, which is still reasonable for practical use.

Table 5.5: Clock period analysis result. Case2 (No covariance optimization is done, but TSV stress is considered)

| Circuit | Cov. | σ (ps) | Skew (ps) | µcp + 3σ (ps) | #Buf | #TSV | WL (um) | CPU (s) |
|---|---|---|---|---|---|---|---|---|
| CKT1 | 8.0 | 14.0 | 0.1 | 404.3 | 877 | 676 | 2.03e7 | 15 |
| CKT2 | -1.2 | 13.9 | 0.0 | 395.5 | 1025 | 1288 | 2.39e7 | 24 |
| CKT3 | -23.5 | 13.0 | 0.0 | 469.9 | 1170 | 1824 | 3.00e7 | 25 |
| CKT4 | -13.9 | 13.4 | 0.0 | 429.2 | 892 | 684 | 2.15e7 | 19 |
| CKT5 | 12.6 | 14.7 | 0.0 | 474.8 | 1024 | 1293 | 2.31e7 | 24 |
| CKT6 | -0.2 | 14.4 | 0.0 | 426.2 | 1198 | 1918 | 3.15e7 | 24 |
| CKT7 | 89.7 | 16.8 | 0.0 | 459.5 | 881 | 674 | 2.08e7 | 20 |
| CKT8 | 106.9 | 17.4 | 0.0 | 454.0 | 1044 | 1307 | 2.62e7 | 15 |
| CKT9 | 134.8 | 18.1 | 0.8 | 457.0 | 1184 | 1895 | 3.05e7 | 21 |

Last, Table 5.8 compares case1 in Table 5.4 and case4 in Table 5.7 to show the combined impact of covariance reduction and stress consideration. With our ILP formulation, the standard deviation (σ) of the clock period decreases by up to 34.2% (CKT6). Combining our ILP formulation and stress modeling, we reduce the 3σ clock period by up to 5.7% (CKT8). All benchmarks show large reductions in variation (avg. 22.3%) and skew (avg. 5.6ps), even though the clock sources are assumed to be on the bottom tier.

The increases in buffer count and wire-length during covariance optimization are negligible. However, the number of TSVs increases by up to 59.1%. In our ILP formulation, the degree of covariance optimization and the number of inserted TSVs can be controlled by adjusting β in Eq. 5.12. More interestingly, we observe a trade-off between covariance reduction and TSV insertion: greater covariance reduction requires more TSVs, as shown in Fig. 5.11. From this graph, we can find the maximum covariance reduction achievable for a given maximum allowed number of TSVs.

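Table 5.6: Clock period analysis result. Case3 (Covariance optimization is done, but no stress is considered)

| Circuit | Cov. | σ (ps) | Skew (ps) | µcp + 3σ (ps) | #Buf | #TSV | WL (um) | CPU (s) |
|---|---|---|---|---|---|---|---|---|
| CKT1 | -100.3 | 9.3 | 1.4 | 391.6 | 877 | 680 | 2.03e7 | 34 |
| CKT2 | -34.8 | 12.6 | 12.4 | 404.0 | 1024 | 1322 | 2.39e7 | 42 |
| CKT3 | -59.3 | 11.6 | 31.2 | 496.8 | 1168 | 1851 | 3.00e7 | 43 |
| CKT4 | -56.9 | 11.7 | 40.0 | 464.4 | 893 | 797 | 2.15e7 | 41 |
| CKT5 | -94.5 | 10.4 | 25.3 | 487.4 | 1026 | 1494 | 2.32e7 | 51 |
| CKT6 | -116.8 | 9.5 | 11.8 | 423.3 | 1195 | 2303 | 3.18e7 | 57 |
| CKT7 | -14.6 | 13.3 | 45.6 | 494.7 | 883 | 1002 | 2.13e7 | 92 |
| CKT8 | -29.9 | 12.9 | 50.9 | 491.4 | 1044 | 2080 | 2.69e7 | 107 |
| CKT9 | -13.7 | 13.6 | 42.5 | 485.4 | 1186 | 2302 | 3.08e7 | 141 |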

5.5 Discussions

For 3D-IC design, we observe two important design challenges: variation between tiers and TSV-induced stress. The inter-die variation effect can be used to compensate clock path variation, which reduces the impact of random variation. TSV-induced stress, in contrast, is a systematic component of variation. We reduce the nominal clock period by considering the stress during CTS, and minimize the clock period variation with optimal assignment of clock buffer z-locations. By combining the two approaches, the proposed 3D CTS can improve maximum frequency by up to 5.7%. Our observations are not limited to CTS and open new opportunities for statistical timing analysis and physical design.

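Table 5.7: Clock period analysis result. Case4 (Covariance optimization is done, and TSV stress is considered)

| Circuit | Cov. | σ (ps) | Skew (ps) | µcp + 3σ (ps) | #Buf | #TSV | WL (um) | CPU (s) |
|---|---|---|---|---|---|---|---|---|
| CKT1 | -100.3 | 9.3 | 0.1 | 390.3 | 877 | 680 | 2.03e7 | 32 |
| CKT2 | -28.8 | 12.9 | 0.0 | 392.4 | 1025 | 1322 | 2.39e7 | 43 |
| CKT3 | -60.4 | 11.5 | 0.0 | 465.4 | 1170 | 1851 | 3.00e7 | 43 |
| CKT4 | -55.3 | 11.8 | 0.0 | 424.5 | 893 | 797 | 2.16e7 | 45 |
| CKT5 | -95.2 | 10.4 | 0.0 | 462.0 | 1025 | 1494 | 2.32e7 | 52 |
| CKT6 | -113.5 | 9.6 | 0.0 | 412.1 | 1198 | 2303 | 3.17e7 | 56 |
| CKT7 | -12.9 | 13.4 | 0.0 | 449.3 | 886 | 1006 | 2.13e7 | 90 |
| CKT8 | -30.4 | 12.8 | 0.0 | 440.4 | 1046 | 2080 | 2.70e7 | 113 |
| CKT9 | -11.3 | 13.7 | 0.0 | 443.1 | 1186 | 2280 | 3.09e7 | 143 |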

Figure 5.10: Runtime trend.


Figure 5.11: Trade-off between covariance and # TSVs.

Table 5.8: No stress, no inter-die aware CTS vs. our stress and inter-die aware CTS

| Circuit | ∆σ | Skew (ps) | µcp + 3σ | #Buf | #TSV | WL |
|---|---|---|---|---|---|---|
| CKT1 | -33.3% | -1.3 | -3.8% | 0.0% | 0.6% | 0.1% |
| CKT2 | -9.1% | -5.0 | -2.0% | 0.0% | 2.6% | 0.1% |
| CKT3 | -11.2% | -6.5 | -2.3% | 0.2% | 1.5% | 0.0% |
| CKT4 | -12.3% | -0.2 | -1.2% | 0.1% | 16.5% | 0.3% |
| CKT5 | -28.7% | -10.1 | -4.7% | 0.0% | 15.5% | 0.3% |
| CKT6 | -34.2% | -3.8 | -4.2% | 0.3% | 20.1% | 0.5% |
| CKT7 | -20.5% | -1.0 | -2.4% | 0.9% | 49.3% | 2.2% |
| CKT8 | -26.0% | -12.8 | -5.7% | 0.4% | 59.1% | 3.3% |
| CKT9 | -25.4% | -9.3 | -4.9% | 0.1% | 20.3% | 1.3% |
| AVG | -22.3% | -5.6 | -3.5% | 0.2% | 20.6% | 0.9% |

Chapter 6

Conclusions

This dissertation studied algorithms and modeling techniques to mitigate the difficulty of continuing Moore's law. To achieve high-density integration, two different approaches were explored. The first was to overcome patterning limitations with double patterning technology. The second was 3D wafer stacking with TSV connections.

DPL is needed to print patterns with a pitch below 80nm, which cannot be printed in a single exposure with current lithography equipment. At the 20nm technology node, DPL will be used for metal layers; at the 14nm node, if EUV is not available, gate, contact, and metal layers will all need to be printed with DPL. In this dissertation, we propose CAD approaches to enable more robust DPL. In Chapter 2, we proposed a method to estimate the layout distortion caused by overlay error, which is inevitable in DPT. We defined several overlay variables, such as the amplitude of translation overlay, the angle of rotation overlay, and the magnification factor, and used them to model a parameterized coupling capacitance. We then showed how to determine the overlay variables that produce the worst timing of a chip. This work provides a way to design robust circuits that account for overlay. In Chapter 3, we proposed a fast and flexible layout decomposition framework based on a graph-theoretic approach. To demonstrate its flexibility, we extended the framework to reduce the timing variation due to overlay in contact and metal layer decomposition.
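The link between the overlay variables of Chapter 2 (translation, rotation, magnification) and coupling capacitance can be illustrated with a small model. This is a minimal sketch, not the dissertation's parameterized model: it assumes the overlay acts as a similarity transform on features of the second mask only, that coupling follows a parallel-plate approximation with an SiO2-like dielectric constant of 3.9, and that fringing and the slight tilt of a rotated wire can be ignored. The functions `overlay_transform` and `coupling_cap` and all dimensions are hypothetical.

```python
import math

EPS_0 = 8.854e-12      # vacuum permittivity, F/m
K_DIEL = 3.9           # assumed SiO2-like inter-metal dielectric constant


def overlay_transform(x, y, tx=0.0, ty=0.0, theta=0.0, mag=1.0):
    """Map a point printed by the second mask through translation (tx, ty),
    rotation theta (rad) and magnification mag, all about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return (mag * (x * c - y * s) + tx,
            mag * (x * s + y * c) + ty)


def coupling_cap(y_wire, y_below, y_above, width, thickness, length,
                 n=100, **overlay):
    """Parallel-plate coupling capacitance (F) between a horizontal mask-2
    wire centred at y_wire and two mask-1 neighbours centred at y_below and
    y_above. Lengths in metres; the wire spans 0 <= x <= length.

    Only the mask-2 wire is moved by the overlay model; fringing fields and
    the small tilt of a rotated wire are ignored.
    """
    eps = K_DIEL * EPS_0
    dx = length / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx                   # midpoint of the i-th slice
        _, y = overlay_transform(x, y_wire, **overlay)
        gap_lo = (y - width / 2) - (y_below + width / 2)
        gap_hi = (y_above - width / 2) - (y + width / 2)
        if gap_lo <= 0 or gap_hi <= 0:
            raise ValueError("overlay error closes a spacing (short)")
        total += eps * thickness * dx * (1.0 / gap_lo + 1.0 / gap_hi)
    return total


# 20 nm wide, 40 nm thick, 1 um long wires at 40 nm pitch (20 nm spacing).
nominal = coupling_cap(0.0, -40e-9, 40e-9, 20e-9, 40e-9, 1e-6)
shifted = coupling_cap(0.0, -40e-9, 40e-9, 20e-9, 40e-9, 1e-6, ty=5e-9)
print(shifted / nominal)   # ≈ 1.067
```

Because 1/(s - d) + 1/(s + d) > 2/s for any shift 0 < |d| < s, a translation toward either neighbor increases the total coupling capacitance in this model, even though the spacing to the other neighbor grows.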

TSVs will first be used for memory stacking; their use will then extend to memory and logic integration and, finally, to connections between logic chips. Since wafer stacking with TSVs is still under development, possible problems and solutions for robust TSV integration need to be investigated. In Chapter 4, we showed that Cu TSVs cause thermal stress, which can lead to significant timing variation. Because stress is a strongly layout-dependent, systematic effect, it can also be exploited for timing optimization. We developed a first-order compact model for mobility variation and proposed a design methodology that analyzes this systematic variation and optimizes the layout by placing critical cells in mobility-enhanced regions. In Chapter 5, for 3D-IC design, we identified one more design challenge: variation between tiers. The inter-die variation effect can be used to compensate clock path variation, reducing random variation, while TSV-induced stress is a systematic component of variation. We reduced the nominal clock period by considering the stress during CTS, and minimized the variation of the clock period through optimal assignment of clock buffer z-locations. By combining the two approaches, the proposed 3D CTS improves the maximum frequency by up to 5.7%.
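The first-order mobility model mentioned for Chapter 4 can be sketched as follows, under assumptions that are stated here rather than taken from the dissertation: an isolated TSV produces a Lamé-type in-plane stress field with σ_rr = -σ_θθ decaying as (R/r)², stresses from several TSVs superpose linearly, shear is neglected, and mobility responds linearly through piezoresistance coefficients, Δμ/μ ≈ -(π_l σ_xx + π_t σ_yy) for a channel along x. The coefficients are derived from Smith's bulk values [77] for a <110> channel on (100) silicon and should be replaced by calibrated device data. The peak stress `sigma0`, the TSV radius, and all function names are hypothetical.

```python
# Smith's bulk piezoresistance coefficients (pi11, pi12, pi44) for lightly
# doped silicon, in units of 1e-11 /Pa. Assumed, uncalibrated inputs.
SMITH = {"n": (-102.2, 53.4, -13.6), "p": (6.6, -1.1, 138.1)}


def pi_110(kind):
    """Longitudinal and transverse coefficients (1/Pa) for a <110> channel on (100) Si."""
    p11, p12, p44 = SMITH[kind]
    return (0.5 * (p11 + p12 + p44) * 1e-11,
            0.5 * (p11 + p12 - p44) * 1e-11)


def tsv_stress(x, y, tsv_x, tsv_y, radius, sigma0):
    """In-plane normal stresses (sigma_xx, sigma_yy) in Pa at (x, y) from one TSV.

    Assumes a Lame-type field outside the via: sigma_rr = sigma0 * (R/r)^2 and
    sigma_tt = -sigma_rr. sigma0 is a hypothetical radial stress at the TSV
    edge (positive = tensile). Shear stress is not returned.
    """
    dx, dy = x - tsv_x, y - tsv_y
    r2 = dx * dx + dy * dy
    if r2 < radius * radius:
        raise ValueError("point lies inside the TSV")
    s_rr = sigma0 * radius * radius / r2
    cos2t = (dx * dx - dy * dy) / r2   # cos(2*theta), theta measured from +x
    return s_rr * cos2t, -s_rr * cos2t


def mobility_change(x, y, tsvs, radius, sigma0, kind="p"):
    """First-order d(mu)/mu for a device with its channel along x.

    Stresses from all TSVs are superposed linearly, and mobility follows
    d(mu)/mu = -(pi_long * sigma_xx + pi_trans * sigma_yy).
    """
    sxx = syy = 0.0
    for tx, ty in tsvs:
        a, b = tsv_stress(x, y, tx, ty, radius, sigma0)
        sxx += a
        syy += b
    pi_long, pi_trans = pi_110(kind)
    return -(pi_long * sxx + pi_trans * syy)


R = 2.5e-6                                                   # hypothetical TSV radius, m
print(mobility_change(R, 0.0, [(0.0, 0.0)], R, -1e8, "p"))   # ≈ 0.138
```

With sigma0 < 0, chosen here purely for illustration, a pMOS device placed along the x axis next to the TSV gains mobility in this sketch, the effect falls off as (R/r)², and it vanishes at 45 degrees where cos 2θ = 0. Placement then amounts to steering critical cells toward regions where the sign is favorable.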

We hope that this dissertation will foster further research in these areas. Possible directions include:

• At the 14nm technology node, double patterning may not be enough to achieve 50% scaling. If EUV is not ready for 14nm technology, triple patterning may be the only choice for metal layers. Triple patterning requires more complicated decomposition. Since our decomposition algorithm uses bi-partitioning, it can be extended to layout decomposition for triple patterning with a 3-way partitioning method, and possibly even to quadruple patterning. In addition, rule-based decomposition may be ineffective because it does not consider hotspots. Therefore, hybrid layout decomposition that combines rule-based and model-based decomposition may generate better decomposed layouts.

• Our TSV stress-aware timing analysis framework for 3D ICs may open opportunities for many stress-aware layout optimizations, such as CTS and TSV optimization. Our observations are not limited to CTS and physical design; they also open new opportunities for statistical timing analysis. Although this dissertation focuses on stress in the horizontal direction, TSV-induced stress can also occur in the vertical direction, so considering vertical stress is a promising research direction. In addition, since there are many other stress sources in semiconductor manufacturing, the work can be extended to consider the combined effect of TSV stress and these other sources.
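The triple patterning direction above can be illustrated with the usual conflict-graph view of decomposition: features closer than a minimum same-mask spacing are joined by an edge, and a decomposition without stitches corresponds to a k-coloring of that graph. Three mutually close contacts form an odd cycle (a triangle) that no two-mask assignment can resolve, while three masks can. This toy sketch uses plain backtracking, which is exponential in the worst case, rather than the bi-partitioning or 3-way partitioning method discussed above, and it ignores stitches; all names and dimensions are hypothetical.

```python
def rect_gap(a, b):
    """Euclidean gap between axis-aligned rectangles (x1, y1, x2, y2); 0 if they touch or overlap."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5


def conflict_edges(rects, min_space):
    """Pairs of features closer than the minimum same-mask spacing."""
    n = len(rects)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rect_gap(rects[i], rects[j]) < min_space]


def assign_masks(n, edges, k):
    """Backtracking k-coloring: a mask id per feature, or None if none exists."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = sorted(range(n), key=lambda u: -len(adj[u]))  # most-constrained first
    mask = [None] * n

    def place(i):
        if i == n:
            return True
        u = order[i]
        for c in range(k):
            if all(mask[v] != c for v in adj[u]):
                mask[u] = c
                if place(i + 1):
                    return True
        mask[u] = None
        return False

    return mask if place(0) else None


contacts = [(0, 0, 20, 20), (40, 0, 60, 20), (20, 40, 40, 60)]  # nm
edges = conflict_edges(contacts, min_space=30)
print(edges)                      # [(0, 1), (0, 2), (1, 2)]
print(assign_masks(3, edges, 2))  # None: odd cycle, unresolvable with two masks
print(assign_masks(3, edges, 3))  # [0, 1, 2]
```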


Bibliography

[1] K. Adam and W. Maurer. Polarization effects in immersion lithogra-

phy. Journal of Microlithography, Microfabrication and Microsystems,

4(3):031106, 2005.

[2] Tomoyuki Ando, Masaru Takeshita, Ryoich Takasu, Yasuhiro Yoshii,

Jun Iwashita, Shogo Matsumaru, Sho Abe, and Takeshi Iwai. Pattern

Freezing Process Free Litho-Litho-Etch Double Patterning. In Proc. of

SPIE, volume 7140, Feb 2008.

[3] Krit Athikulwongse, Ashutosh Chakraborty, Jae-Seok Yang, David Z.

Pan, and Sung Kyu Lim. Stress-Driven 3D-IC Placement with TSV

Keep-Out Zone and Regularity Study. In Proc. Int. Conf. on Computer

Aided Design, Nov 2010.

[4] G. Bailey, A. Tritchkov, J.-W. Park, L. Hong, V. Wiaux, E. Hendrickx,

S. Verhaegen, P. Xie, and J. Versluijs. Double pattern EDA solutions

for 32nm HP and beyond. In Proc. SPIE 6521, 2007.

[5] Yongchan Ban, Soo-Han Choi, Kevin Lucas, Chul-Hong Park, and David Z.

Pan. Layout Decomposition of Self-Aligned Double Patterning for 2D

Random Logic Patterning. In Proc. SPIE, 2011.


[6] Yongchan Ban, Kevin Lucas, and David Z. Pan. Flexible 2D Layout Decomposition Framework for Spacer-type Double Patterning Lithography.

In Proc. Design Automation Conf., June 2011.

[7] Yongchan Ban and David Z. Pan. Compact Modeling and Robust Lay-

out Optimization for Contacts in Deep Sub-wavelength Lithography. In

Proc. Design Automation Conf., Jun 2010.

[8] Yongchan Ban and Jae-Seok Yang. Layout Aware Line-Edge Roughness

Modeling and Poly Optimization for Leakage Minimization. In Proc.

Design Automation Conf., June 2011.

[9] K. D. Boese, A. B. Kahng, B. A. McCoy, and G. Robins. Fidelity and

near-optimality of Elmore-based routing constructions. In Proc. IEEE

Int. Conf. on Computer Design, pages 81–84, 1993.

[10] K. Cao, J. Hu, and S. Dobre. Standard cell characterization considering

lithography induced variations. In IEEE/ACM Int. Workshop on Timing Issues in the Specification and Synthesis of Digital Systems (TAU), 2006.

[11] H. Chang and S.S. Sapatnekar. Statistical Timing Analysis Under Spa-

tial Correlations. In IEEE Trans. on Computer-Aided Design of Inte-

grated Circuits and Systems, September 2005.

[12] Y.-S. Chang, M.-F. Tsai, C.-C. Lin, and J.-C. Lai. Pattern Decomposition and Process Integration of Self-Aligned Double Patterning for 30nm Node NAND FLASH Process and Beyond. In Proc. SPIE 7274, 2009.

[13] P. Chen, D. A. Kirkpatrick, and K. Keutzer. Miller Factor for Gate-

Level Coupling Delay Calculation. In Proc. Int. Conf. on Computer


[14] C. Chiang and S. Sinha. The road to 3D EDA tool readiness. In Proc.

Asia and South Pacific Design Automation Conf., Jan 2009.

[15] E. Y. Chin and A. R. Neureuther. Variability aware interconnect timing

models for double patterning. In Proc. SPIE 7275, 2009.

[16] T.-B. Chiou, R. Socha, H. Chen, L. Chen, S. Hsu, P. Nikolsky, A. van

Oosten, and A. C. Chen. Development of layout split algorithms and

printability evaluation for double patterning technology. In Proc. SPIE

6924, 2008.

[17] Minsik Cho, Yongchan Ban, and David Z. Pan. Double Patterning Tech-

nology Friendly Detailed Routing. In Proc. Int. Conf. on Computer


[18] Dongsub Choi, Chulseung Lee, Changjin Bang, Daehee Cho, Myunggoon

Gil, Pavel Izilson, Seunghoon Yoon, and Dohwa Lee. Optimization of

high order control including overlay, alignment, and sampling. In Proc.

SPIE 6922, 2008.


[19] T. Dao, D. H. Triyoso, M. Petras, and M. Canonico. Through Silicon

Via Stress Characterization. In IEEE International Conference on IC

Design and Technology, 2009.

[20] F. Dartu, N. Menezes, J. Qian, and L. T. Pillage. A gate-delay model for

high-speed CMOS circuits. In Proc. Design Automation Conf., pages

576–580, 1994.

[21] Duo Ding, Jhih-Rong Gao, Kun Yuan, and David Z. Pan. A Generic

Lithography-friendly Detailed Router based on Post RET Data Learning

and Hotspot Detection. In Proc. Design Automation Conf., Jun 2011.

[22] Mircea Dusa, Jo Finders, and Stephen Hsu. Double patterning lithography: The bridge between low k1 ArF and EUV. In mic, Feb 2008.

[23] Mircea Dusa, John Quaedackers, Olaf F. A. Larsen, Jeroen Meessen,

Eddy van der Heijden, Gerald Dicker, Onno Wismans, Paul de Haas,

Koen van Ingen Schenau, Jo Finders, Bert Vleeming, Geert Storms,

Patrick Jaenen, Shaunee Cheng, and Mireille Maenhoudt. Pitch dou-

bling through dual patterning lithography challenges in integration and

litho budgets. In Proc. SPIE, volume 6520, February 2007.

[24] Ilan Englard, Rich Piech, Claudio Masia, Noam Hillel, Liraz Gershtein,

Dana Sofer, Ram Peltinov, and Ofer Adan. Accurate in-resolution level

overlay metrology for multi patterning lithography techniques. In Proc.

SPIE 6922, 2008.


[25] C. Ferri, S. Reda, and R. I. Bahar. Strategies for improving the parametric yield and profits of 3D ICs. In Proc. Int. Conf. on Computer


[26] C.M. Fiduccia and R.M. Mattheyses. A Linear-Time Heuristic for Im-

proving Network Partitions. In Proc. Design Automation Conf., June

1982.

[27] D. Flagello, Bernd Geh, Steve Hansen, and Michael Totzeck. Polar-

ization effects associated with hyper-numerical-aperture (> 1) lithogra-

phy. Journal of Microlithography, Microfabrication and Microsystems,

4(3):031104, 2005.

[28] P. Friedberg, Y. Cao, J. Cain, R. Wang, J. Rabaey, and C. Spanos.

Modeling Within-Die Spatial Correlation Effects for Process-Design Co-

Optimization. In Proc. Int. Symp. on Quality Electronic Design, Mar

2005.

[29] H. Fukutome, T. Aoyama, Y. Momiyama, T. Kubo, Y. Tagawa, and

H. Arimoto. Direct evaluation of gate line edge roughness impact on

extension profiles in sub-50nm n-MOSFETs. In Proc. Int. Symp. on

Physical Design, pages 433–436, 2004.

[30] R. S. Ghaida and P. Gupta. Design-Overlay Interactions in Metal Dou-

ble Patterning. In Proc. SPIE 7275, 2009.


[31] Mohit Gupta, Kwangok Jeong, and Andrew B. Kahng. Timing yield-

aware color reassignment and detailed placement perturbation for double

patterning lithography. In Proc. Int. Conf. on Computer Aided Design,

November 2009.

[32] R. Gupta, B. Tutuianu, B. Krauter, and L. T. Pillage. The Elmore

delay as a bound for RC trees with generalized input signals. In Proc.

Design Automation Conf., pages 364–369, June 1995.

[33] R. Gupta, B. Tutuianu, and L. T. Pileggi. The Elmore delay as a

bound for RC trees with generalized input signals. In IEEE Trans.

on Computer-Aided Design of Integrated Circuits and Systems, pages

95–104, January 1997.

[34] http://www.gnu.org/software/glpk/.

[35] http://www.gurobi.com/.

[36] H. Irie, K. Kita, K. Kyuno, and A. Toriumi. In-Plane Mobility Anisotropy

and Universality Under Uni-axial Strains in n- and p-MOS Inversion

Layers on (100), (110), and (111) Si. In IEEE International Electron

Devices Meeting, 2004.

[37] K. Jeong, A. B. Kahng, and R. O. Topaloglu. Assessing Chip-Level

Impact of Double-Patterning Lithography. In Proc. Int. Symp. on

Quality Electronic Design, March 2010.


[38] K. Jeong and A. B. Kahng. Timing Analysis and Optimization Implications of Bimodal CD Distribution in Double Patterning Lithography.

In Proc. Asia and South Pacific Design Automation Conference, 2009.

[39] Minhee Jung, Joon-Min Park, Moonseok Kim, Sukjoon Hong, Jaisoon

Kim, In-Ho Park, and Hye-Keun Oh. 32 nm half pitch formation with

high numerical aperture single exposure. In Proc. of SPIE, volume

7274, Feb 2009.

[40] Moongon Jung, Joydeep Mitra, David Z. Pan, and Sung Kyu Lim. TSV

Stress-aware Full-Chip Mechanical Reliability Analysis and Optimiza-

tion for 3D IC. In Proc. Design Automation Conf., June 2011.

[41] A. B. Kahng, S. Muddu, and E. Sarto. On Switch Factor Based Analysis

of Coupled RC Interconnects. In Proc. Design Automation Conf., June

2000.

[42] A. B. Kahng, P. Sharma, and R. O. Topaloglu. Exploiting STI stress

for performance. In Proc. Int. Conf. on Computer Aided Design, Nov 2007.

[43] A.B. Kahng, C.-H. Park, and H. Yao. Layout Decomposition for Double

Patterning Lithography. In Proc. Int. Conf. on Computer Aided

Design, Nov 2008.

[44] Andrew B. Kahng, Sudhakar Muddu, and Devendra Vidhani. Noise and

Delay Uncertainty Studies for Coupled RC Interconnects. In Proc. Int.

Conf. Asic/SOC, 1999.


[45] Andrew B. Kahng, Chul-Hong Park, Xu Xu, and Hailong Yao. Layout

Decomposition Approaches for Double Patterning Lithography.

[46] D. H. Kim, K. Athikulwongse, and S. K. Lim. A Study of Through-

Silicon-Via Impact on the 3-D Stacked IC Layout. In Proc. Int. Conf.

on Computer Aided Design, Nov 2009.

[47] T.-Y. Kim and T. Kim. Clock tree embedding for 3D ICs. In Proc. Asia

and South Pacific Design Automation Conf., Jan 2010.

[48] D. Laidler, P. Leray, K. D’have, and S. Cheng. Sources of Overlay Error

in Double Patterning Integration Schemes. In Proc. SPIE 6922, 2008.

[49] Y.-J. Lee, R. Goel, and S. K. Lim. Multi-functional Interconnect Co-

optimization for Fast and Reliable 3D Stacked ICs. In Proc. Int. Conf.


[50] Harry J. Levinson. Principles of Lithography, 2nd Edition. SPIE

Publications, 2005.

[51] Burn J. Lin. Successors of ArF Water-Immersion Lithography: EUV

Lithography, Multi-e-beam Maskless Lithography, or Nanoimprint? In

J. Micro/Nanolith. MEMS MOEMS, volume 7, Dec 2008.

[52] F. Liu. A General Framework for Spatial Correlation Modeling in VLSI

Design. In Proc. Design Automation Conf., Jun 2007.


[53] F.-J. Liu, J. Lillis, and C.-K. Cheng. A new layout-driven timing model

for incremental layout optimization. In Proc. Asia and South Pacific

Design Automation Conf., 1997.

[54] K. H. Lu, X. Zhang, S.-K. Ryu, J. Im, R. Huang, and P. S. Ho. Thermo-

Mechanical Reliability of 3-D ICs containing Through Silicon Vias. In

Electronic Components and Technology Conference, 2009.

[55] Kevin Lucas, Chris Cork, Alex Miloslavsky, Gerry Luk-Pat, Levi Barnes,

John Hapli, John Lewellen, Greg Rollins, Vincent Wiaux, and Staf Ver-

haegen. Interactions of double patterning technology with wafer pro-

cessing, OPC and design flows. In Proc. SPIE 6924, 2008.

[56] M. S. Lundstrom. On the Mobility Versus Drain Current Relation for

a Nanoscale MOSFET. In IEEE Electron Device Letters, volume 22,

pages 293–295, June 2001.

[57] W.-K. Ma, J.-H. Kang, C.-M. Lim, H.-S. Kim, S.-C. Moon, S. Lalbaha-

doersing, and S.-C Oh. Alignment system and process optimization for

improvement of double patterning overlay. In Proc. SPIE 6922, 2008.

[58] Minsik Cho, Kun Yuan, Yongchan Ban, and David Z. Pan. ELIAD: Efficient Lithography Aware Detailed Routing Algorithm with Compact and Macro Post-OPC Printability Prediction. In IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), volume 28, pages 1006–1016, 2009.


[59] J. Minz, X. Zhao, and S. K. Lim. Buffered clock tree synthesis for 3D ICs under thermal variations. In Proc. Asia and South Pacific Design

Automation Conf., Jan 2008.

[60] J. Mitra, P. Yu, and D. Z. Pan. RADAR: RET-Aware Detailed Rout-

ing Using Fast Lithography Simulations. In Proc. Design Automation

Conf., Jun 2005.

[61] Joydeep Mitra, Moongon Jung, Rui Huang, Suk-Kyu Ryu, Sung Kyu

Lim, and David Z. Pan. A Fast Simulation Framework for Full-Chip

Thermo-Mechanical Stress and Reliability Analysis of Through-Silicon-

Via based 3D ICs. In Electronic Components and Technology Confer-

ence, 2011.

[62] V. Moroz, L. Smith, X.-W. Lin, D. Pramanik, and G. Rollins. Stress-

Aware Design Methodology. In Proc. Int. Symp. on Quality Electronic

Design, March 2006.

[63] K. Nabors and J. White. FastCap: A multipole accelerated 3-D capacitance extraction program. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 10:1447–1459, Nov. 1991.

[64] R. Nair, L. Berman, P. S. Hauge, and E. J. Yoffa. Generation of perfor-

mance constraints for layout. IEEE Trans. on Computer-Aided Design

of Integrated Circuits and Systems, 8(8):860–874, 1989.


[65] Jiwoo Pak, Mohit Pathak, Sung Kyu Lim, and David Z. Pan. Modeling

of Electromigration in Through-Silicon-Via Based 3D IC. In Electronic

Components and Technology Conference, 2011.

[66] David Z. Pan, Minsik Cho, and Kun Yuan. Manufacturability Aware

Routing in Nanometer VLSI. In Foundations and Trends in Electronic

Design Automation, volume 4, page 197, 2010.

[67] David Z. Pan, Jae-Seok Yang, Kun Yuan, and Minsik Cho. CAD for

Double Patterning Lithography. In IEEE International Conference on

IC Design and Technology, Jun 2010.

[68] David Z. Pan, Jae-Seok Yang, Kun Yuan, Minsik Cho, and Yongchan

Ban. Layout Optimizations for Double Patterning Lithography. In

IEEE 8th International Conference on ASIC (ASICON), Oct 2009.

[69] Paul Penfield and Jorge Rubinstein. Signal delay in RC tree networks.

In Proc. Design Automation Conf., Jun 1981.

[70] R.-S. Tsay. An exact zero-skew clock routing algorithm. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 12(2):242–249, 1993.

[71] A. Rajaram, J. Hu, and R. Mahapatra. Reducing clock skew variability

via cross links. In Proc. Design Automation Conf., 2004.


[72] A. Rajaram, D. Z. Pan, and J. Hu. Improved algorithms for link based

non-tree clock network for skew variability reduction. In Proc. Int.

Symp. on Physical Design, 2005.

[73] J. Rubinstein and A. Neureuther. Post-decomposition assessment of

double patterning layout. In Proc. SPIE 6924, 2008.

[74] C. S. Selvanayagam, J. H. Lau, X. Zhang, S.K.W. Seah, K. Vaidyanathan,

and T. C. Chai. Nonlinear Thermal Stress/Strain Analysis of Copper

Filled TSV and their Flip-Chip Microbumps. In Electronic Components

and Technology Conference, 2008.

[75] N. Serin, T. Serin, S. Horzum, and Y. Celik. Annealing effects on the

properties of copper oxide thin films prepared by chemical deposition.

In Electronic Journals, volume 20, pages 398–401, May 2005.

[76] Weicheng Shiu, Hung Jen Liu, Jan Shiun Wu, Tsu-Li Tseng, Chun Te

Liao, Chien Mao Liao, Jerry Liu, and Troy Wang. Advanced self-aligned

double patterning development for sub-30-nm DRAM manufacturing.

In Proc. of SPIE, volume 7274, Feb 2009.

[77] C. S. Smith. Piezoresistance effect in germanium and silicon. In Physi-

cal Review, volume 94, pages 42–49, Apr 1954.

[78] Hisanori Sugimachi, Hitoshi Kosugi, Tsuyoshi Shibata, Junichi Kitano,

Koichi Fujiwara, Michihiro Mita, Akimasa Soyano, Shiro Kusumoto, Mo-

toyuki Shima, and Yoshikazu Yamaguchi. CD Uniformity improvement


for Double-Patterning Lithography (Litho-Litho-Etch) Using Freezing

Process. In Proc. of SPIE, volume 7273, Feb 2009.

[79] S. Suthram, J. C. Ziegert, T. Nishida, and S. E. Thompson. Piezore-

sistance Coefficients of (100) Silicon nMOSFETs Measured at Low and

High Channel Stress. In IEEE Electron Device Letters, volume 28, pages

58–60, Jan 2007.

[80] S. E. Thompson, M. Armstrong, and C. Auth et al. A 90 nm logic tech-

nology featuring strained-silicon. In IEEE Trans. on Electron Devices,

volume 51, pages 1790–1797, Nov 2004.

[81] S. E. Thompson, G. Sun, Y. Sung Choi, and T. Nishida. Uniaxial-

Process-Induced Strained-Si: Extending the CMOS Roadmap. In IEEE

Trans. on Electron Devices, volume 53, pages 1010–1020, May 2006.

[82] M. Totzeck, P. Graupner, T. Heil, A. Gohnermeier, O. Dittmann, D. Krah-

mer, V. Kamenov, J. Ruoff, and D. Flagello. Polarization influence on

imaging. Journal of Microlithography, Microfabrication and Microsys-

tems, 4(3):031108, 2005.

[83] N. Toyama, T. Adachi, Y. Inazuki, T. Sutou, Y. Morikawa, H. Mohri,

and N. Hayashi. Pattern decomposition for double patterning from

photomask viewpoint. In Proc. SPIE 6521, 2007.

[84] K. Uchida, T. Krishnamohan, K. C. Saraswat, and Y. Nishi. Physical mechanisms of electron mobility enhancement in uniaxial stressed MOSFETs and impact of uniaxial stress engineering in ballistic regime. In IEEE International Electron Devices Meeting, 2005.

[85] Vincent Wiaux, Staf Verhaegen, Shaunee Cheng, Fumio Iwamoto, Patrick

Jaenen, Mireille Maenhoudt, Takashi Matsuda, Sergei Postnikov, and

Geert Vandenberghe. Split and design guidelines for double patterning.

In Proc. of SPIE, volume 6924, Feb 2008.

[86] M. J. Wieland, G. de Boer, G. F. ten Berge, R. Jager, T. van de Peut,

J. J. M. Peijster, E. Slot, S. W. H. K. Steenbrink, T. F. Teepen, A. H. V.

van Veen, and B. J. Kampherbeek. MAPPER: high-throughput mask-

less lithography. In Proc. of SPIE, volume 7271, Feb 2009.

[87] Alfred K. Wong. Resolution Enhancement Techniques in Optical Lithog-

raphy. SPIE Publications, 2001.

[88] Obert Wood, Chiew-Seng Koay, Karen Petrillo, Hiroyuki Mizuno, and

Sudhar Raghunathan. Integration of EUV lithography in the fabrication

of 22-nm node devices. In Proc. of SPIE, volume 7271, Feb 2009.

[89] Jinjun Xiong, Vladimir Zolotov, and Lei He. Robust extraction of

spatial correlation. In Proc. Int. Symp. on Physical Design, pages 2–9,

New York, NY, USA, 2006. ACM Press.

[90] Y. Xu and C. Chu. GREMA: Graph reduction based mask assignment

for double patterning technology. In Proc. Int. Conf. on Computer



[91] Yue Xu and Chris Chu. A Matching Based Decomposer for Double

Patterning Lithography. In IEEE Trans. on Computer-Aided Design of

Integrated Circuits and Systems, March 2010.

[92] Jae-Seok Yang, Krit Athikulwongse, Young-Joon Lee, Sung Kyu Lim,

and David Z. Pan. TSV Stress Aware Timing Analysis with Appli-

cations to 3D-IC Layout Optimization. In Proc. Design Automation

Conf., June 2010.

[93] Jae-Seok Yang, Katrina Lu, Minsik Cho, Kun Yuan, and David Z. Pan.

A New Graph-Theoretic, Multi-Objective Layout Decomposition Framework for Double Patterning Lithography. In Proc. Asia and South

Pacific Design Automation Conf., Jan 2010.

[94] Jae-Seok Yang and A. Neureuther. Crosstalk Noise Variation Assess-

ment and Analysis for the Worst Process Corner. In Proc. Int. Symp.

on Quality Electronic Design, March 2008.

[95] Jae-Seok Yang, Jiwoo Pak, Xin Zhao, Sung Kyu Lim, and David Z. Pan.

Robust Clock Tree Synthesis with Timing Yield Optimization for 3D-ICs.

In Proc. Asia and South Pacific Design Automation Conf., Jan 2011.

[96] Jae-Seok Yang and David Z. Pan. Overlay Aware Interconnect and

Timing Variation Modeling for Double Patterning Technology. In Proc.

Int. Conf. on Computer Aided Design, Nov 2008.


[97] Kun Yuan and David Z. Pan. WISDOM: Wire Spreading Enhanced

Decomposition of Masks in Double Patterning Lithography. In Proc.

Int. Conf. on Computer Aided Design, Nov 2010.

[98] Kun Yuan and David Z. Pan. E-Beam Lithography Stencil Planning

and Optimization with Overlapped Characters. In Proc. Int. Symp. on

Physical Design, March 2011.

[99] Kun Yuan, Jae-Seok Yang, and David Z. Pan. Double Patterning Layout

Decomposition for Simultaneous Conflict and Stitch Minimization. In

Proc. Int. Symp. on Physical Design, March 2009.

[100] Kun Yuan, Jae-Seok Yang, and David Z. Pan. Double Patterning Lay-

out Decomposition for Simultaneous Conflict and Stitch Minimization.

In IEEE Trans. on Computer-Aided Design of Integrated Circuits and

Systems, Feb 2010.

[101] W. Zhao and Y. Cao. New generation of Predictive Technology Model

for sub-45nm early design exploration. In IEEE Trans. on Electron

Devices, 2006.

[102] X. Zhao, D. Lewis, H.-H. S. Lee, and S. K. Lim. Pre-bond Testable

Low-Power Clock Tree Design for 3D Stacked ICs. In Proc. Int. Conf.



Vita

Jae-Seok Yang received the B.S. degree in electrical engineering from Sogang University, Seoul, Korea, in 1997, and the M.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 2007. He is currently working toward the Ph.D. degree in electrical and computer engineering at The University of Texas at Austin. He was with the Samsung Semiconductor Research Center in Hwasung, Korea, from 1999 to 2005. Since fall 2010, he has been working for the Samsung Semiconductor Research Center. He has published more than 14 papers in international conferences and journals.

Jae-Seok Yang’s research interests include nanometer VLSI design for manufacturability and design automation for 3D ICs. In particular, he has worked on algorithms and modeling for robust double patterning lithography, 3D integration with TSVs, signal integrity analysis, statistical timing analysis, clock tree synthesis, and layout-dependent stress modeling. He received the Best Paper Award at the SOC Design Conference, Seoul, Korea, in 2002, the Samsung Scholarship in 2005, and the Best Paper Award at the Asia and South Pacific Design Automation Conference in 2010. He has served as a reviewer for several international conferences and journals, including DAC, ICCAD, ASPDAC, SLIP, TCAD, and TVLSI.

Permanent address: Jae-Seok Yang


This dissertation was typeset with LaTeX† by the author.

†LaTeX is a document preparation system developed by Leslie Lamport as a special version of Donald Knuth’s TeX Program.
