
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. X, NO. X, JULY 2014

Coupling Mitigation in 3D Multiple-Stacked Devices

Pooria M. Yaghini, Student Member, IEEE, Ashkan Eghbal, Student Member, IEEE, Misagh Khayambashi, and Nader Bagherzadeh, Fellow, IEEE

Abstract—3D multiple-stacked ICs have been proposed to improve the energy efficiency of data center operations as DRAM scaling improves annually. A 3D multiple-stacked IC is a single package containing multiple dies stacked together using Through-Silicon Via (TSV) technology. Despite the advantages of 3D design, the fault occurrence rate increases as the feature size of logic devices shrinks, and the problem is worse for 3D stacked designs. TSV coupling is one of the main reliability issues for the data TSVs of 3D multiple-stacked ICs; it has large disruptive effects on signal integrity and transmission delay. In this paper, we first characterize the inductance parasitics in contemporary TSVs, and then analyze and classify inductive coupling cases. Next, we devise a coding algorithm to mitigate TSV-to-TSV inductive coupling. The coding method controls the direction of current flow in TSVs by adjusting the data bit streams at run time to minimize inductive coupling effects. After a formal analysis of how the efficiency of the devised algorithm scales, an enhanced approach supporting larger bus sizes is proposed. Our experimental results show that the proposed coding algorithm yields significant improvements, while its hardware-implemented encoder incurs modest latency, power consumption, and area overheads.

Index Terms—Reliability, 3D multiple-stacked IC, TSV, Coupling, Signal Integrity.

I. INTRODUCTION

EXASCALE systems are expected to have approximately one billion processing elements by 2020 [1], [2]. Three-dimensional (3D) IC designs are considered a viable solution for integrating more cores on a chip, offering a smaller footprint and better timing performance than 2D architectures [3].

Wire bonding and flip-chip stacking have made their way into mainstream semiconductor manufacturing in recent years, but they are no longer considered for new generations of 3D integration [4]. Through-Silicon Via (TSV) technology is currently more attractive, supporting better performance and integrated functionality. With 3D integration technology employing TSVs, the average and maximum distances among components on different dies are substantially reduced, saving delay, power, and area [5]. The fault occurrence rate increases with the emergence of nanoscale circuits and the integration of more complex circuits on a chip, which is even more critical for 3D multiple-stacked IC designs. Reliable TSVs, as one of the major components of a 3D multiple-stacked IC, are therefore in demand, and many research groups have studied and proposed reliable TSVs [6]–[8]. The reliability of TSV interconnects is analyzed in [6]. Reliability-aware TSV planning for 3D multiple-stacked ICs has been proposed in [7]. The key design-for-reliability challenges and possible solutions for TSV-based 3D IC integration are discussed in [8].

The authors are with the Center for Pervasive Communications and Computing, Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697 USA.

TSV coupling is one of the major issues in 3D multiple-stacked IC designs because of the increased parasitics compared with 2D ICs, which may result in delay or even mutual coupling between adjacent TSVs [9], [10]. The term TSV coupling refers to capacitive and inductive coupling among neighboring TSVs, of which the latter is more critical in higher-frequency data transmission [11]. The electric field results in capacitive coupling, and the magnetic field is the source of inductive coupling. The impact of TSVs on signal integrity (SI) in 3D ICs has been investigated in several articles [12], [13]. TSV-to-TSV inductive coupling is one of the fault sources that degrade SI, causing two major issues. First, it increases the path delay due to the Miller effect. Second, the coupling noise can result in logic function failure. An analytical model for the coupling capacitance between pairs of TSVs is presented in [12]. TSV resiliency during manufacturing has been studied in [14] by investigating resistive open defects. A complete set of self-consistent equations, including self and coupling terms for the resistance, capacitance, and inductance of various TSV structures, is presented in [13]. As solutions to the TSV-to-TSV coupling problem, increasing TSV distances, shielding the victim TSVs, inserting buffers at the victim net, decreasing the driver size at the aggressor net, and increasing the load at both the victim and aggressor nets have been suggested [10]. However, the last two suggestions have negative implications for timing performance, and the others require considerable effort at post-design time.

In [15], a coding scheme has been suggested for a matrix of TSVs, reducing the maximum capacitive crosstalk by 25% in a 3 × n TSV mesh. However, the impact of inductive coupling on SI has not yet been evaluated. The goal of this paper is to investigate the reliability of 3D multiple-stacked ICs against inductive TSV-to-TSV coupling and to propose a scalable inductive-coupling-aware coding. The proposed algorithm is intended to support two major 3D device categories. The devised baseline algorithm [16] targets the first category, which consists of designs with low TSV concentration (fewer than 100 TSVs) such as 3D NoCs [17]–[19]. An enhanced scheme is proposed to support architectures with high TSV concentration (around 500 TSVs) such as 3D DRAM memories (e.g., the Hybrid Memory Cube (HMC) [20]–[22]), as shown in Fig. 1. Our experimental results show that the proposed coding algorithm yields significant improvements, while its hardware-implemented encoder incurs modest latency, power consumption, and area overheads.

Fig. 1. 3D multiple-stacked IC vault, vertically interconnected by TSV bus in 3D integration technology. (Figure: a 3D memory vault (HMC) with memory banks, vault controller, data TSVs, and command TSVs, and a top view of two data TSVs showing the copper TSV body, oxide insulator, depletion region, substrate, TSV diameter, and pitch.)

The major contributions of this article are:

• To analyze the reliability impact of inductive TSV-to-TSV coupling within a 3D multiple-stacked IC, and to provide an analytical estimate of TSV link failure (data corruption) caused by inductive TSV coupling.
• To devise a method that minimizes the effect of the magnetic field caused by TSVs, including an analytical study that demonstrates the strength of the proposed technique.
• To present a scalable coding approach with modest information redundancy overhead for implementing the proposed technique in large-scale 3D multiple-stacked ICs.
• To demonstrate the efficiency of the proposed schemes in mitigating TSV-to-TSV inductive coupling, and to justify their scalability and practicality through concrete experiments, hardware implementation, and synthesis.

The rest of this article is organized as follows. A brief introduction to 3D multiple-stacked IC architecture is provided in Section II. Inductive TSV-to-TSV coupling is investigated in Section III. Section IV presents an inductive coupling analysis. The baseline coding algorithm and its evaluation are discussed in Sections V and VI, respectively. The enhanced coding algorithm, its coupling mitigation efficiency, and its hardware implementation are presented in Section VII. Section VIII reviews related work. Finally, Section IX offers concluding remarks.

II. 3D MULTIPLE-STACKED IC ARCHITECTURE

The advent of 3D or vertical integration is a promising path to boost scalability and power/performance characteristics, extending the capabilities of modern integrated circuits [23]–[25]. These capabilities are inherent to 3D ICs, which have considerably shorter interconnecting wires in the vertical direction. 3D integration opens new opportunities by providing feasible and cost-effective approaches for integrating heterogeneous cores to realize future computer systems. It supports heterogeneous stacking because different types of components can be fabricated separately, and silicon layers can be implemented with different technologies. One of the most promising technologies for 3D IC integration is the TSV [26], a pillar manufactured across a thinned silicon substrate to establish inter-die connectivity after die bonding. Salient TSV features include fine pitch, high density, and high compatibility with the standard CMOS process. In 3D integration technologies, multiple layers of 2D planar designs are stacked on each other and vertically interconnected by short, thin, high-density TSVs, enabling low-level integration that is superior to existing solutions.

Micro-bumps are the interfaces between TSVs and 2D designs. The minimum TSV depth is normally about 40–100 µm and is projected to reach 30–40 µm by 2018. A copper TSV in standard Si-bulk technology is expected to have a minimum via diameter of 2–4 µm, a 1:20 minimum aspect ratio, a 4–7 µm minimum via pitch, and a 0.5 µm oxide thickness ($t_{ox}$), and there can be 2–8 dies per stack [27]. It is important to note that the TSV process is independent of the technology node used for the 2D chip and does not scale with it. TSV diameters and pitches are two to three orders of magnitude larger than transistor gate lengths. Furthermore, to reach a high yield, manufacturers typically impose a minimum TSV density policy to maintain the planarity of the wafer during chemical-mechanical polishing [28]. For example, Tezzaron requires at least one TSV in every 250 µm × 250 µm area [29].
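As a rough illustration of how such a density rule translates into a TSV count, the sketch below assumes the die is tiled into non-overlapping 250 µm × 250 µm windows and that every window, including partial windows at the die edge, must hold at least one TSV. The tiling interpretation and the helper name `min_tsv_count` are our assumptions, not the foundry's rule text; a sliding-window rule could require more TSVs.

```python
import math

WINDOW_UM = 250.0  # window edge from the Tezzaron example (assumed square tiling)

def min_tsv_count(die_width_um: float, die_height_um: float,
                  window_um: float = WINDOW_UM) -> int:
    """Lower bound on TSVs if every (possibly partial) tile needs one TSV."""
    cols = math.ceil(die_width_um / window_um)
    rows = math.ceil(die_height_um / window_um)
    return cols * rows

if __name__ == "__main__":
    # Hypothetical 10 mm x 10 mm die, i.e. 40 x 40 tiles.
    print(min_tsv_count(10_000, 10_000))
```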

III. INDUCTIVE TSV-TO-TSV COUPLING

The impact of TSVs on future 3D ICs is still unknown [30]. However, chip warpage, TSV coupling, and thermal stress are known to be the main causes of TSV failure [9], [26].

The term TSV coupling refers to capacitive and inductive coupling among neighboring TSVs. The electric field results in capacitive coupling, and the magnetic field is the source of inductive coupling. Capacitive coupling between TSVs depends on the permittivity of the oxide, the TSV geometry, the arrangement of surrounding TSVs, and the placement of body contacts. Capacitive coupling is influenced by the arrangement of the TSVs, whereas inductive coupling depends only weakly on the distance between neighboring TSVs, because inductive coupling has a longer range of effect than capacitive coupling noise [31]. Inductive coupling among neighboring TSVs is more critical in higher-frequency data transmission [32] and for long TSVs, which are the case considered in this article. Processors with higher operating frequencies are emerging as process technology scales down; an example is the IBM 5.2 GHz multiprocessor [33].

Fig. 2. Inductive coupling SPICE simulation results. (Figure: induced voltage (mV) on the victim TSV at 1, 3, 5, and 10 GHz versus (a) TSV pitch (5–10 µm), (b) technology node (20, 16, 14, 10, and 7 nm), and (c) TSV aspect ratio (diameter:length, from 5:50 to 25:500).)

A. Inductive Coupling Characteristics

To characterize the effect of inductive coupling, a 3 × 3 matrix of TSVs is modeled at circuit level with Synopsys HSPICE. The middle TSV is assumed to be the victim and the other eight are the aggressors. In the simulations, the top end of each TSV is connected to the output of an inverter, which drives, through the TSV, the input of another inverter connected to the bottom end. The coupled TSV structure is modeled as a lumped RLC circuit based on [10], [34]. The circuit is composed of a series TSV resistance $R_{TSV}$ and inductance $L_{TSV}$, a parallel silicon substrate resistance $R_{si}$ and capacitance $C_{si}$, and a silicon dioxide capacitance $C_{ox}$ around the TSV. These components describe the relationship between the signal TSVs. The values of the circuit elements are obtained from analytic equations based on the dimensions of the structure, such as the oxide thickness $t_{ox}$, silicon substrate height $h_{si}$, TSV radius $r_{TSV}$, and TSV pitch $P_{TSV}$, and on material properties such as the dielectric constant $\epsilon$ and resistivity $\rho$:

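$$R_{si} = \frac{\rho_{si}\,\cosh^{-1}\!\left(\frac{P_{TSV}}{2r_{TSV}}\right)}{\pi h_{si}} \tag{1}$$

where

$$\rho_{si} = 0.0012T^2 - 0.0352T + 10 \tag{2}$$

and

$$C_{si} = \frac{\epsilon_{si}\,\pi h_{si}}{\cosh^{-1}\!\left(\frac{P_{TSV}}{2r_{TSV}}\right)} \tag{3}$$

$$C_{ox} = \frac{2\pi\,\epsilon_{ox}\,h_{si}}{\ln\!\left(\frac{r_{TSV}+t_{ox}}{r_{TSV}}\right)} \tag{4}$$

where

$$\epsilon_{ox} = 0.016T + 3.6 \tag{5}$$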
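Eqs. (1)–(5) can be evaluated directly. The sketch below is a minimal illustration that rests on several assumptions the text does not state: $T$ is temperature in °C, Eq. (2) gives $\rho_{si}$ in Ω·cm, Eq. (5) gives the relative permittivity of the oxide, silicon's relative permittivity is 11.7, and $h_{si}$ equals the TSV length. The geometry (8 µm pitch, 4 µm diameter, 100 µm length) follows Table I and the 0.5 µm oxide follows Section II; the function names are hypothetical.

```python
import math

EPS0 = 8.854e-12   # vacuum permittivity (F/m)
EPS_SI_REL = 11.7  # assumed relative permittivity of silicon

def rho_si_ohm_cm(T: float) -> float:
    """Eq. (2); assumed: T in degrees C, result in ohm*cm."""
    return 0.0012 * T**2 - 0.0352 * T + 10

def eps_ox_rel(T: float) -> float:
    """Eq. (5); assumed to be the relative permittivity of the oxide."""
    return 0.016 * T + 3.6

def substrate_and_oxide(pitch, r_tsv, h_si, t_ox, T=27.0):
    """Return (R_si [ohm], C_si [F], C_ox [F]) from Eqs. (1), (3), (4).
    Lengths are in metres; requires pitch >= 2 * r_tsv for cosh^-1."""
    geom = math.acosh(pitch / (2 * r_tsv))             # cosh^-1(P / 2r)
    rho = rho_si_ohm_cm(T) * 1e-2                      # ohm*cm -> ohm*m
    r_si = rho * geom / (math.pi * h_si)               # Eq. (1)
    c_si = EPS_SI_REL * EPS0 * math.pi * h_si / geom   # Eq. (3)
    c_ox = (2 * math.pi * eps_ox_rel(T) * EPS0 * h_si
            / math.log((r_tsv + t_ox) / r_tsv))        # Eq. (4)
    return r_si, c_si, c_ox

if __name__ == "__main__":
    r, c, cox = substrate_and_oxide(8e-6, 2e-6, 100e-6, 0.5e-6)
    print(f"R_si = {r:.0f} ohm, C_si = {c * 1e15:.1f} fF, C_ox = {cox * 1e15:.1f} fF")
```

Widening the pitch raises $R_{si}$ and lowers $C_{si}$ through the $\cosh^{-1}$ term, while $C_{ox}$ depends only on the TSV's own radius and oxide.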

TSV inductance [35] is derived from the partial self-inductance, given by Eq. (6), and the mutual inductance between two TSVs at pitch $P_{TSV}$, given by Eq. (7). The partial self-inductance depends on the diameter and length of the TSV and is expressed as:

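$$L_{TSV,self} = \frac{\mu_0 l_{TSV}}{2\pi}\left[\ln\!\left(\frac{2l_{TSV}}{r_{TSV}}\right) - \frac{3}{4}\right] \tag{6}$$

$$L_{TSV,mutual} = \frac{\mu_0 l_{TSV}}{2\pi}\left[\ln\!\left(\frac{l_{TSV}}{P_{TSV}} + \sqrt{1+\left(\frac{l_{TSV}}{P_{TSV}}\right)^2}\right) - \sqrt{1+\left(\frac{P_{TSV}}{l_{TSV}}\right)^2} + \frac{P_{TSV}}{l_{TSV}}\right] \tag{7}$$

where $\mu_0$ is the permeability of free space, $4\pi \times 10^{-7}$ H/m.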
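A minimal sketch of Eqs. (6) and (7) follows, in SI units with geometry loosely based on Table I (function names hypothetical). Printing $L_{TSV,mutual}$ for a few TSV lengths illustrates the strong length dependence discussed below; by this formula the growth is close to, but slightly faster than, linear.

```python
import math

MU0 = 4 * math.pi * 1e-7  # permeability of free space (H/m)

def l_self(l_tsv: float, r_tsv: float) -> float:
    """Eq. (6): partial self-inductance of a TSV of length l and radius r (H)."""
    return MU0 * l_tsv / (2 * math.pi) * (math.log(2 * l_tsv / r_tsv) - 0.75)

def l_mutual(l_tsv: float, pitch: float) -> float:
    """Eq. (7): partial mutual inductance of two parallel TSVs at the given pitch (H)."""
    a = l_tsv / pitch
    return MU0 * l_tsv / (2 * math.pi) * (
        math.log(a + math.sqrt(1 + a**2))
        - math.sqrt(1 + (pitch / l_tsv) ** 2)
        + pitch / l_tsv
    )

if __name__ == "__main__":
    for length_um in (50, 100, 200, 400):
        m = l_mutual(length_um * 1e-6, 8e-6)
        print(f"l = {length_um:3d} um -> M = {m * 1e12:.1f} pH")
```

Eq. (7) is algebraically the same expression as the per-aggressor mutual inductance used later in Eq. (9), with $d_i = P_{TSV}$.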

Predictive Technology Model (PTM) [36] FinFET transistor models are employed to implement the inverters in this experiment. The worst-case induced voltage on the victim TSV is reported for different TSV pitches (Fig. 2(a)), process technologies (Fig. 2(b)), and TSV aspect ratios (Fig. 2(c)) over different frequencies. The simulation parameter values are chosen according to the ITRS interconnect report [27], as shown in Table I.

Table I
SIMULATION CONFIGURATIONS IN FIG. 2

| Figure | Technology | Length | Pitch | Diameter |
|---|---|---|---|---|
| 2(a) | 14 nm | 100 µm | 8 µm | 4 µm |
| 2(b) | 20, 16, 14, 10, 7 nm | 100 µm | 8 µm | 4 µm |
| 2(c) | 14 nm | 10–500 µm | 9–29 µm | 5–25 µm |

Fig. 2(c) shows that as TSVs become longer (even with the same aspect ratio), the magnetic flux linking two TSVs increases proportionally. Therefore, as the TSV length grows, the mutual coupling between the aggressors and the victim increases almost linearly, and the coupled voltage rises proportionally.

Although the flux linkage between two TSVs is a strong function of their length, its dependence on the TSV-to-TSV pitch is weak. Changing the pitch between cylindrical TSVs affects mutual inductance in two ways. First, it changes the magnetic field created by the aggressor. Second, by Faraday's law, it alters the surface over which the magnetic field is integrated to calculate the flux linkage. As long as the proximity effect and other higher-order magnetic effects are negligible, the current distribution in a TSV remains almost symmetrical regardless of the pitch. Therefore, the magnetic field created by an aggressor does not vary with the pitch, and the first effect is negligible. Since the pitch between TSVs is at least an order of magnitude smaller than their length, the second effect is also small: the flux linkage, and consequently the mutual coupling, decreases only slightly as the pitch increases, as shown in Fig. 2(a).

Fig. 3. Current flow direction in TSV. (Figure: six cases with the sender and receiver in the lower and upper layers: (a) downward current flow, (b) upward current flow, (c) upward current flow, (d) downward current flow, and (e), (f) off-current mode for 1 → 1 or 0 → 0 transitions.)

As shown in Fig. 2(b), the induced voltage is a function of process technology. As processes advance, gate capacitance gets smaller and voltage rise/fall times become shorter. These two effects have opposite impacts on the charging/discharging current of the gate capacitance. The same current that charges (or discharges) the gate capacitance passes through the TSV and causes inductive coupling to its neighboring TSVs. Thus, the inductively coupled voltage varies across technologies.

As technology advances and the supply voltage shrinks, the coupled voltage becomes a larger fraction of $V_{dd}$, resulting in a higher probability of error. Among all the physical parameters, TSV length has the largest impact on inductive coupling, especially for global TSVs that connect more layers.

B. Current Flow in TSVs

The current flow direction of a TSV is data-dependent, based on the charging and discharging of the intermediate capacitor between each pair of transistors in the stacked planar layers. The behavior of the intermediate capacitor depends on the previous and current data bit values. Fig. 3 illustrates six possible cases, depending on the data bit values and the location of the sender, resulting in three possible current flows in TSVs. There is a downward current flow when the input data bit of the sender changes from '0' to '1' if the sender is in the lower layer, or from '1' to '0' if the sender is in the upper layer, as shown in Fig. 3(a) and Fig. 3(d), respectively. Similarly, there is an upward current flow when the data bit of the sender changes from '1' to '0' if the sender is in the lower layer, or from '0' to '1' if the sender is in the upper layer, as shown in Fig. 3(b) and Fig. 3(c), respectively. A TSV does not carry any current if there is no switching between the previous and next data bit values, as shown in Fig. 3(e) and Fig. 3(f). In the rest of this article, such a TSV is called an inactive TSV.

Fig. 4. Different TSV patterns leading to inductive coupling
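The six cases of Fig. 3 reduce to a small lookup on the bit transition and the sender's layer. The sketch below encodes them exactly as stated in the text; the function name and the string labels are our own.

```python
from typing import Literal

Direction = Literal["down", "up", "none"]

def tsv_current_direction(prev_bit: int, next_bit: int, sender_lower: bool) -> Direction:
    """Current direction in a TSV for one transition of the sender's bit (Fig. 3)."""
    if prev_bit == next_bit:
        return "none"  # Fig. 3(e), (f): inactive TSV
    rising = prev_bit == 0 and next_bit == 1
    if sender_lower:
        # Fig. 3(a), (b): 0 -> 1 flows downward, 1 -> 0 flows upward.
        return "down" if rising else "up"
    # Fig. 3(c), (d): with the sender in the upper layer the directions swap.
    return "up" if rising else "down"
```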

IV. INDUCTIVE COUPLING ANALYSIS

A. Problem Definition

An accurate analysis of coupling-induced failure requires a complex electromagnetic analysis of neighboring TSVs. Since such an analysis is outside the scope of this article, an approximate form of the problem is considered.

Assuming that the electromagnetic proximity effect and other higher-order effects can be neglected, the coupling-induced voltage $\beta_{tot}$ is simply the sum of the voltages induced on the victim TSV by the neighboring TSVs. By Faraday's law:

$$\beta_{tot} = \sum_{i=1}^{N} V_{coupl,i} = \sum_{i=1}^{N} M_{v,i}\,\frac{dI_i}{dt} \tag{8}$$

where
• $N$ is the total number of aggressors.
• $V_{coupl,i}$ is the voltage coupled on the victim by the $i$th aggressor, assuming all other aggressors have constant current.
• $M_{v,i}$ is the mutual inductance between the $i$th aggressor and the victim TSV, calculated from Equation 9 [35]:

$$M_{v,i} = \frac{\mu_0}{2\pi}\left[l\,\ln\!\left(\frac{l+\sqrt{d_i^2+l^2}}{d_i}\right) + d_i - \sqrt{d_i^2+l^2}\right] \tag{9}$$

where $d_i$ is the distance of the $i$th aggressor from the victim TSV and $l$ is the length of a TSV.
• $I_i$ is the current of the $i$th aggressor TSV.
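The sketch below evaluates Eq. (8) with mutual inductances from Eq. (9). The aggressor list format, the function names, and the dI/dt value in the usage example are hypothetical; the sign of each dI/dt encodes the current direction. With all aggressors at the same distance and the same |dI/dt|, the result is proportional to the signed count of active neighbors, which is the simplification developed next.

```python
import math

MU0 = 4 * math.pi * 1e-7  # permeability of free space (H/m)

def mutual_inductance(l: float, d: float) -> float:
    """Eq. (9): M between the victim and an aggressor at distance d (SI units)."""
    s = math.sqrt(d * d + l * l)
    return MU0 / (2 * math.pi) * (l * math.log((l + s) / d) + d - s)

def coupled_voltage(l: float, aggressors: list[tuple[float, float]]) -> float:
    """Eq. (8): beta_tot = sum_i M_{v,i} * dI_i/dt.

    Each aggressor is a (d_i in m, dI_i/dt in A/s) pair; the sign of dI/dt
    encodes the current direction, and 0 marks an inactive TSV."""
    return sum(mutual_inductance(l, d) * didt for d, didt in aggressors)

if __name__ == "__main__":
    l, pitch = 100e-6, 8e-6
    didt = 1e-3 / 10e-12  # hypothetical: 1 mA change in 10 ps
    all_same = coupled_voltage(l, [(pitch, didt)] * 4)             # alpha = 4
    mixed = coupled_voltage(l, [(pitch, didt), (pitch, didt),
                                (pitch, -didt), (pitch, 0.0)])    # alpha = 1
    print(f"alpha=4: {all_same * 1e3:.1f} mV, alpha=1: {mixed * 1e3:.1f} mV")
```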


The representation of $\beta_{tot}$ can be further simplified as follows. Assume that the inductive coupling voltage caused by a single horizontal or vertical neighboring TSV is $\beta$; then $\beta_{tot} = \alpha \times \beta$, where the parameter $\alpha$ depends on the current flow direction and arrangement of the active neighboring TSVs.

Each victim TSV has four neighbors in the horizontal and vertical directions. Fig. 4 shows a top view of the different geometrical possibilities of neighbor configurations. Only these four neighbors are considered, to simplify the analysis. The parameter $\alpha$ equals the algebraic sum of the current values of the victim TSV's neighbors; with only four neighbors, $\alpha$ takes values in $\{-4, \dots, 4\}$. Clearly, the severity of inductive coupling is higher for larger absolute values of $\alpha$: the larger the sum of neighboring currents, the higher the inductively coupled voltage. The goal of this article is therefore to reduce the occurrence of current configurations that lead to high values of $|\alpha|$.

In the cross-section view of the TSVs, the currents of the horizontal and vertical neighbors of a victim TSV are examined to measure the severity of inductive TSV-to-TSV coupling. The effect of diagonal neighboring TSVs, which cause less mutual coupling than adjacent TSVs, is not considered in this work.

B. Coding Scheme

The main contribution of this article is a coding scheme that mitigates inductive coupling by adjusting the sequence of data flits¹. The suggested coding technique replaces larger $\alpha$ values with smaller ones. While the baseline algorithm is intended to show the mitigation gain obtained by our method, a variation of it, called the “enhanced algorithm,” introduces scalability into the baseline approach.

One possible approach to data modification is to invert a properly selected set of input bit streams. With this method, decoding the received signal requires knowledge of the locations of the inverted bits, incurring serious overhead. One workaround is to invert entire rows of TSV data rather than individual TSV bits. Clearly, this reduction in overhead comes at the cost of less effective coupling mitigation.

The baseline algorithm consists of two phases:

1) In the first phase, each cell decides whether or not to submit inversion requests to its vertical neighbors (above and below), with the goal of decreasing the sum of its neighboring currents. Requests are submitted to only two neighbors, rather than four, for the sake of design simplicity. These requests are stored for later processing.

2) Once all requests have been submitted, cells process their received requests and decide whether or not to accept inversion. Finally, based on these individual decisions, each row decides whether or not to accept inversion.

¹Flit stands for “flow control unit.”

C. System Model

Assume that the number of bits to be transmitted over the TSVs in each time slot is $L_D$. The encoded data block is sent over a matrix of TSVs with $N_R$ rows and $N_C$ columns, where $N_R \times N_C = L_D$. With this convention, the original data to be transmitted at time slot $t$ is represented by the matrix $D_t$. Similarly, the encoded data that has already been sent at time $t-1$ is represented by $D_{t-1}$.

The current flow direction of each TSV is determined by the encoded data already sent over the TSV, namely $d_{t-1}$ (a cell of the matrix $D_{t-1}$), together with the bit sent next. Similarly, $\hat{d}_t$ represents the encoded data bit to be transmitted, while $d_t$ denotes the original bit to be transmitted at time slot $t$. With this convention and the proposed inversion mechanism, $\hat{d}_t$ is either $d_t$ or its complement $\bar{d}_t$. A simple analysis of the circuitry connected to a TSV reveals that:

1) If $d_{t-1} = 0$ and $\hat{d}_t = 0$, the TSV current will be 0 (#).

2) If $d_{t-1} = 0$ and $\hat{d}_t = 1$, the TSV current will be 1 (⊙).

3) If $d_{t-1} = 1$ and $\hat{d}_t = 0$, the TSV current will be −1 (⊗).

4) If $d_{t-1} = 1$ and $\hat{d}_t = 1$, the TSV current will be 0 (#).

with ⊙, #, and ⊗ representing current values of 1, 0, and −1, respectively. Consequently, the direction of the current can be calculated if $d_{t-1}$ and $\hat{d}_t$ are known. From the preceding discussion, the $N_R \times N_C$ matrix $C$ representing the TSV currents, for data matrices $D_1$ and $D_2$ sent in consecutive time slots, is:

$$C(D_1, D_2) = D_2 - D_1 \tag{10}$$

The key parameter in the baseline algorithm is the sum of neighboring TSV currents. Therefore, it is helpful to define an $N_R \times N_C$ matrix $P$, whose $(i, j)$th element $P_{ij}$ is the algebraic sum of the currents of the neighbors of TSV $(i, j)$. From this definition, the elements of $P$ can take any value in the set $\{-4, \dots, 4\}$.
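The following sketch builds $C$ from Eq. (10) and then $P$ on plain Python lists. The function names are hypothetical, and treating missing neighbors at the matrix edges as contributing zero current is our reading, which matches the $P$ matrix of the 4 × 4 example in Section V-C.

```python
Matrix = list[list[int]]

def current_matrix(d_prev: Matrix, d_next: Matrix) -> Matrix:
    """Eq. (10): C = D_next - D_prev; entries are 1 (⊙), 0 (#) or -1 (⊗)."""
    return [[b - a for a, b in zip(row_prev, row_next)]
            for row_prev, row_next in zip(d_prev, d_next)]

def neighbor_sum(c: Matrix) -> Matrix:
    """P[i][j]: algebraic sum of the currents of the up, down, left and right
    neighbours of TSV (i, j); edge TSVs simply have fewer neighbours."""
    n_rows, n_cols = len(c), len(c[0])
    p = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n_rows and 0 <= jj < n_cols:
                    p[i][j] += c[ii][jj]
    return p

if __name__ == "__main__":
    # D_{t-1} and D_t from the worked example in Section V-C.
    d_prev = [[0, 1, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1]]
    d_t = [[1, 0, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 1, 1]]
    for row in neighbor_sum(current_matrix(d_prev, d_t)):
        print(row)
```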

V. BASELINE CODING ALGORITHM

In the proposed coding, each cell (corresponding to a TSV) sends inversion requests to the cells above and below it, based on its neighborhood condition. These neighbor cells then decide, based on the received requests, whether or not to honor them. Before delving into the details, the effect of bit inversion on TSV current should be examined.

A. Effect of bit inversion on TSV current

The direction of the current passing through a TSV is determined by the previous data bit already sent over the TSV and the data bit to be sent, as discussed in Section III-B. In the coding algorithm, $\hat{d}_t = \bar{d}_t$ if the inversion decision is taken; otherwise, $\hat{d}_t = d_t$. The change of current is summarized as follows:

1) With no inversion, $(d_{t-1}, \hat{d}_t = d_t) = (0, 0)$ results in #. If an inversion is performed, i.e., $(d_{t-1}, \hat{d}_t = \bar{d}_t) = (0, 1)$, the current will be ⊙.

2) With no inversion, $(d_{t-1}, \hat{d}_t = d_t) = (0, 1)$ results in ⊙. If an inversion is performed, i.e., $(d_{t-1}, \hat{d}_t = \bar{d}_t) = (0, 0)$, the current will be #.

3) With no inversion, $(d_{t-1}, \hat{d}_t = d_t) = (1, 0)$ results in ⊗. If an inversion is performed, i.e., $(d_{t-1}, \hat{d}_t = \bar{d}_t) = (1, 1)$, the current will be #.

4) With no inversion, $(d_{t-1}, \hat{d}_t = d_t) = (1, 1)$ results in #. If an inversion is performed, i.e., $(d_{t-1}, \hat{d}_t = \bar{d}_t) = (1, 0)$, the current will be ⊗.

It follows that inverting $d_t$ can change # into either ⊗ or ⊙ (if $d_t = 1$ or $d_t = 0$, respectively), while ⊙ and ⊗ can only be changed into #.
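The four cases above can be checked mechanically: the TSV current is the transmitted bit minus the previously transmitted bit, and inversion replaces $d_t$ with $1 - d_t$. The function names below are hypothetical.

```python
def tsv_current(prev_bit: int, sent_bit: int) -> int:
    """Current for one transition: 1 (⊙), 0 (#) or -1 (⊗)."""
    return sent_bit - prev_bit

def inversion_effect(prev_bit: int, data_bit: int) -> tuple[int, int]:
    """Return (current if d_t is sent as is, current if its complement is sent)."""
    return tsv_current(prev_bit, data_bit), tsv_current(prev_bit, 1 - data_bit)

if __name__ == "__main__":
    for prev in (0, 1):
        for bit in (0, 1):
            plain, inverted = inversion_effect(prev, bit)
            print(f"(d_prev, d_t) = ({prev}, {bit}): {plain:+d} -> {inverted:+d}")
```

Only a zero current can flip to either sign; a nonzero current can only return to zero, which is why inversion never turns ⊙ into ⊗ directly.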

B. Reducing the sum of neighbor currents by inversion

The proposed coding consists of two phases, as follows.

1) Submitting inversion requests: As mentioned previously, the goal of each cell is to reduce the sum of its neighbor currents by inverting the data on the neighbor TSVs above and below it. Table II lists all possible forms of requests that can be submitted by a victim TSV ⊕ to its neighbors to reduce $|\alpha|$. The classification is based on the various scenarios that occur for the neighbors above and below a victim TSV. The proposed actions in Table II are based on two factors. First, inversion cannot convert a ⊙ current into a ⊗ current (or vice versa). Second, the current change achieved by inverting the corresponding neighbor should decrease the magnitude of the sum of neighbor currents. It is easy to verify that the proposed actions in the table follow these guidelines.

In some configurations of Table II, a request is sent to only one of the vertical neighbors, while in others requests are sent to both neighbors. The former is identified by the word “only” in the third column of Table II. The following example illustrates why a request must sometimes be sent to only one neighbor. Consider the following TSV current configuration, with the victim ⊕ in the center:

  ⊙
# ⊕ #
  #

If the neighbor above is changed as ⊙ → #, the sum of currents becomes 0, which is the ideal situation. The same result is obtained if only the neighbor below is changed as # → ⊗. Applying both changes would instead drive the sum to −1, so a single request suffices.

Table II
REDUCING THE SUM OF NEIGHBOR CURRENTS BY INVERTING VERTICAL NEIGHBORS

| Magnitude of α | Typical pattern (above; left ⊕ right; below) | Inversion request to vertical neighbors |
|---|---|---|
| 1 | #; ⊙ ⊕ #; # | only one: # → ⊗ |
| 1 | ⊙; # ⊕ #; # | only one of these: # → ⊗ or ⊙ → # |
| 1 | ⊗; ⊙ ⊕ ⊙; # | # → ⊗ |
| 1 | ⊙; ⊗ ⊕ ⊙; # | only one of these: # → ⊗ or ⊙ → # |
| 1 | ⊙; ⊙ ⊕ #; ⊗ | ⊙ → # |
| 1 | ⊙; ⊗ ⊕ #; ⊙ | only one: ⊙ → # |
| 2 | ⊙; # ⊕ #; ⊙ | ⊙ → # |
| 2 | ⊙; ⊙ ⊕ #; # | ⊙ → # and # → ⊗ |
| 2 | #; ⊙ ⊕ ⊙; # | # → ⊗ |
| 2 | ⊙; ⊙ ⊕ ⊗; ⊙ | ⊙ → # |
| 2 | ⊙; ⊙ ⊕ ⊙; ⊗ | ⊙ → # |
| 3 | ⊙; ⊙ ⊕ #; ⊙ | ⊙ → # |
| 3 | ⊙; ⊙ ⊕ ⊙; # | ⊙ → # and # → ⊗ |
| 4 | ⊙; ⊙ ⊕ ⊙; ⊙ | ⊙ → # |

It is desirable to have a simple decision rule rather than a lookup table to determine when to submit an inversion request. The entire set of proposed requests in Table II can be summarized in a very simple form:

1) For a cell $(i, j)$ with $P[i][j] > 0$, if the data bit of the target neighbor is 1 and the current of that neighbor is not −1 (⊗), send an inversion request to that neighbor.

2) For a cell $(i, j)$ with $P[i][j] < 0$, if the data bit of the target neighbor is 0 and the current of that neighbor is not 1 (⊙), send an inversion request to that neighbor.

Because a TSV whose new data bit is 1 can only carry a current of 0 or 1, and one whose new data bit is 0 can only carry 0 or −1, the current condition in each rule always holds once the data condition holds; Algorithm 1 therefore checks only the data bit.

The results of these decisions are stored in two $N_R \times N_C$ matrices, called Request From Above (RFA) and Request From Below (RFB), whose elements are either 0 or 1. If $RFA[i][j] = 1$, the cell at $(i, j)$ has received an inversion request from the cell above it; RFB is defined similarly. RFA and RFB are initialized to 0 before any operation. Note also that the first (last) row does not receive any requests from above (below).

2) Processing inversion requests: Once all inversion requests have been submitted and cells with mutually exclusive requests have been marked, the inversion decision is made. One possible technique is for cell $(i, j)$ to grant an inversion request only if $RFA[i][j] = 1$ and $RFB[i][j] = 1$. Since the top and bottom rows have no neighbors above and below them, respectively, the first row of RFA and the last row of RFB are set to 1 to avoid decision conflicts with this approach.

Once RFA and RFB are finalized, they are combined into an intention matrix $I_{mat} = AND(RFA, RFB)$. If $I_{mat}[i][j] = 1$, the data bit of cell $(i, j)$ is marked for inversion. Since the final inversion is row-based rather than cell-based, an intention vector $I_{vec}$ (of size $N_R \times 1$) is constructed from $I_{mat}$.

If the number of 1's in row $i$ of $I_{mat}$ is greater than the number of 0's, row $i$ is selected to be inverted, and $I_{vec}[i]$ is set to 1; $I_{vec}$ is initialized to zero.

Algorithm 1 Summary of baseline algorithm
1: Take $D_t$ and $D_{t-1}$ as inputs
2: Construct $C$ by $C = D_t - D_{t-1}$
3: Construct matrix $P$ by setting its $(i, j)$th element $P[i][j]$ equal to the algebraic sum of the currents of the neighbors of TSV $(i, j)$
4: Initialize RFA and RFB to 0. Then set the first row of RFA and the last row of RFB to 1
5: for all $i$ and $j$, $i \neq N_R$, set $RFB[i][j] = 1$ if at least one of the following conditions holds do
6:   $P[i+1][j] < 0$ and $D_t[i][j] == 0$
7:   $P[i+1][j] > 0$ and $D_t[i][j] == 1$
8: end for
9: for all $i$ and $j$, $i \neq 1$, set $RFA[i][j] = 1$ if at least one of the following conditions holds do
10:   $P[i-1][j] < 0$ and $D_t[i][j] == 0$
11:   $P[i-1][j] > 0$ and $D_t[i][j] == 1$
12: end for
13: $I_{mat} = AND(RFA, RFB)$
14: $I_{vec}[i] = \sum_{j=1}^{N_C} I_{mat}[i][j]$ for all $i \in \{1, \dots, N_R\}$
15: $I_{vec}[i] = \mathbb{1}(I_{vec}[i] \geq N_C/2)$, where the indicator function $\mathbb{1}$ returns 1 when its argument is true and 0 otherwise
16: $\hat{D}_t[i][1 \dots N_C] = XOR(I_{vec}[i], D_t[i][1 \dots N_C])$ for all $i$
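To make Algorithm 1 concrete, the following Python sketch implements its steps on plain lists. It is an illustration rather than the authors' encoder: indices are 0-based, the function name `baseline_encode` is hypothetical, and the row vote follows step 15 (at least $N_C/2$ marked cells), whereas the prose above asks for strictly more 1's than 0's. Like Algorithm 1, it checks only the data bit of the target neighbor.

```python
Matrix = list[list[int]]

def baseline_encode(d_t: Matrix, d_prev: Matrix) -> tuple[Matrix, list[int]]:
    """Sketch of Algorithm 1 with 0-based indices.

    d_t: original data for this slot; d_prev: encoded data sent in the previous slot.
    Returns (encoded data, I_vec)."""
    n_rows, n_cols = len(d_t), len(d_t[0])
    # Steps 2-3: TSV currents and neighbour-current sums.
    c = [[d_t[i][j] - d_prev[i][j] for j in range(n_cols)] for i in range(n_rows)]
    p = [[sum(c[i + di][j + dj]
              for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
              if 0 <= i + di < n_rows and 0 <= j + dj < n_cols)
          for j in range(n_cols)]
         for i in range(n_rows)]

    def requested(p_requester: int, bit: int) -> bool:
        # Steps 6-7 and 10-11: P < 0 asks a neighbour holding 0, P > 0 one holding 1.
        return (p_requester < 0 and bit == 0) or (p_requester > 0 and bit == 1)

    # Step 4: first row of RFA and last row of RFB start at 1, all else at 0.
    rfa = [[1 if i == 0 else 0] * n_cols for i in range(n_rows)]
    rfb = [[1 if i == n_rows - 1 else 0] * n_cols for i in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            if i < n_rows - 1 and requested(p[i + 1][j], d_t[i][j]):
                rfb[i][j] = 1  # request from the cell below
            if i > 0 and requested(p[i - 1][j], d_t[i][j]):
                rfa[i][j] = 1  # request from the cell above
    # Steps 13-15: keep double requests, then a per-row vote against N_C / 2.
    i_mat = [[rfa[i][j] & rfb[i][j] for j in range(n_cols)] for i in range(n_rows)]
    i_vec = [1 if 2 * sum(row) >= n_cols else 0 for row in i_mat]
    # Step 16: invert the selected rows.
    encoded = [[bit ^ i_vec[i] for bit in d_t[i]] for i in range(n_rows)]
    return encoded, i_vec

if __name__ == "__main__":
    d_prev = [[0, 1, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 0, 1]]
    d_t = [[1, 0, 0, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 1, 1]]
    encoded, i_vec = baseline_encode(d_t, d_prev)
    print(i_vec)    # [0, 0, 0, 1]
    print(encoded)  # only the last row is inverted
```

Run on the 4 × 4 example of Section V-C, the sketch selects only the last row for inversion.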

C. An Example of the Baseline Algorithm

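Suppose that the previously transmitted data and the unmodified current data are:

$$D_{t-1} = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix}, \quad D_t = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 \end{bmatrix}$$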

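If $D_t$ is transmitted, the current matrix and the corresponding $P$ matrix will be:

$$C = \begin{bmatrix} 1 & -1 & -1 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 \end{bmatrix} = \begin{bmatrix} \odot & \otimes & \otimes & \otimes \\ \# & \# & \odot & \odot \\ \# & \# & \# & \# \\ \otimes & \odot & \odot & \# \end{bmatrix}$$

$$P = \begin{bmatrix} -1 & 0 & -1 & 0 \\ 1 & 0 & 0 & 0 \\ -1 & 1 & 2 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}$$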

In this case, the numbers of cells with $|P| = 0$ through $|P| = 4$ are 6, 9, 1, 0, and 0, respectively. Now, suppose that the baseline algorithm is applied to $(D_t, D_{t-1})$. RFB and RFA will hold the following values:

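$$RFB = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \quad RFA = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$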

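Then, $I_{mat}$ and the corresponding $I_{vec}$ are:

$$I_{mat} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \quad I_{vec} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

The single marked cell in the first row falls below the $N_C/2$ threshold, so only the last row is selected.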

This results in the inversion of the last row of $D_t$:

$$\hat{D}_t = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$$

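With this inversion, the new current and $P$ matrices will be:

$$C = \begin{bmatrix} 1 & -1 & -1 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}, \quad P = \begin{bmatrix} -1 & 0 & -1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 0 \end{bmatrix}$$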

Note that after inversion, the numbers of cells with $|P| = 0$ through $|P| = 4$ are 11, 5, 0, 0, and 0, respectively: the number of cells with $|P| = 0$ has grown from 6 to 11, while the numbers of cells with $|P| = 1$ and $|P| = 2$ have dropped from 9 to 5 and from 1 to 0, respectively.

VI. BASELINE ALGORITHM EVALUATION

To evaluate the efficiency of the proposed code, a long random bit sequence is fed into two systems: one with the encoder and one without. For a more robust evaluation of the baseline algorithm, PARSEC [37] benchmark memory traces captured with PIN [38] are also used as real-world data traffic. Denote the class of cells with $|P_{ij}| = k$ by $\psi_k$ for $k = 0, \dots, 4$. The relative occurrence frequency $F(i)$ of each $\psi_i$ is then counted. Comparing the occurrence frequency diagrams of the two systems shows whether the proposed coding lowers the occurrence of $\psi_i$ with larger $i$; such a decline means a decrease in inductive coupling, as discussed before. Fig. 5(a) and Fig. 5(c) show the relative occurrence of the different $\psi_i$ for an 8 × 8 TSV bus in both the uncoded (left bar) and coded (right bar) systems with random and PARSEC benchmark data traffic, respectively. In Fig. 5(a), there are five pairs of columns, where the left column of each pair belongs to the uncoded system and the right column belongs to the coded system. As is evident from this figure, the relative frequency of occurrence of $\psi_i$ with large $i \geq 2$ has decreased, while that of $\psi_1$ remains almost the same. Importantly, the relative frequency of occurrence of $\psi_0$ has increased. Fig. 5(b) plots the ratio of the right column to the left column of each pair in Fig. 5(a) to emphasize the change in the relative frequency of occurrence of the different $\psi_i$.


[Fig. 5 plots: (a) relative occurrence frequency $F(i)$ versus class index $i$, uncoded vs. coded; (b) $F_{coded}(i)/F_{uncoded}(i)$ versus $i$; (c) number of occurrences (×10^5) of each class $|P| = 0, \ldots, 4$ for the PARSEC workloads blackscholes, bodytrack, canneal, facesim, ferret, fluidanimate, raytrace, vips, and x264.]

Fig. 5. Evaluating the efficiency of the baseline algorithm for an 8 × 8 TSV bus with both random (Fig. 5(a) and 5(b)) and PARSEC (Fig. 5(c)) data traffic. In Fig. 5(c), for each workload the left bar represents the uncoded and the right bar the coded results.

[Plot: ICM gain (%) versus number of columns $N_C$ for $N_R$ = 4, 6, 8, 10, 16, and 32.]

Fig. 6. ICM gain versus number of columns

A. Evaluation Metrics

Beyond visual inspection, a scalar measure can be defined to quantify the effectiveness of the proposed coding. One possibility is a weighted sum of relative frequencies in which the relative frequency of $\psi_i$ is weighted by $i$, penalizing high values of $i$. Consequently, the lower the value of this measure, the better the efficiency. We define the Inductive Coupling Mitigation (ICM) metric, denoted by $\mu$ in Equation 11, to evaluate the efficiency of the proposed algorithm.

$$\mu = \sum_{i=0}^{4} i \cdot F(i) \qquad (11)$$

The “ICM gain” of the coding algorithm is quantified as the ratio of the “change in $\mu$ due to coding” to the “$\mu$ of the uncoded system”:

$$\nu = \frac{|\mu_{uncoded} - \mu_{coded}|}{\mu_{uncoded}} \qquad (12)$$

Applying this measure to the results of Fig. 5 shows that the ICM measure changes from 0.99 to 0.78, which amounts to a 21% ICM gain.
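Equations 11 and 12 are simple enough to state directly as code. A minimal Python sketch (the function names are illustrative, not from the paper) that reproduces the reported 21% gain from the two $\mu$ values; the second part applies Equation 11 to the class counts of the single-cycle $4 \times 4$ example above, purely as an illustration:

```python
def icm_measure(freq):
    """Equation 11: mu = sum over i of i * F(i), where freq[i] = F(i)."""
    return sum(i * f for i, f in enumerate(freq))

def icm_gain(mu_uncoded, mu_coded):
    """Equation 12: nu = |mu_uncoded - mu_coded| / mu_uncoded."""
    if mu_uncoded == 0:
        raise ValueError("mu_uncoded must be non-zero")
    return abs(mu_uncoded - mu_coded) / mu_uncoded

# Reported ICM measures for the 8x8 bus of Fig. 5
print(round(icm_gain(0.99, 0.78), 3))  # 0.212, i.e. about 21%

# Single-cycle 4x4 example: |P| class counts before (6, 9, 1, 0, 0) and after (11, 5, 0, 0, 0)
mu_before = icm_measure([c / 16 for c in (6, 9, 1, 0, 0)])
mu_after = icm_measure([c / 16 for c in (11, 5, 0, 0, 0)])
print(mu_before, mu_after, icm_gain(mu_before, mu_after))
```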

Finally, the proposed coding imposes an overhead on data transmission. To transmit $L_D$ bits over an $N_R \times N_C$ TSV matrix ($L_D = N_R \times N_C$), an additional $N_R$ bits are required to transmit $I_{vec}$ alongside the modified data. Thus, the overhead is:

$$\eta = \frac{N_R}{L_D} = \frac{N_R}{N_R N_C} = \frac{1}{N_C} \qquad (13)$$

[Plot: ICM gain (%) versus number of rows $N_R$ for $N_C$ = 4, 6, 8, 10, 12, 16, 24, and 32.]

Fig. 7. ICM gain versus number of rows

B. Scalability of Baseline Algorithm

It is of interest to analyze how the ICM gain $\nu$ varies with the TSV bus dimensions $N_R$ and $N_C$. This analysis guides the physical distribution and placement of TSVs within a vault: for a given bus size ($N_R \times N_C$) and under given constraints on $\nu$ and $\eta$, it is used to choose the values of $N_R$ and $N_C$.

Fig. 6 and Fig. 7 show the ICM gain for a fixed number of rows (columns) as the number of columns (rows) grows. Increasing either $N_R$ or $N_C$ decreases $\nu$; however, the negative effect of increasing $N_C$ is more severe than that of increasing $N_R$. Fig. 8 illustrates an instance of this by comparing $\nu$ for the two cases $N_R = 4$ and $N_C = 4$: at each fixed bus size, the ICM gain is better when the number of columns is fixed.

In the following, the observed variation of $\nu$ with $N_C$ and $N_R$ is justified. A rigorous justification would require calculating $F(i)$ for $i = 0, \ldots, 4$ for the coded system. However, this calculation requires analyzing a very large-scale Markov chain, which is outside the scope of this paper. Consequently, indirect methods are used to justify the observed behavior.

[Plots: improvement ratio (left) and overhead ratio (right) versus bus size (up to 256 bits) for $N_R = 4$ and $N_C = 4$.]

Fig. 8. ICM gain and corresponding information redundancy for the same number of bits in a bus with column or row growth

1) Justification of $N_C$ effect on ICM gain: An indirect approach for examining the impact of $N_C$ on $\nu$ is to calculate the probability of row inversion. Given the algorithm, it is reasonable to assume that adjacent cells in a row are marked for inversion independently, i.e.:

$$P\left(\mathrm{Imat}_{i,j} = 1 \mid \{\mathrm{Imat}_{i,n},\ n \in \{1, \ldots, N_C\} \setminus \{j\}\}\right) \approx P\left(\mathrm{Imat}_{i,j} = 1\right)$$

where $P$ denotes the probability of its argument, "$\setminus$" denotes set difference, and $\mathrm{Imat}_{i,j}$ denotes the element of $\mathrm{Imat}$ at row $i$ and column $j$. Assuming the same inversion probability $P_{inv,cell}$ for all cells (data bits), and given that a row is inverted only when more than half of its cells are marked for inversion, the probability of row inversion is given by Equation 14.

$$P_{inv,row} = \sum_{i=\lceil N_C/2 \rceil}^{N_C} \binom{N_C}{i} (P_{inv,cell})^{i} (1 - P_{inv,cell})^{N_C - i} \qquad (14)$$

Fig. 9 shows $P_{inv,row}$ as a function of $N_C$ for different values of $P_{inv,cell}$. When $P_{inv,cell} < 0.5$, $P_{inv,row}$ decreases with $N_C$. Combining this observation with the fact that $P_{inv,cell} < 0.5$ for the baseline algorithm (as shown in Appendix A), it follows that increasing $N_C$ decreases $P_{inv,row}$ within the proposed algorithm. As $P_{inv,row}$ decreases, the code is employed less frequently, and this reluctance to engage the inversion mechanism deprives the system of the ICM gain promised by coding. This explains the observed descending trend of ICM gain with increasing $N_C$.
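Equation 14 is a binomial tail and is easy to evaluate. A minimal Python sketch (the function name is illustrative) that takes the lower summation limit $\lceil N_C/2 \rceil$ verbatim from Equation 14; for even $N_C$ this limit also counts the exact-half case, so values of the same parity give the cleanest view of the trend in Fig. 9:

```python
from math import ceil, comb

def p_row_inversion(n_c: int, p_cell: float) -> float:
    """Equation 14: probability that a row of n_c cells is inverted, assuming each
    cell is marked independently with the same probability p_cell."""
    if n_c < 1 or not 0.0 <= p_cell <= 1.0:
        raise ValueError("need n_c >= 1 and 0 <= p_cell <= 1")
    return sum(
        comb(n_c, i) * p_cell**i * (1.0 - p_cell) ** (n_c - i)
        for i in range(ceil(n_c / 2), n_c + 1)  # lower limit as written in Equation 14
    )

# Row inversion probability for a few row widths (even N_C only)
for p in (0.3, 0.5, 0.7):
    print(p, [round(p_row_inversion(n, p), 3) for n in (4, 8, 16, 32)])
```

With $P_{inv,cell} = 0.3$ the values fall as $N_C$ grows, and with $0.7$ they rise, matching the qualitative behavior described for Fig. 9.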

In practice, cells closer to the borders and corners have a higher probability of inversion, as discussed in Appendix A. To elaborate, $P_{inv,cell}$ is a function of the probability of 0's and 1's in the input bit stream and of the location of the cell. Consequently, the row inversion probability given by Equation 14 is only an approximation. However, as long as the maximum inversion probability of the cells in a row is less than 0.5, the same descending trend of row inversion probability with $N_C$ is observed (see Fig. 9).

[Plot: probability of row inversion versus $N_C$ for P(cell inv) = 0.3, 0.4, 0.5, 0.6, 0.7, and 0.9.]

Fig. 9. Probability of row inversion versus number of cells in a row

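2) Justification of $N_R$ effect on ICM gain: The effect of $N_R$ on $\nu$ is examined in a similar fashion. First, note that based on the calculations of Appendix A, rows 1 and $N_R$ have the highest probability of row inversion, rows 2 and $N_R - 1$ have a lower probability, and rows $3, 4, \ldots, N_R - 2$ have the lowest. Denote these probabilities by $P^{(1,N_R)}_{inv,row}$, $P^{(2,N_R-1)}_{inv,row}$, and $P^{(3\cdots N_R-2)}_{inv,row}$. Assuming that rows are inverted independently, the expected fraction of inverted rows (for $N_R \geq 4$) is given by Equation 15.

$$N_{inv} = \frac{2}{N_R}P^{(1,N_R)}_{inv,row} + \frac{2}{N_R}P^{(2,N_R-1)}_{inv,row} + \left(1-\frac{4}{N_R}\right)P^{(3\cdots N_R-2)}_{inv,row} \qquad (15)$$

By calculating the relative sensitivity of $N_{inv}$ to $N_R$, i.e., $(dN_{inv}/dN_R)/N_{inv}$, while treating the three row probabilities as independent of $N_R$, it is possible to indirectly justify the effect of $N_R$ on $\nu$:

$$\frac{dN_{inv}/dN_R}{N_{inv}} = \frac{-\frac{2}{N_R^2}P^{(1,N_R)}_{inv,row} - \frac{2}{N_R^2}P^{(2,N_R-1)}_{inv,row} + \frac{4}{N_R^2}P^{(3\cdots N_R-2)}_{inv,row}}{\frac{2}{N_R}P^{(1,N_R)}_{inv,row} + \frac{2}{N_R}P^{(2,N_R-1)}_{inv,row} + \left(1-\frac{4}{N_R}\right)P^{(3\cdots N_R-2)}_{inv,row}} = \frac{1}{N_R}\cdot\frac{-2P^{(1,N_R)}_{inv,row} - 2P^{(2,N_R-1)}_{inv,row} + 4P^{(3\cdots N_R-2)}_{inv,row}}{2P^{(1,N_R)}_{inv,row} + 2P^{(2,N_R-1)}_{inv,row} + (N_R-4)P^{(3\cdots N_R-2)}_{inv,row}} = \frac{1}{N_R}\,\mathcal{O}(1) \qquad (16)$$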

where $\mathcal{O}(1)$ denotes a quantity of order one. For $N_R > 10$, this ratio is very small, and it keeps decreasing as $N_R$ grows. This low sensitivity means that the average fraction of inverted rows remains almost constant, so the ICM gain does not decrease significantly. This explains why increasing the number of rows has no noticeable impact on the ICM gain.
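Equations 15 and 16 can be checked numerically. The following minimal Python sketch uses hypothetical row-inversion probabilities (0.5, 0.3, 0.2), chosen only to respect the ordering of Appendix A; they are not values from the paper. It evaluates the expected fraction of inverted rows and its relative sensitivity, holding the probabilities fixed as in the derivation:

```python
def n_inv(n_r: float, p_edge: float, p_next: float, p_inner: float) -> float:
    """Equation 15: expected fraction of inverted rows (valid for n_r >= 4)."""
    return (2 / n_r) * p_edge + (2 / n_r) * p_next + (1 - 4 / n_r) * p_inner

def rel_sensitivity(n_r: float, p_edge: float, p_next: float, p_inner: float) -> float:
    """Equation 16: (dN_inv/dN_R) / N_inv, with the row probabilities held constant."""
    d_n_inv = (-2 * p_edge - 2 * p_next + 4 * p_inner) / n_r**2
    return d_n_inv / n_inv(n_r, p_edge, p_next, p_inner)

# Hypothetical probabilities, ordered as in Appendix A (rows 1 and N_R highest)
P_EDGE, P_NEXT, P_INNER = 0.5, 0.3, 0.2
for n_r in (4, 8, 16, 32, 64):
    print(n_r,
          round(n_inv(n_r, P_EDGE, P_NEXT, P_INNER), 4),
          round(rel_sensitivity(n_r, P_EDGE, P_NEXT, P_INNER), 5))
```

The printed sensitivity shrinks toward zero as $N_R$ grows, which is the behavior the $\frac{1}{N_R}\mathcal{O}(1)$ bound describes.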

While the discussion so far suggests that a small $N_C$ is beneficial when the ICM gain $\nu$ is the only parameter of interest, low values of $N_C$ increase the overhead, which is $1/N_C$ (Equation 13).

VII. ENHANCED CODING ALGORITHM

In some 3D IC devices, a large bus size and a high ICM gain are required simultaneously. 3D stacked DRAM is an example, in which the TSV data bus typically exceeds a hundred bits. As discussed before, the ICM gain drops as $N_R$ or $N_C$ increases, most severely with $N_C$. This means that the baseline coding is not scalable and may not provide sufficient ICM gain for applications with large bus sizes such as 3D stacked DRAM. A partitioning scheme is proposed in the following subsection to make the ICM gain of our coding scalable to larger bus sizes.

[Plot: ICM gain versus bus size $L_D$ (bits) for $P$ = 1, 2, 4, and 8 with $N_R = 4$ and $N_C = L_D/N_R$, and for $P = 1$ with $N_C = 4$ and $N_R = L_D/N_C$.]

Fig. 10. ICM gain for the same bus size with different partitions (P) vs row ($N_R$) growth

A. Partitioning Approach

Given that the ICM gain of the baseline algorithm is significant for small bus sizes, we propose an approach that makes the gain scalable for large buses: the large $N_R \times N_C$ TSV matrix is partitioned into $q \times p$ submatrices of size $(N_R/q) \times (N_C/p)$, and the coding is applied to all submatrices independently, resulting in an enhanced algorithm. With this method, the ICM gain of the $N_R \times N_C$ network equals the ICM gain of an $(N_R/q) \times (N_C/p)$ network, at the cost of increased overhead. Denoting the ICM gain and overhead of an $m \times n$ matrix (i.e., $m = N_R$ and $n = N_C$) with a $q \times p$ partitioning by $\nu(m,n,q,p)$ and $\eta(m,n,q,p)$, they are given by Equation 17:

$$\nu(m,n,q,p) = \nu\left(\frac{m}{q},\frac{n}{p},1,1\right), \qquad \eta(m,n,q,p) = \frac{p \cdot q \cdot \frac{N_R}{q}}{N_R N_C} = \frac{p}{N_C} = p\,\eta(m,n,1,1) \qquad (17)$$

Column partitioning ($q = 1$) is considered here, rather than row or row-column partitioning, for two reasons. First, as discussed in Section VI-B, $N_C$ affects $\nu$ more severely than $N_R$ does, so partitioning the columns is preferable. Second, row partitioning offers little gain improvement because $\nu$ is insensitive to $N_R$. With column partitioning, the ICM gain of the smaller $N_R \times (N_C/p)$ matrix is obtained, while the overhead $p/N_C$ is the same as that of a row-scaled bus with $N_C/p$ columns.
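The overhead side of Equations 13 and 17 can be tabulated with a few lines of Python. This is a minimal sketch; the function name is illustrative, and $q$ is kept as a parameter only for generality (the text fixes $q = 1$):

```python
def overhead(n_r: int, n_c: int, p: int = 1, q: int = 1) -> float:
    """Inversion-bit overhead of an n_r x n_c TSV bus split into q x p sub-matrices.

    Each sub-matrix has n_r/q rows and carries one inversion bit per row, so the
    bus sends p*q*(n_r/q) extra bits per n_r*n_c data bits (Equation 17);
    p = q = 1 gives Equation 13, eta = 1/n_c.
    """
    if n_r % q or n_c % p:
        raise ValueError("q must divide n_r and p must divide n_c")
    extra_bits = p * q * (n_r // q)
    return extra_bits / (n_r * n_c)

# A 128-bit bus: row scaling (N_C = 4), column scaling (N_R = 4), and 4 column partitions
print(overhead(32, 4), overhead(4, 32), overhead(4, 32, p=4))  # 0.25 0.03125 0.125
```

With $p = 4$, the 128-bit bus carries the same per-bit overhead as a single $4 \times 8$ sub-bus, which is the equality $\eta = p/N_C = 1/(N_C/p)$ expressed numerically.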

A general benefit of partitioning is that it enables a parallel implementation, resulting in a faster encoder architecture. The degree of parallelism equals the number of partitions.
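The column partitioning can be sketched as a thin wrapper that slices the bit matrix into $p$ column blocks and runs an encoder on each block independently; the per-block calls are what a hardware implementation could evaluate in parallel. This is a minimal Python sketch under stated assumptions: `toy_row_invert` is a hypothetical stand-in (a bus-invert style majority-toggle rule), not the paper's RFA/RFB marking procedure, and all names are illustrative.

```python
from typing import Callable, List, Tuple

Matrix = List[List[int]]
Encoder = Callable[[Matrix, Matrix], Tuple[Matrix, List[int]]]

def split_columns(data: Matrix, p: int) -> List[Matrix]:
    """Split an N_R x N_C bit matrix into p column blocks of N_C/p columns (q = 1)."""
    n_c = len(data[0])
    if n_c % p:
        raise ValueError("N_C must be divisible by p")
    w = n_c // p
    return [[row[k * w:(k + 1) * w] for row in data] for k in range(p)]

def encode_partitioned(prev: Matrix, cur: Matrix, p: int, encoder: Encoder):
    """Apply `encoder` to each column partition independently (parallelizable).
    Returns the re-assembled matrix and one N_R-bit inversion vector per partition."""
    blocks = [encoder(pb, cb) for pb, cb in zip(split_columns(prev, p), split_columns(cur, p))]
    encoded = [sum((blk[0][r] for blk in blocks), []) for r in range(len(cur))]
    return encoded, [blk[1] for blk in blocks]

def toy_row_invert(prev: Matrix, cur: Matrix) -> Tuple[Matrix, List[int]]:
    """Hypothetical stand-in encoder: invert a row when more than half of its bits
    would toggle relative to the previously sent row (NOT the paper's marking rule)."""
    out, ivec = [], []
    for pr, cr in zip(prev, cur):
        toggles = sum(a != b for a, b in zip(pr, cr))
        inv = int(toggles > len(cr) / 2)
        out.append([b ^ inv for b in cr])
        ivec.append(inv)
    return out, ivec

prev = [[0] * 8 for _ in range(4)]
cur = [[1, 1, 1, 0, 0, 0, 0, 0], [0] * 8, [1] * 8, [0, 0, 0, 0, 1, 1, 1, 1]]
encoded, ivecs = encode_partitioned(prev, cur, p=2, encoder=toy_row_invert)
print(ivecs)
```

Each partition contributes its own $N_R$-bit inversion vector, so the wrapper sends $p \cdot N_R$ extra bits in total, consistent with Equation 17 for $q = 1$.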

[Plot: overhead (%) versus bus size $L_D$ (bits) for $P = 1$ with $N_C = 4$ and $N_R = L_D/N_C$, and for $P$ = 1, 2, and 4 with $N_R = 4$ and $N_C = L_D/N_R$.]

Fig. 11. Information redundancy overhead rate for the same bus size with different partitions (P) vs row ($N_R$) growth

[Plot: encoder area (µm²) versus bus size $L_D$ = 64, 128, 256, and 512 bits for $P = 1$ with $N_C = 8$ and $N_R = L_D/N_C$, and for $P$ = 1, 2, 4, and 8 with $N_R = 8$ and $N_C = L_D/N_R$.]

Fig. 12. Encoder silicon area

B. Enhanced Algorithm Evaluation

The ICM gain and overhead of the enhanced and baseline algorithms are compared in Fig. 10 and Fig. 11. The row-scaling approach (constant $N_C$) offers a consistent ICM gain but suffers from a constantly high overhead regardless of bus size. The column-scaling approach (constant $N_R$), on the other hand, reduces the overhead as the bus size increases, but causes a severe drop in ICM gain. The enhanced algorithm combines the high ICM gain of the row-scaling approach with the low overhead of the column-scaling approach.

C. Hardware Synthesis Results

The encoders of both proposed coding algorithms are synthesized with Synopsys Design Compiler using a 28 nm TSMC library (1.05 V, 25 °C) to report latency, power consumption, and area. The decoder is the same for both the baseline and enhanced algorithms. Its overhead is negligible compared with the encoder in terms of latency, power consumption, and area, since it consists of a simple comparator and a set of inverter gates.

Table III
PROPOSED ENCODER LATENCY VERSUS TSV BUS SIZE (ps)

Bus size | P=1 (NC=8, NR=LD/NC) | P=1 (NR=8, NC=LD/NR) | P=2 (NR=8) | P=4 (NR=8) | P=8 (NR=8)
128-bit | 144 | 207 | 147 | 112 | 97
256-bit | 203 | 295 | 207 | 144 | 118
512-bit | 639 | 283 | 219 | 221 | 151

[Plot: encoder power consumption (µW) versus bus size $L_D$ = 64, 128, 256, and 512 bits for $P = 1$ with $N_C = 8$ and $N_R = L_D/N_C$, and for $P$ = 1, 2, 4, and 8 with $N_R = 8$ and $N_C = L_D/N_R$.]

Fig. 13. Proposed encoder power consumption versus TSV bus size

A TSV data bus of variable size is also modeled in HSPICE. The TSV model in [10] is used to capture the inductive TSV-to-TSV coupling effect.

At each step of the hardware implementation, the inversion requests of a single TSV row are processed; therefore, the encoder latency is constrained by the TSV row size. As shown in Table III, the latency of row scaling exceeds that of column scaling for the 512-bit bus (639 ps versus 283 ps). In the partitioning method, the number of columns per partition decreases as the number of partitions increases, reducing the overall processing latency, because all the smaller partitions are evaluated in parallel. For example, for a 512-bit TSV bundle, the latency of the encoder with 8 partitions (151 ps) is roughly half of that with one partition (283 ps).

A larger bus needs more complex combinational logic and memory units, resulting in higher power consumption and area, as depicted in Fig. 12 and Fig. 13. However, when the TSV columns are partitioned, the hardware complexity declines, since less wiring and fewer logic components are needed. Fig. 12 shows the decreasing trend in area for the same bus size as the partitioning factor grows: for a 512-bit bus, the encoder with 8 partitions occupies almost 20% of the area of the single-partition encoder. Similarly, its power consumption is almost 20% of that of the single-partition encoder, as shown in Fig. 13. Configurations with more partitions consume less power because of the reduced hardware size.

VIII. RELATED WORK

Based on current knowledge of existing supercomputers, exascale systems will experience different kinds of faults. A significant share of the power budget of future exascale systems is dedicated to the memory subsystem, and 3D-stacked DRAM memories have been suggested since current DRAMs will not meet the expected power budget. A system-level design methodology for scalable fault tolerance of distributed on-chip memories in NoCs has been introduced in [46]. The effects of transient faults are examined in synchronous and asynchronous 2D NoCs [41]. An efficient Built-In Self-Repair (BISR) algorithm that fulfils the test and reliability needs of 3D-stacked memories has been proposed [39]. The effects of temperature, refresh period, and ECC policy on the reliability and power consumption of 3D-stacked embedded DRAM are examined to minimize energy consumption without violating the error-rate limit [40]. 3D DRAM circuits include a large number of TSVs, which are prone to open defects and coupling noise. The faulty behavior of TSV open defects occurring on the wordlines and bitlines of 3D DRAM circuits is modeled in [42].

It is anticipated that existing resilience approaches, which rely on automatic or application-level checkpoint-restart, will not be practical, as checkpointing and restarting time will exceed the mean time to failure of a full system [43]. The impact of TSVs on SI in 3D ICs has been considered in several articles [12], [13]. An analytical model for the coupling capacitance between pairs of TSVs is reported in [12]. A complete set of self-consistent equations, including self and coupling terms for the resistance, capacitance, and inductance of various TSV structures, is presented in [13]. Five solutions are suggested in [10] to reduce the coupling: increasing TSV distances, shielding the victim TSVs, inserting buffers at the victim net, decreasing the driver size at the aggressor net, and increasing the load at both the victim and aggressor nets. The last two suggestions have negative implications for timing performance, and the others require high effort at post-design time. Many different coding techniques have been presented to avoid crosstalk among communication links [44], [45]; however, the characteristics of TSVs are entirely different from those of wires on a 2D chip, as described in Section III. A coding scheme has been suggested for a matrix of TSVs that reduces the maximum crosstalk by 25%. This approach is only applicable to capacitive crosstalk (coupling). Moreover, it is not scalable and supports only a mesh of TSVs of size $3 \times n$, limiting the TSV insertion process. It imposes around 40% information redundancy, with an encoder and decoder whose circuit area grows quadratically [15]. However, the impact on SI of inductive coupling, which is more important than capacitive coupling at higher frequencies [35], has not been evaluated yet.

IX. CONCLUSION

Although 3D multiple-stacked IC is a promising solution for exascale computing, its vulnerability to inductive TSV-to-TSV coupling has not been extensively studied. After characterizing this issue, a coding algorithm is proposed that mitigates inductive TSV-to-TSV coupling by modifying the input data stream. The ICM of the algorithm is then evaluated in terms of various metrics, such as the mitigation measure, ICM gain, and data overhead. It is observed that the algorithm provides a significant ICM gain for relatively small bus sizes, while its performance degrades as the bus size increases.

For applications with large buses, such as 3D multiple-stacked ICs, a partitioning approach is added to the algorithm to make it scalable with bus size. At the cost of reasonable overhead, a significant ICM gain is obtained even for large bus sizes. In addition to ICM gain and overhead, practical issues such as power consumption and silicon area are also reported. It is observed that partitioning lowers the encoder's power consumption and silicon area while providing considerable ICM gain for large bus sizes.

REFERENCES

[1] T. A. S. on Exascale Computing, “Report on exascale computing,” 2010.

[2] “The 43rd TOP500 list,” http://www.top500.org/lists/2014/06/, 2014.

[3] K. Yoon, G. Kim, W. Lee, T. Song, J. Lee, H. Lee, K. Park, and J. Kim, “Modeling and analysis of coupling between TSVs, metal, and RDL interconnects in TSV-based 3D IC with silicon interposer,” in Electronics Packaging Technology Conference, EPTC ’09, 11th, 2009, pp. 702–706.

[4] D. Henry, S. Cheramy, J. Charbonnier, P. Chausse, M. Neyret, G. Garnier, C. Brunet-Manquat, S. Verrun, N. Sillon, L. Bonnot, A. Farcy, L. Cadix, M. Rousseau, and E. Saugier, “Development and characterisation of high electrical performances TSV for 3D applications,” in Electronics Packaging Technology Conference, EPTC ’09, 11th, 2009, pp. 528–535.

[5] M. Jung, J. Mitra, D. Z. Pan, and S. K. Lim, “TSV stress-aware full-chip mechanical reliability analysis and optimization for 3D IC,” Commun. ACM, vol. 57, no. 1, pp. 107–115, Jan. 2014.

[6] “Reliability of TSV interconnects: Electromigration, thermal cycling, and impact on above metal level dielectric,” Microelectronics Reliability, vol. 53, no. 1, pp. 17–29, 2013.

[7] A. Shayan, X. Hu, H. Peng, C.-K. Cheng, W. Yu, M. Popovich, T. Toms, and X. Chen, “Reliability aware through silicon via planning for 3D stacked ICs,” in Design, Automation & Test in Europe Conference & Exhibition, DATE ’09, April 2009, pp. 288–291.

[8] D. Pan, S.-K. Lim, K. Athikulwongse, M. Jung, J. Mitra, J. Pak, M. Pathak, and J. seok Yang, “Design for manufacturability and reliability for TSV-based 3D ICs,” in Design Automation Conference (ASP-DAC), 2012 17th Asia and South Pacific, Jan. 2012, pp. 750–755.

[9] S. Itr, “ITRS 2012 Executive Summary,” ITRS.

[10] C. Liu, T. Song, J. Cho, J. Kim, J. Kim, and S.-K. Lim, “Full-chip TSV-to-TSV coupling analysis and optimization in 3D IC,” in Design Automation Conference (DAC), 2011 48th ACM/EDAC/IEEE, 2011, pp. 783–788.

[11] L. T. B. Wu, X. Gu, and M. Ritter, “Electromagnetic modeling of massively coupled through silicon vias for 3D interconnects,” in Microwave and Optical Technology Letters, 2011, pp. 1204–1206.

[12] I. Savidis and E. Friedman, “Closed-form expressions of 3-D via resistance, inductance, and capacitance,” Electron Devices, IEEE Transactions on, vol. 56, no. 9, pp. 1873–1881, 2009.

[13] R. Weerasekera, M. Grange, D. Pamunuwa, H. Tenhunen, and L.-R. Zheng, “Compact modelling of through-silicon vias (TSVs) in three-dimensional (3-D) integrated circuits,” in 3D System Integration, 2009. 3DIC 2009. IEEE International Conference on, 2009, pp. 1–8.

[14] C. Metzler, A. Todri-Sanial, A. Bosio, L. Dilillo, P. Girard, A. Virazel, P. Vivet, and M. Belleville, “Computing detection probability of delay defects in signal line TSVs,” in Test Symposium (ETS), 18th IEEE European, 2013, pp. 1–6.

[15] R. Kumar and S. P. Khatri, “Crosstalk avoidance codes for 3D VLSI,” in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp. 1673–1678.

[16] A. Eghbal, P. M. Yaghini, S. S. Yazdi, and N. Bagherzadeh, “TSV-to-TSV inductive coupling-aware coding scheme for 3D network-on-chip,” in Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), 2014 IEEE International Symposium on, Oct. 2014, pp. 92–97.

[17] T. Xu, P. Liljeberg, and H. Tenhunen, “A study of through silicon via impact to 3D network-on-chip design,” in Electronics and Information Engineering (ICEIE), 2010 International Conference On, vol. 1, Aug. 2010, pp. V1-333–V1-337.

[18] S. Kumar and R. Van Leuken, “A 3D network-on-chip for stacked-die transactional chip multiprocessors using through silicon vias,” in Design & Technology of Integrated Systems in Nanoscale Era (DTIS), 2011 6th International Conference on, April 2011, pp. 1–6.

[19] J. Lee, D. Lee, S. Kim, and K. Choi, “Deflection routing in 3D network-on-chip with TSV serialization,” in Design Automation Conference (ASP-DAC), 2013 18th Asia and South Pacific, Jan. 2013, pp. 29–34.

[20] J. Jeddeloh and B. Keeth, “Hybrid memory cube new DRAM architecture increases density and performance,” in VLSI Technology (VLSIT), 2012 Symposium on, June 2012, pp. 87–88.

[21] J. T. Pawlowski, “Hybrid memory cube (HMC),” in HOT-CHIPS, 2011.

[22] “Hybrid memory cube,” http://www.micron.com/products/hybrid-memory-cube.

[23] F. Dubois, A. Sheibanyrad, F. Petrot, and M. Bahmani, “Elevator-first: A deadlock-free distributed routing algorithm for vertically partially connected 3D-NoCs,” Computers, IEEE Transactions on, vol. 62, no. 3, pp. 609–615, 2013.

[24] Y. Cheng, L. Zhang, Y. Han, and X. Li, “Thermal-constrained task allocation for interconnect energy reduction in 3-D homogeneous MPSoCs,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 21, no. 2, pp. 239–249, 2013.

[25] V. Pasca, L. Anghel, C. Rusu, and M. Benabdenbi, “Configurable serial fault-tolerant link for communication in 3D integrated systems,” in On-Line Testing Symposium (IOLTS), 2010 IEEE 16th International, 2010, pp. 115–120.

[26] K. Tu, “Reliability challenges in 3D IC packaging technology,” Microelectronics Reliability, vol. 51, no. 3, pp. 517–523, 2011.

[27] S. Itr, “Table INTC7 - ITRS 2013 executive summary,” ITRS.

[28] U. Tida, R. Yang, C. Zhuo, and Y. Shi, “On the efficacy of through-silicon-via inductors,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. PP, no. 99, pp. 1–1, 2014.

[29] D. H. Kim, K. Athikulwongse, M. Healy, M. Hossain, M. Jung, I. Khorosh, G. Kumar, Y.-J. Lee, D. Lewis, T.-W. Lin, C. Liu, S. Panth, M. Pathak, M. Ren, G. Shen, T. Song, D. H. Woo, X. Zhao, J. Kim, H. Choi, G. Loh, H.-H. Lee, and S. K. Lim, “3D-MAPS: 3D massively parallel processor with stacked memory,” in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2012 IEEE International, Feb. 2012, pp. 188–190.

[30] D. H. Kim and S.-K. Lim, “Design quality trade-off studies for 3-D ICs built with sub-micron TSVs and future devices,” Emerging and Selected Topics in Circuits and Systems, IEEE Journal on, vol. 2, no. 2, pp. 240–248, 2012.

[31] K. Salah, H. Ragai, Y. Ismail, and A. El Rouby, “Equivalent lumped element models for various n-port through silicon vias networks,” in Design Automation Conference (ASP-DAC), 16th Asia and South Pacific, 2011, pp. 176–183.

[32] B. Wu, X. Gu, L. Tsang, and M. B. Ritter, “Electromagnetic modeling of massively coupled through silicon vias for 3D interconnects,” Microwave and Optical Technology Letters, vol. 53, no. 6, pp. 1204–1206, 2011.

[33] J. Warnock et al., “A 5.2GHz microprocessor chip for the IBM zEnterprise system,” in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2011 IEEE International, 2011, pp. 70–72.

[34] G. Katti, M. Stucchi, K. De Meyer, and W. Dehaene, “Electrical modeling and characterization of through silicon via for three-dimensional ICs,” Electron Devices, IEEE Transactions on, vol. 57, no. 1, pp. 256–262, Jan. 2010.

[35] A. Todri, S. Kundu, P. Girard, A. Bosio, L. Dilillo, and A. Virazel, “A study of tapered 3-D TSVs for power and thermal integrity,” Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 21, no. 2, pp. 306–319, 2013.

[36] PTM, “Predictive Technology Model,” ptm.asu.edu.

[37] C. Bienia and K. Li, “PARSEC 2.0: A new benchmark suite for chip-multiprocessors,” in Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation, June 2009.

[38] Intel Corporation, “Pin - A Dynamic Binary Instrumentation Tool,” https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool.

[39] X. Wang, D. Vasudevan, and H.-H. Lee, “Global built-in self-repair for 3D memories with redundancy sharing and parallel testing,” in 3D Systems Integration Conference (3DIC), 2011 IEEE International, 2012, pp. 1–8.

[40] W. Yun, K. Kang, and C.-M. Kyung, “Thermal-aware energy minimization of 3D-stacked L3 cache with error rate limitation,” in Circuits and Systems (ISCAS), 2011 IEEE International Symposium on, 2011, pp. 1672–1675.

[41] P. M. Yaghini, A. Eghbal, H. Pedram, and H. R. Zarandi, “Investigation of transient fault effects in synchronous and asynchronous network on chip router,” J. Syst. Archit., vol. 57, no. 1, pp. 61–68, Jan. 2011.

[42] L. Jiang, Y. Liu, L. Duan, Y. Xie, and Q. Xu, “Modeling TSV open defects in 3D-stacked DRAM,” in Test Conference (ITC), 2010 IEEE International, 2010, pp. 1–9.

[43] F. Cappello, A. Geist, B. Gropp, L. Kale, B. Kramer, and M. Snir, “Toward exascale resilience,” Int. J. High Perform. Comput. Appl., vol. 23, no. 4, pp. 374–388, Nov. 2009.

[44] B. Wang, J. Xie, and Q. Wang, “Crosstalk-aware channel transmitting scheme for error resilience NoC interconnects,” in Electronics, Communications and Control (ICECC), 2011 International Conference on, 2011, pp. 190–193.

[45] M. Shafaei, A. Patooghy, and S. Miremadi, “Numeral-based crosstalk avoidance coding to reliable NoC design,” in Digital System Design (DSD), 2011 14th Euromicro Conference on, 2011, pp. 55–62.

[46] A. BanaiyanMofrad, N. Dutt, and G. Girao, “Modeling and analysis of fault-tolerant distributed memories for networks-on-chip,” in Proceedings of the Conference on Design, Automation and Test in Europe, ser. DATE ’13, 2013, pp. 1605–1608.

Pooria M. Yaghini is a Ph.D. candidate in the Department of Electrical Engineering and Computer Science at the University of California, Irvine. He received his B.S. in computer hardware engineering from Sadjad University in 2006 and his M.S. degree in computer engineering from Amirkabir University of Technology (Tehran Polytechnic) in 2009. He is also a recipient of the 2011 Henry Samueli Endowed Fellowship. His research interests include 3D IC design, on-chip networks, low-power design, reliability analysis and verification, and fault-tolerant architecture design.

Ashkan Eghbal received the B.Sc. degree in Computer Hardware Engineering from Azad University Central Tehran Branch, Tehran, Iran, in 2007, and the M.Sc. degree in Advanced Computer Architecture from Amirkabir University of Technology, Tehran, Iran, in 2010. He is currently pursuing the Ph.D. degree in Computer System and Software at the University of California, Irvine. He has been with the Advanced Computer Architecture Group, UCI, since 2010 as a research and teaching assistant. His current research interests include reliability analysis, on-chip interconnection networks, 3D stack architectures, fault-tolerant design, and energy-efficient embedded systems.

Misagh Khayambashi is a Ph.D. student in the Department of Electrical Engineering at the University of California, Irvine. He completed his B.Sc. at Isfahan University of Technology, Iran. His research interests lie in the areas of statistical and deterministic signal processing, system modeling and identification, compressive sensing, and estimation theory.

Nader Bagherzadeh is a professor of computer engineering in the Department of Electrical Engineering and Computer Science at the University of California, Irvine, where he served as chair from 1998 to 2003. Dr. Bagherzadeh has been involved in research and development in the areas of computer architecture, reconfigurable computing, VLSI chip design, network-on-chip, 3D chips, sensor networks, and computer graphics since he received his Ph.D. degree from the University of Texas at Austin in 1987. He is a Fellow of the IEEE. Professor Bagherzadeh has published more than 200 articles in peer-reviewed journals and conferences. He has trained hundreds of students who have assumed key positions in software and computer systems design companies in the past twenty years. He has been a PI or Co-PI on more than $8 million worth of research grants for developing next-generation computer systems for applications in general-purpose computing and digital signal processing.


APPENDIX

PROOF OF PROBABILITY OF BIT INVERSION

In this section, the indexing $\langle\cdot\rangle[i][j]$ is replaced with $\langle\cdot\rangle_{i,j}$ for notational simplicity. Also, RFA and RFB are replaced with $A$ and $B$.

To calculate the probability of a cell being marked for inversion, i.e., $P(I_{i,j} = 1)$, the decision-making mechanism of the algorithm must be analyzed. From the algorithm, the following expression should be calculated:

$$P(I_{i,j} = 1) = P(A_{i,j} = 1 \text{ and } B_{i,j} = 1) \qquad (18)$$

Depending on the value of $(i, j)$, the calculation of this expression takes different forms:

A. $i \in \{3, \ldots, N_R - 2\}$ and $j \in \{2, \ldots, N_C - 1\}$

Based on the algorithm and the relation between data values and current directions:

$$\begin{aligned}
P(I_{i,j} = 1) ={} & P(d_{i,j} = 1)\big[P(c_{i,j} = 1 \mid d_{i,j} = 1)\, P(P_{i-1,j} \text{ and } P_{i+1,j} > 0 \mid c_{i,j} = 1) \\
& \quad + P(c_{i,j} = 0 \mid d_{i,j} = 1)\, P(P_{i-1,j} \text{ and } P_{i+1,j} > 0 \mid c_{i,j} = 0)\big] \\
& + P(d_{i,j} = 0)\big[P(c_{i,j} = -1 \mid d_{i,j} = 0)\, P(P_{i-1,j} \text{ and } P_{i+1,j} < 0 \mid c_{i,j} = -1) \\
& \quad + P(c_{i,j} = 0 \mid d_{i,j} = 0)\, P(P_{i-1,j} \text{ and } P_{i+1,j} < 0 \mid c_{i,j} = 0)\big]
\end{aligned} \qquad (19)$$

and also

$$P(P_{i-1,j} \text{ and } P_{i+1,j} > 0 \mid c_{i,j} = 1) = P(c_{i-1,j-1} + c_{i-1,j+1} + c_{i-2,j} > -1) \times P(c_{i+1,j+1} + c_{i+1,j-1} + c_{i+2,j} > -1) \qquad (20)$$

which is calculated by counting all the possibilities and considering $P(c = 0) = 1/2$ ($0 \to 0$ or $1 \to 1$), $P(c = 1) = 1/4$ ($0 \to 1$), and $P(c = -1) = 1/4$ ($1 \to 0$). Putting it all together:

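$$P(I_{i,j} = 1) = 0.5\left[0.5\left(\tfrac{11}{32}\right)\left(\tfrac{11}{32}\right) + 0.5\left(\tfrac{21}{32}\right)\left(\tfrac{21}{32}\right)\right] + 0.5\left[0.5\left(\tfrac{11}{32}\right)\left(\tfrac{11}{32}\right) + 0.5\left(\tfrac{21}{32}\right)\left(\tfrac{21}{32}\right)\right] = \frac{281}{1024} \qquad (21)$$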
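The factors 21/32 and 11/32 come from enumerating neighbour currents. A minimal Python sketch of that counting, assuming (as stated above) independent currents with $P(c = 0) = 1/2$ and $P(c = \pm 1) = 1/4$. Reading the $c_{i,j} = 0$ branch as requiring a strictly positive neighbour sum is our interpretation, chosen because it reproduces the 11/32 factor; the helper name is illustrative.

```python
from fractions import Fraction
from itertools import product

# Per-TSV current distribution: c = 0 w.p. 1/2, c = +1 or -1 w.p. 1/4 each
CURRENT_PMF = {0: Fraction(1, 2), 1: Fraction(1, 4), -1: Fraction(1, 4)}

def prob_sum_greater(k: int, threshold: int) -> Fraction:
    """P(c_1 + ... + c_k > threshold) for k independent neighbour currents, by enumeration."""
    total = Fraction(0)
    for combo in product(CURRENT_PMF, repeat=k):  # all 3**k current patterns
        if sum(combo) > threshold:
            prob = Fraction(1)
            for c in combo:
                prob *= CURRENT_PMF[c]
            total += prob
    return total

weak = prob_sum_greater(3, -1)   # sum > -1, as in Equation 20 (c_ij = 1 branch)
strict = prob_sum_greater(3, 0)  # sum > 0 (c_ij = 0 branch, our reading)
print(weak, strict)              # 21/32 11/32

# Equation 21: both sides have three neighbours; the d = 0 half mirrors the d = 1 half
half = Fraction(1, 2)
p_interior = half * (half * strict * strict + half * weak * weak) * 2
print(p_interior)                # 281/1024
```

The same helper gives the two-neighbour factors used in the later cases, e.g. `prob_sum_greater(2, -1)` and `prob_sum_greater(2, 0)`.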

B. $i \in \{2, N_R - 1\}$ and $j \in \{2, \ldots, N_C - 1\}$

The calculation is the same, except that when $i = 2$ ($N_R - 1$), $c_{i-2,j}$ ($c_{i+2,j}$) does not enter the calculations because its index is out of range. This gives:

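$$P(I_{i,j} = 1) = 0.5\left[0.5\left(\tfrac{11}{32}\right)\left(\tfrac{5}{16}\right) + 0.5\left(\tfrac{21}{32}\right)\left(\tfrac{11}{16}\right)\right] + 0.5\left[0.5\left(\tfrac{11}{32}\right)\left(\tfrac{5}{16}\right) + 0.5\left(\tfrac{21}{32}\right)\left(\tfrac{11}{16}\right)\right] = \frac{143}{512} \qquad (22)$$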

C. $i \in \{1, N_R\}$ and $j \in \{2, \ldots, N_C - 1\}$

When $i = 1$ ($N_R$), RFA (RFB) is always set to 1, and all the probabilities that depend on the currents in the rows above (below) simply drop out:

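$$P(I_{i,j} = 1) = 0.5\left[0.5\left(\tfrac{11}{32}\right) + 0.5\left(\tfrac{21}{32}\right)\right] + 0.5\left[0.5\left(\tfrac{11}{32}\right) + 0.5\left(\tfrac{21}{32}\right)\right] = \frac{1}{2} \qquad (23)$$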

D. $i \in \{3, \ldots, N_R - 2\}$ and $j \in \{1, N_C\}$

When $j = 1$ ($N_C$), $c_{*,j-1}$ ($c_{*,j+1}$) can be eliminated from the calculations:

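$$P(I_{i,j} = 1) = 0.5\left[0.5\left(\tfrac{5}{16}\right)\left(\tfrac{5}{16}\right) + 0.5\left(\tfrac{11}{16}\right)\left(\tfrac{11}{16}\right)\right] + 0.5\left[0.5\left(\tfrac{5}{16}\right)\left(\tfrac{5}{16}\right) + 0.5\left(\tfrac{11}{16}\right)\left(\tfrac{11}{16}\right)\right] = \frac{73}{256} \qquad (24)$$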

E. $i \in \{2, N_R - 1\}$ and $j \in \{1, N_C\}$

When $(i, j) = (2, 1)$, only $c_{i-1,j+1}$, $c_{i+1,j+1}$, and $c_{i+2,j}$ contribute to the calculation:

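$$P(I_{i,j} = 1) = 0.5\left[0.5\left(\tfrac{1}{4}\right)\left(\tfrac{5}{16}\right) + 0.5\left(\tfrac{3}{4}\right)\left(\tfrac{11}{16}\right)\right] + 0.5\left[0.5\left(\tfrac{1}{4}\right)\left(\tfrac{5}{16}\right) + 0.5\left(\tfrac{3}{4}\right)\left(\tfrac{11}{16}\right)\right] = \frac{19}{64} \qquad (25)$$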

The other 3 possibilities for(i, j) are similar.

F. $i \in \{1, N_R\}$ and $j \in \{1, N_C\}$

When $(i, j) = (1, 1)$, only $c_{i+1,j+1}$ and $c_{i+2,j}$ contribute to the calculation, and RFA can be neglected:

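$$P(I_{i,j} = 1) = 0.5\left[0.5\left(\tfrac{5}{16}\right) + 0.5\left(\tfrac{11}{16}\right)\right] + 0.5\left[0.5\left(\tfrac{5}{16}\right) + 0.5\left(\tfrac{11}{16}\right)\right] = \frac{1}{2} \qquad (26)$$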

The other 3 possibilities for(i, j) are similar.
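All six closed forms reduce to one structure: $P(I_{i,j} = 1) = \tfrac{1}{2}(s_A s_B + w_A w_B)$, where $s$ and $w$ are the strict (sum > 0) and weak (sum > -1) neighbour-sum probabilities for the rows above ($A$) and below ($B$), and a side whose RF is always 1 contributes a factor of 1. The following Python sketch is our own restatement, not code from the paper; the neighbour-count-to-factor table is read off Equations 21-26, and the names are illustrative.

```python
from fractions import Fraction

# (P(sum > 0), P(sum > -1)) for k neighbour currents, as used in Equations 21-26.
# k = 0 stands for a boundary row whose RFA/RFB is always 1 (factor 1 in both branches).
SIDE = {
    0: (Fraction(1), Fraction(1)),
    1: (Fraction(1, 4), Fraction(3, 4)),
    2: (Fraction(5, 16), Fraction(11, 16)),
    3: (Fraction(11, 32), Fraction(21, 32)),
}

def p_marked(k_above: int, k_below: int) -> Fraction:
    """P(I_ij = 1) = 1/2 (s_A * s_B + w_A * w_B), the common form of Equations 21-26."""
    s_a, w_a = SIDE[k_above]
    s_b, w_b = SIDE[k_below]
    return Fraction(1, 2) * (s_a * s_b + w_a * w_b)

# Subsection: (neighbours above, neighbours below) for a representative cell
CASES = {
    "A (21)": (3, 3), "B (22)": (2, 3), "C (23)": (0, 3),
    "D (24)": (2, 2), "E (25)": (1, 2), "F (26)": (0, 2),
}
for name, (ka, kb) in CASES.items():
    print(name, p_marked(ka, kb))
```

Because only the number of in-range neighbours on each side changes between cases, the mirrored positions mentioned above ("the other 3 possibilities") map to the same table entries.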
