Title: Generalized water-filling for source-aware energy-efficient SRAMs

Archived version Accepted manuscript: the content is identical to the published paper, but without the final typesetting by the publisher

Published version DOI :

10.1109/TCOMM.2018.2841406

Journal homepage https://www.comsoc.org/tc

Authors (contact)

Yongjune Kim ([email protected]) Mingu Kang ([email protected]) Lav R. Varshney ([email protected]) Naresh R. Shanbhag ([email protected])

Affiliation University of Illinois at Urbana Champaign

Article begins on next page


Generalized Water-filling for Source-aware Energy-efficient SRAMs

Yongjune Kim, Mingu Kang, Member, IEEE, Lav R. Varshney, Senior Member, IEEE, and Naresh R. Shanbhag, Fellow, IEEE

Abstract: Conventional low-power static random access memories (SRAMs) reduce read energy by decreasing the bit-line voltage swings uniformly across the bit-line columns. This is because the read energy is proportional to the bit-line swings. On the other hand, bit-line swings are limited by the need to avoid decision errors, especially in the most significant bits. We propose a principled approach to determine optimal non-uniform bit-line swings by formulating convex optimization problems. For a given constraint on mean squared error of retrieved words, we consider criteria to minimize energy (for low-power SRAMs), maximize speed (for high-speed SRAMs), and minimize energy-delay product. These optimization problems can be interpreted as classical water-filling, ground-flattening and water-filling, and sand-pouring and water-filling, respectively. By leveraging these interpretations, we also propose greedy algorithms to obtain optimized discrete swings. Numerical results show that energy-optimal swing assignment reduces energy consumption by half at a peak signal-to-noise ratio of 30 dB for an 8-bit accessed word. The energy savings increase to four times for a 16-bit accessed word.

Index Terms: Static random access memory (SRAM), information theory, convex optimization, discrete optimization.

I. INTRODUCTION

Von Neumann computing architectures separate memory units from computing units, leading to frequent data access that consumes enormous energy. Since access to static random access memories (SRAMs) requires more energy than arithmetic operations [1], SRAM access energy accounts for a significant part of the total energy consumption in many information processing circuits [2]–[8]. Thus, it is important to reduce the energy consumption of SRAM access. The basic way to reduce the access energy is to decrease either supply voltages or bit-line (BL) swings, which increases vulnerability to variations and noise. If we reduce supply voltages or BL swings uniformly across all BL columns [9], [10], then the bit error rates (BERs) of all bit positions increase equally.

In many error-tolerant applications, including signal processing and machine learning (ML) tasks, however, the impact of bit errors depends on bit position. For example, errors in the most significant bits (MSBs) of image pixels degrade overall image quality much more than errors in the least significant bits (LSBs). Likewise, an MSB error can cause a catastrophic loss in the inference accuracy of ML applications.

Manuscript received November 27, 2017; revised March 16, 2018 and April 17, 2018; accepted May 14, 2018. This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA. This work was presented in part at the IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, June 2018.

The authors are with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Until now, the following techniques have been proposed to address the different impacts of each bit position for energy efficiency:

1) Storing the MSBs in more robust bit cells and the LSBs in less robust cells [11], [12],

2) Applying higher supply voltage for the MSBs and lower supply voltage for the LSBs [13]–[15],

3) Unequal error protection (UEP) by error control codes (ECCs) [16]–[18],

4) LSB dropping (dropping the LSBs at the cost of reduced arithmetic precision) [19]–[21].

The first approach requires costly bit-cell redesign and manual array reorganization. Also, the bit cells are fixed at design time, so this approach cannot dynamically track a time-varying fidelity requirement [21]. The second approach employs different supply voltages for each bit position, which significantly complicates the power routing network. Practical implementations only allow a few supply voltage levels [14], [15]. UEP attempts to assign different protection levels to bits according to their significance [22], [23]. Fine-grained UEP is limited by the stringent low-latency constraint of SRAMs [16]. Hence, most UEP schemes for SRAMs consider two levels of protection [16]–[18], [21]. Another drawback of UEP is the area overhead needed to store parity bits. LSB dropping [19]–[21] does not suffer from area overhead and enables dynamic fidelity control by changing the number of dropped LSBs. Note that LSB dropping allows only two levels of granularity (dropped/undropped) for each bit position. Recently, selective ECCs were proposed by combining UEP and LSB dropping [20], [21]. Since parity bits are stored in dropped LSB cells, the selective ECC does not suffer from an area penalty. In [24], the authors proposed adaptive coding techniques for different computations inspired by selective ECC techniques.

This paper presents an information-theoretic approach to determine the optimal BL swing assignments. For a given constraint on mean squared error (MSE) of retrieved words, we formulate convex optimization problems whose objectives are as follows:

C1. Minimize energy (low-power SRAMs),
C2. Maximize speed (high-speed SRAMs),
C3. Minimize energy-delay product (EDP).

Solutions to these convex optimization problems yield optimal performance that is theoretically attainable. By casting read access for SRAMs as communication over parallel channels, we investigate the fundamental trade-offs between physical resources (energy, delay, and EDP) and a fidelity (MSE) constraint.

In addition, we provide generalized water-filling interpretations for our optimal solutions. This follows since accessing a B-bit word is equivalent to communicating information through B parallel channels. In classical water-filling, the ground represents the noise levels of parallel channels [25], [26]. On the other hand, the importance of each bit position determines the ground levels in our optimization problems. Each optimization problem has its own interpretation depending on its objective function: water-filling (C1), ground-flattening and water-filling (C2), and sand-pouring and water-filling (C3), respectively. We also observe interesting connections between our problems and variants on water-filling such as constant-power water-filling [27], [28] and mercury/water-filling [29]. Also, we show that the proposed optimization techniques can be extended to a wide range of sources and noise models.

Furthermore, we propose an SRAM circuit architecture to assign non-uniform bit-level swings effectively. The proposed architecture separates the data for each bit position into different SRAM subarrays by interleaving. The proposed architecture enables fine-grained and dynamic control of bit-level swings depending on time-varying fidelity requirements with little circuit complexity overhead. Also, we propose greedy algorithms to optimize swing values drawn from a discrete set due to circuit implementation limitations. Generalized water-filling interpretations and Karush-Kuhn-Tucker (KKT) conditions are leveraged to develop these discrete optimization algorithms.

The rest of this paper is organized as follows. Section II introduces key metrics of energy, delay, and fidelity. Section III formulates the convex optimization problems to determine the optimum swings and provides generalized water-filling interpretations. Section IV shows that the proposed optimization techniques can be extended to various source and noise models. Section V investigates the SRAM architecture and develops greedy algorithms to optimize discrete swings. Section VI gives numerical results and Section VII concludes.

II. SRAM METRICS FOR RESOURCE AND FIDELITY

The total energy in an SRAM read access is given by

$$E_{\text{total}} = E_{\text{array}} + E_{\text{peri}} + E_{\text{leakage}} \tag{1}$$

where $E_{\text{array}}$ and $E_{\text{peri}}$ denote the dynamic energy consumption from the SRAM bit cell array and the peripheral circuitry, respectively, and $E_{\text{leakage}}$ represents the energy loss due to leakage. $E_{\text{array}}$ is the dominant component of read energy consumption in high-density SRAMs during normal read operations [3], [10], [30]. Hence, we focus on $E_{\text{array}}$, which is given by

$$E_{\text{array}} \propto N_{\text{BL}} N_{\text{WL}} C_{\text{bit}} V_{\text{dd}} \Delta \tag{2}$$

where $N_{\text{BL}}$ and $N_{\text{WL}}$ are the numbers of bit-lines (BLs) and word-lines (WLs) in a memory bank, respectively. $C_{\text{bit}}$ is the BL capacitance per bit cell and $V_{\text{dd}}$ is the supply voltage. Also, $\Delta$ denotes the voltage swing in read access.

Fig. 1. SRAM structure: (a) $N_{\text{BL}} \times N_{\text{WL}}$ SRAM block ($N_{\text{BL}} = 6$ and $N_{\text{WL}} = 4$) and (b) schematic of a 6T SRAM bit cell. [Figure not reproduced; panel (a) shows the row decoder, precharge circuit, word-lines WL0 to WL3, bit-line pairs BL0/BLB0 to BL5/BLB5, column muxes, and sense amplifiers.]

As shown in Fig. 1(a), the voltage swing $\Delta$ is the voltage difference between BL and BL-bar (BLB). The symmetric structure of SRAMs (Fig. 1(b)) allows for differential signaling by using the voltage difference between BL and BLB. This voltage difference occurs because either BL or BLB is discharged according to the stored bit $x_i$ during read operation. A sense amplifier (SA) detects which line (BL or BLB) has the higher voltage and decides whether the corresponding bit cell stores 1 or 0. The SA decides that $\hat{x}_i = 1$ if $V_{\text{BL}} - V_{\text{BLB}} = \Delta > 0$ ($V_{\text{BL}}$ and $V_{\text{BLB}}$ denote the voltages of BL and BLB, respectively). Otherwise, the retrieved value is $\hat{x}_i = 0$.

The voltage swing $\Delta$ can be controlled by changing the WL pulse-width (i.e., WL activation time) $T_{\text{WL}}$ during read operation since

$$\Delta = \frac{I_c}{N_{\text{WL}} C_{\text{bit}}} \cdot T_{\text{WL}} \tag{3}$$

where $I_c$ is the discharge current corresponding to the accessed bit cell [10]. From (2) and (3), we can observe that $E_{\text{array}}$ is directly proportional to $T_{\text{WL}}$. Also, $T_{\text{WL}}$ has a direct impact on the read access time [10], [31].

In SRAMs, the voltage swing $\Delta$ is determined by read operation parameters as shown in (3). Even if we read the data from the same SRAM cells, we can control the bit error probability during read operation by changing the WL pulse-width $T_{\text{WL}}$. Unlike communication systems where the transmit power decides the noise margin of signals, the noise margin of SRAMs (i.e., $\Delta$) can be controlled during read operations.

Since a larger voltage swing $\Delta$ improves the noise margin, there are trade-off relations between reliability, energy, and delay. These relations will be explained in the following subsections.

A. Resource Metrics for Accessing B-bit Word: Energy, Delay, and EDP

We define resource metrics for energy, delay, and EDP for accessing a B-bit word. First, read access energy can be defined as follows.

Definition 1: The read energy to access a B-bit word is

$$E(\boldsymbol{\Delta}) = \sum_{b=0}^{B-1} \Delta_b = \mathbf{1}^T \boldsymbol{\Delta} \tag{4}$$

where $E(\boldsymbol{\Delta})$ represents $E_{\text{array}}$ in (1). Also, $\mathbf{1}$ denotes the all-one vector and the superscript $T$ denotes transpose. Note that $\boldsymbol{\Delta} = (\Delta_0, \ldots, \Delta_{B-1})$ where $\Delta_b$ denotes the swing for the $b$th bit position in a B-bit word and $b \in [\![0, B-1]\!]$ (for integers $i$ and $j$ such that $i < j$, $[\![i, j]\!] = \{i, \ldots, j\}$).

Definition 2: The maximum swing corresponding to a B-bit word is

$$\rho = \max(\boldsymbol{\Delta}) = \max\{\Delta_0, \ldots, \Delta_{B-1}\}. \tag{5}$$

If we allot non-uniform swings for each bit position, the access time for a B-bit word depends on $T_{\max} = \max\{T_{\text{WL},0}, \ldots, T_{\text{WL},B-1}\}$ where $T_{\text{WL},b}$ denotes the WL pulse-width for the $b$th bit position. Note that $T_{\max}$ is the pulse-width corresponding to the maximum swing $\rho$ because of (3). Since a B-bit word is retrieved by accessing the corresponding $B$ bit cells in parallel by enabling the same WL, the read access time is determined by $T_{\max}$. Hence, the maximum swing $\rho$ is a proper metric to be minimized to maximize read speed.

The EDP is considered to be a fundamental metric as it captures the trade-off between energy and delay [32], [33]. The EDP was proposed to consider the energy and the speed jointly. In [34], the generalized EDP was proposed by weighting the delay part even more depending on applications. We define the EDP for accessing a B-bit word based on Definitions 1 and 2.

Definition 3: The EDP to access a B-bit word is

$$\text{EDP}(\boldsymbol{\Delta}, w) = E(\boldsymbol{\Delta}) \cdot \rho^w = \mathbf{1}^T \boldsymbol{\Delta} \cdot \rho^w \tag{6}$$

where $w$ is usually chosen among 1, 2, and 3 [35]. The original EDP [32] corresponds to $\text{EDP}(\boldsymbol{\Delta}, w = 1)$, which is widely used. A larger $w$ would be selected for a higher speed performance application. Note that minimizing $\text{EDP}(\boldsymbol{\Delta}, w)$ is equivalent to a form of multi-objective joint optimization [36].
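Definitions 1 to 3 translate directly into a few lines of code. The following is a minimal sketch, not taken from the paper: the function names and the example swing vector are hypothetical, and the swings are in arbitrary units because (2) only fixes the energy up to a proportionality constant.

```python
from typing import Sequence


def read_energy(swings: Sequence[float]) -> float:
    """Definition 1: E(Delta) = sum_b Delta_b (up to a constant factor)."""
    return float(sum(swings))


def max_swing(swings: Sequence[float]) -> float:
    """Definition 2: rho = max_b Delta_b, the proxy for read delay."""
    return float(max(swings))


def edp(swings: Sequence[float], w: int = 1) -> float:
    """Definition 3: EDP(Delta, w) = E(Delta) * rho**w."""
    return read_energy(swings) * max_swing(swings) ** w


# Hypothetical 4-bit swing vector, LSB first, in arbitrary units.
example = [0.0, 1.0, 2.0, 3.0]
print(read_energy(example), max_swing(example), edp(example), edp(example, w=2))
```

Keeping the three metrics as separate functions mirrors the three optimization criteria (C1 to C3) that the paper formulates later.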

B. Fidelity Metric for Accessing B-bit Word: MSE

We will define a fidelity metric for accessing a B-bit word. Suppose that a B-bit word $\mathbf{x} = (x_0, \ldots, x_{B-1})$ is stored in SRAM cells, where $x_0$ and $x_{B-1}$ are the LSB and MSB, respectively. Note that $\mathbf{x}$ can be represented by

$$x = \sum_{b=0}^{B-1} 2^b x_b \tag{7}$$

where $x_b \in \{0, 1\}$ and $x \in [\![0, 2^B - 1]\!]$. Also, $\hat{\mathbf{x}} = (\hat{x}_0, \ldots, \hat{x}_{B-1})$ denotes the retrieved B-bit word. A decision error flips the original bit $x_b$ as follows:

$$\hat{x}_b = x_b \oplus \varepsilon_b \tag{8}$$

where $\oplus$ denotes the XOR operator and $\varepsilon_b = 1$ denotes a bit error in the $b$th bit position. The decimal representation of the retrieved word is $\hat{x} = \sum_{b=0}^{B-1} 2^b \hat{x}_b$.

We assume that the bit errors are symmetric and independent in (8). The symmetric error assumption is valid because SRAM cells (Fig. 1(b)) have a symmetric structure [16]. The common 6T SRAM cells are designed and fabricated to be symmetric [37]–[39]. The symmetric error model is found in many papers in the literature [16], [17], [39]. The independent error model is justified by two reasons. First, SRAM measurements show negligible spatial correlation [40], [41]. Further, most SRAMs adopt the interleaved architecture. Due to the SRAM's variation properties and the interleaved architecture, bit errors are independent.

The decimal error $e$ is given by

$$e = \hat{x} - x = \sum_{b=0}^{B-1} 2^b e_b \tag{9}$$

where $e_b = \hat{x}_b - x_b \in \{-1, 0, 1\}$. Since major noise sources of SRAMs are well modeled as Gaussian distributions [39], [40], [42]–[44], the bit error probability of the $b$th bit position is given by

$$p_b = \Pr(\varepsilon_b = 1) = Q\left(\frac{\Delta_b}{\sigma}\right) \tag{10}$$

where $\Delta_b$ and $\sigma^2$ denote the swing of the $b$th bit position and the noise variance in the corresponding BL, respectively. Note that $Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$. By increasing $\Delta_b$ in (10), we can reduce $p_b$. However, a larger $\Delta_b$ implies more energy consumption and slower speed (see Definitions 1 and 2).

To measure memory retrieval reliability, the bit error probability (10) is not appropriate for many applications, since it does not distinguish the differential impact of MSB and LSB errors. Hence, we use the MSE as a fidelity metric.

Definition 4: The MSE of $x$ is given by

$$\text{MSE}(x) = E\left[(\hat{x} - x)^2\right] = E\left[e^2\right]. \tag{11}$$

Lemma 5: For a uniformly distributed $x$, $\text{MSE}(x)$ is given by

$$\text{MSE}(x) = \text{MSE}(\boldsymbol{\Delta}) = \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right). \tag{12}$$

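Proof: If $x$ is uniformly distributed, the $x_b$'s are independent and identically distributed (i.i.d.) and follow the Bernoulli distribution $\text{Ber}\left(\frac{1}{2}\right)$. The MSE of $x$ is given by

$$\text{MSE}(x) = E\left[\left(\sum_{b=0}^{B-1} 2^b e_b\right)^2\right] = \sum_{b=0}^{B-1} 4^b p_b \tag{13}$$

$$= \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) \tag{14}$$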


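TABLE I
RESOURCE AND FIDELITY METRICS FOR SINGLE-BIT AND B-BIT WORD ACCESS

| Metric | Single bit | B-bit word | Remarks |
|---|---|---|---|
| Energy | $\Delta$ | $E(\boldsymbol{\Delta}) = \mathbf{1}^T \boldsymbol{\Delta}$ | Definition 1 |
| Delay | $\Delta$ | $\rho = \max(\boldsymbol{\Delta})$ | Definition 2 |
| EDP | $\Delta^2$ | $\text{EDP}(\boldsymbol{\Delta}) = E(\boldsymbol{\Delta}) \cdot \rho$ | Definition 3 |
| Fidelity | $p = Q\left(\frac{\Delta}{\sigma}\right)$ | $\text{MSE}(\boldsymbol{\Delta}) = \sum_b 4^b Q\left(\frac{\Delta_b}{\sigma}\right)$ | Lemma 5 |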

where (13) follows from $E\left[e_b^2\right] = E[\varepsilon_b] = p_b$ and $E[e_i e_j] = 0$ for $i \neq j$, since the $e_b$'s are independent and $E[e_b] = 0$ for $x_b \sim \text{Ber}\left(\frac{1}{2}\right)$ [16]. In addition, (14) follows from (10). Because $\text{MSE}(x)$ is a function of $\boldsymbol{\Delta}$, we set $\text{MSE}(x) = \text{MSE}(\boldsymbol{\Delta})$.

Note that $\text{MSE}(x)$ is the nonnegative weighted sum of bit error probabilities. The weight $4^b$ represents the differential importance of each bit position. We show that $\text{MSE}(x)$ is convex.

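Lemma 6: $\text{MSE}(\boldsymbol{\Delta})$ is a convex function of $\boldsymbol{\Delta}$.

Proof: $Q(x)$ is convex for $x \geq 0$ because $\frac{d^2 Q(x)}{dx^2} = \frac{x}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \geq 0$. Since $\Delta_b \geq 0$ and $\text{MSE}(\boldsymbol{\Delta})$ is the nonnegative weighted sum of $Q\left(\frac{\Delta_b}{\sigma}\right)$, $\text{MSE}(\boldsymbol{\Delta})$ is convex.

A signed number $x$ can be represented by $x = -x_{B-1} \cdot 2^{B-1} + \sum_{b=0}^{B-2} 2^b x_b$, whose $\text{MSE}(x)$ is the same as (12).

Table I summarizes the key resource and fidelity metrics for single-bit and B-bit word accesses.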
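Lemma 5 can be checked numerically for a small word length by enumerating every stored word and every error pattern. The sketch below is illustrative only: the helper names and the example swings are assumptions, and the brute-force check uses exactly the error model of (8) and (10) (independent, symmetric bit flips with Gaussian-tail probabilities).

```python
import itertools
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mse_closed_form(swings, sigma):
    """Eq. (12): sum_b 4^b Q(Delta_b / sigma); index 0 is the LSB."""
    return sum(4 ** b * q_func(d / sigma) for b, d in enumerate(swings))


def mse_brute_force(swings, sigma):
    """Exact E[(x_hat - x)^2] for uniform x, enumerating words and error patterns."""
    B = len(swings)
    p = [q_func(d / sigma) for d in swings]
    total = 0.0
    for x in range(2 ** B):
        for eps in itertools.product((0, 1), repeat=B):
            prob = 1.0 / 2 ** B  # uniform source
            for b in range(B):
                prob *= p[b] if eps[b] else 1.0 - p[b]
            # x_hat_b = x_b XOR eps_b, as in Eq. (8).
            x_hat = x ^ sum(e << b for b, e in enumerate(eps))
            total += prob * (x_hat - x) ** 2
    return total


# Hypothetical 3-bit example with sigma = 1.
swings = [0.5, 1.0, 2.0]
print(mse_closed_form(swings, 1.0), mse_brute_force(swings, 1.0))
```

The two printed values should agree up to floating-point rounding, because the cross terms $E[e_i e_j]$ vanish for a uniform source.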

III. OPTIMAL BIT-LEVEL SWINGS

We formulate convex optimization problems to determine the optimum swings. For a given constraint on MSE, we attempt to (1) minimize energy (low-power SRAMs), (2) maximize speed (high-speed SRAMs), and (3) minimize EDP. Also, we provide generalized water-filling interpretations of these optimization problems based on KKT conditions.

A. Energy Minimization

Here, we minimize the read energy for a given constraint on MSE. Hence, we formulate the following convex optimization problem.

$$\begin{aligned} \underset{\boldsymbol{\Delta}}{\text{minimize}} \quad & E(\boldsymbol{\Delta}) = \mathbf{1}^T \boldsymbol{\Delta} \\ \text{subject to} \quad & \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) \leq V \\ & \Delta_b \geq 0, \quad b = 0, \ldots, B-1 \end{aligned} \tag{15}$$

where $V$ is a constant corresponding to the given constraint on MSE. Since the objective and constraints are convex, the optimization problem (15) is convex. The optimal solution can be derived from the KKT conditions.

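Theorem 7: The optimal swing $\boldsymbol{\Delta}^*$ of (15) is given by

$$\Delta_b^* = \begin{cases} 0, & \text{if } \nu \leq \frac{\sqrt{2\pi}\sigma}{4^b}, \\ \sigma \sqrt{2 \log\left(\frac{4^b}{\sqrt{2\pi}\sigma} \cdot \nu\right)}, & \text{otherwise} \end{cases} \tag{16}$$

where $\nu$ is a dual variable.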

Fig. 2. Graphical interpretations of Theorem 7: (a) water-filling and (b) reverse water-filling. [Figure not reproduced; the horizontal axis is the bit position from 0 (LSB) to B − 1 (MSB), and the panels mark the (reverse) ground level, the (reverse) water level, and the water depth.]

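Proof: We define the Lagrangian $L_1(\boldsymbol{\Delta}, \nu, \boldsymbol{\lambda})$ associated with problem (15) as

$$L_1(\boldsymbol{\Delta}, \nu, \boldsymbol{\lambda}) = \mathbf{1}^T \boldsymbol{\Delta} + \nu \left(\sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) - V\right) - \sum_{b=0}^{B-1} \lambda_b \Delta_b \tag{17}$$

where $\nu$ and $\boldsymbol{\lambda} = (\lambda_0, \ldots, \lambda_{B-1})$ are the dual variables. The solution (16) is derived from $L_1$ and the KKT conditions. The details of the proof are given in Appendix A.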

The optimal solution (16) can be interpreted as classical water-filling or reverse water-filling as shown in Fig. 2. Each bit position can be regarded as an individual channel among $B$ parallel channels. In the water-filling interpretation (Fig. 2(a)), the ground levels depend on the importance of bit positions. Hence, larger swings are assigned to more significant bit positions. For a bit position $b$ such that $\nu > \frac{\sqrt{2\pi}\sigma}{4^b}$, we can readily obtain the following equation (see Appendix A):

$$\log \nu = \log \frac{\sqrt{2\pi}\sigma}{4^b} + \frac{\Delta_b^2}{2\sigma^2} \tag{18}$$

where $\log \nu$, $\log \frac{\sqrt{2\pi}\sigma}{4^b}$, and $\frac{\Delta_b^2}{2\sigma^2}$ represent the water level, the ground level, and the water depth, respectively. The water level $\log \nu$ depends on the MSE constraint $V$. Fig. 2(b) illustrates a reverse water-filling interpretation of (16). For a bit position $b$ such that $\frac{1}{\nu} < \frac{4^b}{\sqrt{2\pi}\sigma}$, by modifying (18), we can readily obtain

$$\log \frac{4^b}{\sqrt{2\pi}\sigma} = \log \frac{1}{\nu} + \frac{\Delta_b^2}{2\sigma^2} \tag{19}$$

where $\log \frac{4^b}{\sqrt{2\pi}\sigma}$ and $\log \frac{1}{\nu}$ denote the reverse ground level and the reverse water level, respectively.
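Theorem 7 gives the swings as a function of the dual variable $\nu$, but $\nu$ itself has to be found so that the MSE constraint is met. One simple way to do this numerically, sketched below, is bisection on $\log \nu$, which works because the MSE is nonincreasing in $\nu$. This is an illustrative sketch rather than the authors' procedure: the function names, the search bracket for $\log \nu$, and the example values ($V = 10$, $\sigma = 1$, $B = 8$) are assumptions.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mse(swings, sigma):
    """Eq. (12): sum_b 4^b Q(Delta_b / sigma); index 0 is the LSB."""
    return sum(4 ** b * q_func(d / sigma) for b, d in enumerate(swings))


def waterfill_swings(nu, sigma, B):
    """Eq. (16): swings for a given dual variable nu."""
    swings = []
    for b in range(B):
        arg = 4 ** b * nu / (math.sqrt(2.0 * math.pi) * sigma)
        # arg <= 1 is the condition nu <= sqrt(2*pi)*sigma / 4^b (zero swing).
        swings.append(sigma * math.sqrt(2.0 * math.log(arg)) if arg > 1.0 else 0.0)
    return swings


def min_energy_swings(V, sigma, B, iters=200):
    """Bisect on log(nu) until the MSE constraint is numerically tight."""
    lo, hi = -50.0, 50.0  # assumed bracket for log(nu)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mse(waterfill_swings(math.exp(mid), sigma, B), sigma) > V:
            lo = mid  # water level too low: MSE still above the target
        else:
            hi = mid
    return waterfill_swings(math.exp(hi), sigma, B)


swings = min_energy_swings(V=10.0, sigma=1.0, B=8)
print([round(d, 3) for d in swings], round(sum(swings), 3))
```

In this example the least significant positions receive zero swing while the more significant positions receive progressively larger swings, which is the behavior the water-filling picture in Fig. 2(a) describes.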

Remark 8 (LSB dropping and constant-power water-filling): Constant-power water-filling activates a subset of the parallel channels with a constant power allocation [27], [28]. Constant-power water-filling in communication theory is equivalent to LSB dropping in circuit theory [19]–[21], since LSB dropping allocates uniform swings to the undropped bit positions.

B. Speed Maximization

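Here, we maximize the speed of read access for a given constraint on MSE. The maximum speed is achieved by minimizing $\rho$ since $\rho$ is proportional to the maximum pulse-width $T_{\max}$.

$$\begin{aligned} \underset{\boldsymbol{\Delta}}{\text{minimize}} \quad & \rho = \max\{\Delta_0, \ldots, \Delta_{B-1}\} \\ \text{subject to} \quad & \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) \leq V \\ & \Delta_b \geq 0, \quad b = 0, \ldots, B-1 \end{aligned} \tag{20}$$

By introducing an additional variable $\xi$, we can reformulate (20) as

$$\begin{aligned} \underset{\boldsymbol{\Delta}, \xi}{\text{minimize}} \quad & \xi \\ \text{subject to} \quad & \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) \leq V \\ & 0 \leq \Delta_b \leq \xi, \quad b = 0, \ldots, B-1 \end{aligned} \tag{21}$$

This reformulated optimization problem is also convex. From the KKT conditions, we show that $\xi = \rho$ (see Appendix B).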

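Theorem 9: The optimal swing $\boldsymbol{\Delta}^*$ of (20) is given by

$$\Delta_b^* = \rho = \xi = \sigma \sqrt{2 \log\left(\frac{4^B - 1}{3\sqrt{2\pi}\sigma} \cdot \nu\right)} \tag{22}$$

for all $b \in [\![0, B-1]\!]$. Note that $\nu$ is a dual variable.

Proof: We define the Lagrangian $L_2(\boldsymbol{\Delta}, \xi, \nu, \boldsymbol{\lambda}, \boldsymbol{\eta})$ associated with problem (21) as

$$L_2(\boldsymbol{\Delta}, \xi, \nu, \boldsymbol{\lambda}, \boldsymbol{\eta}) = \xi + \nu \left(\sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) - V\right) - \sum_{b=0}^{B-1} \lambda_b \Delta_b + \sum_{b=0}^{B-1} \eta_b (\Delta_b - \xi) \tag{23}$$

where $\nu$, $\boldsymbol{\lambda} = (\lambda_0, \ldots, \lambda_{B-1})$, and $\boldsymbol{\eta} = (\eta_0, \ldots, \eta_{B-1})$ are dual variables. The optimal solution (22) can be derived from $L_2$ and the corresponding KKT conditions. The details of the proof are given in Appendix B.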
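Because Theorem 9 says that all swings are equal at the optimum, the speed-optimal swing can also be computed directly from the MSE constraint: with $\Delta_b = \rho$ for every $b$, (12) becomes $\frac{4^B - 1}{3} Q\left(\frac{\rho}{\sigma}\right) = V$. The sketch below inverts this relation with the standard library's inverse normal CDF. The function names and example values are assumptions, and the code presumes $0 < V$ and treats the case $\frac{3V}{4^B - 1} \geq \frac{1}{2}$ as needing no swing at all.

```python
import math
from statistics import NormalDist


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def max_speed_swing(V: float, sigma: float, B: int) -> float:
    """Uniform swing rho with (4^B - 1)/3 * Q(rho / sigma) = V."""
    p = 3.0 * V / (4 ** B - 1)  # required per-bit error probability
    if p >= 0.5:
        return 0.0  # constraint already met with zero swing (Q(0) = 1/2)
    # Q(x) = p  <=>  Phi(-x) = p  <=>  x = -Phi^{-1}(p)
    return -sigma * NormalDist().inv_cdf(p)


rho = max_speed_swing(V=10.0, sigma=1.0, B=8)
print(round(rho, 4), (4 ** 8 - 1) / 3.0 * q_func(rho))
```

The second printed value recomputes the MSE from the returned swing and should be close to the target $V$.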

The optimal solution (22) can be interpreted as ground-flattening and water-filling. For any $b \in [\![0, B-1]\!]$, we derive the following equation (see Appendix B):

$$\log \nu = \log \frac{\sqrt{2\pi}\sigma}{4^b} + \log \eta_b + \frac{\Delta_b^2}{2\sigma^2} \tag{24}$$

where $\log \nu$, $\log \frac{\sqrt{2\pi}\sigma}{4^b}$, $\log \eta_b$, and $\frac{\Delta_b^2}{2\sigma^2}$ represent the water level, the ground level, the ground-flattening term, and the water depth, respectively. Compared with (18), we observe that (24) has an additional ground-flattening term $\log \eta_b$. By solving the KKT conditions, we show that

$$\log \eta_b = \log\left(\frac{3}{4^B - 1} \cdot 4^b\right) < 0. \tag{25}$$

Hence, the flattened ground level (i.e., the sum of the ground level and the ground-flattening term) is given by

$$\log \frac{\sqrt{2\pi}\sigma}{4^b} + \log \eta_b = \log \frac{3\sqrt{2\pi}\sigma}{4^B - 1}. \tag{26}$$

Fig. 3. Ground-flattening and water-filling interpretation of Theorem 9: (a) ground-flattening and (b) water-filling (after ground-flattening). [Figure not reproduced; the horizontal axis is the bit position from 0 (LSB) to B − 1 (MSB), and the panels mark the ground level, the flattening term, the flattened ground level, the water level, and the water depth.]

Since the unequal ground levels are flattened by the flattening terms, the water depths of all bit positions are identical after water-filling (Fig. 3(b)). In addition, the optimal solution (22) can be interpreted as sand-pouring and reverse water-filling. We can modify (26) into $\log \frac{4^b}{\sqrt{2\pi}\sigma} + \log \frac{1}{\eta_b} = \log \frac{4^B - 1}{3\sqrt{2\pi}\sigma}$. The positive sand depth (i.e., $\log \frac{1}{\eta_b}$) (see (25)) fills the gap between each reverse ground level (i.e., $\log \frac{4^b}{\sqrt{2\pi}\sigma}$) and the reverse flattened ground level (i.e., $\log \frac{4^B - 1}{3\sqrt{2\pi}\sigma}$) (Fig. 4(a)), which results in uniform swings as shown in Fig. 4(b).

Remark 10: Conventional uniform swing assignment maximizes the read access speed.

Remark 11: For conventional uniform swing assignment, the MSE is given by

$$\text{MSE}(x) = \frac{4^B - 1}{3} \cdot p \tag{27}$$

which comes from Lemma 5 and $p_b = p$ for any $b \in [\![0, B-1]\!]$.

Remark 12: The overall bit error rate (BER) is the sum of the bit error probabilities of all bit positions, i.e., $\text{BER} = \sum_{b=0}^{B-1} Q\left(\frac{\Delta_b}{\sigma}\right)$. Since $Q(\cdot)$ is convex (see the proof of Lemma 6), the uniform swing assignment minimizes the overall BER.

If we do not consider the differential importance of each bit position, the conventional uniform swing is optimal since it maximizes the read access speed (Remark 10) and minimizes the overall BER (Remark 12).

Fig. 4. Sand-pouring and reverse water-filling interpretation of Theorem 9: (a) sand-pouring and (b) reverse water-filling (after sand-pouring). [Figure not reproduced; the horizontal axis is the bit position from 0 (LSB) to B − 1 (MSB), and the panels mark the reverse ground level, the sand depth, the reverse flattened ground level, the reverse water level, and the water depth.]

C. EDP Minimization

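We formulate the following convex optimization problem to minimize EDP (i.e., $\text{EDP}(\boldsymbol{\Delta}, w = 1)$) for a given constraint on MSE.

$$\begin{aligned} \underset{\boldsymbol{\Delta}, \xi}{\text{minimize}} \quad & \mathbf{1}^T \boldsymbol{\Delta} \cdot \xi \\ \text{subject to} \quad & \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) \leq V \\ & 0 \leq \Delta_b \leq \xi, \quad b = 0, \ldots, B-1 \end{aligned} \tag{28}$$

which is derived by considering (6) and (21). We show that $\xi$ is equal to $\rho$ (see Appendix C).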

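Theorem 13: The optimal swing $\boldsymbol{\Delta}^*$ of (28) is given by

$$\Delta_b^* = \begin{cases} 0, & \text{if } \log \frac{\nu}{\rho} \leq \log \frac{\sqrt{2\pi}\sigma}{4^b}, \\ \rho, & \text{if } \log \frac{\nu}{\rho} \geq \log \frac{\sqrt{2\pi}\sigma}{4^b} + \frac{\rho^2}{2\sigma^2}, \\ \sigma \sqrt{2 \log\left(\frac{4^b}{\sqrt{2\pi}\sigma} \cdot \frac{\nu}{\rho}\right)}, & \text{otherwise} \end{cases} \tag{29}$$

where $\nu$ is a dual variable.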

Fig. 5. Graphical interpretations of Theorem 13: (a) sand-pouring and water-filling and (b) ground-flattening and reverse water-filling. [Figure not reproduced; the horizontal axis is the bit position from 0 (LSB) to B − 1 (MSB), and the panels mark the (reverse) ground level, the sand depth or flattening term, the (reverse) water level, and the water depth.]

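Proof: We define the Lagrangian $L_3(\boldsymbol{\Delta}, \xi, \nu, \boldsymbol{\lambda}, \boldsymbol{\eta})$ associated with problem (28) as

$$L_3(\boldsymbol{\Delta}, \xi, \nu, \boldsymbol{\lambda}, \boldsymbol{\eta}) = \mathbf{1}^T \boldsymbol{\Delta} \cdot \xi + \nu \left(\sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) - V\right) - \sum_{b=0}^{B-1} \lambda_b \Delta_b + \sum_{b=0}^{B-1} \eta_b (\Delta_b - \xi) \tag{30}$$

where $\nu$, $\boldsymbol{\lambda} = (\lambda_0, \ldots, \lambda_{B-1})$, and $\boldsymbol{\eta} = (\eta_0, \ldots, \eta_{B-1})$ are dual variables. The optimal solution (29) can be derived from $L_3$ and the corresponding KKT conditions. The details of the proof are given in Appendix C.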

The optimal solution (29) can be interpreted as sand-pouring and water-filling as shown in Fig. 5(a). For $\log \frac{\nu}{\rho} > \log \frac{\sqrt{2\pi}\sigma}{4^b}$, we derive the following equation (see Appendix C):

$$\log \frac{\nu}{\rho} = \log \frac{\sqrt{2\pi}\sigma}{4^b} + \log\left(1 + \frac{\eta_b}{\rho}\right) + \frac{\Delta_b^2}{2\sigma^2} \tag{31}$$

where $\log \frac{\nu}{\rho}$, $\log \frac{\sqrt{2\pi}\sigma}{4^b}$, $\log\left(1 + \frac{\eta_b}{\rho}\right)$, and $\frac{\Delta_b^2}{2\sigma^2}$ represent the water level, the ground level, the sand depth, and the water depth, respectively. Pouring sand suppresses the maximum water depth (i.e., the maximum swing) and water-filling allocates swings to optimize energy efficiency.
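The structure of (29) suggests a simple numerical experiment: for a fixed cap $\rho$, the swings are water-filling swings clipped to $[0, \rho]$, so one can search over $\rho$ and, for each value, bisect on $\log \frac{\nu}{\rho}$ until the MSE constraint is tight. The sketch below follows that idea. It is a brute-force illustration under stated assumptions, not the authors' algorithm: the function names, the search range $(\rho_{\min}, 2\rho_{\min}]$ for the cap, the grid size, and the example values are all hypothetical, and the code presumes $0 < \frac{3V}{4^B - 1} < \frac{1}{2}$.

```python
import math
from statistics import NormalDist


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def mse(swings, sigma):
    """Eq. (12): sum_b 4^b Q(Delta_b / sigma); index 0 is the LSB."""
    return sum(4 ** b * q_func(d / sigma) for b, d in enumerate(swings))


def uniform_swing(V, sigma, B):
    """Speed-optimal uniform swing: Q(rho / sigma) = 3V / (4^B - 1)."""
    return -sigma * NormalDist().inv_cdf(3.0 * V / (4 ** B - 1))


def clipped_waterfill(mu, rho, sigma, B):
    """Eq. (29) with mu = nu / rho: water-filling swings clipped to [0, rho]."""
    out = []
    for b in range(B):
        arg = 4 ** b * mu / (math.sqrt(2.0 * math.pi) * sigma)
        d = sigma * math.sqrt(2.0 * math.log(arg)) if arg > 1.0 else 0.0
        out.append(min(d, rho))
    return out


def min_energy_given_rho(V, rho, sigma, B, iters=100):
    """Least-energy swings with Delta_b <= rho and MSE <= V (rho must be feasible)."""
    lo = -50.0  # assumed lower bracket for log(mu)
    # At this upper bracket every bit position is clipped to rho.
    hi = rho * rho / (2.0 * sigma * sigma) + math.log(math.sqrt(2.0 * math.pi) * sigma) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mse(clipped_waterfill(math.exp(mid), rho, sigma, B), sigma) > V:
            lo = mid
        else:
            hi = mid
    return clipped_waterfill(math.exp(hi), rho, sigma, B)


def min_edp_swings(V, sigma, B, grid=200):
    """Grid search over the cap rho; starts from the uniform (speed-optimal) swings."""
    rho_min = uniform_swing(V, sigma, B)
    best = [rho_min] * B
    for k in range(1, grid + 1):
        rho = rho_min * (1.0 + k / grid)  # assumed search range (rho_min, 2*rho_min]
        cand = min_energy_given_rho(V, rho, sigma, B)
        if sum(cand) * max(cand) < sum(best) * max(best):
            best = cand
    return best


swings = min_edp_swings(V=10.0, sigma=1.0, B=8)
print([round(d, 3) for d in swings], round(sum(swings) * max(swings), 3))
```

A grid search is used here only because it is easy to verify; the resolution of the grid limits how close the result is to the true optimum.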

The following corollary shows the relation between the sand depth and other metrics.

Corollary 14: The sand depth $s_b$ is given by

$$s_b = \log\left(1 + \frac{\eta_b}{\rho}\right) \tag{32}$$

where

$$\eta_b \begin{cases} = 0, & \text{if } 0 \leq \Delta_b < \rho, \\ > 0, & \text{if } \Delta_b = \rho. \end{cases} \tag{33}$$

Hence, $s_b = 0$ for $0 \leq \Delta_b < \rho$ and $s_b > 0$ for $\Delta_b = \rho$. Also, the amount of sand is given by

$$\sum_{b=0}^{B-1} \exp(s_b) = \frac{E(\boldsymbol{\Delta})}{\rho} + B. \tag{34}$$

Proof: See Appendix C.

We observe that the amount of sand depends on the energy and the maximum swing.

Suppose that sand is poured in only the MSB position, i.e., $\Delta_{B-1} = \rho$ and $\Delta_b < \rho$ for $b \in [\![0, B-2]\!]$. Then,

$$\eta_{B-1} = \sum_{b=0}^{B-1} \eta_b = \sum_{b=0}^{B-1} \Delta_b = E(\boldsymbol{\Delta}) \tag{35}$$

which follows from (33), (71) (in Appendix C), and Definition 1. Hence,

$$s_{B-1} = \log\left(1 + \frac{E(\boldsymbol{\Delta})}{\rho}\right) = \log\left(1 + \frac{B}{\text{PASR}(\boldsymbol{\Delta})}\right) \tag{36}$$

where the peak-to-average swing ratio (PASR) of swings is given by

$$\text{PASR}(\boldsymbol{\Delta}) = \frac{\rho}{\frac{1}{B} \cdot E(\boldsymbol{\Delta})}. \tag{37}$$

We also note that (36) takes a similar form to the Gaussian channel's capacity. By (36) and (37), we obtain

$$\text{PASR}(\boldsymbol{\Delta}) = \frac{B}{\exp(s_{B-1}) - 1} \tag{38}$$

which shows that more sand reduces the PASR of swings.

Fig. 5(b) illustrates the ground-flattening and reverse water-filling interpretation. From (31), we obtain $\log \frac{4^b}{\sqrt{2\pi}\sigma} + \log \frac{\rho}{\rho + \eta_b} = \log \frac{\rho}{\nu} + \frac{\Delta_b^2}{2\sigma^2}$, where the negative flattening term $\log \frac{\rho}{\rho + \eta_b}$ suppresses the maximum swing and reverse water-filling up to the reverse water level $\log \frac{\rho}{\nu}$ optimizes energy efficiency.

Remark 15 (Sand-pouring and mercury-filling): Sand-pouring and water-filling has a connection to mercury/water-filling [29] because both are two-level filling. In the mercury/water-filling problem, the mercury is poured before water-filling to fill the gap between an ideal Gaussian signal and practical signal constellations; hence, each mercury depth depends only on the corresponding signal constellation. On the other hand, sand-pouring depends on the ground levels, and sand depths are correlated with each other since sand-pouring attempts to flatten the ground levels. Also, the amount of poured sand depends on water-filling as shown in Corollary 14, whereas the amount of mercury is not related to water-filling.

Table II summarizes water-filling and reverse water-filling interpretations for our optimization problems. Notice the duality between ground-flattening and sand-pouring.

IV. NON-UNIFORM SOURCES AND NON-GAUSSIAN NOISES

In this section, we study how to extend our optimization problems to non-uniformly distributed sources and to non-Gaussian noise models.

A. Non-uniform Sources

In Lemma 5, we considered the MSE of a uniformly distributed source. For a non-uniformly distributed source $x = \sum_{b=0}^{B-1} 2^b x_b$ of (7), the MSE is derived in the following proposition.

Proposition 16: The MSE of $x$ is given by

$$\text{MSE}(x) = \sum_{b=0}^{B-1} 4^b p_b + 2 \sum_{b=1}^{B-1} \sum_{b'=0}^{b-1} 2^{b+b'} p_b p_{b'} \phi(b, b') \tag{39}$$

where $\phi(b, b') = \Pr(x_b = x_{b'}) - \Pr(x_b \neq x_{b'})$, $p_b = Q\left(\frac{\Delta_b}{\sigma}\right)$, and $p_{b'} = Q\left(\frac{\Delta_{b'}}{\sigma}\right)$.

Proof: From (13), the MSE of $x$ is given by

$$\text{MSE}(x) = E\left[\left(\sum_{b=0}^{B-1} 2^b e_b\right)^2\right] = \sum_{b=0}^{B-1} 4^b p_b + 2 \sum_{b=1}^{B-1} \sum_{b'=0}^{b-1} 2^{b+b'} E[e_b e_{b'}] \tag{40}$$

where $e_b = \hat{x}_b - x_b \in \{-1, 0, 1\}$. If $x$ is uniformly distributed, then $E[e_b e_{b'}] = E[e_b] E[e_{b'}] = 0$. For a non-uniformly distributed source, $E[e_b e_{b'}]$ for $b \neq b'$ is given by $E[e_b e_{b'}] = \sum_{x, \hat{x}} p(x) p(\hat{x} \mid x) e_b e_{b'} = p_b p_{b'} \{\Pr(x_b = x_{b'}) - \Pr(x_b \neq x_{b'})\} = p_b p_{b'} \phi(b, b')$.

Note that (39) is not convex since $\phi(b, b')$ can be negative. We show that (39) can be approximated by (12) if the following conditions are satisfied.

Claim 17: If $p_0 \ll \frac{1}{2}$ or $\Pr(x_b = x_{b'}) \simeq \Pr(x_b \neq x_{b'})$ for any $b \neq b'$, then (39) can be approximated as (12).

Proof: We can rewrite (39) as follows:

MSE(x) = p0 +

B−1∑b=1

(4b + cb)pb (41)

where cb = 2b+1∑b−1b′=0 2b

′pb′φ(b, b′). For any sources, we

show that (39) can be approximated as (12) if p0 � 12 .

|cb| ≤ 2b+1b−1∑b′=0

2b′pb′ |φ(b, b′)| ≤ 2b+1

b−1∑b′=0

2b′pb′ (42)

≤ 2b+1p0

b−1∑b′=0

2b′

= 2b+1(2b − 1)p0 (43)

where (42) follows from |φ(b, b′)| ≤ 1. Also, (43) followsfrom the fact that p0 ≥ pb for b ∈ J1, B − 1K in ouroptimization problems. If 4b � 2b+1(2b − 1)p0 for anyb ∈ J1, B − 1K, then we can neglect the MSE differencebetween a uniformly distributed source and non-uniformlydistributed sources, which is satisfied by the condition p0 � 1

2 .If a given source satisfies φ(b, b′) ' 0 (i.e., Pr (xb = xb′) '

Pr (xb 6= xb′)), then cb ' 0 and (39) can be approximated as(12) for any bit error probabilities.
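The sufficiency of $p_0 \ll \frac{1}{2}$ can be made explicit by dividing the bound (43) by the uniform-source weight $4^b$:

$$\frac{|c_b|}{4^b} \le \frac{2^{b+1}(2^b - 1)\, p_0}{4^b} = 2\left(1 - 2^{-b}\right) p_0 < 2 p_0 .$$

Hence the correction $c_b$ is negligible relative to $4^b$ for every bit position whenever $2 p_0 \ll 1$.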

We note that the condition $p_0 \ll \frac{1}{2}$ implies that the cross-product terms $p_b p_{b'}$ in (39) can be neglected, which is a widely used approximation. We will use (12) instead of (39) to maintain the convex optimization formulation. For a non-uniform source whose data statistics do not follow $|\phi(b, b')| \simeq 0$, the obtained solutions may be sub-optimal. These sub-optimal solutions converge to the optimal solutions for lower MSE due to the condition $p_0 \ll \frac{1}{2}$.

TABLE II
SUMMARY OF GENERALIZED WATER-FILLING

Criterion  | Water-filling interpretation      | Reverse water-filling interpretation       | Ground levels
Min energy | Water-filling                     | Reverse water-filling                      | Unflattened
Max speed  | Ground-flattening / water-filling | Sand-pouring / reverse water-filling       | Perfectly flattened
Min EDP    | Sand-pouring / water-filling      | Ground-flattening / reverse water-filling  | Partially flattened

B. Non-Gaussian Noise Models

Although SRAM noise is well-modeled as a Gaussian distribution, the proposed optimization problems can be extended to non-Gaussian noise models. We show that the convexity of the proposed optimization problems is maintained if the noise is unimodal and symmetric with zero mean.

Claim 18: If the noise is modeled as a unimodal and symmetric distribution with zero mean, then $\mathrm{MSE}(\boldsymbol{\Delta})$ is convex.

Proof: Suppose that the noise distribution is $f(t)$, which is a unimodal and symmetric distribution with zero mean. Then, the bit error probability is given by $p_b = \int_{\Delta_b}^{\infty} f(t)\, dt$. Note that

$$\frac{d^2 p_b}{d \Delta_b^2} = -\frac{d f(\Delta_b)}{d \Delta_b} \ge 0,$$

which follows from $\frac{d f(\Delta_b)}{d \Delta_b} \le 0$ for $\Delta_b \ge 0$. Since the MSE is a nonnegative weighted sum of bit error probabilities, the MSE is also convex.

Hence, the proposed optimization problems are convex for many important noise models, including Gaussian, Laplacian, Student's t, and Cauchy distribution models.

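For the Laplacian noise model, we provide the optimal swings for the three criteria of Section III. The Laplacian probability density function (PDF) is given by $f(t) = \frac{1}{2\alpha} \exp\left(-\frac{|t|}{\alpha}\right)$, where the variance is $\sigma^2 = 2\alpha^2$. The bit error probability of the $b$th position is given by $p_b = \int_{\Delta_b}^{\infty} f(t)\, dt = \frac{1}{2} \exp\left(-\frac{\Delta_b}{\alpha}\right)$. By Lemma 5, the MSE is given by

$$\mathrm{MSE}(\boldsymbol{\Delta}) = \frac{1}{2} \sum_{b=0}^{B-1} 4^b \exp\left(-\frac{\Delta_b}{\alpha}\right) \tag{44}$$

which is convex as shown in Claim 18.

Corollary 19: For each criterion, the optimized swings under the Laplacian noise model are as follows.

1) Energy minimization:

$$\Delta_b^* = \begin{cases} 0, & \text{if } \nu \le \frac{2\alpha}{4^b}, \\ \alpha \log\left(\frac{4^b}{2\alpha} \cdot \nu\right), & \text{otherwise.} \end{cases} \tag{45}$$

2) Speed maximization:

$$\Delta_b^* = \rho = \alpha \cdot \log\left(\frac{4^B - 1}{6\alpha} \cdot \nu\right). \tag{46}$$

3) EDP minimization:

$$\Delta_b^* = \begin{cases} 0, & \text{if } \log\frac{\nu}{\rho} \le \log\frac{2\alpha}{4^b}, \\ \rho, & \text{if } \log\frac{\nu}{\rho} \ge \log\frac{2\alpha}{4^b} + \frac{\rho}{\alpha}, \\ \alpha \log\left(\frac{4^b}{2\alpha} \cdot \frac{\nu}{\rho}\right), & \text{otherwise.} \end{cases} \tag{47}$$

Proof: The proof is similar to those of Theorem 7, Theorem 9, and Theorem 13 after replacing (12) with (44).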
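To illustrate how (45) might be used in practice, the sketch below finds the multiplier $\nu$ by bisection so that the MSE (44) meets a target $V$ with equality, and then evaluates the energy-optimal Laplacian swings. The function names, the bisection bracket, and the example target $V = 65.025$ (PSNR of 30 dB for $B = 8$, using the PSNR definition of Section VI) are assumptions made for this illustration; the target must satisfy $V < (4^B - 1)/6$.

```python
import math

def laplacian_mse(swings, alpha):
    """MSE from (44) for Laplacian noise with scale alpha."""
    return 0.5 * sum(4**b * math.exp(-d / alpha) for b, d in enumerate(swings))

def swings_for_multiplier(nu, B, alpha):
    """Energy-optimal swings from (45) for a given multiplier nu."""
    return [0.0 if nu <= 2 * alpha / 4**b
            else alpha * math.log(4**b * nu / (2 * alpha))
            for b in range(B)]

def solve_energy_min(V, B, alpha, iters=200):
    """Bisection on nu (geometric midpoint) until MSE(nu) meets V."""
    lo = 2 * alpha / 4**B               # all swings are zero here, so MSE is maximal
    hi = max(2 * alpha, B * alpha / V)  # every bit contributes at most alpha/nu, so MSE <= V
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if laplacian_mse(swings_for_multiplier(mid, B, alpha), alpha) > V:
            lo = mid
        else:
            hi = mid
    return swings_for_multiplier(hi, B, alpha)

alpha = 1.0 / math.sqrt(2.0)   # sigma^2 = 2 * alpha^2 = 1
swings = solve_energy_min(V=65.025, B=8, alpha=alpha)
print([round(d, 3) for d in swings], laplacian_mse(swings, alpha))
```

The bisection relies on the MSE being continuous and non-increasing in $\nu$, which holds because each swing in (45) is continuous and non-decreasing in $\nu$.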

Fig. 6. Proposed interleaved architecture. (Blocks shown: an interleaver; an SRAM array split into Subarray 0 (for LSB), Subarray 1, ..., Subarray B − 1 (for MSB); and a pulse-width control unit driven by a configuration code.)

For other unimodal and symmetric noise models with zero mean, the optimized swings can be obtained in a similar manner. Hence, the proposed optimizations can be effectively applied even if future scaled-down SRAMs are affected by non-Gaussian noise for which the central limit theorem does not hold.

V. ARCHITECTURE AND DISCRETE SWINGS

In the previous section, we determined the optimized swings assuming that any real value can be assigned to bit-level swings. However, current SRAM architectures and circuits do not support fine-grained bit-level swing assignments. In this section, we propose an SRAM architecture to effectively enable bit-level swing control. Also, we provide algorithms to optimize discrete-valued swings rather than continuous-valued swings.

A. Proposed Architecture

In [10], an SRAM architecture that allocates different swings for each memory instance (array or sub-array) was introduced. The fine-grained swings were achieved by WL pulse-width control with little overhead. This architecture attempts to compensate for the impact of spatial variations by applying different pulse-widths to each sub-array. By modifying the architecture of [10], we propose in Fig. 6 an architecture that controls bit-level swings in an efficient manner. We can separate the data for each bit position into different sub-arrays by interleaving. Note that interleaving is already used in most SRAMs for soft-error immunity [45], [46]. Hence, our architecture does not incur additional overhead compared to the architecture in [10].

The proposed architecture enables fine-grained bit-level swing control by adjusting the pulse-width for each sub-array. We consider the case in which the optimal swings for several MSE requirements are calculated by the proposed algorithms outside the memory. These sets of swings are stored as configuration codes to control the corresponding pulse-widths. Depending on the MSE requirement, the pulse-width control unit receives the configuration codes corresponding to the optimized swings.
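The interleaving step can be pictured in software as a split of each word into bit-planes, one per sub-array. The following is only a behavioral sketch with hypothetical function names; it does not model the SRAM circuits or the pulse-width control.

```python
def interleave(words, B):
    """Split B-bit words into B bit-planes; plane b holds bit b of every word."""
    return [[(w >> b) & 1 for w in words] for b in range(B)]

def deinterleave(planes):
    """Rebuild the words from the bit-planes (plane 0 is the LSB)."""
    n = len(planes[0])
    return [sum(planes[b][i] << b for b in range(len(planes))) for i in range(n)]

words = [0, 1, 2, 255, 170, 85]
planes = interleave(words, 8)
print(planes[0])   # bits stored in Subarray 0 (LSB)
print(planes[7])   # bits stored in Subarray B - 1 (MSB)
print(deinterleave(planes) == words)
```

Because plane `b` contains only bits of significance $2^b$, a single swing setting per sub-array is enough to realize a per-bit-position swing.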


We do not need to specify the MSE requirement when writing data because the read operations of SRAM can control voltage swings and the corresponding MSE, as explained in Section II. Hence, the proposed architecture can support many different MSE requirements even for the same written data. Although two sets of data from different applications may be stored in the same sub-array, it is unlikely that they are retrieved simultaneously for these two applications. Since the proposed architecture supports different MSE requirements during read operations, different MSE requirements for each piece of data can be met when they are accessed at different times.

B. Optimization of Discrete Swings: Discrete Water-filling

Since pulse-width control is usually implemented by cascaded logic gates [10], the swing granularity depends on the response time of the logic gates, which is a finite value. Hence, we present optimization algorithms for discrete swings by leveraging graphical interpretations from Section III.

For Criterion 1 (minimize energy) and Criterion 2 (maximize speed), our algorithm approximates the Levin–Campello algorithm [47]–[49]. The optimization problem of Criterion 3 (minimize EDP) cannot be solved by the Levin–Campello algorithm, and so we develop an algorithm based on the sand-pouring and water-filling interpretation and its KKT conditions.

Suppose that $\beta$ is the granularity of the discrete swings. Our discrete water-filling algorithm (Algorithm 1) attempts to obtain the discrete swings minimizing energy or maximizing speed by a greedy approach. The basic idea is to fill water at the bit position whose current water level is the lowest. For Criterion 1, the ground level should be $g_b = \log\frac{\sqrt{2\pi}\sigma}{4^b}$ for $b \in \llbracket 0, B-1 \rrbracket$ as shown in Fig. 2(a). For Criterion 2, we set the ground level as $\mathbf{g} = \mathbf{0}$, which represents the flat ground level as shown in Fig. 3.

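Algorithm 1 Discrete water-filling for (15) and (20)
1: Set ground level $\mathbf{g} = (g_0, \ldots, g_{B-1})$ depending on problems
2: $\boldsymbol{\Delta} \leftarrow \mathbf{0}$
3: while $\mathrm{MSE}(\boldsymbol{\Delta}) > V$ do
4:     $b \leftarrow \arg\min_{b \in \llbracket 0, B-1 \rrbracket} \left\{ g_b + \frac{\Delta_b^2}{2\sigma^2} \right\}$    ▷ Lowest water level
5:     $\Delta_b \leftarrow \Delta_b + \beta$    ▷ Fill more water
6: end while
7: return $\boldsymbol{\Delta}$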
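A minimal Python sketch of Algorithm 1 follows, assuming the Gaussian-noise MSE $\sum_b 4^b Q(\Delta_b/\sigma)$ that appears in the KKT conditions of Appendix A. The function names and the `flat_ground` switch (Criterion 2 when true, Criterion 1 otherwise) are hypothetical; swings are tracked as integer multiples of $\beta$ to avoid accumulating rounding error.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = Pr(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mse(swings, sigma):
    """MSE(Delta) = sum_b 4^b Q(Delta_b / sigma)."""
    return sum(4**b * q_function(d / sigma) for b, d in enumerate(swings))

def discrete_water_filling(V, B, sigma, beta, flat_ground=False):
    # Line 1: ground levels (flat for Criterion 2, log(sqrt(2*pi)*sigma/4^b) for Criterion 1)
    if flat_ground:
        ground = [0.0] * B
    else:
        ground = [math.log(math.sqrt(2 * math.pi) * sigma / 4**b) for b in range(B)]
    counts = [0] * B                       # Line 2: Delta = 0, stored as multiples of beta
    while mse([c * beta for c in counts], sigma) > V:          # Line 3
        levels = [ground[b] + (counts[b] * beta) ** 2 / (2 * sigma**2)
                  for b in range(B)]
        b = levels.index(min(levels))      # Line 4: lowest water level
        counts[b] += 1                     # Line 5: fill more water
    return [c * beta for c in counts]      # Line 7

swings = discrete_water_filling(V=65.025, B=8, sigma=1.0, beta=0.5)
print(swings, sum(swings), mse(swings, 1.0))
```

With `flat_ground=True` the loop raises the swings in round-robin order, so the result stays within one step $\beta$ of a uniform swing.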

To minimize energy by discrete swings, we tailor the Levin–Campello algorithm by replacing line 4 in Algorithm 1 with

$$b = \arg\min_{b \in \llbracket 0, B-1 \rrbracket} \left\{ \mathrm{MSE}(\boldsymbol{\Delta} + \beta \mathbf{e}_b) - \mathrm{MSE}(\boldsymbol{\Delta}) \right\} \tag{48}$$

where $\mathbf{e}_b$ is the unit vector whose $b$th entry is 1 and whose other entries are 0. Since $\mathrm{MSE}(\boldsymbol{\Delta})$ is the sum of convex functions, the discrete swings obtained by the Levin–Campello algorithm are optimal. We show that Algorithm 1 is an approximation of the Levin–Campello algorithm.

Corollary 20: The solution of Algorithm 1 converges to the solution of the Levin–Campello algorithm for small $\beta$.

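Proof: By Lemma 5,

$$\mathrm{MSE}(\boldsymbol{\Delta} + \beta \mathbf{e}_b) - \mathrm{MSE}(\boldsymbol{\Delta}) = 4^b \left( Q\left(\frac{\Delta_b + \beta}{\sigma}\right) - Q\left(\frac{\Delta_b}{\sigma}\right) \right). \tag{49}$$

As $\beta \to 0$, (49) converges to

$$\beta \cdot 4^b \cdot \frac{\partial Q\left(\frac{\Delta_b}{\sigma}\right)}{\partial \Delta_b} = -\beta \cdot \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right). \tag{50}$$

We can consider choosing $b$ that minimizes (50) as follows:

$$b = \arg\min \left\{ -\beta \cdot \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right) \right\} = \arg\min \left\{ \log\frac{\sqrt{2\pi}\sigma}{4^b} + \frac{\Delta_b^2}{2\sigma^2} \right\} \tag{51}$$

which is equivalent to line 4 of Algorithm 1.

Numerical results in Section VI show that the discrete swings obtained by Algorithm 1 are almost identical to the solutions by the Levin–Campello algorithm.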
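The relation between the two selection rules can be explored with a small experiment: the same greedy loop is run once with the Levin–Campello increment (49) and once with the water level of (51). This is a sketch under the Gaussian-noise MSE used in the proof; the function names are hypothetical, and energy is taken as the sum of swings, which is an assumption consistent with the stationarity condition (55).

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mse(swings, sigma):
    """MSE(Delta) = sum_b 4^b Q(Delta_b / sigma)."""
    return sum(4**b * q_function(d / sigma) for b, d in enumerate(swings))

def levin_campello_score(b, d, sigma, beta):
    """MSE change (49) when the swing d of bit b grows by beta (always negative)."""
    return 4**b * (q_function((d + beta) / sigma) - q_function(d / sigma))

def water_level_score(b, d, sigma, beta):
    """Water level used in line 4 of Algorithm 1, i.e., the right side of (51)."""
    return math.log(math.sqrt(2 * math.pi) * sigma / 4**b) + d**2 / (2 * sigma**2)

def greedy_swings(V, B, sigma, beta, score):
    """Add beta to the bit with the smallest score until the MSE target is met."""
    counts = [0] * B
    while mse([c * beta for c in counts], sigma) > V:
        b = min(range(B), key=lambda i: score(i, counts[i] * beta, sigma, beta))
        counts[b] += 1
    return [c * beta for c in counts]

for beta in (1.0, 0.5, 0.1):
    lc = greedy_swings(65.025, 8, 1.0, beta, levin_campello_score)
    wf = greedy_swings(65.025, 8, 1.0, beta, water_level_score)
    print(beta, sum(lc), sum(wf))
```

Since each greedy step of the Levin–Campello rule takes the largest available MSE reduction, its total energy should not exceed that of the water-level rule for the same $\beta$.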

We present an algorithm to obtain discrete swings that minimize EDP in Algorithm 2. The Levin–Campello algorithm cannot solve this problem since it cannot handle the term $\rho = \max(\boldsymbol{\Delta})$ in the EDP. By leveraging the sand-pouring and water-filling interpretation of Fig. 5 and the KKT conditions, Algorithm 2 attempts to pour sand and fill water iteratively.

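Algorithm 2 Sand-pouring and discrete water-filling for (28)
1: $g_b \leftarrow \log\frac{\sqrt{2\pi}\sigma}{4^b}$ for all $b \in \llbracket 0, B-1 \rrbracket$    ▷ Set ground level
2: $\boldsymbol{\Delta} \leftarrow \mathbf{0}$, $\boldsymbol{\eta} \leftarrow \mathbf{0}$, $\mathbf{s} \leftarrow \mathbf{0}$
3: while $\mathrm{MSE}(\boldsymbol{\Delta}) > V$ do
4:     $\rho \leftarrow \max(\boldsymbol{\Delta})$
5:     $b \leftarrow \arg\min_{b \in \llbracket 0, B-1 \rrbracket} \{ g_b + s_b \}$    ▷ Lowest sand level
6:     $\eta_b \leftarrow \eta_b + \beta$    ▷ Pour more sand
7:     for $b = 0$ to $B - 1$ do
8:         $s_b \leftarrow \log\left(1 + \frac{\eta_b}{\rho}\right)$    ▷ Calculate sand depth
9:     end for
10:    $b \leftarrow \arg\min_{b \in \llbracket 0, B-1 \rrbracket} \left\{ g_b + s_b + \frac{\Delta_b^2}{2\sigma^2} \right\}$    ▷ Lowest water level
11:    $\Delta_b \leftarrow \Delta_b + \beta$    ▷ Fill more water
12: end while
13: return $\boldsymbol{\Delta}$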

At each iteration, Algorithm 2 first pours more sand at the lowest sand level, as shown in line 5. Note that the sand level of each bit position is the sum of the corresponding ground level and sand depth. We increase $\eta_b$ by $\beta$ in line 6 and $\Delta_b$ by $\beta$ in line 11 at each iteration to satisfy the optimality condition $\sum \eta_b = \sum \Delta_b$ (see (71) in Appendix C).

After increasing $\eta_b$, the sand depth $s_b$ of each bit position is calculated by Corollary 14, which indicates the increased amount of sand. Afterwards, water is filled at the bit position whose water level is the lowest. Note that the sand depth $s_b$ affects the water level, unlike in Algorithm 1 (compare line 4 of Algorithm 1 and line 10 of Algorithm 2). Numerical results in Section VI show that the EDP loss due to the discrete swings of Algorithm 2 is negligible for moderate granularity.
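The steps of Algorithm 2 can be sketched in Python as follows. One detail is an assumption of this sketch rather than part of the algorithm as stated: on the first pass $\rho = \max(\boldsymbol{\Delta}) = 0$, so line 8 would divide by zero, and the code therefore floors $\rho$ at $\beta$. The function names are hypothetical, and the EDP printed at the end is taken as $\max(\boldsymbol{\Delta}) \cdot \sum_b \Delta_b$, which is also an assumption.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mse(swings, sigma):
    """MSE(Delta) = sum_b 4^b Q(Delta_b / sigma)."""
    return sum(4**b * q_function(d / sigma) for b, d in enumerate(swings))

def sand_pouring_water_filling(V, B, sigma, beta):
    ground = [math.log(math.sqrt(2 * math.pi) * sigma / 4**b) for b in range(B)]  # line 1
    water = [0] * B      # Delta_b in units of beta (line 2)
    sand = [0] * B       # eta_b in units of beta
    depth = [0.0] * B    # sand depth s_b
    while mse([w * beta for w in water], sigma) > V:                # line 3
        # ASSUMPTION: floor rho at beta to avoid dividing by zero on the first pass.
        rho = max(max(water) * beta, beta)                          # line 4
        b = min(range(B), key=lambda i: ground[i] + depth[i])       # line 5
        sand[b] += 1                                                # line 6
        depth = [math.log(1 + s * beta / rho) for s in sand]        # lines 7-9
        b = min(range(B), key=lambda i: ground[i] + depth[i]
                + (water[i] * beta) ** 2 / (2 * sigma**2))          # line 10
        water[b] += 1                                               # line 11
    return [w * beta for w in water]                                # line 13

swings = sand_pouring_water_filling(V=65.025, B=8, sigma=1.0, beta=0.5)
print(swings, max(swings) * sum(swings))
```

Each pass adds one unit of sand and one unit of water, so $\sum \eta_b = \sum \Delta_b$ holds throughout the loop, in line with (71).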


Fig. 7. Comparison of energy consumption for (a) B = 8 and (b) B = 16 (σ² = 1). Each panel plots E(Δ)/B versus PSNR (dB) for three curves: Minimize Energy, Maximize Speed (Uniform Swing), and Minimize EDP.

VI. NUMERICAL RESULTS

We evaluate the solutions of the three optimization problems for both continuous and discrete swings. Note that the solution of maximizing speed is equivalent to the conventional uniform swing, as noted in Remark 10. Also, we compare the proposed optimization to LSB dropping and selective ECCs.

Fig. 7 compares the read energy consumption $E(\boldsymbol{\Delta})$ as in Definition 1 for a given constraint on the peak signal-to-noise ratio (PSNR). The PSNR depends on the MSE as $\mathrm{PSNR} = 10 \log_{10} \frac{(2^B - 1)^2}{\mathrm{MSE}(\boldsymbol{\Delta})}$. At PSNR = 30dB, the optimal solution of (15) (i.e., minimizing energy) reduces the energy consumption by half for B = 8, compared to the uniform swing (i.e., maximizing speed). For B = 16, the energy consumption of the energy-optimal swing is only a quarter of that of the uniform swing. The energy consumption of the EDP-optimal swing is slightly worse than that of the energy-optimal swing.
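The PSNR constraint translates into an MSE budget $V$ by inverting the definition above. A small helper (with hypothetical function names) makes the conversion explicit:

```python
import math

def mse_from_psnr(psnr_db, B):
    """MSE budget that corresponds to a PSNR target for B-bit words."""
    return (2**B - 1) ** 2 / 10 ** (psnr_db / 10)

def psnr_from_mse(mse, B):
    """PSNR in dB for a given MSE and word length B."""
    return 10 * math.log10((2**B - 1) ** 2 / mse)

print(mse_from_psnr(30, 8))    # 255^2 / 1000, about 65.025
print(mse_from_psnr(30, 16))   # 65535^2 / 1000
```

For example, PSNR = 30dB with B = 8 corresponds to $V = 255^2 / 10^3 \approx 65.025$.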

Fig. 8 compares the maximum delay $\rho$ as in Definition 2 for a given PSNR. The conventional uniform swing minimizes the maximum delay; hence it is the speed-optimal solution. The swings minimizing energy achieve significant energy savings at the cost of speed (e.g., a maximum delay increase of 20% at PSNR = 30dB). The EDP-optimal swings increase the maximum delay by only 8% at PSNR = 30dB.

Fig. 9 compares the EDP for a given PSNR. As formulated, the swings minimizing EDP show the best results. The EDP can be reduced by 45% for B = 8 at PSNR = 30dB. The EDP improvement is much larger for B = 16, e.g., a 75% EDP saving at PSNR = 30dB. Note that a slight loss of speed performance can result in significant energy and EDP savings.

Fig. 8. Comparison of maximum delay for (a) B = 8 and (b) B = 16 (σ² = 1). Each panel plots the maximum delay ρ versus PSNR (dB) for three curves: Minimize Energy, Maximize Speed (Uniform Swing), and Minimize EDP.

Fig. 10 shows optimal solutions to (a) minimize energy, (b) minimize maximum delay, and (c) minimize EDP. As shown in Fig. 10(a), we should allocate larger swings for more significant bits. Also, we observe that the swings for several LSBs can be zero depending on PSNR, e.g., $\Delta_0 = \Delta_1 = \Delta_2 = 0$ at PSNR = 30dB, a refined kind of LSB dropping. These numerical solutions confirm Theorem 7 and its water-filling interpretation in Fig. 2. Fig. 10(b) shows the solutions minimizing maximum delay. As we showed in Theorem 9, uniform swings minimize the maximum delay. The optimized swings in Fig. 10(c) minimize the EDP. Although the EDP-optimal swings are similar to the energy-optimal swings, we observe that $\Delta_6 = \Delta_7 = \rho$ at PSNR = 30dB. This is because these two bit positions are filled with sand to suppress the maximum delay, as shown in Theorem 13 and its graphical interpretation in Fig. 5.

Fig. 11 compares uniform swings, energy-optimal swings in (16), LSB dropping, and selective ECCs. The proposed energy-optimal swings outperform the other techniques since the energy-optimal swings achieve the target PSNR with the minimum energy $E(\boldsymbol{\Delta})$. LSB dropping deactivates $L$ LSBs and allocates uniform swings for the $(B - L)$ undropped bit positions. In the low PSNR regime, dropping more LSBs (i.e., larger $L$) can be effective. However, larger $L$ will limit the levels of achievable PSNRs. Selective ECCs store parity bits in LSBs to prevent additional memory overhead. Unlike LSB dropping, selective ECCs allocate uniform swings for all the bit positions. In spite of the LSB information loss, the overall PSNR can be improved by correcting errors in MSBs. As in [21], we consider $(n, k)$ Hamming codes for selective ECCs since complicated ECCs are impractical for SRAMs. In a selective ECC (7, 4) for B = 8, the bits of $(x_4, x_5, x_6, x_7)$ are protected by losing information of $(x_0, x_1, x_2)$. Since three LSBs are lost, the PSNR of selective ECC (7, 4) converges to the PSNR by LSB dropping (L = 4) as shown in Fig. 11(a). For B = 8, a (15, 11) Hamming code cannot be incorporated into an 8-bit word. Hence, we store four parity bits of a Hamming (15, 11) codeword in the last LSBs of four different 8-bit words as proposed in [21]. Note that selective ECC (15, 11) for B = 8 converges to LSB dropping (L = 1) for high $E(\boldsymbol{\Delta})$ since both schemes discard only the last LSBs. In Fig. 11(b), all selective ECCs are applied to one 16-bit word.

Fig. 9. Comparison of EDP for (a) B = 8 and (b) B = 16 (σ² = 1). Each panel plots EDP(Δ)/B versus PSNR (dB) for three curves: Minimize Energy, Maximize Speed (Uniform Swing), and Minimize EDP.

Fig. 12 shows that the energy penalty due to discrete swings is negligible for moderate granularity $\beta$. The energy consumption of the discrete swings obtained by our Algorithm 1 is almost the same as that of the Levin–Campello algorithm, as explained in Corollary 20. Fig. 13 compares the EDP of the optimal swings of Theorem 13 and the discrete swings of Algorithm 2. By comparing Fig. 12 to Fig. 13, we observe that the EDP is more sensitive to $\beta$ than the energy. The reason is that the EDP is perturbed by the discretization of $\rho$ as well as the discretization of energy. Nonetheless, the EDP penalty at PSNR = 30dB is very small for moderate granularity such as $\beta = 1$. We can observe that the EDP penalty due to discrete swings is smaller for larger B. Since the Levin–Campello algorithm cannot solve the EDP optimization problem, it is absent in Fig. 13.

Fig. 10. Optimal solutions (a) minimizing energy, (b) maximizing speed, and (c) minimizing EDP (σ² = 1). Each panel plots the swings Δ0 (LSB) through Δ7 (MSB) versus PSNR (dB).

For the Laplacian noise model, we compare the energy consumption, the maximum delay, and the EDP of the optimized swings in Fig. 14. The optimized swings for the three criteria are given in Corollary 19. Fig. 14(a) shows that the energy-optimal swings reduce the energy consumption by more than half for B = 8 at PSNR = 30dB. The swings minimizing energy achieve these significant energy savings at the cost of speed (e.g., a maximum delay increase of 30% at PSNR = 30dB). The EDP-optimal swings increase the maximum delay by 10% at PSNR = 30dB (see Fig. 14(b)). Fig. 14(c) shows that the EDP-optimal swings achieve a 55% EDP saving at PSNR = 30dB.

Fig. 11. Comparison of uniform swings, energy-optimal swings, LSB dropping, and selective ECC for (a) B = 8 and (b) B = 16 (σ² = 1). Each panel plots PSNR (dB) versus E(Δ)/B. Curves: uniform swings; energy-optimal swings; LSB dropping with L = 1, 2, 3, 4 in (a) and L = 2, 4, 6, 8 in (b); selective ECC (7, 4); selective ECC (15, 11).

VII. CONCLUSION

SRAM is a critical component for information processing systems. Casting read access for SRAMs as an end-to-end communication problem, we found the optimal bit-level swings of SRAMs for applications with fidelity dependent on bit position. We formulated convex optimization problems to determine the optimal swings for the objective functions of energy, maximum delay, and EDP. The optimized bit-level swings can achieve significant energy (50% for 8-bit words and 75% for 16-bit words) and EDP (45% for 8-bit words and 75% for 16-bit words) savings at a PSNR of 30dB compared to the conventional uniform swings.

By treating each bit position as an individual channel, we cast bit-level swing optimization problems as generalizations of water-filling that may involve sand-pouring and ground-flattening. Also, we developed optimization algorithms for discrete swings by leveraging water-filling interpretations and KKT conditions. The discrete swings obtained by the proposed algorithms achieve almost the same energy and EDP savings as the continuous swings for moderate granularity.

Fig. 12. Energy consumption of discrete swings obtained by Algorithm 1 and the Levin–Campello algorithm for (a) B = 8 and (b) B = 16 (σ² = 1). Each panel plots E(Δ)/B versus PSNR (dB) for continuous uniform swings, continuous energy-optimal swings, and Algorithm 1 and Levin–Campello with β = 0.1, 0.5, and 1.

APPENDIX A
PROOF OF THEOREM 7

The KKT conditions of (15) are as follows:

$$\sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) \le V, \quad \nu \ge 0, \tag{52}$$

$$\nu \cdot \left\{ \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) - V \right\} = 0, \tag{53}$$

$$\Delta_b \ge 0, \quad \lambda_b \ge 0, \quad \lambda_b \Delta_b = 0 \tag{54}$$

for $b \in \llbracket 0, B-1 \rrbracket$. From $\frac{\partial L_1}{\partial \Delta_b} = 0$, $\lambda_b$ is given by

$$\lambda_b = 1 - \nu \cdot \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right) \ge 0. \tag{55}$$

By (54) and (55), $\Delta_b \left\{ 1 - \nu \cdot \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right) \right\} = 0$.

If $\nu = 0$, then $\lambda_b = 1$ and $\Delta_b = 0$ for any $b \in \llbracket 0, B-1 \rrbracket$ because of (54) and (55). Since $\boldsymbol{\Delta} = \mathbf{0}$ is a trivial solution, we claim that $\nu \ne 0$, which results in $\sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) = V$.

If $\nu \le \frac{\sqrt{2\pi}\sigma}{4^b}$, then $\Delta_b > 0$ is impossible because it would imply $\lambda_b = 0$ and $\nu = \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\Delta_b^2}{2\sigma^2}\right)$, which contradicts the condition $\nu \le \frac{\sqrt{2\pi}\sigma}{4^b}$. Hence, $\Delta_b = 0$ for $\nu \le \frac{\sqrt{2\pi}\sigma}{4^b}$.

If $\nu > \frac{\sqrt{2\pi}\sigma}{4^b}$, then $\Delta_b = 0$ is impossible because it would imply $\nu = \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\Delta_b^2}{2\sigma^2}\right) = \frac{\sqrt{2\pi}\sigma}{4^b}$, which contradicts the condition $\nu > \frac{\sqrt{2\pi}\sigma}{4^b}$. We claim that $\Delta_b > 0$ and $\lambda_b = 0$, which results in (18) for $\nu > \frac{\sqrt{2\pi}\sigma}{4^b}$. Thus, the optimal solution $\boldsymbol{\Delta}^*$ of (15) can be derived from (16).

Fig. 13. EDP of discrete swings obtained by Algorithm 2 for (a) B = 8 and (b) B = 16 (σ² = 1). Each panel plots EDP(Δ)/B versus PSNR (dB) for continuous uniform swings, continuous EDP-optimal swings, and Algorithm 2 with β = 0.1, 0.5, and 1.
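The case analysis above yields $\Delta_b = 0$ when $\nu \le \frac{\sqrt{2\pi}\sigma}{4^b}$ and, from (55) with $\lambda_b = 0$, $\Delta_b = \sigma \sqrt{2 \log\left(\frac{4^b \nu}{\sqrt{2\pi}\sigma}\right)}$ otherwise. The sketch below finds $\nu$ by bisection so that the MSE constraint holds with equality and compares the resulting energy with that of a uniform swing. The function names and the bisection bracket are assumptions of this sketch, and energy is taken as the sum of swings.

```python
import math
from statistics import NormalDist

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mse(swings, sigma):
    """MSE(Delta) = sum_b 4^b Q(Delta_b / sigma), as in (52)."""
    return sum(4**b * q_function(d / sigma) for b, d in enumerate(swings))

def swings_for_multiplier(nu, B, sigma):
    """Water-filling solution implied by (54) and (55) for a given nu."""
    c = math.sqrt(2 * math.pi) * sigma
    return [0.0 if nu <= c / 4**b else sigma * math.sqrt(2 * math.log(4**b * nu / c))
            for b in range(B)]

def energy_optimal_swings(V, B, sigma, iters=200):
    """Bisection on nu (geometric midpoint) until the MSE meets V."""
    c = math.sqrt(2 * math.pi) * sigma
    lo = c / 4**B                    # all swings are zero, MSE is maximal
    hi = max(c, B * c / (2 * V))     # Q(x) <= exp(-x^2/2)/2 gives MSE <= V here
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if mse(swings_for_multiplier(mid, B, sigma), sigma) > V:
            lo = mid
        else:
            hi = mid
    return swings_for_multiplier(hi, B, sigma)

def uniform_swing(V, B, sigma):
    """Common swing rho with (4^B - 1)/3 * Q(rho / sigma) = V."""
    return sigma * NormalDist().inv_cdf(1 - 3 * V / (4**B - 1))

V, B, sigma = 65.025, 8, 1.0         # PSNR = 30 dB for 8-bit words
swings = energy_optimal_swings(V, B, sigma)
print([round(d, 3) for d in swings])
print(sum(swings), B * uniform_swing(V, B, sigma))
```

The three least significant swings come out as zero for this target, in agreement with the observation about $\Delta_0 = \Delta_1 = \Delta_2 = 0$ at PSNR = 30dB in Section VI.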

APPENDIX B
PROOF OF THEOREM 9

The KKT conditions of (21) are as follows:

$$\sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) \le V, \quad \nu \ge 0, \tag{56}$$

$$\nu \cdot \left\{ \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) - V \right\} = 0, \tag{57}$$

$$0 \le \Delta_b \le \xi, \quad \lambda_b \ge 0, \quad \eta_b \ge 0, \tag{58}$$

$$\lambda_b \Delta_b = 0, \quad \eta_b (\Delta_b - \xi) = 0 \tag{59}$$

for $b \in \llbracket 0, B-1 \rrbracket$. From $\frac{\partial L_2}{\partial \Delta_b} = 0$ and $\frac{\partial L_2}{\partial \xi} = 0$, we obtain the following equations:

$$\lambda_b = \eta_b - \nu \cdot \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right) \ge 0, \tag{60}$$

$$\sum_{b=0}^{B-1} \eta_b = 1. \tag{61}$$

By (59) and (60), $\left\{ \eta_b - \nu \cdot \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right) \right\} \Delta_b = 0$. If $\nu = 0$, then $\eta_b \Delta_b = 0$. Also, note that $\eta_b (\Delta_b - \xi) = 0$ from (59). Both $\eta_b \Delta_b = 0$ and $\eta_b (\Delta_b - \xi) = 0$ result in $\eta_b = 0$ for any $b \in \llbracket 0, B-1 \rrbracket$, which violates (61). Hence, we claim that

$$\nu > 0, \quad \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) = V. \tag{62}$$

From (60), $\nu \le \eta_b \cdot \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\Delta_b^2}{2\sigma^2}\right)$. If $\nu \le \eta_b \cdot \frac{\sqrt{2\pi}\sigma}{4^b}$, then $\Delta_b = 0$ and $\eta_b = 0$, which violates $\nu > 0$ of (62). Hence, $\nu > \eta_b \cdot \frac{\sqrt{2\pi}\sigma}{4^b}$, which implies $\Delta_b > 0$ and $\lambda_b = 0$ for all $b \in \llbracket 0, B-1 \rrbracket$ because of (59). By $\lambda_b = 0$ and (60),

$$\eta_b = \nu \cdot \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right). \tag{63}$$

Fig. 14. Comparison of (a) energy consumption, (b) maximum delay, and (c) EDP under the Laplacian noise model (B = 8, σ² = 1). Each panel plots the quantity versus PSNR (dB) for three curves: Minimize Energy, Maximize Speed (Uniform Swing), and Minimize EDP.


Because of $\nu > 0$ and (59), we claim that $\eta_b > 0$ and

$$\Delta_b = \xi \tag{64}$$

for all $b \in \llbracket 0, B-1 \rrbracket$. Hence, the optimal solution of (21) is uniform swings, i.e., $\boldsymbol{\Delta}^* = (\xi, \ldots, \xi)$ where $\rho = \max(\boldsymbol{\Delta}^*) = \xi$. We confirm that the reformulated problem (21) is equivalent to the original problem (20).

By (63) and (64),

$$\nu = \frac{\sqrt{2\pi}\sigma}{4^b} \cdot \eta_b \cdot \exp\left(\frac{\rho^2}{2\sigma^2}\right) \tag{65}$$

which is equivalent to (24). From (61) and (65), we obtain (22) and (25).
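As a worked step, substituting the uniform solution (64) into the active constraint (62) gives the common swing in closed form, using $\sum_{b=0}^{B-1} 4^b = \frac{4^B - 1}{3}$:

$$\frac{4^B - 1}{3}\, Q\left(\frac{\rho}{\sigma}\right) = V \quad \Longrightarrow \quad \rho = \sigma\, Q^{-1}\left(\frac{3V}{4^B - 1}\right).$$

For illustration, with $B = 8$, $\sigma = 1$, and $V = 65.025$ (PSNR = 30dB), the argument is $\frac{3 \cdot 65.025}{65535} \approx 2.98 \times 10^{-3}$, which gives $\rho \approx 2.75$.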

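APPENDIX C
PROOF OF THEOREM 13 AND COROLLARY 14

The KKT conditions of (28) are as follows:

$$\sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) \le V, \quad \nu \ge 0, \tag{66}$$

$$\nu \cdot \left\{ \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) - V \right\} = 0, \tag{67}$$

$$0 \le \Delta_b \le \xi, \quad \lambda_b \ge 0, \quad \eta_b \ge 0, \tag{68}$$

$$\lambda_b \Delta_b = 0, \quad \eta_b (\Delta_b - \xi) = 0 \tag{69}$$

for all $b \in \llbracket 0, B-1 \rrbracket$. From $\frac{\partial L_3}{\partial \Delta_b} = 0$ and $\frac{\partial L_3}{\partial \xi} = 0$, we obtain the following equations:

$$\xi + \eta_b = \lambda_b + \nu \cdot \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right), \tag{70}$$

$$\sum_{b=0}^{B-1} \Delta_b = \sum_{b=0}^{B-1} \eta_b. \tag{71}$$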

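Suppose that $\nu = 0$. Then $\xi + \eta_b = \lambda_b$ for all $b \in \llbracket 0, B-1 \rrbracket$, which implies $(\xi + \eta_b)\, \Delta_b = 0$ because of (69). For $b$ such that $\Delta_b \ne 0$, we observe $\eta_b = 0$ because of $\xi + \eta_b = 0$, $\eta_b \ge 0$, and $\xi \ge 0$. For $b$ such that $\Delta_b = 0$, $\eta_b = 0$ because of (69). Hence, if $\nu = 0$, then $\eta_b = 0$ for all $b \in \llbracket 0, B-1 \rrbracket$, which implies $\Delta_b = 0$ for all $b \in \llbracket 0, B-1 \rrbracket$ due to $\Delta_b \ge 0$ and (71). Thus, we claim that

$$\nu > 0, \quad \sum_{b=0}^{B-1} 4^b Q\left(\frac{\Delta_b}{\sigma}\right) = V \tag{72}$$

which is the same as (62).

By (69) and (70),

$$\lambda_b \Delta_b = \nu \left\{ \frac{\xi + \eta_b}{\nu} - \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right) \right\} \Delta_b = 0 \tag{73}$$

where $\frac{\nu}{\xi + \eta_b} \le \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\Delta_b^2}{2\sigma^2}\right)$ because of $\lambda_b \ge 0$. If $\frac{\nu}{\xi + \eta_b} \le \frac{\sqrt{2\pi}\sigma}{4^b}$, then $\Delta_b = 0$, which implies $\eta_b = 0$ by (69). Hence, we claim that

$$\Delta_b = 0, \quad \eta_b = 0, \quad \text{if } \frac{\nu}{\xi} \le \frac{\sqrt{2\pi}\sigma}{4^b}. \tag{74}$$

If $\frac{\nu}{\xi + \eta_b} > \frac{\sqrt{2\pi}\sigma}{4^b}$, then $\Delta_b > 0$ and $\lambda_b = 0$, i.e.,

$$\frac{\nu}{\xi + \eta_b} = \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\Delta_b^2}{2\sigma^2}\right). \tag{75}$$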

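By (69) and (70),

$$\eta_b (\Delta_b - \xi) = \nu \left\{ \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\Delta_b^2}{2\sigma^2}\right) - \frac{\xi - \lambda_b}{\nu} \right\} (\Delta_b - \xi) = 0 \tag{76}$$

where $\frac{\nu}{\xi - \lambda_b} \ge \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\Delta_b^2}{2\sigma^2}\right)$ because of $\eta_b \ge 0$. If $\frac{\nu}{\xi - \lambda_b} \ge \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\xi^2}{2\sigma^2}\right)$, then $\Delta_b = \xi > 0$, which implies $\lambda_b = 0$ by (69). Hence, we claim that

$$\Delta_b = \xi, \quad \lambda_b = 0, \quad \text{if } \frac{\nu}{\xi} \ge \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\xi^2}{2\sigma^2}\right). \tag{77}$$

If $\frac{\sqrt{2\pi}\sigma}{4^b} \le \frac{\nu}{\xi - \lambda_b} < \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\xi^2}{2\sigma^2}\right)$, then

$$\frac{\nu}{\xi - \lambda_b} = \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\Delta_b^2}{2\sigma^2}\right). \tag{78}$$

By (75) and (78),

$$\frac{\nu}{\xi + \eta_b} = \frac{\nu}{\xi - \lambda_b} = \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\Delta_b^2}{2\sigma^2}\right) \tag{79}$$

for $0 < \Delta_b < \xi$. The equality $\xi + \eta_b = \xi - \lambda_b$ (i.e., $\eta_b = -\lambda_b$) means $\eta_b = \lambda_b = 0$ because of $\eta_b \ge 0$ and $\lambda_b \ge 0$. Hence, we claim that

$$\frac{\nu}{\xi} = \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\Delta_b^2}{2\sigma^2}\right), \quad \eta_b = \lambda_b = 0 \tag{80}$$

for $\frac{\sqrt{2\pi}\sigma}{4^b} < \frac{\nu}{\xi} < \frac{\sqrt{2\pi}\sigma}{4^b} \exp\left(\frac{\xi^2}{2\sigma^2}\right)$.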

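Due to (71), there should exist $\eta_b > 0$ for some $b \in \llbracket 0, B-1 \rrbracket$ to make $\sum_{b=0}^{B-1} \Delta_b > 0$. Hence, there exists $\Delta_b = \xi$ due to (69), which implies $\rho = \max(\boldsymbol{\Delta}) = \xi$. From (74), (77), (80), and $\rho = \xi$, we can obtain the optimal solution $\boldsymbol{\Delta}^*$ of (29).

Note that $s_b > 0$ for $\Delta_b = \rho$ and $\lambda_b = 0$. In this case, (70) can be modified into

$$\rho + \eta_b = \nu \cdot \frac{4^b}{\sqrt{2\pi}\sigma} \exp\left(-\frac{\rho^2}{2\sigma^2}\right). \tag{81}$$

As shown in Fig. 5(a), the sand depth $s_b$ is given by

$$s_b = \log\frac{\nu}{\rho} - \left( \log\frac{\sqrt{2\pi}\sigma}{4^b} + \frac{\rho^2}{2\sigma^2} \right) = \log\frac{\nu}{\rho} - \log\frac{\nu}{\rho + \eta_b} = \log\left(1 + \frac{\eta_b}{\rho}\right) \tag{82}$$

where (82) follows from (81). If $0 \le \Delta_b < \rho$, then $\eta_b = 0$ as shown in (74) and (80). Hence, $s_b = 0$ for $0 \le \Delta_b < \rho$, and (32) in Corollary 14 is proved. Also, (34) in Corollary 14 is derived from (71) and (82).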

ACKNOWLEDGMENT

The authors would like to thank S. K. Gonugondla for his constructive discussions.


REFERENCES

[1] M. Horowitz, “Computing’s energy problem (and what we can do aboutit),” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech.Pap., Feb. 2014, pp. 10–14.

[2] C.-P. Lin, P.-C. Tseng, Y.-T. Chiu, S.-S. Lin, C.-C. Cheng, H.-C. Fang,W.-M. Chao, and L.-G. Chen, “A 5mW MPEG4 SP encoder with 2Dbandwidth-sharing motion estimation for mobile applications,” in Proc.IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Pap., Feb. 2006,pp. 1626–1635.

[3] M. E. Sinangil and A. P. Chandrakasan, “Application-specific SRAMdesign using output prediction to reduce bit-line switching activity andstatistically gated sense amplifiers for up to 1.9× lower energy/access,”IEEE J. Solid-State Circuits, vol. 49, no. 1, pp. 107–117, Jan. 2014.

[4] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, andW. J. Dally, “EIE: Efficient inference engine on compressed deep neuralnetwork,” in Proc. ACM/IEEE 43rd Int. Symp. Comput. Architecture(ISCA), Jun. 2016, pp. 243–254.

[5] Y. H. Chen, T. Krishna, J. S. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural net-works,” IEEE J. Solid-State Circuits, vol. 52, no. 1, pp. 127–138, Jan.2017.

[6] V. Sze, Y.-H. Chen, T.-J. Yang, and J. Emer, “Efficient processing ofdeep neural networks: A tutorial and survey,” Proc. IEEE, vol. 105,no. 12, pp. 2295–2329, Dec. 2017.

[7] M. Kang, M.-S. Keel, N. R. Shanbhag, S. Eilert, and K. Curewitz,“An energy-efficient VLSI architecture for pattern recognition via deepembedding of computation in SRAM,” in Proc. IEEE Int. Conf. Acoust.,Speech, Signal Process. (ICASSP), May 2014, pp. 8326–8330.

[8] M. Kang, S. K. Gonugondla, A. Patil, and N. R. Shanbhag, “A Multi-functional in-memory inference processor using a standard 6T SRAMarray,” IEEE J. Solid-State Circuits, vol. 53, no. 2, pp. 642–655, Feb.2018.

[9] B. Zhai, D. Blaauw, D. Sylvester, and S. Hanson, “A Sub-200mV 6TSRAM in 0.13 µm CMOS,” in Proc. IEEE Int. Solid-State CircuitsConf. (ISSCC) Dig. Tech. Pap., Feb. 2007, pp. 332–606.

[10] M. H. Abu-Rahma, M. Anis, and S. S. Yoon, “Reducing SRAM powerusing fine-grained wordline pulsewidth control,” IEEE Trans. VLSI Syst.,vol. 18, no. 3, pp. 356–364, Mar. 2010.

[11] I. J. Chang, D. Mohapatra, and K. Roy, “A priority-based 6T/8T hybridSRAM architecture for aggressive voltage scaling in video applications,”IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 2, pp. 101–112,Feb. 2011.

[12] J. Kwon, I. J. Chang, I. Lee, H. Park, and J. Park, “Heterogeneous SRAMcell sizing for low-power H.264 applications,” IEEE Trans. Circuits Syst.I, vol. 59, no. 10, pp. 2275–2284, Oct. 2012.

[13] J. George, B. Marr, B. E. S. Akgul, and K. V. Palem, “Probabilisticarithmetic and energy efficient embedded signal processing,” in Proc.Int. Conf. Compilers, Architecture and Synthesis for Embedded Systems(CASES), Oct. 2006, pp. 158–168.

[14] K. Yi, S.-Y. Cheng, F. Kurdahi, and A. Eltawil, “A partial memoryprotection scheme for higher effective yield of embedded memory forvideo data,” in Proc. Asia-Pacific Comput. Syst. Architecture Conf., Aug.2008, pp. 1–6.

[15] M. Cho, J. Schlessman, W. Wolf, and S. Mukhopadhyay, “Reconfig-urable SRAM architecture with spatial voltage scaling for low powermobile multimedia applications,” IEEE Trans. VLSI Syst., vol. 19, no. 1,pp. 161–165, Jan. 2011.

[16] X. Yang and K. Mohanram, “Unequal-error-protection codes in SRAMsfor mobile multimedia applications,” in Proc. IEEE/ACM Int. Conf.Comput.-Aided Design (ICCAD), Nov. 2011, pp. 21–27.

[17] H. Tang and J. Park, “Unequal-error-protection error correction codesfor the embedded memories in digital signal processors,” IEEE Trans.VLSI Syst., vol. 24, no. 6, pp. 2397–2401, Jun. 2016.

[18] I. Lee, J. Kwon, J. Park, and J. Park, “Priority based error correctioncode (ECC) for the embedded SRAM memories in H.264 system,” J.Signal Process. Syst., vol. 73, no. 2, pp. 123–136, Mar. 2013.

[19] H. Kaul, M. Anders, S. Mathew, S. Hsu, A. Agarwal, F. Sheikh,R. Krishnamurthy, and S. Borkar, “A 1.45GHz 52-to-162GFLOPS/Wvariable-precision floating-point fused multiply-add unit with certaintytracking in 32nm CMOS,” in Proc. IEEE Int. Solid-State Circuits Conf.(ISSCC) Dig. Tech. Pap., Feb. 2012, pp. 182–184.

[20] F. Frustaci, M. Khayatzadeh, D. Blaauw, D. Sylvester, and M. Alioto,“SRAM for error-tolerant applications with dynamic energy-qualitymanagement in 28 nm CMOS,” IEEE J. Solid-State Circuits, vol. 50,no. 5, pp. 1310–1323, May 2015.

[21] F. Frustaci, D. Blaauw, D. Sylvester, and M. Alioto, “ApproximateSRAMs with dynamic energy-quality management,” IEEE Trans. VLSISyst., vol. 24, no. 6, pp. 2128–2141, Jun. 2016.

[22] B. Masnick and J. Wolf, “On linear unequal error protection codes,”IEEE Trans. Inf. Theory, vol. 13, no. 4, pp. 600–607, Oct. 1967.

[23] S. Borade, B. Nakiboglu, and L. Zheng, “Unequal error protection: An information-theoretic perspective,” IEEE Trans. Inf. Theory, vol. 55, no. 12, pp. 5511–5539, Dec. 2009.

[24] C. H. Huang, Y. Li, and L. Dolecek, “ACOCO: Adaptive coding for approximate computing on faulty memories,” IEEE Trans. Commun., vol. 63, no. 12, pp. 4615–4628, Dec. 2015.

[25] C. E. Shannon, “Communication in the presence of noise,” Proc. IRE, vol. 37, no. 1, pp. 10–21, Jan. 1949.

[26] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006.

[27] P. S. Chow, “Bandwidth optimized digital transmission techniques for spectrally shaped channels with impulse noise,” Ph.D. dissertation, Stanford University, 1993.

[28] W. Yu and J. M. Cioffi, “On constant power water-filling,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2001, pp. 1665–1669.

[29] A. Lozano, A. M. Tulino, and S. Verdu, “Optimum power allocation for parallel Gaussian channels with arbitrary input distributions,” IEEE Trans. Inf. Theory, vol. 52, no. 7, pp. 3033–3051, Jul. 2006.

[30] A. Macii, L. Benini, and M. Poncino, Memory Design Techniques for Low Energy Embedded Systems. Kluwer Academic Publishers, 2002.

[31] M. H. Abu-Rahma, Y. Chen, W. Sy, W. L. Ong, L. Y. Ting, S. S. Yoon, M. Han, and E. Terzioglu, “Characterization of SRAM sense amplifier input offset for yield prediction in 28nm CMOS,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Sep. 2011, pp. 1–4.

[32] M. Horowitz, T. Indermaur, and R. Gonzalez, “Low-power digital design,” in Proc. IEEE Symp. Low Power Electr., Oct. 1994, pp. 8–11.

[33] R. Gonzalez and M. Horowitz, “Energy dissipation in general purpose microprocessors,” IEEE J. Solid-State Circuits, vol. 31, no. 9, pp. 1277–1284, Sep. 1996.

[34] D. M. Brooks, P. Bose, S. E. Schuster, H. Jacobson, P. N. Kudva, A. Buyuktosunoglu, J. Wellman, V. Zyuban, M. Gupta, and P. W. Cook, “Power-aware microarchitecture: Design and modeling challenges for next-generation microprocessors,” IEEE Micro, vol. 20, no. 6, pp. 26–44, Nov. 2000.

[35] J. H. Laros III, K. Pedretti, S. M. Kelly, W. Shu, K. Ferreira, J. Vandyke, and C. Vaughan, Energy Delay Product. London: Springer London, 2013, pp. 51–55.

[36] R. T. Marler and J. S. Arora, “Survey of multi-objective optimization methods for engineering,” Struct. Multidisc. Optim., vol. 26, no. 6, pp. 369–395, Apr. 2004.

[37] K. Osada, J.-U. Shin, M. Khan, Y.-D. Liou, K. Wang, K. Shoji, K. Kuroda, S. Ikeda, and K. Ishibashi, “Universal-Vdd 0.65-2.0V 32 kB cache using voltage-adapted timing-generation scheme and a lithographical-symmetric cell,” in Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Pap., Feb. 2001, pp. 168–169.

[38] F. Arnaud et al., “A functional 0.69 µm² embedded 6T-SRAM bit cell for 65nm CMOS platform,” in Proc. Symp. VLSI Tech. (VLSIT), Jun. 2003, pp. 65–66.

[39] M. Abu-Rahma and M. Anis, Nanometer Variation-Tolerant SRAM: Circuits and Statistical Design for Yield. Springer Publishing Company, 2012.

[40] K. Agarwal, F. Liu, C. McDowell, S. Nassif, K. Nowka, M. Palmer, D. Acharyya, and J. Plusquellic, “A test structure for characterizing local device mismatches,” in Proc. Symp. VLSI Circuits (VLSIC), Jun. 2006, pp. 67–68.

[41] K. J. Kuhn, “Reducing variation in advanced logic technologies: Approaches to process and design for manufacturability of nanoscale CMOS,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2007, pp. 471–474.

[42] T. Mizuno, J. Okamura, and A. Toriumi, “Experimental study of threshold voltage fluctuation due to statistical variation of channel dopant number in MOSFET’s,” IEEE Trans. Electron Devices, vol. 41, no. 11, pp. 2216–2221, Nov. 1994.

[43] S. Mukhopadhyay, H. Mahmoodi, and K. Roy, “Modeling of failure probability and statistical design of SRAM array for yield enhancement in nanoscaled CMOS,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 12, pp. 1859–1880, Dec. 2005.

[44] B. S. Leibowitz, J. Kim, J. Ren, and C. J. Madden, “Characterization of random decision errors in clocked comparators,” in Proc. IEEE Custom Integrated Circuits Conf. (CICC), Sep. 2008, pp. 691–694.

[45] J. Maiz, S. Hareland, K. Zhang, and P. Armstrong, “Characterization of multi-bit soft error events in advanced SRAMs,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), Dec. 2003, pp. 21.4.1–21.4.4.

[46] K. Osada, Y. Saitoh, E. Ibe, and K. Ishibashi, “16.7-fA/cell tunnel-leakage-suppressed 16-Mb SRAM for handling cosmic-ray-induced multierrors,” IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1952–1957, Nov. 2003.

[47] B. Fox, “Discrete optimization via marginal analysis,” Manag. Sci., vol. 13, no. 3, pp. 210–216, 1966.

[48] J. Campello, “Optimal discrete bit loading for multicarrier modulation systems,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Aug. 1998, p. 193.

[49] ——, “Practical bit loading for DMT,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 1999, pp. 801–805.

Yongjune Kim received the B.S. and M.S. degrees in Electrical and Computer Engineering from Seoul National University, Seoul, South Korea, in 2002 and 2004, respectively, and the Ph.D. degree in Electrical and Computer Engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 2016. From 2007 to 2011, he was with Samsung Electronics and Samsung Advanced Institute of Technology, South Korea. He is currently a Post-Doctoral Scholar with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL, USA. His research interests include coding for nanoscale devices, in-memory computing, and machine learning. He was a recipient of the Best Paper Award at the IEEE International Conference on Communications and the Best Paper Award at the Samsung Semiconductor Technology Symposium.

Mingu Kang (M’13) received the B.S. and M.S. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, South Korea, in 2007 and 2009, respectively, and the Ph.D. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign, Urbana, IL, USA, in 2017. From 2009 to 2012, he was with the Memory Division, Samsung Electronics, Hwaseong, South Korea, where he was involved in the circuit and architecture design of phase change memory. Since 2017, he has been with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA, where he designs machine learning accelerator architecture. His current research interests include low-power integrated circuits, architecture, and system for machine learning, signal processing, and neuromorphic computing.

Lav R. Varshney (S’00–M’10–SM’15) received the B.S. degree (magna cum laude) in electrical and computer engineering with honors from Cornell University, Ithaca, New York, in 2004. He received the S.M., E.E., and Ph.D. degrees, all in electrical engineering and computer science, from the Massachusetts Institute of Technology, Cambridge, in 2006, 2008, and 2010, where his theses received the E. A. Guillemin Thesis Award and the J.-A. Kong Award Honorable Mention. He is an assistant professor in the Department of Electrical and Computer Engineering, the Department of Computer Science (by courtesy), the Coordinated Science Laboratory, the Beckman Institute, and the Neuroscience Program at the University of Illinois at Urbana-Champaign. He is also leading curriculum initiatives for the new B.S. degree in Innovation, Leadership, and Engineering Entrepreneurship in the College of Engineering. During 2010–2013, he was a research staff member at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York. His research interests include information and coding theory; limits of nanoscale, human, and neural computing; human decision making and collective intelligence; and creativity. Dr. Varshney is a member of Eta Kappa Nu, Tau Beta Pi, and Sigma Xi. He received the IBM Faculty Award in 2014 and was a Finalist for the Bell Labs Prize in 2014 and 2016. He and his students have won several best paper awards. His work appears in the anthology, The Best Writing on Mathematics 2014 (Princeton University Press). He currently serves on the advisory board of the AI XPRIZE.

Naresh R. Shanbhag (F’06) received the Ph.D. degree in Electrical Engineering from the University of Minnesota, Minneapolis, MN, USA, in 1993. From 1993 to 1995, he was with the AT&T Bell Laboratories, Murray Hill, NJ, USA, where he led the design of high-speed transceiver chip-sets for very high-speed digital subscriber line. In 1995, he joined the University of Illinois at Urbana-Champaign, Urbana, IL, USA. He has held visiting faculty appointments at the National Taiwan University, Taipei, Taiwan, in 2007, and at Stanford University, Stanford, CA, USA, in 2014. He is currently the Jack Kilby Professor of Electrical and Computer Engineering with the University of Illinois at Urbana-Champaign. His current research interests include the design of energy-efficient integrated circuits and systems for communications, signal processing, and machine learning. He has authored or co-authored more than 200 publications in this area and holds 13 U.S. patents. Dr. Shanbhag was a recipient of the National Science Foundation CAREER Award in 1996, the IEEE Circuits and Systems Society Distinguished Lecturership in 1997, the 2010 Richard Newton GSRC Industrial Impact Award, and multiple Best Paper Awards. In 2000, he co-founded and served as the Chief Technology Officer of Intersymbol Communications, Inc. (acquired in 2007 by Finisar Corporation), a semiconductor startup that provided DSP-enhanced mixed-signal ICs for electronic dispersion compensation of OC-192 optical links. From 2013 to 2017, he was the founding Director of the Systems On Nanoscale Information fabriCs Center (SONIC), a 5-year multi-university center funded by DARPA and SRC under the STARnet program.

