Combating Hot Carrier Effects via Bit-level Transition Balancing

Yazdan Aghaghiri, Massoud Pedram

Abstract - In this paper, we address the issue of hot-carriers and their impact on reducing the reliability of a VLSI

circuit by accelerating the aging process of transistors. We tackle this phenomenon by formulating and solving an

encoding problem, which we refer to as the Bit-level Transition Balancing (BTB). The BTB problem is to find

encoding techniques that minimize the maximum activity over a group of lines, which we refer to as a bus, thus

making the whole bus less vulnerable to hot-carrier degradation. We approach this problem systematically by first

answering the question of how much information about the characteristics of the data that appears on the bus is

needed to find a combinational and/or sequential encoding function that optimally solves the BTB problem. Next, we

propose a number of different encoding techniques that efficiently solve the BTB problem. Experimental results

demonstrate the effectiveness of such techniques.

1. Introduction

With the current trend of shrinking minimum feature sizes and rising clock frequencies in VLSI circuits, reliability

has become a major design issue. In spite of one-time benefits of reducing the supply voltage level, the substrate

temperature and power density are rapidly increasing. At the same time, a decrease in critical device dimensions to sub-micron ranges results in more intense horizontal and vertical electric fields in the channel region. Under the

gate of a transistor, these enormous fields give rise to electrons and holes with kinetic energies significantly higher

than the silicon band gap (1.1 eV). These electrons and holes may be injected into the gate oxide and can cause permanent changes in the oxide interface charge distribution. This phenomenon is called the hot carrier effect, or sometimes the hot electron effect, because this injection happens more often for electrons owing to the smaller barrier height for electrons as compared to holes (i.e., 3.1 eV for electrons compared to 4.8 eV for holes [2]). A sizeable increase in the

threshold voltage of the affected transistors and a corresponding decrease in their drain current driving capability are

undesirable results of such hot carrier injections in the gate oxide. The hot carrier effect is exacerbated as the

technology moves toward smaller device dimensions and higher clock frequencies [2]. Another phenomenon caused

by carriers having energies higher than 1.1 eV is the creation of electron-hole pairs through impact ionization. In an

NMOS device, generated electrons are collected by the drain whereas generated holes drift in the substrate toward

the ground terminals, and thereby, contribute to a substrate current. Carrier injection and impact ionization become

most severe when the device is in the saturation region because in this case the intensity of electrical fields in the

channel is at maximum. Therefore, device degradation strongly depends on the duration of time that the transistor

stays in the saturation region and substrate leakage current is a good indicator of the degree of device degradation

[4]. Every time the output of a gate makes a transition, some transistors must pass through the saturation region either to turn on or to turn off. This means that the amount of time spent in saturation depends directly on the output activity of the device. It also depends on the output load capacitance and the slew rate of the input signal, since these two parameters directly determine the output slew rate for a given gate. Now, let Dfresh and Daged denote the fresh and stressed (aged) propagation delays of a CMOS inverter. We can write:

$$D_{aged}(t) = \psi(T_{slew}, C_{load}, N_{sw} \cdot t) \cdot D_{fresh}$$

where Tslew denotes the input slew rate, Cload is the output capacitance, Nsw represents the output switching activity,

and t the time parameter (total "on-time" of the circuit in seconds). Function ψ is a non-linear function which is

determined from transistor level simulations and it is usually represented as a three-dimensional table [5]. For gates

other than the simple inverters, the ratio-based degradation model proposed in [3] is applied. In this model we

have $D_{aged} = \alpha D_{fresh}$, where α≥1 is defined as the overall degradation of all transistors in the gate and is calculated

as:

$$\alpha = \sum_{i=1 \ldots n} \alpha_i - n + 1$$

In the above equation n is the number of transistors in series and αi ≥1 is the aged to fresh delay ratio when only

input pin i is under stress and is defined by $\psi(T_{slew}, C_{load}, N_{sw})$.
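As a small illustration of the ratio-based model, the sketch below combines per-pin ratios into the overall gate ratio using α = Σαi − n + 1. The function name, the three pin ratios and the 100 ps fresh delay are hypothetical values chosen for illustration; in practice each αi would be read from the ψ table.

```python
def gate_degradation(pin_ratios):
    """Overall aged-to-fresh delay ratio: alpha = sum(alpha_i) - n + 1."""
    if any(a < 1.0 for a in pin_ratios):
        raise ValueError("each alpha_i must be >= 1")
    n = len(pin_ratios)
    return sum(pin_ratios) - n + 1


# Hypothetical per-pin ratios for a gate with three series transistors.
alpha = gate_degradation([1.10, 1.05, 1.02])
fresh_delay_ps = 100.0  # hypothetical fresh delay
aged_delay_ps = alpha * fresh_delay_ps
print(alpha, aged_delay_ps)
```

Note that a gate whose pins are all unstressed (every αi equal to 1) keeps α = 1, as the formula requires.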

Hot carrier degradation can cause digital systems to fail. A line driver may fail due to an increase in its threshold

voltage, which slows down the driver to the point that it violates the bus timing constraints, i.e., a gate delay fault is created as a result of hot carrier degradation. It is also possible to encounter a case whereby the increased

delay of consecutive gates along a combinational path in the circuit creates a setup time violation for the flip-flop at

the end of the path, which is known as a path delay fault. In this work, we do not consider faults caused by

accumulated path delay. Our focus is on on-chip busses with a single driver. We define the lifetime of a wire or line

as the Mean Time To Failure (MTTF) of its driving gate. This wire might be a segment of a bus between the driver

and a repeater or a segment between two repeaters. We also assume that the whole system fails as soon as a single

wire fails.

There are different approaches for modeling the lifetime of a wire. For example, we can say that a line fails when

$\Delta D = D_{aged} - D_{fresh} > \beta D_{fresh}$, where β>0 is a user-defined parameter (e.g., 0.25). However, the

lifetime of a wire is actually a random variable with a certain probability distribution function. Therefore, the above

deterministic definition is not suitable. Unfortunately it is extremely difficult to derive the lifetime distribution from

the physics of the hot-carrier effect [2]. Complete statistical characterization of a transistor's lifetime highly depends

on technology and various physical phenomena. Instead what one typically does is to empirically determine the

distribution that best fits the lifetime of a wire in different designs and different chip realizations of the same design.

Both lognormal and Weibull probability distribution functions have been used for characterizing hot-carrier lifetime

[20]. These two distribution functions are however similar near their mean values. In this paper, we choose to model

the lifetime of an interconnect driver as a lognormal random variable. In addition, based on the previous discussion,

we make the following assumption: "the mean value of the driver lifetime is inversely proportional to its output switching activity."

Hot carrier effect has been extensively studied in the past few years. Different design techniques such as transistor

resizing and reordering [8,2], logic factorization [6], technology mapping [9], logic restructuring [5] and binding and

scheduling [7] for minimizing hot-carrier degradation have been introduced. In this paper, we propose a completely

new approach to minimizing the hot carrier induced failures of on-chip busses in VLSI circuits. More precisely, by

increasing lifetime of the gates that are subject to most severe hot carrier degradation (e.g., bus drivers), we increase

the lifetime of the whole circuit. Buses tend to have high capacitance compared to other wires in the system. In

addition they usually have a high activity rate compared to other nets in the circuit. This makes the bus drivers some

of the most hot-electron degradation-prone components.

We assume that we are given a set of interconnect lines in a VLSI circuit, which are flagged as hot-carrier sensitive

(HC-sensitive.) Furthermore, we assume that if one of these lines fails, then the whole circuit will fail. Our approach

is to add some encoding/decoding hardware for protecting the circuit against hot carrier failure. We want the

encoding and decoding functions to be as lightweight as possible so as to minimize their impact on overall area,

delay and power consumption of the circuit. Getting back to the bus problem, the fact is that, even with equal sized

drivers and identical loads for each and every bus line, different lines of the same bus may age at different rates due

to their different bit-level activities over time. For example, the LSB of a bus carrying small positive numbers over

the circuit lifetime is expected to have a much higher activity than the MSB of the same bus. The bus as a whole, however, fails as soon as any of its lines fails. The encoding solutions we propose increase the lifetime of a bus by

balancing switching activity among all bus lines, i.e., by minimizing the maximum line activity of the bus. If we do

not balance the transition counts over different lines of the bus, then the expected lifetime of the bus will be

dominated by the most active line of the bus, and as a result, it will be much shorter.

The remainder of this paper is organized as follows. In the next section, we set up the problem precisely, and in Section 3 we explain the type of solution that we have adopted and give some definitions. In Section 4 different combinational functions are examined, whereas in Section 5 sequential solutions are studied.

Results are presented in Section 6, whereas conclusions are provided in Section 7.

2. Problem Setup

The goal of this work is to apply encoding functions that reduce the maximum bit-level activity over a set of bus line

drivers. The bus encoding and decoding functions are applied in such a way that the logic functionality of the

circuit is not impacted. If we consider a single-bit bus, there will obviously exist no single-bit encoding and

decoding functions that can reduce the bit-level activity without modifying the surrounding logic. Instead consider a

multi-bit bus where the bus lines are bundled together, implying that the set of drivers are placed in close proximity

of each other as are the set of bus receivers. We assume that the bus drivers have nearly identical electrical

characteristics (i.e., in terms of their size and drive strength). Our goal is to maximize the bus lifetime, which is set

by the minimum lifetime among all its bit lines, through bus encoding. The lifetime of a bus line is mainly

determined from the output switching activity of its driver. This is because the drivers for all bus lines are the same,

so the only factor impacting the line lifetime is its activity level, which in turn determines the extent of the hot-

carrier effect on the line driver.

Figure 1 shows an example of a bus comprising three global lines. One of these lines has much higher activity

than the other two, and is thus flagged as an HC-sensitive line. The two other lines can be used to reduce the hot-

carrier vulnerability of this sensitive line by reducing the output switching activity of its driver. This can likely be

achieved by increasing the activity of the non-sensitive lines. Note however that the overall lifetime of the bus will

improve because the maximum activity of the bus lines will have been reduced. Adding the decoder and encoder

logic can be done with a minimal impact on the chip routing because of the proximity of the bus drivers and

receivers.

Figure 1 A set of 3 global lines that are good candidates for encoding and decoding. (Diagram labels: Encoder, Decoder, Sensitive Line, Three Global Lines.)

For a given bus, we denote the lifetime of the driver of line i by a random variable (r.v.) $LT_i$. We assume that each $LT_i$ is a r.v. with a lognormal probability distribution and that these r.v.'s are independent of each other. We further assume that the expectation value of $LT_i$ is proportional to the inverse of the switching activity of line i. Therefore, the Mean Time To Failure (i.e., the lifetime) of a bus line is inversely proportional to the switching activity of the line driver. When a single bit line driver fails, the whole bus fails, and with it the whole circuit. If we denote the lifetime of the whole bus by $LT_{bus}$, then we have $LT_{bus} = Min_i\{LT_i\}$. Define $G_{LT_i}(T) \equiv PROB(LT_i \le T)$ as the cumulative distribution function (cdf) of $LT_i$. Similarly, the cdf of $LT_{bus}$ can be defined as:

$$G_{LT_{bus}}(T) \equiv PROB(LT_{bus} \le T) = 1 - \prod_i (1 - G_{LT_i}(T)).$$

Let $P_X(T)$ denote the (marginal) probability density function (pdf) of r.v. X. The pdf of $LT_{bus}$ is obtained by taking the derivative of its cumulative distribution function, yielding:

$$P_{LT_{bus}}(T) = \sum_i \Big( P_{LT_i}(T) \prod_{j \ne i} (1 - G_{LT_j}(T)) \Big)$$

Obviously, $E(LT_{bus}) = E(Min_i\{LT_i\})$, where E denotes the expectation value operator. It is, however, difficult to calculate this expectation value. Instead, we approximate it with $Min_i\{E(LT_i)\}$, which is accurate if the variances of all $LT_i$ lognormal variables are in the same range. This is because if any of the $LT_i$ variables has an expectation value that is much larger than $Min_i\{E(LT_i)\}$, then that variable will have little impact in determining $E(LT_{bus})$.

To illustrate this point, let L and S denote two lognormal r.v.'s such that the mean of variable L is greater than the mean of variable S. Table 1 reports a comparison of Min(E(L), E(S)) = E(S) with E(Min(L,S)) for different combinations of E(L)/E(S) and Var(L)/Var(S), assuming E(S)=1 and Var(S)=0.25. This table shows three facts:

1) For a fixed ratio of E(L)/E(S), the error between E(Min(L,S)) and E(S) grows as the ratio of Var(L)/Var(S) increases.

2) For a fixed ratio of Var(L)/Var(S), the error between E(Min(L,S)) and E(S) diminishes as the ratio of E(L)/E(S) increases.

3) For E(L)/E(S) > 2, the error between E(Min(L,S)) and E(S) vanishes and remains so independent of Var(L)/Var(S), as if variable L has no impact in determining E(Min(L,S)).

Table 1- E(Min(L,S)) / E(S)

                           E(L)/E(S)
Var(L)/Var(S)     1      2      3      5      7      10
      1          0.73   0.96   0.99   0.99   0.99   0.99
      4          0.63   0.92   0.99   0.99   0.99   0.99
      9          0.55   0.87   0.97   0.99   0.99   0.99
     16          0.50   0.82   0.95   0.99   0.99   0.99
     25          0.45   0.77   0.92   0.99   0.99   0.99
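The entries of Table 1 can be cross-checked numerically. The sketch below is not taken from the paper: it assumes the standard closed form for E(L; L < S) of two independent lognormal variables (obtained by exponential tilting of the underlying normal), and all function names are illustrative. It prints one row per Var(L)/Var(S) value for E(S)=1, Var(S)=0.25, to be compared against the table.

```python
import math


def lognormal_params(mean, var):
    """Return (mu, sigma) of the underlying normal for a given mean and variance."""
    sigma2 = math.log(1.0 + var / mean**2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def std_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def expected_min_two(mean_l, var_l, mean_s, var_s):
    """E(Min(L, S)) for two independent lognormal variables L and S."""
    mu_l, sig_l = lognormal_params(mean_l, var_l)
    mu_s, sig_s = lognormal_params(mean_s, var_s)
    spread = math.hypot(sig_l, sig_s)
    # E(L; L < S) + E(S; S < L), each evaluated in closed form.
    part_l = mean_l * std_normal_cdf((mu_s - mu_l - sig_l**2) / spread)
    part_s = mean_s * std_normal_cdf((mu_l - mu_s - sig_s**2) / spread)
    return part_l + part_s


for var_ratio in (1, 4, 9, 16, 25):
    row = [expected_min_two(r, 0.25 * var_ratio, 1.0, 0.25) for r in (1, 2, 3, 5, 7, 10)]
    print(var_ratio, " ".join(f"{v:.2f}" for v in row))
```

Small differences in the last digit with respect to the table are to be expected, since the rounding convention used for the published values is not stated.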

Speaking more generally, we know:

$$E(Min(L,S)) = E(L) + E(S) - \int_{\tau=0}^{\infty} Q_L(\tau) P_S(\tau)\, d\tau - \int_{\tau=0}^{\infty} Q_S(\tau) P_L(\tau)\, d\tau$$

where $Q_X(\tau) = \int_{t=\tau}^{\infty} t\, P_X(t)\, dt$. For two lognormal random variables L and S where $E(L) \ge E(S)$, if the overlap between the pdfs of these two random variables is small, then $Q_L(\tau) P_S(\tau)$ may be approximated with $E(L) P_S(\tau)$ whereas $Q_S(\tau) P_L(\tau)$ is nearly zero. Therefore, in this case,

$$E(Min(L,S)) \approx E(S) = Min(E(L), E(S)).$$

Although the above conclusions are made for the case of having only two r.v.'s, our experiments show that they will still hold as long as the number of r.v.'s remains small, i.e., ≤ 8. For example, suppose that instead of two r.v.'s, S and L, we have five r.v.'s S, L1 … L4 such that E(Li)=3E(S) and Var(Li)/Var(S) < 9 for i=1…4. In this case, using our approximation will only result in an additional 4% error on top of the results of Table 1 (the total error will be 7%, which is still rather small).

The hot-carrier-aware bus encoding problem is different from the low-power bus encoding problem [12-16]. This is because the objective of the former is to minimize the maximum bit-level activity (a minmax cost function), whereas the latter attempts to minimize the total bit-level activity (a minsum cost function). Note that HC-aware bus encoding can be applied to a bus that has already been optimized for low power dissipation in order to balance the bit-level transition counts of its lines. This may result in some increase in the total power dissipation, but it increases the bus reliability over time.

The abovementioned problem setup is based on a number of approximations. It is possible to come up with rare examples where reducing the maximum bit-level activity over the bus lines actually shortens the lifetime of the bus. To be more precise, consider a four-bit bus $(S, L_1, L_2, L_3)$, where the expected values of the lognormally distributed lifetime variables, $LT_i$, of the bus lines are $E(S) = 1$, $E(L_1) = E(L_2) = E(L_3) = 5$. All variances are equal to 0.25. The exact lifetime of this bus (assuming lognormal distribution) is written as:

$$LT_{bus}^{init,exact} = E(Min_k\{LT_k\}) = E(Min_k\{lognorm(E(LT_k), Var(LT_k))\})$$

With our proposed approximation, we have:

$$LT_{bus}^{init,approx} = Min_k\{E(LT_k)\} = 1.$$

Suppose that after HC-aware bus encoding, the activities change such that the new lifetime expectation values are $E(S) = E(L_i) = 1.1$ for all i. Let's assume that the variances remain the same. From our approximation, the final bus lifetime is $LT_{bus}^{final,approx} = Min_k\{E(LT_k)\} = 1.1$, which is 10% longer than before, that is, $LT_{bus}^{final,approx} > LT_{bus}^{init,approx}$, and thus, the HC-aware encoding appears to have been effective. However, for this case, $LT_{bus}^{final,exact} = 0.67$, which shows that the encoding has been unsuccessful. In general, when such a case happens and the activities of a number of non-critical lines increase significantly, the actual lifetime of the bus, $LT_{bus}^{final,exact}$, may become smaller than $LT_{bus}^{init,exact}$. So the question is under what conditions we can trust our approximations and actually expect lifetime improvement after HC-aware encoding of a bus.
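The exact lifetimes in this example can be estimated numerically. The sketch below is an illustration rather than the authors' procedure: it assumes independent lognormal lifetimes and uses the identity E(Min_k LT_k) = ∫ Π_k (1 − G_k(t)) dt, evaluated with a simple midpoint rule. The function names, the integration limit and the step count are arbitrary choices.

```python
import math


def lognormal_params(mean, var):
    """Return (mu, sigma) of the underlying normal for a given mean and variance."""
    sigma2 = math.log(1.0 + var / mean**2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def survival(t, mu, sigma):
    """PROB(LT > t) for a lognormal lifetime."""
    return 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2.0)))


def expected_bus_lifetime(means, variances, steps=20000):
    """E(Min_k LT_k) as the integral of the product of survival functions."""
    params = [lognormal_params(m, v) for m, v in zip(means, variances)]
    upper = max(math.exp(mu + 8.0 * sigma) for mu, sigma in params)
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h  # midpoint of the k-th interval, never zero
        total += math.prod(survival(t, mu, sigma) for mu, sigma in params)
    return total * h


initial = expected_bus_lifetime([1.0, 5.0, 5.0, 5.0], [0.25] * 4)
final = expected_bus_lifetime([1.1] * 4, [0.25] * 4)
print(f"initial exact = {initial:.2f}, final exact = {final:.2f}")
```

With the example's parameters, the final value should come out close to the 0.67 quoted above and below the initial value, even though the approximated lifetime rose from 1 to 1.1.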

In the above example, assume that the activity of no line becomes larger than 50% of its initial activity. Making that

assumption, the lifetime will be extended by at least 4% in the worst case. This means that although the absolute accuracy of our approximation may not be good, its relative accuracy (i.e., fidelity of the approximation) is high (see footnote 1). As a general rule, given a set of lines, as long as the maximum line activity is reduced by a high percentage (e.g., 50% or more with respect to the initial maximum) or the activities of non-sensitive lines are not close to the reduced maximum

activity after encoding (e.g., remain less than 50% of the new maximum), then we will have a lifetime improvement.

This is in fact the scenario that we encounter in practice, that is, in the methods that we will present, we almost

always achieve around 20-50% reduction in the maximum activity (see Results section) and we never see a large

increase in switching activities of non-critical lines. This means that although the actual activity values after

encoding might affect the accuracy of our approximation, there will be a high fidelity between the exact lifetime and

approximated one and lifetime will be extended almost as much as we expect it to.

Finally we recognize that in the past RT level design techniques for distributing the bus activity have been proposed

[7]. These methods attempt to bind and schedule the data transfers of a control and data flow graph representing the

application so as to evenly distribute the switching activities. In our proposed approach, encoding is performed after

completing the logical design with some level of information from placement and routing. This is the stage in the flow when the arrival time and slew of logic signals and the capacitance of interconnect lines can be well estimated. These physical attributes of the design are needed to first identify degradation-prone drivers for application of the encoding function.

Footnote 1: Let f be an approximation of function g. We say f has a high fidelity with respect to g if f(x) > f(y) implies that g(x) > g(y).

3. A Methodology for Reducing the Maximum Bus Activity

Based on what we said in the previous section, we will next examine encoding techniques that can reduce the

maximum transition count of individual lines of a bus.

First we give some definitions and describe the notation that will be used throughout this paper.

Definition. A trace is a collection of binary numbers or symbols that consecutively appear on a bus and is represented by T=<V1, V2, …, VLE>. Each Vi is an M-bit binary number, where M is the width of the bus and LE is the length of the trace. The total number of transitions on line i of a trace is denoted by TRi and is called the bit transition of line i. This bit transition can be normalized to the number of clock cycles to give the activity of line i. The maximum of these bit transitions for a trace is denoted by M(T), i.e., M(T)=max_i{TRi}.
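The definitions of TRi and M(T) translate directly into code. The following minimal sketch assumes that symbols are stored as Python integers and that line 0 is the least significant bit; both conventions, the function names and the sample trace are illustrative.

```python
def bit_transitions(trace, width):
    """TR_i for each line i: the number of transitions on line i over the trace."""
    counts = [0] * width
    for prev, cur in zip(trace, trace[1:]):
        changed = prev ^ cur
        for line in range(width):
            counts[line] += (changed >> line) & 1
    return counts


def max_bit_transition(trace, width):
    """M(T) = max_i {TR_i}."""
    return max(bit_transitions(trace, width))


# Hypothetical 3-bit trace of small counting values: the LSB toggles most often.
trace = [0b000, 0b001, 0b010, 0b011, 0b100, 0b101]
print(bit_transitions(trace, 3), max_bit_transition(trace, 3))
```

Dividing each TRi by LE − 1 would give the per-cycle activity of line i.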

Definition. Each Vi in a trace T may be treated as an M-bit binary number. Let Ni denote the number of symbols that are equal to i, 0 <= i < 2^M, i.e., $N_i = \sum_{1 \le j \le LE} \delta(V_j - i)$, where δ is the Kronecker delta function.

Definition. We denote a function mapping on trace T as F(T), where F: V ∈ T → F(V) ∈ F(T). The function is reversible when $F^{-1}(F(T)) = T$.

Definition. For a trace T, we define inter-trace B as the trace obtained by performing Exclusive-OR (XOR) of

consecutive symbols in trace T, i.e., B=<U1, U2, …, ULE-1> where Ui=Vi⊕Vi+1 for i ranging from 1 to LE-1. We call

each symbol Ui an inter-symbol. Furthermore, we use the notation B=X(T) (X stands for XOR) to show that B is the

inter-trace of T. Clearly, the number of transitions that occur as the value on the bus changes from Vi to Vi+1 is determined by the number of ones in the inter-symbol Ui. For instance, U2=00011 implies that there will be transitions on the first and second lines moving from V2 to V3 on a 5-bit bus.
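The inter-trace is a one-line computation. In the sketch below the three 5-bit symbols are hypothetical and were chosen so that U2 equals the value 00011 used in the text; the function name is illustrative.

```python
def inter_trace(trace):
    """B = X(T): XOR of consecutive symbols, U_i = V_i xor V_(i+1)."""
    return [a ^ b for a, b in zip(trace, trace[1:])]


# Hypothetical 5-bit trace T = <V1, V2, V3>.
trace = [0b10100, 0b00110, 0b00101]
inter = inter_trace(trace)
# The number of ones in each inter-symbol is the number of lines that toggle.
print([format(u, "05b") for u in inter], [bin(u).count("1") for u in inter])
```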

We are interested in finding a function that reduces the maximum bit transition or M(T) for a trace T. Different

traces emerge on a certain bus in different epochs. These traces may be similar to each other in terms of a number of

different characteristics. Of course, we would like to design an encoder and decoder that work well for all these

traces. Therefore, we will look at our problem based on the following methodology: Given a certain amount of

information about a trace, we will investigate whether this information would be sufficient to design the appropriate

encoding/decoding functions for the purpose of reducing the maximum bit transition count. We will next devise the kind of logic functions, combinational or sequential, that are capable of accomplishing this goal.

We ought to characterize and classify traces based on their characteristics. Indeed, existence and construction of a

one-to-one (reversible) function F that minimizes M(T) is strongly dependent on the set of common characteristics

that are extracted from these traces.

Definition. An equivalence class of traces CL(c) is the set of traces T that are characterized based on some

characteristic of interest, c. In other words, they cannot be distinguished from each other as far as characteristic c is

concerned.

Example classes are CL(N0=100), i.e., traces with exactly a hundred symbols equal to zero; CL(M=1, LE=200, N0=N1), i.e., one-bit traces of 200 symbols (bits) with an equal number of ones and zeros; CL(TRi=Ki for every i), i.e., all traces that have Ki transitions on line i, where {Ki} is given; CL(a 2^M-state lag-one Markov source generator); etc. Notice that different classes may have common members.

Definition. Bit-level Transition Balancing Problem: Given a class of traces CL(c), the bit-level transition

balancing (BTB) problem refers to the problem of finding a function ψ that

1. is reversible.

2. For every member T of CL(c), M(ψ (T)) < M(T).

Lemma 1. No universal function, combinational or sequential, can always reduce M(T) for all traces.

This is an intuitive, yet important, result. Notice that existence of such a function would be in conflict with

information theory principles, because repeated application of that function would eliminate all transitions in a trace

without loss of information. This lemma is remarkable in the sense that it prompts us to characterize the trace first

and then find solutions based on that characteristic. There is no magical logic that can always reduce maximum

activity. We must have a certain level of knowledge about the bus in order to come up with a function that always

works.

We investigate combinational and sequential decoders and encoders separately. Combinational functions, if they succeed in balancing the transitions, would be much better choices, as they need neither a clock nor additional flip-flops. For sequential functions, we recognize two different categories. The first one is a special category of

sequential circuits that we call inter-sequential functions and the second one makes use of general sequential

functions. Inter-sequential functions, as we will see, are combinational functions that are applied to the inter-trace B instead of

trace T and are usually much simpler than general finite state machines. We first look at combinational logic in

detail.

4. Coding with Combinational Functions

In this section, we investigate when combinational functions are capable of reducing M(T). A combinational

function F: {0,1}^M → {0,1}^M can be used for the encoding task only if it is reversible. The total number of such functions is (2^M)!, corresponding to the permutations of all symbols in an M-bit space. However, since transitions on lines are of primary significance, these functions can be partitioned into NP-equivalence sets (NP stands for Negation and Permutation). Two functions are NP-equivalent exactly if they can be transformed to one another by inverting one or more output columns and/or swapping some columns. Therefore, the number of NP-equivalence sets is (2^M)! / (M! * 2^M). For example, for M=2, there are three distinct classes. One function from each class is shown in Table 2 (F1, F2 and F3). The function shown from each class has the property of mapping all zeros to all zeros. Some other members of class [F1] are also shown.

Table 2- Representatives of three NP-equivalence sets in the 2-bit space plus some other members of class [F1].

X     F1(X)   F2(X)   F3(X)   Some Members of [F1]
00    00      00      00      00    10
01    01      01      11      10    11
10    10      11      10      01    00
11    11      10      01      11    01
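The class count can be confirmed by brute force for small M. The sketch below is an illustrative enumeration, not part of the original method: it walks over all reversible functions on m bits, groups together those that differ only by permuting and/or inverting output columns, and counts the groups. The function name and the representation of a function as a tuple of images are assumptions.

```python
from itertools import permutations, product
from math import factorial


def np_class_count(m):
    """Count NP-equivalence classes of reversible m-bit functions by enumeration."""
    symbols = range(2**m)

    def transform(value, order, mask):
        # Permute output columns according to `order`, then invert those in `mask`.
        out = 0
        for dst, src in enumerate(order):
            out |= ((value >> src) & 1) << dst
        return out ^ mask

    seen, classes = set(), 0
    for func in permutations(symbols):  # func[x] is the image of symbol x
        if func in seen:
            continue
        classes += 1
        for order, mask in product(permutations(range(m)), symbols):
            seen.add(tuple(transform(v, order, mask) for v in func))
    return classes


m = 2
print(np_class_count(m), factorial(2**m) // (factorial(m) * 2**m))
```

The enumeration grows as (2^M)!, so it is only practical for very small bus widths; it is meant as a check of the formula, not as a search procedure.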

Definition. Consider a function F. We say F replace-inverts the ith bit if there is a corresponding bit in the output

that is the same as the ith bit or its inversion. Clearly, when a function replace-inverts a bit, it does not change the bit

transition count of that line.

Lemma 2. A characteristic (trace) class defined by a given set of bit transitions {TRi} is unbalanceable under all

reversible combinational functions. More precisely, for every reversible combinational function F, a trace may be

found that actually increases M(T):

$$\forall c = \{TR_i\},\ \forall F,\ \exists T \in CL(c) : M(F(T)) > M(T).$$

Proof. For a function F and a given set of bit transitions if all bit transitions are nonzero, we construct a counter

example by building a trace that increases the maximum number of transitions after applying F, i.e. we build a trace

that overwhelms the function. If the number of transitions of a line is zero, that line can be omitted; therefore,

without loss of generality, we assume that all TRi's are greater than zero.

Assume TRi represents the maximum number of transitions. Any function that reduces M(T) cannot replace-invert

this bit. The trace T can be built as follows:

The first TRi+1 symbols of the trace are composed of two symbols that only differ in ith position, say V1 and V2.

Obviously, F(V1) and F(V2) should differ at least in one bit (say jth position), which will cause the same TRi

transitions on that bit in F(T). Now we add one more symbol to this trace, which causes the maximum number of

transitions to exceed TRi. For this new symbol, the ith bit is the same as in the previous symbol. The rest of the bits are allowed to change without violating the bit transition constraints because the TRi's are all greater than zero. Now we pick a symbol from the 2^(M-1) possible symbols whose mapping will cause a transition on the jth position under function F. Such a symbol exists because F doesn't replace-invert the ith bit. The rest of the trace can be easily generated to fulfill all the remaining bit transition constraints. F is overwhelmed and the proof is complete. □

Therefore, knowing the bit transitions is not enough to solve the BTB problem. Let's think in a new direction and assume that the information we have about a trace is the exact count of each symbol (the number of times each symbol appears in the trace). More precisely, the given information looks like this: for trace T, {Ni, 0<=i<2^M} is given.

Lemma 3. A characteristic class defined by {Ni} with Ni > 0, 0<=i<2^M, is unbalanceable under all reversible combinational functions, that is:

$$\forall c = \{N_i : N_i > 0\},\ \forall F,\ \exists T \in CL(c) : M(F(T)) > M(T).$$

Proof. If $\exists F\ \forall T \in CL(\{N_i : N_i > 0\}) : M(F(T)) < M(T)$, then $\forall T \in CL(\{N_i : N_i = 1\}) : M(F(T)) < M(T)$. If a function reduces

M(T) for any trace that is a member of CL({Ni, Ni>0}), the function should work with an arbitrary trace containing

exactly one of each symbol. This is because if equal symbols are ordered consecutively in a trace of CL({Ni: Ni>0}),

their effect on M(T) would be as if only one of them exists in the trace. Therefore, the problem may be reduced to finding a function that reduces M(T) for a trace that has all symbols exactly once. Now suppose that the input trace with Ni=1 is arranged to construct a balanced Gray code [10]. By definition, this leads to the minimum achievable M(T) in the input, so no function can reduce M(T) for this trace and the proof is complete. □
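For M=2 the argument can be checked exhaustively. The sketch below is only an illustration of the proof idea: it takes the 2-bit Gray ordering 00, 01, 11, 10 as the input trace (each symbol exactly once), applies every one of the 24 reversible combinational functions, and records the smallest M(F(T)) obtained. The helper name and the choice of line 0 as the least significant bit are assumptions.

```python
from itertools import permutations


def max_bit_transition(trace, width):
    """M(T): the largest per-line transition count of the trace."""
    counts = [0] * width
    for prev, cur in zip(trace, trace[1:]):
        for line in range(width):
            counts[line] += ((prev ^ cur) >> line) & 1
    return max(counts)


gray_trace = [0b00, 0b01, 0b11, 0b10]  # every 2-bit symbol exactly once
baseline = max_bit_transition(gray_trace, 2)

# Try all 24 reversible combinational functions on 2 bits; func[v] is F(v).
best = min(
    max_bit_transition([func[v] for v in gray_trace], 2)
    for func in permutations(range(4))
)
print(baseline, best)
```

If the two printed values coincide, no reversible combinational function lowers M(T) for this particular trace, which is the situation the proof relies on.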

This means that as long as all symbols are present in the trace, no function can guarantee to reduce the maximum transition of the trace. This result is intuitively expected. Characterizing by the set {Ni} is not suitable for grouping traces as far as the BTB problem is concerned. This is because the position of these symbols with respect to each other in the trace is a determining factor, yet it is completely ignored in the characterization. Interestingly, if not all symbols are present in a trace, the class may become balanceable. This can be easily shown by the following example. T is a trace in the 8-bit space whose symbols V are shown in the first column of Table 3, i.e., T ∈ CL({Ni=1, 0<=i<8, Ni=0 for the rest of i}). It is easy to verify that no matter how the symbols are ordered, M(T) will be greater than 2. The second column shows F(V). It is not difficult to verify that M(F(T)) will always be 2. In fact, it is the redundancy in the encoding that enables us to find a combinational function with the desired property. An interesting problem will be to find those sub-classes (characterized by the number of symbols) for which the problem is balanceable. This is an open problem and we haven't been able to solve it yet. Next, we will look at other methods to characterize a trace.

Table 3- Encoding with redundancy to reduce maximum transitions

V, M(T)>2      F(V), M(F(T))=2
11111 000      11111110
11111 001      11111101
11111 011      11111011
11111 111      11110111
11111 101      11101111
11111 100      11011111
11111 110      10111111
11111 010      01111111
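The two claims about Table 3 can be spot-checked by sampling orderings of the eight symbols. The sketch below is an illustration under stated assumptions: the mapping dictionary is transcribed from the table (low three bits of V to the position of the single 0 in F(V)), line 0 is taken as the least significant bit, and a fixed random seed with 2000 sampled orderings replaces an exhaustive check over all 8! orderings.

```python
import random


def max_bit_transition(trace, width):
    """M(T): the largest per-line transition count of the trace."""
    counts = [0] * width
    for prev, cur in zip(trace, trace[1:]):
        for line in range(width):
            counts[line] += ((prev ^ cur) >> line) & 1
    return max(counts)


# Table 3 mapping: low three bits of V -> position of the single 0 in F(V).
zero_position = {0b000: 0, 0b001: 1, 0b011: 2, 0b111: 3,
                 0b101: 4, 0b100: 5, 0b110: 6, 0b010: 7}


def encode(v):
    return 0xFF ^ (1 << zero_position[v & 0b111])


symbols = [0b11111000 | low for low in range(8)]
rng = random.Random(0)  # fixed seed so the spot check is repeatable
for _ in range(2000):
    trace = rng.sample(symbols, len(symbols))  # a random ordering of the symbols
    assert max_bit_transition(trace, 8) > 2
    assert max_bit_transition([encode(v) for v in trace], 8) == 2
```

The encoded trace stays at 2 because each line holds the single 0 at exactly one position in the trace, so it can toggle at most once on arrival and once on departure.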

Definition. Li denotes the number of inter-symbols in the inter-trace B of a trace T that are equal to i, 0<=i<2^M, i.e., $L_i = \sum_{1 \le j \le LE-1} \delta(U_j - i)$.

Another way to increase the amount of information, compared to the case in which only the bit transitions of a trace are known, is to provide the set of Li values, i.e., {Li, i=0…2^M-1}. Obviously, the bit transition count for line j (TRj) is simply calculated by adding up those Li's that correspond to an inter-symbol with a one in its jth position, i.e., TRj=∑Li*(jth bit of i) for 0<=i<2^M. Note that for all T ∈ CL({Li}), M(T) is the same. The same is true when traces are characterized by {TRi}. But when the traces are modeled by {Ni}, M(T) would be different for different instances of the trace. This motivates the following definition.

Definition. Given a characteristic class of traces CL(c) we say that the class CL(c) is uniform if all of its traces have

equal M(T) values. Furthermore, a uniform characteristic class is regular under a set of reversible functions {Fi} if,

for each F∈ {Fi}, characteristic class F(CL(c)) is uniform.

In this work, we call combinational functions with the property F(V1 ⊕ V2)= F(V1) ⊕ F(V2) inter-combinational

functions. In other words, the mapping of the inter-symbol of two symbols is equal to the inter-symbol of their

mappings. Of course, not all the combinational functions in

the M-bit space have this characteristic. Only a small portion of the combinational functions will be inter-

combinational for a large M. If a combinational function is inter-combinational, it is easy to prove that we should

always have F(00…0) equal to 00…0. To completely specify the rest of the function, it is enough to specify the

output for M linearly independent inputs (linear independence means none of these values can be generated by an

XOR relationship of the other ones). An instance of a linearly independent set would be one-hot symbols (binary

numbers with a single one in their binary representation). Any other input can be decomposed to an XOR of these

one-hot values. Mapping of any symbol under F is uniquely determined if mapping of an independent set is known.

If a function is inter-combinational, it is possible to calculate M(F(T)) where T belongs to CL({Li}) and the Li's are the

number of inter-symbols.

Note. Characteristic class CL({Li}) is uniform. However, it is only regular under the inter-combinational subset of

combinational functions.
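
The construction of an inter-combinational function from the images of the one-hot inputs can be sketched in Python. This is a minimal illustration, not the authors' implementation: symbols are assumed to be stored as integers with bit j standing for line j, and the helper names (make_inter_combinational, is_inter_combinational, is_reversible) are hypothetical.

```python
from itertools import product

def make_inter_combinational(one_hot_images):
    """Return F as a lookup table, given F(one-hot j) for j = 0..M-1.

    Every input is the XOR of the one-hot values of its set bits, so its
    image is the XOR of the corresponding images. F(0) is therefore 0.
    """
    m = len(one_hot_images)
    table = []
    for v in range(1 << m):
        out = 0
        for j in range(m):
            if (v >> j) & 1:
                out ^= one_hot_images[j]
        table.append(out)
    return table

def is_inter_combinational(table):
    """Check F(a ^ b) == F(a) ^ F(b) for every pair of symbols."""
    n = len(table)
    return all(table[a ^ b] == table[a] ^ table[b]
               for a, b in product(range(n), repeat=2))

def is_reversible(table):
    """F is reversible when it is a permutation of the symbol space."""
    return sorted(table) == list(range(len(table)))

# M = 2: one-hot 01 -> 01 and one-hot 10 -> 11, hence 11 -> 01 ^ 11 = 10.
F = make_inter_combinational([0b01, 0b11])
print(F)                          # [0, 1, 3, 2]
print(is_inter_combinational(F))  # True
print(is_reversible(F))           # True
```

The exhaustive check costs on the order of 2^(2M) table lookups, which is only practical for the small bus widths discussed in this paper.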

Consider an example with M=2. In this case, interestingly, each NP-equivalence class has an inter-combinational

representative. These representatives are exactly the functions that were reported in Table 2. F1, F2 and F3 are all

inter-combinational functions. Now suppose that a set of traces (M=2) is characterized by number of inter-symbols

in the trace as shown in Table 4. A 01 inter-symbol happens during one of the following events on the bus

(00→01 or 01→00 or 11→10 or 10→11) and the total number of such events is equal to L1 based on the following

table.

Table 4- Modeling a trace based on number of inter-symbols.

Inter-Symbol U    # (Number)
00                L0
01                L1
10                L2
11                L3

For such a trace M(T) = Max{L1+L3, L2+L3}. It is easy to prove that there exists a function in the 2-bit space that

always reduces M(T) if and only if L3 is the absolute maximum of {L1, L2, L3} and L2 not equal to L1. In such a

case, either F2 or F3 will reduce M(T). For example, if L2 is the minimum of the three i.e. L2<L1<L3, then applying

F2 over T (refer to Table 2) will result in M(F(T)) equal to Max{L2+L1, L2+L3}=L3+L2 which is less than M(T)=

Max{L1+L3, L2+L3}=L1+L3. A similar approach may be used for M>2.

Example. Consider the two least significant bits (LSB) of an instruction address bus. Suppose that the instructions

are sequential 80% of the time. L1, L2, and L3 can be determined as follows (here the Li's are specified as the percentage of

the corresponding inter-symbol to the total number of inter-symbols). Eighty percent sequential instructions

contribute to 40% L1 and 40% L3, whereas 20% non-sequential means 5% L0, 5% L1, 5% L2 and 5% L3. Thus,

L0=5%, L1=45%, L2=5%, L3=45%, and M(T)=90%. After applying the F2 function to this bus, the new Li's will be: L0=5%,

L1=45%, L2=45%, L3=5%, and thus M(F(T))=50%.
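
The arithmetic of this example can be replayed with a short Python sketch. It assumes inter-symbol counts are held in a list indexed by the inter-symbol value, and it models the function as a lookup table that exchanges inter-symbols 10 and 11, which is the effect the example attributes to F2 (Table 2 itself is not reproduced here). The helper names are illustrative only.

```python
def bit_transitions(counts, m):
    """TR_j = sum of L_i over the inter-symbols i whose j-th bit is one."""
    return [sum(c for i, c in enumerate(counts) if (i >> j) & 1)
            for j in range(m)]

def max_transition(counts, m):
    """M(T) for any trace in the class characterized by the given L_i."""
    return max(bit_transitions(counts, m))

def remap_counts(counts, table):
    """Inter-symbol counts after applying an inter-combinational F."""
    new_counts = [0] * len(counts)
    for i, c in enumerate(counts):
        new_counts[table[i]] += c
    return new_counts

L = [5, 45, 5, 45]                        # L0..L3 in percent
swap_10_11 = [0b00, 0b01, 0b11, 0b10]     # assumed effect of F2

print(max_transition(L, 2))               # 90
encoded = remap_counts(L, swap_10_11)
print(encoded)                            # [5, 45, 45, 5]
print(max_transition(encoded, 2))         # 50
```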

For T characterized by {Li} we have a methodology to determine whether M(F(T)) is less than M(T) by using an

inter-combinational function F. This means that if the trace is tested against all inter-combinational functions, it

would be possible to answer whether it will be balanceable under inter-combinational functions. However, this test

is not possible for large values of M since the required time for it is exponential with respect to M. In practice, we do not

want to consider M larger than 6 or 7 because of the increasing complexity of encoding/decoding functions. Another

point is that inter-combinational functions are only a small subset of the set of combinational functions. It is very

difficult to analyze the effect of a non-inter-combinational function over a trace characterized by a set of Li's. (By

non-inter-combinational, we mean a combinational function that is not inter-combinational.) We could not find a single

CL({Li}) that is balanceable under a non-inter-combinational function. Therefore, we surmise that no characteristic class

CL({Li}) is balanceable under non-inter-combinational functions, although we have not been able to prove this

statement. This means that if CL({Li}) is unbalanceable under inter-combinational functions, then it will be

unbalanceable under all combinational functions. The basis for our conjecture is the fact that CL({Li}) is regular

under the set of inter-combinational functions only. Therefore, it is not possible to control the variations of M(F(T))

of traces under non-inter-combinational functions.

Definition. A trace class CL(c) may be defined by a lag-one Markov Source R(I,S), that is, each distinct symbol V

in any trace T in this class denotes a state s ∈ S of the Markov source, and each pair of consecutive vectors

<Vi,Vi+1> defines a transition edge in the Markov source between si and si+1. I denotes the set of external inputs of

the Markov source.

A Markov source is completely specified if the probability of being in each state and the conditional probability of

transitioning from one state to another are known. The transition probability matrix of R completely defines the

characteristic class CL(c=R(I,S)). We assume that external input values are uniformly distributed in the input space.

Definition. We define a reversible function mapping F on R(I,S) as R(I,F(S)) with F^-1(F(R(I,S)))=R(I,S).

Lemma 4. A characteristic class defined by a Markov source R(I,S) is regular under all reversible combinational

functions.

Proof. CL(c=R(I,S)) is uniform and it will be mapped to CL(R(I,F(S))) under a combinational function F. CL(R(I,F(S)))

is itself uniform, therefore, R(I,S) is regular under all reversible combinational functions.

For small M, it is thus possible to construct an output Markov source for every function F and find new M(F(T)).

Therefore, balanceability can be checked by applying a function of each NP-equivalence class to the Markov source.

The problem of finding the best of these functions is similar to the minimum Hamming-weight state assignment problem,

which is known to be NP-complete [19]. For that reason, we developed a heuristic algorithm for our problem as well. Of

course, if M is small enough, then brute-force checking will lead to the optimum solution.

Definition. Minimum Max-Transition State Assignment: Find a reversible function F such that for traces of class

R(I,F(S)), M(T) is minimized.

We next present a heuristic algorithm, named PermuteStates, for solving this problem. The complexity of step 3 in this

heuristic algorithm is 2^MaxSetSize. The larger MaxSetSize is, the closer the heuristic solution will be to the optimum

one.

ALGORITHM (PermuteStates)

1. Generate an initial state assignment by setting F(si) = i, si ∈ S;
2. SetSize = 2;
3. for every subset H of S with cardinality of SetSize
4.     for every possible permutation of states in H
5.         if the permutation reduces M(T), then accept it and break;
6. if a permutation has been accepted, then SetSize=2 and goto step 3;
7. SetSize = SetSize + 1;
8. if SetSize > MaxSetSize then exit;
9. goto step 3;
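
One possible reading of PermuteStates is sketched below in Python. Several points are assumptions rather than statements from the paper: the Markov source is given as a dictionary of edge weights (probabilities or counts), M(T) is evaluated as the largest per-line sum of edge weights whose codes differ on that line, and a restart from SetSize = 2 happens immediately after the first accepted permutation. The names max_line_activity and permute_states are hypothetical.

```python
from itertools import combinations, permutations

def max_line_activity(weights, code, width):
    """Largest per-line transition weight for a state assignment.

    weights[(s, t)] is the weight of the edge s -> t and code[s] is the
    symbol currently assigned to state s.
    """
    per_line = [0.0] * width
    for (s, t), w in weights.items():
        diff = code[s] ^ code[t]
        for j in range(width):
            if (diff >> j) & 1:
                per_line[j] += w
    return max(per_line)

def permute_states(weights, num_states, width, max_set_size):
    code = list(range(num_states))                  # step 1: F(s_i) = i
    best = max_line_activity(weights, code, width)
    set_size = 2                                    # step 2
    while set_size <= max_set_size:                 # step 8
        accepted = False
        for subset in combinations(range(num_states), set_size):  # step 3
            current = [code[s] for s in subset]
            for perm in permutations(current):                    # step 4
                trial = code[:]
                for s, c in zip(subset, perm):
                    trial[s] = c
                cost = max_line_activity(weights, trial, width)
                if cost < best:                                   # step 5
                    code, best, accepted = trial, cost, True
                    break
            if accepted:
                break
        set_size = 2 if accepted else set_size + 1  # steps 6 and 7
    return code, best

# A 2-bit cyclic counter: binary codes put 4 transitions on line 0.
counter = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}
print(max_line_activity(counter, [0, 1, 2, 3], 2))   # 4.0
print(permute_states(counter, 4, 2, 2)[1])           # 2.0
```

Because every accepted permutation strictly lowers the cost and the number of assignments is finite, the loop terminates.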

A practical example of a Markov source modeling is the characteristic class representing a sequential trace. We

define a sequential Markov source as a source that generates a trace in which each symbol is either equal to the

previous symbol or the previous symbol incremented by one. For a sequential trace, the best transition balancing is

achieved using balanced Gray codes. Balanced Gray codes are Gray codes that result in an almost equal number of

transitions on all lines of the trace if the input trace is sequential [11]. By "almost equal," we mean that the

difference between the transition counts of any two lines is not greater than two. If M is a power of two, it is

possible to construct a fully balanced Gray code, i.e., one that would result in exactly the same number of transitions

in all bit lines. Because of this, fully balanced Gray codes are more advantageous. Balanced Gray codes, which in

general are not inter-combinational functions, are the best solutions for a sequential Markov source [11]. Instruction

address traces fit very well into the class of sequential traces. Here, we model instruction and data addresses as a lag-one

Markov source. As will be seen in the experimental results section, a 6-bit balanced Gray code is a good practical

solution for instruction address buses.
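
The effect of balancing on a purely sequential trace can be illustrated with a small Python comparison. The 4-bit balanced sequence below is one known balanced Gray code that is hard-coded for illustration; it is not claimed to be the code used in the experiments or the construction of [11]. The helper name cyclic_line_transitions is hypothetical.

```python
def cyclic_line_transitions(codes, width):
    """Per-line transition counts when the codes are visited cyclically."""
    counts = [0] * width
    for cur, nxt in zip(codes, codes[1:] + codes[:1]):
        diff = cur ^ nxt
        for j in range(width):
            counts[j] += (diff >> j) & 1
    return counts

WIDTH = 4
binary = list(range(1 << WIDTH))
reflected = [i ^ (i >> 1) for i in range(1 << WIDTH)]   # standard Gray code
balanced = [0b0000, 0b1000, 0b1100, 0b1101, 0b1111, 0b1110, 0b1010, 0b0010,
            0b0110, 0b0100, 0b0101, 0b0111, 0b0011, 0b1011, 0b1001, 0b0001]

print(cyclic_line_transitions(binary, WIDTH))     # [16, 8, 4, 2]
print(cyclic_line_transitions(reflected, WIDTH))  # [8, 4, 2, 2]
print(cyclic_line_transitions(balanced, WIDTH))   # [4, 4, 4, 4]
```

Over one full pass of a 4-bit counter, the maximum per-line activity drops from 16 with binary codes to 8 with the reflected Gray code and to 4 with the balanced one.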

Before starting the next section, let us take note of the relationship among the different characteristic classes that we have

considered so far. We say characteristic c is reduced to characteristic c' if every T ∈ c is also a T ∈ c', and we write

c < c'. If characteristic c' is enough to determine balanceability under a set of functions, then c will also be enough.

We can simply map any c to the corresponding c' and do the check over characteristic c'. Now consider the four

different characteristic classes that we have examined so far. We have:

{R(I,S) : Markov-Source} < {Ni: Number of symbols} and

{R(I,S) } < {Li : Number of inter-symbols } < {TRi : Bit Transitions}.

We will use these relationships when analyzing sequential functions in the next section.

5. Coding with Sequential Functions

In this section, first, we examine a special category of sequential circuits that we refer to as inter-sequential and later

we study the general class of sequential circuits. We will see that sequential circuits are the most effective functions

for solving BTB because they can balance classes with the least given information, which are the characteristic

classes characterized by bit transitions.

5.1 Inter-Sequential Functions

In inter-sequential encoders, registers and some XOR gates are used to generate the inter-symbols, and then a

combinational function is applied to the inter-trace to generate the output inter-trace. Finally, the output trace is

recovered from its inter-trace by again using XOR gates and registers.

Definition. We define an inter-Sequential function mapping G on T as G(T), where X(G(T)) = F(X(T)) and

G(V1)=V1 (F is a combinational function.)

Note that X(T) is the inter-trace of T. In addition, consider that the equation is only determining the inter-trace of G.

The actual output trace depends on how we define the mapping of the first symbol. The function is reversible if

G^-1(G(T))=T, and this happens if and only if F is a reversible combinational function. For inter-sequential functions, we

have F(Vi ⊕ Vi+1)= G(Vi) ⊕ G(Vi+1), a similar equation to what we had for inter-combinational functions.

Inter-sequential functions are important in the sense that, first, their output only depends on the current symbol and

the previous symbol, i.e. Vi and Vi-1. Therefore, they only need to save the previous symbol, which means less

overhead compared to a general sequential encoder. Second, it would be interesting to analyze these encoders

because of the similarities that they have to inter-combinational circuits.
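
An inter-sequential encoder and its decoder can be sketched in Python as follows. The sketch assumes the combinational function F is supplied as a lookup table that is a permutation of the symbol space, and that the first symbol is passed through unchanged, as in the definition G(V1)=V1. The function names are illustrative, not taken from the paper.

```python
def inter_trace(trace):
    """X(T): XOR of every pair of consecutive symbols."""
    return [a ^ b for a, b in zip(trace, trace[1:])]

def inter_sequential_encode(trace, table):
    """G(V1) = V1; afterwards out_i = out_(i-1) ^ F(V_i ^ V_(i-1))."""
    if not trace:
        return []
    out = [trace[0]]
    for prev, cur in zip(trace, trace[1:]):
        out.append(out[-1] ^ table[prev ^ cur])
    return out

def inter_sequential_decode(encoded, table):
    """Invert the encoder; requires F (the table) to be a permutation."""
    inverse = [0] * len(table)
    for v, f in enumerate(table):
        inverse[f] = v
    if not encoded:
        return []
    trace = [encoded[0]]
    for prev, cur in zip(encoded, encoded[1:]):
        trace.append(trace[-1] ^ inverse[prev ^ cur])
    return trace

F = [3, 1, 2, 0]                 # example: exchange inter-symbols 00 and 11
T = [0, 1, 2, 3, 0, 0, 1]
G_T = inter_sequential_encode(T, F)
print(inter_trace(G_T) == [F[u] for u in inter_trace(T)])   # True
print(inter_sequential_decode(G_T, F) == T)                 # True
```

Both sides keep only the previous input and output symbols, which is the reduced storage that the text contrasts with a general sequential encoder.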

Lemma 5. A characteristic class defined by bit transitions {TRi} is not balanceable under any reversible inter-

sequential function. More precisely, for every reversible inter-sequential function G, a trace T may be found for which M(G(T))

is higher than M(T):

$$\forall c = \{TR_i\},\ \forall G,\ \exists T \in CL(c) : M(G(T)) > M(T).$$

Proof. Suppose, TRj is the maximum number of bit transitions. Consider an input trace T composed of TRj+1

symbols that have transitions only in the jth position. This means the inter-symbol consists of a symbol with all

zeros except in the jth position (a one-hot code). For any function that reduces the maximum transition of this trace,

this one-hot code should be mapped to zero. Otherwise, the same number of transitions will happen on some other

line in the output trace. Therefore, inter-symbol zero should be mapped to another inter-symbol that has at least a

one in its binary representation. Now, the last symbol in T can be appended to it without causing any change to its

bit transitions. This means that inter-symbol zero can be added in any trace without altering the bit transition

constraints. However, the added zero inter-symbols must be mapped to a non-zero inter-symbol, and this will cause

a transition on some bit of the output trace. By repeating this, the function will eventually be overwhelmed, and the proof

is complete. ∎

Lemma 6. A characteristic class defined by the number of inter-symbols {Li} is regular under all reversible inter-

sequential functions.

Proof. A class CL({Li}) will be converted to a class CL({Li'}) under an inter-sequential function and they are both

uniform. Therefore, any characteristic class CL({Li}) is regular under all reversible inter-sequential functions. ∎

We examine the case for M=2 just as we did for inter-combinational functions. Suppose that the given information

is the same as that provided in Table 4. Each function corresponds to a permutation of inter-symbols. For trace T,

M(T) is calculated to be equal to Max{L1+L3, L2+L3}. Suppose that the characteristic class is mapped to CL({Li'})

under G. For output traces, M(G(T)) is similarly calculated to be Max{L1'+L3', L2'+L3'}. To get a lower M(G(T)),

the function should map L3' to the minimum and L0' to the maximum of the Li's. Evidently, inter-sequential functions

are more effective in decreasing M(T) than the inter-combinational subset of combinational functions.
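
For M=2 the comparison can be checked by brute force. The Python sketch below enumerates all permutations of the four inter-symbols (the inter-sequential case) and, as a restriction, only those that keep inter-symbol 00 fixed; for M=2 these six permutations are exactly the reversible inter-combinational functions. The counts used are made-up illustrative values, and the function names are hypothetical.

```python
from itertools import permutations

def max_transition_2bit(counts):
    """M(T) = Max{L1+L3, L2+L3} for a 2-bit bus."""
    _, l1, l2, l3 = counts
    return max(l1 + l3, l2 + l3)

def best_permutation(counts, keep_zero_fixed):
    """Search the inter-symbol permutations for the lowest M."""
    best = None
    for perm in permutations(range(4)):
        if keep_zero_fixed and perm[0] != 0:
            continue
        new_counts = [0] * 4
        for i, target in enumerate(perm):
            new_counts[target] = counts[i]   # inter-symbol i becomes target
        cost = max_transition_2bit(new_counts)
        if best is None or cost < best[0]:
            best = (cost, perm)
    return best

L = [10, 20, 30, 40]                       # illustrative L0..L3
print(max_transition_2bit(L))              # 70
print(best_permutation(L, True)[0])        # 60, inter-combinational
print(best_permutation(L, False)[0])       # 40, inter-sequential
```

The unrestricted optimum places the largest count on inter-symbol 00 and the smallest on 11, in line with the rule stated above.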

For inter-sequential circuits there is no need to go further than this and model the trace with a Markov source. This

is because CL(R(I,S))<CL({Li}) and balanceability is determined at this level of information.

5.2 General Sequential Functions

General sequential functions are the most effective functions for balancing transitions of a trace. They are also

associated with the maximum complexity compared with the functions of previous sections.

Definition. We define a sequential function mapping H on T as H(T), where H(Vi) = F(V1,…,Vi). The function is

reversible if H^-1(H(T))=T. Index i does not have to be a finite number.

Lemma 7. A characteristic class defined by a complete set of bit transitions {TRi} is balanceable exactly if the

difference between the maximum and the minimum bit transition counts is two or more:

$$\forall c = \{TR_i\}\ \text{given}\ \max_i\{TR_i\} - \min_i\{TR_i\} \ge 2,\quad \exists F,\ \forall T \in CL(c) : M(F(T)) < M(T).$$

Proof. First we prove an interesting property of any sequential function that reduces M(T). Consider the bit

transitions are given and for any given sequential function, we want to build a trace having those bit transitions in a

fashion that it overwhelms that function. By overwhelming a function we mean demonstrating that the function has

M(H(T)) equal to or higher than M(T). First we claim that if for a sequential function H, at any point in time a non-

zero inter-symbol is mapped into a zero inter-symbol (i.e. Ui=0) under F, that function can be overwhelmed very

easily. Since a transition in input has been translated to zero transitions in output, a zero inter-symbol in the input

should lead to at least one transition in the output. Therefore, if at any point during the construction of the trace, the

sequential machine generates no-transition for the non-zero inter-symbol that we are going to insert, we will insert a

zero inter-symbol in the input trace, instead. This will cause at least one transition in the output. If this continues

infinitely, the function will be overwhelmed sooner or later, because we will keep on inserting transitions in H(T)

without inserting transitions in T. Now, suppose we build our input trace in a way that in each step only one

transition happens. Based on what we said we know that the sequential function generates at least one transition in

output for each transition in input. Now, we argue that it is enough to look at only those functions that always have

equal number of transitions in input and in output i.e. they distribute transitions over different lines. In other words

we have proved that no other function can do better than these functions. We have shown the general block diagram

of these transition-distributing functions in Figure 2. The permutation function is not a fixed permutation and might

change with time. This is actually one of the main differences with inter-sequential circuits.

[Figure 2 block diagram: Generate inter-symbols → Permute bits → Generate output trace from inter-symbols]

Figure 2- Model for transition-routing sequential functions

A sequential function used to evenly distribute transitions over the bit lines resembles a complex routing network

that can route transitions of each line to any other line. Evidently, the routing configuration should be changed as

time progresses in order to achieve a uniform distribution profile of bit level switching activities. However, it is

important to note that this reconfiguring is only based on the symbols that have been already conveyed to the

receiver and the knowledge of the bit-level activities of different lines. Now, based on what was mentioned about the

properties of sequential functions, we state that a suitable function may not exist for every given set of bit-level

transition counts. For instance, given bit transitions {TR1=4,TR2=3,TR3=3} it is easy to verify that no function can

reduce M(T). Since transitions are going to be only distributed (not suppressed) between the lines, no distribution

can decrease M(T) in this case. Therefore, we assume that the average of the TRi's is at least one unit less than M(T) in the

original trace. If only one line has the maximum activity, this will translate into a difference of at least two between

the maximum and the minimum of the bit transitions. Without loss of generality we assume that only one line in T

has the maximum activity.
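
The {TR1=4, TR2=3, TR3=3} example follows from a simple counting bound: if transitions are only moved between lines and never suppressed, some line must carry at least the ceiling of the average. The Python sketch below evaluates that bound. It is only a necessary condition added here for illustration, not the constructive routing algorithm of the proof, and the function names are hypothetical.

```python
def redistribution_lower_bound(tr):
    """Smallest possible maximum when the total transition count is preserved."""
    total, lines = sum(tr), len(tr)
    return -(-total // lines)          # integer ceiling of total / lines

def may_reduce_by_redistribution(tr):
    """True when the counting bound leaves room below the current maximum."""
    return redistribution_lower_bound(tr) < max(tr)

print(redistribution_lower_bound([4, 3, 3]))    # 4
print(may_reduce_by_redistribution([4, 3, 3]))  # False
print(may_reduce_by_redistribution([6, 3, 3]))  # True
```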

The algorithm for controlling the routing network is as follows. In each step, we show that based on the transitions

that happen, either the suitable function is found or it will reduce to another balanceable problem.

Assume the line that has the maximum activity is Lmax and the line with the minimum activity is Lmin.

1. If Lmax makes a transition and Lmin does not, swap Lmin and Lmax and set the encoding to be equal to H(V)=V for

the remainder of the trace. For the output trace, M(H(T)) will be less than the original trace.

2. If Lmin makes a transition, difference of at least two will remain between the maximum and the minimum

transitions. Repeat this algorithm.

3. If none of them make any transitions, do not change the routing. Repeat the algorithm.

It is now easy to verify that M(T) will be reduced by at least one using this algorithm, and the proof is complete. ∎

Sequential functions are much more effective in implementing bit balancing encoders when compared to

combinational functions. They can actually solve the BTB problem for cases where combinational and inter-

sequential functions utterly fail. However, the above lemma is just to show the potential of sequential circuits. The

algorithm presented in that proof actually reduces M(T) by only one. Yet by cascading such blocks maximum

transition can be reduced as much as is possible for the given characteristic. However, each of these blocks is

costly, because keeping track of transitions on each line and identifying the maximum and the minimum is

expensive in terms of hardware resources. The superiority of sequential circuits over

combinational functions comes at the price of their increased overhead.

Such solutions are complex compared to combinational solutions. Therefore, simpler sequential functions may be

used instead that can balance transitions in a heuristic approach. A straightforward example is to use sequential

functions that only swap two different lines. In fact instead of using a complete M-bit to M-bit routing network, two

multiplexers may be used to swap two lines. The bit swapping should be done in a way to make the transition count

of these two lines almost equal. Therefore, if there is a big difference between the transition counts of two lines, then

the maximum transition can be prudently decreased.

The first proposed block is called the interchange block shown in Figure 3. It simply swaps two lines every time the

values of the two lines are equal, i.e. both of them are zero or one. This is because we do not want to add any extra

transitions to the original trace. For this scheme to work properly, the two lines should be uncorrelated. Otherwise, it

can be easily shown that this encoder may become ineffective in some cases (as an example, consider the trace

<00,01,00,00,01,00,00,…>). The interchange block is a fast solution when a vulnerable line is neighbored by a

sturdy line from which it can get some help to convey the information, while reducing its exposure to degradation.
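
The behavior of the interchange block can be simulated with a few lines of Python. The sketch follows only the textual rule (toggle the routing whenever the two input bits are equal) and assumes the routing starts unswapped; it does not model the transition counter shown in Figure 3. The function names are illustrative.

```python
def interchange_encode(pairs):
    """Toggle the line routing whenever both input bits are equal."""
    swapped = False
    out = []
    for a, b in pairs:
        if a == b:
            swapped = not swapped
        out.append((b, a) if swapped else (a, b))
    return out

def interchange_decode(pairs):
    """Equal outputs imply equal inputs, so the receiver tracks the routing."""
    return interchange_encode(pairs)

def line_transitions(pairs):
    counts = [0, 0]
    for (a0, b0), (a1, b1) in zip(pairs, pairs[1:]):
        counts[0] += a0 ^ a1
        counts[1] += b0 ^ b1
    return counts

busy = [(0, 0), (1, 0), (0, 0), (1, 0), (0, 0), (1, 0)]
print(line_transitions(busy))                          # [5, 0]
print(line_transitions(interchange_encode(busy)))      # [2, 3]

# The trace quoted in the text, written as bit pairs.
correlated = [(0, 0), (0, 1), (0, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
print(line_transitions(correlated))                        # [0, 4]
print(line_transitions(interchange_encode(correlated)))    # [4, 0]
```

In the first trace the total of five transitions is preserved while the maximum falls from 5 to 3. In the second trace all activity simply moves to the other output, which matches the remark that the encoder may be ineffective for such traces.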

[Figure 3 block labels: Transition Counter, Control Logic; Input 1, Input 2; Output 1, Output 2]

Figure 3- Interchange Block

It is possible to use several interchange blocks or progressive levels of it to achieve better results. In such cases, to

make the scheme even simpler and decrease the overhead of the controlling logic, we can modify the interchange

block and employ a global decision maker for all blocks whose job is to swap the lines in all interchange blocks after

K clocks. This approach may marginally increase the total number of transitions but it will be almost as effective as

before in terms of reducing M(T). We call this technique the global-interchange solution.

In practice, it is sometimes very efficient to use functions that map a non-zero inter-symbol to inter-symbol zero in

the output. This may lead to cancellation of a large number of transitions. For example, this will be advantageous if

the coding is devised in a way to suppress transitions for sequential addresses in a trace of instruction addresses.

This is a common trick used in encodings that aim to reduce maximum transitions of a trace [13]. Other

configurations are also possible. One effective solution is to extract the inter-symbol and rotate it by I bits in each

clock cycle. I is incremented by one (I is calculated mod n) for the next cycle. Therefore the ones in the inter-

symbol are distributed over different lines. Figure 4 illustrates this scheme. Another solution would be to send

(V_i + 1) ⊕ V_{i+1} to the rotating network for sequential traces. This has the additional advantage that it leads to no

transitions when values are sequential; consequently, M(T) can be reduced even more. We call the first approach

XOR-Rotate and the second one T0-XOR-Rotate. Recall that T0-XOR [13] is an irredundant bus encoding technique

that sends (V_i + 1) ⊕ V_{i+1} on the bus by transition signaling (XORing the value with the previous value on the bus).
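
Both schemes can be sketched in Python under a few stated assumptions: the modulus n is taken to be the bus width, rotation is to the left, and all registers (previous input, previous output, rotation amount I) start at zero. None of these details is fixed by the text, and the function names are hypothetical.

```python
def rotl(x, r, width):
    r %= width
    mask = (1 << width) - 1
    return ((x << r) | (x >> (width - r))) & mask

def rotr(x, r, width):
    return rotl(x, width - (r % width), width)

def xor_rotate_encode(trace, width, t0=False):
    mask = (1 << width) - 1
    prev_in, prev_out, shift = 0, 0, 0
    out = []
    for v in trace:
        # T0 variant: compare with the predicted next sequential value.
        reference = (prev_in + 1) & mask if t0 else prev_in
        prev_out ^= rotl(reference ^ v, shift, width)   # transition signaling
        out.append(prev_out)
        prev_in = v
        shift = (shift + 1) % width
    return out

def xor_rotate_decode(encoded, width, t0=False):
    mask = (1 << width) - 1
    prev_in, prev_out, shift = 0, 0, 0
    trace = []
    for code in encoded:
        inter = rotr(code ^ prev_out, shift, width)
        reference = (prev_in + 1) & mask if t0 else prev_in
        prev_in = reference ^ inter
        trace.append(prev_in)
        prev_out = code
        shift = (shift + 1) % width
    return trace

def line_transitions(trace, width):
    counts = [0] * width
    for prev, cur in zip(trace, trace[1:]):
        for j in range(width):
            counts[j] += ((prev ^ cur) >> j) & 1
    return counts

toggling = [0, 1, 0, 1, 0, 1, 0, 1]
print(line_transitions(toggling, 4))                         # [7, 0, 0, 0]
print(line_transitions(xor_rotate_encode(toggling, 4), 4))   # [1, 2, 2, 2]

sequential = list(range(1, 21))
print(set(xor_rotate_encode(sequential, 6, t0=True)))        # {0}
```

The rotation spreads the seven transitions of the toggling line over all four lines, and the T0 variant leaves the bus idle for a purely sequential trace.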

[Figure 4 block diagram: input trace T; V_i ⊕ V_{i+1} or (V_i + 1) ⊕ V_{i+1}; Rotate I bits and XOR, increase I; output F(T)]

Figure 4- XOR-Rotate & T0-XOR-Rotate

6. Experimental Results

In the previous sections, we studied the BTB problem under different constraints. In this section, we present

experimental results of applying different encoding techniques on two kinds of traces, i.e. instruction address traces

and data address traces. We have applied both combinational and sequential functions to these traces and compared

the results with each other. Our methodology is to report the results based on averaging over six different SPEC2000

benchmarks: vpr, parser, equake, gcc, vortex, and art. Each trace was generated by simulating 10 million

instructions using the SimpleScalar architecture simulator [21]. We report two different quantities for each method: 1)

Max Transition Ratio which is the ratio of the maximum transition count of the bus lines after encoding to that

before encoding, 2) Total Transition Ratio which is the ratio of the total transition count of the bus after encoding to

that before encoding. Of course, Max Transition Ratio is of primary interest in this paper; however, Total Transition

Ratio shows what percent of the total transitions has been eliminated. The greater this percentage, the lower the

probability of odd scenarios (such as the one investigated at the end of section 2) and the greater the energy saving. Table

5 and Table 6 present the comparison between different techniques.
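
The two reported quantities can be computed from a pair of traces as in the following Python sketch. It assumes the original and encoded buses have the same width and that the original trace contains at least one transition; the function names are illustrative, and the small example trace is not taken from the benchmarks.

```python
def line_transitions(trace, width):
    """Number of transitions on each bus line over the whole trace."""
    counts = [0] * width
    for prev, cur in zip(trace, trace[1:]):
        diff = prev ^ cur
        for j in range(width):
            counts[j] += (diff >> j) & 1
    return counts

def transition_ratios(original, encoded, width):
    """Return (Max Transition Ratio, Total Transition Ratio)."""
    before = line_transitions(original, width)
    after = line_transitions(encoded, width)
    return max(after) / max(before), sum(after) / sum(before)

original = list(range(8))                  # 3-bit sequential addresses
encoded = [v ^ (v >> 1) for v in original] # reflected Gray code
max_ratio, total_ratio = transition_ratios(original, encoded, 3)
print(line_transitions(original, 3))       # [7, 3, 1]
print(line_transitions(encoded, 3))        # [4, 2, 1]
print(round(max_ratio, 3), round(total_ratio, 3))   # 0.571 0.636
```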

Instruction address traces are mostly sequential traces; therefore, we expect the balanced Gray code to be the most

effective method for reducing the maximum transition counts of these traces. Using the balanced Gray code, for each

new sequential address, only one transition occurs and this transition is distributed over different bus lines. We

tested balanced Gray codes for buses of width 3, 4, 5 and 6. The number after the dash in each balanced

Gray code entry refers to the width of the bus. For ideal sequential symbols, the result should get better as the bus

becomes wider. However, as can be seen in the table, this is not the case for instruction addresses and the marginal

improvement in the performance of the balanced Gray code diminishes as a result of non-sequential instructions. The next

entries in the table correspond to results obtained by applying the PermuteStates technique on a Markov model

extracted from the instruction addresses. Again, the numbers after the dash show the width of the bus, e.g., PermuteStates-4

is the result for a 4-bit wide bus. Based on the results, PermuteStates performs better than the balanced Gray code when the

size of the bus is 5 or 6. We used MaxSetSize equal to 4. We also experimented with Interchange, the sequential

encoder presented in Section 5.2. We have reported the results for three configurations of the interchange block.

For two lines, if one line is more active than the other one, interchange distributes the transitions of more active line

to the other line. We used configurations with multiple levels of interchange. The number after dash in front of each

interchange entry represents the number of levels used. As expected, Interchange-1 is capable of reducing M(T) by

half. This is due to the fact that the highest active line of the bus is grouped with a line with almost zero activity.

Therefore, the transitions after encoding will be the total transition count divided by two. The reported results are for

the best possible configuration of grouping two bits, meaning that at each level, the line with the highest activity is

grouped with the line with the lowest activity and so on.

Finally, we have reported the results for the XOR-Rotate and T0-XOR-Rotate methods. As we mentioned earlier, these

methods require rotation networks that can perform an arbitrary amount of rotation in each clock. The superb results of

these methods come at the expense of having extremely complex logic networks to perform the required arithmetic

and rotation operations. As can be seen, the performance of T0-XOR-Rotate is superior to that of XOR-Rotate.

This is due to the fact that T0-XOR-Rotate eliminates many transitions by exploiting the sequentiality of instruction

addresses.

Table 5- Comparison of different methods applied over instruction addresses.

Method             Max Trans. Ratio    Total Trans. Ratio
Bal. Gray-3        0.538               0.704
Bal. Gray-4        0.311               0.654
Bal. Gray-5        0.314               0.633
Bal. Gray-6        0.235               0.603
PermuteStates-4    0.351               0.708
PermuteStates-5    0.265               0.621
PermuteStates-6    0.231               0.567
Interchange-1      0.508               1
Interchange-2      0.302               1
Interchange-3      0.203               1
XOR-Rotate         0.084               1
T0-XOR-Rotate      0.017               0.199

In practice, the target bus may not be sequential. In such a case, methods like the balanced Gray code would not be

applicable anymore. A very simple example would be a data address bus. In Table 6, we have shown the results for

such a bus. Notation and conventions in this table are similar to Table 5. It can be observed that the balanced Gray

code has a poor performance. Besides that, due to the fact that data address bus transitions are originally much more

balanced compared to instruction address buses, even Interchange or XOR-Rotate is not a successful solution;

of these, only Interchange-1 is reported in the table. The only effective solution that we could find in this case is the PermuteStates

encoder. We have reported results for three different bus widths.

Table 6- Comparison of different methods applied over data addresses.

Method             Max Trans. Ratio    Total Trans. Ratio
Bal. Gray-4        0.823               0.946
Bal. Gray-5        0.981               1.021
Bal. Gray-6        0.960               0.982
PermuteStates-4    0.586               0.822
PermuteStates-5    0.523               0.823
PermuteStates-6    0.467               0.775
Interchange-1      1.043               1

Let us summarize the results of the two tables. For instruction address buses, when we experiment over buses of size

up to 6, the best result is generated by using three levels of interchange and leads to a 79.7% reduction in max transition

whereas PermuteStates heuristic achieves 76.9% reduction. PermuteStates is a pure combinational logic and needs

much less overhead compared to three levels of Interchange block. We do not take into account the result of XOR-

Rotate and T0-XOR-Rotate because of their infeasibility. For data addresses, the only truly effective method is the

PermuteStates technique. It achieves a 53.3% reduction in max transitions, and none of the other techniques comes

close. All techniques other than Interchange bring a good amount of reduction in total transitions as

well. As explained earlier in the paper, this will certify the validity of estimations to a certain level.

7. Conclusions

In this paper, we thoroughly investigated the problem of reducing maximum transition count for a group of lines.

We looked at the problem when various levels of information are available for the trace and by applying different

kinds of functions such as combinational, inter-sequential and sequential logic. We were able to exactly solve the

problem in many cases. We presented polynomial-time solutions when the exact solution leads to an infeasible

algorithm. We presented experimental results using instruction and data address buses, which are good examples

of typical buses that might be vulnerable to hot-carrier degradation. Our experimental results also show the

effectiveness of the Markov-source heuristic for instruction address traces and data address traces. The actual selection

of a technique highly depends on the characteristic of the trace and other constraints in the system.

8. Reference

1. Y. Leblebici, S. M. Kang, Hot Carrier Reliability of MOS VLSI Circuits, Kulwer Academic Publishers, 1993.

2. E. A. Amerasekera and F. N. Najm, Failure Mechanisms in Semiconductor Devices. Wiley& Sons, 1998.

3. H. Yonezawa, J. Fang, Y. Kawakami, N. Iwanishi, L. Wu, A. Chen, N. Koike, P. Chen, C. Yeh and Z. Liu, �Ratio Based Hot-Carrier Degradation Modeling for Aged Timing Simulation of Millions of Transistors Digital Circuits, IEEE Int’l Electron Devices Meeting Technical Digest, pp. 93-96, 1998.

4. P.C. Li and I. Hajj, �Computer Aided Redesign of VLSI Circuits for Hot-Carrier Reliability,� Proc. of International Conference on Computer Design, 1993.

5. C. Chang, K. Wang, M. Marek-Sadowska, � Layout-Driven Hot-Carrier Degradation Minimization Using Logic Restructuring Techniques,� Proc. Design Automation Conference, 2001.

6. K. Roy and S. Prasad, � Logic Synthesis for Reliability � An Early Start to Controlling Electromigration and Hot Carrier Effects,� Proc. Design Automation and Test in Europe, 1994.

7. A. Dasgupta, R. Karri, �Electromigration Reliability Enhancement Via Bus Activity Distribution,� Proc. of Design Automation Conference, 1996.

8. A. Dasgupta, R. Karri, “Hot-Carrier Reliability Enhancement via Input Reordering and Transistor Sizing,” Proc. of Design Automation Conference, pp. 819-824, 1996.

9. Z. Chen, I. Koren, “Technology Mapping for Hot-Carrier Reliability Enhancement,” Proc. of International Society for Optical Engineering, Vol. 3216, pp. 42-50, 1997.

10. K. C. Kapur, L. R. Lamberson, “Reliability in Engineering Design,” John Wiley & Sons, 1997.

11. G. S. Bhat, C. D. Savage, “Balanced Gray Codes,” Electronic Journal of Combinatorics 3, No. 1, R25, 1996.

12. L. Benini, G. De Micheli, E. Macii, D. Sciuto, C. Silvano, “Asymptotic Zero-Transition Activity Encoding for Address Buses in Low-Power Microprocessor-Based Systems,” IEEE 7th Great Lakes Symposium on VLSI, Urbana, IL, pp. 77-82, 1997.

13. W. Fornaciari, M. Polentarutti, D. Sciuto, and C. Silvano, “Power Optimization of System-Level Address Buses Based on Software Profiling,” Proc. International Symposium on Hardware/Software Codesign, pp. 29-33, Apr. 2000.

14. S. Ramprasad, N. Shanbhag, I. N. Hajj, “A Coding Framework for Low-Power Address and Data Busses,” IEEE Transactions on Very Large Scale Integration Systems, Vol. 7, No. 2, pp. 1280-1294, June 1999.

15. P. P. Sotiriadis, A. P. Chandrakasan, “Bus Energy Reduction By Transition Pattern Coding Using a Detailed Deep Submicrometer Bus Model,” IEEE Transactions on Circuits and Systems, Vol. 5, No. 10, Oct. 2003.

16. Y. Aghaghiri, F. Fallah, M. Pedram, “Reducing Transitions on Memory Buses Using Sector-Based Encoding Technique,” Proc. of Int’l Symposium on Low Power Electronics and Design, pp. 190-195, 2002.

17. Komatsu, M. Ikeda, K. Asada, “Low Power Chip Interface based on Bus Data Encoding with Adaptive Code-book Method,” Proc. of Ninth Great Lakes Symposium, pp. 368-371, 1999.

18. A. Abdollahi, F. Fallah, M. Pedram, “Runtime Mechanisms for Leakage Current Reduction in CMOS VLSI Circuits,” Proc. Intl. Symposium on Low Power Electronics and Design, pp. 213-218, Aug. 2002.

19. V. Veeramachaneni, A. Tyagi, S. Rajgopal, “Re-encoding for Low Power State Assignment of FSMs,” Intl. Symposium on Low Power Electronics and Design, pp. 173-178, 1995.

20. JEITA, Standard of Japan Electronics and Information Technology Industries Association, “Failure Mechanism Driven Reliability Test Methods for LSIs (Amendment 1),” Oct. 2001.

21. http://www.simplescalar.com/
