Combating Hot Carrier Effects via Bit-level Transition Balancing
Yazdan Aghaghiri, Massoud Pedram
Abstract - In this paper, we address the issue of hot carriers and their impact on the reliability of a VLSI circuit through accelerated transistor aging. We tackle this phenomenon by formulating and solving an encoding problem, which we refer to as the Bit-level Transition Balancing (BTB) problem. The BTB problem is to find encoding techniques that minimize the maximum activity over a group of lines, which we refer to as a bus, thus making the whole bus less vulnerable to hot-carrier degradation. We approach this problem systematically by first answering the question of how much information about the characteristics of the data that appears on the bus is needed to find a combinational and/or sequential encoding function that optimally solves the BTB problem. Next, we propose a number of different encoding techniques that efficiently solve the BTB problem. Experimental results demonstrate the effectiveness of these techniques.
1. Introduction
With the current trend of shrinking minimum feature sizes and rising clock frequencies in VLSI circuits, reliability has become a major design issue. In spite of the one-time benefits of reducing the supply voltage level, the substrate temperature and power density are rapidly increasing. At the same time, the decrease in critical device dimensions to the submicron range results in more intense horizontal and vertical electric fields in the channel region. Under the gate of a transistor, these enormous fields give rise to electrons and holes with kinetic energies significantly higher than the silicon band gap (1.1 eV). These electrons and holes may be injected into the gate oxide and can cause permanent changes in the oxide interface charge distribution. This phenomenon is called the hot carrier effect, or sometimes the hot electron effect, because this injection happens more often for electrons due to their smaller barrier height compared to holes (i.e., 3.1 eV for electrons compared to 4.8 eV for holes [2]). A sizeable increase in the threshold voltage of the affected transistors and a corresponding decrease in their drain current driving capability are undesirable results of such hot carrier injection into the gate oxide. The hot carrier effect is exacerbated as the technology moves toward smaller device dimensions and higher clock frequencies [2]. Another phenomenon caused by carriers with energies higher than 1.1 eV is the creation of electron-hole pairs through impact ionization. In an NMOS device, the generated electrons are collected by the drain, whereas the generated holes drift in the substrate toward the ground terminals and thereby contribute to a substrate current. Carrier injection and impact ionization become most severe when the device is in the saturation region, because in this case the intensity of the electric fields in the channel is at its maximum. Therefore, device degradation strongly depends on the duration of time that the transistor stays in the saturation region, and the substrate leakage current is a good indicator of the degree of device degradation
[4]. Every time the output of a gate makes a transition, some transistors must pass through the saturation region, either to turn on or to turn off. This means that the amount of time spent in saturation directly depends on the output switching activity of the device. It also depends on the output load capacitance and the slew rate of the input signal, since these two parameters directly determine the output slew rate of a given gate. Now, let $D_{fresh}$ and $D_{aged}$ denote the fresh and stressed (aged) propagation delays of a CMOS inverter; we can write:
$$D_{aged}(t) = \psi(T_{slew}, C_{load}, N_{sw} \cdot t) \cdot D_{fresh}$$
where $T_{slew}$ denotes the input slew rate, $C_{load}$ is the output capacitance, $N_{sw}$ represents the output switching activity, and t is the time parameter (total "on-time" of the circuit in seconds). The function ψ is non-linear; it is determined from transistor-level simulations and is usually represented as a three-dimensional table [5]. For gates other than simple inverters, the ratio-based degradation model proposed in [3] is applied. In this model we have $D_{aged} = \alpha D_{fresh}$, where α ≥ 1 is the overall degradation of all transistors in the gate and is calculated as:
$$\alpha = \sum_{i=1}^{n} \alpha_i - n + 1$$
In the above equation, n is the number of transistors in series and $\alpha_i \ge 1$ is the aged-to-fresh delay ratio when only input pin i is under stress; it is given by $\psi(T_{slew}, C_{load}, N_{sw})$.
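As a small illustration of the ratio-based model, the sketch below combines per-pin ratios into the gate-level factor α and scales the fresh delay. It assumes the α_i values have already been looked up from the ψ table; the function names and the numbers used are hypothetical, chosen only to show the arithmetic.

```python
def gate_degradation(alphas):
    """Ratio-based model: alpha = sum(alpha_i) - n + 1.

    alphas holds the per-pin aged-to-fresh delay ratios alpha_i >= 1
    for the n series transistors (assumed to come from the psi table).
    """
    if not alphas or any(a < 1.0 for a in alphas):
        raise ValueError("need at least one alpha_i, each >= 1")
    n = len(alphas)
    return sum(alphas) - n + 1


def aged_delay(fresh_delay, alphas):
    """D_aged = alpha * D_fresh (same time unit as fresh_delay)."""
    return gate_degradation(alphas) * fresh_delay


# Hypothetical ratios for a gate with two transistors in series.
print(round(gate_degradation([1.10, 1.05]), 4))   # 1.15
print(round(aged_delay(100.0, [1.10, 1.05]), 2))  # 115.0
```

When every α_i equals 1 (no stress), the model returns α = 1, i.e., the aged delay equals the fresh delay.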
Hot carrier degradation can cause digital systems to fail. A line driver may fail due to an increase in its threshold voltage, which slows the driver down to the point that it violates the bus timing constraints; i.e., a gate delay fault is created as a result of hot carrier degradation. It is also possible that the increased delays of consecutive gates along a combinational path create a setup time violation at the flip-flop at the end of the path, which is known as a path delay fault. In this work, we do not consider faults caused by accumulated path delay. Our focus is on on-chip buses with a single driver. We define the lifetime of a wire or line as the Mean Time To Failure (MTTF) of its driving gate. This wire might be a segment of a bus between the driver and a repeater, or a segment between two repeaters. We also assume that the whole system fails as soon as a single wire fails.
There are different approaches for modeling the lifetime of a wire. For example, we can say that a line fails when $\Delta D = D_{aged} - D_{fresh} > \beta D_{fresh}$, where β > 0 is a user-defined parameter (e.g., 0.25). However, the lifetime of a wire is actually a random variable with a certain probability distribution; therefore, the above deterministic definition is not suitable. Unfortunately, it is extremely difficult to derive the lifetime distribution from the physics of the hot-carrier effect [2]. A complete statistical characterization of a transistor's lifetime depends strongly on the technology and on various physical phenomena. Instead, one typically determines empirically the distribution that best fits the lifetime of a wire across different designs and different chip realizations of the same design. Both lognormal and Weibull distributions have been used to characterize hot-carrier lifetime [20]. These two distributions are, however, similar near their mean values. In this paper, we choose to model the lifetime of an interconnect driver as a lognormal random variable. In addition, based on the previous discussion, we make the following assumption: "the mean value of the driver lifetime is inversely proportional to its output switching activity."
The hot carrier effect has been studied extensively in the past few years. Different design techniques for minimizing hot-carrier degradation, such as transistor resizing and reordering [8,2], logic factorization [6], technology mapping [9], logic restructuring [5], and binding and scheduling [7], have been introduced. In this paper, we propose a completely new approach to minimizing the hot carrier induced failures of on-chip buses in VLSI circuits. More precisely, by increasing the lifetime of the gates that are subject to the most severe hot carrier degradation (e.g., bus drivers), we increase the lifetime of the whole circuit. Buses tend to have high capacitance compared to other wires in the system. In addition, they usually have a high activity rate compared to other nets in the circuit. This makes bus drivers some of the components most prone to hot-electron degradation.
We assume that we are given a set of interconnect lines in a VLSI circuit that are flagged as hot-carrier sensitive (HC-sensitive). Furthermore, we assume that if one of these lines fails, then the whole circuit fails. Our approach is to add some encoding/decoding hardware to protect the circuit against hot carrier failure. We want the encoding and decoding functions to be as lightweight as possible so as to minimize their impact on the overall area, delay and power consumption of the circuit. Returning to the bus problem: even with equally sized drivers and identical loads on each and every bus line, different lines of the same bus may age at different rates due to their different bit-level activities over time. For example, the LSB of a bus carrying small positive numbers over the circuit lifetime is expected to have a much higher activity than the MSB of the same bus. The bus as a whole, however, fails as soon as any of its lines fails. The encoding solutions we propose increase the lifetime of a bus by balancing switching activity among all bus lines, i.e., by minimizing the maximum line activity of the bus. If we do not balance the transition counts over different lines of the bus, then the expected lifetime of the bus will be dominated by the most active line of the bus and, as a result, will be much shorter.
The remainder of this paper is organized as follows. In the next section, we set up the problem precisely, and in Section 3 we explain the type of solution that we have adopted and give some definitions. In Section 4 different combinational functions are examined, whereas in Section 5 sequential solutions are studied. Results are presented in Section 6, and conclusions are provided in Section 7.
2. Problem Setup
The goal of this work is to apply encoding functions that reduce the maximum bit-level activity over a set of bus line drivers. The bus encoding and decoding functions are performed in such a way that the logic functionality of the circuit is not impacted. For a single-bit bus, there obviously exist no single-bit encoding and decoding functions that can reduce the bit-level activity without modifying the surrounding logic. Instead, consider a multi-bit bus whose lines are bundled together, implying that the drivers are placed in close proximity to each other, as are the bus receivers. We assume that the bus drivers have nearly identical electrical characteristics (i.e., in terms of their size and driver strength). Our goal is to maximize, through bus encoding, the bus lifetime, which is set by the minimum lifetime among all its bit lines. The lifetime of a bus line is mainly determined by the output switching activity of its driver. This is because the drivers for all bus lines are the same, so the only factor impacting the line lifetime is its activity level, which in turn determines the extent of the hot-carrier effect on the line driver.
Figure 1 shows an example of a bus comprising three global lines. One of these lines has much higher activity than the other two and is thus flagged as an HC-sensitive line. The two other lines can be used to reduce the hot-carrier vulnerability of this sensitive line by reducing the output switching activity of its driver. This will likely come at the cost of increasing the activity of the non-sensitive lines. Note, however, that the overall lifetime of the bus still improves, because the maximum activity of the bus lines has been reduced. The encoder and decoder logic can be added with minimal impact on the chip routing because of the proximity of the bus drivers and receivers.
[Diagram: an encoder and a decoder at either end of three global lines, one of which is marked "Sensitive Line".]
Figure 1 A set of 3 global lines that are good candidates for encoding and decoding.
For a given bus, we denote the lifetime of the driver of line i by a r.v. (random variable), $LT_i$. We assume that each $LT_i$ is a lognormally distributed r.v. and that these r.v.'s are independent of each other. We further assume that the expected value of $LT_i$ is inversely proportional to the switching activity of line i. Therefore, the Mean Time To Failure (i.e., the lifetime) of a bus line is inversely proportional to the switching activity of its driver. When a single bit-line driver fails, the whole bus fails, and so does the whole circuit. If we denote the lifetime of the whole bus by $LT_{bus}$, then we have $LT_{bus} = \mathrm{Min}_i\{LT_i\}$. Define $G_{LT_i}(T) \equiv PROB(LT_i \le T)$ as the cumulative distribution function (cdf) of $LT_i$. Similarly, since the $LT_i$ are independent, the cdf of $LT_{bus}$ can be written as:
$$G_{LT_{bus}}(T) \equiv PROB(LT_{bus} \le T) = 1 - \prod_i \left(1 - G_{LT_i}(T)\right).$$
Let $P_X(T)$ denote the (marginal) probability density function (pdf) of r.v. X. The pdf of $LT_{bus}$ is obtained by taking the derivative of its cdf, yielding:
$$P_{LT_{bus}}(T) = \sum_i \Big( P_{LT_i}(T) \prod_{j \ne i} \left(1 - G_{LT_j}(T)\right) \Big).$$
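The two expressions above can be evaluated directly for lognormal line lifetimes. The following Python sketch is a minimal illustration, assuming independent lines and converting each line's mean and variance into the parameters (μ, σ) of the underlying normal distribution; the helper names and the three-line example bus are hypothetical.

```python
import math


def lognormal_params(mean, var):
    """Map a lognormal mean/variance to (mu, sigma) of the underlying normal."""
    sigma2 = math.log(1.0 + var / mean ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def lognormal_cdf(t, mu, sigma):
    """G(T) = PROB(LT <= T) for a lognormal lifetime."""
    if t <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2.0))))


def lognormal_pdf(t, mu, sigma):
    """P(T), the density of a lognormal lifetime."""
    if t <= 0.0:
        return 0.0
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def bus_cdf(t, lines):
    """G_bus(T) = 1 - prod_i (1 - G_i(T)) for independent lines."""
    survive = 1.0
    for mu, sigma in lines:
        survive *= 1.0 - lognormal_cdf(t, mu, sigma)
    return 1.0 - survive


def bus_pdf(t, lines):
    """P_bus(T) = sum_i P_i(T) * prod_{j != i} (1 - G_j(T))."""
    total = 0.0
    for i, (mu_i, sigma_i) in enumerate(lines):
        term = lognormal_pdf(t, mu_i, sigma_i)
        for j, (mu_j, sigma_j) in enumerate(lines):
            if j != i:
                term *= 1.0 - lognormal_cdf(t, mu_j, sigma_j)
        total += term
    return total


# Hypothetical bus: mean lifetimes 1, 2 and 4 (arbitrary units), variance 0.25 each.
bus = [lognormal_params(m, 0.25) for m in (1.0, 2.0, 4.0)]
print(bus_cdf(1.0, bus), bus_pdf(1.0, bus))
```

A quick sanity check is that `bus_pdf` agrees with a finite-difference derivative of `bus_cdf`, and that two identical lines evaluated at their common median give a bus cdf of 1 - 0.5 · 0.5 = 0.75.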
Obviously, $E(LT_{bus}) = E(\mathrm{Min}_i\{LT_i\})$, where E denotes the expectation operator. It is, however, difficult to calculate this expectation value. Instead, we approximate it with $\mathrm{Min}_i\{E(LT_i)\}$, which is accurate if the variances of all $LT_i$ lognormal variables are in the same range. This is because if any of the $LT_i$ variables has an expected value much larger than $\mathrm{Min}_i\{E(LT_i)\}$, then that variable has little impact on $E(LT_{bus})$.
To illustrate this point, let L and S denote two lognormal r.v.'s such that the mean of L is greater than the mean of S. Table 1 compares $\mathrm{Min}(E(L), E(S)) = E(S)$ with $E(\mathrm{Min}(L, S))$ for different combinations of E(L)/E(S) and Var(L)/Var(S), assuming E(S) = 1 and Var(S) = 0.25. This table shows three facts:
1) For a fixed ratio E(L)/E(S), the error between E(Min(L, S)) and E(S) grows as the ratio Var(L)/Var(S) increases.
2) For a fixed ratio Var(L)/Var(S), the error between E(Min(L, S)) and E(S) diminishes as the ratio E(L)/E(S) increases.
3) For E(L)/E(S) > 2, the error between E(Min(L, S)) and E(S) becomes small (at most 8% in Table 1), and for E(L)/E(S) ≥ 5 it essentially vanishes regardless of Var(L)/Var(S), as if L had no impact on E(Min(L, S)).
Table 1- E(Min(L, S)) / E(S)
                          E(L)/E(S)
Var(L)/Var(S)     1      2      3      5      7      10
 1              0.73   0.96   0.99   0.99   0.99   0.99
 4              0.63   0.92   0.99   0.99   0.99   0.99
 9              0.55   0.87   0.97   0.99   0.99   0.99
16              0.50   0.82   0.95   0.99   0.99   0.99
25              0.45   0.77   0.92   0.99   0.99   0.99
Speaking more generally, we know:
$$E(\mathrm{Min}(L,S)) = E(L) + E(S) - \int_{\tau=0}^{\infty} Q_L(\tau) P_S(\tau)\, d\tau - \int_{\tau=0}^{\infty} Q_S(\tau) P_L(\tau)\, d\tau$$
where $Q_X(\tau) = \int_{t=\tau}^{\infty} t\, P_X(t)\, dt$. (The two integrals together equal E(Max(L, S)), so this is simply Min + Max = L + S taken in expectation.) For two lognormal random variables L and S with E(L) ≥ E(S), if the overlap between the pdfs of these two random variables is small, then $Q_L(\tau) P_S(\tau)$ may be approximated by $E(L) P_S(\tau)$, whereas $Q_S(\tau) P_L(\tau)$ is nearly zero. Therefore, in this case,
$$E(\mathrm{Min}(L,S)) \approx E(S) = \mathrm{Min}(E(L), E(S)).$$
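The identity for E(Min(L,S)) can be checked numerically. In this sketch, Q_X(τ) for a lognormal X is evaluated with the standard closed form E(X)·Φ((μ + σ² - ln τ)/σ), and the two integrals are approximated with a midpoint rule truncated at a finite upper limit. The truncation limit, step count, and helper names are our assumptions; they are adequate only when both pdfs are negligible beyond that limit, as for the E(S) = 1, Var(S) = 0.25 settings of Table 1.

```python
import math


def lognormal_params(mean, var):
    """Map a lognormal mean/variance to (mu, sigma) of the underlying normal."""
    sigma2 = math.log(1.0 + var / mean ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pdf(t, mu, sigma):
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def partial_mean(tau, mu, sigma):
    """Q_X(tau) = integral_{t=tau}^{inf} t P_X(t) dt for a lognormal X."""
    return math.exp(mu + sigma ** 2 / 2.0) * std_normal_cdf(
        (mu + sigma ** 2 - math.log(tau)) / sigma)


def expected_min(mean_l, var_l, mean_s, var_s, upper=10.0, steps=20000):
    """E(Min(L,S)) = E(L) + E(S) - int Q_L P_S dtau - int Q_S P_L dtau."""
    mu_l, s_l = lognormal_params(mean_l, var_l)
    mu_s, s_s = lognormal_params(mean_s, var_s)
    h = upper / steps
    acc = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h  # midpoints avoid tau = 0
        acc += partial_mean(tau, mu_l, s_l) * pdf(tau, mu_s, s_s)
        acc += partial_mean(tau, mu_s, s_s) * pdf(tau, mu_l, s_l)
    return mean_l + mean_s - acc * h


# E(S) = 1 and Var(S) = 0.25, as assumed for Table 1.
for ratio_e, ratio_v in [(1, 1), (2, 1), (3, 9)]:
    print(ratio_e, ratio_v, round(expected_min(ratio_e, 0.25 * ratio_v, 1.0, 0.25), 3))
```

For two identically distributed lognormal variables with mean m, E(Min) has the closed form m(1 - erf(σ/2)), which gives a convenient check of the quadrature.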
Although the above conclusions are drawn for the case of only two r.v.'s, our experiments show that they still hold as long as the number of r.v.'s remains small, i.e., ≤ 8. For example, suppose that instead of two r.v.'s, S and L, we have five r.v.'s, S and L1 … L4, such that E(Li) = 3E(S) and Var(Li)/Var(S) < 9 for i = 1…4. In this case, using our approximation results in only an additional 4% error on top of the results of Table 1 (the total error is 7%, which is still rather small).
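For more than two variables, a convenient alternative is the identity E(Min_k X_k) = ∫₀^∞ Π_k PROB(X_k > t) dt, which holds for independent nonnegative random variables. The sketch below applies it to the five-variable example, taking Var(L_i) = 9·Var(S) as the limiting case of the stated bound; the quadrature limits and names are illustrative choices, not part of the original experiments.

```python
import math


def lognormal_params(mean, var):
    """Map a lognormal mean/variance to (mu, sigma) of the underlying normal."""
    sigma2 = math.log(1.0 + var / mean ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def survival(t, mu, sigma):
    """PROB(X > t) for a lognormal X."""
    return 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2.0)))


def expected_min(moments, upper=20.0, steps=40000):
    """E(Min_k X_k) = integral_0^inf prod_k PROB(X_k > t) dt (midpoint rule).

    moments: list of (mean, variance) pairs, one per independent lognormal X_k.
    """
    params = [lognormal_params(m, v) for m, v in moments]
    h = upper / steps
    acc = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        prod = 1.0
        for mu, sigma in params:
            prod *= survival(t, mu, sigma)
        acc += prod
    return acc * h


s = (1.0, 0.25)        # E(S) = 1, Var(S) = 0.25
l = (3.0, 9 * 0.25)    # E(L_i) = 3 E(S), Var(L_i) = 9 Var(S)
pair = expected_min([s, l])
five = expected_min([s, l, l, l, l])
print(round(pair, 3), round(five, 3))
```

Comparing `pair` with `five` shows how much the three additional lines add to the error of approximating E(Min) by E(S) = 1.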
The hot-carrier-aware bus encoding problem differs from the low-power bus encoding problem [12-16], because the objective of the former is to minimize the maximum bit-level activity (a minmax cost function), whereas the latter attempts to minimize the total bit-level activity (a minsum cost function). Note that HC-aware bus encoding can be applied to a bus that has already been optimized for low power dissipation, in order to balance the bit-level transition counts of the bus lines. This may result in some increase in total power dissipation, but it increases the bus reliability over time.
The abovementioned problem setup is based on a number of approximations. It is possible to come up with rare examples in which reducing the maximum bit-level activity of the bus actually makes the bus lifetime shorter. To be more precise, consider a four-bit bus $(S, L_1, L_2, L_3)$, where the expected values of the lognormally distributed lifetime variables, $LT_i$, of the bus lines are $E(S) = 1$ and $E(L_1) = E(L_2) = E(L_3) = 5$. All variances are equal to 0.25. The exact lifetime of this bus (assuming lognormal distributions) is written as:
$$LT_{bus}^{init,exact} = E(\mathrm{Min}_k\{LT_k\}) = E\big(\mathrm{Min}_k\{\mathrm{lognorm}(E(LT_k), Var(LT_k))\}\big).$$
With our proposed approximation, we have:
$$LT_{bus}^{init,approx} = \mathrm{Min}_k\{E(LT_k)\} = 1.$$
Suppose that after HC-aware bus encoding, the activities change such that the new lifetime expectation values are $E(S) = E(L_i) = 1.1$ for all i, and assume that the variances remain the same. From our approximation, the final bus lifetime is $LT_{bus}^{final,approx} = \mathrm{Min}_k\{E(LT_k)\} = 1.1$, which is 10% longer than before; that is, $LT_{bus}^{final,approx} > LT_{bus}^{init,approx}$, and thus the HC-aware encoding appears to have been effective. However, in this case $LT_{bus}^{final,exact} = 0.67$, which shows that the encoding has been unsuccessful. In general, when such a case happens and the activities of several non-critical lines increase significantly, the actual lifetime of the bus, $LT_{bus}^{final,exact}$, may become smaller than $LT_{bus}^{init,exact}$. So the question is under what conditions we can trust our approximations and actually expect a lifetime improvement after HC-aware encoding of a bus.
In the above example, assume that the activity of no line increases by more than 50% relative to its initial activity. Under that assumption, the lifetime will be extended by at least 4% in the worst case. This means that although the absolute accuracy of our approximation may not be good, its relative accuracy (i.e., the fidelity of the approximation) is high.1 As a general rule, given a set of lines, as long as the maximum line activity is reduced by a high percentage (e.g., 50% or more with respect to the initial maximum) or the activities of the non-sensitive lines stay well below the reduced maximum activity after encoding (e.g., less than 50% of the new maximum), we will obtain a lifetime improvement. This is in fact the scenario that we encounter in practice: with the methods that we present, we almost always achieve around 20-50% reduction in the maximum activity (see the Results section), and we never see a large increase in the switching activities of non-critical lines. Therefore, although the actual activity values after encoding might affect the accuracy of our approximation, there will be high fidelity between the exact and approximated lifetimes, and the lifetime will be extended almost as much as we expect.
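The four-bit example can also be reproduced by simulation. This Monte Carlo sketch draws independent lognormal line lifetimes with the stated means and a common variance of 0.25 and averages their minimum; the sample count, seed, and function names are arbitrary choices made for illustration.

```python
import math
import random


def lognormal_params(mean, var):
    """Map a lognormal mean/variance to (mu, sigma) of the underlying normal."""
    sigma2 = math.log(1.0 + var / mean ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


def simulate_bus_lifetime(means, var, samples=100_000, seed=1):
    """Monte Carlo estimate of E(Min_k LT_k) for independent lognormal lines."""
    rng = random.Random(seed)
    params = [lognormal_params(m, var) for m in means]
    total = 0.0
    for _ in range(samples):
        total += min(rng.lognormvariate(mu, sigma) for mu, sigma in params)
    return total / samples


before = [1.0, 5.0, 5.0, 5.0]   # E(S), E(L1), E(L2), E(L3) before encoding
after = [1.1] * 4               # all expectations equal to 1.1 after encoding
init_exact = simulate_bus_lifetime(before, 0.25)
final_exact = simulate_bus_lifetime(after, 0.25)
init_approx, final_approx = min(before), min(after)
print(f"approx: {init_approx} -> {final_approx}")
print(f"exact (simulated): {init_exact:.3f} -> {final_exact:.3f}")
```

With these settings the simulated initial lifetime is close to 1 while the final one is close to the 0.67 quoted above, even though the approximation predicts a 10% gain.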
Finally, we note that RT-level design techniques for distributing the bus activity have been proposed in the past [7]. These methods bind and schedule the data transfers of a control and data flow graph representing the application so as to distribute the switching activities evenly. In our proposed approach, encoding is performed after completing the logic design, with some level of information from placement and routing. This is the stage in the design flow at which the arrival times and slews of logic signals and the capacitances of interconnect lines can be estimated well. These physical attributes of the design are needed to first identify the degradation-prone drivers to which the encoding function is applied.
1 Let f be an approximation of a function g. We say that f has high fidelity with respect to g if f(x) > f(y) implies that g(x) > g(y).
3. A Methodology for Reducing the Maximum Bus Activity
Based on the discussion in the previous section, we next examine encoding techniques that can reduce the maximum transition count of individual lines of a bus.
First, we give some definitions and describe the notation used throughout this paper.
Definition. A trace is a sequence of binary numbers or symbols that consecutively appear on a bus and is represented by $T = \langle V_1, V_2, \ldots, V_{LE} \rangle$. Each $V_i$ is an M-bit binary number, where M is the width of the bus and LE is the length of the trace. The total number of transitions on line i of a trace is denoted by $TR_i$ and is called the bit transition of line i. Dividing this count by the number of clock cycles gives the activity of line i. The maximum of these bit transitions for a trace is denoted by M(T), i.e., $M(T) = \max_i\{TR_i\}$.
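The definitions of TR_i and M(T) translate directly into code. In this minimal sketch, symbols are Python integers and line 0 is taken to be the LSB (an indexing convention we assume); the example trace is made up and mimics a bus carrying small positive numbers.

```python
def bit_transitions(trace, width):
    """Return [TR_0, ..., TR_{width-1}]: transitions on each line (line 0 = LSB)."""
    counts = [0] * width
    for prev, curr in zip(trace, trace[1:]):
        changed = prev ^ curr
        for line in range(width):
            counts[line] += (changed >> line) & 1
    return counts


def max_bit_transitions(trace, width):
    """M(T) = max_i TR_i."""
    return max(bit_transitions(trace, width))


# A small 3-bit trace carrying small positive numbers (illustrative).
trace = [0b000, 0b001, 0b010, 0b011, 0b010, 0b001]
print(bit_transitions(trace, 3))      # [5, 2, 0]
print(max_bit_transitions(trace, 3))  # 5
```

As expected for small numbers, the LSB carries most of the transitions while the MSB never toggles.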
Definition. Each $V_i$ in a trace T may be treated as an M-bit binary number. Let $N_i$ denote the number of symbols that are equal to i, $0 \le i < 2^M$, i.e.,
$$N_i = \sum_{1 \le j \le LE} \delta(V_j - i)$$
where δ is the Kronecker delta function.
Definition. We denote a function mapping on trace T as F(T), where $F: V \in T \rightarrow F(V) \in F(T)$. The function is reversible when $F^{-1}(F(T)) = T$.
Definition. For a trace T, we define the inter-trace B as the trace obtained by performing the Exclusive-OR (XOR) of consecutive symbols in T, i.e., $B = \langle U_1, U_2, \ldots, U_{LE-1} \rangle$, where $U_i = V_i \oplus V_{i+1}$ for i ranging from 1 to LE-1. We call each symbol $U_i$ an inter-symbol. Furthermore, we use the notation B = X(T) (X stands for XOR) to show that B is the inter-trace of T. Clearly, the number of transitions that occur as the value of the bus changes from $V_i$ to $V_{i+1}$ is determined by the number of ones in the inter-symbol $U_i$. For instance, on a 5-bit bus, $U_2 = 00011$ implies that, moving from $V_2$ to $V_3$, there will be transitions on the first and second lines (counting from the least significant bit).
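A sketch of the inter-trace X(T), with each inter-symbol's popcount giving the number of lines that toggle at that step. The three-symbol trace is hypothetical, chosen so that U_2 = 00011 as in the example above.

```python
def inter_trace(trace):
    """B = X(T): XOR of consecutive symbols, U_i = V_i ^ V_(i+1)."""
    return [a ^ b for a, b in zip(trace, trace[1:])]


def transitions_per_step(trace):
    """Number of lines that toggle at each step = number of ones in U_i."""
    return [bin(u).count("1") for u in inter_trace(trace)]


trace = [0b00100, 0b00110, 0b00101]  # V_1, V_2, V_3 on a 5-bit bus (hypothetical)
print([format(u, "05b") for u in inter_trace(trace)])  # ['00010', '00011']
print(transitions_per_step(trace))                     # [1, 2]
```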
We are interested in finding a function that reduces the maximum bit transition count, M(T), of a trace T. Different traces appear on a given bus in different epochs. These traces may be similar to each other in terms of a number of characteristics. Of course, we would like to design an encoder and decoder that work well for all of these traces. Therefore, we approach the problem with the following methodology: given a certain amount of information about a trace, we investigate whether this information is sufficient to design appropriate encoding/decoding functions that reduce the maximum bit transition count. We then devise the kind of logic functions, combinational or sequential, that are capable of accomplishing this goal.
We therefore need to characterize and classify traces based on their characteristics. Indeed, the existence and construction of a one-to-one (reversible) function F that minimizes M(T) depend strongly on the set of common characteristics extracted from these traces.
Definition. An equivalence class of traces CL(c) is the set of traces T that share some characteristic of interest, c. In other words, they cannot be distinguished from each other as far as characteristic c is concerned.
Example classes are $CL(N_0 = 100)$, i.e., traces with exactly one hundred symbols equal to zero; $CL(M = 1, LE = 200, N_0 = N_1)$, i.e., one-bit traces of 200 symbols (bits) with equal numbers of ones and zeros; $CL(TR_i = K_i \text{ for every } i)$, i.e., all traces that have $K_i$ transitions on line i, with $\{K_i\}$ given; CL(a $2^M$-state lag-one Markov source), etc. Notice that different classes may have common members.
Definition. Bit-level Transition Balancing Problem: Given a class of traces CL(c), the bit-level transition balancing (BTB) problem is to find a function ψ that
1. is reversible, and
2. satisfies M(ψ(T)) < M(T) for every member T of CL(c).
Lemma 1. No universal function, combinational or sequential, can always reduce M(T) for all traces.
This is an intuitive, yet important, result. Notice that the existence of such a function would conflict with information theory principles, because repeated application of that function would eliminate all transitions in a trace without loss of information. This lemma is significant in the sense that it prompts us to characterize the trace first and then find solutions based on that characteristic. There is no magical logic that can always reduce maximum activity; we must have a certain level of knowledge about the bus in order to come up with a function that always works.
We investigate combinational and sequential encoders and decoders separately. Combinational functions, if they can achieve the goal, are much better choices, as they need neither a clock nor additional flip-flops. For sequential functions, we recognize two different categories. The first is a special category of sequential circuits that we call inter-sequential functions, and the second makes use of general sequential functions. Inter-sequential functions, as we will see, are combinational functions that are applied to the inter-trace B instead of the trace T, and they are usually much simpler than general finite state machines. We first look at combinational logic in detail.
4. Coding with Combinational Functions
In this section, we investigate when combinational functions are capable of reducing M(T). A combinational function $F: \{0,1\}^M \rightarrow \{0,1\}^M$ can be used for the encoding task only if it is reversible. The total number of such functions is $(2^M)!$, corresponding to the permutations of all symbols in the M-bit space. However, since transitions on lines are of primary significance, these functions can be partitioned into NP-equivalence sets (NP stands for Negation and Permutation). Two functions are NP-equivalent if and only if one can be transformed into the other by inverting one or more output columns and/or swapping some columns. Since no nontrivial negation or permutation of the output columns leaves a bijection unchanged, every NP-equivalence set contains exactly $M! \cdot 2^M$ functions, and therefore the number of NP-equivalence sets is $(2^M)! / (M! \cdot 2^M)$. For example, for M = 2, there are three distinct classes. One function from each class is shown in Table 2 (F1, F2 and F3). Each function shown maps the all-zeros symbol to all zeros. Some other members of class [F1] are also shown.
Table 2- Representatives of three NP-equivalence sets in the 2-bit space plus some other members of class [F1].
X     F1(X)   F2(X)   F3(X)   Some members of [F1]
00    00      00      00      00    10
01    01      01      11      10    11
10    10      11      10      01    00
11    11      10      01      11    01
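The count (2^M)!/(M!·2^M) can be cross-checked for small M by brute force. The sketch below (our own helper, not from the original work) reduces every bijection of {0,1}^M to a canonical representative under output-column permutation and negation, and also confirms that the rows of Table 2 fall into the expected classes. Symbols are written as integers with bit 0 as the rightmost column; exhaustive enumeration is only practical for M = 2 here.

```python
from itertools import permutations
from math import factorial


def np_canonical(mapping, m):
    """Smallest image of a bijection under output-column permutation and negation.

    mapping[x] = F(x) for x in 0 .. 2**m - 1 (bit 0 is the rightmost column).
    """
    best = None
    for perm in permutations(range(m)):
        for mask in range(2 ** m):  # which output columns to invert
            image = []
            for y in mapping:
                z = 0
                for k in range(m):  # output column k takes original column perm[k]
                    z |= ((y >> perm[k]) & 1) << k
                image.append(z ^ mask)
            image = tuple(image)
            if best is None or image < best:
                best = image
    return best


def count_np_classes(m):
    """Brute-force number of NP-equivalence sets among reversible m-bit functions."""
    return len({np_canonical(f, m) for f in permutations(range(2 ** m))})


def np_class_formula(m):
    return factorial(2 ** m) // (factorial(m) * 2 ** m)


# Table 2, written as (F(00), F(01), F(10), F(11)).
F1, F2, F3 = (0, 1, 2, 3), (0, 1, 3, 2), (0, 3, 2, 1)
MEMBERS_OF_F1 = [(0, 2, 1, 3), (2, 3, 0, 1)]

print(count_np_classes(2), np_class_formula(2))  # 3 3
print(len({np_canonical(f, 2) for f in (F1, F2, F3)}))  # 3
print(all(np_canonical(g, 2) == np_canonical(F1, 2) for g in MEMBERS_OF_F1))  # True
```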
Definition. Consider a function F. We say that F replace-inverts the ith bit if some output bit is identical to the ith input bit or to its inversion. Clearly, when a function replace-inverts a bit, it does not change the bit transition count of that line.
Lemma 2. A characteristic (trace) class defined by a given set of bit transitions $\{TR_i\}$ is unbalanceable under all reversible combinational functions. More precisely, for every reversible combinational function F, a trace can be found for which F actually increases M(T):
$$\forall c = \{TR_i\},\ \forall F,\ \exists T \in CL(c): M(F(T)) > M(T).$$
Proof. For a function F and a given set of bit transitions, we construct a counterexample by building a trace whose maximum number of transitions increases after applying F, i.e., we build a trace that overwhelms the function. If the number of transitions of a line is zero, that line can be omitted; therefore, without loss of generality, we assume that all $TR_i$'s are greater than zero.
Assume $TR_i$ is the maximum number of transitions. Any function that reduces M(T) cannot replace-invert this bit. The trace T can be built as follows.
The first $TR_i + 1$ symbols of the trace alternate between two symbols that differ only in the ith position, say $V_1$ and $V_2$. Obviously, $F(V_1)$ and $F(V_2)$ must differ in at least one bit (say the jth position), which causes the same $TR_i$ transitions on that bit in F(T). Now we add one more symbol to this trace, which causes the maximum number of transitions to exceed $TR_i$. For this new symbol, the ith bit is the same as in the previous symbol. The remaining bits are allowed to change without violating the bit transition constraints because all $TR_i$'s are greater than zero. We pick, from the $2^{M-1}$ possible symbols, one whose mapping causes a transition in the jth position under F. Such a symbol exists because F does not replace-invert the ith bit. The rest of the trace can easily be generated to fulfill all the remaining bit transition constraints. F is overwhelmed, and the proof is complete. ∎
Therefore, knowing the bit transitions is not enough to solve the BTB problem. Let us think in a new direction and assume that the information we have about a trace is the exact count of each symbol (the number of times each symbol appears in the trace). More precisely, for trace T, $\{N_i, 0 \le i < 2^M\}$ is given.
Lemma 3. A characteristic class defined by $\{N_i\}$ with $N_i > 0$, $0 \le i < 2^M$, is unbalanceable under all reversible combinational functions, that is:
$$\forall c = \{N_i : N_i > 0\},\ \forall F,\ \exists T \in CL(c): M(F(T)) > M(T).$$
Proof. If $\exists F\ \forall T \in CL(\{N_i : N_i > 0\}): M(F(T)) < M(T)$, then $\forall T \in CL(\{N_i : N_i = 1\}): M(F(T)) < M(T)$. That is, if a function reduces M(T) for every trace that is a member of $CL(\{N_i : N_i > 0\})$, it must also work for an arbitrary trace containing exactly one of each symbol. This is because if equal symbols are ordered consecutively in a trace of $CL(\{N_i : N_i > 0\})$, their effect on M(T) is as if only one of them existed in the trace. Therefore, the problem reduces to finding a function that reduces M(T) for a trace that contains every symbol exactly once. Now suppose that the input trace with $N_i = 1$ is arranged as a balanced Gray code [10]. By definition, this yields the minimum achievable M(T) at the input. Since F is a permutation of the symbols, F(T) is again a trace containing every symbol exactly once, so no function can reduce M(T) for this trace, and the proof is complete. ∎
This means that as long as all symbols are present in the trace, no function can be guaranteed to reduce the maximum transition count of the trace. This result is intuitively expected. Characterizing by the set $\{N_i\}$ is not suitable for grouping traces as far as the BTB problem is concerned, because the relative positions of the symbols in the trace are a determining factor, yet they are completely ignored by this characterization. Interestingly, if not all symbols are present in a trace, the class may become balanceable. This can easily be shown with the following example (Table 3). T is a trace in the 8-bit space whose symbols are shown in the first column; each of these eight symbols appears exactly once ($N_i = 1$), and $N_i = 0$ for all other i. It is easy to verify that no matter how the symbols are ordered, M(T) will be greater than 2 (the eight symbols differ only in their three least significant bits, so any ordering produces at least 7 transitions spread over 3 lines). The second column shows F(V). It is not difficult to verify that M(F(T)) will always be 2 (each output line carries the single 0 of exactly one code word, so it toggles only when the trace enters or leaves that code word). In fact, it is the redundancy in the encoding that enables us to find a combinational function with the desired property. An interesting problem is to find those sub-classes (characterized by the number of symbols) for which the problem is balanceable. This is an open problem that we have not been able to solve yet. Next, we look at other methods to characterize a trace.
Table 3- Encoding with redundancy to reduce maximum transitions
V (M(T) > 2)    F(V) (M(F(T)) = 2)
11111 000       11111110
11111 001       11111101
11111 011       11111011
11111 111       11110111
11111 101       11101111
11111 100       11011111
11111 110       10111111
11111 010       01111111
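Both claims about Table 3 can be verified exhaustively over all 8! orderings of the trace. This sketch hard-codes the table's mapping; the helper name is ours, and line 0 is taken as the rightmost bit.

```python
from itertools import permutations


def bit_transitions(trace, width):
    """TR_i for each of `width` lines (line 0 = rightmost bit)."""
    counts = [0] * width
    for prev, curr in zip(trace, trace[1:]):
        changed = prev ^ curr
        while changed:  # visit only the lines that toggled
            lowest = changed & -changed
            counts[lowest.bit_length() - 1] += 1
            changed ^= lowest
    return counts


# Table 3: each 8-bit symbol V (11111xxx) and its one-cold code word F(V).
TABLE3 = {
    0b11111000: 0b11111110,
    0b11111001: 0b11111101,
    0b11111011: 0b11111011,
    0b11111111: 0b11110111,
    0b11111101: 0b11101111,
    0b11111100: 0b11011111,
    0b11111110: 0b10111111,
    0b11111010: 0b01111111,
}

plain, coded = set(), set()
for order in permutations(TABLE3):  # all 8! orderings of the trace
    plain.add(max(bit_transitions(order, 8)))
    coded.add(max(bit_transitions([TABLE3[v] for v in order], 8)))

print(min(plain), sorted(coded))  # 3 [2]
```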
Definition. $L_i$ denotes the number of inter-symbols in the inter-trace B of a trace T that are equal to i, $0 \le i < 2^M$, i.e.,
$$L_i = \sum_{1 \le j \le LE-1} \delta(U_j - i).$$
Another way to increase the amount of information compared to the case in which only the bit transitions of a trace are known is to provide the set of $L_i$ values, i.e., $\{L_i, i = 0, \ldots, 2^M - 1\}$. Obviously, the bit transition count for line j ($TR_j$) is simply calculated by adding up those $L_i$'s that correspond to an inter-symbol with a one in its jth position, i.e., $TR_j = \sum_{0 \le i < 2^M} L_i \cdot (\text{jth bit of } i)$. Note that M(T) is the same for all $T \in CL(\{L_i\})$. The same is true when traces are characterized by $\{TR_i\}$. But when traces are modeled by $\{N_i\}$, M(T) differs among instances of the class. This motivates the following definition.
Definition. Given a characteristic class of traces CL(c) we say that the class CL(c) is uniform if all of its traces have
equal M(T) values. Furthermore, a uniform characteristic class is regular under a set of reversible functions {Fi} if,
for each F∈ {Fi}, characteristic class F(CL(c)) is uniform.
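Because TR_j depends only on the inter-symbol counts, M(T) can be computed from {L_i} without the trace itself. A minimal sketch with a made-up 2-bit trace follows; bit 0 is the rightmost bit, and for M = 2 the computation reproduces TR_0 = L_1 + L_3 and TR_1 = L_2 + L_3.

```python
from collections import Counter


def inter_symbol_counts(trace):
    """{i: L_i}: how often each inter-symbol U = V_j ^ V_(j+1) occurs."""
    return Counter(a ^ b for a, b in zip(trace, trace[1:]))


def transitions_from_counts(counts, width):
    """TR_j = sum_i L_i * (jth bit of i)."""
    return [sum(n for i, n in counts.items() if (i >> j) & 1) for j in range(width)]


trace = [0b00, 0b01, 0b11, 0b10, 0b10, 0b01]  # illustrative 2-bit trace
L = inter_symbol_counts(trace)
print(dict(sorted(L.items())))        # {0: 1, 1: 2, 2: 1, 3: 1}
print(transitions_from_counts(L, 2))  # [3, 2]
```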
In this work, we call combinational functions with the property F(V1 ⊕ V2) = F(V1) ⊕ F(V2) inter-combinational functions. In other words, the mapping of the inter-symbol of two symbols equals the inter-symbol of their mappings. Of course, not all combinational functions in the M-bit space have this property; for large M, only a small portion of them are inter-combinational. If a combinational function is inter-combinational, then F(00…0) must equal 00…0, since F(00…0) = F(V ⊕ V) = F(V) ⊕ F(V) = 00…0. To completely specify the rest of the function, it is enough to specify the output for M linearly independent inputs (linear independence means that none of these values can be generated by XORing some of the other ones). An instance of a linearly independent set is the set of one-hot symbols (binary numbers with a single one in their binary representation). Any other input can be decomposed into an XOR of these one-hot values. Hence, the mapping of any symbol under F is uniquely determined once the mapping of an independent set is known.
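The construction just described can be sketched in Python as follows, assuming M-bit symbols are stored as integers and a function is given as a lookup table indexed by its input. The helper names are hypothetical; the check is the XOR property from the definition, and the construction builds F from its values on the one-hot inputs.

```python
def linear_map(images):
    """Lookup table of the inter-combinational F with F(1 << j) = images[j]."""
    m = len(images)
    table = []
    for v in range(1 << m):
        out = 0
        for j in range(m):
            if (v >> j) & 1:        # v contains the one-hot symbol 1 << j
                out ^= images[j]
        table.append(out)
    return table


def is_inter_combinational(table):
    """True if F(a XOR b) == F(a) XOR F(b) for every pair of symbols."""
    n = len(table)
    return all(table[a ^ b] == table[a] ^ table[b] for a in range(n) for b in range(n))


def is_reversible(table):
    """True if the table is a bijection on the symbol space."""
    return sorted(table) == list(range(len(table)))


F = linear_map([0b01, 0b11])
print(F, is_inter_combinational(F), is_reversible(F))  # [0, 1, 3, 2] True True
```

The resulting table is the familiar 2-bit binary-to-Gray conversion v XOR (v >> 1), which is therefore inter-combinational. A table with F(00) ≠ 00, such as [1, 0, 2, 3], fails the check, and linear_map([0b01, 0b01]) is not reversible because its images are not linearly independent.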
If a function is inter-combinational, it is possible to calculate M(F(T)) for T ∈ CL({Li}), where the Li's are the numbers of inter-symbols: each inter-symbol i of T becomes the inter-symbol F(i) of F(T), so the output counts are L′F(i) = Li.
Note. Characteristic class CL({Li}) is uniform. However, it is regular only under the inter-combinational subset of combinational functions.
Consider an example with M=2. In this case, interestingly, each NP-equivalence class has an inter-combinational representative. These representatives are exactly the functions reported in Table 2; F1, F2 and F3 are all inter-combinational functions. Now suppose that a set of traces (M=2) is characterized by the number of inter-symbols in the trace, as shown in Table 4. A 01 inter-symbol occurs during one of the following events on the bus: 00→01, 01→00, 11→10 or 10→11, and the total number of such events is L1.
Table 4- Modeling a trace based on number of inter-symbols.
Inter-Symbol U    # (Number)
00                L0
01                L1
10                L2
11                L3
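For such a trace, M(T) = Max{L1+L3, L2+L3}, since line 0 toggles on inter-symbols 01 and 11 and line 1 toggles on 10 and 11. An inter-combinational function permutes the three non-zero inter-symbols, so the smallest achievable value is the sum of the smallest and the largest of {L1, L2, L3}, obtained by moving the smallest count onto inter-symbol 11. It follows that there exists a function in the 2-bit space that always reduces M(T) if and only if L3 > Min{L1, L2} and L1 ≠ L2. In such a case, either F2 or F3 will reduce M(T). For example, if L2 is the minimum of the three, i.e., L2 < L1 < L3, then applying F2 to T (refer to Table 2) results in M(F(T)) = Max{L2+L1, L2+L3} = L3+L2, which is less than M(T) = Max{L1+L3, L2+L3} = L1+L3. A similar approach may be used for M>2.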
Example. Consider the two least significant bits of an instruction address bus. Suppose that the instructions are sequential 80% of the time. L1, L2, and L3 can be determined as follows (here the Li's are given as the percentage of the corresponding inter-symbol among all inter-symbols). Because the two LSBs of sequential addresses cycle through 00→01→10→11→00, the 80% sequential instructions contribute 40% to L1 and 40% to L3, whereas the 20% non-sequential ones contribute 5% each to L0, L1, L2 and L3 (assuming uniformly random values). Thus, L0=5%, L1=45%, L2=5%, L3=45%, and M(T)=90%. After applying F2 to this bus, the new Li's are L0=5%, L1=45%, L2=45%, L3=5%, and thus M(F(T))=50%.
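The {Li}-based test can be automated by brute force for very small M. The Python sketch below (hypothetical helper names) enumerates every reversible inter-combinational function as a lookup table built from the images of the one-hot inputs, moves each count Li to the output inter-symbol F(i), and keeps the smallest resulting maximum line count. The number of candidates grows very quickly with M, so this is only meant for checks like the example above.

```python
from itertools import product


def inter_combinational_maps(m):
    """Yield every reversible inter-combinational F on m bits as a lookup table."""
    size = 1 << m
    for images in product(range(1, size), repeat=m):   # F(1 << j) for each j
        table = [0] * size
        for v in range(1, size):
            low = v & -v                                # lowest one-hot part of v
            table[v] = table[v ^ low] ^ images[low.bit_length() - 1]
        if len(set(table)) == size:                     # keep bijections only
            yield table


def max_from_counts(L, m):
    """M(T) of any trace whose inter-symbol counts are L."""
    return max(sum(n for i, n in enumerate(L) if (i >> j) & 1) for j in range(m))


def best_inter_combinational(L, m):
    """Smallest M(F(T)) over reversible inter-combinational F, given {L_i}."""
    best = max_from_counts(L, m)                        # identity is a candidate
    for F in inter_combinational_maps(m):
        mapped = [0] * len(L)
        for i, n in enumerate(L):
            mapped[F[i]] += n                           # inter-symbol i becomes F(i)
        best = min(best, max_from_counts(mapped, m))
    return best


L = [5, 45, 5, 45]                                      # L0..L3 in percent
print(max_from_counts(L, 2), best_inter_combinational(L, 2))  # 90 50
```

For M = 2 the generator yields 6 tables and for M = 3 it yields 168, the number of invertible 3x3 matrices over GF(2), which illustrates how fast the search space grows.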
For T characterized by {Li}, we thus have a methodology to determine whether an inter-combinational function F yields M(F(T)) < M(T). Hence, by testing the trace against all inter-combinational functions, it is possible to decide whether it is balanceable under inter-combinational functions. However, this test is not possible for large M, since the required time grows exponentially with M. In practice, we do not want to consider M larger than 6 or 7 because of the increasing complexity of the encoding/decoding functions. Another point is that inter-combinational functions are only a small subset of the set of combinational functions. It is very difficult to analyze the effect of a non-inter-combinational function (i.e., a combinational function that is not inter-combinational) on a trace characterized by a set of Li's. We have not been able to find a single CL({Li}) that is balanceable under a non-inter-combinational function. Therefore, we conjecture that no characteristic class CL({Li}) is balanceable under non-inter-combinational functions, although we have not been able to prove this statement. If true, this means that if CL({Li}) is unbalanceable under inter-combinational functions, then it is unbalanceable under all combinational functions. The basis for our conjecture is the fact that CL({Li}) is regular only under the set of inter-combinational functions. Therefore, it is not possible to control the variation of M(F(T)) across traces under non-inter-combinational functions.
Definition. A trace class CL(c) may be defined by a lag-one Markov Source R(I,S), that is, each distinct symbol V
in any trace T in this class denotes a state s ∈ S of the Markov source, and each pair of consecutive vectors
<Vi,Vi+1> defines a transition edge in the Markov source between si and si+1. I denotes the set of external inputs of
the Markov source.
A Markov source is completely specified if the probability of being in each state and the conditional probability of
transitioning from one state to another are known. The transition probability matrix of R completely defines the
characteristic class CL(c=R(I,S)). We assume that external input values are uniformly distributed in the input space.
Definition. We define a reversible function mapping F on R(I,S) as R(I,F(S)), with F⁻¹(F(R(I,S))) = R(I,S).
Lemma 4. A characteristic class defined by a Markov source R(I,S) is regular under all reversible combinational
functions.
Proof. CL(c=R(I,S)) is uniform, and it is mapped to CL(R(I,F(S))) under a combinational function F. CL(R(I,F(S))) is itself uniform; therefore, R(I,S) is regular under all reversible combinational functions.
For small M, it is thus possible to construct the output Markov source for every function F and compute the new M(F(T)). Therefore, balanceability can be checked by applying one function from each NP-equivalence class to the Markov source. The problem of finding the best of these functions is similar to the minimum Hamming-weight state assignment problem, which is known to be NP-complete [19]. For that reason, we developed a heuristic algorithm for our problem. Of course, if M is small enough, brute-force checking leads to the optimum solution.
Definition. Minimum Max-Transition State Assignment: Find a reversible function F such that for traces of class
R(I,F(S)), M(T) is minimized.
We next present a heuristic algorithm, named PermuteStates, for solving this problem. The complexity of step 3 in this heuristic algorithm is 2^MaxSetSize. The larger MaxSetSize is, the closer the heuristic solution will be to the optimum one.
ALGORITHM (PermuteStates)
1. Generate an initial state assignment by setting F(si) = i, si ∈ S;
2. SetSize = 2;
3. for every subset H of S with cardinality SetSize
4.     for every possible permutation of states in H
5.         if the permutation reduces M(T), then accept it and break;
6. if a permutation has been accepted, then SetSize = 2 and goto step 3;
7. SetSize = SetSize + 1;
8. if SetSize > MaxSetSize then exit;
9. goto step 3;
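A compact Python rendering of PermuteStates is sketched below. It is an illustration under stated assumptions, not the authors' implementation: the Markov source is replaced by edge counts estimated from a sample trace (proportional to the state and transition probabilities for a long trace), every one of the 2^M codes is treated as a state, and the names line_transitions and permute_states are hypothetical. Each accepted permutation strictly lowers the cost, so the loop terminates.

```python
from collections import Counter
from itertools import combinations, permutations


def line_transitions(edges, code, m):
    """Per-line transition counts when state s is driven on the bus as code[s]."""
    counts = [0] * m
    for (s, t), n in edges.items():
        diff = code[s] ^ code[t]
        for j in range(m):
            if (diff >> j) & 1:
                counts[j] += n
    return counts


def permute_states(trace, m, max_set_size=4):
    """Heuristic state assignment minimizing the maximum line transition count."""
    edges = Counter(zip(trace, trace[1:]))        # lag-one edge counts from the trace
    states = list(range(1 << m))
    code = {s: s for s in states}                 # step 1: F(s_i) = i
    best = max(line_transitions(edges, code, m))
    set_size = 2                                  # step 2
    while set_size <= max_set_size:               # step 8
        accepted = False
        for subset in combinations(states, set_size):      # step 3
            current = [code[s] for s in subset]
            for perm in permutations(current):              # step 4
                trial = dict(code)
                trial.update(zip(subset, perm))
                cost = max(line_transitions(edges, trial, m))
                if cost < best:                             # step 5
                    code, best, accepted = trial, cost, True
                    break
            if accepted:
                break
        set_size = 2 if accepted else set_size + 1          # steps 6 and 7
    return code, best


trace = [i % 4 for i in range(40)]               # 2-bit sequential trace
print(permute_states(trace, 2)[1])               # 20
```

On this 2-bit sequential trace the identity assignment gives 39 transitions on line 0; the first accepted swap already reaches 20, which is the lower bound for 39 transitions spread over two lines.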
A practical example of Markov source modeling is the characteristic class representing a sequential trace. We define a sequential Markov source as a source that generates a trace in which each symbol is either equal to the previous symbol or the previous symbol incremented by one. For a sequential trace, the best transition balancing is achieved by balanced Gray codes. Balanced Gray codes are Gray codes that result in almost equal numbers of transitions on all lines of the trace if the input trace is sequential [11]. By “almost equal,” we mean that the difference between the transition counts of any two lines is not greater than two. If M is a power of two, it is possible to construct a fully balanced Gray code, i.e., one that results in exactly the same number of transitions on all bit lines; such codes are therefore preferable. Balanced Gray codes, which in general are not inter-combinational functions, are the best solutions for a sequential Markov source [11]. Instruction address traces fit very well into the class of sequential traces. Here, we model instruction and data addresses as a lag-one Markov source. As will be seen in the experimental results section, a 6-bit balanced Gray code is a good practical solution for instruction address buses.
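For small widths, a balanced Gray code can be found by a plain backtracking search, sketched below in Python. This is a substitute for the constructive method of Bhat and Savage [11], not a reproduction of it, and the function name is hypothetical. The search walks a Hamiltonian cycle of the M-cube, caps each line's flip count, and accepts a cycle only if all counts are within two of each other. Because the code is cyclic, every line flips an even number of times, which is why the cap is rounded up to an even number.

```python
def balanced_gray_code(m):
    """Return (codewords, flips_per_line) for a cyclic m-bit Gray code whose
    per-line flip counts differ by at most two, or None if none is found."""
    n = 1 << m
    cap = -(-n // m)            # ceil(n / m)
    cap += cap % 2              # a cyclic code flips each line an even number of times
    counts = [0] * m
    path = [0]
    used = [False] * n
    used[0] = True

    def extend():
        last = path[-1]
        if len(path) == n:
            # The closing step back to 0 must flip exactly one line.
            if last & (last - 1):
                return False
            j = last.bit_length() - 1
            if counts[j] >= cap:
                return False
            counts[j] += 1
            if max(counts) - min(counts) <= 2:
                return True
            counts[j] -= 1
            return False
        for j in range(m):
            nxt = last ^ (1 << j)
            if counts[j] < cap and not used[nxt]:
                used[nxt] = True
                counts[j] += 1
                path.append(nxt)
                if extend():
                    return True
                path.pop()
                counts[j] -= 1
                used[nxt] = False
        return False

    return (path, counts) if extend() else None


code, flips = balanced_gray_code(4)
print(flips)  # [4, 4, 4, 4]
```

To use the result as an encoder for a sequential bus, address k would be driven as code[k % 2**m]; consecutive addresses then differ in exactly one line. The search is exhaustive, so its running time grows quickly beyond small widths.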
Before starting the next section, let us note the relationship among the different characteristic classes that we have considered so far. We say that characteristic c is reduced to characteristic c′ if every T ∈ c is also a T ∈ c′, and we write c < c′. If characteristic c′ is enough to determine balanceability under a set of functions, then c will also be enough: we can simply map any c to the corresponding c′ and do the check over characteristic c′. Now consider the four different characteristic classes that we have examined so far. We have:
{R(I,S) : Markov source} < {Ni : Number of symbols} and
{R(I,S)} < {Li : Number of inter-symbols} < {TRi : Bit transitions}.
We will use these relationships when analyzing sequential functions in the next section.
5. Coding with Sequential Functions
In this section, first, we examine a special category of sequential circuits that we refer to as inter-sequential and later
we study the general class of sequential circuits. We will see that sequential circuits are the most effective functions
for solving BTB because they can balance classes with the least given information, which are the characteristic
classes characterized by bit transitions.
5.1 Inter-Sequential Functions
In inter-sequential encoders, registers and some XOR gates are used to generate the inter-symbols, and then a combinational function is applied to the inter-trace to generate the output inter-trace. Finally, the output trace is recovered from its inter-trace, again by using XOR gates and registers.
Definition. We define an inter-sequential function mapping G on T as G(T), where X(G(T)) = F(X(T)) and G(V1) = V1 (F is a combinational function).
Note that X(T) is the inter-trace of T. In addition, the equation only determines the inter-trace of G(T); the actual output trace also depends on how we define the mapping of the first symbol. The function is reversible if G⁻¹(G(T)) = T, and this happens if and only if F is a reversible combinational function. For inter-sequential functions, we have F(Vi ⊕ Vi+1) = G(Vi) ⊕ G(Vi+1), an equation similar to the one we had for inter-combinational functions.
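The encoder/decoder pair implied by this definition can be sketched in Python, assuming integer symbols and an F given as a lookup table over all 2^M inter-symbols (including zero); the names are hypothetical. Each side only has to remember its previous symbol.

```python
def inter_seq_encode(trace, F):
    """G(T): output inter-symbols are F applied to input inter-symbols; G(V1) = V1."""
    out = [trace[0]]
    for prev, cur in zip(trace, trace[1:]):
        out.append(out[-1] ^ F[prev ^ cur])     # XOR the mapped inter-symbol onto the bus
    return out


def inter_seq_decode(bus, F):
    """Inverse of inter_seq_encode: apply F^-1 to the bus inter-symbols."""
    F_inv = {v: u for u, v in enumerate(F)}
    dec = [bus[0]]
    for prev, cur in zip(bus, bus[1:]):
        dec.append(dec[-1] ^ F_inv[prev ^ cur])
    return dec


F = [3, 1, 2, 0]                # swap inter-symbols 00 and 11 (2-bit bus)
trace = [0, 3, 0, 3, 0]
bus = inter_seq_encode(trace, F)
print(bus, inter_seq_decode(bus, F))  # [0, 0, 0, 0, 0] [0, 3, 0, 3, 0]
```

Here F maps the non-zero inter-symbol 11 to 00, so a trace alternating between 00 and 11 produces no bus transitions at all; the price is that a repeated symbol (inter-symbol 00) now toggles both lines.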
Inter-sequential functions are important for two reasons. First, their output depends only on the current and the previous symbols, i.e., Vi and Vi-1. Therefore, they only need to store the previous symbol, which means less overhead compared to a general sequential encoder. Second, they are interesting to analyze because of their similarity to inter-combinational circuits.
Lemma 5. A characteristic class defined by bit transitions {TRi} is not balanceable under any reversible inter-sequential function. More precisely, for every reversible inter-sequential function G, a trace T can be found for which M(G(T)) is higher than M(T):
$\forall c = \{TR_i\},\ \forall G,\ \exists T \in CL(c) : M(G(T)) > M(T)$.
Proof. Suppose TRj is the maximum number of bit transitions. Consider an input trace T composed of TRj+1 symbols that have transitions only in the jth position. Every inter-symbol of this trace is then the one-hot code with a single one in the jth position. For any function that reduces the maximum transition count of this trace, this one-hot code must be mapped to zero; otherwise, the same number of transitions will occur on some other line of the output trace. Since F is reversible, inter-symbol zero must then be mapped to an inter-symbol that has at least one 1 in its binary representation. Now, the last symbol of T can be repeated at the end of T without changing its bit transitions; that is, zero inter-symbols can be added to any trace without altering the bit transition constraints. However, each added zero inter-symbol is mapped to a non-zero inter-symbol and causes a transition on some bit of the output trace. By repeating this, the function is eventually overwhelmed, and the proof is complete.
Lemma 6. A characteristic class defined by the number of inter-symbols {Li} is regular under all reversible inter-
sequential functions.
Proof. A class CL({Li}) is converted to a class CL({L′i}) under an inter-sequential function, and both are uniform. Therefore, any characteristic class CL({Li}) is regular under all reversible inter-sequential functions.
We examine the case M=2 just as we did for inter-combinational functions. Suppose that the given information is the same as that provided in Table 4. Each function corresponds to a permutation of the inter-symbols. For trace T, M(T) = Max{L1+L3, L2+L3}. Suppose that the characteristic class is mapped to CL({L′i}) under G. For output traces, M(G(T)) is similarly calculated as Max{L′1+L′3, L′2+L′3}. To get a lower M(G(T)), the function should map the smallest count to L′3 and the largest count to L′0. Unlike an inter-combinational function, which must map 00 to 00, an inter-sequential function may move any count, including the largest, onto the zero inter-symbol. Therefore, inter-sequential functions are more effective in decreasing M(T) than the inter-combinational subset of combinational functions.
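The M = 2 comparison can be checked exhaustively. The Python sketch below (hypothetical names, counts ordered L0..L3 as in Table 4) evaluates every inter-combinational function, which must fix 00, and every inter-sequential one, which may permute all four inter-symbols. The counts L = [0, 10, 10, 30] are an illustrative assumption, not data from the paper.

```python
from itertools import permutations


def max_2bit(L):
    """M for a 2-bit bus: line 0 toggles on 01 and 11, line 1 on 10 and 11."""
    return max(L[1] + L[3], L[2] + L[3])


def remap(L, F):
    """Counts after every inter-symbol u is replaced by F[u]."""
    out = [0] * 4
    for u, n in enumerate(L):
        out[F[u]] += n
    return out


def best_inter_combinational_2bit(L):
    # 00 must stay 00; for M = 2 every permutation of 01, 10, 11 is inter-combinational.
    return min(max_2bit(remap(L, (0,) + p)) for p in permutations((1, 2, 3)))


def best_inter_sequential_2bit(L):
    # Any permutation of the four inter-symbols is allowed, including moves onto 00.
    return min(max_2bit(remap(L, p)) for p in permutations(range(4)))


L = [0, 10, 10, 30]             # L0..L3
print(max_2bit(L), best_inter_combinational_2bit(L), best_inter_sequential_2bit(L))  # 40 40 10
```

With these counts no inter-combinational function helps, while the inter-sequential choice that sends the 30 occurrences of 11 to 00 and the empty 00 count to 11 lowers M from 40 to 10, in line with the rule stated above.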
For inter-sequential circuits, there is no need to go further and model the trace with a Markov source, because CL(R(I,S)) < CL({Li}) and balanceability is already determined at this level of information.
5.2 General Sequential Functions
General sequential functions are the most effective functions for balancing the transitions of a trace. They also have the highest complexity of all the functions considered in the previous sections.
Definition. We define a sequential function mapping H on T as H(T), where H(Vi) = F(V1,…,Vi). The function is reversible if H⁻¹(H(T)) = T. The index i need not be bounded, i.e., the output may depend on the entire history of the trace.
Lemma 7. A characteristic class defined by a complete set of bit transitions {TRi} is balanceable exactly if the
difference between the maximum and the minimum bit transition counts is two or more.
$\forall c = \{TR_i\} \text{ given } \max_i\{TR_i\} - \min_i\{TR_i\} \ge 2:\ \exists F,\ \forall T \in CL(c) : M(F(T)) < M(T)$
Proof. First, we prove an interesting property of any sequential function that reduces M(T). Suppose the bit transitions are given and, for any given sequential function, we want to build a trace with those bit transitions that overwhelms the function. By overwhelming a function, we mean demonstrating that the function yields M(H(T)) equal to or higher than M(T). First, we claim that if, for a sequential function H, at any point in time a non-zero input inter-symbol is mapped to a zero output inter-symbol (i.e., Ui=0), that function can be overwhelmed very easily. Since a transition in the input has been translated into no transition in the output, a zero inter-symbol in the input at that point must lead to at least one transition in the output; otherwise, the decoder could not distinguish the two cases. Therefore, if at any point during the construction of the trace the sequential machine would generate no transition for the non-zero inter-symbol that we are about to insert, we insert a zero inter-symbol in the input trace instead. This causes at least one transition in the output. If this continues indefinitely, the function will be overwhelmed sooner or later, because we keep inserting transitions in H(T) without inserting transitions in T. Now, suppose we build our input trace so that in each step only one transition happens. Based on the above, the sequential function generates at least one transition in the output for each transition in the input. It is therefore enough to consider only those functions that always have an equal number of transitions in the input and in the output, i.e., functions that distribute transitions over different lines; no other function can do better. The general block diagram of these transition-distributing functions is shown in Figure 2. The permutation function is not a fixed permutation and may change with time. This is one of the main differences from inter-sequential circuits.
[Figure: Generate inter-symbols → Permute bits → Generate output trace from inter-symbols]
Figure 2- Model for transition-routing sequential functions
A sequential function used to evenly distribute transitions over the bit lines resembles a complex routing network that can route the transitions of each line to any other line. Evidently, the routing configuration must change as time progresses in order to achieve a uniform distribution of bit-level switching activity. However, it is important to note that this reconfiguration is based only on the symbols that have already been conveyed to the receiver and on the knowledge of the bit-level activities of the different lines. Based on the above properties of sequential functions, a suitable function may not exist for every given set of bit-level transition counts. For instance, given bit transitions {TR1=4, TR2=3, TR3=3}, it is easy to verify that no function can reduce M(T): transitions are only distributed (not suppressed) among the lines, and 10 transitions cannot be spread over three lines with at most 3 per line. Therefore, we assume that the average of the TRs is at least one unit less than M(T) in the original trace. If only one line has the maximum activity, this translates into a difference of at least two between the maximum and the minimum bit transition counts. Without loss of generality, we assume that only one line in T has the maximum activity.
The algorithm for controlling the routing network is as follows. In each step, we show that, depending on the transitions that occur, either a suitable function is found or the problem reduces to another balanceable problem.
Assume that the line with the maximum activity is Lmax and the line with the minimum activity is Lmin.
1. If Lmax makes a transition and Lmin does not, swap Lmin and Lmax and set the encoding to H(V)=V for the remainder of the trace. For the output trace, M(H(T)) will be less than for the original trace.
2. If Lmin makes a transition, a difference of at least two remains between the maximum and the minimum transition counts of the rest of the trace. Repeat the algorithm.
3. If neither of them makes a transition, do not change the routing. Repeat the algorithm.
It is easy to verify that this algorithm reduces M(T) by at least one, and the proof is complete.
Sequential functions are much more effective in implementing bit-balancing encoders than combinational functions. They can actually solve the BTB problem in cases where combinational and inter-sequential functions utterly fail. However, the above lemma only shows the potential of sequential circuits. The algorithm presented in its proof reduces M(T) by only one. Yet by cascading such blocks, the maximum transition count can be reduced as much as is possible for the given characteristic. However, each of these blocks is costly, because keeping track of the transitions on each line and identifying the maximum and the minimum requires considerable hardware resources. The superiority of sequential circuits over combinational functions comes at the price of increased overhead.
Such solutions are complex compared to combinational solutions. Therefore, simpler sequential functions that balance transitions heuristically may be used instead. A straightforward example is a sequential function that only swaps two different lines: instead of a complete M-bit to M-bit routing network, two multiplexers may be used to swap two lines. The bit swapping should be done so that the transition counts of these two lines become almost equal. Therefore, if there is a large difference between the transition counts of two lines, the maximum transition count can be noticeably decreased.
The first proposed block is called the interchange block, shown in Figure 3. It simply swaps two lines every time the values of the two lines are equal, i.e., both of them are zero or both are one, because we do not want to add any extra transitions to the original trace. For this scheme to work properly, the two lines should be uncorrelated. Otherwise, it can easily be shown that this encoder may become ineffective in some cases (as an example, consider the trace <00,01,00,00,01,00,00,…>). The interchange block is a fast solution when a vulnerable line is neighbored by a sturdy line from which it can get some help in conveying the information, thereby reducing its exposure to degradation.
[Figure: Input 1 and Input 2 → interchange block with Transition Counter Control Logic → Output 1 and Output 2]
Figure 3- Interchange Block
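The swap-when-equal rule of the interchange block can be sketched in Python as below. This is a minimal model under an assumption: it uses only the equality rule described in the text and leaves out any transition counters in the control logic, and the function names are hypothetical. Because the routing changes only in a cycle where both lines carry the same value, the decoder sees the same condition on the bus and can track the routing without side information.

```python
def interchange_encode(a_bits, b_bits):
    """Route two lines straight or crossed; flip the routing whenever they are equal."""
    swapped = False
    out1, out2 = [], []
    for a, b in zip(a_bits, b_bits):
        o1, o2 = (b, a) if swapped else (a, b)
        out1.append(o1)
        out2.append(o2)
        if a == b:              # outputs are equal too, so re-routing adds no transition
            swapped = not swapped
    return out1, out2


def interchange_decode(o1_bits, o2_bits):
    """Undo the routing; equal bus values tell the decoder when it flipped."""
    swapped = False
    a_bits, b_bits = [], []
    for o1, o2 in zip(o1_bits, o2_bits):
        a, b = (o2, o1) if swapped else (o1, o2)
        a_bits.append(a)
        b_bits.append(b)
        if o1 == o2:
            swapped = not swapped
    return a_bits, b_bits


def toggles(bits):
    return sum(x != y for x, y in zip(bits, bits[1:]))


a = [0, 1, 0, 1, 0, 1]          # busy line
b = [0, 0, 0, 0, 0, 0]          # idle line
o1, o2 = interchange_encode(a, b)
print(toggles(a), toggles(b), toggles(o1), toggles(o2))  # 5 0 2 3
```

The total number of transitions is unchanged (5 in, 2 + 3 out), consistent with the Total Transition Ratio of 1 reported for Interchange in the experimental results.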
It is possible to use several interchange blocks, or progressive levels of them, to achieve better results. In such cases, to make the scheme even simpler and decrease the overhead of the control logic, we can modify the interchange block and employ a global decision maker for all blocks whose job is to swap the lines in all interchange blocks after every K clock cycles. This approach may marginally increase the total number of transitions, but it is almost as effective as before in terms of reducing M(T). We call this technique the global-interchange solution.
In practice, it is sometimes very efficient to use functions that map a non-zero inter-symbol to inter-symbol zero in the output. This may lead to the cancellation of a large number of transitions. For example, this is advantageous if the code is designed to suppress transitions for sequential addresses in a trace of instruction addresses. This is a common trick used in encodings that aim to reduce the transitions of a trace [13]. Other configurations are also possible. One effective solution is to extract the inter-symbol and rotate it by I bits in each clock cycle, where I is incremented by one (mod n) for the next cycle. Therefore, the ones in the inter-symbol are distributed over different lines. Figure 4 illustrates this scheme. Another solution for sequential traces is to send $(V_i + 1) \oplus V_{i+1}$ to the rotating network. This has the additional advantage that it produces no transitions when values are sequential; consequently, M(T) can be reduced even more. We call the first approach XOR-Rotate and the second one T0-XOR-Rotate. Recall that T0-XOR [13] is an irredundant bus encoding technique that sends $(V_i + 1) \oplus V_{i+1}$ on the bus by transition signaling (XORing the value with the previous value on the bus).
[Figure: T → Vi ⊕ Vi+1 or (Vi + 1) ⊕ Vi+1 → Rotate I bits and XOR, increase I → F(T)]
Figure 4- XOR-Rotate & T0-XOR-Rotate
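Both schemes can be sketched in Python as an encoder/decoder pair. Assumptions, not taken from the paper: symbols are n-bit integers, the first symbol is sent unencoded, rotation is to the left starting at I = 0, and for T0-XOR-Rotate the increment is +1 modulo 2^n (a real address bus would use its own stride). The function names are hypothetical.

```python
def rotl(x, r, n):
    """Rotate the n-bit value x left by r positions (negative r rotates right)."""
    r %= n
    mask = (1 << n) - 1
    return ((x << r) | (x >> (n - r))) & mask


def xor_rotate_encode(trace, n, t0=False):
    mask = (1 << n) - 1
    bus, rot = [trace[0]], 0
    for prev, cur in zip(trace, trace[1:]):
        ref = (prev + 1) & mask if t0 else prev    # T0 variant predicts prev + 1
        u = ref ^ cur                              # inter-symbol (zero if prediction holds)
        bus.append(bus[-1] ^ rotl(u, rot, n))      # transition signaling on the bus
        rot = (rot + 1) % n                        # I is incremented mod n
    return bus


def xor_rotate_decode(bus, n, t0=False):
    mask = (1 << n) - 1
    dec, rot = [bus[0]], 0
    for prev, cur in zip(bus, bus[1:]):
        u = rotl(prev ^ cur, -rot, n)              # undo the rotation
        ref = (dec[-1] + 1) & mask if t0 else dec[-1]
        dec.append(ref ^ u)
        rot = (rot + 1) % n
    return dec


def line_toggles(seq, n):
    return [sum(((x ^ y) >> j) & 1 for x, y in zip(seq, seq[1:])) for j in range(n)]


toggling = [0, 1, 0, 1, 0, 1, 0, 1, 0]              # only line 0 is active
print(line_toggles(toggling, 4), line_toggles(xor_rotate_encode(toggling, 4), 4))
# [8, 0, 0, 0] [2, 2, 2, 2]
print(xor_rotate_encode([5, 6, 7, 8, 9], 4, t0=True))  # [5, 5, 5, 5, 5]
```

The rotation spreads the eight transitions of a single busy line evenly over all four lines, and the T0 variant leaves the bus completely idle for a sequential run. The variable-amount rotator is the costly part in hardware.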
6. Experimental Results
In the previous sections, we studied the BTB problem under different constraints. In this section, we present experimental results of applying different encoding techniques to two kinds of traces: instruction address traces and data address traces. We have applied both combinational and sequential functions to these traces and compared the results. We report results averaged over six SPEC2000 benchmarks: vpr, parser, equake, gcc, vortex, and art. Each trace was generated by simulating 10 million instructions using the SimpleScalar architecture simulator [21]. We report two quantities for each method: 1) Max Transition Ratio, which is the ratio of the maximum transition count of the bus lines after encoding to that before encoding, and 2) Total Transition Ratio, which is the ratio of the total transition count of the bus after encoding to that before encoding. Max Transition Ratio is of primary interest in this paper; however, one minus the Total Transition Ratio shows what fraction of the total transitions has been eliminated. The greater this fraction, the lower the probability of odd scenarios (such as the one investigated at the end of Section 2) and the greater the energy saving. Table 5 and Table 6 present the comparison between the different techniques.
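The two metrics can be computed from the raw and encoded traces as in the Python sketch below. The names are hypothetical, and it assumes the unencoded trace has at least one transition (otherwise the ratios are undefined). The example uses an illustrative pair of traces, not an actual encoder output.

```python
def line_counts(trace, width):
    """Per-line transition counts of a trace of width-bit integer symbols."""
    return [sum(((x ^ y) >> j) & 1 for x, y in zip(trace, trace[1:]))
            for j in range(width)]


def transition_ratios(before, after, width_before, width_after):
    """(Max Transition Ratio, Total Transition Ratio) of an encoding.
    The widths may differ when the code adds redundant lines."""
    b = line_counts(before, width_before)
    a = line_counts(after, width_after)
    return max(a) / max(b), sum(a) / sum(b)


before = [0, 1, 0, 1, 0]        # all activity on line 0
after = [0, 1, 3, 2, 0]         # same number of transitions, spread over both lines
print(transition_ratios(before, after, 2, 2))  # (0.5, 1.0)
```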
Instruction address traces are mostly sequential; therefore, we expect the balanced Gray code to be the most effective method for reducing the maximum transition counts of these traces. Using the balanced Gray code, only one transition occurs for each new sequential address, and these transitions are distributed over the different bus lines. We tested balanced Gray codes for buses of width 3, 4, 5 and 6. The number after the dash in each balanced Gray code entry refers to the width of the bus. For ideal sequential symbols, the result should improve as the bus becomes wider. However, as can be seen in the table, this is not the case for instruction addresses: the marginal improvement of the balanced Gray code diminishes because of non-sequential instructions. The next entries in the table correspond to the results obtained by applying the PermuteStates technique to a Markov model extracted from the instruction addresses. Again, the number after the dash shows the width of the bus, e.g., PermuteStates-4 is the result for a 4-bit wide bus. PermuteStates performs better than the balanced Gray code when the bus width is 5 or 6. We used MaxSetSize equal to 4. We also experimented with Interchange, the sequential encoder presented in Section 5.2. We report the results for three configurations of the interchange block. For two lines, if one line is more active than the other, interchange distributes the transitions of the more active line to the other line. We used configurations with multiple levels of interchange; the number after the dash in each interchange entry represents the number of levels used. As expected, Interchange-1 is capable of reducing M(T) by half. This is because the most active line of the bus is grouped with a line with almost zero activity, so after encoding each of the two lines carries about half of their total transition count. The reported results are for the best possible configuration of grouping two bits, meaning that at each level, the line with the highest activity is grouped with the line with the lowest activity, and so on.
Finally, we report the results for the XOR-Rotate and T0-XOR-Rotate methods. As mentioned earlier, these methods require rotation networks that can perform an arbitrary amount of rotation in each clock cycle. The superb results of these methods come at the expense of extremely complex logic networks to perform the required arithmetic and rotation operations. As can be seen, T0-XOR-Rotate outperforms XOR-Rotate, because T0-XOR-Rotate eliminates many transitions by exploiting the sequentiality of instruction addresses.
Table 5- Comparison of different methods applied over instruction addresses.
Method             Max Trans. Ratio    Total Trans. Ratio
Bal. Gray-3        0.538               0.704
Bal. Gray-4        0.311               0.654
Bal. Gray-5        0.314               0.633
Bal. Gray-6        0.235               0.603
PermuteStates-4    0.351               0.708
PermuteStates-5    0.265               0.621
PermuteStates-6    0.231               0.567
Interchange-1      0.508               1
Interchange-2      0.302               1
Interchange-3      0.203               1
XOR-Rotate         0.084               1
T0-XOR-Rotate      0.017               0.199
In practice, the target bus may not be sequential, in which case methods like the balanced Gray code are no longer applicable. A simple example is a data address bus. In Table 6, we show the results for such a bus. The notation and conventions in this table are similar to Table 5. It can be observed that the balanced Gray code performs poorly. Moreover, because data address bus transitions are originally much more balanced than those of instruction address buses, even interchange and XOR-Rotate are not successful solutions: Interchange-1 actually increases the maximum transition count, and the XOR-Rotate results are not reported in the table. The only effective solution that we could find in this case is the PermuteStates encoder. We report results for three different bus widths.
Table 6- Comparison of different methods applied over data addresses.
Method             Max Trans. Ratio    Total Trans. Ratio
Bal. Gray-4        0.823               0.946
Bal. Gray-5        0.981               1.021
Bal. Gray-6        0.960               0.982
PermuteStates-4    0.586               0.822
PermuteStates-5    0.523               0.823
PermuteStates-6    0.467               0.775
Interchange-1      1.043               1
Let us summarize the results of the two tables. For instruction address buses with widths up to 6, the best result is obtained by using three levels of interchange, which leads to a 79.7% reduction in maximum transitions, whereas the PermuteStates heuristic achieves a 76.9% reduction. PermuteStates is purely combinational logic and needs much less overhead than three levels of interchange blocks. We do not take into account the results of XOR-Rotate and T0-XOR-Rotate because of their infeasibility. For data addresses, the only really effective method is the PermuteStates technique. It achieves a 53.3% reduction in maximum transitions, and none of the other techniques comes close. Most techniques other than Interchange and XOR-Rotate also bring a good amount of reduction in total transitions. As explained earlier in the paper, this lends further support to the validity of the estimates.
7. Conclusions
In this paper, we thoroughly investigated the problem of reducing the maximum transition count of a group of lines. We examined the problem under various levels of available information about the trace and for different kinds of functions: combinational, inter-sequential and sequential logic. We were able to solve the problem exactly in many cases, and we presented polynomial-time solutions where the exact solution leads to an infeasible algorithm. We presented experimental results using instruction and data address buses, which are good examples of typical buses that may be vulnerable to hot-carrier degradation. Our experimental results also show the effectiveness of the Markov-source heuristic for both instruction address traces and data address traces. The actual selection of a technique depends strongly on the characteristics of the trace and other constraints in the system.
8. References
1. Y. Leblebici, S. M. Kang, Hot Carrier Reliability of MOS VLSI Circuits, Kluwer Academic Publishers, 1993.
2. E. A. Amerasekera and F. N. Najm, Failure Mechanisms in Semiconductor Devices, Wiley & Sons, 1998.
3. H. Yonezawa, J. Fang, Y. Kawakami, N. Iwanishi, L. Wu, A. Chen, N. Koike, P. Chen, C. Yeh and Z. Liu, “Ratio Based Hot-Carrier Degradation Modeling for Aged Timing Simulation of Millions of Transistors Digital Circuits,” IEEE Int’l Electron Devices Meeting Technical Digest, pp. 93-96, 1998.
4. P. C. Li and I. Hajj, “Computer Aided Redesign of VLSI Circuits for Hot-Carrier Reliability,” Proc. of International Conference on Computer Design, 1993.
5. C. Chang, K. Wang, M. Marek-Sadowska, “Layout-Driven Hot-Carrier Degradation Minimization Using Logic Restructuring Techniques,” Proc. Design Automation Conference, 2001.
6. K. Roy and S. Prasad, “Logic Synthesis for Reliability - An Early Start to Controlling Electromigration and Hot Carrier Effects,” Proc. Design Automation and Test in Europe, 1994.
7. A. Dasgupta, R. Karri, “Electromigration Reliability Enhancement Via Bus Activity Distribution,” Proc. of Design Automation Conference, 1996.
8. A. Dasgupta, R. Karri, “Hot-Carrier Reliability Enhancement via Input Reordering and Transistor Sizing,” Proc. of Design Automation Conference, pp. 819-824, 1996.
9. Z. Chen, I. Koren, “Technology Mapping for Hot-Carrier Reliability Enhancement,” Proc. of International Society for Optical Engineering, Vol. 3216, pp. 42-50, 1997.
10. K. C. Kapur, L. R. Lamberson, Reliability in Engineering Design, John Wiley & Sons, 1997.
11. G. S. Bhat, C. D. Savage, “Balanced Gray Codes,” Electronic Journal of Combinatorics 3, No. 1, R25, 1996.
12. L. Benini, G. De Micheli, E. Macii, D. Sciuto, C. Silvano, “Asymptotic Zero-Transition Activity Encoding for Address Buses in Low-Power Microprocessor-Based Systems,” IEEE 7th Great Lakes Symposium on VLSI, Urbana, IL, pp. 77-82, 1997.
13. W. Fornaciari, M. Polentarutti, D. Sciuto, and C. Silvano, “Power Optimization of System-Level Address Buses Based on Software Profiling,” Proc. International Symposium on Hardware/Software Codesign, pp. 29-33, Apr. 2000.
14. S. Ramprasad, N. Shanbhag, I. N. Hajj, “A Coding Framework for Low-Power Address and Data Busses,” IEEE Transactions on Very Large Scale Integration Systems, Vol. 7, No. 2, pp. 1280-1294, June 1999.
15. P. P. Sotiriadis, A. P. Chandrakasan, “Bus Energy Reduction By Transition Pattern Coding Using a Detailed Deep Submicrometer Bus Model,” IEEE Transaction on Circuits and Systems, Vol. 5, No. 10, Oct. 2003.
16. Y. Aghaghiri, F. Fallah, M. Pedram, “Reducing Transitions on Memory Buses Using Sector-Based Encoding Technique,” Proc. of Int’l Symposium on Low Power Electronics and Design, pp. 190-195, 2002.
17. Komatsu, M. Ikeda, K. Asada, “Low Power Chip Interface based on Bus Data Encoding with Adaptive Code-book Method,” Proc. of Ninth Great Lakes Symposium, pp. 368-371, 1999.
18. A. Abdollahi, F. Fallah, M. Pedram, “Runtime Mechanisms for Leakage Current Reduction in CMOS VLSI Circuits,” Proc. Intl. Symposium on Low Power Electronics and Design, pp. 213-218, Aug. 2002.
19. V. Veeramachaneni, A. Tyagi, S. Rajgopal, “Re-encoding for Low Power State Assignment of FSMs,” Intl. Symposium on Low Power Electronics and Design, pp. 173-178, 1995.
20. JEITA, Standard of Japan Electronics and Information Technology Industries Association, “Failure Mechanism Driven Reliability Test Methods for LSIs (Amendment 1),” Oct. 2001.
21. http://www.simplescalar.com/