+ All Categories
Home > Documents > The capacity of coded systems

The capacity of coded systems

Date post: 22-Sep-2016
Category:
Upload: ab
View: 215 times
Download: 0 times
Share this document with a friend
15
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. I, JANUARY 1997 ~ 113 The Capacity of Coded Systems John T. Coffey, Member, IEEE, and Aaron B. Kiely, Member, IEEE Abstruct-There are many situations in which a coding scheme is fixed for a given channel. The data processing theorem implies that the capacity of the composite channel cannot be higher than that of the original channel; usually, the capacity will be strictly lower. This paper examines the problem of how much capacity must be lost in the most common such situations. The various combinations of presence and absence of encoder and decoder, along with choice of encoder and decoder, are considered. The degree to which coding schemes can both deliver low error probability and maintain high capacity in the composite channel is examined. Index Terms- Data processing, superchannel, encoder, de- coder. I. INTRODUCTION IVEN a noisy channel, the channel coding theorem tells G us that we can communicate at rates up to capacity with arbitrarily low error probability by the use of error-correcting codes. On the other hand, the data processing inequality tells us that adding processing can only decrease capacity if it has any effect at all. Thus in achieving any given combination of rate and error probability, by fixing an encoder and/or decoder that must be used, we will have compromised our ability to deliver even better combinations, should the need arise. The natural question is how much capacity is lost in the various situations. A motivating scenario is one in which two classes of user will be transmitting over a channel under our control. User type A is willing to tolerate relatively high error probabilities, and for cost reasons will not apply external coding. User type B demands much lower error probabilities, and will apply arbitrarily sophisticated codes to achieve reliable transmission at the highest possible rates. We must therefore find an encoder and decoder that will deliver acceptable error probability to user A, while preserving the highest possible capacity in the composite channel seen by user B. A second motivating situation is in the design of on-chip error correction for random-access memories. These often have an on-chip error-correcting code, to combat “soft” errors induced by alpha particles and defects introduced in the manufacturing process. The problem is that to maintain high yield, we need simple on-chip circuitry, and thus relatively Manuscript received September 23, 1994; revised June 12, 1996. This work was supported in part by the National Science Foundation under Grant NSF- NCR-9105832. The material in this paper was presented in part at the IEEE Intemational Workshop on Information Theory, Bahia, Brazil, June 21-26, 1992. J. T. Coffey is with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA. A. B. Kiely is with the Communications Research Group, Mail 238-420, Jet Propulsion Laboratory, Pasadena, CA 91 109 USA. Publisher Item Identifier S 0018-9448(97)00159-4. weak codes, such as the commonly used shortened Hamming codes. See, for example, [18, sec. 4.31. If we plan to address this problem by applying external coding, a primary goal is to choose the on-chip code to maximize the capacity of the chip, considered as a channel, rather than the reliability. In many cases, a coding scheme has been fixed at some point in the past, resulting in performance that is unsatisfactory in new circumstances, and requiring new external coding. 
A spectacular current example is the Galileo mission, in which the failure of the main antenna has lowered achievable data rates by orders of magnitude. Although it is possible to reconfigure much of the on-board encoding, all communication through the low-gain antenna must pass through the (7,1/2) convolutional code before being transmitted [6]. Lauer [13] calculates capacities for various coded sub- systems in the transmission channel used for Digital Audio Broadcasting, as a method for evaluating sources of overall performance loss. A related topic is the computation of reliability functions for superchannels in concatenated coding schemes as considered by Fomey [S, ch. 41. Rather than computing the performance of particular concatenated coding schemes, Forney uses “rep- resentative” inner channels whose properties are derived from the coding theorem for the underlying channel. In this paper, we examine in detail the implications of the use of fixed error-correction schemes for the capacity of the overall system. The various cases differ in the degree to which various elements of the system are “hard-wired,” and on the underlying channel. We consider the cases of encoder plus decoder, decoder alone, and encoder alone, over the binary- symmetric channel and the hard-quantized Gaussian channel. To distinguish between these cases, we adopt a notation in which information about the setup is present in a subscript. The subscripts ed, e, and d denote encoder-plus-decoder, encoder alone, and decoder alone, respectively. The subscripts B and G denote binary-symmetric channel and hard-limited Gaussian channel, respectively. Thus the capacity with fixed encoder and decoder over a hard-quantized Gaussian channel is denoted Ced,~, and so on. For the hard-limited Gaussian channel, it will be shown that capacity is not necessarily achieved in the wideband limit T t 0. Since the wideband limit is usually of interest anyway, we will denote it with superscript *, so that C:d,G denotes the capacity with fixed encoder and decoder over the hard-limited Gaussian channel as T t 0. We derive the basic expressions for capacity in these dif- ferent cases in Section 11. These different cases give rise to a number of quite different optimization problems; in the remainder of the paper (Sections III-V) we concentrate on a selection of the most interesting of these. 0018-9448/97$10.00 0 1997 IEEE
Transcript
Page 1: The capacity of coded systems

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. I , JANUARY 1997

~

113

The Capacity of Coded Systems John T. Coffey, Member, IEEE, and Aaron B. Kiely, Member, IEEE

Abstruct-There are many situations in which a coding scheme is fixed for a given channel. The data processing theorem implies that the capacity of the composite channel cannot be higher than that of the original channel; usually, the capacity will be strictly lower. This paper examines the problem of how much capacity must be lost in the most common such situations. The various combinations of presence and absence of encoder and decoder, along with choice of encoder and decoder, are considered. The degree to which coding schemes can both deliver low error probability and maintain high capacity in the composite channel is examined.

Index Terms- Data processing, superchannel, encoder, de- coder.

I. INTRODUCTION IVEN a noisy channel, the channel coding theorem tells G us that we can communicate at rates up to capacity with

arbitrarily low error probability by the use of error-correcting codes. On the other hand, the data processing inequality tells us that adding processing can only decrease capacity if it has any effect at all. Thus in achieving any given combination of rate and error probability, by fixing an encoder and/or decoder that must be used, we will have compromised our ability to deliver even better combinations, should the need arise. The natural question is how much capacity is lost in the various situations.

A motivating scenario is one in which two classes of user will be transmitting over a channel under our control. User type A is willing to tolerate relatively high error probabilities, and for cost reasons will not apply external coding. User type B demands much lower error probabilities, and will apply arbitrarily sophisticated codes to achieve reliable transmission at the highest possible rates. We must therefore find an encoder and decoder that will deliver acceptable error probability to user A, while preserving the highest possible capacity in the composite channel seen by user B.

A second motivating situation is in the design of on-chip error correction for random-access memories. These often have an on-chip error-correcting code, to combat “soft” errors induced by alpha particles and defects introduced in the manufacturing process. The problem is that to maintain high yield, we need simple on-chip circuitry, and thus relatively

Manuscript received September 23, 1994; revised June 12, 1996. This work was supported in part by the National Science Foundation under Grant NSF- NCR-9105832. The material in this paper was presented in part at the IEEE Intemational Workshop on Information Theory, Bahia, Brazil, June 21-26, 1992.

J. T. Coffey is with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA.

A. B. Kiely is with the Communications Research Group, Mail 238-420, Jet Propulsion Laboratory, Pasadena, CA 91 109 USA.

Publisher Item Identifier S 0018-9448(97)00159-4.

weak codes, such as the commonly used shortened Hamming codes. See, for example, [18, sec. 4.31. If we plan to address this problem by applying external coding, a primary goal is to choose the on-chip code to maximize the capacity of the chip, considered as a channel, rather than the reliability.

In many cases, a coding scheme has been fixed at some point in the past, resulting in performance that is unsatisfactory in new circumstances, and requiring new external coding. A spectacular current example is the Galileo mission, in which the failure of the main antenna has lowered achievable data rates by orders of magnitude. Although it is possible to reconfigure much of the on-board encoding, all communication through the low-gain antenna must pass through the (7,1/2) convolutional code before being transmitted [6].

Lauer [13] calculates capacities for various coded sub- systems in the transmission channel used for Digital Audio Broadcasting, as a method for evaluating sources of overall performance loss.

A related topic is the computation of reliability functions for superchannels in concatenated coding schemes as considered by Fomey [S, ch. 41. Rather than computing the performance of particular concatenated coding schemes, Forney uses “rep- resentative” inner channels whose properties are derived from the coding theorem for the underlying channel.

In this paper, we examine in detail the implications of the use of fixed error-correction schemes for the capacity of the overall system. The various cases differ in the degree to which various elements of the system are “hard-wired,” and on the underlying channel. We consider the cases of encoder plus decoder, decoder alone, and encoder alone, over the binary- symmetric channel and the hard-quantized Gaussian channel. To distinguish between these cases, we adopt a notation in which information about the setup is present in a subscript. The subscripts ed, e , and d denote encoder-plus-decoder, encoder alone, and decoder alone, respectively. The subscripts B and G denote binary-symmetric channel and hard-limited Gaussian channel, respectively. Thus the capacity with fixed encoder and decoder over a hard-quantized Gaussian channel is denoted C e d , ~ , and so on. For the hard-limited Gaussian channel, it will be shown that capacity is not necessarily achieved in the wideband limit T t 0. Since the wideband limit is usually of interest anyway, we will denote it with superscript *, so that C:d,G denotes the capacity with fixed encoder and decoder over the hard-limited Gaussian channel as T t 0.

We derive the basic expressions for capacity in these dif- ferent cases in Section 11. These different cases give rise to a number of quite different optimization problems; in the remainder of the paper (Sections III-V) we concentrate on a selection of the most interesting of these.

0018-9448/97$10.00 0 1997 IEEE

Page 2: The capacity of coded systems

114 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 1, JANUARY 1997

11. BASIC CAPACITY EXPRESSIONS

A. Encoder Plus Decoder

Given a binary-symmetric channel with a fixed outer linear (n, k ) code, we fix a standard array decoder with some choice of coset representatives E. l The input is a set of k bits, converted by the encoder to a codeword et, the channel adds the error pattern e, the decoder finds the unique codeword cg such that c, - cg + e E E , and the output consists of the k bits corresponding to cg . The superchannel comprising encoder, channel, and decoder is thus a 2'" x 2'" channel in which the transition probability

P(Y = CjJX = e;) = eEE

where & ( C L ) is the probability that the channel adds the noise vector a. Since the code is linear, each row and column of the transition matrix is a permutation of every other row and column, and the channel is therefore symmetric. Capacity is therefore achieved with equiprobable inputs, and so

bitslcodeword.

Since

where

U E S

is the weight enumerator of the set S and p is the crossover probability of the channel, this finally becomes

bitslchannel use. (1)

A typical result is plotted in Fig. 1 (for the (23,12) Golay code).

Suppose that we are using the code over a hard-quantized Gaussian channel with control over the signaling interval Tb.

The crossover probability p of the channel is &( d m ) . It is convenient to normalize the units of time by setting T = PsTb/No so that p = Q(m). Using this value for p in (1) gives a capacity for a fixed value of T of c e d , G ( T )

bitslchannel use, or c e d , G ( T ) / T bitsls. We then find that the capacity of the overall system, including control over the signaling interval, is sup, C e d , G (T) /T bits/s.

For an uncoded system, the optimum choice for T is T -+ 0 (wideband limit). Although this is not always true for coded systems, as we shall see in Section 111-B, the capacity obtained as T i 0 still provides the most general and tractable way of comparing capacity loss across different coded systems.

'We will only cover the case of a standard array decoder here; the case where we allow bounded distance rules and an extra erasure output can be handled by simlar methods but is more unwieldy.

Middle curve:

- P 5

Fig. 1. channel.

Capacity (in bits) using (23,12) Golay code with binary-symmetric

After some calculations (see Appendix I), the wideband limit capacity becomes

bitsls. (2)

We consider the conditions under which T + 0 represents a local optimum in Section 111-B. We can argue that since for the uncoded channel 0 is a local optimum, with capacity dropping rapidly as T becomes larger, the situation at T + 0 represents at least a first approximation to overall capacity loss when a code is used. More generally, the Gaussian channel is one of a wide class of channels with noise scaling [l], in which crossover probability is a function of some resource X and we are attempting to maximize mutual information per unit resource, i.e., a function of the form I ( p ( X ) ) / X . The question of whether X i 0 represents a local optimum depends on the channel, but ii we do choose X + 0, then the capacity loss due to coding is always given by the factor in braces in (2) , which therefore represents a channel-independent way of measuring capacity loss due to coding. Furthermore, if the quantity in braces is positive, then we can always find classes of channels for which the X i 0 case is a global optimum. See Section 111-B and Appendix I for more discussion on these points.

B. Decoder, No Encoder

In this case, the system consists of a binary-symmetric channel followed by a standard array decoder for some linear (n, k ) code. The decoder takes blocks of n received bits, determines the coset of the (n, k ) code in which this received word lies, subtracts the corresponding coset representative, and produces a codeword from the linear (n, k ) code at the output. Thus the superchannel in this case is a 2n x 2'" channel, with inputs consisting of any binary n-tuple w, and outputs consisting of codewords. The transition probability P(w, e ) is the probability of the channel adding an error vector a that is sufficient to place w + a in the decoding region of codeword

Page 3: The capacity of coded systems

COFFEY AND IUELY: THE CAPACITY OF CODED SYSTEMS

c, i.e.,

P(w, c) = PE(c - w + e ) = A c - w + ~ ( p ) . e € &

We note that for any codeword c i , we have P(w+ci, cj+c;) = P(w, c j ) , and so the set of transition probabilities is the same for any two input words in the same coset of the code.

We now have

This upper bound is achieved by taking any input for which the row entropy of the transition matrix is minimized and signaling using this element along with the other words in its coset, each with equal probability. Thus if an encoder is missing, it is best to signal using either the standard encoder or a coset of the code. The resulting capacity is

C d , B = k - min H ( Y I X = w) bitskodeword. (3) wEF,"

C. Encoder, No Decoder In this case, the system consists of the encoder for a

linear (n, k) code followed by a binary-symmetric channel. Thus the superchannel is a 2k x 2n channel. The transition probability from codeword ci to output word T is just pWt("+'%) . (I - p)n-wt(rfca). Now the outputs can be partitioned into subsets consisting of the cosets of the code, and then the transition probabilities within each subset are symmetric. We conclude that the channel is symmetric [9, p. 941 and that capacity is achieved using equiprobable inputs. In this case, we find that H ( Y I X ) = n X ( p ) bitskodeword, and H ( Y ) = H(e)+H(Yle) , where e is the coset representative for Y . Here X(.) is the binary entropy function. Then H(Yle ) = k , and the uncertainty about which coset Y lies in, H ( e ) , is obtained from the weight enumerators of the sets {e + C}. We find that

bitskhannel use. (4)

A typical result (for the ( 2 3 , 1 2 ) Golay code) is sketched in Fig. 1.

D. Overview In the remainder of the paper, we consider various questions

implied by these basic capacity results. The case of encoder plus decoder over a hard-quantized Gaussian channel (2) is considered in Section 111. We consider among others the questions of whether it is possible to lose no capacity with a nontrivial code, what the best choice of decoder is to maximize capacity, when a maximum-likelihood decoder is optimal, and whether decoders that are not standard-may can ever be optimal.

The hard-quantized Gaussian channel can be considered as a special case of the binary-symmetric channel in which

115

p + 1 / 2 . The case of encoder plus decoder over a binary- symmetric channel with general values of p is considered in Section 111-C. The situation as p -+ 0 is considered in Section

The case of decoder but no encoder is considered in Section IV. We consider the questions of whether it is possible to lose no capacity for any nontrivial code, and what the achievable capacities using cosets of the code rather than the code itself are.

The case of encoder but no decoder is considered in Section V. We consider the question of when it is possible to achieve the full capacity of the uncoded channel in the hard-quantized Gaussian channel, and investigate the nature of the tradeoff between capacity and error probability for a binary-symmetric channel with long codes.

111-D.

111. ENCODER PLUS DECODER

A. Very Noisy Limit For a hard-limited Gaussian channel with a fixed encoder

and decoder we have, from (2 )

bits/s. ( 5 )

Comparing this to the capacity of the uncoded channel, 2Ps / (7rNo In 2 ) bit& we see that, as indicated earlier, the capacity has been reduced by the factor in braces. Expanding the term out gives many terms of the form 2 wt ( a ) wt (b) , which can be upper-bounded by wt + wt (b)', with equality if and only if wt (a) = wt (b ) . Thus

2n-k wt ( c + e l2 CEC e € &

- 2 n - k - f: (;) u2 u=o

so the term in braces is 51, with equality if and only if wt ( a ) = wt ( b ) for every pair of words a and b in the same decoding region c+ E of every codeword c. Since there is only one word of weight 0, this can only happen if the decoding region has size 1, i.e., if and only if the code is the trivial (n, n, 1) code. Thus if k < n, some capacity must necessarily be lost. (We will henceforth refer to the quantity in braces in (2) as the capacity gain by using the code, reserving the term capacity loss for the difference between coded and uncoded capacities.)

In (2), it is convenient to consider the standard array, with codewords in the top row, coset representatives in the leftmost column, and other entries equal to the sum of the corresponding codeword and coset representative. Then the

Page 4: The capacity of coded systems

116 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 1, JANUARY 1997

quantity

/ \ 2

CEC \ e€& 1

is obtained by summing the weights in each column and squaring, and summing over all columns.

It is often convenient to recast (2) in different terms. Letting q2 denote the number of words in E having a 1 in the zth position, and letting c, be the ith bit of c, we find that

n

e € € i=l

and

After expanding and simplifying, we find that if di > 2, then

bits/s (7)

(see Appendix 1). If X , is the ith element in the coset representative of a randomly chosen coset, then the variance of X , i s ~ ; ( 2 " - ~ - y,)2-(2"-2k), and the capacity depends on minimizing the average such variance. Equation (7) also shows that adding a fixed offset to each coset representative, which will interchange q, and Z n F k - qr for all relevant z's, makes no difference to capacity, as we should expect. Hence without loss of generality we can take q2 5 2n-k-1 for each 2.

What choice of coset representatives E should be used? While a maximum-likelihood decoder is the obvious choice, there are reasons for choosing other complete decoders. The choice of rules to minimize information bit-error probability rather than block-error probability, for example, is considered in [12]. Alternatively, we may have fixed a standard array decoding scheme, but retained some flexibility in the choice of coset representatives. What choice of coset representatives 8 maximizes the capacity?

Equation (7) shows that if we are choosing from some set of decoders in which each q, can be minimized simultaneously, then the resulting decoder must be optimal (and must be the maximum-likelihood decoder if that is in the set). If we restrict the choice of coset representatives to make the 9,'s independent of 2, i.e., q, = q , assuming that this is possible, then substituting into (7) gives

which is again maximized when q is as small as possible. Also, from (2) we are seeking to maximize

+ wt ( e + e)wt (c + e') e#.' cEC

in which our choice of coset representatives affects only the rightmost term. For each pair of cosets, this term is of the form

ajb,, and by a classic inequality [lo, p. 2611 this form is maximized over all rearrangements of the two series when the two series are similarly ordered. Thus if it possible to choose the coset representatives so that for every pair of cosets, a sequence of codewords that places the sequence wt ( e + e) in increasing order will also do so for the sequence wt ( e + e'), then that set of coset representatives must be optimal. This is possible for repetition codes when maximum-likelihood decoding is used: each row in the standard array has two elements, with the leftmost word being of the lower weight in each case. Thus maximum-likelihood decoding is optimal for repetition codes (note that we do not require d l > 2 in (2)).

These considerations seem to point to the coset leaders (i.e., the minimum-weight coset representatives) being at least a good choice in general. They are, however, not always optimal, as the following counterexamples show. For a (7,4) Hamming code, consider the set of coset representatives formed by taking the coset elements that are zero on the information bits (i.e., decoding by ignoring all parity check bits and taking the received information bits as they are). This choice gives

whereas the coset leaders give a sum of 49. It is easy to show that these are the best two decoders, up to the equivalence obtained by adding a fixed offset. For the (7 ,3) simplex code, the same strategy of ignoring parity checks gives

q2(2n--k - q z ) = (n - k)4"-k-l - - 256

whereas the best maximum-likelihood decoder has

with corresponding capacities 0.3936 and 0.3608 bitsls, re- spectively. The code has seven cosets with three joint coset leaders of weight 2, and a single coset with seven joint coset leaders of weight 3. We can assign the coset leaders of weight 2 and 3 arbitrarily, in general getting a different capacity from different choices. The optimal choice is to select any of the joint coset leaders of weight 3, to specify one of the locations in this coset leader arbitrarily, and then for the remaining cosets to select the coset leader with a 1 in this specified position, if there is one, and the coset leader with ones in the remaining two locations from the arbitrarily chosen leader of weight 3, if not. This results in the vector of qz's being (8,4,4,2,2,2,2) in some order, and the number of cases is small enough to verify that this must be optimal among all maximum-likelihood decoders. For comparison, the natural ML decoder in which the seven coset leaders of weight 2

Page 5: The capacity of coded systems

COFFEY AND KIELY THE CAPACITY OF CODED SYSTEMS 117

are cyclic shifts of each other gives qi’s of (4,4,4,3,3,3,3), with a resulting

qi(2n-k - 9;) = 300

(capacity 0.3034 bits/s). The “ignore parity checks” scheme is the optimal decoder (sketch of proof: we need only consider the cases with q2 = 26 or 28. For the first case, at most one of the q2’s can be zero, and the sum of any two qz’s cannot exceed 14. Then we cannot do any better than (8,6,5,5,1,1,0) , which gives 264. For the second case, at most two q2’s are zero, but then each qi 5 7, and then (7,7,7,6,1,0,0) gives 264 also).

The idea of ignoring parity checks may seem unmotivated at first sight, but it arises naturally from consideration of decoder error probabilities in the very noisy limit. An obvious, though suboptimal, scheme is to consider each of the k information bits as passing through an equivalent binary- symmetric channel, with crossover probability p , equal to the bit-error probability of the code, ignoring the correlation between these bits, and then transmitting at a rate of k(1 - X ( p e ) ) bitdcodeword. It is a consequence of results proved in [12], however, that for all codes with d l > 2, the bit- error probability is at least p , the crossover probability of the underlying channel, for some region pcrit 5 p 5 1/2, with pcrit # 1/2. So maximum-likelihood decoding, or any other strategy, can only make error probability worse for sufficiently noisy channels. In this case we are considering the very noisy limit, and can only better the result of the “ignore parity checks” scheme if the correlation between decoded information bits is enough to overcome the higher individual error probabilities.

There are many more cases where the “ignore parity checks” scheme outperforms at least some maximum-likelihood de- coders, such as the (8,4,4) extended Hamming code, the (15,7,5) double error-correcting BCH code, the (15,5,7) triple error-correcting BCH code, and the (16,8,4) self-dual code d16 described in [3]. However, the natural maximum- likelihood decoder in which we make the q2’s as equal as possible, for which it is easiest to calculate the required quantity q2(2n-k - si), is the worst of the maximum- likelihood decoders: since q2 is the same for all maximum- likelihood decoders, being the sum of the weights of the coset leaders, we must maximize the quantity q;, and this is achieved by making the q2’s as unequal as possible. Since there are usually many maximum-likelihood decoders, it is often difficult to calculate which one is best. Some can be handled by the general considerations given in the next section.

1) MI. Decoding ofthe (15,7,5) BCH Code: For the (15, 7,5) BCH code, we demonstrate that none of the 370 possible maximum-likelihood decoders can achieve q2(2n-k - q2) as low as 131460, and all are therefore inferior to the “ignore parity checks” scheme, which has

qi(2n-k - qi) = 131072.

The weights and multiplicity of coset leaders, obtained from [4, pp. 427 and 4401, are 65 cosets containing a unique leader of weight 3, 70 containing three joint coset leaders of weight

3, and the remainder with unique leaders of weight 5 2 . By cyclicity, each position is in the support of the same number of unique coset leaders of weight 3, i.e., (3/15)65, and so each q2 2 28 in a maximum-likelihood decoder. Noting that each of the three joint coset leaders in a coset must be disjoint, since their sum is a codeword, we see that the number of nonunique coset leaders with a 1 in a given position is (9/15)70 = 42, and so qi 5 70 for each i. Since at most (i) distinct nonunique coset leaders can fit into any five locations, there must be at least 60 that have at least a single 1 in the remaining ten locations, so that any ten locations must have

q; 2 lO(28) + 60 = 340.

Rearranging coordinates so that the q2’s are in descending order, we find that

70 2 Qi 2 42 2 . . . 2 415 2 28 (9) (10) (1 1)

It is easy to see that the maximum of q: under these constraints occurs when the third constraint is satisfied with equality.

Now fixing 45 = r , the maximum is achieved when the q2’s are as unequal as possible. The q2’s become

41 + q 2 + . . . 4- Qi5 = 630 41 + . . . + q5 5 290.

(70, * a * , 70, a , T, ’ ” , T, T , ‘ ” , T , b, 28, ’ “ , 28).

Defining z and y to be the number of 70’s and 28’s, respec- tively, we have

290 - 5r 290 - 5r - 0

7 0 - r 1 = 7 0 - r and

60 y = 1 0 - - !28] = 10 - - 7.

The third constraint determines a and b, according to a = 290 - 702 + (4 - 2). and b = 88 - ( r - 28)(9 - 9). We find that

q;(2n-k - 4;) = z 70 (186) + a (256 - a )

+ (13 - 2 - y) T (256 - r ) + b(256-b)+y28(228)

and this simplifies to

131460 + rl(i - 7) ( r - 2 q 2 + e( i - e)@ - 7 0 ) ~

which is 2131460, with equality iff 7 = 0 = 0, which happens only for r = 40 and r = 58. Neither of these corresponds to an achievable maximum-likelihood decoder, since the excess sum of any nine q2’s (i.e., over 28 each) must be at least 70 - (:) = 50, and the excess sum of any eleven q,’s must be at least 70 - (i) = 66, ruling out r = 58 and r = 40, respectively.

This argument provides a bound on the best maximum- likelihood decoder that is inferior to the “ignore parity checks” scheme. We do not know either the best overall decoder or the best maximum-likelihood decoder.

Page 6: The capacity of coded systems

118 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 1, JANUARY 1997

2) Optimality for Longer Humming Codes: We have seen that a maximum-likelihood decoder is suboptimal for the (7,4) Hamming code. It is optimal for all longer Hamming codes, as we now demonstrate.

For this class of codes, 2n-k = n + 1, and since for a maximum-likelihood decoder each qi = 1, we find that

4 i W - 4i) = n2.

In a nonmaximum-likelihood decoder, some of the q2’s can be zero. But for each such zero a higher weight element in the coset has been chosen, raising qz by at least one. The problem is therefore to minimize qz(n + 1 - q2)subject to a) 0 5 q2 5 (n + 1)/2, each y2 an integer; b) yz 2 n + z , where z = number of qz that are zero. The proof proceeds by considering the three cases z < (n - 1)/2, (n - l ) / 2 5 z < (3n - 3)/4, and (3n - 3)/4 5 z . The first case is relatively straightforward. For the second and third, writing z = zmln + s, where zmin is the lower limit of the range, and then computing the optimal solution for that z , gives a function that is a quadratic in s with negative second derivative. The minima are therefore only obtained at the endpoints of the range. Examining the solutions at these endpoints and at the adjacent points eliminates all but two solutions in each case, one at each endpoint. These solutions can then be eliminated as not corresponding to a standard array decoder, in that distinct coset representatives cannot be found that sum to the required 4 % ’ ~ . The details are given in Appendix 11.

Note that this proof shows that the maximum-likelihood decoder is the unique optimal decoder for all Hamming codes other than the (7,4) code.

Substituting n2 for

n

qi(2n-k - 4;) i=l

in (7) gives the result that under maximum-likelihood decod- ing, the Hamming codes have capacity

C& = (2Ps/nNo In 2 ) . (1 - 4n/(n + 1)’) > (2Ps/7rN0 In 2 ) . ( (n - 4)/n) bit&

We can interpret this in the following way. An uncoded system achieves close to (2PsT/7rNo In 2) bitslchannel use for sufficiently small T. Using a fixed Hamming encoder with a decoder that ignores parity checks would achieve close to

bits per n channel uses; it is natural to interpret this as equivalent to n - logz (n + 1) uses of the uncoded channel, rather than n uses, so that in this case the encodeddecoder pair has “cost” us log, (n + 1) channel uses out of each n channel uses. The expression for the capacity with a Hamming encoder and maximum-likelihood decoder above shows us that this encodeddecoder pair has “cost” only four channel uses out of n in the same sense, regardless of the value of n.

3) Restriction to Standard Array Decoder: We have re- stricted the decoding to be via standard array. Is this proviso necessary? That is, if we restrict the decoder to produce a codeword at the output, but can adapt the rule by which that codeword is selected, is it necessarily the case that the optimal decoder will be of standard array type? In fact, it is not so in general, and sometimes we can achieve higher overall capacity using a standard array decoder designed for a subcode.

For the class of repetition codes, consideration of (6) gives a capacity of

The term in braces is monotonically decreasing in n, approach- ing the limit 2 / n . Note that this is in addition to the factor 2/n due to quantization.

This factor is greater than the capacity gain for many codes that contain the all-ones codeword. The (7,4) Hamming code, for example, has a capacity gain of 0.5714 over the uncoded case (using the “ignore parity checks” standard array decoder), whereas the (7,1,7) repetition code has capacity gain 0.6836. Thus we would be better off building a decoder for the repetition code and using only two of the codewords of the (7,4) Hamming code.

The question of how to determine the best decoder in general remains unresolved.

4) Asymptotically Long Codes: For any rate R, all codes of any length n satisfy

C&/CT: 5 1 - 27--1(1- R) (12)

and as n -+ 00 virtually all codes, when decoded via maximum-likelihood decoding, satisfy

C:*, G/CA 2 1-4 ‘K1( 1 - R) (1 -7-t-1 (1 - R ) ) +o( 1). (13)

Both functions are strictly monotonically increasing in R, taking the value 0 at R = 0 and 1 at R = 1, and both are strictly greater than R for 0 < R < 1 (so that ignoring parity checks is not optimal for the average long code).

Since we saw in Section 11-A that we c y take y2 5 2n-k-1 without loss of generality, we have 2n--k - q2 L 2n-k-1. Then we find that

and

Ce*d,G/C; = 1 - (4/n) 2--(2n-2k) 4;(2”-‘“ - 4;)

Now cqz is the sum of the weights of the 2n-k coset representatives. By a result of Massey [16] the fraction p of ones in any M distinct n-tuples satisfies X(p) 2 n-’ log, M , and so

from which (12) follows. For the second claim, we use the result that virtually all

linear codes have a covering radius only slightly larger than

Page 7: The capacity of coded systems

COFFEY AND KIELY THE CAPACITY OF CODED SYSTEMS 119

the sphere packing bound [5] , i.e., all coset leaders have weight - < n7P1(1 - R) + o(n) for a fraction of (n, nR) codes that goes to 1 as n -+ CO. Then

q; 5 2n-“(n3C-1(1 - R) + +))

for almost all codes, which together with the lower bound for qi above gives

qi = n2”-“7--1(1 - R) + o(1))

for virtually all codes. Substituting into (7), and using again the fact that C qf is then minimized when the qi’s are equal, we obtain (13). It seems reasonable to assume that with probability approaching 1, the q; ’ s will simultaneously assume values close to the mean, 2n-k(7-l-1(1 - R) + o(1)); if this is true then the lower bound (13) is tight for virtually all codes.

B. Local optimality at T -+ 0

It is well known that with a hard-limited Gaussian channel with binary phase-shift keying, capacity is achieved as T t 0 [17, p. 3121. This is no longer the case when we use coding on the channel. In extreme cases, in fact, the capacity gain can be 0. This occurs when we choose our coset representatives so that CeEE wt (c + e) is independent of e, i.e., equal to n2”-k-l for all e. It is not always possible to find coset representatives that achieve this, but a simple example is a (3,1,3) repetition code in which the nonzero coset representatives consist of all words of weight 2. For the Hamming codes, this can be achieved by taking the nonzero coset representatives to be the n cyclic shifts of a noncodeword of weight (n + l)/2 (it is easy to see that these must all lie in different cosets). Consideration of (2) shows that this can never happen with a maximum-likelihood decoder (the sum of elements in the leftmost column of the standard array must be strictly lower than the sum in every other column).

Assuming that the quantity C& > 0, we are led to consider the next term in the expansion of Ced, G ( T ) /T, which

I ” T

0.2 0.4 0.6 0 . 8 1

Fig. 2. over hard-quantized Gaussian channel. Ps/No has been normalized to 1.

Capacity (in bitsls) using ( 7 , 3 ) simplex code with various decoders

code, we find

C e d , G ( T ) - 11 11 T112 -

T 147r In 2 $- 77r3/2 In 2 +(---- 37r2 8 In 2 217r In 2 ) T + . . .

= 0.3608 + 0.4071 f 0.1493 T - 0.8705 T3I2 + O ( T 2 )

for the maximum-likelihood decoder that is best at T --f 0. The “natural” decoder mentioned in Section 111-A has

Ced’ G ( T ) = 0.3034 + 0.5228 T1/2 + 0.0566 T T

- 0.8809T3/2 + O(T2).

Thus the maximum does not occur at T -+ 0 in either case. In fact, the maxima, which are 0.4936 (T = 0.2252) and 0.4754 (2’ = 0.2603), respectively, are both higher than the capacity of the “ignore parity checks” scheme capacity at T t 0. This is illustrated in Fig. 2.

Assuming that we have a code that satisfies either d l > 3 or 1 E C, we must examine the next term in the expansion of Ced, G(T)/T, which is

That this is not negative in general is seen by the example of the (7,4) Hamming code, for which this reduces to (489/128- 3r /8) rP2 M 0.268. The overall capacity for this code is

0.5166 + 0.7725 T - 2.324 T 2 + 3.106 T3 - 2.919 T4 + . . . plotted in Fig. 3. The maximum, 0.6041 at T = 0.280, is greater than the maximum for the “ignore parity checks” scheme.

Since as noted earlier it is possible for many codes to make f l ( c ) = 0 for all c E C, it will not in general be true that (14) must be negative, regardless of the conditions imposed on the code. If, however, we also impose conditions on the

where the probability of decoding to c is

Pr (c) = P+(l+ fi(c)€ + f2(c)e2 + . . .) with p ( T ) = 1/2 - E. If the all-ones word 1 E C, we find from the expressions for fl(c) and f i ( c ) in Appendix I that f ~ ( l + c) = -fl(c) and f 2 ( 1 + e) = f z (c ) , and therefore the sum vanishes. Even if 1 $! C, the term will still vanish if di > 3, but we omit the proof.

A familiar example of a class of codes that satisfies neither condition is the set of simplex codes. For the (7,3) simplex

Page 8: The capacity of coded systems

Fig. 3. over hard-quantized Gaussian channel. Ps /No has been normalized to 1.

Capacity (in bit&) using (7,4) Hamming code with ML decoder

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 1, JANUARY 1997

C ( T ) / T

T

decoder, it becomes possible to derive upper bounds on this term. Assuming that we use the best decoder, which must therefore achieve capacity of at least k / n of the uncoded capacity, we obtain a bound on f ? ( e ) which can be used in (14). Unfortunately, our best result along these lines is still positive. We conjecture that with appropriate conditions on the code parameters, an argument can be found that would make this negative. Here we confine ourselves to showing that the term must be negative for all Hamming codes other than the (7,4) code.

We find that

( n - 2 w~ ( c + e ) ) = (n - ~ ) ( n - 2 wt ( e ) ) e

for the Hamming codes, and so

= 2 ~ ( ~ - ' ) 2 ( n - 1 ) ( n - 2wt ( e ) )

f z ( c ) = 2-(n-k) 2 ( ( n - 3 ) ( n - 2wt (e))' + 4 n - nzn-IC) and f3(c) = 4 2-(n-k)(n - 5)((n - 2 wt ( c ) ) 3

- (3n - 2)(n - 2 wt (e))).

Now from [15, p. 1321 the moments CcEC (n - 2 wt ( c ) ) ~ are 2-(n-k) times the corresponding moments for the (n, n, 1) code if dL > i. These are

and

(:) (n - 2i)j = 0 , j odd.

From this we find that if d l > 4 we have

c t C

and the remaining terms simplify to

2 ( n - 1) 3n2 In 2 (n + 1)4

(n3 - 7r(n -

Fig. 4. Capacity (in bits/s) using (31,26) and ( 1 5 , l l ) Hamming codes with ML decoding over hard-quantized Gaussian channel. Ps/No has been normalized to 1.

in which the cubic has a single real root at n = 2.398. Since all Hamming codes other than the (7,4) code have d'- > 4, we conclude that the term is negative for all these codes, and that T + 0 is a local optimum.

For the (15,11,3) Hamming code, capacity with a maximum-likelihood decoder tums out to be

0.7032 - 0.3525 T + 0.1133 '1" + 2.231 T 3 - 8.3775 T4 + while the (31,26,3) Hamming code has capacity

0.8072-0.3873T+0.1181 T2-0.0258T3+0.0047T4+. . . .

These are plotted in Fig. 4. The derivative at T i 0 tends to -2 (n - 1)/(3 7r2 In 2) for Hamming codes as n + 00; this is the same as for the uncoded case.

C. Intermediate Error Probabilities

If we ignore the noise scaling factor and simply consider the capacity of the 2k x 2k channel compared to the uncoded BSC, we still must lose capacity by using any code other than the (n, n, 1) code. This follows because even if we remove the decoder, which cannot decrease capacity, we must lose some capacity for all p < l / 2 , as is demonstrated in Section V.

The question becomes whether we can operate at a p for which the error probability delivered by the coding scheme is low, but the capacity of the overall system is high, i.e., relatively close to the uncoded capacity.

For very long block lengths, this is easily seen to be possible. Selecting codes with rate arbitrarily close to the capacity of the uncoded channel, we can make the delivered error probability arbitrarily low, and hence the overall system becomes an essentially noiseless channel of rate (and capacity) arbitrarily close to the capacity of the uncoded channel.

For specific classes of codes, we do not necessarily achieve this favorable tradeoff. For the Hamming codes, it will emerge in Section V that in comparing the performance of long codes, the figure of interest is p = c / n , with c fixed. Capacity is

H ( Y I X ) = k -

Page 9: The capacity of coded systems

COFFEY AND KIELY: THE CAPACITY OF CODED SYSTEMS 121

where p z is the transition probability to a codeword of weight i. Fixing c and letling n become large, we find that

P' -

If i > 0 this is

- i p 2 - 1 ( 1 - p ) - + 1 + p , ( l - p ) ~ - - z + p , + l ( l - p ) n-i-1 .

i (c/n),-1(1 - c/n)n--z+l (1 + o(1)).

Then

In l / p 2 = ( i - 1) In n ( 1 + o(1)) - ln i.

Noting that

po log, l/po -+ c(c+ 1)e-"/ In 2 and

i>O

are both O(l), we find that

Taking the derivatives of pi log, l/pi, we find that

and so on. Since p , is at most O(p"+') (apart from PO, which is 1 - O(pf+l ) ) , each of these terms is 0 at p = 0 for the first t derivatives. For the (t + 1)st derivative, the (In (l /pz)) dt++'p,/dpt++' term is - cc at p = 0 (and all other terms are finite).

All decoders decoding to t errors will have practically the same capacity for small p . Suppose we are interested in refining this still further, on the basis that the best decoder at p --f 0 will also be best throughout some region near 0, Then we should minimize the magnitude of the following term, which is ct+l(t + l ) ( t + l ) ! In (l/p) + O(1) where ci is the number of words of weight i that are not coset representatives, i.e , error patterns of weight i that are misdecoded. This means we should minimize ct+l, i.e., adopt a maximum-likelihood rule for words of weight t + 1 also. The form of the general optimal decoder is unknown.

So the capacity is IV. DECODER, No ENCODER

For a binary-symmetric channel with fixed standard array IC - c( 1 - e-') In n (1 + o( 1)) bitskodeword.

The uncoded capacity is decoder but no encoder we have, from (3),

n - n%(p) = n - c log, n + O(1) Cd, B = R - ( min H(Y1w)) bitskhannel use n WEFT

and the capacity loss over the uncoded case is therefore

n - IC - ce-' log, n( 1 + o( 1))

and this is achieved when we use only the elements of a single coset as inputs to the channel, with equiprobable signaling.

Signaling using the code is always at least jointly optimal as p 4 0. It is not in general a unique optimum, however. For = (1 - Ce-? log, 71 (1 -t- o(1)) biEAxxhmd.

This function has its minimum at c = 1, when it takes the value 0.632 log, n, after which the capacity loss rises back to log, n + o(n). the value it takes at p = 0. Thus we are left with a certain minimum capacity loss (as a fraction of the capacity loss at p = 0) for long Hamming codes. Intuitively p must be some distance away from 0 before we can get any reduction in capacity loss, but if it becomes too high, enough errors will be introduced by the decoder to remove the effectiveness of the decoder. This begins happening at the point at which the expected number of errors matches the error-correcting capability of the code; for comparison c M 1.25 is the point at which a maximum-likelihood decoder yields the same error probability as the underlying channel, as will be seen in Section V.

D. Very Quiet Limit As p 4 0, we demonstrate that C e d , ~ ( p ) has its first t

derivatives equal to 0; the next is -m. Here t is the guaranteed error-correcting power of the combination of encoder and decoder, and it follows that the optimal decoder in the very quiet limit will be one that takes every minimum-weight element in a coset as coset representative, if that element has weight up to [ (d - 1)/2J. (There may be many such decoders, since this does not specify what happens in any other cosets.)

the (8,4,4) Hamming code with the decoder that produces the highest Cld, G, we can use the coset containing (1 0 0 0 0 0 0 O), where the coset leaders have been chosen so that every leader of weight 2 has a 1 in its first position. Then for all p , the capacity of the system using these inputs is exactly the same as that obtained using the code. (The sets & and & + w are identical. This occurs for codes with an overall parity check where we select coset leaders to have a 1 in the parity check position whcn possible, and for many other codes also, including the best ML decoder for the (7,3) simplex code given in Section 111-A.)

If the decoder is the "ignore parity checks" decoder, then C d , = R(l - X ( p ) ) bitskhannel use, regardless of the coset used.

As p -+ 1/2, by modifying the derivation in Appendix I the capacity becomes

. E2 + O ( E 3 )

where E = 1/2 - p and w has been chosen to maximize the first term on the right. As before

Page 10: The capacity of coded systems

122 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 1, JANUARY 1997

which is the first term multiplied by Ps/7rNo E'. If d l > 2, we can rewrite this in the same form as (7), with q, replaced by 2"-'" - q, for every i for which w, = 1. However, this is exactly the expression we would get by replacing the set of coset representatives E by E + w, and this cannot change the capacity, as noted in Section 111. So all cosets give the same C2,G when d l > 2. For higher order terms, adding w to c does not give the same result as adding w to e, and the behavior away from p = 1/2 varies with the coset. For the (7, 4) Haiiiming code, capacity on the Gaussian channel becomes

0.5166 - 0.4178 T + 0.4560 T 2 - 0.4662 T3 + . ', for any (proper) coset. For the (7, 3) simplex code with coset representatives chosen to maximize C*, we obtain the same capacity as before using a coset obtained by adding a 1 in the location with q; = 8 to the code; we obtain

0.3608 - 0.1849 T + 0.1836 T 2 + . . . for a coset obtained by adding a 1 in a location with 4% = 4 to the code; and we obtain

0.3608 - 0.3154 T + 0.3075 T2 + ' . . for a coset obtained by adding a 1 in a location with pa = 2 to the code. The (5,2,3) code generated by 10110 and 01011 has di = 2, and so the cosets do not all necessarily give the same Ci, G. We choose the coset representatives to be of minimum weight, with a 1 in first position in case of ties. Then using the code gives capacity 0.471 + 0.414T1/' + O ( T ) bits per second, while using the coset containing 10000 gives capacity 0.333 + 0.155 T1/2 + O ( T ) bits per second.

Note that in each case examined, capacity in the region near T = 0 is no higher when a proper coset is used than when the code is used. We do not know of any example where using a coset gives a higher capacity than using the code when the decoder is fixed.

V. ENCODER, N o DECODER From Section 11-C, the capacity is

e t E

bitsichannel use. 1

.log, ~

Ae+c (P)

We find that Ce, B = 1 - %(p) if and only if the third term is 1 - R, i.e., if and only if each coset is equally likely. Apart from the degenerate case R = 1, when we only have one coset, it is known that this can only happen at p = 1/2 [2], so capacity must be lost on a BSC if the uncoded capacity is nonzero.

For the hard-quantized Gaussian channel, we find that with any code, linear or nonlinear, we achieve the full uncoded capacity if and only if in each position zero and one are equally likely to be transmitted. Expanding (4) about p = 1/2 and using p = Q as in Section 11-A, we find (see 0

Appendix 111) that

where Px(c) is the probability distribution used for the input codewords. Now we recognize that

X E C ccc a=1

where 4% is the probability that two randomly selected code- words differ in the ith position; 4% = 2pZ(1 - p,), with p, the probability that a randomly selected codeword has a 1 in the ith position. The sum is maximized if and only if each pz = l / 2 , when it takes the value n/2, from which (17) gives the capacity of the uncoded channel.

A. Intermediate Values of p

For long codes, the situation with no decoder can be no worse than the situation with a decoder, for which it was shown in Section 111-C that arbitrarily low error probabilities are possible with arbitrarily small loss of capacity. For specific families of codes, this was not true; of course, we can achieve more favorable results when no decoder is present.

For the Hamming codes, we find that

CB ( P ) - Ce, B ( P ) = PC (n - I C ) - ~ ( P c ) + (1 - P C ) log2 (1 + I l n )

bitsicodeword where pc is the probability of receiving a codeword given that one was transmitted, which is (n + 1)-'( 1 + n( 1 - 2 ~ ) ( ~ + ~ ) / ' ) [14, p. 1.571. This difference is monotonically decreasing in p for all Hamming codes, taking the value 0 at p = 1/2. The capacity loss can be rewritten as

n n + l + ;Ft(l/(n + 1)) bitdcodeword.

C&) - C e , , B ( p ) = ~ (1 - 2p)@+l)', log, n - X ( p c )

As n becomes large, it is convenient to express p as c / n Then pc M e-c and

c B ( p ) - C e , B ( P ) !Z e-' log, n - x(t(pC) f x ( l / ( n f 1)).

This can be further simplified to

bitsicodeword. The capacity loss at p = 0 is log, (n + I), and thus

CB(P) - c e , B ( P ) -+ e - c ( C ~ ( 0 ) - C e , ~ ( 0 ) ) .

Page 11: The capacity of coded systems

COFFEY AND KIELY THE CAPACITY OF CODED SYSTEMS 123

We need information on the gain in reliability for different values of p . Define go(p) (resp., gl(p)) to be the probability of receiving a word from a particular proper coset with a 0 (resp., 1) in a particular location given 0 transmitted; and define pc , to be the probability of an error pattern equal to a codeword, and with a 1 in the particular location. Then the probability of incorrectly decoding any particular information bit is p , = pc , l + g ~ ( p ) + ( n - l ) gl(p), whereas the probability of receiving a 1 in that location given 0 transmitted is p = P C , l + ng1(p). s o

pe = P + (gob) - g l ( ~ ) ) = P + (Pr(coset) - 2gl(p)).

The probability of a particular coset is (n + 1)-l(1 - (1 - Zp)("+l)/ ') . We find gl(p) by taking the enumerator for codewords of weight i - 1 with a 0 in first position, and adding a 1 in first position, i.e.,

n n - i + l g1(p) = Ai-1 pi(1 - p y - i

i= l

so that 1

2" p e = p - - (-(1- 2p) + (1 - 2p)(n-1)/2

. (1 - 2p+ 2p2 + 2 n p ( l - p ) } ) .

Now when p is such that the second term is zero, we gain nothing from the code, and for higher p , the code has made things worse. This happens for large n roughly where 4p2(1 - p)'(l - 2p)"n' M 1, or, substituting p = c / n , where ec = 1 + 2c. For other values of c the gain in reliability is approximately

(e-'(1+ ZC)' - 1)/(n + 1) x (eVC(1 + ~ c ) ~ - 1) c-lp.

The crossover point of no gain is at c M 1.25 (a result given in [7 ] ) and at this point the capacity loss is a fraction 1/3.5 = 0.28 of its maximum value at p = 0. For general values of c the gain in reliability is approximately

( e P ( 1 + 2c)' - l ) / (n + 1) M (e-'(1 + ZC)' - 1) c - l p

and the capacity loss is a fraction e-' of its maximum value. This provides a tradeoff of the type in Fig. 5, in which the higher curve is the capacity loss expressed as a fraction of the capacity loss at c = 0, and the lower curve is the amount by which the error probability has been reduced, as a fraction of the crossover probability of the underlying channel.

The derivations in this section have assumed that capacity is being maximized, i.e., that all codewords are equally likely to be transmitted. Other capacityjerror probability tradeoffs are possible by using various subcodes.

VI. CONCLUSION A detailed analysis of the capacity loss on the superchannels

obtained by fixing elements of coding schemes on a binary- symmetric channel has been presented. The amount of capacity sacrificed depends strongly on which parts of the coding scheme have been fixed. In general, fixing any part of a

Relative capacity loss

Fig. 5. Hamming codes without a decoder.

Tradeoff between capacity (loss) and enor probability for long

decoder will have a larger effect on capacity than fixing an encoder.

A number of problems on functions of weight enumerators arise from the case with an encoder and decoder over a Gaussian channel. Capacity in this case is affected by the choice of coset representatives. Although it is not known in general when maximum-likelihood decoding is optimal, a number of results have been derived for classes of codes for which detailed information on coset weight enumerators is available. Maximum-likelihood decoding is shown to be optimal for all Hamming codes other than the (7,4) code, and T -+ 0 is shown to be locally optimal also. For a system with fixed decoder, an optimal scheme is to signal using the code or a single coset. In the wideband limit on the Gaussian channel, every coset gives the same capacity to first order if d L > 2; we conjecture that it is always optimal to use the code. Although a favorable tradeoff between error probability and capacity loss can be achieved for unrestricted long codes, even when a decoder must be used, the tradeoffs obtainable from the class of Hamming codes are generally much poorer.

APPENDIX I VERY NOISY LIMIT WITH ENCODER AND DECODER

With a fixed encoder and decoder, the capacity in bits/ channel use is

." CEC

from (1). We expand around p = 1/2 to get

Ac+E(1/2 - E , 1/2 + E ) = 2-'(1+ f i ( c ) ~ + ~ ' ( C ) E ' + ...) i.e., f i ( c ) is defined as

( Z k / i ! ) ( @ / t k i ) Ac+~(1 /2 - E , 1/2 + E) I ,=o . Note that

f i ( C ) = (2"/i!) (ai/a&z) C € C

CEC

if i > 0, since

AC+&(1/2 - E , 1/2 + E ) = 1. CEC

Page 12: The capacity of coded systems

124 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 43, NO. 1, JANUARY 1997

Expanding the log term in the standard power series and collecting terms gives

Now

2 k - n --

i ! e € &

This yields

4 1 f3(c) = - - ( ( n - 2wt (e + e ) ) 3

3 2 n - k e € &

- (3n - 2)(n - 2wt ( e + e ) ) )

So, since the capacity of the uncoded channel behaves as $2\varepsilon^2/\ln 2$ near $p = 1/2$, the ratio of coded-to-uncoded capacity in this limit is given by

$$\frac{2^{k-2n}}{n}\sum_{c \in C}\Big(\sum_{e \in E}\big(n - 2\,\mathrm{wt}(c+e)\big)\Big)^2.$$
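As a sketch (not from the paper), the ratio above, as reconstructed, can be evaluated directly for the (7,4) Hamming code with coset-leader decoding; the choice of code and leaders is an assumption made here for illustration.

```python
from itertools import product

n, k = 7, 4
H = [(1,0,1,0,1,0,1), (0,1,1,0,0,1,1), (0,0,0,1,1,1,1)]
code = [v for v in product((0,1), repeat=n)
        if all(sum(a*b for a, b in zip(row, v)) % 2 == 0 for row in H)]
leaders = [tuple([0]*n)] + [tuple(int(i == j) for i in range(n)) for j in range(n)]

def wt(v):
    return sum(v)

def inner(c):
    """Sum over coset leaders e of (n - 2 wt(c+e))."""
    return sum(n - 2*wt(tuple((a + b) % 2 for a, b in zip(c, e))) for e in leaders)

ratio = 2**(k - 2*n) / n * sum(inner(c)**2 for c in code)
print(ratio)        # 0.5625 = 9/16 of the uncoded capacity retained as p -> 1/2
```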

A. Gaussian Channel

In proving (2), it is more convenient to prove a more general result. Abdel-Ghaffar and McEliece [1] discuss the capacity of channels with noise scaling, of which the Gaussian channel with adjustable $T$ is an example. In general, we take the crossover probability $p$ to be a function of some resource $\lambda$. In storage systems, for example, this can be the area per bit. Then the number of bits per unit resource that can be transmitted is $I(p(\lambda))/\lambda$. For the Gaussian channel, $\lambda$ is the time per transmitted bit $T_b$, and $p(T_b) = Q\big(\sqrt{2P_sT_b/N_0}\big)$.

Writing $p(\lambda) = 1/2 - \varepsilon(\lambda)$ for $\lambda$ very small gives, via the above results,

$$C = \frac{2^{k-2n+1}}{n\ln 2}\sum_{c \in C}\Big(\sum_{e \in E}\big(n - 2\,\mathrm{wt}(c+e)\big)\Big)^2\cdot\lim_{\lambda\to 0}\frac{\varepsilon^2(\lambda)}{\lambda}\quad\text{bits/unit resource}.$$

For the uncoded channel, we get

$$C = \frac{2}{\ln 2}\,\lim_{\lambda\to 0}\frac{\varepsilon^2(\lambda)}{\lambda}\quad\text{bits/unit resource.}\qquad(19)$$

In the Gaussian case, (19) gives the well-known result $C_G = 2P_s/(\pi N_0\ln 2)$ bits/s.
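For illustration (a minimal numerical sketch, not from the paper, with $P_s/N_0$ normalized to 1), the Gaussian wideband behavior quoted above can be checked directly: $\varepsilon(T) = 1/2 - Q(\sqrt{2T})$ satisfies $\varepsilon^2/T \to 1/\pi$, so (19) becomes $2P_s/(\pi N_0\ln 2)$ bits/s.

```python
import math

def Q(x):
    """Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2))

for T in (1e-2, 1e-4, 1e-6):                 # T = Ps*Tb/N0 (dimensionless time)
    eps = 0.5 - Q(math.sqrt(2*T))
    print(T, eps**2 / T)                     # -> 1/pi = 0.3183...

print(2 / (math.pi * math.log(2)))           # wideband limit: ~0.9185 (Ps/N0) bits/s
```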

B. Higher Order Terms

While the behavior of the capacity in the limit of zero resource per transmitted bit is a function only of the code, and not of the effect of noise scaling on the channel, this is not true of the behavior for larger $\lambda$. We confine ourselves to the calculation of the power series expansion for the Gaussian case.

Here we need to expand $\varepsilon$ as a function of $T$, defining $T$ as $P_sT_b/N_0$ (dimensionless time). We have

$$\varepsilon(T) = 1/2 - Q\big(\sqrt{2T}\big)$$

or

$$\varepsilon(T) = \sqrt{T/\pi}\,\big(1 - T/3 + T^2/10 - \cdots\big)$$

and so on. Inserting this expansion into (18) gives the corresponding power series in $T$.

Comparing this to (18), we see that the term $-\pi f_1^2/3$ has appeared in the coefficient of $T$, arising from the higher order terms in the expansion of $\varepsilon(T)$; a different correction term would arise for a different noise scaling rule. In general, if the channel improves more slowly with increasing $T$ (or $\lambda$), larger negative correction terms will be added to higher powers of $T$. A channel for which $p$ decays sufficiently slowly with increasing resource will have its globally optimal capacity at $\lambda = 0$, assuming this quantity is nonzero, since all higher powers of $T$ will have large negative multiples of $f_1^2$ added.
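The small-$T$ expansion of $\varepsilon(T)$ used above can be spot-checked numerically; the sketch below is illustrative only and uses the simplification $1/2 - Q(\sqrt{2T}) = \tfrac{1}{2}\mathrm{erf}(\sqrt{T})$.

```python
import math

def eps_exact(T):
    # 1/2 - Q(sqrt(2T)) simplifies to erf(sqrt(T))/2 for the Gaussian channel
    return 0.5 * math.erf(math.sqrt(T))

def eps_series(T):
    return math.sqrt(T / math.pi) * (1 - T/3 + T*T/10)

for T in (0.1, 0.01, 0.001):
    print(T, eps_exact(T), eps_series(T))    # agreement improves rapidly as T -> 0
```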

C. Alternative Form of Capacity Expression

Here we demonstrate that (6) and (7) are equivalent expressions for the wideband limit capacity provided $d^{\perp} > 2$.


From (6), the capacity depends on the choice of coset representatives through the sum $\sum_{c \in C}\big(\sum_{e \in E}\mathrm{wt}(c+e)\big)^2$. Writing $q_i$ for the number of coset representatives with a one in position $i$, so that $\sum_{e \in E}\mathrm{wt}(c+e) = 2^{n-k}\,\mathrm{wt}(c) + \sum_{i=1}^n q_i(1-2c_i)$, we have

$$\sum_{c \in C}\Big(2^{n-k}\,\mathrm{wt}(c) + \sum_{i=1}^n q_i(1-2c_i)\Big)^2 = \sum_{c \in C}2^{2n-2k}\,\mathrm{wt}^2(c) + 2^{n-k+1}\sum_{i=1}^n q_i\sum_{c \in C}(1-2c_i)\,\mathrm{wt}(c) + \sum_{c \in C}\Big(\sum_{i=1}^n q_i(1-2c_i)\Big)^2$$

$$= n(n+1)2^{2n-k-2} + 2^{n-k+1}\sum_{i=1}^n q_i\sum_{c \in C}(1-2c_i)\,\mathrm{wt}(c) + 2^k\sum_{i=1}^n q_i^2 + \sum_{i=1}^n\sum_{j\neq i}q_iq_j\sum_{c \in C}(1-2c_i)(1-2c_j),$$

where we have used the fact that $d^{\perp} > 2$ to simplify the first term.

Now

$$\sum_{c \in C}(1-2c_i)\,\mathrm{wt}(c) = \sum_{c \in C:\,c_i=0}\mathrm{wt}(c) - \sum_{c \in C:\,c_i=1}\mathrm{wt}(c) = 2\sum_{c \in C:\,c_i=0}\mathrm{wt}(c) - \sum_{c \in C}\mathrm{wt}(c).$$

Noting that fixing $c_i = 0$ amounts to shortening the code, we have

$$\sum_{c \in C:\,c_i=0}\mathrm{wt}(c) = (n-1)2^{k-2}$$

and

$$\sum_{c \in C}(1-2c_i)\,\mathrm{wt}(c) = -2^{k-1}.$$

The last term above can be rewritten as

$$\sum_{i=1}^n\sum_{j\neq i}q_iq_j\sum_{c \in C}(1-2c_i)(1-2c_j)$$

and the inner sum vanishes if $d^{\perp} > 2$ (by [15, ch. 5, Theorem 8]).

Making these substitutions yields

$$\sum_{c \in C}\Big(2^{n-k}\,\mathrm{wt}(c) + \sum_{i=1}^n q_i(1-2c_i)\Big)^2 = n(n+1)2^{2n-k-2} - 2^k\sum_{i=1}^n q_i\big(2^{n-k} - q_i\big)$$

and substituting into (6) gives (7).
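The identity just derived can be verified numerically for a small code with $d^{\perp} > 2$; the sketch below (not from the paper) uses the (7,4) Hamming code with coset leaders $\{0, e_1, \ldots, e_7\}$, so that every $q_i = 1$.

```python
from itertools import product

n, k = 7, 4
H = [(1,0,1,0,1,0,1), (0,1,1,0,0,1,1), (0,0,0,1,1,1,1)]
code = [v for v in product((0,1), repeat=n)
        if all(sum(a*b for a, b in zip(row, v)) % 2 == 0 for row in H)]
q = [1] * n                     # one weight-1 coset leader per position

def wt(v):
    return sum(v)

lhs = sum((2**(n - k) * wt(c) + sum(qi * (1 - 2*ci) for qi, ci in zip(q, c)))**2
          for c in code)
rhs = n * (n + 1) * 2**(2*n - k - 2) - 2**k * sum(qi * (2**(n - k) - qi) for qi in q)
print(lhs, rhs)                 # both sides equal 13552
```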

APPENDIX II
OPTIMALITY OF MAXIMUM-LIKELIHOOD DECODING FOR LONGER HAMMING CODES

The problem is to minimize $\sum_i q_i(n+1-q_i)$ subject to a) $0 \le q_i \le (n+1)/2$, each $q_i$ an integer; b) $\sum_i q_i \ge n + z$, where $z$ is the number of $q_i$ that are zero. We need one other condition to rule out some possible solutions: there are at least $z$ coset representatives with weight 2 or greater, and if $z > 4$, at least $z - 4$ of these have at least a single one outside any three given places. This follows from the need to have distinct coset representatives.

Applying these conditions to the (15,11) code gives two solutions to $\sum_i q_i(n+1-q_i) \le n^2$, namely $q = (8, 8, 8, 2, 0, \ldots, 0)$ and $q = (8, 8, 7, 1, 1, 0, \ldots, 0)$; but both violate the coset representatives condition. We can therefore assume in what follows that $n \ge 31$.
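For concreteness (an illustrative check, not part of the original argument), the two vectors quoted for the (15,11) code can be evaluated directly: both satisfy conditions a) and b) and fall below $n^2 = 225$, so only the coset-representative condition excludes them.

```python
n = 15
for q in ([8, 8, 8, 2] + [0]*11, [8, 8, 7, 1, 1] + [0]*10):
    z = q.count(0)
    val = sum(qi * (n + 1 - qi) for qi in q)
    print(val, val <= n**2, sum(q) >= n + z)     # 220 and 221; both constraints hold
```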

We consider the three cases $z < (n-1)/2$, $(n-1)/2 \le z < (3n-3)/4$, and $z > (3n-3)/4$. (Note that $(3n-3)/4$ is not an integer, since $n = 2^m - 1$.)

Case I, $z < (n-1)/2$: Here $\max q_i \le z+1$ if we only reassign $z$ coset representatives. The maximum occurs when we have $q = (z+1, z+1, 1, \ldots, 1, 0, \ldots, 0)$, so that

$$\sum_i q_i(n+1-q_i) = n^2 + z(n-2z-2) \ge n^2,$$

since $n - 2z - 2 > 0$ when $z < (n-1)/2$. Reassigning more than $z$ coset representatives while still keeping only $z$ of the $q_i$'s equal to zero only makes things worse, in that the optimal solution will not contain any quantities lower than the solution given above.

Case II, $(n-1)/2 \le z < (3n-3)/4$: Here two $q_i$'s can take the value $(n+1)/2$, and the optimal solution for any fixed $z$ is

$$q = \big((n+1)/2,\ (n+1)/2,\ 2z-n+2,\ 1, \ldots, 1,\ 0, \ldots, 0\big).$$

Writing $z = (n-1)/2 + s$, we expand $\sum_i q_i(n+1-q_i)$ to $n^2 - n/2 + 1/2 - 4s^2 + (n-2)s$. This being a quadratic in $s$ with negative second derivative, the minimum can only occur at the limits of the range, which is $0 \le s \le (n-3)/4$. The two endpoints give $n^2 - n/2 + 1/2$ and $n^2 - n/4 - 1/4$, respectively, but the next two points, $s = 1$ and $s = (n-7)/4$, give $n^2 + n/2 - 11/2$ and $n^2 + 3n/4 - 33/4$, respectively, and thus all intermediate values of $s$ give $\sum_i q_i(n+1-q_i) > n^2$.

The optimal solutions at the endpoints both violate the distinct coset representatives condition, and the next best solution in each case exceeds $n^2$: for $s = 0$,

$$q = \big((n+1)/2,\ (n-1)/2,\ 2,\ 1, \ldots, 1,\ 0, \ldots, 0\big)$$

gives $n^2 + n/2 - 5/2$, while for $s = (n-3)/4$,

$$q = \big((n+1)/2,\ (n+1)/2,\ (n-3)/2,\ 2,\ 1, \ldots, 1,\ 0, \ldots, 0\big)$$

gives $n^2 + 3n/4 - 21/4$.


Case III, $z > (3n-3)/4$: Here three $q_i$'s can take the value $(n+1)/2$, and the optimal solution for any fixed $z$ is

$$q = \big((n+1)/2,\ (n+1)/2,\ (n+1)/2,\ 2z - 3n/2 + 5/2,\ 1, \ldots, 1,\ 0, \ldots, 0\big).$$

Writing $z = (3n-3)/4 + 1/2 + s$, we find that

$$\sum_i q_i(n+1-q_i) = n^2 - n/4 - 5/4 - 4s^2 + (n-6)s.$$

Now $0 \le s \le (n-15)/4$, where the upper limit arises from the fact that $z \le n - 4$, a necessary condition to represent all cosets. The upper limit gives

$$q = \big((n+1)/2,\ (n+1)/2,\ (n+1)/2,\ (n-11)/2,\ 0, \ldots, 0\big)$$

and we find after manipulations that

$$\sum_i q_i(n+1-q_i) = n^2 + 2n - 35.$$

The lower limit $s = 0$ gives $n^2 - n/4 - 5/4$, while $s = 1$ gives $n^2 + 3n/4 - 45/4$. This again means that all solutions with $s > 0$ are $> n^2$. The optimal solution with $s = 0$ again violates the distinct coset representatives condition, and the next best solution for this case,

$$q = \big((n+1)/2,\ (n+1)/2,\ (n-1)/2,\ 3,\ 1, \ldots, 1,\ 0, \ldots, 0\big),$$

gives $n^2 + 3n/4 - 25/4$.
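The closed forms quoted in Cases I-III can be spot-checked numerically; the sketch below (illustrative only, with $n = 31$ chosen as an example) builds the $q$ vectors described in the text and compares the direct sums with the stated formulas.

```python
# Numerical spot-check of the closed forms in Cases I-III for n = 31.
n = 31

def S(q):
    """sum of q_i (n+1-q_i)."""
    return sum(qi * (n + 1 - qi) for qi in q)

# Case I: q = (z+1, z+1, 1, ..., 1, 0, ..., 0) with z zeros -> n^2 + z(n-2z-2)
for z in range(1, (n - 1)//2):
    q = [z + 1, z + 1] + [1]*(n - z - 2) + [0]*z
    assert S(q) == n**2 + z*(n - 2*z - 2)

# Case II: q = ((n+1)/2, (n+1)/2, 2z-n+2, 1, ..., 1, 0, ..., 0), z = (n-1)/2 + s
for s in range(0, (n - 3)//4 + 1):
    z = (n - 1)//2 + s
    q = [(n + 1)//2, (n + 1)//2, 2*z - n + 2] + [1]*(n - z - 3) + [0]*z
    assert 4*S(q) == 4*n**2 - 2*n + 2 - 16*s**2 + 4*(n - 2)*s    # n^2 - n/2 + 1/2 - 4s^2 + (n-2)s

# Case III: q = ((n+1)/2, (n+1)/2, (n+1)/2, 2+2s, 1, ..., 1, 0, ..., 0), z = (3n-3)/4 + 1/2 + s
for s in range(0, (n - 15)//4 + 1):
    z = (3*n - 3)//4 + 1 + s
    q = [(n + 1)//2]*3 + [2 + 2*s] + [1]*(n - z - 4) + [0]*z
    assert 4*S(q) == 4*n**2 - n - 5 - 16*s**2 + 4*(n - 6)*s      # n^2 - n/4 - 5/4 - 4s^2 + (n-6)s

print("all cases agree")
```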

APPENDIX III
ENCODER, NO DECODER

To derive (16), we note that for the BSC we have $H(Y \mid X) = n\,\mathcal{H}(p)$ and

$$H(Y) = -\sum_{y} P_Y(y)\log_2 P_Y(y),$$

where

$$P_Y(y) = \sum_{x} P_X(x)\,p^{\mathrm{wt}(x+y)}(1-p)^{n-\mathrm{wt}(x+y)}$$

and $P_X(x)$ is the probability distribution on the input words. Then we can rewrite $I(p) = H(Y) - n\,\mathcal{H}(p)$ as in (16), where $f(P) = P\log_2\!\big(P/(1-P)\big)$. Note that $\big(\partial^2 f(P)/\partial P^2\big)\big|_{P=1/2} = 8/\ln 2$.
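The quantity $I(p) = H(Y) - n\mathcal{H}(p)$ can be computed by brute force for a small example; the following Python sketch (not from the paper) assumes uniform inputs over the (7,4) Hamming code and $p = 0.1$ purely for illustration.

```python
from itertools import product
import math

n, k, p = 7, 4, 0.1
H = [(1,0,1,0,1,0,1), (0,1,1,0,0,1,1), (0,0,0,1,1,1,1)]
code = [v for v in product((0,1), repeat=n)
        if all(sum(a*b for a, b in zip(row, v)) % 2 == 0 for row in H)]

def binary_entropy(x):
    return -x*math.log2(x) - (1 - x)*math.log2(1 - x)

def P_Y(y):
    """Output distribution for uniform inputs over the code."""
    total = 0.0
    for x in code:
        d = sum((a + b) % 2 for a, b in zip(x, y))       # Hamming distance wt(x+y)
        total += 2**-k * p**d * (1 - p)**(n - d)
    return total

H_Y = -sum(P_Y(y) * math.log2(P_Y(y)) for y in product((0,1), repeat=n))
I = H_Y - n * binary_entropy(p)
print(I / n, "bits per channel use at p =", p)
```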

ACKNOWLEDGMENT

The authors wish to acknowledge a very helpful and thorough review.

REFERENCES

[1] K. Abdel-Ghaffar and R. J. McEliece, "The ultimate limits of information density," in Proc. NATO Advanced Study Institute on Performance Limits in Communication Theory and Practice, Il Ciocco, July 1986.

[2] T. C. Ancheta, “Duality for the group-coset probability ratio,” IEEE Trans. Inform. Theory, vol. IT-28, no. 1, pp. 110-111, Jan. 1982.

[3] E. F. Assmus, Jr., and V. Pless, "On the covering radius of extremal self-dual codes," IEEE Trans. Inform. Theory, vol. IT-29, no. 3, pp. 359-363, May 1983.

[4] E. R. Berlekamp, Algebraic Coding Theory. New York: McGraw-Hill, 1968.

[5] V. M. Blinovskii, "Lower asymptotic bound on the number of linear code words in a sphere of given radius in $F_q^n$," Probl. Pered. Inform., vol. 23, no. 2, pp. 50-53, 1987. (In Russian. English translation in Probl. Inform. Transm., vol. 23, no. 2, pp. 130-133, 1987.)

[6] K.-M. Cheung, D. Divsalar, S. Dolinar, I. Onyszchuk, F. Pollara, and L. Swanson, “Changing the coding system on a spacecraft in flight,” in Proc. IEEE Int. Symp. on Information Theory (San Antonio, TX, Jan. 1993), p. 381.

[7] P. Delsarte, “Partial-optimal piecewise decoding of linear codes,” IEEE Trans. Inform. Theory, vol. IT-24, no. 1, pp. 70-75, Jan. 1978.

[8] G. D. Forney, Jr., Concatenated Codes. Cambridge, MA: MIT Press, 1966.

[9] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.

[10] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, 2nd ed. New York: Cambridge Univ. Press, 1952.



[11] A. B. Kiely and J. T. Coffey, "The cost of encoding and decoding," in Proc. IEEE Int. Workshop on Information Theory (Bahia, Brazil, June 1992), pp. 52-56.

[12] A. B. Kiely, J. T. Coffey, and M. R. Bell, “Optimum information bit decoding of linear block codes,” IEEE Trans. Inform. Theory, vol. 41, no. 1, pp. 130-140, Jan. 1995.

[13] V. Lauer, "Using channel capacity as a criterion in the design of a communication system," European Trans. Telecomm. Related Technol., vol. 6, no. 4, pp. 447-454, July/Aug. 1995.

[14] R. J. McEliece, The Theory of Information and Coding. Reading, MA: Addison-Wesley, 1977.

[15] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes. Amsterdam, The Netherlands: North-Holland, 1978.

[16] J. L. Massey, “On the fractional weight of distinct binary n-tuples,” IEEE Trans. Inform. Theory, vol. IT-20, no. 1, p. 131, Jan. 1974.

[17] J. R. Pierce and E. C. Posner, Introduction to Communication Science and Systems. New York: Plenum, 1980.

[18] T. R. N. Rao and E. Fujiwara, Error-Control Coding for Computer Systems. Englewood Cliffs, NJ: Prentice-Hall, 1989.

