
868 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 42, NO. 3, MAY 1996

Capacity, Mutual Information, and Coding for Finite-State Markov Channels

Andrea J. Goldsmith, Member, IEEE and Pravin P. Varaiya, Fellow, IEEE

Abstract—The Finite-State Markov Channel (FSMC) is a discrete time-varying channel whose variation is determined by a finite-state Markov process. These channels have memory due to the Markov channel variation. We obtain the FSMC capacity as a function of the conditional channel state probability. We also show that for i.i.d. channel inputs, this conditional probability converges weakly, and the channel's mutual information is then a closed-form continuous function of the input distribution. We next consider coding for FSMC's. In general, the complexity of maximum-likelihood decoding grows exponentially with the channel memory length. Therefore, in practice, interleaving and memoryless channel codes are used. This technique results in some performance loss relative to the inherent capacity of channels with memory. We propose a maximum-likelihood decision-feedback decoder with complexity that is independent of the channel memory. We calculate the capacity and cutoff rate of our technique, and show that it preserves the capacity of certain FSMC's. We also compare the performance of the decision-feedback decoder with that of interleaving and memoryless channel coding on a fading channel with 4PSK modulation.

Index Terms-Finite-state Markov channels, capacity, mutual information, decision-feedback maximum-likelihood decoding.

I. INTRODUCTION

THIS PAPER extends the capacity and coding results of Mushkin and Bar-David [1] for the Gilbert-Elliot channel to a more general time-varying channel model. The Gilbert-Elliot channel is a stationary two-state Markov chain, where each state is a binary-symmetric channel (BSC), as in Fig. 1. The transition probabilities between states are g and b, respectively, and the crossover probabilities for the "good" and "bad" BSC's are p_G and p_B, respectively, where p_G < p_B. Let x_n ∈ {0, 1}, y_n ∈ {0, 1}, and z_n = x_n ⊕ y_n denote, respectively, the channel input, channel output, and channel error on the nth transmission. In [1], the capacity of the Gilbert-Elliot channel is derived as

Manuscript received February 18, 1994; revised September 15, 1995. This work was supported in part by an IBM graduate fellowship, and in part by the PATH program, Institute of Transportation Studies, University of California, Berkeley. The material in this paper was presented in part at the IEEE International Symposium on Information Theory, Trondheim, Norway, June 1994.

A. J. Goldsmith is with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 USA.

P. P. Varaiya is with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720 USA.

Publisher Item Identifier S 0018-9448(96)02935-5.

C = 1 − lim_{n→∞} E[h(q_n)]    (1)

where h is the entropy function, q_n = p(z_n = 1 | z^{n−1}), q_n converges to q_∞ in distribution, and q_∞ is independent of the initial channel state.

In this paper we derive the capacity of a more general finite-state Markov channel, where the channel states are not necessarily BSC's. We model the channel as a Markov chain S_n which takes values in a finite state space C of memoryless channels with finite input and output alphabets. The conditional input/output probability is thus p(y_n | x_n, S_n), where x_n and y_n denote the channel input and output, respectively. The channel transition probabilities are independent of the input, so our model does not include ISI channels. We refer to the channel model as a finite-state Markov channel (FSMC). If the transmitter and receiver have perfect state information, then the capacity of the FSMC is just the statistical average over all states of the corresponding channel capacity [2]. On the other hand, with no information about the channel state or its transition structure, capacity is reduced to that of the Arbitrarily Varying Channel [3]. We consider the intermediate case, where the channel transition structure of the FSMC is known.

The memory of the FSMC comes from the dependence of the current channel state on past inputs and outputs. As a result, the entropy in the channel output is a function of the channel state conditioned on all past outputs. Similarly, the conditional output entropy given the input is determined by the channel state probability conditioned on all past inputs and outputs. We use this fact to obtain a formula for channel capacity in terms of these conditional probabilities. Our formula can be computed recursively, which significantly reduces its computation complexity. We also show that when the channel inputs are i.i.d., these conditional state probabilities converge in distribution, and their limit distributions are continuous functions of the input distribution. Thus for any i.i.d. input distribution θ, the mutual information of the FSMC is a closed-form continuous function of θ. This continuity allows us to find I_{i.i.d.}, the maximum mutual information relative to all i.i.d. input distributions, using straightforward maximization techniques. Since I_{i.i.d.} ≤ C, our result provides a simple lower bound for the capacity of general FSMC's.

The Gilbert-Elliot channel has two features which facilitate its capacity analysis: its conditional entropy H(Y^n | X^n) is independent of the input distribution, and it is a symmetric channel, so a uniform input distribution induces a uniform output distribution. We extend these properties to a general class of FSMC's and show that for this class, I_{i.i.d.} equals the channel capacity. This class includes channels varying between


Fig. 1. Gilbert-Elliot channel.

any finite number of BSC’s, as well as quantized additive white noise (AWN) channels with symmetric PSK inputs and time-varying noise statistics or amplitude fading.

In principle, communication over a finite-state channel is possible at any rate below the channel capacity. However, good maximum-likelihood (ML) coding strategies for channels with memory are difficult to determine, and the decoder complexity grows exponentially with memory length. Thus a common strategy for channels with memory is to disperse the memory using an interleaver: if the span of the interleaver is long, then the cascade of the interleaver, channel, and deinterleaver can be considered memoryless, and coding techniques for memoryless channels may be used [4]. However, this cascaded channel has a lower inherent Shannon capacity than the original channel, since coding is restricted to memoryless channel codes.

The complexity of ML decoding can be reduced significantly without this capacity degradation by implementing a decision-feedback decoder, which consists of a recursive estimator for the channel state distribution conditioned on past inputs and outputs, followed by an ML decoder. We will see that the estimate π_n = p(S_n | x^{n−1}, y^{n−1}) is a sufficient statistic for the ML decoder input, given all past inputs and outputs. Thus the ML decoder operates on a memoryless system. The only additional complexity of this approach over the conventional method of interleaving and memoryless channel encoding is the recursive calculation of π_n. We will calculate the capacity penalty of the decision-feedback decoder for general FSMC's (ignoring error propagation), and show that this penalty vanishes for a certain class of FSMC's.

The most common example of an FSMC is a correlated fading channel. In [5], an FSMC model for Rayleigh fading is proposed, where the channel state varies over binary-symmetric channels with different crossover probabilities. Our recursive capacity formula is a generalization of the capacity found in [5], and we also prove the convergence of their recursive algorithm. Since capacity is generally unachievable for any practical coding scheme, the channel cutoff rate indicates the practical achievable information rate of a channel with coding. The cutoff rate for correlated fading channels with MPSK inputs, assuming channel state information at the receiver, was obtained in [6]: we obtain the same cutoff rate on this channel using decision-feedback decoding.

Most coding techniques for fading channels rely on built-in time diversity in the code to mitigate the fading effect. Code designs of this type can be found in [7]-[9] and the references therein. These codes use the same time-diversity idea as interleaving and memoryless channel encoding, except that the

diversity is implemented with the code metric instead of the interleaver. Thus as with interleaving and memoryless channel encoding, channel correlation information is ignored with these coding schemes. Maximum-likelihood sequence estimation for fading channels without coding has been examined in [10], [11]. However, it is difficult to implement coding with these schemes due to the code delays. In our scheme, coding delays do not result in state decision delays, since the decisions are based on estimates of the coded bits. We can introduce coding in our decision-feedback scheme with a consequent increase in delay and complexity, as we will discuss in Section VI.

The remainder of the paper is organized as follows. In Section II we define the FSMC, and obtain some properties of the channel based on this definition. In Section III we derive a recursive relationship for the distribution of the channel state conditioned on past inputs and outputs, or on past outputs alone. We also show these conditional state distributions converge to limit distributions for i.i.d. channel inputs. In Section IV we obtain the capacity of the FSMC in terms of the conditional state distributions, and obtain a simple formula for I_{i.i.d.}. Uniformly symmetric variable-noise FSMC's are defined in Section V. For this channel class (which includes the Gilbert-Elliot channel), capacity is achieved with uniform i.i.d. channel inputs. In Section VI we present the decision-feedback decoder, and obtain the capacity and cutoff rate penalties of the decision-feedback decoding scheme. These penalties vanish for uniformly symmetric variable-noise channels. Numerical results for the capacity and cutoff rate of a two-state variable-noise channel with 4PSK modulation and decision-feedback decoding are presented in Section VII.

II. CHANNEL MODEL

Let S_n be the state at time n of an irreducible, aperiodic, stationary Markov chain with state space C = {c_1, ..., c_K}. S_n is positive recurrent and ergodic. The state space C corresponds to K different discrete memoryless channels (DMC's), with common finite input and output alphabets denoted by X and Y, respectively. Let P be the matrix of transition probabilities for S_n, so

P_{km} = p(S_{n+1} = c_m | S_n = c_k)    (2)

independent of n by stationarity. We denote the input and output of the FSMC at time n by x_n and y_n, respectively, and we assume that the channel inputs are independent of its states. We will use the notation

T^n = (T_1, ..., T_n)


Fig. 2. Finite-state Markov channel.

for T = x, y, or S.

The FSMC is defined by its conditional input/output probability at time n, which is determined by the channel state at time n:

p(y_n | x_n, S_n) = Σ_{k=1}^{K} p_k(y_n | x_n) I[S_n = c_k]    (3)

where pk(y I x) = p(y I x , S = c k ) , and I[.] denotes the indicator function ( I [ & = c k ] = 1 if s, = c k and 0 otherwise). The memory of the FSMC is due to the Markov structure of the state transitions, which leads to a dependence of S,, on previous values. The FSMC is memoryless if and only if PkvL = PJw, for all k , j , and m. The finite-state Markov channel is illustrated in Fig. 2.

By assumption, the state at time n + 1 is independent of previous input/output pairs when conditioned on S_n:

p(S_{n+1} | S_n, x^n, y^n) = p(S_{n+1} | S_n).    (4)

Since the channels in C are memoryless

p(y_{n+1} | S_{n+1}, x_{n+1}, S^n, x^n, y^n) = p(y_{n+1} | S_{n+1}, x_{n+1}).    (5)

If we also assume that the x_n's are independent, then

III. CONDITIONAL STATE DISTRIBUTION

The conditional channel state distribution is the key to determining the capacity of the FSMC through a recursive algorithm. It is also a sufficient statistic for the input given all past inputs and outputs, thus allowing for the reduced complexity of the decision-feedback decoder. In this section we show that the state distribution conditioned on past input/output pairs can be calculated using a recursive formula. A similar formula is derived for the state distribution conditioned on past outputs alone, under the assumption of independent channel inputs. We also show that these state distributions converge weakly under i.i.d. inputs, and the resulting limit distributions are continuous functions of the input distribution.

We denote these conditional state distributions by the K-dimensional random vectors π_n = (π_n(1), ..., π_n(K)) and ρ_n = (ρ_n(1), ..., ρ_n(K)), respectively, where

π_n(k) = p(S_n = c_k | x^{n−1}, y^{n−1})    (9)

and

ρ_n(k) = p(S_n = c_k | y^{n−1}).    (10)

The following recursive formula for π_n is derived in Appendix I:

π_{n+1} = (π_n D(x_n, y_n) P) / (π_n D(x_n, y_n) 1)    (11)

where D(x_n, y_n) is a diagonal K × K matrix with kth diagonal term p_k(y_n | x_n), and 1 = (1, ..., 1)^T is a K-dimensional vector. Equation (11) defines a recursive relation for π_n, which takes values on the state space

Δ = {π ∈ R^K : π(k) ≥ 0 for all k, Σ_{k=1}^{K} π(k) = 1}.

The initial value for π_n is

π_0 = (p(S_0 = c_1), ..., p(S_0 = c_K))

and its transition probabilities are

p(π_{n+1} = α | π_n = β) = Σ_{x_n ∈ X} Σ_{y_n ∈ Y} I[f(x_n, y_n, β) = α] p(y_n | π_n = β, x_n) p(x_n)    (12)

where f(x_n, y_n, β) denotes the value of π_{n+1} given by (11) when π_n = β.

Note that (12) is independent of n for stationary inputs. For independent inputs, there is a similar recursive formula

for ρ_n:

ρ_{n+1} = (ρ_n B(y_n) P) / (ρ_n B(y_n) 1)    (13)

where B(y_n) is a diagonal K × K matrix with kth diagonal term p(y_n | S_n = c_k).¹ The derivation of (13) is similar to that of (11) in Appendix I, using (8) instead of (5) and removing all x terms. The variable ρ_n also takes values on the state space Δ, with initial value ρ_0 = π_0 and transition probabilities

p(ρ_{n+1} = α | ρ_n = β) = Σ_{y_n ∈ Y} I[f̂(y_n, β) = α] p(y_n | ρ_n = β)    (14)

where f̂(y_n, β) denotes the value of ρ_{n+1} given by (13) when ρ_n = β.

¹Note that B(y_n) has an implicit dependence on the distribution of x_n.


As for π_n, the transition probabilities in (14) are independent of n when the inputs are stationary.

We show in Appendix II that for i.i.d. inputs, π_n and ρ_n are Markov chains that converge in distribution to limits which are independent of the initial channel state, under some mild constraints on C. These convergence results imply that for any bounded continuous function f, the following limits exist and are equal for all i:

lim_{n→∞} E[f(π_n)] = lim_{n→∞} E[f(π_n^i)]    (15)

and

lim_{n→∞} E[f(ρ_n)] = lim_{n→∞} E[f(ρ_n^i)]    (16)

where

π_n^i = p(S_n | x^{n−1}, y^{n−1}, S_0 = c_i)

and

ρ_n^i = p(S_n | y^{n−1}, S_0 = c_i).

This convergence allows us to obtain a closed-form solution for the mutual information under i.i.d. inputs. We also show in Lemmas A2.3 and A2.5 of Appendix II that the limit distributions for π and ρ are continuous functions of the input distribution.

Lemmas A2.6 and A2.7 of Appendix II show the surprising result that π_n and ρ_n are not necessarily Markov chains when the input distribution is Markov. Since the weak convergence of π_n and ρ_n requires this Markov property, (15) and (16) are not valid for general Markov inputs.

IV. ENTROPY, MUTUAL INFORMATION, AND CAPACITY

We now derive the capacity of the FSMC based on the distributions of π_n and ρ_n. We also obtain some additional properties of the entropy and mutual information when the channel inputs are i.i.d.

By definition, the Markov chain S_n is aperiodic and irreducible over a finite state space, so the effect of its initial state dies away exponentially with time [12]. Thus the FSMC is an indecomposable channel. The capacity of an indecomposable channel is independent of its initial state, and is given by [13, Theorem 4.6.4]

C = lim_{n→∞} max_{P(X^n)} (1/n) I(X^n; Y^n)    (17)

where I(·;·) denotes mutual information and P(X^n) denotes the set of all input distributions on X^n. The mutual information can be written as

I(X^n; Y^n) = H(Y^n) − H(Y^n | X^n)    (18)

where H(Y) = E[−log p(y)] and H(Y | X) = E[−log p(y | x)]. It is easily shown [14] that

H(Y^n) = Σ_{i=1}^{n} H(Y_i | Y^{i−1})    (19)

and

H(Y^n | X^n) = Σ_{i=1}^{n} H(Y_i | X_i, Y^{i−1}, X^{i−1}).    (20)

The following lemma, proved in Appendix III, allows the mutual information to be written in terms of π_n and ρ_n.

Lemma 4.1:

H(Y_n | X_n, Y^{n−1}, X^{n−1}) = E[ −Σ_{y ∈ Y} ( Σ_{k=1}^{K} p_k(y | x_n) π_n(k) ) log ( Σ_{k=1}^{K} p_k(y | x_n) π_n(k) ) ]    (21)

and

H(Y_n | Y^{n−1}) = E[ −Σ_{y ∈ Y} ( Σ_{k=1}^{K} p(y | S_n = c_k) ρ_n(k) ) log ( Σ_{k=1}^{K} p(y | S_n = c_k) ρ_n(k) ) ].    (22)

Using this lemma in (19) and (20) and substituting into (18) yields the following theorem.

Theorem 4.1: The capacity of the FSMC is given by

C = lim_{n→∞} max_{P(X^n)} (1/n) Σ_{i=1}^{n} { E[ −Σ_{y ∈ Y} ( Σ_{k=1}^{K} p(y | S_i = c_k) ρ_i(k) ) log ( Σ_{k=1}^{K} p(y | S_i = c_k) ρ_i(k) ) ] − E[ −Σ_{y ∈ Y} ( Σ_{k=1}^{K} p_k(y | x_i) π_i(k) ) log ( Σ_{k=1}^{K} p_k(y | x_i) π_i(k) ) ] }    (23)

where the dependence on Q ∈ P(X^n) of the distributions for π_i, ρ_i, and y_i is implicit. This capacity expression is easier to calculate than Gallager's formula (17), since the π_n terms can be computed recursively. The recursive calculation for ρ_i requires independent inputs. However, for many channels of interest H(Y_i | ρ_i) will be a constant independent of the input distribution (such channels are discussed in Section V). For these channels, the capacity calculation reduces to minimizing the second term in (23) relative to the input distribution, and the complexity of this minimization is greatly reduced when π_i can be calculated easily.

Using Lemma 4.1, we can also express the capacity as

C = lim_{n→∞} max_{P(X^n)} (1/n) Σ_{i=1}^{n} [ H(Y_i | ρ_i) − H(Y_i | X_i, π_i) ].    (24)

Although [13, Theorem 4.6.4] guarantees the convergence of (24), the random vectors π_n and ρ_n do not necessarily converge in distribution for general input distributions. We proved this convergence in Section III for i.i.d. inputs. We now derive some additional properties of the entropy and mutual information under this input restriction. These properties are summarized in Lemmas 4.2–4.7 below, which are proved in Appendix IV.
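As a rough numerical illustration of why the recursions make the per-symbol terms in (23) and (24) computable, the sketch below Monte-Carlo-estimates H(Y_n | ρ_n) and H(Y_n | X_n, π_n) for i.i.d. inputs on the illustrative two-state BSC model used earlier; it is only a sketch of the idea, not the paper's procedure, and the model parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1], [0.1, 0.9]])
crossover = np.array([0.01, 0.3])
theta = np.array([0.5, 0.5])                    # i.i.d. input distribution

def pk(y, x):                                   # vector of p_k(y | x) over states
    return np.where(y != x, crossover, 1.0 - crossover)

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def estimate_terms(n=20000):
    s = int(rng.integers(0, 2))
    pi = rho = np.array([0.5, 0.5])
    h_y_rho = h_y_x_pi = 0.0
    for _ in range(n):
        x = int(rng.choice(2, p=theta))
        y = int(x ^ (rng.random() < crossover[s]))
        # conditional output laws given the current state estimates
        py_given_rho = np.array([sum(theta[a] * pk(b, a) @ rho for a in (0, 1))
                                 for b in (0, 1)])
        py_given_x_pi = np.array([pk(b, x) @ pi for b in (0, 1)])
        h_y_rho += entropy(py_given_rho) / n        # estimates H(Y_n | rho_n)
        h_y_x_pi += entropy(py_given_x_pi) / n      # estimates H(Y_n | X_n, pi_n)
        # forward updates, equations (11) and (13)
        v = pi * pk(y, x); pi = (v @ P) / v.sum()
        w = rho * np.array([sum(theta[a] * pk(y, a)[k] for a in (0, 1)) for k in (0, 1)])
        rho = (w @ P) / w.sum()
        s = int(rng.choice(2, p=P[s]))
    return h_y_rho, h_y_x_pi

h1, h2 = estimate_terms()
print("I_iid lower-bound estimate (bits/use):", h1 - h2)
```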


Lemma 4.2: When the channel inputs are stationary,

H(Y_{n+1} | X_{n+1}, X^n, Y^n) ≤ H(Y_n | X_n, X^{n−1}, Y^{n−1}).    (25)

Lemma 4.3: For i.i.d. input distributions, the following limits exist and are equal:

lim_{n→∞} H(Y_n | X_n, X^{n−1}, Y^{n−1}) = lim_{n→∞} H(Y_n | X_n, X^{n−1}, Y^{n−1}, S_0).    (26)

We now consider the entropy in the output alone.

Lemma 4.4: For stationary inputs,

H(Y_{n+1} | Y^n) ≤ H(Y_n | Y^{n−1}).    (27)

Lemma 4.5: For i.i.d. input distributions, the following limits exist and are equal:

lim_{n→∞} H(Y_n | Y^{n−1}) = lim_{n→∞} H(Y_n | Y^{n−1}, S_0).    (28)

The next lemma is proved using the convergence results for π_n and ρ_n and a change of variables in the entropy expressions (26) and (28).

Lemma 4.6: For any i.i.d. input distribution θ ∈ P(X),

lim_{n→∞} H(Y_n | Y^{n−1}) = −∫_Δ Σ_{y ∈ Y} p^θ(y | ρ) log p^θ(y | ρ) dν^θ(ρ)

and

lim_{n→∞} H(Y_n | X_n, X^{n−1}, Y^{n−1}) = −∫_Δ Σ_{x ∈ X} θ(x) Σ_{y ∈ Y} p(y | x, π) log p(y | x, π) dμ^θ(π)    (29)

where p^θ(y | ρ) = Σ_{k=1}^{K} Σ_{x ∈ X} θ(x) p_k(y | x) ρ(k), p(y | x, π) = Σ_{k=1}^{K} p_k(y | x) π(k), the θ superscript on ρ_n, π_n, and p(y | ρ) shows their dependence on the input distribution, ν^θ denotes the limiting distribution of ρ_n^θ, and μ^θ denotes the limiting distribution of π_n^θ.

We now combine the above lemmas to get a closed-form expression for the mutual information under i.i.d. inputs.

Theorem 4.2: For any i.i.d. input distribution θ ∈ P(X), the average mutual information per channel use is given by

I^θ = lim_{n→∞} (1/n) I^θ(Y^n; X^n)
    = −∫_Δ Σ_{y ∈ Y} p^θ(y | ρ) log p^θ(y | ρ) dν^θ(ρ) + ∫_Δ Σ_{y ∈ Y} Σ_{x ∈ X} θ(x) p(y | x, π) log p(y | x, π) dμ^θ(π).    (30)

Proof: From (18)

I(Y^n; X^n) = H(Y^n) − H(Y^n | X^n).

If we fix θ ∈ P(X), then

H(Y^n | X^n) = Σ_{i=1}^{n} H(Y_i | X_i, Y^{i−1}, X^{i−1})    (31)

by (20), and the terms of the summation are nonnegative and monotonically decreasing in i by Lemma 4.2. Thus

lim_{n→∞} (1/n) H(Y^n | X^n) = lim_{n→∞} H(Y_n | X_n, X^{n−1}, Y^{n−1}).    (32)

Similarly, from (19)

H(Y^n) = Σ_{i=1}^{n} H(Y_i | Y^{i−1})    (33)

and by Lemma 4.4, the terms of this summation are nonnegative and monotonically decreasing in i. Hence

lim_{n→∞} (1/n) H(Y^n) = lim_{n→∞} H(Y_n | Y^{n−1}).    (34)

Applying Lemmas 4.1 and 4.6 completes the proof.

It is easily shown that since ν^θ and μ^θ are continuous functions of θ, I^θ is also. Moreover, the calculation of I^θ is relatively simple, since asymptotic values of μ^θ and ν^θ

are obtained using the recursive formulas (12) and (14), respectively. For the channel described in Section VII, these recursive formulas closely approach their final values after only 40 iterations. Unfortunately, this simplified formula for mutual information under i.i.d. inputs cannot be extended to Markov inputs, since π_n and ρ_n are no longer Markov chains under these conditions.

We now consider the average mutual information maximized over all i.i.d. input distributions. Define

I_{i.i.d.} = max_{θ ∈ P(X)} I^θ.    (35)

Since P(X) is compact and I^θ is continuous in θ, I_{i.i.d.} achieves its supremum on P(X), and the maximization can be done using standard techniques for continuous functions. Moreover, it is easily shown that I_{i.i.d.} ≤ C. Thus (35) provides a relatively simple formula to lower-bound the capacity of general FSMC's.
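Because I^θ is continuous on the compact set P(X), a simple search suffices for a binary input alphabet. The hypothetical sketch below assumes an estimate_I(theta) routine such as the Monte Carlo sketch above; both the routine name and the grid are illustrative assumptions.

```python
import numpy as np

def maximize_iid_rate(estimate_I, grid=np.linspace(0.01, 0.99, 25)):
    """Grid-search I^theta over i.i.d. binary input distributions theta = (q, 1-q).

    estimate_I is assumed to map an input distribution (length-2 array) to an
    estimate of I^theta, e.g. a Monte Carlo routine like the one sketched earlier.
    """
    rates = [estimate_I(np.array([q, 1.0 - q])) for q in grid]
    best = int(np.argmax(rates))
    # Returns the maximizing q and the resulting lower bound I_iid on capacity.
    return grid[best], rates[best]
```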

The next section will describe a class of channels for which uniform i.i.d. channel inputs achieve channel capacity. Thus I_{i.i.d.} = C, and the capacity can be found using the formula of Theorem 4.2. This channel class includes fading or variable-noise channels with symmetric PSK inputs, as well as channels which vary over a finite set of BSC's.


V. UNIFORMLY SYMMETRIC VARIABLE-NOISE CHANNELS

In this section we define two classes of FSMC's: uniformly symmetric channels and variable-noise channels. The mutual information and capacity of these channel classes have additional properties which we outline in the lemmas below. Moreover, we will show in the next section that the decision-feedback decoder achieves capacity for uniformly symmetric variable-noise FSMC's.

Definition: For a DMC, let M denote the matrix of input/output probabilities

M_{ij} = p(y = j | x = i),  j ∈ Y, i ∈ X.

A discrete memoryless channel is output-symmetric if the rows of M are permutations of each other, and the columns of M are permutations of each other.²
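The output-symmetric property is easy to test numerically, since rows (and columns) are permutations of one another exactly when their sorted entries coincide. A minimal sketch (the example matrices are illustrative):

```python
import numpy as np

def is_output_symmetric(M, tol=1e-12):
    """Check that the rows of M are permutations of each other and the
    columns of M are permutations of each other (output-symmetric DMC)."""
    M = np.asarray(M, dtype=float)
    rows_ok = all(np.allclose(np.sort(M[i]), np.sort(M[0]), atol=tol)
                  for i in range(M.shape[0]))
    cols_ok = all(np.allclose(np.sort(M[:, j]), np.sort(M[:, 0]), atol=tol)
                  for j in range(M.shape[1]))
    return rows_ok and cols_ok

# A BSC is output-symmetric; an asymmetric binary channel is not.
print(is_output_symmetric([[0.9, 0.1], [0.1, 0.9]]))   # True
print(is_output_symmetric([[0.9, 0.1], [0.3, 0.7]]))   # False
```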

Definition: An FSMC is uniformly symmetric if every channel c_k ∈ C is output-symmetric.

The next lemma, proved in Appendix V, shows that for uniformly symmetric FSMC's, the conditional output entropy is maximized with uniform i.i.d. inputs.

Lemma 5.1: For uniformly symmetric FSMC's and any initial state S_0 = c_i, H(Y_n | ρ_n), H(Y_n | ρ_n^i), H(Y_n | π_n), and H(Y_n | π_n^i) are all maximized for a uniform and i.i.d. input distribution, and these maximum values equal log |Y|.

Definition: Let X_n and Y_n denote the input and output, respectively, of an FSMC. We say that an FSMC is a variable-noise channel if there exists a function φ such that for Z_n = φ(X_n, Y_n), p(Z^n | X^n) = p(Z^n), and Z^n is a sufficient statistic for S^n (so S^n is independent of X^n and Y^n given Z^n). Typically, φ is associated with an additive noise channel, as we discuss in more detail below.

If Z^n is a sufficient statistic for S^n, then

π_n = p(S_n | x^{n−1}, y^{n−1}) = p(S_n | x^{n−1}, y^{n−1}, z^{n−1}) = p(S_n | z^{n−1}).    (36)

Using (36) and replacing the pairs (X_n, Y_n) with Z_n in the derivation of Appendix I, we can simplify the recursive calculation of π_n:

π_{n+1} = (π_n D(z_n) P) / (π_n D(z_n) 1)    (37)

where D(z_n) is a diagonal K × K matrix with kth diagonal term p(z_n | S_n = c_k). The transition probabilities are also simplified:

p(π_{n+1} = α | π_n = β) = Σ_{z_n ∈ Z} I[f(z_n, β) = α] p(z_n | π_n = β).    (38)

The next lemma, proved in Appendix V, shows that for a uniformly symmetric variable-noise channel, the output entropy conditioned on the input is independent of the input distribution.

²Symmetric channels, defined in [13, p. 94], are a more general class of memoryless channels; an output-symmetric channel is a symmetric channel with a single output partition.

Lemma 5.2: For uniformly symmetric variable-noise FSMC's and all i, H(Y_n | X_n, π_n) and H(Y_n | X_n, π_n^i) do not depend on the input distribution.

Consider an FSMC where each c_k ∈ C is an AWN channel with noise density n_k. If we let Z = Y − X, then it is easily shown that this is a variable-noise channel. However, such channels have an infinite output alphabet. In general, the output of an AWN channel is quantized to the nearest symbol in a finite output alphabet: we call this the quantized AWN (Q-AWN) channel.

If the Q-AWN channel has a symmetric multiphase input alphabet of constant amplitude and output phase quantization [4, p. 80], then it is easily checked that p_k(y | x) depends only on p_k(|y − x|), which in turn depends only on the noise density n_k. Thus it is a variable-noise channel.³ We show in Appendix VI that variable-noise Q-AWN channels with the same input and output alphabets are also uniformly symmetric. Uniformly symmetric variable-noise channels have the property that I_{i.i.d.} equals the channel capacity, as we show in the following theorem.

Theorem 5.1: Capacity of uniformly symmetric variable-noise channels is achieved with an input distribution that is uniform and i.i.d. The capacity is given by

C = log |Y| + ∫_Δ (1/|X|) Σ_{x ∈ X} Σ_{y ∈ Y} p(y | x, π) log p(y | x, π) dμ(π)    (39)

where μ is the limiting distribution for π_n under uniform i.i.d. inputs. Moreover, C = lim_{n→∞} C_n = lim_{n→∞} C_n^i for all i, where

C_n = log |Y| − H(Y_n | X_n, π_n)    (40)

increases with n, and

C_n^i = log |Y| − H(Y_n | X_n, π_n^i)    (41)

decreases with n.

Proof: From Lemmas 5.1 and 5.2, C_n, C_n^i, and C are all maximized with uniform i.i.d. inputs. With this input distribution

C_n = log |Y| − H(Y_n | X_n, π_n)

and

C_n^i = log |Y| − H(Y_n | X_n, π_n^i).

Applying Lemmas 4.2 and 4.3, we get that H(Y_n | X_n, π_n) decreases with n, H(Y_n | X_n, π_n^i) increases with n, and both

³If the input alphabet of a Q-AWN channel is not symmetric or the input symbols have different amplitudes, then the distribution of Z = |Y − X| will depend on the input. To see this, consider a Q-AWN channel with a 16-QAM input/output alphabet (so the output is quantized to the nearest input symbol). There are four different sets of Z = |Y − X| values, depending on the amplitude of the input symbol. Thus the distribution of Z over all its possible values (the union of all four sets) will change, depending on the amplitude of the input symbol.


Fig. 3. System model (encoder, interleaver, FSMC, decision-feedback decoder, deinterleaver).

Fig. 4. Decision-feedback decoder (channel state estimator followed by a maximum-likelihood decoder).

converge to the same limit. Finally, under uniform i.i.d. inputs

C = log |Y| − lim_{n→∞} H(Y_n | X_n, π_n)    (42)

by Lemma 4.1 and (32). Applying Lemma 4.6 to lim_{n→∞} H(Y_n | X_n, π_n) completes the proof.

The BSC is equivalent to a binary-input Q-AWN channel

with binary quantization [4]. Thus an FSMC where c_k indexes a set of BSC's with different crossover probabilities is a uniformly symmetric variable-noise channel. Therefore, both [1, Proposition 4] and the capacity formula obtained in [5] are corollaries of Theorem 5.1.
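As an illustration of Theorem 5.1, the capacity of a uniformly symmetric variable-noise FSMC can be estimated by simulating the channel under uniform i.i.d. inputs, running the recursion (11), and averaging log|Y| − H(Y_n | X_n, π_n) once π_n has approximately reached its limit distribution. The sketch below does this for the illustrative two-state BSC model used earlier; the parameters and burn-in length are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.9, 0.1], [0.1, 0.9]])
crossover = np.array([0.01, 0.3])          # illustrative two-state BSC FSMC

def pk(y, x):
    return np.where(y != x, crossover, 1.0 - crossover)

def entropy2(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def capacity_estimate(n=50000, burn_in=100):
    s, pi, acc, count = int(rng.integers(0, 2)), np.array([0.5, 0.5]), 0.0, 0
    for t in range(n):
        x = int(rng.integers(0, 2))                 # uniform i.i.d. input
        y = int(x ^ (rng.random() < crossover[s]))
        if t >= burn_in:                            # Theorem 5.1: C = log|Y| - E[H(Y | X, pi)]
            py = np.array([pk(b, x) @ pi for b in (0, 1)])
            acc += 1.0 - entropy2(py)               # log2|Y| = 1 bit for binary outputs
            count += 1
        v = pi * pk(y, x)                           # recursion (11)
        pi = (v @ P) / v.sum()
        s = int(rng.choice(2, p=P[s]))
    return acc / count

print("estimated capacity (bits/use):", capacity_estimate())
```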

VI. DECISION-FEEDBACK DECODER

A block diagram for a system with decision-feedback decoding is depicted in Fig. 3. The system is composed of a conventional (block or convolutional) encoder for memoryless channels, block interleaver, FSMC, decision-feedback decoder, and deinterleaver. Fig. 4 outlines the decision-feedback decoder design, which consists of a channel state estimator followed by an ML decoder. We will show in this section that if we ignore error propagation, a system employing this decision-feedback decoding scheme on uniformly symmetric variable-noise channels is information-lossless: it has the same capacity as the original FSMC, given by (30) for i.i.d. uniform inputs. Moreover, we will see that the output of the state estimator is a sufficient statistic for the current output given all past inputs and outputs, which reduces the system of Fig. 3 to a discrete memoryless channel. Thus the ML input sequence is determined on a symbol-by-symbol basis, eliminating the complexity and delay of sequence decoders.

The interleaver works as follows. The output of the encoder is stored row by row in a J x L interleaver, and transmitted over the channel column by column. The deinterleaver performs the reverse operation. Because the effect of the initial

channel state dies away, the received symbols within any row of the deinterleaver become independent as J becomes infinite. However, the symbols within any column of the deinterleaver are received from consecutive channel uses, and are thus dependent. This dependence is called the latent channel memory, and the state estimator enables the ML decoder to make use of this memory.

Specifically, the state estimator uses the recursive relationship of (11) to estimate π_n. It will be shown below that the ML decoder operates on a memoryless system, and can therefore determine the ML input sequence on a per-symbol basis. The input to the ML decoder is the channel output y_n and the state estimate π̂_n, and its output is the x_n which maximizes log p(y_n, π̂_n | x_n), assuming equally likely input symbols.⁴ The soft-decision decoder uses conventional techniques (e.g., Viterbi decoding) with branch metrics

m(x_n) = log ( Σ_{k=1}^{K} p_k(y_n | x_n) π̂_n(k) ).    (43)
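A minimal sketch of one decision-feedback decoding step under the assumptions above: a per-symbol ML decision based on the branch metric in (43), followed by the state-estimate update (11) using the decided symbol (error propagation is ignored, as in the analysis below). The two-state BSC parameters are again illustrative.

```python
import numpy as np

P = np.array([[0.9, 0.1], [0.1, 0.9]])
crossover = np.array([0.01, 0.3])
X = (0, 1)

def pk(y, x):
    """Vector of p_k(y | x) over the states (illustrative BSC states)."""
    return np.where(y != x, crossover, 1.0 - crossover)

def df_decode_step(pi_hat, y):
    """Symbol-by-symbol ML decision followed by the state-estimate update (11)."""
    # branch metric: log sum_k p_k(y | x) pi_hat(k), maximized over x
    metrics = {x: np.log(pk(y, x) @ pi_hat) for x in X}
    x_hat = max(metrics, key=metrics.get)
    # feed the decision back into the recursive estimator
    v = pi_hat * pk(y, x_hat)
    pi_next = (v @ P) / v.sum()
    return x_hat, pi_next

pi_hat = np.array([0.5, 0.5])
for y in [0, 1, 1, 0]:                 # example received symbols
    x_hat, pi_hat = df_decode_step(pi_hat, y)
```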

We now evaluate the information, capacity, and cutoff rates of a system using the decision-feedback decoder, assuming π̂_n = π_n (i.e., ignoring error propagation). We will use the notation y_{jl} = y_n to explicitly denote that y_n is in the jth row and lth column of the deinterleaver. Similarly, π_{jl} = π_n and x_{jl} = x_n denote, respectively, the state estimate and interleaver input corresponding to y_{jl}. Assume now that the state estimator is reset every J iterations so, for each l, the state estimator goes through j recursions of (11) to calculate π_{jl}. By (12), this recursion induces a distribution p(π_{jl}) on π_{jl} that depends only on p(X^{j−1}). Thus the system up to the output of the state estimator is equivalent to a set of parallel π-output channels, where the π-output channel is defined, for a given j, by the input x_{jl}, the output pair (y_{jl}, π_{jl}), and the input/output probability

p(y_{jl}, π_{jl} | x_{jl}) = Σ_{k=1}^{K} p_k(y_{jl} | x_{jl}) π_{jl}(k) p(π_{jl}).    (44)

⁴If the x_n are not equally likely, then log p(x_n) must be added to the decoder metric.


For each j, the π-output channel is the same for l = 1, 2, ..., L, and therefore there are J different π-output channels, each used L times. We thus drop the l subscript of x_{jl}, y_{jl}, and π_{jl} in the decoder block diagram of Fig. 4. The first π-output channel (j = 1) is equivalent to the FSMC with interleaving and memoryless channel encoding, since the estimator is reset and therefore π_{1l} = π_0, 1 ≤ l ≤ L.

The jth π-output channel is discrete, since x_{jl} and y_{jl} are taken from finite alphabets, and since π_{jl} can have at most |X|^j |Y|^j different values. It is also asymptotically memoryless with deep interleaving (large J), which we prove in Appendix VII. Finally, we show in Appendix VIII that for a fixed input distribution, the J π-output channels are independent, and the average mutual information of the parallel channels is

(1/J) Σ_{j=1}^{J} I(X_j; Y_j, π_j).    (45)

Let

C_df^J = max_{p(X^J)} (1/J) Σ_{j=1}^{J} C_j    (46)

where

C_j = H(Y_j | π_j) − H(Y_j | X_j, π_j)    (47)

for the maximizing distribution p(X^J). The capacity of the decision-feedback decoding system is then

C_df = lim_{J→∞} C_df^J.    (48)

Comparing (48) to (24), we see that the capacity penalty of the decision-feedback decoder is given by

C − C_df = lim_{n→∞} max_{P(X^n)} (1/n) Σ_{i=1}^{n} [ H(Y_i | ρ_i) − H(Y_i | X_i, π_i) ] − lim_{J→∞} max_{p(X^J)} (1/J) Σ_{j=1}^{J} [ H(Y_j | π_j) − H(Y_j | X_j, π_j) ].    (49)

For uniformly symmetric variable-noise channels, uniform i.i.d. inputs achieve both C and C_df, and with this input C − C_df = 0. Thus the decision-feedback decoder preserves the inherent capacity of such channels.

Although capacity gives the maximum data rate for any ML encoding scheme, established coding techniques generally operate at or below the channel cutoff rate [4]. Since the π-output channels are independent for a fixed input distribution p(X^J), the random coding exponent for the parallel set is

E_0(1, p(X^J)) = Σ_{j=1}^{J} R_j    (50)

where

R_j = −log Σ_{π_j} Σ_{y_j ∈ Y} [ Σ_{x_j ∈ X} p(x_j) √( p(y_j, π_j | x_j) ) ]².    (51)

The cutoff rate of the decision-feedback decoding system is

R_df = lim_{J→∞} max_{p(X^J)} (1/J) Σ_{j=1}^{J} R_j.    (52)

We show in Appendix IX that for uniformly symmetric variable-noise channels, the maximizing input distribution in (52) is uniform and i.i.d., the resulting value of R_j is increasing in j, and the cutoff rate R_df becomes

R_df = −log ∫_Δ Σ_{y ∈ Y} [ Σ_{x ∈ X} (1/|X|) √( p(y | x, π) ) ]² dμ(π)    (53)

where μ is the invariant distribution for π under i.i.d. uniform inputs.
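For intuition on these cutoff-rate expressions, the sketch below computes the standard cutoff rate of a fixed DMC, and estimates the cutoff rate of a π-output channel by averaging the inner sum over samples of π, along the lines of (51)–(53). The sampling routine, channel parameters, and the use of base-2 logarithms are assumptions of the sketch.

```python
import numpy as np

def cutoff_rate(channel, px):
    """R0 = -log2 sum_y ( sum_x p(x) sqrt(p(y|x)) )^2 for a DMC given as
    channel[x][y] = p(y | x)."""
    channel = np.asarray(channel, dtype=float)
    inner = (px[:, None] * np.sqrt(channel)).sum(axis=0)   # sum over x, for each y
    return -np.log2((inner ** 2).sum())

def pi_output_cutoff_rate(pi_samples, py_given_x_pi, px):
    """Estimate the cutoff rate of a pi-output channel by averaging over sampled
    state estimates pi (cf. (51)-(53)); py_given_x_pi(x, pi) is assumed to return
    the output law p(y | x, pi) as a vector over Y."""
    vals = []
    for pi in pi_samples:
        chan = np.array([py_given_x_pi(x, pi) for x in range(len(px))])
        inner = (px[:, None] * np.sqrt(chan)).sum(axis=0)
        vals.append((inner ** 2).sum())
    return -np.log2(np.mean(vals))

# Example: cutoff rate of a plain BSC with crossover 0.1 and uniform inputs.
bsc = [[0.9, 0.1], [0.1, 0.9]]
print(cutoff_rate(bsc, np.array([0.5, 0.5])))
```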

Our calculations throughout this section have ignored the impact of error propagation. Referring to Fig. 4, error propagation occurs when the decision-feedback decoder output for the maximum-likelihood input symbol x̂_n is in error, which will then cause the estimate of π_n to be in error. Since x_n is the value of the coded symbol, the error probability for x̂_n does not benefit from any coding gain. Unfortunately, since block or convolutional decoding introduces delay, the post-decoding decisions cannot be fed back to the decision-feedback decoder to update the π̂_n value. This is exactly the difficulty faced by an adaptive decision-feedback equalizer (DFE), where decoding decisions are used to update the DFE tap coefficients [16]. New methods to combine DFE's and coding have recently been proposed, and several of these methods can be used to obtain some coding gain in the estimate of x_j fed back through our decision-feedback decoder. In particular, the structure of our decision-feedback decoder already includes the interleaver/deinterleaver pair proposed by Eyuboglu for DFE's with coding [17]. In his method, this pair introduced a periodic delay in the received bits such that delayed reliable decisions can be used for feedback. Applying this idea to our system effectively combines the decision-feedback decoder, deinterleaver, and decoder. Specifically, the symbols transmitted over each π-output channel are decoded together, and the symbol decisions output from the decoder are then used by the decision-feedback decoder to update the values of the subsequent π-output channel. The complexity and delay of this design increases linearly with the block length of the π-output channel code, but it is independent of the


Fig. 5. Two-state fading channel (states G and B with transition probabilities g and b).

channel memory since this memory is captured in the sufficient statistic π_n. Another approach to implement coding gain uses soft decisions on the received symbols to update π_n, then later corrects this initial π_n estimate if the decoded symbols differ from their initial estimates [18]. This method truncates the number of symbols affected by an incorrect decision, at a cost of increased complexity to recalculate and update the π_n values. Finally, decision-feedback decoding can be done in parallel, where each parallel path corresponds to a different estimate of the received symbol. The number of parallel paths will grow exponentially in this case; however, we may be able to apply some of the methods outlined in [19] and [20] to reduce the number of paths sustained through the trellis.

VII. TWO-STATE VARIABLE-NOISE CHANNEL

We now compute the capacity and cutoff rates of a two-state Q-AWN channel with variable SNR, Gaussian noise, and 4PSK modulation. The variable SNR can represent different fading levels in a multipath channel, or different noise and/or interference levels. The model is shown in Fig. 5. The input to the channel is a 4PSK symbol, to which noise of variance n_G or n_B is added, depending on whether the channel is in state G (good) or B (bad). We assume that the SNR is 10 dB for channel G, and -5 dB for channel B. The channel output is quantized to the nearest input symbol and, since this is a uniformly symmetric variable-noise channel, the capacity and cutoff rates are achieved with uniform i.i.d. inputs. The state transition probabilities are depicted in Fig. 5. We assume a stationary initial distribution of the state process, so p(S_0 = G) = g/(g + b) and p(S_0 = B) = b/(g + b).
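A sketch of how the per-state transition probabilities p_k(y | x) of such a two-state 4PSK Q-AWN channel can be obtained numerically: the noise variance is set from the stated SNR (unit-energy symbols, complex Gaussian noise split evenly between I and Q), and p_k(y | x) is estimated by Monte Carlo over the nearest-symbol quantizer. The normalization details are assumptions of this sketch, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
symbols = np.exp(1j * np.pi / 4 * np.array([1, 3, 5, 7]))   # unit-energy 4PSK

def state_channel_matrix(snr_db, trials=200000):
    """Estimate M[x, y] = p_k(y | x) for a Q-AWN state with the given SNR,
    quantizing the noisy output to the nearest 4PSK symbol."""
    noise_var = 10 ** (-snr_db / 10)                 # SNR = Es / N0 with Es = 1
    M = np.zeros((4, 4))
    for x in range(4):
        n = np.sqrt(noise_var / 2) * (rng.standard_normal(trials)
                                      + 1j * rng.standard_normal(trials))
        r = symbols[x] + n
        y = np.argmin(np.abs(r[:, None] - symbols[None, :]), axis=1)
        M[x] = np.bincount(y, minlength=4) / trials
    return M

M_good = state_channel_matrix(10.0)    # state G, 10 dB
M_bad = state_channel_matrix(-5.0)     # state B, -5 dB
# Both matrices should be (approximately) output-symmetric, as required
# for the uniformly symmetric variable-noise class.
```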

Fig. 6 shows the iterative calculation of (12) for p(a ≤ π_n(G) < a + da). In this example, the difference of subsequent distributions after 40 recursions is below the quantization level (da = 0.01) of the graph. Fig. 7 shows the capacity (C_j) and cutoff rate (R_j) of the jth π-output channel, given by (47) and (52), respectively. Note that C_{j=1} and R_{j=1} in this figure are the capacity and cutoff rate of the FSMC with interleaving and memoryless channel encoding. Thus the difference between the initial and final values of C_j and R_j indicates the performance improvement of the decision-feedback decoder over conventional techniques.

Fig. 6. Recursive distribution of π_n: p(a ≤ π_n(G) < a + da) versus a, for n = 0, 5, 10, 15 (g = b = 0.1, da = 0.01).

Fig. 7. Capacity C_j and cutoff rate R_j of the jth π-output channel, for g = b = 0.01 and g = b = 0.1.

For this two-state model, the channel memory can be quantified by the parameter μ = 1 − g − b [1], since for a ∈ {G, B}

p(S_n = a | S_0 = a) − p(S_n = a | S_0 ≠ a) = μ^n.    (54)

Fig. 8. Decoder performance versus channel memory μ.

Fig. 9. Decoder performance versus g: capacity and cutoff rate for b = 0.1 and b = 0.9.

In Fig. 8 we show the decision-feedback decoder's capacity and cutoff rates (C_df and R_df, respectively) as functions of μ. We expect these performance measures to increase as μ increases, since more latency in the channel should improve the accuracy of the state estimator; Fig. 8 confirms this hypothesis. Finally, in Fig. 9 we show the decision-feedback decoder's capacity and cutoff rates as functions of g. The parameter g is inversely proportional to the average number of consecutive B channel states (which corresponds to a 15 dB fade), thus Fig. 9 can be interpreted as the relationship between the maximum transmission rate and the average fade duration.

VIII. SUMMARY

We have derived the Shannon capacity of an FSMC as a function of the conditional probabilities

ρ_n(k) = p(S_n = c_k | y^{n−1})

and

π_n(k) = p(S_n = c_k | x^{n−1}, y^{n−1}).

We also showed that with i.i.d. inputs, these conditional probabilities converge weakly, and the channel's mutual information under this input constraint is then a closed-form continuous function of the input distribution. This continuity allows I_{i.i.d.}, the maximum mutual information of the FSMC over all i.i.d. inputs, to be found using standard maximization techniques. Additional properties of the entropy and capacity for uniformly symmetric variable-noise channels were also derived.

We then proposed an ML decision-feedback decoder, which calculates recursive estimates of π_n from the channel output and the decision-feedback decoder output. We showed that for asymptotically deep interleaving, a system employing the decision-feedback decoder is equivalent to a discrete memoryless channel with input x_n and output (y_n, π_n). Thus the ML sequence decoding can be done on a symbol-by-symbol basis. Moreover, the decision-feedback decoder preserves the inherent capacity of uniformly symmetric variable-noise channels, assuming the effect of error propagation is negligible. This class of FSMC's includes fading or variable-noise channels with symmetric PSK inputs as well as channels which vary over a finite set of BSC's. For general FSMC's, we obtained the capacity and cutoff rate penalties of the decision-feedback decoding scheme.

We also presented numerical results for the performance of the decision-feedback decoder on a two-state variable-noise channel with 4PSK modulation. These results demonstrate significant improvement over conventional schemes which use interleaving and memoryless channel encoding, and the improvement is most pronounced on quasistatic channels. This result is intuitive, since the longer the FSMC stays in a given state, the more accurately the state estimator will predict that state. Finally, we present results for the decoder performance relative to the average fade duration; as expected, the performance improves as the average fade duration decreases.

APPENDIX I

In this Appendix, we derive the recursive formula (11) for π_n. First, by repeated application of Bayes' rule and (5),

p(S_n = c_k | x^n, y^n) = p(y_n | S_n = c_k, x_n) p(x_n | x^{n−1}) p(S_n = c_k | x^{n−1}, y^{n−1}) p(x^{n−1}, y^{n−1}) / p(x^n, y^n).    (55)

Moreover

p(x^n, y^n) = Σ_{k ∈ K} p(x^n, y^n, S_n = c_k)
 = Σ_{k ∈ K} p(y_n | S_n = c_k, x_n) p(x_n | x^{n−1}) p(S_n = c_k | x^{n−1}, y^{n−1}) p(x^{n−1}, y^{n−1})    (56)


where we again use Bayes' rule and the last equality follows from (5). Substituting (56) in the denominator of (55), and canceling the common terms p(x_n | x^{n−1}) and p(x^{n−1}, y^{n−1}), yields

p(S_n | x^n, y^n) = p(y_n | S_n, x_n) p(S_n | x^{n−1}, y^{n−1}) / Σ_{k ∈ K} p(y_n | S_n = c_k, x_n) p(S_n = c_k | x^{n−1}, y^{n−1})    (57)

which, for a particular value of S_n, becomes

p(S_n = c_l | x^n, y^n) = p(y_n | S_n = c_l, x_n) p(S_n = c_l | x^{n−1}, y^{n−1}) / Σ_{k ∈ K} p(y_n | S_n = c_k, x_n) p(S_n = c_k | x^{n−1}, y^{n−1}).    (58)

Finally, from (4)

p(S_{n+1} = c_l | x^n, y^n) = Σ_{j ∈ K} p(S_n = c_j | x^n, y^n) P_{jl}.    (59)

Substituting (58) into (59) yields the desired result (11).

APPENDIX II

In this Appendix we show that for i.i.d. inputs, π_n and ρ_n are Markov chains that converge in distribution to a limit which is independent of the initial channel state, and that the resulting limit distributions are continuous functions of the input distribution p(x). We also show that the Markov property does not hold for Markov inputs.

We begin by showing the Markov property for independent inputs.

Lemma A2.1: For independent inputs, π_n is a Markov chain.

Proof:

p(π_{n+1} | π_n, ..., π_0) = Σ_{x_n, y_n} p(π_{n+1} | π_n, ..., π_0, x_n, y_n) p(x_n, y_n | π_n, ..., π_0)
 = Σ_{x_n, y_n} p(π_{n+1} | π_n, x_n, y_n) p(x_n, y_n | π_n)
 = p(π_{n+1} | π_n)    (60)

where the second equality follows from (11) and (6). Thus π_n is Markov. A similar argument using (13) and (8) shows that ρ_n is also Markov for independent inputs.

To obtain the weak convergence of π_n and ρ_n, we also assume that the channel inputs are i.i.d., since we can then apply convergence results for partially observed Markov chains [21]. Consider the new stochastic process U_n = (S_n, y_n, x_n) defined on the state space U = C × Y × X. Since S_n is stationary and ergodic and x_n is i.i.d., U_n is stationary and ergodic. It is easily checked that U_n is Markov.

Let (S, y, x)_j denote the jth element of U, and J = |U|. To specify its individual components, we use the notation

(S_{(j)}, y_{(j)}, x_{(j)}) = (S, y, x)_j.

The J × J probability transition matrix for U, P_U, is

P_U(k, j) = p[(S_{n+1}, y_{n+1}, x_{n+1}) = (S, y, x)_j | (S_n, y_n, x_n) = (S, y, x)_k]    (61)

independent of n. The initial distribution of U, π_0^U, is given by

p(S_0 = c_k, y_0 = y, x_0 = x) = π_0(k) p_k(y_0 | x_0) p(x_0).    (62)

Let g_{y,x}: U → Y × X and g_y: U → Y be the projections

g_{y,x}(S_n, y_n, x_n) = (y_n, x_n)

and

g_y(S_n, y_n, x_n) = y_n.

These projections form the new processes W_n = g_{y,x}[U_n] and V_n = g_y[U_n]. We regard W_n and V_n as partial observations of the Markov chain U_n; the pairs (U_n, W_n) and (U_n, V_n) are referred to as partially observed Markov chains. The distributions of U_n conditioned on W^n and V^n, respectively, are

π_n^U = (π_n^U(1), ..., π_n^U(J))

and

ρ_n^U = (ρ_n^U(1), ..., ρ_n^U(J))

where

π_n^U(j) = p(U_n = (S, y, x)_j | W^n)    (63)

and

ρ_n^U(j) = p(U_n = (S, y, x)_j | V^n).    (64)

Note that

π_n^U(j) = p(U_n = (S, y, x)_j | W^n) = π_n(k) I[x_n = x_{(j)}, y_n = y_{(j)}]    (65)


where S_{(j)} = c_k. Thus if π_n^U converges in distribution, π_n must also converge in distribution. Similarly, ρ_n converges in distribution if ρ_n^U does.

We will use the following definition for subrectangular matrices in the subsequent theorem.

Definition: Let D = (D_{ij}) denote a d × d matrix. If D_{i1,j1} ≠ 0 and D_{i2,j2} ≠ 0 imply that also D_{i1,j2} ≠ 0 and D_{i2,j1} ≠ 0, then D is called a subrectangular matrix.
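This definition translates directly into a quick numerical test of the condition required by Theorem A2.1 below: a matrix is subrectangular exactly when its nonzero entries fill the rectangle formed by its nonzero rows and nonzero columns. The example matrices in the sketch are illustrative.

```python
import numpy as np

def is_subrectangular(D, tol=0.0):
    """D is subrectangular if D[i1, j1] != 0 and D[i2, j2] != 0 imply
    D[i1, j2] != 0 and D[i2, j1] != 0, i.e. its nonzero pattern equals
    (rows with a nonzero entry) x (columns with a nonzero entry)."""
    D = np.asarray(D, dtype=float)
    nz = np.abs(D) > tol
    rows = nz.any(axis=1)
    cols = nz.any(axis=0)
    return np.array_equal(nz, np.outer(rows, cols))

print(is_subrectangular([[1, 2, 0], [3, 4, 0], [0, 0, 0]]))   # True
print(is_subrectangular([[1, 0], [0, 1]]))                    # False
```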

We can now state the convergence theorem, due to Kaijser [21], for the distribution of a Markov chain conditioned on partial observations.

Theorem A2.1: Let U_n be a stationary and ergodic Markov chain with transition matrix P_U and state space U. Let g be a function with domain U and range Z. Define a new process Z_n = g(U_n). For z ∈ Z and U_{(j)} the jth element of U, define the matrix M(z) by

M_{kj}(z) = P_U(k, j) if g(U_{(j)}) = z, and M_{kj}(z) = 0 otherwise.

Suppose that P_U and g are such that there exists a finite sequence z_1, ..., z_m of elements in Z that yields a nonzero subrectangular matrix for the matrix product M(z_1) ... M(z_m). Then p(U_n | Z^n) converges in distribution and, moreover, the limit distribution is independent of the initial distribution of U.

We first apply this theorem to π_n^U.

Assumption 1: Assume that there exists a finite sequence (y_n, x_n), n = 1, ..., m, such that the matrix product M(y_1, x_1) ... M(y_m, x_m) is nonzero and subrectangular, where

M_{kj}(y, x) = P_U(k, j) if g_{y,x}[(S, y, x)_j] = (y, x), and M_{kj}(y, x) = 0 otherwise.

Then by Theorem A2.1, π_n^U converges in distribution to a limit which is independent of the initial distribution. By (65), this implies that π_n also converges in distribution, and its limit distribution is independent of π_0. We thus get the following lemma, which was stated in (15).

Lemma A2.2: For any bounded continuous function f, the following limits exist and are equal for all i:

lim_{n→∞} E[f(π_n)] = lim_{n→∞} E[f(π_n^i)].    (68)

The subrectangularity condition on M is satisfied if for some input x ∈ X there exists a y ∈ Y such that p_k(y | x) > 0 for all k. It is also satisfied if all the elements of the matrix P are nonzero.

From (11) and (12), the limit distribution of π_n is a function of the i.i.d. input distribution. Let P(X) denote the set of all possible distributions on X. The following lemma, proved below, shows that the limit distribution of π_n is continuous on P(X).

Lemma A2.3: Let μ^θ denote the limit distribution of π_n as a function of the i.i.d. distribution θ ∈ P(X). Then μ^θ is a continuous function of θ, i.e., θ_m → θ implies that μ^{θ_m} → μ^θ.

We now consider the convergence and continuity of the distribution for ρ_n. Define the matrix N by

N_{kj}(y) = P_U(k, j) if g_y[(S, y, x)_j] = y, and N_{kj}(y) = 0 otherwise,    (69)

and note that for any y ∈ Y and x ∈ X

M_{kj}(y, x) = N_{kj}(y) I(x_{(j)} = x).    (70)

To apply Theorem A2.1 to ρ_n^U, we must find a sequence y_1, ..., y_l which yields a nonzero and subrectangular matrix for the product N(y_1) ... N(y_l). Consider the projection onto y of the sequence (y_n, x_n), n = 1, ..., m, from Assumption 1. Let y_n, n = 1, ..., m, denote this projection. Using (70) and the fact that all the elements of M are nonnegative, it is easily shown that for M = M(y_1, x_1) ... M(y_m, x_m) and N = N(y_1) ... N(y_m), if for any i and j, M_{i,j} is nonzero, then N_{i,j} is nonzero also. From this we deduce that if M is nonzero and subrectangular, then N must also be nonzero and subrectangular.

We can now apply Theorem A2.1 to ρ_n^U, which yields the convergence in distribution of ρ_n^U and thus of ρ_n. Moreover, the limit distributions of these random vectors are independent of their initial states. Thus we get the following result, which was stated in (16).

Lemma A2.4: For any bounded continuous function f, the following limits exist and are equal for all i:

lim_{n→∞} E[f(ρ_n)] = lim_{n→∞} E[f(ρ_n^i)].    (71)

From (13) and (14), the limit distribution of ρ_n is also a function of the input distribution. The following lemma shows that the limit distribution of ρ_n is continuous on P(X).

Lemma A2.5: Let ν^θ denote the limit distribution of ρ_n as a function of the i.i.d. distribution θ ∈ P(X). Then ν^θ is a continuous function of θ, so θ_m → θ implies that ν^{θ_m} → ν^θ.

Proof of Lemmas A2.3 and A2.5: We must show that for all θ_m, θ ∈ P(X), if θ_m → θ, then μ^{θ_m} → μ^θ and ν^{θ_m} → ν^θ. We first show the convergence of ν^{θ_m}. From [12, p. 346], in order to show that ν^{θ_m} → ν^θ, it suffices to show that {ν^{θ_m}} is a tight sequence of probability measures,⁵ and that any subsequence of ν^{θ_m} which converges weakly converges to ν^θ.

Tightness of the sequence {ν^{θ_m}} follows from the fact that Δ is a compact set. Now suppose there is a subsequence ν^{θ_k} of ν^{θ_m} which converges weakly to ψ. We must show that ψ = ν^θ, where ν^θ is the unique invariant distribution for ρ under the transformation (14) with input distribution p(x) = θ. Thus it suffices to show that for every bounded, continuous, real-valued function φ on Δ,

∫_Δ φ(α) ψ(dα) = ∫_Δ [ Σ_α φ(α) p^θ(α | β) ] ψ(dβ)    (72)

where p^θ(α | β) = p(ρ_{n+1} = α | ρ_n = β) is given by (14) under the i.i.d. input distribution θ, and is thus independent of n. Applying the triangle inequality, we get that for any k the difference between the two sides of (72) is bounded by a sum of three terms, (73)–(75).

⁵A sequence of probability measures {ν_m} is tight if for all ε > 0 there exists a compact set K such that ν(K) > 1 − ε for all ν ∈ {ν_m}.


Since this inequality holds for all k, in order to show (72), we need only show that the three terms (73)–(75) all converge to zero as k → ∞. But (73) converges to zero since ν^{θ_k} converges weakly to ψ. Moreover, (74) equals zero for all k, since ν^{θ_k} is the invariant ρ distribution under the transformation (14) with input distribution θ_k. Substituting (14) for p^θ(α | β) in (75) yields a bound in terms of the two differences (79) and (80), where f̂^θ is given by (13) with p(x) = θ.

For any fixed y and β, θ_k → θ implies that f̂^{θ_k}(y, β) → f̂^θ(y, β), since from (13) the numerator and denominator of f̂ are linear functions of θ, and the denominator is nonzero. Similarly, θ_k → θ implies that for fixed y and β, p^{θ_k}(y | β) → p^θ(y | β), since p^θ(y | β) is linear in θ. Since φ is continuous, this implies that for fixed y and β, φ(f̂^{θ_k}(y, β)) p^{θ_k}(y | β) → φ(f̂^θ(y, β)) p^θ(y | β). Thus for any ε we can find k sufficiently large such that

∫_Δ | φ(f̂^{θ_k}(y, β)) p^{θ_k}(y | β) − φ(f̂^θ(y, β)) p^θ(y | β) | ν^{θ_k}(dβ) ≤ ε ∫_Δ ν^{θ_k}(dβ) = ε.    (81)

So (79) converges to zero. Finally, for fixed y and θ, f̂^θ(y, β) and p^θ(y | β) are linear in β, so φ(f̂^θ(y, β)) p^θ(y | β) is a bounded continuous function of β. Thus (80) converges to zero by the weak convergence of ν^{θ_k} to ψ [12, Theorem 25.8].

Since the { p o n z ) sequence is also tight, the proof that pe- i khe follows if the limit of any convergent subsequence of {born) is the invariant distribution for 7r under (12). This is shown with essentially the same argument as above for i v', using (12) instead of (14) fo rp (a 1 p), ps(y I z,P)

instead of po(y 1 /3), and summations over X x y instead of Y. The details are omitted.

Lemma A2.6: In general, the Markov property does not hold for π_n under Markov inputs.

Proof: We show this using a counterexample. Let C = {c_1, c_2, c_3} be the state space for S_n, with transition probabilities

P = [ 2/3   0    1/3
       0   2/3   1/3
      1/3  1/3   1/3 ]    (82)

and initial distribution π_0 = (1/3, 1/3, 1/3). This Markov chain is irreducible, aperiodic, and stationary. Each of the states corresponds to a memoryless channel, where the input alphabet is {0, 1} and the output alphabet is {0, 1, 2}. The memoryless channels c_1, c_2, and c_3 are defined as follows:

c_1: p_1(0 | 0) = p_1(2 | 1) = 1, otherwise p_1(y | x) = 0.
c_2: p_2(1 | 0) = p_2(2 | 1) = 1, otherwise p_2(y | x) = 0.
c_3: p_3(2 | 0) = 1, p_3(0 | 1) = p_3(1 | 1) = 1/2, otherwise p_3(y | x) = 0.

The stochastic process {π_n}_{n=0}^∞ then takes values on the three points α_0 = (1/3, 1/3, 1/3), α_1 = (2/3, 0, 1/3), and α_2 = (0, 2/3, 1/3). Let the Markov input distribution be given by p(x_0 = 0) = p(x_0 = 1) = 1/2 and p(x_n = x_{n−1}) = 1 for n > 0. Then

p(π_3 = α_0 | π_2 = α_0, π_1 = α_1) = 1/3

while

p(π_3 = α_0 | π_2 = α_0) = 5/6.

So {π_n}_{n=0}^∞ is not a Markov process. ◻
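To make the counterexample concrete, the following sketch (not part of the paper) enumerates all input, state, and output sequences of length three and evaluates the two conditional probabilities numerically. It assumes the conditional state-probability update π_{n+1}(j) ∝ Σ_k π_n(k) p_k(y_n | x_n) P_{kj}, which is the natural reading of recursion (11), and it uses the transition matrix as reconstructed in (82).

```python
import itertools
import numpy as np

# Transition matrix and initial state distribution from the counterexample;
# the stationary distribution is uniform over the three states.
P = np.array([[2/3, 0, 1/3],
              [0, 2/3, 1/3],
              [1/3, 1/3, 1/3]])
pi0 = np.array([1/3, 1/3, 1/3])

# Per-state memoryless channels p_k(y | x): chan[k][x][y]
chan = np.zeros((3, 2, 3))
chan[0, 0, 0] = 1.0; chan[0, 1, 2] = 1.0                        # c1
chan[1, 0, 1] = 1.0; chan[1, 1, 2] = 1.0                        # c2
chan[2, 0, 2] = 1.0; chan[2, 1, 0] = 0.5; chan[2, 1, 1] = 0.5   # c3

alpha0 = np.array([1/3, 1/3, 1/3])
alpha1 = np.array([2/3, 0, 1/3])

def update(pi, x, y):
    """One step of the conditional state-probability recursion (11)."""
    w = pi * chan[:, x, y]          # pi_n(k) * p_k(y_n | x_n)
    return w @ P / w.sum()          # renormalized prediction of S_{n+1}

def close(a, b):
    return np.allclose(a, b, atol=1e-9)

# Exact enumeration: x_0 is uniform on {0,1} and the input stays constant.
joint = {"12": 0.0, "123": 0.0, "2": 0.0, "23": 0.0}
for x0 in (0, 1):
    for s in itertools.product(range(3), repeat=3):          # S_0, S_1, S_2
        p_s = pi0[s[0]] * P[s[0], s[1]] * P[s[1], s[2]]
        for y in itertools.product(range(3), repeat=3):       # y_0, y_1, y_2
            p = 0.5 * p_s * np.prod([chan[s[n], x0, y[n]] for n in range(3)])
            if p == 0:
                continue
            pi1 = update(pi0, x0, y[0])
            pi2 = update(pi1, x0, y[1])
            pi3 = update(pi2, x0, y[2])
            if close(pi2, alpha0):
                joint["2"] += p
                if close(pi3, alpha0):
                    joint["23"] += p
                if close(pi1, alpha1):
                    joint["12"] += p
                    if close(pi3, alpha0):
                        joint["123"] += p

print("p(pi3=a0 | pi2=a0, pi1=a1) =", joint["123"] / joint["12"])   # 1/3
print("p(pi3=a0 | pi2=a0)         =", joint["23"] / joint["2"])     # 5/6
```

The two printed values differ, which is exactly the failure of the Markov property claimed in the lemma.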

Lemma A2.7: In general, the Markov property does not hold for ρ_n under Markov inputs.


Proof: We prove this using a counterexample similar to that of Lemma A2.6. Let the FSMC be as in Lemma A2.6 with the following change in the definition of the memoryless channels c_1, c_2, and c_3:

c_1: p_1(1 | 0) = p_1(1 | 1) = 1, otherwise p_1(y | x) = 0.
c_2: p_2(2 | 0) = p_2(2 | 1) = 1, otherwise p_2(y | x) = 0.
c_3: p_3(0 | 0) = p_3(2 | 0) = 1/2, p_3(0 | 1) = 1/4, p_3(2 | 1) = 3/4, otherwise p_3(y | x) = 0.

It is easily shown that the state space for the stochastic process {ρ_n}_{n=0}^∞ includes the points α_0 and α_1 defined in Lemma A2.6. Using the same Markov input distribution defined there, we have

p(ρ_3 = α_0 | ρ_2 = α_0, ρ_1 = α_1) = 5/36

while

p(ρ_3 = α_0 | ρ_2 = α_0) = 8/57.

So {ρ_n}_{n=0}^∞ is not a Markov process. ◻

APPENDIX III

In this Appendix, we prove Lemma 4.1. Consider first H(Y_n | X_n, X^{n−1}, Y^{n−1}). We have

H(Y_n | X_n, X^{n−1}, Y^{n−1}) = E[−log p(y_n | x_n, x^{n−1}, y^{n−1})]
  = E[−log Σ_{k=1}^K p(y_n | x_n, S_n = c_k) p(S_n = c_k | x^{n−1}, y^{n−1})]
  = E[−log Σ_{k=1}^K p_k(y_n | x_n) π_n(k)]
  = E[−log p(y_n | x_n, π_n)]
  = H(Y_n | X_n, π_n).    (83)

The argument that H(Y_n | Y^{n−1}) = H(Y_n | ρ_n) is the same, with all the x terms removed and π_n replaced by ρ_n. ◻

APPENDIX IV

In this Appendix, we prove Lemmas 4.2–4.6.

Proof of Lemma 4.2: We first note that the conditional entropy H(W | V) = −E log p(W | V), where the log function is concave on [0, 1]. To show the first inequality in (25), let f denote any concave function. Then

E f(p[y_n | x_n, x^{n−1}, y^{n−1}])
  (a) = E f(p[y_{n+1} | x_{n+1}, x_2^n, y_2^n])
  (b) = E f(E(p[y_{n+1} | x_{n+1}, x^n, y^n] | x_{n+1}, x_2^n, y_2^n))
  (c) ≥ E E(f(p[y_{n+1} | x_{n+1}, x^n, y^n]) | x_{n+1}, x_2^n, y_2^n)
  (d) = E f(p[y_{n+1} | x_{n+1}, x^n, y^n])    (84)

where a follows from the stationarity of the inputs and channel, b and d follow from properties of conditional expectation [12], and c is a consequence of Jensen's inequality. The second inequality in (25) results from the fact that conditioning on an additional random variable, in this case the initial state S_0, always reduces the entropy [14]. The proof of the third inequality in (25) is similar to that of the first:

E f(p[y_{n+1} | x_{n+1}, x^n, y^n, S_0])
  (a) = E f(E(p[y_{n+1} | x_{n+1}, x^n, y^n, S_1] | x_{n+1}, x^n, y^n, S_0))
  (b) = E f(E(p[y_{n+1} | x_{n+1}, x_2^n, y_2^n, S_1] | x_{n+1}, x^n, y^n, S_0))
  (c) ≥ E E(f(p[y_{n+1} | x_{n+1}, x_2^n, y_2^n, S_1]) | x_{n+1}, x^n, y^n, S_0)
  (d) = E f(p[y_{n+1} | x_{n+1}, x_2^n, y_2^n, S_1])
  (e) = E f(p[y_n | x_n, x^{n−1}, y^{n−1}, S_0])    (85)

where a and d follow from properties of conditional expectation, b follows from (4) and (5), c follows from Jensen's inequality, and e follows from the channel and input stationarity. ◻

Proof of Lemma 4.3: From Lemma 4.1

H(Y_n | X_n, X^{n−1}, Y^{n−1}) = H(Y_n | X_n, π_n).    (86)

Similarly

H(Y_n | X_n, X^{n−1}, Y^{n−1}, S_0 = c_i) = H(Y_n | X_n, π_n^i)    (87)

where π_n^i = π_n for S_0 = c_i. Applying (15) to (86) and (87) completes the proof. ◻

Proof of Lemma 4.4: The proof of this lemma is similar to that of Lemma 4.2 above. For the first inequality in (27), we have

E f(p[y_n | y^{n−1}])
  (a) = E f(p[y_{n+1} | y_2^n])
  (b) = E f(E(p[y_{n+1} | y^n] | y_2^n))
  (c) ≥ E E(f(p[y_{n+1} | y^n]) | y_2^n)
  (d) = E f(p[y_{n+1} | y^n])    (88)

where a follows from the stationarity of the channel and the inputs, b and d follow from properties of conditional expectation [12], and c is a consequence of Jensen's inequality. The second inequality results from the fact that conditioning on an additional random variable reduces entropy. Finally, for the third inequality, we have

E f(p[y_{n+1} | y^n, S_0])
  (a) = E f(E(p[y_{n+1} | y^n, S_1] | y^n, S_0))
  (b) = E f(E(p[y_{n+1} | y_2^n, S_1] | y^n, S_0))
  (c) ≥ E E(f(p[y_{n+1} | y_2^n, S_1]) | y^n, S_0)
  (d) = E f(p[y_{n+1} | y_2^n, S_1])
  (e) = E f(p[y_n | y^{n−1}, S_0])    (89)


where a and d follow from properties of conditional expectation, b follows from (6), c follows from Jensen's inequality, and e follows from the channel and input stationarity. ◻

Proof of Lemma 4.5: Following a similar argument as in the proof of Lemma 4.3, we have that the limits in (90) and (91) exist. Applying (16) to (90) and (91) completes the proof. ◻

Proof of Lemma 4.6: We first consider the limiting conditional entropy H(Y_n | ρ_n^i) as n → ∞. Let ν_n^i denote the distribution of ρ_n^i and ν^i denote the corresponding limit distribution. Also, let p^θ(Y | ·) explicitly denote the dependence of the conditional output probability on θ. Then

lim_{n→∞} H(Y_n | ρ_n^i)
  = lim_{n→∞} Σ_{y^{n−1} ∈ Y^{n−1}} p^θ(y^{n−1}) Σ_{y ∈ Y} −p^θ(y | y^{n−1}) log p^θ(y | y^{n−1})
  = ⋯ = ∫_Δ Σ_{y ∈ Y} −p^θ(y | β) log p^θ(y | β) ν^i(dβ).    (92)

The second and fourth equalities in (92) follow from the fact that ρ_n is a function of y^{n−1}. We also use this in the fifth equality to take expectations relative to ρ_n instead of y^{n−1}. The sixth equality follows from the definition of ν_n and the stationarity of the channel inputs. The last equality follows from the weak convergence of ν_n^i and the fact that the entropy is continuous in ρ and is bounded by log |Y| [12, Theorem 25.8].

The limiting conditional entropy H(Y_n | X_n, π_n^i) is obtained with a similar argument. Let μ_n^i denote the distribution of π_n^i and μ^i denote the corresponding limit distribution. Then

lim_{n→∞} H(Y_n | X_n, π_n^i) = ⋯ = ∫_Δ Σ_{y ∈ Y} Σ_{x ∈ X} −log p(y | x, α) p(y | x, α) θ(x) μ^i(dα)    (93)

where we use the fact that π_n is a function of x^{n−1} and y^{n−1}, and the last equality follows from the weak convergence of μ_n^i to μ^i. ◻


APPENDIX V

In this Appendix, we prove Lemmas 5.1 and 5.2.

Proof of Lemma 5.1: From [14],

H(Y_n | ρ_n) ≤ H(Y_n) ≤ log |Y|

and similarly

H(Y_n | ρ_n^i) ≤ H(Y_n) ≤ log |Y|

for any i. But since each c_k ∈ C is output symmetric, for each k the columns of M_k ≜ {M_{ij}^k = p_k(y_j | x_i), i ∈ X, j ∈ Y} are permutations of each other. Thus, if the marginal p(x_n) is uniform, then p(y_n | S_n = c_k) is also uniform, i.e., p(y_n | S_n = c_k) = 1/|Y|. Hence for any ρ_n ∈ Δ

p(y_n | ρ_n) = Σ_{k=1}^K p(y_n | S_n = c_k) ρ_n(k) = 1/|Y|    (94)

and similarly p(y_n | ρ_n^i) = 1/|Y| for any i. Thus

H(Y_n | ρ_n) = E[−log p(y_n | ρ_n)] = log |Y|    (95)

and similarly H(Y_n | ρ_n^i) = log |Y| for any i. Since (95) only requires that p(x_n) is uniform for each n, an i.i.d. uniform input distribution achieves this maximum. Substituting π for ρ in the above argument yields the result for H(Y_n | π_n) and H(Y_n | π_n^i). ◻

Proof of Lemma 5.2: We consider only H(Y_n | X_n, π_n), since the same argument applies for H(Y_n | X_n, π_n^i). By the output symmetry of each c_k ∈ C, the sets

{Σ_{k=1}^K p_k(y | x) π_n(k) : y ∈ Y},  x ∈ X    (96)

are permutations of each other. Thus H(Y_n | X_n = x, π_n) does not depend on x, so H(Y_n | X_n, π_n) depends only on the distribution of π_n. But by (38), this distribution depends only on the distribution of z^{n−1}. The proof then follows from the fact that p(z^n | x^n) = p(z^n). ◻
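As a quick numerical illustration of the uniform-output argument above (not from the paper), the following sketch builds two output-symmetric channel matrices, applies a uniform input together with an arbitrary state distribution ρ, and checks that the resulting output distribution p(y | ρ) is uniform.

```python
import numpy as np

# Two output-symmetric states: the columns of each matrix M_k (rows indexed by
# inputs x, columns by outputs y) are permutations of one another.
M1 = np.array([[0.8, 0.1, 0.1],
               [0.1, 0.8, 0.1],
               [0.1, 0.1, 0.8]])
M2 = np.array([[0.6, 0.3, 0.1],
               [0.1, 0.6, 0.3],
               [0.3, 0.1, 0.6]])
channels = [M1, M2]

rng = np.random.default_rng(1)
rho = rng.dirichlet(np.ones(len(channels)))   # arbitrary state distribution
theta = np.full(3, 1/3)                       # uniform input distribution

# p(y | rho) = sum_k rho(k) sum_x theta(x) p_k(y | x)
p_y = sum(r * theta @ M for r, M in zip(rho, channels))
print("p(y | rho) =", np.round(p_y, 6))       # uniform: each entry 1/3
```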

APPENDIX VI

We consider a Q-AWGN channel where the output is quantized to the nearest input symbol and the input alphabet consists of symmetric PSK symbols. We want to show that for any k, P^k ≜ p_k(y = j | x = i) has rows which are permutations of each other and columns which are permutations of each other. The input/output symbols are given by

y_m = x_m = A e^{j2πm/M},  m = 1, …, M.    (97)

Define the M × M matrix Z by Z_{ij} = |y_i − x_j| and let f_k(Z_{ij}) denote the distribution of the quantized noise, which is determined by the noise density n_k and the values of A and M from (97). By symmetry of the input/output symbols and the noise, the rows of Z are permutations of each other, and the columns are also permutations of each other.

If M is odd, P_{ij}^k is given by (98); if M is even, by (99). In either case P_{ij}^k depends only on the value of Z_{ij}; the rows of P^k are therefore permutations of each other, and so are the columns.
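The following Monte Carlo sketch (not from the paper) illustrates the claim for 4-PSK: it quantizes noisy constellation points to the nearest symbol, estimates the transition matrix, and checks that its rows, and likewise its columns, are permutations of one another up to simulation error. The amplitude and noise standard deviation are arbitrary choices for the illustration; the paper's argument is exact.

```python
import numpy as np

rng = np.random.default_rng(0)
M, A, sigma, n_samples = 4, 1.0, 0.7, 200_000

# Symmetric M-PSK constellation; outputs are quantized to the nearest symbol.
symbols = A * np.exp(2j * np.pi * np.arange(M) / M)

# Estimate P[i, j] = Pr(quantized output = symbol i | input = symbol j).
P = np.zeros((M, M))
for j, x in enumerate(symbols):
    noise = sigma * (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples))
    r = x + noise
    nearest = np.argmin(np.abs(r[:, None] - symbols[None, :]), axis=1)
    P[:, j] = np.bincount(nearest, minlength=M) / n_samples

# Rows (and columns) should be permutations of each other: sorting each row
# (column) should give identical vectors up to Monte Carlo error.
rows_sorted = np.sort(P, axis=1)
cols_sorted = np.sort(P, axis=0)
print(np.round(P, 3))
print("rows are permutations:", np.allclose(rows_sorted, rows_sorted[0], atol=5e-3))
print("columns are permutations:", np.allclose(cols_sorted, cols_sorted[:, [0]], atol=5e-3))
```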

APPENDIX VII

We will show that the π-output channel is asymptotically memoryless as J → ∞. Indeed, since the FSMC is indecomposable and stationary

lim_{J→∞} p(S_{n+J}, S_n) = lim_{J→∞} p(S_{n+J}) p(S_n)    (100)

for any n, and thus also

lim_{J→∞} p(π_{n+J}, π_n) = lim_{J→∞} p(π_{n+J}) p(π_n).    (101)

Therefore, since π_{jl} and π_{j(l−1)} are J iterations apart, π_{jl} and π_{j(l−1)} are asymptotically independent as J → ∞.
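The following sketch (not from the paper) illustrates (100) numerically: for an indecomposable, aperiodic chain, the rows of P^J converge to the stationary distribution, so S_{n+J} and S_n become independent as J grows. The transition matrix reused here is the one from the Lemma A2.6 example.

```python
import numpy as np

# Transition matrix of the three-state example from Lemma A2.6; any
# indecomposable, aperiodic chain would do.
P = np.array([[2/3, 0, 1/3],
              [0, 2/3, 1/3],
              [1/3, 1/3, 1/3]])
stationary = np.array([1/3, 1/3, 1/3])

for J in (1, 2, 5, 10, 20):
    PJ = np.linalg.matrix_power(P, J)
    # max_k sum_j |P^J[k, j] - p(S = c_j)| decays geometrically, so
    # p(S_{n+J}, S_n) factors as p(S_{n+J}) p(S_n) in the limit (100).
    gap = np.abs(PJ - stationary).sum(axis=1).max()
    print(f"J = {J:2d}   max L1 gap to stationary = {gap:.3e}")
```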

In order to show that the π-output channel is memoryless, we must show that for any j and L

p(y_j^L, π_j^L | x_j^L) = Π_{l=1}^L p(y_{jl}, π_{jl} | x_{jl}).    (102)

We can decompose p(y_j^L, π_j^L | x_j^L) as follows:

p(y_j^L, π_j^L | x_j^L) = Π_{l=1}^L p(y_{jl}, π_{jl} | x_j^L, y_j^{l−1}, π_j^{l−1}).    (103)


Thus we need only show that the lth factor in the right-hand side of (103) equals p(y_{jl}, π_{jl} | x_{jl}) in the limit as J → ∞. This result is proved in the following lemma.

Lemma A7.1: For asymptotically large J

p(y_{jl}, π_{jl} | x_{jl}, y_{j(l−1)}, π_{j(l−1)}, x_{j(l−1)}) = p(y_{jl}, π_{jl} | x_{jl}).    (104)

The proof is a string of equalities in which the second equality follows from (4) and (5), the third equality follows from (4) and (11), and the fourth equality follows from (101) in the asymptotic limit of deep interleaving. ◻

APPENDIX VIII

The π-output channels are independent if

p(y^J, π^J | x^J) = Π_{j=1}^J p(y_j, π_j | x_j).

This is shown in the following string of equalities:

p(y^J, π^J | x^J)
  = Π_{j=1}^J p(y_j, π_j | x_j, y^{j−1}, π^{j−1}, x^{j−1})
  = Π_{j=1}^J p(y_j | π_j, x_j, y^{j−1}, π^{j−1}, x^{j−1}) p(π_j | x_j, y^{j−1}, π^{j−1}, x^{j−1})
  = Π_{j=1}^J p(y_j | π_j, x_j) p(π_j | x_j, y^{j−1}, π^{j−1}, x^{j−1})
  = Π_{j=1}^J p(y_j | π_j, x_j) p(π_j)    (107)

where the third equality follows from (5) and the last equality follows from the fact that we ignore error propagation, so x^{j−1}, y^{j−1}, and π^{j−1} are all known constants at time j.

We now determine the average mutual information of the parallel π-output channels for a fixed input distribution p(x^J). The average mutual information of the parallel set is

I_J = (1/J) I(Y^J, π^J; X^J).    (108)

From above, the parallel channels are independent, and each channel is memoryless with asymptotically deep interleaving. Thus we obtain (45) as follows:

(1/J) I(Y^J, π^J; X^J)
  = (1/J) [H(Y^J, π^J) − H(Y^J, π^J | X^J)]
  = (1/J) [H(Y^J | π^J) + H(π^J) − (H(Y^J | π^J, X^J) + H(π^J | X^J))]
  = (1/J) [H(Y^J | π^J) − H(Y^J | π^J, X^J)]
  = (1/J) Σ_{j=1}^J [H(Y_j | π_j) − H(Y_j | π_j, X_j)]    (109)

where the third equality follows from the fact that

p(π^J | x^J) = p(π^J | x^{J−1}) = p(π^J)

by definition of π and by the memoryless property of the π_j channels. The last equality follows from the fact that

H(Y_j | Y^{j−1}, π^J) = H(Y_j | ρ_j, π_j) = H(Y_j | π_j)    (110)

since the π_j channels are memoryless and ρ_j = E_{x^{j−1}} π_j.

APPENDIX IX

In this Appendix we examine the cutoff rate for uniformly symmetric variable-noise channels. The first three lemmas show that for these channels, the maximizing distribution of (52) is uniform and i.i.d. We then determine that R_j, as given by (52), is monotonically increasing in j, and use this to get a simplified formula for R_df in terms of the limiting value of R_j.

Lemma A9.1: For all j, R_j depends only on p(x_j).

Proof: From the proof of Lemma 5.2, π_j is a function of z^{j−1}, and is independent of x^{j−1}. So p(π_j) does not depend on the input distribution. The result then follows from the definition of R_j. ◻

Corollary: An independent input distribution achieves the maximum of R_df.

Lemma A9.2: For a fixed input distribution p(x^J), the J corresponding π-output channels are all symmetric [13, p. 94].

Proof: We must show that for any j < J, the set of outputs for the jth π-output channel can be partitioned into subsets such that the corresponding submatrices of transition probabilities have rows which are permutations of each other and columns which are permutations of each other. We will call such a matrix row/column-permutable.

Let n_j ≤ |X|^j |Y|^j be the number of points δ ∈ Δ with p(π_j = δ) > 0, and let {δ_i}_{i=1}^{n_j} explicitly denote this set. Then we can partition the output into n_j sets, where the ith set consists of the pairs {(y, δ_i): y ∈ Y}. We want to show that the transition probability matrix associated with each of these output partitions is row/column-permutable, i.e., that for all i, 1 ≤ i ≤ n_j, the |X| × |Y| matrix

P^i ≜ p(y_j = y, π_j = δ_i | x_j = x),  x ∈ X, y ∈ Y    (111)

has rows which are permutations of each other, and columns which are permutations of each other.


Since the FSMC is a variable-noise channel, there is a function f such that p_k(y | x) depends only on z = f(x, y) for all k, 1 ≤ k ≤ K. Therefore, if for some k′, p_{k′}(y | x) = p_{k′}(y′ | x′), then f(x, y) = f(x′, y′). But since z = f(x, y) is the same for all k, this implies that

p_k(y | x) = p_k(y′ | x′)  ∀k, 1 ≤ k ≤ K.    (112)

Fix k′. Then by definition of uniform symmetry, p_{k′}(y | x) is row/column-permutable. Using (112), we get that the |X| × |Y| matrix

Σ_{k=1}^K p_k(y | x) δ_i(k),  x ∈ X, y ∈ Y    (113)

is also row/column-permutable. Moreover, multiplying a matrix by any constant will not change the permutability of its rows and columns, hence the matrix

Σ_{k=1}^K p_k(y | x) δ_i(k) p(π_j = δ_i),  x ∈ X, y ∈ Y    (114)

is also row/column-permutable. But this completes the proof, since

p(y_j = y, π_j = δ_i | x_j = x) = Σ_{k=1}^K p_k(y_j = y | x_j = x) δ_i(k) p(π_j = δ_i).    (115)
◻

Lemma A9.3: For i.i.d. uniform inputs, R_j is monotonically increasing in j.

Proof: For uniform i.i.d. inputs, R_j can be written as R_j = −log((1/|X|) E f(π_j)), where f(π_j) is the quantity defined in (117). We want to show that R_{j+1} ≥ R_j or, equivalently, that E f(π_{j+1}) ≤ E f(π_j). Following an argument similar to that of Lemma 4.2, where a follows from stationarity and b follows from Jensen's inequality, we obtain E f(π_{j+1}) ≤ E f(π_j). ◻

Lemma A9.4: For uniformly symmetric variable-noise channels, a uniform i.i.d. input distribution maximizes R_df. Moreover, R_df = lim_{j→∞} R_j.

Proof: From the corollary to Lemma A9.1, the maximizing distribution for R_df is independent. Moreover, from Lemma A9.2, each of the π-output channels is symmetric; therefore, from [13, p. 144], a uniform distribution for p(x_j) maximizes R_j for all j, and therefore it maximizes R_df. By Lemma A9.3, R_j is monotonically increasing in j for i.i.d. uniform inputs. Finally, by Lemma A2.2, for f(π_j) as defined in (117), E f(π_j) converges to a limit which is independent of the initial channel state, and thus so does R_j = −log((1/|X|) E f(π_j)). Therefore

R_df = lim_{J→∞} (1/J) Σ_{j=1}^J R_j = lim_{j→∞} R_j.    ◻

ACKNOWLEDGMENT

The authors wish to thank V. Borkar for suggesting the proof of Lemma A2.3. They also wish to thank the reviewers for their helpful comments and suggestions, and for providing the counterexample of Lemma A2.6.


REFERENCES

[1] M. Mushkin and I. Bar-David, "Capacity and coding for the Gilbert-Elliot channels," IEEE Trans. Inform. Theory, vol. 35, no. 6, pp. 1277-1290, Nov. 1989.
[2] A. J. Goldsmith, "The capacity of time-varying multipath channels," Masters thesis, Dept. of Elec. Eng. Comput. Sci., Univ. of California at Berkeley, May 1991.
[3] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Channels. New York: Academic Press, 1981.
[4] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979.
[5] H. S. Wang and N. Moayeri, "Modeling, capacity, and joint source/channel coding for Rayleigh fading channels," Tech. Rep. WINLAB-TR-32, Wireless Information Network Lab., Rutgers Univ., New Brunswick, NJ, May 1992. Also "Finite-state Markov channel—A useful model for radio communication channels," IEEE Trans. Vehic. Technol., vol. 44, no. 1, pp. 163-171, Feb. 1995.
[6] K. Leeuwin-Boullé and J. C. Belfiore, "The cutoff rate of time correlated fading channels," IEEE Trans. Inform. Theory, vol. 39, no. 2, pp. 612-617, Mar. 1993.
[7] N. Seshadri and C.-E. W. Sundberg, "Coded modulations for fading channels—An overview," European Trans. Telecommun. Related Technol., vol. ET-4, no. 3, pp. 309-324, May-June 1993.
[8] L.-F. Wei, "Coded M-DPSK with built-in time diversity for fading channels," IEEE Trans. Inform. Theory, vol. 39, no. 6, pp. 1820-1839, Nov. 1993.
[9] D. Divsalar and M. K. Simon, "The design of trellis coded MPSK for fading channels: Set partitioning for optimum code design," IEEE Trans. Commun., vol. 36, no. 9, pp. 1013-1021, Sept. 1988.
[10] W. C. Dam and D. P. Taylor, "An adaptive maximum likelihood receiver for correlated Rayleigh-fading channels," IEEE Trans. Commun., vol. 42, no. 9, pp. 2684-2692, Sept. 1994.
[11] J. H. Lodge and M. L. Moher, "Maximum-likelihood sequence estimation of CPM signals transmitted over Rayleigh flat-fading channels," IEEE Trans. Commun., vol. 38, no. 6, pp. 787-794, June 1990.
[12] P. Billingsley, Probability and Measure. New York: Wiley, 1986.
[13] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[14] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[15] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall, 1986.
[16] J. G. Proakis, Digital Communications, 2nd ed. New York: McGraw-Hill, 1989.
[17] M. V. Eyuboglu, "Detection of coded modulation signals on linear, severely distorted channels using decision-feedback noise prediction with interleaving," IEEE Trans. Commun., vol. 36, no. 4, pp. 401-409, Apr. 1988.
[18] J. C. S. Cheung and R. Steele, "Soft-decision feedback equalizer for continuous phase modulated signals in wideband mobile radio channels," IEEE Trans. Commun., vol. 42, no. 2/3/4, pp. 1628-1638, Feb.-Apr. 1994.
[19] A. Duel-Hallen and C. Heegard, "Delayed decision-feedback sequence estimation," IEEE Trans. Commun., vol. 37, no. 5, pp. 428-436, May 1989.
[20] M. V. Eyuboglu and S. U. H. Qureshi, "Reduced-state sequence estimation with set partitioning and decision feedback," IEEE Trans. Commun., vol. 36, no. 1, pp. 13-20, Jan. 1988.
[21] T. Kaijser, "A limit theorem for partially observed Markov chains," Ann. Probab., vol. 3, no. 4, pp. 677-696, 1975.

