
Chapter 15

Linear Gaussian Channels

In this chapter we consider a more general linear Gaussian channel model in which the signal s(t) is passed through a linear filter h(t) with Fourier transform H (f ), and the noise power spectral density Sn(f ) may be non-white.

By a simple whitening filter argument, we show that without loss of generality or optimality we may consider only the white noise case.

The channel capacity of a linear channel is derived via Shannon’s well-known water-pouring argument, which determines the capacity-achieving transmit power spectral density subject to a constraint on total transmit power. This gives good practical guidelines as to which frequency band(s) should be used for transmission. Multicarrier modulation is one straightforward method for approaching this capacity.

Using the principles of optimum detection theory, we show that for single-carrier PAM or QAM transmission through such a channel with a given symbol interval T , without loss of optimality we may reduce the channel model to an equivalent discrete-time channel model by use of a matched filter or a whitened matched filter. This development uses the principles of discrete-time spectral factorization, which may be viewed as an instance of Cholesky factorization.

Optimum sequence detection may then be performed by the Viterbi algorithm (VA). However, while this gives bounds on the best possible performance, in practice the VA is too complex.

The simplest receiver is simply a linear receiver that removes intersymbol interference. We show that an optimal “zero-forcing” (ZF-LE) receiver exists when there are no nulls in the discrete-time channel spectrum. We determine its performance when it does exist, and show that it is near-canonical when there are no near-nulls in the channel spectrum.

The simplest nonlinear receiver is a zero-forcing decision-feedback equalizer (ZF-DFE). We show that an optimal ZF-DFE exists when there is no null band in the transmit spectrum, and we compute its performance. One striking result (“Price’s result”) is that at high SNRs the gap to capacity with uncoded PAM or QAM and ZF-DFE is approximately the same for all linear Gaussian channels; i.e., the ZF-DFE is effectively canonical.

The above developments apply to uncoded PAM or QAM. There are no practical difficulties involved in combining coding with linear equalization. However, decision-feedback equalization does not combine naturally with coding, since immediate decisions are needed. We show that in point-to-point applications where equalization may be performed in the transmitter, DFE-equivalent equalization performance may be obtained in conjunction with coding by a transmitter equalization technique called precoding.


15.1 Noise-whitening filters

The general linear Gaussian channel model is

r(t) = s(t) ∗ h(t) + n(t), (15.1)

where the transmitted signal s(t) has power spectral density (p.s.d.) Sx(f) and is subject to a power constraint ∫ Sx(f) df ≤ P, the channel impulse response h(t) is L2 and has Fourier transform H(f), and the additive noise n(t) is a Gaussian process with p.s.d. Sn(f). The channel band is defined as Bc = supp H(f) = {f : H(f) ≠ 0}.

The channel SNR function is defined as SNRc(f) = |H(f)|²/Sn(f). We assume that SNRc(f) < ∞ for all f; i.e., Sn(f) > 0 whenever H(f) ≠ 0.

A noise-whitening filter is any filter with response g(t) and transform G(f) such that |G(f)|² = Sn/Sn(f) for some constant Sn > 0 for all frequencies f ∈ Bc. The filter is applied to the received signal r(t) to give

r′(t) = r(t) ∗ g(t) = s(t) ∗ h(t) ∗ g(t) + n(t) ∗ g(t).

Since G(f) is nonzero in the channel band Bc, it is invertible in Bc and thus information-lossless.

The filtered noise n′(t) = n(t) ∗ g(t) then has constant p.s.d. Sn′(f) = Sn for all f ∈ Bc. Since the receiver may without loss of optimality filter out all frequencies for which H(f) = 0, it follows that the whitened noise n′(t) is equivalent to white Gaussian noise with p.s.d. Sn with respect to the channel band Bc.

The channel response is now h′(t) = h(t) ∗ g(t), with spectrum H′(f) = H(f)G(f). The channel SNR function is unchanged, since

SNR′c(f) = |H′(f)|²/Sn = |H(f)|²|G(f)|²/Sn = |H(f)|²/Sn(f) = SNRc(f). (15.2)

We conclude that an equivalent channel model is

r′(t) = s(t) ∗ h′(t) + n′(t), (15.3)

where h′(t) = h(t) ∗ g(t) and n′(t) is AWGN with p.s.d. Sn. Thus without loss of generality or optimality, we may continue to assume that the noise is white.

15.2 Channel capacity and water-pouring

Shannon showed that the total capacity of this channel is maximized subject to a power constraint ∫ Sx(f) df ≤ P if the transmit p.s.d. Sx(f) is chosen as

Sx(f) = K(P) − 1/SNRc(f), if K(P) − 1/SNRc(f) ≥ 0;
Sx(f) = 0, otherwise, (15.4)

where K(P) is a constant chosen so that the power constraint ∫ Sx(f) df ≤ P is satisfied. Note that 1/SNRc(f) = Sn/|H′(f)|² = Sn(f)/|H(f)|². See Appendix 10-A for a proof.

This result is illustrated by the “water-pouring” diagram of Figure 10.1.


[Figure 10.1. Determination of optimum transmit p.s.d. Sx(f) by water-pouring: the curve 1/SNRc(f) is filled with “water” Sx(f) up to the level K(P) over a band of width W.]

On a typical channel, the function 1/SNRc(f) forms a “bowl” into which “water” Sx(f) is poured. The “height” of the water is a constant K, and the “depth” is Sx(f) = K − 1/SNRc(f) at all frequencies for which K − 1/SNRc(f) ≥ 0. Water is poured until its total “volume” ∫ Sx(f) df is equal to P, at which time its height is K(P).
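The water level K(P) is easy to find numerically by bisection on a sampled channel. The following is a minimal sketch in Python; the function name, the grid, the tolerance, and the example channel are illustrative assumptions, not from the text.

import numpy as np

# A minimal water-pouring sketch: snr_c[i] approximates SNRc(f_i) on a grid
# of spacing df; bisect on the water level K until the poured power equals P.
def water_pouring(snr_c, df, P, tol=1e-12):
    inv = 1.0 / snr_c                        # the "bowl" 1/SNRc(f)
    lo, hi = inv.min(), inv.min() + P / df   # K(P) lies in this bracket
    while hi - lo > tol:
        K = 0.5 * (lo + hi)
        Sx = np.maximum(K - inv, 0.0)        # pour water up to level K
        if np.sum(Sx) * df > P:              # too much power: lower the level
            hi = K
        else:
            lo = K
    return Sx, K

# Hypothetical example: a channel whose SNR function falls off with frequency.
f = np.linspace(0.01, 1.0, 1000)
df = f[1] - f[0]
Sx, K = water_pouring(1.0 / (1.0 + (5 * f) ** 2), df, P=1.0)
band = Sx > 0                                # the capacity-achieving band B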

The set of frequencies B = {f | S x(f ) > 0} is called the capacity-achieving band. On a typical channel, B is one continuous band of width W , as shown in Figure 10.1. However, if the function 1/ SNRc(f ) has multiple local minima, then depending on P the capacity-achieving band may consist of two or more frequency intervals.

Note that if H (f ) is an ideal brick-wall channel over some band B of width W and S n(f ) is constant over B so that SNRc(f ) is constant, then the water-pouring result prescribes that S x(f ) should be constant over the band B and zero elsewhere, as expected.

There is usually little loss in capacity if S x(f ) is chosen to be flat over the capacity-achieving band B . This observation justifies the use of conventional PAM or QAM modulation over B . If B is made up of more than one frequency interval, then PAM or QAM modulation may be used over each interval independently, in a form of frequency division multiplexing. In practice, a band somewhat narrower than B is often used.

To achieve capacity, the transmitted signal s(t) should actually be a zero-mean Gaussian process with p.s.d. Sx(f). Of course a digitally modulated signal s(t) cannot actually be Gaussian. All that is required in practice is that the original data sequence {ak} resemble an i.i.d. Gaussian sequence statistically, to the accuracy of the continuous approximation. This is accomplished by shaping techniques such as those mentioned in Section 9.3.

Multicarrier transmission is an alternative modulation technique that is directly inspired by water-pouring. The band B is effectively split into many parallel subbands of small width ∆f. Within each subband, the function 1/SNRc(f) may be assumed to be nearly constant, so modulation and coding techniques suitable for ideal band-limited channels may be used. Intersymbol interference is avoided by allowing an appropriate guard band between symbols; the relative rate loss due to this guard band becomes negligible as the signaling interval T = 1/∆f becomes large. Capacity may be approached if the power P(f) in a subband at frequency f is allocated in water-pouring fashion, P(f) = Sx(f)∆f, and if the rate R(f) is made to approach the subband capacity C(f) = ∆f log (1 + SNR(f)) b/s, where SNR(f) = Sx(f)SNRc(f) is the signal-to-noise ratio in the subband at frequency f.


15.3 Normalized SNR for linear Gaussian channels

The capacity in bits per two dimensions with the optimum S x(f ) is

C = ∫B log2 (1 + SNR(f)) df/W, (15.5)

where we define W = |B| = ∫B df and

SNR(f) = Sx(f)SNRc(f). (15.6)

We may translate this into an equivalent signal-to-noise ratio SNReq satisfying

log (1 + SNReq) = ∫B log (1 + SNR(f)) df/W, (15.7)

where the logarithms may have any common base. In other words, 1 + SNReq is the geometric mean of 1 + SNR(f ) over B . Equivalently, C = log (1 + SNReq) is the arithmetic mean of C (f ) = log (1 + SNR(f )). Note that if SNR(f ) is constant over B , then SNReq is equal to this constant.

In this way we may roughly characterize an arbitrary linear Gaussian channel by a bandwidth W and a signal-to-noise ratio SNReq such that the number of available dimensions per second is 2W and the capacity in b/2D is C = log (1 + SNReq), as in the ideal channel case.
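In sampled form SNReq is immediate to compute. A minimal sketch, assuming SNR(f) is given on a uniform grid over B (the values are hypothetical):

import numpy as np

# 1 + SNR_eq is the geometric mean of 1 + SNR(f) over B, per (15.7).
def snr_eq(snr):
    return np.exp(np.mean(np.log1p(snr))) - 1.0

snr = np.array([30.0, 10.0, 3.0, 1.0])   # hypothetical per-frequency SNRs
print(snr_eq(snr))                        # falls between min(snr) and max(snr)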

We may proceed to define the normalized signal-to-noise ratio as before:

SNRnorm = SNReq/(2^ρ − 1), (15.8)

where ρ is the actual spectral efficiency of a given transmission scheme in b/2D. Since by the Shannon limit we must have ρ < C for reliable transmission, we may again express the Shannon limit by the lower bound SNRnorm > 1, and measure the distance to the Shannon limit of a given transmission scheme by its required SNRnorm, in dB.

15.4 Reduction to equivalent discrete-time model

The water-pouring argument allows us to determine the capacity-achieving band B and thus to choose one or more frequency intervals to be used by one or more PAM or QAM modulation schemes. Therefore, following Chapter 3, we henceforth assume that s (t ) is a real or complex equivalent baseband signal of the form

s(t) = Σk ak p(t − kT ),

where {ak} is a real or complex data sequence and p(t) is an equivalent baseband impulse response.

The transmit power spectral density is then

Sx(f) = Sx|P(f)|²,


where Sx is the average energy of the signal alphabet A per two dimensions and P (f) is the transform of p(t). To obtain the water-pouring transmit p.s.d. Sx(f), assuming that the capacity-achieving band B is a single passband interval of width W , the modulation interval should be chosen as T = 1/W . This implies that the Nyquist band is B, and that Sx(f) = 0 outside the Nyquist band, so there is no aliasing. Furthermore, typically Sx(f) = 0 at the Nyquist band edges.

The channel output is then

r(t) = Σk ak f(t − kT ) + n(t),

where f(t) = p(t) ∗ h(t) is the composite response of the transmitter and channel, and n(t) is additive white Gaussian noise with p.s.d. Sn.

At the channel output, the signal p.s.d. is Sx|F(f)|² = Sx|P(f)|²|H(f)|². The output signal-to-noise ratio is therefore

SNRo = (Sx/(SnW)) ∫B |F(f)|² df = ∫B SNR(f) df/W.

In other words, SNRo is the arithmetic mean of SNR(f) over B. We show in Appendix 10-B that this implies that SNRo ≥ SNReq.

We now use the principles of optimum detection theory to reduce the continuous-time model r(t) = Σk ak f(t − kT ) + n(t) to an equivalent discrete-time model, without loss of optimality.

The signal space is now the Hilbert space W (f ) spanned by the set of shifted impulse responses f = {f(t − kT ), k ∈ Z}. By the results of Chapter 4, the set r = {rk} of outputs of a bank of matched filters is a set of sufficient statistics for detection of the sequence a = {ak}, where

rk = ∫ r(t) f∗(t − kT ) dt.

The set r = {rk} may be obtained by sampling the output z(t) of a single matched filter with response f∗(−t) at times kT to give rk = z(kT), since z(t) = ∫ r(τ) f∗(τ − t) dτ.

The composite response is then Rff (t) = f(t) ∗ f∗(−t), the autocorrelation function of f(t), whose transform is |F(f)|². Define

Rk = Rff (kT ); (15.9)

thus by Chapter 2

rk = Σj aj Rk−j + nk,

where

nk = ∫ n(t) f∗(t − kT ) dt.

In D-transform notation, r(D) = a(D)R(D) + n(D).

By the aliasing theorem, the transform of the sequence {Rk} is the aliased spectrum

S(u) = (1/T) Σm |F((u − m)/T)|².


If there is no aliasing (e.g., if Sx(f) is the water-pouring transmit spectrum), then

S(u) = (1/T)|F(u/T)|².

In this case the output signal-to-noise ratio SNRo is the same in discrete time, since

SNRo = (Sx/(SnW)) ∫B |F(f)|² df = (Sx/Sn) ∫[0,1) S(u) du,

the arithmetic mean of SNR(u) = (Sx/Sn)S(u).

Furthermore, by Chapter 5, since nk = ∫ n(t) f∗(t − kT ) dt, we have

E[nj nk∗] = Sn ∫ f(t − jT ) f∗(t − kT ) dt = Sn Rff ((k − j)T ) = Sn Rk−j.

Thus, up to the constant factor Sn, the autocorrelation sequence of the noise sequence n(D) is the sampled autocorrelation sequence {Rk = Rff (kT )}.

To summarize:

Equivalent discrete-time model (via MF): The output sequence {rk} of a T -sampled matched filter is a set of sufficient statistics and has D-transform

r(D) = a(D)R(D) + n′(D), (15.10)

where R(D) is the D-transform of the autocorrelation sequence {Rff (kT )}, with spectrum S(u) equal to the aliased spectrum of |F(u/T )|², and n′(D) is a stationary Gaussian sequence with autocorrelation function

Rn′n′(D) = SnR(D). (15.11)

15.5 Spectral factorization in discrete time

We now give a unique factorization theorem for a discrete-time autocorrelation function R(D), or equivalently for its spectrum S(u). In order for this theorem to hold, we require two technical assumptions, called the Paley-Wiener conditions:

(a) The spectrum S(u) must be integrable: ∫ S(u) du < ∞. (This always holds when S(u) is the aliased spectrum of an integrable spectrum |F(f)|².)

(b) The spectrum S(u) must be log-integrable: ∫ log S(u) du > −∞. (This holds if S(u) > 0 everywhere, or even if S(u) has only a countable set of algebraic zeroes; it fails if S(u) = 0 over any band of frequencies of nonzero measure.)

Then we have the following discrete-time spectral factorization theorem:


Theorem 15.1 (Discrete-time spectral factorization) Let {Rk} be any autocorrelation function with D-transform R(D) = R∗(D⁻¹) and real spectrum S(u) ≥ 0, and assume that S(u) is integrable and log-integrable. Then there exists a unique canonical (causal, monic, minimum-phase) impulse response {hk} with D-transform h(D) = 1 + h1D + h2D² + ... and spectrum H(u) such that

R(D) = A²h(D)h∗(D⁻¹); (15.12)

S(u) = A²|H(u)|², (15.13)

where the constant A² > 0 is the geometric mean of the spectrum S(u); i.e.,

log A² = ∫ log S(u) du, (15.14)

where the logarithms may have any common base.

Appendix 10-C sketches a proof of this theorem. Appendix 10-D shows how this factorization may be understood as a Cholesky factorization of a Toeplitz Gram matrix.

For uniqueness and stability, h(D) must be minimum-phase. If h(D) is rational, then “minimum-phase” means that all poles and zeroes are on or outside the complex unit circle. (In order for h(D) to be in L2, all poles must be outside the unit circle.) A causal, monic, minimum-phase h(D) is called canonical.

The spectral factorization theorem thus says that if a real spectrum S(u) satisfies the Paley-Wiener conditions, then there exists a “square root” AH(u) of S(u) such that S(u) = A²|H(u)|². By requiring that h(D) be canonical, the factorization is made unique.

Spectral factorization is straightforward if R(D) is finite; i.e., Rk = 0 for |k| > L for some degree L. (Note that R(D) must be finite if f(t) is a finite impulse response, and that R(D) may be approximated arbitrarily closely by a sufficiently long finite sequence.) Then R(D) has precisely 2L complex roots. By the Hermitian symmetry of autocorrelation functions, R(D) = R∗(D⁻¹), which implies that if α is a root then (α∗)⁻¹ is a root; i.e., the roots are grouped in L conjugate inverse pairs. Since |α| ≥ 1 implies |(α∗)⁻¹| ≤ 1, at least one root of each pair satisfies |α| ≥ 1.

We may thus write

R(D) = K ∏1≤i≤L (1 − αi⁻¹D)(1 − (αi∗)⁻¹D⁻¹),

where K is some constant and {αi, 1 ≤ i ≤ L} is a set of L roots with |αi| ≥ 1. Then R(D) = Kh(D)h∗(D⁻¹), where

h(D) = ∏1≤i≤L (1 − αi⁻¹D)

is causal, monic and minimum-phase, i.e., canonical. By uniqueness, h(D) must be the canonical factor of R(D), and K must equal A². Note that h(D) has degree L.

This procedure generalizes straightforwardly to rational R(D); then h(D) is the rational canonical response that has as its poles and zeroes one of each pair of poles or zeroes of R(D) that lies on or outside the complex unit circle, and h∗(D⁻¹) has as its poles and zeroes the conjugate inverses of the poles and zeroes of h(D). Note that if S(u) is integrable, then R(D) can have no poles on the complex unit circle.
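For a finite R(D) this root-pairing procedure is directly computable. The following is a minimal numerical sketch (illustrative; roots lying exactly on the unit circle are not treated, and the normalization uses R0 = A²‖h‖²):

import numpy as np

def canonical_factor(R):
    # R = [R_0, R_1, ..., R_L]; returns (A2, h) with R(D) = A2 h(D) h*(D^-1).
    L = len(R) - 1
    # Ascending coefficients of D^L R(D): R_L*, ..., R_1*, R_0, R_1, ..., R_L.
    poly = np.concatenate([np.conj(R[::-1]), R[1:]])
    roots = np.roots(poly[::-1])            # np.roots wants highest power first
    alpha = roots[np.abs(roots) >= 1.0]     # keep one root of each inverse pair
    h = np.array([1.0 + 0.0j])
    for a in alpha:                         # h(D) = prod_i (1 - D/alpha_i)
        h = np.convolve(h, [1.0, -1.0 / a])
    A2 = (R[0] / np.sum(np.abs(h) ** 2)).real   # from R_0 = A2 * ||h||^2
    return A2, h.real if np.allclose(h.imag, 0.0) else h

A2, h = canonical_factor([5.0, 2.0])        # R(D) = 2D^-1 + 5 + 2D
print(A2, h)                                 # A2 = 4, h(D) = 1 + 0.5D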



Example 15.1 (First-order spectra) Let R(D) have degree 1; i.e., let R(D) = R1∗D⁻¹ + R0 + R1D, where R0 is real and positive. In order for the spectrum

S(u) = R1∗e^{−2πiu} + R0 + R1e^{2πiu} = R0 + 2|R1| cos(2πu + θ(R1))

to be nonnegative for all u, we must have 2|R1| ≤ R0. By the quadratic formula, the roots of R(D) are

α = (−R0 − √(R0² − 4|R1|²)) / 2R1;

(α∗)⁻¹ = (−R0 + √(R0² − 4|R1|²)) / 2R1 = 2R1∗ / (−R0 − √(R0² − 4|R1|²)).

Since 2|R1| ≤ R0, the discriminant R0² − 4|R1|² is nonnegative and its square root is real (and by convention positive). Therefore the root α has the larger magnitude, |α| ≥ 1. The unique canonical spectral factorization is therefore

R(D) = A²(1 − α⁻¹D)(1 − (α∗)⁻¹D⁻¹),

where

A² = −R1α = (R0 + √(R0² − 4|R1|²)) / 2.

Interestingly, this yields a closed-form formula for the definite integral

∫ log (R0 + 2|R1| cos(2πu)) du = ∫ log S(u) du = log A².

It can be shown that this formula (which may be found in standard integral tables) implies the validity of the formula (15.14) for A² for any rational R(D).
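A quick numerical check of this integral formula (the values R0 = 5, R1 = 2 are hypothetical, chosen so that 2|R1| < R0 and the integrand has no singularity):

import numpy as np

R0, R1 = 5.0, 2.0
A2 = (R0 + np.sqrt(R0**2 - 4 * abs(R1)**2)) / 2         # = 4 here
u = np.arange(100000) / 100000.0
integral = np.log(R0 + 2 * abs(R1) * np.cos(2 * np.pi * u)).mean()
print(integral, np.log(A2))                              # agree closely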

15.6 The whitened matched filter and the canonical model

Section 15.4 shows that without loss of optimality, using the T-sampled outputs of an MF, we obtain the equivalent discrete-time channel model

r(D) = a(D)R(D) + n′(D),

where n′(D) is a stationary Gaussian sequence with autocorrelation function

Rn′n′(D) = SnR(D).

By the spectral factorization theorem, if R(D) satisfies the Paley-Wiener conditions, then we may write

R(D) = A²h(D)h∗(D⁻¹)

for some canonical h(D) and A² > 0. Moreover, we may represent n′(D) as

n′(D) = n(D)Ah∗(D⁻¹),


where n(D) is i.i.d. Gaussian with mean zero and symbol variance Sn, since then

Rn′n′(D) = Rnn(D)A²h∗(D⁻¹)h(D) = SnR(D).

Consequently

r(D) = a(D)A²h(D)h∗(D⁻¹) + n(D)Ah∗(D⁻¹).

Suppose h(D) has no zeroes on the unit circle; then it is invertible. We may thus filter r(D) by the anticausal filter 1/Ah∗(D⁻¹) to obtain

r ′(D) = a(D)Ah(D) + n(D),

where h(D) is canonical and n(D) is i.i.d. Gaussian with symbol variance Sn.

We call this the canonical discrete-time channel model. It is the discrete-time equivalent of the continuous-time AWGN channel model r(t) = Σk ak f(t − kT ) + n(t) with which we started. In this model, the power spectrum S(u) = A²|H(u)|² is a T-aliased version of the continuous-time power spectrum |F(f)|². As previously noted, if there is no aliasing, then the output signal-to-noise ratio is unchanged:

SNRo = (SxT/Sn) ∫B |F(f)|² df = (Sx/Sn) ∫[0,1) S(u) du;

i.e., SNRo is the mean of the discrete-time SNR function SNR(u) = (Sx/Sn)S(u).

The cascade of the continuous-time matched filter with anticausal response f∗(−t) with the discrete-time filter with anticausal response 1/Ah∗(D⁻¹) is a continuous-time filter with anticausal response ε∗(−t), called a whitened matched filter (WMF). The samples of the whitened matched filter are the correlations

rk = ∫ r(t) ε∗(t − kT ) dt

of the received signal r(t) with the time-shifts {ε(t − kT ), k ∈ Z}.

The set ε = {ε(t − kT ), k ∈ Z} may be seen to be an orthonormal basis for the signal space W(f) generated by the set f = {f(t − kT ), k ∈ Z}, as follows. Explicitly, since ε(t) is defined as the cascade of f(t) with the discrete-time filter 1/Ah(D), we have

ε(t − kT ) = (1/A)f(t − kT ) − h1ε(t − (k + 1)T ) − h2ε(t − (k + 2)T ) − . . . .

Inverting this equation, we have

f(t − kT ) = A Σj hj ε(t − (k + j)T ).

From this it follows that the autocorrelation functions of f and ε are related by

Rff (D) = A²h(D)h∗(D⁻¹)Rεε(D).

But Rff (D) = R(D) = A²h(D)h∗(D⁻¹). It follows that Rεε(D) = 1; i.e., the generators {ε(t − kT ), k ∈ Z} are orthonormal.


The fact that n(D) is an i.i.d. Gaussian sequence with symbol variance Sn then follows directly from the orthonormality of the basis {ε(t − kT ), k ∈ Z}.

It may be shown that this continuous-time definition of the WMF and the canonical discrete-time model are valid whenever R(D) satisfies the Paley-Wiener conditions, so that the canonical spectral factor h(D) is well-defined, regardless of whether h(D) has zeroes on the unit circle.

To summarize:

Canonical discrete-time model (via WMF): Provided that the discrete-time autocorrelation function R(D) satisfies the Paley-Wiener conditions (i.e., its spectrum S(u) is log-integrable), the output sequence {rk} of a T-sampled whitened matched filter is well defined and is a set of sufficient statistics with D-transform

r ′(D) = a(D)Ah(D) + n(D), (15.15)

where R(D) = A²h(D)h∗(D⁻¹) is the canonical spectral factorization of R(D) (so h(D) is canonical and log A² = ∫ log S(u) du) and n(D) is i.i.d. Gaussian with symbol variance Sn.

15.7 Maximum-likelihood sequence detection

Given this canonical discrete-time channel model, we now show that we may use the Viterbi algorithm to perform maximum-likelihood sequence detection.

We must assume that R(D) and hence h(D) is finite; however, any well-defined autocorrelation sequence R(D) may be approximated arbitrarily closely by a finite sequence.

Suppose that h(D) has degree L; i.e., h(D) = 1 + h1D + ··· + hLD^L. Define the signal sequence

s(D) = a(D)Ah(D).

Then s(D) is the output of a finite-impulse-response (FIR) filter with memory-L response Ah(D) when the input is the data sequence a(D). Such a filter may be realized by a shift register with L memory elements, each of which stores an element of the input alphabet A; i.e., the memory at time k stores (ak−1, . . . , ak−L). Thus the filter is a finite-state machine with |A|^L states. The set of all possible output sequences s(D) may thus be represented by a trellis with |A|^L states.

The problem of finding the a(D) that maximizes the likelihood p(r(D) | a(D)) is thus a problem of maximum-likelihood sequence detection (MLSD) of the output of a finite-state machine in memoryless (i.i.d. Gaussian) noise. It may therefore be solved by applying the Viterbi algorithm (VA) to this |A|^L-state trellis.

In most practical settings, |A|^L is so large that the VA is impractical. However, since the VA is optimal, we may obtain upper bounds on performance by analyzing its error probability. Also, numerous suboptimal approximations to the VA that approach the optimal performance have been developed.
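The following is a minimal sketch of the VA for this model, assuming the receiver knows the response Ah(D); the names and the brute-force state bookkeeping are illustrative, not from the text.

import numpy as np
from itertools import product

# Viterbi MLSD for r_k = sum_j (Ah)_j a_{k-j} + n_k with a memory-L response.
def viterbi_mlsd(r, Ah, alphabet):
    L = len(Ah) - 1
    states = list(product(alphabet, repeat=L))    # (a_{k-1}, ..., a_{k-L})
    cost = {s: 0.0 for s in states}               # unknown starting state
    path = {s: [] for s in states}
    for rk in r:
        new_cost, new_path = {}, {}
        for s in states:
            for a in alphabet:
                y = Ah[0] * a + sum(Ah[j] * s[j - 1] for j in range(1, L + 1))
                c = cost[s] + (rk - y) ** 2       # squared-distance metric
                ns = (a,) + s[:-1]                # shift-register update
                if ns not in new_cost or c < new_cost[ns]:
                    new_cost[ns], new_path[ns] = c, path[s] + [a]
        cost, path = new_cost, new_path
    return np.array(path[min(cost, key=cost.get)])  # best surviving path

# Hypothetical example: 2-PAM through Ah(D) = 1 + 0.5D in light noise.
rng = np.random.default_rng(0)
a = rng.choice([-1.0, 1.0], size=20)
Ah = [1.0, 0.5]
r = np.convolve(a, Ah)[:len(a)] + 0.1 * rng.standard_normal(len(a))
print(np.sum(viterbi_mlsd(r, Ah, [-1.0, 1.0]) != a))   # symbol errors, typically 0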

The error probability with the VA is governed by the minimum squared Euclidean distance between possible signal sequences s(D) = a(D)Ah(D).

If the canonical channel model r(D) = a(D)Ah(D) + n(D) is real and the input alphabet A is an M -PAM alphabet A = b{±1, ±3, . . . , ±(M − 1)}, then a pair of nearby signal sequences s(D) = a(D)Ah(D) and s′(D) = a′(D)Ah(D) is obtained if a(D) and a′(D) differ at only one


time k by a single level; i.e., a(D) − a′(D) = ±2bD^k. This is called a single-symbol error event. Then

s(D) − s′(D) = (a(D) − a′(D))Ah(D) = ±2bD^k Ah(D).

It follows that the squared distance between these two signals is

‖s − s′‖² = 4b²A²‖h‖² = 4b² ∫ A²|H(u)|² du = 4b² ∫ S(u) du.

The probability of error for such a pair is

Pr(s → s′) = Q(√(‖s − s′‖²/4σ²)) = Q(√((b²/σ²) ∫ S(u) du)),

where σ² = N0/2 is the variance per dimension. Moreover, there are two such single-symbol error events at each time k if ak is an interior point in the M-PAM constellation, or one such event if ak is a boundary point.

With a complex canonical channel model and a QAM signal alphabet A, the same development goes through, except that there may be as many as four single-symbol error events if ak is an interior point in the QAM constellation.

In many cases, single-symbol error events will be the minimum-distance error events. However, especially if there is severe intersymbol interference in the canonical response h(D), other types of error events may have smaller distance.

Whatever the minimum squared distance d²min(s, s′) between error events, we also have the pairwise lower bound

Pr(E) ≥ Q(√(d²min(s, s′)/4σ²)).

Thus, as is true in general, we have a pairwise lower bound, an estimate (the union bound estimate), and an upper bound (the union bound) that are all dominated by a term Q(√(d²min(s, s′)/4σ²)), and that differ only in their multiplicative coefficients (plus additional terms, in the case of the union bound).

15.8 The matched-filter lower bound

Suppose that only a single non-zero signal is sent from an M-PAM or M × M-QAM alphabet A with d²min = 4b². Then the analysis above shows that an optimum detector will have error probability approximately equal to

Pr(E) ≈ KQ(√((b²/σ²) ∫ S(u) du)), (15.16)

where K is the average number of nearest neighbors in A. Since no sequence detector can possibly outperform this optimal one-shot detector, this value of Pr(E) is called the matched-filter lower bound (MFB).

We define SNRMFB = SNRo = (Sx/Sn) ∫ S(u) du. Since b²/σ² = 3(Sx/Sn)/(2^ρ − 1) for uncoded M-PAM or M × M-QAM transmission, and SNRnorm = SNReq/(2^ρ − 1), we may write the MFB as

Pr(E) ≥ KQ(√(3 (SNRMFB/SNReq) SNRnorm)),


which shows that the factor γMFB = SNRMFB/SNReq may be regarded as a potential “ISI gain” due to the intersymbol interference in the equivalent discrete-time channel response h(D). Note that γMFB = 1 and h(D) = 1 if and only if the channel is ideal (S(u) is constant); otherwise γMFB > 1.

From the discussion of the previous section, to the accuracy of the union bound estimate, this potential ISI gain may indeed be realized with uncoded M-PAM or M × M-QAM transmission and MLSD, provided that single-symbol error events of the type ±2bD^k Ah(D) are the only minimum-distance error events.

We shall see subsequently that with powerful coded modulation this potential ISI gain cannot be realized; the minimum distance between error events is determined by the code and not by fortuitous intersymbol interference.

15.9 Zero-forcing linear equalization

The canonical channel model is obtained by linearly filtering the received signal r(t) with a WMF and sampling. Since this may be done without loss of optimality, the optimum linear receiver under any conditions of optimality consists of the cascade of a WMF, a T -spaced sampler, and a further discrete-time filter f (D).

The optimum linear receiver under the Nyquist zero-forcing criterion (i.e., no intersymbol interference) therefore must be the cascade of a WMF, a T -spaced sampler, and an ISI-eliminating discrete-time filter f (D) = 1/Ah(D). This is called an (optimal) zero-forcing linear equalizer (ZF-LE).

If h(D) has zeroes on the unit circle, or equivalently if the spectrum S(u) = A²|H(u)|² has nulls (frequencies u where S(u) = 0), then 1/h(D) is not well defined and the ZF-LE does not exist.

If the ZF-LE exists, then its output sequence is

r′(D) = r(D)/Ah(D) = a(D) + n(D)/Ah(D).

Thus r′k = ak + n′k, where n′(D) = n(D)/Ah(D) is a stationary Gaussian sequence with autocorrelation function

Rn′n′(D) = Rnn(D)/(A²h(D)h∗(D⁻¹)) = Sn/R(D).

Thus n′k is a Gaussian variable with variance

Rn′n′,0 = ∫ Sn′n′(u) du = Sn ∫ S(u)⁻¹ du.

The signal-to-noise ratio at the decision point with a ZF-LE is therefore

SNRZF−LE = (Sx/Sn) (∫ S(u)⁻¹ du)⁻¹. (15.17)

With uncoded transmission using a PAM or QAM constellation with minimum squared distance 4b2 between signal points, independent symbol-by-symbol minimum-distance decisions


may be made on each received symbol (disregarding the memory of the noise). The error probability is approximately

Pr(E) ≈ KQ(√(b²/σ²)) = KQ(√(3 (SNRZF−LE/SNReq) SNRnorm)),

where σ² = (N0/2) ∫ S(u)⁻¹ du is the variance per dimension, and K is the appropriate error coefficient.

In Appendix 10-B we show that SNRZF−LE ≤ SNReq, with equality if and only if S(u) is constant. The factor

γZF−LE = SNRZF−LE/SNReq ≤ 1

represents the loss due to noise enhancement in the linear receiver. The factor γZF−LE = 1 if S(u) is flat and 0 if S(u) has nulls (where the ZF-LE does not even exist); γZF−LE will tend to be near 1 if S(u) is near-flat and near 0 if S(u) has near-nulls.

15.10 Zero-forcing decision-feedback equalization

A decision-feedback equalizer is the simplest form of nonlinear equalizer. The basic idea is that if the receiver is operating with a low error rate, then almost all decisions are correct. By assuming correct decisions, we may eliminate all intersymbol interference due to past (decided) symbols.

With the canonical discrete-time channel model r(D) = a(D)Ah(D)+ n(D), decision-feedback equalization is implemented by the receiver shown in Figure 10.2.

[Figure 10.2. Decision-feedback equalization with canonical channel model: the received sequence r(D) = a(D)Ah(D) + n(D) is scaled by 1/A, the feedback filter h(D) − 1 driven by the decisions â(D) is subtracted, and a memoryless decision element produces â(D).]

The received sequence is scaled by 1/A to give r′(D) = a(D)h(D) + n(D)/A. Given the decisions (. . . , âk−2, âk−1) made prior to time k, the intersymbol interference due to these past symbols is subtracted from r′k:

r′′k = r′k − Σj≥1 hj âk−j.

If the previous decisions are correct, â(D) = a(D), then

r′′(D) = r′(D) − â(D)(h(D) − 1) = a(D) + n(D)/A,

the “postcursor” interference is completely removed, and r′′k = ak + nk/A.
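A minimal simulation sketch of this receiver for 2-PAM, assuming the model parameters A and h(D) are known at the receiver (names and noise level are illustrative):

import numpy as np

def zf_dfe(r, A, h):
    # h = [1, h1, ..., hL] canonical; returns symbol-by-symbol decisions.
    L = len(h) - 1
    a_hat = np.zeros(len(r))
    for k in range(len(r)):
        # Subtract postcursor ISI reconstructed from past decisions.
        isi = sum(h[j] * a_hat[k - j] for j in range(1, min(L, k) + 1))
        a_hat[k] = 1.0 if r[k] / A - isi >= 0 else -1.0   # memoryless slicer
    return a_hat

rng = np.random.default_rng(1)
a = rng.choice([-1.0, 1.0], size=1000)
A, h = 2.0, np.array([1.0, 0.5])
r = A * np.convolve(a, h)[:len(a)] + 0.2 * rng.standard_normal(len(a))
print(np.mean(zf_dfe(r, A, h) != a))    # symbol error rate, near 0 at this SNR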

The signal-to-noise ratio at the decision point of a ZF-DFE receiver is therefore

SNRZF−DFE = Sx/(Sn/A²) = (Sx/Sn) exp(∫ log S(u) du). (15.18)

In Appendix 10-B, we show that

SNRZF−LE ≤ SNRZF−DFE ≤ SNReq ≤ SNRo,

with all inequalities strict if S(u) is not a constant.


We note however that if SNR(u) = (Sx/Sn)S(u) is large for all u, then SNRZF−DFE ≈ SNReq. In this sense, at high SNRs on channels without nulls, the ZF-DFE is a canonical receiver.

Assuming that all past decisions have been correct, with uncoded M-PAM or M × M-QAM transmission and symbol-by-symbol decisions the probability of error is

Pr(E) ≈ Kmin Q(√(3 (SNRZF−DFE/SNReq) SNRnorm)).

Therefore if SNRZF−DFE ≈ SNReq, which on most channels will be true for large SNR, then the curve of Pr(E) vs. SNRnorm is independent of the channel characteristics with uncoded transmission and an optimum ZF-DFE receiver; i.e., the distance to the Shannon limit at a given Pr(E) is independent of the channel. (This is “Price’s result.”)

It can be shown that over all factorizations R(D) = A²h(D)h∗(D⁻¹), the canonical (minimum-phase) factor h(D) maximizes the ratio |h0|²/‖h‖², the fraction of the total response energy in the first coefficient h0, which yields the largest SNR for a ZF-DFE.

What happens when a decision error is made? In practice, errors may propagate for some time, but a DFE will always resynchronize eventually. Thus the probability of an error event given no past error is given by the Pr(E) above; however, the average number of symbol errors per error event will increase. In other words, the error probability exponent (the argument of the Q(·) function) is unaffected, but the error coefficient for bit or symbol error probability will increase. The reader should therefore not be too concerned about error propagation in decision-feedback equalizers.

15.11 Tomlinson-Harashima precoding

Decision-feedback equalizers are inherently incompatible with coding. A DFE requires decisions with no delay; but if symbol-by-symbol decisions are made with a coded input sequence, then the resulting “raw error rate” is often quite high, so error propagation can no longer be regarded as insignificant.

If the data sequence is trellis-coded and the VA is used for decoding, one method of resolving this incompatibility is to implement a separate DFE for each surviving path in the decoder (“per-survivor processing”). The computational complexity of course increases by a factor equal to the number of trellis states, which may be prohibitive. Many reduced-complexity approximations to this strategy have been investigated.

A preferable method is to use a form of transmitter precoding, sometimes called “DFE in the transmitter.” Such precoding techniques effectively implement the feedback part of the DFE in the transmitter, where a(D) is known and no errors can occur. This method is applicable only on point-to-point (as opposed to multipoint or broadcast) channels, and requires that the channel characteristics (the equivalent discrete-time response Ah(D)) be known in the transmitter.

In general, precoding aims to achieve the following two objectives:

(a) a sequence of transmit signals x(D) is sent such that at the channel output, in the absence of noise, an apparently ISI-free sequence y(D) = x(D)h(D) is received, where y(D) may be uncoded or trellis-coded;

(b) by allowing redundancy in the channel output sequence y(D), a power constraint on the transmitted sequence x(D) and possibly other desirable criteria are met.


In this section we describe the simplest kind of transmitter precoding, which was invented independently by Tomlinson and by Harashima about 1970. TH precoding was originally introduced for uncoded transmission using M -PAM constellations, but may also be used with coded M -PAM or M × M -QAM transmission. It cannot be used with more general shaped constellations, and thus cannot achieve shaping gain.

An M -PAM signal set is a scaled version of the lattice constellation

C(Z, R(MZ)) = (Z + 1/2) ∩ [−M/2, M/2]

consisting of the M points in the translated integer lattice Z + 1/2 that lie in the length-M interval [−M/2, M/2], which is the Voronoi region R(MZ) of the sublattice MZ.

Every real number x may be written uniquely as x = r + Ma for some r ∈ R(MZ) and a ∈ Z. The unique r ∈ R(MZ) so defined is called “x mod M.”

Given a signal point yk ∈ C(Z, R(MZ)) to convey to the receiver, and knowing the ISI

pk = Σj≥1 hj xk−j

due to previous signals, the transmitter generates the transmit signal point

xk = (yk − pk) mod M.

Thus xk is in the Voronoi region R(MZ). In fact, the signal points will typically be uniformly distributed over R(MZ).

The received point will then be

rk = xk + pk + nk/A = yk + Mak + nk/A

for some ak ∈ Z. Since MZ is a sublattice of Z, the noise-free output yk + Mak is still in Z +1/2, and a decision dk may be made to the closest point in Z + 1/2. The resulting decision dk is in Z + 1/2, but may lie outside R(MZ). The decision dk may then be reduced to dk mod M ∈ R(MZ), which must equal yk if dk = yk + Mak. Note that all operations are memoryless, so that no error propagation can occur.
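A minimal sketch of TH precoding and mod-M detection for uncoded M-PAM (illustrative code; the scaling by 1/A at the receiver is taken as already done, so the noise-free channel output is x(D)h(D)):

import numpy as np

def mod_M(x, M):
    return (x + M / 2) % M - M / 2            # reduce into [-M/2, M/2)

def th_precode(y, h, M):
    # y: data points in Z + 1/2; h = [1, h1, ..., hL]. Returns transmit x.
    x = np.zeros(len(y))
    for k in range(len(y)):
        pk = sum(h[j] * x[k - j] for j in range(1, min(len(h) - 1, k) + 1))
        x[k] = mod_M(y[k] - pk, M)            # transmit point in R(MZ)
    return x

M, h = 4, np.array([1.0, 0.5, -0.25])
rng = np.random.default_rng(2)
y = rng.choice(np.arange(-M / 2 + 0.5, M / 2, 1.0), size=1000)   # 4-PAM points
x = th_precode(y, h, M)
r = np.convolve(x, h)[:len(x)] + 0.05 * rng.standard_normal(len(x))
d = np.round(r - 0.5) + 0.5                   # nearest point in Z + 1/2
print(np.mean(mod_M(d, M) != y))              # error rate; ~0 at this noise level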

TH precoding works perfectly well when y(D) is a trellis-coded signal sequence, provided only that y′(D) = y(D) + Ma(D) is a valid code sequence whenever y(D) is a valid code sequence and Ma(D) is any sequence of elements of MZ, which holds for practically all trellis codes. The decoder may then decode to the closest code sequence d(D) to the received sequence z(D), and reduce d(D) mod M to y(D) symbol by symbol, as above.

Since r(D) = y′(D) + n(D)/A as with a ZF-DFE, the performance will be the same as on a ZF-DFE channel with effective signal-to-noise ratio SNRZF−DFE, which as we have seen is approximately equal to SNReq at large SNRs.

Thus TH precoding permits the combination of trellis coding and ISI-cancelling (DFE-equivalent) equalization. Its only problem is that it requires the constellation shape in one dimension to be the Voronoi region R(MZ), which has no shaping gain. It also slightly increases transmit power from (M² − 1)/12, the average energy of an equiprobable discrete M-PAM constellation, to M²/12, the average energy of a uniform continuous distribution over R(MZ), but for large constellations this “continuous approximation” increase is negligible.


15.12 Summary

It may be helpful to summarize these results.

The equivalent signal-to-noise ratio SNReq is the effective SNR of a linear Gaussian channel. It is given by

SNReq = exp(∫ log (1 + SNR(u)) du) − 1,

where SNR(u) = (Sx/Sn)S(u), where Sx and Sn are the signal and noise variances, and S(u) = |F(u/T)|²/T is the spectrum of the discrete-time channel response.

The signal-to-noise ratios achieved with uncoded M -PAM or M × M -QAM modulation and zero-forcing linear equalization (ZF-LE), zero-forcing decision-feedback equalization (ZF-DFE), or maximum-likelihood sequence detection (assuming that performance is governed by the matched filter lower bound) are given respectively by the harmonic, geometric and arithmetic means

SNRZF−LE = (∫ SNR(u)⁻¹ du)⁻¹;

SNRZF−DFE = exp(∫ log SNR(u) du);

SNRMFB = ∫ SNR(u) du = SNRo.

At high SNRs, SNRZF−DFE ≈ SNReq, and we obtain approximately the same error probability as on an ideal Gaussian channel with SNR equal to SNReq. The performance with ZF-LE is always worse and the MFB is always better, although as SNR(u) becomes flat all SNRs become the same. On the other hand, if SNR(u) has a near-null, then SNRZF−LE goes to zero, while if SNR(u) has a null, then the ZF-LE does not even exist. The ZF-DFE exists provided that SNR(u) is log-integrable, which will be the case provided that SNR(u) has only algebraic nulls.
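These orderings (established in Appendix 10-B) are easy to verify numerically. A minimal sketch with a hypothetical non-flat SNR(u):

import numpy as np

u = np.arange(10000) / 10000.0
snr = 20.0 * (1.2 + np.cos(2 * np.pi * u))        # a non-flat SNR function
snr_zf_le  = 1.0 / np.mean(1.0 / snr)             # harmonic mean
snr_zf_dfe = np.exp(np.mean(np.log(snr)))         # geometric mean
snr_mfb    = np.mean(snr)                         # arithmetic mean
snr_eq     = np.exp(np.mean(np.log1p(snr))) - 1.0
assert snr_zf_le < snr_zf_dfe < snr_eq < snr_mfb  # strict since SNR(u) non-flat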

With TH precoding, we can use trellis codes to achieve the same coding gains as on an ideal channel. With more advanced precoding, we can achieve shaping gains (up to 1.53 dB) as well. Therefore in principle precoding allows us to approach the Shannon limit on an arbitrary high-SNR linear Gaussian channel as closely as we can on an ideal AWGN channel.

In the following chapters, we will show that the signal-to-noise ratios achieved with MMSE linear equalization (MMSE-LE) and MMSE decision-feedback equalization (MMSE-DFE) are given respectively by

SNRMMSE−LE = (∫ (1 + SNR(u))⁻¹ du)⁻¹ − 1;

SNRMMSE−DFE = exp(∫ log (1 + SNR(u)) du) − 1 = SNReq.

Since 1 + SNR(u) has no nulls, both equalizers unconditionally exist. Using the methods of Appendix 10-B, we can show that both MMSE SNRs are better than their ZF counterparts. Moreover, the MMSE-DFE is canonical at all SNRs, so MMSE-type precoding can be used to approach the Shannon limit on all linear Gaussian channels.


Appendix 10-A. Proof of the water-pouring result

After noise-whitening, we obtain an equivalent channel model

r(t) = s(t) ∗ h(t) + n(t),

where h(t) is a composite response with spectrum H(f), and n(t) is AWGN with p.s.d. Sn. The channel SNR function is then

SNRc(f) = |H(f)|²/Sn.

We wish to maximize the total information rate through this channel subject to a power constraint ∫ Sx(f) df ≤ P on the transmit p.s.d. Sx(f). Note that Sx(f) must be nonnegative.

We approach this problem by dividing the channel into independent parallel subchannels of width ∆f. We assume that ∆f is small enough and H(f) smooth enough so that |H(f)|² is approximately constant within each subchannel. Then each subchannel becomes an ideal AWGN channel.

If the transmit power allocated to a subchannel of width ∆f at frequency f is P (f ) = S x(f )∆f , then the capacity of the subchannel in b/s is

C (f ) = ∆f log2 (1 + S x(f )SNRc(f )).

The total capacity in b/s and total transmit power are

C = Σ ∆f log2 (1 + Sx(f)SNRc(f));

P = Σ Sx(f)∆f.

As ∆f → 0, these expressions become integrals:

C = ∫ log2 (1 + Sx(f)SNRc(f)) df;

P = ∫ Sx(f) df.

Using a Lagrange multiplier λ to account for the constraint ∫ Sx(f) df ≤ P, and shifting for convenience to natural logarithms, our problem becomes to choose Sx(f) to maximize

C + λP = ∫ (ln (1 + Sx(f)SNRc(f)) + λSx(f)) df.

Differentiating with respect to S x(f ) at each f and setting each derivative to zero, we obtain

SNRc(f)/(1 + Sx(f)SNRc(f)) + λ = 0,

which yields

Sx(f) = K(P) − 1/SNRc(f),

where the constant K(P) = −λ⁻¹ > 0 is chosen so that ∫ Sx(f) df = P. This equation for Sx(f) can be satisfied only when it yields a nonnegative value; if it yields a negative value, then Sx(f) must be set to zero. (This implies that the derivative SNRc(f) − K(P)⁻¹ is negative, so the so-called Kuhn-Tucker conditions for a maximum over a convex region are satisfied.) Thus we obtain the water-pouring solution for Sx(f) given in (15.4).


Appendix 10-B. Spectral mean inequalities

The parameters SNReq, SNRo, SNRZF−LE and SNRZF−DFE all have the form of spectral means of the channel SNR function {SNR(f ), f ∈ B} (or equivalently of its discrete-time analogue {S(u), u ∈ [0, 1)}). In this appendix we give a general treatment of such spectral means, and a general convexity inequality between them.

When we speak of a “mean” of SNR(f ) over B, we are regarding SNR(f ) as a random variable X defined on R+ = [0, ∞) with probability distribution

Pr{X ≤ x} = |{f ∈ B : SNR(f) ≤ x}| / |B|,

where |B| = W = 1/T. This random variable X is deterministic with value K if and only if SNR(f) is equal to a constant K over B.

Given a continuous, real-valued, strictly monotonic function g : R+ → R, the g-mean Mg of SNR(f ) over B is defined such that

g(Mg) = ∫B g(SNR(f)) df/W.

In other words, g(Mg ) is equal to the mean of the random variable g(X). More explicitly, since a continuous, strictly monotonic function g has a well-defined inverse g−1 : g(R+) → R+, we may write

Mg = g⁻¹( ∫B g(SNR(f)) df/W ).

If X is deterministic with value K, then g(X) is deterministic with value g(K), so in this case Mg = K, which justifies its being called a mean.

A continuous function g is convex ∩ (“convex cap,” “concave”) if whenever X is a random variable equal to x1 with probability (w.p.) θ and x2 w.p. 1 − θ, then E[g(X)] ≤ g(E[X]); see Figure 10.3. It is strictly convex ∩ if strict inequality holds whenever x1 ≠ x2 and 0 < θ < 1. A twice-differentiable function is convex ∩ if its second derivative is nonpositive and strictly convex ∩ if its second derivative is strictly negative. Similarly, g is (strictly) convex ∪ (“convex cup,” “convex”) if and only if −g is (strictly) convex ∩.

[Figure 10.3. Jensen’s inequality E[g(X)] ≤ g(E[X]) for a convex ∩ function g, when X is a random variable equal to x1 w.p. θ and x2 w.p. 1 − θ.]


The definition extends immediately to the following more general inequality:

Theorem 15.2 (Convexity inequality) Let g be a convex ∩ function of x, and let E[X] and E[g(X)] be the means of X and g(X), respectively, under an arbitrary probability distribution. Then

E[g(X)] ≤ g(E[X]).

If g is strictly convex ∩, then equality holds if and only if X is deterministic.

In other words, the g-mean Mg = g−1(E[g(X)]) is overbounded by the mean E[X] when g is convex ∩, with strict inequality if g is strictly convex ∩ and X is non-deterministic.

A more general inequality is the following [Hardy, Littlewood and Polya, Inequalities ]:

Theorem 15.3 (g-mean inequality) Let g and h be continuous, strictly monotonic functions such that g is increasing and g(h−1(x)) is convex ∩. Then Mg ≤ Mh. If g(h−1(x)) is strictly convex ∩ and X is non-deterministic, then the inequality is strict: Mg < Mh.

To apply these inequalities, we make the following observations:

• SNRo = SNRMFB is the mean of SNR(f ).

• SNReq is the g-mean of SNR(f ) with g(x) = log (1 + x).

• SNRZF−LE is the g-mean of SNR(f ) with g(x) = x⁻¹ (the “harmonic mean”).

• SNRZF−DFE is the g-mean of SNR(f ) with g(x) = log x (the “geometric mean”).

The g-mean inequality then yields:

• Since log (1 + x) is strictly convex ∩, SNReq ≤ SNRo = SNRMFB.

• Since log (1 + x⁻¹) is strictly convex ∪, SNRZF−LE ≤ SNReq.

• Since log (1 + e^x) is strictly convex ∪, SNRZF−DFE ≤ SNReq.

• Since log x is strictly convex ∩, SNRZF−DFE ≤ SNRo.

• Since log (x⁻¹) is strictly convex ∪, SNRZF−LE ≤ SNRZF−DFE.

Strict inequality holds in all cases unless SNR(f ) is constant.


Appendix 10-C. Proof of discrete-time spectral factorization

The spectral factorization of Theorem 15.1 may be obtained by expressing log S(u) as a discrete-time Fourier transform

log S(u) = Σk αk e^{−2πiuk},

whose coefficients {αk} are given by the inverse transform

αk = ∫ log S(u) e^{2πiuk} du.

Since log S(u) is real, α−k = αk∗. The Paley-Wiener conditions ensure that log S(u) is in L1 so that this transform pair exists. The factorization results from grouping the zero, positive and negative terms of {αk} as follows:

log A² = α0 = ∫ log S(u) du;

log H(u) = Σk>0 αk e^{−2πiuk};

log H∗(u) = Σk<0 αk e^{−2πiuk},

which yields

log S(u) = log A² + log H(u) + log H∗(u) = log A²|H(u)|²

and the formula (15.14) for A².

To obtain an explicit expression for the coefficients of h(D), consider the sequence

ψ(D) = Σk>0 αk D^k = log h(D),

whose transform is ψ(e^{−2πiu}) = log H(u) = log h(e^{−2πiu}). Taking formal derivatives, we have

ψ⁽¹⁾(D) = h⁽¹⁾(D)/h(D),

or equivalently h⁽¹⁾(D) = h(D)ψ⁽¹⁾(D). Repeated formal differentiation yields the recursive equation

h⁽ᵏ⁾(D) = Σi=0…k−1 (k−1 choose i) h⁽ⁱ⁾(D) ψ⁽ᵏ⁻ⁱ⁾(D), k ≥ 1.

Evaluating this equation at D = 0 and noting that h⁽ᵏ⁾(0) = k! hk and ψ⁽ᵏ⁾(0) = k! αk, we obtain finally h0 = 1 and the recursive expression

hk = Σi=0…k−1 ((k − i)/k) hi αk−i, k ≥ 1,

which explicitly determines h(D) in terms of the coefficients {αk } of the transform of log S(u). This proves that h(D) is causal and monic. For a proof that h(D) is minimum-phase, see [Papoulis, Signal Analysis, McGraw-Hill, 1984].
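This recursion gives a practical way to compute h(D) from samples of S(u). A minimal FFT-based sketch (illustrative; the grid size and the truncation length n_taps are assumptions):

import numpy as np

def canonical_from_spectrum(S_samples, n_taps):
    # alpha_k = integral of log S(u) e^{2 pi i u k} du, via the inverse DFT.
    alpha = np.fft.ifft(np.log(S_samples)).real   # real for a real spectrum
    A2 = np.exp(alpha[0])                         # log A^2 = alpha_0
    h = np.zeros(n_taps)
    h[0] = 1.0
    for k in range(1, n_taps):
        h[k] = sum((k - i) / k * h[i] * alpha[k - i] for i in range(k))
    return A2, h

# Check against Example 15.1: S(u) = 5 + 4 cos(2 pi u), i.e., R(D) = 2D^-1 + 5 + 2D.
u = np.arange(4096) / 4096.0
A2, h = canonical_from_spectrum(5.0 + 4.0 * np.cos(2 * np.pi * u), 8)
print(A2, h[:3])    # ~4.0 and h = 1, 0.5, 0, ... as found by the root method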


Appendix 10-D. Spectral factorization as Cholesky factorization

The spectral factorization of Theorem 15.1 may be understood as a Cholesky factorization of a Toeplitz Gram matrix, as follows. The Gram matrix A(f) of the infinite set of signal space generators f = {f(t − kT ), k ∈ Z} is the infinite Hermitian-symmetric non-negative definite matrix

A(f ) = {Rk−j , j ∈ Z, k ∈ Z}. We note that each row or column of this matrix is just a shift of the autocorrelation sequence {Rk }— i.e., A(f ) is an infinite Toeplitz matrix.

Infinite Toeplitz matrices have the following properties:

(a) An infinite Toeplitz matrix T = {tk−j } is completely characterized by the sequence {tk }, or by its D-transform t(D), or by its spectrum T (u);

(b) The eigenvectors of any Toeplitz matrix are the vectors {e^{−2πiuk}, k ∈ Z} for u ∈ [0, 1), and the associated eigenvalues are the corresponding values T (u) of its spectrum;

(c) The product TU of two such matrices is characterized by the product t(D)u(D) of the associated D-transforms, or by the product T (u)U(u) of the associated spectra, and does not depend on order— i.e., T and U commute: TU = UT .

The Cholesky factorization of A(f ) has the form

A(f) = LD²L∗,

where L is a lower triangular monic matrix and D is a nonnegative diagonal matrix. Because of the Toeplitz property of A(f ), both L and D are Toeplitz; i.e.,

L = {hk−j , j ∈ Z, k ∈ Z}; D = {dk−j , j ∈ Z, k ∈ Z}.

Now the monic lower triangular property of L implies that h0 = 1 and hk = 0 for k < 0. Similarly the diagonal property of D implies that dk = 0 for k ≠ 0. In terms of D-transforms, the factorization A(f) = LD²L∗ therefore becomes

R(D) = A²h(D)h∗(D⁻¹),

where A = d0 is the D-transform of {dk} and h(D) = 1 + h1D + h2D² + ... is the D-transform of {hk}. Hence h(D) is causal (hk = 0 for k < 0) and monic (h0 = 1).
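Numerically, Cholesky-factoring a large finite section of A(f) approximates the spectral factorization: interior rows of the monic factor approach {hk} and the diagonal approaches A. A minimal sketch, assuming SciPy is available (the section size N is an illustrative choice):

import numpy as np
from scipy.linalg import cholesky, toeplitz

N = 64
R = np.zeros(N)
R[0], R[1] = 5.0, 2.0                        # R(D) = 2D^-1 + 5 + 2D again
A_f = toeplitz(R)                            # finite section of {R_{k-j}}
Lc = cholesky(A_f, lower=True)               # A_f = Lc Lc^T, with Lc = L D
row = Lc[N // 2, :N // 2 + 1][::-1]          # middle row of Lc, reversed
print(Lc[N // 2, N // 2] ** 2)               # ~A^2 = 4
print(row[:3] / row[0])                      # ~h = 1, 0.5, 0, ...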


Appendix 10-E. Spectral factorization and innovations sequences

Spectral factorization permits a canonical representation of stationary Gaussian sequences in terms of “innovations sequences,” as follows. Let {nk } be an i.i.d. Gaussian sequence, where each nk has mean zero and variance 1. The autocorrelation sequence of {nk } is then

Rnn,j = E[nk nk−j∗] = 1 if j = 0, and 0 if j ≠ 0;

i.e., Rnn(D) = 1. Now let {n′k} be a sequence obtained by passing {nk} through a filter with response {gk}; i.e., n′(D) = n(D)g(D). Then it is easily shown that {n′k} is a stationary (i.e., E[n′k n′k−j∗] = Rn′n′,j, independent of k) Gaussian sequence with

Rn′n′(D) = Rnn(D)g(D)g∗(D⁻¹) = g(D)g∗(D⁻¹).

Conversely, suppose that {n′k} is a stationary Gaussian sequence with a given autocorrelation function Rn′n′(D). Assuming that Sn′n′(u) satisfies the Paley-Wiener conditions, we may write

Rn′n′(D) = A²h(D)h∗(D⁻¹)

for some canonical h(D) and A² > 0. Since a zero-mean Gaussian sequence is entirely characterized by its second-order statistics, we may represent n′(D) as the output of a filter with response Ah(D) (or Ah∗(D⁻¹)) when the input is an i.i.d. Gaussian “innovations” sequence with Rnn(D) = 1:

n′(D) = n(D)Ah(D),

since the autocorrelation sequence of n′(D) will then be Rn′n′(D) = A²h(D)h∗(D⁻¹).
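A minimal sketch of this synthesis, reusing the first-order factorization of Example 15.1 (parameters illustrative):

import numpy as np

rng = np.random.default_rng(3)
A, h = 2.0, np.array([1.0, 0.5])              # A^2 h(D)h*(D^-1) = 2D^-1 + 5 + 2D
n = rng.standard_normal(200000)               # unit-variance innovations
n_prime = A * np.convolve(n, h)[:len(n)]      # n'(D) = n(D) A h(D)
for j in range(3):                            # empirical R_0, R_1, R_2
    print(j, np.mean(n_prime[j:] * n_prime[:len(n_prime) - j]))
# expected: near 5, 2, 0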

