On the Spectral Efficiency of Noncoherent

Doubly Selective Block-Fading Channels

Arun P. Kannu and Philip Schniter

Dept. ECE, The Ohio State University, Columbus, OH 43210.

Abstract

In this paper, we consider noncoherent single-antenna communication over doubly selective block-fading channels that obey a complex-exponential basis expansion model. In our noncoherent setup, neither the transmitter nor the receiver knows the channel fading coefficients, though both know the channel statistics. First, we show that, when the inputs are chosen from continuous distributions, the achievable spectral efficiency (i.e., the pre-log factor in the channel capacity expression) equals $\max(0, 1 - N_{\text{delay}}N_{\text{Dopp}}/N)$, where $N$, $N_{\text{delay}}$, and $N_{\text{Dopp}}$ denote the channel's discrete block-fading interval, discrete delay spread, and discrete Doppler spread, respectively. Next, we study pilot-aided transmission (PAT) over this channel. In the case of strictly doubly selective fading (i.e., $N_{\text{Dopp}} > 1$ and $N_{\text{delay}} > 1$), we establish that affine minimum mean-squared error (MMSE) PAT schemes are spectrally inefficient, but we provide guidelines for the design of spectrally efficient affine PAT schemes and give an example of one such scheme.

Index Terms: Noncoherent channels, doubly selective channels, doubly dispersive channels, noncoherent communication, spectral efficiency, channel capacity, achievable rates, pilot symbols, training symbols, channel estimation.

I. INTRODUCTION

Recently, there has been great interest in characterizing the capacity of wireless multipath channels under the practical assumption that neither the transmitter nor the receiver has channel state information (CSI). In this work, we focus on channels that are simultaneously time- and frequency-selective, which pertain to applications with simultaneously high signaling bandwidth and mobility. The high-SNR capacity of the noncoherent Gaussian flat-fading channel was characterized in the MIMO case by Zheng and Tse [1] using the block-fading approximation, whereby the channel coefficients are assumed to remain constant over a block of $N$ symbols and change independently from block to block. Later, Vikalo et al. [2] characterized the high-SNR capacity of the noncoherent Gaussian frequency-selective block-fading SISO channel under the assumption that the discrete block-length $N$ exceeds the discrete channel delay spread $N_{\text{delay}}$. Liang and Veeravalli [3] characterized the high-SNR capacity of the SISO Gaussian time-selective block-fading channel, assuming that, within the block, the channel coefficients vary according to a finite-term Fourier series with $N_{\text{Dopp}} \le N$ expansion coefficients that have a full-rank covariance matrix,¹ but change independently from block to block. In [3], they also find the asymptotic capacity of a MIMO sub-block correlated time-selective fading model, in which the channel remains constant within a sub-block. For the aforementioned noncoherent block-fading Gaussian channels, it has been shown that the capacity $C$ as a function of SNR $\rho$ obeys $\lim_{\rho\to\infty} C(\rho)/\log(\rho) = \eta$, where the achievable spectral efficiency $\eta$ is given by $\eta = \frac{N-1}{N}$ in the SISO flat-fading case, $\eta = \frac{N-N_{\text{delay}}}{N}$ in the SISO frequency-selective case,² and $\eta = \frac{N-N_{\text{Dopp}}}{N}$ in the SISO time-selective case.

2015 Neil Avenue, Columbus, OH 43210.

This work was supported by NSF CAREER grant 237037 and the Office of Naval Research.

January 30, 2008 DRAFT

In this work, we consider a SISO channel that combines the frequency-selectivity of [2] with the time-selectivity of [3], henceforth referred to as the block-fading doubly selective channel (DSC). More precisely, this discrete-time channel uses a finite-length impulse response whose $N_{\text{delay}}$ Gaussian coefficients vary according to an $N_{\text{Dopp}}$-term Fourier series within the block, but change independently from block to block. When the fading coefficients are uncorrelated in both time and frequency, we show that the achievable spectral efficiency with continuous input distribution obeys $\eta = \max\!\left(0, \frac{N - N_{\text{delay}}N_{\text{Dopp}}}{N}\right)$.

Next, we study pilot-aided transmission (PAT) over this block-fading DSC. In PAT, the transmitter embeds a known pilot (i.e., training) signal that aids the receiver in data decoding under channel uncertainty. Often, PAT enables the receiver to compute an explicit channel estimate, thereby facilitating the use of coherent decoding strategies. (See [4] for a recent comprehensive PAT overview.) We are interested in finding spectrally efficient PAT schemes, i.e., those whose asymptotic achievable rate pre-log factor equals $\max\!\left(0, \frac{N - N_{\text{Dopp}}N_{\text{delay}}}{N}\right)$. For the design of spectrally efficient PAT schemes, we consider only the case that $N > N_{\text{Dopp}}N_{\text{delay}}$; for the case that $N \le N_{\text{Dopp}}N_{\text{delay}}$, achieving the spectrally efficient rate of zero would be trivial.

¹ Note that in the case of $N_{\text{Dopp}} < N$, the $N$-length vector of coefficients has a rank-deficient correlation matrix.

² Assuming uncorrelated inter-symbol interference (ISI) coefficients.

Previous studies have established that minimum mean squared error (MMSE) PAT, i.e., PAT that minimizes the error variance of Wiener channel estimates, is spectrally efficient for flat [1], [5]; frequency-selective [2]; and time-selective [6], [7] block-fading channels. We establish here, however, that MMSE-PAT schemes are spectrally inefficient for strictly doubly selective (i.e., $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$) block-fading channels. For these channels, we then develop guidelines for the design of spectrally efficient PAT schemes and propose one such scheme.

Before continuing, a few comments are in order.

1) Our work relies on the block-fading assumption, which can be justified in systems that employ block interleaving or frequency hopping. Other investigations have circumvented the block-fading assumption through the use of time-selective channel models whose coefficients vary from symbol to symbol in a stationary manner. For these stationary models, it is necessary to make a distinction between non-regular³ (e.g., bandlimited) fading processes and regular (e.g., Gauss-Markov) fading processes. While non-regular fading channels have been shown to behave similarly to time-selective block-fading channels, regular fading channels behave quite differently [8], [9]. Though similar results are expected for stationary doubly selective channel models, the details lie outside the scope of this work.

2) Our work relies on the assumption that intra-block time-variation can be accurately modeled by a finite-term Fourier series with uncorrelated coefficients. Though we provide a detailed justification in the sequel, the key idea is that, due to velocity limitations on the communicating terminals and the scattering surfaces, the channel fading processes will be bandlimited. It is well known that bandlimited random sequences can be well approximated by finite-term Fourier series, where the approximation error decreases with block size.

³ Non-regular processes allow for perfect prediction of future samples from (a possibly infinite number of) past samples, while regular processes do not. For more details, refer to [8].

3) Some authors (e.g., [10]) have studied the capacity of noncoherent doubly selective channels assuming a fixed set of channel eigenfunctions (as motivated by [11]). This fixed eigenfunction model is suitable under very mild spreading conditions or in the low-SNR regime as the ratio of channel capacity to utilized bandwidth approaches zero [12]. We do not employ this model since we focus on the high-SNR regime and do not assume mild spreading.

The paper is organized as follows. Section II details the modeling assumptions, Section III analyzes

the high-SNR capacity of the noncoherent doubly selective block-fading channel, Section IV details the

PAT setup for this channel, and Sections V–VI analyze several PAT schemes.

Notation: Matrices (column vectors) are denoted by upper (lower) bold-face letters. The Hermitian is denoted by $(\cdot)^H$, the transpose by $(\cdot)^\top$, the conjugate by $(\cdot)^*$, the determinant by $\det(\cdot)$, and the Frobenius norm by $\|\cdot\|_F$. The Loewner partial order is denoted by $\succeq$, i.e., $\mathbf{B} \succeq \mathbf{A}$ means that $\mathbf{B} - \mathbf{A}$ is positive semidefinite. The expectation is denoted by $E\{\cdot\}$, the trace by $\mathrm{tr}\{\cdot\}$, the delta function by $\delta(\cdot)$, the Kronecker product by $\otimes$, the modulo-$N$ operation by $\langle\cdot\rangle_N$, and the integer ceiling operation by $\lceil\cdot\rceil$. The null space of a matrix is denoted by $\mathrm{null}(\cdot)$, the column space by $\mathrm{col}(\cdot)$, and the dimension of a vector space by $\dim(\cdot)$. The operation $[\cdot]_{n,m}$ extracts the $(n,m)$th element of a matrix, where the indices $n, m$ begin with $0$, and $\mathrm{diag}(\cdot)$ constructs a diagonal matrix from its vector-valued argument. The appropriately dimensioned identity and all-zero matrices are denoted by $\mathbf{I}$ and $\mathbf{0}$, respectively, while the $N \times N$ identity matrix is denoted by $\mathbf{I}_N$. The set-union operation is denoted by $\cup$, set-intersection by $\cap$, set-minus by $\setminus$, and the empty set by $\emptyset$. The integers are denoted by $\mathbb{Z}$, reals by $\mathbb{R}$, positive reals by $\mathbb{R}^+$, and complex numbers by $\mathbb{C}$.

II. SYSTEM MODEL

In this section, a continuous-time fading channel model is used with pulse-shaped transmission and

reception strategies to obtain a discrete-time baseband-equivalent doubly selective channel model. The

channel’s time-variation is then approximated using a basis expansion model (BEM) and the approxima-

tion is analyzed.


A. Continuous-Time Model

Consider a baseband-equivalent wireless multipath channel that can be modeled as a linear time-variant (LTV) distortion plus additive noise:

$$y(t) = \int h(t;\tau)\,x(t-\tau)\,d\tau + v(t). \tag{1}$$

Suppose that, over the small time period $T_{\text{small}}$, the path lengths vary by at most a few wavelengths, so that the path gains and delays can be assumed constant. Thus, within a duration of $T_{\text{small}}$ seconds, it is reasonable to model $h(t;\tau)$ as a stationary random process for which

$$E\{h(t;\tau)\,h^*(t-t_o;\tau-\tau_o)\} = R_{\text{lag;delay}}(t_o;\tau)\,\delta(\tau_o). \tag{2}$$

Property (2) is commonly known as wide-sense stationary uncorrelated scattering (WSSUS). If we define

$$R_{\text{Dopp;delay}}(f;\tau) = \int R_{\text{lag;delay}}(t;\tau)\,e^{-j2\pi ft}\,dt, \tag{3}$$

then the practical assumptions of finite path-length differences and finite rates of path-length variation imply that

$$R_{\text{Dopp;delay}}(f;\tau) = 0 \quad \text{for} \quad f \notin [-B_{\text{Dopp}}, B_{\text{Dopp}}] \ \text{ or } \ \tau \notin [0, T_{\text{delay}}]. \tag{4}$$

Thus, the channel has a causal delay spread of $T_{\text{delay}}$ seconds and a single-sided Doppler spread of $B_{\text{Dopp}}$ Hz.

B. Discrete-Time Block-Fading Model

Now consider baseband-equivalent modulation, as described by $x(t) = \sum_n x[n]\,\psi(t - nT_s)$, where $T_s$ is the sampling interval in seconds and where $\psi(t)$ is a unit-energy pulse, and baseband-equivalent demodulation, as described by the received samples $y[n] = \int y(t)\,\psi^*(t - nT_s)\,dt$ for $n \in \mathbb{Z}$. Throughout, we assume that the baud rate is larger than the Doppler spread, i.e., $\frac{1}{T_s} > 2B_{\text{Dopp}}$. From (1), one can write

$$y[n] = \sum_l h[n;l]\,x[n-l] + v[n] \tag{5}$$

with $v[n] = \int v(t)\,\psi^*(t - nT_s)\,dt$ and

$$h[n;l] = \iint \psi^*(t)\,h(t + nT_s;\tau + lT_s)\,\psi(t-\tau)\,dt\,d\tau. \tag{6}$$

The received signal is then parsed into length-$N$ blocks (for even $N$), where the block duration $T_{\text{burst}} \approx NT_s$ is less than the small-scale fading duration $T_{\text{small}}$. In the sequel, we consider the block $\{y[n]\}_{n=0}^{N-1}$ without loss of generality.

C. Complex-Exponential Basis-Expansion Model

For the block of interest, the channel response is characterized by $h[n;l]$ for $n \in \{0,\dots,N-1\}$ and $l \in \mathbb{Z}$. These response coefficients can be parameterized w.l.o.g. using the basis expansion model (BEM)

$$h[n;l] = \frac{1}{\sqrt{N}} \sum_{k=-N/2}^{N/2-1} \lambda[k;l]\,e^{j\frac{2\pi}{N}nk}. \tag{7}$$

Using the ambiguity function

$$A(\tau,f) = \int \psi(t)\,\psi^*(t-\tau)\,e^{-j2\pi ft}\,dt, \tag{8}$$

we can state the following lemma.

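Lemma 1 (CE-BEM Statistics): Say that the support of $\psi(t)$ is $\left(-\frac{T_\psi}{2}, \frac{T_\psi}{2}\right]$ with $T_\psi \le \frac{T_s}{2}$. Then, as $N \to \infty$, the BEM coefficients $\{\lambda[k;l]\}$, for $k \in \{-\frac{N}{2},\dots,\frac{N}{2}-1\}$ and $l \in \mathbb{Z}$, are uncorrelated with variance

$$E\{|\lambda[k;l]|^2\} = N \int \left|A\!\left(\tau, \tfrac{k}{NT_s}\right)\right|^2 R_{\text{Dopp;delay}}\!\left(\tfrac{k}{NT_s};\tau + lT_s\right) d\tau. \tag{9}$$

Proof: See Appendix A.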

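In the sequel we assume that the support of $\psi(t)$ satisfies the conditions in Lemma 1. Note then that the variance of $\lambda[k;l]$ is a local average of $R_{\text{Dopp;delay}}\!\left(\frac{k}{NT_s};\tau + lT_s\right)$ over the interval $\tau \in (-T_\psi, T_\psi] \subset \left(-\frac{T_s}{2}, \frac{T_s}{2}\right]$. Due to the support of $R_{\text{Dopp;delay}}(f;\tau)$ specified by (4), it follows that $\lambda[k;l] = 0$ when either $k \notin \left\{-\frac{N_{\text{Dopp}}-1}{2},\dots,\frac{N_{\text{Dopp}}-1}{2}\right\}$ or $l \notin \{0,\dots,N_{\text{delay}}-1\}$, where $N_{\text{Dopp}} := 2\lceil B_{\text{Dopp}}T_sN\rceil + 1$ and $N_{\text{delay}} := \lceil T_{\text{delay}}/T_s\rceil + 1$. We will refer to $N_{\text{Dopp}}$ as the discrete Doppler spread and to $N_{\text{delay}}$ as the discrete delay spread. Thus, it suffices to parameterize the channel (over the $N$-block interval) as

$$h[n;l] = \frac{1}{\sqrt{N}} \sum_{k=-(N_{\text{Dopp}}-1)/2}^{(N_{\text{Dopp}}-1)/2} \lambda[k;l]\,e^{j\frac{2\pi}{N}nk}. \tag{10}$$

Furthermore, the $N_{\text{delay}}$-sample delay spread, in combination with (5), implies that $\{y[n]\}_{n=0}^{N-1}$ depend only on the input samples $\{x[n]\}_{n=-N_{\text{delay}}+1}^{N-1}$.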
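As a quick numerical illustration of these definitions, the following Python sketch computes the discrete Doppler and delay spreads from physical channel parameters. The function name and the example values (a 100 Hz single-sided Doppler spread, a 4.5 microsecond delay spread, a 1 microsecond sampling interval, and $N = 1000$) are hypothetical and chosen only for illustration; they do not come from the paper.

```python
import math


def discrete_spreads(b_dopp_hz, t_delay_s, t_s, n_block):
    """Return (N_Dopp, N_delay) following the definitions in the text.

    N_Dopp  = 2 * ceil(B_Dopp * T_s * N) + 1
    N_delay = ceil(T_delay / T_s) + 1

    Floating-point rounding can matter when an argument of ceil() lies
    exactly on an integer boundary, so the example avoids such values.
    """
    n_dopp = 2 * math.ceil(b_dopp_hz * t_s * n_block) + 1
    n_delay = math.ceil(t_delay_s / t_s) + 1
    return n_dopp, n_delay


# Hypothetical example values (assumptions, not taken from the paper).
n_dopp, n_delay = discrete_spreads(100.0, 4.5e-6, 1e-6, 1000)
print(n_dopp, n_delay)
```

With zero Doppler and delay spread the function returns $(1, 1)$, which corresponds to the flat-fading special case.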


D. Block-Fading Doubly Selective Model

Our block-fading CE-BEM DSC model is now summarized. The channel output is given by

$$y[n] = \sqrt{\rho} \sum_{l=0}^{N_{\text{delay}}-1} h[n;l]\,x[n-l] + v[n], \quad n \in \{0,\dots,N-1\}, \tag{11}$$

where $\{v[n]\}$ is circular white Gaussian noise (CWGN) of unit variance, $\{x[n]\}$ is the channel input with power constraint $\frac{1}{N+N_{\text{delay}}-1} E\left\{\sum_{i=-N_{\text{delay}}+1}^{N-1} |x[i]|^2\right\} \le 1$, and $\rho$ is the SNR. Defining $\mathbf{y} = \big[y[0],\dots,y[N-1]\big]^\top$, $\mathbf{v} = \big[v[0],\dots,v[N-1]\big]^\top$, and $\mathbf{x} = \big[x[-N_{\text{delay}}+1],\dots,x[N-1]\big]^\top$, we obtain the following vector representation of the block-fading model,

$$\mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{x} + \mathbf{v}, \tag{12}$$

where $\mathbf{H} \in \mathbb{C}^{N \times (N+N_{\text{delay}}-1)}$ is given element-wise as $[\mathbf{H}]_{p,q} = h[p;\,p + N_{\text{delay}} - 1 - q]$.

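Assuming sufficiently large $N$, the non-zero response coefficients, $h[n;l]$ for $n \in \{0,\dots,N-1\}$ and $l \in \{0,\dots,N_{\text{delay}}-1\}$, obey the BEM (10). With the definitions $\mathbf{h}_l = \big[h[0;l],\dots,h[N-1;l]\big]^\top$, $\mathbf{h} = [\mathbf{h}_0^\top,\dots,\mathbf{h}_{N_{\text{delay}}-1}^\top]^\top$, $\boldsymbol{\lambda}_l = \big[\lambda[-\tfrac{N_{\text{Dopp}}-1}{2};l],\dots,\lambda[\tfrac{N_{\text{Dopp}}-1}{2};l]\big]^\top$, and $\boldsymbol{\lambda} = [\boldsymbol{\lambda}_0^\top,\dots,\boldsymbol{\lambda}_{N_{\text{delay}}-1}^\top]^\top$, this yields

$$\mathbf{h} = \mathbf{U}\boldsymbol{\lambda}, \tag{13}$$

where $\mathbf{U} = \mathbf{I}_{N_{\text{delay}}} \otimes \mathbf{F}$ and where $\mathbf{F} \in \mathbb{C}^{N \times N_{\text{Dopp}}}$ is given element-wise as $[\mathbf{F}]_{n,m} = \frac{1}{\sqrt{N}}\,e^{j\frac{2\pi}{N}n\left(m - (N_{\text{Dopp}}-1)/2\right)}$. Assuming a multitude of paths and leveraging the central limit theorem, $\boldsymbol{\lambda}$ becomes zero-mean Gaussian with diagonal positive-definite covariance matrix $\mathbf{R}_\lambda = E\{\boldsymbol{\lambda}\boldsymbol{\lambda}^H\}$. Without loss of generality, the channel can be assumed energy-preserving, i.e., $\frac{1}{N}\mathrm{tr}\{\mathbf{R}_\lambda\} = 1$. Finally, independent and identical fading across blocks is assumed. This assumption can be justified for block-interleaved systems or for time-division or frequency-hopped systems where blocks are sufficiently separated across time and/or frequency.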
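The model summarized by (10)-(13) can be sketched numerically as follows. This is a minimal NumPy illustration, not the authors' code: the helper names, the small dimensions ($N = 16$, $N_{\text{Dopp}} = 3$, $N_{\text{delay}} = 2$), and the equal-variance choice for $\mathbf{R}_\lambda$ are assumptions made for the example. It builds $\mathbf{F}$ and $\mathbf{U}$, draws one realization of $\boldsymbol{\lambda}$, forms $\mathbf{H}$ element-wise, and compares $\mathbf{H}\mathbf{x}$ against the noiseless sum in (11) with $\rho = 1$.

```python
import numpy as np


def cebem_matrix(n_block, n_dopp):
    """F with [F]_{n,m} = exp(j*2*pi*n*(m - (n_dopp-1)/2)/n_block) / sqrt(n_block)."""
    n = np.arange(n_block)[:, None]
    k = np.arange(n_dopp)[None, :] - (n_dopp - 1) / 2
    return np.exp(2j * np.pi * n * k / n_block) / np.sqrt(n_block)


def channel_matrix(h_taps):
    """H with [H]_{p,q} = h[p; p + n_delay - 1 - q], where h_taps[l, n] = h[n; l]."""
    n_delay, n_block = h_taps.shape
    H = np.zeros((n_block, n_block + n_delay - 1), dtype=complex)
    for p in range(n_block):
        for l in range(n_delay):
            H[p, p + n_delay - 1 - l] = h_taps[l, p]
    return H


# Hypothetical small dimensions (assumptions for illustration only).
N, N_DOPP, N_DELAY = 16, 3, 2
rng = np.random.default_rng(0)

F = cebem_matrix(N, N_DOPP)
U = np.kron(np.eye(N_DELAY), F)

# Assumed equal-variance coefficients, scaled so that tr{R_lambda}/N = 1.
n_coef = N_DOPP * N_DELAY
var = N / n_coef
lam = np.sqrt(var / 2) * (rng.standard_normal(n_coef)
                          + 1j * rng.standard_normal(n_coef))

h = U @ lam                      # h = [h_0^T, ..., h_{N_delay-1}^T]^T
h_taps = h.reshape(N_DELAY, N)   # row l holds h[0; l], ..., h[N-1; l]
H = channel_matrix(h_taps)

# x_full[j] stores x[j - N_DELAY + 1], matching the definition of x.
x_full = (rng.standard_normal(N + N_DELAY - 1)
          + 1j * rng.standard_normal(N + N_DELAY - 1))

y_matrix = H @ x_full
y_direct = np.array([
    sum(h_taps[l, n] * x_full[n - l + N_DELAY - 1] for l in range(N_DELAY))
    for n in range(N)
])
print(np.allclose(y_matrix, y_direct))
```

The columns of $\mathbf{F}$ are orthonormal for odd $N_{\text{Dopp}} \le N$, which is why $\mathrm{tr}\{\mathbf{R}_\lambda\}$ equals the expected energy of $\mathbf{h}$ in this sketch.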

III. ACHIEVABLE SPECTRAL EFFICIENCY

We now analyze the per-channel-use ergodic capacity of the CE-BEM DSC, expressed as [13]

$$C(\rho) = \sup_{\mathbf{x}:\,E\{\|\mathbf{x}\|^2\} \le N+N_{\text{delay}}-1} \frac{1}{N}\,I(\mathbf{y};\mathbf{x}), \tag{14}$$

where $I(\mathbf{y};\mathbf{x})$ denotes mutual information between the channel output and input, and where the supremum is taken over all random input distributions satisfying the power constraint. The mutual information in (14) is obtained by averaging implicitly over all channel realizations. It is known that all rates below the ergodic capacity can be achieved by coding over a large number of block-fading intervals [13], [14].

We define $\eta$, the channel's achievable spectral efficiency, as the pre-log factor in the high-SNR expression for the channel capacity:

$$\eta = \lim_{\rho\to\infty} \frac{C(\rho)}{\log\rho}. \tag{15}$$

For the block-fading DSC, the coherent ergodic capacity (i.e., when $\mathbf{H}$ is known to the receiver) is given by [14]

$$C_{\text{coh}}(\rho) = \frac{1}{N} \sup_{\mathbf{R}_x \succeq 0,\ \mathrm{tr}\{\mathbf{R}_x\} \le N+N_{\text{delay}}-1} E\{\log\det[\mathbf{I}_N + \rho\,\mathbf{H}\mathbf{R}_x\mathbf{H}^H]\}, \tag{16}$$

where $\mathbf{R}_x = E\{\mathbf{x}\mathbf{x}^H\}$ and the expectation is taken over the random matrix $\mathbf{H}$. Using $\mathbf{R}_x = \mathbf{I}$ gives a lower bound on $C_{\text{coh}}$. Also, any $\mathbf{R}_x$ meeting the constraint in (16) satisfies $\mathbf{R}_x \preceq (N+N_{\text{delay}}-1)\mathbf{I}$. Thus we have⁴

$$\frac{1}{N} E\{\log\det[\mathbf{I}_N + \rho\,\mathbf{H}\mathbf{H}^H]\} \le C_{\text{coh}} \le \frac{1}{N} E\{\log\det[\mathbf{I}_N + \rho(N+N_{\text{delay}}-1)\mathbf{H}\mathbf{H}^H]\}. \tag{17}$$

Denoting the eigenvalues of $\mathbf{H}\mathbf{H}^H$ by $\{\nu_i\}_{i=0}^{N-1}$, we have

$$\frac{1}{N}\sum_{i=0}^{N-1} E\log(1+\rho\nu_i) \le C_{\text{coh}}(\rho) \le \frac{1}{N}\sum_{i=0}^{N-1} E\log\big(1+(N+N_{\text{delay}}-1)\rho\nu_i\big). \tag{18}$$

Since the random fading matrix $\mathbf{H}\mathbf{H}^H$ is full rank (almost surely), the eigenvalues are positive and $\lim_{\rho\to\infty} \frac{C_{\text{coh}}(\rho)}{\log\rho} = 1$. Thus, in the coherent case, the achievable spectral efficiency of the doubly selective channel is unity. But, in the noncoherent case, the achievable spectral efficiency is generally less than unity. In particular, we claim that the achievable spectral efficiency of the noncoherent block-fading CE-BEM DSC, in the case of continuously distributed inputs, is $\max\!\left(0, 1 - \frac{N_{\text{delay}}N_{\text{Dopp}}}{N}\right)$. To prove this claim, we first derive an upper bound on the pre-log factor of mutual information between the input and output of the block-fading DSC, and later establish the achievability of this bound. Since the optimal input distribution in terms of mutual information may depend on the SNR $\rho$, we allow the input distribution to change with respect to $\rho$ to find an upper bound on the asymptotic mutual information.

⁴ Since $\mathbf{A} \succeq \mathbf{B} \succeq 0$ implies $\log\det\mathbf{A} \ge \log\det\mathbf{B}$.


Theorem 1 (Achievable Spectral Efficiency): For the block-fading CE-BEM DSC, any sequence of continuous random input vectors $\{\mathbf{x}_\rho\}$ indexed by SNR $\rho$, satisfying the power constraint $E\{\|\mathbf{x}_\rho\|^2\} \le N + N_{\text{delay}} - 1$, and converging in distribution to a continuous random vector $\mathbf{x}_\infty$, yields

$$\limsup_{\rho\to\infty} \frac{\frac{1}{N} I(\mathbf{y};\mathbf{x}_\rho)}{\log\rho} \le \max\!\left(0, \frac{N - N_{\text{Dopp}}N_{\text{delay}}}{N}\right). \tag{19}$$

Proof: See Appendix B.

The following lemma specifies a fixed input distribution that achieves the mutual information upper bound given in (19).

Lemma 2 (Achievability): For the block-fading CE-BEM DSC, when the input $\mathbf{x}$ is i.i.d. zero-mean unit-variance circular Gaussian,

$$\lim_{\rho\to\infty} \frac{\frac{1}{N} I(\mathbf{y};\mathbf{x})}{\log\rho} = \max\!\left(0, \frac{N - N_{\text{Dopp}}N_{\text{delay}}}{N}\right). \tag{20}$$

Proof: See Appendix C.

It can be seen, from (20), that the loss in achievable spectral efficiency, relative to the coherent case, increases with the spreading index $\gamma := N_{\text{Dopp}}N_{\text{delay}}/N$. Since $\gamma \approx 2B_{\text{Dopp}}T_{\text{delay}}$, larger values of $\gamma$ correspond to higher levels of time-frequency dispersion. Thus, our findings, which imply that channel dispersion limits achievable spectral efficiency, are intuitively satisfying. For $\gamma \ll 1$, the achievable spectral efficiency will be close to unity, i.e., that of the coherent case. Such channels have relatively few unknown parameters and thus are not expected to incur much "training overhead." For general $\gamma < 1$, the achievable spectral efficiency of the block-fading DSC, under continuously distributed inputs, coincides with previous results on special cases of this channel: flat fading (i.e., $N_{\text{delay}} = 1$, $N_{\text{Dopp}} = 1$) [1], [15]; time-selective fading (i.e., $N_{\text{delay}} = 1$) [3]; and frequency-selective fading (i.e., $N_{\text{Dopp}} = 1$) [2].
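To make the dependence on the spreading index concrete, the short Python sketch below evaluates $\gamma$ and the pre-log factor $\max(0, 1-\gamma)$ from (20) for the special cases just listed. The function names and the block length $N = 64$ are hypothetical choices for illustration.

```python
def spreading_index(n_block, n_dopp, n_delay):
    """gamma = N_Dopp * N_delay / N."""
    return n_dopp * n_delay / n_block


def prelog_factor(n_block, n_dopp, n_delay):
    """Pre-log factor max(0, 1 - gamma) from (19)-(20)."""
    return max(0.0, 1.0 - spreading_index(n_block, n_dopp, n_delay))


N = 64  # hypothetical block length
cases = [
    ("flat", 1, 1),
    ("time-selective", 3, 1),
    ("frequency-selective", 1, 4),
    ("doubly selective", 3, 4),
]
for name, n_dopp, n_delay in cases:
    print(name, prelog_factor(N, n_dopp, n_delay))
```

For the flat, time-selective, and frequency-selective rows, the values reduce to $\frac{N-1}{N}$, $\frac{N-N_{\text{Dopp}}}{N}$, and $\frac{N-N_{\text{delay}}}{N}$, matching the earlier results cited in the Introduction.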

For $\gamma \ge 1$, Theorem 1 and Lemma 2 establish that the pre-log factor of mutual information with continuous inputs is zero. DSCs for which $\gamma > 1$ can be interpreted as "overspread" channels [16]. Time and frequency variations of overspread channels are impossible to track even in the absence of noise since they imply that the number of unknown channel parameters ($N_{\text{Dopp}}N_{\text{delay}}$) will be more than the number of received observations ($N$). Our $\gamma \ge 1$ result can be compared with a related result from Lapidoth [17] that shows that the noncoherent channel capacity grows only double-logarithmically when the differential entropy (denoted by $h(\cdot)$) of the channel matrix satisfies $h(\mathbf{H}) > -\infty$. Intuitively, if $h(\mathbf{H}) > -\infty$, no element of $\mathbf{H}$ can be perfectly estimated with the full knowledge of other elements of $\mathbf{H}$, so that there are more unknowns than observations. In fact, we make use of this result in our proof.

Note that, because Theorem 1 restricts the input distribution to be continuous, it does not characterize the pre-log factor of the capacity⁵ of the DSC.

IV. PILOT-AIDED TRANSMISSION

In this section, we detail the encoding and decoding techniques assumed for the PAT schemes analyzed in this paper. Since a primary advantage of using PAT for noncoherent channels is the application of communication techniques developed for coherent channels, we focus on the use of Gaussian coding and (weighted) minimum-distance decoding via pilot-aided linear MMSE (LMMSE) channel estimates. We are mainly interested in designing PAT schemes that achieve the rates promised by the mutual information bounds in Theorem 1 and Lemma 2. We restrict our attention to the case where $\gamma < 1$, which allows a non-zero pre-log factor.

A. PAT Encoder

We assume either cyclic-prefixed (CP) or zero-prefixed (ZP) block transmission, so that

$$\big[x[-N_{\text{delay}}+1],\dots,x[-1]\big] = \begin{cases} \mathbf{0} & \text{ZP}, \\ \big[x[N-N_{\text{delay}}+1],\dots,x[N-1]\big] & \text{CP}. \end{cases} \tag{21}$$

Since, for both CP and ZP, the vector $\mathbf{x}' := \big[x[0],\dots,x[N-1]\big]^\top$ completely specifies the transmission vector $\mathbf{x}$ defined in Section II-D, we focus our attention on the structure of $\mathbf{x}'$. We consider $\mathbf{x}'$ generated by the general class of affine precoding schemes [18]:

$$\mathbf{x}' = \mathbf{p} + \mathbf{B}\mathbf{s}, \tag{22}$$

where $\mathbf{p}$ is a fixed pilot vector, $\mathbf{B} \in \mathbb{C}^{N \times N_s}$ is a fixed full-rank linear precoding matrix, and $\mathbf{s} \in \mathbb{C}^{N_s}$ is a zero-mean information-bearing symbol vector, whose dimension $N_s$ we refer to as the "data dimension." For the purpose of achievable-rate analysis, we can assume w.l.o.g. that the columns of $\mathbf{B}$ are orthonormal, since the mutual information between $\mathbf{s}$ and $\mathbf{y}$ remains unaffected by invertible transformations of $\mathbf{s}$.

⁵ We have not established that the capacity-achieving input distribution for our DSC model is a continuous one.

Denoting the CP/ZP precoding matrix by $\mathbf{M} \in \mathbb{C}^{(N+N_{\text{delay}}-1) \times N}$, so that $\mathbf{x} = \mathbf{M}\mathbf{x}'$, the DSC model (12) becomes

$$\mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{M}(\mathbf{p} + \mathbf{B}\mathbf{s}) + \mathbf{v}. \tag{23}$$

The transmitted power constraint $E\{\|\mathbf{x}\|^2\} \le N + N_{\text{delay}} - 1$ will be enforced via constraints on $E_p = \|\mathbf{p}\|^2 > 0$ and $E_s = E\{\|\mathbf{s}\|^2\}$.
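One way to picture the CP/ZP precoding matrix $\mathbf{M}$ is the following NumPy sketch. The function name `prefix_matrix` and the toy sizes are hypothetical; the construction simply stacks a prefix block on top of an identity so that $\mathbf{x} = \mathbf{M}\mathbf{x}'$ reproduces the prefix rule in (21).

```python
import numpy as np


def prefix_matrix(n_block, n_delay, kind):
    """Return M of size (n_block + n_delay - 1) x n_block with x = M x'.

    kind = "ZP": the first n_delay - 1 entries of x are zero.
    kind = "CP": they copy x'[N - n_delay + 1], ..., x'[N - 1].
    """
    if kind not in ("CP", "ZP"):
        raise ValueError("kind must be 'CP' or 'ZP'")
    top = np.zeros((n_delay - 1, n_block))
    if kind == "CP":
        # Select the last n_delay - 1 samples of x' for the prefix.
        top[:, n_block - n_delay + 1:] = np.eye(n_delay - 1)
    return np.vstack([top, np.eye(n_block)])


# Toy example (assumed sizes): N = 6, N_delay = 3, x' = [0, 1, ..., 5].
x_prime = np.arange(6.0)
x_cp = prefix_matrix(6, 3, "CP") @ x_prime
x_zp = prefix_matrix(6, 3, "ZP") @ x_prime
print(x_cp)
print(x_zp)
```

In the affine setting of (22), the same matrix would be applied to $\mathbf{p} + \mathbf{B}\mathbf{s}$ in place of the toy vector used here.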

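Defining $\mathbf{X}_i = \mathrm{diag}(x[i],\dots,x[i+N-1])$ and $\mathbf{X} = [\mathbf{X}_0,\dots,\mathbf{X}_{-N_{\text{delay}}+1}]$, equation (12) can also be written as $\mathbf{y} = \sqrt{\rho}\,\mathbf{X}\mathbf{h} + \mathbf{v}$. Since $\mathbf{s}$ is zero-mean, the pilot and data components of $\mathbf{X}$ are $\mathbf{P} = E\{\mathbf{X}\}$ and $\mathbf{D} = \mathbf{X} - \mathbf{P}$, respectively. Thus, it follows from (13) that

$$\mathbf{y} = \sqrt{\rho}\,\mathbf{P}\mathbf{U}\boldsymbol{\lambda} + \sqrt{\rho}\,\mathbf{D}\mathbf{U}\boldsymbol{\lambda} + \mathbf{v}. \tag{24}$$

Note that, when the channel statistics $\mathbf{U}$ and $\mathbf{R}_\lambda$ are known, estimation of $\mathbf{h}$ is equivalent to estimation of $\boldsymbol{\lambda}$.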

To achieve arbitrarily small probability of decoding error over the block-fading DSC, we construct long codewords that span multiple blocks. Let $\mathcal{S}$ denote a codebook in which each codeword $\mathbf{s}$ spans $K$ blocks. Thus, we can write $\mathbf{s} = [\mathbf{s}[0]^\top,\dots,\mathbf{s}[K-1]^\top]^\top$, where $\mathbf{s}[k] \in \mathbb{C}^{N_s \times 1}$ is the "segment" of codeword $\mathbf{s}$ that corresponds to the $k$th block. We consider codebooks generated according to a Gaussian distribution, so that each codeword, and its segments, are independently generated with positive-definite segment covariance matrix $\mathbf{R}_s$. Recall that Gaussian codes are capacity-optimal for coherent Gaussian-noise channels [14].

B. PAT Decoder

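We assume that PAT decoding consists of a channel estimation stage followed by a data detection stage. The channel estimator computes the LMMSE estimate of $\mathbf{h}$, given the observation $\mathbf{y}$, the pilots $\mathbf{p}$, and the (joint) second-order statistics of $\mathbf{h}$, $\mathbf{v}$, and $\mathbf{s}$. Specifically, with $\mathbf{R}_y = E\{\mathbf{y}\mathbf{y}^H\}$ and $\mathbf{R}_{y,h} = E\{\mathbf{y}\mathbf{h}^H\}$, the channel estimate is

$$\hat{\mathbf{h}} = \mathbf{R}_{y,h}^H \mathbf{R}_y^{-1}\mathbf{y}, \tag{25}$$

where, from (24),

$$\mathbf{R}_y = \rho\,\mathbf{P}\mathbf{U}\mathbf{R}_\lambda(\mathbf{P}\mathbf{U})^H + \rho\,E\{\mathbf{D}\mathbf{U}\mathbf{R}_\lambda(\mathbf{D}\mathbf{U})^H\} + \mathbf{I} \tag{26}$$

$$\mathbf{R}_{y,h} = \sqrt{\rho}\,\mathbf{P}\mathbf{U}\mathbf{R}_\lambda\mathbf{U}^H. \tag{27}$$

The channel estimation MSE is given by

$$\sigma_e^2 = E\{\|\mathbf{h} - \hat{\mathbf{h}}\|^2\} = \mathrm{tr}\{\mathbf{U}\mathbf{R}_\lambda\mathbf{U}^H - \mathbf{R}_{y,h}^H\mathbf{R}_y^{-1}\mathbf{R}_{y,h}\}. \tag{28}$$

In the sequel, $\hat{\mathbf{H}}$ denotes the version of $\mathbf{H}$ constructed with these channel estimates.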

For data detection, we employ weighted minimum-distance decoding based on the LMMSE channel estimates. Recall that the maximum-likelihood (ML) decoder for coherent Gaussian-noise channels is a weighted minimum-distance decoder [19] and notice that this decoder is relatively simple compared to one that performs joint data detection and channel estimation. Given our multi-block coding scheme, the decoder is specified as

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \mathcal{S}} \sum_{k=0}^{K-1} \left\|\mathbf{Q}\left(\mathbf{y}[k] - \sqrt{\rho}\,\hat{\mathbf{H}}[k]\,\mathbf{M}(\mathbf{p} + \mathbf{B}\mathbf{s}[k])\right)\right\|^2, \tag{29}$$

where $\mathbf{y}[k]$ and $\hat{\mathbf{H}}[k]$ denote the observation and the estimated channel matrix, respectively, of the $k$th block. The choice of the weighting matrix $\mathbf{Q}$ is, for the moment, arbitrary.

C. Spectral Efficiency of PAT

For PAT, we say that a rate $R$ is achievable if the probability of decoding error can be made arbitrarily small at that rate. Since our PAT schemes use Gaussian codes, we employ Theorem 1, which bounds the spectral efficiency of noncoherent DSC with continuously distributed inputs, in the following definition.

Definition 1: A PAT scheme is spectrally efficient if its achievable rate $R(\rho)$ over the block-fading CE-BEM DSC satisfies $\lim_{\rho\to\infty} \frac{R(\rho)}{\log\rho} = \frac{N - N_{\text{Dopp}}N_{\text{delay}}}{N}$.

For the case of flat or frequency-selective channels, MMSE-PAT schemes (i.e., those designed to minimize channel-estimation-error variance) have been shown to be spectrally efficient [1], [2], [5]. In the sequel, we establish that all CP-based affine MMSE-PAT schemes are spectrally inefficient over the CE-BEM strictly DSC and propose a spectrally efficient (non-MMSE) affine PAT scheme.


V. LOSSLESS LINEARLY SEPARABLE PAT

In this section, we focus on affine PAT schemes for which the pilot and data components can be linearly separated without energy loss at the output of the CE-BEM DSC channel, i.e., from $\mathbf{y}$ in (23) and (24). Practically speaking, these losslessly linearly separable (LLS) PAT schemes are those that enable the receiver to compute channel estimates in the absence of data interference. From (24), it can be seen that the LLS criterion can be stated as

$$(\mathbf{P}\mathbf{U})^H\mathbf{D}\mathbf{U} = \mathbf{0}, \quad \forall\,\mathbf{D} \in \mathcal{D}, \tag{30}$$

where $\mathcal{D}$ refers to the collection of data matrices constructed from all possible codeword realizations. In the sequel, we use the term MMSE-PAT when referring to any PAT scheme that minimizes the channel-estimation-error variance $\sigma_e^2 = E\{\|\mathbf{h} - \hat{\mathbf{h}}\|^2\}$ subject to a fixed positive pilot energy $E_p$.

Lemma 3: All MMSE-PAT schemes for the CE-BEM DSC are LLS.

Proof: It has been shown in Theorem 1 of [7], [20] that all CP-based affine MMSE-PAT schemes are LLS, and it can be inferred from [21] that ZP-based single-carrier MMSE-PAT schemes are also LLS.

A. Achievable Rate

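We now analyze the achievable rate of LLS PAT, assuming the encoder/decoder specified in Section IV-B. To do this, we first choose the weighting matrix $\mathbf{Q}$ in (29). Let the columns of $\mathbf{B}_d$ form an orthonormal basis for the left null space of $\mathbf{P}\mathbf{U}$. Assuming the LLS condition (30), the projection

$$\mathbf{y}_d = \mathbf{B}_d^H\mathbf{y} = \sqrt{\rho}\,\underbrace{\mathbf{B}_d^H\mathbf{H}\mathbf{M}\mathbf{B}}_{\mathbf{H}_d}\,\mathbf{s} + \mathbf{v}_d \tag{31}$$

preserves the data component. Then writing $\mathbf{H}_d = \hat{\mathbf{H}}_d + \tilde{\mathbf{H}}_d$ with estimate $\hat{\mathbf{H}}_d$ and error $\tilde{\mathbf{H}}_d$, we get

$$\mathbf{y}_d = \sqrt{\rho}\,\hat{\mathbf{H}}_d\mathbf{s} + \underbrace{\sqrt{\rho}\,\tilde{\mathbf{H}}_d\mathbf{s} + \mathbf{v}_d}_{\mathbf{n}}. \tag{32}$$

From [22], we know that the rate-maximizing weighting operator for $\mathbf{y}_d$ (under the restricted set of Gaussian codebooks) will be the "whitening operator" $\mathbf{R}_n^{-1/2}$, where $\mathbf{R}_n = E\{\mathbf{n}\mathbf{n}^H\}$. Thus, we use

$$\mathbf{Q} = \mathbf{R}_n^{-1/2}\mathbf{B}_d^H \tag{33}$$

in the decoder (29). Theorem 2 from [22] then directly implies⁶ the following.

Lemma 4: For an affine PAT scheme that is LLS according to (30) and that uses the weighting factor $\mathbf{Q}$ from (33) in decoder (29), the achievable rate is

$$R = \frac{1}{N} E\{\log\det[\mathbf{I} + \rho\,\mathbf{R}_n^{-1}\hat{\mathbf{H}}_d\mathbf{R}_s\hat{\mathbf{H}}_d^H]\}. \tag{34}$$

We note that the rate expression (34) resembles that for the coherent case [14] when $\mathbf{n}$ in (32) is considered as "effective" Gaussian noise.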

B. Asymptotic Achievable Rate

We now study the achievable rate of LLS PAT in the high-SNR regime. Since the channel estimation error becomes part of the effective noise in (34), the MSE $\sigma_e^2(\rho)$ from (28) directly influences the asymptotic behavior of the achievable rate. The following theorem gives a condition on the MSE $\sigma_e^2(\rho)$ of linearly separable PAT that is sufficient to ensure that the achievable rate's pre-log factor grows in proportion to the data dimension.

Theorem 2: Suppose a $(\mathbf{p},\mathbf{B})$ PAT scheme is linearly separable according to (30) and guarantees, for some fixed $\kappa \in \mathbb{R}$, estimation error that satisfies $\sigma_e^2(\rho) = E\|\mathbf{h} - \hat{\mathbf{h}}\|^2 \le \frac{\kappa}{\rho}$ for all $\rho > 1$. Then its asymptotic achievable rate obeys

$$\lim_{\rho\to\infty} \frac{R(\rho)}{\log\rho} = \frac{N_s}{N}. \tag{35}$$

Proof: See Appendix D.

When $\sigma_e^2(\rho) < \frac{\kappa}{\rho}$, the effective noise variance $\mathbf{R}_n$ remains bounded, enabling the $\log\rho$ growth of achievable rate (34) with pre-log factor equal to the rank of $\hat{\mathbf{H}}_d$. The estimation-error condition required for Theorem 2 is quite mild and is satisfied, e.g., by all CP-based affine MMSE-PAT schemes. In Appendix E, we show that all CP-based affine MMSE-PAT schemes yield $N_s < N - N_{\text{Dopp}}N_{\text{delay}}$ when $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$, i.e., when the channel is strictly doubly selective. Putting these two results together, we make the following claim.

⁶ The achievable rate result in [22] is derived assuming MMSE channel estimates. However, when (30) is satisfied, the LMMSE estimates (25) are MMSE because the pilot observations and the channel coefficients are jointly Gaussian.


Theorem 3 (Spectral Inefficiency): For CE-BEM block-fading DSCs with $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$, all CP-based affine MMSE-PAT schemes are spectrally inefficient.

Proof: See Appendix E.

ZP-based single-carrier MMSE-PAT schemes, as characterized in [21], also yield $N_s < N - N_{\text{Dopp}}N_{\text{delay}}$, and hence are also spectrally inefficient when $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$.

For singly selective channels, however, there do exist spectrally efficient MMSE-PAT schemes, such as those specified for frequency-selective channels (i.e., $N_{\text{Dopp}} = 1$) in [2] and for time-selective channels (i.e., $N_{\text{delay}} = 1$) in [3], [6], [7]. This can be understood by the fact that, in the frequency- (time-) selective case, the effective channel matrix $\mathbf{H}\mathbf{M}$ has $N_{\text{delay}}$ ($N_{\text{Dopp}}$) deterministic eigenvectors, known to the transmitter, so that MMSE-estimation of the $N_{\text{delay}}$ ($N_{\text{Dopp}}$) channel parameters can be accomplished by sacrificing only $N_{\text{delay}}$ ($N_{\text{Dopp}}$) signaling dimensions to pilots. In the doubly selective case, however, the eigenvectors of $\mathbf{H}\mathbf{M}$ are not deterministic and (under our assumptions) unknown to the transmitter, so that MMSE-estimation of the $N_{\text{Dopp}}N_{\text{delay}}$ channel parameters requires sacrificing more than $N_{\text{Dopp}}N_{\text{delay}}$ signaling dimensions to pilots.

VI. SPECTRALLY EFFICIENT PAT

As established in Section V, CP-based affine MMSE-PAT schemes, as well as ZP-based single-carrier MMSE-PAT schemes, are spectrally inefficient in strictly doubly selective CE-BEM fading, i.e., when $N_{\text{Dopp}} > 1$ and $N_{\text{delay}} > 1$, because they sacrifice more than $N_{\text{Dopp}}N_{\text{delay}}$ signaling dimensions to pilots. In this section, we design spectrally efficient PAT schemes by side-stepping the MMSE requirement. Since we have restricted ourselves to non-data-aided channel estimation, we reason that the lossless linear separability criterion (30) is still essential, since, without it, channel estimation would suffer unknown-data interference and, as a result, estimation error would persist even as $\rho \to \infty$. Precise conditions for spectrally efficient PAT are given in the following lemma.

Lemma 5: Suppose that a $(\mathbf{p},\mathbf{B})$ PAT scheme satisfies the following conditions:

1) $\mathbf{P}\mathbf{U}$ is full rank,

2) $\mathrm{rank}(\mathbf{B}) = N_s = N - N_{\text{Dopp}}N_{\text{delay}}$, and

3) $\mathbf{P}\mathbf{U}$ guarantees LLS according to (30).

Then the PAT scheme is spectrally efficient.

Proof: See Appendix F.

In Lemma 5, the first condition avoids an undetermined systemof equations during channel estimation,

the second enables the transmission ofN − NDoppNdelay linearly independent data symbols per block,

and the third prevents data-interference during channel estimation. A spectrally efficient PAT (SE-PAT)

scheme satisfying these three requirements is now described.

Example 1 (SE-PAT):AssumingN -block transmission over the CE-BEM DSC, consider the pilot

index setPs = {0, Ndelay, . . . , (NDopp−1)Ndelay} and the guard index setGs = {0, . . . , NDoppNdelay−1}.

Then construct a ZP-based affine PAT scheme(p,B) where

p[k] =

√EpNDopp

ejθ[k] k ∈ Ps

0 k /∈ Ps, (36)

for arbitraryθ[k] ∈ R and whereB is constructed from the columns ofIN whose indices arenot in Gs.

For the example scheme, note that the first $N_{\text{Dopp}}N_{\text{delay}}$ time slots are used by pilots while the remaining

time slots are used for data transmission, thereby ensuring linear separability. It can be readily verified

that $\mathbf{B}$ has rank $N - N_{\text{Dopp}}N_{\text{delay}}$ and that $\mathbf{P}\mathbf{U}$ has full rank, so that all three conditions in Lemma 5

are satisfied. Such SE-PAT schemes are advantageous in that they yield higher achievable rates than

spectrally inefficient (e.g., MMSE-PAT) schemes at high SNR.
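The construction in Example 1 can be sketched numerically. The following is a minimal illustration, not the authors' code: the helper names `se_pat` and `output_support` are hypothetical, the phases $\theta[k]$ are set to zero as one arbitrary choice, and the channel is assumed to act as $y[n] = \sum_{l=0}^{N_{\text{delay}}-1} h[n;l]\,x[n-l]$ without wrap-around (zero padding). The sketch checks that $\mathbf{B}$ has rank $N - N_{\text{Dopp}}N_{\text{delay}}$ and that the outputs reached by pilots and by data do not overlap in time.

```python
import numpy as np

def se_pat(N, n_dopp, n_delay, E_p=1.0, theta=None):
    """Build the pilot vector p and data precoder B of Example 1 (illustrative helper)."""
    K = n_dopp * n_delay
    if N <= K:
        raise ValueError("need N > N_Dopp * N_delay")
    pilot_idx = np.arange(n_dopp) * n_delay      # P_s = {0, N_delay, ..., (N_Dopp-1) N_delay}
    guard_idx = np.arange(K)                     # G_s = {0, ..., N_Dopp N_delay - 1}
    if theta is None:
        theta = np.zeros(n_dopp)                 # phases are arbitrary; zero is one choice
    p = np.zeros(N, dtype=complex)
    p[pilot_idx] = np.sqrt(E_p / n_dopp) * np.exp(1j * theta)
    data_idx = np.setdiff1d(np.arange(N), guard_idx)
    B = np.eye(N)[:, data_idx]                   # columns of I_N whose indices are not in G_s
    return p, B, pilot_idx, data_idx

def output_support(tx_idx, n_delay, N):
    """Output indices n reached by inputs at tx_idx through taps l = 0..n_delay-1."""
    reach = tx_idx[:, None] + np.arange(n_delay)[None, :]
    return np.unique(reach[reach < N])

N, n_dopp, n_delay = 32, 3, 4
p, B, pilot_idx, data_idx = se_pat(N, n_dopp, n_delay)
pilot_out = output_support(pilot_idx, n_delay, N)
data_out = output_support(data_idx, n_delay, N)
rank_B = np.linalg.matrix_rank(B)
pilot_energy = float(np.sum(np.abs(p) ** 2))
print("rank(B) =", rank_B, "| pilot/data output overlap =",
      np.intersect1d(pilot_out, data_out).size)
```

Under these assumptions the pilot-only outputs occupy exactly the guard interval, which is the time-domain picture behind the linear separability claim.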

VII. CONCLUSION

In this paper, the achievable spectral efficiency of the noncoherent CE-BEM DSC with continuous

input distributions was shown to be $\frac{N - N_{\text{Dopp}}N_{\text{delay}}}{N} \approx 1 - 2B_{\text{Dopp}}T_{\text{delay}}$, where $B_{\text{Dopp}}$ denotes the single-sided

Doppler spread and $T_{\text{delay}}$ denotes the delay spread of the WSSUS channel. In addition, CP-based

affine MMSE-PAT schemes were shown to be spectrally inefficient, and a design procedure for spectrally

efficient PAT schemes was provided.
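As a small worked illustration of these pre-log factors, the sketch below compares the noncoherent pre-log $\max(0, 1 - N_{\text{delay}}N_{\text{Dopp}}/N)$ with the data fractions $N_s/N$ of the SE-PAT scheme and of the TDKD and FDKD MMSE-PAT schemes, using the $N_s$ expressions quoted in Appendix E. The function names and the parameter values ($N = 36$, $N_{\text{Dopp}} = 3$, $N_{\text{delay}} = 4$) are illustrative assumptions.

```python
def prelog_noncoherent(N, n_dopp, n_delay):
    """Pre-log factor max(0, 1 - N_delay * N_Dopp / N) of the noncoherent CE-BEM DSC."""
    return max(0.0, 1.0 - n_dopp * n_delay / N)

def prelog_pat(N, n_pilot_dims):
    """Pre-log factor N_s / N of a PAT scheme that gives up n_pilot_dims dimensions."""
    return max(0, N - n_pilot_dims) / N

N, n_dopp, n_delay = 36, 3, 4                        # illustrative values only
channel = prelog_noncoherent(N, n_dopp, n_delay)
se_pat = prelog_pat(N, n_dopp * n_delay)             # N_s = N - N_Dopp N_delay
tdkd = prelog_pat(N, (2 * n_delay - 1) * n_dopp)     # N_s = N - (2 N_delay - 1) N_Dopp
fdkd = prelog_pat(N, (2 * n_dopp - 1) * n_delay)     # N_s = N - (2 N_Dopp - 1) N_delay
print(channel, se_pat, tdkd, fdkd)
```

For these values the SE-PAT data fraction matches the channel pre-log, while both MMSE-PAT variants fall short of it.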

APPENDIX A

PROOF OF LEMMA 1

In this appendix we analyze the statistics of the CE-BEM coefficients $\lambda[k;l]$ by first considering the statistics of the discrete-time channel coefficients $h[n;l]$. From (2)-(3) and (6), it is straightforward to show that

$$E\{h[n;l]\,h^*[n-m;l-q]\} = \int\!\!\int\!\!\int \psi^*(t'+v)\,\psi(t'+v-\tau)\,\psi(t')\,\psi^*(t'-\tau-qT_s)\,dt' \times \int R_{\text{Dopp;delay}}(f;\tau+qT_s+lT_s)\,e^{j2\pi f(v+mT_s)}\,df\,dv\,d\tau. \tag{37}$$

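From (7), we know $\lambda[k;l] = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} h[n;l]\,e^{-j\frac{2\pi}{N}nk}$ for $k \in \{-\frac{N}{2},\dots,\frac{N}{2}-1\}$, so that

$$\begin{aligned}
E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\}
&= \frac{1}{N}\sum_{n=0}^{N-1}\sum_{n'=0}^{N-1} E\{h[n;l]\,h^*[n';l-q]\}\,e^{-j\frac{2\pi}{N}[nk-n'(k-p)]} && \text{(38)}\\
&= \sum_{m=-N+1}^{N-1}(N-|m|)\,e^{-j\frac{2\pi}{N}mk}\,E\{h[n;l]\,h^*[n-m;l-q]\}\,\frac{1}{N}\sum_{n'=0}^{N-1}e^{-j\frac{2\pi}{N}n'p} && \text{(39)}\\
&= \delta[p]\sum_{m=-N+1}^{N-1}(N-|m|)\,e^{-j\frac{2\pi}{N}mk}\,E\{h[n;l]\,h^*[n-m;l-q]\}, && \text{(40)}
\end{aligned}$$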

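using $m := n - n'$ and the fact that $E\{h[n;l]\,h^*[n-m;l-q]\}$ is invariant to $n$. Combining (37) with (40), it is straightforward to obtain

$$\begin{aligned}
E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\} = \delta[p]\int\!\!\int\!\!\int & \psi^*(t'+v)\,\psi(t'+v-\tau)\,\psi(t')\,\psi^*(t'-\tau-qT_s)\,dt' \\
&\times \int R_{\text{Dopp;delay}}\!\left(f'+\tfrac{k}{NT_s};\,\tau+qT_s+lT_s\right) e^{j2\pi\left(f'+\frac{k}{NT_s}\right)v} \\
&\times \sum_{m=-N+1}^{N-1}(N-|m|)\,e^{j2\pi mT_sf'}\,df'\,dv\,d\tau.
\end{aligned} \tag{41}$$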

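Focusing on the case of large block size $N$, we apply the rule $\lim_{N\to\infty}\frac{1}{N}\sum_{m=-N+1}^{N-1}(N-|m|)\,e^{j2\pi m\phi} = \sum_{i=-\infty}^{\infty}\delta(\phi-i)$ with $\phi = T_sf'$ to the previous result and find

$$\begin{aligned}
\lim_{N\to\infty} E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\}
&= N\delta[p]\int\!\!\int\!\!\int \psi^*(t'+v)\,\psi(t'+v-\tau)\,\psi(t')\,\psi^*(t'-\tau-qT_s)\,dt' \\
&\qquad\times \sum_{i=-\infty}^{\infty} R_{\text{Dopp;delay}}\!\left(\tfrac{k+Ni}{NT_s};\,\tau+qT_s+lT_s\right) e^{j2\pi\frac{k+Ni}{NT_s}v}\,dv\,d\tau && \text{(42)}\\
&= N\delta[p]\int\!\!\int\!\!\int \psi^*(t'+v)\,\psi(t'+v-\tau)\,\psi(t')\,\psi^*(t'-\tau-qT_s)\,dt' \\
&\qquad\times R_{\text{Dopp;delay}}\!\left(\tfrac{k}{NT_s};\,\tau+qT_s+lT_s\right) e^{j2\pi\frac{k}{NT_s}v}\,dv\,d\tau, && \text{(43)}
\end{aligned}$$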


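where for (43) we used the assumptions that $B_{\text{Dopp}} < \frac{1}{2T_s}$ and $R_{\text{Dopp;delay}}(f;\cdot) = 0$ for $f \notin [-B_{\text{Dopp}}, B_{\text{Dopp}}]$, together with the fact that $k \in \{-\frac{N}{2},\dots,\frac{N}{2}-1\}$, to write $\sum_{i=-\infty}^{\infty} R_{\text{Dopp;delay}}\!\left(\frac{k+iN}{NT_s};\cdot\right) = R_{\text{Dopp;delay}}\!\left(\frac{k}{NT_s};\cdot\right)$.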
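The limiting rule used to pass from (41) to (42) can be checked numerically. The sketch below is only an illustration under assumed grid sizes: it evaluates the kernel $\frac{1}{N}\sum_{m=-N+1}^{N-1}(N-|m|)e^{j2\pi m\phi}$ over one period of $\phi$ and reports its total mass, the mass within $|\phi| \le 0.05$, and its peak value. The function names and the half-width $0.05$ are arbitrary choices made here.

```python
import numpy as np

def fejer_kernel(phi, N):
    """(1/N) * sum_{m=-N+1}^{N-1} (N - |m|) exp(j 2 pi m phi), evaluated on an array phi."""
    m = np.arange(-N + 1, N)
    weights = (N - np.abs(m)) / N
    return np.real(np.exp(2j * np.pi * np.outer(phi, m)) @ weights)

def kernel_mass(N, half_width=0.05, M=4096):
    """Riemann-sum estimates of the total mass, the mass near phi = 0, and the peak."""
    phi = (np.arange(M) - M // 2) / M            # one period: phi in [-1/2, 1/2)
    F = fejer_kernel(phi, N)
    total = F.mean()
    near = F[np.abs(phi) <= half_width].sum() / M
    return total, near, F[M // 2]

for N in (16, 64, 128):
    total, near, peak = kernel_mass(N)
    print(N, round(total, 6), round(near, 4), round(peak, 3))
```

The total mass per period stays at one while the peak grows like $N$ and the mass concentrates around the integers, which is the behaviour of the impulse train $\sum_i \delta(\phi - i)$.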

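Writing (43) in terms of the ambiguity function (8) yields

$$\lim_{N\to\infty} E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\} = N\delta[p]\int A^*\!\left(\tau,\tfrac{k}{NT_s}\right) A\!\left(\tau+qT_s,\tfrac{k}{NT_s}\right) R_{\text{Dopp;delay}}\!\left(\tfrac{k}{NT_s};\,\tau+qT_s+lT_s\right) d\tau. \tag{44}$$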

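If the support of $\psi(t)$ is $\left(-\frac{T_\psi}{2},\frac{T_\psi}{2}\right]$, then it can be seen that, when $T_\psi \le \frac{T_s}{2}$, the functions $A(\tau,\cdot)$ and $A(\tau+qT_s,\cdot)\big|_{q\neq 0}$ share no common support, in which case (44) reduces to

$$\lim_{N\to\infty} E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\} = N\delta[p]\,\delta[q]\int \left|A\!\left(\tau,\tfrac{k}{NT_s}\right)\right|^2 R_{\text{Dopp;delay}}\!\left(\tfrac{k}{NT_s};\,\tau+lT_s\right) d\tau. \tag{45}$$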

APPENDIX B

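PROOF OF THEOREM 1

Defining $L = \min(N, N_{\text{delay}}N_{\text{Dopp}})$, we define the vector $\mathbf{y}_s = \big[y[0],\dots,y[L-1]\big]^\top$.

Claim: $\lim_{\rho\to\infty}\frac{I(\mathbf{y}_s;\mathbf{x}^\rho)}{\log\rho} = 0$.

Proof: Using the chain rule for mutual information [13], we have

$$\begin{aligned}
I(\mathbf{y}_s;\mathbf{x}^\rho) &= I(y[0];\mathbf{x}^\rho) + \sum_{i=1}^{L-1} I\big(y[i];\mathbf{x}^\rho \,\big|\, y[0],\dots,y[i-1]\big) && \text{(46)}\\
&\le I(y[0];\mathbf{x}^\rho) + \sum_{i=1}^{L-1} I\big(y[i];\mathbf{x}^\rho, y[0],\dots,y[i-1]\big). && \text{(47)}
\end{aligned}$$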

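In the sequel, we analyze each term in (47) separately. In preparation, we define the vectors $\mathbf{x}_i^\rho = \big[x^\rho[i],\dots,x^\rho[i-N_{\text{delay}}+1]\big]^\top$ and their "complements" $\bar{\mathbf{x}}_i^\rho$, which are composed of the elements of $\mathbf{x}^\rho$ not in $\mathbf{x}_i^\rho$. We also define the channel vectors $\mathbf{h}_i = \big[h[i;0],\dots,h[i;N_{\text{delay}}-1]\big]^\top$, where

$$y[i] = \sqrt{\rho}\,\mathbf{h}_i^\top\mathbf{x}_i^\rho + v[i]. \tag{48}$$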

Next we establish the useful result that $I(y[i];\mathbf{x}_i^\rho) \le \log\log\rho + \Xi$ for some constant $\Xi \in \mathbb{R}$. Towards this aim, we use a special case of the capacity result from [17, Thm. 4.2], which is stated below.

Lemma 6 (Special case of Theorem 4.2 from [17]): Consider the following vector input-output relation for the CWGN block-fading channel model: $\mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{x} + \mathbf{v}$. The input and the noise power are constrained as $E\{\|\mathbf{x}\|^2\} \le P_1$ and $E\{\|\mathbf{v}\|^2\} \le P_2$, respectively, for some positive constants $P_1$ and $P_2$. Furthermore, assume that the channel fades independently from block to block and that only the channel fading statistics are available at both the transmitter and receiver. If the differential entropy (denoted by $h(\cdot)$) of the channel fading matrix $\mathbf{H}$ satisfies $h(\mathbf{H}) > -\infty$, then the high-SNR asymptotic ergodic channel capacity obeys $\limsup_{\rho\to\infty}\, C(\rho) - \log\log\rho < \infty$.

In our model (48), since the elements of $\mathbf{h}_i$ are independent with positive variance, the covariance matrix of $\mathbf{h}_i$, denoted by $\mathbf{R}_i$, is positive definite and hence the differential entropy satisfies $h(\mathbf{h}_i^\top) = h(\mathbf{h}_i) = \log\det\mathbf{R}_i > -\infty$. Applying Lemma 6 to (48), it follows that $I(y[i];\mathbf{x}_i^\rho) \le \log\log\rho + \Xi$.

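The first term in (47) can be written $I(y[0];\mathbf{x}^\rho) = I(y[0];\mathbf{x}_0^\rho) + I(y[0];\bar{\mathbf{x}}_0^\rho\,|\,\mathbf{x}_0^\rho)$. Conditioned on $\mathbf{x}_0^\rho$, the uncertainty in $y[0]$ is due to channel coefficients and additive noise, which are independent of $\bar{\mathbf{x}}_0^\rho$. Hence, $I(y[0];\bar{\mathbf{x}}_0^\rho\,|\,\mathbf{x}_0^\rho) = 0$. Since $I(y[0];\mathbf{x}_0^\rho) \le \log\log\rho + \Xi$, we know $\lim_{\rho\to\infty}\frac{I(y[0];\mathbf{x}^\rho)}{\log\rho} = 0$. Considering the general term inside the summation of (47),

$$I\big(y[i];\mathbf{x}^\rho, y[0],\dots,y[i-1]\big) = \underbrace{I(y[i];\mathbf{x}_i^\rho)}_{\le\,\log\log\rho+\Xi} + \underbrace{I(y[i];\bar{\mathbf{x}}_i^\rho\,|\,\mathbf{x}_i^\rho)}_{=\,0} + \underbrace{I\big(y[i];y[0],\dots,y[i-1]\,\big|\,\mathbf{x}^\rho\big)}_{T_i},$$

it remains to be shown that $\lim_{\rho\to\infty}\frac{T_i}{\log\rho} = 0$.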

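Recall that $\mathbf{y}$ and $\mathbf{h}$ are jointly Gaussian conditioned on $\mathbf{x}^\rho$. In terms of differential entropies, $I\big(y[i];y[0],\dots,y[i-1]\,|\,\mathbf{x}^\rho\big) = h(y[i]\,|\,\mathbf{x}^\rho) - h\big(y[i]\,|\,\mathbf{x}^\rho,y[0],\dots,y[i-1]\big)$. It follows that

$$h(y[i]\,|\,\mathbf{x}^\rho) = E\Big\{\log\Big(1+\rho\sum_{\ell=0}^{N_{\text{delay}}-1}E\{|h[i;\ell]|^2\}\,|x^\rho[i-\ell]|^2\Big)\Big\}, \tag{49}$$

where the expectation is with respect to $\mathbf{x}^\rho$. Now, given $\{y[0],\dots,y[i-1]\}$, we split $y[i]$ into its MMSE estimate and error as $y[i] = E\{y[i]\,|\,y[0],\dots,y[i-1],\mathbf{x}^\rho\} + \tilde{y}[i]$. Since $\mathbf{y}$ is Gaussian given $\mathbf{x}^\rho$, we have $h\big(y[i]\,|\,y[0],\dots,y[i-1],\mathbf{x}^\rho\big) = E\log\big(E|\tilde{y}[i]|^2\big)$, where the expectation inside the $\log$ is w.r.t. $\mathbf{H}$ and $\mathbf{v}$ and the expectation outside the $\log$ is w.r.t. $\mathbf{x}^\rho$. Denoting the covariance of $\mathbf{h}_i - E\{\mathbf{h}_i\,|\,y[0],\dots,y[i-1]\}$ by $\tilde{\mathbf{R}}_i$, we have $E|\tilde{y}[i]|^2 = 1 + \rho\,\mathbf{x}_i^{\rho H}\tilde{\mathbf{R}}_i\mathbf{x}_i^\rho$. Let $\mu_{\max,i}$ denote the maximum eigenvalue of $\tilde{\mathbf{R}}_i$ and $\mathbf{q}_i$ denote the corresponding eigenvector. Now define $\kappa_{\max,i} = \inf_{\mathbf{x}^\rho\in\mathbb{C}^N}\mu_{\max,i}$. For $i \in \{1,\dots,L-1\}$, the elements of $\mathbf{h}_i$ cannot all be estimated perfectly, even in the absence of noise ($\rho = \infty$), since $\{y[0],\dots,y[i-1]\}$ correspond to a projection of $\boldsymbol{\lambda}$ onto a subspace of smaller dimension, and hence $\kappa_{\max,i} > 0$. Now, $E|\tilde{y}[i]|^2 \ge 1 + \rho\,\kappa_{\max,i}\big|\sum_{k=0}^{N_{\text{delay}}-1}q_i[k]\,x^\rho[i-k]\big|^2$, and hence

$$h\big(y[i]\,|\,\mathbf{x}^\rho,y[0],\dots,y[i-1]\big) \ge E\Big\{\log\Big(1+\rho\,\kappa_{\max,i}\Big|\sum_{k=0}^{N_{\text{delay}}-1}q_i[k]\,x^\rho[i-k]\Big|^2\Big)\Big\}. \tag{50}$$

Combining (49) and (50), we have

$$T_i \le E\log\frac{1+\rho\sum_{\ell=0}^{N_{\text{delay}}-1}E\{|h[i;\ell]|^2\}\,|x^\rho[i-\ell]|^2}{1+\rho\,\kappa_{\max,i}\big|\sum_{k=0}^{N_{\text{delay}}-1}q_i[k]\,x^\rho[i-k]\big|^2}.$$

Since $\mathbf{x}^\rho$ is a sequence of continuous random vectors converging to a continuous random vector, $\lim_{\rho\to\infty}\big|\sum_{k=0}^{N_{\text{delay}}-1}q_i[k]\,x^\rho[i-k]\big|^2 > 0$ with probability 1, and $\lim_{\rho\to\infty}\frac{T_i}{\log\rho} = 0$.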

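Now, if $N \le N_{\text{Dopp}}N_{\text{delay}}$, the proof is complete since in that case $\mathbf{y}_s = \mathbf{y}$. For the case $N > N_{\text{Dopp}}N_{\text{delay}}$, we define $\mathbf{y}_r = \big[y[N_{\text{Dopp}}N_{\text{delay}}],\dots,y[N-1]\big]^\top$ and, using the chain rule for mutual information, obtain $I(\mathbf{y};\mathbf{x}^\rho) = I(\mathbf{y}_s;\mathbf{x}^\rho) + I(\mathbf{y}_r;\mathbf{x}^\rho\,|\,\mathbf{y}_s)$. To complete the proof, we need to establish that $\limsup_{\rho\to\infty}\frac{I(\mathbf{y}_r;\mathbf{x}^\rho|\mathbf{y}_s)}{\log\rho} \le N - N_{\text{Dopp}}N_{\text{delay}}$. For this we have

$$\begin{aligned}
I(\mathbf{y}_r;\mathbf{x}^\rho\,|\,\mathbf{y}_s) &= h(\mathbf{y}_r\,|\,\mathbf{y}_s) - h(\mathbf{y}_r\,|\,\mathbf{y}_s,\mathbf{x}^\rho) && \text{(51)}\\
&\le h(\mathbf{y}_r) - h(\mathbf{y}_r\,|\,\mathbf{y}_s,\mathbf{x}^\rho,\mathbf{H}), && \text{(52)}
\end{aligned}$$

since conditioning reduces entropy. Now, $E\{|y[n]|^2\} = E\{|v[n]|^2\} + \rho\sum_{l=0}^{N_{\text{delay}}-1}E\{|h[n;l]|^2\}\,E\{|x[n-l]|^2\} \le 1 + k_1\rho$ for some constant $k_1 \in \mathbb{R}$. Bounding the maximum eigenvalue of the covariance matrix of $\mathbf{y}_r$ by the sum of its diagonal elements, we see that $\mathbf{R}_{y_r} \preceq (N - N_{\text{delay}}N_{\text{Dopp}})(k_1\rho+1)\,\mathbf{I}_{N-N_{\text{Dopp}}N_{\text{delay}}}$. Since the Gaussian distribution maximizes the entropy for a given covariance matrix, we have $h(\mathbf{y}_r) \le \log\det\big[(N - N_{\text{delay}}N_{\text{Dopp}})(k_1\rho+1)\,\mathbf{I}_{N-N_{\text{Dopp}}N_{\text{delay}}}\big]$. Finally, $h(\mathbf{y}_r\,|\,\mathbf{y}_s,\mathbf{x}^\rho,\mathbf{H})$ is equal to the entropy of the unit-variance white noise term in $\mathbf{y}_r$, which is bounded and independent of $\rho$. So, we have $\limsup_{\rho\to\infty}\frac{I(\mathbf{y}_r;\mathbf{x}^\rho|\mathbf{y}_s)}{\log\rho} \le N - N_{\text{Dopp}}N_{\text{delay}}$.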

APPENDIX C

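PROOF OF LEMMA 2

Since mutual information is non-negative, it is sufficient to restrict ourselves to the case of $N > N_{\text{Dopp}}N_{\text{delay}}$. We need only to prove that the lower bound on the mutual information with Gaussian inputs satisfies the equality in (19). Using the chain rule for mutual information, we have

$$\begin{aligned}
I(\mathbf{y};\mathbf{x}) &= I(\mathbf{y};\mathbf{x},\mathbf{H}) - I(\mathbf{y};\mathbf{H}\,|\,\mathbf{x}) && \text{(53)}\\
&\ge I(\mathbf{y};\mathbf{x}\,|\,\mathbf{H}) - I(\mathbf{y};\mathbf{H}\,|\,\mathbf{x}). && \text{(54)}
\end{aligned}$$

Since $I(\mathbf{y};\mathbf{x}\,|\,\mathbf{H})$ corresponds to the coherent case of perfect receiver CSI and since $\mathbf{x}$ is Gaussian with covariance $\mathbf{R}_x = \mathbf{I}$, we have [14]

$$I(\mathbf{y};\mathbf{x}\,|\,\mathbf{H}) = E\{\log\det[\mathbf{I}_N + \rho\,\mathbf{H}\mathbf{H}^H]\}. \tag{55}$$

Since $\mathbf{H}\mathbf{H}^H$ is full rank (almost surely), re-using the arguments following (17) yields

$$\lim_{\rho\to\infty}\frac{I(\mathbf{y};\mathbf{x}\,|\,\mathbf{H})}{\log\rho} = N. \tag{56}$$

Now, for a matrix $\mathbf{X}$ appropriately constructed from the input samples $\{x[i]\}_{i=-N_{\text{delay}}+1}^{N-1}$, (12) can be written as

$$\mathbf{y} = \sqrt{\rho}\,\mathbf{X}\mathbf{h} + \mathbf{v}.$$

Using the BEM model (13), we have $\mathbf{y} = \sqrt{\rho}\,\mathbf{X}\mathbf{U}\boldsymbol{\lambda} + \mathbf{v}$. Since $\boldsymbol{\lambda}$ captures all the degrees of freedom of the DSC over a block, we have $I(\mathbf{y};\mathbf{H}\,|\,\mathbf{x}) = I(\mathbf{y};\boldsymbol{\lambda}\,|\,\mathbf{x}) = I(\mathbf{y};\boldsymbol{\lambda}\,|\,\mathbf{X})$. Conditioned on $\mathbf{X}$, the vectors $\mathbf{y}$ and $\boldsymbol{\lambda}$ are jointly Gaussian, and hence, using the statistics of $\boldsymbol{\lambda}$ and Jensen's inequality, we have

$$\begin{aligned}
I(\mathbf{y};\boldsymbol{\lambda}\,|\,\mathbf{X}) &= E\log\det[\mathbf{I} + \rho\,(\mathbf{X}\mathbf{U})\mathbf{R}_\lambda(\mathbf{X}\mathbf{U})^H] && \text{(57)}\\
&\le \log\det E[\mathbf{I} + \rho N(\mathbf{X}\mathbf{U})^H\mathbf{X}\mathbf{U}] && \text{(58)}\\
&\le N_{\text{Dopp}}N_{\text{delay}}\log\rho + \Xi, && \text{(59)}
\end{aligned}$$

for some constant $\Xi$, where (59) follows from the fact that $E(\mathbf{X}\mathbf{U})^H(\mathbf{X}\mathbf{U}) \preceq k\,\mathbf{I}_{N_{\text{Dopp}}N_{\text{delay}}}$ for some constant $k$. So, finally we have

$$\lim_{\rho\to\infty}\frac{I(\mathbf{y};\mathbf{H}\,|\,\mathbf{x})}{\log\rho} \le N_{\text{Dopp}}N_{\text{delay}}. \tag{60}$$

The desired result follows from (54), (56) and (60).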
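The rank-counting behind (56) and (60) can be illustrated numerically: for a fixed matrix $\mathbf{G}$, $\log\det(\mathbf{I}+\rho\,\mathbf{G}\mathbf{G}^H)$ grows like $\operatorname{rank}(\mathbf{G})\log\rho$ at high SNR. The sketch below is an illustration under assumptions that are not part of the proof: single random draws stand in for the expectations, $\mathbf{R}_\lambda$ is taken as the identity, the dimensions are arbitrary, and `prelog_slope` is a hypothetical helper name.

```python
import numpy as np

def prelog_slope(G, rho_lo=1e6, rho_hi=1e8):
    """Finite-difference slope of log det(I + rho G G^H) with respect to log rho."""
    gram = G @ G.conj().T
    def logdet(rho):
        _, value = np.linalg.slogdet(np.eye(gram.shape[0]) + rho * gram)
        return value
    return (logdet(rho_hi) - logdet(rho_lo)) / np.log(rho_hi / rho_lo)

rng = np.random.default_rng(0)
N, K = 12, 4                                   # K plays the role of N_Dopp * N_delay
# Single random draws (assumption): a full-rank N x N "channel" and an N x K stand-in for XU.
H = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
XU = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

coherent_slope = prelog_slope(H)               # close to N, as in (56)
estimation_slope = prelog_slope(XU)            # close to K, as in (59)-(60)
print(coherent_slope, estimation_slope, coherent_slope - estimation_slope)
```

The difference of the two slopes is close to $N - K$, mirroring how (54), (56) and (60) combine.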

APPENDIX D

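PROOF OF THEOREM 2

According to Lemma 4, a linearly separable PAT scheme with weighting matrix $\mathbf{Q}$ in (33) achieves the rate given in (34). To derive a lower bound on the achievable-rate pre-log factor, we first obtain a bound (in the positive semi-definite sense) on $\mathbf{R}_n$, the covariance matrix of $\sqrt{\rho}\,\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B}\mathbf{s} + \mathbf{v}_d$. Because of the orthogonality of the pilot and data subspaces of lossless linearly separable PAT, the elements of the (pilot-based) channel estimation error matrix $\tilde{\mathbf{H}}$ are independent of the noise in the data subspace $\mathbf{v}_d$ and also of the data vectors. So, we have

$$\begin{aligned}
\mathbf{R}_n &= E\{\rho\,\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B}\mathbf{s}\,(\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B}\mathbf{s})^H + \mathbf{v}_d\mathbf{v}_d^H\},\\
&= E\{\rho\,\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B}\mathbf{R}_s(\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B})^H\} + \mathbf{I},\\
&\preceq \rho\,\sigma_e^2E_s\|\mathbf{M}\|_F^2\,\mathbf{I} + \mathbf{I}, && \text{(61)}
\end{aligned}$$

where the inequality (61) follows from applying the inequalities $\mathbf{R}_s \preceq E_s\mathbf{I}$, $\mathbf{B}\mathbf{B}^H \preceq \mathbf{I}$, $E\{\tilde{\mathbf{H}}\tilde{\mathbf{H}}^H\} \preceq \sigma_e^2\mathbf{I}$ and $\mathbf{B}_d^H\mathbf{B}_d = \mathbf{I}$. Incorporating the condition $\sigma_e^2(\rho) \le \frac{\kappa}{\rho}$, we see that $\mathbf{R}_n \preceq C\,\mathbf{I}$ for some constant $C$, $\forall\rho > 1$. So, we have $\rho\,\mathbf{R}_n^{-1}\hat{\mathbf{H}}_d\mathbf{R}_s\hat{\mathbf{H}}_d^H \succeq \frac{\rho}{C}\,\hat{\mathbf{H}}_d\mathbf{R}_s\hat{\mathbf{H}}_d^H$ and the achievable rate (34) can be bounded as

$$\begin{aligned}
R(\rho) &\ge \frac{1}{N}E\Big\{\log\det\Big[\mathbf{I} + \frac{\rho}{C}\,\hat{\mathbf{H}}_d\mathbf{R}_s\hat{\mathbf{H}}_d^H\Big]\Big\} && \text{(62)}\\
&\ge \frac{1}{N}E\Big\{\log\det\Big[\mathbf{I} + \frac{\rho\,\sigma_s^2}{C}\,\hat{\mathbf{H}}_d\hat{\mathbf{H}}_d^H\Big]\Big\}, && \text{(63)}
\end{aligned}$$

where $\sigma_s^2$ denotes the minimum eigenvalue of $\mathbf{R}_s$. Since $\sigma_e^2(\rho) \to 0$ as $\rho \to \infty$, the channel estimates converge almost everywhere to the true channel, i.e., $\lim_{\rho\to\infty}\hat{\mathbf{H}} = \mathbf{H}$. Also, since $\mathbf{H}_d\mathbf{H}_d^H$ has rank equal to $\operatorname{rank}(\mathbf{B}) = N_s$, we have $\lim_{\rho\to\infty}\frac{R(\rho)}{\log\rho} \ge \frac{N_s}{N}$. To derive an upper bound on the achievable-rate pre-log factor, we use Jensen's inequality to take the expectation inside the $\log\det(\cdot)$ term of (34), thereby obtaining $\lim_{\rho\to\infty}\frac{R(\rho)}{\log\rho} \le \frac{N_s}{N}$. Together, the upper and lower bounds yield (35).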

APPENDIX E

PROOF OF THEOREM 3

In this proof, we restrict our attention to strictly doubly selective channels, i.e., DSCs for which $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$. Throughout this proof, we consider all indices modulo-$N$. Let $(\mathbf{p},\mathbf{B})$ be an arbitrary CP-MMSE-PAT scheme for strictly DSC. We establish the desired result in the following two steps:

1) For the CP-MMSE-PAT scheme $(\mathbf{p},\mathbf{B})$, the achievable-rate pre-log factor equals $\operatorname{rank}(\mathbf{B})$.

2) For strictly DSCs, any CP-MMSE-PAT scheme $(\mathbf{p},\mathbf{B})$ obeys $\operatorname{rank}(\mathbf{B}) < N - N_{\text{delay}}N_{\text{Dopp}}$.

Step 1) of Proof:

The characterization of CP-based affine MMSE-PAT in [7], [20] establishes that the linear separability condition (30) is satisfied, and furthermore that $E\{\|\tilde{\mathbf{h}}\|^2\} = \operatorname{tr}\big\{\big(\mathbf{R}_\lambda^{-1} + \frac{\rho E_p}{N}\mathbf{I}_{N_{\text{Dopp}}N_{\text{delay}}}\big)^{-1}\big\}$. Recalling that $\mathbf{R}_\lambda$ is diagonal, and defining positive $\alpha_i = [\mathbf{R}_\lambda]_{i,i}$, we find

$$E\{\|\tilde{\mathbf{h}}\|^2\} = \sum_{i=0}^{N_{\text{delay}}N_{\text{Dopp}}-1}\Big(\frac{1}{\alpha_i} + \frac{\rho E_p}{N}\Big)^{-1} \le \frac{N\,N_{\text{delay}}N_{\text{Dopp}}}{\rho E_p}.$$

Thus, all CP-based affine MMSE-PAT schemes satisfy the hypotheses of Theorem 2 and hence the pre-log factors of their achievable rates are equal to their corresponding data dimension $\operatorname{rank}(\mathbf{B})$.
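The $1/\rho$ decay of the estimation error in Step 1 can be tabulated directly. In the sketch below, the variances $\alpha_i$, the pilot energy, and the block length are hypothetical values picked for illustration, and `cp_mmse_error` is an assumed helper name; only the two expressions being compared come from the text above.

```python
import numpy as np

def cp_mmse_error(alpha, rho, E_p, N):
    """Sum over i of (1/alpha_i + rho*E_p/N)^(-1), with alpha_i the diagonal of R_lambda."""
    alpha = np.asarray(alpha, dtype=float)
    return float(np.sum(1.0 / (1.0 / alpha + rho * E_p / N)))

N, n_dopp, n_delay, E_p = 36, 3, 4, 1.0
alpha = np.linspace(0.2, 1.0, n_dopp * n_delay)      # hypothetical positive variances
rhos = np.logspace(0, 6, 7)
errors = np.array([cp_mmse_error(alpha, rho, E_p, N) for rho in rhos])
bounds = N * n_dopp * n_delay / (rhos * E_p)          # N * N_delay * N_Dopp / (rho * E_p)
for rho, err, bnd in zip(rhos, errors, bounds):
    print(f"rho={rho:9.0f}  error={err:.3e}  bound={bnd:.3e}")
```

The product $\rho\cdot$error stays below $N N_{\text{delay}}N_{\text{Dopp}}/E_p$ for every $\rho$, which is the form $\sigma_e^2(\rho) \le \kappa/\rho$ required by Theorem 2.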

Step 2) of Proof:

Now, we show that, when $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$, CP-based affine MMSE-PAT guarantees data dimension $N_s < N - N_{\text{Dopp}}N_{\text{delay}}$. To establish the condition on $N_s$, we use the method of contradiction. In particular, we proceed in the following stages.

(i) Assume that there exists a CP-MMSE-PAT scheme for strictly DSC that allows $N_s = N - N_{\text{Dopp}}N_{\text{delay}}$.

(ii) Find the necessary requirements on $\mathbf{p}$ and $\mathbf{B}$ for such a PAT scheme.

(iii) Establish that the PAT schemes satisfying the requirements obtained in stage (ii) obey $N_s < N - N_{\text{Dopp}}N_{\text{delay}}$, contradicting the initial assumption of stage (i).

Stage (i) – Initial Assumption: Let us assume that there exists a CP-MMSE-PAT scheme $(\mathbf{p},\mathbf{B})$ for strictly DSC that satisfies $N_s = N - N_{\text{Dopp}}N_{\text{delay}}$.

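Stage (ii) – Necessary Requirements: The necessary conditions on CP-based affine MMSE-PAT for the CE-BEM DSC established in [7], [20] can be expressed as the pair (64)-(65) using $p[i] = [\mathbf{p}]_i$, $b_q[i] = [\mathbf{B}]_{i,q}$, $\mathcal{N}_{\text{delay}} = \{-N_{\text{delay}}+1,\dots,N_{\text{delay}}-1\}$, and $\mathcal{N}_{\text{Dopp}} = \{-N_{\text{Dopp}}+1,\dots,N_{\text{Dopp}}-1\}$:

$$\sum_{i=0}^{N-1} b_q[i]\,p^*[i-k]\,e^{-j\frac{2\pi}{N}mi} = 0 \quad \forall k \in \mathcal{N}_{\text{delay}},\ \forall m \in \mathcal{N}_{\text{Dopp}},\ \forall q \in \{0,\dots,N_s-1\}, \tag{64}$$

$$\frac{1}{E_p}\sum_{i=0}^{N-1} p[i]\,p^*[i-k]\,e^{-j\frac{2\pi}{N}mi} = \delta[k]\,\delta[m] \quad \forall k \in \mathcal{N}_{\text{delay}},\ \forall m \in \mathcal{N}_{\text{Dopp}}. \tag{65}$$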

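Notice that (64) states the linear separability condition (30) in the case of a CE-BEM DSC. Defining

$$\mathbf{p}_{k,m} = \frac{1}{\sqrt{E_p}}\Big[p[k]\,e^{j\frac{2\pi}{N}m\cdot 0},\ p[k+1]\,e^{j\frac{2\pi}{N}m\cdot 1},\ \dots,\ p[k+N-1]\,e^{j\frac{2\pi}{N}m(N-1)}\Big]^\top \tag{66}$$

as a (normalized) $k$-time-shifted and $m$-frequency-shifted version of the pilot vector $\mathbf{p}$, and constructing the matrix $\mathbf{W}$ from the columns $\{\mathbf{p}_{k,m},\ k \in \mathcal{N}_{\text{delay}},\ m \in \mathcal{N}_{\text{Dopp}}\}$, equation (64) can be conveniently rewritten as $\mathbf{W}^H\mathbf{B} = \mathbf{0}$. It will be convenient to visualize the elements of $\{\mathbf{p}_{k,m},\ k \in \mathcal{N}_{\text{delay}},\ m \in \mathcal{N}_{\text{Dopp}}\}$ arranged in a grid, as in Fig. 1. For this, we use the abbreviation $D = (N_{\text{Dopp}}-1)/2$.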

[Figure: grid of the vectors $\mathbf{p}_{k,m}$. Each column holds $k = -N_{\text{delay}}+1,\dots,N_{\text{delay}}-1$ for one value of $m$; the columns $m = -D-1$, $-D$, $D-1$, and $D$ are shown explicitly between the outermost columns, and the sub-blocks $\mathbf{W}_o$, $\mathbf{W}_e^1$, and $\mathbf{W}_e^2$ are marked.]

Fig. 1. Elements of the set $\{\mathbf{p}_{k,m},\ k \in \mathcal{N}_{\text{delay}},\ m \in \mathcal{N}_{\text{Dopp}}\}$ arranged in a grid, using $D = (N_{\text{Dopp}}-1)/2$.

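Let $(\mathbf{p},\mathbf{B})$ be a CP-based affine MMSE-PAT scheme with data dimension $N_s = N - N_{\text{Dopp}}N_{\text{delay}}$ (i.e., $\operatorname{rank}(\mathbf{B}) = N_s$). We now deduce some essential properties of $\mathbf{p}$. Defining

$$r_{k,m} := \langle\mathbf{p}_{0,0},\mathbf{p}_{k,m}\rangle = \frac{1}{E_p}\sum_{i=0}^{N-1} p[i]\,p^*[i+k]\,e^{-j\frac{2\pi}{N}mi}, \tag{67}$$

where $\langle\mathbf{x},\mathbf{y}\rangle = \mathbf{y}^H\mathbf{x}$ denotes the inner product, the MMSE condition (65) implies that

$$r_{k,m} = \delta[k]\,\delta[m], \quad \forall k \in \mathcal{N}_{\text{delay}},\ \forall m \in \mathcal{N}_{\text{Dopp}}. \tag{68}$$

Note also that

$$\langle\mathbf{p}_{k_1,m_1},\mathbf{p}_{k_2,m_2}\rangle = e^{j\frac{2\pi}{N}(m_2-m_1)k_1}\,r_{k_2-k_1,\,m_2-m_1} \tag{69}$$

$$r^*_{k,m} = \langle\mathbf{p}_{k,m},\mathbf{p}_{0,0}\rangle = e^{-j\frac{2\pi}{N}mk}\,r_{-k,-m}. \tag{70}$$

Together, (68) and (69) imply that the elements within any rectangle of height $N_{\text{delay}}$ and width $N_{\text{Dopp}}$ in Fig. 1 are orthonormal. In addition to being a CP-MMSE-PAT, $(\mathbf{p},\mathbf{B})$ satisfies $N_s = N - N_{\text{Dopp}}N_{\text{delay}}$, which results in additional restrictions on $\mathbf{p}$ and $\mathbf{B}$ that are stated in Lemma 7.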
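The identities (69) and (70) hold for any pilot vector under modulo-$N$ indexing, so they can be spot-checked on random data. The sketch below is a minimal check, assuming that $E_p$ equals the total energy $\|\mathbf{p}\|^2$; the helper names `shifted_pilot`, `inner`, and `r` are introduced here for illustration.

```python
import numpy as np

def shifted_pilot(p, k, m):
    """p_{k,m} of (66): p shifted by k in time (modulo N) and by m in frequency, normalized."""
    N = len(p)
    i = np.arange(N)
    E_p = np.sum(np.abs(p) ** 2)              # assumption: E_p is the total pilot energy
    return np.roll(p, -k) * np.exp(2j * np.pi * m * i / N) / np.sqrt(E_p)

def inner(x, y):
    """<x, y> = y^H x, the convention used with (67)."""
    return np.vdot(y, x)

def r(p, k, m):
    """r_{k,m} = <p_{0,0}, p_{k,m}> as in (67)."""
    return inner(shifted_pilot(p, 0, 0), shifted_pilot(p, k, m))

rng = np.random.default_rng(0)
N = 16
p = rng.standard_normal(N) + 1j * rng.standard_normal(N)

k1, m1, k2, m2 = 2, -1, -3, 2
lhs_69 = inner(shifted_pilot(p, k1, m1), shifted_pilot(p, k2, m2))
rhs_69 = np.exp(2j * np.pi * (m2 - m1) * k1 / N) * r(p, k2 - k1, m2 - m1)

k, m = 3, 2
lhs_70 = np.conj(r(p, k, m))
rhs_70 = np.exp(-2j * np.pi * m * k / N) * r(p, -k, -m)
print(abs(lhs_69 - rhs_69), abs(lhs_70 - rhs_70))
```

Both differences are at the level of floating-point round-off, consistent with the two identities being purely algebraic consequences of (66) and (67).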

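Lemma 7: For a CP-MMSE-PAT with $N_s = N - N_{\text{Dopp}}N_{\text{delay}}$, either $|r_{0,N_{\text{Dopp}}}| = 1$ or $|r_{N_{\text{delay}},0}| = 1$.

Proof: Let $\mathbf{W}_o$ be the matrix constructed from the columns $\{\mathbf{p}_{k,m},\ k \in \{0,-1,\dots,-N_{\text{delay}}+1\},\ m \in \{-D,\dots,D\}\}$. Since these columns form a rectangle of height $N_{\text{delay}}$ and width $N_{\text{Dopp}}$ in Fig. 1, we know they are orthonormal. Furthermore, since these columns form a subset of the columns of $\mathbf{W}$, we know that $\operatorname{rank}(\mathbf{W}) \ge N_{\text{Dopp}}N_{\text{delay}}$. But, since the MMSE condition (64), $\mathbf{W}^H\mathbf{B} = \mathbf{0}$, implies that the nullspace of $\mathbf{W}^H$ has a dimension of at least $N_s = N - N_{\text{Dopp}}N_{\text{delay}}$, i.e., that $\operatorname{rank}(\mathbf{W}) \le N_{\text{Dopp}}N_{\text{delay}}$, we see that $\operatorname{rank}(\mathbf{W}) = N_{\text{Dopp}}N_{\text{delay}}$. Hence, the columns of $\mathbf{W}_o$ form an orthonormal basis for the columns of $\mathbf{W}$, which implies

$$\begin{aligned}
\mathbf{p}_{k,m} &= \sum_{i=0}^{N_{\text{delay}}-1}\sum_{j=-D}^{D}\langle\mathbf{p}_{k,m},\mathbf{p}_{-i,j}\rangle\,\mathbf{p}_{-i,j}, && \forall k \in \mathcal{N}_{\text{delay}},\ \forall m \in \mathcal{N}_{\text{Dopp}}, && \text{(71)}\\
&= \sum_{i=0}^{N_{\text{delay}}-1}\sum_{j=-D}^{D} e^{j\frac{2\pi}{N}(j-m)k}\,r_{-i-k,\,j-m}\,\mathbf{p}_{-i,j}, && \forall k \in \mathcal{N}_{\text{delay}},\ \forall m \in \mathcal{N}_{\text{Dopp}}. && \text{(72)}
\end{aligned}$$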

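Now let $\mathbf{W}_e^1 = [\mathbf{p}_{0,-D-1},\dots,\mathbf{p}_{-N_{\text{delay}}+2,-D-1}]$. (See Fig. 1.) Considering that we can enclose these elements in a height-$N_{\text{delay}}$ and width-$N_{\text{Dopp}}$ rectangle in Fig. 1, we can see that the columns of $\mathbf{W}_e^1$ form an orthonormal set, and that the columns of $\mathbf{W}_e^1$ are orthogonal to most columns in $\mathbf{W}_o$. Using (72) to write the columns of $\mathbf{W}_e^1$ as a linear combination of those columns in $\mathbf{W}_o$ that are not orthogonal to those in $\mathbf{W}_e^1$, we have

$$\mathbf{W}_e^1 = [\mathbf{p}_{0,D},\mathbf{p}_{-1,D},\dots,\mathbf{p}_{-N_{\text{delay}}+1,D}]\,\mathbf{M}_1, \tag{73}$$

for $\mathbf{M}_1 \in \mathbb{C}^{N_{\text{delay}}\times(N_{\text{delay}}-1)}$ such that

$$\mathbf{M}_1 = \begin{bmatrix}
r_{0,N_{\text{Dopp}}} & e^{-j\frac{2\pi}{N}N_{\text{Dopp}}}\,r_{1,N_{\text{Dopp}}} & \cdots & e^{-j\frac{2\pi}{N}N_{\text{Dopp}}(N_{\text{delay}}-2)}\,r_{N_{\text{delay}}-2,N_{\text{Dopp}}}\\
r_{-1,N_{\text{Dopp}}} & e^{-j\frac{2\pi}{N}N_{\text{Dopp}}}\,r_{0,N_{\text{Dopp}}} & \cdots & e^{-j\frac{2\pi}{N}N_{\text{Dopp}}(N_{\text{delay}}-2)}\,r_{N_{\text{delay}}-3,N_{\text{Dopp}}}\\
\vdots & \vdots & \cdots & \vdots\\
r_{-N_{\text{delay}}+1,N_{\text{Dopp}}} & e^{-j\frac{2\pi}{N}N_{\text{Dopp}}}\,r_{-N_{\text{delay}}+2,N_{\text{Dopp}}} & \cdots & e^{-j\frac{2\pi}{N}N_{\text{Dopp}}(N_{\text{delay}}-2)}\,r_{-1,N_{\text{Dopp}}}
\end{bmatrix}.$$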

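Now, letting $\mathbf{W}_e^2 = [\mathbf{p}_{1,-D},\dots,\mathbf{p}_{1,D-1}]$ and carrying out a similar procedure, we have

$$\mathbf{W}_e^2 = [\mathbf{p}_{-N_{\text{delay}}+1,-D},\mathbf{p}_{-N_{\text{delay}}+1,-D+1},\dots,\mathbf{p}_{-N_{\text{delay}}+1,D}]\,\mathbf{M}_2, \tag{74}$$

for $\mathbf{M}_2 \in \mathbb{C}^{N_{\text{Dopp}}\times(N_{\text{Dopp}}-1)}$ such that

$$\mathbf{M}_2 = \begin{bmatrix}
r_{-N_{\text{delay}},0} & e^{-j\frac{2\pi}{N}}\,r_{-N_{\text{delay}},-1} & \cdots & e^{-j\frac{2\pi}{N}(N_{\text{Dopp}}-2)}\,r_{-N_{\text{delay}},-N_{\text{Dopp}}+2}\\
e^{j\frac{2\pi}{N}}\,r_{-N_{\text{delay}},1} & r_{-N_{\text{delay}},0} & \cdots & e^{-j\frac{2\pi}{N}(N_{\text{Dopp}}-3)}\,r_{-N_{\text{delay}},-N_{\text{Dopp}}+3}\\
\vdots & \vdots & \cdots & \vdots\\
e^{j\frac{2\pi}{N}(N_{\text{Dopp}}-1)}\,r_{-N_{\text{delay}},N_{\text{Dopp}}-1} & e^{j\frac{2\pi}{N}(N_{\text{Dopp}}-2)}\,r_{-N_{\text{delay}},N_{\text{Dopp}}-2} & \cdots & e^{j\frac{2\pi}{N}}\,r_{-N_{\text{delay}},1}
\end{bmatrix}.$$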


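Notice that the columns of $\mathbf{W}_e^1$ must be orthogonal to those in $\mathbf{W}_e^2$ since they can all be placed inside a height-$N_{\text{delay}}$ and width-$N_{\text{Dopp}}$ rectangle in Fig. 1. Since the basis expansions of $\mathbf{W}_e^1$ and $\mathbf{W}_e^2$ share the common basis vector $\mathbf{p}_{-N_{\text{delay}}+1,D}$, the contribution from $\mathbf{p}_{-N_{\text{delay}}+1,D}$ to either $\mathbf{W}_e^1$ or $\mathbf{W}_e^2$ must be zero, i.e., either (75) or (76) must hold:

$$r_{-1,N_{\text{Dopp}}} = r_{-2,N_{\text{Dopp}}} = \cdots = r_{-N_{\text{delay}}+1,N_{\text{Dopp}}} = 0 \tag{75}$$

$$r_{-N_{\text{delay}},1} = r_{-N_{\text{delay}},2} = \cdots = r_{-N_{\text{delay}},N_{\text{Dopp}}-1} = 0. \tag{76}$$

When (75) holds, $\mathbf{M}_1$ becomes upper triangular, and (73) implies

$$\mathbf{p}_{0,-D-1} = r_{0,N_{\text{Dopp}}}\,\mathbf{p}_{0,D}, \tag{77}$$

in which case the unit-norm property of $\mathbf{p}_{0,-D-1}$ and $\mathbf{p}_{0,D}$ implies that $|r_{0,N_{\text{Dopp}}}| = 1$. When (76) holds, $\mathbf{M}_2$ becomes upper triangular, and (74) implies

$$\mathbf{p}_{1,-D} = r_{-N_{\text{delay}},0}\,\mathbf{p}_{-N_{\text{delay}}+1,-D} \tag{78}$$

in which case $|r_{-N_{\text{delay}},0}| = 1$. Applying (70), this can be translated to $|r_{N_{\text{delay}},0}| = 1$.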

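Stage (iii) – Establish Contradiction: Now we examine the implications of either $|r_{N_{\text{delay}},0}| = 1$ or $|r_{0,N_{\text{Dopp}}}| = 1$ on the MMSE pilot vector $\mathbf{p}$. In each case, we deduce that $N_s \neq N - N_{\text{Dopp}}N_{\text{delay}}$, which contradicts our original assumption, thereby completing the proof.

We start with the first case, where $|r_{0,N_{\text{Dopp}}}| = 1$. Since $r_{0,N_{\text{Dopp}}} = e^{j\theta}$ for some $\theta \in \mathbb{R}$, from (66) and (77), we have

$$p[i]\big(e^{-j\frac{2\pi}{N}N_{\text{Dopp}}i} - e^{j\theta}\big) = 0, \quad \forall i \in \{0,\dots,N-1\}. \tag{79}$$

Thus, in order to avoid $\mathbf{p} = \mathbf{0}$, which would not satisfy the MMSE-PAT requirement (65), we must have $\theta = -\frac{2\pi}{N}N_{\text{Dopp}}q$ for some $q \in \{0,\dots,N-1\}$. In this case, (79) implies that $p[i]$ will be non-zero only if $i = q + \frac{kN}{N_{\text{Dopp}}}$ for $k \in \mathbb{Z}$ such that $\frac{kN}{N_{\text{Dopp}}} \in \mathbb{Z}$. Now, for $k \in \mathbb{Z}$, we define

$$a_q[k] = \begin{cases} \big|p\big[q + \frac{kN}{N_{\text{Dopp}}}\big]\big|^2 & \text{if } \frac{kN}{N_{\text{Dopp}}} \in \mathbb{Z}\\ 0 & \text{else} \end{cases} \tag{80}$$

and use requirement (68) to claim that $\frac{1}{E_p}\sum_{i=0}^{N_{\text{Dopp}}-1} a_q[i]\,e^{-j\frac{2\pi}{N_{\text{Dopp}}}mi} = \delta[m]$, $\forall m \in \mathcal{N}_{\text{Dopp}}$, which can be met if and only if

$$a_q[i] = \frac{E_p}{N_{\text{Dopp}}}, \quad \forall i \in \{0,\dots,N_{\text{Dopp}}-1\}. \tag{81}$$

From (80), it follows that (81) can be met if and only if $\frac{N}{N_{\text{Dopp}}} \in \mathbb{Z}$. Now, if $\frac{N}{N_{\text{Dopp}}} \in \mathbb{Z}$, then one can recognize the pilot sequence specified by (80) and (81) as being the "TDKD" MMSE-PAT scheme from [7], [20], for which $N_s = N - (2N_{\text{delay}}-1)N_{\text{Dopp}} < N - N_{\text{Dopp}}N_{\text{delay}}$.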
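A pilot of the form described by (80)-(81) can be checked against the MMSE requirement (68) numerically. The sketch below is an illustration only: it assumes $N/N_{\text{Dopp}} \in \mathbb{Z}$ with impulse spacing at least $N_{\text{delay}}$, uses equal-energy impulses at $q + kN/N_{\text{Dopp}}$, and the names `tdkd_pilot` and `r` are labels chosen here, not identifiers from [7], [20].

```python
import numpy as np

def r(p, k, m):
    """r_{k,m} of (67) with modulo-N indexing, taking E_p as the total energy of p."""
    N = len(p)
    i = np.arange(N)
    E_p = np.sum(np.abs(p) ** 2)
    return np.sum(p * np.conj(np.roll(p, -k)) * np.exp(-2j * np.pi * m * i / N)) / E_p

def tdkd_pilot(N, n_dopp, q=0, E_p=1.0):
    """N_Dopp equally spaced, equal-energy impulses starting at index q, as in (80)-(81)."""
    if N % n_dopp != 0:
        raise ValueError("this sketch needs N / N_Dopp to be an integer")
    p = np.zeros(N, dtype=complex)
    p[(q + np.arange(n_dopp) * (N // n_dopp)) % N] = np.sqrt(E_p / n_dopp)
    return p

N, n_dopp, n_delay = 36, 3, 4                      # illustrative values only
p = tdkd_pilot(N, n_dopp, q=2)
worst = max(abs(r(p, k, m) - (1.0 if (k == 0 and m == 0) else 0.0))
            for k in range(-n_delay + 1, n_delay)
            for m in range(-n_dopp + 1, n_dopp))
n_s_tdkd = N - (2 * n_delay - 1) * n_dopp          # data dimension quoted for TDKD
n_s_target = N - n_dopp * n_delay                  # dimension needed for spectral efficiency
print(worst, n_s_tdkd, n_s_target)
```

For these values the pilot meets (68) to round-off, yet the quoted data dimension is 15 against the 24 needed, illustrating the strict inequality in this case.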

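We continue with the second case, where $|r_{N_{\text{delay}},0}| = 1$. Since $r_{N_{\text{delay}},0} = e^{j\theta}$ for some $\theta \in \mathbb{R}$, from (66), (70), and (78), it follows that

$$p[i] = e^{j\theta}\,p[i+N_{\text{delay}}]. \tag{82}$$

Keeping in mind our modulo-$N$ assumption on time-domain indexing, say that $L$ is the largest integer in $\{1,\dots,N_{\text{delay}}\}$ for which both $\frac{N}{L} \in \mathbb{Z}$ and $p[i] = e^{j\phi}\,p[i+L]$ for some $\phi \in \mathbb{R}$. Note that, if $\frac{N}{N_{\text{delay}}} \in \mathbb{Z}$, then $L = N_{\text{delay}}$, else $L < N_{\text{delay}}$. Furthermore, modulo-$N$ indexing implies $\phi = \frac{2\pi}{N}Lq$ for some $q \in \mathbb{Z}$. Let $\check{\mathbf{p}}$ denote the $N$-point unitary discrete Fourier transform (DFT) of $\mathbf{p}$. For a sequence $\mathbf{p}$ obeying (82), we have

$$\begin{aligned}
\check{p}[k] &= \frac{1}{\sqrt{N}}\sum_{i=0}^{N-1} p[i]\,e^{-j\frac{2\pi}{N}ik} && \text{(83)}\\
&= \frac{1}{\sqrt{N}}\sum_{n=0}^{L-1} p[n]\,e^{-j\frac{2\pi}{N}nk}\sum_{m=0}^{N/L-1} e^{-j\frac{2\pi}{N/L}(k-q)m} && \text{(84)}
\end{aligned}$$

and hence $\check{p}[k] = 0$ for $k \notin \{q,\ q+\frac{N}{L},\ \dots,\ q+\frac{N}{L}(L-1)\}$. The MMSE requirement (65) can be written in terms of $\check{\mathbf{p}}$ as [7], [20]

$$\frac{1}{E_p}\sum_{i=0}^{N-1}\check{p}[i]\,\check{p}^*[i-k]\,e^{-j\frac{2\pi}{N}mi} = \delta[k]\,\delta[m] \quad \forall k \in \mathcal{N}_{\text{Dopp}},\ \forall m \in \mathcal{N}_{\text{delay}}. \tag{85}$$

Defining $\big|\check{p}\big[q+\frac{nN}{L}\big]\big|^2 = a_q[n]$ for $n \in \{0,\dots,L-1\}$ and using (85) with $k = 0$, we require

$$\frac{1}{E_p}\sum_{n=0}^{L-1} a_q[n]\,e^{-j\frac{2\pi}{N}\left(\frac{nN}{L}+q\right)m} = \delta[m], \quad \forall m \in \mathcal{N}_{\text{delay}}. \tag{86}$$

Since the magnitude of the left side of (86) is $L$-periodic, (86) cannot be satisfied when $L < N_{\text{delay}}$. Now, if $L = N_{\text{delay}}$, then the only sequence $\{a_q[i]\}$ satisfying the requirement (86) is $a_q[n] = c$, $\forall n \in \{0,\dots,N_{\text{delay}}-1\}$, for constant $c$. This can be recognized as the FDKD MMSE-PAT scheme from [7], [20], for which $N_s = N - (2N_{\text{Dopp}}-1)N_{\text{delay}} < N - N_{\text{Dopp}}N_{\text{delay}}$.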
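The comb-shaped DFT support that follows from (83)-(84) is easy to observe numerically. The sketch below builds a sequence satisfying $p[i] = e^{j\phi}p[i+L]$ by modulating an $L$-periodic sequence; this construction, the values $N = 24$, $L = 4$, $q = 5$, and the resulting sign $\phi = -\frac{2\pi}{N}Lq$ are choices made here for illustration (the sign only relabels the integer $q$).

```python
import numpy as np

N, L, q = 24, 4, 5                                  # illustrative values with N / L an integer
rng = np.random.default_rng(0)
g = rng.standard_normal(L) + 1j * rng.standard_normal(L)   # one period of an L-periodic sequence
i = np.arange(N)
p = g[i % L] * np.exp(2j * np.pi * q * i / N)       # modulation shifts the DFT comb by q bins

phi = -2 * np.pi * L * q / N
is_quasi_periodic = np.allclose(p, np.exp(1j * phi) * np.roll(p, -L))   # p[i] = e^{j phi} p[i+L]

p_check = np.fft.fft(p) / np.sqrt(N)                # unitary N-point DFT as in (83)
support = np.flatnonzero(np.abs(p_check) > 1e-9)
expected = np.sort((q + (N // L) * np.arange(L)) % N)
print(is_quasi_periodic, support, expected)
```

Only $L$ of the $N$ DFT bins are occupied, spaced $N/L$ apart, which is the structure that (86) then constrains.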


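APPENDIX F

PROOF OF LEMMA 5

First, we establish that, for the PAT schemes satisfying the hypothesis, the total estimation error satisfies $\sigma_e^2 \le \frac{\kappa}{\rho}$, $\forall\rho$. Constructing $\mathbf{B}_p$ using the orthonormal basis for the column space of $\mathbf{P}\mathbf{U}$, we consider the projection

$$\mathbf{y}_p = \mathbf{B}_p^H\mathbf{y} = \sqrt{\rho}\,\mathbf{B}_p^H\mathbf{P}\mathbf{U}\boldsymbol{\lambda} + \sqrt{\rho}\,\mathbf{B}_p^H\mathbf{D}\mathbf{U}\boldsymbol{\lambda} + \mathbf{B}_p^H\mathbf{v}. \tag{87}$$

Since the PAT is lossless linearly separable satisfying (30), the projection $\mathbf{y}_p$ in (87) captures all the pilot energy and $\mathbf{B}_p^H\mathbf{D}\mathbf{U} = \mathbf{0}$. Denoting $\mathbf{G} = \mathbf{B}_p^H\mathbf{P}\mathbf{U}$ and $\mathbf{v}_p = \mathbf{B}_p^H\mathbf{v}$, we have

$$\mathbf{y}_p = \sqrt{\rho}\,\mathbf{G}\boldsymbol{\lambda} + \mathbf{v}_p. \tag{88}$$

Since $\mathbf{P}\mathbf{U}$ is full rank, it follows that the matrix $\mathbf{G}$ is full rank. Note that $\sigma_e^2 = E\{\|\boldsymbol{\lambda} - \hat{\boldsymbol{\lambda}}\|^2\}$, where $\hat{\boldsymbol{\lambda}}$ denotes the LMMSE estimate of $\boldsymbol{\lambda}$. Using the zero-forcing estimate from (88) to upper bound $\sigma_e^2$, we have

$$\sigma_e^2 \le \frac{1}{\rho}\operatorname{tr}\{(\mathbf{G}^H\mathbf{G})^{-1}\}. \tag{89}$$

Since $\mathbf{G}$ is full rank, we have $\operatorname{tr}\{(\mathbf{G}^H\mathbf{G})^{-1}\} \le \kappa$ for some $\kappa \in \mathbb{R}$. Now, the desired result follows from the application of Lemma 4.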
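The bound (89) can be illustrated with a small numerical comparison between the LMMSE error and the zero-forcing bound for the model (88). The sketch below is an assumption-laden illustration: $\mathbf{G}$ is a random square full-rank matrix, $\mathbf{R}_\lambda$ is a hypothetical diagonal covariance, the noise $\mathbf{v}_p$ is taken as white with unit variance, and the function names are introduced here.

```python
import numpy as np

def lmmse_error(G, R_lam, rho):
    """tr{(R_lam^{-1} + rho G^H G)^{-1}}: LMMSE error for y_p = sqrt(rho) G lambda + v_p."""
    return np.trace(np.linalg.inv(np.linalg.inv(R_lam) + rho * G.conj().T @ G)).real

def zf_bound(G, rho):
    """(1/rho) tr{(G^H G)^{-1}}: error of the zero-forcing estimate, as in (89)."""
    return np.trace(np.linalg.inv(G.conj().T @ G)).real / rho

rng = np.random.default_rng(1)
K = 6                                               # stands in for N_Dopp * N_delay
G = (rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))) / np.sqrt(2)
R_lam = np.diag(rng.uniform(0.5, 2.0, K))           # hypothetical diagonal covariance of lambda
rhos = np.array([1.0, 10.0, 100.0, 1e4])
lmmse = np.array([lmmse_error(G, R_lam, rho) for rho in rhos])
zf = np.array([zf_bound(G, rho) for rho in rhos])
kappa = zf_bound(G, 1.0)                            # tr{(G^H G)^{-1}}
print(lmmse, zf)
```

The LMMSE error never exceeds the zero-forcing bound, and $\rho\,\sigma_e^2$ stays below the constant $\operatorname{tr}\{(\mathbf{G}^H\mathbf{G})^{-1}\}$, which is the property the proof feeds into Lemma 4.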

REFERENCES

[1] L. Zheng and D. Tse, “Communication over the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna channel,” IEEE Trans. on Information Theory, vol. 48, pp. 359–383, Feb. 2002.

[2] H. Vikalo, B. Hassibi, B. Hochwald, and T. Kailath, “On the capacity of frequency-selective channels in training-based transmission schemes,” IEEE Trans. on Signal Processing, pp. 2572–2583, Sept. 2004.

[3] Y. Liang and V. Veeravalli, “Capacity of noncoherent time-selective Rayleigh-fading channels,” IEEE Trans. on Information Theory, vol. 50, pp. 3095–3110, Dec. 2004.

[4] L. Tong, B. M. Sadler, and M. Dong, “Pilot-assisted wireless transmissions,” IEEE Signal Processing Magazine, vol. 21, pp. 12–25, Nov. 2004.

[5] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links,” IEEE Trans. on Information Theory, vol. 49, pp. 951–963, Apr. 2003.

[6] A. P. Kannu and P. Schniter, “Capacity analysis of MMSE pilot patterns for doubly selective channels,” in Proc. IEEE Workshop on Signal Processing Advances in Wireless Communication, 2005.

[7] A. P. Kannu and P. Schniter, “Design and analysis of MMSE pilot-aided cyclic-prefixed block transmissions for doubly selective channels,” IEEE Trans. on Signal Processing. (to appear).

[8] A. Lapidoth, “On the asymptotic capacity of stationary Gaussian fading channels,” IEEE Trans. on Information Theory, vol. 51, pp. 437–446, Feb. 2005.

[9] R. Etkin and D. N. C. Tse, “Degrees of freedom in some underspread MIMO fading channels,” IEEE Trans. on Information Theory, vol. 52, pp. 1576–1608, Apr. 2006.

[10] G. Durisi, H. Bolcskei, and S. Shamai, “Capacity of underspread WSSUS fading channels in the wideband regime,” in Proc. IEEE Internat. Symposium on Information Theory, July 2006.

[11] W. Kozek, Matched Weyl-Heisenberg Expansions of Nonstationary Environments. PhD thesis, Vienna Univ. of Technology, March 1997.

[12] K. Liu, T. Kadous, and A. M. Sayeed, “Orthogonal time-frequency signaling over doubly dispersive channels,” IEEE Trans. on Information Theory, vol. 50, pp. 2583–2603, Nov. 2004.

[13] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[14] I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” European Trans. on Telecommunications, vol. 10, pp. 585–595, Nov. 1999.

[15] T. L. Marzetta and B. M. Hochwald, “Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading,” IEEE Trans. on Information Theory, vol. 45, pp. 139–157, Jan. 1999.

[16] R. Kennedy, Fading Dispersive Communication Channels. New York: Wiley, 1969.

[17] A. Lapidoth and S. Moser, “Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels,” IEEE Trans. on Information Theory, vol. 49, pp. 2426–2467, Oct. 2003.

[18] J. H. Manton, I. Y. Mareels, and Y. Hua, “Affine precoders for reliable communications,” in Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing, vol. 5, pp. 2749–2752, June 2000.

[19] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. New York: Cambridge University Press, 2005.

[20] A. P. Kannu and P. Schniter, “MSE-optimal training for linear time-varying channels,” in Proc. IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing, 2005.

[21] X. Ma, G. B. Giannakis, and S. Ohno, “Optimal training for block transmissions over doubly-selective wireless fading channels,” IEEE Trans. on Signal Processing, vol. 51, pp. 1351–1366, May 2003.

[22] H. Weingarten, Y. Steinberg, and S. Shamai (Shitz), “Gaussian codes and weighted nearest neighbor decoding in fading multiple-antenna channels,” IEEE Trans. on Information Theory, vol. 50, pp. 1665–1686, Aug. 2004.

