On the Spectral Efficiency of Noncoherent
Doubly Selective Block-Fading Channels
Arun P. Kannu and Philip Schniter
Dept. ECE, The Ohio State University, Columbus, OH 43210.
Abstract
In this paper, we consider noncoherent single-antenna communication over doubly selective block-fading
channels that obey a complex-exponential basis expansion model. In our noncoherent setup,
neither the transmitter nor the receiver knows the channel fading coefficients, though both know the
channel statistics. First, we show that, when the inputs are chosen from continuous distributions, the
achievable spectral efficiency (i.e., the pre-log factor in the channel capacity expression) equals
$\max(0, 1 - N_{\text{delay}} N_{\text{Dopp}}/N)$, where $N$, $N_{\text{delay}}$, and $N_{\text{Dopp}}$ denote the channel's discrete
block-fading interval, discrete delay spread, and discrete Doppler spread, respectively. Next, we study
pilot-aided transmission (PAT) over this channel. In the case of strictly doubly selective fading (i.e.,
$N_{\text{Dopp}} > 1$ and $N_{\text{delay}} > 1$), we establish that affine minimum mean-squared error (MMSE) PAT schemes
are spectrally inefficient, but we provide guidelines for the design of spectrally efficient affine PAT
schemes and give an example of one such scheme.
Index Terms: Noncoherent channels, doubly selective channels, doubly dispersive channels, noncoherent
communication, spectral efficiency, channel capacity, achievable rates, pilot symbols, training
symbols, channel estimation.
I. INTRODUCTION
Recently, there has been great interest in characterizing the capacity of wireless multipath channels
under the practical assumption that neither the transmitter nor the receiver has channel state information
(CSI). In this work, we focus on channels that are simultaneously time- and frequency-selective, which
pertain to applications with simultaneously high signaling bandwidth and mobility. The high-SNR capacity
of the noncoherent Gaussian flat-fading channel was characterized in the MIMO case by Zheng and Tse [1]
using the block-fading approximation, whereby the channel coefficients are assumed to remain constant
over a block of $N$ symbols and change independently from block to block. Later, Vikalo et al. [2]
characterized the high-SNR capacity of the noncoherent Gaussian frequency-selective block-fading SISO
channel under the assumption that the discrete block-length $N$ exceeds the discrete channel delay spread
$N_{\text{delay}}$. Liang and Veeravalli [3] characterized the high-SNR capacity of the SISO Gaussian time-selective
block-fading channel, assuming that, within the block, the channel coefficients vary according to a
finite-term Fourier series with $N_{\text{Dopp}} \le N$ expansion coefficients that have a full-rank covariance matrix,^1 but
change independently from block to block. In [3], they also find the asymptotic capacity of a MIMO
sub-block correlated time-selective fading model, in which the channel remains constant within a sub-block.
For the aforementioned noncoherent block-fading Gaussian channels, it has been shown that the capacity
$C$ as a function of SNR $\rho$ obeys $\lim_{\rho\to\infty} C(\rho)/\log(\rho) = \eta$, where the achievable spectral efficiency $\eta$
is given by $\eta = \frac{N-1}{N}$ in the SISO flat-fading case, $\eta = \frac{N-N_{\text{delay}}}{N}$ in the SISO frequency-selective case,^2
and $\eta = \frac{N-N_{\text{Dopp}}}{N}$ in the SISO time-selective case.
2015 Neil Avenue, Columbus, OH 43210. E-mail: [email protected], [email protected]
This work was supported by NSF CAREER grant 237037 and the Office of Naval Research.
January 30, 2008 DRAFT
In this work, we consider a SISO channel that combines the frequency-selectivity of [2] with the
time-selectivity of [3], henceforth referred to as the block-fading doubly selective channel (DSC). More
precisely, this discrete-time channel uses a finite-length impulse response whose $N_{\text{delay}}$ Gaussian
coefficients vary according to an $N_{\text{Dopp}}$-term Fourier series within the block, but change independently from
block to block. When the fading coefficients are uncorrelated in both time and frequency, we show that
the achievable spectral efficiency with continuous input distribution obeys
$\eta = \max\left(0, \frac{N - N_{\text{delay}} N_{\text{Dopp}}}{N}\right)$.
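As a quick numerical companion to the pre-log expression above, the following minimal sketch evaluates $\max(0, (N - N_{\text{delay}} N_{\text{Dopp}})/N)$ for a few block sizes. The function name `spectral_efficiency` and the example values are illustrative assumptions and do not come from the original analysis.

```python
def spectral_efficiency(n_block, n_delay, n_dopp):
    """Pre-log factor max(0, (N - N_delay * N_Dopp) / N) quoted in the text.

    The function and argument names are hypothetical, chosen for this sketch.
    """
    if n_block <= 0 or n_delay < 1 or n_dopp < 1:
        raise ValueError("expected N > 0, N_delay >= 1, N_Dopp >= 1")
    return max(0.0, (n_block - n_delay * n_dopp) / n_block)


if __name__ == "__main__":
    # Flat fading (N_delay = N_Dopp = 1) recovers (N - 1) / N.
    print(spectral_efficiency(8, 1, 1))    # 0.875
    # Doubly selective example: (64 - 4 * 3) / 64.
    print(spectral_efficiency(64, 4, 3))   # 0.8125
    # N <= N_delay * N_Dopp gives a zero pre-log factor.
    print(spectral_efficiency(10, 4, 3))   # 0.0
```

Setting `n_dopp = 1` or `n_delay = 1` reproduces the frequency-selective and time-selective expressions cited from [2] and [3].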
Next, we study pilot-aided transmission (PAT) over this block-fading DSC. In PAT, the transmitter
embeds a known pilot (i.e., training) signal that aids the receiver in data decoding under channel
uncertainty. Often, PAT enables the receiver to compute an explicit channel estimate, thereby facilitating
the use of coherent decoding strategies. (See [4] for a recent comprehensive PAT overview.) We are
interested in finding spectrally efficient PAT schemes, i.e., those whose asymptotic achievable rate
pre-log factor equals $\max\left(0, \frac{N - N_{\text{Dopp}} N_{\text{delay}}}{N}\right)$. For the design of spectrally efficient PAT schemes, we consider
only the case that $N > N_{\text{Dopp}} N_{\text{delay}}$; for the case that $N \le N_{\text{Dopp}} N_{\text{delay}}$, achieving the spectrally
efficient rate of zero would be trivial.
^1 Note that in the case of $N_{\text{Dopp}} < N$, the $N$-length vector of coefficients has a rank-deficient correlation matrix.
^2 Assuming uncorrelated inter-symbol interference (ISI) coefficients.
Previous studies have established that minimum mean squared error (MMSE) PAT, i.e., PAT that
minimizes the error variance of Wiener channel estimates, is spectrally efficient for flat [1], [5]; frequency-
selective [2]; and time-selective [6], [7] block-fading channels. We establish here, however, that MMSE-PAT
schemes are spectrally inefficient for strictly doubly selective (i.e., $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$) block-fading
channels. For these channels, we then develop guidelines for the design of spectrally efficient PAT
schemes and propose one such scheme.
Before continuing, a few comments are in order.
1) Our work relies on the block-fading assumption, which can be justified in systems that employ
block interleaving or frequency hopping. Other investigations have circumvented the block-fading
assumption through the use of time-selective channel models whose coefficients vary from symbol
to symbol in a stationary manner. For these stationary models, it is necessary to make a distinction
between non-regular^3 (e.g., bandlimited) fading processes and regular (e.g., Gauss-Markov) fading
processes. While non-regular fading channels have been shown to behave similarly to time-selective
block-fading channels, regular fading channels behave quite differently [8], [9]. Though similar
results are expected for stationary doubly selective channel models, the details lie outside the scope
of this work.
2) Our work relies on the assumption that intra-block time-variation can be accurately modeled by a
finite-term Fourier series with uncorrelated coefficients. Though we provide a detailed justification
in the sequel, the key idea is that, due to velocity limitations on the communicating terminals and
the scattering surfaces, the channel fading processes will be bandlimited. It is well known that
bandlimited random sequences can be well approximated by finite-term Fourier series, where the
approximation error decreases with block size.
^3 Non-regular processes allow for perfect prediction of the future samples from (a possibly infinite number of) past samples, while
regular processes do not. For more details, refer to [8].
3) Some authors (e.g., [10]) have studied the capacity of noncoherent doubly selective channels
assuming a fixed set of channel eigenfunctions (as motivated by [11]). This fixed eigenfunction
model is suitable under very mild spreading conditions or in the low-SNR regime as the ratio of
channel capacity to utilized bandwidth approaches zero [12]. We do not employ this model since
we focus on the high-SNR regime and do not assume mild spreading.
The paper is organized as follows. Section II details the modeling assumptions, Section III analyzes
the high-SNR capacity of the noncoherent doubly selective block-fading channel, Section IV details the
PAT setup for this channel, and Sections V–VI analyze several PAT schemes.
Notation: Matrices (column vectors) are denoted by upper (lower) bold-face letters. The Hermitian
is denoted by $(\cdot)^H$, the transpose by $(\cdot)^\top$, the conjugate by $(\cdot)^*$, the determinant by $\det(\cdot)$, and the
Frobenius norm by $\|\cdot\|_F$. The Loewner partial order is denoted by $\succeq$, i.e., $\mathbf{B} \succeq \mathbf{A}$ means that $\mathbf{B} - \mathbf{A}$
is positive semidefinite. The expectation is denoted by $\mathrm{E}\{\cdot\}$, the trace by $\operatorname{tr}\{\cdot\}$, the delta function by
$\delta(\cdot)$, the Kronecker product by $\otimes$, the modulo-$N$ operation by $\langle\cdot\rangle_N$, and the integer ceiling operation by
$\lceil\cdot\rceil$. The null space of a matrix is denoted by $\operatorname{null}(\cdot)$, the column space by $\operatorname{col}(\cdot)$, and the dimension
of a vector space by $\dim(\cdot)$. The operation $[\cdot]_{n,m}$ extracts the $(n,m)$th element of a matrix, where the
indices $n, m$ begin with $0$, and $\operatorname{diag}(\cdot)$ constructs a diagonal matrix from its vector-valued argument. The
appropriately dimensioned identity and all-zero matrices are denoted by $\mathbf{I}$ and $\mathbf{0}$, respectively, while the
$N \times N$ identity matrix is denoted by $\mathbf{I}_N$. The set-union operation is denoted by $\cup$, set-intersection by
$\cap$, set-minus by $\setminus$, and the empty set by $\emptyset$. The integers are denoted by $\mathbb{Z}$, reals by $\mathbb{R}$, positive reals by
$\mathbb{R}^+$, and complex numbers by $\mathbb{C}$.
II. SYSTEM MODEL
In this section, a continuous-time fading channel model is used with pulse-shaped transmission and
reception strategies to obtain a discrete-time baseband-equivalent doubly selective channel model. The
channel’s time-variation is then approximated using a basis expansion model (BEM) and the approxima-
tion is analyzed.
A. Continuous-Time Model
Consider a baseband-equivalent wireless multipath channel that can be modeled as a linear time-variant
(LTV) distortion plus an additive noise:
\[ y(t) = \int h(t;\tau)\, x(t-\tau)\, d\tau + v(t). \tag{1} \]
Say that, over the small time-period $T_{\text{small}}$, the path lengths vary by at most a few wavelengths, so that the
path gains and delays can be assumed constant. Thus, within a duration of $T_{\text{small}}$ seconds, it is reasonable
to model $h(t;\tau)$ as a stationary random process for which
\[ \mathrm{E}\{h(t;\tau)\, h^*(t-t_o;\tau-\tau_o)\} = R_{\text{lag;delay}}(t_o;\tau)\,\delta(\tau_o). \tag{2} \]
Property (2) is commonly known as wide-sense stationary uncorrelated scattering (WSSUS). If we define
\[ R_{\text{Dopp;delay}}(f;\tau) = \int R_{\text{lag;delay}}(t;\tau)\, e^{-j2\pi f t}\, dt, \tag{3} \]
then the practical assumptions of finite path-length differences and finite rates of path-length variation
imply that
\[ R_{\text{Dopp;delay}}(f;\tau) = 0 \quad \text{for} \quad f \notin [-B_{\text{Dopp}}, B_{\text{Dopp}}] \ \text{or} \ \tau \notin [0, T_{\text{delay}}]. \tag{4} \]
Thus, the channel has a causal delay spread of $T_{\text{delay}}$ seconds and a single-sided Doppler spread of $B_{\text{Dopp}}$
Hz.
B. Discrete-Time Block-Fading Model
Now consider baseband-equivalent modulation, as described by $x(t) = \sum_n x[n]\,\psi(t - nT_s)$, where
$T_s$ is the sampling interval in seconds and where $\psi(t)$ is a unit-energy pulse, and baseband-equivalent
demodulation, as described by the received samples $y[n] = \int y(t)\,\psi^*(t - nT_s)\,dt$ for $n \in \mathbb{Z}$. Throughout,
we assume that the baud rate is larger than the Doppler spread, i.e., $\frac{1}{T_s} > 2B_{\text{Dopp}}$. From (1), one can
write
\[ y[n] = \sum_l h[n;l]\, x[n-l] + v[n] \tag{5} \]
with $v[n] = \int v(t)\,\psi^*(t - nT_s)\,dt$ and
\[ h[n;l] = \iint \psi^*(t)\, h(t + nT_s; \tau + lT_s)\, \psi(t-\tau)\, dt\, d\tau. \tag{6} \]
The received signal is then parsed into length-$N$ blocks (for even $N$), where the block duration
$T_{\text{burst}} \approx NT_s$ is less than the small-scale fading duration $T_{\text{small}}$. In the sequel, we consider the block
$\{y[n]\}_{n=0}^{N-1}$ without loss of generality.
C. Complex-Exponential Basis-Expansion Model
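For the block of interest, the channel response is characterized by $h[n;l]$ for $n \in \{0,\dots,N-1\}$ and
$l \in \mathbb{Z}$. These response coefficients can be parameterized w.l.o.g. using the basis expansion model (BEM)
\[ h[n;l] = \frac{1}{\sqrt{N}} \sum_{k=-N/2}^{N/2-1} \lambda[k;l]\, e^{j\frac{2\pi}{N}nk}. \tag{7} \]
Using the ambiguity function
\[ A(\tau,f) = \int \psi(t)\,\psi^*(t-\tau)\, e^{-j2\pi f t}\, dt, \tag{8} \]
we can state the following lemma.
Lemma 1 (CE-BEM Statistics): Say that the support of $\psi(t)$ is $\left(-\frac{T_\psi}{2}, \frac{T_\psi}{2}\right]$ with $T_\psi \le \frac{T_s}{2}$. Then, as
$N \to \infty$, the BEM coefficients $\{\lambda[k;l]\}$, for $k \in \{-\frac{N}{2},\dots,\frac{N}{2}\}$ and $l \in \mathbb{Z}$, are uncorrelated with variance
\[ \mathrm{E}\{|\lambda[k;l]|^2\} = N \int \left|A\left(\tau, \tfrac{k}{NT_s}\right)\right|^2 R_{\text{Dopp;delay}}\left(\tfrac{k}{NT_s}; \tau + lT_s\right) d\tau. \tag{9} \]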
Proof: See Appendix A.
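In the sequel we assume that the support of $\psi(t)$ satisfies the conditions in Lemma 1. Note then that
the variance of $\lambda[k;l]$ is a local average of $R_{\text{Dopp;delay}}\left(\frac{k}{NT_s}; \tau + lT_s\right)$ over the interval
$\tau \in (-T_\psi, T_\psi] \subset \left(-\frac{T_s}{2}, \frac{T_s}{2}\right]$. Due to the support of $R_{\text{Dopp;delay}}(f;\tau)$ specified by (4), it follows that $\lambda[k;l] = 0$ when
either $k \notin \left\{-\frac{N_{\text{Dopp}}-1}{2}, \dots, \frac{N_{\text{Dopp}}-1}{2}\right\}$ or $l \notin \{0,\dots,N_{\text{delay}}-1\}$, where $N_{\text{Dopp}} := 2\lceil B_{\text{Dopp}} T_s N \rceil + 1$ and
$N_{\text{delay}} := \lceil T_{\text{delay}}/T_s \rceil + 1$. We will refer to $N_{\text{Dopp}}$ as the discrete Doppler spread and to $N_{\text{delay}}$ as the
discrete delay spread. Thus, it suffices to parameterize the channel (over the $N$-block interval) as
\[ h[n;l] = \frac{1}{\sqrt{N}} \sum_{k=-(N_{\text{Dopp}}-1)/2}^{(N_{\text{Dopp}}-1)/2} \lambda[k;l]\, e^{j\frac{2\pi}{N}nk}. \tag{10} \]
Furthermore, the $N_{\text{delay}}$-sample delay spread, in combination with (5), implies that $\{y[n]\}_{n=0}^{N-1}$ depend
only on the input samples $\{x[n]\}_{n=-N_{\text{delay}}+1}^{N-1}$.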
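The definitions of the discrete spreads can be turned into a small calculation. The sketch below is only illustrative: the function name `discrete_spreads` and the physical parameter values are assumptions made for the example, and the check on $1/T_s > 2B_{\text{Dopp}}$ mirrors the sampling assumption stated in Section II-B.

```python
import math


def discrete_spreads(b_dopp_hz, t_delay_s, t_s, n_block):
    """Return (N_Dopp, N_delay) from the definitions in the text:
    N_Dopp = 2*ceil(B_Dopp*T_s*N) + 1 and N_delay = ceil(T_delay/T_s) + 1.

    The function name and argument names are hypothetical.
    """
    if not 1.0 / t_s > 2.0 * b_dopp_hz:
        raise ValueError("the text assumes 1/T_s > 2*B_Dopp")
    n_dopp = 2 * math.ceil(b_dopp_hz * t_s * n_block) + 1
    n_delay = math.ceil(t_delay_s / t_s) + 1
    return n_dopp, n_delay


if __name__ == "__main__":
    # Illustrative numbers only: 100 Hz Doppler spread, 3.5 us delay spread,
    # 1 us sampling interval, and a 1000-sample block.
    print(discrete_spreads(100.0, 3.5e-6, 1e-6, 1000))   # (3, 5)
```

Because `math.ceil` acts on floating-point products, inputs for which $B_{\text{Dopp}} T_s N$ or $T_{\text{delay}}/T_s$ is an exact integer may round either way; exact rational arithmetic would be safer in that case.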
D. Block-Fading Doubly Selective Model
Our block-fading CE-BEM DSC model is now summarized. The channel output is given by
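\[ y[n] = \sqrt{\rho} \sum_{l=0}^{N_{\text{delay}}-1} h[n;l]\, x[n-l] + v[n], \quad n \in \{0,\dots,N-1\}, \tag{11} \]
where $\{v[n]\}$ is circular white Gaussian noise (CWGN) of unit variance, $\{x[n]\}$ is the channel input
with power constraint $\frac{1}{N+N_{\text{delay}}-1}\, \mathrm{E}\left\{\sum_{i=-N_{\text{delay}}+1}^{N-1} |x[i]|^2\right\} \le 1$, and $\rho$ is the SNR. Defining
$\mathbf{y} = \big[y[0],\dots,y[N-1]\big]^\top$, $\mathbf{v} = \big[v[0],\dots,v[N-1]\big]^\top$, and $\mathbf{x} = \big[x[-N_{\text{delay}}+1],\dots,x[N-1]\big]^\top$, we obtain
the following vector representation of the block-fading model,
\[ \mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{x} + \mathbf{v}, \tag{12} \]
where $\mathbf{H} \in \mathbb{C}^{N \times (N+N_{\text{delay}}-1)}$ is given element-wise as $[\mathbf{H}]_{p,q} = h[p;\, p + N_{\text{delay}} - 1 - q]$.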
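Assuming sufficiently large $N$, the non-zero response coefficients, $h[n;l]$ for $n \in \{0,\dots,N-1\}$
and $l \in \{0,\dots,N_{\text{delay}}-1\}$, obey the BEM (10). With the definitions $\mathbf{h}_l = \big[h[0;l],\dots,h[N-1;l]\big]^\top$,
$\mathbf{h} = [\mathbf{h}_0^\top,\dots,\mathbf{h}_{N_{\text{delay}}-1}^\top]^\top$, $\boldsymbol{\lambda}_l = \big[\lambda[-\tfrac{N_{\text{Dopp}}-1}{2};l],\dots,\lambda[\tfrac{N_{\text{Dopp}}-1}{2};l]\big]^\top$, and
$\boldsymbol{\lambda} = [\boldsymbol{\lambda}_0^\top,\dots,\boldsymbol{\lambda}_{N_{\text{delay}}-1}^\top]^\top$, this yields
\[ \mathbf{h} = \mathbf{U}\boldsymbol{\lambda}, \tag{13} \]
where $\mathbf{U} = \mathbf{I}_{N_{\text{delay}}} \otimes \mathbf{F}$ and where $\mathbf{F} \in \mathbb{C}^{N \times N_{\text{Dopp}}}$ is given element-wise as
$[\mathbf{F}]_{n,m} = \frac{1}{\sqrt{N}}\, e^{j\frac{2\pi}{N} n (m - (N_{\text{Dopp}}-1)/2)}$.
Assuming a multitude of paths and leveraging the central limit theorem, $\boldsymbol{\lambda}$ becomes zero-mean Gaussian
with diagonal positive-definite covariance matrix $\mathbf{R}_\lambda = \mathrm{E}\{\boldsymbol{\lambda}\boldsymbol{\lambda}^H\}$. Without loss of generality, the channel
can be assumed energy-preserving, i.e., $\frac{1}{N}\operatorname{tr}\{\mathbf{R}_\lambda\} = 1$. Finally, independent and identical fading across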
blocks is assumed. This assumption can be justified for block-interleaved systems or for time-division or
frequency-hopped systems where blocks are sufficiently separated across time and/or frequency.
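The block-fading model summarized above can be simulated directly. The following NumPy sketch builds $\mathbf{F}$, $\mathbf{U}$, and $\mathbf{H}$ from their element-wise definitions and draws one realization of (12). The block sizes, the flat choice of $\mathbf{R}_\lambda$, the SNR value, and all function names are assumptions made for illustration only.

```python
import numpy as np


def bem_basis(n_block, n_dopp):
    """F with [F]_{n,m} = exp(j*2*pi*n*(m - (N_Dopp-1)/2)/N) / sqrt(N)."""
    n = np.arange(n_block)[:, None]
    k = np.arange(n_dopp)[None, :] - (n_dopp - 1) / 2
    return np.exp(2j * np.pi * n * k / n_block) / np.sqrt(n_block)


def channel_matrix(h):
    """H with [H]_{p,q} = h[p; p + N_delay - 1 - q].

    h has shape (N_delay, N), where h[l, n] stores h[n; l].
    """
    n_delay, n_block = h.shape
    H = np.zeros((n_block, n_block + n_delay - 1), dtype=complex)
    for p in range(n_block):
        for l in range(n_delay):
            H[p, p + n_delay - 1 - l] = h[l, p]
    return H


def cgauss(rng, size):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)


rng = np.random.default_rng(0)
N, N_DELAY, N_DOPP, RHO = 16, 3, 3, 10.0        # illustrative sizes (assumed)

F = bem_basis(N, N_DOPP)
U = np.kron(np.eye(N_DELAY), F)                 # U = I_{N_delay} (x) F
# Assumed flat diagonal R_lambda, scaled so that tr(R_lambda) / N = 1.
var = N / (N_DELAY * N_DOPP)
lam = np.sqrt(var) * cgauss(rng, N_DELAY * N_DOPP)
h = (U @ lam).reshape(N_DELAY, N)               # row l holds h_l, per (13)

H = channel_matrix(h)
x = cgauss(rng, N + N_DELAY - 1)                # x[-N_delay+1], ..., x[N-1]
y = np.sqrt(RHO) * H @ x + cgauss(rng, N)       # vector model (12)
print(H.shape, y.shape)                         # (16, 18) (16,)
```

Each row of `H` has exactly $N_{\text{delay}}$ non-zero entries, and the columns of `F` are orthonormal, so $\|\mathbf{h}\| = \|\boldsymbol{\lambda}\|$ for every realization.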
III. ACHIEVABLE SPECTRAL EFFICIENCY
We now analyze the per-channel-use ergodic capacity of the CE-BEM DSC, expressed as [13]
\[ C(\rho) = \sup_{\mathbf{x}:\ \mathrm{E}\{\|\mathbf{x}\|^2\} \le N+N_{\text{delay}}-1} \frac{1}{N}\, I(\mathbf{y};\mathbf{x}), \tag{14} \]
where $I(\mathbf{y};\mathbf{x})$ denotes mutual information between the channel output and input, and where the supremum
is taken over all random input distributions satisfying the power constraint. The mutual information in
(14) is obtained by averaging implicitly over all channel realizations. It is known that all rates below the
ergodic capacity can be achieved by coding over a large number of block-fading intervals [13], [14].
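We define $\eta$, the channel's achievable spectral efficiency, as the pre-log factor in the high-SNR
expression for the channel capacity:
\[ \eta = \lim_{\rho\to\infty} \frac{C(\rho)}{\log\rho}. \tag{15} \]
For the block-fading DSC, the coherent ergodic capacity (i.e., when $\mathbf{H}$ is known to the receiver) is given
by [14]
\[ C_{\text{coh}}(\rho) = \frac{1}{N} \sup_{\mathbf{R}_x \succeq 0,\ \operatorname{tr}\{\mathbf{R}_x\} \le N+N_{\text{delay}}-1} \mathrm{E}\{\log\det[\mathbf{I}_N + \rho\,\mathbf{H}\mathbf{R}_x\mathbf{H}^H]\}, \tag{16} \]
where $\mathbf{R}_x = \mathrm{E}\{\mathbf{x}\mathbf{x}^H\}$ and the expectation is taken over the random matrix $\mathbf{H}$. Using $\mathbf{R}_x = \mathbf{I}$ gives a
lower bound on $C_{\text{coh}}$. Also, any $\mathbf{R}_x$ meeting the constraint in (16) satisfies $\mathbf{R}_x \preceq (N + N_{\text{delay}} - 1)\mathbf{I}$.
Thus we have^4
\[ \frac{1}{N}\,\mathrm{E}\{\log\det[\mathbf{I}_N + \rho\,\mathbf{H}\mathbf{H}^H]\} \le C_{\text{coh}} \le \frac{1}{N}\,\mathrm{E}\{\log\det[\mathbf{I}_N + \rho(N + N_{\text{delay}} - 1)\mathbf{H}\mathbf{H}^H]\}. \tag{17} \]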
Denoting the eigenvalues of $\mathbf{H}\mathbf{H}^H$ by $\{\nu_i\}_{i=0}^{N-1}$, we have
\[ \frac{1}{N}\sum_{i=0}^{N-1} \mathrm{E}\log(1 + \rho\nu_i) \le C_{\text{coh}}(\rho) \le \frac{1}{N}\sum_{i=0}^{N-1} \mathrm{E}\log\big(1 + (N + N_{\text{delay}} - 1)\rho\nu_i\big). \tag{18} \]
Since the random fading matrix $\mathbf{H}\mathbf{H}^H$ is full rank (almost surely), the eigenvalues are positive and
$\lim_{\rho\to\infty} \frac{C_{\text{coh}}(\rho)}{\log\rho} = 1$. Thus, in the coherent case, the achievable spectral efficiency of the doubly selective
channel is unity. But, in the noncoherent case, the achievable spectral efficiency is generally less than
unity. In particular, we claim that the achievable spectral efficiency of the noncoherent block-fading
CE-BEM DSC, in the case of continuously distributed inputs, is $\max\left(0, 1 - \frac{N_{\text{delay}} N_{\text{Dopp}}}{N}\right)$. To prove this claim,
we first derive an upper bound on the pre-log factor of mutual information between the input and output
of the block-fading DSC, and later establish the achievability of this bound. Since the optimal input
distribution in terms of mutual information may depend on the SNR $\rho$, we allow the input distribution
to change with respect to $\rho$ to find an upper bound on the asymptotic mutual information.
^4 Since $\mathbf{A} \succeq \mathbf{B} \succeq 0$ implies $\log\det\mathbf{A} \ge \log\det\mathbf{B}$.
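Theorem 1 (Achievable Spectral Efficiency): For the block-fading CE-BEM DSC, any sequence of
continuous random input vectors $\{\mathbf{x}_\rho\}$ indexed by SNR $\rho$, satisfying the power constraint
$\mathrm{E}\{\|\mathbf{x}_\rho\|^2\} \le N + N_{\text{delay}} - 1$, and converging in distribution to a continuous random vector $\mathbf{x}_\infty$, yields
\[ \limsup_{\rho\to\infty} \frac{\frac{1}{N} I(\mathbf{y};\mathbf{x}_\rho)}{\log\rho} \le \max\left(0, \frac{N - N_{\text{Dopp}} N_{\text{delay}}}{N}\right). \tag{19} \]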
Proof: See Appendix B.
The following lemma specifies a fixed input distribution that achieves the mutual information upper bound
given in (19).
Lemma 2 (Achievability): For the block-fading CE-BEM DSC, when the input $\mathbf{x}$ is i.i.d. zero-mean
unit-variance circular Gaussian,
\[ \lim_{\rho\to\infty} \frac{\frac{1}{N} I(\mathbf{y};\mathbf{x})}{\log\rho} = \max\left(0, \frac{N - N_{\text{Dopp}} N_{\text{delay}}}{N}\right). \tag{20} \]
Proof: See Appendix C.
It can be seen, from (20), that the loss in achievable spectral efficiency, relative to the coherent case,
increases with the spreading index $\gamma := N_{\text{Dopp}} N_{\text{delay}}/N$. Since $\gamma \approx 2B_{\text{Dopp}} T_{\text{delay}}$, larger values of $\gamma$
correspond to higher levels of time-frequency dispersion. Thus, our findings, which imply that channel
dispersion limits achievable spectral efficiency, are intuitively satisfying. For $\gamma \ll 1$, the achievable
spectral efficiency will be close to unity, i.e., that of the coherent case. Such channels have relatively few
unknown parameters and thus are not expected to incur much “training overhead.” For general $\gamma < 1$, the
achievable spectral efficiency of the block-fading DSC, under continuously distributed inputs, coincides
with previous results on special cases of this channel: flat fading (i.e., $N_{\text{delay}} = 1$, $N_{\text{Dopp}} = 1$) [1], [15];
time-selective fading (i.e., $N_{\text{delay}} = 1$) [3]; and frequency-selective fading (i.e., $N_{\text{Dopp}} = 1$) [2].
For $\gamma \ge 1$, Theorem 1 and Lemma 2 establish that the pre-log factor of mutual information with
continuous inputs is zero. DSCs for which $\gamma > 1$ can be interpreted as “overspread” channels [16]. Time
and frequency variations of overspread channels are impossible to track even in the absence of noise since
they imply that the number of unknown channel parameters ($N_{\text{Dopp}} N_{\text{delay}}$) will be more than the number
of received observations ($N$). Our $\gamma \ge 1$ result can be compared with a related result from Lapidoth [17]
that shows that the noncoherent channel capacity grows only double-logarithmically when the differential
entropy (denoted by $h(\cdot)$) of the channel matrix satisfies $h(\mathbf{H}) > -\infty$. Intuitively, if $h(\mathbf{H}) > -\infty$, no
element of $\mathbf{H}$ can be perfectly estimated with the full knowledge of other elements of $\mathbf{H}$, so that there
are more unknowns than observations. In fact, we make use of this result in our proof.
Note that, because Theorem 1 restricts the input distribution to be continuous, it does not characterize
the pre-log factor of the capacity^5 of the DSC.
IV. PILOT-AIDED TRANSMISSION
In this section, we detail the encoding and decoding techniques assumed for the PAT schemes analyzed
in this paper. Since a primary advantage of using PAT for noncoherent channels is the application of
communication techniques developed for coherent channels, we focus on the use of Gaussian coding and
(weighted) minimum-distance decoding via pilot-aided linear MMSE (LMMSE) channel estimates. We
are mainly interested in designing PAT schemes that achieve the rates promised by the mutual information
bounds in Theorem 1 and Lemma 2. We restrict our attention to the case where $\gamma < 1$, which allows a
non-zero pre-log factor.
A. PAT Encoder
We assume either cyclic-prefixed (CP) or zero-prefixed (ZP) block-transmission, so that
\[ \big[x[-N_{\text{delay}}+1],\dots,x[-1]\big] = \begin{cases} \mathbf{0} & \text{ZP}, \\ \big[x[N-N_{\text{delay}}+1],\dots,x[N-1]\big] & \text{CP}. \end{cases} \tag{21} \]
Since, for both CP and ZP, the vector $\mathbf{x}' := \big[x[0],\dots,x[N-1]\big]^\top$ completely specifies the transmission
vector $\mathbf{x}$ defined in Section II-D, we focus our attention on the structure of $\mathbf{x}'$. We consider $\mathbf{x}'$ generated
by the general class of affine precoding schemes [18]:
\[ \mathbf{x}' = \mathbf{p} + \mathbf{B}\mathbf{s}, \tag{22} \]
where $\mathbf{p}$ is a fixed pilot vector, $\mathbf{B} \in \mathbb{C}^{N \times N_s}$ is a fixed full-rank linear precoding matrix, and $\mathbf{s} \in \mathbb{C}^{N_s}$ is a
zero-mean information-bearing symbol vector, whose dimension $N_s$ we refer to as the “data dimension.” For
the purpose of achievable-rate analysis, we can assume w.l.o.g. that the columns of $\mathbf{B}$ are orthonormal,
since the mutual information between $\mathbf{s}$ and $\mathbf{y}$ remains unaffected by invertible transformations of $\mathbf{s}$.
^5 We have not established that the capacity achieving input distribution for our DSC model is a continuous one.
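Denoting the CP/ZP precoding matrix by $\mathbf{M} \in \mathbb{C}^{(N+N_{\text{delay}}-1) \times N}$, so that $\mathbf{x} = \mathbf{M}\mathbf{x}'$, the DSC model
(12) becomes
\[ \mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{M}(\mathbf{p} + \mathbf{B}\mathbf{s}) + \mathbf{v}. \tag{23} \]
The transmitted power constraint $\mathrm{E}\{\|\mathbf{x}\|^2\} \le N + N_{\text{delay}} - 1$ will be enforced via constraints on
$E_p = \|\mathbf{p}\|^2 > 0$ and $E_s = \mathrm{E}\{\|\mathbf{s}\|^2\}$.
Defining $\mathbf{X}_i = \operatorname{diag}(x[i],\dots,x[i+N-1])$ and $\mathbf{X} = [\mathbf{X}_0,\dots,\mathbf{X}_{-N_{\text{delay}}+1}]$, equation (12) can also be
written as $\mathbf{y} = \sqrt{\rho}\,\mathbf{X}\mathbf{h} + \mathbf{v}$. Due to zero-mean $\mathbf{s}$, the pilot and data components of $\mathbf{X}$ are $\mathbf{P} = \mathrm{E}\{\mathbf{X}\}$
and $\mathbf{D} = \mathbf{X} - \mathbf{P}$, respectively. Thus, it follows from (13) that
\[ \mathbf{y} = \sqrt{\rho}\,\mathbf{P}\mathbf{U}\boldsymbol{\lambda} + \sqrt{\rho}\,\mathbf{D}\mathbf{U}\boldsymbol{\lambda} + \mathbf{v}. \tag{24} \]
Note that, when the channel statistics $\mathbf{U}$ and $\mathbf{R}_\lambda$ are known, estimation of $\mathbf{h}$ is equivalent to estimation
of $\boldsymbol{\lambda}$.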
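The pilot/data split of $\mathbf{X}$ can be checked numerically. The sketch below assumes the ZP case of (21) and uses a hypothetical pilot vector, precoder, and block sizes; none of these specific choices come from the paper. It builds $\mathbf{X}$, $\mathbf{P}$, and $\mathbf{D}$ as horizontally stacked diagonal blocks and confirms that $\mathbf{X} = \mathbf{P} + \mathbf{D}$.

```python
import numpy as np


def diag_stack(x_ext, n_block, n_delay):
    """X = [X_0, X_{-1}, ..., X_{-N_delay+1}] with X_i = diag(x[i], ..., x[i+N-1]).

    x_ext stores x[-N_delay+1], ..., x[N-1], so x[i] sits at x_ext[i + N_delay - 1].
    """
    off = n_delay - 1
    blocks = [np.diag(x_ext[off - l: off - l + n_block]) for l in range(n_delay)]
    return np.hstack(blocks)


rng = np.random.default_rng(1)
N, N_DELAY, N_S = 8, 2, 5                          # illustrative sizes (assumed)

p = np.zeros(N, dtype=complex)
p[0] = 1.0                                         # hypothetical pilot vector
B = np.eye(N)[:, N - N_S:]                         # hypothetical precoder, orthonormal columns
s = (rng.standard_normal(N_S) + 1j * rng.standard_normal(N_S)) / np.sqrt(2)

zp = np.zeros(N_DELAY - 1, dtype=complex)          # zero prefix, ZP case of (21)
P = diag_stack(np.concatenate([zp, p]), N, N_DELAY)          # pilot part, E{X}
D = diag_stack(np.concatenate([zp, B @ s]), N, N_DELAY)      # data part
X = diag_stack(np.concatenate([zp, p + B @ s]), N, N_DELAY)  # full X

# Arbitrary stacked channel vector h = [h_0; h_1], used only to exercise X h.
h = rng.standard_normal(N_DELAY * N) + 1j * rng.standard_normal(N_DELAY * N)
print(np.allclose(X, P + D), (X @ h).shape)        # True (8,)
```

Because the map from the transmitted samples to `X` is linear, the decomposition holds for any pilot and data pair, which is what allows (24) to separate the pilot and data contributions.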
To achieve arbitrarily small probability of decoding error over the block-fading DSC, we construct long
codewords that span multiple blocks. Let $\mathcal{S}$ denote a codebook in which each codeword $\mathbf{s}$ spans $K$ blocks.
Thus, we can write $\mathbf{s} = [\mathbf{s}^{[0]\top},\dots,\mathbf{s}^{[K-1]\top}]^\top$, where $\mathbf{s}^{[k]} \in \mathbb{C}^{N_s \times 1}$ is the “segment” of codeword $\mathbf{s}$ that
corresponds to the $k$th block. We consider codebooks generated according to a Gaussian distribution,
so that each codeword, and its segments, are independently generated with positive-definite segment
covariance matrix $\mathbf{R}_s$. Recall that Gaussian codes are capacity-optimal for coherent Gaussian-noise
channels [14].
B. PAT Decoder
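We assume that PAT decoding consists of a channel estimation stage followed by a data detection stage.
The channel estimator computes the LMMSE estimate of $\mathbf{h}$, given the observation $\mathbf{y}$, the pilots $\mathbf{p}$, and
the (joint) second-order statistics of $\mathbf{h}$, $\mathbf{v}$, and $\mathbf{s}$. Specifically, with $\mathbf{R}_y = \mathrm{E}\{\mathbf{y}\mathbf{y}^H\}$ and $\mathbf{R}_{y,h} = \mathrm{E}\{\mathbf{y}\mathbf{h}^H\}$,
the channel estimate is
\[ \hat{\mathbf{h}} = \mathbf{R}_{y,h}^H \mathbf{R}_y^{-1} \mathbf{y}, \tag{25} \]
where, from (24),
\[ \mathbf{R}_y = \rho\,\mathbf{P}\mathbf{U}\mathbf{R}_\lambda(\mathbf{P}\mathbf{U})^H + \rho\,\mathrm{E}\{\mathbf{D}\mathbf{U}\mathbf{R}_\lambda(\mathbf{D}\mathbf{U})^H\} + \mathbf{I} \tag{26} \]
\[ \mathbf{R}_{y,h} = \sqrt{\rho}\,\mathbf{P}\mathbf{U}\mathbf{R}_\lambda\mathbf{U}^H. \tag{27} \]
The channel estimation MSE is given by
\[ \sigma_e^2 = \mathrm{E}\{\|\mathbf{h} - \hat{\mathbf{h}}\|^2\} = \operatorname{tr}\{\mathbf{U}\mathbf{R}_\lambda\mathbf{U}^H - \mathbf{R}_{y,h}^H \mathbf{R}_y^{-1} \mathbf{R}_{y,h}\}. \tag{28} \]
In the sequel, $\hat{\mathbf{H}}$ denotes the version of $\mathbf{H}$ constructed with these channel estimates.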
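To make (25)-(28) concrete, the sketch below evaluates the LMMSE estimator in a simplified setting. It makes two assumptions that are not part of the text: the data term in (26) is dropped (i.e., $\mathbf{D} = \mathbf{0}$), and the estimate is formed for $\boldsymbol{\lambda}$ rather than $\mathbf{h}$, which leaves the MSE unchanged because $\mathbf{U}$ has orthonormal columns. The matrix `A` stands in for $\mathbf{P}\mathbf{U}$ and is drawn at random purely for illustration; the function name is hypothetical.

```python
import numpy as np


def lmmse_pilot_only(y, A, r_lam, rho):
    """LMMSE estimate of lambda from y = sqrt(rho) * A @ lambda + v, v ~ CN(0, I).

    A plays the role of P U and r_lam holds the diagonal of R_lambda.
    Assumption of this sketch: the data term in (26) is dropped (D = 0).
    Returns (lambda_hat, mse), where mse = tr{R_lam - R_yl^H R_y^{-1} R_yl}.
    """
    R_lam = np.diag(r_lam).astype(complex)
    R_y = rho * A @ R_lam @ A.conj().T + np.eye(A.shape[0])
    R_yl = np.sqrt(rho) * A @ R_lam                  # E{y lambda^H}
    W = np.linalg.solve(R_y, R_yl).conj().T          # R_yl^H R_y^{-1} (R_y is Hermitian)
    mse = float(np.real(np.trace(R_lam - W @ R_yl)))
    return W @ y, mse


rng = np.random.default_rng(2)
n_obs, n_par = 12, 6                                 # illustrative sizes (assumed)
A = (rng.standard_normal((n_obs, n_par))
     + 1j * rng.standard_normal((n_obs, n_par))) / np.sqrt(2)
r_lam = np.ones(n_par)
lam = (rng.standard_normal(n_par) + 1j * rng.standard_normal(n_par)) / np.sqrt(2)

for rho in (10.0, 100.0, 1000.0):
    v = (rng.standard_normal(n_obs) + 1j * rng.standard_normal(n_obs)) / np.sqrt(2)
    y = np.sqrt(rho) * A @ lam + v
    lam_hat, mse = lmmse_pilot_only(y, A, r_lam, rho)
    print(rho, rho * mse)                            # rho * mse stays bounded
```

When `A` has full column rank, the product `rho * mse` is bounded by $\operatorname{tr}\{(\mathbf{A}^H\mathbf{A})^{-1}\}$ in this simplified model, which is the kind of $\kappa/\rho$ decay required later by Theorem 2.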
For data detection, we employ weighted minimum-distance decoding based on the LMMSE channel
estimates. Recall that the maximum-likelihood (ML) decoder for coherent Gaussian-noise channels is a
weighted minimum-distance decoder [19] and notice that this decoder is relatively simple compared to
one that performs joint data detection and channel estimation. Given our multi-block coding scheme, the
decoder is specified as
\[ \hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \mathcal{S}} \sum_{k=0}^{K-1} \big\|\mathbf{Q}\big(\mathbf{y}^{[k]} - \sqrt{\rho}\,\hat{\mathbf{H}}^{[k]}\mathbf{M}(\mathbf{p} + \mathbf{B}\mathbf{s}^{[k]})\big)\big\|^2, \tag{29} \]
where $\mathbf{y}^{[k]}$ and $\hat{\mathbf{H}}^{[k]}$ denote the observation and the estimated channel matrix, respectively, of the $k$th
block. The choice of the weighting matrix $\mathbf{Q}$ is, for the moment, arbitrary.
C. Spectral Efficiency of PAT
For PAT, we say that a rateR is achievable if the probability of decoding error can be made arbitrarily
small at that rate. Since our PAT schemes use Gaussian codes, we employ Theorem 1, which bounds the
spectral efficiency of noncoherent DSC with continuously distributed inputs, in the following definition.
Definition 1: A PAT scheme is spectrally efficient if its achievable rate $R(\rho)$ over the block-fading
CE-BEM DSC satisfies $\lim_{\rho\to\infty} \frac{R(\rho)}{\log\rho} = \frac{N - N_{\text{Dopp}} N_{\text{delay}}}{N}$.
For the case of flat or frequency-selective channels, MMSE-PAT schemes (i.e., those designed to
minimize channel-estimation-error variance) have been shown to be spectrally efficient [1], [2], [5]. In
the sequel, we establish that all CP-based affine MMSE-PAT schemes are spectrally inefficient over the
CE-BEM strictly DSC and propose a spectrally efficient (non-MMSE) affine PAT scheme.
V. LOSSLESS LINEARLY SEPARABLE PAT
In this section, we focus on affine PAT schemes for which the pilot and data components can be linearly
separated without energy loss at the output of the CE-BEM DSC channel, i.e., from $\mathbf{y}$ in (23) and (24).
Practically speaking, these losslessly linearly separable (LLS) PAT schemes are those that enable the
receiver to compute channel estimates in the absence of data interference. From (24), it can be seen that
the LLS criterion can be stated as
\[ (\mathbf{P}\mathbf{U})^H \mathbf{D}\mathbf{U} = \mathbf{0}, \quad \forall\, \mathbf{D} \in \mathcal{D}, \tag{30} \]
where $\mathcal{D}$ refers to the collection of data matrices constructed from all possible codeword realizations. In
the sequel, we use the term MMSE-PAT when referring to any PAT scheme that minimizes the
channel-estimation-error variance $\sigma_e^2 = \mathrm{E}\{\|\mathbf{h} - \hat{\mathbf{h}}\|^2\}$ subject to a fixed positive pilot energy $E_p$.
Lemma 3: All MMSE-PAT schemes for the CE-BEM DSC are LLS.
Proof: It has been shown in Theorem 1 of [7], [20] that all CP-based affine MMSE-PAT schemes are
LLS, and it can be inferred from [21] that ZP-based single-carrier MMSE-PAT schemes are also LLS.
A. Achievable Rate
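We now analyze the achievable rate of LLS PAT, assuming the encoder/decoder specified in
Section IV-B. To do this, we first choose the weighting matrix $\mathbf{Q}$ in (29). Let the columns of $\mathbf{B}_d$ form an
orthonormal basis for the left null space of $\mathbf{P}\mathbf{U}$. Assuming the LLS condition (30), the projection
\[ \mathbf{y}_d = \mathbf{B}_d^H \mathbf{y} = \sqrt{\rho}\,\underbrace{\mathbf{B}_d^H \mathbf{H}\mathbf{M}\mathbf{B}}_{\mathbf{H}_d}\,\mathbf{s} + \mathbf{v}_d \tag{31} \]
preserves the data component. Then writing $\mathbf{H}_d = \hat{\mathbf{H}}_d + \tilde{\mathbf{H}}_d$ with estimate $\hat{\mathbf{H}}_d$ and error $\tilde{\mathbf{H}}_d$, we get
\[ \mathbf{y}_d = \sqrt{\rho}\,\hat{\mathbf{H}}_d\mathbf{s} + \underbrace{\sqrt{\rho}\,\tilde{\mathbf{H}}_d\mathbf{s} + \mathbf{v}_d}_{\mathbf{n}}. \tag{32} \]
From [22], we know that the rate-maximizing weighting operator for $\mathbf{y}_d$ (under the restricted set of
Gaussian codebooks) will be the “whitening operator” $\mathbf{R}_n^{-1/2}$, where $\mathbf{R}_n = \mathrm{E}\{\mathbf{n}\mathbf{n}^H\}$. Thus, we use
\[ \mathbf{Q} = \mathbf{R}_n^{-1/2}\mathbf{B}_d^H \tag{33} \]
in the decoder (29). Theorem 2 from [22] then directly implies^6 the following.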
Lemma 4: For an affine PAT scheme that is LLS according to (30) and that uses the weighting factor
$\mathbf{Q}$ from (33) in decoder (29), the achievable rate is
\[ R = \frac{1}{N}\,\mathrm{E}\{\log\det[\mathbf{I} + \rho\,\mathbf{R}_n^{-1}\hat{\mathbf{H}}_d\mathbf{R}_s\hat{\mathbf{H}}_d^H]\}. \tag{34} \]
We note that the rate expression (34) resembles that for the coherent case [14] when $\mathbf{n}$ in (32) is
considered as “effective” Gaussian noise.
B. Asymptotic Achievable Rate
We now study the achievable rate of LLS PAT in the high-SNR regime. Since the channel estimation
error becomes part of the effective noise in (34), the MSE $\sigma_e^2(\rho)$ from (28) directly influences the
asymptotic behavior of the achievable rate. The following theorem gives a condition on the MSE $\sigma_e^2(\rho)$
of linearly separable PAT that is sufficient to ensure that the achievable rate's pre-log factor grows in
proportion to the data dimension.
Theorem 2: Suppose a $(\mathbf{p},\mathbf{B})$ PAT scheme is linearly separable according to (30) and guarantees, for
some fixed $\kappa \in \mathbb{R}$, estimation error that satisfies $\sigma_e^2(\rho) = \mathrm{E}\,\|\mathbf{h} - \hat{\mathbf{h}}\|^2 \le \frac{\kappa}{\rho}$ for all $\rho > 1$. Then its
asymptotic achievable rate obeys
\[ \lim_{\rho\to\infty} \frac{R(\rho)}{\log\rho} = \frac{N_s}{N}. \tag{35} \]
Proof: See Appendix D.
When $\sigma_e^2(\rho) < \frac{\kappa}{\rho}$, the effective noise variance $\mathbf{R}_n$ remains bounded, enabling the $\log\rho$ growth of
achievable rate (34) with pre-log factor equal to the rank of $\mathbf{H}_d$. The estimation-error condition required
for Theorem 2 is quite mild and is satisfied, e.g., by all CP-based affine MMSE-PAT schemes. In
Appendix E, we show that all CP-based affine MMSE-PAT schemes yield $N_s < N - N_{\text{Dopp}} N_{\text{delay}}$
when $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$, i.e., when the channel is strictly doubly selective. Putting these two
results together, we make the following claim.
^6 The achievable rate result in [22] is derived assuming MMSE channel estimates. However, when (30) is satisfied, the LMMSE
estimates (25) are MMSE because the pilot observations and the channel coefficients are jointly Gaussian.
Theorem 3 (Spectral Inefficiency): For CE-BEM block-fading DSCs with $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$,
all CP-based affine MMSE-PAT schemes are spectrally inefficient.
Proof: See Appendix E.
ZP-based single-carrier MMSE-PAT schemes, as characterized in [21], also yield $N_s < N - N_{\text{Dopp}} N_{\text{delay}}$,
and hence are also spectrally inefficient when $N_{\text{delay}} > 1$ and $N_{\text{Dopp}} > 1$.
For singly selective channels, however, there do exist spectrally efficient MMSE-PAT schemes, such as
those specified for frequency-selective channels (i.e., $N_{\text{Dopp}} = 1$) in [2] and for time-selective channels
(i.e., $N_{\text{delay}} = 1$) in [3], [6], [7]. This can be understood by the fact that, in the frequency- (time-) selective
case, the effective channel matrix $\mathbf{H}\mathbf{M}$ has $N_{\text{delay}}$ ($N_{\text{Dopp}}$) deterministic eigenvectors, known to the
transmitter, so that MMSE-estimation of the $N_{\text{delay}}$ ($N_{\text{Dopp}}$) channel parameters can be accomplished by
sacrificing only $N_{\text{delay}}$ ($N_{\text{Dopp}}$) signaling dimensions to pilots. In the doubly selective case, however, the
eigenvectors of $\mathbf{H}\mathbf{M}$ are not deterministic and (under our assumptions) unknown to the transmitter, so
that MMSE-estimation of the $N_{\text{Dopp}} N_{\text{delay}}$ channel parameters requires sacrificing more than $N_{\text{Dopp}} N_{\text{delay}}$
signaling dimensions to pilots.
VI. SPECTRALLY EFFICIENT PAT
As established in Section V, CP-based affine MMSE-PAT schemes, as well as ZP-based single-carrier
MMSE-PAT schemes, are spectrally inefficient in strictly doubly selective CE-BEM fading, i.e., when
$N_{\text{Dopp}} > 1$ and $N_{\text{delay}} > 1$, because they sacrifice more than $N_{\text{Dopp}} N_{\text{delay}}$ signaling dimensions to pilots.
In this section, we design spectrally efficient PAT schemes by side-stepping the MMSE requirement.
Since we have restricted ourselves to non-data-aided channel estimation, we reason that the lossless
linear separability criterion (30) is still essential, since, without it, channel estimation would suffer
unknown-data interference and, as a result, estimation error would persist even as $\rho \to \infty$. Precise
conditions for spectrally efficient PAT are given in the following lemma.
Lemma 5: Suppose that a $(\mathbf{p},\mathbf{B})$ PAT scheme satisfies the following conditions:
1) $\mathbf{P}\mathbf{U}$ is full rank,
2) $\operatorname{rank}(\mathbf{B}) = N_s = N - N_{\text{Dopp}} N_{\text{delay}}$, and
3) $\mathbf{P}\mathbf{U}$ guarantees LLS according to (30).
Then the PAT scheme is spectrally efficient.
Proof: See Appendix F.
In Lemma 5, the first condition avoids an underdetermined system of equations during channel estimation,
the second enables the transmission of $N - N_{\text{Dopp}} N_{\text{delay}}$ linearly independent data symbols per block,
and the third prevents data-interference during channel estimation. A spectrally efficient PAT (SE-PAT)
scheme satisfying these three requirements is now described.
Example 1 (SE-PAT): Assuming $N$-block transmission over the CE-BEM DSC, consider the pilot
index set $\mathcal{P}_s = \{0, N_{\text{delay}}, \dots, (N_{\text{Dopp}}-1)N_{\text{delay}}\}$ and the guard index set $\mathcal{G}_s = \{0, \dots, N_{\text{Dopp}} N_{\text{delay}} - 1\}$.
Then construct a ZP-based affine PAT scheme $(\mathbf{p},\mathbf{B})$ where
\[ p[k] = \begin{cases} \sqrt{\frac{E_p}{N_{\text{Dopp}}}}\, e^{j\theta[k]} & k \in \mathcal{P}_s, \\ 0 & k \notin \mathcal{P}_s, \end{cases} \tag{36} \]
for arbitrary $\theta[k] \in \mathbb{R}$ and where $\mathbf{B}$ is constructed from the columns of $\mathbf{I}_N$ whose indices are not in $\mathcal{G}_s$.
For the example scheme, note that the first $N_{\text{Dopp}} N_{\text{delay}}$ time slots are used by pilots while the remaining
time slots are used for data transmission, thereby ensuring linear separability. It can be readily verified
that $\mathbf{B}$ has rank $N - N_{\text{Dopp}} N_{\text{delay}}$ and that $\mathbf{P}\mathbf{U}$ has full rank, so that all three conditions in Lemma 5
are satisfied. Such SE-PAT schemes are advantageous in that they yield higher achievable rates than
spectrally inefficient (e.g., MMSE-PAT) schemes at high SNR.
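The three conditions of Lemma 5 can be checked numerically for the scheme of Example 1. The sketch below is a minimal illustration under assumed values ($N = 16$, $N_{\text{delay}} = N_{\text{Dopp}} = 3$, $E_p = 4$, and $\theta[k] = 0$); the helper names are hypothetical, and a single random data draw stands in for the full collection $\mathcal{D}$.

```python
import numpy as np


def bem_basis(n_block, n_dopp):
    """F with [F]_{n,m} = exp(j*2*pi*n*(m - (N_Dopp-1)/2)/N) / sqrt(N)."""
    n = np.arange(n_block)[:, None]
    k = np.arange(n_dopp)[None, :] - (n_dopp - 1) / 2
    return np.exp(2j * np.pi * n * k / n_block) / np.sqrt(n_block)


def diag_stack_zp(x_blk, n_delay):
    """[X_0, X_{-1}, ..., X_{-N_delay+1}] for a zero-prefixed block x[0..N-1]."""
    n_block = x_blk.size
    x_ext = np.concatenate([np.zeros(n_delay - 1, dtype=complex), x_blk])
    off = n_delay - 1
    return np.hstack([np.diag(x_ext[off - l: off - l + n_block])
                      for l in range(n_delay)])


def se_pat(n_block, n_delay, n_dopp, e_p):
    """Pilot vector p and precoder B of Example 1, with theta[k] = 0 (assumed)."""
    p = np.zeros(n_block, dtype=complex)
    p[n_delay * np.arange(n_dopp)] = np.sqrt(e_p / n_dopp)   # pilot index set P_s
    B = np.eye(n_block)[:, n_delay * n_dopp:]                # drop guard set G_s
    return p, B


rng = np.random.default_rng(3)
N, N_DELAY, N_DOPP, E_P = 16, 3, 3, 4.0              # illustrative values (assumed)

p, B = se_pat(N, N_DELAY, N_DOPP, E_P)
U = np.kron(np.eye(N_DELAY), bem_basis(N, N_DOPP))
s = (rng.standard_normal(B.shape[1]) + 1j * rng.standard_normal(B.shape[1])) / np.sqrt(2)

PU = diag_stack_zp(p, N_DELAY) @ U                   # pilot part of (24)
DU = diag_stack_zp(B @ s, N_DELAY) @ U               # data part for one codeword draw

print(np.linalg.matrix_rank(PU))                     # condition 1: 9 = N_delay * N_Dopp
print(np.linalg.matrix_rank(B))                      # condition 2: 7 = N - N_delay * N_Dopp
print(np.allclose(PU.conj().T @ DU, 0))              # condition 3 (LLS): True
```

The LLS product vanishes exactly here because the pilot-bearing rows (indices below $N_{\text{Dopp}} N_{\text{delay}}$) and the data-bearing rows (indices at or above $N_{\text{Dopp}} N_{\text{delay}}$) do not overlap under zero prefixing.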
VII. CONCLUSION
In this paper, the achievable spectral efficiency of the noncoherent CE-BEM DSC with continuous
input distributions was shown to be $\frac{N - N_{\rm Dopp}N_{\rm delay}}{N} \approx 1 - 2B_{\rm Dopp}T_{\rm delay}$, where $B_{\rm Dopp}$ denotes the single-sided
Doppler spread and $T_{\rm delay}$ denotes the delay spread of the WSSUS channel. In addition, CP-based
affine MMSE-PAT schemes were shown to be spectrally inefficient, and a design procedure for spectrally
efficient PAT schemes was provided.
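As a quick arithmetic illustration of the pre-log factor, the sketch below evaluates $\max(0, 1 - N_{\rm delay}N_{\rm Dopp}/N)$ as stated in the abstract. The function name and the numerical values are hypothetical and chosen only for illustration.

```python
def prelog_factor(N, n_delay, n_dopp):
    """Pre-log factor max(0, 1 - N_delay * N_Dopp / N) of the CE-BEM DSC."""
    return max(0.0, 1.0 - n_delay * n_dopp / N)

# Hypothetical block lengths and spreads.
underspread = prelog_factor(64, 4, 3)   # 1 - 12/64 = 0.8125
overspread = prelog_factor(8, 4, 3)     # N < N_delay * N_Dopp, so the factor is 0
```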
APPENDIX A
PROOF OF LEMMA 1
In this appendix we analyze the statistics of the CE-BEM coefficients $\lambda[k;l]$ by first considering the
statistics of the discrete-time channel coefficients $h[n;l]$. From (2)-(3) and (6), it is straightforward to
show that
$$E\{h[n;l]\,h^*[n-m;l-q]\} = \iiint \psi^*(t'+v)\,\psi(t'+v-\tau)\,\psi(t')\,\psi^*(t'-\tau-qT_s)\,dt' \times \int R_{\rm Dopp;delay}(f;\tau+qT_s+lT_s)\,e^{j2\pi f(v+mT_s)}\,df\,dv\,d\tau. \tag{37}$$
From (7), we know $\lambda[k;l] = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} h[n;l]\,e^{-j\frac{2\pi}{N}nk}$ for $k \in \{-\frac{N}{2}, \dots, \frac{N}{2}-1\}$, so that
\begin{align}
E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\}
&= \frac{1}{N}\sum_{n=0}^{N-1}\sum_{n'=0}^{N-1} E\{h[n;l]\,h^*[n';l-q]\}\,e^{-j\frac{2\pi}{N}[nk-n'(k-p)]} \tag{38}\\
&= \sum_{m=-N+1}^{N-1}(N-|m|)\,e^{-j\frac{2\pi}{N}mk}\,E\{h[n;l]\,h^*[n-m;l-q]\}\,\frac{1}{N}\sum_{n'=0}^{N-1}e^{-j\frac{2\pi}{N}n'p} \tag{39}\\
&= \delta[p]\sum_{m=-N+1}^{N-1}(N-|m|)\,e^{-j\frac{2\pi}{N}mk}\,E\{h[n;l]\,h^*[n-m;l-q]\}, \tag{40}
\end{align}
using $m := n - n'$ and the fact that $E\{h[n;l]\,h^*[n-m;l-q]\}$ is invariant to $n$. Combining (37) with
(40), it is straightforward to obtain
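$$E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\} = \delta[p]\iiint \psi^*(t'+v)\,\psi(t'+v-\tau)\,\psi(t')\,\psi^*(t'-\tau-qT_s)\,dt' \times \int R_{\rm Dopp;delay}\big(f'+\tfrac{k}{NT_s};\tau+qT_s+lT_s\big)\,e^{j2\pi(f'+\frac{k}{NT_s})v} \times \sum_{m=-N+1}^{N-1}(N-|m|)\,e^{j2\pi mT_sf'}\,df'\,dv\,d\tau. \tag{41}$$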
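Focusing on the case of large block size $N$, we apply the rule $\lim_{N\to\infty}\frac{1}{N}\sum_{m=-N+1}^{N-1}(N-|m|)\,e^{j2\pi m\phi} = \sum_{i=-\infty}^{\infty}\delta(\phi-i)$ with $\phi = T_sf'$ to the previous result and find
\begin{align}
\lim_{N\to\infty} E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\}
&= N\delta[p]\iiint \psi^*(t'+v)\,\psi(t'+v-\tau)\,\psi(t')\,\psi^*(t'-\tau-qT_s)\,dt' \times \sum_{i=-\infty}^{\infty} R_{\rm Dopp;delay}\big(\tfrac{k+Ni}{NT_s};\tau+qT_s+lT_s\big)\,e^{j2\pi\frac{k+Ni}{NT_s}v}\,dv\,d\tau \tag{42}\\
&= N\delta[p]\iiint \psi^*(t'+v)\,\psi(t'+v-\tau)\,\psi(t')\,\psi^*(t'-\tau-qT_s)\,dt' \times R_{\rm Dopp;delay}\big(\tfrac{k}{NT_s};\tau+qT_s+lT_s\big)\,e^{j2\pi\frac{k}{NT_s}v}\,dv\,d\tau, \tag{43}
\end{align}
where for (43) we used the assumptions that $B_{\rm Dopp} < \frac{1}{2T_s}$ and $R_{\rm Dopp;delay}(f;\cdot) = 0$ for $f \notin [-B_{\rm Dopp}, B_{\rm Dopp}]$,
together with the fact that $k \in \{-\frac{N}{2}, \dots, \frac{N}{2}-1\}$, to write $\sum_{i=-\infty}^{\infty} R_{\rm Dopp;delay}\big(\frac{k+iN}{NT_s};\cdot\big) = R_{\rm Dopp;delay}\big(\frac{k}{NT_s};\cdot\big)$.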
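The limiting rule used above rests on the identity $\sum_{m=-N+1}^{N-1}(N-|m|)\,e^{j2\pi m\phi} = \big|\sum_{n=0}^{N-1}e^{j2\pi n\phi}\big|^2$, so that the normalized sum is a Fejér-type kernel that concentrates at integer $\phi$ as $N$ grows. The sketch below, assuming NumPy and using illustrative function names, checks this identity numerically for a finite $N$; it is a sanity check rather than a proof of the limit.

```python
import numpy as np

def triangular_sum(N, phi):
    """(1/N) * sum_{m=-N+1}^{N-1} (N - |m|) exp(j 2 pi m phi)."""
    m = np.arange(-N + 1, N)
    return np.sum((N - np.abs(m)) * np.exp(2j * np.pi * m * phi)) / N

def fejer_closed_form(N, phi):
    """(1/N) * |sum_{n=0}^{N-1} exp(j 2 pi n phi)|^2, the same kernel."""
    n = np.arange(N)
    return np.abs(np.sum(np.exp(2j * np.pi * n * phi))) ** 2 / N

N = 64                                   # illustrative block size
at_integer = triangular_sum(N, 1.0)      # peak of height N at integer phi
off_integer = triangular_sum(N, 0.25)    # essentially zero away from integers
```

The peak height grows linearly in $N$ while the integral of the kernel over one period stays equal to one, which is the behavior expected of a sequence converging to a train of Dirac impulses.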
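Writing (43) in terms of the ambiguity function (8) yields
$$\lim_{N\to\infty} E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\} = N\delta[p]\int A^*\big(\tau,\tfrac{k}{NT_s}\big)\,A\big(\tau+qT_s,\tfrac{k}{NT_s}\big)\,R_{\rm Dopp;delay}\big(\tfrac{k}{NT_s};\tau+qT_s+lT_s\big)\,d\tau. \tag{44}$$
If the support of $\psi(t)$ is $\big(-\frac{T_\psi}{2}, \frac{T_\psi}{2}\big]$, then it can be seen that, when $T_\psi \le \frac{T_s}{2}$, the functions $A(\tau,\cdot)$ and
$A(\tau+qT_s,\cdot)\big|_{q\neq 0}$ share no common support, in which case (44) reduces to
$$\lim_{N\to\infty} E\{\lambda[k;l]\,\lambda^*[k-p;l-q]\} = N\delta[p]\,\delta[q]\int \big|A\big(\tau,\tfrac{k}{NT_s}\big)\big|^2\,R_{\rm Dopp;delay}\big(\tfrac{k}{NT_s};\tau+lT_s\big)\,d\tau. \tag{45}$$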
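APPENDIX B
PROOF OF THEOREM 1
With $L = \min(N, N_{\rm delay}N_{\rm Dopp})$, we define the vector $\mathbf{y}_s = \big[y[0], \dots, y[L-1]\big]^\top$.
Claim: $\lim_{\rho\to\infty}\frac{I(\mathbf{y}_s;\mathbf{x}^\rho)}{\log\rho} = 0$.
Proof: Using the chain rule for mutual information [13], we have
\begin{align}
I(\mathbf{y}_s;\mathbf{x}^\rho) &= I(y[0];\mathbf{x}^\rho) + \sum_{i=1}^{L-1} I\big(y[i];\mathbf{x}^\rho \,\big|\, y[0],\dots,y[i-1]\big) \tag{46}\\
&\le I(y[0];\mathbf{x}^\rho) + \sum_{i=1}^{L-1} I\big(y[i];\mathbf{x}^\rho, y[0],\dots,y[i-1]\big). \tag{47}
\end{align}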
In the sequel, we analyze each term in (47) separately. In preparation, we define the vectors $\mathbf{x}_i^\rho = \big[x^\rho[i], \dots, x^\rho[i-N_{\rm delay}+1]\big]^\top$
and their "complements" $\bar{\mathbf{x}}_i^\rho$, which are composed of the elements of $\mathbf{x}^\rho$ not
in $\mathbf{x}_i^\rho$. We also define the channel vectors $\mathbf{h}_i = \big[h[i;0], \dots, h[i;N_{\rm delay}-1]\big]^\top$, so that
$$y[i] = \sqrt{\rho}\,\mathbf{h}_i^\top\mathbf{x}_i^\rho + v[i]. \tag{48}$$
Next we establish the useful result that $I(y[i];\mathbf{x}_i^\rho) \le \log\log\rho + \Xi$ for some constant $\Xi \in \mathbb{R}$. Towards
this aim, we use a special case of the capacity result from [17, Thm. 4.2], which is stated below.
Lemma 6 (Special case of Theorem 4.2 from [17]): Consider the vector input-output relation
$\mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{x} + \mathbf{v}$ for the CWGN block-fading channel model. The input and noise powers are
constrained as $E\{\|\mathbf{x}\|^2\} \le P_1$ and $E\{\|\mathbf{v}\|^2\} \le P_2$, respectively, for some positive constants $P_1$ and
$P_2$. Furthermore, assume that the channel fades independently from block to block and that only the channel
fading statistics are available at both the transmitter and receiver. If the differential entropy (denoted
by $h(\cdot)$) of the channel fading matrix $\mathbf{H}$ satisfies $h(\mathbf{H}) > -\infty$, then the high-SNR asymptotic ergodic
channel capacity obeys $\limsup_{\rho\to\infty}\{C(\rho) - \log\log\rho\} < \infty$.
In our model (48), since the elements of $\mathbf{h}_i$ are independent with positive variance, the covariance matrix
of $\mathbf{h}_i$, denoted by $\mathbf{R}_i$, is positive definite and hence the differential entropy satisfies $h(\mathbf{h}_i^\top) = h(\mathbf{h}_i) = \log\det\mathbf{R}_i > -\infty$.
Applying Lemma 6 to (48), it follows that $I(y[i];\mathbf{x}_i^\rho) \le \log\log\rho + \Xi$.
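The first term in (47) can be written $I(y[0];\mathbf{x}^\rho) = I(y[0];\mathbf{x}_0^\rho) + I(y[0];\bar{\mathbf{x}}_0^\rho\,|\,\mathbf{x}_0^\rho)$. Conditioned on $\mathbf{x}_0^\rho$, the
uncertainty in $y[0]$ is due to channel coefficients and additive noise, which are independent of $\bar{\mathbf{x}}_0^\rho$. Hence,
$I(y[0];\bar{\mathbf{x}}_0^\rho\,|\,\mathbf{x}_0^\rho) = 0$. Since $I(y[0];\mathbf{x}_0^\rho) \le \log\log\rho + \Xi$, we know $\lim_{\rho\to\infty}\frac{I(y[0];\mathbf{x}^\rho)}{\log\rho} = 0$. Considering the
general term inside the summation of (47),
$$I\big(y[i];\mathbf{x}^\rho, y[0],\dots,y[i-1]\big) = \underbrace{I(y[i];\mathbf{x}_i^\rho)}_{\le\,\log\log\rho+\Xi} + \underbrace{I(y[i];\bar{\mathbf{x}}_i^\rho\,|\,\mathbf{x}_i^\rho)}_{=\,0} + \underbrace{I\big(y[i];y[0],\dots,y[i-1]\,\big|\,\mathbf{x}^\rho\big)}_{T_i},$$
it remains to be shown that $\lim_{\rho\to\infty}\frac{T_i}{\log\rho} = 0$.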
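Recall that $\mathbf{y}$ and $\mathbf{h}$ are jointly Gaussian conditioned on $\mathbf{x}^\rho$. In terms of differential entropies,
$I(y[i];y[0],\dots,y[i-1]\,|\,\mathbf{x}^\rho) = h(y[i]\,|\,\mathbf{x}^\rho) - h(y[i]\,|\,\mathbf{x}^\rho, y[0],\dots,y[i-1])$. It follows that
$$h(y[i]\,|\,\mathbf{x}^\rho) = E\Big\{\log\Big(1 + \rho\sum_{\ell=0}^{N_{\rm delay}-1} E\{|h[i;\ell]|^2\}\,|x^\rho[i-\ell]|^2\Big)\Big\}, \tag{49}$$
where the expectation is with respect to $\mathbf{x}^\rho$. Now, given $\{y[0],\dots,y[i-1]\}$, we split $y[i]$ into an MMSE
estimate and an error as $y[i] = E\{y[i]\,|\,y[0],\dots,y[i-1],\mathbf{x}^\rho\} + \tilde{y}[i]$. Since $\mathbf{y}$ is Gaussian given $\mathbf{x}^\rho$, we have
$h(y[i]\,|\,y[0],\dots,y[i-1],\mathbf{x}^\rho) = E\log(E|\tilde{y}[i]|^2)$, where the expectation inside the $\log$ is w.r.t. $\mathbf{H}$ and $\mathbf{v}$
and the expectation outside the $\log$ is w.r.t. $\mathbf{x}^\rho$. Denoting the covariance of $\mathbf{h}_i - E\{\mathbf{h}_i\,|\,y[0],\dots,y[i-1]\}$
by $\tilde{\mathbf{R}}_i$, we have $E|\tilde{y}[i]|^2 = 1 + \rho\,\mathbf{x}_i^{\rho H}\tilde{\mathbf{R}}_i\mathbf{x}_i^\rho$. Let $\mu_{\max,i}$ denote the maximum eigenvalue of $\tilde{\mathbf{R}}_i$ and $\mathbf{q}_i$
denote the corresponding eigenvector. Now define $\kappa_{\max,i} = \inf_{\mathbf{x}^\rho\in\mathbb{C}^N}\mu_{\max,i}$. For $i \in \{1,\dots,L-1\}$,
the elements of $\mathbf{h}_i$ cannot all be estimated perfectly, even in the absence of noise ($\rho = \infty$), since
$\{y[0],\dots,y[i-1]\}$ correspond to a projection of $\boldsymbol{\lambda}$ onto a subspace of smaller dimension, and hence
$\kappa_{\max,i} > 0$. Now, $E|\tilde{y}[i]|^2 \ge 1 + \rho\,\kappa_{\max,i}\big|\sum_{k=0}^{N_{\rm delay}-1} q_i[k]\,x^\rho[i-k]\big|^2$, and hence
$$h(y[i]\,|\,\mathbf{x}^\rho, y[0],\dots,y[i-1]) \ge E\Big\{\log\Big(1 + \rho\,\kappa_{\max,i}\Big|\sum_{k=0}^{N_{\rm delay}-1} q_i[k]\,x^\rho[i-k]\Big|^2\Big)\Big\}. \tag{50}$$
Combining (49) and (50), we have
$$T_i \le E\log\frac{1 + \rho\sum_{\ell=0}^{N_{\rm delay}-1} E\{|h[i;\ell]|^2\}\,|x^\rho[i-\ell]|^2}{1 + \rho\,\kappa_{\max,i}\big|\sum_{k=0}^{N_{\rm delay}-1} q_i[k]\,x^\rho[i-k]\big|^2}.$$
Since $\mathbf{x}^\rho$ is a sequence of continuous random vectors converging to a continuous random vector,
$\lim_{\rho\to\infty}\big|\sum_{k=0}^{N_{\rm delay}-1} q_i[k]\,x^\rho[i-k]\big|^2 > 0$ with probability 1, and $\lim_{\rho\to\infty}\frac{T_i}{\log\rho} = 0$.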
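The last step relies on the fact that a log-ratio of the form $\log\frac{1+\rho a}{1+\rho b}$ with fixed $a, b > 0$ stays bounded as $\rho$ grows, so it contributes nothing to the pre-log factor. A minimal numerical sketch follows, assuming NumPy; the constants `a` and `b` are arbitrary stand-ins for the numerator and denominator terms in the bound on $T_i$ and are not values from the paper.

```python
import numpy as np

def log_ratio_over_log_rho(rho, a, b):
    """log((1 + rho*a) / (1 + rho*b)) / log(rho) for fixed positive a and b."""
    return np.log((1.0 + rho * a) / (1.0 + rho * b)) / np.log(rho)

a, b = 2.0, 0.1                      # hypothetical positive constants
rhos = [1e3, 1e6, 1e12]
ratios = [log_ratio_over_log_rho(rho, a, b) for rho in rhos]
```

The numerator tends to $\log(a/b)$ while the denominator diverges, so the ratios shrink toward zero as the SNR increases.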
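Now, if $N \le N_{\rm Dopp}N_{\rm delay}$, the proof is complete since in that case $\mathbf{y}_s = \mathbf{y}$. For the case $N > N_{\rm Dopp}N_{\rm delay}$,
we define $\mathbf{y}_r = \big[y[N_{\rm Dopp}N_{\rm delay}], \dots, y[N-1]\big]^\top$ and, using the chain rule for mutual
information, obtain $I(\mathbf{y};\mathbf{x}^\rho) = I(\mathbf{y}_s;\mathbf{x}^\rho) + I(\mathbf{y}_r;\mathbf{x}^\rho\,|\,\mathbf{y}_s)$. To complete the proof, we need to establish that
$\limsup_{\rho\to\infty}\frac{I(\mathbf{y}_r;\mathbf{x}^\rho|\mathbf{y}_s)}{\log\rho} \le N - N_{\rm Dopp}N_{\rm delay}$. For this we have
\begin{align}
I(\mathbf{y}_r;\mathbf{x}^\rho\,|\,\mathbf{y}_s) &= h(\mathbf{y}_r\,|\,\mathbf{y}_s) - h(\mathbf{y}_r\,|\,\mathbf{y}_s,\mathbf{x}^\rho) \tag{51}\\
&\le h(\mathbf{y}_r) - h(\mathbf{y}_r\,|\,\mathbf{y}_s,\mathbf{x}^\rho,\mathbf{H}), \tag{52}
\end{align}
since conditioning reduces entropy. Now, $E\{|y[n]|^2\} = E\{|v[n]|^2\} + \rho\sum_{l=0}^{N_{\rm delay}-1} E\{|h[n;l]|^2\}\,E\{|x[n-l]|^2\} \le 1 + k_1\rho$
for some constant $k_1 \in \mathbb{R}$. Bounding the maximum eigenvalue of the covariance matrix
of $\mathbf{y}_r$ by the sum of its diagonal elements, we see that $\mathbf{R}_{\mathbf{y}_r} \preceq (N - N_{\rm delay}N_{\rm Dopp})(k_1\rho+1)\,\mathbf{I}_{N-N_{\rm Dopp}N_{\rm delay}}$.
Since the Gaussian distribution maximizes the entropy for a given covariance matrix, we have $h(\mathbf{y}_r) \le \log\det\big[(N - N_{\rm delay}N_{\rm Dopp})(k_1\rho+1)\,\mathbf{I}_{N-N_{\rm Dopp}N_{\rm delay}}\big]$.
Finally, $h(\mathbf{y}_r\,|\,\mathbf{y}_s,\mathbf{x}^\rho,\mathbf{H})$ is equal to the entropy
of the unit-variance white noise term in $\mathbf{y}_r$, which is bounded and independent of $\rho$. So, we have
$\limsup_{\rho\to\infty}\frac{I(\mathbf{y}_r;\mathbf{x}^\rho|\mathbf{y}_s)}{\log\rho} \le N - N_{\rm Dopp}N_{\rm delay}$.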
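APPENDIX C
PROOF OF LEMMA 2
Since mutual information is non-negative, it is sufficient to restrict ourselves to the case of $N > N_{\rm Dopp}N_{\rm delay}$.
We need only to prove that the lower bound on the mutual information with Gaussian
inputs satisfies the equality in (19). Using the chain rule for mutual information, we have
\begin{align}
I(\mathbf{y};\mathbf{x}) &= I(\mathbf{y};\mathbf{x},\mathbf{H}) - I(\mathbf{y};\mathbf{H}\,|\,\mathbf{x}) \tag{53}\\
&\ge I(\mathbf{y};\mathbf{x}\,|\,\mathbf{H}) - I(\mathbf{y};\mathbf{H}\,|\,\mathbf{x}). \tag{54}
\end{align}
Since $I(\mathbf{y};\mathbf{x}\,|\,\mathbf{H})$ corresponds to the coherent case of perfect receiver CSI and since $\mathbf{x}$ is Gaussian with
covariance $\mathbf{R}_x = \mathbf{I}$, we have [14]
$$I(\mathbf{y};\mathbf{x}\,|\,\mathbf{H}) = E\{\log\det[\mathbf{I}_N + \rho\,\mathbf{H}\mathbf{H}^H]\}. \tag{55}$$
Since $\mathbf{H}\mathbf{H}^H$ is full rank (almost surely), re-using the arguments following (17) yields
$$\lim_{\rho\to\infty}\frac{I(\mathbf{y};\mathbf{x}\,|\,\mathbf{H})}{\log\rho} = N. \tag{56}$$
Now, for a matrix $\mathbf{X}$ appropriately constructed from the input samples $\{x[i]\}_{i=-N_{\rm delay}+1}^{N-1}$, (12) can be
written as
$$\mathbf{y} = \sqrt{\rho}\,\mathbf{X}\mathbf{h} + \mathbf{v}.$$
Using the BEM model (13), we have $\mathbf{y} = \sqrt{\rho}\,\mathbf{X}\mathbf{U}\boldsymbol{\lambda} + \mathbf{v}$. Since $\boldsymbol{\lambda}$ captures all the degrees of freedom of
the DSC over a block, we have $I(\mathbf{y};\mathbf{H}\,|\,\mathbf{x}) = I(\mathbf{y};\boldsymbol{\lambda}\,|\,\mathbf{x}) = I(\mathbf{y};\boldsymbol{\lambda}\,|\,\mathbf{X})$. Conditioned on $\mathbf{X}$, the vectors $\mathbf{y}$ and
$\boldsymbol{\lambda}$ are jointly Gaussian, and hence, using the statistics of $\boldsymbol{\lambda}$ and Jensen's inequality, we have
\begin{align}
I(\mathbf{y};\boldsymbol{\lambda}\,|\,\mathbf{X}) &= E\log\det[\mathbf{I} + \rho\,(\mathbf{X}\mathbf{U})\mathbf{R}_\lambda(\mathbf{X}\mathbf{U})^H] \tag{57}\\
&\le \log\det E[\mathbf{I} + \rho N(\mathbf{X}\mathbf{U})^H\mathbf{X}\mathbf{U}] \tag{58}\\
&\le N_{\rm Dopp}N_{\rm delay}\log\rho + \Xi, \tag{59}
\end{align}
for some constant $\Xi$, where (59) follows from the fact that $E(\mathbf{X}\mathbf{U})^H(\mathbf{X}\mathbf{U}) \preceq k\,\mathbf{I}_{N_{\rm Dopp}N_{\rm delay}}$ for some
constant $k$. So, finally we have
$$\lim_{\rho\to\infty}\frac{I(\mathbf{y};\mathbf{H}\,|\,\mathbf{x})}{\log\rho} \le N_{\rm Dopp}N_{\rm delay}. \tag{60}$$
The desired result follows from (54), (56) and (60).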
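Both (56) and (59) use the fact that $\log\det(\mathbf{I} + \rho\,\mathbf{A}\mathbf{A}^H)$ grows like $\operatorname{rank}(\mathbf{A})\log\rho$ at high SNR. The sketch below, assuming NumPy, illustrates this for a random rank-3 matrix; the matrix size, the seed, and the function name are illustrative choices made here, and the log-determinant is computed from singular values to avoid numerical trouble at large $\rho$.

```python
import numpy as np

def prelog_estimate(A, rho):
    """log det(I + rho * A A^H) / log(rho), computed from singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(np.log1p(rho * s ** 2)) / np.log(rho)

rng = np.random.default_rng(0)       # fixed seed, for reproducibility only
N, r = 8, 3                          # hypothetical dimensions: N x r with rank r
A = rng.standard_normal((N, r)) + 1j * rng.standard_normal((N, r))
est = prelog_estimate(A, 1e12)       # close to r, up to an O(1/log rho) offset
```

Only the $r$ nonzero singular values contribute terms that scale with $\log\rho$, which is why the $N_{\rm Dopp}N_{\rm delay}$ unknown channel coefficients cost exactly that many degrees of freedom in (60).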
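APPENDIX D
PROOF OF THEOREM 2
According to Lemma 4, a linearly separable PAT scheme with weighting matrix $\mathbf{Q}$ in (33) achieves the
rate given in (34). To derive a lower bound on the achievable-rate pre-log factor, we first obtain a bound
(in the positive semi-definite sense) on $\mathbf{R}_n$, the covariance matrix of $\sqrt{\rho}\,\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B}\mathbf{s} + \mathbf{v}_d$. Because of
the orthogonality of the pilot and data subspaces of lossless linearly separable PAT, the elements of the (pilot
based) channel estimation error matrix $\tilde{\mathbf{H}}$ are independent of the noise in the data subspace $\mathbf{v}_d$ and also
of the data vector $\mathbf{s}$. So, we have
\begin{align}
\mathbf{R}_n &= E\{\rho\,\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B}\mathbf{s}\,(\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B}\mathbf{s})^H + \mathbf{v}_d\mathbf{v}_d^H\} \nonumber\\
&= E\{\rho\,\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B}\mathbf{R}_s(\mathbf{B}_d^H\tilde{\mathbf{H}}\mathbf{M}\mathbf{B})^H\} + \mathbf{I} \nonumber\\
&\preceq \rho\,\sigma_e^2E_s\|\mathbf{M}\|_F^2\,\mathbf{I} + \mathbf{I}, \tag{61}
\end{align}
where the inequality (61) follows from applying the inequalities $\mathbf{R}_s \preceq E_s\mathbf{I}$, $\mathbf{B}\mathbf{B}^H \preceq \mathbf{I}$, $E\{\tilde{\mathbf{H}}\tilde{\mathbf{H}}^H\} \preceq \sigma_e^2\mathbf{I}$
and $\mathbf{B}_d^H\mathbf{B}_d = \mathbf{I}$. Incorporating the condition $\sigma_e^2(\rho) \le \frac{\kappa}{\rho}$, we see that $\mathbf{R}_n \preceq C\mathbf{I}$ for some constant $C$,
$\forall\rho > 1$. So, we have $\mathbf{R}_n^{-1}\mathbf{H}_d\mathbf{R}_s\mathbf{H}_d^H \succeq \frac{\rho}{C}\mathbf{H}_d\mathbf{R}_s\mathbf{H}_d^H$ and the achievable rate (34) can be bounded as
\begin{align}
R(\rho) &\ge \frac{1}{N}E\Big\{\log\det\Big[\mathbf{I} + \frac{\rho}{C}\mathbf{H}_d\mathbf{R}_s\mathbf{H}_d^H\Big]\Big\} \tag{62}\\
&\ge \frac{1}{N}E\Big\{\log\det\Big[\mathbf{I} + \frac{\rho\,\sigma_s^2}{C}\mathbf{H}_d\mathbf{H}_d^H\Big]\Big\}, \tag{63}
\end{align}
where $\sigma_s^2$ denotes the minimum eigenvalue of $\mathbf{R}_s$. Since $\sigma_e^2(\rho) \to 0$ as $\rho \to \infty$, the channel estimates
converge almost everywhere to the true channel, i.e., $\lim_{\rho\to\infty}\hat{\mathbf{H}} = \mathbf{H}$. Also, since $\mathbf{H}_d\mathbf{H}_d^H$ has rank
equal to $\operatorname{rank}(\mathbf{B}) = N_s$, we have $\lim_{\rho\to\infty}\frac{R(\rho)}{\log\rho} \ge \frac{N_s}{N}$. To derive an upper bound on the achievable-rate
pre-log factor, we use Jensen's inequality to take the expectation inside the $\log\det(\cdot)$ term of (34), thereby
obtaining $\lim_{\rho\to\infty}\frac{R(\rho)}{\log\rho} \le \frac{N_s}{N}$. Together, the upper and lower bounds yield (35).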
APPENDIX E
PROOF OF THEOREM 3
In this proof, we restrict our attention to strictly doubly selective channels, i.e., DSCs for which
$N_{\rm delay} > 1$ and $N_{\rm Dopp} > 1$. Throughout this proof, we consider all indices modulo-$N$. Let $(\mathbf{p},\mathbf{B})$ be an
arbitrary CP-MMSE-PAT scheme for a strictly DSC. We establish the desired result in the following two
steps:
1) For the CP-MMSE-PAT scheme $(\mathbf{p},\mathbf{B})$, the achievable-rate pre-log factor equals $\operatorname{rank}(\mathbf{B})/N$.
2) For strictly DSCs, any CP-MMSE-PAT scheme $(\mathbf{p},\mathbf{B})$ obeys $\operatorname{rank}(\mathbf{B}) < N - N_{\rm delay}N_{\rm Dopp}$.
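Step 1) of Proof:
The characterization of CP-based affine MMSE-PAT in [7], [20] establishes that the linear separability
condition (30) is satisfied, and furthermore that $E\{\|\tilde{\mathbf{h}}\|^2\} = \operatorname{tr}\big\{\big(\mathbf{R}_\lambda^{-1} + \frac{\rho E_p}{N}\mathbf{I}_{N_{\rm Dopp}N_{\rm delay}}\big)^{-1}\big\}$. Recalling that
$\mathbf{R}_\lambda$ is diagonal, and defining positive $\alpha_i = [\mathbf{R}_\lambda]_{i,i}$, we find
$$E\{\|\tilde{\mathbf{h}}\|^2\} = \sum_{i=0}^{N_{\rm delay}N_{\rm Dopp}-1}\Big(\frac{1}{\alpha_i} + \frac{\rho E_p}{N}\Big)^{-1} \le \frac{N N_{\rm delay}N_{\rm Dopp}}{\rho E_p}.$$
Thus, all CP-based affine MMSE-PAT schemes satisfy the hypotheses of Theorem 2, and
hence the pre-log factor of their achievable rate equals their data dimension $\operatorname{rank}(\mathbf{B})$ normalized by $N$.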
Step 2) of Proof:
Now, we show that, when $N_{\rm delay} > 1$ and $N_{\rm Dopp} > 1$, CP-based affine MMSE-PAT guarantees data
dimension $N_s < N - N_{\rm Dopp}N_{\rm delay}$. To establish this condition on $N_s$, we argue by contradiction.
In particular, we proceed in the following stages.
(i) Assume that there exists a CP-MMSE-PAT scheme for a strictly DSC that allows $N_s = N - N_{\rm Dopp}N_{\rm delay}$.
(ii) Find the necessary requirements on $\mathbf{p}$ and $\mathbf{B}$ for such a PAT scheme.
(iii) Establish that the PAT schemes satisfying the requirements obtained in stage (ii) obey $N_s < N - N_{\rm Dopp}N_{\rm delay}$,
contradicting the initial assumption of stage (i).
Stage (i) – Initial Assumption: Let us assume that there exists a CP-MMSE-PAT scheme $(\mathbf{p},\mathbf{B})$ for a
strictly DSC that satisfies $N_s = N - N_{\rm Dopp}N_{\rm delay}$.
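Stage (ii) – Necessary Requirements: The necessary conditions on CP-based affine MMSE-PAT for
the CE-BEM DSC established in [7], [20] can be expressed as the pair (64)-(65) using $p[i] = [\mathbf{p}]_i$,
$b_q[i] = [\mathbf{B}]_{i,q}$, $\mathcal{N}_{\rm delay} = \{-N_{\rm delay}+1, \dots, N_{\rm delay}-1\}$, and $\mathcal{N}_{\rm Dopp} = \{-N_{\rm Dopp}+1, \dots, N_{\rm Dopp}-1\}$:
\begin{align}
\sum_{i=0}^{N-1} b_q[i]\,p^*[i-k]\,e^{-j\frac{2\pi}{N}mi} &= 0 \quad \forall k \in \mathcal{N}_{\rm delay},\ \forall m \in \mathcal{N}_{\rm Dopp},\ \forall q \in \{0,\dots,N_s-1\}, \tag{64}\\
\frac{1}{E_p}\sum_{i=0}^{N-1} p[i]\,p^*[i-k]\,e^{-j\frac{2\pi}{N}mi} &= \delta[k]\,\delta[m] \quad \forall k \in \mathcal{N}_{\rm delay},\ \forall m \in \mathcal{N}_{\rm Dopp}. \tag{65}
\end{align}
Notice that (64) states the linear separability condition (30) in the case of a CE-BEM DSC. Defining
$$\mathbf{p}_{k,m} = \frac{1}{\sqrt{E_p}}\Big[p[k]\,e^{j\frac{2\pi}{N}m\cdot 0},\ p[k+1]\,e^{j\frac{2\pi}{N}m\cdot 1},\ \dots,\ p[k+N-1]\,e^{j\frac{2\pi}{N}m(N-1)}\Big]^\top \tag{66}$$
as a (normalized) $k$-time-shifted and $m$-frequency-shifted version of the pilot vector $\mathbf{p}$, and constructing the
matrix $\mathbf{W}$ from the columns $\{\mathbf{p}_{k,m},\ k \in \mathcal{N}_{\rm delay},\ m \in \mathcal{N}_{\rm Dopp}\}$, equation (64) can be conveniently rewritten as
$\mathbf{W}^H\mathbf{B} = \mathbf{0}$. It will be convenient to visualize the elements of $\{\mathbf{p}_{k,m},\ k \in \mathcal{N}_{\rm delay},\ m \in \mathcal{N}_{\rm Dopp}\}$ arranged
in a grid, as in Fig. 1. For this, we use the abbreviation $D = (N_{\rm Dopp}-1)/2$.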
[Figure: grid of the vectors $\mathbf{p}_{k,m}$, with rows indexed by $k = -N_{\rm delay}+1, \dots, N_{\rm delay}-1$ and columns by $m = -N_{\rm Dopp}+1, \dots, N_{\rm Dopp}-1$; the regions $\mathbf{W}_o$, $\mathbf{W}_e^1$, and $\mathbf{W}_e^2$ are marked.]
Fig. 1. Elements of the set $\{\mathbf{p}_{k,m},\ k \in \mathcal{N}_{\rm delay},\ m \in \mathcal{N}_{\rm Dopp}\}$ arranged in a grid, using $D = (N_{\rm Dopp}-1)/2$.
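Let $(\mathbf{p},\mathbf{B})$ be a CP-based affine MMSE-PAT scheme with data dimension $N_s = N - N_{\rm Dopp}N_{\rm delay}$
(i.e., $\operatorname{rank}(\mathbf{B}) = N_s$). We now deduce some essential properties of $\mathbf{p}$. Defining
$$r_{k,m} := \langle\mathbf{p}_{0,0},\mathbf{p}_{k,m}\rangle = \frac{1}{E_p}\sum_{i=0}^{N-1} p[i]\,p^*[i+k]\,e^{-j\frac{2\pi}{N}mi}, \tag{67}$$
where $\langle\mathbf{x},\mathbf{y}\rangle = \mathbf{y}^H\mathbf{x}$ denotes the inner product, the MMSE condition (65) implies that
$$r_{k,m} = \delta[k]\,\delta[m], \quad \forall k \in \mathcal{N}_{\rm delay},\ \forall m \in \mathcal{N}_{\rm Dopp}. \tag{68}$$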
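Note also that
\begin{align}
\langle\mathbf{p}_{k_1,m_1},\mathbf{p}_{k_2,m_2}\rangle &= e^{j\frac{2\pi}{N}(m_2-m_1)k_1}\,r_{k_2-k_1,\,m_2-m_1} \tag{69}\\
r_{k,m}^* = \langle\mathbf{p}_{k,m},\mathbf{p}_{0,0}\rangle &= e^{-j\frac{2\pi}{N}mk}\,r_{-k,-m}. \tag{70}
\end{align}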
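Identities (69) and (70) follow from a change of summation index under modulo-$N$ indexing, and they can be checked numerically. The sketch below assumes NumPy; the helper names `shifted_pilot`, `r`, and `inner`, the random pilot, and the tested index tuples are all illustrative choices made here.

```python
import numpy as np

def shifted_pilot(p, k, m, e_p):
    """p_{k,m} as in (66): cyclic time shift by k, frequency shift by m."""
    N = len(p)
    i = np.arange(N)
    return np.roll(p, -k) * np.exp(2j * np.pi * m * i / N) / np.sqrt(e_p)

def r(p, k, m, e_p):
    """r_{k,m} as in (67), with modulo-N indexing."""
    N = len(p)
    i = np.arange(N)
    return np.sum(p * np.conj(np.roll(p, -k)) * np.exp(-2j * np.pi * m * i / N)) / e_p

def inner(x, y):
    """Inner product <x, y> = y^H x."""
    return np.vdot(y, x)

rng = np.random.default_rng(1)       # arbitrary pilot, for illustration only
N = 12
p = rng.standard_normal(N) + 1j * rng.standard_normal(N)
e_p = np.sum(np.abs(p) ** 2)

err_69 = 0.0
for k1, m1, k2, m2 in [(0, 0, 1, 2), (-2, 1, 3, -1), (4, -3, -1, 2)]:
    lhs = inner(shifted_pilot(p, k1, m1, e_p), shifted_pilot(p, k2, m2, e_p))
    rhs = np.exp(2j * np.pi * (m2 - m1) * k1 / N) * r(p, k2 - k1, m2 - m1, e_p)
    err_69 = max(err_69, abs(lhs - rhs))

err_70 = 0.0
for k, m in [(1, 2), (-3, 1), (5, -4)]:
    lhs = np.conj(r(p, k, m, e_p))
    rhs = np.exp(-2j * np.pi * m * k / N) * r(p, -k, -m, e_p)
    err_70 = max(err_70, abs(lhs - rhs))
```

Because the identities hold for any pilot vector, the check uses a random $\mathbf{p}$ rather than one satisfying the MMSE condition (65).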
Together, (68) and (69) imply that the elements within any rectangle of height $N_{\rm delay}$ and width $N_{\rm Dopp}$ in
Fig. 1 are orthonormal. In addition to being a CP-MMSE-PAT, $(\mathbf{p},\mathbf{B})$ satisfies $N_s = N - N_{\rm Dopp}N_{\rm delay}$,
which results in additional restrictions on $\mathbf{p}$ and $\mathbf{B}$ that are stated in Lemma 7.
Lemma 7: For a CP-MMSE-PAT with $N_s = N - N_{\rm Dopp}N_{\rm delay}$, either $|r_{0,N_{\rm Dopp}}| = 1$ or $|r_{N_{\rm delay},0}| = 1$.
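Proof: Let $\mathbf{W}_o$ be the matrix constructed from the columns $\{\mathbf{p}_{k,m},\ k \in \{0,-1,\dots,-N_{\rm delay}+1\},\ m \in \{-D,\dots,D\}\}$.
Since these columns form a rectangle of height $N_{\rm delay}$ and width $N_{\rm Dopp}$ in Fig. 1, we
know they are orthonormal. Furthermore, since these columns form a subset of the columns of $\mathbf{W}$, we
know that $\operatorname{rank}(\mathbf{W}) \ge N_{\rm Dopp}N_{\rm delay}$. But, since condition (64), i.e., $\mathbf{W}^H\mathbf{B} = \mathbf{0}$, implies that the
nullspace of $\mathbf{W}^H$ has a dimension of at least $N_s = N - N_{\rm Dopp}N_{\rm delay}$, i.e., that $\operatorname{rank}(\mathbf{W}) \le N_{\rm Dopp}N_{\rm delay}$,
we see that $\operatorname{rank}(\mathbf{W}) = N_{\rm Dopp}N_{\rm delay}$. Hence, the columns of $\mathbf{W}_o$ form an orthonormal basis for the
column space of $\mathbf{W}$, which implies
\begin{align}
\mathbf{p}_{k,m} &= \sum_{i=0}^{N_{\rm delay}-1}\sum_{j=-D}^{D}\langle\mathbf{p}_{k,m},\mathbf{p}_{-i,j}\rangle\,\mathbf{p}_{-i,j}, \quad \forall k \in \mathcal{N}_{\rm delay},\ \forall m \in \mathcal{N}_{\rm Dopp}, \tag{71}\\
&= \sum_{i=0}^{N_{\rm delay}-1}\sum_{j=-D}^{D} e^{j\frac{2\pi}{N}(j-m)k}\,r_{-i-k,\,j-m}\,\mathbf{p}_{-i,j}, \quad \forall k \in \mathcal{N}_{\rm delay},\ \forall m \in \mathcal{N}_{\rm Dopp}. \tag{72}
\end{align}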
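Now let $\mathbf{W}_e^1 = [\mathbf{p}_{0,-D-1}, \dots, \mathbf{p}_{-N_{\rm delay}+2,-D-1}]$. (See Fig. 1.) Considering that we can enclose these
elements in a height-$N_{\rm delay}$ and width-$N_{\rm Dopp}$ rectangle in Fig. 1, we can see that the columns of $\mathbf{W}_e^1$
form an orthonormal set, and that the columns of $\mathbf{W}_e^1$ are orthogonal to most columns in $\mathbf{W}_o$. Using
(72) to write the columns of $\mathbf{W}_e^1$ as a linear combination of those columns in $\mathbf{W}_o$ that are not orthogonal
to those in $\mathbf{W}_e^1$, we have
$$\mathbf{W}_e^1 = [\mathbf{p}_{0,D}, \mathbf{p}_{-1,D}, \dots, \mathbf{p}_{-N_{\rm delay}+1,D}]\,\mathbf{M}_1, \tag{73}$$
for $\mathbf{M}_1 \in \mathbb{C}^{N_{\rm delay}\times(N_{\rm delay}-1)}$ such that
$$\mathbf{M}_1 = \begin{bmatrix}
r_{0,N_{\rm Dopp}} & e^{-j\frac{2\pi}{N}N_{\rm Dopp}}\,r_{1,N_{\rm Dopp}} & \cdots & e^{-j\frac{2\pi}{N}N_{\rm Dopp}(N_{\rm delay}-2)}\,r_{N_{\rm delay}-2,N_{\rm Dopp}} \\
r_{-1,N_{\rm Dopp}} & e^{-j\frac{2\pi}{N}N_{\rm Dopp}}\,r_{0,N_{\rm Dopp}} & \cdots & e^{-j\frac{2\pi}{N}N_{\rm Dopp}(N_{\rm delay}-2)}\,r_{N_{\rm delay}-3,N_{\rm Dopp}} \\
\vdots & \vdots & \cdots & \vdots \\
r_{-N_{\rm delay}+1,N_{\rm Dopp}} & e^{-j\frac{2\pi}{N}N_{\rm Dopp}}\,r_{-N_{\rm delay}+2,N_{\rm Dopp}} & \cdots & e^{-j\frac{2\pi}{N}N_{\rm Dopp}(N_{\rm delay}-2)}\,r_{-1,N_{\rm Dopp}}
\end{bmatrix}.$$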
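Now, letting $\mathbf{W}_e^2 = [\mathbf{p}_{1,-D}, \dots, \mathbf{p}_{1,D-1}]$ and carrying out a similar procedure, we have
$$\mathbf{W}_e^2 = [\mathbf{p}_{-N_{\rm delay}+1,-D}, \mathbf{p}_{-N_{\rm delay}+1,-D+1}, \dots, \mathbf{p}_{-N_{\rm delay}+1,D}]\,\mathbf{M}_2, \tag{74}$$
for $\mathbf{M}_2 \in \mathbb{C}^{N_{\rm Dopp}\times(N_{\rm Dopp}-1)}$ such that
$$\mathbf{M}_2 = \begin{bmatrix}
r_{-N_{\rm delay},0} & e^{-j\frac{2\pi}{N}}\,r_{-N_{\rm delay},-1} & \cdots & e^{-j\frac{2\pi}{N}(N_{\rm Dopp}-2)}\,r_{-N_{\rm delay},-N_{\rm Dopp}+2} \\
e^{j\frac{2\pi}{N}}\,r_{-N_{\rm delay},1} & r_{-N_{\rm delay},0} & \cdots & e^{-j\frac{2\pi}{N}(N_{\rm Dopp}-3)}\,r_{-N_{\rm delay},-N_{\rm Dopp}+3} \\
\vdots & \vdots & \cdots & \vdots \\
e^{j\frac{2\pi}{N}(N_{\rm Dopp}-1)}\,r_{-N_{\rm delay},N_{\rm Dopp}-1} & e^{j\frac{2\pi}{N}(N_{\rm Dopp}-2)}\,r_{-N_{\rm delay},N_{\rm Dopp}-2} & \cdots & e^{j\frac{2\pi}{N}}\,r_{-N_{\rm delay},1}
\end{bmatrix}.$$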
Notice that the columns of $\mathbf{W}_e^1$ must be orthogonal to those in $\mathbf{W}_e^2$ since they can all be placed inside
a height-$N_{\rm delay}$ and width-$N_{\rm Dopp}$ rectangle in Fig. 1. Since the basis expansions of $\mathbf{W}_e^1$ and $\mathbf{W}_e^2$ share
the common basis vector $\mathbf{p}_{-N_{\rm delay}+1,D}$, the contribution from $\mathbf{p}_{-N_{\rm delay}+1,D}$ to either $\mathbf{W}_e^1$ or $\mathbf{W}_e^2$ must be
zero, i.e., either (75) or (76) must hold:
\begin{align}
r_{-1,N_{\rm Dopp}} = r_{-2,N_{\rm Dopp}} = \cdots = r_{-N_{\rm delay}+1,N_{\rm Dopp}} &= 0 \tag{75}\\
r_{-N_{\rm delay},1} = r_{-N_{\rm delay},2} = \cdots = r_{-N_{\rm delay},N_{\rm Dopp}-1} &= 0. \tag{76}
\end{align}
When (75) holds, $\mathbf{M}_1$ becomes upper triangular, and (73) implies
$$\mathbf{p}_{0,-D-1} = r_{0,N_{\rm Dopp}}\,\mathbf{p}_{0,D}, \tag{77}$$
in which case the unit-norm property of $\mathbf{p}_{0,-D-1}$ and $\mathbf{p}_{0,D}$ implies that $|r_{0,N_{\rm Dopp}}| = 1$. When (76) holds,
$\mathbf{M}_2$ becomes upper triangular, and (74) implies
$$\mathbf{p}_{1,-D} = r_{-N_{\rm delay},0}\,\mathbf{p}_{-N_{\rm delay}+1,-D} \tag{78}$$
in which case $|r_{-N_{\rm delay},0}| = 1$. Applying (70), this can be translated to $|r_{N_{\rm delay},0}| = 1$.
Stage (iii) – Establish Contradiction: Now we examine the implications of either $|r_{N_{\rm delay},0}| = 1$ or
$|r_{0,N_{\rm Dopp}}| = 1$ on the MMSE pilot vector $\mathbf{p}$. In each case, we deduce that $N_s \neq N - N_{\rm Dopp}N_{\rm delay}$, which
contradicts our original assumption, thereby completing the proof.
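We start with the first case, where $|r_{0,N_{\rm Dopp}}| = 1$. Since $r_{0,N_{\rm Dopp}} = e^{j\theta}$ for some $\theta \in \mathbb{R}$, from (66) and
(77), we have
$$p[i]\big(e^{-j\frac{2\pi}{N}N_{\rm Dopp}i} - e^{j\theta}\big) = 0, \quad \forall i \in \{0,\dots,N-1\}. \tag{79}$$
Thus, in order to avoid $\mathbf{p} = \mathbf{0}$, which would not satisfy the MMSE-PAT requirement (65), we must have
$\theta = -\frac{2\pi}{N}N_{\rm Dopp}q$ for some $q \in \{0,\dots,N-1\}$. In this case, (79) implies that $p[i]$ will be non-zero only
if $i = q + \frac{kN}{N_{\rm Dopp}}$ for $k \in \mathbb{Z}$ such that $\frac{kN}{N_{\rm Dopp}} \in \mathbb{Z}$. Now, for $k \in \mathbb{Z}$, we define
$$a_q[k] = \begin{cases} \big|p\big[q + \frac{kN}{N_{\rm Dopp}}\big]\big|^2 & \text{if } \frac{kN}{N_{\rm Dopp}} \in \mathbb{Z} \\ 0 & \text{else} \end{cases} \tag{80}$$
and use requirement (68) to claim that $\frac{1}{E_p}\sum_{i=0}^{N_{\rm Dopp}-1} a_q[i]\,e^{-j\frac{2\pi}{N_{\rm Dopp}}mi} = \delta[m]$, $\forall m \in \mathcal{N}_{\rm Dopp}$, which can be
met if and only if
$$a_q[i] = \frac{E_p}{N_{\rm Dopp}}, \quad \forall i \in \{0,\dots,N_{\rm Dopp}-1\}. \tag{81}$$
From (80), it follows that (81) can be met if and only if $\frac{N}{N_{\rm Dopp}} \in \mathbb{Z}$. Now, if $\frac{N}{N_{\rm Dopp}} \in \mathbb{Z}$, then one can
recognize the pilot sequence specified by (80) and (81) as being the "TDKD" MMSE-PAT scheme from
[7], [20], for which $N_s = N - (2N_{\rm delay}-1)N_{\rm Dopp} < N - N_{\rm Dopp}N_{\rm delay}$, where the inequality holds because $N_{\rm delay} > 1$.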
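The pilot structure singled out by (80) and (81), namely $N_{\rm Dopp}$ equal-magnitude impulses spaced $N/N_{\rm Dopp}$ samples apart, can be checked against requirement (68) numerically. The sketch below assumes NumPy; the function names and parameter values are hypothetical, and the impulse spacing is deliberately chosen to be at least $N_{\rm delay}$ so that the terms with $k \neq 0$ vanish, which is an assumption of this example.

```python
import numpy as np

def equispaced_impulse_pilot(N, n_dopp, q=0, e_p=1.0):
    """N_Dopp impulses of energy E_p/N_Dopp at indices q + k*N/N_Dopp (mod N)."""
    if N % n_dopp != 0:
        raise ValueError("N / N_Dopp must be an integer")
    p = np.zeros(N, dtype=complex)
    p[(q + np.arange(n_dopp) * (N // n_dopp)) % N] = np.sqrt(e_p / n_dopp)
    return p

def r(p, k, m, e_p):
    """r_{k,m} as in (67), with modulo-N indexing."""
    N = len(p)
    i = np.arange(N)
    return np.sum(p * np.conj(np.roll(p, -k)) * np.exp(-2j * np.pi * m * i / N)) / e_p

# Hypothetical parameters: impulse spacing 24/3 = 8 is at least N_delay = 4.
N, n_delay, n_dopp = 24, 4, 3
p = equispaced_impulse_pilot(N, n_dopp, q=1)

# Largest deviation from delta[k] delta[m] over the index sets of (68).
dev = max(
    abs(r(p, k, m, 1.0) - (1.0 if (k == 0 and m == 0) else 0.0))
    for k in range(-n_delay + 1, n_delay)
    for m in range(-n_dopp + 1, n_dopp)
)
edge = abs(r(p, 0, n_dopp, 1.0))     # |r_{0, N_Dopp}|, the first case of Lemma 7
```

For these values the data dimension quoted in the text is $N - (2N_{\rm delay}-1)N_{\rm Dopp} = 3$, well below $N - N_{\rm Dopp}N_{\rm delay} = 12$.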
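We continue with the second case, where $|r_{N_{\rm delay},0}| = 1$. Since $r_{N_{\rm delay},0} = e^{j\theta}$ for some $\theta \in \mathbb{R}$, from
(66), (70), and (78), it follows that
$$p[i] = e^{j\theta}\,p[i+N_{\rm delay}]. \tag{82}$$
Keeping in mind our modulo-$N$ assumption on time-domain indexing, let $L$ be the largest integer
in $\{1,\dots,N_{\rm delay}\}$ for which both $\frac{N}{L} \in \mathbb{Z}$ and $p[i] = e^{j\phi}\,p[i+L]$ for some $\phi \in \mathbb{R}$. Note that, if $\frac{N}{N_{\rm delay}} \in \mathbb{Z}$,
then $L = N_{\rm delay}$, else $L < N_{\rm delay}$. Furthermore, modulo-$N$ indexing implies $\phi = \frac{2\pi}{N}Lq$ for some $q \in \mathbb{Z}$.
Let $\check{\mathbf{p}}$ denote the $N$-point unitary discrete Fourier transform (DFT) of $\mathbf{p}$. For a sequence $\mathbf{p}$ obeying (82),
we have
\begin{align}
\check{p}[k] &= \frac{1}{\sqrt{N}}\sum_{i=0}^{N-1} p[i]\,e^{-j\frac{2\pi}{N}ik} \tag{83}\\
&= \frac{1}{\sqrt{N}}\sum_{n=0}^{L-1} p[n]\,e^{-j\frac{2\pi}{N}nk}\sum_{m=0}^{N/L-1} e^{-j\frac{2\pi}{N/L}(k-q)m} \tag{84}
\end{align}
and hence $\check{p}[k] = 0$ for $k \notin \{q, q+\frac{N}{L}, \dots, q+\frac{N}{L}(L-1)\}$. The MMSE requirement (65) can be written
in terms of $\check{\mathbf{p}}$ as [7], [20]
$$\frac{1}{E_p}\sum_{i=0}^{N-1}\check{p}[i]\,\check{p}^*[i-k]\,e^{-j\frac{2\pi}{N}mi} = \delta[k]\,\delta[m] \quad \forall k \in \mathcal{N}_{\rm Dopp},\ \forall m \in \mathcal{N}_{\rm delay}. \tag{85}$$
Defining $\big|\check{p}\big[q + \frac{nN}{L}\big]\big|^2 = a_q[n]$ for $n \in \{0,\dots,L-1\}$ and using (85) with $k = 0$, we require
$$\frac{1}{E_p}\sum_{n=0}^{L-1} a_q[n]\,e^{-j\frac{2\pi}{N}(\frac{nN}{L}+q)m} = \delta[m], \quad \forall m \in \mathcal{N}_{\rm delay}. \tag{86}$$
Since the magnitude of the left side of (86) is $L$-periodic in $m$, (86) cannot be satisfied when $L < N_{\rm delay}$.
Now, if $L = N_{\rm delay}$, then the only sequence $\{a_q[i]\}$ satisfying the requirement (86) is $a_q[n] = c$, $\forall n \in \{0,\dots,N_{\rm delay}-1\}$,
for a constant $c$. This can be recognized as the FDKD MMSE-PAT scheme from [7],
[20], for which $N_s = N - (2N_{\rm Dopp}-1)N_{\rm delay} < N - N_{\rm Dopp}N_{\rm delay}$, where the inequality holds because $N_{\rm Dopp} > 1$.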
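APPENDIX F
PROOF OF LEMMA 5
First, we establish that, for the PAT schemes satisfying the hypothesis, the total estimation error satisfies
$\sigma_e^2 \le \frac{\kappa}{\rho}$, $\forall\rho$. Constructing $\mathbf{B}_p$ from an orthonormal basis for the column space of $\mathbf{P}\mathbf{U}$, we consider the
projection
$$\mathbf{y}_p = \mathbf{B}_p^H\mathbf{y} = \sqrt{\rho}\,\mathbf{B}_p^H\mathbf{P}\mathbf{U}\boldsymbol{\lambda} + \sqrt{\rho}\,\mathbf{B}_p^H\mathbf{D}\mathbf{U}\boldsymbol{\lambda} + \mathbf{B}_p^H\mathbf{v}. \tag{87}$$
Since the PAT is lossless linearly separable, satisfying (30), the projection $\mathbf{y}_p$ in (87) captures all the pilot
energy and $\mathbf{B}_p^H\mathbf{D}\mathbf{U} = \mathbf{0}$. Denoting $\mathbf{G} = \mathbf{B}_p^H\mathbf{P}\mathbf{U}$ and $\mathbf{v}_p = \mathbf{B}_p^H\mathbf{v}$, we have
$$\mathbf{y}_p = \sqrt{\rho}\,\mathbf{G}\boldsymbol{\lambda} + \mathbf{v}_p. \tag{88}$$
Since $\mathbf{P}\mathbf{U}$ is full rank, it follows that the matrix $\mathbf{G}$ is full rank. Note that $\sigma_e^2 = E\{\|\boldsymbol{\lambda} - \hat{\boldsymbol{\lambda}}\|^2\}$, where $\hat{\boldsymbol{\lambda}}$
denotes the LMMSE estimate of $\boldsymbol{\lambda}$. Using the zero-forcing estimate from (88) to upper bound $\sigma_e^2$, we
have
$$\sigma_e^2 \le \frac{1}{\rho}\operatorname{tr}\{(\mathbf{G}^H\mathbf{G})^{-1}\}. \tag{89}$$
Since $\mathbf{G}$ is full rank, we have $\operatorname{tr}\{(\mathbf{G}^H\mathbf{G})^{-1}\} \le \kappa$ for some $\kappa \in \mathbb{R}$. Now, the desired result follows from
the application of Lemma 4.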
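The bound (89) can be illustrated numerically by comparing the LMMSE error for the model (88) with the zero-forcing error. The sketch below assumes NumPy, zero-mean $\boldsymbol{\lambda}$ with a diagonal covariance, and unit-variance white noise; the random square matrix `G`, the covariance values, the seed, and the function names are illustrative stand-ins rather than quantities from the paper.

```python
import numpy as np

def lmmse_error(G, R_lam, rho):
    """tr{(R_lam^{-1} + rho G^H G)^{-1}}: LMMSE error for y = sqrt(rho) G lam + v."""
    A = np.linalg.inv(R_lam) + rho * (G.conj().T @ G)
    return np.trace(np.linalg.inv(A)).real

def zf_bound(G, rho):
    """(1/rho) tr{(G^H G)^{-1}}: the zero-forcing error on the right side of (89)."""
    return np.trace(np.linalg.inv(G.conj().T @ G)).real / rho

rng = np.random.default_rng(2)       # fixed seed, for reproducibility only
n = 4                                # stands in for N_Dopp * N_delay
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R_lam = np.diag(rng.uniform(0.5, 2.0, size=n))

rhos = [1.0, 10.0, 1e2, 1e4]
gaps = [zf_bound(G, rho) - lmmse_error(G, R_lam, rho) for rho in rhos]
scaled = [rho * lmmse_error(G, R_lam, rho) for rho in rhos]
kappa = zf_bound(G, 1.0)             # tr{(G^H G)^{-1}}
```

In this sketch each entry of `scaled` stays below `kappa`, which is the statement $\sigma_e^2 \le \kappa/\rho$ used in the proof.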
REFERENCES
[1] L. Zheng and D. Tse, “Communication over the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna
channel,” IEEE Trans. on Information Theory, vol. 48, pp. 359–383, Feb. 2002.
[2] H. Vikalo, B. Hassibi, B. Hochwald, and T. Kailath, “On the capacity of frequency-selective channels in training-based
transmission schemes,” IEEE Trans. on Signal Processing, pp. 2572–2583, Sept. 2004.
[3] Y. Liang and V. Veeravalli, “Capacity of noncoherent time-selective Rayleigh-fading channels,” IEEE Trans. on Information
Theory, vol. 50, pp. 3095–3110, Dec. 2004.
[4] L. Tong, B. M. Sadler, and M. Dong, “Pilot-assisted wireless transmissions,” IEEE Signal Processing Magazine, vol. 21,
pp. 12–25, Nov. 2004.
[5] B. Hassibi and B. M. Hochwald, “How much training is needed in multiple-antenna wireless links,” IEEE Trans. on
Information Theory, vol. 49, pp. 951–963, Apr. 2003.
[6] A. P. Kannu and P. Schniter, “Capacity analysis of MMSE pilot patterns for doubly selective channels,” in Proc. IEEE
Workshop on Signal Processing Advances in Wireless Communication, 2005.
[7] A. P. Kannu and P. Schniter, “Design and analysis of MMSE pilot-aided cyclic-prefixed block transmissions for doubly
selective channels,” IEEE Trans. on Signal Processing (to appear).
[8] A. Lapidoth, “On the asymptotic capacity of stationary Gaussian fading channels,” IEEE Trans. on Information Theory,
vol. 51, pp. 437–446, Feb. 2005.
[9] R. Etkin and D. N. C. Tse, “Degrees of freedom in some underspread MIMO fading channels,” IEEE Trans. on Information
Theory, vol. 52, pp. 1576–1608, Apr. 2006.
[10] G. Durisi, H. Bolcskei, and S. Shamai, “Capacity of underspread WSSUS fading channels in the wideband regime,” in
Proc. IEEE Internat. Symposium on Information Theory, July 2006.
[11] W. Kozek, Matched Weyl-Heisenberg Expansions of Nonstationary Environments. PhD thesis, Vienna Univ. of Technology,
March 1997.
[12] K. Liu, T. Kadous, and A. M. Sayeed, “Orthogonal time-frequency signaling over doubly dispersive channels,” IEEE Trans.
on Information Theory, vol. 50, pp. 2583–2603, Nov. 2004.
[13] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[14] I. E. Telatar, “Capacity of multi-antenna Gaussian channels,” European Trans. on Telecommunications, vol. 10, pp. 585–595,
Nov. 1999.
[15] T. L. Marzetta and B. M. Hochwald, “Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading,”
IEEE Trans. on Information Theory, vol. 45, pp. 139–157, Jan. 1999.
[16] R. Kennedy, Fading Dispersive Communication Channels. New York: Wiley, 1969.
[17] A. Lapidoth and S. Moser, “Capacity bounds via duality with applications to multiple-antenna systems on flat-fading
channels,” IEEE Trans. on Information Theory, vol. 49, pp. 2426–2467, Oct. 2003.
[18] J. H. Manton, I. Y. Mareels, and Y. Hua, “Affine precoders for reliable communications,” in Proc. IEEE Internat. Conf.
on Acoustics, Speech, and Signal Processing, vol. 5, pp. 2749–2752, June 2000.
[19] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. New York: Cambridge University Press, 2005.
[20] A. P. Kannu and P. Schniter, “MSE-optimal training for linear time-varying channels,” in Proc. IEEE Internat. Conf. on
Acoustics, Speech, and Signal Processing, 2005.
[21] X. Ma, G. B. Giannakis, and S. Ohno, “Optimal training for block transmissions over doubly-selective wireless fading
channels,” IEEE Trans. on Signal Processing, vol. 51, pp. 1351–1366, May 2003.
[22] H. Weingarten, Y. Steinberg, and S. Shamai (Shitz), “Gaussian codes and weighted nearest neighbor decoding in fading
multiple-antenna channels,” IEEE Trans. on Information Theory, vol. 50, pp. 1665–1686, Aug. 2004.