
Louisiana State University

LSU Digital Commons

LSU Doctoral Dissertations Graduate School

March 2021

Channel Estimation in Multi-user Massive MIMO Systems by Expectation Propagation based Algorithms

Mohammed Rashid Louisiana State University and Agricultural and Mechanical College


Part of the Electrical and Electronics Commons, Signal Processing Commons, and the Systems and Communications Commons

Recommended Citation: Rashid, Mohammed, "Channel Estimation in Multi-user Massive MIMO Systems by Expectation Propagation based Algorithms" (2021). LSU Doctoral Dissertations. 5461. https://digitalcommons.lsu.edu/gradschool_dissertations/5461

This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Doctoral Dissertations by an authorized graduate school editor of LSU Digital Commons.


CHANNEL ESTIMATION IN MULTI-USER MASSIVE MIMO SYSTEMS BY EXPECTATION PROPAGATION BASED ALGORITHMS

A Dissertation

Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy

in

The Division of Electrical & Computer Engineering

by

Mohammed Rashid

B.Sc., NWFP University of Engineering & Technology, 2012

M.Sc., Louisiana State University, 2018

May 2021

To my parents, Zahid Mahmood and Zila Huma, and my siblings, Rafia Mahmood and Mohammed Sajid.


Acknowledgments

I thank my advisor, Dr. Morteza Naraghi-Pour, for his generous advice and infinite support throughout my study period at Louisiana State University (LSU). His priceless guidance and discussions have made this work possible.

I am also thankful to all my committee members, Dr. Shuangqing Wei, Dr. Xuebin Liang, and Dr. Xiangwei Zhou, for their valuable feedback and support.

Finally, I thank my beloved parents for their exceptional love, support, and prayers, which have always been a strong source of encouragement for me. I am also grateful to my siblings for their motivation, unbiased love, and sincere friendship throughout my study period at LSU.


Table of Contents

ACKNOWLEDGMENTS

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION

1.1 MIMO to Massive MIMO

1.2 Massive MIMO System Model

1.3 Channel Estimation in Massive MIMO System

1.4 Outline of this Dissertation

2 SEMI-BLIND CHANNEL ESTIMATION FOR MULTI-CELL MASSIVE MIMO SYSTEMS ON TIME-VARYING CHANNELS

2.1 Introduction

2.2 System Model

2.3 Semi-blind Expectation Propagation Algorithm

2.4 Simulation Results

3 CLUSTERED SPARSE CHANNEL ESTIMATION FOR MASSIVE MIMO SYSTEMS BY EXPECTATION MAXIMIZATION-PROPAGATION (EM-EP)

3.1 Introduction

3.2 System Model

3.3 Expectation Propagation Algorithm

3.4 Expectation Maximization Algorithm

3.5 Simulation Results

4 CONCLUSIONS

APPENDIX

A INFERENCE BY MESSAGE PASSING ON GRAPHICAL MODELS

B PROOF OF LEMMA 1

C PROOF OF LEMMA 2

D DERIVING THE MARGINALS IN (3.57) AND (3.58)

E COPYRIGHT INFORMATION

REFERENCES

VITA

List of Figures

1.1 Multi-cell massive MIMO network.

1.2 CSI estimation schemes: (a) TDD frame, (b) FDD frame.

2.1 Factor graph illustrations of (a) the true posterior distribution in (2.11), and (b) the approximated posterior distribution in (2.18).

2.2 EP steps for updating factor $q^{O}_t(s_t, h_t)$.

2.3 EP steps for updating factors $q^{R}_{t-1}(h_{t-1})$ and $q^{F}_t(h_t)$.

2.4 Channel estimation error versus the receiver's antenna array size $M$, with parameters $K = 8$, $T_d = 64$, $T_p = K$, $a = 0.1$, $\rho = 0$.

2.5 SER versus the receiver's antenna array size $M$, with parameters $K = 8$, $T_d = 64$, $T_p = K$, $a = 0.1$, $\rho = 0$.

2.6 Channel estimation error versus the receiver's antenna array size $M$, with parameters $K = 8$, $T_d = 64$, $T_p = K$, $a = 0.1$, $f_d = 0.01$.

2.7 SER versus the receiver's antenna array size $M$, with parameters $K = 8$, $T_d = 64$, $T_p = K$, $a = 0.1$, $f_d = 0.01$.

2.8 Channel estimation error versus the cross gain $a$ of users in other cells, with parameters $K = 8$, $T_d = 64$, $T_p = K$, $f_d = 0.01$, $\rho = 0.4$.

2.9 SER versus the cross gain $a$ of users in other cells, with parameters $K = 8$, $T_d = 64$, $T_p = K$, $f_d = 0.01$, $\rho = 0.4$.

2.10 Channel estimation error versus the unknown data symbol length $T_d$, with parameters $K = 8$, $M = 64$, $T_p = K$, $f_d = 0.01$, $\rho = 0.4$.

2.11 SER versus the unknown data symbol length $T_d$, with parameters $K = 8$, $M = 64$, $T_p = K$, $f_d = 0.01$, $\rho = 0.4$.

2.12 Channel estimation error versus the normalized Doppler shift $f_d$, with parameters $K = 8$, $M = 64$, $T_d = 64$, $\rho = 0.4$, $a = 0.3$.

2.13 SER versus the normalized Doppler shift $f_d$, with parameters $K = 8$, $M = 64$, $T_d = 64$, $\rho = 0.4$, $a = 0.3$.

3.1 Factor graph illustrations of (a) the true posterior distribution in (3.9), and (b) the approximated posterior distribution in (3.20).

3.2 EP steps for updating factor $q_{2,m}(w_m, z_m)$.

3.3 EP steps for updating factors $q^{R}_{3,m-1}(z_{m-1})$ and $q^{F}_{3,m}(z_m)$.

3.4 Magnitude of the elements in $w$ for four independent trials with $G = 128$, $M = 200$, $N = 48$, $L_s = 3$, $L_p = 10$, and SNR = 10 dB.

3.5 Channel estimation error vs. number of pilot symbols $N$, with parameters $G = 128$, $M = 200$, SNR = 10 dB.

3.6 Channel estimation error vs. SNR (dB), with parameters $G = 128$, $M = 200$, $N = 64$.

3.7 Channel estimation error vs. angular spread $A$, with parameters $G = 128$, $M = 200$, $N = 64$.

3.8 Channel estimation error vs. grid length $M$, with parameters $G = 150$, $N = 64$, SNR = 10 dB.

A.1 Probabilistic graphical models: (a) a Bayesian network, (b) a Markov random field, and (c) a factor graph.

A.2 Messages generated in each iteration of the SP algorithm.

A.3 A piece of the factor graph representing the joint distribution in (A.23).

Abstract

Massive multiple-input multiple-output (MIMO) technology uses large antenna arrays with tens or hundreds of antennas at the base station (BS) to achieve high spectral efficiency, high diversity, and high capacity. These benefits, however, rely on obtaining accurate channel state information (CSI) at the receiver for both the uplink and downlink channels. Traditionally, pilot sequences are transmitted and used at the receiver to estimate the CSI. Since the length of the pilot sequences scales with the number of transmit antennas, downlink channel estimation in massive MIMO systems requires long pilot sequences, which reduces spectral efficiency. In addition, sharing pilots among adjacent cells causes the so-called pilot contamination.

In this dissertation, we first review the problem of channel estimation in massive MIMO systems. Next, we study semi-blind uplink channel estimation for spatially correlated time-varying channels. The proposed method uses the transmitted data symbols as virtual pilots to enhance channel estimation. An expectation propagation (EP) algorithm is developed to iteratively approximate the joint a posteriori distribution of the unknown channel matrix and the transmitted data symbols with a distribution from an exponential family. This distribution is then used for direct estimation of the channel matrix and detection of the data symbols. A modified version of the Kalman filtering algorithm, referred to as KF-M, emerges from our EP derivation and is used to initialize our algorithm. Simulation results demonstrate that the channel estimation error and the symbol error rate of the proposed algorithm improve as the number of BS antennas or the number of data symbols in the transmitted frame increases. Moreover, the proposed algorithms can mitigate the effects of pilot contamination as well as time variations of the channel.

Next, we study the problem of downlink channel estimation in multi-user massive MIMO systems. Our approach is based on Bayesian compressive sensing, in which the clustered sparse structure of the channel in the angular domain is exploited to reduce the pilot overhead. To capture the clustered structure, we employ a conditionally independent and identically distributed Bernoulli-Gaussian prior on the sparse vector representing the channel, and a Markov prior on its support vector. An EP algorithm is developed to approximate the intractable joint distribution of the sparse vector and its support with a distribution from an exponential family. This distribution is then used for direct estimation of the channel. The EP algorithm requires the model parameters, which are unknown; we estimate them using the expectation maximization (EM) algorithm. Simulation results show that the proposed combination of EM and EP, referred to as the EM-EP algorithm, outperforms several recently proposed algorithms in the literature.

Chapter 1

Introduction

Global use of mobile devices that provide wireless internet connectivity, such as tablets, smartphones, and laptops, is growing rapidly every year. According to [1], the number of smartphones in use worldwide would increase from 3.2 billion in 2016 to 6.2 billion by 2021, and mobile video streaming would reach 38 exabytes per month, making up 78% of the total mobile data traffic. Besides the growing use of mobile devices, the internet-of-things technology [2] has recently been introduced for building smart cities and industries, in which electronic machines such as mobile robots, drones, surveillance cameras, and sensors are interconnected through wireless internet-based infrastructures [3, 4]. Thus, data traffic over the wireless internet is expected to keep increasing, which requires higher throughput and spectral efficiency in wireless networks. Given the already congested and limited wireless spectrum, these two quantities can be increased either by deploying more access points in a coverage area or by increasing the number of antennas on each access point. In this work, we focus on the latter option and study massive multiple-input multiple-output (MIMO) technology [5, 6] for cellular networks, which uses tens or hundreds of antennas at the base station (access point) serving the users and is now a core technology of fifth-generation (5G) cellular networks.

1.1 MIMO to Massive MIMO

In wireless communication, radio signals propagating through wireless channels undergo multipath fading and shadowing due to the scatterers and obstacles present in the environment. Using multiple antennas at the transmitter and/or receiver increases the spatial diversity of the communication system, mitigates channel fading, and improves reliability. Furthermore, using multiple antennas, particularly at the transmitter, enables the simultaneous transmission of multiple independent data streams through spatial multiplexing, which increases the spectral efficiency of the communication system.

MIMO technologies can be broadly classified into three types: point-to-point MIMO [7–9], multi-user MIMO [10–13], and massive MIMO [14–17]. Point-to-point MIMO, also known as single-user MIMO, uses multiple antennas at both ends of the link. With $M$ antennas at the transmitter (base station) and $K$ antennas at the receiver (user), the capacity usually scales with $\min(M, K)$ in an independent and identically distributed (i.i.d.) Rayleigh fading environment [16]. However, point-to-point MIMO is not favored in cellular networks because of the following drawbacks. Firstly, when a line-of-sight link exists between the transmitter and receiver, the $M \times K$ channel matrix $\mathbf{H}$ has rank one, which allows only one data stream [16]. Secondly, near the cell edges, where the signal power received by a user is low, the capacity of these systems is greatly reduced [18]. Finally, due to size and cost constraints in cellular networks, the number of antennas $K$ at the user is typically much smaller than the number of antennas $M$ at the base station (BS). Hence, the capacity of a point-to-point MIMO channel usually scales with $\min(M, K) = K$, and adding more antennas at the BS for a fixed $K$ does not improve the capacity scaling.
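The saturation described above can be checked numerically. The sketch below assumes i.i.d. Rayleigh fading, no CSI at the transmitter, and equal power across the $M$ transmit antennas, so the ergodic capacity is $\mathbb{E}[\log_2\det(\mathbf{I}_K + (\mathrm{SNR}/M)\mathbf{H}^H\mathbf{H})]$. As $M$ grows with $K$ fixed, $\mathbf{H}^H\mathbf{H}/M \to \mathbf{I}_K$ and the capacity approaches $K\log_2(1+\mathrm{SNR})$, which by Jensen's inequality is also an upper bound for every $M$. The function name, SNR, and antenna counts are hypothetical choices for illustration.

```python
import numpy as np

def p2p_ergodic_capacity(M, K, snr, trials=500, seed=0):
    """Ergodic capacity (bits/s/Hz) of an M-transmit, K-receive i.i.d. Rayleigh link.

    Assumes equal power across the M transmit antennas (no CSI at the transmitter):
    C = E[log2 det(I_K + (snr / M) H^H H)], H an M x K matrix of CN(0, 1) entries.
    """
    rng = np.random.default_rng(seed)
    caps = np.empty(trials)
    for i in range(trials):
        H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        A = np.eye(K) + (snr / M) * (H.conj().T @ H)
        _, logdet = np.linalg.slogdet(A)      # natural log of |det(A)|
        caps[i] = logdet / np.log(2)
    return caps.mean()

K, snr = 2, 10.0                              # hypothetical: 2 user antennas, 10 dB SNR
limit = K * np.log2(1 + snr)                  # large-M limit (and Jensen upper bound)
for M in (2, 8, 64, 256):
    print(M, round(p2p_ergodic_capacity(M, K, snr), 2), round(limit, 2))
```

The printed capacities increase with $M$ but level off near $K\log_2(1+\mathrm{SNR})$: the number of spatial streams stays at $K$ regardless of the BS array size.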

Multi-user MIMO replaces the $K$-antenna user of point-to-point MIMO with $K$ separate single-antenna users spatially distributed in the cell. This transition has two major advantages. Firstly, only single-antenna users are required, which is consistent with the size and cost constraints. Secondly, since the users in a cell are typically separated by distances much larger than a wavelength, their angular separation as seen from the BS usually exceeds the angular resolution of the BS antenna array, so multi-user MIMO can work well even in a line-of-sight (LOS) environment [13]. As a result, multi-user MIMO has been adopted in standards such as 4G long-term evolution (LTE) [19], 802.11 (WiFi) [20], and 802.16 (WiMAX) [21]. However, multi-user MIMO has two major drawbacks. Firstly, its access points use fewer than 10 antennas; for instance, WiFi uses 2 or 4 antennas and 4G LTE uses 8. With so few antennas, only a moderate increase in spectral efficiency is achieved. Secondly, there is inter-user interference in the communication between the base station (access point) and the users. Thus, to achieve the capacity promised by MIMO, successive interference cancellation [22, 23] is required in the uplink and dirty paper coding [24] is required in the downlink. However, these methods have high computational complexity, which increases exponentially with the system size $M \times K$.

Massive MIMO is a form of multi-user MIMO that was first proposed in [14] and is now a core technology of 5G cellular networks [25]. It uses a much larger number of antennas at the BS, typically tens or hundreds. This increase in the number of antennas brings the following advantages. Firstly, deploying many antennas at the BS yields large spatial diversity and spatial multiplexing gains, which in turn increase the spectral efficiency of the system [26]. Secondly, with a large number of antennas, the energy radiated by the BS can be focused sharply toward the users through precoding (also called beamforming), so that little energy is lost in unwanted directions. To achieve the same spectral efficiency, the downlink transmit power can be reduced in proportion to the number of BS antennas. In the uplink, owing to coherent detection, each user can also lower its transmit power in proportion to the number of BS antennas [15]. This increases the energy efficiency of the system. Finally, with a large number of antennas at the BS, the channel vectors of different users become nearly mutually orthogonal, a property known as favorable propagation. As a result, the inter-user interference in a cell can be removed with simple linear detectors or precoders at the BS. This property is a consequence of the law of large numbers [14]; a closely related effect, known as channel hardening, is that the squared norm of each user's channel vector concentrates around its mean.

Due to these advantages, massive MIMO technology has attracted the interest of many researchers over the past decade, and various challenges, including channel estimation [27–30], achievable data rates [31–33], security [34, 35], and energy and spectral efficiency [26, 36], have been considered in designing these systems. In this work, we focus on the channel estimation problem in massive MIMO systems. In Section 1.2, we emphasize its importance by reviewing the uplink and downlink signal processing techniques used in these systems, which rely on CSI at the BS. Since in practice the required CSI has to be estimated, Section 1.3 discusses the challenges of estimating the CSI in these systems.

Figure 1.1. Multi-cell massive MIMO network

Notations: In this chapter, small letters ($x$) are used for scalars, bold small letters ($\mathbf{x}$) for vectors, and bold capital letters ($\mathbf{X}$) for matrices. $\mathbb{C}$ represents the set of complex numbers. The superscripts $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, and $(\cdot)^{-1}$ represent the transpose, Hermitian transpose, complex conjugate, and matrix inverse, respectively. A complex Gaussian distribution on $\mathbf{a}$ with zero mean and covariance matrix $\mathbf{R}$ is denoted by $\mathcal{CN}(\mathbf{a}|\mathbf{0},\mathbf{R})$. $\mathbf{I}_N$ denotes the $N \times N$ identity matrix. Finally, $\|\mathbf{x}\|$ denotes the $\ell_2$ norm of the vector $\mathbf{x}$.

1.2 Massive MIMO System Model

Consider a multi-user MIMO network made up of $L$ cells, each with its own BS and $K$ users. Every BS has $M$ antennas and each user has a single-antenna transceiver. We assume that $M \gg K$, i.e., the BS is equipped with a large number of antennas compared to the number of users. At time $t$, the channel gain between the $m$-th antenna of the $l$-th BS and the $k$-th user in the $i$-th cell is denoted by $h_{limk}(t)$, as shown in Fig. 1.1. Each channel gain can be written as

$$h_{limk}(t) = g_{limk}(t)\sqrt{\beta_{lik}}, \tag{1.1}$$

where $g_{limk}(t)$ models the small-scale narrowband fading between the $k$-th user in cell $i$ and the $m$-th antenna of BS $l$, and $\beta_{lik}$ models the large-scale fading caused by geometric attenuation and shadowing. We assume that $g_{limk}(t)$ is a wide-sense stationary complex Gaussian process with zero mean and unit power, and that $\beta_{lik}$ is a known constant independent of the antenna index $m$. Let $\mathbf{g}_{lik}(t) = [g_{li1k}(t), g_{li2k}(t), \ldots, g_{liMk}(t)]^T$ denote the $M \times 1$ small-scale fading vector from the $k$-th user in cell $i$ to the antenna array of BS $l$. Collecting these vectors for the $K$ users in cell $i$ gives the $M \times K$ fading matrix $\mathbf{G}_{li}(t) = [\mathbf{g}_{li1}(t), \ldots, \mathbf{g}_{liK}(t)]$. The overall channel matrix between BS $l$ and the users in cell $i$ is given by

$$\mathbf{H}_{li}(t) = \mathbf{G}_{li}(t)\mathbf{D}_{li}^{1/2}, \tag{1.2}$$

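where $\mathbf{D}_{li} = \mathrm{diag}\{\beta_{li1}, \beta_{li2}, \ldots, \beta_{liK}\}$. Under favorable propagation, i.e., when $M \to \infty$ for a fixed $K$ (e.g., with i.i.d. Rayleigh fading), the channel vectors of the users (the columns of $\mathbf{H}_{li}$) become nearly mutually orthogonal and we get

$$\begin{aligned}
\mathbf{H}_{li}^H(t)\mathbf{H}_{li}(t) &= \mathbf{D}_{li}^{1/2}\mathbf{G}_{li}^H(t)\mathbf{G}_{li}(t)\mathbf{D}_{li}^{1/2} \\
&\approx M\mathbf{D}_{li}^{1/2}\mathbf{I}_K\mathbf{D}_{li}^{1/2} = M\mathbf{D}_{li}.
\end{aligned} \tag{1.3}$$

For simplicity, in the following we assume a single-cell ($L = 1$) scenario and, for notational convenience, drop the indices $l$ and $i$ from the channel model in (1.2) and from (1.3).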
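The approximation in (1.3) can be checked numerically. The sketch below assumes i.i.d. Rayleigh fading, i.e., the entries of $\mathbf{G}$ are independent $\mathcal{CN}(0,1)$ (a stronger assumption than the wide-sense stationarity stated above), draws $\mathbf{H} = \mathbf{G}\mathbf{D}^{1/2}$ for hypothetical large-scale fading coefficients, and reports the relative deviation of $\mathbf{H}^H\mathbf{H}/M$ from $\mathbf{D}$. The function name and the values of $\beta_k$ are illustrative only.

```python
import numpy as np

def gram_deviation(M, beta, seed=0):
    """Relative error ||H^H H / M - D||_F / ||D||_F for one draw of H = G D^{1/2}.

    Assumes i.i.d. Rayleigh fading: the entries of G are independent CN(0, 1).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    K = beta.size
    G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    H = G * np.sqrt(beta)                 # column k scaled by sqrt(beta_k): H = G D^{1/2}
    D = np.diag(beta)
    return np.linalg.norm(H.conj().T @ H / M - D) / np.linalg.norm(D)

beta = [1.0, 0.5, 0.2, 0.1]               # hypothetical large-scale fading coefficients
for M in (16, 64, 256, 1024, 4096):
    print(M, round(gram_deviation(M, beta), 4))
```

Each entry of $\mathbf{G}^H\mathbf{G}/M - \mathbf{I}_K$ has variance $1/M$, so the printed deviation shrinks roughly as $1/\sqrt{M}$, both for the diagonal terms (channel hardening) and the off-diagonal terms (favorable propagation).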


1.2.1 Uplink Transmission

In uplink (UL) communication, the $K$ users transmit their data to the BS in their cell. At time $t$, the signal vector received at the BS is given by

$$\mathbf{y}_u(t) = \sqrt{\gamma_u}\,\mathbf{H}(t)\mathbf{s}(t) + \mathbf{w}'(t), \tag{1.4}$$

where $\gamma_u$ is the UL transmit power and $\mathbf{s}(t) = [s_1(t), s_2(t), \ldots, s_K(t)]^T$ contains the symbols transmitted by the users. We assume that $s_k(t) \in \mathcal{A}_M$, where $\mathcal{A}_M$ is an alphabet of M-ary modulated symbols with zero mean and unit average energy. Furthermore, the symbols of different users are assumed to be selected independently from $\mathcal{A}_M$, i.e., $\mathbb{E}[\mathbf{s}(t)\mathbf{s}^H(t)] = \mathbf{I}_K$. The noise term is modeled as complex Gaussian, i.e., $\mathbf{w}'(t) \sim \mathcal{CN}(\mathbf{w}'|\mathbf{0}, \mathbf{I}_M)$. Assuming that the CSI is perfectly known at the BS, the instantaneous sum capacity achievable in the UL is given by

$$\begin{aligned}
C_u &= \log_2\det\left(\mathbf{I}_K + \gamma_u\mathbf{H}^H(t)\mathbf{H}(t)\right) \\
&\approx \sum_{k=1}^{K}\log_2\left(1 + M\gamma_u\beta_k\right),
\end{aligned} \tag{1.5}$$

where the approximation in (1.5) follows from the favorable propagation condition (1.3). Note that in (1.5), for a given UL transmit power $\gamma_u$ and large-scale fading coefficients $\beta_k$, the UL capacity can be increased by increasing the number $M$ of BS antennas.
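The quality of the approximation in (1.5) can be examined with a short Monte Carlo sketch that compares the exact log-determinant with the favorable-propagation sum. It assumes i.i.d. Rayleigh fading in $\mathbf{G}$; the function name and the values of $\gamma_u$ and $\beta_k$ are hypothetical.

```python
import numpy as np

def uplink_sum_capacity(M, beta, gamma_u, trials=200, seed=0):
    """Return (exact, approx) UL sum capacity in bits/s/Hz, cf. (1.5).

    exact : Monte Carlo mean of log2 det(I_K + gamma_u H^H H), H = G D^{1/2},
            assuming i.i.d. CN(0, 1) entries in G.
    approx: sum_k log2(1 + M gamma_u beta_k), the favorable-propagation form.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    K = beta.size
    exact = 0.0
    for _ in range(trials):
        G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        H = G * np.sqrt(beta)
        _, logdet = np.linalg.slogdet(np.eye(K) + gamma_u * (H.conj().T @ H))
        exact += logdet / np.log(2)
    approx = float(np.sum(np.log2(1.0 + M * gamma_u * beta)))
    return exact / trials, approx

beta = [1.0, 0.5, 0.2, 0.1]   # hypothetical large-scale fading coefficients
for M in (16, 128, 1024):
    print(M, uplink_sum_capacity(M, beta, gamma_u=1.0))
```

For small $M$ the exact capacity falls noticeably below the approximation, because the residual correlation between user channels is not negligible; the gap closes as $M$ grows.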

In a massive MIMO system, as the number of BS antennas grows to infinity, simple matched-filter (MF) detection becomes optimal for achieving the UL capacity in (1.5) and for eliminating the inter-user interference in a cell. The received signal $\mathbf{y}_u(t)$ is multiplied by the conjugate transpose of the channel matrix $\mathbf{H}(t)$ to obtain

$$\begin{aligned}
\mathbf{x}(t) = \mathbf{H}^H(t)\mathbf{y}_u(t) &= \sqrt{\gamma_u}\,\mathbf{H}^H(t)\mathbf{H}(t)\mathbf{s}(t) + \mathbf{H}^H(t)\mathbf{w}'(t) \\
&\approx M\sqrt{\gamma_u}\,\mathbf{D}\mathbf{s}(t) + \mathbf{w}(t),
\end{aligned} \tag{1.6}$$

where $\mathbf{x}(t) = [x_1(t), x_2(t), \ldots, x_K(t)]^T$ contains the separated signals of the individual users. The approximation in (1.6) follows from the favorable propagation condition (1.3). Further, under favorable propagation, the noise term $\mathbf{w}(t) = \mathbf{H}^H(t)\mathbf{w}'(t)$ is distributed as $\mathcal{CN}(\mathbf{w}|\mathbf{0}, \boldsymbol{\Sigma})$ with covariance matrix $\boldsymbol{\Sigma} = M\mathbf{D}$. As a result, the SNR of the $k$-th user in the UL is $\mathrm{SNR}_k = M\gamma_u\beta_k$, and the UL capacity in (1.5) can be achieved.
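The array gain in (1.6) can be illustrated with a Monte Carlo sketch. It assumes i.i.d. Rayleigh fading and, as an illustrative choice, scales the user power as $\gamma_u = E/M$ for a fixed $E$ (the power scaling mentioned in Section 1.1), so that the target $M\gamma_u\beta_k$ equals $E\beta_k$ for every $M$. The function computes the per-user output SINR of MF combining, keeping the residual inter-user interference that (1.6) neglects; the ratio to $E\beta_k$ should approach one as $M$ grows. The function name and parameter values are hypothetical.

```python
import numpy as np

def mf_uplink_sinr(M, beta, E, trials=500, seed=0):
    """Per-user SINR after matched-filter combining x = H^H y, averaged over trials.

    Assumes i.i.d. CN(0, 1) small-scale fading, unit-variance noise, and the
    power scaling gamma_u = E / M. Returns (avg signal power) / (avg interference + noise).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    K = beta.size
    gamma_u = E / M
    sig = np.zeros(K)
    intf_noise = np.zeros(K)
    for _ in range(trials):
        G = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
        H = G * np.sqrt(beta)                  # H = G D^{1/2}
        gram = H.conj().T @ H                  # entry (k, j) = h_k^H h_j
        p = np.abs(gram) ** 2
        norm2 = np.real(np.diag(gram))         # ||h_k||^2
        sig += gamma_u * norm2 ** 2            # desired term |h_k^H h_k|^2
        intf_noise += gamma_u * (p.sum(axis=1) - norm2 ** 2) + norm2  # other users + noise
    return sig / intf_noise

beta = [1.0, 0.5, 0.2, 0.1]   # hypothetical large-scale fading coefficients
E = 10.0                       # gamma_u * M held fixed (10 dB)
for M in (16, 128, 1024):
    print(M, mf_uplink_sinr(M, beta, E) / (E * np.array(beta)))
```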

1.2.2 Downlink Transmission

In downlink (DL) communication, the BS transmits data through its $M$ antennas to the $K$ users in its cell. Assume that the BS uses time-division duplexing (TDD) to separate the UL and DL channels. In this mode, adjacent time slots are assigned to UL and DL transmissions. Since both directions use the same frequency band, channel reciprocity holds and the DL channel matrix is the transpose of the UL channel matrix. Thus, during DL transmission, at time $t$, the signal vector received by the $K$ users is given by

$$\mathbf{y}_d(t) = \sqrt{\gamma_d}\,\mathbf{H}^T(t)\mathbf{s}'(t) + \mathbf{v}(t), \tag{1.7}$$

where $\gamma_d$ is the DL transmit power and $\mathbf{s}'(t)$ is the signal vector transmitted by the BS with $\mathbb{E}[\|\mathbf{s}'(t)\|^2] = 1$. The noise term is distributed as $\mathbf{v}(t) \sim \mathcal{CN}(\mathbf{v}|\mathbf{0}, \mathbf{I}_K)$. Assuming that the CSI is perfectly known at the BS, the instantaneous sum capacity achievable in the DL is given by

$$\begin{aligned}
C_d &= \max_{\mathbf{P}} \log_2\det\left(\mathbf{I}_M + \gamma_d\mathbf{H}(t)\mathbf{P}\mathbf{H}^H(t)\right) \\
&= \max_{\mathbf{P}} \log_2\det\left(\mathbf{I}_K + \gamma_d\mathbf{P}^{1/2}\mathbf{H}^H(t)\mathbf{H}(t)\mathbf{P}^{1/2}\right) \\
&\approx \max_{\mathbf{P}} \log_2\det\left(\mathbf{I}_K + \gamma_d M\mathbf{P}\mathbf{D}\right),
\end{aligned} \tag{1.8}$$

where the maximization in (1.8) is over power allocation matrices $\mathbf{P} = \mathrm{diag}\{p_1, p_2, \ldots, p_K\}$ with non-negative diagonal entries satisfying $\sum_{k=1}^{K} p_k = 1$ [11]. The approximation in (1.8) follows from the favorable propagation condition (1.3). Note that in (1.8), for a given DL transmit power $\gamma_d$, large-scale fading matrix $\mathbf{D}$, and power allocation matrix $\mathbf{P}$, the capacity can be increased by increasing the number $M$ of BS antennas.
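Because the matrix inside the last determinant in (1.8) is diagonal, the approximate objective splits into $\sum_k \log_2(1 + \gamma_d M \beta_k p_k)$, i.e., $K$ parallel scalar channels, and its maximization under $\sum_k p_k = 1$ is the classical water-filling problem with solution $p_k = \max(0, \mu - 1/(\gamma_d M \beta_k))$, where the water level $\mu$ makes the powers sum to one. A minimal sketch of this closed-form solution is given below; the function names and numerical values are hypothetical and serve only to illustrate the approximate objective, not the exact maximization over $\mathbf{P}$.

```python
import numpy as np

def water_filling(gains, total_power=1.0):
    """Maximize sum_k log2(1 + gains[k] * p[k]) s.t. p >= 0, sum(p) = total_power.

    Here gains[k] plays the role of gamma_d * M * beta_k in the approximation of (1.8).
    """
    gains = np.asarray(gains, dtype=float)
    inv = 1.0 / gains                      # "floor" heights 1 / gain_k
    order = np.argsort(inv)                # strongest users first
    inv_sorted = inv[order]
    p = np.zeros_like(gains)
    # Try activating the n strongest users, n = K, K-1, ..., 1.
    for n in range(gains.size, 0, -1):
        mu = (total_power + inv_sorted[:n].sum()) / n   # common water level
        if mu > inv_sorted[n - 1]:         # weakest active user still gets positive power
            p[order[:n]] = mu - inv_sorted[:n]
            break
    return p

def approx_dl_capacity(p, gains):
    """Approximate DL sum capacity sum_k log2(1 + gains[k] * p[k])."""
    return float(np.sum(np.log2(1.0 + np.asarray(gains) * p)))

beta = np.array([1.0, 0.5, 0.2, 0.1])      # hypothetical large-scale fading
gains = 1.0 * 64 * beta                    # gamma_d = 1, M = 64 (hypothetical)
p_opt = water_filling(gains)
p_eq = np.full(beta.size, 1.0 / beta.size)
print(p_opt, approx_dl_capacity(p_opt, gains), approx_dl_capacity(p_eq, gains))
```

Users with weak large-scale fading may receive zero power when the water level lies below their floor $1/(\gamma_d M\beta_k)$; as $M$ grows, all floors drop and the allocation tends toward equal power.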

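In a massive MIMO system, as the number of BS antennas grows to infinity, simple matched-filter (MF) precoding becomes optimal for achieving the DL capacity in (1.8) for a given power allocation matrix $\mathbf{P}$. With MF precoding, the transmitted signal $\mathbf{s}'(t)$ is formed as

$$\mathbf{s}'(t) = \frac{1}{\sqrt{M}}\,\mathbf{H}^*(t)\mathbf{D}^{-1/2}\mathbf{P}^{1/2}\mathbf{s}_d(t), \tag{1.9}$$

where $\mathbf{s}_d(t) = [s_{d,1}(t), s_{d,2}(t), \ldots, s_{d,K}(t)]^T$ contains the data transmitted to the $K$ users, and each data symbol $s_{d,k}(t)$ belongs to an M-ary modulation alphabet $\mathcal{A}_M$. The factor $1/\sqrt{M}$ normalizes the precoder so that, for independent unit-energy data symbols and under favorable propagation, $\mathbb{E}[\|\mathbf{s}'(t)\|^2] \approx \mathrm{tr}(\mathbf{P}) = 1$, as required in (1.7). Inserting (1.9) into (1.7), the signal vector received by the $K$ users is

$$\begin{aligned}
\mathbf{y}_d(t) &= \sqrt{\frac{\gamma_d}{M}}\,\mathbf{H}^T(t)\mathbf{H}^*(t)\mathbf{D}^{-1/2}\mathbf{P}^{1/2}\mathbf{s}_d(t) + \mathbf{v}(t) \\
&\approx \sqrt{\gamma_d M}\,\mathbf{D}^{1/2}\mathbf{P}^{1/2}\mathbf{s}_d(t) + \mathbf{v}(t).
\end{aligned} \tag{1.10}$$

Since $\mathbf{P}$ and $\mathbf{D}$ are both diagonal matrices, the MF precoder eliminates the inter-user interference in the cell. Further, the DL SNR of the $k$-th user is $\mathrm{SNR}_k = M\gamma_d\beta_k p_k$, and the DL capacity in (1.8) can be achieved.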


1.3 Channel Estimation in Massive MIMO System

In Section 1.2, we assumed that the BS knows the CSI perfectly. In practice, however, the required CSI is unknown and has to be estimated from transmitted pilot symbols. Due to the time-varying nature of mobile channels, the CSI changes over time and has to be updated in a timely manner. The communication between the BS and the users is usually organized in a sequence of short frames, during each of which the channel is assumed to be nearly static; the duration over which the channel remains nearly static is the channel coherence time $T_c$. The length $\tau$ of a coherence interval, in symbols, is the product of the coherence time $T_c$ and the coherence bandwidth $W_c$ of the channel. For example, a coherence time of $T_c = 1$ ms and a coherence bandwidth of $W_c = 100$ kHz result in a coherence interval of $\tau = 100$ symbols. The length of each transmitted frame is set equal to the coherence interval $\tau$, and both pilot and data symbols are packed within the frame. The pilot symbols are used to estimate the CSI, which is then used for precoding/decoding at the BS and for detecting the unknown data symbols in the frame. The number of pilot symbols required for accurate CSI estimation scales with the number of transmit antennas and is independent of the number of receive antennas [37, 38]. Thus, increasing the number of transmit antennas increases the pilot overhead in a frame. The pilot overhead per frame depends on the duplexing mode used by the massive MIMO system. There are two candidates: time-division duplexing (TDD) and frequency-division duplexing (FDD). These modes are explained next.

1.3.1 CSI Acquisition: TDD vs. FDD

In TDD mode, the UL and DL transmissions are assigned adjacent time slots within the channel coherence interval. Since both directions use the same frequency band, channel reciprocity [34, 35] holds and the DL channel estimate is obtained from the UL channel estimate.¹ To estimate the UL channel, the $K$ users in the cell transmit orthogonal pilot sequences to the BS. The CSI estimated from these pilots is then used for coherent detection of the data symbols in the UL and for precoding in the DL. For a coherence interval of $\tau$ symbols, CSI estimation consumes at least $K$ symbols, and the remaining symbols are used for UL and DL data transmission, as shown in Fig. 1.2 (a). Hence, to operate a massive MIMO system in TDD mode, we must have $K < \tau$.

Figure 1.2. CSI estimation schemes: (a) TDD frame, (b) FDD frame.

In FDD mode, on the other hand, the UL and DL channels use different frequency bands, so the reciprocity property does not hold. Therefore, both channels must be estimated separately to perform coherent detection in the UL and precoding in the DL. To estimate the DL channel, the BS transmits M orthogonal pilot symbols to its K users. Each user estimates its M DL channel coefficients and feeds them back to the BS, consuming M symbols, so that the BS can precode its DL transmission using the DL CSI. In addition, the users transmit K orthogonal pilot symbols to the BS so that it can estimate the UL channel for coherent detection. This process is shown in Fig. 1.2 (b). Let the coherence interval τ be the same for both the UL and DL channels. Thus, while channel acquisition requires M symbols in the DL frame, it takes up to M + K symbols in the UL frame. Hence, to operate a massive MIMO system in FDD mode, we need M + K < τ, which is difficult to achieve in practice because M ≫ K. This is why TDD mode is usually preferred over FDD in massive MIMO systems; for example, Sprint in the United States implemented its first 5G cellular network in TDD mode at 2.5 GHz with 64 transmit and 64 receive antenna elements (128 antennas) at the BS [41]. However, since many contemporary 4G LTE networks operate in FDD mode [42], massive MIMO technology could be adopted rapidly if the pilot overhead required in FDD mode were reduced. Motivated by this, ongoing research on FDD-based massive MIMO systems [43–45] exploits the sparsity of the channel in the discrete Fourier transform basis and applies the Bayesian compressive sensing paradigm [46,47] to reduce the required pilot overhead.

¹In practice, the physical channel between the BS and the users is indeed reciprocal, but the end-to-end channel may not be reciprocal due to a mismatch of the RF chains used at both ends. Thus, the reciprocity assumption requires calibration of both channels, as discussed in [39, 40]. In our work, in Section 2.4 we assume that end-to-end reciprocity has already been achieved in the TDD mode.
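The pilot-overhead comparison above reduces to simple arithmetic. The following Python sketch counts the channel-acquisition symbols per coherence interval for each duplexing mode, following the frame structures of Fig. 1.2; the function names and the numbers in the usage example (M = 128, K = 16, τ = 100) are hypothetical and chosen only for illustration.

```python
def tdd_overhead(K):
    """TDD: K uplink pilot symbols; reciprocity supplies the DL CSI."""
    return K


def fdd_overhead(M, K):
    """FDD: (DL frame, UL frame) symbols spent on channel acquisition.

    DL frame: M orthogonal pilots sent by the BS.
    UL frame: M feedback symbols from the users plus K orthogonal UL pilots.
    """
    return M, M + K


def fits_in_coherence_interval(n_symbols, tau):
    """Acquisition is feasible only if it leaves room for data within tau symbols."""
    return n_symbols < tau


M, K, tau = 128, 16, 100  # hypothetical BS antennas, users, coherence interval
print(fits_in_coherence_interval(tdd_overhead(K), tau))         # True: 16 < 100
print(fits_in_coherence_interval(fdd_overhead(M, K)[1], tau))   # False: 144 >= 100
```

Because the FDD overhead grows with M while the TDD overhead grows only with K, adding BS antennas tightens the FDD constraint but leaves the TDD one unchanged.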

1.3.2 Pilot Contamination

In a single-cell network, all users in the cell are usually assigned mutually orthogonal pilot sequences to mitigate inter-user interference and to accurately estimate the channel state information (CSI) at the BS. However, in a multi-cell network, the limited length of the channel coherence interval means that mutually orthogonal pilot sequences cannot be assigned to all users in all cells [48]. Thus, the same set of orthogonal pilot sequences assigned to the users in one cell has to be reused for the users in other cells. As a result, the channel estimate of a user in one cell becomes contaminated by the transmission of the same pilot sequences from users in other cells. This effect is known as pilot contamination [49,50], and it reduces the accuracy of CSI estimation.

To demonstrate the effect of pilot contamination, consider a multi-user massive MIMO network made up of L cells, each with its own BS and with K users located inside every cell. Every BS has M antennas and each user has a single-antenna transceiver. We assume that the network uses TDD mode and that the UL and DL communication across the cells is synchronized in time and frequency. The K users in a cell are assigned orthogonal pilot sequences given by $\boldsymbol{\Phi} = [\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \ldots, \boldsymbol{\phi}_K]$, in which $\boldsymbol{\phi}_k = [\phi_{k,1}, \phi_{k,2}, \ldots, \phi_{k,\nu}]^T$, and the matrix $\boldsymbol{\Phi}$ satisfies $\boldsymbol{\Phi}^H\boldsymbol{\Phi} = \nu\mathbf{I}_K$. Suppose the same orthogonal pilot matrix $\boldsymbol{\Phi}$ is reused by the users in the other L − 1 cells. In uplink communication, the signal matrix received by the l-th BS is given by

$$\mathbf{Y}_l = \sqrt{\gamma_u}\sum_{i=1}^{L}\mathbf{H}_{li}\boldsymbol{\Phi}^T + \mathbf{W}_l, \qquad (1.11)$$

in which $\mathbf{Y}_l \in \mathbb{C}^{M\times\nu}$, $\gamma_u$ is the UL transmit power, and $\mathbf{H}_{li}$ is defined in (1.2). The elements of the noise matrix $\mathbf{W}_l$ are i.i.d. complex Gaussian with zero mean and unit variance. As a simple method, the l-th BS may correlate $\mathbf{Y}_l$ with its own orthogonal pilot matrix $\boldsymbol{\Phi}$ to estimate the channel $\mathbf{H}_{ll}$ [5], which gives

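$$\hat{\mathbf{H}}_{ll} = \frac{1}{\nu\sqrt{\gamma_u}}\mathbf{Y}_l\boldsymbol{\Phi}^* = \underbrace{\mathbf{H}_{ll}}_{\text{desired channel}} + \underbrace{\sum_{\substack{i=1\\ i\neq l}}^{L}\mathbf{H}_{li}}_{\text{unwanted interference}} + \underbrace{\frac{1}{\nu\sqrt{\gamma_u}}\mathbf{W}_l\boldsymbol{\Phi}^*}_{\text{noise term}}, \qquad (1.12)$$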

in which we used the fact that $\boldsymbol{\Phi}^T\boldsymbol{\Phi}^* = (\boldsymbol{\Phi}^H\boldsymbol{\Phi})^* = \nu\mathbf{I}_K$. The unwanted interference term in (1.12) results from the reuse of pilot sequences in the neighboring cells. Given the channel model of $\mathbf{H}_{li}$ in (1.2), the pilot transmissions from the users in cell $i \neq l$ undergo large attenuation because of their distance from BS l, so their large-scale fading coefficients $\beta_{lik}$ are small. Nevertheless, the interference term adds to the noise term and increases the overall channel estimation error. Thus, to reduce the impact of pilot contamination on CSI estimation, better channel estimation schemes are required for multi-cell massive MIMO systems.

Blind channel estimation schemes, which utilize statistical knowledge of the received signals, have been developed to reduce the channel estimation error in multi-cell networks. An eigenvalue decomposition (EVD) based blind channel estimation scheme is proposed in [51] that uses the EVD of the correlation matrix of the received signals. Simulation results in [51] demonstrate the efficacy of the proposed EVD scheme compared to other pilot-based channel estimation schemes. However, the gains promised by the EVD method require a large number of antennas at the BS and a large number of samples within the channel coherence interval for estimating the correlation matrix. A blind channel estimation scheme based on the expectation propagation (EP) algorithm [52,53] is proposed in [54]; it uses the EVD algorithm of [51] for initialization to improve performance. However, blind channel estimation methods suffer from inherent phase ambiguities in the demodulated symbols and require a pilot symbol and a user label to resolve these ambiguities [55]. In contrast, adding a few training symbols in a semi-blind approach significantly improves the channel estimation accuracy compared to blind methods (see Fig. 1 in [56,57]). A semi-blind pilot decontamination scheme is proposed for multi-cell massive MIMO systems in [58], where the information sequence of the target cell is first estimated and then used as a pilot sequence in a least-squares method for CSI estimation.
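The decomposition in (1.12) can be checked numerically. The Python/NumPy sketch below simulates the received pilot matrix of (1.11) for L cells that reuse the same pilot matrix, applies the correlation estimator, and confirms that the estimate equals the desired channel plus the other cells' channels plus filtered noise. The DFT-based pilot matrix, the i.i.d. Rayleigh channels (identity spatial correlation), and all sizes and β values are assumptions made for illustration, not the dissertation's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, L, nu, gamma_u = 32, 4, 3, 8, 10.0   # hypothetical sizes, nu >= K
l = 0                                       # BS whose channel is estimated


def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


# K columns of a nu-point DFT matrix: Phi^H Phi = nu * I_K (orthogonal pilots).
n, k = np.arange(nu)[:, None], np.arange(K)[None, :]
Phi = np.exp(-2j * np.pi * n * k / nu)

# Assumed large-scale gains: 1 for the home cell, 0.1 for every interfering cell.
beta = [1.0 if i == l else 0.1 for i in range(L)]
H = [np.sqrt(beta[i]) * crandn(M, K) for i in range(L)]   # H_li, i.i.d. Rayleigh
W = crandn(M, nu)                                        # unit-variance noise

Y = np.sqrt(gamma_u) * sum(H[i] @ Phi.T for i in range(L)) + W   # (1.11)
H_hat = Y @ Phi.conj() / (nu * np.sqrt(gamma_u))                 # (1.12)

interference = sum(H[i] for i in range(L) if i != l)
noise = W @ Phi.conj() / (nu * np.sqrt(gamma_u))

print(np.allclose(H_hat, H[l] + interference + noise))   # True
```

Per entry, the interference term has variance equal to the sum of the interfering β values, which does not depend on ν or γ_u, whereas the noise term has variance 1/(νγ_u); longer pilots or more power therefore reduce only the noise term.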

1.4 Outline of this Dissertation

The rest of this dissertation is organized as follows. In Chapter 2, we consider the problem of semi-blind channel estimation in the uplink of TDD-based multi-cell massive MIMO systems with spatially correlated time-varying channels. A semi-blind message-passing EP algorithm is developed to iteratively approximate the joint posterior distribution of the unknown channel matrix and the transmitted data symbols with a distribution from an exponential family. This distribution is then used for direct estimation of the channel matrix and detection of the data symbols. In Chapter 3, we consider the problem of FDD-based DL channel estimation in multi-user massive MIMO systems. To this end, we use the Bayesian compressive sensing approach, in which the clustered sparse structure of the physical massive MIMO channel in the angular domain is exploited to reduce the required pilot overhead. To capture the clustered structure, we use a conditionally i.i.d. Bernoulli-Gaussian prior on the sparse vector representing the channel and a Markov prior on its support vector. A message-passing EP algorithm is developed to approximate the joint posterior distribution of the sparse vector and its support with a distribution from an exponential family. This distribution is then used to estimate the physical DL massive MIMO channel. The EP algorithm assumes that the model parameters are known in advance. To estimate these parameters, we use the expectation maximization (EM) algorithm. The EM algorithm is combined with the EP algorithm using the variational EM approach, and the resulting algorithm is referred to as the EM-EP algorithm.

Chapter 2

Semi-blind Channel Estimation for Multi-cell Massive MIMO Systems on Time-Varying Channels

In this chapter, we study the problem of semi-blind channel estimation and symbol detection in the uplink of multi-cell massive MIMO systems with spatially correlated time-varying channels. The semi-blind method jointly uses a few pilot symbols (on the order of the number of users in the cell) and the transmitted data symbols in the frame as virtual pilots for channel estimation. To this end, an algorithm based on expectation propagation (EP) is developed to iteratively approximate the true joint posterior distribution of the transmitted data symbols and the unknown channel matrix with a distribution from an exponential family. This distribution is then used for direct detection of the transmitted data symbols as well as estimation of the unknown channel matrix. A modified (semi-blind) version of the Kalman filtering algorithm, referred to as KF-M, is also proposed; it emerges from our EP derivations and is used to initialize the EP-based algorithm. We also examine the performance of the Kalman smoothing algorithm followed by a single pass of KF-M, referred to as KS-M. Simulation results at the end of this chapter demonstrate that the channel estimation error and the symbol error rate (SER) of the proposed semi-blind KF-M, KS-M, and EP-based algorithms improve as the number of base station antennas and the number of data symbols in the transmitted frame increase. It is shown that the proposed semi-blind EP algorithm significantly outperforms the KF-M and KS-M algorithms in both channel estimation and symbol detection. Furthermore, simulation results show that, by increasing the number of transmitted data symbols in the frame, the semi-blind EP-based algorithm can mitigate the impact of pilot contamination and channel time variations in a multi-cell massive MIMO system with a pilot overhead smaller than 1/9.

A preprint of this chapter was made available at https://arxiv.org/abs/2011.09010 as Mort Naraghi-Pour, Mohammed Rashid, Cesar Vargas-Rosales, “Semi-blind Channel Estimation and Data Detection for Multi-cell Massive MIMO Systems on Time-Varying Channels”, arXiv e-prints, 2020. The copyright information is included in Appendix E.

2.1 Introduction

In wireless communication, coherent demodulation of the transmitted symbols requires accurate knowledge of the CSI. Traditionally, pilot sequences are transmitted to aid channel estimation. Since the number of transmit antennas determines the length of the orthogonal pilot sequences, pilot-based downlink channel estimation is very challenging for massive MIMO systems, which employ a large number of antennas at the BS. Due to the increased length of the pilot sequences, the time required for pilot transmission increases, resulting in low spectral efficiency. In addition, the total time required for data and pilot transmission may exceed the coherence time of the channel.

In TDD, CSI can be estimated in the uplink from pilot transmissions. For users employing a single antenna, the length of the pilot sequence only needs to scale with the number of users in the cell, which is typically much smaller than the number of BS antennas. Invoking the channel reciprocity property, the uplink CSI matrix is transposed to obtain the downlink CSI, which can then be used for precoding in the downlink. However, even in this case, due to the limited number of orthogonal pilot sequences, pilots must be shared among the users in neighboring cells, resulting in the so-called pilot contamination problem [48], which diminishes the accuracy of CSI estimation.

To overcome the effects of pilot contamination, several blind channel estimation schemes have been proposed in recent years [54,55,59,60]. In [54], the EP algorithm [52] is developed for blind channel estimation and symbol detection in multi-cell massive MIMO systems. The bilinear generalized approximate message passing (BiG-AMP) algorithm is another approach, proposed in [55] for sparse massive MIMO channels. Channel sparsity is exploited in a number of other studies, including [59–61]. In particular, [60] studies the effect of one-bit ADCs in the receiver.

Blind channel estimation methods suffer from inherent phase ambiguities in the demodulated symbols and require a pilot symbol and a user label to resolve these ambiguities [55]. In contrast, adding a few training symbols in a semi-blind approach significantly improves the channel estimation accuracy compared to blind methods (see Fig. 1 in [56,57]). A semi-blind channel estimation method based on the expectation maximization algorithm is proposed and analyzed in [27]. A semi-blind pilot decontamination scheme is proposed for multi-cell massive MIMO systems in [58], where the information sequence of the target cell is first estimated and then used as a pilot sequence in a least-squares method for CSI estimation. In [62], the authors present a semi-blind joint channel estimation and data detection method based on the regularized alternating least-squares (R-ALS) method. Other semi-blind algorithms for massive MIMO channels are proposed in [57, 63], where the pilots sent at the beginning of each frame are used for initialization. For sparse massive MIMO channels, a message-passing semi-blind channel estimation algorithm is proposed in [64].

In the semi-blind algorithms described above, the massive MIMO channel is assumed to be spatially uncorrelated as well as static during the transmission time of a frame. Due to their large number of antennas, the BSs in massive MIMO systems have fine angular resolution, which makes some spatial directions more probable than others [65–67]. This results in spatial correlation in the channel vector from a user to the BS, which needs to be included in the channel model. Secondly, while the static channel (so-called block-fading) model is valid for stationary or low-mobility users, it breaks down for high-mobility users. In addition to the delay spread caused by multipath propagation, mobile fading channels are subject to time variations due to Doppler spread. For block-fading channels, CSI estimation is only required at the start of each frame. For time-varying channels, however, the CSI estimates need to be updated at every time instant throughout the frame. Compared to the rich literature on block-fading channel estimation in massive MIMO systems, few studies address the estimation of time-varying channels. For a time-varying massive MIMO channel, the data rates achievable with the linear MMSE channel estimator are studied in [29,30,68,69]. The time evolution of the channel taps is modeled by a first-order autoregressive (AR) process with temporal correlation properties corresponding to the Jakes’ model [70]. A Kalman filter is used to estimate the time-varying sparse massive MIMO channel in [71] and the time-varying non-sparse massive MIMO channel in [28,72].

Here, we consider semi-blind joint channel estimation and data detection in multi-cell massive MIMO systems for spatially correlated time-varying channels. To our knowledge, this problem has not been studied for massive MIMO systems. The contributions of this chapter are summarized as follows:

• We consider a multi-user multi-cell massive MIMO system. The uplink channel from each user to the BS is assumed to be spatially correlated and time-varying. The channel vector from a user to the BS is modeled as a complex circularly-symmetric Gaussian vector with a given (known) correlation matrix [67]. The temporal correlation of the channel is modeled by a first-order AR process [30,68,69] with correlation properties corresponding to the Jakes’ model [70]. A semi-blind method is developed for joint channel estimation and data detection, where it is assumed that the users transmit a few pilot symbols (on the order of the number of users) at the beginning of each frame. These symbols are used for an initial estimate of the channel and to overcome the inherent ambiguity of non-coherent detectors. The proposed method is based on the expectation propagation (EP) algorithm and iteratively approximates the joint posterior distribution of the channel matrix and the transmitted data symbols with a distribution from an exponential family. This distribution is then used for direct estimation of the channel matrix and detection of the data symbols.

• Simulation results are presented to demonstrate the performance of the proposed EP-based algorithm (EP) in terms of channel estimation and symbol error rate (SER). A modified version of the popular Kalman filtering (KF) algorithm, referred to as KF-M, is also proposed. KF-M emerges from our EP derivation and is used to initialize the EP-based algorithm. The backward recursion equations of the Kalman smoother are also part of our EP derivations. Therefore, for comparison we also present the performance of KF-M and of the Kalman smoothing algorithm followed by a single pass of KF-M, denoted as KS-M. To benchmark the performance of semi-blind KF-M, KS-M, and EP, we also present the performance of the Kalman filter and smoother in a pure training mode (TM), in which the entire frame is composed of known pilot symbols and only channel estimation is performed. These two cases are referred to as KF-TM and KS-TM. In channel estimation, KF-TM provides a lower bound for KF-M, and KS-TM provides a lower bound for KS-M and EP. Finally, we plot the SER performance of the MMSE estimator with known CSI (denoted PCSI) for comparison with the SER of the proposed algorithms.

• To our knowledge, the problem under consideration has not been investigated previously, so we cannot compare our results with those in the published literature. However, to verify that algorithms developed under the assumption of a block-fading channel are not suitable for a time-varying channel model, we compare our results with those of [27] and [62]. The comparison shows that, for time-varying channels, the proposed method significantly outperforms the methods in [27] and [62].

The rest of this chapter is organized as follows. Section 2.2 describes the system model of a time-varying multi-cell massive MIMO system. The semi-blind EP algorithm for this system is derived in Section 2.3, and the simulation results are discussed in Section 2.4.

Notations: Throughout this chapter, small letters ($x$) are used for scalars, bold small letters ($\mathbf{x}$) for vectors, and bold capital letters ($\mathbf{X}$) for matrices. $\mathbb{R}$ and $\mathbb{C}$ represent the sets of real and complex numbers, respectively. The superscripts $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, and $(\cdot)^{-1}$ represent transpose, Hermitian transpose, complex conjugate, and matrix inverse, respectively. Also, $\otimes$ denotes the matrix Kronecker product. For a pdf $p(\cdot)$, $\mathbb{E}_p$ denotes the expectation operator with respect to $p(\cdot)$. $\mathbf{I}_N$ denotes the $N\times N$ identity matrix. Finally, $\mathrm{vec}(\mathbf{X})$ and $\|\mathbf{x}\|$ denote the vectorization of the matrix $\mathbf{X}$ and the $\ell_2$ norm of the vector $\mathbf{x}$, respectively.

2.2 System Model

We consider a multi-user MIMO network made up of L cells, each with its own BS and with K users located inside every cell. Every BS has M antennas and each user has a single-antenna transceiver. At time t, the channel gain between the m-th antenna of the l-th BS and the k-th user in the i-th cell is represented by $h_{limk}(t)$. Each channel gain $h_{limk}(t)$ can be written as

$$h_{limk}(t) = g_{limk}(t)\sqrt{\beta_{lik}}, \qquad (2.1)$$

where $g_{limk}(t)$ models the fast-fading channel between the k-th user in cell i and the m-th antenna of BS l, and $\beta_{lik}$ models the large-scale fading incurred by geometric attenuation and shadowing. We assume that $\beta_{lik}$ is a known constant that is independent of the antenna index m.

The fast-fading channel $g_{limk}(t)$ is modeled as a wide-sense stationary complex Gaussian process with zero mean and unit power. Using the Jakes’ model [70], the time autocorrelation of $g_{limk}(t)$ is given by

$$R_{g_{limk}}(\Delta t) = \mathbb{E}\left[g_{limk}(t)\,g^*_{limk}(t+\Delta t)\right] = J_0\left(2\pi f_{d_{limk}}|\Delta t|\right), \qquad (2.2)$$

in which $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind and $f_{d_{limk}}$ represents the normalized maximum Doppler shift of the channel between the m-th antenna of BS l and the k-th user in cell i. The time autocorrelation function of $h_{limk}(t)$ is then

$$R_{h_{limk}}(\Delta t) = \mathbb{E}\left[h_{limk}(t)\,h^*_{limk}(t+\Delta t)\right] = \beta_{lik}\,J_0\left(2\pi f_{d_{limk}}|\Delta t|\right). \qquad (2.3)$$

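Let $\mathbf{g}_{lik}(t) \triangleq [g_{li1k}(t), g_{li2k}(t), \ldots, g_{liMk}(t)]^T$ represent the $M\times 1$ fast-fading channel vector from the k-th user in cell i to the antenna array of the l-th BS. We assume that the elements of $\mathbf{g}_{lik}(t)$ are correlated with correlation matrix $\mathbf{R}_{lik} \triangleq \mathbb{E}[\mathbf{g}_{lik}(t)\mathbf{g}^H_{lik}(t)]$ [73, 74]. Let $\mathbf{G}_{li}(t) \triangleq [\mathbf{g}_{li1}(t), \ldots, \mathbf{g}_{liK}(t)]$. The overall channel gain between the l-th BS and the users in cell i is given by

$$\mathbf{H}_{li}(t) \triangleq \mathbf{G}_{li}(t)\,\mathbf{D}^{1/2}_{li}, \qquad (2.4)$$

where $\mathbf{D}_{li} \triangleq \mathrm{diag}\{\beta_{li1}, \beta_{li2}, \ldots, \beta_{liK}\}$. It is assumed that the users’ channels to the BS are uncorrelated. In particular, this implies that $\mathbb{E}[\mathbf{G}_{li}(t)\mathbf{G}^H_{lj}(t)] = \sum_{k=1}^{K}\mathbb{E}[\mathbf{g}_{lik}(t)\mathbf{g}^H_{ljk}(t)] = \sum_{k=1}^{K}\mathbf{R}_{lik}\,\delta_{ij}$, and as a result,

$$\mathbb{E}[\mathbf{H}_{li}(t)\mathbf{H}^H_{lj}(t)] = \mathbb{E}[\mathbf{G}_{li}(t)\mathbf{D}_{li}\mathbf{G}^H_{li}(t)]\,\delta_{ij} = \sum_{k=1}^{K}\beta_{lik}\mathbf{R}_{lik}\,\delta_{ij}. \qquad (2.5)$$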

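The signal vector received at BS l at time t is given by

$$\begin{aligned}\mathbf{y}_l(t) &= \sum_{i=1}^{L}\mathbf{H}_{li}(t)\mathbf{s}_i(t) + \mathbf{w}'_l(t)\\ &= \underbrace{\mathbf{H}_{ll}(t)\mathbf{s}_l(t)}_{\text{desired signal}} + \underbrace{\sum_{\substack{i=1\\ i\neq l}}^{L}\mathbf{H}_{li}(t)\mathbf{s}_i(t)}_{\text{interference}} + \underbrace{\mathbf{w}'_l(t)}_{\text{noise}},\end{aligned} \qquad (2.6)$$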

where $\mathbf{s}_i(t) = [s_{i1}(t), s_{i2}(t), \ldots, s_{iK}(t)]^T$ represents the symbols transmitted by all users in the i-th cell. We assume that the symbols $s_{ij}(t)$ belong to an $\mathcal{M}$-ary modulation constellation, denoted by $\mathcal{A}_{\mathcal{M}}$, and have zero mean and average energy $E_s$. We also assume that $\mathbb{E}[\mathbf{s}_i(t)\mathbf{s}^H_j(t)] = E_s\,\delta_{ij}\mathbf{I}_K$, i.e., the symbols $s_{ij}(t)$ are independently selected from $\mathcal{A}_{\mathcal{M}}$. The noise term at the l-th BS is modeled with a circularly symmetric complex Gaussian distribution, i.e., $\mathbf{w}'_l(t) \sim \mathcal{CN}(\mathbf{w}'_l \,|\, \mathbf{0}, \mathbf{I}_M)$.

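Let $\mathbf{w}_l(t) \triangleq \sum_{i=1, i\neq l}^{L}\mathbf{H}_{li}(t)\mathbf{s}_i(t) + \mathbf{w}'_l(t)$ denote the overall disturbance at the l-th BS. Thus, (2.6) can be written as $\mathbf{y}_l(t) = \mathbf{H}_{ll}(t)\mathbf{s}_l(t) + \mathbf{w}_l(t)$, where $\mathbf{w}_l(t)$ has zero mean and correlation matrix

$$\mathbf{R}_w \triangleq \mathbb{E}\left[\mathbf{w}_l(t)\mathbf{w}^H_l(t)\right] = E_s\sum_{\substack{i=1\\ i\neq l}}^{L}\sum_{k=1}^{K}\beta_{lik}\mathbf{R}_{lik} + \mathbf{I}_M. \qquad (2.7)$$

Assuming that KL is large and invoking the central limit theorem, we model $\mathbf{w}_l(t)$ as a circularly symmetric complex Gaussian vector, i.e., $\mathbf{w}_l(t) \sim \mathcal{CN}(\mathbf{w}_l \,|\, \mathbf{0}, \mathbf{R}_w)$.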
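The covariance in (2.7) is straightforward to assemble once the large-scale coefficients and spatial correlation matrices are available. The sketch below assumes they are supplied as nested Python lists indexed by cell and user; `disturbance_covariance` is a hypothetical helper, and the exponential correlation model used in the usage example, $[\mathbf{R}]_{m,n} = \rho^{|m-n|}$, is an illustrative choice that the text does not prescribe.

```python
import numpy as np


def disturbance_covariance(beta, R, l, Es=1.0):
    """R_w of (2.7) at BS l.

    beta[i][k] : large-scale coefficient beta_lik of user k in cell i
    R[i][k]    : M x M spatial correlation matrix R_lik
    """
    M = R[0][0].shape[0]
    Rw = np.eye(M, dtype=complex)                 # thermal noise, CN(0, I_M)
    for i in range(len(beta)):
        if i == l:                                # home cell carries the desired signal
            continue
        for k in range(len(beta[i])):
            Rw += Es * beta[i][k] * R[i][k]       # inter-cell interference power
    return Rw


def exp_corr(M, rho):
    """Illustrative exponential correlation model [R]_{m,n} = rho^|m-n|."""
    idx = np.arange(M)
    return rho ** np.abs(idx[:, None] - idx[None, :])


L, K, M = 3, 2, 4
beta = [[1.0, 0.8], [0.05, 0.02], [0.03, 0.01]]   # hypothetical values
R = [[exp_corr(M, 0.7) for _ in range(K)] for _ in range(L)]
Rw = disturbance_covariance(beta, R, l=0)
print(np.round(np.diag(Rw).real, 2))               # each entry 1 + 0.11 = 1.11
```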

To keep the notation uncluttered, in the following we drop the subscript l from the signal model and write the received vector at time t at the l-th BS as

$$\mathbf{y}_t = \mathbf{H}_t\mathbf{s}_t + \mathbf{w}_t, \qquad (2.8)$$

where $\mathbf{H}_t$ is the overall channel gain matrix, $\mathbf{s}_t \in \mathcal{A}^K_{\mathcal{M}}$ represents the symbols transmitted by the K users, and $\mathbf{w}_t$ is the zero-mean complex Gaussian disturbance with covariance matrix given by (2.7).

By applying the vectorization property as in [54], (2.8) can also be written as

$$\mathbf{y}_t = \mathbf{S}_t\mathbf{h}_t + \mathbf{w}_t, \qquad (2.9)$$

where we define $\mathbf{S}_t = \mathbf{s}^T_t \otimes \mathbf{I}_M$ and $\mathbf{h}_t = \mathrm{vec}(\mathbf{H}_t)$.
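The step from (2.8) to (2.9) uses the identity $\mathrm{vec}(\mathbf{H}\mathbf{s}) = (\mathbf{s}^T\otimes\mathbf{I}_M)\,\mathrm{vec}(\mathbf{H})$, where vec stacks columns and the transpose is a plain (not Hermitian) transpose. A quick NumPy check with hypothetical sizes and symbols is shown below; NumPy arrays are row-major by default, so column stacking needs `order="F"`.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K = 4, 3                                             # hypothetical sizes
H = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
s = np.array([1 + 1j, -1 + 1j, 1 - 1j]) / np.sqrt(2)   # e.g. QPSK symbols

S = np.kron(s.reshape(1, -1), np.eye(M))   # S_t = s_t^T ⊗ I_M, shape (M, MK)
h = H.reshape(-1, order="F")               # h_t = vec(H_t): stack columns

print(S.shape)                             # (4, 12)
print(np.allclose(S @ h, H @ s))           # True: (2.9) reproduces (2.8)
```

Writing the model as in (2.9) puts the unknown channel in a vector that is linear in the observation for a fixed symbol vector, which is what makes the Gaussian updates of the EP algorithm tractable.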

An autoregressive (AR) process has been used extensively to model the time evolution of the channel matrix [75–78]. Since any higher-order AR model can be written as a first-order model in matrix state-space form [30, 77], in this chapter we use a first-order AR model (AR(1)) for the time-varying vector channel, given by

$$\mathbf{h}_t = \mathbf{A}\mathbf{h}_{t-1} + \mathbf{v}_t, \qquad (2.10)$$

where $\mathbf{A}$ is a diagonal matrix with diagonal elements $[\mathbf{A}]_{n,n} = a_n$, for $n = 1, 2, \ldots, MK$. The variable $a_n$ is the AR(1) coefficient corresponding to the n-th channel between a user and a BS antenna in the cell. We let $a_n = J_0(2\pi f_{d_n})$, in which $f_{d_n}$ is the normalized maximum Doppler shift of channel n. The innovation process $\{\mathbf{v}_t\}$ is an i.i.d. circularly symmetric complex Gaussian process with $\mathbf{v}_t \sim \mathcal{CN}(\mathbf{0}, \mathbf{Q})$. To include the spatial correlation of the channel vector, we set $\mathbf{Q} = \mathbf{R}^{1/2}_h\mathbf{Q}_v\mathbf{R}^{1/2}_h$, in which, given the independence of the users’ channels, $\mathbf{R}_h \triangleq \mathbb{E}[\mathbf{h}_t\mathbf{h}^H_t] = \mathrm{diag}\{\beta_1\mathbf{R}_1, \beta_2\mathbf{R}_2, \ldots, \beta_K\mathbf{R}_K\}$ is a block diagonal matrix. The matrix $\mathbf{Q}_v$ is diagonal with diagonal elements $[\mathbf{Q}_v]_{n,n} = \sigma^2_n$, for $n = 1, 2, \ldots, MK$. To match the autocorrelation function in (2.3), so that the average power of the channel coefficient in (2.1) equals the large-scale fading coefficient, we set $\sigma^2_n = 1 - a^2_n$.
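A scalar version of the AR(1) model can be simulated in a few lines to check the two design choices above, $a_n = J_0(2\pi f_{d_n})$ and $\sigma^2_n = 1 - a^2_n$. The sketch below assumes a single channel with unit large-scale gain and no spatial correlation ($\mathbf{R}_h = \mathbf{I}$) and a hypothetical normalized Doppler $f_d = 0.05$. To stay dependency-free, it evaluates $J_0$ from the integral representation $J_0(x) = \frac{1}{\pi}\int_0^\pi\cos(x\sin\theta)\,d\theta$ rather than calling a special-function library; `bessel_j0` and `simulate_ar1` are hypothetical helper names.

```python
import numpy as np


def bessel_j0(x, n_nodes=64):
    """J0(x) = (1/pi) * int_0^pi cos(x sin(theta)) d(theta).

    The integrand is smooth and pi-periodic, so an equally spaced rule over one
    period converges very quickly.
    """
    theta = np.pi * np.arange(n_nodes) / n_nodes
    return float(np.mean(np.cos(x * np.sin(theta))))


def simulate_ar1(fd, T, n_paths, rng):
    """Unit-power AR(1) fading paths h_t = a h_{t-1} + v_t, v_t ~ CN(0, 1 - a^2)."""
    a = bessel_j0(2 * np.pi * fd)
    sigma2 = 1.0 - a ** 2

    def crandn(n):
        return (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

    h = np.empty((n_paths, T), dtype=complex)
    h[:, 0] = crandn(n_paths)                  # start in the stationary distribution
    for t in range(1, T):
        h[:, t] = a * h[:, t - 1] + np.sqrt(sigma2) * crandn(n_paths)
    return a, h


a, h = simulate_ar1(fd=0.05, T=50, n_paths=20000, rng=np.random.default_rng(3))
power = np.mean(np.abs(h[:, -1]) ** 2)              # close to 1
lag1 = np.mean(h[:, -1] * np.conj(h[:, -2])).real   # close to a = J0(2*pi*0.05)
```

Note that an AR(1) process matches the Jakes autocorrelation (2.2) exactly only at lag one; at lag k it gives $a^k$ rather than $J_0(2\pi f_d k)$.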

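Consider a transmitted frame of length T that comprises $T_p$ pilot symbols at the beginning followed by $T_d = T - T_p$ unknown data symbols, denoted as $\mathbf{S} \triangleq (\mathbf{S}_p, \mathbf{S}_d)$, where we define $\mathbf{S}_p = (\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_{T_p})$ and $\mathbf{S}_d = (\mathbf{s}_{T_p+1}, \ldots, \mathbf{s}_T)$. The corresponding received vectors are given by $\mathbf{Y} \triangleq (\mathbf{Y}_p, \mathbf{Y}_d)$, with $\mathbf{Y}_p = (\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{T_p})$ and $\mathbf{Y}_d = (\mathbf{y}_{T_p+1}, \ldots, \mathbf{y}_T)$. Similarly, the channel vectors are denoted by $\mathbf{H} \triangleq (\mathbf{H}_p, \mathbf{H}_d) = (\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_{T_p}, \mathbf{h}_{T_p+1}, \ldots, \mathbf{h}_T)$. We are interested in a detector in which, having received $\mathbf{Y}$, the unknown channel vectors in $\mathbf{H}$ and the unknown transmitted symbols in $\mathbf{S}_d$ are jointly estimated. The joint posterior distribution of $\mathbf{S}$ and $\mathbf{H}$ is given by

$$\begin{aligned}p(\mathbf{S},\mathbf{H}|\mathbf{Y}) &\propto p(\mathbf{S}_d,\mathbf{H})\,p(\mathbf{Y}_p|\mathbf{S}_p,\mathbf{H}_p)\,p(\mathbf{Y}_d|\mathbf{S}_d,\mathbf{H}_d)\\ &= \Big[\prod_{t=T_p+1}^{T}p(\mathbf{s}_t)\Big]\Big[\prod_{t=1}^{T}p(\mathbf{h}_t|\mathbf{h}_{t-1})\,p(\mathbf{y}_t|\mathbf{s}_t,\mathbf{h}_t)\Big]\\ &= \prod_{t=1}^{T}p(\mathbf{s}_t)\,p(\mathbf{h}_t|\mathbf{h}_{t-1})\,p(\mathbf{y}_t|\mathbf{s}_t,\mathbf{h}_t),\end{aligned} \qquad (2.11)$$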

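in which $p(\mathbf{s}_t)$ is the probability mass function (pmf) of the transmitted vector $\mathbf{s}_t$, and by convention we set $p(\mathbf{s}_t) = 1$ for $t = 1, \ldots, T_p$ and $p(\mathbf{h}_1) = p(\mathbf{h}_1|\mathbf{h}_0)$. From (2.9) we have

$$p(\mathbf{y}_t|\mathbf{s}_t,\mathbf{h}_t) = \mathcal{CN}(\mathbf{y}_t \,|\, \mathbf{S}_t\mathbf{h}_t, \mathbf{R}_w). \qquad (2.12)$$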

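The optimum receiver implements the maximum a posteriori (MAP) rule, i.e.,

$$(\mathbf{S}_d, \mathbf{H})^* = \arg\max_{\mathbf{S}_d \in \mathcal{A}^{KT_d}_{\mathcal{M}},\; \mathbf{H} \in \mathbb{C}^{MK\times T}} p(\mathbf{S},\mathbf{H}|\mathbf{Y}). \qquad (2.13)$$

Due to the complexity of (2.13), finding the optimum solution is generally very difficult: it requires a search over $\mathcal{M}^{KT_d}$ symbol sequences jointly with an optimization over the continuous channel, and computing the corresponding marginals requires multidimensional integration. The EP algorithm proposed in the next section exploits the factorized structure of (2.11) to find a simpler approximation of the conditional joint distribution of $(\mathbf{S}, \mathbf{H})$ whose marginals can be calculated with much less effort.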

2.3 Semi-blind Expectation Propagation Algorithm

In this section, we develop the EP algorithm for noncoherent semi-blind detection in massive MIMO systems over fast-fading channels. For a review of the EP algorithm, we refer to [53, 79]. Let $\mathcal{F}$ denote an exponential family of distributions. Similar to [78], we exploit the factorized structure of (2.11) to approximate the posterior distribution $p(\mathbf{S},\mathbf{H}|\mathbf{Y})$ with the following distribution from $\mathcal{F}$:

$$p(\mathbf{S},\mathbf{H}|\mathbf{Y}) \approx q(\mathbf{S},\mathbf{H}) = \prod_{t=1}^{T}q_t(\mathbf{s}_t,\mathbf{h}_t) = \prod_{t=1}^{T}q_t(\mathbf{s}_t)\,q_t(\mathbf{h}_t), \qquad (2.14)$$

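where $q_t(\mathbf{s}_t,\mathbf{h}_t) = q_t(\mathbf{s}_t)q_t(\mathbf{h}_t) \in \mathcal{F}$. Examining (2.11) and following [78], we use the following product form for $q(\mathbf{S},\mathbf{H})$:

$$q(\mathbf{S},\mathbf{H}) \propto p(\mathbf{s}_1)\,p(\mathbf{h}_1)\,q^O_1(\mathbf{s}_1,\mathbf{h}_1) \times \prod_{t=2}^{T}p(\mathbf{s}_t)\,q^{FR}_t(\mathbf{h}_{t-1},\mathbf{h}_t)\,q^O_t(\mathbf{s}_t,\mathbf{h}_t), \qquad (2.15)$$

where, comparing (2.15) with (2.11), $q^{FR}_t(\mathbf{h}_{t-1},\mathbf{h}_t)$ approximates $p(\mathbf{h}_t|\mathbf{h}_{t-1})$ and $q^O_t(\mathbf{s}_t,\mathbf{h}_t)$ approximates $p(\mathbf{y}_t|\mathbf{s}_t,\mathbf{h}_t)$. Finally, to write (2.15) in the completely factorized form of (2.14), we let

$$q^O_t(\mathbf{s}_t,\mathbf{h}_t) = q^O_t(\mathbf{s}_t)\,q^O_t(\mathbf{h}_t), \qquad (2.16)$$

and

$$q^{FR}_t(\mathbf{h}_{t-1},\mathbf{h}_t) = q^R_{t-1}(\mathbf{h}_{t-1})\,q^F_t(\mathbf{h}_t). \qquad (2.17)$$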

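Now, inserting (2.16) and (2.17) into (2.15), and letting $q^F_1(\mathbf{h}_1)$ approximate $p(\mathbf{h}_1)$ and $q^R_T(\mathbf{h}_T) = 1$, we can write

$$q(\mathbf{S},\mathbf{H}) \propto \left[\prod_{t=1}^{T}p(\mathbf{s}_t)\,q^O_t(\mathbf{s}_t)\right]\left[\prod_{t=1}^{T}q^F_t(\mathbf{h}_t)\,q^R_t(\mathbf{h}_t)\,q^O_t(\mathbf{h}_t)\right]. \qquad (2.18)$$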

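From (2.18) and (2.14), the approximating posterior distribution $q_t(\mathbf{s}_t,\mathbf{h}_t)$ is given by

$$q_t(\mathbf{s}_t,\mathbf{h}_t) \propto p(\mathbf{s}_t)\,q^R_t(\mathbf{h}_t)\,q^F_t(\mathbf{h}_t)\,q^O_t(\mathbf{s}_t)\,q^O_t(\mathbf{h}_t). \qquad (2.19)$$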

Note that in (2.19), by the convention adopted after (2.11), $p(\mathbf{s}_t) = 1$ for all $t = 1, \ldots, T_p$. For clarity, both the true posterior distribution in (2.11) and the approximate one in (2.18) are illustrated with factor graphs in Fig. 2.1. Updating the factors $q^F_t(\cdot)$, $q^R_t(\cdot)$, and $q^O_t(\cdot)$ in (2.19) results in forward, reverse, and observation messages, respectively, which propagate through the factor graph in Fig. 2.1 (a) until a good approximation to the true posterior is obtained.

With EP, we update the factors while ensuring that they belong to the exponential family $\mathcal{F}$. As a result, their product $q(\mathbf{S},\mathbf{H})$ also belongs to $\mathcal{F}$ and can be used effectively for maximum a posteriori estimation. To this end, we select the distributions $q^O_t(\mathbf{h}_t)$, $q^F_t(\mathbf{h}_t)$, and $q^R_t(\mathbf{h}_t)$ from the exponential family $\mathcal{F}$, and $q^O_t(\mathbf{s}_t)$ to be discrete. Since $p(\mathbf{s}_t)$ is also discrete, it follows that (2.19) and (2.18) are from the exponential family. In the following, we update these factors.

Figure 2.1. Factor graph illustrations of (a) the true posterior distribution in (2.11), and (b) the approximate posterior distribution in (2.18). Small rectangles represent factor nodes and circles represent variable nodes. A plate (big rectangle) notation is used to represent a repetition of variables in the subgraph.

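First, we ignore the beginning portion of the frame and compute $q^O_t(\mathbf{s}_t)$ and $q^O_t(\mathbf{h}_t)$ for the blind part of the frame, i.e., for $t = T_p+1, \ldots, T$. As shown in Fig. 2.2, we define the cavity distribution as

$$q^{\backslash O}_t(\mathbf{s}_t,\mathbf{h}_t) = \frac{q_t(\mathbf{s}_t,\mathbf{h}_t)}{q^O_t(\mathbf{s}_t)\,q^O_t(\mathbf{h}_t)} = p(\mathbf{s}_t)\,q^{\backslash O}_t(\mathbf{h}_t). \qquad (2.20)$$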

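From the exponential family $\mathcal{F}$, we specifically consider the family of multivariate Gaussian distributions. More precisely, we let $q^{\backslash O}_t(\mathbf{h}_t) \triangleq q^F_t(\mathbf{h}_t)\,q^R_t(\mathbf{h}_t) \sim \mathcal{CN}(\mathbf{m}^{\backslash O}_t, \mathbf{V}^{\backslash O}_t)$. The discrete distributions $q^O_t(\mathbf{s}_t)$ are taken to be the probability mass functions (pmf) of their corresponding random variables. Denoting $\mathcal{A}^K_{\mathcal{M}} \triangleq \{\mathbf{a}_1, \mathbf{a}_2, \cdots, \mathbf{a}_{\mathcal{M}^K}\}$, the pmf can be defined as

$$q^O_t(\mathbf{s}_t) = \left[P(\mathbf{s}_t = \mathbf{a}_1), \cdots, P(\mathbf{s}_t = \mathbf{a}_{\mathcal{M}^K})\right]. \qquad (2.21)$$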

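Next, the hybrid posterior distribution is defined by combining the t-th factor of the likelihood function $p(\mathbf{Y}|\mathbf{S},\mathbf{H})$, namely $p(\mathbf{y}_t|\mathbf{s}_t,\mathbf{h}_t)$, with (2.20) to obtain the following intermediate approximate posterior:

$$\hat{q}_t(\mathbf{s}_t,\mathbf{h}_t) = \frac{p(\mathbf{s}_t)\,q^{\backslash O}_t(\mathbf{h}_t)\,p(\mathbf{y}_t|\mathbf{s}_t,\mathbf{h}_t)}{Z_t}, \qquad (2.22)$$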

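where $Z_t$ is a normalization factor given by

$$\begin{aligned}Z_t &= \sum_{\mathbf{s}_t\in\mathcal{A}^K_{\mathcal{M}}}\int_{\mathbf{h}_t}p(\mathbf{s}_t)\,q^{\backslash O}_t(\mathbf{h}_t)\,p(\mathbf{y}_t|\mathbf{s}_t,\mathbf{h}_t)\,d\mathbf{h}_t\\ &= \sum_{\mathbf{s}_t\in\mathcal{A}^K_{\mathcal{M}}}p(\mathbf{s}_t)\int_{\mathbf{h}_t}\mathcal{CN}(\mathbf{h}_t|\mathbf{m}^{\backslash O}_t,\mathbf{V}^{\backslash O}_t)\,\mathcal{CN}(\mathbf{y}_t|\mathbf{S}_t\mathbf{h}_t,\mathbf{R}_w)\,d\mathbf{h}_t\\ &= \sum_{\mathbf{s}_t\in\mathcal{A}^K_{\mathcal{M}}}p(\mathbf{s}_t)\,\mathcal{CN}(\mathbf{y}_t|\mathbf{S}_t\mathbf{m}^{\backslash O}_t,\boldsymbol{\Sigma}_t).\end{aligned} \qquad (2.23)$$

Figure 2.2. EP steps for updating $q^O_t(\mathbf{s}_t,\mathbf{h}_t)$: (a) Eliminate $q^O_t(\mathbf{s}_t)$ and $q^O_t(\mathbf{h}_t)$ from the factor graph to find the cavity distribution $q^{\backslash O}_t(\mathbf{s}_t,\mathbf{h}_t)$ as in (2.20), (b) Use the $p(\mathbf{y}_t|\mathbf{s}_t,\mathbf{h}_t)$ factor to define the hybrid posterior distribution $\hat{q}_t(\mathbf{s}_t,\mathbf{h}_t)$ as in (2.22), and (c) Project $\hat{q}_t(\mathbf{s}_t,\mathbf{h}_t)$ onto $\mathcal{F}$ and update $q^O_t(\mathbf{s}_t,\mathbf{h}_t)$ as in (2.33) and (2.37).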

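In (2.23), $\boldsymbol{\Sigma}_t \triangleq \mathbf{S}_t\mathbf{V}^{\backslash O}_t\mathbf{S}^H_t + \mathbf{R}_w$. Next, we project $\hat{q}_t(\mathbf{s}_t,\mathbf{h}_t)$ onto the closest distribution in $\mathcal{F}$ (in the sense of the Kullback-Leibler divergence) to find $q_t(\mathbf{s}_t,\mathbf{h}_t) = q_t(\mathbf{s}_t)q_t(\mathbf{h}_t)$:

$$\begin{aligned}q_t(\mathbf{s}_t,\mathbf{h}_t) &= \arg\min_{q_t(\mathbf{s}_t,\mathbf{h}_t)\in\mathcal{F}}\mathrm{KL}\left(\hat{q}_t(\mathbf{s}_t,\mathbf{h}_t)\,\|\,q_t(\mathbf{s}_t,\mathbf{h}_t)\right)\\ &= \arg\min_{q_t(\mathbf{s}_t),\,q_t(\mathbf{h}_t)\in\mathcal{F}}\mathrm{KL}\left(\hat{q}_t(\mathbf{s}_t,\mathbf{h}_t)\,\|\,q_t(\mathbf{s}_t)\,q_t(\mathbf{h}_t)\right),\end{aligned} \qquad (2.24)$$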

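where $\mathrm{KL}(\cdot\|\cdot)$ denotes the Kullback-Leibler divergence. It is shown in [54] that the above optimization problem can be divided into the following two separate optimizations:

$$q_t(\mathbf{h}_t) = \arg\min_{q_t(\mathbf{h}_t)\in\mathcal{F}}\mathrm{KL}\left(\hat{q}_t(\mathbf{h}_t)\,\|\,q_t(\mathbf{h}_t)\right), \qquad (2.25)$$

$$q_t(\mathbf{s}_t) = \arg\min_{q_t(\mathbf{s}_t)\in\mathcal{F}}\mathrm{KL}\left(\hat{q}_t(\mathbf{s}_t)\,\|\,q_t(\mathbf{s}_t)\right), \qquad (2.26)$$

where $\hat{q}_t(\mathbf{h}_t)$ and $\hat{q}_t(\mathbf{s}_t)$ are the marginal distributions of $\mathbf{h}_t$ and $\mathbf{s}_t$, respectively, derived from their joint distribution $\hat{q}_t(\mathbf{s}_t,\mathbf{h}_t)$.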

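The solution to (2.25) is obtained from the so-called moment matching property [54]. Since the approximate posterior $q_t(\mathbf{h}_t) \in \mathcal{F}$, we let $q_t(\mathbf{h}_t) \sim \mathcal{CN}(\mathbf{h}_t|\mathbf{m}_t,\mathbf{V}_t)$. The moment matching property implies

$$\mathbf{m}_t = \mathbb{E}_{\hat{q}_t(\mathbf{h}_t)}[\mathbf{h}_t], \qquad (2.27)$$

$$\mathbf{V}_t = \mathbb{E}_{\hat{q}_t(\mathbf{h}_t)}[\mathbf{h}_t\mathbf{h}^H_t] - \mathbf{m}_t\mathbf{m}^H_t. \qquad (2.28)$$

The values of $\mathbf{m}_t$ and $\mathbf{V}_t$ are given by the following lemma, whose proof is given in Appendix B.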

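Lemma 1. 1. The posterior mean $\mathbf{m}_t$ is given by

$$\mathbf{m}_t = \mathbf{m}^{\backslash O}_t + \mathbf{V}^{\backslash O}_t\nabla^H_{\mathbf{m}}, \qquad (2.29)$$

where

$$\nabla^H_{\mathbf{m}} \triangleq \left(\frac{\partial \log Z_t}{\partial \mathbf{m}^{\backslash O}_t}\right)^H = \frac{1}{Z_t}\sum_{\mathbf{s}_t\in\mathcal{A}^K_{\mathcal{M}}}p(\mathbf{s}_t)\,\mathcal{CN}(\mathbf{y}_t|\mathbf{S}_t\mathbf{m}^{\backslash O}_t,\boldsymbol{\Sigma}_t)\,\mathbf{S}^H_t\boldsymbol{\Sigma}^{-1}_t\boldsymbol{\zeta}_t, \qquad (2.30)$$

where $\boldsymbol{\zeta}_t \triangleq \mathbf{y}_t - \mathbf{S}_t\mathbf{m}^{\backslash O}_t$ and $Z_t$ is given in (2.23).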

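2. The covariance matrix $\mathbf{V}_t$ is given by

$$\mathbf{V}_t = \mathbf{V}^{\backslash O}_t - \mathbf{V}^{\backslash O}_t\left(\nabla^H_{\mathbf{m}}\nabla_{\mathbf{m}} - \nabla_{\mathbf{V}}\right)\mathbf{V}^{\backslash O}_t, \qquad (2.31)$$

where

$$\nabla_{\mathbf{V}} \triangleq \left(\frac{\partial\log Z_t}{\partial\mathbf{V}^{\backslash O}_t}\right)^T = \frac{1}{Z_t}\sum_{\mathbf{s}_t\in\mathcal{A}^K_{\mathcal{M}}}p(\mathbf{s}_t)\,\mathcal{CN}(\mathbf{y}_t|\mathbf{S}_t\mathbf{m}^{\backslash O}_t,\boldsymbol{\Sigma}_t)\,\mathbf{S}^H_t\left(\boldsymbol{\Sigma}^{-1}_t\boldsymbol{\zeta}_t\boldsymbol{\zeta}^H_t\boldsymbol{\Sigma}^{-1}_t - \boldsymbol{\Sigma}^{-1}_t\right)\mathbf{S}_t. \qquad (2.32)$$

Note that, to simplify notation, we have dropped the index t from the left-hand sides of (2.30) and (2.32), here and in what follows.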
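Lemma 1 can be sanity-checked numerically: the moments it produces should coincide with the mean and covariance of the Gaussian mixture $\hat{q}_t(\mathbf{h}_t)$, computed directly as a weighted combination of per-hypothesis LMMSE posteriors. The sketch below assumes BPSK symbols with a uniform prior, tiny dimensions, an arbitrary cavity $\mathcal{CN}(\mathbf{m}^{\backslash O}_t, \mathbf{V}^{\backslash O}_t)$, and $\mathbf{R}_w = 0.1\,\mathbf{I}_M$; the function names are hypothetical. Weights are normalized in the log domain, which is equivalent to dividing by $Z_t$ and avoids underflow.

```python
import itertools
import numpy as np


def log_cn(y, mean, cov):
    """log CN(y | mean, cov) for a circularly symmetric complex Gaussian."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -np.real(np.vdot(d, np.linalg.solve(cov, d))) - logdet - y.size * np.log(np.pi)


def lemma1_moments(y, m0, V0, Rw, cands, prior, M):
    """m_t and V_t from (2.29)-(2.32), given the cavity CN(m0, V0)."""
    logw, parts = [], []
    for s, p in zip(cands, prior):
        S = np.kron(s.reshape(1, -1), np.eye(M))            # S_t = s_t^T ⊗ I_M
        Sigma = S @ V0 @ S.conj().T + Rw                     # Sigma_t
        logw.append(np.log(p) + log_cn(y, S @ m0, Sigma))
        parts.append((S, np.linalg.inv(Sigma), y - S @ m0))  # (S_t, Sigma_t^-1, zeta_t)
    w = np.exp(np.array(logw) - max(logw))
    w /= w.sum()                                             # p(s)CN(.)/Z_t, cf. (2.36)
    g = sum(wi * S.conj().T @ Si @ z for wi, (S, Si, z) in zip(w, parts))     # (2.30)
    GV = sum(wi * S.conj().T @ (Si @ np.outer(z, z.conj()) @ Si - Si) @ S
             for wi, (S, Si, z) in zip(w, parts))                           # (2.32)
    m = m0 + V0 @ g                                                         # (2.29)
    V = V0 - V0 @ (np.outer(g, g.conj()) - GV) @ V0                         # (2.31)
    return m, V, w


def mixture_moments(y, m0, V0, Rw, cands, w, M):
    """Reference: exact mean/covariance of the mixture of per-hypothesis posteriors."""
    mus, Cs = [], []
    for s in cands:
        S = np.kron(s.reshape(1, -1), np.eye(M))
        G = V0 @ S.conj().T @ np.linalg.inv(S @ V0 @ S.conj().T + Rw)   # LMMSE gain
        mus.append(m0 + G @ (y - S @ m0))
        Cs.append(V0 - G @ S @ V0)
    m = sum(wi * mu for wi, mu in zip(w, mus))
    V = sum(wi * (C + np.outer(mu, mu.conj())) for wi, C, mu in zip(w, Cs, mus))
    return m, V - np.outer(m, m.conj())


rng = np.random.default_rng(2)
M, K = 2, 2
cands = [np.array(c, dtype=complex) for c in itertools.product([1, -1], repeat=K)]
prior = [1 / len(cands)] * len(cands)
h = (rng.standard_normal(M * K) + 1j * rng.standard_normal(M * K)) / np.sqrt(2)
m0, V0, Rw = 0.8 * h, 0.3 * np.eye(M * K), 0.1 * np.eye(M)   # informative cavity
y = np.kron(cands[1].reshape(1, -1), np.eye(M)) @ h + 0.3 * rng.standard_normal(M)

m, V, w = lemma1_moments(y, m0, V0, Rw, cands, prior, M)
m_ref, V_ref = mixture_moments(y, m0, V0, Rw, cands, w, M)
print(np.allclose(m, m_ref), np.allclose(V, V_ref))   # True True
```

With a single candidate (the pilot case), the same function reduces to the standard LMMSE (Kalman measurement) update, which the reference function computes directly.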

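Thus, $q^O_t(\mathbf{h}_t)$ can be updated as

$$q^O_t(\mathbf{h}_t) = \frac{q_t(\mathbf{h}_t)}{q^{\backslash O}_t(\mathbf{h}_t)} \propto \mathcal{CN}(\mathbf{h}_t|\mathbf{m}^O_t,\mathbf{V}^O_t). \qquad (2.33)$$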

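As discussed in [78], $\mathbf{V}^O_t$ may become singular during the iterations of the algorithm. Therefore, to avoid numerical issues, we represent the mean and covariance by the natural parameters

$$\boldsymbol{\mu}^O_t = \boldsymbol{\Lambda}^O_t\mathbf{m}^O_t = \mathbf{V}^{-1}_t\mathbf{m}_t - \left(\mathbf{V}^{\backslash O}_t\right)^{-1}\mathbf{m}^{\backslash O}_t, \qquad (2.34)$$

$$\boldsymbol{\Lambda}^O_t = \left(\mathbf{V}^O_t\right)^{-1} = \mathbf{V}^{-1}_t - \left(\mathbf{V}^{\backslash O}_t\right)^{-1}. \qquad (2.35)$$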
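The reason for working with (2.34)-(2.35) is easy to see in a small real-valued example. In the sketch below (hypothetical values), the observation carries no information about the second channel coefficient, so the posterior and cavity covariances coincide in that direction; $\boldsymbol{\Lambda}^O_t$ is then singular and $\mathbf{V}^O_t$ cannot be formed by inversion, while the natural parameters remain well defined and multiply back to the posterior exactly. The function name is an assumption for illustration.

```python
import numpy as np


def divide_gaussians(m_post, V_post, m_cav, V_cav):
    """Natural parameters (mu, Lam) of q_t(h) / cavity(h), following (2.34)-(2.35)."""
    P_post, P_cav = np.linalg.inv(V_post), np.linalg.inv(V_cav)
    Lam = P_post - P_cav                      # (2.35); may be singular or indefinite
    mu = P_post @ m_post - P_cav @ m_cav      # (2.34)
    return mu, Lam


m_cav, V_cav = np.array([0.0, 0.5]), np.eye(2)
m_post, V_post = np.array([0.4, 0.5]), np.diag([0.5, 1.0])   # 2nd entry unchanged
mu, Lam = divide_gaussians(m_post, V_post, m_cav, V_cav)

print(np.linalg.matrix_rank(Lam))             # 1: inv(Lam) does not exist
# Multiplying the factor back onto the cavity recovers the posterior exactly.
P = np.linalg.inv(V_cav) + Lam
print(np.allclose(np.linalg.inv(P), V_post))                                       # True
print(np.allclose(np.linalg.solve(P, np.linalg.inv(V_cav) @ m_cav + mu), m_post))  # True
```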

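The solution to (2.26) is to match the pmf of the posterior distribution $q_t(\mathbf{s}_t)$ to the marginal $\hat{q}_t(\mathbf{s}_t)$, which results in

$$q_t(\mathbf{s}_t) = \frac{p(\mathbf{s}_t)}{Z_t}\,\mathcal{CN}(\mathbf{y}_t|\mathbf{S}_t\mathbf{m}^{\backslash O}_t,\boldsymbol{\Sigma}_t). \qquad (2.36)$$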

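Figure 2.3. EP steps for updating $q^R_{t-1}(\mathbf{h}_{t-1})$ and $q^F_t(\mathbf{h}_t)$: (a) Eliminate $q^R_{t-1}(\mathbf{h}_{t-1})$ and $q^F_t(\mathbf{h}_t)$ from the factor graph to find the cavity distributions $q^{\backslash R}_{t-1}(\mathbf{h}_{t-1})$ and $q^{\backslash F}_t(\mathbf{h}_t)$ as in (2.38), (b) Use the $p(\mathbf{h}_t|\mathbf{h}_{t-1})$ factor to define the intermediate posterior distribution $r_t(\mathbf{h}_{t-1},\mathbf{h}_t)$ as in (2.39), and (c) Project $r_t(\mathbf{h}_{t-1},\mathbf{h}_t)$ onto $\mathcal{F}$ and update $q^F_t(\mathbf{h}_t)$ and $q^R_{t-1}(\mathbf{h}_{t-1})$ using (2.45)-(2.58).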

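Therefore, $q^O_t(\mathbf{s}_t)$ is updated by

$$q^O_t(\mathbf{s}_t) = \frac{q_t(\mathbf{s}_t)}{p(\mathbf{s}_t)} = \frac{1}{Z_t}\,\mathcal{CN}(\mathbf{y}_t|\mathbf{S}_t\mathbf{m}^{\backslash O}_t,\boldsymbol{\Sigma}_t). \qquad (2.37)$$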
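The closed form in (2.23) rests on the Gaussian identity $\int\mathcal{CN}(\mathbf{h}|\mathbf{m},\mathbf{V})\,\mathcal{CN}(\mathbf{y}|\mathbf{S}\mathbf{h},\mathbf{R}_w)\,d\mathbf{h} = \mathcal{CN}(\mathbf{y}|\mathbf{S}\mathbf{m},\mathbf{S}\mathbf{V}\mathbf{S}^H+\mathbf{R}_w)$. The sketch below checks this identity by Monte Carlo for one hypothetical symbol vector and then forms $q_t(\mathbf{s}_t)$ of (2.36) and $q^O_t(\mathbf{s}_t)$ of (2.37) for BPSK with a deliberately non-uniform prior. All sizes, values, and helper names are assumptions for illustration, and the Monte Carlo line relies on $\mathbf{R}_w = \mathbf{I}_M$.

```python
import itertools
import numpy as np


def cn_pdf(y, mean, cov):
    """Density of CN(mean, cov) at y (circularly symmetric complex Gaussian)."""
    d = y - mean
    quad = np.real(np.vdot(d, np.linalg.solve(cov, d)))
    return np.exp(-quad) / (np.pi ** y.size * np.real(np.linalg.det(cov)))


rng = np.random.default_rng(4)
M, K = 2, 2
m, V, Rw = np.array([0.5, -0.2, 0.1, 0.3]), 0.3 * np.eye(M * K), np.eye(M)
y = np.array([0.4 + 0.1j, -0.3 + 0.2j])
cands = [np.array(c, dtype=complex) for c in itertools.product([1, -1], repeat=K)]
prior = np.array([0.4, 0.3, 0.2, 0.1])              # non-uniform p(s_t)

# Closed form (2.23): one Gaussian likelihood per candidate symbol vector.
lik = []
for s in cands:
    S = np.kron(s.reshape(1, -1), np.eye(M))
    lik.append(cn_pdf(y, S @ m, S @ V @ S.conj().T + Rw))
lik = np.array(lik)
Z = np.sum(prior * lik)

# Monte Carlo check of the Gaussian integral for the first candidate.
S0 = np.kron(cands[0].reshape(1, -1), np.eye(M))
n = 200_000
chol = np.linalg.cholesky(V)                         # V = chol chol^H
z = (rng.standard_normal((n, M * K)) + 1j * rng.standard_normal((n, M * K))) / np.sqrt(2)
h = m + z @ chol.T                                   # rows are samples of CN(m, V)
d = y[None, :] - h @ S0.T                            # y - S0 h for every sample
mc = np.mean(np.exp(-np.sum(np.abs(d) ** 2, axis=1))) / np.pi ** M   # uses R_w = I

q_s = prior * lik / Z                                # (2.36)
q_O = q_s / prior                                    # (2.37): equals lik / Z
```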

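We now consider the beginning portion of the transmitted frame, $t = 1, \ldots, T_p$, for which we do not compute $q_t^O(\mathbf{s}_t)$ since $\mathbf{s}_t$ are the known pilot symbols. To update $q_t^O(\mathbf{h}_t)$ given $q_t^{\backslash O}(\mathbf{h}_t)$, the posterior factor $q_t(\mathbf{h}_t)$ can be directly updated from (2.29) and (2.31) by using $\nabla_{\mathbf{m}}^H = \mathbf{S}_t^H\boldsymbol{\Sigma}_t^{-1}\boldsymbol{\zeta}_t$ and $\nabla_{\mathbf{V}} = \mathbf{S}_t^H\left(\boldsymbol{\Sigma}_t^{-1}\boldsymbol{\zeta}_t\boldsymbol{\zeta}_t^H\boldsymbol{\Sigma}_t^{-1} - \boldsymbol{\Sigma}_t^{-1}\right)\mathbf{S}_t$. We should point out that for this part of the frame, the summations in (2.23), (2.30), and (2.32) reduce to a single term corresponding to the known pilot symbol $\mathbf{s}_t$. Following this, $q_t^O(\mathbf{h}_t)$ can be computed from (2.34) and (2.35).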
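With a single term, substituting these gradients into (2.31) makes the $\boldsymbol{\zeta}_t\boldsymbol{\zeta}_t^H$ terms cancel, because $\nabla_{\mathbf{m}}^H\nabla_{\mathbf{m}} - \nabla_{\mathbf{V}} = \mathbf{S}_t^H\boldsymbol{\Sigma}_t^{-1}\mathbf{S}_t$; hence $\mathbf{V}_t = \mathbf{V}_t^{\backslash O} - \mathbf{V}_t^{\backslash O}\mathbf{S}_t^H\boldsymbol{\Sigma}_t^{-1}\mathbf{S}_t\mathbf{V}_t^{\backslash O}$. The cancellation holds for any $\boldsymbol{\Sigma}_t$; reading the result as a Kalman measurement update with gain $\mathbf{V}_t^{\backslash O}\mathbf{S}_t^H\boldsymbol{\Sigma}_t^{-1}$ uses the assumption $\boldsymbol{\Sigma}_t = \mathbf{S}_t\mathbf{V}_t^{\backslash O}\mathbf{S}_t^H + \sigma^2\mathbf{I}_M$. The NumPy sketch below (hypothetical name `pilot_update`) applies (2.29)-(2.32) literally with one term under that assumption.

```python
import numpy as np


def pilot_update(y, s, m_cav, V_cav, sigma2):
    """Single-term (known pilot) version of (2.29)-(2.32).

    Assumption: Sigma_t = S_t V_cav S_t^H + sigma2 * I_M.
    """
    M = y.size
    S = np.kron(np.asarray(s)[None, :], np.eye(M))           # S_t = s_t^T kron I_M
    Sigma_inv = np.linalg.inv(S @ V_cav @ S.conj().T + sigma2 * np.eye(M))
    u = Sigma_inv @ (y - S @ m_cav)                          # Sigma_t^{-1} zeta_t
    g = S.conj().T @ u                                       # grad_m^H
    G = S.conj().T @ (np.outer(u, u.conj()) - Sigma_inv) @ S  # grad_V
    m = m_cav + V_cav @ g                                    # eq. (2.29)
    V = V_cav - V_cav @ (np.outer(g, g.conj()) - G) @ V_cav  # eq. (2.31)
    return m, V
```

In information form the same update reads $\mathbf{V}_t^{-1} = (\mathbf{V}_t^{\backslash O})^{-1} + \mathbf{S}_t^H\mathbf{S}_t/\sigma^2$, which is a convenient check on any implementation.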

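Next, we need to update $q_t^F(\mathbf{h}_t)$ and $q_t^R(\mathbf{h}_t)$ for the entire transmitted frame. Following the steps summarized in Fig. 2.3, we define the following intermediate distribution

$$r_t(\mathbf{h}_{t-1}, \mathbf{h}_t) \triangleq q_{t-1}^{\backslash R}(\mathbf{h}_{t-1})\,p(\mathbf{h}_t \mid \mathbf{h}_{t-1})\,q_t^{\backslash F}(\mathbf{h}_t), \tag{2.38}$$

where $q_t^{\backslash R}(\mathbf{h}_t) \triangleq q_t(\mathbf{h}_t)/q_t^R(\mathbf{h}_t)$ and $q_t^{\backslash F}(\mathbf{h}_t) \triangleq q_t(\mathbf{h}_t)/q_t^F(\mathbf{h}_t)$ are the cavity distributions, given by $\mathcal{CN}(\mathbf{m}_t^{\backslash i}, \mathbf{V}_t^{\backslash i})$ with $i \in \{F, R\}$. Via some algebraic manipulations, we can show that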

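$$r_t(\mathbf{h}_{t-1}, \mathbf{h}_t) \propto \mathcal{CN}\!\left(\begin{bmatrix}\mathbf{h}_{t-1}\\ \mathbf{h}_t\end{bmatrix} \,\middle|\, \begin{bmatrix}\boldsymbol{\mu}_{t-1}\\ \boldsymbol{\mu}_t\end{bmatrix}, \begin{bmatrix}\boldsymbol{\Lambda}_{t-1,t-1} & \boldsymbol{\Lambda}_{t-1,t}\\ \boldsymbol{\Lambda}_{t,t-1} & \boldsymbol{\Lambda}_{t,t}\end{bmatrix}^{-1}\right), \tag{2.39}$$

where

$$\boldsymbol{\Lambda}_{t-1,t-1} \triangleq \left(\mathbf{V}_{t-1}^{\backslash R}\right)^{-1} + \mathbf{A}^H\mathbf{Q}^{-1}\mathbf{A}, \tag{2.40}$$

$$\boldsymbol{\Lambda}_{t,t-1} = \boldsymbol{\Lambda}_{t-1,t}^H \triangleq -\mathbf{Q}^{-1}\mathbf{A}, \tag{2.41}$$

$$\boldsymbol{\Lambda}_{t,t} \triangleq \left(\mathbf{V}_t^{\backslash F}\right)^{-1} + \mathbf{Q}^{-1}, \tag{2.42}$$

and the means are related by

$$\boldsymbol{\Lambda}_{t-1,t-1}\boldsymbol{\mu}_{t-1} + \boldsymbol{\Lambda}_{t-1,t}\boldsymbol{\mu}_t = \left(\mathbf{V}_{t-1}^{\backslash R}\right)^{-1}\mathbf{m}_{t-1}^{\backslash R}, \tag{2.43}$$

$$\boldsymbol{\Lambda}_{t,t}\boldsymbol{\mu}_t + \boldsymbol{\Lambda}_{t,t-1}\boldsymbol{\mu}_{t-1} = \left(\mathbf{V}_t^{\backslash F}\right)^{-1}\mathbf{m}_t^{\backslash F}. \tag{2.44}$$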

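Next, we project $r_t(\mathbf{h}_{t-1}, \mathbf{h}_t)$ onto $\mathcal{F}$ by minimizing the following KL divergence:

$$\left(q_t(\mathbf{h}_t), q_{t-1}(\mathbf{h}_{t-1})\right) = \arg\min_{q_t(\mathbf{h}_t),\, q_{t-1}(\mathbf{h}_{t-1}) \in \mathcal{F}} \mathrm{KL}\!\left(r_t(\mathbf{h}_{t-1}, \mathbf{h}_t) \,\|\, q_t(\mathbf{h}_t)\,q_{t-1}(\mathbf{h}_{t-1})\right). \tag{2.45}$$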

As in the case of the optimization in (2.24), which resulted in (2.25) and (2.26), the above optimization problem can also be decomposed into two separate problems. In each one, we minimize the KL divergence between $q_k(\mathbf{h}_k)$, $k \in \{t, t-1\}$, and the respective marginal distribution obtained from $r_t(\mathbf{h}_{t-1}, \mathbf{h}_t)$ by integrating out the other variable. Setting $q_t(\mathbf{h}_t)$ to the marginal distribution in the KL optimization problem, $q_t^F(\mathbf{h}_t)$ can be updated from

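$$q_t^F(\mathbf{h}_t) = \frac{q_t(\mathbf{h}_t)}{q_t^{\backslash F}(\mathbf{h}_t)} = \int q_{t-1}^{\backslash R}(\mathbf{h}_{t-1})\,p(\mathbf{h}_t \mid \mathbf{h}_{t-1})\,d\mathbf{h}_{t-1} \propto \mathcal{CN}\!\left(\mathbf{h}_t \mid \mathbf{m}_t^F, \mathbf{V}_t^F\right), \tag{2.46}$$

where

$$\mathbf{m}_t^F = \mathbf{A}\mathbf{m}_{t-1}^{\backslash R}, \qquad \mathbf{V}_t^F = \mathbf{A}\mathbf{V}_{t-1}^{\backslash R}\mathbf{A}^H + \mathbf{Q}. \tag{2.47}$$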
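As a numerical check of (2.39)-(2.47), the NumPy sketch below assembles the block precision matrix of $r_t(\mathbf{h}_{t-1}, \mathbf{h}_t)$ from (2.40)-(2.42) and solves (2.43)-(2.44) for the joint mean. Marginalizing out $\mathbf{h}_{t-1}$ should then give $q_t^{\backslash F}(\mathbf{h}_t)$ multiplied by the forward message $\mathcal{CN}(\mathbf{h}_t \mid \mathbf{m}_t^F, \mathbf{V}_t^F)$ of (2.46)-(2.47). The function names are hypothetical, and $\mathbf{Q}$ is assumed Hermitian positive definite.

```python
import numpy as np


def joint_precision(V_R_prev, V_F_cav, A, Q):
    """Block precision of r_t(h_{t-1}, h_t), eqs. (2.40)-(2.42)."""
    Qi = np.linalg.inv(Q)
    L11 = np.linalg.inv(V_R_prev) + A.conj().T @ Qi @ A    # eq. (2.40)
    L21 = -Qi @ A                                           # eq. (2.41)
    L22 = np.linalg.inv(V_F_cav) + Qi                       # eq. (2.42)
    return np.block([[L11, L21.conj().T], [L21, L22]])


def joint_mean(m_R_prev, V_R_prev, m_F_cav, V_F_cav, A, Q):
    """Solve the linear system (2.43)-(2.44) for [mu_{t-1}; mu_t]."""
    Lam = joint_precision(V_R_prev, V_F_cav, A, Q)
    rhs = np.concatenate([np.linalg.solve(V_R_prev, m_R_prev),
                          np.linalg.solve(V_F_cav, m_F_cav)])
    return np.linalg.solve(Lam, rhs), Lam
```

The lower-right block of the inverse precision is the marginal covariance of $\mathbf{h}_t$; it should equal $\left((\mathbf{V}_t^{\backslash F})^{-1} + (\mathbf{V}_t^F)^{-1}\right)^{-1}$.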

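To update $q_t^R(\mathbf{h}_t)$, we follow the Kalman smoothing derivation and incorporate it directly into $q_t(\mathbf{h}_t)$. Toward this end, we first compute $q_t^{\backslash R}(\mathbf{h}_t)$ as follows:

$$\begin{aligned} q_t^{\backslash R}(\mathbf{h}_t) &= q_t^F(\mathbf{h}_t)\,q_t^O(\mathbf{h}_t) \\ &= \mathcal{CN}\!\left(\mathbf{h}_t \mid \mathbf{m}_t^F, \mathbf{V}_t^F\right)\mathcal{CN}\!\left(\mathbf{h}_t \mid \mathbf{m}_t^O, \mathbf{V}_t^O\right) \\ &\propto \mathcal{CN}\!\left(\mathbf{h}_t \mid \mathbf{m}_t^{\backslash R}, \mathbf{V}_t^{\backslash R}\right), \end{aligned} \tag{2.48}$$

where

$$\mathbf{m}_t^{\backslash R} = \mathbf{V}_t^{\backslash R}\left(\left(\mathbf{V}_t^F\right)^{-1}\mathbf{m}_t^F + \boldsymbol{\mu}_t^O\right), \tag{2.49}$$

$$\mathbf{V}_t^{\backslash R} = \left(\left(\mathbf{V}_t^F\right)^{-1} + \boldsymbol{\Lambda}_t^O\right)^{-1}. \tag{2.50}$$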
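Eqs. (2.49)-(2.50) multiply a moment-form Gaussian by a message stored in natural form, so the possibly singular $\boldsymbol{\Lambda}_t^O$ is never inverted. A NumPy sketch (the function name is hypothetical):

```python
import numpy as np


def backward_cavity(m_F, V_F, mu_O, Lam_O):
    """q_t^{\\R} = q_t^F * q_t^O with q_t^O in natural form, eqs. (2.49)-(2.50)."""
    P_F = np.linalg.inv(V_F)
    V_R = np.linalg.inv(P_F + Lam_O)       # eq. (2.50)
    m_R = V_R @ (P_F @ m_F + mu_O)         # eq. (2.49)
    return m_R, V_R
```

When $q_t^O$ comes from a known pilot and, as an assumption, $\boldsymbol{\Sigma}_t = \mathbf{S}_t\mathbf{V}_t^{\backslash O}\mathbf{S}_t^H + \sigma^2\mathbf{I}_M$ (so that $\boldsymbol{\Lambda}_t^O = \mathbf{S}_t^H\mathbf{S}_t/\sigma^2$ and $\boldsymbol{\mu}_t^O = \mathbf{S}_t^H\mathbf{y}_t/\sigma^2$), the result coincides with the Kalman-filtered estimate obtained from the forward message.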

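Next, since (2.43) is related to the marginal distribution of $\mathbf{h}_{t-1}$, we obtain the mean of the posterior distribution $q_{t-1}(\mathbf{h}_{t-1})$ by solving for $\boldsymbol{\mu}_{t-1}$. By substituting $\boldsymbol{\mu}_t = \mathbf{m}_t$, $\boldsymbol{\Lambda}_{t-1,t-1}$ from (2.40), and $\boldsymbol{\Lambda}_{t-1,t}$ from (2.41), it can be shown that

$$\boldsymbol{\mu}_{t-1} = \boldsymbol{\Lambda}_{t-1,t-1}^{-1}\left[\left(\mathbf{V}_{t-1}^{\backslash R}\right)^{-1}\mathbf{m}_{t-1}^{\backslash R} - \boldsymbol{\Lambda}_{t-1,t}\mathbf{m}_t\right], \tag{2.51}$$

$$\mathbf{V}_{t-1,t-1} = \boldsymbol{\Lambda}_{t-1,t-1}^{-1} + \boldsymbol{\Lambda}_{t-1,t-1}^{-1}\boldsymbol{\Lambda}_{t-1,t}\mathbf{V}_t\boldsymbol{\Lambda}_{t,t-1}\boldsymbol{\Lambda}_{t-1,t-1}^{-1}. \tag{2.52}$$

Here (2.51) is the conditional mean of $\mathbf{h}_{t-1}$ given $\mathbf{h}_t$, evaluated at $\mathbf{m}_t$, and (2.52) follows from the law of total covariance.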

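Algorithm 1: Semi-blind EP Algorithm
Input: $\mathbf{Y}$, $\sigma_w^2$, $\mathbf{A}$, $\mathbf{Q}$, $n$, $\epsilon$, $\mathbf{m}_0^{\backslash R}$, $\mathbf{V}_0^{\backslash R}$
Output: $\hat{\mathbf{S}}_d$, $\hat{\mathbf{H}}$
/* Initial filtering pass (KF-M run) */
for each $t = 1, 2, \ldots, T$ do
  Compute $\mathbf{m}_t^F$ and $\mathbf{V}_t^F$ via (2.47), then set $\mathbf{m}_t^{\backslash O} = \mathbf{m}_t^F$ and $\mathbf{V}_t^{\backslash O} = \mathbf{V}_t^F$.
  If $t > T_p$, maximize $\mathcal{CN}(\mathbf{y}_t \mid \mathbf{S}_t\mathbf{m}_t^{\backslash O}, \boldsymbol{\Sigma}_t)$ over $\mathbf{s}_t$ to obtain the maximizer $\hat{\mathbf{s}}_t$; otherwise, use the known pilot $\mathbf{s}_t$.
  Use $\hat{\mathbf{s}}_t$ (or the pilot $\mathbf{s}_t$) in (2.62) and (2.63) to compute $\mathbf{m}_t$ and $\mathbf{V}_t$ from (2.29) and (2.31), respectively.
  Compute $\boldsymbol{\mu}_t^O$ via (2.34) and $\boldsymbol{\Lambda}_t^O$ via (2.35).
  Set $\mathbf{m}_t^{\backslash R} = \mathbf{m}_t$ and $\mathbf{V}_t^{\backslash R} = \mathbf{V}_t$.
end
/* EP run */
for each $i = 1, 2, \ldots, n$ do
  if $i > 1$ then
    /* filtering pass */
    for each $t = 1, 2, \ldots, T$ do
      Update $\mathbf{m}_t^F$ and $\mathbf{V}_t^F$ via (2.47).
      For the subsequent smoothing pass, compute $\mathbf{m}_t^{\backslash R}$ via (2.49) and $\mathbf{V}_t^{\backslash R}$ via (2.50).
    end
  end
  /* smoothing pass */
  for each $t = T, T-1, \ldots, 1$ do
    If $t = T$, set $\mathbf{m}_T = \mathbf{m}_T^{\backslash R}$ and $\mathbf{V}_T = \mathbf{V}_T^{\backslash R}$; if $t < T$, update $\mathbf{m}_t$ using (2.57) and $\mathbf{V}_t$ using (2.58).
    Compute $q_t^{\backslash O}(\mathbf{h}_t) = q_t(\mathbf{h}_t)/q_t^O(\mathbf{h}_t) \sim \mathcal{CN}(\mathbf{h}_t \mid \mathbf{m}_t^{\backslash O}, \mathbf{V}_t^{\backslash O})$ as follows: $\mathbf{m}_t^{\backslash O} = \mathbf{V}_t^{\backslash O}(\mathbf{V}_t^{-1}\mathbf{m}_t - \boldsymbol{\mu}_t^O)$, $\mathbf{V}_t^{\backslash O} = (\mathbf{V}_t^{-1} - \boldsymbol{\Lambda}_t^O)^{-1}$.
    If $t > T_p$, maximize $\mathcal{CN}(\mathbf{y}_t \mid \mathbf{S}_t\mathbf{m}_t^{\backslash O}, \boldsymbol{\Sigma}_t)$ over $\mathbf{s}_t$ to obtain the maximizer $\hat{\mathbf{s}}_t$; otherwise, use the known pilot $\mathbf{s}_t$.
    Use $\hat{\mathbf{s}}_t$ (or the pilot $\mathbf{s}_t$) in (2.62) and (2.63) to update $\mathbf{m}_t$ via (2.29) and $\mathbf{V}_t$ via (2.31).
    Compute $\boldsymbol{\mu}_t^O$ via (2.34) and $\boldsymbol{\Lambda}_t^O$ via (2.35).
  end
  /* Check for convergence: keep track of all $\mathbf{m}_t$ for each iteration $i$ */
  if $\sum_{t=1}^{T} \|\mathbf{m}_t^{(i)} - \mathbf{m}_t^{(i-1)}\| / \|\mathbf{m}_t^{(i-1)}\| < \epsilon$ then
    break
  end
end
Populate $\hat{\mathbf{S}}_d = \{\hat{\mathbf{s}}_{T_p+1}, \hat{\mathbf{s}}_{T_p+2}, \ldots, \hat{\mathbf{s}}_T\}$ and $\hat{\mathbf{H}} = \{\mathbf{m}_1, \mathbf{m}_2, \ldots, \mathbf{m}_T\}$.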

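Using results from Kalman smoothing, the above can be simplified as follows:

$$\boldsymbol{\Lambda}_{t-1,t-1}^{-1} = \mathbf{V}_{t-1}^{\backslash R} - \mathbf{J}_{t-1}\mathbf{F}_{t-1}\mathbf{J}_{t-1}^H, \tag{2.53}$$

$$\boldsymbol{\Lambda}_{t-1,t-1}^{-1}\boldsymbol{\Lambda}_{t-1,t} = -\mathbf{J}_{t-1}, \tag{2.54}$$

where

$$\mathbf{F}_{t-1} = \mathbf{Q} + \mathbf{A}\mathbf{V}_{t-1}^{\backslash R}\mathbf{A}^H, \tag{2.55}$$

$$\mathbf{J}_{t-1} = \mathbf{V}_{t-1}^{\backslash R}\mathbf{A}^H\mathbf{F}_{t-1}^{-1}. \tag{2.56}$$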

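The update equations are obtained by substituting (2.53) and (2.54), with $\mathbf{F}_{t-1}$ and $\mathbf{J}_{t-1}$ given by (2.55) and (2.56), into (2.51) and (2.52), and shifting the time index to update the $t$th factors:

$$\mathbf{m}_t = \mathbf{m}_t^{\backslash R} + \mathbf{J}_t\left(\mathbf{m}_{t+1} - \mathbf{A}\mathbf{m}_t^{\backslash R}\right), \tag{2.57}$$

$$\mathbf{V}_t = \mathbf{V}_t^{\backslash R} + \mathbf{J}_t\left(\mathbf{V}_{t+1} - \mathbf{F}_t\right)\mathbf{J}_t^H. \tag{2.58}$$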
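The reduction from (2.51)-(2.52) to (2.57)-(2.58) rests on the matrix inversion lemma through (2.53)-(2.54). The NumPy sketch below computes one backward step both ways so the two forms can be compared numerically; the function names are hypothetical, the time index is shifted so that the step updates factor $t$ from factor $t+1$, and $\mathbf{Q}$ is assumed Hermitian.

```python
import numpy as np


def rts_step(m_R, V_R, m_next, V_next, A, Q):
    """One backward step in Kalman-smoother form, eqs. (2.55)-(2.58)."""
    F = Q + A @ V_R @ A.conj().T                 # eq. (2.55)
    J = V_R @ A.conj().T @ np.linalg.inv(F)       # eq. (2.56)
    m = m_R + J @ (m_next - A @ m_R)              # eq. (2.57)
    V = V_R + J @ (V_next - F) @ J.conj().T       # eq. (2.58)
    return m, V


def precision_step(m_R, V_R, m_next, V_next, A, Q):
    """The same step written with the precision blocks, eqs. (2.40), (2.41), (2.51), (2.52)."""
    Qi = np.linalg.inv(Q)
    L11 = np.linalg.inv(V_R) + A.conj().T @ Qi @ A
    L12 = -(A.conj().T @ Qi)                      # (-Q^{-1} A)^H for Hermitian Q
    L11i = np.linalg.inv(L11)
    m = L11i @ (np.linalg.solve(V_R, m_R) - L12 @ m_next)           # eq. (2.51)
    V = L11i + L11i @ L12 @ V_next @ L12.conj().T @ L11i             # eq. (2.52)
    return m, V
```

The smoother form avoids inverting the full precision block and only requires $\mathbf{F}_t^{-1}$.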

This completes all the posterior updates for the EP iteration. The algorithm is summarized in Algorithm 1.
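To show how the pieces of Algorithm 1 fit together, the following NumPy sketch mirrors its structure on a toy problem. It is an illustration under stated assumptions, not the author's implementation: $\boldsymbol{\Sigma}_t = \mathbf{S}_t\mathbf{V}_t^{\backslash O}\mathbf{S}_t^H + \sigma_w^2\mathbf{I}_M$; data symbols are detected with the ML rule (2.61) by exhaustive search over $\mathcal{A}_M^K$ (practical only for small $K$); indices are 0-based; and the function and argument names are hypothetical.

```python
import itertools
import numpy as np


def semi_blind_ep(Y, pilots, alphabet, A, Q, sigma2, m0, V0, n_iter=10, eps=1e-6):
    """Sketch of Algorithm 1: a KF-M pass followed by EP filtering/smoothing passes.

    Y: (T, M) received vectors; pilots: known symbol vectors for the first Tp slots.
    """
    T, M = Y.shape
    K = m0.size // M
    inv = np.linalg.inv

    def observe(t, m_c, V_c):
        # Pilot for t < Tp, otherwise the ML symbol vector (2.61); then (2.62)-(2.63),
        # which with a single term turn (2.29) and (2.31) into a Kalman measurement update.
        cands = [pilots[t]] if t < len(pilots) else itertools.product(alphabet, repeat=K)
        best = None
        for s in cands:
            S = np.kron(np.asarray(s)[None, :], np.eye(M))   # S_t = s_t^T kron I_M
            Sig = S @ V_c @ S.conj().T + sigma2 * np.eye(M)
            z = Y[t] - S @ m_c                                # zeta_t
            ll = -np.real(z.conj() @ np.linalg.solve(Sig, z)) - np.linalg.slogdet(Sig)[1]
            if best is None or ll > best[0]:
                best = (ll, np.asarray(s), S, Sig, z)
        _, s, S, Sig, z = best
        G = V_c @ S.conj().T @ inv(Sig)
        m, V = m_c + G @ z, V_c - G @ S @ V_c
        Vi = inv(V)
        return s, m, V, Vi @ m - inv(V_c) @ m_c, Vi - inv(V_c)   # (2.34), (2.35)

    s_hat, m, V = [None] * T, [None] * T, [None] * T
    mR, VR, muO, LO = [None] * T, [None] * T, [None] * T, [None] * T
    mp, Vp = m0, V0
    for t in range(T):                                  # initial filtering pass (KF-M)
        mF, VF = A @ mp, A @ Vp @ A.conj().T + Q         # (2.47); cavity = forward message
        s_hat[t], m[t], V[t], muO[t], LO[t] = observe(t, mF, VF)
        mR[t], VR[t] = m[t], V[t]
        mp, Vp = mR[t], VR[t]
    prev = np.array(m)
    for i in range(n_iter):                             # EP run
        if i > 0:                                       # filtering pass
            mp, Vp = m0, V0
            for t in range(T):
                mF, VF = A @ mp, A @ Vp @ A.conj().T + Q  # (2.47)
                VR[t] = inv(inv(VF) + LO[t])              # (2.50)
                mR[t] = VR[t] @ (inv(VF) @ mF + muO[t])   # (2.49)
                mp, Vp = mR[t], VR[t]
        for t in reversed(range(T)):                    # smoothing pass
            if t == T - 1:
                m[t], V[t] = mR[t], VR[t]
            else:
                F = Q + A @ VR[t] @ A.conj().T            # (2.55)
                J = VR[t] @ A.conj().T @ inv(F)           # (2.56)
                m[t] = mR[t] + J @ (m[t + 1] - A @ mR[t])        # (2.57)
                V[t] = VR[t] + J @ (V[t + 1] - F) @ J.conj().T   # (2.58)
            Vi = inv(V[t])
            Vc = inv(Vi - LO[t])                          # cavity q_t^{\O}
            mc = Vc @ (Vi @ m[t] - muO[t])
            s_hat[t], m[t], V[t], muO[t], LO[t] = observe(t, mc, Vc)
        cur = np.array(m)
        rel = np.linalg.norm(cur - prev, axis=1) / np.linalg.norm(prev, axis=1)
        prev = cur
        if rel.sum() < eps:                             # convergence test of Algorithm 1
            break
    return s_hat[len(pilots):], prev, V
```

When every symbol is a pilot, the model is linear-Gaussian, the observation messages are exact, and the first smoothing pass already returns the exact smoothed means; this makes a convenient sanity check before running the semi-blind case.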

2.3.1 Reducing Computational Complexity

For $t = T_p+1, \ldots, T$, computing (2.30) and (2.32) requires a summation of $M^K$ terms, which is computationally challenging. Since pilot symbols are transmitted for $t = 1, \ldots, T_p$, even in the first pass of the algorithm $\mathbf{m}_t^{\backslash O}$ provides a reasonably good estimate of $\mathbf{h}_t$ for $t = T_p+1, \ldots, T$. Therefore, the entries of $\mathbf{V}_t^{\backslash O}$ and $\boldsymbol{\Sigma}_t$ become smaller. As a result, the PDF $\mathcal{CN}(\mathbf{y}_t \mid \mathbf{S}_t\mathbf{m}_t^{\backslash O}, \boldsymbol{\Sigma}_t)$ becomes narrow, and all the summands in (2.30) and (2.32) become negligible except for the single term in which $\mathbf{y}_t$ is close to $\mathbf{S}_t\mathbf{m}_t^{\backslash O}$. To find the dominant term, we can use either of the following two methods:

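a) MMSE estimator:

$$\hat{\mathbf{x}}_t = \left(\left(\mathbf{H}_t^{\backslash O}\right)^H\mathbf{R}_w^{-1}\mathbf{H}_t^{\backslash O} + E_s^{-1}\mathbf{I}_K\right)^{-1}\left(\mathbf{H}_t^{\backslash O}\right)^H\mathbf{R}_w^{-1}\mathbf{y}_t, \tag{2.59}$$

where $\mathbf{H}_t^{\backslash O} = \mathrm{vec}^{-1}(\mathbf{m}_t^{\backslash O})$, in which $\mathrm{vec}^{-1}(\cdot)$ is the inverse of the $\mathrm{vec}(\cdot)$ operation. Equation (2.59) is followed by the hard decision

$$\hat{\mathbf{s}}_t = \arg\min_{\mathbf{s}_t \in \mathcal{A}_M^K} \left\|\mathbf{s}_t - \hat{\mathbf{x}}_t\right\|. \tag{2.60}$$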

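b) ML estimator:

$$\hat{\mathbf{s}}_t = \arg\max_{\mathbf{s}_t \in \mathcal{A}_M^K} \mathcal{CN}\!\left(\mathbf{y}_t \mid \mathbf{S}_t\mathbf{m}_t^{\backslash O}, \boldsymbol{\Sigma}_t\right). \tag{2.61}$$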
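A NumPy sketch of the MMSE detector (2.59) followed by the hard decision (2.60). It assumes that $\mathbf{R}_w$ is supplied by the caller and that $\mathrm{vec}(\cdot)$ stacks columns, so that $\mathbf{S}_t\mathbf{h}_t = \mathbf{H}_t\mathbf{s}_t$ with $\mathbf{S}_t = \mathbf{s}_t^T \otimes \mathbf{I}_M$; the name `mmse_detect` is hypothetical. Because $\|\mathbf{s}_t - \hat{\mathbf{x}}_t\|^2$ separates over the $K$ entries, the arg min in (2.60) reduces to a per-user nearest-point search and does not require enumerating $\mathcal{A}_M^K$.

```python
import numpy as np


def mmse_detect(y, m_cav, Rw, Es, alphabet):
    """Eq. (2.59) followed by the hard decision (2.60)."""
    M = y.size
    K = m_cav.size // M
    H = m_cav.reshape((M, K), order="F")                 # vec^{-1}: column-major M x K
    W = H.conj().T @ np.linalg.inv(Rw)                   # H^H R_w^{-1}
    x = np.linalg.solve(W @ H + np.eye(K) / Es, W @ y)   # eq. (2.59)
    pts = np.asarray(alphabet)
    # Eq. (2.60): the norm separates over users, so pick the nearest point per entry.
    s_hat = pts[np.argmin(np.abs(x[:, None] - pts[None, :]), axis=1)]
    return x, s_hat
```

The ML rule (2.61), by contrast, couples the users through $\boldsymbol{\Sigma}_t$ and in general needs a search over all $|\mathcal{A}|^K$ candidates.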

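Once $\hat{\mathbf{s}}_t$ is computed by either method, (2.30) and (2.32) can be approximated by

$$\nabla_{\mathbf{m}}^H \approx \mathbf{S}_t^H\boldsymbol{\Sigma}_t^{-1}\boldsymbol{\zeta}_t, \tag{2.62}$$

$$\nabla_{\mathbf{V}} \approx \mathbf{S}_t^H\left(\boldsymbol{\Sigma}_t^{-1}\boldsymbol{\zeta}_t\boldsymbol{\zeta}_t^H\boldsymbol{\Sigma}_t^{-1} - \boldsymbol{\Sigma}_t^{-1}\right)\mathbf{S}_t, \tag{2.63}$$

where $\mathbf{S}_t = \hat{\mathbf{s}}_t^T \otimes \mathbf{I}_M$ and $\boldsymbol{\zeta}_t = \mathbf{y}_t - \mathbf{S}_t\mathbf{m}_t^{\backslash O}$.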

Remark 1. With the above simplification, the computational complexity of our algorithm is dominated by (2.29), (2.31), (2.34), (2.35), (2.47), (2.49), (2.50), (2.57), (2.58), and the computation of $\mathbf{m}_t^{\backslash O}$ and $\mathbf{V}_t^{\backslash O}$ in the smoothing pass. Thus, the computational complexity of our EP algorithm is $\mathcal{O}(nT(M^3K^3 + M^2K^2))$. This is only $n$ times the complexity of the conventional Kalman filtering and Kalman smoothing algorithms, which are run only once on the transmitted frame. However, as shown in Section 2.4, EP yields significantly improved detection and estimation performance compared with the Kalman filtering and smoothing algorithms.
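For a sense of scale, here is an illustrative operation count that ignores constants, using the largest configuration of Section 2.4 ($M = 64$, $K = 8$, $T = T_p + T_d = 8 + 64 = 72$, $n = 10$, so $MK = 512$):

$$nT\left(M^3K^3 + M^2K^2\right) = 720\left(512^3 + 512^2\right) = 720 \times 134{,}479{,}872 \approx 9.7 \times 10^{10}.$$

By comparison, evaluating (2.30) and (2.32) exactly with QPSK ($|\mathcal{A}| = 4$) would require summing $4^8 = 65{,}536$ terms at every data time instant, on top of the per-term matrix operations.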


2.4 Simulation Results

In this section, we evaluate the performance of the proposed semi-blind EP algorithm for joint channel estimation and symbol detection through simulations. We consider a cellular system with $L = 4$ cells and $K = 8$ users in each cell, and we present the performance of the algorithms in the first cell. The large-scale fading coefficients for the $K$ users in the cells are set to $\beta_{11k} = 1$ and $\beta_{1ik} = a$, for $k = 1, \ldots, K$ and $i = 2, 3, 4$. The constant scalar $a$ models the cross gain between the first-cell BS and the users in the other cells [49, 80]. By varying the value of $a$, we study the effect of pilot contamination and inter-cell interference on channel estimation and symbol detection.

Using the Kronecker model [74, 79] for the spatial correlation matrices $\mathbf{R}_{1ik}$, we assume $[\mathbf{R}_{1ik}]_{m,n} = r(m-n)$, $m, n = 1, 2, \ldots, M$ and $i = 1, 2, 3, 4$, where $r(m) = \rho^{|m|}$.¹ The transmitted frame of length $T$ is composed of $T_p$ pilot symbols at the beginning, followed by $T_d$ unknown data symbols. QPSK modulation with average symbol energy $E_s = 0$ dB is assumed for both the pilot and the data symbols. A Hadamard code is employed to ensure that the pilot symbols of the users in the first cell are orthogonal. Note that with our assumed signal model in (2.9), orthogonality between the pilot sequences in the first and neighboring cells is not assumed. The time-varying channel vectors for all the users in the first cell are generated according to (2.10), with an initial Gaussian prior distribution on $\mathbf{h}_0$ with zero mean and covariance matrix $\mathbf{R}_h$. Further, it is assumed that all the users in the first cell have the same normalized Doppler shift $f_d$; therefore, the diagonal entries of the matrix $\mathbf{A}$ in (2.10) are set to $J_0(2\pi f_d)$,² where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind.
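The spatial and temporal correlation parameters described above can be generated as below. The sketch is NumPy-only and evaluates $J_0$ from its integral representation $J_0(x) = \frac{1}{\pi}\int_0^\pi \cos(x\sin\theta)\,d\theta$ with a midpoint rule (`scipy.special.j0` would serve equally well). The function names are hypothetical.

```python
import numpy as np


def exp_corr(M, rho):
    """Exponential spatial correlation [R]_{m,n} = rho^{|m-n|}."""
    idx = np.arange(M)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def bessel_j0(x, n=2000):
    """J_0(x) = (1/pi) * integral_0^pi cos(x sin(theta)) dtheta, midpoint rule."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    return np.mean(np.cos(x * np.sin(theta)))


def doppler_coefficient(fd):
    """Diagonal entry of A for normalized Doppler shift fd: J_0(2 pi fd)."""
    return bessel_j0(2 * np.pi * fd)
```

For $f_d = 0.01$ the coefficient is about $0.99901$, i.e., consecutive channel vectors are very strongly correlated; with $\rho = 0$, `exp_corr` returns the identity (spatially uncorrelated case).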

The proposed algorithm described in Algorithm 1 is initialized with $\mathbf{m}_0^{\backslash R} = \mathbf{0}$ and $\mathbf{V}_0^{\backslash R} = \mathbf{R}_h$. Further, since EP converges in a few iterations, we set the maximum number of iterations to $n = 10$ and the error tolerance between two iterations for terminating the algorithm to $\epsilon = 10^{-6}$.

¹ For ease of presentation, we assume that all users have the same spatial correlation, modeled by the parameter $\rho$. This assumption is valid when the angle of visibility to the target BS for the group of users in the neighboring cells is either the same as, or a mirror image of, the angle of visibility for the group of users in the target cell [81, 82].

² Note that increasing $f_d$ decreases the diagonal entries $a_n$ of $\mathbf{A}$ and thus the temporal correlation among the channel vectors.

Figure 2.4. Channel estimation error versus the receiver's antenna array size M, with parameters K = 8, Td = 64, Tp = K, a = 0.1, ρ = 0.

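The symbol error rate (SER) is averaged over all the users in the first cell, and the channel estimation accuracy is measured using the following normalized error:

$$\delta_h\,(\mathrm{dB}) = 10\log_{10}\left(\frac{1}{T}\sum_{t=1}^{T}\frac{\mathbb{E}\left[\|\mathbf{h}_t - \hat{\mathbf{h}}_t\|^2\right]}{\mathbb{E}\left[\|\mathbf{h}_t\|^2\right]}\right). \tag{2.64}$$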
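In simulations, the expectations in (2.64) are replaced by averages over Monte Carlo trials. A minimal NumPy sketch, assuming the true and estimated channels are stacked as arrays of shape (trials, T, MK); the function name is hypothetical:

```python
import numpy as np


def nmse_db(H_true, H_est):
    """Normalized channel-estimation error (2.64); expectations become trial averages."""
    err = np.mean(np.sum(np.abs(H_true - H_est) ** 2, axis=-1), axis=0)  # E||h_t - h_hat_t||^2
    pwr = np.mean(np.sum(np.abs(H_true) ** 2, axis=-1), axis=0)          # E||h_t||^2
    return 10 * np.log10(np.mean(err / pwr))                              # average over t
```

For example, an estimate that is uniformly 10% too large gives $10\log_{10}(0.01) = -20$ dB.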

We note that (2.62) and (2.63), together with (2.47), (2.29), and (2.31), represent the time-update (prediction) and measurement-update equations of the Kalman filter, where the unknown data symbols are estimated by density maximization. Therefore, we refer to the initial forward pass of our algorithm as the modified Kalman filter (KF-M). The performance of this semi-blind version of the Kalman filter, which emerges from our EP derivations, is also presented. Further, (2.57) and (2.58) represent the backward recursion equations of the Kalman smoother. Thus, the performance of the Kalman smoothing algorithm applied after a single pass of KF-M is also presented here for comparison and is denoted by KS-M. To benchmark the performance of semi-blind KF-M, KS-M, and EP, we also present the performance of the Kalman filter and smoother in a pure training mode (TM), in which the entire frame is composed of known pilot symbols and only channel estimation is performed. These two cases are referred to as KF-TM and KS-TM. In terms of channel estimation error, KF-TM provides a lower bound for KF-M, and KS-TM provides a lower bound for KS-M and EP. Finally, we also plot the SER of the MMSE estimator with known CSI (denoted PCSI) for comparison with the SER of the proposed algorithms.

Figure 2.5. SER versus the receiver's antenna array size M, with parameters K = 8, Td = 64, Tp = K, a = 0.1, ρ = 0.

Fig. 2.4 shows the channel estimation error versus the number of antennas $M$ for the KF-M, KS-M, and EP algorithms in a spatially uncorrelated ($\rho = 0$) but temporally correlated channel. We consider two different cases of temporal correlation, corresponding to $f_d = 0.01$ and $f_d = 0.04$.³ We observe that in both cases the performance of the algorithms improves with $M$, and EP shows a significant improvement over KF-M. The channel is more time-varying for $f_d = 0.04$, and the estimation error is higher in this case. The channel estimation error of KF-M approaches that of KF-TM for larger values of $M$, and the performance of EP approaches that of KS-TM. For both values of $f_d$, as $M$ increases our proposed semi-blind EP algorithm converges to KS-TM faster than the semi-blind KS-M, resulting in more accurate channel estimation. Note that while the KF-TM and KS-TM algorithms employ $T = 72$ pilot symbols to estimate the channel, KF-M, KS-M, and EP use only $T_p = K = 8$ pilot symbols to estimate the channel (as well as to detect $T_d = 64$ data symbols), a pilot overhead of 1/9. The channel estimation improvement with $M$ is a result of the so-called favorable propagation condition, in which the channel vectors of different users become mutually orthogonal as $M \to \infty$. As a result, the performance of the MMSE symbol estimator used in KF-M and EP improves. This in turn improves the channel estimation accuracy of KF-M, KS-M, and EP. Fig. 2.4 also shows the performance of the semi-blind expectation maximization (SB-EM) and the regularized alternating least-squares (R-ALS) algorithms proposed in [27] and [62], respectively, for the channel described above using $T_p = 8$ pilot symbols. We observe that for $f_d = 0.01$, both SB-EM and R-ALS perform better than the KF-M and KF-TM algorithms but worse than KS-M and EP. The improvement over KF-M and KF-TM arises because both SB-EM and R-ALS estimate the channel using the entire received frame, whereas at any given time KF-M and KF-TM update the channel estimates using only the received signals up to the present time. Since KS-M, KS-TM, and EP also use the entire received frame in a smoothing pass, they outperform SB-EM and R-ALS, which rely on a block-fading assumption. Further, we observe that for $f_d = 0.04$, where the channel is highly time-varying, SB-EM and R-ALS perform worse than all the other algorithms.

³ We should point out that these are normalized values of the Doppler shift, i.e., $f_d = f_D T_s$, where $f_D$ is the Doppler shift in Hz and $T_s$ is the symbol period in seconds. For example, for an OFDM system with a bit rate of 15 kbps per subcarrier, resulting in a symbol rate of 7.5 ksps using QPSK modulation, we get $f_D = 75$ Hz and $f_D = 300$ Hz for $f_d = 0.01$ and $0.04$, respectively. For a carrier frequency of 2 GHz, this implies mobile velocities of 40.5 and 162 km/h, respectively.

Fig. 2.5 depicts the SER performance of KF-M, KS-M, EP, and PCSI versus the number of antennas $M$. For both values of $f_d$, the SER of all the algorithms improves with $M$, and EP outperforms all the other algorithms except PCSI (MMSE with known channel coefficients). Moreover, the improvement of EP over the other algorithms increases with $M$. The SER performance of SB-EM and R-ALS is also shown in Fig. 2.5. It can be seen from Figs. 2.4 and 2.5 that the performance of algorithms developed under the block-fading assumption degrades significantly when they are applied to time-varying channels, compared with algorithms specifically designed for such channels.

Figure 2.6. Channel estimation error versus the receiver's antenna array size M, with parameters K = 8, Td = 64, Tp = K, a = 0.1, fd = 0.01.

Figure 2.7. SER versus the receiver's antenna array size M, with parameters K = 8, Td = 64, Tp = K, a = 0.1, fd = 0.01.

Figs. 2.6 and 2.7 show the performance of KF-M, KS-M, and EP versus the number of antennas $M$ for a spatially and temporally correlated massive MIMO channel. For $f_d = 0.01$, we consider two cases of spatial correlation, corresponding to $\rho = 0.4$ and $\rho = 0.9$. In both cases the performance of the algorithms again improves with $M$, and EP significantly outperforms KF-M and KS-M. A comparison of the three cases $\rho = 0$ (Figs. 2.4 and 2.5), $\rho = 0.4$, and $\rho = 0.9$ confirms the expected result that as $\rho$ increases, channel diversity decreases, which degrades system performance. In addition, spatial channel correlation reduces the level of channel hardening. Consequently, as spatial correlation increases, a larger number of antennas is required to achieve the same level of performance [67].

In Figs. 2.8 and 2.9, we study the effect of pilot contamination on channel estimation error and SER. To this end, we vary the inter-cell cross gain $a$ between the BS in the first cell and the users in the three neighboring cells for $M = 32$ and $M = 64$. As expected, the performance of the algorithms degrades as the cross gain $a$ increases. However, as the figures show, EP significantly outperforms KF-M and KS-M. Moreover, there is a significant improvement as $M$ increases from 32 to 64.

At large values of $M$, the performance of channel estimation algorithms in multi-cell massive MIMO systems is limited by pilot contamination. As discussed in Section 2.1, its effect can be mitigated with semi-blind channel estimators, which also exploit the large number of data symbols for channel estimation. Figs. 2.10 and 2.11 show the performance of the algorithms versus the number of data symbols $T_d$ in the transmitted frame. Two levels of inter-cell interference, $a = 0.3$ and $a = 0.4$, are considered. As $T_d$ increases, the channel estimation error and the SER of all algorithms improve. This improvement is due to the fact that the semi-blind approach uses the $T_d$ data symbols as virtual pilot symbols in estimating the channel. Thus, for a fixed number of antennas $M$, using a large $T_d$ can mitigate the impact of pilot contamination with a pilot overhead much less than 1/9 [27]. In these figures, EP improves more than both KF-M and KS-M. This is because EP updates the channel estimate at each time instant by incorporating the forward, reverse, and observation messages. In contrast, KF-M does not include the reverse and observation messages, and KS-M does not include the observation messages. Hence, as $T_d$ increases, the channel estimates found through the smoothing pass of EP are more accurate than those from KS-M and KF-M.

Figure 2.8. Channel estimation error versus the cross gain a of users in other cells, with parameters K = 8, Td = 64, Tp = K, fd = 0.01, ρ = 0.4.

Figure 2.9. SER versus the cross gain a of users in other cells, with parameters K = 8, Td = 64, Tp = K, fd = 0.01, ρ = 0.4.

Figure 2.10. Channel estimation error versus the number of unknown data symbols Td, with parameters K = 8, M = 64, Tp = K, fd = 0.01, ρ = 0.4.

Figure 2.11. SER versus the number of unknown data symbols Td, with parameters K = 8, M = 64, Tp = K, fd = 0.01, ρ = 0.4.

Figure 2.12. Channel estimation error versus the normalized Doppler shift fd, with parameters K = 8, M = 64, Td = 64, ρ = 0.4, a = 0.3.

Figure 2.13. SER versus the normalized Doppler shift fd, with parameters K = 8, M = 64, Td = 64, ρ = 0.4, a = 0.3.

Finally, in Figs. 2.12 and 2.13, we study the performance of KF-M, KS-M, and EP versus the normalized Doppler shift $f_d$ for pilot sequences of length $T_p = K$ and $T_p = 2K$. For fixed $T_d$, the performance of all algorithms improves with increasing pilot overhead, as expected. Further, as $f_d$ increases (i.e., the temporal correlation among the channel vectors decreases), the performance of all algorithms degrades. It is interesting to note that for $f_d \le 0.005$ the performance of the algorithms is almost constant with $f_d$. In essence, for these values of $f_d$ the channel may be assumed to be time-invariant (block fading). For the parameters listed in footnote 3, this translates to mobile velocities of up to about 20 km/h. In these figures we also show the performance of the algorithms when non-orthogonal pilot sequences are used for the users in the first cell. In this case, for each user $T_p$ QPSK symbols are randomly generated and used at the beginning of each frame as the pilot symbols. Although EP still outperforms KF-M and KS-M in this case, all algorithms suffer a performance degradation compared with the orthogonal-pilot case.
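The conversion used in footnote 3 is $f_D = f_d/T_s$ followed by $v = f_D\,c/f_c$. The helper below (hypothetical name; $c = 3\times10^8$ m/s is an assumption) reproduces the velocities quoted there and the block-fading threshold mentioned above:

```python
def doppler_to_speed_kmh(fd, symbol_rate_hz=7.5e3, carrier_hz=2e9, c=3e8):
    """Mobile speed implied by a normalized Doppler shift fd = f_D * T_s."""
    f_D = fd * symbol_rate_hz              # Doppler shift in Hz (1 / T_s = symbol rate)
    speed_mps = f_D * c / carrier_hz       # v = f_D * wavelength
    return speed_mps * 3.6                 # m/s -> km/h
```

With the default parameters, $f_d = 0.01$ and $0.04$ give about 40.5 and 162 km/h, and $f_d = 0.005$ gives about 20.25 km/h.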


Chapter 3

Clustered Sparse Channel Estimation for Massive MIMO Systems by Expectation Maximization-Propagation (EM-EP)

In this chapter, we study the problem of downlink channel estimation in multi-user massive MIMO systems. Conventional pilot-based methods for downlink channel estimation require a pilot overhead that scales with the number of BS antennas. Thus, we consider a Bayesian compressive sensing approach in which the clustered sparse structure of the massive MIMO channel in the angular domain is exploited to reduce the pilot overhead. To capture the clustered structure, we employ a conditionally independent and identically distributed Bernoulli-Gaussian prior on the sparse vector representing the channel, and a Markov prior on its support vector. An expectation propagation (EP) algorithm is developed to approximate the intractable joint distribution on the sparse vector and its support with a distribution from an exponential family. The approximated distribution is then used for direct estimation of the channel. The EP algorithm assumes that the model parameters are known a priori. Since these parameters are unknown, we estimate them using the expectation maximization (EM) algorithm. The combination of EM and EP, referred to as the EM-EP algorithm, is reminiscent of the variational EM approach. Simulation results show that the proposed EM-EP algorithm outperforms several recently proposed algorithms in the literature.

3.1 Introduction

Since the cellular networks providing 4G-LTE services already use the FDD protocol, massive MIMO technology would be adopted more rapidly if massive MIMO systems were designed for FDD rather than TDD. In addition, in an FDD-based system the users can transmit continuously, which helps to achieve the desired cell-edge rates farther away from the base station. In contrast, in a TDD-based system the transmissions from the users can occur only periodically, for instance, during 1/2 or 1/3 of the coherence interval. Thus, the desired rates cannot be achieved at the same distances with the TDD protocol as with FDD. Therefore, to cover the same area, more base stations are required with the TDD protocol than with FDD, which increases the operating and deployment costs of the system.

A preprint of this chapter was made available at https://arxiv.org/abs/2012.06675 as Mohammed Rashid and Mort Naraghi-Pour, "Clustered Sparse Channel Estimation for Massive MIMO Systems by Expectation Maximization-Propagation (EM-EP)", arXiv e-prints, 2020. The copyright information is included in Appendix E.

However, in FDD-based massive MIMO systems, DL channel estimation is quite challenging [83]. In the conventional pilot-based method, the length of the pilot sequence scales with the number of transmitting antennas. This implies a long pilot sequence, which results in reduced spectral efficiency. Moreover, the time required for pilot and data transmission may exceed the coherence time of the channel. Recently, compressive sensing (CS) [46, 47] has been explored to reduce the pilot overhead. Due to the limited local scattering in the propagation environment, the massive MIMO channel has a sparse representation in the discrete Fourier transform (DFT) basis [83–86]. Using this sparsity structure, many CS-based estimation algorithms have been devised. The classical orthogonal matching pursuit (OMP) [87] and compressive sampling matching pursuit (CoSaMP) [88] algorithms are investigated in [89, 90]. In [91], the authors assumed a common spatial sparsity among the subcarriers in a frequency-selective DL channel and proposed the distributed sparsity adaptive matching pursuit (DSAMP) algorithm. Using a similar common spatial sparsity assumption, a generalized approximate message passing (GAMP) based algorithm is proposed in [92], and the sparse Bayesian learning (SBL) algorithm is derived in [71, 93]. These and other algorithms that use a DFT basis to obtain the sparse representation employ a fixed, uniformly spaced discrete grid in the angular domain, which may not be sufficiently dense. As a result, some of the physical angles of departure (AoDs) of the massive MIMO channel may not lie on the assumed grid points. This direction mismatch error, also known as channel modeling error, causes leakage of energy from such physical AoDs into the nearby angular bins, resulting in a straddle performance loss. In [45, 94], this modeling error is minimized by learning a better over-complete dictionary for the sparse representation. However, the proposed algorithm requires extensive channel measurements from several locations in the cell to be used as training samples. These measurements are cell-specific and difficult to collect in practice. In [43], an off-grid SBL algorithm is proposed in which the sampled grid points are modeled as continuous-valued parameters and are learned iteratively to reduce the modeling error. Simulation results in [43] show improved performance of off-grid SBL compared with the over-complete dictionary learning algorithm in [45].

SBL and off-grid SBL aim to recover the sparse vector coefficients individually by modeling them with an independent and identically distributed (iid) Gaussian prior distribution. However, according to the geometry-based stochastic channel model (GSCM) [95], there are a few dominant scatterers in the propagation environment, and the sub-paths from each scatterer concentrate in small angular spreads, which appear as non-zero clusters in the sparse representation. This model is used in [96], although with the stringent assumption of uniformly sized clusters in the sparse vector. For non-uniform burst sparsity,¹ a pattern-coupled SBL (PC-SBL) algorithm is proposed in [98], in which the precision of each coefficient in the sparse vector is tuned according to the precisions of its immediate neighbors. However, PC-SBL updates the precisions with a sub-optimal solution. To avoid this sub-optimality, a generic version of PC-SBL, referred to here as PC-VB, is derived in [97], where the authors assigned a latent support variable to every coefficient in the sparse vector and assumed a multinoulli prior on the support vector. The resulting joint posterior distribution on the sparse vector and its support is approximated with a variational Bayes-based algorithm [99]. The grid refining procedure from [43] is also used to mitigate the direction mismatch errors. In the same vein, the turbo compressive sensing (TCS) algorithm and an expectation maximization based GAMP algorithm were proposed in [44] and [100], respectively, in which the sparse vector coefficients are modeled with an iid Bernoulli-Gaussian (BG) prior. In [90], the authors extended [44] to the clustered sparse structure of the massive MIMO channel and proposed a structured turbo compressive sensing (S-TCS) algorithm. With a conditionally iid BG prior on the sparse vector, a Markov prior is assumed on its support to integrate the clustering information of the massive MIMO channel. In [101], a super-resolution clustered sparse Bayesian learning (SuRe-CSBL) algorithm is proposed for a Markov prior distribution on the support vector. SuRe-CSBL approximates the true joint posterior distribution on the sparse vector and its support with a structured GAMP algorithm. The approximated distribution is then used for the estimation of the massive MIMO channel. The grid refining method from [43] is also integrated into SuRe-CSBL.

¹ This refers to the case when the non-zero clusters in the sparse vector appear with unequal sizes, separated by sequences of zeros of arbitrary length [97].

Here, we propose an EP algorithm to estimate the clustered sparse vector representing the massive MIMO channel. Once the sparse vector is estimated from the received signal, the physical massive MIMO channel can be easily estimated by a transform operation on the sparse vector, as in [90, 97, 101]. The contributions of this chapter are summarized as follows:

• The EP algorithm [78, 102] has recently been applied to SIMO and MIMO channel estimation [54, 79]. It has also been applied to solve the inference problem in the CS literature [103]. In [104], the authors used an EP algorithm to approximate the true joint posterior distribution on the sparse vector and its support with a distribution from an exponential family. However, an iid Bernoulli prior is assumed on the support vector, which does not capture the clustered structure of the sparse vector. In [105], the authors assumed that the partitioning of the clusters in the sparse vector is known a priori and modeled each cluster with a different Bernoulli prior distribution. In contrast, here we assume that the cluster partitioning of the sparse vector is unknown. Therefore, to capture the structure of the sparse vector, we model its support vector with a first-order Markov process. An EP algorithm is developed to iteratively approximate the intractable true joint posterior distribution on the sparse vector and its support with a distribution from an exponential family. This distribution is then used for the direct estimation of the DL massive MIMO channel.

• The EP framework in [104, 105] assumes that the model parameters, including the noise precision in the signal model, the hyperparameters of the prior distribution on the sparse vector, and the hyperparameters of the prior distribution on the support vector, are known a priori. For practical massive MIMO channels, these parameters are unknown and need to be estimated. One way to estimate the model parameters is to maximize the marginal likelihood function, a procedure known as the type-II maximum likelihood method or evidence procedure [93]. However, directly maximizing the marginal likelihood function does not result in closed-form update equations for the model parameters [71, 93]. Thus, we derive an expectation maximization (EM) algorithm, which yields closed-form update equations and iteratively computes the maximum likelihood solution for the model parameters [106, 107].

• To integrate the EP algorithm with the EM algorithm, we use a variational EM approach [108, 109] in which the joint posterior distribution approximated by the EP algorithm is used to compute the expectation step of the EM algorithm. The convergence of the resulting EM-EP algorithm is guaranteed through the convergence properties of the variational EM algorithm [108]. As the iterations of the proposed method proceed, the EM algorithm converges to a local maximum of the marginal likelihood function [106], and the EP algorithm closely approximates the true joint posterior distribution with a distribution from an exponential family [53]. The grid refining procedure from [43, 97] is also integrated into the proposed EM-EP algorithm to reduce the channel modeling error.

• Extensive simulations are carried out to demonstrate the efficacy of the proposed EM-EP algorithm. The results are also compared with those in the literature, showing the advantages of the proposed method.
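To visualize the kind of clustered sparsity targeted by the Markov support prior described above, the NumPy sketch below draws a support vector from a stationary two-state first-order Markov chain and the sparse vector from a conditionally iid Bernoulli-Gaussian model. The transition probabilities `p01` and `p10` and the variance `var` are hypothetical parameter names chosen for illustration; the parameterization used in the chapter's model may differ. With this chain, the mean cluster length is $1/p_{10}$ and the stationary fraction of non-zero entries is $p_{01}/(p_{01}+p_{10})$.

```python
import numpy as np


def sample_clustered_sparse(N, p01, p10, var, rng):
    """Markov-chain support s and Bernoulli-Gaussian sparse vector x = s * g.

    p01 = P(s_n = 1 | s_{n-1} = 0), p10 = P(s_n = 0 | s_{n-1} = 1) (hypothetical names).
    """
    lam = p01 / (p01 + p10)                    # stationary probability of s_n = 1
    s = np.empty(N, dtype=int)
    s[0] = rng.random() < lam                  # start in the stationary distribution
    for n in range(1, N):
        p_on = 1 - p10 if s[n - 1] == 1 else p01
        s[n] = rng.random() < p_on
    g = np.sqrt(var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    return s * g, s                            # g_n ~ CN(0, var) where s_n = 1
```

Small values of `p10` produce long runs of non-zeros (wide clusters), while an iid Bernoulli support corresponds to the special case `p01 = 1 - p10`.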

This chapter is organized as follows. Section 3.2 describes the system model for FDD-based downlink channel estimation in multi-user massive MIMO systems. The expectation propagation algorithm for this system is proposed in Section 3.3. An expectation maximization algorithm to estimate the model parameters and to refine the grid is derived in Section 3.4. Finally, simulation results are discussed in Section 3.5.

Notations: Throughout this chapter, small letters ($x$) are used for scalars, bold small letters ($\mathbf{x}$) for vectors, and bold capital letters ($\mathbf{X}$) for matrices. $\mathbb{R}$ and $\mathbb{C}$ represent the sets of real and complex numbers, respectively. The superscripts $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^*$, and $(\cdot)^{-1}$ represent the transpose, Hermitian transpose, complex conjugate, and inverse operations, respectively. $\mathcal{CN}(\mathbf{x};\boldsymbol{\mu},\boldsymbol{\Sigma})$ denotes the complex Gaussian distribution on $\mathbf{x}$ with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. $\mathrm{Bern}(x;p)$ denotes a Bernoulli distribution on $x$ with mean $p$. For a complex variable $x$, $|x|$ denotes its modulus, $\Re\{x\}$ its real part, and $\Im\{x\}$ its imaginary part. For a pdf $p(\cdot)$, $\mathbb{E}_p$ denotes the expectation operator with respect to $p(\cdot)$. $\delta(x)$ is the Dirac delta function, which is zero for $x \neq 0$ and integrates to one. $\mathbf{I}_N$ denotes the $N\times N$ identity matrix. Finally, $\mathrm{tr}(\mathbf{X})$ and $\|\mathbf{x}\|$ denote the trace of a matrix $\mathbf{X}$ and the $\ell_2$-norm of the vector $\mathbf{x}$, respectively.

3.2 System Model

Consider a single-cell massive MIMO system in which a BS equipped with $G$ antennas serves $K$ single-antenna users. It is assumed that FDD is used and, to enable the estimation of the DL channels, the BS broadcasts a sequence of $N$ pilot symbols denoted by $\mathbf{X} = [\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N]^H$, where $\mathbf{x}_n \in \mathbb{C}^{G\times 1}$ for $n = 1,\ldots,N$. The signal received by the $k$-th user is given by

$$
\mathbf{y}_k = \mathbf{X}\mathbf{h}_k + \mathbf{n}_k, \tag{3.1}
$$

where $\mathbf{y}_k \in \mathbb{C}^{N\times 1}$, $\mathbf{h}_k \in \mathbb{C}^{G\times 1}$ is the DL channel to the $k$-th user, and the receiver noise $\mathbf{n}_k$ is distributed as $\mathcal{CN}(\mathbf{n}_k;\mathbf{0},\eta_k^{-1}\mathbf{I}_N)$, in which $\eta_k$ denotes the precision.

Assuming that the transmitted pilot sequence satisfies $\mathrm{tr}(\mathbf{X}\mathbf{X}^H) = NG$, the signal-to-noise ratio (SNR) is given by $\mathrm{SNR} = \eta_k$. Suppose that the BS is equipped with a uniform linear array (ULA)² and, to transmit in the direction $\theta$, it uses the beam steering vector

$$
\mathbf{a}(\theta) = \left[1,\; e^{-j2\pi\frac{d}{\lambda_d}\sin(\theta)},\;\ldots,\; e^{-j2\pi\frac{d}{\lambda_d}(G-1)\sin(\theta)}\right]^T, \tag{3.2}
$$

where $d$ is the spacing between adjacent antenna elements and $\lambda_d$ is the wavelength of the DL signal. Suppose that the DL signal travelling from the BS to the $k$-th user passes through a total of $L_s$ scatterers, each of which forwards the signal along $L_p$ paths towards the user. Then the channel vector $\mathbf{h}_k$ to the $k$-th user can be written as

$$
\mathbf{h}_k = \sum_{s=1}^{L_s}\sum_{p=1}^{L_p}\alpha_{k,s,p}\,\mathbf{a}(\theta_{k,s,p}), \tag{3.3}
$$

where $\alpha_{k,s,p}$ is the complex path gain for the $s$-th scatterer and $p$-th path, and $\theta_{k,s,p}$ is the corresponding AoD [85,110].

To reduce the pilot overhead for estimating this downlink channel, we use the CS approach, which requires a virtual channel representation of the physical channel in (3.3). To this end, let $\boldsymbol{\theta} = (\theta_1,\theta_2,\ldots,\theta_M)^T$ denote a uniform sampling of the interval $[-\pi/2,\pi/2]$ into $M$ points. Assuming $M$ is large enough that the physical AoDs in (3.3) lie on the grid points, the virtual representation of $\mathbf{h}_k$ is given by

$$
\mathbf{h}_k = \mathbf{A}(\boldsymbol{\theta})\mathbf{w}_k, \tag{3.4}
$$

where $\mathbf{A}(\boldsymbol{\theta}) = [\mathbf{a}(\theta_1),\mathbf{a}(\theta_2),\ldots,\mathbf{a}(\theta_M)]$ and the vector $\mathbf{w}_k$ contains the channel coefficients in the virtual angular domain. Note that when $M = G$, $d = \lambda_d/2$, and the grid is uniformly sampled in the sine domain, the scaled dictionary $\mathbf{A}(\boldsymbol{\theta})/\sqrt{G}$ is a unitary discrete Fourier transform matrix (up to a diagonal phase matrix) [43]. The choice of the parameter $M$ is discussed in Section 3.5.
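The following NumPy sketch is one way to make (3.2)-(3.5) concrete. It assumes $d/\lambda_d = 1/2$ and the sine-uniform grid $\theta_m = \sin^{-1}(-1 + 2m/M)$ that is also used in Section 3.5; the function names (`steering_vector`, `dictionary`, `sine_grid`, `simulate_pilots`) are illustrative and not part of the original work. It also checks numerically the DFT property of $\mathbf{A}(\boldsymbol{\theta})$ mentioned above.

```python
import numpy as np


def steering_vector(theta, G, d_over_lambda=0.5):
    """Beam steering vector a(theta) of (3.2) for a G-element ULA."""
    g = np.arange(G)
    return np.exp(-2j * np.pi * d_over_lambda * g * np.sin(theta))


def dictionary(thetas, G, d_over_lambda=0.5):
    """A(theta) = [a(theta_1), ..., a(theta_M)], a G x M matrix as in (3.4)."""
    return np.stack([steering_vector(t, G, d_over_lambda) for t in thetas], axis=1)


def sine_grid(M):
    """Grid theta_m = arcsin(-1 + 2m/M), m = 1, ..., M (uniform in sin(theta))."""
    m = np.arange(1, M + 1)
    return np.arcsin(-1.0 + 2.0 * m / M)


def simulate_pilots(X, thetas, w, eta, rng):
    """Received pilots y = X A(theta) w + n with n ~ CN(0, eta^{-1} I_N), as in (3.5)."""
    N, G = X.shape
    Phi = X @ dictionary(thetas, G)
    # circularly symmetric noise: real and imaginary parts each have variance 1/(2 eta)
    n = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2.0 * eta)
    return Phi @ w + n, Phi


# With M = G, d = lambda_d / 2 and a sine-uniform grid, A / sqrt(G) is unitary.
G = 8
A = dictionary(sine_grid(G), G)
print(np.allclose(A.conj().T @ A / G, np.eye(G)))  # True
```

In the simulations of Section 3.5 the ratio $d/\lambda_d$ is slightly different from $1/2$, so the exact unitarity above should be read as a property of this idealized setting only.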

² In this work we assume a ULA at the BS. However, the proposed algorithm can be extended to an arbitrary 2-D array using the approach suggested in [43].

In this chapter, we focus on the DL channel estimation for a reference user. Therefore, dropping the index $k$, from (3.1) and (3.4) the received signal is written as

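$$
\mathbf{y} = \boldsymbol{\Phi}(\boldsymbol{\theta})\mathbf{w} + \mathbf{n}, \tag{3.5}
$$

in which $\boldsymbol{\Phi}(\boldsymbol{\theta}) = \mathbf{X}\mathbf{A}(\boldsymbol{\theta})$. From (3.5), the likelihood function of $\mathbf{w}$ is given by $p(\mathbf{y}|\boldsymbol{\Phi}(\boldsymbol{\theta}),\mathbf{w},\eta) = \mathcal{CN}(\mathbf{y};\boldsymbol{\Phi}(\boldsymbol{\theta})\mathbf{w},\eta^{-1}\mathbf{I}_N)$. Given $\mathbf{y}$ and $\boldsymbol{\Phi}(\boldsymbol{\theta})$, we aim to compute the posterior distribution of the sparse vector $\mathbf{w}$. Note that the posterior distribution on $\mathbf{w}$ can be used to find the minimum mean squared error (MMSE) estimate of $\mathbf{w}$, from which the physical channel estimate is obtained using (3.4).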

According to the GSCM model [95], there are only a few dominant scatterers in the channel, i.e., $L_s$ is small. Moreover, the paths forwarded by each scatterer are concentrated in a small angular spread around the line-of-sight direction between the BS and the scatterer [111, 112]. Thus, $\mathbf{w}$ exhibits a clustered sparse structure with unknown cluster boundaries, and hence the support (the indices of the non-zero elements) of $\mathbf{w}$ is unknown [43, 97, 101]. To model the clustered sparse structure of $\mathbf{w}$ and to determine its support, we condition the $m$-th element of $\mathbf{w}$ on a latent variable $z_m \in \{0,1\}$, where $w_m \neq 0$ when $z_m = 1$ and $w_m = 0$ when $z_m = 0$. Thus, given the latent vector $\mathbf{z} = [z_1,z_2,\ldots,z_M]^T$, as in [90,101,104,105], the prior distribution on $\mathbf{w}$ is written as

$$
p(\mathbf{w}|\mathbf{z},\boldsymbol{\gamma}) = \prod_{m=1}^{M} p(w_m|z_m,\gamma_m) = \prod_{m=1}^{M}\left[z_m\,\mathcal{CN}(w_m;0,\gamma_m^{-1}) + (1-z_m)\,\delta(w_m)\right], \tag{3.6}
$$

where $\boldsymbol{\gamma} = (\gamma_1,\gamma_2,\ldots,\gamma_M)^T$ and $\gamma_m$ is the precision of $w_m$. Due to the clustered sparsity of $\mathbf{w}$, the elements of the vector $\mathbf{z}$ are correlated. To capture this correlation, we model $\mathbf{z}$ as a first-order Markov process with transition probabilities $\Pr(z_m = 1|z_{m-1} = 0) = \tau_{01}$ and $\Pr(z_m = 0|z_{m-1} = 1) = \tau_{10}$. These transition probabilities reflect the clustered sparse structure of $\mathbf{w}$ in the following way: the average length of the run of zeros between two consecutive non-zero clusters is large when $\tau_{01}$ is small, and the average size of a non-zero cluster is large when $\tau_{10}$ is small. Denoting $\boldsymbol{\tau} \triangleq (\tau_{01},\tau_{10})$, the prior distribution on $\mathbf{z}$ is given as

$$
\begin{aligned}
p(\mathbf{z}|\boldsymbol{\tau}) &= p(z_1)\prod_{m=2}^{M} p(z_m|z_{m-1},\boldsymbol{\tau}) \\
&= p(z_1)\prod_{m=2}^{M}\left[\left((1-\tau_{10})^{z_{m-1}}\tau_{01}^{1-z_{m-1}}\right)^{z_m}\left(\tau_{10}^{z_{m-1}}(1-\tau_{01})^{1-z_{m-1}}\right)^{1-z_m}\right],
\end{aligned} \tag{3.7}
$$

where $p(z_1) = \mathrm{Bern}(z_1;\lambda)$; we use the steady-state distribution for $z_1$ and set $\lambda = \tau_{01}/(\tau_{01}+\tau_{10})$.

Figure 3.1. Factor graph illustrations of (a) the true posterior distribution in (3.9), and (b) the approximated posterior distribution in (3.20). Variable nodes are represented by circles (filled circles for observed variables and empty ones for hidden variables) and factor nodes are denoted by small rectangles. Repetition of observed variables in the subgraph is represented using plate (big rectangle) notation.
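A minimal sketch of the Markov support prior (3.7), assuming NumPy and 0-based arrays; `transition_matrix`, `markov_log_prior`, and `sample_support` are hypothetical helper names. It evaluates $\ln p(\mathbf{z}|\boldsymbol{\tau})$ with the steady-state initial probability $\lambda$ and draws supports whose runs of ones (clusters) and runs of zeros (gaps) are geometrically distributed with means $1/\tau_{10}$ and $1/\tau_{01}$, respectively.

```python
import itertools

import numpy as np


def transition_matrix(tau01, tau10):
    """P[i, j] = Pr(z_m = j | z_{m-1} = i) for the Markov prior in (3.7)."""
    return np.array([[1.0 - tau01, tau01],
                     [tau10, 1.0 - tau10]])


def markov_log_prior(z, tau01, tau10):
    """ln p(z | tau) of (3.7), with the steady-state p(z_1) = Bern(z_1; lambda)."""
    z = np.asarray(z, dtype=int)
    lam = tau01 / (tau01 + tau10)
    log_p = np.log(lam) if z[0] == 1 else np.log(1.0 - lam)
    P = transition_matrix(tau01, tau10)
    return float(log_p + np.log(P[z[:-1], z[1:]]).sum())


def sample_support(M, tau01, tau10, rng):
    """Draw one support vector z of length M from the Markov prior."""
    lam = tau01 / (tau01 + tau10)
    P = transition_matrix(tau01, tau10)
    z = np.empty(M, dtype=int)
    z[0] = int(rng.random() < lam)
    for m in range(1, M):
        z[m] = int(rng.random() < P[z[m - 1], 1])  # Pr(z_m = 1 | z_{m-1})
    return z


# The prior is normalized: summing exp(ln p(z|tau)) over all 2^M supports gives 1.
total = sum(np.exp(markov_log_prior(c, 0.1, 0.3))
            for c in itertools.product([0, 1], repeat=6))
print(round(total, 10))  # 1.0
```

Because $z_1$ starts from the stationary distribution, the fraction of ones in a long sampled support should be close to $\lambda$.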

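In practice, the physical AoDs may not lie on the assumed angular grid $\boldsymbol{\theta}$ in (3.4); thus we treat $\boldsymbol{\theta}$ as an unknown parameter and estimate it to learn the dictionary. Therefore, letting $\boldsymbol{\xi} \triangleq (\boldsymbol{\tau},\gamma_1,\gamma_2,\ldots,\gamma_M,\eta,\boldsymbol{\theta}^T)^T$, we aim to jointly estimate $(\mathbf{w},\mathbf{z},\boldsymbol{\xi})$. We write the joint posterior distribution of $(\mathbf{w},\mathbf{z},\boldsymbol{\xi})$ as

$$
p(\mathbf{w},\mathbf{z},\boldsymbol{\xi}|\mathbf{y}) \propto p(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi})\,p(\mathbf{y}|\boldsymbol{\xi})\,p(\boldsymbol{\xi}), \tag{3.8}
$$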

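where the conditioning on $\boldsymbol{\xi}$ in (3.8) removes the multidimensional integration over $\boldsymbol{\xi}$ that would otherwise be required to compute the normalization constant. Note that in (3.8), the joint posterior distribution of $\mathbf{w}$ and $\mathbf{z}$ conditioned on $\boldsymbol{\xi}$ is given by

$$
p(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}) \propto p(\mathbf{y}|\boldsymbol{\Phi}(\boldsymbol{\theta}),\mathbf{w},\eta)\,p(\mathbf{w}|\mathbf{z},\boldsymbol{\gamma})\,p(\mathbf{z}|\boldsymbol{\tau}). \tag{3.9}
$$

Computing the joint posterior distribution in (3.8) is still involved. We can reduce (3.8) to (3.9) by plugging into (3.9) the maximum a posteriori estimate of $\boldsymbol{\xi}$, obtained by maximizing $p(\mathbf{y}|\boldsymbol{\xi})p(\boldsymbol{\xi})$ with respect to $\boldsymbol{\xi}$. Assuming a uniform prior distribution on $\boldsymbol{\xi}$, we get the maximum likelihood (ML) estimate

$$
\hat{\boldsymbol{\xi}} = \arg\max_{\boldsymbol{\xi}}\; p(\mathbf{y}|\boldsymbol{\xi}). \tag{3.10}
$$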

The objective function in (3.10) is non-concave and, due to the multidimensional parameter space, a brute-force search is difficult [93]. An alternative is the iterative expectation maximization (EM) algorithm, which never decreases the likelihood function $p(\mathbf{y}|\boldsymbol{\xi})$ from one iteration to the next and guarantees convergence to a local maximum [106, 107]. To this end, we define the complete data as $\mathbf{d} = [\mathbf{y}^T,\mathbf{w}^T,\mathbf{z}^T]^T$. Then, if $\boldsymbol{\xi}^l$ is the estimate from the $l$-th iteration, in the $(l+1)$-st iteration of EM we perform the following two steps:

$$
\text{E-step:}\quad \mathcal{L}(\boldsymbol{\xi};\boldsymbol{\xi}^l) = \mathbb{E}_{p(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^l)}\left[\ln p(\mathbf{y},\mathbf{w},\mathbf{z}|\boldsymbol{\xi})\right], \tag{3.11}
$$

$$
\text{M-step:}\quad \boldsymbol{\xi}^{(l+1)} = \arg\max_{\boldsymbol{\xi}}\;\mathcal{L}(\boldsymbol{\xi};\boldsymbol{\xi}^l), \tag{3.12}
$$

and (3.11) and (3.12) are repeated until convergence.

Computing the E-step in (3.11) requires the exact joint posterior distribution $p(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^l)$, which is computationally intractable because it requires a multidimensional integration and summation. Therefore, in Section 3.3 we derive an expectation propagation (EP) algorithm to approximate this distribution with a distribution from an exponential family. We denote the approximate distribution by $Q(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^l)$ and use it in place of $p(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^l)$ in (3.11). Note that the parameter estimate from the $l$-th iteration of EM, namely $\boldsymbol{\xi}^l$, is used by the EP algorithm to obtain $Q(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^l)$. Once the E-step is solved in this way, the solution to the M-step, derived in Section 3.4, is computed to obtain $\boldsymbol{\xi}^{l+1}$. Next, the EP algorithm is run with $\boldsymbol{\xi}^{l+1}$ to obtain $Q(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^{l+1})$, which is used in the E-step of the next EM iteration. The iterations between EM and EP are continued in this way until convergence is achieved. This EM-EP approach is reminiscent of the variational EM algorithm [108,109]. We should point out that the convergence of EM-EP is assured based on the convergence properties of variational EM [108]. In particular, as the iterations of EM-EP proceed, the EM algorithm converges to a local maximum of the objective function in (3.10) [106] and the EP algorithm closely approximates the true joint posterior distribution $p(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^l)$ in (3.9) [53]. An EM-EP algorithm has been used in [113] to solve a classification problem, whereas here we use it to solve an estimation problem.

3.3 Expectation Propagation Algorithm

In this section, we derive an expectation propagation algorithm to approximate the

joint posterior distribution p(w, z|y, ξ) in (3.9) with a distribution from an exponential

family. For a review of the EP algorithm we refer the reader to [78,102–105].

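Let $\mathcal{F}$ denote the exponential family of distributions. Exploiting the factorized structure of (3.9), we approximate the joint posterior distribution $p(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi})$ with

$$
Q(\mathbf{w},\mathbf{z}) = Q(\mathbf{w})Q(\mathbf{z}), \tag{3.13}
$$

where $Q(\mathbf{w}) \in \mathcal{F}$ and $Q(\mathbf{z}) \in \mathcal{F}$³. We choose the factors in (3.13) as

$$
Q(\mathbf{w}) = \mathcal{CN}(\mathbf{w};\boldsymbol{\mu},\boldsymbol{\Sigma}), \tag{3.14}
$$

$$
Q(\mathbf{z}) = \prod_{m=1}^{M} Q_m(z_m) = \prod_{m=1}^{M}\mathrm{Bern}(z_m;\sigma(p_m)), \tag{3.15}
$$

where the sigmoid function $\sigma(\cdot)$ is used to define the mean of the Bernoulli distribution as $\sigma(p_m)$⁴. Parameterizing the Bernoulli mean through the sigmoid function simplifies the EP updates, since products and quotients of Bernoulli factors reduce to sums and differences of their sigmoid arguments, and it avoids numerical underflow errors, which improves the numerical stability of the EP algorithm [105]. In (3.14) and (3.15), $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, and $\mathbf{p} \triangleq [p_1,p_2,\ldots,p_M]^T$ are the unknown parameters that we next estimate with the EP algorithm.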

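Next we approximate each factor in (3.9). Let $q_1(\mathbf{w})$, $q_2(\mathbf{w},\mathbf{z})$, and $q_3(\mathbf{z})$ approximate $p(\mathbf{y}|\boldsymbol{\Phi}(\boldsymbol{\theta}),\mathbf{w},\eta)$, $p(\mathbf{w}|\mathbf{z},\boldsymbol{\gamma})$, and $p(\mathbf{z}|\boldsymbol{\tau})$, respectively. Since $q_1(\cdot)$ is a function of $\mathbf{w}$ only and $q_3(\cdot)$ is a function of $\mathbf{z}$ only, whereas $q_2(\cdot)$ is a joint function of both $\mathbf{w}$ and $\mathbf{z}$, we choose these terms as follows:

$$
q_1(\mathbf{w}) = \mathcal{CN}(\mathbf{w};\boldsymbol{\mu}_1,\boldsymbol{\Sigma}_1), \tag{3.16}
$$

$$
q_2(\mathbf{w},\mathbf{z}) = \prod_{m=1}^{M} q_{2,m}(w_m,z_m), \tag{3.17}
$$

where

$$
q_{2,m}(w_m,z_m) \propto \mathcal{CN}(w_m;\mu_{2,m},\Sigma_{2,m})\,\mathrm{Bern}(z_m;\sigma(p_{2,m})). \tag{3.18}
$$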

³ The conditioning on $\mathbf{y}$ and $\boldsymbol{\xi}$ is occasionally dropped in this section for notational convenience.
⁴ For a variable $x \in \mathbb{R}$, the sigmoid function is defined as $\sigma(x) = 1/(1+e^{-x})$.

For $q_3(\mathbf{z})$, we approximate $p(z_m|z_{m-1})$ in (3.7) with $q^{FR}_{3,m-1,m}(z_{m-1},z_m)$, which in factorized form we write as $q^{FR}_{3,m-1,m}(z_{m-1},z_m) = q^{R}_{3,m-1}(z_{m-1})\,q^{F}_{3,m}(z_m)$. Therefore

$$
q_3(\mathbf{z}) = \prod_{m=1}^{M} q^{R}_{3,m}(z_m)\,q^{F}_{3,m}(z_m), \tag{3.19}
$$

where, for $j \in \{F,R\}$, $q^{j}_{3,m}(z_m) = \mathrm{Bern}(z_m;\sigma(p^{j}_{3,m}))$ and $\sigma(p^{j}_{3,m})$ denotes the mean of the Bernoulli distribution. These means define the forward and reverse messages sent between $z_{m-1}$ and $z_m$ in the factor graph of Fig. 3.1(a) to obtain the approximate posterior distribution in Fig. 3.1(b). Note that in (3.19) we use the convention that $q^{F}_{3,1}(z_1) = p(z_1)$ and $q^{R}_{3,M}(z_M) = 1$. Next, to find the unknown parameters in (3.14) and (3.15), we write

$$
Q(\mathbf{w},\mathbf{z}) \propto q_1(\mathbf{w})\,q_2(\mathbf{w},\mathbf{z})\,q_3(\mathbf{z}), \tag{3.20}
$$

and using (3.16)-(3.19) in (3.20), we get

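$$
\boldsymbol{\Sigma} = \left(\boldsymbol{\Sigma}_1^{-1} + \boldsymbol{\Sigma}_2^{-1}\right)^{-1}, \tag{3.21}
$$

$$
\boldsymbol{\mu} = \boldsymbol{\Sigma}\left(\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right), \tag{3.22}
$$

$$
p_m = \begin{cases} p_{2,m} + p^{F}_{3,m} + p^{R}_{3,m}, & m = 1,2,\ldots,M-1,\\ p_{2,m} + p^{F}_{3,m}, & m = M, \end{cases} \tag{3.23}
$$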

where $\boldsymbol{\mu}_2 = (\mu_{2,1},\mu_{2,2},\ldots,\mu_{2,M})^T$ and $\boldsymbol{\Sigma}_2$ is a diagonal matrix with $m$-th diagonal entry $[\boldsymbol{\Sigma}_2]_{m,m} = \Sigma_{2,m}$. Note that $p^{j}_{3,m}$ for $j \in \{F,R\}$, $p_m$, and $p_{2,m}$ in (3.23) are the arguments of the sigmoid functions and not the success probabilities of the Bernoulli distributions. Thus, the value of $p_m$ in (3.23) can be outside the range $[0,1]$; however, the output of the sigmoid function with input $p_m$ lies in $[0,1]$ and represents the success probability⁵. Also note that since in (3.19) we set $q^{F}_{3,1}(z_1) = p(z_1)$, in (3.23) we have $p^{F}_{3,1} = \sigma^{-1}(\lambda)$. Both the true posterior distribution in (3.9) and the approximated one in (3.20) are depicted in Fig. 3.1 for clarity.

⁵ To derive (3.23), we used the following facts. Firstly, $\prod_{n=1}^{N}\mathrm{Bern}(x;\phi_n) \propto \mathrm{Bern}(x;\phi)$, where $\phi = \prod_{n=1}^{N}\phi_n \big/ \left(\prod_{n=1}^{N}\phi_n + \prod_{n=1}^{N}(1-\phi_n)\right)$. Secondly, the inverse sigmoid (logit) function is given by $\sigma^{-1}(x) = \ln\frac{x}{1-x}$.

Figure 3.2. EP steps for updating $q_{2,m}(w_m,z_m)$: (a) eliminate $q_{2,m}(w_m,z_m)$ from the factor graph to find the cavity distribution $Q_{\backslash 2,m}(w_m,z_m)$ as in (3.28), (b) use the $p(w_m|z_m)$ factor to define the hybrid posterior distribution $R_{2,m}(w_m,z_m)$ as in (3.32), and (c) project $R_{2,m}(w_m,z_m)$ onto $\mathcal{F}$ and update $q_{2,m}(w_m,z_m)$ as in (3.34), (3.45), and (3.48).
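The Bernoulli identities used for (3.23) and (3.31) can be checked numerically. This sketch (NumPy assumed; the helper names are illustrative) shows that multiplying Bernoulli factors adds their sigmoid arguments and dividing them subtracts, which is why the EP updates for $\mathbf{z}$ reduce to sums and differences of logits.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def logit(phi):
    """Inverse sigmoid sigma^{-1}(phi) = ln(phi / (1 - phi))."""
    return np.log(phi) - np.log1p(-phi)


def bern_product_mean(phis):
    """Mean of the normalized product of Bern(x; phi_n) (footnote 5)."""
    phis = np.asarray(phis, dtype=float)
    num = np.prod(phis)
    return num / (num + np.prod(1.0 - phis))


def bern_quotient_mean(phi1, phi2):
    """Mean of the normalized quotient Bern(x; phi1) / Bern(x; phi2) (footnote 6)."""
    ratio = phi1 / phi2
    return ratio / (ratio + (1.0 - phi1) / (1.0 - phi2))


def combine_logits(p2, pF, pR):
    """p_m of (3.23) as a sum of site logits; pass pR[-1] = 0 since q^R_{3,M} = 1."""
    return np.asarray(p2) + np.asarray(pF) + np.asarray(pR)
```

Working with the logits also avoids forming long products of probabilities, which is where numerical underflow would otherwise arise.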

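Since $q_1(\mathbf{w})$ approximates $p(\mathbf{y}|\boldsymbol{\Phi}(\boldsymbol{\theta}),\mathbf{w},\eta)$, which is already a complex Gaussian function of $\mathbf{w}$, no approximation is needed and we set $q_1(\mathbf{w}) \propto \mathcal{CN}(\mathbf{y};\boldsymbol{\Phi}(\boldsymbol{\theta})\mathbf{w},\eta^{-1}\mathbf{I}_N)$. Expanding this Gaussian distribution and completing the square in $\mathbf{w}$, we get

$$
\boldsymbol{\Sigma}_1^{-1} = \eta\,\boldsymbol{\Phi}^H(\boldsymbol{\theta})\boldsymbol{\Phi}(\boldsymbol{\theta}), \qquad \boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1 = \eta\,\boldsymbol{\Phi}^H(\boldsymbol{\theta})\mathbf{y}. \tag{3.24}
$$

Substituting (3.24) into (3.21) and (3.22) and applying the matrix inversion lemma to (3.21), which only requires inverting an $N\times N$ matrix, we obtain

$$
\boldsymbol{\Sigma} = \boldsymbol{\Sigma}_2 - \boldsymbol{\Sigma}_2\boldsymbol{\Phi}^H(\boldsymbol{\theta})\left(\eta^{-1}\mathbf{I}_N + \boldsymbol{\Phi}(\boldsymbol{\theta})\boldsymbol{\Sigma}_2\boldsymbol{\Phi}^H(\boldsymbol{\theta})\right)^{-1}\boldsymbol{\Phi}(\boldsymbol{\theta})\boldsymbol{\Sigma}_2, \tag{3.25}
$$

$$
\boldsymbol{\mu} = \boldsymbol{\Sigma}\left(\eta\,\boldsymbol{\Phi}^H(\boldsymbol{\theta})\mathbf{y} + \boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2\right). \tag{3.26}
$$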

Thus, to compute (3.23), (3.25), and (3.26), we only need to update the approximation factors $q_2(\mathbf{w},\mathbf{z})$ and $q_3(\mathbf{z})$. We first update $q_2(\mathbf{w},\mathbf{z})$ as follows. Since it is equal to the product of the factors $q_{2,m}(w_m,z_m)$, we can update each of them individually and in parallel [102]. The steps involved in updating $q_{2,m}(w_m,z_m)$ are depicted in Fig. 3.2. Let $Q_m(w_m,z_m)$ denote the marginal distribution obtained from (3.13); using (3.14) and (3.15), it can be written as

$$
Q_m(w_m,z_m) = Q_m(w_m)Q_m(z_m) \propto \mathcal{CN}(w_m;\mu_m,\Sigma_{m,m})\,\mathrm{Bern}(z_m;\sigma(p_m)), \tag{3.27}
$$

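where $\mu_m$ is the $m$-th element of $\boldsymbol{\mu}$ and $\Sigma_{m,m} = [\boldsymbol{\Sigma}]_{m,m}$ for $m = 1,2,\ldots,M$. Following the EP framework, we first find the cavity distribution as

$$
Q_{\backslash 2,m}(w_m,z_m) = \frac{Q_m(w_m,z_m)}{q_{2,m}(w_m,z_m)} \propto Q_{\backslash 2,m}(w_m)\,Q_{\backslash 2,m}(z_m), \tag{3.28}
$$

where $Q_{\backslash 2,m}(w_m) = \mathcal{CN}(w_m;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m})$ and $Q_{\backslash 2,m}(z_m) = \mathrm{Bern}(z_m;\sigma(p_{\backslash 2,m}))$. The parameters of these distributions are given by⁶

$$
\Sigma_{\backslash 2,m} = \left(\Sigma_{m,m}^{-1} - \Sigma_{2,m}^{-1}\right)^{-1}, \tag{3.29}
$$

$$
\mu_{\backslash 2,m} = \Sigma_{\backslash 2,m}\left(\Sigma_{m,m}^{-1}\mu_m - \Sigma_{2,m}^{-1}\mu_{2,m}\right), \tag{3.30}
$$

$$
p_{\backslash 2,m} = p_m - p_{2,m}. \tag{3.31}
$$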

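⁶ To get (3.31), we used the fact that for a Bernoulli variable $x$, $\mathrm{Bern}(x;\phi_1)/\mathrm{Bern}(x;\phi_2) \propto \mathrm{Bern}(x;\phi)$, where $\phi = \dfrac{\phi_1/\phi_2}{\phi_1/\phi_2 + (1-\phi_1)/(1-\phi_2)}$.

Figure 3.3. EP steps for updating $q^{R}_{3,m-1}(z_{m-1})$ and $q^{F}_{3,m}(z_m)$: (a) eliminate $q^{R}_{3,m-1}(z_{m-1})$ and $q^{F}_{3,m}(z_m)$ from the factor graph to find the cavity distributions $q^{\backslash R}_{3,m-1}(z_{m-1})$ and $q^{\backslash F}_{3,m}(z_m)$ as in (3.51) and (3.54), (b) use the $p(z_m|z_{m-1})$ factor to define the hybrid posterior distribution $S_{3,m-1,m}(z_{m-1},z_m)$ as in (3.56), and (c) project $S_{3,m-1,m}(z_{m-1},z_m)$ onto $\mathcal{F}$ and update $q^{R}_{3,m-1}(z_{m-1})$ and $q^{F}_{3,m}(z_m)$ as in (3.62)-(3.66).

Next we define the hybrid posterior distribution $R_{2,m}(w_m,z_m)$ as

$$
R_{2,m}(w_m,z_m) = \frac{1}{C_m}\,p(w_m|z_m)\,Q_{\backslash 2,m}(w_m,z_m), \tag{3.32}
$$

where $p(w_m|z_m)$ is defined in (3.6). The normalization constant $C_m$ in (3.32) is computed as follows:

$$
\begin{aligned}
C_m &= \sum_{z_m\in\{0,1\}}\int p(w_m|z_m)\,Q_{\backslash 2,m}(w_m,z_m)\,dw_m \\
&= \sigma(p_{\backslash 2,m})\int \mathcal{CN}(w_m;0,\gamma_m^{-1})\,\mathcal{CN}(w_m;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m})\,dw_m + \left(1-\sigma(p_{\backslash 2,m})\right)\int \delta(w_m)\,\mathcal{CN}(w_m;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m})\,dw_m \\
&= \mathcal{CN}(0;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m}+\gamma_m^{-1})\,\sigma(p_{\backslash 2,m}) + \mathcal{CN}(0;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m})\left(1-\sigma(p_{\backslash 2,m})\right).
\end{aligned} \tag{3.33}
$$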

Then we update the approximation $Q_m(w_m,z_m)$ by projecting $R_{2,m}(w_m,z_m)$ onto the closest distribution in $\mathcal{F}$, i.e., by minimizing the following Kullback-Leibler (KL) divergence:

$$
Q_m(w_m,z_m) = \arg\min_{Q_m(w_m,z_m)\in\mathcal{F}}\mathrm{KL}\left(R_{2,m}(w_m,z_m)\,\|\,Q_m(w_m,z_m)\right). \tag{3.34}
$$

Since $Q_m(w_m,z_m) = Q_m(w_m)Q_m(z_m)$ from (3.27), we have $\ln Q_m(w_m,z_m) = \ln Q_m(w_m) + \ln Q_m(z_m)$, so the KL divergence in (3.34) splits, up to a constant, into two terms, and (3.34) is equivalent to the following two separate problems:

$$
Q_m(w_m) = \arg\min_{Q_m(w_m)\in\mathcal{F}}\mathrm{KL}\left(R_{2,m}(w_m)\,\|\,Q_m(w_m)\right), \tag{3.35}
$$

and

$$
Q_m(z_m) = \arg\min_{Q_m(z_m)\in\mathcal{F}}\mathrm{KL}\left(R_{2,m}(z_m)\,\|\,Q_m(z_m)\right), \tag{3.36}
$$

where $R_{2,m}(w_m) = \sum_{z_m\in\{0,1\}}R_{2,m}(w_m,z_m)$ and $R_{2,m}(z_m) = \int R_{2,m}(w_m,z_m)\,dw_m$ are the marginal distributions. The KL divergences in (3.35) and (3.36) are minimized by using the moment matching property [104]. Thus, for $Q_m(w_m)$ and $Q_m(z_m)$ defined in (3.27), we set

$$
\mu_m = \mathbb{E}_{R_{2,m}}[w_m], \tag{3.37}
$$

$$
\Sigma_{m,m} = \mathbb{E}_{R_{2,m}}[|w_m|^2] - |\mathbb{E}_{R_{2,m}}[w_m]|^2, \tag{3.38}
$$

$$
\sigma(p_m) = \mathbb{E}_{R_{2,m}}[z_m]. \tag{3.39}
$$

The values of $\mu_m$, $\Sigma_{m,m}$, and $\sigma(p_m)$ are given in the following lemma, which is proved in Appendix C.

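Lemma 2.
1. The posterior mean value $\sigma(p_m)$ is given by

$$
\sigma(p_m) = \left(1 + \frac{\sigma(-p_{\backslash 2,m})\,\mathcal{CN}(0;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m})}{\sigma(p_{\backslash 2,m})\,\mathcal{CN}(0;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m}+\gamma_m^{-1})}\right)^{-1}. \tag{3.40}
$$

2. The posterior mean value $\mu_m$ is given by

$$
\mu_m = \mu_{\backslash 2,m} + \Sigma_{\backslash 2,m}\frac{\partial \ln C_m}{\partial \mu^*_{\backslash 2,m}}, \tag{3.41}
$$

where

$$
\frac{\partial \ln C_m}{\partial \mu_{\backslash 2,m}} = -\frac{\sigma(p_m)\,\mu^*_{\backslash 2,m}}{\Sigma_{\backslash 2,m}+\gamma_m^{-1}} - \frac{\sigma(-p_m)\,\mu^*_{\backslash 2,m}}{\Sigma_{\backslash 2,m}}, \tag{3.42}
$$

and $\partial \ln C_m/\partial \mu^*_{\backslash 2,m}$ is its complex conjugate, since $C_m$ is real-valued.

3. The posterior variance $\Sigma_{m,m}$ is given by

$$
\Sigma_{m,m} = \Sigma_{\backslash 2,m} + \left(\Sigma_{\backslash 2,m}\right)^2\left(\frac{\partial \ln C_m}{\partial \Sigma_{\backslash 2,m}} - \frac{\partial \ln C_m}{\partial \mu^*_{\backslash 2,m}}\,\frac{\partial \ln C_m}{\partial \mu_{\backslash 2,m}}\right), \tag{3.43}
$$

where

$$
\frac{\partial \ln C_m}{\partial \Sigma_{\backslash 2,m}} = \sigma(p_m)\frac{|\mu_{\backslash 2,m}|^2 - (\Sigma_{\backslash 2,m}+\gamma_m^{-1})}{(\Sigma_{\backslash 2,m}+\gamma_m^{-1})^2} + \sigma(-p_m)\frac{|\mu_{\backslash 2,m}|^2 - \Sigma_{\backslash 2,m}}{(\Sigma_{\backslash 2,m})^2}. \tag{3.44}
$$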

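Now we update the approximation factor $q_{2,m}(w_m,z_m)$. Since $q_{2,m}(w_m,z_m) = q_{2,m}(w_m)q_{2,m}(z_m)$, we can update the two factors separately. To update $q_{2,m}(w_m)$, we write

$$
q_{2,m}(w_m) = \frac{Q_m(w_m)}{Q_{\backslash 2,m}(w_m)} = \frac{\mathcal{CN}(w_m;\mu_m,\Sigma_{m,m})}{\mathcal{CN}(w_m;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m})} \propto \mathcal{CN}(w_m;\mu_{2,m},\Sigma_{2,m}), \tag{3.45}
$$

where

$$
\Sigma_{2,m} = \left((\Sigma_{m,m})^{-1} - (\Sigma_{\backslash 2,m})^{-1}\right)^{-1}, \tag{3.46}
$$

$$
\mu_{2,m} = \Sigma_{2,m}\left((\Sigma_{m,m})^{-1}\mu_m - (\Sigma_{\backslash 2,m})^{-1}\mu_{\backslash 2,m}\right), \tag{3.47}
$$

and to update $q_{2,m}(z_m)$, we write

$$
q_{2,m}(z_m) = \frac{Q_m(z_m)}{Q_{\backslash 2,m}(z_m)} = \frac{\mathrm{Bern}(z_m;\sigma(p_m))}{\mathrm{Bern}(z_m;\sigma(p_{\backslash 2,m}))} \propto \mathrm{Bern}(z_m;\sigma(p_{2,m})), \tag{3.48}
$$

where

$$
\sigma(p_{2,m}) = \frac{\mathcal{CN}(0;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m}+\gamma_m^{-1})}{\mathcal{CN}(0;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m}+\gamma_m^{-1}) + \mathcal{CN}(0;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m})}, \tag{3.49}
$$

and applying the logit function $\sigma^{-1}(\cdot)$ to (3.49), we get

$$
p_{2,m} = \ln\mathcal{CN}(0;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m}+\gamma_m^{-1}) - \ln\mathcal{CN}(0;\mu_{\backslash 2,m},\Sigma_{\backslash 2,m}). \tag{3.50}
$$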
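Putting (3.29)-(3.31), the moment matching of Lemma 2, and (3.46)-(3.50) together, one undamped update of a single site $q_{2,m}(w_m,z_m)$ might look as follows. This is a sketch under stated assumptions: scalar NumPy arithmetic, the hypothetical function name `update_q2_site`, the closed-form mixture moments in place of the equivalent derivative form of Lemma 2, and, when the new site variance would not be positive, the fallback $\Sigma_{2,m} = 10^2$ described in Remark 2 while keeping the previous $\mu_{2,m}$ (the text does not specify $\mu_{2,m}$ in that case).

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def log_cn_at_zero(mean, var):
    """ln CN(0; mean, var) for a scalar complex Gaussian."""
    return -np.log(np.pi * var) - abs(mean) ** 2 / var


def update_q2_site(mu_m, var_m, p_m, mu2, var2, p2, gamma, fallback_var=1e2):
    """One undamped EP update of q_{2,m}(w_m, z_m); returns new marginal and site."""
    # cavity distribution Q_{\2,m}, (3.29)-(3.31); assumes var_m < var2
    var_c = 1.0 / (1.0 / var_m - 1.0 / var2)
    mu_c = var_c * (mu_m / var_m - mu2 / var2)
    p_c = p_m - p2
    # Bernoulli site (3.50); the matched marginal logit is p_c + p2_new, i.e. (3.40)
    p2_new = log_cn_at_zero(mu_c, var_c + 1.0 / gamma) - log_cn_at_zero(mu_c, var_c)
    p_new = p_c + p2_new
    # moment matching (3.37)-(3.38), closed form for the spike-and-slab hybrid
    pi_m = sigmoid(p_new)
    s = var_c / (1.0 + gamma * var_c)
    m = s * mu_c / var_c
    mu_new = pi_m * m
    var_new = pi_m * (s + abs(m) ** 2) - abs(mu_new) ** 2
    # Gaussian site by division, (3.46)-(3.47)
    prec2 = 1.0 / var_new - 1.0 / var_c
    if prec2 > 0:
        var2_new = 1.0 / prec2
        mu2_new = var2_new * (mu_new / var_new - mu_c / var_c)
    else:
        # fallback of Remark 2; keeping the old site mean is an assumption
        var2_new, mu2_new = fallback_var, mu2
    return {"mu_m": mu_new, "var_m": var_new, "p_m": p_new,
            "mu2": mu2_new, "var2": var2_new, "p2": p2_new}
```

A useful invariant for testing is that the new site combined with the cavity reproduces the matched marginal exactly, both in the Gaussian natural parameters and in the logits.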

Next we update the approximation factor $q_3(\mathbf{z})$ in (3.20). We start by updating $q^{R}_{3,m-1}(z_{m-1})$ and $q^{F}_{3,m}(z_m)$. The EP steps taken to update these factors are summarized in Fig. 3.3. Given the marginal distribution on $z_m$, $Q_m(z_m) = q^{F}_{3,m}(z_m)\,q_{2,m}(z_m)\,q^{R}_{3,m}(z_m)$, which can also be read directly from Fig. 3.1(b), we first find the cavity distribution $q^{\backslash R}_{3,m-1}(z_{m-1})$ as follows:

$$
q^{\backslash R}_{3,m-1}(z_{m-1}) = \frac{Q_{m-1}(z_{m-1})}{q^{R}_{3,m-1}(z_{m-1})} = q^{F}_{3,m-1}(z_{m-1})\,q_{2,m-1}(z_{m-1}) \propto \mathrm{Bern}\left(z_{m-1};\sigma(p^{\backslash R}_{3,m-1})\right), \tag{3.51}
$$

where

$$
\sigma(p^{\backslash R}_{3,m-1}) = \frac{\sigma(p^{F}_{3,m-1})\,\sigma(p_{2,m-1})}{\sigma(p^{F}_{3,m-1})\,\sigma(p_{2,m-1}) + \sigma(-p^{F}_{3,m-1})\,\sigma(-p_{2,m-1})}. \tag{3.52}
$$

Solving (3.52) using the logit function $\sigma^{-1}(\cdot)$ and adjusting the notation to the $m$-th factor, we get

$$
p^{\backslash R}_{3,m} = p_{2,m} + p^{F}_{3,m}, \quad m = 1,2,\ldots,M-1. \tag{3.53}
$$

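Similarly, the cavity distribution $q^{\backslash F}_{3,m}(z_m)$ is found as

$$
q^{\backslash F}_{3,m}(z_m) = \frac{Q_m(z_m)}{q^{F}_{3,m}(z_m)} = q^{R}_{3,m}(z_m)\,q_{2,m}(z_m) \propto \mathrm{Bern}\left(z_m;\sigma(p^{\backslash F}_{3,m})\right), \tag{3.54}
$$

where, following the same approach as in (3.52) and (3.53), we get

$$
p^{\backslash F}_{3,m} = \begin{cases} p_{2,m} + p^{R}_{3,m}, & m = 1,2,\ldots,M-1,\\ p_{2,m}, & m = M. \end{cases} \tag{3.55}
$$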

Once the cavity distributions are computed, we define the hybrid joint posterior distribution on $z_{m-1}$ and $z_m$ as

$$
S_{3,m-1,m}(z_{m-1},z_m) = q^{\backslash R}_{3,m-1}(z_{m-1})\,p(z_m|z_{m-1})\,q^{\backslash F}_{3,m}(z_m), \tag{3.56}
$$

in which $p(z_m|z_{m-1})$ is given in (3.7). Since (3.56) involves a product of Bernoulli distributions and a transition probability, $S_{3,m-1,m}(z_{m-1},z_m)$ is, after normalization, a bivariate Bernoulli distribution whose marginal distributions on $z_{m-1}$ and $z_m$ can be written as

$$
S_{3,m-1}(z_{m-1}) = \sum_{z_m\in\{0,1\}} S_{3,m-1,m}(z_{m-1},z_m), \tag{3.57}
$$

$$
S_{3,m}(z_m) = \sum_{z_{m-1}\in\{0,1\}} S_{3,m-1,m}(z_{m-1},z_m), \tag{3.58}
$$

and, using their forms derived in Appendix D, the means of these marginal Bernoulli distributions are found as

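$$
\mathbb{E}_{S_{3,m-1}}[z_{m-1}] = \frac{1}{D_m}\,\sigma(p^{\backslash R}_{3,m-1})\left[\sigma(p^{\backslash F}_{3,m})(1-\tau_{10}) + \sigma(-p^{\backslash F}_{3,m})\,\tau_{10}\right], \tag{3.59}
$$

$$
\mathbb{E}_{S_{3,m}}[z_m] = \frac{1}{D_m}\,\sigma(p^{\backslash F}_{3,m})\left[\sigma(p^{\backslash R}_{3,m-1})(1-\tau_{10}) + \sigma(-p^{\backslash R}_{3,m-1})\,\tau_{01}\right], \tag{3.60}
$$

where the normalization constant $D_m$ is given by

$$
\begin{aligned}
D_m ={}& \sigma(-p^{\backslash R}_{3,m-1})\,\sigma(-p^{\backslash F}_{3,m})(1-\tau_{01}) + \sigma(p^{\backslash R}_{3,m-1})\,\sigma(-p^{\backslash F}_{3,m})\,\tau_{10} \\
&+ \sigma(-p^{\backslash R}_{3,m-1})\,\sigma(p^{\backslash F}_{3,m})\,\tau_{01} + \sigma(p^{\backslash R}_{3,m-1})\,\sigma(p^{\backslash F}_{3,m})(1-\tau_{10}).
\end{aligned} \tag{3.61}
$$

Each term of (3.61) pairs one configuration $(z_{m-1},z_m)$ with its transition probability $p(z_m|z_{m-1})$ from (3.7).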

Now we update the approximation factors $Q_{m-1}(z_{m-1})$ and $Q_m(z_m)$ by projecting $S_{3,m-1,m}(z_{m-1},z_m)$ in (3.56) onto the closest distribution in $\mathcal{F}$. This is done by minimizing the KL divergence between $S_{3,m-1,m}(z_{m-1},z_m)$ and $Q_{m-1}(z_{m-1})Q_m(z_m)$. As in (3.34), this KL divergence can be minimized by solving the following two separate optimization problems:

$$
Q_{m-1}(z_{m-1}) = \arg\min_{Q_{m-1}(z_{m-1})\in\mathcal{F}}\mathrm{KL}\left(S_{3,m-1}(z_{m-1})\,\|\,Q_{m-1}(z_{m-1})\right), \tag{3.62}
$$

and

$$
Q_m(z_m) = \arg\min_{Q_m(z_m)\in\mathcal{F}}\mathrm{KL}\left(S_{3,m}(z_m)\,\|\,Q_m(z_m)\right), \tag{3.63}
$$

where the marginals $S_{3,m-1}(z_{m-1})$ and $S_{3,m}(z_m)$ are computed from (3.57) and (3.58). The KL divergences in (3.62) and (3.63) are minimized, as before, by using the moment matching property. Thus we set $\sigma(p_{m-1}) = \mathbb{E}_{S_{3,m-1}}[z_{m-1}]$, given in (3.59), and $\sigma(p_m) = \mathbb{E}_{S_{3,m}}[z_m]$, given in (3.60).

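Finally, we update the approximation factors $q^{R}_{3,m-1}(z_{m-1})$ and $q^{F}_{3,m}(z_m)$ as follows. To update $q^{R}_{3,m-1}(z_{m-1})$, we write

$$
q^{R}_{3,m-1}(z_{m-1}) = \frac{Q_{m-1}(z_{m-1})}{q^{\backslash R}_{3,m-1}(z_{m-1})} \propto \mathrm{Bern}\left(z_{m-1};\sigma(p^{R}_{3,m-1})\right), \tag{3.64}
$$

in which $\sigma(p^{R}_{3,m-1})$ is computed from the following expression, written with the index shifted to the $m$-th factor:

$$
\sigma(p^{R}_{3,m}) = \frac{\sigma(p^{\backslash F}_{3,m+1})(1-\tau_{10}) + \sigma(-p^{\backslash F}_{3,m+1})\,\tau_{10}}{\sigma(p^{\backslash F}_{3,m+1})(1-\tau_{10}) + \sigma(-p^{\backslash F}_{3,m+1})\,\tau_{10} + \sigma(p^{\backslash F}_{3,m+1})\,\tau_{01} + \sigma(-p^{\backslash F}_{3,m+1})(1-\tau_{01})}, \tag{3.65}
$$

for $m = 1,2,\ldots,M-1$. Similarly, to update $q^{F}_{3,m}(z_m)$, we write

$$
q^{F}_{3,m}(z_m) = \frac{Q_m(z_m)}{q^{\backslash F}_{3,m}(z_m)} \propto \mathrm{Bern}\left(z_m;\sigma(p^{F}_{3,m})\right), \tag{3.66}
$$

where $\sigma(p^{F}_{3,m})$ is computed from

$$
\sigma(p^{F}_{3,m}) = \sigma(p^{\backslash R}_{3,m-1})(1-\tau_{10}) + \sigma(-p^{\backslash R}_{3,m-1})\,\tau_{01}, \quad m = 2,\ldots,M. \tag{3.67}
$$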

This completes all the posterior updates required for one EP iteration. The complete EP algorithm is summarized in Algorithm 2.

Remark 2. To improve the convergence of the proposed EP algorithm, when the updated variance $\Sigma_{2,m} = \left((\Sigma_{m,m})^{-1} - (\Sigma_{\backslash 2,m})^{-1}\right)^{-1}$ in (3.46) is nonnegative, we follow the approach suggested in [102, 104] for EP algorithms and damp the updates of the factors $\{q_{2,m}(w_m,z_m)\}_{m=1}^{M}$, $\{q^{F}_{3,m}(z_m)\}_{m=2}^{M}$, and $\{q^{R}_{3,m}(z_m)\}_{m=1}^{M-1}$ in every EP iteration. Using a smoothing mechanism, the parameters $\Sigma_{2,m}$, $\mu_{2,m}$, $p_{2,m}$, and $p^{j}_{3,m}$, $j \in \{F,R\}$, are damped according to

$$
\psi^{\mathrm{damp}} = \beta\psi + (1-\beta)\psi^{\mathrm{old}}, \tag{3.68}
$$

Algorithm 2: EP Algorithm
Input: $\mathbf{y}$
Parameters: $\boldsymbol{\xi}$, $\boldsymbol{\theta}$
/* EP run */
for each $n = 1, 2, \ldots, n_{EP}$
    1. Compute the parameters $\mathbf{p}$, $\boldsymbol{\Sigma}$, and $\boldsymbol{\mu}$ of $Q(\mathbf{w},\mathbf{z})$ using (3.23), (3.25), and (3.26), respectively.
    /* Updating factor $q_2(\mathbf{w},\mathbf{z})$: */
    for each $m = 1, 2, \ldots, M$
        1. Find the parameters $\Sigma_{\backslash 2,m}$, $\mu_{\backslash 2,m}$, and $p_{\backslash 2,m}$ of $Q_{\backslash 2,m}(w_m,z_m)$ from (3.29), (3.30), and (3.31), respectively.
        2. Update $Q_m(w_m,z_m)$ by computing $p_m$ from (3.40), $\mu_m$ from (3.41), and $\Sigma_{m,m}$ from (3.43).
        3. Update the factor $q_{2,m}(w_m,z_m)$ by computing $\Sigma_{2,m}$ from (3.46), $\mu_{2,m}$ from (3.47), and $p_{2,m}$ from (3.50).
    end
    /* Updating factor $q_3(\mathbf{z})$: */
    /* Forward pass: */
    for each $m = 1, 2, \ldots, M$
        1. If $m > 1$, update $q^{F}_{3,m}(z_m)$ by computing $p^{F}_{3,m}$ from (3.67).
        2. If $m < M$, update the cavity $q^{\backslash R}_{3,m}(z_m)$ by computing $p^{\backslash R}_{3,m}$ from (3.53).
    end
    /* Reverse pass: */
    for each $m = M, M-1, \ldots, 1$
        1. If $m < M$, update the factor $q^{R}_{3,m}(z_m)$ by computing $p^{R}_{3,m}$ from (3.65).
        2. Update the cavity $q^{\backslash F}_{3,m}(z_m)$ by computing $p^{\backslash F}_{3,m}$ from (3.55).
    end
    /* Check for convergence: keep track of $\boldsymbol{\mu}$ in each ($n$-th) iteration */
    if $\|\boldsymbol{\mu}^{n} - \boldsymbol{\mu}^{n-1}\| / \|\boldsymbol{\mu}^{n-1}\| < \epsilon_{EP}$ then
        break;
    end
end
Output: $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, $\mathbf{p}$

where $\beta \in (0,1)$ is the smoothing factor, $\psi^{\mathrm{old}}$ represents the parameter value in the previous EP iteration, and $\psi$ is the value calculated according to the derivations in Section 3.3. The superscript "damp" denotes the value of the parameter after applying the smoothing mechanism. These damped updates replace the respective undamped ones in the next iteration of EP. Further, to improve the convergence of EP, we use the annealed damping scheme suggested in [104], in which we start the EP algorithm with $\beta = 0.5$ and progressively anneal its value by multiplying it by a constant $\kappa < 1$ after every EP iteration until convergence. Based on empirical evidence, we select $\kappa = 0.945$ for the channel estimation problem considered in this chapter. Note that, as indicated in [104], the quantity $\left((\Sigma_{m,m})^{-1} - (\Sigma_{\backslash 2,m})^{-1}\right)^{-1}$ can also be negative; when this happens, we simply set $\Sigma_{2,m} = 10^2$ and apply the above smoothing mechanism.
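A small sketch of the damping in Remark 2, assuming NumPy and hypothetical helper names: `damp` implements (3.68), `damped_site_variance` applies the $10^2$ fallback before smoothing, and `annealed_betas` lists the smoothing factors $\beta_n = 0.5\,\kappa^{n}$ with $\kappa = 0.945$ for EP iterations $n = 0, 1, \ldots$.

```python
import numpy as np


def damp(new, old, beta):
    """Smoothed update (3.68): beta * new + (1 - beta) * old."""
    return beta * new + (1.0 - beta) * old


def damped_site_variance(var_m, var_c, var2_old, beta, fallback=1e2):
    """Damped Sigma_2,m from (3.46); uses the fallback 10^2 when (3.46) is not positive."""
    prec = 1.0 / var_m - 1.0 / var_c
    raw = 1.0 / prec if prec > 0 else fallback
    return damp(raw, var2_old, beta)


def annealed_betas(n_iter, beta0=0.5, kappa=0.945):
    """Smoothing factors beta_n = beta0 * kappa**n for n = 0, ..., n_iter - 1."""
    return beta0 * kappa ** np.arange(n_iter)
```

For example, the fifth EP iteration ($n = 4$) would use $\beta \approx 0.5 \cdot 0.945^4 \approx 0.399$, so later iterations lean increasingly on the previous site parameters.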

3.4 Expectation Maximization Algorithm

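In this section we evaluate the E-step and M-step of the EM algorithm given in (3.11) and (3.12). Using EM, we aim to iteratively find the ML estimate of the unknown parameters $\boldsymbol{\xi} = (\boldsymbol{\tau},\gamma_1,\gamma_2,\ldots,\gamma_M,\eta,\boldsymbol{\theta}^T)^T$. For the complete data defined in Section 3.2 as $\mathbf{d} = [\mathbf{y}^T,\mathbf{w}^T,\mathbf{z}^T]^T$, and using the EP approximation of the posterior distribution in (3.13), the E-step in (3.11) can be written as

$$
\begin{aligned}
\mathcal{L}(\boldsymbol{\xi};\boldsymbol{\xi}^l) &\approx \mathbb{E}_{Q(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^l)}\left[\ln p(\mathbf{y},\mathbf{w},\mathbf{z}|\boldsymbol{\xi})\right] \\
&= \mathbb{E}_{Q(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^l)}\left[\ln p(\mathbf{y}|\boldsymbol{\Phi}(\boldsymbol{\theta}),\mathbf{w},\eta) + \ln p(\mathbf{w}|\mathbf{z},\boldsymbol{\gamma}) + \ln p(\mathbf{z}|\boldsymbol{\tau})\right].
\end{aligned} \tag{3.69}
$$

Since jointly maximizing (3.69) over $\boldsymbol{\xi}$ is difficult, we instead update $\boldsymbol{\xi}$ one element at a time while keeping the other elements fixed at their current estimates [114]. To estimate $\tau_{01}$ and $\tau_{10}$, since only $p(\mathbf{z}|\boldsymbol{\tau})$ involves these parameters, (3.69) simplifies to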

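$$
\begin{aligned}
\mathcal{L}_1(\boldsymbol{\tau};\boldsymbol{\tau}^l) &= \mathbb{E}_{Q(\mathbf{w},\mathbf{z}|\mathbf{y},\boldsymbol{\xi}^l)}\left[\ln p(\mathbf{z}|\boldsymbol{\tau})\right] \\
&= \sum_{m=2}^{M}\Bigg[\ln(1-\tau_{01}) + \sigma\big(p^{(l+1)}_m\big)\,\sigma\big(p^{(l+1)}_{m-1}\big)\ln\frac{(1-\tau_{10})(1-\tau_{01})}{\tau_{01}\tau_{10}} \\
&\qquad + \sigma\big(p^{(l+1)}_m\big)\ln\frac{\tau_{01}}{1-\tau_{01}} + \sigma\big(p^{(l+1)}_{m-1}\big)\ln\frac{\tau_{10}}{1-\tau_{01}}\Bigg] + \text{const},
\end{aligned} \tag{3.70}
$$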

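where we use the fact that $\mathbb{E}_Q[z_m] = \sigma(p_m)$ and, since $Q(\mathbf{z})$ factorizes as in (3.15), $\mathbb{E}_Q[z_m z_{m-1}] = \sigma(p_m)\sigma(p_{m-1})$. Maximizing $\mathcal{L}_1(\cdot)$ with respect to (w.r.t.) $\boldsymbol{\tau}$, we get the update equations

$$
\tau^{(l+1)}_{01} = \frac{\sum_{m=2}^{M}\sigma\big(p^{(l+1)}_m\big)\left(1-\sigma\big(p^{(l+1)}_{m-1}\big)\right)}{\sum_{m=2}^{M}\left(1-\sigma\big(p^{(l+1)}_{m-1}\big)\right)}, \tag{3.71}
$$

$$
\tau^{(l+1)}_{10} = \frac{\sum_{m=2}^{M}\sigma\big(p^{(l+1)}_{m-1}\big)\left(1-\sigma\big(p^{(l+1)}_m\big)\right)}{\sum_{m=2}^{M}\sigma\big(p^{(l+1)}_{m-1}\big)}. \tag{3.72}
$$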

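Similarly, maximizing $\mathcal{L}(\cdot)$ w.r.t. $\gamma_m$ and $\eta$, we get

$$
\gamma^{(l+1)}_m = \left(\Sigma^{(l+1)}_{m,m} + |\mu^{(l+1)}_m|^2\right)^{-1}, \tag{3.73}
$$

and

$$
\eta^{(l+1)} = \frac{N}{\|\mathbf{y} - \boldsymbol{\Phi}(\boldsymbol{\theta}^l)\boldsymbol{\mu}^{(l+1)}\|^2 + \mathrm{tr}\left\{\boldsymbol{\Phi}(\boldsymbol{\theta}^l)\boldsymbol{\Sigma}^{(l+1)}\boldsymbol{\Phi}^H(\boldsymbol{\theta}^l)\right\}}, \tag{3.74}
$$

where to get (3.73) we use the fact that $\mathbb{E}_Q[|w_m|^2] = \Sigma_{m,m} + |\mu_m|^2$, in which $\mu_m$ and $\Sigma_{m,m}$ are defined in (3.27), and in (3.74) we use the facts that $\mathbb{E}_Q[\mathbf{w}] = \boldsymbol{\mu}$ and $\mathbb{E}_Q[\mathbf{w}\mathbf{w}^H] = \boldsymbol{\Sigma} + \boldsymbol{\mu}\boldsymbol{\mu}^H$. Both $\boldsymbol{\Sigma}$ and $\boldsymbol{\mu}$ are given in (3.25) and (3.26).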
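Likewise, (3.73) and (3.74) can be sketched as follows (NumPy; `update_gamma` and `update_eta` are illustrative names). The denominator of (3.74) is the expectation of $\|\mathbf{y} - \boldsymbol{\Phi}\mathbf{w}\|^2$ under $Q(\mathbf{w})$, i.e., the squared residual at the mean plus the trace term contributed by the posterior covariance.

```python
import numpy as np


def update_gamma(mu, Sigma):
    """(3.73): gamma_m = 1 / (Sigma_mm + |mu_m|^2)."""
    return 1.0 / (np.real(np.diag(Sigma)) + np.abs(mu) ** 2)


def update_eta(y, Phi, mu, Sigma):
    """(3.74): eta = N / (||y - Phi mu||^2 + tr(Phi Sigma Phi^H))."""
    resid = y - Phi @ mu
    expected_sq_err = (np.real(np.vdot(resid, resid))
                       + np.real(np.trace(Phi @ Sigma @ Phi.conj().T)))
    return len(y) / expected_sq_err
```

Ignoring the trace term would overestimate the noise precision whenever the posterior of $\mathbf{w}$ is still uncertain.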

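Finally, to update $\boldsymbol{\theta}$ for dictionary learning and to minimize the modeling error, note that maximizing (3.69) w.r.t. $\boldsymbol{\theta}$ is equivalent to minimizing

$$
\mathcal{L}_2(\boldsymbol{\theta}) = \|\mathbf{y} - \boldsymbol{\Phi}(\boldsymbol{\theta})\boldsymbol{\mu}^{(l+1)}\|^2 + \mathrm{tr}\left\{\boldsymbol{\Phi}(\boldsymbol{\theta})\boldsymbol{\Sigma}^{(l+1)}\boldsymbol{\Phi}^H(\boldsymbol{\theta})\right\}. \tag{3.75}
$$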

As seen from (3.75), a closed-form update equation for $\boldsymbol{\theta}$ cannot be obtained, but numerical methods such as gradient descent (GD) can be used to update $\boldsymbol{\theta}$ in the $l$-th iteration. However, GD typically relies on a backtracking line search [115] to adaptively select the step size, which requires repeated evaluations of the objective function in (3.75). Thus, to reduce the computational complexity, we adopt the following single-step update for $\boldsymbol{\theta}$ with a constant step size, as suggested in [43,97]:

$$
\boldsymbol{\theta}^{(l+1)} = \boldsymbol{\theta}^l - \frac{r_\theta}{100}\,\mathrm{sign}\left\{\nabla_{\boldsymbol{\theta}^l}\mathcal{L}_2(\boldsymbol{\theta}^l)\right\}, \tag{3.76}
$$

where $r_\theta$ is the grid interval and $\mathrm{sign}\{\cdot\}$ represents the signum function, which has negligible computational complexity. The step size $r_\theta/100$ divides the grid interval into 100 equal parts; thus, in the worst case, an angle is moved to its true value within 100 iterations. Further, this step size ensures that the final direction mismatch error is less than 1% of $r_\theta$, which, for sufficiently small $r_\theta$, is too small to have a significant impact on the channel estimation error.
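The constant-step update (3.76) can be illustrated on a toy problem. The sketch assumes NumPy, $d/\lambda_d = 1/2$, a single noiseless path with known gain and $\boldsymbol{\Sigma} = \mathbf{0}$, and a grid interval $r_\theta = \pi/M$; `steering`, `steering_deriv`, `sign_refine`, and `grad_toy` are hypothetical names. Starting about 0.8 grid intervals away from the true AoD, 100 sign steps bring the estimate within $r_\theta/100$ of it, consistent with the discussion above.

```python
import numpy as np


def steering(theta, G, d_over_lambda=0.5):
    """a(theta) of (3.2) for a single angle."""
    g = np.arange(G)
    return np.exp(-2j * np.pi * d_over_lambda * g * np.sin(theta))


def steering_deriv(theta, G, d_over_lambda=0.5):
    """d a(theta) / d theta, obtained by differentiating (3.2)."""
    g = np.arange(G)
    return -2j * np.pi * d_over_lambda * g * np.cos(theta) * steering(theta, G, d_over_lambda)


def sign_refine(theta0, grad_fn, r_theta, n_iter=100):
    """Apply (3.76) repeatedly: theta <- theta - (r_theta / 100) * sign(grad)."""
    theta = theta0
    for _ in range(n_iter):
        theta = theta - (r_theta / 100.0) * np.sign(grad_fn(theta))
    return theta


# Toy setting (assumption): one noiseless path y = mu * a(theta_true) and Sigma = 0,
# so L2(theta) = ||y - mu a(theta)||^2 and dL2/dtheta = -2 Re{mu y^H a'(theta)}.
G, M = 16, 64
mu, theta_true = 1.5 - 0.5j, 0.3
r_theta = np.pi / M
y = mu * steering(theta_true, G)


def grad_toy(theta):
    return -2.0 * np.real(mu * np.vdot(y, steering_deriv(theta, G)))


theta_hat = sign_refine(theta_true + 0.805 * r_theta, grad_toy, r_theta)
print(abs(theta_hat - theta_true) <= r_theta / 100)  # True
```

Once the estimate reaches the optimum it oscillates between two neighbouring points one step apart, which is why the residual mismatch is bounded by the step size rather than driven to zero.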

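The $m$-th element of the gradient $\nabla_{\boldsymbol{\theta}}\mathcal{L}_2(\boldsymbol{\theta})$ is given by

$$
\left[\nabla_{\boldsymbol{\theta}^l}\mathcal{L}_2(\boldsymbol{\theta}^l)\right]_m = \frac{\partial}{\partial\theta^l_m}\mathcal{L}_2(\boldsymbol{\theta}^l) = 2\alpha^{(l+1)}_1\,\Re\left\{\dot{\mathbf{a}}^H(\theta^l_m)\mathbf{X}^H\mathbf{X}\mathbf{a}(\theta^l_m)\right\} + 2\,\Re\left\{\dot{\mathbf{a}}^H(\theta^l_m)\mathbf{X}^H\boldsymbol{\alpha}^{(l+1)}_2\right\}, \tag{3.77}
$$

in which $\alpha^{(l+1)}_1 = |\mu^{(l+1)}_m|^2 + \Sigma^{(l+1)}_{m,m}$, $\boldsymbol{\alpha}^{(l+1)}_2 = \mathbf{X}\sum_{n\neq m}\Sigma^{(l+1)}_{n,m}\mathbf{a}(\theta^l_n) - \mathbf{y}^{(l+1)}_{\backslash m}\big(\mu^{(l+1)}_m\big)^*$, and $\mathbf{y}^{(l+1)}_{\backslash m} = \mathbf{y} - \mathbf{X}\sum_{n\neq m}\mu^{(l+1)}_n\mathbf{a}(\theta^l_n)$. The scalar $\Sigma^{(l+1)}_{n,m} = [\boldsymbol{\Sigma}^{(l+1)}]_{n,m}$, and the vector $\dot{\mathbf{a}}(\theta^l_m) = \partial\mathbf{a}(\theta^l_m)/\partial\theta^l_m$ is computed from (3.2) for $m = 1,2,\ldots,M$.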
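The sketch below (NumPy; illustrative names) evaluates the gradient of (3.75) element-wise through (3.77) and compares it with a central finite-difference approximation. It assumes $d/\lambda_d = 1/2$ and a Hermitian $\boldsymbol{\Sigma}$, which the derivation of (3.77) relies on, and uses `np.vdot`, which conjugates its first argument, to form products such as $\dot{\mathbf{a}}^H\mathbf{X}^H\mathbf{X}\mathbf{a}$.

```python
import numpy as np


def steering_matrix(thetas, G, d_over_lambda=0.5):
    """Columns a(theta_m) of (3.2); returns a G x M matrix."""
    g = np.arange(G)[:, None]
    return np.exp(-2j * np.pi * d_over_lambda * g * np.sin(thetas)[None, :])


def steering_matrix_deriv(thetas, G, d_over_lambda=0.5):
    """Columns d a(theta_m) / d theta_m."""
    g = np.arange(G)[:, None]
    phase_rate = -2j * np.pi * d_over_lambda * g * np.cos(thetas)[None, :]
    return phase_rate * steering_matrix(thetas, G, d_over_lambda)


def objective_L2(thetas, X, y, mu, Sigma):
    """L2(theta) of (3.75)."""
    Phi = X @ steering_matrix(thetas, X.shape[1])
    r = y - Phi @ mu
    return float(np.real(np.vdot(r, r) + np.trace(Phi @ Sigma @ Phi.conj().T)))


def grad_L2(thetas, X, y, mu, Sigma):
    """Gradient of (3.75), one element at a time via (3.77)."""
    G, M = X.shape[1], len(thetas)
    XA = X @ steering_matrix(thetas, G)
    XdA = X @ steering_matrix_deriv(thetas, G)
    grad = np.empty(M)
    for m in range(M):
        others = np.arange(M) != m
        y_not_m = y - XA[:, others] @ mu[others]
        alpha1 = abs(mu[m]) ** 2 + Sigma[m, m].real
        alpha2 = XA[:, others] @ Sigma[others, m] - y_not_m * np.conj(mu[m])
        grad[m] = (2.0 * alpha1 * np.real(np.vdot(XdA[:, m], XA[:, m]))
                   + 2.0 * np.real(np.vdot(XdA[:, m], alpha2)))
    return grad
```

In (3.76) only the sign of each element is used, so this gradient does not need to be scaled or normalized.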

This completes all the sequential updates required to estimate $\boldsymbol{\xi}$ in the $(l+1)$-st iteration. The parameters in $\boldsymbol{\xi}$ are repeatedly updated in the EM iterations until convergence. The overall EM-EP algorithm is summarized in Algorithm 3.

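Algorithm 3: Overall EM-EP Algorithm
Input: $\mathbf{y}$
Parameters: $\boldsymbol{\xi}^{(0)}$, $\boldsymbol{\theta}^{(0)}$, $\mu_{2,m} = 0$, $\Sigma_{2,m} = 10^2$, $p_{2,m} = 0$ for $m = 1,\ldots,M$; $p^{F}_{3,m} = 0$ for $m = 2,\ldots,M$; $p^{R}_{3,m} = 0$ for $m = 1,2,\ldots,M-1$.
/* EM-EP run */
for each $l = 0, 1, 2, \ldots, n_{EM}-1$
    1. Given $\boldsymbol{\xi}^l$ and $\boldsymbol{\theta}^l$, run the EP algorithm described in Algorithm 2 to generate $\boldsymbol{\mu}^{(l+1)}$, $\boldsymbol{\Sigma}^{(l+1)}$, and $\mathbf{p}^{(l+1)}$.
    2. Check for convergence:
       if $\|\boldsymbol{\mu}^{(l+1)} - \boldsymbol{\mu}^{l}\| / \|\boldsymbol{\mu}^{l}\| < \epsilon_{EM}$ then
           break;
       end
    3. Use $\boldsymbol{\mu}^{(l+1)}$, $\boldsymbol{\Sigma}^{(l+1)}$, and $\mathbf{p}^{(l+1)}$ to update $\tau_{01}$, $\tau_{10}$, $\boldsymbol{\gamma}$, $\eta$, and $\boldsymbol{\theta}$ using (3.71), (3.72), (3.73), (3.74), and (3.76), respectively.
end
Output: $\hat{\mathbf{h}} = \mathbf{A}(\boldsymbol{\theta}^{(l+1)})\boldsymbol{\mu}^{(l+1)}$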

3.4.1 Computational Complexity of EM-EP Algorithm

The computational complexity of the proposed EP algorithm per iteration is dominated by (3.25) and (3.26), which can be computed in $O(NM^2)$ operations. This complexity is the same as that of the EP algorithm proposed in [104]. For the EM part of the algorithm, the dominant terms are the update of $\eta$ by (3.74), which takes $O(NM^2)$ operations, and the update of $\boldsymbol{\theta}$ by (3.76), which takes $O(GNM)$ operations. Since $M$ is usually greater than $G$, the complexity of the proposed EM-EP algorithm is $O(NM^2)$ per iteration, which is the same as that of the off-grid SBL algorithm proposed in [43].

3.5 Simulation Results

In this section, we investigate the performance of the proposed EM-EP algorithm for massive MIMO channel estimation. We consider a single cell in which a BS equipped with a ULA of $G$ antennas transmits $N$ pilot symbols to a reference user. The elements of the pilot matrix $\mathbf{X}$ are drawn from a circularly symmetric complex Gaussian distribution with unit variance, and the DL channel $\mathbf{h}$ between the BS and the user is generated using the 3GPP spatial channel model [116] with the urban-micro cell environment. We assume that each channel realization is composed of $L_s$ scatterers with AoDs randomly located in the interval $[-90^\circ, 90^\circ]$, and that each scatterer has $L_p$ paths with AoDs randomly generated and concentrated in an angular spread denoted by $\mathcal{A}$. Unless stated otherwise, the AoDs of all the paths in a channel realization are continuous-valued and thus may not lie on the assumed angular grid. The DL carrier frequency is 2.17 GHz, and the spacing between adjacent antennas in the ULA is set to $d = c/(2f_0)$, where $c$ is the speed of light and $f_0 = 2$ GHz.

In order to compare our algorithm with the EP algorithm proposed in [104], we need to extend that algorithm. In [104], the authors modeled the elements of the support (latent) vector $\mathbf{z}$ with an i.i.d. Bernoulli prior distribution with parameter $p_0$, which, along with the other model parameters, is assumed to be known. To apply their approach to the problem considered here, we need to estimate these parameters. Therefore, we extend the method in [104] with the EM algorithm discussed in Sections 3.2 and 3.4, and refer to the resulting algorithm as EM-EP-B. More specifically, in the $(l+1)$-st iteration of the EM-EP-B algorithm, $p_0$ is updated according to $p^{(l+1)}_0 = \frac{1}{M}\sum_{m=1}^{M}\sigma\big(p^{(l+1)}_m\big)$. Moreover, the other model parameters, i.e., $\eta$, $\gamma_m$, and (to integrate grid refining) $\boldsymbol{\theta}$, are updated using our results in (3.73), (3.74), and (3.76) from Section 3.4.

We also show the performances of SBL [117], Off-grid SBL [43], PC-SBL [98], PC-

VB [97], EM-BG-GAMP [100], TCS [44], S-TCS [90], and SuRe-CSBL [101]. For Off-grid

SBL, PC-SBL, PC-VB, SuRe-CSBL, EM-EP-B, and EM-EP algorithms, the dictionary

A(θ) is initialized to be a (partial) DFT matrix. For the other algorithms, however, A(θ)

is the fixed DFT matrix as required for the derivation of the algorithms and state evolution

analysis.⁷ In all the experiments, we initialized the EM-EP algorithm with $\lambda^{(0)} = 0.3$, $\tau_{01}^{(0)} = 0.1$, $\tau_{10}^{(0)} = \frac{\lambda^{(0)}}{1-\lambda^{(0)}}\,\tau_{01}^{(0)}$, $\eta^{(0)} = \gamma_m^{(0)} = \left(\frac{\|\mathbf{y}\|^2}{(\mathrm{SNR}^{(0)}+1)N}\right)^{-1}$ with $\mathrm{SNR}^{(0)} = 100$, and $\theta_m^{(0)} = \sin^{-1}\left(-1 + \frac{2m}{M}\right)$ for $m = 1, 2, \ldots, M$, as in [100, 101]. The maximum numbers of EM and EP iterations are set to $n_{EP} = n_{EM} = 100$, and the tolerance coefficients are selected to be $\epsilon_{EP} = \epsilon_{EM} = 10^{-4}$.

⁷ For consistency, to initialize EM-EP-B, we set $p_0^{(0)} = \lambda^{(0)}$, whereas the other hyperparameters and the termination condition were set the same as those for EM-EP. To compare our results with TCS and S-TCS, all the hyperparameters were updated using the EM update equations from [100], except for the transition probabilities for S-TCS, which were updated using the posterior means in (3.71) and (3.72).

The channel estimation error is computed using the following normalized mean-squared error (NMSE),

$$\mathrm{NMSE\ (dB)} = 10\log_{10}\frac{\mathbb{E}\big[\|\hat{\mathbf{h}} - \mathbf{h}\|^2\big]}{\mathbb{E}\big[\|\mathbf{h}\|^2\big]}, \qquad (3.78)$$

in which $\hat{\mathbf{h}}$ is the channel estimate.
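In simulations, the expectations in (3.78) are replaced by averages over independent channel realizations. A minimal sketch, assuming each trial provides a true channel vector h and its estimate ĥ as lists of complex numbers (the helper names and the synthetic data are hypothetical), is:

```python
import math
import random


def nmse_db(estimates, truths):
    """Sample version of (3.78): 10*log10( sum ||h_hat - h||^2 / sum ||h||^2 ).

    Each argument is a list of trials; each trial is a list of complex channel
    coefficients. Numerator and denominator are averaged over trials separately,
    matching the ratio of expectations in (3.78).
    """
    err = sum(sum(abs(a - b) ** 2 for a, b in zip(h_hat, h))
              for h_hat, h in zip(estimates, truths))
    power = sum(sum(abs(b) ** 2 for b in h) for h in truths)
    return 10.0 * math.log10(err / power)


def cn(var, rng):
    """One sample of a circularly-symmetric complex Gaussian CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


# Synthetic check: unit-power channels of length G = 128 and estimates whose
# error has variance 0.01 should give an NMSE close to -20 dB.
rng = random.Random(0)
G, trials = 128, 50
truths = [[cn(1.0, rng) for _ in range(G)] for _ in range(trials)]
estimates = [[h + cn(0.01, rng) for h in trial] for trial in truths]
print(f"NMSE = {nmse_db(estimates, truths):.2f} dB")
```

Averaging numerator and denominator separately (rather than averaging per-trial ratios) is the choice that matches (3.78) as written.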

In Fig. 3.4, we investigate the performance of the selected channel estimation algorithms in recovering the sparse vector w with non-uniform burst sparsity¹. We consider a BS with G = 128 antennas transmitting N = 48 pilot symbols to the user with SNR = 10 dB. The physical channel between the BS and the user has Ls = 3 scatterers with Lp = 10 paths per scatterer. The channel estimators assume a fixed uniformly spaced angular grid with $\theta_m = \sin^{-1}\left(-1 + \frac{2m}{M}\right)$ for $m = 1, 2, \ldots, M$ with M = 200, and the physical AoDs corresponding to the three non-zero clusters are assumed to be located on the grid points at m = 81, 82, ..., 90, 100, 101, ..., 109, 122, 123, 124, 128, 129, ..., 134. We make the following observations from Fig. 3.4. Firstly, when the non-zero clusters are closely located, as shown by the dotted lines in Fig. 3.4, algorithms such as PC-VB, which tune each coefficient based on its nearest neighbors, exhibit a performance loss due to the leakage of energy into the bins between adjacent clusters. For instance, observe the energy leakage around −3°, 8°, and 15° in Fig. 3.4(e). Secondly, algorithms such as EM-EP-B and EM-BG-GAMP, which aim to recover the coefficients individually, produce outliers at random positions far away from the true AoDs. This effect, when pronounced as in the case of EM-BG-GAMP, causes a significant performance loss. Thirdly, SuRe-CSBL and S-TCS, which employ a Markov prior on the support vector z, eliminate the outliers but suffer from significant leakage of energy into the bins near the clusters' true AoDs. Finally, our proposed EM-EP algorithm eliminates both the leakage of energy and the occurrence of outliers, and represents the channel much more accurately.
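The contrast between iid and Markov support priors can be illustrated by sampling. The sketch below draws support vectors z from an iid Bernoulli prior and from a stationary two-state Markov prior with the same activity λ. The values λ = 0.3, τ01 = 0.1, and τ10 = λτ01/(1 − λ) mirror the EM-EP initialization; reading τ01 as P(z_m = 0 | z_{m−1} = 1) and τ10 as P(z_m = 1 | z_{m−1} = 0) is an assumption that agrees with the pairwise probabilities in Appendix D. All function names are hypothetical.

```python
import random


def sample_iid(M, lam, rng):
    """Support drawn from an iid Bernoulli(lam) prior."""
    return [1 if rng.random() < lam else 0 for _ in range(M)]


def sample_markov(M, lam, tau01, tau10, rng):
    """Support drawn from a two-state Markov chain started at its stationary marginal."""
    z = [1 if rng.random() < lam else 0]
    for _ in range(M - 1):
        if z[-1] == 1:
            z.append(0 if rng.random() < tau01 else 1)   # leave an active run w.p. tau01
        else:
            z.append(1 if rng.random() < tau10 else 0)   # start an active run w.p. tau10
    return z


def mean_run_length(z):
    """Average length of consecutive runs of ones (i.e. of non-zero clusters)."""
    runs, current = [], 0
    for v in z:
        if v:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0


rng = random.Random(1)
lam, tau01 = 0.3, 0.1
tau10 = lam / (1 - lam) * tau01          # stationarity: lam * tau01 = (1 - lam) * tau10
M = 100_000
z_iid = sample_iid(M, lam, rng)
z_mrk = sample_markov(M, lam, tau01, tau10, rng)
print("active fraction:", sum(z_iid) / M, sum(z_mrk) / M)
print("mean run length:", mean_run_length(z_iid), mean_run_length(z_mrk))
```

Both priors activate roughly 30% of the bins, but the Markov prior groups them into runs of mean length about 1/τ01 = 10, whereas the iid prior gives runs of mean length about 1/(1 − λ) ≈ 1.4, i.e., many isolated active bins that such a prior does nothing to discourage.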

Figure 3.4. Magnitude of the elements in w for four independent trials with G = 128, M = 200, N = 48, Ls = 3, Lp = 10, and SNR = 10 dB, and for (a) EM-EP, (b) SuRe-CSBL, (c) S-TCS, (d) EM-EP-B, (e) PC-VB, (f) EM-BG-GAMP. The dotted lines indicate locations of the true AoDs.

Figure 3.5. Channel estimation error vs. number of pilot symbols N with parameters G = 128, M = 200, SNR = 10 dB, and for (a) Ls = 3, Lp = 10, and A = 10°, (b) Ls = 4, Lp = 10, and A = 10°.

Fig. 3.5 shows the channel estimation error versus the number of pilot symbols N for the selected channel estimation schemes. We consider the massive MIMO channel with

Ls = 3 or 4 scatterers and Lp = 10 paths per scatterer. As before, the AoDs of all the paths are randomly generated continuous-valued parameters with no on-grid assumption, and all the paths per scatterer are concentrated in an angular spread A = 10°. We observe that in both cases shown in Figs. 3.5(a) and 3.5(b), the performance of the algorithms improves with N, and EM-EP significantly outperforms all the other algorithms. The channel has more paths in the case of Ls = 4 in Fig. 3.5(b), and thus larger values of N are required to reach the same level of performance. SBL, Off-grid SBL, EM-BG-GAMP, and EM-EP-B aim to recover the coefficients individually, and hence their performance is degraded by the occurrence of outliers in the angular domain. Whereas SBL and Off-grid SBL assume an iid complex Gaussian prior on w, EM-BG-GAMP, TCS, and EM-EP-B assume an iid Bernoulli-Gaussian (BG) prior, in which the level of sparsity in w is directly adjusted by the weight of the Bernoulli component. This weight determines the fraction of coefficients that are a priori set to zero. Thus, EM-BG-GAMP and TCS perform better than the SBL-based algorithms. On the other hand, EM-EP-B includes the correlations in w by using Σ in its estimate of the posterior distribution and also performs grid refining to learn the dictionary. Therefore, EM-EP-B outperforms both EM-BG-GAMP and TCS. PC-SBL and PC-VB aim to recover each coefficient in w according to its nearest neighbors. PC-SBL uses an SBL-based algorithm and tunes the precision of each coefficient according to the precisions of its immediate neighbors, but it relies on a sub-optimal solution. PC-VB avoids this sub-optimality by linking a support vector with a multinoulli prior to every coefficient and using a variational Bayes (VB) [99] based algorithm. Hence, PC-VB performs better than PC-SBL, but its performance suffers from the leakage of energy when multiple non-zero clusters are closely located. The performance of PC-VB is inferior to that of EM-EP-B. Due to its dependence on the VB method, PC-VB may approximate the true distribution locally around one of its several sub-optimal modes, whereas EM-EP-B employs the EP method, which approximates the true distribution globally over a wider support and thus results in better performance [107]. Finally, in contrast to S-TCS and SuRe-CSBL, EM-EP takes into account the correlation in w, thereby outperforming these two algorithms.

Figure 3.6. Channel estimation error vs. SNR (dB) with parameters G = 128, M = 200, N = 64, and for (a) Ls = 3, Lp = 10, and A = 10°, (b) Ls = 4, Lp = 10, and A = 10°.

Figure 3.7. Channel estimation error vs. angular spread A with parameters G = 128, M = 200, N = 64, and for (a) Ls = 3, Lp = 10, and SNR = 10 dB, (b) Ls = 4, Lp = 10, and SNR = 10 dB.

Fig. 3.6 shows the channel estimation error versus SNR for the selected algorithms. We consider the same scenario as in Fig. 3.5, except that the number of pilot symbols is now fixed to N = 64. We observe that the performance of the algorithms improves with SNR and that the proposed EM-EP algorithm has the best performance of all the schemes. In the case of Ls = 4 scatterers, the channel has more paths and is therefore more likely to contain clusters of unequal size. Therefore, in this case the performance of EM-EP-B, which aims to recover the coefficients individually, deteriorates and is worse than that of SuRe-CSBL. Fig. 3.6 also shows that while the performance of all the other methods reaches a floor at some value of SNR (this is more evident in Fig. 3.6(b)), the proposed EM-EP continues to improve with SNR.

Fig. 3.7 shows the channel estimation error for different values of the angular spread A. We consider the two cases of Ls = 3 and Ls = 4 scatterers as before, with G = 128, M = 200, and N = 64. The SNR is fixed to 10 dB. As A increases, severe non-equal size burst sparsity with isolated paths may arise, and thus, as observed from Fig. 3.7, the performance of the algorithms degrades accordingly. For a fixed A, such non-equal size burst sparsity becomes more intense when the channel has more paths, as in case (b), and hence the channel estimation errors are relatively higher. However, in both cases the EP-based algorithms show significant gains in performance, and the proposed EM-EP algorithm outperforms all the other algorithms. In Fig. 3.7, we also show the performance of EM-EP when no grid refining is performed, i.e., with no optimization over the AoDs θ, denoted in Fig. 3.7 as EM-EP(no-GR). It can be seen that EM-EP(no-GR) performs better than most of the other algorithms owing to its use of the EP method and its accounting for the correlation in w.

Figure 3.8. Channel estimation error vs. grid length M with parameters G = 150, N = 64, SNR = 10 dB, and for (a) Ls = 3, Lp = 10, and A = 10°, (b) Ls = 4, Lp = 10, and A = 10°.

Finally, in Fig. 3.8, we examine the effect of varying the grid length M on the channel estimation performance of the algorithms. Consider the channel with Ls = 3 or 4 scatterers, where the BS has G = 150 antennas, the number of pilot symbols is fixed to N = 64, the angular spread is A = 10°, and the SNR is set to 10 dB. In both cases shown in Fig. 3.8, the performance of the algorithms improves with M, and our proposed EM-EP algorithm outperforms all the other algorithms for large M. The parameter M defines the resolution of the initial angular grid, which is here given by $\Delta\theta^{(0)} = \sin^{-1}(2/M)$. When M is small, the initial grid is coarse, and thus the algorithms tend to converge to local minima, resulting in higher channel estimation error. As M increases, the grid resolution improves, which in turn improves the channel estimation performance of the algorithms. Further, for a fixed number of paths, the level of sparsity increases with M, and thus the non-zero coefficients are more successfully recovered by the algorithms.
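The resolution Δθ(0) = sin−1(2/M) is the grid spacing at broadside: the grid is uniform in sin θ with step 2/M, so the angular spacing is smallest around θ = 0 and widens toward ±90°. A short sketch (helper names hypothetical) that builds the initial grid and checks this:

```python
import math


def angular_grid(M):
    """Initial grid theta_m = asin(-1 + 2m/M), m = 1..M, in radians."""
    return [math.asin(-1.0 + 2.0 * m / M) for m in range(1, M + 1)]


def spacing_range_deg(M):
    """Smallest and largest spacing (degrees) between adjacent grid points."""
    theta = angular_grid(M)
    gaps = [math.degrees(b - a) for a, b in zip(theta, theta[1:])]
    return min(gaps), max(gaps)


for M in (50, 100, 200, 400):
    finest, coarsest = spacing_range_deg(M)
    broadside = math.degrees(math.asin(2.0 / M))
    print(f"M={M:4d}  asin(2/M)={broadside:.3f} deg  "
          f"finest={finest:.3f} deg  coarsest={coarsest:.3f} deg")
```

The finest spacing matches sin−1(2/M), and both the finest and the coarsest spacings shrink as M grows, which is the refinement effect discussed above.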


Chapter 4

Conclusions

In this dissertation, we first proposed a semi-blind expectation propagation (EP) based algorithm for joint channel estimation and symbol detection in the uplink of multi-cell massive MIMO systems with spatially and temporally correlated channels. The EP algorithm approximates the a posteriori distribution of the channel matrix and data symbols with a distribution from an exponential family, which is then used to directly estimate the channel and detect the data symbols. A modified version of the classical Kalman filtering algorithm (referred to as KF-M), which emerges from our EP derivations, is also proposed and used to initialize the EP-based algorithm. The performance of Kalman smoothing performed after KF-M (referred to as KS-M) is also examined. Simulation results show that the performance of the KF-M, KS-M, and EP algorithms improves with the number of base station antennas M and the number of data symbols Td in the transmitted frame. Thus, for a fixed M, using a large Td in the semi-blind approach can mitigate the effects of pilot contamination and time-varying channels in multi-cell massive MIMO systems with a pilot overhead well below 1/9. Moreover, the EP-based algorithm significantly outperforms the KF-M and KS-M algorithms. Finally, our results show that when applied to time-varying channels, these algorithms outperform algorithms developed for block-fading channel models.

Next, we considered the problem of downlink channel estimation in multi-user massive MIMO systems. To capture the clustered sparse nature of the channel, we assumed a conditionally independent and identically distributed Bernoulli-Gaussian prior on the sparse vector representing the channel, and a Markov prior on its support vector. We developed an expectation propagation (EP) based algorithm to approximate the intractable joint distribution on the sparse vector and its support with a distribution from an exponential family. To find the maximum likelihood estimates of the hyperparameters and the angular grid points, we integrated the EP algorithm with the expectation maximization (EM) algorithm. The resulting EM-EP algorithm directly estimates the hyperparameters and the clustered sparse downlink channel. Simulation results show that, owing to the inclusion of the correlations in the sparse vector in the approximate posterior and the use of the EP method, our EM-EP algorithm can recover channels with non-equal size burst sparsity. Further, the proposed EM-EP algorithm outperforms existing algorithms in the literature, including the S-TCS and SuRe-CSBL algorithms, which also use a Markov prior on the support vector.

Appendix A

Inference by Message Passing on Graphical Models

Probabilistic graphical models (PGMs) are used to represent probability distributions diagrammatically, which provides insight into their properties, for instance, the causal relationships between the random variables and the conditional independencies among them. Using PGMs, message-passing algorithms can be developed to compute the posterior distribution efficiently, either exactly or approximately. Further, the structure of the inference algorithms can be easily visualized as the local propagation of messages on the PGMs, which makes the computations both intuitive and transparent. In this appendix, we first briefly discuss three popular PGMs, namely, the Bayesian network, the Markov random field, and the factor graph. Next, we review the sum-product algorithm, which efficiently computes posterior distributions by propagating messages locally on a factor graph. Finally, we derive Kalman filtering as a special instance of the sum-product algorithm, considering its application in massive MIMO channel estimation.

A.1 Probabilistic Graphical Models

Probabilistic graphical models (PGMs) are made up of nodes (vertices) that are connected to each other through links (edges). A node represents a random variable, a group of random variables, or a function of random variables. The links encode the relationships between the nodes and are either directed or undirected. A Bayesian network is an example of a directed graphical model with directed links, whereas both the Markov random field and the factor graph are classified as undirected graphical models with undirected links. These graphical models are briefly explained in the following.

A.1.1 Bayesian Network

Figure A.1. Probabilistic graphical models: (a) a Bayesian network, (b) a Markov random field, and (c) a factor graph.

A Bayesian network (BN) is a directed acyclic graph over a set of random variables. In a BN, each random variable in the set is represented by a circle node, and the nodes are connected through directed links. These directed links encode a causal parent-child relationship between the variables: for instance, if there is a directed link going from node xi to node xj, then xi is called the parent of node xj, and xj is called the child of node xi. Let pa(xi) denote the set of all the parents of node xi. Then the joint distribution of $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ represented by a BN is written as

$$p(\mathbf{x}) = \prod_{i=1}^{N} p\big(x_i \mid \mathrm{pa}(x_i)\big), \qquad (A.1)$$

where, if pa(xi) is an empty set, we write p(xi|pa(xi)) = p(xi). For example, given the Bayesian network of Fig. A.1(a), the joint distribution of $\mathbf{x} = [x_1, x_2, x_3, x_4]^T$ is written as

$$p(\mathbf{x}) = p(x_1)\,p(x_2|x_1)\,p(x_3|x_2)\,p(x_4|x_3). \qquad (A.2)$$
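Equation (A.1) translates directly into code: the joint probability of a configuration is the product of each node's conditional table evaluated at its parents. The sketch below does this for the chain of Fig. A.1(a) with binary variables; the probability tables are hypothetical. Because every factor is a normalized conditional, the joint sums to one without any extra constant.

```python
import itertools

# Hypothetical tables for the chain x1 -> x2 -> x3 -> x4 of Fig. A.1(a); binary variables.
prior = {0: 0.6, 1: 0.4}                       # p(x1), used for the root node
cpt = {(0, 0): 0.9, (0, 1): 0.1,               # cpt[(parent, child)] = p(child | parent)
       (1, 0): 0.2, (1, 1): 0.8}
parents = {1: None, 2: 1, 3: 2, 4: 3}          # pa(x_i); None marks an empty parent set


def bn_joint(x):
    """Eq. (A.1): product over nodes of p(x_i | pa(x_i)); x maps node -> value."""
    prob = 1.0
    for node, parent in parents.items():
        prob *= prior[x[node]] if parent is None else cpt[(x[parent], x[node])]
    return prob


total, p_x4_one = 0.0, 0.0
for values in itertools.product((0, 1), repeat=4):
    x = dict(zip((1, 2, 3, 4), values))
    p = bn_joint(x)
    total += p
    if x[4] == 1:
        p_x4_one += p

print(f"sum of joint = {total:.12f}, p(x4 = 1) = {p_x4_one:.4f}")
```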

A.1.2 Markov Random Field

A Markov random field (MRF) is an undirected graph comprising a set of nodes that are connected together through undirected links. Each node represents either a random variable or a group of random variables, and each link is labeled with a local function that encodes the interaction between the nodes. For an undirected graph to be called an MRF, the underlying joint probability distribution of $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ must satisfy the following Markov property for each xi:

$$p(x_i \mid x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_N) = p\big(x_i \mid n(x_i)\big), \qquad (A.3)$$

where n(xi) denotes the set of all the immediate neighbors of xi. Thus, in an MRF, the state of xi depends only on the states of the random variables in n(xi), and, given these, it is conditionally independent of the other random variables in the graph. As a consequence (by the Hammersley-Clifford theorem, for strictly positive distributions), the underlying joint probability distribution in an MRF factorizes over a set of maximal cliques¹. Hence, in general, the joint probability distribution of $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ in an MRF is written as

$$p(\mathbf{x}) = \frac{1}{Z}\prod_{c\in\mathcal{C}} \psi_c(\mathbf{x}_c), \qquad (A.4)$$

where $Z = \sum_{\mathbf{x}}\prod_{c\in\mathcal{C}} \psi_c(\mathbf{x}_c)$ is a normalization constant, $\mathcal{C}$ is the set of maximal cliques, and $\psi_c(\mathbf{x}_c)$ is a local function of the set of random variables $\mathbf{x}_c$ that form the maximal clique c. For example, given the MRF in Fig. A.1(b), the joint distribution of $\mathbf{x} = [x_1, x_2, x_3, x_4]^T$ is written as

$$p(\mathbf{x}) = \frac{1}{Z}\,\psi_{12}(x_1, x_2)\,\psi_{23}(x_2, x_3)\,\psi_{34}(x_3, x_4), \qquad (A.5)$$

in which the maximal clique size is two. In this case, the MRF is also called a pairwise MRF.
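For the pairwise MRF in (A.5), both the normalization constant Z and the Markov property (A.3) can be checked by brute-force enumeration when the variables are binary. The potentials below are hypothetical positive tables; the check confirms that p(x2 | x1, x3, x4) does not depend on x4, because x4 is not a neighbor of x2.

```python
import itertools

# Hypothetical positive pairwise potentials for the chain MRF of Fig. A.1(b).
psi12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi23 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}
psi34 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}


def unnormalized(x1, x2, x3, x4):
    """Product of clique potentials in (A.5), before dividing by Z."""
    return psi12[(x1, x2)] * psi23[(x2, x3)] * psi34[(x3, x4)]


states = list(itertools.product((0, 1), repeat=4))
Z = sum(unnormalized(*s) for s in states)      # normalization constant of (A.4)/(A.5)
p = {s: unnormalized(*s) / Z for s in states}


def p_x2_given(x1, x3, x4):
    """p(x2 = 1 | x1, x3, x4), computed from the full joint."""
    on, off = p[(x1, 1, x3, x4)], p[(x1, 0, x3, x4)]
    return on / (on + off)


# Markov property (A.3): given its neighbors x1 and x3, x2 does not depend on x4.
for x1, x3 in itertools.product((0, 1), repeat=2):
    print(x1, x3, p_x2_given(x1, x3, 0), p_x2_given(x1, x3, 1))
print("Z =", Z)
```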

¹ A clique is a fully connected subgraph of nodes, whereas a maximal clique is a clique to which no further node could be added such that all the nodes still form a clique.

A.1.3 Factor Graph

A factor graph (FG) is a bipartite graph that is used to represent a factorable probability distribution. In order to represent the factorization explicitly, an FG is made up of two different kinds of nodes, i.e., factor nodes and variable nodes. Each factor node is represented by a square and labeled with a factor function fj, whereas each variable node is represented by a circle and labeled with a random variable xi. Using undirected links, fj is connected only to those xi that are its input arguments. In general, a joint probability distribution on $\mathbf{x} = [x_1, x_2, \ldots, x_N]^T$ that can be represented by an FG factorizes as follows:

$$p(\mathbf{x}) = \frac{1}{Z}\prod_{j} f_j(\mathbf{x}_j), \qquad (A.6)$$

in which $Z = \sum_{\mathbf{x}}\prod_{j} f_j(\mathbf{x}_j)$ is a normalization constant, $\mathbf{x}_j$ is a subset of the random variables in $\mathbf{x}$, and $f_j(\cdot)$ is a factor function that takes $\mathbf{x}_j$ as its input argument. For example, let the joint probability distribution of $\mathbf{x} = [x_1, x_2, x_3, x_4]^T$ factorize as follows:

$$p(\mathbf{x}) = \frac{1}{Z}\,f_{12}(x_1, x_2)\,f_{23}(x_2, x_3)\,f_{34}(x_3, x_4); \qquad (A.7)$$

then it can be represented by the factor graph shown in Fig. A.1(c).

Note that both Bayesian networks and Markov random fields can be converted to factor graphs. For instance, to convert the BN in Fig. A.1(a) to the FG in Fig. A.1(c), we define f12(x1, x2) = p(x1)p(x2|x1), f23(x2, x3) = p(x3|x2), and f34(x3, x4) = p(x4|x3), with the normalization constant Z = 1. Similarly, to convert the MRF in Fig. A.1(b) to the FG in Fig. A.1(c), we define f12(x1, x2) = ψ12(x1, x2), f23(x2, x3) = ψ23(x2, x3), and f34(x3, x4) = ψ34(x3, x4). Since factor graphs represent the factorization of a probability distribution explicitly by using separate factor nodes and variable nodes, we use factor graphs to represent probability distributions in the rest of this work.
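Both conversions can be checked numerically. The sketch below (with hypothetical binary tables) builds the factors f12, f23, f34 from the BN conditionals and from an MRF potential, then computes Z in (A.6) by enumeration: the BN-derived factors give Z = 1, whereas the MRF-derived factors generally do not.

```python
import itertools

# Hypothetical BN tables for Fig. A.1(a): p(x1) and a shared p(x_{i+1} | x_i).
p_x1 = {0: 0.6, 1: 0.4}
p_next = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}  # key: (x_i, x_{i+1})

# Hypothetical MRF potential for Fig. A.1(b), shared by the three cliques.
psi = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

# Factor nodes of Fig. A.1(c) obtained from the BN, as in the text.
bn_factors = (
    lambda a, b: p_x1[a] * p_next[(a, b)],   # f12 = p(x1) p(x2|x1)
    lambda b, c: p_next[(b, c)],             # f23 = p(x3|x2)
    lambda c, d: p_next[(c, d)],             # f34 = p(x4|x3)
)
# Factor nodes obtained from the MRF: f = psi on every clique.
mrf_factors = (
    lambda a, b: psi[(a, b)],
    lambda b, c: psi[(b, c)],
    lambda c, d: psi[(c, d)],
)


def normalizer(factors):
    """Z in (A.6): sum over all states of f12 * f23 * f34."""
    f12, f23, f34 = factors
    return sum(f12(a, b) * f23(b, c) * f34(c, d)
               for a, b, c, d in itertools.product((0, 1), repeat=4))


print("Z from BN factors :", normalizer(bn_factors))
print("Z from MRF factors:", normalizer(mrf_factors))
```

With the potential table used here, the MRF constant equals 1ᵀΨ³1 = 90 for Ψ = [[2, 1], [1, 3]], which is why Z must be kept explicitly in (A.5)-(A.7).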

A.2 Sum-Product Algorithm

The sum-product (SP) algorithm, also known as belief propagation, is an inference algorithm that efficiently computes the marginal distribution of a random variable, or of a subset of random variables, given a factorable joint probability distribution of a larger set of random variables. The marginal distributions are computed by propagating local messages on a graph representing the joint distribution².

² SP algorithms have been developed for all three types of PGMs discussed in section A.1, and they are mathematically equivalent to each other at every iteration [118].

Implementing the SP algorithm on factor graphs requires two kinds of messages: one from a variable node to a factor node, and the other from a factor node to a variable node. The SP algorithm typically starts from the leaf nodes of an FG, and in a cycle-free FG it terminates when two messages have been passed on each link, one in each direction³. To initialize the message passing, if a leaf node is a variable node, then a unit function is sent from it to the neighboring factor node; if a leaf node is a factor node, then its factor function is sent as a message to the neighboring variable node. In general, the messages sent by the SP algorithm between the nodes of an FG are given by [119] as follows.

From variable node x to factor node f:

$$\mu_{x\to f}(x) = \prod_{h\in n(x)\setminus\{f\}} \mu_{h\to x}(x), \qquad (A.8)$$

From factor node f to variable node x:

$$\mu_{f\to x}(x) = \int_{\sim\{x\}} \Big( f(\mathcal{X}) \prod_{y\in n(f)\setminus\{x\}} \mu_{y\to f}(y) \Big), \qquad (A.9)$$

where n(x) is the set of neighbor nodes of x, $\mathcal{X}$ is the set of input arguments of the function f, and $\int_{\sim\{x\}}$ denotes integration (summation, for discrete variables) over all these arguments except x. At the termination step, the marginal distribution at node x is given by

$$g(x) \propto \prod_{h\in n(x)} \mu_{h\to x}(x). \qquad (A.10)$$

³ Here we assume a cycle-free FG; if an FG has cycles, the SP algorithm may not converge, in which case message-passing scheduling can be used to terminate it [119].

Figure A.2. Messages generated in each iteration of the SP algorithm.

As an example, let the joint distribution on $\mathbf{x} = [x_1, x_2, x_3]^T$ factorize as follows:

$$p(\mathbf{x}) = \frac{1}{Z}\,f_a(x_1)\,f_b(x_2)\,f_c(x_1, x_2, x_3); \qquad (A.11)$$

then, as shown in Fig. A.2, the messages generated by the SP algorithm at every iteration are given as follows.

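Iteration 1:

$$\mu_{f_a\to x_1}(x_1) = f_a(x_1), \quad \mu_{f_b\to x_2}(x_2) = f_b(x_2), \quad \mu_{x_3\to f_c}(x_3) = 1, \qquad (A.12)$$

Iteration 2:

$$\begin{aligned}
\mu_{x_1\to f_c}(x_1) &= \mu_{f_a\to x_1}(x_1),\\
\mu_{x_2\to f_c}(x_2) &= \mu_{f_b\to x_2}(x_2),\\
\mu_{f_c\to x_1}(x_1) &= \int_{\sim\{x_1\}} \mu_{x_3\to f_c}(x_3)\,\mu_{x_2\to f_c}(x_2)\,f_c(x_1, x_2, x_3),\\
\mu_{f_c\to x_2}(x_2) &= \int_{\sim\{x_2\}} \mu_{x_3\to f_c}(x_3)\,\mu_{x_1\to f_c}(x_1)\,f_c(x_1, x_2, x_3),
\end{aligned} \qquad (A.13)$$

Iteration 3:

$$\begin{aligned}
\mu_{f_c\to x_3}(x_3) &= \int_{\sim\{x_3\}} \mu_{x_1\to f_c}(x_1)\,\mu_{x_2\to f_c}(x_2)\,f_c(x_1, x_2, x_3),\\
\mu_{x_1\to f_a}(x_1) &= \mu_{f_c\to x_1}(x_1),\\
\mu_{x_2\to f_b}(x_2) &= \mu_{f_c\to x_2}(x_2),
\end{aligned} \qquad (A.14)$$

Termination step:

$$g(x_1) = \mu_{f_a\to x_1}(x_1)\,\mu_{f_c\to x_1}(x_1), \qquad (A.15)$$

$$g(x_2) = \mu_{f_b\to x_2}(x_2)\,\mu_{f_c\to x_2}(x_2), \qquad (A.16)$$

$$g(x_3) = \mu_{f_c\to x_3}(x_3). \qquad (A.17)$$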

Note that in each of the above iterations, the messages can be computed in parallel, since they are independent of each other. Hence, we obtain the marginal distributions in just three iterations. In general, for a cycle-free graph, the time that the SP algorithm takes to compute all the marginal distributions is proportional to the number of unobserved variables in the distribution function. Further, it can be shown by back-substituting the messages that the SP algorithm gives exact inference for a cycle-free FG. For instance, for g(x1) we have

$$\begin{aligned}
g(x_1) &= \mu_{f_a\to x_1}(x_1)\,\mu_{f_c\to x_1}(x_1)\\
&= \mu_{f_a\to x_1}(x_1)\int_{\sim\{x_1\}} \mu_{x_3\to f_c}(x_3)\,\mu_{x_2\to f_c}(x_2)\,f_c(x_1, x_2, x_3)\\
&= f_a(x_1)\int_{\sim\{x_1\}} f_b(x_2)\,f_c(x_1, x_2, x_3),
\end{aligned} \qquad (A.18)$$

where the last expression in (A.18) is the unnormalized exact marginal. The same can be shown for g(x2) and g(x3). Once these marginal functions are obtained using the SP algorithm, they can be normalized to obtain the marginal distributions p(xi).
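The schedule in (A.12)-(A.17) can be traced on a small discrete example. The sketch below uses hypothetical factor tables over binary x1, x2, x3, replaces each integral over ∼{x} by a sum, and compares the normalized SP marginals with marginals obtained by brute-force summation of (A.11); on this cycle-free graph they should agree to rounding error.

```python
import itertools

X = (0, 1)  # every variable is binary in this illustration

# Hypothetical factors for (A.11).
fa = {0: 0.7, 1: 0.3}
fb = {0: 0.4, 1: 0.6}


def fc(x1, x2, x3):
    """Hypothetical factor favoring x3 = x1 XOR x2."""
    return 0.9 if x3 == (x1 ^ x2) else 0.1


# Iteration 1, (A.12): messages out of the leaves.
mu_fa_x1 = dict(fa)
mu_fb_x2 = dict(fb)
mu_x3_fc = {v: 1.0 for v in X}

# Iteration 2, (A.13); sums replace the integrals over ~{x}.
mu_x1_fc = dict(mu_fa_x1)
mu_x2_fc = dict(mu_fb_x2)
mu_fc_x1 = {a: sum(mu_x3_fc[c] * mu_x2_fc[b] * fc(a, b, c) for b in X for c in X) for a in X}
mu_fc_x2 = {b: sum(mu_x3_fc[c] * mu_x1_fc[a] * fc(a, b, c) for a in X for c in X) for b in X}

# Iteration 3, (A.14). The messages back to fa and fb are not needed for the marginals.
mu_fc_x3 = {c: sum(mu_x1_fc[a] * mu_x2_fc[b] * fc(a, b, c) for a in X for b in X) for c in X}


def normalize(g):
    s = sum(g.values())
    return {k: v / s for k, v in g.items()}


# Termination, (A.15)-(A.17), followed by normalization.
g1 = normalize({v: mu_fa_x1[v] * mu_fc_x1[v] for v in X})
g2 = normalize({v: mu_fb_x2[v] * mu_fc_x2[v] for v in X})
g3 = normalize(mu_fc_x3)

# Brute force: marginals of p(x) proportional to fa(x1) fb(x2) fc(x1, x2, x3).
joint = {s: fa[s[0]] * fb[s[1]] * fc(*s) for s in itertools.product(X, repeat=3)}
Zj = sum(joint.values())
exact = [{v: sum(p for s, p in joint.items() if s[i] == v) / Zj for v in X} for i in range(3)]

for i, g in enumerate((g1, g2, g3), start=1):
    print(f"x{i}: SP {g}  brute force {exact[i - 1]}")
```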

The sum-product algorithm also forms the basis of Kalman filtering, where it is used to compute the marginal posterior distributions. Kalman filtering is popular for time-varying channel estimation in massive MIMO systems [28, 72, 120]. In the next section, we derive the Kalman filter as a special instance of the sum-product algorithm.

A.3 Kalman Filtering as Sum-Product Algorithm

In this section, we present a sum-product-based view of Kalman filtering for massive MIMO channel estimation. Consider a single-cell massive MIMO system with K single-antenna users in the cell and a base station (BS) with M antennas. We assume that the massive MIMO system operates in TDD mode and that, to estimate the uplink M × K channel matrix Ht at time t, the users transmit pilot symbols $\mathbf{s}_t \in \mathcal{A}_M^K$. The set $\mathcal{A}_M$ is the alphabet of M-ary modulated symbols with unit average energy. The signal received by the BS at time t is given by

$$\mathbf{y}_t = \mathbf{H}_t\mathbf{s}_t + \mathbf{w}_t, \qquad (A.19)$$

in which the noise term is distributed as $\mathbf{w}_t \sim \mathcal{CN}(\mathbf{w}_t|\mathbf{0}, \mathbf{R}_w)$. By applying the vectorization property⁴, (A.19) can be written as

$$\mathbf{y}_t = \mathbf{S}_t\mathbf{h}_t + \mathbf{w}_t, \qquad (A.20)$$

where $\mathbf{S}_t = \mathbf{s}_t^T \otimes \mathbf{I}_M$ and $\mathbf{h}_t = \mathrm{vec}(\mathbf{H}_t)$. Due to the mobility of the users or other motion in the channel environment, the massive MIMO channel ht may be time-varying, in which case it can be modeled as a first-order autoregressive (AR) process [68, 69]. Thus, the time variation of the channel is given by

$$\mathbf{h}_t = \mathbf{A}\mathbf{h}_{t-1} + \mathbf{v}_t, \qquad (A.21)$$

where A is a diagonal matrix whose diagonal elements are denoted by $[\mathbf{A}]_{n,n} = a_n$ for n = 1, 2, ..., MK. Let the scalar $a_n = J_0(2\pi f^d_n)$, where $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind and $f^d_n$ represents the normalized Doppler shift corresponding to the n-th channel. The innovation process is distributed as $\mathbf{v}_t \sim \mathcal{CN}(\mathbf{v}_t|\mathbf{0}, \mathbf{Q})$, where Q is a diagonal matrix with diagonal elements $[\mathbf{Q}]_{n,n} = 1 - a_n^2$.

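In Kalman filtering, at each time instant t, we are interested in computing the marginal posterior distribution of ht given the received signals up to time t. It is written as

$$p(\mathbf{h}_t|\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_t) = \int_{\sim\mathbf{h}_t} p(\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_t|\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_t), \qquad (A.22)$$

where the joint posterior distribution in (A.22) is given by

$$p(\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_t|\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_t) \propto \prod_{u=1}^{t} p(\mathbf{h}_u|\mathbf{h}_{u-1})\,p(\mathbf{y}_u|\mathbf{s}_u, \mathbf{h}_u), \qquad (A.23)$$

in which the first factor p(h1|h0) is understood as the prior p(h1). Note that from (A.21) we have $p(\mathbf{h}_u|\mathbf{h}_{u-1}) = \mathcal{CN}(\mathbf{h}_u|\mathbf{A}\mathbf{h}_{u-1}, \mathbf{Q})$, and from (A.20) we have $p(\mathbf{y}_u|\mathbf{s}_u, \mathbf{h}_u) = \mathcal{CN}(\mathbf{y}_u|\mathbf{S}_u\mathbf{h}_u, \mathbf{R}_w)$. The mean of the marginal distribution in (A.22) gives the minimum mean squared error (MMSE) estimate of ht.

⁴ vec(ABC) = (Cᵀ ⊗ A)vec(B).

Figure A.3. A piece of the factor graph representing the joint distribution in (A.23).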

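We can use the sum-product (SP) algorithm to compute the marginal posterior distribution in (A.22) at each time instant t. A piece of the factor graph representing the joint posterior distribution in (A.23) is shown in Fig. A.3. In this figure, we use µu|u−1 to represent the message sent from the factor node p(hu|hu−1) to the variable node hu. Since the factor functions involved in (A.23) are complex Gaussian, we assume that the message is $\mu_{u|u-1} = \mathcal{CN}(\mathbf{h}_u|\mathbf{m}_{u|u-1}, \mathbf{V}_{u|u-1})$. The message sent out from the variable node hu is then given by the SP algorithm as

$$\begin{aligned}
\mu_{u|u} &= \mu_{u|u-1}\,p(\mathbf{y}_u|\mathbf{s}_u, \mathbf{h}_u)\\
&= \mathcal{CN}(\mathbf{h}_u|\mathbf{m}_{u|u-1}, \mathbf{V}_{u|u-1})\,\mathcal{CN}(\mathbf{y}_u|\mathbf{S}_u\mathbf{h}_u, \mathbf{R}_w)\\
&\propto \mathcal{CN}(\mathbf{h}_u|\mathbf{m}_{u|u}, \mathbf{V}_{u|u}),
\end{aligned} \qquad (A.24)$$

where

$$\mathbf{m}_{u|u} = \mathbf{m}_{u|u-1} + \mathbf{V}_{u|u-1}\mathbf{S}_u^H\big(\mathbf{S}_u\mathbf{V}_{u|u-1}\mathbf{S}_u^H + \mathbf{R}_w\big)^{-1}\big(\mathbf{y}_u - \mathbf{S}_u\mathbf{m}_{u|u-1}\big), \qquad (A.25)$$

$$\mathbf{V}_{u|u} = \mathbf{V}_{u|u-1} - \mathbf{V}_{u|u-1}\mathbf{S}_u^H\big(\mathbf{S}_u\mathbf{V}_{u|u-1}\mathbf{S}_u^H + \mathbf{R}_w\big)^{-1}\mathbf{S}_u\mathbf{V}_{u|u-1}. \qquad (A.26)$$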

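The vector mu|u gives the MMSE estimate of hu given the received signals up to time instant u. Next, the message sent from the factor node p(hu+1|hu) to the variable node hu+1 is

$$\begin{aligned}
\mu_{u+1|u} &= \int_{\mathbf{h}_u} \mu_{u|u}\,p(\mathbf{h}_{u+1}|\mathbf{h}_u)\\
&= \int_{\mathbf{h}_u} \mathcal{CN}(\mathbf{h}_u|\mathbf{m}_{u|u}, \mathbf{V}_{u|u})\,\mathcal{CN}(\mathbf{h}_{u+1}|\mathbf{A}\mathbf{h}_u, \mathbf{Q})\\
&\propto \mathcal{CN}(\mathbf{h}_{u+1}|\mathbf{m}_{u+1|u}, \mathbf{V}_{u+1|u}),
\end{aligned} \qquad (A.27)$$

in which

$$\mathbf{m}_{u+1|u} = \mathbf{A}\mathbf{m}_{u|u}, \qquad (A.28)$$

$$\mathbf{V}_{u+1|u} = \mathbf{A}\mathbf{V}_{u|u}\mathbf{A}^H + \mathbf{Q}. \qquad (A.29)$$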

The vector mu+1|u gives the MMSE prediction of hu+1 given the received signals up to time instant u. Note that (A.25) and (A.26) form the measurement-update equations, whereas (A.28) and (A.29) form the time-update (prediction) equations of Kalman filtering. Further, since the mean and covariance matrix are sufficient statistics of a complex Gaussian distribution, instead of transmitting the full distributions, the messages on the factor graph can be compressed to the sets $\mu_{u|u} \triangleq (\mathbf{m}_{u|u}, \mathbf{V}_{u|u})$ and $\mu_{u+1|u} \triangleq (\mathbf{m}_{u+1|u}, \mathbf{V}_{u+1|u})$.
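To make the recursion concrete, the sketch below runs (A.25), (A.26), (A.28), and (A.29) in the scalar case M = K = 1, where every matrix reduces to a number. The AR coefficient follows a = J0(2πf_d) with Q = 1 − a², as in the model above; the Doppler value, noise variance, BPSK pilots, and the prior CN(0, 1) on h1 are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def bessel_j0(x, terms=30):
    """J0(x) from its power series sum_k (-1)^k (x/2)^(2k) / (k!)^2 (accurate for small x)."""
    return sum((-1) ** k * (x / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))


def cn(var, rng):
    """One draw from CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def kalman_scalar(ys, pilots, a, q, r, m0=0j, v0=1.0):
    """Scalar (M = K = 1) form of (A.25)-(A.26) and (A.28)-(A.29).

    Model: y_u = s_u h_u + w_u with w_u ~ CN(0, r), and h_u = a h_{u-1} + v_u
    with v_u ~ CN(0, q). (m0, v0) is the assumed prior mean/variance of h_1.
    Returns the filtered pairs (m_{u|u}, V_{u|u}).
    """
    m_pred, v_pred = m0, v0
    out = []
    for y, s in zip(ys, pilots):
        gain = v_pred * s.conjugate() / (abs(s) ** 2 * v_pred + r)  # V S^H (S V S^H + R_w)^-1
        m_post = m_pred + gain * (y - s * m_pred)                   # (A.25)
        v_post = v_pred - (gain * s).real * v_pred                  # (A.26); gain*s is real
        out.append((m_post, v_post))
        m_pred = a * m_post                                         # (A.28)
        v_pred = a * a * v_post + q                                 # (A.29), real scalar a
    return out


# Simulated AR(1) channel; f_d, r, T and BPSK pilots are illustrative assumptions.
rng = random.Random(7)
fd, r, T = 0.01, 0.01, 2000
a = bessel_j0(2.0 * math.pi * fd)
q = 1.0 - a * a                        # [Q]_{n,n} = 1 - a_n^2 keeps E|h_t|^2 = 1
h = cn(1.0, rng)                       # h_1 drawn from the assumed prior CN(0, 1)
channel, pilots, ys = [], [], []
for _ in range(T):
    s = complex(rng.choice((-1.0, 1.0)), 0.0)
    channel.append(h)
    pilots.append(s)
    ys.append(s * h + cn(r, rng))
    h = a * h + cn(q, rng)

filtered = kalman_scalar(ys, pilots, a, q, r)
mse = sum(abs(h_true - m) ** 2 for h_true, (m, _) in zip(channel, filtered)) / T
print(f"a = {a:.6f}, empirical MSE = {mse:.5f}, final V_u|u = {filtered[-1][1]:.5f}")
```

For a slowly varying channel, the filtered MSE settles well below the noise variance r, which is the error of estimating each h_t from its own pilot alone; the variance sequence V_{u|u} does not depend on the data and converges to the steady state of the Riccati recursion.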


Appendix B

Proof of Lemma 1

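For g(ht) defined as

$$g(\mathbf{h}_t) = \sum_{\mathbf{s}_t\in\mathcal{A}_M^K} p(\mathbf{y}_t|\mathbf{s}_t, \mathbf{h}_t)\,p(\mathbf{s}_t), \qquad (B.1)$$

the intermediate pdf is written as

$$q_t(\mathbf{h}_t) = \frac{1}{Z_t}\,g(\mathbf{h}_t)\,\mathcal{CN}\big(\mathbf{h}_t|\mathbf{m}_t^{\setminus O}, \mathbf{V}_t^{\setminus O}\big), \qquad (B.2)$$

where

$$Z_t = \int_{\mathbf{h}_t} g(\mathbf{h}_t)\,\mathcal{CN}\big(\mathbf{h}_t|\mathbf{m}_t^{\setminus O}, \mathbf{V}_t^{\setminus O}\big)\,d\mathbf{h}_t. \qquad (B.3)$$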

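First, we compute the gradient $\nabla_m^H \triangleq \Big(\frac{\partial}{\partial \mathbf{m}_t^{\setminus O}}\log Z_t\Big)^H$ as follows:

$$\begin{aligned}
\nabla_m^H &= \frac{1}{Z_t}\int_{\mathbf{h}_t} g(\mathbf{h}_t)\,\frac{\partial}{\partial (\mathbf{m}_t^{\setminus O})^{*}}\Big[\mathcal{CN}\big(\mathbf{h}_t|\mathbf{m}_t^{\setminus O}, \mathbf{V}_t^{\setminus O}\big)\Big]\,d\mathbf{h}_t\\
&= \frac{1}{Z_t}\int_{\mathbf{h}_t} g(\mathbf{h}_t)\,\mathcal{CN}\big(\mathbf{h}_t|\mathbf{m}_t^{\setminus O}, \mathbf{V}_t^{\setminus O}\big)\big(\mathbf{V}_t^{\setminus O}\big)^{-1}\big(\mathbf{h}_t - \mathbf{m}_t^{\setminus O}\big)\,d\mathbf{h}_t\\
&= \big(\mathbf{V}_t^{\setminus O}\big)^{-1}\big(\mathbb{E}_{q_t}[\mathbf{h}_t] - \mathbf{m}_t^{\setminus O}\big),
\end{aligned} \qquad (B.4)$$

where the last equality follows from (B.2). Setting $\mathbf{m}_t = \mathbb{E}_{q_t}[\mathbf{h}_t]$ and solving the last equality in (B.4) for $\mathbf{m}_t$ results in

$$\mathbf{m}_t = \mathbf{m}_t^{\setminus O} + \mathbf{V}_t^{\setminus O}\,\nabla_m^H. \qquad (B.5)$$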

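Next, to prove (2.31), we first evaluate the gradient $\nabla_V \triangleq \frac{\partial}{\partial \mathbf{V}_t^{\setminus O}}\log Z_t$ in the following. We have

$$\begin{aligned}
d(\log Z_t) &= d\Big[\log\int_{\mathbf{h}_t} g(\mathbf{h}_t)\,\mathcal{CN}\big(\mathbf{h}_t|\mathbf{m}_t^{\setminus O}, \mathbf{V}_t^{\setminus O}\big)\,d\mathbf{h}_t\Big]\\
&= \frac{1}{Z_t}\int_{\mathbf{h}_t} g(\mathbf{h}_t)\,\mathcal{CN}\big(\mathbf{h}_t|\mathbf{m}_t^{\setminus O}, \mathbf{V}_t^{\setminus O}\big)\,d\Big[-\big(\mathbf{h}_t - \mathbf{m}_t^{\setminus O}\big)^H\big(\mathbf{V}_t^{\setminus O}\big)^{-1}\big(\mathbf{h}_t - \mathbf{m}_t^{\setminus O}\big)\Big]\,d\mathbf{h}_t - d\Big(\log\big|\pi\mathbf{V}_t^{\setminus O}\big|\Big)\\
&= \int_{\mathbf{h}_t} q_t(\mathbf{h}_t)\,\mathrm{tr}\Big\{\big(\mathbf{V}_t^{\setminus O}\big)^{-1}\big(\mathbf{h}_t - \mathbf{m}_t^{\setminus O}\big)\big(\mathbf{h}_t - \mathbf{m}_t^{\setminus O}\big)^H\big(\mathbf{V}_t^{\setminus O}\big)^{-1}d\mathbf{V}_t^{\setminus O}\Big\}\,d\mathbf{h}_t - \mathrm{tr}\Big\{\big(\mathbf{V}_t^{\setminus O}\big)^{-1}d\mathbf{V}_t^{\setminus O}\Big\}\\
&= \mathrm{tr}\Big\{\Big[\big(\mathbf{V}_t^{\setminus O}\big)^{-1}\mathbb{E}_{q_t}\Big[\big(\mathbf{h}_t - \mathbf{m}_t^{\setminus O}\big)\big(\mathbf{h}_t - \mathbf{m}_t^{\setminus O}\big)^H\Big]\big(\mathbf{V}_t^{\setminus O}\big)^{-1} - \big(\mathbf{V}_t^{\setminus O}\big)^{-1}\Big]d\mathbf{V}_t^{\setminus O}\Big\},
\end{aligned} \qquad (B.6)$$

and thus

$$\nabla_V = \frac{d(\log Z_t)}{d\mathbf{V}_t^{\setminus O}} = \big(\mathbf{V}_t^{\setminus O}\big)^{-1}\mathbb{E}_{q_t}\Big[\big(\mathbf{h}_t - \mathbf{m}_t^{\setminus O}\big)\big(\mathbf{h}_t - \mathbf{m}_t^{\setminus O}\big)^H\Big]\big(\mathbf{V}_t^{\setminus O}\big)^{-1} - \big(\mathbf{V}_t^{\setminus O}\big)^{-1}. \qquad (B.7)$$

Expanding the $\mathbb{E}_{q_t}[\cdot]$ operator in (B.7) and solving for $\mathbb{E}_{q_t}[\mathbf{h}_t\mathbf{h}_t^H]$, we get

$$\mathbb{E}_{q_t}[\mathbf{h}_t\mathbf{h}_t^H] = \mathbf{V}_t^{\setminus O}\nabla_V\mathbf{V}_t^{\setminus O} + \mathbf{V}_t^{\setminus O} + \mathbb{E}_{q_t}[\mathbf{h}_t]\big(\mathbf{m}_t^{\setminus O}\big)^H + \mathbf{m}_t^{\setminus O}\,\mathbb{E}_{q_t}[\mathbf{h}_t^H] - \mathbf{m}_t^{\setminus O}\big(\mathbf{m}_t^{\setminus O}\big)^H. \qquad (B.8)$$

Now from (2.28), we have $\mathbf{V}_t = \mathbb{E}_{q_t}[\mathbf{h}_t\mathbf{h}_t^H] - \mathbb{E}_{q_t}[\mathbf{h}_t]\,\mathbb{E}_{q_t}[\mathbf{h}_t^H]$. Inserting (B.8) into this expression and using (2.29), we get

$$\mathbf{V}_t = \mathbf{V}_t^{\setminus O} - \mathbf{V}_t^{\setminus O}\big(\nabla_m^H\nabla_m - \nabla_V\big)\mathbf{V}_t^{\setminus O}. \qquad (B.9)$$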
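Lemma 1 can be sanity-checked numerically in the scalar case K = M = 1 with BPSK symbols, where Z_t and the exact moments of q_t(h_t) are available in closed form (q_t is then a two-component Gaussian mixture). The sketch below, whose numerical values and function names are hypothetical, evaluates the gradients in (B.5) and (B.9) by central finite differences, using the Wirtinger derivative ∂/∂m* = ½(∂/∂Re m + j∂/∂Im m) for the complex mean, and compares the results with the exact moments.

```python
import math


def cn_pdf(x, mean, var):
    """Scalar circularly-symmetric complex Gaussian density CN(x; mean, var)."""
    return math.exp(-abs(x - mean) ** 2 / var) / (math.pi * var)


# Hypothetical scalar instance of (B.1): y = s h + w, s in {+1, -1}, w ~ CN(0, r).
y, r = 0.8 - 0.3j, 0.2
p_s = {1.0: 0.5, -1.0: 0.5}               # equiprobable BPSK symbols (assumed)


def log_z(m, v):
    """log Z_t of (B.3) for the cavity CN(h; m, v); h is integrated out per symbol."""
    return math.log(sum(ps * cn_pdf(y, s * m, s * s * v + r) for s, ps in p_s.items()))


def exact_moments(m, v):
    """Mean and variance of q_t(h) in (B.2), a two-component Gaussian mixture."""
    weights = {s: ps * cn_pdf(y, s * m, s * s * v + r) for s, ps in p_s.items()}
    total = sum(weights.values())
    v_post = 1.0 / (1.0 / v + 1.0 / r)                     # |s| = 1 for BPSK
    means = {s: v_post * (m / v + s * y / r) for s in p_s}  # conj(s) = s for real s
    mean = sum(weights[s] * means[s] for s in p_s) / total
    second = sum(weights[s] * (v_post + abs(means[s]) ** 2) for s in p_s) / total
    return mean, second - abs(mean) ** 2


def lemma1_moments(m, v, eps=1e-6):
    """Right-hand sides of (B.5) and (B.9), with gradients by central differences."""
    d_re = (log_z(m + eps, v) - log_z(m - eps, v)) / (2 * eps)
    d_im = (log_z(m + 1j * eps, v) - log_z(m - 1j * eps, v)) / (2 * eps)
    grad_m = 0.5 * (d_re + 1j * d_im)       # Wirtinger derivative d(log Z)/d m*
    grad_v = (log_z(m, v + eps) - log_z(m, v - eps)) / (2 * eps)
    mean = m + v * grad_m                                    # (B.5)
    var = v - v * v * (abs(grad_m) ** 2 - grad_v)            # (B.9), scalar case
    return mean, var


m_cav, v_cav = 0.1 + 0.2j, 0.5
print("exact  :", exact_moments(m_cav, v_cav))
print("lemma 1:", lemma1_moments(m_cav, v_cav))
```

The attraction of Lemma 1 is visible here: `lemma1_moments` only needs log Z_t, not the posterior itself.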


Appendix C

Proof of Lemma 2

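Given the hybrid posterior distribution

$$R_{2,m}(w_m, z_m) = \frac{1}{C_m}\,p(w_m|z_m)\,\mathcal{CN}(w_m;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m})\,\mathrm{Bern}\big(z_m;\sigma(p_{\setminus 2,m})\big), \qquad (C.1)$$

where the normalization constant Cm is written as

$$C_m = \sum_{z_m\in\{0,1\}}\int p(w_m|z_m)\,\mathcal{CN}(w_m;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m})\,\mathrm{Bern}\big(z_m;\sigma(p_{\setminus 2,m})\big)\,dw_m, \qquad (C.2)$$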

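First, we compute $\frac{\partial \ln C_m}{\partial \mu_{\setminus 2,m}^*}$ as follows:

$$\begin{aligned}
\partial \ln C_m &= \frac{1}{C_m}\sum_{z_m\in\{0,1\}}\int p(w_m|z_m)\,\partial\big[\mathcal{CN}(w_m;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m})\big]\,\mathrm{Bern}\big(z_m;\sigma(p_{\setminus 2,m})\big)\,dw_m\\
&= \frac{1}{C_m}\sum_{z_m\in\{0,1\}}\int p(w_m|z_m)\,\mathcal{CN}(w_m;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m})\,\mathrm{Bern}\big(z_m;\sigma(p_{\setminus 2,m})\big)\left[\frac{w_m - \mu_{\setminus 2,m}}{\Sigma_{\setminus 2,m}}\right]dw_m\,\partial\mu_{\setminus 2,m}^*,
\end{aligned} \qquad (C.3)$$

which can be written as

$$\frac{\partial \ln C_m}{\partial \mu_{\setminus 2,m}^*} = \frac{\mathbb{E}_{R_{2,m}}[w_m]}{\Sigma_{\setminus 2,m}} - \frac{\mu_{\setminus 2,m}}{\Sigma_{\setminus 2,m}}. \qquad (C.4)$$

Setting $\mu_m = \mathbb{E}_{R_{2,m}}[w_m]$ in (C.4) and rearranging, we get

$$\mu_m = \mu_{\setminus 2,m} + \Sigma_{\setminus 2,m}\,\frac{\partial \ln C_m}{\partial \mu_{\setminus 2,m}^*}. \qquad (C.5)$$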

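Next, we compute $\frac{\partial \ln C_m}{\partial \Sigma_{\setminus 2,m}}$ by

$$\begin{aligned}
\partial \ln C_m &= \frac{1}{C_m}\sum_{z_m\in\{0,1\}}\int p(w_m|z_m)\,\partial\big[\mathcal{CN}(w_m;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m})\big]\,\mathrm{Bern}\big(z_m;\sigma(p_{\setminus 2,m})\big)\,dw_m\\
&= \frac{1}{C_m}\sum_{z_m\in\{0,1\}}\int p(w_m|z_m)\,\mathcal{CN}(w_m;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m})\,\mathrm{Bern}\big(z_m;\sigma(p_{\setminus 2,m})\big)\left[\frac{|w_m - \mu_{\setminus 2,m}|^2}{(\Sigma_{\setminus 2,m})^2} - \frac{1}{\Sigma_{\setminus 2,m}}\right]dw_m\,\partial\Sigma_{\setminus 2,m},
\end{aligned} \qquad (C.6)$$

where (C.6) can be written as

$$\frac{\partial \ln C_m}{\partial \Sigma_{\setminus 2,m}} = \frac{\mathbb{E}_{R_{2,m}}\big[|w_m - \mu_{\setminus 2,m}|^2\big]}{(\Sigma_{\setminus 2,m})^2} - \frac{1}{\Sigma_{\setminus 2,m}}. \qquad (C.7)$$

Expanding the $\mathbb{E}_{R_{2,m}}[\cdot]$ operator in (C.7), using (C.5), and rearranging gives

$$\mathbb{E}_{R_{2,m}}[|w_m|^2] = \Sigma_{\setminus 2,m} + (\Sigma_{\setminus 2,m})^2\frac{\partial \ln C_m}{\partial \Sigma_{\setminus 2,m}} + |\mu_{\setminus 2,m}|^2 + \Sigma_{\setminus 2,m}\,\mu_{\setminus 2,m}\frac{\partial \ln C_m}{\partial \mu_{\setminus 2,m}} + \Sigma_{\setminus 2,m}\,\mu_{\setminus 2,m}^*\frac{\partial \ln C_m}{\partial \mu_{\setminus 2,m}^*}. \qquad (C.8)$$

Subtracting $|\mathbb{E}_{R_{2,m}}[w_m]|^2$ from both sides of (C.8) and using (3.38) and (C.5), we get

$$\Sigma_{m,m} = \Sigma_{\setminus 2,m} + (\Sigma_{\setminus 2,m})^2\left(\frac{\partial \ln C_m}{\partial \Sigma_{\setminus 2,m}} - \frac{\partial \ln C_m}{\partial \mu_{\setminus 2,m}^*}\,\frac{\partial \ln C_m}{\partial \mu_{\setminus 2,m}}\right). \qquad (C.9)$$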

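Finally, we compute $\frac{\partial \ln C_m}{\partial \sigma(p_{\setminus 2,m})}$ as follows:

$$\begin{aligned}
\partial \ln C_m &= \frac{1}{C_m}\sum_{z_m\in\{0,1\}}\int p(w_m|z_m)\,\mathcal{CN}(w_m;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m})\,\partial\big[\mathrm{Bern}\big(z_m;\sigma(p_{\setminus 2,m})\big)\big]\,dw_m\\
&= \frac{1}{C_m}\sum_{z_m\in\{0,1\}}\int p(w_m|z_m)\,\mathcal{CN}(w_m;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m})\,\mathrm{Bern}\big(z_m;\sigma(p_{\setminus 2,m})\big)\left[\frac{z_m}{\sigma(p_{\setminus 2,m})} - \frac{1 - z_m}{1 - \sigma(p_{\setminus 2,m})}\right]dw_m\,\partial\sigma(p_{\setminus 2,m}),
\end{aligned} \qquad (C.10)$$

which can be written as

$$\frac{\partial \ln C_m}{\partial \sigma(p_{\setminus 2,m})} = \frac{\mathbb{E}_{R_{2,m}}[z_m]}{\sigma(p_{\setminus 2,m})} - \frac{1 - \mathbb{E}_{R_{2,m}}[z_m]}{1 - \sigma(p_{\setminus 2,m})}. \qquad (C.11)$$

Rearranging (C.11) and using (3.39), we get

$$\sigma(p_m) = \sigma(p_{\setminus 2,m}) + \sigma(p_{\setminus 2,m})\big(1 - \sigma(p_{\setminus 2,m})\big)\frac{\partial \ln C_m}{\partial \sigma(p_{\setminus 2,m})}, \qquad (C.12)$$

where, using (3.33), we compute

$$\frac{\partial \ln C_m}{\partial \sigma(p_{\setminus 2,m})} = \frac{1}{C_m}\Big[\mathcal{CN}\big(0;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m} + \gamma_m^{-1}\big) - \mathcal{CN}\big(0;\mu_{\setminus 2,m}, \Sigma_{\setminus 2,m}\big)\Big]. \qquad (C.13)$$

Inserting (C.13) in (C.12) and again using (3.33) gives (3.40).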
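The same finite-difference check applies to Lemma 2. Consistent with (C.13), the sketch below assumes the Bernoulli-Gaussian conditional p(w_m|z_m) = z_m CN(w_m; 0, γ_m^{-1}) + (1 − z_m)δ(w_m), so that C_m, the exact moments of R_{2,m}, and the right-hand sides of (C.5), (C.9), and (C.12)-(C.13) can all be evaluated for a single coefficient. The numerical values and function names are hypothetical.

```python
import math


def cn_pdf(x, mean, var):
    """Scalar complex Gaussian density CN(x; mean, var)."""
    return math.exp(-abs(x - mean) ** 2 / var) / (math.pi * var)


gamma = 4.0                                   # hypothetical prior precision gamma_m
mu0, sig0, pi0 = 0.6 - 0.4j, 0.3, 0.35        # hypothetical cavity mean, variance, sigma(p)


def log_c(mu, sig, pi1):
    """ln C_m of (C.2) under the assumed Bernoulli-Gaussian p(w_m | z_m)."""
    return math.log(pi1 * cn_pdf(0.0, mu, sig + 1.0 / gamma)
                    + (1.0 - pi1) * cn_pdf(0.0, mu, sig))


def exact_moments(mu, sig, pi1):
    """E[w_m], Var[w_m], and E[z_m] under R_{2,m} in (C.1), in closed form."""
    on = pi1 * cn_pdf(0.0, mu, sig + 1.0 / gamma)   # mass of the z_m = 1 branch
    off = (1.0 - pi1) * cn_pdf(0.0, mu, sig)        # mass of the z_m = 0 branch (w_m = 0)
    ez = on / (on + off)
    v1 = 1.0 / (gamma + 1.0 / sig)                  # Var[w_m | z_m = 1]
    m1 = v1 * mu / sig                              # E[w_m | z_m = 1]
    ew = ez * m1
    return ew, ez * (v1 + abs(m1) ** 2) - abs(ew) ** 2, ez


def lemma2_moments(mu, sig, pi1, eps=1e-6):
    """(C.5), (C.9) and (C.12)-(C.13); ln C_m derivatives by central differences."""
    d_re = (log_c(mu + eps, sig, pi1) - log_c(mu - eps, sig, pi1)) / (2 * eps)
    d_im = (log_c(mu + 1j * eps, sig, pi1) - log_c(mu - 1j * eps, sig, pi1)) / (2 * eps)
    g_mu = 0.5 * (d_re + 1j * d_im)                  # d ln C_m / d mu*
    g_sig = (log_c(mu, sig + eps, pi1) - log_c(mu, sig - eps, pi1)) / (2 * eps)
    ew = mu + sig * g_mu                             # (C.5)
    var = sig + sig ** 2 * (g_sig - abs(g_mu) ** 2)  # (C.9); d/dmu = conj(d/dmu*)
    c_m = math.exp(log_c(mu, sig, pi1))
    g_pi = (cn_pdf(0.0, mu, sig + 1.0 / gamma) - cn_pdf(0.0, mu, sig)) / c_m   # (C.13)
    ez = pi1 + pi1 * (1.0 - pi1) * g_pi              # (C.12)
    return ew, var, ez


print("exact  :", exact_moments(mu0, sig0, pi0))
print("lemma 2:", lemma2_moments(mu0, sig0, pi0))
```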


Appendix D

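Deriving the Marginals in (3.57) and (3.58)

Let the joint probability mass function (pmf) on zm−1 and zm be defined as p(zm−1 = i, zm = j) = φij for i, j ∈ {0, 1}. This pmf can be written as

$$p(z_{m-1}, z_m) = [\phi_{11}]^{z_{m-1}z_m}\,[\phi_{01}]^{(1-z_{m-1})z_m}\,[\phi_{10}]^{z_{m-1}(1-z_m)}\,[\phi_{00}]^{(1-z_{m-1})(1-z_m)} \qquad (D.1)$$

$$\propto \exp\{z_{m-1}\ell_1 + z_m\ell_2 + z_{m-1}z_m\ell_3\}, \qquad (D.2)$$

where we define

$$\ell_1 = \ln\frac{\phi_{10}}{\phi_{00}}, \quad \ell_2 = \ln\frac{\phi_{01}}{\phi_{00}}, \quad \ell_3 = \ln\frac{\phi_{00}\phi_{11}}{\phi_{01}\phi_{10}}. \qquad (D.3)$$

Next, we use (D.3) and $\sum_{i,j}\phi_{ij} = 1$ to obtain the solution of this system of equations as

$$\phi_{00} = \frac{1}{1 + \exp\{\ell_1\} + \exp\{\ell_2\} + \exp\{\ell_1 + \ell_2 + \ell_3\}}, \qquad (D.4)$$

$$\phi_{01} = \frac{\exp\{\ell_2\}}{1 + \exp\{\ell_1\} + \exp\{\ell_2\} + \exp\{\ell_1 + \ell_2 + \ell_3\}}, \qquad (D.5)$$

$$\phi_{10} = \frac{\exp\{\ell_1\}}{1 + \exp\{\ell_1\} + \exp\{\ell_2\} + \exp\{\ell_1 + \ell_2 + \ell_3\}}, \qquad (D.6)$$

$$\phi_{11} = \frac{\exp\{\ell_1 + \ell_2 + \ell_3\}}{1 + \exp\{\ell_1\} + \exp\{\ell_2\} + \exp\{\ell_1 + \ell_2 + \ell_3\}}. \qquad (D.7)$$

Now the joint distribution on zm−1 and zm in our case is given in (3.56) as

$$S_{3,m-1,m}(z_{m-1}, z_m) = q^{\setminus R}_{3,m-1}(z_{m-1})\,p(z_m|z_{m-1})\,q^{\setminus F}_{3,m}(z_m). \qquad (D.8)$$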

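Using (3.7), (3.51), and (3.54) in (D.8) and simplifying, we get

$$S_{3,m-1,m}(z_{m-1}, z_m) \propto \exp\left\{z_{m-1}\ln\frac{\sigma\big(p^{\setminus R}_{3,m-1}\big)\,\tau_{01}}{\sigma\big(-p^{\setminus R}_{3,m-1}\big)(1-\tau_{10})} + z_m\ln\frac{\sigma\big(p^{\setminus F}_{3,m}\big)\,\tau_{10}}{\sigma\big(-p^{\setminus F}_{3,m}\big)(1-\tau_{10})} + z_{m-1}z_m\ln\frac{(1-\tau_{01})(1-\tau_{10})}{\tau_{01}\tau_{10}}\right\}. \qquad (D.9)$$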

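Comparing (D.2) and (D.9), we see that

$$\ell_1 = \ln\frac{\sigma\big(p^{\setminus R}_{3,m-1}\big)\,\tau_{01}}{\sigma\big(-p^{\setminus R}_{3,m-1}\big)(1-\tau_{10})}, \qquad (D.10)$$

$$\ell_2 = \ln\frac{\sigma\big(p^{\setminus F}_{3,m}\big)\,\tau_{10}}{\sigma\big(-p^{\setminus F}_{3,m}\big)(1-\tau_{10})}, \qquad (D.11)$$

$$\ell_3 = \ln\frac{(1-\tau_{01})(1-\tau_{10})}{\tau_{01}\tau_{10}}, \qquad (D.12)$$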

and using the above equations in (D.4)-(D.7) we get

φ00 =1Dm

σ(−p\R3,m−1)σ(−p\F

3,m)(1 − τ10), (D.13)

φ01 =1Dm

σ(−p\R3,m−1)σ(p\F

3,m)τ10, (D.14)

φ10 =1Dm

σ(p\R3,m−1)σ(−p\F

3,m)τ01, (D.15)

φ11 =1Dm

σ(p\R3,m−1)σ(p\F

3,m)(1 − τ01), (D.16)

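where the normalization constant $D_m$ is given by

$$D_m = \sigma(-p^{\backslash R}_{3,m-1})\,\sigma(-p^{\backslash F}_{3,m})(1-\tau_{10}) + \sigma(p^{\backslash R}_{3,m-1})\,\sigma(-p^{\backslash F}_{3,m})\,\tau_{01} + \sigma(-p^{\backslash R}_{3,m-1})\,\sigma(p^{\backslash F}_{3,m})\,\tau_{10} + \sigma(p^{\backslash R}_{3,m-1})\,\sigma(p^{\backslash F}_{3,m})(1-\tau_{01}). \tag{D.17}$$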
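The closed form (D.13)-(D.17) simply normalizes the four products of cavity and transition probabilities, so it should agree with (D.4)-(D.7) evaluated at $\ell_1, \ell_2, \ell_3$ from (D.10)-(D.12). The Python sketch below checks this agreement under the assumption that $\sigma$ is the logistic function (so $\sigma(-p) = 1 - \sigma(p)$). The arguments `pR` and `pF` stand for $p^{\backslash R}_{3,m-1}$ and $p^{\backslash F}_{3,m}$, and all function names are hypothetical.

```python
import math

def sigmoid(x):
    # Logistic function; sigmoid(-x) = 1 - sigmoid(x)
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_pmf(pR, pF, tau01, tau10):
    # Numerators of (D.13)-(D.16); u[i][j] corresponds to z_{m-1} = i, z_m = j
    u = [[sigmoid(-pR) * sigmoid(-pF) * (1.0 - tau10), sigmoid(-pR) * sigmoid(pF) * tau10],
         [sigmoid(pR) * sigmoid(-pF) * tau01, sigmoid(pR) * sigmoid(pF) * (1.0 - tau01)]]
    D = sum(u[0]) + sum(u[1])  # normalization constant D_m, (D.17)
    return [[u[i][j] / D for j in (0, 1)] for i in (0, 1)], D

def pmf_via_ell(pR, pF, tau01, tau10):
    # (D.10)-(D.12) substituted into (D.4)-(D.7)
    l1 = math.log(sigmoid(pR) * tau01 / (sigmoid(-pR) * (1.0 - tau10)))
    l2 = math.log(sigmoid(pF) * tau10 / (sigmoid(-pF) * (1.0 - tau10)))
    l3 = math.log((1.0 - tau01) * (1.0 - tau10) / (tau01 * tau10))
    w = [[1.0, math.exp(l2)], [math.exp(l1), math.exp(l1 + l2 + l3)]]
    total = sum(w[0]) + sum(w[1])
    return [[w[i][j] / total for j in (0, 1)] for i in (0, 1)]

if __name__ == "__main__":
    phi, D = pairwise_pmf(0.7, -1.2, 0.1, 0.05)
    print("phi =", phi, "D_m =", D)
    print("via (D.4)-(D.7):", pmf_via_ell(0.7, -1.2, 0.1, 0.05))
```

Computing the four products directly, as in `pairwise_pmf`, avoids the logarithms and exponentials of the $\ell$ route, which is also the form (D.13)-(D.17) takes.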

Now, once the $\phi_{ij}$ are computed from (D.13)-(D.16), the marginal distributions on $z_{m-1}$ and $z_m$ can be found from

$$S_{3,m-1}(z_{m-1}) = [\phi_{10} + \phi_{11}]^{z_{m-1}}\,[\phi_{01} + \phi_{00}]^{1-z_{m-1}}, \tag{D.18}$$

$$S_{3,m}(z_m) = [\phi_{01} + \phi_{11}]^{z_m}\,[\phi_{10} + \phi_{00}]^{1-z_m}, \tag{D.19}$$

where (D.18) and (D.19) are derived from (D.1) by marginalizing over the other variable.
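Equations (D.18) and (D.19) state that the probability of $z_{m-1} = 1$ is the row sum $\phi_{10} + \phi_{11}$ and the probability of $z_m = 1$ is the column sum $\phi_{01} + \phi_{11}$. The short Python sketch below (hypothetical function names; `phi[i][j]` $= \phi_{ij}$ with $i$ indexing $z_{m-1}$) evaluates these closed forms and checks them against explicit summation of (D.1) over the other variable.

```python
def marginals(phi):
    # (D.18)-(D.19): P(z_{m-1} = 1) and P(z_m = 1) from phi[i][j] = p(z_{m-1} = i, z_m = j)
    return phi[1][0] + phi[1][1], phi[0][1] + phi[1][1]

def marginals_by_summation(phi):
    # Explicit marginalization: sum out z_m for the first list, z_{m-1} for the second
    prev = [sum(phi[i][j] for j in (0, 1)) for i in (0, 1)]
    nxt = [sum(phi[i][j] for i in (0, 1)) for j in (0, 1)]
    return prev, nxt

def bernoulli_pmf(z, prob_one):
    # [prob_one]^z [1 - prob_one]^(1 - z), the form used in (D.18) and (D.19)
    return prob_one ** z * (1.0 - prob_one) ** (1 - z)

if __name__ == "__main__":
    phi = [[0.4, 0.1], [0.2, 0.3]]
    p_prev, p_next = marginals(phi)
    print("P(z_{m-1}=1) =", p_prev, "P(z_m=1) =", p_next)
    print("by summation:", marginals_by_summation(phi))
```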


Appendix E

Copyright Information

The following paragraph containing copyright management information is taken from [121].

“This grant to arXiv is a non-exclusive license and is not a grant of exclusive rights or a transfer of the copyright. The authors retain their copyright and may enter into publication agreements or other arrangements, so long as they do not conflict with the ability of arXiv to exercise its rights under the License. arXiv has no obligation to protect or enforce any copyright in the Work, and arXiv has no obligation to respond to any permission requests or other inquiries regarding the copyright in or other uses of the Work.”


References

[1] Business Insider Intelligence, “Mobile data will skyrocket 700% by 2021.” [Online] available at: https://www.businessinsider.com/mobile-data-will-skyrocket-700-by-2021-2017-2, (published on Feb. 9, 2017).

[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications,” IEEE Communications Surveys & Tutorials, vol. 17, no. 4, pp. 2347–2376, 2015.

[3] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, “Internet of Things for Smart Cities,” IEEE Internet of Things Journal, vol. 1, no. 1, pp. 22–32, 2014.

[4] B. M. Lee and H. Yang, “Massive MIMO for Industrial Internet of Things in Cyber-Physical Systems,” IEEE Transactions on Industrial Informatics, vol. 14, no. 6, pp. 2641–2652, 2018.

[5] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, “An Overview of Massive MIMO: Benefits and Challenges,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, pp. 742–758, Oct. 2014.

[6] J. Vieira, S. Malkowsky, K. Nieman, Z. Miers, N. Kundargi, L. Liu, I. Wong, V. Öwall, O. Edfors, and F. Tufvesson, “A flexible 100-antenna testbed for Massive MIMO,” in 2014 IEEE Globecom Workshops (GC Wkshps), pp. 287–293, Dec 2014.

[7] D. Gesbert, M. Shafi, Da-shan Shiu, P. J. Smith, and A. Naguib, “From theory to practice: an overview of MIMO space-time coded wireless systems,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 3, pp. 281–302, 2003.

[8] Lizhong Zheng and D. N. C. Tse, “Diversity and multiplexing: a fundamental tradeoff in multiple-antenna channels,” IEEE Transactions on Information Theory, vol. 49, no. 5, pp. 1073–1096, 2003.

[9] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.

[10] G. Caire and S. Shamai, “On the achievable throughput of a multiantenna Gaussian broadcast channel,” IEEE Transactions on Information Theory, vol. 49, no. 7, pp. 1691–1706, 2003.

[11] S. Vishwanath, N. Jindal, and A. Goldsmith, “Duality, achievable rates, and sum-rate capacity of Gaussian MIMO broadcast channels,” IEEE Transactions on Information Theory, vol. 49, no. 10, pp. 2658–2668, 2003.

[12] S. Belhadj Amor, Y. Steinberg, and M. Wigger, “MIMO MAC-BC Duality With Linear-Feedback Coding Schemes,” IEEE Transactions on Information Theory, vol. 61, no. 11, pp. 5976–5998, 2015.


[13] D. Gesbert, M. Kountouris, R. W. Heath, C. Chae, and T. Salzer, “Shifting the MIMO Paradigm,” IEEE Signal Processing Magazine, vol. 24, no. 5, pp. 36–46, 2007.

[14] T. L. Marzetta, “Noncooperative Cellular Wireless with Unlimited Numbers of Base Station Antennas,” IEEE Transactions on Wireless Communications, vol. 9, no. 11, pp. 3590–3600, 2010.

[15] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and Spectral Efficiency of Very Large Multiuser MIMO Systems,” IEEE Transactions on Communications, vol. 61, no. 4, pp. 1436–1449, 2013.

[16] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling Up MIMO: Opportunities and Challenges with Very Large Arrays,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40–60, 2013.

[17] J. Hoydis, S. ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of Cellular Networks: How Many Antennas Do We Need?,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 160–171, 2013.

[18] T. L. Marzetta, “Massive MIMO: An Introduction,” Bell Labs Technical Journal, vol. 20, pp. 11–22, 2015.

[19] M. Rumney, Looking Towards 4G: LTE - Advanced, pp. 567–600. 2013.

[20] “IEEE Standard for Information technology–Telecommunications and information exchange between systems Local and metropolitan area networks–Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications,” IEEE Std 802.11-2012 (Revision of IEEE Std 802.11-2007), pp. 1–2793, 2012.

[21] K. H. Teo, Z. Tao, and J. Zhang, “The Mobile Broadband WiMAX Standard [Standards in a Nutshell],” IEEE Signal Processing Magazine, vol. 24, no. 5, pp. 144–148, 2007.

[22] E. Björnson, E. G. Larsson, and T. L. Marzetta, “Massive MIMO: ten myths and one critical question,” IEEE Communications Magazine, vol. 54, no. 2, pp. 114–123, 2016.

[23] Proakis, Digital Communications. McGraw Hill, 5th ed., 2007.

[24] N. Jindal and A. Goldsmith, “Dirty paper coding vs. TDMA for MIMO broadcast channels,” in 2004 IEEE International Conference on Communications (IEEE Cat. No.04CH37577), vol. 2, pp. 682–686 Vol.2, 2004.

[25] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What Will 5G Be?,” IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065–1082, 2014.


[26] E. Björnson, E. G. Larsson, and M. Debbah, “Massive MIMO for Maximal Spectral Efficiency: How Many Users and Pilots Should Be Allocated?,” IEEE Transactions on Wireless Communications, vol. 15, no. 2, pp. 1293–1308, 2016.

[27] E. Nayebi and B. D. Rao, “Semi-blind channel estimation for multiuser massive MIMO systems,” IEEE Transactions on Signal Processing, vol. 66, pp. 540–553, Jan 2018.

[28] A. Almamori and S. Mohan, “Estimation of channel state information for massive MIMO based on received data using Kalman filter,” in 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), pp. 665–669, 2018.

[29] A. K. Papazafeiropoulos, “Impact of General Channel Aging Conditions on the Downlink Performance of Massive MIMO,” IEEE Transactions on Vehicular Technology, vol. 66, no. 2, pp. 1428–1442, 2017.

[30] K. T. Truong and R. W. Heath, “Effects of channel aging in massive MIMO systems,” Journal of Communications and Networks, vol. 15, no. 4, pp. 338–351, 2013.

[31] L. Fan, S. Jin, C. Wen, and H. Zhang, “Uplink Achievable Rate for Massive MIMO Systems With Low-Resolution ADC,” IEEE Communications Letters, vol. 19, no. 12, pp. 2186–2189, 2015.

[32] M. A. Teeti, R. Wang, and R. Abdolee, “On the Uplink Achievable Rate for Massive MIMO With 1-Bit ADC and Superimposed Pilots,” IEEE Access, vol. 6, pp. 37627–37643, 2018.

[33] Z. Jiang, A. F. Molisch, G. Caire, and Z. Niu, “Achievable Rates of FDD Massive MIMO Systems With Spatial Channel Correlation,” IEEE Transactions on Wireless Communications, vol. 14, no. 5, pp. 2868–2882, 2015.

[34] Y. O. Basciftci, C. E. Koksal, and A. Ashikhmin, “Physical-Layer Security in TDD Massive MIMO,” IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 7359–7380, 2018.

[35] A. Sheikhi, S. M. Razavizadeh, and I. Lee, “A Comparison of TDD and FDD Massive MIMO Systems Against Smart Jamming,” IEEE Access, vol. 8, pp. 72068–72077, 2020.

[36] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and Spectral Efficiency of Very Large Multiuser MIMO Systems,” IEEE Transactions on Communications, vol. 61, no. 4, pp. 1436–1449, 2013.

[37] T. L. Marzetta, “How Much Training is Required for Multiuser MIMO?,” in 2006 Fortieth Asilomar Conference on Signals, Systems and Computers, pp. 359–363, Oct 2006.


[38] J. Flordelis, F. Rusek, F. Tufvesson, E. G. Larsson, and O. Edfors, “Massive MIMO performance – TDD versus FDD: What Do Measurements Say?,” IEEE Transactions on Wireless Communications, vol. 17, pp. 2247–2261, April 2018.

[39] R. Rogalin, O. Y. Bursalioglu, H. C. Papadopoulos, G. Caire, and A. F. Molisch, “Hardware-impairment compensation for enabling distributed large-scale MIMO,” in 2013 Information Theory and Applications Workshop (ITA), pp. 1–10, 2013.

[40] C. Shepard, H. Yu, N. Anand, E. Li, T. Marzetta, R. Yang, and L. Zhong, “Argos: Practical Many-Antenna Base Stations,” in Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, pp. 53–64, 2012.

[41] J. Saw, “Sprint 5G and the Power of Massive MIMO.” [Online] available at: http://d18rn0p25nwr6d.cloudfront.net/CIK-0000101830/1f67e3cf-ae8f-4e9f-bdd9-fd09924d8b16.pdf.

[42] SIGNALBOOSTER, “What are the cellular frequencies of carriers in USA & CANADA?.” [Online] available at: https://www.signalbooster.com/pages/what-are-the-cellular-frequencies-of-cell-phone-carriers-in-usa-canada.

[43] J. Dai, A. Liu, and V. K. N. Lau, “FDD Massive MIMO Channel Estimation With Arbitrary 2D-Array Geometry,” IEEE Transactions on Signal Processing, vol. 66, pp. 2584–2599, May 2018.

[44] J. Ma, X. Yuan, and L. Ping, “Turbo Compressed Sensing with Partial DFT Sensing Matrix,” IEEE Signal Processing Letters, vol. 22, pp. 158–161, Feb 2015.

[45] Y. Ding and B. D. Rao, “Dictionary Learning-Based Sparse Channel Representation and Estimation for FDD Massive MIMO Systems,” IEEE Transactions on Wireless Communications, vol. 17, pp. 5437–5451, Aug 2018.

[46] E. J. Candes and M. B. Wakin, “An Introduction To Compressive Sampling,” IEEE Signal Processing Magazine, vol. 25, pp. 21–30, March 2008.

[47] S. Ji, Y. Xue, and L. Carin, “Bayesian Compressive Sensing,” IEEE Transactions on Signal Processing, vol. 56, pp. 2346–2356, June 2008.

[48] O. Elijah, C. Y. Leow, T. A. Rahman, S. Nunoo, and S. Z. Iliya, “A comprehensive survey of pilot contamination in massive MIMO-5G system,” IEEE Communications Surveys & Tutorials, vol. 18, pp. 905–923, Secondquarter 2016.

[49] H. Q. Ngo, T. L. Marzetta, and E. G. Larsson, “Analysis of the pilot contamination effect in very large multicell multiuser MIMO systems for physical channel models,” in 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3464–3467, 2011.

[50] J. Jose, A. Ashikhmin, T. L. Marzetta, and S. Vishwanath, “Pilot Contamination and Precoding in Multi-Cell TDD Systems,” IEEE Transactions on Wireless Communications, vol. 10, no. 8, pp. 2640–2651, 2011.


[51] H. Q. Ngo and E. G. Larsson, “EVD-based channel estimation in multicell multiuser MIMO systems with very large antenna arrays,” in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3249–3252, March 2012.

[52] T. P. Minka, “Expectation propagation for approximate bayesian inference,” in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, (San Francisco, CA, USA), pp. 362–369, 2001.

[53] T. P. Minka, A Family of Algorithms for Approximate Bayesian Inference. PhD thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2001.

[54] K. Ghavami and M. Naraghi-Pour, “Blind channel estimation and symbol detection for Multi-Cell Massive MIMO systems by expectation propagation,” IEEE Transactions on Wireless Communications, vol. 17, pp. 943–954, Feb 2018.

[55] J. Zhang, X. Yuan, and Y. A. Zhang, “Blind signal detection in massive MIMO: Exploiting the channel sparsity,” IEEE Transactions on Communications, vol. 66, pp. 700–712, Feb 2018.

[56] E. De Carvalho and D. T. M. Slock, “Cramer-Rao bounds for semi-blind, blind and training sequence based channel estimation,” in First IEEE Signal Processing Workshop on Signal Processing Advances in Wireless Communications, pp. 129–132, 1997.

[57] K. Mawatwal, D. Sen, and R. Roy, “A semi-blind channel estimation algorithm for massive MIMO systems,” IEEE Wireless Communications Letters, vol. 6, pp. 70–73, Feb 2017.

[58] D. Hu, L. He, and X. Wang, “Semi-blind pilot decontamination for massive MIMO systems,” IEEE Transactions on Wireless Communications, vol. 15, pp. 525–536, Jan 2016.

[59] L. Chen and X. Yuan, “Blind multiuser detection in massive MIMO channels with clustered sparsity,” IEEE Wireless Communications Letters, vol. 8, no. 4, pp. 1052–1055, 2019.

[60] A. Mezghani and A. L. Swindlehurst, “Blind estimation of sparse broadband massive MIMO channels with ideal and one-bit ADCs,” IEEE Transactions on Signal Processing, vol. 66, no. 11, pp. 2972–2983, 2018.

[61] H. Liu, X. Yuan, and Y. J. Zhang, “Super-Resolution Blind Channel-and-Signal Estimation for Massive MIMO With One-Dimensional Antenna Array,” IEEE Transactions on Signal Processing, vol. 67, no. 17, pp. 4433–4448, 2019.

[62] S. Liang, X. Wang, and L. Ping, “Semi-Blind Detection in Hybrid Massive MIMO Systems via Low-Rank Matrix Completion,” IEEE Transactions on Wireless Communications, vol. 18, no. 11, pp. 5242–5254, 2019.


[63] B. Srinivas, K. Mawatwal, D. Sen, and S. Chakrabarti, “A semi-blind based channel estimator for pilot contaminated one-bit massive MIMO systems,” in 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), pp. 1–7, Aug 2018.

[64] W. Yan and X. Yuan, “Semi-Blind Channel-and-Signal Estimation for Uplink Massive MIMO With Channel Sparsity,” IEEE Access, vol. 7, pp. 95008–95020, 2019.

[65] L. Li and Z. Wang, “A Novel Spatial Correlation Estimation Technique for MIMO Communication System,” in IEEE Vehicular Technology Conference, pp. 1–5, 2006.

[66] L. Yang and J. Qin, “Performance of STBCs with antenna selection: spatial correlation and keyhole effects,” IEE Proceedings - Communications, vol. 153, no. 1, pp. 15–20, 2006.

[67] E. Björnson, J. Hoydis, and L. Sanguinetti, Massive MIMO Networks: Spectral, Energy, and Hardware Efficiency. 2017.

[68] A. K. Papazafeiropoulos and T. Ratnarajah, “Deterministic equivalent performance analysis of time-varying massive MIMO systems,” IEEE Transactions on Wireless Communications, vol. 14, pp. 5795–5809, Oct 2015.

[69] R. Chopra, C. R. Murthy, H. A. Suraweera, and E. G. Larsson, “Performance analysis of FDD massive MIMO systems under channel aging,” IEEE Transactions on Wireless Communications, vol. 17, pp. 1094–1108, Feb 2018.

[70] W. C. Jakes and D. C. Cox, Microwave Mobile Communications. Wiley-IEEE Press, 1994.

[71] S. Srivastava, A. Mishra, A. Rajoriya, A. K. Jagannatham, and G. Ascheid, “Quasi-Static and Time-Selective Channel Estimation for Block-Sparse Millimeter Wave Hybrid MIMO Systems: Sparse Bayesian Learning (SBL) Based Approaches,” IEEE Transactions on Signal Processing, vol. 67, pp. 1251–1266, March 2019.

[72] S. Kashyap, C. Mollén, E. Björnson, and E. G. Larsson, “Performance analysis of (TDD) massive MIMO with Kalman channel prediction,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3554–3558, March 2017.

[73] A. Forenza, D. J. Love, and R. W. Heath, “Simplified Spatial Correlation Models for Clustered MIMO Channels With Different Array Configurations,” IEEE Transactions on Vehicular Technology, vol. 56, no. 4, pp. 1924–1934, 2007.

[74] Da-Shan Shiu, G. J. Foschini, M. J. Gans, and J. M. Kahn, “Fading correlation and its effect on the capacity of multielement antenna systems,” IEEE Transactions on Communications, vol. 48, no. 3, pp. 502–513, 2000.

[75] K. E. Baddour and N. C. Beaulieu, “Autoregressive modeling for fading channel simulation,” IEEE Transactions on Wireless Communications, vol. 4, pp. 1650–1662, July 2005.


[76] M. K. Tsatsanis, G. B. Giannakis, and G. Zhou, “Estimation and equalization of fading channels with random coefficients,” in 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, vol. 2, pp. 1093–1096, May 1996.

[77] C. Komninakis, C. Fragouli, A. H. Sayed, and R. D. Wesel, “Multi-input multi-output fading channel tracking and equalization using Kalman estimation,” IEEE Transactions on Signal Processing, vol. 50, pp. 1065–1076, May 2002.

[78] Y. Qi and T. P. Minka, “Window-based expectation propagation for adaptive signal detection in flat-fading channels,” IEEE Transactions on Wireless Communications, vol. 6, pp. 348–355, Jan 2007.

[79] K. Ghavami and M. Naraghi-Pour, “MIMO detection with imperfect channel state information using expectation propagation,” IEEE Transactions on Vehicular Technology, vol. 66, pp. 8129–8138, Sep. 2017.

[80] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “The Multicell Multiuser MIMO Uplink with Very Large Antenna Arrays and a Finite-Dimensional Channel,” IEEE Transactions on Communications, vol. 61, no. 6, pp. 2350–2361, 2013.

[81] E. Bjornson, J. Hoydis, and L. Sanguinetti, “Massive MIMO Networks: Spectral, Energy, and Hardware Efficiency,” Foundations and Trends® in Signal Processing, vol. 11, no. 3-4, pp. 154–655, 2017.

[82] T. Kim, “User scheduling and grouping in massive MIMO broadcast channels with heterogeneous users,” Journal of Communications and Networks, vol. 21, no. 4, pp. 385–394, 2019.

[83] J. Shen, J. Zhang, E. Alsusa, and K. B. Letaief, “Compressed CSI Acquisition in FDD Massive MIMO: How Much Training is Needed?,” IEEE Transactions on Wireless Communications, vol. 15, pp. 4145–4156, June 2016.

[84] R. Zhang, H. Zhao, and J. Zhang, “Distributed Compressed Sensing Aided Sparse Channel Estimation in FDD Massive MIMO System,” IEEE Access, vol. 6, pp. 18383–18397, 2018.

[85] W. U. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak, “Compressed Channel Sensing: A New Approach to Estimating Sparse Multipath Channels,” Proceedings of the IEEE, vol. 98, pp. 1058–1076, June 2010.

[86] Y. Zhou, M. Herdin, and A. Sayeed, “Experimental Study of MIMO Channel Statistics and Capacity via Virtual Channel Representation,” tech. rep., University of Wisconsin-Madison, Madison, WI, USA, 2007.

[87] S. K. Sahoo and A. Makur, “Signal Recovery from Random Measurements via Extended Orthogonal Matching Pursuit,” IEEE Transactions on Signal Processing, vol. 63, pp. 2572–2581, May 2015.


[88] M. F. Duarte and Y. C. Eldar, “Structured Compressed Sensing: From Theory to Applications,” IEEE Transactions on Signal Processing, vol. 59, pp. 4053–4085, Sep. 2011.

[89] X. Rao and V. K. N. Lau, “Distributed Compressive CSIT Estimation and Feedback for FDD Multi-User Massive MIMO Systems,” IEEE Transactions on Signal Processing, vol. 62, pp. 3261–3271, June 2014.

[90] L. Chen, A. Liu, and X. Yuan, “Structured Turbo Compressed Sensing for Massive MIMO Channel Estimation Using a Markov Prior,” IEEE Transactions on Vehicular Technology, vol. 67, pp. 4635–4639, May 2018.

[91] Z. Gao, L. Dai, Z. Wang, and S. Chen, “Spatially Common Sparsity Based Adaptive Channel Estimation and Feedback for FDD Massive MIMO,” IEEE Transactions on Signal Processing, vol. 63, pp. 6169–6183, Dec 2015.

[92] W. Wang, Y. Xiu, B. Li, and Z. Zhang, “FDD Downlink Channel Estimation Solution With Common Sparsity Learning Algorithm and Zero-Partition Enhanced GAMP Algorithm,” IEEE Access, vol. 6, pp. 11123–11145, 2018.

[93] M. E. Tipping, “Sparse Bayesian Learning and the Relevance Vector Machine,” J. Mach. Learn. Res., vol. 1, pp. 211–244, Sept. 2001.

[94] Y. Ding and B. D. Rao, “Compressed Downlink Channel Estimation Based on Dictionary Learning in FDD Massive MIMO Systems,” in 2015 IEEE Global Communications Conference (GLOBECOM), pp. 1–6, Dec 2015.

[95] A. F. Molisch, A. Kuchar, J. Laurila, K. Hugl, and R. Schmalenberger, “Geometry-based directional model for mobile radio channels – principles and implementation,” European Transactions on Telecommunications, vol. 14, no. 4, pp. 351–359, 2003.

[96] A. Liu, V. K. N. Lau, and W. Dai, “Exploiting Burst-Sparsity in Massive MIMO With Partial Channel Support Information,” IEEE Transactions on Wireless Communications, vol. 15, pp. 7820–7830, Nov 2016.

[97] J. Dai, A. Liu, and H. C. So, “Non-Uniform Burst-Sparsity Learning for Massive MIMO Channel Estimation,” IEEE Transactions on Signal Processing, vol. 67, pp. 1075–1087, Feb 2019.

[98] J. Fang, Y. Shen, H. Li, and P. Wang, “Pattern-Coupled Sparse Bayesian Learning for Recovery of Block-Sparse Signals,” IEEE Transactions on Signal Processing, vol. 63, pp. 360–372, Jan 2015.

[99] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, “The variational approximation for Bayesian inference,” IEEE Signal Processing Magazine, vol. 25, pp. 131–146, November 2008.


[100] J. Vila and P. Schniter, “Expectation-maximization Bernoulli-Gaussian approximate message passing,” in 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pp. 799–803, 2011.

[101] Z. He, X. Yuan, and L. Chen, “Super-Resolution Channel Estimation for Massive MIMO via Clustered Sparse Bayesian Learning,” IEEE Transactions on Vehicular Technology, vol. 68, pp. 6156–6160, June 2019.

[102] M. Seeger, “Expectation propagation for exponential families,” tech. rep., University of California at Berkeley, 485 Soda Hall, Berkeley CA, USA, 2005.

[103] A. Braunstein, A. P. Muntoni, A. Pagnani, and M. Pieropan, “Compressed sensing reconstruction using Expectation Propagation,” Journal of Physics A: Mathematical and Theoretical, 2019.

[104] J. M. Hernandez-Lobato, D. Hernandez-Lobato, and A. Suarez, “Expectation propagation in linear regression models with spike-and-slab priors,” Machine Learning, vol. 99, pp. 437–487, Jun 2015.

[105] D. Hernandez-Lobato, J. M. Hernandez-Lobato, and P. Dupont, “Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation,” Journal of Machine Learning Research, vol. 14, pp. 1891–1945, 2013.

[106] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1–38, 1977.

[107] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag, 2006.

[108] D. G. Tzikas, A. C. Likas, and N. P. Galatsanos, “The variational approximation for Bayesian inference,” IEEE Signal Processing Magazine, vol. 25, pp. 131–146, November 2008.

[109] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.

[110] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. New York, NY, USA: Cambridge University Press, 2005.

[111] Y. Ding and B. D. Rao, “Dictionary Learning-Based Sparse Channel Representation and Estimation for FDD Massive MIMO Systems,” IEEE Transactions on Wireless Communications, vol. 17, pp. 5437–5451, Aug 2018.

[112] J. Shen, J. Zhang, E. Alsusa, and K. B. Letaief, “Compressed CSI Acquisition in FDD Massive MIMO: How Much Training is Needed?,” IEEE Transactions on Wireless Communications, vol. 15, pp. 4145–4156, June 2016.

[113] Hyun-Chul Kim and Z. Ghahramani, “Bayesian Gaussian Process Classification with the EM-EP Algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, pp. 1948–1959, Dec 2006.


[114] R. M. Neal and G. E. Hinton, “A view of the EM algorithm that justifies incremental, sparse, and other variants,” in Learning in Graphical Models (M. I. Jordan, ed.), Dordrecht: Springer Netherlands, 1998.

[115] J. Nocedal and S. J. Wright, Numerical Optimization. New York, NY, USA: Springer, second ed., 2006.

[116] 3GPP, “Universal mobile telecommunications system (UMTS); Spatial channel model for multiple input multiple output (MIMO) simulations,” 3GPP TR 25.996 version 11.0.0 Release 11, 2012.

[117] D. P. Wipf and B. D. Rao, “Sparse Bayesian learning for basis selection,” IEEE Transactions on Signal Processing, vol. 52, pp. 2153–2164, Aug 2004.

[118] J. Yedidia, W. Freeman, and Y. Weiss, “Understanding Belief Propagation and Its Generalizations,” in Exploring Artificial Intelligence in the New Millennium (G. Lakemeyer and B. Nebel, eds.), ch. 8, pp. 239–236, Morgan Kaufmann Publishers, Jan. 2003.

[119] F. R. Kschischang, B. J. Frey, and H. A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Transactions on Information Theory, vol. 47, pp. 498–519, Feb 2001.

[120] S. Srivastava, A. Mishra, A. Rajoriya, A. K. Jagannatham, and G. Ascheid, “Quasi-Static and Time-Selective Channel Estimation for Block-Sparse Millimeter Wave Hybrid MIMO Systems: Sparse Bayesian Learning (SBL) Based Approaches,” IEEE Transactions on Signal Processing, vol. 67, no. 5, pp. 1251–1266, 2019.

[121] arXiv, “arXiv Submittal Agreement Terms and Conditions.” [Online] available at: https://arxiv.org/help/policies/submission agreement.


Vita

Mohammed Rashid received the B.Sc. degree (awarded University Gold Medal for distinction) from the NWFP University of Engineering and Technology, Peshawar, Pakistan, in 2012, and the M.Sc. degree from Louisiana State University, Baton Rouge, LA, USA, in 2018, both in electrical engineering. He is currently working toward the Ph.D. degree at the Division of Electrical and Computer Engineering, the School of Electrical Engineering and Computer Science, Louisiana State University, Baton Rouge, LA, USA. He is a Graduate Research and Teaching Assistant with Louisiana State University. His research interests include Bayesian machine learning algorithms, Bayesian Compressive Sensing, algorithms for probabilistic inference on graphical models, and statistical information processing in systems.
