7/25/2019 Tutorial T4
1/130
Theory of Large Dimensional Random Matrices for Engineers (Part I)

Antonia M. Tulino, Università degli Studi di Napoli "Federico II"

The 9th International Symposium on Spread Spectrum Techniques and Applications,
Manaus, Amazon, Brazil, August 28-31, 2006
Outline

- A brief historical tour of the main results in random matrix theory.
- Overview of some of the main transforms.
- Fundamental limits of wireless communications: basic channels.
- For these basic channels, we analyze some performance measures of engineering interest (Signal Processing / Information Theory).
- Stieltjes transform, and its role in understanding the eigenvalues of random matrices (Part II)
- Limit theorems for three classes of random matrices (Part II)
- Proof of one of the theorems (Part II)
Introduction

Today random matrices find applications in fields as diverse as the Riemann hypothesis, stochastic differential equations, statistical physics, chaotic systems, numerical linear algebra, neural networks, etc.

Random matrices are also finding an increasing number of applications in the context of information theory and signal processing.
Random Matrices & Information Theory

The applications in information theory include, among others:
- Wireless communications channels
- Learning and neural networks
- Capacity of ad hoc networks
- Speed of convergence of iterative algorithms for multiuser detection
- Direction of arrival estimation in sensor arrays

Earliest applications to wireless communication: the works of Foschini and Telatar, in the mid-90s, on characterizing the capacity of multi-antenna channels.

A. M. Tulino and S. Verdú, "Random Matrices and Wireless Communications," Foundations and Trends in Communications and Information Theory, vol. 1, no. 1, June 2004.
Wireless Channels

$\mathbf y = \mathbf H\mathbf x + \mathbf n$

- $\mathbf x$ = $K$-dimensional complex-valued input vector
- $\mathbf y$ = $N$-dimensional complex-valued output vector
- $\mathbf n$ = $N$-dimensional additive Gaussian noise
- $\mathbf H$ = $N \times K$ random channel matrix, known to the receiver

This model applies to a variety of communication problems by simply reinterpreting $K$, $N$, and $\mathbf H$:
- Fading
- Wideband
- Multiuser
- Multiantenna
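As a dimension sanity check, here is a minimal NumPy sketch of one realization of $\mathbf y = \mathbf H\mathbf x + \mathbf n$ (the sizes $K = 4$, $N = 8$ and the complex Gaussian entries are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 4, 8  # input and output dimensions (hypothetical sizes)

# K-dimensional complex input, N x K channel known to the receiver,
# N-dimensional complex additive Gaussian noise
x = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
n = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

y = H @ x + n  # N-dimensional complex output
```

Reinterpreting $K$, $N$, and $\mathbf H$ (fading, wideband, multiuser, multiantenna) changes only how these arrays are built, not the model.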
Multi-Antenna Channels

$\mathbf y = \mathbf H\mathbf x + \mathbf n$

- $K$ and $N$ = number of transmit and receive antennas
- $\mathbf H$ = propagation matrix: an $N \times K$ complex matrix whose entries represent the gains between each transmit and each receive antenna.

[Prototype picture courtesy of Bell Labs (Lucent Technologies)]
CDMA (Code-Division Multiple Access) Channel

Signal space with $N$ dimensions. $N$ = spreading gain, proportional to bandwidth.
Each user is assigned a signature vector known at the receiver.

[Diagram: the signals of users 1, ..., K pass through their respective channels to the receiver.]

- DS-CDMA (Direct-Sequence CDMA) is used in many current cellular systems (IS-95, cdma2000, UMTS).
- MC-CDMA (Multi-Carrier CDMA) is being considered for 4G (Fourth Generation) wireless.
DS-CDMA Flat-Faded Channel

[Diagram: user $k$ transmits $A_{kk}\,\mathbf s_k$; the receiver front end observes $\mathbf y = \mathbf H\mathbf x + \mathbf n$.]

$\mathbf y = \mathbf H\mathbf x + \mathbf n = \mathbf S\mathbf A\mathbf x + \mathbf n, \qquad \mathbf H = \mathbf S\mathbf A$

- $K$ = number of users; $N$ = processing gain.
- $\mathbf S = [\mathbf s_1\,|\,\dots\,|\,\mathbf s_K]$, with $\mathbf s_k$ the signature vector of the $k$th user.
- $\mathbf A$ is a $K \times K$ diagonal matrix containing the independent complex fading coefficients for each user.
Multi-Carrier CDMA (MC-CDMA)

[Diagram: the user signals are spread across the subcarriers and collected at the receiver front end.]

$\mathbf y = \mathbf H\mathbf x + \mathbf n = (\mathbf G \circ \mathbf S)\mathbf x + \mathbf n$

- $K$ and $N$ represent the number of users and of subcarriers.
- $\mathbf H$ incorporates both the spreading and the frequency-selective fading, i.e.

$h_{nk} = g_{nk}\,s_{nk}, \qquad n = 1, \dots, N, \quad k = 1, \dots, K$

- $\mathbf S = [\mathbf s_1\,|\,\dots\,|\,\mathbf s_K]$, with $\mathbf s_k$ the signature vector of the $k$th user.
- $\mathbf G = [\mathbf g_1\,|\,\dots\,|\,\mathbf g_K]$ is an $N \times K$ matrix whose columns are independent $N$-dimensional random vectors.
Role of Singular Values in Wireless Communication
Empirical (Asymptotic) Spectral Distribution

Definition: The ESD (Empirical Spectral Distribution) of an $N \times N$ Hermitian random matrix $\mathbf A$ is

$F^N_{\mathbf A}(x) = \frac{1}{N}\sum_{i=1}^{N}\mathbf 1\{\lambda_i(\mathbf A) \le x\}$

where $\lambda_1(\mathbf A), \dots, \lambda_N(\mathbf A)$ are the eigenvalues of $\mathbf A$.

If, as $N \to \infty$, $F^N_{\mathbf A}(\cdot)$ converges almost surely (a.s.), the corresponding limit (asymptotic ESD) is simply denoted by $F_{\mathbf A}(\cdot)$. $\bar F^N_{\mathbf A}(\cdot)$ denotes the expected ESD.
Role of Singular Values: Mutual Information

$I(\mathrm{SNR}) = \frac{1}{N}\log\det\!\big(\mathbf I + \mathrm{SNR}\,\mathbf H\mathbf H^\dagger\big) = \frac{1}{N}\sum_{i=1}^{N}\log\!\big(1+\mathrm{SNR}\,\lambda_i(\mathbf H\mathbf H^\dagger)\big) = \int_0^\infty \log(1+\mathrm{SNR}\,x)\, dF^N_{\mathbf H\mathbf H^\dagger}(x)$

with $F^N_{\mathbf H\mathbf H^\dagger}(x)$ the ESD of $\mathbf H\mathbf H^\dagger$ and with

$\mathrm{SNR} = \frac{N\,E[\|\mathbf x\|^2]}{K\,E[\|\mathbf n\|^2]}$

the signal-to-noise ratio, a key performance measure.
Role of Singular Values: Ergodic Mutual Information

In an ergodic time-varying channel,

$E[I(\mathrm{SNR})] = \frac{1}{N}\,E\!\left[\log\det\!\big(\mathbf I + \mathrm{SNR}\,\mathbf H\mathbf H^\dagger\big)\right] = \int_0^\infty \log(1+\mathrm{SNR}\,x)\, d\bar F^N_{\mathbf H\mathbf H^\dagger}(x)$

where $\bar F^N_{\mathbf H\mathbf H^\dagger}(\cdot)$ denotes the expected ESD.
High-SNR Power Offset

For $\mathrm{SNR} \to \infty$, a regime of interest in short-range applications, the mutual information behaves as

$I(\mathrm{SNR}) = \mathcal S_\infty\big(\log \mathrm{SNR} - \mathcal L_\infty\big) + o(1)$

where the key measures are the high-SNR slope

$\mathcal S_\infty = \lim_{\mathrm{SNR}\to\infty}\frac{I(\mathrm{SNR})}{\log \mathrm{SNR}}$

which for most channels gives $\mathcal S_\infty = \min\{K/N,\,1\}$, and the power offset

$\mathcal L_\infty = \lim_{\mathrm{SNR}\to\infty}\left(\log\mathrm{SNR} - \frac{I(\mathrm{SNR})}{\mathcal S_\infty}\right)$

which essentially boils down to $\log\det(\mathbf H\mathbf H^\dagger)$ or $\log\det(\mathbf H^\dagger\mathbf H)$ depending on whether $K > N$ or $K < N$.
Role of Singular Values: MMSE

The minimum mean-square error (MMSE) incurred in the estimation of the input $\mathbf x$ based on the noisy observation at the channel output $\mathbf y$:

$\mathrm{MMSE} = \frac{1}{K}\,E\big[\|\mathbf x - \hat{\mathbf x}\|^2\big] = \frac{1}{K}\sum_{k=1}^{K} E\big[|x_k - \hat x_k|^2\big] = \frac{1}{K}\sum_{k=1}^{K}\mathrm{MMSE}_k$

where $\hat{\mathbf x}$ is the estimate of $\mathbf x$. For an i.i.d. Gaussian input,

$\mathrm{MMSE} = \frac{1}{K}\,\mathrm{tr}\Big\{\big(\mathbf I + \mathrm{SNR}\,\mathbf H^\dagger\mathbf H\big)^{-1}\Big\} = \frac{1}{K}\sum_{i=1}^{K}\frac{1}{1+\mathrm{SNR}\,\lambda_i(\mathbf H^\dagger\mathbf H)} = \int_0^\infty \frac{1}{1+\mathrm{SNR}\,x}\, dF^K_{\mathbf H^\dagger\mathbf H}(x) = \frac{N}{K}\int_0^\infty \frac{1}{1+\mathrm{SNR}\,x}\, dF^N_{\mathbf H\mathbf H^\dagger}(x) \;-\; \frac{N-K}{K}$
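The chain of equalities can be checked numerically; the sketch below (arbitrary sizes, real Gaussian $\mathbf H$ for simplicity) compares the $K \times K$ trace formula with the $N \times N$ ESD form including the $(N-K)/K$ correction for the zero eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, snr = 60, 40, 2.0
H = rng.standard_normal((N, K)) / np.sqrt(N)

# MMSE via the K x K trace formula
mmse_tr = np.trace(np.linalg.inv(np.eye(K) + snr * H.T @ H)) / K

# Same quantity through the N x N ESD plus the rank-deficiency correction:
# the N - K zero eigenvalues of H H^T each contribute 1/(1+0) = 1
lamN = np.linalg.eigvalsh(H @ H.T)
mmse_esd = (N / K) * np.mean(1.0 / (1.0 + snr * lamN)) - (N - K) / K
```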
In the Beginning ...
The Birth of (Nonasymptotic) Random Matrix Theory: (Wishart, 1928)

J. Wishart, "The generalized product moment distribution in samples from a normal multivariate population," Biometrika, vol. 20A, pp. 32-52, 1928.

Probability density function of the Wishart matrix:

$\mathbf H\mathbf H^\dagger = \mathbf h_1\mathbf h_1^\dagger + \dots + \mathbf h_n\mathbf h_n^\dagger$

where the $\mathbf h_i$ are i.i.d. zero-mean Gaussian vectors.
Wishart Matrices

Definition 1. The $m \times m$ random matrix $\mathbf A = \mathbf H\mathbf H^\dagger$ is a (central) real/complex Wishart matrix with $n$ degrees of freedom and covariance matrix $\mathbf\Sigma$, written $\mathbf A \sim \mathcal W_m(n, \mathbf\Sigma)$, if the columns of the $m \times n$ matrix $\mathbf H$ are zero-mean independent real/complex Gaussian vectors with covariance matrix $\mathbf\Sigma$.¹

The p.d.f. of a complex Wishart matrix $\mathbf A \sim \mathcal W_m(n, \mathbf\Sigma)$ for $n \ge m$ is

$f_{\mathbf A}(\mathbf B) = \frac{\pi^{-m(m-1)/2}}{\det\mathbf\Sigma^{\,n}\prod_{i=1}^{m}(n-i)!}\,\exp\!\big(-\mathrm{tr}\{\mathbf\Sigma^{-1}\mathbf B\}\big)\,\det\mathbf B^{\,n-m}. \qquad (1)$

¹ If the entries of $\mathbf H$ have nonzero mean, $\mathbf H\mathbf H^\dagger$ is a non-central Wishart matrix.
Singular Values: Fisher-Hsu-Girshick-Roy

The joint p.d.f. of the ordered strictly positive eigenvalues of the Wishart matrix $\mathbf H\mathbf H^\dagger$:

R. A. Fisher, "The sampling distribution of some statistics obtained from non-linear equations," The Annals of Eugenics, vol. 9, pp. 238-249, 1939.
M. A. Girshick, "On the sampling theory of roots of determinantal equations," The Annals of Math. Statistics, vol. 10, pp. 203-204, 1939.
P. L. Hsu, "On the distribution of roots of certain determinantal equations," The Annals of Eugenics, vol. 9, pp. 250-258, 1939.
S. N. Roy, "p-statistics or some generalizations in the analysis of variance appropriate to multivariate problems," Sankhyā, vol. 4, pp. 381-396, 1939.
Singular Values: Fisher-Hsu-Girshick-Roy

Joint distribution of the ordered nonzero eigenvalues (Fisher 1939, Hsu 1939, Girshick 1939, Roy 1939):

$\gamma_{t,r}\,\exp\!\Big(-\sum_{i=1}^{t}\lambda_i\Big)\prod_{i=1}^{t}\lambda_i^{\,r-t}\prod_{i<j}(\lambda_i-\lambda_j)^2$

where $t$ and $r$ are the minimum and the maximum of the dimensions of $\mathbf H$, and $\gamma_{t,r}$ is a normalizing constant.

The marginal p.d.f. of the unordered eigenvalues is

$\frac{1}{t}\sum_{k=0}^{t-1}\frac{k!}{(k+r-t)!}\,\big[L^{r-t}_k(\lambda)\big]^2\,\lambda^{r-t}\,e^{-\lambda}$

where the Laguerre polynomials are

$L^n_k(\lambda) = \frac{e^{\lambda}}{k!\,\lambda^{n}}\,\frac{d^k}{d\lambda^k}\big(e^{-\lambda}\lambda^{n+k}\big).$
[Figure 1: Joint p.d.f. of the unordered positive eigenvalues of the Wishart matrix $\mathbf H\mathbf H^\dagger$ with $n = 3$ and $m = 2$.]
Wishart Matrices: Eigenvectors

Theorem 1. The matrix of eigenvectors of a Wishart matrix is uniformly distributed on the manifold of unitary matrices (Haar measure).
Unitarily Invariant Random Matrices

Definition: An $N \times N$ self-adjoint random matrix $\mathbf A$ is called unitarily invariant if the p.d.f. of $\mathbf A$ is equal to that of $\mathbf V\mathbf A\mathbf V^\dagger$ for any unitary matrix $\mathbf V$.

Property: If $\mathbf A$ is unitarily invariant, it admits the eigenvalue decomposition

$\mathbf A = \mathbf U\mathbf\Lambda\mathbf U^\dagger$

with $\mathbf U$ and $\mathbf\Lambda$ independent.

Examples:
- A Wishart matrix is unitarily invariant.
- $\mathbf A = \frac{1}{2}(\mathbf H + \mathbf H^\dagger)$, with $\mathbf H$ an $N \times N$ Gaussian matrix with i.i.d. entries, is unitarily invariant.
- $\mathbf A = \mathbf U\mathbf B\mathbf U^\dagger$, with $\mathbf U$ a Haar matrix and $\mathbf B$ independent of $\mathbf U$, is unitarily invariant.
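A standard way to realize the Haar measure in simulation (a common construction, not stated on the slide) is the QR factorization of a complex Gaussian matrix with the phases of $\mathbf R$'s diagonal normalized; the resulting $\mathbf U$ then makes $\mathbf A = \mathbf U\mathbf B\mathbf U^\dagger$ unitarily invariant:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50

def haar_unitary(n, rng):
    # QR of a complex Ginibre matrix; normalizing the phases of R's
    # diagonal makes the factor Q Haar-distributed
    Z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))

U = haar_unitary(N, rng)
# A = U B U^dagger with B independent of U: unitarily invariant,
# with the same eigenvalues as B
B = np.diag(rng.uniform(0, 1, N))
A = U @ B @ U.conj().T
unitarity_err = np.linalg.norm(U.conj().T @ U - np.eye(N))
```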
Bi-Unitarily Invariant Random Matrices

Definition: An $N \times N$ random matrix $\mathbf A$ is called bi-unitarily invariant if its p.d.f. equals that of $\mathbf U\mathbf A\mathbf V^\dagger$ for any unitary matrices $\mathbf U$ and $\mathbf V$.

Property: If $\mathbf A$ is a bi-unitarily invariant random matrix, it has a polar decomposition $\mathbf A = \mathbf U\mathbf H$ with
- $\mathbf U$ an $N \times N$ Haar random matrix,
- $\mathbf H$ an $N \times N$ unitarily invariant positive-definite random matrix,
- $\mathbf U$ and $\mathbf H$ independent.

Examples:
- A complex Gaussian random matrix with i.i.d. entries is bi-unitarily invariant.
- An $N \times K$ matrix $\mathbf Q$ uniformly distributed over the Stiefel manifold of complex $N \times K$ matrices such that $\mathbf Q^\dagger\mathbf Q = \mathbf I$.
The Birth of Asymptotic Random Matrix Theory

E. Wigner, "Characteristic vectors of bordered matrices with infinite dimensions," The Annals of Mathematics, vol. 62, pp. 546-564, 1955.

Example of a Wigner matrix (symmetric, with $\pm 1$ off-diagonal entries):

$\mathbf W = \frac{1}{\sqrt N}\begin{pmatrix} 0 & +1 & +1 & -1 & -1 & +1\\ +1 & 0 & -1 & -1 & +1 & +1\\ +1 & -1 & 0 & +1 & +1 & -1\\ -1 & -1 & +1 & 0 & +1 & +1\\ -1 & +1 & +1 & +1 & 0 & -1\\ +1 & +1 & -1 & +1 & -1 & 0 \end{pmatrix}$

As the matrix dimension $N \to \infty$, the histogram of the eigenvalues converges to the semicircle law:

$f(x) = \frac{1}{2\pi}\sqrt{4-x^2}, \qquad -2 \le x \le 2$
E. Wigner, "On the distribution of roots of certain symmetric matrices," The Annals of Mathematics, vol. 67, pp. 325-327, 1958.

If the upper-triangular entries are independent zero-mean random variables with variance $\frac{1}{N}$ (standard Wigner matrix) such that, for some constant $\kappa$ and sufficiently large $N$,

$\max_{1\le i\le j\le N} E\big[|W_{i,j}|^4\big] \le \frac{\kappa}{N^2} \qquad (2)$

then the empirical distribution of $\mathbf W$ converges almost surely to the semicircle law.
The Semicircle Law

[Figure: The semicircle law density function compared with the histogram of the average of 100 empirical density functions for a Wigner matrix of size N = 100.]
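A quick simulation of the $\pm 1$ Wigner model above (size $N = 1000$, chosen arbitrarily) shows the semicircle behavior: eigenvalue mean near 0, second moment near 1 (the semicircle's second moment), and support essentially inside $[-2, 2]$:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000

# Symmetric matrix: +/-1 entries above the diagonal, zero diagonal,
# scaled by 1/sqrt(N)
A = np.triu(rng.choice([-1.0, 1.0], size=(N, N)), k=1)
W = (A + A.T) / np.sqrt(N)

lam = np.linalg.eigvalsh(W)
m1, m2 = lam.mean(), (lam ** 2).mean()  # semicircle moments: 0 and 1
edge = np.abs(lam).max()                # support edge -> 2
```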
Square Matrix of i.i.d. Coefficients

Girko (1984), full-circle law for the unsymmetrized matrix: let

$\mathbf H = \frac{1}{\sqrt N}\,\times\,$ an $N \times N$ matrix with i.i.d. equiprobable $\pm 1$ entries.

As $N \to \infty$, the eigenvalues of $\mathbf H$ are uniformly distributed on the unit disk.

[Figure: The full-circle law and the eigenvalues of a realization of a 500 × 500 matrix.]
Full-Circle Law

V. L. Girko, "Circular law," Theory Prob. Appl., vol. 29, pp. 694-706, 1984.
Z. D. Bai, "The circle law," The Annals of Probability, pp. 494-529, 1997.

Theorem 2. Let $\mathbf H$ be an $N \times N$ complex random matrix whose entries are independent random variables with identical mean and variance and finite $k$th moments for $k \ge 4$. Assume that the joint distributions of the real and imaginary parts of the entries have uniformly bounded densities. Then, the asymptotic spectrum of $\mathbf H$ converges almost surely to the circular law, namely the uniform distribution over the unit disk on the complex plane $\{\zeta \in \mathbb C : |\zeta| \le 1\}$, whose density is given by

$f_c(\zeta) = \frac{1}{\pi}, \qquad |\zeta| \le 1 \qquad (3)$

(This also holds for real matrices, replacing the assumption on the joint distribution of the real and imaginary parts with the same assumption on the one-dimensional distribution of the real-valued entries.)
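A sketch of the full-circle law for an i.i.d. $\pm 1$ matrix ($N = 500$, matching the figure): the spectral radius approaches 1, and since the limit is uniform on the disk, about a quarter of the eigenvalues fall within radius $1/2$ (area ratio $0.25$):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500

# i.i.d. +/-1 entries scaled by 1/sqrt(N): eigenvalues fill the unit disk
H = rng.choice([-1.0, 1.0], size=(N, N)) / np.sqrt(N)
z = np.linalg.eigvals(H)

radius = np.abs(z).max()             # -> 1 as N grows
inside = np.mean(np.abs(z) <= 1.05)  # essentially all eigenvalues
frac_half = np.mean(np.abs(z) <= 0.5)  # uniform disk: area ratio 0.25
```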
Elliptic Law (Sommers-Crisanti-Sompolinsky-Stein, 1988)

H. J. Sommers, A. Crisanti, H. Sompolinsky, and Y. Stein, "Spectrum of large random asymmetric matrices," Physical Review Letters, vol. 60, pp. 1895-1899, 1988.

If the off-diagonal entries are Gaussian and pairwise correlated with correlation coefficient $\rho$, the eigenvalues are asymptotically uniformly distributed on an ellipse in the complex plane whose axes coincide with the real and imaginary axes and have radii $1+\rho$ and $1-\rho$.
What About the Singular Values?
Asymptotic Distribution of Singular Values: Quarter Circle Law

Consider an $N \times N$ matrix $\mathbf H$ whose entries are independent zero-mean complex (or real) random variables with variance $\frac{1}{N}$. The asymptotic distribution of the singular values converges to

$q(x) = \frac{1}{\pi}\sqrt{4-x^2}, \qquad 0 \le x \le 2 \qquad (4)$
[Figure: The quarter circle law compared with a histogram of the average of 100 empirical singular value density functions of a matrix of size 100 × 100.]
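The quarter circle law can be checked against the singular values of a single large realization (Gaussian entries, size 500, an arbitrary choice): its support edge is 2 and its mean is $8/(3\pi) \approx 0.849$:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 500
H = rng.standard_normal((N, N)) / np.sqrt(N)

s = np.linalg.svd(H, compute_uv=False)
s_max = s.max()    # support edge -> 2
s_mean = s.mean()  # quarter circle mean: 8/(3*pi) ~ 0.8488
```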
Minimum Singular Value of a Gaussian Matrix

A. Edelman, "Eigenvalues and condition numbers of random matrices," PhD thesis, Dept. Mathematics, MIT, Cambridge, MA, 1989.
J. Shen, "On the singular values of Gaussian random matrices," Linear Algebra and its Applications, vol. 326, no. 1-3, pp. 1-14, 2001.

Theorem 3. The minimum singular value of an $N \times N$ standard complex Gaussian matrix $\mathbf H$ satisfies

$\lim_{N\to\infty} P[N\,\sigma_{\min} \ge x] = e^{-x - x^2/2}. \qquad (5)$
Marcenko-Pastur Law

V. A. Marcenko and L. A. Pastur, "Distributions of eigenvalues for some sets of random matrices," Math USSR-Sbornik, vol. 1, pp. 457-483, 1967.
U. Grenander and J. W. Silverstein, "Spectral analysis of networks with random topologies," SIAM J. of Applied Mathematics, vol. 32, pp. 499-519, 1977.
K. W. Wachter, "The strong limits of random matrix spectra for sample matrices of independent elements," The Annals of Probability, vol. 6, no. 1, pp. 1-18, 1978.
J. W. Silverstein and Z. D. Bai, "On the empirical distribution of eigenvalues of a class of large dimensional random matrices," J. of Multivariate Analysis, vol. 54, pp. 175-192, 1995.
Y. Le Cun, I. Kanter, and S. A. Solla, "Eigenvalues of covariance matrices: Application to neural-network learning," Physical Review Letters, vol. 66, pp. 2396-2399, 1991.
Marcenko-Pastur Law

V. A. Marcenko and L. A. Pastur, "Distributions of eigenvalues for some sets of random matrices," Math USSR-Sbornik, vol. 1, pp. 457-483, 1967.

If the $N \times K$ matrix $\mathbf H$ has zero-mean i.i.d. entries with variance $\frac{1}{N}$, the asymptotic ESD of $\mathbf H^\dagger\mathbf H$ found in (Marcenko-Pastur, 1967) is

$f_\beta(x) = \Big[1-\frac{1}{\beta}\Big]^{\!+}\delta(x) + \frac{\sqrt{[x-a]^+\,[b-x]^+}}{2\pi\beta x}$

where $[z]^+ = \max\{0, z\}$, $\beta = \frac{K}{N}$, and

$a = (1-\sqrt\beta)^2, \qquad b = (1+\sqrt\beta)^2.$
Marcenko-Pastur Law

(Bai, 1999) The result also holds if only a unit second-moment condition is placed on the entries of $\mathbf H$, together with

$\frac{1}{K}\sum_{i,j} E\big[|H_{i,j}|^2\,\mathbf 1\{|H_{i,j}| \ge \delta\}\big] \to 0$

for any $\delta > 0$ (a Lindeberg-type condition on the whole matrix).
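A minimal simulation of the Marcenko-Pastur law ($\beta = 1/3$, sizes arbitrary): the eigenvalues of $\mathbf H^\dagger\mathbf H$ concentrate on $[a, b]$ and, for $\beta \le 1$, have mean 1:

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 900, 300  # beta = K/N = 1/3
beta = K / N
H = rng.standard_normal((N, K)) / np.sqrt(N)

lam = np.linalg.eigvalsh(H.T @ H)  # K eigenvalues of H^T H
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2

support_ok = (lam.min() > a - 0.15) and (lam.max() < b + 0.15)
mean_lam = lam.mean()  # -> 1 for beta <= 1
```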
Nonzero-Mean Matrices

Lemma (Yin 1986, Bai 1999): For any $N \times K$ matrices $\mathbf A$ and $\mathbf B$,

$\sup_{x\ge 0}\big|F^N_{\mathbf A\mathbf A^\dagger}(x) - F^N_{\mathbf B\mathbf B^\dagger}(x)\big| \le \frac{\mathrm{rank}(\mathbf A-\mathbf B)}{N}.$

Lemma (Yin 1986, Bai 1999): For any $N \times N$ Hermitian matrices $\mathbf A$ and $\mathbf B$,

$\sup_{x\ge 0}\big|F^N_{\mathbf A}(x) - F^N_{\mathbf B}(x)\big| \le \frac{\mathrm{rank}(\mathbf A-\mathbf B)}{N}.$

Using these lemmas, all the results illustrated so far can be extended to matrices whose mean has rank $r > 1$, as long as

$\lim_{N\to\infty}\frac{r}{N} = 0.$
Generalizations Needed!

Correlated Entries: $\mathbf H = \mathbf R\mathbf S\mathbf T$
- $\mathbf S$: $N \times K$ matrix whose entries are independent complex random variables (arbitrarily distributed).
- $\mathbf R$: $N \times N$ either deterministic or random matrix whose asymptotic spectrum converges a.s. to a compactly supported measure.
- $\mathbf T$: $K \times K$ either deterministic or random matrix whose asymptotic spectrum converges a.s. to a compactly supported measure.

Non-identically Distributed Entries: $\mathbf H$ an $N \times K$ complex random matrix with independent entries (arbitrarily distributed) with identical means and

$\mathrm{Var}[H_{i,j}] = \frac{G_{i,j}}{N}$

with $G_{i,j}$ uniformly bounded.
- Special case: Doubly Regular Channels
Transforms

1. Stieltjes transform
2. η-transform
3. Shannon transform
4. R-transform
5. S-transform
The Stieltjes Transform

The Stieltjes transform (also called the Cauchy transform) of an arbitrary random variable $X$ is defined as

$\mathcal S_X(z) = E\!\left[\frac{1}{X-z}\right]$

whose inversion formula was obtained in:
T. J. Stieltjes, "Recherches sur les fractions continues," Annales de la Faculté des Sciences de Toulouse, vol. 8 (9), no. A (J), pp. 1-47 (1-122), 1894 (1895).

$f_X(\lambda) = \lim_{\omega\to 0^+}\frac{1}{\pi}\,\mathrm{Im}\,\mathcal S_X(\lambda + j\omega)$
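The inversion formula can be applied directly to an empirical Stieltjes transform: evaluate $\mathcal S$ just above the real axis and compare with the Marcenko-Pastur density (the sizes and the smoothing parameter $\omega$ below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
N, K = 2000, 1000  # beta = 0.5
beta = K / N
H = rng.standard_normal((N, K)) / np.sqrt(N)
lam = np.linalg.eigvalsh(H.T @ H)

def stieltjes(z):
    # empirical S_X(z) = E[1/(X - z)] over the eigenvalues
    return np.mean(1.0 / (lam - z))

# invert just above the real axis: density ~ Im S(x + j*omega) / pi
x, omega = 1.0, 0.05
f_est = stieltjes(x + 1j * omega).imag / np.pi

# exact Marcenko-Pastur density at x for beta = 0.5
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
f_mp = np.sqrt((x - a) * (b - x)) / (2 * np.pi * beta * x)
```

The small imaginary part $\omega$ acts as a Cauchy smoothing kernel, which is why the estimate approaches the density only as $\omega \to 0^+$ and $N \to \infty$.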
The η-Transform [Tulino-Verdú 2004]

The η-transform of a nonnegative random variable $X$ is given by

$\eta_X(\gamma) = E\!\left[\frac{1}{1+\gamma X}\right]$

where $\gamma$ is a nonnegative real number, and thus $0 < \eta_X(\gamma) \le 1$.

Note:

$\eta_X(\gamma) = \sum_{k=0}^{\infty}(-\gamma)^k\,E[X^k]$
η-Transform of a Random Matrix

Given a $K \times K$ Hermitian matrix $\mathbf A = \mathbf H^\dagger\mathbf H$:

- the η-transform of its expected ESD is

$\eta_{\bar F^K_{\mathbf A}}(\gamma) = \frac{1}{K}\sum_{i=1}^{K} E\!\left[\frac{1}{1+\gamma\,\lambda_i(\mathbf H^\dagger\mathbf H)}\right] = \frac{1}{K}\,E\!\left[\mathrm{tr}\Big\{\big(\mathbf I+\gamma\,\mathbf H^\dagger\mathbf H\big)^{-1}\Big\}\right]$

- the η-transform of its asymptotic ESD is

$\eta_{\mathbf A}(\gamma) = \int_0^\infty\frac{1}{1+\gamma x}\,dF_{\mathbf A}(x) = \lim_{K\to\infty}\frac{1}{K}\,\mathrm{tr}\Big\{\big(\mathbf I+\gamma\,\mathbf H^\dagger\mathbf H\big)^{-1}\Big\}$

$\eta(\cdot)$ = generating function for the expected (asymptotic) moments of $\mathbf A$.

$\eta(\mathrm{SNR})$ = Minimum Mean Square Error.
The Shannon Transform [Tulino-Verdú 2004]

The Shannon transform of a nonnegative random variable $X$ is defined as

$\mathcal V_X(\gamma) = E[\log(1+\gamma X)], \qquad \gamma > 0.$

The Shannon transform gives the capacity of various noisy coherent communication channels.
Shannon Transform of a Random Matrix

Given an $N \times N$ Hermitian matrix $\mathbf A = \mathbf H\mathbf H^\dagger$:

- the Shannon transform of its expected ESD is

$\mathcal V_{\bar F^N_{\mathbf A}}(\gamma) = \frac{1}{N}\,E[\log\det(\mathbf I+\gamma\mathbf A)]$

- the Shannon transform of its asymptotic ESD is

$\mathcal V_{\mathbf A}(\gamma) = \lim_{N\to\infty}\frac{1}{N}\log\det(\mathbf I+\gamma\mathbf A)$

$I(\mathrm{SNR}, \mathbf H\mathbf H^\dagger) = \mathcal V(\mathrm{SNR})$
Stieltjes, Shannon and η

$\frac{\gamma}{\log e}\,\frac{d}{d\gamma}\mathcal V_X(\gamma) = 1 - \frac{1}{\gamma}\,\mathcal S_X\!\Big(\!-\frac{1}{\gamma}\Big) = 1 - \eta_X(\gamma)$

$\frac{\mathrm{SNR}}{\log e}\,\frac{d}{d\,\mathrm{SNR}}\,I(\mathrm{SNR}) = \frac{K}{N}\big(1-\mathrm{MMSE}\big)$
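The second identity holds for every realization of $\mathbf H$, not only asymptotically; working in nats (so the $\log e$ factor is 1), a central finite difference verifies it to numerical precision:

```python
import numpy as np

rng = np.random.default_rng(9)
N, K, snr = 80, 40, 1.5
H = rng.standard_normal((N, K)) / np.sqrt(N)

def I(g):
    # mutual information in nats per receive dimension
    return np.linalg.slogdet(np.eye(N) + g * H @ H.T)[1] / N

# left side: SNR * dI/dSNR via central finite difference
h = 1e-4
lhs = snr * (I(snr + h) - I(snr - h)) / (2 * h)

# right side: (K/N) * (1 - MMSE), MMSE from the K x K trace formula
mmse = np.trace(np.linalg.inv(np.eye(K) + snr * H.T @ H)) / K
rhs = (K / N) * (1 - mmse)
```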
S-Transform

D. Voiculescu, "Multiplication of certain non-commuting random variables," J. Operator Theory, vol. 18, pp. 223-235, 1987.

$\Sigma_X(x) = -\frac{x+1}{x}\,\eta_X^{-1}(1+x) \qquad (6)$

which maps $(-1, 0)$ onto the positive real line.
S-Transform: Key Theorem

O. Ryan, "On the limit distributions of random matrices with independent or free entries," Comm. in Mathematical Physics, vol. 193, pp. 595-626, 1998.
F. Hiai and D. Petz, "Asymptotic freeness almost everywhere for random matrices," Acta Sci. Math. Szeged, vol. 66, pp. 801-826, 2000.

Let $\mathbf A$ and $\mathbf B$ be independent random matrices. If either
- $\mathbf B$ is unitarily or bi-unitarily invariant, or
- both $\mathbf A$ and $\mathbf B$ have i.i.d. entries,

then the S-transform of the spectrum of $\mathbf A\mathbf B$ is

$\Sigma_{\mathbf A\mathbf B}(x) = \Sigma_{\mathbf A}(x)\,\Sigma_{\mathbf B}(x)$

and

$\eta_{\mathbf A\mathbf B}(\gamma) = \eta_{\mathbf A}\Big(\gamma\,\Sigma_{\mathbf B}\big(\eta_{\mathbf A\mathbf B}(\gamma)-1\big)\Big)$
S-Transform: Example

Let $\mathbf H = \mathbf C\mathbf Q$, where:
- $K \le N$;
- $\mathbf Q$ is an $N \times K$ matrix, independent of $\mathbf C$, uniformly distributed over the Stiefel manifold of complex $N \times K$ matrices such that $\mathbf Q^\dagger\mathbf Q = \mathbf I$.

Since $\mathbf Q$ is bi-unitarily invariant,

$\eta_{\mathbf C\mathbf Q\mathbf Q^\dagger\mathbf C^\dagger}(\mathrm{SNR}) = \eta_{\mathbf C\mathbf C^\dagger}\Big(\mathrm{SNR}\;\Sigma_{\mathbf Q\mathbf Q^\dagger}\big(\eta_{\mathbf C\mathbf Q\mathbf Q^\dagger\mathbf C^\dagger}(\mathrm{SNR})-1\big)\Big)$

and

$\mathcal V_{\mathbf C\mathbf Q\mathbf Q^\dagger\mathbf C^\dagger}(\mathrm{SNR}) = \int_0^{\mathrm{SNR}}\frac{1}{x}\Big(1-\eta_{\mathbf C\mathbf Q\mathbf Q^\dagger\mathbf C^\dagger}(x)\Big)\,dx$
Downlink MC-CDMA with Orthogonal Sequences and Equal Power

$\mathbf y = \mathbf C\mathbf Q\mathbf A\mathbf x + \mathbf n$

where:
- $\mathbf Q$ = the orthogonal spreading sequences;
- $\mathbf A$ = the $K \times K$ diagonal matrix of transmitted amplitudes, with $\mathbf A = \mathbf I$ (equal power);
- $\mathbf C$ = the $N \times N$ matrix of fading coefficients.

$\frac{1}{K}\sum_{k=1}^{K}\mathrm{MMSE}_k \;\xrightarrow{\text{a.s.}}\; \eta_{\mathbf Q^\dagger\mathbf C^\dagger\mathbf C\mathbf Q}(\mathrm{SNR}) = 1 - \frac{1}{\beta}\Big(1-\eta_{\mathbf C\mathbf Q\mathbf Q^\dagger\mathbf C^\dagger}(\mathrm{SNR})\Big)$

An alternative characterization of the Shannon transform (inspired by the optimality of successive cancellation with MMSE) is

$\mathcal V_{\mathbf C\mathbf Q\mathbf Q^\dagger\mathbf C^\dagger}(\gamma) = E\big[\log\big(1+\Gamma(Y,\gamma)\big)\big]$

with $\Gamma(y,\gamma)$ satisfying

$\frac{\Gamma(y,\gamma)}{1+\Gamma(y,\gamma)} = E\!\left[\frac{\gamma\,|C|^2}{y\,\gamma\,|C|^2 + 1 + (1-y)\,\Gamma(y,\gamma)}\right]$

where $Y$ is a random variable uniform on $[0, 1]$.
R-Transform

D. Voiculescu, "Addition of certain non-commuting random variables," J. Funct. Analysis, vol. 66, pp. 323-346, 1986.

$R_X(z) = \mathcal S_X^{-1}(-z) - \frac{1}{z}. \qquad (7)$

R-transform and η-transform:

The R-transform (restricted to the negative real axis) of a nonnegative random variable $X$ is given by

$R_X(\zeta) = \frac{\eta_X(\gamma)-1}{\zeta}$

with $\zeta$ and $\gamma$ satisfying $\zeta = -\gamma\,\eta_X(\gamma)$.
R-Transform: Key Theorem

O. Ryan, "On the limit distributions of random matrices with independent or free entries," Comm. in Mathematical Physics, vol. 193, pp. 595-626, 1998.
F. Hiai and D. Petz, "Asymptotic freeness almost everywhere for random matrices," Acta Sci. Math. Szeged, vol. 66, pp. 801-826, 2000.

Let $\mathbf A$ and $\mathbf B$ be independent random matrices. If either
- $\mathbf B$ is unitarily or bi-unitarily invariant, or
- both $\mathbf A$ and $\mathbf B$ have i.i.d. entries,

then the R-transform of the spectrum of the sum is $R_{\mathbf A+\mathbf B} = R_{\mathbf A} + R_{\mathbf B}$, and

$\eta_{\mathbf A+\mathbf B}(\gamma) = \eta_{\mathbf A}(\gamma_a) + \eta_{\mathbf B}(\gamma_b) - 1$

with $\gamma_a$, $\gamma_b$ and $\gamma$ satisfying the pair of equations

$\gamma_a\,\eta_{\mathbf A}(\gamma_a) = \gamma\,\eta_{\mathbf A+\mathbf B}(\gamma) = \gamma_b\,\eta_{\mathbf B}(\gamma_b)$
Random Quadratic Forms

Z. D. Bai and J. W. Silverstein, "No eigenvalues outside the support of the limiting spectral distribution of large dimensional sample covariance matrices," The Annals of Probability, vol. 26, pp. 316-345, 1998.

Theorem 4. Let the components of the $N$-dimensional vector $\mathbf x$ be zero-mean and independent with variance $\frac{1}{N}$. For any $N \times N$ nonnegative definite random matrix $\mathbf B$ independent of $\mathbf x$ whose spectrum converges almost surely,

$\lim_{N\to\infty}\mathbf x^\dagger(\mathbf I+\gamma\mathbf B)^{-1}\mathbf x = \eta_{\mathbf B}(\gamma) \quad \text{a.s.} \qquad (8)$

$\lim_{N\to\infty}\mathbf x^\dagger(\mathbf B - z\mathbf I)^{-1}\mathbf x = \mathcal S_{\mathbf B}(z) \quad \text{a.s.} \qquad (9)$
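Theorem 4's first limit is easy to visualize numerically: for a Wishart-type $\mathbf B$ and an independent $\mathbf x$ with variance-$1/N$ entries, the quadratic form is already close to $\eta_{\mathbf B}(\gamma)$ at moderate $N$ (sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(10)
N, gamma = 1000, 1.0

# B: Wishart-type nonnegative definite matrix, independent of x
G = rng.standard_normal((N, N)) / np.sqrt(N)
B = G @ G.T
x = rng.standard_normal(N) / np.sqrt(N)  # zero-mean entries, variance 1/N

quad = x @ np.linalg.solve(np.eye(N) + gamma * B, x)
eta = np.trace(np.linalg.inv(np.eye(N) + gamma * B)) / N  # eta_B(gamma)
```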
Rationale

- Stieltjes: description of the asymptotic distribution of singular values + tool for proving results (Marcenko-Pastur (1967)).
- η: description of the asymptotic distribution of singular values + signal processing insight.
- Shannon: description of the asymptotic distribution of singular values + information theory insight.
Non-asymptotic Shannon Transform

Example: For an $N \times K$ matrix $\mathbf H$ having zero-mean i.i.d. Gaussian entries:

$\mathcal V(\mathrm{SNR}) = \sum_{k=0}^{t-1}\sum_{\ell_1=0}^{k}\sum_{\ell_2=0}^{k}\binom{k}{\ell_1}\frac{(k+r-t)!\,(-1)^{\ell_1+\ell_2}\,\mathcal I_{\ell_1+\ell_2+r-t}(\mathrm{SNR})}{(k-\ell_2)!\,(r-t+\ell_1)!\,(r-t+\ell_2)!\,\ell_2!}$

$\mathcal I_0(\mathrm{SNR}) = -e^{1/\mathrm{SNR}}\,\mathrm{Ei}\!\Big(\!-\frac{1}{\mathrm{SNR}}\Big)$

$\mathcal I_n(\mathrm{SNR}) = n\,\mathcal I_{n-1}(\mathrm{SNR}) + (-\mathrm{SNR})^{-n}\Big(\mathcal I_0(\mathrm{SNR}) + \sum_{k=1}^{n}(k-1)!\,(-\mathrm{SNR})^{k}\Big)$

For the η-transform:

$\eta(\mathrm{SNR}) = 1 - \mathrm{SNR}\,\frac{d}{d\,\mathrm{SNR}}\,\mathcal V(\mathrm{SNR})$
Asymptotics

$K \to \infty, \qquad N \to \infty, \qquad \frac{K}{N} \to \beta$
Shannon and η-Transform of the Marcenko-Pastur Law

Example: The Shannon transform of the Marcenko-Pastur law is

$\mathcal V(\mathrm{SNR}) = \log\!\Big(1+\mathrm{SNR}-\frac{1}{4}\mathcal F(\mathrm{SNR},\beta)\Big) + \frac{1}{\beta}\log\!\Big(1+\mathrm{SNR}\,\beta-\frac{1}{4}\mathcal F(\mathrm{SNR},\beta)\Big) - \frac{\log e}{4\,\beta\,\mathrm{SNR}}\,\mathcal F(\mathrm{SNR},\beta)$

where

$\mathcal F(x,z) = \Big(\sqrt{x\,(1+\sqrt z)^2+1} - \sqrt{x\,(1-\sqrt z)^2+1}\Big)^2$

while its η-transform is

$\eta(\mathrm{SNR}) = 1 - \frac{\mathcal F(\mathrm{SNR},\beta)}{4\,\beta\,\mathrm{SNR}}$
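Taking the closed forms in this slide with natural logarithms (so the $\log e$ factor is 1), a Monte Carlo check against one large realization ($\beta = 1/2$, $\mathrm{SNR} = 2$, sizes arbitrary) shows the agreement:

```python
import numpy as np

rng = np.random.default_rng(11)
N, K, g = 600, 300, 2.0
beta = K / N

def F(x, z):
    return (np.sqrt(x * (1 + np.sqrt(z)) ** 2 + 1)
            - np.sqrt(x * (1 - np.sqrt(z)) ** 2 + 1)) ** 2

# closed forms in nats
f = F(g, beta)
V_cf = np.log(1 + g - f / 4) + np.log(1 + g * beta - f / 4) / beta - f / (4 * beta * g)
eta_cf = 1 - f / (4 * beta * g)

# Monte Carlo over the eigenvalues of H^T H (Marcenko-Pastur law)
H = rng.standard_normal((N, K)) / np.sqrt(N)
lam = np.linalg.eigvalsh(H.T @ H)
V_mc = np.mean(np.log1p(g * lam))
eta_mc = np.mean(1 / (1 + g * lam))
```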
Asymptotics

- Distribution Insensitivity: the asymptotic eigenvalue distribution does not depend on the distribution with which the independent matrix coefficients are generated.
- Ergodicity: the eigenvalue histogram of one matrix realization converges almost surely to the asymptotic eigenvalue distribution.
- Speed of Convergence: 8 = ∞.
Marcenko-Pastur Law: Applications

- Unfaded equal-power DS-CDMA
- Canonical model (i.i.d. Rayleigh fading MIMO channels)
- Multi-Carrier CDMA channels whose sequences have i.i.d. entries
More General Models

Correlated Entries: $\mathbf H = \mathbf R\mathbf S\mathbf T$
- $\mathbf S$: $N \times K$ matrix whose entries are independent complex random variables (arbitrarily distributed) with identical means and variance $\frac{1}{N}$.
- $\mathbf R$: $N \times N$ random matrix whose asymptotic spectrum converges a.s. to a compactly supported measure.
- $\mathbf T$: $K \times K$ random matrix whose asymptotic spectrum converges a.s. to a compactly supported measure.

Non-identically Distributed Entries: $\mathbf H$ an $N \times K$ complex random matrix with independent entries (arbitrarily distributed) with identical means and

$\mathrm{Var}[H_{i,j}] = \frac{G_{i,j}}{N}$

with $G_{i,j}$ uniformly bounded.
- Special case: Doubly Regular Channels
Doubly Regular Matrices [Tulino-Lozano-Verdú, 2005]

Definition: An $N \times K$ matrix $\mathbf P$ is asymptotically mean row-regular if

$\lim_{K\to\infty}\frac{1}{K}\sum_{j=1}^{K} P_{i,j}$

is independent of $i$ as $\frac{K}{N} \to \beta$.

Definition: $\mathbf P$ is asymptotically mean column-regular if its transpose is asymptotically mean row-regular.

Definition: $\mathbf P$ is asymptotically mean doubly-regular if it is both asymptotically mean row-regular and asymptotically mean column-regular. If

$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} P_{i,j} = \lim_{K\to\infty}\frac{1}{K}\sum_{j=1}^{K} P_{i,j} = 1$

then $\mathbf P$ is standard asymptotically mean doubly-regular.
Regular Matrices: Example

An $N \times K$ rectangular Toeplitz matrix

$P_{i,j} = \varphi(i-j)$

with $K \le N$ is an asymptotically mean row-regular matrix.

If either the function $\varphi(\cdot)$ is periodic or $N = K$, then the Toeplitz matrix is asymptotically mean doubly-regular.
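The periodic case is easy to verify numerically: with a period-8 cosine as $\varphi$ (an illustrative choice) and $N = K = 64$ a multiple of the period, every row mean and every column mean is the same full-period average:

```python
import numpy as np

N = 64
phi = lambda k: np.cos(2 * np.pi * k / 8)  # periodic, period 8
P = np.array([[phi(i - j) for j in range(N)] for i in range(N)])  # N = K Toeplitz

row_means = P.mean(axis=1)
col_means = P.mean(axis=0)
row_spread = row_means.max() - row_means.min()  # 0 for a doubly regular P
col_spread = col_means.max() - col_means.min()
```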
Double Regularity: Engineering Insight

$\mathbf H = \mathbf P^{\circ 1/2}\circ\mathbf S$

where $\mathbf S$ has i.i.d. entries with variance $\frac{1}{N}$ (here $\circ$ denotes the entry-wise product and $\mathbf P^{\circ 1/2}$ the entry-wise square root), and thus $\mathrm{Var}[H_{i,j}] = \frac{P_{i,j}}{N}$.

When antennas with two orthogonal polarizations are used, the gain between copolar antennas differs from the gain between crosspolar antennas, so $\mathbf P$ alternates the two gains in a checkerboard pattern:

$\mathbf P = \begin{pmatrix} a & b & a & \cdots\\ b & a & b & \cdots\\ a & b & a & \cdots\\ \vdots & \vdots & \vdots & \ddots\end{pmatrix}$

(with $a$ the copolar gain and $b$ the crosspolar gain), which is mean doubly regular.
One-Side Correlated Entries

Let $\mathbf H = \mathbf S\mathbf\Phi^{1/2}$ (or $\mathbf H = \mathbf\Phi^{1/2}\mathbf S$) with:
- $\mathbf S$: $N \times K$ matrix whose entries are independent (arbitrarily distributed) with identical mean and variance $\frac{1}{N}$;
- $\mathbf\Phi$: $K \times K$ (or $N \times N$) deterministic correlation matrix whose asymptotic ESD converges to a compactly supported measure.

Then

$\mathcal V_{\mathbf H\mathbf H^\dagger}(\gamma) = \beta\,\mathcal V_{\mathbf\Phi}\big(\gamma\,\eta_{\mathbf H\mathbf H^\dagger}(\gamma)\big) + \log\frac{1}{\eta_{\mathbf H\mathbf H^\dagger}(\gamma)} + \big(\eta_{\mathbf H\mathbf H^\dagger}(\gamma)-1\big)\log e$

with $\eta_{\mathbf H\mathbf H^\dagger}(\gamma)$ satisfying

$1-\eta_{\mathbf H\mathbf H^\dagger}(\gamma) = \beta\,\Big(1-\eta_{\mathbf\Phi}\big(\gamma\,\eta_{\mathbf H\mathbf H^\dagger}(\gamma)\big)\Big).$
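Assuming the fixed point $1-\eta_{\mathbf H\mathbf H^\dagger}(\gamma) = \beta\,(1-\eta_{\mathbf\Phi}(\gamma\,\eta_{\mathbf H\mathbf H^\dagger}(\gamma)))$ for the one-sided model $\mathbf H = \mathbf S\mathbf\Phi^{1/2}$, a simulation with a two-level correlation matrix $\mathbf\Phi$ (eigenvalues 0.5 and 1.5, an illustrative choice) shows the empirical η-transform satisfying it closely:

```python
import numpy as np

rng = np.random.default_rng(12)
N, K, g = 800, 400, 1.0
beta = K / N

# one-sided correlation: H = S Phi^{1/2}, Phi diagonal with eigenvalues 0.5 / 1.5
phi = np.repeat([0.5, 1.5], K // 2)
S = rng.standard_normal((N, K)) / np.sqrt(N)
H = S * np.sqrt(phi)  # right-multiplication by diag(phi)^{1/2}

eta_emp = np.trace(np.linalg.inv(np.eye(N) + g * H @ H.T)) / N

# residual of the fixed point: 1 - eta = beta * (1 - eta_Phi(g * eta))
eta_Phi = lambda x: np.mean(1 / (1 + x * phi))
residual = (1 - eta_emp) - beta * (1 - eta_Phi(g * eta_emp))
```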
One-Side Correlated Entries: Applications

- Multi-antenna channels with correlation either only at the transmitter or only at the receiver.
- DS-CDMA with frequency-flat fading; in this case $\mathbf\Phi = \mathbf A\mathbf A^\dagger$ with $\mathbf A$ the $K \times K$ diagonal matrix of complex fading coefficients.
Correlated Entries

Let $\mathbf H = \mathbf C\mathbf S\mathbf A$ with:
- $\mathbf S$: $N \times K$ complex random matrix whose entries are i.i.d. with variance $\frac{1}{N}$;
- $\mathbf R = \mathbf C\mathbf C^\dagger$: $N \times N$ either deterministic or random matrix such that its ESD converges a.s. to a compactly supported measure;
- $\mathbf T = \mathbf A^\dagger\mathbf A$: $K \times K$ either deterministic or random matrix such that its ESD converges a.s. to a compactly supported measure.

Definition: Let $\Lambda_{\mathbf R}$ and $\Lambda_{\mathbf T}$ be independent random variables with distributions given by the asymptotic ESDs of $\mathbf R$ and $\mathbf T$.
Correlated Entries: Applications

- Multi-antenna channels with correlation at the transmitter and receiver (separable correlation model); in this case:
  $\mathbf R$ = the receive correlation matrix,
  $\mathbf T$ = the transmit correlation matrix.
- Downlink MC-CDMA with frequency-selective fading and i.i.d. sequences; in this case:
  $\mathbf C$ = the $N \times N$ diagonal matrix containing the fading coefficient for each subcarrier,
  $\mathbf A$ = the $K \times K$ deterministic diagonal matrix containing the amplitudes of the users.
Correlated Entries: Applications

- Downlink DS-CDMA with frequency-selective fading; in this case:
  $\mathbf C$ = the $N \times N$ Toeplitz matrix defined as

  $(\mathbf C)_{i,j} = \frac{1}{\sqrt{W_c}}\,c\!\left(\frac{i-j}{W_c}\right)$

  with $c(\cdot)$ the impulse response of the channel,
  $\mathbf A$ = the $K \times K$ deterministic diagonal matrix containing the amplitudes of the users.
Correlated Entries: Shannon and η-Transform [Tulino-Lozano-Verdú, 2003]

The η-transform is

$\eta_{\mathbf H\mathbf H^\dagger}(\gamma) = \eta_{\mathbf R}(\gamma_r)$

The Shannon transform is

$\mathcal V_{\mathbf H\mathbf H^\dagger}(\gamma) = \mathcal V_{\mathbf R}(\gamma_r) + \beta\,\mathcal V_{\mathbf T}(\gamma_t) - \frac{\gamma_r\,\gamma_t}{\gamma}\log e$

where

$\frac{\gamma_r\,\gamma_t}{\gamma} = \beta\,\big(1-\eta_{\mathbf T}(\gamma_t)\big) = 1-\eta_{\mathbf R}(\gamma_r)$
Arbitrary Numbers of Dimensions: Shannon Transform of
Correlated channels
The-transform is:
7/25/2019 Tutorial T4
83/130
HH()
1
nR
nRi=1
1
1 + i(R) r.
The Shannon transform is:
VHH() nR
i=1
log2 (1 + i(R) r)+
nTj=1
log2 (1 +j(T) t)t r
log2 e
r
= 1
nT
nTj=1
j(T)
1 +j(T)t
t
= 1
nR
nRi=1
i(R)
1 + i(R) r.
82
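The coupled equations above can be iterated directly. The following sketch compares the resulting deterministic approximation with a Monte Carlo average of log2 det(I + γHH†), assuming the separable model H = R^{1/2} S T^{1/2} with complex Gaussian S of per-entry variance 1/n_R; the correlation matrices, sizes, and tolerances are illustrative choices, not from the tutorial:

```python
import numpy as np

def mimo_mi_approx(snr, lamR, lamT, iters=2000):
    """Damped fixed-point iteration for (Gamma_r, Gamma_t), then the
    deterministic equivalent of log2 det(I + snr H H^dagger)."""
    nR = len(lamR)
    gr = gt = 1.0
    for _ in range(iters):
        gr_new = np.sum(lamT / (1 + snr * lamT * gt)) / nR
        gt_new = np.sum(lamR / (1 + snr * lamR * gr)) / nR
        gr, gt = 0.5 * (gr + gr_new), 0.5 * (gt + gt_new)
    return (np.sum(np.log2(1 + snr * lamR * gr))
            + np.sum(np.log2(1 + snr * lamT * gt))
            - nR * snr * gr * gt * np.log2(np.e))

rng = np.random.default_rng(3)
nR = nT = 8
snr = 10.0
d = np.arange(nR)
R = np.exp(-0.05 * (d[:, None] - d[None, :]) ** 2)   # example correlation matrices
T = np.exp(-0.2 * (d[:, None] - d[None, :]) ** 2)
approx = mimo_mi_approx(snr, np.linalg.eigvalsh(R), np.linalg.eigvalsh(T))

# Monte Carlo average of log2 det(I + snr H H^dagger)
Rh = np.linalg.cholesky(R + 1e-10 * np.eye(nR))
Th = np.linalg.cholesky(T + 1e-10 * np.eye(nT))
trials, mc = 400, 0.0
for _ in range(trials):
    S = (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2 * nR)
    H = Rh @ S @ Th.conj().T
    _, ld = np.linalg.slogdet(np.eye(nR) + snr * (H @ H.conj().T))
    mc += ld / np.log(2)
mc /= trials
print(approx, mc)
```

Even at n_R = n_T = 8 the two numbers typically agree to within a few percent, which is the point of the "arbitrary numbers of dimensions" slides.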
Example: Mutual Information of a Multi-Antenna Channel

[Figure: rate/bandwidth (bits/s/Hz) vs. Eb/N0 (dB), from −8 dB to +2 dB,
for antenna spacings d = 1, 2 and the IID channel; capacity vs. isotropic
inputs; analytical curves, simulation points, and the first-order low-SNR
expansion anchored at (Eb/N0)_min = −1.59 dB.]

The transmit correlation matrix: (T)_{i,j} = e^{−0.05 d² (i−j)²} with d the
antenna spacing (in wavelengths).
Correlated Entries (Hanly-Tse, 2001)

Let:

	S_ℓ be N×K matrices with i.i.d. entries,
	A_ℓ = diag{A_{1,ℓ}, . . . , A_{K,ℓ}}, where {A_{k,ℓ}} are i.i.d. random variables,
	S be an NL×K matrix with i.i.d. entries,
	P a K×K diagonal matrix whose k-th diagonal entry is (P)_{k,k} = Σ_{ℓ=1}^{L} A²_{k,ℓ}.

The distribution of the singular values of the matrix

(11)	H = [ S_1 A_1 ; . . . ; S_L A_L ]   (blocks stacked vertically)

is the same as the distribution of the singular values of the matrix S √P.

Applications: DS-CDMA with Flat Fading and Antenna Diversity: {A_{k,ℓ}} are the i.i.d.
fading coefficients of the k-th user at the ℓ-th antenna and the S_ℓ contain the signatures.

Engineering interpretation: the effective spreading gain = the CDMA spreading gain
× the number of receive antennas.
Non-identically Distributed Entries

Let H be an N×K complex random matrix whose:

Entries are independent (arbitrarily distributed), satisfying the Lindeberg
condition and with identical means,

	Var[H_{i,j}] = (P)_{i,j} / N

where P is an N×K deterministic matrix whose entries are uniformly
bounded.
Arbitrary Numbers of Dimensions: Shannon Transform
for IND Channels

	n_R V_{HH†}(γ) ≈ Σ_{j=1}^{n_T} log2(1 + Γ_j)
	             + Σ_{i=1}^{n_R} log2( 1 + (γ/n_R) Σ_{j=1}^{n_T} (P)_{i,j} ϒ_j )
	             − log2 e Σ_{j=1}^{n_T} Γ_j ϒ_j

where

	Γ_j = (γ/n_R) Σ_{i=1}^{n_R} (P)_{i,j} / ( 1 + (γ/n_R) Σ_{j'=1}^{n_T} (P)_{i,j'} ϒ_{j'} )

	ϒ_j = 1 / (1 + Γ_j)

Γ_j = the SINR exhibited by x_j at the output of a linear MMSE receiver,
ϒ_j = the corresponding MSE.
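The fixed point for the Γ_j and ϒ_j can again be iterated numerically and checked against simulation. A sketch, assuming complex Gaussian entries with Var[H_ij] = (P)_{i,j}/n_R; the gain profile P, sizes, and tolerances are illustrative:

```python
import numpy as np

def ind_mi_approx(snr, P, iters=2000):
    """Damped iteration of the coupled equations for Gamma_j (MMSE SINRs)
    and Upsilon_j = 1/(1+Gamma_j) (MMSEs), then the Shannon transform."""
    nR, nT = P.shape
    ups = np.ones(nT)
    for _ in range(iters):
        denom = 1 + (snr / nR) * (P @ ups)            # one term per receive dim
        gam = (snr / nR) * (P / denom[:, None]).sum(axis=0)
        ups = 0.5 * ups + 0.5 / (1 + gam)
    return (np.sum(np.log2(1 + gam))
            + np.sum(np.log2(1 + (snr / nR) * (P @ ups)))
            - np.log2(np.e) * np.sum(gam * ups))

rng = np.random.default_rng(5)
nR, nT, snr = 32, 16, 5.0
P = rng.uniform(0.2, 2.0, size=(nR, nT))
approx = ind_mi_approx(snr, P)

trials, mc = 300, 0.0
for _ in range(trials):
    G = (rng.standard_normal((nR, nT)) + 1j * rng.standard_normal((nR, nT))) / np.sqrt(2)
    H = np.sqrt(P / nR) * G                           # Var[H_ij] = P_ij / nR
    _, ld = np.linalg.slogdet(np.eye(nR) + snr * (H @ H.conj().T))
    mc += ld / np.log(2)
mc /= trials
print(approx, mc)
```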
Non-identically Distributed Entries: Special Cases

P is asymptotically doubly regular. In this case:

	V_{HH†}(γ) and η_{HH†}(γ) → Shannon and η-transform of the Marcenko-Pastur law.

P is the outer product of the nonnegative N-vector R and K-vector T. In this case:

	P = R Tᵀ

	H = diag(R)^{1/2} S diag(T)^{1/2}
Non-identically Distributed Entries: Applications

MC-CDMA with frequency-selective fading and i.i.d. sequences (Uplink and
Downlink).

Uplink DS-CDMA with Frequency-Selective Fading:

L. Li, A. M. Tulino, and S. Verdu, "Design of reduced-rank MMSE multiuser detectors
using random matrix methods," IEEE Trans. on Information Theory, vol. 50, June 2004.

J. Evans and D. Tse, "Large system performance of linear multiuser receivers in
multipath fading channels," IEEE Trans. on Information Theory, vol. 46, Sep. 2000.

J. M. Chaufray, W. Hachem, and P. Loubaton, "Asymptotic analysis of optimum and
sub-optimum CDMA MMSE receivers," Proc. IEEE Int. Symp. on Information Theory
(ISIT'02), p. 189, July 2002.
Non-identically Distributed Entries: Applications

Multi-Antenna Channels with Polarization Diversity:

	H = √P ∘ H_w

where H_w is zero-mean i.i.d. Gaussian, P is a deterministic matrix with
nonnegative entries, ∘ denotes the entrywise (Hadamard) product, and the
square root is applied entrywise. (P)_{i,j} is the power gain between the
j-th transmit and i-th receive antennas, determined by their relative
polarizations.

Non-separable Correlations

	H = U_R H̃ U_T†

where U_R and U_T are unitary while the entries of H̃ are independent
zero-mean Gaussian. A more restrictive case is when U_R and U_T are Fourier
matrices.

This model is advocated and experimentally supported in W. Weichselberger et al.,
"A stochastic MIMO channel model with joint correlation of both link ends," IEEE Trans.
on Wireless Com., vol. 5, no. 1, pp. 90-100, 2006.
Example: Mutual Information of a Multi-Antenna Channel

[Figure: mutual information (bits/s/Hz) vs. SNR (dB), from −10 dB to 20 dB;
analytical curve vs. simulation for the 2×3 gain matrix

	G = [ 0.4  3.6  0.5
	      0.3  1    0.2 ] ]
Ergodic Regime

{H_i} varies ergodically over the duration of a codeword.

The quantity of interest is then the mutual information averaged over the
fading, E[I(SNR, HH†)], with

	I(SNR, HH†) = (1/N) log det(I + SNR HH†)
Non-ergodic Conditions

Often, however, H is held approximately constant during the span of a
codeword.

Outage capacity (cumulative distribution of mutual information):

	P_out(R) = P[ log det(I + SNR HH†) < R ]

The normalized mutual information converges a.s. to its expectation as
K, N → ∞ (hardening / self-averaging):

	(1/N) log det(I + SNR HH†)  →a.s.  V_{HH†}(SNR) = lim_{N→∞} (1/N) E[log det(I + SNR HH†)]

However, the non-normalized mutual information

	I(SNR, HH†) = log det(I + SNR HH†)

still suffers random fluctuations that, while small relative to the mean, are
vital to the outage capacity.
CLT for Linear Spectral Statistics

Z. D. Bai and J. W. Silverstein, "CLT of linear spectral statistics of
large-dimensional sample covariance matrices," Annals of Probability, vol. 32,
no. 1A, pp. 553-605, 2004.
IID Channel

As K, N → ∞ with K/N → β, the random variable

	Δ_N = log det(I + SNR HH†) − N V_{HH†}(SNR)

is asymptotically zero-mean Gaussian with variance

	E[Δ²] = −log( 1 − (1 − η_{HH†}(SNR))² / β )
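Both the mean N V_{HH†}(SNR) and the variance formula can be probed by simulation. A sketch, assuming complex Gaussian entries of variance 1/N; η is obtained from the Marcenko-Pastur fixed point (1 − η)(1 + SNR η) = β SNR η, and the sizes and tolerances are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
N, K, snr = 100, 50, 10.0
beta = K / N

# eta-transform of HH^dagger from (1-eta)(1+snr*eta) = beta*snr*eta
c = beta * snr - snr + 1
eta = (-c + np.sqrt(c ** 2 + 4 * snr)) / (2 * snr)
mean_pred = N * (beta * np.log(1 + snr * eta) - np.log(eta) + (eta - 1))  # nats
var_pred = -np.log(1 - (1 - eta) ** 2 / beta)                             # nats^2

trials = 2000
vals = np.empty(trials)
for t in range(trials):
    H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
    _, ld = np.linalg.slogdet(np.eye(N) + snr * (H @ H.conj().T))
    vals[t] = ld
var_emp = vals.var()
print(mean_pred, vals.mean(), var_pred, var_emp)
```

Note how the variance stays O(1) while the mean grows like N: this is the "small relative to the mean, but vital to the outage capacity" fluctuation.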
IID Channel

For fixed numbers of antennas, mean and variance of the mutual
information of the IID channel given by [Smith & Shafi 02] and [Wang
& Giannakis 04]. Approximate normality observed numerically.

Arguments supporting the asymptotic normality of the cumulative
distribution of mutual information given:

in [Hochwald et al. 04], for SNR → 0 or SNR → ∞;

in [Moustakas et al. 03], using the replica method from statistical
physics (not yet fully rigorized);

in [Kamath et al. 02], asymptotic normality proved rigorously for any
SNR using Bai & Silverstein's CLT.
One-Side Correlated Wireless Channel (H = S T^{1/2}) [Tulino-Verdu, 2004]

Theorem: As K, N → ∞ with K/N → β, the random variable

	Δ_N = log det(I + SNR S T S†) − N V_{STS†}(SNR)

is asymptotically zero-mean Gaussian with variance

	E[Δ²] = −log( 1 − β E[ ( T SNR η / (1 + T SNR η) )² ] ),   η = η_{STS†}(SNR),

with expectation over the nonnegative random variable T whose
distribution equals the asymptotic ESD of T.
Examples

In the examples that follow, the transmit antennas are correlated with

	(T)_{i,j} = e^{−0.2 (i−j)²}

which is typical of an elevated base station in suburbia. The receive
antennas are uncorrelated.

The outage capacity is computed by applying our asymptotic formulas to
finite (and small) matrices:

	V_{STS†}(SNR) ≈ (1/N) Σ_{j=1}^{K} log(1 + SNR λ_j(T) η) − log η + (η − 1) log e

	η = 1 / ( 1 + SNR (β/K) Σ_{j=1}^{K} λ_j(T) / (1 + SNR λ_j(T) η) )

	E[Δ²] ≈ −log( 1 − (β/K) Σ_{j=1}^{K} ( λ_j(T) SNR η / (1 + λ_j(T) SNR η) )² )
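Combining the three formulas gives a Gaussian approximation of the outage capacity that can be compared with the empirical 10% quantile. A sketch; N = K = 32 (larger than the arrays in the examples, so that the asymptotics are sharp) and the tolerances are illustrative choices:

```python
import numpy as np
from statistics import NormalDist

def outage_approx(snr, lamT, N, prob=0.1, iters=3000):
    """Gaussian approximation of the outage capacity (bits) of H = S T^{1/2}."""
    K = len(lamT)
    beta = K / N
    eta = 1.0
    for _ in range(iters):   # damped fixed point for eta
        eta = 0.5 * eta + 0.5 / (1.0 + snr * (beta / K) * np.sum(lamT / (1.0 + snr * lamT * eta)))
    mean = (np.sum(np.log(1.0 + snr * lamT * eta)) - N * np.log(eta) + N * (eta - 1.0)) / np.log(2)
    x = snr * lamT * eta / (1.0 + snr * lamT * eta)
    var = -np.log(1.0 - (beta / K) * np.sum(x ** 2)) / np.log(2) ** 2   # bits^2
    return mean + np.sqrt(var) * NormalDist().inv_cdf(prob)

rng = np.random.default_rng(7)
N = K = 32
snr = 10.0
i = np.arange(K)
T = np.exp(-0.2 * (i[:, None] - i[None, :]) ** 2)
lamT = np.clip(np.linalg.eigvalsh(T), 0.0, None)
approx = outage_approx(snr, lamT, N)

Th = np.linalg.cholesky(T + 1e-8 * np.eye(K))
samples = []
for _ in range(1500):
    S = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2 * N)
    H = S @ Th.conj().T
    _, ld = np.linalg.slogdet(np.eye(N) + snr * (H @ H.conj().T))
    samples.append(ld / np.log(2))
q = float(np.quantile(samples, 0.1))
print(approx, q)
```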
Example: Histogram

[Figure: histogram of the mutual information around its mean, against the
Gaussian approximation.]

K = 5 transmit and N = 10 receive antennas, SNR = 10.
Example: 10%-Outage Capacity (K = N = 2)

[Figure: 10%-outage capacity (bits/s/Hz) vs. SNR (dB), 0 to 40 dB;
Gaussian approximation vs. simulation, for a transmitter with K = 2
antennas and a receiver with N = 2 antennas.]

	SNR (dB)   Simul.   Asympt.
	   0        0.52     0.50
	  10        2.28     2.27
Example: 10%-Outage Capacity (K = 4, N = 2)

[Figure: 10%-outage capacity (bits/s/Hz) vs. SNR (dB), 0 to 40 dB;
Gaussian approximation vs. simulation, for a transmitter with K = 4
antennas and a receiver with N = 2 antennas.]
Summary

Various wireless communication channels: analysis tackled with the aid
of random matrix theory.

Shannon and η-transforms, motivated by the application of random
matrices to the theory of noisy communication channels.

Shannon and η-transforms for the asymptotic ESD of several classes
of random matrices.

Application of the various findings to the analysis of several wireless
channels in both the ergodic and non-ergodic regimes.

Succinct expressions for the asymptotic performance measures.

Applicability of these asymptotic results to finite-size communication
systems.
Reference

A. M. Tulino and S. Verdu, "Random Matrices and Wireless Communications,"
Foundations and Trends in Communications and Information Theory,
vol. 1, no. 1, June 2004.
http://dx.doi.org/10.1561/0100000001
Theory of Large Dimensional
Random Matrices for Engineers (Part II)

Jack Silverstein
North Carolina State University

The 9th International Symposium on Spread Spectrum Techniques and Applications,
Manaus, Amazon, Brazil,
August 28-31, 2006
1. Introduction. Let M(R) denote the collection of all sub-
probability distribution functions on R. We say, for {F_N} ⊂ M(R), that
F_N converges vaguely to F ∈ M(R) (written F_N →v F) if for all
[a, b], with a, b continuity points of F, lim_{N→∞} F_N{[a, b]} = F{[a, b]}. We
write F_N →D F when F_N, F are probability distribution functions
(equivalent to lim_{N→∞} F_N(a) = F(a) for all continuity points a of F).

For F ∈ M(R),

	S_F(z) ≡ ∫ 1/(x − z) dF(x),   z ∈ C⁺ ≡ {z ∈ C : Im z > 0}

is defined as the Stieltjes transform of F.
Properties:

1. S_F is an analytic function on C⁺.

2. Im S_F(z) > 0.

3. |S_F(z)| ≤ 1 / Im z.

4. For continuity points a < b of F,

	F{[a, b]} = (1/π) lim_{ε→0⁺} ∫_a^b Im S_F(ξ + iε) dξ.

5. If, for x₀ ∈ R, Im S_F(x₀) ≡ lim_{z∈C⁺→x₀} Im S_F(z) exists, then F is
differentiable at x₀ with derivative (1/π) Im S_F(x₀) (Silverstein and Choi
(1995)).
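The inversion formula in property 4 can be checked numerically for a distribution whose Stieltjes transform is known in closed form. Here F is taken (as an illustrative choice) to be the uniform distribution on [0, 1], for which S_F(z) = log((1 − z)/(−z)):

```python
import numpy as np

# F = uniform on [0, 1]:  S_F(z) = log((1 - z)/(-z))  (principal branch)
def S_uniform(z):
    return np.log((1 - z) / (-z))

a, b, eps = 0.2, 0.7, 1e-5
xi = np.linspace(a, b, 200001)
# (1/pi) * integral of Im S_F(xi + i*eps) over [a, b], small eps
mass = np.mean(S_uniform(xi + 1j * eps).imag) * (b - a) / np.pi
print(mass)   # approaches F{[a, b]} = b - a = 0.5 as eps -> 0
```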
Let S ⊂ C⁺ be countable with a cluster point in C⁺. Using 4., the
fact that F_N →v F is equivalent to

	∫ f(x) dF_N(x) → ∫ f(x) dF(x)

for all continuous f vanishing at ±∞, and the fact that an analytic
function defined on C⁺ is uniquely determined by the values it takes
on S, we have

	F_N →v F  ⟺  S_{F_N}(z) → S_F(z) for all z ∈ S.
The fundamental connection to random matrices:

For any Hermitian N×N matrix A, we let F^A denote the empirical
distribution function, or empirical spectral distribution (ESD), of its
eigenvalues:

	F^A(x) = (1/N) (number of eigenvalues of A ≤ x).

Then

	S_{F^A}(z) = (1/N) tr (A − zI)⁻¹.

So, if we have a sequence {A_N} of Hermitian random matrices, to show,
with probability one, F^{A_N} →v F for some F ∈ M(R), it is equivalent
to show, for any z ∈ C⁺,

	(1/N) tr (A_N − zI)⁻¹ → S_F(z)   a.s.

For the remainder of the lecture, S_A will denote S_{F^A}.
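Since S_{F^A}(z) = (1/N) tr (A − zI)⁻¹ = (1/N) Σ_k 1/(λ_k − z), the transform of an ESD can be computed in two equivalent ways; a quick numerical check (the test matrix and the evaluation point z are arbitrary choices), which also exhibits properties 2 and 3:

```python
import numpy as np

def stieltjes_esd(A, z):
    """S_{F^A}(z) = (1/N) tr (A - zI)^{-1} for Hermitian A."""
    N = A.shape[0]
    return np.trace(np.linalg.inv(A - z * np.eye(N))) / N

rng = np.random.default_rng(0)
N = 200
X = rng.standard_normal((N, N))
A = (X + X.T) / 2                       # Hermitian (real symmetric) test matrix

z = 0.3 + 1.0j
s1 = stieltjes_esd(A, z)
lam = np.linalg.eigvalsh(A)
s2 = np.mean(1.0 / (lam - z))           # eigenvalue form of the same quantity
print(abs(s1 - s2))                     # agree to numerical precision
```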
The main goal of this part of the tutorial is to present results
on the limiting ESD of three classes of random matrices. The results
are expressed in terms of limit theorems, involving convergence of the
Stieltjes transforms of the ESDs. An outline of the proof of the first
result will be given. The proof will clearly indicate the importance of the
Stieltjes transform to limiting spectral behavior. Essential properties
needed in the proof will be emphasized in order to better understand
where randomness comes in and where basic properties of matrices are
used.
For each of the theorems, it is assumed that the sequences of random
matrices are defined on a common probability space. They all assume:

For N = 1, 2, . . ., X = X_N = (X_{N ij}), N×K, X_{N ij} ∈ C, identically
distributed for all N, i, j, independent across i, j for each N,
E|X_{1 11} − E X_{1 11}|² = 1, and K = K(N) with K/N → β > 0 as N → ∞.

Let S = S_N = (1/√N) X_N.
Theorem 1.1 (Marcenko and Pastur (1967), Silverstein and Bai
(1995)). Let T be a K×K real diagonal random matrix whose ESD
converges almost surely in distribution, as N → ∞, to a nonrandom
limit. Let T denote a random variable with this limiting distribution.
Let W_0 be an N×N Hermitian random matrix with ESD converging,
almost surely, vaguely to a nonrandom distribution W_0 with Stieltjes
transform denoted by S_0. Assume S, T, and W_0 to be independent.
Then the ESD of

	W = W_0 + S T S†

converges vaguely, as N → ∞, almost surely to a nonrandom distribu-
tion whose Stieltjes transform, S(·), satisfies for z ∈ C⁺

(1.1)	S(z) = S_0( z − β E[ T / (1 + T S(z)) ] ).

It is the only solution to (1.1) in C⁺.
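Equation (1.1) can be solved by damped fixed-point iteration and compared against the trace of the resolvent of a single realization of W. A sketch, assuming real Gaussian entries, W_0 = I (so S_0(z) = 1/(1 − z)), and a two-point distribution for T; all are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 400, 200
beta = K / N
z = 0.5 + 1.0j

# W0 = I; T diagonal with i.i.d. entries from {1, 3}
t = rng.choice([1.0, 3.0], size=K)
S_mat = rng.standard_normal((N, K)) / np.sqrt(N)
W = np.eye(N) + (S_mat * t) @ S_mat.T        # W0 + S T S^T

emp = np.mean(1.0 / (np.linalg.eigvalsh(W) - z))   # (1/N) tr (W - zI)^{-1}

# Damped fixed-point iteration of (1.1): S = S0(z - beta E[T/(1+T S)])
S = 1j
for _ in range(1000):
    S = 0.5 * S + 0.5 / (1.0 - (z - beta * np.mean(t / (1.0 + t * S))))
print(abs(S - emp))   # small for large N
```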
Theorem 1.2 (Silverstein, in preparation). Define H = CSA,
where C is N×N and A is K×K, both random. Assume that the
ESDs of D = CC† and T = A†A converge almost surely in distri-
bution to nonrandom limits, and let D and T denote random variables
distributed, respectively, according to those limits. Assume C, A and
S to be independent. Then the ESD of HH† converges in distribu-
tion, as N → ∞, almost surely to a nonrandom limit whose Stieltjes
transform, S(·), is given for z ∈ C⁺ by

	S(z) = E[ 1 / ( β D E[ T / (1 + Γ(z) T) ] − z ) ],

where Γ(z) satisfies

(1.2)	Γ(z) = E[ D / ( β D E[ T / (1 + Γ(z) T) ] − z ) ].

Γ(z) is the only solution to (1.2) in C⁺. (The inner expectation is over T,
the outer over D.)
Theorem 1.3 (Dozier and Silverstein). Let H_0 be N×K, ran-
dom, independent of S, such that the ESD of H_0 H_0† converges almost
surely in distribution to a nonrandom limit, and let M denote a ran-
dom variable with this limiting distribution. Let κ > 0 be nonrandom.
Define

	H = S + √κ H_0.

Then the ESD of HH† converges in distribution, as N → ∞, almost
surely to a nonrandom limit whose Stieltjes transform S satisfies for
each z ∈ C⁺

(1.3)	S(z) = E[ 1 / ( κM/(1 + S(z)) − z (1 + S(z)) + (β − 1) ) ].

S(z) is the only solution to (1.3) with both S(z) and z S(z) in C⁺.
Remark: In Theorem 1.1, if W_0 = 0 for all N large, then S_0(z) =
−1/z and we find that S = S(z) has an inverse

(1.4)	z = −1/S + β E[ T / (1 + T S) ].

All of the analytic behavior of the limiting distribution can be extracted
from this equation (Silverstein and Choi).

Explicit solutions can be derived in a few cases. Consider the
Marcenko-Pastur distribution, where T = I, that is, the matrix is
simply SS†. Then S = S(z) solves

	z = −1/S + β/(1 + S),

resulting in the quadratic equation

	zS² + S(z + 1 − β) + 1 = 0
with solution

	S = [ −(z + 1 − β) + √((z + 1 − β)² − 4z) ] / (2z)

	  = [ −(z + 1 − β) + √(z² − 2z(1 + β) + (1 − β)²) ] / (2z)

	  = [ −(z + 1 − β) + √((z − (1 − √β)²)(z − (1 + √β)²)) ] / (2z).

We see the imaginary part of S goes to zero when z approaches the real
line and lies outside the interval [(1 − √β)², (1 + √β)²], so we conclude
from property 5. that for all x ≠ 0 the limiting distribution has a
density f given by

	f(x) = √((x − (1 − √β)²)((1 + √β)² − x)) / (2πx)   for x ∈ ((1 − √β)², (1 + √β)²),

	f(x) = 0   otherwise.
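The density (and the point mass at zero discussed next) is easy to confirm numerically; a sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 1000, 500
beta = K / N
S = rng.standard_normal((N, K)) / np.sqrt(N)
lam = np.linalg.eigvalsh(S @ S.T)            # N - K of these are exactly 0

a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2
mid = 0.5 * (a + b)
x = np.linspace(a, mid, 4001)
f = np.sqrt((x - a) * (b - x)) / (2 * np.pi * x)
pred = float(np.sum(f) * (x[1] - x[0]))      # integral of the density over [a, mid]
emp = float(np.mean((lam >= a) & (lam <= mid)))
print(emp, pred)                             # empirical vs predicted mass
```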
Considering the value of β (the limit of the number of columns to rows) we can
conclude that the limiting distribution has no mass at zero when β ≥ 1,
and has mass 1 − β at zero when β < 1.

[. . .]

Lemma 2.4. Let z_1, z_2 ∈ C⁺ with min(Im z_1, Im z_2) ≥ v > 0, A and
B N×N with A Hermitian, and q ∈ C^N. Then

	| tr B((A − z_1 I)⁻¹ − (A − z_2 I)⁻¹) | ≤ |z_2 − z_1| N ‖B‖ (1/v²),   and

	| q* B (A − z_1 I)⁻¹ q − q* B (A − z_2 I)⁻¹ q | ≤ |z_2 − z_1| ‖q‖² ‖B‖ (1/v²).
We now outline the proof of Theorem 1.1. Write T = diag(t_1, . . . , t_K).
Let q_i denote the i-th column of S. Then

	S T S† = Σ_{i=1}^{K} t_i q_i q_i*.

Let W(i) = W − t_i q_i q_i*. For any z ∈ C⁺ and x ∈ C we write

	W − zI = W_0 − (z − x)I + S T S† − xI.

Taking inverses we have

	(W_0 − (z − x)I)⁻¹
	= (W − zI)⁻¹ + (W_0 − (z − x)I)⁻¹ (S T S† − xI)(W − zI)⁻¹.
Dividing by N, taking traces and using Lemma 2.1 we find

	S_{W_0}(z − x) − S_W(z)
	= (1/N) tr (W_0 − (z − x)I)⁻¹ ( Σ_{i=1}^{K} t_i q_i q_i* − xI ) (W − zI)⁻¹

	= (1/N) Σ_{i=1}^{K} [ t_i q_i* (W(i) − zI)⁻¹ (W_0 − (z − x)I)⁻¹ q_i ] / [ 1 + t_i q_i* (W(i) − zI)⁻¹ q_i ]
	  − x (1/N) tr (W − zI)⁻¹ (W_0 − (z − x)I)⁻¹.

Notice when x and q_i are independent, Lemmas 2.2, 2.3 give us

	q_i* (W(i) − zI)⁻¹ (W_0 − (z − x)I)⁻¹ q_i ≈ (1/N) tr (W − zI)⁻¹ (W_0 − (z − x)I)⁻¹.
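The concentration step invoked here (Lemma 2.3, not shown in this excerpt) is that a quadratic form q*Bq, with q a normalized random vector independent of B, concentrates around (1/N) tr B. A minimal numerical illustration of that principle (the choice of B is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
avg = {}
for N in (50, 800):
    M = rng.standard_normal((N, N))
    B = M @ M.T / N                              # a bounded-spectral-norm test matrix
    errs = []
    for _ in range(100):
        q = rng.standard_normal(N) / np.sqrt(N)  # plays the role of a column of S
        errs.append(abs(q @ B @ q - np.trace(B) / N))
    avg[N] = float(np.mean(errs))
print(avg)   # the average deviation shrinks as N grows (roughly like 1/sqrt(N))
```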
Letting

	x = x_N = (1/N) Σ_{i=1}^{K} t_i / (1 + t_i S_W(z))

we have

	S_{W_0}(z − x_N) − S_W(z) = (1/N) Σ_{i=1}^{K} [ t_i / (1 + t_i S_W(z)) ] d_i

where

	d_i = [ (1 + t_i S_W(z)) / (1 + t_i q_i* (W(i) − zI)⁻¹ q_i) ]
	        × q_i* (W(i) − zI)⁻¹ (W_0 − (z − x_N)I)⁻¹ q_i
	      − (1/N) tr (W − zI)⁻¹ (W_0 − (z − x_N)I)⁻¹.

In order to use Lemma 2.3, for each i, x_N is replaced by

	x(i) = (1/N) Σ_{j=1}^{K} t_j / (1 + t_j S_{W(i)}(z)).
Using Lemma 2.3 (p = 6 is sufficient) and the fact that all matrix
inverses encountered are bounded in spectral norm by 1/Im z, we have,
from standard arguments using Boole's and Markov's inequalities and
the Borel-Cantelli lemma, almost surely

(2.1)	max_{i≤K} max[ | ‖q_i‖² − 1 |,  | q_i* (W(i) − zI)⁻¹ q_i − S_{W(i)}(z) |,
	| q_i* (W(i) − zI)⁻¹ (W_0 − (z − x(i))I)⁻¹ q_i
	  − (1/N) tr (W(i) − zI)⁻¹ (W_0 − (z − x(i))I)⁻¹ | ]  → 0   as N → ∞.

This and Lemma 2.2 imply, almost surely,

(2.2)	max_{i≤K} max[ | S_W(z) − S_{W(i)}(z) |,  | S_W(z) − q_i* (W(i) − zI)⁻¹ q_i | ] → 0,
and subsequently, almost surely,

(2.3)	max_{i≤K} max[ | (1 + t_i S_W(z)) / (1 + t_i q_i* (W(i) − zI)⁻¹ q_i) − 1 |,
	               | x_N − x(i) | ]  → 0.

Therefore, from Lemmas 2.2, 2.4, and (2.1)-(2.3), we get max_{i≤K} |d_i|
→ 0 almost surely, giving us

	S_{W_0}(z − x_N) − S_W(z) → 0,

almost surely.
On any realization for which the above holds and F^{W_0} →v W_0,
consider any subsequence on which S_W(z) converges, to S say; then, on
this subsequence,

	x_N = (K/N) (1/K) Σ_{i=1}^{K} t_i / (1 + t_i S_W(z))  →  β E[ T / (1 + T S) ].

Therefore, in the limit we have

	S = S_0( z − β E[ T / (1 + T S) ] ),

which is (1.1). Uniqueness gives us, for this realization, S_W(z) → S as
N → ∞. This event occurs with probability one.
3. Proof of uniqueness of (1.1). For S ∈ C⁺ satisfying (1.1)
with z ∈ C⁺ we have

	S = ∫ 1 / ( λ − z + β E[ T / (1 + T S) ] ) dW_0(λ).

Since Im( −z + β E[ T / (1 + T S) ] ) = −( Im z + β E[ T² Im S / |1 + T S|² ] ),
taking imaginary parts gives

(3.1)	Im S = ( Im z + β E[ T² Im S / |1 + T S|² ] )
	        × ∫ 1 / | λ − z + β E[ T / (1 + T S) ] |² dW_0(λ).
Suppose S̄ ∈ C⁺ also satisfies (1.1). Then

(3.2)	S − S̄ = ∫ ( β E[ T / (1 + T S̄) ] − β E[ T / (1 + T S) ] )
	         / ( ( λ − z + β E[ T/(1+TS) ] )( λ − z + β E[ T/(1+TS̄) ] ) ) dW_0(λ)

	= (S − S̄) β E[ T² / ((1 + T S)(1 + T S̄)) ]
	  × ∫ 1 / ( ( λ − z + β E[ T/(1+TS) ] )( λ − z + β E[ T/(1+TS̄) ] ) ) dW_0(λ).

Using Cauchy-Schwarz and (3.1) we have

	| β E[ T² / ((1 + T S)(1 + T S̄)) ]
	  ∫ 1 / ( ( λ − z + β E[ T/(1+TS) ] )( λ − z + β E[ T/(1+TS̄) ] ) ) dW_0(λ) |

	≤ ( β E[ T² / |1 + T S|² ] ∫ 1 / | λ − z + β E[ T/(1+TS) ] |² dW_0(λ) )^{1/2}
	  × ( β E[ T² / |1 + T S̄|² ] ∫ 1 / | λ − z + β E[ T/(1+TS̄) ] |² dW_0(λ) )^{1/2}

	= ( β E[ T² / |1 + T S|² ] · Im S / ( Im z + β E[ T² Im S / |1 + T S|² ] ) )^{1/2}
	  × ( β E[ T² / |1 + T S̄|² ] · Im S̄ / ( Im z + β E[ T² Im S̄ / |1 + T S̄|² ] ) )^{1/2}

	< 1,

since Im z > 0. Therefore S = S̄.