Topics in Eigen-analysis
Lin Zanjiang
28 July 2014
Contents
1 Terminology
2 Some Basic Properties and Results
3 Eigen-properties of Hermitian Matrices
3.1 Basic Theorems
3.2 Quadratic Forms & Nonnegative Definite Matrices
3.2.1 Definitions
3.2.2 Eigenvalue Properties of Nonnegative Definite Matrices
4 Inequalities and Extremal Properties of Eigenvalues
4.1 The Rayleigh Quotient and the Courant-Fischer Min-max Theorem
4.2 Some Eigenvalue Inequalities
4.3 Application to Principal Component Analysis (PCA)
1 Terminology
(1) 𝑐𝐴(𝑥) ∶= det(𝑥𝐼 − 𝐴) ≝ the characteristic polynomial of 𝐴
(2) The eigenvalues 𝜆 of 𝐴 ∶= the roots of 𝑐𝐴(𝑥)
(3) The 𝜆 −eigenvectors 𝒙 ∶= the nonzero solutions to (𝜆𝐼 − 𝐴)𝒙 = 𝟎
(4) The eigenvalue-eigenvector equation: 𝐴𝒙 = 𝜆𝒙
(5) 𝑆𝐴(𝜆) ∶= The eigenspace of a matrix 𝐴 corresponding to the eigenvalue 𝜆
(6) The characteristic equation: |𝑥𝐼 − 𝐴| = 0
(7) Standard inner product: ⟨𝒛, 𝒘⟩ = 𝒘𝐻𝒛 = $z_1\overline{w_1} + \cdots + z_n\overline{w_n}$, for 𝒛, 𝒘 ∈ ℂ𝑛
2 Some Basic Properties and Results
Theorem 2.1
(a) The eigenvalues of 𝐴 are the same as the eigenvalues of 𝐴𝑇
(b) 𝐴 is singular if and only if at least one eigenvalue of 𝐴 is equal to zero
(c) The eigenvalues and corresponding geometric multiplicities of 𝐵𝐴𝐵−1 are the
same as those of 𝐴, if 𝐵 is a nonsingular matrix
(d) The modulus of each eigenvalue of 𝐴 is equal to 1 if 𝐴 is an orthogonal matrix
Theorem 2.2 (revision)
Suppose that 𝜆 is an eigenvalue, with multiplicity 𝑟 ≥ 1, of the 𝑛 × 𝑛 matrix 𝐴. Then
1 ≤ dim{𝑆𝐴(𝜆)} ≤ 𝑟
𝑃𝑟𝑜𝑜𝑓. If 𝜆 is an eigenvalue of 𝐴 , then by definition an 𝒙 ≠ 𝟎 satisfying the
equation 𝐴𝒙 = 𝜆𝒙 exists, and so clearly dim {𝑆𝐴(𝜆)} ≥ 1. Now let k = dim {𝑆𝐴(𝜆)},
and let 𝒙1, ⋯ , 𝒙𝑘 be linearly independent eigenvectors corresponding to 𝜆. Form a
nonsingular 𝑛 × 𝑛 matrix 𝑋 that has these 𝑘 vectors as its first 𝑘 columns; that is,
𝑋 has the form 𝑋 = [𝑋1 𝑋2] , where 𝑋1 = (𝒙1,⋯ , 𝒙𝑘) and 𝑋2 is 𝑛 × (𝑛 − 𝑘) .
Since each column of 𝑋1 is an eigenvector of 𝐴 corresponding to the eigenvalue 𝜆, it
follows that 𝐴𝑋1 = 𝜆𝑋1, and
$$X^{-1}X_1 = \begin{bmatrix} I_k \\ 0 \end{bmatrix},$$
which follows from the fact that 𝑋−1𝑋 = 𝐼𝑛. As a result, we find that
$$X^{-1}AX = X^{-1}[A X_1 \;\; A X_2] = X^{-1}[\lambda X_1 \;\; A X_2] = \begin{bmatrix} \lambda I_k & B_1 \\ 0 & B_2 \end{bmatrix},$$

where $B_1$ (of size $k \times (n-k)$) and $B_2$ (of size $(n-k) \times (n-k)$) partition the matrix $X^{-1}AX_2$. If $\mu$ is an eigenvalue of $X^{-1}AX$, then

$$0 = |X^{-1}AX - \mu I_n| = \begin{vmatrix} (\lambda-\mu)I_k & B_1 \\ 0 & B_2 - \mu I_{n-k} \end{vmatrix} = (\lambda-\mu)^k\,|B_2 - \mu I_{n-k}|.$$
Thus, 𝜆 must be an eigenvalue of 𝑋−1𝐴𝑋 with multiplicity of at least 𝑘. The result
follows because from Theorem 2.1(c), the eigenvalues and corresponding geometric
multiplicities of 𝑋−1𝐴𝑋 are the same as those of 𝐴.
Theorem 2.3
Let 𝜆 be an eigenvalue of the 𝑛 × 𝑛 matrix 𝐴 , and let 𝒙 be a corresponding
eigenvector. Then,
(a) If 𝑘 ≥ 1 is an integer, 𝜆𝑘 is an eigenvalue of 𝐴𝑘 corresponding to the
eigenvector 𝒙.
(b) If 𝐴 is nonsingular, 𝜆−1 is an eigenvalue of 𝐴−1 corresponding to the
eigenvector 𝒙.
𝑃𝑟𝑜𝑜𝑓.
(a) Let us prove by induction. Clearly, (a) holds when 𝑘 = 1 because it follows from
the definition of eigenvalue and eigenvector. Next, if (a) holds for 𝑘 − 1, that is,
𝐴𝑘−1𝒙 = 𝜆𝑘−1𝒙, then
𝐴𝑘𝒙 = 𝐴(𝐴𝑘−1𝒙) = 𝐴(𝜆𝑘−1𝒙)
= 𝜆𝑘−1(𝐴𝒙) = 𝜆𝑘−1(𝜆𝒙) = 𝜆𝑘𝒙
(b) Let us premultiply the equation 𝐴𝒙 = 𝜆𝒙 by 𝐴−1, which gives the equation
𝒙 = 𝜆𝐴−1𝒙
Since 𝐴 is nonsingular, we know from Theorem 2.1(b) that 𝜆 ≠ 0, and so dividing
both sides of the above equation by 𝜆 yields
𝐴−1𝒙 = 𝜆−1𝒙 ,
which implies that 𝐴−1 has an eigenvalue 𝜆−1 and corresponding eigenvector 𝒙.
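Theorem 2.3 is easy to check numerically. The following NumPy sketch (the matrix and exponent are arbitrary illustrative choices, not from the notes) verifies parts (a) and (b) for 𝑘 = 3:

```python
import numpy as np

# An arbitrary symmetric test matrix with eigenvalues (5 ± sqrt(5))/2.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eig_A = np.sort(np.linalg.eigvals(A).real)
eig_A3 = np.sort(np.linalg.eigvals(np.linalg.matrix_power(A, 3)).real)
eig_Ainv = np.sort(np.linalg.eigvals(np.linalg.inv(A)).real)

# (a): the eigenvalues of A^3 are the cubes of the eigenvalues of A.
print(np.allclose(eig_A3, np.sort(eig_A ** 3)))     # True
# (b): the eigenvalues of A^{-1} are the reciprocals.
print(np.allclose(eig_Ainv, np.sort(1.0 / eig_A)))  # True
```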
Theorem 2.4
Let 𝐴 be an 𝑛 × 𝑛 matrix with eigenvalues 𝜆1, ⋯ , 𝜆𝑛. Then
(a) $\mathrm{tr}(A) = \sum_{i=1}^{n}\lambda_i$
(b) $|A| = \prod_{i=1}^{n}\lambda_i$
𝑃𝑟𝑜𝑜𝑓.
Expand the characteristic equation $|xI_n - A| = 0$ into the polynomial form

$$x^n + \alpha_{n-1}x^{n-1} + \cdots + \alpha_1 x + \alpha_0 = 0$$

To determine $\alpha_0$, substitute $x = 0$ into the equation: $\alpha_0 = |(0)I_n - A| = |-A| = (-1)^n|A|$. To determine $\alpha_{n-1}$, we find the coefficient of $x^{n-1}$. Recall that the determinant is a sum of signed terms, each a product of one entry from every row, with the column positions running over all permutations of $(1, 2, \cdots, n)$. The only such term in $|xI_n - A|$ that involves at least $n-1$ of the diagonal entries is the product of all the diagonal entries; since this term corresponds to the identity permutation, which is even, its sign is $+1$. Therefore $\alpha_{n-1}$ is the coefficient of $x^{n-1}$ in

$$(x - a_{11})(x - a_{22})\cdots(x - a_{nn}),$$
which is obviously $-\mathrm{tr}(A)$. Finally, since $\lambda_1, \cdots, \lambda_n$ are the roots of the characteristic equation, we also have

$$(x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_n) = 0$$

Matching the coefficients $\alpha_0$ and $\alpha_{n-1}$ of the two factorizations, we find that

$$|A| = \prod_{i=1}^{n}\lambda_i, \qquad \mathrm{tr}(A) = \sum_{i=1}^{n}\lambda_i,$$

which completes the proof.
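Theorem 2.4 can likewise be verified numerically. In this NumPy sketch (an arbitrary random, non-symmetric matrix, not from the notes), any complex eigenvalues come in conjugate pairs, so the sum and product are real up to rounding:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))      # a generic (non-symmetric) real matrix
lam = np.linalg.eigvals(A)           # possibly complex eigenvalues

# Theorem 2.4: trace = sum of eigenvalues, determinant = product.
print(np.isclose(lam.sum().real, np.trace(A)))        # True
print(np.isclose(lam.prod().real, np.linalg.det(A)))  # True
```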
3 Eigen-properties of Hermitian Matrices
3.1 Basic Theorems
Theorem 3.1.1
Let 𝐴 ∈ 𝑀𝑛×𝑛 be a matrix, then 𝐴 is hermitian if and only if
⟨𝐴𝒙, 𝒚⟩ = ⟨𝒙, 𝐴𝒚⟩
for all vectors 𝒙, 𝒚 ∈ ℂ𝑛.
𝑃𝑟𝑜𝑜𝑓.
(1) If 𝐴 is hermitian, then ⟨𝐴𝒙, 𝒚⟩ = 𝒚𝐻𝐴𝒙 = 𝒚𝐻𝐴𝐻𝒙 = (𝐴𝒚)𝐻𝒙 = ⟨𝒙, 𝐴𝒚⟩, which proves the "⟹" part.
(2) For the converse, let 𝒙 and 𝒚 range over the standard basis vectors 𝒆𝟏, ⋯ , 𝒆𝒏 of ℂ𝑛. Taking 𝒙 = 𝒆𝒊 and 𝒚 = 𝒆𝒋 gives $a_{ji} = \overline{a_{ij}}$ for all 𝑖, 𝑗 = 1, ⋯ , 𝑛, so 𝐴 is hermitian.
Theorem 3.1.2 (Schur’s Theorem Revisited)
Let 𝐴 ∈ 𝑀𝑛×𝑛 be a matrix. Then there exists a unitary matrix 𝑈 such that
𝑈𝐻𝐴𝑈 = 𝑇
is upper triangular; this is Schur’s Theorem. It is just the complex counterpart of the Triangulation Theorem we learnt in class. With a minor rearrangement, we may write
𝐴 = 𝑈𝑇𝑈𝐻
This is called the Schur Decomposition, and the diagonal entries of 𝑇 are the eigenvalues of 𝐴.
3.1.3 Theorem (The Spectral Theorem Revisited)
If the 𝐴 in the last theorem turns out to be hermitian, then the corresponding 𝑇 will
become diagonal. This is called the Spectral Theorem. Similarly, it is the complex
extension of the Principal Axis Theorem we learnt in class. Again, with the same
modification, we may rewrite it as
𝐴 = 𝑈𝑇𝑈𝐻
This is called the Spectral Decomposition. These two eigenvalue decompositions are
just special cases of the Singular Value Decomposition which applies to nonsquare
matrices.
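For the hermitian case, the decomposition of this theorem can be computed directly with NumPy's `eigh`. The sketch below uses an arbitrary random hermitian matrix (an illustration, not part of the original notes):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M + M.conj().T) / 2              # an arbitrary hermitian matrix

# Spectral decomposition A = U T U^H with T diagonal and real.
lam, U = np.linalg.eigh(A)            # eigenvalues ascending; columns of U orthonormal
T = np.diag(lam)

print(np.allclose(U @ T @ U.conj().T, A))        # True: A = U T U^H
print(np.allclose(U.conj().T @ U, np.eye(4)))    # True: U is unitary
print(np.all(np.isreal(lam)))                    # True: hermitian eigenvalues are real
```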
3.1.4 Theorem
Let 𝐴 ∈ 𝑀𝑛×𝑛 be a hermitian matrix, then
(a) 𝒙𝐻𝐴𝒙 is real ∀𝒙 ∈ ℂ𝒏.
(b) All eigenvalues of 𝐴 are real
(c) 𝑆𝐻𝐴𝑆 is hermitian for all 𝑆 ∈ 𝑀𝑛×𝑛.
(d) Eigenvectors of 𝐴 corresponding to distinct eigenvalues are orthogonal.
(e) It is possible to construct a set of 𝑛 orthonormal eigenvectors of 𝐴.
𝑃𝑟𝑜𝑜𝑓.
(a) Since 𝒙𝐻𝐴𝒙 is a scalar, its complex conjugate equals $(\boldsymbol{x}^H A\boldsymbol{x})^H = \boldsymbol{x}^H A^H\boldsymbol{x} = \boldsymbol{x}^H A\boldsymbol{x}$, that is, 𝒙𝐻𝐴𝒙 equals its own complex conjugate and hence is real.
(b) (1) If 𝐴𝒙 = 𝜆𝒙 and 𝒙𝐻𝒙 = 𝑘 ∈ ℝ+, then

$$\lambda = \frac{\lambda}{k}\,\boldsymbol{x}^H\boldsymbol{x} = \frac{1}{k}\,\boldsymbol{x}^H(\lambda\boldsymbol{x}) = \frac{1}{k}\,\boldsymbol{x}^H A\boldsymbol{x},$$

which is real by (a).
(2) Alternative proof for (b): let 𝜆, 𝜇 be two eigenvalues of 𝐴 with corresponding eigenvectors 𝒙, 𝒚, so that 𝐴𝒙 = 𝜆𝒙 and 𝐴𝒚 = 𝜇𝒚. According to Theorem 3.1.1, we have $\lambda\langle\boldsymbol{x}, \boldsymbol{y}\rangle = \langle\lambda\boldsymbol{x}, \boldsymbol{y}\rangle = \langle A\boldsymbol{x}, \boldsymbol{y}\rangle = \langle\boldsymbol{x}, A\boldsymbol{y}\rangle = \langle\boldsymbol{x}, \mu\boldsymbol{y}\rangle = \bar{\mu}\langle\boldsymbol{x}, \boldsymbol{y}\rangle$. In the case where 𝜇 = 𝜆 and 𝒚 = 𝒙, this becomes $\lambda\langle\boldsymbol{x}, \boldsymbol{x}\rangle = \bar{\lambda}\langle\boldsymbol{x}, \boldsymbol{x}\rangle$, which implies $\lambda = \bar{\lambda}$ since ⟨𝒙, 𝒙⟩ = ‖𝒙‖2 > 0 for a nonzero eigenvector 𝒙. Therefore 𝜆 must be real, and similarly 𝜇 is real.
(c) (𝑆𝐻𝐴𝑆)𝐻 = 𝑆𝐻𝐴𝐻𝑆 = 𝑆𝐻𝐴𝑆, so 𝑆𝐻𝐴𝑆 is always hermitian.
(d) Continuing the computation in (b)(2), since 𝜇 is real we get 𝜆⟨𝒙, 𝒚⟩ = 𝜇⟨𝒙, 𝒚⟩, so if 𝜆 ≠ 𝜇, it follows immediately that ⟨𝒙, 𝒚⟩ = 0, that is, 𝒙 and 𝒚 are orthogonal.
(e) Following from the Spectral Decomposition, we rewrite it as
$$AU = UT \;\Longleftrightarrow\; A[\boldsymbol{x}_1 \;\cdots\; \boldsymbol{x}_n] = [\boldsymbol{x}_1 \;\cdots\; \boldsymbol{x}_n]\begin{bmatrix}\lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n\end{bmatrix} \;\Longleftrightarrow\; [A\boldsymbol{x}_1 \;\cdots\; A\boldsymbol{x}_n] = [\lambda_1\boldsymbol{x}_1 \;\cdots\; \lambda_n\boldsymbol{x}_n]$$
Therefore, it can be easily seen that (𝒙1 ⋯ 𝒙𝑛) are the corresponding
eigenvectors to (𝜆1 ⋯ 𝜆𝑛). As 𝑈 is unitary, it follows that these eigenvectors
are orthonormal.
3.2 Quadratic Forms & Nonnegative Definite Matrices
3.2.1 Definitions
(a) Let 𝐴 ∈ 𝑀𝑛×𝑛 be a symmetric matrix (a hermitian matrix with real entries) and let 𝒙 denote an 𝑛 × 1 column vector; then 𝑄 = 𝒙𝑇𝐴𝒙 is said to be a quadratic form.
Observe that
$$Q = \boldsymbol{x}^T A\boldsymbol{x} = (x_1 \;\cdots\; x_n)\begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix}\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = (x_1 \;\cdots\; x_n)\begin{pmatrix} \sum_i a_{1i}x_i \\ \vdots \\ \sum_i a_{ni}x_i \end{pmatrix} = \sum_{i,j} a_{ij}x_i x_j$$
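The equality between the matrix form and the double sum can be spot-checked numerically. A small NumPy sketch (the matrix and vector are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2                    # symmetrize to obtain a symmetric A
x = rng.standard_normal(3)

Q_matrix = x @ A @ x                 # the matrix form x^T A x
Q_sum = sum(A[i, j] * x[i] * x[j]    # the double sum over all i, j
            for i in range(3) for j in range(3))
print(np.isclose(Q_matrix, Q_sum))   # True
```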
(b) Let 𝐴 ∈ 𝑀𝑛×𝑛 be a symmetric matrix, then 𝐴 is
(1) Positive definite if 𝑄 = 𝒙𝑇𝐴𝒙 > 0 for all nonzero 𝒙 ∈ ℝ𝑛
(2) Nonnegative definite (positive semidefinite) if 𝑄 = 𝒙𝑇𝐴𝒙 ≥ 0 for all 𝒙 ∈ ℝ𝑛
(3) Negative definite if 𝑄 = 𝒙𝑇𝐴𝒙 < 0 for all nonzero 𝒙 ∈ ℝ𝑛
(4) Nonpositive definite (negative semidefinite) if 𝑄 = 𝒙𝑇𝐴𝒙 ≤ 0 for all 𝒙 ∈ ℝ𝑛
(5) Indefinite if 𝑄 > 0 for some 𝒙 while 𝑄 < 0 for some other 𝒙
We are only interested in positive or nonnegative cases because all theorems will be
similar for negative or nonpositive cases.
3.2.2 Eigenvalue Properties of Nonnegative Definite Matrices
3.2.2.1 Theorem
Let 𝜆1, ⋯ , 𝜆𝑛 be the eigenvalues of the 𝑛 × 𝑛 symmetric matrix 𝐴, then
(a) 𝐴 is positive definite if and only if 𝜆𝑖 > 0 for all 𝑖,
(b) 𝐴 is nonnegative definite if and only if 𝜆𝑖 ≥ 0 for all 𝑖.
𝑃𝑟𝑜𝑜𝑓.
(a) Let the columns of 𝑈 = (𝒙1 ⋯ 𝒙𝑛) be a set of orthonormal eigenvectors of 𝐴
corresponding to the eigenvalues 𝜆1, ⋯ , 𝜆𝑛 , so that 𝐴 = 𝑈𝑇𝑈𝑇 , where 𝑇 =
diag (𝜆1, ⋯ , 𝜆𝑛) . If 𝐴 is positive definite, then 𝒙𝑇𝐴𝒙 > 0 for all 𝒙 ≠ 𝟎 , so in
particular, choosing 𝒙 = 𝒙𝒊, we have
$$\boldsymbol{x}_i^T A\boldsymbol{x}_i = \boldsymbol{x}_i^T(\lambda_i\boldsymbol{x}_i) = \lambda_i\boldsymbol{x}_i^T\boldsymbol{x}_i = \lambda_i > 0$$
Conversely, if 𝜆𝑖 > 0 for all 𝑖, then for any 𝒙 ≠ 0 define 𝒚 = 𝑈𝑇𝒙, and note that
$$\boldsymbol{x}^T A\boldsymbol{x} = \boldsymbol{x}^T U T U^T\boldsymbol{x} = \boldsymbol{y}^T T\boldsymbol{y} = \sum_{i=1}^{n}\lambda_i y_i^2$$
has to be positive, since the 𝜆𝑖 are all positive and at least one of the $y_i^2$ is positive because 𝒚 = 𝑈𝑇𝒙 ≠ 𝟎 (𝑈 is nonsingular).
(b) By similar argument as in (a), it is easy to show that 𝐴 is nonnegative definite if
and only if 𝜆𝑖 ≥ 0 for all 𝑖.
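A numerical illustration of Theorem 3.2.2.1: the matrix $A = B^T B + I$ below is positive definite by construction (an arbitrary example, not from the notes), so all its eigenvalues should be positive and every quadratic form should be positive:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)            # positive definite by construction

lam = np.linalg.eigvalsh(A)        # eigenvalues of a symmetric matrix, ascending order
print(np.all(lam > 0))             # True, as Theorem 3.2.2.1(a) predicts

# Spot-check the definition directly: x^T A x > 0 for random nonzero x.
xs = rng.standard_normal((100, 4))
print(np.all(np.einsum('ij,jk,ik->i', xs, A, xs) > 0))   # True
```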
3.2.2.2 Theorem
Let 𝑇 be an 𝑚 × 𝑛 real matrix with rank(𝑇) = 𝑟. Then
(a) $T^T T$ has 𝑟 positive eigenvalues. It is always nonnegative definite, and positive definite if 𝑟 = 𝑛.
(b) The positive eigenvalues of $T^T T$ are equal to the positive eigenvalues of $TT^T$.
𝑃𝑟𝑜𝑜𝑓.
(a) For any nonzero 𝑛 × 1 vector 𝒙, let 𝒚 = 𝑇𝒙; then

$$\boldsymbol{x}^T T^T T\boldsymbol{x} = \boldsymbol{y}^T\boldsymbol{y} = \sum_{i=1}^{m} y_i^2$$

is nonnegative, so $T^T T$ is nonnegative definite, and thus by Theorem 3.2.2.1(b) all of its eigenvalues are nonnegative. Further, observe that 𝒙 ≠ 𝟎 is an eigenvector of $T^T T$ corresponding to a zero eigenvalue if and only if 𝒚 = 𝑇𝒙 = 𝟎, in which case the above sum equals zero. Therefore the number of zero eigenvalues equals the dimension of null(𝑇), which is 𝑛 − 𝑟, and (a) is proved.
(b) Let 𝜆 > 0 be an eigenvalue of $T^T T$ with multiplicity ℎ. Since the 𝑛 × 𝑛 matrix $T^T T$ is symmetric, we can find an 𝑛 × ℎ matrix 𝑋 with orthonormal columns satisfying

$$T^T T X = \lambda X.$$

Let 𝑌 = 𝑇𝑋 and observe that

$$TT^T Y = T\,T^T T X = T(\lambda X) = \lambda TX = \lambda Y,$$

so 𝜆 is also an eigenvalue of $TT^T$, with multiplicity at least ℎ because

$$\mathrm{rank}(Y) = \mathrm{rank}(TX) = \mathrm{rank}((TX)^T TX) = \mathrm{rank}(X^T T^T T X) = \mathrm{rank}(\lambda X^T X) = \mathrm{rank}(\lambda I_h) = h.$$

Applying the same argument with the roles of $T$ and $T^T$ reversed shows that the multiplicities agree, which completes the proof.
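A quick numerical check of Theorem 3.2.2.2, with an arbitrary 6 × 4 matrix 𝑇 (so 𝑚 = 6 and, generically, 𝑟 = 𝑛 = 4):

```python
import numpy as np

rng = np.random.default_rng(4)
T = rng.standard_normal((6, 4))          # m = 6, n = 4, rank 4 almost surely

lam_TtT = np.linalg.eigvalsh(T.T @ T)    # 4 eigenvalues, ascending
lam_TTt = np.linalg.eigvalsh(T @ T.T)    # 6 eigenvalues, ascending

# T^T T is positive definite here since rank(T) = n = 4.
print(np.all(lam_TtT > 0))                        # True
# T T^T has the same 4 positive eigenvalues plus two zeros.
print(np.allclose(lam_TTt[:2], 0, atol=1e-10))    # True
print(np.allclose(lam_TTt[2:], lam_TtT))          # True
```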
4 Inequalities and Extremal Properties of Eigenvalues
4.1 The Rayleigh Quotient and the Courant-Fischer Min-max
Theorem
In this section, we are going to investigate some extremal properties of the eigenvalues
of a hermitian matrix, and see how to turn the problem of finding the eigenvalues into
a constrained optimization problem.
4.1.1 Definition
Let 𝐴 ∈ 𝑀𝑛×𝑛 be a hermitian matrix, then the Rayleigh quotient of 𝐴, denoted as
𝑅𝐴(𝒙), is a function from ℂ𝑛\{𝟎} to ℝ, defined as follows:
$$R_A(\boldsymbol{x}) = \frac{\boldsymbol{x}^H A\boldsymbol{x}}{\boldsymbol{x}^H\boldsymbol{x}}$$
It is not difficult to see that when ‖𝒙‖ = 1, the Rayleigh quotient of 𝐴 reduces to the quadratic form 𝒙𝐻𝐴𝒙. In the next part, we relate the Rayleigh quotient of a hermitian matrix to its eigenvalues.
4.1.2 Theorem
Let 𝐴 be a hermitian 𝑛 × 𝑛 matrix with ordered eigenvalues 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑛. For
any 𝒙 ∈ ℂ𝑛\{𝟎}
$$\lambda_n \le \frac{\boldsymbol{x}^H A\boldsymbol{x}}{\boldsymbol{x}^H\boldsymbol{x}} \le \lambda_1$$

and, in particular,

$$\lambda_n = \min_{\boldsymbol{x}\ne\boldsymbol{0}} \frac{\boldsymbol{x}^H A\boldsymbol{x}}{\boldsymbol{x}^H\boldsymbol{x}}, \qquad \lambda_1 = \max_{\boldsymbol{x}\ne\boldsymbol{0}} \frac{\boldsymbol{x}^H A\boldsymbol{x}}{\boldsymbol{x}^H\boldsymbol{x}}$$
𝑃𝑟𝑜𝑜𝑓.
Let 𝐴 = 𝑈𝑇𝑈𝐻 be the spectral decomposition of 𝐴, where the columns of 𝑈 = (𝒙𝟏 ⋯ 𝒙𝒏) are the orthonormal eigenvectors corresponding to 𝜆1, ⋯ , 𝜆𝑛, which make up the diagonal entries of the diagonal matrix 𝑇. As in the proof of Theorem 3.2.2.1, define 𝒚 = 𝑈𝐻𝒙; then we have

$$\frac{\boldsymbol{x}^H A\boldsymbol{x}}{\boldsymbol{x}^H\boldsymbol{x}} = \frac{\boldsymbol{x}^H U T U^H\boldsymbol{x}}{\boldsymbol{x}^H U U^H\boldsymbol{x}} = \frac{\boldsymbol{y}^H T\boldsymbol{y}}{\boldsymbol{y}^H\boldsymbol{y}} = \frac{\sum_{i=1}^{n}\lambda_i|y_i|^2}{\sum_{i=1}^{n}|y_i|^2}.$$

Together with the fact that

$$\lambda_n\sum_{i=1}^{n}|y_i|^2 \le \sum_{i=1}^{n}\lambda_i|y_i|^2 \le \lambda_1\sum_{i=1}^{n}|y_i|^2,$$

the proof is complete.
In fact, we can see that the implication of this theorem is that we may regard the problem
of finding the largest and smallest eigenvalues of a hermitian matrix as a constrained
optimization problem:
maximize: 𝒙𝐻𝐴𝒙
subject to: 𝒙𝐻𝒙 = 1
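The bounds of Theorem 4.1.2 can be illustrated numerically. The NumPy sketch below (an arbitrary random symmetric matrix) samples random Rayleigh quotients and then evaluates the quotient at the extreme eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                       # symmetric (hermitian with real entries)
lam = np.linalg.eigvalsh(A)             # ascending: lam[0] = lambda_n, lam[-1] = lambda_1

def rayleigh(A, x):
    return (x @ A @ x) / (x @ x)

# Every Rayleigh quotient lies in [lambda_n, lambda_1].
quotients = [rayleigh(A, rng.standard_normal(5)) for _ in range(1000)]
print(lam[0] <= min(quotients) and max(quotients) <= lam[-1])   # True

# The extremes are attained at the corresponding eigenvectors.
_, U = np.linalg.eigh(A)
print(np.isclose(rayleigh(A, U[:, 0]), lam[0]))     # True
print(np.isclose(rayleigh(A, U[:, -1]), lam[-1]))   # True
```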
Below is a theorem that generalizes the above theorem to all eigenvalues of 𝐴.
4.1.3 Theorem (the Courant-Fischer min-max theorem)
Let 𝐴 be an 𝑛 × 𝑛 hermitian matrix with ordered eigenvalues 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑛, and let 𝑉 range over subspaces of ℂ𝑛. Then

(a) $\lambda_i = \max_{\dim(V)=i}\ \min_{\boldsymbol{x}\in V,\ \|\boldsymbol{x}\|=1} \boldsymbol{x}^H A\boldsymbol{x}$

(b) $\lambda_i = \min_{\dim(V)=n-i+1}\ \max_{\boldsymbol{x}\in V,\ \|\boldsymbol{x}\|=1} \boldsymbol{x}^H A\boldsymbol{x}$
𝑃𝑟𝑜𝑜𝑓.
(a) Recall that $\boldsymbol{x}^H A\boldsymbol{x} = \boldsymbol{y}^H T\boldsymbol{y} = \sum_{i=1}^{n}\lambda_i|y_i|^2$, where 𝒚 = 𝑈𝐻𝒙, or equivalently 𝒙 = 𝑈𝒚. Since 𝑈 is unitary, the map from 𝒙 to 𝒚 is a norm-preserving isomorphism, so we may restate the constraint as dim(𝑊) = 𝑖, 𝒚 ∈ 𝑊, ‖𝒚‖ = 1. Choosing 𝑊 = span{𝒆𝟏, ⋯ , 𝒆𝒊}, it is easily verified that

$$\lambda_i = \min_{\boldsymbol{y}\in\mathrm{span}\{\boldsymbol{e}_1,\cdots,\boldsymbol{e}_i\},\ \|\boldsymbol{y}\|=1}\ \sum_{j=1}^{i}\lambda_j|y_j|^2 \le \max_{\dim(W)=i}\ \min_{\boldsymbol{y}\in W,\ \|\boldsymbol{y}\|=1} \boldsymbol{y}^H T\boldsymbol{y} = \max_{\dim(V)=i}\ \min_{\boldsymbol{x}\in V,\ \|\boldsymbol{x}\|=1} \boldsymbol{x}^H A\boldsymbol{x}.$$
It remains to prove the reverse inequality. To do so, we must show that every 𝑖-dimensional subspace 𝑉 of ℂ𝑛 contains a unit vector 𝒙 such that 𝜆𝑖 ≥ 𝒙𝐻𝐴𝒙, or equivalently (by the previous discussion), that every 𝑖-dimensional subspace 𝑊 of ℂ𝑛 contains a unit vector 𝒚 such that 𝜆𝑖 ≥ 𝒚𝐻𝑇𝒚. Now let 𝛺 = span{𝒆𝒊, ⋯ , 𝒆𝒏}, which has dimension 𝑛 − 𝑖 + 1; since dim(𝛺) + dim(𝑊) = 𝑛 + 1 > 𝑛, the intersection 𝛺 ∩ 𝑊 contains a nonzero vector. Letting 𝒘 be a unit vector in 𝛺 ∩ 𝑊, we may write

$$\boldsymbol{w}^H T\boldsymbol{w} = \sum_{j=i}^{n}\lambda_j|w_j|^2 \le \lambda_i, \qquad \text{since}\ \sum_{j=i}^{n}|w_j|^2 = 1.$$

This proves the reverse inequality, and so we finally achieve the equality.
(b) This can be proved by replacing 𝐴 with −𝐴 and using the fact that 𝜆𝑖(−𝐴) = −𝜆𝑛−𝑖+1(𝐴).
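A numerical spot-check of the min-max characterization: for 𝑉 equal to the span of the eigenvectors of the 𝑖 largest eigenvalues, the minimum of the quadratic form over unit vectors in 𝑉 is exactly 𝜆𝑖, which is where the outer maximum is attained. A NumPy sketch (arbitrary symmetric matrix, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2                     # arbitrary symmetric matrix
lam, U = np.linalg.eigh(A)            # eigenvalues in ascending order
lam_desc = lam[::-1]                  # lambda_1 >= ... >= lambda_n, as in the theorem

n = 6
checks = []
for i in range(1, n + 1):
    Q = U[:, n - i:]                  # orthonormal basis of the span of the top-i eigenvectors
    # Minimum of x^H A x over unit x in this i-dimensional subspace V:
    min_on_V = np.linalg.eigvalsh(Q.T @ A @ Q)[0]
    checks.append(np.isclose(min_on_V, lam_desc[i - 1]))
print(all(checks))                    # True: the minimum on this V equals lambda_i
```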
4.2 Some Eigenvalue Inequalities
In this section we are going to introduce a few inequalities concerning eigenvalues,
which may be applied in eigenvalue estimation and inferences and also eigenvalue
perturbation theories. It turns out that many of these inequalities can be proved using
the min-max theorem derived in the last section.
4.2.1 Theorem (Weyl’s inequality)
Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 be hermitian matrices, then for 1 ≤ 𝑖 ≤ 𝑛, we have
$$\lambda_i(A) + \lambda_n(B) \le \lambda_i(A+B) \le \lambda_i(A) + \lambda_1(B)$$
𝑃𝑟𝑜𝑜𝑓.
First, we have

$$\lambda_i(A+B) = \max_{\dim(V)=i}\ \min_{\boldsymbol{x}\in V,\ \|\boldsymbol{x}\|=1} \boldsymbol{x}^H(A+B)\boldsymbol{x} = \max_{\dim(V)=i}\ \min_{\boldsymbol{x}\in V,\ \|\boldsymbol{x}\|=1} \left(\boldsymbol{x}^H A\boldsymbol{x} + \boldsymbol{x}^H B\boldsymbol{x}\right)$$
$$\ge \max_{\dim(V)=i}\left(\min_{\boldsymbol{x}\in V,\ \|\boldsymbol{x}\|=1} \boldsymbol{x}^H A\boldsymbol{x} + \min_{\boldsymbol{x}\in V,\ \|\boldsymbol{x}\|=1} \boldsymbol{x}^H B\boldsymbol{x}\right) \ge \max_{\dim(V)=i}\ \min_{\boldsymbol{x}\in V,\ \|\boldsymbol{x}\|=1} \boldsymbol{x}^H A\boldsymbol{x} + \lambda_n(B) = \lambda_i(A) + \lambda_n(B),$$

which proves the left inequality. The right inequality can be proved in exactly the same manner.
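Weyl's inequality is easy to test numerically. A NumPy sketch with two arbitrary random symmetric matrices (a small tolerance absorbs floating-point rounding):

```python
import numpy as np

rng = np.random.default_rng(7)

def rand_sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

A, B = rand_sym(5), rand_sym(5)
# eigvalsh is ascending; reverse to match lambda_1 >= ... >= lambda_n.
lA = np.linalg.eigvalsh(A)[::-1]
lB = np.linalg.eigvalsh(B)[::-1]
lAB = np.linalg.eigvalsh(A + B)[::-1]

# Weyl: lambda_i(A) + lambda_n(B) <= lambda_i(A+B) <= lambda_i(A) + lambda_1(B)
tol = 1e-10
print(np.all(lA + lB[-1] <= lAB + tol))   # True
print(np.all(lAB <= lA + lB[0] + tol))    # True
```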
4.2.2 Corollary
Let 𝐴 ∈ 𝑀𝑛×𝑛 be a hermitian matrix, 𝐵 ∈ 𝑀𝑛×𝑛 be a positive semidefinite matrix,
then for 1 ≤ 𝑖 ≤ 𝑛, we have
𝜆𝒊(𝐴) ≤ 𝜆𝒊(𝐴 + 𝐵)
4.2.3 Theorem
Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 be hermitian matrices. If 1 ≤ 𝑗1 < ⋯ < 𝑗𝑘 ≤ 𝑛, then

$$\sum_{\ell=1}^{k}\lambda_{j_\ell}(A+B) \le \sum_{\ell=1}^{k}\lambda_{j_\ell}(A) + \sum_{\ell=1}^{k}\lambda_{\ell}(B)$$
4.2.4 Theorem (Cauchy Interlacing Theorem)
Let 𝐴 ∈ 𝑀𝑛×𝑛 be a hermitian matrix with eigenvalues 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑛 , and
partitioned as follows:
$$A = \begin{bmatrix} H & B^* \\ B & R \end{bmatrix}$$

where 𝐻 ∈ 𝑀𝑚×𝑚 with eigenvalues 𝜃1 ≥ 𝜃2 ≥ ⋯ ≥ 𝜃𝑚. Then, for 𝑘 = 1, ⋯ , 𝑚,

$$\lambda_k \ge \theta_k \ge \lambda_{k+n-m}$$
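The interlacing inequalities can be checked numerically by comparing the spectrum of an arbitrary symmetric matrix with that of its leading principal submatrix (a NumPy sketch, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 7, 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
H = A[:m, :m]                              # leading m x m principal submatrix

lam = np.linalg.eigvalsh(A)[::-1]          # lambda_1 >= ... >= lambda_n
theta = np.linalg.eigvalsh(H)[::-1]        # theta_1 >= ... >= theta_m

# lambda_k >= theta_k >= lambda_{k+n-m} for k = 1, ..., m (0-based below)
ok = all(lam[k] >= theta[k] >= lam[k + n - m] for k in range(m))
print(ok)   # True
```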
4.3 Application to Principal Component Analysis (PCA)
Principal component analysis (PCA) is a technique that is useful for the compression
and classification of data. The purpose is to reduce the dimensionality of a data set
(sample) by finding a new set of variables, smaller than the original set, that nonetheless retains most of the sample's information.
4.3.1 Definition
Let X = (𝒙𝟏, 𝒙𝟐, ⋯ , 𝒙𝒑) be an 𝑛 × 𝑝 data matrix with p being the number of
variables and n being the number of observations for each variable. Define the first
principal component of the sample by the linear transformation
$$\boldsymbol{z}_1 = X\boldsymbol{a}_1 = \sum_{i=1}^{p} a_{i1}\boldsymbol{x}_i$$
where the vector 𝒂𝟏 = (𝑎11, 𝑎21, ⋯ , 𝑎𝑝1)𝑇 is chosen such that var[𝒛𝟏] is maximal.
Likewise, the 𝑘th principal component is defined as the linear transformation
$$\boldsymbol{z}_k = X\boldsymbol{a}_k = \sum_{i=1}^{p} a_{ik}\boldsymbol{x}_i, \qquad k = 1, \cdots, p$$

where the vector 𝒂𝒌 = (𝑎1𝑘, 𝑎2𝑘, ⋯ , 𝑎𝑝𝑘)𝑇 is chosen such that var[𝒛𝒌] is maximal, subject to 𝒂𝒌𝑇𝒂𝒌 = 1 and to cov[𝒛𝒌, 𝒛𝒍] = 0 for 𝑘 > 𝑙 ≥ 1, which is equivalent to 𝒂𝒌𝑇𝒂𝒍 = 0.
After some computation, we get var[𝒛𝒌] = 𝒂𝒌𝑇𝑆𝒂𝒌, where $S = \frac{1}{n-1}X^T X$ is the covariance matrix of the data matrix 𝑋 (assuming the columns of 𝑋 have been centered). It is clear that 𝑆 is nonnegative definite, thus having nonnegative eigenvalues. If we want to find 𝑘 (𝑘 < 𝑝) principal components, then we get
𝑍 = [𝒛𝟏 ⋯ 𝒛𝒌] = 𝑋[𝒂𝟏 ⋯ 𝒂𝒌] = 𝑋𝐴
The matrix 𝑍 is called the score matrix while 𝐴 is called the loading matrix.
4.3.2 Methods for implementing PCA
(a) Constrained optimization
To maximize 𝒂𝟏𝑇𝑆𝒂𝟏 subject to 𝒂𝟏𝑇𝒂𝟏 = 1, we use the technique of Lagrange multipliers: we maximize the function

$$\boldsymbol{a}_1^T S\boldsymbol{a}_1 - \mu(\boldsymbol{a}_1^T\boldsymbol{a}_1 - 1)$$

with respect to 𝒂𝟏. Differentiating with respect to 𝒂𝟏 and setting the derivative to zero gives (after the common factor of 2 cancels)

$$S\boldsymbol{a}_1 - \mu\boldsymbol{a}_1 = \boldsymbol{0}, \qquad \text{i.e.}\quad S\boldsymbol{a}_1 = \mu\boldsymbol{a}_1$$
From this step, it is obvious that 𝜇 is an eigenvalue of 𝑆 (of course, we could reach this step without the Lagrange multiplier, using the min-max theorem instead). So to maximize 𝒂𝟏𝑇𝑆𝒂𝟏, we choose the largest eigenvalue 𝜆1 of 𝑆 and set 𝒂𝟏 to be its corresponding unit eigenvector; then we get our first principal component 𝒛𝟏 = 𝑋𝒂𝟏, which finishes the first step. To find the 𝑘th principal component, note that 𝒂𝟏, ⋯ , 𝒂𝒑 constitute an orthonormal basis, so to satisfy the zero-covariance constraint we must do the optimization in the orthogonal complement of span{𝒂𝟏, ⋯ , 𝒂𝒌−𝟏}; thus, according to the min-max theorem, max 𝒂𝒌𝑇𝑆𝒂𝒌 = 𝜆𝑘 and 𝒛𝒌 = 𝑋𝒂𝒌, where 𝒂𝒌 is the unit eigenvector corresponding to 𝜆𝑘. Finally, we conclude that 𝑍 = [𝒛𝟏 ⋯ 𝒛𝒌] = 𝑋[𝒂𝟏 ⋯ 𝒂𝒌].
(b) Spectral decomposition and singular value decomposition (SVD)
First, we draw the conclusion from (a) that if we write the spectral decomposition of 𝑆 as
𝑆 = 𝐴𝑇𝐴𝑇
where 𝑇 = diag(𝜆1, 𝜆2, ⋯ , 𝜆𝑝) with 𝜆1 ≥ 𝜆2 ≥ ⋯ ≥ 𝜆𝑝, then the first 𝑘 columns of 𝐴 make up the loading matrix that we want.
Next, let 𝑋 = 𝑈𝛴𝑉𝑇 be the singular value decomposition of the (centered) data matrix 𝑋, and recall that 𝑋𝑇𝑋 = (𝑛 − 1)𝑆 = 𝑉𝛴2𝑉𝑇, so the columns of 𝑉 are the unit eigenvectors of 𝑆. Letting 𝐴 = 𝑉, we finally get 𝑍 = 𝑋𝐴 = 𝑋𝑉 = 𝑈𝛴𝑉𝑇𝑉 = 𝑈𝛴.
In practice, the singular value decomposition is the standard way to do PCA, since it avoids explicitly forming 𝑋𝑇𝑋, which is costly and numerically less stable (forming 𝑋𝑇𝑋 squares the condition number of 𝑋).
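The two routes can be compared directly in NumPy. The sketch below (arbitrary random data; the sizes 𝑛 = 100, 𝑝 = 5, 𝑘 = 2 are illustrative choices) computes the eigenvalues of 𝑆 both ways and forms the score matrix 𝑍 from the SVD:

```python
import numpy as np

rng = np.random.default_rng(9)
n, p, k = 100, 5, 2
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X = X - X.mean(axis=0)                      # center the columns first

# Eigen-route: spectral decomposition of the covariance matrix S.
S = X.T @ X / (n - 1)
lam = np.linalg.eigvalsh(S)                 # ascending

# SVD route: X = U Sigma V^T, without ever forming X^T X.
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# The squared singular values / (n-1) are the eigenvalues of S.
print(np.allclose(sigma**2 / (n - 1), lam[::-1]))       # True

# Score matrix for the first k components: Z = X V_k = U_k Sigma_k.
Z = X @ Vt[:k].T
print(np.allclose(Z, U[:, :k] * sigma[:k]))             # True
```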