BY
WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES
in Partial Fulfillment of the Requirements
for the Degree of
Acknowledgments
I would like to thank my advisors for their hard work and patience.
When I started to be interested in hyperspectral imaging, I had not
taken any courses related to this topic. Dr. Erway spent a lot of
time explaining the definitions and theorems of numerical linear
algebra to me and answering my questions. Dr. Plemmons is learned
and broad minded. I did not only learn about hyperspectral imaging
from him; I also learned how to be a good researcher. He helped me
modify this thesis several times and gave me many useful
suggestions to make it better.
I also would like to thank Dr. Hu and Peter Zhang. Your intelligent
ideas and rich experience in imaging helped me with numerical
experiments. I do appreciate that you taught and shared your
knowledge with me. Thanks to Dr. Jiang and Dr. Pauca for being my
committee members. Thanks to Dr. Kirkman for your encouragement and
suggestions. Thanks to Jennifer Blevins, the tutor of the Writing
Center, for helping me proofread my thesis very carefully.
Last, I would like to thank my family and friends. Thank you for
your love and support.
2.1 Numerical Linear Algebraic Preliminaries
2.2 Statistics Preliminaries
Chapter 3 SVD and PCA in Hyperspectral Imaging
3.1 SVD in Hyperspectral Imaging
3.2 PCA in Hyperspectral Imaging
Chapter 4 Compressive-Projection Principal Component Analysis
4.1 Introduction
4.2.2 Reconstruction of the Principal Components
4.3 Performance
5.1 Introduction
5.2 Algorithms
5.4.3 Classification
6.1 Comparing Randomized SVD and Truncated SVD
6.2 Comparing Randomized SVD and CPPCA
Bibliography
A.1 Algorithms
Vita
Abstract
Hyperspectral imaging provides researchers with abundant
information with which to study the characteristics of objects in a
scene. Processing the massive hyperspectral imagery datasets in a
way that efficiently provides useful information becomes an
important issue. In this thesis, we consider methods which reduce
the dimension of hyperspectral data while retaining as much useful
information as possible.
Traditional deterministic methods for low-rank approximation are
not always adaptable to processing huge datasets effectively, and
therefore probabilistic methods are useful in dimension reduction
of hyperspectral images. In this thesis, we begin by introducing
the background and motivations of this work. Next, we summarize the
preliminary knowledge and the applications of SVD and PCA. After
these descriptions, we present a probabilistic method, randomized
Singular Value Decomposition (rSVD), for the purposes of dimension
reduction, compression, reconstruction, and classification of
hyperspectral data. We discuss some variations of this method.
These variations offer the opportunity to obtain a more accurate
reconstruction of a matrix whose singular values decay gradually,
to process matrices without a target rank, and to obtain the rSVD
with a single pass over the original data. Moreover, we compare the
method with Compressive-Projection Principal Component Analysis
(CPPCA). From the numerical results, we can see that rSVD has
better performance in compression and reconstruction than truncated
SVD and CPPCA. We
also apply rSVD to classification methods for the hyperspectral
data provided by the National Geospatial-Intelligence Agency
(NGA).
Chapter 1: Introduction
The history of remote sensing using imaging dates back to the
middle of the 19th century. At that time, people started to take
photos from the sky by fixing a camera to a balloon, a kite, or a
pigeon. These rudimentary experiments demonstrated the basic idea
of remote sensing [1, 2]. With the invention of the airplane,
aerial photography became possible. During World War I and World
War II, people started to recognize the importance of the
information gathered from a remote place and began using this
information in strategic ways [3]. The
development of artificial
satellites in the latter half of the 20th century made remote
sensing possible for civil,
research, and military purposes on a global scale [4]. After the
development of these
satellites, remote sensing became a new scientific area. Fig. 1.1
illustrates this brief
history of remote sensing.
Figure 1.1: Remote sensing developed from rudimentary ideas, to
aerial photography, and then to spectral imaging.
Spectral analysis is important in the area of remote sensing.
Researchers apply the
spectral information of objects in environmental remote sensing,
monitoring chemical/oil spills, and military target discrimination
[5, 6, 7]. For
collection of the spectral
data, color imagery, color infrared imagery, and multispectral
imagery were invented.
However, these kinds of images still cannot offer us enough
information to construct
the “spectrum” of an object. Thus, when hyperspectral imagery was
invented, it
became an important milestone in remote sensing. Compared to
multispectral sensors, hyperspectral sensors measure energy in many
narrow bands (Fig. 1.2). As a
result, hyperspectral imaging produces the spectra of all pixels so
that every pixel
contains abundant information about the object, which allows us to
learn more about
the characteristics of objects in a scene. However, the
hyperspectral imagery datasets
are so massive that the traditional technologies are not always
adapted to process
them well. In hyperspectral imaging, classification and target
detection are major
objectives. To achieve any of them with these huge data sets,
dimension reduction
is an important task. In this thesis, methods are presented to
reduce the dimension
and classify hyperspectral images.
Visually, a hyperspectral image is represented as a cube, and one
common method
to handle it is to reorganize the image as a matrix. For example, a
100 × 100 × 200
hyperspectral image with 100 × 100 pixels and 200 bands can be
represented as a
10000 × 200 matrix. More details about how to reorganize
hyperspectral images into
matrices will be shown in Chapter 2.
The idea of dimension reduction is to transform the data in a
high-dimensional
space to a space of fewer dimensions. There are many methods to
deal with this
process, such as manifold learning [8], non-negative matrix
factorization [9], principal
component analysis (PCA) [10], and singular value decomposition
(SVD) [11]. Low-rank matrix approximation is often useful in
implementing these methods. For example, PCA and truncated SVD are
nothing other than low-rank approximations [12]. A low-rank
approximation has the form
\[ A_{m \times n} \approx B_{m \times k} C_{k \times n} \tag{1.1} \]
where $k$ ($k < \min\{m, n\}$) is the numerical rank of $A$.
Low-rank approximation is widely used in many fields, since large
matrices can
be stored inexpensively and be multiplied rapidly with vectors or
other matrices by
using this factorization. For example, researchers often use
low-rank approximation
in data analysis [12], in solving least squares problems [13], and
in model reduction
or coarse graining for solution of PDEs [14]. Generally low-rank
approximation can
be obtained by two kinds of algorithms, deterministic and
probabilistic. The classical deterministic methods for low-rank
approximation are based on QR factorization, eigenvalue
decomposition, and singular value decomposition, all of which are
challenged by hyperspectral imaging. The major reasons are: first,
they are not always suited to solving such large-scale problems;
second, they are unable to handle matrices with missing or
inaccurate data; third, they often require several passes over the
data [15]. For example, the full SVD often cannot even complete the
factorization of a hyperspectral matrix because of the large number
of operations and the large amount of memory it requires. Even when
the truncated SVD, which is the optimal low-rank representation,
can give the factorization [16], it often needs considerable time.
Compared with deterministic methods, probabilistic methods are
generally faster
and more robust in practice [17]. These methods begin by projecting
the original matrix to a lower dimensional space by multiplying it
by a random matrix. One then factorizes
the matrix in the lower dimensional space. The aim of the
probabilistic methods is
to capture most of the information of the original data and perform
processing on a
reduced-size matrix.
Many studies related to random projection in hyperspectral imaging
for large
amounts of data, either algorithmic [18, 19] or experimental [11,
20, 21], have shown
positive results. For example, Fowler proposed a method [19],
Compressive-Projection Principal Component Analysis (CPPCA), which
uses random projection to reduce the
dimension in a light encoder system, transmits the projected data
to the decoder on
the ground, and reconstructs the original data in the decoder
system. This process is
driven by the Rayleigh-Ritz theory and achieved by convex-set
optimization. CPPCA
shifts the computational burden effectively from the resource
constrained encoder
to the decoder, and the reconstruction obtained by CPPCA is more
accurate than
that obtained by popular methods related to compressed sensing.
However, CPPCA
recovers coefficients of a known sparsity pattern in an unknown
basis and requires an
additional step to recover the eigenvectors.
In this thesis, we present a randomized Singular Value
Decomposition (rSVD)
method for the purposes of dimension reduction, compression,
reconstruction, and
classification with hyperspectral data. Moreover, we discuss this
method with some
variations for different cases. These variations offer the
opportunity to obtain a
more accurate reconstruction of the matrix whose singular values
decay gradually, the
opportunity to process matrices without target rank, and the
opportunity to obtain
the rSVD by only one single pass over the original data. The good
results of rSVD are
shown by the numerical experiments and the comparisons of the
computation time
and accuracy by rSVD and CPPCA with real hyperspectral data.
The structure of this thesis is as follows. In Chapter 2, we
summarize the background in numerical linear algebra, statistics,
and hyperspectral imaging in Sections 2.1, 2.2, and 2.3,
respectively. In Chapter 3, we describe the applications of
singular value decomposition and principal component analysis in
hyperspectral imaging. In Chapter 4, we introduce the CPPCA method
in Section 4.1, show the CPPCA algorithm in detail in Section 4.2,
and analyze the performance in Section 4.3. In Chapter 5, the
general introduction of rSVD is shown in Section 5.1, the related
algorithms are presented in Section 5.2, the performance of rSVD is
analyzed in Section 5.3, and the applications in hyperspectral
imaging are shown in Section 5.4. In Chapter
6, we compare the performances of rSVD and truncated SVD and the
performances
of rSVD and CPPCA. Finally, we present the results of numerical
experiments and
Chapter 2: Background Preparation
To better illustrate the ideas and the numerical experiments of
this thesis, this
chapter reviews some useful background tools. We start with some
definitions and
theorems in numerical linear algebra.
2.1 Numerical Linear Algebraic Preliminaries
To begin with, we review some classical deterministic matrix
decomposition methods, including the singular value decomposition,
QR factorization, and eigenvalue decomposition [22].
Definition 1. Given any matrix $A \in \mathbb{R}^{m \times n}$
($m > n$), there is a singular value decomposition (SVD) of $A$. It
can be expressed as
\[ A = U_{m \times m} S_{m \times n} V^T_{n \times n} \tag{2.1} \]
where $U$ is an $m \times m$ orthogonal matrix, $S$ is an
$m \times n$ diagonal matrix with
$S = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, and $V$
is an $n \times n$ orthogonal matrix. The diagonal entries of $S$,
$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n \ge 0$, are known as
the singular values of $A$.
When the size of A is large, the calculation of the SVD is
expensive. Thus, the
approximation of the SVD, the truncated Singular Value
Decomposition, turns out
to be a more widespread method in practice than the full SVD for
large matrices.
Definition 2. Given a matrix $A \in \mathbb{R}^{m \times n}$, the
truncated Singular Value Decomposition of $A$ can be expressed as
\[ A \approx U_{m \times k} S_{k \times k} V^T_{k \times n} \tag{2.2} \]
where $k$ is the numerical rank, $U$ and $V$ have orthonormal
columns, and $S$ is a diagonal matrix.
Theorem 2.1. Given a matrix $A \in \mathbb{R}^{m \times n}$,
$m \ge n$, there exists a factorization
$A_{m \times n} = Q_{m \times n} R_{n \times n}$, where $Q$ has
orthonormal columns and $R$ is an upper triangular matrix. This
factorization is called the QR factorization.
Definition 3. Given a diagonalizable square matrix
$A \in \mathbb{R}^{m \times m}$, the eigenvalue decomposition of
$A$ is expressed as
\[ A = X_{m \times m} \Lambda_{m \times m} X^{-1}_{m \times m} \tag{2.3} \]
where $X$ is a nonsingular matrix whose $i$th column is an
eigenvector of $A$ and $\Lambda$ is a diagonal matrix whose $i$th
diagonal entry is the corresponding eigenvalue.
In particular, when $A$ is symmetric, $X$ can be chosen to be
orthogonal, and its columns are then singular vectors of $A$. Next,
we introduce an important theoretical foundation for this thesis.
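Theorem 2.2. (Johnson-Lindenstrauss lemma, 1984) For any
$0 < \varepsilon < 1$ and any integer $n$, if
$k \ge 4(\varepsilon^2/2 - \varepsilon^3/3)^{-1} \ln(n)$, then for
any set $X$ of $n$ points in $\mathbb{R}^d$, there is a Lipschitz
function $f : \mathbb{R}^d \to \mathbb{R}^k$ such that
\[ (1 - \varepsilon)\|u - v\| \le \|f(u) - f(v)\| \le (1 + \varepsilon)\|u - v\| \tag{2.4} \]
for any $u, v \in X$.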
This theorem shows us that the distances between points can be
preserved by the
projection from a high dimension space to a lower dimension
subspace. A proof of
this theorem is given in [23]. Both CPPCA and rSVD are based on
this theorem. We
show the details in Chapter 4 and Chapter 5.
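The distance-preservation property of the Johnson-Lindenstrauss
lemma can be checked empirically. The following is only a sketch
under stated assumptions: it takes the map f to be a scaled Gaussian
random matrix (one common construction, not necessarily the one used
in [23]), it uses the commonly stated form of the bound with the
term ε³/3, and the helper names `jl_min_dim` and `pairwise_ratios`
are hypothetical.

```python
import numpy as np

def jl_min_dim(n_points, eps):
    """Smallest integer k with k >= 4 (eps^2/2 - eps^3/3)^(-1) ln(n)."""
    return int(np.ceil(4.0 * np.log(n_points) / (eps**2 / 2 - eps**3 / 3)))

def pairwise_ratios(X, Y):
    """Ratios ||f(u) - f(v)|| / ||u - v|| over all pairs of rows."""
    i, j = np.triu_indices(X.shape[0], k=1)
    d_orig = np.linalg.norm(X[i] - X[j], axis=1)
    d_proj = np.linalg.norm(Y[i] - Y[j], axis=1)
    return d_proj / d_orig

rng = np.random.default_rng(0)
n, d, eps = 50, 1000, 0.5
k = jl_min_dim(n, eps)                        # target dimension from the bound
X = rng.standard_normal((n, d))               # n synthetic points in R^d
R = rng.standard_normal((d, k)) / np.sqrt(k)  # assumed Gaussian map f(x) = R^T x
Y = X @ R                                     # projected points in R^k
ratios = pairwise_ratios(X, Y)
print(k, ratios.min(), ratios.max())
```

For this synthetic data the ratios stay well inside the interval
[1 − ε, 1 + ε], although the lemma itself only guarantees that a
suitable map exists.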
Last, we need to mention some preliminary results of numerical
linear algebra
related to the rSVD before moving on to the next section. Theorem
2.3 relates the
error of the approximation of SVD, Ak, to the singular value
σk+1.
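Theorem 2.3. Given a matrix $A \in \mathbb{R}^{m \times n}$ and a
target rank $k < \min\{m, n\}$, if the approximation of $A$ is the
truncated SVD
\[ A_k = U_{m \times k} S_{k \times k} V^T_{k \times n}, \tag{2.5} \]
then the error between $A$ and $A_k$ is
$\|A - A_k\|_2 = \sigma_{k+1}$.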
Therefore the approximation of A, Ak, is less accurate for a fixed
k when σ1 is
large and the singular values decay gradually.
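The relation between the truncation error and the singular value
σk+1 is easy to verify numerically. The sketch below is an
illustration on a synthetic random matrix, not the thesis
experiments; the helper name `truncated_svd` is hypothetical.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k factors U_k, s_k, Vt_k of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 40))      # synthetic test matrix
k = 5
U_k, s_k, Vt_k = truncated_svd(A, k)
A_k = (U_k * s_k) @ Vt_k               # rank-k approximation U_k S_k V_k^T
sigma = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A_k, 2)       # spectral norm of the residual
print(err, sigma[k])                   # sigma[k] is sigma_{k+1} (0-based index)
```

The two printed numbers agree up to round-off, which is the
statement of the theorem for this particular matrix.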
Usually, we can examine whether the singular values of a matrix
decay rapidly or
gradually by observing the plot of singular values on a log scale
visually, as in Fig
2.1. Theorem 2.4 gives the average spectral error with an
oversampling parameter.
Figure 2.1: Plots of the first 50 singular values of two matrices
with same size on a log scale.
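Theorem 2.4. (Average spectral error) Given a matrix
$A \in \mathbb{R}^{m \times n}$ and a Gaussian random matrix
$\Omega$, if the sample matrix
\[ Y_{k+p} = A\,\Omega_{n \times (k+p)} = U_{m \times (k+p)} S_{(k+p) \times (k+p)} V^T_{(k+p) \times n}, \tag{2.6} \]
where $k$ is the target rank and $p \ge 2$ is a small integer
oversampling parameter, then the average spectral error is
\[ \mathbb{E}\|A - A_{k+p}\|_2 \le \left(1 + \sqrt{\frac{k}{p-1}}\right)\sigma_{k+1} + \frac{e\sqrt{k+p}}{p}\left(\sum_{j>k}\sigma_j^2\right)^{1/2}, \tag{2.7} \]
where $\mathbb{E}$ denotes the expectation with respect to $\Omega$
[17].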
In other words, by these two theorems, the approximation error may
be large when the singular values decay gradually. Therefore, in
this case, we use the power iteration $(AA^T)^q A$ for a small
integer $q$ instead of $A$ in (2.6) to reduce the error. Here, we
give the average spectral error for the power iteration.
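Theorem 2.5. (Average spectral error for the power iteration) Use
the hypotheses of Theorem 2.4. Let $Z = (AA^T)^q A$, where $q$ is a
small positive integer, and let $A_{k+p}$ be computed with $Z$ in
place of $A$ in (2.6). Then the average spectral error is
\[ \mathbb{E}\|A - A_{k+p}\|_2 \le \left[\left(1 + \sqrt{\frac{k}{p-1}}\right)\sigma_{k+1}^{2q+1} + \frac{e\sqrt{k+p}}{p}\left(\sum_{j>k}\sigma_j^{2(2q+1)}\right)^{1/2}\right]^{1/(2q+1)} \tag{2.8} \]
where $\sigma_i$ is the $i$th singular value of $A$ and
$\mathbb{E}$ is the expectation with respect to $\Omega$.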
To find the proofs of Theorems 2.3, 2.4 and 2.5, please refer to
[17].
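The effect of the power iteration on a matrix with gradually
decaying singular values can be illustrated with a small experiment.
This is a sketch under stated assumptions, not the algorithm of
Chapter 5: the synthetic spectrum σj = j^(-1/2), the sizes, and the
helper names `randomized_range` and `spectral_error` are all chosen
for illustration, and the basis is re-orthonormalized between
products, which spans the same range as (AA^T)^q AΩ in exact
arithmetic.

```python
import numpy as np

def randomized_range(A, k, p, q, rng):
    """Orthonormal basis Q for the range of (A A^T)^q A Omega."""
    Omega = rng.standard_normal((A.shape[1], k + p))
    Q, _ = np.linalg.qr(A @ Omega)
    for _ in range(q):
        # Re-orthonormalize between products to limit round-off.
        W, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ W)
    return Q

def spectral_error(A, Q):
    """|| A - Q Q^T A ||_2 for an orthonormal basis Q."""
    return np.linalg.norm(A - Q @ (Q.T @ A), 2)

rng = np.random.default_rng(2)
m, n, k, p = 200, 100, 10, 5
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 1.0 / np.sqrt(np.arange(1, n + 1))   # gradually decaying spectrum
A = (U * sigma) @ V.T                        # A has singular values sigma

# Same random matrix Omega for both runs (same seed).
err_q0 = spectral_error(A, randomized_range(A, k, p, 0, np.random.default_rng(3)))
err_q2 = spectral_error(A, randomized_range(A, k, p, 2, np.random.default_rng(3)))
print(sigma[k], err_q0, err_q2)
```

Neither error can fall below σk+p+1, the best possible error for a
rank-(k+p) approximation; the run with q = 2 is expected to come
much closer to that floor than the run with q = 0.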
2.2 Statistics Preliminaries
Hyperspectral imagery datasets contain abundant information on
wavelength bands.
Often, the information on one wavelength band resembles the
information on a different wavelength band. This phenomenon makes
data analysis difficult; not only does
not only does
it cause redundant calculation, but it also makes the data analysis
more complex.
Therefore, it is natural to create a small number of new variables
to be surrogates
for the original large number of variables. Principal component
analysis (PCA) is an
efficient method for executing this. We introduce it in detail in
this section.
PCA uses an orthogonal linear transformation to convert a set of
observations of possibly correlated variables into a set of values
of linearly uncorrelated variables. When we consider a set of
correlated variables $X = (X_1, \ldots, X_n)^T$ with the
expectation $E(X) = \mu$ and the covariance matrix
$D(X) = \Sigma$, the linear transform can be expressed as
\[ Z_1 = w_1^T X, \quad Z_2 = w_2^T X, \quad \ldots, \quad Z_n = w_n^T X, \tag{2.9} \]
where $(w_1, w_2, \ldots, w_n)$ forms the transform matrix $W$. The
variance and covariance of $Z$ can be calculated from $W$ and
$\Sigma$ as follows:
\[ \mathrm{Var}(Z_i) = w_i^T \Sigma w_i \tag{2.10} \]
and
\[ \mathrm{Cov}(Z_i, Z_j) = w_i^T \Sigma w_j, \tag{2.11} \]
where $i, j = 1, 2, \ldots, n$.
If $Z_1$ includes most of the information of $X_1, \ldots, X_n$, it
can be treated as a surrogate for $X_1, \ldots, X_n$. But how do we
measure the “information”? In the classical method of measurement,
the more information $Z_1$ includes, the greater the value of
$\mathrm{Var}(Z_1)$. When $Z_1$ fails to express enough information
of $X_1, \ldots, X_n$, we can consider adding $Z_2$ to complement
the information. Generally, we hope to use as few of the $Z_i$’s as
possible. To make $Z_2$ include as much new information as
possible, $Z_2$ should not contain the information $Z_1$ includes,
i.e., $\mathrm{Cov}(Z_2, Z_1) = 0$. Now that we have introduced the
main idea of PCA, we give the formal definition of principal
components.
Definition 4. Given a set of correlated variables
$X = (X_1, \ldots, X_n)^T$, $Z_i = w_i^T X$ is the $i$th principal
component of $X$ if
(1) $w_i^T w_i = 1$, $i = 1, \ldots, n$;
(2) when $i > 1$, $w_i^T \Sigma w_j = 0$, $j = 1, \ldots, i-1$;
(3) $\mathrm{Var}(Z_i) = \max_{w^T w = 1,\ w^T \Sigma w_j = 0\ (j = 1, \ldots, i-1)} \mathrm{Var}(w^T X)$.
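By this definition, the problem of obtaining the first principal
component $Z_1 = w_1^T X$ is equivalent to the problem of obtaining
$w_1$. It can be treated as an optimization problem,
\[ \max_{w_1} \ \mathrm{Var}(Z_1) \quad \text{subject to} \quad w_1^T w_1 = 1. \tag{2.12} \]
The technique of Lagrange multipliers can be used to solve this
problem. Consider the function
\[ f(w_1) = w_1^T \Sigma w_1 - \lambda (w_1^T w_1 - 1). \tag{2.15} \]
Differentiating with respect to $w_1$ and $\lambda$ gives
\[ \begin{cases} \dfrac{\partial f}{\partial w_1} = 2(\Sigma - \lambda I) w_1 = 0, \\[4pt] \dfrac{\partial f}{\partial \lambda} = -(w_1^T w_1 - 1) = 0. \end{cases} \tag{2.16} \]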
Since $w_1 \ne 0$, the condition $|\Sigma - \lambda I| = 0$ is used
to find the eigenvalues and eigenvectors of $\Sigma$. To decide
which of these eigenvectors gives $Z_1$ with maximum variance, we
observe that
\[ \mathrm{Var}(Z_1) = w_1^T \Sigma w_1 = w_1^T \lambda w_1 = \lambda w_1^T w_1 = \lambda. \tag{2.17} \]
Thus, to maximize $\lambda$, we choose the eigenvector $w_1$
corresponding to the largest eigenvalue $\lambda$. Generally, we
can obtain the $i$th principal component from the eigenvector
corresponding to the $i$th largest eigenvalue [24]. Theorem 2.6
gives a more formal statement of the preceding discussion. In
addition, the theorem shows that obtaining the transform matrix $W$
is equivalent to finding the eigenvectors of the covariance matrix
$\Sigma$.
Theorem 2.6. Consider a set of correlated variables
$X = (X_1, \ldots, X_n)^T$ where $X_i \in \mathbb{R}^m$; the
covariance matrix of $X$ is $D(X) = \Sigma = \frac{XX^T}{m}$. If
the eigenvalues of $\Sigma$ are
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$, and
$w_1, w_2, \ldots, w_n$ are the eigenvectors corresponding to these
eigenvalues, then $Z_i = w_i^T X$ is the $i$th principal component
of $X$.
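The construction in Theorem 2.6 can be sketched in a few lines. The
example below is an illustration on synthetic data, with two
assumptions that are not stated in the theorem: the variables are
centered before forming XX^T/m, and the helper name `pca_transform`
is hypothetical.

```python
import numpy as np

def pca_transform(X):
    """Eigen-decomposition of Sigma = X X^T / m for an n x m matrix X."""
    n, m = X.shape
    Sigma = X @ X.T / m
    lam, W = np.linalg.eigh(Sigma)       # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]        # reorder so lambda_1 >= ... >= lambda_n
    return lam[order], W[:, order]

rng = np.random.default_rng(4)
n, m = 5, 2000
M = rng.standard_normal((n, n))
X = M @ rng.standard_normal((n, m))      # correlated synthetic variables
X = X - X.mean(axis=1, keepdims=True)    # center each variable (assumption)

lam, W = pca_transform(X)
Z = W.T @ X                              # row i holds Z_{i+1} = w_{i+1}^T X
cov_Z = Z @ Z.T / m                      # covariance of the components
print(np.round(lam, 3))
```

The covariance of the components is diagonal, with the eigenvalues
on the diagonal, so the components are uncorrelated and ordered by
decreasing variance.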
2.3 Hyperspectral Imaging Preliminaries
Let us envision a hyperspectral image, so we can place it into a
visual context. A hyperspectral image looks like a cube which is
formed from several “images” (Fig. 2.2). Each “image” contains all
pixels in a wavelength band. For each pixel, the hyperspectral
image is measured in many contiguous wavelength bands. Two of the
most widely used hyperspectral imaging spectrometers, NASA’s
Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and the
Naval Research Laboratory’s Hyperspectral Digital Imagery
Collection Experiment (HYDICE), generate 224 and 210 wavelength
bands, respectively [25].
When compared with traditional multispectral imaging, hyperspectral
imaging
provides much more information because of its greater number of
wavelength bands.
Also, the reflectance curve of each pixel through the wavelength
bands is essentially
continuous in hyperspectral images. Thus, each pixel has an entire
spectrum which
can be used to determine a spectral signature. Every material has a
particular spectral
signature, so this is useful to identify an object by extracting
the spectral signature
at each pixel and comparing it with known spectral
signatures.
Hyperspectral imaging also has some disadvantages. The main
disadvantage is that the computational cost for processing can be
very large. Is it possible to reduce the computational cost? Let us
keep this question in mind as we continue to introduce the
hyperspectral image.
Figure 2.2: Hyperspectral image [26].
To process a hyperspectral image, we need to reorganize its data
first. Consider a hyperspectral image which has $a \times b$ pixels
and $n$ wavelength bands. Let $m = ab$ and form an $m \times n$
matrix $A$. Each entry $A_{i,j}$ ($i = 1, \ldots, m$ and
$j = 1, \ldots, n$) is the reflectance of the $i$th pixel in the
$j$th wavelength band. Every row contains the reflectance of a
pixel in all wavelength bands, and every column contains the
reflectance of all pixels in a certain wavelength band. We use this
matrix in the later chapters.
Chapter 3: SVD and PCA in Hyperspectral Imaging
3.1 SVD in Hyperspectral Imaging
Singular Value Decomposition (SVD) is a powerful tool in
hyperspectral image analysis. The SVD of a matrix can be directly
used for noise reduction, data compression, and dimension
reduction. In addition, it is also related to the processes of
classification and unmixing. In this section, we review these uses
of the SVD in hyperspectral imaging. Here, we use the matrix $A$,
the hyperspectral data in matrix form, which we introduced in
Chapter 2.
Image noise is undesirable but cannot be avoided during image
capture, so denoising is usually the first step in hyperspectral
imaging. Since most of the action of a matrix is contained in the
first singular values and their corresponding singular vectors, the
truncated SVD can be used to denoise the matrix by discarding the
small singular values, which mainly represent the noise. The
truncated SVD, $U_k S_k V_k^T$, then represents the denoised
version of $A$, where $k$ is the numerical rank. For example, [11]
shows how to denoise hyperspectral images by SVD and how to unmix
them based on the compressive sensing method.
The truncated SVD can also represent a compressed hyperspectral
dataset. Recall
the hyperspectral data we introduced in Chapter 2. The sizes of the
hyperspectral
datasets are usually large. Thus, compression is an important topic
in hyperspectral
imaging. We can compress them from two directions. One direction
deals with
the data of the wavelength bands. The other direction deals with
the data of the
pixels. Reference [27] shows the methods to compress hyperspectral
data by random
projections.
One way of dimension reduction is to project the data in the
high-dimensional
space to a lower-dimensional subspace which captures most of the
action of the data.
Two of the methods that can be used to reduce the dimension are
band selection
and feature extraction. SVD is beneficial to the method of feature
extraction. By
using SVD, the dimension of data can be reduced to the space
spanned by the first k
columns of $U$. The projection $A_p$ is expressed as
\[ A_p = U_k^T A, \tag{3.1} \]
where $A$ is an $m \times n$ matrix and $A_p$ is a $k \times n$
matrix. The row dimension of $A_p$ is generally much less than that
of $A$.
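The projection (3.1) can be sketched on synthetic data. The example
is illustrative only: the matrix below is a random rank-k matrix
with small additive noise standing in for hyperspectral data, and
the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, k = 500, 40, 5
# Synthetic rank-k data plus small noise (stand-in for hyperspectral data).
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A += 1e-3 * rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]                 # first k left singular vectors
A_p = U_k.T @ A                # k x n projection, as in (3.1)
A_rec = U_k @ A_p              # map the projection back to m x n
rel_err = np.linalg.norm(A - A_rec) / np.linalg.norm(A)
print(A_p.shape, rel_err)
```

Because the columns of U_k are orthonormal, the projection equals
S_k V_k^T, so storing A_p together with U_k is equivalent to storing
the truncated SVD.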
For classification, we consider the matrix $X$, which is the
transpose of $A$. The projection $X_p$ is
\[ X_p = V_k^T X, \tag{3.2} \]
where the columns of $V_k$ are the right singular vectors of $A$
and $X_p$ includes the “most common” information from all pixels.
It is then appropriate to use $X_p$ for unsupervised classification
of the hyperspectral image data. With different user-defined
numerical ranks $k$, we have different levels of accuracy in
classification.
In short, SVD is useful in the initial processing of hyperspectral
images because it provides a natural, ordered hierarchy for the
compressed representation of information and an orthogonal basis
for the range of the hyperspectral data matrix.
3.2 PCA in Hyperspectral Imaging
PCA is commonly used in feature extraction, unmixing, and target
detection from hyperspectral images. The main reason for this is
that PCA can convert a large set of hyperspectral data into a
smaller set of linearly uncorrelated variables. As we introduced in
Chapter 2, this means PCA can be used to reduce the dimension while
losing as little of the information in the original data as
possible. Compared to other linear projections used to reduce the
dimension, PCA has better performance in preserving target
detection and classification capabilities after dimension reduction
in most cases [28].
Some useful methods are based on PCA. For example, Jia and Richards
[29] proposed the segmented principal components transformation,
which uses the property that hyperspectral data are highly
correlated in neighbouring spectral bands, so that the high
correlations appear in blocks along the diagonal line (Fig. 3.1).
They partition the hyperspectral data into different subsets along
the diagonal line and apply PCA to each subset to obtain a more
accurate reconstruction.
Figure 3.1: The hyperspectral data present high correlation in the
diagonal line [30].
A similar idea is used in the method of class dependent
compressive-projection
PCA [31]. This method partitions the image into several subsets
such that each subset
represents a unique class that has higher correlation than the
subset partitioned by
the segmented principal components transformation.
Moreover, directed principal component analysis, selective
principal component analysis, standard principal component
analysis, and residual-scaled principal component analysis are
often utilized in hyperspectral imaging to improve the performance
of the traditional PCA [32]. In the next chapter, we introduce a
method, Compressive-Projection Principal Component Analysis
(CPPCA), which has been proposed recently and improves traditional
PCA with the idea of compressive projection.
Before we end this chapter, let us observe some relationships
between PCA and SVD. Consider the matrix
$X \in \mathbb{R}^{n \times m}$ and the matrix $A$, which is equal
to the transpose of $X$. The covariance matrix $\Sigma$ of $X$ can
be expressed as
\[ \Sigma = \frac{XX^T}{m} = W \Lambda W^T \tag{3.3} \]
where $\Lambda$ is a diagonal matrix whose entries are the
eigenvalues $\lambda_1(\Sigma), \lambda_2(\Sigma), \ldots,
\lambda_n(\Sigma)$ and the columns of $W$ are the eigenvectors
corresponding to the eigenvalues. $W$ is also called the transform
matrix in PCA.
Since
\[ X = A^T = (USV^T)^T = VSU^T, \tag{3.4} \]
(3.3) is equivalent to
\[ \Sigma = \frac{VSU^TUSV^T}{m} = \frac{VS^2V^T}{m}. \tag{3.5} \]
Therefore, we conclude that the matrix $V$ for $A$ is equal to the
matrix $W$ for $X$ and
\[ \frac{S^2}{m} = \Lambda. \tag{3.6} \]
Moreover, the principal components $W^TX$, $V^TX$, and $SU^T$ are
equal.
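The relationships between PCA and SVD stated above can be checked
numerically. The following sketch uses a synthetic matrix and, as in
the text, forms XX^T/m without centering; eigenvectors and singular
vectors are only defined up to sign, so the columns of W and V are
compared through the absolute value of their inner products.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 300, 6
A = rng.standard_normal((m, n)) @ rng.standard_normal((n, n))  # m pixels, n bands
X = A.T                                  # n x m, as in the text

# PCA side: eigen-decomposition of Sigma = X X^T / m.
lam, W = np.linalg.eigh(X @ X.T / m)
lam, W = lam[::-1], W[:, ::-1]           # descending eigenvalue order

# SVD side: A = U S V^T (thin SVD, so S is n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# |w_i^T v_i| should equal 1 for every i when W and V agree up to sign.
alignment = np.abs(np.sum(W * V, axis=0))
print(np.allclose(s**2 / m, lam), np.allclose(alignment, 1.0))
```

The same check shows that V^T X coincides with S U^T, which is why
the principal components can be read directly from the SVD of A.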
Chapter 4: Compressive-Projection Principal Component Analysis
As we introduced in Chapter 2, PCA is a data-dependent transform
which results from the eigenvalue decomposition of the covariance
matrix of a dataset. We showed in Chapter 3 that PCA plays a
central role in dimension reduction, but its use is limited in many
resource-constrained settings, such as hyperspectral sensing
platforms on satellite-borne devices. One reason is that the PCA
transform has to be calculated in the resource-constrained setting
before it can be applied to the dataset. This means the
computational burden falls on the encoder system, which may not
have the ability to execute this task.
Fowler [19] has proposed a method called Compressive-Projection
Principal Component Analysis (CPPCA). The CPPCA encoder projects
the dataset at the signal sensor onto lower-dimensional subspaces
chosen at random. From these random projections, which are known a
priori, the CPPCA decoder reconstructs not only the PCA transform
matrix for the transmitted dataset but also an approximation of the
principal components. This process successfully transfers the
computational burden from the encoder to the decoder. The data flow
is shown in Fig. 4.1.
Figure 4.1: Data flow of CPPCA [33].
In this chapter, we review CPPCA in Section 4.1, present the CPPCA
algorithm
in Section 4.2, perform CPPCA on a real dataset and observe the
results in Section
4.3, and apply it to hyperspectral image data compression in
Section 4.4.
4.1 Introduction
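Consider a dataset of correlated variables
$X \in \mathbb{R}^{n \times m}$ with the expectation $E(X) = \mu$
and the covariance matrix $D(X) = \Sigma$, where each column of $X$
is $X_i \in \mathbb{R}^n$ and $\Sigma = \frac{1}{m} XX^T$. Theorem
2.6 shows the PCA transform matrix $W$ is formed by the
eigenvectors of $\Sigma$. So $W$ can be calculated by the
eigenvalue decomposition
\[ \Sigma = W \Lambda W^T \tag{4.1} \]
where $\Lambda$ is a diagonal matrix whose entries are the
eigenvalues $\lambda_1(\Sigma), \lambda_2(\Sigma), \ldots,
\lambda_n(\Sigma)$.
Instead of calculating the PCA transform in the encoder, CPPCA
allows this calculation to be shifted to the decoder. Suppose we
have a matrix $P \in \mathbb{R}^{n \times k}$ ($k \le n$) whose
orthonormal columns form a basis of a $k$-dimensional subspace
$\mathcal{P}$. The orthogonal projection of $X$ onto the subspace
$\mathcal{P}$ is $PP^TX$. The projected data $Y = P^TX$
($Y \in \mathbb{R}^{k \times m}$) is transmitted from the encoder
to the decoder. The projected covariance matrix $\tilde{\Sigma}$
can be expressed as
\[ \tilde{\Sigma} = \frac{P^TX(P^TX)^T}{m} = P^T \Sigma P, \tag{4.2} \]
and its eigenvalue decomposition is
\[ \tilde{\Sigma} = \tilde{U} \tilde{\Lambda} \tilde{U}^T \tag{4.3} \]
where $\tilde{\Lambda} = \mathrm{diag}(\lambda_1(\tilde{\Sigma}),
\lambda_2(\tilde{\Sigma}), \ldots, \lambda_k(\tilde{\Sigma}))$. The
eigenvalues $\lambda_1(\tilde{\Sigma}), \ldots,
\lambda_k(\tilde{\Sigma})$ are called Ritz values.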
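From (4.2) and (4.3), we find that
\[ P^T \Sigma P = \tilde{U} \tilde{\Lambda} \tilde{U}^T \tag{4.4} \]
\[ \Rightarrow \quad PP^T \Sigma PP^T = P \tilde{U} \tilde{\Lambda} \tilde{U}^T P^T \tag{4.5} \]
where the $\tilde{u}_i$'s are the columns of $\tilde{U}$. Define
$u_i = P\tilde{u}_i$ as the Ritz vectors, $i = 1, 2, \ldots, k$,
where $\|u_i\| = 1$.
Also, the orthogonal projection of $w_j$ onto $\mathcal{P}$,
scaled to unit length, is defined as the normalized projection
$v_j$,
\[ v_j = \frac{PP^Tw_j}{\|PP^Tw_j\|_2} \tag{4.6} \]
where $j = 1, \ldots, n$.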
Generally, the Ritz vector $u_i$ cannot be used to approximate any
$v_j$, $j = 1, \ldots, n$. But if the subspace $\mathcal{P}$ is
chosen randomly and the eigenvalues of $\Sigma$ are sufficiently
separated, i.e., $\lambda_1(\Sigma) \gg \lambda_2(\Sigma) \gg
\cdots \gg \lambda_k(\Sigma)$, then the normalized projection $v_i$
is very close to the Ritz vector $u_i$ corresponding to the Ritz
value $\lambda_i(\tilde{\Sigma})$, i.e., $u_i \approx v_i$,
$i = 1, \ldots, k$ (Fig. 4.2). For more details, see [19].
Figure 4.2: The projection of x onto the subspace P . The Ritz
vector ui is close to the normalized projection vi [33].
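The closeness of the first Ritz vector to the first normalized
projection can be illustrated numerically. The sketch below is an
illustration under assumptions, not Fowler's implementation: the
covariance is synthetic with eigenvalues 1, 10^-2, 10^-4, ..., and
the random subspace is drawn by orthonormalizing a Gaussian matrix.

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 50, 20

# Covariance with strongly separated eigenvalues (assumed for illustration).
W, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = 10.0 ** (-2.0 * np.arange(n))
Sigma = (W * lam) @ W.T                  # Sigma = W diag(lam) W^T

# Random k-dimensional subspace with orthonormal basis P.
P, _ = np.linalg.qr(rng.standard_normal((n, k)))

# Ritz pairs from the projected covariance P^T Sigma P.
ritz_vals, U_tilde = np.linalg.eigh(P.T @ Sigma @ P)
u1 = P @ U_tilde[:, -1]                  # Ritz vector for the largest Ritz value

# Normalized projection of the first eigenvector w_1, as in (4.6).
v1 = P @ (P.T @ W[:, 0])
v1 /= np.linalg.norm(v1)

cosine = abs(u1 @ v1)                    # sign of an eigenvector is arbitrary
print(cosine)
```

A cosine close to 1 means that the decoder, which only sees the
projected covariance, can recover the direction of the projected
eigenvector without access to the eigenvector itself.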
After we have the approximation of $v_j$ by $u_j$, we can
reconstruct the first $L$ eigenvectors with an algorithm based on
projections onto convex sets (POCS). These are assembled into
$\Psi$, the approximation of the $L$-component transform matrix
$W$, and the principal components $Z$ are then approximated from
$Y$ and $P$.
Before introducing the CPPCA algorithm, let us first review the
method of POCS [34]. The method of POCS is an iterative algorithm
aimed at finding a vector $w$ in the intersection of a given
sequence $\{C_j\}_{j=1}^{J}$ of closed convex sets. That is,
\[ w \in C_0 = \bigcap_{j=1}^{J} C_j. \tag{4.7} \]
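The idea of POCS can be sketched with two simple closed convex sets
whose projections are known in closed form. The example is a generic
illustration, not the CPPCA decoder: the sets (a unit ball and a
hyperplane in three dimensions), the starting point, and the helper
names are assumptions chosen for illustration, and the projections
are applied cyclically.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Projection onto the closed ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def project_plane(x, a, b):
    """Projection onto the hyperplane {x : a^T x = b}."""
    return x - ((a @ x - b) / (a @ a)) * a

def pocs(x0, projections, n_iter=200):
    """Cyclically apply the projections, starting from x0."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        for proj in projections:
            x = proj(x)
    return x

a, b = np.array([1.0, 1.0, 0.0]), 1.0
w = pocs([3.0, -2.0, 4.0],
         [project_ball, lambda x: project_plane(x, a, b)])
print(w, np.linalg.norm(w), a @ w)
```

The limit lies in both sets: its norm does not exceed 1 and it
satisfies the hyperplane equation, so it is a point of the
intersection in the sense of (4.7).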
4.2 CPPCA Algorithm
First, we introduce the CPPCA Encoder Algorithm. The CPPCA encoder
splits $X = [X_1, X_2, \ldots, X_m]$ into $J$ partitions of columns
$X^j$. Each $X^j$ is associated with its own random projection
matrix $P^j$, $j = 1, 2, \ldots, J$. Then $Y^j = (P^j)^T X^j$ is
formed by Algorithm 1.
Algorithm 1 CPPCA Encoder
    Draw a length-J cell array of n × k projection matrices P{1}, . . . , P{J}.
    for j = 1 to J do
        X{j} ← X(:, j : J : m);
        Y{j} ← P{j}^T X{j};
    end for
This algorithm is based on the assumption that each $X^j$ resembles $X$
statistically, so that it has approximately the same eigenvalue decomposition
as $X$ [19].
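A minimal Matlab sketch of the encoder loop of Algorithm 1 follows. The data are synthetic, the sizes are hypothetical, and drawing each projection matrix by orthonormalizing a Gaussian matrix is an assumption made for this illustration; the implementation referenced in Appendix A may construct the projections differently.

```matlab
% Sketch of the CPPCA encoder (Algorithm 1) on synthetic data.
rng(1);
n = 8; m = 30; k = 3; J = 4;             % illustrative sizes (assumptions)
X = randn(n, m);
X = bsxfun(@minus, X, mean(X, 2));       % remove the mean vector
P = cell(1, J); Xj = cell(1, J); Y = cell(1, J);
for j = 1:J
    [P{j}, ~] = qr(randn(n, k), 0);      % n-by-k matrix with orthonormal columns
    Xj{j} = X(:, j:J:m);                 % every J-th column, starting at column j
    Y{j} = P{j}' * Xj{j};                % projected data of partition j
end
```

Only the cell array Y, together with knowledge of how P was drawn, needs to reach the decoder; each Y{j} has k rows instead of n.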
4.2.1 Reconstruction of the Transform Matrix of PCA
As we discussed in the introduction, the CPPCA decoder does not have access to
either the original data $X$ or the covariance matrix $\Sigma$. Thus, the
transform matrix $W$ of PCA cannot be calculated directly by the eigenvalue
decomposition (4.1) in the decoder. The main goal of CPPCA is to approximate the
transform matrix $W$ from the projected data $Y$ and the known projection $P$.
With the approximation of $W$ in hand, we can easily obtain the approximations
of the principal components $Z$ and of the original data set $X$.
In this part, we introduce the algorithm for reconstructing the first $L$
eigenvectors of $\Sigma$, i.e., the first $L$ columns of the transform matrix of
PCA. Given the normalized projection $v$ of the eigenvector $w$ in the subspace
$\mathcal{P}$, we form the subspace $C$ as
\[ C = \mathcal{P}^{\perp} \oplus \mathrm{span}\{v\}. \tag{4.8} \]
Thus, $C$ is the direct sum of the orthogonal complement of $\mathcal{P}$ and the
one-dimensional subspace spanned by $v$.
In order to form the subspaces $C_1, C_2, \ldots, C_J$, we generate $J$ random
subspaces $\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_J$, which contain
$v_1, v_2, \ldots, v_J$ respectively, by the orthonormal projection matrices
$P^1, P^2, \ldots, P^J$. Then $C_1, C_2, \ldots, C_J$ can be formed by (4.8).
Figure 4.3: w1 is projected onto two different planes [33].
Figure 4.4: These planes have an intersection. We can find w1 in the
intersection [33].
From Fig. 4.3 and Fig. 4.4, we can see that $w$ lies in the intersection
$C_1 \cap \ldots \cap C_J$ (here $J = 2$). Since $C_1, C_2, \ldots, C_J$ are
closed and convex, it is appropriate to use the POCS method, with $u_i$ in place
of $v_i$, provided that the eigenvalues of $\Sigma$ are sufficiently separated.
The resulting POCS solution is used to approximate $w$.
The estimate of $w$ is updated by the iteration
\[ w^{(t)} = \frac{1}{J} \sum_{j=1}^{J} Q^{j} (Q^{j})^{T} w^{(t-1)} \tag{4.9} \]
where $t = 1, 2, \ldots$.
The approximation of $w_i$ is the normalized limit of the iterates $w_i^{(t)}$.
Here the matrix $Q^j$ is used to perform the projection onto $C_j$, and the
initial vector $w_i^{(0)}$ is the average of the Ritz vectors,
\[ w_i^{(0)} = \frac{1}{J} \sum_{j=1}^{J} u_i^{j} \tag{4.10} \]
where $u_i^j$ is the $i$th Ritz vector of the $j$th partition. This process is
carried out by Algorithm 2.
Algorithm 2 POCS Method
1: Initialize w_i^0 and Q by (4.10) and (4.8), respectively.
2: max_iteration ← 100;
3: tolerance ← 0.001;
4: for j = 1 to J do
5:   w^j_previous ← w_i^0;
6:   QQ ← [QQ  Q ∗ Q′];
7:   for i = 1 to max_iteration do
8:     w^j ← QQ ∗ repmat(w^j_previous, [J 1]) / J;
9:     s is the angle between w^j_previous and w^j;
10:    if the angle degree s is greater than 90 then
11:      s ← 180 − s;
12:    end if
13:    if the angle degree s is less than tolerance then
14:      return;
15:    end if
16:  end for
17: end for
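The averaged-projection iteration can be sketched in Matlab as follows. This is an idealized illustration, not the decoder itself: it uses the exact normalized projections of a known unit vector (an assumption, since the real decoder only has Ritz vectors), it builds each orthonormal basis as `[null(P'), v]`, and the sizes and the fixed count of 300 iterations are hypothetical.

```matlab
% Sketch of the averaged-projection iteration: every set C_j is the
% orthogonal complement of range(P_j) plus the line spanned by v_j.
rng(2);
n = 10; k = 5; J = 10;                   % illustrative sizes (assumptions)
w_true = randn(n, 1);
w_true = w_true / norm(w_true);          % eigenvector to be recovered
M = zeros(n);                            % averaged projector (1/J) * sum_j Q_j * Q_j'
w = zeros(n, 1);                         % initial guess: average of the v_j
for j = 1:J
    [P, ~] = qr(randn(n, k), 0);         % random orthonormal projection matrix
    v = P * (P' * w_true);
    v = v / norm(v);                     % normalized projection of w_true
    Q = [null(P'), v];                   % orthonormal basis of C_j
    M = M + (Q * Q') / J;
    w = w + v / J;
end
for t = 1:300
    w = M * w;                           % one averaged projection step
end
w = w / norm(w);                         % normalize the converged iterate
fprintf('cosine of the angle to the true vector: %.12f\n', abs(w' * w_true));
```

Because the true vector lies in every set, it is a fixed point of the averaged projector, while all other directions are damped at each step.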
4.2.2 Reconstruction of the Principal Components
In this section, we introduce the algorithm to reconstruct the principal
components $Z^1, \ldots, Z^J$. Since the approximation of the principal
components is $Z^j = \Psi^{T} X^j$, we have $X^j \approx \Psi Z^j$ and hence
\[ Y^j = (P^j)^{T} X^j \approx (P^j)^{T} \Psi Z^j. \tag{4.11} \]
Thus, once we obtain the $L$-component approximation $\Psi$ of the transform
matrix $W$, we can reconstruct $Z^j$ with a least-squares solver. The solution is
\[ Z^j = \left( (P^j)^{T} \Psi \right)^{+} Y^j, \tag{4.12} \]
where $((P^j)^{T} \Psi)^{+}$ is the pseudoinverse of $(P^j)^{T} \Psi$.
Algorithm 3 Reconstruction of Principal Components by CPPCA
1: Input: P{j}, Ψ, Y{j};
2: L ← the number of columns of Ψ;
3: Z{j} ← pinv(P{j}′ ∗ Ψ) ∗ Y{j};
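The least-squares step of Algorithm 3 can be tried on synthetic data with the Matlab sketch below. As an assumption for illustration, the data of one partition lie exactly in the span of a hypothetical transform matrix `Psi`, so the pseudoinverse recovers the principal components up to rounding; with real data the recovery is only approximate.

```matlab
% Sketch of Algorithm 3: recover principal components from projected data.
rng(3);
n = 12; k = 6; L = 3; mj = 20;           % illustrative sizes (assumptions)
[Psi, ~] = qr(randn(n, L), 0);           % stand-in for the reconstructed transform matrix
Ztrue = randn(L, mj);                    % principal components of one partition
Xj = Psi * Ztrue;                        % data lying exactly in span(Psi)
[Pj, ~] = qr(randn(n, k), 0);            % projection matrix of this partition
Yj = Pj' * Xj;                           % what the decoder receives
Zj = pinv(Pj' * Psi) * Yj;               % least-squares solution via the pseudoinverse
Xhat = Psi * Zj;                         % reconstructed data of the partition
```

The recovery requires k ≥ L, so that the k × L matrix `Pj' * Psi` has full column rank.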
4.3 Performance
In this section, we examine the performance of CPPCA on a real hyperspectral
imagery dataset, taken from the hyperspectral data collected by Gader et al. [35].
We call this the Gulfport dataset. The Gulfport dataset is rotated and cropped
into an HSI cube with 320 × 360 pixels and 58 wavelength bands, and then unfolded
into a large matrix of size 115200 × 58. Let m = 115200 be the number of pixels
and n = 58 the number of wavelength bands. The hyperspectral data matrix is
$X \in R^{n \times m}$, from which the mean vector has been removed so that E(X) = 0.
First, we test whether the hypothesis that the Ritz vectors $u_i$ are close to
the normalized projections $v_i$, $i = 1, 2, \ldots$, holds in practice. Let
$\varpi_i$ denote the angle between $u_i$ and $v_i$, $i = 1, 2, \ldots$. We begin
with the plot of the eigenvalues of the Gulfport hyperspectral data on a log
scale (Fig. 4.5).
Figure 4.5: Plot of the eigenvalues of the hyperspectral data
matrix of Gulfport in a log scale.
From this plot, we observe that
$\lambda_1(\Sigma) \gg \lambda_2(\Sigma) \gg \lambda_3(\Sigma) > \ldots > \lambda_n(\Sigma) > 0$.
We generate 1000 random orthonormal projections $P \in R^{n \times k}$ and
compute the angles between the first six Ritz vectors $u_i$ and the normalized
projections $v_i$. The distribution of the angles (in degrees) is presented in
Fig. 4.6 and the average angle is given in Table 4.1. From the figure, we can see
that the angles are concentrated for i = 1, 2, 3, whereas for i = 4, 5, 6 they
are widely dispersed. Thus, the results are stable only for i = 1, 2, 3. From
the table, we find that:
1. The angles $\varpi_i$ between $u_i$ and $v_i$ for this real hyperspectral data
are larger than the angles for the data of the numerical experiment in [19]. The
reason is that the eigenvalues of the Gulfport hyperspectral data are not
separated as strongly as those of the data in the numerical experiments in [19].
2. $\varpi_1$ is very close to 0, so the Ritz vector $u_1$ is very close to $v_1$.
3. $\varpi_2$ is also close to 0, but not as close as $\varpi_1$.
4. $\varpi_3, \varpi_4, \varpi_5, \varpi_6$ are much greater than $\varpi_1$ and $\varpi_2$.
5. $\varpi_i$ increases as $i$ increases, so we should not use many $u_i$'s to
obtain the initial average $w_i^{(0)}$ (4.10), since the approximation of the
eigenvector $w$ is inaccurate if a $u_i$ that is not close to $v_i$ is used in
place of $v_i$ (4.6).
Figure 4.6: The frequency of angle degrees $\varpi_i$ of the hyperspectral
data of Gulfport, i = 1, 2, . . . , 6.
Table 4.1: The average angle $\varpi_i$ (in degrees) between the Ritz vector $u_i$
and the normalized projection $v_i$.
Angle           $\varpi_1$   $\varpi_2$   $\varpi_3$   $\varpi_4$   $\varpi_5$   $\varpi_6$
Average value   1.50         8.60         20.73        27.46        30.14        48.41
When considering the above observations, it is appropriate to use the first three
Ritz vectors to obtain the initial average $w_i^{(0)}$ (4.10).
Next, let us start to reconstruct the eigenvectors w1 and w2 with J
= 15, 20, 30, 50, 60.
Set ξi as the average angles between the eigenvectors wi and the
approximation of
these eigenvectors, i = 1, 2.
Fig. 4.7 and Fig. 4.8 show that the average angles ξ1 and ξ2 become smaller as
J grows. However, a larger J requires generating more projections P to form the
projected data Y, which in turn increases the computational cost of the CPPCA
algorithm. Therefore, we look for a value of J that balances the accuracy of the
approximation against the amount of computation.
Figure 4.7: The angle degrees ξ1 of hyperspectral data of Gulfport
with J = 15, 20, 30, 50, 60.
As we observed, the average angles ξ1 and ξ2 with J = 15 are much
greater than
the average angles ξ1 and ξ2 with other values of J . And, each
pair of average angles
ξ1 and ξ2 are close to the other pair when J = 20, 30, 50, 60.
Thus, we use J = 20 for
the following experiments.
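Figure 4.8: The angle degrees ξ2 of the hyperspectral data of
Gulfport with J = 15, 20, 30, 50, 60.
Last, we reconstruct the approximation of the original data $X$, denoted by
$\hat{X}$, where $X = [x_1, x_2, \ldots, x_m]$ and
$\hat{X} = [\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m]$. We use the signal-to-noise
ratio (SNR), in dB, to measure the quality of the reconstruction of a vector [33]:
\[ \mathrm{SNR}(x_j, \hat{x}_j) = 10 \log_{10} \frac{\mathrm{var}(x_j)}{\mathrm{MSE}(x_j, \hat{x}_j)} \tag{4.13} \]
where $\mathrm{var}(x_j)$ is the variance of $x_j \in R^{n \times 1}$ and
\[ \mathrm{MSE}(x_j, \hat{x}_j) = \frac{1}{n} \| x_j - \hat{x}_j \|^{2}. \tag{4.14} \]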
The mean of $\mathrm{SNR}(x_j, \hat{x}_j)$, $j = 1, \ldots, m$, is used to measure
the quality of the reconstruction of X. Fig. 4.9 shows the reconstruction
performance of the hyperspectral data of Gulfport with different k. We compare
the CPPCA algorithm with the rSVD algorithm in Chapter 6.
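The SNR measure can be computed with a few lines of Matlab. In this sketch the variance is normalized by n (the second argument 1 of `var`) so that it matches the 1/n factor of the MSE; that normalization is an assumption, as are the toy data and the function handle names.

```matlab
% Sketch: per-vector SNR in dB (variance over mean squared error) and its
% mean over all columns.
mse = @(x, xhat) mean((x - xhat).^2);                       % mean squared error
snr_db = @(x, xhat) 10 * log10(var(x, 1) / mse(x, xhat));   % variance normalized by n

X = [1 2; 2 4; 3 6; 4 8];                % two toy spectra stored as columns
E = 0.1 * [1 -1; -1 1; 1 -1; -1 1];      % small reconstruction error
Xhat = X + E;
snr_cols = arrayfun(@(j) snr_db(X(:, j), Xhat(:, j)), 1:size(X, 2));
mean_snr = mean(snr_cols);               % quality measure for the whole matrix
fprintf('mean SNR: %.2f dB\n', mean_snr);
```

For the first column the variance is 1.25 and the MSE is 0.01, so its SNR is 10 log10(125) dB; the second column has four times the variance at the same MSE.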
Figure 4.9: Reconstruction performance of the hyperspectral data of
Gulfport by the CPPCA algorithm.
5.1 Introduction
With the advent of large datasets in hyperspectral image processing, the
classical matrix factorization methods introduced in Chapter 2 sometimes cannot
be adapted to such large-scale problems, because they often incur a huge
computational cost.
To overcome the disadvantages of classical methods, researchers consider
randomized methods appropriate for constructing approximate matrix
factorizations. They are appropriate for two reasons: random sampling is
effective for estimating characteristics of a whole population from a relatively
small sample, and the Johnson-Lindenstrauss Lemma guarantees that the distances
between points are approximately preserved by a projection from a
high-dimensional space to a lower-dimensional subspace. Chen et al. [28] have
also shown that for classification algorithms and classical target detection for
HSI, even with a completely random projection, the dimensionality can be reduced
to between 1/5 and 1/3 of the original dimensionality without severely affecting
the algorithm performance. Thus, randomized methods are appropriate for
hyperspectral imaging.
Our goal is to compute a low-rank SVD approximation (2.2) for hyperspectral
imaging by random projections. Algorithms for obtaining a low-rank SVD
approximation were previously proposed in [17]. Here, we introduce the ideas and
explain how to execute the algorithms under different conditions and how to
apply them in hyperspectral imaging. Let $A \in R^{m \times n}$ be a matrix with
target rank k, k ≤ n ≤ m, and let ε be the approximation error tolerance.
In the first step, the aim is to find an approximate basis matrix
$Q \in R^{m \times k}$ for the range of A using as few columns as possible.
Meanwhile, Q should satisfy
\[ \| A - Q Q^{T} A \|_2 \le \varepsilon \tag{5.1} \]
to ensure the accuracy of the approximation.
In the second step, the aim is to finish the approximate factorization of the
SVD with a small amount of calculation. Let $B = Q^{T} A$. The size of B is
k × n, which is smaller than that of A, so B can be factorized directly as
$B = U_B S V^{T}$, where $U_B$ and $V$ are orthogonal. Then we obtain the
approximate factorization $A \approx Q U_B S V^{T}$. Denote $Q U_B$ by $U$, which
still has orthonormal columns, so it can be seen as an approximation of the left
singular vector matrix of A. We define this factorization,
\[ A \approx U S V^{T}, \tag{5.2} \]
as the randomized SVD (rSVD) of A.
How accurate is this method? Define the error $e_k$ as
\[ e_k = \| A - U S V^{T} \|_2. \tag{5.3} \]
This error $e_k$ should be compared to the theoretical error $\sigma_{k+1}$,
\[ \sigma_{k+1} = \| A - A_k \|_2 \tag{5.4} \]
which we defined in Theorem 2.3.
5.2 Algorithms
This section includes the algorithms to solve problems under
different conditions.
Case 1: If we already have a target rank k and the singular values of A decay
rapidly, we construct a random matrix Ω of size n × (k + p). The oversampling
parameter p is set to five from experience [17]. In stating the algorithms, we
incorporate k + p into k. Form a random sample of the matrix A as Y = AΩ, which
has lower dimensions. The columns of Y are linearly independent, so we can
obtain the approximate basis for the range of A by using the rank-revealing QR
factorization Y = QR, where the columns of Q form an orthonormal basis
(Algorithm 4).
Algorithm 4
1: Input: Given an m × n matrix A with numerical rank k, k ≤ n ≤ m.
2: Generate a Gaussian random matrix Ω ∈ R^{n×k};
3: Form the matrix Y ∈ R^{m×k}, Y = AΩ;
4: Construct a matrix Q ∈ R^{m×k} whose columns are the basis for the range of Y;
5: Form the small matrix B as Q^T A;
6: Compute the SVD of B, B = U_B S V^T;
7: Form the rSVD of A, A ≈ Q U_B S V^T = U S V^T;
8: Output: U, S, V^T.
We compare the error $e_k$ and the theoretical error $\sigma_{k+1}$ of a matrix A
in Fig. 5.1 [36] for the case in which the singular values of A decay rapidly.
It shows that $e_k$ given by (5.3) is close to the theoretical error
$\sigma_{k+1}$ with high probability. However, $e_k$ is not always close to the
theoretical error $\sigma_{k+1}$ from Theorem 2.4. For example, the curve for
q = 0 in Fig. 5.2 shows a case in which $e_k$ is not close to $\sigma_{k+1}$.
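The two-stage procedure and the comparison of the computed error with the optimal one can be sketched in Matlab as follows. The test matrix with singular values 2^(-i), its sizes, and the choice to measure the error for all k + p columns are assumptions made for this illustration; they do not reproduce Fig. 5.1.

```matlab
% Sketch of the two-stage rSVD (Algorithm 4) and of the computed error
% compared with the best possible error of the same rank.
rng(4);
m = 200; n = 100; k = 10; p = 5;         % illustrative sizes; p is the oversampling parameter
[Ua, ~] = qr(randn(m, n), 0);
[Va, ~] = qr(randn(n));
s = 2 .^ (-(0:n-1))';                    % rapidly decaying singular values (assumption)
A = Ua * diag(s) * Va';
Omega = randn(n, k + p);                 % Gaussian random matrix
[Q, ~] = qr(A * Omega, 0);               % stage 1: orthonormal basis for the range of A*Omega
B = Q' * A;                              % stage 2: small (k+p)-by-n matrix
[Ub, S, V] = svd(B, 'econ');
U = Q * Ub;                              % approximate left singular vectors
e_k = norm(A - U * S * V');              % spectral-norm error of the rSVD
sigma_next = s(k + p + 1);               % best possible error for rank k + p
fprintf('rSVD error %.3e, optimal error %.3e\n', e_k, sigma_next);
```

The computed error can never fall below the optimal value; for rapidly decaying singular values it is typically within a modest factor of it.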
Case 2: If the singular values of A decay gradually, so that $\sigma_k/\sigma_1$
is not small, Algorithm 4 may lose accuracy. In this case, consider forming Y as
$Y = (AA^{T})^{q} A \Omega$ by power iteration. The matrix $(AA^{T})^{q} A$ has
the same singular vectors as A, but its $i$th singular value is
$\sigma_i^{2q+1}$. The singular values of $(AA^{T})^{q} A$ therefore decay more
rapidly, so the error $\| A - QQ^{T}A \|$ is smaller by Theorem 2.3 and
Theorem 2.5. Fig. 5.2 gives an example for a 1000 × 1000 matrix, and Algorithm 5
shows how to handle this case with relatively accurate computation.
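The effect of the power iteration on the singular values can be verified directly. The Matlab sketch below, with hypothetical sizes and q = 2, compares the singular values of (AA^T)^q A with the (2q+1)-th powers of those of A; forming the power explicitly is done here only for checking and is not how Algorithm 5 proceeds.

```matlab
% Sketch: the singular values of (A*A')^q * A are the (2q+1)-th powers of
% the singular values of A, so they decay faster.
rng(5);
m = 30; n = 20; q = 2;                   % illustrative sizes (assumptions)
A = randn(m, n);
sigma = svd(A);                          % singular values of A
sigma_pow = svd((A * A')^q * A);         % singular values after q power steps
rel_gap = norm(sigma_pow - sigma.^(2*q + 1)) / norm(sigma.^(2*q + 1));
decay_before = sigma(end) / sigma(1);    % ratio smallest/largest for A
decay_after = sigma_pow(end) / sigma_pow(1);
fprintf('ratio before: %.3e, after: %.3e\n', decay_before, decay_after);
```

In floating-point arithmetic the explicit power loses the small singular values quickly, which is why Algorithm 5 orthonormalizes after each multiplication by A or its transpose.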
Case 3: We may not know the target rank k in practice. Thus, we need to
determine how many columns of Q to use from a given ε such that
$\| A - QQ^{T}A \|_2 \le \varepsilon$. We first try, say, l columns and evaluate
the error. If $\| A - Q_l Q_l^{T} A \|_2$ exceeds ε, then we add more columns to
Q until the bound is satisfied (Algorithm 6).
Figure 5.1: The comparison of the error ek and the theoretical
error σk+1 [36].
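The idea of growing the basis until the error bound holds can be sketched in Matlab as below. This sketch evaluates the error exactly at every step, which is affordable only for small matrices; the test matrix, the tolerance and the second orthogonalization pass are assumptions for illustration and differ from the cheaper error estimate used in the code of Appendix A.

```matlab
% Sketch: grow Q one column at a time until ||A - Q*Q'*A||_2 <= tol.
rng(6);
m = 120; n = 80;                         % illustrative sizes (assumptions)
[Ua, ~] = qr(randn(m, n), 0);
[Va, ~] = qr(randn(n));
A = Ua * diag(2 .^ (-(0:n-1))) * Va';    % test matrix with decaying singular values
tol = 1e-6;                              % prescribed error (assumption)
Q = zeros(m, 0);                         % empty basis
err = norm(A);
while err > tol && size(Q, 2) < n
    y = A * randn(n, 1);                 % random sample of the range of A
    q = y - Q * (Q' * y);                % remove the part already captured by Q
    q = q - Q * (Q' * q);                % second pass for numerical orthogonality
    Q = [Q, q / norm(q)];                % append the new basis vector
    err = norm(A - Q * (Q' * A));        % exact error, for illustration only
end
l = size(Q, 2);                          % number of columns that were needed
fprintf('columns used: %d, error: %.3e\n', l, err);
```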
Case 4: The algorithms above require us to revisit the input matrix, which may
not be feasible for large matrices. Here, we introduce methods for large
matrices that require only one pass over the matrix A to construct the matrix Q
and the rSVD of A.
Algorithm 7 is used for symmetric matrices. We define $B = Q^{T} A Q$ and
multiply both sides of this equation on the right by $Q^{T}\Omega$; then we have
\[ B Q^{T} \Omega = Q^{T} A Q Q^{T} \Omega. \tag{5.5} \]
Since $A Q Q^{T} \approx A$, it follows that
\[ B Q^{T} \Omega \approx Q^{T} A \Omega = Q^{T} Y. \tag{5.6} \]
Q, Ω, and Y are known, so we can solve equation (5.6) to obtain the matrix B.
From $B = Q^{T} A Q$ we can obtain the approximation of A. Similarly, we can
process nonsymmetric matrices by Algorithm 8.
Algorithm 5
1: Input: Given an m × n matrix A with numerical rank k, k ≤ n ≤ m.
2: Generate a Gaussian random matrix Ω ∈ R^{n×k};
3: Form the matrix Y ∈ R^{m×k}, Y = AΩ;
4: Compute the rank-revealing QR factorization Y = QR;
5: for j = 1 to q do
6:   Form Y = A^T Q and compute the rank-revealing QR factorization of Y;
7:   Form Y = AQ and compute the rank-revealing QR factorization of Y;
8: end for
9: Form the small matrix B as Q^T A;
10: Compute the SVD of B, B = U_B S V^T;
11: Form the rSVD of A, A ≈ Q U_B S V^T = U S V^T;
12: Output: U, S, V^T.
Algorithm 6
1: Input: Given an m × n matrix A with numerical rank k, k ≤ n ≤ m.
2: Form an empty basis matrix Q, set e = 1 and k = 0;
3: while e > ε do
4:   k = k + 1;
5:   Form the vectors y_i = A r_i, where r_i is a Gaussian random vector;
6:   Form q_i = (I − Q_{i−1} Q*_{i−1}) y_i;
7:   Normalize q_i = q_i / ||q_i||;
8:   Q = [Q q_i];
9:   Ω = [Ω r_i];
10:  Compute the error e = ||A − Q Q^T A||_2;
11: end while
12: Form the small matrix B as Q^T A;
13: Compute the SVD of B, B = U_B S V^T;
14: Form the rSVD of A, A ≈ Q U_B S V^T = U S V^T;
15: Output: U, S, V^T.
Algorithm 7
1: Input: Given an m × m symmetric matrix A with numerical rank k, k ≤ m.
2: Generate a Gaussian random matrix Ω ∈ R^{m×k};
3: Form the matrix Y ∈ R^{m×k}, Y = AΩ;
4: Construct a matrix Q ∈ R^{m×k} whose columns are the basis for the range of Y;
5: Use a standard least squares solver to find B_approx which satisfies B_approx(Q^T Ω) ≈ Q^T Y;
6: Compute the eigenvalue decomposition of B_approx as B_approx = V Λ V^T;
7: Form the approximated eigenvectors U = QV;
8: The approximation of A can be expressed as A ≈ U Λ U^T;
9: Output: U and Λ.
Figure 5.2: The comparison of the error ek and the theoretical
error σk+1. The pink curve shows the error ek is greater than the
theoretical error σk+1 when q = 0 [36].
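The single-pass idea for symmetric matrices can be sketched in Matlab as follows. To keep the check clean, the test matrix is assumed to have exact rank r, and the basis is truncated to r columns after sampling k > r columns, in the spirit of the oversampling used in Appendix A; all sizes and names are hypothetical.

```matlab
% Sketch of the single-pass method for symmetric matrices (Algorithm 7).
rng(7);
m = 60; r = 8; k = 12;                   % rank-r test matrix, k > r samples (assumptions)
G = randn(m, r);
A = G * G';                              % symmetric matrix of rank r
Omega = randn(m, k);                     % Gaussian random matrix
Y = A * Omega;                           % the only product that touches A
[Q, ~] = qr(Y, 0);
Q = Q(:, 1:r);                           % keep r basis vectors
B = (Q' * Y) * pinv(Q' * Omega);         % least-squares solution of B*(Q'*Omega) = Q'*Y
B = (B + B') / 2;                        % enforce symmetry before eig
[V, D] = eig(B);
U = Q * V;                               % approximate eigenvectors of A
rel_err = norm(A - U * D * U') / norm(A);
fprintf('relative error: %.3e\n', rel_err);
```

After Y has been formed, A is never used again in the computation of B, U and D; it appears in the last lines only to measure the error.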
Algorithm 8
1: Input: Given an m × n matrix A with numerical rank k, k ≤ n ≤ m.
2: Generate Gaussian random matrices Ω ∈ R^{n×k} and Ψ ∈ R^{m×k};
3: Form matrices Y ∈ R^{m×k} and Z ∈ R^{n×k}, Y = AΩ and Z = A^T Ψ;
4: Construct a matrix Q ∈ R^{m×k} whose columns are the basis for the range of Y and a matrix F ∈ R^{n×k} whose columns are the basis for the range of Z;
5: Find B_approx which satisfies B_approx(F^T Ω) ≈ Q^T Y and B_approx^T(Q^T Ψ) ≈ F^T Z;
6: Compute the SVD of B_approx as B_approx = U_B S V_B^T;
7: Form the approximated left singular vectors U = Q U_B and the approximated right singular vectors V = F V_B;
8: The approximation of A can be expressed as A ≈ U S V^T;
9: Output: U, S and V.
5.3 Performance Analysis
We consider n × n symmetric Toeplitz matrices, n = 15, 30, ..., 1500. The
singular values of these matrices decay rapidly, as seen in Fig. 5.3, so they
are appropriate for testing Algorithm 4.
First, we examine the relative error between the rSVD and the original matrix A
in Fig. 5.4. It shows that the relative errors
\[ \frac{\| A - U_{m \times k} S_{k \times n} V^{T}_{n \times n} \|_2}{\| A \|_2} \tag{5.7} \]
fluctuate around 1.5 × 10^{-7}. The result also shows that the algorithm remains
relatively accurate as the sizes of the matrices increase.
Second, we examine the computation times of rSVD for n × n Toeplitz matrices.
Figure 5.5 illustrates that the computation times increase linearly, but stay
very small.
From the results we have shown, rSVD is an accurate and efficient method to
factorize matrices. We also compare its relative error and computation time
with those of truncated SVD in the next chapter.
5.4 Applications
In this section, we apply the rSVD to hyperspectral imaging. As introduced
earlier, one can obtain a good approximation of a matrix A, A ≈ QB, by rSVD.
Here, Q and B are smaller matrices than A. Since the number of wavelength bands
generated by hyperspectral spectrometers is generally less than 250, the
reorganized hyperspectral data form an m × n matrix with n ≪ m. As a result, the
size of B is very small. For example, we reorganize the hyperspectral data of
the Gulfport dataset as a 115200 × 58 matrix in the way introduced in Chapter 2.
Let this matrix be A. A is approximated with target rank 25 by
\[ A \approx Q Q^{T} A = Q B \tag{5.8} \]
where Q is a 115200 × 25 matrix and B is a 25 × 58 matrix. Therefore, we
consider compressing the matrix A into B and Q on a hyperspectral sensing
platform, then transmitting B and Q to a decoder station, where they are used to
reconstruct the matrix A.
Figure 5.3: We use a 1000 × 1000 matrix as an example to show that the
singular values of Toeplitz matrices decay rapidly, by observing the
100 largest singular values.
In practice, to avoid producing multiple random projections for each column of
A, the rSVD splits $A = [A_1, A_2, \ldots, A_m]^{T}$ into J partitions $A^j$,
j = 1, . . . , J. Each $A^j$ is paired with its own random matrix $\Omega^j$.
5.4.1 Compression on a Hyperspectral Sensing Platform
On the hyperspectral sensing platform, we first generate J random Gaussian
matrices $\Omega^j$. Then, we use Algorithm 9, the rSVD Encoder, to compress A
to B and Q. The sum of the bytes used to store B and Q is smaller than that for
storing Y, which is the compressed data produced by the CPPCA.
Figure 5.4: The relative errors between the rSVD and original
matrix A are shown by the red curve.
Figure 5.5: The computation times (seconds) of rSVD.
Algorithm 9 rSVD Encoder
1: Draw a length-J cell array of n × k random Gaussian matrices Ω{1}, . . . , Ω{J}.
2: for j = 1 to J do
3:   A{j} ← A(j : J : m, :);
4:   Y{j} ← A{j}Ω{j};
5:   Construct the matrices Q{j} whose columns form an orthonormal basis for the range of Y{j};
6:   Form B{j} = Q{j}^T A{j};
7: end for
8: Output: a length-J cell array B and a length-J cell array Q.
5.4.2 Reconstruction at a Ground Receiving Station
After receiving the arrays B and Q, the task of reconstructing A is easily
accomplished by forming QB = QQ^T A, thanks to the orthonormality of the columns
of Q. The algorithm is shown as Algorithm 10.
Algorithm 10 Reconstruction by rSVD
1: Input: a length-J cell array B and a length-J cell array Q.
2: for j = 1 to J do
3:   A{j} = Q{j}B{j};
4: end for
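The encoder and the decoder can be exercised together on a small synthetic matrix. In the Matlab sketch below the data matrix is assumed to have rank 5, so that a target rank of k = 8 captures it exactly, and every partition takes every J-th row of A (rows playing the role of pixels); the sizes and variable names are hypothetical.

```matlab
% Sketch of the rSVD encoder and decoder on a small synthetic matrix.
rng(8);
m = 400; n = 20; k = 8; J = 4;           % illustrative sizes (assumptions)
A = randn(m, 5) * randn(5, n);           % rank-5 data, so rank k = 8 captures it
A = bsxfun(@minus, A, mean(A, 1));       % remove the mean spectrum
B = cell(1, J); Q = cell(1, J);
for j = 1:J                              % encoder (sensing platform)
    Aj = A(j:J:m, :);                    % every J-th row, starting at row j
    Omega = randn(n, k);                 % random Gaussian matrix of this partition
    [Q{j}, ~] = qr(Aj * Omega, 0);       % orthonormal basis for the range of Aj*Omega
    B{j} = Q{j}' * Aj;                   % small k-by-n matrix
end
Ahat = zeros(m, n);
for j = 1:J                              % decoder (ground station)
    Ahat(j:J:m, :) = Q{j} * B{j};        % put the rows back in their original places
end
rel_err = norm(A - Ahat, 'fro') / norm(A, 'fro');
fprintf('relative reconstruction error: %.3e\n', rel_err);
```

With real data the rank is not exact, so the reconstruction error depends on how quickly the singular values of each partition decay beyond the target rank.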
Next, let us use the SNR (4.13) to measure the quality of the reconstruction by
rSVD. Fig. 5.6 shows that rSVD gives relatively accurate reconstruction
performance when J = 20. We compare it with CPPCA in terms of accuracy and
computation time in Chapter 6.
Figure 5.6: Reconstruction performance of the hyperspectral data of
the Gulfport dataset by the rSVD algorithm. Here, k is the target
rank and n is the number of columns of A.
5.4.3 Classification
Since the projection Xp given in (3.3) contains the “most common” information of
the matrix X, we perform unsupervised classification of the hyperspectral image
of the Gulfport dataset using Xp. Here, we use the k-means algorithm for our
numerical experiment. Fig. 5.7 shows the classification result obtained by
k-means. We can see that the water and shadows are in yellow, the trees are in
red, the grasses are in dark red, the pavements are in green, the beach sands
are in dark blue, and the sandy/dirt grasses are in blue and light blue. The
classification performance is compared to that obtained from the original matrix
X. Only 13 of the 115200 pixels (about 0.011 percent) are classified differently
between the original matrix and the projected data, so the total accuracy of
classification is above 99.9 percent. This result demonstrates that it is
suitable to use the projected data Xp for classification. Fig. 5.8 shows the
images associated with the first eight columns of Xp.
Figure 5.7: The classification result obtained by the k-means method.
Figure 5.8: The plots of the first eight columns of Xp. From the
first sub-figure, we can see that most information of the
hyperspectral image is contained in the first column of Xp. From
sub-figure 2, we can see that the second column almost contains the
rest of the information which the first column does not contain.
The fifth column of Xp contains the main identification of four
targets.
6.1 Comparing Randomized SVD and Truncated SVD
In this section, we compare randomized SVD (rSVD) with truncated SVD (tSVD).
SVDS is a Matlab function that calculates the truncated SVD (2.2). Its calling
syntax is [U, S, V] = svds(A, k), which returns the k largest singular values
and the associated singular vectors of a matrix A. SVDS is considered to be an
efficient method to obtain the tSVD. Thus, we compare it with rSVD, which is
also coded in Matlab.
First, we compare the computation times of these two methods. We generate random
test matrices $A \in R^{n \times n}$, n = 101, ..., 2000, and set the target
rank k = 6. We use Algorithm 4 to compute the rSVD. The result is shown in
Fig. 6.1.
Figure 6.1: Computation time (seconds) of SVDS and rSVD when the target
rank is six.
From Fig. 6.1, we find that SVDS is almost as fast as rSVD when n is relatively
small. However, as n becomes large, the computation time of SVDS increases
quickly and becomes much greater than that of rSVD, which stays between 0 and
1 second.
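A timing comparison of this kind can be set up with `tic` and `toc`. The Matlab sketch below times `svds` against a basic rSVD on one test matrix; as an assumption made so that the two results can be compared exactly, the matrix has rank 10 rather than being a full-rank random matrix, and the size n = 500 and the oversampling p = 5 are illustrative.

```matlab
% Sketch: time svds against a basic rSVD on one low-rank test matrix.
rng(9);
n = 500; k = 6; p = 5;                   % illustrative sizes (assumptions)
A = randn(n, 10) * randn(10, n);         % rank-10 test matrix (assumption)
tic;
s_t = sort(svds(A, k), 'descend');       % k largest singular values by svds
t_svds = toc;
tic;
[Q, ~] = qr(A * randn(n, k + p), 0);     % basis for the sampled range of A
S = svd(Q' * A);                         % singular values of the small matrix
s_r = S(1:k);                            % k largest singular values by rSVD
t_rsvd = toc;
rel_diff = norm(s_r - s_t) / s_t(1);
fprintf('svds: %.4f s, rSVD: %.4f s, relative difference: %.2e\n', ...
        t_svds, t_rsvd, rel_diff);
```

Measured times depend on the machine and on the Matlab version, so a single run should be read only as a rough indication; repeating the measurement over many sizes gives a curve like the one in Fig. 6.1.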
6.2 Comparing Randomized SVD and CPPCA
In this section, we compare the rSVD and the CPPCA algorithms in terms of
accuracy and computation time. First, let us use the hyperspectral Gulfport
dataset to compare the accuracy of reconstruction by these two methods. For the
rSVD, the hyperspectral Gulfport dataset is reorganized as a matrix
$A \in R^{115200 \times 58}$. For the CPPCA, it is reorganized as a matrix
$X \in R^{58 \times 115200}$, with $X = A^{T}$ and E(X) = E(A) = 0. Once we
obtain the reconstructed matrices $\hat{A}$ and $\hat{X}$, we use $\hat{A}^{T}$
and $\hat{X}$, which are both 58 × 115200 matrices, to compute the SNR. Fig. 6.2
shows that the rSVD outperforms the CPPCA in the accuracy of reconstruction.
Figure 6.2: Comparison of the reconstruction performances by rSVD
and CPPCA with J = 20.
Second, we compare the computation times of full datacube reconstructions by the
rSVD and the CPPCA. Table 6.1 shows that the rSVD is slightly faster than the
CPPCA for k/n = 0.1 and 0.15, but takes longer to finish the reconstruction with
target rank k when k/n = 0.2, 0.3, 0.4, 0.5 (about 2.5 times longer at
k/n = 0.5).
Table 6.1: Computation times in seconds of the rSVD and CPPCA.
k/n                         0.1     0.15    0.2     0.3     0.4     0.5
Computation time of rSVD    0.212   0.292   0.390   0.707   0.897   1.264
Computation time of CPPCA   0.247   0.305   0.331   0.368   0.399   0.509
Last, we compare the accuracy of the reconstructions of the eigenvectors $w_i$
of the covariance matrix of X by these two methods. The motivation is that a
good reconstruction of the eigenvectors $w_i$ from the rSVD helps in obtaining
the principal components $w_i^{T} X$. PCA is a very useful tool in hyperspectral
imaging, for example in the process of classification. Thus, accurate
reconstructions of the eigenvectors improve the performance of PCA in
hyperspectral imaging.
The first row of Fig. 6.3 shows the histograms of the angles between the first
four reconstructions of $w_i$ by the rSVD and the true eigenvectors, and the
second row shows the histograms of the angles between the first four
reconstructions of $w_i$ by the CPPCA and the true eigenvectors. We can see that
the reconstructions of $w_i$ by the rSVD are more accurate than those by the
CPPCA. Moreover, this advantage becomes more pronounced as the index i of w
increases, since the angles in the second row clearly increase.
Figure 6.3: Comparison of the reconstruction performances of first
four wi by rSVD and CPPCA.
Chapter 7: Conclusions and Future Research
Recently, researchers have shown that randomization is very useful
in low-rank
matrix approximation. In this thesis, we have presented a related
method, randomized
SVD. We are interested in the performances of this method in
hyperspectral imaging,
such as the accuracy and computation time. From our numerical experiments,
we can draw the following observations:
• Compared with the classical deterministic method truncated SVD
implemented
in Matlab, randomized SVD can process the matrices in a shorter
time. This
advantage is more obvious when the sizes of the matrices
increase.
• Compared with the popular CPPCA method, randomized SVD is more
accurate
in reconstruction.
• We have applied randomized SVD in classification, and it works
well. Only 13
pixels of 115200 pixels are classified differently between the
original matrix and
the projected data in our example. The accuracy of classification
is above 99
percent.
Thus, the randomized SVD method performs well on large matrices.
Although
CPPCA has special uses in resource constrained settings, randomized
SVD is more
convenient and more accurate than CPPCA in most situations.
We will focus on how to further apply randomized SVD in
hyperspectral imaging
in our future research. For example, we will use randomized SVD in
classification with
segmented subsets, anomaly detection, and unmixing with
hyperspectral images.
Bibliography
[1] Nicholas M. Short, Sr.. History of Remote Sensing: In the
Beginning; Launch
Vehicles. 2009.
[3] Wikipedia contributors. Hyperspectral Imaging [Internet]. Wikipedia, The Free
Encyclopedia; 2012 May 26, 04:26 UTC [cited 2012 June 12]. Available from:
http://en.wikipedia.org/wiki/Hyperspectral_imaging.
[4] Wikipedia contributors. Remote Sensing [Internet]. Wikipedia, The Free
Encyclopedia; 2012 June 7, 13:28 UTC [cited 2012 June 12]. Available from:
http://en.wikipedia.org/wiki/Remote_sensing.
[5] M. T. Eismann, Hyperspectral Remote Sensing. SPIE Press,
2012.
[6] H. F. Grahn and P. Geladi. Techniques and Applications of Hyperspectral
Image Analysis. Wiley, 2007.
[7] J. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P.
Gader, and
J. Chanussot. Hyperspectral unmixing overview: Geometrical,
statistical, and
sparse regression-based approaches. IEEE Journal of Selected Topics
in Applied
Earth Observations and Remote Sensing, vol. 99, pp. 1-16,
2012.
[8] Y. Chen. Improved nonlinear manifold learning for land cover
classification
via intelligent landmark selection. Geoscience and Remote Sensing
Symposium,
pp.545-548, 2006.
[9] D. D. Lee and H. S. Seung. Learning the parts of objects with
nonnegative matrix
factorization. Nature, 401:788-791, 1999.
[10] P. Cunningham. Dimension reduction, Technical Report on
Dimension Reduction
UCD-CSI-2007-7. August 2007.
[11] C. Li, T. Sun, K. Kelly, and Y. Zhang. A compressive sensing
and unmix-
ing scheme for hyperspectral data processing. IEEE Trans Image
Process,
21(3):1200-1210, 2012.
[12] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical
Learning: data mining, inference, and prediction. Springer, Berlin, 2008.
[13] V. Rokhlin and M. Tygert. A fast randomized algorithm for
overdetermined
linear least-squares regression. PNAS, vol. 105, no. 36, pp.
13212-13217, 2008.
[14] B. Engquist and O. Runborg. Wavelet-based numerical homogenization with
applications. Multiscale and Multiresolution Methods: Theory and Applications,
T. J. Barth et al., ed., vol. 20 of LNCSE, Springer, Berlin, pp. 97-148, 2001.
[15] M. Johnson. Randomized algorithms for computation of singular
value decom-
position of large matrices, presentation slides, 2010.
[16] H. Wang, S. Babacan, and K. Sayood. Lossless
hyperspectral-image compression
using context-based conditional average. Geoscience and Remote
Sensing, IEEE
Transactions on, vol. 45, no. 12, pp. 4187–4193, 2007.
[17] N. Halko, P. G. Martinsson and J. A. Tropp. Finding structure
with randomness:
probabilistic algorithms for constructing approximate matrix
Decompositions.
SIAM Review, 53(2):217-288, 2011.
[18] Q. Zhang, R. Plemmons, D. Kittle, D. Brady, and S. Prasad.
Joint segmentation
and reconstruction of hyperspectral data with compressed
measurements. Applied
Optics. vol. 50, no. 22, pp. 4417–4435, 2011.
[19] J. Fowler. Compressive-projection principal component
analysis. IEEE Transac-
tions on Image Processing, 18(10):2230-2242, 2009.
[20] M. Gehm, R. John, D. Brady, R. Willett, and T. Schulz.
Single-shot compres-
sive spectral imaging with a dual-disperser architecture. Optics
Express, vol. 15,
no. 21, pp. 14 013–14 027, 2007.
[21] A. Wagadarikar, R. John, R. Willett, and D. Brady, Single
disperser design for
coded aperture snapshot spectral imaging, Applied optics, vol. 47,
no. 10, pp.
B44–B51, 2008.
[22] L. Trefethen and D. Bau, Numerical Linear Algebra. Society for
Industrial Math-
ematics, no. 50, 1997.
[23] S. Dasgupta and A. Gupta. An elementary proof of a theorem of Johnson and
Lindenstrauss. Random Structures and Algorithms, 22(1):60-65, 2003.
[24] I.T. Jolliffe. Principal Component Analysis, Second Edition.
Springer, NY, 2002.
[25] C. Liu, C. Zhao and L. Zhang. A new method of hyperspectral
remote sens-
ing image dimensional reduction. Journal of Image and Graphics,
10(2):218-222,
2005.
[26] P. Shippert. Introduction to Hyperspectral Image Analysis.
Remote Sensing of
Earth via Satellite, no. 3, Winter 2003.
[27] J. Zhang, J. Erway, X. Hu, Q. Zhang, R. Plemmons. Randomized
SVD in hy-
perspectral imaging, preprint, May 2012.
[28] Y. Chen, N. Nasrabadi and T. Tran. Effects of linear
projections on the perfor-
mance of target detection and classification in hyperspectral
imagery. Journal of
Applied Remote Sensing, vol. 5, no. 1, pp. 053563-1-053563-25,
2011.
[29] X. Jia and J. A. Richards. Segmented principal components transformation
for efficient hyperspectral remote-sensing image display and classification.
IEEE Trans. Geoscience and Remote Sensing, vol. 37, no. 1, pp. 538-542,
Jan. 1999.
[30] G. Motta and F. Rizzo and J. Storer. Hyperspectral Data
Compression. Springer,
NY, 2006.
[31] W. Li, S. Prasad, J. Fowler, L.M. Bruce. Class dependent
compressive-projection
principal component analysis for hyperspectral image
reconstruction. 3rd IEEE
Workshop on Hyperspectral Signal and Image Processing: Evolution in
Remote
Sensing (WHISPERS), 2011.
[32] B. Zhang and L. Gao. Hyperspectral Image Classification and
Target Detection.
Science Press, China, 2011.
[33] J. Fowler and Q. Du. Reconstruction from compressive random
projections of
hyperspectral imagery. Optical Remote Sensing: Advances in Signal
Processing
and Exploitation Techniques, ch.3:31-48, 2011.
[34] A. K. Brodzik and J. M. Mooney. Convex projections algorithm
for restoration
of limited-angle chromotomographic images. Journal of the Optical
Society of
America A, 16(2):246-257, 1999.
[35] P. Gader, A. Zare, R. Close, and G. Tuell, Co-registered
hyperspectral and Li-
DAR Long Beach, Mississippi data collection, 2010, University of
Florida, Uni-
versity of Missouri, and Optech International.
[36] G. Martinsson. Randomized methods for computing the Singular
Value Decom-
position (SVD) of very large matrices. Workshop on Algorithms for
Modern Mas-
sive Data Sets, Palo Alto, June 2010.
Appendix A: Related Matlab Code
Here we attach some of the Matlab code used in previous chapters. It includes
the code for the algorithms, figures and tables.
A.1 Algorithms
>> To find Algorithm 1, Algorithm 2, and Algorithm 3, please refer to
http://www.ece.msstate.edu/~fowler/CPPCA/.
>> Algorithm 4
function [U, S, V] = randProjSVD_I(A, k)
[m, n] = size(A);
O = randn(n, k);
Y = A*O;
[Q, R] = qr(Y, 0);
B = Q'*A;
[U, S, V] = svd(B);
U = Q*U;
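As a rough sanity check of the basic randomized SVD above, the following sketch applies the same steps to a synthetic matrix of exact rank k. The test matrix, its sizes, and the use of the economy-size SVD are illustrative assumptions, not data or settings from the thesis.

```matlab
% Sketch: the steps of Algorithm 4 on a synthetic rank-k matrix (assumed test data).
m = 200; n = 58; k = 5;            % illustrative sizes, not from the thesis
A = randn(m, k) * randn(k, n);     % rank k with probability 1

O = randn(n, k);                   % Gaussian test matrix
Y = A * O;                         % sample the range of A
[Q, R] = qr(Y, 0);                 % orthonormal basis for the sampled range
B = Q' * A;                        % small k-by-n matrix
[U, S, V] = svd(B, 'econ');        % economy-size SVD of the small matrix
U = Q * U;                         % lift the left singular vectors back to R^m

rel_err = norm(U * S * V' - A) / norm(A);
fprintf('relative error = %.2e\n', rel_err);
```

Because the synthetic matrix has rank exactly k, k random samples already capture its range, so the relative error should sit near machine precision. For real data with a decaying spectrum the error is governed by the discarded singular values instead.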
>> Algorithm 5
function [U, S, V] = randProjSVD_II(A, k, q)
[m, n] = size(A);
O = randn(n, k);
Y = A*O;
[Q, R] = qr(Y, 0);
for j = 1:q
    Y = A'*Q;
    [Q, R] = qr(Y, 0);
    Y = A*Q;
    [Q, R] = qr(Y, 0);
end
B = Q'*A;
[U, S, V] = svd(B);
U = Q*U;
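To illustrate why the power iterations in Algorithm 5 can help, the sketch below compares the projection error with q = 0 and q = 2 on a synthetic matrix whose singular values decay slowly. The decay profile 1/sqrt(i), the sizes, and all variable names are assumptions chosen for illustration.

```matlab
% Sketch: subspace error with q = 0 and q = 2 power iterations (assumed test matrix).
m = 300; n = 100; k = 10;              % illustrative sizes, not from the thesis
[Ua, ~] = qr(randn(m, n), 0);          % random orthonormal left factor
[Va, ~] = qr(randn(n, n), 0);          % random orthogonal right factor
s = 1 ./ sqrt(1:n);                    % assumed slowly decaying singular values
A = Ua * diag(s) * Va';

O = randn(n, k);                       % same test matrix for both runs
qs = [0 2];
errs = zeros(size(qs));
for t = 1:numel(qs)
    [Q, ~] = qr(A * O, 0);
    for j = 1:qs(t)                    % power iterations with re-orthonormalization
        [Q, ~] = qr(A' * Q, 0);
        [Q, ~] = qr(A * Q, 0);
    end
    errs(t) = norm(A - Q * (Q' * A));  % spectral-norm error of the rank-k projection
    fprintf('q = %d: error = %.4f, sigma_(k+1) = %.4f\n', qs(t), errs(t), s(k + 1));
end
```

No rank-k projection can have an error below the (k+1)-th singular value, so that value is printed as a reference. With a slowly decaying spectrum, the run with power iterations typically lands closer to it.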
>> Algorithm 6
[Q, j] = ITERandrangefinder(A, 10);
B = Q'*A;
[U, S, V] = svd(B);
UU = Q*U;

function [Q, j] = ITERandrangefinder(A, r)
% Returns the basis Q and the number of columns j (the callers request both).
[m, n] = size(A);
Y = zeros(m, r);
for i = 1:r
    omega = randn(n, 1);        % one Gaussian test vector per sample
    Y(:, i) = A*omega;
end
Q = [];
N = zeros(1, r);
for k = 1:r
    N(k) = norm(Y(:, k), 2);
end
max_N = max(N);
j = 0;
epsilon = 10^-3;
h = epsilon/(10*sqrt(2/pi));
step = 0;
while max_N > h && step < 25
    step = step + 1;
    j = j + 1;
    if j > 1
        Y(:, j) = Y(:, j) - Q*(Q'*Y(:, j));
    end
    qj = Y(:, j)/norm(Y(:, j), 2);
    Q = [Q, qj];
    omega = randn(n, 1);
    y = A*omega - Q*(Q'*A*omega);
    Y = [Y, y];
    N = [N, norm(y, 2)];
    for i = j+1:j+r-1
        Y(:, i) = Y(:, i) - qj*(qj'*Y(:, i));
        N(i) = norm(Y(:, i), 2);
    end
    max_N = max(N(j+1:j+r));    % stopping test uses the r most recent residual norms
end
>> Algorithm 7
% Input:
load A.mat
A = A(1:58, :);          % A is a symmetric matrix.
Omiga = randn(58, 30);   % 30 = 25 + 5. Here, 25 is the target rank and 5 is
                         % the oversampling parameter.
% We add an oversampling parameter to preserve accuracy.
% The error norm(A - U*Lambda*U') produced by Algorithm 7 can be larger
% than the error resulting from Algorithm 4.
Y = A*Omiga;
[Q, R] = qr(Y, 0);
Q = Q(:, 1:25);
B = Q'*Y*pinv(Q'*Omiga);
% We define B = Q'AQ. Multiplying each side by Q'Omiga gives
% B Q'Omiga = Q'AQQ'Omiga.
% Since AQQ' ≈ A, B Q'Omiga ≈ Q'A Omiga, i.e. B Q'Omiga ≈ Q'Y
% (step 5 in Algorithm 7).
[V, D] = eig(B);
U = Q*V;
% Since B = Q'AQ, A ≈ QBQ'; hence A ≈ QVD(QV)'.
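The comments in Algorithm 7 compress a short argument, which can be written out as follows. Here $\Omega$ stands for the code variable Omiga, $A$ is symmetric, and the only assumption is that $Q$ captures the range of $A$ well, i.e. $A \approx QQ^{T}A$.

```latex
\begin{align*}
  A \approx QQ^{T}A
    &\;\Longrightarrow\; A = A^{T} \approx A\,QQ^{T}, \\
  B := Q^{T}AQ
    &\;\Longrightarrow\; B\,(Q^{T}\Omega) = Q^{T}(AQQ^{T})\,\Omega
       \approx Q^{T}A\,\Omega = Q^{T}Y, \\
  &\;\Longrightarrow\; B \approx (Q^{T}Y)\,(Q^{T}\Omega)^{\dagger}, \\
  B = VDV^{T}
    &\;\Longrightarrow\; A \approx QBQ^{T} = (QV)\,D\,(QV)^{T} = UDU^{T},
       \qquad U = QV .
\end{align*}
```

The third line is the least-squares solution computed by `pinv` in the listing. It uses only the sample matrix $Y$, which is why this variant needs a single pass over $A$.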
>> Algorithm 8
>> Algorithm 9
function [B, Q, A] = RSVD_Encoder(A, Omiga)
[M, N] = size(A);
J = 20;
[N, K] = size(Omiga{1});
X = A;              % keep the input matrix; A is returned as a cell of row blocks
A = cell(1, J);
for j = 1:J
    A{j} = X(j:J:M, :);               % rows j, j+J, j+2J, ...
    Y_tilde{j} = A{j}*Omiga{j};
    [Q{j}, R{j}] = qr(Y_tilde{j}, 0);
    B{j} = Q{j}'*A{j};
end
>> Algorithm 10
function [A_TILDE, U_TILDE, S_TILDE, V_TILDE] = RSVD_Decoder(B, Q)
J = 20;
for j = 1:J
    A_TILDE{j} = Q{j}*B{j};
end
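The encoder and decoder above can be exercised together on synthetic data. The following sketch inlines both steps for a rank-3 matrix split into J interleaved row blocks. The data, the sizes, and the names Ablk, Arec, and Omega are assumptions made for illustration, and the sketch does not call the functions from the listings.

```matlab
% Sketch: interleaved row partitioning, per-block compression, and reconstruction.
M = 400; N = 58; K = 5; J = 20;        % illustrative sizes, not from the thesis
A = randn(M, 3) * randn(3, N);         % synthetic rank-3 data (assumption)

Ablk = cell(1, J); Q = cell(1, J); B = cell(1, J); Arec = cell(1, J);
for j = 1:J
    Ablk{j} = A(j:J:M, :);             % block j holds rows j, j+J, j+2J, ...
    Omega = randn(N, K);               % one Gaussian matrix per block
    [Q{j}, ~] = qr(Ablk{j} * Omega, 0);
    B{j} = Q{j}' * Ablk{j};            % encoder output: small K-by-N matrix
    Arec{j} = Q{j} * B{j};             % decoder output: reconstructed block
end

A1 = cell2mat(Ablk');                  % stack the blocks, as in the Fig. 6.2 code
A2 = cell2mat(Arec');
rel_err = norm(A2 - A1) / norm(A1);
fprintf('relative reconstruction error = %.2e\n', rel_err);
```

Only B{j} and Q{j} need to be stored for each block. Since K exceeds the rank of the synthetic data here, the reconstruction should be accurate to roughly machine precision.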
A.2 Figures and Tables
>> Fig. 5.4
clear;
R = [];
for i = 1:1:100
    [A, b, x] = gravity(15*i, 2, 0, 1, .5);
    [Q, j] = ITERandrangefinder(A, 10);
    B = Q'*A;
    [U, S, V] = svd(B);
    UU = Q*U;
    R(i, 1) = norm(UU*S*V' - A)/norm(A);
end
figure
x = 15:15:1500
plot(x, R(1:100, 1), 'r')
>> Fig. 5.5
clear; clc;
t = zeros(100, 1);
for i = 1:1:100
    [A, b, x] = gravity(15*i, 2, 0, 1, .5);
    tic
    [Q, j] = ITERandrangefinder(A, 10);
    B = Q'*A;
    [U, S, V] = svd(B);
    UU = Q*U;
    t(i) = toc;
end
figure
x = 15:15:1500;
plot(x, t, '-*');
>> Fig. 5.7 and Fig. 5.8
function [overall_accuracy, each_accuracy] = CLASSIFICATION()
load TESTDATA_CLASSIFICATION.mat
P = 6;                    % 6 classes
[m, n] = size(A);         % mean(A) = 0, std(A) = 1
Omiga = randn(58, 25);    % If it works slowly, please change it to a
                          % smaller number, such as 30.
Y = A*Omiga;
[Q, R] = qr(Y, 0);
B = Q'*A;
[U, S, V] = svd(B);
U = Q*U;
WW = S'*U';
% figure; imagesc(reshape(WW(1, :), 307, 1280));
% figure; imagesc(reshape(WW(2, :), 307, 1280));
% figure; imagesc(reshape(WW(3, :), 307, 1280));
% figure; imagesc(reshape(WW(4, :), 307, 1280));
% figure; imagesc(reshape(WW(5, :), 307, 1280));
% figure; imagesc(reshape(WW(6, :), 307, 1280));
% figure; imagesc(reshape(WW(7, :), 307, 1280));
% figure; imagesc(reshape(WW(8, :), 307, 1280));
K = WW(1:2, :)';
options = statset('MaxIter', 500);
[IDX, C] = kmeans(K, P, 'options', options);
IDX = reshape(IDX, [320 360]);
imagesc(IDX);
figure
[IDX1, C1] = kmeans(A, P, 'options', options);
IDX1 = reshape(IDX1, [320 360]);
imagesc(IDX1);
% Count the pixels assigned to each class by the two clusterings.
for c = 1:P
    a(c) = sum(sum(IDX == c));
    b(c) = sum(sum(IDX1 == c));
end
a = sort(a);
b = sort(b);
s = sum(abs(a - b));
overall_accuracy = (1 - s/(m*n));
for i = 1:P
    each_accuracy(i) = abs(a(i) - b(i))/b(i);
end
each_accuracy = diag(eye(P)) - each_accuracy';
>> Fig. 6.2
function all_reconstructions_compare2()
% Compare the reconstruction performance of different methods.
clc; clear
% Input:
load X.mat;   % X is the standard data. The mean vector of X has been
              % removed from X, i.e. E(X) = 0.
% Parameters:
[N, M] = size(X);
Ls = [3 3 3 3 3 3];   % the number of eigenvectors used to form the
                      % approximated transform matrix.
J = 20;               % split the original matrix into J partitions.
relative_dimensions = [0.1 0.15 0.2 0.3 0.4 0.5];
Ks = round(relative_dimensions * N);   % target rank
projection_matrix_file = ['projections.' num2str(N) '.' ...
    num2str(J) '.mat'];
% Psi_PCA = PCA_Train(X); original transform matrix.
X_original = X;
A = X_original';
A_original = A;
q = 5;

%% CPPCA
for index1 = 1:length(Ks)
    K = Ks(index1);
    L = Ls(index1);
    P{index1} = CPPCA_GenerateProjections(N, K, J, ...
        projection_matrix_file);
    % Generate the cell of projection matrices. The length of the cell is J.
    % (The encoder call is missing from the extracted listing; it is
    % restored here from the Fig. 6.3 code.)
    [Y_tilde{index1}, X] = CPPCA_Encoder(X_original, P{index1});
    % Compress the data X to Y. Here, the outputs X and Y are cells.
    tic
    [X_check_CPPCA, Psi_CPPCA] = CPPCA_Decoder(Y_tilde{index1}, ...
        P{index1}, L);
    toc
    % X_check_CPPCA is the reconstruction of cell X, and Psi_CPPCA is the
    % approximated transform matrix. To get Psi_CPPCA, Rayleigh-Ritz theory
    % and convex-set optimization are used.
    D_CPPCA(index1) = SNR_Dataset(X, X_check_CPPCA, ...
        [Psi_CPPCA zeros(N, N - L)]);
end
% To see the SNR of X and X_check_CPPCA.

%% Randomized SVD Directly
for index1 = 1:length(Ks)
    K = Ks(index1);
    Omiga{index1} = RSVD_Generaterandommatrices(N, K, J);
    % Generate the cell of Gaussian random matrices. The length of the
    % cell is J.
    [B{index1}, Q{index1}, A] = RSVD_Encoder(A_original, Omiga{index1});
    % Compress the data A to B and Q. Here, the outputs B, Q and A are cells.
    % B is the small matrix. Q is the orthogonal matrix.
    tic
    A_check_RSVD = RSVD_Decoder(B{index1}, Q{index1});
    toc
    % A_check_RSVD is the reconstruction of cell A.
    A1 = cell2mat(A');
    A2 = cell2mat(A_check_RSVD');
    % Change A and A_check_RSVD to A1 and A2, which have the same size
    % as X and X_check_CPPCA.
    A1 = A1'; A2 = A2';
    D_RSVD(index1) = mean(SNR(A1, A2));
end
% To see the SNR of A1 and A2.

%% Randomized SVD Power iteration
% Randomized SVD with power iteration is used for matrices whose
% singular values decay gradually.
for index1 = 1:length(Ks)
    K = Ks(index1);
    Omiga{index1} = RSVD_Generaterandommatrices(N, K, J);
    % Generate the cell of Gaussian random matrices. The length of the
    % cell is J.
    [B_POWER{index1}, Q_POWER{index1}, A_POWER] = ...
        RSVD_Power_iteration_Encoder(A_original, q, Omiga{index1});
    % Compress the data A to B and Q. Here, the outputs B, Q and A are cells.
    % B is the small matrix. Q is the orthogonal matrix.
    tic
    A_check_RSVD_POWER = RSVD_Decoder(B_POWER{index1}, Q_POWER{index1});
    toc
    % A_check_RSVD_POWER is the reconstruction of cell A.
    A1 = cell2mat(A_POWER');
    A2 = cell2mat(A_check_RSVD_POWER');
    A1 = A1'; A2 = A2';
    D_RSVD_POWER(index1) = mean(SNR(A1, A2));
end

%%
D_CPPCA = D_CPPCA(3:length(D_CPPCA));
D_RSVD = D_RSVD(3:length(D_RSVD));
D_RSVD_POWER = D_RSVD_POWER(3:length(D_RSVD_POWER));
relative_dimensions = ...
    relative_dimensions(3:length(relative_dimensions));
figure(1); clf;
plot(relative_dimensions, D_CPPCA, 'LineWidth', 2);
hold on
plot(relative_dimensions, D_RSVD, 'r', 'LineWidth', 2);
hold on
plot(relative_dimensions, D_RSVD_POWER, 'g', 'LineWidth', 2);
grid on
xlabel('Relative subspace dimension, K/N');
ylabel('Average SNR (dB)');
legend('CPPCA', 'Randomized SVD', 'Randomized SVD POWER')
>> Fig. 6.3 and Table 6.1
% Compare the accuracy of reconstruction of first 2 columns of the
% transform matrix by rsvd and cppca for the data whose singular values
% decay gradually.
% v1 = the angle between V(:,1) and the real first column of the transform matrix
% v2 = the angle between V(:,2) and the real second column of the transform matrix
% w1 = the angle between Psi_cppca(:,1) and the real first column of the transform matrix
% w2 = the angle between Psi_cppca(:,2) and the real second column of the transform matrix
clear;
load A.mat   % mean(A) = 0
[M, N] = size(A);
L = 4;       % the number of eigenvectors used to form the approximated
             % transform matrix.
J = 20;      % split the original matrix into J partitions.
k = 25;      % k is the target rank.
X_original = A';
num_trials = 10;
[Psi, S] = PCA_Train(X_original);
for trial = 1:num_trials
    %% rsvd
    Omiga = randn(58, 25);
    Y = A*Omiga;
    [Q, R] = qr(Y, 0);
    B = Q'*A;
    [U, S, V] = svd(B);
    s1_rsvd = SAM(Psi(:, 1), V(:, 1));
    if s1_rsvd > 90
        s1_rsvd = 180 - s1_rsvd;
    end
    omega1_rsvd(trial) = s1_rsvd;
    s2_rsvd = SAM(Psi(:, 2), V(:, 2));
    if s2_rsvd > 90
        s2_rsvd = 180 - s2_rsvd;
    end
    omega2_rsvd(trial) = s2_rsvd;
    s3_rsvd = SAM(Psi(:, 3), V(:, 3));
    if s3_rsvd > 90
        s3_rsvd = 180 - s3_rsvd;
    end
    omega3_rsvd(trial) = s3_rsvd;
    s4_rsvd = SAM(Psi(:, 4), V(:, 4));
    if s4_rsvd > 90
        s4_rsvd = 180 - s4_rsvd;
    end
    omega4_rsvd(trial) = s4_rsvd;
    %% cppca
    P = CPPCA_GenerateProjections(N, k, J);
    [Y_tilde, X] = CPPCA_Encoder(X_original, P);
    [X_check_CPPCA, Psi_CPPCA] = CPPCA_Decoder(Y_tilde, P, L);
    s1_cppca = SAM(Psi(:, 1), Psi_CPPCA(:, 1));
    if s1_cppca > 90
        s1_cppca = 180 - s1_cppca;
    end
    omega1_cppca(trial) = s1_cppca;
    s2_cppca = SAM(Psi(:, 2), Psi_CPPCA(:, 2));
    if s2_cppca > 90
        s2_cppca = 180 - s2_cppca;
    end
    omega2_cppca(trial) = s2_cppca;
    s3_cppca = SAM(Psi(:, 3), Psi_CPPCA(:, 3));
    if s3_cppca > 90
        s3_cppca = 180 - s3_cppca;
    end
    omega3_cppca(trial) = s3_cppca;
    s4_cppca = SAM(Psi(:, 4), Psi_CPPCA(:, 4));
    if s4_cppca > 90
        s4_cppca = 180 - s4_cppca;
    end
    omega4_cppca(trial) = s4_cppca;
end

subplot(2, 4, 1);
hist(omega1_rsvd(:), linspace(0, 90, 100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle v_1 (degrees)');
disp(['Average omega1_rsvd = ' num2str(mean(omega1_rsvd(:))) ' degrees']);

subplot(2, 4, 2);
hist(omega2_rsvd(:), linspace(0, 90, 100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle v_2 (degrees)');
disp(['Average omega2_rsvd = ' num2str(mean(omega2_rsvd(:))) ' degrees']);

subplot(2, 4, 3);
hist(omega3_rsvd(:), linspace(0, 90, 100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle v_3 (degrees)');
disp(['Average omega3_rsvd = ' num2str(mean(omega3_rsvd(:))) ' degrees']);

subplot(2, 4, 4);
hist(omega4_rsvd(:), linspace(0, 90, 100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle v_4 (degrees)');
disp(['Average omega4_rsvd = ' num2str(mean(omega4_rsvd(:))) ' degrees']);

subplot(2, 4, 5);
hist(omega1_cppca, linspace(0, 90, 100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle w_1 (degrees)');
disp(['Average omega1_cppca = ' num2str(mean(omega1_cppca(:))) ' degrees']);

subplot(2, 4, 6);
hist(omega2_cppca(:), linspace(0, 90, 100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle w_2 (degrees)');
disp(['Average omega2_cppca = ' num2str(mean(omega2_cppca(:))) ' degrees']);

subplot(2, 4, 7);
hist(omega3_cppca(:), linspace(0, 90, 100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle w_3 (degrees)');
disp(['Average omega3_cppca = ' num2str(mean(omega3_cppca(:))) ' degrees']);

subplot(2, 4, 8);
hist(omega4_cppca(:), linspace(0, 90, 100));
A = axis; A(1) = -10; A(2) = 90; axis(A);
ylabel('Frequency'); xlabel('Angle w_4 (degrees)');
disp(['Average omega4_cppca = ' num2str(mean(omega4_cppca(:))) ' degrees']);
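The listing above folds every angle larger than 90 degrees back into [0, 90], because a singular vector and its negative span the same direction. The sketch below shows that folding in isolation. The handle sam_deg is a hypothetical stand-in for the SAM function called in the listing, whose implementation is not shown here, and the test vectors are made up.

```matlab
% Sketch: angle between two vectors in degrees, folded into [0, 90].
% sam_deg is a hypothetical stand-in for the SAM function used in the listing.
% The cosine is clamped to [-1, 1] to guard against rounding.
sam_deg = @(a, b) acosd(max(min(dot(a, b) / (norm(a) * norm(b)), 1), -1));
fold = @(t) min(t, 180 - t);       % same effect as: if t > 90, t = 180 - t

v = [1; 2; 2] / 3;                 % made-up unit vector
t_flip = fold(sam_deg(v, -v));     % sign-flipped copy: folded angle is near 0
t_orth = fold(sam_deg([1; 0; 0], [0; 1; 0]));   % orthogonal vectors: 90 degrees

fprintf('flipped: %.4f degrees, orthogonal: %.4f degrees\n', t_flip, t_orth);
```

Without the folding step, a sign flip in a computed eigenvector would be reported as an angle near 180 degrees even though the recovered direction is correct.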
Vita
Jiani Zhang was born on February 18, 1987. She graduated with a
Bachelor of Science degree in Mathematics from Beijing Institute of
Technology, China, in July 2010. She then attended the Department of
Mathematics at Wake Forest University, and will receive an MA in
Mathematics from Wake Forest in August 2012. She is going to continue
her studies in Mathematics by pursuing a Ph.D. at Tufts University.