Homework 2
Principal Component Analysis
J. Andrew Casey-Clyde
Problem 1: Center
Using SVD to find the first two principal directions, we start by centering
the matrix X of data points according to their mean, m = (1, 1, 0)ᵀ:

\[
\tilde{X} =
\begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
-
\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ -1 & -1 & 0 \end{pmatrix}
\]
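The centering step above can be checked numerically; a minimal sketch using the Problem 1 data:

```python
import numpy as np

# Data matrix from Problem 1 (rows are data points)
X = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])

m = X.mean(axis=0)       # column means: (1, 1, 0)
X_tilde = X - m          # centered data matrix X̃
```

Subtracting the row of column means reproduces the centered matrix shown above.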
Problem 1: SVD
Next, we perform SVD on the centered data, with X̃ = UΣVᵀ. We start
by finding V and Σ from the eigenvectors and square roots of eigenvalues
of X̃ᵀX̃, respectively. Computing X̃ᵀX̃, we have

\[
\tilde{X}^T\tilde{X} =
\begin{pmatrix} 0 & 1 & -1 \\ 1 & 0 & -1 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ -1 & -1 & 0 \end{pmatrix}
=
\begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
Problem 1: SVD, eigenvalues
Finding the eigenvalues of X̃ᵀX̃, we have

\[
\det\left(\tilde{X}^T\tilde{X} - \lambda I\right) = 0
\quad\Longrightarrow\quad
-\lambda(2-\lambda)^2 + \lambda = 0
\]

We can immediately see that one of the eigenvalues must be 0.
Continuing, we have

\[
(2-\lambda)^2 - 1 = 0
\quad\Longrightarrow\quad
\lambda^2 - 4\lambda + 3 = 0
\quad\Longrightarrow\quad
(\lambda - 3)(\lambda - 1) = 0.
\]
Thus, Λ = diag(3, 1, 0) and

\[
\Sigma =
\begin{pmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{pmatrix}
=
\begin{pmatrix} \sqrt{3} & & \\ & 1 & \\ & & 0 \end{pmatrix}.
\]

Additionally, because σ₃ = 0, we can conclude that the first two principal
components account for 100% of the variance, with a total variance of
λ₁ + λ₂ = 3 + 1 = 4.
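These eigenvalues and singular values can be verified directly; a quick numpy check of the derivation above:

```python
import numpy as np

X_tilde = np.array([[0., 1., 0.],
                    [1., 0., 0.],
                    [-1., -1., 0.]])

G = X_tilde.T @ X_tilde                        # X̃ᵀX̃ = [[2,1,0],[1,2,0],[0,0,0]]
evals = np.sort(np.linalg.eigvalsh(G))[::-1]   # eigenvalues, descending: 3, 1, 0
svals = np.linalg.svd(X_tilde, compute_uv=False)   # singular values: √3, 1, 0
```

The singular values are the square roots of the eigenvalues of X̃ᵀX̃, as used in the derivation.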
Problem 1: SVD, 1st principal direction
Solving for the first principal direction, we have

\[
\tilde{X}^T\tilde{X}\,\mathbf{v}_1 = 3\mathbf{v}_1
\quad\Longrightarrow\quad
\begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
= 3 \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
\quad\Longrightarrow\quad v_3 = 0
\]

\[
2v_1 + v_2 = 3v_1 \;\Rightarrow\; v_2 = v_1, \qquad
v_1 + 2v_1 = 3v_1 \;\Rightarrow\; v_1 = v_1
\]

Thus, with normalization,
\[
\mathbf{v}_1 = \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \\ 0 \end{pmatrix}
\]
Problem 1: SVD, 2nd principal direction
Solving for the second principal direction, we have

\[
\tilde{X}^T\tilde{X}\,\mathbf{v}_2 = \mathbf{v}_2
\quad\Longrightarrow\quad
\begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
= \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
\quad\Longrightarrow\quad v_3 = 0
\]

\[
2v_1 + v_2 = v_1 \;\Rightarrow\; v_2 = -v_1, \qquad
v_1 - 2v_1 = -v_1 \;\Rightarrow\; v_1 = v_1
\]

Normalizing,
\[
\mathbf{v}_2 = \begin{pmatrix} \sqrt{2}/2 \\ -\sqrt{2}/2 \\ 0 \end{pmatrix}
\]
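Both principal directions can be recovered numerically as eigenvectors of X̃ᵀX̃ (up to a sign flip, which eigensolvers do not fix):

```python
import numpy as np

X_tilde = np.array([[0., 1., 0.],
                    [1., 0., 0.],
                    [-1., -1., 0.]])
G = X_tilde.T @ X_tilde

evals, evecs = np.linalg.eigh(G)   # eigh returns eigenvalues ascending: 0, 1, 3
v1 = evecs[:, 2]                   # direction for eigenvalue 3 (sign may flip)
v2 = evecs[:, 1]                   # direction for eigenvalue 1 (sign may flip)
```

Up to sign, v1 is (√2/2, √2/2, 0)ᵀ and v2 is (√2/2, −√2/2, 0)ᵀ, matching the hand derivation.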
Problem 1: PCA
With the first two principal directions, we have enough information to
compute the first two principal components for each point:

\[
Y = \tilde{X}\,[\mathbf{v}_1\ \mathbf{v}_2] =
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ -1 & -1 & 0 \end{pmatrix}
\begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 \\ \sqrt{2}/2 & -\sqrt{2}/2 \\ 0 & 0 \end{pmatrix}
=
\begin{pmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2} & 0 \end{pmatrix}
\]
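The projection above is a single matrix product; a sketch confirming the result:

```python
import numpy as np

X_tilde = np.array([[0., 1., 0.],
                    [1., 0., 0.],
                    [-1., -1., 0.]])
s = np.sqrt(2) / 2
V2 = np.array([[s,  s],
               [s, -s],
               [0., 0.]])   # v1 and v2 as columns

Y = X_tilde @ V2            # first two principal components of each point
```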
Problem 2
We consider the dot product uᵢᵀuⱼ. Since uᵢ = (1/σᵢ) X vᵢ, we have

\[
u_i^T u_j = \frac{1}{\sigma_i \sigma_j}\, v_i^T X^T X v_j.
\]

But the vectors vᵢ are just eigenvectors of XᵀX with associated
eigenvalues σᵢ². Therefore we have

\[
u_i^T u_j = \frac{1}{\sigma_i \sigma_j}\, v_i^T \sigma_j^2 v_j
= \frac{\sigma_j}{\sigma_i}\, v_i^T v_j.
\]

When i = j, the above quantity is 1, as the factor in front cancels
itself and the vᵢ already form an orthonormal basis. Otherwise, when
i ≠ j, it is 0 for the same orthonormality reasons.
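The argument can be checked numerically: rebuilding each uᵢ as X vᵢ / σᵢ reproduces an orthonormal set. The matrix here is an arbitrary full-rank stand-in:

```python
import numpy as np

# Numerical check of the Problem 2 argument: with u_i = (1/σ_i) X v_i,
# the u_i come out orthonormal.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))          # any full-rank data matrix

U, S, Vt = np.linalg.svd(X, full_matrices=False)
U_rebuilt = (X @ Vt.T) / S               # column i is X v_i / σ_i

gram = U_rebuilt.T @ U_rebuilt           # should be the 4×4 identity
```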
Problem 3a
[Figure: scatter of the first two principal components (pc1 vs. pc2) — "First Two Principal Components: MNIST 0"]
Problem 3b
[Figure: scatter of the first two principal components (pc1 vs. pc2) — "First Two Principal Components: MNIST 1"]
Problem 3c
[Figure: scatter of the first two principal components (pc1 vs. pc2) — "First Two Principal Components: MNIST 0, 101"]
Problem 4: PCA
For the USPS data,
• 50% of the variance is retained with the first 7 components
• 95% of the variance is retained with the top 88 components
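The component counts (7 and 88) come from the USPS data itself; the procedure for finding them can be sketched on a made-up matrix with decaying feature scales, so the counts produced here are illustrative only:

```python
import numpy as np

# Hypothetical stand-in for the USPS design matrix
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 64)) * np.arange(64, 0, -1)

Xc = X - X.mean(axis=0)
svals = np.linalg.svd(Xc, compute_uv=False)
frac = np.cumsum(svals**2) / np.sum(svals**2)   # cumulative variance fraction

k50 = int(np.searchsorted(frac, 0.50)) + 1      # components for ≥50% variance
k95 = int(np.searchsorted(frac, 0.95)) + 1      # components for ≥95% variance
```

The squared singular values are the (unnormalized) variances along each principal direction, so the cumulative sum gives the retained-variance fraction.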
Problem 4: kNN Error Rate
[Figure: kNN error rate vs. k neighbors, comparing 50% variance, 95% variance, and no PCA]

• Highest accuracy method: PCA with 88 components (95% variance retained)
• Error Rate: 0.0523
• k = 5
• At all values of k, PCA with 95% variance gives the highest accuracy
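A compact sketch of this pipeline (center, PCA-project, then kNN by majority vote). The two-class Gaussian data, the split, and the resulting error rate are synthetic stand-ins for the USPS experiment:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.standard_normal((200, 20)),
               rng.standard_normal((200, 20)) + 2.0])
y = np.repeat([0, 1], 200)

# PCA: project onto the top right singular vectors of the centered data
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
Z = (X - mu) @ Vt[:5].T                  # keep 5 components (illustrative)

idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]

# kNN: majority vote among the k nearest training points
k = 5
dists = np.linalg.norm(Z[test][:, None, :] - Z[train][None, :, :], axis=2)
nearest = np.argsort(dists, axis=1)[:, :k]
pred = (y[train][nearest].mean(axis=1) > 0.5).astype(int)
error_rate = np.mean(pred != y[test])
```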
Problem 4: kNN Run Times
[Figure: kNN run time (s) vs. k neighbors, split into PCA, training, and testing time for 50% variance, 95% variance, and no PCA]

• Lowest run time: PCA with 7 components (50% variance retained)
• Run time: 0.988 s
• k = 4
• At all values of k, PCA with 50% variance runs in the shortest time
• Testing dominates run time for PCA with 95% variance and no PCA
• For 50% variance, run time is primarily driven by the time to perform PCA
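The run-time plots split the total into PCA, training, and testing phases; a minimal sketch of that bookkeeping on synthetic data (the 7-component choice mirrors the 50%-variance case; the times themselves will of course differ):

```python
import time
import numpy as np

rng = np.random.default_rng(4)
X_train = rng.standard_normal((1000, 64))
X_test = rng.standard_normal((200, 64))

t0 = time.perf_counter()
mu = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
W = Vt[:7].T                             # projection onto 7 components
t_pca = time.perf_counter() - t0

t0 = time.perf_counter()
Z_train = (X_train - mu) @ W             # kNN "training" is projection/storage
t_train = time.perf_counter() - t0

t0 = time.perf_counter()
Z_test = (X_test - mu) @ W               # testing: project and compute distances
dists = np.linalg.norm(Z_test[:, None] - Z_train[None, :], axis=2)
t_test = time.perf_counter() - t0

t_total = t_pca + t_train + t_test
```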
Problem 5: NLC Error Rate
[Figure: NLC error rate vs. k neighbors per centroid, comparing 50% variance, 95% variance, and no PCA]

• Highest accuracy method: NLC without PCA
• Error Rate: 0.0443
• k = 4
• While NLC without PCA achieves the lowest overall error rate, it is notably competitive with 95% variance PCA, which achieves better accuracy at several values of k
• Both methods are consistently better than PCA with only 50% variance retained
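A sketch of a nearest-local-centroid (NLC) classifier under my reading of the method: for each test point and each class, average that point's k nearest neighbors within the class into a local centroid, then predict the class whose centroid lies closest. The two-cluster data and error rate here are synthetic, not the USPS results:

```python
import numpy as np

rng = np.random.default_rng(3)
X_train = np.vstack([rng.standard_normal((100, 5)),
                     rng.standard_normal((100, 5)) + 3.0])
y_train = np.repeat([0, 1], 100)
X_test = np.vstack([rng.standard_normal((20, 5)),
                    rng.standard_normal((20, 5)) + 3.0])
y_test = np.repeat([0, 1], 20)

def nlc_predict(x, k=4):
    best_class, best_dist = -1, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x, axis=1)
        centroid = Xc[np.argsort(d)[:k]].mean(axis=0)  # local class centroid
        dist = np.linalg.norm(centroid - x)
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class

pred = np.array([nlc_predict(x) for x in X_test])
error_rate = np.mean(pred != y_test)
```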
Problem 5: NLC Run Times
[Figure: NLC run time (s) vs. k neighbors per centroid, split into PCA, training, and testing time for 50% variance, 95% variance, and no PCA]

• Lowest run time: PCA with 7 components (50% variance retained)
• Run time: 5.09 s
• k = 2
• At all values of k, PCA with 50% variance again has the shortest run time (and is nearly flat)
• For the NLC classifier, testing time dominates for all choices of variance retained
Problem 6
• Using stellar photometric data from Sloan Digital Sky Survey
(SDSS)
• Magnitude measurements of light coming from stars in narrow
frequency bands (color magnitudes)
• Color magnitude values for each star are biased based on unknown
distances
• Can instead take difference between adjacent values to unbias data
(color indexes) [1]
• Data set has low dimensionality! Only 4 features
• 50% variance retained with only 1 component!
• 95% variance retained with 3 components
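Computing color indexes from adjacent-band magnitudes is a one-line difference. The five bands below follow the SDSS u, g, r, i, z filter order, but the magnitude values themselves are invented for illustration:

```python
import numpy as np

mags = np.array([[18.2, 16.9, 16.3, 16.0, 15.9],   # one star per row
                 [19.5, 18.1, 17.6, 17.4, 17.3]])

# An unknown distance adds the same offset to every band of a star, so
# differencing adjacent bands (u−g, g−r, r−i, i−z) cancels it.
colors = mags[:, :-1] - mags[:, 1:]
```

Note that adding a constant to every band of a star (i.e., moving it to a different distance) leaves these four color indexes unchanged, which is exactly the unbiasing property used here.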
Problem 6: SDSS kNN Error Rate
[Figure: kNN error rate vs. k neighbors on the SDSS data, comparing 50% variance, 95% variance, and no PCA]

• Highest accuracy method: kNN without PCA
• Error Rate: 0.0702
• k = 7
• kNN without PCA offers consistently better accuracy
Problem 6: SDSS kNN Run Times
[Figure: kNN run time (s) vs. k neighbors on the SDSS data, split into PCA, training, and testing time]

• Lowest run time: PCA with 3 components (95% variance retained)
• Run time: 0.0283 s
• k = 2
• 95% variance retention is consistently faster at all values of k
Problem 6: SDSS NLC Error Rate
[Figure: NLC error rate vs. k neighbors per centroid on the SDSS data, comparing 50% variance, 95% variance, and no PCA]

• Highest accuracy method: NLC without PCA
• Error Rate: 0.0817
• k = 10
• NLC without PCA consistently more accurate
Problem 6: SDSS NLC Run Times
[Figure: NLC run time (s) vs. k neighbors per centroid on the SDSS data, split into PCA, training, and testing time]

• Lowest run time: PCA with 1 component (50% variance retained)
• Run time: 2.876 s
• k = 4
• Run times competitive at most values of k for all methods
Problem 6: SDSS Results
• Overall, the best accuracies for this dataset are achieved without the use of PCA, and kNN in particular outperforms NLC
• This makes sense, given the low dimensionality of the original data
• Lowest run times achieved using kNN with 95% variance retained
• Run times for all variance levels are relatively competitive within each approach!
• Larger differences appear in the choice of method, with NLC taking 2 orders of magnitude longer than kNN