Homework 2
Principal Component Analysis
J. Andrew Casey-Clyde
Problem 1: Center
Using SVD to find the first two principal directions, we start by centering
the matrix X of data points according to their mean, m = (1, 1, 0)ᵀ:

\[
\tilde{X} =
\begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
-
\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ -1 & -1 & 0 \end{pmatrix}
\]
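The centering step above can be checked numerically; a minimal sketch using the Problem 1 data:

```python
import numpy as np

# Data matrix from Problem 1 (rows are data points)
X = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])

m = X.mean(axis=0)       # column means: (1, 1, 0)
X_tilde = X - m          # centered data matrix X̃
```

Subtracting the row of column means reproduces the centered matrix shown above.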
Problem 1: SVD
Next, we perform SVD on the centered data, with X̃ = UΣVᵀ. We start
by finding V and Σ from the eigenvectors and square roots of eigenvalues
of X̃ᵀX̃, respectively. Computing X̃ᵀX̃, we have

\[
\tilde{X}^T\tilde{X} =
\begin{pmatrix} 0 & 1 & -1 \\ 1 & 0 & -1 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ -1 & -1 & 0 \end{pmatrix}
=
\begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
Problem 1: SVD, eigenvalues
Finding the eigenvalues of X̃ᵀX̃, we have

\[
\det\left(\tilde{X}^T\tilde{X} - \lambda I\right) = 0
\quad\Longrightarrow\quad
-\lambda(2-\lambda)^2 + \lambda = 0
\]

We can immediately see that one of the eigenvalues must be 0.
Continuing, we have

\[
(2-\lambda)^2 - 1 = 0
\quad\Longrightarrow\quad
\lambda^2 - 4\lambda + 3 = 0
\quad\Longrightarrow\quad
(\lambda - 3)(\lambda - 1) = 0.
\]
Thus, Λ = diag(3, 1, 0) and

\[
\Sigma =
\begin{pmatrix} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \end{pmatrix}
=
\begin{pmatrix} \sqrt{3} & & \\ & 1 & \\ & & 0 \end{pmatrix}.
\]

Additionally, because σ₃ = 0, we can conclude that the first two principal
components account for 100% of the variance, with a total variance of
λ₁ + λ₂ = 3 + 1 = 4.
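These eigenvalues and singular values can be verified directly; a quick numpy check of the derivation above:

```python
import numpy as np

X_tilde = np.array([[0., 1., 0.],
                    [1., 0., 0.],
                    [-1., -1., 0.]])

G = X_tilde.T @ X_tilde                        # X̃ᵀX̃ = [[2,1,0],[1,2,0],[0,0,0]]
evals = np.sort(np.linalg.eigvalsh(G))[::-1]   # eigenvalues, descending: 3, 1, 0
svals = np.linalg.svd(X_tilde, compute_uv=False)   # singular values: √3, 1, 0
```

The singular values are the square roots of the eigenvalues of X̃ᵀX̃, as used in the derivation.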
Problem 1: SVD, 1st principal direction
Solving for the first principal direction, we have

\[
\tilde{X}^T\tilde{X}\,\mathbf{v}_1 = 3\mathbf{v}_1
\quad\Longrightarrow\quad
\begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
= 3 \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
\quad\Longrightarrow\quad v_3 = 0
\]

\[
2v_1 + v_2 = 3v_1 \;\Rightarrow\; v_2 = v_1, \qquad
v_1 + 2v_1 = 3v_1 \;\Rightarrow\; v_1 = v_1
\]

Thus, with normalization,
\[
\mathbf{v}_1 = \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \\ 0 \end{pmatrix}
\]
Problem 1: SVD, 2nd principal direction
Solving for the second principal direction, we have

\[
\tilde{X}^T\tilde{X}\,\mathbf{v}_2 = \mathbf{v}_2
\quad\Longrightarrow\quad
\begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
= \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}
\quad\Longrightarrow\quad v_3 = 0
\]

\[
2v_1 + v_2 = v_1 \;\Rightarrow\; v_2 = -v_1, \qquad
v_1 - 2v_1 = -v_1 \;\Rightarrow\; v_1 = v_1
\]

Normalizing,
\[
\mathbf{v}_2 = \begin{pmatrix} \sqrt{2}/2 \\ -\sqrt{2}/2 \\ 0 \end{pmatrix}
\]
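Both principal directions can be recovered numerically as eigenvectors of X̃ᵀX̃ (up to a sign flip, which eigensolvers do not fix):

```python
import numpy as np

X_tilde = np.array([[0., 1., 0.],
                    [1., 0., 0.],
                    [-1., -1., 0.]])
G = X_tilde.T @ X_tilde

evals, evecs = np.linalg.eigh(G)   # eigh returns eigenvalues ascending: 0, 1, 3
v1 = evecs[:, 2]                   # direction for eigenvalue 3 (sign may flip)
v2 = evecs[:, 1]                   # direction for eigenvalue 1 (sign may flip)
```

Up to sign, v1 is (√2/2, √2/2, 0)ᵀ and v2 is (√2/2, −√2/2, 0)ᵀ, matching the hand derivation.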
Problem 1: PCA
With the first two principal directions, we have enough information to
compute the first two principal components for each point:

\[
Y = \tilde{X}\,[\mathbf{v}_1\ \mathbf{v}_2] =
\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ -1 & -1 & 0 \end{pmatrix}
\begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 \\ \sqrt{2}/2 & -\sqrt{2}/2 \\ 0 & 0 \end{pmatrix}
=
\begin{pmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2} & 0 \end{pmatrix}
\]
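The projection above is a single matrix product; a sketch confirming the result:

```python
import numpy as np

X_tilde = np.array([[0., 1., 0.],
                    [1., 0., 0.],
                    [-1., -1., 0.]])
s = np.sqrt(2) / 2
V2 = np.array([[s,  s],
               [s, -s],
               [0., 0.]])   # v1 and v2 as columns

Y = X_tilde @ V2            # first two principal components of each point
```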
Problem 2
We consider the dot product uᵢᵀuⱼ. Since uᵢ = (1/σᵢ) X vᵢ, we have

\[
u_i^T u_j = \frac{1}{\sigma_i \sigma_j}\, v_i^T X^T X v_j.
\]

But the vectors vᵢ are just eigenvectors of XᵀX with associated
eigenvalues σᵢ². Therefore we have

\[
u_i^T u_j = \frac{1}{\sigma_i \sigma_j}\, v_i^T \sigma_j^2 v_j
= \frac{\sigma_j}{\sigma_i}\, v_i^T v_j.
\]

When i = j, the above quantity is 1, as the factor in front cancels
itself and the vᵢ already form an orthonormal basis. Otherwise, when
i ≠ j, it is 0 for the same orthonormality reasons.
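The argument can be checked numerically: rebuilding each uᵢ as X vᵢ / σᵢ reproduces an orthonormal set. The matrix here is an arbitrary full-rank stand-in:

```python
import numpy as np

# Numerical check of the Problem 2 argument: with u_i = (1/σ_i) X v_i,
# the u_i come out orthonormal.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))          # any full-rank data matrix

U, S, Vt = np.linalg.svd(X, full_matrices=False)
U_rebuilt = (X @ Vt.T) / S               # column i is X v_i / σ_i

gram = U_rebuilt.T @ U_rebuilt           # should be the 4×4 identity
```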
Problem 3a
[Figure: scatter of the first two principal components (pc1 vs. pc2) — "First Two Principal Components: MNIST 0"]
Problem 3b
[Figure: scatter of the first two principal components (pc1 vs. pc2) — "First Two Principal Components: MNIST 1"]
Problem 3c
[Figure: scatter of the first two principal components (pc1 vs. pc2) — "First Two Principal Components: MNIST 0, 101"]
Problem 4: PCA
For the USPS data,
• 50% of the variance is retained with the first 7 components
• 95% of the variance is retained with the top 88 components
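The component counts (7 and 88) come from the USPS data itself; the procedure for finding them can be sketched on a made-up matrix with decaying feature scales, so the counts produced here are illustrative only:

```python
import numpy as np

# Hypothetical stand-in for the USPS design matrix
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 64)) * np.arange(64, 0, -1)

Xc = X - X.mean(axis=0)
svals = np.linalg.svd(Xc, compute_uv=False)
frac = np.cumsum(svals**2) / np.sum(svals**2)   # cumulative variance fraction

k50 = int(np.searchsorted(frac, 0.50)) + 1      # components for ≥50% variance
k95 = int(np.searchsorted(frac, 0.95)) + 1      # components for ≥95% variance
```

The squared singular values are the (unnormalized) variances along each principal direction, so the cumulative sum gives the retained-variance fraction.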
Problem 4: kNN Error Rate
[Figure: kNN error rate vs. k neighbors, comparing 50% variance, 95% variance, and no PCA]

• Highest accuracy method: PCA with 88 components (95% variance retained)
• Error Rate: 0.0523
• k = 5
• At all values of k, PCA with 95% variance gives the highest accuracy
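A compact sketch of this pipeline (center, PCA-project, then kNN by majority vote). The two-class Gaussian data, the split, and the resulting error rate are synthetic stand-ins for the USPS experiment:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.vstack([rng.standard_normal((200, 20)),
               rng.standard_normal((200, 20)) + 2.0])
y = np.repeat([0, 1], 200)

# PCA: project onto the top right singular vectors of the centered data
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
Z = (X - mu) @ Vt[:5].T                  # keep 5 components (illustrative)

idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]

# kNN: majority vote among the k nearest training points
k = 5
dists = np.linalg.norm(Z[test][:, None, :] - Z[train][None, :, :], axis=2)
nearest = np.argsort(dists, axis=1)[:, :k]
pred = (y[train][nearest].mean(axis=1) > 0.5).astype(int)
error_rate = np.mean(pred != y[test])
```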
Problem 4: kNN Run Times
[Figure: kNN run time (s) vs. k neighbors, split into PCA, training, and testing time for 50% variance, 95% variance, and no PCA]

• Lowest run time: PCA with 7 components (50% variance retained)
• Run time: 0.988 s
• k = 4
• At all values of k, PCA with 50% variance runs in the shortest time
• Testing dominates run time for PCA with 95% variance and no PCA
• For 50% variance, run time is primarily driven by the time to perform PCA
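The run-time plots split the total into PCA, training, and testing phases; a minimal sketch of that bookkeeping on synthetic data (the 7-component choice mirrors the 50%-variance case; the times themselves will of course differ):

```python
import time
import numpy as np

rng = np.random.default_rng(4)
X_train = rng.standard_normal((1000, 64))
X_test = rng.standard_normal((200, 64))

t0 = time.perf_counter()
mu = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mu, full_matrices=False)
W = Vt[:7].T                             # projection onto 7 components
t_pca = time.perf_counter() - t0

t0 = time.perf_counter()
Z_train = (X_train - mu) @ W             # kNN "training" is projection/storage
t_train = time.perf_counter() - t0

t0 = time.perf_counter()
Z_test = (X_test - mu) @ W               # testing: project and compute distances
dists = np.linalg.norm(Z_test[:, None] - Z_train[None, :], axis=2)
t_test = time.perf_counter() - t0

t_total = t_pca + t_train + t_test
```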
Problem 5: NLC Error Rate
[Figure: NLC error rate vs. k neighbors per centroid, comparing 50% variance, 95% variance, and no PCA]

• Highest accuracy method: NLC without PCA
• Error Rate: 0.0443
• k = 4
• While NLC without PCA achieves the lowest overall error rate, it is notably competitive with 95% variance PCA, which achieves better accuracy at several values of k
• Both methods are consistently better than PCA with only 50% variance retained
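A sketch of a nearest-local-centroid (NLC) classifier under my reading of the method: for each test point and each class, average that point's k nearest neighbors within the class into a local centroid, then predict the class whose centroid lies closest. The two-cluster data and error rate here are synthetic, not the USPS results:

```python
import numpy as np

rng = np.random.default_rng(3)
X_train = np.vstack([rng.standard_normal((100, 5)),
                     rng.standard_normal((100, 5)) + 3.0])
y_train = np.repeat([0, 1], 100)
X_test = np.vstack([rng.standard_normal((20, 5)),
                    rng.standard_normal((20, 5)) + 3.0])
y_test = np.repeat([0, 1], 20)

def nlc_predict(x, k=4):
    best_class, best_dist = -1, np.inf
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d = np.linalg.norm(Xc - x, axis=1)
        centroid = Xc[np.argsort(d)[:k]].mean(axis=0)  # local class centroid
        dist = np.linalg.norm(centroid - x)
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class

pred = np.array([nlc_predict(x) for x in X_test])
error_rate = np.mean(pred != y_test)
```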
Problem 5: NLC Run Times
[Figure: NLC run time (s) vs. k neighbors per centroid, split into PCA, training, and testing time for 50% variance, 95% variance, and no PCA]

• Lowest run time: PCA with 7 components (50% variance retained)
• Run time: 5.09 s
• k = 2
• At all values of k, PCA with 50% variance again has the shortest run time (and is nearly flat)
• For the NLC classifier, testing time dominates for all choices of variance retained
Problem 6
• Using stellar photometric data from Sloan Digital Sky Survey
(SDSS)
• Magnitude measurements of light coming from stars in narrow
frequency bands (color magnitudes)
• Color magnitude values for each star are biased based on unknown
distances
• Can instead take difference between adjacent values to unbias data
(color indexes) [1]
• Data set has low dimensionality! Only 4 features
• 50% variance retained with only 1 component!
• 95% variance retained with 3 components
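Computing color indexes from adjacent-band magnitudes is a one-line difference. The five bands below follow the SDSS u, g, r, i, z filter order, but the magnitude values themselves are invented for illustration:

```python
import numpy as np

mags = np.array([[18.2, 16.9, 16.3, 16.0, 15.9],   # one star per row
                 [19.5, 18.1, 17.6, 17.4, 17.3]])

# An unknown distance adds the same offset to every band of a star, so
# differencing adjacent bands (u−g, g−r, r−i, i−z) cancels it.
colors = mags[:, :-1] - mags[:, 1:]
```

Note that adding a constant to every band of a star (i.e., moving it to a different distance) leaves these four color indexes unchanged, which is exactly the unbiasing property used here.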
Problem 6: SDSS kNN Error Rate
[Figure: kNN error rate vs. k neighbors on the SDSS data, comparing 50% variance, 95% variance, and no PCA]

• Highest accuracy method: kNN without PCA
• Error Rate: 0.0702
• k = 7
• kNN without PCA offers consistently better accuracy
Problem 6: SDSS kNN Run Times
[Figure: kNN run time (s) vs. k neighbors on the SDSS data, split into PCA, training, and testing time]

• Lowest run time: PCA with 3 components (95% variance retained)
• Run time: 0.0283 s
• k = 2
• 95% variance retention is consistently faster at all values of k
Problem 6: SDSS NLC Error Rate
[Figure: NLC error rate vs. k neighbors per centroid on the SDSS data, comparing 50% variance, 95% variance, and no PCA]

• Highest accuracy method: NLC without PCA
• Error Rate: 0.0817
• k = 10
• NLC without PCA consistently more accurate
Problem 6: SDSS NLC Run Times
[Figure: NLC run time (s) vs. k neighbors per centroid on the SDSS data, split into PCA, training, and testing time]

• Lowest run time: PCA with 1 component (50% variance retained)
• Run time: 2.876 s
• k = 4
• Run times competitive at most values of k for all methods
Problem 6: SDSS Results
• Overall, the best accuracies for this dataset are achieved without the use of PCA, and kNN in particular outperforms NLC
• This makes sense, given the low dimensionality of the original data
• Lowest run times achieved using kNN with 95% variance retained
• Run times for all variance levels are relatively competitive within each approach!
• Larger differences appear in the choice of method, with NLC taking 2 orders of magnitude longer than kNN