Dimensionality Reduction: PCA
Machine Learning – CSE446
David Wadden (slides provided by Carlos Guestrin)
University of Washington
Feb 22, 2017
©Carlos Guestrin 2005-2017
Dimensionality reduction
- Input data may have thousands or millions of dimensions!
  - e.g., text data has one dimension per word in the vocabulary
- Dimensionality reduction: represent data with fewer dimensions
  - easier learning – fewer parameters
  - visualization – hard to visualize more than 3D or 4D
  - discover the “intrinsic dimensionality” of data
    - high-dimensional data that is truly lower dimensional
Lower dimensional projections
- Rather than picking a subset of the features, we can create new features that are combinations of existing features
- Let’s see this in the unsupervised setting
  - just X, but no Y
Linear projection and reconstruction
[Figure: 2-D data points in the (x[1], x[2]) plane projected onto a 1-dimensional line z; reconstruction: knowing only z, what was (x[1], x[2])?]
Principal component analysis – basic idea

- Project D-dimensional data into a K-dimensional space while preserving information:
  - e.g., project a space of 10,000 words into 3 dimensions
  - e.g., project 3-D into 2-D
- Choose the projection with minimum reconstruction error
Linear projections, a review
- Project a point into a (lower dimensional) space:
  - point: x_i = (x_i[1],…,x_i[D])
  - select a basis – a set of basis vectors – (u_1,…,u_K)
    - we consider an orthonormal basis: u_i · u_i = 1, and u_i · u_j = 0 for i ≠ j
  - select a center – x̄, which defines the offset of the space
  - the best coordinates in the lower dimensional space are given by the dot-products (z_i[1],…,z_i[K]), where

    z_i[j] = (x_i - \bar{x}) \cdot u_j
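A minimal NumPy sketch of this projection and reconstruction, assuming illustrative data and an orthonormal basis built via QR (none of these names or sizes come from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))       # N=100 points, D=5 dimensions (illustrative)
x_bar = X.mean(axis=0)              # the center of the projection space

# K=2 orthonormal basis vectors u_1, u_2 as columns of U (via QR factorization)
U, _ = np.linalg.qr(rng.normal(size=(5, 2)))

Z = (X - x_bar) @ U                 # coordinates: z_i[j] = (x_i - x_bar) . u_j
X_hat = x_bar + Z @ U.T             # reconstruction knowing only the z's
```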
PCA finds projection that minimizes reconstruction error

- Given N data points: x_i = (x_i[1],…,x_i[D]), i = 1…N
- Will represent each point as a projection:

    \hat{x}_i = \bar{x} + \sum_{j=1}^{K} z_i[j]\, u_j

  where

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \quad\text{and}\quad z_i[j] = (x_i - \bar{x}) \cdot u_j

- PCA: given K < D, find (u_1,…,u_K) minimizing the reconstruction error:

    \mathrm{error}_K = \sum_{i=1}^{N} (x_i - \hat{x}_i)^2
Understanding the reconstruction error

- Given K < D, find (u_1,…,u_K) minimizing the reconstruction error:

    \mathrm{error}_K = \sum_{i=1}^{N} (x_i - \hat{x}_i)^2, \quad\text{where}\quad \hat{x}_i = \bar{x} + \sum_{j=1}^{K} z_i[j]\, u_j \quad\text{and}\quad z_i[j] = (x_i - \bar{x}) \cdot u_j

- Note that x_i can be represented exactly by a D-dimensional projection:

    x_i = \bar{x} + \sum_{j=1}^{D} z_i[j]\, u_j

- Rewriting the error:

    \mathrm{error}_K = \sum_{i=1}^{N} (x_i - \hat{x}_i)^2
      = \sum_{i=1}^{N} \left[ \bar{x} + \sum_{j=1}^{D} z_i[j]\, u_j - \left( \bar{x} + \sum_{j=1}^{K} z_i[j]\, u_j \right) \right]^2
      = \sum_{i=1}^{N} \left[ \sum_{j=K+1}^{D} z_i[j]\, u_j \right]^2
      = \sum_{i=1}^{N} \left[ \sum_{j=K+1}^{D} z_i[j]\, u_j \cdot u_j\, z_i[j] + 2 \sum_{j=K+1}^{D} \sum_{\ell > j} z_i[j]\, u_j \cdot u_\ell\, z_i[\ell] \right]
      = \sum_{i=1}^{N} \sum_{j=K+1}^{D} (z_i[j])^2

  since the basis is orthonormal (u_j · u_j = 1 and u_j · u_ℓ = 0 for ℓ ≠ j), the cross terms vanish
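This identity is easy to check numerically. A hedged sketch (all names and sizes are illustrative): build a full orthonormal basis, reconstruct from the first K coordinates, and compare the two expressions for the error:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, K = 50, 4, 2
X = rng.normal(size=(N, D))
x_bar = X.mean(axis=0)

U, _ = np.linalg.qr(rng.normal(size=(D, D)))  # full orthonormal basis u_1..u_D
Z = (X - x_bar) @ U                           # all D coordinates z_i[j]
X_hat = x_bar + Z[:, :K] @ U[:, :K].T         # keep only the first K terms

direct = np.sum((X - X_hat) ** 2)             # error_K as defined
via_z = np.sum(Z[:, K:] ** 2)                 # sum of discarded (z_i[j])^2
assert np.isclose(direct, via_z)
```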
Reconstruction error and covariance matrix
The covariance matrix of the data, and its entries:

    \Sigma = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T,
    \qquad
    \sigma_{m\ell} = \frac{1}{N} \sum_{i=1}^{N} (x_i[m] - \bar{x}[m])(x_i[\ell] - \bar{x}[\ell])

Rewriting the reconstruction error in terms of \Sigma:

    \mathrm{error}_K = \sum_{i=1}^{N} \sum_{j=K+1}^{D} \left[ u_j \cdot (x_i - \bar{x}) \right]^2
      = \sum_{i=1}^{N} \sum_{j=K+1}^{D} u_j^T (x_i - \bar{x})(x_i - \bar{x})^T u_j
      = \sum_{j=K+1}^{D} u_j^T \left[ \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T \right] u_j
      = N \sum_{j=K+1}^{D} u_j^T \Sigma\, u_j
Minimizing reconstruction error and eigenvectors

- Minimizing the reconstruction error is equivalent to picking an (ordered) orthonormal basis (u_1,…,u_D) minimizing:

    \mathrm{error}_K = N \sum_{j=K+1}^{D} u_j^T \Sigma\, u_j

- Eigenvector:

    \Sigma u = \lambda u

- Minimizing the reconstruction error is equivalent to picking (u_{K+1},…,u_D) to be the eigenvectors with the smallest eigenvalues
Basic PCA algorithm

- Start from the N × D data matrix X (one row per datapoint)
- Recenter: subtract the mean from each row of X
  - X_c ← X − X̄
- Compute the covariance matrix:
  - Σ ← (1/N) X_c^T X_c
- Find the eigenvectors and eigenvalues of Σ
- Principal components: the k eigenvectors with the highest eigenvalues (a code sketch follows below)
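A minimal NumPy sketch of the algorithm above (the function name is mine; assumed shapes: X is N × D, one row per datapoint):

```python
import numpy as np

def pca_eig(X, k):
    """Top-k principal components via the covariance eigendecomposition."""
    X_c = X - X.mean(axis=0)              # recenter: subtract the mean
    Sigma = (X_c.T @ X_c) / len(X)        # covariance matrix, D x D
    evals, evecs = np.linalg.eigh(Sigma)  # eigh: eigensolver for symmetric matrices
    order = np.argsort(evals)[::-1]       # sort eigenvalues, largest first
    return evals[order[:k]], evecs[:, order[:k]]
```

np.linalg.eigh is the appropriate solver here because Σ is symmetric; it returns eigenvalues in ascending order, hence the re-sort.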
PCA example
    \hat{x}_i = \bar{x} + \sum_{j=1}^{K} z_i[j]\, u_j
PCA example – reconstruction
- only used the first principal component:

    \hat{x}_i = \bar{x} + \sum_{j=1}^{K} z_i[j]\, u_j \quad (\text{here } K = 1)
Eigenfaces [Turk, Pentland ’91]
- Input images: [figure]
- Principal components: [figure]
Eigenfaces reconstruction
- Each image corresponds to adding 8 principal components: [figure]
Scaling up
- The covariance matrix can be really big!
  - Σ is D × D
  - with, say, only 10,000 features, finding eigenvectors is already very slow…
- Use the singular value decomposition (SVD)
  - finds the top K eigenvectors
  - great implementations available, e.g., scipy.linalg.svd (see the sketch below)
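One hedged route for the large-D case (my suggestion; the slide names only scipy.linalg.svd): scipy.sparse.linalg.svds computes just the top-k singular triplets of the centered data matrix, so the D × D covariance matrix is never formed. Sizes below are illustrative:

```python
import numpy as np
from scipy.sparse.linalg import svds  # truncated SVD: only k singular triplets

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2000))          # N=200 points, D=2000 features
X_c = X - X.mean(axis=0)                  # recenter; no covariance matrix needed

W, s, Vt = svds(X_c, k=3)                 # top-3 singular triplets
order = np.argsort(s)[::-1]               # svds does not guarantee ordering
W, s, Vt = W[:, order], s[order], Vt[order]
# Rows of Vt are the principal components; the covariance eigenvalues are
# recoverable as s**2 / len(X).
```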
SVD

- Write X = W S V^T
  - X ← data matrix, one row per datapoint
  - W ← weight matrix, one row per datapoint – the coordinates of x_i in eigenspace
  - S ← singular value matrix, a diagonal matrix
    - in our setting, each entry is an eigenvalue λ_j
  - V^T ← singular vector matrix
    - in our setting, each row is an eigenvector v_j
PCA using SVD algorithm

- Start from the N × D data matrix X
- Recenter: subtract the mean from each row of X
  - X_c ← X − X̄
- Call an SVD algorithm on X_c – ask for the top k singular vectors
- Principal components: the k singular vectors with the highest singular values (rows of V^T)
  - Coefficients become: z_i[j] = w_i[j] s_j, i.e., Z = W S (since z_i[j] = (x_i − x̄) · u_j and X_c = W S V^T give Z = X_c V = W S); a sketch follows below
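A minimal sketch of this algorithm using scipy.linalg.svd (the function name and shapes are my assumptions; X is N × D):

```python
import numpy as np
from scipy.linalg import svd

def pca_svd(X, k):
    """Top-k principal components and projection coefficients via SVD."""
    x_bar = X.mean(axis=0)
    W, s, Vt = svd(X - x_bar, full_matrices=False)  # X_c = W S V^T
    Z = W[:, :k] * s[:k]              # coefficients: Z = W S, as noted above
    return x_bar, Vt[:k], Z           # mean, components (rows of Vt), coordinates
```

Reconstruction is then x_hat = x_bar + Z @ Vt[:k], matching \hat{x}_i = \bar{x} + \sum_j z_i[j]\, u_j.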
What you need to know
- Dimensionality reduction
  - why and when it’s important
- Simple feature selection
- Principal component analysis
  - minimizing reconstruction error
  - relationship to the covariance matrix and eigenvectors
  - using SVD