Page 1: Dimensionality Reduction – PCA

Machine Learning – CSE446
David Wadden (slides provided by Carlos Guestrin)
University of Washington

Feb 22, 2017
©Carlos Guestrin 2005-2017

Page 2: Dimensionality reduction

- Input data may have thousands or millions of dimensions!
  - e.g., text data has one dimension per word in the vocabulary
- Dimensionality reduction: represent data with fewer dimensions
  - easier learning – fewer parameters
  - visualization – hard to visualize more than 3D or 4D
  - discover the "intrinsic dimensionality" of data
    - high-dimensional data that is truly lower dimensional

Page 3: Lower dimensional projections

- Rather than picking a subset of the features, we can create new features that are combinations of existing features
- Let's see this in the unsupervised setting
  - just X, but no Y

Page 4: Linear projection and reconstruction

[Figure: 2-D data points plotted on axes x[1] and x[2]; the points are projected into 1 dimension, z. Reconstruction: knowing only z, what was (x[1], x[2])?]
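Not from the slides: a minimal NumPy sketch of this picture, with a hand-picked unit direction u standing in for the projection the figure illustrates.

```python
import numpy as np

# Toy 2-D data, one row per point.
X = np.array([[2.0, 1.9], [1.0, 1.1], [3.0, 3.2], [4.0, 3.8]])

x_bar = X.mean(axis=0)                  # center of the data
u = np.array([1.0, 1.0]) / np.sqrt(2)   # a unit-length projection direction (illustrative choice)

z = (X - x_bar) @ u                     # project into 1 dimension: one coordinate z per point
X_hat = x_bar + np.outer(z, u)          # reconstruction: knowing only z, our best guess at (x[1], x[2])
```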

Page 5: Principal component analysis – basic idea

- Project d-dimensional data into k-dimensional space while preserving information:
  - e.g., project the space of 10,000 words into 3 dimensions
  - e.g., project 3-d into 2-d
- Choose the projection with minimum reconstruction error

Page 6: Linear projections, a review

- Project a point into a (lower-dimensional) space:
  - point: x_i = (x_i[1], …, x_i[D])
  - select a basis – a set of basis vectors – (u_1, …, u_K)
    - we consider an orthonormal basis: u_i · u_i = 1, and u_i · u_j = 0 for i ≠ j
  - select a center – x̄, which defines the offset of the space
  - the best coordinates in the lower-dimensional space are defined by dot products: (z_i[1], …, z_i[K]), where

    z_i[j] = (x_i - \bar{x}) \cdot u_j
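As a small illustration of these definitions (mine, not the slides'), the coordinates z_i[j] can be computed in one matrix product once an orthonormal basis is chosen; here the basis U is an arbitrary orthonormal pair obtained from a QR factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))           # N=100 points in D=5 dimensions
x_bar = X.mean(axis=0)                  # the center

# An orthonormal basis (u_1, u_2) for a K=2 dimensional subspace,
# taken from a QR factorization purely for illustration.
U, _ = np.linalg.qr(rng.normal(size=(5, 2)))

Z = (X - x_bar) @ U                     # Z[i, j] = z_i[j] = (x_i - x_bar) · u_j
```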

Page 7: PCA finds the projection that minimizes reconstruction error

- Given N data points: x_i = (x_i[1], …, x_i[D]), i = 1…N
- Will represent each point as a projection:

    \hat{x}_i = \bar{x} + \sum_{j=1}^{K} z_i[j] u_j

  where:

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i   and   z_i[j] = (x_i - \bar{x}) \cdot u_j

- PCA: given K < D, find (u_1, …, u_K) minimizing the reconstruction error:

    error_K = \sum_{i=1}^{N} (x_i - \hat{x}_i)^2

[Figure: 2-D scatter plot with axes x1 and x2 illustrating the projection.]
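A NumPy sketch (not part of the slides) of this objective, evaluating the reconstruction error for any candidate orthonormal basis U:

```python
import numpy as np

def reconstruction_error(X, U):
    """Sum of squared reconstruction errors for data X (N, D) and an orthonormal basis U (D, K)."""
    x_bar = X.mean(axis=0)
    Z = (X - x_bar) @ U          # coordinates z_i[j]
    X_hat = x_bar + Z @ U.T      # reconstructions x_hat_i
    return np.sum((X - X_hat) ** 2)
```

PCA chooses the basis U of the given width K that makes this quantity as small as possible.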

Page 8: Understanding the reconstruction error

- Given K < D, find (u_1, …, u_K) minimizing the reconstruction error:

    error_K = \sum_{i=1}^{N} (x_i - \hat{x}_i)^2,  where  \hat{x}_i = \bar{x} + \sum_{j=1}^{K} z_i[j] u_j  and  z_i[j] = (x_i - \bar{x}) \cdot u_j

- Note that x_i can be represented exactly by the full D-dimensional projection:

    x_i = \bar{x} + \sum_{j=1}^{D} z_i[j] u_j

- Rewriting the error:

    error_K = \sum_{i=1}^{N} (x_i - \hat{x}_i)^2
            = \sum_{i=1}^{N} \left[ \bar{x} + \sum_{j=1}^{D} z_i[j] u_j - \left( \bar{x} + \sum_{j=1}^{K} z_i[j] u_j \right) \right]^2
            = \sum_{i=1}^{N} \left[ \sum_{j=K+1}^{D} z_i[j] u_j \right]^2
            = \sum_{i=1}^{N} \left[ \sum_{j=K+1}^{D} z_i[j] (u_j \cdot u_j) z_i[j] + 2 \sum_{j=K+1}^{D} \sum_{\ell > j} z_i[j] (u_j \cdot u_\ell) z_i[\ell] \right]
            = \sum_{i=1}^{N} \sum_{j=K+1}^{D} (z_i[j])^2

  since the basis is orthonormal (u_j · u_j = 1 and u_j · u_ℓ = 0 for j ≠ ℓ), the cross terms vanish.
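A quick numerical sanity check of the final identity (my own illustration, not from the slides): with a full orthonormal basis of D directions, dropping all but the first K reproduces error_K as the sum of squared discarded coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))                  # N=50 points, D=4
x_bar = X.mean(axis=0)
U, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # a full orthonormal basis (D columns)
K = 2

Z = (X - x_bar) @ U                           # exact coordinates in the full basis
X_hat = x_bar + Z[:, :K] @ U[:, :K].T         # keep only the first K coordinates

lhs = np.sum((X - X_hat) ** 2)                # error_K
rhs = np.sum(Z[:, K:] ** 2)                   # sum of squared dropped coordinates
assert np.isclose(lhs, rhs)
```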

Page 9: Reconstruction error and covariance matrix

- Empirical covariance matrix and its entries:

    \Sigma = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T,
    \sigma_{m\ell} = \frac{1}{N} \sum_{i=1}^{N} (x_i[m] - \bar{x}[m])(x_i[\ell] - \bar{x}[\ell])

- Rewriting the reconstruction error in terms of \Sigma:

    error_K = \sum_{i=1}^{N} \sum_{j=K+1}^{D} [u_j \cdot (x_i - \bar{x})]^2
            = \sum_{i=1}^{N} \sum_{j=K+1}^{D} u_j^T (x_i - \bar{x})(x_i - \bar{x})^T u_j
            = \sum_{j=K+1}^{D} u_j^T \left[ \sum_{i=1}^{N} (x_i - \bar{x})(x_i - \bar{x})^T \right] u_j
            = N \sum_{j=K+1}^{D} u_j^T \Sigma u_j
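The same kind of numerical check (again my own illustration), using the 1/N covariance convention above:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
N, D = X.shape
x_bar = X.mean(axis=0)
U, _ = np.linalg.qr(rng.normal(size=(D, D)))     # full orthonormal basis
K = 2

Sigma = (X - x_bar).T @ (X - x_bar) / N          # empirical covariance matrix
Z = (X - x_bar) @ U
error_K = np.sum(Z[:, K:] ** 2)                  # error_K from the previous slide
assert np.isclose(error_K, N * sum(u @ Sigma @ u for u in U[:, K:].T))
```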

Page 10: Minimizing reconstruction error and eigenvectors

- Minimizing the reconstruction error is equivalent to picking an (ordered) orthonormal basis (u_1, …, u_D) minimizing:

    error_K = N \sum_{j=K+1}^{D} u_j^T \Sigma u_j

- Eigenvector:

    \Sigma u = \lambda u

- Minimizing the reconstruction error is equivalent to picking (u_{K+1}, …, u_D) to be the eigenvectors with the smallest eigenvalues; equivalently, the retained directions u_1, …, u_K are the eigenvectors with the largest eigenvalues

Page 11: Basic PCA algorithm

- Start from the N by D data matrix X
- Recenter: subtract the mean from each row of X
  - X_c ← X − X̄
- Compute the covariance matrix:
  - Σ ← (1/N) X_c^T X_c
- Find the eigenvectors and eigenvalues of Σ
- Principal components: the K eigenvectors with the highest eigenvalues
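A compact NumPy sketch of this algorithm (my own rendering; the function and variable names are not from the slides):

```python
import numpy as np

def pca(X, K):
    """Basic PCA via eigendecomposition of the covariance matrix.

    X: (N, D) data matrix, one row per data point.
    Returns (x_bar, U, Z): the mean, the top-K principal directions (D, K),
    and the projected coordinates (N, K).
    """
    N, D = X.shape
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                             # recenter
    Sigma = Xc.T @ Xc / N                      # covariance matrix, (D, D)
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues ascending, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1][:K]      # indices of the K largest eigenvalues
    U = eigvecs[:, order]                      # principal components
    Z = Xc @ U                                 # coordinates z_i[j]
    return x_bar, U, Z
```

Reconstructions then follow the earlier formula: X_hat = x_bar + Z @ U.T.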

Page 12: PCA example

    \hat{x}_i = \bar{x} + \sum_{j=1}^{K} z_i[j] u_j

Page 13: PCA example – reconstruction

- Only used the first principal component:

    \hat{x}_i = \bar{x} + \sum_{j=1}^{K} z_i[j] u_j

Page 14: Eigenfaces [Turk, Pentland '91]

- Input images: [figure of example face images]
- Principal components: [figure of the leading eigenfaces]

Page 15: Eigenfaces reconstruction

- Each image corresponds to adding 8 principal components:

Page 16: Scaling up

- The covariance matrix can be really big!
  - Σ is D by D
  - say, only 10,000 features: Σ already has 10^8 entries
  - finding eigenvectors is very slow…
- Use the singular value decomposition (SVD)
  - finds the top K eigenvectors
  - great implementations available, e.g., scipy.linalg.svd

Page 17: SVD

- Write X = W S V^T
  - X ← data matrix, one row per data point
  - W ← weight matrix, one row per data point – the coordinates of x_i in eigenspace
  - S ← singular value matrix, a diagonal matrix
    - in our setting (X recentered), each diagonal entry s_j corresponds to an eigenvalue of the covariance matrix: λ_j = s_j^2 / N
  - V^T ← singular vector matrix
    - in our setting, each row is an eigenvector v_j of the covariance matrix
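A small check of this correspondence (my own illustration, not from the slides), using scipy.linalg.svd on the recentered data:

```python
import numpy as np
from scipy.linalg import svd

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
N = len(X)
Xc = X - X.mean(axis=0)

W, s, Vt = svd(Xc, full_matrices=False)   # Xc = W @ np.diag(s) @ Vt
Sigma = Xc.T @ Xc / N                     # covariance of the recentered data

# Rows of Vt are eigenvectors of Sigma; its eigenvalues are s**2 / N.
eigvals_desc = np.linalg.eigvalsh(Sigma)[::-1]
assert np.allclose(eigvals_desc, s**2 / N)
```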

Page 18: PCA using SVD algorithm

- Start from the N by D data matrix X
- Recenter: subtract the mean from each row of X
  - X_c ← X − X̄
- Call an SVD algorithm on X_c – ask for the top K singular vectors
- Principal components: the K singular vectors with the highest singular values (rows of V^T)
- Coefficients become: z_i[j] = s_j w_i[j] (the top-K columns of W S)
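A NumPy/SciPy sketch of this version (my own rendering, assuming a dense scipy.linalg.svd; for very large or sparse data, a truncated solver such as scipy.sparse.linalg.svds can return only the top K singular vectors):

```python
import numpy as np
from scipy.linalg import svd

def pca_svd(X, K):
    """PCA of an (N, D) data matrix via SVD of the recentered data."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar                            # recenter
    W, s, Vt = svd(Xc, full_matrices=False)   # Xc = W @ np.diag(s) @ Vt, s sorted descending
    U = Vt[:K].T                              # principal components: top-K rows of V^T, as columns
    Z = W[:, :K] * s[:K]                      # coefficients z_i[j] = s_j * w_i[j]
    return x_bar, U, Z
```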

Page 19: What you need to know

- Dimensionality reduction
  - why and when it's important
- Simple feature selection
- Principal component analysis
  - minimizing reconstruction error
  - relationship to covariance matrix and eigenvectors
  - using SVD
