Page 1:

Principal Component Analysis

Jieping Ye

Department of Computer Science and Engineering

Arizona State University

http://www.public.asu.edu/~jye02

Page 2:

Outline of lecture

• What is feature reduction?
• Why feature reduction?
• Feature reduction algorithms
• Principal Component Analysis (PCA)
• Nonlinear PCA using Kernels

Page 3:

What is feature reduction?

• Feature reduction refers to the mapping of the original high-dimensional data onto a lower-dimensional space.
  – The criterion for feature reduction differs with the problem setting:
    • Unsupervised setting: minimize the information loss.
    • Supervised setting: maximize the class discrimination.

• Given a set of data points $x_1, x_2, \ldots, x_n$ of $p$ variables, compute the linear transformation (projection)

$$G \in \mathbb{R}^{p \times d}: \quad x \in \mathbb{R}^{p} \mapsto y = G^{T}x \in \mathbb{R}^{d} \quad (d < p).$$

Page 4:

What is feature reduction?

Linear transformation:

$$G^{T} \in \mathbb{R}^{d \times p}: \quad X \in \mathbb{R}^{p} \mapsto Y = G^{T}X \in \mathbb{R}^{d}$$

Original data $X \in \mathbb{R}^{p}$, reduced data $Y \in \mathbb{R}^{d}$.
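To make the shapes concrete, here is a minimal NumPy sketch of applying such a projection (the matrix G below is just a random orthonormal basis used for illustration; how PCA chooses G is the subject of the following slides):

```python
import numpy as np

p, d, n = 10, 2, 5                         # original dim, reduced dim, number of points
X = np.random.randn(p, n)                  # data matrix, one column per data point

# Hypothetical projection matrix G (p x d) with orthonormal columns.
G, _ = np.linalg.qr(np.random.randn(p, d))

Y = G.T @ X                                # reduced data Y = G^T X, shape (d, n)
print(Y.shape)                             # (2, 5)
```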

Page 5:

High-dimensional data

Examples: gene expression data, face images, handwritten digits.

Page 6:

Outline of lecture

• What is feature reduction?
• Why feature reduction?
• Feature reduction algorithms
• Principal Component Analysis
• Nonlinear PCA using Kernels

Page 7:

Why feature reduction?

• Most machine learning and data mining techniques may not be effective for high-dimensional data.
  – Curse of dimensionality.
  – Query accuracy and efficiency degrade rapidly as the dimension increases.

• The intrinsic dimension may be small.
  – For example, the number of genes responsible for a certain type of disease may be small.

Page 8:

Why feature reduction?

• Visualization: projection of high-dimensional data onto 2D or 3D.

• Data compression: efficient storage and retrieval.

• Noise removal: positive effect on query accuracy.

Page 9:

Application of feature reduction

• Face recognition
• Handwritten digit recognition
• Text mining
• Image retrieval
• Microarray data analysis
• Protein classification

Page 10:

Outline of lecture

• What is feature reduction?
• Why feature reduction?
• Feature reduction algorithms
• Principal Component Analysis
• Nonlinear PCA using Kernels

Page 11:

Feature reduction algorithms

• Unsupervised
  – Latent Semantic Indexing (LSI): truncated SVD
  – Independent Component Analysis (ICA)
  – Principal Component Analysis (PCA)
  – Canonical Correlation Analysis (CCA)

• Supervised
  – Linear Discriminant Analysis (LDA)

• Semi-supervised
  – Research topic

Page 12:

Outline of lecture

• What is feature reduction?
• Why feature reduction?
• Feature reduction algorithms
• Principal Component Analysis
• Nonlinear PCA using Kernels

Page 13:

What is Principal Component Analysis?

• Principal component analysis (PCA)
  – Reduces the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables.
  – Retains most of the sample's information.
  – Useful for the compression and classification of data.

• By information we mean the variation present in the sample, given by the correlations between the original variables.
  – The new variables, called principal components (PCs), are uncorrelated and are ordered by the fraction of the total information each retains.

Page 14:

Geometric picture of principal components (PCs)

• The 1st PC $z_1$ is a minimum-distance fit to a line in X space.

• The 2nd PC $z_2$ is a minimum-distance fit to a line in the plane perpendicular to the 1st PC.

PCs are a series of linear least-squares fits to a sample, each orthogonal to all the previous ones.

Page 15:

Algebraic definition of PCs

Given a sample of $n$ observations on a vector of $p$ variables

$$x_1, x_2, \ldots, x_n \in \mathbb{R}^{p},$$

define the first principal component of the sample by the linear transformation

$$z_{1j} = a_1^{T}x_j = \sum_{i=1}^{p} a_{i1}x_{ij}, \qquad j = 1, 2, \ldots, n,$$

where the vector $a_1 = (a_{11}, a_{21}, \ldots, a_{p1})$ is chosen such that $\operatorname{var}[z_1]$ is maximum, and $x_j = (x_{1j}, x_{2j}, \ldots, x_{pj})$ denotes the $j$-th observation.

Page 16:

Algebraic derivation of PCs

To find $a_1$, first note that

$$\operatorname{var}[z_1] = E\!\left[(z_1 - \bar z_1)^2\right] = \frac{1}{n}\sum_{i=1}^{n}\left(a_1^{T}x_i - a_1^{T}\bar x\right)^2 = \frac{1}{n}\sum_{i=1}^{n} a_1^{T}(x_i - \bar x)(x_i - \bar x)^{T} a_1 = a_1^{T} S a_1,$$

where

$$S = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)(x_i - \bar x)^{T}$$

is the covariance matrix and $\bar x = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean.

In the following, we assume the data is centered: $\bar x = 0$.
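The identity $\operatorname{var}[z_1] = a_1^{T}Sa_1$ can be checked numerically; a minimal sketch (the unit vector a1 below is arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 5, 200
X = rng.standard_normal((p, n))
X -= X.mean(axis=1, keepdims=True)           # center the data, so x_bar = 0

S = (X @ X.T) / n                            # covariance matrix S

a1 = rng.standard_normal(p)
a1 /= np.linalg.norm(a1)                     # an arbitrary unit-norm direction

z1 = a1 @ X                                  # projected scores z_1
print(np.allclose(z1.var(), a1 @ S @ a1))    # True: var[z_1] = a_1^T S a_1
```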

Page 17:

Algebraic derivation of PCs

Assume the data is centered, $\bar x = 0$. Form the matrix

$$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{p \times n},$$

then

$$S = \frac{1}{n} X X^{T}.$$

Obtain the eigenvectors of $S$ by computing the SVD of $X$:

$$X = U \Sigma V^{T}.$$
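A small sketch of why the SVD suffices: since $S = \frac{1}{n}XX^{T} = \frac{1}{n}U\Sigma^{2}U^{T}$, the left singular vectors of the centered X are eigenvectors of S and the squared singular values divided by n are its eigenvalues (variable names below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 4, 100
X = rng.standard_normal((p, n))
X -= X.mean(axis=1, keepdims=True)                     # centered data

S = (X @ X.T) / n                                      # covariance matrix

U, sing, Vt = np.linalg.svd(X, full_matrices=False)    # X = U diag(sing) V^T
eigvals_svd = np.sort(sing ** 2 / n)                   # eigenvalues of S from the SVD

eigvals, eigvecs = np.linalg.eigh(S)                   # direct eigendecomposition (ascending)
print(np.allclose(eigvals_svd, eigvals))               # True: same spectrum
```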

Page 18:

Algebraic derivation of PCs

To find $a_1$ that maximizes $\operatorname{var}[z_1]$ subject to $a_1^{T}a_1 = 1$, let $\lambda$ be a Lagrange multiplier and maximize

$$L = a_1^{T} S a_1 - \lambda\,(a_1^{T}a_1 - 1).$$

Setting $\partial L / \partial a_1 = 0$ gives

$$S a_1 - \lambda a_1 = 0 \;\Longrightarrow\; (S - \lambda I_p)\, a_1 = 0,$$

therefore $a_1$ is an eigenvector of $S$ corresponding to the largest eigenvalue $\lambda = \lambda_1$.
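A quick numerical sanity check of this result, comparing the variance captured along the top eigenvector with the variance along random unit directions (a sketch with illustrative data):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 6, 500
X = rng.standard_normal((p, n)) * np.arange(1, p + 1)[:, None]   # unequal variances
X -= X.mean(axis=1, keepdims=True)
S = (X @ X.T) / n

eigvals, eigvecs = np.linalg.eigh(S)
a1 = eigvecs[:, -1]                          # eigenvector of the largest eigenvalue

best = a1 @ S @ a1                           # variance along a_1 (equals lambda_1)
for _ in range(1000):
    a = rng.standard_normal(p)
    a /= np.linalg.norm(a)
    assert a @ S @ a <= best + 1e-12         # no unit direction beats the top eigenvector
print("lambda_1:", round(float(best), 3))
```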

Page 19:

Algebraic derivation of PCs

To find the next coefficient vector $a_2$, maximize $\operatorname{var}[z_2]$ subject to $a_2^{T}a_2 = 1$ and to $\operatorname{cov}[z_2, z_1] = 0$ (the PCs are uncorrelated).

First note that

$$\operatorname{cov}[z_2, z_1] = a_1^{T} S a_2 = \lambda_1 a_1^{T} a_2.$$

Then let $\lambda$ and $\phi$ be Lagrange multipliers, and maximize

$$L = a_2^{T} S a_2 - \lambda\,(a_2^{T}a_2 - 1) - \phi\, a_2^{T} a_1.$$

Page 20:

Algebraic derivation of PCs

$$L = a_2^{T} S a_2 - \lambda\,(a_2^{T}a_2 - 1) - \phi\, a_2^{T} a_1$$

Setting $\partial L / \partial a_2 = 0$ gives

$$S a_2 - \lambda a_2 - \phi a_1 = 0.$$

Left-multiplying by $a_1^{T}$ and using $a_1^{T}a_2 = 0$ and $a_1^{T}Sa_2 = 0$ shows $\phi = 0$, so

$$S a_2 = \lambda a_2 \quad \text{and} \quad \lambda = a_2^{T} S a_2.$$

Page 21:

Algebraic derivation of PCs

We find that $a_2$ is also an eigenvector of $S$, whose eigenvalue $\lambda_2$ is the second largest.

In general:

• The $k$th largest eigenvalue of $S$ is the variance of the $k$th PC:

$$\operatorname{var}[z_k] = a_k^{T} S a_k = \lambda_k.$$

• The $k$th PC $z_k$ retains the $k$th greatest fraction of the variation in the sample.
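A sketch verifying that the sample variance of the k-th PC score equals the k-th eigenvalue (illustrative data and names):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 5, 1000
X = rng.standard_normal((p, n)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])[:, None]
X -= X.mean(axis=1, keepdims=True)
S = (X @ X.T) / n

eigvals, A = np.linalg.eigh(S)
eigvals, A = eigvals[::-1], A[:, ::-1]       # eigenpairs in descending order

Z = A.T @ X                                  # PC scores, one row per component z_k
print(np.allclose(Z.var(axis=1), eigvals))   # True: var[z_k] = lambda_k
```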

Page 22:

Algebraic derivation of PCs

• Main steps for computing PCs (sketched in code below):
  – Form the covariance matrix $S$.
  – Compute its eigenvectors $\{a_i\}_{i=1}^{p}$.
  – Use the first $d$ eigenvectors $\{a_i\}_{i=1}^{d}$ to form the $d$ PCs.
  – The transformation $G$ is given by $G = [a_1, a_2, \ldots, a_d]$.

A test point $x \in \mathbb{R}^{p} \mapsto G^{T}x \in \mathbb{R}^{d}$.
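Putting the steps together, a minimal NumPy sketch (the function name pca and the variable names are illustrative, not from the lecture):

```python
import numpy as np

def pca(X, d):
    """X: p x n data matrix (one column per point). Return G = [a_1, ..., a_d] and G^T X."""
    Xc = X - X.mean(axis=1, keepdims=True)     # center the data
    S = (Xc @ Xc.T) / X.shape[1]               # covariance matrix S
    eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    G = eigvecs[:, ::-1][:, :d]                # first d eigenvectors (largest eigenvalues)
    return G, G.T @ Xc                         # transformation G and reduced data

rng = np.random.default_rng(4)
X = rng.standard_normal((10, 100))
G, Y = pca(X, d=2)

x_test = rng.standard_normal(10)               # map a test point x in R^p to R^d
y_test = G.T @ (x_test - X.mean(axis=1))
print(Y.shape, y_test.shape)                   # (2, 100) (2,)
```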

Page 23:

Optimality property of PCA

Original data: $X \in \mathbb{R}^{p \times n}$.

Dimension reduction: $Y = G^{T}X \in \mathbb{R}^{d \times n}$, with $G^{T} \in \mathbb{R}^{d \times p}$.

Reconstruction: $\tilde{X} = G\,(G^{T}X) = G\,Y \in \mathbb{R}^{p \times n}$, with $G \in \mathbb{R}^{p \times d}$.

Page 24:

Optimality property of PCA

Main theoretical result:

The matrix $G$ consisting of the first $d$ eigenvectors of the covariance matrix $S$ solves the following minimization problem:

$$\min_{G \in \mathbb{R}^{p \times d}} \; \left\| X - G(G^{T}X) \right\|_F^2 \quad \text{subject to} \quad G^{T}G = I_d,$$

where $\left\| X - G(G^{T}X) \right\|_F^2$ is the reconstruction error.

PCA projection minimizes the reconstruction error among all linear projections of size $d$.
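A numerical illustration of this optimality, comparing the reconstruction error of the PCA basis with that of random orthonormal bases of the same size (a self-contained sketch with illustrative data):

```python
import numpy as np

rng = np.random.default_rng(5)
p, d, n = 8, 2, 300
X = rng.standard_normal((p, n)) * np.linspace(4.0, 0.5, p)[:, None]
X -= X.mean(axis=1, keepdims=True)

S = (X @ X.T) / n
G_pca = np.linalg.eigh(S)[1][:, ::-1][:, :d]              # first d eigenvectors of S
err_pca = np.linalg.norm(X - G_pca @ (G_pca.T @ X)) ** 2  # Frobenius reconstruction error

# No other rank-d orthonormal basis reconstructs the data with smaller error.
for _ in range(200):
    G_rand, _ = np.linalg.qr(rng.standard_normal((p, d)))
    err_rand = np.linalg.norm(X - G_rand @ (G_rand.T @ X)) ** 2
    assert err_pca <= err_rand + 1e-9
print("PCA reconstruction error:", round(float(err_pca), 3))
```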

Page 25:

Applications of PCA

• Eigenfaces for recognition. Turk and Pentland. 1991.

• Principal Component Analysis for clustering gene expression data. Yeung and Ruzzo. 2001.

• Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum. Lilien. 2003.

Page 26:

PCA for image compression

Reconstructions using d = 1, 2, 4, 8, 16, 32, 64, and 100 principal components, shown alongside the original image.
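A sketch of this kind of experiment, treating the rows of a grayscale image as data points and reconstructing them from d principal components (the array img below is a random placeholder; any 2-D grayscale array could be substituted):

```python
import numpy as np

def compress_rows(img, d):
    """Reconstruct the rows of a grayscale image from their first d principal components."""
    mean = img.mean(axis=0, keepdims=True)
    Xc = img - mean                               # center the row vectors
    S = (Xc.T @ Xc) / img.shape[0]                # covariance of the row vectors
    G = np.linalg.eigh(S)[1][:, ::-1][:, :d]      # top-d eigenvectors
    return (Xc @ G) @ G.T + mean                  # rank-d reconstruction

img = np.random.rand(64, 64)                      # placeholder grayscale image
for d in (1, 2, 4, 8, 16, 32, 64):
    err = np.linalg.norm(img - compress_rows(img, d))
    print(d, round(float(err), 3))                # error shrinks as d grows
```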

Page 27:

Outline of lecture

• What is feature reduction?
• Why feature reduction?
• Feature reduction algorithms
• Principal Component Analysis
• Nonlinear PCA using Kernels

Page 28:

Motivation

Linear projections will not detect the pattern.

Page 29:

Nonlinear PCA using Kernels

• Traditional PCA applies a linear transformation.
  – May not be effective for nonlinear data.

• Solution: apply a nonlinear transformation to a potentially very high-dimensional feature space:

$$\phi: x \mapsto \phi(x)$$

• Computational efficiency: apply the kernel trick.
  – Requires that PCA can be rewritten in terms of dot products:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$$

(More on kernels later.)
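As a concrete example of the kernel trick, the quadratic kernel $K(x, y) = (x^{T}y)^2$ on $\mathbb{R}^2$ corresponds to the explicit feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1x_2, x_2^2)$; a tiny check with illustrative values:

```python
import numpy as np

def phi(x):
    """Explicit feature map of the quadratic kernel K(x, y) = (x . y)^2 on R^2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(x), phi(y)), np.dot(x, y) ** 2)   # both equal 1.0
```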

Page 30:

Nonlinear PCA using Kernels

Rewrite PCA in terms of dot products.

Assume the data has been centered, i.e., $\sum_i x_i = 0$. The covariance matrix $S$ can be written as

$$S = \frac{1}{n}\sum_i x_i x_i^{T}.$$

Let $v$ be an eigenvector of $S$ corresponding to a nonzero eigenvalue $\lambda$:

$$S v = \frac{1}{n}\sum_i x_i x_i^{T} v = \lambda v \;\Longrightarrow\; v = \frac{1}{\lambda n}\sum_i (x_i^{T}v)\, x_i.$$

Eigenvectors of $S$ lie in the space spanned by all data points.

Page 31:

Nonlinear PCA using Kernels

The covariance matrix can be written in matrix form:

$$S = \frac{1}{n} X X^{T}, \qquad \text{where } X = [x_1, x_2, \ldots, x_n].$$

Since the eigenvectors of $S$ lie in the span of the data points, write $v = \sum_i \alpha_i x_i = X\alpha$. Then

$$\lambda v = S v = \frac{1}{n} X X^{T} v,$$

and multiplying both sides by $X^{T}$ and substituting $v = X\alpha$ gives

$$\lambda\,(X^{T}X)\,\alpha = \frac{1}{n}\,(X^{T}X)(X^{T}X)\,\alpha \;\Longrightarrow\; \lambda\,\alpha = \frac{1}{n}\,(X^{T}X)\,\alpha.$$

Any benefits?

Page 32:

Nonlinear PCA using Kernels

Next consider the feature space $\phi: x \mapsto \phi(x)$:

$$S = \frac{1}{n}\,\Phi\,\Phi^{T}, \qquad \text{where } \Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_n)],$$

and, as before, $v = \sum_i \alpha_i \phi(x_i) = \Phi\alpha$ with

$$\lambda\,\alpha = \frac{1}{n}\,\big(\Phi^{T}\Phi\big)\,\alpha.$$

The $(i, j)$-th entry of $\Phi^{T}\Phi$ is $\phi(x_i)\cdot\phi(x_j)$.

Apply the kernel trick, $K(x_i, x_j) = \phi(x_i)\cdot\phi(x_j)$, so that

$$\lambda\,\alpha = \frac{1}{n}\,K\,\alpha.$$

$K$ is called the kernel matrix.

Page 33:

Nonlinear PCA using Kernels

• Projection of a test point $x$ onto $v$:

$$\phi(x)^{T} v = \phi(x)^{T} \sum_i \alpha_i \phi(x_i) = \sum_i \alpha_i\, \phi(x)^{T}\phi(x_i) = \sum_i \alpha_i\, K(x, x_i).$$

An explicit mapping $\phi$ is not required here.
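A minimal kernel-PCA sketch following these formulas (the Gaussian kernel and all names are chosen for illustration; for brevity it omits the centering of the kernel matrix in feature space and the normalization of v that a full implementation, e.g. Schölkopf et al., would include):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(6)
X = rng.standard_normal((50, 2))              # n points in R^2, one per row

K = rbf_kernel(X, X)                          # kernel matrix, K_ij = phi(x_i) . phi(x_j)
lam, alpha = np.linalg.eigh(K / len(X))       # solve (1/n) K alpha = lambda alpha
lam, alpha = lam[::-1], alpha[:, ::-1]        # descending eigenvalues

# Project a test point x onto the top component: phi(x)^T v = sum_i alpha_i K(x, x_i).
x_test = rng.standard_normal((1, 2))
proj = rbf_kernel(x_test, X) @ alpha[:, 0]
print(proj[0])
```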

Page 34:

Reference

• Principal Component Analysis. I.T. Jolliffe.

• Kernel Principal Component Analysis. Schölkopf, et al.

• Geometric Methods for Feature Extraction and Dimensional Reduction. Burges.

