Principal Components: A Mathematical Introduction Simon Mason International Research Institute for...

Date post: 14-Dec-2015
Upload: victor-powell
Page 1

Principal Components: A Mathematical Introduction

Simon Mason

International Research Institute for Climate Prediction

The Earth Institute of Columbia University

Linking Science to Society

Page 2

Linking Science to Sight-Seeing!

What is the most beautiful city setting?

The setting could be measured on a variety of metrics, such as the height of the surrounding mountains or the length of coastline.

But if more than one metric is used, then some combined measure will need to be devised.

Page 3

The city scores can be represented by a matrix, X. For simplicity, the scores are considered on only two metrics, and for only three cities.

The metrics are sea and mountains, and the cities are San Francisco, Hong Kong, and Cape Town:

X =
                 sea   mountains
San Francisco      4           4
Hong Kong          8           5
Cape Town          6           6

Page 4

The means and variances (the variances use the sample divisor n − 1 = 2) are:

                 sea   mountains
San Francisco      4           4
Hong Kong          8           5
Cape Town          6           6
mean               6           5
variance           4           1


The variance measures how strongly the cities differ in attractiveness on each metric. The total variance is 4 + 1 = 5.
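A minimal sketch of the same summary statistics, assuming Python's standard-library `statistics` module. `statistics.variance` uses the n − 1 divisor, which is what reproduces the slide's variances of 4 and 1; the dictionary layout is only an illustrative choice.

```python
from statistics import mean, variance

# City scores from the slides: (sea, mountains) for each city
scores = {
    "San Francisco": (4, 4),
    "Hong Kong": (8, 5),
    "Cape Town": (6, 6),
}

sea = [s for s, _ in scores.values()]
mountains = [m for _, m in scores.values()]

# Sample statistics: statistics.variance divides by n - 1
means = (mean(sea), mean(mountains))
variances = (variance(sea), variance(mountains))
total_variance = sum(variances)

print("means:", means)
print("variances:", variances, "total:", total_variance)
```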

Page 5

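X can be expressed as an anomaly matrix (each value minus its column mean) or as a standardized anomaly matrix (each anomaly divided by its column's standard deviation, 2 for sea and 1 for mountains):

$$\mathbf{X} = \begin{pmatrix} -2 & -1 \\ 2 & 0 \\ 0 & 1 \end{pmatrix} \;\text{(anomalies)} \qquad \mathbf{X} = \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \;\text{(standardized anomalies)}$$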
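The two transformations can be sketched with the standard library alone. This assumes the sample standard deviation (`statistics.stdev`, n − 1 divisor), which gives the values 2 and 1 used above; the helper names `anomalies` and `standardized_anomalies` are hypothetical.

```python
from statistics import mean, stdev

# Columns of the city data (rows: San Francisco, Hong Kong, Cape Town)
sea = [4, 8, 6]
mountains = [4, 5, 6]


def anomalies(values):
    """Subtract the column mean from each value."""
    m = mean(values)
    return [v - m for v in values]


def standardized_anomalies(values):
    """Divide each anomaly by the column's sample standard deviation (n - 1 divisor)."""
    s = stdev(values)
    return [a / s for a in anomalies(values)]


# Zip the two columns back into one (sea, mountains) row per city
anom = list(zip(anomalies(sea), anomalies(mountains)))
std_anom = list(zip(standardized_anomalies(sea), standardized_anomalies(mountains)))

print(anom)
print(std_anom)
```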

Page 6

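In general, if

$$\mathbf{X} = \begin{pmatrix} a & d \\ b & e \\ c & f \end{pmatrix}$$

and a + b + c = 0 and d + e + f = 0, then (with n = 3 cities, so n − 1 = 2) the variance of the first column and the covariance between the two columns are

$$\text{variance} = \frac{a^2 + b^2 + c^2}{2}, \qquad \text{covariance} = \frac{ad + be + cf}{2}$$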

Page 7

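In general, matrix multiplication gives:

$$\mathbf{X}^{T}\mathbf{X} = \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix} \begin{pmatrix} a & d \\ b & e \\ c & f \end{pmatrix} = \begin{pmatrix} a^2 + b^2 + c^2 & ad + be + cf \\ da + eb + fc & d^2 + e^2 + f^2 \end{pmatrix}$$

So, if a + b + c = 0 and d + e + f = 0, then:

$$\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X} = \text{variance-covariance matrix}$$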
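A quick numerical check of this identity, assuming NumPy is available: centring the city data and forming (1 / (n − 1)) XᵀX should agree with `np.cov`, whose default normalization is also n − 1.

```python
import numpy as np

# City scores (rows: San Francisco, Hong Kong, Cape Town; columns: sea, mountains)
X_raw = np.array([[4.0, 4.0],
                  [8.0, 5.0],
                  [6.0, 6.0]])
n = X_raw.shape[0]

# Anomalies: every column now sums to zero (a + b + c = 0, d + e + f = 0)
X = X_raw - X_raw.mean(axis=0)

# Variance-covariance matrix as (1 / (n - 1)) X^T X
C = X.T @ X / (n - 1)

# Cross-check with NumPy's sample covariance (rowvar=False: rows are observations)
assert np.allclose(C, np.cov(X_raw, rowvar=False))
print(C)
```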

Page 8

If X contains data expressed as anomalies:

$$\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X} = \text{variance-covariance matrix}$$

If X contains data expressed as standardized anomalies:

$$\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X} = \text{correlation matrix}$$

Page 9

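Using the city data expressed in standardized anomalies (n = 3):

$$\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X} = \frac{1}{2}\begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$$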
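The same calculation in NumPy, as a sketch that assumes standardization with the sample standard deviation (`ddof=1`). With standardized anomalies, (1 / (n − 1)) XᵀX should match `np.corrcoef` on the raw scores.

```python
import numpy as np

X_raw = np.array([[4.0, 4.0],
                  [8.0, 5.0],
                  [6.0, 6.0]])
n = X_raw.shape[0]

# Standardized anomalies: subtract column means, divide by sample standard deviations
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

# (1 / (n - 1)) X^T X is now the correlation matrix
R = X.T @ X / (n - 1)
assert np.allclose(R, np.corrcoef(X_raw, rowvar=False))
print(R)
```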

Page 10

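Because the data are standardized, the variance-covariance matrix for the city data is also the correlation matrix:

$$\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X} = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix}$$

Note that the covariances (the off-diagonal elements) are greater than zero, implying that both metrics represent a common aspect of city attractiveness.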

Page 11

Because of the covariance (or correlation) between the two metrics, we could combine them into a single new metric that represents the variance common to both.

Specifically, we want to define sets of weights so that the new variables are uncorrelated and have maximum variance.

Let the weights for the first principal component be a and the weights for the second principal component be b.

Page 12

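In matrix notation, the data are post-multiplied by the weights, represented as U:

$$\mathbf{X}\mathbf{U} = \begin{pmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a_{\text{sea}} & b_{\text{sea}} \\ a_{\text{mountains}} & b_{\text{mountains}} \end{pmatrix}$$

This gives the principal components, Z. The scores on the principal components (rows: San Francisco, Hong Kong, Cape Town; columns: PC1, PC2) are:

$$\mathbf{Z} = \begin{pmatrix} -a_{\text{sea}} - a_{\text{mountains}} & -b_{\text{sea}} - b_{\text{mountains}} \\ a_{\text{sea}} & b_{\text{sea}} \\ a_{\text{mountains}} & b_{\text{mountains}} \end{pmatrix}$$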

Page 13

The principal components are defined as:

$$\mathbf{Z} = \mathbf{X}\mathbf{U}$$

which simply states that they are calculated as weighted sums of the original metrics.

Note that, for each component, the sum of the squared weights is 1. Also, if the principal components are to be uncorrelated, the two sets of weights are chosen to be uncorrelated (orthogonal): the sum of the products of corresponding weights is 0.
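The definition Z = XU can be checked numerically. This NumPy sketch takes a shortcut that the slides have not justified yet: it obtains the weights from `np.linalg.eigh` applied to the correlation matrix (how the weights are found is derived on later slides) and orders them by decreasing variance. Eigenvector signs are arbitrary, so any column of U may come out multiplied by −1.

```python
import numpy as np

X_raw = np.array([[4.0, 4.0], [8.0, 5.0], [6.0, 6.0]])
n = X_raw.shape[0]
# Standardized anomalies (sample standard deviation, n - 1 divisor)
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
C = X.T @ X / (n - 1)

# Weights: eigenvectors of C; eigh returns ascending eigenvalues, so reverse to put PC1 first
_, eigvecs = np.linalg.eigh(C)
U = eigvecs[:, ::-1]

# Principal component scores
Z = X @ U

# San Francisco's PC1 score written out as an explicit weighted sum of its two metrics
sf_pc1 = U[0, 0] * X[0, 0] + U[1, 0] * X[0, 1]
assert np.isclose(Z[0, 0], sf_pc1)

print(Z)
print("sum of squared weights per component:", (U**2).sum(axis=0))
```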

Page 14

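These two properties of the weights are useful:

$$\mathbf{U}^{T}\mathbf{U} = \begin{pmatrix} a_{\text{sea}} & a_{\text{mountains}} \\ b_{\text{sea}} & b_{\text{mountains}} \end{pmatrix} \begin{pmatrix} a_{\text{sea}} & b_{\text{sea}} \\ a_{\text{mountains}} & b_{\text{mountains}} \end{pmatrix} = \begin{pmatrix} a_{\text{sea}}^2 + a_{\text{mountains}}^2 & a_{\text{sea}}b_{\text{sea}} + a_{\text{mountains}}b_{\text{mountains}} \\ b_{\text{sea}}a_{\text{sea}} + b_{\text{mountains}}a_{\text{mountains}} & b_{\text{sea}}^2 + b_{\text{mountains}}^2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

$$\mathbf{U}^{T}\mathbf{U} = \mathbf{U}\mathbf{U}^{T} = \mathbf{I}$$

The diagonal elements of $\mathbf{U}^{T}\mathbf{U}$ are the sums of the squares of each column of U. A column of U contains the weights for one of the principal components, so the diagonal elements of $\mathbf{U}^{T}\mathbf{U}$ are 1. Because the two sets of weights are uncorrelated (orthogonal), the off-diagonal elements are 0.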
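A small NumPy sketch of both properties, assuming (as on the previous slides) that the weights are unit-length eigenvectors of the city correlation matrix; `np.linalg.eigh` is used here only as a convenient way to obtain them.

```python
import numpy as np

# Correlation matrix of the standardized city data (from the slides)
C = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# Columns of U: unit-length weight vectors, PC1 first
_, eigvecs = np.linalg.eigh(C)
U = eigvecs[:, ::-1]

# Diagonal: sums of squared weights (1); off-diagonal: cross-products of the two weight sets (0)
print(U.T @ U)
print(U @ U.T)
```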

Page 15

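So if we post-multiply

$$\mathbf{Z} = \mathbf{X}\mathbf{U}$$

by $\mathbf{U}^{T}$, we get:

$$\mathbf{Z}\mathbf{U}^{T} = \mathbf{X}\mathbf{U}\mathbf{U}^{T} = \mathbf{X}\mathbf{I} = \mathbf{X}$$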


This allows us to express X in terms of the principal component scores and loadings: $\mathbf{X} = \mathbf{Z}\mathbf{U}^{T}$.
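A sketch of the reconstruction in NumPy, again assuming the weights come from `np.linalg.eigh` of the correlation matrix: computing the scores Z = XU and then post-multiplying by Uᵀ should return the standardized city data.

```python
import numpy as np

X_raw = np.array([[4.0, 4.0], [8.0, 5.0], [6.0, 6.0]])
n = X_raw.shape[0]
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

_, eigvecs = np.linalg.eigh(X.T @ X / (n - 1))
U = eigvecs[:, ::-1]      # weights (loadings), PC1 first

Z = X @ U                 # scores
X_back = Z @ U.T          # X = Z U^T, because U U^T = I
print(X_back)
```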

Page 16

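Remember that

$$\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X}$$

is either the variance-covariance matrix or the correlation matrix (depending on whether X contains anomalies or standardized anomalies). We can replace X by $\mathbf{Z}\mathbf{U}^{T}$, which gives:

$$\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X} = \frac{1}{n-1}\left(\mathbf{Z}\mathbf{U}^{T}\right)^{T}\left(\mathbf{Z}\mathbf{U}^{T}\right) = \frac{1}{n-1}\mathbf{U}\mathbf{Z}^{T}\mathbf{Z}\mathbf{U}^{T} = \mathbf{U}\left(\frac{1}{n-1}\mathbf{Z}^{T}\mathbf{Z}\right)\mathbf{U}^{T}$$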

Page 17

Compare $\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X}$ with $\frac{1}{n-1}\mathbf{Z}^{T}\mathbf{Z}$, which appears in the previous equation. Just as $\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X}$ is the covariance (correlation) matrix for X, so $\frac{1}{n-1}\mathbf{Z}^{T}\mathbf{Z}$ is the covariance matrix for Z.

Because the principal components are uncorrelated, the covariance matrix for Z is a diagonal matrix of eigenvalues (the variances of the principal components). Setting $\mathbf{C} = \frac{1}{n-1}\mathbf{X}^{T}\mathbf{X}$ and $\boldsymbol{\Lambda} = \frac{1}{n-1}\mathbf{Z}^{T}\mathbf{Z}$:

$$\frac{1}{n-1}\mathbf{X}^{T}\mathbf{X} = \mathbf{U}\left(\frac{1}{n-1}\mathbf{Z}^{T}\mathbf{Z}\right)\mathbf{U}^{T}$$

$$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T}$$

which states that the covariance (or correlation) matrix of the original metrics is related to the weights and variances of the principal components.
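The decomposition C = UΛUᵀ can be verified on the city data. In this NumPy sketch, Λ is computed directly as (1 / (n − 1)) ZᵀZ from the scores; the weights are assumed to come from `np.linalg.eigh`, ordered so that PC1 comes first.

```python
import numpy as np

X_raw = np.array([[4.0, 4.0], [8.0, 5.0], [6.0, 6.0]])
n = X_raw.shape[0]
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)
C = X.T @ X / (n - 1)

_, eigvecs = np.linalg.eigh(C)
U = eigvecs[:, ::-1]      # PC1 first
Z = X @ U

# Covariance matrix of the principal components: off-diagonals are (numerically) zero
Lam = Z.T @ Z / (n - 1)
print(np.round(Lam, 10))

# C = U Lambda U^T
assert np.allclose(U @ Lam @ U.T, C)
```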

Page 18

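$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T}$ can be rearranged:

$$\mathbf{U}^{T}\mathbf{C}\mathbf{U} = \mathbf{U}^{T}\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T}\mathbf{U}$$

$$\mathbf{U}^{T}\mathbf{C}\mathbf{U} = \mathbf{I}\boldsymbol{\Lambda}\mathbf{I}$$

$$\mathbf{U}^{T}\mathbf{C}\mathbf{U} = \boldsymbol{\Lambda}$$

$$\mathbf{U}^{T}\mathbf{C}\mathbf{U} - \boldsymbol{\Lambda} = \mathbf{0}$$

If we take only one principal component at a time,

$$\mathbf{u}^{T}\mathbf{C}\mathbf{u} - \lambda = 0$$

where u are the weights for this component, and λ is its variance.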

Page 19

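$$\mathbf{u}^{T}\mathbf{C}\mathbf{u} - \lambda = 0$$

$$\mathbf{u}^{T}\mathbf{C}\mathbf{u} = \lambda$$

The corresponding vector equation follows from post-multiplying $\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T}$ by U, which gives $\mathbf{C}\mathbf{U} = \mathbf{U}\boldsymbol{\Lambda}$ (because $\mathbf{U}^{T}\mathbf{U} = \mathbf{I}$). Each column of this states:

$$\mathbf{C}\mathbf{u} = \lambda\mathbf{u}$$

$$\mathbf{C}\mathbf{u} - \lambda\mathbf{u} = \mathbf{0}$$

$$(\mathbf{C} - \lambda\mathbf{I})\mathbf{u} = \mathbf{0}$$

If $\mathbf{C} - \lambda\mathbf{I}$ is invertible, we could premultiply by $(\mathbf{C} - \lambda\mathbf{I})^{-1}$ and would be left with u = 0, which provides no useful solution. Therefore we want $\mathbf{C} - \lambda\mathbf{I}$ to be non-invertible, which we can ensure by setting its determinant to zero:

$$|\mathbf{C} - \lambda\mathbf{I}| = 0$$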

Page 20

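Using the city data:

$$|\mathbf{C} - \lambda\mathbf{I}| = \left|\begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\right| = \begin{vmatrix} 1-\lambda & 0.5 \\ 0.5 & 1-\lambda \end{vmatrix} = 0$$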

Page 21

$$(1-\lambda)^2 - 0.5^2 = 0$$

$$\lambda^2 - 2\lambda + 0.75 = 0$$

$$\lambda = 1.5 \quad \text{or} \quad \lambda = 0.5$$

which gives us the variances for both principal components. (Note that they sum to 2, the total variance of the two standardized metrics.) The eigenvectors can be obtained by solving:

$$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T}$$
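The determinant route can be sketched directly. For a 2 × 2 matrix, |C − λI| = 0 expands to λ² − trace(C)·λ + det(C) = 0, whose roots for the city correlation matrix are 1.5 and 0.5. The snippet solves that quadratic, confirms each root makes C − λI singular, and then (as an assumed shortcut) takes the eigenvectors from NumPy's `np.linalg.eigh`.

```python
import math

import numpy as np

# Correlation matrix of the standardized city data
C = np.array([[1.0, 0.5],
              [0.5, 1.0]])

# For a 2x2 matrix, |C - lambda I| = 0 expands to lambda^2 - trace(C) lambda + det(C) = 0
trace = C[0, 0] + C[1, 1]
det = C[0, 0] * C[1, 1] - C[0, 1] * C[1, 0]
disc = math.sqrt(trace**2 - 4 * det)
lambdas = [(trace + disc) / 2, (trace - disc) / 2]   # largest (PC1) first

# Each root makes C - lambda I singular
for lam in lambdas:
    assert abs(np.linalg.det(C - lam * np.eye(2))) < 1e-12

# Eigenvectors (the weights U); eigh sorts eigenvalues in ascending order, so reverse
_, eigvecs = np.linalg.eigh(C)
U = eigvecs[:, ::-1]

print("eigenvalues:", lambdas, "sum:", sum(lambdas))
print("weights U:\n", U)
```

Both weight vectors have entries of magnitude 1/√2: PC1 weights the two metrics equally, while PC2 contrasts them. Their signs are arbitrary.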

Page 22

The principal components can also be derived using singular value decomposition (SVD). Remember that the variance-covariance matrix for the principal components is a diagonal matrix containing the eigenvalues:

$$\frac{1}{n-1}\mathbf{Z}^{T}\mathbf{Z} = \frac{1}{2}\mathbf{Z}^{T}\mathbf{Z} = \boldsymbol{\Lambda} = \begin{pmatrix} 1.50 & 0.00 \\ 0.00 & 0.50 \end{pmatrix}$$

Let the standardized principal components be represented by W:

$$\mathbf{W} = \mathbf{Z}\mathbf{S}^{-1}$$

where S represents the standard deviations, $\mathbf{S} = \boldsymbol{\Lambda}^{1/2}$.

Page 23

If we standardize the principal components so that they have unit variance, the variance-covariance matrix of the standardized components is:

$$\frac{1}{n-1}\mathbf{W}^{T}\mathbf{W} = \begin{pmatrix} 1.00 & 0.00 \\ 0.00 & 1.00 \end{pmatrix}$$

An alternative standardization, Σ, also eliminates the constant $\frac{1}{n-1}$. Let these standardized principal components be denoted V, which are rescaled principal components:

$$\mathbf{V} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}, \quad \text{so that} \quad \mathbf{V}^{T}\mathbf{V} = \begin{pmatrix} 1.00 & 0.00 \\ 0.00 & 1.00 \end{pmatrix}$$

This requires $\boldsymbol{\Sigma} = \sqrt{n-1}\,\mathbf{S}$.
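Both standardizations can be compared side by side. This NumPy sketch assumes the eigenvalues and weights come from `np.linalg.eigh`, builds S = Λ^(1/2) and Σ = √(n − 1)·S, and checks that W has unit variance while V has VᵀV = I without the 1 / (n − 1) factor.

```python
import numpy as np

X_raw = np.array([[4.0, 4.0], [8.0, 5.0], [6.0, 6.0]])
n = X_raw.shape[0]
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

lam, eigvecs = np.linalg.eigh(X.T @ X / (n - 1))
lam, U = lam[::-1], eigvecs[:, ::-1]    # PC1 first
Z = X @ U

S = np.diag(np.sqrt(lam))               # standard deviations of the PCs
W = Z @ np.linalg.inv(S)                # unit-variance components
Sigma = np.sqrt(n - 1) * S              # alternative scaling that absorbs 1 / (n - 1)
V = Z @ np.linalg.inv(Sigma)

print(W.T @ W / (n - 1))
print(V.T @ V)
```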

Page 24

From

$$\mathbf{V} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}$$

and the original definition of the principal components,

$$\mathbf{Z} = \mathbf{X}\mathbf{U},$$

we get

$$\mathbf{V} = \mathbf{X}\mathbf{U}\boldsymbol{\Sigma}^{-1}$$

To rearrange this equation in terms of X…

Page 25

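$$\mathbf{V} = \mathbf{X}\mathbf{U}\boldsymbol{\Sigma}^{-1}$$

$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{X}\mathbf{U}\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}$$

$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{X}\mathbf{U}$$

$$\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^{T} = \mathbf{X}\mathbf{U}\mathbf{U}^{T}$$

$$\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^{T} = \mathbf{X}$$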

Now X is expressed in terms of two orthogonal matrices:

$$\mathbf{U}^{T}\mathbf{U} = \mathbf{V}^{T}\mathbf{V} = \begin{pmatrix} 1.00 & 0.00 \\ 0.00 & 1.00 \end{pmatrix}$$

$\mathbf{X} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^{T}$ defines the SVD of X. An SVD expresses a matrix in terms of a diagonal matrix of singular values, Σ, and two orthogonal matrices. One of the orthogonal matrices is the principal component weights, U; the other is the standardized, rescaled principal component scores, V.
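NumPy's `np.linalg.svd` can be used to check this. Note a naming clash (an assumption to keep in mind when reading the code): NumPy returns `(u, s, vh)` with `a = u @ diag(s) @ vh`, so in the slides' notation X = VΣUᵀ NumPy's `u` plays the role of V and `vh` is Uᵀ. The columns of U should match the eigenvector weights up to sign.

```python
import numpy as np

X_raw = np.array([[4.0, 4.0], [8.0, 5.0], [6.0, 6.0]])
n = X_raw.shape[0]
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

# Thin SVD; map NumPy's (u, s, vh) onto the slides' X = V Sigma U^T
V, sigma, Ut = np.linalg.svd(X, full_matrices=False)
U = Ut.T

# X is recovered from the two orthonormal factors and the singular values
assert np.allclose(V @ np.diag(sigma) @ U.T, X)

# Compare with the eigenvector weights of the correlation matrix (PC1 first)
_, eigvecs = np.linalg.eigh(X.T @ X / (n - 1))
U_eig = eigvecs[:, ::-1]
print(np.abs(U.T @ U_eig))
```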

Page 26

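The eigenvalues can be obtained from the singular values, Σ, as follows. From

$$\boldsymbol{\Lambda} = \frac{1}{n-1}\mathbf{Z}^{T}\mathbf{Z}$$

and

$$\mathbf{V} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}$$

$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{Z}\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}$$

$$\mathbf{V}\boldsymbol{\Sigma} = \mathbf{Z}$$

then …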

Page 27

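$$\boldsymbol{\Lambda} = \frac{1}{n-1}(\mathbf{V}\boldsymbol{\Sigma})^{T}(\mathbf{V}\boldsymbol{\Sigma})$$

$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^{T}\mathbf{V}^{T}\mathbf{V}\boldsymbol{\Sigma}$$

$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^{T}\mathbf{I}\boldsymbol{\Sigma}$$

$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^{T}\boldsymbol{\Sigma}$$

$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}\boldsymbol{\Sigma}$$

$$\boldsymbol{\Lambda} = \frac{1}{n-1}\boldsymbol{\Sigma}^{2}$$

(Σ is diagonal, so $\boldsymbol{\Sigma}^{T} = \boldsymbol{\Sigma}$.) Therefore the squared singular values are simply rescaled eigenvalues: each eigenvalue equals the corresponding squared singular value divided by n − 1.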
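A short NumPy check of Λ = Σ² / (n − 1), assuming `np.linalg.svd(..., compute_uv=False)` for the singular values and `np.linalg.eigvalsh` for the eigenvalues of the correlation matrix.

```python
import numpy as np

X_raw = np.array([[4.0, 4.0], [8.0, 5.0], [6.0, 6.0]])
n = X_raw.shape[0]
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

sigma = np.linalg.svd(X, compute_uv=False)     # singular values, descending
eigvals_from_svd = sigma**2 / (n - 1)          # Lambda = Sigma^2 / (n - 1)

eigvals = np.linalg.eigvalsh(X.T @ X / (n - 1))[::-1]   # descending order
print(eigvals_from_svd, eigvals)
```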

Page 28

Finally, the SVD is useful for demonstrating the equivalence between S- and T-mode analyses.

Mode   Rows of X    Columns of X
S      Time         Space
T      Space        Time
R      Time         Parameter
Q      Parameter    Time
O      Space        Parameter
P      Parameter    Space

Page 29

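$$\mathbf{X} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^{T}$$

$$\mathbf{X}^{T} = \left(\mathbf{V}\boldsymbol{\Sigma}\mathbf{U}^{T}\right)^{T} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{T}$$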

Therefore a T-mode principal components analysis will generate the same results as an S-mode analysis, except that the loadings and the scores are swapped, and the eigenvalues derived from the singular values will be scaled by a different value of n (the number of rows of $\mathbf{X}^{T}$ rather than of X).
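The swap can be illustrated numerically. This sketch applies NumPy's SVD to the same standardized matrix X and to its transpose, without re-centring along the other dimension (matching the algebra above, but an assumption relative to how a T-mode analysis might be set up in practice). The singular values agree, and the two singular-vector matrices trade places up to sign.

```python
import numpy as np

X_raw = np.array([[4.0, 4.0], [8.0, 5.0], [6.0, 6.0]])
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

# S-mode: decompose X; T-mode: decompose its transpose (rows and columns swapped)
V_s, sigma_s, Ut_s = np.linalg.svd(X, full_matrices=False)
V_t, sigma_t, Ut_t = np.linalg.svd(X.T, full_matrices=False)

# Same singular values; NumPy's left and right singular vectors swap roles (up to sign)
print(sigma_s, sigma_t)
```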

