
Multivariate Data Analysis

Date post: 30-Dec-2015
Upload: wayne-kirk
Transcript
Page 1: Multivariate Data Analysis

Multivariate Data Analysis

Principal Component Analysis

Page 2: Multivariate Data Analysis

Principal Component Analysis (PCA)

• Singular Value Decomposition

• Eigenvector / eigenvalue calculation

Page 3: Multivariate Data Analysis

Data Matrix (IxK)

• Reduce variables

• Improve projections

• Remove noise

• Find outliers

• Find classes

[Figure: data matrix X with I rows and K columns]

Page 4: Multivariate Data Analysis

PCA

• Example with 2 variables, 6 objects

• Find best (most informative) direction in space

• Describe direction

• Make projection

Page 5: Multivariate Data Analysis

[Figure: scatter plot with axes x1 and x2]

Page 6: Multivariate Data Analysis

[Figure: scatter plot with axes x1 and x2]

Page 7: Multivariate Data Analysis

1st PC

Page 8: Multivariate Data Analysis

1st PC

Score

Residual

Page 9: Multivariate Data Analysis

1st PC

Loading p1

Loading p2

Unit vector

Page 10: Multivariate Data Analysis

1st PC

Loading p1 = cos(θ)

Loading p2 = sin(θ)

Unit vector

θ = angle between the 1st PC and the x1 axis
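
The geometric steps on the preceding slides (find the most informative direction, describe it by its loadings, project the objects onto it) can be sketched in NumPy. The six data points are made up for illustration and are not the values plotted on the slides; `theta` is the angle between the 1st PC and the x1 axis.

```python
import numpy as np

# Hypothetical data: 6 objects, 2 variables (x1, x2). Not taken from the slides.
X_raw = np.array([[1.0, 1.2], [2.0, 1.9], [3.0, 3.2],
                  [4.0, 3.8], [5.0, 5.1], [6.0, 5.9]])
X = X_raw - X_raw.mean(axis=0)          # mean-center each variable

# The first right singular vector is the direction of largest variance.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
p = Vt[0]                               # loading vector (p1, p2), unit length
if p[0] < 0:                            # the sign is arbitrary; fix it for readability
    p = -p

theta = np.arctan2(p[1], p[0])          # angle between the 1st PC and the x1 axis
t = X @ p                               # scores: projections of the objects onto the 1st PC
E = X - np.outer(t, p)                  # residuals: the part the 1st PC does not describe

print("loadings:", p)
print("cos(theta), sin(theta):", np.cos(theta), np.sin(theta))
print("scores:", t)
```

Because the loading vector has unit length, its two elements are exactly the cosine and sine of that angle, and each residual is perpendicular to the 1st PC.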

Page 11: Multivariate Data Analysis

[Figure: X (IxK) with score vector t and loading vector p; element i marked]

Page 12: Multivariate Data Analysis

[Figure: X (IxK) with score vector t and loading vector p; element k marked]

Page 13: Multivariate Data Analysis

[Figure: X (IxK) with score vector t and loading vector p]

Page 14: Multivariate Data Analysis

X = t1p1’ + t2p2’ + ... + tApA’ + E

X = TP’ + E

X: properly preprocessed data matrix (IxK)
T: score matrix (IxA)
P: loading matrix (KxA)
E: residual matrix (IxK)
ta: score vector
pa: loading vector
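
A minimal NumPy sketch of the decomposition X = TP’ + E, assuming the components are taken from a singular value decomposition. The random data, the sizes and the choice A = 2 are placeholders for illustration, and "properly preprocessed" is reduced here to mean-centering only.

```python
import numpy as np

rng = np.random.default_rng(0)
I, K, A = 10, 5, 2                      # objects, variables, components (illustrative)
X = rng.normal(size=(I, K))             # placeholder data, not from the slides
X = X - X.mean(axis=0)                  # preprocessing: mean-centering only

U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U[:, :A] * s[:A]                    # score matrix, I x A
P = Vt[:A].T                            # loading matrix, K x A
E = X - T @ P.T                         # residual matrix, I x K

# The same model written as a sum of rank-one terms t_a p_a'
X_sum = sum(np.outer(T[:, a], P[:, a]) for a in range(A))

print(T.shape, P.shape, E.shape)
```

With this convention the loading vectors are orthonormal and the scores carry the scale of the data.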

Page 15: Multivariate Data Analysis

The Wine Example

People magazine

Wise & Gallagher

Page 16: Multivariate Data Analysis

          Wine    Beer  Spirit  LifeEx  HeartD
France    63.5    40.1     2.5    78.0    61.1
Italy     58.0    25.1     0.9    78.0    94.1
Switz     46.0    65.0     1.7    78.0   106.4
Austra    15.7   102.1     1.2    78.0   173.0
Brit      12.2   100.0     1.5    77.0   199.7
U.S.A.     8.9    87.8     2.0    76.0   176.0
Russia     2.7    17.1     3.8    69.0   373.6
Czech      1.7   140.0     1.0    73.0   283.7
Japan      1.0    55.0     2.1    79.0    34.7
Mexico     0.2    50.4     0.8    73.0    36.4

Page 17: Multivariate Data Analysis

                    Wine      Beer      Spirit   LifeEx    HeartD
Mean                20.9900   68.2600   1.7500   75.9000   153.8700
Standard deviation  24.9270   38.6718   0.9132    3.2128   110.8182
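
The means and standard deviations can be recomputed from the wine data, and the same quantities give one common preprocessing, autoscaling (subtract the column mean, divide by the column standard deviation). The slides do not say which scaling was applied, so autoscaling is an assumption here. Using the sample standard deviation (`ddof=1`) reproduces the tabulated values.

```python
import numpy as np

variables = ["Wine", "Beer", "Spirit", "LifeEx", "HeartD"]
# Rows: France, Italy, Switz, Austra, Brit, U.S.A., Russia, Czech, Japan, Mexico
X_raw = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1],
    [58.0,  25.1, 0.9, 78.0,  94.1],
    [46.0,  65.0, 1.7, 78.0, 106.4],
    [15.7, 102.1, 1.2, 78.0, 173.0],
    [12.2, 100.0, 1.5, 77.0, 199.7],
    [ 8.9,  87.8, 2.0, 76.0, 176.0],
    [ 2.7,  17.1, 3.8, 69.0, 373.6],
    [ 1.7, 140.0, 1.0, 73.0, 283.7],
    [ 1.0,  55.0, 2.1, 79.0,  34.7],
    [ 0.2,  50.4, 0.8, 73.0,  36.4],
])

mean = X_raw.mean(axis=0)
std = X_raw.std(axis=0, ddof=1)         # sample standard deviation (divide by I - 1)
X_auto = (X_raw - mean) / std           # assumed preprocessing: autoscaling

for name, m, sd in zip(variables, mean, std):
    print(f"{name:8s} mean = {m:9.4f}  stdev = {sd:9.4f}")
```

After autoscaling every variable has mean 0 and standard deviation 1, so variables measured on very different scales (Spirit versus HeartD, for example) get equal weight.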

Page 18: Multivariate Data Analysis

[Figure: Singular value versus Component (1 to 5) for the wine data. Percentage explained: component 1 = 46%, 2 = 32%, 3 = 12%, 4 = 8%, 5 = 2%]

Page 19: Multivariate Data Analysis

[Figure: score plot for the wine data, axes Score 1 (46%) and Score 2 (32%). Points 1 to 10: France, Italy, Switz, Austral, Brit, USA, Russia, Czech, Japan, Mex]

Page 20: Multivariate Data Analysis

[Figure: loading plot for the wine data, axes Loading 1 and Loading 2. Points 1 to 5: Wine, Beer, Spirit, Life exp., Heart dis.]

Page 21: Multivariate Data Analysis

Conclusions

Scores = positions of objects in multivariate space

Loadings = importance of original variables for new directions

Try to explain a large enough portion of X (46+32 = 78%)

Page 22: Multivariate Data Analysis

The Apricot Example

Manley & Geladi

Page 23: Multivariate Data Analysis

[Figure: "Appelkoos" (apricot) spectra, Pseudoabsorbance versus Wavelength, nm (1000 to 2500)]

Page 24: Multivariate Data Analysis

[Figure: Scree plot, Singular value versus Component number (1 to 10)]

Page 25: Multivariate Data Analysis

What is rank?

Mathematical rank: at most min(I,K)

Gives zero residual

Effective rank = A

Separates model from noise
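
The difference between mathematical and effective rank can be illustrated with simulated data: a matrix built from A = 2 underlying components plus a little noise. The sizes, the noise level and the value of A are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
I, K, A = 8, 20, 2                      # illustrative sizes; A is the built-in effective rank
T_true = rng.normal(size=(I, A))
P_true = rng.normal(size=(K, A))
X = T_true @ P_true.T + 0.01 * rng.normal(size=(I, K))   # structure plus small noise

math_rank = np.linalg.matrix_rank(X)    # the noise makes X full rank: min(I, K)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

def residual_ss(n_comp):
    """Sum of squares left after a model with n_comp components."""
    X_hat = (U[:, :n_comp] * s[:n_comp]) @ Vt[:n_comp]
    return float(np.sum((X - X_hat) ** 2))

print("mathematical rank:", math_rank)
print("residual SS with A components:        ", residual_ss(A))
print("residual SS with min(I, K) components:", residual_ss(min(I, K)))
```

Using all min(I,K) components reproduces X with zero residual (up to rounding), while A components leave only the noise behind.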

Page 26: Multivariate Data Analysis

ANOVA

Comp#   SS        SS%     SS%cum
1       68.8269   98.10   98.10
2        1.2843    1.83   99.93
3        0.0463    0.07   100
4        0.0045    0.01
5        0.0007    0.00
6        0.0003    0.00
7        0.0002    0.00
8        0.0001    0.00
9        0.0000    0.00
10       0.0000    0.00
Total   70.1634  100
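
The percentage columns follow directly from the SS values. A short sketch, using the ten SS values and the total as printed on the slide:

```python
import numpy as np

# Sum of squares per component, copied from the slide
ss = np.array([68.8269, 1.2843, 0.0463, 0.0045, 0.0007,
               0.0003, 0.0002, 0.0001, 0.0000, 0.0000])
ss_total = 70.1634                      # total as printed on the slide

ss_pct = 100 * ss / ss_total            # SS% per component
ss_pct_cum = np.cumsum(ss_pct)          # cumulative SS%

for a, (v, p, c) in enumerate(zip(ss, ss_pct, ss_pct_cum), start=1):
    print(f"{a:>2}  {v:8.4f}  {p:6.2f}  {c:7.2f}")
```

The first two cumulative values come out at about 98.10 and 99.93, in line with the slide.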

Page 27: Multivariate Data Analysis

[Figure: score plot, axes Score 1 (98%) and Score 2 (2%), objects 1 to 10]

Page 28: Multivariate Data Analysis

ANOVA

SStot = SS1 + SS2 + SS3 +...+ SS(I or K)

SStot = λ1 + λ2 + λ3 + ... + λ(I or K)

From largest to smallest!
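
A small numerical check of this sum, assuming the per-component sums of squares are obtained as squared singular values of the mean-centered data. The random matrix is a placeholder, not data from the slides.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 5))            # placeholder data: I = 10, K = 5
X = X - X.mean(axis=0)                  # mean-center

ss_tot = np.sum(X ** 2)                 # total sum of squares of X
s = np.linalg.svd(X, compute_uv=False)  # singular values, returned largest first
ss_per_comp = s ** 2                    # SS_a (eigenvalue) for each component

print("SStot:            ", ss_tot)
print("sum of components:", ss_per_comp.sum())
print("per component:    ", ss_per_comp)
```

There are min(I,K) terms in the sum, and NumPy already returns them sorted from largest to smallest.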

Page 29: Multivariate Data Analysis

ANOVA

X = TP’ + E

data = model + residual

SStot = SSmod + SSres

R2 = SSmod / SStot = 1 - SSres / SStot

Coefficient of determination (often in %)
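
The split of the sum of squares into model and residual parts can be checked numerically. The helper `r_squared` below is a hypothetical name introduced for this sketch, and the data are random placeholders.

```python
import numpy as np

def r_squared(X, n_comp):
    """Return R2, SStot, SSmod and SSres for a PCA model with n_comp components."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_comp] * s[:n_comp]      # scores
    P = Vt[:n_comp].T                   # loadings
    model = T @ P.T                     # model part TP'
    E = X - model                       # residual part
    ss_tot = float(np.sum(X ** 2))
    ss_mod = float(np.sum(model ** 2))
    ss_res = float(np.sum(E ** 2))
    return ss_mod / ss_tot, ss_tot, ss_mod, ss_res

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 5))            # placeholder data, not from the slides
X = X - X.mean(axis=0)

r2, ss_tot, ss_mod, ss_res = r_squared(X, 2)
print(f"R2 = {100 * r2:.1f}%  (SStot = {ss_tot:.3f}, SSmod = {ss_mod:.3f}, SSres = {ss_res:.3f})")
```

The two ways of writing R2 agree because the model and the residual are orthogonal, so their sums of squares add up to SStot.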

Page 30: Multivariate Data Analysis

Examples

Wines: R2 = SSmod = 78%, SSres = 22% (2 Comp.)

Apricots 1: R2 = SSmod = 99.93%, SSres = 0.07% (2 Comp.)

Apricots 2: R2 = SSmod = 100%, SSres = ±0.0% (3 Comp.)

Page 31: Multivariate Data Analysis

[Figure: spectra with outliers removed, Absorbance versus Wavelength, nm (1000 to 2500)]

Page 32: Multivariate Data Analysis

[Figure: Singular values versus Component (1 to 8), no outliers. Percentage explained: component 1 = 81%, 2 = 16%, 3 = 3%]

Page 33: Multivariate Data Analysis

[Figure: score plot, axes Score 2 (16%) and Score 3 (3%), objects 1 to 8, with groups labelled Whole fruit, No kernel and Thin slice]

Page 34: Multivariate Data Analysis

[Figure: Loadings 2 and 3 versus Wavelength, nm (1000 to 2500)]

Page 35: Multivariate Data Analysis

[Figure: loading plot, axes Loading 2 and Loading 3]

Page 36: Multivariate Data Analysis

More nomenclature

Score = Latent Variable

Loading vector = Eigenvector

Effective rank = Pseudorank = Model dimensionality = Number of components

SSa = Eigenvalue

Singular value = SSa^(1/2)
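
These equivalences can be verified numerically by computing the components in two ways: by singular value decomposition of X and by eigenvector / eigenvalue calculation on X’X. The random data are a placeholder; the sketch assumes mean-centered X.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 5))            # placeholder data, not from the slides
X = X - X.mean(axis=0)

# Route 1: singular value decomposition of X
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Route 2: eigenvectors and eigenvalues of the K x K matrix X'X
eigvals, eigvecs = np.linalg.eigh(X.T @ X)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]            # reorder from largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loading vectors from both routes agree up to sign (|cosine| = 1 per component)
same_direction = np.abs(np.sum(eigvecs * Vt.T, axis=0))

# Scores from the eigenvectors; their sums of squares are the eigenvalues
T = X @ eigvecs
ss_a = np.sum(T ** 2, axis=0)

print("eigenvalues:      ", eigvals)
print("singular values^2:", s ** 2)
```

The sign of each loading vector is arbitrary, which is why the comparison uses the absolute value of the cosine between the two sets of vectors.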

Page 37: Multivariate Data Analysis

An analysis sequence

• 1. Scale, mean-center data

• 2. Calculate a few components

• 3. Check scores, loadings

• 4. Find outliers, groupings, explain

• 5. Remove outliers

Page 38: Multivariate Data Analysis

An analysis sequence

• 6. Scale, mean-center data

• 7. Calculate enough components

• 8. Try to determine pseudorank

• 9. Check score plots

• 10. Check loading plots

• 11. Check residuals
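
One way to support steps 8 and 11 is to follow the residual standard deviation as components are added. The slides do not define how the residual stdev is computed, so the plain root-mean-square used below (division by I*K, no degrees-of-freedom correction) is an assumption, as are the helper name `residual_stdev` and the placeholder data.

```python
import numpy as np

def residual_stdev(X, max_comp):
    """Root-mean-square residual after 0, 1, ..., max_comp components.

    Assumption: plain division by I*K, no degrees-of-freedom correction.
    """
    I, K = X.shape
    s = np.linalg.svd(X, compute_uv=False)
    ss_tot = np.sum(s ** 2)
    ss_model = np.concatenate(([0.0], np.cumsum(s[:max_comp] ** 2)))
    ss_res = np.clip(ss_tot - ss_model, 0.0, None)   # guard against rounding below zero
    return np.sqrt(ss_res / (I * K))

rng = np.random.default_rng(5)
X = rng.normal(size=(10, 5))                        # placeholder data, not from the slides
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # scale and mean-center

res_sd = residual_stdev(X, 4)
for a, sd in enumerate(res_sd):
    print(f"{a} components: residual stdev = {sd:.3f}")
```

The curve can only go down as components are added; a point where it flattens out is one informal indication of the pseudorank.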

Page 39: Multivariate Data Analysis

[Figure: Residual stdev for the wine data with 0 to 4 components (vertical axis 0 to 1)]

Page 40: Multivariate Data Analysis

[Figure: Residual stdev for the wine data with 0 to 4 components (vertical axis 0 to 2)]

