PCA: Principal Component Analysis
Dr. Saed Sayad, University of Toronto
2010
http://chem-eng.utoronto.ca/~datamining/
Basic Statistics
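The formulas on this slide did not survive extraction; the standard sample definitions used in the example that follows are:

\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad \mathrm{Variance}(x) = \frac{S_{xx}}{n-1}, \qquad \mathrm{SD}(x) = \sqrt{\mathrm{Variance}(x)}

S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \qquad \mathrm{Covariance}(x, y) = \frac{S_{xy}}{n-1}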
Statistics - Example
X      X - Xbar   (X - Xbar)^2
0      -10        100
8      -2         4
12     2          4
20     10         100

SumX = 40, Count = 4, Average (Xbar) = 10
Sxx = 208, Variance = Sxx / (Count - 1) = 69.33, SD = 8.33

Y      Y - Ybar   (Y - Ybar)^2
8      -2         4
9      -1         1
11     1          1
12     2          4

SumY = 40, Count = 4, Average (Ybar) = 10
Syy = 10, Variance = Syy / (Count - 1) = 3.33, SD = 1.83

X      Y      X - Xbar   Y - Ybar   (X - Xbar)(Y - Ybar)
0      8      -10        -2         20
8      9      -2         -1         2
12     11     2          1          2
20     12     10         2          20

Count = 4, Sxy = 44, Covariance = Sxy / (Count - 1) = 14.67
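These numbers can be checked in a few lines of Python (a minimal sketch; the variable names are mine, not the slides'):

```python
import numpy as np

x = np.array([0.0, 8.0, 12.0, 20.0])
y = np.array([8.0, 9.0, 11.0, 12.0])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)               # Sxx = 208.0
sxy = np.sum((x - x.mean()) * (y - y.mean()))   # Sxy = 44.0

print(x.mean(), sxx / (n - 1), np.sqrt(sxx / (n - 1)))  # 10.0, 69.33, 8.33
print(sxy / (n - 1))                            # 14.67, the sample covariance
print(np.cov(x, y, ddof=1))                     # the full 2x2 covariance matrix
```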
Covariance Matrix
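The matrix on this slide was lost in extraction, but from the example values above it collects the variances on the diagonal and the covariance off the diagonal:

C = \begin{pmatrix} \mathrm{var}(X) & \mathrm{cov}(X,Y) \\ \mathrm{cov}(X,Y) & \mathrm{var}(Y) \end{pmatrix} = \begin{pmatrix} 69.33 & 14.67 \\ 14.67 & 3.33 \end{pmatrix}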
Matrix Algebra
Example of one non-eigenvector and one eigenvector
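The worked example was lost in extraction; a stand-in consistent with the eigenvalue of 4 quoted on the Eigenvalues slide (it is the matrix used in the Smith tutorial cited at the end):

\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 11 \\ 5 \end{pmatrix} \quad \text{not a multiple of } (1, 3)^{\top}\text{, so not an eigenvector}

\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 12 \\ 8 \end{pmatrix} = 4 \begin{pmatrix} 3 \\ 2 \end{pmatrix} \quad \text{an eigenvector, with eigenvalue 4}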
Matrix Algebra
Example of how a scaled eigenvector is still an eigenvector
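Continuing the same stand-in example: doubling the eigenvector (3, 2) gives (6, 4), which is still an eigenvector with the same eigenvalue:

\begin{pmatrix} 2 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} 24 \\ 16 \end{pmatrix} = 4 \begin{pmatrix} 6 \\ 4 \end{pmatrix}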
Eigenvector Properties

Eigenvectors can only be found for square matrices, and not every square matrix has eigenvectors.

Given an n x n matrix that does have eigenvectors, there are n of them (counted with multiplicity).

Another property of eigenvectors is that even if we scale the vector by some amount before we multiply it, we still get the same multiple of it as a result. This is because scaling a vector by some amount only makes it longer; it does not change its direction.

Lastly, all the eigenvectors of a symmetric matrix (such as a covariance matrix) are perpendicular. This means that you can express the data in terms of these perpendicular eigenvectors, instead of expressing it in terms of the x and y axes.
Standardized Eigenvectors
We prefer eigenvectors whose length is exactly one. This is because the length of a vector doesn't affect whether it is an eigenvector, whereas the direction does. So, to keep eigenvectors standard, whenever we find an eigenvector we usually scale it to have a length of 1, so that all eigenvectors have the same length.
[Figure: an eigenvector, its vector length, and the same eigenvector rescaled to length one.]
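As a concrete illustration with the stand-in eigenvector (3, 2):

\left\| \begin{pmatrix} 3 \\ 2 \end{pmatrix} \right\| = \sqrt{3^2 + 2^2} = \sqrt{13}, \qquad \frac{1}{\sqrt{13}} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 3/\sqrt{13} \\ 2/\sqrt{13} \end{pmatrix} \quad \text{has length one}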
Eigenvalues
The eigenvalue is the amount by which the original vector was scaled after multiplication by the square matrix.

In the example above, 4 is the eigenvalue associated with that eigenvector.

No matter what multiple of the eigenvector we take before we multiply it by the square matrix, we always get 4 times the scaled vector as a result.

Eigenvectors and eigenvalues always come in pairs.
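A quick numerical check of the stand-in example (NumPy; note that np.linalg.eig does not guarantee any particular ordering of the eigenvalues):

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# Each column of vecs is a unit-length eigenvector, paired with the
# eigenvalue at the same position in vals.
vals, vecs = np.linalg.eig(A)
print(vals)                    # 4.0 and -1.0 for this matrix
v = vecs[:, np.argmax(vals)]   # the eigenvector for eigenvalue 4
print(v)                       # proportional to (3, 2), scaled to length one
print(A @ v, 4 * v)            # the two results match
```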
PCA
PCA is a way of identifying patterns in data, and of expressing the data in such a way as to highlight their similarities and differences.

Since patterns can be hard to find in data of high dimension, where the luxury of graphical representation is not available, PCA is a powerful tool for analysing data.

The other main advantage of PCA is that once you have found these patterns, you can compress the data, i.e. reduce the number of dimensions, without much loss of information. This technique is used in image compression.
PCA: Original and Adjusted Data

[Table: the original data alongside the adjusted data, i.e. the original data minus the average of each dimension.]
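The data table itself did not survive extraction; a minimal sketch of the adjustment step, with made-up values standing in for the slide's data:

```python
import numpy as np

# Hypothetical 2-D data; the actual table on the slide was lost.
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2]])

# Subtract the per-dimension average so each dimension has mean zero.
adjusted = data - data.mean(axis=0)
print(adjusted)
```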
PCA: Original Data Plot
Calculate Covariance Matrix
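Continuing the sketch above (rowvar=False because each column of `adjusted` is a dimension):

```python
cov = np.cov(adjusted, rowvar=False)  # 2x2 covariance matrix of the data
print(cov)
```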
Calculate Eigenvectors and Eigenvalues from the Covariance Matrix
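Continuing the sketch: for a symmetric matrix such as a covariance matrix, np.linalg.eigh is the natural choice; it returns real eigenvalues in ascending order and unit-length eigenvectors:

```python
vals, vecs = np.linalg.eigh(cov)  # columns of vecs are unit-length eigenvectors
print(vals)                       # eigenvalues, in ascending order
print(vecs)
```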
Eigenvectors Plot
Choosing Components and Forming a Feature Vector

The eigenvector with the highest eigenvalue is the principal component of the data set. It is the most significant relationship between the data dimensions.

In general, once eigenvectors are found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest. This gives us the components in order of significance.

Now, if we like, we can decide to ignore the components of lesser significance. We do lose some information, but if the eigenvalues are small, we don't lose much.

If we leave out some components, the final data set will have fewer dimensions than the original (see the sketch after this list).
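A minimal sketch of these steps, continuing the hypothetical data above (the name `n_keep` is mine; note the slides keep data items in columns, while this sketch keeps them in rows, so the multiplication order is flipped):

```python
# Order the components by eigenvalue, highest to lowest.
order = np.argsort(vals)[::-1]
vecs_sorted = vecs[:, order]

# Keep only the most significant components: the feature vector.
n_keep = 1
feature_vector = vecs_sorted[:, :n_keep]  # chosen eigenvectors as columns

# Project the mean-adjusted data onto the chosen components.
final_data = adjusted @ feature_vector    # one row per data item
print(final_data)
```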
Feature Vector

[Figure: the feature vector, i.e. the matrix formed by placing the chosen eigenvectors in its columns.]
Deriving the new data set
Row Feature Vector is the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top.

Row Data Adjust is the mean-adjusted data, transposed, i.e. the data items are in the columns, with each row holding a separate dimension.

Final Data has the final data items in the columns, and the dimensions along the rows. It is the original data expressed solely in terms of the chosen eigenvectors.

Final Data = Row Feature Vector × Row Data Adjust
[Figure: the new data set derived using both eigenvectors.]
Get the original data back
Row Data Adjust = Row Feature Vector^T × Final Data

Row Original Data = (Row Feature Vector^T × Final Data) + Original Mean

The transpose works as the inverse here because the eigenvectors in the feature vector are unit length and mutually perpendicular.
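In the rows-as-items convention of the earlier sketch, the same reconstruction reads:

```python
# Back-project and add the mean back in; this recovers the original data
# exactly only if all components were kept.
reconstructed = final_data @ feature_vector.T + data.mean(axis=0)
print(reconstructed)
```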
Principal Component Regression
PCA + MLR (multiple linear regression)
PCR
T is the matrix of SCORES
X is the DATA matrix
P is the matrix of LOADINGS
Y is the dependent variable vector
B is the vector of regression coefficients
X = T P^T
Y = T B + E
B = (T^T T)^-1 T^T Y
PCR
In PCR the X matrix is replaced by the T matrix, which has fewer, mutually orthogonal variables.

There is no issue with inverting the T^T T matrix (unlike X^T X in MLR), because the scores are orthogonal.

PCR also resolves the issue of collinearity, which can in turn reduce the prediction error.

There is no guarantee that a PCR model will work better than an MLR model built on the same dataset.
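A minimal PCR sketch putting the pieces together (NumPy only; the function and parameter names are mine, this is one straightforward reading of the slides rather than the author's code, and for simplicity no intercept is fitted):

```python
import numpy as np

def pcr_fit(X, Y, n_components):
    """Principal component regression: PCA on X, then least squares on the scores."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                                     # mean-adjust the predictors
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    P = vecs[:, np.argsort(vals)[::-1][:n_components]]  # top loadings, P
    T = Xc @ P                                          # scores, so X ~ T P^T
    B = np.linalg.solve(T.T @ T, T.T @ Y)               # B = (T^T T)^-1 T^T Y
    return x_mean, P, B

def pcr_predict(x_mean, P, B, X_new):
    return (X_new - x_mean) @ P @ B                     # predicted Y

# Tiny usage example with synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=20)
x_mean, P, B = pcr_fit(X, Y, n_components=2)
print(pcr_predict(x_mean, P, B, X[:3]))
```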
Reference
Lindsay I. Smith, A Tutorial on Principal Components Analysis: www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
Questions?