SVD and PCA
Derek Onken and Li Xiong
January 29, 2018
Feature Extraction
Create new features (attributes) by combining/mapping existing ones
Common methods
Principal Component Analysis
Singular Value Decomposition
Other compression methods (time-frequency analysis)
Fourier transform (e.g. time series)
Discrete Wavelet Transform (e.g. 2D images)
Principal component analysis: find the dimensions that capture the most variance
A linear mapping of the data to a new coordinate system such that the greatest variance lies on the first coordinate (the first principal component), the second greatest variance on the second coordinate, and so on.
Steps
Normalize input data: each attribute falls within the same range
Compute k orthonormal (unit) vectors, i.e., principal components; each input data vector is a linear combination of the k principal component vectors
The principal components are sorted in order of decreasing “significance”
Weak components can be eliminated, i.e., those with low variance
Principal Component Analysis (PCA)
Dimensionality Reduction: PCA
Mathematically
Compute the covariance matrix
Find the eigenvectors of the covariance matrix that correspond to the largest eigenvalues
(Figure: data points in the X-Y plane with the principal direction v.)
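The steps above (normalize, compute the covariance matrix, eigendecompose, keep the strongest components) can be sketched in NumPy; the toy data here is ours, for illustration only:

```python
import numpy as np

# Toy data: 6 samples, 2 correlated attributes
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

# 1) Normalize: center each attribute (optionally also scale to unit variance)
Xc = X - X.mean(axis=0)

# 2) Compute the covariance matrix
C = np.cov(Xc, rowvar=False)

# 3) Eigendecomposition; eigenvectors are the principal components
eigvals, eigvecs = np.linalg.eigh(C)    # eigh: C is symmetric
order = np.argsort(eigvals)[::-1]       # sort by decreasing "significance"
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4) Eliminate the weak components and project the data
k = 1
Z = Xc @ eigvecs[:, :k]                 # data in the reduced space
print(eigvals)                          # variance captured by each component
print(Z.shape)                          # (6, 1)
```

The projection Z keeps only the direction of greatest variance; reconstructing with `Z @ eigvecs[:, :k].T` gives the best k-dimensional approximation of the centered data.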
PCA: Illustrative Example
(Figures: a step-by-step illustration of PCA on example data.)
Eigen Decomposition
How the eigenvalues and eigenvectors create a matrix decomposition: A = Q Λ Q⁻¹
• Q is a matrix whose columns are the eigenvectors
• Λ is the diagonal matrix containing all the eigenvalues
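A minimal NumPy check of this decomposition (the 2×2 matrix is an arbitrary example of ours):

```python
import numpy as np

# A symmetric example matrix (so the eigendecomposition is well-behaved)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, Q = np.linalg.eig(A)   # columns of Q are the eigenvectors
Lam = np.diag(eigvals)          # Λ: diagonal matrix of the eigenvalues

# Reassemble the matrix: A = Q Λ Q⁻¹
A_rebuilt = Q @ Lam @ np.linalg.inv(Q)
print(np.allclose(A, A_rebuilt))  # True
```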
Singular Value Decomposition (SVD)
Similarity of Eigen and SVD
Eigendecomposition A = Q Λ Q⁻¹:
Columns of Q are eigenvectors
Λ contains eigenvalues
SVD M = U Σ Vᵀ:
Columns of U are left-singular vectors
Columns of V are right-singular vectors
Σ contains the ordered singular values σᵢ
For the eigendecomposition, A must be square; here we define A = MᵀM.
The vⱼ are eigenvectors of MᵀM.
The uᵢ are eigenvectors of MMᵀ.
The eigenvalues are the squares of the singular values (λᵢ = σᵢ²).
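These relationships are easy to verify numerically; M below is an arbitrary example matrix of ours:

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Singular values of M (returned in decreasing order)
sigma = np.linalg.svd(M, compute_uv=False)

# Eigenvalues of A = M^T M (square, symmetric);
# eigvalsh returns them in ascending order, so reverse
lam = np.linalg.eigvalsh(M.T @ M)[::-1]

print(np.allclose(sigma**2, lam))  # True: λ_i = σ_i²
```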
AN APPLICATION EXAMPLE
From: "Dimensionality Reduction: SVD & CUR", CS246: Mining Massive Datasets, Jure Leskovec, Stanford University
http://cs246.stanford.edu
SVD - Properties
It is always possible to decompose a real matrix A into A = U Σ Vᵀ, where
U, Σ, V: unique
U, V: column orthonormal
UᵀU = I; VᵀV = I (I: identity matrix)
(Columns are orthogonal unit vectors)
Σ: diagonal
Entries (singular values) are nonnegative and sorted in decreasing order (σ₁ ≥ σ₂ ≥ ... ≥ 0)
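A quick NumPy check of these properties on a random matrix (a sketch; the matrix and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Column-orthonormal factors: UᵀU = I and VᵀV = I
print(np.allclose(U.T @ U, np.eye(4)))
print(np.allclose(Vt @ Vt.T, np.eye(4)))

# Singular values are nonnegative and sorted in decreasing order
print(np.all(s >= 0) and np.all(np.diff(s) <= 0))

# The factorization reconstructs A exactly
print(np.allclose(U @ np.diag(s) @ Vt, A))
```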
Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf
SVD – Example: Users-to-Movies
Consider a matrix A (m users × n movies; columns: Matrix, Alien, Serenity, Casablanca, Amelie). What does SVD do?

A =
1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0
0 2 0 4 4
0 0 0 5 5
0 1 0 2 2

The first four users rate SciFi movies, the last three mostly Romance movies. SVD factors A into U Σ Vᵀ; the "concepts" it finds are AKA latent dimensions, AKA latent factors.
SVD – Example: Users-to-Movies
A = U Σ Vᵀ - example: Users to Movies
(columns of A: Matrix, Alien, Serenity, Casablanca, Amelie)

A (7×5):
1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0
0 2 0 4 4
0 0 0 5 5
0 1 0 2 2

U (7×3):
0.13  0.02 -0.01
0.41  0.07 -0.03
0.55  0.09 -0.04
0.68  0.11 -0.05
0.15 -0.59  0.65
0.07 -0.73 -0.67
0.07 -0.29  0.32

Σ (3×3):
12.4   0    0
 0    9.5   0
 0     0   1.3

Vᵀ (3×5):
0.56  0.59  0.56  0.09  0.09
0.12 -0.02  0.12 -0.69 -0.69
0.40 -0.80  0.40  0.09  0.09
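The decomposition can be reproduced with NumPy (note that SVD routines may flip the signs of matching columns of U and rows of Vᵀ relative to the slide):

```python
import numpy as np

# The users-to-movies ratings matrix from the slides
# (columns: Matrix, Alien, Serenity, Casablanca, Amelie)
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.round(s, 1))  # roughly [12.4  9.5  1.3  0.  0.] -- A has rank 3
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True
```

Only three singular values are nonzero, which is why the slide shows U, Σ, Vᵀ with just three concept dimensions.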
SVD – Example: Users-to-Movies
A = U Σ Vᵀ - example: in the decomposition above, the first singular triple (first column of U, σ₁, first row of Vᵀ) captures the SciFi-concept; the second captures the Romance-concept.
SVD – Example: Users-to-Movies
A = U Σ Vᵀ - example: U is the "user-to-concept" factor matrix — each row gives one user's affinity to the SciFi-concept and the Romance-concept.
SVD – Example: Users-to-Movies
A = U Σ Vᵀ - example: each diagonal entry of Σ is the "strength" of its concept — e.g., σ₁ = 12.4 is the strength of the SciFi-concept.
SVD – Example: Users-to-Movies
A = U Σ Vᵀ - example: V is the "movie-to-concept" factor matrix — each row of Vᵀ maps the movies onto one concept.
SVD - Interpretation #1
'movies', 'users' and 'concepts':
U: user-to-concept matrix
V: movie-to-concept matrix
Σ: its diagonal elements give the 'strength' of each concept
SVD – Best Low Rank Approx.
Fact: SVD gives the 'best' axes to project on:
'best' = minimizing the sum of squared reconstruction errors
A = U Σ Vᵀ; zeroing out the smallest singular values in Σ gives a low-rank B = U Σ' Vᵀ.
B is the best approximation of A among matrices of the same rank, i.e., it minimizes
‖A − B‖_F = √( Σᵢⱼ (Aᵢⱼ − Bᵢⱼ)² )
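This optimality (the Eckart–Young property) can be checked numerically: for a rank-k truncation, the Frobenius error equals the root of the sum of the squared discarded singular values. The matrix below is an arbitrary random example:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k truncation: keep only the k largest singular values
k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius reconstruction error = sqrt(sum of the dropped σ_i²)
err = np.linalg.norm(A - B, ord='fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:]**2))))  # True
```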
Example of SVD
Case study: How to query?
Q: Find users that like 'Matrix'
A: Map query into a 'concept space' – how?
(Recall the decomposition A = U Σ Vᵀ of the users-to-movies matrix above; columns: Matrix, Alien, Serenity, Casablanca, Amelie.)
Case study: How to query?
Q: Find users that like 'Matrix'
A: Map query into a 'concept space' – how?
q = [5 0 0 0 0]  (ratings for Matrix, Alien, Serenity, Casablanca, Amelie)
Project into concept space: inner product with each 'concept' vector vᵢ
Case study: How to query?
(Figure: q plotted in the Matrix-Alien plane with concept axes v₁ and v₂; the projection of q onto v₁ is q·v₁.)
Case study: How to query?
Compactly, we have:
qconcept = q V
E.g.:
q = [5 0 0 0 0]  (Matrix, Alien, Serenity, Casablanca, Amelie)
movie-to-concept factors V:
0.56  0.12
0.59 -0.02
0.56  0.12
0.09 -0.69
0.09 -0.69
qconcept = q V = [2.8 0.6]  — 2.8 is the strength in the SciFi-concept
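A sketch of this projection using the rounded V values from the slide (the unrounded factors would give slightly different numbers):

```python
import numpy as np

# Rounded movie-to-concept factors V
# rows: Matrix, Alien, Serenity, Casablanca, Amelie
# cols: SciFi-concept, Romance-concept
V = np.array([[0.56,  0.12],
              [0.59, -0.02],
              [0.56,  0.12],
              [0.09, -0.69],
              [0.09, -0.69]])

# Query: a user who rated only 'Matrix' with a 5
q = np.array([5, 0, 0, 0, 0])

q_concept = q @ V
print(np.round(q_concept, 1))  # [2.8 0.6]: strongly in the SciFi-concept
```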
Case study: How to query?
How would the user d that rated ('Alien', 'Serenity') be handled?
dconcept = d V
E.g.:
d = [0 4 5 0 0]  (Matrix, Alien, Serenity, Casablanca, Amelie)
movie-to-concept factors V:
0.56  0.12
0.59 -0.02
0.56  0.12
0.09 -0.69
0.09 -0.69
dconcept = d V = [5.2 0.4]
Case study: How to query?
Observation: user d that rated ('Alien', 'Serenity') will be similar to user q that rated ('Matrix'), although d and q have zero ratings in common!
q = [5 0 0 0 0], qconcept = [2.8 0.6]
d = [0 4 5 0 0], dconcept = [5.2 0.4]
(movie order: Matrix, Alien, Serenity, Casablanca, Amelie)
Zero ratings in common, yet similarity > 0 in concept space.
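The similarity claim can be made concrete with cosine similarity in concept space, using the slide's rounded coordinates:

```python
import numpy as np

# Rating-space vectors: q and d share no rated movies
q = np.array([5, 0, 0, 0, 0])
d = np.array([0, 4, 5, 0, 0])
print(q @ d)  # 0: zero similarity in rating space

# Concept-space coordinates from the slides
q_concept = np.array([2.8, 0.6])
d_concept = np.array([5.2, 0.4])

# In concept space the two users are nearly parallel
cos = (q_concept @ d_concept) / (
    np.linalg.norm(q_concept) * np.linalg.norm(d_concept))
print(cos > 0.95)  # True: both lie along the SciFi-concept axis
```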
SVD: Drawbacks
+ Optimal low-rank approximation in terms of Frobenius norm
- Interpretability problem: a singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity: singular vectors are dense!