Department of Computer Science
CSCI 5622: Machine Learning
Chenhao Tan
Lecture 16: Dimensionality Reduction
Slides adapted from Jordan Boyd-Graber, Chris Ketelsen
Midterm
A. Review session
B. Flipped classroom
C. Go over the example midterm
D. Clustering!
Learning objectives
• Understand what unsupervised learning is for
• Learn principal component analysis
• Learn singular value decomposition
Supervised learning
• Data: X, Labels: Y

Unsupervised learning
• Data: X
• Latent structure: Z
When do we need unsupervised learning?
• Acquiring labels is expensive
• You may not even know what labels to acquire
• Exploratory data analysis
• Learn patterns/representations that can be useful for supervised learning (representation learning)
• Generate data
• …
https://qz.com/1090267/artificial-intelligence-can-now-show-you-how-those-pants-will-fit/
Unsupervised learning
• Dimensionality reduction
• Clustering
• Topic modeling
Principal Component Analysis – Motivation
The data’s features are almost certainly correlated
Makes it hard to see hidden structure
To make this easier, let’s try to reduce the data to one dimension
We need to shift our perspective:
• Change the definition of up-down-left-right
• Choose new features as linear combinations of old features
• Change of feature-basis
Important: Center and normalize the data before performing PCA. We will assume this has already been done in this lecture.
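The centering and normalization step can be sketched in a few lines of NumPy (the data here is synthetic, just for illustration):

```python
import numpy as np

# Hypothetical data: 100 samples, two features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0], size=(100, 2))

# Center: subtract the per-feature mean so every column has mean 0.
X_centered = X - X.mean(axis=0)

# Normalize: divide by the per-feature standard deviation so every
# column has unit variance (standardization).
X_std = X_centered / X_centered.std(axis=0)
```

Without this step, features with large raw scales would dominate the variance that PCA tries to capture.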
Proceed incrementally:
• If we could choose only one combination to describe the data, which combination leads to the least loss of information?
• Once we’ve found that one, look for another, perpendicular to the first, that retains the next most information
• Repeat until done (or good enough)
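This greedy procedure can be sketched with power iteration plus deflation (a minimal illustration only, not how production PCA is implemented; library routines use an eigendecomposition or SVD instead):

```python
import numpy as np

def top_components(X, k, n_iter=500):
    """Greedy PCA sketch: find the unit vector capturing the most
    remaining variance (power iteration on the covariance matrix),
    remove that variance (deflation), and repeat."""
    Xc = X - X.mean(axis=0)          # PCA assumes centered data
    C = Xc.T @ Xc / (len(Xc) - 1)    # sample covariance matrix
    rng = np.random.default_rng(0)
    components = []
    for _ in range(k):
        w = rng.normal(size=C.shape[0])
        for _ in range(n_iter):      # power iteration -> top eigenvector
            w = C @ w
            w /= np.linalg.norm(w)
        components.append(w)
        lam = w @ C @ w              # variance captured by this component
        C -= lam * np.outer(w, w)    # deflate: remove the captured variance
    return np.array(components)
```

Each returned vector is a unit vector, and each is orthogonal to the ones found before it, exactly the properties listed above.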
The best vector to project onto is called the 1st principal component. What properties should it have?
• Should capture the largest variance in the data
• Should probably be a unit vector
After we’ve found the first, look for the second, which:
• Captures the largest amount of leftover variance
• Should probably be a unit vector
• Should be orthogonal to the one that came before it
Main idea: The principal components give a new perpendicular coordinate system for viewing the data, where each principal component describes successively less information.
So far: All we’ve done is a change of basis on the feature space.
But when do we reduce the dimension?
Picture data points in a 3D feature space
What if the points lay mostly along a single vector?
The other two principal components are still there
But they do not carry much information
Throw them away and work with a low-dimensional representation!
Reduce 3D data to 1D
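This 3D-to-1D reduction can be sketched numerically (the data here is synthetic by construction: points along one direction plus a little noise):

```python
import numpy as np

# Synthetic 3D data lying mostly along a single direction.
rng = np.random.default_rng(0)
d = np.array([1.0, 2.0, 3.0])
d /= np.linalg.norm(d)
X = 10 * rng.normal(size=(500, 1)) * d + rng.normal(scale=0.1, size=(500, 3))

# Principal components = eigenvectors of the covariance of centered data.
Xc = X - X.mean(axis=0)
_, eigvecs = np.linalg.eigh(Xc.T @ Xc / (len(Xc) - 1))
w1 = eigvecs[:, -1]              # eigh sorts ascending, so last = 1st PC

# Keep only the projection onto w1: 3D data reduced to 1D.
z = Xc @ w1                      # shape (500,)

# Reconstruct from the single kept component; very little is lost.
X_approx = X.mean(axis=0) + np.outer(z, w1)
```

The relative reconstruction error is tiny because the two discarded components carried almost no variance.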
Principal Component Analysis – The How
But how do we find w?
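A sketch of the standard answer (stated here without the full derivation): maximizing the projected variance w^T C w subject to ||w|| = 1 leads, via a Lagrange multiplier, to the eigenvalue problem C w = λw, so w is the top eigenvector of the covariance matrix.

```python
import numpy as np

# Synthetic correlated 2D data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
w = eigvecs[:, -1]                     # 1st principal component

# Sanity check: no other unit vector captures more variance than w.
for theta in np.linspace(0.0, np.pi, 181):
    u = np.array([np.cos(theta), np.sin(theta)])
    assert u @ C @ u <= w @ C @ w + 1e-9
```

The brute-force sweep over unit vectors is only a check; the eigendecomposition already gives the maximizer directly.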
PCA – Dimensionality reduction
Questions:
• How do we reduce dimensionality?
• How much stuff should we keep?
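One common answer to "how much should we keep": each eigenvalue of the covariance matrix is the variance captured by one component, so keep enough components to cover, say, 95% of the total. A sketch on synthetic data:

```python
import numpy as np

# Synthetic data: 5 features, only the first few carry real variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([5.0, 3.0, 1.0, 0.2, 0.1])
Xc = X - X.mean(axis=0)
eigvals = np.linalg.eigvalsh(Xc.T @ Xc / (len(Xc) - 1))[::-1]  # descending

# Fraction of total variance explained by each component, cumulatively.
explained = eigvals / eigvals.sum()
cumulative = np.cumsum(explained)
k = int(np.searchsorted(cumulative, 0.95)) + 1
print(f"keep {k} of 5 components to explain >= 95% of the variance")
```

The 95% threshold is a convention, not a rule; the right cutoff depends on the downstream task.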
Quiz
PCA – Applications
Connecting PCA and SVD
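The connection can be checked numerically: if Xc = U S Vᵀ is the SVD of the centered data matrix, the rows of Vᵀ are the principal components and the covariance eigenvalues are S² / (n − 1). A quick sketch on synthetic data:

```python
import numpy as np

# Synthetic correlated data for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)
n = len(Xc)

# SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigendecomposition of the covariance matrix, sorted descending.
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (n - 1))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Singular values squared (scaled) equal the covariance eigenvalues,
# and the right singular vectors match the eigenvectors up to sign.
assert np.allclose(S**2 / (n - 1), eigvals)
for i in range(3):
    assert np.isclose(abs(Vt[i] @ eigvecs[:, i]), 1.0)
```

In practice, computing PCA via the SVD of Xc is preferred: it avoids forming XᵀX, which squares the condition number.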
SVD Applications
Wrap up
Dimensionality reduction can be a useful way to
• explore data
• visualize data
• represent data