Non-linear dimension-reduction methods
Olga Sorkine, January 2006
Overview
Dimensionality reduction of high-dimensional data
Good for learning, visualization and … parameterization
Dimension reduction
Input: points in some D-dimensional space (D is large)
– Images
– Physical measurements
– Statistical data
– etc.
We want to discover some structure/correlation in the input data. Hopefully, the data lives on a d-dimensional surface (d << D).
– Discover the real dimensionality d
– Find a mapping from $\mathbb{R}^D$ to $\mathbb{R}^d$ that preserves something about the data
• Today we'll talk about preserving variance/distances
Discovering linear structures
PCA – finds linear subspaces that best preserve the variance of the data points
Linear is sometimes not enough
When our data points sit on a non-linear manifold
– We won't find a good linear mapping from the data points to a plane, because there isn't any
Today
Two methods to discover such non-linear manifolds:
Isomap (descendant of MultiDimensional Scaling)
Locally Linear Embedding
Notations
Input data points: columns of $X \in \mathbb{R}^{D\times n}$
Assume that the center of mass of the points is the origin
$$X = \begin{bmatrix} | & | & & | \\ x_1 & x_2 & \cdots & x_n \\ | & | & & | \end{bmatrix}$$
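As a concrete reference for this notation, here is a minimal NumPy sketch of storing the points as columns of X and moving their center of mass to the origin; the sizes D, n and the random data are only placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 100, 500                        # ambient dimension and number of points (placeholder sizes)
X = rng.normal(size=(D, n))            # the data points are the columns of X

# Move the center of mass to the origin, as assumed above.
X = X - X.mean(axis=1, keepdims=True)
```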
Reminder about PCA
PCA finds a linear d-dimensional subspace of $\mathbb{R}^D$ along which the variance of the data is largest
Denote by $x'_1, x'_2, \dots, x'_n$ the data points projected onto the d-dimensional subspace. PCA finds the subspace such that:
When we do parallel projection of the data points, the distances between them can only get smaller. So finding a subspace which attains the maximum scatter means the distances are, in a sense, preserved.
$$\max \sum_{i,j} \|x'_i - x'_j\|^2$$
Reminder about PCA
To find the principal axes:
– Compute the scatter matrix $S \in \mathbb{R}^{D\times D}$
– Diagonalize S:
The eigenvectors of S are the principal directions. The eigenvalues are sorted in descending order.
Take d first eigenvectors as the “principal subspace” and project the data points onto this subspace.
$$S = XX^T$$
$$S = \begin{bmatrix} | & & | \\ v_1 & \cdots & v_D \\ | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_D \end{bmatrix} \begin{bmatrix} | & & | \\ v_1 & \cdots & v_D \\ | & & | \end{bmatrix}^T, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_D$$
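A minimal NumPy sketch of these PCA steps, assuming a centered data matrix X as above (the function name and the choice of eigensolver are illustrative):

```python
import numpy as np

def pca_project(X, d):
    """Project the columns of the centered D-by-n matrix X onto the d-dimensional principal subspace."""
    S = X @ X.T                            # scatter matrix S = X X^T (D x D)
    eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues in descending order
    V_d = eigvecs[:, order[:d]]            # first d principal directions (D x d)
    return V_d.T @ X                       # d x n coordinates in the principal subspace
```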
Why does this work?
The eigenvectors $v_i$ are the maxima of the following quadratic form (over unit vectors $v$):
$$f(v) = v^T S v = \langle Sv, v\rangle$$
In fact, we get directions of maximal variance:
$$f(v) = v^T S v = v^T X X^T v = (X^T v)^T (X^T v) = \|X^T v\|^2 = \sum_{i=1}^{n} \langle x_i, v\rangle^2$$
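A quick numerical sanity check of this claim, a sketch that reuses the centered data matrix X from above: random unit vectors never beat the top eigenvector on $f(v) = v^T S v$.

```python
import numpy as np

S = X @ X.T                                       # scatter matrix of the centered data
eigvals, eigvecs = np.linalg.eigh(S)
top_val, top_vec = eigvals[-1], eigvecs[:, -1]    # largest eigenvalue and its eigenvector

rng = np.random.default_rng(1)
for _ in range(1000):
    v = rng.normal(size=S.shape[0])
    v /= np.linalg.norm(v)
    assert v @ S @ v <= top_val + 1e-9            # no unit vector exceeds the top eigenvalue
print(np.isclose(top_vec @ S @ top_vec, top_val)) # the maximum is attained at v = v_1
```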
Multidimensional Scaling
J. Tenenbaum, V. de Silva, J.C. Langford, Science, December 2000
Multidimensional scaling (MDS)
The idea: compute the pairwise distances between the input points:
Now, find n points in low-dimensional space Rd, so that their distance matrix is as close as possible to M.
$$M \in \mathbb{R}^{n\times n}, \qquad M_{ij} = \mathrm{dist}^2(x_i, x_j)$$
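For concreteness, a small NumPy sketch (assuming the D-by-n data matrix X from before) of building this squared-distance matrix:

```python
import numpy as np

def squared_distance_matrix(X):
    """M[i, j] = ||x_i - x_j||^2 for the columns x_i of X."""
    sq_norms = np.sum(X ** 2, axis=0)                           # ||x_i||^2 for each column
    M = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    return np.maximum(M, 0.0)                                   # clamp tiny negatives from round-off

M = squared_distance_matrix(X)
```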
MDS – the math details
We look for $X' = \begin{bmatrix} | & & | \\ x'_1 & \cdots & x'_n \\ | & & | \end{bmatrix} \in \mathbb{R}^{d\times n}$
such that $\|M' - M\|$ is as small as possible, where
$M'$ is the squared Euclidean distance matrix for the points $x'_i$:
$$M' \in \mathbb{R}^{n\times n}, \qquad M'_{ij} = \mathrm{dist}^2(x'_i, x'_j) = \|x'_i - x'_j\|^2$$
MDS – the math details
Ideally, we want:
$$M'_{ij} = M_{ij} \quad \text{for all } i, j$$
$$M'_{ij} = \|x'_i - x'_j\|^2 = \|x'_i\|^2 + \|x'_j\|^2 - 2\langle x'_i, x'_j\rangle$$
In matrix form:
$$M' = \begin{bmatrix} \|x'_1\|^2 & \cdots & \|x'_1\|^2 \\ \vdots & & \vdots \\ \|x'_n\|^2 & \cdots & \|x'_n\|^2 \end{bmatrix} + \begin{bmatrix} \|x'_1\|^2 & \cdots & \|x'_n\|^2 \\ \vdots & & \vdots \\ \|x'_1\|^2 & \cdots & \|x'_n\|^2 \end{bmatrix} - 2\,X'^{\,T}X'$$
We want to get rid of the two norm matrices and keep only $X'^{\,T}X'$.
MDS – the math details
Trick: use the "magic matrix" $J$:
$$J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T = \begin{bmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} \\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & & \vdots \\ \vdots & & \ddots & -\tfrac{1}{n} \\ -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & 1-\tfrac{1}{n} \end{bmatrix}$$
$J$ annihilates constant rows and columns:
$$\begin{bmatrix} a & a & \cdots & a \end{bmatrix} J = 0, \qquad J \begin{bmatrix} b \\ b \\ \vdots \\ b \end{bmatrix} = 0$$
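A tiny NumPy check of this centering matrix and its annihilation property (n = 6 is an arbitrary example size):

```python
import numpy as np

n = 6
J = np.eye(n) - np.ones((n, n)) / n    # J = I - (1/n) 1 1^T

a = np.full(n, 3.7)                    # a constant vector
print(np.allclose(a @ J, 0.0))         # constant rows are annihilated: True
print(np.allclose(J @ a, 0.0))         # constant columns are annihilated: True
```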
MDS – the math details
Cleaning the system:
Multiply by $J$ from both sides:
$$J M' J = J\left(\begin{bmatrix} \|x'_1\|^2 & \cdots & \|x'_1\|^2 \\ \vdots & & \vdots \\ \|x'_n\|^2 & \cdots & \|x'_n\|^2 \end{bmatrix} + \begin{bmatrix} \|x'_1\|^2 & \cdots & \|x'_n\|^2 \\ \vdots & & \vdots \\ \|x'_1\|^2 & \cdots & \|x'_n\|^2 \end{bmatrix} - 2\,X'^{\,T}X'\right) J = -2\,X'^{\,T}X'$$
The constant-row and constant-column matrices are annihilated by $J$, and $J X'^{\,T}X' J = X'^{\,T}X'$ because the points $x'_i$ are centered at the origin. Since we want $M' = M$, we require:
$$X'^{\,T}X' = -\tfrac{1}{2}\,J M J =: B$$
$$X'^{\,T}X' = B$$
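In code, this cleaning step is just double-centering of M; a sketch, continuing from the squared_distance_matrix example above:

```python
import numpy as np

def double_center(M):
    """B = -1/2 * J M J, where J = I - (1/n) 1 1^T."""
    n = M.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * (J @ M @ J)

B = double_center(M)    # ideally B = X'^T X' for the sought embedding X'
```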
How to find X’
We will use the spectral decomposition of B:
$$X'^{\,T}X' = B = \begin{bmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{bmatrix}^T$$
Keeping only the $d$ largest eigenvalues:
$$X'^{\,T}X' \approx \begin{bmatrix} | & & | \\ v_1 & \cdots & v_d \\ | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} \begin{bmatrix} | & & | \\ v_1 & \cdots & v_d \\ | & & | \end{bmatrix}^T$$
How to find X’
So we find X’ by throwing away the last nd eigenvalues
$$X' = \begin{bmatrix} \sqrt{\lambda_1} & & \\ & \ddots & \\ & & \sqrt{\lambda_d} \end{bmatrix} \begin{bmatrix} v_1^T \\ \vdots \\ v_d^T \end{bmatrix} \in \mathbb{R}^{d\times n}$$
This is the minimizer of:
$$X' = \arg\min_{X'} \big\| X'^{\,T}X' - B \big\|_L, \qquad \text{where } \|A\|_L^2 = \sum_{i,j} A_{ij}^2$$
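Putting the pieces together, a sketch of this embedding step in NumPy (it assumes the matrix B from double_center above; d = 2 is only an example):

```python
import numpy as np

def mds_embedding(B, d):
    """Return X' (d x n) with X'^T X' as close as possible to B."""
    eigvals, eigvecs = np.linalg.eigh(B)       # ascending eigenvalues of the symmetric B
    order = np.argsort(eigvals)[::-1][:d]      # indices of the d largest eigenvalues
    lam = np.maximum(eigvals[order], 0.0)      # guard against small negative eigenvalues
    V_d = eigvecs[:, order]                    # n x d
    return np.sqrt(lam)[:, None] * V_d.T       # X' = Lambda_d^{1/2} V_d^T (d x n)

X_embedded = mds_embedding(B, d=2)
```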
Isomap
The idea of Tenenbaum et al.: estimate geodesic distances between the data points (instead of Euclidean distances)
Use K nearest neighbors or ε-balls to define a neighborhood graph
Approximate the geodesics by shortest paths on the graph.
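A minimal sketch of the K-nearest-neighbor graph construction, assuming a D-by-n NumPy data matrix X; the dense matrix representation and the symmetrization choice are illustrative simplifications:

```python
import numpy as np

def knn_graph(X, K):
    """Weighted K-nearest-neighbor graph: W[i, j] = ||x_i - x_j|| for neighbors, inf otherwise."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    dist = np.sqrt(D2)
    W = np.full((n, n), np.inf)
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:K + 1]   # skip the point itself
        W[i, nbrs] = dist[i, nbrs]
        W[nbrs, i] = dist[i, nbrs]            # keep the graph symmetric
    return W
```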
Inducing a graph
Defining neighborhood and weights
$$w_{ij} = \|x_i - x_j\|$$
Finding geodesic paths
Compute weighted shortest paths on the graph (Dijkstra)
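One way to carry out this step is sketched below; it assumes SciPy is available, reuses knn_graph, double_center and mds_embedding from the earlier sketches, and assumes the neighborhood graph is connected:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

W = knn_graph(X, K=8)                                # weighted neighborhood graph (inf = no edge)
W[np.isinf(W)] = 0.0                                 # in a dense matrix, csgraph treats 0 as "no edge"
geo = shortest_path(W, method="D", directed=False)   # Dijkstra from every node

M_geo = geo ** 2                                     # squared geodesic distances (graph must be connected)
B_geo = double_center(M_geo)
Y = mds_embedding(B_geo, d=2)                        # Isomap = MDS on the geodesic distances
```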
Locating new points in the Isomap embedding
Suppose we have a new data point $p \in \mathbb{R}^D$
Want to find where it belongs in the $\mathbb{R}^d$ embedding. Compute the distances from p to all other points:
$$u = \big(\mathrm{dist}^2(p, x_1),\; \mathrm{dist}^2(p, x_2),\; \dots,\; \mathrm{dist}^2(p, x_n)\big)^T$$
$$p' = \tfrac{1}{2}\,\Lambda_d^{-1/2}\,V_d^{\,T}\,(\bar{u} - u)$$
where $V_d$, $\Lambda_d$ hold the $d$ largest eigenvectors/eigenvalues of $B$ and $\bar{u}$ is the mean column of the squared-distance matrix $M$.
Some results
Morph in Isomap space
Flattening results (Zigelman et al.)
Flattening results (Zigelman et al.)
Flattening results (Zigelman et al.)
Locally Linear Embedding
S.T. Roweis and L.K. Saul, Science, December 2000
The idea
Define neighborhood relations between points
– K nearest neighbors
– ε-balls
Find weights that reconstruct each data point from its neighbors:
$$\min_{w} \sum_i \Big\| x_i - \sum_{j\in N(i)} w_{ij}\, x_j \Big\|^2, \qquad \sum_j w_{ij} = 1$$
Find low-dimensional coordinates $x'_1, \dots, x'_n \in \mathbb{R}^d$ so that the same weights hold:
$$\min_{x'_1,\dots,x'_n} \sum_i \Big\| x'_i - \sum_{j\in N(i)} w_{ij}\, x'_j \Big\|^2$$
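A minimal NumPy sketch of the weight step for a single point; the regularization constant is an illustrative choice for the case where the local Gram matrix is singular (more neighbors than dimensions):

```python
import numpy as np

def reconstruction_weights(x_i, neighbors, reg=1e-3):
    """Solve min ||x_i - sum_j w_j * neighbors[:, j]||^2 subject to sum_j w_j = 1.

    x_i: (D,) point; neighbors: (D, K) matrix whose columns are its K neighbors.
    """
    Z = neighbors - x_i[:, None]                    # shift neighbors so x_i is the origin
    G = Z.T @ Z                                     # local K x K Gram matrix
    G += reg * np.trace(G) * np.eye(G.shape[0])     # regularize in case G is singular
    w = np.linalg.solve(G, np.ones(G.shape[0]))     # from the Lagrange-multiplier conditions
    return w / w.sum()                              # enforce sum_j w_j = 1
```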
Local information reconstructs the global one
The weights $w_{ij}$ capture the local shape
– Invariant to translation, rotation and scale of the neighborhood
– If the neighborhood lies on a manifold, the local mapping from the global coordinates ($\mathbb{R}^D$) to the surface coordinates ($\mathbb{R}^d$) is almost linear
– Thus, the weights $w_{ij}$ should hold also for the manifold ($\mathbb{R}^d$) coordinate system!
$$\min_{w} \sum_i \Big\| x_i - \sum_{j\in N(i)} w_{ij}\, x_j \Big\|^2, \qquad \sum_j w_{ij} = 1$$
$$\min_{x'_1,\dots,x'_n} \sum_i \Big\| x'_i - \sum_{j\in N(i)} w_{ij}\, x'_j \Big\|^2$$
Solving the minimizations
Linear least squares (using Lagrange multipliers) solves the weight minimization:
$$\min_{w} \sum_i \Big\| x_i - \sum_{j\in N(i)} w_{ij}\, x_j \Big\|^2, \qquad \sum_j w_{ij} = 1$$
To find $x'_1, \dots, x'_n \in \mathbb{R}^d$ that minimize
$$\sum_i \Big\| x'_i - \sum_{j\in N(i)} w_{ij}\, x'_j \Big\|^2,$$
a sparse eigen-problem is solved. Additional constraints are added for conditioning:
$$\sum_i x'_i = 0, \qquad \frac{1}{n}\sum_i x'_i\, x'^{\,T}_i = I$$
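A sketch of the embedding step, assuming a full n-by-n weight matrix W whose row i holds the weights $w_{ij}$ found above; a dense eigensolver stands in for the sparse eigen-problem mentioned on the slide:

```python
import numpy as np

def lle_embedding(W, d):
    """Minimize sum_i ||x'_i - sum_j W[i, j] x'_j||^2 under the centering/covariance constraints."""
    n = W.shape[0]
    E = np.eye(n) - W                          # the cost is trace(X' M X'^T) with M = (I - W)^T (I - W)
    M_cost = E.T @ E
    eigvals, eigvecs = np.linalg.eigh(M_cost)
    # Discard the constant eigenvector (eigenvalue ~ 0) and keep the next d smallest.
    Y = eigvecs[:, 1:d + 1].T * np.sqrt(n)     # scaling enforces (1/n) sum_i x'_i x'_i^T = I
    return Y                                   # d x n embedded coordinates
```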
Some results
The Swiss roll
Some results
Some results
The end