Non-linear dimension-reduction methods
Olga Sorkine, January 2006
Overview
Dimensionality reduction of high-dimensional data
Good for learning, visualization and … parameterization
Dimension reduction
Input: points in some D-dimensional space (D is large)
– Images
– Physical measurements
– Statistical data
– etc.
We want to discover some structure/correlation in the input data. Hopefully, the data lives on a d-dimensional surface (d << D).
– Discover the real dimensionality d
– Find a mapping from $\mathbb{R}^D$ to $\mathbb{R}^d$ that preserves something about the data
• Today we'll talk about preserving variance/distances
Discovering linear structures
PCA – finds linear subspaces that best preserve the variance of the data points
Linear is sometimes not enough
When our data points sit on a non-linear manifold
– We won't find a good linear mapping from the data points to a plane, because there isn't any
Today
Two methods to discover such non-linear manifolds:
Isomap (descendant of MultiDimensional Scaling)
Locally Linear Embedding
Notations
Input data points: columns of $X \in \mathbb{R}^{D\times n}$
Assume that the center of mass of the points is the origin
$$X = \begin{bmatrix} | & | & & | \\ x_1 & x_2 & \cdots & x_n \\ | & | & & | \end{bmatrix}$$
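As a concrete reference for this notation, here is a minimal NumPy sketch of storing the points as columns of X and moving their center of mass to the origin; the sizes D, n and the random data are only placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
D, n = 100, 500                        # ambient dimension and number of points (placeholder sizes)
X = rng.normal(size=(D, n))            # the data points are the columns of X

# Move the center of mass to the origin, as assumed above.
X = X - X.mean(axis=1, keepdims=True)
```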
Reminder about PCA
PCA finds a linear d-dimensional subspace of $\mathbb{R}^D$ along which the variance of the data is largest
Denote by $x'_1, x'_2, \dots, x'_n$ the data points projected onto the d-dimensional subspace. PCA finds the subspace such that:
When we do parallel projection of the data points, the distances between them can only get smaller. So finding a subspace which attains the maximum scatter means the distances are, in a sense, preserved.
$$\max \sum_{i,j} \|x'_i - x'_j\|^2$$
Reminder about PCA
To find the principal axes:
– Compute the scatter matrix $S \in \mathbb{R}^{D\times D}$
– Diagonalize S:
The eigenvectors of S are the principal directions. The eigenvalues are sorted in descending order.
Take d first eigenvectors as the “principal subspace” and project the data points onto this subspace.
$$S = XX^T$$
$$S = \begin{bmatrix} | & & | \\ v_1 & \cdots & v_D \\ | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_D \end{bmatrix} \begin{bmatrix} | & & | \\ v_1 & \cdots & v_D \\ | & & | \end{bmatrix}^T, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_D$$
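A minimal NumPy sketch of these PCA steps, assuming a centered data matrix X as above (the function name and the choice of eigensolver are illustrative):

```python
import numpy as np

def pca_project(X, d):
    """Project the columns of the centered D-by-n matrix X onto the d-dimensional principal subspace."""
    S = X @ X.T                            # scatter matrix S = X X^T (D x D)
    eigvals, eigvecs = np.linalg.eigh(S)   # S is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # sort eigenvalues in descending order
    V_d = eigvecs[:, order[:d]]            # first d principal directions (D x d)
    return V_d.T @ X                       # d x n coordinates in the principal subspace
```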
Why does this work?
The eigenvectors $v_i$ are the maxima of the following quadratic form (over unit vectors $v$):
$$f(v) = v^T S v = \langle Sv, v\rangle$$
In fact, we get directions of maximal variance:
$$f(v) = v^T S v = v^T X X^T v = (X^T v)^T (X^T v) = \|X^T v\|^2 = \sum_{i=1}^{n} \langle x_i, v\rangle^2$$
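A quick numerical sanity check of this claim, a sketch that reuses the centered data matrix X from above: random unit vectors never beat the top eigenvector on $f(v) = v^T S v$.

```python
import numpy as np

S = X @ X.T                                       # scatter matrix of the centered data
eigvals, eigvecs = np.linalg.eigh(S)
top_val, top_vec = eigvals[-1], eigvecs[:, -1]    # largest eigenvalue and its eigenvector

rng = np.random.default_rng(1)
for _ in range(1000):
    v = rng.normal(size=S.shape[0])
    v /= np.linalg.norm(v)
    assert v @ S @ v <= top_val + 1e-9            # no unit vector exceeds the top eigenvalue
print(np.isclose(top_vec @ S @ top_vec, top_val)) # the maximum is attained at v = v_1
```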
Multidimensional Scaling
J. Tenenbaum, V. de Silva, J.C. Langford, Science, December 2000
Multidimensional scaling (MDS)
The idea: compute the pairwise distances between the input points:
Now, find n points in low-dimensional space Rd, so that their distance matrix is as close as possible to M.
$$M \in \mathbb{R}^{n\times n}, \qquad M_{ij} = \mathrm{dist}^2(x_i, x_j)$$
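For concreteness, a small NumPy sketch (assuming the D-by-n data matrix X from before) of building this squared-distance matrix:

```python
import numpy as np

def squared_distance_matrix(X):
    """M[i, j] = ||x_i - x_j||^2 for the columns x_i of X."""
    sq_norms = np.sum(X ** 2, axis=0)                           # ||x_i||^2 for each column
    M = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    return np.maximum(M, 0.0)                                   # clamp tiny negatives from round-off

M = squared_distance_matrix(X)
```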
MDS – the math details
We look for $X' = \begin{bmatrix} | & & | \\ x'_1 & \cdots & x'_n \\ | & & | \end{bmatrix} \in \mathbb{R}^{d\times n}$
such that $\|M' - M\|$ is as small as possible, where
$M'$ is the squared Euclidean distance matrix for the points $x'_i$:
$$M' \in \mathbb{R}^{n\times n}, \qquad M'_{ij} = \mathrm{dist}^2(x'_i, x'_j) = \|x'_i - x'_j\|^2$$
MDS – the math details
Ideally, we want:
$$M'_{ij} = M_{ij} \quad \text{for all } i, j$$
$$M'_{ij} = \|x'_i - x'_j\|^2 = \|x'_i\|^2 + \|x'_j\|^2 - 2\langle x'_i, x'_j\rangle$$
In matrix form:
$$M' = \begin{bmatrix} \|x'_1\|^2 & \cdots & \|x'_1\|^2 \\ \vdots & & \vdots \\ \|x'_n\|^2 & \cdots & \|x'_n\|^2 \end{bmatrix} + \begin{bmatrix} \|x'_1\|^2 & \cdots & \|x'_n\|^2 \\ \vdots & & \vdots \\ \|x'_1\|^2 & \cdots & \|x'_n\|^2 \end{bmatrix} - 2\,X'^{\,T}X'$$
We want to get rid of the two norm matrices and keep only $X'^{\,T}X'$.
MDS – the math details
Trick: use the "magic matrix" $J$:
$$J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T = \begin{bmatrix} 1-\tfrac{1}{n} & -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} \\ -\tfrac{1}{n} & 1-\tfrac{1}{n} & & \vdots \\ \vdots & & \ddots & -\tfrac{1}{n} \\ -\tfrac{1}{n} & \cdots & -\tfrac{1}{n} & 1-\tfrac{1}{n} \end{bmatrix}$$
$J$ annihilates constant rows and columns:
$$\begin{bmatrix} a & a & \cdots & a \end{bmatrix} J = 0, \qquad J \begin{bmatrix} b \\ b \\ \vdots \\ b \end{bmatrix} = 0$$
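A tiny NumPy check of this centering matrix and its annihilation property (n = 6 is an arbitrary example size):

```python
import numpy as np

n = 6
J = np.eye(n) - np.ones((n, n)) / n    # J = I - (1/n) 1 1^T

a = np.full(n, 3.7)                    # a constant vector
print(np.allclose(a @ J, 0.0))         # constant rows are annihilated: True
print(np.allclose(J @ a, 0.0))         # constant columns are annihilated: True
```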
MDS – the math details
Cleaning the system:
Multiply by $J$ from both sides:
$$J M' J = J\left(\begin{bmatrix} \|x'_1\|^2 & \cdots & \|x'_1\|^2 \\ \vdots & & \vdots \\ \|x'_n\|^2 & \cdots & \|x'_n\|^2 \end{bmatrix} + \begin{bmatrix} \|x'_1\|^2 & \cdots & \|x'_n\|^2 \\ \vdots & & \vdots \\ \|x'_1\|^2 & \cdots & \|x'_n\|^2 \end{bmatrix} - 2\,X'^{\,T}X'\right) J = -2\,X'^{\,T}X'$$
The constant-row and constant-column matrices are annihilated by $J$, and $J X'^{\,T}X' J = X'^{\,T}X'$ because the points $x'_i$ are centered at the origin. Since we want $M' = M$, we require:
$$X'^{\,T}X' = -\tfrac{1}{2}\,J M J =: B$$
$$X'^{\,T}X' = B$$
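In code, this cleaning step is just double-centering of M; a sketch, continuing from the squared_distance_matrix example above:

```python
import numpy as np

def double_center(M):
    """B = -1/2 * J M J, where J = I - (1/n) 1 1^T."""
    n = M.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * (J @ M @ J)

B = double_center(M)    # ideally B = X'^T X' for the sought embedding X'
```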
How to find X’
We will use the spectral decomposition of B:
$$X'^{\,T}X' = B = \begin{bmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} \begin{bmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{bmatrix}^T$$
Keeping only the $d$ largest eigenvalues:
$$X'^{\,T}X' \approx \begin{bmatrix} | & & | \\ v_1 & \cdots & v_d \\ | & & | \end{bmatrix} \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_d \end{bmatrix} \begin{bmatrix} | & & | \\ v_1 & \cdots & v_d \\ | & & | \end{bmatrix}^T$$
How to find X’
So we find X’ by throwing away the last nd eigenvalues
$$X' = \begin{bmatrix} \sqrt{\lambda_1} & & \\ & \ddots & \\ & & \sqrt{\lambda_d} \end{bmatrix} \begin{bmatrix} v_1^T \\ \vdots \\ v_d^T \end{bmatrix} \in \mathbb{R}^{d\times n}$$
This is the minimizer of:
$$X' = \arg\min_{X'} \big\| X'^{\,T}X' - B \big\|_L, \qquad \text{where } \|A\|_L^2 = \sum_{i,j} A_{ij}^2$$
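Putting the pieces together, a sketch of this embedding step in NumPy (it assumes the matrix B from double_center above; d = 2 is only an example):

```python
import numpy as np

def mds_embedding(B, d):
    """Return X' (d x n) with X'^T X' as close as possible to B."""
    eigvals, eigvecs = np.linalg.eigh(B)       # ascending eigenvalues of the symmetric B
    order = np.argsort(eigvals)[::-1][:d]      # indices of the d largest eigenvalues
    lam = np.maximum(eigvals[order], 0.0)      # guard against small negative eigenvalues
    V_d = eigvecs[:, order]                    # n x d
    return np.sqrt(lam)[:, None] * V_d.T       # X' = Lambda_d^{1/2} V_d^T (d x n)

X_embedded = mds_embedding(B, d=2)
```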
Isomap
The idea of Tenenbaum et al.: estimate geodesic distances between the data points (instead of Euclidean distances)
Use K nearest neighbors or ε-balls to define a neighborhood graph
Approximate the geodesics by shortest paths on the graph.
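A minimal sketch of the K-nearest-neighbor graph construction, assuming a D-by-n NumPy data matrix X; the dense matrix representation and the symmetrization choice are illustrative simplifications:

```python
import numpy as np

def knn_graph(X, K):
    """Weighted K-nearest-neighbor graph: W[i, j] = ||x_i - x_j|| for neighbors, inf otherwise."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X.T @ X), 0.0)
    dist = np.sqrt(D2)
    W = np.full((n, n), np.inf)
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:K + 1]   # skip the point itself
        W[i, nbrs] = dist[i, nbrs]
        W[nbrs, i] = dist[i, nbrs]            # keep the graph symmetric
    return W
```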
Inducing a graph
Defining neighborhood and weights
$$w_{ij} = \|x_i - x_j\|$$
Finding geodesic paths
Compute weighted shortest paths on the graph (Dijkstra)
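One way to carry out this step is sketched below; it assumes SciPy is available, reuses knn_graph, double_center and mds_embedding from the earlier sketches, and assumes the neighborhood graph is connected:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

W = knn_graph(X, K=8)                                # weighted neighborhood graph (inf = no edge)
W[np.isinf(W)] = 0.0                                 # in a dense matrix, csgraph treats 0 as "no edge"
geo = shortest_path(W, method="D", directed=False)   # Dijkstra from every node

M_geo = geo ** 2                                     # squared geodesic distances (graph must be connected)
B_geo = double_center(M_geo)
Y = mds_embedding(B_geo, d=2)                        # Isomap = MDS on the geodesic distances
```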
Locating new points in the Isomap embedding
Suppose we have a new data point $p \in \mathbb{R}^D$
Want to find where it belongs in the $\mathbb{R}^d$ embedding. Compute the distances from p to all other points:
$$u = \big(\mathrm{dist}^2(p, x_1),\; \mathrm{dist}^2(p, x_2),\; \dots,\; \mathrm{dist}^2(p, x_n)\big)^T$$
$$p' = \tfrac{1}{2}\,\Lambda_d^{-1/2}\,V_d^{\,T}\,(\bar{u} - u)$$
where $V_d$, $\Lambda_d$ hold the $d$ largest eigenvectors/eigenvalues of $B$ and $\bar{u}$ is the mean column of the squared-distance matrix $M$.
Some results
Morph in Isomap space
Flattening results (Zigelman et al.)
Flattening results (Zigelman et al.)
Flattening results (Zigelman et al.)
Locally Linear Embedding
S.T. Roweis and L.K. Saul, Science, December 2000
The idea
Define neighborhood relations between points
– K nearest neighbors
– ε-balls
Find weights that reconstruct each data point from its neighbors:
$$\min_{w} \sum_i \Big\| x_i - \sum_{j\in N(i)} w_{ij}\, x_j \Big\|^2, \qquad \sum_j w_{ij} = 1$$
Find low-dimensional coordinates $x'_1, \dots, x'_n \in \mathbb{R}^d$ so that the same weights hold:
$$\min_{x'_1,\dots,x'_n} \sum_i \Big\| x'_i - \sum_{j\in N(i)} w_{ij}\, x'_j \Big\|^2$$
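A minimal NumPy sketch of the weight step for a single point; the regularization constant is an illustrative choice for the case where the local Gram matrix is singular (more neighbors than dimensions):

```python
import numpy as np

def reconstruction_weights(x_i, neighbors, reg=1e-3):
    """Solve min ||x_i - sum_j w_j * neighbors[:, j]||^2 subject to sum_j w_j = 1.

    x_i: (D,) point; neighbors: (D, K) matrix whose columns are its K neighbors.
    """
    Z = neighbors - x_i[:, None]                    # shift neighbors so x_i is the origin
    G = Z.T @ Z                                     # local K x K Gram matrix
    G += reg * np.trace(G) * np.eye(G.shape[0])     # regularize in case G is singular
    w = np.linalg.solve(G, np.ones(G.shape[0]))     # from the Lagrange-multiplier conditions
    return w / w.sum()                              # enforce sum_j w_j = 1
```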
Local information reconstructs the global one
The weights $w_{ij}$ capture the local shape
– Invariant to translation, rotation and scale of the neighborhood
– If the neighborhood lies on a manifold, the local mapping from the global coordinates ($\mathbb{R}^D$) to the surface coordinates ($\mathbb{R}^d$) is almost linear
– Thus, the weights $w_{ij}$ should hold also for the manifold ($\mathbb{R}^d$) coordinate system!
$$\min_{w} \sum_i \Big\| x_i - \sum_{j\in N(i)} w_{ij}\, x_j \Big\|^2, \qquad \sum_j w_{ij} = 1$$
$$\min_{x'_1,\dots,x'_n} \sum_i \Big\| x'_i - \sum_{j\in N(i)} w_{ij}\, x'_j \Big\|^2$$
Solving the minimizations
Linear least squares (using Lagrange multipliers) solves the weight minimization:
$$\min_{w} \sum_i \Big\| x_i - \sum_{j\in N(i)} w_{ij}\, x_j \Big\|^2, \qquad \sum_j w_{ij} = 1$$
To find $x'_1, \dots, x'_n \in \mathbb{R}^d$ that minimize
$$\sum_i \Big\| x'_i - \sum_{j\in N(i)} w_{ij}\, x'_j \Big\|^2,$$
a sparse eigen-problem is solved. Additional constraints are added for conditioning:
$$\sum_i x'_i = 0, \qquad \frac{1}{n}\sum_i x'_i\, x'^{\,T}_i = I$$
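A sketch of the embedding step, assuming a full n-by-n weight matrix W whose row i holds the weights $w_{ij}$ found above; a dense eigensolver stands in for the sparse eigen-problem mentioned on the slide:

```python
import numpy as np

def lle_embedding(W, d):
    """Minimize sum_i ||x'_i - sum_j W[i, j] x'_j||^2 under the centering/covariance constraints."""
    n = W.shape[0]
    E = np.eye(n) - W                          # the cost is trace(X' M X'^T) with M = (I - W)^T (I - W)
    M_cost = E.T @ E
    eigvals, eigvecs = np.linalg.eigh(M_cost)
    # Discard the constant eigenvector (eigenvalue ~ 0) and keep the next d smallest.
    Y = eigvecs[:, 1:d + 1].T * np.sqrt(n)     # scaling enforces (1/n) sum_i x'_i x'_i^T = I
    return Y                                   # d x n embedded coordinates
```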
Some results
The Swiss roll
Some results
Some results
The end