Nonlinear Dimensionality Reduction Approaches
Dimensionality Reduction
The goal: to discover the meaningful low-dimensional structures hidden in high-dimensional observations.
Classical techniques: Principal Component Analysis (preserves the variance) and Multidimensional Scaling (preserves inter-point distances)
Nonlinear techniques: Isomap and Locally Linear Embedding
Common Framework
Algorithm: Given data x_1, ..., x_n in R^D:
Construct an n x n affinity matrix M.
Normalize M, yielding M~.
Compute the m largest eigenvalues λ_j and eigenvectors v_j of M~. Only positive eigenvalues should be considered.
The embedding of each example x_i is the vector y_i with y_ij the i-th element of the j-th principal eigenvector v_j of M~. Alternatively (MDS and Isomap), the embedding is e_i, with e_ij = sqrt(λ_j) y_ij. If the first m eigenvalues are positive, then e_i . e_j is the best approximation of M~_ij using only m coordinates, in the sense of squared error.
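The final eigendecomposition step of this common framework can be sketched as follows (a minimal numpy illustration; the function name is ours, and the sqrt(λ) scaling follows the MDS/Isomap variant described above):

```python
import numpy as np

def spectral_embedding(M_tilde, m):
    """Last step of the common framework: embed each example using the m
    largest positive eigenvalues/eigenvectors of the normalized affinity."""
    vals, vecs = np.linalg.eigh(M_tilde)      # ascending order for symmetric input
    order = np.argsort(vals)[::-1]            # largest eigenvalues first
    vals, vecs = vals[order[:m]], vecs[:, order[:m]]
    keep = vals > 0                           # only positive eigenvalues count
    # e_ij = sqrt(lambda_j) * v_ij  (the MDS/Isomap variant)
    return vecs[:, keep] * np.sqrt(vals[keep])
```

When M~ is a centered Gram matrix of rank m, the embedding reproduces its dot products exactly, which is the squared-error optimality mentioned above.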
Linear Dimensionality Reduction PCA
Finds a low-dimensional embedding of the data points that best preserves their variance as measured in the high-dimensional input space
MDS Finds an embedding that preserves the inter-point
distances, equivalent to PCA when the distances are Euclidean.
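A minimal numpy sketch of PCA's variance-preserving projection (the `pca` helper and its interface are ours, for illustration):

```python
import numpy as np

def pca(X, m):
    """Project centered data onto the m directions of maximum variance,
    i.e. the top eigenvectors of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (len(X) - 1)               # sample covariance
    vals, vecs = np.linalg.eigh(C)             # ascending eigenvalues
    top = vecs[:, np.argsort(vals)[::-1][:m]]  # m principal directions
    return Xc @ top
```

The first component's variance is, by construction, at least as large as the variance along any single input axis.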
Multi-Dimensional Scaling
MDS starts from a notion of distance or affinity that is computed between each pair of training examples.
The normalizing step is equivalent to converting distances to dot products using the "double-centering" formula:

M~_ij = -1/2 (M_ij - S_i/n - S_j/n + (1/n^2) Σ_k S_k),  where S_i = Σ_j M_ij.

The embedding e_ik of example x_i is given by sqrt(λ_k) v_ik, where v_k is the k-th eigenvector of M~. Note that if M_ij = ||y_i - y_j||^2, then M~_ij = (y_i - y¯) . (y_j - y¯), where y¯ is the average value of the y_i.
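The double-centering formula can be sketched in numpy as follows (a minimal illustration; the function name is ours):

```python
import numpy as np

def double_center(M):
    """Double-centering: convert a matrix of squared pairwise distances M
    into the centered dot-product matrix M~."""
    n = M.shape[0]
    S = M.sum(axis=1)                         # S_i = sum_j M_ij
    return -0.5 * (M - S[:, None] / n - S[None, :] / n + S.sum() / n**2)
```

Feeding it the squared Euclidean distances of a point set returns exactly the Gram matrix of the centered points, which is the property noted above.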
Nonlinear Dimensionality Reduction
Many data sets contain essential nonlinear structures that are invisible to PCA and MDS.
We therefore resort to nonlinear dimensionality reduction approaches.
A Global Geometric Framework for Nonlinear Dimensionality Reduction (Isomap)
Joshua B. Tenenbaum, Vin de Silva, John C. Langford
Example
64x64 input images form 4096-dimensional vectors.
Intrinsically, three dimensions are enough to represent them: two pose parameters and the azimuthal lighting angle.
Isomap Advantages
Combines the major algorithmic features of PCA and MDS: computational efficiency, global optimality, and asymptotic convergence guarantees.
Flexible enough to learn a broad class of nonlinear manifolds.
Example of Nonlinear Structure Swiss roll
Only the geodesic distances reflect the true low-dimensional geometry of the manifold.
Intuition
Built on top of MDS.
Captures the geodesic manifold path between any two points by concatenating shortest paths in-between.
Approximates these in-between shortest paths given only the input-space distances.
Algorithm Description
Step 1: Determine which points are neighbors, within a fixed radius, based on the input-space distances d_X(i,j). These neighborhood relations are represented as a weighted graph G over the data points.
Step 2: Estimate the geodesic distances d_M(i,j) between all pairs of points on the manifold M by computing their shortest-path distances d_G(i,j) in the graph G.
Step 3: Construct an embedding of the data in d-dimensional Euclidean space Y that best preserves the manifold's geometry.
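The three steps can be sketched end-to-end in numpy (a minimal, unoptimized illustration; it uses k nearest neighbors rather than a fixed radius, Floyd-Warshall for the shortest paths, and all names are ours):

```python
import numpy as np

def isomap(X, n_neighbors=5, d=2):
    """Minimal Isomap sketch: kNN graph -> shortest paths -> classical MDS."""
    n = X.shape[0]
    # Step 1: input-space distances d_X and k-nearest-neighbor graph G
    diff = X[:, None, :] - X[None, :, :]
    dX = np.sqrt((diff ** 2).sum(-1))
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    nn = np.argsort(dX, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        G[i, nn[i]] = dX[i, nn[i]]
        G[nn[i], i] = dX[i, nn[i]]            # keep the graph symmetric
    # Step 2: geodesic estimates d_G via Floyd-Warshall shortest paths
    for k in range(n):
        G = np.minimum(G, G[:, k][:, None] + G[k, :][None, :])
    # Step 3: classical MDS on the geodesic distances
    S = G ** 2
    rs = S.mean(axis=1)
    B = -0.5 * (S - rs[:, None] - rs[None, :] + S.mean())
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
```

On points sampled along a curved arc, the recovered 1-D coordinate tracks arc length rather than the straight-line chord, which is exactly the point of using geodesic distances.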
Construct Embeddings
The coordinate vectors y_i for points in Y are chosen to minimize the cost function

E = || τ(D_G) - τ(D_Y) ||_{L2},

where D_Y denotes the matrix of Euclidean distances d_Y(i,j) = ||y_i - y_j||, and the matrix norm is ||A||_{L2} = sqrt(Σ_{i,j} A_ij^2). The operator τ converts distances to inner products (it applies the double-centering formula to the squared distances).
Dimension
The true dimensionality of data can be estimated from the decrease in error as the dimensionality of Y is increased.
Manifold Recovery Guarantee
Isomap is guaranteed asymptotically to recover the true dimensionality and geometric structure of nonlinear manifolds.
As the number of sample points increases, the graph distances d_G(i,j) provide increasingly better approximations to the intrinsic geodesic distances d_M(i,j).
Examples
Interpolations between distant points in the low-dimensional coordinate space.
Summary
Isomap handles nonlinear manifolds while keeping the advantages of PCA and MDS: a non-iterative, polynomial-time procedure with guaranteed convergence.
Isomap represents the global structure of a data set within a single coordinate system.
Nonlinear Dimensionality Reduction by Locally Linear Embedding
Sam T. Roweis and Lawrence K. Saul
LLE
Neighborhood preserving embeddings Mapping to global coordinate system of low
dimensionality No need to estimate pairwise distances
between widely separated points Recovering global nonlinear structure from
locally linear fits
Algorithm Description
We expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold.
We reconstruct each point from its neighbors by minimizing the error

ε(W) = Σ_i || X_i - Σ_j W_ij X_j ||^2,

where W_ij summarizes the contribution of the j-th data point to the reconstruction of the i-th, and is estimated by minimizing this error subject to two constraints: each point is reconstructed only from its neighbors (W_ij = 0 if X_j is not a neighbor of X_i), and the weights for each point sum to 1 (Σ_j W_ij = 1).
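Solving for the weights of a single point reduces to a small constrained least-squares problem on the local Gram matrix (a minimal sketch; the function name is ours, and the regularization term, which handles a singular Gram matrix, follows common LLE practice):

```python
import numpy as np

def lle_weights(X, i, neighbors, reg=1e-3):
    """Reconstruction weights for X[i] from the listed neighbors, minimizing
    |X_i - sum_j W_ij X_j|^2 subject to sum_j W_ij = 1."""
    Z = X[neighbors] - X[i]                   # shift neighbors to the origin
    C = Z @ Z.T                               # local Gram (covariance) matrix
    # regularize in case C is singular (e.g. more neighbors than dimensions)
    C = C + reg * np.trace(C) * np.eye(len(neighbors))
    w = np.linalg.solve(C, np.ones(len(neighbors)))
    return w / w.sum()                        # enforce the sum-to-one constraint
```

If a point lies exactly in the affine span of its neighbors, the recovered weights reconstruct it exactly.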
Algorithm Description
A linear mapping transforms the high-dimensional coordinates of each neighborhood to global internal coordinates on the manifold, by minimizing the embedding cost

Φ(Y) = Σ_i || Y_i - Σ_j W_ij Y_j ||^2.

Note that this cost defines a quadratic form

Φ(Y) = Σ_{ij} M_ij (Y_i . Y_j),  where  M_ij = δ_ij - W_ij - W_ji + Σ_k W_ki W_kj.

The optimal embedding is found by computing the bottom d nonzero eigenvectors of M (the very bottom eigenvector, the constant vector with eigenvalue 0, is discarded); d is the dimension of the embedding.
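The embedding step can be sketched as follows (a minimal illustration; it uses the equivalent matrix form M = (I - W)^T (I - W) of the quadratic form, and the function name is ours):

```python
import numpy as np

def lle_embedding(W, d):
    """Bottom-eigenvector embedding step of LLE. M = (I - W)^T (I - W)
    expands entrywise to M_ij = delta_ij - W_ij - W_ji + sum_k W_ki W_kj."""
    n = W.shape[0]
    IW = np.eye(n) - W
    M = IW.T @ IW
    vals, vecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return vecs[:, 1:d + 1]                   # drop the constant bottom vector
```

The returned coordinates are orthogonal to the discarded constant vector, so each embedding dimension is automatically centered.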
Illustration
Examples Two Dimensional Embeddings of Faces
Examples
Examples
Thank you