Laplacian Eigenmaps for Dimensionality Reduction and
Data Representation
By Mikhail Belkin and Partha Niyogi
Slides by Shelly Grossman
Big Data Processing Seminar
Amir Averbuch 28.12.2014
Introduction
• A geometrically motivated algorithm for non-linear dimensionality reduction.
• An attempt to recover a representation of the data in its intrinsic structure (if one exists), keeping close points close together.
• Shares common properties with LLE, Spectral Clustering, Diffusion Maps, and other non-linear dimensionality reduction methods.
Agenda
Preliminaries & Reminders
Geometric motivation
The Algorithm & Justification
Relation to Laplace operator and Heat Kernels
Similar algorithms
Examples
Open questions
Preliminaries & Reminders
Manifolds
• A space that resembles the Euclidean space ℝⁿ in a neighborhood of each point.
Dimensionality Reduction
• “Unfolding” a manifold embedded in a high-dimensional space, so each data point is assigned a low-dimensional representation.
• 𝑥₁,…,𝑥ₖ ∈ ℳ, with ℳ embedded in ℝ𝑙.
• Target: find 𝑦₁,…,𝑦ₖ ∈ ℝ𝑚, 𝑚 ≪ 𝑙, where 𝑦ᵢ represents 𝑥ᵢ.
Example
• 𝑋 = (𝑥₁, 𝑥₂, …, 𝑥ₙ)
• [Figure: a one-dimensional representation of the sample, e.g. its 𝑥₁ coordinate]
  – Not very good…
• Later on, we will see criteria for “Good” and “Bad” representations.
Graph Laplacian ℒ
𝓛 = 𝑫 − 𝑾
• If 𝑊 is the (binary) adjacency matrix:
  𝐷ᵢ,ᵢ = deg(𝑣ᵢ), 𝐷ᵢ,ⱼ = 0 for 𝑖 ≠ 𝑗
• If 𝑊 is a weights matrix:
  𝐷ᵢ,ᵢ = Σⱼ 𝑤(𝑣ᵢ, 𝑣ⱼ)
• Entry-wise:
  𝓛ᵢ,ⱼ = 𝐷ᵢ,ᵢ if 𝑖 = 𝑗; 𝓛ᵢ,ⱼ = −𝑤(𝑣ᵢ, 𝑣ⱼ) if 𝑖 ≠ 𝑗 and the edge 𝑣ᵢ ↔ 𝑣ⱼ exists; 𝓛ᵢ,ⱼ = 0 otherwise.
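To make the definition concrete, here is a minimal NumPy sketch (the helper name graph_laplacian is ours, not from the paper):

```python
# A minimal sketch of L = D - W for a symmetric weight (or 0/1 adjacency) matrix.
import numpy as np

def graph_laplacian(W: np.ndarray) -> np.ndarray:
    """Return L = D - W, where D is diagonal with D[i, i] = sum_j W[i, j]."""
    D = np.diag(W.sum(axis=1))
    return D - W

# Example: a path graph on 3 nodes, v1 -- v2 -- v3.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = graph_laplacian(W)
# Every row of L sums to 0, so the constant vector is an eigenvector with eigenvalue 0.
assert np.allclose(L @ np.ones(3), 0)
```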
Geometric motivation
• The graph Laplacian is a discrete approximation of the Laplace operator on manifolds.
• Eigenvectors of the Laplacian matrix are discrete analogues of eigenfunctions of the Laplace operator.
• The Laplace operator, in turn, is defined via the inner product on the tangent space at each point of the manifold.
• This inner product is what defines geometric notions such as length, angle, and orthogonality.
• See S. Rosenberg, The Laplacian on a Riemannian Manifold, 1997, pp. 11, 18.
The Algorithm
• Input: k points in ℝ𝑙 (samples from the data).
• We do not know whether these points actually lie on a manifold of lower dimension; it is an assumption.
• Output: an embedding map of these points to a lower dimension.
Adjacency Graph Construction
• The k data points are translated to k graph nodes.
• Edges are defined according to a metric set on the points.
– Which points are considered “close”?
• Two alternatives:
– 𝜀-close nodes are connected
– n nearest neighbors
𝜀-close nodes
• Connect 𝑥ᵢ and 𝑥ⱼ if ‖𝑥ᵢ − 𝑥ⱼ‖² < 𝜀.
• ‖·‖ is the usual Euclidean norm in ℝ𝑙.
• Geometrically intuitive, but can lead to disconnected graphs.
• Need to choose 𝜀.
[Figure: four nodes; a pair at distance 0.8𝜀 is connected, a pair at distance 1.2𝜀 is not.]
n-nearest neighbors
• n = 2:
  – Node 1 is close to 4, 5
  – Node 2 is close to 1, 3
  – Node 3 is close to 1, 2
  – Node 4 is close to 1, 5
  – Node 5 is close to 1, 4
• n is easy to pick, with good chances of yielding a connected graph, but the construction is less geometrically intuitive (a sketch of both constructions follows below).
[Figure: five nodes and the edges induced by their 2 nearest neighbors.]
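A minimal sketch of both graph constructions, assuming the data sits in the rows of a NumPy array (helper names are ours); for large datasets a spatial index such as scipy.spatial.cKDTree would be preferable:

```python
import numpy as np

def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between all rows of X (shape k x l)."""
    sq = (X ** 2).sum(axis=1)
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

def adjacency_eps(X: np.ndarray, eps: float) -> np.ndarray:
    """Connect x_i, x_j when ||x_i - x_j||^2 < eps."""
    A = (pairwise_sq_dists(X) < eps).astype(float)
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A

def adjacency_knn(X: np.ndarray, n: int) -> np.ndarray:
    """Connect x_i to its n nearest neighbors; symmetrize so the graph is undirected."""
    D2 = pairwise_sq_dists(X)
    np.fill_diagonal(D2, np.inf)        # exclude each point from its own neighbors
    idx = np.argsort(D2, axis=1)[:, :n]  # n nearest neighbors of each point
    A = np.zeros_like(D2)
    rows = np.repeat(np.arange(len(X)), n)
    A[rows, idx.ravel()] = 1.0
    return np.maximum(A, A.T)  # i ~ j if either is among the other's neighbors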
Choosing weights
• Binary: 1 for an existing edge in the adjacency graph. 𝑊𝑖,𝑗 ∈ {0,1}
• Heat Kernel: for 𝑡 ∈ ℝ⁺, 𝑊ᵢ,ⱼ = 𝑒^(−‖𝑥ᵢ−𝑥ⱼ‖²/𝑡) for an existing edge, 0 otherwise.
• Intuition regarding the heat kernel will be provided later on.
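Continuing the sketch above, the heat-kernel weighting is short to write down (again, the helper name is ours):

```python
import numpy as np

def heat_kernel_weights(X: np.ndarray, A: np.ndarray, t: float) -> np.ndarray:
    """W[i, j] = exp(-||x_i - x_j||^2 / t) on edges (A[i, j] = 1), 0 elsewhere."""
    sq = (X ** 2).sum(axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared pairwise distances
    return A * np.exp(-D2 / t)
```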
Eigenmap computation
• Repeat the following for each connected component:
• Solve a generalized eigenvector problem:
ℒ𝐟 = 𝜆𝐷𝐟
– This will result in a set of eigenvalues and matching eigenvectors.
• Take the m eigenvectors matching the smallest eigenvalues (omitting 0): 𝐟₁, 𝐟₂, …, 𝐟ₘ.
• The embedding is 𝐱ᵢ → (𝐟₁(𝑖), …, 𝐟ₘ(𝑖)).
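A minimal sketch of this step, assuming a single connected component (scipy.linalg.eigh solves the generalized symmetric problem 𝓛𝐟 = 𝜆𝐷𝐟 directly):

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(W: np.ndarray, m: int) -> np.ndarray:
    """Embed the k nodes of a connected graph with weight matrix W into R^m."""
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Generalized eigenproblem L f = lambda D f; eigh returns ascending eigenvalues.
    vals, vecs = eigh(L, D)
    # Drop the trivial eigenvalue 0 (constant eigenvector); keep the next m.
    return vecs[:, 1:m + 1]  # row i is the embedding of x_i
```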
Justification
• Suppose m = 1 (map the sample to a line).
• The map is 𝐱ᵢ → 𝑦ᵢ.
• Minimize Σᵢ,ⱼ (𝑦ᵢ − 𝑦ⱼ)² 𝑊ᵢ,ⱼ, so that heavily weighted (close) pairs stay close.
• It can be proved that ½ Σᵢ,ⱼ (𝑦ᵢ − 𝑦ⱼ)² 𝑊ᵢ,ⱼ = 𝐲ᵗ𝓛𝐲.
• Under the constraint 𝐲ᵗ𝐷𝐲 = 1, the minimizing vector is the eigenvector matching the smallest non-trivial eigenvalue.
• A similar argument applies for m > 1.
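The identity in the middle bullet follows by expanding the square, using 𝐷ᵢ,ᵢ = Σⱼ 𝑊ᵢ,ⱼ and the symmetry of 𝑊:

```latex
\frac{1}{2}\sum_{i,j}(y_i - y_j)^2 W_{i,j}
  = \frac{1}{2}\sum_{i,j}\bigl(y_i^2 + y_j^2 - 2 y_i y_j\bigr) W_{i,j}
  = \sum_i y_i^2 D_{i,i} - \sum_{i,j} y_i y_j W_{i,j}
  = \mathbf{y}^T D \mathbf{y} - \mathbf{y}^T W \mathbf{y}
  = \mathbf{y}^T \mathcal{L}\, \mathbf{y}
```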
Notes
• We find eigenmaps separately for each connected component.
• We take m eigenvectors, where m is the dimension of the embedded manifold (if known).
• The eigenvalue 0 is omitted, as its matching eigenvector is 𝟏: in 𝓛, every row sums to 0. Taking it would map each connected component to a single point in that coordinate.
Relation to Laplace Operator
• A similar process can be applied in the continuous case.
• Let 𝑓 map points of the embedded manifold ℳ to a low-dimensional space (e.g. the real line). Then:
|𝑓(𝑦) − 𝑓(𝑥)| ≤ distℳ(𝑥, 𝑦) ‖∇𝑓(𝑥)‖ + 𝑜(distℳ(𝑥, 𝑦))
• It can be proved, with tools from Functional Analysis, that a mapping 𝑓 that best preserves local distances is an eigenfunction of the Laplace operator on the manifold.
Heat Kernel and Choice of Weight Matrix
Discrete Laplacian ↔ Laplace Operator ↔ Heat equation
• Heat equation: (∂/∂𝑡 + 𝓛) 𝑢 = 0
• The solution 𝑢(𝑥, 𝑡) can be expressed using the Heat Kernel, which is approximately the Gaussian
(4𝜋𝑡)^(−𝑚/2) 𝑒^(−‖𝑥−𝑦‖²/4𝑡)
• Plugging the initial condition 𝑓(𝑥) = 𝑢(𝑥, 0) into the heat equation, we get an estimate of the Laplacian using the Gaussian weights 𝑊ᵢ,ⱼ = 𝑒^(−‖𝑥ᵢ−𝑥ⱼ‖²/4𝑡):
𝓛𝑓(𝑥ᵢ) ≈ (1/𝑡) ( 𝑓(𝑥ᵢ) − 𝛼 Σⱼ 𝑊ᵢ,ⱼ 𝑓(𝑥ⱼ) )
• This is the intuition behind the heat-kernel choice of weights above.
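A sketch of where this estimate comes from, following the paper's argument (details omitted):

```latex
% Solution of the heat equation with initial condition u(x, 0) = f(x):
u(x,t) = \int_{\mathcal{M}} H_t(x,y)\, f(y)\, dy
% Differentiating at t = 0 and using the heat equation (\partial_t + \mathcal{L}) u = 0:
\mathcal{L} f(x) = -\frac{\partial}{\partial t}\Big|_{t=0} u(x,t)
  \approx \frac{1}{t}\Big( f(x) - \int_{\mathcal{M}} H_t(x,y)\, f(y)\, dy \Big)
% Replacing the integral by a sum over the sampled points, with Gaussian
% weights W_{i,j}, gives the discrete estimate above.
```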
LLE
• In LLE we had two steps:
  1. Calculate the weights 𝑊ᵢ,ⱼ.
  2. Use 𝑊ᵢ,ⱼ to calculate the representation 𝑦ᵢ.
• Step 2 can also be done by calculating the smallest eigenvectors of 𝑀 = (𝐼 − 𝑊)ᵀ(𝐼 − 𝑊).
• Regarding 𝑀 as an operator on functions defined on the dataset, it can be shown that
𝑀𝑓 ≈ ½ 𝓛²𝑓
• Therefore, LLE in effect calculates the eigenfunctions of the iterated Laplacian.
• Eigenfunctions of 𝓛² are the same as the eigenfunctions of 𝓛 (if 𝓛𝑓 = 𝜆𝑓, then 𝓛²𝑓 = 𝜆²𝑓).
Clustering
• In the last lecture: Clustering ↔ Minimal graph cut.
• We also saw that normalized spectral clustering solves a generalized eigenvector problem:
𝓛𝐯 = 𝜆𝐷𝐯
• One can also show how the process of finding the minimal cut reduces to finding the eigenvectors of the graph Laplacian.
• Therefore, the Laplacian has a role in both dimensionality reduction and clustering.
• The two can be viewed as two sides of the same coin.
Examples
• Classic Swiss roll:
[Figure: the original sample, its Laplacian-eigenmap representation, and PCA, side by side.]
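One way to reproduce this experiment with off-the-shelf tools: scikit-learn's SpectralEmbedding implements Laplacian eigenmaps (the parameter choices below are ours, not the paper's):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding

X, color = make_swiss_roll(n_samples=1500, random_state=0)

# Laplacian eigenmap: n-nearest-neighbor graph, embedded into 2 dimensions.
Y_lap = SpectralEmbedding(n_components=2, n_neighbors=10).fit_transform(X)
# Linear baseline for comparison.
Y_pca = PCA(n_components=2).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].scatter(Y_lap[:, 0], Y_lap[:, 1], c=color, s=5)
axes[0].set_title("Laplacian eigenmap")
axes[1].scatter(Y_pca[:, 0], Y_pca[:, 1], c=color, s=5)
axes[1].set_title("PCA")
plt.show()
```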
Toy vision example: bars
• [Figure: example images, Laplacian representation, and PCA.]
• Dimension 1600 (40×40 images) reduced to dimension 2. The sample was 500 horizontal bars and 500 vertical bars.
Linguistics
• 300 most popular words in the Brown Corpus (compiled in 1961).
• Each such word is represented as a vector of dimension 600 containing the bigram count information:
𝑤ᵢ = (𝑐(𝑤₁𝑤ᵢ), …, 𝑐(𝑤₃₀₀𝑤ᵢ), 𝑐(𝑤ᵢ𝑤₁), …, 𝑐(𝑤ᵢ𝑤₃₀₀))
• Dimensionality reduction using Laplacian eigenmaps will give us a bonus – soft clustering of words with similar syntactic categories.
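A minimal sketch of how such vectors could be built from a token stream (the helper and its parameters are ours, not the paper's):

```python
from collections import Counter

def bigram_vectors(tokens: list[str], vocab_size: int = 300) -> dict[str, list[int]]:
    """Map each of the vocab_size most frequent words to its 2*vocab_size bigram counts."""
    vocab = [w for w, _ in Counter(tokens).most_common(vocab_size)]
    bigrams = Counter(zip(tokens, tokens[1:]))  # counts of adjacent pairs (w, w')
    vectors = {}
    for w in vocab:
        left = [bigrams[(u, w)] for u in vocab]   # c(w_j w_i): u precedes w
        right = [bigrams[(w, u)] for u in vocab]  # c(w_i w_j): u follows w
        vectors[w] = left + right                 # dimension 2 * vocab_size = 600
    return vectors
```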
Linguistics
[Figure: the 2-D embedding groups words by syntactic category: infinitives (to be), prepositions, modal verbs.]
Speech
• Given a short recording of speech, can we recognize and represent phonetic data efficiently?
• Convert the speech signal via the Fourier transform, and label each vector of Fourier coefficients (dimension = 256) with its phonetic identity.
• Labels are not disclosed to Laplacian eigenmap algorithm.
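A sketch of one plausible preprocessing pipeline (the window size and hop are our assumptions; the slides do not specify them):

```python
import numpy as np

def speech_frames(signal: np.ndarray, win: int = 512, hop: int = 256) -> np.ndarray:
    """Return one |FFT| vector of dimension win // 2 = 256 per analysis window."""
    frames = [signal[i:i + win] for i in range(0, len(signal) - win + 1, hop)]
    # Magnitude of the first half of the spectrum (the second half mirrors it).
    return np.array([np.abs(np.fft.rfft(f * np.hanning(win)))[:win // 2]
                     for f in frames])
```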
[Figure: the embedding separates phoneme classes: fricatives (f, v, s, z); closures/stops (g, k, t, d, p, b); vowels; nasals (n, m).]
Open Questions
• Finding an isometry of a manifold in a low-dimensional space.
  – Dimensionality reduction with global preservation of distances.
• The process does not reveal the intrinsic dimensionality of the manifold, even though we assume the data lies on one.
• Assumes uniform sampling.
• Manifold boundaries.
• Choice of 𝜖 and 𝑡.
• Do we really care about the underlying manifold? Answering this requires studying specific problems in various areas.