
Optimization on manifolds and data processing

Rodolphe Sepulchre

Department of Electrical Engineering and Computer Science

University of Liège, Belgium

Collaborators: P.A. Absil (U Louvain and U Cambridge)

Robert Mahony (Australian National U)

Michel Journée (U Liège)

Andrew Teschendorff (U Cambridge)

Principal manifolds workshop – Leicester – August 2006


Algorithms on manifolds

Principal manifolds: lines (or surfaces) passing through the middle of the data distribution.

Question: How to define and compute such things when the data are not points in ℝ^n but points on abstract manifolds?

Motivation: SYMMETRY. In many problems, data represent geometric objects that are invariant under certain transformations.


A three-step approach

An optimization-based formulation of the computational problem

Generalization of optimization algorithms to abstract manifolds

Exploit flexibility and additional structure to build numerically efficient algorithms

Optimization Algorithms on Matrix Manifolds, book in preparation. P.-A. Absil, R. Mahony, R. Sepulchre.


Applications

Eigenvalue problems (invariant subspace calculation, PCA, SVD, ...)

Statistical problems (matrix approximations, ICA, ...)

Pose estimation and motion recovery

. . .


Outline

Part I: a quick illustration of the three steps

Part II: ICA and gene expression data analysis


Eigenvalue problems as optimization

Let A be an n × n symmetric matrix. Find an eigenvalue λ ∈ ℝ and an eigenvector y ∈ ℝ^n such that Ay = λy.

FACT: Eigenvectors are the critical points of the Rayleigh quotient

f : ℝ^n_* → ℝ : f(y) = (yᵀAy)/(yᵀy)

The global minimum is attained at the leftmost eigenvector (the one associated with the smallest eigenvalue).
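This fact is easy to check numerically. A pure-Python sketch (the 3 × 3 diagonal matrix is a made-up example): the value of f at an eigenvector is the corresponding eigenvalue, the value is unchanged under scaling of y, and perturbing away from the leftmost eigenvector can only increase f.

```python
def rayleigh(A, y):
    """f(y) = (y^T A y) / (y^T y)."""
    n = len(y)
    Ay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]
    return sum(y[i] * Ay[i] for i in range(n)) / sum(v * v for v in y)

A = [[2.0, 0.0, 0.0],
     [0.0, 5.0, 0.0],
     [0.0, 0.0, 9.0]]

print(rayleigh(A, [1.0, 0.0, 0.0]))  # -> 2.0: the leftmost eigenvalue
print(rayleigh(A, [3.0, 0.0, 0.0]))  # -> 2.0: f is scale-invariant
print(rayleigh(A, [1.0, 0.1, 0.1]))  # > 2.0: e1 is the global minimizer
```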


Manifolds associated to eigenvectors

SYMMETRY: f(µy) = f(y) for all µ ∈ ℝ_*

⇒ critical points are not isolated in ℝ^n.

REMEDY: impose a normalization constraint ‖y‖ = 1

⇒ Optimization on the sphere S^{n−1}

or treat the equivalence class yℝ_* as one point in the projective space P^{n−1} = {yℝ_* : y ∈ ℝ^n_*}

⇒ Optimization on the projective space P^{n−1}


Generalized eigenvalue problems

Let A, B be n × n symmetric matrices, with B positive definite. Find (λ, y) such that Ay = λBy.

To compute a p-dimensional invariant subspace at once, the cost function is now defined over the full-rank n × p matrices:

f(Y) = trace(YᵀAY (YᵀBY)⁻¹)

Y∗ is a global minimizer of f iff the columns of Y∗ span the leftmost p-dimensional invariant subspace of B⁻¹A.
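For p = 1 the trace cost reduces to the generalized Rayleigh quotient (yᵀAy)/(yᵀBy), which makes the claim easy to check numerically. A pure-Python sketch on made-up diagonal matrices, for which B⁻¹A = diag(1, 2, 3):

```python
def gen_rayleigh(A, B, y):
    """For p = 1, trace(Y^T A Y (Y^T B Y)^{-1}) = (y^T A y)/(y^T B y)."""
    def quad(M, v):
        return sum(v[i] * sum(M[i][j] * v[j] for j in range(len(v)))
                   for i in range(len(v)))
    return quad(A, y) / quad(B, y)

A = [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 9.0]]
B = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]

# e1 spans the leftmost invariant subspace of B^{-1}A = diag(1, 2, 3)
print(gen_rayleigh(A, B, [1.0, 0.0, 0.0]))  # -> 1.0 (the global minimum)
print(gen_rayleigh(A, B, [0.0, 1.0, 0.0]))  # -> 2.0
```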


Manifolds for invariant subspaces

SYMMETRY: f(YM) = f(Y) for every full-rank p × p matrix M

⇒ critical points are not isolated in ℝ^{n×p}.

REMEDY: impose a normalization constraint YᵀY = I_p

⇒ Optimization on the Stiefel manifold St(p, n)

or treat the equivalence class Y GL(p) as one point in the Grassmann manifold Gr(p, n) of p-dimensional subspaces of ℝ^n.


Important matrix manifolds

S^{n−1} and St(p, n) are examples of manifolds embedded in vector spaces.

P^{n−1} and Gr(p, n) are examples of quotient manifolds of vector spaces.

The linear structure of the total vector space is very helpful for computations!


A three-step approach

An optimization-based formulation of the computational problem

Generalization of optimization algorithms to abstract manifolds

Exploit flexibility and additional structure to build numerically efficient algorithms

How different is an algorithm in a vector space and on a manifold? Illustration: the line-search algorithm.


Line search in a vector space

xk+1 = xk + tkηk

The vector ηk is a search direction; the scalar tk dictates the step length.

≈ a discretized version of the continuous-time gradient descent flow

ẋ = −grad f(x)
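A minimal vector-space instance of this update, with ηk = −grad f(xk) and tk chosen by backtracking until an Armijo sufficient-decrease condition holds. The quadratic cost and all constants below are made-up examples:

```python
def grad_descent(f, grad, x, steps=100, t0=1.0, beta=0.5, c=1e-4):
    """Line search x_{k+1} = x_k + t_k * eta_k with backtracking step size."""
    for _ in range(steps):
        g = grad(x)
        eta = [-gi for gi in g]                 # search direction eta_k
        gg = sum(gi * gi for gi in g)
        step = lambda t: [x[i] + t * eta[i] for i in range(len(x))]
        t = t0
        # backtrack until the Armijo sufficient-decrease condition holds
        while gg > 0 and f(step(t)) > f(x) - c * t * gg:
            t *= beta
        x = step(t)
    return x

f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * x[1] ** 2
grad = lambda x: [2.0 * (x[0] - 1.0), 4.0 * x[1]]

x_star = grad_descent(f, grad, [5.0, -3.0])
print(x_star)  # close to the minimizer [1.0, 0.0]
```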


Line search on a manifold

Let M be an abstract Riemannian manifold.

xk+1 = Expxk(tk ξ) = γ(tk; xk, ξ)

Start at xk; choose a direction ξ in the tangent space Txk M; follow the geodesic through xk tangent to ξ for tk units. (Luenberger, 73; Gabay, 82.) Conceptually elegant and useful; numerically impractical.


Optimization on manifolds

Newton method (Smith 93, Mahony 94)

Conjugate gradients (Edelman 96)

Trust region method (Absil et al. 04)

. . .

Translation of the corresponding vector-space algorithms + convergence theory.


A three-step approach

An optimization-based formulation of the computational problem

Generalization of optimization algorithms to abstract manifolds

Exploit flexibility and additional structure to build numerically efficient algorithms

Does this approach lead to competitive numerical algorithms? Illustration: the line-search algorithm.


Retractions

xk+1 = Rxk(tkξ)

The convergence theory of line-search methods still holds if the exponential mapping is replaced by ANY mapping R : TM → M satisfying Rx(0x) = x and DRx(0x) = id_{Tx M}.


Examples of retractions

Use the linear structure of the total space:

On S^{n−1}: Rx(ξ) = (x + ξ)/‖x + ξ‖

On Gr(p, n): R_span(Y)(ξ) = span(Y + ξ_Y), with ξ_Y the horizontal lift of ξ

Good retractions may turn the algorithm into a numerically efficient procedure.
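Combining the sphere retraction above with the Rayleigh quotient of the earlier slides gives a complete, if naive, eigenvalue solver: take a fixed-step gradient step in the tangent space, then retract back onto S^{n−1} by normalization. The matrix, step size and iteration count below are made-up choices for illustration, not the trust-region method discussed next:

```python
import math

def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def normalize(x):
    n = math.sqrt(sum(v * v for v in x))
    return [v / n for v in x]

def leftmost_eigvec(A, x, steps=500, t=0.1):
    """Minimize the Rayleigh quotient on the sphere via retraction steps."""
    x = normalize(x)
    for _ in range(steps):
        Ax = matvec(A, x)
        lam = sum(x[i] * Ax[i] for i in range(len(x)))  # Rayleigh quotient
        # Riemannian gradient: Euclidean gradient projected onto T_x S^{n-1}
        g = [2.0 * (Ax[i] - lam * x[i]) for i in range(len(x))]
        # retraction step x_{k+1} = R_x(-t g) = (x - t g)/||x - t g||
        x = normalize([x[i] - t * g[i] for i in range(len(x))])
    return x, lam

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
x, lam = leftmost_eigvec(A, [1.0, 1.0, 1.0])
print(lam)  # the smallest eigenvalue of A, 3 - sqrt(3)
```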


State of the art

Brute-force trust-region algorithms applied to the Rayleigh quotient cost on Gr(p, n) (Absil et al., 04) compete with the best available numerical algorithms for large-scale problems.


Some benefits of the approach

A solid framework for convergence analysis;

A geometric interpretation of existing heuristics;

Sometimes, new and competitive algorithms.

More in Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2007. P.-A. Absil, R. Mahony, R. Sepulchre.


Extracting Independent Components of Gene Expression Data

Michel Journée, Rodolphe Sepulchre, Pierre-Antoine Absil

Department of Electrical Engineering and Computer Science, University of Liège, Belgium

Workshop on Principal Manifolds, Leicester, August 2006


Independent Component Analysis

• Blind source separation based on the statistical independence of the sources.

• It assumes a linear, instantaneous and noisy mixture of sources,

x = Hs + v, H ∈ ℝ^{n×p}.

➠ Given the observations x, identify the mixing matrix H and the independent sources s.


Outline

• ICA algorithms are optimization algorithms on manifolds.

• The application of ICA to gene expression data raises central issues. (Cost function? Manifold? Optimization algorithm?)


The basic ICA algorithm

1. Assume a linear demixing model: z = Wᵀx, W ∈ ℝ^{n×p}.

2. Measure the statistical independence of the estimated sources zi (⇒ contrast).

3. Select the W ∗ that maximizes that measure.

➠ Two main features: the contrast and the optimization algorithm.
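A minimal instance of these three steps for n = p = 2, using a kurtosis-based contrast (one of the standard choices discussed below) and a brute-force grid search over rotations in place of a manifold optimizer. Since the mixing matrix here is orthogonal and the sources have unit variance, no prewhitening is needed; all data and constants are synthetic:

```python
import math, random

random.seed(0)
N = 2000
# two independent, sub-Gaussian (uniform) sources with unit variance
s = [[random.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(N)]
     for _ in range(2)]

theta_true = 0.6   # mixing rotation angle
c, d = math.cos(theta_true), math.sin(theta_true)
x = [[c * s[0][k] - d * s[1][k] for k in range(N)],
     [d * s[0][k] + c * s[1][k] for k in range(N)]]

def kurt(z):
    """Empirical excess kurtosis: zero for Gaussian data."""
    m2 = sum(v * v for v in z) / len(z)
    m4 = sum(v ** 4 for v in z) / len(z)
    return m4 / m2 ** 2 - 3.0

def contrast(theta):
    """Sum of squared kurtoses of z = W(theta)^T x: large when the zi are
    far from Gaussian, i.e. close to the independent sources."""
    ct, st = math.cos(theta), math.sin(theta)
    z1 = [ct * x[0][k] + st * x[1][k] for k in range(N)]
    z2 = [-st * x[0][k] + ct * x[1][k] for k in range(N)]
    return kurt(z1) ** 2 + kurt(z2) ** 2

# select the W(theta) maximizing the contrast (symmetry: theta mod pi/2)
best = max((contrast(t * 0.005), t * 0.005) for t in range(315))[1]
print(best)   # close to theta_true = 0.6
```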


The contrast

• Definition:

A function γ : M → ℝ : W ↦ γ(W) that measures the statistical independence of the zi.

• Different types of contrast:

➠ Based on mutual information (MI is zero under independence and strictly positive otherwise).

➠ Diagonalization of the rth-order cumulant tensor (usually r=4).

➠ Joint approximate diagonalization of a set of matrices (SOBI, JADE, etc.).

➠ The constrained covariance: sup_{f,g} cov(f(z1), g(z2)).

➠ ...


The optimization algorithm

• Optimization on a matrix manifold: W∗ = argmax_{W ∈ M} γ(W).

• Which manifold M ?

Inherent symmetries of ICA:

➠ Continuous symmetry: W ∼ WΛ, with Λ an invertible diagonal matrix.

➠ Discrete symmetry: W ∼ WP , with P a permutation matrix.


Choice of a manifold

• Optimization on the orthogonal group:

O_p = {Y ∈ ℝ^{p×p} : YᵀY = I_p}.

➠ Jacobi algorithms (JADE, SOBI, RADICAL), KernelICA.

• Optimization on the orthogonal Stiefel manifold:

St(n, p) = {Y ∈ ℝ^{n×p} : YᵀY = I_p}.

➠ FastICA (one-unit algorithm used in a deflation scheme).

• Optimization on the oblique manifold [P.-A. Absil and K.A. Gallivan, 2006]:

OB(n, p) = {Y ∈ ℝ^{n×p} : diag(YᵀY) = I_p}.

➠ Trust region optimization.
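The three constraint sets above differ only in what is required of YᵀY. As a small illustration of the oblique-manifold constraint diag(YᵀY) = I_p, the following pure-Python sketch (the matrix is made up) maps an arbitrary full-rank Y onto OB(n, p) by normalizing each column:

```python
import math

def project_oblique(Y):
    """Scale each column of Y (n x p, stored as a list of rows) to unit
    norm, so that diag(Y^T Y) = I_p."""
    n, p = len(Y), len(Y[0])
    norms = [math.sqrt(sum(Y[i][j] ** 2 for i in range(n))) for j in range(p)]
    return [[Y[i][j] / norms[j] for j in range(p)] for i in range(n)]

Y = [[3.0, 1.0],
     [4.0, 2.0],
     [0.0, 2.0]]
Z = project_oblique(Y)
# each column of Z now has unit norm
print([sum(Z[i][j] ** 2 for i in range(3)) for j in range(2)])
```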


Prewhitening in ICA

• ICA is usually used in conjunction with PCA.

[Diagram: observations x → PCA → ICA → sources z]

• Motivations for prewhitening:

➠ Good conditioning of the ICA problem.

➠ Reduction of the dimension of the ICA problem.

➠ Restriction of the ICA optimization to the orthogonal Stiefel manifold (prewhitening-based algorithms).
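A sketch of the prewhitening step for n = 2, using the closed-form eigendecomposition of a 2 × 2 covariance matrix; the synthetic data and all constants are made-up examples. After whitening, the empirical covariance of the transformed data is the identity, which is what allows the subsequent ICA search to be restricted to orthogonal matrices:

```python
import math, random

random.seed(1)
N = 5000
# correlated, zero-mean synthetic observations
x = [[0.0] * N, [0.0] * N]
for k in range(N):
    a, b = random.gauss(0, 1), random.gauss(0, 1)
    x[0][k] = 2.0 * a
    x[1][k] = a + 0.5 * b

def cov(u, v):
    return sum(ui * vi for ui, vi in zip(u, v)) / len(u)

c11, c22, c12 = cov(x[0], x[0]), cov(x[1], x[1]), cov(x[0], x[1])
# closed-form eigendecomposition of [[c11, c12], [c12, c22]]
phi = 0.5 * math.atan2(2.0 * c12, c11 - c22)
cp, sp = math.cos(phi), math.sin(phi)
l1 = c11 * cp * cp + 2.0 * c12 * cp * sp + c22 * sp * sp
l2 = c11 * sp * sp - 2.0 * c12 * cp * sp + c22 * cp * cp
# whitening: rotate to the principal axes, then rescale each coordinate
w = [[( cp * x[0][k] + sp * x[1][k]) / math.sqrt(l1) for k in range(N)],
     [(-sp * x[0][k] + cp * x[1][k]) / math.sqrt(l2) for k in range(N)]]
print(cov(w[0], w[0]), cov(w[1], w[1]), cov(w[0], w[1]))  # ≈ 1, 1, 0
```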


Discussion about prewhitening

• The prewhitening step is biased in the presence of noise and outliers.

Optimization on orthogonal manifolds is not able to compensate for these errors.

Optimization on non-orthogonal manifolds is more accurate.

• Optimization algorithms on orthogonal manifolds are usually better conditioned.

Optimization on non-orthogonal manifolds might be less robust.

• The compromise between performance and robustness is rarely discussed in the literature, especially for high-dimensional problems.


Outline

• ICA algorithms are optimization algorithms on manifolds.

• The application of ICA to gene expression data raises central issues. (Cost function? Manifold? Optimization algorithm?)


What are gene expression data?

• Gene expression denotes the relevance of a specific gene to the biological functions to be fulfilled in the cell.

• DNA microarrays are intensively used in biochemistry and biomedicine to estimate gene expression levels.

• They provide a huge amount of data (typically ~10,000 genes and ~100 experiments).

➠ Dimensionality reduction methods are needed for the analysis of these data.


Dimensionality reduction by ICA: Motivation

• Each biological function relies on a subset of genes (an expression mode).

• Gene expression levels result from several biological processes that take place independently.

• Gene expression is assumed to be a linear function of the expression modes.

➠ Independence and linearity are the basic prerequisites for ICA¹.

¹ First application of ICA to microarrays: W. Liebermeister, Linear modes of gene expression determined by independent component analysis, Bioinformatics 18 (2002), 51–60.


ICA for the analysis of gene expression data

[Figure: the gene expression matrix X (genes × experiments, ~10^4 × ~10^2) is factored as X = H · S, with H holding the expression modes (genes × modes) and S the weight of each mode in each experiment (modes × experiments).]


Preliminary results

• Application of standard ICA algorithms to breast cancer databases2.

• Performance: ICA seems to outperform PCA in relating expression modes to biological pathways (i.e., groups of genes that participate together when a certain biological function is required).

² In collaboration with A.E. Teschendorff, Department of Oncology, University of Cambridge.


Challenges

Standard ICA algorithms are not well adapted to gene expression data (few experiments, many observations, many outliers, much noise).

➠ New algorithmic developments are needed: cost functions, manifolds and optimization algorithms specially dedicated to this kind of data set.


Conclusion

• ICA performs dimensionality reduction by assuming that the observations arise from several independent sources.

• ICA algorithms are optimization-based algorithms on manifolds.

• ICA seems promising for the analysis of microarrays but raises central robustness and performance issues.
