Date post: 12-Oct-2020
Two Geometric Problems in Optimal Transport: Discrete and Gaussian measures

Yoav Zemel∗

Statistical Laboratory, University of Cambridge

[email protected]

∗Support from the Swiss National Science Foundation grant 178220 is gratefully acknowledged

Yoav Zemel (Cambridge) subsampling-Wasserstein-covariance 1 / 45


Optimal Transport: Fast Probabilistic Approximations with Exact Solvers

joint work with Max Sommerfeld, Jörn Schrieber & Axel Munk

Georg-August-Universität Göttingen


Optimal coupling and the Monge–Kantorovich problem

$(\mathcal{X}, d)$: a separable metric space; $p \ge 1$.

Monge–Kantorovich problem & Wasserstein distance

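Let $X \sim \mu$ and $Y \sim \nu$ be random elements on $(\mathcal{X}, d)$ and define

$$W_p(X, Y) \equiv W_p(\mu, \nu) := \Big[\inf_{Z_1 \overset{d}{=} X,\ Z_2 \overset{d}{=} Y} \mathbb{E}\big[d^p(Z_1, Z_2)\big]\Big]^{1/p}.$$

Minimise over all random elements $(Z_1, Z_2)$ on $\mathcal{X} \times \mathcal{X}$ with $X \overset{d}{=} Z_1$ and $Y \overset{d}{=} Z_2$.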

Defines a metric on probability measures on $\mathcal{X}$ with finite $p$-th moments

Probability uses: metrises weak convergence together with convergence of $p$-th moments; easy to bound; subadditive

Statistical uses: goodness of fit, deformation models, registration of point processes, TDA, neural networks, ...

Takes into account the geometry of $(\mathcal{X}, d)$: for Dirac masses, $W_p(\delta_{x_0}, \delta_{y_0}) = d(x_0, y_0)$

Close to human perception of similarity between images

Difficult to compute
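The following toy illustrates both the definition and why it is hard to compute. It is a brute-force sketch for the simplest discrete case, assuming both measures are uniform on the same number $n$ of atoms. In that case an optimal coupling can be taken to be a permutation (Birkhoff–von Neumann). The function `wasserstein_uniform` is a hypothetical helper, not code from the talk. Its $n!$ cost is exactly the kind of blow-up that exact solvers are designed to avoid.

```python
import itertools
import math

def wasserstein_uniform(xs, ys, p=1.0, dist=lambda a, b: abs(a - b)):
    """W_p between (1/n) sum_i delta_{xs[i]} and (1/n) sum_j delta_{ys[j]}.

    For two uniform measures with the same number n of atoms, some optimal
    coupling is a permutation (Birkhoff-von Neumann), so we try all n!
    matchings. Exponential cost: a toy for tiny n only.
    """
    if len(xs) != len(ys):
        raise ValueError("both measures must have the same number of atoms")
    n = len(xs)
    best = min(
        sum(dist(xs[i], ys[sigma[i]]) ** p for i in range(n))
        for sigma in itertools.permutations(range(n))
    )
    return (best / n) ** (1.0 / p)

# Dirac masses: W_p(delta_x, delta_y) = d(x, y)
print(wasserstein_uniform([0.0], [3.0], p=2))  # 3.0
# Two points in the plane with the Euclidean metric
print(wasserstein_uniform([(0, 0), (1, 0)], [(0, 1), (1, 1)], p=2, dist=math.dist))  # 1.0
```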


A subsampling approach

Focus on a finite metric space $(\mathcal{X}, d)$ of size $N$

Computational complexity $O(N^3)$ is prohibitive. Take images of size $1024 \times 1024$: then $N = 1024^2 > 10^6$, so $N^3 > 10^{18}$ operations. Assuming about $10^{11}$ operations per second, one needs $10^{18}/10^{11} = 10^7$ seconds, which is more than 16 weeks

We propose a subsampling scheme: sample $S \ll N$ points from the measures $\mu$ and $\nu$ and compute the distance between the empirical measures $\mu_S$ and $\nu_S$

Repeat this $B$ times and average the results

Computation time $O(BS^3)$

$S$ (and $B$) control the computational-statistical tradeoff: large $S$ yields better approximations, small $S$ is fast to compute
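Here is a minimal Python sketch of the subsampling scheme. It makes two assumptions. First, the ground space is a finite subset of the real line, so the exact back-end can be order-statistics matching, which is optimal for $p \ge 1$. Second, the $B$ repetitions are averaged. Names such as `subsampled_wasserstein` are illustrative. On a general space, one would plug an exact LP or network solver in place of `w_p_sorted`.

```python
import random

def w_p_sorted(xs, ys, p=1.0):
    """Exact W_p between two uniform empirical measures on the real line with
    the same number of atoms: matching order statistics is optimal for p >= 1."""
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1.0 / p)

def subsampled_wasserstein(support, mu, nu, S, B, p=1.0, seed=None):
    """Average over B repetitions of W_p(mu_S, nu_S), where mu_S and nu_S are
    empirical measures of S i.i.d. draws from mu and nu (weights on `support`)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(B):
        xs = rng.choices(support, weights=mu, k=S)   # S draws from mu
        ys = rng.choices(support, weights=nu, k=S)   # S draws from nu
        total += w_p_sorted(xs, ys, p)               # exact solve of the small problem
    return total / B

# N = 1000 atoms on [0, 1]; subsample S = 100 points, B = 10 times.
support = [i / 999 for i in range(1000)]
mu = [1.0] * 1000                      # uniform weights
nu = [i + 1.0 for i in range(1000)]    # weights increasing to the right
estimate = subsampled_wasserstein(support, mu, nu, S=100, B=10, seed=0)
```

Each repetition costs one exact solve of size $S$ instead of $N$, which is where the $O(BS^3)$ budget comes from for a general cubic-time solver.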


N = 10000 pixels

Subsample S = 1000 points

W = 0.0208


N = 10000 pixels

Subsample S = 1000 points

W = 0.0211


A subsampling approach

True distance 0.0217, approximation 0.0209: 3.7% relative error. Computation time: 5 minutes exact versus 2.4 seconds subsampled, 125 times faster

[Figure: mean relative approximation error (1% to 100%, log scale) against runtime relative to the exact algorithm ($10^{-5}$ to $10^{0}$, log scale), for problem sizes 32x32, 64x64 and 128x128 and sample sizes $S$ = 100, 500, 1000, 2000, 4000.]

Theoretical guarantees?


Error bounds

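$(\mathcal{X}, d)$: a metric space of cardinality $N = |\mathcal{X}|$

$\mathcal{N}(\mathcal{X}, \delta)$: the number of $\delta$-balls (centred at points of $\mathcal{X}$) needed to cover $\mathcal{X}$

Sommerfeld, Schrieber, Z. & Munk (2019, JMLR) show that

$$\mathbb{E}\big[W_p^p(\mu, \mu_S)\big] \le (\operatorname{diam}\mathcal{X})^p\, \mathcal{E}\, S^{-1/2}, \qquad \mathcal{E} = \mathcal{E}(p, \mathcal{X}),$$

$$\mathcal{E} = 2^{p-1} \inf_{q \ge 2}\, \inf_{l_{\max} \in \mathbb{N}} q^{2p} \Bigg( \frac{\sqrt{N}}{q^{(l_{\max}+1)p}} + \sum_{l=0}^{l_{\max}} q^{-lp} \underbrace{\sqrt{\mathcal{N}\big(\mathcal{X}, q^{-l} \operatorname{diam}\mathcal{X}\big)}}_{= O(q^{lD/2}) \text{ for } \mathcal{X} \subset (\mathbb{R}^D, \|\cdot\|)} \Bigg)$$

$$\mathbb{E}\,\big|W_p(\mu_S, \nu_S) - W_p(\mu, \nu)\big| \le 2 \operatorname{diam}(\mathcal{X})\, \mathcal{E}^{1/p} S^{-1/(2p)}$$

The repetition number $B$ cannot improve the bias; it only appears in deviation bounds

The proof majorises $(\mathcal{X}, d)$ by an ultrametric tree, following Boissard & Le Gouic (2014) and Fournier & Guillin (2015). It then uses the explicit formula for the Wasserstein distance on ultrametric spaces (Kloeckner 2015).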
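The constant $\mathcal{E}$ can be bounded numerically for a given finite point set. The sketch below uses hypothetical helper names and relies on two monotonicity facts:

- Greedy covers give upper bounds on the covering numbers $\mathcal{N}(\mathcal{X}, \delta)$, and $\mathcal{E}$ is increasing in each of them.
- Restricting the infima over $q$ and $l_{\max}$ to a finite grid can only increase the value.

The result is therefore an upper bound on $\mathcal{E}$, not its exact value.

```python
import math

def greedy_cover_size(points, delta, dist):
    """Size of a cover of `points` by closed delta-balls centred at points,
    chosen greedily; it is >= the covering number N(X, delta)."""
    uncovered = list(points)
    count = 0
    while uncovered:
        centre = uncovered[0]
        uncovered = [x for x in uncovered if dist(centre, x) > delta]
        count += 1
    return count

def e_term(points, p, q, lmax, dist=math.dist):
    """Value inside the infima of E(p, X) for one pair (q, lmax)."""
    N = len(points)
    diam = max(dist(a, b) for a in points for b in points)
    total = math.sqrt(N) / q ** ((lmax + 1) * p)
    for l in range(lmax + 1):
        cover = greedy_cover_size(points, q ** (-l) * diam, dist)
        total += q ** (-l * p) * math.sqrt(cover)
    return 2 ** (p - 1) * q ** (2 * p) * total

def e_upper_bound(points, p, qs=(2, 3, 4), lmax_range=range(10), dist=math.dist):
    """Upper bound on E(p, X): the infima restricted to a finite grid."""
    return min(e_term(points, p, q, l, dist) for q in qs for l in lmax_range)

# Example: an 8 x 8 grid in the unit square with p = 2. For any mu on the grid,
# E[W_2^2(mu, mu_S)] <= diam^2 * E * S^{-1/2}.
grid = [(i / 7, j / 7) for i in range(8) for j in range(8)]
E = e_upper_bound(grid, p=2)
bound_for_S_100 = math.dist((0, 0), (1, 1)) ** 2 * E / math.sqrt(100)
```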


Euclidean error bounds

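For $(\mathcal{X}, d) \subset (\mathbb{R}^D, \|\cdot\|_2)$, $\mathbb{E}[W_p^p(\mu, \mu_S)]$ is bounded above by

$$S^{-1/2} D^{p/2}\, 2^{3p-1} (\operatorname{diam}\mathcal{X})^p\, \alpha(D, p) \times \begin{cases} 1 & D < 2p,\\ \log_2 N & D = 2p,\\ N^{1/2 - p/D} & D > 2p. \end{cases}$$

$\alpha(D, p)$ is explicit and $\alpha(D, p) \le 3 + \sqrt{2}$ ($p \in \mathbb{N}$)

The power of $N$ is $< 1/2$, so the error can vanish with $S \ll N$

In low dimensions ($D < 2p$) the error does not even depend on $N$

Similar bounds hold for any other norm on $\mathbb{R}^D$

The dependence on $S$ and $N$ is optimal when $D > 2p$: there are large families of measures $\mu$ on $N$ points such that

$$\mathbb{E}[W_p^p(\mu, \mu_S)] \ge S^{-1/2} \beta(D, p) N^{1/2 - p/D}$$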
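The Euclidean bound is straightforward to evaluate. The sketch below codes $\alpha(D, p)$ as given on the appendix slide at the end of the deck, together with the three regimes; the function names are illustrative. It also shows where the constant $3 + \sqrt{2}$ comes from: the maximum is attained at $D = 2p + 1$.

```python
import math

def alpha(D, p):
    """alpha(D, p) as on the appendix slide."""
    e = D / 2 - p
    if D < 2 * p:
        return 1.0 / (1.0 - 2.0 ** e)
    if D == 2 * p:
        return 2.0 + 1.0 / D
    return 2.0 + 1.0 / (2.0 ** e - 1.0)

def euclidean_bound(S, N, D, p, diam):
    """Upper bound on E[W_p^p(mu, mu_S)] for mu on N points of (R^D, ||.||_2)."""
    base = S ** -0.5 * D ** (p / 2) * 2 ** (3 * p - 1) * diam ** p * alpha(D, p)
    if D < 2 * p:
        return base                      # does not depend on N at all
    if D == 2 * p:
        return base * math.log2(N)
    return base * N ** (0.5 - p / D)     # power of N below 1/2

# 128 x 128 images (D = 2) with p = 2: the bound is the same for any N.
print(euclidean_bound(S=4000, N=128 ** 2, D=2, p=2, diam=math.sqrt(2)))
```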


Summary

We propose a probabilistic meta-algorithm that

1 is extremely easy to implement and to tune towards higher accuracy or shorter computation time, as desired

2 can be used with any algorithm for transportation problems as a back-end, including general LP solvers, specialized network solvers and algorithms using entropic penalization

3 comes with theoretical non-asymptotic guarantees for the approximation error of the Wasserstein distance; in particular, this error is independent of the size of the original problem in many important cases, including images

4 works well in practice: for example, the Wasserstein distance between two $128^2$-pixel images can typically be approximated with a relative error of less than 5% in only 1% of the time required for exact computation

Sommerfeld, M., Schrieber, J., Zemel, Y. & Munk, A. (2019). Optimal transport: Fast probabilistic approximations with exact solvers. Journal of Machine Learning Research 20(105):1–23.


The Procrustes Metric on Covariance Operators is Optimal Transport: Statistical Implications

joint work with Valentina Masarotto & Victor M. Panaretos

École polytechnique fédérale de Lausanne


Outline

1 Procrustes metric on covariance operators

2 Optimal coupling of Gaussian processes

3 1 = 2

4 So what? In other words: (some) statistical applications


1. Covariance operators in functional data analysis


Covariance operators in functional data analysis

Aim: nonparametric inference on a random function $X(t)$, $t \in [0, 1]$

Data: $N$ identically distributed realisations $\{X_j(t)\}_{j=1}^N$

Setup: view $X$ as a random element of $L^2[0, 1]$ (or of a separable Hilbert space $\mathcal{X}$)

Basic objects (together with the mean, the only ones if $X$ is Gaussian):

1 Covariance operator $S : L^2[0, 1] \to L^2[0, 1]$:

$$[Sf](t) = \int_0^1 \operatorname{Cov}[X(t), X(s)]\, f(s)\, ds$$

2 Karhunen–Loève expansion: for the eigendecomposition $(\lambda_n, \varphi_n)$ of $S$ (that is, $(\varphi_n)_n$ is an orthonormal basis and $S\varphi_n = \lambda_n \varphi_n$),

$$X(t) - \mathbb{E}X(t) = \sum_{n=1}^\infty \xi_n \varphi_n(t),$$

where $\xi_n = \langle X - \mathbb{E}X, \varphi_n\rangle$ are zero-mean and uncorrelated, with variance $\lambda_n$.

In practice, one observes discrete measurements $X_j(t_k) + \varepsilon_{jk}$ on a grid $(t_k)$
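To make the covariance operator and the Karhunen–Loève expansion concrete, here is a sketch that discretises $S$ on a grid and uses NumPy's symmetric eigensolver. The simple quadrature rule is an assumption of this example. It uses Brownian motion, with $\operatorname{Cov}[X(t), X(s)] = \min(s, t)$. Its eigenvalues are known to be $\lambda_n = ((n - 1/2)\pi)^{-2}$, so the top one should be close to $4/\pi^2 \approx 0.405$.

```python
import numpy as np

def discretised_kl(cov, m):
    """Approximate eigenpairs of [Sf](t) = int_0^1 cov(t, s) f(s) ds.

    Grid t_k = k/m (k = 1..m) with quadrature weight h = 1/m. Returns the grid,
    eigenvalues in decreasing order and eigenfunctions (columns), normalised
    so that h * sum_k phi_n(t_k)^2 = 1, i.e. orthonormal in L2[0, 1].
    """
    t = np.arange(1, m + 1) / m
    h = 1.0 / m
    K = cov(t[:, None], t[None, :]) * h      # matrix of the discretised operator
    lam, V = np.linalg.eigh(K)               # eigh returns ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]
    return t, lam, V / np.sqrt(h)

def sample_kl(lam, phi, n_samples, rng):
    """Draw curves sum_n sqrt(lam_n) xi_n phi_n(t_k) with xi_n i.i.d. N(0, 1)."""
    lam = np.clip(lam, 0.0, None)            # drop tiny negative round-off
    xi = rng.standard_normal((n_samples, lam.size))
    return (xi * np.sqrt(lam)) @ phi.T       # shape (n_samples, m)

t, lam, phi = discretised_kl(np.minimum, 200)          # Brownian motion covariance
curves = sample_kl(lam, phi, 5, np.random.default_rng(0))
```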


Metrics on covariance operators

Since the covariances $S_i$ are Hilbert–Schmidt, one can use the induced norm to contrast them:

$$d(S_i, S_j) = |||S_i - S_j|||_2, \qquad |||S|||_2 = \sqrt{\operatorname{trace}[S^*S]}.$$

This implies a linear model for the covariances:

$$S_i = S + \Delta_i, \qquad \mathbb{E}\Delta_i = 0.$$

Is there a metric compatible with their non-linear nature?

Procrustes metric on covariances (Pigoli et al., 2014)

For two covariance operators $S_1, S_2 : \mathcal{X} \to \mathcal{X}$ on a separable Hilbert space $\mathcal{X}$, define the Procrustes distance as

$$\Pi(S_1, S_2) = \inf_{U^*U = I} \big|\big|\big| S_1^{1/2} - S_2^{1/2} U \big|\big|\big|_2,$$

where $I$ is the identity and $\{U : U^*U = I\}$ is the collection of unitary operators.

Generalises the matrix version (Dryden et al. 2009), motivated by statistical shape theory
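In finite dimensions, the infimum over unitaries is an orthogonal Procrustes problem. It has a closed-form solution through the singular value decomposition. Below is a NumPy sketch; the helper names are ours, not from Pigoli et al. Note that `procrustes_distance` returns both the distance and the optimal $U$.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def hs_distance(S1, S2):
    """Hilbert-Schmidt (Frobenius) distance |||S1 - S2|||_2."""
    return np.linalg.norm(S1 - S2, "fro")

def procrustes_distance(S1, S2):
    """Pi(S1, S2) = min over orthogonal U of ||S1^{1/2} - S2^{1/2} U||_F.

    Orthogonal Procrustes: with A = S1^{1/2}, B = S2^{1/2} and the SVD
    A^T B = P diag(s) Q^T, the minimiser is U = Q P^T.
    Returns the distance and the optimal U.
    """
    A, B = psd_sqrt(S1), psd_sqrt(S2)
    P, s, Qt = np.linalg.svd(A.T @ B)
    U = Qt.T @ P.T
    return np.linalg.norm(A - B @ U, "fro"), U

rng = np.random.default_rng(0)
G1, G2 = rng.standard_normal((2, 3, 3))
S1, S2 = G1 @ G1.T, G2 @ G2.T
print(hs_distance(S1, S2), procrustes_distance(S1, S2)[0])
```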


2. Optimal coupling of Gaussian processes


Optimal coupling and the Monge–Kantorovich problem

Monge–Kantorovich problem & Wasserstein distance

Let $X \sim \mu$ and $Y \sim \nu$ be random elements on $(\mathcal{X}, \|\cdot\|)$ and define

$$W_2(X, Y) \equiv W_2(\mu, \nu) := \sqrt{\inf_{Z_1 \overset{d}{=} X,\ Z_2 \overset{d}{=} Y} \mathbb{E}\|Z_1 - Z_2\|^2}$$

over all random elements $(Z_1, Z_2)$ on $\mathcal{X} \times \mathcal{X}$ such that $X \overset{d}{=} Z_1$ and $Y \overset{d}{=} Z_2$.

If $\mu$ is regular¹, the optimal coupling $\pi$ is deterministic:

it is the joint distribution of $(X, t(X))$ for some deterministic map $t : \mathcal{X} \to \mathcal{X}$, called an optimal transport map.

The optimal deterministic map exists and is unique when the departure measure is regular, and it is characterised as the gradient of a convex potential.

Denote the optimal map from $\mu$ to $\nu$ by $t_\mu^\nu$.

¹Regular: vanishes on Gaussian null sets


Optimal coupling of Gaussian measures

For centred Gaussian measures $\mu \equiv N(0, S_1)$ and $\nu \equiv N(0, S_2)$,

$$W_2^2(\mu, \nu) = \operatorname{trace}(S_1) + \operatorname{trace}(S_2) - 2\operatorname{trace}\big([S_1^{1/2} S_2 S_1^{1/2}]^{1/2}\big).$$

$\mu$ is regular $\iff$ $S_1$ is injective

When $\dim(\mathcal{X}) < \infty$, invertibility of $S_1$ guarantees existence and uniqueness of the deterministic optimal transport map

$$t_{S_1}^{S_2} := t_{N(0,S_1)}^{N(0,S_2)} = S_1^{-1/2}\big(S_1^{1/2} S_2 S_1^{1/2}\big)^{1/2} S_1^{-1/2}.$$

The transport map formula remains essentially valid when $\dim(\mathcal{X}) = \infty$:

Existence/uniqueness of optimal maps (Cuesta-Albertos et al., 1996)

Let $\mu \equiv N(0, S_1)$ and $\nu \equiv N(0, S_2)$ be centred Gaussian measures in $\mathcal{X}$. Provided $\ker(S_1) \subseteq \ker(S_2)$, there exists a subspace $\mathcal{X}_{\mathrm{sub}} \subseteq \mathcal{X}$ of $\mu$-measure 1 on which the optimal map is well defined. On it, the map is given by the linear operator

$$t_{S_1}^{S_2} = S_1^{-1/2}\big(S_1^{1/2} S_2 S_1^{1/2}\big)^{1/2} S_1^{-1/2}.$$
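Here is a finite-dimensional NumPy sketch of the two formulas above, assuming $S_1$ is invertible; the helper names are illustrative. The map $t$ is symmetric positive definite, so it is the gradient of a convex quadratic. It pushes $N(0, S_1)$ forward to $N(0, t S_1 t) = N(0, S_2)$, and its transport cost reproduces the trace formula.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_w2(S1, S2):
    """W_2(N(0, S1), N(0, S2)) via the trace formula."""
    R1 = psd_sqrt(S1)
    cross = np.trace(psd_sqrt(R1 @ S2 @ R1))
    return np.sqrt(max(np.trace(S1) + np.trace(S2) - 2.0 * cross, 0.0))

def gaussian_map(S1, S2):
    """Optimal map t = S1^{-1/2} (S1^{1/2} S2 S1^{1/2})^{1/2} S1^{-1/2} (S1 invertible)."""
    R1 = psd_sqrt(S1)
    R1_inv = np.linalg.inv(R1)
    return R1_inv @ psd_sqrt(R1 @ S2 @ R1) @ R1_inv

# Commuting example: t = diag(2, 1/2) and W_2^2 = (1 - 2)^2 + (2 - 1)^2 = 2.
S1, S2 = np.diag([1.0, 4.0]), np.diag([4.0, 1.0])
t = gaussian_map(S1, S2)
cost = np.trace((np.eye(2) - t) @ S1 @ (np.eye(2) - t).T)   # E||X - tX||^2, X ~ N(0, S1)
```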


(Pseudo-)Riemannian geometry of the Wasserstein space

Fix a regular $\mu$, so that $t_\mu^\nu$ exists for any measure $\nu$. The unique geodesic between $\mu$ and $\nu$ is McCann's interpolant

$$\big[s\, t_\mu^\nu + (1 - s) I\big]_\# \mu, \qquad s \in [0, 1].$$

Tangent space, exponential map (surjective) and log map:

$$\operatorname{Tan}_\mu = \overline{\{s(t - I) : t \text{ optimal between } \mu \text{ and } t_\#\mu;\ s > 0\}}^{L^2(\mu)} = \overline{\{\nabla\varphi : \varphi \in \mathrm{Cyl}_c^\infty(\mathcal{X})\}}^{L^2(\mu)}$$

$$\exp_\mu(r - I) = r_\#\mu, \qquad \log_\mu(\nu) = t_\mu^\nu - I$$

[see Ambrosio, Gigli & Savaré, 2008; figure from Choi et al., 2015]


3. Putting things together


Wasserstein ≡ Procrustes

Equivalence [Masarotto, Panaretos & Z., 2019]

The Procrustes distance between two trace-class covariance operators $S_1$ and $S_2$ on $\mathcal{X}$ coincides with the Wasserstein distance between the Gaussian measures $N(0, S_1)$ and $N(0, S_2)$ on $\mathcal{X}$:

$$\Pi(S_1, S_2) = \inf_{U^*U = I} \big|\big|\big| S_1^{1/2} - S_2^{1/2} U \big|\big|\big|_2 = \sqrt{\operatorname{trace}\big(S_1 + S_2 - 2[S_2^{1/2} S_1 S_2^{1/2}]^{1/2}\big)} = W_2\big(N(0, S_1), N(0, S_2)\big).$$
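Here is a quick numerical sanity check of the equivalence in finite dimensions. It is a sketch, not the proof. The Procrustes side is computed through the SVD: its optimal value is $\operatorname{trace} S_1 + \operatorname{trace} S_2 - 2\sum_k s_k$, with $s_k$ the singular values of $S_1^{1/2}S_2^{1/2}$. The Wasserstein side uses the trace formula. The helper names are illustrative.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def procrustes(S1, S2):
    """Procrustes distance via the singular values of S1^{1/2} S2^{1/2}."""
    A, B = psd_sqrt(S1), psd_sqrt(S2)
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.sqrt(max(np.trace(S1) + np.trace(S2) - 2.0 * s.sum(), 0.0))

def bures_wasserstein(S1, S2):
    """W_2(N(0, S1), N(0, S2)) with the formula written around S2^{1/2}."""
    R2 = psd_sqrt(S2)
    return np.sqrt(max(np.trace(S1 + S2 - 2.0 * psd_sqrt(R2 @ S1 @ R2)), 0.0))

rng = np.random.default_rng(4)
G1, G2 = rng.standard_normal((2, 5, 5))
S1, S2 = G1 @ G1.T, G2 @ G2.T
print(procrustes(S1, S2), bures_wasserstein(S1, S2))   # the two should agree
```

The equivalence also covers non-injective operators. A rank-deficient $S_1$ (for instance $HH^\top$ with $H$ of shape $5 \times 2$) can be used to check this, up to round-off from the square roots of near-zero eigenvalues.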


Almost Riemannian geometry on covariance operators

The tangent space at $S$ is the Hilbert space

$$\operatorname{Tan}_S = \overline{\{A : A = A^*,\ |||S^{1/2}A|||_2 < \infty\}},$$

where the closure is with respect to the associated inner product

$$\langle A, B\rangle_S = \operatorname{trace}[ASB] = \mathbb{E}\langle AX, BX\rangle, \qquad X \sim N(0, S).$$

Exponential map: $\exp_S(A) = (A + I)S(A + I)$

The injectivity condition $\ker(S_0) \subseteq \ker(S_1)$ suffices for

1 existence of the log map

$$\log_{S_0}(S_1) = t_0^1 - I = S_0^{-1/2}\big[S_0^{1/2} S_1 S_0^{1/2}\big]^{1/2} S_0^{-1/2} - I$$

2 a unique minimal geodesic

$$S_t = t^2 S_1 + (1 - t)^2 S_0 + t(1 - t)\big[t_0^1 S_0 + S_0 t_0^1\big],$$

with $t_0^1 = t_{S_0}^{S_1} = S_0^{-1/2}\big[S_0^{1/2} S_1 S_0^{1/2}\big]^{1/2} S_0^{-1/2}$
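The exponential map, log map and geodesic can be checked numerically in finite dimensions. This sketch assumes $S_0$ is invertible; the helper names are illustrative. The geodesic formula is exactly $\exp_{S_0}(t \log_{S_0} S_1)$ expanded. Moreover, $t\,t_0^1 + (1 - t)I$ is the optimal map from $S_0$ to $S_t$, so one expects $W_2(S_0, S_t) = t\,W_2(S_0, S_1)$.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def log_map(S0, S1):
    """log_{S0}(S1) = t_{S0}^{S1} - I (S0 assumed invertible)."""
    R = psd_sqrt(S0)
    R_inv = np.linalg.inv(R)
    return R_inv @ psd_sqrt(R @ S1 @ R) @ R_inv - np.eye(len(S0))

def exp_map(S, A):
    """exp_S(A) = (A + I) S (A + I)."""
    M = A + np.eye(len(S))
    return M @ S @ M

def geodesic(S0, S1, t):
    """S_t = t^2 S1 + (1 - t)^2 S0 + t (1 - t) [T S0 + S0 T] with T = t_{S0}^{S1}."""
    T = log_map(S0, S1) + np.eye(len(S0))
    return t ** 2 * S1 + (1 - t) ** 2 * S0 + t * (1 - t) * (T @ S0 + S0 @ T)

def w2(S1, S2):
    """Wasserstein distance between N(0, S1) and N(0, S2)."""
    R = psd_sqrt(S1)
    return np.sqrt(max(np.trace(S1 + S2 - 2.0 * psd_sqrt(R @ S2 @ R)), 0.0))

rng = np.random.default_rng(8)
G0, G1 = rng.standard_normal((2, 3, 3))
S0, S1 = G0 @ G0.T + np.eye(3), G1 @ G1.T + np.eye(3)
midpoint = geodesic(S0, S1, 0.5)
```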


Topology

Procrustes topology [Masarotto, Panaretos & Z., 2019]

The following are equivalent for covariance operators $(S_n)_{n=1}^\infty$ and $S$ on $\mathcal{X}$:

$\Pi(S_n, S) \to 0$

$N(0, S_n) \to N(0, S)$ in distribution

$|||S_n - S|||_1 \to 0$, where $|||S|||_1 = \operatorname{trace}([S^*S]^{1/2})$ is the trace norm

$|||S_n^{1/2} - S^{1/2}|||_2 \to 0$

Corollary: stability under finite-dimensional approximations.


Uniform stability under projections

Uniform stability [Masarotto, Panaretos & Z., 2019]

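Let $\{e_k\}_{k \ge 1}$ be an orthonormal basis of $\mathcal{X}$ and $P_n = \sum_{j=1}^n e_j \otimes e_j$ the projection onto the span of $\{e_1, \dots, e_n\}$. Let $\mathcal{B}$ be $\Pi$-compact. Then

$$\sup_{S_1, S_2 \in \mathcal{B}} \big|\Pi(P_n S_1 P_n, P_n S_2 P_n) - \Pi(S_1, S_2)\big| \to 0, \qquad n \to \infty.$$

To construct a $\Pi$-compact $\mathcal{B}$, note that $\Pi$-compactness is equivalent to each of the following:

$\mathcal{B}$ is $|||\cdot|||_1$-compact

$\sqrt{\mathcal{B}} = \{S^{1/2} : S \in \mathcal{B}\}$ is $|||\cdot|||_2$-compact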


Fréchet means (barycentre)

The Fréchet mean of a collection $S_1, \dots, S_N$ of covariance operators is

$$\bar S \in \arg\min_{S} \sum_{i=1}^N \Pi^2(S, S_i)$$

Always exists

Unique if one $S_i$ is injective

"Swells" less than the arithmetic mean:

$$(S_1 + \dots + S_N)/N - \bar S \ge 0 \quad \text{(a nonnegative operator)}$$

Can be computed by steepest descent

Stability [Masarotto, Panaretos & Z., 2019]

Let $S_i^k \to S_i$ as $k \to \infty$, for all $i = 1, \dots, N$

Let $\bar S^k$ be a Fréchet mean of $(S_1^k, \dots, S_N^k)$

If the Fréchet mean $\bar S$ of $(S_1, \dots, S_N)$ is unique, then

$\bar S^k \to \bar S$


4. Two statistical applications


Tangent space functional principal component analysis

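Once $\bar S$ is found, one can do principal component analysis on the tangent space:

Lift to the tangent space

$$\log_{\bar S}(S_i) = t_{\bar S}^{S_i} - I = \bar S^{-1/2}\big[\bar S^{1/2} S_i \bar S^{1/2}\big]^{1/2} \bar S^{-1/2} - I \in \operatorname{Tan}_{\bar S}$$

(requires existence of the maps $t_{\bar S}^{S_i}$)

Since $\operatorname{Tan}_{\bar S}$ is a Hilbert space, principal component analysis amounts to:

1 Constructing the tangent space empirical covariance

$$\frac{1}{N} \sum_{i=1}^N \big(t_{\bar S}^{S_i} - I\big) \otimes_{\bar S} \big(t_{\bar S}^{S_i} - I\big)$$

2 Extracting eigenvectors on $\operatorname{Tan}_{\bar S}$ and retracting their span via $\exp$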


Optimal maps

Let $S_1, \dots, S_N$ be covariances with Fréchet mean $\bar S$

Transport maps $t_{\bar S}^{S_i}$ exist if...

Conjecture

If $S_1, \dots, S_N$ are injective, then so is $\bar S$

True in finite dimensions

True if they commute

Trouble can be avoided, though:

Theorem (Masarotto, Panaretos & Z., in progress)

Transport maps $t_{\bar S}^{S_i}$ exist as bounded operators with

$$|||t_{\bar S}^{S_i}|||_\infty \le N, \qquad |||A|||_\infty = \sup_{x \in \mathcal{X} \setminus \{0\}} \frac{\|Ax\|}{\|x\|}$$

↪ tangent space principal component analysis is well defined


Testing homogeneity for covariances

We have samples $X_{i,1}, \dots, X_{i,K}$ with covariance $S_i$, $i = 1, \dots, N$

We wish to test $H_0 : S_1 = \dots = S_N$

Key idea: with $\bar S$ the Fréchet mean, rewrite $H_0$ as

$$\Delta_i := t_{\bar S}^{S_i} - I = 0, \qquad i = 1, \dots, N.$$

Test statistic

$$T_r = \sum_{i=1}^N |||\Delta_i|||_r^2, \qquad r \in \{1, 2, \infty\}$$

(trace, Hilbert–Schmidt and operator norm, respectively).

Reject for large values

Calibrate using permutations

More powerful than other methods

Simulation setup taken from Cabassi et al. (2017): $k_1$ groups have covariance $(1 + \gamma)S_m$ and $k_2 = N - k_1$ groups have covariance $S_m$, the male covariance operator of the Berkeley growth dataset.

$K = 20$ curves generated from each group

Compare power with the "pairwise test" of Cabassi et al. (2017)
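Here is a sketch of the permutation test for multivariate (discretised) data. It illustrates the idea and is not the authors' implementation. Its assumptions are:

- Groups are arrays of shape (K, d) with a common mean.
- Covariances are estimated by `np.cov`.
- The Fréchet mean is approximated by a fixed number of fixed-point iterations.
- $|||\cdot|||_r$ are Schatten norms, computed from the eigenvalues of the symmetric $\Delta_i$.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def log_maps(covs, n_iter=50):
    """Delta_i = t_{Sbar}^{S_i} - I at the (approximate) Frechet mean Sbar."""
    G = sum(covs) / len(covs)
    for _ in range(n_iter):
        R = psd_sqrt(G)
        R_inv = np.linalg.inv(R)
        T = sum(R_inv @ psd_sqrt(R @ S @ R) @ R_inv for S in covs) / len(covs)
        G = T @ G @ T
        G = (G + G.T) / 2
    R = psd_sqrt(G)
    R_inv = np.linalg.inv(R)
    return [R_inv @ psd_sqrt(R @ S @ R) @ R_inv - np.eye(len(G)) for S in covs]

def schatten(A, r):
    """Schatten r-norm of a symmetric matrix (r = 1, 2 or np.inf)."""
    s = np.abs(np.linalg.eigvalsh((A + A.T) / 2))
    return s.max() if r == np.inf else (s ** r).sum() ** (1.0 / r)

def statistic(groups, r):
    """T_r = sum_i |||Delta_i|||_r^2 from the sample covariances of the groups."""
    covs = [np.cov(X, rowvar=False) for X in groups]
    return sum(schatten(D, r) ** 2 for D in log_maps(covs))

def permutation_test(groups, r=2, n_perm=99, seed=0):
    """Observed T_r and its permutation p-value for H0: equal covariances."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack(groups)
    cuts = np.cumsum([len(X) for X in groups])[:-1]
    observed = statistic(groups, r)
    exceed = 0
    for _ in range(n_perm):
        shuffled = np.split(pooled[rng.permutation(len(pooled))], cuts)
        exceed += int(statistic(shuffled, r) >= observed)
    return observed, (1 + exceed) / (1 + n_perm)
```

Shuffling rows of the pooled data keeps the group sizes fixed. This is why a common mean is assumed: under $H_0$ the rows are then exchangeable.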


Additive perturbations: Gaussian marginals

[Figure: power against $\gamma \in [0, 3]$ in four panels, $(k_1, k_2) = (4, 4), (1, 7), (1, 3), (2, 2)$; methods compared: Pairwise and Transport Map.]


Additive perturbations: Student marginals

[Figure: power against $\gamma \in [0, 3]$ in four panels, $(k_1, k_2) = (4, 4), (1, 7), (1, 3), (2, 2)$; methods compared: Pairwise and Transport Map.]


Geodesic perturbations: Gaussian marginals

[Figure: power against $\gamma \in [0, 3]$ in four panels, $(k_1, k_2) = (4, 4), (1, 7), (1, 3), (2, 2)$; methods compared: Pairwise and Transport Map.]


Geodesic perturbations: Student marginals

[Figure: power against $\gamma \in [0, 3]$ in four panels, $(k_1, k_2) = (4, 4), (1, 7), (1, 3), (2, 2)$; methods compared: Pairwise and Transport Map.]


Generative model

The Hilbert–Schmidt distance $|||S_1 - S_2|||_2$ implies a linear model

$$S_i = S + \Delta_i, \qquad \mathbb{E}\Delta_i = 0.$$

What about the Procrustes distance $\Pi$?

Generative model and deformations [MPZ19]

Let $S$ be any covariance operator, and let $t : \mathcal{X} \to \mathcal{X}$ be a random self-adjoint nonnegative operator satisfying

1 $\mathbb{E}|||t|||_\infty^2 < \infty$

2 $\mathbb{E}t = I$

Then $S$ is a $\Pi$-Fréchet mean of the random operator $tSt$:

$$\mathbb{E}\Pi^2(S, tSt) \le \mathbb{E}\Pi^2(S', tSt)$$

for any covariance operator $S'$.

A linear model on $\operatorname{Tan}_S$!
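Here is a finite-dimensional check of the generative model under a simplifying assumption. The random $t$ takes two values, $T_1$ and $T_2 = 2I - T_1$, with equal probability; both are positive definite, so $\mathbb{E}t = I$. Each $T_i$ is then the optimal map from $S$ to $T_i S T_i$, and these maps average to the identity. So the Fréchet mean of $\{T_1ST_1, T_2ST_2\}$, computed here by a fixed-point iteration, should return $S$. Helper names are illustrative.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def frechet_mean(covs, n_iter=500):
    """Fixed-point iteration G <- T G T, T = average optimal map from G."""
    G = sum(covs) / len(covs)
    for _ in range(n_iter):
        R = psd_sqrt(G)
        R_inv = np.linalg.inv(R)
        T = sum(R_inv @ psd_sqrt(R @ C @ R) @ R_inv for C in covs) / len(covs)
        G = T @ G @ T
        G = (G + G.T) / 2
    return G

rng = np.random.default_rng(5)
d = 3
H = rng.standard_normal((d, d))
S = H @ H.T + np.eye(d)                            # template covariance
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal matrix
T1 = Q @ np.diag([0.5, 1.2, 1.7]) @ Q.T            # eigenvalues in (0, 2)
T2 = 2.0 * np.eye(d) - T1                          # positive definite, (T1 + T2) / 2 = I
deformed = [T1 @ S @ T1, T2 @ S @ T2]              # the two values of t S t
S_hat = frechet_mean(deformed)
```

The arithmetic mean of the deformed covariances exceeds $S$ by $(T_1 - I)S(T_1 - I) \ge 0$, which shows the "swelling" of the linear average.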


References

1 Sommerfeld, Schrieber, Zemel & Munk (2019). Optimal transport: Fast probabilistic approximations with exact solvers. Journal of Machine Learning Research 20(105):1–23.

2 Masarotto, Panaretos & Zemel (2019). Procrustes Metrics on Covariance Operators and Optimal Transportation of Gaussian Processes. Invited paper, Special Issue on Manifold Statistics, Sankhya A 81(1):172–213.

3 Schrieber, Schuhmacher & Gottschlich (2016). DOTmark – A Benchmark for Discrete Optimal Transport. IEEE Access 5:271–282.

4 Peyré & Cuturi (2019). Computational Optimal Transport. Foundations and Trends in Machine Learning.

5 Villani (2008). Optimal Transport: Old and New. Springer.

6 Panaretos & Zemel (2019). Statistical Aspects of Wasserstein Distances. Annual Review of Statistics and Its Application 6:405–431.

7 Panaretos & Zemel (2018+). An Invitation to Statistics in Wasserstein Space. SpringerBriefs in Probability & Mathematical Statistics (in press).

8 Bigot (2019). Statistical data analysis in the Wasserstein space. arXiv:1907.08417.


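$$\alpha(D, p) = \begin{cases} 1/(1 - 2^{D/2 - p}) & D < 2p,\\ 2 + D^{-1} & D = 2p,\\ 2 + 1/(2^{D/2 - p} - 1) & D > 2p, \end{cases} \qquad \alpha(D, p) \le 3 + \sqrt{2} \quad (p, D \in \mathbb{Z}).$$

$$\mathbb{P}\left[\big|W_p(\mu_S, \nu_S) - W_p(\mu, \nu)\big| \ge z + \frac{2\mathcal{E}^{1/p}}{S^{1/(2p)}}\right] \le 2\exp\left(-\frac{S B z^{2p}}{8 \operatorname{diam}(\mathcal{X})^{2p}}\right).$$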
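The deviation bound can be inverted to choose $B$. The small sketch below uses hypothetical function names. For a target failure probability $\delta$, it returns the radius $z$ with $2\exp(-SBz^{2p}/(8\operatorname{diam}^{2p})) = \delta$. Alternatively, it returns the smallest $B$ achieving a given $z$. Only $z$ shrinks with $B$; the bias term $2\mathcal{E}^{1/p}S^{-1/(2p)}$ does not depend on $B$.

```python
import math

def deviation_radius(S, B, p, diam, delta):
    """z such that 2 exp(-S B z^{2p} / (8 diam^{2p})) = delta."""
    return diam * (8.0 * math.log(2.0 / delta) / (S * B)) ** (1.0 / (2 * p))

def repetitions_needed(S, p, diam, delta, z):
    """Smallest integer B with 2 exp(-S B z^{2p} / (8 diam^{2p})) <= delta."""
    return math.ceil(8.0 * diam ** (2 * p) * math.log(2.0 / delta) / (S * z ** (2 * p)))

# S = 1000 points, B = 10 repetitions, p = 1, diam = 1, failure probability 5%.
z = deviation_radius(S=1000, B=10, p=1, diam=1.0, delta=0.05)
B_needed = repetitions_needed(S=1000, p=1, diam=1.0, delta=0.05, z=0.05)
```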


Computation

In practice, we have finite-rank approximations $S_1, \dots, S_N$

Steepest descent: essentially a Procrustean-type algorithm!

(A) For $j = 0$, set $\Gamma_j = S_1 + \dots + S_N$.

(B) For $i = 1, \dots, N$, solve the (pairwise) coupling problem and find the optimal transport map from $\Gamma_j$ to $S_i$:

$$t_{\Gamma_j}^{S_i} = \Gamma_j^{-1/2}\big(\Gamma_j^{1/2} S_i \Gamma_j^{1/2}\big)^{1/2} \Gamma_j^{-1/2}.$$

(C) Define the map

$$t_j = \frac{1}{N}\sum_{i=1}^N t_{\Gamma_j}^{S_i} = \frac{1}{N}\sum_{i=1}^N \Gamma_j^{-1/2}\big(\Gamma_j^{1/2} S_i \Gamma_j^{1/2}\big)^{1/2} \Gamma_j^{-1/2}.$$

(D) Set $\Gamma_{j+1} = t_j \Gamma_j t_j$.

(E) Iterate (B)–(D).

Provably converges to the unique Fréchet mean when $\dim(\mathcal{X}) < \infty$. Very stable and fast in practice.
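Steps (A)–(E) translate almost line by line into NumPy for covariance matrices. The sketch below follows them and adds two things of our own: a stopping rule (stop when $t_j$ is numerically the identity) and a symmetrisation against round-off. It also assumes that all iterates stay invertible. For commuting covariances, the Fréchet mean is $\big(\frac{1}{N}\sum_i S_i^{1/2}\big)^2$, which gives an easy check.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def frechet_mean(covs, tol=1e-10, max_iter=1000):
    """Steps (A)-(E): Gamma_{j+1} = t_j Gamma_j t_j with t_j the average optimal map."""
    N, I = len(covs), np.eye(len(covs[0]))
    gamma = sum(covs)                                                   # (A)
    for _ in range(max_iter):
        R = psd_sqrt(gamma)
        R_inv = np.linalg.inv(R)
        maps = [R_inv @ psd_sqrt(R @ S @ R) @ R_inv for S in covs]      # (B)
        t = sum(maps) / N                                               # (C)
        gamma = t @ gamma @ t                                           # (D)
        gamma = (gamma + gamma.T) / 2        # remove round-off asymmetry
        if np.linalg.norm(t - I) < tol:      # t_j = I at a fixed point
            break
    return gamma

# Commuting example: square roots diag(1,2,3), diag(2,1,1), diag(3,3,2) average to 2I.
covs = [np.diag([1.0, 4.0, 9.0]), np.diag([4.0, 1.0, 1.0]), np.diag([9.0, 9.0, 4.0])]
mean = frechet_mean(covs)   # should be close to 4 * identity
```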


Example: Fréchet mean of four covariances


Example: optimal maps from the Fréchet mean

[Figure: four panels, both axes from -3 to 3. Fig 8. Registration maps from the Fréchet mean of Figure 6 in the article to the four measures of Figure 6 in the article.]


Steepest descent is a Procrustes algorithm on optimal maps

1 Registration: register each $S_i$ to the current template $\Gamma_j$, via the maps $t_{\Gamma_j}^{S_i}$.

In geometrical terms, lift $\{S_i\}_{i=1}^N$ to the tangent space at $\Gamma_j$

Local linear coordinates (actually global): $t_{\Gamma_j}^{S_i} - I = \log_{\Gamma_j}(S_i)$

2 Averaging: average the registered measures coordinate-wise

In geometrical terms, average the local linear representations $t_{\Gamma_j}^{S_i} - I = \log_{\Gamma_j}(S_i)$

Then retract the linear average back onto the manifold via the exponential map


Interpretation: phase variation

Suggests connections with the registration problem in functional data analysis:

1 Interested in a Gaussian process $X \sim N(0, S)$, viewed via its KL expansion

$$X = \sum_{n=1}^\infty \sigma_n^{1/2} \xi_n \varphi_n$$

for $\{\sigma_n, \varphi_n\}$ the eigenvalue/eigenfunction pairs of $S$, and $\xi_n \overset{\text{iid}}{\sim} N(0, 1)$

↪ Amplitude variation: superposition of random $N(0, \sigma_n)$ amplitude fluctuations around fixed (deterministic) modes $\varphi_n$

2 Instead, one observes a warped version,

$$\tilde X = tX = \sum_{n=1}^\infty \sigma_n^{1/2} \xi_n\, t\varphi_n$$

↪ Phase variation²: emanates from deformation fluctuations of the modes $\varphi_n$

3 Tangent PCA + multicoupling: optimal registration!

($t_{S_i}^{\bar S}$ are registration maps!)

²The term phase comes from the case $\mathcal{X} = L^2[0, 1]$, $X(x) : [0, 1] \to \mathbb{R}$, where deformation variation is attributable to the "x-axis" (abscissa).


Principal component analysis with different inner product

Tangent space data

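$$\Delta_i = \log_{\bar S}(S_i) = t_{\bar S}^{S_i} - I = \bar S^{-1/2}\big[\bar S^{1/2} S_i \bar S^{1/2}\big]^{1/2} \bar S^{-1/2} - I \in \operatorname{Tan}_{\bar S}$$

Centred: $\Delta_1 + \dots + \Delta_N = 0$. This is the first-order condition for the Fréchet mean: the maps $t_{\bar S}^{S_i}$ average to the identity.

$\operatorname{Tan}_{\bar S}$ is a Hilbert space, but its inner product is not the Hilbert–Schmidt one; rather

$$\langle A, B\rangle_{\bar S} = \operatorname{trace}[A \bar S B]$$

Empirical tangent space covariance

$$K = \frac{1}{N}\sum_{i=1}^N \Delta_i \otimes_{\bar S} \Delta_i, \qquad (A \otimes_{\bar S} B)C = \langle B, C\rangle_{\bar S}\, A, \quad A, B, C \in \operatorname{Tan}_{\bar S}$$

Eigenvalues and eigenvectors can be found in Hilbert–Schmidt space (Ocaña, Aguilera & Valderrama 1999), since $\langle A, B\rangle_{\bar S} = \langle A\bar S^{1/2}, B\bar S^{1/2}\rangle_{HS}$:

$KA = \lambda A$ for $A \in \operatorname{Tan}_{\bar S}$ if and only if $K_{HS}(A\bar S^{1/2}) = \lambda (A\bar S^{1/2})$, with

$$K_{HS} = \frac{1}{N}\sum_{i=1}^N \Delta_i \bar S^{1/2} \otimes \Delta_i \bar S^{1/2}$$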
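Here is a finite-dimensional sketch of tangent space PCA that uses exactly this isometry:

1. Vectorise $\Delta_i \bar S^{1/2}$.
2. Diagonalise the resulting $d^2 \times d^2$ matrix $K_{HS}$.
3. Map the eigenvectors $B$ back via $A = B\bar S^{-1/2}$.

The Fréchet mean is computed with a fixed-point iteration started at the arithmetic mean, which is an assumption of this sketch. Names are illustrative.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def tangent_pca(covs, n_components=2, n_iter=500):
    """Frechet mean G, tangent data Delta_i, top eigenvalues and eigen-operators of K."""
    N, d = len(covs), len(covs[0])
    G = sum(covs) / N                          # start at the arithmetic mean
    for _ in range(n_iter):                    # fixed-point iteration for the Frechet mean
        R = psd_sqrt(G)
        R_inv = np.linalg.inv(R)
        T = sum(R_inv @ psd_sqrt(R @ S @ R) @ R_inv for S in covs) / N
        G = T @ G @ T
        G = (G + G.T) / 2
    R = psd_sqrt(G)
    R_inv = np.linalg.inv(R)
    deltas = [R_inv @ psd_sqrt(R @ S @ R) @ R_inv - np.eye(d) for S in covs]
    # <A, B>_G = trace(A G B) = <A G^{1/2}, B G^{1/2}>_HS, so work with A G^{1/2}
    V = np.array([(D @ R).ravel() for D in deltas])       # N x d^2 data matrix
    K_hs = V.T @ V / N
    lam, W = np.linalg.eigh(K_hs)
    lam, W = lam[::-1], W[:, ::-1]                          # decreasing order
    # back to the tangent space, A = B G^{-1/2}; orthonormal for <.,.>_G
    comps = [W[:, k].reshape(d, d) @ R_inv for k in range(n_components)]
    return G, deltas, lam[:n_components], comps

rng = np.random.default_rng(7)
covs = [A @ A.T + np.eye(3) for A in rng.standard_normal((5, 3, 3))]
G, deltas, lam, comps = tangent_pca(covs)
```

The first component can be retracted to covariance space as $\exp_{\bar S}(cA_1) = (I + cA_1)\bar S(I + cA_1)$ for small scalars $c$.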


Interpretation: optimal multicoupling

Coupling several Gaussian measures (multicoupling)

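Let $N(0, S_i)$, $i = 1, \dots, N$, be Gaussians on $\mathcal{X}$. Construct a random vector $(X_1, \dots, X_N) \in \mathcal{X}^N$ such that

1 $X_i \sim N(0, S_i)$ for all $i$;

2 for any other random vector $(Y_1, \dots, Y_N) \in \mathcal{X}^N$ with $Y_i \sim N(0, S_i)$,

$$\sum_{i<j} \mathbb{E}\|X_i - X_j\|^2 \le \sum_{i<j} \mathbb{E}\|Y_i - Y_j\|^2.$$

Answer [MPZ19]

Find the Fréchet mean $\bar S$, let $Z \sim N(0, \bar S)$ and define

$$X_i = t_{\bar S}^{S_i} Z = \bar S^{-1/2}\big(\bar S^{1/2} S_i \bar S^{1/2}\big)^{1/2} \bar S^{-1/2} Z.$$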
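The recipe above can be sketched for covariance matrices in NumPy; the helper names are ours. Since $X_i - X_j = (t_i - t_j)Z$, the total cost is $\sum_{i<j}\operatorname{trace}[(t_i - t_j)\bar S(t_i - t_j)]$. This can be compared with the independent coupling, whose cost is $\sum_{i<j}(\operatorname{trace}S_i + \operatorname{trace}S_j)$. It can also be compared with the sum of pairwise squared Wasserstein distances, which no multicoupling can beat.

```python
import numpy as np

def psd_sqrt(S):
    """Symmetric square root of a positive semi-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def optimal_multicoupling(covs, n_iter=500):
    """Frechet mean G (fixed-point iteration) and maps t_i = t_G^{S_i}."""
    G = sum(covs) / len(covs)
    for _ in range(n_iter):
        R = psd_sqrt(G)
        R_inv = np.linalg.inv(R)
        T = sum(R_inv @ psd_sqrt(R @ S @ R) @ R_inv for S in covs) / len(covs)
        G = T @ G @ T
        G = (G + G.T) / 2
    R = psd_sqrt(G)
    R_inv = np.linalg.inv(R)
    return G, [R_inv @ psd_sqrt(R @ S @ R) @ R_inv for S in covs]

def coupling_cost(G, maps):
    """sum_{i<j} E||X_i - X_j||^2 for X_i = t_i Z with Z ~ N(0, G)."""
    n = len(maps)
    return sum(np.trace((maps[i] - maps[j]) @ G @ (maps[i] - maps[j]).T)
               for i in range(n) for j in range(i + 1, n))

rng = np.random.default_rng(6)
covs = [A @ A.T + 0.5 * np.eye(3) for A in rng.standard_normal((4, 3, 3))]
G, maps = optimal_multicoupling(covs)
Z = rng.multivariate_normal(np.zeros(3), G, size=1000)   # draws of Z ~ N(0, G)
X = [Z @ t.T for t in maps]                               # coupled draws, X[i] ~ N(0, S_i)
```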


Collections of covariance operators

$n$ populations with covariance operators

$$S_i : \mathcal{X} \to \mathcal{X}, \qquad i = 1, \dots, n,$$

from which we observe $N_i$ noisy realisations

$$X_{ijk} = X_{ij}(t_k) + \varepsilon_{ijk}, \qquad i \le n,\ j \le N_i.$$

DNA biophysics (sequence-dependent flexibility, Panaretos et al., 2010)

Computational linguistics (phonetic analysis, Pigoli et al., 2014, 2018)
