Optimal Transport vs. Fisher-Rao distance between Copulas

Introduction | Statistical distances

IEEE SSP 2016

G. Marti, S. Andler, F. Nielsen, P. Donnat

June 28, 2016

Gautier Marti Optimal Transport vs. Fisher-Rao distance between Copulas

Clustering of Time Series

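We need a distance $D_{ij}$ between time series $x_i$ and $x_j$.

If we look for ‘correlation’, $D_{ij}$ is a decreasing function of $\rho_{ij}$, a measure of ‘correlation’.

Several choices are available for $\rho_{ij}$ (e.g., Pearson's linear correlation, Spearman's $\rho$, Kendall's $\tau$).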
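As an illustration, here is a minimal sketch that assumes the common choice $D_{ij} = \sqrt{2(1 - \rho_{ij})}$. This is only one decreasing function among many, and the slide does not fix one. `correlation_distance` and `ESTIMATORS` are hypothetical names. The three estimators are `scipy.stats.pearsonr`, `spearmanr` and `kendalltau`.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

# Hypothetical registry: method name -> scipy.stats estimator returning (statistic, pvalue).
ESTIMATORS = {"pearson": pearsonr, "spearman": spearmanr, "kendall": kendalltau}


def correlation_distance(xi, xj, method="spearman"):
    """Assumed D_ij = sqrt(2 * (1 - rho_ij)): 0 when rho = 1, 2 when rho = -1."""
    rho = ESTIMATORS[method](xi, xj)[0]
    # Clip tiny negative values caused by rho = 1 plus rounding error.
    return float(np.sqrt(max(2.0 * (1.0 - rho), 0.0)))


rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.exp(x)  # strictly increasing, but non-linear, function of x

d_pearson = correlation_distance(x, y, "pearson")
d_spearman = correlation_distance(x, y, "spearman")
d_kendall = correlation_distance(x, y, "kendall")
print(d_pearson, d_spearman, d_kendall)
```

With $y = e^x$, the rank-based coefficients see perfect dependence ($D_{ij} \approx 0$), while Pearson's does not.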

Copulas

Sklar’s Theorem:

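$$F(x_i, x_j) = C_{ij}\big(F_i(x_i), F_j(x_j)\big),$$
where $F$ is the joint distribution function and $F_i$, $F_j$ are the marginal distribution functions.

$C_{ij}$, the copula, encodes the dependence structure.

Fréchet-Hoeffding bounds:

$$\max\{u_i + u_j - 1, 0\} \le C_{ij}(u_i, u_j) \le \min\{u_i, u_j\}$$

[Figure: (left) lower-bound, (middle) independence, (right) upper-bound copulas]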
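The sketch below builds an empirical copula and checks the Fréchet-Hoeffding bounds numerically. It assumes pseudo-observations defined as ranks divided by $n$ (some authors divide by $n+1$). It evaluates $C_n$ only at grid points $k/n$, where the bounds hold exactly. All function names are hypothetical.

```python
import numpy as np


def pseudo_observations(x):
    """Rank-transform each column of x (shape n x d) to {1/n, ..., n/n} (assumed convention)."""
    n = x.shape[0]
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / n


def empirical_copula(u, v, a, b):
    """C_n(a, b): fraction of samples with U_i <= a and V_i <= b."""
    return np.mean((u <= a) & (v <= b))


def within_frechet_hoeffding(u, v, grid, tol=1e-12):
    """Check max(a + b - 1, 0) <= C_n(a, b) <= min(a, b) for all (a, b) on the grid."""
    for a in grid:
        for b in grid:
            c = empirical_copula(u, v, a, b)
            if not (max(a + b - 1.0, 0.0) - tol <= c <= min(a, b) + tol):
                return False
    return True


rng = np.random.default_rng(1)
n = 200
x = rng.standard_normal(n)
y = 0.7 * x + 0.3 * rng.standard_normal(n)
uv = pseudo_observations(np.column_stack([x, y]))
grid = np.arange(1, n + 1)[::10] / n  # grid points k/n

print(within_frechet_hoeffding(uv[:, 0], uv[:, 1], grid))  # True

# Comonotone data (y an increasing function of x) attain the upper bound min(a, b).
co = pseudo_observations(np.column_stack([x, x ** 3]))
upper_gap = max(abs(empirical_copula(co[:, 0], co[:, 1], a, b) - min(a, b))
                for a in grid for b in grid)
print(upper_gap)
```

Comonotone data, here $(x, x^3)$, sit exactly on the upper bound $\min\{u_i, u_j\}$.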

Copulas - Gaussian Example

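Gaussian copula: $C^{\mathrm{Gauss}}_R(u_i, u_j) = \Phi_R\big(\Phi^{-1}(u_i), \Phi^{-1}(u_j)\big)$. Here $\Phi$ is the standard normal CDF and $\Phi_R$ is the CDF of a centered bivariate normal with correlation matrix $R$.

The distribution is parametrized by the correlation matrix $R$.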
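The next sketch samples from and evaluates a bivariate Gaussian copula with `scipy.stats`, assuming $R = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$. The helper names, sample size and seed are illustrative. The check relies on the known relation $\rho_S = \frac{6}{\pi}\arcsin(\rho/2)$ between Spearman's rho and $\rho$ for Gaussian copulas.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm, spearmanr


def gaussian_copula_cdf(u, v, rho):
    """C_R(u, v) = Phi_R(Phi^{-1}(u), Phi^{-1}(v)) for R = [[1, rho], [rho, 1]]."""
    cov = [[1.0, rho], [rho, 1.0]]
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=cov)
    return float(mvn.cdf([norm.ppf(u), norm.ppf(v)]))


def sample_gaussian_copula(n, rho, rng):
    """Draw n points: correlated standard normals pushed through Phi (uniform margins)."""
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    return norm.cdf(z)  # shape (n, 2)


rng = np.random.default_rng(2)
rho = 0.5
uv = sample_gaussian_copula(20000, rho, rng)

# Empirical Spearman's rho vs. the Gaussian-copula formula (6 / pi) * arcsin(rho / 2).
print(spearmanr(uv[:, 0], uv[:, 1])[0], 6 / np.pi * np.arcsin(rho / 2))
# rho = 0 gives the independence copula C(u, v) = u * v (up to numerical integration error).
print(gaussian_copula_cdf(0.3, 0.6, 0.0), 0.3 * 0.6)
```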

The Target/Forget (copula-based) Dependence Coefficient

Dependence is measured as the relative distance from independence to the nearest target dependence: comonotonicity or counter-monotonicity.

Which distances between copulas are appropriate for the task of clustering (copulas and time series)?
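The slide describes this coefficient only in words. Below is a minimal sketch under explicit assumptions:

- the distance is $W_2$ between empirical copulas, computed exactly with `scipy.optimize.linear_sum_assignment`; the original work may use another OT distance;
- independence is represented by a randomly permuted grid;
- the formula is $d(C, \text{forget}) / \big(d(C, \text{forget}) + \min_{\text{target}} d(C, \text{target})\big)$, which is one reading of "relative distance from independence to the nearest target".

`tfdc` and its helpers are hypothetical names.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pseudo_observations(x):
    """Rank-transform each column of x (n x 2) to {1/n, ..., n/n}."""
    return (np.argsort(np.argsort(x, axis=0), axis=0) + 1) / x.shape[0]


def w2_empirical(p, q):
    """Exact W2 between equal-size, uniformly weighted point clouds (an assignment problem)."""
    cost = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(np.sqrt(cost[rows, cols].mean()))


def tfdc(x, y, rng):
    """Hypothetical target/forget coefficient:
    d(C, forget) / (d(C, forget) + min over targets of d(C, target))."""
    n = len(x)
    c = pseudo_observations(np.column_stack([x, y]))
    k = np.arange(1, n + 1)
    targets = [
        np.column_stack([k / n, k / n]),            # comonotonicity
        np.column_stack([k / n, (n + 1 - k) / n]),  # counter-monotonicity
    ]
    # Assumed "forget" copula: independence, represented by a random permutation of the grid.
    forget = np.column_stack([k / n, rng.permutation(k) / n])
    d_forget = w2_empirical(c, forget)
    d_target = min(w2_empirical(c, t) for t in targets)
    return d_forget / (d_forget + d_target)


rng = np.random.default_rng(3)
x = rng.standard_normal(300)
print(tfdc(x, np.exp(x), rng))                 # 1.0: comonotone data coincide with a target
print(tfdc(x, rng.standard_normal(300), rng))  # small: close to independence
```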

Definitions - Fisher-Rao geodesic distance

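Metrization of the parameter space $\{\theta \in \mathbb{R}^d \mid \int p(x;\theta)\,dx = 1\}$.

Consider the metric $G(\theta) = (g_{jk}(\theta))_{jk}$ with
$$g_{jk}(\theta) = -\int \frac{\partial^2 \log p(x;\theta)}{\partial\theta_j\,\partial\theta_k}\, p(x;\theta)\,dx,$$
the infinitesimal length
$$ds(\theta) = \sqrt{(\nabla\theta)^\top G(\theta)\,\nabla\theta},$$
and the Fisher-Rao geodesic distance, the length of the shortest path between $\theta_1$ and $\theta_2$ (integrated along the geodesic):
$$FR(\theta_1, \theta_2) = \int_{\theta_1}^{\theta_2} ds(\theta).$$

f-divergences induce an infinitesimal length proportional to the Fisher-Rao infinitesimal length. To second order, for $f$ normalized so that $f''(1) = 1$:
$$D_f(\theta \,\|\, \theta + d\theta) \approx \frac{1}{2}(\nabla\theta)^\top G(\theta)\,\nabla\theta.$$

Thus, they have the same local behaviour [1].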
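The sketch below checks this local behaviour numerically for KL, the f-divergence with $f(t) = t\log t$ (so $f''(1) = 1$). It works on the one-parameter family of bivariate standard normals with correlation $\rho$ and assumes the Fisher information of $\rho$ is $G(\rho) = (1+\rho^2)/(1-\rho^2)^2$. The step $h$ and the test points are arbitrary.

```python
import numpy as np


def kl_corr(r1, r2):
    """KL(N(0, R1) || N(0, R2)) for R_k = [[1, r_k], [r_k, 1]]."""
    trace_term = (2.0 - 2.0 * r1 * r2) / (1.0 - r2 ** 2)
    return 0.5 * (trace_term - 2.0 + np.log((1.0 - r2 ** 2) / (1.0 - r1 ** 2)))


def fisher_info(r):
    """Assumed Fisher information of the correlation parameter (unit variances)."""
    return (1.0 + r ** 2) / (1.0 - r ** 2) ** 2


h = 1e-4
for r in (0.0, 0.5, 0.9):
    ratio = kl_corr(r, r + h) / (0.5 * fisher_info(r) * h ** 2)
    print(r, ratio)  # close to 1 for small h
```

The ratio tends to 1 as $h \to 0$, which is the quadratic approximation above with $f''(1) = 1$.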

Definitions - Optimal Transport distances

Wasserstein metric

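$$W_p(\mu, \nu)^p = \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{M \times M} d(x, y)^p \, d\gamma(x, y),$$
where $\Gamma(\mu, \nu)$ is the set of couplings, i.e. joint distributions on $M \times M$ with marginals $\mu$ and $\nu$.

[Figure: image from Optimal Transport for Image Processing, Papadakis]

Other transportation distances: regularized discrete optimal transport [3], Sinkhorn distances [2], ...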
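Here is a minimal numpy sketch of Sinkhorn iterations for entropy-regularized OT between two uniformly weighted point clouds, for instance two empirical copulas. `eps` and `n_iter` are illustrative, not tuned. There is no log-domain stabilization, so much smaller `eps` would underflow the kernel; dedicated libraries such as POT provide stabilized variants. `sinkhorn_cost` is a hypothetical name.

```python
import numpy as np


def sinkhorn_cost(x, y, eps=0.02, n_iter=500):
    """Entropy-regularized OT cost <P, M> between uniformly weighted point clouds x and y.
    Plain (non log-domain) Sinkhorn iterations; eps and n_iter are illustrative choices."""
    a = np.full(len(x), 1.0 / len(x))  # source weights
    b = np.full(len(y), 1.0 / len(y))  # target weights
    m = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # squared-Euclidean ground cost
    k = np.exp(-m / eps)  # Gibbs kernel
    v = np.ones(len(y))
    for _ in range(n_iter):
        u = a / (k @ v)    # rescale rows toward marginal a
        v = b / (k.T @ u)  # rescale columns to marginal b
    plan = u[:, None] * k * v[None, :]  # P = diag(u) K diag(v)
    return float((plan * m).sum()), plan


rng = np.random.default_rng(4)
n = 50
t = rng.uniform(size=n)
comonotone = np.column_stack([t, t])     # sample of the upper-bound copula (diagonal)
counter = np.column_stack([t, 1.0 - t])  # sample of the lower-bound copula (anti-diagonal)
cloud = rng.uniform(size=(n, 2))

d_same, _ = sinkhorn_cost(cloud, cloud)
d_far, plan = sinkhorn_cost(comonotone, counter)
print(d_same, d_far)
```

Each update exactly matches one marginal; after the last column update, the plan's column sums equal `b`.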

Geometry of covariances

Distances between Gaussian copulas

Copulas $C_1, C_2, C_3$ encoding a correlation of 0.5, 0.99, 0.9999 respectively. Which pair of copulas is the nearest?
- For Fisher-Rao, Kullback-Leibler, Hellinger and related divergences: $D(C_1, C_2) \le D(C_2, C_3)$;
- For Wasserstein: $W_2(C_2, C_3) \le W_2(C_1, C_2)$.
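The next sketch reproduces these orderings with closed-form or one-dimensional computations. Its assumptions:

- distances are computed between the underlying bivariate standard normals, which share the copulas' correlation;
- Fisher-Rao is restricted to the one-parameter correlation family with $G(\rho) = (1+\rho^2)/(1-\rho^2)^2$; a constant factor in $G$ would rescale FR but not change the ordering;
- $W_2$ is the Bures-Wasserstein distance between these Gaussians, used as a stand-in for $W_2$ between the copulas themselves.

```python
import numpy as np
from scipy.integrate import quad


def fisher_rao(r1, r2):
    """One-parameter Fisher-Rao distance: integral of sqrt(G(r)) dr between r1 and r2,
    with the assumed G(r) = (1 + r^2) / (1 - r^2)^2."""
    def sqrt_g(r):
        return np.sqrt(1.0 + r ** 2) / (1.0 - r ** 2)

    value, _ = quad(sqrt_g, min(r1, r2), max(r1, r2))
    return value


def kl(r1, r2):
    """KL(N(0, R1) || N(0, R2)) for R_k = [[1, r_k], [r_k, 1]]."""
    return 0.5 * ((2 - 2 * r1 * r2) / (1 - r2 ** 2) - 2 + np.log((1 - r2 ** 2) / (1 - r1 ** 2)))


def w2(r1, r2):
    """Bures-Wasserstein W2 between N(0, R1) and N(0, R2), an assumed stand-in for W2
    between the copulas. The R_k share eigenvectors (eigenvalues 1 + r and 1 - r),
    so W2 compares the square roots of these eigenvalues."""
    return np.hypot(np.sqrt(1 + r1) - np.sqrt(1 + r2), np.sqrt(1 - r1) - np.sqrt(1 - r2))


c1, c2, c3 = 0.5, 0.99, 0.9999
for name, d in [("Fisher-Rao", fisher_rao), ("KL", kl), ("W2", w2)]:
    print(f"{name}: D(C1, C2) = {d(c1, c2):.3f}, D(C2, C3) = {d(c2, c3):.3f}")
```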

Distances as a function of $(\rho_1, \rho_2)$

[Figure: distance heatmap and surface as a function of $(\rho_1, \rho_2)$, for Fisher-Rao and for Wasserstein $W_2$]

Impact of distances on clustering

Datasets of bivariate time series are generated from six Gaussian copulas with correlations .1, .2, .6, .7, .99, .9999.

[Figure: distance heatmaps for Fisher-Rao (left) and $W_2$ (right)]

Using Ward clustering, Fisher-Rao yields clusters of copulas with correlations {.1, .2, .6, .7}, {.99}, {.9999}, whereas $W_2$ yields {.1, .2}, {.6, .7}, {.99, .9999}.
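The sketch below runs Ward clustering (`scipy.cluster.hierarchy.linkage` with `method="ward"`, cut by `fcluster`) on closed-form distances between the six parametric copulas. Unlike the experiment, it does not estimate copulas from generated time series. Its assumptions:

- the one-parameter Fisher-Rao distance uses $G(\rho) = (1+\rho^2)/(1-\rho^2)^2$, through an antiderivative of $\sqrt{G}$ that can be checked by differentiation;
- $W_2$ is the Bures-Wasserstein distance between the underlying Gaussians.

Helper names are hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def fr_coordinate(r):
    """Antiderivative on (-1, 1) of sqrt(G(r)) = sqrt(1 + r^2) / (1 - r^2), assumed G."""
    return np.sqrt(2) * np.arctanh(np.sqrt(2) * r / np.sqrt(1 + r ** 2)) - np.arcsinh(r)


def fisher_rao(r1, r2):
    return abs(fr_coordinate(r2) - fr_coordinate(r1))


def w2(r1, r2):
    """Bures-Wasserstein W2 between the underlying 2x2 correlation Gaussians (assumed stand-in)."""
    return np.hypot(np.sqrt(1 + r1) - np.sqrt(1 + r2), np.sqrt(1 - r1) - np.sqrt(1 - r2))


def ward_clusters(rhos, dist, k=3):
    """Ward linkage on the condensed distance matrix, cut into k flat clusters."""
    # combinations(range(n), 2) matches scipy's condensed (pdist) ordering.
    condensed = [dist(rhos[i], rhos[j]) for i, j in combinations(range(len(rhos)), 2)]
    labels = fcluster(linkage(condensed, method="ward"), t=k, criterion="maxclust")
    groups = {}
    for rho, lab in zip(rhos, labels):
        groups.setdefault(lab, []).append(rho)
    return sorted(groups.values())


rhos = [0.1, 0.2, 0.6, 0.7, 0.99, 0.9999]
print("Fisher-Rao:", ward_clusters(rhos, fisher_rao))
print("W2:", ward_clusters(rhos, w2))
```

In this sketch, the two partitions coincide with the ones reported on the slide.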

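Fisher metric and the Cramér–Rao lower bound

Cramér–Rao lower bound (CRLB)

The variance of any unbiased estimator $\hat\theta$ of $\theta$ is bounded by the reciprocal of the Fisher information $G(\theta)$:

$$\operatorname{var}(\hat\theta) \ge \frac{1}{G(\theta)}.$$

In the bivariate Gaussian copula case (for a single observation; for $n$ i.i.d. observations the bound is divided by $n$),

$$\operatorname{var}(\hat\rho) \ge \frac{(\rho - 1)^2(\rho + 1)^2}{\rho^2 + 1}.$$

Fisher metric and the Cramér–Rao lower bound

We consider the set of $2 \times 2$ correlation matrices $C = \begin{pmatrix} 1 & \theta \\ \theta & 1 \end{pmatrix}$ parameterized by $\theta$, and the bivariate standard normal density with correlation matrix $C$. The Gaussian copula density differs from it only by the standard normal marginal densities, which do not depend on $\theta$, so both have the same Fisher information.

Let $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2$.

$$f(x;\theta) = \frac{1}{2\pi\sqrt{1-\theta^2}} \exp\left(-\frac{1}{2} x^\top C^{-1} x\right) = \frac{1}{2\pi\sqrt{1-\theta^2}} \exp\left(-\frac{x_1^2 + x_2^2 - 2\theta x_1 x_2}{2(1-\theta^2)}\right)$$

$$\log f(x;\theta) = -\log\left(2\pi\sqrt{1-\theta^2}\right) - \frac{x_1^2 + x_2^2 - 2\theta x_1 x_2}{2(1-\theta^2)}$$

$$\frac{\partial^2 \log f(x;\theta)}{\partial\theta^2} = \frac{\theta^2+1}{(\theta^2-1)^2} - \frac{x_1^2}{2(\theta+1)^3} + \frac{x_1^2}{2(\theta-1)^3} - \frac{x_2^2}{2(\theta+1)^3} + \frac{x_2^2}{2(\theta-1)^3} - \frac{x_1 x_2}{(\theta+1)^3} - \frac{x_1 x_2}{(\theta-1)^3}$$

Then, we compute $\int_{\mathbb{R}^2} \frac{\partial^2 \log f(x;\theta)}{\partial\theta^2} f(x;\theta)\,dx$.

Since $E[x_1] = E[x_2] = 0$, $E[x_1 x_2] = \theta$, $E[x_1^2] = E[x_2^2] = 1$, we get

$$\int_{\mathbb{R}^2} \frac{\partial^2 \log f(x;\theta)}{\partial\theta^2} f(x;\theta)\,dx = \frac{\theta^2+1}{(\theta^2-1)^2} - \frac{1}{2(\theta+1)^3} + \frac{1}{2(\theta-1)^3} - \frac{1}{2(\theta+1)^3} + \frac{1}{2(\theta-1)^3} - \frac{\theta}{(\theta+1)^3} - \frac{\theta}{(\theta-1)^3}$$
$$= \frac{\theta^2+1}{(\theta^2-1)^2} - \frac{1}{(\theta+1)^2} - \frac{1}{(\theta-1)^2} = -\frac{\theta^2+1}{(\theta-1)^2(\theta+1)^2}.$$

Thus,

$$G(\theta) = \frac{\theta^2 + 1}{(\theta - 1)^2(\theta + 1)^2}.$$

Fisher metric and the Cramér–Rao lower bound

In the bivariate Gaussian copula case,

$$\operatorname{var}(\hat\rho) \ge \frac{(\rho - 1)^2(\rho + 1)^2}{\rho^2 + 1}.$$

Recall that, locally, Fisher-Rao and the f-divergences are a quadratic form of the Fisher metric, $(\nabla\theta)^\top G(\theta)\,\nabla\theta$. So the discriminative power of these distances is well calibrated with respect to statistical uncertainty: they induce the appropriate curvature on the parameter space. For instance, as $|\rho| \to 1$, $\rho$ can be estimated ever more precisely and $G(\rho)$ blows up, so a small change in $\rho$ yields a large distance.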
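Here is a Monte Carlo check of the Fisher information of the correlation parameter against the closed form $(1+\theta^2)/(1-\theta^2)^2$. It estimates $-E[\partial^2 \log f / \partial\theta^2]$ with central finite differences of `scipy.stats.multivariate_normal.logpdf`. The sample size, step `h` and seed are arbitrary choices, and the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal


def log_density(x, theta):
    """log f(x; theta) for a bivariate standard normal with correlation theta."""
    return multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, theta], [theta, 1.0]]).logpdf(x)


def fisher_info_mc(theta, n=200_000, h=1e-4, seed=0):
    """Estimate G(theta) = -E[d^2 log f / d theta^2] with central finite differences in theta."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal([0.0, 0.0], [[1.0, theta], [theta, 1.0]], size=n)
    second = (log_density(x, theta + h) - 2.0 * log_density(x, theta)
              + log_density(x, theta - h)) / h ** 2
    return -second.mean()


for theta in (0.0, 0.5, 0.9):
    closed_form = (1 + theta ** 2) / (1 - theta ** 2) ** 2
    print(theta, fisher_info_mc(theta), closed_form)
```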

Properties of these distances

In addition, for clustering we prefer OT because:

- in a parametric setting: Fisher-Rao and f-divergences are defined on density manifolds, but some important copulas (such as the Fréchet-Hoeffding upper bound) do not belong to these manifolds. Thus, even when closed-form formulas exist (as in the Gaussian case), they are ill-defined for these copulas (for perfect dependence, the covariance matrix is not invertible);
- in a non-parametric/empirical setting: f-divergences are defined for absolutely continuous measures, and thus require a kernel density estimation (KDE) pre-processing step; they are not aware of the support geometry, and thus handle noise on the support badly.
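The small illustration below covers the parametric point, assuming Gaussian models. For perfect dependence ($\rho = 1$, the Fréchet-Hoeffding upper bound), the covariance matrix is singular. The closed-form KL divergence then either needs to invert it or is infinite, whereas the Bures-Wasserstein formula for $W_2$ between centered Gaussians stays finite. Helper names are hypothetical. The matrix square root is computed by eigendecomposition so that it also handles singular matrices.

```python
import numpy as np


def corr(r):
    """2x2 correlation matrix with off-diagonal r."""
    return np.array([[1.0, r], [r, 1.0]])


def kl_gauss(s1, s2):
    """KL(N(0, s1) || N(0, s2)); needs s2 invertible, and is infinite if s1 is singular."""
    s2_inv = np.linalg.inv(s2)          # raises LinAlgError when s2 is singular
    _, logdet1 = np.linalg.slogdet(s1)  # -inf when s1 is singular
    _, logdet2 = np.linalg.slogdet(s2)
    return 0.5 * (np.trace(s2_inv @ s1) - s1.shape[0] + logdet2 - logdet1)


def psd_sqrt(s):
    """Square root of a symmetric positive semi-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(s)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def w2_gauss(s1, s2):
    """Bures-Wasserstein W2 between N(0, s1) and N(0, s2); also defined for singular covariances."""
    r2 = psd_sqrt(s2)
    cross = psd_sqrt(r2 @ s1 @ r2)
    return float(np.sqrt(max(np.trace(s1 + s2 - 2.0 * cross), 0.0)))


upper = corr(1.0)  # perfect dependence: singular covariance
print(w2_gauss(corr(0.5), upper))  # finite
print(kl_gauss(upper, corr(0.5)))  # inf
try:
    kl_gauss(corr(0.5), upper)
except np.linalg.LinAlgError as err:
    print("KL undefined:", err)
```

In this example, $W_2$ between the $\rho = 0.5$ and $\rho = 1$ Gaussians equals $\sqrt{3} - 1$.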

Barycenters

OT is defined for both discrete/empirical and continuous measures and is support-geometry aware:

[Figure: 5 copulas describing the dependence between $X \sim \mathcal{U}([0, 1])$ and $Y \sim (X \pm \varepsilon_i)^2$, where $\varepsilon_i$ is a constant noise specific to each distribution (heatmaps on $[0, 1]^2$)]

[Figure: barycenter of the 5 copulas for a divergence and for OT (panel: "Wasserstein barycenter copula")]
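The sketch below contrasts the two kinds of barycenter on empirical copulas. Its assumptions:

- hypothetical noise levels $\varepsilon_i$, with the minus sign, i.e. $Y = (X - \varepsilon_i)^2$;
- the KL divergence stands in for the divergence, for which $\arg\min_q \sum_k \mathrm{KL}(C_k \,\|\, q)$ is the mixture of the $C_k$ (all samples pooled);
- the $W_2$ barycenter comes from a swapped-in fixed-point heuristic that alternates optimal assignments (`scipy.optimize.linear_sum_assignment`) with averaging of matched points. This is not necessarily the method behind the figure, and it only guarantees a non-increasing objective.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pseudo_observations(x):
    """Rank-transform each column of x to {1/n, ..., n/n}."""
    return (np.argsort(np.argsort(x, axis=0), axis=0) + 1) / x.shape[0]


def sq_cost(p, q):
    """Matrix of squared Euclidean distances between the points of p and q."""
    return ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=-1)


def mean_w2_sq(z, clouds):
    """Barycenter objective: average squared W2 from z to each cloud (exact, via assignment)."""
    total = 0.0
    for c in clouds:
        cost = sq_cost(z, c)
        rows, cols = linear_sum_assignment(cost)
        total += cost[rows, cols].mean()
    return total / len(clouds)


def w2_barycenter(clouds, n_iter=20):
    """Heuristic W2 barycenter of equal-size uniform clouds: alternate optimal
    assignments and averaging of matched points, starting from clouds[0].
    Each step does not increase mean_w2_sq; only a local optimum is reached."""
    z = clouds[0].copy()
    for _ in range(n_iter):
        matched = [c[linear_sum_assignment(sq_cost(z, c))[1]] for c in clouds]
        z = np.mean(matched, axis=0)
    return z


rng = np.random.default_rng(5)
n = 100
eps = [0.3, 0.4, 0.5, 0.6, 0.7]  # hypothetical noise levels epsilon_i, with Y = (X - eps)^2
clouds = []
for e in eps:
    x = rng.uniform(size=n)
    clouds.append(pseudo_observations(np.column_stack([x, (x - e) ** 2])))

# Divergence-style barycenter (e.g. KL): the mixture, i.e. all samples pooled together.
mixture = np.vstack(clouds)
bary = w2_barycenter(clouds)
print(mixture.shape, bary.shape)  # (500, 2) (100, 2)
print(mean_w2_sq(clouds[0], clouds), mean_w2_sq(bary, clouds))
```

The mixture overlays the five V-shaped supports. The $W_2$ barycenter, by contrast, moves points toward a single averaged shape, which is the support-geometry awareness the slide refers to.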

Future Research

Develop further geometries of copulas:
- using Optimal Transport: show that dependence-clustering of time series is improved over standard correlations;
- using f-divergences: efficiently detect dependence-regime switching in multivariate time series (cf. Frédéric Barbaresco’s work on radar signal processing).

Numerical experiments and code:

https://www.datagrapple.com/Tech/fisher-vs-ot.html

References
[1] Shun-ichi Amari and Andrzej Cichocki. Information geometry of divergence functions. Bulletin of the Polish Academy of Sciences: Technical Sciences, 58(1):183–195, 2010.

[2] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, pages 2292–2300, 2013.

[3] Sira Ferradans, Nicolas Papadakis, Julien Rabin, Gabriel Peyré, and Jean-François Aujol. Regularized discrete optimal transport. Springer, 2013.
