
Gradient flows, optimal transport, and evolution PDE’s

2 - A quick introduction to Optimal Transport

Giuseppe Savaré
http://www.imati.cnr.it/∼savare

Dipartimento di Matematica, Università di Pavia

GNFM Summer School
Ravello, September 13–18, 2010

Outline

1 A short historical tour

2 The “discrete” case, duality and linear programming

3 The measure-theoretic setting

4 Euclidean spaces: geometry and transport maps

1 A short historical tour

Gaspard Monge (1746-1818)

1781: “La théorie des déblais et des remblais”

Problem: how to transport soil from the ground to a given configuration in the “most efficient” way.

3 The founding fathers of optimal transport

minimize the total cost. Monge assumed that the transport cost of one unit of mass along a certain distance was given by the product of the mass by the distance.

Fig. 3.1. Monge’s problem of déblais and remblais

Nowadays there is a Monge street in Paris, and therein one can find an excellent bakery called Le Boulanger de Monge. To acknowledge this, and to illustrate how Monge’s problem can be recast in an economic perspective, I shall express the problem as follows. Consider a large number of bakeries, producing loaves, that should be transported each morning to cafés where consumers will eat them. The amount of bread that can be produced at each bakery, and the amount that will be consumed at each café are known in advance, and can be modeled as probability measures (there is a “density of production” and a “density of consumption”) on a certain space, which in our case would be Paris (equipped with the natural metric such that the distance between two points is the length of the shortest path joining them). The problem is to find in practice where each unit of bread should go (see Figure 3.2), in such a way as to minimize the total transport cost. So Monge’s problem really is the search of an optimal coupling; and to be more precise, he was looking for a deterministic optimal coupling.

Fig. 3.2. Economic illustration of Monge’s problem: squares stand for production units, circles for consumption places.

The transport cost is proportional to the distance |T (x)− x|.

Leonid Kantorovich (1912-1986)

1939: Mathematical Methods of Organizing and Planning of Production (unpublished until 1960).
1942: On the translocation of masses
1948: On a problem of Monge

1975: Nobel prize, jointly with Tjalling Koopmans, “for their contributions to the theory of optimum allocation of resources”

Autobiography: http://nobelprize.org/nobel_prizes/economics/laureates/1975/kantorovich-autobio.html

Parallel contributions:
1941: Frank Hitchcock, The distribution of a product from several sources to numerous localities (Jour. Math. Phys.)
1947: Tjalling Koopmans, Optimum utilization of the transportation system.
1947: George Dantzig, simplex method.

Towards the recent theory...

• Statistical and probabilistic aspects (early 1900s: Gini, Dall’Aglio, Hoeffding, Fréchet, ...); Rachev-Rüschendorf, Mass Transportation Problems (1998)

• Particle systems, Boltzmann equation: Dobrushin, Tanaka (∼’70)

• Yann Brenier (’89): fluid mechanics, transport map, polar decomposition. Dynamical interpretation of optimal transport.

• John Mather: Lagrangian dynamical systems.

• Mike Cullen: meteorological models, semigeostrophic equations.

• Regularity, geometric and functional inequalities, Riemannian geometry, urban planning, evolution equations, etc.: L. Caffarelli, C. Evans, W. Gangbo, R. McCann, F. Otto, L. Ambrosio, G. Buttazzo, C. Villani, J. Lott, N. Trudinger, G. Loeper, T. Sturm, J. Carrillo, G. Toscani, A. Pratelli, ...

• C. Villani: Optimal transport: Old and New, Springer (2009), 978 p.

2 The “discrete” case, duality and linear programming

Discrete formulation

• Initial configuration of resources in $X = \{x_1, \dots, x_h\}$; at every point $x_i \in X$ the quantity $m_i = m(x_i)$ is available.

• Final configuration $Y = \{y_1, \dots, y_n\}$: at every point $y_j$ the quantity $n_j = n(y_j)$ is expected.

• The unit cost $c_{ij} = c(x_i, y_j)$ for transporting a single unit from position $x_i$ to the destination $y_j$.

Admissible transference plan: choose the quantities $T_{i,j} = T(x_i, y_j)$ moved from $x_i$ to $y_j$, so that

\[ T(x_i, y_j) \ge 0, \qquad \sum_{y \in Y} T(x_i, y) = m(x_i), \qquad \sum_{x \in X} T(x, y_j) = n(y_j). \]

The cost of the transference plan $T$ is

\[ C(T) := \sum_{x \in X,\, y \in Y} c(x, y)\, T(x, y). \]
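As a concrete check of these definitions, the following minimal sketch tests admissibility and evaluates the cost on a tiny example. The data (`m`, `n`, `c`, `T_a`, `T_b`) and the function names are illustrative assumptions, not taken from the slides.

```python
def is_admissible(T, m, n, tol=1e-12):
    """Check T[i][j] >= 0, row sums equal to m[i], column sums equal to n[j]."""
    nonneg = all(q >= 0 for row in T for q in row)
    rows_ok = all(abs(sum(T[i]) - m[i]) <= tol for i in range(len(m)))
    cols_ok = all(abs(sum(row[j] for row in T) - n[j]) <= tol for j in range(len(n)))
    return nonneg and rows_ok and cols_ok


def total_cost(T, c):
    """C(T) = sum over i, j of c[i][j] * T[i][j]."""
    return sum(c[i][j] * T[i][j] for i in range(len(T)) for j in range(len(T[i])))


# Hypothetical data: two sources, two destinations.
m = [2, 1]              # available quantities m_i
n = [1, 2]              # expected quantities n_j
c = [[1, 3], [2, 1]]    # unit costs c_ij

T_a = [[1, 1], [0, 1]]  # one admissible plan
T_b = [[0, 2], [1, 0]]  # another admissible plan

print(is_admissible(T_a, m, n), total_cost(T_a, c))  # True 5
print(is_admissible(T_b, m, n), total_cost(T_b, c))  # True 8
```

For this data every admissible plan has the form [[a, 2 − a], [1 − a, a]] with 0 ≤ a ≤ 1 and cost 8 − 3a, so `T_a` (a = 1) is optimal.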

Optimal transport

Problem

Find the best transference plan $T$ which minimizes the cost $C(T)$ among all the admissible plans.

The linear programming structure: given positive coefficients $m_i$, $n_j$ and $c_{i,j}$, find the quantities $T_{i,j}$ minimizing the linear functional

\[ C(T) = \sum_{i,j} c_{i,j}\, T_{i,j} \]

under the linear/convex constraints

\[ T_{i,j} \ge 0, \qquad \sum_j T_{i,j} = m_i, \qquad \sum_i T_{i,j} = n_j. \]

In vector notation:

\[ \min \big\{ \vec C \cdot \vec T \;:\; A_0 \vec T \ge 0,\ A_1 \vec T = \vec b \big\}. \]

In the discrete case existence of the optimal plan is easy; more important are 3 fundamental properties:

• Cyclical monotonicity of the optimal transference plan.

• Dual characterization, Kantorovich potentials (prices in economic terms), linear programming.

• Integrality of the transference plan, transport maps.
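To make the vector notation concrete, here is a minimal sketch that assembles the cost vector, the equality matrix and the right-hand side for a small problem. The row-by-row flattening of T, the helper names and the data are assumptions made for illustration; with this layout the matrix A_0 is simply the identity.

```python
def transport_lp_data(m, n, c):
    """Flatten the transport problem into (C, A1, b), with T stored row by row."""
    h, k = len(m), len(n)
    C = [c[i][j] for i in range(h) for j in range(k)]
    # One equality row per source: sum_j T_ij = m_i.
    rows = [[1 if p // k == i else 0 for p in range(h * k)] for i in range(h)]
    # One equality row per destination: sum_i T_ij = n_j.
    cols = [[1 if p % k == j else 0 for p in range(h * k)] for j in range(k)]
    return C, rows + cols, list(m) + list(n)


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical data: two sources, two destinations.
m, n = [2, 1], [1, 2]
c = [[1, 3], [2, 1]]
C, A1, b = transport_lp_data(m, n, c)

T_flat = [1, 1, 0, 1]  # the plan [[1, 1], [0, 1]] stored row by row
print(matvec(A1, T_flat) == b)                    # True
print(sum(ci * ti for ci, ti in zip(C, T_flat)))  # 5
```

The triple (C, A1, b), together with the sign constraint on T, is the kind of input a general-purpose linear programming solver expects.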


Cyclical monotonicity

Consider an arbitrary collection of couples $(x, y)$ joined by a transport ray, i.e. $T(x, y) > 0$: in the picture $(x_2, y_1)$, $(x_3, y_2)$, $(x_4, y_3)$.

The associated (unitary) cost cannot be lowered by applying a (cyclical) permutation $\sigma$ of the targets, $y_1 \to y_2 \to y_3 \to y_1$:

\[ c(x_2, y_1) + c(x_3, y_2) + c(x_4, y_3) \le c(x_2, y_2) + c(x_3, y_3) + c(x_4, y_1). \]

Theorem (Rachev-Rüschendorf)

If $T$ is optimal, the cost of any configuration rearranged by a cyclical permutation cannot decrease.

Cyclical monotonicity is also sufficient

Theorem

If $T$ is a cyclically monotone admissible plan then it is optimal.
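The condition can be tested by brute force on a small support. The sketch below checks every permutation of every sub-collection of rays (which includes the cyclical ones); the function name, the quadratic cost and the two example supports are assumptions chosen for illustration, and the enumeration is only practical for a handful of points.

```python
from itertools import combinations, permutations


def is_cyclically_monotone(support, c, tol=1e-12):
    """support: list of (x, y) pairs with T(x, y) > 0; c: cost function c(x, y)."""
    for k in range(2, len(support) + 1):
        for pairs in combinations(support, k):
            base = sum(c(x, y) for x, y in pairs)
            targets = [y for _, y in pairs]
            for rearranged in permutations(targets):
                new_cost = sum(c(x, y) for (x, _), y in zip(pairs, rearranged))
                if new_cost < base - tol:
                    return False  # a rearrangement of the targets is cheaper
    return True


def quadratic(x, y):
    """Hypothetical cost on the real line."""
    return (x - y) ** 2


print(is_cyclically_monotone([(0, 1), (1, 2)], quadratic))  # True
print(is_cyclically_monotone([(0, 2), (1, 1)], quadratic))  # False
```

In the second example, swapping the two targets lowers the cost from 4 to 2, so that support cannot belong to an optimal plan.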

The dual problem: optimal prices

Linear programming: the dual problem gives a crucial insight into the structure of the optimal transference plan.

Economic interpretation: a transport company offers to take care of the transportation job: they will pay the price $u(x)$ to buy a unit placed at the point $x$ and they will sell it at $y$ for the price $v(y)$.
To be competitive, the prices should be more convenient than the transportation cost $c(x, y)$:

\[ v(y) - u(x) \le c(x, y) \qquad x \in X,\ y \in Y \tag{*} \]

The total profit for the company is

\[ P(u, v) := \sum_{y \in Y} n(y)\, v(y) - \sum_{x \in X} m(x)\, u(x) \]

and their problem is to find the prices which maximize the profit:

\[ \max P(u, v) \quad \text{among all the competitive prices } (u, v) \text{ satisfying (*)}. \]

Clearly $C(T) \ge P(u, v)$ for every admissible transference plan $T$ and every couple of competitive prices $u, v$.
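The inequality follows in a few lines. The chain below spells it out, using only $T \ge 0$, the competitiveness condition (*), and the two marginal constraints of an admissible plan:

```latex
\begin{align*}
C(T) &= \sum_{x \in X,\, y \in Y} c(x,y)\, T(x,y)
      \;\ge\; \sum_{x \in X,\, y \in Y} \big( v(y) - u(x) \big)\, T(x,y) \\
     &= \sum_{y \in Y} v(y) \sum_{x \in X} T(x,y) \;-\; \sum_{x \in X} u(x) \sum_{y \in Y} T(x,y) \\
     &= \sum_{y \in Y} n(y)\, v(y) \;-\; \sum_{x \in X} m(x)\, u(x) \;=\; P(u,v).
\end{align*}
```

Equality holds exactly when $v(y) - u(x) = c(x,y)$ on every pair with $T(x,y) > 0$.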


Duality theorem

Theorem (Min-max and “complementary slackness”)

An admissible transference plan $T$ is optimal if and only if there exist competitive prices $(u, v)$ such that

\[ C(T) = P(u, v). \]

In particular

\[ \min_T C(T) = \max_{(u,v)} P(u, v). \]

Moreover, the “slackness”

\[ S(x, y) := c(x, y) + u(x) - v(y) \ge 0 \]

satisfies the “complementary slackness principle”

\[ T(x, y)\, S(x, y) = 0, \quad \text{i.e.} \quad T(x, y) > 0 \ \Rightarrow\ S(x, y) = 0. \]

“If $x$ and $y$ are connected through an optimal transport ray then their respective prices $u(x)$ and $v(y)$ are maximal: $v(y) - u(x) = c(x, y)$.”
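A small numerical instance may help. In the sketch below the data, the plan `T` and the prices `u`, `v` are all hypothetical; the slackness is written as c + u − v, so that S ≥ 0 is exactly the competitiveness condition v(y) − u(x) ≤ c(x, y).

```python
# Hypothetical data: two sources, two destinations.
m, n = [2, 1], [1, 2]
c = [[1, 3], [2, 1]]
T = [[1, 1], [0, 1]]   # an admissible plan
u = [0, 2]             # purchase prices u(x_i)
v = [1, 3]             # selling prices v(y_j)

idx = [(i, j) for i in range(2) for j in range(2)]

# Slackness S = c + u - v, nonnegative exactly when v - u <= c.
S = {(i, j): c[i][j] + u[i] - v[j] for i, j in idx}
competitive = all(s >= 0 for s in S.values())

cost = sum(c[i][j] * T[i][j] for i, j in idx)
profit = sum(n[j] * v[j] for j in range(2)) - sum(m[i] * u[i] for i in range(2))
complementary = all(T[i][j] * S[i, j] == 0 for i, j in idx)

print(competitive, cost, profit, complementary)  # True 5 5 True
```

Since cost and profit coincide, the theorem certifies that this plan and these prices are both optimal.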


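Duality via Von Neumann min-max

\[ \min \Big\{ \sum_{i,j} c_{i,j} T_{i,j} \;:\; T_{i,j} \ge 0,\ \sum_j T_{i,j} = m_i,\ \sum_i T_{i,j} = n_j \Big\}. \]

Introduce Lagrange multipliers $S_{i,j} \ge 0$, $u_i$, $v_j$ for the constraints:

\begin{align*}
\min_{T \text{ admissible}} \sum_{i,j} c_{i,j} T_{i,j}
&= \min_T \max_{S \ge 0,\, u,\, v} \; \sum_{i,j} c_{i,j} T_{i,j} - \sum_{i,j} S_{i,j} T_{i,j}
   + \sum_i u_i \Big( \sum_j T_{i,j} - m_i \Big) - \sum_j v_j \Big( \sum_i T_{i,j} - n_j \Big) \\
&= \min_T \max_{S \ge 0,\, u,\, v} \; \sum_{i,j} \big( c_{i,j} - S_{i,j} + u_i - v_j \big) T_{i,j} + \sum_j v_j n_j - \sum_i u_i m_i \\
&= \max_{S \ge 0,\, u,\, v} \min_T \; \sum_{i,j} \big( c_{i,j} - S_{i,j} + u_i - v_j \big) T_{i,j} + \sum_j v_j n_j - \sum_i u_i m_i \\
&= \max_{S \ge 0,\, u,\, v} \Big\{ \sum_j v_j n_j - \sum_i u_i m_i \;:\; c_{i,j} - S_{i,j} + u_i - v_j = 0 \Big\} \\
&= \max_{u,\, v} \Big\{ \sum_j v_j n_j - \sum_i u_i m_i \;:\; c_{i,j} + u_i - v_j \ge 0 \Big\}.
\end{align*}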


Integrality

Theorem

If the initial and final configurations $m(x), n(y) \in \mathbb{N}$ are integers then there exists an integer optimal transference plan $T$, i.e. $T(x, y) \in \mathbb{N}$.

In other words, there is no need to split unitary quantities in order to realize the optimal transport.

Corollary

If $m(x) \equiv 1$ and $n(y)$ are integers, then the transference plan $T$ is associated to a transport map $t : X \to Y$ so that

\[ T(x, y) > 0 \ \Leftrightarrow\ y = t(x). \]

If moreover $n(y) \equiv 1$ then the map $t$ is one-to-one.

Roughly speaking: from every point $x \in X$ starts a unique transport ray and mass is not split in various directions.
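The passage from an integer plan to a map can be sketched in a few lines. The helper name `transport_map` and the example plan are assumptions for illustration: each row of T carries one unit (m ≡ 1), and the map sends index i to the unique column that receives it.

```python
def transport_map(T):
    """Return t with t[i] = j when row i of T sends all its mass to column j."""
    t = []
    for i, row in enumerate(T):
        targets = [j for j, q in enumerate(row) if q > 0]
        if len(targets) != 1:
            raise ValueError(f"row {i} is not sent to a single target")
        t.append(targets[0])
    return t


# Hypothetical integer plan with m = [1, 1, 1] and n = [1, 2, 0].
T = [[0, 1, 0],
     [1, 0, 0],
     [0, 1, 0]]

t = transport_map(T)
print(t)                      # [1, 0, 1]
print(len(set(t)) == len(t))  # False
```

The map is not one-to-one here because n is not identically 1: two units arrive at the same destination.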


3 The measure-theoretic setting

Measure data

• $X, Y$ discrete spaces → $X, Y$ topological spaces ($\mathbb{R}$, $\mathbb{R}^N$, locally compact spaces, Polish (i.e. complete and separable) spaces, Radon spaces, ...): here $\mathbb{R}^N$.

• The cost → a (lower semi-)continuous function $c : X \times Y \to \mathbb{R} \cup \{+\infty\}$.

• The initial and final configurations $m(x)$, $n(y)$ → a couple of Borel measures $\mu$, $\nu$ on $X$ and $Y$. The mass is normalized to 1. Given $A \subset X$, $B \subset Y$, $\mu(A)$ denotes the quantity of resources available in $A$, $\nu(B)$ denotes the resources expected in $B$.

Transport plan $T$ → a measure $\gamma$ on $X \times Y$: $\gamma(A \times B)$ is the mass coming from $A$ and transported into $B$.

Admissibility: the marginals of $\gamma$ are thus fixed ($\gamma$ is a coupling between $\mu$ and $\nu$)

\[ \gamma(A \times Y) = \mu(A), \qquad \gamma(X \times B) = \nu(B). \]

$\Gamma(\mu, \nu)$: collection of all the admissible transference plans/couplings.

The cost of a transference plan $\gamma$ is

\[ \sum_{x,y} c(x, y)\, T(x, y) \ \longrightarrow\ C(\gamma) := \iint_{X \times Y} c(x, y)\, d\gamma(x, y). \]


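Transport and probability

Discrete setting: $\{x_1, \dots, x_N\}$, $\{m_1, \dots, m_N\}$ → $\mu = \sum_i m_i \delta_{x_i}$. $t$ := transport map, $y_i = t(x_i)$,

\[ t_\# \mu = \nu = \sum_i m_i \delta_{y_i}. \]

In terms of measures

\[ \nu(B) = \sum_{i :\, y_i \in B} m_i = \sum_{i :\, t(x_i) \in B} m_i = \sum_{i :\, x_i \in t^{-1}(B)} m_i = \mu(t^{-1}(B)). \]

In general, for every Borel map $t : X \to Y$ and every Borel measure $\mu \in \mathcal{P}(X)$ we define

\[ \nu = t_\# \mu \ \Leftrightarrow\ \nu(B) = \mu(t^{-1}(B)). \]

In probability: $\mathbb{P}$ is a probability measure on the probability space $\Omega$, $X : \Omega \to \mathcal{X}$ is a random variable,

\[ X_\# \mathbb{P} \in \mathcal{P}(\mathcal{X}) \text{ is the law of } X, \qquad X_\# \mathbb{P}(A) = \mathbb{P}[X \in A]. \]

Change of variable formula:

\[ \int_X \phi(t(x))\, d\mu(x) = \int_Y \phi(y)\, d\nu(y). \]

Expectation:

\[ E[\phi(X)] = \int_\Omega \phi(X(\omega))\, d\mathbb{P}(\omega) = \int_{\mathcal{X}} \phi(x)\, d(X_\# \mathbb{P})(x). \]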
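For discrete measures the push-forward is just a regrouping of masses. The following sketch, in which the dictionary representation, the map t(x) = x² and the test function are illustrative assumptions, computes t#µ and checks the change of variable formula.

```python
from collections import defaultdict


def push_forward(mu, t):
    """mu: dict point -> mass. Returns the dict representing t#mu."""
    nu = defaultdict(float)
    for x, mass in mu.items():
        nu[t(x)] += mass  # all the mass sitting at x is moved to t(x)
    return dict(nu)


def square(x):
    """Hypothetical transport map t(x) = x^2."""
    return x * x


def phi(y):
    """Hypothetical test function."""
    return y + 3


mu = {-1: 0.25, 0: 0.5, 1: 0.25}
nu = push_forward(mu, square)
print(nu)  # {1: 0.5, 0: 0.5}

lhs = sum(phi(square(x)) * mass for x, mass in mu.items())  # integral of phi(t(x)) dmu
rhs = sum(phi(y) * mass for y, mass in nu.items())          # integral of phi(y) dnu
print(lhs, rhs)  # 3.5 3.5
```

The two points −1 and 1 are merged into the single point 1, which is why a map can concentrate mass but never split it.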


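The general problem

Problem

Given two Borel probability measures $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$, find an admissible transference plan $\gamma \in \Gamma(\mu, \nu)$ minimizing the total cost

\[ \min_{\gamma \in \Gamma(\mu,\nu)} C(\gamma) = \min_{\gamma \in \Gamma(\mu,\nu)} \iint_{X \times Y} c(x, y)\, d\gamma(x, y). \]

Kantorovich potentials: functions $u : X \to \mathbb{R}$, $v : Y \to \mathbb{R}$ such that

\[ v(y) - u(x) \le c(x, y) \tag{$\Pi(c)$} \]

\[ \sum_y v(y)\, n(y) - \sum_x u(x)\, m(x) \ \longrightarrow\ P(u, v) := \int_Y v(y)\, d\nu(y) - \int_X u(x)\, d\mu(x). \]

Problem (Dual formulation)

Find a couple of Kantorovich potentials $(u, v) \in \Pi(c)$ maximizing

\[ \max_{\Pi(c)} P(u, v). \]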


A fundamental theorem

Assume that the cost is continuous and feasible, e.g.

\[ C(\mu \otimes \nu) = \iint_{X \times Y} c(x, y)\, d(\mu \otimes \nu)(x, y) < +\infty \qquad \text{(sufficient feasibility condition)} \]

Theorem

Existence: There exists an optimal transference plan $\gamma_{opt} \in \Gamma(\mu, \nu)$ and a couple of optimal Kantorovich potentials $(u_{opt}, v_{opt}) \in \Pi(c)$.

Duality:

\[ C(\gamma_{opt}) = \min_{\Gamma(\mu,\nu)} C(\gamma) = \max_{\Pi(c)} P(u, v) = P(u_{opt}, v_{opt}). \]

Slackness: For every $(x, y) \in \mathrm{supp}(\gamma_{opt})$ (connection by a transport ray)

\[ c(x, y) = v_{opt}(y) - u_{opt}(x). \]

Cyclical monotonicity: For every $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$ in the support of $\gamma_{opt}$ and every permutation $\sigma : \{1, 2, \dots, N\} \to \{1, 2, \dots, N\}$

\[ c(x_1, y_1) + \cdots + c(x_N, y_N) \le c(x_1, y_{\sigma(1)}) + \cdots + c(x_N, y_{\sigma(N)}). \]


4 Euclidean spaces: geometry and transport maps

Some important questions

• Uniqueness of the optimal transference plan

• Integrality → existence of a transport map.

• Links with the geometry: the cost function $c(x, y)$ depends on the distance between $x$ and $y$ ($|x - y|$ when $X = Y = \mathbb{R}^d$)

• Regularity of Kantorovich potentials

• Further information when the measures $\mu = f \mathscr{L}^d \ll \mathscr{L}^d$ and $\nu = g \mathscr{L}^d \ll \mathscr{L}^d$ are absolutely continuous with respect to the Lebesgue measure:

\[ \mu(A) = \int_A f(x)\, dx, \qquad \nu(B) = \int_B g(y)\, dy. \]

All these questions are closely linked!
From now on we will consider the Euclidean case $X = Y = \mathbb{R}^d$.


Integrality and transport maps

At the continuous level the integrality condition could be informally stated by asking that (almost) every point $x$ is the starting point of at most one transport ray.
We can say that $y$ is connected to $x$ by a transport ray if $(x, y) \in \mathrm{supp}\,\gamma$; thus we have

\[ (x, y_1), (x, y_2) \in \mathrm{supp}\,\gamma \ \Rightarrow\ y_1 = y_2 =: t(x) \]

a property which should hold $\mu$-almost everywhere.
$t : X \to Y$ is called the transport map induced by the plan $\gamma$. It satisfies

\[ \text{if } A = t^{-1}(B) \text{ then } \mu(A) = \nu(B) = \gamma(A \times B). \]

Recalling the change-of-variable formula, if $\mu = f\,dx$, $\nu = g\,dy$, and $t$ is differentiable and injective,

\[ \mu(A) = \int_A f(x)\, dx = \nu(B) = \int_B g(y)\, dy = \int_A g(t(x))\, |\det Dt(x)|\, dx \]

so that

\[ f(x) = g(t(x))\, |\det Dt(x)|. \]
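The density relation can be checked numerically in one dimension. The example below is an assumption chosen for illustration: f is the standard Gaussian density, the affine map t(x) = 2x + 1 pushes it onto the Gaussian with mean 1 and standard deviation 2, and Dt is the constant 2.

```python
from math import exp, pi, sqrt


def gaussian_density(y, mean, std):
    return exp(-((y - mean) ** 2) / (2 * std ** 2)) / (std * sqrt(2 * pi))


def f(x):
    """Density of mu: standard Gaussian."""
    return gaussian_density(x, 0.0, 1.0)


def g(y):
    """Density of nu: Gaussian with mean 1 and standard deviation 2."""
    return gaussian_density(y, 1.0, 2.0)


def t(x):
    """Hypothetical transport map; its derivative is the constant DT below."""
    return 2.0 * x + 1.0


DT = 2.0

# Largest violation of f(x) = g(t(x)) |det Dt(x)| over a few sample points.
max_gap = max(abs(f(x) - g(t(x)) * abs(DT)) for x in [-3.0, -1.0, 0.0, 0.5, 2.0])
print(max_gap < 1e-12)  # True
```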


Existence and uniqueness of the optimal transport map: $c(x, y) = \frac12 |x - y|^2$

Theorem (Brenier (1989))

Let $\mu = f\,dx$, $\nu = g\,dy$, $c(x, y) := \frac12 |x - y|^2$.

• There exists a unique optimal transference plan $\gamma$ and it is associated to a transport map $t$.

• The Kantorovich potentials are perturbations of convex functions; more precisely

\[ \tfrac12 |x|^2 + u(x) = \phi(x) \quad \text{and} \quad \tfrac12 |y|^2 - v(y) = \psi(y) \quad \text{are convex} \]

and $\psi$ is the Legendre transform of $\phi$

\[ \psi(y) = \phi^*(y) = \sup_x \big( \langle y, x \rangle - \phi(x) \big). \]

• $t(x) = \nabla\phi(x) = x + \nabla u(x)$ is the gradient of a convex function; it is essentially injective, a.e. differentiable, and $Dt = D^2\phi$ is positive definite.

• $\phi$ solves the Monge-Ampère equation

\[ \det D^2\phi(x) = \frac{f(x)}{g(\nabla\phi(x))}. \]
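The Legendre transform in the theorem can be approximated by a maximum over a grid. The sketch below uses a hypothetical one-dimensional potential φ(x) = x² + x (so that t(x) = φ'(x) = 2x + 1), for which a hand computation gives φ*(y) = (y − 1)²/4, attained at x = (y − 1)/2. The function names and the grid are assumptions.

```python
def phi(x):
    """Hypothetical convex potential; its derivative is the map t(x) = 2x + 1."""
    return x * x + x


def legendre(func, y, grid):
    """Approximate func*(y) = sup_x (x*y - func(x)) by a maximum over grid points."""
    return max(x * y - func(x) for x in grid)


grid = [k / 1000.0 for k in range(-10000, 10001)]  # x in [-10, 10], step 0.001

results = []
for y in [-3.0, 0.0, 1.0, 2.5]:
    exact = (y - 1.0) ** 2 / 4.0  # closed form of phi*(y) for this phi
    results.append(abs(legendre(phi, y, grid) - exact) < 1e-6)

print(results)  # [True, True, True, True]
```

A grid maximum can only underestimate the supremum, so the grid has to be wide and fine enough to contain (or closely approach) the maximizer for each y.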


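Brenier theorem

$\mu = f\,dx$, $\nu = g\,dx$ are absolutely continuous in $\mathbb{R}^d$.

The optimal coupling $\gamma \in \Gamma_o(\mu, \nu)$ is concentrated on the graph of a cyclically monotone map $t$:

\[ \gamma = (i \times t)_\# \mu, \qquad W_2^2(\mu, \nu) = \int_{\mathbb{R}^d} |x - t(x)|^2\, d\mu(x). \]

$t$ can be recovered from the optimal Kantorovich potentials $u$, $v$ satisfying

\[ v(y) - u(x) \le \tfrac12 |x - y|^2, \qquad \tfrac12 W_2^2(\mu, \nu) = \int v(y)\, d\nu(y) - \int u(x)\, d\mu(x), \]

\[ t(x) = x + \nabla u(x) = \nabla\Big( \tfrac12 |x|^2 + u(x) \Big), \qquad \tfrac12 |x|^2 + u(x) \ \text{is convex.} \]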


Extensions and applications

• Strictly convex costs $c(x, y) = h(|x - y|)$: Gangbo-McCann, ... (’96-)

• Monge problem $c(x, y) = |x - y|$: Sudakov (’79), Ambrosio (2000), ..., Bianchini, Champion-De Pascale, ...

• Regularity: Caffarelli, ... (’92-), Wang, Trudinger, Loeper, Villani, McCann

• Isoperimetric and functional inequalities: Gromov, Villani, Otto, McCann, Maggi, Figalli, Pratelli, ...

• Hilbert and Wiener spaces: Feyel-Üstünel, Ambrosio-Gigli-S. (’04-), ...

• Riemannian manifolds, Ricci flow: McCann, Sturm, Villani, Lott, Topping, Carfora, ... (’98-)

• ...

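A distance between probability measures

The quadratic cost $c(x, y) = |x - y|^2$ induces a distance between probability measures with finite quadratic moment ($\mathcal{P}_2(\mathbb{R}^d)$): the so-called Kantorovich-Rubinstein-Wasserstein distance

\[ W_2(\mu, \nu) := \big( C(\mu, \nu) \big)^{1/2} = \Big( \min_{\gamma \in \Gamma(\mu,\nu)} \iint |x - y|^2\, d\gamma(x, y) \Big)^{1/2}. \]

This distance has a simple interpretation in the case of discrete measures: if $\mu = \frac1N \sum_{k=1}^N \delta_{x_k}$ and $\nu = \frac1N \sum_{k=1}^N \delta_{y_k}$ then

\[ W_2^2(\mu, \nu) = \min \Big\{ \frac1N \sum_{k=1}^N |x_k - y_{\sigma(k)}|^2 \;:\; \sigma \text{ permutation of } \{1, 2, \dots, N\} \Big\}. \]

$(\mathcal{P}_2(\mathbb{R}^d), W_2)$ is a complete and separable metric space; the distance $W_2$ is associated to the weak convergence of measures:

\[ W_2(\mu_n, \mu) \to 0 \ \Leftrightarrow\ \int \zeta(x)\, d\mu_n(x) \to \int \zeta(x)\, d\mu(x) \quad \text{for every } \zeta \in C^0(\mathbb{R}^d),\ |\zeta(x)| \le A|x|^2 + B. \]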
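The discrete formula can be evaluated directly by enumerating permutations, which is only feasible for very small N. The sketch below is a brute-force illustration; the function names and the two point clouds in the plane are assumptions.

```python
from itertools import permutations


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def w2_squared_uniform(xs, ys):
    """W_2^2 between the uniform measures on xs and ys (lists of N points each)."""
    N = len(xs)
    best = min(
        sum(sq_dist(x, ys[s]) for x, s in zip(xs, sigma))
        for sigma in permutations(range(N))
    )
    return best / N


# Hypothetical supports in the plane.
xs = [(0, 0), (1, 0)]
ys = [(1, 1), (0, 1)]

print(w2_squared_uniform(xs, ys))  # 1.0
```

Here the identity matching costs 2 + 2 = 4 while the swap costs 1 + 1 = 2, so the minimum of the averaged cost is 2/2 = 1.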


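Weak convergence, lower semicontinuity, and compactness

Definition (Weak convergence)

A sequence $\mu_n \in \mathcal{P}(\mathbb{R}^d)$ converges weakly to $\mu \in \mathcal{P}(\mathbb{R}^d)$ if

\[ \lim_{n \to +\infty} \int_{\mathbb{R}^d} \varphi(x)\, d\mu_n(x) = \int_{\mathbb{R}^d} \varphi(x)\, d\mu(x) \qquad \forall \varphi \in C^0_b(\mathbb{R}^d). \]

• Test functions $\varphi$ can be equivalently chosen in $C^0_c(\mathbb{R}^d)$ or in $C^\infty_c(\mathbb{R}^d)$, as for distributional convergence.

• If $X_n \to X$ pointwise, then $(X_n)_\# \mathbb{P} \rightharpoonup X_\# \mathbb{P}$.

• If $\zeta : \mathbb{R}^d \to [0, +\infty]$ is just lower semicontinuous (no boundedness is required) and $\mu_n \rightharpoonup \mu$ then

\[ \liminf_{n \to +\infty} \int_{\mathbb{R}^d} \zeta(x)\, d\mu_n(x) \ge \int_{\mathbb{R}^d} \zeta(x)\, d\mu(x). \]

• Prokhorov Theorem: A set $\Gamma \subset \mathcal{P}(\mathbb{R}^d)$ is weakly relatively compact iff it is tight, i.e. for every $\varepsilon > 0$ there exists a compact set $K \Subset \mathbb{R}^d$ such that $\mu(\mathbb{R}^d \setminus K) \le \varepsilon$ for all $\mu \in \Gamma$.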


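Optimal couplings and triangular inequality

Lower semicontinuity and tightness: the minimum problem

\[ W_2^2(\mu_1, \mu_2) := \min \Big\{ \int_{\mathbb{R}^m \times \mathbb{R}^m} |x_1 - x_2|^2\, d\boldsymbol{\mu}(x_1, x_2) \;:\; \boldsymbol{\mu} \in \Gamma(\mu_1, \mu_2) \Big\} \]

is attained: $\Gamma_o(\mu_1, \mu_2)$ denotes the collection (closed, convex set) of all the optimal couplings in $\mathcal{P}_2(\mathbb{R}^m \times \mathbb{R}^m)$. In general more than one optimal coupling could exist.

Connecting a sequence of measures, disintegration and Kolmogorov theorem: if $\boldsymbol{\mu}_{1,2} \in \Gamma_o(\mu_1, \mu_2)$, $\boldsymbol{\mu}_{2,3} \in \Gamma_o(\mu_2, \mu_3)$, ..., $\boldsymbol{\mu}_{j,j+1} \in \Gamma_o(\mu_j, \mu_{j+1})$ then there exist a probability measure $\mathbb{P}$ and random variables $X_1, X_2, X_3, \dots, X_j, X_{j+1}, \dots$ such that $\boldsymbol{\mu}_{1,2} = (X_1, X_2)_\# \mathbb{P}$, ..., $\boldsymbol{\mu}_{j,j+1} = (X_j, X_{j+1})_\# \mathbb{P}$. In particular

\[ W_2^2(\mu_j, \mu_{j+1}) = E\big[ |X_j - X_{j+1}|^2 \big], \]

and $(X_h, X_k)_\# \mathbb{P} \in \Gamma(\mu_h, \mu_k)$, but it is not optimal in general if $h, k$ are not consecutive.

Application: $W_2$ is a distance, triangular inequality.

\[ W_2(\mu_1, \mu_3) \le W_2(\mu_1, \mu_2) + W_2(\mu_2, \mu_3) \]

\begin{align*}
W_2(\mu_1, \mu_3) &\le \Big( E\big[ |X_1 - X_3|^2 \big] \Big)^{1/2} = \Big( E\big[ |(X_1 - X_2) + (X_2 - X_3)|^2 \big] \Big)^{1/2} \\
&\le \Big( E\big[ |X_1 - X_2|^2 \big] \Big)^{1/2} + \Big( E\big[ |X_2 - X_3|^2 \big] \Big)^{1/2} = W_2(\mu_1, \mu_2) + W_2(\mu_2, \mu_3).
\end{align*}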
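For uniform discrete measures with the same number of points, the gluing argument reduces to composing permutations. The sketch below, with hypothetical point clouds and helper names, glues two optimal matchings and checks both steps of the proof: the glued coupling is admissible but possibly not optimal, and its cost obeys the Minkowski inequality.

```python
from itertools import permutations
from math import sqrt


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def matching_cost(xs, ys, sigma):
    """Average of |x_k - y_sigma(k)|^2: the cost of the coupling induced by sigma."""
    return sum(sq_dist(x, ys[s]) for x, s in zip(xs, sigma)) / len(xs)


def optimal_matching(xs, ys):
    """Brute-force optimal permutation (small N only)."""
    return min(permutations(range(len(xs))), key=lambda s: matching_cost(xs, ys, s))


# Hypothetical supports of three uniform discrete measures in the plane.
p1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
p2 = [(1.0, 1.0), (3.0, 0.0), (0.0, 1.0)]
p3 = [(2.0, 2.0), (0.0, 3.0), (4.0, 1.0)]

s12 = optimal_matching(p1, p2)
s23 = optimal_matching(p2, p3)
glued = tuple(s23[j] for j in s12)  # follow X1 -> X2 -> X3 along the two optimal couplings

w12 = sqrt(matching_cost(p1, p2, s12))
w23 = sqrt(matching_cost(p2, p3, s23))
w13 = sqrt(matching_cost(p1, p3, optimal_matching(p1, p3)))
glued_norm = sqrt(matching_cost(p1, p3, glued))

print(w13 <= glued_norm + 1e-12)        # True: the glued coupling is admissible
print(glued_norm <= w12 + w23 + 1e-12)  # True: Minkowski inequality in L^2(P)
```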


“Soft” properties

• Convergence with respect to $W_2$ ⇔ weak convergence + convergence of the quadratic moments.

• Completeness (if one considers all the probability measures in $\mathcal{P}_2(\mathbb{R}^m)$).

• Lower semicontinuity with respect to weak/distributional convergence

• Convexity (but linear segments are not geodesics!)

• Existence of (constant speed, minimizing) geodesics connecting arbitrary measures $\mu_0, \mu_1$: they are curves $\mu : t \in [0, 1] \mapsto \mu_t$ s.t.

\[ W_2(\mu_0, \mu_1) = L_0^1[\mu], \qquad W_2(\mu_s, \mu_t) = |t - s|\, W_2(\mu_0, \mu_1), \]

where $L_0^1[\mu]$ is the length of the curve.
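The constant speed property can be observed on a discrete example. The construction used below, moving each point along a straight line towards its optimally matched target (often called displacement interpolation), is a standard one but is not stated in these slides, so it is an assumption here, as are the data and function names. On the real line the sorted matching is optimal for the quadratic cost.

```python
from itertools import permutations
from math import sqrt


def w2(xs, ys):
    """W_2 between the uniform measures on the real points xs and ys (brute force)."""
    N = len(xs)
    best = min(
        sum((x - ys[s]) ** 2 for x, s in zip(xs, sigma))
        for sigma in permutations(range(N))
    )
    return sqrt(best / N)


# Hypothetical supports; ys is sorted like xs, so x_k -> y_k is an optimal matching.
xs = [0.0, 1.0, 3.0]
ys = [2.0, 4.0, 5.0]


def interpolate(t):
    """Support of mu_t: each point moves along a straight line from x_k to y_k."""
    return [(1 - t) * x + t * y for x, y in zip(xs, ys)]


d01 = w2(xs, ys)
checks = [
    abs(w2(interpolate(s), interpolate(t)) - abs(t - s) * d01) < 1e-12
    for s, t in [(0.0, 0.5), (0.25, 0.75), (0.5, 1.0)]
]
print(checks)  # [True, True, True]
```

By contrast, the linear segment (1 − t)µ0 + tµ1 mixes the two measures instead of moving mass, which is why it is not a geodesic for W2.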
