
Winter School: Combinatorial and algorithmic aspects of convexity

Paris, Institut Henri Poincaré, January 19-23, 2015.

http://wiki-math.univ-mlv.fr/gemecod/doku.php/winterschool2015

Santosh Vempala

Algorithmic aspects of convexity.

Scribes: Joe Anderson and Alperen Ergur
Bernardo González and Jesús Rebollo
Ben Cousins and Cécile Mailler
Mohamed Jalel Atia and Lincong Fang
Giorgos Chasapis, Georgios Samaras and Vissarion Fisikopoulos


Table of contents

1. Lecture 1. Introduction. Scribes: Joe Anderson and Alperen Ergur.
2. Lecture 2. Convex optimization. Scribes: Bernardo González and Jesús Rebollo.
3. Lecture 3. Computing the volume. Scribes: Ben Cousins and Cécile Mailler.
4. Lecture 4. Sampling. Scribes: Mohamed Jalel Atia and Lincong Fang.
5. Lecture 5. Isoperimetry. Scribes: Giorgos Chasapis, Georgios Samaras, Vissarion Fisikopoulos.


Winter School: Combinatorial and algorithmic aspects of convexity
Paris, Institut Henri Poincaré, January 19-23, 2015.

Algorithmic aspects of convexity. Santosh Vempala's lecture I.

Scribes: Joe Anderson and Alperen Ergur

1 Introduction

Convex geometry is a rich and classical field of mathematics. On the other hand, computational complexity and the theory of algorithms are only about half a century old. When one studies complexity, the objective is to understand the amount of resources (space, time, randomness, communication) needed to solve a particular problem, as a function of the size of the input. For geometric problems, the input size is generally a function of the dimension (e.g. the input is a set of points in R^n). We want to address the following type of question: "What is the complexity of a given problem as a function of the dimension?" Typically we see two kinds of behavior: problems that are polynomial (P) and problems that are exponential or worse; NP is the class of decision problems with polynomial-size proofs of a Yes answer. The question "P = NP?" is of great importance, and convexity sits at the frontier of polynomial-time solvability.

See [GLS88] for more information on many aspects of these problems. We will focus on the following problems, each of which is to be solved by an algorithm taking input of a particular type and format and returning some output with particular guarantees:

1. Optimization

Input: a function f : Rn → R and a number ε > 0.

Output: x ∈ R^n such that f(x) ≥ max_{y∈R^n} f(y) − ε.

2. Integration

Input: A function f : Rn → R and a number ε > 0.

Output: V ∈ R such that (1 − ε)∫f ≤ V ≤ (1 + ε)∫f.


3. Sampling

Input: Function f : Rn → R and ε > 0.

Output: A sample X drawn from a distribution π such that d(π, π_f) ≤ ε, where π_f is the distribution whose density is proportional to f.

4. Learning

Input: A set of data points (x_i, f(x_i)), where each x_i is drawn according to an unknown distribution D, and a number ε > 0.

Output: A function g such that P_D(g(x) ≠ f(x)) ≤ ε.

5. Rounding

Input: A function f : R^n → R_+ such that ∫f < ∞, and c_1, c_2 ∈ R.

Output: An affine transformation T such that

(a) E_{Tf}(‖x‖²) = n

(b) P_{Tf}(‖x‖ ≤ c_1) ≥ c_2

In the above examples, one must consider how a function f can be provided as an input argument; to do so, one can imagine passing a function pointer to a black box that computes f at a given point. The complexity under this model is then the number of calls to such a black box (the oracle) needed by the algorithm to produce an acceptable output.

Problem: without more hypotheses, problems 1 to 5 are all intractable; no efficient algorithm is possible. But if we add some convexity hypotheses on f, then efficient algorithms become possible. For example, in problems 1 and 2 we moreover assume:

Example 1 (Convex Optimization). Minimize a function f over K, where f is convex and K ⊆ R^n is convex. This can be solved with accuracy ε and complexity poly(n, log 1/ε) by the Ellipsoid Algorithm (to be discussed in detail in the next talk).

Example 2. Let f = χ_K for a convex body K ⊆ R^n. The task is to compute vol(K) = ∫f, and it can be solved with complexity poly(n, 1/ε) (see [DFK91]).


2 Preliminaries

Definition 1. Let S ⊂ R^n. We call S convex if a, b ∈ S implies [a, b] ⊆ S, where [a, b] = {λa + (1 − λ)b : λ ∈ [0, 1]}.

Definition 2. Let f : R^n → R. We call f concave if f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y) for all λ ∈ [0, 1] and all x, y ∈ R^n.

Definition 3. We call f log-concave if f(λx + (1 − λ)y) ≥ f(x)^λ f(y)^{1−λ} for all λ ∈ [0, 1] and all x, y ∈ R^n.

Definition 4. We call f harmonic-concave if

f(λx + (1 − λ)y) ≥ 1 / ( λ/f(x) + (1 − λ)/f(y) )

for all λ ∈ [0, 1] and all x, y ∈ R^n.

Definition 5. We call f quasi-concave if f(λx + (1 − λ)y) ≥ min{f(x), f(y)} for all λ ∈ [0, 1] and all x, y ∈ R^n.

Definition 6. We call f star-shaped if there is some x0 ∈ S such that forall y ∈ S, f(λx0 + (1− λ)y) ≥ λf(x0) + (1− λ)f(y).

Example 3 (Exercise, in relation to sampling, Problem 3). Consider the following algorithmic task: given a point x ∈ R^n, produce a "label" for x, i.e., compute ℓ(x), where ℓ : R^n → {−1, 1}. The points are given sequentially, each after the previous one has been assigned a label. Moreover, after each label is produced, and before the next point is supplied, the algorithm receives feedback on its choice of label, either "correct" or "incorrect". If the algorithm has chosen the label incorrectly, it is charged one; otherwise there is no charge. The goal of the algorithm designer is to limit the number of mistakes in any setting. However, since the labels may be chosen by an adversary, one cannot hope to have a good strategy in general: the adversary can always claim you are wrong, since it need not have fixed a ground-truth labeling a priori.

One possibility to obtain a "good" algorithm, however, is to restrict the problem setting to one in which there is some halfspace H defined by its normal vector a ∈ R^n; the ground truth is then

ℓ(x) = 1 if 〈x, a〉 ≥ 0, and ℓ(x) = −1 if 〈x, a〉 < 0.


Using this knowledge, after n points have been given, the algorithm has a set of constraints 〈a, x_i〉 < 0 or 〈a, x_i〉 ≥ 0, where a is the unknown. To guess the label of a new point x, compute 〈x, a〉 over all a satisfying the previous constraints and return the majority answer (supposing this can be done efficiently).

Note that to specify ℓ (i.e. a), one needs n(b + 1) bits, where b is the bit length of the coordinates of a. Using the majority decision, whenever a mistake is made one eliminates at least half of the valid choices of a. Thus, over 2^{n(b+1)} rounds of this game, the number of mistakes will be at most n(b + 1).

Instead of computing the majority, which is too expensive to do exactly, one can simply choose an a from the remaining feasible region uniformly at random. This gives an expected number of mistakes of at most 2n(b + 1), since the algorithm has at least a 50% chance of agreeing with the majority on every round. Exercise: use sampling to make this argument rigorous.
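A minimal sketch of this strategy, under strong simplifying assumptions: candidate normal vectors are drawn by naive rejection sampling (which is exponentially inefficient in general; efficient sampling from the feasible region is exactly the sampling problem discussed later), and the class and method names below (`HalfspaceLearner`, `predict`, `feedback`) are ours, not from the lecture.

```python
import numpy as np

class HalfspaceLearner:
    """Online labeling by approximate majority over normals consistent with past feedback."""

    def __init__(self, n, num_candidates=2000, seed=0):
        self.n = n
        self.num_candidates = num_candidates
        self.rng = np.random.default_rng(seed)
        self.constraints = []  # pairs (x, y): point and its revealed true label

    def _consistent_candidates(self):
        # Naive rejection sampling: draw unit vectors, keep those matching all past labels.
        a = self.rng.normal(size=(self.num_candidates, self.n))
        a /= np.linalg.norm(a, axis=1, keepdims=True)
        for x, y in self.constraints:
            a = a[np.sign(a @ x) * y >= 0]
            if len(a) == 0:
                break
        return a

    def predict(self, x):
        a = self._consistent_candidates()
        if len(a) == 0:
            return 1  # arbitrary fallback when no sampled candidate survives
        return 1 if np.sign(a @ x).sum() >= 0 else -1  # majority vote over samples

    def feedback(self, x, true_label):
        self.constraints.append((np.asarray(x, dtype=float), true_label))
```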

3 Structure of convex bodies

We now study some algorithmic problems and see how one can use convexity to construct good solutions. In this section we note basic structural facts about convex bodies. These facts will be useful for later discussions.

3.1 Structure I: Separation

Convex sets have separation oracles. That is, there is an algorithm which, given a point x ∈ R^n, either asserts that x ∈ K or returns a halfspace H such that K ⊂ H and x ∉ H. This hyperplane is obtained using the convexity of K: consider the point y in K closest to x ∉ K and take the hyperplane through y perpendicular to y − x. Furthermore, the closest point in K to x ∉ K is unique, because if there were two, say y_1 and y_2, then the point (y_1 + y_2)/2 would lie in K and be strictly closer to x.
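For concreteness, here is a hedged sketch of separation oracles for two bodies where the closest-point construction is explicit: a Euclidean ball and a polytope {x : Ax ≤ b}. The function names are ours, and the halfspace is returned as a pair (a, c) meaning {y : a·y ≤ c}.

```python
import numpy as np

def separation_oracle_ball(x, center, radius):
    """Separation oracle for K = {y : ||y - center|| <= radius}."""
    d = x - center
    dist = np.linalg.norm(d)
    if dist <= radius:
        return None                          # x is in K
    a = d / dist                             # outward normal at the closest point of K to x
    return a, float(a @ (center + radius * a))  # K ⊆ {y : a·y <= c}, while a·x > c

def separation_oracle_polytope(x, A, b):
    """Separation oracle for K = {y : Ay <= b}: return a violated constraint, if any."""
    viol = A @ x - b
    i = int(np.argmax(viol))
    if viol[i] <= 0:
        return None          # all constraints satisfied, so x is in K
    return A[i], float(b[i]) # K ⊆ {y : A[i]·y <= b[i]} while A[i]·x > b[i]
```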

3.2 Structure II: Brunn-Minkowski

Let A, B ⊂ R^n be compact. Note that vol(λA) = λ^n vol(A) for λ > 0.


Definition 7. The Minkowski sum of A and B is

A + B = {x + y : x ∈ A, y ∈ B}.

Theorem 8 (Brunn-Minkowski). Let A, B ⊂ R^n be compact and λ ∈ [0, 1]. Then

vol(λA + (1 − λ)B)^{1/n} ≥ λ vol(A)^{1/n} + (1 − λ) vol(B)^{1/n}.

Consequence: the volume of the sections of a convex body by parallel hyperplanes is a 1/(n − 1)-concave function. Indeed, let K be a convex body and u a unit vector which determines a fixed direction. For a ∈ K, let H_a = {x : 〈x − a, u〉 = 0}. We apply the Brunn-Minkowski inequality in R^{n−1} to the sets A = K ∩ H_a and B = K ∩ H_b. From the convexity of K one has (A + B)/2 ⊂ K ∩ H_{(a+b)/2}. Thus

vol(K ∩ H_{(a+b)/2})^{1/(n−1)} ≥ vol((A + B)/2)^{1/(n−1)} ≥ ( vol(K ∩ H_a)^{1/(n−1)} + vol(K ∩ H_b)^{1/(n−1)} ) / 2.

Proof of the Brunn-Minkowski theorem. We first prove the case where the two sets A and B in R^n are axis-aligned parallelepipeds of side lengths a_1, …, a_n and b_1, …, b_n, respectively. This gives vol(A) = ∏ a_i, vol(B) = ∏ b_i, and vol(A + B) = ∏ (a_i + b_i). Then, using the arithmetic-geometric mean inequality,

( vol(A)^{1/n} + vol(B)^{1/n} ) / vol(A + B)^{1/n} = ( ∏ a_i/(a_i + b_i) )^{1/n} + ( ∏ b_i/(a_i + b_i) )^{1/n} ≤ (1/n) ∑ a_i/(a_i + b_i) + (1/n) ∑ b_i/(a_i + b_i) = 1.

Next, we take A and B to be finite unions of disjoint parallelepipeds and proceed by induction on the total number of parallelepipeds. Note that one can find an axis-parallel hyperplane H with at least one full member of A (or of B, but we assume w.l.o.g. that this is achieved for A) on each side. Then we decompose A and B into A⁺, A⁻, B⁺, and B⁻ depending on which side of H they lie. We can then translate B so that vol(A)/vol(B) = vol(A⁺)/vol(B⁺), and we apply the inductive hypothesis:

vol(A + B) ≥ vol(A⁺ + B⁺) + vol(A⁻ + B⁻)
≥ ( vol(A⁺)^{1/n} + vol(B⁺)^{1/n} )^n + ( vol(A⁻)^{1/n} + vol(B⁻)^{1/n} )^n
= vol(A⁺) ( 1 + (vol(B⁺)/vol(A⁺))^{1/n} )^n + vol(A⁻) ( 1 + (vol(B⁻)/vol(A⁻))^{1/n} )^n
= vol(A⁺) ( 1 + (vol(B)/vol(A))^{1/n} )^n + vol(A⁻) ( 1 + (vol(B)/vol(A))^{1/n} )^n
= vol(A) ( 1 + (vol(B)/vol(A))^{1/n} )^n = ( vol(A)^{1/n} + vol(B)^{1/n} )^n.

For the last step, we approximate arbitrary compact sets A and B by finite unions of parallelepipeds and take the limit.

3.3 Structure III: Sandwiching

Theorem 9 (John's Theorem). Let K ⊆ R^n be convex. Then there exists an ellipsoid E such that E ⊆ K ⊆ nE.

Proof idea. Use the ellipsoid of maximum volume contained in K.
Remark 1: For K centrally symmetric, there exists an ellipsoid E with the property E ⊆ K ⊆ √n E.
Remark 2: For any deterministic algorithm using a membership oracle, one cannot compute an approximation to such an ellipsoid with accuracy better than n^{3/2}.
E_K denotes expectation with respect to the uniform measure over K.

Theorem 10 (Inertia Ellipsoid). Let K be a convex body with center of gravity E_K(x) = x_0 and inertia matrix E_K[(x − x_0)(x − x_0)^T] = A. The inertia ellipsoid is E = {y : (y − x_0)^T A^{-1} (y − x_0) ≤ 1}. Then

√((n + 1)/n) · E ⊆ K ⊆ √(n(n + 1)) · E.

The inertia ellipsoid is easy to construct: a linear number of samples is enough, via the empirical covariance matrix.
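As a hedged illustration of this remark, the sketch below estimates the center of gravity and inertia matrix of K from samples; the uniform sampler shown uses naive rejection from a bounding box, which is only practical in low dimension, and all function names are ours.

```python
import numpy as np

def estimate_inertia_ellipsoid(sample_uniform, num_samples=1000):
    """Return (x0_hat, A_hat): empirical center of gravity and inertia (covariance) matrix.

    `sample_uniform()` is assumed to return one (approximately) uniform point of K.
    The estimated inertia ellipsoid is {y : (y - x0)^T A^{-1} (y - x0) <= 1}.
    """
    X = np.array([sample_uniform() for _ in range(num_samples)])
    x0_hat = X.mean(axis=0)
    centered = X - x0_hat
    A_hat = centered.T @ centered / num_samples   # empirical E[(x - x0)(x - x0)^T]
    return x0_hat, A_hat

# Example: rejection sampling from the unit ball in R^3 (low dimension only).
_rng = np.random.default_rng(0)
def sample_unit_ball(n=3):
    while True:
        x = _rng.uniform(-1, 1, size=n)
        if np.dot(x, x) <= 1:
            return x

x0, A = estimate_inertia_ellipsoid(sample_unit_ball)
```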


Definition 11 (Milman's Ellipsoid). Let N(A, B) be the minimum number of translates of B needed to cover A. A Milman ellipsoid of K is an ellipsoid E such that N(E, K) · N(K, E) = 2^{O(n)}.

It can be computed by a probabilistic algorithm in polynomial time, but the best deterministic algorithm needs 2^{Ω(n)} oracle calls, and this is optimal.

3.4 Structure IV: Concentration of mass

Recall that in R^n the unit hypercube has volume 1, while the unit ball has volume about (c/√n)^n, which tends to 0 as n grows.

Let B be a ball of radius r. Then vol(tB) = t^n vol(B). Compare this with the volume of a ball of radius (1 − 1/n)r: (1 − 1/n)^n vol(B) ≈ vol(B)/e. This means that in high dimension, roughly a 1 − 1/e fraction (about two thirds) of the mass is contained in a very thin shell (of width ≈ r/n) near the boundary.

Conversely, if one looks at the slice H_t of the same ball at distance t from the origin, then

vol_{n−1}(H_t) / vol_{n−1}(H_0) = ( √(r² − t²) / r )^{n−1} = ( 1 − t²/r² )^{(n−1)/2} ≈ e^{−t²(n−1)/(2r²)}.

Hence a constant fraction of the volume lies in a slab of width c·r/√n around the center.

The first case is related to the thin-shell conjecture, which asks whether, for any convex body of volume one in isotropic position, a large part of the mass is contained in a shell of constant width.

4 Basics for tomorrow: convex optimization

Let K be a convex body given by a separation oracle and by two numbers 0 < r < R such that there exists an (unknown) x_0 with B(x_0, r) ⊂ K ⊂ B(0, R). The goal will be to minimize a convex function f over K, i.e. to find a point x ∈ K such that f(x) ≤ min_{y∈K} f(y) + ε.

This problem reduces to the feasibility problem: if K is non-empty, find a single point in K. We first find a point in {x ∈ K : f(x) ≤ t}, then ask whether there is a point in {x ∈ K : f(x) ≤ t/2}, and continue by binary search on the threshold. This produces the desired result in O(log(1/ε)) feasibility steps. In general, results of this type will depend on parameters x_0, r, and R such that x_0 + rB^n_2 ⊆ K ⊆ RB^n_2.


References

[DFK91] Martin Dyer, Alan Frieze, and Ravi Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM, 38(1):1–17, 1991.

[GLS88] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer-Verlag, Berlin, 1988.


Winter School: Combinatorial and algorithmic aspects of convexity
Paris, Institut Henri Poincaré, January 19-23, 2015.

Algorithmic aspects of convexity. Santosh Vempala's lecture II: Convex optimization

Scribes: Bernardo González and Jesús Rebollo

The topic of this lecture is convex optimization and a more general view of convex programming. We want to minimize a convex function f over a convex body K. This depends on the presentation of K and f. Recall from the last lecture that, using binary search, we reduced this to the feasibility problem: "find x ∈ K or report that K is empty".

We assume that the convex set K is presented by an oracle. We shall consider two kinds of oracles:

• Separation Oracle: we ask whether a point x is in K. If x ∈ K, the oracle answers "yes". If x ∉ K, the oracle returns a halfspace H such that K ⊂ H and x ∉ H. In addition, we are given two numbers r, R ∈ R such that there exists an (unknown) point x_0 satisfying x_0 + rB^n ⊂ K ⊂ RB^n.

• Membership Oracle: we ask whether a point x is in K. If x ∈ K, the oracle answers "yes"; if x ∉ K, it answers "no". Here, in addition to the oracle, we need an initial given point x_0 ∈ K and two radii r, R ∈ R so that x_0 + rB^n ⊂ K ⊂ RB^n.

We say that an algorithm is efficient if it has complexity poly(n, log(R/r)).

An excellent reference for these questions is the book "Geometric Algorithms and Combinatorial Optimization" by Grötschel, Lovász and Schrijver.

Examples 0.1. Here are three examples of convex sets given by an oracle.

1. Linear programming: we are given an m × n matrix A and a vector b ∈ R^m, and

K = {x ∈ R^n : Ax ≤ b} := {x ∈ R^n : (Ax)_i ≤ b_i, 1 ≤ i ≤ m}

is a convex polyhedron.

2. Semidefinite programming: we are given m symmetric matrices A_1, …, A_m and m real numbers b_1, …, b_m, and one looks for a positive semidefinite symmetric matrix X such that

〈A_i, X〉 := Tr(A_i X) = b_i for all i.

3. Perfect matching. Recall that a perfect matching of an undirected graph G = (V, E) is a subset M ⊂ E such that every vertex i ∈ V belongs to exactly one edge e ∈ M. The algorithmic search for a perfect matching may be done by linear programming via the perfect matching polytope PM(G), the set of (x_{i,j})_{(i,j)∈E} such that ∑_{j : (i,j)∈E} x_{i,j} = 1 for all i ∈ V, 0 ≤ x_{i,j} ≤ 1, and ∑_{(i,j)∈E : i,j∈S} x_{i,j} ≤ (|S| − 1)/2 for all S ⊂ V with |S| odd.

Remark 0.2. R can be exponentially large and r can be exponentially small, so we cannot afford a brute-force "search".


1 Ellipsoid Algorithm

An ellipsoid is the affine image of the unit ball B^n_2. It is uniquely defined by a point z, its center, and a positive definite matrix A : R^n → R^n as follows:

E(z, A) = {x ∈ R^n : (x − z)^T A^{-1} (x − z) ≤ 1} = z + A^{1/2}(B^n_2).

For any ellipsoid E, we denote by z(E) its center. We define an algorithm which produces a sequence of ellipsoids E_i with centers z_i = z(E_i). We start with E_0 = RB^n_2 and z_0 = 0. Then at each step we run the following loop. We ask the oracle: "Is z_i ∈ K?"
- If yes, return z_i.
- If no, the oracle gives a unit vector a_i such that a_i^T x ≤ a_i^T z_i for all x ∈ K. Then let E_{i+1} be the unique minimum volume ellipsoid containing E_i ∩ {x : a_i^T x ≤ a_i^T z_i}, and let z_{i+1} be its center.
Repeat t times. If no feasible point has been found, declare "K is empty".

Figure 1: Steps of the Ellipsoid Algorithm (iterations i = 0, 1, …, 5).
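A minimal sketch of this loop, assuming a separation oracle in the convention above (returning None for feasible points, otherwise a unit vector a with a·x ≤ a·z on K); it uses the standard central-cut update formulas for the minimum-volume ellipsoid containing a half-ellipsoid, and all names are ours.

```python
import numpy as np

def ellipsoid_method(separation_oracle, n, R, max_iters=10000):
    """Try to find a point of K, starting from E_0 = R * B^n (n >= 2)."""
    z = np.zeros(n)
    A = (R ** 2) * np.eye(n)          # E_i = {x : (x - z)^T A^{-1} (x - z) <= 1}
    for _ in range(max_iters):        # max_iters should be ~ O(n^2 log(R/r))
        a = separation_oracle(z)
        if a is None:
            return z                  # the current center is feasible
        Aa = A @ a
        g = Aa / np.sqrt(a @ Aa)      # step direction scaled to the ellipsoid geometry
        z = z - g / (n + 1)           # new center: distance 1/(n+1) into the kept half
        A = (n * n / (n * n - 1.0)) * (A - (2.0 / (n + 1)) * np.outer(g, g))
    return None                       # declare K (probably) empty
```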

Lemma 1.1. With the previous notation, one has

Vol(E_{i+1}) ≤ e^{−1/(2(n+1))} Vol(E_i).

Proof. We want to find the equation of the minimum volume ellipsoid E_{i+1} containing a given half-ellipsoid of E_i. By an affine transformation, we may assume that E_i is the Euclidean unit ball B^n_2 and that the halfspace is {x : x_1 > 0}. For any z ∈ [0, 1], consider the minimum volume ellipsoid E(z), centered at (z, 0, …, 0), containing B^n_2 ∩ {x : x_1 > 0}. By symmetry, E(z) has one semi-axis of length 1 − z in direction e_1 and n − 1 other semi-axes of identical length, which we denote by b. The relation between z and b is given by requiring that the boundary point (0, 1, 0, …, 0) lies on E(z):

z²/(1 − z)² + 1/b² = 1.

Thus

b = (1 − z)/√(1 − 2z).

The volume of the ellipsoid E(z) is Vol(E(z)) = (1 − z) b^{n−1} Vol(B^n_2). Let

V(z) = (1 − z) b^{n−1} = (1 − z)^n / (1 − 2z)^{(n−1)/2}.

Then

V′(z) = [ (1 − 2z)^{(n−1)/2} (−n)(1 − z)^{n−1} + (n − 1)(1 − 2z)^{(n−3)/2} (1 − z)^n ] / (1 − 2z)^{n−1}.

Therefore the function Vol(E(z)) reaches its minimum at a point z satisfying n(1 − 2z) = (n − 1)(1 − z), that is z = 1/(n + 1). Hence the minimum volume ellipsoid containing B^n_2 ∩ {x : x_1 > 0} has its center at distance 1/(n + 1) from the origin. Finally we get

Vol(E_{i+1}) / Vol(E_i) = V(1/(n + 1)) = (1 − 1/(n + 1))^n / (1 − 2/(n + 1))^{(n−1)/2}
= (n/(n + 1))^n ((n + 1)/(n − 1))^{(n−1)/2}
= (n²/(n² − 1))^{(n−1)/2} · n/(n + 1)
= (1 + 1/(n² − 1))^{(n−1)/2} (1 − 1/(n + 1))
≤ e^{1/(2(n+1))} e^{−1/(n+1)} = e^{−1/(2(n+1))},

where in the last inequality we used that 1 + x ≤ e^x for every x ∈ R.

Theorem 1.2. If a convex body is given by a separation oracle, then the ellipsoid algorithm produces a point in the convex body after at most t = O(n² log(R/r)) iterations.

Proof. From the lemma, as long as the algorithm has not stopped, one has

Vol(rB^n) ≤ Vol(E_{t+1}) ≤ e^{−t/(2(n+1))} Vol(E_0) = e^{−t/(2(n+1))} R^n Vol(B^n_2).

Thus

t ≤ 2(n + 1) log( (R/r)^n ) = O(n² log(R/r)).


A variant of the ellipsoid algorithm may also be used for "rounding", i.e. finding an ellipsoid E, with center denoted z, and a shrinking factor λ such that

(1 − λ)z + λE ⊂ K ⊂ E.

Theorem 1.3 (Rounding, Lovász). With at most O(n² log(R/r)) calls to the oracle, we may find such a rounding ellipsoid with a shrinking factor λ = (√n (n + 1))^{−1}.

Proof. We describe an algorithm which produces a sequence of ellipsoids (E_i) containing K and which eventually produces the target rounding ellipsoid. We start with E_0 = RB^n_2. At each step, we apply the following procedure to the ellipsoid E_i.

Given an ellipsoid E containing K, with center z, orthonormal basis of eigenvectors v_1, …, v_n and semi-axes a_1, …, a_n, we call the oracle 2n times to ask whether the points z ± a_i v_i/(n + 1) belong to K.
• If the answer is YES for all 2n points, then we deduce that

conv( z ± a_i v_i/(n + 1) ) = z + conv( ± a_i v_i/(n + 1) ) ⊂ K.

But in the same way that B^n_2/√n ⊂ B^n_1, one has (E − z)/(√n (n + 1)) ⊂ conv( ± a_i v_i/(n + 1) ), and thus

(1 − λ)z + λE = z + λ(E − z) ⊂ K ⊂ E, with λ = (√n (n + 1))^{−1}.

• If the answer is NO for at least one of the 2n points, say z + a_1 v_1/(n + 1) ∉ K, then we get a separating hyperplane which cuts the ellipsoid E through this point. One may then show that the ellipsoid E′ of minimum volume containing this "half" ellipsoid satisfies Vol(E′) ≤ e^{−c/n} Vol(E).

Exactly as before, because of the volume drop, the algorithm cannot be in case 2 more than O(n² log(R/r)) times. Hence after at most this number of steps we are in case 1 and we obtain the rounding ellipsoid.

Theorem 1.4 (Lower bound for feasibility). In the worst case, any feasibility algorithm needs at least Ω(n log(R/r)) calls to the oracle to determine a point in a convex body K given by a separation oracle and two constants 0 < r < R, the radii of an inscribed and a circumscribed ball.

Proof. For example, let K be an unknown cube of radius r in a grid of radius R. At each cut, we eliminate in the best case half of the candidate cubes, so we shall need at least log₂( (R/r)^n ) = n log₂(R/r) oracle calls.

We have seen a bound of O(n² log(R/r)) queries for the ellipsoid algorithm. Can we find an algorithm that achieves the lower bound given above? The answer is "yes", and the algorithm is described in the next section.

2 A centroid based algorithm

Given a high-dimensional convex body K, we would like to pick a point z such that, for any cut of the body by a halfspace, the piece containing z is big. A reasonable choice for z is the centroid,

z = centroid(K) = (1/Vol(K)) ∫_K x dx.

We introduce the following centroid-based algorithm, which constructs a sequence of points z_i and polytopes P_i containing K. We start with P_0 = RB^n_∞ and z_0 = 0. At each step, we ask the oracle: "Is z_i ∈ K?"
- If yes, return z_i.
- If no, the oracle gives a unit vector a_i such that a_i^T x ≤ a_i^T z_i for all x ∈ K. Then let P_{i+1} = P_i ∩ {x : a_i^T x ≤ a_i^T z_i} and let z_{i+1} be the centroid of P_{i+1}.
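A hedged sketch of this cutting-plane scheme, with the exact centroid replaced by a sample average (as the end of this lecture suggests). The routine `sample_polytope` is a stand-in for an approximately uniform sampler over P_i (in practice one of the random walks of Lecture 4); all names are ours.

```python
import numpy as np

def approximate_centroid_cut(separation_oracle, n, R, sample_polytope, num_steps=200, m=64):
    """Centroid-based cutting-plane sketch.

    P_i is stored as a list of halfspaces (a, c) meaning a·x <= c, starting from
    the box [-R, R]^n. `separation_oracle(z)` returns None if z in K, else a unit
    vector a with a·x <= a·z on K. `sample_polytope(constraints, m)` returns an
    (m x n) array of approximately uniform points of P_i.
    """
    constraints = []
    for i in range(n):                                   # P_0 = R * B^n_infinity
        e = np.zeros(n); e[i] = 1.0
        constraints += [(e.copy(), R), (-e, R)]
    z = np.zeros(n)                                      # centroid of the box
    for _ in range(num_steps):
        a = separation_oracle(z)
        if a is None:
            return z
        constraints.append((a, float(a @ z)))            # P_{i+1} = P_i ∩ {a·x <= a·z}
        z = sample_polytope(constraints, m).mean(axis=0) # approximate centroid
    return None
```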

This choice guarantees that we keep at least half of the volume for any origin-symmetric body. We now want to know how much we are guaranteed to keep for a general convex body, and which body gives the worst case. Grünbaum's theorem states that the circular cone is the worst case if we choose the centroid. Thus the previous algorithm removes, in the worst case, a 1/e fraction of the total mass of the container P_i, which is good news, as it leads to a fast algorithm.

Theorem 2.1 (Grünbaum). Let K be a convex body and H a halfspace containing the centroid z of K. Then

vol(H ∩ K) ≥ (1/e) vol(K).

From Grünbaum's theorem, using again volume considerations, we get the following theorem.

Theorem 2.2. With the previous notation, t = O(n log(R/r)) iterations suffice.

For the sake of completeness, we nevertheless prove Grünbaum's theorem.

Proof of Grünbaum's theorem. Without loss of generality, change coordinates by an affine transformation so that the centroid is the origin and the halfspace H used to cut is {x_1 ≥ 0}; then perform the following operations:

1. Replace every (n − 1)-dimensional slice K^r with an (n − 1)-dimensional ball of the same volume to obtain K_H (the symmetrization of K with respect to the direction orthogonal to H), which is convex by Lemma 2.3.

2. Turn K_H into a cone, in such a way that the ratio vol(H ∩ K)/vol(K) only gets smaller, by Lemma 2.4.

Lemma 2.3. KH is convex.

Proof. Let K^r_H = K_H ∩ {x_1 = r} be a parallel slice of the new body. The radius of K^r_H is proportional to Vol(K^r)^{1/(n−1)}. By the Brunn-Minkowski inequality, Vol(K^r)^{1/(n−1)} is a concave function of r, thus K_H is convex.

Lemma 2.4. We can turn KH into a cone while decreasing the ratio.

Proof. Let K⁺_H = K_H ∩ {x_1 ≥ 0} and K⁻_H = K_H ∩ {x_1 ≤ 0}. We now make a cone C⁺ by picking an apex with positive x_1 coordinate on the x_1-axis over the base K_H ∩ {x_1 = 0}, such that Vol(C⁺) = Vol(K⁺_H). We extend the cone into the region {x_1 ≤ 0} so that the volume of the extended part C⁻ equals Vol(K⁻_H). We call the resulting cone C. Because the construction of C only moved mass from left to right, the centroid of C must lie in H. Let H′ be the

Figure 2: From left to right: K, its Schwarz symmetrization K_H with respect to e_1, and the final cone C.

translate of H along the x_1-axis whose bounding hyperplane contains the centroid of C. Then

vol(H ∩ K)/vol(K) = vol(H ∩ C)/vol(C) ≥ vol(H′ ∩ C)/vol(C) = (n/(n + 1))^n ≥ 1/e.

The problem is that computing the centroid is an NP-hard problem, so it is not feasible to compute the centroid exactly, even for polytopes and with deterministic algorithms. In order to obtain a feasible algorithm, we replace the centroid in the previous algorithm by an approximate centroid

z_{i+1} := (1/m) ∑_{j=1}^m x_j,  with x_1, …, x_m random points in P_{i+1}.

The next theorem gives the corrected bound on vol(K ∩ H), in the spirit of Theorem 2.1, when we take z = (1/m) ∑_{j=1}^m x_j, with x_j uniform random points from K, instead of the centroid.

Theorem 2.5. Let K be an n-dimensional convex body and z := (1/m) ∑_{j=1}^m x_j, where the x_j are m uniform random points from K. Then

E( min_{H ∋ z} vol(K ∩ H) ) ≥ ( 1/e − √(n/m) ) vol(K).

Corollary 2.6. With probability ≥ 1 − δ, the algorithm will provide an answer after T = O(n log(R/(rδ))) queries.

In order to prove Theorem 2.5 we may assume that K is in isotropic position, as the ratio is invariant under affine maps.

Lemma 2.7. Let f : R → R_+ be a marginal isotropic log-concave density of an n-dimensional convex body K, i.e., ∫ f(y) dy = 1, ∫ y f(y) dy = 0 and ∫ y² f(y) dy = 1. Then

max f(y) ≤ (n/(n + 1)) √(n/(n + 2)) < 1.


Proof of Theorem 2.5. Assume that K is in isotropic position. Then

E_K(‖z‖²_2) = (1/m) E_K(‖x_j‖²_2) = n/m,

since E_K(‖x_j‖²_2) = n for a uniform random point x_j of an isotropic convex body K. Thus E_K(‖z‖_2) ≤ √(E_K(‖z‖²_2)) = √(n/m). On the other hand, let H be any halfspace containing z. We may assume that H = {x : x_1 ≥ a}, with a ≥ 0. One has ‖z‖_2 ≥ z_1 ≥ a. Letting

f(t) := vol_{n−1}({x ∈ K : x_1 = t}) / vol_n(K)

be the marginal density of K (which is also isotropic and log-concave), by Lemma 2.7 and Theorem 2.1 we conclude

vol(K ∩ H)/vol(K) = ∫_{y≥a} f(y) dy = ∫_{y≥0} f(y) dy − ∫_{0≤y≤a} f(y) dy ≥ 1/e − a · max f ≥ 1/e − a ≥ 1/e − ‖z‖_2.

Taking the expectation, we conclude that

E( min_{H ∋ z} vol(K ∩ H) ) ≥ E( (1/e − ‖z‖_2) vol(K) ) ≥ ( 1/e − √(n/m) ) vol(K).

Proof of Lemma 2.7. After a suitable rigid motion, assume that the marginal is taken with respect to the hyperplane H = e_1^⊥. Let y* = a e_1, a ≥ 0, be such that f(y*) = max f(y). We first apply Schwarz symmetrization to K with respect to the line Re_1 and get K_H. We now modify K_H (with f changing accordingly) and call K′ the final result, as follows: let K_1 = K_H ∩ {x : x^T e_1 ≤ 0}, K_2 = K_H ∩ {x : 0 ≤ x^T e_1 ≤ a} and K_3 = K_H ∩ {x : x^T e_1 ≥ a}. Replace K_1 by the circular pyramid K′_1 = conv(b e_1 ∪ (K_1 ∩ K_2)), with b ≤ 0 such that vol(K′_1) = vol(K_1). Replace K_2 by the truncated cone K′_2 = conv((K_1 ∩ K_2) ∪ (K_2 ∩ K_3)), and finally replace K_3 by the circular pyramid K′_3 = conv(c e_1 ∪ (K_2 ∩ K_3)), with c ≥ 0 such that vol(K′_3) = vol(K_3) + vol(K_2) − vol(K′_2).

Observe that the mass moves away from the center of mass and spreads in the direction Re_1. Indeed, writing the change of position as y → y + g(y) with y g(y) ≥ 0, and I(K) = ∫_K (y − E_K(y))² f_K(y) dy (the moment of inertia), we get

I(K′) = E_{K′}(y²) − E_{K′}(y)² = E_K((y + g(y))²) − E_K(y + g(y))² = I(K) + Var(g(y)) + 2 E_K(y g(y)) ≥ I(K).

Thus the moment of inertia only increases if we modify K′ in the following way: instead of K′_1, we continue K′_2 on the left until it reaches the axis, so that K_1 ∪ K′_2 is replaced by a single cone with the same slope as K′_2; this adds mass on the left, which we compensate by moving the point c, which defines K′_3, to the right. Then the whole mass and the variance have increased, while the center of mass and the maximum have not moved. Rescaling down along the line Re_1 and rescaling up in H, the maximum of the marginal increases. With a similar move we finally arrive at a cone. Calling h the height of the final cone and A the area of its base, a direct computation for its marginal f gives

max f = (n/(n + 1)) √(n/(n + 2)).


3 Optimization with a membership oracle

Our next goal is to find a suitable algorithm for optimizing a linear function over a convex body known only through a membership oracle. Recall what this means: we are given an initial point x_0 ∈ K and two radii r, R ∈ R so that x_0 + rB^n ⊂ K ⊂ RB^n. We may ask whether a point x is in K; if x ∈ K, the oracle answers "yes", and if x ∉ K, the oracle answers "no".

Data: z_0 ∈ K, c ∈ R^n, ε > 0, K_0 = K, membership oracle
Result: approximate min_{x∈K} c^T x
initialization;
if c^T z_i ≤ min_K c^T x + ε then
    DONE;
else
    sample from K_i;
    K_{i+1} = K_i ∩ {x ∈ R^n : c^T x ≤ c^T z_i};
    let z_{i+1} be the centroid of K_{i+1};
end

This algorithm leads to a number of iterations of order m = O(n log(1/ε)) (and a complexity of O*(n^5)). Let us also remark that the distance to argmin_K c^T x decreases by a factor of at least (1 − 1/n). Assuming instead a factor (1 − 1/√n) on the outer radius R at each of these steps, and sampling with a density that concentrates near the optimal solution, we add the following lines to the algorithm (a code sketch follows below): for i = 1, …, m do
- T_i = R(1 − 1/√n)^i;
- sample from K_{i+1} with density f_i(x) ∝ e^{−c^T x / T_i} χ_K(x).
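A hedged sketch of this annealing-style schedule; `sample_density` is a placeholder for a routine returning a point with density proportional to exp(−c·x/T) restricted to K (obtained in practice with the random walks of Lecture 4), and all names and defaults are ours.

```python
import numpy as np

def simulated_annealing_min(c, K_membership, x0, R, sample_density, eps=1e-3):
    """Cool the temperature T_i = R(1 - 1/sqrt(n))^i and keep the best point seen."""
    n = len(c)
    m = int(np.ceil(np.sqrt(n) * np.log(1.0 / eps))) + 1   # O(sqrt(n) log(1/eps)) phases
    x_best = np.asarray(x0, dtype=float)
    T = R
    for _ in range(m):
        T *= (1.0 - 1.0 / np.sqrt(n))
        # One sample from the density proportional to exp(-c.x / T) on K.
        x = sample_density(c, T, K_membership, x_best)
        if c @ x < c @ x_best:
            x_best = x
    return x_best
```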

Theorem 3.1. After m = O(√n log(1/ε)) iterations, with probability ≥ 1 − δ we have c^T x_m ≤ min_K c^T x + ε.

The next lemma supports Theorem 3.1 and the algorithm provided above.

Lemma 3.2. Let f(x) = e^{−c^T x/T} χ_K(x) define a density on K, where K is an n-dimensional convex body, c ∈ R^n and T > 0. Then

E(c^T x) ≤ min_K c^T x + nT.

Proof. After a suitable rigid motion we can assume that c = (1, 0, …, 0) and that x* = a e_1 ∈ K is such that K ⊂ {x ∈ R^n : c^T x* ≤ c^T x} (see Figure 3); translating, we may further assume a = 0, i.e. min_K c^T x = 0. Denote v = E(c^T x) and

K′ = {x ∈ R^n : x = x* + α(y − x*), y ∈ K ∩ H(v), α ≥ 0},

where H(v) = v e_1 + e_1^⊥. Then v = E(c^T x) ≤ E_{K′}(c^T x) and

E_{K′}(c^T x) = ( ∫_0^∞ y e^{−y/T} vol(K′ ∩ H(y)) dy ) / ( ∫_0^∞ e^{−y/T} vol(K′ ∩ H(y)) dy ).

Using vol(K′ ∩ H(y)) = (y/v)^{n−1} vol(K′ ∩ H(v)), we finally get

E_{K′}(c^T x) = ( ∫_0^∞ y^n e^{−y/T} dy ) / ( ∫_0^∞ y^{n−1} e^{−y/T} dy ) = n! T^{n+1} / ( (n−1)! T^n ) = nT.

The last computation follows from ∫_0^∞ y^n e^{−y/T} dy = n! T^{n+1}.


Figure 3: The original set K and the infinite cone K′ containing the direction c.

References

[Gr] B. Grünbaum. Partitions of mass-distributions and of convex bodies by hyperplanes. Pacific J. Math. 10 (1960), 1257–1261.

[V] S. Vempala, http://www.cc.gatech.edu/~vempala/hda/notes.pdf.

[Ke] J. Kelner, 18.409 An Algorithmist's Toolkit, Lecture 13. Scribe: Jonathan Pines.


Winter School: Combinatorial and algorithmic aspects of convexity
Paris, Institut Henri Poincaré, January 19-23, 2015.

Algorithmic aspects of convexity. Santosh Vempala's lecture III: Computing the volume

Scribes: Ben Cousins and Cécile Mailler

1 Integration

1.1 Computing the volume of a convex body

In this section, we are interested in designing an efficient algorithm with the following inputs and outputs. Given

• a membership oracle for the convex body K ⊆ Rn;

• x_0 ∈ K and r, R > 0 such that x_0 + rB^n ⊆ K ⊆ RB^n (where B^n is the unit ball in dimension n); and

• ε > 0,

the algorithm gives V such that

(1− ε)Vol(K) ≤ V ≤ (1 + ε)Vol(K).

Exact formulas are known for the volume of some simple convex bodies, namely parallelepipeds, balls and simplices. Therefore, one can think of dividing the convex body K into such simple pieces.

Naive algorithm 1 (for polytopes): Divide the polytope into simplices. But the number of simplices needed to divide a polytope is exponential in the dimension!
Naive algorithm 2: Partition the convex body K into cubes of edge length δ (like a pixelisation). But once again, there is an exponential number of cubes, and we need to call the membership oracle for each of them.
Naive algorithm 3: Find an ellipsoid E such that E ⊆ K ⊆ n^{3/2}E (this can be done in polynomial time). But it only gives an (n^{3/2})^n-approximation: Vol(E) ≤ Vol(K) ≤ (n^{3/2})^n Vol(E).

Actually, it has been proved that computing the volume is difficult:

Theorem 1 ([Ele86, BF87]). Let a > 0. For any deterministic algorithm that uses n^a (resp. 2^{an}) oracle calls and computes, for every convex body K ⊆ R^n, numbers A(K) and B(K) such that A(K) ≤ Vol(K) ≤ B(K), there exists a convex body K_0 such that

B(K_0)/A(K_0) ≥ ( cst · n / (a ln n) )^{n/2}   (resp. ≥ 2^{cst · n}).

Proof. Assume that K = B^n is the unit ball in dimension n, and that the algorithm has already made m oracle calls (assume also that the m query points were in fact inside K). For all i ∈ {1, …, m}, we thus know that x_i ∈ K = B^n, and hence the ball B^(i) of diameter [0, x_i] is included in B^n. Moreover, for all i ∈ {1, …, m}, we have Vol(B^(i)) ≤ Vol(B^n)/2^n, because the diameter of B^(i) is at most half the diameter of K. Therefore, Vol( ⋃_{i=1}^m B^(i) ) ≤ (m/2^n) Vol(B^n).

Figure 1: The ball B^(i) is the ball of diameter [0, x_i].

To conclude the proof, it is enough to show that conv(x_1, …, x_m) ⊆ ⋃_{i=1}^m B^(i). Assume that y ∈ conv(x_1, …, x_m) and y ∉ ⋃_{i=1}^m B^(i). The second assumption implies that, for all i ∈ {1, …, m}, the angle ∠(0, y, x_i) < π/2, which contradicts y ∈ conv(x_1, …, x_m). Therefore, after m oracle calls, we cannot do better than a 2^n/m approximation, which concludes the proof.

Bárány and Füredi [BF87] also proved the following counterpart of Theorem 1: for all α > 0, an algorithm with complexity (1/α)^n cannot give better than a (1 + α)^n-approximation. However, the following result holds:

Theorem 2 ([DV06]). There exists a deterministic algorithm that finds a (1 + α)^n-approximation with complexity (1/α)^{O(n)}.

Random sampling then allows one to obtain more efficient algorithms:

Theorem 3 ([DFK91]). For all δ, α > 0, there exists a randomised algorithm that computes, with probability 1 − δ, a (1 + α)-approximation, with complexity polynomial in (n, log(R/r), 1/α, log(1/δ)).

A naive randomised algorithm could be the following: take a ball B containing the convex body K, sample uniform random points in B, and approximate the volume of K by the proportion of random points that fall in K times the volume of B. The problem is that almost all points will miss K.

The [DFK91] algorithm is the following. Assume that B^n ⊆ K ⊆ RB^n. Let m = n log_2 R, and for all i ∈ {0, 1, …, m}, define

K_i = K ∩ (2^{i/n} B^n).

Note that

Vol(K) = Vol(B^n) · ∏_{i=1}^m Vol(K_i)/Vol(K_{i−1}).

Then, for i from 1 to m, sample k_i points uniformly in K_i and estimate the ratio Vol(K_{i−1})/Vol(K_i) by the proportion of points falling into K_{i−1}; the i-th factor Vol(K_i)/Vol(K_{i−1}) is the reciprocal of this estimate. Multiplying these factors together with Vol(B^n) gives an approximation of Vol(K). Note that the estimation of Vol(K_{i−1})/Vol(K_i) by uniform random sampling works because Vol(K_i) ≤ 2 Vol(K_{i−1}), so the proportion being estimated is at least 1/2.

To get the complexity, we need to know how many points are needed for a good approximation at each step. By Chebyshev's inequality, m²/α² points per step are enough to get a (1 + α)-approximation. Therefore, in total, we would need m³/α² = n³ log³_2 R / α² samples and oracle calls. But in fact:

Theorem 4 ([DFK91]). O(m²/α²) samples suffice.


Proof. The idea of the proof is the following: given m i.i.d. random variables Y_1, …, Y_m, we have

Var(Y_1 ⋯ Y_m) / (E(Y_1 ⋯ Y_m))² = ∏_{i=1}^m ( 1 + Var(Y_i)/(E Y_i)² ) − 1 ≤ exp( m · Var(Y_1)/(E Y_1)² ) − 1.

In our case, Var(Y_i)/(E Y_i)² ∼ cst/k_i, where k_i is the number of points drawn at step i. Thus, choosing k_i = m/α² (for a total of m · k_i = m²/α² samples) gives

Var(Y_1 ⋯ Y_m) / (E(Y_1 ⋯ Y_m))² ∼ cst · α²,

which concludes the proof by Chebyshev's inequality.

This algorithm by Dyer, Frieze, and Kannan has inspired a wide literature aiming at reducing its complexity. The original algorithm stated above uses a sampling subroutine of complexity O(n³) per sample and has overall complexity O(n^23). The table below lists the successive improvements of this algorithm and their complexities:

authors   complexity   main idea
[DFK91]   n^23
[LS90]    n^16         isoperimetry
[Lov90]   n^10         ball walk
[DF88]    n^8
[LS92]    n^7          rounding + many tools
[KLS97]   n^5          isotropic position
[LV06]    n^4          hit-and-run
[CV14]    n^3          Gaussian cooling

Surprisingly, a way to get a better complexity is to tackle a more complicated problem: namely, integration over a convex body.

1.2 Logconcave Integration

In the previous section, we computed the volume of a convex body K by constructing a sequence of bodies that converges to K, computing the volume change for each body. We now shift our focus from volume to integration, which can be viewed as a generalization of volume computation. We begin by formally stating the integration problem.

Problem 1. Given as input:

• A membership oracle to a convex body K ⊆ Rn.

• A point x_0 ∈ R^n and a number R ∈ R such that x_0 + B^n ⊆ K ⊆ RB^n.

• An oracle for a function f : R^n → R_+ such that ∫_K f(x) dx < ∞.

• An error parameter ε > 0.

Output a number V such that

(1 − ε) ∫_K f(x) dx ≤ V ≤ (1 + ε) ∫_K f(x) dx.

The approach we use for integration is similar to that for volume: we use a sequence of functions that connects an "easy" function to our target function. For a sequence of functions f_0, …, f_m, where each f_i : R^n → R, we rewrite ∫_K f(x) dx as

∫_K f(x) dx = ∫_K f_0(x) dx · ( ∫_K f_1(x) dx / ∫_K f_0(x) dx ) ⋯ ( ∫_K f(x) dx / ∫_K f_m(x) dx ).


We want f_0 to be a function which is easy to integrate over K (perhaps approximately), and then we want to estimate each integral ratio

∫_K f_i(x) dx / ∫_K f_{i−1}(x) dx.

To estimate this ratio, sample a point X with density proportional to f_{i−1} and set Y = f_i(X)/f_{i−1}(X). The expectation of Y is the quantity we wish to estimate.

Claim 1. For Y and f_i as defined above,

E(Y) = ∫_K f_i(x) dx / ∫_K f_{i−1}(x) dx.

Proof. We have

E(Y) = ∫_K ( f_i(x)/f_{i−1}(x) ) · ( f_{i−1}(x) / ∫_K f_{i−1}(y) dy ) dx = ∫_K f_i(x) dx / ∫_K f_{i−1}(x) dx.

The function f_i should be "close" to f_{i−1}, so that the ratio of the integrals is easy to estimate within a target relative error (i.e. E(Y²)/E(Y)² should be bounded). We now sketch the algorithm.

Integrate(K, f, ε)

1. Compute (or estimate) ∫_K f_0; call this quantity R_0.
2. For i = 1, …, m:
   (a) Compute an estimate R_i of the integral ratio ∫_K f_i / ∫_K f_{i−1}.
3. Return R_0 R_1 ⋯ R_m as the estimate for ∫_K f.

Figure 2: General algorithm for integration
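A hedged Python transcription of this general scheme. The helpers `integrate_f0` (exact or approximate integration of f_0) and `sample_prop_to` (sampling with density proportional to f_{i−1}) are assumed available and are not part of the notes.

```python
def integrate(fs, integrate_f0, sample_prop_to, samples_per_ratio=1000):
    """Estimate the integral of fs[-1] over K from the sequence f_0, ..., f_m.

    Each ratio ∫ f_i / ∫ f_{i-1} is estimated as the empirical mean of
    Y = f_i(X)/f_{i-1}(X) with X drawn with density proportional to f_{i-1} (Claim 1).
    """
    estimate = integrate_f0(fs[0])                      # R_0 ≈ ∫_K f_0
    for i in range(1, len(fs)):
        total = 0.0
        for _ in range(samples_per_ratio):
            x = sample_prop_to(fs[i - 1])               # X ~ density ∝ f_{i-1}
            total += fs[i](x) / fs[i - 1](x)
        estimate *= total / samples_per_ratio           # R_i ≈ ∫ f_i / ∫ f_{i-1}
    return estimate
```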

We now describe one way to select the sequence of functions f_0, …, f_m, based on the algorithm in [LV06]. Set f_i(x) = e^{−a_i‖x‖} with

• a_0 = 4n,
• a_i = a_{i−1} · (1 − 1/√n) for i = 1, …, m − 1,
• a_m = ε/(2R).

The proof of the variance bound will use the following lemma about logconcave functions, whose proof is deferred to the end of the section.

Lemma 1 ([LV06]). If a > 0, Z(a) = a^n ∫_K f(ax) dx, and f : R^n → R is logconcave, then Z(a) is a logconcave function of a.

Proof (of Lemma 1). Define

G(t, x) = 1 if t > 0 and x ∈ tK, and G(t, x) = 0 otherwise,


which is a logconcave function. Also define F(t, x) = f(x) · G(t, x). Since f and G are logconcave, F is also logconcave, and hence its marginals are logconcave. The marginal of F in t is

∫_{R^n} f(x) G(t, x) dx = t^n ∫_K f(tx) dx.

Lemma 2 ([LV06]). Let f_i = e^{−a_i‖x‖}, a_i = a_{i−1}(1 − 1/√n), and let X be a random sample with density proportional to f_{i−1}. Then, for Y = f_i(X)/f_{i−1}(X), we have

E(Y²)/E(Y)² ≤ 4.

Proof. For convenience, define F(a) = ∫_K e^{−a‖x‖} dx. From Claim 1, we have

E(Y) = F(a_i)/F(a_{i−1}).

We also derive the second moment:

E(Y²) = ∫_K ( f_i(x)/f_{i−1}(x) )² · ( f_{i−1}(x) / ∫_K f_{i−1}(y) dy ) dx
      = ( ∫_K e^{−2a_i‖x‖} e^{a_{i−1}‖x‖} dx ) / F(a_{i−1})
      = F(2a_i − a_{i−1}) / F(a_{i−1}).

We therefore have

E(Y²)/E(Y)² = F(2a_i − a_{i−1}) F(a_{i−1}) / F(a_i)².

Define Z(a) = a^n F(a). By Lemma 1, Z(a) is a logconcave function of a. Therefore,

Z(2a_i − a_{i−1}) Z(a_{i−1}) / Z(a_i)² ≤ 1,

which after rearranging terms gives

E(Y²)/E(Y)² ≤ ( a_i² / ((2a_i − a_{i−1}) a_{i−1}) )^n
            = ( 1 / ((2 − a_{i−1}/a_i)(a_{i−1}/a_i)) )^n
            = ( 1 / ((1 + 1/√n)(1 − 1/√n)) )^n     (since a_{i−1}/a_i ≈ 1 + 1/√n)
            = ( 1 / (1 − 1/n) )^n = ( 1 + 1/(n − 1) )^n ≤ 4.

We recall well-known properties of logconcave functions.

Theorem 5. Marginals of logconcave functions are logconcave. Logconcave functions are closed under convolution.


The following theorem is commonly known as the Prékopa-Leindler inequality.

Theorem 6. Suppose f, g, h : R^n → R_+ are integrable and that for all x, y ∈ R^n and λ ∈ [0, 1], h(λx + (1 − λ)y) ≥ f(x)^λ g(y)^{1−λ}. Then

∫_{R^n} h(x) dx ≥ ( ∫_{R^n} f(x) dx )^λ ( ∫_{R^n} g(x) dx )^{1−λ}.

Proof. We prove the theorem by induction on the dimension n. First consider n = 1. Let L_f(t) = {x : f(x) ≥ t} be a level set of f (and similarly for g and h). Then

λ L_f(t) + (1 − λ) L_g(t) = {λx + (1 − λ)y : f(x) ≥ t, g(y) ≥ t} ⊆ L_h(t),

since h(λx + (1 − λ)y) ≥ f(x)^λ g(y)^{1−λ} ≥ t. Therefore, by the one-dimensional Brunn-Minkowski inequality, vol(L_h(t)) ≥ λ vol(L_f(t)) + (1 − λ) vol(L_g(t)) for all λ ∈ [0, 1], and

∫_R h(x) dx = ∫_0^∞ vol(L_h(t)) dt ≥ λ ∫_0^∞ vol(L_f(t)) dt + (1 − λ) ∫_0^∞ vol(L_g(t)) dt
            = λ ∫_R f(x) dx + (1 − λ) ∫_R g(x) dx
            ≥ ( ∫_R f(x) dx )^λ ( ∫_R g(x) dx )^{1−λ},

where the last step is the arithmetic-geometric mean inequality.

Now suppose the inequality is true in dimension n − 1. Write h(z, x) = h_z(x) for z ∈ R, x ∈ R^{n−1} (and similarly for f and g). For z = λz_1 + (1 − λ)z_2 and any x_1, x_2 ∈ R^{n−1}, the hypothesis gives

h(λz_1 + (1 − λ)z_2, λx_1 + (1 − λ)x_2) ≥ f(z_1, x_1)^λ g(z_2, x_2)^{1−λ},

which means that

h_z(λx_1 + (1 − λ)x_2) ≥ f_{z_1}(x_1)^λ g_{z_2}(x_2)^{1−λ}.

By induction, we have

∫_{R^{n−1}} h_z(x) dx ≥ ( ∫_{R^{n−1}} f_{z_1}(x) dx )^λ ( ∫_{R^{n−1}} g_{z_2}(x) dx )^{1−λ},

so the one-dimensional marginals z ↦ ∫ h_z, z_1 ↦ ∫ f_{z_1}, z_2 ↦ ∫ g_{z_2} satisfy the hypothesis of the case n = 1, and thus

∫_{R^n} h(x) dx ≥ ( ∫_{R^n} f(x) dx )^λ ( ∫_{R^n} g(x) dx )^{1−λ}.

We now give a slightly more detailed algorithm for integration, which works for any logconcave function.

Integrate(K, f, ε)

1. Set f_i(x) = f(x)^{a_i}, x ∈ K.
2. Set a_0 = 0, a_m = 1, and a_i = a_{i+1}(1 − 1/√n) for i = m − 1, …, 1 (so the exponents increase geometrically up to 1).
3. For i = 1, …, m, estimate W_i = ∫ f_i / ∫ f_{i−1}.
4. Output W_1 ⋯ W_m · ∫ f_0.

We note that to optimize a logconcave function f, we can use a slightly different cooling schedule and, instead of estimating integral ratios, simply output the point x with the largest function value f(x) seen. So integrating and optimizing a general logconcave function are very closely related.


References

[BF87] I. Bárány and Z. Füredi. Computing the volume is difficult. Discrete & Computational Geometry, 2(1):319–326, 1987.

[CV14] B. Cousins and S. Vempala. Bypassing KLS: Gaussian cooling and an O*(n^3) volume algorithm. 2014.

[DF88] M. E. Dyer and A. M. Frieze. On the complexity of computing the volume of a polyhedron. SIAM Journal on Computing, 17(5):967–974, 1988.

[DFK91] M. Dyer, A. Frieze, and R. Kannan. A random polynomial-time algorithm for approximating the volume of convex bodies. Journal of the ACM, 38(1):1–17, 1991.

[DV06] Amit Deshpande and Santosh Vempala. Adaptive sampling and fast low-rank matrix approximation. In APPROX-RANDOM, pages 292–303, 2006.

[Ele86] G. Elekes. A geometric inequality and the complexity of computing volume. Discrete & Computational Geometry, 1(1):289–292, 1986.

[KLS97] R. Kannan, L. Lovász, and M. Simonovits. Random walks and an O*(n^5) volume algorithm for convex bodies. Random Structures and Algorithms, 11:1–50, 1997.

[Lov90] L. Lovász. How to compute the volume? Jber. d. Dt. Math.-Verein, Jubiläumstagung 1990, pages 138–151, 1990.

[LS90] L. Lovász and M. Simonovits. Mixing rate of Markov chains, an isoperimetric inequality, and computing the volume. In Proc. 31st IEEE Annual Symp. on Found. of Comp. Sci., pages 482–491, 1990.

[LS92] L. Lovász and M. Simonovits. On the randomized complexity of volume and diameter. In Proc. 33rd IEEE Annual Symp. on Found. of Comp. Sci., pages 482–491, 1992.

[LV06] L. Lovász and S. Vempala. Simulated annealing in convex bodies and an O*(n^4) volume algorithm. J. Comput. Syst. Sci., 72(2):392–417, 2006.


Vempala’s course: lecture 4

Scribes: Mohamed Jalel ATIA and Lincong FANG

February 2, 2015

1 Sampling

Let f : R^n → R_+ be such that ∫f < ∞. Given x_0 ∈ R^n and ε > 0, the goal is to output X drawn from a distribution π with d(π, π_f) ≤ ε, where π_f is the distribution whose density is proportional to f. In particular, for a convex body K given by a membership oracle together with (x_0, r, R), our goal is to sample X (approximately) uniformly from K.

Grid walk (spacing δ).
- At X, pick y = ±e_i (each with probability 1/(4n)) or y = 0 (with probability 1/2).
- If X + δy ∈ K, then move to X + δy.
Let Q_0, Q_1, …, Q_t denote the resulting densities. To produce a point of K from the final grid point X:
- pick a point uniformly from the grid cube C_X of side δ around X;
- if it belongs to K, output it; else restart.

A distribution Q is stationary if Q(i) = ∑_j Q(j) P(j, i). Because of symmetry we have P(i, j) = P(j, i) = 1/(4n) for neighboring grid points, hence the uniform distribution Q on the support (δZ)^n ∩ K is stationary.

Figure 1: An illustration of the grid walk (grid of spacing δ).


Figure 2: Illustration of the densities Q_t and Q_{t+1}.

Figure 3: S = {x : C_x ∩ K ≠ ∅}.

Theorem. If the chain is irreducible and non-bipartite (aperiodic), then it converges to its stationary distribution. Here this holds essentially because p(i, i) ≥ 1/2 (the walk is lazy).

Let us consider S = {x : C_x ∩ K ≠ ∅}.
Claim. Any two x, y ∈ (δZ)^n ∩ K are connected by a path of grid edges inside S.

The question is: how large is |S| / |(δZ)^n ∩ K|? We will try to bound this ratio.

Let us consider B ⊂ K ⊆ RB. Let K̃ = (1 + α)K; since B ⊂ K, we have K + αB ⊆ K̃, and with α = δ√n the body K̃ contains every cube C_x that intersects K. Then

|S| / |(δZ)^n ∩ K| ≤ vol((1 + α)K) / vol((1 − α)K) ≤ ( (1 + α)/(1 − α) )^n = ( (1 + δ√n)/(1 − δ√n) )^n ≤ C if δ ≤ 1/(n√n).

Figure 4: Any two x, y ∈ (δZ)^n ∩ K are connected by edges on S.


Figure 5: The extension K + αB.

Figure 6: Illustration of the ball walk.

How many steps does the walk need? How long does it take?

Ball walk (with radius δ): the support is K. At X, pick y uniformly from X + δB; if y ∈ K, go to y. Neighborhood: (X + δB) ∩ K.

Hit-and-Run: the support is K. At X:
- pick a direction ℓ uniformly from the sphere S^{n−1};
- go to a uniformly random point y on the chord ℓ(X) = K ∩ {X + tℓ : t ∈ R}.

Figure 7: Illustration of Hit-and-Run.
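A hedged sketch of one hit-and-run step for a convex body given by a membership oracle; the chord endpoints are located by binary search along the chosen direction, and all names are ours.

```python
import numpy as np

def hit_and_run_step(x, membership, R, rng=np.random.default_rng(0), tol=1e-9):
    """One hit-and-run step from x; `membership(y)` returns True iff y is in K ⊆ R B^n."""
    d = rng.normal(size=len(x))
    d /= np.linalg.norm(d)                       # uniform direction on S^{n-1}

    def chord_end(sign):
        lo, hi = 0.0, 2.0 * R                    # the chord through x has length <= 2R
        while hi - lo > tol:                     # binary search for the boundary
            mid = 0.5 * (lo + hi)
            if membership(x + sign * mid * d):
                lo = mid
            else:
                hi = mid
        return lo

    t_plus, t_minus = chord_end(+1.0), chord_end(-1.0)
    t = rng.uniform(-t_minus, t_plus)            # uniform point on the chord
    return x + t * d

# Example: repeated steps inside the unit ball.
membership = lambda y: float(np.dot(y, y)) <= 1.0
x = np.zeros(3)
for _ in range(1000):
    x = hit_and_run_step(x, membership, R=1.0)
```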


The neighborhood of Hit-and-Run is all of K.

Figure 8: Illustration of the Dikin walk.

Dikin walk: the support is K. At X:
- compute the Dikin ellipsoid E_X;
- pick Y uniformly from E_X;
- move to Y with probability min{1, vol(E_X)/vol(E_Y)}.
Neighborhood: X + E_X.

Markov scheme. Let the state space be a convex body K and let A be a σ-algebra of subsets of K. For u ∈ K we define the one-step probability P_u(A) for every A ∈ A. A Markov chain is a sequence w_0, w_1, …, w_k, w_{k+1}, … such that Pr(w_{k+1} ∈ A | w_0, …, w_k) = Pr(w_{k+1} ∈ A | w_k) = P_{w_k}(A). A distribution Q is stationary if

∫_K P_u(A) dQ(u) = Q(A).

Time reversible: for all A, B ∈ A,

∫_A P_u(B) dQ(u) = ∫_B P_u(A) dQ(u).

The ergodic flow is defined by

Φ(A) = ∫_A P_u(K \ A) dQ(u).

The conductance is defined by

φ(A) = Φ(A) / min{Q(A), Q(K \ A)}.

Convergence of Q_0, Q_1, …, Q_k → Q is measured by

d_TV(Q_t, Q) = sup_A (Q_t(A) − Q(A)),

χ²(Q_t, Q) = ∫ ( dQ_t(u)/dQ(u) − 1 )² dQ(u).


M(Q_t, Q) = sup_A Q_t(A)/Q(A).

Theorem (lazy, connected).

d_TV(Q_t, Q) ≤ √(M(Q_0, Q)) (1 − φ²/2)^t,

where φ = inf_A φ(A).

Theorem (time reversible).

χ²(Q_t, Q) ≤ (1 − φ²/2)^t χ²(Q_0, Q).

Proof of Theorem 1.
- The first fact is that the distribution Q is stationary if and only if Φ(A) = Φ(K \ A) for all A ∈ A.
For x ∈ [0, 1] we want to control sup_{A : Q(A)=x} (Q_t(A) − Q(A)). Define

h_t(x) = sup_{g ∈ G_x} ( ∫ g(u) dQ_t(u) − ∫ g(u) dQ(u) ),

where

G_x = { g : K → [0, 1] : ∫ g(u) dQ(u) = x }.

- The second fact is that if Q is atom-free, then h_t(x) = sup_{Q(A)=x} (Q_t(A) − Q(A)).

Lemma.
- h_t(x) is a concave function of x.
- h_t(x) ≤ (1/2) ( h_{t−1}(x − 2φy) + h_{t−1}(x + 2φy) ), where y = min{x, 1 − x}.

Proof. If g_1 ∈ G_{x_1} and g_2 ∈ G_{x_2}, then λg_1 + (1 − λ)g_2 ∈ G_{λx_1+(1−λ)x_2}, so

h_t(λx_1 + (1 − λ)x_2) ≥ ∫ ( λg_1(u) + (1 − λ)g_2(u) ) dQ_t(u) − ( λx_1 + (1 − λ)x_2 )
= λ( ∫ g_1(u) dQ_t(u) − x_1 ) + (1 − λ)( ∫ g_2(u) dQ_t(u) − x_2 );

taking g_1, g_2 (nearly) optimal gives h_t(λx_1 + (1 − λ)x_2) ≥ λ h_t(x_1) + (1 − λ) h_t(x_2), i.e. concavity.

Using the lazy character of the walk, fix A with Q(A) = x ≤ 1/2 and define

g_1(u) = 2P_u(A) − 1 if u ∈ A, and g_1(u) = 0 if u ∉ A,
g_2(u) = 2P_u(A) if u ∉ A, and g_2(u) = 1 if u ∈ A.

(Laziness, P_u({u}) ≥ 1/2, guarantees that g_1 and g_2 take values in [0, 1].) We have

(1/2) g_1(u) + (1/2) g_2(u) = P_u(A) for all u.

Consequently,

∫ ( (1/2)g_1(u) + (1/2)g_2(u) ) dQ_{t−1}(u) = ∫ P_u(A) dQ_{t−1}(u) = Q_t(A),
∫ ( (1/2)g_1(u) + (1/2)g_2(u) ) dQ(u) = ∫ P_u(A) dQ(u) = Q(A) = x = (1/2)(x_1 + x_2),

where x_1 = ∫ g_1 dQ and x_2 = ∫ g_2 dQ. Choosing A such that h_t(x) = Q_t(A) − Q(A), we get

h_t(x) = (1/2) ∫ g_1(u) dQ_{t−1}(u) + (1/2) ∫ g_2(u) dQ_{t−1}(u) − (1/2)(x_1 + x_2)
       = (1/2)( ∫ g_1(u) dQ_{t−1}(u) − x_1 ) + (1/2)( ∫ g_2(u) dQ_{t−1}(u) − x_2 ),

and therefore

h_t(x) ≤ (1/2) h_{t−1}(x_1) + (1/2) h_{t−1}(x_2).

Moreover,

x_1 = ∫_K g_1(u) dQ(u) = ∫_A (2P_u(A) − 1) dQ(u) = 2 ∫_A (1 − P_u(K \ A)) dQ(u) − x
    = 2x − x − 2 ∫_A P_u(K \ A) dQ(u) = x − 2 ∫_A P_u(K \ A) dQ(u).

With Q(A) = x ≤ 1/2 we have

φ ≤ φ(A) = ( ∫_A P_u(K \ A) dQ(u) ) / Q(A),

so x_1 ≤ x − 2φx; similarly x_2 ≥ x + 2φx (recall x_1 + x_2 = 2x). By concavity of h_{t−1},

h_t(x) ≤ (1/2) h_{t−1}(x(1 − 2φ)) + (1/2) h_{t−1}(x(1 + 2φ)).

Figure 9: Illustration of h_{t−1}.

Theorem. Let C_0, C_1 be such that h_0(x) ≤ C_0 + C_1 min{√x, √(1 − x)}. Then

h_t(x) ≤ C_0 + C_1 min{√x, √(1 − x)} (1 − φ²/2)^t.

Proof. For x ≤ 1/2 (the case x ≥ 1/2 is symmetric) we have, by induction on t,

h_t(x) ≤ (1/2) h_{t−1}(x(1 − 2φ)) + (1/2) h_{t−1}(x(1 + 2φ))
      ≤ C_0 + (1/2) C_1 ( √(x(1 − 2φ)) + √(x(1 + 2φ)) ) (1 − φ²/2)^{t−1}
      ≤ C_0 + C_1 √x (1 − φ²/2)(1 − φ²/2)^{t−1} = C_0 + C_1 √x (1 − φ²/2)^t.


Figure 10: Illustration of l(X).

With

h_0(x) ≤ min{Mx, 1 − x} ≤ √(Mx) ⟹ h_t(x) ≤ √M (1 − φ²/2)^t for all x,

where M = M(Q_0, Q), we then have

d_TV(Q_t, Q) ≤ √M (1 − φ²/2)^t.

Remark. The mixing time is of order O(1/φ²). Consider the local conductance

ℓ(x) = vol((x + δB) ∩ K) / vol(δB).

Let K′ = K + αB^n with α = 2δ√n. Then (since B ⊆ K)

K′ = K + 2δ√n B ⊆ (1 + 2δ√n) K,

and hence

vol(K′) ≤ (1 + 2δ√n)^n vol(K);

for δ ≤ 1/(2n√n) we have vol(K′) ≤ ρ · vol(K) for an absolute constant ρ.

To show that φ(A) is large for every A ⊆ K, we use the following lemma.

Lemma. For u, v ∈ K with ℓ(u), ℓ(v) ≥ ℓ,

‖u − v‖ ≤ tδ/√n ⟹ d_TV(P_u, P_v) ≤ t + 1 − ℓ.

Theorem. We have

φ ≥ ℓ²δ/(16D√n) = Ω(1/(n²D)),

and the mixing time is of order O(n⁴R²).

Proof. We want to show that, for any partition K = S_1 ∪ S_2,

∫_{S_1} P_u(S_2) dQ(u) ≥ φ_0 · min{Q(S_1), Q(S_2)}

for the claimed value φ_0. Define

S′_1 = {u ∈ S_1 : P_u(S_2) < ℓ/4},  S′_2 = {u ∈ S_2 : P_u(S_1) < ℓ/4}.


Figure 11: The extension K + αB.


Figure 13: The partition into S_1, S_2 and the sets S′_1, S′_2, S′_3.

We can then make the following claim.

Claim. For all u ∈ S′_1 and v ∈ S′_2, ‖u − v‖ ≥ ℓδ/(2√n).

Proof. For such u, v we have d_TV(P_u, P_v) > 1 − ℓ/2, so applying the lemma with t = ℓ/2 (in contrapositive form) gives ‖u − v‖ ≥ ℓδ/(2√n).

If Q(S′_1) < (1/2) Q(S_1), then

∫_{S_1} P_u(S_2) dQ(u) ≥ (ℓ/4) Q(S_1 \ S′_1) ≥ (ℓ/8) Q(S_1),

and we are done (similarly for S_2). So assume Q(S′_i) ≥ (1/2) Q(S_i) for i = 1, 2, and let S′_3 = K \ (S′_1 ∪ S′_2). Then

∫_{S_1} P_u(S_2) dQ(u) = (1/2)( ∫_{S_1} P_u(S_2) dQ(u) + ∫_{S_2} P_u(S_1) dQ(u) ) ≥ (ℓ/4) · vol(S′_3)/vol(K).

Using

vol(S′_3) ≥ (2 d(S′_1, S′_2)/D) · min{vol(S′_1), vol(S′_2)}

together with d(S′_1, S′_2) ≥ ℓδ/(2√n), we find

∫_{S_1} P_u(S_2) dQ(u) ≥ (ℓ/4) · (2/D) · (ℓδ/(2√n)) · (1/2) · min{Q(S_1), Q(S_2)} = (ℓ²δ/(8D√n)) · min{Q(S_1), Q(S_2)} ≥ (c/(n²D)) · min{Q(S_1), Q(S_2)}

for our choice of δ.


Winter School: Combinatorial and algorithmic aspects of convexity
Paris, Institut Henri Poincaré, January 19-23, 2015.

Algorithmic aspects of convexity. Santosh Vempala's lecture V.

Scribes: Giorgos Chasapis, Georgios Samaras, Vissarion Fisikopoulos

1 Preliminaries (more or less mentioned in the previous lecture)

We consider a state space K and a σ-algebra A on the subsets of K. Foru ∈ K and A ∈ A, let Pu(A) be the ”one-step probability”, which tells us theprobability of being in A after taking one step from u. We have a startingdistribution Q0 on K which gives us a probability Q0(A) of starting in the setA ∈ A.

With this setup, a Markov chain is a sequence of points w0, w1, w2, . . . such that P(w0 ∈ A) = Q0(A) and

P(wi+1 ∈ A|w0, . . . , wi) = P(wi+1 ∈ A|wi) = Pwi(A),

for each A ∈ A. A distribution Q is called stationary if, for each A ∈ A,

Q(A) =

KPu(A) dQ(u).

The Markov chain is called time reversible if, for each A, B ∈ A,

∫A Pu(B) dQ(u) = ∫B Pu(A) dQ(u).

For any A ∈ A, the ergodic flow Φ(A) is the probability of transitioning from A to K \ A,

Φ(A) = ∫A Pu(K \ A) dQ(u).

We define, for each A ∈ A,

φ(A) = Φ(A) / min{Q(A), Q(K \ A)}.

Then the conductance of K is φ = minA⊂K φ(A), where Q is a stationary distribution.
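For intuition, these definitions can be checked on a finite reversible chain, where the conductance is computable by brute force over all subsets. A minimal Python sketch, with an arbitrary illustrative three-state chain (the matrix P and vector Q below are made up for the example):

import itertools
import numpy as np

# A small reversible chain: transition matrix P and its stationary distribution Q.
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
Q = np.array([1.0, 2.0, 1.0]) / 4.0   # satisfies Q[i] P[i,j] = Q[j] P[j,i]

def conductance(P, Q):
    n = len(Q)
    best = float("inf")
    for r in range(1, n):
        for A in itertools.combinations(range(n), r):
            A = list(A)
            comp = [i for i in range(n) if i not in A]
            flow = sum(Q[i] * P[i, j] for i in A for j in comp)   # ergodic flow Phi(A)
            best = min(best, flow / min(Q[A].sum(), Q[comp].sum()))
    return best

print(conductance(P, Q))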

Now let the state space be a convex body K in Rn and A the set of measurable subsets of K. The local conductance at u ∈ K is

ℓ(u) = 1 − Pu({u}) = vol((u + δBn) ∩ K) / vol(δBn).

We have seen that the conductance of the ball walk in K can be exponentially small. We can bypass this problem by defining the α-extension of K, K′ = K + αBn. Then for α > 2δ√n we have ℓ(u) > 1/8, for every u ∈ K′.
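As a concrete illustration, one lazy ball-walk step and a Monte Carlo estimate of the local conductance ℓ(u) can be sketched as follows. The body used here is a hypothetical axis-aligned box standing in for K′; in general all that is needed is a membership oracle, and the parameters are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def in_box(x, half_widths):
    # Membership oracle for the box [-w_i, w_i]^n (an illustrative convex body).
    return np.all(np.abs(x) <= half_widths)

def random_ball_point(n, delta):
    # Uniform point in the ball of radius delta: uniform direction, radius delta * U^(1/n).
    d = rng.normal(size=n)
    d /= np.linalg.norm(d)
    return delta * rng.random() ** (1.0 / n) * d

def ball_walk_step(x, delta, membership):
    # Lazy ball walk: propose y uniform in x + delta*B_n, move there iff y is in the body.
    y = x + random_ball_point(len(x), delta)
    return y if membership(y) else x

def local_conductance(x, delta, membership, trials=10000):
    # Monte Carlo estimate of l(x) = vol((x + delta*B_n) ∩ K) / vol(delta*B_n).
    hits = sum(membership(x + random_ball_point(len(x), delta)) for _ in range(trials))
    return hits / trials

n = 10
K = lambda x: in_box(x, np.ones(n))
x = np.zeros(n)
for _ in range(100):
    x = ball_walk_step(x, delta=1.0 / np.sqrt(n), membership=K)
print(x, local_conductance(x, 1.0 / np.sqrt(n), K))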


If Qi is the distribution of the i-th step of the random walk, set

dTV(Qt, Q) = supA∈A (Qt(A) − Q(A))

and

M(Qt, Q) = supA∈A Qt(A)/Q(A).

The following theorem was proved in lecture 4.

Theorem 1.1.

dTV(Qt, Q) ≤ √(M(Q0, Q)) (1 − φ²/2)^t.

We want to prove that the random walk we defined is rapidly mixing, that is, that the conductance φ is bounded from below by an inverse polynomial in the dimension. Specifically, our goal is to prove

Theorem 1.2. If D = diam(K) and for every u ∈ K the local conductance of the ball walk with step size δ is at least ℓ, then

φ ≥ ℓ²δ / (16√n D).

Proof. Consider an arbitrary measurable S1 ⊆ K and set S2 = K \ S1. We will prove that

Φ(S1) = ∫S1 Pu(S2) dQ(u) ≥ (ℓ²δ / (16√n D)) · min{Q(S1), Q(S2)}.

Observe that, since the distribution Q is stationary, we have

∫S1 Pu(S2) dQ(u) = ∫S2 Pu(S1) dQ(u).

Next we define the sets

S′1 = {u ∈ S1 : Pu(S2) < ℓ/4},
S′2 = {u ∈ S2 : Pu(S1) < ℓ/4}, and
S′3 = K \ (S′1 ∪ S′2).

Then

∫S1 Pu(S2) dQ(u) = (1/2) ( ∫S1 Pu(S2) dQ(u) + ∫S2 Pu(S1) dQ(u) ) ≥ (1/2) ∫S′3 (ℓ/4) dQ(u).

We have thus proved that Φ(S1) ≥ (ℓ/8) · Q(S′3). We will see how we derive the wanted result from the following theorem, a variant of which will be proved in the next section.


Theorem 1.3. Given a partition S′1, S′2, S′3 of a convex body K in Rn,

Q(S′3) ≥ (2/D) d(S′1, S′2) · min{Q(S′1), Q(S′2)},

where D = diam(K) and d(S′1, S′2) is the usual Euclidean distance between the sets S′1, S′2.

We will also use the fact that S′1 and S′2 are "far apart". The following Lemma was proved in Lecture 4.

Lemma 1.4. Let u, v ∈ K be such that ℓ(u), ℓ(v) ≥ ℓ. Then

‖u − v‖ ≤ tδ/√n  =⇒  dTV(Pu, Pv) ≤ t + 1 − ℓ.

Now let u ∈ S′1 and v ∈ S′2. Then Pu(S1) > 1 − ℓ/4 while Pv(S1) < ℓ/4, so dTV(Pu, Pv) > 1 − ℓ/2, and Lemma 1.4 (applied with t = ℓ/2) gives

‖u − v‖ > ℓδ/(2√n).

Assume that Q(S′1) < (1/2) Q(S1). Then

∫S1 Pu(S2) dQ(u) ≥ (ℓ/4) ∫S1∖S′1 dQ(u) ≥ (ℓ/8) Q(S1),

which proves what was wanted. We are thus left with the case Q(S′i) ≥ (1/2) Q(Si), i = 1, 2. Then, by Theorem 1.3,

Φ(S1) ≥ (ℓ/8) · (2/D) · (ℓδ/(2√n)) · min{Q(S′1), Q(S′2)}
      ≥ (ℓ²/8) · (δ/(D√n)) · (1/2) · min{Q(S1), Q(S2)},

which implies that

φ ≥ ℓ²δ/(16D√n),

concluding the proof of the Theorem.

Remarks 1.5. (i) The random walk mixing rate is of order O(1/φ²), so by Theorem 1.2 we have an upper bound of O(n⁴D²).
(ii) Another method would be to bound the average local conductance EK(ℓ(u)) ≥ c, so that, choosing δ = c′/√n, we would have a mixing rate of order O(n²D²).
(iii) By considering the example of a cylinder of height D and base radius 1, with δ = 1/√n, we get the lower bound Ω(n²D²).


Figure 1: The example of the cylinder of height D and base radius 1, with step size δ = 1/√n.

Figure 2: A partition of a convex body K into S1, S2, S3.

2 The localization lemma and an isoperimetric inequality

We now state and prove the isoperimetric inequality of Theorem 1.3 in the more general context of log-concave functions.

Theorem 2.1. Let f be a log-concave function, K := supp(f), D := diam(K), and Πf the induced probability distribution. Then for any partition S1, S2, S3 of K,

Πf(S3) ≥ (2 d(S1, S2)/D) · min{Πf(S1), Πf(S2)}.
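As a quick sanity check, take f to be the uniform density on the unit cube [0, 1]^n (so D = √n) and the slab partition S1 = {x1 < a}, S2 = {x1 > b}, S3 the slab in between; all measures and the distance d(S1, S2) = b − a are then explicit. A minimal sketch in Python (the cube and the cut points a, b are illustrative choices):

import math

def check_slab_partition(n, a, b):
    # Uniform measure on [0,1]^n, partitioned by the first coordinate at a < b.
    D = math.sqrt(n)                 # diameter of the cube
    pi1, pi2, pi3 = a, 1.0 - b, b - a
    lhs = pi3
    rhs = 2.0 * (b - a) / D * min(pi1, pi2)
    return lhs, rhs, lhs >= rhs

print(check_slab_partition(n=16, a=0.4, b=0.6))   # lhs ≈ 0.2, rhs ≈ 0.04, True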

We write φf for the best constant c > 0 such that

Πf(S3) ≥ c · min{Πf(S1), Πf(S2)}.

By a Theorem of Kannan, Lovasz and Simonovits [1], we have

φf ≥ c / √(EΠf(‖X − X̄‖²)) = c / √(∑ λi(A)),

where X̄ = EΠf(X), A is the covariance matrix of Πf and λ1(A) ≥ λ2(A) ≥ . . . ≥ λn(A) are the eigenvalues of A. In the same paper, KLS prove the upper bound φf ≤ c/√(λ1(A)) and conjecture that φf is actually of the order Θ(1/√(λ1(A))).

Conjecture 2.2 (Kannan, Lovasz, Simonovits, [1]).

φf ≥ c / √(λ1(A)).


We also formulate the following (weaker) conjecture, in terms of the Frobenius norm of A, which we discuss at the end of these notes.

Conjecture 2.3.

φf ≥ c / (∑ λi(A)²)^(1/4) = c / √(‖A‖F).
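Both lower bounds are easy to evaluate on data: given samples from Πf, form the empirical covariance A and compare c/√(∑ λi(A)) = c/√(tr A) with c/√‖A‖F. A minimal Python sketch (the anisotropic Gaussian sample and the constant c = 1 are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(1)

def isoperimetry_lower_bounds(samples, c=1.0):
    # KLS-type bound c / sqrt(trace(A)) and the Frobenius variant c / sqrt(||A||_F),
    # where A is the empirical covariance matrix of the samples.
    A = np.cov(samples, rowvar=False)
    return c / np.sqrt(np.trace(A)), c / np.sqrt(np.linalg.norm(A, "fro"))

# Illustrative log-concave distribution: an anisotropic Gaussian in R^5.
X = rng.normal(size=(20000, 5)) * np.array([1.0, 1.0, 1.0, 2.0, 5.0])
print(isoperimetry_lower_bounds(X))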

We now proceed to the proof of our main result.

Proof of Theorem 2.1. Let c = 2d(S1, S2)/D and suppose that S1, S2, S3 is a partition of K such that

∫S3 f < c ∫S1 f  and  ∫S3 f < c ∫S2 f,

or equivalently that ∫Rn g > 0 and ∫Rn h > 0, where

(1)  g(x) = cf(x) if x ∈ S1,  0 if x ∈ S2,  −f(x) if x ∈ S3,
and
     h(x) = 0 if x ∈ S1,  cf(x) if x ∈ S2,  −f(x) if x ∈ S3.

To prove the theorem, we reduce the assertion to the one-dimensional case, through the following localization lemma.

Figure 3: The truncated cone construction. Connection of n-dimensional integration with one-dimensional.

Lemma 2.4. Let g, h : Rn → R be lower semi-continuous integrable functions such that ∫Rn g > 0 and ∫Rn h > 0. Then there exist two points a, b ∈ Rn and an affine function ("needle") l : [0, 1] → R+ such that

∫0^1 l(t)^(n−1) g((1 − t)a + tb) dt > 0  and  ∫0^1 l(t)^(n−1) h((1 − t)a + tb) dt > 0.

Sketch of the Proof. The proof of the Lemma can be roughly divided into three steps.
Step 1. Let A be an (n − 2)-dimensional affine subspace of Rn. For each such


A, there is a halfspace H (bisecting halfspace) with A contained in its bounding hyperplane, such that

∫H g = (1/2) ∫Rn g,

while at the same time (replacing H by its complementary halfspace, if necessary)

∫H g > 0 and ∫H h > 0.

Now let A1, A2, . . . be a sequence of such (n − 2)-dimensional subspaces with rational coordinates, and consider K = K0 ⊇ K1 ⊇ K2 ⊇ . . . a respective sequence of convex bodies, where each Ki+1 is obtained from Ki by cutting it into two by a bisecting hyperplane Pi through Ai, and choosing the appropriate half. Then L = ⋂i Ki is at most 1-dimensional.

Step 2. Without loss of generality, let a = o and b = e1. Define

Zt = {x ∈ Rn : x1 = t}.

By the Brunn-Minkowski inequality, for each i = 1, 2, . . . the function

ψi(t) = ( vol(Ki ∩ Zt) / vol(Ki) )^(1/(n−1))

is concave. Moreover, ψi(t) ≤ n^(1/(n−1)) and for each 0 ≤ s ≤ t ≤ 1,

(s/t) ψi(t) ≤ ψi(s) ≤ ((1 − s)/(1 − t)) ψi(t).

Thus there is a limiting concave function ψ such that

∫ ψi(t)^(n−1) g −→ ∫ ψ(t)^(n−1) g.

Since ∫ ψi(t)^(n−1) g > 0 for each i = 1, 2, . . ., it follows that the assertion of the Lemma holds with the function ψ in the place of l.

Step 3. The final step in the proof is to obtain the affine function l from the concave function ψ of Step 2. The technical details were not presented in class.

To conclude the proof of the Theorem, we proceed as follows: Partition [0, 1] into Z1, Z2, Z3, where

Zi = {t ∈ [0, 1] : (1 − t)a + tb ∈ Si}, i = 1, 2, 3.

Apply Lemma 2.4 to the functions g, h of (1).¹

¹ Actually the functions g and h as defined in (1) are not lower semi-continuous. However this can be achieved by expanding S1 and S2 slightly so as to make them open sets, and making the support of f an open set. Since we are proving strict inequalities, we do not lose anything by these modifications.


Rewriting g and h in terms of our original function f we get that

∫Z3 l(t)^(n−1) f((1 − t)a + tb) dt < c ∫Z1 l(t)^(n−1) f((1 − t)a + tb) dt

and

∫Z3 l(t)^(n−1) f((1 − t)a + tb) dt < c ∫Z2 l(t)^(n−1) f((1 − t)a + tb) dt.

The functions f and l(·)^(n−1) are both log-concave, so the same holds for their product F(t) = l(t)^(n−1) f((1 − t)a + tb). Since the points (1 − t)a + tb with t ∈ Z1, Z2 belong to S1, S2 and ‖b − a‖ ≤ D, we have d(Z1, Z2) ≥ d(S1, S2)/D = c/2. So we will have reached a contradiction as soon as we prove that

∫Z3 F ≥ 2 d(Z1, Z2) · min{ ∫Z1 F, ∫Z2 F }.

Assume at first that Z1, Z3, Z2 are successive intervals, and write u (resp. v) for the right (resp. left) endpoint of Z1 (resp. Z2), so that d(Z1, Z2) = |u − v|. Without loss of generality suppose that F(u) ≤ F(v). By log-concavity, F ≥ F(u) on (u, v), while F ≤ F(u) on [0, u] ⊇ Z1. Then

∫Z3 F ≥ F(u) · |u − v| = |u − v| · F(u) · |1 − 0| ≥ |u − v| · ∫Z1 F ≥ d(Z1, Z2) · ∫Z1 F,

which proves our claim, disregarding the factor 2 (the full proof would require a little bit more of a struggle).

In the general setting, consider a maximal interval (r, s) ⊆ Z3. Then, by our previous argument, the integral of F over Z3 is at least c times the smaller of the integrals to its left [0, r] and to its right [s, 1]. If all of Z1 or Z2 is contained in one of these intervals, we are done. If not, then set U = [0, r] ∪ [s, 1]. Since Z1 and Z2 are separated by at least d(S1, S2)/D, there is an interval of Z3 of length d(S1, S2)/D between U ∩ Z1 and U ∩ Z2. Repeating the process for this interval yields the required result.

Open problem 1. Given a convex body K ⊂ Rn and a distance function dx(y) = ‖x − y‖Ex that is convex in x ∈ K, show that all hyperplane partitions are within a constant factor of the optimum Ψ.

3 Hit-and-run

We now focus on the analysis of the hit-and-run random walk.

Theorem 3.1 ([3]). The hit-and-run random walk in a convex body K has conductance φ ≥ c/(nD); in particular it mixes rapidly starting from any point of K.

The following definition of the cross-ratio is used in the proof of the result:

dK(u, v) = (|u − v| |p − q|) / (|p − u| |v − q|) = (p : v : u : q),

where p, q are the endpoints of the chord of K through u and v, ordered so that p, u, v, q appear in this order on the chord. In what follows we list a set of results, such as the localization lemma for hit-and-run, without proofs. For more information see [3].
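One hit-and-run step, and the cross-ratio dK(u, v) along the chord through two points, can be sketched as follows. The body below is a hypothetical Euclidean ball, chosen only because its chord endpoints have a closed form; in general a chord oracle (or a membership oracle plus binary search) is what is needed.

import numpy as np

rng = np.random.default_rng(2)

def chord_in_ball(x, d, R=1.0):
    # Endpoints p, q of the chord of the ball of radius R (centered at 0) through x with unit direction d.
    b = x @ d
    s = np.sqrt(b * b - x @ x + R * R)
    return x + (-b - s) * d, x + (-b + s) * d

def hit_and_run_step(x, R=1.0):
    # Pick a uniformly random direction, then a uniform point on the chord through x.
    d = rng.normal(size=len(x))
    d /= np.linalg.norm(d)
    p, q = chord_in_ball(x, d, R)
    return p + rng.random() * (q - p)

def cross_ratio(u, v, p, q):
    # d_K(u, v) = |u - v| |p - q| / (|p - u| |v - q|), with p, u, v, q in order on the chord.
    nrm = np.linalg.norm
    return nrm(u - v) * nrm(p - q) / (nrm(p - u) * nrm(v - q))

x = np.zeros(3)
for _ in range(1000):
    x = hit_and_run_step(x)

u = np.array([0.1, 0.0, 0.0])
v = np.array([0.5, 0.0, 0.0])
p, q = chord_in_ball(u, np.array([1.0, 0.0, 0.0]))
print(x, cross_ratio(u, v, p, q))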


Theorem 3.2. It holds that

πf(S3) ≥ dK(S1, S2) · min{πf(S1), πf(S2)},

where dK(S1, S2) denotes the infimum of dK(u, v) over u ∈ S1, v ∈ S2.

Lemma 3.3. For 0 < u ≤ v ≤ w it holds that

( (e^v − e^u)(e^w − 1) ) / ( (e^u − 1)(e^w − e^v) ) ≥ ( (v − u) w ) / ( u (w − v) ).

Theorem 3.4. Let h : K → R+ be a function such that

h(x) ≤ (1/3) min{1, dK(u, v)}

for all u ∈ S1, v ∈ S2 and all x on the chord through u and v. Then, for a partition S1, S2, S3 of K,

π(S3) ≥ EK(h(x)) · π(S1) π(S2).

4 Open problems

Open problem 2. Find simpler algorithms to round convex bodies. For example, analyse the following procedure (a runnable sketch is given after the pseudocode):

walk for 2T steps in K
1. compute the covariance matrix of the trace X_1, ..., X_2T of the walk
2. if there exists an eigenvalue l > 2
   then make the body isotropic (using this covariance) and go to 1
   else return K
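A minimal executable sketch of this loop is below. Nothing here is claimed to be the intended algorithm: the sampler is a plain ball walk over a membership oracle (an illustrative stretched box), the parameters T, delta and the cap on the number of rounds are arbitrary choices, and "make isotropic" is implemented by composing the membership oracle with A^(1/2).

import numpy as np

rng = np.random.default_rng(3)

# Illustrative body: a box stretched along one axis, given only through a membership oracle.
stretch = np.array([10.0, 1.0, 1.0, 1.0])
in_body = lambda x: np.all(np.abs(x) <= stretch)

def ball_walk_trace(x0, membership, steps, delta):
    # Run a ball walk and return the visited points (the trace X_1, ..., X_steps).
    x, trace = x0.copy(), []
    for _ in range(steps):
        d = rng.normal(size=len(x))
        d /= np.linalg.norm(d)
        y = x + delta * rng.random() ** (1.0 / len(x)) * d
        if membership(y):
            x = y
        trace.append(x.copy())
    return np.array(trace)

membership, T, delta, rounds = in_body, 5000, 0.5, 0
while True:
    trace = ball_walk_trace(np.zeros(4), membership, 2 * T, delta)
    A = np.cov(trace, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(A)
    if eigvals.max() <= 2 or rounds >= 10:   # cap the rounds so the sketch always terminates
        break
    # "Make isotropic": replace K by A^(-1/2) K, i.e. test membership of A^(1/2) x.
    sqrt_A = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T
    membership = (lambda m, S: (lambda x: m(S @ x)))(membership, sqrt_A)
    rounds += 1
print(rounds, eigvals)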

Open problem 3. Analyse coordinate-directions hit-and-run. That is, instead of picking a random direction from the unit sphere, pick a random direction from the set {−ei, ei : i ∈ [n]}.

Open problem 4. Given a polytope K := {x : Ax ≤ b} ⊂ Rn, is there any deterministic approximation algorithm that computes a (1 + ε)-approximation of vol(K) in time polynomial in n and 1/ε?

Open problem 5. If the ball walk starts from the centroid EK(x), does it mix in polynomial time?

Open problem 6. Are there statistical tests that guarantee that the distribution Qt is close to Q? Is EQt(‖X‖²) monotonically increasing?

References

[1] R. Kannan, L. Lovasz, and M. Simonovits. Isoperimetric problems for convex bodies and a localization lemma. Discrete & Computational Geometry, 13:541-559, 1995.

[2] L. Lovasz and M. Simonovits. Random walks in a convex body and an improved volume algorithm. Random Structures and Algorithms, 4:359-412, 1993.

[3] L. Lovasz and S. Vempala. Hit-and-run from a corner. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing (STOC '04), pages 310-314, ACM, New York, 2004.

[4] S. Vempala. Algorithmic Convex Geometry. Lecture notes, available at http://www.cc.gatech.edu/~vempala/acg/notes.pdf, 2008.
