
Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Date post: 18-Dec-2015
Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: www.math.gatech.edu/~randall )
Transcript
Page 1: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Mixing

Dana Randall, Georgia Tech

A tutorial on Markov chains

( Slides at: www.math.gatech.edu/~randall )

Page 2: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Outline

Fundamentals for designing a Markov chain

Bounding running times (convergence rates)

Connections to statistical physics

Page 3: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Markov chains for sampling

Given: A large set (matchings, colorings, independent sets, …)

Main Q: What do typical elements look like?

Random sampling can be used to:

- Determine properties of “typical” elements
- Evaluate thermodynamic properties (such as free energy, entropy, …)
- Estimate the cardinality of the set

“Markov chain Monte Carlo”

Page 4: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Markov chains

Andrei Andreyevich Markov, 1856-1922

[Figure: playing cards (A, K, 2)]

Page 5: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Sampling using Markov chains

State space Ω

( |Ω| ~ c^n )

Page 6: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Sampling using Markov chains

State space Ω ( |Ω| ~ c^n )

Step 1. Connect the state space.

E.g., if Ω = indep. sets of a graph G, connect I and I’ iff |I ⊕ I’| = 1 (the two sets differ in exactly one vertex).

Page 7: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Basics of Markov chains

Transitions P: Random walk on H

Starting at x:
- Pick a neighbor y.
- Move to y with prob. P(x,y) = 1/∆ (∆ = max deg in H).
- With all remaining prob. stay at x.

Def’n: A MC is ergodic if it is:
• irreducible - for all x,y ∈ Ω, ∃ t: $P^t(x,y) > 0$ (connected);
• aperiodic - g.c.d. $\{t : P^t(x,y) > 0\} = 1$ (not bipartite).

($P^t$ is the “t step” transition prob.)

Page 8: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

The stationary distribution

Thm: Any finite, ergodic MC converges to a unique stationary distribution π.

Thm: If π satisfies

π(x) P(x,y) = π(y) P(y,x) for all x,y ∈ Ω

(the detailed balance condition), then π is the stationary distribution.

So, P symmetric ⇒ π is uniform.

Page 9: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Sampling from non-uniform distributions

Q: What if we want to sample from some other distribution?

E.g., for λ > 0, sample ind. set I w/ prob: $\pi(I) = \lambda^{|I|}/Z$, where $Z = \sum_J \lambda^{|J|}$.

Step 2. Carefully define the transition probabilities.

Page 10: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

The Metropolis Algorithm

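Propose a move from x to y as before, but accept with probability min(1, π(y)/π(x)) (with remaining probability stay at x). (MRRTT ’53)

If π(x) ≥ π(y):

$P(x,y) = \frac{1}{\Delta} \cdot \frac{\pi(y)}{\pi(x)}$ and $P(y,x) = \frac{1}{\Delta}$, so $\pi(x)\,P(x,y) = \pi(y)\,P(y,x)$.

For independent sets:

$I \to I \cup \{v\}$ is accepted w.p. $\min(1, \lambda)$, and $I \cup \{v\} \to I$ is accepted w.p. $\min(1, \lambda^{-1})$, since

$\frac{\pi(y)}{\pi(x)} = \frac{\lambda^{|I|+1}/Z}{\lambda^{|I|}/Z} = \lambda$.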
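The Metropolis rule can be checked exactly on a tiny instance. The sketch below is an illustration under stated assumptions, not code from the talk: the graph (a path on 4 vertices), λ = 2, the proposal "toggle a uniformly random vertex" (so 1/n plays the role of 1/∆), and all function names are hypothetical choices. It builds the full transition matrix with exact fractions and tests detailed balance and stationarity for π(I) = λ^|I|/Z.

```python
from fractions import Fraction
from itertools import combinations


def independent_sets(n, edges):
    """Return all independent sets of the graph on vertices 0..n-1 as frozensets."""
    found = []
    for size in range(n + 1):
        for cand in combinations(range(n), size):
            s = frozenset(cand)
            if not any(u in s and v in s for u, v in edges):
                found.append(s)
    return found


def metropolis_matrix(n, edges, lam):
    """Exact transition matrix: propose toggling a uniform vertex (prob. 1/n),
    accept with probability min(1, pi(y)/pi(x)), otherwise stay put."""
    states = independent_sets(n, edges)
    index = {s: i for i, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        i = index[s]
        for v in range(n):
            t = s ^ frozenset([v])      # toggle v in or out of the set
            if t in index:              # the proposal must stay independent
                ratio = lam if v not in s else 1 / lam   # pi(t) / pi(s)
                P[i][index[t]] = Fraction(1, n) * min(Fraction(1), ratio)
        P[i][i] = 1 - sum(P[i])         # remaining probability: stay at s
    return states, P


def hardcore_distribution(states, lam):
    """pi(I) = lam^|I| / Z."""
    Z = sum(lam ** len(s) for s in states)
    return [lam ** len(s) / Z for s in states]


# Hypothetical example: a path on 4 vertices with lambda = 2.
edges = [(0, 1), (1, 2), (2, 3)]
lam = Fraction(2)
states, P = metropolis_matrix(4, edges, lam)
pi = hardcore_distribution(states, lam)
N = len(states)
detailed_balance = all(pi[i] * P[i][j] == pi[j] * P[j][i]
                       for i in range(N) for j in range(N))
stationary = all(sum(pi[i] * P[i][j] for i in range(N)) == pi[j]
                 for j in range(N))
print(detailed_balance, stationary)
```

Exact fractions are used so that both checks are equalities rather than floating-point comparisons.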

Page 11: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Basics continued…

Step 1. Connect the state space.

Step 2. Carefully define the transition probabilities.

Starting at any state x0, take a random walk for some number of steps . . . and output the final state (from π?).

Q: But for how long do we walk?

Step 3. Bound the mixing time.

This tells us the number of steps to take.

Page 12: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

The mixing rate

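Def’n: The total variation distance is

$\|P^t, \pi\| = \max_{x \in \Omega} \frac{1}{2} \sum_{y \in \Omega} |P^t(x,y) - \pi(y)|$.

Def’n: Given ε, the mixing time is

$\tau(\varepsilon) = \min\{t : \|P^{t'}, \pi\| < \varepsilon \ \ \forall\, t' \ge t\}$.

A Markov chain is rapidly mixing if $\tau(\varepsilon)$ is poly(n, log(ε^{-1})).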
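For a chain small enough to write down as a matrix, both definitions can be evaluated directly. The following is a minimal sketch, assuming a dense list-of-lists matrix and using the hypercube walk (pick a coordinate i and a bit b, set coordinate i to b) as the test chain; the helper names `mat_mul`, `tv_distance`, `mixing_time`, and `cube_chain` are illustrative, not from the slides. It relies on the standard fact that the worst-case distance to stationarity never increases with t, so the first t below ε is the mixing time.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def tv_distance(Pt, pi):
    """max over start states x of (1/2) * sum_y |P^t(x,y) - pi(y)|."""
    return max(0.5 * sum(abs(row[y] - pi[y]) for y in range(len(pi)))
               for row in Pt)


def mixing_time(P, pi, eps, t_max=10_000):
    """Smallest t with worst-case TV distance below eps.

    The worst-case distance is non-increasing in t, so the first such t
    also works for every later t'.
    """
    n = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    for t in range(t_max + 1):
        if tv_distance(Pt, pi) < eps:
            return t
        Pt = mat_mul(Pt, P)
    raise ValueError("did not mix within t_max steps")


def cube_chain(n):
    """Walk on {0,1}^n: pick i in [n] and b in {0,1}, set coordinate i to b.

    States are integers 0..2^n - 1; the chain stays put w.p. 1/2 and flips
    each coordinate w.p. 1/(2n).
    """
    size = 2 ** n
    P = [[0.0] * size for _ in range(size)]
    for x in range(size):
        P[x][x] = 0.5
        for i in range(n):
            P[x][x ^ (1 << i)] = 1.0 / (2 * n)
    return P


if __name__ == "__main__":
    n = 3
    P = cube_chain(n)
    pi = [1.0 / 2 ** n] * 2 ** n
    print("mixing time for eps = 1/4:", mixing_time(P, pi, 0.25))
```

This brute-force approach costs time cubic in |Ω| per step, which is exactly why the analytic bounds in the rest of the talk are needed for state spaces of size c^n.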

Page 13: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Spectral gap

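Let $\lambda_1 > \lambda_2 \ge \dots \ge \lambda_{|\Omega|}$ be the eigenvalues of P.

Def’n: Gap(P) = $1 - |\lambda_2|$ is the spectral gap.

Mixing rate ⇔ Spectral gap

Thm: (Alon, Alon-Milman, Sinclair)

$\tau(\varepsilon) \le \frac{1}{\mathrm{Gap}(P)} \log\left(\frac{1}{\pi^* \varepsilon}\right)$,

$\tau(\varepsilon) \ge \frac{|\lambda_2|}{2\,\mathrm{Gap}(P)} \log\left(\frac{1}{2\varepsilon}\right)$,

where $\pi^* = \min_x \pi(x)$.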

Page 14: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Outline

Fundamentals for designing a Markov chain

Bounding running times (convergence rates)

Connections to statistical physics

Page 15: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Outline for rest of talk

Techniques:

•Coupling

•Flows and paths

•Indirect methods

Problems:

•Walk on the hypercube

•Colorings

•Matchings

•Independent sets

•Connections with statistical physics: - problems - algorithms - physical insights

Page 16: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Coupling

Page 17: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Coupling

Simulate 2 processes:

- Start at any x0 and y0.
- Couple moves, but each simulates the MC.
- Once they agree, they move in sync ($x_t = y_t \Rightarrow x_{t+1} = y_{t+1}$).

Page 18: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Coupling

Def’n: A coupling is a MC on Ω x Ω:

1) Each process {Xt}, {Yt} is a faithful copy of the original MC,

2) If Xt = Yt, then Xt+1 = Yt+1.

The coupling time T is:

$T = \max_{x,y} \mathrm{E}[T^{x,y}]$, where $T^{x,y} = \min\{t : X_t = Y_t \mid X_0 = x,\ Y_0 = y\}$.

Thm: $\tau(\varepsilon) \le T\, e \ln \varepsilon^{-1}$. (Aldous ’81)

Page 19: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Ex1: Walk on the hypercube

MC_CUBE:
• Start at v0 = (0,0,…,0).
• Repeat:
 - Pick i ∈ [n], b ∈ {0,1}.
 - Set v_i = b.

Symmetric, ergodic ⇒ π is uniform.

Mixing time? Use coupling:

x0 = 0 1 1 0 0 1    y0 = 1 1 1 0 0 0

i=2, b=0: x1 = 0 0 1 0 0 1    y1 = 1 0 1 0 0 0

i=6, b=1: x2 = 0 0 1 0 0 1    y2 = 1 0 1 0 0 1

. . .

i=1, b=1: xt = 1 0 1 1 1 0    yt = 1 0 1 1 1 0

so T = n log n (coupon collecting) ⇒ $\tau(\varepsilon) = O(n \ln(n \varepsilon^{-1}))$.
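The coupling in this example is easy to simulate: both copies use the same (i, b) at every step, so a coordinate agrees forever once it has been picked. A minimal sketch follows; the function names, the all-zeros versus all-ones start, and the seeded generator are assumptions made for illustration.

```python
import random


def coupled_run(x, y, rng):
    """Run two copies of the hypercube walk with shared randomness.

    Both copies apply the same (i, b) each step. Returns the first time
    at which the two states agree.
    """
    x, y = list(x), list(y)
    n = len(x)
    t = 0
    while x != y:
        i = rng.randrange(n)   # pick a coordinate
        b = rng.randrange(2)   # pick a bit
        x[i] = b
        y[i] = b
        t += 1
    return t


def average_coupling_time(n, trials, seed=0):
    """Empirical mean coupling time from the worst case: all n coordinates differ."""
    rng = random.Random(seed)
    x0, y0 = [0] * n, [1] * n
    return sum(coupled_run(x0, y0, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, average_coupling_time(n, trials=200))
```

Starting from states that differ everywhere, the chains meet exactly when every coordinate has been picked at least once, which is the coupon collector process the slide refers to.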

Page 20: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Outline

Techniques:

•Coupling - path coupling

•Flows and paths

•Indirect methods

Problems:

•Walk on the hypercube

•Colorings

•Matchings

•Independent sets

•Connections with statistical physics: - problems - algorithms - physical insights

Page 21: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Ex 2: Colorings

Given: A graph G (max deg d), k > 1.

Goal: Find a random k-coloring of G.

MC_COL: (Single point replacement)

• Starting at some k-coloring C0
• Repeat:
 - With prob 1/2 do nothing (the “lazy” chain).
 - Pick v ∈ V, c ∈ [k];
 - Recolor v with c, if possible.

Note: k ≥ d + 1 ⇒ colorings exist. (Greedy)

If k ≥ d + 2, then the state space is connected. (Therefore π is uniform.)
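One step of the single-site coloring chain can be sketched as follows. The adjacency-list representation, the function names, and the 5-cycle example are assumptions made for illustration; the greedy routine only produces a starting coloring and needs k ≥ d + 1, as noted above.

```python
import random


def greedy_coloring(adj, k):
    """Proper k-coloring built greedily; succeeds whenever k >= max degree + 1."""
    coloring = [None] * len(adj)
    for v in range(len(adj)):
        used = {coloring[u] for u in adj[v]}
        coloring[v] = next(c for c in range(k) if c not in used)
    return coloring


def is_proper(adj, coloring):
    """True if no edge has both endpoints the same color."""
    return all(coloring[v] != coloring[u]
               for v in range(len(adj)) for u in adj[v])


def mc_col_step(coloring, adj, k, rng):
    """One step of the lazy chain; updates `coloring` in place."""
    if rng.random() < 0.5:
        return                          # lazy: do nothing with prob. 1/2
    v = rng.randrange(len(adj))         # pick a vertex
    c = rng.randrange(k)                # pick a color
    if all(coloring[u] != c for u in adj[v]):
        coloring[v] = c                 # recolor only if the result is proper


if __name__ == "__main__":
    # Hypothetical example: a 5-cycle (d = 2) with k = 4 >= d + 2 colors.
    adj = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
    rng = random.Random(0)
    coloring = greedy_coloring(adj, 4)
    for _ in range(1000):
        mc_col_step(coloring, adj, 4, rng)
    print(coloring, is_proper(adj, coloring))
```

How many steps to run before outputting the coloring is the mixing-time question the following slides address.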

Page 22: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Path Coupling

Coupling: Show for all x,y ∈ Ω, E[ ∆(dist(x,y)) ] ≤ 0.

Path coupling: Show for all u,v s.t. dist(u,v) = 1, that E[ ∆(dist(u,v)) ] ≤ 0. [Bubley, Dyer, Greenhill ’97-8]

Consider a shortest path: x = z0, z1, z2, . . . , zr = y, with dist(zi, zi+1) = 1 and dist(x,y) = r. Then

$\mathrm{E}[\Delta(\mathrm{dist}(x,y))] \le \sum_i \mathrm{E}[\Delta(\mathrm{dist}(z_i, z_{i+1}))] \le 0$.

Page 23: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

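Path coupling for MC_COL

Thm: MC_COL is rapidly mixing if k ≥ 3d. (Jerrum ’95)

Pf: Use path coupling: dist(x,y) = 1, i.e., x and y differ only in the color of one vertex w.

Cases (the move picks vertex v and color c):
- v = w, c not among the colors of N(w): ∆dist = -1;
- v ∈ N(w), c ∈ {x(w), y(w)}: ∆dist = +1 (or 0);
- o.w.: ∆dist = 0.

$\mathrm{E}[\Delta \mathrm{dist}] \le \frac{1}{2nk}\big((k-d)(-1) + 2d(+1)\big) = \frac{1}{2nk}(3d-k) \le 0$.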

Page 24: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Summary: Coupling

Pros: Can yield very easy proofs

Cons: Demands a lot from the chain

Extensions:

Careful coupling (k ≥ 2d) (Jerrum ’95)

Change the MC (Luby-R-Sinclair ’95)

“Macromoves” - burn in (Dyer-Frieze ’01, Molloy ’02) - non-Markovian couplings (Hayes-Vigoda ’03)

Page 25: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Outline

Techniques:

•Coupling

•Flows and paths

•Indirect methods

Problems:

•Walk on the hypercube

•Colorings

•Matchings

•Independent sets

•Connections with statistical physics: - problems - algorithms - physical insights

Page 26: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Conductance and flows

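(Jerrum-Sinclair ’88)

[Figure: state space Ω split into S and its complement S^C]

$\Phi = \min_{S \subset \Omega,\ \pi(S) \le 1/2} \Phi(S)$, where $\Phi(S) = \frac{\sum_{s \in S,\, s' \in S^C} \pi(s) P(s,s')}{\sum_{s \in S} \pi(s)}$.

Thm: $\frac{\Phi^2}{2} \le \mathrm{Gap}(P) \le 2\Phi$.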
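For very small chains the conductance can be found by enumerating every cut. The sketch below is a brute-force illustration only (exponential in |Ω|); the function names and the choice of the 3-dimensional hypercube walk as the example are assumptions, and exact fractions are used so the minimum is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations


def cube_chain(n):
    """Hypercube walk: stay w.p. 1/2, flip each coordinate w.p. 1/(2n)."""
    size = 2 ** n
    P = [[Fraction(0)] * size for _ in range(size)]
    for x in range(size):
        P[x][x] = Fraction(1, 2)
        for i in range(n):
            P[x][x ^ (1 << i)] = Fraction(1, 2 * n)
    return P


def conductance(P, pi):
    """Minimum of Phi(S) over all nonempty S with pi(S) <= 1/2, by enumeration."""
    n = len(P)
    best = None
    for size in range(1, n):
        for S in combinations(range(n), size):
            mass = sum(pi[s] for s in S)
            if mass > Fraction(1, 2):
                continue
            inside = set(S)
            # Flow leaving S: sum of pi(s) P(s, s') over s in S, s' outside S.
            flow = sum(pi[s] * P[s][t]
                       for s in S for t in range(n) if t not in inside)
            phi = flow / mass
            if best is None or phi < best:
                best = phi
    return best


if __name__ == "__main__":
    n = 3
    P = cube_chain(n)
    pi = [Fraction(1, 2 ** n)] * 2 ** n
    print("conductance:", conductance(P, pi))
```

On the cube the minimizing cut is a half-cube (fix one coordinate), which is the kind of bottleneck the theorem ties to the spectral gap.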

Page 27: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

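Min cut ⇔ Max flow

(Sinclair ’92)

Make |Ω|^2 canonical paths $\gamma_{xy}$: from x ∈ Ω to y ∈ Ω, x ≠ y, carrying π(x)π(y) units of flow.

Capacity of e = (u,v): Q(e) = π(u) P(u,v) = π(v) P(v,u).

The congestion of these paths is:

$\rho(\gamma) = \max_e \frac{1}{Q(e)} \sum_{\gamma_{xy} \ni e} \pi(x)\,\pi(y)$,

$\bar\rho = \min_\gamma \ell(\gamma)\,\rho(\gamma)$ ($\ell(\gamma)$ is the max path length).

Thm: $\tau(\varepsilon) \le \bar\rho\, \log\big(\varepsilon\, \pi(x)\big)^{-1}$, where x is the starting state.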

Page 28: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Ex 3: Back to the hypercube

- Define a canonical path from s to t: fix the bits from left to right.

s = 0 1 1 0 0 1
    1 1 1 0 0 1
    1 1 1 0 0 1
    1 1 0 0 0 1
    1 1 0 0 0 1
    1 1 0 0 0 1
t = 1 1 0 0 0 0

- Bound the number of paths through (u,v) ∈ E.

u  = 1 1 1 0 0 1    v  = 1 1 0 0 0 1
u’ = 0 1 0 0 0 0    v’ = 0 1 1 0 0 0

- The complementary pair (u’,v’) determines (s,t), so $|\{\gamma_{xy} \ni e\}| = 2^{n-1}$.

$\rho(\gamma) = \max_e \frac{\sum_{\gamma_{xy} \ni e} \pi(x)\,\pi(y)}{Q(e)} = \frac{2^{n-1}\, 2^{-2n}}{2^{-n}\,(1/2n)} = n$

and $\ell = n$ ⇒ $\bar\rho$ = Õ(n^2).
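The congestion computation for the hypercube can be reproduced by brute force for small n. This is a sketch under assumptions: paths fix differing bits from left to right, edges are treated as directed, Q(e) = π(u) P(u,v) with P(u,v) = 1/(2n), and the names `canonical_path` and `congestion` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def canonical_path(s, t):
    """Directed edges used when the bits of s are fixed to match t, left to right."""
    edges = []
    cur = list(s)
    for i in range(len(s)):
        if cur[i] != t[i]:
            nxt = cur.copy()
            nxt[i] = t[i]
            edges.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return edges


def congestion(n):
    """max over edges e of (1/Q(e)) * sum of pi(x) pi(y) over paths through e."""
    states = list(product((0, 1), repeat=n))
    pi = Fraction(1, 2 ** n)            # uniform stationary distribution
    Q = pi * Fraction(1, 2 * n)         # capacity pi(u) P(u,v) of every edge
    load = defaultdict(Fraction)        # flow routed through each directed edge
    for s in states:
        for t in states:
            if s != t:
                for e in canonical_path(s, t):
                    load[e] += pi * pi
    return max(load.values()) / Q


if __name__ == "__main__":
    # The example from the slide: three bits differ, so the path has three edges.
    print(canonical_path((0, 1, 1, 0, 0, 1), (1, 1, 0, 0, 0, 0)))
    for n in (2, 3, 4):
        print(n, congestion(n))
```

Every edge carries the same number of paths here, which is what makes the hypercube a clean first example for the method.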

Page 29: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Outline

Techniques:

•Coupling

•Flows and paths

•Indirect methods

Problems:

•Walk on the hypercube

•Colorings

•Matchings

•Independent sets

•Connections with statistical physics: - problems - algorithms - physical insights

Page 30: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Ex 4: Sampling matchings

Page 31: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Ex 4: Sampling matchings

MC_MATCH:

Starting at M0, repeat: Pick e = (u,v) ∈ E
- If e ∈ M, remove e;
- If u and v unmatched in M, add e;
- If u matched (by e’) and v unmatched (or vice versa), add e and remove e’;
- Otherwise do nothing.

[Figure: the three moves: removing e, adding e, and swapping e’ for e]

Thm: Coupling won’t work! (Kumar-Ramesh’99)
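The three move types of the matching chain translate directly into code. A minimal sketch, assuming edges are stored as tuples, the matching as a set of those tuples, and a 6-cycle as a hypothetical test graph; the function names are illustrative.

```python
import random


def is_matching(matching):
    """True if no vertex is covered by more than one edge."""
    seen = set()
    for u, v in matching:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def mc_match_step(matching, edges, rng):
    """One step of the chain; updates the set `matching` in place."""
    e = rng.choice(edges)
    u, v = e
    if e in matching:
        matching.remove(e)                       # remove move
        return
    at_u = [f for f in matching if u in f]       # edge covering u, if any
    at_v = [f for f in matching if v in f]       # edge covering v, if any
    if not at_u and not at_v:
        matching.add(e)                          # add move
    elif bool(at_u) != bool(at_v):
        matching.remove((at_u or at_v)[0])       # swap move: drop e' ...
        matching.add(e)                          # ... and add e
    # otherwise both endpoints are matched by other edges: do nothing


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
    rng = random.Random(0)
    M = set()
    for _ in range(1000):
        mc_match_step(M, edges, rng)
    print(sorted(M), is_matching(M))
```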

Page 32: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

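Mixing time of MC_MATCH

[Figure: matchings s and t, a transition (u,v) on the canonical path from s to t, and the complementary matching u’]

Paths using (u,v) are determined by u’ . . . as before.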

Page 33: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Techniques:

•Coupling

•Flows and paths

•Indirect methods

Problems:

•Walk on the hypercube

•Colorings

•Matchings

•Independent sets

•Connections with statistical physics: - problems - algorithms - physical insights

Outline

Page 34: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Ex 5: Independent Sets

Goal: Given λ, sample ind. set I with prob: $\pi(I) = \lambda^{|I|}/Z$, $Z = \sum_J \lambda^{|J|}$.

MC_IND: Starting at I0, Repeat:
- Pick v ∈ V and b ∈ {0,1};
- If v ∈ I, b=0, remove v w.p. min(1, λ^{-1});
- If v ∉ I, b=1, add v w.p. min(1, λ) if possible;
- O.w. do nothing.
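A direct simulation of this chain might look like the sketch below. The grid helper, the value λ = 2, the number of steps, and the function names are assumptions chosen for illustration; the step function follows the four bullet points above.

```python
import random


def grid_adj(n):
    """Adjacency lists of the n x n grid; vertex (r, c) has index r * n + c."""
    adj = [[] for _ in range(n * n)]
    for r in range(n):
        for c in range(n):
            v = r * n + c
            if c + 1 < n:
                adj[v].append(v + 1)
                adj[v + 1].append(v)
            if r + 1 < n:
                adj[v].append(v + n)
                adj[v + n].append(v)
    return adj


def is_independent(ind_set, adj):
    """True if no two vertices of ind_set are adjacent."""
    return all(u not in ind_set for v in ind_set for u in adj[v])


def mc_ind_step(ind_set, adj, lam, rng):
    """One step of the chain; updates the set `ind_set` in place."""
    v = rng.randrange(len(adj))
    b = rng.randrange(2)
    if v in ind_set and b == 0:
        if rng.random() < min(1.0, 1.0 / lam):
            ind_set.remove(v)
    elif v not in ind_set and b == 1:
        # "if possible": every neighbor of v must be outside the set
        if all(u not in ind_set for u in adj[v]) and rng.random() < min(1.0, lam):
            ind_set.add(v)
    # otherwise do nothing


if __name__ == "__main__":
    adj = grid_adj(4)
    rng = random.Random(0)
    I = set()
    for _ in range(5000):
        mc_ind_step(I, adj, 2.0, rng)
    print(sorted(I), is_independent(I, adj))
```

Running the chain is the easy part; whether 5000 steps is anywhere near enough depends on λ, which is the subject of the slow-mixing slides that follow.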

Page 35: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

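Slow mixing of MC_IND (large λ)

[Figure: the n × n grid; independent sets arranged by the ratio #R/#B of points on the two sublattices (even and odd), on an axis from 0 through 1 to ∞; the cut (S, S^C) separates the two ends]

Large λ ⇒ there is a “bad cut,” . . . so MC_IND is slowly mixing.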

Page 36: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Summary: Flows

Pros: Offers a combinatorial approach to mixing; especially useful for proving slow mixing.

Cons: Requires global knowledge of the chain to spread out paths.

Extensions: Balanced flows (Morris-Sinclair ’99)

MCMC - Major highlights:
- The permanent (Jerrum-Sinclair-Vigoda ’02)
- Volume of a convex polytope (Dyer-Frieze-Kannan ’89, +… )

Page 37: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Techniques:

•Coupling

•Flows and paths

•Indirect methods - Comparison - Decomposition

Problems:

•Walk on the hypercube

•Colorings

•Matchings

•Independent sets

•Connections with statistical physics: - problems - algorithms - physical insights

Outline

Page 38: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

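Comparison (Diaconis, Saloff-Coste ’93)

Unknown chain: P. Known chain: $\tilde P$.

For each edge (x,y) ∈ $\tilde P$, make a path $\gamma_{x,y}$ using edges in P.

Let Γ(z,w) be the set of paths $\gamma_{x,y}$ using (z,w).

Thm: $\mathrm{Gap}(P) \ge \frac{1}{A}\, \mathrm{Gap}(\tilde P)$, where

$A = \max_e \frac{1}{Q(e)} \sum_{\gamma_{x,y} \in \Gamma(e)} |\gamma_{x,y}|\, \pi(x)\, \tilde P(x,y)$.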

Page 39: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Comparison

Unknown chain: P. Known chain: $\tilde P$.

(x,y) ∈ $\tilde P$ ↦ $\gamma_{x,y}$ (using P)

Γ(z,w) is the set of paths $\gamma_{x,y}$ using (z,w).

Thm: $\mathrm{Gap}(P) \ge \frac{1}{A}\, \mathrm{Gap}(\tilde P)$.

⇒ (S, $\bar S$) cannot be a bad cut in P if it isn’t in $\tilde P$.

Page 40: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Comparison, aka . . .

The Adjacency Matrix Reloaded

Page 41: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Disjoint decomposition

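[Figure: Ω partitioned into disjoint pieces A1, …, A6; each piece Ai is collapsed to a single state ai]

Projection $\bar P$: a chain on the states a1, …, a6, with

$\bar\pi(a_i) = \pi(A_i)$,

$\bar P(a_i, a_j) = \frac{1}{\pi(A_i)} \sum_{x \in A_i,\ y \in A_j} \pi(x)\, P(x,y)$.

Restrictions Pi: the chain P restricted to Ai (e.g., P3 on A3).

(Madras-R. ’96, Martin-R. ’00)

Thm: $\mathrm{Gap}(P) \ge \frac{1}{2}\, \mathrm{Gap}(\bar P)\, \big(\min_i \mathrm{Gap}(P_i)\big)$.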
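The projection chain is a short computation once P, π, and the partition are given. The sketch below is illustrative only: it reuses the hypercube walk as P and partitions the states by their number of ones, both of which are assumptions rather than examples from the slides, and the names `projection` and `cube_chain` are hypothetical.

```python
from fractions import Fraction


def cube_chain(n):
    """Hypercube walk: stay w.p. 1/2, flip each coordinate w.p. 1/(2n)."""
    size = 2 ** n
    P = [[Fraction(0)] * size for _ in range(size)]
    for x in range(size):
        P[x][x] = Fraction(1, 2)
        for i in range(n):
            P[x][x ^ (1 << i)] = Fraction(1, 2 * n)
    return P


def projection(P, pi, blocks):
    """Projection chain for a partition of the states into `blocks`.

    pibar(a_i) = pi(A_i)
    Pbar(a_i, a_j) = (1 / pi(A_i)) * sum over x in A_i, y in A_j of pi(x) P(x,y)
    """
    m = len(blocks)
    pibar = [sum(pi[x] for x in A) for A in blocks]
    Pbar = [[sum(pi[x] * P[x][y] for x in blocks[i] for y in blocks[j]) / pibar[i]
             for j in range(m)]
            for i in range(m)]
    return Pbar, pibar


if __name__ == "__main__":
    n = 3
    P = cube_chain(n)
    pi = [Fraction(1, 2 ** n)] * 2 ** n
    # Block k holds the states with exactly k ones.
    blocks = [[x for x in range(2 ** n) if bin(x).count("1") == k]
              for k in range(n + 1)]
    Pbar, pibar = projection(P, pi, blocks)
    for row in Pbar:
        print([str(p) for p in row])
    print([str(p) for p in pibar])
```

With this partition the projection only moves between neighboring blocks, the same one-dimensional shape that appears when independent sets are grouped by size.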

Page 42: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Ex 6: MC_IND on small ind. sets

For G = (V,E): Let Ω = ind. sets of G; Ωk = ind. sets of size k.

Thm: MC_IND is rapidly mixing on $\bigcup_{k=0}^{K} \Omega_k$, where K = |V|/2(∆+1).

* Consider first the “swap” chain:

MC_SWAP: Starting at I0, Repeat:
- Pick (u,v,b) ∈ V x V x {0,1,2};
- If b=0 and u ∈ I, remove u w.p. min(1, λ^{-1});
- If b=1 and u ∉ I, add u w.p. min(1, λ) if possible;
- If b=2, remove u and add v (if possible);
- O.w. do nothing.

Page 43: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Ind. sets w/bounded size (cont.)

Thm: MC_IND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$, where K = |V|/2(∆+1).

Restrictions: Ω0 Ω1 Ω2 . . . ΩK-1 ΩK (MC_SWAP on each Ωk: rapidly mixing?)

Projection: a0 a1 a2 . . . aK-1 aK

|Ωk| is logconcave, . . . so $\bar P$ is rapidly mixing.

Page 44: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

The Restrictions of MC_SWAP

[Figure: Ω0 Ω1 Ω2 . . . ΩK-1 ΩK, with the restriction to Ωk and the projection]

Thm: MC_SWAP is rapidly mixing on Ωk, k < K. (Bubley-Dyer ’97)

Thm: MC_SWAP is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Decomposition)

Cor: MC_IND is rapidly mixing on $\bigcup_{k=1}^{K} \Omega_k$. (Comparison)

Page 45: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Summary: Indirect methods

Pros: Offer a top down approach; allow hybrid methods to be used.

Cons: Can increase the complexity.

Extensions:
- Comparison thm for log-Sobolev (Diaconis-Saloff-Coste ’96)
- Comparison for Glauber dynamics (R.-Tetali ’98)
- Decomposition for log-Sobolev (Jerrum-Son-Tetali-Vigoda ’02)

Page 46: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Techniques:

•Coupling

•Flows and paths

•Hybrid methods

Problems:

•Walk on the hypercube

•Colorings

•Matchings

•Independent sets

•Connections with statistical physics: - problems - algorithms - physical insights

Outline

Page 47: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Why Statistical Physics?

- They have a need for sampling
- Use many interesting heuristics
- Great intuition
- Experts on “large data sets”

Microscopic details ⇒ Macroscopic behavior (i.e., phase transitions)

Page 48: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )
Page 49: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Models from statistical physics

- Potts model (3-colorings)
- Hardcore model (Independent sets)
- Dimer model (Matchings)
- Ising model (Min cut)

[Figure: example configurations of each model; the Ising configuration is a grid of + and - spins]

Page 50: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

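Models (cont.)

Independent sets: $\pi(I) = \lambda^{|I|}/Z$

Matchings: $\pi(M) = \lambda^{|M|}/Z$

Ising model: $\pi(\sigma) = \lambda^{|E^{=}|}/Z$, where $E^{=} = \{(u,v) \in E : \sigma(u) = \sigma(v)\}$ ($E = E^{=} \cup E^{\neq}$)

[Figure: grid of + and - spins]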

Page 51: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Models: (The physics perspective)

Given: A physical system Ω = {σ}

Define: A Gibbs measure as follows:

$\pi(\sigma) = e^{-\beta H(\sigma)}/Z$,

where H(σ) is the Hamiltonian, β = 1/kT is the inverse temperature, and $Z = \sum_\sigma e^{-\beta H(\sigma)}$ is the normalizing constant or partition function.

Independent sets: H(σ) = -|I|. If $\lambda = e^{\beta}$ then $\pi(\sigma) = \lambda^{|I|}/Z$.

Ising model: $H(\sigma) = -\sum_{(u,v) \in E} \sigma_u \sigma_v$. If $\lambda = e^{2\beta}$ then $\pi(\sigma) = \lambda^{|E^{=}|}/Z$.
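The Ising identity with λ = e^{2β} takes one line to verify. The derivation below is a worked check, assuming spins take values σ_u ∈ {+1, -1} (so each edge contributes +1 when its endpoints agree and -1 otherwise) and writing E^{=} and E^{≠} for the sets of agreeing and disagreeing edges:

```latex
\begin{align*}
-\beta H(\sigma)
  &= \beta \sum_{(u,v) \in E} \sigma_u \sigma_v
   = \beta \left( |E^{=}| - |E^{\neq}| \right)
   = 2\beta\, |E^{=}| - \beta\, |E|, \\
e^{-\beta H(\sigma)}
  &= e^{-\beta |E|} \left( e^{2\beta} \right)^{|E^{=}|}
   = e^{-\beta |E|}\, \lambda^{|E^{=}|}.
\end{align*}
```

The factor e^{-β|E|} is the same for every configuration, so it cancels against the normalizing constant and the Gibbs measure is proportional to λ raised to the number of agreeing edges.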

Page 52: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Physics perspective (cont.)

Q: What about on the infinite lattice?

Use conditional probabilities.

But there can be boundary effects!

Page 53: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Phase transitions: Ind. sets

High temperature: boundary (∂) effects die out

Low temperature: long range effects

[Figure: temperature axis from T∞ down to T0, with the critical temperature Tc between the two regions]

Tc indicates a “phase transition.”

Page 54: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

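Slow mixing of MC_IND revisited

[Figure: the n × n grid; axis #R/#B; the cut (S, S^C) and the sets Si]

$\pi(S_i) = \sum_{s \in S_i} \pi(s)$, with $\pi(s) = e^{-\beta H(s)}/Z$

“Entropy term”: the number of configurations $|S_i|$. “Energy term”: the weight $e^{-\beta H(s)}/Z$ of each configuration.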

Page 55: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Group by # of “fault lines”

Fault lines are vacant paths of width 2 from top to bottom (or left to right).

[Figure: the state space between S and S^C, grouped into SR, S1, S2, S3, . . . , SB]

Page 56: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

“Peierls Argument”

1. Identify horizontal or vertical fault line. (σ ∈ S1)

2. Shift right of fault by 1 and flip colors.

3. Remove rt column; add points along fault line, if possible. (σ’ ∈ SB)

For fixed path length ℓ, this maps S1 → SB x $2^{n/2}$ x $3^{\ell}$.

Page 57: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Peierls Argument cont.

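The map S1 → SB is at most $2^{n/2}\, 3^{\ell}$ to one, and the image has ≥ ℓ - n/2 more points. So

$\pi(S_1) = \sum_{\sigma \in S_1} \pi(\sigma)$

$\le \sum_{\ell} \sum_{\sigma' \in S_B} \pi(\sigma')\, 2^{n/2}\, 3^{\ell}\, \lambda^{(n/2 - \ell)}$

$\le \pi(S_B)\, 2^{n/2}\, 3^{n}\, \lambda^{-n/2}\, \mathrm{poly}(n)$

$\le \pi(S_B) \left( \frac{18}{\lambda} \right)^{n/2} \mathrm{poly}(n)$, if λ > 18

(and similarly for S2, S3, …)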

Page 58: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Conclusions

Techniques:

• Coupling: can be easy when it works

• Flows: requires global knowledge of chain; very useful for slow mixing

• Indirect methods: top down approach; often increases complexity

• Connection to physics: can offer tremendous insights

Open problems: . . .

Page 59: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )

Conclusions

Open problems:

- Sampling 4,5,6-colorings on the grid.
- Sampling perfect matchings on non-bipartite graphs.
- Sampling acyclic orientations in a graph.
- Sampling configurations of the Potts model (a generalization of Ising, but with more colors).
- How can we further exploit phase transitions? Other physical intuition?
- ...

Page 60: Mixing Dana Randall Georgia Tech A tutorial on Markov chains ( Slides at: randall )
