A Family of MCMC Methods on Implicitly Defined Manifolds

Post on 07-Oct-2020


Marcus A. Brubaker+, Mathieu Salzmann and Raquel Urtasun

Toyota Technological Institute at Chicago
+ University of Toronto, Canada

Introduction:
• Traditional MCMC methods (e.g., Gauss-Metropolis, HMC) assume the target distribution is over a Euclidean space
• However, many problems are most naturally characterized over a non-linear manifold
• Sampling from posteriors that arise in such problems has typically required the derivation of posterior-specific sampling schemes

Contributions:
• Here we derive an MCMC scheme based on Hamiltonian dynamics on an implicitly defined manifold $\mathcal{M} = \{q \in \mathbb{R}^n \mid c(q) = 0\}$
• We prove that, subject to suitable conditions, the Markov chain converges to the target posterior $\pi(q)$
• We present constrained variants of several MCMC methods including: Gauss-Metropolis, Hamiltonian (and Langevin) Monte Carlo and Riemann Manifold HMC [6]
• These algorithms are demonstrated on a range of problems including:
  o Sampling from a linearly constrained Gaussian distribution
  o Sampling from the Bingham-von Mises-Fisher distribution over $S^n$
  o Bayesian matrix factorization for collaborative filtering
  o Human pose estimation
• Matlab code available from: http://www.cs.toronto.edu/~mbrubake/

Previous Work:
• Similar methods are commonly used in molecular dynamics to compute the free energy of a constrained system (e.g., [1-3])
• Gibbs samplers have been derived for some distributions (e.g., [4]) but even those specialized methods are outperformed by the methods presented here

Experimental Results:
• Gaussian distribution in a linear subspace
• Bingham-von Mises-Fisher
• Collaborative filtering
• Human pose estimation
  o Pose is a set of 3D joint positions
  o Manifold is induced by the limb length constraints of the skeleton
  o Posterior combines noisy 2D joint projections with a PCA based prior model of pose
  o Compared with direct optimization for different levels of noise
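As an illustration of how limb length constraints induce a manifold, the following is a minimal sketch that writes them as a function c(q) with Jacobian C(q). The three-joint chain, the bone list and the lengths are hypothetical values chosen for the example and are not taken from the poster.

```python
import numpy as np

# Hypothetical 3-joint chain (e.g. shoulder, elbow, wrist). The bone list and
# lengths below are illustrative assumptions, not values from the poster.
BONES = [(0, 1), (1, 2)]          # pairs of joint indices
LENGTHS = np.array([0.30, 0.25])  # limb lengths in metres

def c(q):
    """Constraint c(q): squared bone length minus squared target length."""
    x = q.reshape(-1, 3)
    d = np.array([x[i] - x[j] for i, j in BONES])
    return np.sum(d * d, axis=1) - LENGTHS ** 2

def C(q):
    """Jacobian C(q) = dc/dq, one row per bone."""
    x = q.reshape(-1, 3)
    J = np.zeros((len(BONES), q.size))
    for k, (i, j) in enumerate(BONES):
        diff = 2.0 * (x[i] - x[j])
        J[k, 3 * i:3 * i + 3] = diff
        J[k, 3 * j:3 * j + 3] = -diff
    return J

# A pose (stacked 3D joint positions) that satisfies both constraints.
q = np.array([0.0, 0.0, 0.0, 0.30, 0.0, 0.0, 0.30, 0.25, 0.0])
print(c(q))                         # close to zero on the manifold
print(np.linalg.matrix_rank(C(q)))  # full rank means one independent row per bone
```

Squared lengths are used here so that c(q) stays smooth even when two joints coincide; other parameterizations of the same constraint are possible.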

References:
1. G. Ciccotti and J. P. Ryckaert. Molecular dynamics simulation of rigid molecules. Computer Physics Reports, 4(6):346–392, 1986
2. C. Hartmann. An ergodic sampling scheme for constrained Hamiltonian systems with applications to molecular dynamics. Journal of Statistical Physics, 130:687–711, 2008
3. T. Lelièvre, M. Rousset, and G. Stoltz. Free energy computations: A Mathematical Perspective. Imperial College Press, 2010
4. P. D. Hoff. Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18:438–456, 2009
5. E. Hairer, C. Lubich, and G. Wanner. Geometric Numerical Integration. Springer, 2nd edition, 2006
6. M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B, 73:123–214, 2011

[Figure: comparison of CHMC (L = 4), CHMC (L = 3), CHMC (L = 2), CLangevin, CMetropolis and Gibbs; axis labels not recoverable]

[Figure: mean joint error [mm] against frame # for Constr opt, Ours MAP and Ours mean]

[Figure: mean joint error [mm] against noise std for Constr opt, Ours MAP and Ours mean]

Theoretical Result:
• Assume that $\mathcal{M} = \{q \in \mathbb{R}^n \mid c(q) = 0\}$ is connected, smooth and differentiable with $C(q) = \frac{\partial c}{\partial q}$ full-rank everywhere and the target posterior $\pi(q)$ is strictly positive on $\mathcal{M}$
• Given:
  o a mass matrix $M(q)$ which is positive definite on $\mathcal{M}$
  o a simulation potential energy function $U(q)$ which is $C^2$ continuous
  o a numerical integration method $\Phi^{\hat{H}}_h : T^*\mathcal{M} \to T^*\mathcal{M}$ which is symmetric, locally accessible, consistent with the Simulation Hamiltonian $\hat{H}$, and symplectic on the co-tangent bundle $T^*\mathcal{M} = \{(p, q) \mid c(q) = 0 \text{ and } C(q) \frac{\partial \hat{H}}{\partial p}(p, q) = 0\}$
• Theorem: For all $q_0 \in \mathcal{M}$,
  $\lim_{n \to \infty} \| T^n(q_0 \to \cdot) - \pi(\cdot) \| = 0$
  where $T^n(q_0 \to \cdot)$ denotes $n$ steps of the Markov transition kernel of the Constrained Hamiltonian Monte Carlo algorithm
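The assumptions above can be made concrete on a simple manifold. The sketch below uses the unit sphere, c(q) = q·q − 1, as an assumed example and checks membership in the co-tangent bundle using ∂H/∂p = M(q)^{-1} p, which holds for a quadratic kinetic energy. The function name `on_cotangent_bundle` and the tolerance are illustrative choices.

```python
import numpy as np

def c(q):
    """Sphere constraint c(q) = q.q - 1 (an assumed example manifold)."""
    return np.array([q @ q - 1.0])

def C(q):
    """Jacobian C(q) = dc/dq; it has rank 1 wherever q is non-zero."""
    return 2.0 * q[None, :]

def on_cotangent_bundle(p, q, M_inv, tol=1e-10):
    """Check c(q) = 0 and C(q) dH/dp = 0, using dH/dp = M(q)^{-1} p."""
    on_manifold = np.all(np.abs(c(q)) < tol)
    tangent = np.all(np.abs(C(q) @ M_inv @ p) < tol)
    return bool(on_manifold and tangent)

q = np.array([0.0, 0.6, 0.8])            # a point on the unit sphere
M_inv = np.eye(3)                        # identity mass matrix
p_tangent = np.array([1.0, 0.8, -0.6])   # orthogonal to q
p_normal = q.copy()                      # points along the sphere's normal

print(on_cotangent_bundle(p_tangent, q, M_inv))
print(on_cotangent_bundle(p_normal, q, M_inv))
```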

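Simulation of constrained Hamiltonian systems
• Need a symplectic, consistent and symmetric integration method on $\mathcal{M}$
• Generalized RATTLE Algorithm (see [5] for details and other options) for the Simulation Hamiltonian $\hat{H}$, with Lagrange multipliers $\lambda$ and $\mu$ chosen so that the two constraint equations hold:

$p_{1/2} = p_0 - \frac{h}{2} \left[ \frac{\partial \hat{H}(p_{1/2}, q_0)}{\partial q} + C(q_0)^T \lambda \right]$
$q_1 = q_0 + \frac{h}{2} \left[ \frac{\partial \hat{H}(p_{1/2}, q_0)}{\partial p} + \frac{\partial \hat{H}(p_{1/2}, q_1)}{\partial p} \right]$
$0 = c(q_1)$
$p_1 = p_{1/2} - \frac{h}{2} \left[ \frac{\partial \hat{H}(p_{1/2}, q_1)}{\partial q} + C(q_1)^T \mu \right]$
$0 = C(q_1) \frac{\partial \hat{H}(p_1, q_1)}{\partial p}$

• If $\mathcal{M} = \mathbb{R}^n$ and the mass matrix is constant, RATTLE reduces to Leapfrog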
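The RATTLE equations can be sketched in code for the special case of an identity mass matrix, where the Hamiltonian is 0.5 p·p + U(q) and the position update becomes explicit in p_{1/2}. This is a minimal sketch under that assumption, not the authors' Matlab implementation. The name `rattle_step`, the Newton solve for λ and the sphere example are all illustrative.

```python
import numpy as np

def rattle_step(p0, q0, h, grad_U, c, C, iters=50, tol=1e-12):
    """One RATTLE step for H(p, q) = 0.5 * p.p + U(q) (identity mass matrix)."""
    C0 = C(q0)
    lam = np.zeros(C0.shape[0])
    for _ in range(iters):
        # Half-step momentum and full-step position for the current lambda.
        p_half = p0 - 0.5 * h * (grad_U(q0) + C0.T @ lam)
        q1 = q0 + h * p_half
        r = c(q1)
        if np.max(np.abs(r)) < tol:
            break
        # Newton update: d c(q1) / d lambda = -(h^2 / 2) C(q1) C(q0)^T.
        lam = lam + np.linalg.solve(0.5 * h * h * C(q1) @ C0.T, r)
    else:
        raise RuntimeError("Newton iteration for lambda did not converge")
    # mu follows from the linear condition 0 = C(q1) p1.
    C1 = C(q1)
    rhs = C1 @ (p_half - 0.5 * h * grad_U(q1))
    mu = np.linalg.solve(0.5 * h * C1 @ C1.T, rhs)
    p1 = p_half - 0.5 * h * (grad_U(q1) + C1.T @ mu)
    return p1, q1

# Free motion on the unit sphere (U = 0), starting at the north pole.
c = lambda q: np.array([q @ q - 1.0])
C = lambda q: 2.0 * q[None, :]
grad_U = lambda q: np.zeros_like(q)
q0 = np.array([0.0, 0.0, 1.0])
p0 = np.array([1.0, 0.0, 0.0])
p1, q1 = rattle_step(p0, q0, 0.1, grad_U, c, C)
print(c(q1), C(q1) @ p1)  # both should be close to zero
```

Flipping the sign of the momentum and taking another step should return to the starting state, which is one way to check the symmetry the method requires.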

Instances of Constrained HMC:
• Gauss-Metropolis with covariance $\Sigma$ can be expressed as HMC with $U(q) = 0$ and $M(q) = \Sigma^{-1}$. Constrained Gauss-Metropolis is thus similarly defined.
• Constrained Langevin Monte Carlo arises with $L = 1$
• Constrained Riemann Manifold HMC [6] arises for suitable choices of $M(q)$
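The first instance can be checked with a short worked derivation. It assumes the unconstrained case, a single leapfrog step of size $h$, and a constant mass matrix, so that the log-determinant term of the Acceptance Hamiltonian $H$ is constant and cancels.

```latex
\begin{align*}
p_0 &\sim \mathcal{N}(0, \Sigma^{-1}), \\
q_1 &= q_0 + h\, M^{-1} p_0 = q_0 + h\, \Sigma\, p_0, \\
\operatorname{Cov}(q_1 \mid q_0) &= h^2\, \Sigma\, \Sigma^{-1}\, \Sigma = h^2\, \Sigma, \\
p_1 &= p_0 \quad \text{(no force, since } U(q) = 0\text{)}, \\
H(p_0, q_0) - H(p_1, q_1) &= \log \pi(q_1) - \log \pi(q_0).
\end{align*}
```

Under these assumptions the proposal is Gaussian, $q_1 \sim \mathcal{N}(q_0, h^2 \Sigma)$, and the acceptance probability reduces to $\min\{1, \pi(q_1) / \pi(q_0)\}$, which is the Metropolis rule.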

[Figure: three panels titled CHMC, CLangevin and CMetropolis; axis labels not recoverable]

$\mathcal{M} = S^n$, $\pi(q) \propto \exp(d^T q + q^T A q)$

Method          E[-log π(q)]   ESS %   ESS/second
CHMC (L = 4)    -999.021       27.3    183.756
CHMC (L = 3)    -998.759       25.4    217.427
CHMC (L = 2)    -999.121       37.9    440.898
CLangevin       -998.757       33.0    619.339
CMetropolis     -998.82         3.8     90.1513
Gibbs [4]       -998.742       50.8    160.722
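For reference, the unnormalised log density above and its gradient are easy to evaluate. The following is a minimal sketch in which the function names and the small example values of d and A are illustrative assumptions. The tangent projection shows the component of the gradient that can move a point along the sphere.

```python
import numpy as np

def log_bvmf(q, d, A):
    """Unnormalised log density d^T q + q^T A q for q on the unit sphere."""
    return d @ q + q @ A @ q

def grad_log_bvmf(q, d, A):
    """Euclidean gradient of the log density with respect to q."""
    return d + (A + A.T) @ q

def tangent_project(q, g):
    """Remove the component of g along q (the sphere's normal direction)."""
    return g - (q @ g) / (q @ q) * q

# Illustrative parameters, not values from the poster.
d = np.array([1.0, 0.0, 0.0])
A = np.diag([1.0, 2.0, 3.0])
q = np.array([0.0, 0.0, 1.0])

print(log_bvmf(q, d, A))
print(tangent_project(q, grad_log_bvmf(q, d, A)))
```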

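$\mathcal{M} = V_r(\mathbb{R}^N) \times V_r(\mathbb{R}^M) \times \mathbb{R}^r$, $\pi(U, S, V) \propto \prod_{(i,j) \in E} \exp\left( -\frac{(f(U_i S V_j) - Y_{i,j})^2}{2\sigma_p^2} \right)$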

          1M Movie Lens (RMSE)                            EachMovie (RMSE)
Method    r = 5           r = 10          r = 15          r = 5           r = 10          r = 15
HMC       1.577 ± 0.39    2.001 ± 0.66    2.306 ± 0.25    1.153 ± 0.002   1.161 ± 0.002   1.204 ± 0.018
HMC-l     0.909 ± 0.008   0.949 ± 0.01    0.99 ± 0.01     1.155 ± 0.007   1.164 ± 0.001   1.184 ± 0.004
CHMC      0.893 ± 0.01    0.888 ± 0.01    0.889 ± 0.01    1.144 ± 0.002   1.121 ± 0.001   1.116 ± 0.001
CHMC-l    0.888 ± 0.01    0.881 ± 0.01    0.881 ± 0.01    1.137 ± 0.003   1.115 ± 0.002   1.11 ± 0.002
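The manifold in the collaborative filtering model can also be written as an implicit constraint. The sketch below assumes that $V_r(\mathbb{R}^N)$ denotes the Stiefel manifold of N x r matrices with orthonormal columns, which is the standard reading of that notation. The function name `stiefel_constraint` is illustrative.

```python
import numpy as np

def stiefel_constraint(U):
    """c(U): upper-triangular entries of U^T U - I for an N x r matrix U."""
    r = U.shape[1]
    G = U.T @ U - np.eye(r)
    # U^T U is symmetric, so the upper triangle holds all independent constraints.
    return G[np.triu_indices(r)]

rng = np.random.default_rng(0)
# The reduced QR factorisation gives a 6 x 3 matrix with orthonormal columns.
U, _ = np.linalg.qr(rng.standard_normal((6, 3)))

print(stiefel_constraint(U))        # close to zero on the manifold
print(stiefel_constraint(2.0 * U))  # non-zero once the columns are rescaled
```

Keeping only the upper triangle gives r(r + 1)/2 constraints, which avoids the duplicated rows that the full symmetric matrix would introduce into the Jacobian.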

Constrained Hamiltonian Monte Carlo:
• Input: $q_0$, $M(q)$, $h$, $L$, $\pi(q)$, $U(q)$
• Define:
  o Co-tangent Projection: $P(q) = I - M(q)^{-T} C(q)^T \left( C(q) M(q)^{-1} M(q)^{-T} C(q)^T \right)^{-1} C(q) M(q)^{-1}$
  o Acceptance Hamiltonian: $H(p, q) = \frac{1}{2} p^T M(q)^{-1} p + \frac{1}{2} \log |2\pi P(q)^T M(q) P(q)| - \log \pi(q)$
  o Simulation Hamiltonian: $\hat{H}(p, q) = \frac{1}{2} p^T M(q)^{-1} p + U(q)$
1. $p'_0 \sim \mathcal{N}(0, M(q_0))$, $p_0 \leftarrow P(q_0) p'_0$
2. For $i = 1, \ldots, L$, $(p_i, q_i) \leftarrow \Phi^{\hat{H}}_h(p_{i-1}, q_{i-1})$
3. With probability $\min\{1, \exp(H(p_0, q_0) - H(p_L, q_L))\}$
  o Return $q_L$
4. Else
  o Return $q_0$
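The steps above can be sketched end to end for a simple case. This is a minimal sketch, not the authors' Matlab code, and it makes several assumptions: the manifold is the unit sphere, the mass matrix is the identity, the simulation potential is chosen as U(q) = −log π(q), the log-determinant term of the Acceptance Hamiltonian is treated as constant in q, and a failed constraint solve is treated as a rejection. All function names and parameter values are illustrative.

```python
import numpy as np

def c(q):
    """Sphere constraint c(q) = q.q - 1 (illustrative manifold)."""
    return np.array([q @ q - 1.0])

def C(q):
    """Jacobian C(q) = dc/dq, shape (1, n)."""
    return 2.0 * q[None, :]

def project(q, p):
    """Co-tangent projection P(q) p, specialised to M(q) = I."""
    Cq = C(q)
    return p - Cq.T @ np.linalg.solve(Cq @ Cq.T, Cq @ p)

def rattle_step(p0, q0, h, grad_U, iters=50, tol=1e-12):
    """One RATTLE step for 0.5 * p.p + U(q); returns (p1, q1, converged)."""
    C0 = C(q0)
    lam = np.zeros(C0.shape[0])
    for _ in range(iters):  # Newton iterations on 0 = c(q1)
        p_half = p0 - 0.5 * h * (grad_U(q0) + C0.T @ lam)
        q1 = q0 + h * p_half
        r = c(q1)
        if np.max(np.abs(r)) < tol:
            break
        lam = lam + np.linalg.solve(0.5 * h * h * C(q1) @ C0.T, r)
    else:
        return p0, q0, False
    p1 = project(q1, p_half - 0.5 * h * grad_U(q1))  # enforces C(q1) p1 = 0
    return p1, q1, True

def chmc_step(q0, h, L, log_pi, grad_log_pi, rng):
    """One CHMC transition with M = I and U(q) = -log pi(q)."""
    grad_U = lambda q: -grad_log_pi(q)
    p0 = project(q0, rng.standard_normal(q0.size))
    p, q = p0, q0
    for _ in range(L):
        p, q, ok = rattle_step(p, q, h, grad_U)
        if not ok:
            return q0  # treat a failed constraint solve as a rejection
    # Acceptance Hamiltonian without the log-determinant term, which this
    # sketch assumes is constant in q for M = I on the sphere.
    H0 = 0.5 * p0 @ p0 - log_pi(q0)
    HL = 0.5 * p @ p - log_pi(q)
    return q if np.log(rng.uniform()) < H0 - HL else q0

# Example target: pi(q) proportional to exp(d.q) on the unit sphere in R^3.
d = np.array([3.0, 0.0, 0.0])
log_pi = lambda q: d @ q
grad_log_pi = lambda q: d

rng = np.random.default_rng(0)
q = np.array([0.0, 0.0, 1.0])
samples = []
for _ in range(1000):
    q = chmc_step(q, h=0.1, L=5, log_pi=log_pi, grad_log_pi=grad_log_pi, rng=rng)
    samples.append(q)
samples = np.array(samples)
print(samples.mean(axis=0))  # the first coordinate should be clearly positive
```

Setting L = 1 in this sketch corresponds to the Constrained Langevin variant named in the poster. A position-dependent mass matrix would require the full projection P(q) and the log-determinant term.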