Date post: 07-Oct-2020
A Family of MCMC Methods on Implicitly Defined Manifolds
Marcus A. Brubaker+, Mathieu Salzmann and Raquel Urtasun

Toyota Technological Institute at Chicago
+ University of Toronto, Canada

Introduction:
•  Traditional MCMC methods (e.g., Gauss-Metropolis, HMC) assume the target distribution is over a Euclidean space
•  However, many problems are most naturally characterized over a non-linear manifold
•  Sampling from posteriors that arise in such problems has typically required the derivation of posterior-specific sampling schemes

Contributions:
•  Here we derive an MCMC scheme based on Hamiltonian dynamics on an implicitly defined manifold $\mathcal{M} = \{q \in \mathbb{R}^n \mid c(q) = 0\}$
•  We prove that, subject to suitable conditions, the Markov chain converges to the target posterior $\pi(q)$
•  We present constrained variants of several MCMC methods including: Gauss-Metropolis, Hamiltonian (and Langevin) Monte Carlo and Riemann Manifold HMC [6]
•  These algorithms are demonstrated on a range of problems including:
   o  Sampling from a linearly constrained Gaussian distribution
   o  Sampling from the Bingham-von Mises-Fisher distribution over $S^n$
   o  Bayesian matrix factorization for collaborative filtering
   o  Human pose estimation
•  Matlab code available from: http://www.cs.toronto.edu/~mbrubake/

Previous Work:
•  Similar methods are commonly used in molecular dynamics to compute the free energy of a constrained system (e.g., [1-3])
•  Gibbs samplers have been derived for some distributions (e.g., [4]), but even those specialized methods are outperformed by the methods presented here

Experimental Results:
•  Gaussian distribution in a linear subspace
•  Bingham-von Mises-Fisher
•  Collaborative filtering
•  Human pose estimation
   o  Pose is a set of 3D joint positions
   o  Manifold is induced by the limb length constraints of the skeleton
   o  Posterior combines noisy 2D joint projections with a PCA-based prior model of pose
   o  Compared with direct optimization for different levels of noise
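To make the pose manifold concrete, here is a minimal sketch of how limb length constraints could be written as a function $c(q)$ with Jacobian $C(q)$. The squared-distance form of the constraint, the three-joint chain and the names `LIMBS`, `c` and `C` are assumptions chosen for illustration; the poster does not specify the exact constraint function or skeleton.

```python
import numpy as np

# Hypothetical skeleton: (joint i, joint j, limb length) for a 3-joint chain.
LIMBS = [(0, 1, 1.0), (1, 2, 0.5)]

def c(q):
    """Constraint vector: squared limb length minus its target, one entry per limb.

    q stacks the 3D joint positions, so q.reshape(-1, 3)[i] is joint i.
    """
    X = q.reshape(-1, 3)
    return np.array([np.sum((X[i] - X[j]) ** 2) - l ** 2 for i, j, l in LIMBS])

def C(q):
    """Jacobian dc/dq, shape (number of limbs, 3 * number of joints)."""
    X = q.reshape(-1, 3)
    J = np.zeros((len(LIMBS), q.size))
    for k, (i, j, _) in enumerate(LIMBS):
        diff = 2.0 * (X[i] - X[j])
        J[k, 3 * i:3 * i + 3] = diff
        J[k, 3 * j:3 * j + 3] = -diff
    return J

# A pose that satisfies both limb lengths exactly.
q = np.array([0.0, 0.0, 0.0,
              1.0, 0.0, 0.0,
              1.0, 0.5, 0.0])
```

For this pose `c(q)` is zero and `C(q)` has full row rank, which is the regularity condition the method requires of the constraint Jacobian.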

References:
1. G. Ciccotti and J. P. Ryckaert. Molecular dynamics simulation of rigid molecules. Computer Physics Reports, 4(6):346–392, 1986
2. C. Hartmann. An ergodic sampling scheme for constrained Hamiltonian systems with applications to molecular dynamics. Journal of Statistical Physics, 130:687–711, 2008
3. T. Lelièvre, M. Rousset, and G. Stoltz. Free energy computations: A Mathematical Perspective. Imperial College Press, 2010
4. P. D. Hoff. Simulation of the matrix Bingham-von Mises-Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18:438–456, 2009
5. E. Hairer, C. Lubich, and G. Wanner. Geometric Numerical Integration. Springer, 2nd edition, 2006
6. M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B, 73:123–214, 2011

[Figure: curves for CHMC (L = 4), CHMC (L = 3), CHMC (L = 2), CLangevin, CMetropolis and Gibbs; horizontal axis from 0 to 0.05, vertical axis from 0 to 1; axis labels not recoverable]

[Figure: mean joint error [mm] versus frame number for Constr opt, Ours MAP and Ours mean]

[Figure: mean joint error [mm] versus noise std for Constr opt, Ours MAP and Ours mean]

Theoretical Result:
•  Assume that $\mathcal{M} = \{q \in \mathbb{R}^n \mid c(q) = 0\}$ is connected, smooth and differentiable with $C(q) = \frac{\partial c}{\partial q}$ full-rank everywhere, and that the target posterior $\pi(q)$ is strictly positive on $\mathcal{M}$
•  Given:
   o  a mass matrix $M(q)$ which is positive definite on $\mathcal{M}$
   o  a simulation potential energy function $\hat{U}(q)$ which is $C^2$ continuous
   o  a numerical integration method $\Phi^{\hat{H}}_h : T^*\mathcal{M} \to T^*\mathcal{M}$ which is symmetric, locally accessible, consistent with the Simulation Hamiltonian $\hat{H}$, and symplectic on the co-tangent bundle $T^*\mathcal{M} = \{(p, q) \mid c(q) = 0 \text{ and } C(q)\,\frac{\partial \hat{H}}{\partial p}(p, q) = 0\}$
•  Theorem: For all $q_0 \in \mathcal{M}$,

   $$\lim_{n \to \infty} \left\| T^n(q_0 \to \cdot) - \pi(\cdot) \right\| = 0$$

   where $T^n(q_0 \to \cdot)$ denotes $n$ steps of the Markov transition kernel of the Constrained Hamiltonian Monte Carlo algorithm
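As a small illustration of the co-tangent bundle condition, the sketch below checks whether a pair $(p, q)$ satisfies $c(q) = 0$ and $C(q)\,\partial\hat{H}/\partial p = 0$. It assumes a unit-sphere constraint and a constant mass matrix, so that $\partial\hat{H}/\partial p = M^{-1} p$; the function names and the tolerance are hypothetical choices made for this example.

```python
import numpy as np

def c(q):
    """Unit-sphere constraint c(q) = q^T q - 1 (assumed for illustration)."""
    return np.array([q @ q - 1.0])

def C(q):
    """Jacobian of c, shape (1, n); full rank whenever q is nonzero."""
    return 2.0 * q[None, :]

def in_cotangent_bundle(p, q, M_inv, tol=1e-10):
    """True if c(q) = 0 and C(q) M^{-1} p = 0, up to the tolerance."""
    on_manifold = np.all(np.abs(c(q)) < tol)
    tangent = np.all(np.abs(C(q) @ (M_inv @ p)) < tol)
    return bool(on_manifold and tangent)

q = np.array([0.0, 0.0, 1.0])   # north pole of the sphere
M_inv = np.eye(3)               # identity mass matrix
ok = in_cotangent_bundle(np.array([1.0, 2.0, 0.0]), q, M_inv)   # momentum tangent to the sphere
bad = in_cotangent_bundle(np.array([0.0, 0.0, 1.0]), q, M_inv)  # momentum along the normal
```

Here `ok` is `True` and `bad` is `False`: only momenta whose implied velocity keeps $q$ on the sphere to first order belong to the bundle.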

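Simulation of constrained Hamiltonian systems
•  Need a symplectic, consistent and symmetric integration method on $\mathcal{M}$
•  Generalized RATTLE Algorithm (see [5] for details and other options)

$$
\begin{aligned}
p_{1/2} &= p_0 - \frac{h}{2}\left[\frac{\partial \hat{H}(p_{1/2}, q_0)}{\partial q} + C(q_0)^T \lambda\right] \\
q_1 &= q_0 + \frac{h}{2}\left[\frac{\partial \hat{H}(p_{1/2}, q_0)}{\partial p} + \frac{\partial \hat{H}(p_{1/2}, q_1)}{\partial p}\right] \\
0 &= c(q_1) \\
p_1 &= p_{1/2} - \frac{h}{2}\left[\frac{\partial \hat{H}(p_{1/2}, q_1)}{\partial q} + C(q_1)^T \mu\right] \\
0 &= C(q_1)\,\frac{\partial \hat{H}(p_1, q_1)}{\partial p}
\end{aligned}
$$

•  If $\mathcal{M} = \mathbb{R}^n$ and the mass matrix is constant, RATTLE reduces to Leapfrog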
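The RATTLE update can be sketched in a few lines for the special case of an identity mass matrix, where $\partial\hat{H}/\partial q = \nabla\hat{U}(q)$ and $\partial\hat{H}/\partial p = p$. The unit-sphere constraint, the Newton iteration used to solve $c(q_1) = 0$ for $\lambda$, and all function names are assumptions made for illustration; see [5] for the general algorithm.

```python
import numpy as np

def c(q):
    return np.array([q @ q - 1.0])      # unit sphere (assumed example constraint)

def C(q):
    return 2.0 * q[None, :]             # Jacobian of c, shape (1, n)

def rattle_step(p0, q0, grad_U, h, iters=50, tol=1e-12):
    """One RATTLE step for H(p, q) = 0.5 p^T p + U(q), i.e. M = I."""
    C0 = C(q0)
    a = p0 - 0.5 * h * grad_U(q0)       # half-step momentum before the multiplier
    lam = np.zeros(C0.shape[0])
    for _ in range(iters):              # Newton iteration on c(q1(lam)) = 0
        q1 = q0 + h * (a - 0.5 * h * C0.T @ lam)
        r = c(q1)
        if np.max(np.abs(r)) < tol:
            break
        J = -0.5 * h ** 2 * C(q1) @ C0.T   # derivative of c(q1) with respect to lam
        lam = lam - np.linalg.solve(J, r)
    p_half = a - 0.5 * h * C0.T @ lam
    q1 = q0 + h * p_half
    C1 = C(q1)
    b = p_half - 0.5 * h * grad_U(q1)
    # mu is fixed by the linear condition C(q1) p1 = 0.
    mu = np.linalg.solve(0.5 * h * C1 @ C1.T, C1 @ b)
    p1 = b - 0.5 * h * C1.T @ mu
    return p1, q1

d = np.array([1.0, 0.0, 0.0])
grad_U = lambda q: -d                   # U(q) = -d^T q, a hypothetical potential
q0 = np.array([0.0, 0.0, 1.0])
p0 = np.array([0.3, -0.2, 0.0])         # tangent to the sphere at q0
p1, q1 = rattle_step(p0, q0, grad_U, h=0.1)
```

After the step, `q1` lies on the sphere and `p1` is tangent to it at `q1`. Running the step again from `(-p1, q1)` returns `(-p0, q0)` up to solver tolerance, which is the symmetry property the method relies on.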

Instances of Constrained HMC:
•  Gauss-Metropolis with covariance $\Sigma$ can be expressed as HMC with $\hat{U}(q) = 0$ and $M(q) = \Sigma^{-1}$. Constrained Gauss-Metropolis is thus similarly defined.
•  Constrained Langevin Monte Carlo arises with $L = 1$
•  Constrained Riemann Manifold HMC [6] arises for suitable choices of $M(q)$
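The first bullet can be checked with a short derivation. The sketch below assumes the unconstrained case $\mathcal{M} = \mathbb{R}^n$ and a single Leapfrog step of size $h$ (simplifying assumptions made here for illustration). With $\hat{U}(q) = 0$ and a constant mass matrix the momentum is never updated, so:

```latex
\begin{aligned}
p_0 &\sim \mathcal{N}(0, M), \qquad M = \Sigma^{-1}, \\
q_1 &= q_0 + h\, M^{-1} p_0 = q_0 + h\, \Sigma\, p_0, \\
\operatorname{Cov}(q_1 - q_0) &= h^2\, \Sigma\, \Sigma^{-1}\, \Sigma = h^2\, \Sigma .
\end{aligned}
```

The proposal is therefore a Gaussian random walk with covariance $h^2 \Sigma$. Because $p_1 = p_0$, the kinetic terms cancel in the acceptance test and it reduces to the usual Metropolis ratio $\min\{1, \pi(q_1)/\pi(q_0)\}$.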

[Figure: three scatter-plot panels titled CHMC, CLangevin and CMetropolis; horizontal axis from -10 to 10, vertical axis from -15 to 5]

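$\mathcal{M} = S^n$, $\pi(q) \propto \exp(d^T q + q^T A q)$

| Method        | E[-log π(q)] | ESS % | ESS/second |
|---------------|--------------|-------|------------|
| CHMC (L = 4)  | -999.021     | 27.3  | 183.756    |
| CHMC (L = 3)  | -998.759     | 25.4  | 217.427    |
| CHMC (L = 2)  | -999.121     | 37.9  | 440.898    |
| CLangevin     | -998.757     | 33.0  | 619.339    |
| CMetropolis   | -998.82      | 3.8   | 90.1513    |
| Gibbs [4]     | -998.742     | 50.8  | 160.722    |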
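For reference, the unnormalized Bingham-von Mises-Fisher log density and its Euclidean gradient are simple to write down. This is a minimal sketch; the function names and the small example values of `d` and `A` are hypothetical and are not the parameters used in the experiment.

```python
import numpy as np

def log_pi(q, d, A):
    """Unnormalized log density: d^T q + q^T A q."""
    return d @ q + q @ A @ q

def grad_log_pi(q, d, A):
    """Euclidean gradient: d + (A + A^T) q."""
    return d + (A + A.T) @ q

d = np.array([1.0, 0.0, 0.0])
A = np.diag([1.0, 2.0, 3.0])
q = np.array([0.0, 0.0, 1.0])    # a point on the unit sphere
value = log_pi(q, d, A)          # equals 3.0 for these values
grad = grad_log_pi(q, d, A)      # equals [1.0, 0.0, 6.0]
```

A natural choice of simulation potential in a constrained sampler is $\hat{U}(q) = -\log \pi(q)$, in which case the negative of this gradient is what the integrator consumes.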

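$\mathcal{M} = V_r(\mathbb{R}^N) \times V_r(\mathbb{R}^M) \times \mathbb{R}^r$

$$\pi(U, S, V) \propto \prod_{(i,j) \in E} \exp\left(-\frac{(f(U_i S V_j) - Y_{i,j})^2}{2\sigma^2}\right) p \cdots$$

| Method | 1M MovieLens (RMSE), r = 5 | r = 10       | r = 15       | EachMovie (RMSE), r = 5 | r = 10        | r = 15        |
|--------|----------------------------|--------------|--------------|-------------------------|---------------|---------------|
| HMC    | 1.577 ± 0.39               | 2.001 ± 0.66 | 2.306 ± 0.25 | 1.153 ± 0.002           | 1.161 ± 0.002 | 1.204 ± 0.018 |
| HMC-l  | 0.909 ± 0.008              | 0.949 ± 0.01 | 0.99 ± 0.01  | 1.155 ± 0.007           | 1.164 ± 0.001 | 1.184 ± 0.004 |
| CHMC   | 0.893 ± 0.01               | 0.888 ± 0.01 | 0.889 ± 0.01 | 1.144 ± 0.002           | 1.121 ± 0.001 | 1.116 ± 0.001 |
| CHMC-l | 0.888 ± 0.01               | 0.881 ± 0.01 | 0.881 ± 0.01 | 1.137 ± 0.003           | 1.115 ± 0.002 | 1.11 ± 0.002  |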

Constrained Hamiltonian Monte Carlo:
•  Input: $q_0$, $M(q)$, $h$, $L$, $\pi(q)$, $\hat{U}(q)$
•  Define:
   o  Co-tangent Projection:
      $$P(q) = I - M(q)^{-T} C(q)^T \left(C(q) M(q)^{-1} M(q)^{-T} C(q)^T\right)^{-1} C(q) M(q)^{-1}$$
   o  Acceptance Hamiltonian:
      $$H(p, q) = \tfrac{1}{2} p^T M(q)^{-1} p + \tfrac{1}{2} \log \left| 2\pi P(q)^T M(q) P(q) \right| - \log \pi(q)$$
   o  Simulation Hamiltonian:
      $$\hat{H}(p, q) = \tfrac{1}{2} p^T M(q)^{-1} p + \hat{U}(q)$$

1. $p'_0 \sim \mathcal{N}(0, M(q_0))$, $p_0 \leftarrow P(q_0)\, p'_0$
2. For $i = 1, \ldots, L$: $(p_i, q_i) \leftarrow \Phi^{\hat{H}}_h(p_{i-1}, q_{i-1})$
3. With probability $\min\{1, \exp(H(p_0, q_0) - H(p_L, q_L))\}$
   o  Return $q_L$
4. Else
   o  Return $q_0$
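The four steps can be sketched end to end for a simple special case. The code below assumes an identity mass matrix, a unit-sphere constraint, $\hat{U}(q) = -\log \pi(q)$, and a Bingham-von Mises-Fisher style target with made-up parameters. It also assumes that, for $M = I$, the log-determinant term of the acceptance Hamiltonian does not depend on $q$ and therefore cancels in the acceptance ratio. All function names are hypothetical.

```python
import numpy as np

def c(q):
    return np.array([q @ q - 1.0])      # unit sphere (assumed example constraint)

def C(q):
    return 2.0 * q[None, :]             # Jacobian of c, shape (1, n)

def project(q, p):
    """Co-tangent projection P(q) p for M = I: drop the normal component of p."""
    Cq = C(q)
    return p - Cq.T @ np.linalg.solve(Cq @ Cq.T, Cq @ p)

def rattle_step(p0, q0, grad_U, h, iters=50, tol=1e-12):
    """One RATTLE step for M = I, solving c(q1) = 0 by Newton iteration."""
    C0 = C(q0)
    a = p0 - 0.5 * h * grad_U(q0)
    lam = np.zeros(C0.shape[0])
    for _ in range(iters):
        q1 = q0 + h * (a - 0.5 * h * C0.T @ lam)
        r = c(q1)
        if np.max(np.abs(r)) < tol:
            break
        lam = lam - np.linalg.solve(-0.5 * h ** 2 * C(q1) @ C0.T, r)
    p_half = a - 0.5 * h * C0.T @ lam
    q1 = q0 + h * p_half
    b = p_half - 0.5 * h * grad_U(q1)
    return project(q1, b), q1           # for M = I the second multiplier is a projection

def chmc_step(q0, log_pi, grad_log_pi, h, L, rng):
    """One CHMC transition; returns the new state and whether it was accepted."""
    grad_U = lambda q: -grad_log_pi(q)            # assumes U_hat = -log pi
    H = lambda p, q: 0.5 * p @ p - log_pi(q)      # constant log-det term omitted
    p0 = project(q0, rng.standard_normal(q0.shape))   # step 1
    p, q = p0, q0
    for _ in range(L):                                # step 2
        p, q = rattle_step(p, q, grad_U, h)
    accept = min(1.0, np.exp(H(p0, q0) - H(p, q)))    # step 3
    return (q, True) if rng.random() < accept else (q0, False)

d = np.array([1.0, 0.0, 0.0])
A = np.diag([1.0, 2.0, 3.0])
log_pi = lambda q: d @ q + q @ A @ q
grad_log_pi = lambda q: d + 2.0 * A @ q               # A is symmetric here

rng = np.random.default_rng(0)
q = np.array([0.0, 0.0, 1.0])
samples, n_acc = [], 0
for _ in range(200):
    q, acc = chmc_step(q, log_pi, grad_log_pi, h=0.1, L=3, rng=rng)
    samples.append(q)
    n_acc += acc
samples = np.array(samples)
```

Every row of `samples` stays on the unit sphere because the integrator enforces the constraint at each step, and a rejected proposal simply repeats the previous state. A position-dependent mass matrix would require keeping the log-determinant term and the full projection formula.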
