Fast MCMC Algorithms on Polytopes
Raaz Dwivedi, Department of EECS
Joint work with Yuansi Chen, Martin Wainwright, and Bin Yu
Random Sampling
• Consider the problem of drawing random samples from a given density (known up to proportionality):
$X_1, X_2, \ldots, X_m \sim \pi^*$
Applications
• Probabilities of events
• Rare event simulations
• Bayesian posterior means
• Volume computation (polynomial time)
Given samples $X_1, X_2, \ldots, X_m \sim \pi^*$,
$\mathbb{E}[g(X)] = \int g(x)\, \pi^*(x)\, dx \approx \frac{1}{m} \sum_{i=1}^{m} g(X_i)$
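The Monte Carlo estimate above takes only a few lines. A minimal sketch, assuming for concreteness that $\pi^*$ is a standard normal so that direct sampling is possible (MCMC is needed precisely when it is not):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for pi*: the standard normal, which we can
# sample from directly.
m = 100_000
samples = rng.standard_normal(m)   # X_1, ..., X_m ~ pi*

def g(x):
    return x ** 2                  # any integrable test function

# Monte Carlo estimate: (1/m) * sum_i g(X_i) approximates E[g(X)]
estimate = g(samples).mean()
print(estimate)                    # close to E[X^2] = 1
```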
Applications
• Zeroth-order optimization: polynomial-time algorithms based on random walks for
$\min_{x \in K} g(x)$
• Convex optimization: Bertsimas and Vempala 2004, Kalai and Vempala 2006, Kannan and Narayanan 2012, Hazan et al. 2015
• Non-convex optimization, simulated annealing: Aarts and Korst 1989, Rakhlin et al. 2015
Uniform Sampling on Polytopes
• Integration of arbitrary functions under linear constraints
• Mixed Integer Programming
• Sampling non-negative integer matrices with specified row and column sums (contingency tables)
• Connections between optimization and sampling algorithms
Goal: Given $A$ and $b$, and a starting distribution $\mu_0$, design an MCMC algorithm that generates a random sample from the uniform distribution on
$\mathcal{X} = \{x \in \mathbb{R}^d \mid Ax \le b\}$
in as few steps as possible!
Convergence rate: mixing time in total variation,
$\|\mu_0 P^k - \pi^*\|_{TV} \le \epsilon$
Markov Chain Monte Carlo
• Design a Markov chain that converges to the desired distribution
• Metropolis-Hastings algorithms (1950s), Gibbs sampling (1980s)
• Simulate the Markov chain for several steps to get a sample
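The recipe in the bullets above can be made concrete. A minimal random-walk Metropolis sampler, as a sketch (this is the generic scheme, not the polytope walks discussed later; the target is any log-density):

```python
import numpy as np

def metropolis_hastings(log_density, x0, step, n_steps, rng):
    """Random-walk Metropolis: symmetric Gaussian proposals, so the
    Hastings correction p(z->x)/p(x->z) cancels."""
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_steps + 1,) + x.shape)
    chain[0] = x
    for k in range(1, n_steps + 1):
        z = x + step * rng.standard_normal(x.shape)   # propose
        # Accept with probability min(1, pi(z) / pi(x)).
        if np.log(rng.uniform()) < log_density(z) - log_density(x):
            x = z
        chain[k] = x
    return chain

# Usage: simulate the chain, discard burn-in, keep the rest as samples.
rng = np.random.default_rng(1)
chain = metropolis_hastings(lambda x: -0.5 * float(x @ x),  # std normal
                            x0=np.zeros(1), step=1.0,
                            n_steps=20_000, rng=rng)
samples = chain[5_000:]
print(samples.mean(), samples.std())  # roughly 0 and 1
```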
Markov Chain Monte Carlo
• Sampling on convex sets: Ball Walk (Lovász et al. 1990), Hit-and-run (Smith et al. 1993, Lovász 1999)
• Sampling on polytopes: Dikin Walk (Kannan and Narayanan 2012, Narayanan 2015, Sachdeva and Vishnoi 2016), Geodesic Walk (Lee and Vempala 2016)
Ball Walk [Lovász and Simonovits 1990]
• Propose a uniform point in a ball around x:
$z \sim \mathcal{U}\left[\mathcal{B}\left(x, \frac{c}{\sqrt{d}}\right)\right]$
• Reject if z is outside the polytope; else move to it
Ball Walk [Lovász and Simonovits 1990]
• Mixing time depends on the conditioning of the set, via the ratio $R_{\max}/R_{\min}$:
$\#\text{steps} = O\left(d^2 \frac{R_{\max}^2}{R_{\min}^2}\right)$, per-step cost $= nd$
• Can be exponential in $d$
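One ball-walk step can be sketched as follows (the example polytope and the `radius` value are illustrative; the slide's choice is $c/\sqrt{d}$):

```python
import numpy as np

def ball_walk_step(x, A, b, radius, rng):
    """One ball-walk step on {x : Ax <= b}: propose z uniform in a ball
    of the given radius around x; reject (stay at x) if z is outside."""
    d = x.shape[0]
    # Uniform point in the ball: random direction, radius by inverse CDF.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    z = x + radius * rng.uniform() ** (1.0 / d) * u
    return z if np.all(A @ z <= b) else x

# Usage: the square [-1, 1]^2 written as Ax <= b.
rng = np.random.default_rng(0)
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
x = np.zeros(2)
for _ in range(1_000):
    x = ball_walk_step(x, A, b, radius=0.5, rng=rng)
print(x)  # a point inside the square
```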
Dikin Walk [Kannan and Narayanan 2012]
• Proposal:
$z \sim \mathcal{N}\left(x, \frac{r^2}{d} D_x^{-1}\right)$
• Another variant: uniform proposal in the Dikin ellipsoid,
$z \sim \mathcal{U}[\mathcal{D}_x(r)]$
• Accept-reject:
$\mathbb{P}(\text{accept } z) = \min\left\{1, \frac{p(z \to x)}{p(x \to z)}\right\}$
Dikin Walk [Kannan and Narayanan 2012]
• Proposal:
$z \sim \mathcal{N}\left(x, \frac{r^2}{d} D_x^{-1}\right)$, $\quad D_x = \sum_{i=1}^n \frac{a_i a_i^\top}{(b_i - a_i^\top x)^2}$
where the constraint matrix and polytope are
$A = \begin{bmatrix} -\, a_1^\top -\, \\ -\, a_2^\top -\, \\ \vdots \\ -\, a_n^\top -\, \end{bmatrix}$, $\quad K = \{x \in \mathbb{R}^d \mid Ax \le b\}$
• $D_x$ is the Hessian of the log barrier method from optimization [Dikin 1967, Nemirovski 1990]
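One Dikin-walk step under the definitions above can be sketched as follows (Gaussian proposal with covariance $(r^2/d) D_x^{-1}$ plus the Metropolis-Hastings correction; the example polytope and step size $r$ are illustrative choices):

```python
import numpy as np

def barrier_hessian(x, A, b):
    """D_x = sum_i a_i a_i^T / (b_i - a_i^T x)^2, the log-barrier Hessian."""
    As = A / (b - A @ x)[:, None]      # row i: a_i / (b_i - a_i^T x)
    return As.T @ As

def dikin_step(x, A, b, r, rng):
    d = x.shape[0]
    Dx = barrier_hessian(x, A, b)
    # Proposal z ~ N(x, (r^2/d) Dx^{-1}) via a Cholesky factor of Dx.
    L = np.linalg.cholesky(Dx)
    z = x + (r / np.sqrt(d)) * np.linalg.solve(L.T, rng.standard_normal(d))
    if not np.all(A @ z < b):
        return x                       # proposal left the polytope: reject

    def log_q(y, c, D):                # log N(y; c, (r^2/d) D^{-1}) + const
        _, logdet = np.linalg.slogdet(D)
        return 0.5 * logdet - 0.5 * (d / r**2) * (y - c) @ D @ (y - c)

    # Accept with probability min{1, p(z -> x) / p(x -> z)}.
    Dz = barrier_hessian(z, A, b)
    log_ratio = log_q(x, z, Dz) - log_q(z, x, Dx)
    return z if np.log(rng.uniform()) < min(0.0, log_ratio) else x

# Usage: the square [-1, 1]^2.
rng = np.random.default_rng(0)
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
x = np.zeros(2)
for _ in range(500):
    x = dikin_step(x, A, b, r=0.5, rng=rng)
print(x)  # a point strictly inside the square
```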
Upper bounds (n = #constraints, d = #dimensions, n > d)

               Ball Walk                Dikin Walk
#Steps         d^2 R_max^2 / R_min^2    nd
Per-step cost  nd                       nd^2
Slow mixing of Dikin Walk
[Figure: Dikin walk samples for #constraints = 4 vs. #constraints = 128]
Lovász's Lemma
"If any two points that are $\Delta$ apart have overlap $\rho$ in their transition regions, then the chain mixes in $O\left(\frac{1}{\Delta^2 \rho^2}\right)$ steps."
(Distance and overlap measured appropriately.)
For any fixed overlap $\rho$, we want far-away points to have overlapping transition regions; hence large ellipsoids (contained within the polytope) are useful.
Improving Dikin Walk
Dikin proposal [Kannan and Narayanan 2012] (log barrier method [Dikin 1967, Nemirovski 1990]):
$z \sim \mathcal{N}\left(x, \frac{r^2}{d} D_x^{-1}\right)$, $\quad D_x = \sum_{i=1}^n \frac{a_i a_i^\top}{(b_i - a_i^\top x)^2}$
Idea: importance weighting of constraints,
$D_x \;\longrightarrow\; \sum_{i=1}^n \frac{w_i(x)\, a_i a_i^\top}{(b_i - a_i^\top x)^2}$
Improving Dikin Walk: sampling meets optimization (again!!)
Vaidya Walk [Chen, D., Wainwright and Yu 2017]
Dikin proposal [Kannan and Narayanan 2012] (log barrier method [Dikin 1967, Nemirovski 1990]):
$z \sim \mathcal{N}\left(x, \frac{r^2}{d} D_x^{-1}\right)$, $\quad D_x = \sum_{i=1}^n \frac{a_i a_i^\top}{(b_i - a_i^\top x)^2}$
Vaidya proposal (volumetric barrier method [Vaidya 1993]):
$z \sim \mathcal{N}\left(x, \frac{r^2}{\sqrt{nd}} V_x^{-1}\right)$, $\quad V_x = \sum_{i=1}^n \left(\sigma_{x,i} + \frac{d}{n}\right) \frac{a_i a_i^\top}{(b_i - a_i^\top x)^2}$, $\quad \sigma_{x,i} = \frac{a_i^\top D_x^{-1} a_i}{(b_i - a_i^\top x)^2}$
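The Vaidya local metric $V_x$ follows directly from these formulas; a minimal sketch (the example polytope is illustrative):

```python
import numpy as np

def vaidya_metric(x, A, b):
    """V_x = sum_i (sigma_{x,i} + d/n) a_i a_i^T / (b_i - a_i^T x)^2,
    where sigma_{x,i} = a_i^T Dx^{-1} a_i / (b_i - a_i^T x)^2 are the
    leverage scores of the slack-rescaled rows a_i / (b_i - a_i^T x)."""
    n, d = A.shape
    As = A / (b - A @ x)[:, None]          # slack-rescaled rows
    Dx = As.T @ As                         # log-barrier Hessian
    sigma = np.einsum('ij,ij->i', As @ np.linalg.inv(Dx), As)
    # Leverage scores always sum to d, so the weights sum to 2d.
    w = sigma + d / n
    return (As * w[:, None]).T @ As

# Usage: for the square [-1, 1]^2 at the center, each sigma_{x,i} = 1/2,
# each weight is 1/2 + 2/4 = 1, and V_x = 2 * I.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
Vx = vaidya_metric(np.zeros(2), A, b)
print(Vx)  # [[2. 0.], [0. 2.]]
```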
[Figure: Dikin vs. Vaidya walk samples for #constraints = 4 vs. #constraints = 128]
Convergence Rates (n constraints, d dimensions, n > d)

               Ball Walk                Dikin Walk   Vaidya Walk
#Steps         d^2 R_max^2 / R_min^2    nd           n^0.5 d^1.5
Per-step cost  nd                       nd^2         nd^2
[Figure: initial and target distributions on the polytope]
Dikin Walk vs. Vaidya Walk (k = #iterations, #experiments = 200, #dimensions = 2, #constraints = 64)
[Figure: Dikin and Vaidya walk samples at k = 0, 1, 10, 100, 500, 1000 iterations]
Small number of constraints: no winner!
Dikin Walk vs. Vaidya Walk (k = #iterations, #experiments = 200, #constraints = 2048)
[Figure: Dikin and Vaidya walk samples at k = 10, 100, 500, 1000 iterations]
Vaidya walk wins!
Dikin Walk vs. Vaidya Walk: approximate mixing time vs. #constraints (n)
[Figure: log-log plot of estimated mixing time k̂_mix against n; empirically Dikin ∝ n^0.9 and Vaidya ∝ n^0.45, consistent with the O(nd) and O(n^0.5 d^1.5) bounds]
[Figure: Dikin and Vaidya walk samples at k = 0, 10, 100, 500, 1000 iterations, for #constraints = 64 and #constraints = 2048]
Can we improve further?
Recall the Dikin proposal, from the log barrier method [Dikin 1967, Nemirovski 1990; Kannan and Narayanan 2012], and the Vaidya proposal, from Vaidya's volumetric barrier method [Vaidya 1993].
John Walk [Chen, D., Wainwright, Yu 2017]
Dikin proposal (log barrier method [Dikin 1967, Nemirovski 1990; Kannan and Narayanan 2012]):
$z \sim \mathcal{N}\left(x, \frac{r^2}{d} D_x^{-1}\right)$, $\quad D_x = \sum_{i=1}^n \frac{a_i a_i^\top}{(b_i - a_i^\top x)^2}$
Vaidya proposal (volumetric barrier method [Vaidya 1993]):
$z \sim \mathcal{N}\left(x, \frac{r^2}{\sqrt{nd}} V_x^{-1}\right)$, $\quad V_x = \sum_{i=1}^n \left(\sigma_{x,i} + \frac{d}{n}\right) \frac{a_i a_i^\top}{(b_i - a_i^\top x)^2}$, $\quad \sigma_{x,i} = \frac{a_i^\top D_x^{-1} a_i}{(b_i - a_i^\top x)^2}$
John proposal (John's ellipsoidal algorithm [Fritz John 1948, Lee and Sidford 2015]):
$z \sim \mathcal{N}\left(x, \frac{r^2}{d^{1.5}} J_x^{-1}\right)$, $\quad J_x = \sum_{i=1}^n \frac{j_{x,i}\, a_i a_i^\top}{(b_i - a_i^\top x)^2}$, where the weights $j_{x,i}$ are obtained by solving a convex program
Mixing Times (n = #constraints, d = #dimensions, n > d)

               Dikin Walk   Vaidya Walk   John Walk
#Steps         nd           n^0.5 d^1.5   d^2.5 log^4(n/d)
Per-step cost  nd^2         nd^2          nd^2 log^2 n
Conjecture (n = #constraints, d = #dimensions, n > d)

               Dikin Walk   Vaidya Walk   John Walk
#Steps         nd           n^0.5 d^1.5   d^2 log^c(n/d)
Per-step cost  nd^2         nd^2          nd^2 log^2 n
Proof Idea
• The proof relies on Lovász's Lemma
• Need to establish that nearby points have similar transition distributions
• Have to show that the weight matrices are sufficiently smooth; the use of state-dependent weights makes this involved