
Accelerating Bayesian inference with computationally intensive models, with application to Pine Island Glacier

Patrick R. Conrad (MIT), Patrick Heimbach (MIT), Youssef Marzouk (MIT), Natesh Pillai (Harvard), and Aaron Smith (Univ. of Ottawa)

Antarctica and climate change

The Western Antarctic Ice Sheet has recently shown growing mass loss along the Amundsen coast

Figure: Western Antarctic Ice Sheet [Rignot et al. 2011]

Figure: Pine Island Glacier [NASA]

Vast uncertainty in ice-ocean dynamics

Figure: Temperature profile under Pine Island Glacier, Antarctica [Jacobs et al.]

- How readily is heat absorbed by the ice?
- How much mixing occurs near the ice-ocean interface?
- Ultimately, can we predict melt rates and the stability of the glacier?

Forward model of ice-ocean coupling

- MIT General Circulation Model, configured for Pine Island
- Realistic geometry at coarse (4 km × 4 km × 20 m) or fine (1 km × 1 km × 20 m) resolution
- Several input parameters are unknown

Constructing an inference problem

Figure: satellite image; bathymetry (0–1000 m) and sample locations near Pine Island Glacier (roughly 100°W–102°W, 75°S)

- Representative locations for temperature and salinity observations (a sketch of a corresponding observation operator follows below)
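To make "representative locations" concrete, one way to assemble the data vector d is an observation operator that extracts modeled temperature and salinity at chosen grid indices. This is only a sketch; the index list below is a hypothetical placeholder, not the locations shown in the figure.

```python
import numpy as np

# Hypothetical observation locations given as (depth, y, x) grid indices; the
# actual study uses representative points from the bathymetry map above.
OBS_INDICES = [(5, 12, 30), (10, 20, 44), (15, 8, 52)]

def observe(temperature, salinity, indices=OBS_INDICES):
    """Stack modeled temperature and salinity at the observation points into d."""
    t_obs = np.array([temperature[idx] for idx in indices])
    s_obs = np.array([salinity[idx] for idx in indices])
    return np.concatenate([t_obs, s_obs])
```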

Bayesian inference illustration

- Bayesian inference expresses our prior beliefs over parameters θ ∈ R^n with a probability density, p(θ), and constructs a posterior probability density,

p(θ|d) ∝ L(θ|d, f(θ)) p(θ)

expressing our beliefs after comparing the data d ∈ R^d to the computationally expensive forward model f(θ).

- Well suited to limited data and complex models (a minimal code sketch follows below)

Figure: illustration of prior contours, posterior contours, and the MAP point θMAP
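As a minimal illustration of the structure above (not code from the talk), the unnormalized log-posterior adds a log-likelihood built around the forward model to the log-prior; `forward_model`, `log_prior`, and the noise scale `sigma` are hypothetical placeholders for the expensive simulator, the prior density, and the observation noise level.

```python
import numpy as np

def log_posterior(theta, d, forward_model, log_prior, sigma):
    """Unnormalized log-posterior: log L(theta | d, f(theta)) + log p(theta).

    forward_model, log_prior, and sigma are placeholders for the expensive
    simulator, the prior log-density, and the observation noise level.
    """
    f_theta = forward_model(theta)                    # expensive forward model f(theta)
    misfit = np.asarray(d) - np.asarray(f_theta)      # data-model mismatch
    log_like = -0.5 * np.sum((misfit / sigma) ** 2)   # i.i.d. Gaussian likelihood, up to a constant
    return log_like + log_prior(theta)
```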

Markov chain Monte Carlo (MCMC)

Figure: posterior contours, proposal contours, and MCMC samples

- Significant literature discusses proposals that "mix" quickly, i.e., that generate nearly independent samples
- Evaluates the forward model N times
- Run time can be dominated by the cost of f
- Standard MCMC links the cost of understanding p(θ|d) to the cost of f(θ) (a minimal random-walk sketch follows below)
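A bare-bones random-walk Metropolis-Hastings loop, sketched here to make the cost structure explicit: each step calls `log_posterior` (and hence the forward model) once, so N steps cost N forward evaluations. The helper names are assumptions, not the authors' implementation; `log_posterior` could be the sketch above with its other arguments fixed.

```python
import numpy as np

def random_walk_mh(log_posterior, theta0, n_steps, step_size, rng=None):
    """Plain Metropolis-Hastings with a symmetric Gaussian random-walk proposal.

    Each iteration evaluates log_posterior (and hence the forward model) once,
    which is what ties the cost of MCMC to the cost of f(theta).
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    chain = [theta.copy()]
    for _ in range(n_steps):
        proposal = theta + step_size * rng.standard_normal(theta.shape)
        logp_prop = log_posterior(proposal)
        # Accept with probability min(1, exp(logp_prop - logp)).
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
        chain.append(theta.copy())
    return np.array(chain)
```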

MCMC with Local Approximations

Given X0, initialize S0, then simulate the chain {Xt}t≤N with the kernel:

MH kernel Kt(x, ·)
1. Given Xt, draw qt ∼ Q(Xt, ·) from a kernel Q with (symmetric) translation-invariant density q(x, ·)
2. Compute the acceptance ratio

α = min(1, [L(qt|d, f̃t(qt)) p(qt)] / [L(Xt|d, f̃t(Xt)) p(Xt)])

3. As needed, select new samples near qt or Xt, yielding St ⊆ St+1. Refine f̃t → f̃t+1.
4. Draw u ∼ U(0, 1). If u < α, let Xt+1 = qt; otherwise Xt+1 = Xt.
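A rough sketch of one step of such an approximate MH kernel, with the refinement decision folded into a callback. The callback names (`log_post_approx`, `maybe_refine`, `propose`) are hypothetical, and the ordering is simplified relative to steps 1-4 above: here refinement happens once, before the acceptance ratio is evaluated.

```python
import numpy as np

def approximate_mh_step(x, log_post_approx, maybe_refine, propose, rng):
    """One step of an approximate MH kernel in the spirit of K_t above.

    log_post_approx evaluates the surrogate-based log-posterior, maybe_refine
    decides whether to add new forward-model samples near x or the proposal and
    rebuild the surrogate f_tilde; both are hypothetical callbacks.
    """
    q = propose(x, rng)                                  # step 1: draw q_t ~ Q(X_t, .)
    maybe_refine(x, q)                                   # step 3: refine the surrogate as needed
    log_alpha = log_post_approx(q) - log_post_approx(x)  # step 2: log acceptance ratio
    if np.log(rng.uniform()) < log_alpha:                # step 4: accept or reject
        return q
    return x
```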

Local approximations

- To compute f̃(θ), construct a model over the ball BR(θ)
- Use samples θi ∈ S at distance r = ‖θ − θi‖ < R
- The approximation converges locally under loose conditions [Cleveland]
- For example, quadratic approximations over BR(θ) [Conn et al.]: ‖f − QRf‖ ≤ ‖f‖ κ λ R³ (a code sketch of such a local fit follows below)
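One plausible way to build such a local surrogate, assuming scalar model output for simplicity, is a least-squares quadratic fit over the stored samples inside BR(θ). This is an illustrative sketch, not the specific interpolation construction of Conn et al.

```python
import numpy as np

def local_quadratic_fit(theta, samples, values, radius):
    """Least-squares quadratic surrogate from stored samples inside B_R(theta).

    samples is an (m, n) array of previously evaluated parameter points and
    values the corresponding (scalar) model outputs; assumes enough points
    fall inside the ball to determine the quadratic.
    """
    theta = np.asarray(theta, dtype=float)
    dist = np.linalg.norm(np.asarray(samples) - theta, axis=1)
    mask = dist < radius
    X, y = np.asarray(samples)[mask], np.asarray(values)[mask]
    n = X.shape[1]
    # Design matrix: constant, linear, and quadratic (including cross) terms.
    cols = [np.ones(X.shape[0])] + [X[:, i] for i in range(n)]
    cols += [X[:, i] * X[:, j] for i in range(n) for j in range(i, n)]
    coeffs, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)

    def predict(t):
        feats = [1.0] + [t[i] for i in range(n)]
        feats += [t[i] * t[j] for i in range(n) for j in range(i, n)]
        return float(np.dot(coeffs, feats))

    return predict
```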

Local approximation illustration

Figure: refinement at early times vs. late times. Models are refined using new points chosen when model quality appears poor.

Ergodicity and exactness of approximate samplers

Assume the log-posterior is approximated with local quadratic models and θ ∈ X ⊆ R^n for compact X, or that p(θ|d) obeys a Gaussian envelope:

lim_{r→∞} sup_{|θ|=r} |log(p(θ|d)) − log(p∞(θ))| = 0

for some quadratic form log(p∞) with negative definite coefficient matrix.

Then, under standard regularity assumptions on a geometrically ergodic kernel K∞ and the posterior p(θ|d), the chain Xt is ergodic and asymptotically samples from the exact posterior:

lim_{t→∞} ‖P(Xt) − p(θ|d)‖_TV = 0

Example: Elliptic permeability inversion

Infer parameters of k given observations of u in the PDE:

∇_s · (k(s, θ) ∇_s u(s, θ)) = 0
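For orientation, a minimal 1D finite-difference forward solve of this elliptic equation might look as follows. The log-permeability parameterization (a polynomial in s with coefficients θ) and the Dirichlet conditions u(0) = 0, u(1) = 1 are assumptions for illustration only, not the setup used in the slides.

```python
import numpy as np

def solve_elliptic_1d(theta, n_cells=50):
    """Minimal 1D forward solve of d/ds( k(s, theta) du/ds ) = 0 on [0, 1].

    The log-permeability is (hypothetically) a polynomial in s with
    coefficients theta; u(0) = 0 and u(1) = 1 are assumed Dirichlet conditions.
    """
    h = 1.0 / n_cells
    s_mid = (np.arange(n_cells) + 0.5) * h          # faces between grid nodes
    k_mid = np.exp(np.polyval(theta, s_mid))        # positive permeability on faces
    n_int = n_cells - 1                             # number of interior nodes
    A = np.zeros((n_int, n_int))
    b = np.zeros(n_int)
    for j in range(n_int):
        kw, ke = k_mid[j], k_mid[j + 1]             # west/east face conductivities
        A[j, j] = -(kw + ke) / h**2
        if j > 0:
            A[j, j - 1] = kw / h**2
        if j < n_int - 1:
            A[j, j + 1] = ke / h**2
    b[-1] -= k_mid[-1] / h**2 * 1.0                 # u(1) = 1 boundary contribution
    u_interior = np.linalg.solve(A, b)
    return np.concatenate(([0.0], u_interior, [1.0]))
```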

Accuracy of chains

Figure: relative covariance error vs. MCMC step for the true model and the linear, quadratic, and GP approximations

Cost of chains

Figure: total number of forward-model evaluations vs. MCMC step for the true model and the linear, quadratic, and GP approximations

Prior and likelihood selection

- Priors are log-normal with expert-chosen mean and width
- Likelihoods are i.i.d. Gaussian with variance suggested by in situ experimental data (a code sketch of one possible implementation follows the table below)

Parameter              Nominal value µ′   Prior “width” σ′
Drag coefficients      1.5E-3             1.5E-3
Heat & Salt transfer   1.0E-4             0.5E-4
Prandtl Number         13.8               1.
Schmidt Number         2432.              200.
Horizontal Diffusion   5.0E-5             5.0E-5
ZetaN                  5.2E-2             0.5E-3
Temperature            –                  0.04
Salinity               –                  0.1
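A sketch of how the table might translate into code: independent log-normal priors built from the listed nominal values and widths, and an i.i.d. Gaussian likelihood using the listed temperature and salinity standard deviations. The mapping of (µ′, σ′) onto log-normal parameters below is an assumption for illustration, not taken from the study.

```python
import numpy as np

# Values taken from the table above; the log-normal parameterization
# (mean log mu', spread log(1 + sigma'/mu')) is an illustrative assumption,
# not necessarily the one used in the study.
PRIORS = {
    "drag":      (1.5e-3, 1.5e-3),
    "transfer":  (1.0e-4, 0.5e-4),
    "prandtl":   (13.8,   1.0),
    "schmidt":   (2432.0, 200.0),
    "diffusion": (5.0e-5, 5.0e-5),
    "zeta_n":    (5.2e-2, 0.5e-3),
}
OBS_STD = {"temperature": 0.04, "salinity": 0.1}

def log_prior(theta):
    """Sum of independent log-normal log-densities (up to additive constants)."""
    total = 0.0
    for name, value in theta.items():
        mu, sigma = PRIORS[name]
        log_mu, log_sigma = np.log(mu), np.log1p(sigma / mu)
        total += -0.5 * ((np.log(value) - log_mu) / log_sigma) ** 2 - np.log(value)
    return total

def log_likelihood(model_out, data):
    """i.i.d. Gaussian log-likelihood over temperature and salinity observations."""
    total = 0.0
    for field, sigma in OBS_STD.items():
        resid = np.asarray(data[field]) - np.asarray(model_out[field])
        total += -0.5 * np.sum((resid / sigma) ** 2)
    return total
```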

Computational details and results

- Compute synthetic data using the fine-scale model; infer the parameters using the coarse-scale model
- Constructed 30 parallel chains with shared evaluations
- Chains run for approximately two weeks
- Results shown after burn-in is removed

Inference cost summary

Scenario            Samples   Model runs   Savings
Drill and surface   225,000   53,000       ≥ 4.2x
Surface only        450,000   52,000       ≥ 8.6x
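The savings column is consistent with reading it as samples per forward-model run, on the assumption that standard MCMC would need one forward-model evaluation per sample; a quick check:

```python
# Savings read as samples per forward-model run, assuming standard MCMC
# would require one forward-model evaluation per sample.
cases = {"Drill and surface": (225_000, 53_000), "Surface only": (450_000, 52_000)}
for name, (samples, runs) in cases.items():
    print(f"{name}: savings ≈ {samples / runs:.1f}x")
# Drill and surface: ≈ 4.2x; Surface only: ≈ 8.7x (consistent with ≥ 4.2x and ≥ 8.6x above)
```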

Prior and posterior marginals

Figure: prior and posterior marginals for the Drag, Transfer, Prandtl, Schmidt, Diff, and Zeta parameters, comparing the drill-and-surface and surface-only posteriors to the prior

Contributions

- Introduce a novel framework for using local approximations within MCMC; prove that the framework produces asymptotically exact samples.
- Demonstrate strong numerical performance on canonical inference problems.
- Construct a realistic, synthetic inference problem for ice-ocean coupling near Pine Island Glacier.
- Apply local approximation methods to reduce the computational cost of inference in the Pine Island Glacier setting.

This work is supported by the US Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Number DE-SC0007099, part of the SciDAC Institute for the Quantification of Uncertainty in Extreme-Scale Computations (QUEST). prconrad@mit.edu, ymarz@mit.edu