
Exact Approximate MCMC for Big Data

Jack Baker

STOR-i, Lancaster University

April 24, 2015


Motivation

Massive increase in industrial interest.

Companies that do not use their customer information are seen as being at a massive competitive disadvantage.

Every 2 days we create as much information as we did from the beginning of time until 2003 [Eric Schmidt].


Metropolis-Hastings: A Brief Review

Aim: simulate from the posterior p(θ|y) ∝ p(θ)p(y|θ) of an unknown parameter vector θ given data y.

Assume: the posterior can be evaluated up to a proportionality constant; we must therefore be able to evaluate the likelihood p(y|θ).

Given the current state θ:

Sample θ′ from some proposal q(θ′|θ).

Accept θ′ with probability

min{1, [p(θ′) p(y|θ′) q(θ|θ′)] / [p(θ) p(y|θ) q(θ′|θ)]}
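As an illustration, here is a minimal random-walk Metropolis-Hastings sketch in Python. The Gaussian target and symmetric Gaussian proposal (with step size step) are assumptions for the example, not part of the talk; with a symmetric proposal the q terms in the acceptance ratio cancel.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_iters, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings targeting exp(log_post)."""
    rng = np.random.default_rng(seed)
    theta, lp = theta0, log_post(theta0)
    samples = []
    for _ in range(n_iters):
        # Symmetric Gaussian proposal, so q(theta|theta') / q(theta'|theta) = 1
        # and the acceptance ratio reduces to the posterior ratio.
        prop = theta + step * rng.standard_normal()
        lp_prop = log_post(prop)
        # Accept with probability min{1, p(theta')p(y|theta') / p(theta)p(y|theta)}.
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples.append(theta)
    return np.array(samples)

# Illustrative target: posterior of a Gaussian mean under a flat prior.
y = np.random.default_rng(1).normal(2.0, 1.0, size=500)
draws = metropolis_hastings(lambda th: -0.5 * np.sum((y - th) ** 2),
                            theta0=0.0, n_iters=5000)
```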


Metropolis-Hastings for Big Data: A Key Issue

Suppose we have a large number of observations n. Then the likelihood p(y|θ) becomes expensive to calculate.

Some potential solutions:

Subsample: only use some of the data y to calculate the likelihood.

Parallelise: split the data into S samples and run S Metropolis-Hastings (MH) samplers in parallel, then recombine.


The Rest of the Talk: Overview

Pseudo-marginal MCMC: enables us to replace the likelihood p(y|θ) with a Monte Carlo estimate p̂(y|θ) in a Metropolis-Hastings style sampler.

But the method ensures we still target the desired posterior p(θ|y), hence exact approximate MCMC.

Question: can we use this result in subsampling for big data?

Only using a sample of the data to calculate the likelihood can be seen as providing an estimate of it.


Pseudo-marginal MCMC: Motivation

Complex statistical models often depend on latent processes which cannot be marginalised analytically.

Examples: speech recognition, gene prediction, protein folding.

Traditional MCMC, alternately updating the latent state and the unknown parameters, often mixes poorly.

However, the marginal likelihood can often be estimated using Monte Carlo techniques, e.g. importance sampling.
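A minimal sketch of the importance sampling idea, assuming the marginal likelihood has the latent-variable form p(y|θ) = ∫ p(y|x, θ) p(x|θ) dx; all of the function arguments (sample_g, log_g, log_prior_x, log_lik_x) are hypothetical placeholders for model-specific code.

```python
import numpy as np

def is_likelihood_estimate(theta, n, sample_g, log_g, log_prior_x, log_lik_x, rng):
    """Unbiased importance sampling estimate of the marginal likelihood:
    p(y|theta) = E_g[ p(y|x,theta) p(x|theta) / g(x) ], with x_1..x_n ~ g."""
    x = sample_g(n, rng)
    # Log importance weights: log p(y|x,theta) + log p(x|theta) - log g(x).
    log_w = log_lik_x(x, theta) + log_prior_x(x, theta) - log_g(x)
    # Average the weights on the natural scale, stabilised via log-sum-exp.
    m = np.max(log_w)
    return np.exp(m) * np.mean(np.exp(log_w - m))
```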


Pseudo-marginal MCMC

We can’t evaluate the (marginal) likelihood, but we can construct an unbiased Monte Carlo estimate of it, p̂(y|θ).

Given the previous state θ and likelihood estimate p̂(y|θ):

Simulate θ′ from a proposal q(θ′|θ).

At the new state θ′, produce a Monte Carlo estimate p̂(y|θ′) of the likelihood.

Accept the new state θ′ with probability

min{1, [p(θ′) p̂(y|θ′) q(θ|θ′)] / [p(θ) p̂(y|θ) q(θ′|θ)]}
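A minimal sketch of this sampler, again assuming a symmetric random-walk proposal so that the q terms cancel; log_lik_hat is a hypothetical routine returning the log of an unbiased, non-negative likelihood estimate. Note how the estimate at the current state is stored and reused, which is exactly the point made on the next slide.

```python
import numpy as np

def pseudo_marginal_mh(log_prior, log_lik_hat, theta0, n_iters, step=0.5, seed=0):
    """Pseudo-marginal MH: log_lik_hat(theta, rng) returns log p-hat(y|theta),
    the log of an unbiased, non-negative Monte Carlo likelihood estimate."""
    rng = np.random.default_rng(seed)
    theta = theta0
    log_lhat = log_lik_hat(theta, rng)      # estimate once at the initial state
    samples = []
    for _ in range(n_iters):
        prop = theta + step * rng.standard_normal()
        log_lhat_prop = log_lik_hat(prop, rng)  # fresh estimate at theta' only
        # The stored log_lhat is REUSED here; re-estimating the likelihood at
        # the current state would break the exactness of the sampler.
        log_alpha = (log_prior(prop) + log_lhat_prop
                     - log_prior(theta) - log_lhat)
        if np.log(rng.uniform()) < log_alpha:
            theta, log_lhat = prop, log_lhat_prop
        samples.append(theta)
    return np.array(samples)
```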


Pseudo-marginal MCMC: Results

Result: provided p̂(y|θ) is unbiased and non-negative, this algorithm targets the desired posterior p(θ|y) (Andrieu and Roberts, 2009).

Note: the estimate of the likelihood at the previous state must be reused for this to be the case.

The price: mixing, which is affected by the variance in p̂(y|θ).


Pseudo-marginal MCMC: Mixing

Mixing is a balance between performance and computational expense. For best results, Doucet et al. (2012) suggest tuning the estimator so that the variance of log p̂(y|θ) is around 1.


Pseudo-marginal MCMC and Big Data

Idea: we can use subsamples to provide estimates of the likelihood.

Result: provided these estimates are unbiased, the pseudo-marginal result guarantees that we sample from the desired posterior p(θ|y).

Mixing issues: Korattikara et al. (2013) found that building estimates from simple random samples led to variance that was too high for pseudo-marginal MCMC to mix properly.
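For concreteness, a sketch of the naive subsampling estimator (log_lik_i, a per-observation log-likelihood, is a hypothetical helper). It is unbiased for the log-likelihood, but exponentiating it gives a biased estimate of the likelihood itself, and its variance grows as the subsample shrinks, which is the mixing problem described above.

```python
import numpy as np

def log_lik_subsample(theta, y, m, log_lik_i, rng):
    """Scaled simple-random-sample estimate of the full-data log-likelihood:
    (n/m) * sum of m randomly chosen per-observation terms. Unbiased for
    sum_i log p(y_i|theta), but exp() of it is a biased estimate of
    p(y|theta), and its variance is typically too high for the
    pseudo-marginal sampler to mix well."""
    n = len(y)
    idx = rng.choice(n, size=m, replace=False)
    return (n / m) * np.sum(log_lik_i(y[idx], theta))
```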


Pseudo-marginal MCMC and Big Data

Idea: Quiroz et al. (2014) suggest subsampling data points with probability proportional to the size of their contribution to the likelihood.

The price: since we do not wish to calculate the likelihood, the method requires the construction of a proxy to the true likelihood.
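One hedged sketch of how such a proxy can be used: a difference (control-variate) estimator in the spirit of Quiroz et al. (2014), where proxy_i approximates each log-likelihood term and its total proxy_sum(θ) is assumed cheap to evaluate in closed form, so only the small residuals need subsampling. For simplicity this uses a uniform subsample rather than the paper's probability-proportional-to-size weights; proxy_i and proxy_sum are hypothetical helpers.

```python
import numpy as np

def log_lik_with_proxy(theta, y, m, log_lik_i, proxy_i, proxy_sum, rng):
    """Difference estimator: sum_i log p(y_i|theta)
       = proxy_sum(theta) + sum_i [log p(y_i|theta) - proxy_i(y_i, theta)].
    The proxy total is computed exactly and cheaply; only the residuals are
    estimated from a subsample, so the variance depends on how well the
    proxy tracks the true per-observation log-likelihoods."""
    n = len(y)
    idx = rng.choice(n, size=m, replace=False)
    resid = log_lik_i(y[idx], theta) - proxy_i(y[idx], theta)
    return proxy_sum(theta) + (n / m) * np.sum(resid)
```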


Discussion: Open Areas

Proxies: efficient methods for constructing proxies to the likelihood, balancing computational expense against mixing.

Sampling: alternative sampling methods for the problem.

Alternative approaches: methods to use only a subsample of the data to estimate the likelihood, while still targeting the desired posterior (e.g. data augmentation).

Parallelisation of MCMC: methods to recombine parallelised MCMC runs while still targeting the desired posterior. Some methods have asymptotic results, but they do not extend well to large parameter spaces.
