Page 1: Bayesian Inference

Bayesian Inference

Chris Mathys

Wellcome Trust Centre for Neuroimaging

UCL

SPM Course

London, October 25, 2013

Thanks to Jean Daunizeau and Jérémie Mattout for previous versions of this talk

Page 2: Bayesian Inference

A spectacular piece of information


Page 3: Bayesian Inference

A spectacular piece of information

Messerli, F. H. (2012). Chocolate Consumption, Cognitive Function, and Nobel Laureates. New England Journal of Medicine, 367(16), 1562–1564.

Page 4: Bayesian Inference

So will I win the Nobel prize if I eat lots of chocolate?

This is a question referring to uncertain quantities. Like almost all scientific questions, it cannot be answered by deductive logic. Nonetheless, quantitative answers can be given – but they can only be given in terms of probabilities.

Our question here can be rephrased in terms of a conditional probability: p(Nobel | lots of chocolate).

To answer it, we have to learn to calculate such quantities. The tool for this is Bayesian inference.

Page 5: Bayesian Inference

«Bayesian» = logical, and logical = probabilistic

«The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man's mind.»

— James Clerk Maxwell, 1850

Page 6: Bayesian Inference

«Bayesian» = logical, and logical = probabilistic

But in what sense is probabilistic reasoning (i.e., reasoning about uncertain quantities according to the rules of probability theory) «logical»?

R. T. Cox showed in 1946 that the rules of probability theory can be derived from three basic desiderata:

1. Representation of degrees of plausibility by real numbers
2. Qualitative correspondence with common sense (in a well-defined sense)
3. Consistency

Page 7: Bayesian Inference

The rules of probability

By mathematical proof (i.e., by deductive reasoning), the three desiderata as set out by Cox imply the rules of probability (i.e., the rules of inductive reasoning).

This means that anyone who accepts the desiderata must accept the following rules:

1. Normalization: ∑_a p(a) = 1
2. Marginalization (also called the sum rule): p(b) = ∑_a p(a, b)
3. Conditioning (also called the product rule): p(a, b) = p(a|b) p(b) = p(b|a) p(a)

«Probability theory is nothing but common sense reduced to calculation.»

— Pierre-Simon Laplace, 1819
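As a quick illustration of the sum and product rules, here is a minimal Python sketch on a made-up two-by-two joint distribution; the variable names and probabilities are purely illustrative and not from the slides.

```python
# Minimal sketch: the sum and product rules on a made-up discrete joint p(a, b).
# Names and numbers are purely illustrative.
joint = {
    ("chocolate", "nobel"): 0.01,
    ("chocolate", "no_nobel"): 0.49,
    ("no_chocolate", "nobel"): 0.01,
    ("no_chocolate", "no_nobel"): 0.49,
}

# Normalization: the joint distribution sums to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-12

def marginal_a(a):
    # Sum rule (marginalization): p(a) = sum_b p(a, b)
    return sum(p for (ai, bi), p in joint.items() if ai == a)

def conditional_b_given_a(b, a):
    # Product rule (conditioning): p(a, b) = p(b|a) p(a), so p(b|a) = p(a, b) / p(a)
    return joint[(a, b)] / marginal_a(a)

print(marginal_a("chocolate"))                      # 0.5
print(conditional_b_given_a("nobel", "chocolate"))  # 0.02
```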

Page 8: Bayesian Inference

Conditional probabilities

The probability of a given b is denoted by p(a|b).

In general, this is different from the probability of a alone (the marginal probability of a), as we can see by applying the sum and product rules:

p(a) = ∑_b p(a, b) = ∑_b p(a|b) p(b)

Because of the product rule, we also have the following rule (Bayes' theorem) for going from p(a|b) to p(b|a):

p(b|a) = p(a|b) p(b) / p(a) = p(a|b) p(b) / ∑_b′ p(a|b′) p(b′)

Page 9: Bayesian Inference

The chocolate example

In our example, it is immediately clear that p(Nobel|chocolate) is very different from p(chocolate|Nobel). While the first is hopeless to determine directly, the second is much easier to find out: ask Nobel laureates how much chocolate they eat. Once we know that, we can use Bayes' theorem:

p(Nobel|chocolate, m) = p(chocolate|Nobel, m) p(Nobel|m) / p(chocolate|m)

posterior = likelihood × prior / evidence, where m denotes the model.

Inference on the quantities of interest in fMRI/DCM studies has exactly the same general structure.
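A minimal numeric sketch of this inversion (every probability below is invented purely for illustration; none of them come from the slides or from Messerli (2012)):

```python
# Hedged sketch: Bayes' theorem for the chocolate example with invented numbers.
p_nobel = 1e-6                    # prior p(Nobel | m): assumed base rate of winning a Nobel prize
p_choc_given_nobel = 0.6          # likelihood p(chocolate | Nobel, m), e.g. from asking laureates
p_choc_given_not_nobel = 0.3      # p(chocolate | not Nobel, m)

# evidence p(chocolate | m) via the sum rule
p_choc = p_choc_given_nobel * p_nobel + p_choc_given_not_nobel * (1 - p_nobel)

# posterior p(Nobel | chocolate, m) via Bayes' theorem
p_nobel_given_choc = p_choc_given_nobel * p_nobel / p_choc
print(p_nobel_given_choc)   # ~2e-6: eating chocolate barely moves the posterior
```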

Page 10: Bayesian Inference

Inference in SPM

forward problem: likelihood p(y|ϑ, m)

inverse problem: posterior distribution p(ϑ|y, m)

Page 11: Bayesian Inference

Inference in SPM

Likelihood: p(y|ϑ, m)

Prior: p(ϑ|m)

Likelihood and prior together constitute the generative model.

Bayes' theorem: p(ϑ|y, m) = p(y|ϑ, m) p(ϑ|m) / p(y|m)
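As a hedged illustration of how such a generative model can be inverted numerically (a toy sketch, not SPM code): a single data point y, a Gaussian likelihood, and a Gaussian prior over ϑ, with the posterior evaluated on a grid via Bayes' theorem. All names and numbers are invented.

```python
import numpy as np

# Hedged sketch (not SPM code): Bayes' theorem on a parameter grid for a toy
# generative model m, where y ~ N(theta, sigma^2) and theta has a Gaussian prior.
sigma = 1.0
y = 1.3                                   # a single observed data point
theta = np.linspace(-10.0, 10.0, 2001)    # grid over the parameter
dtheta = theta[1] - theta[0]

likelihood = np.exp(-0.5 * ((y - theta) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma**2)
prior = np.exp(-0.5 * (theta / 2.0) ** 2) / np.sqrt(2 * np.pi * 2.0**2)   # p(theta|m) = N(0, 2^2)

evidence = np.sum(likelihood * prior) * dtheta    # p(y|m)
posterior = likelihood * prior / evidence         # p(theta|y, m)

print(evidence, np.sum(posterior) * dtheta)       # the posterior integrates to 1
```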

Page 12: Bayesian Inference

A simple example of Bayesian inference (adapted from Jaynes (1976))

Two manufacturers, A and B, deliver the same kind of components that turn out to have the following lifetimes (in hours):

A: [table of sample lifetimes]   B: [table of sample lifetimes]

Assuming prices are comparable, from which manufacturer would you buy?

Page 13: Bayesian Inference

A simple example of Bayesian inference

How do we compare such samples?

• By comparing their arithmetic means

Why do we take means?

• If we take the mean as our estimate, the error in our estimate is the mean of the errors in the individual measurements
• Taking the mean as maximum-likelihood estimate implies a Gaussian error distribution (see the sketch below)
• A Gaussian error distribution appropriately reflects our prior knowledge about the errors whenever we know nothing about them except perhaps their variance
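A brief hedged sketch of that second point (the lifetimes are invented, not the slide's data): maximizing a Gaussian log-likelihood over the location parameter recovers the arithmetic mean.

```python
import numpy as np

# Hedged sketch: the sample mean maximizes a Gaussian log-likelihood.
# The lifetimes below are invented; they are not the values from the slides.
x = np.array([41.0, 38.5, 44.2, 40.1, 39.7])

def gaussian_loglik(mu, data, sigma=1.0):
    # log p(data | mu, sigma), up to an additive constant
    return -0.5 * np.sum((data - mu) ** 2) / sigma**2

mus = np.linspace(x.min(), x.max(), 10001)
logliks = np.array([gaussian_loglik(mu, x) for mu in mus])
print(mus[np.argmax(logliks)], x.mean())   # both ≈ 40.7: the ML estimate is the mean
```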

Page 14: Bayesian Inference

A simple example of Bayesian inference

What next?

• Let's do a t-test (but first, let's compare variances with an F-test):

Variances not significantly different! Means not significantly different!

Is this satisfactory? No, so what can we learn by turning to probability theory (i.e., Bayesian inference)?
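For concreteness, a hedged sketch of this classical analysis in Python; the lifetime values are invented placeholders (the slide's data are not reproduced here), so the p-values will not match the slide.

```python
import numpy as np
from scipy import stats

# Hedged sketch of the classical comparison; the lifetimes are invented placeholders.
a = np.array([41.0, 38.5, 44.2, 40.1, 39.7, 42.3, 37.9, 43.0, 40.6])
b = np.array([44.0, 42.5, 47.8, 39.9])

# F-test for equality of variances: ratio of sample variances against an F distribution.
f = np.var(a, ddof=1) / np.var(b, ddof=1)
p_f = 2 * min(stats.f.cdf(f, len(a) - 1, len(b) - 1),
              stats.f.sf(f, len(a) - 1, len(b) - 1))

# Two-sample t-test assuming equal variances.
t, p_t = stats.ttest_ind(a, b, equal_var=True)

# Small p-values would be read as "significantly different" in the classical sense.
print(p_f, p_t)
```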

Page 15: Bayesian Inference

A simple example of Bayesian inference

The procedure in brief:

• Determine your question of interest («What is the probability that...?»)
• Specify your model (likelihood and prior)
• Calculate the full posterior using Bayes' theorem
• [Pass to the uninformative limit in the parameters of your prior]
• Integrate out any nuisance parameters
• Ask your question of interest of the posterior

All you need is the rules of probability theory.

(Ok, sometimes you'll encounter a nasty integral – but that's a technical difficulty, not a conceptual one.)

Page 16: Bayesian Inference

A simple example of Bayesian inference

The question:

• What is the probability that the components from manufacturer B have a longer lifetime than those from manufacturer A?
• More specifically: given how much more expensive they are, how much longer do I require the components from B to live?
• Example of a decision rule: if the components from B live 3 hours longer than those from A with a probability of at least 80%, I will choose those from B.

Page 17: Bayesian Inference

A simple example of Bayesian inference

The model (bear with me, this will turn out to be simple):

• likelihood (Gaussian): p(xᵢ|μ, λ) = N(xᵢ; μ, λ⁻¹), i.e. the measurements xᵢ are independent and Gaussian with mean μ and precision (inverse variance) λ

• prior (Gaussian-gamma): p(μ, λ|μ₀, κ₀, a₀, b₀) = N(μ; μ₀, (κ₀λ)⁻¹) Gam(λ; a₀, b₀)

Page 18: Bayesian Inference

A simple example of Bayesian inference

The posterior (Gaussian-gamma):

p(μ, λ|x₁, …, xₙ) = N(μ; μₙ, (κₙλ)⁻¹) Gam(λ; aₙ, bₙ)

Parameter updates:

μₙ = (κ₀μ₀ + n x̄)/(κ₀ + n),  κₙ = κ₀ + n,  aₙ = a₀ + n/2,  bₙ = b₀ + (n/2)(s² + κ₀(x̄ − μ₀)²/(κ₀ + n))

with x̄ = (1/n) ∑ᵢ xᵢ and s² = (1/n) ∑ᵢ (xᵢ − x̄)²
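A compact hedged sketch of these conjugate updates in Python (the function name and test values are mine, not the slides'):

```python
import numpy as np

# Hedged sketch: Gaussian-gamma (normal-gamma) conjugate updates for a Gaussian
# likelihood with unknown mean mu and precision lambda. Data values are invented.
def normal_gamma_update(x, mu0=0.0, kappa0=1e-6, a0=1e-6, b0=1e-6):
    """Return the posterior parameters (mu_n, kappa_n, a_n, b_n)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    s2 = x.var()                      # ML variance, (1/n) * sum (x_i - xbar)^2
    mu_n = (kappa0 * mu0 + n * xbar) / (kappa0 + n)
    kappa_n = kappa0 + n
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * n * (s2 + kappa0 * (xbar - mu0) ** 2 / (kappa0 + n))
    return mu_n, kappa_n, a_n, b_n

# With a nearly uninformative prior (tiny kappa0, a0, b0) the posterior parameters
# reduce to (xbar, n, n/2, n*s2/2): only n, the mean, and the variance matter.
print(normal_gamma_update([41.0, 38.5, 44.2, 40.1, 39.7]))
```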

Page 19: Bayesian Inference

A simple example of Bayesian inference

The limit for which the prior becomes uninformative:

• For κ₀ → 0, a₀ → 0, b₀ → 0, the updates reduce to:

μₙ = x̄,  κₙ = n,  aₙ = n/2,  bₙ = (n/2) s²

• As promised, this is really simple: all you need is n, the number of datapoints; x̄, their mean; and s², their variance.
• This means that only the data influence the posterior and all influence from the parameters of the prior has been eliminated.
• The uninformative limit should only ever be taken after the calculation of the posterior using a proper prior.

Page 20: Bayesian Inference

A simple example of Bayesian inference

Integrating out the nuisance parameter λ gives rise to a t-distribution for μ:

p(μ|x₁, …, xₙ) = St(μ; μₙ, bₙ/(aₙκₙ), 2aₙ),

a Student's t-distribution with location μₙ, squared scale bₙ/(aₙκₙ), and 2aₙ degrees of freedom; in the uninformative limit this becomes St(μ; x̄, s²/n, n).

Page 21: Bayesian Inference

A simple example of Bayesian inference

The joint posterior p(μ_A, μ_B|x_A, x_B) is simply the product of our two independent posteriors p(μ_A|x_A) and p(μ_B|x_B). It will now give us the answer to our question:

p(μ_B − μ_A > 3 | x_A, x_B)

Note that the t-test told us that there was «no significant difference» even though there is a >95% probability that the parts from B will last 3 hours longer than those from A.
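A hedged sketch of how such a posterior probability can be computed by Monte Carlo, sampling from the two marginal t posteriors in the uninformative limit. The lifetimes are invented placeholders, so the resulting number will not reproduce the >95% figure from the slides.

```python
import numpy as np
from scipy import stats

# Hedged sketch: P(mu_B - mu_A > 3) from the two independent Student-t posteriors
# obtained in the uninformative limit. The data are invented placeholders.
rng = np.random.default_rng(0)

def posterior_mu_samples(x, n_samples=200_000):
    x = np.asarray(x, dtype=float)
    n, xbar, s2 = x.size, x.mean(), x.var()   # ML variance
    # Uninformative-limit marginal posterior: St(mu; xbar, s2/n, df=n)
    return stats.t.rvs(df=n, loc=xbar, scale=np.sqrt(s2 / n),
                       size=n_samples, random_state=rng)

a = [41.0, 38.5, 44.2, 40.1, 39.7, 42.3, 37.9, 43.0, 40.6]
b = [44.0, 42.5, 47.8, 39.9]

mu_a = posterior_mu_samples(a)
mu_b = posterior_mu_samples(b)
print(np.mean(mu_b - mu_a > 3.0))   # Monte Carlo estimate of P(mu_B - mu_A > 3 | data)
```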

Page 22: Bayesian Inference

Bayesian inference

The procedure in brief:

• Determine your question of interest («What is the probability that...?»)
• Specify your model (likelihood and prior)
• Calculate the full posterior using Bayes' theorem
• [Pass to the uninformative limit in the parameters of your prior]
• Integrate out any nuisance parameters
• Ask your question of interest of the posterior

All you need is the rules of probability theory.

Page 23: Bayesian Inference

Frequentist (or: orthodox, classical) versus Bayesian inference: parameter estimation

Classical:

• define the null, e.g.: H₀: ϑ = 0
• estimate parameters, i.e. obtain the test statistic t ≡ t(Y) and its distribution p(t|H₀) under the null
• apply decision rule, i.e.: if p(t > t*|H₀) ≤ α then reject H₀

Bayesian:

• invert model, i.e. obtain the posterior pdf p(ϑ|y)
• define the null, e.g.: H₀: ϑ > 0
• apply decision rule, i.e.: if p(H₀|y) ≥ α then accept H₀
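A hedged sketch contrasting the two decision rules on the same invented one-sample data set (ϑ is the population mean; the flat prior in the uninformative limit on the Bayesian side is my choice for illustration, not necessarily the slides'):

```python
import numpy as np
from scipy import stats

# Hedged sketch: classical vs Bayesian decision rules for a one-sample problem
# (theta = population mean). The data are invented for illustration.
y = np.array([0.8, -0.3, 1.1, 0.4, 0.9, 0.2, 1.4, -0.1])
n, ybar, s = y.size, y.mean(), y.std(ddof=1)

# Classical: H0: theta = 0; test statistic t* and p(t > t* | H0).
t_star = ybar / (s / np.sqrt(n))
p_classical = stats.t.sf(t_star, df=n - 1)        # reject H0 if this is <= alpha

# Bayesian: posterior p(theta | y) in the uninformative limit is a Student-t;
# H0: theta > 0, accept H0 if p(H0 | y) >= alpha (the slide's rule).
s2_ml = y.var()                                   # ML variance
p_H0_given_y = stats.t.sf(0.0, df=n, loc=ybar, scale=np.sqrt(s2_ml / n))

print(p_classical, p_H0_given_y)
```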

Page 24: Bayesian Inference

Model comparison

• Principle of parsimony: «plurality should not be assumed without necessity»
• Automatically enforced by Bayesian model comparison

Model evidence (“Occam’s razor”):

p(y|m) = ∫ p(y|ϑ, m) p(ϑ|m) dϑ

[Figure: the model evidence p(y|m) plotted over the space of all data sets for two models y = f(x) of differing complexity.]
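To make the Occam effect concrete, here is a hedged numerical sketch (not from the slides): the evidence integral is approximated on a grid for two toy models that differ only in the width of their prior; every number is invented.

```python
import numpy as np

# Hedged sketch of the model evidence p(y|m) = ∫ p(y|theta, m) p(theta|m) dtheta,
# approximated on a grid for two toy models that differ only in prior width.
# A broader prior spreads its evidence over more data sets and so pays an
# "Occam penalty" when the data do not require the extra flexibility.
y = 0.5                                      # one observed data point
theta = np.linspace(-20.0, 20.0, 4001)
dtheta = theta[1] - theta[0]

def gauss(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (np.sqrt(2 * np.pi) * sd)

likelihood = gauss(y, theta, 1.0)            # p(y|theta, m), same for both models

evidence_simple = np.sum(likelihood * gauss(theta, 0.0, 1.0)) * dtheta    # narrow prior
evidence_complex = np.sum(likelihood * gauss(theta, 0.0, 10.0)) * dtheta  # broad prior
print(evidence_simple, evidence_complex)     # the simpler model wins for y near 0
```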

Page 25: Bayesian Inference

Frequentist (or: orthodox, classical) versus Bayesian inference: model comparison

• Define the null and the alternative hypothesis in terms of priors, e.g.:

H₀: p(ϑ|H₀) = 1 if ϑ = 0, 0 otherwise
H₁: p(ϑ|H₁) = N(ϑ; 0, Σ)

• Apply decision rule, i.e.: if P(H₀|y)/P(H₁|y) ≤ 1 then reject H₀

[Figure: the marginal likelihoods p(Y|H₀) and p(Y|H₁) plotted over the space of all data sets, with the observed data y marked.]

Page 26: Bayesian Inference

Applications of Bayesian inference


Page 27: Bayesian Inference

[Figure: the SPM analysis pipeline – realignment, smoothing, normalisation to a template, general linear model, Gaussian field theory, statistical inference at p < 0.05 – with the stages where Bayesian methods enter highlighted: segmentation and normalisation, posterior probability maps (PPMs), dynamic causal modelling, and multivariate decoding.]

Page 28: Bayesian Inference

Segmentation (mixture-of-Gaussians model)

[Figure: generative model for segmenting an image into grey matter, white matter, and CSF – the ith voxel value yᵢ depends on the ith voxel label cᵢ, with class means μ₁ … μₖ, class variances σ₁² … σₖ², and class frequencies.]

Page 29: Bayesian Inference

fMRI time series analysis

[Figure: a generative model of the fMRI time series – GLM coefficients with a prior variance, data noise with a prior variance, and AR coefficients for correlated noise – evaluated under a short-term memory design matrix (X) and a long-term memory design matrix (X). The resulting PPMs show regions best explained by the short-term memory model and regions best explained by the long-term memory model.]

Page 30: Bayesian Inference

Dynamic causal modeling (DCM)

[Figure: four candidate models m1–m4 of effective connectivity among stim, V1, V5, and PPC, differing in where attention modulates the network; a bar plot of the models' log marginal likelihoods ln p(y|m) favours m4, and the estimated effective synaptic strengths are shown for the best model (m4).]

Page 31: Bayesian Inference

Model comparison for group studies

[Figure: differences in log-model evidences, ln p(y|m₁) − ln p(y|m₂), plotted across subjects.]

Fixed effect: assume all subjects correspond to the same model.

Random effect: assume different subjects might correspond to different models.

Page 32: Bayesian Inference

Thanks


