
Bayesian inference

J. Daunizeau

Institute of Empirical Research in Economics, Zurich, Switzerland

Brain and Spine Institute, Paris, France

Overview of the talk

1 Probabilistic modelling and representation of uncertainty

1.1 Bayesian paradigm

1.2 Hierarchical models

1.3 Frequentist versus Bayesian inference

2 Numerical Bayesian inference methods

2.1 Sampling methods

2.2 Variational methods (ReML, EM, VB)

3 SPM applications

3.1 aMRI segmentation

3.2 Decoding of brain images

3.3 Model-based fMRI analysis (with spatial priors)

3.4 Dynamic causal modelling


Degree of plausibility desiderata:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)

Bayesian paradigm: probability theory basics

- normalization: $\int p(x)\,dx = 1$

- marginalization: $p(y) = \int p(x,y)\,dx$

- conditioning (Bayes rule): $p(x \mid y) = \dfrac{p(y \mid x)\,p(x)}{p(y)}$
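As an illustration, a minimal Python sketch of the three rules on a discrete joint distribution (the numbers are hypothetical, not from the talk):

```python
# Minimal sketch: normalization, marginalization and conditioning on a
# discrete joint distribution p(x, y) (rows = x, columns = y).
import numpy as np

p_xy = np.array([[0.10, 0.30],
                 [0.20, 0.40]])

print(p_xy.sum())                    # normalization: sums to 1
p_x = p_xy.sum(axis=1)               # marginalization: p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y             # conditioning: p(x|y) = p(x, y) / p(y)
# Bayes rule recovers the same thing from the likelihood and the prior:
p_y_given_x = (p_xy.T / p_x).T       # p(y|x)
print(p_y_given_x * p_x[:, None] / p_y)   # equals p_x_given_y
```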

Bayesian paradigm: deriving the likelihood function

- Model of data with unknown parameters: $y = f(\theta)$; e.g., GLM: $f(\theta) = X\theta$

- But data are noisy: $y = f(\theta) + \varepsilon$

- Assume the noise/residuals are 'small', e.g. zero-mean Gaussian: $p(\varepsilon) \propto \exp\left(-\frac{1}{2\sigma^2}\varepsilon^2\right)$, so that, e.g., $P(|\varepsilon| > 4) \le 0.05$

→ Distribution of the data, given fixed parameters (the likelihood):

$p(y \mid \theta) \propto \exp\left(-\frac{1}{2\sigma^2}\bigl(y - f(\theta)\bigr)^2\right)$
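A minimal sketch of this Gaussian GLM likelihood in Python (design matrix, noise level and parameter values are hypothetical):

```python
# Minimal sketch: log-likelihood of a GLM y = X @ theta + Gaussian noise.
import numpy as np

def log_likelihood(theta, y, X, sigma=1.0):
    """ln p(y | theta): Gaussian noise around the GLM prediction."""
    resid = y - X @ theta
    return (-0.5 * np.sum(resid**2) / sigma**2
            - 0.5 * len(y) * np.log(2 * np.pi * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
theta_true = np.array([1.0, -0.5])
y = X @ theta_true + rng.normal(scale=1.0, size=100)
print(log_likelihood(theta_true, y, X))   # high (near its maximum)
print(log_likelihood(np.zeros(2), y, X))  # lower, as expected
```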

Bayesian paradigm: likelihood, priors and the model evidence

- Likelihood: $p(y \mid \theta, m)$

- Prior: $p(\theta \mid m)$

- Bayes rule: $p(\theta \mid y, m) = \dfrac{p(y \mid \theta, m)\,p(\theta \mid m)}{p(y \mid m)}$

→ together, these specify the generative model $m$.

Bayesian paradigm: forward and inverse problems

- forward problem (likelihood): $p(y \mid \theta, m)$

- inverse problem (posterior distribution): $p(\theta \mid y, m)$

Bayesian paradigm: model comparison

Principle of parsimony: «plurality should not be assumed without necessity»

"Occam's razor": the model evidence $p(y \mid m)$ must be spread over the space of all data sets, so an over-flexible model assigns less evidence to any particular data set than a simpler model that still fits it.

Model evidence: $p(y \mid m) = \int p(y \mid \theta, m)\,p(\theta \mid m)\,d\theta$

[Figure: model fits $y = f(x)$ of varying complexity, and the model evidence $p(y \mid m)$ plotted over the space of all data sets]
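A minimal Python sketch of Occam's razor at work, computing the evidence integral by brute force on a grid (the models and the datapoint are hypothetical):

```python
# Minimal sketch: model evidence p(y|m) by grid marginalization for two
# toy models of a single datapoint y. The simpler model (tighter prior)
# wins when the data are unsurprising: Occam's razor.
import numpy as np
from scipy import stats

y = 0.8
theta = np.linspace(-10, 10, 2001)
dtheta = theta[1] - theta[0]
lik = stats.norm.pdf(y, loc=theta, scale=1.0)      # p(y|theta), shared

for name, prior_sd in [("simple m1", 1.0), ("complex m2", 5.0)]:
    prior = stats.norm.pdf(theta, scale=prior_sd)  # p(theta|m)
    evidence = np.sum(lik * prior) * dtheta        # p(y|m)
    print(name, evidence)                          # m1 > m2 here
```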

Hierarchical models: principle

[Figure: a multi-level generative model, illustrating hierarchy and causality]

Hierarchical models: directed acyclic graphs (DAGs)

Hierarchical models: univariate linear hierarchical model

[Figure: prior densities and posterior densities at each level of the hierarchy]
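A minimal sketch of a two-level univariate linear Gaussian model in Python (the variances and data value are hypothetical); the posterior is the standard precision-weighted combination of data and prior:

```python
# Minimal sketch: y = theta + e1 (level 1), theta = eta + e2 (level 2),
# all Gaussian. Posterior over theta follows from Gaussian conjugacy.
y = 2.0            # observed data
eta = 0.0          # second-level (prior) mean
v1, v2 = 1.0, 4.0  # level-1 noise variance, level-2 prior variance

post_prec = 1.0 / v1 + 1.0 / v2                 # precisions add
post_mean = (y / v1 + eta / v2) / post_prec     # precision-weighted mean
print(post_mean, 1.0 / post_prec)               # posterior mean, variance
```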

Frequentist versus Bayesian inference: a (quick) note on hypothesis testing

Classical SPM:
- define the null, e.g.: $H_0: \theta = 0$
- estimate parameters (obtain the test statistic $t^* = t(Y)$)
- apply the decision rule, i.e.: if $P(t > t^* \mid H_0) \le \alpha$ then reject $H_0$

Bayesian PPM:
- invert the model (obtain the posterior pdf $p(\theta \mid y)$)
- define the null, e.g.: $H_0: \theta > 0$
- apply the decision rule, i.e.: if $P(H_0 \mid y) \ge \alpha$ then accept $H_0$
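A minimal Python sketch contrasting the two decision rules on a toy Gaussian case (one observation $y \sim N(\theta, 1)$ with prior $\theta \sim N(0, 1)$; all numbers hypothetical):

```python
# Minimal sketch: classical p-value versus PPM-style posterior probability
# for a single Gaussian observation.
import numpy as np
from scipy import stats

y = 1.8

# classical: one-sided p-value for H0: theta = 0
p_value = 1.0 - stats.norm.cdf(y)                 # P(t > t* | H0)
print("p-value:", p_value)

# Bayesian: with prior N(0,1) and likelihood N(theta,1),
# the posterior p(theta|y) is N(y/2, 1/2)
post_mean, post_sd = y / 2, np.sqrt(0.5)
print("P(theta > 0 | y):", 1.0 - stats.norm.cdf(0, post_mean, post_sd))
```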

Frequentist versus Bayesian inference: what about bilateral tests?

- define the null and the alternative hypothesis in terms of priors, e.g.:

$H_0: p(\theta \mid H_0) = \begin{cases} 1 & \text{if } \theta = 0 \\ 0 & \text{otherwise} \end{cases}$

$H_1: p(\theta \mid H_1) = N(0, \Sigma)$

- apply the decision rule, i.e.: if $P(H_1 \mid y) \ge P(H_0 \mid y)$ then reject $H_0$

[Figure: $p(Y \mid H_0)$ and $p(Y \mid H_1)$ plotted over the space of all datasets]

- Savage-Dickey ratios (nested models, i.i.d. priors):

$p(y \mid H_0) = p(y \mid H_1)\,\dfrac{p(\theta = 0 \mid y, H_1)}{p(\theta = 0 \mid H_1)}$
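A minimal Python sketch of the Savage-Dickey ratio on the same toy Gaussian setting as above ($H_1: \theta \sim N(0,1)$, $H_0: \theta = 0$ nested; numbers hypothetical):

```python
# Minimal sketch: Savage-Dickey Bayes factor = posterior density at the
# nested value (theta = 0) divided by the prior density there.
import numpy as np
from scipy import stats

y = 1.8
post_mean, post_sd = y / 2, np.sqrt(0.5)      # p(theta | y, H1)

bf_01 = stats.norm.pdf(0, post_mean, post_sd) / stats.norm.pdf(0, 0, 1)
print("BF in favour of H0:", bf_01)           # < 1 here: evidence against H0
```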

Overview of the talk

1 Probabilistic modelling and representation of uncertainty

1.1 Bayesian paradigm

1.2 Hierarchical models

1.3 Frequentist versus Bayesian inference

2 Numerical Bayesian inference methods

2.1 Sampling methods

2.2 Variational methods (ReML, EM, VB)

3 SPM applications

3.1 aMRI segmentation

3.2 Decoding of brain images

3.3 Model-based fMRI analysis (with spatial priors)

3.4 Dynamic causal modelling

Sampling methods: MCMC example (Gibbs sampling)
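As an illustration of this slide, a minimal Gibbs sampler in Python for a bivariate Gaussian with correlation rho (a standard textbook case, not from the talk):

```python
# Minimal sketch: Gibbs sampling a bivariate standard Gaussian with
# correlation rho; each full conditional is a 1-D Gaussian.
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
n_samples = 10_000
x = np.zeros((n_samples, 2))

for t in range(1, n_samples):
    # alternately sample x1 | x2, then x2 | x1
    x1 = rng.normal(rho * x[t - 1, 1], np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    x[t] = [x1, x2]

print(np.corrcoef(x[1000:].T))   # close to rho after burn-in
```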

Variational methods: VB / EM / ReML

→ VB: maximize the free energy F(q) w.r.t. the "variational" posterior q(θ) under some (e.g., mean field, Laplace) approximation

[Figure: mean-field approximation: factorized marginals $q(\theta_1)$ and $q(\theta_2)$ approximating $p(\theta_1 \mid y, m)$ and $p(\theta_2 \mid y, m)$, versus the joint posterior $p(\theta_1, \theta_2 \mid y, m)$]
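A minimal Python sketch of the free-energy bound on a 1-D toy problem (conjugate Gaussian prior and likelihood, grid integration; all numbers hypothetical): F(q) never exceeds ln p(y), with equality when q is the true posterior.

```python
# Minimal sketch: F(q) = E_q[ln p(y,theta)] - E_q[ln q(theta)] on a grid,
# compared against ln p(y) for a conjugate Gaussian toy model.
import numpy as np
from scipy import stats

y = 1.5
theta = np.linspace(-8, 8, 4001)
d = theta[1] - theta[0]

log_joint = stats.norm.logpdf(y, theta, 1.0) + stats.norm.logpdf(theta, 0, 1.0)
log_ev = np.log(np.sum(np.exp(log_joint)) * d)        # ln p(y)

def free_energy(m, s):
    q = stats.norm.pdf(theta, m, s)
    return np.sum(q * (log_joint - np.log(q + 1e-300))) * d

print(log_ev)
print(free_energy(y / 2, np.sqrt(0.5)))   # q = true posterior: equals ln p(y)
print(free_energy(0.0, 2.0))              # poorer q: strictly lower
```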

Overview of the talk

1 Probabilistic modelling and representation of uncertainty

1.1 Bayesian paradigm

1.2 Hierarchical models

1.3 Frequentist versus Bayesian inference

2 Numerical Bayesian inference methods

2.1 Sampling methods

2.2 Variational methods (ReML, EM, VB)

3 SPM applications

3.1 aMRI segmentation

3.2 Decoding of brain images

3.3 Model-based fMRI analysis (with spatial priors)

3.4 Dynamic causal modelling

[Figure: the SPM pipeline: realignment → smoothing → general linear model → statistical inference via Gaussian field theory (p < 0.05), with normalisation to a template; Bayesian counterparts: segmentation and normalisation, posterior probability maps (PPMs), dynamic causal modelling, multivariate decoding]

aMRI segmentation: mixture of Gaussians (MoG) model

[Figure: generative model for tissue classes (grey matter, white matter, CSF): the $i$th voxel value $y_i$ depends on the $i$th voxel label $c_i$ through the class means $\mu_1, \ldots, \mu_k$, class variances $\sigma_1^2, \ldots, \sigma_k^2$, and class frequencies $\lambda_1, \ldots, \lambda_k$]
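A minimal Python sketch of fitting such a MoG to voxel intensities (simulated, hypothetical values; SPM's unified segmentation additionally uses spatial tissue priors, which this sketch ignores):

```python
# Minimal sketch: 3-class mixture of Gaussians on simulated 1-D voxel
# intensities, fitted by EM via scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# hypothetical intensities for CSF, grey matter, white matter
y = np.concatenate([rng.normal(30, 5, 2000),
                    rng.normal(60, 6, 5000),
                    rng.normal(90, 4, 3000)])

mog = GaussianMixture(n_components=3, random_state=0).fit(y.reshape(-1, 1))
print("class means:      ", mog.means_.ravel())
print("class variances:  ", mog.covariances_.ravel())
print("class frequencies:", mog.weights_)
# posterior label probabilities per voxel: mog.predict_proba(y.reshape(-1, 1))
```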

Decoding of brain images: recognizing brain states from fMRI

[Figure: paradigm: fixation cross (+), pacing cue (>>), paced response]

[Figure: log-evidence of X-Y sparse mappings (effect of lateralization) and of X-Y bilateral mappings (effect of spatial deployment)]

fMRI time series analysis: spatial priors and model comparison

[Figure: hierarchical model of the fMRI time series: GLM coefficients, prior variance of the GLM coefficients, prior variance of the data noise, AR coefficients (correlated noise); short-term memory design matrix (X) versus long-term memory design matrix (X)]

[Figure: PPM of regions best explained by the short-term memory model, and PPM of regions best explained by the long-term memory model]

Dynamic Causal Modelling: network structure identification

[Figure: four candidate models m1, m2, m3, m4 of the attention-to-motion network (stim → V1 → V5, with PPC), each placing the modulatory effect of attention on a different connection]

[Figure: the models' marginal likelihoods $\ln p(y \mid m)$, favouring m4]

[Figure: estimated effective synaptic strengths for the best model (m4): e.g. 1.25, 0.46, 0.39, 0.26, 0.26, 0.13, 0.10]
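A minimal Python sketch of how such log-evidences translate into posterior model probabilities (the four values are hypothetical, not the talk's results):

```python
# Minimal sketch: posterior model probabilities from log-evidences
# ln p(y|m), assuming a uniform prior over models.
import numpy as np

log_ev = np.array([2.0, 5.0, 8.0, 15.0])   # hypothetical ln p(y|m), m1..m4
log_ev -= log_ev.max()                      # subtract max for stability
p_m = np.exp(log_ev) / np.exp(log_ev).sum()
print(p_m)                                  # P(m|y); the best model dominates
```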

DCMs and DAGs: a note on causality

[Figure: a three-node network (1, 2, 3) unrolled over time: the state equation $\dot{x} = f(x, \theta, u)$, couplings such as $a_{21}$, $a_{32}$, $a_{13}$, and exogenous inputs $u(t)$]

Dynamic Causal Modelling: model comparison for group studies

[Figure: differences in log-model evidences, $\ln p(y \mid m_1) - \ln p(y \mid m_2)$, across subjects]

- fixed effect: assume all subjects correspond to the same model
- random effect: assume different subjects might correspond to different models
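A minimal Python sketch of the fixed-effects case (per-subject values are hypothetical): under the fixed-effect assumption, subject-level log Bayes factors simply add; a random-effects analysis instead treats the model as a random variable across subjects (not shown here).

```python
# Minimal sketch: fixed-effects group comparison by summing per-subject
# log Bayes factors ln p(y|m1) - ln p(y|m2).
import numpy as np

log_bf = np.array([1.2, -0.4, 2.1, 0.8, 1.5])  # hypothetical subjects
group_log_bf = log_bf.sum()
print(group_log_bf)   # a group log-BF > 3 is often read as strong evidence
```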

Thank you for your attention.

A note on statistical significance: lessons from the Neyman-Pearson lemma

- Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test

$\Lambda(y) = \dfrac{p(y \mid H_1)}{p(y \mid H_0)} \ge u$

is the most powerful test of size $\alpha = P(\Lambda(y) \ge u \mid H_0)$ for testing the null.

- What is the threshold u above which the Bayes factor test yields an error I rate of 5%?

[Figure: ROC analysis, plotting 1 − error II rate against error I rate: MVB (Bayes factor), u = 1.09, power = 56%; CCA (F-statistic), F = 2.20, power = 20%]
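A minimal Python sketch answering that question by simulation for a toy Gaussian example ($H_0: \theta = 0$ vs $H_1: \theta = 1$, unit variance; not the MVB/CCA analysis of the slide):

```python
# Minimal sketch: find by simulation the threshold u at which the
# likelihood-ratio (Bayes factor) test has a 5% error I rate, then
# estimate its power under H1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sim = 100_000
y0 = rng.normal(0.0, 1.0, n_sim)  # data simulated under H0

# likelihood ratio p(y|H1)/p(y|H0) for each simulated dataset
lr0 = stats.norm.pdf(y0, loc=1.0) / stats.norm.pdf(y0, loc=0.0)
u = np.quantile(lr0, 0.95)        # reject when LR >= u -> 5% under H0
print("threshold u:", u)

# power: probability of rejection under H1
y1 = rng.normal(1.0, 1.0, n_sim)
lr1 = stats.norm.pdf(y1, loc=1.0) / stats.norm.pdf(y1, loc=0.0)
print("power:", (lr1 >= u).mean())
```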