Bayesian inference and Bayesian model selection

Klaas Enno Stephan
Page 1:

Bayesian inference and Bayesian model selection

Klaas Enno Stephan

Page 2:

Lecture as part of "Methods & Models for fMRI data analysis",

University of Zurich & ETH Zurich, 27 November 2018

With slides from and many thanks to:

Kay Brodersen,

Will Penny,

Sudhir Shankar Raman

Page 3:

Why should I know about Bayesian inference?

Because Bayesian principles are fundamental for

• statistical inference in general

• system identification

• translational neuromodeling ("computational assays")

– computational psychiatry

– computational neurology

– computational psychosomatics

• contemporary theories of brain function (the "Bayesian brain")

– predictive coding

– free energy principle

– active inference


Page 5:

posterior = likelihood ∙ prior / evidence

Bayes' theorem

The Reverend Thomas Bayes

(1702-1761)

p(θ|y) = p(y|θ) · p(θ) / p(y)

“… the theorem expresses how a … degree of belief should rationally change to account for availability of related evidence.” (Wikipedia)

Page 6:

Bayes' theorem

The Reverend Thomas Bayes

(1702-1761)

p(θ|y) = p(y|θ) · p(θ) / Σ_θ′ p(y|θ′) · p(θ′)

“… the theorem expresses how a … degree of belief should rationally change to account for availability of related evidence.” (Wikipedia)

posterior = likelihood ∙ prior / evidence

Page 7:

Bayesian inference: an animation

Page 8:

The evidence term

discrete: p(θ|y) = p(y|θ) · p(θ) / Σ_θ′ p(y|θ′) · p(θ′)

continuous: p(θ|y) = p(y|θ) · p(θ) / ∫ p(y|θ′) · p(θ′) dθ′

Page 9:

Bayesian inference: An example (with fictitious probabilities)

p(θ=1|y=1) = p(y=1|θ=1) · p(θ=1) / Σ_θ′∈{0,1} p(y=1|θ′) · p(θ′)

• symptom:

y=1: fever

y=0: no fever

• disease:

θ=1: Ebola

θ=0: any other disease (AOD)

• A priori: p(Ebola) = 10⁻⁶

p(AOD) = 1 − 10⁻⁶

• A patient presents with fever. What is the probability that he/she has Ebola?

Likelihood p(y|θ):

                 θ=1: Ebola    θ=0: AOD
y=1: fever          99.99%         20%
y=0: no fever        0.01%         80%

Page 10:

Bayesian inference: An example (with fictitious probabilities)

• symptom:

y=1: fever

y=0: no fever

• disease:

θ=1: Ebola

θ=0: any other disease (AOD)

• A priori: p(Ebola) = 10⁻⁶

p(AOD) = 1 − 10⁻⁶

• A patient presents with fever. What is the probability that he/she has Ebola?

Likelihood p(y|θ):

                 θ=1: Ebola    θ=0: AOD
y=1: fever          99.99%         20%
y=0: no fever        0.01%         80%

p(θ=1|y=1) = (0.999 · 10⁻⁶) / (0.999 · 10⁻⁶ + 0.2 · (1 − 10⁻⁶)) ≈ 4.995 · 10⁻⁶
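To make the arithmetic concrete, here is a minimal sketch in plain Python (the variable names are mine; the probabilities are the fictitious ones from the slide):

```python
p_ebola = 1e-6                  # prior p(theta=1)
p_aod = 1 - p_ebola             # prior p(theta=0)
p_fever_given_ebola = 0.999     # likelihood p(y=1|theta=1), as used in the calculation above
p_fever_given_aod = 0.2         # likelihood p(y=1|theta=0)

evidence = p_fever_given_ebola * p_ebola + p_fever_given_aod * p_aod
posterior = p_fever_given_ebola * p_ebola / evidence
print(posterior)                # ~4.995e-06: fever barely moves the extremely small prior
```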

Page 11:

1. specify the joint probability over data (observations) and parameters

2. enforce mechanistic thinking: how could the data have been caused?

3. generate synthetic data (observations) by sampling from the prior – can

model explain certain phenomena at all?

4. inference about parameters → p(θ|y)

5. model evidence p(y|m): index of model quality

Generative models

likelihood p(y|θ, m)    prior p(θ|m)

Page 12:

Observation of data

Formulation of a generative model

Model inversion – updating one's beliefs

p(θ|y) ∝ p(y|θ) · p(θ)

Model

likelihood function p(y|θ)

prior distribution p(θ)

Measurement data y

posterior

model evidence

Bayesian inference in practice

Page 13:

Priors

• Objective priors:

– "non-informative" priors

– objective constraints (e.g., non-negativity)

• Subjective priors:

– subjective but not arbitrary

– can express beliefs that result from

understanding of the problem or system

– can be result of previous empirical results

• Shrinkage priors:

– emphasize regularization and sparsity

• Empirical priors:

– learn parameters of prior distributions from

the data ("empirical Bayes")

– rest on a hierarchical model

Example of a shrinkage prior

Page 14:

A generative modelling framework for fMRI & EEG:

Dynamic causal modeling (DCM)

Friston et al. 2003, NeuroImage

dx/dt = f(x, u, θ)

y = g(x, θ)

Model inversion:

Inferring neuronal

parameters

EEG, MEG fMRI

Forward model:

Predicting measured

activity

dwMRI

Stephan et al. 2009, NeuroImage

Page 15:

DCM for fMRI

Stephan et al. 2015, Neuron

Page 16:

[Figure: simulated neural population activity and fMRI signal change (%) over time for three coupled regions x1, x2, x3, driven by inputs u1 and u2]

Nonlinear Dynamic Causal Model for fMRI:

dx/dt = (A + Σ_i u_i B^(i) + Σ_j x_j D^(j)) x + C u   (sums over i = 1…m inputs and j = 1…n states)

Stephan et al. 2008, NeuroImage

Page 17:

Bayesian system identification

Design experimental inputs u(t)

Define likelihood model:

– neural dynamics: dx/dt = f(x, u, θ)

– observer function: y = g(x, θ)

– p(y|θ, m) = N(g(θ), Σ(λ))

Specify priors: p(θ|m) = N(μ_θ, Σ_θ)

Invert model:

– p(θ|y, m) = p(y|θ, m) · p(θ|m) / p(y|m)

– p(y|m) = ∫ p(y|θ, m) · p(θ|m) dθ

Make inferences: inference on parameters; inference on model structure

Page 18:

Why should I know about Bayesian inference?

Because Bayesian principles are fundamental for

• statistical inference in general

• system identification

• translational neuromodeling ("computational assays")

– computational psychiatry

– computational neurology

– computational psychosomatics

• contemporary theories of brain function (the "Bayesian brain")

– predictive coding

– free energy principle

– active inference

Page 19:

Generative models as "computational assays"

likelihood p(y|θ, m)    prior p(θ|m)    posterior p(θ|y, m)

Page 20:

Application to brain activity and

behaviour of individual patients

Computational assays:

Models of disease mechanisms

Detecting physiological subgroups

(based on inferred mechanisms)

Translational Neuromodeling

Individual treatment prediction

disease mechanism A

disease mechanism B

disease mechanism C

dx/dt = f(x, u, θ)

Stephan et al. 2015, Neuron

Page 21:

Generative embedding (supervised)

Brodersen et al. 2011, PLoS Comput. Biol.

step 2 —

kernel construction

step 1 —

model inversion

measurements from

an individual subject

subject-specific

inverted generative model

subject representation in the

generative score space

A → B

A → C

B → B

B → C

A

CB

step 3 —

support vector classification

separating hyperplane fitted to

discriminate between groups

A

CB

jointly discriminative

model parameters

step 4 —

interpretation

Page 22:

Generative embedding (unsupervised)

Brodersen et al. 2014, NeuroImage Clinical

Page 23:

SYMPTOM

(behaviour

or physiology)

HYPOTHETICAL

MECHANISM...

p(m_k|y)    p(y|θ, m_k)

y

m_1 … m_k … m_K

Differential diagnosis by model selection

p(m_k|y) = p(y|m_k) · p(m_k) / Σ_k′ p(y|m_k′) · p(m_k′)

Stephan et al. 2017, NeuroImage

Page 24:

Why should I know about Bayesian inference?

Because Bayesian principles are fundamental for

• statistical inference in general

• system identification

• translational neuromodeling ("computational assays")

– computational psychiatry

– computational neurology

– computational psychosomatics

• contemporary theories of brain function (the "Bayesian brain")

– predictive coding

– free energy principle

– active inference

Page 25:

Perception = inversion of a hierarchical generative model

environm. states

others' mental states

bodily states

forward model: p(y|x, m) · p(x|m)

perception (model inversion): p(x|y, m)

neuronal states

Page 27:

How is the posterior computed =

how is a generative model inverted?

Bayesian Inference

Approximate Inference

Variational Bayes

MCMC Sampling

Analytical solutions

Page 28:

How is the posterior computed = how is a generative

model inverted?

• compute the posterior analytically

– requires conjugate priors

• variational Bayes (VB)

– often hard work to derive, but fast to compute

– uses approximations (approx. posterior, mean field)

– problems: local minima, potentially inaccurate approximations

• Sampling: Markov Chain Monte Carlo (MCMC)

– theoretically guaranteed to be accurate (for infinite computation time)

– problems: may require very long run time in practice, convergence difficult

to prove

Page 29:

Conjugate priors

• for a given likelihood function, the choice of prior determines the algebraic

form of the posterior

• for some probability distributions a prior can be found such that the posterior

has the same algebraic form as the prior

• such a prior is called “conjugate” to the likelihood

• examples:

– Normal × Normal → Normal

– Beta × Binomial → Beta

– Dirichlet × Multinomial → Dirichlet

p(θ|y) ∝ p(y|θ) · p(θ)   (posterior and prior have the same algebraic form)
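As an illustration of the Beta × Binomial → Beta case listed above, a small sketch (the prior pseudo-counts and the data are made-up numbers):

```python
# Beta prior x Binomial likelihood -> Beta posterior (same algebraic form as the prior).
a, b = 2.0, 2.0          # prior Beta(a, b); illustrative pseudo-counts
k, n = 7, 10             # observed successes and trials; illustrative data

a_post, b_post = a + k, b + n - k       # conjugate update
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)   # Beta(9, 5), mean ~0.64
```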

Page 30:

Likelihood & Prior

Posterior:

Posterior mean =

variance-weighted combination of

prior mean and data mean

Prior

Likelihood

Posterior

y

Posterior mean & variance of univariate Gaussians

Likelihood: p(y|θ) = N(θ, σ_e²)

Prior: p(θ) = N(μ_p, σ_p²)

Posterior: p(θ|y) = N(μ, σ²), with

1/σ² = 1/σ_e² + 1/σ_p²

μ = σ² · (μ_p/σ_p² + y/σ_e²)

Page 31:

Likelihood & prior

Posterior:

Prior

Likelihood

Posterior

Same thing – but expressed as precision weighting

Likelihood: p(y|θ) = N(θ, λ_e⁻¹)

Prior: p(θ) = N(μ_p, λ_p⁻¹)

Posterior: p(θ|y) = N(μ, λ⁻¹), with

λ = λ_e + λ_p

μ = (λ_e/λ) · y + (λ_p/λ) · μ_p

Relative precision weighting

y
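A minimal numerical sketch of this precision weighting (the numbers are illustrative, not from the lecture):

```python
mu_p, lam_p = 0.0, 1.0     # prior mean and precision
y, lam_e = 2.0, 4.0        # observation and likelihood (noise) precision

lam_post = lam_e + lam_p                          # precisions add
mu_post = (lam_e * y + lam_p * mu_p) / lam_post   # precision-weighted combination
print(mu_post, 1.0 / lam_post)                    # posterior mean 1.6, variance 0.2
```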

Page 32:

Variational Bayes (VB)

best proxy: q(θ)

true posterior: p(θ|y)

hypothesis class

divergence: KL[q‖p]

Idea: find an approximate density q(θ) that is maximally similar to the true posterior p(θ|y).

This is often done by assuming a particular form for 𝑞 (fixed form VB) and

then optimizing its sufficient statistics.

Page 33:

Kullback–Leibler (KL) divergence

• asymmetric measure of the difference

between two probability distributions P

and Q

• Interpretations of DKL(P‖Q):

– "Bayesian surprise" when Q=prior,

P=posterior: measure of the

information gained when one

updates one's prior beliefs to the

posterior P

– a measure of the information lost

when Q is used to approximate P

• non-negative: DKL(P‖Q) ≥ 0 (zero only when P = Q)
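For univariate Gaussians the KL divergence has a closed form, which makes the asymmetry and the "Bayesian surprise" reading easy to check numerically; a sketch (the function and numbers are illustrative, reusing the precision-weighting example from the following slides):

```python
import numpy as np

def kl_gauss(mu_p, var_p, mu_q, var_q):
    """D_KL(P||Q) for univariate Gaussians P = N(mu_p, var_p), Q = N(mu_q, var_q)."""
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

# Bayesian surprise: Q = prior N(0, 1), P = posterior N(1.6, 0.2)
print(kl_gauss(1.6, 0.2, 0.0, 1.0))   # information gained by the belief update (in nats)
print(kl_gauss(0.0, 1.0, 1.6, 0.2))   # different value: the divergence is asymmetric
```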

Page 34:

Variational calculus

Standard calculus (Newton, Leibniz, and others)

• functions f: x ↦ f(x)

• derivatives df/dx

Example: maximize the likelihood expression p(y|θ) w.r.t. θ

Variational calculus (Euler, Lagrange, and others)

• functionals F: f ↦ F[f]

• derivatives dF/df

Example: maximize the entropy H[p] w.r.t. a probability distribution p(x)

Leonhard Euler (1707–1783), Swiss mathematician, ‘Elementa Calculi Variationum’

Page 35:

Variational Bayes

F(q, y) is a functional w.r.t. the approximate posterior q(θ).

Maximizing F(q, y) is equivalent to:

• minimizing KL[q‖p]

• tightening F(q, y) as a lower bound to the log model evidence

When F(q, y) is maximized, q(θ) is our best estimate of the posterior.

ln p(y) = KL[q‖p] + F(q, y)

KL[q‖p]: divergence ≥ 0 (unknown).  F(q, y): neg. free energy (easy to evaluate for a given q).

[Figure: at initialization, KL[q‖p] is large and F(q, y) is a loose lower bound on ln p(y); at convergence, the divergence shrinks and F(q, y) approaches ln p(y)]
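The decomposition ln p(y) = KL[q‖p] + F(q, y) can be checked numerically for a toy discrete model; a sketch with made-up numbers:

```python
import numpy as np

prior = np.array([0.5, 0.5])          # p(theta) over two hypotheses
lik = np.array([0.8, 0.3])            # p(y|theta) for one observed y
joint = lik * prior                   # p(y, theta)
log_evidence = np.log(joint.sum())    # ln p(y)
posterior = joint / joint.sum()       # p(theta|y)

q = np.array([0.6, 0.4])              # an arbitrary approximate posterior
F = np.sum(q * (np.log(joint) - np.log(q)))       # neg. free energy <ln p(y,theta) - ln q>_q
KL = np.sum(q * (np.log(q) - np.log(posterior)))  # divergence from the true posterior
print(F + KL, log_evidence)           # identical: F is a lower bound, the gap is the KL term
```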

Page 36:

Derivation of the (negative) free energy approximation

• See whiteboard!

• (or Appendix to Stephan et al. 2007, NeuroImage 38: 387-401)

Page 37:

Mean field assumption

Factorize the approximate posterior q(θ) into independent partitions:

q(θ) = Π_i q_i(θ_i)

where q_i(θ_i) is the approximate posterior for the i-th subset of parameters.

For example, split parameters θ₁ and hyperparameters θ₂: q(θ) = q(θ₁) q(θ₂)

Jean Daunizeau, www.fil.ion.ucl.ac.uk/~jdaunize/presentations/Bayes2.pdf

Page 38:

VB in a nutshell (under mean-field approximation)

Mean field approximation:

p(θ, λ|y) ≈ q(θ, λ) = q(θ) q(λ)

Variational energies (iterative updating of the sufficient statistics of the approximate posteriors by gradient ascent):

q(θ) ∝ exp(I(θ)) = exp(⟨ln p(y, θ, λ)⟩_q(λ))

q(λ) ∝ exp(I(λ)) = exp(⟨ln p(y, θ, λ)⟩_q(θ))

Negative free-energy approximation to the model evidence:

ln p(y|m) = F + KL[q(θ, λ), p(θ, λ|y, m)]

F = ⟨ln p(y|θ, λ, m)⟩_q − KL[q(θ, λ), p(θ, λ|m)]

Maximising the negative free energy w.r.t. q = minimising the divergence, achieved by maximising the variational energies.

Page 39:

Model comparison and selection

Given competing hypotheses

on structure & functional

mechanisms of a system, which

model is the best?

For which model m does p(y|m)

become maximal?

Which model represents the

best balance between model

fit and model complexity?

Pitt & Myung (2002) TICS

Page 40:

Bayesian model selection (BMS)

Posterior model probability

p(m|y) = p(y|m) · p(m) / Σ_m′ p(y|m′) · p(m′)

• First step of inference: define model

space M

• Inference on model structure m:

• For a uniform prior on m, model

evidence sufficient for model

selection

m ∈ {1, …, M}

Model evidence:

p(y|m) = ∫ p(y|θ, m) · p(θ|m) dθ

Page 41:

Model evidence:

p(y|m) = ∫ p(y|θ, m) · p(θ|m) dθ

Various approximations:

- negative free energy (F)

- Akaike Information Criterion (AIC)

- Bayesian Information Criterion (BIC)

Bayesian model selection (BMS)

probability that data were

generated by model m,

averaging over all possible

parameter values (as specified

by the prior)

accounts for both accuracy and

complexity of the model

all possible datasets

y

p(y|m)

Ghahramani 2004

Page 42:

Model evidence:

p(y|m) = ∫ p(y|θ, m) · p(θ|m) dθ

Bayesian model selection (BMS)

“If I randomly sampled from my

prior and plugged the resulting

value into the likelihood

function, how close would the

predicted data be – on average

– to my observed data?”

accounts for both accuracy and

complexity of the model

all possible datasets

y

p(y|m)

Ghahramani 2004

Various approximations:

- negative free energy (F)

- Akaike Information Criterion (AIC)

- Bayesian Information Criterion (BIC)

Page 43:

Approximations to the model evidence

Logarithm is a monotonic function: maximizing the log model evidence = maximizing the model evidence.

Log model evidence = balance between fit and complexity:

log p(y|m) = accuracy(m) − complexity(m)
           = log p(y|θ, m) − complexity(m)

Akaike Information Criterion:

AIC = log p(y|θ, m) − p

Bayesian Information Criterion:

BIC = log p(y|θ, m) − (p/2) · log N

where p = number of parameters and N = number of data points.
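A small sketch of these two approximations, using the sign convention of the slide (accuracy minus a complexity penalty, so higher is better); the numbers are illustrative:

```python
import numpy as np

def aic(log_lik, p):
    """AIC as written above: log p(y|theta, m) minus the number of parameters p."""
    return log_lik - p

def bic(log_lik, p, N):
    """BIC as written above: log p(y|theta, m) minus (p/2) * log N."""
    return log_lik - 0.5 * p * np.log(N)

# Two hypothetical models fitted to N = 100 data points
print(aic(-120.0, 5), bic(-120.0, 5, 100))     # simpler model
print(aic(-115.0, 12), bic(-115.0, 12, 100))   # better fit, but more heavily penalized
```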

Page 44:

The (negative) free energy approximation F

F is a lower bound on the log model evidence:

log p(y|m) = F + KL[q(θ), p(θ|y, m)]

Like AIC/BIC, F is an accuracy/complexity tradeoff:

F = ⟨log p(y|θ, m)⟩_q − KL[q(θ), p(θ|m)]
  = accuracy − complexity

Page 45:

The (negative) free energy approximation

• Log evidence is thus expected log likelihood (wrt. q) plus 2 KL's:

log p(y|m) = ⟨log p(y|θ, m)⟩_q − KL[q(θ), p(θ|m)] + KL[q(θ), p(θ|y, m)]

F = log p(y|m) − KL[q(θ), p(θ|y, m)]
  = ⟨log p(y|θ, m)⟩_q − KL[q(θ), p(θ|m)]
  = accuracy − complexity

Page 46:

The complexity term in F

• In contrast to AIC & BIC, the complexity term of the negative free energy F

accounts for parameter interdependencies.

Under Gaussian assumptions about the posterior (Laplace approximation):

• The complexity term of F is higher

– the more independent the prior parameters (→ effective DFs)

– the more dependent the posterior parameters

– the more the posterior mean deviates from the prior mean

KL[q(θ), p(θ|m)] = ½ ln|C_θ| − ½ ln|C_θ|y| + ½ (μ_θ|y − μ_θ)ᵀ C_θ⁻¹ (μ_θ|y − μ_θ)

with prior mean and covariance μ_θ, C_θ and posterior mean and covariance μ_θ|y, C_θ|y.

Page 47:

Bayes factors

To compare two models, we could just compare their log evidences. But: the log evidence is just some number – not very intuitive! A more intuitive interpretation of model comparisons is made possible by Bayes factors:

B₁₂ = p(y|m₁) / p(y|m₂)   (a positive value in [0; ∞[)

Kass & Raftery classification:

B₁₂           p(m₁|y)    Evidence
1 to 3        50–75%     weak
3 to 20       75–95%     positive
20 to 150     95–99%     strong
≥ 150         ≥ 99%      very strong

Kass & Raftery 1995, J. Am. Stat. Assoc.
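A quick sketch of how a log evidence difference translates into a Bayes factor and a posterior model probability, assuming equal model priors (the log evidences are made up):

```python
import numpy as np

log_ev_m1, log_ev_m2 = -115.0, -118.0      # hypothetical log model evidences

B12 = np.exp(log_ev_m1 - log_ev_m2)        # Bayes factor of m1 vs m2
p_m1 = B12 / (B12 + 1.0)                   # p(m1|y) for p(m1) = p(m2) = 0.5
print(B12, p_m1)                           # ~20.1 and ~0.95
```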

Page 48:

Fixed effects BMS at group level

Group Bayes factor (GBF) for 1...K subjects:

Average Bayes factor (ABF):

Problems:

- blind with regard to group heterogeneity

- sensitive to outliers

GBF_ij = Π_k BF_ij^(k)

ABF_ij = (Π_k BF_ij^(k))^(1/K)
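A sketch of this fixed-effects pooling over subjects, computed on the log scale (the subject-wise log Bayes factors are illustrative); note how a single extreme subject can dominate the sum:

```python
import numpy as np

log_bf = np.array([2.1, 0.5, 3.0, -0.8, 1.2])   # log BF_ij per subject (illustrative)

log_gbf = log_bf.sum()                  # log GBF_ij = sum of subject-wise log Bayes factors
abf = np.exp(log_gbf / len(log_bf))     # ABF_ij = K-th root of the GBF (geometric mean BF)
print(np.exp(log_gbf), abf)
```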

Page 49:

r ~ Dir(r; α)

m_n ~ Mult(m; 1, r)

y_n ~ p(y_n|m_n)

Random effects BMS for heterogeneous groups

Dirichlet parameters

= “occurrences” of models in the population

Dirichlet distribution of model probabilities r

Multinomial distribution of model labels m

Measured data y

Model inversion

by Variational

Bayes or MCMC

Stephan et al. 2009, NeuroImage

Page 50:

Random effects BMS for heterogeneous groups

k = 1...K

n = 1...N

mnk

yn

rk

Dirichlet parameters

= “occurrences” of models in the population

Dirichlet distribution of model probabilities r

Multinomial distribution of model labels m

Measured data y

Model inversion

by Variational

Bayes or MCMC

Stephan et al. 2009, NeuroImage

Page 51:

Four equivalent options for reporting model ranking by

random effects BMS

1. Dirichlet parameter estimates

2. expected posterior probability of

obtaining the k-th model for any

randomly selected subject

3. exceedance probability that a

particular model k is more likely than

any other model (of the K models

tested), given the group data

4. protected exceedance probability:

see below

⟨r_k⟩ = α_k / (α_1 + … + α_K)

φ_k = p(r_k > r_j | y; α)   for all j ∈ {1…K}, j ≠ k
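Options 2 and 3 can be read off the posterior Dirichlet; the exceedance probability is conveniently estimated by sampling. A sketch using the α values from the example on page 53 (sampling is one common way to evaluate it, not necessarily the implementation used in SPM):

```python
import numpy as np

alpha = np.array([11.8, 2.2])            # posterior Dirichlet parameters (example on page 53)
rng = np.random.default_rng(0)
r = rng.dirichlet(alpha, size=100_000)   # samples of the model probabilities r

expected = alpha / alpha.sum()           # option 2: <r_k>
exceedance = (r.argmax(axis=1)[:, None] == np.arange(len(alpha))).mean(axis=0)  # option 3
print(expected)                          # ~[0.84, 0.16]
print(exceedance)                        # ~[0.997, 0.003]
```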

Page 52:

Example: Hemispheric interactions during vision

[Figure: the two DCMs compared (m1, m2), each comprising LG, FG and MOG in both hemispheres, driven by RVF and LVF stimulation, with interhemispheric connections modulated by the letter decision task (LD, LD|RVF, LD|LVF); plot of subject-wise log model evidence differences between m2 and m1]

Data: Stephan et al. 2003, Science

Models: Stephan et al. 2007, J. Neurosci.

Page 53:

[Figure: posterior density p(r₁|y) of the probability of model m₁]

p(r₁ > 0.5 | y) = 0.997

α₁ = 11.8,  ⟨r₁⟩ = 84.3%

α₂ = 2.2,  ⟨r₂⟩ = 15.7%

p(r₁ > r₂ | y) = 99.7%

Stephan et al. 2009a, NeuroImage

Page 54:

Example: Synaesthesia

• “projectors” experience

color externally colocalized

with a presented grapheme

• “associators” report an

internally evoked

association

• across all subjects: no

evidence for either model

• but BMS results map

precisely onto projectors

(bottom-up mechanisms)

and associators (top-down)

van Leeuwen et al. 2011, J. Neurosci.

Page 55:

Overfitting at the level of models

• ↑ number of models → ↑ risk of overfitting

• solutions:

– regularisation: definition of model

space = choosing priors p(m)

– family-level BMS

– Bayesian model averaging (BMA)

posterior model probability:

p(m|y) = p(y|m) · p(m) / Σ_m′ p(y|m′) · p(m′)

BMA:

p(θ|y) = Σ_m p(θ|y, m) · p(m|y)

Page 56:

Model space partitioning: comparing model families

• partitioning model space into K subsets

or families:

• pooling information over all models in

these subsets allows one to compute

the probability of a model family, given

the data

• effectively removes uncertainty about

any aspect of model structure, other

than the attribute of interest (which

defines the partition)

Stephan et al. 2009, NeuroImage

Penny et al. 2010, PLoS Comput. Biol.

M = {f_1, …, f_K}

p(f_k|y)

Page 57:

Family-level inference: random effects – a special case

• When the families are of equal size, one can simply sum the posterior model

probabilities within families by exploiting the agglomerative property of the

Dirichlet distribution:

Stephan et al. 2009, NeuroImage

r_1, r_2, …, r_K ~ Dir(α_1, α_2, …, α_K)

r_1* = Σ_k∈N₁ r_k,   r_2* = Σ_k∈N₂ r_k,   …,   r_J* = Σ_k∈N_J r_k

⟹   r_1*, r_2*, …, r_J* ~ Dir(Σ_k∈N₁ α_k, Σ_k∈N₂ α_k, …, Σ_k∈N_J α_k)
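A sketch of this agglomeration for two equally sized families (the α values are made up):

```python
import numpy as np

alpha = np.array([3.0, 1.5, 2.5, 1.0, 0.5, 1.5, 2.0, 2.0])   # posterior alphas for 8 models
families = [np.arange(0, 4), np.arange(4, 8)]                  # two families of equal size

alpha_family = np.array([alpha[idx].sum() for idx in families])  # summed within each family
family_prob = alpha_family / alpha_family.sum()                  # expected family probabilities
print(alpha_family, family_prob)
```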

Page 58:

Model space partitioning: comparing model families

[Figure: summed log evidences (log GBF, relative to RBML) and Dirichlet parameter estimates (FFX and RFX) for eight models – CBMN, CBMN(ε), CBML, CBML(ε), RBMN, RBMN(ε), RBML, RBML(ε) – partitioned into a nonlinear family (models 1–4) and a linear family (models 5–8); posterior density p(r₁|y) of the nonlinear family probability]

α₁* = Σ_k=1..4 α_k   (nonlinear family)

α₂* = Σ_k=5..8 α_k   (linear family)

⟨r₁⟩ = 73.5%,  ⟨r₂⟩ = 26.5%

p(r₁ > r₂ | y) = 98.6%

p(r₁ > 0.5 | y) = 0.986

Stephan et al. 2009, NeuroImage

Page 59:

Bayesian Model Averaging (BMA)

• abandons dependence of parameter

inference on a single model and takes into

account model uncertainty

• uses the entire model space considered (or

an optimal family of models)

• averages parameter estimates, weighted

by posterior model probabilities

• represents a particularly useful alternative

– when none of the models (or model

subspaces) considered clearly

outperforms all others

– when comparing groups for which the

optimal model differs

single-subject BMA:

p(θ|y) = Σ_m p(θ|y, m) · p(m|y)

group-level BMA:

p(θ_n|y_1..N) = Σ_m p(θ_n|y_n, m) · p(m|y_1..N)

NB: p(m|y_1..N) can be obtained by either FFX or RFX BMS

Penny et al. 2010, PLoS Comput. Biol.
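A sketch of the averaging step for a single parameter that exists in all models considered (the posterior model probabilities and model-specific estimates are illustrative; full BMA averages the entire posterior densities, not just their means):

```python
import numpy as np

p_m_given_y = np.array([0.6, 0.3, 0.1])               # p(m|y) from FFX or RFX BMS
theta_mean_per_model = np.array([0.42, 0.10, -0.05])  # posterior mean of the parameter per model

bma_estimate = np.sum(p_m_given_y * theta_mean_per_model)
print(bma_estimate)    # model-averaged estimate; model uncertainty enters via the weights
```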

Page 60:

Schmidt et al. 2013, JAMA Psychiatry

Prefrontal-parietal connectivity during

working memory in schizophrenia

• 17 at-risk mental

state (ARMS)

individuals

• 21 first-episode

patients

(13 non-treated)

• 20 controls

Page 61:

Schmidt et al. 2013, JAMA Psychiatry

BMS results for all groups

Page 62:

BMA results: PFC → PPC connectivity

Schmidt et al. 2013, JAMA Psychiatry

17 ARMS, 21 first-episode (13 non-treated),

20 controls

Page 63:

Protected exceedance probability:

Using BMA to protect against chance findings

• EPs express our confidence that the posterior probabilities of models are

different – under the hypothesis H1 that models differ in probability: r_k ≠ 1/K

• does not account for the possibility of the "null hypothesis" H0: r_k = 1/K

• Bayesian omnibus risk (BOR) of wrongly accepting H1 over H0:

• protected EP: Bayesian model averaging over H0 and H1:

Rigoux et al. 2014, NeuroImage

Page 64:

definition of model space

inference on model structure or inference on model parameters?

• inference on model structure:

– inference on individual models or on a model space partition?

– model space partition → comparison of model families using FFX or RFX BMS

– individual models → optimal model structure assumed to be identical across subjects? yes → FFX BMS; no → RFX BMS

• inference on model parameters:

– parameters of an optimal model or parameters of all models?

– all models → BMA

– optimal model → optimal model structure assumed to be identical across subjects? yes → FFX analysis of parameter estimates (e.g. BPA); no → RFX analysis of parameter estimates (e.g. t-test, ANOVA)

Stephan et al. 2010, NeuroImage

Page 65:

Further reading

• Penny WD, Stephan KE, Mechelli A, Friston KJ (2004) Comparing dynamic causal models. NeuroImage

22:1157-1172.

• Penny WD, Stephan KE, Daunizeau J, Joao M, Friston K, Schofield T, Leff AP (2010) Comparing Families of

Dynamic Causal Models. PLoS Computational Biology 6: e1000709.

• Penny WD (2012) Comparing dynamic causal models using AIC, BIC and free energy. Neuroimage 59: 319-

330.

• Rigoux L, Stephan KE, Friston KJ, Daunizeau J (2014) Bayesian model selection for group studies – revisited.

NeuroImage 84: 971-985.

• Stephan KE, Weiskopf N, Drysdale PM, Robinson PA, Friston KJ (2007) Comparing hemodynamic models with

DCM. NeuroImage 38:387-401.

• Stephan KE, Penny WD, Daunizeau J, Moran RJ, Friston KJ (2009) Bayesian model selection for group

studies. NeuroImage 46:1004-1017.

• Stephan KE, Penny WD, Moran RJ, den Ouden HEM, Daunizeau J, Friston KJ (2010) Ten simple rules for

Dynamic Causal Modelling. NeuroImage 49: 3099-3109.

• Stephan KE, Iglesias S, Heinzle J, Diaconescu AO (2015) Translational Perspectives for Computational

Neuroimaging. Neuron 87: 716-732.

• Stephan KE, Schlagenhauf F, Huys QJM, Raman S, Aponte EA, Brodersen KH, Rigoux L, Moran RJ,

Daunizeau J, Dolan RJ, Friston KJ, Heinz A (2017) Computational Neuroimaging Strategies for Single Patient

Predictions. NeuroImage 145: 180-199.

Page 66:

Thank you

