J. Daunizeau
Institute of Empirical Research in Economics, Zurich, Switzerland
Brain and Spine Institute, Paris, France
Bayesian inference
Overview of the talk
1 Probabilistic modelling and representation of uncertainty
1.1 Bayesian paradigm
1.2 Hierarchical models
1.3 Frequentist versus Bayesian inference
2 Numerical Bayesian inference methods
2.1 Sampling methods
2.2 Variational methods (ReML, EM, VB)
3 SPM applications
3.1 aMRI segmentation
3.2 Decoding of brain images
3.3 Model-based fMRI analysis (with spatial priors)
3.4 Dynamic causal modelling
Degree of plausibility desiderata:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)
Bayesian paradigm: probability theory basics

• normalization: $\int p(x)\,dx = 1$
• marginalization: $p(x) = \int p(x,y)\,dy$
• conditioning (Bayes rule): $p(x \mid y) = \dfrac{p(y \mid x)\,p(x)}{p(y)}$
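These three rules can be checked numerically on a small discrete joint distribution; a minimal sketch, with a made-up 2x3 joint:

```python
import numpy as np

# A hypothetical 2x3 joint distribution p(x, y) over discrete x and y.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

# Normalization: the joint distribution sums to 1.
assert np.isclose(p_xy.sum(), 1.0)

# Marginalization: p(x) = sum over y of p(x, y).
p_x = p_xy.sum(axis=1)

# Conditioning (Bayes rule): p(x | y) = p(x, y) / p(y).
p_y = p_xy.sum(axis=0)
p_x_given_y = p_xy / p_y            # each column is a conditional distribution
assert np.allclose(p_x_given_y.sum(axis=0), 1.0)

print(p_x)
```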
Bayesian paradigm: deriving the likelihood function

- Model of the data with unknown parameters: $y = f(\theta)$; e.g., GLM: $f(\theta) = X\theta$
- But the data are noisy: $y = f(\theta) + \varepsilon$
- Assume the noise/residuals are "small":
  $p(\varepsilon) \propto \exp\!\left(-\dfrac{1}{2\sigma^2}\,\varepsilon^2\right)$, so that $P(|\varepsilon| > 4\sigma) \approx 0.05$

→ Distribution of the data, given fixed parameters:
  $p(y \mid \theta) \propto \exp\!\left(-\dfrac{1}{2\sigma^2}\,\big(y - f(\theta)\big)^2\right)$
Bayesian paradigm: likelihood, priors and the model evidence

Likelihood: $p(y \mid \theta, m)$
Prior: $p(\theta \mid m)$
Bayes rule: $p(\theta \mid y, m) = \dfrac{p(y \mid \theta, m)\,p(\theta \mid m)}{p(y \mid m)}$

Together, the likelihood and the prior specify the generative model $m$.
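For a scalar GLM, Bayes rule can be evaluated directly on a parameter grid. A minimal sketch (data, noise level, and prior width are all hypothetical):

```python
import numpy as np

# Hypothetical scalar GLM y = X*theta + noise with known noise level sigma.
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.3, 2.8, 4.2])
sigma = 0.5

theta = np.linspace(-2.0, 4.0, 2001)              # grid over the parameter
# Log-likelihood ln p(y | theta): Gaussian residuals, summed over data points.
resid = y[None, :] - theta[:, None] * X[None, :]
log_lik = -0.5 * (resid ** 2).sum(axis=1) / sigma ** 2
# Log-prior ln p(theta): zero-mean Gaussian with assumed sd 2.
log_prior = -0.5 * theta ** 2 / 2.0 ** 2
# Bayes rule: posterior proportional to likelihood times prior.
log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * (theta[1] - theta[0])        # normalize on the grid

theta_map = theta[np.argmax(post)]
print(theta_map)
```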
Bayesian paradigm: forward and inverse problems

Forward problem (likelihood): $p(y \mid \vartheta, m)$
Inverse problem (posterior distribution): $p(\vartheta \mid y, m)$
Bayesian paradigm: model comparison

Principle of parsimony: "plurality should not be assumed without necessity"

Model evidence ("Occam's razor"): $p(y \mid m) = \int p(y \mid \theta, m)\,p(\theta \mid m)\,d\theta$

[Figure: the model evidence $p(y \mid m)$ plotted over the space of all data sets, for models $y = f(x)$ of increasing complexity; a flexible model spreads its evidence thinly.]
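The Occam effect is easy to demonstrate numerically: integrating the same likelihood against a wider prior dilutes the evidence. A toy sketch with made-up numbers:

```python
import numpy as np

# Model evidence p(y|m) = integral of p(y|theta,m) p(theta|m) d(theta),
# approximated by a Riemann sum. Two hypothetical models for one datum y
# share the same likelihood but differ in prior width.
y, sigma = 1.0, 1.0
theta = np.linspace(-50.0, 50.0, 200001)
dtheta = theta[1] - theta[0]
lik = np.exp(-0.5 * (y - theta) ** 2 / sigma ** 2) / np.sqrt(2 * np.pi * sigma ** 2)

def evidence(prior_sd):
    prior = np.exp(-0.5 * theta ** 2 / prior_sd ** 2) / np.sqrt(2 * np.pi * prior_sd ** 2)
    return np.sum(lik * prior) * dtheta

e_narrow, e_wide = evidence(1.0), evidence(10.0)
print(e_narrow > e_wide)   # the narrower model wins when y is close to 0
```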
Hierarchical models: principle

[Figure: a chain of hidden causes generating the data; the hierarchy encodes causality.]

Hierarchical models: directed acyclic graphs (DAGs)

Hierarchical models: univariate linear hierarchical model

[Figure: prior densities at each level of the hierarchy combine with the data to yield the corresponding posterior densities.]
Frequentist versus Bayesian inference: a (quick) note on hypothesis testing

Classical SPM:
• define the null, e.g.: $H_0 : \theta = 0$
• estimate the parameters to obtain a test statistic $t^* \equiv t(Y)$
• apply the decision rule: if $P(t > t^* \mid H_0) \le \alpha$, then reject $H_0$
  (using the null distribution $p(t \mid H_0)$)

Bayesian PPM:
• define the null, e.g.: $H_0 : \theta > 0$
• invert the model to obtain the posterior pdf $p(\theta \mid y)$
• apply the decision rule: if $P(H_0 \mid y) \ge \alpha$, then accept $H_0$
Frequentist versus Bayesian inference: what about bilateral tests?

• define the null and the alternative hypothesis in terms of priors, e.g.:
  $H_0 : \; p(\theta \mid H_0) = \begin{cases} 1 & \text{if } \theta = 0 \\ 0 & \text{otherwise} \end{cases}$
  $H_1 : \; p(\theta \mid H_1) = N(0, \Sigma)$
• apply the decision rule: if $\dfrac{P(H_0 \mid y)}{P(H_1 \mid y)} \le 1$, then reject $H_0$

[Figure: the marginal likelihoods $p(Y \mid H_0)$ and $p(Y \mid H_1)$ over the space of all datasets.]

• Savage-Dickey ratio (nested models, i.i.d. priors):
  $p(y \mid H_0) = p(y \mid \theta = 0, H_1) = \dfrac{p(\theta = 0 \mid y, H_1)}{p(\theta = 0 \mid H_1)}\; p(y \mid H_1)$
2 Numerical Bayesian inference methods
Sampling methods: MCMC example (Gibbs sampling)
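As an illustration of the idea (a toy example, not SPM code), a Gibbs sampler for a bivariate Gaussian draws each coordinate in turn from its full conditional:

```python
import numpy as np

# Gibbs sampling for a bivariate Gaussian target with correlation rho:
# alternate draws from the full conditionals p(x1 | x2) and p(x2 | x1).
rng = np.random.default_rng(0)
rho = 0.8                               # target correlation
n = 20000
samples = np.empty((n, 2))
x1, x2 = 0.0, 0.0
for i in range(n):
    # Full conditionals of N(0, [[1, rho], [rho, 1]]):
    x1 = rng.normal(rho * x2, np.sqrt(1.0 - rho ** 2))
    x2 = rng.normal(rho * x1, np.sqrt(1.0 - rho ** 2))
    samples[i] = x1, x2

corr = np.corrcoef(samples[1000:].T)[0, 1]   # discard burn-in
print(corr)
```

The chain explores the joint distribution even though each step only uses one-dimensional conditionals, which is what makes Gibbs sampling attractive for hierarchical models.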
Variational methods: VB / EM / ReML

→ VB: maximize the free energy $F(q)$ with respect to the "variational" posterior $q(\theta)$, under some approximation (e.g., mean field, Laplace).

[Figure: the variational marginals $q(\theta_1)$ and $q(\theta_2)$ approximate the exact marginals $p(\theta_1 \mid y, m)$ and $p(\theta_2 \mid y, m)$ of the joint posterior $p(\theta_1, \theta_2 \mid y, m)$.]
3 SPM applications
[Figure: the SPM analysis pipeline — realignment, smoothing, normalisation to a template, general linear model, Gaussian field theory, statistical inference at p < 0.05; Bayesian components: segmentation and normalisation, posterior probability maps (PPMs), multivariate decoding, dynamic causal modelling.]
aMRI segmentation: mixture of Gaussians (MoG) model

[Figure: DAG of the MoG model — the i-th voxel value $y_i$ depends on the i-th voxel label $c_i$ (grey matter, white matter or CSF), on the class frequencies $\lambda$, and on the class means $\mu_1, \mu_2, \ldots, \mu_k$ and class variances $\sigma_1, \sigma_2, \ldots, \sigma_k$.]
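A toy 1-D version of the MoG model can be fitted with EM; the two classes below merely stand in for tissue classes, and all parameters are made up:

```python
import numpy as np

# EM for a 1-D two-class mixture of Gaussians, a toy stand-in for the voxel
# intensity model (class means, variances and frequencies are invented).
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 0.5, 300),    # one "tissue class"
                    rng.normal(3.0, 0.5, 700)])   # another "tissue class"

mu = np.array([-1.0, 1.0])       # initial class means
var = np.array([1.0, 1.0])       # initial class variances
lam = np.array([0.5, 0.5])       # initial class frequencies
for _ in range(50):
    # E-step: responsibilities p(c_i = k | y_i) for each voxel i and class k.
    dens = lam * np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update class frequencies, means and variances.
    nk = resp.sum(axis=0)
    lam = nk / len(y)
    mu = (resp * y[:, None]).sum(axis=0) / nk
    var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk

print(np.sort(mu))
```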
Decoding of brain images: recognizing brain states from fMRI

[Figure: experimental design — a fixation cross, then a cue (">>") pacing the response.]

- log-evidence of X-Y sparse mappings: effect of lateralization
- log-evidence of X-Y bilateral mappings: effect of spatial deployment
fMRI time series analysis: spatial priors and model comparison

[Figure: hierarchical model for fMRI time series — the data depend on the GLM coefficients and on AR coefficients (correlated noise); hyperparameters set the prior variance of the GLM coefficients and the prior variance of the data noise. Two competing design matrices (X): short-term memory versus long-term memory.]

PPM: regions best explained by the short-term memory model versus regions best explained by the long-term memory model.
Dynamic Causal Modelling: network structure identification

[Figure: four candidate network models (m1-m4) linking V1, V5 and PPC, with driving input "stim" and modulatory input "attention". Model comparison via the marginal likelihood $\ln p(y \mid m)$ selects m4; the estimated effective synaptic strengths are shown for the best model.]
DCMs and DAGs: a note on causality

[Figure: a three-node network with coupling parameters $\theta_{21}$, $\theta_{32}$, $\theta_{13}$, input $u_t$, and modulatory parameter $\theta_{13}^{u}$, governed by the state equation $\dot{x} = f(x, u, \theta)$; discretizing time ($\Delta t \to 0$) unrolls the cyclic graph into a DAG linking the states at $t - \Delta t$, $t$ and $t + \Delta t$.]
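The state equation can be integrated with a simple Euler scheme; the sketch below uses an illustrative linear $f(x, u, \theta) = Ax + Cu$, not the haemodynamic model used in SPM:

```python
import numpy as np

# Euler integration of a toy linear "DCM" dx/dt = A x + C u. The coupling
# matrix A (playing the role of theta_21, theta_32, theta_13) and input
# weights C are invented for illustration.
A = np.array([[-1.0, 0.0, 0.4],    # node 3 -> node 1 (theta_13)
              [0.5, -1.0, 0.0],    # node 1 -> node 2 (theta_21)
              [0.0, 0.6, -1.0]])   # node 2 -> node 3 (theta_32)
C = np.array([1.0, 0.0, 0.0])      # the input u drives node 1

dt, T = 0.01, 10.0
x = np.zeros(3)
for _ in range(int(T / dt)):
    u = 1.0                        # constant driving input
    x = x + dt * (A @ x + C * u)   # x(t+dt) ~ x(t) + dt * f(x, u, theta)

print(x)                           # close to the steady state -inv(A) @ C
```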
Dynamic Causal Modelling: model comparison for group studies

[Figure: differences in log-model evidences, $\ln p(y \mid m_1) - \ln p(y \mid m_2)$, plotted across subjects for models m1 and m2.]

- fixed effect: assume all subjects correspond to the same model
- random effect: assume different subjects might correspond to different models
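Under the fixed-effect assumption, the per-subject log-evidence differences simply add; a minimal sketch with hypothetical values:

```python
import numpy as np

# Fixed-effect (FFX) group model comparison: per-subject log-evidence
# differences ln p(y|m1) - ln p(y|m2) sum across subjects (values invented).
diff = np.array([2.1, -0.5, 3.0, 1.2, 0.8, -1.1])
group_log_bf = diff.sum()          # group log Bayes factor under FFX
print(round(group_log_bf, 1))      # 5.5; a log BF above 3 is usually read
                                   # as strong evidence for m1
```

A random-effect analysis would instead treat the model identity as varying across subjects, e.g. by estimating the frequency of each model in the population.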
Thank you for your attention.
A note on statistical significance: lessons from the Neyman-Pearson lemma

• Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test
  $\Lambda = \dfrac{p(y \mid H_1)}{p(y \mid H_0)} \ge u$
  is the most powerful test of size $\alpha = P(\Lambda \ge u \mid H_0)$ to test the null.

• What is the threshold $u$, above which the Bayes factor test yields a type I error rate of 5%?

[Figure: ROC analysis (type I error rate versus 1 minus type II error rate), comparing MVB (Bayes factor, u = 1.09, power = 56%) with CCA (F-statistic, F = 2.20, power = 20%).]
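The threshold question can be answered by Monte Carlo for a toy pair of hypotheses (unit-variance Gaussians, chosen here only for illustration):

```python
import numpy as np

# Monte Carlo sketch of the Neyman-Pearson construction on a toy problem:
# H0: y ~ N(0,1) versus H1: y ~ N(1,1).
rng = np.random.default_rng(2)

def lr(y):
    # Likelihood ratio p(y|H1) / p(y|H0) for unit-variance Gaussians.
    return np.exp(y - 0.5)

y0 = rng.normal(0.0, 1.0, 200000)      # data simulated under the null
y1 = rng.normal(1.0, 1.0, 200000)      # data simulated under the alternative
u = np.quantile(lr(y0), 0.95)          # threshold with P(LR >= u | H0) = 5%
power = np.mean(lr(y1) >= u)           # power of the resulting size-5% test
print(power)
```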