J. Daunizeau
ICM, Paris, France / TNU, Zurich, Switzerland
An introduction to Bayesian inference and model comparison
Overview of the talk
✓ An introduction to probabilistic modelling
✓ Bayesian model comparison
✓ SPM applications
Degree of plausibility desiderata:
- should be represented using real numbers (D1)
- should conform with intuition (D2)
- should be consistent (D3)
Probability theory: basics
• normalization: ∫ p(x) dx = 1
• marginalization: p(x) = ∫ p(x, y) dy
• conditioning (Bayes rule): p(x|y) = p(y|x) p(x) / p(y)
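These three rules can be checked on a toy discrete distribution; the two-variable example and its numbers below are purely illustrative:

```python
# Joint distribution p(x, y) over x in {0, 1}, y in {0, 1}
p_joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}

# normalization: the probabilities sum to 1
total = sum(p_joint.values())

# marginalization: p(x) = sum_y p(x, y)
p_x = {x: sum(p for (xi, _), p in p_joint.items() if xi == x) for x in (0, 1)}

# conditioning (Bayes rule): p(x | y=1) = p(x, y=1) / p(y=1)
p_y1 = sum(p for (_, yi), p in p_joint.items() if yi == 1)
p_x_given_y1 = {x: p_joint[(x, 1)] / p_y1 for x in (0, 1)}
```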
Deriving the likelihood function
- Model of data with unknown parameters: y = f(θ), e.g., GLM: f(θ) = Xθ
- But data is noisy: y = f(θ) + ε
- Assume noise/residuals are 'small': p(ε) ∝ exp(−ε²/(2σ²)), e.g. P(|ε| > 2σ) ≈ 0.05

→ Distribution of data, given fixed parameters: p(y|θ) ∝ exp(−(y − f(θ))²/(2σ²))
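Under these assumptions the log-likelihood of a GLM is a (penalized) sum of squared residuals. A minimal sketch, where the design matrix X, the data y and the noise level sigma are illustrative toy values:

```python
import math

def log_likelihood(y, X, theta, sigma):
    """ln p(y | theta) for y = X @ theta + eps, with i.i.d. eps ~ N(0, sigma^2)."""
    n = len(y)
    resid = [yi - sum(xij * tj for xij, tj in zip(xi, theta))
             for yi, xi in zip(y, X)]
    sse = sum(r * r for r in resid)  # sum of squared residuals
    return -0.5 * n * math.log(2 * math.pi * sigma**2) - sse / (2 * sigma**2)

# toy design: intercept plus one regressor
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 2.1, 2.9]
ll_good = log_likelihood(y, X, theta=[1.0, 1.0], sigma=1.0)  # near-true parameters
ll_bad = log_likelihood(y, X, theta=[0.0, 0.0], sigma=1.0)   # poor parameters
```

Parameters that shrink the residuals raise the likelihood, which is what any inversion scheme exploits.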
Forward and inverse problems
- forward problem (likelihood): p(y|θ, m), from model parameters to data
- inverse problem (posterior distribution): p(θ|y, m), from data to model parameters
Likelihood, priors and the model evidence
- Likelihood: p(y|θ, m)
- Prior: p(θ|m)
- Bayes rule: p(θ|y, m) = p(y|θ, m) p(θ|m) / p(y|m)

(generative model m)
Principle of parsimony: "plurality should not be assumed without necessity"

"Occam's razor": [Figure: model evidence p(y|m) plotted over the space of all data sets, for models of differing complexity]

Model evidence: p(y|m) = ∫ p(y|θ, m) p(θ|m) dθ
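The evidence integral can be made concrete by brute-force 1-D integration on a toy conjugate-Gaussian example (all numbers illustrative): a model with a broad prior spreads its predictive mass over many possible data sets and is automatically penalized, which is Occam's razor at work:

```python
import math

def evidence(y, prior_sd, noise_sd=1.0, grid=2001, lim=20.0):
    """p(y|m) = ∫ p(y|θ,m) p(θ|m) dθ on a grid, for θ ~ N(0, prior_sd^2),
    y|θ ~ N(θ, noise_sd^2)."""
    dtheta = 2 * lim / (grid - 1)
    total = 0.0
    for i in range(grid):
        theta = -lim + i * dtheta
        prior = (math.exp(-theta**2 / (2 * prior_sd**2))
                 / (prior_sd * math.sqrt(2 * math.pi)))
        lik = (math.exp(-(y - theta)**2 / (2 * noise_sd**2))
               / (noise_sd * math.sqrt(2 * math.pi)))
        total += prior * lik * dtheta
    return total

# data near zero: the simpler (tight-prior) model has the higher evidence
ev_simple = evidence(y=0.5, prior_sd=1.0)
ev_complex = evidence(y=0.5, prior_sd=10.0)
```

Analytically the evidence here is N(y; 0, prior_sd² + noise_sd²), so the grid result can be checked in closed form.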
Bayesian model comparison
inference
causality
Hierarchical models
Directed acyclic graphs (DAGs)
Variational approximations (VB, EM, ReML)
→ VB : maximize the free energy F(q) w.r.t. the approximate posterior q(θ) under some (e.g., mean field, Laplace) simplifying constraint
[Figure: approximate marginal posteriors q(θ₁), q(θ₂) vs. the true joint posterior p(θ₁, θ₂|y, m)]
Free energy: F(q) = ⟨ln p(y, θ|m)⟩_q + S(q) = ln p(y|m) − KL[ p(θ|y, m); q(θ) ]
where S(q) is the entropy of q.
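The identity can be verified numerically on a conjugate-Gaussian toy problem (prior θ ~ N(0,1), likelihood y|θ ~ N(θ,1), so posterior and evidence are known in closed form): F(q) equals ln p(y|m) exactly when q is the true posterior, and is strictly smaller for any other q:

```python
import math

def free_energy(mu, s2, y):
    """F(q) = <ln p(y, theta)>_q + S(q) for q = N(mu, s2),
    with prior theta ~ N(0, 1) and likelihood y|theta ~ N(theta, 1)."""
    exp_log_joint = (-0.5 * math.log(2 * math.pi) - 0.5 * ((y - mu)**2 + s2)
                     - 0.5 * math.log(2 * math.pi) - 0.5 * (mu**2 + s2))
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2)  # S(q) for a Gaussian
    return exp_log_joint + entropy

y = 1.0
log_evidence = -0.5 * math.log(2 * math.pi * 2.0) - y**2 / (2 * 2.0)  # ln N(y; 0, 2)
F_exact = free_energy(mu=y / 2, s2=0.5, y=y)  # q = true posterior N(y/2, 1/2)
F_other = free_energy(mu=0.0, s2=1.0, y=y)    # any other q gives a lower bound
```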
Overview of the talk
✓ An introduction to probabilistic modelling
✓ Bayesian model comparison
✓ SPM applications
classical (null) hypothesis testing
• define the null, e.g.: H₀: θ = 0
• estimate parameters (obtain test stat.): t ≡ t(Y)
• apply decision rule, i.e.: if P(t > t*|H₀) ≤ α then reject H₀
[Figure: null distribution p(t|H₀) with rejection region beyond t*]
Frequentist versus Bayesian inference
Bayesian model comparison:
• define two alternative models, e.g.:
  m₀: p(θ|m₀) = 1 if θ = 0, 0 otherwise
  m₁: p(θ|m₁) = N(0, Σ)
• apply decision rule, e.g.: if P(m₀|y)/P(m₁|y) ≥ α then accept m₀
[Figure: p(Y|m₀) and p(Y|m₁) over the space of all datasets]
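A minimal sketch of this decision rule, assuming the toy models m₀: θ = 0 and m₁: θ ~ N(0, 1) with unit observation noise (all numbers illustrative):

```python
import math

def norm_pdf(x, mean, var):
    return math.exp(-(x - mean)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_odds(y, prior_m0=0.5):
    """P(m0|y) / P(m1|y) for m0: theta = 0 and m1: theta ~ N(0, 1),
    with y | theta ~ N(theta, 1)."""
    ev_m0 = norm_pdf(y, 0.0, 1.0)  # p(y|m0): theta is fixed at 0
    ev_m1 = norm_pdf(y, 0.0, 2.0)  # p(y|m1): prior and noise variances add
    prior_odds = prior_m0 / (1 - prior_m0)
    return prior_odds * ev_m0 / ev_m1

odds_near = posterior_odds(y=0.1)  # data near zero favours the null
odds_far = posterior_odds(y=3.0)   # data far from zero favours m1
```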
Family-level inference
[Figure: four candidate models combining two A-B coupling structures, each with or without a modulatory input u]
P(m₁|y) = 0.04, P(m₂|y) = 0.25, P(m₃|y) = 0.7, P(m₄|y) = 0.01
model selection error risk: P(e|y) = 1 − max_m P(m|y) = 0.3
family inference (pool statistical evidence): P(f|y) = Σ_{m∈f} P(m|y)
P(f₁|y) = 0.05, P(f₂|y) = 0.95
model selection error risk: P(e|y) = 1 − max_f P(f|y) = 0.05
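Family-level pooling is a one-line sum over posterior model probabilities. The sketch below uses the four model posteriors quoted on this slide, with an assumed grouping into families f₁ = {m₁, m₄} and f₂ = {m₂, m₃} chosen so as to reproduce the quoted family posteriors:

```python
# posterior model probabilities from the slide
post = {"m1": 0.04, "m2": 0.25, "m3": 0.70, "m4": 0.01}
# assumed (illustrative) partition of models into families
families = {"f1": ["m1", "m4"], "f2": ["m2", "m3"]}

# P(f|y) = sum over the models in each family
p_family = {f: sum(post[m] for m in ms) for f, ms in families.items()}

# pooling evidence shrinks the selection error risk
risk_model = 1 - max(post.values())       # 0.30
risk_family = 1 - max(p_family.values())  # 0.05
```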
Sampling subjects as marbles in an urn
m_i = 1 → ith marble is blue
m_i = 0 → ith marble is purple
→ (binomial) probability of drawing a set of n marbles:
p(m|r) = Π_{i=1..n} r^{m_i} (1 − r)^{1−m_i}
Thus, our belief about the proportion of blue marbles is:
p(r|m) ∝ p(r) Π_{i=1..n} r^{m_i} (1 − r)^{1−m_i}, with E[r|m] ≈ (1/n) Σ_i m_i
r = proportion of blue marbles in the urn
[Figure: DAG with r as parent of m₁, m₂, …, m_n]
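With a flat Beta(1, 1) prior on r this posterior is a Beta distribution in closed form; a minimal sketch on illustrative draws:

```python
# illustrative data: 7 blue marbles out of 10 draws
draws = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
k, n = sum(draws), len(draws)

# conjugate update: Beta(1, 1) prior -> Beta(1 + k, 1 + n - k) posterior
alpha, beta = 1 + k, 1 + n - k
posterior_mean = alpha / (alpha + beta)  # (1 + k) / (2 + n), close to k/n
```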
Group-level model comparison
At least, we can measure how likely the ith subject's data is under each model!
p(r, m|y) ∝ p(r) Π_{i=1..n} p(y_i|m_i) p(m_i|r)
[Figure: DAG with r as parent of m₁, …, m_n, and each m_i as parent of y_i, with likelihoods p(y₁|m₁), p(y₂|m₂), …, p(y_n|m_n)]
Our belief about the proportion of models is: p(r|y) = Σ_m p(r, m|y)
Exceedance probability: φ_k = P(r_k > r_{k′}, ∀k′ ≠ k | y)
Overview of the talk
✓ An introduction to probabilistic modelling
✓ Bayesian model comparison
✓ SPM applications
[Figure: the SPM analysis pipeline: realignment, smoothing, normalisation (to a template), general linear model, statistical inference via Gaussian field theory (p < 0.05); Bayesian ingredients: segmentation and normalisation, dynamic causal modelling, posterior probability maps (PPMs), multivariate decoding]
aMRI segmentation: mixture of Gaussians (MoG) model
[Figure: DAG of the MoG generative model]
- y_i: ith voxel value
- c_i: ith voxel label (grey matter, white matter, CSF)
- λ: class frequencies
- μ₁, μ₂, …, μ_k: class means
- σ₁, σ₂, …, σ_k: class variances
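MoG models of this kind are typically fitted with EM; below is a minimal 1-D sketch on toy data (two classes, illustrative numbers), not the actual SPM segmentation routine:

```python
import math, random

def em_step(y, means, variances, freqs):
    """One EM iteration for a 1-D mixture of Gaussians."""
    # E-step: responsibilities p(c_i = k | y_i)
    resp = []
    for yi in y:
        w = [f * math.exp(-(yi - m)**2 / (2 * v)) / math.sqrt(2 * math.pi * v)
             for m, v, f in zip(means, variances, freqs)]
        s = sum(w)
        resp.append([wk / s for wk in w])
    # M-step: update class means, variances and frequencies
    n, K = len(y), len(means)
    nk = [sum(r[k] for r in resp) for k in range(K)]
    means = [sum(r[k] * yi for r, yi in zip(resp, y)) / nk[k] for k in range(K)]
    variances = [max(sum(r[k] * (yi - means[k])**2
                         for r, yi in zip(resp, y)) / nk[k], 1e-6)
                 for k in range(K)]
    freqs = [nk[k] / n for k in range(K)]
    return means, variances, freqs

# toy "voxel values": two well-separated classes
rng = random.Random(1)
y = ([rng.gauss(0.0, 1.0) for _ in range(200)]
     + [rng.gauss(10.0, 1.0) for _ in range(200)])
means, variances, freqs = [1.0, 9.0], [4.0, 4.0], [0.5, 0.5]
for _ in range(20):
    means, variances, freqs = em_step(y, means, variances, freqs)
```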
Decoding of brain images: recognizing brain states from fMRI
[Figure: paradigm with a fixation cross (+) and a paced response (>>)]
log-evidence of X-Y sparse mappings: effect of lateralization
log-evidence of X-Y bilateral mappings: effect of spatial deployment
fMRI time series analysis: spatial priors and model comparison
PPM: regions best explained by short-term memory model
PPM: regions best explained by long-term memory model
[Figure: generative model of the fMRI time series: GLM coefficients, prior variance of the GLM coefficients, prior variance of the data noise, AR coefficients (correlated noise); short-term and long-term memory design matrices (X)]
Dynamic Causal Modelling: network structure identification
[Figure: four candidate models m₁–m₄, each a network over V1, V5 and PPC with stimulus and attention inputs; bar chart of the models' marginal likelihoods ln p(y|m); estimated effective synaptic strengths for the best model (m₄): 1.25, 0.13, 0.46, 0.39, 0.26, 0.26, 0.10]
SPM: frequentist vs Bayesian RFX analysis
[Figure: subjects' parameter estimates in four scenarios, contrasting the classical verdict (p < 0.05 or p > 0.05) with the Bayesian comparison of M₀ vs. M₁ on the question θ > 0?]
Thank you for your attention.
A note on statistical significance: lessons from the Neyman-Pearson lemma
• Neyman-Pearson lemma: the likelihood ratio (or Bayes factor) test
Λ = p(y|H₁) / p(y|H₀) ≥ u
is the most powerful test of size α = P(Λ ≥ u|H₀) to test the null.
• what is the threshold u, above which the Bayes factor test yields a type I error rate of 5%?
[Figure: ROC analysis (type I error rate vs. 1 − type II error rate)]
MVB (Bayes factor): u = 1.09, power = 56%
CCA (F-statistics): F = 2.20, power = 20%
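The threshold question can be answered by simulation: calibrate u on data generated under H₀, then read off the power under H₁. A toy sketch with H₀: y ~ N(0, 1) and H₁: y ~ N(0, 2) (θ marginalized out); all numbers are illustrative and unrelated to the MVB/CCA figures above:

```python
import math, random

def log_lr(y):
    """log Bayes factor ln[p(y|H1) / p(y|H0)] for the toy models."""
    log_p1 = -0.5 * math.log(2 * math.pi * 2.0) - y**2 / 4.0  # ln N(y; 0, 2)
    log_p0 = -0.5 * math.log(2 * math.pi) - y**2 / 2.0        # ln N(y; 0, 1)
    return log_p1 - log_p0

rng = random.Random(0)

# calibrate the threshold u on simulated null data: 95th percentile of Lambda
null_stats = sorted(log_lr(rng.gauss(0.0, 1.0)) for _ in range(100000))
u = null_stats[int(0.95 * len(null_stats))]
alpha = sum(s >= u for s in null_stats) / len(null_stats)  # type I error rate

# power: P(Lambda >= u | H1), simulating data under the alternative
power = sum(log_lr(rng.gauss(0.0, math.sqrt(2.0))) >= u
            for _ in range(100000)) / 100000
```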