Bayesian Inference
Lee Harrison
York Neuroimaging Centre
14/05/2010
[Figure: the SPM analysis pipeline – realignment, smoothing, normalisation (to a template), general linear model, Gaussian field theory, statistical inference at p < 0.05.]
Bayesian methods in the pipeline:
• Bayesian segmentation and normalisation
• Spatial priors on activation extent
• Posterior probability maps (PPMs)
• Dynamic Causal Modelling
Attention to Motion: paradigm and results
Büchel & Friston 1997, Cereb. Cortex; Büchel et al. 1998, Brain

Paradigm:
• fixation only
• observe static dots → + photic: V1
• observe moving dots → + motion: V5
• task on moving dots → + attention: V5 + parietal cortex

Results: [Figure: Attention – No attention contrast, with activations in V3A, V5+ and SPC.]
Dynamic Causal Models
Attention to Motion

[Figure: two candidate networks over V1, V5 and SPC, with Photic, Motion and Attention inputs.]

Model 1 (forward): attentional modulation of V1 → V5
Model 2 (backward): attentional modulation of SPC → V5

Bayesian model selection: which model is optimal?
Responses to Uncertainty
Long-term memory vs short-term memory

Paradigm: stimuli are a sequence of randomly sampled discrete events.
Model: a simple computational model of an observer's response to uncertainty, based on the number of past events (the extent of memory).
Question: which regions are best explained by the short-term vs the long-term memory model?

[Figure: example trial sequence (trials 1 … 40) over four event types, with the upcoming event unknown.]
Overview
• Introductory remarks
• Some probability densities/distributions
• Probabilistic (generative) models
• Bayesian inference
• A simple example – Bayesian linear regression
• SPM applications
  – Segmentation
  – Dynamic causal modelling
  – Spatial models of fMRI time series
Probability distributions and densities

[Figures, repeated over several slides (k = 2): examples of probability distributions and densities over y, e.g. a discrete distribution p(y = k | λ) = λ_k with Σ_k λ_k = 1, and a Gaussian density of the form p(y | θ) ∝ exp(−(y − μ₀)²/2σ²).]
Generative models

Model: y = Xθ + e,  e ~ N(0, Σ)

[Figure: generation – sampling data over time and space from the model; estimation – inverting the model to recover q(θ) from the observed data.]
Bayesian statistics

p(θ | y) ∝ p(y | θ) p(θ)
posterior ∝ likelihood × prior
(likelihood: new data; prior: prior knowledge)

Bayes' theorem allows one to formally incorporate prior knowledge into computing statistical probabilities. The "posterior" probability of the parameters given the data is an optimal combination of prior knowledge and new data, weighted by their relative precision.
Bayes' rule

Given data y and parameters θ, their joint probability can be written in two ways:

p(y, θ) = p(θ | y) p(y) = p(y | θ) p(θ)

Eliminating p(y, θ) gives Bayes' rule:

p(θ | y) = p(y | θ) p(θ) / p(y)
Posterior = Likelihood × Prior / Evidence
Principles of Bayesian inference

• Formulation of a generative model: likelihood p(y | θ) and prior distribution p(θ)
• Observation of data y
• Update of beliefs based upon observations, given a prior state of knowledge:

p(θ | y) ∝ p(y | θ) p(θ)
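A minimal numerical sketch of this update step (two hypothetical parameter values and made-up numbers, purely for illustration):

```python
import numpy as np

# Hypothetical two-hypothesis example: prior state of knowledge p(theta)
prior = np.array([0.7, 0.3])          # p(theta = 0), p(theta = 1)
# Likelihood of the observed data y under each hypothesis, p(y | theta)
likelihood = np.array([0.2, 0.9])

# Bayes: posterior proportional to likelihood * prior, normalised by the evidence
unnormalised = likelihood * prior
evidence = unnormalised.sum()         # p(y) = sum_theta p(y | theta) p(theta)
posterior = unnormalised / evidence

print(posterior)                      # beliefs shift towards theta = 1
```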
Univariate Gaussian

Normal densities. Model: y = θ + e

Prior: p(θ) = N(θ; μ_p, λ_p⁻¹)
Likelihood: p(y | θ) = N(y; θ, λ_e⁻¹)
Posterior: p(θ | y) = N(θ; μ, λ⁻¹), with

λ = λ_e + λ_p
μ = (λ_e y + λ_p μ_p) / λ

Posterior mean = precision-weighted combination of prior mean and data mean.
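A one-line numerical check of the precision-weighted update (illustrative values only):

```python
# Prior: p(theta) = N(mu_p, 1/lambda_p); likelihood: p(y | theta) = N(theta, 1/lambda_e)
mu_p, lambda_p = 0.0, 1.0     # assumed prior mean and precision
y, lambda_e = 2.0, 4.0        # observed datum and its (higher) precision

lam = lambda_e + lambda_p                    # posterior precision: sum of precisions
mu = (lambda_e * y + lambda_p * mu_p) / lam  # precision-weighted posterior mean

print(mu, 1 / lam)            # -> 1.6 0.2 : pulled towards the more precise data
```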
Bayesian GLM: univariate case

Normal densities. Model: y = xθ + e

Prior: p(θ) = N(θ; μ_p, λ_p⁻¹)
Likelihood: p(y | θ) = N(y; xθ, λ_e⁻¹)
Posterior: p(θ | y) = N(θ; μ, λ⁻¹), with

λ = x²λ_e + λ_p
μ = (λ_e x y + λ_p μ_p) / λ
Bayesian GLM: multivariate case

Normal densities. Model: y = Xθ + e

Prior: p(θ) = N(θ; η_p, C_p)
Likelihood: p(y | θ) = N(y; Xθ, C_e)
Posterior: p(θ | y) = N(θ; η, C), with

C⁻¹ = XᵀC_e⁻¹X + C_p⁻¹
η = C (XᵀC_e⁻¹y + C_p⁻¹η_p)

One step if C_e and C_p are known; otherwise iterative estimation.

[Figure: prior, likelihood and posterior contours in the (β₁, β₂) plane.]
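A sketch of the one-step posterior when C_e and C_p are known (synthetic design matrix and data, not SPM code):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 50, 2
X = rng.normal(size=(N, k))                    # design matrix
theta_true = np.array([1.0, -0.5])
y = X @ theta_true + 0.3 * rng.normal(size=N)  # y = X theta + e

Ce = 0.3**2 * np.eye(N)                        # known error covariance
Cp = 10.0 * np.eye(k)                          # prior covariance
eta_p = np.zeros(k)                            # prior mean

iCe, iCp = np.linalg.inv(Ce), np.linalg.inv(Cp)
C = np.linalg.inv(X.T @ iCe @ X + iCp)         # posterior covariance
eta = C @ (X.T @ iCe @ y + iCp @ eta_p)        # posterior mean

print(eta)                                     # close to theta_true
```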
Approximate inference: optimization

True posterior: p(θ | y, m) = p(y, θ | m) / p(y | m)

Approximate posterior: q(θ), iteratively improved.

log p(y | m) = F(q) + KL[ q(θ) ‖ p(θ | y, m) ]

F(q) = ∫ q(θ) log[ p(y, θ | m) / q(θ) ] dθ : the free energy

Mean-field approximation: q(θ) = Π_i q(θ_i)

[Figure: objective function versus the value of each parameter (θ₁, θ₂), improved iteratively.]
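The decomposition above follows from Bayes' rule in one line; a sketch of the standard derivation, using only that log p(y|m) does not depend on θ and that q integrates to one:

```latex
\log p(y\mid m)
  = \int q(\theta)\,\log\frac{p(y,\theta\mid m)}{p(\theta\mid y,m)}\,d\theta
  = \underbrace{\int q(\theta)\,\log\frac{p(y,\theta\mid m)}{q(\theta)}\,d\theta}_{F(q)}
  + \underbrace{\int q(\theta)\,\log\frac{q(\theta)}{p(\theta\mid y,m)}\,d\theta}_{\mathrm{KL}\left[q\,\|\,p(\theta\mid y,m)\right]}
```

Since the KL term is non-negative, F(q) is a lower bound on the log evidence and touches it exactly when q(θ) equals the true posterior, which is why maximising F is a sensible optimisation target.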
Simple example – linear regression

[Figures, repeated over several slides: data y, model fit, bases (explanatory variables) X, and the sum of squared errors as more bases are added.]

Ordinary least squares:

Model: y = Xβ + e
Sum of squared errors: E = (y − Xβ)ᵀ(y − Xβ)
∂E/∂β = 0  ⇒  β̂_OLS = (XᵀX)⁻¹Xᵀy

Over-fitting: model fits noise.
Inadequate cost function: blind to overly complex models.
Solution: include uncertainty in model parameters.
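A small demonstration of the over-fitting point (synthetic data; polynomial bases chosen here for illustration): the sum of squared errors keeps falling as bases are added, even once the model is mostly fitting noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 12)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)  # noisy data

for k in (2, 4, 8, 12):                           # number of polynomial bases
    X = np.vander(x, k, increasing=True)          # design matrix of bases
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS solution
    E = np.sum((y - X @ beta) ** 2)               # sum of squared errors
    print(k, round(E, 6))                         # E only ever decreases with k
```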
Bayesian linear regression: priors and likelihood

Model: y = Xβ + e

Prior: p(β) = N(β; 0, α⁻¹I_k) ∝ exp(−(α/2) βᵀβ)

Likelihood: p(y | β, λ) = Π_{i=1…N} p(y_i | β, λ) = Π_i N(y_i; X_i β, λ⁻¹) ∝ exp(−(λ/2) Σ_i (y_i − X_i β)²)

[Figures, repeated over several slides: sample curves from the prior (before observing any data) with the mean curve; then the bases X and the likelihood of successive observations in the (β₁, β₂) plane.]
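What the "sample curves from prior" panel depicts, as a sketch (Gaussian bump bases are an assumption here; the bases on the slide may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 100)
centres = np.linspace(0, 1, 6)
X = np.exp(-(x[:, None] - centres[None, :])**2 / (2 * 0.1**2))  # k Gaussian bases

alpha = 2.0                                            # prior precision of the weights
k = X.shape[1]
betas = rng.normal(0, np.sqrt(1 / alpha), size=(5, k)) # beta ~ N(0, alpha^-1 I_k)

curves = betas @ X.T        # five sample curves drawn from the prior
mean_curve = np.zeros_like(x)   # the prior mean of X beta is zero
print(curves.shape)         # (5, 100): curves to plot before seeing any data
```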
Bayesian linear regression: posterior

Model: y = Xβ + e
Prior: p(β) = N(β; 0, α⁻¹I_k)
Likelihood: p(y | β, λ) = Π_{i=1…N} N(y_i; X_i β, λ⁻¹)

Bayes' rule: p(β | y, α, λ) ∝ p(y | β, λ) p(β | α)

Posterior: p(β | y, α, λ) = N(β; η, C), with

C = (λXᵀX + αI_k)⁻¹
η = λCXᵀy

[Figures, repeated over several slides: the posterior in the (β₁, β₂) plane and the corresponding fitted curves contracting as data points accumulate.]
Posterior Probability Maps (PPMs)

Posterior distribution: probability of the effect given the data, p(β | y)
(mean: size of effect; precision: variability)

Posterior probability map: images of the probability (confidence) that an activation exceeds some specified threshold s_th, given the data y:

p(β > s_th | y) ≥ p_th

Two thresholds:
• activation threshold s_th: percentage of whole-brain mean signal (physiologically relevant size of effect)
• probability p_th that voxels must exceed to be displayed (e.g. 95%)
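A sketch of the voxelwise computation behind a PPM, assuming a Gaussian posterior at one voxel (all numbers made up):

```python
from scipy.stats import norm

# Gaussian posterior at one voxel: mean = size of effect, sd = uncertainty
mu, sd = 1.2, 0.5            # hypothetical posterior mean and std of the effect
s_th = 0.7                   # activation threshold (size of effect)
p_th = 0.95                  # display threshold

p = 1 - norm.cdf(s_th, loc=mu, scale=sd)   # p(beta > s_th | y)
print(p, p >= p_th)          # the voxel is displayed only if p >= p_th
```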
Bayesian linear regression: model selection

Bayes' rule: p(β | y, λ, m) = p(y | β, λ, m) p(β | m) / p(y | λ, m)

The normalizing constant is the model evidence:

p(y | λ, m) = ∫ p(y | β, λ, m) p(β | m) dβ

log p(y | λ, m) = accuracy(m) − complexity(m)

accuracy(m): data fit, driven by the weighted residuals ‖y − Xβ‖²
complexity(m): an Occam penalty that grows with the number of parameters k (log terms from the prior and posterior covariances)

[Figure: bases X and model fits of differing complexity.]
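Under these Gaussian assumptions the evidence is available in closed form: marginally, y ~ N(0, λ⁻¹I + α⁻¹XXᵀ), with β integrated out. A sketch comparing two models on synthetic data (α and λ treated as known, which is an assumption of this example):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
N = 40
x = np.linspace(-1, 1, N)
y = 1.0 * x + 0.1 * rng.normal(size=N)        # data generated by a linear model

alpha, lam = 1.0, 100.0                       # prior and noise precisions

def log_evidence(X):
    # Marginal likelihood: y ~ N(0, lam^-1 I + alpha^-1 X X^T)
    cov = np.eye(N) / lam + (X @ X.T) / alpha
    return multivariate_normal(mean=np.zeros(N), cov=cov).logpdf(y)

X1 = np.column_stack([np.ones(N), x])         # 2-parameter linear model
X2 = np.vander(x, 8, increasing=True)         # 8-parameter polynomial model
print(log_evidence(X1), log_evidence(X2))     # evidence trades accuracy against complexity
```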
aMRI segmentation

[Figure: PPMs of belonging to grey matter, white matter and CSF.]

Generative model: a mixture of Gaussians over K = 3 tissue classes. For the ith voxel value y_i with label c_i:

p(y_i | c_i = k) = N(y_i; μ_k, σ_k²)   μ_k: class means; σ_k²: class variances
p(c_i = k) = λ_k                        λ_k: class prior frequencies
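A sketch of the voxel-labelling step under this mixture model (the three class parameters below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class parameters for CSF, grey matter, white matter
mu = np.array([0.2, 0.5, 0.8])        # class means
sigma = np.array([0.05, 0.08, 0.05])  # class standard deviations
lam = np.array([0.2, 0.5, 0.3])       # class prior frequencies

y_i = 0.55                            # ith voxel value

# Posterior over the voxel label: p(c_i = k | y_i) propto lam_k N(y_i; mu_k, sigma_k^2)
joint = lam * norm.pdf(y_i, loc=mu, scale=sigma)
posterior = joint / joint.sum()
print(posterior)                      # the "PPM of belonging to" each class
```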
Dynamic Causal Modelling: generative model for fMRI and ERPs

Neural state equation: ẋ = F(x, u, θ), driven by inputs u.

fMRI: neural model with 1 state variable per region, a bilinear state equation and no propagation delays; hemodynamic forward model: neural activity → BOLD.

ERPs: neural model with 8 state variables per region, a nonlinear state equation and propagation delays; electric/magnetic forward model: neural activity → EEG, MEG, LFP.
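A sketch of the bilinear neural state equation used for fMRI, dx/dt = (A + Σ_j u_j B_j)x + Cu, with toy connectivity matrices and simple Euler integration (the hemodynamic forward model is omitted; none of this is SPM's implementation):

```python
import numpy as np

# Toy 2-region bilinear DCM: dx/dt = (A + u_mod * B) x + C * u_drive
A = np.array([[-1.0, 0.0],
              [ 0.4, -1.0]])        # fixed connectivity (region 1 -> region 2)
B = np.array([[0.0, 0.0],
              [0.6, 0.0]])          # modulation of the 1 -> 2 connection
C = np.array([[1.0], [0.0]])        # driving input enters region 1

dt, T = 0.01, 10.0
x = np.zeros(2)
for t in range(int(T / dt)):
    u_drive = 1.0 if (t * dt) % 2 < 1 else 0.0   # boxcar driving input
    u_mod = 1.0 if t * dt > 5 else 0.0           # modulatory input switches on
    dx = (A + u_mod * B) @ x + (C * u_drive).ravel()
    x = x + dt * dx                              # forward Euler step
print(x)                                         # neural states at time T
```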
Bayesian Model Selection for fMRI

[Figure: four candidate DCMs m1–m4 over V1, V5 and PPC, each driven by stim at V1, with attention modulating a different connection in each model. (Stephan et al., Neuroimage, 2008)]

[Figure: bar plot of the models' marginal likelihoods ln p(y | m), favouring m4.]

[Figure: estimated effective synaptic strengths for the best model (m4), with values 1.25, 0.13, 0.46, 0.39, 0.26, 0.26 and 0.10 on the connections.]
fMRI time series analysis with spatial priors (Penny et al. 2005)

Model: Y = Xβ + E, with observations Y, GLM coefficients β, and autoregressive (AR) coefficients modelling correlated noise.

[Figure: graphical model linking the observations, the GLM coefficients, the AR coefficients, and their prior precisions (of the GLM coefficients, of the AR coefficients, and of the data noise).]

Spatial prior on each map of GLM coefficients:

p(β_k) = N(0, α_k⁻¹L⁻¹)

where α_k sets the degree of smoothness and L is the spatial precision matrix.

[Figure: VB estimate of β versus ML estimate of β, compared with smoothing Y and RFT inference, shown on the aMRI.]
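A sketch of a spatial precision matrix of this kind, built here from a graph Laplacian on a 1-D chain of voxels (an assumption for illustration; Penny et al. 2005 define it over the in-mask voxel neighbourhood graph):

```python
import numpy as np

n = 6                                  # voxels on a 1-D chain, for illustration
L = np.zeros((n, n))
for i in range(n - 1):                 # Laplacian: degree on the diagonal,
    L[i, i] += 1; L[i+1, i+1] += 1     # -1 for each pair of neighbouring voxels
    L[i, i+1] = L[i+1, i] = -1

alpha = 4.0                            # degree of smoothness (precision scale)
# Spatial prior on one coefficient map: p(beta_k) = N(0, (alpha * L)^-1)
# (L as written is singular, so a tiny ridge makes the prior proper here)
prior_cov = np.linalg.inv(alpha * L + 1e-6 * np.eye(n))
print(np.round(prior_cov, 2))          # neighbouring voxels are positively correlated
```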
fMRI time series analysis with spatial priors: posterior probability maps

Posterior density q(β_n): the probability of getting an effect, given the data.

q(β_n) = N(μ_n, Σ_n)   mean: size of effect (Cbeta_*.img); covariance: uncertainty (std dev: SDbeta_*.img)

Probability mass: p_n = q(β_n > s_th), where s_th is the activation threshold.

PPM (spmP_*.img): display only voxels that exceed p_n > p_th, e.g. 95%.
fMRI time series analysis with spatial priors: Bayesian model selection

log p(y | m) ≈ F(q)

Compute the log-evidence for each model and subject (model 1 … model K, subject 1 … subject N), giving log-evidence maps.

BMS maps (Joao et al., 2009): at each voxel, the posterior q(r_k) over the frequency r_k of model k yields a PPM and an EPM (exceedance probability map) per model, e.g.

q(r_k > 0.5) = 0.941

the probability that model k generated the data.
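A sketch of how a quantity like q(r_k > 0.5) can be evaluated by Monte Carlo, assuming (as in random-effects BMS) a Dirichlet posterior over the model frequencies r; the Dirichlet parameters below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha_post = np.array([9.0, 3.0])      # hypothetical Dirichlet posterior (K = 2 models)

r = rng.dirichlet(alpha_post, size=100_000)   # samples of model frequencies
q = (r[:, 0] > 0.5).mean()                    # q(r_1 > 0.5)
print(q)                               # probability that model 1 is the more frequent
```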
Reminder…

Responses to Uncertainty: short-term memory model vs long-term memory model.

[Figure: onsets and missed trials; model regressors built from information-theoretic (IT) indices H, h, I, i, where H = entropy, h = surprise, I = mutual information, i = mutual surprise.]

Compare the two models: the IT indices are smoother.

Group data: Bayesian Model Selection maps
• Regions best explained by the short-term memory model: primary visual cortex
• Regions best explained by the long-term memory model: frontal cortex (executive control)

Thank you!