Bayesian Inference
Lee Harrison
York Neuroimaging Centre
14/05/2010
[Figure: the SPM analysis pipeline – realignment, smoothing, normalisation (to a template), general linear model, Gaussian field theory, statistical inference at p < 0.05.]
Bayesian methods in the pipeline:
• Bayesian segmentation and normalisation
• Spatial priors on activation extent
• Posterior probability maps (PPMs)
• Dynamic Causal Modelling
Attention to Motion: paradigm and results
Büchel & Friston 1997, Cereb. Cortex; Büchel et al. 1998, Brain

Paradigm:
• fixation only
• observe static dots → + photic: V1
• observe moving dots → + motion: V5
• task on moving dots → + attention: V5 + parietal cortex

Results: [Figure: Attention – No attention contrast, with activations in V3A, V5+ and SPC.]
Dynamic Causal Models
Attention to Motion

[Figure: two candidate networks over V1, V5 and SPC, with Photic, Motion and Attention inputs.]

Model 1 (forward): attentional modulation of V1 → V5
Model 2 (backward): attentional modulation of SPC → V5

Bayesian model selection: which model is optimal?
Responses to Uncertainty
Long-term memory vs short-term memory

Paradigm: stimuli are a sequence of randomly sampled discrete events.
Model: a simple computational model of an observer's response to uncertainty, based on the number of past events (the extent of memory).
Question: which regions are best explained by the short-term vs the long-term memory model?

[Figure: example trial sequence (trials 1 … 40) over four event types, with the upcoming event unknown.]
Overview
• Introductory remarks
• Some probability densities/distributions
• Probabilistic (generative) models
• Bayesian inference
• A simple example – Bayesian linear regression
• SPM applications
  – Segmentation
  – Dynamic causal modelling
  – Spatial models of fMRI time series
Probability distributions and densities

[Figures, repeated over several slides (k = 2): examples of probability distributions and densities over y, e.g. a discrete distribution p(y = k | λ) = λ_k with Σ_k λ_k = 1, and a Gaussian density of the form p(y | θ) ∝ exp(−(y − μ₀)²/2σ²).]
Generative models

Model: y = Xθ + e,  e ~ N(0, Σ)

[Figure: generation – sampling data over time and space from the model; estimation – inverting the model to recover q(θ) from the observed data.]
Bayesian statistics

p(θ | y) ∝ p(y | θ) p(θ)
posterior ∝ likelihood × prior
(likelihood: new data; prior: prior knowledge)

Bayes' theorem allows one to formally incorporate prior knowledge into computing statistical probabilities. The "posterior" probability of the parameters given the data is an optimal combination of prior knowledge and new data, weighted by their relative precision.
Bayes' rule

Given data y and parameters θ, their joint probability can be written in two ways:

p(y, θ) = p(θ | y) p(y) = p(y | θ) p(θ)

Eliminating p(y, θ) gives Bayes' rule:

p(θ | y) = p(y | θ) p(θ) / p(y)
Posterior = Likelihood × Prior / Evidence
Principles of Bayesian inference

• Formulation of a generative model: likelihood p(y | θ) and prior distribution p(θ)
• Observation of data y
• Update of beliefs based upon observations, given a prior state of knowledge:

p(θ | y) ∝ p(y | θ) p(θ)
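A minimal numerical sketch of this update step (two hypothetical parameter values and made-up numbers, purely for illustration):

```python
import numpy as np

# Hypothetical two-hypothesis example: prior state of knowledge p(theta)
prior = np.array([0.7, 0.3])          # p(theta = 0), p(theta = 1)
# Likelihood of the observed data y under each hypothesis, p(y | theta)
likelihood = np.array([0.2, 0.9])

# Bayes: posterior proportional to likelihood * prior, normalised by the evidence
unnormalised = likelihood * prior
evidence = unnormalised.sum()         # p(y) = sum_theta p(y | theta) p(theta)
posterior = unnormalised / evidence

print(posterior)                      # beliefs shift towards theta = 1
```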
Univariate Gaussian

Normal densities. Model: y = θ + e

Prior: p(θ) = N(θ; μ_p, λ_p⁻¹)
Likelihood: p(y | θ) = N(y; θ, λ_e⁻¹)
Posterior: p(θ | y) = N(θ; μ, λ⁻¹), with

λ = λ_e + λ_p
μ = (λ_e y + λ_p μ_p) / λ

Posterior mean = precision-weighted combination of prior mean and data mean.
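A one-line numerical check of the precision-weighted update (illustrative values only):

```python
# Prior: p(theta) = N(mu_p, 1/lambda_p); likelihood: p(y | theta) = N(theta, 1/lambda_e)
mu_p, lambda_p = 0.0, 1.0     # assumed prior mean and precision
y, lambda_e = 2.0, 4.0        # observed datum and its (higher) precision

lam = lambda_e + lambda_p                    # posterior precision: sum of precisions
mu = (lambda_e * y + lambda_p * mu_p) / lam  # precision-weighted posterior mean

print(mu, 1 / lam)            # -> 1.6 0.2 : pulled towards the more precise data
```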
Bayesian GLM: univariate case

Normal densities. Model: y = xθ + e

Prior: p(θ) = N(θ; μ_p, λ_p⁻¹)
Likelihood: p(y | θ) = N(y; xθ, λ_e⁻¹)
Posterior: p(θ | y) = N(θ; μ, λ⁻¹), with

λ = x²λ_e + λ_p
μ = (λ_e x y + λ_p μ_p) / λ
Bayesian GLM: multivariate case

Normal densities. Model: y = Xθ + e

Prior: p(θ) = N(θ; η_p, C_p)
Likelihood: p(y | θ) = N(y; Xθ, C_e)
Posterior: p(θ | y) = N(θ; η, C), with

C⁻¹ = XᵀC_e⁻¹X + C_p⁻¹
η = C (XᵀC_e⁻¹y + C_p⁻¹η_p)

One step if C_e and C_p are known; otherwise iterative estimation.

[Figure: prior, likelihood and posterior contours in the (β₁, β₂) plane.]
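A sketch of the one-step posterior when C_e and C_p are known (synthetic design matrix and data, not SPM code):

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 50, 2
X = rng.normal(size=(N, k))                    # design matrix
theta_true = np.array([1.0, -0.5])
y = X @ theta_true + 0.3 * rng.normal(size=N)  # y = X theta + e

Ce = 0.3**2 * np.eye(N)                        # known error covariance
Cp = 10.0 * np.eye(k)                          # prior covariance
eta_p = np.zeros(k)                            # prior mean

iCe, iCp = np.linalg.inv(Ce), np.linalg.inv(Cp)
C = np.linalg.inv(X.T @ iCe @ X + iCp)         # posterior covariance
eta = C @ (X.T @ iCe @ y + iCp @ eta_p)        # posterior mean

print(eta)                                     # close to theta_true
```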
Approximate inference: optimization

True posterior: p(θ | y, m) = p(y, θ | m) / p(y | m)

Approximate posterior: q(θ), iteratively improved.

log p(y | m) = F(q) + KL[ q(θ) ‖ p(θ | y, m) ]

F(q) = ∫ q(θ) log[ p(y, θ | m) / q(θ) ] dθ : the free energy

Mean-field approximation: q(θ) = Π_i q(θ_i)

[Figure: objective function versus the value of each parameter (θ₁, θ₂), improved iteratively.]
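The decomposition above follows from Bayes' rule in one line; a sketch of the standard derivation, using only that log p(y|m) does not depend on θ and that q integrates to one:

```latex
\log p(y\mid m)
  = \int q(\theta)\,\log\frac{p(y,\theta\mid m)}{p(\theta\mid y,m)}\,d\theta
  = \underbrace{\int q(\theta)\,\log\frac{p(y,\theta\mid m)}{q(\theta)}\,d\theta}_{F(q)}
  + \underbrace{\int q(\theta)\,\log\frac{q(\theta)}{p(\theta\mid y,m)}\,d\theta}_{\mathrm{KL}\left[q\,\|\,p(\theta\mid y,m)\right]}
```

Since the KL term is non-negative, F(q) is a lower bound on the log evidence and touches it exactly when q(θ) equals the true posterior, which is why maximising F is a sensible optimisation target.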
Simple example – linear regression

[Figures, repeated over several slides: data y, model fit, bases (explanatory variables) X, and the sum of squared errors as more bases are added.]

Ordinary least squares:

Model: y = Xβ + e
Sum of squared errors: E = (y − Xβ)ᵀ(y − Xβ)
∂E/∂β = 0  ⇒  β̂_OLS = (XᵀX)⁻¹Xᵀy

Over-fitting: model fits noise.
Inadequate cost function: blind to overly complex models.
Solution: include uncertainty in model parameters.
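A small demonstration of the over-fitting point (synthetic data; polynomial bases chosen here for illustration): the sum of squared errors keeps falling as bases are added, even once the model is mostly fitting noise.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 12)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=x.size)  # noisy data

for k in (2, 4, 8, 12):                           # number of polynomial bases
    X = np.vander(x, k, increasing=True)          # design matrix of bases
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS solution
    E = np.sum((y - X @ beta) ** 2)               # sum of squared errors
    print(k, round(E, 6))                         # E only ever decreases with k
```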
Bayesian linear regression: priors and likelihood

Model: y = Xβ + e

Prior: p(β) = N(β; 0, α⁻¹I_k) ∝ exp(−(α/2) βᵀβ)

Likelihood: p(y | β, λ) = Π_{i=1…N} p(y_i | β, λ) = Π_i N(y_i; X_i β, λ⁻¹) ∝ exp(−(λ/2) Σ_i (y_i − X_i β)²)

[Figures, repeated over several slides: sample curves from the prior (before observing any data) with the mean curve; then the bases X and the likelihood of successive observations in the (β₁, β₂) plane.]
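What the "sample curves from prior" panel depicts, as a sketch (Gaussian bump bases are an assumption here; the bases on the slide may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 100)
centres = np.linspace(0, 1, 6)
X = np.exp(-(x[:, None] - centres[None, :])**2 / (2 * 0.1**2))  # k Gaussian bases

alpha = 2.0                                            # prior precision of the weights
k = X.shape[1]
betas = rng.normal(0, np.sqrt(1 / alpha), size=(5, k)) # beta ~ N(0, alpha^-1 I_k)

curves = betas @ X.T        # five sample curves drawn from the prior
mean_curve = np.zeros_like(x)   # the prior mean of X beta is zero
print(curves.shape)         # (5, 100): curves to plot before seeing any data
```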
Bayesian linear regression: posterior

Model: y = Xβ + e
Prior: p(β) = N(β; 0, α⁻¹I_k)
Likelihood: p(y | β, λ) = Π_{i=1…N} N(y_i; X_i β, λ⁻¹)

Bayes' rule: p(β | y, α, λ) ∝ p(y | β, λ) p(β | α)

Posterior: p(β | y, α, λ) = N(β; η, C), with

C = (λXᵀX + αI_k)⁻¹
η = λCXᵀy

[Figures, repeated over several slides: the posterior in the (β₁, β₂) plane and the corresponding fitted curves contracting as data points accumulate.]
Posterior Probability Maps (PPMs)

Posterior distribution: probability of the effect given the data, p(β | y)
(mean: size of effect; precision: variability)

Posterior probability map: images of the probability (confidence) that an activation exceeds some specified threshold s_th, given the data y:

p(β > s_th | y) ≥ p_th

Two thresholds:
• activation threshold s_th: percentage of whole-brain mean signal (physiologically relevant size of effect)
• probability p_th that voxels must exceed to be displayed (e.g. 95%)
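A sketch of the voxelwise computation behind a PPM, assuming a Gaussian posterior at one voxel (all numbers made up):

```python
from scipy.stats import norm

# Gaussian posterior at one voxel: mean = size of effect, sd = uncertainty
mu, sd = 1.2, 0.5            # hypothetical posterior mean and std of the effect
s_th = 0.7                   # activation threshold (size of effect)
p_th = 0.95                  # display threshold

p = 1 - norm.cdf(s_th, loc=mu, scale=sd)   # p(beta > s_th | y)
print(p, p >= p_th)          # the voxel is displayed only if p >= p_th
```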
Bayesian linear regression: model selection

Bayes' rule: p(β | y, λ, m) = p(y | β, λ, m) p(β | m) / p(y | λ, m)

The normalizing constant is the model evidence:

p(y | λ, m) = ∫ p(y | β, λ, m) p(β | m) dβ

log p(y | λ, m) = accuracy(m) − complexity(m)

accuracy(m): data fit, driven by the weighted residuals ‖y − Xβ‖²
complexity(m): an Occam penalty that grows with the number of parameters k (log terms from the prior and posterior covariances)

[Figure: bases X and model fits of differing complexity.]
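Under these Gaussian assumptions the evidence is available in closed form: marginally, y ~ N(0, λ⁻¹I + α⁻¹XXᵀ), with β integrated out. A sketch comparing two models on synthetic data (α and λ treated as known, which is an assumption of this example):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
N = 40
x = np.linspace(-1, 1, N)
y = 1.0 * x + 0.1 * rng.normal(size=N)        # data generated by a linear model

alpha, lam = 1.0, 100.0                       # prior and noise precisions

def log_evidence(X):
    # Marginal likelihood: y ~ N(0, lam^-1 I + alpha^-1 X X^T)
    cov = np.eye(N) / lam + (X @ X.T) / alpha
    return multivariate_normal(mean=np.zeros(N), cov=cov).logpdf(y)

X1 = np.column_stack([np.ones(N), x])         # 2-parameter linear model
X2 = np.vander(x, 8, increasing=True)         # 8-parameter polynomial model
print(log_evidence(X1), log_evidence(X2))     # evidence trades accuracy against complexity
```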
aMRI segmentation

[Figure: PPMs of belonging to grey matter, white matter and CSF.]

Generative model: a mixture of Gaussians over K = 3 tissue classes. For the ith voxel value y_i with label c_i:

p(y_i | c_i = k) = N(y_i; μ_k, σ_k²)   μ_k: class means; σ_k²: class variances
p(c_i = k) = λ_k                        λ_k: class prior frequencies
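A sketch of the voxel-labelling step under this mixture model (the three class parameters below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical class parameters for CSF, grey matter, white matter
mu = np.array([0.2, 0.5, 0.8])        # class means
sigma = np.array([0.05, 0.08, 0.05])  # class standard deviations
lam = np.array([0.2, 0.5, 0.3])       # class prior frequencies

y_i = 0.55                            # ith voxel value

# Posterior over the voxel label: p(c_i = k | y_i) propto lam_k N(y_i; mu_k, sigma_k^2)
joint = lam * norm.pdf(y_i, loc=mu, scale=sigma)
posterior = joint / joint.sum()
print(posterior)                      # the "PPM of belonging to" each class
```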
Dynamic Causal Modelling: generative model for fMRI and ERPs

Neural state equation: ẋ = F(x, u, θ), driven by inputs u.

fMRI: neural model with 1 state variable per region, a bilinear state equation and no propagation delays; hemodynamic forward model: neural activity → BOLD.

ERPs: neural model with 8 state variables per region, a nonlinear state equation and propagation delays; electric/magnetic forward model: neural activity → EEG, MEG, LFP.
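A sketch of the bilinear neural state equation used for fMRI, dx/dt = (A + Σ_j u_j B_j)x + Cu, with toy connectivity matrices and simple Euler integration (the hemodynamic forward model is omitted; none of this is SPM's implementation):

```python
import numpy as np

# Toy 2-region bilinear DCM: dx/dt = (A + u_mod * B) x + C * u_drive
A = np.array([[-1.0, 0.0],
              [ 0.4, -1.0]])        # fixed connectivity (region 1 -> region 2)
B = np.array([[0.0, 0.0],
              [0.6, 0.0]])          # modulation of the 1 -> 2 connection
C = np.array([[1.0], [0.0]])        # driving input enters region 1

dt, T = 0.01, 10.0
x = np.zeros(2)
for t in range(int(T / dt)):
    u_drive = 1.0 if (t * dt) % 2 < 1 else 0.0   # boxcar driving input
    u_mod = 1.0 if t * dt > 5 else 0.0           # modulatory input switches on
    dx = (A + u_mod * B) @ x + (C * u_drive).ravel()
    x = x + dt * dx                              # forward Euler step
print(x)                                         # neural states at time T
```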
Bayesian Model Selection for fMRI

[Figure: four candidate DCMs m1–m4 over V1, V5 and PPC, each driven by stim at V1, with attention modulating a different connection in each model. (Stephan et al., Neuroimage, 2008)]

[Figure: bar plot of the models' marginal likelihoods ln p(y | m), favouring m4.]

[Figure: estimated effective synaptic strengths for the best model (m4), with values 1.25, 0.13, 0.46, 0.39, 0.26, 0.26 and 0.10 on the connections.]
fMRI time series analysis with spatial priors (Penny et al. 2005)

Model: Y = Xβ + E, with observations Y, GLM coefficients β, and autoregressive (AR) coefficients modelling correlated noise.

[Figure: graphical model linking the observations, the GLM coefficients, the AR coefficients, and their prior precisions (of the GLM coefficients, of the AR coefficients, and of the data noise).]

Spatial prior on each map of GLM coefficients:

p(β_k) = N(0, α_k⁻¹L⁻¹)

where α_k sets the degree of smoothness and L is the spatial precision matrix.

[Figure: VB estimate of β versus ML estimate of β, compared with smoothing Y and RFT inference, shown on the aMRI.]
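A sketch of a spatial precision matrix of this kind, built here from a graph Laplacian on a 1-D chain of voxels (an assumption for illustration; Penny et al. 2005 define it over the in-mask voxel neighbourhood graph):

```python
import numpy as np

n = 6                                  # voxels on a 1-D chain, for illustration
L = np.zeros((n, n))
for i in range(n - 1):                 # Laplacian: degree on the diagonal,
    L[i, i] += 1; L[i+1, i+1] += 1     # -1 for each pair of neighbouring voxels
    L[i, i+1] = L[i+1, i] = -1

alpha = 4.0                            # degree of smoothness (precision scale)
# Spatial prior on one coefficient map: p(beta_k) = N(0, (alpha * L)^-1)
# (L as written is singular, so a tiny ridge makes the prior proper here)
prior_cov = np.linalg.inv(alpha * L + 1e-6 * np.eye(n))
print(np.round(prior_cov, 2))          # neighbouring voxels are positively correlated
```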
fMRI time series analysis with spatial priors: posterior probability maps

Posterior density q(β_n): the probability of getting an effect, given the data.

q(β_n) = N(μ_n, Σ_n)   mean: size of effect (Cbeta_*.img); covariance: uncertainty (std dev: SDbeta_*.img)

Probability mass: p_n = q(β_n > s_th), where s_th is the activation threshold.

PPM (spmP_*.img): display only voxels that exceed p_n > p_th, e.g. 95%.
fMRI time series analysis with spatial priors: Bayesian model selection

log p(y | m) ≈ F(q)

Compute the log-evidence for each model and subject (model 1 … model K, subject 1 … subject N), giving log-evidence maps.

BMS maps (Joao et al., 2009): at each voxel, the posterior q(r_k) over the frequency r_k of model k yields a PPM and an EPM (exceedance probability map) per model, e.g.

q(r_k > 0.5) = 0.941

the probability that model k generated the data.
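A sketch of how a quantity like q(r_k > 0.5) can be evaluated by Monte Carlo, assuming (as in random-effects BMS) a Dirichlet posterior over the model frequencies r; the Dirichlet parameters below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha_post = np.array([9.0, 3.0])      # hypothetical Dirichlet posterior (K = 2 models)

r = rng.dirichlet(alpha_post, size=100_000)   # samples of model frequencies
q = (r[:, 0] > 0.5).mean()                    # q(r_1 > 0.5)
print(q)                               # probability that model 1 is the more frequent
```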
Reminder…

Responses to Uncertainty: short-term memory model vs long-term memory model.

[Figure: onsets and missed trials; model regressors built from information-theoretic (IT) indices H, h, I, i, where H = entropy, h = surprise, I = mutual information, i = mutual surprise.]

Compare the two models: the IT indices are smoother.

Group data: Bayesian Model Selection maps
• Regions best explained by the short-term memory model: primary visual cortex
• Regions best explained by the long-term memory model: frontal cortex (executive control)

Thank you!