Multivariate models and machine learning for fMRI - TNU · Multivariate models and machine learning...

Multivariate models and machine learning for fMRI

Methods and Models in fMRI, 15.11.2016

Jakob [email protected], Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering (IBT), University of Zurich and ETH Zürich

Many thanks to Sudhir Raman and Kay Brodersen for material

Translational Neuromodeling Unit


Overview

fMRI Analysis and Classification

Motivation

Learning from data

Multivariate Bayes in SPM

Generative Embedding

Modelling Terminology

Why multivariate?

Univariate approaches are excellent for localizing activations in individual voxels.

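[Figure: responses of voxels v1 and v2 to reward vs. no reward; the difference is significant (*) in one voxel and not significant (n.s.) in the other.]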

Why multivariate?

Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels.

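[Figure: responses of voxels v1 and v2 to orange juice vs. apple juice; neither voxel differs on its own (n.s., n.s.); a plot with axes v1 and v2 is shown alongside.]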

A bit of history – Multidimensional scaling


Edelman et al, Psychobiology, 1998

Psychophysical rating fMRI

Two-dimensional projection of similarity measures for both psychophysical rating and fMRI response.

A bit of history – Classification Studies


Haxby et al, Science, 2001

A bit of history – Classification Studies


Kamitani and Tong, Nat Neurosci, 2005

Representational similarity analysis


Idea: Compare the similarity of representations (correlation between activation patterns) between different stimuli. Allows for a comparison between monkey (neural firing patterns) and human (fMRI activation patterns).

Kriegeskorte et al, Neuron, 2008
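The idea can be sketched in a few lines. This is a minimal illustration, not the pipeline of Kriegeskorte et al.: the function names and all pattern values are hypothetical, and dissimilarity is assumed to be 1 minus the Pearson correlation. Each system (e.g. voxels in one, neurons in the other) gets its own dissimilarity matrix, and the two matrices are then compared, which works even though the systems have different numbers of measurement channels.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences (non-constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def dissimilarity_matrix(patterns):
    """1 - correlation for every pair of stimulus patterns."""
    return [[1.0 - pearson(p, q) for q in patterns] for p in patterns]

def upper_triangle(matrix):
    """Entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def compare_rdms(rdm_a, rdm_b):
    """Second-order similarity: correlate the two dissimilarity structures."""
    return pearson(upper_triangle(rdm_a), upper_triangle(rdm_b))

# Hypothetical data: three stimuli measured in two systems.
fmri_patterns = [            # four voxels per stimulus
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
]
neural_patterns = [          # three units per stimulus
    [0.1, 0.5, 0.9],
    [0.2, 0.6, 1.1],
    [0.9, 0.5, 0.1],
]
rdm_fmri = dissimilarity_matrix(fmri_patterns)
rdm_neural = dissimilarity_matrix(neural_patterns)
match = compare_rdms(rdm_fmri, rdm_neural)
```

Only the dissimilarity matrices are compared, so no voxel-to-neuron correspondence is needed.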

Overview


Motivation

Learning from data



Modelling Terminology

Analysis steps

Feature Extraction

Modelling

Classification

Clustering

Regression

Prediction

Model Selection

Cross validation

Performance

Inference

Feature space

Data points (rows) by features (columns):

      F1    F2    . . .   FP
S1    1     0.5
S2    0     5.7
.     1     4
.     1     5.3
SN    1     6.6

Feature types:
• Discrete
• Continuous

Feature selection for fMRI multivariate analysis


Different features answer different questions. Reducing the dimensionality might reduce noise, but could also reduce relevant information.

Candidate features:
• Raw data
• Mean values
• Model parameters, e.g. DCM
• Correlations between regions

Model selection - Generalizability


Model Fit

Model Complexity

Bishop (2006), Pitt & Myung (2002), TICS

Encoding and decoding models


context (cause or consequence): $X_t \in \mathbb{R}^d$, e.g. condition, stimulus, response, prediction error

BOLD signal: $Y_t \in \mathbb{R}^v$

encoding model: $g: X_t \to Y_t$

decoding model: $h: Y_t \to X_t$

Modelling goals

• Prediction

$h: Y \to X$

Predictive Density

Modelling goals

• Model Selection

Sparse Coding Distributed Coding

Model Evidence

Overview

Motivation

Learning From Data



Modelling Concepts

Learning from data

Reinforcement Learning

Semi-supervised Learning

Supervised Learning: labels for training data are known!

Unsupervised Learning: labels for training data are NOT known!

Supervised learning

Function: f

Independent variables: X

Dependent variable: Y (categorical or continuous)

Classification

Support Vector Machines

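• Kernel function: $K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\phi}(\mathbf{x}_i) \cdot \boldsymbol{\phi}(\mathbf{x}_j)$

Feature map: $\boldsymbol{\phi}$

Function $f: X \to Y$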

Kernel Methods

Shawe-Taylor & Cristianini, Kernel Methods for Pattern Analysis, 2004
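A small numerical check may make the kernel identity concrete. The sketch assumes one particular kernel, the homogeneous quadratic kernel in two dimensions, chosen only because its feature map is short enough to write out; the function names and input values are illustrative.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit feature map of the quadratic kernel for a 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2, evaluated without forming phi."""
    return dot(x, z) ** 2

x_i = (1.0, 2.0)
x_j = (3.0, -1.0)
k_direct = quadratic_kernel(x_i, x_j)    # works in the 2-D input space
k_explicit = dot(phi(x_i), phi(x_j))     # works in the 3-D feature space
```

Both routes give the same number up to rounding. A kernel classifier only ever needs `k_direct`, so the feature space can be high-dimensional without being computed explicitly.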

Other popular classifiers

• Gaussian Processes: C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning, The MIT Press, 2006

• Deep Belief Networks: G. E. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief nets”, Neural Computation, vol. 18, 2006; http://deeplearning.net/tutorial/DBN.html

Generative and Discriminative classifiers


• Generative classifiers
  • Learn the parameters for the functions p(Y) and p(X|Y), e.g. Naïve Bayes classifier

• Discriminative classifiers
  • Learn the parameters for p(Y|X), e.g. logistic regression, SVM
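The two families are linked by Bayes' rule. A generative classifier still predicts through the posterior over classes, but obtains it from the learned prior and class-conditional likelihood rather than modelling it directly (the class index c is notation introduced here for illustration):

```latex
p(Y = c \mid X) = \frac{p(X \mid Y = c)\, p(Y = c)}{\sum_{c'} p(X \mid Y = c')\, p(Y = c')}
```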

Cross-validation

The generalization ability of a classifier can be estimated using a resampling procedure known as cross-validation. One example is 2-fold cross-validation:

[Figure: 100 examples (1, 2, 3, ..., 99, 100) are divided into training examples and test examples in each of folds 1 and 2, followed by performance evaluation.]

• Model selection
• Performance evaluation
  • Balanced accuracy
  • F1 score
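The two performance measures can be sketched for a binary problem as follows. The function names and the toy labels are illustrative assumptions, and the code assumes both classes occur in the true labels and at least one positive prediction is made.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of the accuracies obtained on each class separately."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

# Hypothetical imbalanced test set: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

On imbalanced data the plain accuracy looks flattering because the majority class dominates, which is the usual reason for reporting the balanced accuracy instead.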

Cross-validation

Another commonly used variant is leave-one-out cross-validation.

[Figure: each of the 100 examples (1, 2, 3, ..., 99, 100) serves once as the test example while the remaining examples are training examples, giving folds 1, 2, ..., 98, 99, 100, followed by performance evaluation.]

In fMRI, leave-one-run-out cross-validation is often used.
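A leave-one-run-out split can be sketched as below. The generator name and the run layout (3 runs of 4 trials) are hypothetical; the only input is a list giving the run each trial belongs to.

```python
def leave_one_run_out(run_labels):
    """Yield (held_out_run, train_indices, test_indices), one fold per run."""
    for held_out in sorted(set(run_labels)):
        train = [i for i, run in enumerate(run_labels) if run != held_out]
        test = [i for i, run in enumerate(run_labels) if run == held_out]
        yield held_out, train, test

# Hypothetical session: 3 runs with 4 trials each.
run_labels = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
folds = list(leave_one_run_out(run_labels))
```

Holding out whole runs keeps trials from the same run out of both training and test sets at once, which plain leave-one-out does not guarantee.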


Performance – Single Subject


$p = P(X \geq k \mid H_0) = 1 - B(k \mid n, \pi_0)$

Brodersen et al. 2013, NeuroImage

Binomial Test

k=30

!!! Cross-validated data are not necessarily binomially distributed. Permutation tests are better !!!
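For reference, the binomial tail probability can be computed exactly by summing the probability mass from k to n. The sketch assumes n independent test trials and a chance level of 0.5; the value n = 50 is a hypothetical choice, since the slide only gives k = 30. With the usual cumulative distribution function F, this sum equals 1 − F(k − 1). The caveat above still applies: cross-validated outcomes need not be binomially distributed.

```python
import math

def binomial_p_value(k, n, pi0=0.5):
    """P(X >= k) for X ~ Binomial(n, pi0): k or more correct under H0."""
    return sum(
        math.comb(n, i) * pi0 ** i * (1.0 - pi0) ** (n - i)
        for i in range(k, n + 1)
    )

# k = 30 correct predictions out of a hypothetical n = 50 test trials.
p_value = binomial_p_value(30, 50)
```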

Performance – Multiple subjects


Brodersen et al. 2013, NeuroImage

Fixed effects

Random effects

http://www.translationalneuromodeling.org/tapas/

Confounds – GLM vs. MVPA


Todd et al. 2013, NeuroImage

Second level t-tests for accuracies?


True β-values are normally distributed.

True accuracies are not normally distributed and are truncated at chance.

A possible solution is given by Allefeld et al.

Allefeld et al., NeuroImage, 2016

Statistical testing with classification

• Within subjects:
  – Permutation statistics
  – Parametric tests are not valid (assumptions not met), e.g. binomial or t-test (cf. Schreiber and Krekelberg, 2013).

• Across subjects:
  – Assumptions for t-tests are not met
  – Full Bayesian model (Brodersen et al. 2013, but assumptions are not met for CV)
  – Use the prevalence statistic proposed in Allefeld et al., 2016
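A within-subject permutation test can be sketched as follows. This is an illustration under stated assumptions, not a method from the cited papers: the classifier is a toy nearest-mean rule on a single feature, the data are made up, and labels are assumed to be exchangeable across trials. The key point is that the whole cross-validation is repeated for every shuffled labelling.

```python
import random

def nearest_mean_predict(train_x, train_y, x):
    """Predict the label whose training mean lies closest to x (1-D feature)."""
    means = {}
    for label in set(train_y):
        values = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(x - means[label]))

def loo_accuracy(xs, ys):
    """Leave-one-out cross-validated accuracy of the nearest-mean rule."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_mean_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

def permutation_p_value(xs, ys, n_permutations=500, seed=0):
    """Share of label shufflings whose accuracy reaches the observed one."""
    rng = random.Random(seed)
    observed = loo_accuracy(xs, ys)
    at_least_as_good = 0
    for _ in range(n_permutations):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if loo_accuracy(xs, shuffled) >= observed:
            at_least_as_good += 1
    # Counting the observed labelling itself keeps p strictly above zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Hypothetical single-feature data: 10 trials per condition.
xs = [0.1, 0.4, -0.3, 0.2, 0.0, -0.2, 0.5, 0.3, -0.1, 0.6,
      1.2, 0.9, 1.5, 1.1, 0.8, 1.4, 1.0, 1.3, 0.7, 1.6]
ys = [0] * 10 + [1] * 10
observed_accuracy, p_value = permutation_p_value(xs, ys)
```

For run-structured fMRI data, shuffling would normally have to respect that structure (for example, permuting within runs); this sketch ignores it.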


Research questions for classification

• Overall classification accuracy: accuracy between 50 % and 100 % for each classification task, e.g. "Truth or lie?", "Left or right button?", "Healthy or ill?"

• Spatial deployment of discriminative regions: e.g. regions with 80% vs. 55% accuracy

• Temporal evolution of discriminability: accuracy between 50 % and 100 % over within-trial time; marked events: accuracy rises above chance, participant indicates decision

• Model-based classification: { group 1, group 2 }

Pereira et al. (2009) NeuroImage, Brodersen et al. (2009) The New Collection


Decoding «hidden» intentions – searchlight approach


Haynes et al., Current Biology, 2007

Decoding of free decisions


Soon et al., Nat Neurosci, 2008

Decoding of finger presses (red line). Participants freely choose timing and hand.

Earliest information about left vs. right is available long before execution – free will?

Decoding task preparation – connectivity-based decoding


Heinzle et al., J Neurosci, 2012

SV-Classifier on connectivity graph (correlation)

Discriminative maps

Unsupervised learning


Building a representation of data

• Dimensionality reduction
• Time series
• Clustering: K-means, mixture models

K-means clustering


• Cost function

• Algorithm
  1. Initialize
  2. Estimate assignments
  3. Estimate cluster centroids
  4. Repeat 2, 3 until convergence
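The four steps can be sketched in plain Python. The cost function is assumed here to be the usual K-means distortion, the sum of squared distances between each point and its assigned centroid; the initialization (random data points), the function names and the toy points are illustrative choices.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, n_iterations=100, seed=0):
    """Return (centroids, assignments, cost) after K-means clustering."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # 1. initialize from the data
    assignments = None
    for _ in range(n_iterations):
        # 2. assign every point to its nearest centroid
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:       # 4. converged: nothing moved
            break
        assignments = new_assignments
        # 3. move every centroid to the mean of its assigned points
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:                          # an empty cluster keeps its centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    cost = sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, cost

# Hypothetical 2-D data with two well separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.3), (5.1, 5.1)]
centroids, assignments, cost = k_means(points, k=2)
```

Steps 2 and 3 each lower the cost or leave it unchanged, so the loop stops at a local optimum; different initializations can end in different solutions on less clear-cut data.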

Bishop PRML (2006)

Clustering – Mixture of Gaussians


Bishop PRML (2006)

Interpretation


• Cluster parameters

• Internal criterion: model evidence
• External criterion: purity

[Figure: subjects assigned to Cluster 1 and Cluster 2; inferred labels are compared with external labels.]
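Purity can be computed as sketched below, assuming its common definition: each cluster is credited with its most frequent external label, and the credited items are divided by the total number of items. The function name and the subject labels are hypothetical.

```python
from collections import Counter

def purity(inferred_labels, external_labels):
    """Fraction of items carrying the majority external label of their cluster."""
    clusters = {}
    for cluster, label in zip(inferred_labels, external_labels):
        clusters.setdefault(cluster, []).append(label)
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in clusters.values()
    )
    return majority_total / len(inferred_labels)

# Hypothetical example: 8 subjects, 2 inferred clusters, known diagnoses.
inferred = [1, 1, 1, 1, 2, 2, 2, 2]
external = ["patient", "patient", "patient", "control",
            "control", "control", "control", "patient"]
score = purity(inferred, external)
```

Purity grows trivially with the number of clusters (one cluster per subject gives 1.0), so it is usually read alongside an internal criterion such as the model evidence.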


Motivation

Learning from Data


Generative Embedding

Modelling

Encoding vs. Decoding models



Coding Hypotheses


• Spatial vectors
• Smooth vectors
• Sparse vectors
• Singular vectors of data: $U D V^T = R Y^T$
• Support vectors: $U = R Y^T$
• Distributed vectors

Coding Hypotheses


Friston et al. 2008 NeuroImage

Solved with variational Bayes



Example – Decoding of motion.


Experimental factors:
1. Photic
2. Motion
3. Attention

Attention to motion dataset - Büchel & Friston 1999 Cerebral Cortex




Results





[Figure: MVB results for the Motion contrast, MVB_Motion (prior: sparse)]

Panels:
• Log-evidence across partitions (1 to 5); maximum p = 100.00%
• Distribution of weights: frequency against voxel-weight
• Adjusted response against scans: target and prediction
• Observed and predicted contrast: prediction against contrast; SNR (variance) 0.64
• PPM: MVB_Motion (Motion)
• SPM{T338} for Motion, MIP at [-36, -87, -3]; SPMresults: .\SPM-practical\attention\GLM; height threshold T = 4.874226 {p<0.05 (FWE)}; contrast(s): 3

Posterior probabilities at maxima

p(|w| > 0)    location (x,y,z)         weight (w)
p = 0.993     -39.0,-90.0,-3.0mm       q = 0.0254
p = 0.983     -33.0,-99.0,-3.0mm       q = -0.0216
p = 0.983     -30.0,-99.0,3.0mm        q = 0.0211
p = 0.982     -42.0,-90.0,9.0mm        q = 0.0201
p = 0.980     -45.0,-75.0,-3.0mm       q = 0.0168
p = 0.979     -30.0,-84.0,6.0mm        q = -0.0187
p = 0.977     -39.0,-87.0,3.0mm        q = -0.0196
p = 0.973     -30.0,-84.0,-6.0mm       q = -0.0204
p = 0.972     -39.0,-81.0,-15.0mm      q = 0.0166
p = 0.946     -36.0,-84.0,12.0mm       q = -0.0144
p = 0.933     -48.0,-84.0,-3.0mm       q = -0.0119
p = 0.929     -39.0,-75.0,3.0mm        q = -0.0160

506 voxels; 360 scans

Laminar activity related to novelty and episodic memory


Maass et al. 2014 Nature Communications


Motivation

Learning from Data



Modelling Principles

Classifying Groups of Subjects


[Figure: for Subject 1, Subject 2, ..., Subject N, either voxel activity or connectivity estimated with a dynamic causal model (DCM) is passed to classification or clustering, separating Group 1 from Group 2.]

• High dimensionality
• Unusual cluster distributions
• Lack of interpretation



Brodersen et al., PLoS Computational Biology, 2011.

DCM for speech processing


Working memory in Schizophrenia


• 41 schizophrenia patients (DSM-IV, ICD-10), 42 controls

• Visual numeric n-back working memory task

Deserno et al (2012) The Journal of Neuroscience

[Figure: n-back task schematic showing a sequence of digits, with timing labels 900 ms and 500 ms.]

Model based clustering


Brodersen et al 2014 Neuroimage

Results healthy vs. schizophrenia patients



Within patients clustering



Be aware

• Interpretation of decoding or classification results is difficult.

• The decoded information must be in the data, but exactly which features carry it is often hard to find out …


Summary


Learning from Data



Modelling Principles

Acknowledgments


Many thanks to K.E. Stephan, Sudhir S. Raman and K. Brodersen for sharing their teaching material.
