Page 1: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Machine Learning for Analyzing Brain Activity

Tom M. Mitchell

Machine Learning Department, Carnegie Mellon University

October 2006

Collaborators: Rebecca Hutchinson, Marcel Just, Mark Palatucci, Francisco Pereira, Rob Mason, Indra Rustandi, Svetlana Shinkareva, Wei Wang

Page 2: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learning = improving performance at some task through experience

Page 3: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learning to Predict Emergency C-Sections

9714 patient records, each with 215 features

[Sims et al., 2000]

Page 4: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learning to detect objects in images

Example training images for each orientation

(Prof. H. Schneiderman)

Page 5: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learning to classify text documents

Company home page vs. personal home page vs. university home page vs. ...

Page 6: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Reinforcement Learning

V*(s) = E[ r_t + γ r_{t+1} + γ^2 r_{t+2} + … ]

[Sutton and Barto 1981; Samuel 1957]

Page 7: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Machine Learning - Practice

Application areas: object recognition, mining databases, speech recognition, control learning, text analysis

Methods:

• Reinforcement learning

• Supervised learning

• Bayesian networks

• Hidden Markov models

• Unsupervised clustering

• Explanation-based learning

• ....

Page 8: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Machine Learning - Theory

PAC Learning Theory

# examples (m)

representational complexity (H)

error rate (ε), failure probability (δ)
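For reference (the slide's own expression is not preserved in this transcript), the standard PAC bound for a finite hypothesis space H, relating the number of training examples m to the error rate ε and failure probability δ, is:

$$ m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right) $$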

Similar theories for

• Reinforcement skill learning

• Unsupervised learning

• Active student querying

• …

… also relating:

• # of mistakes during learning

• learner’s query strategy

• convergence rate

• asymptotic performance

• …

(for supervised concept learning)

Page 9: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Functional MRI

Page 10: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Brain scans can track activation with precision and sensitivity

[from Walt Schneider]

Page 11: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Human Brain Imaging

• fMRI: location with millimeter precision (1 mm³ = 0.0004% of cortex)

• ERP: time course with millisecond precision (10 ms = 10% of a human production cycle)

• DTI: connection tracing with millimeter precision (a 1 mm connection ~ 10k fibers, or 0.0001% of neurons)

[from Walt Schneider]

Page 12: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Can we train classifiers of mental state?

Page 13: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Can we train a program to classify what you’re thinking about?

“reading a word about tools” or “reading a word about buildings” ?

Observed fMRI:

…time

Train classifiers of the form: fMRI(t, t+1, ..., t+d) → CognitiveProcess, e.g., fMRI(t, t+1, ..., t+4) → {tools, buildings, fish, vegetables, ...}

Page 14: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Reading a noun (15 sec)

[Rustandi et al., 2005]

Page 15: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Representing Meaning in the Brain

Study brain activation associated with different semantic categories of words and pictures

Categories: vegetables, tools, trees, fish, dwellings, building parts

Some experiments use a block stimulus design
• Present a sequence of 20 words from the same category; classify the block of words

Some experiments use single stimuli
• Present single words/pictures for 3 sec; classify the brain activity for a single word/picture

Page 16: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Classifying the Semantic Category of Word Blocks

Learn fMRI(t, ..., t+32) → word-category(t, ..., t+32)
– fMRI(t1, ..., t2) = 10^4 voxels, mean activation of each during the interval [t1, t2]

Training methods:
– train single-subject classifiers
– Gaussian Naïve Bayes: P(fMRI | word-category)
– Nearest neighbor with spatial correlation as distance
– SVM, logistic regression, ...

Feature selection: select n voxels
– Best accuracy: reduce 10^4 voxels to ~10^2
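As a concrete illustration of this pipeline (a minimal sketch, not the authors' code; the array names, synthetic data, and the univariate F-score selection are assumptions), one can select ~10^2 of the ~10^4 voxels and train a Gaussian Naïve Bayes classifier inside a cross-validation loop:

```python
# Sketch: "select n voxels, then classify" with Gaussian Naive Bayes.
# Synthetic stand-in data; feature selection happens inside each CV training fold.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 10_000))   # 80 word blocks x ~10^4 voxels (synthetic)
y = rng.integers(0, 2, size=80)     # 0 = "tools", 1 = "dwellings"

clf = make_pipeline(SelectKBest(f_classif, k=100), GaussianNB())  # 10^4 -> 10^2 voxels
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=8).mean())
```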

Page 17: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Mean Activation per Voxel for Word Categories

Tools

Dwellings

one horizontal slice, from one subject, ventral temporal cortex

[Pereira, et al 2004]

[Figure: mean activation maps for Presentation 1 and Presentation 2.] Classification accuracy 1.0 (tools vs. dwellings) on each of 7 human subjects

(trained on indiv. human subjects)

Page 18: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Gaussian Naïve Bayes (GNB) classifier* for <f1, …, fn> → C. Assume the fj are conditionally independent given C.

Training:

1. For each class value ci, estimate the class prior P(C = ci)

2. For each feature Fj, estimate the class-conditional Normal distribution P(Fj | C = ci)

Classify a new instance using Bayes rule.

[Graphical model: class node C with children F1, F2, …, Fn]

*assumes feature values are conditionally independent given the class
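Spelled out (the estimator formulas on the original slide are not preserved here; this is the standard GNB form): training estimates the class priors from class frequencies and, for each feature, per-class Normal parameters; a new instance ⟨f1, …, fn⟩ is then classified by Bayes rule:

$$\hat{c} \;=\; \arg\max_{c_i}\; \hat{P}(C=c_i)\prod_{j=1}^{n}\mathcal{N}\!\big(f_j;\ \hat\mu_{ji},\ \hat\sigma_{ji}^{2}\big)$$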

Page 19: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Results predicting word block semantic category

Mean pairwise prediction accuracy averaged over 8 subjects:

Random guess: 0.5 expected accuracy

Ventral temporal cortex classifiers averaged over 8 subjects:
• Best pair: Dwellings vs. Tools (1.00 accuracy)
• Worst pair: Tools vs. Fish (0.40 accuracy)
• Average over all pairs: 0.75

Averaged over all subjects, all pairs:
• Full brain: 0.75 (individual subjects: 0.57 to 0.83)
• Ventral temporal: 0.75 (individuals: 0.57 to 0.88)
• Parietal: 0.70 (individuals: 0.62 to 0.77)
• Frontal: 0.67 (individuals: 0.48 to 0.78)

Page 20: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Question:

Are there consistently distinguishable and consistently confusable categories across subjects?

Page 21: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Six-Category Study: Pairwise Classification Errors (ventral temporal cortex)

        Fish    Vegetables  Tools   Dwellings  Trees   Bldg Parts
Subj1   .20     .55 *       .20     .15        .15     .05 *
Subj2   .10 *   .55 *       .35     .20        .10 *   .30
Subj3   .20     .35 *       .15 *   .20        .20     .20
Subj4   .15     .45 *       .15     .15        .25     .05 *
Subj5   .60 *   .55         .25     .20        .15 *   .15 *
Subj6   .20     .25         .00 *   .30 *      .30 *   .05
Subj7   .15     .55 *       .15     .25        .15     .05 *
Mean    .23     .46         .18     .21        .19     .12

(* marks each subject’s worst and best categories)

Page 22: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Question:

Can we classify single, 3-second word presentation?

Page 23: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Classifying individual word presentations

• Accuracy of up to 80% for classifying whether a word is about a “tool” or a “dwelling”

• Rank accuracy of up to 68% for classifying which of 14 individual words (6 presentations of each word)

• Category classification accuracy is above chance for all subjects.

• Individual word classification accuracy is not consistent across subjects.

Page 24: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Question:

Where in the brain is the activity that discriminates word category?

Page 25: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learned Logistic Regression Weights: Tools (red) vs Buildings (blue)

Page 26: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Accuracy of searchlights: Bayes classifier

Accuracy at each voxel with a radius-1 searchlight

Page 27: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Regions that encode ‘tools’ vs. ‘dwellings’

Accuracy at each significant searchlight [0.7-0.8], “tools” vs. “dwellings”

A “searchlight” classifier at each voxel uses only the voxel and its immediate neighbors.

Page 28: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

The distinguishing voxels occur in sensible areas: dwellings activate the parahippocampal place area; tools activate motor and premotor areas.

Page 29: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

What is the relation between the neural representation of a word in two different languages in the brain of a bilingual?

Tested 10 Portuguese-English bilinguals in English and in Portuguese, using the same words

Page 30: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Identifying categories (tool or dwelling) within languages for individual subjects

• using a naïve Bayes classifier

Subj. #   English   Portuguese   Eng->Port   Port->Eng
01701B    0.69      0.76         0.64        0.93
01708B    0.58      0.71         0.64        0.86
01723B    0.77      0.68         0.57        0.79
01730B    0.68      0.71         0.71        0.71
01751B    0.60      0.64         0.93        0.57
01765B    0.61      0.56         0.79        0.79
01771B    0.56      0.70         0.79        0.71
01776B    0.63      0.55         0.93        0.79
01778B    0.52      0.67         0.50        0.64
01783B    0.55      0.75         0.93        0.79

Page 31: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Identifying categories across languages for individual subjects (rank accuracy)

Subj. #   English   Portuguese   Eng->Port   Port->Eng
01701B    0.69      0.76         0.64        0.93
01708B    0.58      0.71         0.64        0.86
01723B    0.77      0.68         0.57        0.79
01730B    0.68      0.71         0.71        0.71
01751B    0.60      0.64         0.93        0.57
01765B    0.61      0.56         0.79        0.79
01771B    0.56      0.70         0.79        0.71
01776B    0.63      0.55         0.93        0.79
01778B    0.52      0.67         0.50        0.64
01783B    0.55      0.75         0.93        0.79

Across Languages

Page 32: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

What is the relation between the neural representation of an object when it is referred to by a word versus when it is depicted by a line drawing?

Page 33: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Schematic representation of experimental design for (A) pictures and (B) words experiments

Page 34: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

It is easier to identify the semantic category of a picture a subject is viewing than a word he/she is reading

Pictures accuracy

Words accuracy

Page 35: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Can a classifier be trained in one modality, and then accurately identify activation patterns in the other modality?

Page 36: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Cross-Modal identification accuracy is high, in both directions

Word to picture

Picture to word

Page 37: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Can a classifier be trained on a group of human subjects, then be successfully applied to a new person?

Page 38: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Picture category accuracy within and between subjects

The classifiers work well across subjects; for “bad” subjects, the identification is even better across than within subjects

Within subject accuracy

Between subject accuracy

Page 39: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Locations of diagnostic voxels across subjects. Tool voxels are shown in blue, dwelling voxels in red. L IPL is indicated with a yellow circle; it activates during imagined or actual grasping (Grafton et al., 1996).

[Panels: Subj 1, Subj 2, Subj 3, Subj 4, Subj 5]

Page 40: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Voxel Locations are Similar for Pictures and Words

Pictures
Tools: L IPL, L postcentral, L middle temporal, cuneus
Dwellings (positive weights): L/R parahippocampal gyrus, cuneus

Words
Tools: L IPL, L postcentral, L precentral, L middle temporal
Dwellings (positive weights): L/R parahippocampal gyrus

Interpretation:
L IPL – imagined grasping (of tools, here) (Grafton et al., 1996)

Parahippocampal gyrus – formation and retrieval of topographical memory; plays a role in perception of landmarks or scenes

Page 41: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Lessons Learned

Yes, one can train machine learning classifiers to distinguish a variety of cognitive states/processes:
– Picture vs. sentence
– Ambiguous sentence vs. unambiguous
– Nouns about “tools” vs. nouns about “dwellings”
• Train on Portuguese words, test on English
• Train on words, test on pictures
• Train on some human subjects, test on others

Failures too:
– True vs. false sentences
– Negative sentence (containing “not”) vs. affirmative

ML methods:
– Logistic regression, nearest neighbor, Naïve Bayes, SVMs, …
– Feature selection matters: searchlights, contrast to fixation, ...
– Case study in high-dimensional, noisy classification [MLJ 2004]

Page 42: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

[Science, 2001]

[Machine Learning Journal, 2004]

[Nature Neuroscience, 2006]

Page 43: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

2. How can we model overlapping mental processes?

Page 44: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Can we learn to classify/track multiple overlapping processes (with unknown timing)?

[Figure: input stimuli (view picture, read sentence, decide whether consistent?) over time, with the observed fMRI signal and the observed button press]

Page 45: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Hidden Process Models (red = to be learned)

Process h: ReadSentence
  Duration d: 11 sec
  P(Process = ReadSent), P(Offset times), Response signature W

Configuration C of process instances ⟨π1, π2, …⟩

Process instance π4:
  Process h: ReadSentence
  Timing landmark: λ3
  Offset time O: 1 sec
  Start time = λ + O

Observed data Y
Input stimulus: sentence, picture, sentence
Timing landmarks: λ1, λ2, λ3

Page 46: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

The HPM Graphical Model: probabilistically generate data Y_{t,v} using a configuration of N process instances π1, ..., πN

[Graphical model: for each process instance k, nodes Stimulus(k), ProcessType(k), Offset(k), and StartTime(k) determine that instance’s contribution to Y_{t,v}; the contributions combine into the observed data Y_{t,v} at voxel v, time t. Legend distinguishes observed from unobserved nodes.]

Page 47: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learning HPMs

• Known process IDs and start times:
– Least squares regression, e.g., Dale [HBM, 1999]
– Ordinary least squares if noise is assumed independent over time
– Generalized least squares if noise is assumed autocorrelated

• Unknown start times: EM algorithm (iteratively reweighted least squares)
– Repeat:
  • E: estimate the distribution over the latent variables
  • M: choose parameters to maximize the expected log full-data likelihood

Y = X h + ε
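A minimal sketch of the known-timing case (not the authors' code; the sizes, start times, and array names are placeholders): build a design matrix X from the known start times and solve Y = Xh + ε for the response signatures h by ordinary least squares.

```python
# Sketch: estimate per-voxel response signatures by OLS when process IDs and
# start times are known (cf. Dale, HBM 1999). All names/sizes are placeholders.
import numpy as np

T, V, d = 200, 50, 13                         # time points, voxels, response length
starts = {1: [0, 60, 120], 2: [8, 75, 140]}   # known start times for each process

X = np.zeros((T, len(starts) * d))            # one column per (process, lag)
for p_idx, (proc, times) in enumerate(sorted(starts.items())):
    for lag in range(d):
        for t0 in times:
            if t0 + lag < T:
                X[t0 + lag, p_idx * d + lag] = 1.0

rng = np.random.default_rng(0)
Y = rng.normal(size=(T, V))                   # observed fMRI data (synthetic stand-in)

h_hat, *_ = np.linalg.lstsq(X, Y, rcond=None) # ordinary least squares: Y ~ X h
print(h_hat.shape)                            # (processes * d, voxels)
```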

Page 48: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

HPM: Synthetic Noise-Free Data Example

Process 1: Process 2: Process 3:

Process responses:

Process instances:

observed data

ProcessID=1, S=1

ProcessID=2, S=17

ProcessID=3, S=21

Time

Page 49: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Figure 1. The learner was given 80 training examples with known start times for only the first two processes. It chooses the correct start time (26) for the third process, in addition to learning the HDRs for all three processes.

true signal

Observed noisy signal

true response W

learned W

Process 1 Process 2 Process 3

Page 50: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Inference with HPMs

• Given an HPM and a data set:
– Assign the interpretation (process IDs and timings) that maximizes the data likelihood

• Classification = assigning the maximum likelihood process IDs

y = X h + ε

Page 51: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

2-process HPM for Picture-Sentence Study

Read sentence

View picture

Cognitive processes:

Observed fMRI:

cortical region 1:

cortical region 2:

Page 52: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

ViewPicture in Visual Cortex

Offset   P(Offset)
0        0.725
1        0.275

Page 53: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

ReadSentence in Visual Cortex

Offset   P(Offset)
0        0.625
1        0.375

Page 54: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

[Trial timeline: View Picture or Read Sentence at t = 0 (4 sec); fixation; Read Sentence or View Picture at 8 sec; Press Button; Rest (through ~16 sec). GNB and HPM each answer “picture or sentence?” for the two stimulus intervals.]

HPMs improve classification accuracy over Gaussian Naïve Bayes by 15% on average.

Page 55: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Models learned (with known onset times): Comprehend Sentence and Comprehend Picture [example shown for trial 25]

Page 56: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

How can we use HPMs to resolve between competing cognitive models?

Page 57: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Is the subject using two or three cognitive processes?

• Train 2-process HPM2 on training data

• Train 3-process HPM3 on training data

• Test HPM2 and HPM3 on separate test data:
– Which predicts known process identities better?
– Which has higher probability given the test data?
– (use n-fold cross-validation for the test; see the sketch below)
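A runnable stand-in for this comparison (not an HPM implementation: Gaussian mixture models with 2 vs. 3 components play the roles of HPM2 and HPM3), scoring each model by cross-validated held-out log-likelihood:

```python
# Model comparison by n-fold cross-validation: fit each candidate on the training
# folds, score its average held-out log-likelihood, and keep the higher-scoring one.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1.0, size=(60, 5)) for m in (-3.0, 0.0, 3.0)])  # 3 clusters

def cv_loglik(n_components, X, n_folds=5):
    scores = []
    for tr, te in KFold(n_folds, shuffle=True, random_state=0).split(X):
        gm = GaussianMixture(n_components=n_components, random_state=0).fit(X[tr])
        scores.append(gm.score(X[te]))      # mean held-out log-likelihood per sample
    return float(np.mean(scores))

print({k: round(cv_loglik(k, X), 2) for k in (2, 3)})  # keep the higher-scoring model
```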

Page 58: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

3-process HPM model for Picture-Sentence Study

Cognitive processes: Read Sentence, View Picture, Decide whether consistent (?)
Observed fMRI: cortical region 1, cortical region 2
Observed button press

Page 59: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

3-process HPM model for Picture-Sentence Study

Input stimuli: Read Sentence, View Picture, Decide whether consistent (?)
Observed fMRI
Observed button press

Page 60: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learned HPM with 3 processes (S, P, D), and R = 13 sec (TR = 500 msec).

[Figure: the observed signal with process instances S, P, S, P and an uncertain D; the learned models for S, P, and D; and the reconstructed signal with instances S, P, S, P, D, D. The D start time was chosen by the program as t+18.]

Page 61: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Which HPM Model Works Best?

Page 62: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Which HPM Model Works Best? 3-process HPM

Page 63: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Parameter Sharing in HPMs[Niculescu, Mitchell, Rao, JMLR 2006]

• Problem: Many, many parameters to estimate:

4698 voxels × 26 parameters/voxel × 3 processes = 366,444

• But only dozens of training trials

• Sometimes neighboring voxels exhibit similar W_{v,t,π}

• Learn which subregions share parameters; then for each voxel v in region r:
  W_{v,t,π} = c_v × W_{r,t,π}   (v = voxel, t = time, π = process, r = region)

Page 64: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Which Parameters to Share?

Learn shared regions via a greedy, top-down algorithm (a sketch follows below):

• Initialize Regions ← the set of anatomically-defined regions
• Loop until all r ∈ Regions are finalized:
– Choose an unfinalized region R from Regions
– SR ← divide R rectilinearly into 2x2x4 subregions
– Train HPM_R and HPM_SR, using nested cross-validation to determine which is more accurate
– If HPM_SR is more accurate than HPM_R, then replace R by SR; else mark R finalized
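A hypothetical sketch of this loop (not the authors' implementation; score() is a placeholder for "train the HPM under this sharing structure and return nested-cross-validated accuracy", and the input is a list of per-region voxel coordinate arrays):

```python
# Greedy top-down search for shared-parameter regions, as described above.
from collections import deque
import numpy as np

def score(regions):
    # Placeholder: stands in for nested-CV accuracy of an HPM that shares one
    # response signature (plus per-voxel amplitudes c_v) within each region.
    return sum(len(r) for r in regions) % 7 / 7.0

def split_2x2x4(coords):
    """Divide one region's voxel coordinates rectilinearly into up to 2x2x4 subregions."""
    coords = np.asarray(coords, float)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1e-9
    bins = np.floor((coords - lo) / (hi - lo) * np.array([2, 2, 4])).astype(int)
    groups = {}
    for row, key in zip(coords, map(tuple, bins)):
        groups.setdefault(key, []).append(row)
    return [np.array(g) for g in groups.values()]

def learn_shared_regions(anatomical_regions):
    queue, finalized = deque(anatomical_regions), []
    while queue:
        R = queue.popleft()
        SR = split_2x2x4(R)
        if len(SR) > 1 and score(SR) > score([R]):   # HPM_SR beats HPM_R
            queue.extend(SR)                          # replace R by its subregions
        else:
            finalized.append(R)                       # mark R finalized
    return finalized

regions = learn_shared_regions([np.random.default_rng(0).integers(0, 10, (50, 3))])
print(len(regions), "final regions")
```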

Page 65: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

[Figure: shared parameter regions and per-voxel amplitude coefficients c_v; shared parameters for the S process, W_{v,t,π} = c_v × W_{r,t,π}, with W_{r,t,S} plotted against t]

Page 66: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Results of Parameter Sharing in HPMs

• Parameter-sharing model needs only 35% as much training data as the non-sharing model (to achieve same accuracy)

• Reduces 4698 voxels to 299 regions

• Reduces number of estimated parameters from 366,444 to 38,232

• Improves cross-validated data likelihood of learned model

• The parameter-sharing model is currently learnable only when process onset times are given; relaxing this is future work.

Page 67: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Goal: General Models of Cognitive Processing

Read word; Decide category; Push button

Read word, Read word; Comprehend sentence; Decide truth; Push button

Read word, Read word; Comprehend picture, Comprehend sentence; Decide pic =?= sent; Push button

Page 68: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Summary Conclusions

• Can studies of human and artificial intelligence inform each other?
– Up to now, not much
– This may be about to change

• Can we understand knowledge representation in the brain?
– fMRI provides sufficient data to distinguish interesting semantic representations

• Will we be able to track processes in the brain?
– HPMs provide a machine learning approach to learning the most probable models given the observed data (and a linearity assumption)

Page 69: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Thank you

Page 70: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

[Figure: shared parameter regions and per-voxel amplitude coefficients c_v; shared parameters for the S process, W_{v,t,π} = c_v × W_{r,t,π}, with W_{r,t,S} plotted against t]

Page 71: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Univariate analysis (e.g., SPM):

Multivariate analysis (e.g., learned classifiers):

“Is the activity of voxel v sensitive to the experimental conditions?”

“Can voxel set S={v1, ... vn} successfully predict the experimental condition?”

Tool words

Dwelling words

Page 72: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Why Multivariate Classifiers?

1. Discover distributed patterns of activation

2. Determine statistical significance with fewer modeling assumptions (e.g., no need for t-test assumptions of Gaussianity)

– Cross validation tests assume only iid examples

3. Determine whether there is a statistically significant difference, AND magnitude of the difference

4. Better handling of signal-to-noise problems:
– Univariate: combine signal across images
– Multivariate: combine signal across images and voxels

Page 73: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Imagine two voxels and their P(activation | class) for classes c1 and c2:

[one voxel measured with n = 10^6 samples, p < 0.0001; the other with n = 10^2 samples, p < 0.0001]

Both depend on the class, and we can get the same p-values for the first simply by collecting more data. p-values give confidence in the existence of an effect, not in its magnitude.

The magnitude of the effect is obtained by training a GNB classifier: its cross-validated prediction error is an unbiased estimate of the Bayes error (the area under the intersection of the two class-conditional distributions), i.e., the magnitude of the effect.

Page 74: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Then how do we get p-values for a classifier using voxel set S, which predicts m out of n correctly on a cross validation set?

Tool words

Dwelling words

Null hypothesis: true classifier accuracy = .50

P(m correct | true acc = 0.5) follows a Binomial(n, p = 0.5) distribution

P(at least m correct | true acc = 0.5) = p-value
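For example (an illustration with assumed numbers m and n, using scipy's binomial survival function):

```python
# p-value for a classifier that gets m of n cross-validation examples correct,
# under the null hypothesis that its true accuracy is 0.5.
from scipy.stats import binom

n, m = 84, 58                        # assumed: 84 held-out predictions, 58 correct
p_value = binom.sf(m - 1, n, 0.5)    # P(at least m correct | true accuracy = 0.5)
print(f"p-value = {p_value:.3g}")
```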

Page 75: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Gaussian Naïve Bayes (GNB) classifier* for <f1, …, fn> → C. Assume the fj are conditionally independent given C.

Training:

1. For each class value ci, estimate the class prior P(C = ci)

2. For each feature Fj, estimate the class-conditional Normal distribution P(Fj | C = ci)

Classify a new instance using Bayes rule.

[Graphical model: class node C with children F1, F2, …, Fn]

* Same model assumptions as GLM!

Page 76: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Linear Decision Surfaces

This form of GNB learns a linear decision surface:
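The decision-surface formula itself is not preserved in this transcript; the standard result (for GNB with class-independent feature variances) is that the log posterior odds are linear in the features, so the classifier has the form

$$\log\frac{P(C=c_1\mid f_1,\dots,f_n)}{P(C=c_2\mid f_1,\dots,f_n)} \;=\; w_0 + \sum_{i=1}^{n} w_i f_i,$$

predicting c1 whenever the sum is positive.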

Logistic regression learns same linear form, but estimates wi to maximize conditional data likelihood P(C|X)

Linear Discriminant Analysis learns same form, but estimates wi to maximize the ratio of between-class variance to within-class variance.

Linear SVM learns same form, but estimates wi to maximize margin between classes

Page 77: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learning an HPM: Picture and Sentence Study

• Each trial: determine whether sentence correctly describes picture

• 40 trials per subject
• Picture first in 20 trials, sentence first in the other 20
• Images acquired every 0.5 seconds

[Trial timeline: View Picture or Read Sentence at t = 0 (4 sec); fixation; Read Sentence or View Picture at 8 sec; Press Button; Rest.]

Page 78: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Goal: Use brain imaging to study how people think

• What details can be observed with imaging?
– Physically: 1 mm, 1 msec, axon bundle connectivity
– Functionally: surprisingly subtle (e.g., ‘tools’ vs. ’dwellings’)
– Controlled experiments are difficult – humans think what they want!

• What form of cognitive models makes sense?
– High-level production system models: SOAR, ACT-R, 4CAPS, ...
– Intermediate level: Hidden Process Models
– Connectionist neural network models: e.g., Plaut language models

• How can we analyze the data to find models?
– Machine learning classifiers → predictive spatial/temporal patterns
– Hidden Process Models → model overlapping processes with unknown timing
– Can we build a library of cognitive processes and their signatures?

Page 79: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Lessons Learned

Yes, one can train machine learning classifiers to distinguish a variety of cognitive states/processes:
– Nouns about “tools” vs. nouns about “building parts”
– Ambiguous sentence vs. unambiguous
– Picture vs. sentence

Failures too:
– True vs. false sentences
– Negative sentence (containing “not”) vs. affirmative

ML methods:
– Logistic regression, NNbr, Naïve Bayes, SVMs, LDA, NNets, …
– Feature selection matters: searchlights, contrast to fixation, ...
– Case study in high-dimensional, noisy classification [MLJ 2004]

Page 80: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

HPMs More Precisely…

Process h (e.g., ‘Read’) = ⟨ duration, fMRI response signature W, offset time distribution ⟩

Process instance (e.g., “Read ‘The dog ran’ ”) = ⟨ process, stimulus, start time ⟩

Configuration c = set of process instances

Hidden Process Model HPM = ⟨ H, Θ, C, Σ ⟩
• H: set of processes
• Θ: defines the distribution over offset times O(h) for each process
• C: set of partially specified candidate configurations
• Σ: ⟨σ1, …, σV⟩ per-voxel noise model

Page 81: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learning HPMs: with unknown timing, known processes

EM (Expectation-Maximization) algorithm

• E-step: estimate the conditional distribution over start times of the process instances, given the observed data and the current HPM:
  P(O(1) … O(N) | Y, h(1) … h(N), HPM)

• M-step: use the distribution from the E-step to determine new maximum-(expected)-likelihood estimates of the HPM parameters (the distributions governing timing offsets and the response signatures).

** In real problems, some timings are often known

* Special case of DBNs with built-in assumptions
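A toy, hypothetical illustration of this E/M alternation (one voxel, one process instance with an unknown start time, Gaussian noise; not the HPM implementation):

```python
# EM for a single unknown start time: the E-step computes a posterior over candidate
# start times given the current response signature W; the M-step re-estimates W as
# the posterior-weighted average of the aligned data windows.
import numpy as np

rng = np.random.default_rng(0)
T, d = 60, 12
true_W = np.exp(-0.5 * ((np.arange(d) - 4) / 2.0) ** 2)   # true response signature
true_start = 21
y = np.zeros(T)
y[true_start:true_start + d] += true_W
y += rng.normal(0.0, 0.3, T)                               # observed noisy voxel signal

candidates = np.arange(15, 30)                             # candidate start times
W = np.ones(d)                                             # initial signature estimate
sigma2 = 0.3 ** 2

for _ in range(20):
    # E-step: posterior over start times (uniform prior, Gaussian noise model).
    loglik = np.array([-0.5 * np.sum((y[s:s + d] - W) ** 2) / sigma2 for s in candidates])
    post = np.exp(loglik - loglik.max())
    post /= post.sum()
    # M-step: posterior-weighted estimate of the response signature.
    W = sum(p * y[s:s + d] for p, s in zip(post, candidates))

print("MAP start time:", candidates[np.argmax(post)])      # should recover ~21
```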

Page 82: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Observed fMRI:

time

Can set S of voxels successfully predict the experimental condition?

Reading a word about ‘tools’ or ‘buildings’?

Page 83: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Example 2: Word Categories – Individual word presentations

Two categories (tools, dwellings):

• Presented 7 tool words, 7 dwelling words, 6 times each (84 word presentations in total)

• Inter-trial interval: 10 sec

• Train classifier to predict category given single word presentation, using 4 sec of data (starting 4 sec after stimulus)

[with Marcel Just, Rob Mason, Francisco Pereira, Svetlana Shinkareva, Wei Wang]

Page 84: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learning task formulation (a sketch of this pipeline follows below)

• Learn Mean(fMRI(t+4), ..., fMRI(t+7)) → WordCategory
– Leave-one-out cross-validation over the 84 word presentations

• Preprocessing:
– Convert each image x to a standard normal image

• Learning algorithms tried:
– kNN with spatial correlation
– Gaussian Naïve Bayes (best on average)
– Regularized logistic regression (best on average)
– Support vector machine

• Feature selection methods tried:
– Logistic regression weights, activity relative to fixation, spotlights, ...
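A minimal sketch of this formulation (synthetic stand-in data and placeholder array names; regularized logistic regression as one of the classifiers tried):

```python
# Average the images 4-7 sec after stimulus onset, z-score each trial image, and
# run leave-one-out cross-validation over the 84 word presentations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
fmri = rng.normal(size=(84, 8, 500))       # trials x time points x voxels (synthetic)
y = rng.integers(0, 2, size=84)            # 0 = tool word, 1 = dwelling word

X = fmri[:, 4:8, :].mean(axis=1)           # Mean(fMRI(t+4), ..., fMRI(t+7))
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)  # per-image z-score

clf = LogisticRegression(C=1.0, max_iter=1000)        # L2-regularized logistic regression
print("LOO accuracy:", cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())
```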

Page 85: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Linear Decision Surfaces

This form of GNB learns a linear decision surface:

Logistic regression learns same linear form, but estimates wi to maximize conditional data likelihood P(C|X)

Linear Discriminant Analysis learns same form, but estimates wi to maximize the ratio of between-class variance to within-class variance.

Linear SVM learns same form, but estimates wi to maximize margin between classes

Page 86: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learning task formulation

• Learn Mean(fMRI(t+4), ..., fMRI(t+7)) → WordCategory
– Leave-one-out cross-validation over the 84 word presentations

• Preprocessing:
– Convert each image x to a standard normal image

• Learning algorithms tried:
– kNN with spatial correlation
– Gaussian Naïve Bayes (best on average)
– Regularized logistic regression (best on average)
– Support vector machine

• Feature selection methods tried:
– Logistic regression weights, activity relative to fixation, spotlights, ...

• Results: for 4 of 8 subjects, classifier accuracy > .80; others .5 to .8

Page 87: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Question: How can we tell which locations allow classifier to succeed?

Classifiers answer: “Can set S of voxels successfully predict the experimental condition?”

• Try all possible subsets S?

• Examine learned classifier weights?

• Examine class-conditional means?

• ...

Page 88: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Linear discriminant weights: GNB (accuracy 0.65)

[Figure: one slice; orientation labels posterior/anterior, left/right]

Page 89: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Linear discriminant weights: GNB (accuracy 0.65) vs. logistic regression (accuracy 0.75); correlation 0.8

Page 90: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Learned Logistic Regression Weights: Tools (red) vs Buildings (blue)

Page 91: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Searchlight classifiers

Idea 1 [Kriegeskorte 2002]:
• Examine the ability to discriminate inside a small region
• Train a classifier for every small region

Idea 2:
• Use this for voxel selection, within the training set (see the sketch below):
– Compute accuracy inside all searchlights
– Rank voxels by the accuracy of their searchlights
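A hypothetical sketch of these two ideas (synthetic data on a small voxel grid; radius-1 searchlight taken as the voxel plus its six face neighbors; not the implementation from the Pereira et al. paper):

```python
# Compute a searchlight accuracy map with a Gaussian Naive Bayes classifier,
# then rank voxels by the accuracy of their searchlights (for voxel selection).
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
shape, n_trials = (8, 8, 8), 60
data = rng.normal(size=(n_trials,) + shape)          # trials x (x, y, z) voxel grid
labels = rng.integers(0, 2, size=n_trials)

offsets = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
acc_map = np.zeros(shape)
for x in range(shape[0]):
    for y in range(shape[1]):
        for z in range(shape[2]):
            feats = [data[:, x + dx, y + dy, z + dz]
                     for dx, dy, dz in offsets
                     if 0 <= x + dx < shape[0] and 0 <= y + dy < shape[1] and 0 <= z + dz < shape[2]]
            X = np.column_stack(feats)               # voxel + in-bounds neighbors
            acc_map[x, y, z] = cross_val_score(GaussianNB(), X, labels, cv=5).mean()

order = np.argsort(acc_map.ravel())[::-1]            # rank voxels by searchlight accuracy
top_voxels = np.array(np.unravel_index(order[:100], shape)).T
print(top_voxels.shape)                              # (100, 3) voxel coordinates
```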

Page 92: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Accuracy of searchlights: Bayes classifier

Accuracy at each voxel with a radius-1 searchlight

Page 93: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Accuracy of single-voxel classifiers

Accuracy at each voxel by itself

Page 94: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Accuracy of searchlights: Bayes classifier

Accuracy at each voxel with a radius-1 searchlight

Page 95: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Accuracy of searchlights: Bayes classifier

Accuracy at each voxel with a radius-1 searchlight (significant voxels, FDR 0.01)

Page 96: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Accuracies of significant searchlights

Accuracy at each significant searchlight [0.7-0.8]

Page 97: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

voxel selection based on searchlights

Conclusions:
• GNB accuracy using searchlight-selected voxels ~80%
• Locations identified are plausible

– include parahippocampal gyrus and pre/post central gyri

• Similar results in accuracy/location for 3 other subjects

“Spatial Searchlights for Feature Selection and Classification” Francisco Pereira, et al., in preparation

Page 98: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Word stimuli in word-picture study

Dwellings    Tools
Castle       Drill
House        Saw
Hut          Screwdriver
Apartment    Pliers
Igloo        Hammer

Page 99: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

Picture stimuli were presented as white lines on black background

Page 100: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

functional Magnetic Resonance Imaging (fMRI)

~1 mm resolution

~2 images per sec.

15,000 voxels/image

non-invasive, safe

measures Blood Oxygen Level Dependent (BOLD) response

[Figure: typical fMRI BOLD response to an impulse of neural activity, extending over ~10 sec]

Page 101: Machine Learning for Analyzing Brain Activity Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 2006 Collaborators: Rebecca.

General Linear Model

• ‘design matrix’ X describes timing of processes (for [Dale 1999], this is the stimulus timing)

Y = X h + ε
(Y: observations, T×V; X: design matrix; h: response signatures for all stimuli; ε: Gaussian noise)

HPMs correspond to making X an unobserved random variable

[Dale, HBM 1999]
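For reference (a standard fact, not shown on this slide), the ordinary least-squares estimate of the response signatures under this model is:

$$\hat{h} \;=\; (X^{\top}X)^{-1}X^{\top}Y$$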

