Ted Willke, Senior Principal Engineer, Intel Labs at MLconf NYC

You thought what?! The promise of real-time brain decoding

Alvarez & Oliva, 2006

BUILDINGS PEOPLE

What is attention?

“Every one knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought... It implies withdrawal from some things in order to deal effectively with others...”

– William James (1890)

A simple but important distinction:
• Overt attention: moving your eyes
• Covert attention: moving your mind’s eye

Courtesy of Nick Turk-Browne, Princeton

The great controller

Perception Memory Learning

Attention

Courtesy of Nick Turk-Browne, Princeton

Perception

The brain: The black box at the end of our necks

• Facts: Only 2% of body weight but uses up to 20% of energy

~200B neurons

Neurons fire up to ~10 kHz

1K to 10K connections per neuron

• The cerebral neocortex (the “mammalian brain” associated with higher reasoning): ~20B neurons

~125 trillion synapses

There are more ways to organize the neocortex’s ~125 trillion synapses than stars in the known universe.

stimulus (task)

mind brain dataset?

what is present in the mind as the task is performed?

Adapted from Francisco Pereira, Botvinick Lab, Princeton

computational model?

what is attended to in the mind as the task is performed?

Non-invasive neuroimaging

Electrical phenomena Metabolic phenomena

Positron Emission Tomography

Functional Magnetic Resonance Imaging (fMRI)

Magneto-Encephalography (MEG)

Consumer EEG (fewer sensors)

Near-Infrared Spectroscopy (fNIRS)

Better spatial resolution

Lab/Medical EEG (more sensors)

Varying portability, temporal & spatial resolution. fMRI is the workhorse of brain research despite disadvantages of non-portability & expense

Real-Time Functional MRI (rtfMRI)

metabolic brain

anatomical brain

Adapted from graphic by Jeremy Manning, Princeton

stimulus (task)

mind brain rtfMRI

classifier

conclusions from structure of the learnt model

conclusions from feature choice

weights on features hidden layers

voxel location voxel behavior time within trial

dependent on prediction model

dependent on experiment

Adapted from Francisco Pereira, Botvinick Lab, Princeton

Studying attention | dueling categories

% BOLD change

Time

Face attention

Scene (place) attention

Fusiform face area (FFA)

Parahippocampal place area (PPA)

e.g., O’Craven et al., 1999, Nature

Studying attention | coupling hypothesis

Occipital cortex Ventral temporal cortex

V4 FFA

PPA

r

Al-Aidroos et al., 2012, Proc Natl Acad Sci

Studying attention | coupling hypothesis

Al-Aidroos et al., 2012, Proc Natl Acad Sci

Face attention Scene attention

N = 7, *p < .05, **p < .01

Standard types of fMRI analysis. (A) Univariate activation refers to the average amplitude of BOLD activity evoked by events of an experimental condition.

N B Turk-Browne Science 2013;342:580-584 *BOLD: blood oxygenation level–dependent (BOLD) contrast imaging

N B Turk-Browne Science 2013;342:580-584 *MVPA: Multivariate Pattern Analysis *FCMA: Full Correlation Matrix Analysis

Advanced Analysis MVPA FCMA

Basic (i.e. common) Analysis

Offline fMRI image analysis

experiment

data acquisition preprocessing

classifier testing analyze results

Processing time …

6 to 55 hours

voxel analysis

Courtesy of Nick Turk-Browne, Princeton

real-time brain decoding for causal experimentation

Studying attention | real-time neurofeedback

Attend to scene

MORE scene evidence

LESS scene evidence

Rewarded with stronger stimulus and easier task

Punished with degraded stimulus and harder task

Starting stimulus

Courtesy of Nick Turk-Browne, Princeton
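
One way to picture the reward/punish loop is as a mapping from classifier evidence to the strength of the next stimulus. The sketch below is only an illustration: the function name, the [-1, 1] evidence scale, and the linear mapping are all assumptions, not the transfer function used in the study.

```c
#include <assert.h>

/* Hypothetical feedback rule: map classifier evidence for the attended
 * category (assumed to lie in [-1, 1]) to the strength of the attended
 * image in the next display. More evidence gives a stronger stimulus,
 * less evidence gives a degraded one. */
float next_stimulus_strength(float evidence, float min_strength, float max_strength)
{
    assert(min_strength <= max_strength);

    /* Clamp evidence to the assumed range. */
    if (evidence < -1.0f) evidence = -1.0f;
    if (evidence > 1.0f) evidence = 1.0f;

    /* Rescale to [0, 1], then interpolate between the two bounds. */
    float t = 0.5f * (evidence + 1.0f);
    return min_strength + t * (max_strength - min_strength);
}
```

With bounds of 0.2 and 0.8, neutral evidence (0) keeps the stimulus at 0.5, which plays the role of the starting stimulus.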

data acquisition real-time preprocessing

classifier testing update stimulus display

Processing time …

6 to 55 hours

real-time voxel analysis

Closed-loop rtfMRI neurofeedback system

Studying attention | training and scoring

Neurofeedback

Use multivariate pattern analysis (MVPA) over whole-brain activity to decode attention to faces vs. scenes

Mean cross-validation accuracy = 78% ***

Norman et al. (2006), LaConte (2011)

Regularized logistic regression (penalty = 1), *** p < 0.001

Subject

Scanner

Scoring sequence – your brain on scenes?

This was done with MVPA. We’d also like to try FCMA to include connectivity information, but...

A Big Data/HPC challenge

Some facts:

To keep up with the rtfMRI scanner, must process a full brain scan and provide feedback in <1 sec (for a 2 sec TR)

Raw image data for 1 subject: ~480 Gbytes

Some studies train on hundreds of subjects

Running correlations across all subjects involves a lot of data movement

Processing is expensive:

N~100K voxels per time slice

O(N^2) for basic preprocessing (minutes today)

O(N^3) to process the full correlation matrix (hours today)
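
To make the scale concrete, here is a back-of-the-envelope sketch (function names are illustrative). For N = 100,000 voxels it gives 4,999,950,000 distinct voxel pairs, and a dense single-precision N x N correlation matrix would occupy 40,000,000,000 bytes (about 40 GB), assuming 4-byte floats.

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct voxel pairs: N * (N - 1) / 2. */
uint64_t voxel_pairs(uint64_t n_voxels)
{
    assert(n_voxels >= 1);
    return n_voxels * (n_voxels - 1) / 2;
}

/* Bytes needed for a dense N x N single-precision correlation matrix. */
uint64_t full_matrix_bytes(uint64_t n_voxels)
{
    return n_voxels * n_voxels * (uint64_t)sizeof(float);
}
```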

Raw fMRI Data

Patterns of correlated voxels

Image Sources: Princeton Neuroscience Institute and Wikipedia

“Train classifier on 100’s of subjects during coffee break, classify a subject’s patterns in <1sec.”

Machine Learning Workload Convergence

Usages: Education, Health, Banking, Manufacturing, Transportation

Workloads: Machine Learning

Building Blocks:
• Algorithms (high-level libraries)
• Primitives (low-level libraries)
• Hardware Platforms: Xeon, Xeon Phi, Xeon FPGA, Xeon Gfx, add-in card, new ISA

Intel can help accelerate a wide range of machine learning through a focus on key building blocks.

Intel® Math Kernel Library (Intel® MKL)

Random Number Gen.

• Congruential

• Wichmann-Hill

• Mersenne Twister

• Sobol

• Niederreiter

• Non-deterministic

Summary Statistics

• Kurtosis

• Variation coefficient

• Quantiles

• Order statistics

• Min/max

• Variance-covariance

Data Fitting

• Spline-based

• Interpolation

• Cell search

Linear Algebra

• BLAS, Sparse BLAS

• LAPACK solvers

• Sparse Solvers (DSS, PARDISO)

• Iterative solver (RCI)

• ScaLAPACK, PBLAS

Fast Fourier Transforms

• Multidimensional

• FFTW interfaces

• Cluster FFT

• Trig. Transforms

• Poisson solver

• Convolution via VSL

Vector Math

• Trigonometric

• Hyperbolic

• Exponential, Logarithmic

• Power / Root

Unveiling Details of Knights Landing (Next Generation Intel® Xeon Phi™ Products)

2nd half ’15 1st commercial systems

3+ TFLOPS In One Package Parallel Performance & Density

On-Package Memory:

up to 16GB at launch

5X Bandwidth vs DDR4

Compute: Energy-efficient IA cores

Microarchitecture enhanced for HPC

3X Single Thread Performance vs Knights Corner

Intel Xeon Processor Binary Compatible

1/3X the Space

5X Power Efficiency

Integrated Fabric

Intel® Silvermont Arch. Enhanced for HPC

Processor Package

Conceptual—Not Actual Package Layout

Platform Memory: DDR4 Bandwidth and Capacity Comparable to Intel® Xeon® Processors

Jointly Developed with Micron Technology

FCMA Correlation Computation

[Figure: voxels x voxels matrix of correlations computed from scan data]

Need Pearson’s correlation coefficient for each pair of voxels

34,470 voxels => over 500 million pairs

Functionality provided by Intel’s libraries

If scan data is normalized (mean-centered and unit norm) then Pearson correlation becomes matrix multiplication

Can use single-precision general matrix multiplication (SGEMM) built into Intel Math Kernel Library (MKL)

Current work is to improve SGEMM performance when computing with small numbers of scans (e.g. 12)

Thanks to Mike Anderson, Intel Labs

FCMA Z-Score Computation

Correlations

Need to complete Z-score procedure across all correlation matrices produced by a single subject

Fisher transformation of each correlation coefficient x => 0.5 * ln((1 + x) / (1 - x))

Then, at each location in the correlation matrix, subtract the mean and divide by the standard deviation across all correlation matrices

Acceleration using Single Instruction Multiple Data (SIMD) instructions

Correlation coefficients are grouped into contiguous vectors and processed using SIMD instructions to exploit data parallelism

Loop annotated with #pragma simd

Natural logarithm can also be vectorised using Intel Short Vector Math Library (SVML) to accelerate Fisher transformation


Putting it all together: FCMA Z-score example

    // Needs <math.h> for logf/sqrtf; _mm_prefetch and the _MM_HINT_*
    // constants come from the Intel intrinsics headers.
    #pragma omp parallel for
    for (int v = 0; v < step * nSubs; v++) {
        int s = v % nSubs;  // subject id
        int i = v / nSubs;  // voxel id
        // View this voxel's correlation vectors as a 2-D array with rows of length `row`
        float (*mat)[row] = (float (*)[row]) &(voxels->corr_vecs[i * nTrials * row]);
        #pragma simd
        for (int j = 0; j < row; j++) {
            float mean = 0.0f;
            float std_dev = 0.0f;
            // Fisher-transform this subject's values; accumulate sum and sum of squares
            for (int b = s * nPerSub; b < (s + 1) * nPerSub; b++) {
                _mm_prefetch((char *)&(mat[b][j + 32]), _MM_HINT_ET1);
                _mm_prefetch((char *)&(mat[b][j + 16]), _MM_HINT_T0);
                float num = 1.0f + mat[b][j];
                float den = 1.0f - mat[b][j];
                // Clamp so that |r| = 1 does not produce log(0) or a division by zero
                num = (num <= 0.0f) ? 1e-4f : num;
                den = (den <= 0.0f) ? 1e-4f : den;
                mat[b][j] = 0.5f * logf(num / den);
                mean += mat[b][j];
                std_dev += mat[b][j] * mat[b][j];
            }
            mean = mean / (float)nPerSub;
            // Variance as E[x^2] - E[x]^2
            std_dev = std_dev / (float)nPerSub - mean * mean;
            float inv_std_dev = (std_dev <= 0.0f) ? 0.0f : 1.0f / sqrtf(std_dev);
            // Z-score in place
            for (int b = s * nPerSub; b < (s + 1) * nPerSub; b++) {
                mat[b][j] = (mat[b][j] - mean) * inv_std_dev;
            }
        }
    }

Several MPI processes running the above code

OpenMP divides independent voxels (dim1) and subjects across 60 Xeon Phi Cores

#pragma simd directive assigns consecutive voxels (dim2) to vector lanes


FCMA SVM

[Figure axes: correlation with voxel v_i; subjects, trials]

Key is to find the most predictive voxels in the correlation matrix
• Rows of the correlation matrix are the feature vectors

A very large number of SVMs are trained
• One for each voxel (~35,000)
• Each trained SVM is cross-validated and the top few voxels are chosen for predictive analyses

Acceleration using custom SVM code
• Kernel matrix precomputed as #data points << #dimensions
• Ported parallel GPUSVM code to run on Xeon and Xeon Phi platforms
• Uses thread-level and SIMD parallelism
• Faster than libSVM

Thanks to Narayanan Sundaram, Intel Labs
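
Precomputing the kernel matrix means the SVM solver works on an n x n table of inner products between samples instead of the raw feature vectors. A minimal sketch follows; the linear kernel and the function name are assumptions, since the slide does not state the kernel type.

```c
#include <assert.h>
#include <stddef.h>

/* Linear kernel (Gram) matrix K = X * X^T.
 * x is n x d row-major (n samples, d features); k is n x n. */
void linear_kernel_matrix(const float *x, size_t n, size_t d, float *k)
{
    assert(n > 0 && d > 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i; j < n; j++) {
            float dot = 0.0f;
            for (size_t t = 0; t < d; t++)
                dot += x[i * d + t] * x[j * d + t];
            k[i * n + j] = dot;
            k[j * n + i] = dot;  /* K is symmetric */
        }
    }
}
```

Here each sample would be one row of a voxel's correlation data (one subject and trial), so the matrix is computed once per voxel and reused across every cross-validation fold.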

FCMA – Effect of Optimizations

[Figure: bar chart of runtime in seconds (for 17 subjects), axis 0 to 7, for the Correlation, Z-score, SVM, and Total stages on Xeon and Xeon Phi, before and after optimizations]

1.7X speedup on Xeon

5.8X speedup on Xeon Phi

Xeon Phi 2.1X faster than Xeon

Thanks to Yida Wang, Princeton, and Narayanan Sundaram

Model-based approaches

stimulus (task)

mind brain rtfMRI

classifier

conclusions from structure of the learnt model

conclusions from feature choice

weights on features hidden layers

voxel location voxel behavior time within trial

dependent on prediction model

dependent on experiment


stimulus (task)

mind brain rtfMRI

classifier


stimulus (task)

mind brain rtfMRI

model


predicted stimulus or task

stimulus (task)

mind brain rtfMRI

model


predicted rtfMRI data

Modeling | Topographic Factor Analysis

Manning JR, Ranganath R, Norman KA, Blei DM (2014) Topographic Factor Analysis: A Bayesian Model for Inferring Brain Networks from Neural Data. PLoS ONE 9(5): e94914. doi:10.1371/journal.pone.0094914


N trials; V voxels; voxel activations y; K shared sources (µ, ); weights w


number of sources?

specification of sources?

hyperparameter values?

initialization of sources?


“mental state” m_n during the nth trial gives rise to behavioral data b_n and neural data y_n


Decoding your thoughts... is a work in progress....

more basic neuroscience research

more machine learning speed and accuracy

a look at other model-based methods

Conclusions

Closed-loop rtfMRI amplifies and externalizes internal states that are difficult to access

Holds promise for people who suffer from mental disorders or simply want to improve brain performance

Intel is helping put the rt into rtfMRI and unlock the potential of this research

Thanks Princeton Neuroscience Institute!

Jon Cohen — PNI Co-Founder, Professor of Neuroscience and Psychology

Matt Botvinick — Professor of Neuroscience and Psychology

Ken Norman — Professor of Neuroscience and Psychology

Nick Turk-Browne — Professor of Neuroscience and Psychology

Kai Li — Professor of Computer Science and Co-Founder of Data Domain Corporation

Date post:	15-Jul-2015