Spatial regularization and sparsity for brain mapping
Bertrand Thirion, INRIA Saclay-Île-de-France, Parietal team
http://parietal.saclay.inria.fr, [email protected]
June 2014
fMRI data analysis pipeline
Complex metabolic pathway
Statistical inference & MVPA
Question 1: Is there any effect? → omnibus test. MVPA: can I discriminate between the two conditions?
Question 2: What regions actually display a difference between the two conditions? MVPA: what is the support of the discriminative pattern?
Outline
● Machine learning techniques for MVPA in neuroimaging
● Improving the decoder: smoothness and sparsity
● Recovery and randomness.
Reverse inference: combining the information from different regions
Aims at decoding brain activity → predicting a cognitive variable [Dehaene et al. 1998], [Haxby et al. 2001], [Cox et al. 2003]
Predictive linear model
y = f(X, w, b) + noise
● y is the behavioral variable.
● X ∈ R^(n×p) is the data matrix, i.e. the activation maps: n activation maps (samples), p voxels (features).
● (w, b) are the parameters to be estimated.
● y ∈ R^n → regression setting: f(X, w, b) = Xw + b
● y ∈ {-1, 1}^n → classification setting: f(X, w, b) = sign(Xw + b), where "sign" denotes the sign function.
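As a minimal sketch of this model on synthetic data (plain NumPy; the dimensions and variable names are illustrative, not taken from the slides):

```python
import numpy as np

rng = np.random.RandomState(0)
n, p = 50, 5                  # n activation maps (samples), p voxels (features)
X = rng.randn(n, p)           # data matrix of activation maps
w_true = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
b_true = 0.3

# Regression setting: y in R^n, f(X, w, b) = Xw + b
y = X @ w_true + b_true + 0.01 * rng.randn(n)

# Least-squares estimate of (w, b) via an augmented design [X, 1]
X1 = np.hstack([X, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
w_hat, b_hat = coef[:-1], coef[-1]

# Classification setting: y in {-1, 1}^n, f(X, w, b) = sign(Xw + b)
y_class = np.sign(X @ w_hat + b_hat)
```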
Curse of dimensionality in MVPA
● Problem: p ≫ n → the model overfits the noise in the training data
● Solutions:
● Prior selection of brain regions → prior-bound result
● Data-driven feature selection (e.g. ANOVA, RFE):
– Univariate methods (ANOVA) → no optimality?
– Multivariate methods → combinatorial problem, high computational cost
● Regularization (e.g. Lasso, Elastic net):
– Shrink w according to your prior
Training a predictive model
● Learning w from a given training set (y, X)
● Choice of the loss:
● Regression: least squares, Huber
● Classification: hinge, logistic
● Choice of the regularizer:
● Convex setting: a norm on w
● Bayesian setting: a prior distribution on w
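For reference, these losses can be written down directly; a minimal NumPy sketch (the function names are illustrative):

```python
import numpy as np

# Residual/margin-based losses, evaluated element-wise.
def squared_loss(y, score):           # regression (least squares)
    return (y - score) ** 2

def hinge_loss(y, score):             # classification, y in {-1, +1}
    return np.maximum(0.0, 1.0 - y * score)

def logistic_loss(y, score):          # classification, y in {-1, +1}
    return np.log1p(np.exp(-y * score))
```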
Evaluation of the decoding
Prediction accuracy:
● Regression: coefficient of determination R²
● Classification: classification accuracy κ
→ Quantify the amount of information shared by the pattern and y.

Layout of the resulting maps of weights: do we have any guarantee of recovering the true discriminative pattern?
Common hypothesis = segregation into functionally specific territories:
→ sparse: few relevant regions are implied
→ compact structure: grouping into connected clusters.
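Both scores are one-liners; a sketch (the κ on the slides may be a chance-corrected variant, so plain classification accuracy is used here as an assumption):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination R²: 1 - residual SS / total SS."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return np.mean(y_true == y_pred)
```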
You said: recovery ?
[Haufe et al. NIMG 2013]
✗ MVPA cannot recover the true sources, as it aims at finding a good discriminative model ("filters"), not at estimating the signal.
✗ A correction taking the covariance structure into account is necessary.
✔ However, this can be improved by choosing relevant priors.
✔ You might want a discriminative model that makes sense to you.
● Machine learning techniques for MVPA in neuroimaging
● Improving the decoder: smoothness and sparsity
● Recovery and randomness.
Outline
Regularization framework
w = the discriminative pattern. Constrain w to select few parameters that explain the data well → penalized regression:

ŵ = argmin_w ℓ(y, Xw) + λJ(w)

✔ ℓ(y, Xw) is the loss function, usually the least-squares loss for regression.
✔ λJ(w) is the penalization term.
Ridge (no sparsity)
Lasso (very sparse)
Elastic net (sparsity + grouping)
Smooth lasso (sparsity + smoothness)
Total variation (piecewise sparsity)
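On a 1-D grid of voxels, these penalties J(w) can be written out explicitly (a sketch; the relative weighting of the terms is an illustrative choice):

```python
import numpy as np

def ridge_penalty(w):                     # no sparsity
    return np.sum(w ** 2)

def lasso_penalty(w):                     # very sparse
    return np.sum(np.abs(w))

def elastic_net_penalty(w, alpha=0.5):    # sparsity + grouping
    return alpha * np.sum(np.abs(w)) + (1 - alpha) * np.sum(w ** 2)

def smooth_lasso_penalty(w, mu=0.5):      # sparsity + smoothness (1-D grid)
    return np.sum(np.abs(w)) + mu * np.sum(np.diff(w) ** 2)

def tv_penalty(w):                        # total variation: piecewise-constant maps
    return np.sum(np.abs(np.diff(w)))
```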
Priors and penalization: Brain decoding = engineering problem ?
Prior on the relevant activation maps ↔ penalization in regularized regression → design of a norm ‖w‖ to be minimized
Example: Total Variation penalization [Michel et al. 2011]
Do we need to bother about sparsity ?
Is brain activation (connectivity, ...) "sparse"? No! But...
In neuroscience, people estimate discriminative patterns that look like:
But in a neuroimaging article, it will look more like
If you want to show the truly discriminative pattern, you need it to be sparse !
Solution: (F)ISTA
● Iteration: w(t) → gradient descent step on the smooth term → proximal step (projection onto the non-smooth constraints) → w(t+1)
● FISTA = accelerated ISTA (much faster convergence)
● Lasso: the proximal operator is simply soft-thresholding
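A minimal ISTA for the Lasso can be sketched in a few lines of NumPy (illustrative, not the optimized solver behind the results shown):

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=1000):
    """ISTA for min_w 0.5 * ||y - Xw||^2 + lam * ||w||_1."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
    w = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)         # gradient step on the smooth term
        w = soft_threshold(w - grad / L, lam / L)  # prox step on the l1 term
    return w
```

FISTA adds a momentum term on top of the same two steps, improving the convergence rate from O(1/k) to O(1/k²).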
The smooth lasso: the proximal operator
[Figure: smooth-lasso proximal operator for increasing penalty strength; axes: sparsity, smoothness]
Sparse total variation: the proximal operator
[Figure: sparse-TV proximal operator for increasing penalty strength; axes: sparsity, small TV]
What do the results look like?
[Figure panels: Encoding | Elastic net decoding | Sparse flat decoding]
Can nevertheless be improved with adapted techniques [Gramfort et al. PRNI 2013]
Performance on recovery (simulation)
Example of recovery (simulated data): the TV-ℓ1 prior outperforms the alternatives.
Caveat: resulting map depends on convergence tolerance
● TV-ℓ1 estimator: stricter convergence tolerance → a different, sparser map!
[Dohmatob et al. PRNI 2014]
Discussion
● Bayesian alternatives (ARD, smooth ARD) [Sabuncu et al.]
● You lose convexity
● Empirical Bayes: adapts well to new data
● Cost of these methods:
● Convergence monitoring is hard
● Smoothing + ANOVA selection + SVM is a good competitor...
● Other approaches: use of clustering for structured sparsity [Jenatton et al. SIAM 2012], even more costly!
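The ANOVA + SVM competitor is a few lines with scikit-learn. A sketch on synthetic data (SelectKBest with the F-test stands in for the ANOVA step; the spatial smoothing step is omitted, and the sizes are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-in for activation maps: n samples, p "voxels".
X, y = make_classification(n_samples=100, n_features=500,
                           n_informative=10, random_state=0)

# Univariate F-test (ANOVA) selection followed by a linear SVM.
pipe = make_pipeline(SelectKBest(f_classif, k=50), LinearSVC(C=1.0))
scores = cross_val_score(pipe, X, y, cv=5)
```

The selection step is refit inside each cross-validation fold, which avoids the circularity of selecting features on the test data.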
Outline
● Machine learning techniques for MVPA in neuroimaging
● Improving the decoder: smoothness and sparsity
● Recovery and randomness
Recovery...
● Prediction vs. identification
● Prediction: estimate ŵ that maximizes the prediction accuracy
● Identification (recovery): estimate ŵ such that supp(ŵ) = supp(w)
● Compressive sensing: detection of k relevant signals out of p (voxels) with only n ≪ p observations, provided k is small
● Problem: neuroimaging data are spatially correlated
How to measure the recovery of the set of regions? How to improve it?
Small sample recovery
[Haxby Science 2001] dataset:
Trying to discriminate faces vs. houses: what level of performance is achieved with a limited number of samples?
Randomization
● Stability selection = randomization of the features + bootstrap of the samples
● Improved feature recovery... for few, weakly correlated features
[Figure: Lasso path vs. stability path of the Lasso]
[Meinshausen and Bühlmann, 2010]
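A bare-bones stability-selection loop (a sketch on synthetic data: bootstrap the samples, randomly rescale the features, refit a Lasso, and count selections; the penalty and perturbation range are illustrative choices):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
n, p = 100, 30
X = rng.randn(n, p)
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.5, 1.0]                 # 3 relevant features
y = X @ w_true + 0.1 * rng.randn(n)

n_runs, counts = 50, np.zeros(p)
for _ in range(n_runs):
    idx = rng.randint(0, n, n)                # bootstrap the samples
    scaling = rng.uniform(0.5, 1.0, p)        # randomly reweight the features
    lasso = Lasso(alpha=0.1).fit(X[idx] * scaling, y[idx])
    counts += lasso.coef_ != 0                # accumulate selected features

selection_freq = counts / n_runs              # threshold e.g. at 0.75
```

Features whose selection frequency stays high under the perturbations are kept; this is the stability path idea from the figure.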
Hierarchical clustering and randomized selection
Algorithm Randomized-Ward-Logistic
(1) Loop: randomly perturb the data
(2) Ward agglomeration to form q features
(3) sparse linear model on reduced features
(4) accumulate non-zero features
(5) threshold map of selection counts
[Gramfort et al. MLINI 2011]
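The five steps can be sketched with scikit-learn's FeatureAgglomeration (Ward clustering on the features) and an ℓ1-penalized logistic regression. The synthetic "region" of correlated voxels and all sizes/penalties are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
n, p, q = 100, 60, 10
latent = rng.randn(n, 1)
X = 0.5 * rng.randn(n, p)
X[:, :5] += latent                            # a correlated "region" of 5 voxels
y = (latent[:, 0] > 0).astype(int)

n_runs, counts = 20, np.zeros(p)
for _ in range(n_runs):
    idx = rng.randint(0, n, n)                # (1) randomly perturb (bootstrap) the data
    ward = FeatureAgglomeration(n_clusters=q) # (2) Ward agglomeration -> q features
    Xr = ward.fit_transform(X[idx])
    clf = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")
    clf.fit(Xr, y[idx])                       # (3) sparse linear model on reduced features
    selected = np.abs(clf.coef_[0]) > 1e-8    # (4) accumulate non-zero clusters,
    counts += selected[ward.labels_]          #     mapped back to individual voxels

selection_freq = counts / n_runs              # (5) threshold this map of selection counts
```

Clustering makes the sparse model operate on a few decorrelated super-voxels, and the randomization averages out the arbitrariness of any single clustering.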
Simulation study
[Figure panels: Ground truth | F test | Randomized Ward logistic]
The best approach for feature recovery depends on the problem
● The response depends on the characteristics of the problem: smoothness (coupling between signal and noise) and clustering (redundancy of features)
[Figure: recovery results for 128 and 256 samples; Varoquaux et al. ICML 2012]
Simulation study
[Figure panels: identification accuracy | prediction accuracy]
Improves both prediction and identification !
Examples on real data
Regression task [Jimura et al. 2011]
Classification task [Haxby et al. 2001]
Conclusion
✔ SVM and sparse models are less powerful than univariate methods for recovery.
✔ Sparsity + clustering + randomization: excellent recovery
⇒ Multivariate brain mapping
✔ Simultaneous prediction and recovery
✗ High computational cost (parameter setting)
Acknowledgements
● Many thanks to my co-workers: V. Michel, G. Varoquaux, A. Gramfort, F. Pedregosa, P. Fillard, J.B. Poline, V. Fritsch, V. Siless, S. Medina, R. Bricquet
● To the people who provided data: E. Eger, R. Poldrack, K. Jimura, J. Haxby
All this will land into...
● Machine learning for neuroimaging http://nilearn.github.io
● Scikit-learn-like API
● BSD, Python, OSS
● Classification of neuroimaging data (decoding)
● Functional connectivity analysis
Thank you for your attention
http://parietal.saclay.inria.fr