Pattern recognition and neuroimaging in psychiatry
Janaina Mourao-Miranda
Machine Learning and Neuroimaging Lab
Max Planck UCL Centre for Computational Psychiatry and Ageing Research
Transcript
Page 1

Pattern recognition and neuroimaging in psychiatry

Janaina Mourao-Miranda
Machine Learning and Neuroimaging Lab
Max Planck UCL Centre for Computational Psychiatry and Ageing Research

Page 2

Outline

•  Supervised learning in clinical neuroimaging
•  Limitations
•  Associative models
•  A multiple hold-out framework for associative models

Page 3

Clinical questions in psychiatry:
✓ Diagnosis across diseases
✓ Predicting disease outcome
✓ Identifying at-risk subjects
✓ Identifying treatment responders

Supervised Learning Framework: classification

Group 1: At Risk
Group 2: Low Risk

Training -> Predictive function

New subject -> Testing -> Prediction: At Risk / Low Risk
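A minimal sketch of this classification setting in scikit-learn, using hypothetical feature matrices and a linear SVM; the variable names, sizes, and the choice of classifier are illustrative assumptions, not the specific pipeline used in the lecture. The regression framework on the next slide is analogous, with a regressor (e.g. ridge regression) and MSE in place of accuracy.

```python
# Sketch: supervised classification of "At Risk" vs "Low Risk" subjects.
# X_* are hypothetical (subjects x voxels) feature matrices.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_subjects, n_voxels = 40, 500                  # toy sizes, not the real data
X_train = rng.standard_normal((n_subjects, n_voxels))
y_train = np.repeat([1, -1], n_subjects // 2)   # +1 = At Risk, -1 = Low Risk

clf = LinearSVC(C=1.0, max_iter=10000)          # learns the predictive function w
clf.fit(X_train, y_train)                       # training phase

X_new = rng.standard_normal((1, n_voxels))      # a new, unseen subject
print("Prediction:", clf.predict(X_new))        # testing phase: At Risk / Low Risk

# On a labelled test set, accuracy is the output metric:
X_test = rng.standard_normal((10, n_voxels))
y_test = rng.choice([1, -1], size=10)
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```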

Page 4

Clinical questions in psychiatry:
✓ Predict symptom intensity from brain scans.
✓ Predict personality traits from brain scans.

Supervised Learning Framework: regression

Training subjects with continuous scores -> Training -> Predictive function

New subject -> Testing -> Prediction: Score = 23

Page 5

Limitations

label: patient / control (+1 / -1), derived from clinical assessments

•  Patient groups are heterogeneous -> categorical labels are unreliable.
•  Clinical/behavioural information is not embedded in the model.

Page 6

NIMH Research Domain Criteria (RDoC) framework

•  Diagnostic categories based on clinical assessments fail to align with findings from clinical neuroscience and genetics, and have not been predictive of treatment response.

•  “Develop new ways of classifying mental disorders based on dimensions of observable behavior and neurobiological measures.”

Page 7

Multi-source learning

Multivariate associative models and multi-source predictive models -> better characterization of mental health disorders -> better diagnosis and prognosis

Page 8

Classification/regression model

•  Predictive function (w)
•  Output metric: accuracy / MSE

f(x*) = x* · w

A new example x* is projected onto the weight vector w; the predicted label f(x*) is compared with the real label y* to give an error/accuracy measure.

Page 9

Associative models: PLS/CCA

•  Associative effects (u, v)
•  Output metric: correlation / covariance

A new example pair (x*, y*) is projected onto the weight vectors, giving x* · u and y* · v; the output is the correlation between the projections.

Page 10

Associative models

•  Partial Least Squares (PLS) and Canonical Correlation Analysis (CCA) find directions (weight vectors) that maximize the covariance or correlation between the projections of two types of data (e.g. brain and behavioural).

Partial Least Squares (PLS):
max_{u,v} Cov(Xu, Yv) = uᵀXᵀYv
subject to ∥u∥₂ = 1, ∥v∥₂ = 1

Canonical Correlation Analysis (CCA):
max_{u,v} Corr(Xu, Yv) = uᵀXᵀYv
subject to uᵀXᵀXu = 1 and vᵀYᵀYv = 1

X: matrix containing neuroimaging information
Y: matrix containing behavioural/clinical information
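A small numerical sketch of both objectives on toy, mean-centred matrices: for the PLS objective above, the first weight pair can be taken from the leading singular vectors of XᵀY, while CCA is available in scikit-learn. Variable names and sizes are illustrative assumptions.

```python
# PLS and CCA directions on toy data (columns assumed mean-centred).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n, p, q = 100, 20, 5                     # subjects, brain features, clinical items
X = rng.standard_normal((n, p))          # neuroimaging view
Y = rng.standard_normal((n, q))          # clinical/behavioural view
X -= X.mean(0); Y -= Y.mean(0)

# PLS: max u'X'Yv with ||u||2 = ||v||2 = 1  ->  leading singular vectors of X'Y.
U, s, Vt = np.linalg.svd(X.T @ Y, full_matrices=False)
u, v = U[:, 0], Vt[0, :]
print("PLS covariance of projections:", (X @ u) @ (Y @ v) / (n - 1))

# CCA: max corr(Xu, Yv) with variance constraints on the projections.
cca = CCA(n_components=1).fit(X, Y)
xs, ys = cca.transform(X, Y)
print("CCA correlation of projections:", np.corrcoef(xs[:, 0], ys[:, 0])[0, 1])
```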

Page 11

Multivariate Associative Effects

•  A pair of weight vectors u and v represents a multivariate associative effect between the two types of data.

•  Once the first pair is found, the associated effect can be removed from the data (by matrix deflation) and the same procedure can be applied to find additional associative effects.

•  The associative effects are ranked, since each weight vector pair will explain more covariance/correlation in the data than the following ones.
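A sketch of how one associative effect could be removed before estimating the next, assuming the cross-covariance matrix C = XᵀY is deflated with the projection-deflation scheme mentioned later in the talk; this is an illustrative implementation, not necessarily the exact update used in the paper.

```python
# Projection deflation of the cross-covariance matrix C = X'Y:
# project out the current weight pair (u, v), then estimate the next pair
# from the deflated matrix.
import numpy as np

def projection_deflate(C, u, v):
    """Return C with the (u, v) associative effect projected out."""
    Pu = np.eye(len(u)) - np.outer(u, u)     # projector orthogonal to u
    Pv = np.eye(len(v)) - np.outer(v, v)     # projector orthogonal to v
    return Pu @ C @ Pv

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
Y = rng.standard_normal((50, 4))
C = X.T @ Y

U, s, Vt = np.linalg.svd(C, full_matrices=False)
u1, v1 = U[:, 0], Vt[0, :]                    # first associative effect
C2 = projection_deflate(C, u1, v1)            # second effect is estimated from C2
u2 = np.linalg.svd(C2, full_matrices=False)[0][:, 0]
print("u1 . u2 =", float(u1 @ u2))            # ~0: successive effects are orthogonal
```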

Page 12

Weight vectors

•  The weight vectors u and v have the same dimensionality as the original data types.

•  By looking at the paired vectors, one can identify which features in each view are most related to each associative effect.

•  Sparse versions of PLS/CCA enable selection of only the features necessary to describe each associative effect.

Image weight (u)
Clinical weight (v): scores for clinical variables, e.g.
  Clinical Variable 1: 0.6
  Clinical Variable 2: 0
  Behavioral Variable 1: 0.4

Page 13

Latent Space

•  By projecting each dataset onto the latent space (i.e. onto u and v) one can see how the relationship between the data sources varies across the sample.

•  For example, how the brain-behaviour relationship varies in health and disease samples.

Latent space plot: neuroimaging data projected onto u (brain score) against clinical data projected onto v (clinical score).
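A sketch of that latent-space plot, assuming weight vectors u and v and the two data views are already available; matplotlib, the toy data, and the group labels are illustrative assumptions.

```python
# Project each view onto its weight vector and inspect the latent space.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.standard_normal((60, 10))     # neuroimaging view (toy)
Y = rng.standard_normal((60, 4))      # clinical view (toy)
u = np.linalg.svd(X.T @ Y)[0][:, 0]   # e.g. first PLS weight pair
v = np.linalg.svd(X.T @ Y)[2][0, :]

brain_score = X @ u                   # projection onto u
clinical_score = Y @ v                # projection onto v
group = rng.choice([0, 1], size=60)   # hypothetical health/disease labels

plt.scatter(brain_score, clinical_score, c=group, cmap="coolwarm")
plt.xlabel("Neuroimaging projected onto u (brain score)")
plt.ylabel("Clinical data projected onto v (clinical score)")
plt.title("Latent space")
plt.show()
```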

Page 14

Examples of SPLS/SCCA applications to neuroimaging

•  Le Floch et al. [2012]: associations between Single Nucleotide Polymorphisms (SNPs) and fMRI Regions of Interest (ROIs)

•  Avants et al. [2014]: associations between sub-scores of the Philadelphia Brief Assessment of Cognition (PBAC) questionnaire and structural MRI data

•  Rosa et al. [2015]: associations between two Arterial Spin Labelling (ASL) datasets from the same subjects under different drugs

Page 15

Challenges

•  How to find the optimal number of variables in each view to describe the multivariate associative effect?

•  How to test the significance of the multivariate associative effect?

Page 16

Journal of Neuroscience Methods 271 (2016) 182–194


A multiple hold-out framework for Sparse Partial Least Squares

João M. Monteiro, Anil Rao, John Shawe-Taylor, Janaina Mourão-Miranda, for the Alzheimer's Disease Neuroimaging Initiative

Department of Computer Science, University College London, London, UK; Max Planck University College London Centre for Computational Psychiatry and Ageing Research, University College London, London, UK

Highlights

• SPLS framework which tests model reliability by fitting it to several data splits.
• Framework was applied to brain anatomy and individual items of the MMSE score.
• The adequate number of voxels and clinical items was selected automatically.
• SPLS found two associative effects between sparse brain voxels and MMSE items.
• Projection deflation provided better results than a classical PLS deflation.

Article history: Received 18 December 2015; Received in revised form 10 June 2016; Accepted 15 June 2016; Available online 26 June 2016

Keywords: Machine learning; Sparse methods; Partial Least Squares; Neuroimaging; Mini-Mental State Examination; Dementia

Abstract

Background: Supervised classification machine learning algorithms may have limitations when studying brain diseases with heterogeneous populations, as the labels might be unreliable. More exploratory approaches, such as Sparse Partial Least Squares (SPLS), may provide insights into the brain's mechanisms by finding relationships between neuroimaging and clinical/demographic data. The identification of these relationships has the potential to improve the current understanding of disease mechanisms, refine clinical assessment tools, and stratify patients. SPLS finds multivariate associative effects in the data by computing pairs of sparse weight vectors, where each pair is used to remove its corresponding associative effect from the data by matrix deflation, before computing additional pairs.
New method: We propose a novel SPLS framework which selects the adequate number of voxels and clinical variables to describe each associative effect, and tests their reliability by fitting the model to different splits of the data. As a proof of concept, the approach was applied to find associations between grey matter probability maps and individual items of the Mini-Mental State Examination (MMSE) in a clinical sample with various degrees of dementia.
Results: The framework found two statistically significant associative effects between subsets of brain voxels and subsets of the questions/tasks.


•  Novel SPLS framework which:
   1.  Selects the adequate number of variables to describe each associative effect.
   2.  Tests their reliability by fitting the model to different splits of the data.

Page 17

•  Sparse PLS formulation (Witten et al., 2009).

max_{u,v} uᵀXᵀYv
subject to ∥u∥₂² ≤ 1, ∥v∥₂² ≤ 1, ∥u∥₁ ≤ c_u, ∥v∥₁ ≤ c_v

•  Nested cross-validation:
   •  Computationally expensive
   •  A large number of folds leads to high variance of the results
•  Multiple hold-out framework:
   •  Computationally more efficient
   •  Checks the reliability of the obtained solutions to data perturbation.
   •  Needs a large sample -> risk of false negatives (FN)

Page 18

Hold-out framework

J.M. Monteiro et al. / Journal of Neuroscience Methods 271 (2016) 182–194, p. 185

(a) Update u:
    i. u ← Cv
    ii. u ← S(u, Δu) / ∥S(u, Δu)∥₂, where Δu = 0 if this results in ∥u∥₁ ≤ cu; otherwise, Δu is set to be a positive constant such that ∥u∥₁ = cu.
(b) Update v:
    i. v ← Cᵀu
    ii. v ← S(v, Δv) / ∥S(v, Δv)∥₂, where Δv = 0 if this results in ∥v∥₁ ≤ cv; otherwise, Δv is set to be a positive constant such that ∥v∥₁ = cv.
4. Deflate C

where S(·, ·) is the soft-thresholding operator defined as S(a, λ) = sgn(a)(|a| − λ)₊, where λ > 0 is a constant and x₊ is equal to x if x > 0 and to 0 if x ≤ 0 (Witten et al., 2009). The initialisation of v in step 2 can be done in several ways (Witten et al., 2009; Parkhomenko et al., 2009; Waaijenborg et al., 2008); in this study, it was done by taking the first component of the SVD of C (Witten et al., 2009). Δu and Δv have to be set so that the l1-norm constraints are obeyed. This is done by iteratively searching for Δu and Δv, such that ∥u∥₁ ≈ cu and ∥v∥₁ ≈ cv.
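A sketch of the update above: soft-thresholding followed by l2 normalisation, with Δ found by a simple bisection so that the l1 constraint is met. The bisection, tolerances, and fixed iteration counts are illustrative choices, not the paper's exact search procedure.

```python
# Soft-thresholded, l2-normalised updates used in the SPLS iterations.
import numpy as np

def soft_threshold(a, lam):
    """S(a, lambda) = sgn(a) * (|a| - lambda)_+"""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def constrained_update(w, c):
    """Return S(w, delta)/||S(w, delta)||_2 with delta chosen so that ||.||_1 ~ c.

    Assumes w is not the zero vector and c >= 1 (as in the paper's grid).
    """
    out = w / np.linalg.norm(w)
    if np.linalg.norm(out, 1) <= c:                  # delta = 0 already satisfies it
        return out
    lo, hi = 0.0, np.abs(w).max()                    # bisection on delta
    for _ in range(100):
        mid = (lo + hi) / 2
        s = soft_threshold(w, mid)
        norm2 = np.linalg.norm(s)
        if norm2 > 0 and np.linalg.norm(s / norm2, 1) > c:
            lo = mid                                 # still too dense: threshold harder
        else:
            hi = mid                                 # too sparse (or all zero): back off
    s = soft_threshold(w, lo)                        # lo keeps at least one nonzero entry
    return s / np.linalg.norm(s)

def spls_pair(C, c_u, c_v, n_iter=100):
    """Alternate the u and v updates on the cross-covariance matrix C = X'Y."""
    v = np.linalg.svd(C, full_matrices=False)[2][0, :]   # initialise v from the SVD of C
    for _ in range(n_iter):                              # fixed alternations; a real
        u = constrained_update(C @ v, c_u)               # implementation would check
        v = constrained_update(C.T @ u, c_v)             # convergence instead
    return u, v
```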

In other SPLS algorithms, the sparsity is set by adjusting λ instead of the l1-norm constraints (Parkhomenko et al., 2009; Lê Cao et al., 2008). This makes the algorithms faster, since Δu and Δv do not have to be searched iteratively. However, by setting λ directly, there are situations in which this value might be too high and all the entries of u or v will be set to zero, i.e. no variables are included in the model. The exact value of λ for which this happens is dataset dependent. On the other hand, by using the constraints ∥·∥₁ ≤ cu and ∥·∥₁ ≤ cv (with cu, cv ≥ 1), there is a guarantee that at least one entry of the corresponding weight vector will be different from zero (i.e. at least one variable is included), making the range of the regularisation hyper-parameters easier to define.

2.3. Learning and validation framework

The proposed framework is divided into three main parts: hyper-parameter optimisation, statistical evaluation, and matrix deflation. These will be addressed in Sections 2.3.1–2.3.3, respectively.

2.3.1. Hyper-parameter optimisation
Several studies have used k-fold cross-validation to select the optimal model hyper-parameters using the correlation between the projections of the data onto the weight vectors as a metric (Parkhomenko et al., 2009). Since the number of samples available in neuroimaging datasets is usually small, the natural tendency would be to use more folds with a lower number of samples per fold, which increases the variance of the cross-validation results. To overcome this issue, the proposed framework uses an approach based on random subsampling of the data.

The proposed framework starts by removing 10% of the data randomly and keeping it as a hold-out dataset (Fig. 1), which will be used later for the statistical evaluation (Section 2.3.2). Then, the train/test dataset is randomly split 100 times into a training set (80% of the data) and a testing set (20% of the data). For each split, the model is trained on the training set, the testing data are projected onto the resulting weight vector pair, and the correlation between the projections of the two views is computed:

ρ_k = |Corr(X_k u^(−k), Y_k v^(−k))|     (5)

where X_k and Y_k denote the testing sets, and u^(−k) and v^(−k) are the weight vectors computed using the training data.

The average correlation over the K splits (where K = 100) for a specific hyper-parameter combination (c_u, c_v) is then computed using the arithmetic mean: ρ̄_(c_u,c_v) = (1/K) Σ_{k=1..K} ρ_k. This procedure is repeated for several hyper-parameter combinations spanning the full hyper-parameter range, and the combination with the highest average correlation is selected, i.e. a grid-search is performed. The selected hyper-parameter combination will then be used to train the models in the statistical evaluation step (Section 2.3.2).

Fig. 1. Hyper-parameter optimisation framework.

By doing this random subsampling procedure, we are increasing both the number of correlation values (ρ_k) used to compute the average correlation (ρ̄_(c_u,c_v)) and the size of the testing datasets, which should make the estimation of the average correlation per hyper-parameter combination more stable. Please note that the same random splits are performed for each hyper-parameter combination. Also, the grid-search is performed using 40 equidistant points in 1 ≤ c_u ≤ √p and 1 ≤ c_v ≤ √q, which makes a total of 1600 hyper-parameter combinations. The plots showing the average absolute correlation for different hyper-parameter values (hyper-parameter space) are provided in the supplementary material of this paper.
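A compact sketch of this subsampling-based grid search. It reuses the spls_pair() sketch from the earlier code block (so it is not standalone), and the grid size and split counts are toy values chosen for illustration.

```python
# Hyper-parameter optimisation by random 80/20 subsampling of the train/test data.
# Assumes spls_pair(C, c_u, c_v) from the earlier sketch.
import numpy as np

def mean_test_correlation(X, Y, c_u, c_v, n_splits=100, seed=0):
    rng = np.random.default_rng(seed)        # reusing the seed keeps the same splits
    n = X.shape[0]                           # for every (c_u, c_v) combination
    rhos = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        train, test = idx[: int(0.8 * n)], idx[int(0.8 * n):]
        u, v = spls_pair(X[train].T @ Y[train], c_u, c_v)
        rho = np.corrcoef(X[test] @ u, Y[test] @ v)[0, 1]
        rhos.append(abs(rho))                # rho_k = |Corr(X_k u, Y_k v)|
    return float(np.mean(rhos))

def grid_search(X, Y, n_grid=5):
    p, q = X.shape[1], Y.shape[1]
    grid_u = np.linspace(1, np.sqrt(p), n_grid)     # 1 <= c_u <= sqrt(p)
    grid_v = np.linspace(1, np.sqrt(q), n_grid)     # 1 <= c_v <= sqrt(q)
    scores = {(cu, cv): mean_test_correlation(X, Y, cu, cv)
              for cu in grid_u for cv in grid_v}
    return max(scores, key=scores.get)              # highest average |correlation|
```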

2.3.2. Statistical evaluation
When testing the statistical significance of an associative effect using a permutation test with a nested cross-validation framework, one has to re-train the model for every permutation, including the hyper-parameter optimisation step. Unfortunately, when dealing with very high-dimensional data, such as whole-brain images, this can be computationally prohibitive. In order to assess the statistical significance of the weight vector pairs without performing a hyper-parameter optimisation for each permutation, we propose an approach which uses hold-out datasets {X*, Y*}.

The statistical evaluation step is summarised in Fig. 2, which starts by training a model with all the train/test data using the optimal hyper-parameters selected in the previous step (Section 2.3.1).

Fig. 2. Permutation framework.


Page 19


Repeat 10 times

•  Correct the p-values for multiple comparisons.

•  The combined/omnibus hypothesis Homni is: “All the null hypotheses H are true”.

•  If any of the 10 p-values is statistically significant, then the omnibus hypothesis will be rejected.
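A small sketch of this omnibus decision rule, assuming one p-value per hold-out split and a Bonferroni-style threshold (0.05 divided by the 10 splits, consistent with the p > 0.005 criterion quoted from the paper).

```python
# Omnibus test over the 10 hold-out splits: reject H_omni if any split's p-value
# survives the corrected threshold 0.05 / 10 = 0.005.
def reject_omnibus(p_values, alpha=0.05):
    threshold = alpha / len(p_values)
    return any(p < threshold for p in p_values)

# First SPLS (u, v) pair, first column of Table 2 (values reported in the paper):
p_split = [0.0007, 0.0068, 0.0002, 0.0002, 0.0001,
           0.0005, 0.0001, 0.0016, 0.0004, 0.0001]
print(reject_omnibus(p_split))   # True -> H_omni rejected for this pair
```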

Page 20

•  Leave 10% of the data out as a hold-out set
•  Sample the remainder: 80% training and 20% test
•  Grid search for hyper-parameter optimisation using correlation as the metric (repeat 100 times)
•  Select the optimal hyper-parameters
•  Train the model using the optimal hyper-parameters; compute the hold-out data correlation
•  Permute one of the data matrices of the train/test set; train the model using the optimal hyper-parameters; compute the hold-out data correlation (repeat 10 000 times)
•  Compute the p-value by comparing the hold-out correlation with the permuted hold-out correlations
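A sketch of the permutation step on one hold-out split. It assumes spls_pair() from the earlier sketch and already-selected hyper-parameters c_u and c_v; the permutation count is reduced from 10 000 for illustration.

```python
# Permutation test for one hold-out split: compare the hold-out correlation of the
# model trained on the real train/test data against models trained on permuted data.
import numpy as np

def holdout_pvalue(X, Y, X_hold, Y_hold, c_u, c_v, n_perm=1000, seed=0):
    rng = np.random.default_rng(seed)
    u, v = spls_pair(X.T @ Y, c_u, c_v)                    # model on the real data
    rho_true = abs(np.corrcoef(X_hold @ u, Y_hold @ v)[0, 1])
    count = 0
    for _ in range(n_perm):
        Y_perm = Y[rng.permutation(Y.shape[0])]            # permute one of the views
        u_p, v_p = spls_pair(X.T @ Y_perm, c_u, c_v)       # retrain with the same hypers
        rho_p = abs(np.corrcoef(X_hold @ u_p, Y_hold @ v_p)[0, 1])
        count += rho_p >= rho_true
    return (count + 1) / (n_perm + 1)                      # permutation p-value
```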

Page 21

Brain data: sMRI

•  592 unique subjects from the ADNI:
   •  309 males (average age 74.68 ± 7.36)
   •  283 females (average age 72.18 ± 7.50)

•  T1-weighted MRI scans preprocessed in SPM12
•  Segmented into grey matter probability maps
•  Normalised using DARTEL
•  Converted to MNI space (2 x 2 x 2 mm)
•  Smoothed with a Gaussian filter with 2 mm FWHM
•  Mask to select voxels with > 10% grey matter probability
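A sketch of the final masking step, assuming the grey matter probability maps have already been stacked into a (subjects x voxels) array; the array name and toy sizes are hypothetical, and the 10% threshold follows the slide.

```python
# Keep only voxels whose average grey matter probability across subjects is >= 10%.
import numpy as np

# gm_maps: hypothetical (n_subjects x n_voxels) array of grey matter probabilities
gm_maps = np.random.default_rng(0).random((592, 1000))   # toy stand-in for real maps

mask = gm_maps.mean(axis=0) >= 0.10     # average probability over the whole dataset
X = gm_maps[:, mask]                    # masked neuroimaging view used by SPLS
print(X.shape)                          # (592, number of voxels kept)
```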

Page 22

192 J.M. Monteiro et al. / Journal of Neuroscience Methods 271 (2016) 182–194

projection deflation method has shown to provide statistically sig-nificant results in the second weight vector pair (which was notobserved when using the classic PLS deflation method), by tryingto enforce orthogonality between the sparse weight vector pairs.In addition, by fitting the model to different splits of the data, wewere able to obtain robust models and significance tests for SPLS.Finally, the different projections of the subjects onto the SPLS latentspace showed that no defined clusters could be found, however, thesubjects seemed to form a continuous distribution from healthiersubjects, to progressively worse cases of neurodegeneration.

Acknowledgements

The authors would like to acknowledge Prof. John Ashburner for his insightful comments on the original manuscript.

João M. Monteiro was supported by a PhD studentship awarded by Fundação para a Ciência e a Tecnologia (SFRH/BD/88345/2012).

Janaina Mourão-Miranda and Anil Rao were supported by the Wellcome Trust under grant no. WT102845/Z/13/Z.

Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir Inc.; Eisai Inc.; Elan Pharmaceuticals Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co. Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Appendix A. Mini-Mental State Examination

Table A1 gives a brief description of the questions/tasks performed during the MMSE (Folstein et al., 1975).

Appendix B. Weight vectors or associative effects

B.1. PLS

The average of the clinical weight vectors is presented in Fig. A1, and the corresponding image weight vector can be seen in Fig. A2. As expected, these are not sparse: all the available clinical variables and image voxels are included in the model. As one can see, the weights are higher in the hippocampus and amygdala regions. However, since these are weights and not p-values (such as the ones obtained in a mass-univariate statistical test), they cannot be thresholded.

Table A1. MMSE questions/tasks.

Domain: Orientation
1. What is today's date?
2. What year is it?
3. What month is it?
4. What day of the week is today?
5. What season is it?
6. What is the name of this hospital?
7. What floor are we on?
8. What town or city are we in?
9. What county (district) are we in?
10. What state are we in?

Domain: Registration
11. Name object (ball)
12. Name object (flag)
13. Name object (tree)
13a. Number of trials

Domain: Attention & calculation
14. D
15. L
16. R
17. O
18. W

Domain: Recall
19. Recall Ball
20. Recall Flag
21. Recall Tree

Domain: Language
22. Show a wrist watch and ask "What is this?"
23. Show a pencil and ask "What is this?"
24. Repeat a sentence
25. Takes paper in right hand
26. Folds paper in half
27. Puts paper on floor
28. Read and obey a command ("Close your eyes")
29. Write a sentence
30. Copy design

Fig. A1. Mean of clinical weight vector using PLS.

Fig. A2. Mean of image weight vectors using PLS.

Clinical data:
•  MMSE questionnaire items

Page 23

Results across different splits

J.M. Monteiro et al. / Journal of Neuroscience Methods 271 (2016) 182–194 187

may bring insights about their structure, which can potentially be used for patient stratification.

2.5. Dataset

The data used in the preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). For up-to-date information, see www.adni-info.org.

SPLS was applied to investigate the association between the grey matter maps and the individual scores of the questions/tasks of the MMSE, a widely used exam that is performed on patients with dementia (Folstein et al., 1975). The dataset consisted of a subset of 592 unique subjects from the ADNI: 309 males (average age 74.68 ± 7.36) and 283 females (average age 72.18 ± 7.50). These subjects were clinically labelled as being either healthy, suffering from MCI, or suffering from AD. The T1-weighted MRI scans were segmented into grey matter probability maps using SPM12, normalised using DARTEL, converted to MNI space with a voxel size of 2 mm × 2 mm × 2 mm, and smoothed with a Gaussian filter with 2 mm FWHM. A mask was then generated which selected voxels with an average probability of being grey matter equal to or higher than 10% for the whole dataset. This resulted in 168 130 voxels per subject being used.

Each question/task of the MMSE was coded in the following way: the subjects were given a score of 1 if the answer was correct, or the task was performed correctly; and a score of 2 if the answer was wrong, or the task was not performed correctly. The exam is conducted by a clinician and is divided into five categories, each containing different questions/tasks, which test five different cognitive domains (Folstein et al., 1975):

• Orientation (questions 1 to 10) — These are related with temporal and spatial orientation.

• Registration (questions 11 to 13) — The clinician names three objects and asks the patient to repeat all three. There is an extra question (13.a) in which the clinician writes down the number of trials that the subject had to take.

• Attention and Calculation (questions 14 to 18) — The subject is asked to spell the word "world" backwards (i.e. "D", "L", "R", "O", "W"). A score is attributed for each letter, and the subject is only given a good score if the letter is in the correct order.

• Recall (questions 19 to 21) — The subject is asked to name the three objects named before (questions 11 to 13).

• Language (questions 22 to 30) — These questions/tasks involve recognising and naming objects (e.g. naming a watch and a pencil), repeating a sentence, understanding verbal commands (e.g. "take a paper with the right hand", "fold it in half", "put it on the floor"), reading, writing, and drawing.

For a detailed list of the questions/tasks, please refer to Appendix A. All the features in both views (image and clinical) were mean-centered and normalised to have standard deviation equal to 1.

3. Results

3.1. Statistical significance testing

Table 1 shows the p-values obtained by using PLS with the proposed framework; as we can see, the omnibus hypothesis Homni

Table 1. PLS p-values computed with 10 000 permutations. All p-values are rounded to 4 decimal places.

PLS (u, v) pair
Split        PLS deflation                    Proj. deflation
             1        2        3              2        3
1            0.0690   0.0193   0.8619         0.4635   0.9853
2            0.2825   0.0655   0.0422         0.0323   0.3175
3            0.0120   0.2718   0.0173         0.4599   0.0609
4            0.0902   0.4255   0.4968         0.0742   0.4836
5            0.0924   0.9607   0.9855         0.9152   0.2106
6            0.0844   0.3984   0.1593         0.3412   0.5270
7            0.0866   0.1860   0.7745         0.9767   0.8342
8            0.0894   0.0479   0.1417         0.3052   0.6869
9            0.1233   0.1396   0.3932         0.3170   0.5775
10           0.0224   0.0289   0.1805         0.9831   0.7565
Rej. Homni   No       No       No             No       No

(Section 2.3.2) could not be rejected, i.e. p > 0.005 for all the splits. Although the proposed framework would stop as soon as a statistically significant weight vector pair could not be found, i.e. in the first associative effect for the considered dataset, the algorithm was run until 3 associative effects were found, in order to assess how the different deflation methods behave with PLS and SPLS.

The p-values obtained with SPLS can be seen in Table 2. In this case, Homni was rejected twice: for the first and second associative effects using projection deflation. No statistically significant results were obtained when using a PLS deflation.

3.2. Generalisability of the weight vectors

Fig. 3 shows the average absolute correlation on the 10 hold-out datasets obtained with both PLS and SPLS, using the two types of deflation. The average absolute correlation on the hold-out datasets exhibits a consistent downward trend, and seems to be higher when PLS deflation is applied with PLS. However, when SPLS is used, projection deflation seems to perform better, exhibiting higher average correlation values on the hold-out datasets, and having smaller standard deviation (which is reflected by the smaller error bars).

3.3. Weight vectors or associative effects

3.3.1. PLS
Since PLS was not able to reject the omnibus hypothesis, no weight vectors are presented in this section. For comparative purposes, the average of the weight vectors for the first effect can be seen in Appendix B.1.

Table 2. SPLS p-values computed with 10 000 permutations (statistically significant results are shown in bold in the original). All p-values are rounded to 4 decimal places.

SPLS (u, v) pair
Split        PLS deflation                    Proj. deflation
             1        2        3              2        3
1            0.0007   0.2476   0.3754         0.0376   0.0583
2            0.0068   0.2365   0.3585         0.0041   0.1769
3            0.0002   0.6051   0.0460         0.5298   0.5029
4            0.0002   0.9637   0.2013         0.0509   0.2841
5            0.0001   0.5711   0.9273         0.0012   0.3978
6            0.0005   0.6613   0.1107         0.0782   0.3267
7            0.0001   0.6073   0.3526         0.0256   0.0066
8            0.0016   0.9777   0.4515         0.0405   0.1126
9            0.0004   0.0713   0.4301         0.0002   0.0692
10           0.0001   0.1618   0.1817         0.2745   0.4399
Rej. Homni   Yes      No       No             Yes      No


Page 24

188 J.M. Monteiro et al. / Journal of Neuroscience Methods 271 (2016) 182–194

Fig. 3. Average absolute correlation on the hold-out datasets.

3.3.2. SPLS
Unlike PLS, SPLS found statistically significant sparse weight vectors, representing associative effects between the clinical (Fig. 4) and image views (Fig. 5). This section will only present the statistically significant weight vectors, which were obtained using projection deflation. For comparative purposes, the averages of the weight vectors for the second effect using PLS deflation and projection deflation are presented in Appendix B.2.

3.3.2.1. First associative effect. As previously mentioned, each weight vector pair represents a multivariate associative effect between the two views (brain voxels and clinical variables), i.e. the clinical weight vector will show a subset of clinical variables associated with a subset of brain voxels displayed in the image weight vector. Fig. 4(a) shows the first clinical weight vector. It is possible to see that only 15 out of 31 clinical variables were selected. These belonged mainly to the "Orientation", "Attention and Calculation", and "Recall" domains. One variable was selected in the "Language" domain. The corresponding first image weight vector can be seen in Fig. 5(a). As we can see, the weight map is very sparse and the regions found have been previously associated with memory (e.g. hippocampus and amygdala) (Jack et al., 2000).

Fig. 4. (a) First clinical weight vector; (b) Second clinical weight vector. The sign of the second weight vector was inverted for visualisation only (in order to be consistent with the first weight vector pair).

Using the Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002), it is possible to summarise the image weight vectors by ranking the regions of the atlas by their average absolute weight value. The average was used to take into account the different atlas region sizes, i.e. the larger the fraction of voxels equal to zero in a region is, the lower the average absolute weight in that region will be. Table 3 shows the top 10 regions for the first image weight vector. For the complete list of regions, please refer to the supplementary material of the paper.
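A sketch of that atlas summary, assuming a voxel-wise image weight vector and an array of atlas region labels of the same length; both arrays and their names are hypothetical toy stand-ins, and regions are ranked by their average absolute weight.

```python
# Rank atlas regions by the average absolute weight of their voxels.
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(1000) * (rng.random(1000) < 0.05)   # sparse toy image weights
labels = rng.integers(1, 11, size=1000)                     # toy atlas label per voxel

def rank_regions(weights, region_labels):
    regions = np.unique(region_labels)
    mean_abs = {int(r): float(np.abs(weights[region_labels == r]).mean())
                for r in regions}
    # averaging penalises regions in which most voxels have zero weight
    return sorted(mean_abs.items(), key=lambda kv: kv[1], reverse=True)

for region, score in rank_regions(u, labels)[:10]:
    print(region, round(score, 4))
```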

3.3.2.2. Second associative effect. The second clinical weight vector was not as sparse as the previous one (Fig. 4(b)): 28 out of 31 variables were selected. The magnitudes of the weights for the "Recall" domain are substantially smaller than on the previous weight vector pair, while the absolute values of the weights on the "Registration", "Attention and Calculation", and "Language" domains are greater. The voxels found by the second image weight vector (Fig. 5(b) and (d)) were less localised than the ones in the first image weight vector; these were present mostly in the temporal lobes, hippocampus, and amygdala. The second associative effect seems to capture an association between all domains of the MMSE score and mainly temporal regions in the left brain hemisphere.

The top 10 regions for the second image weight vector can be seen in Table 4. For the complete list of regions, please refer to the supplementary material of the paper.

Please note that most voxels in Fig. 5(a) and (b) have positive weights, while most entries of the corresponding clinical weight vectors have negative weights (Fig. 4(a) and (b)). This means that both effects follow the same tendency: high grey matter density (high image weights) is associated with generally low values in

Table 3. Top 10 atlas regions for the first image weight vector.

Atlas region          # voxels found
Amygdala L            98
Amygdala R            90
Hippocampus R         175
Hippocampus L         152
ParaHippocampal R     92
ParaHippocampal L     44
Lingual L             9
Precuneus L           2
Precuneus R           1
Temporal Pole Sup L   1

188 J.M. Monteiro et al. / Journal of Neuroscience Methods 271 (2016) 182–194

Fig. 3. Average absolute correlation on the hold-out datasets.

3.3.2. SPLSUnlike PLS, SPLS found statistically significant sparse weight

vectors, representing associative effects between clinical (Fig. 4)and image views (Fig. 5). This section will only present the sta-tistically significant weight vectors, which were obtained usingprojection deflation. For comparative purposes, the averages ofthe weight vectors for the second effect using PLS deflation andprojection deflation are presented in Appendix B.2.

3.3.2.1. First associative effect. As previously mentioned, each weight vector pair represents a multivariate associative effect between the two views (brain voxels and clinical variables), i.e. the clinical weight vector shows the subset of clinical variables associated with the subset of brain voxels displayed in the image weight vector. Fig. 4(a) shows the first clinical weight vector. Only 15 out of 31 clinical variables were selected, belonging mainly to the "Orientation", "Attention and Calculation", and "Recall" domains; one variable was selected in the "Language" domain. The corresponding first image weight vector can be seen in Fig. 5(a). The weight map is very sparse, and the regions found have previously been associated with memory (e.g. hippocampus and amygdala) (Jack et al., 2000).

Fig. 4. (a) First clinical weight vector; (b) second clinical weight vector. The sign of the second weight vector was inverted for visualisation only (to be consistent with the first weight vector pair).

Using the Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al., 2002), it is possible to summarise the image weight vectors by ranking the regions of the atlas by their average absolute weight value. The average is used to take the different atlas region sizes into account, i.e. the larger the fraction of voxels equal to zero in a region, the lower the average absolute weight in that region will be. Table 3 shows the top 10 regions for the first image weight vector. For the complete list of regions, please refer to the supplementary material of the paper.
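This region ranking can be reproduced in a few lines. The sketch below is a hypothetical illustration assuming a voxel-wise weight map and an AAL label image resampled to the same grid; the use of nibabel, the file paths and the label dictionary are assumptions for illustration, not taken from the paper's code.

    import numpy as np
    import nibabel as nib

    def rank_regions(weight_img, atlas_img, label_names):
        """Rank atlas regions by the average absolute weight over all voxels in each region.
        Averaging over all voxels (zeros included) pushes down regions where only a small
        fraction of voxels was selected, as described in the text."""
        w = nib.load(weight_img).get_fdata()
        atlas = nib.load(atlas_img).get_fdata().astype(int)
        scores = {}
        for label, name in label_names.items():
            mask = atlas == label
            if mask.any():
                scores[name] = float(np.abs(w[mask]).mean())
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    # Hypothetical usage (paths and label values are placeholders):
    # top10 = rank_regions("spls_u1.nii", "aal.nii", {1: "Amygdala_L", 2: "Amygdala_R"})[:10]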


Table 3. Top 10 atlas regions for the first image weight vector.

Atlas region          # voxels found
Amygdala L            98
Amygdala R            90
Hippocampus R         175
Hippocampus L         152
ParaHippocampal R     92
ParaHippocampal L     44
Lingual L             9
Precuneus L           2
Precuneus R           1
Temporal Pole Sup L   1

First associative effect: memory related

Clinical weights - v1

Brain weights - u1


Fig. 5. (a) First image weight vector; (b) second image weight vector; (c) 3D visualisation of the features selected for the first image weight vector; (d) 3D visualisation of the features selected for the second image weight vector. Red regions denote positive weights and blue regions denote negative weights (a very small region on the second weight vector). The sign of the second weight vector was inverted for visualisation purposes only (to be consistent with the first weight vector pair).



Second associative effect: "semantic" related

Clinical weights - v2

Brain weights - u2


3.3.2.2. Second associative effect. The second clinical weight vector was not as sparse as the previous one (Fig. 4(b)): 28 out of 31 variables were selected. The magnitudes of the weights for the "Recall" domain are substantially smaller than in the previous weight vector pair, while the absolute values of the weights on the "Registration", "Attention and Calculation", and "Language" domains are greater. The voxels found by the second image weight vector (Fig. 5(b) and (d)) were less localised than those in the first image weight vector and were present mostly in the temporal lobes, hippocampus, and amygdala. The second associative effect seems to capture an association between all domains of the MMSE score and mainly temporal regions of the left hemisphere.

The top 10 regions for the second image weight vector can be seen in Table 4. For the complete list of regions, please refer to the supplementary material of the paper.

Please note that most voxels in Fig. 5(a) and (b) have positive weights, while most weights in the corresponding clinical weight vectors are negative (Fig. 4(a) and (b)). This means that both effects follow the same tendency: high grey matter density (high image weights) is associated with generally low values in the clinical questions/tasks (i.e. the task was performed correctly, Section 2.5), and vice versa.



3.4. Projection onto the SPLS latent space

All the available data were projected onto the weight vector pairs computed using SPLS, in order to provide insights about structure in the data and to potentially stratify patients (Section 2.4).

Table 4. Top 10 atlas regions for the second image weight vector.

Atlas region          # voxels found
Amygdala L            36
Temporal Inf L        292
Hippocampus L         88
Amygdala R            11
ParaHippocampal L     53
Fusiform L            78
Temporal Inf R        64
Hippocampus R         22
Occipital Inf L       12
Temporal Mid L        76

Since PLS was not able to find statistically significant weight vector pairs, the projections for this method will not be presented.

Fig. 6(a) shows the projection of the data onto both SPLS image weight vectors, while Fig. 6(b) shows the projection of the data onto both SPLS clinical weight vectors. Each point represents the projection of one subject's data onto the subspace defined by the weight vector pair, coloured according to the clinical diagnosis. The horizontal axes correspond to the projections onto the first weight vector, and the vertical axes correspond to the projections onto the second.

As we can see, there are no well-defined clusters; however, there seems to be a continuous distribution of subjects from lower to higher degrees of neurodegeneration.
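A minimal sketch of this projection step, with random data standing in for the grey matter maps and MMSE items (all variable names and sizes are illustrative): each subject is reduced to a brain score Xu and a clinical score Yv, which can then be plotted against each other and coloured by diagnosis, as in Fig. 6.

    import numpy as np
    import matplotlib.pyplot as plt

    def latent_scores(X, Y, u, v):
        """Project each subject onto a weight vector pair: brain score = X u, clinical score = Y v."""
        return X @ u, Y @ v

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5000))     # 100 subjects x 5000 voxels (stand-in for grey matter maps)
    Y = rng.normal(size=(100, 31))       # 100 subjects x 31 clinical items (stand-in for MMSE items)
    u = rng.normal(size=5000)            # stand-in for an SPLS image weight vector
    v = rng.normal(size=31)              # stand-in for an SPLS clinical weight vector
    diagnosis = rng.choice(["Healthy", "MCI", "Dementia"], size=100)

    brain_score, clinical_score = latent_scores(X, Y, u, v)
    for group in ["Healthy", "MCI", "Dementia"]:
        sel = diagnosis == group
        plt.scatter(brain_score[sel], clinical_score[sel], label=group, s=10)
    plt.xlabel("Brain score (Xu)")
    plt.ylabel("Clinical score (Yv)")
    plt.legend()
    plt.show()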



Projections across data sources (first effect, second effect)


Table B.3: Atlas regions for the second image weight map. Only regions with selected voxels are shown.

Atlas region          # voxels found
Amygdala_L            36
Temporal_Inf_L        292
Hippocampus_L         88
Amygdala_R            11
ParaHippocampal_L     53
Fusiform_L            78
Temporal_Inf_R        64
Hippocampus_R         22
Occipital_Inf_L       12
Temporal_Mid_L        76
Temporal_Mid_R        36
Heschl_R              1
Precuneus_L           17
Angular_R             4
Occipital_Mid_L       14
Temporal_Pole_Mid_L   1
Occipital_Mid_R       3
ParaHippocampal_R     4
Cingulum_Mid_L        5
Angular_L             2
Temporal_Pole_Sup_R   1
Insula_R              2
Precentral_R          2
Insula_L              4
Parietal_Inf_L        1
Lingual_L             2
Parietal_Inf_R        1
Caudate_L             1
Thalamus_L            1

[Figure B.8: Projection of the data onto the SPLS weight vector pairs. (a) Projection of the data onto the first weight vector pair {u1, v1}; (b) projection of the data onto the second weight vector pair {u2, v2}. Axes: brain data projected onto u (brain score, Xu) vs. clinical data projected onto v (clinical score, Yv); subjects are labelled Healthy, MCI or Dementia.]



Fig. 6. (a) Projection of the image data onto the image weights; (b) projection of the clinical data onto the clinical weights.

4. Discussion

Our results show that the proposed SPLS framework was able to detect two statistically significant associative effects between grey matter maps and individual questions/tasks of the MMSE score when using sparsity constraints on both views. To the best of our knowledge, this has not been previously shown. These results are particularly interesting because the information encoded at the individual question/task level is very noisy, yet it also expresses more subtle effects in the data than a summarised final exam score. The first effect captured an association mainly between the "Orientation", "Attention and Calculation" and "Recall" domains on the clinical view and brain regions such as the amygdala and hippocampus, while the second effect captured an association between most clinical variables and regions mainly in the left hemisphere, including temporal regions. These results were achieved by imposing sparsity on both views and without using any a priori assumption regarding data structure, which may be useful when such an assumption is not available. Moreover, the projection of the subjects onto the latent SPLS space showed a consistent distribution of subjects from lower to higher degrees of neurodegeneration.

Projection deflation was shown to provide more reliable weight vectors than the commonly used PLS deflation method. When comparing the different deflation approaches, the results showed that only with projection deflation was it possible to find a second statistically significant associative effect with SPLS. Moreover, projection deflation provided a higher average correlation on the hold-out datasets, i.e. the model generalises better to unseen data.
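The two deflation schemes can be written compactly. The sketch below uses the generic textbook formulations (projection deflation removes the component of the data matrix along the weight vector itself, whereas classical PLS deflation regresses the data on the latent score and subtracts the fit) and may differ in detail from the paper's implementation.

    import numpy as np

    def projection_deflation(X, u):
        """Projection deflation: X <- X (I - u u^T), removing the component of X along u."""
        u = u / np.linalg.norm(u)
        return X - np.outer(X @ u, u)

    def pls_deflation(X, u):
        """Classical PLS deflation: compute the score t = X u, regress X on t and subtract the fit."""
        t = X @ u
        p = X.T @ t / (t @ t)        # loading vector
        return X - np.outer(t, p)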

The proposed framework was also tested with PLS. Our results showed that SPLS performed better than PLS, not only by being able to find statistically significant associative effects, but also by improving the interpretability of the weight vector pairs through their sparsity and by generalising better to unseen data (demonstrated by the higher average correlation obtained on the hold-out datasets).

4.1. Multiple hold-out framework

In the present study, we proposed an SPLS framework which uses multiple random splits of the data into training and hold-out sets. By performing a significance test on each random split, the framework checks how robust the weight vector estimation is to data perturbations, making it more reliable than approaches based on a single split.
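The overall logic of the framework can be summarised as follows. This is a simplified skeleton based on the description above; the split proportion, the number of splits and permutations, the permutation scheme and the helper functions fit_spls and select_hyper are illustrative assumptions rather than the paper's exact procedure.

    import numpy as np

    def holdout_correlation(X_tr, Y_tr, X_ho, Y_ho, fit_spls, hyper):
        """Fit a weight vector pair on the training split and correlate the hold-out projections."""
        u, v = fit_spls(X_tr, Y_tr, hyper)
        return abs(np.corrcoef(X_ho @ u, Y_ho @ v)[0, 1])

    def multiple_holdout_test(X, Y, fit_spls, select_hyper, n_splits=10, n_perm=1000, seed=0):
        """For each random split: select hyper-parameters on the training data, then compare the
        hold-out correlation against a permutation null (training rows of Y shuffled and refitted)."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        p_values = []
        for _ in range(n_splits):
            idx = rng.permutation(n)
            ho, tr = idx[:n // 10], idx[n // 10:]        # e.g. 10% hold-out per split
            hyper = select_hyper(X[tr], Y[tr])           # e.g. subsampling-based selection
            r_true = holdout_correlation(X[tr], Y[tr], X[ho], Y[ho], fit_spls, hyper)
            null = [holdout_correlation(X[tr], Y[rng.permutation(tr)], X[ho], Y[ho], fit_spls, hyper)
                    for _ in range(n_perm)]
            p_values.append((1 + sum(r >= r_true for r in null)) / (n_perm + 1))
        return p_values      # to be combined across splits (e.g. with a Bonferroni-style rule)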

The estimation of the sparsity levels for both views without a priori assumptions allows for greater flexibility when trying to find the best model to describe a particular associative effect in the data. Moreover, these levels are not fixed across weight vector pairs, which means that each associative effect will be described by an appropriate level of sparsity in each view, i.e. the proposed approach will select the number of voxels and clinical variables needed to describe each associative effect.

One of the main advantages of the proposed framework compared with the more widespread nested cross-validation approach is its computational speed. Nested cross-validation consists of a two-level cross-validation in which, for each training fold, an inner cross-validation procedure is performed for every hyper-parameter combination, in order to select the optimal combination to be applied in the outer fold. This is a computationally intensive procedure, since hyper-parameter selection has to be repeated for each permutation during the statistical evaluation. Even if nested cross-validation were applied to this problem with a small number of folds (5 inner folds and 5 outer folds) and permutations (1000), it would require 40 045 005 SPLS computations, whereas the proposed framework with 100 subsamples (which should provide a more stable hyper-parameter selection) and 10 000 permutations computed SPLS 1 700 010 times, approximately 4% of the number of computations that would have been necessary with nested cross-validation. For the details of how these values are calculated, please refer to Appendix C of the paper.

Witten et al. proposed a hyper-parameter optimisation procedure based on permutations in which, for each hyper-parameter combination, the p-value of the correlation using all the data is computed and the combination with the lowest p is selected (Witten and Tibshirani, 2009). This method chooses the hyper-parameters for which the distance between the true correlation and the null distribution of correlations is largest; however, these might not be the hyper-parameters that maximise the correlation of the projections on test data, which is what the proposed subsampling approach aims to achieve (Section 2.3.1). Moreover, the permutation-based method may require a very large number of permutations in order to enable correction for multiple comparisons.
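For contrast, a sketch of the kind of subsampling-based selection referred to above: each hyper-parameter combination is scored by the average correlation of the test projections over random subsamples, and the best-scoring combination is kept. The 80/20 split, the number of subsamples and the fit_spls helper are assumptions, not the paper's settings (see Section 2.3.1 of the paper for the actual procedure).

    import numpy as np

    def select_hyper_by_subsampling(X, Y, fit_spls, grid, n_subsamples=100, seed=0):
        """Score each hyper-parameter combination by its average absolute test correlation
        over random subsample splits, and return the best-scoring combination."""
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        best, best_score = None, -np.inf
        for hyper in grid:
            scores = []
            for _ in range(n_subsamples):
                idx = rng.permutation(n)
                te, tr = idx[:n // 5], idx[n // 5:]      # e.g. 80/20 train/test subsample
                u, v = fit_spls(X[tr], Y[tr], hyper)
                scores.append(abs(np.corrcoef(X[te] @ u, Y[te] @ v)[0, 1]))
            if np.mean(scores) > best_score:
                best, best_score = hyper, float(np.mean(scores))
        return best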

4.2. Previous SPLS/SCCA applications to structural MRI and clinical scores

To our knowledge, using SPLS to study effects in the brain by imposing sparsity on both voxels and individual questions/tasks of a cognitive exam has not been done before.


[Slide panels: projections onto the brain weights and projections onto the clinical weights. Left panel axes: brain data projected onto u1 (brain score 1) vs. brain data projected onto u2 (brain score 2). Right panel axes: clinical data projected onto v1 (clinical score 1) vs. clinical data projected onto v2 (clinical score 2).]

Projections within each data source


Summary

•  The supervised pattern recognition framework has limitations when applied to data with unreliable labels (e.g. categorical classification in psychiatry).

•  Alternative associative models that integrate multiple sources of information might provide new insights about psychiatric disorders and potentially help to better characterize patient groups.


Acknowledgements

Colleagues and collaborators:
•  Joao de Matos Monteiro, UCL, UK
•  Maria Joao Rosa, UCL, UK
•  Anil Rao, UCL, UK
•  Prof John Shawe-Taylor


References

•  Le Floch et al., 2012. Significant correlation between a set of genetic polymorphisms and a functional brain network revealed by feature selection and sparse partial least squares. NeuroImage 63 (1), 11–24.
•  Avants et al., 2014. Sparse canonical correlation analysis relates network-level atrophy to multivariate cognitive measures in a neurodegenerative population. NeuroImage.
•  Rosa et al., 2015. Estimating multivariate similarity between neuroimaging datasets with sparse canonical correlation analysis: an application to perfusion imaging. Frontiers in Neuroscience.
•  Monteiro et al., 2016. A multiple hold-out framework for sparse partial least squares. Journal of Neuroscience Methods. Code: github.com/jmmonteiro/spls

