Advanced Statistics Inference Methods & Issues: Multiple Testing, Nonparametrics, Conjunctions &...

Advanced Statistics

Inference Methods & Issues:

Multiple Testing, Nonparametrics, Conjunctions & Bayes

Thomas Nichols, Ph.D.

Department of Biostatistics, University of Michigan

http://www.sph.umich.edu/~nichols

OHBM fMRI Course June 12, 2005

NIH Neuroinformatics / Human Brain Project


© 2005 Thomas Nichols

2

Overview

• Multiple Testing Problem
  – Which of my 100,000 voxels are “active”?
• Nonparametric Inference
  – Can I trust my P-value at this voxel?
• Conjunction Inference
  – Is this voxel “active” in all of my tasks?
• Bayesian vs. Classical Inference
  – What in the world is a posterior probability?








4

Hypothesis Testing

• Null Hypothesis H0
• Test statistic T
  – t: observed realization of T
• α level
  – Acceptable false positive rate
  – Level α = P( T > u_α | H0 )
  – Threshold u_α controls false positive rate at level α
• P-value
  – Assessment of t assuming H0
  – P( T > t | H0 )
    • Prob. of obtaining a stat. as large or larger in a new experiment, if H0 is true
  – P(Data|Null) not P(Null|Data)

[Figure: null distribution of T with threshold u_α cutting off upper-tail area α; null distribution of T with the P-value shown as the upper-tail area beyond the observed t]
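As a rough illustration of the two quantities on this slide, the sketch below computes a threshold u_α and a P-value under the assumption that T is standard normal under H0. That distributional choice, the function names, and the observed value 2.5 are assumptions made for the example, not part of the slides.

```python
from statistics import NormalDist

# Assumption for this sketch: under H0 the statistic T is standard normal.
null_dist = NormalDist(mu=0.0, sigma=1.0)

def threshold_for_level(alpha):
    """Threshold u_alpha such that P(T > u_alpha | H0) = alpha."""
    return null_dist.inv_cdf(1.0 - alpha)

def upper_tail_p(t):
    """One-sided P-value P(T > t | H0) for an observed statistic t."""
    return 1.0 - null_dist.cdf(t)

u_05 = threshold_for_level(0.05)   # roughly 1.64
p_obs = upper_tail_p(2.5)          # hypothetical observed statistic
```

The P-value here is a statement about the data given the null, P(Data|Null); nothing in the calculation yields P(Null|Data).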


5

Hypothesis Testing in fMRI

• Massively Univariate Modeling
  – Fit model at each voxel
  – Create statistic images of effect

• Then find the signal in the image...


6

Assessing Statistic Images

Where’s the signal?

High Threshold (t > 5.5): Good Specificity, Poor Power (risk of false negatives)
Med. Threshold (t > 3.5)
Low Threshold (t > 0.5): Poor Specificity (risk of false positives), Good Power

...but why threshold?!


7

Blue-sky inference: What we’d like

• Don’t threshold, model the signal!
  – Signal location?
    • Estimates and CI’s on (x,y,z) location
  – Signal magnitude?
    • CI’s on % change
  – Spatial extent?
    • Estimates and CI’s on activation volume
    • Robust to choice of cluster definition
• ...but this requires an explicit spatial model

[Figure: signal profile over space, annotated with location (Loc.), extent (Ext.) and magnitude (Mag.)]


8

Blue-sky inference: What we need

• Need an explicit spatial model
• No routine spatial modeling methods exist
  – High-dimensional mixture modeling problem
  – Activations don’t look like Gaussian blobs
  – Need realistic shapes, sparse representation
• Some work by Hartvig et al., Penny et al.


9

Real-life inference: What we get

• Signal location
  – Local maximum – no inference
  – Center-of-mass – no inference
    • Sensitive to blob-defining-threshold
• Signal magnitude
  – Local maximum intensity – P-values (& CI’s)
• Spatial extent
  – Cluster volume – P-value, no CI’s
    • Sensitive to blob-defining-threshold


10

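Voxel-level Inference

• Retain voxels above α-level threshold u_α
• Gives best spatial specificity
  – The null hyp. at a single voxel can be rejected

[Figure: statistic profile over space with threshold u_α; where the profile exceeds u_α there are significant voxels, elsewhere no significant voxels]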


11

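Cluster-level Inference

• Two-step process
  – Define clusters by arbitrary threshold u_clus
  – Retain clusters larger than α-level threshold k_α

[Figure: statistic profile over space with cluster-forming threshold u_clus; a cluster with extent below k_α is not significant, a cluster with extent above k_α is significant]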
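The two-step process can be sketched on a one-dimensional statistic profile. Everything below is illustrative: the profile values, the cluster-forming threshold 2.5 and the extent threshold 4 are made-up numbers, and in practice k_α would come from random field theory or a permutation distribution rather than being set by hand.

```python
def find_clusters(stat, u_clus):
    """Return (start, end) index pairs (end exclusive) of contiguous runs with stat > u_clus."""
    clusters, start = [], None
    for i, value in enumerate(stat):
        if value > u_clus and start is None:
            start = i                      # a new cluster begins
        elif value <= u_clus and start is not None:
            clusters.append((start, i))    # the current cluster ends
            start = None
    if start is not None:                  # cluster running to the end of the profile
        clusters.append((start, len(stat)))
    return clusters

def significant_clusters(stat, u_clus, k_alpha):
    """Step 2: keep only clusters whose extent is larger than k_alpha."""
    return [(s, e) for s, e in find_clusters(stat, u_clus) if e - s > k_alpha]

# Hypothetical statistic profile over 10 positions in space.
profile = [0.1, 2.6, 3.1, 0.4, 2.9, 3.4, 3.8, 3.0, 2.7, 0.2]
clusters = find_clusters(profile, u_clus=2.5)               # two clusters, extents 2 and 5
kept = significant_clusters(profile, u_clus=2.5, k_alpha=4)  # only the larger one survives
```

Changing `u_clus` changes which clusters exist at all, which is why cluster-level results are sensitive to the cluster-defining threshold.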


12

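Cluster-level Inference

• Typically better sensitivity
• Worse spatial specificity
  – The null hyp. of entire cluster is rejected
  – Only means that one or more of the voxels in the cluster are active

[Figure: statistic profile over space with cluster-forming threshold u_clus; a cluster with extent below k_α is not significant, a cluster with extent above k_α is significant]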


14

Voxel-wise Inference & Multiple Testing Problem (MTP)

• Standard Hypothesis Test
  – Controls Type I error of each test, at say 5%
  – But what if I have 100,000 voxels?
    • 5,000 false positives on average!
• Must control false positive rate
  – What false positive rate?
  – Chance of 1 or more Type I errors?
  – Proportion of Type I errors?

[Figure: null distribution with 5% tail area]


15

MTP Solutions: Measuring False Positives

• Familywise Error Rate (FWER)
  – Familywise Error
    • Existence of one or more false positives
  – FWER is probability of familywise error
• False Discovery Rate (FDR)
  – R voxels declared active, V falsely so
    • Observed false discovery rate: V/R
  – FDR = E(V/R)


16

FWER MTP Solutions

• Bonferroni
• Maximum Distribution Methods
  – Random Field Theory
  – Permutation


17

FWER MTP Solutions: Bonferroni

• V voxels to test
• Corrected Threshold
  – Threshold corresponding to α = 0.05/V
• Corrected P-value
  – min{ P-value × V, 1 }
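A minimal sketch of the Bonferroni arithmetic, using the 100,000-voxel figure from the earlier slides. The function names and the three example P-values are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-voxel level that keeps the familywise error rate at or below alpha."""
    return alpha / n_tests

def bonferroni_corrected_p(p_values, n_tests):
    """Corrected P-value: min{ P-value * V, 1 } for each uncorrected P-value."""
    return [min(p * n_tests, 1.0) for p in p_values]

V = 100_000                                  # number of voxels tested
alpha_per_voxel = bonferroni_alpha(0.05, V)  # 0.05 / V
expected_fp_uncorrected = 0.05 * V           # about 5,000 if every voxel is null

# Hypothetical uncorrected P-values at three voxels.
corrected = bonferroni_corrected_p([1e-8, 2e-6, 0.01], V)
```

Only the first of the three example voxels stays below 0.05 after correction, which shows how severe the correction is when V is large.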





19

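FWER MTP Solutions: Controlling FWER w/ Max

• FWER & distribution of maximum
  FWER = P(FWE)
       = P(One or more voxels ≥ u | Ho)
       = P(Max voxel ≥ u | Ho)
• 100(1-α)%ile of max distn controls FWER
  FWER = P(Max voxel ≥ u_α | Ho) ≤ α

[Figure: null distribution of the maximum statistic, with threshold u_α cutting off upper-tail area α]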





21

FWER MTP Solutions: Random Field Theory

• Euler Characteristic χ_u
  – Topological Measure
    • #blobs - #holes
  – At high thresholds, just counts blobs
  – FWER = P(Max voxel ≥ u | Ho)
         = P(One or more blobs | Ho)
         ≈ P(χ_u ≥ 1 | Ho)     (no holes)
         ≈ E(χ_u | Ho)         (never more than 1 blob)

[Figure: a random field, a threshold, and the resulting suprathreshold sets]


22

RFT Details: Expected Euler Characteristic

$$E(\chi_u) \approx \lambda(\Omega)\,|\Lambda|^{1/2}\,(u^2 - 1)\,\exp(-u^2/2)\,/\,(2\pi)^2$$

  – Search region Ω ⊂ R³
  – λ(Ω): volume
  – |Λ|^{1/2}: roughness
• Assumptions
  – Multivariate Normal
  – Stationary*
  – ACF twice differentiable at 0
* Stationary
  – Results valid without stationarity
  – More accurate when stationarity holds

Only very upper tail approximates 1-Fmax(u)
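The expected Euler characteristic formula can be evaluated directly. The sketch below assumes a Gaussian (Z) field and expresses λ(Ω)|Λ|^{1/2} through a RESEL count via the standard relation |Λ|^{1/2} = (4 ln 2)^{3/2} / (FWHM_x FWHM_y FWHM_z); the RESEL count of 500 and the function names are hypothetical. Thresholds for t fields with few degrees of freedom, such as the t11 results quoted in these slides, are higher and are not reproduced by this Gaussian formula.

```python
import math

def expected_ec_gaussian_3d(u, resels):
    """Approximate E(chi_u) for a smooth 3D Gaussian field (3D term only).

    resels = volume / (FWHM_x * FWHM_y * FWHM_z), so that
    lambda(Omega) * |Lambda|^(1/2) = resels * (4 ln 2)^(3/2).
    """
    scale = resels * (4.0 * math.log(2.0)) ** 1.5
    return scale * (u * u - 1.0) * math.exp(-u * u / 2.0) / (2.0 * math.pi) ** 2

def rft_threshold(alpha, resels, lo=2.0, hi=10.0):
    """Bisection for u with E(chi_u) = alpha; E(chi_u) is decreasing for u > sqrt(3)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if expected_ec_gaussian_3d(mid, resels) > alpha:
            lo = mid   # expected EC still too large: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)

u_rft = rft_threshold(0.05, resels=500.0)   # hypothetical search volume of 500 RESELs
```

Because the formula depends on the data only through the RESEL count, doubling the search volume or halving the smoothness has the same effect on the threshold.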


25

Random Field Intuition

• Corrected P-value for voxel value t
  Pc = P(max T > t) ≈ E(χ_t) ∝ λ(Ω) |Λ|^{1/2} t² exp(-t²/2)   (for large t)
• Statistic value t increases
  – Pc decreases (of course!)
• Search volume λ(Ω) increases
  – Pc increases (more severe MCP)
• Smoothness increases (|Λ|^{1/2} smaller)
  – Pc decreases (less severe MCP)


26

Random Field Theory: Strengths & Weaknesses

• Closed form results for E(χ_u)
  – Z, t, F, Chi-Squared Continuous RFs
• Results depend only on volume & smoothness
• Smoothness assumed known
• Sufficient smoothness required
  – Results are for continuous random fields
  – Smoothness estimate becomes biased
• Multivariate normality
• Several layers of approximations

[Figure: lattice image data compared with a continuous random field]


27

Real Data

• fMRI Study of Working Memory
  – 12 subjects, block design (Marshuetz et al., 2000)
  – Item Recognition
    • Active: View five letters, 2s pause, view probe letter, respond
    • Baseline: View XXXXX, 2s pause, view Y or N, respond
• Second Level RFX
  – Difference image, A-B, constructed for each subject
  – One sample t test

[Figure: trial timelines. Active: letters UBKDA, then probe D, response “yes”. Baseline: XXXXX, then N, response “no”.]


28

Real Data: RFT Result

• Threshold
  – S = 110,776
  – 2 × 2 × 2 mm voxels
  – 5.1 × 5.8 × 6.9 mm FWHM
  – u = 9.870
• Result
  – 5 voxels above the threshold
  – 0.0063 minimum FWE-corrected p-value

[Figure: -log10 p-value image]


29

MTP Solutions: Measuring False Positives

• Familywise Error Rate (FWER)
  – Familywise Error
    • Existence of one or more false positives
  – FWER is probability of familywise error
• False Discovery Rate (FDR)
  – FDR = E(V/R)
  – R voxels declared active, V falsely so
    • Realized false discovery rate: V/R


30

False Discovery Rate

• For any threshold, all voxels can be cross-classified:

                              Accept Null   Reject Null   Total
  Null True (no effect)       V0A           V0R           m0
  Null False (true effect)    V1A           V1R           m1
  Total                       NA            NR            V

• Realized FDR
  rFDR = V0R/(V1R+V0R) = V0R/NR
  – If NR = 0, rFDR = 0
• But we can only observe NR; we don’t know V1R & V0R
  – We control the expected rFDR
    FDR = E(rFDR)


31

False Discovery Rate Illustration:

[Figure: three panels: Signal, Noise, Signal+Noise]


32

Control of Per Comparison Rate at 10%
  Percentage of Null Pixels that are False Positives:
  11.3% 11.3% 12.5% 10.8% 11.5% 10.0% 10.7% 11.2% 10.2% 9.5%

Control of Familywise Error Rate at 10%
  Occurrence of Familywise Error (realizations marked “FWE”)

Control of False Discovery Rate at 10%
  Percentage of Activated Pixels that are False Positives:
  6.7% 10.4% 14.9% 9.3% 16.2% 13.8% 14.0% 10.5% 12.2% 8.7%


33

Benjamini & Hochberg Procedure

• Select desired limit q on FDR
• Order p-values, p(1) ≤ p(2) ≤ ... ≤ p(V)
• Let r be largest i such that

$$p_{(i)} \le \frac{i}{V}\, q$$

• Reject all hypotheses corresponding to p(1), ... , p(r).

[Figure: ordered p-values p(i) plotted against i/V on the unit square, together with the line (i/V) × q]

JRSS-B (1995) 57:289-300
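The step-up rule above can be sketched in a few lines. The ten P-values are made up for illustration, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return (rejected flags in original order, r) for the BH step-up procedure."""
    V = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(V), key=lambda j: p_values[j])
    r = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank / V * q:
            r = rank               # keep the LARGEST rank that satisfies the bound
    rejected = [False] * V
    for j in order[:r]:            # reject p(1), ..., p(r)
        rejected[j] = True
    return rejected, r

# Hypothetical P-values from V = 10 tests, in arbitrary order.
p_vals = [0.212, 0.001, 0.041, 0.008, 0.039, 0.060, 0.074, 0.205, 0.042, 0.216]
rejected, r = benjamini_hochberg(p_vals, q=0.05)
```

With these numbers BH rejects the two smallest P-values (0.001 and 0.008), whereas a Bonferroni threshold of 0.05/10 = 0.005 would reject only the smallest.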


34

Benjamini & Hochberg Procedure Details

• Method is valid under smoothness
  – Positive Regression Dependency on Subsets

$$P(X_1 \ge c_1,\ X_2 \ge c_2,\ \ldots,\ X_k \ge c_k \mid X_i = x_i)\ \text{is non-decreasing in}\ x_i$$

    • Only required of test statistics for which null true
    • Special cases include
      – Independence
      – Multivariate Normal with all positive correlations
      – Same, but studentized with common std. err.
• For arbitrary covariance structure
  – Replace q with q / c(V)

$$c(V) = \sum_{i=1}^{V} \frac{1}{i} \approx \log(V) + 0.5772$$

Benjamini & Yekutieli (2001). Ann. Stat. 29:1165-1188
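To get a feel for how much the arbitrary-dependence correction costs, the sketch below compares the exact harmonic sum c(V) with the log approximation and computes the adjusted FDR limit q / c(V). The value V = 100,000 echoes the voxel count used earlier in the slides; the function name is hypothetical.

```python
import math

def by_constant(V):
    """c(V) = sum_{i=1}^{V} 1/i, the constant for arbitrary dependence."""
    return sum(1.0 / i for i in range(1, V + 1))

V = 100_000
c_exact = by_constant(V)
c_approx = math.log(V) + 0.5772   # approximation quoted on the slide
q_adjusted = 0.05 / c_exact       # FDR limit to use in place of q = 0.05
```

For an image of this size c(V) is about 12, so the adjusted limit is roughly a twelfth of the nominal q.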


35

Adaptiveness of Benjamini & Hochberg FDR

[Figure: ordered p-values p(i) plotted against fractional index i/V, both axes from 0 to 1. When no signal: P-value threshold α/V. When all signal: P-value threshold α.]

...FDR adapts to the amount of signal in the data


37

FDR Example

• Threshold
  – Indep/PosDep u = 3.83
• Result
  – 3,073 voxels above Indep/PosDep u
  – <0.0001 minimum FDR-corrected p-value

[Figure: FDR Threshold = 3.83, 3,073 voxels; FWER Perm. Thresh. = 7.67, 58 voxels]








39

Nonparametric Permutation Test

• Parametric methods
  – Assume distribution of statistic under null hypothesis
• Nonparametric methods
  – Use data to find distribution of statistic under null hypothesis
  – Any statistic!

[Figure: parametric null distribution with 5% upper tail; nonparametric null distribution with 5% upper tail]


40

Permutation Test: Toy Example

• Data from V1 voxel in visual stim. experiment
  A: Active, flashing checkerboard
  B: Baseline, fixation
  6 blocks, ABABAB
  Just consider block averages...
• Null hypothesis Ho
  – No experimental effect, A & B labels arbitrary
• Statistic
  – Mean difference

  A        B       A       B       A       B
  103.00   90.48   99.93   87.83   99.76   96.06


44


• Under Ho
  – Consider all equivalent relabelings
  – Compute all possible statistic values
  – Find 95%ile of permutation distribution

AAABBB 4.82 ABABAB 9.45 BAAABB -1.48 BABBAA -6.86

AABABB -3.25 ABABBA 6.97 BAABAB 1.10 BBAAAB 3.15

AABBAB -0.67 ABBAAB 1.38 BAABBA -1.38 BBAABA 0.67

AABBBA -3.15 ABBABA -1.10 BABAAB -6.97 BBABAA 3.25

ABAABB 6.86 ABBBAA 1.48 BABABA -9.45 BBBAAA -4.82
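The toy example can be reproduced by enumerating all 20 ways of assigning three A labels to the six blocks. This is a sketch that recomputes the statistics from the rounded block averages listed on the slide, so individual values need not agree with the tabulated entries in every digit; the variable names are hypothetical.

```python
from itertools import combinations

# Block averages from the toy example, in acquisition order (labels ABABAB).
blocks = [103.00, 90.48, 99.93, 87.83, 99.76, 96.06]
observed_a_blocks = (0, 2, 4)   # positions labelled A in the actual experiment

def mean_difference(a_idx):
    """Mean of the blocks labelled A minus mean of the blocks labelled B."""
    a = [blocks[i] for i in a_idx]
    b = [blocks[i] for i in range(len(blocks)) if i not in a_idx]
    return sum(a) / len(a) - sum(b) / len(b)

# All 20 relabelings with three A's and three B's.
relabelings = list(combinations(range(len(blocks)), 3))
perm_dist = [mean_difference(idx) for idx in relabelings]

observed = mean_difference(observed_a_blocks)
# Permutation P-value: proportion of relabelings with a statistic at least as large.
p_value = sum(stat >= observed for stat in perm_dist) / len(perm_dist)
```

The observed labelling gives the largest of the 20 mean differences, so the permutation P-value is 1/20 = 0.05, the smallest value this design can produce.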


45


[Figure: histogram of the permutation distribution of the mean difference, horizontal axis from -8 to 8]


46

Controlling FWER: Permutation Test

• Parametric methods
  – Assume distribution of max statistic under null hypothesis
• Nonparametric methods
  – Use data to find distribution of max statistic under null hypothesis
  – Again, any max statistic!

[Figure: parametric null max distribution with 5% upper tail; nonparametric null max distribution with 5% upper tail]


47

Permutation Test & Exchangeability

• Exchangeability is fundamental
  – Def: Distribution of the data unperturbed by permutation
  – Under H0, exchangeability justifies permuting data
  – Allows us to build permutation distribution
• Subjects are exchangeable
  – Under Ho, each subject’s A/B labels can be flipped
• fMRI scans are not exchangeable under Ho
  – If no signal, can we permute over time?
  – No, permuting disrupts order, temporal autocorrelation


48

Permutation Test & Exchangeability

• fMRI scans are not exchangeable
  – Permuting disrupts order, temporal autocorrelation
• Intrasubject fMRI permutation test
  – Must decorrelate data, model before permuting
  – What is correlation structure?
    • Usually must use parametric model of correlation
  – E.g. Use wavelets to decorrelate
    • Bullmore et al 2001, HBM 12:61-78
• Intersubject fMRI permutation test
  – Create difference image for each subject
  – For each permutation, flip sign of some subjects


52

Permutation Test Example

• fMRI Study of Working Memory
  – 12 subjects, block design (Marshuetz et al., 2000)
  – Item Recognition
    • Active: View five letters, 2s pause, view probe letter, respond
    • Baseline: View XXXXX, 2s pause, view Y or N, respond
• Second Level RFX
  – Difference image, A-B, constructed for each subject
  – One sample, smoothed variance t test

[Figure: trial timelines. Active: letters UBKDA, then probe D, response “yes”. Baseline: XXXXX, then N, response “no”.]


53


• Permute!
  – 2^12 = 4,096 ways to flip 12 A/B labels
  – For each, note maximum of t image.

[Figure: permutation distribution of maximum t; maximum intensity projection of thresholded t]
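The sign-flipping scheme can be sketched on a tiny made-up dataset. The numbers below (5 subjects, 3 voxels) are hypothetical difference values, not the working memory data, and an ordinary one-sample t statistic is used rather than the smoothed variance t. With 5 subjects there are 2^5 = 32 sign flips instead of 4,096.

```python
import math
from itertools import product

# Hypothetical A-B difference values: 5 subjects (rows) x 3 voxels (columns).
diffs = [
    [2.0,  0.30, -0.5],
    [2.5, -0.40,  0.6],
    [1.8,  0.10, -0.1],
    [2.2, -0.20,  0.2],
    [2.4,  0.25, -0.3],
]

def one_sample_t(values):
    """Ordinary one-sample t statistic for testing mean = 0."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n)

def t_image(data, signs):
    """t statistic at every voxel after multiplying each subject's image by its sign."""
    n_vox = len(data[0])
    return [one_sample_t([s * row[v] for s, row in zip(signs, data)])
            for v in range(n_vox)]

observed_t = t_image(diffs, [1] * len(diffs))

# For every sign flip, record the maximum of the t image.
max_dist = [max(t_image(diffs, signs))
            for signs in product([1, -1], repeat=len(diffs))]

# FWE-corrected P-value: proportion of permutation maxima at least as large as t.
p_fwe = [sum(m >= t for m in max_dist) / len(max_dist) for t in observed_t]
```

Comparing every voxel with the same distribution of the maximum is what gives familywise control: only the first voxel, whose observed t is the largest maximum over all 32 flips, attains the minimum corrected P-value of 1/32.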


54


• Compare with Bonferroni
  – α = 0.05/110,776
• Compare with parametric RFT
  – 110,776 2 × 2 × 2 mm voxels
  – 5.1 × 5.8 × 6.9 mm FWHM smoothness
  – 462.9 RESELs


55

t11 Statistic, RF & Bonf. Threshold
  uRF = 9.87
  uBonf = 9.80
  5 sig. vox.

t11 Statistic, Nonparametric Threshold
  uPerm = 7.67
  58 sig. vox.

Smoothed Variance t Statistic, Nonparametric Threshold
  378 sig. vox.

[Figure: Test Level vs. t11 Threshold]


Does this Generalize? RFT vs Bonf. vs Perm.

No. Significant Voxels (0.05 Corrected)

                              t                       SmVar t
Study                  df     RF      Bonf    Perm    Perm
Verbal Fluency          4      0         0       0       0
Location Switching      9      0         0     158     354
Task Switching          9      4         6    2241    3447
Faces: Main Effect     11    127       371     917    4088
Faces: Interaction     11      0         0       0       0
Item Recognition       11      5         5      58     378
Visual Motion          11    626      1260    1480    4064
Emotional Pictures     12      0         0       0       7
Pain: Warning          22    127       116     221     347
Pain: Anticipation     22     74        55     182     402








59

Conjunction Inference

• Consider several working memory tasks
  – N-Back tasks with different stimuli
  – Letter memory: D J P F D R A T F M R I B K
  – Number memory: 4 2 8 4 4 2 3 9 2 3 5 8 9 3 1 4
  – Shape memory: [shape stimuli shown as images]
• Interested in stimulus-generic response
  – What areas of the brain respond to all 3 tasks?
  – Don’t want areas that only respond in 1 or 2 tasks


61

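Conjunction Inference

• For working memory example, K=3...
  – Letters: effect H1, statistic T1
  – Numbers: effect H2, statistic T2
  – Shapes: effect H3, statistic T3
  – Test

$$H_0: \{H_1 = 0\} \cup \{H_2 = 0\} \cup \{H_3 = 0\}$$

    • At least one of the three effects not present
  – versus

$$H_A: \{H_1 = 1\} \cap \{H_2 = 1\} \cap \{H_3 = 1\}$$

    • All three effects present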


64

Conjunction Inference Methods: Friston et al

• Use the minimum of the K statistics
  – Idea: Only declare a conjunction if all of the statistics are sufficiently large
  – min_k T_k ≥ u only when T_k ≥ u for all k


71

Valid Conjunction Inference With the Minimum Statistic

• For valid inference, compare min stat to u_α
  – Assess min_k T_k image as if it were just T1
  – E.g. u_0.05 = 1.64 (or some corrected threshold)
• Equivalently, take intersection mask
  – Thresh. each statistic image at, say, 0.05 FWE corr.
  – Make mask: 0 = below thresh., 1 = above thresh.
  – Intersection of masks: conjunction-significant voxels
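The equivalence between thresholding the minimum statistic and intersecting the thresholded masks can be checked on a small made-up example. The three statistic "images" below hold hypothetical values at five voxels; the threshold 1.64 is the uncorrected example value from the slide, and a corrected threshold would be used in the same way.

```python
# Hypothetical statistic images for K = 3 tasks at five voxels.
t_maps = [
    [3.1, 0.4, 2.2, 1.9, 2.8],   # letters, T1
    [2.7, 2.9, 1.2, 2.0, 3.3],   # numbers, T2
    [1.9, 3.5, 2.5, 0.3, 2.1],   # shapes,  T3
]
u = 1.64   # threshold applied to every image

# Route 1: threshold the minimum statistic image as if it were a single T image.
min_stat = [min(voxel_values) for voxel_values in zip(*t_maps)]
conj_from_min = [m >= u for m in min_stat]

# Route 2: threshold each image separately, then intersect the masks.
masks = [[t >= u for t in t_map] for t_map in t_maps]
conj_from_masks = [all(voxel_flags) for voxel_flags in zip(*masks)]
```

Both routes flag the first and last voxels only; the second voxel has two large statistics but fails the conjunction because its letter-task statistic is small.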








73

Classical Statistics: Model

• Estimation
  – n = 12 subj. fMRI study
• Data at one voxel
  – Y, sample average % BOLD change
• Model
  – Y ~ N(μ, σ/√n)
  – μ is true population mean BOLD % change
  – Likelihood p(Y|μ)
    • Relative frequency of observing Y for one given value of μ

[Figure: likelihood of Y, p(Y|μ), plotted against Y]


74

Classical Statistics: MLE

• Estimating μ
  – Don’t know μ in practice
• Maximum Likelihood Estimation
  – Find μ that makes data most likely
  – The MLE (μ̂) is our estimate of μ
  – Here, the MLE of the population mean is simply the BOLD sample mean, y

[Figure: likelihood of Y, p(Y|μ), with the actual y observed in the experiment marked]


75

Classical Statistical Inference

• Level 95% Confidence Interval
  – Y ± 1.96 σ/√n
  – With many replications of the experiment, CI will contain μ 95% of the time

[Figure: likelihood of Y, p(Y|μ), with the observed CI marked]


76

Classical Statistics Redux

• Grounded in long-run frequency of observable phenomena
  – Data, over theoretical replications
  – Inference: Confidence intervals, P-values
• Estimation based on likelihood
• Parameters are fixed
  – Can’t talk about probability of parameters
  – P( Pop mean > 0 ) ???
    • True population mean % BOLD is either > 0 or not
    • Only way to know is to scan everyone in population


77

Bayesian Statistics

• Grounded in degrees of belief
  – “Belief” expressed with the grammar of probability
  – No problem making statements about unobservable parameters
• Parameters are regarded as random, not fixed
• Data is regarded as fixed, since you only have one dataset


78

Bayesian Essentials

• Prior Distribution
  – Expresses belief on parameters before seeing the data
• Likelihood
  – Same as with Classical
• Posterior Distribution
  – Expresses belief on parameters after seeing the data
• Bayes Theorem
  – Tells how to combine prior with likelihood (data) to create posterior

$$p(\mu \mid y) = \frac{p(y \mid \mu)\, p(\mu)}{\int p(y \mid \mu')\, p(\mu')\, d\mu'} \propto p(y \mid \mu)\, p(\mu)$$

Posterior ∝ Likelihood × Prior


79

Bayesian Statistics: From Prior to Posterior

• Prior p(μ) ~ N(μ0, τ)
  – μ0 = 0 %: a priori belief that activation & de-activation are equally likely
  – τ = 1 %: a priori belief that activation is small
• Data: y = 5 %
• Posterior
  – Somewhere between prior and likelihood

[Figure: prior, likelihood and posterior densities plotted against population mean % BOLD change, axis from -5 to 10]


80

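Bayesian Statistics: Posterior Inference

• All Inference based on posterior
• E.g. Posterior Mean (instead of MLE)

$$\hat{\mu}_{\text{post}} = \frac{1/\tau^2}{n/\sigma^2 + 1/\tau^2}\,\mu_0 \;+\; \frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2}\,y$$

  (Posterior Mean as a combination of the Prior Mean μ0 and the Data (Sample mean) y; τ is the prior standard deviation and σ/√n the standard error of y)
  – Weighted sum of prior & data mean
  – Weights based on prior & data precision

[Figure: prior, likelihood and posterior densities, axis from -5 to 10, with prior mean μ0 and sample mean y marked]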


81

Bayesian Inference: Posterior Inference

• But posterior is just another distribution
  – Can ask any probability question
    • E.g. “What’s the probability, after seeing the data, that μ > 0”, or P( μ > 0 | y )
  – Here P( μ > 0 | y ) ≈ 1
• “Credible Intervals”
  – Here 4 ± 0.9 has 95% posterior prob.
  – No reference to repetitions of the experiment

[Figure: posterior density, axis from -5 to 10]
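The prior-to-posterior calculation for a normal mean can be sketched as follows. The prior (mean 0 %, standard deviation 1 %) and the data value y = 5 % come from the slides; the standard error of 0.5 % is an assumption, chosen here because it reproduces the quoted credible interval of 4 ± 0.9. Function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_posterior(y, se, mu0, tau):
    """Posterior mean and sd of mu for y ~ N(mu, se^2) with prior mu ~ N(mu0, tau^2)."""
    prior_precision = 1.0 / tau ** 2
    data_precision = 1.0 / se ** 2          # equals n / sigma^2
    post_var = 1.0 / (prior_precision + data_precision)
    # Precision-weighted combination of prior mean and sample mean.
    post_mean = post_var * (prior_precision * mu0 + data_precision * y)
    return post_mean, math.sqrt(post_var)

# se = 0.5 is an assumed standard error (sigma / sqrt(n)), not stated on the slides.
post_mean, post_sd = normal_posterior(y=5.0, se=0.5, mu0=0.0, tau=1.0)

half_width = 1.96 * post_sd                                  # 95% credible interval half-width
prob_positive = 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)  # P( mu > 0 | y )
```

Under these assumptions the data carry four times the precision of the prior, so the posterior mean sits four fifths of the way from the prior mean to the sample mean.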



82

Bayesian vs. Classical

• Foundations
  – Classical: How observable statistics behave in long-run
  – Bayesian: Measuring belief about unobservable parameters
• Inference
  – Classical: References other possible datasets not observed
    • Requires awkward explanations for CI’s & P-values
  – Bayesian: Based on posterior, combination of prior and data
    • Allows intuitive probabilistic statements (posterior probabilities)


83

Bayesian vs. Classical

• Bayesian Challenge: Priors
  – I can set my prior to always find a result
  – “Objective” priors can be found; results then often similar to Classical inference
• When are the two similar?
  – When n large, the prior can be overwhelmed by likelihood
  – One-sided P-value ≈ posterior probability of μ ≤ 0, i.e. 1 - P( μ > 0 | y )
  – Doesn’t work with 2-sided P-value! [ P( μ ≠ 0 | y ) = 1 ]


84

Bayesian vs. Classical: SPM T vs SPM PPM

• Auditory experiment
  – SPM: Voxels with T > 5.5
  – PPM: Voxels with Posterior Probability > 0.95
• Qualitatively similar, but hard to equate thresholds

[Figure: SPM{T_39.0} maximum intensity projection; height threshold T = 5.50, extent threshold k = 0 voxels; design matrix and contrast shown alongside]
[Figure: PPM 2.06 maximum intensity projection; height threshold P = 0.95, extent threshold k = 0 voxels; design matrix and contrast shown alongside]

Slide: Alexis Roche, CEA, SHFJ


85

Conclusions

• Multiple Testing Problem
  – Choose a MTP metric (FDR, FWE)
  – Use a powerful method that controls the metric
• Nonparametric Inference
  – More power for small group FWE inferences
• Conjunction Inference
  – Use intersection mask, or treat min_k T_k as single T
• Bayesian Inference
  – Conceptually different, but simpler than Classical
  – Priors controversial, but objective ones can be used


86

References

• Multiple Testing Problem
  – Worsley, Marrett, Neelin, Vandal, Friston and Evans. A Unified Statistical Approach for Determining Significant Signals in Images of Cerebral Activation. Human Brain Mapping, 4:58-73, 1996.
  – Nichols & Hayasaka. Controlling the Familywise Error Rate in Functional Neuroimaging: A Comparative Review. Statistical Methods in Medical Research, 12:419-446, 2003.
  – CR Genovese, N Lazar and TE Nichols. Thresholding of Statistical Maps in Functional Neuroimaging Using the False Discovery Rate. NeuroImage, 15:870-878, 2002.
• Nonparametric Inference
  – TE Nichols and AP Holmes. Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2002.
  – Bullmore, Long and Suckling. Colored noise and computational inference in neurophysiological (fMRI) time series analysis: resampling methods in time and wavelet domains. Human Brain Mapping, 12:61-78, 2001.
• Conjunction Inference
  – TE Nichols, M Brett, J Andersson, TD Wager, J-B Poline. Valid Conjunction Inference with the Minimum Statistic. NeuroImage, 2005.
  – KJ Friston, WD Penny and DE Glaser. Conjunction Revisited. NeuroImage, 25:661-667, 2005.
• Bayesian Inference
  – LR Frank, RB Buxton, EC Wong. Probabilistic analysis of functional magnetic resonance imaging data. Magnetic Resonance in Medicine, 39:132-148, 1998.
  – Friston, Penny, Phillips, Kiebel, Hinton and Ashburner. Classical and Bayesian inference in neuroimaging: theory. NeuroImage, 16:465-483, 2002. (See also 484-512.)
  – Woolrich, Behrens, Beckmann, Jenkinson and Smith. Multi-Level Linear Modelling for FMRI Group Analysis Using Bayesian Inference. NeuroImage, 21(4):1732-1747, 2004.
