Statistical Methods in functional MRI
Martin Lindquist Department of Biostatistics Johns Hopkins University
Lecture 7.2: Multiple Comparisons
04/25/13
Issues with FWER
• Methods that control the FWER (Bonferroni, RFT, permutation tests) provide strong control over the probability of any false positives.
• While this is appealing, the resulting thresholds often lead to tests that suffer from low power.
• Power is critical in fMRI applications because the most interesting effects are usually at the edge of detection.
False Discovery Rate
• The false discovery rate (FDR) is a more recent development in multiple comparison problems, due to Benjamini and Hochberg (1995).
• While the FWER controls the probability of any false positives, the FDR controls the expected proportion of false positives among all rejected tests.
Suppose we perform tests on m voxels:

                  Declared Inactive   Declared Active   Total
Truly inactive    U                   V                 m0
Truly active      T                   S                 m − m0
Total             m − R               R                 m

U, V, T and S are unobservable random variables; R is an observable random variable.
Notation Definitions
• In this notation:
  – Family-wise error rate: FWER = P(V ≥ 1)
  – False discovery rate: FDR = E(V / R)
• The FDR is defined to be 0 if R = 0.
Properties
• A procedure controlling the FDR ensures that on average the FDR is no bigger than a pre-specified rate q, which lies between 0 and 1.
• However, for any given data set the false discovery proportion need not be below the bound.
• An FDR-controlling technique guarantees control of the FDR in the sense that FDR ≤ q.
BH Procedure
1. Select desired limit q on FDR (e.g., 0.05)
2. Rank p-values, p(1) ≤ p(2) ≤ ... ≤ p(m)
3. Let r be the largest i such that p(i) ≤ (i/m) × q
4. Reject all hypotheses corresponding to p(1), ... , p(r).

[Figure: ranked p-values p(i) plotted against i, with the line (i/m) × q; r is the last index at which the p-value falls below the line.]
The BH procedure is adaptive in the sense that the larger the signal, the lower the threshold.
[Figure: with low signal the effective threshold is near q/m; with high signal it approaches q.]
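The BH steps above can be sketched in a few lines of Python. This is a minimal illustration, not SPM/FSL code, and the example p-values are hypothetical:

```python
import numpy as np

def bh_reject(pvals, q=0.05):
    """Benjamini-Hochberg: return a boolean mask of rejected hypotheses."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)                                # step 2: rank the p-values
    below = p[order] <= (np.arange(1, m + 1) / m) * q    # step 3: p(i) <= (i/m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        r = np.nonzero(below)[0].max()                   # largest i below the line
        reject[order[:r + 1]] = True                     # step 4: reject p(1), ..., p(r)
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(bh_reject(pvals, q=0.05))   # rejects the two smallest p-values
```

Note that rejection depends on the full set of ranked p-values, not on each p-value alone, which is what makes the threshold adaptive.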
Comments
• If all null hypotheses are true, the FDR is equivalent to the FWER.
• Any procedure that controls the FWER also controls the FDR. A procedure that controls only the FDR can be less stringent and lead to a gain in power.
• Since FDR-controlling procedures work only on the p-values and not on the actual test statistics, they can be applied to any valid statistical test.
Example
[Figure: simulated signal + noise = observed data.]
• α = 0.10, no correction. Percentage of false positives over 10 simulations: 0.0974, 0.1008, 0.1029, 0.0988, 0.0968, 0.0993, 0.0976, 0.0956, 0.1022, 0.0965.
• FWER control at 10%. [Figure: occurrence of any false positive across simulations.]
• FDR control at 10%. Percentage of active voxels that are false positives: 0.0871, 0.0952, 0.0790, 0.0908, 0.0761, 0.1090, 0.0851, 0.0894, 0.1020, 0.0992.
Uncorrected Thresholds
• Most published PET and fMRI studies use arbitrary uncorrected thresholds (e.g., p < 0.001).
  – With available sample sizes, corrected thresholds are so stringent that power is extremely low.
• Using uncorrected thresholds is problematic when interpreting conclusions from individual studies, as many activated regions may be false positives.
• Null findings are hard to disseminate, hence it is difficult to refute false positives established in the literature.
Extent Threshold
• Sometimes an arbitrary extent threshold is used when reporting results.
• Here a voxel is only deemed truly active if it belongs to a cluster of k contiguous active voxels (e.g., p < 0.001, 10 contiguous voxels).
• Unfortunately, this does not necessarily correct the problem because imaging data are spatially smooth and therefore false positives may appear in clusters.
Example
• Activation maps with spatially correlated noise thresholded at three different significance levels (α = 0.10, α = 0.01, α = 0.001). Due to the smoothness, the false-positive activations are contiguous regions of multiple voxels. (All images smoothed with FWHM = 12 mm.)

Example
• Similar activation maps using null data, thresholded at α = 0.10, 0.01, and 0.001. (All images smoothed with FWHM = 12 mm.)
Lecture 8: Functional Connectivity
04/25/13
Data Processing Pipeline
• Experimental Design
• Data Acquisition
• Reconstruction
• Preprocessing:
  – Slice-time Correction
  – Motion Correction, Co-registration & Normalization
  – Spatial Smoothing
• Data Analysis:
  – Localizing Brain Activity
  – Connectivity
  – Prediction
Brain Networks
• It has become common practice to talk about brain networks, i.e., sets of interconnected brain regions with information transfer among regions.
• To construct a network:
  – Define a set of nodes (e.g., ROIs).
  – Estimate the set of connections, or edges, between the nodes.
[Figure: three nodes A, B, C with directed edges and the corresponding 3×3 adjacency matrix of 0/1 entries.]
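Constructing a network from ROI time series can be sketched as follows; the three simulated "ROIs" and the 0.7 correlation threshold are illustrative choices, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
# Hypothetical time series for three nodes A, B, C; A and C are both coupled to B.
b = rng.standard_normal(T)
a = 0.8 * b + 0.6 * rng.standard_normal(T)
c = 0.8 * b + 0.6 * rng.standard_normal(T)
ts = np.column_stack([a, b, c])           # T x 3 matrix of node time series

r = np.corrcoef(ts, rowvar=False)         # 3x3 correlation matrix (edge weights)
adj = (np.abs(r) > 0.7).astype(int)       # threshold to a binary adjacency matrix
np.fill_diagonal(adj, 0)                  # no self-loops
print(adj)                                # edges A-B and B-C survive; A-C does not
```

The resulting matrix plays the role of the adjacency matrix in the slide's node/edge figure.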
Network Methods
• A number of methods have been suggested in the neuroimaging literature to quantify the relationship between nodes/regions.
• Their appropriateness depends upon:
  – what type of conclusions one is interested in making;
  – what type of assumptions one is willing to make;
  – and the level of the analysis and modality.
Brain Connectivity
• Functional Connectivity
  – Undirected association between two or more fMRI time series.
  – Makes statements about the structure of relationships among brain regions.
[Figure: undirected connections among regions such as DLPFC, MTG, dACC, and VMPFC.]
Brain Connectivity
• Effective Connectivity
  – Directed influence of one brain region on the physiological activity recorded in other brain regions.
  – Makes statements about causal effects among tasks and regions.
[Figure: directed connections among regions such as V1, V5, and PPC.]
Functional Connectivity
• Methods:
  – Seed analysis
  – Inverse covariance methods
  – Multivariate decomposition methods
    § Principal Components Analysis
    § Independent Components Analysis
    § Partial Least Squares
  – Mediation analysis
  – Psychophysiological interaction (PPI) analysis
Effective Connectivity
• Methods:
  – Structural Equation Modeling
  – Granger Causality
  – Dynamic Causal Modeling
  – Bayes Nets
  – Mediation analysis
  – Psychophysiological interaction (PPI) analysis
Levels of Analysis
• Functional connectivity can be applied at different levels of analysis, with different interpretations at each.
• Connectivity across time can reveal networks that are dynamically activated across time.
• Connectivity across trials can identify coherent networks of task-related activations.

Levels of Analysis
• Connectivity across subjects can reveal patterns of coherent individual differences.
• Connectivity across studies can reveal tendencies for studies to co-activate within sets of regions.
Bivariate Connectivity
• Simple functional connectivity: Region A is correlated with Region B.
  – Provides information about relationships among regions.
  – Can be performed on time series data within a subject, or individual differences (contrast maps, one per subject).
[Figure: A — B]
Time Series Connectivity
• Calculate the cross-correlation between time series from two separate brain regions.
[Figure: for each subject (1, 2, ..., n), the time series from Region 1 and Region 2 are correlated (r), transformed to Z, and the Z-values carried forward to a group analysis.]
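The per-subject correlate-then-Z-transform pipeline in the figure can be sketched on simulated data (the subject count, series length, and coupling strength are all made up):

```python
import numpy as np

def fisher_z(r):
    """Fisher r-to-z transform, which makes correlations approximately normal."""
    return np.arctanh(r)

rng = np.random.default_rng(1)
n_subjects, T = 12, 150
z_vals = []
for _ in range(n_subjects):
    common = rng.standard_normal(T)              # shared signal between the regions
    region1 = common + rng.standard_normal(T)
    region2 = common + rng.standard_normal(T)
    r = np.corrcoef(region1, region2)[0, 1]      # within-subject cross-correlation
    z_vals.append(fisher_z(r))

z = np.array(z_vals)
# Group analysis: one-sample t-statistic on the subjects' z-values against zero.
t_stat = z.mean() / (z.std(ddof=1) / np.sqrt(n_subjects))
print(round(t_stat, 2))
```

The Z transform is what justifies carrying the per-subject values into a standard group-level t-test.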
Seed Analysis
• In seed analysis the cross-correlation is computed between the time course from a predetermined region (seed region) and all other voxels.
• This allows researchers to find regions correlated with the activity in the seed region.
• The seed time course can also be a performance or physiological variable.
[Figure: correlations between brain activity and heart rate; x-axis: time (TRs, 2 s), y-axis: average within-subject correlation (r); VMPFC shown at threshold p < .005.]
Issues
• One of the main problems with time series connectivity is the fact that there may be different hemodynamic lags in different regions:
  – Time series from different regions may not match up, even if neural activity patterns match up.
  – If lags are estimated from data, temporal order may be caused by vascular (uninteresting) or neural (interesting) response.
Beta Series
• The beta series approach can be used to minimize issues of inter-region differences in neurovascular coupling.
• Procedure:
  – Fit a GLM to obtain separate parameter estimates for each individual trial.
  – Compute the correlation between these trial-wise estimates across voxels.
Beta Series
[Figure: for each subject (1, 2, ..., n), the trial-wise beta estimates from Region 1 and Region 2 are correlated (r), transformed to Z, and carried forward to a group analysis.]

Individual Differences
[Figure: for subjects 1, ..., N, each subject contributes a contrast image and a seed value x1, ..., xN, which are combined to produce group results.]
Partial Correlation
• Partial Correlation
  – Correlation between two regions, after the effect of all other regions has been removed.
  – Helps protect against 'illusory' correlations between regions (e.g., A and C uncorrelated after controlling for B).
[Figure: A and C connected only through B.]
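The partial correlation can be computed by the residual method: regress B out of both A and C and correlate the residuals. This sketch simulates the slide's scenario where B drives both A and C:

```python
import numpy as np

def partial_corr(a, c, b):
    """Correlation of a and c after regressing out b from each."""
    B = np.column_stack([np.ones_like(b), b])
    res_a = a - B @ np.linalg.lstsq(B, a, rcond=None)[0]
    res_c = c - B @ np.linalg.lstsq(B, c, rcond=None)[0]
    return np.corrcoef(res_a, res_c)[0, 1]

rng = np.random.default_rng(2)
b = rng.standard_normal(1000)
a = b + 0.5 * rng.standard_normal(1000)   # A driven by B
c = b + 0.5 * rng.standard_normal(1000)   # C driven by B
marginal = np.corrcoef(a, c)[0, 1]        # large 'illusory' correlation
partial = partial_corr(a, c, b)           # near zero after controlling for B
print(round(marginal, 2), round(partial, 2))
```

With many regions, the same idea generalizes by regressing out all other regions at once.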
Inverse Covariance Methods
• For multivariate normal data there exists a duality between the inverse covariance (precision) matrix and the graph representing relationships between regions.
  – Conditional independence between variables (regions) corresponds to zero entries in the precision matrix Σ⁻¹.
  – Graphical lasso (GLASSO) can be used to estimate sparse precision matrices and graphs.
[Figure: graph on A, B, C with a missing A–C edge corresponding to zero entries in Σ⁻¹.]
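The duality can be illustrated with a plain inverse of a sample covariance matrix (GLASSO would additionally shrink small entries exactly to zero); the three-region chain A–B–C here is simulated, not lecture data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50000
b = rng.standard_normal(n)
a = b + rng.standard_normal(n)    # A depends on B only
c = b + rng.standard_normal(n)    # C depends on B only: A and C linked only via B
X = np.column_stack([a, b, c])

# Precision matrix Sigma^{-1}; the (A, C) entry is ~0 because A and C are
# conditionally independent given B.
prec = np.linalg.inv(np.cov(X, rowvar=False))
print(np.round(prec, 2))
```

The theoretical precision matrix here is [[1, -1, 0], [-1, 3, -1], [0, -1, 1]], so the zero in the (A, C) position encodes the missing A–C edge in the graph.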
Mediation
• Mediation
  – The relationship between regions A and B is mediated by M.
  – Can identify functional pathways spanning > 2 regions.
  – Can be performed on time series data within a subject, or individual differences (contrast maps, one per subject).
  – Also: test of whether task-related activations in B are mediated, or explained, by M.
[Figure: A → M → B; Task → M → B.]

Demonstrating Mediation
[Path diagrams: full model x → m → y with paths a and b and direct path c′; reduced model x → y with path c.]
• Full model, with mediator:
  m = i_m + a·x + e_m
  y = i_y + b·m + c′·x + e′_y
• Reduced model, without mediator:
  y = i′_y + c·x + e_y
Decomposition of Effects
• The mediation framework allows us to decompose the total effect of x on y as follows:

  c = c′ + ab
  Total effect = Direct effect + Mediated effect

• Does m explain some of the x–y relationship?
  – Test c − c′, which is equivalent to testing the significance of the ab product.
  – Sobel test or bootstrap test.
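The Sobel test for the ab product can be sketched with ordinary least squares; the data, coefficients, and helper `ols` are all hypothetical illustrations:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and their standard errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(4)
n = 500
x = rng.standard_normal(n)
m = 0.6 * x + rng.standard_normal(n)               # path a
y = 0.5 * m + 0.2 * x + rng.standard_normal(n)     # paths b and c'

(_, a_hat), (_, se_a) = ols(np.column_stack([np.ones(n), x]), m)
(_, b_hat, c_prime), (_, se_b, _) = ols(np.column_stack([np.ones(n), m, x]), y)

# Sobel z-statistic for the mediated effect ab
z = (a_hat * b_hat) / np.sqrt(b_hat**2 * se_a**2 + a_hat**2 * se_b**2)
print(round(z, 1))
```

In practice a bootstrap over the ab product is often preferred because the sampling distribution of ab is not exactly normal.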
[Path diagram: X = (x1, ..., xn) → M = (m1(t), ..., mn(t)) → Y = (y1, ..., yn), with pathway functions α(t) and β(t), direct effect γ′, and total effect γ.]

• Mediator model:  m_i(t) = α(t)·x_i + ε_{i,m}(t)
• Outcome model:   y_i = ∫₀ᵀ β(s)·m_i(s) ds + γ′·x_i + ε_{i,y}
• Total model:     y_i = γ·x_i + ε_{i,x}
• Decomposition:   γ = γ′ + ∫₀ᵀ α(s)·β(s) ds
Functional Mediation: Pain Data
[Figure: α, β, and αβ pathway functions linking temperature, brain response, and pain rating.]
• Activation in the right anterior insula mediates the relationship between temperature and pain rating.
• The key time interval driving the mediation is 14–24 seconds following activation.
Moderation
• Moderation
  – The relationship between regions A and B is moderated by M.
  – Connectivity between A and B depends on the state (level) of M.
  – Can be performed on time series data within a subject, or individual differences (contrast maps, one per subject).
  – M can be a task state or other variable.
• In SPM, on time series data: "Psychophysiological interaction" (PPI).
[Figure: M moderating the connection between A and B.]
• In the psychophysiological interaction (PPI) approach, the standard GLM model is supplemented with additional regressors that model the interaction between the task and the time course in a seed region:

  Y = β₀ + β₁·X + β₂·R + β₃·(X × R) + ε

  where X is the task, R is the time course from the seed region, and X × R is the interaction term.
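The PPI regression above can be illustrated on simulated data; the block task, seed time course, and voxel are all made up, and the design is fit with plain least squares rather than SPM:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 200
task = (np.arange(T) // 20) % 2          # hypothetical 0/1 block task regressor X
seed = rng.standard_normal(T)            # time course R from a seed region
ppi = task * seed                        # interaction term X * R

# A voxel whose coupling with the seed doubles during task blocks:
voxel = 0.5 * seed + 0.5 * ppi + 0.3 * rng.standard_normal(T)

# GLM: Y = b0 + b1*X + b2*R + b3*(X*R) + e
design = np.column_stack([np.ones(T), task, seed, ppi])
beta = np.linalg.lstsq(design, voxel, rcond=None)[0]
print(np.round(beta, 2))   # beta[3] estimates the context-dependent coupling
```

A significant β₃ is the PPI effect: the seed–voxel slope differs between the two task states.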
PPI
• PPI can be used to determine whether the correlation between two brain areas is altered by different psychological contexts.
• The interaction term reflects the modulation of the slope of the linear relationship with the seed voxel depending on the variable used to create the interaction.
Decomposition Methods
• We often use multivariate decomposition methods to study functional connectivity.
  – Provides a decomposition of the data into separate components.
  – Can be used to find coherent brain networks.
  – Provides information on how different brain regions interact with one another.
• The most common decomposition methods are principal components analysis and independent components analysis.
Data Organization
• Throughout we organize the fMRI data in a T×N matrix X.
  – The row dimension is the number of time points and the column dimension the number of voxels.

Principal Components Analysis
• Principal components analysis involves finding spatial modes, or eigenimages, in the data.
  – These are the patterns that account for most of the variance-covariance structure in the data.
  – They are ranked in order of the amount of variation they explain.
• The eigenimages can be obtained using singular value decomposition (SVD), which decomposes the data into two sets of orthogonal vectors that correspond to patterns in space and time.
Using SVD, we can decompose the matrix X as:

  X = U S Vᵀ

where U and V are orthogonal (unitary) matrices and S is a diagonal matrix consisting of ranked singular values.

• Each column of V defines a distributed brain pattern that can be displayed as an image (eigenimage).
• Each column of U corresponds to the time-dependent profile associated with each eigenimage.
[Figure: X = U S Vᵀ, with eigenimages (Vᵀ) and time courses (U).]

• Equivalently, X can be written as a sum of rank-one terms:

  X = s1·u1·v1ᵀ + s2·u2·v2ᵀ + ... + sN·uN·vNᵀ

[Figure: successive rank-one approximations of the data (Worsley).]
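The eigenimage decomposition can be sketched with numpy's SVD; the single "active region" and sinusoidal time course are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
T, N = 100, 500                               # time points x voxels (the T x N matrix X)
time_course = np.sin(np.linspace(0, 6 * np.pi, T))
eigenimage = np.zeros(N)
eigenimage[100:150] = 1.0                     # one spatially contiguous 'active' region
X = np.outer(time_course, eigenimage) + 0.1 * rng.standard_normal((T, N))

U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U S V^T
# Rows of Vt are eigenimages; columns of U are their time courses.
X1 = s[0] * np.outer(U[:, 0], Vt[0])               # rank-one term s1 * u1 * v1^T
print(round(s[0] / s.sum(), 2))                    # share of the first singular value
```

Because there is one dominant spatiotemporal pattern, the first singular value dwarfs the rest and the rank-one term already approximates X well.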
Independent Components Analysis
• Independent Components Analysis (ICA) is a family of techniques used to extract independent signals from some source signal.
• ICA provides a method to blindly separate the data into spatially independent components.
• The key assumption is that the data set consists of p spatially independent components, which are linearly mixed and spatially fixed.
Cocktail Party Problem
• Two people are talking simultaneously in a room with two microphones.
• Speakers: s1(t) and s2(t). Microphones: x1(t) and x2(t).

  x1(t) = a11·s1(t) + a12·s2(t)
  x2(t) = a21·s1(t) + a22·s2(t)    →    X = AS

  where A is the mixing matrix and S is the source matrix.
Assumptions
• If the mixing matrix is known, the problem is straightforward.
• However, ICA solves this problem without knowing the mixing parameters.
• Instead it exploits some key assumptions:
  – Linear mixing of sources.
  – The components si are statistically independent.
  – The components si are non-Gaussian.
ICA Estimation
• We can find the independent components using a variety of different approaches:
  – Maximizing non-Gaussianity
  – Minimizing the mutual information
  – Maximum likelihood estimation
  – Projection pursuit
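One of the approaches above, maximizing non-Gaussianity, can be sketched as a minimal hand-rolled symmetric FastICA with the tanh contrast on a toy two-source mixture. This is an illustrative sketch, not the algorithm used by any particular fMRI package, and the sources and mixing matrix are made up:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 2000
s1 = np.sign(np.sin(np.linspace(0, 40, T)))   # square wave (non-Gaussian source)
s2 = rng.laplace(size=T)                      # heavy-tailed (non-Gaussian source)
S = np.vstack([s1, s2])
A = np.array([[1.0, 0.5], [0.5, 1.0]])        # mixing matrix
X = A @ S                                     # observed linear mixtures

# Whiten the mixtures.
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
Z = (E / np.sqrt(d)) @ E.T @ Xc

# Symmetric FastICA iterations with the tanh nonlinearity.
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Z)
    W_new = (G @ Z.T) / T - np.diag((1 - G**2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W_new)
    W = U @ Vt                                # decorrelation: W <- (W W^T)^{-1/2} W

Y = W @ Z                                     # recovered sources (up to sign/order)
```

The recovered rows of Y match the original sources up to permutation, sign, and scale, which is the inherent ambiguity of ICA.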
ICA for fMRI
• It is assumed that the fMRI data can be modeled by identifying sets of voxels whose activity varies together over time and differs from the activity in other sets.
• Decompose the data set into a set of spatially independent component maps with a set of corresponding time courses.
ICA for fMRI
• fMRI data is assumed to be a linear mixture of statistically independent sources, s = [s1 s2]ᵀ, combined through a mixing matrix A.
[Figure: observed fMRI data expressed as time courses (Time course 1, Time course 2) multiplying spatial sources (Source 1, Source 2) (Vince Calhoun).]
ICA for fMRI
• We seek to decompose X as follows:

  X = AS

  where the matrix S contains statistically independent maps in its rows, each with an internally consistent time course contained in the associated column of the mixing matrix A.
[Figure: the Time × Voxels data matrix X as the product of the mixing matrix A (time courses) and the spatially independent components S.]
• Use an ICA algorithm to find A and S.
Comments
• Unlike PCA, which assumes an orthonormality constraint, ICA assumes statistical independence among a collection of spatial patterns.
• Independence is a stronger requirement than orthonormality.
• However, in ICA the spatially independent components are not ranked in order of importance as they are when performing PCA.
Types of ICA
• An ICA that decomposes the original data into spatially statistically independent components is called spatial ICA (sICA).
• It is possible to switch the order and instead make the temporal dynamics independent. This is called temporal ICA (tICA).
• Spatial ICA is more common in fMRI data analysis (McKeown et al.).
Multi-subject Analysis
• Using ICA to analyze fMRI data from multiple subjects raises several questions.
  – How should components be combined across subjects?
  – How should the final results be thresholded and/or presented?
• There are several approaches:
  – Stack time courses (forces time courses to be the same).
  – Stack images and back-reconstruct (allows time courses to vary, allows some flexibility in images).
  – Stack into a cube (forces images and time courses to be the same).
Group ICA • Group ICA is based on temporal concatenation.
• It decomposes the group matrix, and estimates through back-reconstruction the spatial weights for each subject for a component of interest.
• For each subject the spatial weights at each voxel are treated as random variables, and a t-test is used to test whether that voxel loaded significantly on that component in the group.
Group ICA
[Figure: temporally concatenated data from subjects 1, ..., N decomposed as X = A·S_agg, with the mixing matrix A partitioned into subject blocks A1, ..., AN.]
• Back-reconstruction for subject i: Si = Ai⁻¹·Xi.
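The back-reconstruction step can be sketched as follows; here the aggregate maps and subject mixing matrices are simulated directly rather than estimated by running ICA on the concatenated data:

```python
import numpy as np

rng = np.random.default_rng(8)
T, V, n_sub, k = 100, 300, 3, 2     # time points, voxels, subjects, components

# Hypothetical aggregate spatial maps (rows of S_agg) shared across subjects.
S_agg = rng.standard_normal((k, V))
subject_data, subject_mixing = [], []
for i in range(n_sub):
    Ai = rng.standard_normal((T, k))       # subject-specific time courses
    subject_mixing.append(Ai)
    subject_data.append(Ai @ S_agg)        # Xi = Ai * S_agg

# Temporal concatenation, as in group ICA:
X = np.vstack(subject_data)                # (n_sub * T) x V group matrix

# Back-reconstruction for subject 0 via the pseudo-inverse: Si = Ai^{-1} Xi
Si = np.linalg.pinv(subject_mixing[0]) @ subject_data[0]
print(np.allclose(Si, S_agg))
```

In a real analysis the per-subject spatial weights in Si would then be entered into a voxel-wise t-test across subjects, as described above.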