© S. C. Strother, 2008. Reproducibility Across Processing Pipelines...

Date post: 18-Jan-2018
Upload: baldwin-frank-parker

Transcript

S. C. Strother, 2008

Reproducibility Across Processing Pipelines (including analysis methods)
  Stephen C. Strother, Ph.D.
  Rotman Research Institute, Baycrest Centre & Medical Biophysics, University of Toronto

Overview
  Why: Pipelines as Meta-Models
  How: Optimizing fMRI pipelines
    Metrics?
      ROC with simulations & data-driven
      Reproducibility (r)
      Prediction (p)
      the NPAIRS (p, r) resampling framework
  What: Four Examples
    ROCs vs NPAIRS r
    NPAIRS (p, r) plots for pipeline optimization
    Sensitivity of r to changes in SPM(z)
    Measuring neural spatial scale with r

fMRI Processing Pipelines
  Reconstructed MRI/fMRI Data
  Common Preprocessing Steps: Physiological Noise Correction; Intra-Subject Motion Correction; Between Subject Alignment; Smoothing; Intensity Normalisation; Detrending
  Data Modeling/Analysis: Experimental Design Matrix; Statistical Analysis Engine; Statistical Maps
  Rendering of Results on Anatomy

Why Test fMRI Pipelines?
  "All models are wrong, but some are useful!" G.E. Box (1976); Marks Nester, An applied statistician's creed, Applied Statistics, 45(4):
  Goal is to quantify and optimize utility = Reproducibility?
  A wide range of software is available for building fMRI processing & analysis pipelines. fMRI researchers implicitly hope, but typically do not test, that their results are robust to the exact techniques, algorithms and software packages used.
  Little is known about what constitutes an optimal fMRI pipeline:
    cognitive and clinically relevant tasks;
    children, middle-aged and older subjects who represent the age-matched controls relevant for many clinical populations.

Optimization Metric Frameworks
  Simulations
    1. ROC curves
       Skudlarski P., et al., Neuroimage 9(3):311-329,
       Della-Maggiore V., et al., Neuroimage 17:19-28,
       Lukic AS., et al., IEEE Symp. Biomedical Imaging,
       Beckmann CF & Smith SM. IEEE Trans. Med. Img. 23:
  Data-Driven
    2. Minimize p-values
       a. Hopfinger JB, et al., Neuroimage, 11:
       b. Tanabe J, et al., Neuroimage, 15:
    3. Model Selection: Maximum Likelihood, Akaike's information criterion (AIC), Minimum Description Length, Bayesian Information Criterion (BIC) & Bayes Evidence, Cross Validation
    4. Replication/Reproducibility
       1. Intra-Class Correlation Coefficient
       2. Empirical ROCs: mixed multinomial model
          a. Genovese CR., et al., Magnetic Resonance in Medicine, 38:497-507,
          b. Maitra, R., et al., Magnetic Resonance in Medicine, 48, 62-70,
          c. Liou M., et al., Neuroimage, 29:383-95,
       3. Empirical ROCs: lower bound on ROC
          a. Nandy RR & Cordes D. Magnetic Resonance in Medicine 49:1152-1162,
       4. Split-Half Resampling
          a. Strother et al., Hum Brain Mapp, 5:
    5. Prediction/Generalization Error or Accuracy
       a. Hansen et al., Neuroimage, 9:
       b. Kustra R & Strother SC. IEEE Trans Med Img 20:
       c. Carlson, T.A., et al., J Cog Neuroscience, 15:704-717,
    6. NPAIRS: Prediction + Reproducibility
       a. Strother SC, et al., Neuroimage 15:
       b. Kjems U, et al., Neuroimage 15:
       c. Shaw ME, et al., Neuroimage 19:
       d. LaConte S, et al., Neuroimage 18:10-23,
       e. Strother SC, et al., Neuroimage 23S1:S196-S207, 2004.

Simulation
  Block design: N independent baseline & activation image pairs; Total = 2N images, 30
  Cortex/White Matter = 4:1.
  Mean Gaussian amplitude: M = 3% of background.
  Pixel noise standard deviation: 5% of background.
  Gaussian amplitude correlations: 0.0, 0.5,
  Gaussian amplitude variation, V: 0.1-2.0
  FROM: Lukic AS, Wernick MN, Strother SC. An evaluation of methods for detecting brain activations from PET or fMRI images. Artificial Intelligence in Medicine, 25:69-88,

Receiver Operating Characteristic (ROC) Curves
  P_A = P(True positive) = P(Truly active voxel is classified as active) = Sensitivity
  P_I = P(False positive) = P(Inactive voxel is classified as active) = False alarm rate
  Summary measure: pAUC (partial area under the ROC curve)
  Skudlarski P, Neuroimage 9(3):311-329; Della-Maggiore V, Neuroimage 17:19-28; Lukic AS, IEEE Symp. Biomedical Imaging; Beckmann CF, Smith SM. IEEE Trans. Med. Img. 23:

ROC Generation
  [figure]

Detection Performance
  [figure; axis label: V]

Empirical ROCs
  Data-Driven, Empirical ROCs: Nandy RR, Cordes D. Novel ROC-Type Method for Testing the Efficiency of Multivariate Statistical Methods in fMRI. Magnetic Resonance in Medicine 49:1152-1162,
  P(Y) = P(voxel identified as active)
  P(Y/F) = P(inactive voxel identified as active)
  P(Y) vs. P(Y/F) is a lower bound for the true ROC.
  Two runs: standard experimental AND resting-state for P(Y/F).
  Assumes common noise structure for accurate P(Y/F).

Reproducibility
  Quantifying replication/reproducibility because:
    replication is a fundamental criterion for a result to be considered scientific;
    smaller p values do not imply a stronger likelihood of repeating the result;
    for good scientific practice it is necessary, but not sufficient, to build a measure of replication into the experimental design and data analysis;
    results are data-driven and avoid simulations.

Reproducibility and the Binomial Mixture Model
  Data-Driven, Empirical ROCs:
    Genovese CR, Noll DC, Eddy WF. Estimating test-retest reliability in functional MR imaging. I. Statistical methodology. Magnetic Resonance in Medicine, 38:497-507, 1997.
    Maitra, R., Roys, S. R., & Gullapalli, R. P. Test-retest reliability estimation of functional MRI data. Magnetic Resonance in Medicine, 48, 62-70, 2002.
    Liou M, Su H-R, Lee J-D, Cheng PE, Huang C-C, Tsai C-H. Bridging Functional MR Images and Scientific Inference: Reproducibility Maps. J. Cog. Neuroscience, 15: , 2003.
  M = number of replications
  R_V = # voxels > threshold
  P_A = P(true activation)
  P_I = P(false activation)
  λ = mixing fraction of true and false activations

ROC framework
  However, comparisons between ROC curves, in this case results from three different preprocessing pipelines, need to use a common mixing proportion estimate (assume the same underlying activation).
  Uncertainty estimates on the curves are also necessary to determine the significance of the difference between AUCs.

ROC framework IV
  A common or joint lambda binomial mixture model is required for empirical ROC curve comparisons.
  The model still uses independent P_A and P_I across thresholds and methods, but a single value of λ must be used for the empirical ROC analysis to be meaningful.
  How do we determine the joint lambda to be used, and can the same optimization techniques be used with the joint model?

Threshold dependency problems in joint BMM I
  The most common method of determining the joint λ is to first estimate the independent model (where λ varies) and then use the average value as the joint value.

Threshold dependency problems in BMM III
  Low Z-statistic thresholds result in most voxels being above threshold more than once.
  High uncertainty is associated with low-threshold estimates of λ.
  The estimated λ will tend to average to 0.5 when no threshold constraint is used.
  Previous literature after Genovese et al. (1997) used a variety of different threshold constraints when estimating the joint lambda across thresholds/methods.

Activation shape alters BMM estimates
  A second and potentially more serious problem with this model concerns the spatial form of the activation.
  Simplest test of BMM: cube vs. Gaussian blob simulated activity with 10% of brain active.
  Simulation uses fixed activation placement, height and noise across subjects.

Activation shape alters estimates
  From Z-statistic -2 to 4.5, the estimated λ for the cube is correct; inverse variance weighting maintains that across all thresholds.
  The λ values from the Gaussian blob vary and are threshold dependent; this dependence becomes more variable as the simulations become more realistic.

Can BMM be used with multi-subject data?
  Lack of point-to-point correspondence
  Differences in activation magnitude/size
  Differences in noise mean/variance
  Potentially different strategies/networks
  The binomial mixture model framework cannot reliably estimate the mixing proportion when the activation shape is not of uniform strength across activated voxels.

Prediction and Reproducibility (Split-Half Cross-Validation Resampling)
  [figure; panel labels: Prediction Metric; Standard SPM Estimation]

Activation-Pattern Reproducibility Metrics (NPAIRS: Split-half reSampling)
  Signal eigenvalue = (1 + r)
  Noise eigenvalue = (1 - r)
  Uncorrelated signal (rSPM) and noise (nSPM) from any data analysis model.
  A reproducible SPM (rSPM) on a common statistical Z-score scale.

Simulations: Comparing Metrics
  [figure; panel labels: GLOBAL; LOCAL]

NPAIRS: ROC-Like Prediction vs. Reproducibility
  Optimizing Performance. Like an ROC plot, there is a single point, (1, 1), on this prediction vs. reproducibility plot with the best performance; at this location the model has perfectly predicted the design matrix while extracting an infinite SNR.
  A Bias-Variance Tradeoff. As model complexity increases (i.e., #PCs 10 → 100), prediction of the design matrix's class labels improves and reproducibility (i.e., activation SNR) decreases.
  LaConte S, et al. Evaluating preprocessing choices in single-subject BOLD-fMRI studies using data-driven performance metrics. Neuroimage 18:10-23, 2003

Measuring Improved (p, r) Plots
  [figure]

Testing Pipelines with (p, r) Plots
  1 Sliding window running means.
  2 Multi-taper power spectrum.
  3 Wilcoxon matched-pair rank sum test (N = 16).
  Zhang et al., NeuroImage 41:4: , 2008 and Mag Res Med (in press)

A Multi-Task Dataset as a f(age)
  Previously collected data (Grady et al., J. Cog. NSci, 2006)
    o Language-picture encoding/recognition memory experiment
    o 1.5T GE MRI at Sunnybrook (TR 2.5 s)
  Multiple tasks
    o 6 different tasks: Picture/word; Animacy/Size(case)
      4 x Encoding [4 epochs, 89 volumes]
      2 x Recognition [8 epochs, 166 volumes]
  Different age groups
    o 20 subjects, young (10) and old (10)
    o 60 runs/age group (6 tasks/run x 10 subjects)

fMRI Processing Pipelines Examined
  Steps shared by all pipelines: Reconstructed MRI/fMRI Data; Within-Subject Motion Correction; Between Subject Alignment
  Optional steps: Spatial Smoothing; Voxel Detrending (Motion Params.)
  Analysis: General Linear Model or Multivariate CVA
  Pipeline labels: MC; MC + Smoothing; MC + MPER; MC + MPER + Smoothing

% SPM(z) vs. Reproducibility
  [figure]

Spatial-Scale of Cat Orientation Columns
  [figure; anatomical panels with 2 mm scale bars]
  MION-enhanced, CBV-weighted fMRI
  1-mm thick slice tangential to the surface of the cortex containing visual area 18.
  Gradient-Echo, 9.4 Tesla
  In-plane resolution 0.15 x 0.15 mm^2, TR = 2 s, TE = 10 ms
  Zhao F, et al., NeuroImage 27:416-24, 2005

Methods & Framework
  [flow diagram; labels: Original Data; Resampling; Bootstrap Samples (BS1, BS2, ..., BS100); 2D Gaussian smoothing; rSPM; Correlation; Reproducibility vs. FWHM (mm), max]

Results
  Distribution of the optimal FWHM of Gaussian filters over 100 bootstrap samples.
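The Methods & Framework slide (resample, smooth at several Gaussian widths, keep the FWHM that maximizes reproducibility) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the deck's implementation: the data are a simulated 1D sinusoidal "column" profile rather than 2D images, each trial is treated as a ready-made contrast map, trials are split into halves before bootstrapping within each half, and all function names are hypothetical.

```python
import math
import random

VOXEL_MM = 0.15  # in-plane voxel size quoted on the slide


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_kernel(fwhm_mm, voxel_mm=VOXEL_MM):
    """Normalized 1D Gaussian kernel; FWHM of 0 means no smoothing."""
    if fwhm_mm <= 0:
        return [1.0]
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0))) / voxel_mm
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, kernel):
    """Convolve a 1D profile with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    n = len(profile)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * profile[j]
        out.append(acc)
    return out


def mean_map(trials):
    """Voxel-wise mean over a list of trial profiles."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def best_fwhm(trials, fwhms_mm, rng):
    """One bootstrap draw: FWHM with the highest split-half correlation."""
    idx = list(range(len(trials)))
    rng.shuffle(idx)
    half = len(idx) // 2
    maps = []
    for members in (idx[:half], idx[half:]):
        picks = [rng.choice(members) for _ in members]  # resample within the half
        maps.append(mean_map([trials[i] for i in picks]))

    def reproducibility(fwhm):
        kernel = gaussian_kernel(fwhm)
        return pearson(smooth(maps[0], kernel), smooth(maps[1], kernel))

    return max(fwhms_mm, key=reproducibility)


def simulate_trials(n_trials, n_vox, period_mm, noise_sd, rng):
    """Simulated contrast profiles: sinusoidal pattern plus white noise."""
    pattern = [math.sin(2 * math.pi * i * VOXEL_MM / period_mm) for i in range(n_vox)]
    return [[p + rng.gauss(0.0, noise_sd) for p in pattern] for _ in range(n_trials)]


rng = random.Random(0)
trials = simulate_trials(40, 100, 1.2, 2.0, rng)
fwhms = [0.0, 0.15, 0.3, 0.45, 0.6, 0.9, 1.2]
optima = [best_fwhm(trials, fwhms, rng) for _ in range(20)]
for f in fwhms:
    print(f"FWHM {f:.2f} mm: chosen {optima.count(f)} times")
```

The printed counts are a small-scale analogue of the distribution of optimal FWHM over bootstrap samples; raising the number of draws from 20 to 100 matches the count quoted on the slide.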
  Figure panels: (0 vs 90, 0.15 x 0.15 mm^2); (45 vs 135, 0.15 x 0.15 mm^2); Data 1 (0 vs 90, x mm^2); Dataset 1; Dataset 2; Dataset 3; Dataset 4

Overview
  Why: Pipelines as Meta-Models
  How: Optimizing fMRI pipelines
    Metrics?
      ROC with simulations & data-driven
      Reproducibility (r)
      Prediction (p)
      the NPAIRS (p, r) resampling framework
  What: Four Examples
    ROCs vs NPAIRS r
    NPAIRS (p, r) plots for pipeline optimization
    Sensitivity of r to changes in SPM(z)
    Measuring neural spatial scale with r

Acknowledgements
  Rotman Research Institute: Xu Chen, Ph.D.; Cheryl Grady, Ph.D.; Grigori Yourganov, M.Sc.; Simon Graham, Ph.D.; Wayne Lee, M.Sc.; Randy McIntosh, Ph.D.; Mani Fazeli, M.Sc.; Anita Oder, B.Sc.
  Illinois Institute of Technology & Predictek, Inc., Chicago: Ana Lukic, Ph.D.; Miles Wernick, Ph.D.
  University of Pittsburgh: Seong-Gi Kim, Ph.D.; Fuqiang Zhao, Ph.D.
  FMRIB, Oxford University: Morgan Hough, B.A.; Steve Smith, Ph.D.
  Principal Funding Sources: NIH Human Brain Project, P20-EB & P20-MH , James S. McDonnell Foundation
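The split-half reproducibility metric from the NPAIRS slides (signal eigenvalue 1 + r, noise eigenvalue 1 - r, and an rSPM on a Z-score scale) can be sketched as follows. This is a minimal illustration under stated assumptions, not the NPAIRS software: the two input maps stand for statistic maps computed independently on each half of the data, the simulated activation and noise levels are arbitrary, and the function names are hypothetical.

```python
import math
import random


def zscore(values):
    """Standardize a map to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def split_half_rspm(spm1, spm2):
    """Return (r, rSPM(z)) for two independent split-half statistic maps.

    After z-scoring, the voxel scatter plot of the two maps has principal
    axes (1, 1)/sqrt(2) and (1, -1)/sqrt(2) with variances 1 + r and 1 - r.
    """
    z1, z2 = zscore(spm1), zscore(spm2)
    n = len(z1)
    r = sum(a * b for a, b in zip(z1, z2)) / n  # Pearson correlation
    root2 = math.sqrt(2.0)
    signal = [(a + b) / root2 for a, b in zip(z1, z2)]  # variance 1 + r
    noise = [(a - b) / root2 for a, b in zip(z1, z2)]   # variance 1 - r
    noise_sd = math.sqrt(sum(v * v for v in noise) / n)
    rspm_z = [s / noise_sd for s in signal]  # signal in units of noise SD
    return r, rspm_z


# Simulated example (assumed values): 200 voxels, the first 20 active.
rng = random.Random(0)
pattern = [3.0 if i < 20 else 0.0 for i in range(200)]
half1 = [p + rng.gauss(0.0, 1.0) for p in pattern]
half2 = [p + rng.gauss(0.0, 1.0) for p in pattern]

r, rspm_z = split_half_rspm(half1, half2)
print(f"reproducibility r = {r:.3f}")
print(f"signal eigenvalue = {1 + r:.3f}, noise eigenvalue = {1 - r:.3f}")
```

In a resampling framework this calculation would be repeated over many random splits and the resulting r values summarized. The sketch assumes neither map is constant and that the two maps are not identical, since either case makes a standard deviation zero.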

