
Post on 27-Mar-2015


My first 100 Tb of data

STATISTICAL METHODS FOR NEW TECHNOLOGY WORKING GROUP

Ciprian M. Crainiceanu, Johns Hopkins University

http://www.biostat.jhsph.edu/smnt

Members of the group

• Key personnel
  • C.M. Crainiceanu, B.S. Caffo, A.-M. Staicu, S. Greven, D. Ruppert, C.-Z. Di
• Senior Students
  • V. Zipunnikov, J.-A. Goldsmith
• Other statisticians (>20)
• Scientific collaborators
  • Direct collaboration
  • Solving important scientific problems
  • Diverse scientific applications

Scientific Collaborators

• Susan Bassett – fMRI, Alzheimer’s
• Danny Reich – DTI, DCE-MRI, MS
• Brian Schwartz – lead exposure, VBM, DTI, white matter imaging
• Stewart Mostofsky – fMRI, rsfcMRI, Autism, ADHD, Tourette’s
• Naresh Punjabi – EEG, sleep, sleep diseases
• Dzung Pham / Pilou Bazin – Cortical shape, thickness, lesion detection, MS
• Dean Wong – PET, fMRI substance abuse
• Susan Resnick – BLSA
• Jerry Prince – BLSA, ADNI
• Jim Pekar, Peter Van Zijl – 7T MRI, fMRI, rsfcMRI preprocessing, scanner physics
• Christos Davatzikos – RAVENS
• Susumu Mori – DTI, tractography
• Dana Boatman – ECoG, EEG, epilepsy
• Graham Redgrave – fMRI, DTI, Huntington’s, anorexia/bulimia
• Tudor Badea, Bruno Jedynak – Neuron classification, morphometry, 3D structure and shape
• Tom Glass – Gizmos
• Merck – EEG, neuroimaging
• Pfizer – imaging biomarkers?

Observational Studies 2.0

Longitudinal Functional Principal Component Analysis (LFPCA)

• I=1000, J=4, D=100: 15’
• I=1000, J=8, D=200: 70’

Greven, Crainiceanu, Caffo, Reich, 2010. LFPCA, EJS, to appear

A simple regression formula

• Data compression via longitudinal PCA
• Method-of-moments (MoM) estimators of covariance matrices, smoothing
• Need: all covariance operators
• Solution: regress $Y_{ij}(d)\,Y_{ik}(d')$ on $1$, $T_{ik}$, $T_{ij}$, $T_{ik}T_{ij}$, $\delta_{jk}$
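The regression on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes (the slide does not spell this out) that each centered curve is a random intercept plus a random slope in time plus a visit-specific deviation, so that each regression coefficient estimates one covariance surface. The function name `mom_covariances`, the labels `K0`, `K01`, `K10`, `K1`, `KW`, and all simulated quantities are hypothetical. Smoothing of the estimated surfaces is omitted.

```python
import numpy as np

def mom_covariances(Y, T):
    """Method-of-moments covariance estimates by least squares.

    Y: array (I, J, D) of centered curves, subject i, visit j, grid point d.
    T: array (I, J) of visit times.
    For every pair (d, d'), regresses Y_ij(d) * Y_ik(d') on
    1, T_ik, T_ij, T_ij * T_ik and the indicator that j == k.
    Returns a dict of (D, D) coefficient surfaces (labels are hypothetical).
    """
    I, J, D = Y.shape
    design, response = [], []
    for i in range(I):
        for j in range(J):
            for k in range(J):
                design.append([1.0, T[i, k], T[i, j],
                               T[i, j] * T[i, k], float(j == k)])
                # Entry [d, d'] of the outer product is Y_ij(d) * Y_ik(d').
                response.append(np.outer(Y[i, j], Y[i, k]).ravel())
    X = np.asarray(design)        # (I*J*J, 5)
    R = np.asarray(response)      # (I*J*J, D*D)
    # One least-squares fit handles all D*D pairs (d, d') at once.
    coef, *_ = np.linalg.lstsq(X, R, rcond=None)
    names = ["K0", "K01", "K10", "K1", "KW"]
    return {name: c.reshape(D, D) for name, c in zip(names, coef)}

# Small simulated example (all values are made up for illustration).
rng = np.random.default_rng(0)
I, J, D = 40, 3, 5
T = rng.uniform(0.0, 3.0, size=(I, J))
intercept = rng.standard_normal((I, 1, 1)) * np.ones(D)        # random intercept
slope = 0.5 * rng.standard_normal((I, 1, 1)) * np.linspace(0, 1, D)
noise = 0.1 * rng.standard_normal((I, J, D))                   # visit-level deviation
Y = intercept + slope * T[:, :, None] + noise
cov = mom_covariances(Y, T)
print({name: K.shape for name, K in cov.items()})
```

Under the assumed model, `K0` and `K1` estimate the covariances of the random intercept and random slope, `K01` and `K10` their cross-covariance and its transpose, and `KW` the visit-level covariance (including any measurement-error variance on its diagonal). The explicit triple loop is kept for readability; at the sizes quoted on the LFPCA slide a blocked or low-rank implementation would be needed.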

Variance explained (FA, 3 yrs of long. data)

Longitudinal Penalized Functional Regression

LPFR: recipe and ingredients

PASAT/MD (Corp. Call.), PD (Cortic. spinal)

Functional regression

• No paper on longitudinal functional regression
• No paper published with this data structure
• Longitudinal extensions are not “simple”
• Technical details are hard without the correct “recipe” for known and published “ingredients”
• No available method that scales up

Goldsmith, Feder, Crainiceanu, Caffo, Reich, 2010. PFR, JCGS, to appear

Goldsmith, Crainiceanu, Caffo, Reich, 2010. LPFR, to appear?

Population Value Decomposition (PVD)

PVD

$Y_i = P V_i D + E_i$

• $P$ is $T \times A$
• $D$ is $B \times F$
• $V_i$ is $A \times B$
• $A \ll T$, $B \ll F$
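To make the size conditions concrete, the short sketch below counts how many numbers must be stored with and without the decomposition. With n subjects, the raw data take n·T·F values, while the decomposition stores P and D once (T·A + B·F values) plus one small A-by-B matrix per subject. The function name `pvd_storage` and the dimensions in the example are hypothetical and chosen only for illustration.

```python
def pvd_storage(n, T, F, A, B):
    """Return (raw, compressed) counts of stored numbers.

    raw:        n subject matrices of size T x F.
    compressed: P (T x A) and D (B x F) stored once,
                plus n subject-specific matrices V_i (A x B).
    """
    raw = n * T * F
    compressed = T * A + B * F + n * A * B
    return raw, compressed

# Hypothetical sizes: 100 subjects, 1000 time points, 500 frequencies.
raw, compressed = pvd_storage(n=100, T=1000, F=500, A=10, B=5)
print(raw, compressed)  # 50000000 17500
```

The per-subject cost is A·B rather than T·F, which is why the decomposition scales to large populations when A and B are small.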

Singular Value Decomposition (SVD) summarizes variance

[Figure, one subject: subject-specific data (time × frequency) = eigenvariates × diagonal matrix × eigenfrequencies]

Caffo BS, Crainiceanu CM, Verduzco G, Joel SE, Mostofsky SH, Bassett SS, Pekar JJ. Two-Stage decompositions for the analysis of functional connectivity for fMRI with application to Alzheimer’s disease risk. NeuroImage (In Press).

Default PVD

[Figure, flow diagram: (start here) SVD of each subject's data gives a low-rank approximation (eigenvariates, eigenfrequencies); these are stacked across subjects; a second SVD gives the population decomposition (population eigenimages); the original data are then projected onto the population bases]
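The flow in the Default PVD diagram can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `default_pvd`, the parameter `rank_subject` (how many singular vectors are kept per subject) and the simulated data are hypothetical, and the slides do not say how the ranks are chosen.

```python
import numpy as np

def default_pvd(Ys, rank_subject, A, B):
    """Two-stage SVD sketch of a population value decomposition.

    Ys: list of (T, F) subject matrices.
    Returns P (T, A), a list of V_i (A, B), and D (B, F),
    so that each Y_i is approximated by P @ V_i @ D.
    """
    left, right = [], []
    for Y in Ys:
        # Stage 1: subject-level SVD, keep the leading singular vectors.
        U, _, Vt = np.linalg.svd(Y, full_matrices=False)
        left.append(U[:, :rank_subject])        # eigenvariates, (T, rank)
        right.append(Vt[:rank_subject].T)       # eigenfrequencies, (F, rank)
    # Stage 2: stack across subjects and take a population-level SVD.
    P = np.linalg.svd(np.hstack(left), full_matrices=False)[0][:, :A]
    D = np.linalg.svd(np.hstack(right), full_matrices=False)[0][:, :B].T
    # Project the original data onto the population bases.
    Vs = [P.T @ Y @ D.T for Y in Ys]
    return P, Vs, D

# Simulated example: data generated exactly from a shared P0 and D0.
rng = np.random.default_rng(0)
T, F, A, B, n = 60, 40, 3, 3, 8
P0 = np.linalg.qr(rng.standard_normal((T, A)))[0]      # orthonormal columns
D0 = np.linalg.qr(rng.standard_normal((F, B)))[0].T    # orthonormal rows
Ys = [P0 @ rng.standard_normal((A, B)) @ D0 for _ in range(n)]

P, Vs, D = default_pvd(Ys, rank_subject=3, A=A, B=B)
err = max(np.abs(P @ V @ D - Y).max() for V, Y in zip(Vs, Ys))
print(P.shape, Vs[0].shape, D.shape, err)
```

Because the simulated data have no noise and share the same bases, the reconstruction error here is at the level of floating-point rounding. With real data the choice of `rank_subject`, A and B trades compression against approximation error.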

Currently:

• Deploying PVD to the 1000 Functional Connectomes Project (http://www.nitrc.org/projects/fcon_1000/)

• Comparing rsfcMRI in stroke versus normal subjects

HD-MFPCA/RAVENS Images

Multilevel Functional Principal Component Analysis (MFPCA)

MFPCA

HD-MFPCA

HD-MFPCA, Step 1

HD-MFPCA, Step 2

Main message, backed by 100 Tb of data

• Eventually, good tech makes its way into observational studies and clinical trials

• Longitudinal/Multilevel FDA is the natural next step in FDA

• Data is changing the way we do business: availability, size, complexity

• Likely: funding will be based much more on relevance than on technical ability