A Data Driven Approach to Tackling Big Data Connectomics

Post on 08-Oct-2020

A Data Driven Approach to Tackling Big Data Connectomics

Greg Kiar

McGill University Healthy Brains for Healthy Lives Fellow, McGill Centre for Integrative Neuroscience, Montreal Neurological Institute, Ph.D. student McGill University, M.S.E. Johns Hopkins University, B.Eng Carleton University

2018-02-05

Outline

• Context

• The common approach

• An approach based on accessibility, robustness, and scalability

Publicly available MRI datasets

• ADNI
• ABCD
• ABIDE
• ADHD-200
• Age-ility
• AIBL
• BRAINS
• CamCAN
• CMI-HBN
• COBRE
• CoRR/FCP-INDI
• DLBS
• fBIRN
• GSP
• HCP
• IXI
• Kirby21
• MASSIVE
• MindBoggle-101
• MIRIAD
• MPI-LMBB
• MSC
• NACC
• NCANDA
• NKIRS
• OASIS-CS
• OASIS-Long
• OpenfMRI
• PING
• PNC
• PTBP
• SALD
• SchizConnect
• StudyForrest
• UK-Biobank

Source: https://github.com/cMadan/openMorph

Publicly supported BIDS apps

• AFNI
• ANTS Cortical Thickness
• Baracus
• Brainiak-srm
• BROCCOLI
• CPAC
• DPARSF
• Fibre Density and Cross-section
• fMRIprep
• Freesurfer
• FSL Tools
• HCP Pipelines
• Hyper Alignment
• MAGeTbrain
• MindBoggle
• MRIQC
• MRtrix3 Connectome
• ndmg
• NIAK
• OPPNI
• SRM
• SPM
• Tracula
• QAP

Source: http://bids-apps.neuroimaging.io/apps/

Common approach

1. Pose a hypothesis
2. Collect + curate dataset
3. Manually perform QC on dataset
4. Pick processing pipeline and parameters
5. Process random subset with pipeline in 4.
6. Manually perform QC on derivatives
7. Redo from 4. if not happy with 6.
8. Process all data with pipeline in 4.
9. Answer statistical question
10. Publish claim
11. Get tenure

Honest approach

1. Post hoc hypothesis
2. Collect + curate dataset
3. Undergrads manually perform QC on dataset
4. Pick processing pipeline and parameters
5. Process first subject with pipeline in 4.
6. Grad students manually perform QC on derivatives
7. Don’t redo from 4. if not happy with 6.
8. Process all data with pipeline in 4.
9. Answer statistical question (see updated 1.)
10. Publish claim
11. Get tenure

Problems with this approach

1. Manual QC is subjective; we need LOTS to be reliable

2. Pipelines and parameters aren’t “optimized” objectively

3. Datasets aren’t homogeneous

4. Incentive to publish/graduate rather than redo experiments

5. Computer infrastructures are expensive, and not always equal

Proposed solution

1. Optimize pipeline for stability and robustness; remove bias

2. Automate QC where possible

3. Evaluate on public data with known “truths” (i.e. TestReTest)

4. Automate pipeline deployment

5. Automate data discovery*

ndmg: one-click connectomes

Registration to MNI152

[Figure: registration flowchart. Nodes: DWI, corr. DWI, T1w, int. DWI, Template, xfm, reg. DWI, QA. Legend: External Dep., Subject, Intermediate, Output.]

Compute Tensor Field

[Figure: tensor estimation flowchart. Nodes: bvals, bvecs, grad. table, reg. DWI, Mask, Tensors, QA. Legend: External Dep., Subject, Intermediate, Output.]

Estimate Streamlines

[Figure: streamline estimation flowchart. Nodes: FA thresh., Mask, Tensors, Fibers, QA. Legend: External Dep., Subject, Intermediate, Output.]
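The "FA thresh." input suggests that streamline estimation is restricted by fractional anisotropy (FA). As a minimal sketch, assuming the standard FA formula computed from the three eigenvalues of a diffusion tensor, the check could look like the following. The function names and the threshold value of 0.2 are hypothetical and are not taken from ndmg.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Standard FA from the three eigenvalues of a diffusion tensor."""
    denom = l1 * l1 + l2 * l2 + l3 * l3
    if denom == 0:
        return 0.0  # empty voxel: define FA as 0 to avoid dividing by zero
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    return math.sqrt(0.5 * num / denom)


def passes_fa_threshold(eigenvalues, fa_thresh=0.2):
    """True if a voxel is anisotropic enough to be tracked through.

    fa_thresh=0.2 is a hypothetical value chosen for illustration only.
    """
    return fractional_anisotropy(*eigenvalues) >= fa_thresh
```

An isotropic tensor (equal eigenvalues) has an FA of 0 and would be excluded, while a tensor with a single dominant eigenvalue approaches an FA of 1.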

Graph Generation

[Figure: graph generation flowchart. Nodes: Parcellation, Fibers, Connectome, QA. Legend: External Dep., Subject, Intermediate, Output.]
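Graph generation combines the fibers with a parcellation to produce a connectome. One common way to do this, sketched here under assumptions, is to count the streamlines that pass through each pair of regions. The input representation (one sequence of region labels per streamline, with 0 meaning "outside every region") and the function name are hypothetical; this is not presented as ndmg's implementation.

```python
from collections import Counter
from itertools import combinations


def build_connectome(streamline_labels):
    """Count streamlines connecting each pair of parcellation regions.

    streamline_labels: iterable of label sequences, one per streamline,
    giving the region label sampled at each point (0 = no region).
    Returns a Counter mapping (region_u, region_v), with u < v, to a count.
    """
    edges = Counter()
    for labels in streamline_labels:
        # Unique regions touched by this streamline, ignoring background.
        regions = sorted(set(labels) - {0})
        # Every pair of touched regions gains one unit of edge weight.
        for u, v in combinations(regions, 2):
            edges[(u, v)] += 1
    return edges


if __name__ == "__main__":
    fibers = [[0, 1, 1, 2], [2, 0, 3], [1, 2], [0, 0, 4]]
    print(build_connectome(fibers))
```

A streamline that touches only one region, such as the last one in the example, contributes no edge.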

Optimizing for discriminability

g: connectome
i: class label (i.e. subject)
j: observation label (i.e. session)

“My brain looks more like my brain than my brain looks like your brain”

or

“My brain looks more like brains of the same {dataset, sex, age, handedness, etc.} than brains of another {dataset, sex, age, handedness, etc.}”
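The statement above can be turned into a number. As a minimal sketch, assuming the usual definition of discriminability as the probability that a within-class distance is smaller than a between-class distance, D = P(δ(g_ij, g_ij') < δ(g_ij, g_i'j'')), the estimate below counts how often that holds. Connectomes are represented as flattened lists and compared with a Euclidean distance; both choices, and the function names, are assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two flattened connectomes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def discriminability(graphs, subjects):
    """Fraction of comparisons in which a within-subject distance is
    strictly smaller than a between-subject distance measured from the
    same reference scan.

    graphs:   list of flattened connectomes (g)
    subjects: class label (i) for each connectome, in the same order
    """
    wins, total = 0, 0
    n = len(graphs)
    for a in range(n):
        for b in range(n):
            # Keep only pairs of distinct scans from the same subject.
            if a == b or subjects[a] != subjects[b]:
                continue
            within = distance(graphs[a], graphs[b])
            for c in range(n):
                if subjects[c] == subjects[a]:
                    continue  # only compare against other subjects
                total += 1
                if within < distance(graphs[a], graphs[c]):
                    wins += 1
    return wins / total if total else float("nan")
```

A value of 1 means every scan is closer to its own subject's other scans than to anyone else's. A pipeline or parameter setting can then be compared with another by which one yields the higher value on test-retest data.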

Reliable structural connectivity

Easy to use/install

$ # Installable on Python2.7 if you have FSL installed…
$ pip install ndmg

$ # Run on your dataset
$ ndmg_bids /data /outs session
$ ndmg_bids /data /outs group

$ # Or install and run through Docker
$ docker run -ti -v /data -v /outs bids/ndmg /data /outs session
$ docker run -ti -v /data -v /outs bids/ndmg /data /outs group
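On the slide, `-v /data` and `-v /outs` name only the paths inside the container. When the data live in a directory on the host, Docker's `-v host_path:container_path` form is needed to mount them. The helper below is a hypothetical sketch that builds such a command from the image name and arguments shown above; the function name and the example host paths are assumptions.

```python
def ndmg_docker_command(data_dir, out_dir, level="session"):
    """Build the argv list for running the bids/ndmg container.

    data_dir and out_dir are host directories (hypothetical examples);
    they are mounted at /data and /outs inside the container.
    """
    if level not in ("session", "group"):
        raise ValueError("level must be 'session' or 'group'")
    return [
        "docker", "run", "-ti",
        "-v", f"{data_dir}:/data",
        "-v", f"{out_dir}:/outs",
        "bids/ndmg", "/data", "/outs", level,
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so this works without Docker.
    print(" ".join(ndmg_docker_command("/home/me/bids", "/home/me/outs")))
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where Docker is installed.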

Proposed solution

1. Optimize pipeline for stability and robustness; remove bias

2. Automate QC where possible

3. Evaluate on public data with known “truths” (i.e. TestReTest)

4. Automate pipeline deployment

5. Automate data discovery*

Boutiques

{
  "name": "echo",
  "tool-version": "1.0",
  "description": "A simple script to test output files",
  "command-line": "echo [PARAM] > output.txt",
  "schema-version": "0.5",
  "inputs": [{ "id": "param",
               "name": "Parameter",
               "value-key": "[PARAM]",
               "type": "Number" }],
  "output-files": [{ "id": "output_file",
                     "name": "Output file",
                     "path-template": "output.txt" }]
}
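The descriptor pairs a command-line template with a list of inputs, and each input's "value-key" marks the placeholder that its value replaces. The sketch below illustrates only that substitution step with the standard library; it is not the Boutiques implementation, and the `render_command` helper and the invocation `{"param": 42}` are assumptions made for illustration.

```python
# The descriptor from the slide, written as a Python dict.
DESCRIPTOR = {
    "name": "echo",
    "tool-version": "1.0",
    "description": "A simple script to test output files",
    "command-line": "echo [PARAM] > output.txt",
    "schema-version": "0.5",
    "inputs": [{"id": "param", "name": "Parameter",
                "value-key": "[PARAM]", "type": "Number"}],
    "output-files": [{"id": "output_file", "name": "Output file",
                      "path-template": "output.txt"}],
}


def render_command(descriptor, invocation):
    """Fill each input's value-key in the command-line template.

    invocation maps input ids to values; inputs without a value are
    replaced by an empty string in this simplified sketch.
    """
    command = descriptor["command-line"]
    for spec in descriptor["inputs"]:
        value = invocation.get(spec["id"])
        replacement = "" if value is None else str(value)
        command = command.replace(spec["value-key"], replacement)
    return command


if __name__ == "__main__":
    print(render_command(DESCRIPTOR, {"param": 42}))
```

Because the tool is described as data rather than code, the same descriptor can be validated, simulated, or launched by a generic executor without tool-specific wrappers.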

Clowdr

Boutiques on pip, Clowdr soon

$ # Installable on Python2 or 3…
$ pip install boutiques

$ # Describe, validate, launch your tool, or more!
$ bosh validate descriptor.json
$ bosh exec simulate descriptor.json -r
$ bosh exec launch descriptor.json invocation.json

$ # Soon… (currently the API is a bit uglier than this)
$ clowdr deploy descriptor.json invocation.json s3://dataset

Proposed solution

1. Optimize pipeline for stability and robustness; remove bias

2. Automate QC where possible

3. Evaluate on public data with known “truths” (i.e. TestReTest)

4. Automate pipeline deployment

5. Automate data discovery*

Apine

So, why are we doing this again?

• If tools and platforms are made to be reproducible and robust…

SCIENTISTS CAN FOCUS ON SCIENCE!

• Free validation and summary of the quality of work being done
• Enables scaling to datasets beyond reach of manual curation

The dream

• Go to a website
• Pick a dataset
• Pick an analysis
• Design a hypothesis
• Launch it
• Go outside & run around
• Come back to your answer
• Share the results, form new hypotheses, and collect new data

Acknowledgements

• McGill Centre for Integrative Neuroscience (Alan Evans, et al.)
• Big-Data for Neuroinformatics Lab (Tristan Glatard, et al.)
• Jean-Baptiste Poline, Pierre Bellec, Christine Tardif
• Montreal Neurological Institute/The Neuro
• Healthy Brains for Healthy Lives
• Lab-mates, Family, Friends, Universe

All code demonstrated in this presentation is publicly available on GitHub.

Thanks!

Find me @

gkiar

g_kiar

greg.kiar@mcgill.ca
