
Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insights with PySpark by...

Date post: 08-Jan-2017
Upload: spark-summit
Scaling unsupervised ciliary motion analysis for actionable biomedical insights with PySpark Shannon Quinn University of Georgia
Transcript
Page 1: Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insights with PySpark by Shannon Quinn

Scaling unsupervised ciliary motion analysis for actionable biomedical insights with PySpark

Shannon Quinn

University of Georgia

Page 2

Who am I?

• Georgia Tech alumnus

• Carnegie Mellon University & University of Pittsburgh alumnus

• Assistant Professor of Computer Science & Cellular Biology at University of Georgia

• Public health, imaging, data science, open science, running…

Page 3

What are cilia?

Scale bars: 10μm

Page 4

Why do we care about cilia?

• Clinical

– Ciliopathies

– Association with congenital heart disease

• Developmental

– Nodal flow

– Left-right asymmetry

Page 5

How do we diagnose ciliopathies?

Cheap, fast, inaccurate

Slow, expensive, accurate (?)

• Measure nasal nitric oxide (NO) levels

• Electron microscopy to search for structural defects

• Ciliary beat frequency (CBF) computation

• Manual ciliary beat pattern analysis

“Gold standard”
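As an illustration of what a ciliary beat frequency (CBF) computation can look like, here is a minimal sketch that takes the dominant peak of the power spectrum of a pixel-intensity time series. The slide does not say how CBF is computed in practice; the function name `beat_frequency`, the frame rate, and the synthetic signal are assumptions made for this example only.

```python
import numpy as np

def beat_frequency(intensity, fps):
    """Dominant frequency (Hz) of a 1-D pixel-intensity time series."""
    x = np.asarray(intensity, dtype=float)
    x = x - x.mean()                          # remove the constant (DC) offset
    power = np.abs(np.fft.rfft(x)) ** 2       # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    return freqs[1:][np.argmax(power[1:])]    # skip the zero-frequency bin

# Synthetic check (assumed values): a 12 Hz oscillation filmed at 200 frames/s.
fps = 200
t = np.arange(400) / fps
signal = 100 + 20 * np.sin(2 * np.pi * 12.0 * t)
cbf = beat_frequency(signal, fps)
```

A single frequency summarizes how fast the cilia beat but says nothing about the shape of the motion, which is the gap the rest of the talk addresses.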

Page 6

What is our goal?

• Input: high-speed video of ciliary biopsy

• Output: quantitative properties of observed motion

Curly!

Page 7

Strategy for quantifying motion

Page 8

From videos to features

Page 9

Features of motion

Scaling

Deformation (biaxial shear)

Rotation (curl)

Not useful in 2D

Novel use of differential image velocity invariants to categorize ciliary motion defects. Quinn SP, Francis R, Lo C, Chennubhotla CS. Proceedings of the Biomedical Science and Engineering Conference (BSEC) 2011.
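The three quantities named on this slide can be written as first-order derivatives of a 2-D optical flow field (u, v). The sketch below uses the standard definitions (divergence for scaling, curl for rotation, shear magnitude for deformation) with finite differences from NumPy. It is an illustration under those assumptions, not the code from the cited paper; the function name `flow_invariants` is hypothetical, and the sign of the curl depends on whether the image y-axis is taken to point up or down.

```python
import numpy as np

def flow_invariants(u, v):
    """Per-pixel divergence, curl and deformation of a 2-D flow field.

    u, v: arrays of shape (H, W) with the x- and y-components of the flow.
    """
    du_dy, du_dx = np.gradient(u)     # axis 0 is y (rows), axis 1 is x (columns)
    dv_dy, dv_dx = np.gradient(v)
    divergence = du_dx + dv_dy                             # scaling
    curl = dv_dx - du_dy                                   # rotation
    deformation = np.hypot(du_dx - dv_dy, du_dy + dv_dx)   # shear magnitude
    return divergence, curl, deformation

# Check on a rigid rotation u = -w*y, v = w*x with w = 0.5 (assumed test field).
y, x = np.mgrid[0:5, 0:5].astype(float)
div, curl, deform = flow_invariants(-0.5 * y, 0.5 * x)
```

For a rigid rotation the divergence and deformation vanish and the curl equals twice the angular rate, which is a quick way to sanity-check an implementation.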

Page 10

What do these features look like?

Rotation (rad/s)

Page 11

How do we model the features?

$\vec{y}_t = C\vec{x}_t$

$\vec{x}_t = A_1\vec{x}_{t-1} + A_2\vec{x}_{t-2} + \dots + A_d\vec{x}_{t-d}$

Feature vectors!
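One common way to estimate the autoregressive matrices A_1 … A_d from a sequence of low-dimensional states x_t is ordinary least squares on lagged copies of the sequence. The sketch below shows that approach; the slides do not state which estimator was used, so `fit_ar` and the synthetic AR(2) process are assumptions for illustration.

```python
import numpy as np

def fit_ar(X, d):
    """Least-squares estimate of A_1..A_d for x_t = sum_i A_i x_{t-i}.

    X: array of shape (T, q), one state vector per row.
    Returns a list of d matrices, each of shape (q, q).
    """
    X = np.asarray(X, dtype=float)
    T, q = X.shape
    # Row for time t holds [x_{t-1}, x_{t-2}, ..., x_{t-d}] side by side.
    Z = np.hstack([X[d - i:T - i] for i in range(1, d + 1)])
    Y = X[d:]                                   # targets x_t for t = d..T-1
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)   # solves Z @ B ~= Y
    return [B[(i - 1) * q:i * q].T for i in range(1, d + 1)]

# Synthetic check: simulate a stable AR(2) process driven by noise, then refit.
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [-0.1, 0.5]])
A2 = np.array([[-0.2, 0.0], [0.0, -0.2]])
X = np.zeros((5000, 2))
for t in range(2, len(X)):
    X[t] = A1 @ X[t - 1] + A2 @ X[t - 2] + rng.normal(size=2)
A1_hat, A2_hat = fit_ar(X, d=2)
```

The estimated matrices, flattened, are one plausible form of the "feature vectors" the slide refers to.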

Page 12

What can we do with these features?

93% accuracy

Automated identification of abnormal respiratory ciliary motion in nasal biopsies. Quinn SP, Zahid M, Durkin J, Francis R, Lo C, Chennubhotla CS. Science Translational Medicine 2015.

Page 13

Great, but…

Page 14

…definitely more than two motion types

Page 15

Subtypes likely have clinical implications

• Primary ciliary dyskinesia

– Genetic disorder directly affecting cilia

• Other disorders highly correlated with ciliary dysfunction

– Congenital heart disease

– Heterotaxy / situs inversus

– Cognitive defects

– Developmental defects

Page 16

Short answer: Yes! Clustering!

• AR parameters A1, A2, …, Ad

• Nonlinear space

• Geodesic distance metrics

– “Vanilla” K-means is out

$\vec{y}_t = C\vec{x}_t$

$\vec{x}_t = A_1\vec{x}_{t-1} + A_2\vec{x}_{t-2} + \dots + A_d\vec{x}_{t-d}$

Page 17

Dataset(s)

2015 Classification Study

• 291 videos

Unsupervised subtyping

• 291 from previous study

• 431 left out (artifacts)

• 628 from internal collaborators

• 1000+ from external collaborators

• ~200MB / video

• ~500GB raw data
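A quick check of how the per-video size relates to the total: the arithmetic below assumes the four groups listed for unsupervised subtyping are disjoint and all count toward the raw data, and it takes "1000+" at its lower bound. Those are assumptions for this back-of-the-envelope estimate, not figures stated on the slide.

```python
videos = 291 + 431 + 628 + 1000           # "1000+" taken at its lower bound
mb_per_video = 200                        # "~200MB / video"
total_gb = videos * mb_per_video / 1000   # decimal gigabytes
print(videos, total_gb)                   # 2350 470.0
```

Under those assumptions the lower bound is about 470 GB, consistent with the "~500GB" on the slide.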

Page 18

Data Acquisition

http://ciliaweb.csb.pitt.edu

Page 19

Spark Pipeline

• Preprocess videos

– Identify regions of interest (patches)

– Compute optical flow & motion features (rotation, deformation)

rdd = raw.flatMap(find_rois) \
    .map(flow_features)

Preprocess Features Clustering

(OpenCV, scikit-image, PCA-flow)
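To make the shape of this stage concrete, here is a local stand-in for the `flatMap` then `map` chain that runs without a Spark cluster. The slide only names `find_rois` and `flow_features`; their bodies below are hypothetical. In particular, the region-of-interest rule (tiles with enough temporal variation) and the feature (mean frame-to-frame change) are simple placeholders, not the optical-flow features used in the talk.

```python
import numpy as np
from itertools import chain

def find_rois(item, size=8, min_std=1.0):
    """Yield (video_id, row, col, patch) for tiles with enough temporal variation."""
    video_id, frames = item               # frames: array of shape (T, H, W)
    activity = frames.std(axis=0)         # per-pixel standard deviation over time
    _, height, width = frames.shape
    for r in range(0, height - size + 1, size):
        for c in range(0, width - size + 1, size):
            if activity[r:r + size, c:c + size].mean() >= min_std:
                yield video_id, r, c, frames[:, r:r + size, c:c + size]

def flow_features(roi):
    """Placeholder feature: mean absolute frame-to-frame change per time step."""
    video_id, r, c, patch = roi
    change = np.abs(np.diff(patch, axis=0)).mean(axis=(1, 2))
    return (video_id, r, c), change

# One synthetic video in which only the top-left 8x8 tile changes over time.
rng = np.random.default_rng(0)
frames = np.zeros((20, 16, 16))
frames[:, :8, :8] = rng.normal(0.0, 5.0, size=(20, 8, 8))
raw = [("video-0", frames)]

# Local equivalent of raw.flatMap(find_rois).map(flow_features)
features = list(map(flow_features, chain.from_iterable(map(find_rois, raw))))
```

Because `find_rois` returns zero or more patches per video, it belongs in `flatMap`; `flow_features` is one-to-one, so `map` is enough.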

Page 20

Spark Pipeline

• Derive AR subspace

– Principal components

– Compute AR motion parameters A1…Ad

svd = rdd.computeSVD()

_svd_ = sc.broadcast(svd)

ar = rdd.map(ar_params)

Preprocess Features Clustering

(SciPy, thunder, bolt)
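The subspace step can be sketched in memory with NumPy: a truncated SVD of the feature matrix gives the matrix C and the low-dimensional states x_t that the AR model is then fit to. This is an assumption-laden illustration, not the pipeline's code; `ar_subspace` is a hypothetical name, and any mean-centering is left out for brevity. Note also that in Spark MLlib `computeSVD` is a method of `RowMatrix` rather than of a plain RDD, so the slide's `rdd.computeSVD()` is best read as shorthand.

```python
import numpy as np

def ar_subspace(Y, q):
    """Project frames onto their top-q principal directions.

    Y: array of shape (n_pixels, T), one column per frame.
    Returns C of shape (n_pixels, q) and X of shape (q, T) with Y ~= C @ X.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :q]                  # orthonormal basis of the subspace
    X = s[:q, None] * Vt[:q]      # state x_t is column t of X
    return C, X

# Check on a synthetic matrix of exact rank 2 (assumed test data).
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30))
C, X = ar_subspace(Y, q=2)
```

Broadcasting the small matrix C to the workers, as the slide does with `sc.broadcast`, lets each patch be projected independently.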

Page 21

Spark Pipeline

• Cluster parameters

– Pairwise similarity

– Eigendecomposition of graph Laplacian

L = ar.cartesian(ar) \
    .map(pairwise)

X = L.computeSVD()

DON’T DO THIS. EVER.

Preprocess Features Clustering

(scikit-learn)
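For a small number of items, the clustering step can be sketched in memory: turn pairwise distances into affinities, form the normalized graph Laplacian, and keep its lowest eigenvectors as an embedding. The Gaussian affinity, the `sigma` parameter, and the name `spectral_embedding` are assumptions for this example; the distance matrix is taken as given because the slides do not specify the metric. The all-pairs matrix has n² entries, which is one reason the `cartesian` approach on the slide does not scale.

```python
import numpy as np

def spectral_embedding(D, k, sigma=1.0):
    """Embed n items in k dimensions from an (n, n) pairwise distance matrix."""
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))     # Gaussian affinity
    np.fill_diagonal(W, 0.0)                       # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(D)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                    # eigenvalues in ascending order
    return vecs[:, :k]

# Two well-separated groups of points on a line (assumed toy data).
points = np.array([0.0, 0.1, 0.2, 3.0, 3.1, 3.2])
D = np.abs(points[:, None] - points[None, :])
emb = spectral_embedding(D, k=2)
fiedler = emb[:, 1]    # second eigenvector; its sign separates the two groups
```

Running k-means on the rows of the embedding is the usual final step of spectral clustering.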

Page 22

Eigenvectors of L

Page 23

Conclusions

• 93% classification: methods are sound

– Dynamic texture representation is accurate

• Low-dim embeddings of AR motion parameters

– Definitely more complicated than normal / abnormal

• Need lots of data!

Page 24

Big picture

• Blackbox tool for clinicians

– Web front-end + Python middleware + Spark back-end

• Upload video -> Get analysis

– Assist experts with diagnostics

• Expert input

– Phenotype annotations, regions of interest

Page 25

THANK YOU.

• [email protected]

• @SpectralFilter

• https://magsol.github.io/

