Date posted: 08-Jan-2017
Category: Data & Analytics
Upload: spark-summit
Scaling unsupervised ciliary motion analysis for actionable biomedical insights with PySpark
Shannon Quinn, University of Georgia
Who am I?
• Georgia Tech alumnus
• Carnegie Mellon University & University of Pittsburgh alumnus
• Assistant Professor of Computer Science & Cellular Biology at University of Georgia
• Public health, imaging, data science, open science, running…
What are cilia?
Scale bars: 10μm
Why do we care about cilia?
• Clinical
– Ciliopathies
– Association with congenital heart disease
• Developmental
– Nodal flow
– Left-right asymmetry
How do we diagnose ciliopathies?
Cheap, fast, inaccurate → Slow, expensive, accurate (?)
• Measure nasal nitric oxide (NO) levels
• Electron microscopy to search for structural defects
• Ciliary beat frequency (CBF) computation
• Manual ciliary beat pattern analysis
“Gold standard”
What is our goal?
• Input: high-speed video of ciliary biopsy
• Output: quantitative properties of observed motion
Curly!
Strategy for quantifying motion
From videos to features
Features of motion
• Scaling
• Deformation (biaxial shear)
• Rotation (curl)
Not useful in 2D
Novel use of differential image velocity invariants to categorize ciliary motion defects. Quinn SP, Francis R, Lo C, Chennubhotla CS. Proceedings of the Biomedical Science and Engineering Conference (BSEC) 2011.
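The motion features named on this slide can be illustrated with a minimal NumPy sketch. It assumes a dense optical-flow field is already available as two arrays `u` and `v` (x- and y-components per pixel) and uses the standard first-order definitions of divergence, curl, and shear magnitude. The function name `flow_invariants` is hypothetical and is not taken from the talk's code.

```python
import numpy as np

def flow_invariants(u, v, spacing=1.0):
    """Return (divergence, curl, deformation) maps for a 2-D flow field.

    u, v: arrays of shape (rows, cols) holding the x- and y-components
    of optical flow at each pixel (assumed input layout).
    """
    # np.gradient returns derivatives along axis 0 (y) first, then axis 1 (x).
    du_dy, du_dx = np.gradient(u, spacing)
    dv_dy, dv_dx = np.gradient(v, spacing)

    divergence = du_dx + dv_dy   # scaling
    curl = dv_dx - du_dy         # rotation
    # The two shear components are combined into a single magnitude.
    deformation = np.hypot(du_dx - dv_dy, du_dy + dv_dx)
    return divergence, curl, deformation

# Synthetic check: rigid rotation about the origin at omega rad per frame.
omega = 0.5
ys, xs = np.mgrid[-5:6, -5:6].astype(float)
u, v = -omega * ys, omega * xs
divergence, curl, deformation = flow_invariants(u, v)
```

For a rigid rotation the curl equals twice the angular velocity everywhere, while divergence and deformation vanish, which makes this field a convenient sanity check.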
What do these features look like?
Rotation (rad/s)
How do we model the features?
$\vec{y}_t = C \vec{x}_t$
$\vec{x}_t = A_1 \vec{x}_{t-1} + A_2 \vec{x}_{t-2} + \dots + A_d \vec{x}_{t-d}$
Feature vectors!
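One common way to fit a model of this form is to take C from the leading principal components of the data and then estimate the AR matrices by least squares. The sketch below does that with NumPy on noise-free synthetic data. The function name `fit_dynamic_texture`, the subspace size `q`, and the synthetic matrices are assumptions made for illustration, and this is not necessarily the exact estimator used in the talk.

```python
import numpy as np

def fit_dynamic_texture(Y, q, d):
    """Fit y_t = C x_t and x_t = A_1 x_{t-1} + ... + A_d x_{t-d}.

    Y: (n_features, T) matrix with one column per frame.
    q: subspace dimension (assumed hyperparameter); d: AR order.
    """
    # C: leading q left singular vectors (principal components) of Y.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :q]
    X = C.T @ Y                      # low-dimensional states, shape (q, T)
    T = X.shape[1]

    targets = X[:, d:]               # x_t for t = d .. T-1
    # Stack x_{t-1}, ..., x_{t-d} so that column j lines up with targets[:, j].
    lagged = np.vstack([X[:, d - k:T - k] for k in range(1, d + 1)])
    # Least-squares solve of targets = [A_1 ... A_d] @ lagged.
    A_stacked = np.linalg.lstsq(lagged.T, targets.T, rcond=None)[0].T
    A = [A_stacked[:, i * q:(i + 1) * q] for i in range(d)]
    residual = targets - A_stacked @ lagged
    return C, X, A, residual

# Synthetic AR(2) process in 2-D, embedded in 30 dimensions (illustrative values).
rng = np.random.default_rng(0)
q, d, T = 2, 2, 60
theta = 0.3
A1 = 0.5 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
A2 = 0.3 * np.eye(q)
X_true = np.zeros((q, T))
X_true[:, 0], X_true[:, 1] = [1.0, 0.0], [0.0, 1.0]
for t in range(2, T):
    X_true[:, t] = A1 @ X_true[:, t - 1] + A2 @ X_true[:, t - 2]
C_true, _ = np.linalg.qr(rng.normal(size=(30, q)))
Y = C_true @ X_true

C, X, A, residual = fit_dynamic_texture(Y, q, d)
```

The recovered `A` matrices match `A1` and `A2` only up to a rotation of the subspace basis, so the meaningful check is that the one-step prediction residual is close to zero.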
What can we do with these features?
93% accuracy
Automated identification of abnormal respiratory ciliary motion in nasal biopsies. Quinn SP, Zahid M, Durkin J, Francis R, Lo C, Chennubhotla CS. Science Translational Medicine 2015.
Great, but…
…definitely more than two motion types
Subtypes likely have clinical implications
• Primary ciliary dyskinesia
– Genetic disorder directly affecting cilia
• Other disorders highly correlated with ciliary dysfunction
– Congenital heart disease
– Heterotaxy / situs inversus
– Cognitive defects
– Developmental defects
Short answer: Yes! Clustering!
• AR parameters A1, A2, …, Ad
• Nonlinear space
• Geodesic distance metrics
– “Vanilla” K-means is out
$\vec{y}_t = C \vec{x}_t$
$\vec{x}_t = A_1 \vec{x}_{t-1} + A_2 \vec{x}_{t-2} + \dots + A_d \vec{x}_{t-d}$
Dataset(s)
2015 Classification Study
• 291 videos
Unsupervised subtyping
• 291 from previous study
• 431 left out (artifacts)
• 628 from internal collaborators
• 1000+ from external collaborators
• ~200MB / video
• ~500GB raw data
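As a rough consistency check on these figures, the counts can be multiplied out. The sketch assumes all four groups of videos are pooled for the unsupervised study and treats "1000+" as exactly 1000, so the result is a lower bound.

```python
# Video counts from the slide; "1000+" is taken as a lower bound (assumption).
videos = 291 + 431 + 628 + 1000
mb_per_video = 200                        # "~200MB / video"
total_gb = videos * mb_per_video / 1000   # decimal gigabytes
print(videos, "videos, about", total_gb, "GB")
```

Under those assumptions the total is 2350 videos and about 470 GB, in line with the "~500GB raw data" figure.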
Data Acquisition
http://ciliaweb.csb.pitt.edu
Spark Pipeline
• Preprocess videos
– Identify regions of interest (patches)
– Compute optical flow & motion features (rotation, deformation)
rdd = (raw.flatMap(find_rois)
          .map(flow_features))
Preprocess Features Clustering
(OpenCV, scikit-image, PCA-flow)
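The slide does not show what `find_rois` does, so the following is only a guess at its shape: a function that takes one video and returns a list of patches, which is the contract `flatMap` needs. The tiling scheme, the `patch` size, and the `min_std` threshold are all hypothetical. The demo uses `itertools.chain` to mimic `flatMap` locally so that it runs without a Spark cluster.

```python
import numpy as np
from itertools import chain

def find_rois(video, patch=8, min_std=0.05):
    """Hypothetical ROI finder: tile each video into square patches and keep
    those whose pixels vary over time (a crude proxy for visible motion).

    video: array of shape (n_frames, height, width).
    Returns a list of ((row, col), patch_video) pairs.
    """
    _, height, width = video.shape
    rois = []
    for r in range(0, height - patch + 1, patch):
        for c in range(0, width - patch + 1, patch):
            block = video[:, r:r + patch, c:c + patch]
            # Mean per-pixel temporal standard deviation within the patch.
            if block.std(axis=0).mean() > min_std:
                rois.append(((r, c), block))
    return rois

# Two synthetic 10-frame videos, each with one oscillating 8x8 region.
t = np.linspace(0, 2 * np.pi, 10)
video_a = np.zeros((10, 16, 16))
video_a[:, 0:8, 0:8] = np.sin(t)[:, None, None]
video_b = np.zeros((10, 16, 16))
video_b[:, 8:16, 8:16] = np.cos(t)[:, None, None]

# Local stand-in for raw.flatMap(find_rois): one flat list of patches.
rois = list(chain.from_iterable(find_rois(v) for v in [video_a, video_b]))
corners = [corner for corner, _ in rois]
```

Because the function returns a list per input video, the same callable could be handed to an RDD's `flatMap` to get one record per patch.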
Spark Pipeline
• Derive AR subspace
– Principal components
– Compute AR motion parameters A1…Ad
svd = rdd.computeSVD()
_svd_ = sc.broadcast(svd)
ar = rdd.map(ar_params)
(SciPy, thunder, bolt)
Spark Pipeline
• Cluster parameters
– Pairwise similarity
– Eigendecomposition of graph Laplacian
L = ar.cartesian(ar) \
      .map(pairwise)
X = L.computeSVD()
DON’T DO THIS. EVER.
(scikit-learn)
Eigenvectors of L
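The clustering step (pairwise similarity, then eigendecomposition of a graph Laplacian) can be sketched on a single machine with NumPy. This is an illustration under stated assumptions: Euclidean distance stands in for the geodesic metric the talk calls for, the Gaussian kernel width `sigma` is arbitrary, and the data are two synthetic blobs, not AR parameters. Note that the affinity matrix has one entry per pair of points, so its size grows quadratically with the number of inputs.

```python
import numpy as np

def gaussian_affinity(points, sigma=1.0):
    """Dense pairwise similarity matrix (Euclidean stand-in for a geodesic metric)."""
    sq = np.sum(points ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * points @ points.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def laplacian_embedding(W, k):
    """Smallest-k eigenpairs of the normalized Laplacian I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vals[:k], vecs[:, :k]

# Two well-separated synthetic blobs of 20 points each.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, (20, 3)),
                    rng.normal(5.0, 0.1, (20, 3))])

W = gaussian_affinity(points)
vals, vecs = laplacian_embedding(W, k=2)

# Row-normalize the embedding; rows from the same cluster then coincide,
# so comparing each row against the first point's row separates the groups.
rows = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
labels = (rows @ rows[0] < 0.5).astype(int)
```

With more than two clusters, the usual next step is to run k-means on the row-normalized embedding instead of the single comparison used here.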
Conclusions
• 93% classification: methods are sound
– Dynamic texture representation is accurate
• Low-dim embeddings of AR motion parameters
– Definitely more complicated than normal / abnormal
• Need lots of data!
Big picture
• Blackbox tool for clinicians
– Web front-end + Python middleware + Spark back-end
• Upload video -> Get analysis
– Assist experts with diagnostics
• Expert input
– Phenotype annotations, regions of interest
THANK YOU.
• [email protected]
• @SpectralFilter
• https://magsol.github.io/