Scaling Unsupervised Ciliary Motion Analysis for Actionable Biomedical Insights with PySpark by...

Post on 08-Jan-2017


Scaling unsupervised ciliary motion analysis for actionable biomedical insights with PySpark

Shannon Quinn
University of Georgia

Who am I?
• Georgia Tech alumnus

• Carnegie Mellon University & University of Pittsburgh alumnus

• Assistant Professor of Computer Science & Cellular Biology at University of Georgia

• Public health, imaging, data science, open science, running…

What are cilia?

Scale bars: 10μm

Why do we care about cilia?
• Clinical
– Ciliopathies
– Association with congenital heart disease
• Developmental
– Nodal flow
– Left-right asymmetry

How do we diagnose ciliopathies?
Spectrum: cheap, fast, inaccurate vs. slow, expensive, accurate (?)
• Measure nasal nitric oxide (NO) levels
• Electron microscopy to search for structural defects
• Ciliary beat frequency (CBF) computation
• Manual ciliary beat pattern analysis
“Gold standard”

What is our goal?
• Input: high-speed video of ciliary biopsy
• Output: quantitative properties of observed motion

Curly!

Strategy for quantifying motion

From videos to features

Features of motion

• Scaling
• Deformation (biaxial shear)
• Rotation (curl)
Not useful in 2D

Novel use of differential image velocity invariants to categorize ciliary motion defects. Quinn SP, Francis R, Lo C, Chennubhotla CS. Proceedings of the Biomedical Science and Engineering Conference (BSEC) 2011.
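As a rough illustration of the three features named above, the sketch below computes scaling (divergence), rotation (curl), and deformation magnitude from a dense 2D flow field using finite differences. The function name `flow_invariants`, the use of `np.gradient`, and the choice to report deformation as a single magnitude are assumptions made for this example, not details taken from the slides or the cited paper.

```python
import numpy as np

def flow_invariants(u, v, spacing=1.0):
    """Return divergence, curl, and deformation magnitude of a 2D flow field.

    u, v: 2D arrays of horizontal and vertical velocity, indexed [row, col].
    (Hypothetical helper for illustration.)
    """
    # np.gradient returns derivatives along axis 0 (y, rows) then axis 1 (x, cols).
    du_dy, du_dx = np.gradient(u, spacing)
    dv_dy, dv_dx = np.gradient(v, spacing)

    divergence = du_dx + dv_dy      # scaling
    curl = dv_dx - du_dy            # rotation
    # Deformation (biaxial shear) has two components; report their magnitude.
    shear1 = du_dx - dv_dy
    shear2 = du_dy + dv_dx
    deformation = np.hypot(shear1, shear2)
    return divergence, curl, deformation

# Rigid rotation about the origin: (u, v) = (-omega * y, omega * x).
omega = 0.5
y, x = np.mgrid[-4:5, -4:5].astype(float)
div, curl, deform = flow_invariants(-omega * y, omega * x)
# For this field, curl is 2 * omega everywhere; divergence and deformation are zero.
```

Note that image rows usually increase downward, so the sign of the curl depends on the coordinate convention chosen for the flow field.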

What do these features look like?

Rotation (rad/s)

How do we model the features?

$$\vec{y}_t = C \vec{x}_t$$

$$\vec{x}_t = A_1 \vec{x}_{t-1} + A_2 \vec{x}_{t-2} + \dots + A_d \vec{x}_{t-d}$$

Feature vectors!
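One way to fit a model of this form, sketched here under assumptions: take the top q principal components of the per-frame feature vectors as C, project to get the low-dimensional states, and estimate the AR matrices by ordinary least squares. The function `fit_ar`, the parameters `q` and `d`, and the absence of mean subtraction are choices made for this example; the slides do not say how the parameters were estimated.

```python
import numpy as np

def fit_ar(Y, q=2, d=2):
    """Fit y_t = C x_t and x_t = A_1 x_{t-1} + ... + A_d x_{t-d}.

    Y: (n_features, T) array, one column per frame.
    Returns C (n_features, q) and a list of d matrices A_i, each (q, q).
    (Hypothetical helper for illustration.)
    """
    # The leading left singular vectors of Y span the principal subspace.
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :q]
    X = C.T @ Y                                   # (q, T) low-dimensional states

    T = X.shape[1]
    # Row block i holds x_{t-i} for t = d .. T-1.
    lagged = np.vstack([X[:, d - i:T - i] for i in range(1, d + 1)])  # (q*d, T-d)
    target = X[:, d:]                             # (q, T-d)
    # Solve target = [A_1 ... A_d] @ lagged in the least-squares sense.
    A_stacked = np.linalg.lstsq(lagged.T, target.T, rcond=None)[0].T
    A = [A_stacked[:, i * q:(i + 1) * q] for i in range(d)]
    return C, A

# Synthetic check: a 2D rotation observed through a random 6D orthonormal basis.
theta = 0.3
A_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
X_true = np.zeros((2, 50))
X_true[:, 0] = [1.0, 0.0]
for t in range(1, 50):
    X_true[:, t] = A_true @ X_true[:, t - 1]
C_true = np.linalg.qr(np.random.default_rng(0).normal(size=(6, 2)))[0]
Y = C_true @ X_true
C, A = fit_ar(Y, q=2, d=1)
```

The recovered `A[0]` matches `A_true` only up to an orthogonal change of basis, because C is determined only up to such a change; its eigenvalues are the basis-independent quantity to compare.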

What can we do with these features?

93% accuracy

Automated identification of abnormal respiratory ciliary motion in nasal biopsies. Quinn SP, Zahid M, Durkin J, Francis R, Lo C, Chennubhotla CS. Science Translational Medicine 2015.

Great, but…

…definitely more than two motion types

Subtypes likely have clinical implications

• Primary ciliary dyskinesia
– Genetic disorder directly affecting cilia
• Other disorders highly correlated with ciliary dysfunction
– Congenital heart disease
– Heterotaxy / situs inversus
– Cognitive defects
– Developmental defects

Short answer: Yes! Clustering!

• AR parameters A1, A2, …, Ad
• Nonlinear space
• Geodesic distance metrics
– “Vanilla” K-means is out

$$\vec{y}_t = C \vec{x}_t$$

$$\vec{x}_t = A_1 \vec{x}_{t-1} + A_2 \vec{x}_{t-2} + \dots + A_d \vec{x}_{t-d}$$

Dataset(s)

2015 Classification Study
• 291 videos

Unsupervised subtyping
• 291 from previous study
• 431 left out (artifacts)
• 628 from internal collaborators
• 1000+ from external collaborators

• ~200MB / video
• ~500GB raw data
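A quick back-of-the-envelope check of these figures, assuming the four video counts are additive and taking "1000+" at its lower bound (both are assumptions, as are the dictionary labels):

```python
# Video counts as listed on the slide; "1000+" is taken at its lower bound.
counts = {
    "previous study": 291,
    "left out (artifacts)": 431,
    "internal collaborators": 628,
    "external collaborators": 1000,
}
mb_per_video = 200  # "~200MB / video"

total_videos = sum(counts.values())
total_gb = total_videos * mb_per_video / 1000  # decimal gigabytes
print(total_videos, "videos,", total_gb, "GB")  # 2350 videos, 470.0 GB
```

Under those assumptions the total lands near the quoted ~500GB.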

Data Acquisition

http://ciliaweb.csb.pitt.edu

Spark Pipeline
• Preprocess videos
– Identify regions of interest (patches)
– Compute optical flow & motion features (rotation, deformation)

rdd = raw.flatMap(find_rois) \
    .map(flow_features)

Preprocess Features Clustering

(OpenCV, scikit-image, PCA-flow)
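To make the flatMap-then-map shape concrete, here is a minimal sketch of what the two functions could look like. Everything inside them is an assumption: the slides do not show `find_rois` or `flow_features`, so the tiling with a temporal-variation threshold and the frame-difference feature below are simple stand-ins, not real ROI detection or optical flow.

```python
import numpy as np

def find_rois(video, patch=16, min_std=1.0):
    """Yield (row, col, patch_frames) for tiles whose pixels change over time.

    video: (T, H, W) grayscale array. Tiling plus a temporal-variation
    threshold is an assumed stand-in for real ROI detection.
    """
    T, H, W = video.shape
    for r in range(0, H - patch + 1, patch):
        for c in range(0, W - patch + 1, patch):
            tile = video[:, r:r + patch, c:c + patch]
            if tile.std(axis=0).mean() >= min_std:
                yield r, c, tile

def flow_features(roi):
    """Placeholder feature (not optical flow): mean absolute frame-to-frame change."""
    r, c, tile = roi
    return (r, c), float(np.abs(np.diff(tile, axis=0)).mean())

# With a SparkContext `sc`, the composition would read:
#   rdd = sc.parallelize(videos).flatMap(find_rois).map(flow_features)
# The same composition on a local list, for checking the functions:
rng = np.random.default_rng(0)
video = np.zeros((20, 32, 32))
video[:, :16, :16] = rng.normal(scale=5.0, size=(20, 16, 16))  # one active tile
features = [flow_features(roi) for v in [video] for roi in find_rois(v)]
```

Because `find_rois` is a generator that emits zero or more patches per video, it fits `flatMap`; `flow_features` produces exactly one record per patch, so it fits `map`.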

Spark Pipeline
• Derive AR subspace
– Principal components
– Compute AR motion parameters A1…Ad

svd = rdd.computeSVD()
_svd_ = sc.broadcast(svd)
ar = rdd.map(ar_params)

Preprocess Features Clustering

(SciPy, thunder, bolt)

Spark Pipeline
• Cluster parameters
– Pairwise similarity
– Eigendecomposition of graph Laplacian

L = ar.cartesian(ar) \
    .map(pairwise)
X = L.computeSVD()

DON’T DO THIS. EVER.

Preprocess Features Clustering

(scikit-learn)
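For reference, the "pairwise similarity, then eigendecomposition of the graph Laplacian" step can be sketched locally in NumPy on a small distance matrix. The Gaussian kernel, the symmetric normalized Laplacian, and the name `spectral_embedding` are assumptions chosen for this example; the slides do not specify which kernel or Laplacian was used, and this is not a distributed implementation.

```python
import numpy as np

def spectral_embedding(D, k=2, sigma=1.0):
    """Embed n items given an (n, n) pairwise distance matrix D.

    Returns the k smallest eigenvalues of the symmetric normalized graph
    Laplacian and the matching eigenvectors as an (n, k) array.
    Assumes every item has at least one neighbor with nonzero affinity.
    """
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))   # Gaussian affinity (assumed kernel)
    np.fill_diagonal(W, 0.0)                     # no self-loops
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(D)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)         # eigenvalues in ascending order
    return eigvals[:k], eigvecs[:, :k]

# Two well-separated groups of points on a line.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(points[:, None] - points[None, :])
eigvals, embedding = spectral_embedding(D, k=2)
labels = embedding[:, 1] > 0   # the sign of the second eigenvector splits the groups
```

With more than two groups, a common final step is to run a standard clustering algorithm on the rows of the embedding, where Euclidean distance is meaningful even if the original parameter space is nonlinear.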

Eigenvectors of L

Conclusions
• 93% classification: methods are sound
– Dynamic texture representation is accurate
• Low-dim embeddings of AR motion parameters
– Definitely more complicated than normal / abnormal
• Need lots of data!

Big picture
• Blackbox tool for clinicians
– Web front-end + Python middleware + Spark back-end
• Upload video -> Get analysis
– Assist experts with diagnostics
• Expert input
– Phenotype annotations, regions of interest

THANK YOU.
• spq@uga.edu
• @SpectralFilter
• https://magsol.github.io/