How Will AI Revolutionize Biomedical Research?
Adapting Deep Learning for the Life Sciences
© 2017 Netrias, LLC® All rights reserved.
How will artificial intelligence revolutionize biomedical research?
Figure: Aspirational AI vs. Reality. Bridging the gap with data: Collection, Integration, Analysis. Reality: High-throughput Reads, Assembly, Annotation.
What are the unsolved multi-omics data challenges?
Problem | Definition | Description
Integration | Data heterogeneity | Multiple data types silo analysis and bias results.
Analysis | Data dimensionality | Limited samples and high dimensionality with missing data hinder analysis.
Figure (labels only): DNA, Protein. ~20k coding genes × 3 levels = 60k-dimensional space >> 3-dimensional space for positioning.
What are potential analytic solutions to the challenges?
Problem | Definition | Promising Analytic Solution
Integration | Data heterogeneity | Align integration with research
Analysis | Data dimensionality | Incorporate prior knowledge into learning workflows
Figure (labels only): Transcription, Proteomic, Analysis Result (×2), Prior Knowledge; DNA, Transcription, Protein. ~20k coding genes × 3 levels = 60k-dimensional space >> 3-dimensional space for positioning.
What are the problems with shallow learning?
Shallow learning looks for linear separability or direct relationships among variables in the data to extract patterns.
Problem | Impact
Transformations and feature generation are complex for high-dimensional, sparse, non-linear omics data | Leads to research delays for timely diagnosis and treatment
Dimensionality reduction cannot capture both inter- and intra-omic relationships | Loses the underlying relationships that model disease mechanisms
How can we move to deep architectures?
• Well-equipped to handle high-dimensional, sparse, noisy data with nonlinear relationships
• Provides high generalizability for multiplatform data common in the life sciences
• Abstracts data to learn complex features/patterns to identify the breadth vs. depth trade-offs
Figure (labels only): Multi-layer Multi-Kernel Learning, Deep Neural Network.
Mamoshina, P., Vieira, A., Putin, E., & Zhavoronkov, A. (2016). Applications of Deep Learning in Biomedicine. Molecular Pharmaceutics, 13(5), 1445–1454. http://doi.org/10.1021/acs.molpharmaceut.5b00982
What are the adaptations of deep architectures?
Architecture | Original task | Life-science analogue
Convolutional Neural Network | Image → Object | Gene Expression → Disease
Recurrent Neural Network | Sentence → Language | Cell Signaling → Signaling Cascades
Stacked Autoencoder | Image → Features | Gene Expression → Transcription Factor
Use case: Deep architectures in colorectal cancer research
Identify factors that correlate with colorectal cancer intervention
Technology Problem: MATLAB on a local machine took ~2.5 weeks of processing.
Analytic Problem: Hill climbing was computationally intensive, and a support vector machine cannot accurately model a complex system.
Scientific Problem: 161 tumor tissue samples and 162 control tissue samples; treatment type + pathological response; DNA SNP microarray data.
Type of response to chemotherapy: Incomplete, Medium, Complete.
Classify pathological response to chemotherapy
Figure: neural network with inputs x1, x2, ..., xN-1, xN, hidden layers, and outputs y1, y2 (Incomplete Pathological Response, Complete Pathological Response).
Challenges:
1. Many degrees of freedom in model configuration (layers × nodes per layer × activation function)
2. Long training/testing times per model configuration for refinement
Example: 1 hour of training time per model × 10 models = 10 hours
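The two challenges above can be made concrete with a small sketch. The search-space values below are hypothetical (the slides do not list which layer counts, widths, or activations were tried); only the "1 hour × 10 models" arithmetic comes from the slide.

```python
from itertools import product

# Hypothetical search space; the slides do not list the values that were tried.
LAYER_COUNTS = [1, 2, 3]
NODES_PER_LAYER = [5, 10, 20]
ACTIVATIONS = ["relu", "tanh", "sigmoid"]


def enumerate_configs(layer_counts, nodes_per_layer, activations):
    """Return every (layers, nodes, activation) combination as a dict."""
    return [
        {"layers": n_layers, "nodes": n_nodes, "activation": act}
        for n_layers, n_nodes, act in product(layer_counts, nodes_per_layer, activations)
    ]


def serial_hours(n_models, hours_per_model=1.0):
    """Total wall-clock time when models are trained one after another."""
    return n_models * hours_per_model


configs = enumerate_configs(LAYER_COUNTS, NODES_PER_LAYER, ACTIVATIONS)
print(len(configs), "configurations in the hypothetical grid")
print(serial_hours(10), "hours for the 10-model example on the slide")
print(serial_hours(len(configs)), "hours to train the whole grid serially")
```

Even this small grid multiplies out quickly, which is why serial training on one machine becomes the bottleneck for model refinement.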
QC neural network to predict pathological response
Figure: neural network with inputs x1, x2, ..., xN-1, xN, a Quality Control Layer (Statistical Significance), hidden layers, and outputs y1, y2 (Complete Pathological Response, Incomplete Pathological Response).
Incorporate a standard quality control test, which identifies statistically significant transcriptional dysregulation, into the learning process (analogous to the pooling layer of a CNN).
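A minimal sketch of what such a quality control layer might look like as a feature filter. The slides do not name the statistical test, so Welch's t-statistic with a fixed threshold is an assumption made here for illustration; a real pipeline would more likely use p-values with multiple-testing correction. The gene names and values are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (0.0 if both are constant)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0
    return (mean(a) - mean(b)) / se


def qc_layer(tumor, control, t_threshold=2.0):
    """Keep only transcripts whose |t| exceeds the threshold.

    tumor, control: dict mapping transcript name -> list of expression values.
    Returns the retained transcript names in sorted order.
    """
    kept = []
    for name in sorted(tumor):
        if abs(welch_t(tumor[name], control[name])) > t_threshold:
            kept.append(name)
    return kept


# Toy data (made up for illustration, not from the study).
tumor = {
    "GENE_A": [8.1, 7.9, 8.3, 8.0],  # clearly shifted relative to control
    "GENE_B": [5.0, 5.2, 4.9, 5.1],  # essentially unchanged
}
control = {
    "GENE_A": [5.0, 5.1, 4.8, 5.2],
    "GENE_B": [5.1, 4.9, 5.0, 5.2],
}
print(qc_layer(tumor, control))
```

Like a pooling layer, the filter shrinks the input that later layers see; unlike pooling, it discards whole features rather than summarizing neighborhoods.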
Knowledge-driven QC-net to predict pathological response
Figure: neural network with inputs x1, x2, ..., xN-1, xN, a Quality Control Layer, a Pathway Annotation Layer fed by Prior Knowledge, hidden layers, and outputs y1, y2 (Complete Pathological Response, Incomplete Pathological Response).
Incorporate prior knowledge, in the form of pathway annotations for the statistically significant genes, to conduct knowledge-driven dimensionality reduction (analogous to the convolutional + pooling layers of a CNN).
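One way to picture the pathway annotation layer is as a gene-to-pathway aggregation. The sketch below is an assumption, not the method used in the study: the annotation table, the gene and pathway names, and the choice of the mean as the aggregation function are all hypothetical.

```python
from statistics import mean

# Hypothetical annotation table: pathway -> member genes (illustrative names only).
PATHWAYS = {
    "pathway_1": ["GENE_A", "GENE_C"],
    "pathway_2": ["GENE_B", "GENE_C", "GENE_D"],
}


def pathway_layer(expression, significant_genes, pathways):
    """Collapse gene-level values to one value per pathway.

    expression: dict gene -> expression value for one sample.
    significant_genes: genes that passed the quality control layer.
    Only significant member genes contribute; pathways with none are dropped.
    """
    significant = set(significant_genes)
    features = {}
    for name, members in pathways.items():
        values = [expression[g] for g in members if g in significant and g in expression]
        if values:
            features[name] = mean(values)
    return features


sample = {"GENE_A": 2.0, "GENE_B": 0.1, "GENE_C": 4.0, "GENE_D": -1.0}
print(pathway_layer(sample, ["GENE_A", "GENE_C"], PATHWAYS))
```

Because a gene can belong to several pathways, the mapping shares inputs across outputs, which is the loose sense in which it resembles a convolution followed by pooling.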
Distributed computing enables faster model selection and refinement
Figure: N user-managed, on-demand, parallel runs on EC2 compute nodes with cloud storage. Each user-managed single run follows spin up → provision → execute → tear down, which supports parallelization and ensures reproducibility.
Take advantage of commodity distributed storage and compute infrastructure for orders-of-magnitude faster model training and testing, and to ensure analytic workflow reproducibility.
Parallelizing model runs for scalable technology solution
Figure: EC2 compute nodes 1, 2, ..., N each run Train/Test and write Model 1, Model 2, ..., Model N results to S3, which feed Data-Driven Model Refinement.
Advantages:
1. More powerful hardware to rapidly test various models
2. Parallelized model + parameter runs (1 instance → 8 instances)
Quick iteration: 5 mins per model × 9 models / 8 instances ≈ 6 mins vs. ~2.5 weeks
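The iteration-time arithmetic above can be sketched as follows, alongside a toy parallel runner. Local threads stand in for EC2 instances, and `train_and_test` is a hypothetical placeholder; neither reflects the actual infrastructure code. The sketch also shows a second estimate that assumes each model occupies one instance for its entire run.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def ideal_minutes(n_models, minutes_per_model, n_instances):
    """Throughput estimate: total work divided evenly across instances."""
    return n_models * minutes_per_model / n_instances


def batched_minutes(n_models, minutes_per_model, n_instances):
    """Wall-clock estimate when each model occupies one instance for its whole run."""
    return math.ceil(n_models / n_instances) * minutes_per_model


def train_and_test(config_id):
    """Hypothetical placeholder for one train/test job; a real job would fit a model."""
    return {"model": config_id, "status": "done"}


def run_parallel(config_ids, n_workers):
    """Run one job per configuration on a pool of workers and collect the results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(train_and_test, config_ids))


print(ideal_minutes(9, 5, 8))    # 5.625, which the slide rounds to about 6 minutes
print(batched_minutes(9, 5, 8))  # 10: the ninth model waits for a free instance
print(len(run_parallel(range(1, 10), n_workers=8)))
```

Either estimate is minutes rather than weeks; the gap between them only matters when the number of models is not a multiple of the number of instances.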
Results: Shallow vs Deep architectures
Input data (all models): gene expression normalized to control; 161 patients reduced to the 68 patients that have incomplete or complete response to immuno-chemotherapy.
 | National Cancer Institute | Netrias (Raw Data) | Netrias (Knowledge Driven)
Training/Test | 54/14 | 32/36 | 32/36
Features | 41 Transcripts | 41 Transcripts | 37 Pathways
Feature Eng | QC + Hill Climbing | QC | QC + Pathway Analysis
Model | Support Vector Machine | Fully connected Deep Net, 2 layers (10,11) | Fully connected Deep Net, 2 layers (10,11)
AUC | 0.86 | 0.887 | 0.79
Sensitivity | 0.31 | 0.733 | 0.8
Time | ~2.5 weeks | ~Minutes | ~Minutes
Tech | MATLAB 8.4.0.150421 + local laptop | Cloud Computing | Cloud Computing
Model Reproducibility | No | Yes | Yes
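For readers unfamiliar with the "Fully connected Deep Net, 2 layers (10,11)" entry, the sketch below shows the forward pass of such a network in plain Python. It assumes that (10,11) denotes the widths of the two hidden layers, that the hidden activations are ReLU, and that the output is a softmax over the two response classes; none of these details are stated on the slide. The weights are random and untrained, so the output is illustrative only.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Apply one fully connected layer followed by an element-wise activation."""
    weights, biases = layer
    return [
        activation(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """ReLU hidden layers followed by a softmax over the output classes."""
    h = x
    for layer in layers[:-1]:
        h = dense(h, layer, relu)
    logits = dense(h, layers[-1], lambda z: z)
    return softmax(logits)


rng = random.Random(0)
sizes = [41, 10, 11, 2]  # 41 transcripts -> hidden (10, 11) -> 2 classes (assumed)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
x = [rng.random() for _ in range(sizes[0])]  # one made-up input sample
probs = forward(x, layers)
print(probs)
```

For the knowledge-driven column, the same shape would apply with 37 pathway features in place of the 41 transcripts.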
Conclusion and next steps
Conclusions
1. Deep learning frameworks provide a significant improvement (such as ~2.5× sensitivity) over linear classifiers in predicting pathological response.
2. Distributed compute infrastructure enables parallel model runs, reducing run times by three orders of magnitude (weeks → minutes).
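As a quick check, the two headline figures can be recomputed from numbers reported elsewhere in the deck. The sensitivities come from the results table; the 6-minute figure is the parallel-run estimate from the parallelization slide, and using it as the "minutes" end of the comparison is an assumption.

```python
import math

MINUTES_PER_WEEK = 7 * 24 * 60

# Sensitivities reported in the results table.
svm_sensitivity = 0.31
deep_raw_sensitivity = 0.733
deep_pathway_sensitivity = 0.8

raw_ratio = deep_raw_sensitivity / svm_sensitivity
pathway_ratio = deep_pathway_sensitivity / svm_sensitivity
print(round(raw_ratio, 2), round(pathway_ratio, 2))

# Run-time comparison: ~2.5 weeks serial vs. ~6 minutes in parallel (assumed).
baseline_minutes = 2.5 * MINUTES_PER_WEEK
parallel_minutes = 6
speedup = baseline_minutes / parallel_minutes
print(speedup, round(math.log10(speedup), 2))
```

Under these assumptions the sensitivity gain is roughly 2.4× to 2.6× and the speedup is a little over three orders of magnitude, consistent with the conclusions as stated.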
Next Steps
1. Develop novel characterization metrics to fully describe model performance and uncertainty
2. Use methods of model explainability to identify novel biomarkers for disease
reveal the hidden state of the system™