How Will AI Revolutionize Biomedical Research?
Adapting Deep Learning for the Life Sciences

2017-09-14

© 2017 Netrias, LLC® All rights reserved.

How will artificial intelligence revolutionize biomedical research?

[Figure: bridging the gap between aspirational AI and reality with data. Stages: collection, integration, analysis. Reality: high-throughput reads, assembly, annotation.]


What are the unsolved multi-omics data challenges?

| Problem definition | Description |
| --- | --- |
| Integration: data heterogeneity | Multiple data types silo analysis and bias results. |
| Analysis: data dimensionality | Limited samples and high dimensionality with missing data hinder analysis. |

[Figure labels: DNA, Protein]

~20k coding genes x 3 levels = 60k-dimensional space >> 3-dimensional space for positioning
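The dimensionality arithmetic above can be checked directly. This is a minimal sketch with two assumptions: the 68-sample count is borrowed from the cohort size reported in the results later in the deck, and the random matrix is a scaled-down stand-in rather than real omics data.

```python
import numpy as np

N_GENES = 20_000   # approximate number of coding genes (from the slide)
N_LEVELS = 3       # three levels per gene (from the slide)
n_features = N_GENES * N_LEVELS

# Assumption: 68 samples, the cohort size reported later in the deck.
n_samples = 68
features_per_sample = n_features / n_samples

# Scaled-down stand-in (600 columns instead of 60,000) so the check runs fast.
rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, 600))

# With fewer samples than features, the data spans at most n_samples directions.
rank = np.linalg.matrix_rank(X)

print(f"{n_features} features, about {features_per_sample:.0f} per sample, rank {rank}")
```

However many columns are measured, the rank is capped by the number of samples, which is one way to see why limited samples hinder analysis in a 60k-dimensional space.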


What are potential analytic solutions to the challenges?

| Problem definition | Promising analytic solution |
| --- | --- |
| Integration: data heterogeneity | Align integration with research |
| Analysis: data dimensionality | Incorporate prior knowledge into learning workflows |

[Figure: transcription and proteomic data, each with an analysis result, alongside prior knowledge; DNA, transcription, and protein levels.]

~20k coding genes x 3 levels = 60k-dimensional space >> 3-dimensional space for positioning


What are problems with shallow learning?

Shallow learning looks for linear separability or direct relationships among variables in the data to extract patterns.

| Problem | Impact |
| --- | --- |
| Transformations and feature generation are complex for high-dimensional, sparse, non-linear omics data | Leads to research delays that hold back timely diagnosis and treatment |
| Dimensionality reduction cannot capture both inter- and intra-omic relationships | Loses the underlying relationships that model disease mechanisms |
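The reliance on linear separability can be illustrated with XOR, a standard toy problem and not omics data. In this sketch the linear classifier is searched over a small weight grid, and the two-layer network uses hand-set weights chosen for illustration rather than learned ones; all names are hypothetical.

```python
import itertools
import numpy as np

# XOR: the label is 1 when exactly one input is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

def linear_predict(X, w, b):
    """Single linear decision boundary (the 'shallow' model)."""
    return (X @ w + b > 0).astype(int)

# Search a grid of weights and biases for the best linear classifier.
grid = np.linspace(-2, 2, 9)
best_linear = max(
    float(np.mean(linear_predict(X, np.array([w1, w2]), b) == y))
    for w1, w2, b in itertools.product(grid, repeat=3)
)

def two_layer_predict(X):
    """One hidden layer with hand-set weights: units compute OR and AND."""
    W1 = np.ones((2, 2))
    b1 = np.array([-0.5, -1.5])        # thresholds for OR and AND
    hidden = (X @ W1 + b1 > 0).astype(float)
    w2 = np.array([1.0, -1.0])         # output fires for OR but not AND
    return (hidden @ w2 - 0.5 > 0).astype(int)

deep_acc = float(np.mean(two_layer_predict(X) == y))
print(best_linear, deep_acc)
```

No single line separates the XOR labels, so the best linear accuracy stays below 1, while one hidden layer is enough to represent the relationship.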


How can we move to deep architectures?

• Well-equipped to handle high-dimensional, sparse, noisy data with nonlinear relationships
• Provides high generalizability for multiplatform data common in the life sciences
• Abstracts data to learn complex features/patterns to identify the breadth vs depth trade-offs

[Figure: Multi-layer Multi Kernel Learning; Deep Neural Network]

Mamoshina, P., Vieira, A., Putin, E., & Zhavoronkov, A. (2016). Applications of Deep Learning in Biomedicine. Molecular Pharmaceutics, 13(5), 1445–1454. http://doi.org/10.1021/acs.molpharmaceut.5b00982


What are adaptations of deep architectures?

Convolutional Neural Network
Image → Object : Gene Expression → Disease

Recurrent Neural Network
Sentence → Language : Cell Signaling → Signaling Cascades

Stacked Autoencoder
Image → Features : Gene Expression → Transcription Factor


Use case: Deep architectures in colorectal cancer research


Identify factors that correlate with colorectal cancer intervention

Technology Problem

MATLAB on a local machine took ~2.5 weeks of processing.

Analytic Problem

Hill-climbing was computationally intensive, and a support vector machine could not accurately model the complex system.

Scientific Problem

161 tumor tissue, 162 control tissue
Treatment type + pathological response
DNA SNP microarray data
Type of response to chemotherapy: incomplete, medium, complete


Classify pathological response to chemotherapy

[Figure: neural network. Inputs x1, x2, ..., xN-1, xN pass through hidden layers to two outputs, y1 and y2, for complete and incomplete pathological response.]

Challenges:

1. Model configuration has many degrees of freedom (layers x nodes per layer x activation function)

2. Long training/testing times per model configuration for refinement

Ex. 1 hour training time per model x 10 models = 10 hours
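The two challenges above multiply together, as a short sketch shows. The slide names the axes of the search space but not their values, so the candidate values below are hypothetical; only the 1-hour-per-model figure comes from the slide's example.

```python
from itertools import product

# Hypothetical search space: the slide names these axes, not their values.
layer_counts = [1, 2, 3]
nodes_per_layer = [5, 10, 20]
activations = ["relu", "tanh", "sigmoid"]

# Every combination is one model configuration to train and test.
configs = [
    {"layers": n_layers, "nodes": n_nodes, "activation": act}
    for n_layers, n_nodes, act in product(layer_counts, nodes_per_layer, activations)
]

HOURS_PER_MODEL = 1  # from the slide's example

def sequential_hours(n_models, hours_per_model=HOURS_PER_MODEL):
    """Wall-clock time when configurations are trained one after another."""
    return n_models * hours_per_model

print(len(configs), "configurations,", sequential_hours(len(configs)), "hours sequentially")
print(sequential_hours(10), "hours for the slide's 10-model example")
```

Even three values per axis already give more configurations than the slide's 10-model example, which is why sequential refinement becomes slow.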


QC neural network to predict pathological response

[Figure: neural network. Inputs x1, x2, ..., xN-1, xN pass through a quality control layer (statistical significance) and hidden layers to two outputs, y1 and y2, for complete and incomplete pathological response.]

Incorporate a standard quality control test, which identifies statistically significant transcriptional dysregulation, into the learning process (analogous to the pooling layer of a CNN).
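One way to picture a quality control layer is as a significance filter applied before the hidden layers. The slide does not say which test is used, so the sketch below swaps in Welch's t-statistic with a fixed threshold as an assumption; the data is synthetic and the function names are hypothetical.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t-statistic per column for two groups (rows are samples)."""
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    std_err = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return mean_diff / std_err

def qc_mask(tumor, control, t_threshold=4.0):
    """Keep only features whose |t| exceeds the (assumed) threshold."""
    return np.abs(welch_t(tumor, control)) > t_threshold

# Synthetic expression values: 30 samples per group, 200 genes.
rng = np.random.default_rng(0)
n_genes = 200
control = rng.normal(size=(30, n_genes))
tumor = rng.normal(size=(30, n_genes))
tumor[:, :5] += 3.0   # make the first five genes dysregulated on purpose

mask = qc_mask(tumor, control)
filtered = tumor[:, mask]   # reduced input passed on to the hidden layers

print(f"kept {int(mask.sum())} of {n_genes} genes")
```

As with pooling, the layer discards most inputs and forwards a smaller summary, here the subset of genes that pass the test.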


Knowledge-driven QC-net to predict pathological response

[Figure: neural network. Inputs x1, x2, ..., xN-1, xN pass through a quality control layer, a pathway annotation layer informed by prior knowledge, and hidden layers to two outputs, y1 and y2, for complete and incomplete pathological response.]

Incorporate prior knowledge, in the form of pathway annotations applied to statistically significant genes, to conduct knowledge-driven dimensionality reduction (analogous to the convolutional + pooling layers of a CNN).
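Knowledge-driven dimensionality reduction can be sketched as a fixed gene-to-pathway membership matrix that maps gene-level values to pathway-level features. The gene IDs, pathway names, and the choice of averaging are all hypothetical; the deck does not specify how genes are aggregated.

```python
import numpy as np

genes = ["G1", "G2", "G3", "G4", "G5"]   # hypothetical gene IDs
pathways = {                              # hypothetical pathway annotations
    "pathway_A": ["G1", "G2"],
    "pathway_B": ["G2", "G3", "G4"],
}

def membership_matrix(genes, pathways):
    """Binary matrix M[i, j] = 1 when gene i is annotated to pathway j."""
    index = {gene: i for i, gene in enumerate(genes)}
    M = np.zeros((len(genes), len(pathways)))
    for j, members in enumerate(pathways.values()):
        for gene in members:
            if gene in index:
                M[index[gene], j] = 1.0
    return M

def pathway_features(X, M):
    """Average the member-gene values for each pathway (assumed aggregation)."""
    sizes = M.sum(axis=0)
    return (X @ M) / np.where(sizes > 0, sizes, 1.0)

# Two samples by five genes.
X = np.array([[1.0, 3.0, 5.0, 7.0, 9.0],
              [2.0, 2.0, 2.0, 2.0, 2.0]])
M = membership_matrix(genes, pathways)
P = pathway_features(X, M)
print(P)
```

Genes without an annotation (G5 here) drop out, and a gene shared by two pathways contributes to both, much like overlapping receptive fields in a convolution.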


Distributed computing enables faster model selection and refinement

[Figure: cloud storage and EC2 compute nodes. User-managed single run: spin up, provision, execute, tear down. Parallelization: N user-managed, on-demand, parallel runs. Ensure reproducibility.]

Take advantage of commodity distributed storage and compute infrastructure for orders-of-magnitude faster model training and testing, and to ensure that analytic workflows are reproducible.
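The pattern of independent, parallel train/test runs can be sketched locally. This is not the deck's cloud setup: worker threads stand in for compute nodes, `train_and_test` is a hypothetical placeholder that returns a made-up score, and no cloud API is called.

```python
from concurrent.futures import ThreadPoolExecutor

def train_and_test(config):
    """Placeholder for one train/test run; the score is made up for illustration."""
    score = 1.0 / (1 + abs(config["nodes"] - 10))
    return {"config": config, "score": score}

# Hypothetical model configurations, one per parallel run.
configs = [{"layers": 2, "nodes": n} for n in (5, 10, 20, 40)]

def run_parallel(configs, max_workers=4):
    """Run every configuration independently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train_and_test, configs))

results = run_parallel(configs)
best = max(results, key=lambda r: r["score"])
print("best configuration:", best["config"])
```

Because each run depends only on its own configuration, the same list of configurations can be re-run later to reproduce the comparison.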


Parallelizing model runs for scalable technology solution

[Figure: compute nodes 1 to N (EC2) each run train/test; model 1 to N results are stored in S3 and feed data-driven model refinement.]

Advantages:

1. More powerful hardware to rapidly test various models

2. Parallelize model + parameter runs (1 instance → 8 instances)

Quickly iterate: 5 mins per model x 9 models / 8 instances ≈ 6 mins vs. 2.5 weeks
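The iteration-time estimate can be worked through in a few lines. The first figure follows the slide's formula; the second is an added assumption that each model must run whole on one instance, in which case 9 models on 8 instances take two rounds.

```python
import math

MINUTES_PER_MODEL = 5   # from the slide
N_MODELS = 9            # from the slide
N_INSTANCES = 8         # from the slide

# The slide's formula: total work divided evenly across instances.
ideal_minutes = MINUTES_PER_MODEL * N_MODELS / N_INSTANCES

# Assumption: a model cannot be split across instances, so runs go in rounds.
rounds = math.ceil(N_MODELS / N_INSTANCES)
whole_model_minutes = rounds * MINUTES_PER_MODEL

# The ~2.5-week baseline expressed in minutes.
baseline_minutes = 2.5 * 7 * 24 * 60

speedup = baseline_minutes / ideal_minutes
orders_of_magnitude = math.log10(speedup)

print(f"ideal {ideal_minutes} min, whole-model {whole_model_minutes} min")
print(f"speedup about {speedup:.0f}x, {orders_of_magnitude:.2f} orders of magnitude")
```

Either way the run time stays in minutes, and the ratio to the 2.5-week baseline is more than three orders of magnitude.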


Results: Shallow vs Deep architectures

Input data (all columns): normalized to control. Gene expression of 161 patients, reduced to the 68 patients that have an incomplete or complete response to immuno-chemotherapy.

| | National Cancer Institute | Netrias (Raw Data) | Netrias (Knowledge Driven) |
| --- | --- | --- | --- |
| Training/Test | 54/14 | 32/36 | 32/36 |
| Features | 41 transcripts | 41 transcripts | 37 pathways |
| Feature engineering | QC + hill climbing | QC | QC + pathway analysis |
| Model | Support vector machine | Fully connected deep net, 2 layers (10, 11) | Fully connected deep net, 2 layers (10, 11) |
| AUC | 0.86 | 0.887 | 0.79 |
| Sensitivity | 0.31 | 0.733 | 0.8 |
| Time | ~2.5 weeks | ~Minutes | ~Minutes |
| Tech | MATLAB 8.4.0.150421 + local laptop | Cloud computing | Cloud computing |
| Model reproducibility | No | Yes | Yes |
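For readers less familiar with the two metrics in the table, the sketch below computes sensitivity and AUC on a toy example and then takes the sensitivity ratios from the reported values. The labels and scores are hypothetical, not the study's data, and which response class counts as positive in the study is not stated in the deck.

```python
import numpy as np

def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn)

def auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

# Hypothetical labels and classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.35]
y_pred = [int(s >= 0.5) for s in scores]   # threshold at 0.5

toy_sensitivity = sensitivity(y_true, y_pred)
toy_auc = auc(y_true, scores)

# Sensitivity ratios taken from the table's reported values.
ratio_raw = 0.733 / 0.31
ratio_knowledge = 0.8 / 0.31

print(toy_sensitivity, toy_auc)
print(f"raw {ratio_raw:.2f}x, knowledge-driven {ratio_knowledge:.2f}x")
```

Sensitivity depends on the chosen decision threshold while AUC summarizes ranking across all thresholds, so the two can move in different directions, as they do between the table's columns.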


Conclusion and next steps

Conclusions

1. Deep learning frameworks provide significant improvement over linear classifiers in prediction of pathological response (such as ~2.5x sensitivity: 0.733 to 0.8 vs. 0.31).

2. Distributed compute infrastructure enables parallel model runs, reducing run times by three orders of magnitude (weeks → minutes).

Next Steps

1. Develop novel characterization metrics to fully describe model performance and uncertainty

2. Use methods of model explainability to identify novel biomarkers for disease


reveal the hidden state of the system™
