Pistoia Alliance debates AI in life science

Post on 22-Jan-2018

2 October, 2017

Where will AI/Deep learning have an impact in Life Science & Health

Pistoia Alliance Debates

27 September 2017

Nick Lynch

This webinar is being recorded

© Pistoia Alliance

Poll Question 1: What role do you play in your company?

A. IT

B. data scientist/informatician

C. scientist

D. information professional

E. other

The Panel

Peter Henstock, Senior Manager, Business Technology group, Pfizer

Sean Ekins, CEO and Founder, Collaborations Pharmaceuticals

David Pearah, CEO, HDF Group

Poll Question 2: What is your familiarity with AI/Deep learning?

A. I am using AI/Deep learning

B. I am experimenting with AI/Deep learning

C. I am aware of AI/Deep learning

D. I know next to nothing about it

David Pearah, CEO, HDF Group

Learning from other industry sectors

What is it?

Data Science and Artificial Intelligence

Hype?

Yes.

Real Substance and Impact?

Yes.

Artificial Intelligence (AI)

Field of computer science that allows computers to “seem human” in some way by replicating human cognitive functions (e.g., learning and problem solving)

Machine Learning (ML)

Subset of AI approaches that gives computers the ability to learn from and make predictions on data without being explicitly programmed (i.e., learn on their own from new data)

Deep Learning (DL)

Simulates many (deep) hierarchical layers of neurons in the human brain: by running large amounts of data through this simulation, it develops its own understanding of the concepts inherent in the data
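To make "learning without being explicitly programmed" concrete, here is a minimal sketch (not taken from the slides) of a single artificial neuron, a perceptron, that learns the logical OR rule purely from labelled examples. The function names, learning rate and epoch count are illustrative assumptions; a deep network stacks many layers of such units.

```python
# Toy illustration: one artificial neuron (perceptron) learns logical OR
# from labelled examples; the OR rule itself is never written into the code.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Return (weights, bias) fitted with the classic perceptron update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation > 0 else 0
            error = y - prediction  # 0 when correct, +1 or -1 when wrong
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Apply the learned threshold unit to one input."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 1]  # labels for logical OR

weights, bias = train_perceptron(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(predictions)
```

The only things supplied are examples and a generic update rule; the decision boundary is found from the data.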

Deep Learning (DL): Why Now?

• Storage and processing power as a cheap, on-demand utility:

– Graphics Processing Units (GPUs)

– Cloud computing allows affordable GPUs at scale

• Critical mass in open source software community

• Powerful new applications for known AI techniques (e.g., deep learning)

• Global, online AI community sharing advances daily

• Open source software from the community and tech giants (e.g., Google TensorFlow)

• Huge AI investments from tech titans who see AI as a strategic asset

• Exponential growth in data to analyze using DL. In life science:

– Electronic health records

– Genomic data

– Patient monitoring and treatment devices (e.g., EKG, Pulse, Oxygen, IV Pumps, etc.)

– Consumer biomonitoring devices (e.g., FitBit, Apple Watch, smartphones)

– Environmental data

– Data registries

– Medical literature and supporting primary data

Artificial Intelligence

Machine Learning

Knowledge Representation and Reasoning

Automated Planning

Natural Language Processing

Multi-Agent Systems

Robotics

Reinforcement Learning

Supervised Learning

Semi-supervised Learning

Unsupervised Learning

Markov Decision Processes (e.g., Policy iteration)

Classification/Regression

Clustering

Summarization

Anomaly Detection

Distance-based (e.g., LOF)

Model-based (e.g., MMPP)

Graphical and Statistical (e.g., Exponential Smoothing)

Dimensionality Reduction (e.g., PCA, SVD)

Association and Sequence models (e.g., apriori algorithm)

Density-based (e.g., DBSCAN)

Hierarchical (e.g., Single-linkage)

Centroid-based (e.g., K-Means)

Distribution-based (e.g., Mixture of Gaussians)

Instance-based (e.g., KNN, CBR)

Decision Tree (e.g., Random Forest)

Artificial Neural Networks (e.g., Perceptron)

Bayesian Networks (e.g., Naïve Bayes)

Kernel-based (e.g., SVM)

Creating artificial intelligence solutions using supervised learning with a neural network:

1 Define a Narrative AI Use Case

2 Collecting and annotating data sets

3 Training via Computation

4 Independent Validation of the Algorithm

5 Deployment and Monitoring

Example classes: Dogs, Cats
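The five steps above can be sketched end to end on toy data. This is only an illustration under stated assumptions: the two numeric features per animal are invented, and a nearest-centroid classifier stands in for the neural network named on the slide so that the example needs nothing beyond the Python standard library.

```python
import random

random.seed(0)

# Step 1: use case = tell dogs from cats using two (invented) numeric features.

# Step 2: collect and annotate a data set. The feature values are synthetic.
def make_samples(label, centre, n=50):
    return [((random.gauss(centre[0], 0.5), random.gauss(centre[1], 0.5)), label)
            for _ in range(n)]

data = make_samples("dog", (2.0, 6.0)) + make_samples("cat", (6.0, 2.0))
random.shuffle(data)
split = int(0.8 * len(data))
train, held_out = data[:split], data[split:]  # keep 20% unseen for step 4

# Step 3: training. Here: compute one centroid per class (a stand-in model).
def fit_centroids(samples):
    sums, counts = {}, {}
    for (x1, x2), label in samples:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x1, sy + x2)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def classify(centroids, point):
    # Pick the class whose centroid is closest (squared Euclidean distance).
    return min(centroids,
               key=lambda label: (point[0] - centroids[label][0]) ** 2
               + (point[1] - centroids[label][1]) ** 2)

model = fit_centroids(train)

# Step 4: independent validation on data the model never saw during training.
correct = sum(1 for point, label in held_out if classify(model, point) == label)
validation_accuracy = correct / len(held_out)

# Step 5: deployment = expose the trained model behind a simple function;
# monitoring would track its accuracy on new labelled cases over time.
def deployed_predict(point):
    return classify(model, point)

print(validation_accuracy, deployed_predict((2.1, 5.8)))
```

Keeping the validation data separate from the training data is the point of step 4: accuracy measured on the training set alone would overstate how well the model generalizes.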

What is it?

What’s happening?

What is it?

HDF5 + Deep Learning

I/O library optimized for scale + speed

Self-documenting container optimized for scientific data + metadata

Users who need both features

HDF5 already integrated into every major DL Framework (TensorFlow, Caffe, Keras, etc.)
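As a rough illustration of the "self-documenting container" idea, the sketch below writes a small feature matrix and its descriptive metadata into one HDF5 file using the h5py package, then reads a slice back. It assumes h5py and NumPy are installed; the file name, dataset names and attribute keys are invented for the example and are not from the talk.

```python
import os
import tempfile

import h5py
import numpy as np

# Toy training data: 100 samples with 8 features each, plus binary labels.
features = np.random.default_rng(0).random((100, 8)).astype("float32")
labels = np.arange(100) % 2

path = os.path.join(tempfile.mkdtemp(), "training_data.h5")

# Write data and metadata side by side in a single file.
with h5py.File(path, "w") as f:
    dset = f.create_dataset("features", data=features, compression="gzip")
    dset.attrs["description"] = "toy feature matrix, one row per sample"
    dset.attrs["units"] = "arbitrary"
    f.create_dataset("labels", data=labels)

# Read back only the first ten rows; the metadata travels with the data.
with h5py.File(path, "r") as f:
    first_batch = f["features"][:10]
    description = f["features"].attrs["description"]
    stored_shape = f["features"].shape

print(stored_shape, first_batch.shape, description)
```

Slicing `f["features"][:10]` reads just the requested rows from disk, which is the kind of partial I/O that matters when a training set is larger than memory.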

What does the HDF Group do?

Products

• HDF5 Community Edition + Enterprise Edition

• Connectors: ODBC + Cloud (Beta)

• Add-Ons: compression + encryption

Support

• HDF Support Packages (Basic + Pro + Premier)

• Support for h5py + PyTables + pandas (NEW)

• Training

Consulting

• HDF: new functionality + performance tuning for specific use cases

• HPC software engineering with scientific expertise

• Deep Learning expertise

Questions? Comments?

Dave Pearah, CEO

David.Pearah@hdfgroup.org

www.hdfgroup.org

Poll Question 3: What is your company’s primary use for AI/Deep learning?

A. Early Discovery/ Pre-clinical

B. Development & Clinical

C. Imaging Analysis

D. Other

E. Don’t use AI

Sean Ekins, CEO, Collaborations Pharmaceuticals, Inc.

Deep Learning in Pharmaceutical Research

AI in Pharma is not new!

• Neural Networks

• Genetic algorithms

• SVM

• ‘Used’ for decades

• Why it never took off:

– Compute power

– Lack of training data

– Limited support

– Most scientists did not believe them… needed a paradigm shift

– Pharma mergers culled 10,000s of scientists

DEEP LEARNING

Big data in 2002 vs 2017

Now: TB data, ~19,000 compounds

Speeding drug discovery with AI

HTS phenotypic screen

Molecule Screening database

Machine learning models

Vendor library

Top scoring molecules assayed in vitro

Bernoulli Naive Bayes, Logistic linear regression, AdaBoost Decision Trees, Random Forest, Support Vector Machines (SVM), Deep Neural networks (DNN)

▶ Molecular pattern recognition of biological data

▶ Descriptors identify these patterns

▶ Define active and inactive features

▶ Used to generate predictions for drug activity at a certain target (organism, protein of interest)
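The idea of descriptors defining "active" and "inactive" features can be sketched with a tiny Bernoulli Naive Bayes classifier, the first method listed above. This is a toy under stated assumptions: the six-bit fingerprints and their labels are hand-made for illustration (real work would use descriptors such as ECFP6 and a library such as scikit-learn), and the function names are hypothetical.

```python
import math


def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Fit per-class bit probabilities with Laplace smoothing (alpha)."""
    n_bits = len(fingerprints[0])
    model = {}
    for cls in set(labels):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        log_prior = math.log(len(rows) / len(fingerprints))
        # Smoothed probability that each fingerprint bit is set in this class.
        bit_probs = [(sum(r[i] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                     for i in range(n_bits)]
        model[cls] = (log_prior, bit_probs)
    return model


def predict(model, fingerprint):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for cls, (log_prior, bit_probs) in model.items():
        score = log_prior
        for bit, p in zip(fingerprint, bit_probs):
            score += math.log(p if bit else 1.0 - p)
        scores[cls] = score
    return max(scores, key=scores.get)


# Hand-made training set: bits 0-1 tend to mark actives, bits 4-5 inactives.
train_fps = [
    (1, 1, 0, 0, 0, 0), (1, 1, 1, 0, 0, 0), (1, 0, 0, 1, 0, 0),
    (0, 0, 0, 0, 1, 1), (0, 0, 1, 0, 1, 1), (0, 0, 0, 1, 0, 1),
]
train_labels = ["active"] * 3 + ["inactive"] * 3

model = train_bernoulli_nb(train_fps, train_labels)
new_molecule_a = predict(model, (1, 1, 0, 1, 0, 0))
new_molecule_b = predict(model, (0, 0, 0, 0, 1, 0))
print(new_molecule_a, new_molecule_b)
```

In a screening workflow like the one on the slide, the same scoring step would be applied to every molecule in a vendor library and the top-scoring ones sent for assay.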

What is Deep Learning

Deep Learning uses

• facial recognition algorithms

– Facebook tagging photos

• self-driving cars

• robot assistants

http://tinyurl.com/hak4lcv

http://tinyurl.com/y8vjv8lp

Deep Learning in Pharmaceutical Research

• Bioinformatics

– Protein disorder

– Refine docking complexes

– Model CLIP-seq data

– High content image analysis data

– Biomarkers

– Protein contacts

– Cancer diagnosis

• Pharmaceutical

– Solubility

– Gene expression data

– Formulation

– QSAR – Merck DL outperformed random forests in 11/15 and 13/15 datasets

– Tox21

Where else could we apply DL in drug discovery?

Pharmacoeconomics?

Gaps in Deep Learning for Pharmaceutical research

• TensorFlow

• Deeplearning4j

• Facebook (Torch)

• Microsoft (CNTK)

• Which metrics to use?

• Which descriptors?

• Are the DL models overtraining?

• Lack of prospective testing.

Recent Deep Learning papers

Comparison of TB Machine-Learning Models (1µM)

Logistic Regression (LR)

Adaboosted Decision Trees (ADA)

Random Forest (RF)

Bernoulli Naive Bayes (BNB)

Support Vector Machines (SVM)

Deep Neural Networks (DNN)

▶ TB data from literature

▶ ~19,000 molecules

▶ ECFP6 descriptors

▶ Used previously with Bayesian methods

▶ Multiple metrics

▶ 5-fold cross-validation

▶ Classic ML: open source Scikit-learn, http://scikit-learn.org/stable/

▶ Deep Neural Networks (DNN) using Keras, https://keras.io/, and TensorFlow, www.tensorflow.org
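For readers unfamiliar with the 5-fold cross-validation mentioned above, here is a minimal standard-library sketch of the procedure. It is not the authors' pipeline (the slide names Scikit-learn and Keras for that); the helper names, the `fit` and `score` callbacks and the majority-class baseline are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(features, labels, fit, score, k=5):
    """Average a metric over k held-out folds.

    `fit(X, y)` returns a model and `score(model, X, y)` returns a number;
    both are caller-supplied placeholders, not part of any library.
    """
    results = []
    for train, test in k_fold_indices(len(features), k):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        results.append(score(model,
                             [features[i] for i in test],
                             [labels[i] for i in test]))
    return sum(results) / len(results)


# Toy usage with a majority-class baseline on 20 labelled samples.
X = [[i] for i in range(20)]
y = [1] * 14 + [0] * 6


def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def accuracy(model, X_test, y_test):
    return sum(1 for label in y_test if label == model) / len(y_test)


mean_accuracy = cross_validate(X, y, fit_majority, accuracy)
print(mean_accuracy)
```

Every sample is held out exactly once, so each algorithm in a comparison is scored on data it did not see while training on that fold.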

Small scale Machine Learning comparison

• Comparing different algorithms and using FCFP6 fingerprints

• Deep learning seems to improve model ROC statistics in 4/6 cases.

• Data sets range from 100s to >300K

• All classification models

• Next steps: evaluate all the datasets in ChEMBL, PubChem, ToxCast, etc.

Korotcov et al., Submitted
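The "ROC statistics" used to compare classification models are commonly summarized as the area under the ROC curve (AUC). As a hedged illustration (the slides do not say how the statistic was computed), AUC can be calculated as the fraction of active/inactive pairs in which the active molecule receives the higher score. The labels and scores below are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up model outputs: 1 = active, 0 = inactive.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which makes the metric convenient for comparing algorithms across data sets with different active/inactive ratios.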

Building Machine Learning models Assay Central

• Curate data and build models

• Provide models and collections as jar files

Add DL algorithm to Assay Central

Acknowledgments

• Kim Zorn Assay Central Guru

• Alex Clark Assay Central

• Thomas Lane PhD intern UNC

• Dan Russo PhD intern Rutgers

• Jacob Gerlach High School Intern

• Valery Tkachenko Deep Learning Consultant

• Alex Korotcov Deep Learning Consultant

• Thanks also to: Renee Arnold, Peter Swaan

Funding from NIGMS NIH R43GM122196

Poll Question 4: What is the greatest barrier to application of AI at your org?

A. Technical & skills expertise

B. Access to data

C. Data quality

D. Management support/understanding

E. Other

Peter Henstock - Business Technology, Pfizer Inc.

Why is pharma lagging in the AI arena whereas other industries are already transformed?

AI Works

What does Waze do?

• Obtain public data: maps & locations

• Acquire & organize data for AI analyses

– Leverage historical traffic data

– Integrate new traffic information

• Utilize AI algorithms

– Fastest route predictions

• Present timely information through UI
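The slides do not describe Waze's actual algorithms. As a stand-in for "fastest route predictions", the sketch below runs Dijkstra's shortest-path algorithm over a small invented road graph whose edge weights are travel times in minutes, then re-runs it after a traffic update changes one weight.

```python
import heapq


def fastest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_minutes, path) or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, travel_time in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue,
                               (minutes + travel_time, neighbour, path + [neighbour]))
    return None


# Invented road network: travel times in minutes between junctions.
roads = {
    "A": {"B": 5, "C": 2},
    "B": {"D": 3},
    "C": {"B": 1, "D": 7},
    "D": {},
}

before = fastest_route(roads, "A", "D")

# "Integrate new traffic information": a jam makes the C-to-B link slow.
roads["C"]["B"] = 10
after = fastest_route(roads, "A", "D")

print(before, after)
```

The route changes as soon as the data changes, which is the pattern the slide highlights: curated base data, a stream of updates, an algorithm, and a timely answer.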

Why Isn’t AI Working Yet for Pharma?

drugwaze

Rescreening 55% chance of new series 6 weeks $1.2MM

Optimization 14% issue series 1

Solubility cause 23% issue series 2

Safety cause 5% issue series 3

8.2 months to Phase 1

Predicted FDA approval chance: 37%

Recommended actions: 1) Resolve the

Keys to Success

• Obtain public data

• Acquire & organize data for AI analyses

• Utilize AI algorithms

• Present timely information through UI

Need for a Chief Data Officer

Value Proposition

https://www.123rf.com/photo_17347316_businessman-pulling-rope-on-white-background.html

$ $ $

Acquire and organize data for AI

Analytics First, Then AI

• Readiness for Analytics & AI

– Curated data sources

– Automated data management processes

– Structured data analytics

• “If your company isn’t good at analytics, it’s not ready for AI”

– Harvard Business Review June 7, 2017

Harvard Business Review October 2012

Modern Data Scientist

Math

Statistics

AI

Hacking

Database

Computing

Story Telling

Visualization

Domain Knowledge

Analysis

AI & Pharma Skillset Intersection

https://www.quora.com/What-is-the-difference-between-Data-Analytics-Data-Analysis-Data-Mining-Data-Science-Machine-Learning-and-Big-Data-1

Software Engineering

Bioinformatics

Architecture & Systems

Clinical Statistics

HPC/Linux Farm

AI & Machine Learning

Scientists

Does Pharma Have the Right Skills?

Management

Business

Computer Science

Biology

Chemistry

Medicine

Law

Statistics

Physics

BS MS/MBA PhD/MD/JD

Need depth & breadth across AI areas

http://skrullemperor.deviantart.com/art/Deer-in-Headlights-120323487

Threat of High Salaries for “Expertise”

Paul Minton:

Waiter ($20K) → data scientist ($100K)

“As Tech Booms, Workers Turn to Coding for Career Change”. July 28, 2015 New York Times

https://www.linkedin.com/pulse/body-language-does-work-business-owners-andrew-r-mackey

AI is a harder concept to grasp

• Pharma & IT grasp replacement technologies

– Virtual machine replaces physical machine

– Cloud storage replaces local disks

– Agile replaces waterfall method

– High Throughput Screening replaces “screening”

– High Content Screening replaces imaging

• AI and Machine Learning

– Provide a data-driven complement to many disciplines

– Apply from early discovery to marketing

– Span journals, data, omics, images, decision-making

Volume of Tasks

• Easy to develop AI solutions around a single task

– Waze navigates

– Amazon sells

– LinkedIn links

– Facebook advertises

• Pharma/Biotech tasks are varied

– Text mining for targets

– Screening and imaging technologies

– Using ‘Omics

– Drug optimization

– Clinical trials

– Patient reports and communication

– Predictions on activity, safety, trial enrollment, outcomes…

Machine Learning Methods of AI

ML Mastery

Big Data Landscape

http://mattturck.com/2016/02/01/big-data-landscape/

http://arthurmcarthurs.blogspot.com/2011/06/deer-in-headlights.html

AI Is Having a Stifled Impact in Pharma

• Bottom-Up Proof Cycle

– Scientific domain culture

– Continually need to prove AI’s value to every group

– Leveraging 1 data set at a time for 1 AI problem

– Gains are localized to small groups

• Minimal investment

– Sitting on more data than most industries

– Failing to analyze and leverage this data

– Hiring less AI expertise than small tech startups

– Relying on expensive external collaborations

How to Succeed

1) Organize the data for AI

“Data, rather than software, is the barrier”

2) Invest in AI talent

“Simply downloading and “applying” open-source software to your data won’t work. AI needs to be customized to your business context and data. This is why there is currently a war for the scarce AI talent that can do this work.”

3) Develop an AI strategy

“After understanding what AI can and can’t do, the next step for executives is incorporating it into their strategies. [This] is the beginning, not the end….”

“What Artificial Intelligence Can and Can’t Do Now”

Andrew Ng, Harvard Business Review, Nov 9, 2016

Audience Q&A

Please use the Question function in GoToWebinar

The next Pistoia Alliance Discussion Webinar:

Beyond BMI: Body Composition Phenotyping in the UK Biobank

Date: October 25, 2017

Check http://www.pistoiaalliance.org/events/ for the latest information

info@pistoiaalliance.org @pistoiaalliance www.pistoiaalliance.org