Date post: 22-Jan-2018
Category: Healthcare
Upload: pistoia-alliance
2 October, 2017
Where will AI/Deep learning have
an impact in Life Science & Health
Pistoia Alliance Debates
27 September 2017
Nick Lynch
This webinar is being recorded
© Pistoia Alliance
Poll Question 1: What role do you play in your company?
A. IT
B. data scientist/informatician
C. scientist
D. information professional
E. other
The Panel
Peter Henstock, Senior Manager, Business Technology group, Pfizer
Sean Ekins, CEO and Founder, Collaborations Pharmaceuticals
David Pearah, CEO, HDF Group
Poll Question 2: What is your familiarity
with AI/Deep learning?
A. I am using AI/Deep learning
B. I am experimenting with AI/Deep learning
C. I am aware of AI/Deep learning
D. I know next to nothing about it
David Pearah, CEO
HDF Group
Learning from other industry sectors
What is it?
Data Science and Artificial Intelligence
Hype?
Yes.
Real Substance
and Impact?
Yes.
Artificial Intelligence (AI): Field of computer science that allows computers to “seem human” in some way by replicating human cognitive functions (e.g., learning and problem solving)
Machine Learning (ML): Subset of AI approaches that gives computers the ability to learn from and make predictions on data without being explicitly programmed (i.e., learn on their own from new data)
Deep Learning (DL): Simulates many (deep) hierarchical layers of neurons in the human brain: by running large amounts of data through this simulation, it develops its own understanding of the concepts inherent in the data
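To make "learning from data without being explicitly programmed" concrete, here is a minimal sketch of a single artificial neuron (a perceptron) that picks up a rule from labeled examples. The toy data set, the function names, and the learning rate are illustrative assumptions, not something taken from the slides; a deep network stacks many layers of such units.

```python
# Illustrative only: one artificial neuron learns a rule from labeled examples
# instead of having the rule written out by a programmer.

def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights and a bias from (features, label) pairs with labels in {0, 1}."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            error = label - predicted  # 0 when the prediction is already right
            # Parameters only move when the neuron made a mistake.
            bias += lr * error
            weights = [w + lr * error * x for w, x in zip(weights, features)]
    return weights, bias


def predict(weights, bias, features):
    """Apply the learned neuron to one input."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


# Hypothetical toy data: the label is 1 only when both inputs are 1.
and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)
predictions = [predict(weights, bias, x) for x, _ in and_examples]
```

Nothing in the code states the rule "both inputs must be 1"; the weights that encode it come out of the training loop.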
Deep Learning (DL): Why Now?
• Storage and processing power as a cheap, on-demand utility:
– Graphics Processing Units (GPUs)
– Cloud computing allows affordable GPUs at scale
• Critical mass in open source software community
• Powerful new applications for known AI techniques (e.g., deep learning)
• Global, online AI community sharing advances daily
• Open source software from the community and tech giants (e.g., Google TensorFlow)
• Huge AI investments from tech titans who see AI as a strategic asset
• Exponential growth in data to analyze using DL. In life science:
– Electronic health records
– Genomic data
– Patient monitoring and treatment devices (e.g., EKG, Pulse, Oxygen, IV Pumps, etc.)
– Consumer biomonitoring devices (e.g., FitBit, Apple Watch, smartphones)
– Environmental data
– Data registries
– Medical literature and supporting primary data
Artificial Intelligence
• Machine Learning
• Knowledge Representation and Reasoning
• Automated Planning
• Natural Language Processing
• Multi-Agent Systems
• Robotics
Machine Learning
• Reinforcement Learning: Markov Decision Processes (e.g., policy iteration)
• Supervised Learning
• Semi-supervised Learning
• Unsupervised Learning
Tasks
• Classification/Regression
• Clustering
• Summarization
• Anomaly Detection
Methods
• Distance-based (e.g., LOF)
• Model-based (e.g., MMPP)
• Graphical and Statistical (e.g., Exponential Smoothing)
• Dimensionality Reduction (e.g., PCA, SVD)
• Association and Sequence models (e.g., apriori algorithm)
• Density-based (e.g., DBSCAN)
• Hierarchical (e.g., Single-linkage)
• Centroid-based (e.g., K-Means)
• Distribution-based (e.g., Mixture of Gaussians)
• Instance-based (e.g., KNN, CBR)
• Decision Tree (e.g., Random Forest)
• Artificial Neural Networks (e.g., Perceptron)
• Bayesian Networks (e.g., Naïve Bayes)
• Kernel-based (e.g., SVM)
Creating artificial intelligence solutions using supervised learning with a neural network:
1 Define a Narrative AI Use Case
2 Collecting and annotating data sets (example labels: Dogs, Cats)
3 Training via Computation
4 Independent Validation of the Algorithm
5 Deployment and Monitoring
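Steps 2 to 4 of the supervised learning workflow can be sketched in a few lines. This is a toy stand-in under stated assumptions: the "images" are replaced by hypothetical two-number measurements, the "neural network" is a single sigmoid neuron (logistic regression), and all function names are invented for illustration.

```python
import math
import random


def collect_and_annotate(n_per_class, rng):
    """Step 2 (toy stand-in): gather examples and attach a label to each one."""
    examples = []
    for _ in range(n_per_class):
        # Hypothetical measurements: class 0 clusters near (0, 0), class 1 near (3, 3).
        examples.append(((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), 0))
        examples.append(((rng.gauss(3, 0.5), rng.gauss(3, 0.5)), 1))
    rng.shuffle(examples)
    return examples


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(examples, epochs=50, lr=0.1):
    """Step 3: fit a one-neuron network by stochastic gradient descent."""
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        for (x0, x1), label in examples:
            error = sigmoid(w0 * x0 + w1 * x1 + b) - label
            w0 -= lr * error * x0
            w1 -= lr * error * x1
            b -= lr * error
    return w0, w1, b


def accuracy(model, examples):
    """Step 4: fraction of examples the model labels correctly."""
    w0, w1, b = model
    correct = sum(
        1
        for (x0, x1), label in examples
        if (sigmoid(w0 * x0 + w1 * x1 + b) >= 0.5) == (label == 1)
    )
    return correct / len(examples)


rng = random.Random(0)
labeled = collect_and_annotate(50, rng)                     # step 2
training_set, validation_set = labeled[:70], labeled[70:]   # hold data back for step 4
model = train(training_set)                                 # step 3
validation_accuracy = accuracy(model, validation_set)       # step 4
```

The validation examples are never shown to `train`, which is what makes the accuracy figure an independent check rather than a measure of memorization.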
What is it?
What’s happening?
What is it?
HDF5 + Deep Learning
• I/O library optimized for scale + speed
• Self-documenting container optimized for scientific data + metadata
• Users who need both features
HDF5 already integrated into every major DL Framework (TensorFlow, Caffe, Keras, etc.)
What does the HDF Group do?
Products
• HDF5 Community Edition + Enterprise Edition
• Connectors: ODBC + Cloud (Beta)
• Add-Ons: compression + encryption
Support
• HDF Support Packages (Basic + Pro + Premier)
• Support for h5py + PyTables + pandas (NEW)
• Training
Consulting
• HDF: new functionality + performance tuning for specific use cases
• HPC software engineering with scientific expertise
• Deep Learning expertise
Poll Question 3: What is your company’s primary use for AI/Deep learning?
A. Early Discovery/ Pre-clinical
B. Development & Clinical
C. Imaging Analysis
D. Other
E. Don’t use AI
Sean Ekins, CEO, Collaborations Pharmaceuticals, Inc.
Deep Learning in Pharmaceutical Research
AI in Pharma is not new!
• Neural Networks
• Genetic algorithms
• SVM
• ‘Used’ for decades
• Why it never took off:
– Compute power
– Lack of training data
– Limited support
– Most scientists did not believe them; a paradigm shift was needed
– Pharma mergers culled tens of thousands of scientists
DEEP LEARNING
Big data in 2002 vs 2017
Now: TB data, ~19,000 compounds
Speeding drug discovery with AI
Workflow: HTS phenotypic screen; molecule screening database; machine learning models; vendor library; top scoring molecules assayed in vitro
Algorithms: Bernoulli Naive Bayes, Logistic linear regression, AdaBoost Decision Trees, Random Forest, Support Vector Machines (SVM), Deep Neural networks (DNN)
▶ Molecular pattern recognition of biological data
▶ Descriptors identify these patterns
▶ Define active and inactive features
▶ Used to generate predictions for drug activity at a certain target (organism, protein of interest)
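As a rough illustration of how descriptors can define active and inactive features and then drive predictions, the sketch below fits a Bernoulli Naive Bayes model (one of the algorithms named on the slide) to binary fingerprints. The six-bit fingerprints, the labels, and the function names are hypothetical toy values, not data or code from the talk; real descriptors such as ECFP6 have far more bits.

```python
import math


def fit_bernoulli_nb(fingerprints, labels, n_bits, alpha=1.0):
    """Estimate, per class, how often each fingerprint bit is set (Laplace-smoothed)."""
    model = {}
    for cls in sorted(set(labels)):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        log_prior = math.log(len(rows) / len(labels))
        bit_prob = [
            (sum(1 for fp in rows if bit in fp) + alpha) / (len(rows) + 2 * alpha)
            for bit in range(n_bits)
        ]
        model[cls] = (log_prior, bit_prob)
    return model


def log_score(model, fingerprint, cls):
    """Log of prior times the likelihood of every bit being on or off for this class."""
    log_prior, bit_prob = model[cls]
    return log_prior + sum(
        math.log(p) if bit in fingerprint else math.log(1.0 - p)
        for bit, p in enumerate(bit_prob)
    )


def predict_active(model, fingerprint):
    """True when the 'active' class (1) explains the fingerprint better than 'inactive' (0)."""
    return log_score(model, fingerprint, 1) > log_score(model, fingerprint, 0)


# Hypothetical training set: each molecule is the set of fingerprint bits that are on.
train_fps = [{0, 1, 2}, {0, 1, 3}, {0, 1}, {4, 5, 2}, {4, 5, 3}, {5}]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = active, 0 = inactive

nb_model = fit_bernoulli_nb(train_fps, train_labels, n_bits=6)
looks_active = predict_active(nb_model, {0, 1, 4})
looks_inactive = predict_active(nb_model, {4, 5})
```

In this toy set, bits 0 and 1 end up as "active features" and bits 4 and 5 as "inactive features", so a new molecule is scored by which of those patterns it carries.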
What is Deep Learning
Deep Learning uses
• facial recognition algorithms
– Facebook tagging photos
• self-driving cars
• robot assistants
http://tinyurl.com/hak4lcv
http://tinyurl.com/y8vjv8lp
Deep Learning in Pharmaceutical Research
• Bioinformatics
– Protein disorder
– Refine docking complexes
– Model CLIP-seq data
– High content image analysis data
– Biomarkers
– Protein contacts
– Cancer diagnosis
• Pharmaceutical
– Solubility
– Gene expression data
– Formulation
– QSAR: Merck DL outperformed random forests in 11/15 and 13/15 datasets
– Tox21
Where else could we apply DL in drug discovery? Pharmacoeconomics?
Gaps in Deep Learning for Pharmaceutical research
• TensorFlow
• Deeplearning4j
• Facebook (Torch)
• Microsoft (CNTK)
• Which metrics to use?
• Which descriptors?
• Are the DL models overtraining?
• Lack of prospective testing.
Recent Deep Learning papers
Comparison of TB Machine-Learning Models (1µM)
Logistic Regression (LR)
AdaBoosted Decision Trees (ADA)
Random Forest (RF)
Bernoulli Naive Bayes (BNB)
Support Vector Machines (SVM)
Deep Neural Networks (DNN)
▶ TB data from literature
▶ ~19,000 molecules
▶ ECFP6 descriptors
▶ Used previously with Bayesian methods
▶ Multiple metrics
▶ 5-fold cross-validation
▶ Classic ML: open source scikit-learn, http://scikit-learn.org/stable/
▶ Deep Neural Networks (DNN) using Keras, https://keras.io/, and TensorFlow, www.tensorflow.org
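The evaluation protocol named above (5-fold cross-validation scored with a metric such as ROC AUC) can be sketched in plain Python. This is a generic illustration, not the study's code: the study used scikit-learn and Keras, while the one-feature data set, the `fit_centroids` toy model, and every function name here are assumptions made for the example.

```python
import random


def roc_auc(labels, scores):
    """Chance that a random active outscores a random inactive (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with a similar class balance in each."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validate(features, labels, fit, score, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, and return one AUC per fold."""
    rng = random.Random(seed)
    aucs = []
    for held_out in stratified_folds(labels, k, rng):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        held_scores = [score(model, features[i]) for i in held_out]
        aucs.append(roc_auc([labels[i] for i in held_out], held_scores))
    return aucs


def fit_centroids(xs, ys):
    """Hypothetical toy model: remember the mean feature value of each class."""
    actives = [x for x, y in zip(xs, ys) if y == 1]
    inactives = [x for x, y in zip(xs, ys) if y == 0]
    return sum(inactives) / len(inactives), sum(actives) / len(actives)


def score_centroids(model, x):
    """Higher when x sits closer to the active mean than to the inactive mean."""
    inactive_mean, active_mean = model
    return abs(x - inactive_mean) - abs(x - active_mean)


# Hypothetical one-descriptor data set: inactives 0..9, actives 10..19.
toy_features = [float(v) for v in range(20)]
toy_labels = [0] * 10 + [1] * 10
fold_aucs = cross_validate(toy_features, toy_labels, fit_centroids, score_centroids)
mean_auc = sum(fold_aucs) / len(fold_aucs)
```

Any model that exposes a fit step and a scoring step can be dropped into `cross_validate`, which is what makes a like-for-like comparison across algorithms possible.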
Small scale Machine Learning comparison
• Comparing different algorithms and using FCFP6 fingerprints
• Deep learning seems to improve model ROC statistics in 4/6 cases.
• Data sets range from 100s to >300K
• All classification models
• Next steps: evaluate all the datasets in ChEMBL, PubChem, ToxCast, etc.
Korotcov et al., Submitted
Building Machine Learning models Assay Central
• Curate data and build models
• Provide models and collections as jar files
Add DL algorithm to Assay Central
Acknowledgments
• Kim Zorn Assay Central Guru
• Alex Clark Assay Central
• Thomas Lane PhD intern UNC
• Dan Russo PhD intern Rutgers
• Jacob Gerlach High School Intern
• Valery Tkachenko Deep Learning Consultant
• Alex Korotcov Deep Learning Consultant
• Thanks also to: Renee Arnold, Peter Swaan
Funding from NIGMS NIH R43GM122196
Poll Question 4: What is the greatest barrier to application of AI at your organization?
A. Technical & skills expertise
B. Access to data
C. Data quality
D. Management support/understanding
E. Other
Peter Henstock - Business
Technology, Pfizer Inc.
Why is pharma lagging in the AI arena whereas
other industries are already transformed?
AI Works
What does Waze do?
• Obtain public data: maps & locations
• Acquire & organize data for AI analyses
– Leverage historical traffic data
– Integrate new traffic information
• Utilize AI algorithms
– Fastest route predictions
• Present timely information through UI
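The slide does not say which algorithms Waze actually uses. As a hedged illustration of the pattern it describes (historical data, new traffic information, then a fastest-route prediction), the sketch below applies Dijkstra's shortest-path algorithm to a hypothetical four-junction map; the road names, travel times, and function names are all invented.

```python
import heapq


def apply_traffic(base_minutes, delays):
    """Combine historical travel times with newly reported delays (in minutes)."""
    return {
        origin: {
            dest: minutes + delays.get((origin, dest), 0)
            for dest, minutes in roads.items()
        }
        for origin, roads in base_minutes.items()
    }


def fastest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total minutes, path), or None if unreachable."""
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        minutes, node, path = heapq.heappop(queue)
        if node == goal:
            return minutes, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, travel in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (minutes + travel, neighbor, path + [neighbor]))
    return None


# Hypothetical historical travel times between junctions, in minutes.
historical = {"A": {"B": 5, "C": 10}, "B": {"D": 5}, "C": {"D": 3}, "D": {}}

usual = fastest_route(historical, "A", "D")
# A new report adds a 6-minute delay on the B -> D road, so the advice changes.
live = fastest_route(apply_traffic(historical, {("B", "D"): 6}), "A", "D")
```

The route recommendation changes only because the data changed, which is the point of the slide: the value comes from acquiring and integrating timely data as much as from the algorithm.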
Why Isn’t AI Working Yet for Pharma?
drugwaze
Rescreening: 55% chance of new series, 6 weeks, $1.2MM
Optimization: 14% issue series 1
Solubility cause: 23% issue series 2
Safety cause: 5% issue series 3
8.2 months to Phase 1
Predicted FDA approval chance: 37%
Recommended actions: 1) Resolve the
Keys to Success
• Obtain public data
• Acquire & organize data for AI analyses
• Utilize AI algorithms
• Present timely information through UI
Need for a Chief Data Officer
Value Proposition
https://www.123rf.com/photo_17347316_businessman-pulling-rope-on-white-background.html
$ $ $
Acquire and organize data for AI
Analytics First, Then AI
• Readiness for Analytics & AI
– Curated data sources
– Automated data management processes
– Structured data analytics
• “If your company isn’t good at analytics,
it’s not ready for AI”
– Harvard Business Review June 7, 2017
Harvard Business Review October 2012
Modern Data Scientist
Math
Statistics
AI
Hacking
Database
Computing
Story Telling
Visualization
Domain Knowledge
Analysis
AI & Pharma Skillset Intersection
https://www.quora.com/What-is-the-difference-between-Data-Analytics-Data-Analysis-Data-Mining-Data-Science-Machine-Learning-and-Big-Data-1
Software Engineering
Bioinformatics
Architecture & Systems
Clinical Statistics
HPC/Linux Farm
AI & Machine Learning
Scientists
Does Pharma Have the Right Skills?
[Chart: fields (Management, Business, Computer Science, Biology, Chemistry, Medicine, Law, Statistics, Physics) by degree level (BS, MS/MBA, PhD/MD/JD)]
Need depth & breadth across AI areas
http://skrullemperor.deviantart.com/art/Deer-in-Headlights-120323487
Threat of High Salaries for “Expertise”
Paul Minton:
Waiter ($20K) → data scientist ($100K)
“As Tech Booms, Workers Turn to Coding for Career Change”. July 28, 2015 New York Times
https://www.linkedin.com/pulse/body-language-does-work-business-owners-andrew-r-mackey
AI is a harder concept to grasp
• Pharma & IT grasp replacement technologies
– Virtual machine replaces physical machine
– Cloud storage replaces local disks
– Agile replaces waterfall method
– High Throughput Screening replaces “screening”
– High Content Screening replaces imaging
• AI and Machine Learning
– Provide a data-driven complement to many disciplines
– Apply from early discovery to marketing
– Span journals, data, omics, images, decision-making
Volume of Tasks
• Easy to develop AI solutions around a single task
– Waze navigates
– Amazon sells
– LinkedIn links
– Facebook advertises
• Pharma/Biotech tasks are varied
– Text mining for targets
– Screening and imaging technologies
– Using ‘Omics
– Drug optimization
– Clinical trials
– Patient reports and communication
– Predictions on activity, safety, trial enrollment, outcomes…
Machine Learning Methods of AI
ML Mastery
Big Data Landscape
http://mattturck.com/2016/02/01/big-data-landscape/
http://arthurmcarthurs.blogspot.com/2011/06/deer-in-headlights.html
AI Is Having a Stifled Impact in Pharma
• Bottom-Up Proof Cycle
– Scientific domain culture
– Continually need to prove AI’s value to every group
– Leveraging 1 data set at a time for 1 AI problem
– Gains are localized to small groups
• Minimal investment
– Sitting on more data than most industries
– Failing to analyze and leverage this data
– Hiring less AI expertise than small tech startups
– Relying on expensive external collaborations
How to Succeed
1) Organize the data for AI
“Data, rather than software, is the barrier”
2) Invest in AI talent
“Simply downloading and “applying” open-source software to your data won’t work. AI needs to be customized to your business context and data. This is why there is currently a war for the scarce AI talent that can do this work.”
3) Develop an AI strategy
“After understanding what AI can and can’t do, the next step for executives is incorporating it into their strategies. [This] is the beginning, not the end….”
“What Artificial Intelligence Can and Can’t do Now”
Harvard Business Review Nov 9, 2016 Andrew Ng
Audience Q&A
Please use the Question function in GoToWebinar
The next Pistoia Alliance Discussion Webinar:
Beyond BMI: Body Composition Phenotyping in the UK Biobank
Date: October 25, 2017
Check http://www.pistoiaalliance.org/events/ for the latest information
[email protected] @pistoiaalliance www.pistoiaalliance.org