Automating Drug Design Using Robot Scientists
Ross D. King, University of Manchester
The Concept of a Robot Scientist
[Figure: the cycle of background knowledge, hypothesis formation, analysis, experiment selection, robot, results interpretation, and final theory]
Computer systems capable of originating their own experiments, physically executing them, interpreting the results, and then repeating the cycle.
Motivation
Robot Scientists have the potential to increase the productivity of science. They can work more cheaply, faster, more accurately, and for longer than humans, and they can easily be multiplied.
– This enables high-throughput testing of hypotheses.
Robot Scientists have the potential to improve the quality of science.
– They enable experiments to be described in greater detail and with greater semantic clarity.
Robot Scientist Timeline
1999-2004 Initial Robot Scientist Project
– Limited hardware
– Collaboration with Douglas Kell (Aberystwyth Biology), Steve Oliver (Manchester), Stephen Muggleton (Imperial)
– King et al. (2004) Nature, 427, 247-252
2004-2011 Adam Project
– Yeast functional genomics
– Sophisticated laboratory automation
– Collaboration with Steve Oliver (Cambridge)
– King et al. (2009) Science, 324, 85-89
2011-2014 Eve Project
– Drug design for tropical diseases
Adam
Functional Genomics
In yeast (S. cerevisiae), ~15% of the ~6,000 genes still have no known function.
Adam was the first machine to autonomously discover scientific knowledge.
Eve
[Figure: Eve, a Synthetic Biology Robot Scientist: assay design, library screen, hit confirmation, learn and test (QSAR), lead compound]
Automating Early Drug Development
Application Domain
Malaria
Schistosomiasis
Leishmaniasis
Chagas disease
Why Tropical Diseases?
Millions of people die of these diseases, and hundreds of millions suffer from infection.
It is clear how to cure these diseases: kill the parasites.
They are “neglected”, so working on them avoids competition from the pharmaceutical industry.
Synthetic Biology based Assays
Eve uses a standardized form of screening assay that combines the advantages of:
– computational assays (generality)
– biochemical assays (targeted)
– live cells (biological realism, and early screening for toxicity)
These assays are cheap (a few £k) and quick (a few weeks) to engineer.
Our idea is to engineer cells to be assay computers.
These computers will accurately estimate a biological function that corresponds to the set of desired assay properties.
The function estimated is the utility of a compound against a disease, e.g.
$(\text{inhibit } P.\,vivax\ \text{DHFR}) \land \lnot(\text{inhibit } H.\,sapiens\ \text{DHFR}) \land \lnot\,\text{cytotoxic}$
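The utility function above can be read as a Boolean test on assay readouts. The sketch below assumes three engineered strains, each read out as growth relative to an untreated control, and a 50% growth threshold for "inhibited"; the strain layout, field names, and threshold are hypothetical illustrations, not Eve's actual assay design.

```python
from dataclasses import dataclass

# Hypothetical threshold: growth below 50% of the untreated control counts as "inhibited".
INHIBITION_THRESHOLD = 0.5

@dataclass
class AssayReadout:
    """Growth of each engineered strain relative to an untreated control (1.0 = unaffected).

    The three-strain layout is an illustrative assumption.
    """
    parasite_dhfr: float   # strain whose growth depends on P. vivax DHFR
    human_dhfr: float      # strain whose growth depends on H. sapiens DHFR
    reference: float       # strain with neither dependency; inhibition here suggests cytotoxicity

def inhibited(growth: float) -> bool:
    return growth < INHIBITION_THRESHOLD

def desired(r: AssayReadout) -> bool:
    """(inhibit P. vivax DHFR) AND NOT (inhibit H. sapiens DHFR) AND NOT cytotoxic."""
    return (inhibited(r.parasite_dhfr)
            and not inhibited(r.human_dhfr)
            and not inhibited(r.reference))

print(desired(AssayReadout(parasite_dhfr=0.2, human_dhfr=0.9, reference=0.95)))  # True
print(desired(AssayReadout(parasite_dhfr=0.2, human_dhfr=0.2, reference=0.90)))  # False: not selective
```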
Synthetic Biology Workflow
Enzymes Targeted
Dihydrofolate Reductase (DHFR)
N-myristoyltransferase (NMT)
Phosphoglycerate kinase
Eve
AI
The Experimental Cycle
[Figure: the Robot Scientist cycle of hypothesis formation, experiment selection, robot execution, and results interpretation, leading to a final theory]
Model v Real-World
[Figure: the logical model yields experimental predictions; the biological system yields experimental results]
Representation for QSARs
Eve aims to learn quantitative structure-activity relationships (QSARs): functions that predict a compound's activity from its structure.
The standard method is to describe compounds by attributes. Technically, these are propositions that are true of a compound, e.g. partial charge, a fingerprint, etc. Eve currently uses a form of fingerprint.
Compounds have relational structure, and propositions are provably inefficient at representing it. It is potentially much better to use predicate logic.
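As an illustration of an attribute-based (propositional) representation, here is a minimal, hypothetical path fingerprint: each bit stands for the proposition "the compound contains this atom path". It is a generic sketch assuming a toy molecule given as atom symbols plus a bond list; it is not the specific fingerprint Eve uses.

```python
import hashlib

# Hypothetical toy input: atom symbols plus an undirected list of bonds (atom index pairs).
Molecule = tuple[list[str], list[tuple[int, int]]]

def paths(mol: Molecule, max_len: int = 3) -> set[str]:
    """All simple paths of up to max_len bonds, written as atom-symbol strings."""
    atoms, bonds = mol
    nbrs: dict[int, list[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        nbrs[a].append(b)
        nbrs[b].append(a)
    found: set[str] = set()

    def walk(path: list[int]) -> None:
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        found.add(min(forward, backward))  # C-O and O-C are the same feature
        if len(path) - 1 < max_len:
            for n in nbrs[path[-1]]:
                if n not in path:
                    walk(path + [n])

    for i in range(len(atoms)):
        walk([i])
    return found

def fingerprint(mol: Molecule, n_bits: int = 64, max_len: int = 3) -> set[int]:
    """On-bits of a hashed path fingerprint; bit k is on if some path hashes to k."""
    bits = set()
    for p in paths(mol, max_len):
        digest = hashlib.sha1(p.encode()).hexdigest()  # deterministic, unlike built-in hash()
        bits.add(int(digest, 16) % n_bits)
    return bits

def tanimoto(a: set[int], b: set[int]) -> float:
    """Shared on-bits divided by the union of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
methanol = (["C", "O"], [(0, 1)])
print(sorted(paths(ethanol)))  # ['C', 'C-C', 'C-C-O', 'C-O', 'O']
print(tanimoto(fingerprint(ethanol), fingerprint(methanol)))
```

Each bit records only whether some path occurs, not where it occurs or how paths relate to one another; this loss of relational structure is the limitation that motivates predicate logic.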
Inferring Hypotheses
Science is based on the hypothetico-deductive method.
In the philosophy of science, it has often been argued that only humans can make the “leaps of imagination” necessary to form hypotheses.
QSAR learning is a form of inductive hypothesis formation.
Learning QSARs
Almost every form of statistical and machine learning method you can think of has been applied to QSAR learning.
Leading methods include logistic regression, support vector machines, and random forests, …
Eve currently uses Gaussian process models. These have the advantages of being generative and of outputting probabilities (a predictive mean and variance), which helps active learning.
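A minimal sketch of Gaussian process regression over binary fingerprints, showing the predictive mean and variance that active learning relies on. The Tanimoto kernel, zero prior mean, and noise level are assumptions chosen for illustration; the source does not state which kernel or hyperparameters Eve uses.

```python
import numpy as np

def tanimoto_kernel(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Tanimoto similarity between the rows of binary fingerprint matrices A (n x d) and B (m x d)."""
    dot = A @ B.T
    denom = (A * A).sum(axis=1)[:, None] + (B * B).sum(axis=1)[None, :] - dot
    # Two all-zero fingerprints are treated as identical (similarity 1).
    return np.where(denom > 0, dot / np.where(denom > 0, denom, 1.0), 1.0)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    """Zero-mean GP regression: predictive mean and variance of activity at X_test."""
    K = tanimoto_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = tanimoto_kernel(X_train, X_test)            # n x m cross-covariances
    L = np.linalg.cholesky(K)                         # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v * v).sum(axis=0)                   # prior variance k(x, x) = 1 for this kernel
    return mean, np.maximum(var, 0.0)

X = np.array([[1, 1, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0]], dtype=float)
y = np.array([1.0, 2.0, 3.0])                         # illustrative activities
print(gp_predict(X, y, np.array([[0, 0, 0, 1, 1, 0]], dtype=float)))
```

For a compound that shares no bits with the training set, the mean falls back to the prior (0) and the variance to 1; that uncertainty is exactly what an exploring learner can target.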
Active Learning 1
Active learning is the branch of machine learning in which the machine selects its own experiments.
Eve uses active learning to select compounds with which to test the QSAR hypotheses.
This selection task is comparable to that in many other areas of science and engineering: identify or design artifacts that have optimal performance.
It has an extra ingredient reminiscent of reinforcement learning: finding the right balance between exploring compound space and exploiting regions with highly active compounds.
Active Learning 2
A successful approach was found to be to select compounds with both high estimated activity $T$ and high estimated variance, i.e. to select the compound for which $T + b\sqrt{\mathrm{var}(T)}$ is maximal, where $b$ sets the balance between exploitation and exploration.
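The selection rule can be written in a few lines. This sketch assumes the model already supplies an estimated activity and variance for each candidate (for example from a Gaussian process); the default b = 1.0 is an arbitrary illustrative value.

```python
import numpy as np

def ucb_select(mean, var, b=1.0):
    """Index of the candidate compound maximising T + b * sqrt(var(T))."""
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    scores = mean + b * np.sqrt(np.maximum(var, 0.0))  # clip tiny negative variances
    return int(np.argmax(scores))

mean = [0.9, 0.5, 0.2]   # estimated activity T (illustrative values)
var = [0.01, 0.25, 1.0]  # estimated variance of T
print(ucb_select(mean, var, b=0.0))  # 0: pure exploitation picks the most active estimate
print(ucb_select(mean, var, b=1.0))  # 2: scores are 1.0, 1.0, 1.2
```

With b = 0 the rule is pure exploitation; larger values of b increasingly favour compounds the model is uncertain about.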
It is generally inefficient to assay (or synthesize) a single compound per QSAR cycle, so batches of N compounds should be selected (for Eve, N = 64). This greatly increases the computational complexity of choosing the best experiment.
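One common way to pick a batch, assumed here because the source does not say how Eve chooses its N compounds, is greedy "kriging believer" selection: pick the best-scoring compound, pretend it has been measured at its predicted activity, update the variances, and repeat. Pretending the observation equals the prediction leaves the GP mean unchanged but shrinks the variance of similar compounds, so the batch spreads out. The Tanimoto kernel and all names are illustrative; for Eve, N would be 64.

```python
import numpy as np

def tanimoto_kernel(A, B):
    """Tanimoto similarity between rows of binary fingerprint matrices."""
    dot = A @ B.T
    denom = (A * A).sum(axis=1)[:, None] + (B * B).sum(axis=1)[None, :] - dot
    return np.where(denom > 0, dot / np.where(denom > 0, denom, 1.0), 1.0)

def posterior_var(X_obs, X_cand, noise=1e-2):
    """GP posterior variance at X_cand after observing X_obs (independent of the y values)."""
    if len(X_obs) == 0:
        return np.ones(len(X_cand))
    K = tanimoto_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = tanimoto_kernel(X_obs, X_cand)
    return np.maximum(1.0 - (K_s * np.linalg.solve(K, K_s)).sum(axis=0), 0.0)

def select_batch(X_obs, X_cand, mean, N, b=1.0):
    """Greedily pick N candidates by T + b*sqrt(var(T)), kriging-believer style."""
    chosen = []
    X_pseudo = X_obs
    for _ in range(N):
        scores = mean + b * np.sqrt(posterior_var(X_pseudo, X_cand))
        for j in chosen:
            scores[j] = -np.inf                  # never pick the same compound twice
        i = int(np.argmax(scores))
        chosen.append(i)
        # Treat compound i as observed: the mean is unchanged, nearby variances shrink.
        X_pseudo = np.vstack([X_pseudo, X_cand[i:i + 1]])
    return chosen

X_cand = np.array([[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0], [0, 0, 0, 0, 1, 1]], dtype=float)
mean = np.array([1.0, 1.0, 0.8])
print(select_batch(np.zeros((0, 6)), X_cand, mean, N=2))  # [0, 2]
```

Plain UCB without the variance update would pick the two near-identical compounds 0 and 1; the update steers the second pick to a different region of compound space.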
Eve’s Automation of the Pipeline
[Figure: library screening → hits → hit confirmation → confirmed hits → learn QSAR / intelligent screening → predicted hits → lead → offline validation]
Standard library screening is brute force; Eve uses intelligent screening.
In the standard “pipeline”, the three processes are not integrated.
In Eve, they are automated and integrated.
Eve’s Hardware
Highlights of Eve's hardware:
– Acoustic liquid handling
– High-throughput 384-well plates
– Two industrial robot arms
– Automated 60x microscope
– Liquid handlers, fluorescence readers, barcode scanners, dry store, incubator, tube decapper ...
Hit or Not?
Growth curves were fitted to the time courses, and growth parameters were derived.
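Growth parameters can be derived from a time course without committing to a particular growth model. The sketch below is hypothetical (the source does not say which model or parameters Eve fits): it takes the steepest log-linear segment of a positive growth signal, such as optical density or fluorescence, as the maximum specific growth rate and derives a lag time from it.

```python
import numpy as np

def growth_parameters(t, signal, window=5):
    """Estimate mu_max (per unit time), lag time and maximum signal from a growth time course.

    A straight line is fitted to ln(signal) over each sliding window of `window` points;
    the steepest slope is taken as the maximum specific growth rate (mu_max).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(signal, dtype=float)
    log_y = np.log(y)                       # requires a strictly positive signal
    best_slope, best_intercept = -np.inf, 0.0
    for i in range(len(t) - window + 1):
        slope, intercept = np.polyfit(t[i:i + window], log_y[i:i + window], 1)
        if slope > best_slope:
            best_slope, best_intercept = slope, intercept
    # Lag: where the tangent through the steepest segment meets the initial ln(signal).
    lag = (log_y[0] - best_intercept) / best_slope if best_slope > 0 else float("inf")
    return {"mu_max": float(best_slope), "lag": max(float(lag), 0.0), "max_signal": float(y.max())}

t = np.arange(0, 10.5, 0.5)                                   # hours
signal = np.where(t < 2, 0.05, 0.05 * np.exp(0.5 * (t - 2)))  # synthetic: 2 h lag, then mu = 0.5/h
print(growth_parameters(t, signal))  # mu_max ≈ 0.5, lag ≈ 2.0
```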
Machine learning was used to distinguish between hit compounds, non-hits, toxic compounds, and autofluorescent compounds.
Being a hit is not a Boolean property but a quantitative one.
Closing the Loop
We have physically implemented all aspects of Eve.
To the best of our knowledge, Eve is the first laboratory automation system that can execute cycles of QSAR learning and testing.
To the best of our knowledge, Eve is the first laboratory automation system that integrates library screening, hit confirmation, and QSAR learning.
Table of Results
Intelligent v Brute-force Screening 1
We wished to compare our AI based screening against the standard brute-force approach: “begin at the beginning and go on till you come to the end: then stop” (Lewis Carroll).
While simple to automate, standard screening is slow and wasteful of resources, since every compound in the library is tested. It is also unintelligent, as it makes no use of what is learnt during screening.
Use money to decide which approach is better.
Intelligent v Brute-force Screening 2
We developed an econometric model of the relative costs of the two approaches.
We used simulation runs based on Eve’s screening data to compare the approaches.
Intelligent screening is most cost-effective with larger libraries, more valuable compounds, and fast cycles of screening and testing. Such regimes are standard in pharmaceutical screening.
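The econometric model itself is not given here. As a purely hypothetical toy, with made-up parameter names and values, the sketch below compares the net value of the two approaches: brute force pays for every compound and finds every hit, while intelligent screening pays for fewer compounds plus a per-cycle overhead but may miss some hits. In practice, the fraction screened, the fraction of hits found, and the number of cycles would come from simulation runs such as those based on Eve's data.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """Hypothetical toy parameters, in arbitrary money units."""
    library_size: int      # compounds in the library
    compound_cost: float   # cost of consuming one compound in an assay
    hit_value: float       # value assigned to each hit found
    hit_rate: float        # fraction of library compounds that are hits
    cycle_cost: float      # overhead per learn-and-test cycle (e.g. time, staff)

def brute_force_net(s: Scenario) -> float:
    """Screen every compound and find every hit."""
    hits = s.hit_rate * s.library_size
    return hits * s.hit_value - s.library_size * s.compound_cost

def intelligent_net(s: Scenario, fraction_screened: float,
                    fraction_hits_found: float, cycles: int) -> float:
    """Screen only part of the library over several cycles; some hits may be missed."""
    screened = fraction_screened * s.library_size
    hits = fraction_hits_found * s.hit_rate * s.library_size
    return hits * s.hit_value - screened * s.compound_cost - cycles * s.cycle_cost

s = Scenario(library_size=10_000, compound_cost=1.0, hit_value=100.0, hit_rate=0.01, cycle_cost=50.0)
# Made-up simulation outcome: 30% of the library screened, 80% of hits found, 10 cycles.
print(intelligent_net(s, 0.3, 0.8, 10) - brute_force_net(s))  # ≈ 4500
```

In this toy, for parameters like these, the advantage of intelligent screening grows with library size and with the cost of each compound consumed, and shrinks as the per-cycle overhead rises.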
Acknowledgments
ABERYSTWYTH / MANCHESTER
Wayne Aubrey
Amanda Clare
Douglas Kell
Maria Liakata
Chuan Lu
Magda Markham
Katherine Martin
Ronald Pateman
Jem Rowland
Andrew Sparkes
Larisa Soldatova
Mike Young
Ken Whelan
CAMBRIDGE
Steve Oliver
Elizabeth Bilsland
Pınar Pir
Harry Moss
Michael de Clare
Mark Carrington
LEUVEN
Kurt De Grave
Luc De Raedt
Jan Ramon
Supported by grant BB/F008228/1 from the UK Biotechnology & Biological Sciences Research Council and by a contract from the European Commission under the FP7 Collaborative Programme, UNICELLSYS.