
Slas talk 2016

Date post: 14-Apr-2017
Upload: sean-ekins
Page 1: Slas talk 2016

Ensuring Chemical Structure, Biological Data and Computational Model Quality

Sean Ekins

1 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
2 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
3 Collaborations Pharmaceuticals, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
4 Phoenix Nest, Inc., P.O. Box 150057, Brooklyn, NY 11215, USA.
5 Hereditary Neuropathy Foundation, 401 Park Avenue South, 10th Floor, New York, NY 10016, USA.

Email: [email protected] Twitter: collabchem

Page 2: Slas talk 2016

Outline

• Database Quality
• Molecule structure availability
• Dispensing Error
• Simulating Error
• NIH Probe Quality
• BIA 10-2474

"Well, here's another nice mess you've gotten me into!"

Page 3: Slas talk 2016

Summary of Data Rich World What do we trust?

Page 4: Slas talk 2016

‘Big’ Chemistry Databases

1 billion molecules, but how many are real?

Page 5: Slas talk 2016

It Started for me by Looking at Malaria Data with SMARTS Filters

Med. Chem. Commun., 2010,1, 325-330

Used various filters (Pfizer, Glaxo, Abbott – implemented by University of New Mexico) with antimalarial datasets

Found large percentages of libraries were failing filters

Some filters more stringent than others (Alarm vs Glaxo)

Proposed wider use of such filters

PAINS also appeared in 2010

Page 6: Slas talk 2016

Circa 2011-2012: Structure Quality Issues Everywhere

NPC Browser http://tripod.nih.gov/npc/

Database released, and within days hundreds of errors were found in structures

DDT, 16: 747-750 (2011)

Science Translational Medicine 2011

DDT 17: 685-701 (2012)

Page 7: Slas talk 2016

Circa 2013-now: Finding Structures of Pharma Molecules is Hard

DDT, 18: 58-70 (2013)

NCATS and MRC made molecule identifiers from several pharma companies available without structures. This continues today.

Limits computational repurposing efforts, transparency

Page 8: Slas talk 2016

DDT editorial Dec 2011

http://goo.gl/dIqhU

This editorial led to collaboration

It’s Not Just Structure Quality we Need to Worry About

Page 9: Slas talk 2016

How do you Move a Liquid?

Images courtesy of Bing, Tecan

McDonald et al., Science 2008, 322, 917.
Belaiche et al., Clin Chem 2009, 55, 1883-1884.

Plastic Leaching

Page 10: Slas talk 2016

Using Literature Data From Different Dispensing Methods to Generate Computational Models

Few molecule structures and corresponding datasets are public

Using data from 2 AstraZeneca patents:

Tyrosine kinase EphB4 pharmacophores (Accelrys Discovery Studio) were developed using data for 14 compounds

IC50 determined using different dispensing methods

Analyzed correlation with simple descriptors (SAS JMP)

Calculated LogP correlation with log IC50 data for acoustic dispensing (r2 = 0.34, p < 0.05, N = 14)

Barlaam, B. C.; Ducray, R., WO 2009/010794 A1, 2009
Barlaam, B. C.; Ducray, R.; Kettle, J. G., US 7,718,653 B2, 2010

Page 11: Slas talk 2016

14 Compounds With Structures and IC50 Data

Compound #   IC50 Acoustic (µM)   IC50 Tips (µM)   Ratio IC50Tip/IC50ADE
5            0.002                0.553            276.5
4            0.003                0.146            48.7
7            0.003                0.778            259.3
W7b          0.004                0.152            42.5
8            0.004                0.445            111.3
W5           0.006                0.087            13.7
6            0.007                0.973            139.0
W3           0.012                0.049            4.2
W1           0.014                0.112            8.2
9            0.052                0.170            3.3
10           0.064                0.817            12.8
W12          0.158                0.250            1.6
W11          0.207                14.400           69.6
11           0.486                3.030            6.2

Barlaam, B. C.; Ducray, R., WO 2009/010794 A1, 2009
Barlaam, B. C.; Ducray, R.; Kettle, J. G., US 7,718,653 B2, 2010

Page 12: Slas talk 2016

[Figure: scatter plot of log IC50-tips against log IC50-acoustic; both axes run from -3 to 1.5]

log IC50 Values for Tip-based Serial Dilution and Dispensing Versus Acoustic Dispensing with Direct Dilution

Shows poor correlation (R2 = 0.246)

Acoustic technique always gave more potent IC50 values

PLoS ONE 8(5): e62325 (2013)
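The comparison on this slide can be checked with a few lines of Python. This is a minimal sketch, not the authors' analysis (the slides mention SAS JMP): it takes the 14 IC50 pairs shown for the AstraZeneca patent compounds, computes the squared Pearson correlation of the log-transformed values, and lists the tip/acoustic fold ratios. The values are transcribed as rounded on the slide, so the computed R2 may differ somewhat from the reported 0.246. The names `DATA`, `r_squared` and `fold_ratios` are illustrative.

```python
import math

# (compound, IC50 acoustic in uM, IC50 tips in uM), transcribed from the slide;
# values are rounded as shown there.
DATA = [
    ("5", 0.002, 0.553), ("4", 0.003, 0.146), ("7", 0.003, 0.778),
    ("W7b", 0.004, 0.152), ("8", 0.004, 0.445), ("W5", 0.006, 0.087),
    ("6", 0.007, 0.973), ("W3", 0.012, 0.049), ("W1", 0.014, 0.112),
    ("9", 0.052, 0.170), ("10", 0.064, 0.817), ("W12", 0.158, 0.250),
    ("W11", 0.207, 14.400), ("11", 0.486, 3.030),
]


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


log_acoustic = [math.log10(a) for _, a, _ in DATA]
log_tips = [math.log10(t) for _, _, t in DATA]
r2 = r_squared(log_acoustic, log_tips)

# Fold difference between the two dispensing methods for each compound.
fold_ratios = {name: tips / acoustic for name, acoustic, tips in DATA}

print(f"R^2 (log IC50 tips vs log IC50 acoustic) = {r2:.3f}")
for name, ratio in fold_ratios.items():
    print(f"{name:>4}: tip/acoustic ratio = {ratio:.1f}")
```

Every ratio is greater than 1, which matches the slide's statement that the acoustic technique always gave the more potent IC50.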

Page 13: Slas talk 2016

Tyrosine Kinase EphB4 Pharmacophores

Generated with Discovery Studio (Accelrys)

Process                     HPF   HBA   HBD   Observed vs. predicted IC50 r
Acoustic mediated process   2     1     1     0.92
Tip-based process           0     2     1     0.80

HPF = hydrophobic features; HBA = hydrogen bond acceptor; HBD = hydrogen bond donor

[Figure panels: Acoustic; Tip based]

Cyan = hydrophobic

Green = hydrogen bond acceptor

Purple = hydrogen bond donor

Each model shows most potent molecule mapping

PLoS ONE 8(5): e62325 (2013)

Page 14: Slas talk 2016

• An additional 12 compounds from an AstraZeneca patent (Barlaam, B. C.; Ducray, R., WO 2008/132505 A1, 2008)

• 10 of these compounds had data for tip based dispensing and 2 for acoustic dispensing

• Calculated LogP and logD showed low but statistically significant correlations with tip based dispensing (r2 = 0.39, p < 0.05 and r2 = 0.24, p < 0.05, respectively; N = 36)

• Used as a test set for pharmacophores

• The two compounds analyzed with acoustic liquid handling were predicted in the top 3 using the ‘acoustic’ pharmacophore

• The ‘Tip-based’ pharmacophore failed to rank the retrieved compounds correctly

Test set Evaluation of Pharmacophores

PLoS ONE 8(5): e62325 (2013)

Page 15: Slas talk 2016

Automated Receptor-Ligand Pharmacophore Generation Method

Pharmacophores for the tyrosine kinase EphB4 generated from crystal structures in the Protein Data Bank (PDB) using Discovery Studio version 3.5.5

Cyan = hydrophobic

Green = hydrogen bond acceptor

Purple = hydrogen bond donor

Grey = excluded volumes

Each model shows most potent molecule mapping

Bioorg Med Chem Lett 2010, 20, 6242-6245.
Bioorg Med Chem Lett 2008, 18, 5717-5721.
Bioorg Med Chem Lett 2008, 18, 2776-2780.
Bioorg Med Chem Lett 2011, 21, 2207-2211.

PLoS ONE 8(5): e62325 (2013)

Page 16: Slas talk 2016

• In the absence of structural data, pharmacophores and other computational and statistical models are used to guide medicinal chemistry in early drug discovery.

• Our findings suggest acoustic dispensing methods could improve HTS results and avoid the development of misleading computational models and statistical relationships.

• Automated pharmacophores are closer to the pharmacophore generated with acoustic data: all have hydrophobic features, which are missing from the tip-based pharmacophore model

• Importance of hydrophobicity seen with logP correlation and crystal structure interactions

• Public databases should annotate this meta-data alongside biological data points, to create larger datasets for comparing different computational methods.

Dispensing Issues Summary

PLoS ONE 8(5): e62325 (2013)

Page 17: Slas talk 2016

Simple computational replica of experiment

Simulate experiments

Understand error

Just need assay protocol, data on imprecision and inaccuracy

Can be used before an assay is ever performed

IPython notebook available

Bootstrapping for Evaluating Dispensing Error

Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)
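The idea of a "simple computational replica of experiment" can be illustrated with a small Monte Carlo sketch. This is not the authors' IPython notebook; it is a hypothetical, stdlib-only example that assumes a tip-based serial dilution in which every pipetting step has Gaussian relative imprecision (`cv`) and the compound transfer has an optional systematic inaccuracy (`transfer_bias`). The function names, volumes and error values are all illustrative assumptions.

```python
import random
import statistics


def simulate_serial_dilution(stock_conc, n_wells, transfer_vol, diluent_vol,
                             cv, transfer_bias, rng):
    """Return the simulated concentration in each well of one dilution series.

    cv            : relative imprecision (random error) of each pipetting step
    transfer_bias : relative inaccuracy (systematic error) of the compound transfer
    """
    def dispense(nominal, bias):
        # Delivered volume: biased nominal volume plus Gaussian noise, never negative.
        return max(rng.gauss(nominal * (1.0 + bias), nominal * cv), 0.0)

    concs = [stock_conc]
    for _ in range(n_wells - 1):
        v_compound = dispense(transfer_vol, transfer_bias)
        v_diluent = dispense(diluent_vol, 0.0)
        # Each well dilutes the previous one, so errors accumulate down the series.
        concs.append(concs[-1] * v_compound / (v_compound + v_diluent))
    return concs


def bootstrap_dilution_error(n_replicates=2000, seed=0, **kwargs):
    """Repeat the virtual experiment and summarise the final-well concentration."""
    rng = random.Random(seed)
    finals = [simulate_serial_dilution(rng=rng, **kwargs)[-1]
              for _ in range(n_replicates)]
    mean = statistics.fmean(finals)
    return mean, statistics.stdev(finals) / mean  # mean and relative spread


# Assumed protocol: 10 uM stock, 1:1 dilutions, 5% imprecision, no bias.
params = dict(stock_conc=10.0, transfer_vol=50.0, diluent_vol=50.0,
              cv=0.05, transfer_bias=0.0)
for wells in (4, 8, 12):
    mean, rel = bootstrap_dilution_error(n_wells=wells, **params)
    ideal = 10.0 * 0.5 ** (wells - 1)
    print(f"{wells:2d} wells: ideal {ideal:.4g} uM, mean {mean:.4g} uM, spread {rel:.1%}")
```

Only the assay protocol and estimates of imprecision and inaccuracy are needed as inputs, so a sketch like this can be run before an assay is performed. In this toy model the relative spread of the last well grows with the length of the dilution series.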

Page 18: Slas talk 2016

Modeling Error Using the Bootstrap Principle

Simulate Error and bias in dispensing

Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)

Page 19: Slas talk 2016

Modeling Error Using the Bootstrap Principle

Can account for some but not all error

Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)

Page 20: Slas talk 2016

Modeling Error Using the Bootstrap Principle

The number of wells for dilution series can impact error

Try simulation for yourself https://goo.gl/Rku8c5

Hanson, Ekins and Chodera, J Comput Aided Mol Des 29: 1073-1086 (2015)

Page 21: Slas talk 2016

What is a Probe? Crowdsourcing NIH Probe Evaluation

NIH spent a decade funding HTS efforts as part of the MLSCN and MLPCN

By 2010 $576.6M in funding

Various definitions of a probe

Potency, selectivity, solubility and availability

Little has been done to learn from this work

J Chem Inf Model. (2014) 10:2996-3004

Page 22: Slas talk 2016

Could One Medicinal Chemist be Enough?

But do we really need a crowd? Could 1 medicinal chemist be enough?

> 40 years experience

Chris Lipinski scored the original 64 compounds (he was close to the median)

Found more probes since 2009

• Now scored more than 300 NIH Probes for desirability

Extensive due diligence

Based on literature (public/private)

Chemical Reactivity

J Chem Inf Model. (2014) 10:2996-3004
J Med Chem. (2015) 5:2068-76

Page 23: Slas talk 2016

Contribution of Criteria for Considering Compounds as Undesirable

79% of 322 probes are desirable

J Chem Inf Model. (2014) 10:2996-3004

Page 24: Slas talk 2016

Simple Property Comparison for NIH Probes

Properties from CDD

Properties from Discovery Studio

Desirable probes have higher molecular weight (MWT), more rotatable bonds and more heavy atoms

J Chem Inf Model. (2014) 10:2996-3004

Page 25: Slas talk 2016

Expert Evaluation vs PAINS and Bad Apple

Desirable probes less likely to be filtered by PAINS or BadApple as promiscuous than those scored as undesirable.

(Fisher's exact test, p<0.0001 for PAINS and p=0.04 for BadApple).

J Chem Inf Model. (2014) 10:2996-3004

Since the rule of 5 there has been a considerable focus on more rules – ALERTS, PAINS, QED, BadApple etc

Page 26: Slas talk 2016

Cross Validation of NIH Probes Machine Learning Models

Bayesian models built with FCFP_6 descriptors + 8 simple descriptors

Leave out 50% x 100 validation of Bayesian models

5-fold cross validation for n = 307 models

External test sets

J Chem Inf Model. (2014) 10:2996-3004
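The "leave out 50% x 100" validation scheme can be sketched as follows. This is a hypothetical, stdlib-only illustration and not the models from the paper: a plain Bernoulli naive Bayes classifier with add-one smoothing stands in for the Bayesian models, and random bit vectors stand in for FCFP_6 fingerprints. The synthetic data, the labelling rule and all function names are assumptions made for the example.

```python
import math
import random


def train_bernoulli_nb(X, y):
    """Fit per-bit log-odds weights with add-one (Laplace) smoothing."""
    pos = [x for x, label in zip(X, y) if label]
    neg = [x for x, label in zip(X, y) if not label]
    weights = []
    for j in range(len(X[0])):
        p1 = (sum(x[j] for x in pos) + 1) / (len(pos) + 2)
        p0 = (sum(x[j] for x in neg) + 1) / (len(neg) + 2)
        # Contribution to the score when the bit is on, and when it is off.
        weights.append((math.log(p1 / p0), math.log((1 - p1) / (1 - p0))))
    return weights


def score(weights, x):
    return sum(on if bit else off for bit, (on, off) in zip(x, weights))


def roc_auc(scores, labels):
    """Chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_holdout_auc(X, y, n_repeats=100, holdout=0.5, seed=0):
    """Mean test-set ROC AUC over repeated random splits (leave out 50% x 100)."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = int(len(idx) * holdout)
        test, train = idx[:cut], idx[cut:]
        if len({y[i] for i in test}) < 2 or len({y[i] for i in train}) < 2:
            continue  # skip splits that lack one of the two classes
        w = train_bernoulli_nb([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([score(w, X[i]) for i in test], [y[i] for i in test]))
    return sum(aucs) / len(aucs)


# Synthetic stand-in data: 300 "molecules" with 32 random bits each; the label
# ("desirable") depends on the first three bits plus some noise.
data_rng = random.Random(1)
X = [[data_rng.random() < 0.3 for _ in range(32)] for _ in range(300)]
y = [(x[0] + x[1] + x[2] + (data_rng.random() < 0.2)) >= 2 for x in X]

mean_auc = repeated_holdout_auc(X, y)
print(f"mean ROC AUC over repeated 50% hold-outs: {mean_auc:.2f}")
```

Averaging over many random splits gives a steadier estimate than a single split. With real probe data the bit vectors would come from a fingerprinting toolkit and the labels from the expert's desirability scores.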

Page 27: Slas talk 2016

Comparison of Desirability Scores with Bayesian Learning Predicted Scores and Other Metrics

• The colors on the heat map correspond to the value of the indicated metric for each probe, listed vertically.

• The scale was normalized internally with green corresponding to the optimal condition within each metric.

• Data in CDD Public and can be used with

3-fold cross validation ROC = 0.69

J Chem Inf Model. (2014) 10:2996-3004

Page 28: Slas talk 2016

NIH Probes now Added to Approved Drugs Mobile App

http://goo.gl/PVkQeo

Making the data more accessible as we are drowning in molecules

Ligand efficiency higher in undesirable compounds

Bayesian model preferable in classifying desirable compounds vs other molecule quality metrics

Model could improve probe selection, score libraries, prior to more extensive due diligence

Probes could be scored by additional chemists dependent on needs e.g. bias to CNS, anticancer..

J Chem Inf Model. (2014) 10:2996-3004

Page 29: Slas talk 2016

Issues Raised in NIH Probes Search

Complexities in finding the NIH MLP probes in PubChem

Identifier and structure searches in CAS SciFinder™ reveal an extreme disclosure

The parallel worlds of commercial and public database disclosure do not completely intersect

Integration and intersections of databases and the need for bioassay ontology adoption

Public Commercial

J Med Chem. (2015) 5:2068-76

Page 30: Slas talk 2016

The Tragic Case of BIA 10-2474

Page 31: Slas talk 2016

Crowdsourcing BIA 10-2474 Target(s): Predictions and Speculations

Nobody confirmed molecule name / structure used in trial in first few days

Predictions with Polypharma, Bayesian models and SEA (Shoichet lab)

Suggested promiscuity, beyond target of FAAH

Page 32: Slas talk 2016

BIA 10-2474 Metabolite Predictions: Structure Ultimately Was Not the Same

Raises questions on Openness, transparency

Use of software for predictions

Quality and utility of predictive tools

But without information on the structure, it's impossible

Page 33: Slas talk 2016

Making Predictions Open in Real Time

www.collabchem.com http://cheminf20.org/ http://cdsouthan.blogspot.com/

Page 34: Slas talk 2016

Recommendations

Need more collaboration or openness in terms of availability of chemistry and biology data. Role of publishers?

Increased communication between the various databases that are both public and proprietary

Companies need to be more transparent: structure/ID deposition of Phase I clinical trial data globally

Could lead to more opportunity for discovery / repurposing

Chance to profile compounds with computational tools and flag possible issues

Role of ‘armchair science’ and crowd in raising issues is valid

Page 35: Slas talk 2016

Acknowledgements

Alex M. Clark

Antony J. Williams

Christopher Southan

John Chodera and Sonya Hanson

NIH NCATS 9R44TR000942-02 “Biocomputation across distributed private datasets to enhance drug discovery”.

Nadia Litterman

Joe Olechno

Christopher A. Lipinski

Barry A. Bunin

Jeremy Yang for the link to BadApple

Biovia for providing Discovery Studio

Page 36: Slas talk 2016

Extra slides

Page 37: Slas talk 2016

Key Recent References

Modeling error in experimental assays using the bootstrap principle: understanding discrepancies between assays using different dispensing technologies. Hanson SM, Ekins S, Chodera JD. J Comput Aided Mol Des. 2015 Dec;29(12):1073-86.

Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. Clark AM, Ekins S. J Chem Inf Model. 2015 Jun 22;55(6):1246-60.

Parallel worlds of public and commercial bioactive chemistry data. Lipinski CA, Litterman NK, Southan C, Williams AJ, Clark AM, Ekins S. J Med Chem. 2015 Mar 12;58(5):2068-76.

Computational prediction and validation of an expert's evaluation of chemical probes. Litterman NK, Lipinski CA, Bunin BA, Ekins S. J Chem Inf Model. 2014 Oct 27;54(10):2996-3004.

Dispensing processes impact apparent biological activity as determined by computational and statistical analyses. Ekins S, Olechno J, Williams AJ. PLoS One. 2013 May 1;8(5):e62325.

