UNC slides on computational toxicology

Sean Ekins, M.Sc, Ph.D., D.Sc.

Collaborations in Chemistry, Fuquay-Varina, NC.

Collaborative Drug Discovery, Burlingame, CA.

School of Pharmacy, Department of Pharmaceutical Sciences, University of Maryland. 215-687-1320

[email protected]

Computational Models for Predicting Human Toxicities

Outline

• Key enablers

• What has been modeled – a quick review

• What will be modeled

• Future

Why Use Computational Models For Toxicology?

Goal of a model – Alert you to potential toxicity, enable you to focus efforts on best molecules – reduce risk

Selection of model – trade off between interpretability, insights for modifying molecules, speed of calculation and coverage of chemistry space – applicability domain

Models can be built with proprietary, open and commercial tools

software (descriptors + algorithms) + data = model/s

Human operator decides whether a model is acceptable

Key enablers: Hardware is getting smaller

[Figure: computing hardware over time, from room size and desktop size to laptop, netbook, phone and watch; decade labels 1930s, 1980s, 1990s. Not to scale and not equivalent computing power; illustrates mobility.]

Key Enablers: More data available and open tools


What has been modeled

• Physicochemical properties: LogP, logD, solubility, boiling point, melting point

• QSAR for various proteins, complex properties

• Homology models, docking

• Expert systems

• Hybrid methods – combine different approaches

• Mutagenicity (Ames, micronucleus, clastogenicity, and DNA damage, developmental tox.)

• Environmental tox – aquatic, dermatotoxicology

• Mixtures

Physicochemical properties

• Solubility data – 1000s of data points in the literature

• Models' median error ~0.5 log = experimental error

• LogP – tens of 1000s of data points available

• Fragmental or whole-molecule predictors

• Not all logP predictors are equal. Median error ~0.3 log = experimental error

• People now accept solubility and logP predictions as if real

ACD predictions + EpiSuite predictions in www.chemspider.com

• Mobile molecular data sheet

• Links to melting point predictor from open notebook science

• Required curation of data

Simple Rules

• Rule of 5

• Lipinski, Lombardo, Dominy, Feeney Adv. Drug Deliv. Rev. 23: 3-25 (1997).

• AlogP98 vs PSA

• Egan, Merz, Baldwin, J. Med. Chem. 43: 3867-3877 (2000)

• More than ten rotatable bonds correlates with decreased rat oral bioavailability

• Veber, Johnson, Cheng, Smith, Ward, Kopple, J. Med. Chem. 45: 2615-2623 (2002)

• Compounds with ClogP < 3 and total polar surface area > 75 Å² have fewer animal toxicity findings

• Hughes, et al. Bioorg Med Chem Lett 18, 4872-4875 (2008).
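The simple rules listed above can be applied as plain threshold checks once descriptors are available. The sketch below is illustrative only: the `Descriptors` container, the function names and the example values are assumptions made here, and the Rule of 5 cutoffs (molecular weight > 500, logP > 5, H-bond donors > 5, H-bond acceptors > 10) are the commonly quoted ones rather than anything stated on the slide. Descriptor calculation itself is left to whichever tool is in use.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptor values for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight, Da
    logp: float             # calculated logP (e.g. ClogP or AlogP98)
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    tpsa: float             # total polar surface area, square angstroms


def lipinski_violations(d: Descriptors) -> int:
    """Count Rule of 5 violations using the commonly quoted cutoffs."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def simple_rule_flags(d: Descriptors) -> dict:
    """Apply the simple rules from the slide to one molecule."""
    return {
        "rule_of_5_violations": lipinski_violations(d),
        # More than ten rotatable bonds: associated with lower oral bioavailability
        "many_rotatable_bonds": d.rotatable_bonds > 10,
        # ClogP < 3 and TPSA > 75: associated with fewer animal toxicity findings
        "low_clogp_high_tpsa": d.logp < 3 and d.tpsa > 75,
    }


# Made-up descriptor values, purely for illustration
example = Descriptors(612.0, 5.6, 2, 9, 12, 80.0)
flags = simple_rule_flags(example)
print(flags)
```

Flags like these are alerts rather than verdicts; as the earlier slide notes, a human operator still decides whether the result is acceptable.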

MetaPrint2D in Bioclipse – free metabolism site predictor

Uses fingerprint descriptors and a metabolite database to learn frequencies of metabolites in various substructures

L. Carlsson et al., BMC Bioinformatics 2010, 11:362

QSAR for Various Proteins

• Enzymes – predominantly Cytochrome P450s - for drug-drug interactions

• Transporters – predominantly P-gp but some others, e.g. OATP, BCRP

• Receptors – PXR, CAR, for hepatotoxicity

• Ion Channels – predominantly hERG for cardiotoxicity

• Issues – initially small training sets – public data is a fraction of what drug companies have

Pharmacophores

Ideal when we have few molecules for training

In silico database searching

Accelrys Catalyst in Discovery Studio

Geometric arrangement of functional groups necessary for a biological response

• Generate 3D conformations

• Align molecules

• Select features contributing to activity

• Regress hypothesis

• Evaluate with new molecules

• Excluded volumes – relate to inactive molecules

CYP2B6, CYP2C9, CYP2D6, CYP3A4, CYP3A5, CYP3A7, hERG, P-gp, OATPs, OCT1, OCT2, BCRP, hOCTN2, ASBT, hPEPT1, hPEPT2, FXR, LXR, CAR, PXR etc.

hOCTN2 – Organic Cation Transporter Pharmacophore

• High affinity cation/carnitine transporter - expressed in kidney, skeletal muscle, heart, placenta and small intestine

• Inhibition correlation with muscle weakness - rhabdomyolysis

• A common features pharmacophore developed with 7 inhibitors

• Searched a database of over 600 FDA approved drugs - selected drugs for in vitro testing

• 33 tested drugs predicted to map to the pharmacophore, 27 inhibited hOCTN2 in vitro

• Compounds were more likely to cause rhabdomyolysis if the Cmax/Ki ratio was higher than 0.0025

Diao, Ekins, and Polli, Pharm Res, 26, 1890, (2009)

[Figure: hOCTN2 organic cation transporter pharmacophore; panel labels +ve and -ve]

hOCTN2 quantitative pharmacophore and Bayesian model

[Figure: quantitative pharmacophore, r = 0.89, shown with vinblastine, cetirizine and emetine]

Diao et al., Mol Pharm, 7: 2120-2131, 2010

Bayesian model – leaving 50% out 97 times: external ROC 0.90; internal ROC 0.79; concordance 73.4%; specificity 88.2%; sensitivity 64.2%.

Lab test set (N = 27): the Bayesian model gives more correct predictions (> 80%) and fewer false positives and negatives than the pharmacophore (> 70%)

Predictions for the literature test set (N = 32) were not as good as for the in-house set – mean maximum Tanimoto similarity was ~0.6

Diao et al., Mol Pharm, 7: 2120-2131, 2010
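The "mean maximum Tanimoto similarity" quoted for the literature test set measures how close each test molecule is to its nearest neighbour in the training set. A minimal sketch of that calculation follows, assuming fingerprints are represented as Python sets of "on" bit indices; the function names and toy fingerprints are illustrative and are not taken from the paper.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mean_max_similarity(test_fps: list, train_fps: list) -> float:
    """For each test fingerprint, take its highest similarity to any
    training fingerprint, then average those maxima over the test set."""
    maxima = [max(tanimoto(t, tr) for tr in train_fps) for t in test_fps]
    return sum(maxima) / len(maxima)


# Toy fingerprints (sets of bit indices), purely illustrative
train = [{1, 2, 3, 4}, {2, 3, 5}]
test = [{1, 2, 3}, {5, 6}]

print(mean_max_similarity(test, train))
```

A low value suggests the test molecules sit outside the training set's applicability domain, which is one plausible reading of why the literature set was predicted less well.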

PCA used to assess training and test set overlap

hOCTN2 association with rhabdomyolysis

Among the 21 drugs associated with rhabdomyolysis or carnitine deficiency, 14 (66.7%) provided a Cmax/Ki ratio higher than 0.0025.

Among 25 drugs that were not associated with rhabdomyolysis or carnitine deficiency, only 9 (36.0%) showed a Cmax/Ki ratio higher than 0.0025.

Rhabdomyolysis or carnitine deficiency was associated with a Cmax/Ki value above 0.0025 (Pearson’s chi-square test p = 0.0382).

Limitations of Cmax/Ki as a predictor for rhabdomyolysis: Cmax/Ki does not consider the effects of drug tissue distribution or plasma protein binding.
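The reported association can be reproduced from the counts given: 14 of 21 associated drugs and 9 of 25 non-associated drugs exceed the Cmax/Ki threshold. The sketch below computes an uncorrected Pearson chi-square for that 2x2 table using only the standard library; the function name is illustrative, and treating the test as uncorrected (no Yates continuity correction) is an assumption that happens to match the quoted p-value.

```python
import math


def pearson_chi_square_2x2(a: int, b: int, c: int, d: int):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p). With one degree of freedom the upper-tail
    probability of chi2 equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: associated / not associated with rhabdomyolysis or carnitine deficiency
# Columns: Cmax/Ki above 0.0025 / at or below 0.0025
chi2, p_value = pearson_chi_square_2x2(14, 7, 9, 16)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

The same table gives the threshold a sensitivity of 14/21 and a specificity of 16/25, which puts the stated limitations of Cmax/Ki in context.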

Drug induced liver injury (DILI)

• Drug metabolism in the liver can convert some drugs into highly reactive intermediates

• These in turn can adversely affect the structure and functions of the liver

• DILI is the number one reason drugs are not approved – and also the reason some of them were withdrawn from the market after approval

• Estimated global annual incidence rate of DILI is 13.9-24.0 per 100,000 inhabitants – and DILI accounts for an estimated 3-9% of all adverse drug reactions reported to health authorities

• Herbal components can cause DILI too

https://dilin.dcri.duke.edu/for-researchers/info/

• Drug Induced Liver Injury Models

• 74 compounds – classification models (linear discriminant analysis, artificial neural networks, and machine learning algorithms (OneR)) – internal cross-validation (accuracy 84%, sensitivity 78%, and specificity 90%). Testing on 6 and 13 compounds, respectively: > 80% accuracy (Cruz-Monteagudo et al., J Comput Chem 29: 533-549, 2008).

• A second study used binary QSAR (248 active and 283 inactive) support vector machine models – external 5-fold cross-validation procedures and 78% accuracy for a set of 18 compounds (Fourches et al., Chem Res Toxicol 23: 171-183, 2010).

• A third study created a knowledge base with structural alerts from 1266 chemicals – alerts created were used to predict results for 626 Pfizer compounds (sensitivity of 46%, specificity of 73%, and concordance of 56% for the latest version) (Greene et al., Chem Res Toxicol 23: 1215-1222, 2010).
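The three studies above are compared by sensitivity, specificity and concordance (overall accuracy). For reference, a minimal sketch of how those figures follow from paired actual and predicted labels is given below. The function name and the label vectors are hypothetical and do not come from any of the cited datasets.

```python
def classification_metrics(actual: list, predicted: list) -> dict:
    """Sensitivity, specificity and concordance for binary labels
    (1 = DILI-positive, 0 = DILI-negative)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),        # fraction of true positives found
        "specificity": tn / (tn + fp),        # fraction of true negatives found
        "concordance": (tp + tn) / len(pairs),  # overall agreement
    }


# Hypothetical labels, for illustration only
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(actual, predicted)
print(metrics)
```

Reporting all three matters because a model can reach a respectable concordance while missing most positives, as the 46% sensitivity in the third study illustrates.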

• DILI Model - Bayesian

• Laplacian-corrected Bayesian classifier models were generated using Discovery Studio (version 2.5.5; Accelrys).

• Training set = 295, test set = 237 compounds

• Uses two-dimensional descriptors to distinguish between compounds that are DILI-positive and those that are DILI-negative

– ALogP
– ECFC_6
– Apol
– logD
– molecular weight
– number of aromatic rings
– number of hydrogen bond acceptors
– number of hydrogen bond donors
– number of rings
– number of rotatable bonds
– molecular polar surface area
– molecular surface area
– Wiener and Zagreb indices

Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010

Extended connectivity fingerprints
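A Laplacian-corrected Bayesian classifier of the kind named above is often described as follows: each fingerprint feature gets a weight log((A + 1) / (T × P + 1)), where T is the number of training molecules containing the feature, A is how many of those are active, and P is the overall active fraction; a molecule's score is the sum of the weights of its features. The sketch below implements that commonly described formulation. It is not the Discovery Studio implementation, and the feature names and training data are toy values invented for illustration.

```python
import math
from collections import Counter


def train_laplacian_bayes(samples: list) -> dict:
    """samples: list of (feature_set, is_active) pairs.
    Returns a dict mapping each feature to its Laplacian-corrected weight."""
    n_total = len(samples)
    n_active = sum(1 for _, active in samples if active)
    base_rate = n_active / n_total

    feature_total = Counter()
    feature_active = Counter()
    for features, active in samples:
        for f in features:
            feature_total[f] += 1
            if active:
                feature_active[f] += 1

    # Weight > 0: feature enriched in actives; weight < 0: enriched in inactives
    return {
        f: math.log((feature_active[f] + 1) / (feature_total[f] * base_rate + 1))
        for f in feature_total
    }


def score(weights: dict, features: set) -> float:
    """Sum the weights of known features; unseen features contribute nothing."""
    return sum(weights.get(f, 0.0) for f in features)


# Toy training data: (substructure features, DILI-positive?)
training = [
    ({"ketone", "phenol"}, True),
    ({"ketone"}, True),
    ({"amine"}, False),
    ({"amine", "ether"}, False),
]
weights = train_laplacian_bayes(training)
print(score(weights, {"ketone", "phenol"}), score(weights, {"amine"}))
```

The "+1" terms pull rarely seen features toward a weight of zero, so a substructure observed once carries less evidence than one observed many times.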

• DILI Bayesian

[Figure panels: Features in DILI -; Features in DILI +]

Avoid: long aliphatic chains, phenols, ketones, diols, α-methyl styrene, conjugated structures, cyclohexenones, amides

Test set analysis

Ekins, Williams and Xu, Drug Metab Dispos 38: 2302-2308, 2010

• Compounds of most interest – well known hepatotoxic drugs (U.S. Food and Drug Administration Guidance for Industry “Drug-Induced Liver Injury: Premarketing Clinical Evaluation,” 2009), plus their less hepatotoxic comparators, if clinically available.

Fingolimod (Gilenya) for MS (EMEA and FDA)

Paliperidone for schizophrenia

Pirfenidone for Idiopathic pulmonary fibrosis

Roflumilast for pulmonary disease

Predictions for newly approved EMEA compounds

Can we get DILI data for these?

Time dependent inhibition for P450 3A4

• Pfizer generated a large dataset (~2000 compounds) and went through sequential Bayesian model generation and testing cycles

Test set 2: 20 active in 156 compounds. Combined both model predictions

Zientek et al., Chem Res Toxicol 23: 664-676 (2010)

• 3A4 TDI

Indazole ring, the pyrazole, and the methoxy-aminopyridine rings are important for TDI

Approach decreased in vitro screening by 30%

Helps identify reactive metabolite forming compounds

Zientek et al., Chem Res Toxicol 23: 664-676 (2010)

http://www.slideshare.net/ekinssean

Ekins S and Williams AJ, MedChemComm, 1: 325-330, 2010.

Analysis of malaria and TB datasets

Antimalarial Compound libraries and filter failures

Ekins and Williams Drug Disc Today 15; 812-815, 2010

[Figure: bar chart of % failure (axis 0 to 100) against Abbott Alerts, Pfizer Lint Alerts and GSK Alerts for GSK (13,355), St Jude (1524), Novartis (5695), FDA drugs (1041) and antimalarial drugs (14)]

Filtering using SMARTS filters to remove thiol reactives, false positives etc. at University of New Mexico (http://pasilla.health.unm.edu/tomcat/biocomp/smartsfilter)

TB Compound libraries and filter failures

Filtering using SMARTS filters to remove thiol reactives, false positives etc. at University of New Mexico (http://pasilla.health.unm.edu/tomcat/biocomp/smartsfilter)

Ekins et al., Mol Biosyst, 6: 2316-2324, 2010

[Figure: bar chart of % failure (axis 0 to 100) against Abbott Alerts, Pfizer Lint Alerts and GSK Alerts for TB Maddry (90), TB Ananthan (160), TB drugs (13), US antibiotics (163) and FDA drugs (1041)]

Correlations

Correlation between the number of SMARTS filter failures and the number of Lipinski violations for different types of rule sets with the FDA drug set from CDD (N = 2804)

Suggests the number of Lipinski violations may also be an indicator of undesirable chemical features that result in reactivity

Ekins and Freundlich, Pharm Res, 28, 1859-1869, 2011.

Could all pharmas share their data as models with each other?

Increasing Data & Model Access

Ekins and Williams, Lab On A Chip, 10: 13-22, 2010.

The big idea

Challenge: there is limited access to ADME/Tox data and models needed for R&D

How could a company share data but keep the structures proprietary?

Sharing models means both parties use costly software

What about open source tools? Pfizer had never considered this, so we proposed a study and Rishi Gupta generated models

Pfizer Open models and descriptors

Gupta RR, et al., Drug Metab Dispos, 38: 2083-2090, 2010

• What can be developed with very large training and test sets?

• HLM training 50,000 testing 25,000 molecules

• training 194,000 and testing 39,000

• MDCK training 25,000 testing 25,000

• MDR training 25,000 testing 18,400

• Open molecular descriptors / models vs commercial descriptors

• Examples – Metabolic Stability


HLM models with SMARTS Keys: CDK vs MOE2D descriptors

                                      CDK + SMARTS Keys   MOE2D + SMARTS Keys
# Descriptors                         578                 818
# Training set compounds              193,650             193,930
Cross validation results (compounds)  38,730              38,786
Training R2                           0.79                0.77
20% test set R2                       0.69                0.69
Blind data set (2310 compounds) R2    0.53                0.53
Blind data set RMSE                   0.367               0.367
Continuous Categorical: κ             0.40                0.42
Sensitivity                           0.16                0.24
Specificity                           0.99                0.987
PPV                                   0.80                0.823
Time (sec/compound)                   0.252               0.303

PCA of training (red) and test (blue) compounds

Overlap in Chemistry space
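The HLM comparison reports R2 and RMSE for the continuous predictions and κ and PPV once predictions are binned into categories. The sketch below shows one way to compute each. Two assumptions are made here: R2 is taken as the coefficient of determination (the slides do not say whether it is that or a squared correlation), and κ is Cohen's kappa for a 2x2 table. All numbers in the example are made up.

```python
import math


def r_squared(observed: list, predicted: list) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed: list, predicted: list) -> float:
    """Root mean squared error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Agreement beyond chance for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    chance_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: fraction of predicted positives that are real."""
    return tp / (tp + fp)


# Made-up values, for illustration only
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(obs, pred), rmse(obs, pred))
print(cohen_kappa(tp=40, fp=10, fn=20, tn=30), ppv(tp=40, fp=10))
```

Kappa is useful alongside sensitivity and specificity when classes are imbalanced, since high specificity with low sensitivity (as in the HLM table) can otherwise look better than it is.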

• Examples – P-gp


Open source descriptors CDK and C5.0 algorithm

~60,000 molecules with P-gp efflux data from Pfizer

Training set: MDR < 2.5 (low risk) (N = 14,175), MDR > 2.5 (high risk) (N = 10,820)

Test set: MDR < 2.5 (N = 10,441), MDR > 2.5 (N = 7972)

Could facilitate model sharing?

              CDK + fragment descriptors   MOE 2D + fragment descriptors
Kappa         0.65                         0.67
Sensitivity   0.86                         0.86
Specificity   0.78                         0.8
PPV           0.84                         0.84

Model coverage of chemistry space

Combining models may give greater coverage of ADME/Tox chemistry space and improve predictions?

[Figure: model coverage by company: Lundbeck, Pfizer, Merck, GSK, Novartis, Lilly, BMS, Allergan, Bayer, AZ, Roche, BI, Merck KGaA]

Next steps

ADME/Tox data crosses diseases

Potential to share models selectively with collaborators, e.g. academics, neglected disease researchers

We used the proof of concept to submit an SBIR: “Biocomputation across distributed private datasets to enhance drug discovery”

Develop prototype for sharing models securely – collaborate to show how combining data for TB etc. could improve models

Phase II – develop a commercial product that leverages CDD

Engage Pistoia Alliance to expand concept to many companies – in progress

Future: What will be modeled

• Mitochondrial toxicity, hepatotoxicity

• More transporters – MATE, OATPs, BSEP... bigger datasets – driven by academia

• Screening centers – more data – more models

• Understanding differences between ligands for nuclear receptors – CAR vs PXR

• Models will become replacements for data as datasets expand (e.g. like logP)

• Toxicity Models used for Green Chemistry

Chem Rev. 2010 Oct 13;110(10):5845-82

What You Might Not Know About Chemistry Databases On The Internet

• Data-sharing between open databases is cyclic

• This can proliferate errors in the “Linked Data”

Government Databases Should Come With a Health Warning

Openness Can Bring Serious Quality Issues

NPC Browser http://tripod.nih.gov/npc/

Database released and within days 100’s of errors found in structures

Williams and Ekins, DDT, 16: 747-750 (2011)

Science Translational Medicine 2011

Mobile Apps for Drug Discovery

• Make science more accessible => communication

• Mobile – take a phone into the field or lab and do science more readily than on a laptop

• Green – energy efficient computing

• MolSync + DropBox + MMDS = share molecules as SDF files on the cloud = collaborate

Williams et al DDT 16:928-939, 2011

Acknowledgments

• University of Maryland
– Lei Diao
– James E. Polli

• Pfizer
– Rishi Gupta
– Eric Gifford
– Ted Liston
– Chris Waller

• Merck
– Jim Xu

• Antony J. Williams (RSC)

• Accelrys

• CDD

• Email: [email protected]

Slideshare: http://www.slideshare.net/ekinssean

Twitter: collabchem

Blog: http://www.collabchem.com/

Website: http://www.collaborations.com/CHEMISTRY.HTM

Date post:	28-Jan-2015
Category:	Health & Medicine
Upload:	sean-ekins