In Silico Prediction of DILI - Extraction of Histopathology Data from Preclinical Toxicity Studies of the eTOX Database for new In Silico Models of Hepatotoxicity
Alexander Amberg1, Lennart T. Anger1, Manuela Stolte1, Jennifer Hemmerich1, Hans Matter2, Lilia Fisk3, Inga Tluczkiewicz4, Kevin Pinto-Gil5, Oriol López-Massaguer5, Manuel Pastor5
1Sanofi, Preclinical Safety, Frankfurt, Germany; 2Sanofi, Integrated Drug Discovery, Frankfurt, Germany; 3Lhasa Limited, Leeds, United Kingdom; 4Fraunhofer ITEM, Hannover, Germany; 5IMIM, GRIB, Barcelona, Spain
IN SILICO MODEL DEVELOPMENT AND RESULTS
CONCLUSION
The eTOX consortium extracted in vivo data from unpublished preclinical toxicity studies of 13 EFPIA partners. This new database contains high-quality, highly detailed toxicity results for 1,947 drug candidates (8,196 studies), supplemented with 1,286 chemicals from the RepDose database (2,695 studies). Several compilation steps were applied to transform these data into usable in silico model training datasets. First, all toxicity findings were extracted from the study reports (paper/PDF). Then, the verbatim terms for all treatment-related hepatotoxicity findings were harmonized using special ontologies. Finally, to obtain model training sets with sufficient compound numbers and chemical space coverage, all primary histopathology terms were combined and grouped into 1st and then 2nd level clusters of similar toxicity mechanisms: e.g. primary necrosis terms such as “centrilobular”, “periportal” etc. were grouped into the 1st level cluster “necrosis”, and clusters such as “necrosis”, “vacuolization” etc. were then grouped into the 2nd level cluster “degenerative lesions”. With this approach, various training datasets were compiled depending on species (rat, dog and monkey), treatment duration (2 weeks - 2 years) and administration route. Different modeling approaches were then applied to these datasets, including structural alerts as well as fragment-based and molecular descriptor-based machine learning approaches (e.g. random forest, decision tree, k-nearest neighbor). Models were validated and optimized, first by internal validation (test set 10%) and then by external validation using Sanofi’s confidential data. For example, the best external validation results (n=66) were achieved for the 1st level cluster rat necrosis models (229 positives, 198 negatives) using a fragment-based approach (Sensitivity: 0.80, Specificity: 0.77) and a molecular descriptor-based decision tree approach (Sensitivity: 0.81, Specificity: 0.88).
These validation results show that, with reasonable clustering of the histopathology data from eTOX, it is possible to develop highly predictive in silico models for drug-induced liver injury (DILI).
METHODS
Structural Alerts – Validation and Improvement of existing and Identification of new alerts
eTOX Training Dataset
Compilation steps to transform the eTOX in vivo data into usable in silico model training datasets
1) Data extraction of treatment-/compound-related hepatotoxicity findings from study reports
#2118/P455
ABSTRACT
INTRODUCTION
A) EFPIA preclinical toxicity studies (paper/PDF)
• Reports/studies: 8,196
• Compounds: 1,947
B) RepDose DB (http://fraunhofer-repdose.de/)
• Reports/studies: 2,695
• Compounds: 1,286
Work performed at an eTOX hackathon (“Hack Marathon”) of toxicologists, pathologists, in silico modelers and data managers
2) Harmonization of the verbatim terms from study reports using special ontologies and combination of the data into different model training datasets
Rat • Histopathology • Clinical chemistry • Liver weight
Dog • Histopathology • Clinical chemistry • Liver weight
Monkey • Histopathology • Clinical chemistry • Liver weight
No. of compounds

Treatment duration             Rat      Rat              Dog      Dog              Monkey   Monkey
                               EFPIA    RepDose          EFPIA    RepDose          EFPIA    RepDose
                               (oral)   (gavage, feed)   (oral)   (gavage, feed)   (oral)   (gavage, feed)
≤ 28 days (~ 1 month)          889      284              380      -                62       -
28 – 120 days (~ 3 months)     114      562              96       91               16       -
≥ 120 days (~ 6-24 months)     87       382              98       189              18       -
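The assignment of studies to these duration bins can be illustrated with a short Python sketch. It is illustrative only: the record layout and the example studies are hypothetical, and because the stated bin boundaries overlap at 28 and 120 days, the sketch assumes that exactly 28 days falls into the first bin and exactly 120 days into the last.

```python
from collections import defaultdict

def duration_bin(days):
    """Map a treatment duration in days to one of the three bins.

    Assumption: 28 days belongs to the first bin, 120 days to the last.
    """
    if days <= 28:
        return "<= 28 days"
    if days < 120:
        return "28 - 120 days"
    return ">= 120 days"

def count_compounds(studies):
    """Count distinct compounds per (species, source, duration bin)."""
    groups = defaultdict(set)
    for compound, species, source, days in studies:
        groups[(species, source, duration_bin(days))].add(compound)
    return {key: len(ids) for key, ids in groups.items()}

# Hypothetical study records: (compound ID, species, source, duration in days)
studies = [
    ("cpd-1", "rat", "EFPIA", 14),
    ("cpd-1", "rat", "EFPIA", 28),
    ("cpd-2", "rat", "EFPIA", 91),
    ("cpd-3", "dog", "RepDose", 365),
]
counts = count_compounds(studies)
```

Counting distinct compounds rather than studies matches the table header, since one compound can appear in several studies of the same bin.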
Data matrix available for all of these training sets
primary terms with at least one treatment-related finding (values = LOEL)
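Example data matrix histopathology
Columns: Substance ID | min dose | max dose | accumulation lipid | bile duct hyperplasia | congestion | hyperplasia | hypertrophy | hypertrophy kupffer cells | inflammation | necrosis centrilobular | necrosis periportal | single cell necrosis
Compound 1: min dose 500, max dose 1500
Compound 2: min dose 50, max dose 1000; LOEL values 50, 250, 250
Compound 3: min dose 0.69, max dose 13.8; LOEL value 0.69
Compound 4: min dose 100, max dose 1750; LOEL values 750, 100
Compound 5: min dose 38, max dose 510
Compound 6: min dose 50, max dose 2000; LOEL values 250, 1000
Compound 7: min dose 60, max dose 500; LOEL value 60
Compound 8: min dose 100, max dose 2000; LOEL values 2000, 100
Compound 9: min dose 5, max dose 651
Compound 10: min dose 1, max dose 100; LOEL values 5, 100
No. of positives (all individual compounds): accumulation lipid 3 | bile duct hyperplasia 35 | congestion 7 | hyperplasia 47 | hypertrophy 111 | hypertrophy kupffer cells 6 | inflammation 40 | necrosis centrilobular 6 | necrosis periportal 8 | single cell necrosis 33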
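How such a LOEL matrix might be derived can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the eTOX implementation: the input layout (per compound and primary term, the doses at which a treatment-related finding was reported) and all values are hypothetical.

```python
def loel(doses_with_finding):
    """Lowest observed effect level: the lowest dose with a finding, or None."""
    return min(doses_with_finding) if doses_with_finding else None

def build_matrix(findings):
    """Turn {compound: {term: [doses with finding]}} into {compound: {term: LOEL}}."""
    return {
        compound: {term: loel(doses) for term, doses in terms.items()}
        for compound, terms in findings.items()
    }

def positives_per_term(matrix):
    """Count, per primary term, the compounds that have a LOEL value."""
    counts = {}
    for row in matrix.values():
        for term, value in row.items():
            if value is not None:
                counts[term] = counts.get(term, 0) + 1
    return counts

# Hypothetical findings (doses in mg/kg/day)
findings = {
    "Compound A": {"necrosis centrilobular": [250, 1000], "hypertrophy": [50, 250, 1000]},
    "Compound B": {"hypertrophy": [100], "congestion": []},
    "Compound C": {},
}
matrix = build_matrix(findings)
positives = positives_per_term(matrix)
```

An empty cell in the matrix corresponds to `None` here, so a compound counts as positive for a term only if at least one dose showed the finding.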
Grouping of primary terms into 1st and 2nd level clusters (Positive cpds. = number of positive compounds)

1st level cluster: hypertrophy
  Primary terms - preferred ontology (142 positive cpds.): hypertrophy, epithelial | hypertrophy, hepatocyte | hypertrophy
  Primary terms (380 positive cpds.): cell enlargement | enlargement | hypertrophy | peroxisome proliferation | swelling

1st level cluster: vacuolation; 2nd level cluster: degenerative lesions
  Primary terms - preferred ontology (140 positive cpds.): intracellular vacuolation | vacuolation, biliary epithelium | vacuolation, epithelial
  Primary terms (134 positive cpds.): vacuolization

1st level cluster: steatosis; 2nd level cluster: degenerative lesions
  Primary terms - preferred ontology (131 positive cpds.): accumulation, lipid | increased, lipid content | intracellular increase of lipids | vacuolation, lipidic (fatty change) | vacuolation, lipidic
  Primary terms (114 positive cpds.): fatty degeneration | lipidosis

1st level cluster: necrosis; 2nd level cluster: degenerative lesions
  Primary terms - preferred ontology (126 positive cpds.): necrosis, fibrinoid | necrosis, focal/multifocal | necrosis, hepato-cellular | necrosis, centrilobular | necrosis, midzonal | necrosis, periportal | necrosis, zonal | single cell necrosis …
  Primary terms (165 positive cpds.): apoptosis | necrosis

1st level cluster: inflammation; 2nd level cluster: inflammatory changes
  Primary terms - preferred ontology (124 positive cpds.): abscess(es) | chronic inflammatory/proliferative/metaplastic changes | inflammation, granulomatous | inflammatory processes …
  Primary terms (67 positive cpds.): inflammation

1st level cluster: infiltration; 2nd level cluster: inflammatory changes
  Primary terms - preferred ontology (94 positive cpds.): inflammatory cell infiltration | granuloma | histiocytic inflammatory cell infiltrate | histiocytic proliferation | increased, histiocyte number | increased plasma cell number ...
  Primary terms (78 positive cpds.): granulocytes | granuloma | histiocytosis | infiltration | leukocytes | macrophages | polynuclear cells
RepDose: histopathology findings
EFPIA legacy reports: histopathology findings
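The two-step grouping of primary terms can be expressed as two lookup tables. The Python sketch below uses a small subset of the terms and clusters listed above; the function name and the handling of unmapped terms (they are ignored) are assumptions made for illustration.

```python
# Primary term -> 1st level cluster (subset of the terms listed above)
FIRST_LEVEL = {
    "necrosis, centrilobular": "necrosis",
    "necrosis, periportal": "necrosis",
    "single cell necrosis": "necrosis",
    "accumulation, lipid": "steatosis",
    "vacuolation, lipidic": "steatosis",
    "intracellular vacuolation": "vacuolation",
    "inflammation, granulomatous": "inflammation",
    "inflammatory cell infiltration": "infiltration",
}

# 1st level cluster -> 2nd level cluster
SECOND_LEVEL = {
    "necrosis": "degenerative lesions",
    "steatosis": "degenerative lesions",
    "vacuolation": "degenerative lesions",
    "inflammation": "inflammatory changes",
    "infiltration": "inflammatory changes",
}

def cluster_findings(primary_terms):
    """Return the 1st and 2nd level clusters hit by a compound's primary terms.

    Assumption: terms without a mapping are ignored.
    """
    first = {FIRST_LEVEL[t] for t in primary_terms if t in FIRST_LEVEL}
    second = {SECOND_LEVEL[c] for c in first if c in SECOND_LEVEL}
    return first, second

first, second = cluster_findings(
    ["necrosis, periportal", "inflammatory cell infiltration", "unmapped term"]
)
```

A compound becomes a positive for a cluster as soon as any of its primary terms maps into that cluster, which is what increases the number of positives per training set.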
• Many compilation steps had to be applied to transform the in vivo hepatotoxicity data from the eTOX database into usable model training datasets
  • Data extraction of treatment-related findings from unpublished preclinical toxicity study reports
  • Harmonization of the verbatim terms using special ontologies and combination of the data
  • Grouping of primary histopathology terms into 1st and 2nd level clusters of similar findings (and mechanisms) to obtain training data with sufficient compound numbers and chemical space coverage
• Various in silico models were developed from these training datasets using approaches such as
  • Statistical structural fragment / fingerprint-based models
  • Molecular descriptor-based machine learning models (QSAR), such as Decision Tree (DT), Partial Least Squares (PLS), Random Forest (RF), k-Nearest Neighbor (kNN) etc.
  • Systems biology models
  • Structural alerts
• Internal & external validation showed sensitivities up to 81% and specificities up to 88%
• A case study on tienilic acid demonstrated how these in silico models can be used for the prediction of DILI and for elucidation of potential mechanisms of hepatotoxicity
• Mulliner et al. models [7] prediction results
  • Leadscope & SVM prediction summary
• eTOX models prediction results
  • Leadscope & C5 DT summary
structural feature contribution
Tienilic acid: withdrawn from the market due to idiosyncratic, autoimmune-mediated hepatotoxicity in patients, characterized by covalent CYP2C9 binding (antibodies against the acylated CYP2C9 hapten were found in patients) [6]
Structural alert: Thiophene
c) Metabolic CYP soft spot analysis (MetaSite)
• Drug-Induced Liver Injury (DILI)
  • still of great concern for patient safety and a major cause of drug candidate attrition and drug withdrawal from the market
• Data about liver toxicity
  • scattered and available from many sources: public, proprietary, consortia (eTOX), commercial eSafety in vitro, preclinical in vivo, clinical, post-market
• Chen et al. [1] reviewed several in silico models for human DILI (since 2012)
  • Models trained with 74 - 1087 compounds from different data sources using various modeling approaches
  • Performance: higher accuracy for small datasets: 70-84% (13-53 compounds); lower accuracy for larger datasets: 60-75% (73-1087 compounds); high specificity (90-95%), but poorer sensitivity (~50%)
Conclusion
• Sufficiently harmonized, highly detailed hepatotoxicity datasets from all the different sources are missing
• Consequently, satisfactory in silico models for generating reliable predictions of hepatotoxicity are lacking
using these harmonized, detailed training data
• In the eTOX consortium framework, in vivo data were extracted from unpublished preclinical toxicity studies of 13 EFPIA partners from the pharmaceutical industry [2]
eTOX database: high-quality preclinical toxicity results in high detail level
Consortium (www.e-tox.net)
• Objectives
  A) Setup of the eTOX database and pre-competitive data sharing
    • Internal EFPIA toxicity reports plus public toxicity data
    • Using standardized ontologies and database schema
  B) Development of new prediction models
    • Use of eTOX data as model training data
• Participants
  • Pharma (13), Academia (11), SME (6)
The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement nº 115002 (eTOX), resources of which are composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in kind contributions
• Validation of alerts using different datasets [2, 7]
  • drugs, (pre)clinical, chemicals, individual species (rat, dog, monkey)
Validation results: example eTOX drugs (% correct positive predictions)
[1] Chen M, Bisgin H, Tong L, Hong H, Fang H, Borlak J, Tong W “Towards predictive models for drug-induced liver injury in humans: are we there yet?” Biomarkers Med. 8(2), 201-213, 2014
[2] Sanz F, Pognan F, Steger-Hartmann T, Díaz C “Legacy data sharing to improve drug safety assessment: the eTOX project” Nat. Rev. Drug Discov. 16(12), 811-812, 2017
[3] Carbonell P, Lopez O, Amberg A, Pastor M, Sanz F “Hepatotoxicity Prediction by Systems Biology Modeling of Disturbed Metabolic Pathways using Gene Expression Data” ALTEX 34(2), 219-234, 2017
[4] López-Massaguer O, Pinto-Gil K, Sanz F, Amberg A, Anger LT, Stolte M, Ravagli C, Marc P, Pastor M “Generating modelling data from repeat-dose toxicity reports” Toxicol. Sci., 1–14, 2017
[5] Enoch SJ, Ellison CM, Schultz TW, Cronin MT “A review of the electrophilic reaction chemistry involved in covalent protein binding relevant to toxicity” Crit. Rev. Toxicol. 41(9), 783-802, 2011
[6] Kalgutkar AS, Gardner I, Obach RS, Shaffer CL, Callegari E, Henne KR et al. “A comprehensive listing of bioactivation pathways of organic functional groups” Curr. Drug Metab. 6(3), 161-225, 2005
[7] Mulliner D, Schmidt F, Stolte M, Spirkl HP, Czich A, Amberg A “Computational Models for Human and Animal Hepatotoxicity with a Global Application Scope” Chem. Res. Toxicol. 29(5), 757-767, 2016
Quinoline (Derek alert 685): PP = 78% (14/18 positive)
Thiophene (Kalgutkar): PP = 37% (15/41 positive)
Phenoxyacetic acid (Derek alert 690): PP = 53% (10/19 positive)
Aromatic sulphonamide (Kalgutkar): PP = 9.5% (2/21 positive)
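The PP values (% correct positive predictions) follow directly from the counts in parentheses: the number of positive compounds that fire an alert divided by all compounds that fire it. A minimal Python sketch using the four counts listed above (the function and variable names are illustrative):

```python
def positive_predictivity(positives, total):
    """Percentage of alert-firing compounds that are positive for hepatotoxicity."""
    return 100.0 * positives / total

# (positives, total) per alert, taken from the validation results above
alerts = {
    "Quinoline": (14, 18),
    "Thiophene": (15, 41),
    "Phenoxyacetic acid": (10, 19),
    "Aromatic sulphonamide": (2, 21),
}

pp = {name: positive_predictivity(pos, total) for name, (pos, total) in alerts.items()}
```

Alerts with a low PP on drug-like compounds, such as the aromatic sulphonamide alert here, are candidates for refinement rather than direct use.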
LOEL boxplot summary
Quinoline
Alert source: Derek Nexus 5.0.2, Kalgutkar [6], Enoch/Cronin [5]
• Translational assessment
  • preclinical vs. clinical predictivity
Positives/total: (6/21), (31/34), (12/17), (5/18)
Statistical structural fragment / fingerprint-based models
• Model approach (www.leadscope.com)
  • Use of predefined fragments for statistical correlation to toxicological activity from the training dataset
• Model output
  • Predicted probability (0-100%) for model endpoint
  • Information on whether the compound is in / out of the domain of the training data
  • Individual structural fragment analysis
  • Link to the respective training data
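The general idea of correlating predefined fragments with activity can be illustrated with a toy Python sketch. It is explicitly not the Leadscope algorithm: the scoring rule (mean positive rate of the matched fragments), the out-of-domain rule (no known fragment matched) and the fragment names are all simplifying assumptions.

```python
from collections import defaultdict

def fit_fragment_rates(training):
    """Compute, per fragment, the fraction of training compounds that are positive.

    training: list of (set of fragment names, is_positive) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # fragment -> [positives, total]
    for fragments, is_positive in training:
        for fragment in fragments:
            counts[fragment][1] += 1
            if is_positive:
                counts[fragment][0] += 1
    return {frag: pos / total for frag, (pos, total) in counts.items()}

def predict(fragment_rates, fragments):
    """Return a probability in percent, or None if the compound is out of domain.

    Assumption: a compound is out of domain if none of its fragments
    occurs in the training data.
    """
    known = [fragment_rates[f] for f in fragments if f in fragment_rates]
    if not known:
        return None
    return 100.0 * sum(known) / len(known)

# Hypothetical training data
training = [
    ({"thiophene", "carboxylic acid"}, True),
    ({"thiophene"}, False),
    ({"pyrazole"}, True),
]
rates = fit_fragment_rates(training)
probability = predict(rates, {"thiophene", "pyrazole"})
out_of_domain = predict(rates, {"quinoline"})
```

Keeping the per-fragment rates also gives the individual fragment analysis mentioned above, since each matched fragment's contribution can be reported alongside the overall score.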
QSAR - Molecular descriptor-based machine learning models
• QSAR Modeling approach A
  • Cubist C5.0 Decision Tree model
• Molecular descriptors
  • MOE, CATS, Crippen packages
• Model output
  • Classification model (50-60 rules): positive / negative for respective model endpoint
• Validation results
  • External validation using Sanofi’s confidential data (n=66)
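Sensitivity and specificity, the measures reported for the external validation, are computed from the confusion matrix of predicted versus observed labels. The Python sketch below shows the calculation; the example labels are hypothetical and do not reproduce the confidential validation set.

```python
def classification_metrics(y_true, y_pred):
    """Compute sensitivity, specificity and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else None,  # true negative rate
        "accuracy": (tp + tn) / len(pairs) if pairs else None,
    }

# Hypothetical observed and predicted labels
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In practice, compounds predicted as out of domain would be excluded before these measures are calculated, which is why the number of evaluated compounds can be smaller than the validation set.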
Systems biology models
Model training data 77 overlapping compounds extracted from
Modeling approach illustrated with statins as an example
• Validation results
  • Internal validation (test dataset: 4% of original dataset)
80.2 79.2 81.3 77.7 73.3 80.6 78.5 78.8 78.1 78.8 80.3 76.8
2nd level cluster
1st level cluster
74.3 72.4 76.9 79.8 75.0 82.8 78.1 75.0 79.8 79.4 78.1 80.3 80.6 81.7 79.3 78.4 78.0 79.2
78.0 80.0 77.0
out of domain
predictions 20 29
85.0 81.0 88.0
• Results [3]
  • Most frequently disturbed metabolic pathways: mitochondrial β-oxidation of fatty acids and amino acid metabolism
  • ROC AUC up to 0.9 for some hepatotoxicity endpoints
• Quantitative assessment
  • use of LOEL of finding, box plot analysis
• Implementation of mechanistic information
  • identification of responsible reactive metabolite
  • likelihood of reactive metabolite formation by analysis of metabolic soft spots using MetaSite (www.moldiscovery.com/software/metasite)
• Identification of new structural alerts
  • Fragment analysis by Leadscope
  • Fragment analysis by SARpy (www.vegahub.eu/portfolio-item/sarpy)
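The quantitative assessment of an alert rests on the distribution of LOEL values of the compounds that fire it. A minimal Python sketch of the five-number summary behind a box plot is shown below; the LOEL values are hypothetical, and the quartile method is the default of Python's `statistics.quantiles`, which may differ from the method used by a given plotting tool.

```python
import statistics

def loel_summary(loels):
    """Five-number summary of LOEL values (mg/kg/day), as drawn in a box plot.

    Requires at least two values.
    """
    q1, median, q3 = statistics.quantiles(loels, n=4)
    return {
        "n": len(loels),
        "min": min(loels),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(loels),
    }

# Hypothetical LOEL values for compounds sharing one structural alert
summary = loel_summary([10, 50, 100, 250, 1000])
```

A low median LOEL for an alert suggests that findings occur at low doses, which is more relevant for risk assessment than findings that only appear near the top dose.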
• Molecular descriptors
  • Adriana, GRIND2 descriptors using AdrianaCode and Pentacle software
• Model output
  • Qualitative and quantitative scores
• Validation results [4]
  • Qualitative models DEG, INF, PRO
CASE STUDY
a) Quantitative assessment
External validation results: rat liver necrosis (NEC)
• QSAR Modeling approach B
  • PLS-R/PLS-DA (Partial Least Squares Regression/Discriminant Analysis), RF-R/RF-C (Random Forest Regressor/Classifier)
• Validation results [4]
  • Quantitative model PRO
• Aim: increase mechanistic understanding: pathway, cell, tissue, organ level
Endpoint                          Number of points   50 mg/kg   100 mg/kg   500 mg/kg
Hepatotoxicity                    14                 0.89       0.89        0.54
1. Clinical chemistry findings    11                 0.92       0.92        0.83
1.1 Hepatobiliary injury          3                  1          1           0.5
1.2 Hepatocellular injury         10                 0.90       0.90        0.81
2. Morphological findings         8                  1.0        1.0         -
2.1 Hepatobiliary injury          1                  -          -           -
2.2 Hepatocellular injury         8                  1.0        1.0         -
Bioactivation of Thiophene [6]
Pyrazole PP = 63% (10/16 positives)
Trifluoromethylbenzene PP = 61% (40/61 positives)
Acetamidobenzene PP = 51% (26/51 positives)
Halogenated benzylamine PP = 73% (24/33 positives)
in total 78 primary terms
b) Metabolic activation / toxification step
SME = Small and Medium-sized Enterprises
[Figure: Dose [mg/kg/day] (axis 0-200) by species: Dog, Monkey, Rat; ratios shown: 12/17, 1/4, 1/2]
[Figure: LOEL and tested doses for compounds 1-18; Dose [mg/kg/day] on an axis from 1 to 1,000; markers for Dog, Monkey, Rat and Median]
3) Grouping of histopathology terms - to cluster of similar findings (and mechanism)
[Figure: Dose [mg/kg/day] (axis 0-600) by species: Dog, Rat; ratios shown: 5/10, 2/3]
Training datasets
• Histopathology data from oral rat studies of up to 2 years used for in silico models

In silico model                         Abbr.   Cluster level   No. training compounds   Positives 1)   Negatives 2)
Necrosis                                NEC     1st             427                      229            198
Steatosis                               STE     1st             406                      208            198
Inflammation                            IFM     1st             322                      124            198
Infiltration                            IFT     1st             302                      104            198
Proliferation                           PLF     1st             345                      147            198
Hyperplasia                             HYP     1st             439                      241            198
Hypertrophy                             HYT     1st             652                      455            197
Degenerative lesions                    DEG     2nd             602                      355            247
Inflammatory changes                    INF     2nd             412                      165            247
Non-neoplastic proliferative changes    PRO     2nd             558                      311            247

1) Positives: compounds with findings in the respective endpoint
2) Negatives: compounds without liver findings in histopathology, clinical chemistry or liver weight, tested up to 1000 mg/kg
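The definitions of positives and negatives can be turned into a simple labeling rule. The Python sketch below is one possible reading under stated assumptions: the record fields are hypothetical, "tested up to 1000 mg/kg" is interpreted as a maximum tested dose of at least 1000 mg/kg, and compounds that have other liver findings but none in the modeled endpoint are left out of the training set.

```python
def label_compound(record, endpoint):
    """Return 'positive', 'negative' or None (excluded) for one model endpoint."""
    if endpoint in record["histopathology_clusters"]:
        return "positive"
    has_liver_findings = bool(
        record["histopathology_clusters"]
        or record["clinical_chemistry_findings"]
        or record["liver_weight_findings"]
    )
    # Assumption: a clean compound only counts as negative if it was dosed high enough.
    if not has_liver_findings and record["max_dose_mg_per_kg"] >= 1000:
        return "negative"
    return None

# Hypothetical compound records
with_necrosis = {
    "histopathology_clusters": {"necrosis"},
    "clinical_chemistry_findings": ["ALT increased"],
    "liver_weight_findings": [],
    "max_dose_mg_per_kg": 300,
}
clean_high_dose = {
    "histopathology_clusters": set(),
    "clinical_chemistry_findings": [],
    "liver_weight_findings": [],
    "max_dose_mg_per_kg": 1000,
}
clean_low_dose = dict(clean_high_dose, max_dose_mg_per_kg=100)

labels = [
    label_compound(with_necrosis, "necrosis"),
    label_compound(with_necrosis, "steatosis"),
    label_compound(clean_high_dose, "necrosis"),
    label_compound(clean_low_dose, "necrosis"),
]
```

Under this reading the set of negatives is the same for every endpoint, which is consistent with the near-constant negative counts in the table.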
Hepatotoxicity: eTOX (preclinical)
2nd level cluster: Degenerative lesions
  1st level cluster: Necrosis, Steatosis, …
2nd level cluster: Inflammatory changes
  1st level cluster: Inflammation, Infiltration, …
2nd level cluster: Non-neoplastic proliferative changes
  1st level cluster: Proliferation, Hyperplasia, …
1st level cluster: Hypertrophy