Date post: 26-Mar-2020

In Silico Prediction of DILI - Extraction of Histopathology Data from Preclinical Toxicity Studies of the eTOX Database for new In Silico Models of Hepatotoxicity

Alexander Amberg1, Lennart T. Anger1, Manuela Stolte1, Jennifer Hemmerich1, Hans Matter2, Lilia Fisk3, Inga Tluczkiewicz4, Kevin Pinto-Gil5, Oriol López-Massaguer5, Manuel Pastor5

1 Sanofi, Preclinical Safety, Frankfurt, Germany; 2 Sanofi, Integrated Drug Discovery, Frankfurt, Germany; 3 Lhasa Limited, Leeds, United Kingdom; 4 Fraunhofer ITEM, Hannover, Germany; 5 IMIM, GRIB, Barcelona, Spain

IN SILICO MODEL DEVELOPMENT AND RESULTS

CONCLUSION

The eTOX consortium extracted in vivo data from unpublished preclinical toxicity studies of 13 EFPIA partners. The resulting database contains high-quality, highly detailed toxicity results for 1,947 drug candidates (8,196 studies), supplemented with 1,286 chemicals from the RepDose database (2,695 studies). Several compilation steps were applied to transform these data into usable in silico model training datasets. First, all toxicity findings were extracted from the study reports (paper/PDF). Next, the verbatim terms for all treatment-related hepatotoxicity findings were harmonized using dedicated ontologies. Finally, all primary histopathology terms were combined and grouped into 1st level and then 2nd level clusters of similar toxicity mechanisms. This produced training sets with sufficient compound numbers and chemical space coverage. For example, primary necrosis terms such as “centrilobular” and “periportal” were grouped into the 1st level cluster “necrosis”. Clusters such as “necrosis” and “vacuolization” were then grouped into the 2nd level cluster “degenerative lesions”. With this approach, various training datasets were compiled for different species (rat, dog and monkey), treatment durations (2 weeks to 2 years) and administration routes. Different modeling approaches were then applied to these datasets, including structural alerts, fragment-based approaches and molecular descriptor-based machine learning approaches (e.g. random forest, decision tree, k-nearest neighbor). The models were validated and optimized, first by internal validation (10% test set) and then by external validation using Sanofi’s confidential data. For example, the best external validation results (n=66) were achieved for the 1st level cluster rat necrosis models (229 positives, 198 negatives). These used a fragment-based approach (sensitivity: 0.80, specificity: 0.77) and a molecular descriptor-based decision tree approach (sensitivity: 0.81, specificity: 0.88). These validation results show that, by sensibly clustering the histopathology data from eTOX, it is possible to develop highly predictive in silico models for drug-induced liver injury (DILI).
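The abstract mentions internal validation on a 10% test set. The sketch below shows one way to hold out 10% of the compounds in Python. It assumes the split is stratified by class, which the poster does not state; a plain random 10% draw would be an equally valid reading. The compound IDs are hypothetical placeholders. Only the class sizes (229 positives, 198 negatives for rat necrosis) come from the text.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.10, seed=0):
    """Split compound IDs into training and test sets while keeping the
    positive/negative ratio similar in both (stratified hold-out).

    labels: dict mapping a compound ID to 1 (positive) or 0 (negative).
    Returns (train_ids, test_ids) as sorted lists.
    """
    by_class = defaultdict(list)
    for compound_id, label in labels.items():
        by_class[label].append(compound_id)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, test = [], []
    for ids in by_class.values():
        ids = sorted(ids)  # deterministic order before shuffling
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return sorted(train), sorted(test)


# Hypothetical IDs with the class sizes of the rat necrosis (NEC) set.
labels = {f"pos_{i}": 1 for i in range(229)}
labels.update({f"neg_{i}": 0 for i in range(198)})
train_ids, test_ids = stratified_holdout(labels)
print(len(train_ids), len(test_ids))
```

Stratifying keeps the class ratio of the test set close to that of the full set, which matters when positives and negatives are of unequal size.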

METHODS

Structural Alerts – Validation and Improvement of Existing Alerts and Identification of New Alerts

eTOX Training Dataset Compilation: steps to transform the eTOX in vivo data into usable in silico model training datasets

1) Data extraction of treatment-/compound-related hepatotoxicity findings from study reports

#2118/P455

ABSTRACT

INTRODUCTION

A) EFPIA preclinical toxicity studies (paper/PDF)

• Reports/studies: 8,196 • Compounds: 1,947

B) RepDose DB (http://fraunhofer-repdose.de/)

• Reports/studies: 2,695 • Compounds: 1,286

Work performed at the eTOX Hackathon (“Hack Marathon”) of toxicologists, pathologists, in silico modelers and data managers

2) Harmonization of the verbatim terms from study reports using dedicated ontologies and combination of the data into different model training datasets

Rat • Histopathology • Clinical chemistry • Liver weight

Dog • Histopathology • Clinical chemistry • Liver weight

Monkey • Histopathology • Clinical chemistry • Liver weight

No. of compounds | Rat, EFPIA (oral) | Rat, RepDose (gavage, feed) | Dog, EFPIA (oral) | Dog, RepDose (gavage, feed) | Monkey, EFPIA (oral) | Monkey, RepDose (gavage, feed)
≤ 28 days (~1 month) | 889 | 284 | 380 | - | 62 | -
28-120 days (~3 months) | 114 | 562 | 96 | 91 | 16 | -
≥ 120 days (~6-24 months) | 87 | 382 | 98 | 189 | 18 | -
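The compound counts above are grouped into three treatment-duration bins. The sketch below bins studies by duration and counts distinct compounds per bin. The study list is hypothetical. The published ranges overlap at 28 and 120 days, so the boundary handling here is an assumption.

```python
from collections import defaultdict


def duration_category(days):
    """Assign a treatment duration (days) to one of three duration groups.

    Assumption: the table's ranges overlap at 28 and 120 days; this sketch
    counts exactly 28 days as short-term and exactly 120 days as long-term.
    """
    if days <= 28:
        return "<= 28 days"
    if days < 120:
        return "28-120 days"
    return ">= 120 days"


def compounds_per_category(studies):
    """Count distinct compounds per duration group.

    studies: iterable of (compound_id, duration_in_days) pairs. A compound
    with several studies in the same group is counted only once there.
    """
    groups = defaultdict(set)
    for compound_id, days in studies:
        groups[duration_category(days)].add(compound_id)
    return {category: len(ids) for category, ids in groups.items()}


# Hypothetical studies: (compound ID, treatment duration in days).
studies = [("cpd_A", 14), ("cpd_A", 90), ("cpd_B", 28), ("cpd_C", 182), ("cpd_C", 365)]
print(compounds_per_category(studies))
```

A compound can therefore appear in more than one duration group, once per group.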

Data matrix available for all of these training sets

primary terms with at least one treatment-related finding (values = LOEL)

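Example data matrix, histopathology (values = LOEL)

Substance ID | min dose | max dose | LOEL values of findings
Compound 1 | 500 | 1500 | -
Compound 2 | 50 | 1000 | 50, 250, 250
Compound 3 | 0.69 | 13.8 | 0.69
Compound 4 | 100 | 1750 | 750, 100
Compound 5 | 38 | 510 | -
Compound 6 | 50 | 2000 | 250, 1000
Compound 7 | 60 | 500 | 60
Compound 8 | 100 | 2000 | 2000, 100
Compound 9 | 5 | 651 | -
Compound 10 | 1 | 100 | 5, 100

Finding columns: accumulation, lipid; bile duct hyperplasia; congestion; hyperplasia; hypertrophy; hypertrophy, Kupffer cells; inflammation; necrosis, centrilobular; necrosis, periportal; single cell necrosis

No. of positives (all individual compounds): accumulation, lipid 3; bile duct hyperplasia 35; congestion 7; hyperplasia 47; hypertrophy 111; hypertrophy, Kupffer cells 6; inflammation 40; necrosis, centrilobular 6; necrosis, periportal 8; single cell necrosis 33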
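The data matrix stores the LOEL of each primary term with a treatment-related finding, per compound. The "No. of positives" row counts the compounds with any value in a column. The sketch below computes that count from a nested dictionary, which stands in for the matrix. Compound names and values are invented for illustration.

```python
from collections import Counter

# Hypothetical data matrix: for each compound, the LOEL of every primary
# term with a treatment-related finding; a missing term means "no
# treatment-related finding" for that compound.
loel_matrix = {
    "Compound A": {"hypertrophy": 250, "necrosis, centrilobular": 1000},
    "Compound B": {"hypertrophy": 100},
    "Compound C": {},  # tested, but without treatment-related liver findings
}


def positives_per_term(matrix):
    """Count, for each primary term, the compounds that have a LOEL for it."""
    counts = Counter()
    for findings in matrix.values():
        counts.update(findings.keys())
    return dict(counts)


print(positives_per_term(loel_matrix))
```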

Histopathology findings from EFPIA legacy reports and RepDose, grouped by 1st level cluster (with 2nd level cluster where given). Each row lists the primary terms in the preferred ontology (positive cpds.), then the other primary terms (positive cpds.).

• Hypertrophy: hypertrophy, epithelial | hypertrophy, hepatocyte | hypertrophy (142); cell enlargement | enlargement | hypertrophy | peroxisome proliferation | swelling (380)
• Vacuolation (degenerative lesions): intracellular vacuolation | vacuolation, biliary epithelium | vacuolation, epithelial (140); vacuolization (134)
• Steatosis (degenerative lesions): accumulation, lipid | increased, lipid content | intracellular increase of lipids | vacuolation, lipidic (fatty change) | vacuolation, lipidic (131); fatty degeneration | lipidosis (114)
• Necrosis (degenerative lesions): necrosis, fibrinoid | necrosis, focal/multifocal | necrosis, hepatocellular | necrosis, centrilobular | necrosis, midzonal | necrosis, periportal | necrosis, zonal | single cell necrosis … (126); apoptosis | necrosis (165)
• Inflammation (inflammatory changes): abscess(es) | chronic inflammatory/proliferative/metaplastic changes | inflammation, granulomatous | inflammatory processes … (124); inflammation (67)
• Infiltration (inflammatory changes): inflammatory cell infiltration | granuloma | histiocytic inflammatory cell infiltrate | histiocytic proliferation | increased, histiocyte number | increased plasma cell number ... (94); granulocytes | granuloma | histiocytosis | infiltration | leukocytes | macrophages | polynuclear cells (78)
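The grouping above can be expressed as two lookup tables applied one after the other: primary term to 1st level cluster, then 1st level cluster to 2nd level cluster. The sketch below uses only an excerpt of the listed terms. It assumes that a cluster's LOEL is the lowest LOEL among its member terms; the poster does not state this rule.

```python
# Excerpt of the term hierarchy (not the complete mapping).
TERM_TO_CLUSTER1 = {
    "necrosis, centrilobular": "necrosis",
    "necrosis, periportal": "necrosis",
    "single cell necrosis": "necrosis",
    "vacuolation, lipidic": "steatosis",
    "lipidosis": "steatosis",
    "inflammation, granulomatous": "inflammation",
}
CLUSTER1_TO_CLUSTER2 = {
    "necrosis": "degenerative lesions",
    "steatosis": "degenerative lesions",
    "inflammation": "inflammatory changes",
    "infiltration": "inflammatory changes",
}


def roll_up(findings, mapping):
    """Aggregate {term: LOEL} to {cluster: LOEL} for one compound.

    A compound is positive for a cluster if any member term is positive.
    Assumption: the cluster LOEL is the lowest LOEL of its member terms.
    Terms missing from the mapping are skipped.
    """
    clustered = {}
    for term, loel in findings.items():
        cluster = mapping.get(term)
        if cluster is None:
            continue
        clustered[cluster] = min(loel, clustered.get(cluster, loel))
    return clustered


# Hypothetical findings of one compound (LOEL per primary term).
findings = {"necrosis, centrilobular": 1000, "single cell necrosis": 250, "lipidosis": 500}
level1 = roll_up(findings, TERM_TO_CLUSTER1)
level2 = roll_up(level1, CLUSTER1_TO_CLUSTER2)
print(level1)
print(level2)
```

Merging terms this way increases the number of positive compounds per endpoint, at the cost of mixing related but distinct findings.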

• Many compilation steps had to be applied to transform the in vivo hepatotoxicity data from the eTOX database into usable model training datasets:
  • Data extraction of treatment-related findings from unpublished preclinical toxicity study reports
  • Harmonization of the verbatim terms using dedicated ontologies, and combination of the data
  • Grouping of primary histopathology terms into 1st and 2nd level clusters of similar findings (and mechanisms) to obtain training data with sufficient compound numbers and chemical space coverage

• Various in silico models were developed from these training datasets using approaches such as:
  • Statistical structural fragment/fingerprint-based models
  • Molecular descriptor-based machine learning models (QSAR), such as Decision Tree (DT), Partial Least Squares (PLS), Random Forest (RF) and k-Nearest Neighbor (kNN)
  • Systems biology models
  • Structural alerts

• Internal and external validation showed sensitivities of up to 81% and specificities of up to 88%

• A case study on tienilic acid demonstrated how these in silico models can be used to predict DILI and to elucidate potential mechanisms of hepatotoxicity
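k-Nearest Neighbor is one of the machine learning approaches listed above. The poster's kNN models are molecular descriptor-based. To illustrate only the voting principle, the sketch below instead uses sets of structural fragments compared by Tanimoto similarity, a common variant. The fragment names and labels are hypothetical and are not Leadscope fragments.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two sets of structural fragments."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_predict(query, training, k=3):
    """Predict 1 (positive) or 0 (negative) by majority vote of the k
    training compounds most similar to the query.

    training: list of (fragment_set, label) pairs. An odd k avoids ties
    for two classes.
    """
    neighbours = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training compounds as fragment sets with labels.
training = [
    ({"thiophene", "carboxylic acid", "aryl ketone"}, 1),
    ({"thiophene", "aryl ketone"}, 1),
    ({"quinoline", "amine"}, 1),
    ({"phenol", "amide"}, 0),
    ({"amide", "alkyl chain"}, 0),
    ({"phenol", "alkyl chain"}, 0),
]
print(knn_predict({"thiophene", "carboxylic acid"}, training))
print(knn_predict({"phenol", "amide"}, training))
```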

• Mulliner et al. models [7] prediction results • Leadscope & SVM prediction summary

• eTOX models prediction results • Leadscope & C5 DT summary

structural feature contribution structural feature contribution

Tienilic acid: withdrawn from the market because of idiosyncratic, autoimmune-mediated hepatotoxicity in patients, characterized by covalent binding to CYP2C9 (antibodies against the acylated CYP2C9 hapten were found in patients) [6]

Structural alert: thiophene

c) Metabolic CYP soft spot analysis (MetaSite)

• Drug-induced liver injury (DILI)
  • is still of great concern for patient safety and a major cause of drug candidate attrition and drug withdrawal from the market

• Data about liver toxicity
  • are scattered across many sources (public, proprietary, consortia (eTOX), commercial eSafety) and data types (in vitro, preclinical in vivo, clinical, post-market)

• Chen et al. [1] reviewed several in silico models for human DILI (since 2012)
  • Models were trained with 74-1087 compounds from different data sources using various modeling approaches
  • Performance: higher accuracy for small datasets, 70-84% (13-53 compounds); lower accuracy for larger datasets, 60-75% (73-1087 compounds); high specificity (90-95%) but poorer sensitivity (~50%)

Conclusion
  • Sufficiently harmonized, highly detailed hepatotoxicity datasets combining all the different sources are missing
  • Consequently, satisfactory in silico models that use such harmonized, detailed training data to generate reliable hepatotoxicity predictions are lacking

• In the eTOX consortium framework, in vivo data were extracted from unpublished preclinical toxicity studies of 13 EFPIA partners from the pharmaceutical industry [2]

eTOX database: high-quality, highly detailed preclinical toxicity results

Consortium (www.e-tox.net)

• Objectives
  A) Set up the eTOX database and pre-competitive data sharing
    • Internal EFPIA toxicity reports plus public toxicity data
    • Using standardized ontologies and a database schema
  B) Development of new prediction models
    • Use of eTOX data as model training data

• Participants: Pharma (13), Academia (11), SME (6)

The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement nº 115002 (eTOX), whose resources are composed of a financial contribution from the European Union’s Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind contributions.

• Validation of alerts using different datasets [2, 7] • drugs, (pre)clinical, chemicals, individual species (rat, dog, monkey)

Validation results: example eTOX drugs (% correct positive predictions)

[1] Chen M, Bisgin H, Tong L, Hong H, Fang H, Borlak J, Tong W. “Towards predictive models for drug-induced liver injury in humans: are we there yet?” Biomark. Med. 8(2), 201-213, 2014

[2] Sanz F, Pognan F, Steger-Hartmann T, Díaz C. “Legacy data sharing to improve drug safety assessment: the eTOX project” Nat. Rev. Drug Discov. 16(12), 811-812, 2017

[3] Carbonell P, Lopez O, Amberg A, Pastor M, Sanz F. “Hepatotoxicity Prediction by Systems Biology Modeling of Disturbed Metabolic Pathways using Gene Expression Data” ALTEX 34(2), 219-234, 2017

[4] López-Massaguer O, Pinto-Gil K, Sanz F, Amberg A, Anger LT, Stolte M, Ravagli C, Marc P, Pastor M. “Generating modelling data from repeat-dose toxicity reports” Toxicol. Sci., 1-14, 2017

[5] Enoch SJ, Ellison CM, Schultz TW, Cronin MT. “A review of the electrophilic reaction chemistry involved in covalent protein binding relevant to toxicity” Crit. Rev. Toxicol. 41(9), 783-802, 2011

[6] Kalgutkar AS, Gardner I, Obach RS, Shaffer CL, Callegari E, Henne KR et al. “A comprehensive listing of bioactivation pathways of organic functional groups” Curr. Drug Metab. 6(3), 161-225, 2005

[7] Mulliner D, Schmidt F, Stolte M, Spirkl HP, Czich A, Amberg A. “Computational Models for Human and Animal Hepatotoxicity with a Global Application Scope” Chem. Res. Toxicol. 29(5), 757-767, 2016

Quinoline (Derek alert 685): PP = 78% (14/18 positive)

Thiophene (Kalgutkar): PP = 37% (15/41 positive)

Phenoxyacetic acid (Derek alert 690): PP = 53% (10/19 positive)

Aromatic sulphonamide (Kalgutkar): PP = 9.5% (2/21 positive)
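Here PP is read as the share of alert-matched compounds that are actually positive (the "% correct positive predictions" above). The sketch below recomputes PP from the reported counts. Rounded to whole percentages, the first three give 78%, 37% and 53%; the sulphonamide value gives 9.5% at one decimal.

```python
def positive_predictivity(n_positive, n_matched):
    """Percentage of alert-matched compounds that are positive."""
    return 100 * n_positive / n_matched


# (positives, compounds matched) as reported for the eTOX drugs above.
alerts = {
    "Quinoline (Derek alert 685)": (14, 18),
    "Thiophene (Kalgutkar)": (15, 41),
    "Phenoxyacetic acid (Derek alert 690)": (10, 19),
    "Aromatic sulphonamide (Kalgutkar)": (2, 21),
}
for name, (positives, total) in alerts.items():
    print(f"{name}: PP = {positive_predictivity(positives, total):.1f}% ({positives}/{total})")
```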

LOEL box plot summary

Quinoline

Alert sources: Derek Nexus 5.0.2, Kalgutkar [6], Enoch/Cronin [5]

• Translational assessment • preclinical vs. clinical predictivity

Metabolic soft spot analysis (MetaSite)

Tienilic acid

(6/21) (31/34) (12/17) (5/18) (positives/total)

Statistical structural fragment/fingerprint-based models
• Model approach (www.leadscope.com)
  • Use of predefined fragments for statistical correlation with the toxicological activity in the training dataset
• Model output
  • Predicted probability (0-100%) for the model endpoint
  • Information on whether a query compound is in or out of the domain of the training data
  • Individual structural fragment analysis
  • Link to the respective training data

QSAR - Molecular descriptor-based machine learning models
• QSAR modeling approach A
  • Cubist C5.0 decision tree model
• Molecular descriptors
  • MOE, CATS, Crippen packages
• Model output
  • Classification model (50-60 rules): positive/negative for the respective model endpoint
• Validation results
  • External validation using Sanofi’s confidential data (n=66)
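Sensitivity and specificity, as reported for the external validation, follow from the confusion matrix of true versus predicted classes. The sketch below computes them from two label vectors. These vectors are hypothetical and are not the Sanofi validation data.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical external-validation labels and model predictions.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Reporting both values matters because, as the introduction notes for earlier DILI models, high specificity can coexist with poor sensitivity.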

Systems biology models

Model training data 77 overlapping compounds extracted from

Modeling approach, using statins as an example

• Validation results • Internal validation (test dataset: 4% of original dataset)

80.2 79.2 81.3 77.7 73.3 80.6 78.5 78.8 78.1 78.8 80.3 76.8

2nd level cluster

1st level cluster

74.3 72.4 76.9 79.8 75.0 82.8 78.1 75.0 79.8 79.4 78.1 80.3 80.6 81.7 79.3 78.4 78.0 79.2

78.0 80.0 77.0

out of domain

predictions 20 29

85.0 81.0 88.0

• Results [3] • Most frequently disturbed metabolic pathways: mitochondrial β-oxidation of fatty acids and amino acid metabolism

• ROC AUC of up to 0.9 for some hepatotoxicity endpoints

• Quantitative assessment • use of the LOEL of each finding; box plot analysis

• Implementation of mechanistic information • identification of the responsible reactive metabolite • likelihood of reactive metabolite formation, assessed by analysis of metabolic soft spots using MetaSite (www.moldiscovery.com/software/metasite)

• Identification of new structural alerts • Fragment analysis by Leadscope

• Fragment analysis by SARpy (www.vegahub.eu/portfolio-item/sarpy)
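For the quantitative assessment of alerts, the LOEL values of positive compounds are summarized as box plots. The sketch below computes the underlying five-number summary. It computes quartiles on log10(LOEL), an assumption based on the logarithmic dose axis of the LOEL plot. The LOEL values are hypothetical.

```python
import math
import statistics


def loel_summary(loels):
    """Five-number summary (min, Q1, median, Q3, max) of LOELs in mg/kg/day.

    Quartiles are computed on log10(LOEL) with statistics.quantiles
    (default "exclusive" method) and converted back to mg/kg/day.
    Needs at least two values.
    """
    logs = [math.log10(x) for x in loels]
    q1, _, q3 = statistics.quantiles(logs, n=4)
    return {
        "min": min(loels),
        "q1": 10 ** q1,
        "median": 10 ** statistics.median(logs),
        "q3": 10 ** q3,
        "max": max(loels),
    }


# Hypothetical LOELs (mg/kg/day) of alert-positive compounds per species.
loels_by_species = {"rat": [10, 30, 100, 300, 1000], "dog": [3, 10, 30]}
for species, loels in loels_by_species.items():
    summary = loel_summary(loels)
    print(species, {key: round(value, 1) for key, value in summary.items()})
```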

• Molecular descriptors • Adriana and GRIND2 descriptors, computed with the AdrianaCode and Pentacle software

• Model output • Qualitative and quantitative scores

• Validation results [4] • Qualitative models DEG, INF, PRO

CASE STUDY

Figure panels: structural feature contribution; training compounds; a) quantitative assessment

External validation results: rat liver necrosis (NEC)

• QSAR Modeling approach B • PLS-R/PLS-DA (Partial Least Squares Regression/Discriminant Analysis), RF-R/RF-C (Random Forest Regressor/Classifier)

• Validation results [4] • Quantitative model PRO

• Aim: increase mechanistic understanding: pathway, cell, tissue, organ level

Endpoint | Number of points | 50 | 100 | 500 (mg/kg)
Hepatotoxicity | 14 | 0.89 | 0.89 | 0.54
1. Clinical chemistry findings | 11 | 0.92 | 0.92 | 0.83
1.1 Hepatobiliary injury | 3 | 1 | 1 | 0.5
1.2 Hepatocellular injury | 10 | 0.90 | 0.90 | 0.81
2. Morphological findings | 8 | 1.0 | 1.0 | -
2.1 Hepatobiliary injury | 1 | - | - | -
2.2 Hepatocellular injury | 8 | 1.0 | 1.0 | -

Bioactivation of Thiophene [6]

Pyrazole PP = 63% (10/16 positives)

Trifluoromethylbenzene PP = 61% (40/61 positives)

Acetamidobenzene PP = 51% (26/51 positives)

Halogenated benzylamine PP = 73% (24/33 positives)

In total, 78 primary terms

b) Metabolic activation/toxification step

SME = Small Medium Enterprises

Figure: dose (mg/kg/day) by species (dog, monkey, rat), with ratios 12/17, 1/4 and 1/2. LOEL box plots (mg/kg/day, axis from 1 to 1,000) for dog, monkey and rat with medians, and tested doses for compounds 1-18.

3) Grouping of histopathology terms into clusters of similar findings (and mechanisms)

Figure: dose (mg/kg/day, axis 0-600) for dog and rat, with ratios 5/10 and 2/3.

In silico model | Cluster level | No. training compounds | Positives¹ | Negatives²
Necrosis (NEC) | 1st | 427 | 229 | 198
Steatosis (STE) | 1st | 406 | 208 | 198
Inflammation (IFM) | 1st | 322 | 124 | 198
Infiltration (IFT) | 1st | 302 | 104 | 198
Proliferation (PLF) | 1st | 345 | 147 | 198
Hyperplasia (HYP) | 1st | 439 | 241 | 198
Hypertrophy (HYT) | 1st | 652 | 455 | 197
Degenerative lesions (DEG) | 2nd | 602 | 355 | 247
Inflammatory changes (INF) | 2nd | 412 | 165 | 247
Non-neoplastic proliferative changes (PRO) | 2nd | 558 | 311 | 247

Training datasets • Histopathology data from oral rat studies of up to 2 years were used for the in silico models

¹ Positives: compounds with findings in the respective endpoint
² Negatives: compounds without liver findings in histopathology, clinical chemistry or liver weight, and tested up to 1,000 mg/kg
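The footnotes above define which compounds enter a training set. The sketch below turns that definition into a labeling rule. It reads "tested up to 1,000 mg/kg" as "highest tested dose of at least 1,000 mg/kg", which is an assumption; the dictionary field names are also hypothetical. Compounds that are neither positives nor qualifying negatives are excluded (None).

```python
NEGATIVE_MIN_TOP_DOSE = 1000  # mg/kg; assumed reading of "tested up to 1000 mg/kg"


def label_compound(compound, endpoint):
    """Return 1 (positive), 0 (negative) or None (excluded) for one endpoint.

    compound is a dict with hypothetical fields:
      "findings":       set of endpoint codes (e.g. "NEC") with a finding
      "liver_findings": True if any liver finding was seen in
                        histopathology, clinical chemistry or liver weight
      "max_dose":       highest tested dose in mg/kg
    """
    if endpoint in compound["findings"]:
        return 1
    if not compound["liver_findings"] and compound["max_dose"] >= NEGATIVE_MIN_TOP_DOSE:
        return 0
    return None  # neither a positive nor a qualifying negative


compounds = {
    "cpd_A": {"findings": {"NEC"}, "liver_findings": True, "max_dose": 500},
    "cpd_B": {"findings": set(), "liver_findings": False, "max_dose": 1000},
    "cpd_C": {"findings": set(), "liver_findings": False, "max_dose": 300},
    "cpd_D": {"findings": set(), "liver_findings": True, "max_dose": 2000},
}
labels = {cid: label_compound(c, "NEC") for cid, c in compounds.items()}
print(labels)
```

Requiring negatives to be clean at high doses keeps under-dosed compounds out of the negative class. It also explains why the negatives form a shared pool across the endpoints of one cluster level.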

Hepatotoxicity: eTOX (preclinical), 2nd level clusters with their 1st level clusters
• Degenerative lesions: Necrosis, Steatosis
• Inflammatory changes: Inflammation, Infiltration
• Non-neoplastic proliferative changes: Proliferation, Hyperplasia, …, Hypertrophy

Figure: chemical structures of tienilic acid with its metabolic activation/toxification step, and substructures of the structural alerts (R = variable substituent)
