1. Introduction
2. Regulatory application of in silico genotoxicity predictions
3. Chemical similarity
4. Applicability domains
5. Expert opinion
Review
Latest advances in computational genotoxicity prediction
Russell T Naven, Nigel Greene† & Richard V Williams†
Compound Safety Prediction Group, Worldwide Medicinal Chemistry, Pfizer Worldwide Research and Development, Groton, CT, USA
Introduction: Computational approaches for genotoxicity prediction have
existed for over two decades. Numerous methodologies have been utilized
and the results of various evaluations have been published.
Areas covered: In silico methods are considered mature enough to be part of
draft FDA regulatory guidelines for the assessment of genotoxic impurities.
However, aspects of how best to use predictive systems remain unresolved:
i) methodologies to measure how similar two compounds need to be in
order to assume they have the same biological outcome; and ii) defining
whether a compound is close enough to the model training set such that a
model prediction can be considered reliable.
Expert opinion: In silico prediction of genotoxicity is a fundamental part
of screening strategies for the assessment of genotoxic impurities in drug
products. However, the concept of using chemical similarity to infer mutagenic
potential from one compound of known activity to another whose activity is
unknown remains a scientific challenge. Similarly, defining when an in silico
model prediction can be considered to be reliable is also difficult. Reaction
mechanisms and the functional group building blocks of chemistry are largely
constant, and so when data gaps appear, it tends to be for compounds
that have been regularly used but rarely tested.
Keywords: applicability domains, chemical similarity, Derek for Windows, genotoxic impurities,
ICH M7, Leadscope Model Applier, MC4PC, Toxtree
Expert Opin. Drug Metab. Toxicol. (2012) 8(12):1579-1587
1. Introduction
The in silico prediction of genotoxicity has been in existence for over two decades, becoming a major research focus with the publication of structural alerts for DNA reactivity from Ashby and Tennant [1]. The early accessibility of large data sets in the public domain [2-5] has resulted in a measure of success in these predictive approaches, in particular for the prediction of the Ames Salmonella assay for mutagenicity [6,7].
The use of any in silico system or model is limited by both the accuracy of the predictions and the confidence in those predictions, but such accuracy and confidence are context dependent. For example, the Ames test is probably the most accurate surrogate assay for genotoxic carcinogenicity [8], yet there is little confidence that non-genotoxic carcinogens will display activity in this in vitro assay. In silico models for the prediction of genotoxicity have to contend with similar issues. In their case, accuracy can generally be considered a property of the system, whereas confidence, also known as trust or reliability, is assigned to individual predictions.
The accuracy of in silico toxicity predictions is typically measured through internal and external validation of the model using data sets of known experimental activity. Internal validation is used during development to show that statistically-derived models are robust, but provides little information about their ability to predict the activity of compounds outside of the training set [9,10]. External
10.1517/17425255.2012.724059 © 2012 Informa UK, Ltd. ISSN 1742-5255. All rights reserved: reproduction in whole or in part not permitted
validation is the gold standard method for evaluating model performance, but results have proved to be very data set and therefore context dependent.
Commercially available software packages such as Derek for Windows (DfW) [11], now called Derek Nexus, MC4PC [12], and Leadscope Model Applier (LSMA) [13] are now commonly used within the pharmaceutical industry for the prediction of genotoxicity and other toxicological endpoints. Freely available systems such as Toxtree [14] are also being evaluated for their usefulness. Typically, these commercially or freely available predictive mutagenicity models are able to predict the activity of publicly available data sets to a certain degree of accuracy, but this is not the case for proprietary pharmaceutical data [6,15]. The reasons for this difference have been explored, and can be related to chemotype distribution [6,15], physicochemical properties and biological properties [16].
The commercial systems all have strengths and weaknesses and their comparative performances have been extensively reviewed and published [6,15,17,18], with no single system performing significantly better than the others. Depending on the source of the data being used to evaluate a system's performance, e.g. public domain data or a set of proprietary pharmaceutical-like compounds, the overall concordance of these algorithms is between 70 and 85%. These values are in fact getting close to the inter- and intra-laboratory reproducibility of the Ames assay, reported as 87% [19]. However, a system's sensitivity, i.e., its ability to accurately predict an Ames positive compound, can vary much more dramatically, from up to 85% for public domain sets to just 17% for pharmaceutical proprietary data [15]. This variability in performance most likely results from most software applications being trained using only the public data sets, and from the fact that most pharmaceutical-like compounds, i.e., the active pharmaceutical ingredients in drug products, tend not to contain the classical DNA-reactive functional groups that are a common cause of genotoxicity. It is possible that these pharmaceutical compounds either undergo rare or unusual metabolic activation and hence are not obviously reactive in and of themselves, or they elicit a positive response in the Ames assay through non-reactive mechanisms such as intercalation [20]. It should also be noted that the ratio of positive to negative compounds is significantly lower in the pharmaceutical-like sets, where typically only 6-10% are considered to be mutagenic in the Ames assay compared to 40-60% in the case of some public sets, and so maintaining an appropriate balance between correct positives and false positives becomes a key challenge for any algorithm.
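The effect of class balance on the trade-off between correct and false positives can be illustrated with a short calculation. The sketch below is illustrative only: the sensitivity and specificity values (80% each) are hypothetical and are not taken from any of the cited evaluations, and the two prevalence values are simply picked from within the ranges quoted above (6-10% and 40-60% Ames positives).

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Fraction of positive calls that are true positives.

    All three arguments are fractions between 0 and 1.
    """
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical model: 80% sensitivity and 80% specificity (assumed values).
public_like = positive_predictive_value(0.80, 0.80, prevalence=0.50)
pharma_like = positive_predictive_value(0.80, 0.80, prevalence=0.08)

print(f"PPV at 50% positives: {public_like:.2f}")
print(f"PPV at  8% positives: {pharma_like:.2f}")
```

With the same hypothetical model, most positive calls are correct in the balanced set, whereas in the set with few positives the majority of positive calls are false positives.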
2. Regulatory application of in silico genotoxicity predictions
In parallel with the development of regulatory guidelines on the limits for genotoxic impurities (GTIs) [21,22], the pharmaceutical industry has developed strategies to help with the identification, monitoring and control of these GTIs [23]. One published strategy uses a five-level categorization approach to address the appropriate level of concern associated with a GTI and uses structural alerts for mutagenicity as part of the process. The five categories suggested are shown in Table 1. In this scheme, compounds that have not yet been tested in the Ames assay can be evaluated using an in silico system for the presence of structural alerts for mutagenicity and classified accordingly.
In this categorization scheme, chemical similarity is used to identify GTIs that contain an alert and are considered to be closely related to the active pharmaceutical ingredient (API) (Class 4 in Table 1). However, this scheme presents the dilemma of how to define what is similar and what is not. This issue of defining chemical similarity is discussed in Section 3.
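The five categories of Table 1 can be read as a simple decision procedure. The following is a minimal sketch under stated assumptions: the function name, the parameter names and the order in which the conditions are checked are all hypothetical, and the case of a mutagen shown not to be carcinogenic, which Table 1 does not address, is simply treated as Class 2.

```python
from enum import IntEnum


class GTIClass(IntEnum):
    CLASS_1 = 1  # known genotoxic (mutagenic) and carcinogenic
    CLASS_2 = 2  # known genotoxic (mutagenic), carcinogenic potential unknown
    CLASS_3 = 3  # alerting structure, unrelated to the API, untested
    CLASS_4 = 4  # alerting structure, related to the API
    CLASS_5 = 5  # no alert, or sufficient evidence for absence of genotoxicity


def classify_impurity(known_mutagenic: bool,
                      known_carcinogenic: bool,
                      has_alert: bool,
                      alert_related_to_api: bool,
                      evidence_of_absence: bool = False) -> GTIClass:
    """Assign a Table 1 category (the order of checks is an assumption)."""
    if known_mutagenic:
        return GTIClass.CLASS_1 if known_carcinogenic else GTIClass.CLASS_2
    if evidence_of_absence or not has_alert:
        return GTIClass.CLASS_5
    return GTIClass.CLASS_4 if alert_related_to_api else GTIClass.CLASS_3


# An untested impurity carrying an alert that is not shared with the API.
print(classify_impurity(False, False, has_alert=True,
                        alert_related_to_api=False).name)
```

Deciding whether an alert is "related to the API" is the step that depends on a definition of chemical similarity; the sketch takes that judgment as an input rather than computing it.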
Since there are multiple commercially available systems, typical implementations of this strategy can include one or more of these in silico systems as part of a screening cascade to identify potential GTIs [24]. In the context of GTIs, validation studies have yielded results with a sensitivity of 48-82%, specificity of 47-97% and concordance of 59-92% depending on the system and data set (Table 2), with expert systems especially showing high sensitivity and negative predictive values (NPV) [24-26]. It should be noted that the Derek system does not actually predict a negative response for any endpoint, but in most evaluation exercises published to date the authors have assumed that a “nothing to report” call from the system is equivalent to a negative prediction.
Glowienke and Hasselgren [26] concluded that both Derek and Multicase covered most of the compound types in their intermediates data set. Similarly, a recent study on Leadscope Model Applier indicated that the chemical space populated by the training set overlapped significantly with that of a drug impurity database, although no prediction metrics were
Article highlights.
• In silico models for genotoxicity prediction have been in existence for over two decades and many evaluations of their predictive accuracy have been published.
• Model performance is often data-set dependent, but for genotoxic impurities the models display good coverage for data sets of pharmaceutical intermediates.
• In silico predictions can be improved through the use of expert interpretation of model output, whereas the effect of combining models is variable.
• Draft regulatory guidelines for genotoxic impurity assessment include the use of computational approaches for genotoxicity prediction.
• Computationally derived measures of chemical similarity do not perform well for inferring mutagenic activity.
• The applicability domains of models are important to ascertain, but there is a data gap on the effect of these on model performance.
This box summarises key points contained in the article.
generated for this particular data set [13]. This is in accordance with the distribution of chemotypes between model training and intermediates data sets, bearing in mind that most drug impurities will stem from solvents, reagents and intermediates used in the chemical synthesis of the final active pharmaceutical ingredient. For example, several studies have found that the most prevalent alerts for drug impurities were aromatic amines, aromatic nitro compounds and alkylating agents [25,26], which is similar to the chemotypic distribution for publicly available data [15,27].
More recently, proposals for the International Conference on Harmonization (ICH) M7 guidance documents may well include the adoption of in silico methods as part of the international guidelines on the assessment and limits for GTIs, although no official documentation has been published yet. Current industry strategies for the use of in silico systems as part of a screening cascade suggest that if a compound is assessed by one or more in silico methods, in addition to using expert judgment to review the underlying data and predictions, then a compound predicted to be negative can be assumed to be non-mutagenic under these guidelines. This strategy suggests that negative predictions require no additional scrutiny or testing. The ability of any system to adequately identify negative compounds therefore needs to be understood, and so more recent evaluations have focused on the negative predictive value (NPV), i.e., the percentage of correct negative calls compared to the total number of negative calls [24].
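The metrics used in this section (and defined in the footnotes to Table 2) follow directly from the four cells of a confusion matrix. The sketch below is a minimal illustration; the class name is hypothetical, and the example counts are not taken from any publication. They were chosen only because they are arithmetically consistent with the DfW 11.0 row of Table 2 (95 positives, 178 negatives; 82% sensitivity, 47% specificity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # Ames positives predicted positive
    fn: int  # Ames positives predicted negative
    tn: int  # Ames negatives predicted negative
    fp: int  # Ames negatives predicted positive

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def npv(self) -> float:
        # Correct negative calls over all negative calls.
        return self.tn / (self.tn + self.fn)

    @property
    def concordance(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Assumed counts for a data set of 95 positives and 178 negatives.
example = ConfusionMatrix(tp=78, fn=17, tn=84, fp=94)
for name in ("sensitivity", "specificity", "npv", "concordance"):
    print(f"{name}: {getattr(example, name):.0%}")
```

Note that NPV, unlike sensitivity and specificity, depends on the proportion of positives in the data set, which is why it has to be interpreted alongside the data set composition.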
In a survey of current practices across eight pharmaceutical companies, Dobo et al. [24] report that using (Q)SAR systems alone, without any human expert intervention, resulted in an average NPV of 94%, and that where expert opinion was incorporated into the assessment this rose to 99%. The models themselves are able to assist in this process. Statistically-derived, data-driven models generally support predictions with a list of the most similar compounds from their training sets, which at least provides the context for the prediction. Derek provides a fraction of these data directly, as example compounds, and the remainder indirectly, through the references used to support the alert. Derek also provides a summary of the toxicity data, mechanistic rationale and explanation of the SAR derivation. However, it should be noted that in their evaluation, Dobo et al. [24] make the assumption that most of the compound sets used contained a representative percentage of Ames positives and that the NPV values reported were significantly better than if they were simply to predict all compounds to be negative.
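The comparison against "predicting all compounds to be negative" can be made explicit: a trivial all-negative predictor has an NPV equal to the fraction of negatives in the data set. The sketch below applies this to the data set compositions and reported NPVs listed in Table 2; the function and variable names are hypothetical, and the calculation is offered as an illustration rather than as the analysis performed by the cited authors.

```python
def all_negative_baseline_npv(n_positive: int, n_negative: int) -> float:
    """NPV obtained by calling every compound negative."""
    return n_negative / (n_positive + n_negative)


# (system, positives, negatives, reported NPV) as listed in Table 2.
table_2 = [
    ("DfW (multiple versions)", 48, 224, 0.94),
    ("DfW 11.0", 95, 178, 0.83),
    ("MC4PC 2.0, AZ3 (1.9)", 95, 178, 0.73),
    ("Derek 2011", 159, 495, 0.88),
]

for system, n_pos, n_neg, reported_npv in table_2:
    baseline = all_negative_baseline_npv(n_pos, n_neg)
    print(f"{system}: baseline {baseline:.1%}, reported {reported_npv:.0%}, "
          f"gain {reported_npv - baseline:+.1%}")
```

The lower the proportion of positives in a data set, the higher this baseline becomes, so a high NPV on its own says little without the data set composition.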
It is also worth noting that all companies involved in this survey used at least one system, the most common of which was the Derek system from Lhasa Limited. The survey results also suggest that the number and specific combination of in silico tools used did not have a significant influence on the NPV, although a rigorous exploration of one system versus two was not conducted on the data sets. Similarly, combining the output of two or more predictive systems did not significantly improve performance in an earlier study with a data set of intermediates [26] or food additives [28].
In contrast, other published evaluations of in silico systems have suggested that there are gains in sensitivity and specificity from the use of more than one system; however, the degree of
Table 1. Categories of genotoxic impurities in pharmaceutical products.
Category  Description
Class 1   Impurities known to be both genotoxic (mutagenic) and carcinogenic
Class 2   Impurities known to be genotoxic (mutagenic), but with unknown carcinogenic potential
Class 3   Alerting structure, unrelated to the structure of the Active Pharmaceutical Ingredient (API) and of unknown genotoxic (mutagenic) potential
Class 4   Alerting structure, related to the API
Class 5   No alerting structure or sufficient evidence for absence of genotoxicity
Table 2. Performance metrics for commercial systems when applied to GTI data sets.
System                   Data set composition and reference                                Sensitivity*  Specificity‡  NPV  Concordance§
DfW (multiple versions)  48 positives, 224 negatives; Dobo et al. (2006) [25]              68%           97%           94%  92%
DfW 11.0                 95 positives, 178 negatives; Glowienke and Hasselgren [26]        82%           47%           83%  59%
MC4PC 2.0, AZ3 (1.9)     95 positives, 178 negatives; Glowienke and Hasselgren [26]        48%           75%           73%  66%
Derek 2011               159 positives, 495 negatives; Proprietary intermediates data set  70%           73%           88%  72%
*Sensitivity = correctly predicted positives/total number of positives.
‡Specificity = correctly predicted negatives/total number of negatives.
§Concordance = percentage of correctly predicted compounds.
coverage, i.e., the number of compounds that can be predicted, drops and so decreases the utility of these approaches [15,18,29,30]. However, the data published in these evaluations do not include expert opinion as part of the process, and so it cannot be determined whether one in silico system plus expert opinion would have yielded similar or better predictive performance without such a significant drop in coverage. If a combination of systems is used, then methods such as distance to model are available for deriving a measure of confidence in the consensus prediction [31].
The validation analyses carried out to date confirm the utility of in silico models for the assessment of GTIs. This should lead to increased confidence in the predictions, especially when expert knowledge is used to put those predictions into context. Confidence in individual predictions should be further increased by assessing whether the query compound is within the applicability domain (AD) of the model.
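The trade-off between accuracy and coverage when two systems are combined can be sketched as follows. This is an illustration under stated assumptions only: the consensus rule (keep a call only when both systems agree, otherwise make no prediction) is one simple possibility and is not necessarily the rule used in the cited evaluations, and the ten observed outcomes and the two sets of predictions are invented data.

```python
from typing import List, Optional


def consensus(calls_a: List[int], calls_b: List[int]) -> List[Optional[int]]:
    """Keep a call only where both systems agree; otherwise no prediction."""
    return [a if a == b else None for a, b in zip(calls_a, calls_b)]


def coverage(calls: List[Optional[int]]) -> float:
    """Fraction of compounds for which a prediction is made."""
    return sum(c is not None for c in calls) / len(calls)


def concordance(calls: List[Optional[int]], observed: List[int]) -> float:
    """Fraction of correct calls among the compounds that were predicted."""
    covered = [(c, o) for c, o in zip(calls, observed) if c is not None]
    if not covered:
        raise ValueError("no compound was predicted")
    return sum(c == o for c, o in covered) / len(covered)


# Invented data: 1 = Ames positive, 0 = Ames negative.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
system_a = [1, 1, 0, 0, 0, 1, 0, 0, 0, 1]
system_b = [1, 0, 1, 0, 0, 0, 0, 1, 0, 1]
combined = consensus(system_a, system_b)

for name, calls in (("A", system_a), ("B", system_b), ("A+B", combined)):
    print(f"{name}: coverage {coverage(calls):.0%}, "
          f"concordance {concordance(calls, observed):.0%}")
```

In this invented example the consensus is more often correct than either system alone, but four of the ten compounds are left without a prediction.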
3. Chemical similarity
Similarity is a multi-dimensional concept; the similarity between two compounds can be difficult to determine, and it is even more challenging to create a set of guidelines for it. For instance, compounds 1 and 2 in Figure 1 are similar in that they both have the same molecular formula (C6H5NO2), yet their atom connectivity bears little resemblance; they have different aromaticity, physicochemical properties and, most importantly, probably dissimilar biological properties.
On the other hand, glucose 3 and galactose 4 in Figure 1 appear almost structurally identical, yet from a pharmacological perspective these compounds are structurally distinct from each other. While much effort has been focused on how to measure the structural similarity between two compounds, the more pertinent question is: how relevant is the concept of structural similarity to the toxicological endpoints currently under analysis?
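One widely used way of expressing structural similarity is the Tanimoto coefficient calculated on sets of structural features. The sketch below uses this measure to reproduce the two situations described above. It is illustrative only: the feature labels are hypothetical, hand-assigned stand-ins for a connectivity-based fingerprint (no cheminformatics toolkit is used), and they deliberately ignore stereochemistry.

```python
from typing import Set


def tanimoto(features_a: Set[str], features_b: Set[str]) -> float:
    """Shared features divided by all features present in either compound."""
    union = features_a | features_b
    if not union:
        return 0.0  # convention assumed here for two empty feature sets
    return len(features_a & features_b) / len(union)


# Hypothetical, hand-assigned connectivity features (no stereochemistry).
nitrobenzene = {"benzene ring", "aromatic C-H", "nitro group", "aromatic C-N"}
aminobenzoquinone = {"quinone C=O", "ring C=C", "primary amine", "vinylic C-N"}

pyranose = {"pyranose ring", "ring O", "secondary OH", "CH2OH", "anomeric OH"}
glucose = set(pyranose)
galactose = set(pyranose)  # same connectivity; differs only in configuration

print(tanimoto(nitrobenzene, aminobenzoquinone))  # same formula, no overlap
print(tanimoto(glucose, galactose))               # identical without stereo
```

A shared molecular formula therefore says nothing about feature overlap, while a stereo-blind description cannot distinguish glucose from galactose at all; neither score addresses whether the compounds share a toxicological mechanism.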
The more dependent a toxicological endpoint is on structurally distinct toxicophores, as with mutagenicity or the uncoupling of oxidative phosphorylation, the less applicable the concept of similarity becomes. This is because minor modifications to the toxicophore can significantly influence toxicological activity, yet major modifications to the structural periphery may have little impact on activity so long as the toxicophore remains intact. When assessing the relevance of positive predictions in these cases, it is not enough to ask how similar the query compound is to other non-mutagenic compounds; instead, one must identify the mutagenicity-attenuating features of structurally-alerting, non-mutagenic compounds and assess whether these features can be safely and accurately extrapolated to the query compound.
Such an assessment is crucial to correctly identifying the relevance of positive predictions for impurities related to parent drugs that are known to be non-mutagenic. For instance, in Figure 2 acebutolol (1) is a non-mutagenic drug that contains the alerting substructure of an aromatic amine (in bold) masked as an amide. With respect to the free aniline 2 (i.e., without the amide group labeled * in 1), we cannot be certain that it is also devoid of mutagenic activity, despite the obvious similarity. This is because the mutagenic mechanism of aromatic amines is postulated to require the free aniline, and 1 could theoretically be inactive due to the metabolically-resistant amide functionality. In the absence of evidence to suggest that the biotransformation occurs to a significant extent in Salmonella bacteria, the free aniline
Figure 1. Selected examples of similar compounds: nitrobenzene (1), 4-amino-2-benzo-quinone (2), glucose (3) and galactose (4).
Figure 2. Acebutolol (1) and a structurally-related compound (2). The amide group of 1 is labeled *.
should therefore be tested if we are to have confidence in its lack of mutagenic activity.
How far we can extrapolate from 2 to other impurities primarily depends upon the reason for attenuation of the mutagenic activity of the alerting aromatic amine, and not on structural similarity. Given that the mechanism of aromatic amine mutagenicity is believed to involve oxidation and subsequent Phase II metabolism of the amino group, there could be at least two mitigating factors. These are unfavorable electronic properties of the alkoxy-carbonyl-substituted aromatic ring and/or unfavorable electrostatic/metabolic properties of the hydroxyl-amino side chain, both of which may preclude one or more of the metabolic steps that lead to successful activation of the aromatic amine. We cannot extrapolate with confidence, therefore, into areas of chemical space where there are significant changes to the electronics of the ring system, because these may counter the currently unfavorable electronic properties, or, similarly, to analogues where we lose the hydroxyl-amino side chain. We can be confident that the lack of mutagenic activity of 2 can be extrapolated to the t-butyl analogue (3 in Figure 3) owing to the minor modification to the side chain, but where changes occur in the aromatic ring (as in 4 in Figure 3) or in the side chain (as in 5 in Figure 3), further evidence is required before we can have robust confidence in their lack of mutagenic activity.
Our confidence in the dismissal of positive predictions for mutagenicity must therefore be based on three factors:
1) Identification of the toxicophore and mechanism of mutagenicity associated with the prediction.
2) Determination of the mutagenicity-attenuating feature(s) associated with relevant non-mutagenic compounds used to support the assessment (the parent compound in the case of drug-related impurities).
3) An assessment showing that the features identified in 2) are still intact for the query compound.
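The three factors above amount to a checklist in which every item must be satisfied before a positive prediction is dismissed. A minimal sketch of such a record is given below; the class and field names are hypothetical, and each field stands for an expert judgment that is supplied by the reviewer, not computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositivePredictionReview:
    # Factor 1: toxicophore and mechanism behind the alert are identified.
    toxicophore_and_mechanism_identified: bool
    # Factor 2: attenuating feature(s) of the supporting non-mutagenic
    # compound(s) have been determined.
    attenuating_features_determined: bool
    # Factor 3: those features are still intact in the query compound.
    attenuating_features_intact_in_query: bool

    def supports_dismissal(self) -> bool:
        """All three factors are required; any gap means further evidence."""
        return (self.toxicophore_and_mechanism_identified
                and self.attenuating_features_determined
                and self.attenuating_features_intact_in_query)


# Mechanism and attenuating features are understood, but the query compound
# no longer carries those features, so the alert cannot be dismissed.
review = PositivePredictionReview(True, True, False)
print(review.supports_dismissal())
```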
Where compounds do not contain a structural alert for mutagenicity, it is often worthwhile to search the public literature to ensure that there are no potential knowledge gaps in the predictive system used. In these cases, the use of whole-molecule similarity measures is a useful way to infer similar mutagenic activity. However, care should be taken to ensure that any searches for similar compounds do
Figure 3. Additional structurally-related compounds to acebutolol (3, 4 and 5).
Figure 4. Proposed mechanism for DNA adduct formation of ptaquiloside.
not return compounds that are mutagenic through a mechanism or functional groups not available to the query compound. While these examples may be considered structurally similar, they should not be used as comparators since they have the potential to form DNA adducts, whereas compounds that are devoid of these reactive functional groups do not have this inherent liability.
4. Applicability domains
There are many methods that may be used to define a model AD, which have been reviewed [32-34]. The AD of a model can be broadly described using two non-exclusive terms:
1) The region of chemical or response space relating to the model training set, and
2) The region of chemical or response space where a model makes an acceptable prediction error.
In the first definition (1), the underlying assumption is that predictions based on interpolations are generally reliable and those based on extrapolations are more likely to be unreliable. This premise is supported by several publications [35-37]. Generally, this is considered to be best achieved using the model descriptors [38,39], but comparable results can be gained from descriptors not used in the model [35,37].
The second definition (2) is more nuanced, and is based on the premise that valuable information can be generated by assessing well predicted compounds, whereas a subset of any training set will be misclassified and similarity to such compounds is no guarantee of reliability [32,36]. Further, predictions for compounds that are dissimilar to the training set are not, by default, in error [32].
Many of the commonly used models for the prediction of mutagenicity provide an AD measure alongside any predictions [40]. In most cases, though, the exact definition is unique to the model, which is consistent with the concept that there is no single method for AD application, and this must be fitted to the model [32]. The partial exceptions to this are the expert systems Derek and Toxtree. Although it is accepted that the scope of the structural alerts in these systems defines their AD [38], this provides little information to the user when an alert is not matched. In the case of Derek, investigations are underway to provide such information to the user, so that a confidence metric can be assigned whatever the output of the model. Such expert systems do not have a straightforward and easily accessible model training set, as this is based on disparate data sources, for example toxicity data, mechanistic information and chemical knowledge, that have been synthesised into a SAR in cerebro. Further, not all data are publicly available; thus, an approach is required that can reflect this expert knowledge and also does not require a complete model training set.
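The first definition, in which the AD is the region of chemical space occupied by the training set, can be sketched as a nearest-neighbour check. This is only one of many possible formulations and is not the AD method of any system named above: the use of Tanimoto similarity, the threshold of 0.5, the number of neighbours and the integer "fingerprint bits" are all assumptions made for illustration.

```python
from typing import List, Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Shared bits divided by all bits set in either fingerprint."""
    union = bits_a | bits_b
    return len(bits_a & bits_b) / len(union) if union else 0.0


def within_domain(query: Set[int], training_set: List[Set[int]],
                  threshold: float = 0.5, k: int = 1) -> bool:
    """True if the mean similarity to the k nearest training compounds
    reaches the (assumed) threshold."""
    if not training_set:
        return False
    similarities = sorted((tanimoto(query, t) for t in training_set),
                          reverse=True)
    nearest = similarities[:k]
    return sum(nearest) / len(nearest) >= threshold


# Invented fingerprints, given as sets of "on" bit positions.
training = [{1, 2, 3, 4}, {2, 3, 4, 5}, {10, 11, 12}]
close_query = {1, 2, 3, 5}        # shares most bits with two training compounds
distant_query = {1, 20, 21, 22}   # shares a single bit with one compound

print(within_domain(close_query, training))
print(within_domain(distant_query, training))
```

A check of this kind flags extrapolation but, in line with the second definition, says nothing about whether the neighbouring training compounds were themselves well predicted.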
It is worth comparing the literature research with the current use-case of predictive models for the assessment of GTIs. Most of the AD studies have aimed to reduce the error in continuous-output QSARs based on assays that provide homogeneous responses (e.g. polar narcosis, LogP or IC50 for protein inhibition). Thus, given the context-specific use of models, there is a data gap on the applicability of ADs to categorical models based on assays which generate a more diverse range of outputs. There are some exceptions to this [33,36,41], but the results of these only showed that there was value in using AD to qualify confidence in positive, rather than negative, predictions. This is perhaps related to the difficulty in deriving a prediction of inactivity, which is in effect predicting the effects of the absence of all features that are likely to bestow activity, and is a common issue in the usage of most QSARs.
5. Expert opinion
The current state of the art for the in silico prediction of genotoxicity, or more specifically Ames mutagenicity, is
Figure 5. CC-1065, an antitumor antibiotic.
at a maturity where it is now being proposed as part of a screening strategy for ICH Guidelines dealing with the assessment and control of genotoxic impurities in drug products. However, there remain two fundamental scientific challenges that, should a universal solution be found, would greatly enhance the predictive power and application of in silico systems. The first is the concept of chemical similarity in the context of inferring mutagenic potential from one compound of known activity to another whose activity is unknown, and the second is being able to define when an in silico model prediction can be considered to be reliable. These two issues are in fact closely linked, as understanding when a compound is chemically similar to a known compound will ultimately bring a better definition of applicability domains.
By way of example, ptaquiloside is a naturally occurring constituent of bracken and has been shown to be genotoxic and carcinogenic. The mechanism of genotoxicity has been suggested as that shown in Figure 4, where the cyclopropyl ring opens under metabolic activation and reacts with DNA. Under normal assay conditions, however, ptaquiloside is not mutagenic in the Ames assay.
The mutagenic activity of CC-1065 and other pro-aryl spirocyclopropanes was later attributed to this cyclopropyl ring-opening mechanism [42]. The structure of CC-1065 shown in Figure 5 would not be considered to be similar to ptaquiloside by most current similarity measures; however, their mechanisms of action for genotoxicity are highly similar. Similarly, current methods for applicability domains would not allow for coverage of one based on inclusion of the other in a given training set.
In the case above, expert systems that can incorporate knowledge of chemical mechanisms will have a distinct advantage over statistical or similarity-based algorithms when predicting the activity of these compounds, yet, for the same reason, the applicability domain of an expert system will be challenging to define accurately. Additional, or more rigorous, applicability domain measures will restrict the scope of predictions, which may be an issue: they may improve the predictivity for compounds considered within the domain, but at the expense of compounds outside the domain, which will presumably have to be tested.
Expert systems may also suffer from the complexity of fully defining the subtle steric and electronic effects of substituents on the reactivity of functional groups. However, this limitation may also be present in statistical algorithm-based systems if there is inadequate coverage in the training set or if the descriptors used do not adequately capture these differences. Assessment of the chemical structure in question by a human expert will therefore enhance our ability to correctly assign activity based on chemically similar compounds.
Finally, whilst chemical space will expand over time, even within a single research project in a single pharmaceutical company [35], the reaction mechanisms and hence the functional group building blocks of chemistry stay largely constant, and generally when data gaps appear, this tends to be for compounds, such as boronic acids, which have been regularly used but rarely tested [43]. In this case, such data can be shared and incorporated into models that will predict their activity, thus demonstrating the value of recent initiatives on data sharing across the pharmaceutical industry.
Declaration of interest
The authors state no conflict of interest and have received no payment in preparation of this manuscript.
BibliographyPapers of special note have been highlighted as
either of interest (�) or of considerable interest(��) to readers.
1. Ashby J, Tennant RW. Definitive
relationships among chemical structure,
carcinogenicity and mutagenicity for
301 chemicals tested by the U.S. NTP.
Mutat Res 1991;257(3):229-306
2. Haworth S, Lawlor T, Mortelmans K,
et al. Salmonella mutagenicity test results
for 250 chemicals. Environ Mutagen
1983;5(Suppl 1):1-142
3. Mortelmans K, Haworth S, Lawlor T,
et al. Salmonella mutagenicity tests: II.
Results from the testing of
270 chemicals. Environ Mutagen
1986;8(Suppl 7):1-119
4. Zeiger E, Anderson B, Haworth S, et al.
Salmonella mutagenicity tests: IV. Results
from the testing of 300 chemicals.
Environ Mol Mutagen
1988;11(Suppl 12):1-157
5. Zeiger E, Anderson B, Haworth S, et al.
Salmonella mutagenicity tests: III. Results
from the testing of 255 chemicals.
Environ Mutagen 1987;9(Suppl 9):1-109
6. Naven RT, Louise-May S, Greene N.
The computational prediction of
genotoxicity. Expert Opin Drug
Metab Toxicol 2010;6(7):797-807.. A useful summary of the models used
for genotoxicity prediction in the
pharmaceutical industry
7. Lynch AM, Sasaki JC, Elespuru R, et al.
New and emerging technologies for
genetic toxicity testing.
Environ Mol Mutagen
2011;52(3):205-23
8. Kirkland D, Aardema M, Henderson L,
Muller L. Evaluation of the ability of a
battery of three in vitro genotoxicity tests
to discriminate rodent carcinogens and
non-carcinogens I. Sensitivity, specificity
and relative predictivity. Mutat Res
2005;584(1-2):1-256
9. Gramatica P. Principles of QSAR models
validation: internal and external.
QSAR Comb Sci 2007;26(5):694-701. Evidence is presented which highlights
the importance of external validation
of computational approaches for
toxicity prediction
10. Tropsha A, Gramatica P, Gombar VK.
The Importance of Being Earnest:
validation is the Absolute Essential for
Successful Application and Interpretation
of QSPR Models. QSAR Comb Sci
2003;22(1):69-77
11. Marchant CA, Briggs KA, Long A. In
silico tools for sharing data and
knowledge on toxicity and metabolism:
derek for windows, meteor, and vitic.
Toxicol Mech Methods
2008;18(2-3):177-87
12. Saiakhov RD, Klopman G. Benchmark
performance of MultiCASE Inc. software
in Ames mutagenicity set. J Chem
Inf Model 2010;50(9):1521
13. Valerio LG Jr, Cross KP.
Characterization and validation of an
in silico toxicology model to predict the
mutagenic potential of drug impurities*.
Toxicol Appl Pharmacol
2012;260(3):209-21
14. Benigni R, Bossa C, Tcheremenskaia O,
Giuliani A. Alternatives to the
carcinogenicity bioassay: in silico
methods, and the in vitro and in vivo
mutagenicity assays. Expert Opin Drug
Metab Toxicol 2010;6(7):809-19
15. Hillebrecht A, Muster W, Brigo A, et al.
Comparative evaluation of in silico
systems for Ames test mutagenicity
prediction: scope and limitations.
Chem Res Toxicol 2011;24(6):843-54
•• Describes the predictivity of the most
commonly used mutagenicity models
for several data sets and identifies areas
for improvement.
16. McCarren P, Bebernitz GR, Gedeck P,
et al. Avoidance of the Ames test liability
for aryl-amines via computation.
Bioorg Med Chem 2011;19(10):3173-82
17. Greene N, Judson PN, Langowski JJ,
et al. Knowledge-based expert systems for
toxicity and metabolism prediction:
DEREK, StAR and METEOR.
SAR QSAR Environ Res
1999;10(2-3):299-314
18. Snyder RD, Pearl GS, Mandakas G,
et al. Assessment of the sensitivity of the
computational programs DEREK,
TOPKAT, and MCASE in the
prediction of the genotoxicity of
pharmaceutical molecules.
Environ Mol Mutagen
2004;43(3):143-58
19. Kamber M, Fluckiger-Isler S,
Engelhardt G, et al. Comparison of the
Ames II and traditional Ames test
responses with respect to mutagenicity,
strain specificities, need for metabolism
and correlation with rodent
carcinogenicity. Mutagenesis
2009;24(4):359-66
20. Snyder RD. Assessment of atypical
DNA intercalating agents in biological
and in silico systems. Mutat Res
2007;623(1-2):72-82
21. United States Food and Drug
Administration. Guidance for Industry,
Genotoxic and Carcinogenic Impurities
in Drug Substance and Drug Products:
Recommended Approaches. 2008.
Available from:
http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm079235.pdf
22. European Medicines Agency. Guideline
on the Limits of Genotoxic Impurities.
2006. Available from:
http://www.emea.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500002903.pdf
23. Muller L, Mauthe RJ, Riley CM, et al.
A rationale for determining, testing, and
controlling specific impurities in
pharmaceuticals that possess potential for
genotoxicity. Regul Toxicol Pharmacol
2006;44(3):198-211
•• Introduces the concept of categories to
define levels of concern for GTIs.
24. Dobo KL, Greene N, Fred C, et al. In
silico methods combined with expert
knowledge rule out mutagenic potential
of pharmaceutical impurities: an industry
survey. Regul Toxicol Pharmacol
2012;62(3):449-55
• An evaluation of negative mutagenicity
prediction accuracy for GTIs across
several pharmaceutical companies and
the benefits of human interpretation
of results.
25. Dobo KL, Greene N, Cyr MO, et al.
The application of structure-based
assessment to support safety and
chemistry diligence to manage genotoxic
impurities in active pharmaceutical
ingredients during drug development.
Regul Toxicol Pharmacol
2006;44(3):282-93
• Highlights the role that predictive
systems can play in the safety
assessment of impurities in
drug development.
26. Glowienke S, Hasselgren C. Use of
Structure Activity Relationship (SAR)
evaluation as a critical tool in the
evaluation of the genotoxic potential of
impurities. In: Genotoxic impurities:
strategies for identification and control.
John Wiley & Sons, Inc; Hoboken, NJ,
USA: 2011. p. 97-120
• This is a review and validation of
mutagenicity models for
GTI assessment.
27. McCarren P, Springer C, Whitehead L.
An investigation into pharmaceutically
relevant mutagenicity data and the
influence on Ames predictive potential.
J Cheminform 2011;3:51
•• This paper includes analyses of the
differences between publicly available
and proprietary pharmaceutical
data sets.
28. Ono A, Takahashi M, Hirose A, et al.
Validation of the (Q)SAR combination
approach for mutagenicity prediction of
flavor chemicals. Food Chem Toxicol
2012;50(5):1538-46
29. Pearl GM, Livingston-Carr S,
Durham SK. Integration of
computational analysis as a sentinel tool
in toxicological assessments. Curr Top
Med Chem 2001;1(4):247-55
30. White AC, Mueller RA, Gallavan RH,
et al. A multiple in silico program
approach for the prediction of
mutagenicity from chemical structure.
Mutat Res 2003;539(1-2):77-89
31. Sushko I, Novotarskyi S, Körner R, et al.
Applicability domains for classification
problems: benchmarking of distance to
models for Ames mutagenicity set.
J Chem Inf Model
2010;50(12):2094-111
32. Horvath D, Marcou G, Varnek A.
Predicting the predictability: a unified
approach to the applicability domain
problem of QSAR models. J Chem
Inf Model 2009;49(7):1762-76
•• This includes an introduction to the
concept of applicability domains and
sets some general principles for
their development.
33. Ellison CM, Sherhod R, Cronin MT,
et al. Assessment of methods to define
the applicability domain of structural
alert models. J Chem Inf Model
2011;51(5):975-85
34. Hewitt M, Ellison CM. Developing the
applicability domain of in silico models:
relevance, importance and methodology.
In: Cronin MTD, Madden JC, editors.
In silico toxicology: principles and
applications. Royal Society of Chemistry;
Cambridge, UK: 2010. p. 301-33
35. Weaver S, Gleeson MP. The importance
of the domain of applicability in QSAR
modeling. J Mol Graph Model
2008;26(8):1315-26
• Evidence for several models is
presented showing that as distance to
training set increases, model
accuracy decreases.
36. Kuhne R, Ebert RU, Schuurmann G.
Chemical domain of QSAR models from
atom-centered fragments. J Chem
Inf Model 2009;49(12):2660-9
37. Sheridan RP, Feuston BP, Maiorov VN,
Kearsley SK. Similarity to molecules in
the training set is a good discriminator
for prediction accuracy in QSAR.
J Chem Inf Comput Sci
2004;44(6):1912-28
• This paper evaluates the performance
of applicability domains for assessing
prediction accuracy.
38. Netzeva TI, Worth A, Aldenberg T,
et al. Current status of methods for
defining the applicability domain of
(quantitative) structure-activity
relationships. The report and
recommendations of ECVAM Workshop
52. Altern Lab Anim 2005;33(2):155-73
•• Provides a high-level summary of the
approaches that may be used for
applicability domain definition.
39. Jaworska J, Nikolova-Jeliazkova N,
Aldenberg T. QSAR applicability domain
estimation by projection of the training
set descriptor space: a review.
Altern Lab Anim 2005;33(5):445-59
40. Fioravanzo E, Bassan A, Pavan M, et al.
Role of in silico genotoxicity tools in the
regulatory assessment of pharmaceutical
impurities. SAR QSAR Environ Res
2012;23(3-4):257-77
41. Ellison CM, Enoch SJ, Cronin MT,
et al. Definition of the applicability
domains of knowledge-based predictive
toxicology expert systems by using a
structural fragment-based approach.
Altern Lab Anim 2009;37(5):533-45
42. Harbach PR, Zimmer DM, Mazurek JH,
Bhuyan BK. Mutagenicity of the
antitumor antibiotic CC-1065 and its
analogues in mammalian (V79) cells and
bacteria. Cancer Res 1988;48(1):32-6
43. O’Donovan MR, Mee CD, Fenner S,
et al. Boronic acids-a novel class of
bacterial mutagen. Mutat Res
2011;724(1-2):1-6
Affiliation
Russell T Naven1, Nigel Greene†1 &
Richard V Williams2
†Author for correspondence
1Compound Safety Prediction Group,
Worldwide Medicinal Chemistry,
Pfizer Worldwide Research and Development,
Eastern Point Road, Groton,
CT 06340, USA
2Lhasa Ltd, 22-23 Blenheim Terrace,
Woodhouse Lane, Leeds,
LS2 9HD, UK