Latest advances in computational genotoxicity prediction

1. Introduction

2. Regulatory application of in silico genotoxicity predictions

3. Chemical similarity

4. Applicability domains

5. Expert opinion

Review

Russell T Naven, Nigel Greene† & Richard V Williams

†Compound Safety Prediction Group, Worldwide Medicinal Chemistry, Pfizer Worldwide Research and Development, Groton, CT, USA

Introduction: Computational approaches for genotoxicity prediction have existed for over two decades. Numerous methodologies have been utilized and the results of various evaluations have been published.

Areas covered: In silico methods are considered mature enough to be part of draft FDA regulatory guidelines for the assessment of genotoxic impurities. However, aspects of how best to use predictive systems remain unresolved: i) methodologies to measure how similar two compounds need to be in order to assume they have the same biological outcome; and ii) defining whether a compound is close enough to the model training set such that a model prediction can be considered reliable.

Expert opinion: In silico prediction of genotoxicity is a fundamental part of screening strategies for the assessment of genotoxic impurities in drug products. However, the concept of using chemical similarity to infer mutagenic potential from one compound of known activity to another whose activity is unknown remains a scientific challenge. Similarly, defining when an in silico model prediction can be considered to be reliable is also difficult. Reaction mechanisms and the functional group building blocks of chemistry are largely constant, and so when data-gaps appear, it tends to be for compounds that have been regularly used but rarely tested.

Keywords: applicability domains, chemical similarity, Derek for Windows, genotoxic impurities, ICH M7, Leadscope Model Applier, MC4PC, Toxtree

Expert Opin. Drug Metab. Toxicol. (2012) 8(12):1579-1587

1. Introduction

The in silico prediction of genotoxicity has been in existence for over two decades, becoming a major research focus with the publication of structural alerts for DNA reactivity from Ashby and Tennant [1]. The early accessibility of large data sets in the public domain [2-5] has resulted in a measure of success in these predictive approaches, in particular for the prediction of the Ames Salmonella assay for mutagenicity [6,7].

The use of any in silico system or model is limited by both the accuracy of the predictions and the confidence in those predictions, but such accuracy and confidence are context dependent. For example, the Ames test is probably the most accurate surrogate assay for genotoxic carcinogenicity [8], yet there is little confidence that non-genotoxic carcinogens will display activity in this in vitro assay. In silico models for the prediction of genotoxicity have to contend with similar issues. In their case, accuracy can generally be considered a property of the system, and confidence, also known as trust or reliability, is assigned to individual predictions.

The accuracy of in silico toxicity predictions is typically measured through internal and external validation of the model using data sets of known experimental activity. Internal validation is used during development to show that statistically-derived models are robust, but it provides little information about their ability to predict the activity of compounds outside of the training set [9,10]. External validation is the gold standard method for evaluating model performance, but results have proved to be very data set and therefore context dependent.

10.1517/17425255.2012.724059 © 2012 Informa UK, Ltd. ISSN 1742-5255. All rights reserved: reproduction in whole or in part not permitted

Commercially available software packages such as Derek for Windows (DfW) [11], now called Derek Nexus, MC4PC [12], and Leadscope Model Applier (LSMA) [13] are now commonly used within the pharmaceutical industry for the prediction of genotoxicity and other toxicological endpoints. Other freely available systems like Toxtree [14] are also being evaluated for their usefulness. Typically, these commercially or freely available predictive mutagenicity models are able to predict the activity of publicly available data sets to a certain degree of accuracy, but this is not the case for proprietary pharmaceutical data [6,15]. The reasons for this difference have been explored, and can be related to chemotype distribution [6,15], physicochemical properties and biological properties [16].

The commercial systems all have strengths and weaknesses and their comparative performances have been extensively reviewed and published [6,15,17,18], with no single system performing significantly better than the others. Depending on the source of the data being used to evaluate a system’s performance, e.g., public domain data or a set of proprietary pharmaceutical-like compounds, the overall concordance of these algorithms is between 70 and 85%. These values are in fact getting close to the inter- and intra-laboratory reproducibility of the Ames assay, reported as 87% [19]. However, a system’s sensitivity, i.e., its ability to accurately predict an Ames positive compound, can vary much more dramatically, from up to 85% for public domain sets to just 17% for pharmaceutical proprietary data [15]. This variability in performance most likely arises because most software applications are trained using only the public data sets, and because most pharmaceutical-like compounds, i.e., the active pharmaceutical ingredients in drug products, tend not to contain the classical DNA-reactive functional groups that are a common cause of genotoxicity. It is possible that these pharmaceutical compounds either undergo rare or unusual metabolic activation and hence are not obviously reactive in and of themselves, or they elicit a positive response in the Ames assay through non-reactive mechanisms such as intercalation [20]. It should also be noted that the ratio of positive to negative compounds is significantly lower in the pharmaceutical-like sets, where typically only 6-10% are considered to be mutagenic in the Ames assay compared to 40-60% in the case of some public sets, and so maintaining an appropriate balance between correct positives and false positives becomes a key challenge for any algorithm.
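To illustrate why concordance alone can mask poor sensitivity when positives are scarce, the following sketch computes concordance and positive predictive value from prevalence, sensitivity and specificity. The prevalence and sensitivity values are taken from the ranges quoted above; the specificity values are assumptions chosen purely for illustration and are not reported in the source.

```python
def concordance(prevalence, sensitivity, specificity):
    """Fraction of all compounds whose prediction matches the assay result."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


def positive_predictive_value(prevalence, sensitivity, specificity):
    """Fraction of positive calls that are true positives."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Prevalence and sensitivity follow the ranges quoted in the text.
# Both specificity values are ASSUMED for illustration only.
scenarios = {
    "pharmaceutical-like set": dict(prevalence=0.08, sensitivity=0.17, specificity=0.95),
    "public set": dict(prevalence=0.50, sensitivity=0.85, specificity=0.80),
}

for name, s in scenarios.items():
    print(name,
          "concordance:", round(concordance(**s), 3),
          "PPV:", round(positive_predictive_value(**s), 3))
```

Under these assumed inputs the pharmaceutical-like scenario reaches a concordance of about 0.89 even though only 17% of the mutagens are found, because the negatives dominate the set; the positive predictive value is the quantity that exposes the imbalance.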

2. Regulatory application of in silico genotoxicity predictions

In parallel with the development of regulatory guidelines on the limits for genotoxic impurities (GTIs) [21,22], the pharmaceutical industry has developed strategies to help with the identification, monitoring and control of these GTIs [23]. One published strategy uses a five-level categorization approach to address the appropriate level of concern associated with a GTI and uses structural alerts for mutagenicity as part of the process. The five categories suggested are shown in Table 1. In this scheme, compounds that have not yet been tested in the Ames assay can be evaluated using an in silico system for the presence of structural alerts for mutagenicity and classified accordingly.

In the categorization scheme in Table 1, chemical similarity is used to identify GTIs containing an alert that are considered to be closely related to the active pharmaceutical ingredient (API), i.e., Class 4 in Table 1. However, this scheme presents the dilemma of how to define what is similar and what is not. This issue of defining chemical similarity is discussed in Section 3.
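The five-class scheme can be sketched as a small decision function. This is a minimal illustration of Table 1, not a published algorithm: the function name, its parameters, and the decision to treat a negative Ames result as "sufficient evidence for absence of genotoxicity" are assumptions made here for the example.

```python
def classify_impurity(ames_result=None, carcinogenic=None,
                      has_alert=False, alert_related_to_api=False):
    """Return the Table 1 class (1-5) for an impurity.

    ames_result:          True (mutagenic), False (non-mutagenic), None (untested)
    carcinogenic:         True, False, or None (unknown)
    has_alert:            an in silico system flagged a structural alert
    alert_related_to_api: the alerting structure is related to the API
    """
    if ames_result is True:
        # Class 1 requires known carcinogenicity; otherwise Class 2.
        return 1 if carcinogenic is True else 2
    if ames_result is False or not has_alert:
        # ASSUMPTION: a negative Ames result counts as sufficient evidence.
        return 5
    # Untested compound carrying an alert: the class depends on API relatedness.
    return 4 if alert_related_to_api else 3


print(classify_impurity(has_alert=True, alert_related_to_api=True))
```

The hard part, as the text notes, is hidden inside the `alert_related_to_api` flag: deciding whether an impurity is "related" to the API is exactly the similarity question the scheme leaves open.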

Since there are multiple commercially available systems, typical implementations of this strategy can include one or more of these in silico systems as part of a screening cascade to identify potential GTIs [24]. In the context of GTIs, validation studies have yielded results with a sensitivity of 48-82%, specificity of 47-97% and concordance of 59-92% depending on the system and data set (Table 2), with expert systems especially showing high sensitivity and negative predictive values (NPV) [24-26]. It should be noted that the Derek system does not actually predict a negative response for any endpoint, but in most evaluation exercises published to date the authors have assumed that a “nothing to report” call from the system is equivalent to a negative prediction.

Glowienke and Hasselgren concluded that both Derek and Multicase covered most of the compound types in their intermediates data set [26]. Similarly, a recent study on Leadscope Model Applier indicated that the chemical space populated by the training set overlapped significantly with that of a drug impurity database, although no prediction metrics were generated for this particular data set [13]. This is in accordance with the distribution of chemotypes between model training and intermediates data sets, bearing in mind that most drug impurities will stem from solvents, reagents and intermediates used in the chemical synthesis of the final active pharmaceutical ingredient. For example, several studies have found that the most prevalent alerts for drug impurities were aromatic amines, aromatic nitro compounds and alkylating agents [25,26], which is similar to the chemotypic distribution for publicly available data [15,27].

Article highlights.

• In silico models for genotoxicity prediction have been in existence for over two decades and many evaluations of their predictive accuracy have been published.

• Model performance is often data-set dependent, but for genotoxic impurities the models display good coverage for data sets of pharmaceutical intermediates.

• In silico predictions can be improved through the use of expert interpretation of model output, whereas the effect of combining models is variable.

• Draft regulatory guidelines for genotoxic impurity assessment include the use of computational approaches for genotoxicity prediction.

• Computationally derived measures of chemical similarity do not perform well for inferring mutagenic activity.

• Applicability domains of models are important to ascertain, but there is a data gap on their effect on model performance.

This box summarises key points contained in the article.

More recently, proposals for the International Conference on Harmonization (ICH) M7 guidance documents may well include the adoption of in silico methods as part of the international guidelines on the assessment and limits for GTIs, although no official documentation has been published yet. Current industry strategies for the use of in silico systems as part of a screening cascade suggest that if a compound is assessed by one or more in silico methods, in addition to using expert judgment to review the underlying data and predictions, then a compound predicted to be negative can be assumed to be non-mutagenic under these guidelines. This strategy implies that negative predictions require no additional scrutiny or testing; the ability of any system to adequately identify negative compounds therefore needs to be understood, and so more recent evaluations have focused on the negative predictive value (NPV), i.e., the percentage of correct negative calls compared to the total number of negative calls [24].
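The NPV and the other metrics defined in the footnotes to Table 2 can be computed from a confusion matrix as sketched below. The counts used here are back-calculated from the Derek 2011 row of Table 2 (159 positives, 495 negatives); they are one set of integers consistent with the reported percentages, not figures taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # mutagens predicted positive
    fn: int  # mutagens predicted negative
    tn: int  # non-mutagens predicted negative
    fp: int  # non-mutagens predicted positive

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def npv(self):
        # Correct negative calls over all negative calls.
        return self.tn / (self.tn + self.fn)

    def concordance(self):
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# ASSUMED counts, chosen to be consistent with the Derek 2011 row of Table 2.
derek_2011 = ConfusionCounts(tp=111, fn=48, tn=361, fp=134)

for name in ("sensitivity", "specificity", "npv", "concordance"):
    print(name, f"{getattr(derek_2011, name)():.1%}")
```

Note that the NPV depends on the number of false negatives relative to true negatives, so it shifts with the proportion of mutagens in the data set even when sensitivity and specificity stay fixed.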

In a survey of current practices across eight pharmaceutical companies, Dobo et al. report that using (Q)SAR systems alone, without any human expert intervention, resulted in an average NPV of 94%, and where expert opinion was incorporated into the assessment this rose to 99%. The models themselves are able to assist in this process. Statistically-derived, data-driven models generally support predictions with a list of the most similar compounds from their training sets, which at least provides the context for the prediction. Derek provides a fraction of these data directly, as example compounds, and the remainder indirectly, through the references used to support the alert. Derek also provides a summary of the toxicity data, mechanistic rationale and explanation of the SAR derivation. However, it should be noted that in their evaluation, Dobo et al. [24] make the assumption that most of the compound sets used contained a representative percentage of Ames positives and that the NPV values reported were significantly better than if they were simply to predict all compounds to be negative.
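The "predict everything negative" baseline mentioned above is simply the fraction of negatives in the data set. As a worked illustration, the sketch below computes that baseline for the data-set compositions listed in Table 2 and compares it with the NPV reported there. These are the Table 2 data sets, not the survey data of Dobo et al. [24], which are not reproduced in this article; the function and variable names are illustrative.

```python
# (positives, negatives, reported NPV) as listed in Table 2
table_2 = {
    "DfW (multiple versions)": (48, 224, 0.94),
    "DfW 11.0": (95, 178, 0.83),
    "MC4PC 2.0, AZ3 (1.9)": (95, 178, 0.73),
    "Derek 2011": (159, 495, 0.88),
}


def all_negative_npv(positives, negatives):
    """NPV obtained by calling every compound negative."""
    return negatives / (positives + negatives)


for system, (pos, neg, reported) in table_2.items():
    baseline = all_negative_npv(pos, neg)
    print(f"{system}: baseline {baseline:.0%}, "
          f"reported {reported:.0%}, gain {reported - baseline:+.0%}")
```

For these sets the all-negative baseline is roughly 82%, 65% and 76%, so a reported NPV is only informative when read against the proportion of negatives in the set it was measured on.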

It is also worth noting that all companies involved in this survey used at least one system, the most common of which was the Derek system from Lhasa Limited. The survey results also suggest that the number and specific combination of in silico tools used did not have a significant influence on the NPV, although a rigorous exploration of one system versus two was not conducted on the data sets. Similarly, combining the output of two or more predictive systems did not significantly improve performance in earlier studies with data sets of intermediates [26] or food additives [28].

Table 1. Categories of genotoxic impurities in pharmaceutical products.

| Category | Description |
|---|---|
| Class 1 | Impurities known to be both genotoxic (mutagenic) and carcinogenic |
| Class 2 | Impurities known to be genotoxic (mutagenic), but with unknown carcinogenic potential |
| Class 3 | Alerting structure, unrelated to the structure of the Active Pharmaceutical Ingredient (API) and of unknown genotoxic (mutagenic) potential |
| Class 4 | Alerting structure, related to the API |
| Class 5 | No alerting structure or sufficient evidence for absence of genotoxicity |

Table 2. Performance metrics for commercial systems when applied to GTI data sets.

| System | Data set composition and reference | Sensitivity* | Specificity‡ | NPV | Concordance§ |
|---|---|---|---|---|---|
| DfW (multiple versions) | 48 positives, 224 negatives; Dobo et al. (2006) [25] | 68% | 97% | 94% | 92% |
| DfW 11.0 | 95 positives, 178 negatives; Glowienke and Hasselgren [26] | 82% | 47% | 83% | 59% |
| MC4PC 2.0, AZ3 (1.9) | 95 positives, 178 negatives; Glowienke and Hasselgren [26] | 48% | 75% | 73% | 66% |
| Derek 2011 | 159 positives, 495 negatives; proprietary intermediates data set | 70% | 73% | 88% | 72% |

*Sensitivity = correctly predicted positives/total number of positives.
‡Specificity = correctly predicted negatives/total number of negatives.
§Concordance = percentage of correctly predicted compounds.

In contrast, other published evaluations of in silico systems have suggested that there are gains in sensitivity and specificity from the use of more than one system; however, the degree of coverage, i.e., the number of compounds that can be predicted, drops and so decreases the utility of these approaches [15,18,29,30]. However, the data published in these evaluations do not include expert opinion as part of the process, and so it cannot be determined whether one in silico system plus expert opinion would have yielded similar or better predictive performance without such a significant drop in coverage. If a combination of systems is used, then a measure of confidence in the consensus prediction can be derived using methods such as distance to model [31].

The validation analyses carried out to date confirm the utility of in silico models for the assessment of GTIs. This should lead to increased confidence in the predictions, especially when expert knowledge is used to put those predictions into context. Confidence in individual predictions should be further increased by assessing whether the query compound is within the applicability domain (AD) of the model.
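The trade-off between consensus accuracy and coverage described above can be made concrete with a toy example. The sketch below assumes one simple consensus rule (a call is only made when two models agree); the published evaluations may use other rules, and the observed results and model calls are invented for illustration.

```python
def consensus(calls_a, calls_b):
    """Per-compound consensus call, or None where the two models disagree."""
    return [a if a == b else None for a, b in zip(calls_a, calls_b)]


def coverage(calls):
    """Fraction of compounds for which a call was made."""
    return sum(c is not None for c in calls) / len(calls)


def accuracy_on_covered(calls, observed):
    """Fraction of correct calls among the compounds that received a call."""
    covered = [(c, o) for c, o in zip(calls, observed) if c is not None]
    return sum(c == o for c, o in covered) / len(covered)


# HYPOTHETICAL data: 1 = Ames positive, 0 = Ames negative.
observed = [1, 1, 0, 0, 0, 1, 0, 0]
model_a = [1, 0, 0, 0, 1, 1, 0, 0]
model_b = [1, 1, 0, 1, 1, 0, 0, 0]

agreed = consensus(model_a, model_b)
print("model A accuracy:", accuracy_on_covered(model_a, observed))
print("model B accuracy:", accuracy_on_covered(model_b, observed))
print("consensus accuracy:", accuracy_on_covered(agreed, observed))
print("consensus coverage:", coverage(agreed))
```

In this invented case the consensus is more accurate than either model alone, but three of the eight compounds receive no call at all and would need expert review or testing.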

3. Chemical similarity

Similarity is a multi-dimensional concept; the similarity between two compounds can be difficult to determine, and it is even more challenging to create a set of guidelines for it. For instance, compounds 1 and 2 in Figure 1 are similar in that they both have the same molecular formula (C6H5NO2), yet their atom connectivity bears little resemblance; they have different aromaticity and physicochemical properties and, most importantly, probably dissimilar biological properties. On the other hand, glucose 3 and galactose 4 in Figure 1 appear almost structurally identical, yet from a pharmacological perspective these compounds are distinct from each other. While much effort has been focused on how to measure the structural similarity between two compounds, the more pertinent question is: how relevant is the concept of structural similarity to the toxicological endpoints currently under analysis?
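One widely used computational measure of structural similarity is the Tanimoto coefficient on sets of structural features. The sketch below applies it to the two pairs discussed above. The feature sets are hand-assigned, hypothetical labels that ignore stereochemistry; they are not the output of any real fingerprinting tool and serve only to show how a numerical score can disagree with biological relevance.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient: shared features divided by all distinct features."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # two empty descriptions are treated as identical
    return len(a & b) / len(a | b)


# HYPOTHETICAL, hand-assigned feature labels (connectivity only, no stereo).
nitrobenzene = {"benzene_ring", "nitro_group"}
aminobenzoquinone = {"quinone_carbonyl", "ring_alkene", "primary_amine"}
glucose = {"pyranose_ring", "secondary_hydroxyl", "hydroxymethyl", "hemiacetal"}
galactose = {"pyranose_ring", "secondary_hydroxyl", "hydroxymethyl", "hemiacetal"}

print("nitrobenzene vs aminobenzoquinone:", tanimoto(nitrobenzene, aminobenzoquinone))
print("glucose vs galactose:", tanimoto(glucose, galactose))
```

With these labels the two C6H5NO2 isomers share nothing, while glucose and galactose are indistinguishable, even though the text points out that the latter pair is pharmacologically distinct. The score reflects the chosen description of the molecules, not the endpoint.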

The more dependent a toxicological endpoint is on structurally distinct toxicophores, such as mutagenicity or the uncoupling of oxidative phosphorylation, the less applicable the concept of similarity becomes. This is because minor modifications to the toxicophore can significantly influence toxicological activity, yet major modifications to the structural periphery may have little impact on activity so long as the toxicophore remains intact. When assessing the relevance of positive predictions in these cases, it is not enough to ask how similar the query compound is to other non-mutagenic compounds; it is also necessary to identify the mutagenicity-attenuating features of structurally-alerting, non-mutagenic compounds and to assess whether these features can be safely and accurately extrapolated to the query compound.

Such an assessment is crucial to correctly identifying the relevance of positive predictions for impurities related to parent drugs that are known to be non-mutagenic. For instance, in Figure 2 acebutolol (1) is a non-mutagenic drug that contains the alerting substructure of an aromatic amine (in bold) masked as an amide. With respect to the free aniline 2 (i.e., without the amide group labeled * in 1), we cannot be certain that it is also devoid of mutagenic activity, despite the obvious similarity. This is because the mutagenic mechanism of aromatic amines is postulated to require the free aniline, and 1 could theoretically be inactive due to the metabolically-resistant amide functionality. In the absence of evidence to suggest that the biotransformation of 1 to 2 exists to a significant extent in Salmonella bacteria, the free aniline should therefore be tested if we are to have confidence in its non-mutagenic activity.

Figure 1. Selected examples of similar compounds: 1, nitrobenzene; 2, 4-amino-2-benzoquinone; 3, glucose; 4, galactose.

Figure 2. Acebutolol (1) and a structurally-related compound (2).

How far we can extrapolate from 2 to other impurities primarily depends upon the reason for attenuation of mutagenic activity of the alerting aromatic amine and not on structural similarity. Given that the mechanism of aromatic amine mutagenicity is believed to involve oxidation and subsequent Phase II metabolism of the amino group, there could be at least two mitigating factors. These are unfavorable electronic properties of the alkoxy-carbonyl-substituted aromatic ring and/or unfavorable electrostatic/metabolic properties of the hydroxyl-amino side chain, both of which may preclude one or more of the metabolic steps that lead to successful activation of the aromatic amine. We cannot extrapolate with confidence, therefore, into areas of chemical space where there are significant changes to the electronics of the ring system, because these may counter the currently unfavorable electronic properties, or, similarly, to analogues where we lose the hydroxyl-amino side chain. We can be confident that the non-mutagenic activity of 2 can be extrapolated to the t-butyl analogue (3 in Figure 3) owing to the minor modification to the side chain, but where changes occur in the aromatic ring (as in 4 in Figure 3) or in the side chain (as in 5 in Figure 3), further evidence is required before we can have robust confidence in their lack of mutagenic activity.

Our confidence in the dismissal of positive predictions for mutagenicity must therefore be based on three factors:

1) Identification of the toxicophore and mechanism of mutagenicity associated with the prediction.

2) Determination of the mutagenicity-attenuating feature(s) associated with relevant non-mutagenic compounds used to support the assessment (the parent compound in the case of drug-related impurities).

3) An assessment showing that the features identified in step 2 are still intact for the query compound.
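The three factors above can be sketched as an explicit checklist. This is a hypothetical illustration rather than a validated procedure: the class, its field names and the feature labels are invented here, and in practice each input is an expert judgment rather than a computed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DismissalEvidence:
    mechanism_identified: bool       # factor 1: toxicophore and mechanism known
    attenuating_features: frozenset  # factor 2: features explaining comparator inactivity
    query_features: frozenset        # features present in the query compound


def can_dismiss_alert(evidence):
    """True only if all three factors are satisfied."""
    if not evidence.mechanism_identified:
        return False
    if not evidence.attenuating_features:
        return False
    # Factor 3: every candidate attenuating feature must be intact in the query.
    return evidence.attenuating_features <= evidence.query_features


# HYPOTHETICAL feature labels for the acebutolol-related anilines discussed above.
attenuators = frozenset({"alkoxy_carbonyl_ring", "hydroxy_amino_side_chain"})

side_chain_retained = DismissalEvidence(True, attenuators,
                                        frozenset({"alkoxy_carbonyl_ring",
                                                   "hydroxy_amino_side_chain",
                                                   "t_butyl"}))
side_chain_lost = DismissalEvidence(True, attenuators,
                                    frozenset({"alkoxy_carbonyl_ring"}))

print(can_dismiss_alert(side_chain_retained), can_dismiss_alert(side_chain_lost))
```

Because it is not known which of the candidate features is responsible for the attenuation, the sketch conservatively requires all of them to be retained before the alert is dismissed.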

In the instance where compounds do not contain a structural alert for mutagenicity, it is often worthwhile to search the public literature to ensure that there are no potential knowledge gaps in the predictive system used. In these cases, the use of whole-molecule similarity measures is a useful way to infer similar mutagenic activity. However, care should be taken to ensure that any searches for similar compounds do not return compounds that are mutagenic through a mechanism or functional groups not available to the query compound. While such compounds may be considered structurally similar, they should not be used as comparators, since they have the potential to form DNA adducts whereas compounds that are devoid of these reactive functional groups do not have this inherent liability.

Figure 3. Additional structurally-related compounds to acebutolol (3, 4 and 5).

Figure 4. Proposed mechanism for DNA adduct formation of ptaquiloside.

4. Applicability domains

There are many methods that may be used to define a model AD, which have been reviewed [32-34]. The AD of a model can be broadly described using two non-exclusive terms:

1) The region of chemical or response space relating to the model training set, and

2) The region of chemical or response space where a model makes an acceptable prediction error.

In the first definition (1), the underlying assumption is that predictions based on interpolations are generally reliable and those based on extrapolations are more likely to be unreliable. This premise is supported by several publications [35-37]. Generally, this is considered to be best achieved using the model descriptors [38,39], but comparable results can be gained from descriptors not used in the model [35,37].

The second definition (2) is more nuanced, and is based on the premise that valuable information can be generated by assessing well-predicted compounds, whereas a subset of any training set will be misclassified and similarity to such compounds is no guarantee of reliability [32,36]. Further, predictions for compounds that are dissimilar to the training set are not, by default, in error [32].

Many of the commonly used models for the prediction of mutagenicity provide an AD measure alongside any predictions [40]. In most cases, though, the exact definition is unique to the model, which is consistent with the concept that there is no single method for AD application, and this must be fitted to the model [32]. The partial exceptions to this are the expert systems Derek and Toxtree. Although it is accepted that the scope of the structural alerts in these systems defines their AD [38], this provides little information to the user when an alert is not matched. In the case of Derek, investigations are underway to provide such information to the user, so that a confidence metric can be assigned whatever the output of the model. Such expert systems do not have a straightforward and easily accessible model training set, as this is based on disparate data sources, for example toxicity data, mechanistic information and chemical knowledge, that have been synthesised into a SAR in cerebro. Further, not all data are publicly available; thus, an approach is required that can reflect this expert knowledge and does not require a complete model training set.
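The two AD definitions can be contrasted with a minimal sketch. It assumes a simple descriptor-range check for definition 1 and a nearest-neighbour local accuracy for definition 2; these are only two of the many approaches reviewed in the cited literature, and the descriptor values (taken to be pre-scaled to comparable ranges) and correctness flags are invented for illustration.

```python
def in_descriptor_range(query, training):
    """Definition 1: is every query descriptor inside the training-set range?"""
    for i, value in enumerate(query):
        column = [row[i] for row in training]
        if not (min(column) <= value <= max(column)):
            return False  # the prediction would be an extrapolation
    return True


def local_accuracy(query, training, correct, k=3):
    """Definition 2: fraction of the k nearest training compounds predicted correctly."""
    def squared_distance(row):
        return sum((q - r) ** 2 for q, r in zip(query, row))

    nearest = sorted(range(len(training)),
                     key=lambda i: squared_distance(training[i]))[:k]
    return sum(correct[i] for i in nearest) / len(nearest)


# HYPOTHETICAL training set: two scaled descriptors per compound, plus a flag
# recording whether the model classified that training compound correctly.
training = [(0.1, 0.2), (0.2, 0.4), (0.4, 0.3), (0.8, 0.9), (0.9, 0.7)]
correct = [True, True, True, False, True]

for query in [(0.3, 0.3), (0.85, 0.8), (1.2, 0.8)]:
    print(query, in_descriptor_range(query, training),
          round(local_accuracy(query, training, correct), 2))
```

The second query lies inside the descriptor range yet sits next to a misclassified training compound, which is the situation definition 2 is meant to capture: proximity to the training set does not by itself guarantee a reliable prediction.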

It is worth comparing the literature research with the current use-case of predictive models for the assessment of GTIs. Most of the studies have used models trained to reduce the error in continuous-output QSARs, based on assays that provide homogeneous responses (e.g., polar narcosis, LogP or IC50 for protein inhibition). Thus, given the context-specific use of models, there is a data-gap on the applicability of ADs to categorical models based on assays which generate a more diverse range of outputs. There are some exceptions to this [33,36,41], but the results of these only showed that there was value in using AD to qualify confidence in positive, rather than negative, predictions. This is perhaps related to the difficulty in deriving a prediction of inactivity, which is in effect predicting the effects of the absence of all features that are likely to bestow activity, and is a common issue in the usage of most QSARs.

5. Expert opinion

The current state of the art for the in silico prediction of genotoxicity, or more specifically Ames mutagenicity, is at a maturity where it is now being proposed as part of a screening strategy for ICH Guidelines dealing with the assessment and control of genotoxic impurities in drug products. However, there remain two fundamental scientific challenges for which a universal solution, should one be found, would greatly enhance the predictive power and application of in silico systems. The first is the concept of chemical similarity in the context of inferring mutagenic potential from one compound of known activity to another whose activity is unknown, and the second is being able to define when an in silico model prediction can be considered to be reliable. These two issues are in fact closely linked, as understanding when a compound is chemically similar to a known compound will ultimately bring a better definition of applicability domains.

Figure 5. CC-1065, an antitumor antibiotic.

By way of example, ptaquiloside is a naturally occurring constituent of bracken and has been shown to be genotoxic and carcinogenic. The mechanism of genotoxicity has been suggested as that shown in Figure 4, where the cyclopropyl ring opens under metabolic activation and reacts with DNA. Under normal assay conditions, however, ptaquiloside is not mutagenic in the Ames assay.

This cyclopropyl ring opening mechanism was later invoked to explain the mutagenic activity of CC-1065 and other pro-aryl spirocyclopropanes [42]. The structure of CC-1065 shown in Figure 5 would not be considered to be similar to ptaquiloside by most current similarity measures; however, their mechanisms of action for genotoxicity are highly similar. Similarly, current methods for applicability domains would not allow for coverage of one based on inclusion of the other in a given training set.

In the case above, expert systems that can incorporate knowledge of chemical mechanisms will have a distinct advantage over statistical or similarity-based algorithms when predicting the activity of these compounds, yet, for the same reason, the applicability domain of an expert system will be challenging to define accurately. Additional, or more rigorous, applicability domain measures will restrict the scope of predictions, which may be an issue: they may improve the predictivity for compounds considered within the domain, but at the expense of compounds outside the domain, which will presumably have to be tested.

Expert systems may also suffer from the complexity of fully defining the subtle steric and electronic effects of substituents on the reactivity of functional groups. However, this limitation may also be present in statistical algorithm-based systems if there is inadequate coverage in the training set or if the descriptors used do not adequately capture these differences. Assessment of the chemical structure in question by a human expert will therefore enhance our ability to correctly assign activity based on chemically similar compounds.

Finally, whilst chemical space will expand over time, even within a single research project in a single pharmaceutical company [35], the reaction mechanisms and hence the functional group building blocks of chemistry stay largely consistent, and generally when data-gaps appear, this tends to be for compounds, such as boronic acids, which have been regularly used but rarely tested [43]. In this case, such data can be shared and incorporated into models that will predict their activity, thus demonstrating the value of recent initiatives on data sharing across the pharmaceutical industry.

Declaration of interest

The authors state no conflict of interest and have received no payment in preparation of this manuscript.




BibliographyPapers of special note have been highlighted as

either of interest (�) or of considerable interest(��) to readers.

1. Ashby J, Tennant RW. Definitive

relationships among chemical structure,

carcinogenicity and mutagenicity for

301 chemicals tested by the U.S. NTP.

Mutat Res 1991;257(3):229-306

2. Haworth S, Lawlor T, Mortelmans K,

et al. Salmonella mutagenicity test results

for 250 chemicals. Environ Mutagen

1983;5(Suppl 1):1-142

3. Mortelmans K, Haworth S, Lawlor T,

et al. Salmonella mutagenicity tests: II.

Results from the testing of

270 chemicals. Environ Mutagen

1986;8(Suppl 7):1-119

4. Zeiger E, Anderson B, Haworth S, et al.

Salmonella mutagenicity tests: IV. Results

from the testing of 300 chemicals.

Environ Mol Mutagen

1988;11(Suppl 12):1-157

5. Zeiger E, Anderson B, Haworth S, et al.

Salmonella mutagenicity tests: III. Results

from the testing of 255 chemicals.

Environ Mutagen 1987;9(Suppl 9):1-109

6. Naven RT, Louise-May S, Greene N.

The computational prediction of

genotoxicity. Expert Opin Drug

Metab Toxicol 2010;6(7):797-807.. A useful summary of the models used

for genotoxicity prediction in the

pharmaceutical industry

7. Lynch AM, Sasaki JC, Elespuru R, et al.

New and emerging technologies for

genetic toxicity testing.

Environ Mol Mutagen

2011;52(3):205-23

8. Kirkland D, Aardema M, Henderson L,

Muller L. Evaluation of the ability of a

battery of three in vitro genotoxicity tests

to discriminate rodent carcinogens and

non-carcinogens I. Sensitivity, specificity

and relative predictivity. Mutat Res

2005;584(1-2):1-256

9. Gramatica P. Principles of QSAR models

validation: internal and external.

QSAR Comb Sci 2007;26(5):694-701. Evidence is presented which highlights

the importance of external validation

of computational approaches for

toxicity prediction

10. Tropsha A, Gramatica P, Gombar VK.

The Importance of Being Earnest:

validation is the Absolute Essential for

Successful Application and Interpretation

of QSPR Models. QSAR Comb Sci

2003;22(1):69-77

11. Marchant CA, Briggs KA, Long A. In

silico tools for sharing data and

knowledge on toxicity and metabolism:

derek for windows, meteor, and vitic.

Toxicol Mech Methods

2008;18(2-3):177-87

12. Saiakhov RD, Klopman G. Benchmark

performance of MultiCASE Inc. software

in Ames mutagenicity set. J Chem

Inf Model 2010;50(9):1521

13. Valerio LG Jr, Cross KP.

Characterization and validation of an

in silico toxicology model to predict the

mutagenic potential of drug impurities.

Toxicol Appl Pharmacol

2012;260(3):209-21

14. Benigni R, Bossa C, Tcheremenskaia O,

Giuliani A. Alternatives to the

carcinogenicity bioassay: in silico

methods, and the in vitro and in vivo

mutagenicity assays. Expert Opin Drug

Metab Toxicol 2010;6(7):809-19

15. Hillebrecht A, Muster W, Brigo A, et al.

Comparative evaluation of in silico

systems for Ames test mutagenicity

prediction: scope and limitations.

Chem Res Toxicol 2011;24(6):843-54

•• Describes the predictivity of the most

commonly used mutagenicity models

for several data sets and identifies areas

for improvement

16. McCarren P, Bebernitz GR, Gedeck P,

et al. Avoidance of the Ames test liability

for aryl-amines via computation.

Bioorg Med Chem 2011;19(10):3173-82

17. Greene N, Judson PN, Langowski JJ,

et al. Knowledge-based expert systems for

toxicity and metabolism prediction:

DEREK, StAR and METEOR.

SAR QSAR Environ Res

1999;10(2-3):299-314

18. Snyder RD, Pearl GS, Mandakas G,

et al. Assessment of the sensitivity of the

computational programs DEREK,

TOPKAT, and MCASE in the

prediction of the genotoxicity of

pharmaceutical molecules.

Environ Mol Mutagen

2004;43(3):143-58

19. Kamber M, Fluckiger-Isler S,

Engelhardt G, et al. Comparison of the

Ames II and traditional Ames test

responses with respect to mutagenicity,

strain specificities, need for metabolism

and correlation with rodent

carcinogenicity. Mutagenesis

2009;24(4):359-66

20. Snyder RD. Assessment of atypical

DNA intercalating agents in biological

and in silico systems. Mutat Res

2007;623(1-2):72-82

21. United States Food and Drug

Administration. Guidance for Industry,

Genotoxic and Carcinogenic Impurities

in Drug Substance and Drug Products:

Recommended Approaches. 2008.

Available from: http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm079235.pdf

22. European Medicines Agency. Guideline

on the Limits of Genotoxic Impurities.

2006. Available from: http://www.emea.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500002903.pdf

23. Muller L, Mauthe RJ, Riley CM, et al.

A rationale for determining, testing, and

controlling specific impurities in

pharmaceuticals that possess potential for

genotoxicity. Regul Toxicol Pharmacol

2006;44(3):198-211

•• Introduces the concept of categories to

define levels of concern for GTIs

24. Dobo KL, Greene N, Fred C, et al. In

silico methods combined with expert

knowledge rule out mutagenic potential

of pharmaceutical impurities: an industry

survey. Regul Toxicol Pharmacol

2012;62(3):449-55

• An evaluation of negative mutagenicity

prediction accuracy for GTIs across

several pharmaceutical companies and

the benefits of human interpretation

of results

25. Dobo KL, Greene N, Cyr MO, et al.

The application of structure-based

assessment to support safety and

chemistry diligence to manage genotoxic

impurities in active pharmaceutical

ingredients during drug development.

Regul Toxicol Pharmacol

2006;44(3):282-93

• Highlights the role that predictive

systems can play in the safety

assessment of impurities in

drug development

26. Glowienke S, Hasselgren C. Use of

Structure Activity Relationship (SAR)

evaluation as a critical tool in the

evaluation of the genotoxic potential of

impurities. In: Genotoxic impurities:

strategies for identification and control.

John Wiley & Sons, Inc; Hoboken, NJ,

USA: 2011. p. 97-120

• This is a review and validation of

mutagenicity models for

GTI assessment.

27. McCarren P, Springer C, Whitehead L.

An investigation into pharmaceutically

relevant mutagenicity data and the

influence on Ames predictive potential.

J Cheminform 2011;3:51

•• This paper includes analyses of the

differences between publicly available

and proprietary pharmaceutical

data sets.

28. Ono A, Takahashi M, Hirose A, et al.

Validation of the (Q)SAR combination

approach for mutagenicity prediction of

flavor chemicals. Food Chem Toxicol

2012;50(5):1538-46

29. Pearl GM, Livingston-Carr S,

Durham SK. Integration of

computational analysis as a sentinel tool

in toxicological assessments. Curr Top

Med Chem 2001;1(4):247-55

30. White AC, Mueller RA, Gallavan RH,

et al. A multiple in silico program

approach for the prediction of

mutagenicity from chemical structure.

Mutat Res 2003;539(1-2):77-89

31. Sushko I, Novotarskyi S, Körner R, et al.

Applicability domains for classification

problems: benchmarking of distance to

models for Ames mutagenicity set.

J Chem Inf Model

2010;50(12):2094-111

32. Horvath D, Marcou G, Varnek A.

Predicting the predictability: a unified

approach to the applicability domain

problem of QSAR models. J Chem

Inf Model 2009;49(7):1762-76

•• This includes an introduction to the

concept of applicability domains and

sets some general principles for

their development.

33. Ellison CM, Sherhod R, Cronin MT,

et al. Assessment of methods to define

the applicability domain of structural

alert models. J Chem Inf Model

2011;51(5):975-85

34. Hewitt M, Ellison CM. Developing the

applicability domain of in silico models:

relevance, importance and methodology.

In: Cronin MTD, Madden JC, editors.

In silico toxicology: principles and

applications. Royal Society of Chemistry;

Cambridge, UK: 2010. p. 301-33

35. Weaver S, Gleeson MP. The importance

of the domain of applicability in QSAR

modeling. J Mol Graph Model

2008;26(8):1315-26

• Evidence for several models is

presented showing that as distance to

training set increases, model

accuracy decreases.

36. Kuhne R, Ebert RU, Schuurmann G.

Chemical domain of QSAR models from

atom-centered fragments. J Chem

Inf Model 2009;49(12):2660-9

37. Sheridan RP, Feuston BP, Maiorov VN,

Kearsley SK. Similarity to molecules in

the training set is a good discriminator

for prediction accuracy in QSAR.

J Chem Inf Comput Sci

2004;44(6):1912-28

• This paper evaluates the performance

of applicability domains for assessing

prediction accuracy.

38. Netzeva TI, Worth A, Aldenberg T,

et al. Current status of methods for

defining the applicability domain of

(quantitative) structure-activity

relationships. The report and

recommendations of ECVAM Workshop

52. Altern Lab Anim 2005;33(2):155-73

•• Provides a high-level summary of the

approaches that may be used for

applicability domain definition.

39. Jaworska J, Nikolova-Jeliazkova N,

Aldenberg T. QSAR applicability domain

estimation by projection of the training

set descriptor space: a review.

Altern Lab Anim 2005;33(5):445-59

40. Fioravanzo E, Bassan A, Pavan M, et al.

Role of in silico genotoxicity tools in the

regulatory assessment of pharmaceutical

impurities. SAR QSAR Environ Res

2012;23(3-4):257-77

41. Ellison CM, Enoch SJ, Cronin MT,

et al. Definition of the applicability

domains of knowledge-based predictive

toxicology expert systems by using a

structural fragment-based approach.

Altern Lab Anim 2009;37(5):533-45

42. Harbach PR, Zimmer DM, Mazurek JH,

Bhuyan BK. Mutagenicity of the

antitumor antibiotic CC-1065 and its

analogues in mammalian (V79) cells and

bacteria. Cancer Res 1988;48(1):32-6

43. O’Donovan MR, Mee CD, Fenner S,

et al. Boronic acids - a novel class of

bacterial mutagen. Mutat Res

2011;724(1-2):1-6

Affiliation

Russell T Naven1, Nigel Greene†1 &

Richard V Williams2

†Author for correspondence

1Compound Safety Prediction Group,

Worldwide Medicinal Chemistry,

Pfizer Worldwide Research and Development,

Eastern Point Road, Groton,

CT 06340, USA

2Lhasa Ltd, 22-23 Blenheim Terrace,

Woodhouse Lane, Leeds,

LS2 9HD, UK


