
Predictive modeling of chemical toxicity towards Pseudokirchneriella subcapitata using regression and classification based approaches

Subrata Pramanik, Kunal Roy*

Drug Theoretics and Cheminformatics Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India

Article info

Article history: Received 21 November 2013; received in revised form 28 December 2013; accepted 30 December 2013; available online 20 January 2014

Keywords: Biodiversity; Toxicity; QSTR; LDA; Pseudokirchneriella subcapitata

Abstract

Biodiversity nurturing may be a valuable pathway for controlling chemical stress on the ecosystem. In the present work, in silico studies have been performed to develop regression based quantitative structure-toxicity relationship (QSTR) models, using a data set of 105 organic chemicals, for the prediction of 48-h chemical toxicity towards Pseudokirchneriella subcapitata. Classification based linear discriminant analysis (LDA) was also performed on the same data set to distinguish toxic from nontoxic chemicals. The developed models were found to possess good predictive quality in terms of internal, external and overall validation parameters. The regression based QSTR model suggests that the second order molecular connectivity index (molecular size and lipophilicity), density (aromaticity), relative shape of molecules (cyclicity/aromaticity) and specific molecular fragments are important properties governing the toxicity of chemicals to P. subcapitata. The classification based LDA QSTR model suggests that fused ring aromatic systems, secondary carbon atom fragments, the second order valence molecular connectivity index (molecular size and branching) and molecular weight are the features that distinguish toxic from nontoxic chemicals.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

Environmental management aims to protect living species from the stresses arising from chemicals released into ecosystems (Cardinale et al., 2012; Hartung, 2009). Every species plays an important role in monitoring evolutionary diversification (Cardinale et al., 2012). The dynamics of all ecosystems are governed by the intrinsic and extrinsic functions of individual species and by their close interactions with non-living components of the ecosystem, through processes such as bioaccumulation and excretion (Ahrens and Traas, 2007; Bell et al., 2005; Brose et al., 2004; Emerson and Kolm, 2005; Gravel et al., 2011). The number of species in an ecosystem and their traits are combined predictors of many ecological processes, such as rates of bioconservation, biomass sequestration, productivity, sustainable management of natural resources and biogeochemical cycling (Chapin et al., 2000; Hector and Bagchi, 2007; Kolter and Greenberg, 2006). In this context, algal communities provide valuable services to environmental management.

Over the last few decades, the environment has been heavily exposed to the products of chemical industrialization, mostly through the increased use of agricultural fertilizers and pharmaceuticals, fossil fuel combustion, biomedical waste and petrochemicals (Planson et al., 2012; Rohr et al., 2008). Unrestricted release of chemicals into the environment is one of the leading causes of pollution worldwide (Scherb and Voigt, 2011). A number of biomedical and ecological problems, such as global warming, melting of ice caps, loss of biodiversity and disturbance of biogeochemical cycles, arise from the chemical revolution (Cardinale et al., 2012; Gonzalez et al., 2011; Raes et al., 2011; Scherb and Voigt, 2011). Therefore, the environment needs to be nurtured and conserved for ecological functions, which may require natural biodiversity. Several studies suggest that conservation of biodiversity may be a valuable pathway for controlling chemical stress, or chemical toxicity, in the ecosystem (Bell et al., 2005; Cardinale et al., 2012; Chapin et al., 2000; Hector and Bagchi, 2007). Understanding the toxicity of chemicals to different species is becoming a focus of environmental chemistry (Azarbad et al., 2013; Daouk et al., 2013; Garrigues, 2005). The effects of chemicals on individual species depend on the interaction of the chemicals with the cellular microenvironment (Hartung et al., 2012). Screening a large number of chemicals for toxicity, and understanding their complex cellular interactions, requires defined in vitro and in vivo experiments, which face socioeconomic and bioethical complications such as time, cost, the number of animals needed and difficulties in correlation with, and interpretation for, the human system (Ahrens and Traas, 2007; Garrigues, 2005). To assist experimental work, reliable simulation/theoretical analysis should be performed to explain chemical toxicity towards different species. Quantitative structure-toxicity/activity/property relationship (QSTR/QSAR/QSPR) methods can be applied to relate the structure of chemicals to their toxic potency and to categorize chemicals into toxic and nontoxic groups (Du et al., 2008; Hartung, 2009; Lee et al., 2013; Shi and Yang, 2013; Yuan et al., 2012). Therefore, the present study focuses on exploring the chemical attributes of pesticides, polycyclic aromatic hydrocarbons, nitriles and aldehydes, along with other pollutants, that govern their toxicity towards Pseudokirchneriella subcapitata (Chen et al., 2009).

Contents lists available at ScienceDirect

journal homepage: www.elsevier.com/locate/ecoenv

Ecotoxicology and Environmental Safety

0147-6513/$ - see front matter © 2014 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ecoenv.2013.12.030

* Corresponding author. Present address: Manchester Institute of Biotechnology, 131 Princess Street, Manchester M1 7DN, United Kingdom. Fax: +91 33 2837 1078.

E-mail addresses: [email protected], [email protected] (K. Roy). URL: http://sites.google.com/site/kunalroyindia/ (K. Roy).

Ecotoxicology and Environmental Safety 101 (2014) 184–190

2. Materials and methods

2.1. The dataset selection

A number of QSAR models on the toxicity of chemicals to P. subcapitata can be found in the literature (Supplemental data Table S1). The toxicity values (EC50) vary with experimental conditions. The EC50 value is the effective concentration of a chemical/drug that inhibits the growth of, or kills, 50% of the test organisms. The growth inhibition assay on P. subcapitata is usually performed for different durations (24 h, 48 h, 72 h and 96 h). A review of previous QSAR studies on P. subcapitata toxicity (Supplemental data Table S1) shows that QSARs on 48-h toxicity data were previously developed on small numbers of chemicals (references of the previous QSAR studies are listed in the Supplemental data at the end of Table S1). This prompted us to develop further QSAR models for 48-h data using a larger chemical set. QSAR models on 72-h toxicity data for P. subcapitata have recently been reported for non-polar and polar narcotic chemicals by Aruoja et al. (2014). However, QSAR models developed on toxicity data for different exposure durations should not be directly compared; according to the OECD guidelines for QSAR model development (http://www.oecd.org/env/ehs/risk-assessment/37849783.pdf), the endpoint of a QSAR model should be a defined one. In this perspective, the 48-h toxicity data of diverse chemicals (Chen et al., 2009) were used in the present study to develop QSTR models that should be applicable to structurally diverse organic chemicals. The data set contains 108 chemicals comprising benzenes, alkanes, phenols, anilines, aldehydes, nitriles, alcohols, ketones, pesticides and polycyclic aromatic hydrocarbons (PAHs). The toxicity value (EC50) was not reported for 2,4,6-trichlorophenol. In a preliminary regression analysis, formaldehyde and acetaldehyde were observed to act as outliers (showing high residual values). A number of studies have reported that formaldehyde forms a polymer (paraformaldehyde) and acetaldehyde forms several polymers such as paraldehyde, metaldehyde and polyacetaldehyde (Furukawa and Saegusa, 1962; Ishida, 1981; Li et al., 2011), so these two aldehydes were not used in model development. Therefore, three chemicals were excluded from this study and the remaining 105 chemicals were used in the modeling. The toxicity of the chemicals was reported in terms of EC50 (mg/l) against P. subcapitata, i.e., the effective concentration of a chemical that inhibits the growth of these algae by 50%. The EC50 (mg/l) values were converted to millimolar units and then to the negative logarithmic scale (pEC50, mM) for model development (Supplemental data Table S2).
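The unit conversion described above can be sketched as follows. This is a minimal sketch that assumes the conversion divides the EC50 in mg/l by the molecular weight in g/mol (which yields mmol/l, i.e. mM) before taking the negative base-10 logarithm; the function name and the example values are hypothetical and are not taken from the paper's data set.

```python
import math


def pec50_mM(ec50_mg_per_l: float, mol_weight_g_per_mol: float) -> float:
    """Convert an EC50 in mg/l to pEC50 on the millimolar scale (hypothetical helper).

    mg/l divided by g/mol gives mmol/l (mM); pEC50 = -log10(EC50 in mM),
    so a larger pEC50 means a more toxic chemical.
    """
    if ec50_mg_per_l <= 0 or mol_weight_g_per_mol <= 0:
        raise ValueError("EC50 and molecular weight must be positive")
    ec50_mM = ec50_mg_per_l / mol_weight_g_per_mol
    return -math.log10(ec50_mM)


# Hypothetical example: EC50 = 2.5 mg/l for a compound with MW = 94.11 g/mol.
print(round(pec50_mM(2.5, 94.11), 3))
```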

2.2. Software

Structures of all chemicals were drawn using Marvin Sketch 5.10.0 (ChemAxon Ltd., http://www.chemaxon.com). The open source PaDEL-Descriptor software (http://padel.nus.edu.sg/software/padeldescriptor/) was used to calculate extended topochemical atom (ETA) indices, while non-ETA descriptors were calculated using Cerius2 version 4.10 (http://accelrys.com/products). SPSS (http://www.spss.com) was used for k-means clustering analysis for data set division (training and test sets) and for ROC analysis. Stepwise multiple linear regression (MLR) and partial least squares (PLS) regression were performed with MINITAB version 14.13 (http://www.minitab.com). The variable importance plot (VIP) and the Y-randomization test for the PLS regression based QSTR were carried out using SIMCA-P software (UMETRICS SIMCA-P 10.0, www.umetrics.com, Umeå, Sweden). STATISTICA version 7.1 (http://www.statsoft.com/) was used to perform linear discriminant analysis (LDA).

2.3. Descriptor calculation

A set of ETA and non-ETA (2D and 3D) descriptors was used as the pool of independent variables for model development. The non-ETA descriptors are topological (Balaban, Kappa shape indices, flexibility index, subgraph count indices, molecular connectivity indices, Wiener and Zagreb), thermodynamic (AlogP98 and MolRef), structural (MW, Rotlbonds, Hbond acceptor, Hbond donor and Chiral centers), spatial (RadOfGyration, Jurs descriptors, area, density, principal moment of inertia and molar volume), electronic (HOMO, LUMO, superdelocalizability and dipole moment), atom types (Atype) and electrotopological state indices (Supplemental data Table S3). The descriptors were chosen for their precise applicability, predictive ability and easy interpretability with respect to pEC50. Conformer generation followed by energy minimization was performed prior to 3D descriptor calculation. Multiple conformations of each molecule were generated using the optimal search as the conformational search method. Each conformer was subjected to energy minimization with the smart minimizer under the open force field (OFF) to obtain the lowest energy conformation of each structure. The Gasteiger method was used to calculate partial charges (Gasteiger and Marsili, 1980). Because of the importance of lipophilicity in aquatic toxicity modeling, the experimental partition coefficient (log Kow), taken from the literature (Chen et al., 2009), was also tried as an additional descriptor.

2.4. Dataset splitting and model development

The data set (Ntotal = 105) was divided into training and test sets using the k-means clustering technique. Approximately 30% of the chemicals were selected as test set members (Ntest = 31) and the remaining 70% as training set members (Ntraining = 74) (Dougherty et al., 2002; Everitt et al., 2001; Johnson and Wichern, 2005). The split was made so that each set covers the chemical space of the entire data set (Martin et al., 2012). The same division was used for both the regression based QSTR and the LDA studies. The models were generated using the structural information of the training set chemicals, and the test set chemicals were used to check model reliability, i.e., the external predictive quality of the models. The regression based QSTR models were developed using the partial least squares (PLS) method: the descriptors that appeared in stepwise MLR with stepping criteria F-to-enter = 4 and F-to-remove = 3.9 (Darlington, 1990) were subjected to PLS regression (Eriksson et al., 2001, 2002; Wold, 1995). PLS is a more robust regression method than MLR and avoids the problem of intercorrelation among descriptors that affects the latter. Linear discriminant analysis (LDA) was also applied to identify the features that discriminate between more and less toxic chemicals (Fisher, 1936; Mitteroecker and Bookstein, 2011). The threshold value for LDA (pEC50 = 0.936 mM) was selected as the arithmetic mean of the pEC50 values (Kar et al., 2012). Chemicals with a pEC50 value greater than or equal to 0.936 mM were considered highly toxic to P. subcapitata. In the LDA studies, 60 of the 171 descriptors were selected as the pool of predictor variables based on molecular spectrum analysis.
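A cluster-based split of the kind described above can be sketched as follows. This is a minimal sketch, not the SPSS procedure used in the paper: it assumes plain Lloyd's k-means on descriptor vectors and a hypothetical selection rule that takes evenly spaced members from each cluster until about 30% of that cluster is in the test set, so every region of chemical space is represented in both sets. The number of clusters (`k=5`), the seed and the toy descriptors are all assumptions.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_based_split(points, k=5, test_fraction=0.3, seed=0):
    """Hypothetical rule: draw about test_fraction of each cluster into the test set."""
    labels = kmeans(points, k, seed=seed)
    test = []
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        n_test = round(len(members) * test_fraction)
        # Evenly spaced picks through the cluster (distinct because the step is >= 1).
        test.extend(members[int(j * len(members) / n_test)] for j in range(n_test))
    test_set = set(test)
    train = [i for i in range(len(points)) if i not in test_set]
    return train, sorted(test_set)


# Toy 2D "descriptor space" with 105 compounds (random, purely illustrative).
demo_rng = random.Random(1)
descriptors = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(105)]
train_idx, test_idx = cluster_based_split(descriptors)
print(len(train_idx), len(test_idx))
```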

2.5. Validation metrics for the regression based QSTR model

The robustness of the QSTR models was verified using a number of statistical parameters. Three strategies were followed: (1) leave-one-out (LOO) internal validation (cross-validation) on the training set compounds, (2) external validation using the test set compounds and (3) overall validation using both training and test set compounds. The main objective of this QSTR modeling is to develop robust models that make accurate and reliable predictions of the toxicity of chemicals towards P. subcapitata. Therefore, the equations developed from the training set were validated internally using the training set chemicals and externally using the test set molecules to check their predictive quality. The overall validation strategy cross-checks the reliability of the developed models for possible application to a new set of data and assesses confidence in such predictions.
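The internal (LOO) and external validation steps above can be sketched as follows, assuming the usual definitions Q² = 1 − PRESS/Σ(y − ȳ)² for the training set and R²pred = 1 − Σ(y_test − ŷ_test)²/Σ(y_test − ȳ_training)² for the test set. The paper's models are PLS; this minimal sketch swaps in ordinary least squares with an intercept so that the validation loop itself stays short. The helper names and the toy data are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with intercept (a stand-in for the paper's PLS models)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def q2_loo(X, y):
    """Q2 = 1 - PRESS / sum((y - mean(y))**2), leaving one compound out at a time."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_ols(X[keep], y[keep])  # refit without compound i
        press += (y[i] - predict(coef, X[i:i + 1])[0]) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def r2_pred(X_train, y_train, X_test, y_test):
    """External R2pred: test-set residuals relative to the training-set mean."""
    coef = fit_ols(X_train, y_train)
    resid = y_test - predict(coef, X_test)
    return 1.0 - np.sum(resid ** 2) / np.sum((y_test - y_train.mean()) ** 2)


# Toy data: 74 training and 31 test compounds with 3 descriptors (illustrative only).
demo = np.random.default_rng(0)
Xtr, Xte = demo.normal(size=(74, 3)), demo.normal(size=(31, 3))
beta = np.array([1.0, -2.0, 0.5])
ytr = Xtr @ beta + 0.3 * demo.normal(size=74)
yte = Xte @ beta + 0.3 * demo.normal(size=31)
print(round(q2_loo(Xtr, ytr), 3), round(r2_pred(Xtr, ytr, Xte, yte), 3))
```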

The model fitness parameters R² and R²adj; the internal validation metrics Q², r²m(LOO)scaled and Δr²m(LOO)scaled; the external validation metrics R²pred, r²m(test)scaled and Δr²m(test)scaled; and the overall metrics r²m(overall)scaled and Δr²m(overall)scaled were reported for the developed models (Kubinyi et al., 1998; Roy et al., 2012, 2013) (definitions of all these parameters are provided in the Supplemental data). Further, the predictive quality of the models was assessed using the approach of Golbraikh and Tropsha (Golbraikh and Tropsha, 2002). According to their acceptance criteria, a model should satisfy the following conditions:

$$\text{(i)}\quad Q^2 > 0.5$$

$$\text{(ii)}\quad r^2 > 0.6$$

$$\text{(iii)}\quad \frac{r^2 - r_0^2}{r^2} < 0.1 \quad\text{or}\quad \frac{r^2 - r_0'^2}{r^2} < 0.1$$

$$\text{(iv)}\quad 0.85 \le k \le 1.15 \quad\text{or}\quad 0.85 \le k' \le 1.15$$

Here, r² and r₀² are the squared correlation coefficients between the observed (Y axis) and predicted (X axis) values of the compounds with and without intercept, respectively. Interchanging the axes gives r′₀² instead of r₀². For the plot of observed values (Y axis) against predicted values (X axis) of the test set compounds with the intercept set to zero, the slope of the fitted line is k; interchanging the axes gives k′.
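The quantities in conditions (i)-(iv) can be computed as sketched below. This is a minimal sketch assuming the common through-origin definitions k = Σyŷ/Σŷ² and r₀² = 1 − Σ(y − kŷ)²/Σ(y − ȳ)² (and the mirror forms for k′ and r′₀²). It also computes unscaled r²m values, assuming r²m = r²(1 − √|r² − r₀²|), with the average and Δ taken over the two axis orientations; the paper reports the scaled variants, which additionally rescale the observed and predicted values, and that step is omitted here. All function names are hypothetical.

```python
import math


def gt_metrics(observed, predicted):
    """r2, r0^2, r'0^2, k and k' for observed (Y) versus predicted (X) values."""
    y, yhat = list(observed), list(predicted)
    n = len(y)
    my, mp = sum(y) / n, sum(yhat) / n
    sxy = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    sxx = sum((b - mp) ** 2 for b in yhat)
    syy = sum((a - my) ** 2 for a in y)
    r2 = sxy ** 2 / (sxx * syy)
    # Regression through the origin of observed on predicted: slope k ...
    k = sum(a * b for a, b in zip(y, yhat)) / sum(b * b for b in yhat)
    r02 = 1 - sum((a - k * b) ** 2 for a, b in zip(y, yhat)) / syy
    # ... and with the axes interchanged: slope k'.
    k_p = sum(a * b for a, b in zip(y, yhat)) / sum(a * a for a in y)
    r02_p = 1 - sum((b - k_p * a) ** 2 for a, b in zip(y, yhat)) / sxx
    return r2, r02, r02_p, k, k_p


def passes_gt(q2, r2, r02, r02_p, k, k_p):
    """Conditions (i)-(iv) as listed above."""
    return (
        q2 > 0.5
        and r2 > 0.6
        and ((r2 - r02) / r2 < 0.1 or (r2 - r02_p) / r2 < 0.1)
        and (0.85 <= k <= 1.15 or 0.85 <= k_p <= 1.15)
    )


def rm2(r2, r02, r02_p):
    """Unscaled average r2m and delta r2m over both axis orientations."""
    a = r2 * (1 - math.sqrt(abs(r2 - r02)))
    b = r2 * (1 - math.sqrt(abs(r2 - r02_p)))
    return (a + b) / 2, abs(a - b)


# Hypothetical test-set values (pEC50 observed vs predicted).
obs = [0.2, 0.8, 1.1, 1.9, 2.4]
pred = [0.3, 0.7, 1.2, 1.7, 2.5]
stats = gt_metrics(obs, pred)
print(passes_gt(0.7, *stats), rm2(*stats[:3]))
```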

The developed models were also subjected to a randomization test (100 permutations) to check the possibility of chance correlation. The pEC50 (Y) values were randomly permuted while keeping the descriptor matrix (the same combination of descriptors as in the original model) intact, followed by a PLS run. Randomization and subsequent PLS analysis generate a new set of R² and Q² values, which were plotted against the correlation coefficient between the original and the permuted Y values. The corrected R²p (cR²p) was also calculated from the results of the randomized data (Mitra et al., 2010; Schüürmann et al., 2008) (the definition of cR²p is provided in the Supplemental data). The models are considered valid if the intercepts satisfy R²int < 0.4 and Q²int < 0.05, and if cR²p > 0.5.
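The permutation procedure above can be sketched as follows. This minimal sketch assumes the commonly used definition cR²p = R × √(R² − R̄²r), where R̄²r is the mean R² of the randomized models; the paper gives its own definition in the Supplemental data. As in the validation sketch, ordinary least squares with an intercept stands in for PLS, and the function names and toy data are hypothetical.

```python
import numpy as np


def r2_ols(X, y):
    """Training-set R2 of a least-squares fit with intercept (stand-in for PLS)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def y_randomization(X, y, n_perm=100, seed=0):
    """Permute y (descriptor matrix X kept intact), refit, and compute cR2p."""
    rng = np.random.default_rng(seed)
    r2 = r2_ols(X, y)
    r2_rand = np.array([r2_ols(X, rng.permutation(y)) for _ in range(n_perm)])
    # cR2p = R * sqrt(R2 - mean randomized R2); clipped at zero for safety.
    cr2p = np.sqrt(r2) * np.sqrt(max(r2 - r2_rand.mean(), 0.0))
    return r2, r2_rand.mean(), cr2p


# Toy model with 74 compounds and 5 descriptors (illustrative only).
demo = np.random.default_rng(42)
X = demo.normal(size=(74, 5))
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0]) + demo.normal(scale=0.5, size=74)
print([round(float(v), 3) for v in y_randomization(X, y)])
```

A real signal keeps R² high while the permuted fits collapse towards the R² expected by chance (roughly p/(n − 1) for p descriptors), so cR²p stays well above 0.5.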

2.6. Validation metrics for the classification based QSTR model (LDA)

The models were initially assessed using quantitative statistical parameters such as sensitivity, specificity, accuracy, precision and F-measure for the training and test set compounds, respectively (Roy and Mitra, 2011). Validation parameters such as Wilks' λ statistic (λ should be less than 0.5 for an acceptable model) (Galvez-Llompart et al., 2011), chi-square (χ²; a higher value is better) (Fawcett, 2006), canonical index (Rc; Rc should be greater than 0.5 for an acceptable model) (Mitteroecker and Bookstein, 2011), Matthews correlation coefficient (−1 ≤ MCC ≤ 1; MCC should be greater than 0.5 for an acceptable model) (Powers, 2011), squared Mahalanobis distance (a higher value is better) (Galvez-Llompart et al., 2011), probability level (p) and receiver operating characteristics (ROC) (Fawcett, 2006; Powers, 2011) were also used to assess model quality.

A higher χ² value indicates a more pronounced separation of chemicals into toxic and nontoxic groups. Descriptors were selected according to the Fisher–Snedecor parameter (F), which indicates the relative importance of candidate variables. ROC curves describe the discrimination ability of classification models; they are obtained by plotting sensitivity against (1 − specificity) on the Y and X axes, respectively. For an ideal classifier the area under the ROC curve (AUROC) is 1, while an AUROC of 0.5 corresponds to random classification, so values above 0.5 can be considered acceptable (Fawcett, 2006; Powers, 2011). Two further ROC-based parameters incorporating the Euclidean distance (ED) and a fitness function (FIT(λ)), namely ROCED and ROCFIT, were also reported (Perez-Garrido et al., 2011). ROCED is 0 for a perfect classifier and 4.5 for a random classifier; values greater than 2 are considered to indicate a random classifier, and values above 4 a bad classifier.
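The area under the ROC curve described above can be computed without drawing the curve, using its rank (Mann-Whitney) interpretation: the AUROC equals the probability that a randomly chosen toxic compound receives a higher classifier score than a randomly chosen nontoxic one, with ties counted as one half. The sketch below assumes the classifier outputs a continuous score (for example a discriminant score or posterior probability); the function name and example scores are hypothetical, and the O(n²) pair loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUROC as P(score of a toxic compound > score of a nontoxic one); ties count 1/2.

    labels: 1 = toxic, 0 = nontoxic.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute an AUROC")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six compounds.
print(roc_auc([0.91, 0.80, 0.35, 0.62, 0.20, 0.10], [1, 1, 1, 0, 0, 0]))
```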

3. Results

3.1. Regression based QSTR model

The combination of ETA and non-ETA descriptors was used for model development and subsequent model validation. Initially, the stepwise MLR method was used for descriptor selection, and the selected descriptors were subsequently subjected to PLS regression to avoid the problem of intercorrelation. The developed equation (Eq. (1)) shows good reliability in terms of internal, external and overall predictive quality. log Kow was then added to the descriptors appearing in Eq. (1), and a subsequent PLS analysis generated an equation including log Kow (Eq. (2)). The validation metrics of both equations (Table 1) indicate almost equivalent predictive quality. The predictive quality and overall performance of the developed models were checked using scatter plots: the predictions of Eqs. (1) and (2) for both the training and test sets are good, with only small deviations from perfect prediction (Fig. 1). The following equations describe the 48-h toxicity of chemicals against P. subcapitata.

$$\mathrm{pEC}_{50}\,(\mathrm{mM}) = -3.25199 + 0.40376\,{}^{2}\chi + 3.44722\,\mathrm{density} - 2.86449\,\frac{[\sum\alpha]_p}{\sum\alpha} + 0.65341\,\mathrm{Atype\_N\_74} + 0.81062\,\mathrm{Atype\_C\_36} \tag{1}$$

NTraining = 74; NTest = 31; R² = 0.774; R²adj = 0.760; Q² = 0.708;
r²m(LOO)scaled = 0.601; Δr²m(LOO)scaled = 0.160; cR²p = 0.781; R²pred = 0.798;
r²m(test)scaled = 0.646; Δr²m(test)scaled = 0.201; RMSEP = 0.565;
r²m(overall)scaled = 0.623; Δr²m(overall)scaled = 0.177;
(r² − r₀²)/r² = 0.002 and k = 0.953, or (r² − r′₀²)/r² = 0.081 and k′ = 0.916

$$\mathrm{pEC}_{50}\,(\mathrm{mM}) = -3.209 + 0.340\,{}^{2}\chi + 0.098\,\log K_{ow} + 3.434\,\mathrm{density} - 2.977\,\frac{[\sum\alpha]_p}{\sum\alpha} + 0.669\,\mathrm{Atype\_N\_74} + 0.978\,\mathrm{Atype\_C\_36} \tag{2}$$

NTraining = 74; NTest = 31; R² = 0.782; R²adj = 0.765; Q² = 0.705;
r²m(LOO)scaled = 0.600; Δr²m(LOO)scaled = 0.152; cR²p = 0.774; R²pred = 0.813;
r²m(test)scaled = 0.672; Δr²m(test)scaled = 0.186; RMSEP = 0.544;
r²m(overall)scaled = 0.627; Δr²m(overall)scaled = 0.172;
(r² − r₀²)/r² = 0.004 and k = 0.940, or (r² − r′₀²)/r² = 0.075 and k′ = 0.940
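Applying Eqs. (1) and (2) to a new compound amounts to a weighted sum of its descriptor values. The sketch below transcribes the published coefficients; the dictionary keys (`"2chi"`, `"sum_alpha_p_ratio"`, `"logKow"` and so on) are hypothetical shorthand for ²χ, [Σα]p/Σα and log Kow, and the example descriptor values are invented. Meaningful predictions would also assume descriptors computed the same way as in the paper (Cerius2 and PaDEL) and a compound inside the models' applicability domain.

```python
# Coefficients transcribed from Eqs. (1) and (2); key names are hypothetical shorthand.
EQ1 = {"intercept": -3.25199, "2chi": 0.40376, "density": 3.44722,
       "sum_alpha_p_ratio": -2.86449, "Atype_N_74": 0.65341, "Atype_C_36": 0.81062}
EQ2 = {"intercept": -3.209, "2chi": 0.340, "logKow": 0.098, "density": 3.434,
       "sum_alpha_p_ratio": -2.977, "Atype_N_74": 0.669, "Atype_C_36": 0.978}


def predict_pec50(eq, descriptors):
    """pEC50 (mM) = intercept + sum(coefficient * descriptor value)."""
    missing = set(eq) - {"intercept"} - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return eq["intercept"] + sum(
        coef * descriptors[name] for name, coef in eq.items() if name != "intercept"
    )


# Invented descriptor values for a hypothetical compound.
compound = {"2chi": 3.0, "density": 1.1, "sum_alpha_p_ratio": 0.2,
            "Atype_N_74": 0, "Atype_C_36": 0, "logKow": 2.5}
print(round(predict_pec50(EQ1, compound), 3), round(predict_pec50(EQ2, compound), 3))
```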

The Y-randomization study generates a set of R² and Q² values which, when plotted against the correlation coefficient between the original and permuted Y values, give the intercepts R² = (0.0, −0.016) and Q² = (0.0, −0.407) for Eq. (1), and R² = (0.0, 0.014) and Q² = (0.0, −0.486) for Eq. (2). The randomized R² and Q² values deviate far from the actual R² and Q² values (Fig. 2). The corrected R²p (cR²p) values of both models (cR²p = 0.776 for Eq. (1) and cR²p = 0.773 for Eq. (2)) indicate that the models were not obtained by chance. According to the variable importance plot (VIP), the order of significance of the descriptors in Eq. (1) is (1) ²χ, (2) density, (3) [Σα]p/Σα, (4) Atype_N_74 and (5) Atype_C_36 (Fig. 3A). The order of significance of the descriptors in Eq. (2) is (1) ²χ, (2) log Kow, (3) density, (4) Atype_N_74, (5) [Σα]p/Σα and (6) Atype_C_36 (Fig. 3B).
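The VIP ranking above comes from SIMCA-P. The sketch below shows the standard VIP formula for a single-response PLS model, assuming the X-weights W (p × A), X-scores T (n × A) and y-loadings q (A) are available from a fitted PLS model (for example one fitted with NIPALS); the random arrays in the example are placeholders, not the paper's model. Because the squared VIP values always sum to the number of descriptors p, their mean is 1, which is why VIP > 1 is a common rule of thumb for an influential descriptor.

```python
import numpy as np


def vip_scores(W, T, q):
    """Variable importance in projection for a single-response PLS model.

    W: (p, A) X-weights, T: (n, A) X-scores, q: (A,) y-loadings.
    VIP_j = sqrt(p * sum_a SS_a * (w_ja / ||w_a||)**2 / sum_a SS_a),
    where SS_a = q_a**2 * t_a't_a is the y-variation explained by component a.
    """
    p = W.shape[0]
    ss = (q ** 2) * np.sum(T ** 2, axis=0)      # shape (A,)
    w_norm = W / np.linalg.norm(W, axis=0)      # unit-length weight vector per component
    return np.sqrt(p * ((w_norm ** 2) @ ss) / ss.sum())


# Placeholder arrays: 5 descriptors, 4 latent variables, 74 compounds.
demo = np.random.default_rng(0)
vip = vip_scores(demo.normal(size=(5, 4)), demo.normal(size=(74, 4)), demo.normal(size=4))
print(np.round(vip, 3), "ranking:", np.argsort(-vip))
```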

3.2. Classification based QSTR model

The classification based QSTR model derived using LDA shows that four predictor variables can discriminate chemicals into toxic and nontoxic groups for P. subcapitata. The developed linear discriminant function (DF; Eq. (3)) is significant, as characterized by Wilks' lambda (λ = 0.438) and the canonical correlation coefficient (Rc = 0.749). The chi-square test (χ² = 57.749) shows that the groups are distinctly separated.

$$\mathrm{DF} = -13.0420 - 1.9488\,\mathrm{S\_aaaC} + 1.1885\,\mathrm{Atype\_C\_24} - 4.1799\,{}^{2}\chi^{v} + 0.1964\,\mathrm{MW} \tag{3}$$

Wilks' lambda (λ) = 0.438, χ² = 57.749, Rc = 0.749, AUC of ROC (training) = 0.968, AUC of ROC (test) = 0.957, eigenvalue = 1.281, DF = 4, ROCED = 0.718, ROCFIT = 1.640, squared Mahalanobis distance = 6.136, F(4, 69) = 22.11241, p < 0.00001, MCC (training) = 0.729, MCC (test) = 0.692

Table 1
Statistical quality of the regression models (NTraining = 74, NTest = 31).

| Eq. no. | Chemometric tool (descriptors) | LVs | R² | Q² | R²adj | cR²p | r²m(LOO) | Δr²m(LOO) | R²pred | r²m(test) | Δr²m(test) | r²m(overall) | Δr²m(overall) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PLS (²χ, density, [Σα]p/Σα, Atype_C_36, Atype_N_74) | 4 | 0.774 | 0.708 | 0.760 | 0.781 | 0.601 | 0.160 | 0.798 | 0.646 | 0.201 | 0.623 | 0.178 |
| 2 | PLS (²χ, log Kow, density, [Σα]p/Σα, Atype_C_36, Atype_N_74) | 5 | 0.782 | 0.705 | 0.765 | 0.774 | 0.600 | 0.152 | 0.813 | 0.672 | 0.186 | 0.627 | 0.172 |

Fig. 1. (A) Scatter plot of the model developed without the partition coefficient (log Kow), showing the predictability and overall accuracy of the final PLS equation (Eq. (1)). (B) Scatter plot of the model developed with the partition coefficient (log Kow), showing the predictability and overall accuracy of the final PLS equation (Eq. (2)).

Fig. 2. (A) Validation of the model developed without the partition coefficient (log Kow) using the randomization test (Eq. (1)). (B) Validation of the model developed with the partition coefficient (log Kow) using the randomization test (Eq. (2)).

Fig. 3. (A) Variable importance plot (VIP) of the descriptors in the model developed without the partition coefficient (log Kow) (Eq. (1)). (B) Variable importance plot (VIP) of the descriptors in the model developed with the partition coefficient (log Kow) (Eq. (2)).

The LDA equation was developed using forward stepwise variable selection with F = 4 for inclusion and F = 3.9 for exclusion. The a priori classification probabilities were set equal for both groups (0.5). The discriminant features in this model (Eq. (3)) are (1) S_aaaC, (2) Atype_C_24, (3) ²χv and (4) MW. According to the toxicity threshold (pEC50 = 0.936 mM), 53 of the 74 training set chemicals were highly toxic and the remaining 21 were of low toxicity or nontoxic. The model classified 46 of the 53 toxic chemicals correctly as toxic and 7 as nontoxic, and 19 of the 21 nontoxic chemicals correctly as nontoxic and 2 as toxic. The qualitative statistics for the training set were sensitivity = 86.79%, specificity = 90.47%, accuracy = 87.83%, precision = 95.83% and F-measure = 91.08%. External predictive quality was checked on the test set chemicals: sensitivity = 94.44%, specificity = 69.23%, accuracy = 83.87%, precision = 80.95% and F-measure = 87.17%, reflecting good predictive quality of the model. The discrimination accuracy was also measured by the area under the receiver operating characteristic (ROC) curve (AUC): the AUC is 0.968 for the training (discrimination) set and 0.957 for the test set (Fig. 4). Additional statistical parameters (eigenvalue = 1.281, DF = 4, ROCED = 0.718, ROCFIT = 1.640, squared Mahalanobis distance = 6.136, F(4, 69) = 22.11241, p < 0.00001, MCC (training) = 0.729, MCC (test) = 0.692) were also evaluated to characterize the discriminating capability of the LDA model. In particular, the MCC values for the training and test sets indicate a good classifier. The discriminating role of the predictor variables (S_aaaC, Atype_C_24, ²χv and MW) is shown graphically in the contribution plot (Fig. 5).
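The training-set statistics above follow directly from the 2 × 2 confusion table, which the sketch below reproduces from the counts given in the text (46 of 53 toxic and 19 of 21 nontoxic training chemicals classified correctly, i.e. TP = 46, FN = 7, TN = 19, FP = 2). It also includes a hypothetical helper for the mean-threshold labeling rule (pEC50 at or above the arithmetic mean counts as toxic). The function names are illustrative, not from the paper.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, precision, F-measure and MCC from a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy,
            "precision": precision, "f_measure": f_measure, "mcc": mcc}


def label_by_mean(pec50_values):
    """Hypothetical helper: 1 (toxic) if pEC50 >= arithmetic mean, else 0."""
    threshold = sum(pec50_values) / len(pec50_values)
    return [1 if v >= threshold else 0 for v in pec50_values], threshold


# Training-set counts as described in the text.
train_stats = classification_metrics(tp=46, fn=7, tn=19, fp=2)
for name, value in train_stats.items():
    print(f"{name}: {value:.4f}")
```

Rounded to percentages, these reproduce the reported sensitivity (86.79%), specificity (90.47%), accuracy (87.83%), precision (95.83%), F-measure (91.08%) and MCC (0.729).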

4. Discussion

4.1. Analysis of the PLS equations

The variable importance plot (VIP) for Eq. (1) (Fig. 3A) shows that ²χ (second-order molecular connectivity index) has the highest contribution to the response. This descriptor is calculated from the vertex degrees (δ) of the atoms in the hydrogen-depleted molecular graph of a chemical: for every path of length two (three atoms i, j and l joined by two bonds), the product of the three vertex degrees is raised to the power −1/2, and these terms are summed:

$$ {}^{2}\chi = \sum_{k=1}^{{}^{2}P} \left(\delta_i\,\delta_j\,\delta_l\right)_k^{-1/2} $$

Here, k runs over all mth-order subgraphs constituted by n atoms (n = m + 1 for acyclic subgraphs, where n is the number of skeleton atoms or vertices and m is the order of the subgraph in the molecular graph), and ²P is the second-order path count. This descriptor has a positive coefficient in the model, suggesting that chemicals with higher ²χ values are more toxic. It is related to molecular size and lipophilicity. Polycyclic aromatic hydrocarbons (e.g. dibenzo[b,i]anthracene, perylene and benzo[b]chrysene), phenols (e.g. 2,3,4,6-tetrachlorophenol and 2,3-dichlorophenol), anilines (e.g. 3,4-dichloroaniline and 3,4,5-trichloroaniline) and pesticides (e.g. dichlorvos and parathion) with higher ²χ values are highly toxic to P. subcapitata. This suggests that highly lipophilic chemicals tend to be highly toxic in the environment.
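The ²χ formula above can be evaluated directly on a hydrogen-depleted graph. In this minimal sketch the graph is a hypothetical adjacency dictionary (in practice it would come from a cheminformatics toolkit), and δ is the simple vertex degree; the valence variant ²χv used in Eq. (3) would substitute valence deltas, which this sketch does not compute. Each path of length two is enumerated once by taking every unordered pair of neighbours of its middle atom j.

```python
from itertools import combinations


def second_order_chi(adjacency):
    """2chi for a hydrogen-depleted graph given as {atom: set_of_neighbours}.

    Every path of length two is i - j - l with j as the middle atom, so each
    unordered pair of neighbours of j contributes (d_i * d_j * d_l) ** -0.5.
    """
    degree = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    total = 0.0
    for j, nbrs in adjacency.items():
        for i, l in combinations(sorted(nbrs), 2):
            total += (degree[i] * degree[j] * degree[l]) ** -0.5
    return total


# Toy carbon skeletons (hydrogens suppressed).
propane = {0: {1}, 1: {0, 2}, 2: {1}}
isobutane = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
benzene = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
for name, graph in [("propane", propane), ("isobutane", isobutane), ("benzene", benzene)]:
    print(name, round(second_order_chi(graph), 4))
# prints: propane 0.7071, isobutane 1.7321, benzene 2.1213 (one per line)
```

Larger and more extended skeletons contain more length-two paths, which is why ²χ tracks molecular size and, indirectly, lipophilicity.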

Density is a 3D spatial descriptor, defined as the ratio of molecular weight to molecular volume. It reflects the types of atoms in a molecule and how tightly they are packed, and it can be related to the transport and melting behavior of chemicals. This descriptor has a positive contribution with a high coefficient value, which indicates that aromatic and heteroaromatic chemicals with higher molecular weight and lower molecular volume are highly toxic to P. subcapitata. Chemicals such as pesticides (e.g. atrazine, dichlorvos and parathion), polycyclic aromatic hydrocarbons (e.g. benzo[b]chrysene and benzo[a]anthracene), phenols (e.g. 4-chlorophenol, phenol and 4-nitrophenol) and nitriles (e.g. trichloroacetonitrile and bromoacetonitrile) with higher values of this descriptor are highly toxic, whereas less dense chemicals such as 1-propanol and 1-octanol are the least toxic. This suggests that aromatic and heteroaromatic chemicals are more toxic in the environment than aliphatic chemicals.
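As a small illustration of the ratio defined above, the sketch below computes molecular weight from an element count (using rounded standard atomic weights) and divides it by a molecular volume. The volume has to come from a 3D model of the molecule; it is an input here, and the value used in the usage line is a hypothetical placeholder, not a computed volume for phenol. The units of the result follow whatever units the volume is expressed in.

```python
# Rounded standard atomic weights for elements common in the chemical classes discussed.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
                 "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904}

def molecular_weight(formula_counts):
    """Sum of atomic weights, e.g. {"C": 6, "H": 6, "O": 1} for phenol."""
    return sum(ATOMIC_WEIGHT[element] * count for element, count in formula_counts.items())

def density(mw, volume):
    """Density descriptor as defined in the text: molecular weight / molecular volume."""
    if volume <= 0:
        raise ValueError("molecular volume must be positive")
    return mw / volume

phenol_mw = molecular_weight({"C": 6, "H": 6, "O": 1})
print(round(phenol_mw, 3))
# HYPOTHETICAL volume, for illustration only.
print(round(density(phenol_mw, 100.0), 4))
```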

The descriptor [Σα]p/Σα belongs to the class of extended topochemical atom (ETA) indices. Here, [Σα]p is the sum of the α values of all non-hydrogen vertices that are each joined to only one other non-hydrogen vertex, and Σα is the sum of the α values of all non-hydrogen vertices of the molecule. The descriptor [Σα]p/Σα therefore describes the relative shape of a molecule. This descriptor has a negative contribution in the model. Chemicals having higher

Fig. 4. (A) Receiver operating characteristic (ROC) curve for the training set; the AUC of ROCTraining is 0.968. (B) ROC curve for the test set; the AUC of ROCTest is 0.957.

Fig. 5. Contribution plot showing the discriminating role of the predictor variables (S_aaaC, Atype_C_24, 2χv and MW) in differentiating chemicals into toxic and nontoxic groups.

S. Pramanik, K. Roy / Ecotoxicology and Environmental Safety 101 (2014) 184–190

values of this descriptor are less toxic. Aliphatic compounds such as tetrachloromethane, 1-propanol and acetone, which have higher values of this variable, are the least toxic to this alga. This observation also supports the earlier conclusion that aliphatic compounds are less toxic than aromatic and cyclic ones.
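The ratio defined above can be sketched directly from its description. In this sketch the per-atom α values are supplied by the caller (the ETA scheme that produces them is not reproduced here), and "terminal" atoms are heavy atoms bonded to exactly one other heavy atom. For all-carbon skeletons a uniform placeholder α cancels out, so the ratio reduces to the fraction of terminal heavy atoms; this placeholder is an assumption for illustration, not a real ETA value.

```python
def relative_shape(alpha, edges):
    """[sum alpha]_p / sum alpha for a hydrogen-depleted graph.

    alpha: dict mapping each heavy atom to its ETA alpha value (caller-supplied).
    edges: list of (atom_a, atom_b) bonds between heavy atoms.
    """
    degree = {atom: 0 for atom in alpha}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # [sum alpha]_p: alpha summed over atoms joined to only one other heavy atom
    terminal = sum(alpha[atom] for atom, d in degree.items() if d == 1)
    return terminal / sum(alpha.values())

def uniform_alpha(n_atoms, value=1.0):
    """Placeholder alpha values; any constant cancels in the ratio."""
    return {k: value for k in range(n_atoms)}

print(relative_shape(uniform_alpha(4), [(0, 1), (1, 2), (2, 3)]))              # n-butane
print(relative_shape(uniform_alpha(5), [(0, 1), (0, 2), (0, 3), (0, 4)]))      # neopentane
print(relative_shape(uniform_alpha(6), [(k, (k + 1) % 6) for k in range(6)]))  # benzene
```

The usage lines show the trend described in the text: a ring skeleton such as benzene has no terminal atoms and scores 0, while open-chain and branched aliphatic skeletons score higher.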

The descriptors Atype_N_74 and Atype_C_36 are atom-type fragment descriptors that signify the contributions of different molecular fragments to the hydrophobicity profile of the chemicals. Atype_N_74 encodes nitrogen atoms with double or triple bonds (R–N= and R–N≡). Atype_C_36 counts fragments of the type Al–CH=X (Al represents an aliphatic group and X represents a heteroatom such as O, N, S, P, Se or a halogen). Both descriptors have positive contributions in the model, but they make the smallest contributions of all its descriptors.

The variable importance plot (VIP) for Eq. (2) (Fig. 3B) shows that each descriptor in this equation contributes to the response about as much as the corresponding descriptor of Eq. (1). The additional descriptor log Kow is important, with an importance lying between those of the 2χ and density descriptors of Eq. (1). log Kow has classically been used to describe the penetration characteristics or hydrophobic character of chemicals. It has a positive contribution in this aquatic toxicity model, which indicates that aromatic and heteroaromatic chemicals with higher lipophilicity are highly toxic to P. subcapitata. Chemicals such as pesticides (e.g. atrazine, dichlorvos and parathion), polycyclic aromatic hydrocarbons (e.g. benzo[b]chrysene and benzo[a]anthracene) and phenols (e.g. 2,3,4,6-tetrachlorophenol) with higher values of this descriptor are highly toxic, whereas less lipophilic chemicals such as 1-propanol, 1-octanol and glutaraldehyde are the least toxic. This again shows, as do the density and [Σα]p/Σα descriptors, that aromatic and heteroaromatic chemicals are more toxic in the environment than aliphatic chemicals. The Atype_N_74 and [Σα]p/Σα descriptors contribute equally to this model: Atype_N_74 has a positive contribution, while [Σα]p/Σα has a negative contribution to the response. The Atype_C_36 descriptor has a positive contribution and makes the smallest contribution to toxicity prediction.
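The VIP rankings discussed in this section come from fitted PLS models. The sketch below uses a commonly cited VIP definition for a single-response PLS model. It assumes the X-weights W, X-scores T and y-loadings q of an already fitted model are available as plain lists. The arrays in the usage lines are hypothetical and do not correspond to Eq. (1) or Eq. (2). By construction the squared VIP values sum to the number of predictors, so their mean is 1.

```python
import math

def vip_scores(W, T, q):
    """Variable importance in projection for a single-response PLS model.

    W: p x A X-weights (one column per latent component).
    T: n x A X-scores.
    q: length-A y-loadings.
    Returns one VIP value per predictor.
    """
    p, n_comp = len(W), len(q)
    # y-variance explained by component a: SS_a = q_a^2 * (t_a . t_a)
    ss = [q[a] ** 2 * sum(row[a] ** 2 for row in T) for a in range(n_comp)]
    norms = [math.sqrt(sum(W[j][a] ** 2 for j in range(p))) for a in range(n_comp)]
    total = sum(ss)
    return [math.sqrt(p * sum(ss[a] * (W[j][a] / norms[a]) ** 2 for a in range(n_comp)) / total)
            for j in range(p)]

# HYPOTHETICAL two-component model with three predictors, for illustration only.
W = [[0.8, 0.1], [0.5, -0.6], [0.3, 0.8]]
T = [[1.2, 0.3], [-0.7, 0.5], [0.1, -0.9], [-0.6, 0.1]]
q = [0.9, 0.4]
print([round(v, 3) for v in vip_scores(W, T, q)])
```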

4.2. Analysis of the LDA equation

The developed LDA model suggests that four descriptors (S_aaaC, Atype_C_24, 2χv and MW) play a prominent role in discriminating toxic from nontoxic chemicals. S_aaaC belongs to the class of topological descriptors and is the sum of the E-state values of carbon atoms with three aromatic bonds, which signifies the importance of fused (poly)aromatic carbons. Atype_C_24 encodes secondary carbon atom (R–CH–R) fragments of the chemicals. The 2χv index is a refinement of the 2χ index in which the valence vertex degree (δv) is used instead of the vertex degree (δ); it is related to lipophilicity, branching and functionality. The conformation-independent 0D descriptor molecular weight (MW) is the sum of the atomic weights of the individual atoms; it plays a major role in defining molecular density, molecular mass, rigidity and the presence of individual atoms in the chemicals. The descriptors in Eq. (3) were examined in a contribution plot to identify the features that differentiate the toxic and nontoxic groups. According to the contribution plot (Fig. 5), molecular weight (MW) makes the largest contribution to the categorization of chemicals into toxic and nontoxic groups: chemicals with higher molecular weight are more toxic to P. subcapitata than chemicals with lower molecular weight. For example, pesticides (e.g. parathion and malathion), polycyclic aromatic hydrocarbons (e.g. dibenzo[b,i]anthracene, perylene and benzo[b]chrysene) and phenols (e.g. 2,3,4,6-tetrachlorophenol) with higher molecular weight belong to the highly toxic group. The 2χv index contributes inversely in discriminating the two groups, and its role is second only to that of molecular weight. The other two descriptors (S_aaaC and Atype_C_24) have only minor discriminating capability.
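To illustrate how 2χv differs from 2χ, the sketch below applies the Kier–Hall valence vertex degree δv = (Zv − h)/(Z − Zv − 1), where Z is the atomic number, Zv the number of valence electrons and h the number of attached hydrogens. It then sums (δv_i δv_j δv_l)^(−1/2) over every two-bond path. The molecule is entered by hand as heavy atoms with (Z, Zv, h) and a bond list; this input format is an assumption of the sketch, not the descriptor software used in the study.

```python
from itertools import combinations

def valence_delta(z, z_valence, n_hydrogens):
    """Kier-Hall valence vertex degree: (Zv - h) / (Z - Zv - 1)."""
    return (z_valence - n_hydrogens) / (z - z_valence - 1)

def second_order_chi_v(atoms, edges):
    """atoms: dict heavy atom -> (Z, Zv, attached H count); edges: heavy-atom bonds."""
    neighbours = {a: set() for a in atoms}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    dv = {a: valence_delta(*props) for a, props in atoms.items()}
    total = 0.0
    for j, nbrs in neighbours.items():
        # each unordered neighbour pair (i, l) of j is one path i-j-l
        for i, l in combinations(sorted(nbrs), 2):
            total += (dv[i] * dv[j] * dv[l]) ** -0.5
    return total

# 1-propanol, CH3-CH2-CH2-OH; heavy atoms 0..3
propanol = {0: (6, 4, 3), 1: (6, 4, 2), 2: (6, 4, 2), 3: (8, 6, 1)}
print(round(second_order_chi_v(propanol, [(0, 1), (1, 2), (2, 3)]), 4))
```

For 1-propanol the hydroxyl oxygen has δv = 5, so 2χv is about 0.724, whereas the simple 2χ of the same four-atom chain is 1.0. This is how the valence version encodes heteroatom identity in addition to skeletal branching.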

4.3. Overview

Together, the developed regression and classification (LDA) models describe the molecular features responsible for chemical toxicity towards P. subcapitata. Descriptors of molecular connectivity, 3D spatial properties (density), extended topochemical atom properties (relative shape of molecules) and molecular fragment contributions to hydrophobicity play a conspicuous role in defining the toxic potency of chemicals. In general, compounds with higher molecular size and lipophilicity, as well as fused polyaromatic and heteroaromatic compounds, are more toxic. Overall, both models point to similar properties of chemicals as determinants of their toxicity towards P. subcapitata.

5. Conclusion

The ecological functions of individual species in the environment are unique. Protecting human health and the environment from chemical stress requires analysis of the toxic potential of chemicals. The toxic potency of chemicals towards P. subcapitata was modeled in two ways: (1) regression-based QSTR models were developed to identify the molecular information responsible for P. subcapitata toxicity; (2) an LDA model was developed to classify chemicals into toxic and nontoxic groups from that molecular information. The first approach suggests that the second-order molecular connectivity index, the relative shape of molecules, density and specific molecular fragments are important properties for chemicals to exert their toxicity on P. subcapitata. The second approach suggests that the sum of E-state values of carbon atoms with three aromatic bonds, secondary carbon atom fragments, the second-order valence molecular connectivity index and molecular weight are the features that distinguish toxic from nontoxic chemicals. In general, compounds with higher molecular size and lipophilicity, as well as fused polyaromatic and heteroaromatic compounds, are more toxic. The statistical parameters show that both models have reliable predictive quality. Therefore, these models may be applicable for predicting the toxicity of chemicals prior to experimental evaluation.

Acknowledgment

The authors thank the Council of Scientific and Industrial Research (CSIR), New Delhi for awarding a major research project (No. 01(2546)/11/EMR-II) to KR and a senior research fellowship to SP.

Appendix A. Supplementary materials

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.ecoenv.2013.12.030.

References

Ahrens, A., Traas, T.P., 2007. Environmental exposure scenarios: development, challenges and possible solutions. J. Expo. Sci. Environ. Epidemiol. 17, S7–S15.

Aruoja, V., Moosus, M., Kahru, A., Sihtmae, M., Maran, U., 2014. Measurement of baseline toxicity and QSAR analysis of 50 non-polar and 58 polar narcotic chemicals for the alga Pseudokirchneriella subcapitata. Chemosphere 96, 23–32.

Azarbad, H., Niklinska, M., van Gestel, C.A., van Straalen, N.M., Roling, W.F., Laskowski, R., 2013. Microbial community structure and functioning along metal pollution gradients. Environ. Toxicol. Chem. 32 (9), 1992–2002.

Bell, T., Newman, J.A., Silverman, B.W., Turner, S.L., Lilley, A.K., 2005. The contribution of species richness and composition to bacterial services. Nature 436, 1157–1160.

Brose, U., Ostling, A., Harrison, K., Martinez, N.D., 2004. Unified spatial scaling of species and their trophic interactions. Nature 428, 167–171.

Cardinale, B.J., Emmett, D.J., Gonzalez, A., Hooper, D.U., Perrings, C., Venail, P., Narwani, A., Mace, M.G., Tilman, D., Wardle, D.A., Kinzig, A.P., Daily, G.C., Loreau, M., Grace, J.B., Larigauderie, A., Srivastava, D.S., Naeem, S., 2012. Biodiversity loss and its impact on humanity. Nature 486, 59–67.

Cerius 2 Version 4.10. ⟨http://accelrys.com/products⟩.

Chapin III, F.S., Zavaleta, E.S., Eviner, V.T., Naylor, R.L., Vitousek, P.M., Reynolds, H.L., Hooper, D.U., Lavorel, S., Sala, O.E., Hobbie, S.E., Mack, M.C., Diaz, S., 2000. Consequences of changing biodiversity. Nature 405, 234–242.

ChemAxon Ltd. ⟨http://www.chemaxon.com⟩.

Chen, C.Y., Wang, Y.J., Yang, C.F., 2009. Estimating low-toxic-effect concentrations in closed-system algal toxicity tests. Ecotoxicol. Environ. Saf. 72 (5), 1514–1522.

Daouk, S., Copin, P.J., Rossi, L., Chevre, N., Pfeifer, H.R., 2013. Dynamics and environmental risk assessment of the herbicide glyphosate and its metabolite AMPA in a small vineyard river of the Lake Geneva catchment. Environ. Toxicol. Chem. 32 (9), 2035–2044.

Darlington, R.B., 1990. Regression and Linear Models. McGraw-Hill, New York, USA.

Dougherty, E.R., Barrera, J., Brun, M., Kim, S., Cesar, R.M., Chen, Y., Bittner, M., Trent, J.M., 2002. Inference from clustering with application to gene-expression microarrays. J. Comput. Biol. 9 (1), 105–126.

Du, H., Wang, J., Watzl, J., Zhang, X., Hu, Z., 2008. Classification structure-activity relationship (CSAR) studies for prediction of genotoxicity of thiophene derivatives. Toxicol. Lett. 177 (1), 10–19.

Emerson, B.C., Kolm, N., 2005. Species diversity can drive speciation. Nature 434, 1015–1017.

Eriksson, L., Johansson, E., Kettaneh-Wold, N., Wold, S., 2001. Multi- and Megavariate Data Analysis: Principles and Applications, 2nd ed. Umetrics Academy, Umetrics, Umea, Sweden, pp. 1–533.

Eriksson, L., Johansson, E., Lindgren, F., Sjostrom, M., Wold, S., 2002. Megavariate analysis of hierarchical QSAR data. J. Comput. Aided Mol. Des. 16 (10), 711–726.

Everitt, B., Landau, S., Leese, M., 2001. Cluster Analysis. Arnold, London, UK.

Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognit. Lett. 27, 861–874.

Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188.

Furukawa, J., Saeguas, T., 1962. High polymerization of aldehydes, alkylene oxides and diketene. Pure Appl. Chem. 4 (2–4), 387–406.

Galvez-Llompart, M., Recio, M.C., Garcia-Domenech, R., 2011. Topological virtual screening: a way to find new compounds active in ulcerative colitis by inhibiting NF-κB. Mol. Divers. 15 (4), 917–926.

Garrigues, P., 2005. Environmental chemistry: the ultimate challenge in analytical chemistry. Anal. Bioanal. Chem. 381, 3–4.

Gasteiger, J., Marsili, M., 1980. Iterative partial equalization of orbital electronegativity – a rapid access to atomic charges. Tetrahedron 36, 3219–3228.

Golbraikh, A., Tropsha, A., 2002. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. Mol. Divers. 5, 231–243.

Gonzalez, A., Clemente, J.C., Shade, A., Metcalf, J.L., Song, S., Prithiviraj, B., Palmer, B.E., Knight, R., 2011. Our microbial selves: what ecology can teach us. EMBO Rep. 12 (8), 775–784.

Gravel, D., Bell, T., Barbera, C., Bouvier, T., Pommier, T., Venail, P., Mouquet, N., 2011. Experimental niche evolution alters the strength of the diversity-productivity relationship. Nature 469, 89–92.

Hartung, T., 2009. Toxicology for the twenty-first century. Nature 460, 208–212.

Hartung, T., van Vliet, E., Jaworska, J., Bonilla, L., Skinner, N., Thomas, R., 2012. Systems toxicology. ALTEX 29, 119–128.

Hector, A., Bagchi, R., 2007. Biodiversity and ecosystem multifunctionality. Nature 448, 188–190.

⟨http://padel.nus.edu.sg/software/padeldescriptor⟩

⟨http://www.minitab.com⟩

⟨http://www.spss.com⟩

⟨http://www.statsoft.com⟩

Ishida, S., 1981. Polymerization of formaldehyde and the physical properties of the polymerization products. I. J. Appl. Polym. Sci. 26 (8), 2743–2750.

Johnson, A.R., Wichern, W.D., 2005. Applied Multivariate Statistical Analysis. Pearson, Delhi, India.

Kar, S., Deeb, O., Roy, K., 2012. Development of classification and regression based QSAR models to predict rodent carcinogenic potency using oral slope factor. Ecotoxicol. Environ. Saf. 82, 85–95.

Kolter, R., Greenberg, P.E., 2006. Microbial sciences: the superficial life of microbes. Nature 441, 300–302.

Kubinyi, H., Hamprecht, F.A., Mietzner, T., 1998. Three-dimensional quantitative similarity–activity relationships (3D QSiAR) from SEAL similarity matrices. J. Med. Chem. 41 (14), 2553–2564.

Lee, S.Y., Kang, H.J., Kwon, J.H., 2013. Toxicity cutoff of aromatic hydrocarbons for luminescence inhibition of Vibrio fischeri. Ecotoxicol. Environ. Saf. 94, 116–122.

Li, Z., Zhang, Z., Kay, B.D., Dohnalek, Z., 2011. Polymerization of formaldehyde and acetaldehyde on ordered (WO3)3 films on Pt(111). J. Phys. Chem. C 115 (19), 9692–9700.

Martin, T.M., Harten, P., Young, D.M., Muratov, E.N., Golbraikh, A., Zhu, H., Tropsha, A., 2012. Does rational selection of training and test sets improve the outcome of QSAR modeling? J. Chem. Inf. Model. 52 (10), 2570–2578.

Mitra, I., Saha, A., Roy, K., 2010. Chemometric modeling of free radical scavenging activity of flavone derivatives. Eur. J. Med. Chem. 45 (11), 5071–5079.

Mitteroecker, P., Bookstein, F., 2011. Linear discrimination, ordination, and the visualization of selection gradients in modern morphometrics. Evol. Biol. 38, 100–114.

Perez-Garrido, A., Helguera, A.M., Borges, F., Cordeiro, M.N., Rivero, V., Escudero, A.G., 2011. Two new parameters based on distances in a receiver operating characteristic chart for the selection of classification models. J. Chem. Inf. Model. 51 (10), 2746–2759.

Planson, A.G., Carbonell, P., Paillard, E., Pollet, N., Faulon, J.L., 2012. Compound toxicity screening and structure–activity relationship modeling in Escherichia coli. Biotechnol. Bioeng. 109 (3), 846–850.

Powers, D.M.W., 2011. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. 2 (1), 37–63.

Raes, J., Letunic, I., Yamada, T., Jensen, L.J., Bork, P., 2011. Toward molecular trait-based ecology through integration of biogeochemical, geographical and metagenomic data. Mol. Syst. Biol. 473, 1–9.

Rohr, J.R., Schotthoefer, A.M., Raffel, T.R., Carrick, H.J., Halstead, N., Hoverman, J.T., Johnson, C.M., Johnson, L.B., Lieske, C., Piwoni, M.D., Schoff, P.K., Beasley, V.R., 2008. Agrochemicals increase trematode infections in a declining amphibian species. Nature 455 (7217), 1235–1239.

Roy, K., Chakraborty, P., Mitra, I., Ojha, P.K., Kar, S., Das, R.N., 2013. Some case studies on application of “rm²” metrics for judging quality of quantitative structure–activity relationship predictions: emphasis on scaling of response data. J. Comput. Chem. 34 (12), 1071–1082.

Roy, K., Mitra, I., 2011. On various metrics used for validation of predictive QSAR models with applications in virtual screening and focused library design. Comb. Chem. High Throughput Screen. 14 (6), 450–474.

Roy, K., Mitra, I., Kar, S., Ojha, P.K., Das, R.N., Kabir, H., 2012. Comparative studies on some metrics for external validation of QSPR models. J. Chem. Inf. Model. 52 (2), 396–408.

Scherb, H., Voigt, K., 2011. Adverse genetic effects induced by chemical or physical environmental pollution. Environ. Sci. Pollut. Res. 18 (5), 695–696.

Schuurmann, G., Ebert, R.U., Chen, J., Wang, B., Kuhne, R., 2008. External validation and prediction employing the predictive squared correlation coefficients test set activity mean vs training set activity mean. J. Chem. Inf. Model. 48 (11), 2140–2145.

Shi, J.Q., Yang, X., 2013. Reply to comment on “Acute toxicity and n-octanol/water partition coefficients of substituted thiophenols: determination and QSAR analysis”. Ecotoxicol. Environ. Saf. 93, 199.

UMETRICS SIMCA-P 10.0. 2002. ⟨www.umetrics.com⟩, Umea, Sweden.

Wold, S., 1995. PLS for multivariate linear modelling. In: van de Waterbeemd, H. (Ed.), Chemometric Methods in Molecular Design. VCH, Weinheim, Germany, pp. 195–218.

Yuan, J., Pu, Y., Yin, L., 2012. QSAR study of liver specificity of carcinogenicity of N-nitroso compounds. Ecotoxicol. Environ. Saf. 84, 282–292.
