Prediction of Milk/Plasma Concentration Ratios of Drugs and Environmental Pollutants Using In Silico Tools: Classification and Regression Based QSARs and Pharmacophore Mapping

Supratik Kar[a] and Kunal Roy*[a]

DOI: 10.1002/minf.201300018

[a] S. Kar, K. Roy
Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
phone: +91-98315 94140; fax: +91-33-2837 1078
*e-mail: [email protected], [email protected]
Homepage: http://sites.google.com/site/kunalroyindia

Supporting information for this article is available on the WWW under http://dx.doi.org/10.1002/minf.201300018.

Abstract: A large set of 185 compounds with diverse molecular structures and different mechanisms of therapeutic action was used to develop and validate statistically significant classification and regression based QSTR models for predicting the partitioning of drugs/chemicals into breast milk. Pharmacophore mapping was also carried out, which showed four important features required for a lower risk of secretion into milk: (i) hydrophobic group (HYD), (ii) ring aromatic group (RA), (iii) negative ionizable (NegIon) and (iv) hydrogen bond acceptor (HBA). The QSTR and pharmacophore models were rigorously validated internally as well as externally to check for any chance correlation and to judge the predictive potential of the models. Pharmacological distribution diagrams (PDDs) were used for the classification model as a visualization technique for the identification and selection of chemicals with lower partitioning into milk. Our in silico models make it possible to identify the essential structural attributes and to quantify the prime molecular prerequisites that are chiefly responsible for secretion into milk. The developed models were also used to screen the milk/plasma partitioning potential of a large number of compounds from the DrugBank database (http://www.drugbank.ca/).

Keywords: Milk-plasma partitioning · In silico · LDA · QSAR · Pharmacophore · QSTR

Mol. Inf. 2013, 32, 693 – 705 © 2013 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

1 Introduction

The United Nations Children’s Fund (UNICEF) voiced an alarm at the decline of breastfeeding across East Asia, and stressed the need to ensure that mothers understand the long-term benefits of this important practice for the survival and development of their children. As few as five per cent of all mothers breastfeed in Thailand, while around ten per cent do so in Vietnam. In China, only 28 per cent of babies are breastfed. The potential impact of optimal breastfeeding practices is especially important in developing countries with a high burden of disease and low access to clean water and sanitation. But non-breastfed children in industrialized countries are also at greater risk of dying. A recent study of post-neonatal mortality in the United States found a 25 % increase in mortality among non-breastfed infants. In the UK Millennium Cohort Survey, six months of exclusive breastfeeding was associated with a 53 % decrease in hospital admissions for diarrhoea and a 27 % decrease in respiratory tract infections.[1,2]

Breastfeeding of infants under two years of age has the greatest potential impact on child survival of all preventive interventions. It provides health benefits for both the mother and the child. Researchers have shown that breastfeeding also contributes to maternal health immediately after delivery because it helps to reduce the risk of post-partum haemorrhage, helps the uterus return to its pre-pregnant state faster, can help women to lose weight after the baby’s birth and reduces the risk of osteoporosis. In the short term, breastfeeding delays the return to fertility, and in the long term it reduces the risk of Type 2 diabetes in mothers with gestational diabetes and the risk of breast, uterine and ovarian cancer.[3]

Breast milk has important ingredients that are not found in any infant formula and that build the baby’s immune system, and it is a perfect food to promote the healthy growth and development of the infant. Breastfeeding has a profound impact on a child’s survival, health, nutrition and development. Breast milk provides all of the nutrients, vitamins and minerals an infant needs for growth for the first six months, and no other liquids or food are needed. In addition, breast milk carries antibodies from the mother that help combat disease. The act of breastfeeding itself stimulates proper growth of the mouth and jaw, and secretion of hormones for digestion and satiety. Breastfeeding also lowers the risk of chronic conditions later in life, such as obesity, high cholesterol, high blood pressure, diabetes, childhood asthma, gastro-intestinal (gut) illness, allergies, respiratory tract (chest) infections, urinary tract infections, SIDS (cot death) and childhood leukaemia.[4,5]

Due to various health concerns, women are often forced to take medication while breastfeeding. The accumulation of a specific drug in milk is associated with a high risk to the infant that can exceed the benefits of breastfeeding.[6] In addition, humans are widely exposed to toxic food and environmental contaminants. Thus, there is increasing concern regarding the presence of drugs and contaminants in breast milk, leading to potentially harmful effects on nursing infants.[7,8] Numerous reports have been published on levels of polychlorinated biphenyls (PCBs) and dichlorodiphenyltrichloroethane (DDT) in human milk in the USA.[9,10] Recent reviews documenting the occurrence of drugs and environmental pollutants in human milk have further heightened this concern.[11,12]

An ‘Exposure Index’ (EI) has been proposed by Ito and Lee[13] that relates the milk to plasma ratio of drug concentration (M/P), the milk intake (A), and the infant drug clearance, based on the average time of drug exposure level.

$$\mathrm{EI}\ (\%) = 100 \times (M/P) \times A \,/\, \mathrm{Infant\ Drug\ Clearance} \tag{1}$$

The milk/plasma concentration ratio of a drug (M/P) is a key parameter of EI, which has been proposed and used to determine the amount of drug transferred into human breast milk. It represents the ratio of the drug concentration in breast milk to that in maternal plasma and is expressed as:

$$M/P = C^{d}_{BM} / C^{d}_{MP} \tag{2}$$

In Equation 2, $C^{d}_{BM}$ and $C^{d}_{MP}$ are the drug concentrations in maternal breast milk and maternal plasma, respectively.[10] Drugs with an M/P value of 1 or more are present in breast milk at a higher level than in the mother’s plasma and are classified as high risk chemicals, while those with an M/P value less than 1 are classified as low risk chemicals. Several factors, such as molecular weight, lipid-water solubility, ionization of the drug, protein binding, and the pH of the milk or pKa of the drug, are involved in the accumulation of a drug in breast milk and its absorption by the baby.[8] Moreover, experimentally measured M/P values depend on various uncontrolled variables such as laboratory conditions, geographical region and time of lactation, and thus the reliability of these experimental data is questionable.
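As a small illustration of Equations 1 and 2 and of the high/low risk rule, the following Python sketch may help. The function names and the numerical inputs are hypothetical and are not taken from the paper; the milk intake and the infant clearance are assumed to be expressed in the same units so that EI is a percentage.

```python
def milk_plasma_ratio(c_milk: float, c_plasma: float) -> float:
    """Equation 2: drug concentration in milk divided by that in maternal plasma."""
    if c_plasma <= 0:
        raise ValueError("plasma concentration must be positive")
    return c_milk / c_plasma


def exposure_index(mp_ratio: float, milk_intake: float, infant_clearance: float) -> float:
    """Equation 1: EI (%) = 100 * (M/P) * A / infant drug clearance.

    milk_intake (A) and infant_clearance must share the same units (assumption).
    """
    if infant_clearance <= 0:
        raise ValueError("infant clearance must be positive")
    return 100.0 * mp_ratio * milk_intake / infant_clearance


def risk_class(mp_ratio: float) -> str:
    """'H' (high risk) when M/P is 1 or more, 'L' (low risk) otherwise."""
    return "H" if mp_ratio >= 1.0 else "L"


# Hypothetical example values, chosen only to show the arithmetic.
mp = milk_plasma_ratio(c_milk=3.0, c_plasma=1.5)
ei = exposure_index(mp, milk_intake=0.5, infant_clearance=4.0)
print(mp, ei, risk_class(mp))
```

With these made-up inputs the drug is twice as concentrated in milk as in plasma, so it falls into the high risk class.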

Lack of information on the concentration of a drug/chemical in breast milk increases the difficulty of assessing the risk of exposure of the infant to these agents. Unfortunately, for many common drugs, the M/P ratios have not been determined. Therefore, even an approximate prediction of the experimental concentration of a drug in breast milk for a given dosage to the mother would be very useful in clinical studies. The development of theoretical methods such as quantitative structure–activity relationship (QSAR) models to classify compounds into higher and lower risk ones and to predict M/P values is therefore necessary and useful for assessing the risk of substances that mostly lack any experimental data. As the structures of drugs/chemicals are heterogeneous in nature, pharmacophore mapping may be an important in silico tool for predictive modeling studies. Successful QSARs and 3D pharmacophore models offer the potential to be used as an appropriate filter for virtual screening to identify drugs/chemicals that may pose a risk to humans through secretion into milk.[14,15] In silico predictions are an attractive alternative to costly and labor-intensive in vitro and in vivo testing.[16] In addition, the European Chemical Bureau encourages the use of models in making regulatory decisions while allowing the use of correlation based models for screening assessments. Predictive models are used by the Food and Drug Administration (FDA) to minimize false negatives and false positives, saving incalculable costs for manufacturers.[17]

A few QSAR models have been constructed to predict M/P values of drugs. Agatonovic-Kustrin et al.[18,19] developed a genetic neural network model using a set of 60 drug compounds, and also examined a larger set of 123 drugs and applied an artificial neural network to predict their degree of transfer into breast milk. Katritzky et al.[7] investigated the prediction of M/P ratios for a set of 115 drugs using multiple linear regression. Zhao et al.[20] used a support vector machine (SVM) method to analyze M/P ratios for 126 drugs. The only statistic they gave was an ‘accuracy’ of 90.48 %. Also, Abraham et al.[21] developed QSAR models for the M/P ratio with a large data set consisting of 179 drugs and hydrophobic environmental pollutants. They developed a nonlinear ANN model using five linear free energy relationship (LFER) parameters as inputs. A classification of drugs according to their milk/plasma concentration ratio (M/P) was also carried out by Fatemi and Ghorbanzade[22] using a counter propagation artificial neural network (CP-ANN). The results of this study revealed the superiority of CP-ANN over other methods in terms of classification accuracy.

In the present work, we have constructed classification and regression based QSTR models to predict milk/plasma concentration data using 185 diverse chemicals. We have developed linear regression models using simple computed molecular descriptors for easy interpretation and reproducibility. Pharmacophore models have also been built for the first time to encode the essential features of drugs and environmental pollutants for their safety profile with respect to secretion into milk. All the developed models have been assessed according to the Organization for Economic Cooperation and Development (OECD) principles.[23] The data analysis provides significant insight into the applicability of such statistical models as well as identifying the features relevant for toxicity. The aim of this work has been to develop integrated in silico models for preliminary identification of the essential structural attributes and quantification of the prime molecular prerequisites that are chiefly responsible for the risk of secretion into milk, followed by screening of compounds from the DrugBank database.[24] The constructed in silico models may constitute important query tools for predicting the milk/plasma concentration ratio during the early drug development stages. The major aim behind this study is that, if virtual screening approaches are used for lead or hit searching in the DrugBank database, the risk potential due to secretion into milk can be checked for individual DrugBank compounds using our reported results. These models provide rich information in the context of virtual screening of relevant libraries.

2 Materials and Methods

2.1 Dataset

The milk/plasma concentration ratios of 185 compounds, including drugs and organic pollutants, were collected from the literature.[20–22] The advantage of including such a wide range of molecules for the development of quantitative models is that the models have a broad applicability domain and hence can be successfully used for prediction of a variety of untested molecules. For the classification based QSTR model, the compounds were assigned as high risk (H) (M/P>1) or low risk (L) (M/P<1), as proposed by Malone et al.[25] For development of the 3D pharmacophore model, the concentration (M/P) ratio of each molecule was multiplied by 1000 to increase the magnitude of the value. In contrast, for development of the regression based QSTR models, the concentration ratios were converted to the log(0.01+M/P) scale.
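The three response encodings described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the logarithm is assumed to be base 10 (the text writes only "log"), and the treatment of a compound with M/P exactly equal to 1 is an assumption because the text defines the classes with strict inequalities.

```python
import math


def class_label(mp: float) -> str:
    """Classification response: 'H' if M/P > 1, else 'L'.

    Assumption: M/P == 1 is placed in 'L'; the text only states M/P > 1 and M/P < 1.
    """
    return "H" if mp > 1.0 else "L"


def regression_response(mp: float) -> float:
    """Regression response log(0.01 + M/P); base 10 is assumed here."""
    return math.log10(0.01 + mp)


def pharmacophore_activity(mp: float) -> float:
    """Pharmacophore response: M/P multiplied by 1000."""
    return 1000.0 * mp


for mp in (0.0, 0.25, 1.5):  # hypothetical M/P values
    print(mp, class_label(mp), regression_response(mp), pharmacophore_activity(mp))
```

The 0.01 offset keeps the logarithm finite for compounds with M/P = 0, whereas the simple multiplication by 1000 leaves such compounds at zero.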

2.2 Descriptor Calculation

A pool of 323 descriptors was calculated using the Cerius2 version 4.10,[26] Dragon 6[27] and PaDEL-Descriptor version 2.11[28] software. Using the Descriptor+ module of the Cerius2 software, 239 descriptors belonging to various categories were calculated for the present work, including (i) spatial, (ii) topological, (iii) thermodynamic, (iv) electronic, (v) structural and (vi) E-state parameters. After excluding descriptors with a variance of less than 0.0001, a total of 201 descriptors were retained from Cerius2. From the 197 descriptors obtained from the Dragon software, 100 descriptors (constitutional and functional group counts) were selected after applying a 99 % intercorrelation cut-off. Additionally, 22 extended topochemical atom (ETA) indices (both the 1st and 2nd generations) were calculated using the PaDEL-Descriptor software.
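The two pre-filtering steps mentioned above (a variance cut-off and an intercorrelation cut-off) can be sketched as follows. This is a minimal stdlib-only illustration, not the procedure implemented in Cerius2 or Dragon: the function names are hypothetical, the population variance is assumed, and both filters are applied to one descriptor table here although the text applies them to different descriptor sets.

```python
import math
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def filter_descriptors(columns, min_var=1e-4, max_abs_r=0.99):
    """Return descriptor names that pass both filters, in input order.

    A descriptor is dropped if its variance is below min_var, or if its
    absolute correlation with an already kept descriptor reaches max_abs_r.
    """
    kept = []
    for name, values in columns.items():
        if pvariance(values) < min_var:
            continue  # near-constant descriptor
        if any(abs(pearson_r(values, columns[k])) >= max_abs_r for k in kept):
            continue  # redundant with a descriptor already kept
        kept.append(name)
    return kept


# Hypothetical descriptor table: 'b' duplicates 'a', 'c' is constant.
table = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [5, 5, 5, 5], "d": [1, 0, 1, 0]}
print(filter_descriptors(table))
```

Keeping the first descriptor of each correlated pair is one simple design choice; other tie-breaking rules are equally plausible.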

2.3 Dataset Splitting

Descriptor based QSAR models were developed from a total pool of 185 compounds. For the descriptor based QSAR models, 97 compounds were randomly selected as the training set and the remaining 88 compounds were used as the test set. For the 3D pharmacophore models, in contrast, 4 compounds had a milk/plasma concentration ratio of zero; these 4 compounds (one from the training set and three from the test set) were deleted from the dataset in order to construct meaningful pharmacophore models. Thus, for the pharmacophore models, the training and test sets comprise 96 and 85 compounds, respectively.
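A random 97/88 division of 185 compounds can be reproduced in outline with the sketch below. The seed, the function name and the use of integer compound identifiers are assumptions made for illustration; the text does not say how the random selection was generated.

```python
import random


def random_split(ids, n_train, seed=0):
    """Shuffle the identifiers with a fixed seed and cut off the first n_train."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


# 185 hypothetical compound identifiers, split 97 / 88 as in the text.
train_ids, test_ids = random_split(range(185), n_train=97)
print(len(train_ids), len(test_ids))
```

Recording the seed (or the resulting identifier lists) is what makes such a split reproducible by others.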

2.4 Model Development

Three different types of models were built in the present work: (a) a classification based QSAR model to identify the major discriminatory features between higher and lower risk agents, (b) regression based QSAR models to quantify the contribution of essential molecular attributes, and (c) 3D pharmacophore models to identify the features crucial for the safety profile of the molecules. The descriptor based QSAR models were built using stepwise multiple linear regression (stepwise MLR)[29] and linear discriminant analysis (LDA) techniques.[30] The 3D pharmacophore model was developed using conformers obtained from the BEST method of conformer generation, based on conformational analysis of the molecules using the poling algorithm.[31] The 3D pharmacophores were generated using the HypoGen module implemented in the Discovery Studio 2.1[32] software. A minimum of 0 and a maximum of 5 features, including hydrogen bond acceptor, hydrophobic, positive ionizable, negative ionizable and ring aromatic features, were selected for the 3D pharmacophore generation. A value of 3 was assigned to the uncertainty parameter.[33] The 10 hypotheses thus generated were analyzed in terms of their correlation coefficients and cost function values. A large difference (more than 60 bits) between the total cost and null cost values reduces the probability of chance correlation for the developed hypotheses.

2.5 Software

Software tools such as Discovery Studio 2.1,[32] STATISTICA 7.0,[34] SPSS 9.0[35] and MINITAB 14[36] have been used in the present study for developing the in silico models.

2.6 Validation Metrics

Various statistical metrics were employed to check the fitness of the developed models, and different internal, external and overall validation methodologies were subsequently employed for model validation.


2.6.1 Metrics for Classification Based QSAR Models

To evaluate the performance and classification capability of the classifier model, a number of statistical tests have been employed. These tests include computation of the Wilks’ λ statistic,[37] the canonical index (Rc),[38] the Matthews correlation coefficient (MCC) and the squared Mahalanobis distance,[37] and plotting of the Receiver Operating Characteristic (ROC) curve.[39] The method used to select the descriptors was based on the Fisher–Snedecor parameter (F), which determines the relative importance of candidate variables.[37] The quality of the developed classification model has also been judged using the chi-square (χ²) statistic, which tests the independence between two groups or classes; a higher value of this parameter indicates greater separability between groups, i.e. a good classification analysis. We have also considered the value of ρ,[40] a parameter defined as the ratio of the number of compounds in the training set to the number of descriptors present in the discriminant model. Besides these, the external predictive ability of the model was also determined based on different statistical parameters such as recall (or sensitivity), specificity, accuracy, precision and F-measure.[41] The ROC curve, obtained by plotting the sensitivity and (1-specificity) indices along the Y and X axes respectively, identifies the discrimination ability of the classification system. The performance of a diagnostic variable can be quantified by calculating the area under the ROC curve (AUROC). The ideal test would have an AUROC of 1, whereas a random guess would have an AUROC of 0.5. In this study we have calculated two additional parameters,[42] namely the ROC graph Euclidean distance (ROCED) and the ROC graph Euclidean distance corrected with the Fitness Function FIT(λ) (ROCFIT), to obtain more interpretable results. In the present study, we have also used another validation diagram, the PDD,[43] to verify the degree and extent of discrimination achieved in the training as well as test set observations.

2.6.2 Metrics for Regression Based QSAR Models

The robustness of the regression model was verified using different types of validation criteria. The quality of the equations was judged by the quality metric $R^2$ and the explained variance ($R^2_a$), while the internal predictive ability of the models was judged based on the leave-one-out cross-validation parameter $Q^2_{LOO}$ and leave-many-out (LMO) cross-validation metrics (in this study, we have used L-10 %-O, L-50 %-O and L-20 %-O). External validation has been performed with classification based statistical parameters.
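To make the difference between the fitting metric and the leave-one-out metric concrete, the following sketch computes both for a one-descriptor least-squares line. It is a simplified stand-in for the multi-descriptor stepwise MLR used in the paper; the function names and data are hypothetical, and the cross-validated metric is computed with the common definition 1 − PRESS / Σ(y − ȳ)², which is assumed here rather than taken from the text.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Coefficient of determination of the line fitted to all points."""
    slope, intercept = fit_line(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


def q2_loo(x, y):
    """Leave-one-out Q2: each point is predicted by a model fitted without it."""
    my = sum(y) / len(y)
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and responses.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(r_squared(xs, ys), q2_loo(xs, ys))
```

Because each left-out point is predicted by a model that never saw it, the leave-one-out value cannot exceed the fitted value for least-squares models, which is why it is the more conservative indicator.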

2.6.3 Metrics for 3D Pharmacophore Model

Validation of the obtained pharmacophore model was done using two procedures: Fischer’s validation (as available in the HypoGen module) for the training set, and external validation using the test set prediction method. External validation has been performed with classification based statistical parameters (sensitivity, specificity, accuracy, precision and F-measure) to judge the predictive quality of the pharmacophore model.

2.7 Y-Randomization Test

The robustness of the models was checked using the randomization technique, in which the activity data of the training set molecules were scrambled while keeping the descriptor matrix unchanged, and new models were built based on the permuted activity data. For a robust model, the squared correlation coefficient ($R^2$) of the non-randomized model should exceed the average squared correlation coefficient of the randomized models ($R^2_r$) by far. In the case of the QSAR model, the model randomization (the Y variable was scrambled based on the unaltered model descriptors) was performed at the 99 % confidence level, followed by calculation of the $^cR^2_p$ parameter,[44] which penalizes the model $R^2$ for small differences between the values of $R^2$ and $R^2_r$. Additionally, in the case of the pharmacophore model, the Fischer randomization technique at the 95 % confidence level was employed to judge whether the pharmacophore model was significant or a mere outcome of chance. In an ideal case, the average value of $R^2$ for the randomized models should be zero, i.e. $R^2_r$ should be zero. Accordingly, we have calculated the metric $^cR^2_p$ using the following formula for both the QSAR and pharmacophore models:

$$^cR^2_p = R \times (R^2 - R^2_r)^{1/2} \tag{3}$$

For an acceptable model, the value of $^cR^2_p$ should be more than 0.5.
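Equation 3 and the surrounding scrambling procedure can be sketched as below. The helper names are hypothetical, the number of scrambling trials is an arbitrary illustrative choice, and `r2_of` stands for any user-supplied routine that refits the model on a scrambled response vector and returns its squared correlation coefficient.

```python
import math
import random


def mean_randomized_r2(y, r2_of, n_trials=100, seed=0):
    """Average R2 over models refitted on randomly permuted responses.

    r2_of is a caller-supplied function (hypothetical) that takes a response
    list and returns the R2 of the model refitted with unchanged descriptors.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        y_perm = list(y)
        rng.shuffle(y_perm)  # scramble the activities only
        total += r2_of(y_perm)
    return total / n_trials


def c_rp2(r2, r2_random):
    """Equation 3: cRp2 = R * sqrt(R2 - R2r), with R = sqrt(R2)."""
    diff = max(r2 - r2_random, 0.0)  # guard against a negative difference
    return math.sqrt(r2) * math.sqrt(diff)


def passes_y_randomization(r2, r2_random, threshold=0.5):
    """Acceptance rule quoted in the text: cRp2 should be more than 0.5."""
    return c_rp2(r2, r2_random) > threshold


# Hypothetical values: model R2 = 0.64, mean randomized R2 = 0.15.
print(c_rp2(0.64, 0.15), passes_y_randomization(0.64, 0.15))
```

When the randomized models explain nothing ($R^2_r = 0$), the metric simply equals the model $R^2$; the penalty grows as the scrambled models approach the real one.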

2.8 Applicability Domain (AD) Test

According to OECD principle 3, a QSAR model should be reported with a defined domain of applicability. Technically, the AD represents the chemical space defined by the structural information of the chemicals used in model development, i.e., the training set compounds in a QSAR analysis.[45] To predict the property of any new compound with confidence, the AD of the developed model should be checked. Here, we have checked the applicability domain of our developed models using the Euclidean distance approach.[46]
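The text does not spell out how the Euclidean distance approach was applied, so the sketch below shows only one common variant, stated here as an assumption: each compound is scored by its mean Euclidean distance to the training compounds in descriptor space, and a query compound is considered inside the domain if its score does not exceed the largest score observed within the training set. All names and data are hypothetical.

```python
import math


def mean_distance(point, reference):
    """Mean Euclidean distance from one descriptor vector to a set of vectors."""
    return sum(math.dist(point, r) for r in reference) / len(reference)


def in_domain(query, training):
    """True if the query is no farther from the training set than its most remote member.

    Assumed rule, not taken from the paper: compare the query's mean distance
    with the maximum mean distance found among the training compounds.
    """
    limit = max(mean_distance(t, training) for t in training)
    return mean_distance(query, training) <= limit


# Hypothetical two-descriptor training set (already scaled to 0..1).
training_set = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(in_domain((0.5, 0.5), training_set), in_domain((5.0, 5.0), training_set))
```

Descriptors should be on a common scale before distances are computed; otherwise a single large-valued descriptor dominates the result.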

2.9 Screening of the DrugBank Database

The DrugBank database[24] is a unique resource that combines detailed drug data with comprehensive drug target information. The database contains 6711 drug entries, including 1441 FDA-approved small drug molecules, 134 FDA-approved biotech drugs, 84 nutraceuticals and 5084 experimental drugs. We have downloaded all drugs (6508 molecular structures were available) in sdf format from http://www.drugbank.ca/. The DrugBank database molecules were initially classified into higher and lower risk agents using the developed LDA model, and then the developed regression equation was used to quantify their risk in terms of log(0.01+M/P) values. Based on the set threshold potency value, the DrugBank database compounds were again classified into higher and lower risk classes using the predicted risk values, and these classes were matched with the classification results. Then, for feature based screening, the DrugBank compounds were mapped using the pharmacophore model. Thus, a large number of compounds were screened for their toxic potential using both descriptor based and feature based in silico models.

3 Results

3.1 Results Obtained from Classification Based QSAR Analysis

Considering the potency threshold value, out of 185 compounds, 108 are identified as lower risk and 77 as higher risk compounds with respect to secretion into breast milk. Out of the 97 discrimination set (training set) compounds, 39 compounds belonged to the higher risk class and 58 compounds belonged to the lower risk group. Similarly, 38 and 50 compounds belonging to the higher and lower risk groups, respectively, constituted the test set. The values of the descriptors varied widely in magnitude. To determine the relative importance of a given descriptor easily from the magnitude of its coefficient, each descriptor set (discrimination as well as test set) was scaled within the range 0 to 1 according to the following equation:

$$D' = (D - D_{min}) / (D_{max} - D_{min}) \tag{4}$$

In Equation 4, $D'$ is the scaled descriptor, $D$ is the unscaled value of the descriptor, and $D_{max}$ and $D_{min}$ are the maximum and minimum values of the particular descriptor considering all training set compounds.
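Equation 4 can be illustrated with a short sketch in which the minimum and maximum are taken from the training set only and then reused for the test set, as the text describes. The function names and descriptor values are hypothetical.

```python
def fit_minmax(train_values):
    """Return (D_min, D_max) of one descriptor over the training compounds."""
    return min(train_values), max(train_values)


def scale(value, d_min, d_max):
    """Equation 4: D' = (D - D_min) / (D_max - D_min)."""
    if d_max == d_min:
        raise ValueError("descriptor is constant in the training set")
    return (value - d_min) / (d_max - d_min)


# Hypothetical descriptor values.
train = [2.0, 4.0, 10.0]
test = [3.0, 12.0]

d_min, d_max = fit_minmax(train)
print([scale(v, d_min, d_max) for v in train])  # training values fall in 0..1
print([scale(v, d_min, d_max) for v in test])   # test values may fall outside 0..1
```

Because the limits come from the training compounds alone, a test compound with a more extreme descriptor value is scaled outside the 0 to 1 range, which can itself hint that the compound lies near the edge of the model's domain.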

LDA was performed using the stepwise method of variable selection with the objective function F = 4 for inclusion, F = 3.9 for exclusion and a tolerance value of 0.001, followed by the discriminant analysis. The calculated and predicted classification categories of the discrimination and test sets, respectively, were determined at the 50 % probability level. The discriminant function DP is represented by the following equation:

$$
\begin{aligned}
DP ={}& 20.755\,nAB + 6.590\,nRCONHR - 34.833\,nCrs \\
&+ 52.154\,(\text{Jurs-DPSA-1}) + 65.308\,\eta'_{B} - 6.587\,S_{tsC} - 35.528
\end{aligned}
\tag{5}
$$

N_Tr = 97, λ = 0.353, F(df = 6, 90) = 27.513, p < 0.0000, Rc = 0.804, squared Mahalanobis distance = 7.47, MCC_Tr = 0.763, AUROC_Training = 0.957, χ²(df = 6) = 95.84, p < 0.0000, ρ = 16.167;
N_Test = 88, MCC_Test = 0.151, AUROC_Test = 0.610, ROCED = 1.807, ROCFIT = 5.118

The LDA equation contains only 6 independent variables. The statistical data for the different validation metrics strongly supported the significance of the derived equation. All the metrics were within the acceptable limits and thus reflected the reliability and acceptability of the LDA model. The discrimination set showed the following results: sensitivity = 82.05 %, specificity = 93.10 %, precision = 88.69 %, accuracy = 88.66 % and F-measure = 85.33 %. The developed LDA model was then used to predict the test set in order to validate the model externally. The results for the test set are as follows: sensitivity = 63.16 %, specificity = 52.00 %, precision = 50.00 %, accuracy = 56.82 % and F-measure = 55.81 %. All the results were obtained based on the confusion matrix developed using the LDA technique.

Figure 1. Receiver Operating Characteristics (ROC) curves for the discrimination set (training set) and the test set.
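The sensitivity, specificity, precision, accuracy, F-measure and MCC values quoted above all follow from a 2×2 confusion matrix. The sketch below shows the arithmetic, treating the high risk class as the positive class. The confusion matrix counts are not given in the text; they are back-calculated here from the reported percentages and class sizes (39/58 for the discrimination set, 38/50 for the test set) and should be read as an assumption. With these counts the discrimination set precision comes out as 88.89 % rather than the reported 88.69 %, which may be a typographical slip in the source; all other values agree.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Standard metrics from a 2x2 confusion matrix (high risk = positive)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Counts back-calculated from the reported percentages (assumption).
train_metrics = classification_metrics(tp=32, fn=7, tn=54, fp=4)
test_metrics = classification_metrics(tp=24, fn=14, tn=26, fp=24)

for name, m in (("discrimination set", train_metrics), ("test set", test_metrics)):
    print(name, {k: round(v, 4) for k, v in m.items()})
```

Reading the two rows side by side makes the drop from the discrimination set to the test set easy to see, in particular for specificity and MCC.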

The area under the ROC curve (AUROC) was also determined to check the performance of the classification model for both the discrimination and test sets. The calculated values of AUROC for the discrimination and test sets were 0.957 and 0.610, respectively, both higher than the random-guess value of 0.5. Thus, the AUROC also supports the reliability of the developed discrimination model. The ROC curves for the discrimination and test sets are shown in Fig. 1. The parameter ROCED has a value of 0 for a perfect classifier; a value greater than 2.5 is considered to indicate a random classifier and a value above 4 a bad classifier. Our model showed a value of 1.807 for ROCED, which corresponds to a good quality of the ROC analysis. ROCFIT was calculated by dividing ROCED by the Wilks’ λ value, and an acceptable value of 5.118 was obtained. These two parameters are intended to reflect the following points: (1) the obtained model has a similar accuracy for the training and test series, (2) both training and test sets have ratings close to perfection and (3) a maximum accuracy of test set prediction. The MCC varies from −1 to +1, referring to an inverse classification and a perfect classification respectively, whereas a value of 0 corresponds to random classification performance. The present study showed MCC values of 0.763 for the training set and 0.151 for the test set. The training set shows an MCC value closer to that of the perfect classifier (MCC value of 1) than the test set.
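The ROCED and ROCFIT values can be checked numerically. The text states only that ROCFIT is ROCED divided by Wilks' λ; the ROCED expression used in the sketch below is an assumption that is not given in this section: d is the Euclidean distance of a set's (1 − specificity, sensitivity) point from the perfect classifier in ROC space, and ROCED = (|d_train − d_test| + 1)(d_train + d_test)(d_test + 1). With sensitivities and specificities taken as the fractions implied by the reported percentages, this form reproduces the reported values to within rounding.

```python
import math


def roc_distance(sensitivity, specificity):
    """Distance of a classifier's ROC point from the perfect classifier (0, 1)."""
    return math.hypot(1.0 - specificity, 1.0 - sensitivity)


def roced(d_train, d_test):
    """Assumed ROCED form: penalizes train/test disagreement and poor test performance."""
    return (abs(d_train - d_test) + 1.0) * (d_train + d_test) * (d_test + 1.0)


def rocfit(roced_value, wilks_lambda):
    """ROCFIT as described in the text: ROCED divided by Wilks' lambda."""
    return roced_value / wilks_lambda


# Fractions implied by the reported percentages (back-calculated, assumption).
d_train = roc_distance(sensitivity=32 / 39, specificity=54 / 58)
d_test = roc_distance(sensitivity=24 / 38, specificity=26 / 50)

roced_value = roced(d_train, d_test)
rocfit_value = rocfit(roced_value, wilks_lambda=0.353)
print(round(roced_value, 3), round(rocfit_value, 3))
```

Under this form a perfect classifier on both sets gives ROCED = 0, consistent with the interpretation given in the text.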

The discrimination of high risk and low risk drugs was carried out to show that the discriminant function values, expressed as % probability of activity (PA) of the LDA model, make it possible to separate the two populations. In designing the PDD, we observed that the maxima of the Ei (expectancy of getting low risk drugs) and Ea (expectancy of getting high risk drugs) values are distributed on different sides of the PA value indicating 50 % expectancy. Drugs with a PA value of 50 % or more were termed high risk drugs and those with a PA value of less than 50 % were termed low risk drugs, in both the discrimination and the test groups. The PDD is presented in Figure 2 for the discrimination set and the test set. Although Figure 2 shows some overlap of Ei in the Ea region for both the discrimination set and the test set, the overlapping blocks are few in number and low in height. Lower overlap between Ei and Ea indicates a more meaningful PDD. The PDDs for both sets are significant and reliable. Using the PDD, it is possible to discriminate between higher and lower risk drugs within a structurally heterogeneous set of compounds, and it constitutes a valuable tool in the validation of the discriminant analysis in our study.
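A rough sketch of how such a diagram can be tabulated is given below. This section does not restate how Ea and Ei are computed, so the code assumes the definition commonly used for pharmacological distribution diagrams: within each PA interval, Ea = a / (i + 100) and Ei = i / (a + 100), where a and i are the percentages of high risk and low risk compounds falling in that interval. The PA values and the 10 % bin width are hypothetical.

```python
def classify_by_pa(pa_percent: float, cutoff: float = 50.0) -> str:
    """Label a compound from its % probability of activity (PA), using the 50 % cut-off from the text."""
    return "high risk" if pa_percent >= cutoff else "low risk"


def expectancies(pa_high, pa_low, bin_width: float = 10.0):
    """Return {bin_start: (Ea, Ei)} for PA values of known high and low risk compounds.

    Assumed definitions: Ea = a / (i + 100), Ei = i / (a + 100), with a and i
    the percentages of high and low risk compounds in the PA interval.
    """
    if not pa_high or not pa_low:
        raise ValueError("both groups need at least one compound")
    n_bins = int(100 / bin_width)

    def percent_per_bin(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(int(v // bin_width), n_bins - 1)  # PA = 100 goes into the last bin
            counts[idx] += 1
        return [100.0 * c / len(values) for c in counts]

    a = percent_per_bin(pa_high)
    i = percent_per_bin(pa_low)
    return {k * bin_width: (a[k] / (i[k] + 100.0), i[k] / (a[k] + 100.0))
            for k in range(n_bins)}


# Hypothetical PA values (in %) for compounds of known class.
pdd = expectancies(pa_high=[95, 85, 92, 40], pa_low=[5, 12, 8, 60])
print(pdd[90.0], pdd[0.0])
```

In a well separated diagram the Ea maxima sit in the bins above 50 % and the Ei maxima in the bins below it, which is the pattern described for Figure 2.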

Figure 2. PDD for the discrimination set (training set) and the test set.

A contribution plot (Figure 3) was developed for the best discriminating descriptors by taking the product of their average values with their corresponding coefficients as given in the discriminant Equation 2. Based on the contribution plot, it can be inferred that the indices Jurs-DPA-1 and nAB are the two most significant features for discrimination between the high and low risk chemicals, and they show positive contributions towards lower risk chemicals. Jurs descriptors combine shape and electronic information to characterize molecules. The descriptors are calculated by mapping atomic partial charges on solvent-accessible surface areas of individual atoms. Jurs-DPA-1 indicates the difference in charged partial surface areas, that is, partial positive solvent-accessible surface area minus partial negative solvent-accessible surface area. Differences in charged partial surface areas of polychlorinated biphenyl (PCB) compounds are in the negative range and hence they are termed high risk chemicals. On the other hand, compounds like Noscapine (102), Pefloxacin (109) and Timolol (141) have high differences in partial surface areas and they are identified as low risk chemicals. nAB, a constitutional index, indicates the number of aromatic bonds present in a chemical. It is interesting to point out that, except for PCB 33 (163), the 12 PCB compounds taken for model development do not contain any aromatic bond, which strongly supports that a high number of aromatic bonds results in low toxic chemicals. Though η′B and nRCONHR contribute positively for both groups of chemicals, η′B has a more positive contribution towards high risk chemicals and nRCONHR has a more positive contribution towards low risk chemicals. The descriptor η′B belongs to the class of ETA indices[47], which encode the topological environment of the molecule. η′B is a branching index referring to the extent of branching in the molecular structure and, according to the contribution plot, the molecular structure of compounds should be less branched. nRCONHR is the count of secondary aliphatic amide functional groups. In the training set, 17 chemicals have a secondary aliphatic amide group and all of them are categorized as low risk chemicals, which is also observed from the contribution plot. On the other hand, nCrs (the number of ring secondary C(sp3) atoms) contributes negatively towards both groups and has a larger contribution towards higher toxic chemicals. Compounds like Doxorubicin (50), Venlafaxine (148) and Dieldrin (43), containing a higher number of ring secondary C(sp3) atoms, are classified as high toxic chemicals. Though the index S_tsC (acetylenic carbon expressed by the E-state index of the fragment ≡C–) exerts the same contribution to both higher and lower risk compounds, it is also an important discriminating feature for the developed LDA model.
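The contribution values described above (average descriptor value in a class multiplied by the discriminant coefficient) can be sketched as follows. Equation 2 is not reproduced in this section, so the coefficients and descriptor values below are purely hypothetical and only illustrate the arithmetic.

```python
def average_contributions(coefficients, descriptor_rows):
    """Average contribution of each descriptor for one class of compounds.

    coefficients: {descriptor name: coefficient in the discriminant function}
    descriptor_rows: list of {descriptor name: value}, one dict per compound
    """
    n = len(descriptor_rows)
    return {
        name: coef * sum(row[name] for row in descriptor_rows) / n
        for name, coef in coefficients.items()
    }


# Hypothetical coefficients and descriptor values (not the values of Equation 2).
coefficients = {"nAB": 0.5, "nCrs": -0.25}
low_risk = [{"nAB": 12, "nCrs": 0}, {"nAB": 6, "nCrs": 2}]
high_risk = [{"nAB": 0, "nCrs": 4}, {"nAB": 0, "nCrs": 8}]

low_contrib = average_contributions(coefficients, low_risk)
high_contrib = average_contributions(coefficients, high_risk)
print(low_contrib, high_contrib)
```

Plotting the two dictionaries side by side, one bar per descriptor and class, gives a chart of the kind shown in Figure 3.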

The AD study was performed using the program EUCLIDEAN developed in our laboratory.[48] This study suggested that seven compounds [Digoxin (44), Erythromycin (52), Rosaramicin (128), Roxithromycin (129), PCB 209 (162), PCB 206 (164) and Gentamicin (185)] fall outside the AD of the model and hence their predictions are less reliable. Thus, predictions for 92.05 % of the test set compounds are quite reliable (Figure 4).
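The internals of the EUCLIDEAN program are not described in this section. The sketch below shows one common form of a Euclidean distance based AD check, under the assumption that each compound is scored by its mean Euclidean distance to all training compounds, that the score is normalized with the minimum and maximum scores of the training compounds themselves, and that a query with a normalized score above 1 is treated as outside the AD. The descriptor vectors are hypothetical.

```python
import math


def mean_distance(point, reference):
    """Mean Euclidean distance from one descriptor vector to a set of vectors."""
    return sum(math.dist(point, r) for r in reference) / len(reference)


def normalized_mean_distances(train, queries):
    """Scale query scores so that the training compounds span the range 0 to 1."""
    train_scores = [mean_distance(t, train) for t in train]
    lo, hi = min(train_scores), max(train_scores)
    if hi == lo:
        raise ValueError("training compounds all have the same mean distance")
    return [(mean_distance(q, train) - lo) / (hi - lo) for q in queries]


def outside_domain(train, queries):
    """Assumed rule: a normalized mean distance above 1 is outside the AD."""
    return [d > 1.0 for d in normalized_mean_distances(train, queries)]


# Hypothetical two-descriptor training set and two query compounds.
train = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
flags = outside_domain(train, [(1.0, 1.2), (10.0, 10.0)])

# Share of reliable test set predictions reported above: 7 of 88 compounds fall outside the AD.
reliable_percent = round(100.0 * (88 - 7) / 88, 2)
print(flags, reliable_percent)
```

In practice the descriptors would usually be standardized before the distances are computed, so that no single descriptor dominates the distance.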

3.2 Results Obtained from the Regression Analysis

A statistically significant regression based QSAR model was developed using stepwise MLR as the chemometric tool. The statistical quality of the model is reported together with Equation 6. The developed regression equation is as follows:

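$$
\begin{aligned}
\log(0.01 + M/P) = {} & 1.619 - 0.090\,\text{nBM} + 9.30\,\Delta\alpha_A + 0.123\,\text{nCb-} - 0.504\,\text{nRCOOH} \\
& + 0.800\,(\text{Atype-C-35}) - 2.22\,(\text{Shadow-XYfrac}) \\
& - 0.504\,(\text{Atype-C-43}) + 0.096\,(\text{Atype-C-6})
\end{aligned}
\tag{6}
$$

$$
n_{\text{Training}} = 97,\quad R^2 = 0.700,\quad R^2_{\text{adj}} = 0.673,\quad Q^2_{\text{LOO}} = 0.636,\quad \text{PRESS} = 17.503
$$

$$
n_{\text{Test}} = 88,\quad \text{Sensitivity} = 58.97\,\%,\quad \text{Specificity} = 61.22\,\%,\quad \text{Precision} = 54.76\,\%,\quad \text{Accuracy} = 56.82\,\%,\quad \text{F-measure} = 56.79\,\%
$$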
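As a minimal illustration of how Equation 6 would be applied, the sketch below evaluates the linear expression and back-transforms it to an M/P ratio. The coefficients are read from the printed equation as extracted here; the key "DaA" stands for the ETA descriptor written DaA in the equation, and the descriptor values in the example are hypothetical, not those of any compound in the data set.

```python
EQ6_INTERCEPT = 1.619
EQ6_COEFFICIENTS = {
    "nBM": -0.090,
    "DaA": 9.30,            # ETA descriptor printed as DaA in Equation 6
    "nCb-": 0.123,
    "nRCOOH": -0.504,
    "Atype-C-35": 0.800,
    "Shadow-XYfrac": -2.22,
    "Atype-C-43": -0.504,
    "Atype-C-6": 0.096,
}


def predict_log_mp(descriptors):
    """Left-hand side of Equation 6, log(0.01 + M/P), for one compound."""
    return EQ6_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in EQ6_COEFFICIENTS.items()
    )


def predict_mp(descriptors):
    """Back-transform the prediction to the M/P ratio itself."""
    return 10 ** predict_log_mp(descriptors) - 0.01


# Hypothetical descriptor values for a single compound.
example = {
    "nBM": 10, "DaA": 0.0, "nCb-": 0, "nRCOOH": 0,
    "Atype-C-35": 0, "Shadow-XYfrac": 0.5, "Atype-C-43": 0, "Atype-C-6": 0,
}
print(round(predict_log_mp(example), 3), round(predict_mp(example), 3))
```

Because the response is log-transformed with a 0.01 offset, the offset has to be subtracted again after exponentiation to recover the ratio.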

Figure 3. Average contributions of indices to the discriminant functions for higher and lower toxic groups.

The value of the parameter PRESS is low, which is desired for a good QSAR model. The model can explain 67.3 % of the variance (adjusted R²) while it can predict 63.6 % of the variance (leave-one-out Q²). The R² (0.700) and Q² (0.636) values are close to each other, which indicates a significant correlation between the experimental and predicted values of the studied compounds. For the assessment of the external prediction quality of the developed model, classification metrics were used. All the metrics were within the acceptable limits and thus reflected the reliability and acceptability of the regression model. The test set showed the following results: sensitivity = 58.97 %, specificity = 61.22 %, precision = 54.76 %, accuracy = 56.82 % and F-measure = 56.79 %. The value of cR²p (model randomization: 0.660), calculated based on the randomization results, was higher than the threshold value of 0.5 and thus ensured that the model was not the outcome of chance only.
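Two of the reported statistics can be cross-checked from the other reported numbers using standard formulas. The sketch below assumes the usual definitions of adjusted R² (with n = 97 training compounds and the 8 descriptors of Equation 6) and of the F-measure as the harmonic mean of precision and sensitivity.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R² for n compounds and p descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_measure(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (both in the same units)."""
    return 2.0 * precision * sensitivity / (precision + sensitivity)


r2_adj = round(adjusted_r2(0.700, n=97, p=8), 3)
f_test = round(f_measure(54.76, 58.97), 2)
print(r2_adj, f_test)
```

Both values reproduce the figures given with Equation 6 (0.673 and 56.79 %).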

The order of importance of the descriptors towards the toxicity of the molecules was determined using the standardized descriptor matrix. The descriptors are ranked in descending order of standardized coefficient values: nBM > ΔαA > nCb- > nRCOOH > Atype-C-35 > Shadow-XYfrac > Atype-C-43 > Atype-C-6. In Equation 6, the descriptors ΔαA, nCb-, Atype-C-35 and Atype-C-6 have positive coefficients, so the risk profile of the molecules increases with increasing values of these descriptors. The descriptors nBM, nRCOOH, Shadow-XYfrac and Atype-C-43 have negative coefficients, so the risk profile of the molecules is inversely related to the values of these descriptors.

Though it would be overambitious to explore the mechanisms of the milk/plasma partitioning behaviour of diverse compounds using the model developed here from simple, theoretically computed molecular descriptors, the models can certainly give an idea about the structural alerts for the studied toxicity. According to OECD principle 5, “It is recognized that it is not always possible, from a scientific viewpoint, to provide a mechanistic interpretation of a given (Q)SAR (Principle 5), or that there even be multiple mechanistic interpretations of a given model. The absence of a mechanistic interpretation for a model does not mean that a model is not potentially useful in the regulatory context. The intent of Principle 5 is not to reject models that have no apparent mechanistic basis, but to ensure that some consideration is given to the possibility of a mechanistic association between the descriptors used in a model and the endpoint being predicted, and to ensure that this association is documented”. Thus, our developed model complies with OECD Principle 5, which asks for a mechanistic interpretation as far as possible. The interpretation and importance of each descriptor appearing in the best regression equation are discussed below.

Figure 4. Test of AD for the classification based QSAR model by the Euclidean distance approach (mean normalized distance from the training set compounds is shown).

nBM is a constitutional index signifying the number of multiple bonds, ΔαA is a measure of the count of heteroatoms, nCb- is the number of substituted benzene C(sp2) atoms, nRCOOH is a functional group count signifying the number of aliphatic carboxylic acid groups in a compound, and Shadow-XYfrac stands for the fractional area of the molecular shadow in the XY plane over the area of the enclosing rectangle containing the molecule. The Atype-C-35, Atype-C-43 and Atype-C-6 descriptors refer to the hydrophobicity of the C atom of the molecule imparted by the presence of R--CX···X, X--CR···X and CH2RX fragments respectively. Here, R represents any group linked through carbon, X represents any heteroatom (O, N, S, P, Se and halogens), (--) represents an aromatic bond as in benzene or delocalized bonds such as the N–O bond in a nitro group, and (···) represents aromatic single bonds such as the C–N bond in pyrrole.

The AD study using EUCLIDEAN suggested that five compounds [Astemizole (3), Ethanol (29), Valproic acid (76), PCB 209 (81) and Gentamicin (88)] fall outside the AD of the model and hence their predictions are less reliable. So, predictions for 94.84 % of the test set compounds are quite reliable (Figure 5).

3.3 Analysis of the 3D Pharmacophore Model

Figure 5. Test of AD for the regression based QSAR model by the Euclidean distance approach (mean normalized distance from the training set compounds is shown).

Ten pharmacophore hypotheses (Table 1) were developed using 96 training set compounds based on the conformers obtained from the BEST method of conformer generation. All the hypotheses yielded acceptable results in terms of cost functions and correlation coefficients. The values of fixed cost and null cost, expressed in bits, differed by 110.833 bits, and such a difference implies a more than 95 % chance of true correlation for the developed 3D pharmacophore. All the hypotheses obtained were satisfactory in terms of their total cost values, which were close to the fixed cost value, and of the configuration cost, which had a value of 12.852 bits, much lower than the limiting value of 17. Good overall correlation between the observed and calculated activity data was also reflected in the satisfactory correlation coefficients of the 6 hypotheses. Subsequently, hypothesis 1 (Figure 6a) was selected as the best ranking four feature (HBA, HYD, NegIon and RA) pharmacophore based on the values of the correlation coefficient (0.752) and the cost functions (total cost = 388.851).

Table 1. Results of 10 pharmacophore hypotheses generated using conformers developed from the BEST method of conformer search.

| Hypothesis no. | Total cost[a] | RMS | Correlation | Features |
|---|---|---|---|---|
| 1 | 388.851 | 1.069 | 0.752 | HBA, HYD, NegIon, RA |
| 2 | 390.969 | 1.089 | 0.741 | HBA, HYD, NegIon, RA |
| 3 | 391.594 | 1.090 | 0.741 | HBA, HYD, NegIon, RA |
| 4 | 391.682 | 1.090 | 0.741 | HBA, HYD, NegIon, RA |
| 5 | 393.839 | 1.107 | 0.731 | HBA, HYD, NegIon, RA |
| 6 | 395.861 | 1.125 | 0.720 | HBA, HYD, NegIon, RA |
| 7 | 399.280 | 1.159 | 0.699 | HBA, HYD, NegIon, RA |
| 8 | 399.480 | 1.157 | 0.701 | HBA, HYD, NegIon, RA |
| 9 | 400.233 | 1.170 | 0.692 | HBA, HBA, HYD, RA |
| 10 | 401.737 | 1.621 | 0.690 | HBA, HYD, NegIon, RA |

[a] Config. cost = 12.852, Null cost = 444.352, Fixed cost = 333.519

The acceptability of the developed pharmacophore model was also assessed based on different qualitative validation parameters (Sensitivity = 100 %, Precision = 97.5 %, Specificity = 98.25 %, Accuracy = 98.96 %, F-measure = 98.73 %) (Table 2). The Fischer validation performed for the developed model also showed that the average correlation coefficient of the randomized models (Rr) was much lower than the corresponding correlation coefficient (R) of the non-randomized model (Rr = 0.097, R = 0.752), implying the robustness of the developed model and denoting the existence of a true correlation, with a cR²p value of 0.557. Again, the total cost of the randomized model (444.027) was close to the null cost (444.352) of the original model, which indicated that the developed model was not obtained by chance. External validation of the developed model was performed by mapping the test set molecules to the developed model, and the predicted activity data were matched with the observed ones based on the activity classification technique. This technique assesses the ability of the model to correctly predict the higher and lower toxic compounds of the test set. The ability of the model to distinguish between the two classes (higher and lower toxic) of the test set was determined based on different qualitative validation parameters (Sensitivity = 60 %, Precision = 62.79 %, Specificity = 56.76 %, Accuracy = 58.54 %, F-measure = 61.36 %). Acceptable values for all these validation parameters ensured the predictive potential of the developed 3D pharmacophore model and reflected its statistical significance.
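The validation parameters quoted above all follow from a 2 × 2 confusion matrix. The paper does not list the underlying counts in this section; the counts used in the sketch below were back-calculated so that they reproduce the reported percentages (96 training compounds, 82 mapped test compounds) and should be read as an inference rather than as reported data. The cost differences are computed from the values in the footnote of Table 1.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sensitivity, specificity, precision, accuracy and F-measure in percent."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    values = {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }
    return {name: round(100 * v, 2) for name, v in values.items()}


# Counts inferred from the reported percentages (not given in the paper).
training = classification_metrics(tp=39, tn=56, fp=1, fn=0)
test = classification_metrics(tp=27, tn=21, fp=16, fn=18)

# Cost differences in bits, from the footnote of Table 1 and the total cost of hypothesis 1.
null_minus_fixed = round(444.352 - 333.519, 3)
null_minus_total = round(444.352 - 388.851, 3)

print(training)
print(test)
print(null_minus_fixed, null_minus_total)
```

With these counts the training set values match the first row of Table 2 and the test set values match the second row to two decimals.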

Hypothesis 1 is a four feature pharmacophore displaying the significance of the following features arranged at specific distances: HBA, HYD, NegIon and RA. The vectors for the HBA features indicate the direction of formation of the hydrogen bond between the electronegative atom of the drug/chemical molecule and a neighboring electropositive hydrogen atom. Additionally, the hydrophobic feature denotes the regions favorable for substitution by hydrophobic groups. Similarly, the vector for the ring aromatic feature indicates the direction of the π–π interaction between an electron rich and an electron deficient aromatic center. The NegIon feature matches atoms or groups of atoms that are likely to be deprotonated at physiological pH. A quadrilateral having a definite shape is obtained by placing the four features at the vertices. The distance between the NegIon feature and the hydrophobic feature is 7.191 Å, while the ring aromatic feature is placed at distances of 4.144 Å and 4.051 Å from the HBA and hydrophobic features respectively. The distance between the hydrophobic and NegIon features is 3.597 Å. The diagonal of the quadrilateral obtained by joining the ring aromatic and the hydrophobic features makes an angle of 57.132° with the HBA feature (Figure 6a). Molecules bearing substitutions that match the vertices of the quadrilateral thus formed, and thereby capture all the pharmacophoric features, may exhibit minimum toxicity.

Figure 6b shows the mapping of one of the least toxic compounds, tolmetin (143), to the developed pharmacophore. The phenyl ring and the ketonic group connecting the pyrrole and phenyl rings capture the RA and the HBA features respectively. The methyl group attached to the nitrogen atom of the pyrrole ring captures the hydrophobic feature, and the hydroxyl group of the acetic acid moiety substituted at the 5 position of the pyrrole ring captures the NegIon feature of the developed pharmacophore. Thus, the molecule capturing all the features exerts the least risk of secretion through breast milk. On the contrary, the compounds p,p′-DDE (156), p,p′-DDT (157) and Hexachlorobenzene (161), lacking the required substituents, fail to map with all the pharmacophoric features, which in turn accounts for their higher risk potential. Figure 6c shows the mapping of one of the most toxic compounds, p,p′-DDD (155), to the developed toxicophore.

4 Results of Screening of the DrugBank Database

Table 2. Statistical results in % according to qualitative validation parameters and randomization results for the best pharmacophore hypothesis-1.

| Quality measures | Sensitivity | Specificity | Precision | Accuracy | F-measure | cR²p |
|---|---|---|---|---|---|---|
| Training set | 100 | 98.25 | 97.5 | 98.96 | 98.73 | 0.557 |
| Test set | 60 | 56.76 | 62.791 | 58.54 | 61.36 | – |

The DrugBank database[24] compounds were used in our study for screening. The screening approach was divided into two segments: (i) QSAR based screening and (ii) 3D pharmacophore feature based screening. For the first segment, 6508 molecular structures were imported into the Cerius2 and DRAGON 6 software for computation of the descriptors appearing in the best QSAR models. As the descriptor calculation algorithm of the DRAGON software could not handle 116 compounds, descriptors could not be calculated for them. Finally, 6392 DrugBank compounds were used for screening. As discussed in Section 2.9, the 6392 compounds were initially classified as higher (P) or lower (N) risk agents using the developed LDA model. Equation 4 was then used to quantify their risk and, based on the set potency value, each compound was again classified into the higher (P) or lower (N) risk group using the predicted M/P ratio values, and the result was compared with the classification results.

Figure 6. (a) Pharmacophore obtained from hypothesis 1 by monitoring the positions of the different features, (b) mapping of the least toxic compound (compound no. 143) to the developed pharmacophore and (c) mapping of the most toxic compound (compound no. 155) to the developed pharmacophore. Shown are ring aromatic sphere (RA), hydrophobic group (HYD), negative ionizable group (NegIon) and hydrogen bond acceptor (HBA) features with vectors in the direction of the putative hydrogen bonds.

Identical predictions were observed for 3606 compounds (2877 compounds predicted as L by both models and 729 compounds predicted as H by both models) and dissimilar predictions were found for 2786 compounds (810 compounds predicted as H in the LDA model and L in the QSAR model and 999 compounds predicted as L in the LDA model and H in the QSAR model). Out of the 4883 experimental drugs, common predictions were observed for 2784 compounds (2258 compounds predicted as L and 526 compounds predicted as H by both models) and dissimilar predictions were found for 2099 compounds (1382 compounds predicted as H in the LDA model and L in the QSAR model and 717 compounds predicted as L in the LDA model and H in the QSAR model). For approved drugs, the same prediction was given by both models for 527 of the 935 approved drugs (129 drugs predicted as H and 398 drugs predicted as L). A majority of the compounds (3606 of 6392, about 56 %) were thus classified into the same group by both models, which was encouraging for the developed models.

In the feature based screening, the DrugBank compounds were mapped to the developed pharmacophore model (Hypothesis 1) with 0 omitted features to find the compounds that map all four features. Interestingly, 597 compounds mapped to the developed 3D pharmacophore model. Of these 597 compounds, only 6 were withdrawn compounds, which indicated that the developed pharmacophoric features could map the less risky chemicals well. Finally, the results for each drug group, the number of cases predicted as N or P by both QSAR models, and the number of cases mapped to the pharmacophore with zero omitted features are summarized in Table S1 and Table S2 in the Supporting Information.
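The comparison of the two QSAR models amounts to cross-tabulating two label lists and reporting the share of concordant predictions. The sketch below shows the tally on a small hypothetical label set and then converts the concordance counts reported above into percentages; the helper names are illustrative only.

```python
from collections import Counter


def agreement_table(lda_labels, qsar_labels):
    """Count every (LDA label, QSAR label) pair, e.g. ('H', 'L')."""
    return Counter(zip(lda_labels, qsar_labels))


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Hypothetical labels for four compounds (H = higher risk, L = lower risk).
table = agreement_table(["H", "L", "L", "H"], ["H", "L", "H", "L"])
concordant = table[("H", "H")] + table[("L", "L")]

# Concordance shares computed from the counts reported in the text.
all_compounds = percent(3606, 6392)
experimental = percent(2784, 4883)
approved = percent(527, 935)
pharmacophore_hits = percent(597, 6392)

print(concordant, all_compounds, experimental, approved, pharmacophore_hits)
```

On the reported counts, the two models agree for roughly 56 to 57 % of the compounds in each group, and about 9 % of the screened compounds map all four pharmacophore features.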

Regarding the AD issue, we again used the Euclidean distance approach to see which of the DrugBank compounds fall outside the AD of the developed regression based and classification based QSAR models. Out of 6392 compounds, 252 compounds (around 3.94 % of the DrugBank compounds) for the classification based model and 1262 compounds (around 19.74 % of the DrugBank compounds) for the regression based model were found to lie outside the applicability domain, and hence predictions for these compounds are less reliable. The compound numbers that are outside the applicability domain are listed in Table S3 and Table S4 in the Supporting Information.

5 Overview and Conclusion

The essential molecular fragments and their degree of contribution to milk/plasma partitioning of the molecules were determined from the robust and predictive QSTR models built in the present work using 185 structurally diverse compounds, including drugs and environmental pollutants. Additionally, the crucial features constituting the pharmacophore of the molecules were analyzed based on the 3D pharmacophore model. Satisfyingly, the results obtained from the regression based and classification based QSTR models and the pharmacophore model were complementary to each other with respect to the requisite features and structural fragments that make it possible to discriminate between higher and lower risk compounds. The QSTR models quantitatively determine the influence of various structural features on the risk of the molecules, while the pharmacophore model identifies the features required for lower risk chemicals.

Unsaturation and a higher number of aromatic bonds strongly account for lower risk due to the lower partitioning of the molecules into breast milk from the plasma. Secondary aliphatic amides and aliphatic carboxylic groups are conducive to lower risk of the studied molecules. Fragments like R--CX···X and CH2RX account for an increased risk profile of the molecules, while fragments like X--CR···X are required to reduce their risk profile. Four features (HBA, HYD, NegIon and RA) arranged at specified distances are essential for diminishing a compound's excretion into breast milk from plasma. Our in silico models can predict the extent of a compound's excretion into breast milk for the DrugBank molecules, which were not included in model generation, and can distinguish between higher and lower risk molecules. It is conceivable that our in silico approach offers an efficient screening method for predicting the risk of secretion of drugs into milk and can be used together with existing in vitro methods to increase overall predictivity at the early stages of drug development.

Declaration of Conflict of Interest

The authors declare no conflict of interest.

Acknowledgements

SK thanks the Department of Science and Technology, Government of India for awarding him a Research Fellowship under the INSPIRE Scheme. KR thanks the Council of Scientific and Industrial Research (CSIR), New Delhi for awarding a major research project (No. 01(2546)/11/EMR-II).

References

[1] UNICEF Report: http://www.unicef.org/nutrition/index_24824.html, http://www.un.org/apps/news/story.asp?NewsID=41893&Cr=children&Cr1#.UGFaSfBi9eY.

[2] Queensland Health Report, http://www.health.qld.gov.au/breastfeeding/importance.asp.

[3] C. R. Howard, R. A. Lawrence, Clin. Perinatol. 1999, 26, 447 – 478.

[4] A. S. Goldman, Pediatr. Infect. Dis. J. 1993, 12, 664.

[5] A. Llewellyn, Z. N. Stowe, J. Clin. Psychiat. 1998, 59, 41 – 52.

[6] L. A. Larsen, S. Ito, G. Koren, Ann. Pharmacother. 2003, 37, 1299.

[7] A. R. Katritzky, D. A. Dobchev, E. Hur, D. C. Fara, M. Karelson, Bioorg. Med. Chem. 2005, 13, 1623 – 1632.

[8] S. Jensen, New Sci. 1966, 32, 612.

[9] E. P. Laug, F. M. Kunze, C. S. Prickett, AMA Arch. Ind. Hyg. Occup. Med. 1951, 3, 245 – 246.

[10] J. C. Fleishaker, Adv. Drug Del. Rev. 2003, 55, 643 – 652.

[11] H. R. Chao, S. L. Wang, P. H. Su, H. Y. Yu, S. T. Yu, O. Papke, J. Hazard. Mater. 2005, 121, 1 – 10.

[12] D. Costopoulou, I. Vassiliadou, A. Papadopoulos, V. Makropoulos, L. Leondiadis, Chemosphere 2006, 65, 1462 – 1469.

[13] S. Ito, A. Lee, Adv. Drug Del. Rev. 2003, 55, 617 – 627.

[14] G. Schneider, S. So, Modeling Structure-Activity Relationships, Landes Bioscience, Georgetown, 2002.

[15] J. R. Rabinowitz, M. R. Goldsmith, S. B. Little, M. A. Pasquinelli, Environ. Health Perspect. 2008, 116, 573 – 577.

[16] S. Kar, K. Roy, J. Indian Chem. Soc. 2010, 87, 1455 – 1515.

[17] R. Benigni, R. Zito, Mutat. Res. 2004, 566, 49 – 63.

[18] S. Agatonovic-Kustrin, I. G. Tucker, M. Zecevic, L. J. Zivanovic, Anal. Chim. Acta 2000, 418, 181 – 195.

[19] S. Agatonovic-Kustrin, L. H. Ling, S. Y. Tham, R. G. Alany, J. Pharm. Biomed. Anal. 2002, 29, 103 – 119.

[20] C. Zhao, H. Zhang, X. Zhang, R. Zhang, F. Luan, M. Liu, Z. Hu, B. Fan, Pharm. Res. 2006, 23, 41 – 48.

[21] M. H. Abraham, J. Gil-Lostes, M. Fatemi, Eur. J. Med. Chem. 2009, 44, 2452 – 2458.

[22] M. H. Fatemi, M. Ghorbanzad'e, Eur. J. Med. Chem. 2010, 45, 5051 – 5055.

[23] OECD Document, 2007, Guidance Document on the Validation of (Quantitative) Structure–Activity Relationships [(Q)SARs] Models, ENV/JM/MONO(2007)2.

[24] C. Knox, V. Law, T. Jewison, P. Liu, S. Ly, A. Frolkis, A. Pon, K. Banco, C. Mak, V. Neveu, Y. Djoumbou, R. Eisner, A. C. Guo, D. S. Wishart, Nucleic Acids Res. 2011, 39, D1035 – 1041.

[25] K. Malone, K. Papagni, S. Ramini, N. L. Keltner, Perspect. Psychiatr. Care 2004, 40, 73 – 85.

[26] Cerius2, Vers. 4.10, Accelrys Inc., San Diego, CA, USA; http://www.accelrys.com/cerius2.

[27] DRAGON, Vers. 6, TALETE srl, Italy; http://www.talete.mi.it/products/dragon_molecular_descriptors.htm.

[28] C. W. Yap, J. Comput. Chem. 2011, 32, 1466 – 1474.

[29] R. B. Darlington, Regression and Linear Models, McGraw-Hill, New York, 1990.

[30] P. Mitteroecker, F. Bookstein, Evol. Biol. 2011, 38, 100 – 114.

[31] A. Smellie, S. L. Teig, P. Towbin, J. Comput. Chem. 1995, 16, 171 – 187.

[32] Discovery Studio 2.1, Accelrys Inc., San Diego, CA, 2010.

[33] K. Poptodorov, T. Luu, R. D. Hoffmann, in Methods and Principles in Medicinal Chemistry, Pharmacophores and Pharmacophore Searches, Vol. 2 (Eds: T. Langer, R. D. Hoffmann), Wiley-VCH, Weinheim, Germany, 2006, pp. 17 – 47.

[34] STATISTICA, STATSOFT Inc., USA; http://www.statsoft.com/.

[35] SPSS, SPSS Inc., USA; http://www.spss.com.

[36] MINITAB, Minitab Inc., USA; http://www.minitab.com.

[37] M. Gálvez-Llompart, M. C. Recio, R. García-Domenech, Mol. Divers. 2011, 15, 917 – 926.

[38] F. J. Prado-Prado, E. Uriarte, F. Borges, H. González-Díaz, Eur. J. Med. Chem. 2009, 44, 4516 – 4521.

[39] T. Fawcett, Pattern Recogn. Lett. 2006, 27, 861 – 874.

[40] A. Speck-Planche, V. V. Kleandrova, F. Luan, M. N. D. S. Cordeiro, Eur. J. Med. Chem. 2011, 46, 5910 – 5916.

[41] K. Roy, I. Mitra, Comb. Chem. High Throughput Screen. 2011, 14, 450 – 474.

[42] A. Perez-Garrido, A. M. Helguera, F. Borges, M. N. D. S. Cordeiro, V. Rivero, A. G. Escudero, J. Chem. Inf. Model. 2011, 51, 2746 – 2759.

[43] M. Murcia-Soler, F. Pérez-Giménez, F. J. García-March, M. T. Salabert-Salvador, W. Díaz-Villanueva, P. Medina-Casamayor, J. Mol. Graph. Model. 2003, 21, 375 – 390.

[44] I. Mitra, A. Saha, K. Roy, Mol. Simul. 2010, 36, 1067 – 1079.

[45] P. Gramatica, QSAR Comb. Sci. 2007, 26, 694 – 701.

[46] J. Jaworska, N. Nikolova-Jeliazkova, T. Aldenberg, Altern. Lab. Anim. 2005, 33, 445 – 459.

[47] K. Roy, R. N. Das, SAR QSAR Environ. Res. 2011, 22, 451 – 472.

[48] EUCLIDEAN (a program written in C++) was developed and validated (2012) on known data sets by Pravin Ambure (Email: [email protected]) of the Drug Theoretics and Cheminformatics Laboratory, Jadavpur University.

Received: February 11, 2013
Accepted: April 17, 2013
Published online: July 1, 2013
