ORIGINAL RESEARCH
QSAR and pharmacophore modeling of diverse aminothiazoles and aminopyridines for antimalarial potency against multidrug-resistant Plasmodium falciparum
Rahul Balasaheb Aher • Kunal Roy
Received: 2 August 2013 / Accepted: 6 March 2014
© Springer Science+Business Media New York 2014
Abstract Artemisinin antimalarials are effective frontline drugs used worldwide to treat deadly Plasmodium falciparum malaria. However, recent reports of artemisinin resistance have created an urgent need to discover new molecules active against single- and multidrug-resistant strains of P. falciparum. Against this background, we developed two-dimensional quantitative structure–activity relationship (2D-QSAR) and 3D-pharmacophore models for aminothiazole and aminopyridine compounds tested against the multidrug-resistant strain (k1) of P. falciparum. The best model was a QSAR equation developed by genetic function approximation with both linear and spline terms (Q² = 0.675; R²pred = 0.720; r²m(overall) = 0.617). It was selected on the basis of internal (Q²), external (R²pred), and overall (r²m(overall)) validation metrics and the number of descriptors used. Pharmacophore models were developed to reveal the structural requirements for activity and to classify the compounds as more active or less active antimalarials against the multidrug-resistant strain (k1) of P. falciparum. The best pharmacophore model (Hypo-1) had a correlation coefficient of 0.932. It identified one hydrogen bond acceptor, one hydrophobic aliphatic, and two ring aromatic features as the essential structural requirements for antimalarial activity. Hypo-1 also correctly classified 86.00 % of the more active compounds in the test set against the multidrug-resistant (k1) strain of P. falciparum. Both models could be used to predict the antimalarial potency of aminothiazole and aminopyridine compounds against multidrug-resistant P. falciparum.
Keywords Aminothiazoles · Aminopyridines · Multidrug-resistance · Plasmodium falciparum · QSAR · 3D-pharmacophore
Introduction
Artemisinin-based combination therapies are the recommended first-line treatments for falciparum malaria in endemic countries. However, recent signs of declining efficacy of artemisinin-based combination therapy and artesunate monotherapy in western Cambodia have raised serious alarm for global malaria control (Dondorp et al., 2009). According to the WHO World Malaria Report 2012, there were about 219 million cases of malaria in 2010, with an estimated 660,000 deaths (World Malaria Report, 2012). There is therefore an urgent need to develop new molecules with novel modes of action.
Drug discovery is a multidisciplinary effort aimed at identifying leads and optimizing them into druggable candidates. High costs, long timelines, and the disappointing pace of new approvals put pressure on the pharmaceutical industry to make the drug discovery cycle more efficient. Chemoinformatic tools are therefore used in drug discovery projects to reduce the number of molecules that must be synthesized and analyzed. Quantitative structure–activity relationship (QSAR) modeling is one such efficient chemoinformatic tool. It seeks a consistent relationship between chemical structure and biological activity and expresses it as a predictive model. Such predictive models can be used to estimate the biological activity of new compounds before synthesis and experimental testing.

Electronic supplementary material The online version of this article (doi:10.1007/s00044-014-0997-x) contains supplementary material, which is available to authorized users.
R. B. Aher · K. Roy (✉)
Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
URL: http://sites.google.com/site/kunalroyindia/
Med Chem Res
DOI 10.1007/s00044-014-0997-x
MEDICINAL CHEMISTRY RESEARCH
Aminothiazole and aminopyridine scaffolds have been reported to be active against the multidrug-resistant strain (k1) of P. falciparum (Gonzalez Cabrera et al., 2012; Paquet et al., 2012), although their molecular targets have not been reported in the literature. The k1 strain is resistant mainly to three marketed drugs: chloroquine, pyrimethamine, and cycloguanil (Wenzel et al., 2010). In the present work, we applied two ligand-based approaches, 2D-QSAR and 3D-pharmacophore modeling, to the aminothiazole and aminopyridine compounds. For this purpose, we combined two datasets of diverse aminothiazole and aminopyridine scaffolds. Both were tested against the multidrug-resistant strain (k1) of P. falciparum by the same research group using the same assay protocol (Gonzalez Cabrera et al., 2012; Paquet et al., 2012). The molecules are structurally diverse (non-congeneric), and their activities span a wide range, from 7 to 42,000 nM. A model developed from such a non-congeneric series is more broadly useful, because it can predict the activity of a wider range of compounds within a similar chemical domain.
QSAR and pharmacophore models have previously been reported for several scaffolds active against the multidrug-resistant strain of P. falciparum (k1). These include tryptanthrins (Bhattacharjee et al., 2004), alkoxylated chalcones (Xue et al., 2004), pentamidines (Athri et al., 2010), 3-carboxyl-4(1H)-quinolones (Li et al., 2013), 7-chloro-4-aminoquinolines (Sahu et al., 2011), and prodiginines (Mahajan et al., 2013). However, no 2D-QSAR or 3D-pharmacophore models have previously been reported for aminothiazole and aminopyridine scaffolds against the multidrug-resistant strain of P. falciparum (k1).

Table 1 General structural features of aminothiazole and aminopyridine derivatives
[The structure drawings are not recoverable from the extracted text. The table groups the compounds by general structure as compound nos. 1, 2–7, 8–12, 13–20, 21, 22–52, 54–82, and 83–88.]
Materials and methods
Development of 2D-QSAR models
Dataset and descriptors
The dataset is a non-congeneric series of compounds with thiazole amide, thiazole urea, and 3,5-diaryl-2-aminopyridine scaffolds. A total of 87 compounds were collected from two publications by the same research group (Gonzalez Cabrera et al., 2012; Paquet et al., 2012). The general structures of the aminothiazoles and aminopyridines are given in Table 1. Detailed structural features, together with the antimalarial potency of the compounds against multidrug-resistant (k1) P. falciparum, are given in Table S1 of the Supplementary material section. All structures were drawn in ChemDraw (CS ChemDraw 5.0) and saved in .mol format. They were then used for descriptor calculation in Cerius2 (Cerius2 version 4.10), with all descriptors computed by the Descriptor+ module. The calculated descriptors fall into five groups:

- topological: E-state index, Balaban index, kappa shape index, molecular connectivity index, subgraph count, information content indices;
- structural: H-bond donor, H-bond acceptor, Rotlbonds, MW, chiral centers;
- spatial: radius of gyration, Jurs, Area, PMI-mag, Density, Vm;
- electronic: Dipole-mag, HOMO, LUMO, Sr;
- thermodynamic: ALogP, ALogP98, AlogP_atypes, MolRef, MR, LogP.

In total, 247 descriptors were calculated. The descriptor matrix was then thinned in Cerius2 on the basis of variance, removing descriptors with variance < 0.0001. The remaining 152 descriptors were used for the cluster analysis.
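The variance-based thinning step can be sketched as follows. This is a minimal Python/NumPy illustration, not the Cerius2 implementation. The function name `thin_by_variance` is hypothetical. Population variance (ddof = 0) is an assumption, since the text gives only the 0.0001 cutoff.

```python
import numpy as np

def thin_by_variance(X, names, min_var=1e-4):
    """Remove descriptor columns whose variance is below min_var.

    X     : (n_compounds, n_descriptors) descriptor matrix
    names : descriptor labels, one per column
    Assumption: population variance (ddof=0); the convention used by
    Cerius2 for its variance filter is not stated in the text.
    """
    X = np.asarray(X, dtype=float)
    keep = X.var(axis=0) >= min_var          # boolean mask of retained columns
    kept_names = [name for name, k in zip(names, keep) if k]
    return X[:, keep], kept_names

# Toy matrix: a constant column, a nearly constant column, and an informative one
X = [[1.0, 0.50000, 2.0],
     [1.0, 0.50001, 3.0],
     [1.0, 0.50000, 4.0]]
X_thin, kept = thin_by_variance(X, ["d1", "d2", "d3"])
print(kept)  # ['d3']
```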
Cluster analysis
Clustering provides a rational basis for selecting training and test set compounds. The standardized descriptor matrix was subjected to k-means clustering in SPSS (SPSS 9.0). Based on the resulting clusters, the full dataset (n = 87) was divided into two sets (Roy et al., 2012): a training set (n = 63, 72 % of the compounds) for model development and a test set (n = 24, 28 % of the compounds) for external validation. The split was made so that both sets cover the chemical space of the whole dataset.
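A cluster-based split of this kind can be sketched as below. The paper used k-means in SPSS; this stand-alone version is a hypothetical illustration. It assumes a plain Lloyd's algorithm on z-scored descriptors, a user-chosen number of clusters k, and test compounds picked evenly from within each cluster. None of these choices is stated in the text, so the sketch will not reproduce the published 63/24 split.

```python
import numpy as np

def standardize(X):
    """Z-score each descriptor column (constant columns are only centered)."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    return (X - X.mean(axis=0)) / sd

def kmeans_labels(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per compound."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign every compound to its nearest center (squared Euclidean distance)
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_based_split(X, k, test_fraction=0.28, seed=0):
    """Draw roughly test_fraction of every cluster into the test set."""
    labels = kmeans_labels(standardize(X), k, seed=seed)
    test = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        n_test = int(round(test_fraction * len(members)))
        if n_test > 0:
            # spread the test picks evenly over the cluster's members
            idx = np.linspace(0, len(members) - 1, n_test).round().astype(int)
            test.extend(members[idx].tolist())
    test_set = set(test)
    train = [i for i in range(len(X)) if i not in test_set]
    return train, sorted(test_set)

# Illustration on random descriptors (not the paper's data)
X_demo = np.random.default_rng(1).normal(size=(87, 6))
train_idx, test_idx = cluster_based_split(X_demo, k=5)
print(len(train_idx), len(test_idx))
```

Selecting from every cluster is what lets both sets span the descriptor space, which is the property the text requires of the split.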
Model development and validation
The biological activity data (IC50 values) were converted to negative logarithms (pIC50) and used as the dependent variable, with the computed descriptors as independent variables. QSAR models were developed by stepwise multiple linear regression (MLR) in MINITAB (MINITAB 14), with stepping criteria of F = 4 for inclusion and F = 3.9 for exclusion. Genetic function approximation (GFA) was also used to select the best descriptors, with the same training and test set division. It was run in the QSAR module of Cerius2 on a Silicon Graphics O2 workstation under the IRIX 6.5 operating system. The mutation probabilities were set to 50 %, with 5,000 iterations. Both linear and spline terms were used for model development.
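The conversion to pIC50 can be checked numerically. The tabulated values appear consistent with taking the negative logarithm of IC50 expressed in mM. For example, compound 87 (IC50 = 10 nM) has pIC50 = 5.000 and compound 14 (36,620 nM) has 1.436. The 300 nM classification threshold likewise corresponds to the pIC50 of 3.523 quoted for Fig. 1. The helper name below is hypothetical.

```python
import math

def pic50_mM(ic50_nM):
    """pIC50 on the scale of Tables 2 and 3: -log10(IC50 in mM).

    1 nM = 1e-6 mM, so -log10(IC50_nM * 1e-6) = 6 - log10(IC50_nM).
    """
    if ic50_nM <= 0:
        raise ValueError("IC50 must be positive")
    return 6.0 - math.log10(ic50_nM)

for nM in (10, 300, 36_620):
    print(nM, round(pic50_mM(nM), 3))
# 10 5.0
# 300 3.523
# 36620 1.436
```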
Fig. 1 Plot showing the distribution of more active and less active compounds in the training and test sets (pharmacophore model) according to the activity threshold (pIC50: 3.523)

The models were validated with both internal and external validation tools. Internal validation assesses the predictive ability of a model for the training set compounds, whereas external validation assesses it for the test set compounds. Internal validation was judged by the cross-validated squared correlation coefficient (Q²), computed from the observed and predicted activities of the training set compounds. A high Q² indicates high predictive ability. External validation was assessed by calculating R²pred for the test set compounds. For a model to be acceptable, both Q² and R²pred should exceed 0.5. We also calculated the additional metrics r²m and Δr²m for the training, test, and overall sets to assess the statistical significance, predictive potential, and robustness of the developed models (Roy et al., 2012). The developed models were also checked against the Golbraikh–Tropsha criteria (Golbraikh and Tropsha, 2002). The equations (S1–S17) used to calculate the internal, external, and overall validation parameters and the Golbraikh–Tropsha parameters are given in the supplementary material section.
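The core validation metrics can be sketched in Python as follows. This is an illustration under stated assumptions. Q² is computed here by leave-one-out cross-validation of an ordinary least-squares model, because the text does not name the cross-validation scheme. r²m and Δr²m follow the commonly used formulation r²m = r²(1 − √|r² − r0²|), evaluated for both axis orientations. The paper's own Eqs. S1–S17 are in the supplementary material and may differ in detail; all function names are hypothetical.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out Q2 for an ordinary least-squares model with intercept."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    press = 0.0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += (y[i] - A[i] @ beta) ** 2   # squared error on the left-out compound
    return 1.0 - press / ((y - y.mean()) ** 2).sum()

def r2_pred(y_test, y_test_pred, y_train_mean):
    """External R2pred; deviations are measured from the TRAINING-set mean."""
    y_test = np.asarray(y_test, dtype=float)
    y_test_pred = np.asarray(y_test_pred, dtype=float)
    return 1.0 - ((y_test - y_test_pred) ** 2).sum() / ((y_test - y_train_mean) ** 2).sum()

def _r2_and_r02(a, b):
    """r2 between a and b, and r0^2 for regressing a on b through the origin."""
    r2 = np.corrcoef(a, b)[0, 1] ** 2
    k = (a * b).sum() / (b ** 2).sum()       # slope of the through-origin fit
    r02 = 1.0 - ((a - k * b) ** 2).sum() / ((a - a.mean()) ** 2).sum()
    return r2, r02

def rm2(observed, predicted):
    """Return (average r2m, delta r2m) over both axis orientations."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    values = []
    for a, b in ((obs, pred), (pred, obs)):  # observed-vs-predicted, then swapped
        r2, r02 = _r2_and_r02(a, b)
        values.append(r2 * (1.0 - np.sqrt(abs(r2 - r02))))
    return (values[0] + values[1]) / 2.0, abs(values[0] - values[1])

x = np.arange(6.0).reshape(-1, 1)
y = 2.0 * x[:, 0] + 1.0
print(round(q2_loo(x, y), 6))  # 1.0 for noise-free linear data
```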
Randomization
We also performed Y-randomization tests, using both model and process randomization, to check that the developed models were not obtained by chance. In process randomization, the entries of the dependent-variable column are scrambled while the entire descriptor matrix is kept intact. Random models are then developed with a fresh selection of variables. In model randomization, the dependent variable is scrambled and a new model is developed using the same variables as in the original nonrandom model. Process randomization is generally carried out at the 95 % confidence level and model randomization at the 99 % confidence level. If the correlation coefficient of the nonrandom model is significantly greater than the average correlation coefficient of the randomized models, the model is considered robust and statistically significant. We also computed the corrected R²p (cR²p) metric to check the non-randomness and acceptability of the developed models (Mitra et al., 2010).

Table 2 Observed and predicted activity obtained from stepwise MLR (Eq. 1), GFA spline (Eq. 2), and GFA linear/spline (Eq. 3) models for the training set compounds (pIC50 based on IC50 in mM)
Comp. no. | Observed pIC50 | Predicted pIC50 (Eq. 1) | Predicted pIC50 (Eq. 2) | Predicted pIC50 (Eq. 3)
1 | 4.097 | 2.742 | 2.989 | 4.224
3 | 2.152 | 2.704 | 2.774 | 2.893
5 | 1.960 | 2.244 | 2.520 | 2.446
6 | 2.306 | 2.404 | 2.290 | 2.177
8 | 2.370 | 3.055 | 3.058 | 2.880
10 | 2.830 | 2.792 | 2.965 | 2.620
11 | 3.721 | 2.693 | 2.998 | 3.425
12 | 3.174 | 2.787 | 2.929 | 2.389
13 | 1.467 | 2.261 | 2.415 | 2.010
15 | 2.156 | 2.506 | 2.881 | 2.938
16 | 1.955 | 1.518 | 1.473 | 1.416
17 | 1.371 | 1.354 | 1.112 | 1.096
19 | 1.368 | 1.711 | 1.475 | 1.470
20 | 1.427 | 1.815 | 1.129 | 1.577
22 | 4.292 | 3.526 | 3.566 | 3.700
24 | 2.956 | 3.365 | 3.417 | 3.055
25 | 3.553 | 3.439 | 3.328 | 3.435
27 | 4.538 | 3.914 | 4.012 | 4.115
29 | 4.602 | 3.682 | 3.553 | 3.859
30 | 3.772 | 3.785 | 3.691 | 3.879
31 | 3.268 | 3.520 | 3.691 | 3.836
32 | 3.810 | 3.455 | 3.565 | 3.337
34 | 4.046 | 4.004 | 3.905 | 3.745
35 | 3.198 | 4.316 | 4.039 | 3.981
36 | 4.721 | 4.626 | 4.724 | 4.365
38 | 2.671 | 3.252 | 3.280 | 3.427
39 | 3.087 | 3.825 | 3.360 | 3.835
40 | 4.699 | 3.834 | 3.767 | 3.850
41 | 3.824 | 3.785 | 3.846 | 3.626
44 | 3.229 | 3.722 | 3.691 | 3.851
45 | 3.284 | 2.964 | 2.710 | 2.592
47 | 3.658 | 3.333 | 3.691 | 3.727
49 | 2.666 | 3.312 | 3.129 | 3.379
50 | 2.951 | 2.769 | 3.116 | 2.352
51 | 4.149 | 4.432 | 4.321 | 4.312
52 | 4.523 | 4.394 | 4.277 | 4.237
54 | 2.597 | 2.817 | 3.116 | 2.959
55 | 4.357 | 4.124 | 3.829 | 3.829
57 | 2.975 | 3.346 | 3.344 | 3.281
58 | 3.963 | 3.898 | 3.725 | 3.829
60 | 3.780 | 3.860 | 3.250 | 3.497
61 | 3.090 | 3.056 | 3.116 | 3.222
63 | 3.026 | 3.665 | 3.463 | 3.017
64 | 2.051 | 2.022 | 2.075 | 2.725
65 | 1.976 | 3.095 | 2.622 | 3.162
66 | 2.822 | 1.943 | 2.203 | 2.165
68 | 2.695 | 2.550 | 2.713 | 3.436
69 | 2.670 | 2.920 | 2.783 | 2.843
71 | 2.520 | 3.326 | 3.390 | 3.288
72 | 3.389 | 3.047 | 2.929 | 3.018
73 | 3.548 | 3.131 | 3.253 | 3.084
74 | 3.924 | 3.200 | 3.405 | 3.079
75 | 2.796 | 3.320 | 3.479 | 3.105
76 | 3.793 | 3.394 | 3.729 | 3.623
78 | 3.680 | 3.575 | 3.807 | 3.699
79 | 3.520 | 3.845 | 3.730 | 3.625
80 | 3.460 | 4.001 | 4.120 | 3.829
81 | 4.244 | 3.865 | 3.789 | 3.682
82 | 4.131 | 3.846 | 3.918 | 3.806
83 | 4.921 | 4.635 | 4.702 | 4.501
85 | 4.959 | 4.446 | 4.368 | 4.357
86 | 4.770 | 4.989 | 5.044 | 4.501
88 | 5.137 | 4.882 | 5.021 | 5.433
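Model randomization and the cR²p metric can be sketched as below. The sketch covers only model randomization, in which the descriptors stay the same and the response is scrambled. Process randomization would also rerun variable selection, which is not reproduced here. The number of runs and the function names are assumptions. The formula cR²p = R·√(R² − R²r) is the one attributed to Mitra et al. (2010). Plugging in the R² and R²r values reported in Table 5 for Eqs. 2 and 3 reproduces the tabulated model-randomization cR²p values to three decimals.

```python
import numpy as np

def r2_ols(X, y):
    """Coefficient of determination (R2) of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dev = y - y.mean()
    return 1.0 - (resid @ resid) / (dev @ dev)

def model_randomization(X, y, n_runs=100, seed=0):
    """Mean R2 (Rr2) after scrambling y and refitting on the SAME descriptors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    return float(np.mean([r2_ols(X, rng.permutation(y)) for _ in range(n_runs)]))

def c_rp2(r2, r2_rand):
    """Corrected Rp2 (Mitra et al., 2010): cRp2 = R * sqrt(R2 - Rr2).

    Clamped at 0 when the randomized models fit at least as well as the real one.
    """
    return float(np.sqrt(r2) * np.sqrt(max(r2 - r2_rand, 0.0)))

print(round(c_rp2(0.735, 0.073), 3))  # 0.698 (Eq. 2, model randomization)
print(round(c_rp2(0.724, 0.012), 3))  # 0.718 (Eq. 3, model randomization)
```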
Development of 3D-pharmacophore models
Because the molecular target of the aminothiazole and aminopyridine scaffolds is unknown, we identified the pharmacophoric features responsible for inhibitory activity indirectly, by deriving a pharmacophore model. Antimalarial potency against P. falciparum, expressed as IC50 values, was used as the dependent variable for pharmacophore development. The test set of the QSAR models was used as the training set for pharmacophore development, and vice versa. This was done to cross-check whether both sets cover the complete chemical space of the whole dataset. The validated pharmacophore could then be used to predict the activity of untested compounds from a similar chemical domain.
Generation of pharmacophore model
The structures of all compounds in .mol format were converted into a single SDF file with Open Babel (O'Boyle et al., 2011), which served as the input for conformation generation. After conformation generation, pharmacophores were developed with the HypoGen module of Discovery Studio (Li et al., 2000). Pharmacophore development used the training set compounds and the automatic generation procedure of Discovery Studio (Accelrys Discovery Studio 2.1), with the following parameters:

- activity uncertainty: 2.0;
- features: a maximum of five, chosen from hydrogen bond acceptor (HBA), hydrogen bond donor (HBD), hydrophobic aliphatic (HYAl), hydrophobic aromatic, and ring aromatic (RA);
- interfeature spacing: 2.97 Å;
- conformation method: BEST (poling algorithm).
Table 3 Observed and predicted activity obtained from stepwise MLR (Eq. 1), GFA spline (Eq. 2), and GFA linear/spline (Eq. 3) models for the test set compounds (pIC50 based on IC50 in mM)
Comp. no. | Observed pIC50 | Predicted pIC50 (Eq. 1) | Predicted pIC50 (Eq. 2) | Predicted pIC50 (Eq. 3)
2 | 2.190 | 2.676 | 2.861 | 2.471
4 | 1.990 | 2.320 | 2.569 | 2.793
7 | 2.833 | 3.069 | 3.398 | 2.863
9 | 2.533 | 3.514 | 2.946 | 2.302
14 | 1.436 | 1.147 | 0.973 | 1.063
18 | 1.653 | 1.985 | 1.944 | 2.059
21 | 4.310 | 3.568 | 3.565 | 3.802
23 | 3.087 | 3.382 | 3.691 | 3.183
26 | 3.921 | 3.959 | 3.815 | 3.720
28 | 3.301 | 3.612 | 3.634 | 3.476
33 | 3.745 | 3.061 | 3.248 | 3.664
37 | 4.237 | 3.963 | 3.710 | 3.902
42 | 4.097 | 4.332 | 3.769 | 4.114
43 | 4.481 | 4.534 | 4.495 | 4.098
46 | 3.481 | 3.955 | 3.821 | 4.165
48 | 2.620 | 3.942 | 3.802 | 4.146
56 | 4.167 | 3.521 | 3.549 | 3.785
59 | 4.481 | 4.018 | 3.553 | 3.790
62 | 3.971 | 3.327 | 3.310 | 3.402
67 | 3.319 | 2.941 | 3.077 | 3.556
70 | 4.009 | 3.150 | 3.225 | 3.152
77 | 3.538 | 3.644 | 3.962 | 3.449
84 | 4.721 | 4.976 | 5.164 | 4.501
87 | 5.000 | 4.654 | 4.655 | 5.433
Table 4 Comparison of statistical and validation parameters of different 2D-QSAR models
Eq. no. | Type of model | Descriptors | R² | Q² | R²pred | r²m(test) | Δr²m(test) | r²m(overall) | Δr²m(overall)
1 | Stepwise MLR | SC-2, Atype_C_25, Atype_O_60, AlogP98, Dipole-mag, CHI-V-3_C | 0.705 | 0.637 | 0.700 | 0.578 | 0.17 | 0.591 | 0.200
2 | GFA spline | <−0.199 − Jurs-FPSA-2>, <2.65 − Atype_O_60>, <1.039 − Dipole-mag>, <Atype_C_25 + 1.69> | 0.736 | 0.689 | 0.668 | 0.544 | 0.174 | 0.609 | 0.195
3 | GFA linear and spline | <0.43 − RadofGyration>, Atype_O_60, <Dipole-mag + 0.911>, Atype_N_67 | 0.724 | 0.675 | 0.720 | 0.612 | 0.119 | 0.617 | 0.180
Pharmacophore mapping and validation
The developed model was validated by mapping all test set molecules onto the pharmacophore, using the same settings as for pharmacophore generation. The ability of the model to classify compounds as more active or less active was assessed with an activity threshold of 300 nM. The training and test sets cover the total chemical space of the whole dataset and also have a uniform distribution of more active and less active compounds. The training set (n = 24) consists of 13 more active and 11 less active compounds, and the test set (n = 63) consists of 28 more active and 35 less active compounds. Fig. 1 shows the distribution of inhibitory activity (pIC50) values within the training and test sets, together with the activity threshold. Several qualitative validation parameters were computed to check how well the model distinguishes the two classes in both sets: sensitivity, specificity, accuracy, precision, F-measure, recall, and G-means. These parameters depend on four quantities, namely the numbers of true positives, true negatives, false positives, and false negatives. These counts were taken from the confusion matrix of observed versus predicted activity classes. The model is considered robust if all validation parameters exceed 50 % for both sets. The equations used to calculate the qualitative validation parameters are given in the supplementary material section (S19–S25).
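The qualitative validation parameters can be computed from the four confusion-matrix counts, as sketched below. Standard textbook definitions are assumed, with recall identical to sensitivity; Eqs. S19–S25 may differ in detail, and the function name is hypothetical. The example counts are derived from the Hypo-1 results reported in the Results section. In the training set, 11 of 13 more active and 10 of 11 less active compounds were classified correctly; in the test set, 24 of 28 and 22 of 35.

```python
import math

def classification_metrics(tp, fn, tn, fp):
    """Qualitative validation metrics from a 2x2 confusion matrix.

    Positive class = 'more active' (IC50 <= 300 nM).
    Textbook definitions are assumed here.
    """
    sensitivity = tp / (tp + fn)                 # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    g_means = math.sqrt(sensitivity * specificity)
    return {"sensitivity": sensitivity, "recall": sensitivity,
            "specificity": specificity, "precision": precision,
            "accuracy": accuracy, "f_measure": f_measure, "g_means": g_means}

train = classification_metrics(tp=11, fn=2, tn=10, fp=1)   # Hypo-1, training set
test = classification_metrics(tp=24, fn=4, tn=22, fp=13)   # Hypo-1, test set
print(f"{100 * train['sensitivity']:.2f} {100 * train['specificity']:.2f}")  # 84.62 90.91
print(f"{100 * test['sensitivity']:.2f} {100 * test['specificity']:.2f}")    # 85.71 62.86
```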
We also performed Fischer's randomization test to check whether the model was obtained by chance. The activity data of the training set molecules were scrambled. Randomized models were then generated at the 95 % confidence level, with the same settings as for pharmacophore development. If the randomized models perform better than the actual model, the actual model is considered to have been obtained by chance.
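The confidence level of such a randomization test is commonly computed in Discovery Studio studies as significance = [1 − (1 + X)/Y] × 100. Here X is the number of randomized hypotheses whose total cost is lower than that of the original, and Y is the total number of HypoGen runs. The sketch below assumes that formula. The number of randomized runs used in this work is not stated; 19 runs, the usual choice for 95 %, is an illustrative assumption.

```python
def fischer_significance(n_random_runs, n_lower_cost):
    """Significance (%) = [1 - (1 + X) / Y] * 100.

    X = number of randomized hypotheses with a total cost lower than the
        original hypothesis; Y = total number of HypoGen runs
        (the original run plus n_random_runs scrambled runs).
    """
    y = n_random_runs + 1
    return 100.0 * (y - 1 - n_lower_cost) / y

print(fischer_significance(19, 0))  # 95.0
```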
Results and discussion
2D-QSAR analysis
QSAR models were developed from a training set of 63 compounds using three chemometric tools: stepwise regression, GFA spline, and GFA linear + spline. The models were validated rigorously with several validation and statistical metrics to identify robust and statistically significant models. External validation was performed by predicting the activity of the test set compounds with the developed models. The best QSAR models obtained with each tool are presented as Eqs. 1–3.

Fig. 2 a Pharmacophore hypothesis (Hypo-1) with one hydrogen bond acceptor (HBA), one hydrophobic aliphatic (HYAl), and two ring aromatic (RA) features, with interfeature distances (Å); b Mapping of the most active compound 87 of the training set (pharmacophore mapping) on Hypo-1; c Mapping of the least active compound 14 of the training set (pharmacophore mapping) on Hypo-1, with two features missing
QSAR model using stepwise multiple linear regression
$$
\mathrm{pIC}_{50}\,(\mathrm{mM}) = 3.355 + 0.288\,\text{SC-2} + 0.61\,\text{Atype\_C\_25} - 0.266\,\text{Atype\_O\_60} - 0.253\,\text{AlogP98} - 0.248\,\text{Dipole-mag} + 0.28\,\text{CHI-V-3\_C} \tag{1}
$$

N_training = 63; R² = 0.705, R²a = 0.674, Q² = 0.637, R²pred = 0.700, r²m(training) = 0.590, Δr²m(training) = 0.210; N_test = 24; r²m(test) = 0.578, Δr²m(test) = 0.17, r²m(overall) = 0.591, Δr²m(overall) = 0.20, S = 0.550.
QSAR model using GFA spline term
$$
\mathrm{pIC}_{50}\,(\mathrm{mM}) = 1.50 - 0.841\,\langle -0.199 - \text{Jurs-FPSA-2} \rangle + 0.327\,\langle 2.65 - \text{Atype\_O\_60} \rangle + 0.245\,\langle 1.039 - \text{Dipole-mag} \rangle + 0.560\,\langle \text{Atype\_C\_25} + 1.69 \rangle \tag{2}
$$

N_training = 63; R² = 0.736, R²a = 0.717, Q² = 0.689, R²pred = 0.668, r²m(training) = 0.633, Δr²m(training) = 0.20; N_test = 24; r²m(test) = 0.544, Δr²m(test) = 0.174, r²m(overall) = 0.609, Δr²m(overall) = 0.195, S = 0.512.
QSAR model using GFA linear and spline terms
$$
\mathrm{pIC}_{50}\,(\mathrm{mM}) = 4.201 - 1.027\,\langle 0.43 - \text{RadofGyration} \rangle - 0.382\,\text{Atype\_O\_60} - 0.237\,\langle \text{Dipole-mag} + 0.911 \rangle + 0.171\,\text{Atype\_N\_67} \tag{3}
$$

N_training = 63; R² = 0.724, R²a = 0.705, Q² = 0.675, R²pred = 0.720, r²m(training) = 0.618, Δr²m(training) = 0.210; N_test = 24; r²m(test) = 0.612, Δr²m(test) = 0.119, r²m(overall) = 0.617, Δr²m(overall) = 0.180, S = 0.523.
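The bracketed terms in Eqs. 2 and 3 are GFA spline terms. Under the usual GFA convention of a truncated power spline, ⟨f⟩ equals f when f is positive and zero otherwise. A term such as ⟨0.43 − RadofGyration⟩ therefore contributes only while RadofGyration is below 0.43. The sketch below evaluates Eq. 3 as transcribed here; the function names are hypothetical. The descriptor scaling used by Cerius2 is not stated, so check that convention before feeding in raw descriptor values.

```python
def spline(value):
    """GFA truncated-power spline <value>: value if positive, else 0."""
    return value if value > 0 else 0.0

def predict_eq3(rad_of_gyration, atype_o_60, dipole_mag, atype_n_67):
    """Evaluate Eq. 3 as transcribed; descriptor scaling is an assumption."""
    return (4.201
            - 1.027 * spline(0.43 - rad_of_gyration)
            - 0.382 * atype_o_60
            - 0.237 * spline(dipole_mag + 0.911)
            + 0.171 * atype_n_67)
```

Each spline term is a hinge, so the fitted response is piecewise linear in each descriptor, with its knot at the constant inside the brackets.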
In Eq. 1, SC-2 and CHI-V-3_C are topological descriptors. Atype_C_25, Atype_O_60, and AlogP98 are hydrophobicity descriptors based on atom-type logP contributions, and Dipole-mag is the dipole moment (an electronic descriptor). The SC-2 index is the number of second-order subgraphs in the molecular graph, i.e., the number of pairs of connected edges, and it increases with molecular size. The most active molecule (88) has an SC-2 value of 49 and the least active molecule (19) a value of 22, suggesting that activity increases with molecular size. CHI-V-3_C is a Kier and Hall valence-modified connectivity index. It accounts for the electronic configuration of the atoms in a motif of four skeletal atoms in a trigonal relationship. In this dataset, the motif generally appears only in trifluoromethane and tetrafluoromethane fragments (Cerius2 QSAR+ 4.5 manual, 2000). The presence of a trifluoromethyl (–CF3) group in the aminopyridine compounds contributes positively to the inhibitory activity. Except for compounds 64 and 65, all compounds bearing a trifluoromethyl fragment have pIC50 values above 3 log units. Compounds 64 and 65 are less active despite their –CF3 fragments, possibly because of their low second-order subgraph counts (SC-2). Thus, compounds with high values of both SC-2 and CHI-V-3_C have higher antimalarial activity, as observed for compounds 83–88. This indicates that the topological descriptors encode the structural features required for activity: larger molecular size and the presence of a trifluoromethyl group both contribute positively.

The dipole moment is a 3D electronic descriptor that indicates the strength and orientation behavior of a molecule in an electrostatic field. It is estimated from partial atomic charges and atomic coordinates. The larger the dipole moment, the more polar the molecule. The negative contribution of the dipole moment therefore indicates that hydrophobicity is favorable for activity. The positive contributions of the Atype_C_25 and CHI-V-3_C indices point in the same direction. The hydrophobicity requirement is further supported by the presence of at least two aromatic rings (pharmacophoric features; vide infra) in all compounds.

Table 5 Results of process and model randomization tests (model randomization at the 99 % confidence level; process randomization at the 95 % confidence level)
Eq. no. | Type of model | Model randomization R² | Model randomization R²r | Model randomization cR²p | Process randomization R² | Process randomization R²r | Process randomization cR²p
2 | GFA spline | 0.735 | 0.073 | 0.698 | 0.735 | 0.262 | 0.589
3 | GFA linear + spline | 0.724 | 0.012 | 0.718 | 0.724 | 0.249 | 0.587

Table 6 Results of pharmacophore development for antimalarial activity against multidrug-resistant P. falciparum; configuration cost = 16.105; null cost = 193.357; fixed cost = 86.903
Hypothesis | Features | Total cost | Δcost(a) | Δcost(b) | rmsd | Correlation (R)
1(c) | HBA, HYAl, RA, RA | 103.046 | 90.311 | 16.143 | 1.158 | 0.932
2 | HBA, HYAl, RA, RA | 118.637 | 74.72 | 31.734 | 1.594 | 0.868
3 | HBA, HYAl, RA, RA | 119.086 | 74.271 | 32.183 | 1.633 | 0.860
4 | HBA, HBD, RA | 119.773 | 73.584 | 32.87 | 1.654 | 0.856
5 | HYA, RA, RA | 120.356 | 73.001 | 33.453 | 1.669 | 0.854
6 | HYA, RA, RA | 120.705 | 72.652 | 33.802 | 1.664 | 0.855
7 | HBA, HBD, RA | 120.828 | 72.529 | 33.925 | 1.677 | 0.852
8 | HBA, HYAl, RA, RA | 120.855 | 72.502 | 33.952 | 1.648 | 0.858
9 | HBA, RA, RA | 121.129 | 72.228 | 34.226 | 1.688 | 0.850
10 | HBA, HYAl, RA, RA | 121.189 | 72.168 | 34.286 | 1.640 | 0.859
(a) Δcost = null cost − total cost; (b) Δcost = total cost − fixed cost; (c) best hypothesis
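The two Δcost columns of Table 6 follow directly from the costs in its caption: Δcost(a) = null cost − total cost and Δcost(b) = total cost − fixed cost. The short check below illustrates this. The dictionary restates the Table 6 values, and the helper name is hypothetical.

```python
NULL_COST = 193.357   # from the Table 6 caption (bits)
FIXED_COST = 86.903

# Total cost of each HypoGen hypothesis, restated from Table 6
TOTAL_COST = {1: 103.046, 2: 118.637, 3: 119.086, 4: 119.773, 5: 120.356,
              6: 120.705, 7: 120.828, 8: 120.855, 9: 121.129, 10: 121.189}

def cost_differences(total, null=NULL_COST, fixed=FIXED_COST):
    """Return (null - total, total - fixed), the two delta-cost columns."""
    return null - total, total - fixed

for hypo, total in TOTAL_COST.items():
    d_null, d_fixed = cost_differences(total)
    print(f"Hypo-{hypo}: {d_null:.3f} {d_fixed:.3f}")
```

A large null-cost difference together with a small fixed-cost difference is the pattern the text uses to single out Hypo-1.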
The hydrophobic parameter Atype_C_25 contributes positively to the activity, whereas Atype_O_60 and AlogP98 contribute negatively. The Atype_C_25 index relates to the tertiary carbon atom (R–CR–R) of the benzene ring (Ghose et al., 1998). The most active compound (88) has more such benzene-ring carbon atoms than the least active compound (19). The descriptor Atype_O_60 relates to groups of the types Al–O–Ar, Ar2O, and R–O–R. Several compounds carry diaryloxy or arylalkyloxy groups: compound 64 (CH3–O–py and ph–O–CF3), compound 66 (two CH3–O–py), and compound 69 (CH3–O–py). These groups decrease the antimalarial activity compared with the most active compound (88), which has none of them.
Table 7 Observed and estimated antimalarial activity against P. falciparum of the training set compounds based on Hypo-1
Comp. no. | Observed IC50 (nM) | Estimated IC50 (nM) | Observed activity scale | Estimated activity scale
2 | 6,450 | 10,845 | L | L
4 | 10,230 | 14,694 | L | L
7 | 1,470 | 2,096 | L | L
9 | 2,930 | 469 | L | L
14 | 36,620 | 20,159 | L | L
18 | 22,210 | 21,193 | L | L
21 | 49 | 55 | H | H
23 | 818 | 516 | L | L
26 | 120 | 465 | H | L
28 | 500 | 518 | L | L
33 | 180 | 112 | H | H
37 | 58 | 98 | H | H
42 | 80 | 207 | H | H
43 | 33 | 541 | H | L
46 | 330 | 500 | L | L
48 | 2,400 | 243 | L | H
56 | 68 | 60 | H | H
59 | 33 | 29 | H | H
62 | 107 | 96 | H | H
67 | 480 | 319 | L | L
70 | 98 | 277 | H | H
77 | 290 | 147 | H | H
84 | 19 | 28 | H | H
87 | 10 | 19 | H | H
Compounds with IC50 ≤ 300 nM: more active (H); IC50 > 300 nM: less active (L)

Table 8 Observed and estimated antimalarial activity against P. falciparum of the test set compounds using Hypo-1
Comp. no. | Observed IC50 (nM) | Estimated IC50 (nM) | Observed activity scale | Estimated activity scale
1 | 80 | 1,683 | H | L
3 | 7,050 | 14,190 | L | L
5 | 10,960 | 4,204 | L | L
6 | 4,940 | 1,660 | L | L
8 | 4,270 | 647 | L | L
10 | 1,480 | 1,693 | L | L
11 | 190 | 1,797 | H | L
12 | 670 | 836 | L | L
13 | 34,110 | 4,752 | L | L
15 | 6,980 | 11,643 | L | L
16 | 11,100 | 11,590 | L | L
17 | 42,540 | 40,722 | L | L
19 | 42,870 | 40,491 | L | L
20 | 37,450 | 13,263 | L | L
22 | 51 | 98 | H | H
24 | 1,106 | 596 | L | L
25 | 280 | 86 | H | H
27 | 29 | 359 | H | L
29 | 25 | 88 | H | H
30 | 169 | 231 | H | H
31 | 540 | 455 | L | L
32 | 155 | 43 | H | H
34 | 90 | 43 | H | H
35 | 634 | 153 | L | H
36 | 19 | 31 | H | H
38 | 2,132 | 28 | L | H
39 | 818 | 449 | L | L
40 | 20 | 93 | H | H
41 | 150 | 208 | H | H
44 | 590 | 390 | L | L
45 | 520 | 536 | L | L
47 | 220 | 206 | H | H
49 | 2,160 | 210 | L | H
50 | 1,120 | 695 | L | L
51 | 71 | 207 | H | H
52 | 30 | 553 | H | L
54 | 2,527 | 20 | L | H
55 | 44 | 52 | H | H
57 | 1,059 | 54 | L | H
58 | 109 | 67 | H | H
60 | 166 | 28 | H | H
61 | 812 | 56 | L | H
63 | 942 | 39 | L | H
64 | 8,898 | 49 | L | H
65 | 10,576 | 31 | L | H
66 | 1,508 | 24 | L | H
68 | 2,018 | 311 | L | L
69 | 2,140 | 316 | L | L
71 | 3,018 | 315 | L | L
72 | 408 | 319 | L | L
73 | 283 | 104 | H | H
74 | 119 | 91 | H | H
75 | 1,598 | 20 | L | H
76 | 161 | 17 | H | H
78 | 209 | 169 | H | H
79 | 302 | 28 | L | H
80 | 347 | 66 | L | H
81 | 57 | 141 | H | H
82 | 74 | 26 | H | H
83 | 12 | 17 | H | H
85 | 11 | 18 | H | H
86 | 17 | 37 | H | H
88 | 7 | 18 | H | H
Compounds with IC50 ≤ 300 nM: more active (H); IC50 > 300 nM: less active (L)

We also performed GFA analysis to improve on the stepwise regression model. The model obtained with GFA spline terms is given in Eq. 2. In Eq. 2, Jurs-FPSA-2 is a fractional charged partial surface area (a spatial descriptor). It is calculated by dividing the total charge-weighted positive surface area by the total molecular solvent-accessible surface area. The negative contribution of Jurs-FPSA-2 suggests that ionic interactions involving positive charges on the ligand are unfavorable for interaction with the receptor protein. The descriptors Atype_C_25, Atype_O_60, and Dipole-mag contribute to activity in Eq. 2 in the same direction as in the stepwise regression equation. To improve the results further, we combined linear and spline terms in the GFA analysis (Eq. 3).

In Eq. 3, RadofGyration is a size descriptor (a spatial descriptor) describing the distribution of atomic masses in a molecule. It measures molecular compactness: smaller values are obtained when most atoms lie close to the center of mass. The equation (S18) used to calculate the radius of gyration is given in the supplementary material section. The negative contribution of RadofGyration suggests that antimalarial activity increases as molecular compactness increases. A more compact molecule may enter the active site more readily, with a consequent increase in antimalarial activity. The Atype_N_67 index is the atomic logP contribution of the Al–NH–Al group. Compounds 87 and 88, which carry such a secondary nitrogen in their piperazine ring, are highly active, suggesting that this group contributes to the high antimalarial activity. Atype_O_60 and Dipole-mag contribute to activity in the same direction as in the stepwise regression and GFA spline equations.

The observed and predicted activities of the training and test set compounds, computed from the different models, are given in Tables 2 and 3. The statistical and validation parameters of the models are compared in Table 4. Model randomization at the 99 % confidence level and process randomization at the 95 % confidence level confirmed that the models were not obtained by chance. In both tests, the average squared correlation coefficients of the randomized models (R²r) were lower than the squared correlation coefficients (R²) of the corresponding nonrandom models. The robustness of the models was further confirmed by high cR²p values (> 0.5). The results of the randomization tests are given in Table 5. All three models meet the acceptance criteria for statistical validation. However, the stepwise regression model depends on six independent variables, whereas the GFA models depend on only four. The model built from GFA linear and spline terms (Eq. 3) was therefore selected as the best, on the basis of its statistical metrics and its number of independent variables.
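The radius of gyration used in Eq. 3 can be sketched as follows. The paper's own definition (Eq. S18) is in the supplementary material and is not reproduced here. This version assumes the standard mass-weighted form about the center of mass; passing no masses gives the equally weighted variant. Which variant Cerius2 uses for RadofGyration is an assumption to verify. The toy example shows that a more compact arrangement gives a smaller value.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Rg = sqrt(sum_i m_i * |r_i - r_cm|^2 / sum_i m_i).

    coords: (n_atoms, 3) Cartesian coordinates; masses: per-atom masses.
    With masses=None every atom gets equal weight (unweighted variant).
    """
    coords = np.asarray(coords, dtype=float)
    m = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    center = (m[:, None] * coords).sum(axis=0) / m.sum()    # center of mass
    sq_dist = ((coords - center) ** 2).sum(axis=1)          # squared distances to it
    return float(np.sqrt((m * sq_dist).sum() / m.sum()))

compact = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
extended = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0]]
print(radius_of_gyration(compact, [12.0, 12.0]),
      radius_of_gyration(extended, [12.0, 12.0]))  # 1.0 2.0
```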
3D-Pharmacophore model
Ten different pharmacophore hypotheses were obtained
from a training set of 24 compounds. The pharmacophore
model (Hypo-1) with a high correlation coefficient (r: 0.932),
lower root mean square deviation (rmsd: 1.15), error 85.79,
and weight 1.14 was found to be of the acceptable quality.
The configuration cost was also within the recommended
range, which indicates that all the generated models have
been thoroughly analyzed. The actual cost for Hypo-1 is
much closer to the fixed cost with only a difference of 16.14
bits, which indicates the true correlation of the data. Again,
there is a large difference of 90.31 bits between the actual
cost and the null cost for Hypo-1. Hence, Hypo-1 was found
to be the best one among the ten hypotheses with one HBA,
one HYAl, and two ring aromatic features (Fig. 2a). The
results of the ten pharmacophore hypotheses against multidrug-resistant P. falciparum are given in Table 6. The external predictivity of the model was assessed by mapping the test set molecules onto Hypo-1 with the same settings as those used for pharmacophore generation with the BEST method. All the molecules were mapped completely. The ability of the model to classify compounds into more active and less active antimalarials was checked by comparing the observed and predicted activity classes. For this purpose, compounds with IC50 ≤ 300 nM were classified as more active and compounds with IC50 > 300 nM as less active. The observed and estimated activities of the training and test set compounds using Hypo-1 are given in Tables 7 and 8, respectively, and the values of the different validation parameters for both sets are given in Table 9. All validation parameters for both the training and test sets exceed 62.00 %, which suggests that the developed model is robust and acceptable. For the training set, the model correctly classified 11 out of 13 compounds (84.62 %) as more active and 10 out of 11 compounds (90.91 %) as less active. For the test set, it correctly classified 24 out of 28 compounds (85.71 %) as more active and 22 out of 35 compounds (62.86 %) as less active. Because the model classifies the more active test set compounds better than the less active ones, Hypo-1 is best suited for identifying more active antimalarials against multidrug-resistant P. falciparum.
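The qualitative validation parameters in Table 9 can be recomputed from the classification counts given above. The sketch below assumes the standard confusion-matrix definitions (more active, IC50 ≤ 300 nM, treated as the positive class; F-measure as the harmonic mean of precision and recall; G-means as the geometric mean of sensitivity and specificity). The false positive and false negative counts are derived as the misclassified remainder of each class, and the function name is hypothetical. With these definitions the sketch reproduces the Table 9 values for both sets.

```python
from math import sqrt

def classification_metrics(tp, fn, tn, fp):
    """Qualitative validation metrics (in %) from a 2x2 confusion matrix.

    Positive class = more active compounds (IC50 <= 300 nM).
    """
    sensitivity = tp / (tp + fn)                 # also reported as recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    g_means = sqrt(sensitivity * specificity)
    values = dict(Sensitivity=sensitivity, Specificity=specificity,
                  Recall=sensitivity, Accuracy=accuracy, Precision=precision,
                  F_measure=f_measure, G_means=g_means)
    return {name: round(100 * v, 2) for name, v in values.items()}

# Training set: 11/13 more active and 10/11 less active correctly classified.
training_set = classification_metrics(tp=11, fn=2, tn=10, fp=1)
# Test set: 24/28 more active and 22/35 less active correctly classified.
test_set = classification_metrics(tp=24, fn=4, tn=22, fp=13)
print(training_set)
print(test_set)
```

The lower specificity of the test set (13 less active compounds predicted as more active) is what pulls its precision, accuracy, and F-measure below the training set values.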
All the compounds have at least two ring aromatic (RA) features, one of which is a pyrazole, pyridine, or thiazole ring and the other a phenyl or other heterocyclic ring. These two RA features are the minimum requirement for activity against the multidrug-resistant P. falciparum strain, and they are consistent with the Atype_C_25 and SC-2 indices of the 2D-QSAR models (Eqs. 1, 2). The most active compound of the training set (87, IC50 = 10 nM) mapped all four features of Hypo-1 (Fig. 2b): the two pyridine rings lie in the RA regions, the trifluoromethyl (–CF3) group in the HYAl region, and the carbonyl oxygen in the hydrogen bond acceptor region. The least active compound of the training set (14, IC50 = 36,620 nM) lacks one HBA and one HYAl feature and thus does not map completely (Fig. 2c). The absence of these two features is associated with a more than 1,000-fold (about 3,660-fold) loss of antimalarial potency relative to compound 87. The highest antimalarial potency is attributed to two ring aromatic features (one aminopyridine and one pyridine/phenyl ring), one hydrophobic aliphatic feature (the –CF3 group), and one hydrogen bond acceptor (a carbonyl group flanked by a phenyl and a piperazine ring). These features are present only in compounds 87 and 88, which accounts for their activity of ≤ 10 nM. The HYAl feature (–CF3) of the pharmacophore is also consistent with the CHI-V-3_C descriptor of the 2D-QSAR model (Eq. 1). Replacing the HYAl feature (–CF3 group) with an ethereal oxygen R–O–Ar (Atype_O_60) reduces antimalarial potency, as reflected by the negative contribution of the Atype_O_60 index (Eqs. 1–3). The structures of the most active and least active compounds of the training set, along with their pharmacophoric features and QSAR descriptors, are given in Fig. 3. The F-test confirms the non-randomness of the developed pharmacophore (Hypo-1): the correlation coefficient of the actual model ($R$ = 0.932) is higher than the average correlation coefficient of the random models ($R_r$ = 0.64), and the total cost of the actual model (103.04) is closer to the fixed cost (86.90) than to the average cost of the randomized models (158.79). Moreover, the high value of $^{c}R_p^2$ (0.713, > 0.5) further confirms the robustness of the model. The total cost values of the original and randomized hypotheses for the F-test are given in Fig. S1 of the Supplementary materials section.

Table 9 Different qualitative validation parameters of the Hypo-1 model obtained by classification of more active and less active compounds for the training and test sets (all parameters in %)

Set                              | No. of compounds | Sensitivity | Specificity | Recall | Accuracy | Precision | F-measure | G-means
Training set (28 % of compounds) | 24               | 84.62       | 90.91       | 84.62  | 87.50    | 91.67     | 88.00     | 87.71
Test set (72 % of compounds)     | 63               | 85.71       | 62.86       | 85.71  | 73.02    | 64.86     | 73.85     | 73.40
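The logic of the F-test on pharmacophore costs can be sketched as follows. This assumes the significance formula commonly quoted for Fischer-style randomization in HypoGen, significance = [1 − (1 + x)/y] × 100, where x is the number of randomized hypotheses with a total cost lower than the actual hypothesis and y is the total number of runs (actual plus random); the function name is hypothetical. Only the total, fixed, and mean randomized costs come from the text; the 19 randomized costs in the example are made-up placeholders, not the values shown in Fig. S1.

```python
def fischer_significance(actual_cost, random_costs):
    """Significance (%) of a hypothesis in a Fischer-style randomization test.

    Assumed formula: [1 - (1 + x) / y] * 100, where x is the number of
    randomized hypotheses whose total cost is lower than the actual one and
    y is the total number of runs (the actual run plus all random runs).
    """
    x = sum(1 for cost in random_costs if cost < actual_cost)
    y = len(random_costs) + 1
    return (1 - (1 + x) / y) * 100

# Total cost of Hypo-1 as reported in the text.
actual_total_cost = 103.04
# Hypothetical costs of 19 randomized runs (placeholders, NOT the Fig. S1 data).
random_costs = [150.0 + 0.5 * i for i in range(19)]
print(round(fischer_significance(actual_total_cost, random_costs), 1))  # 95.0

# Cost gaps implied by the values reported in the text, in bits.
print(round(actual_total_cost - 86.90, 2))   # to the fixed cost: 16.14
print(round(158.79 - actual_total_cost, 2))  # to the mean random cost: 55.75
```

Under this formula, 19 random runs with no lower-cost hypothesis correspond to the 95 % confidence level and 99 such runs to the 99 % level.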
In summary, this study suggests that both models could be used for quantitative (2D-QSAR, Eq. 3) and qualitative (pharmacophore) prediction of the antimalarial activity of compounds belonging to the same classes as those used in this study against the deadly multidrug-resistant P. falciparum.
Conclusion
We have developed 2D-QSAR and 3D-pharmacophore models for the activity of aminothiazole and aminopyridine compounds against the multidrug-resistant strain (k1) of P. falciparum. The selected dataset is important for two reasons. First, it comprises novel scaffolds (aminothiazoles and aminopyridines) for which no QSAR or pharmacophore models against the multidrug-resistant strain of P. falciparum had previously been reported. Second, the compounds inhibit the multidrug-resistant strain (k1) of P. falciparum, so their further study is important given the increasing resistance of P. falciparum. QSAR and pharmacophore modeling can help to rationally identify and design inhibitors of these two classes when information on the molecular target is unavailable. The 3D-pharmacophoric study
revealed four pharmacophoric features, namely one HBA, one HYAl, and two RA features, that contribute to antimalarial potency against the multidrug-resistant strain of P. falciparum. The two ring aromatic features (one aminopyridine/pyrazole/pyridine/thiazole ring and one phenyl or other heterocyclic ring) are the minimum structural requirements for activity, while the HBA (carbonyl group) and HYAl (–CF3 group) features contribute to the potency of the compounds. The absence of the HBA and HYAl features results in a significant decline in antimalarial potency. These pharmacophoric features could therefore be helpful in designing compounds against the multidrug-resistant strain (k1) of P. falciparum. The developed regression and pharmacophore models are statistically significant and robust for predicting the antimalarial activity of aminothiazoles and aminopyridines against the multidrug-resistant strain of P. falciparum. These models can also be used to screen compounds within their applicability domain to discover novel leads against multidrug-resistant P. falciparum.
Acknowledgments The authors are thankful to the University
Grants Commission (UGC), New Delhi for providing financial
assistance in the form of a major research project (KR).
References
Athri P, Wenzler T, Tidwell R, Bakunova SM, Wilson WD (2010)
Pharmacophore model for pentamidine analogs active against
Plasmodium falciparum. Eur J Med Chem 45(12):6147–6151
Accelrys Discovery Studio 2.1. http://accelrys.com/products/discovery-studio/
Bhattacharjee AK, Hartell MG, Nichols DA, Hicks RP, Stanton B, van
Hamont JE, Milhous WK (2004) Structure-activity relationship
study of antimalarial indolo[2,1-b]quinazoline-6,12-diones
(tryptanthrins). Three-dimensional pharmacophore modeling and
identification of new antimalarial candidates. Eur J Med Chem
39(1):59–67
Boyle NM, Banck M, James CA, Morley C, Vandermeersch T,
Hutchison GR (2011) Open Babel: an open chemical toolbox.
J Cheminform 3(1):1–14
Cerius2 QSAR+ 4.5 manual (2000), pp 28–35
Fig. 3 Structures of the most active and least active compounds of
the training set with pharmacophoric features and QSAR descriptor
fragments
Cerius2 version 4.10 is a product of Accelrys, Inc., San Diego, USA. http://www.accelrys.com/cerius2
CS ChemDraw version 5.0. http://www.camsoft.com
Dondorp AM, Nosten F, Yi P, Das D, Phyo AP, Tarning J, Lwin KM,
Ariey F, Hanpithakpong W, Lee SJ (2009) Artemisinin resis-
tance in Plasmodium falciparum malaria. N Engl J Med 361(5):
455–467
Ghose AK, Viswanadhan VN, Wendoloski JJ (1998) Prediction of
hydrophobic (lipophilic) properties of small organic molecules
using fragmental methods: an analysis of ALOGP and CLOGP
methods. J Phys Chem A 102(21):3762–3772
Golbraikh A, Tropsha A (2002) Beware of q2! J Mol Graph Model
20(4):269–276
Gonzalez Cabrera D, Douelle F, Younis YE, Feng TS, LeManach C,
Nchinda AT, Street LJ, Scheurer C, Kamber J, White KL (2012)
Structure-activity relationship studies of orally active antimalarial
3,5-substituted 2-aminopyridines. J Med Chem 55:11022–11030
Li H, Sutter J, Hoffmann R (2000) HypoGen: An automated system
for generating predictive 3D pharmacophore models. Pharma-
cophore perception, development, and use in drug design.
International University Line, La Jolla, pp 171–189
Li J, Li S, Bai C, Liu H, Gramatica P (2013) Structural requirements
of 3-carboxyl-4(1H)-quinolones as potential antimalarials from
2D and 3D QSAR analysis. J Mol Graph Model 44:266–277
Mahajan DT, Masand VH, Patil KN, Hadda TB, Rastija V (2013)
Integrating GUSAR and QSAR analyses for antimalarial activity
of synthetic prodiginines against multi drug resistant strain. Med
Chem Res 22(5):2284–2292
Mitra I, Saha A, Roy K (2010) Exploring quantitative structure–
activity relationship studies of antioxidant phenolic compounds
obtained from traditional Chinese medicinal plants. Mol Simul
36:1067–1079
MINITAB 14 is statistical software of Minitab Inc., USA. http://www.minitab.com
Paquet T, Gordon R, Waterson D, Witty MJ, Chibale K (2012)
Antimalarial aminothiazoles and aminopyridines from phenotypic whole-cell screening of a SoftFocus® library. Future Med
Chem 4(18):2265–2277
Roy K, Mitra I, Kar S, Ojha PK, Das RN, Kabir H (2012)
Comparative studies on some metrics for external validation of
QSPR models. J Chem Inf Model 52(2):396–408
Sahu NK, Sharma MC, Mourya V, Kohli DV (2011) QSAR studies of
some side chain modified 7-chloro-4-aminoquinolines as antimalarial agents. Arabian J Chem (in press). doi:10.1016/j.arabjc.2010.12.005
SPSS 9.0 is statistical software of SPSS Inc., USA. http://www.spss.com
Wenzel NI, Chavain N, Wang Y, Friebolin W, Maes L, Pradines B,
Lanzer M, Yardley V, Brun R, Herold-Mende C (2010)
Antimalarial versus cytotoxic properties of dual drugs derived
from 4-aminoquinolines and Mannich bases: interaction with
DNA. J Med Chem 53(8):3214–3226
World Malaria Report 2012. http://www.who.int/malaria/publications/world_malaria_report_2012/en/ Accessed 12 June 2013
Xue CX, Cui SY, Liu MC, Hu ZD, Fan BT (2004) 3D QSAR studies
on antimalarial alkoxylated and hydroxylated chalcones by
CoMFA and CoMSIA. Eur J Med Chem 39(9):745–753