ORIGINAL RESEARCH

Does tautomerism influence the outcome of QSAR modeling?

Vijay H. Masand · Devidas T. Mahajan · Taibi Ben Hadda · Rahul D. Jawarkar · Ahmed M. Alafeefy · Vesna Rastija · Mohamed Ashraf Ali

Received: 21 February 2013 / Accepted: 29 August 2013

© Springer Science+Business Media New York 2013

Abstract

Tautomerism is an important aspect associated with a variety of pharmacologically and biologically active compounds. It is a challenge to account for tautomerism in computer-aided drug design (CADD). The estimation and calculation of many physico-chemical properties and theoretical descriptors of molecules are sensitive to tautomerism. In this study, we have attempted to analyze the effect of tautomerism on feature selection and on the statistical performance/characteristics of conventional quantitative structure–activity relationship (QSAR) equations. These equations were developed using 2D- and 3D-descriptors and two different statistical methods, i.e., genetic algorithm (GA) and stepwise regression (SR). Five datasets of moderate size showing different types of tautomerism were used in the study: (1) anti-malarial activity of synthetic prodiginines against a multi-drug resistant strain (N = 43), (2) anti-malarial activity of bisaryl quinolones (N = 37), (3) anti-malarial activity of phosphoramidate and phosphorothioamidate analogs of amiprophos methyl (N = 36), (4) anti-proliferative activity of substituted N-phenyl ureidobenzenesulfonate derivatives (N = 44), and (5) anti-HIV activity of indolylarylsulfones as HIV-1 non-nucleoside reverse transcriptase inhibitors (N = 36). In each case, the model developed and the descriptors selected using one tautomer were applied to the other tautomeric forms to understand the influence of tautomerism on QSAR equations. Different parameters such as $R$, $R^2$, $R^2_{adj}$, $R^2_{cv}$, $F$, $S$, and Y-randomization were used for thorough validation of all the models. The results revealed that tautomerism has a significant influence on feature selection. In addition, tautomerism was found to have a great influence on the performance of the QSAR models for the second and third datasets. However, no significant influence was observed on the statistical characteristics of the QSAR models for datasets 1, 4, and 5. Therefore, it is suggested that separate models need to be developed for different tautomeric forms of a dataset.

Electronic supplementary material: The online version of this article (doi:10.1007/s00044-013-0776-0) contains supplementary material, which is available to authorized users.

V. H. Masand (corresponding author) · D. T. Mahajan: Department of Chemistry, Vidya Bharati College, Camp, Amravati, Maharashtra, India. e-mail: [email protected]; [email protected]

T. Ben Hadda: Laboratoire Chimie des Matériaux, Université Mohammed Premier, Oujda 60000, Morocco

R. D. Jawarkar: Department of Pharmaceutical Chemistry, P. Wadhwani College of Pharmacy, Dhamangaon Rly. Road, Yavatmal, Maharashtra, India

A. M. Alafeefy: Department of Pharmaceutical Chemistry, College of Pharmacy, Salman Bin Abdulaziz University, P.O. Box 173, Alkharj 11942, Saudi Arabia

V. Rastija: Department of Chemistry, Faculty of Agriculture, Josip Juraj Strossmayer University of Osijek, Osijek 31000, Croatia

M. A. Ali: Pharmacogenetic and Pharmacogenomic Research, Institute for Research in Molecular Medicine, Universiti Sains Malaysia, Penang 11800, Malaysia

Med Chem Res (Medicinal Chemistry Research), DOI 10.1007/s00044-013-0776-0

Keywords: Tautomerism · QSAR · Statistical robustness · Stepwise regression · Genetic algorithm

Abbreviations

CADD: Computer-aided drug design
QSAR: Quantitative structure–activity relationship
CEU: 2-Chloroethylurea
PIB-SO: Phenyl-4-(2-oxoimidazolidin-1-yl)benzenesulfonate
PUB-SO: N-phenyl ureidobenzenesulfonate
PUB-SA: N-phenylureidobenzenesulfonamide
CPU: 3-Chloropropylurea
EU: Ethylurea

Introduction

Developing a highly potent drug that is free from side effects is the primary aim of drug design, and a large number of molecules are synthesized and optimized to meet this need. The conventional 'trial and error' method, which involves continuous cycles of synthesis and testing, is time-consuming, costly, and laborious (Doweyko, 2008; Mahajan et al., 2012; Myint and Xie, 2010; Scior et al., 2009). After certain molecules have been synthesized and screened, computer-aided drug design (CADD) is employed to achieve a higher success rate and to reduce the time taken by the process. Quantitative structure–activity relationship (QSAR) analysis, molecular modeling, molecular docking, and pharmacophore modeling, each with its own advantages and limitations, are the major contemporary techniques in CADD (Masand et al., 2010, 2012; Schwab, 2010; Van Drie, 2007).

QSAR is an established and widely appreciated diagnostic chemometric technique that correlates biological activity with structural features (Baumann and Stiefl, 2004; da Cunha et al., 2011; Golbraikh and Tropsha, 2002; Gramatica and Papa, 2007; Gramatica et al., 2007; Kubinyi, 2002; Mahajan et al., 2013; Tropsha, 2010). Conventional QSAR modeling involves establishing an appropriately validated mathematical equation that correlates a biological activity or response with one or more molecular descriptors. These descriptors represent the structural patterns or features that correlate significantly with the response (Beheshti et al., 2012; Consonni et al., 2010; Mitra et al., 2011, 2012; Roy et al., 2008). For a successful QSAR analysis, the developed equation must be statistically robust, with low correlation among the descriptors ($R < 0.60$). Appropriately validated QSAR models are very useful for predicting activities even before potential compounds are synthesized. It is widely accepted that the development of statistically robust QSAR models depends on the quality of the experimental data, the selection of descriptors, and the statistical methods used (Gramatica and Papa, 2007; Gramatica et al., 2007; Huang and Fan, 2011; Mitra et al., 2011, 2012; Sahigara et al., 2012; Yi and Zhang, 2012). At present, many advanced methods, statistical algorithms, and techniques are available for calculating and selecting descriptors, which makes building a statistically significant QSAR model relatively simple and straightforward.
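The inter-descriptor correlation criterion mentioned above ($R < 0.60$) can be checked with a pairwise Pearson correlation screen. The following is a minimal sketch, not the procedure used by the authors' software: the function names, the use of the absolute value of $R$, and the toy descriptor values are assumptions made for illustration only.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(descriptors, threshold=0.60):
    """Return descriptor pairs whose |R| reaches the threshold (assumed cut-off)."""
    flagged = []
    for a, b in combinations(sorted(descriptors), 2):
        r = pearson(descriptors[a], descriptors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged

# Toy, made-up descriptor values for five molecules (not taken from the paper).
toy = {
    "D1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "D2": [2.1, 3.9, 6.2, 8.0, 9.9],    # nearly collinear with D1
    "D3": [1.0, -1.0, 0.0, -1.0, 1.0],  # essentially uncorrelated with D1 and D2
}
print(correlated_pairs(toy))
```

In this toy set only the D1/D2 pair is flagged, so one of the two would be dropped before model building.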

Overall, a QSAR equation is a simplified, coherent, structure-based statistical summary of the patterns that relate a particular set of congeneric molecules to a biological activity or property. Understanding these patterns accelerates the process of finding new therapeutic agents or successfully modifying existing ones. Many researchers carry out QSAR analysis to find patterns for lead optimization. During a QSAR analysis, a large number of QSAR equations are developed. The analysis is followed by the selection of an appropriate QSAR equation on the basis of its statistical performance (Hawkins, 2004; Hawkins et al., 2008; Kiralj and Ferreira, 2009; Pratim Roy et al., 2009; Tropsha, 2010). The equation is thus selected from among various statistically acceptable and equally feasible alternatives.

Selecting a QSAR equation merely on the basis of statistical parameters is demanding and tricky. The situation becomes more complicated when the congeneric molecules can tautomerize. Tautomers are structural isomers of organic compounds that interconvert with a relatively low activation energy, below ca. 20 kcal/mol. They interchange through the migration of a labile hydrogen atom or proton (termed prototropy). Many pharmaceutically and biologically important molecules exhibit prototropy. Tautomerism can transform an H-donor into an H-acceptor and vice versa (Oellien et al., 2006). Tautomers usually have different physico-chemical properties such as pKa, logP, and solubility (Martin, 2009, 2010). The tautomeric form that is energetically favored in solution may not be the 'bioactive tautomeric form' of a molecule that interacts with a specific receptor. Molecules showing tautomerism can have different interacting tendencies toward receptors, and a molecule may interact with different receptors in different tautomeric forms (Trepalin et al., 2003). For example, inside DNA, adenine normally pairs with thymine, but the imino form of adenine pairs with cytosine (Shugar and Kierdaszuk, 1985). The existence of a particular tautomer depends on factors such as the dielectric constant of the medium, pH, and lipophilicity. Many computer-based applications treat tautomers as different structures, resulting in small to significant changes in the values of 2D- and 3D-descriptors, especially theoretically calculated descriptors (Thalheim et al., 2010; Zou et al., 2007). Many researchers have emphasized that tautomerism tends to complicate the calculation of molecular descriptors and properties, which consequently affects the development of a QSAR equation in conventional 2D-QSAR analysis (Pospisil et al., 2003; Zou et al., 2007). Tautomerism can therefore have a significant influence on CADD (Oellien et al., 2006; Pospisil et al., 2003; Zou et al., 2007).

In this study, we have attempted to understand the effect of tautomerism on feature selection and on the statistical performance/characteristics of conventional QSAR equations developed from 2D- and 3D-descriptors using two different statistical methods.

Datasets

Dataset-1

The experimental in vitro anti-malarial inhibitory concentrations (IC50), expressed in nanomolar units, against the chloroquine (CQ)-resistant strain Dd2 of Plasmodium falciparum (P. falciparum) for 43 synthetic prodiginines exhibiting azafulvene–pyrrole tautomerism were used in the study (see Fig. 1). This tautomerism gives rise to four tautomeric forms of the same molecule. The dataset was selected from a recent publication (Papireddy et al., 2011). It includes prodiginines with different substituents such as –F and –Cl, alkyl chains of varying length, and substituents at different positions of the benzene ring. The experimental activities (IC50 and pIC50) and substituents are listed in Table 1. For modeling purposes, the IC50 (nM) values were converted to molar units and expressed on a logarithmic scale as pIC50 (M), where $\mathrm{pIC}_{50} = -\log_{10} \mathrm{IC}_{50}$.
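The conversion from a reported IC50 to pIC50 (M) is a unit change followed by a negative base-10 logarithm. The sketch below illustrates it; the function name `pic50` and the unit table are hypothetical helpers, and the two example values are compound 1 of Table 1 (1,590 nM) and compound 1 of Table 3 (4 µM).

```python
import math

# Factors that convert a concentration in the given unit to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9}

def pic50(ic50, unit="nM"):
    """Return pIC50 = -log10(IC50 expressed in mol/L)."""
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50 * UNIT_TO_MOLAR[unit])

print(round(pic50(1590, "nM"), 3))  # 5.799, as listed for compound 1 in Table 1
print(round(pic50(4, "uM"), 3))     # 5.398, as listed for compound 1 in Table 3
```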

Dataset-2

The in vitro antimalarial activities of bicyclic quinolones against the 3D7 strain of P. falciparum were used for the second dataset (Pidathala et al., 2012). The bicyclic quinolones carry a variety of substituents, such as –F and –OCF3, at various positions. For QSAR analysis, the reported IC50 (nM) values were converted to pIC50 (M) (see Fig. 2, Table 2).

Dataset-3

Thirty-six phosphoramidate and phosphorothioamidate analogs of amiprophos methyl, previously reported as potential anti-malarial agents, were selected for QSAR analysis (Mara et al., 2011). The activity values IC50 (µM) were converted to pIC50 (M) (see Fig. 3, Table 3). These analogs carry a variety of substituents at various positions.

Fig. 1 Tautomeric forms (Tautomer-1 to Tautomer-4) of synthetic prodiginines (dataset-1) used in this study

Table 1 Experimental data IC50 and pIC50 for dataset-1 of synthetic prodiginines

| S. no. | R1 | R2 | R3 | IC50 (nM), Dd2 | pIC50, expt. |
|---|---|---|---|---|---|
| 1 | 2-pyrrolyl | n-C4H9 | H | 1,590 | 5.799 |
| 2 | 2-pyrrolyl | n-C6H13 | H | 450 | 6.347 |
| 3 | 2-pyrrolyl | n-C8H17 | H | 130 | 6.886 |
| 4 | 2-pyrrolyl | n-C16H33 | H | 400 | 6.398 |
| 5 | 2-pyrrolyl | H | CH2CH(CH3)2 | 230 | 6.638 |
| 6 | 2-pyrrolyl | H | n-C4H9 | 18 | 7.745 |
| 7 | 2-pyrrolyl | H | n-C6H13 | 7 | 8.155 |
| 8 | 2-pyrrolyl | H | n-C8H17 | 1.8 | 8.745 |
| 9 | 2-pyrrolyl | H | n-C10H21 | 10 | 8.000 |
| 10 | 2-pyrrolyl | H | C6H5CH2 | 86 | 7.066 |
| 11 | 2-pyrrolyl | H | 4-OCH3C6H4CH2 | 156 | 6.807 |
| 12 | 2-pyrrolyl | H | 4-ClC6H4CH2 | 81 | 7.092 |
| 13 | 2-pyrrolyl | H | 4-BrC6H4CH2 | 108 | 6.967 |
| 14 | 2-pyrrolyl | CH3 | CH3 | 8,130 | 5.090 |
| 15 | 2-pyrrolyl | n-C6H13 | n-C3H7 | 4.0 | 8.398 |
| 16 | 2-pyrrolyl | n-C8H17 | n-C3H7 | 2.7 | 8.569 |
| 17 | 2-pyrrolyl | [structure not recovered] | n-C3H7 | 1.3 | 8.886 |
| 18 | 2-pyrrolyl | n-C6H13 | n-C6H13 | 1.1 | 8.959 |
| 19 | 2-pyrrolyl | n-C7H15 | n-C6H13 | 1.2 | 8.921 |
| 20 | 2-pyrrolyl | n-C6H13 | n-C8H17 | 2.0 | 8.699 |
| 21 | 2-pyrrolyl | n-C7H15 | n-C8H17 | 2.9 | 8.538 |
| 22 | 2-pyrrolyl | n-C8H17 | n-C8H17 | 129 | 6.889 |
| 23 | 2-pyrrolyl | [structure not recovered] | [structure not recovered] | 3.5 | 8.456 |
| 24 | 2-pyrrolyl | C2H5 | 4-ClC6H4CH2 | 6.2 | 8.208 |
| 25 | 2-pyrrolyl | n-C3H7 | 4-ClC6H4CH2 | 2.6 | 8.585 |
| 26 | 2-pyrrolyl | n-C6H13 | 4-ClC6H4CH2 | 1.8 | 8.745 |
| 27 | 2-pyrrolyl | n-C7H15 | 4-ClC6H4CH2 | 2.2 | 8.658 |
| 28 | 2-pyrrolyl | n-C8H17 | 4-ClC6H4CH2 | 12.0 | 7.921 |
| 29 | 2-pyrrolyl | [structure not recovered] | 4-ClC6H4CH2 | 2.9 | 8.538 |
| 30 | 2-pyrrolyl | n-C6H13 | 4-FC6H4CH2 | 0.9 | 9.046 |
| 31 | 2-pyrrolyl | n-C8H17 | 4-FC6H4CH2 | 1.2 | 8.921 |
| 32 | 2-pyrrolyl | n-C6H13 | 4-BrC6H4CH2 | 2.8 | 8.553 |
| 33 | 2-pyrrolyl | n-C8H17 | 4-BrC6H4CH2 | 2.9 | 8.538 |
| 34 | 2-pyrrolyl | 4-ClC6H4CH2 | 4-ClC6H4CH2 | 4.8 | 8.319 |
| 35 | 2-pyrrolyl | 4-FC6H4CH2 | 4-FC6H4CH2 | 5.7 | 8.244 |
| 36 | 2-pyrrolyl | 4-BrC6H4CH2 | 4-BrC6H4CH2 | 11.0 | 7.959 |
| 37 | 2-pyrrolyl | 4-FC6H4CH2 | 4-ClC6H4CH2 | 6.1 | 8.215 |
| 38 | 2-pyrrolyl | 4-BrC6H4CH2 | 4-ClC6H4CH2 | 7.7 | 8.114 |
| 39 | 2-pyrrolyl | 4-BrC6H4CH2 | 4-FC6H4CH2 | 5.1 | 8.292 |
| 40 | 2-pyrrolyl | 2,4-Cl2C6H3CH2 | 2,4-Cl2C6H3CH2 | 11.0 | 7.959 |
| 41 | 2-pyrrolyl | 2,4-F2C6H3CH2 | 2,4-F2C6H3CH2 | 18.3 | 7.738 |
| 42 | 2-pyrrolyl | 3-FC6H4CH2 | 3-FC6H4CH2 | 6.7 | 8.174 |
| 43 | 2-pyrrolyl | 2-ClC6H4CH2 | 2-ClC6H4CH2 | 4.9 | 8.310 |

Fig. 2 Tautomeric forms (Tautomer-1 and Tautomer-2) of bisaryl quinolones (dataset-2) used in this study

Table 2 Experimental data IC50 and pIC50 for dataset-2 of bisaryl quinolones

| S. no. | X | Y | R1 | A | R2 | IC50 (nM) | pIC50 |
|---|---|---|---|---|---|---|---|
| 1 | H | N | –CH3 | pCH2 | OCF3 | 407 | 6.390 |
| 2 | 7-F | CH | –CH3 | pCH2 | OCF3 | 69 | 7.161 |
| 3 | 6-F, 7-F | CH | –CH3 | pCH2 | OCF3 | 24 | 7.620 |
| 4 | H | N | –CH3 | pCH2 | F | 506 | 6.296 |
| 5 | H | CH | H | pCH2 | OCF3 | 48 | 7.319 |
| 6 | 6-F, 7-F | CH | H | pCH2 | OCF3 | 16 | 7.796 |
| 7 | 6-Cl, 7-Cl | CH | H | pCH2 | OCF3 | 28 | 7.553 |
| 8 | 6-F, 7-OMe | CH | H | pC2 | OCF3 | 39 | 7.409 |
| 9 | N N6 | CH | H | pCH2 | OCF3 | 430 | 6.367 |

Dataset-4

Forty-four substituted N-phenyl ureidobenzenesulfonate derivatives that block cell cycle progression in S-phase were selected (Turcotte et al., 2012). These derivatives carry a variety of substituents. The activity values IC50 (µM) were converted to pIC50 (M) (see Fig. 4, Table 4).

Dataset-5

A set of 36 indolylarylsulfones previously reported as anti-HIV-1 non-nucleoside reverse transcriptase inhibitors was selected (La Regina et al., 2011). The activity values IC50 (nM) were converted to pIC50 (M) (see Fig. 5, Table 5).

Methodology

Tautomer equilibria in homologous structures depend on the structure and on the fractions of the individual tautomers. The equilibrium mixture varies from compound to compound within the set. These fractions play an important role in determining the correct correlation equation. Detailed information about the fractions can be obtained from publicly available resources such as the SPARC web server (https://archemcalc.com/sparc.html). Therefore, before building the QSAR models, the possible fractions of the different tautomeric forms were checked for all datasets using the SPARC web server. In all the datasets, the tautomeric form predicted by the SPARC web server to be the most stable has been designated as tautomeric form 1. In this work, we have adopted a simplistic approach to check the effect of tautomerism on feature selection, statistical characteristics, and performance of QSAR equations, with the assumption that the entire set of compounds is present in the same tautomeric form.

Table 2 continued

| S. no. | X | Y | R1 | A | R2 | IC50 (nM) | pIC50 |
|---|---|---|---|---|---|---|---|
| 10 | N N7 | CH | H | pCH2 | OCF3 | 443 | 6.354 |
| 11 | H | CH | H | pCH2 | CO2Me | 272 | 6.565 |
| 12 | H | CH | Cl | pCH2 | OCF3 | 19 | 7.721 |
| 13 | H | CH | –CH3 | pCH2 | H | 107 | 6.971 |
| 14 | H | CH | –CH3 | pCH2 | OCF3 | 117 | 6.932 |
| 15 | H | CH | –CH3 | mCH2 | OCF3 | 26 | 7.585 |
| 16 | H | CH | –CH3 | pCH2 | F | 83 | 7.081 |
| 17 | H | CH | –CH3 | pCH2 | OMe | 35 | 7.456 |
| 18 | 6-CF3 | CH | –CH3 | pCH2 | OCF3 | 654 | 6.184 |
| 19 | 7-CF3 | CH | –CH3 | pCH2 | OCF3 | 212 | 6.674 |
| 20 | 7-Cl | CH | –CH3 | pCH2 | OCF3 | 36 | 7.444 |
| 21 | 6-Cl, 7-F | CH | –CH3 | pCH2 | OCF3 | 70 | 7.155 |
| 22 | 6-F, 7-Cl | CH | –CH3 | pCH2 | OCF3 | 38 | 7.420 |
| 23 | 5-OMe | CH | –CH3 | pCH2 | OCF3 | 664 | 6.178 |
| 24 | 6-OMe | CH | –CH3 | pCH2 | OCF3 | 465 | 6.333 |
| 25 | 7-OMe | CH | –CH3 | pCH2 | OCF3 | 8 | 8.097 |
| 26 | 8-OMe | CH | –CH3 | pCH2 | OCF3 | 381 | 6.419 |
| 27 | 6-Cl | CH | –CH3 | mCH2 | OCF3 | 8.4 | 8.076 |
| 28 | 7-Cl | CH | –CH3 | mCH2 | OCF3 | 34 | 7.469 |
| 29 | 7-Cl | CH | –CH3 | mCH2 | F | 105 | 6.979 |
| 30 | 7-Cl | CH | –CH3 | pCH2 | OCF3 | 30 | 7.523 |
| 31 | H | CH | –CH3 | pO | OCF3 | 26 | 7.585 |
| 32 | 7-Cl | CH | –CH3 | pO | OCF3 | 73 | 7.137 |
| 33 | H | CH | –CH3 | pO | Cl | 230 | 6.638 |
| 34 | 6-OH | CH | –CH3 | pCH2 | OCF3 | 465 | 6.333 |
| 35 | 7-OH | CH | –CH3 | pCH2 | OCF3 | 139 | 6.857 |
| 36 | 8-OH | CH | –CH3 | pCH2 | OCF3 | 819 | 6.087 |
| 37 | 6-OAc | CH | –CH3 | pCH2 | OCF3 | 408 | 6.389 |

Fig. 3 Tautomeric forms (Tautomer-1 and Tautomer-2; X = O or S) of phosphoramidate and phosphorothioamidate analogs of amiprophos methyl (dataset-3) used in this study

Table 3 Experimental data IC50 and pIC50 for dataset-3 of phosphoramidate and phosphorothioamidate analogs of amiprophos methyl

| S. no. | R1 | R2 | X | IC50 (µM) | pIC50 |
|---|---|---|---|---|---|
| 1 | 4-CH3-2-NO2 | i-Propyl | S | 4 | 5.398 |
| 2 | 4-CH3-2-NO2 | i-Propyl | O | 126 | 3.900 |
| 3 | 2-CH3-4-NO2 | i-Propyl | O | 128 | 3.893 |
| 4 | 2-CH3-5-NO2 | i-Propyl | O | 128 | 3.893 |
| 5 | 3-CH3-4-NO2 | i-Propyl | O | 79 | 4.102 |
| 6 | 2-CH3-3-NO2 | i-Propyl | O | 128 | 3.893 |
| 7 | 2-CN-4-CH3 | i-Propyl | O | 128 | 3.893 |
| 8 | 2-Br-4-CH3 | i-Propyl | O | 39 | 4.409 |
| 9 | 2-CH3O-4-CH3 | i-Propyl | O | 128 | 3.893 |
| 10 | 2-Cl-4-CH3 | i-Propyl | O | 39 | 4.409 |
| 11 | 2-CF3 | i-Propyl | O | 87 | 4.060 |
| 12 | 3-CF3 | i-Propyl | O | 50 | 4.301 |
| 13 | 4-CF3 | i-Propyl | O | 50 | 4.301 |
| 14 | 2-Naphthol | i-Propyl | O | 72 | 4.143 |
| 15 | 1-NO2-2-Naphthol | i-Propyl | O | 87 | 4.060 |
| 16 | 4-CH3-2-NO2 | n-Butyl | O | 28 | 4.553 |
| 17 | 4-CH3-2-NO2 | i-Butyl | O | 75 | 4.125 |
| 18 | 4-CH3-2-NO2 | n-Pentyl | O | 51 | 4.292 |
| 19 | 4-CH3-2-NO2 | Cyclopentyl | O | 47 | 4.328 |
| 20 | 5-CH3-2-NO2 | n-Propyl | O | 102 | 3.991 |
| 21 | 4-CF3 | NH2 | O | 79 | 4.102 |
| 22 | 4-CF3 | n-Butyl | O | 32 | 4.495 |
| 23 | 4-CF3 | sec-Butyl | O | 40 | 4.398 |
| 24 | 4-CF3 | Cyclobutyl | O | 45 | 4.347 |
| 25 | 4-CF3 | n-Pentyl | S | 4.5 | 5.347 |
| 26 | 4-CF3 | Cyclopentyl | O | 26 | 4.585 |
| 27 | 4-CF3 | Cyclopentyl | S | 8.6 | 5.066 |
| 28 | 4-CF3 | Cyclohexyl | O | 43 | 4.367 |
| 29 | 4-CF3 | n-Heptyl | O | 44 | 4.357 |
| 30 | 4-CF3 | Piperidino | O | 84 | 4.076 |
| 31 | 4-CF3 | Pyrrolidino | O | 56 | 4.252 |
| 32 | 4-CF3 | Morpholino | O | 98 | 4.009 |
| 33 | 2-CH3-4-NO2 | n-Pentyl | S | 6.9 | 5.161 |
| 34 | 2-CH3-4-NO2 | Cyclopentyl | S | 1.6 | 5.796 |
| 35 | 4-Br | Cyclopentyl | O | 17 | 4.770 |
| 36 | 4-Br | Cyclopentyl | S | 23 | 4.638 |

Fig. 4 Tautomeric forms (Tautomer-1 to Tautomer-3; X = O or –NH; R2 = CH3, –CH2Cl, –CH2CH2Cl) of substituted N-phenyl ureidobenzenesulfonate derivatives (dataset-4) used in this study

Table 4 Experimental data IC50 and pIC50 for dataset-4 of substituted N-phenyl ureidobenzenesulfonate derivatives

| S. no. | X | R1 | R2 | IC50 (µM), HT-29 | pIC50 (M), HT-29 |
|---|---|---|---|---|---|
| 1 | O | 4-OH | 4-CEU | 1.5 | 5.824 |
| 2 | O | 2-Me | 3-CEU | 33 | 4.481 |
| 3 | O | 2-CH2-CH3 | 3-CEU | 4.3 | 5.367 |
| 4 | O | 2-(CH2)2-CH3 | 3-CEU | 15 | 4.824 |
| 5 | O | 4-OH | 3-CEU | 120 | 3.921 |
| 6 | O | 2-CH2-CH3 | 4-CEU | 17 | 4.770 |
| 7 | O | 2-(CH2)2-CH3 | 4-CEU | 2.5 | 5.602 |
| 8 | NH | 2-Me | 3-CEU | 71 | 4.149 |
| 9 | NH | 2-CH2-CH3 | 3-CEU | 48 | 4.319 |
| 10 | NH | 2-(CH2)2-CH3 | 3-CEU | 15 | 4.824 |
| 11 | NH | 2-Me | 4-CEU | 55 | 4.260 |
| 12 | NH | 2-CH2-CH3 | 4-CEU | 40 | 4.398 |
| 13 | NH | 2-(CH2)2-CH3 | 4-CEU | 21 | 4.678 |
| 14 | O | 2-Me | 3-CPU | 21 | 4.678 |
| 15 | O | 2-CH2-CH3 | 3-CPU | 23 | 4.638 |
| 16 | O | 2-(CH2)2-CH3 | 3-CPU | 14 | 4.854 |
| 17 | O | 4-OH | 3-CPU | 51 | 4.292 |
| 18 | O | 2-Me | 4-CPU | 26 | 4.585 |
| 19 | O | 2-CH2-CH3 | 4-CPU | 15 | 4.824 |
| 20 | O | 2-(CH2)2-CH3 | 4-CPU | 13 | 4.886 |
| 21 | O | 4-OH | 4-CPU | 50 | 4.301 |
| 22 | NH | 2-Me | 3-CPU | 42 | 4.377 |
| 23 | NH | 2-CH2-CH3 | 3-CPU | 96 | 4.018 |
| 24 | NH | 2-(CH2)2-CH3 | 3-CPU | 15 | 4.824 |
| 25 | NH | 2-Me | 4-CPU | 64 | 4.194 |
| 26 | NH | 2-(CH2)2-CH3 | 4-CPU | 26 | 4.585 |
| 27 | O | 2-Me | 4-CEU | 4.7 | 5.328 |
| 28 | O | 2-Me | 3-EU | 44 | 4.357 |
| 29 | O | 2-CH2-CH3 | 3-EU | 33 | 4.481 |
| 30 | O | 2-(CH2)2-CH3 | 3-EU | 25 | 4.602 |
| 31 | O | 4-OH | 3-EU | 75 | 4.125 |
| 32 | O | 2-Me | 4-EU | 12 | 4.921 |
| 33 | O | 2-CH2-CH3 | 4-EU | 12 | 4.921 |
| 34 | O | 2-(CH2)2-CH3 | 4-EU | 2.4 | 5.620 |
| 35 | O | 4-OH | 4-EU | 12 | 4.921 |
| 36 | O | 3-Me | 4-CEU | 7.2 | 5.143 |
| 37 | NH | 2-Me | 3-EU | 102 | 3.991 |
| 38 | NH | 2-CH2-CH3 | 3-EU | 15 | 4.824 |
| 39 | NH | 2-(CH2)2-CH3 | 3-EU | 41 | 4.387 |
| 40 | NH | 2-CH2-CH3 | 4-EU | 86 | 4.066 |

Table 4 continued

| S. no. | X | R1 | R2 | IC50 (µM), HT-29 | pIC50 (M), HT-29 |
|---|---|---|---|---|---|
| 41 | NH | 2-(CH2)2-CH3 | 4-EU | 32 | 4.495 |
| 42 | O | 4-Me | 4-CEU | 30 | 4.523 |
| 43 | O | 4-OMe | 4-CEU | 18 | 4.745 |
| 44 | O | 4-N(Me)2 | 4-CEU | 39 | 4.409 |

CEU 2-chloroethylurea, PIB-SO phenyl-4-(2-oxoimidazolidin-1-yl)benzenesulfonate, PUB-SO N-phenyl ureidobenzenesulfonate, PUB-SA N-phenylureidobenzenesulfonamide, CPU 3-chloropropylurea, EU ethylurea

Fig. 5 Tautomeric forms (Tautomer-1 and Tautomer-2) of indole-2-carboxamides (dataset-5) used in this study

Table 5 Experimental data IC50 and pIC50 for dataset-5 of indole-2-carboxamides

| S. no. | R1 | R2 | IC50 (nM) | pIC50 (M) |
|---|---|---|---|---|
| 1 | 5-Cl | [structure not recovered] | 3.3 | 8.481 |
| 2 | 5-Br | [structure not recovered] | 1.3 | 8.886 |
| 3 | 5-NO2 | [structure not recovered] | 2.5 | 8.602 |
| 4 | 5-Cl, 4-F | [structure not recovered] | 3.9 | 8.409 |
| 5 | 5-Cl | [structure not recovered] | 1.3 | 8.886 |
| 6 | 5-Br | [structure not recovered] | 3.7 | 8.432 |
| 7 | 5-NO2 | [structure not recovered] | 3.1 | 8.509 |
| 8 | 5-Cl, 4-F | [structure not recovered] | 3.9 | 8.409 |
| 9 | 5-Cl | [structure not recovered] | 1.9 | 8.721 |
| 10 | 5-Br | [structure not recovered] | 3.4 | 8.469 |
| 11 | 5-NO2 | [structure not recovered] | 5.8 | 8.237 |
| 12 | 5-Cl, 4-F | [structure not recovered] | 2.5 | 8.602 |
| 13 | 5-Cl | [structure not recovered] | 5.7 | 8.244 |
| 14 | 5-Br | [structure not recovered] | 5.7 | 8.244 |
| 15 | 5-NO2 | [structure not recovered] | 6.2 | 8.208 |
| 16 | 5-Cl, 4-F | [structure not recovered] | 5.8 | 8.237 |

The tautomeric structures were drawn using ACD ChemSketch 12 freeware. The 3D conversion and geometry optimization were carried out using a molecular mechanics method available in the program VegaZZ, with Gasteiger partial charges and the Tripos force field (Liu and Long, 2009; Tetko, 2005). To calculate various theoretical molecular descriptors, the optimized structures were uploaded to the e-DRAGON server. For model building, stepwise regression (SR) and GA-MLR methods were used. The SR analysis involved forward and backward selection with F = 4 for inclusion and F = 3.5 for exclusion. For the genetic algorithm (GA) analysis, the default settings in QSARINS were used. Minitab (for SR) and QSARINS (for GA) were used to build the multi-linear regression equations (Liu and Long, 2009).
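The forward and backward selection with F = 4 for inclusion and F = 3.5 for exclusion can be sketched as follows. This is a minimal pure-Python illustration that assumes the standard partial F-test; it is not the Minitab implementation, and the helper names (`solve`, `rss`, `stepwise`), the iteration cap, and the synthetic data are hypothetical.

```python
import random

def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the chosen columns."""
    rows = [[1.0] + [r[c] for c in cols] for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))

def stepwise(X, y, f_in=4.0, f_out=3.5):
    """Forward/backward selection driven by partial F values."""
    n, p = len(y), len(X[0])
    selected = []
    for _ in range(10 * p):  # safety cap on the number of passes (assumption)
        changed = False
        # Forward step: add the candidate with the largest F-to-enter.
        base, df = rss(X, y, selected), n - len(selected) - 2
        cands = []
        if df > 0:
            for j in range(p):
                if j not in selected:
                    new = rss(X, y, selected + [j])
                    cands.append(((base - new) / (new / df), j))
        if cands and max(cands)[0] >= f_in:
            selected.append(max(cands)[1])
            changed = True
        # Backward step: drop the variable with the smallest F-to-remove.
        if len(selected) > 1:
            full, df = rss(X, y, selected), n - len(selected) - 1
            partial = [((rss(X, y, [c for c in selected if c != j]) - full)
                        / (full / df), j) for j in selected]
            f_min, worst = min(partial)
            if f_min < f_out:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected

# Synthetic data: the response depends on columns 0 and 2 only.
rng = random.Random(0)
X = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(30)]
y = [2.0 * r[0] - 3.0 * r[2] + rng.gauss(0, 0.1) for r in X]
selected = stepwise(X, y)
print(sorted(selected))
```

Because the inclusion threshold is higher than the exclusion threshold, a descriptor that has just entered the model is not removed in the same pass.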

In this study, a QSAR equation built successfully (using SR or GA) for one (parent) tautomeric form was applied to the other tautomeric forms to check the influence of tautomerism on the statistical reliability of the QSAR model. To further evaluate the influence of tautomerism on feature selection, the set of significant descriptors selected using the parent tautomeric form was used to build QSAR equations for the parent form as well as for the other tautomeric forms. This tested the correlation ability of the selected descriptors. In an attempt to develop highly predictive and informative QSAR equations, a minimal set of orthogonal descriptors was included in each equation.
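The two checks described above can be illustrated with a small sketch: (i) the equation fitted on the parent form is applied unchanged to descriptor values computed for another tautomeric form, and (ii) the same descriptors are refitted on the other form. How $R^2$ was computed for the transferred equations is not stated in the text; the sketch assumes $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$, and all names and numbers in it are hypothetical.

```python
def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(coef, X, y):
    """1 - RSS/TSS for a given coefficient vector applied to (X, y)."""
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)
    rss = sum((yi - coef[0] - sum(c * v for c, v in zip(coef[1:], r))) ** 2
              for r, yi in zip(X, y))
    return 1 - rss / tss

# Made-up values of two descriptors for six molecules, computed (hypothetically)
# for the parent tautomeric form and for one other tautomeric form.
parent = [[0.10, 1.2], [0.35, 0.9], [0.50, 1.8], [0.72, 1.1], [0.90, 2.0], [1.10, 1.5]]
other = [[0.14, 1.0], [0.30, 1.1], [0.58, 1.6], [0.70, 1.3], [0.95, 1.7], [1.02, 1.6]]
activity = [5.1, 5.6, 6.4, 6.5, 7.3, 7.4]  # made-up pIC50 values

model = fit(parent, activity)
r2_parent = r_squared(model, parent, activity)
r2_transferred = r_squared(model, other, activity)              # same equation
r2_refitted = r_squared(fit(other, activity), other, activity)  # same descriptors
print(round(r2_parent, 3), round(r2_transferred, 3), round(r2_refitted, 3))
```

Under this definition the refitted $R^2$ can never be lower than the transferred one for the same form, since least squares minimizes the residual sum of squares on the data it is fitted to.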

Results and discussion

The results of the analyses on all the datasets are summarized in Tables 2–15. The high values of $R$, $R^2$, $R^2_{adj}$, $R^2_{cv}$, and $F$, together with low values of $S$ and PRESS, for the different tautomeric forms reveal that most of the equations are statistically robust (Gramatica and Papa, 2007; Gramatica et al., 2007; Mahajan et al., 2013; Masand et al., 2010, 2012, 2013a; Mitra et al., 2011, 2012). The close values of $R^2$ and $R^2_{cv}$ reveal that the equations are statistically stable with respect to the inclusion or exclusion of molecules in the dataset.
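The statistics named above can be computed for any multi-linear regression as sketched below. The formulas are the standard textbook definitions; that $R^2_{cv}$ is a leave-one-out value ($1 - \mathrm{PRESS}/\mathrm{TSS}$) is an assumption, since the cross-validation scheme is not stated here, and the helper names and toy data are hypothetical.

```python
import math

def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(X, y):
    """Ordinary least squares with intercept; returns [b0, b1, ..., bk]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def regression_stats(X, y):
    """R2, adjusted R2, leave-one-out R2cv, F, S and PRESS for an MLR model."""
    n, k = len(y), len(X[0])
    coef = fit(X, y)
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    rss = sum((yi - predict(coef, r)) ** 2 for r, yi in zip(X, y))
    # PRESS: each molecule is predicted by a model fitted without it.
    press = 0.0
    for i in range(n):
        loo = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(loo, X[i])) ** 2
    r2 = 1 - rss / tss
    return {
        "R2": r2,
        "R2adj": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        "R2cv": 1 - press / tss,
        "F": (r2 / k) / ((1 - r2) / (n - k - 1)),
        "S": math.sqrt(rss / (n - k - 1)),
        "PRESS": press,
    }

# Toy, made-up data: eight molecules, two descriptors.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0],
     [5.0, 6.0], [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]]
y = [2.2, 2.2, 3.75, 3.85, 5.4, 5.4, 6.95, 7.05]
stats = regression_stats(X, y)
print({name: round(value, 4) for name, value in stats.items()})
```

A small gap between `R2` and `R2cv` is what the text describes as stability with respect to the inclusion or exclusion of molecules.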

Dataset-1

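Dataset-1 consists of molecules possessing four tautomeric forms due to pyrrole–azafulvene tautomerization. Therefore, four different QSAR models were developed with each method, one for each tautomeric form. The QSAR model developed for the parent tautomer was then applied to the other tautomeric forms. Interestingly, for all the tautomeric forms, the statistical performances of all the models are acceptable, with close values of $R^2$ and $F$, indicating that the QSAR models are insensitive to tautomerism.

Table 5 continued

| S. no. | R1 | R2 | IC50 (nM) | pIC50 (M) |
|---|---|---|---|---|
| 17 | 5-Cl | [structure not recovered] | 17 | 7.77 |
| 18 | 5-Br | [structure not recovered] | 16 | 7.796 |
| 19 | 5-NO2 | [structure not recovered] | 16 | 7.796 |
| 20 | 5-Cl, 4-F | [structure not recovered] | 28 | 7.553 |
| 21 | 5-Cl | [structure not recovered] | 8.8 | 8.056 |
| 22 | 5-Br | [structure not recovered] | 170 | 6.77 |
| 23 | 5-NO2 | [structure not recovered] | 8 | 8.097 |
| 24 | 5-Cl, 4-F | [structure not recovered] | 13 | 7.886 |
| 25 | 5-Cl | [structure not recovered] | 5.7 | 8.244 |
| 26 | 5-Br | [structure not recovered] | 14 | 7.854 |
| 27 | 5-NO2 | [structure not recovered] | 6.8 | 8.167 |
| 28 | 5-Cl, 4-F | [structure not recovered] | 14 | 7.854 |
| 29 | 5-Cl | [structure not recovered] | 6.5 | 8.187 |
| 30 | 5-Br | [structure not recovered] | 6.4 | 8.194 |
| 31 | 5-NO2 | [structure not recovered] | 6.9 | 8.161 |
| 32 | 5-Cl, 4-F | [structure not recovered] | 5.5 | 8.26 |
| 33 | 5-Cl | [structure not recovered] | 26 | 7.585 |
| 34 | 5-Br | [structure not recovered] | 29 | 7.538 |
| 35 | 5-NO2 | [structure not recovered] | 29 | 7.538 |
| 36 | 5-Cl, 4-F | [structure not recovered] | 19 | 7.721 |

Table 6 Statistical quality of the QSAR model developed for one tautomeric form and tested on the other tautomeric forms for dataset-1 (synthetic prodiginines). The parent form of each model is the one for which $R^2_{cv}$ is reported.

| Tautomeric form | Model 1 $R^2$ | Model 1 $R^2_{cv}$ | Model 1 $F$ | Model 2 $R^2$ | Model 2 $R^2_{cv}$ | Model 2 $F$ | Model 3 $R^2$ | Model 3 $R^2_{cv}$ | Model 3 $F$ | Model 4 $R^2$ | Model 4 $R^2_{cv}$ | Model 4 $F$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.769 | 0.658 | 31.560 | 0.807 | NC | 39.723 | 0.819 | NC | 42.986 | 0.752 | NC | 28.806 |
| 2 | 0.643 | NC | 17.111 | 0.775 | 0.613 | 32.720 | 0.776 | NC | 32.911 | 0.753 | NC | 28.961 |
| 3 | 0.761 | NC | 30.249 | 0.706 | NC | 22.813 | 0.821 | 0.759 | 43.670 | 0.908 | NC | 93.761 |
| 4 | 0.723 | NC | 24.796 | 0.754 | NC | 29.118 | 0.918 | NC | 106.354 | 0.826 | 0.757 | 45.030 |

| Tautomeric form | GA-Model 5 $R^2$ | GA-Model 5 $R^2_{cv}$ | GA-Model 5 $F$ | GA-Model 6 $R^2$ | GA-Model 6 $R^2_{cv}$ | GA-Model 6 $F$ | GA-Model 7 $R^2$ | GA-Model 7 $R^2_{cv}$ | GA-Model 7 $F$ | GA-Model 8 $R^2$ | GA-Model 8 $R^2_{cv}$ | GA-Model 8 $F$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.880 | 0.843 | 69.834 | 0.799 | NC | 37.764 | 0.779 | NC | 33.486 | 0.758 | NC | 29.756 |
| 2 | 0.762 | NC | 30.416 | 0.881 | 0.869 | 74.490 | 0.818 | NC | 42.698 | 0.770 | NC | 31.804 |
| 3 | 0.843 | NC | 51.009 | 0.820 | NC | 43.278 | 0.878 | 0.822 | 68.660 | 0.809 | NC | 40.238 |
| 4 | 0.847 | NC | 52.591 | 0.734 | NC | 26.214 | 0.858 | NC | 57.401 | 0.885 | 0.847 | 72.78 |

NC not calculated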

Models 1 and 5 built using tautomeric form 1 (as parent

tautomeric form, shown in bold) showed a R2 = 0.769 and

Table 7 Statistical quality of the developed QSAR models using descriptors selected from one tautomeric form to other tautomeric forms for

dataset-1 (synthetic prodiginines)

Tautomeric form Descriptors from Model 1 Descriptors from Model 2 Descriptors from Model 3 Descriptors from Model 4

R6u?, R1u, EEig02r,

Mor18p

F06[C–N], R1u, RDF035p,

G(N..N)

BELp7, WA, SP01,

RDF150v

BELp7, WA, SP01,

RDF045u

R2 RCV2 F R2 RCV

2 F R2 RCV2 F R2 RCV

2 F

1 0.769 0.658 31.56 0.794 0.712 36.69 0.817 0.758 42.53 0.774 0.670 32.45

2 0.740 0.602 27.03 0.775 0.613 32.72 0.832 0.777 46.92 0.695 0.580 21.67

3 0.707 0.543 22.97 0.774 0.594 32.45 0.821 0.759 43.67 0.813 0.737 41.38

4 0.738 0.603 26.80 0.780 0.624 33.67 0.820 0.757 43.37 0.826 0.757 45.03

Tautomeric form Descriptors from GA-Model 5 Descriptors from GA-Model 6 Descriptors from GA-Model 7 Descriptors from GA-Model 8

ATS6e, VEe1,

RDF090p, HATS0u

SMTI, ATS6p,

EEig04d, RDF080m

TWC, ATS6v,

GATS2e, BELe7

SRW02, ATS6e,

RDF100v, HATS0e

R2 RCV2 F R2 RCV

2 F R2 RCV2 F R2 RCV

2 F

1 0.880 0.843 69.83 0.834 0.778 47.73 0.871 0.808 64.12 0.875 0.833 66.58

2 0.874 0.834 65.72 0.881 0.869 74.49 0.843 0.777 50.86 0.883 0.845 71.89

3 0.858 0.809 57.26 0.787 0.720 35.14 0.878 0.822 68.66 0.878 0.838 68.64

4 0.875 0.835 66.25 0.797 0.737 37.39 0.887 0.831 74.57 0.885 0.847 72.78

Table 8 Statistical quality of the developed QSAR model one tautomeric form (in bold) and tested on different tautomeric forms for dataset-2 of

bisaryl quinolones

Tautomeric form Model 1 Model 2 GA-Model 1 GA-Model 2

R2 RCV2 F R2 RCV

2 F R2 RCV2 F R2 RCV

2 F

1 0.781 0.715 22.06 0.726 NC 16.43 0.805 0.700 25.64 0.341 NC 3.21

2 0.682 NC 13.30 0.743 0.643 17.94 0.761 NC 19.74 0.821 0.737 28.48

Table 9 Statistical quality of the developed QSAR models using descriptors selected from one tautomeric form to other tautomeric forms for

dataset-2 of bisaryl quinolones

Tautomeric form Model 1 Model 2 GA-Model 1 GA-Model 2

RDF140u, DISPv, Mor18p,

F03[N–O], Mor27p

RDF140u, DISPv, JGI2,

Mor16p, RDF010e

GGI5, Mor03e, Mor28e,

Mor02p, E1v

RDF030u, RDF035p, G2v,

E1v, Gu

R2 RCV2 F R2 RCV

2 F R2 RCV2 F R2 RCV

2 F

1 0.781 0.715 22.06 0.742 0.651 17.86 0.805 0.700 25.636 0.547 0.352 7.49

2 0.690 0.588 13.79 0.743 0.643 17.94 0.791 0.695 23.53 0.821 0.737 28.48

Table 10 Statistical quality of the developed QSAR model one

tautomeric form (in bold) and tested on different tautomeric forms for

dataset-3 (phosphoramides)

Tautomeric form Model 1 Model 2

R2 RCV2 F R2 RCV

2 F

1 0.878 0.842 76.53 – – –

2 – – – 0.833 0.787 53.21

Med Chem Res

123

0.880, respectively, for the parent form. For the other tau-

tomeric forms, R2 is statistically significant and comparable

with R2 of the parent tautomeric form (ranging from 0.643 to

0.769 for model 1, and from 0.762 to 0.880 for model 5) (see

Table 6). Parallel observations were found to be true for

models 2 and 6 as well. Interestingly, for models 3 and 7 built

using tautomeric form 3 (as parent tautomeric form, shown in

bold) consistently high values of R2 and F were observed for

parent form as compared to other tautomeric forms. This
indicates that model 3, based on tautomeric form 3, is likely
the most suitable for predicting biological activity. In other

words, the descriptors and the structural features of tautomer

3 explain maximum variations in activity. This observation

is confirmed by the consistently high values of R2 and F for

parent and other tautomeric forms, when a set of significant

descriptors were selected using parent tautomeric form 3 to

build QSAR equations for all the tautomeric forms (see

Table 7). This result indicates that tautomeric form 3 could

be the ‘bioactive tautomer’ for antimalarial activity of

prodiginines.
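The two comparisons described above (applying the equation from the parent tautomer to another form, and refitting the same descriptors on the other form) can be sketched as follows. This is a minimal illustration, not the authors' workflow: the descriptor matrices are synthetic stand-ins, the function names are hypothetical, and R2 is taken as 1 − SS_res/SS_tot, which is an assumption because the text does not state how R2 was computed for a transferred equation.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares MLR fit; returns [intercept, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply a fitted equation to a descriptor matrix."""
    return coef[0] + np.asarray(X) @ coef[1:]

def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def compare_tautomers(X_parent, X_other, y):
    """R2 for the parent fit, the transferred equation, and a refit."""
    coef_parent = fit_mlr(X_parent, y)
    return {
        # equation built and evaluated on the parent tautomer
        "parent": r_squared(y, predict(coef_parent, X_parent)),
        # same equation (same coefficients) applied to the other tautomer
        "transferred": r_squared(y, predict(coef_parent, X_other)),
        # same descriptors, coefficients re-estimated on the other tautomer
        "refit": r_squared(y, predict(fit_mlr(X_other, y), X_other)),
    }

# Synthetic stand-in data (NOT taken from the paper): 43 molecules, 4 descriptors.
rng = np.random.default_rng(0)
X_parent = rng.normal(size=(43, 4))
X_other = X_parent + rng.normal(scale=0.3, size=X_parent.shape)
y = X_parent @ np.array([0.8, -0.5, 0.3, 0.6]) + rng.normal(scale=0.4, size=43)

print(compare_tautomers(X_parent, X_other, y))
```

By construction the refit value can never fall below the transferred value, because refitting re-optimizes the coefficients on the same descriptor matrix. A large gap between the two suggests that the descriptor values, and not only the descriptor choice, shift between tautomers.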

Surprisingly, for tautomers 3 and 4, the best models 3

and 4 have three common descriptors (BELp7, WA, SP01)

and they are different with respect to only one descriptor

(RDF150v and RDF045u in models 3 and 4, respectively).

In addition, for model 3, highest R2 and F values were

observed when it was applied on the tautomer form 4.

Similar observations were found to be correct for model 4.

This observation can be plausibly associated with the

ability of the descriptors to recognize and discriminate the

tautomers. It appears that two descriptors from above-

mentioned models have an ability to discriminate tautom-

ers. One is BELp7, which belongs to the group of BCUT

descriptors and the other is a radial distribution function

(RDF) descriptor RDF045u. BCUT descriptors incorporate

the connectivity information and atomic properties (e.g.,

polarizability, atomic charge, etc.). BELp7 stands for the
lowest eigenvalue no. 7 of the Burden matrix weighted by atomic polarizabilities.
Its positive contribution (see model 4) to the pIC50 suggests
that an increase in the polarizabilities placed on the diagonal elements of the
adjacency matrix increases the anti-malarial activity of
synthetic prodiginines. RDF045u is a molecular descriptor
that is calculated using radial basis functions. It contains
information on the probability distribution of atoms in a
spherical volume of radius 4.5 Å. As opposed to BELp7,
an increase in the value of RDF045u negatively affects the
observed activity. Surprisingly, the QSAR models are

again statistically robust with R2 and RCV2 ranging from

0.75 to 0.90 for various tautomeric forms. This reveals that

for these molecules QSAR is independent of tautomerism

(see Table 6).
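To make the RDF descriptor more concrete, the sketch below evaluates a radial distribution function code at a chosen radius from 3D coordinates. It assumes the commonly used Gaussian form g(R) = Σ w_i w_j exp(−β (R − r_ij)²) summed over atom pairs, with a smoothing parameter β = 100 Å⁻² and no scaling factor. Both choices are assumptions for illustration, and the function name and coordinates are hypothetical.

```python
import numpy as np

def rdf_descriptor(coords, radius, weights=None, beta=100.0):
    """Sum over atom pairs of w_i * w_j * exp(-beta * (radius - r_ij)**2).

    coords  : (n_atoms, 3) Cartesian coordinates in angstrom
    radius  : shell radius R in angstrom (4.5 for an RDF045-type value)
    weights : per-atom property; None gives the unweighted ("u") variant
    beta    : smoothing parameter in 1/angstrom^2 (assumed value)
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    i, j = np.triu_indices(n, k=1)              # every atom pair once
    r_ij = np.linalg.norm(coords[i] - coords[j], axis=1)
    return float(np.sum(w[i] * w[j] * np.exp(-beta * (radius - r_ij) ** 2)))

# Hypothetical three-atom geometry: two pairs sit exactly 4.5 angstrom apart.
toy = [[0.0, 0.0, 0.0], [4.5, 0.0, 0.0], [0.0, 4.5, 0.0]]
print(rdf_descriptor(toy, radius=4.5))
```

Because the value depends only on interatomic distances and atomic weights, moving a hydrogen from one heteroatom to another changes the set of distances near a given radius. That is one plausible reason why RDF descriptors can distinguish tautomers.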

Dataset-2

The bisaryl quinolones possess keto-enol tautomerism. All

the QSAR models performed well for the parent form with

R2 ranging from 0.743 to 0.821. The difference in R2

appears when they are tested on other tautomers, especially

for GA-model 2. Model 2 obtained by GA showed a dif-

ference in the statistical quality when it was tested on the

two tautomeric forms (R2 = 0.341 and 0.821). The devel-

oped models contain the following descriptors: RDF030u,

Table 11 Statistical quality of the developed QSAR models using descriptors selected from one tautomeric form to other tautomeric forms for dataset-3 (phosphoramides)

Tautomeric form | Model 1 | Model 2
Descriptors | GATS1m, nBR, RDF030u | O-056, RDF050u, RDF105m
Statistics | R2, RCV2, F | R2, RCV2, F
1 | 0.878, 0.842, 76.53 | –, –, –
2 | –, –, – | 0.833, 0.787, 53.21

‘–’ means not possible to calculate

Table 12 Statistical quality of the QSAR model developed using one tautomeric form (in bold) and tested on different tautomeric forms for dataset-4 of substituted N-phenyl ureidobenzenesulfonate derivatives

Tautomeric form | Model 1 (R2, RCV2, F) | Model 2 (R2, RCV2, F) | Model 3 (R2, RCV2, F)
1 | 0.709, 0.597, 18.50 | 0.672, NC, 15.57 | 0.705, NC, 18.16
2 | 0.701, NC, 17.82 | 0.688, 0.569, 16.77 | 0.700, NC, 17.73
3 | 0.677, NC, 15.93 | 0.643, NC, 13.69 | 0.684, 0.557, 16.43

Tautomeric form | GA-Model 1 (R2, RCV2, F) | GA-Model 2 (R2, RCV2, F) | GA-Model 3 (R2, RCV2, F)
1 | 0.666, 0.510, 15.16 | 0.581, NC, 10.54 | 0.686, NC, 16.60
2 | 0.662, NC, 14.89 | 0.664, 0.505, 14.99 | 0.686, NC, 16.60
3 | 0.649, NC, 14.05 | 0.587, NC, 10.80 | 0.684, 0.577, 16.46


RDF035p, G2v, E1v, and Gu. The statistical quality of the

models applied on the two tautomeric forms, and devel-

oped using the same descriptors, was also found to be

different (R2 = 0.547 and 0.821). Of the five descriptors,
three descriptors viz. RDF030u, RDF035p, and G2v were
found to be sensitive toward tautomerism in the case of
bisaryl quinolones. The two RDF descriptors
viz. RDF030u and RDF035p reveal the importance of
the distribution of atoms in the spherical volume of radius
3.0 Å (RDF030u), and the distribution of atomic polarizabilities
within a radius of 3.5 Å (RDF035p). The rest of the

descriptors belong to the group of the WHIM descriptors,

and two of them are related to the atomic van der Waals

volumes (G2v, E1v).

Dataset-3

Phosphoramidate and phosphorothioamidate analogs of

amiprophos methyl show two tautomeric forms. Thereby,

two equations were obtained in this dataset. GA and the SR

gave identical equations. In the model for tautomeric form 2, O-056
represents the presence of an alcoholic –OH group. Since no molecule
in tautomeric form 1 possesses an alcoholic –OH
group, this descriptor cannot be calculated for tautomeric form 1.

Table 13 Statistical quality of the developed QSAR models using descriptors selected from one tautomeric form to other tautomeric forms for dataset-4 of substituted N-phenyl ureidobenzenesulfonate derivatives

Tautomeric form | Descriptors from Model 1 | Descriptors from Model 2 | Descriptors from Model 3
Descriptors | F07[C–N], F05[C–C], Mor29u, Mor03m, RDF095v | F07[C–N], F05[C–C], Mor29u, Mor16v, RDF120m | F07[C–N], F05[C–C], Mor29e, Mor03m, RDF095v
Statistics | R2, RCV2, F | R2, RCV2, F | R2, RCV2, F
1 | 0.709, 0.597, 18.50 | 0.673, 0.545, 15.64 | 0.710, 0.596, 18.59
2 | 0.705, 0.592, 18.19 | 0.688, 0.569, 16.77 | 0.707, 0.593, 18.33
3 | 0.682, 0.556, 16.32 | 0.653, 0.503, 14.28 | 0.684, 0.557, 16.43

Tautomeric form | Descriptors from GA-Model 1 | Descriptors from GA-Model 2 | Descriptors from GA-Model 3
Descriptors | ESpm12d, BEHv2, RDF140u, E3e, H7m | BELm1, BELv1, RDF140u, Mor17m, H6e | RDF140v, RDF150v, RDF030e, F03[C–N], F10[N–O]
Statistics | R2, RCV2, F | R2, RCV2, F | R2, RCV2, F
1 | 0.666, 0.510, 15.16 | 0.600, 0.444, 11.42 | 0.687, 0.580, 16.69
2 | 0.664, 0.506, 15.02 | 0.664, 0.505, 14.99 | 0.687, 0.578, 16.66
3 | 0.651, 0.481, 14.18 | 0.600, 0.435, 11.42 | 0.684, 0.577, 16.46

Table 14 Statistical quality of the QSAR model developed using one tautomeric form (in bold) and tested on different tautomeric forms for dataset-5 (indole-2-carboxamides)

Tautomeric form | Model 1 (R2, RCV2, F) | Model 2 (R2, RCV2, F) | GA-Model 1 (R2, RCV2, F) | GA-Model 2 (R2, RCV2, F)
1 | 0.861, 0.778, 47.92 | 0.648, NC, 14.27 | 0.821, 0.746, 35.54 | 0.835, NC, 39.22
2 | 0.823, NC, 36.04 | 0.822, 0.723, 35.89 | 0.829, NC, 37.57 | 0.841, 0.763, 40.92

Table 15 Statistical quality of the developed QSAR models using descriptors selected from one tautomeric form to other tautomeric forms for dataset-5 (indole-2-carboxamides)

Tautomeric form | Model 1 | Model 2 | GA-Model 1 | GA-Model 2
Descriptors | F07[C–S], H-048, RDF105m, G2e | F07[C–S], H-048, RDF105m, RDF040e | DECC, RDF105v, RDF045e, F05[N–N] | BEHm7, RDF045e, HATS6p, B02[N–N]
Statistics | R2, RCV2, F | R2, RCV2, F | R2, RCV2, F | R2, RCV2, F
1 | 0.861, 0.778, 47.92 | 0.824, 0.726, 36.29 | 0.821, 0.746, 35.54 | 0.832, 0.751, 38.42
2 | 0.788, 0.684, 28.83 | 0.822, 0.723, 35.89 | 0.836, 0.765, 39.47 | 0.841, 0.763, 40.92


The same is true for GATS1m from the tautomeric form-1,

which cannot be calculated for the tautomeric form 2.

Interestingly, the negative coefficient of O-056 in Eq. 2

suggests that the conversion of the keto to the enol form is

unfavorable for anti-malarial activity for these molecules.

Consequently, the keto form is the dominating form that

decides and controls the biological activity of these mol-

ecules. It can be assumed that the keto form is the active

form which interacts with the ‘receptor.’ Such types of

mechanistic/pharmacophoric details are obtained when all

the possible tautomeric forms are considered while devel-

oping the QSAR models.
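A descriptor that is missing or constant for every molecule of a tautomeric form cannot carry a regression coefficient for that form. The sketch below flags such columns before a model is transferred. It is only one plausible reading of "cannot be calculated" (an assumption), and the function name and toy values are hypothetical.

```python
import numpy as np

def unusable_descriptors(X, names, tol=1e-12):
    """Return names of descriptor columns that are undefined (NaN) or constant."""
    X = np.asarray(X, dtype=float)
    flagged = []
    for k, name in enumerate(names):
        col = X[:, k]
        if np.isnan(col).any():
            flagged.append(name)            # descriptor undefined for some molecule
        elif col.max() - col.min() <= tol:
            flagged.append(name)            # same value everywhere, no information
    return flagged

# Toy values (NOT from the paper): an O-056 count that is zero for every
# molecule, as expected when no structure carries an alcoholic -OH group.
names = ["O-056", "RDF050u", "RDF105m"]
X_form1 = [[0, 1.2, 3.4],
           [0, 0.9, 2.8],
           [0, 1.5, 3.9]]
print(unusable_descriptors(X_form1, names))
```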

Dataset-4

The substituted N-phenyl ureidobenzenesulfonate deriva-

tives exhibit three tautomeric forms. Therefore, three

QSAR equations, one for each of the tautomeric forms,
were developed. The best QSAR models for anti-proliferative
activity of the substituted N-phenyl ureidobenzenesulfonate
derivatives were developed using SR and GA-MLR
for all the three tautomeric forms. Other tautomeric

forms revealed very close values of statistical parameters

(Table 12). Also, the statistical qualities of QSAR models

developed using descriptors from the best models of one

tautomeric form and performed on other tautomeric forms

were found to be similar. This indicates that the selected
descriptors are not sensitive toward the tautomerism of
N-phenyl ureidobenzenesulfonate derivatives. For many
models, the RCV2 values are consistently low (<0.60), and
the F values are correspondingly low as well.
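The statistics reported in the tables can be reproduced for any descriptor matrix with a few lines of code. The sketch below computes R2, a cross-validated R2, and the F ratio for a multiple linear regression. It assumes leave-one-out cross-validation for RCV2 and the usual F = (R2/p)/((1 − R2)/(N − p − 1)) for N molecules and p descriptors; the function names are hypothetical and neither choice is confirmed by the text.

```python
import numpy as np

def f_from_r2(r2, n, p):
    """F ratio of a regression with n samples and p descriptors (plus intercept)."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

def mlr_statistics(X, y):
    """Return (R2, leave-one-out cross-validated R2, F) for an MLR model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    ss_res = np.sum((y - A @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Leave-one-out: refit without molecule k, then predict molecule k.
    press = 0.0
    for k in range(n):
        keep = np.arange(n) != k
        c, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[k] - A[k] @ c) ** 2
    r2_cv = 1.0 - press / ss_tot

    return r2, r2_cv, f_from_r2(r2, n, p)

# Consistency check against the rounded R2 reported for dataset-3
# (N = 36, three descriptors, R2 = 0.878):
print(round(f_from_r2(0.878, 36, 3), 2))
```

With N = 36 and three descriptors, the rounded R2 of 0.878 gives F of about 76.8, close to the 76.53 reported in Table 10. The small gap is consistent with rounding of R2. Under leave-one-out, the cross-validated value can never exceed the fitted R2, which matches the pattern seen in the tables.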

Dataset-5

Only model 2 developed using SR showed a
difference in statistical quality (R2 = 0.822 and 0.648 for
the parent form and the other form, respectively) (Table 14).

The difference was not observed when both the models

were built using the same descriptors (F07[C–S], H-048,

RDF105v, RDF045e, F05[N–N]) (Table 15). Two of these

descriptors (F07[C–S] and F05[N–N]) have the same

values for the two tautomeric forms. RDF descriptors viz.

RDF105v and RDF045e have an ability to discriminate

tautomers, but their relative importance is too low in the

developed models. This explains the partial insensitivity of

these models toward tautomerism.
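Whether a descriptor takes the same values for two tautomeric forms can be checked directly by comparing the two descriptor matrices column by column. The sketch below reports a mean absolute difference per descriptor, scaled by the largest magnitude observed. The scaling choice, the function name, and the toy numbers are assumptions for illustration only.

```python
import numpy as np

def tautomer_sensitivity(X_a, X_b, names):
    """Scaled mean absolute difference of each descriptor between two tautomer sets.

    X_a, X_b : (n_molecules, n_descriptors) matrices with rows in the same order
    Returns a dict mapping descriptor name to a score; 0.0 means identical values.
    """
    X_a = np.asarray(X_a, dtype=float)
    X_b = np.asarray(X_b, dtype=float)
    mean_diff = np.abs(X_a - X_b).mean(axis=0)
    scale = np.maximum(np.abs(X_a), np.abs(X_b)).max(axis=0)
    scale[scale == 0] = 1.0                     # avoid division by zero
    return {name: float(d / s) for name, d, s in zip(names, mean_diff, scale)}

# Toy values (NOT from the paper): a fragment count that is unchanged between
# forms and an RDF-type value that shifts slightly.
names = ["F07[C-S]", "RDF045e"]
form_1 = [[2, 10.1], [3, 12.4], [2, 9.8]]
form_2 = [[2, 10.6], [3, 12.0], [2, 10.3]]
print(tautomer_sensitivity(form_1, form_2, names))
```

A score of zero marks a descriptor that cannot discriminate the two forms. A non-zero score only shows that the values differ; whether that difference matters for the model also depends on the size of the descriptor's coefficient.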

This study has shown that some QSAR equations are

sensitive toward tautomerism and some are not. One reason

could be the identification of the appropriate set of

descriptors. Thus, feature selection becomes important

when the dataset contains compounds that exhibit tau-

tomerism. Once the appropriate set and number of

descriptors are identified, a good correlation is observed

between biological activity and descriptors, and tautomer-

ization, then, does not seem to have an effect on QSAR.

Another possible reason could be the contribution and

affinity of the tautomeric groups in the molecules toward

binding to the target protein. It may be possible to obtain

more information in this regard by examining the bound

conformation of the molecule in the active site of the target

protein. Further, the bound conformations could be

obtained from a crystal structure or docking simulations.

Docking simulations were not performed since the receptors
interacting with the molecules of datasets 1–4 are
unknown. For dataset 5, docking experiments of indolylarylsulfone
inhibitors with the reverse transcriptase enzyme
have been performed (La Regina et al., 2011; Masand
et al., 2013b), and they suggest a possible interaction of the
carboxamide group in both the keto and the enol forms

with active site residues. It can be assumed that the QSAR

models of this dataset are independent of tautomerism.

Moreover, tautomerization is a spontaneous, fast and

dynamic phenomenon, resulting in a rapid equilibrium

among the tautomers. This could be the reason for the

insensitivity of some of the descriptors in the identification

and discrimination of tautomers.

The 3D-descriptors from BCUT, RDF, and WHIM

group of descriptors appear relatively sensitive toward

tautomerism. Therefore, descriptors from these groups

could be used for the prediction of biological activity of

datasets that include molecules exhibiting tautomerism.

Conclusion

Although the datasets are found to be dominated by tau-

tomer 1, the presence of other tautomers cannot be

neglected, as it is very difficult to determine the exact

active tautomeric forms under physiological conditions.

Feature selection is found to be very important for the

selection of an optimum number and set of descriptors

while dealing with tautomerism during QSAR model

development. Tautomerism was found to have a significant

influence on the performance of some QSAR models for
the second and third datasets. For datasets 1, 4 and 5, the QSAR

models for all the tautomeric forms are found to be sta-

tistically significant and independent of tautomerism.

In conclusion, tautomerism can influence the outcome of

QSAR modeling for a particular dataset and should be

taken into account during modeling studies.

Acknowledgments We are thankful to e-Dragon, Vega ZZ, ACD

ChemSketch, QSARINS, and RapidMiner development teams for
providing free/trial versions of their software. We sincerely thank Dr.

Harsh Chauhan, Assistant Professor, Creighton University, Greater

Boston Area, USA for helpful discussions throughout this work.


Sincere thanks to Dr. S.R. Nair, Head, Department of Languages,

Vidya Bharati College, Amravati for improving the English language.

References

Baumann K, Stiefl N (2004) Validation tools for variable subset

regression. J Comput Aided Mol Des 18(7–9):549–562

Beheshti A, Riahi S, Ganjali MR, Norouzi P (2012) Highlighting and

trying to overcome a serious drawback with QSPR studies; data

collection in different experimental conditions (mixed-QSPR).

J Comput Chem 33(7):732–747

Consonni V, Ballabio D, Todeschini R (2010) Evaluation of model
predictive ability by external validation techniques. J Chemom
24:194–201

da Cunha EFF, Mancini DT, Ramalho TC (2011) Molecular modeling

of the Toxoplasma gondii adenosine kinase inhibitors. Med

Chem Res 21(5):590–600

Doweyko AM (2008) QSAR: dead or alive? J Comput Aided Mol Des

22(2):81–89

Golbraikh A, Tropsha A (2002) Beware of q2! J Mol Graph Model

20(4):269–276

Gramatica P, Papa E (2007) Screening and ranking of POPs for global

half-life: QSAR approaches for prioritization based on molecular

structure. Environ Sci Technol 41(8):2833–2839

Gramatica P, Pilutti P, Papa E (2007) Approaches for externally

validated QSAR modelling of nitrated polycyclic aromatic

hydrocarbon mutagenicity. SAR QSAR Environ Res

18(1–2):169–178

Hawkins DM (2004) The problem of overfitting. J Chem Inf Comput

Sci 44(1):1–12

Hawkins DM, Kraker JJ, Basak SC, Mills D (2008) QSPR checking

and validation: a case study with hydroxy radical reaction rate

constant. SAR QSAR Environ Res 19(5–6):525–539

Huang J, Fan X (2011) Why QSAR fails: an empirical evaluation

using conventional computational approach. Mol Pharm

8(2):600–608

Kiralj R, Ferreira MMC (2009) Basic validation procedures for

regression models in QSAR and QSPR studies: theory and

application. J Braz Chem Soc 20:770–787

Kubinyi H (2002) From narcosis to hyperspace: the History of QSAR.

Quant Struct Act Rel 21:348–356

La Regina G, Coluccia A, Brancale A, Piscitelli F, Gatti V, Maga G,

Samuele A, Pannecouque C, Schols D, Balzarini J, Novellino E,

Silvestri R (2011) Indolylarylsulfones as HIV-1 non-nucleoside

reverse transcriptase inhibitors: new cyclic substituents at indole-

2-carboxamide. J Med Chem 54(6):1587–1598

Liu P, Long W (2009) Current mathematical methods used in QSAR/

QSPR studies. Int J Mol Sci 10(5):1978–1998

Mahajan DT, Masand VH, Patil KN, Ben Hadda T, Jawarkar RD,

Thakur SD, Rastija V (2012) CoMSIA and POM analyses of

anti-malarial activity of synthetic prodiginines. Bioorg Med

Chem Lett 22(14):4827–4835

Mahajan DT, Masand VH, Patil KN, Hadda TB, Rastija V (2013)

Integrating GUSAR and QSAR analyses for antimalarial activity

of synthetic prodiginines against multi drug resistant strain. Med

Chem Res 22:2284–2292

Mara C, Dempsey E, Bell A, Barlow JW (2011) Synthesis and

evaluation of phosphoramidate and phosphorothioamidate ana-

logues of amiprophos methyl as potential antimalarial agents.

Bioorg Med Chem Lett 21(20):6180–6183

Martin YC (2009) Let’s not forget tautomers. J Comput Aided Mol

Des 23(10):693–704

Martin YC (2010) Tautomerism, Hammett sigma, and QSAR.

J Comput Aided Mol Des 24(6–7):613–616

Masand VH, Jawarkar RD, Patil KN, Nazerruddin GM, Bajaj SO

(2010) Correlation potential of Wiener index and molecular

refractivity vis-a-vis antimalarial activity of xanthone deriva-

tives. Org Chem Indian J 6(1):30–38

Masand VH, Jawarkar RD, Mahajan DT, Hadda TB, Sheikh J, Patil

KN (2012) QSAR and CoMFA studies of biphenyl analogs of the

anti-tuberculosis drug (6S)-2-nitro-6-{[4-(trifluoromethoxy) ben-

zyl]oxy}-6,7-dihydro-5H-imidazo[2,1-b][1,3]oxazine(PA-824).

Med Chem Res 21:2624–2629

Masand VH, Mahajan DT, Patil KN, Hadda TB, Youssoufi MH,

Jawarkar RD, Shibi IG (2013a) Optimization of anti-malarial

activity of synthetic prodiginines: QSAR, GUSAR and CoMFA

analyses. Chem Biol Drug Des 81(4):527–536

Masand VH, Mahajan DT, Ben Hadda T, Jawarkar RD, Chavan H,

Bandgar BP, Chauhan H (2013b) Molecular docking and

quantitative structure activity relationship (QSAR) analyses of

indolylarylsulfones as HIV-1 non-nucleoside reverse transcrip-

tase inhibitors. Med Chem Res (in press) doi:10.1007/s00044-

013-0647

Mitra I, Saha A, Roy K (2011) Chemometric QSAR modeling and in

silico design of antioxidant NO donor phenols. Sci Pharm

79(1):31–57

Mitra I, Saha A, Roy K (2012) Development of multiple QSAR

models for consensus predictions and unified mechanistic

interpretations of the free-radical scavenging activities of

chromone derivatives. J Mol Model 18(5):1819–1840

Myint KZ, Xie XQ (2010) Recent advances in fragment-based QSAR

and multi-dimensional QSAR methods. Int J Mol Sci

11(10):3846–3866

Oellien F, Cramer J, Beyer C, Ihlenfeldt WD, Selzer PM (2006) The

influence of tautomer forms on pharmacophore-based virtual

screening. J Chem Inf Model 46(6):2342–2354

Papireddy K, Smilkstein M, Kelly JX, Shweta, Salem SM, Alham-

adsheh M, Haynes SW, Challis GL, Reynolds KA (2011)

Antimalarial activity of natural and synthetic prodiginines.

J Med Chem 54(15):5296–5306

Pidathala C, Amewu R, Pacorel B, Nixon GL, Gibbons P, Hong WD,

Leung SC, Berry NG, Sharma R, Stocks PA, Srivastava A,

Shone AE, Charoensutthivarakul S, Taylor L, Berger O,

Mbekeani A, Hill A, Fisher NE, Warman AJ, Biagini GA,

Ward SA, O’Neill PM (2012) Identification, design and biolog-

ical evaluation of bisaryl quinolones targeting Plasmodium

falciparum type II NADH: quinone oxidoreductase (PfNDH2).

J Med Chem 55(5):1831–1843

Pospisil P, Ballmer P, Scapozza L, Folkers G (2003) Tautomerism in

computer-aided drug design. J Recept Signal Transduct Res

23(4):361–371

Pratim Roy P, Paul S, Mitra I, Roy K (2009) On two novel parameters

for validation of predictive QSAR models. Molecules

14(5):1660–1701

Roy KK, Dixit A, Saxena AK (2008) An investigation of structurally

diverse carbamates for acetylcholinesterase (AChE) inhibition

using 3D-QSAR analysis. J Mol Graph Model 27(2):197–208

Sahigara F, Mansouri K, Ballabio D, Mauri A, Consonni V,

Todeschini R (2012) Comparison of different approaches to

define the applicability domain of QSAR models. Molecules

17(5):4791–4810

Schwab CH (2010) Conformations and 3D pharmacophore searching.

Drug Discov Today Technol 7(4):e245–e253

Scior T, Medina-Franco JL, Do QT, Martinez-Mayorga K, Yunes

Rojas JA, Bernard P (2009) How to recognize and workaround

pitfalls in QSAR studies: a critical review. Curr Med Chem

16(32):4297–4313


Shugar D, Kierdaszuk B (1985) New light on tautomerism of purines

and pyrimidines and its biological and genetic implications. Proc

Int Symp Biomol Struct Interact Suppl J Biosci 8(3):657–668

Tetko IV (2005) Computing chemistry on the web. Drug Discov

Today 10(22):1497–1500

Thalheim T, Vollmer A, Ebert R-U, Kuhne R, Schuurmann G (2010)

Tautomer identification and tautomer structure generation based

on the InChI code. J Chem Inf Model 50(7):1223–1232

Trepalin SV, Skorenko AV, Balakin KV, Nasonov AF, Lang SA,

Ivashchenko AA, Savchuk NP (2003) Advanced exact structure

searching in large databases of chemical compounds. J Chem Inf

Comput Sci 43(3):852–860

Tropsha A (2010) Best practices for QSAR model development,

validation, and exploitation. Mol Inf 29:476–488

Turcotte V, Fortin S, Vevey F, Coulombe Y, Lacroix J, Cote MF,

Masson JY, C-Gaudreault R (2012) Synthesis, biological

evaluation, and structure–activity relationships of novel substi-

tuted N-phenyl ureidobenzenesulfonate derivatives blocking cell

cycle progression in S-phase and inducing DNA double-strand

breaks. J Med Chem 55(13):6194–6208

Van Drie JH (2007) Computer-aided drug design: the next 20 years.

J Comput Aided Mol Des 21(10–11):591–601

Yi Z, Zhang A (2012) A QSAR study of environmental estrogens

based on a novel variable selection method. Molecules

17(5):6126–6145

Zou JW, Luo CC, Zhang HX, Liu HC, Jiang YJ, Yu QS (2007) Three-

dimensional QSAR of HPPD inhibitors, PSA inhibitors, and

anxiolytic agents: effect of tautomerism on the CoMFA models.

J Mol Graph Model 26(2):494–504
