QSPR of antioxidant phenolic compounds using quantum chemical descriptors

Published online: 13 Apr 2011.

To cite this article: Indrani Mitra, Achintya Saha & Kunal Roy (2011) QSPR of antioxidant phenolic compounds using quantum chemical descriptors, Molecular Simulation, 37:05, 394-413

To link to this article: http://dx.doi.org/10.1080/08927022.2010.543980


QSPR of antioxidant phenolic compounds using quantum chemical descriptors

Indrani Mitra(a), Achintya Saha(b) and Kunal Roy(a)*

(a) Division of Medicinal and Pharmaceutical Chemistry, Drug Theoretics and Cheminformatics Lab, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India; (b) Department of Chemical Technology, University College of Science and Technology, University of Calcutta, 92, A.P.C. Road, Kolkata 700009, India

(Received 8 October 2010; final version received 11 November 2010)

Accelerated systemic free radical production poses a serious problem to healthy living. Phenolic antioxidants have long been studied for their ability to react with these toxic radicals. The present work deals with a series of substituted phenolic derivatives with a wide range of antioxidant property data. Quantitative structure–property relationship models have been developed correlating the antioxidant properties of these molecules with quantum chemical descriptors such as Mulliken charges of the common atoms and quantum topological molecular similarity indices of the common bonds. Models were developed based on the training set compounds, and were subsequently validated externally using the test set molecules. The results suggest that substituents having a positive mesomeric effect increase the electron density over the phenolic oxygen and reduce the aromatic delocalisation of the lone pair of electrons on the phenolic oxygen, thereby enabling effective interaction of the phenolic proton with the free radicals. Moreover, in addition to the mesomeric effect, the inductive effect of the different substituents also plays a crucial role in maintaining the overall charge distribution on the phenolic nucleus. On the basis of their predictive power and interpretability, the models may be further utilised for the design of more potent antioxidant molecules.

Keywords: QTMS; QSPR; phenolic derivatives; antioxidants

1. Introduction

Harmful free radicals have been implicated in the

exacerbation of a number of oxidative stress-related

diseases such as Parkinson’s disease [1], Alzheimer’s

disease [2], atherosclerosis and cardiovascular damage

[3]. These free radicals easily react with vital molecules in

the body, such as DNA, causing mutations in the sequence

of the genetic material [4]. The accumulation of changes is

then thought to lead to the development of ageing and

several degenerative diseases. Free radicals can damage

nucleic acids, proteins or lipids resulting in the breakage of

DNA strands and in the production of toxic carbonyls.

Lipid peroxidation of polyunsaturated fatty acids exposed

to oxygen leads to rancidity in foods. In living animal cells,

peroxidised membranes lose their permeability, becoming

rigid, reactive and non-functional. The mitochondrial

theory of ageing [5] postulates that damage to mitochondrial

DNA and organelles by free radicals leads to loss of

mitochondrial function and loss of cellular energy [6].

These free radicals are produced from both endogenous

and exogenous sources. Endogenously, free radicals are

produced constitutively during metabolic reactions [7].

They are produced as by-products during autooxidation of

catecholamines, haemoglobin, myoglobin, reduced cytochrome C

and thiols. Exogenous sources of free

radicals include air pollution, of which industrial waste

and cigarette smoke are major contributors. Cigarette

smoke itself bristles with oxidants [8], while radiation as

well as several trace metals (lead, mercury, iron and

copper) constitute other major sources of free radical

generation [9].

In order to combat the systemic free radicals, the body

has its own defence mechanism, which comprises several

endogenous antioxidant enzymes. Antioxidants [10] are

molecules that are capable of neutralising the free radicals

by breaking the free radical chain reaction and by

chelating with the metal ions that catalyse these chain

reactions. Chain reactions [11] involving free radicals

proceed through the steps of initiation (reactions of free

radicals with stable species to form more free radicals),

followed by propagation (free radicals thus formed react

with more unsaturated lipid molecules) and termination

(two free radicals combine to form a stable molecule). The

antioxidants accelerate the process of termination, thus

inhibiting the propagation of the reaction. At the molecular

level, the antioxidants interact with the free radicals based

on three primary mechanisms: (a) hydrogen atom transfer;

(b) single-electron transfer followed by proton transfer and

(c) sequential proton loss electron transfer [12–14]. When

the systemic antioxidants fail to manage the huge pool of

free radicals, external supplementation of antioxidants

becomes essential. Fruits and vegetables serve as rich

sources of antioxidants [15]. Researchers have shown that

compounds belonging to a variety of chemical classes

ISSN 0892-7022 print/ISSN 1029-0435 online

© 2011 Taylor & Francis

DOI: 10.1080/08927022.2010.543980

http://www.informaworld.com

*Corresponding author. Email: [email protected]

Molecular Simulation

Vol. 37, No. 5, April 2011, 394–413


(phenolic compounds, flavonoids, lycopene, carotenoids,

etc.) exhibit antioxidant property. The free radical

quenching ability of phenolic compounds arises because

of both their acidity (ability to donate protons) and their

delocalised π-electrons (ability to transfer electrons while

remaining relatively stable) characteristic of benzene

rings. Vitamin E is the most well-known physiological

phenolic antioxidant. The most abundant polyphenols are

the condensed tannins, found in virtually all families of

plants, and comprise up to 50% of the dry weight of

leaves. Tea, cereal grains, nuts, grapes, pears, plums,

berries, etc. are rich in phenolic compounds that include

biphenyls, flavonoids and phenolic acids. The beneficial

effects of polyphenols to the higher animal species include

reduction in inflammatory effects such as coronary artery

disease [16] and down regulation of oxidative low-density

lipoprotein [17]. Resveratrol has been reported to inhibit

occurrence and/or growth of experimental tumours [18].

Another potential effect of the phenolic antioxidants is

their anti-ageing consequences like slowing the process of

skin wrinkling [19]. Polyphenols may also bind with

non-haem iron (e.g. from plant sources) in vitro in model

systems [20] possibly reducing its absorption. Apart from

their biological importance, antioxidants are also useful in

preserving food materials and for other purposes. Phenolic

antioxidants react with lipid peroxy free radicals, produced

as a result of lipid oxidation [21] and thereby prevent

further autooxidation of lipid molecules [22]. The

inhibition of lipid autooxidation is necessary for storage

or heating of foods as well as for reduction of the oxidation

of lipids after ingestion and absorption through the

intestinal wall. In the course of reaction with the free radicals,

the phenolic antioxidant forms a free radical itself, which

is more stable than a lipidoxy or lipid peroxy radical. The

antioxidant free radical thus formed may undergo either of

two reactions: (i) it may react with another

antioxidant free radical, forming a dimer that itself

possesses antioxidant activity, or (ii) it may form a copolymer

by reacting with a lipidoxy or lipid peroxy radical

or a biradical that is rapidly isomerised into a quinone

[23,24]. Again, chemicals like the petroleum products are

unstable in the presence of oxygen and they decompose

through autooxidation [25]. The liquid phase oxidation of

hydrocarbons proceeds through a chain radical mechanism

with chain development and ramification. Thus, phenolic

antioxidants with tertiary alkyl substituents at 2 and 6

positions can also be used for the improvement of fuel

chemical stability and inhibition of hydrocarbon radical

chain oxidation [26].

Given the immense utility of antioxidants for

healthy living, researchers have turned

towards the design and development of synthetic chemical

entities with potent antioxidant properties. In the

design of active chemical moieties, the quantitative

structure–property relationship (QSPR) technique

plays a key role [27]. The QSPR method employs a

correlation between the structures of chemical compounds

and their property data. Since the property of a chemical

compound is a function of its structural attributes and

physico-chemical characters, a QSPR model can aptly be

represented by a mathematical equation exhibiting a

relationship between the property data and the descriptors

(numerical representation of molecular characteristics

based on their structure) [28]. Several such models, based

on the antioxidant property exerted by a variety of chemical

entities, have been reported by different authors. Zhao et al.

[29] used the conventional Hansch method to develop

quantitative structure–activity relationship (QSAR)

models for 2-substituted phenylnitronyl nitroxides as free

radical scavengers. Calgarotto et al. [30] also performed a

multivariate study on 24 flavonoid compounds for their

peroxynitrite free radical scavenging activity. Partial least

squares (PLS) regression technique and partial 3D

comparison of molecules by frontal polygon method were

performed by Khlebnikov et al. [31] to develop a QSAR

model of flavonoid antioxidants. Using electrotopological

state (E state) atom parameters, the antiradical and

antioxidant activities of flavonoids were modelled by Ray

et al. [32]. MOLMAP descriptors of local properties were

utilised by Gupta et al. [33] for QSAR analysis of

antioxidant activity of phenolic compounds. Recently,

Mitra et al. [34–36] have reported several QSAR models

developed for the antioxidant activities of different

chemical entities such as hydroxybenzalacetone derivatives

[34], benzodioxoles [35] and benzothiophenes [36],

based on a wide category of descriptors using various

chemometric tools. The present work deals with a series of

phenolic derivatives that have been reported to exert

antioxidant action. QSPR models have been developed

using quantum chemical descriptors and employing

different statistical tools. The predictive performance of

the models has been judged based on various validation

measures, ensuring that the models developed can be

utilised for predicting the antioxidant property of new

chemical entities belonging to the class of phenolic

derivatives. However, because the developed models are limited

to a congeneric data set of the parent phenolic moiety with

identical chemical features, only phenolic compounds can

be designed and assessed further for antioxidant property

using the QSPR models developed in this work. The

observations depicted in this study can be used aptly in

optimisation studies of compounds similar to those used in

the work with the aim to improve their antioxidant activity.

2. Materials and methods

2.1 The data set

The model data set used for this work was obtained by

combining three different data sets reported by Kajiyama


and Ohkatsu [25,37] and Matsuura and Ohkatsu [38]. The

whole data set comprises 65 phenolic antioxidants with

different substituents at ortho, meta and para positions of

the parent phenolic moiety. The antioxidant properties of

these molecules were measured by an oxygen absorption

method, where autooxidation of styrene was measured

in the presence of the phenolic derivatives and their rates

of peroxy radical trapping activities were measured [38].

The antioxidant activity of these phenolic derivatives was

measured based on their rate of oxygen uptake, which is

inversely proportional to the concentration of the phenolic

antioxidants [38]. For the development of the QSPR

models, the oxygen uptake rates, [(−d[O2]/dt)inh

(×10^−6 M^−1 s^−1)] (Table 1), were converted to the

negative logarithmic scale, log[10^3/(−d[O2]/dt)]. The

molecular structures of these compounds as well as their

antioxidant property data are summarised in Table 1.
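As a quick illustration of the transformation described above, the following minimal sketch converts an inhibited oxygen uptake rate, expressed in the tabulated units of 10^−6, into the logarithmic response log[10^3/(−d[O2]/dt)]. The function name `to_log_scale` is hypothetical and is not part of the original work; the three example rates are values listed in Table 1.

```python
import math


def to_log_scale(rate):
    """Return log10(10^3 / rate) for an oxygen uptake rate in the tabulated units.

    `rate` is the value of (-d[O2]/dt)inh as listed in Table 1 (units of 1e-6).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return math.log10(1e3 / rate)


# Example rates as tabulated for three compounds of the data set.
for rate in (25, 5.6, 0.8):
    print(rate, round(to_log_scale(rate), 3))
```

A lower oxygen uptake rate (stronger inhibition of autooxidation) therefore maps to a larger response value, which is why the transformed quantity can be read as an antioxidant potency scale.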

Although there are other available data sets comprising

phenolic derivatives with antioxidant activity, only three

data sets [25,37,38] as mentioned above were combined

together considering that similar experimental protocols

were adopted in these reports for measuring the

antioxidant activity of this series of phenolic derivatives.

Due to marked variations in the end point measurement

techniques, other available data sets of antioxidant

phenolic compounds were not included in this work.

2.2 Descriptor calculation

Two types of QSPR models were developed in this work

based on the two categories of quantum chemical

descriptors, viz. Mulliken charges of the common atoms

(Figure 1) and quantum topological molecular similarity

(QTMS) descriptors (Table 2). The QTMS descriptors refer

to varying characteristics of a chemical bond due to a

variation in the electron density of the participating atoms.

For the calculation of the Mulliken atomic charges,

molecules were prepared using the GaussView 3.0

software [39], and subsequently the energy minimisation of

the molecules was carried out using the GAUSSIAN03W

software [40]. Energy was calculated at three different

levels of theory: (i) the semi-empirical AM1 method, (ii)

the Hartree–Fock method at HF/3-21G(d) level and

(iii) Hartree–Fock method at HF/6-31G(d) level [40]. The

output from each level was used as the input for the next

level. The charges of the common atoms thus calculated at

each level were then correlated with the antioxidant

property data of the compounds involved in this study.

Although charges were calculated at three levels of theory

(as indicated above), statistically significant QSPR models

were obtained based on the charges calculated at the

HF/6-31G(d) level only. Hence, only the charges obtained at this

level were used for final QSPR model development.

Similarly, the QTMS descriptors were also calculated at the

HF/6-31G(d) level [40] based on the wave functions

generated at this level, since the bond critical point (BCP)

descriptors at this level gave better predictive models and

significantly more consistent results than those calculated

at lower levels of theory.

In the calculation of the BCP properties, seven types of descriptors (ρ, ∇²ρ, λ, ε, K, G and equilibrium bond lengths) were calculated for each of the bonds connecting the adjacent common atoms. The details of QTMS descriptors can be found in publications of Roy and Popelier [41,42]. In a nutshell, QTMS descriptors focus on BCPs, which occur where the gradient of the electron density vanishes (∇ρ = 0) at some point between two bonded nuclei. Between each pair of bonded atoms, there exists a pathway of charge density called the bond path. Along this path, there is a point at which the electron density is a minimum along the path but a maximum in the plane perpendicular to it; this point is referred to as the saddle point or the BCP. For each of the molecules sharing a common skeleton, properties are calculated at each BCP formed by the common atoms. At a BCP, the Hessian [43,44] of ρ has two negative eigenvalues (λ1 < λ2 < 0) and one positive eigenvalue (λ3 > 0). Eigenvalues express the local curvature of ρ at a point: the negative eigenvalues are curvatures perpendicular to the bond, while the positive eigenvalue measures the curvature along the bond. If the positive eigenvalue λ3 dominates, electron density is accumulated along the bond path towards the nuclei. However, if the negative eigenvalues dominate, electron density accumulation in the plane perpendicular to the bond path is prominent. This reflects the large charge build-up between two bonded nuclei, which is reminiscent of covalent bonding. The descriptor λ3 gives a measure of the σ character of a bond, while the degree of π character is measured by the sum λ1 + λ2 [45]. The Laplacian, denoted by ∇²ρ, is the sum of the eigenvalues and is a measure of how much ρ is concentrated (∇²ρ < 0) or depleted (∇²ρ > 0) at a point. Another descriptor in this series is the ellipticity of a bond, which also measures the degree of π character of a bond together with the susceptibility of ring bonds to rupture and is defined as ε = (λ1/λ2) − 1. In the QTMS bond descriptor vector, there are two more components: the kinetic energy density K(r) and a more classical kinetic energy G(r) [44]. Interpreting K(r) in chemical terms is not straightforward; however, useful formulas describing its link to the Laplacian and the ‘more classical kinetic energy’ G(r) can be found [44]. Additionally, the equilibrium bond length (Re) has also been used as one of the descriptors along with the other QTMS descriptors. It was reported that the BCP descriptors were successful in translating the predicted electronic effects of orbital theories into observable consequences of variation in bond electron densities [46]. In particular, BCP properties detect conjugation, subtle delocalisation effects and


Table 1. Structures of the phenolic derivatives with their observed and predicted/calculated antioxidant property data.

Column key: Rate = (−d[O2]/dt)inh (×10^−6 M^−1 s^−1) [Refs [25,37,38]]; Obs. = log[10^3/(−d[O2]/dt)]; (a) to (d) = property predicted/calculated (table notes a to d); (e) after a serial number refers to table note e. The parent moiety structure drawings did not survive extraction; only their atom labels are listed.

Sl. no.  Substituent      Rate   Obs.    (a)     (b)     (c)     (d)

Parent moiety: OH, R
1(e)     -OCH3            25     1.602   1.883   1.884   1.806   1.838
2        -OCH2CH3         25     1.602   1.846   1.832   1.804   1.844
3        -OCH(CH3)2       5.6    2.252   2.086   2.067   1.818   1.905

Parent moiety: OH, R
4        -OCH3            1.6    2.796   2.913   3.052   3.030   2.918
5        -OCH2CH3         0.8    3.097   2.968   3.087   3.041   2.924
6        -OCH(CH3)2       1.6    2.796   2.730   2.779   2.598   2.652
7        -OC(CH3)3        3.3    2.481   2.736   2.779   2.652   2.666

Parent moiety: OH, R, OCH3
8        -OCH3            4.0    2.398   2.337   2.316   2.473   2.304
9(e)     -OCH2CH3         3.2    2.495   2.350   2.329   2.491   2.309
10(e)    -OCH(CH3)2       4.5    2.347   2.307   2.309   2.384   2.358
11(e)    -CH3             8.4    2.076   2.052   2.044   2.064   1.966
12       -CH2CH3          9.4    2.027   2.049   2.042   2.030   1.921
13       -CH(CH3)2        9.2    2.036   2.044   2.037   2.019   1.872
14       -C(CH3)3         10.1   1.996   2.068   2.062   2.097   1.923
15       -CH2CH=CH2       9.5    2.022   2.015   2.013   1.981   1.884
16       -CH=CHCH3        6.4    2.194   1.982   1.987   2.041   1.886
17       -CH2OH           10.1   1.996   1.998   1.990   1.946   1.940
18       -CH=CH2          8.9    2.051   1.919   1.930   1.975   1.830
19(e)    -Cl              9.9    2.004   1.682   1.681   1.862   2.054
20(e)    -CH=CHCOOCH3     11.4   1.943   1.788   1.837   1.974   1.710
21       -COCH3           25     1.602   1.632   1.682   1.566   1.576
22(e)    -COOCH3          25     1.602   1.629   1.676   1.632   1.672
23       -COOC2H5         25     1.602   1.637   1.682   1.635   1.668
24       -CN              25     1.602   1.577   1.636   1.735   1.880
25(e)    -CHO             25     1.602   1.610   1.677   1.455   1.546
26       -NO2             25     1.602   1.581   1.670   1.421   1.858


Table 1 (continued). Columns as in the first part of the table: serial number; substituent; (−d[O2]/dt)inh (×10^−6 M^−1 s^−1) [Refs [25,37,38]]; log[10^3/(−d[O2]/dt)]; property predicted/calculated (a), (b), (c) and (d).

Sl. no.  Substituent      Rate   Obs.    (a)     (b)     (c)     (d)

Parent moiety: OH, R
27       -N(CH3)2         5.5    2.260   2.288   2.145   2.081   2.186
28       -NH2             6.2    2.208   2.102   2.032   1.943   1.966
29(e)    -CH3             8.8    2.056   2.016   2.042   2.168   2.390
30(e)    -CH2CH3          5.5    2.260   2.030   2.051   2.164   2.358
31       -CH(CH3)2        5.4    2.268   2.046   2.053   2.167   2.313
32(e)    -C(CH3)3         6.1    2.215   2.104   2.086   2.192   2.296
33       -OCH3            25     1.602   1.707   1.670   1.769   1.621
34(e)    -OCH2CH3         25     1.602   1.718   1.687   1.801   1.667
35       -OCH(CH3)2       25     1.602   1.723   1.708   1.815   1.689
36       -OCH2C6H5        25     1.602   1.814   1.855   1.712   1.798
37(e)    -OC6H5           25     1.602   1.703   1.644   1.586   1.402
38       -COOCH3          25     1.602   1.834   1.841   1.878   2.009
39       -Cl              25     1.602   1.641   1.672   1.505   1.396
40(e)    -NO2             25     1.602   1.651   1.633   1.315   1.326

Parent moiety: OH, R, OCH3
41       -N(CH3)2         3.3    2.481   1.950   1.960   2.224   2.082
42       -NH2             4.6    2.337   1.962   1.952   2.115   2.013
43       -OCH2C6H5        8.5    2.071   1.887   1.929   2.226   2.207
44       -OCH3            8.6    2.066   2.013   2.049   2.088   2.059
45       -OCH2CH3         9.4    2.027   2.015   2.049   2.101   2.064
46       -OCH(CH3)2       8.6    2.066   2.021   2.052   2.131   2.073
47       -CH3             25     1.602   1.934   1.939   1.961   1.910
48       -C(CH3)3         25     1.602   1.900   1.893   2.041   1.885
49(e)    -H               25     1.602   1.882   1.884   1.806   1.838
50       -CH2OH           25     1.602   1.900   1.908   1.929   1.877
51       -CH=CHCOOCH3     25     1.602   1.761   1.726   1.863   1.577
52(e)    -COOCH3          25     1.602   1.695   1.663   1.755   1.667
53       -CHO             25     1.602   1.676   1.654   1.600   1.544
54       -NO2             25     1.602   1.343   1.327   1.610   1.771

Parent moiety: OH, CH2 (for these entries the substituent labels were part of the structure drawings; ring positions are not recoverable)
55       (no label)       2.2    2.658   2.544   2.507   2.647   2.684
56(e)    (no label)       1.7    2.770   2.546   2.508   2.644   2.681
57       CH3              2.1    2.678   2.718   2.654   2.668   2.655


Table 1 (continued). Columns as in the first part of the table: serial number; substituent label surviving from the structure drawing (ring positions are not recoverable); (−d[O2]/dt)inh (×10^−6 M^−1 s^−1) [Refs [25,37,38]]; log[10^3/(−d[O2]/dt)]; property predicted/calculated (a), (b), (c) and (d).

Sl. no.  Substituent      Rate   Obs.    (a)     (b)     (c)     (d)
58       H3C              2.0    2.699   2.538   2.502   2.640   2.681
59       OCH3             2.1    2.678   2.543   2.506   2.656   2.693
60       F                1.9    2.721   2.760   2.690   2.668   2.644
61       F                2.0    2.699   2.734   2.668   2.671   2.674
62(e)    Br               1.5    2.824   2.760   2.693   2.650   2.616
63       Br               3.1    2.509   2.568   2.525   2.628   2.657
64       Cl               2.5    2.602   2.760   2.693   2.653   2.621



hyperconjugation. Thus, each molecule is represented by

just a handful of numbers, being the components of the

vectors describing the molecule’s bonds. As a result,

similarity measures are reduced to discrete distance-like

measures in the BCP space without losing their quantum

mechanical basis.
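The relationships among the BCP quantities described in this section can be sketched in a few lines. The snippet below is only an illustration, not the authors' implementation: it assumes the three Hessian eigenvalues at a BCP are already available (for example from a wave function analysis program), and the function names, the example eigenvalues and the choice of a Euclidean distance in BCP space are all assumptions made here for demonstration.

```python
import math


def bcp_descriptors(l1, l2, l3):
    """Derive BCP quantities from Hessian eigenvalues ordered l1 <= l2 < 0 < l3."""
    if not (l1 <= l2 < 0 < l3):
        raise ValueError("expected l1 <= l2 < 0 < l3 at a bond critical point")
    return {
        "laplacian": l1 + l2 + l3,     # sum of the eigenvalues
        "ellipticity": l1 / l2 - 1.0,  # epsilon = (l1 / l2) - 1
        "pi_measure": l1 + l2,         # used as a measure of pi character
        "sigma_measure": l3,           # used as a measure of sigma character
    }


def bcp_distance(vector_a, vector_b):
    """Euclidean distance between two molecules in BCP space (an assumed metric).

    Each vector is a flat list of bond descriptors for the common bonds,
    given in the same order for both molecules.
    """
    if len(vector_a) != len(vector_b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vector_a, vector_b)))


# Hypothetical eigenvalues for a single bond, for illustration only.
print(bcp_descriptors(-0.7, -0.5, 0.3))
```

A negative Laplacian in such an example would indicate local concentration of electron density, while a larger ellipticity would indicate a more pronounced π character of the bond.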

2.3 Splitting of the data set

Any QSAR modelling should ultimately lead to statistically

robust models capable of making reliable predictions

of activities of compounds. When QSAR/QSPR models

are developed, it is important to validate any fitted model

to check that its predictions will be carried over to fresh

data not used in the model fitting exercise. The validation

strategies check the reliability of the developed models for

their possible application on a new set of data, and

confidence of prediction can thus be judged. Often, since

truly external data points are unavailable for prediction

purpose, original dataset compounds are divided into

training and test sets. This strategy has been used in this

work by splitting the data set of 65 compounds into a

training set and a test set. The former has been used for the

purpose of development of models, and the latter has been

used to check the goodness of predictions from the derived

models.

For the purpose of external validation, the model data

set was divided into a training set of 46 compounds (about 71%

of the total number of compounds) and a test set of 19

compounds (about 29% of the whole set). This procedure is

generally performed in cases where enough new

chemicals are not available for examining the predictive

ability and the robustness of the developed model. Hence,

in such cases, the training set is used for the QSPR model

development, while the test set compounds are utilised for

ensuring the reliability of the developed model. The

selection of the training set plays a key role in the process

of development of a statistically significant QSPR model.

This is because the developed model captures the features

of the training set molecules, and a compound structurally

similar to the training set molecules is predicted well,

since it contains the features captured by the developed

model. On the contrary, a compound significantly

dissimilar from the training set molecules suffers from

Table 1 (continued). Columns as in the first part of the table: serial number; substituent label surviving from the structure drawing; (−d[O2]/dt)inh (×10^−6 M^−1 s^−1) [Refs [25,37,38]]; log[10^3/(−d[O2]/dt)]; property predicted/calculated (a), (b), (c) and (d).

Sl. no.  Substituent      Rate   Obs.    (a)     (b)     (c)     (d)
65(e)    Cl               2.1    2.678   2.561   2.520   2.624   2.658

Notes: (a) Property predicted/calculated according to Equation (6). (b) Property predicted/calculated according to Equation (7). (c) Property predicted/calculated according to Equation (8). (d) Property predicted/calculated according to Equation (9). (e) Test set compounds.

Figure 1. Parent phenolic moiety showing the common atoms.


poor prediction [47]. Thus, the splitting should be such that

the training set compounds encompass the chemical

features of the whole data set and span the entire descriptor

space. In order to achieve uniformity in the distribution of

molecules into training and test sets, the selection of a

training set for the present work has been performed based

on activity ranking of the molecules comprising the

data set. Thus, the molecules were ranked in the ascending

order of their antioxidant property profile, and every third

compound was selected as the test set starting from the first

ranked compound. Such ranking based on activity data

may ensure that each set bears molecules capturing all the

different molecular features of the entire data set. In order

to assess the ability of the activity ranking method to attain

such similarity-based classification, a principal component

analysis (PCA) score plot, showing the distribution of the

training and the test set compounds in the 3D space, was

analysed. The plot was obtained using the first three

principal components of the QTMS descriptor matrix,

calculated based on the factor analysis approach using the

SPSS software [48]. The plot (Figure 2) shows that each of

the test set compounds is located in close vicinity to at

least one training set compound in the 3D space, thereby

capturing the features of the modelling set. Thus, the data

set has been aptly classified from the aspect of the

molecular similarity approach.
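The activity-ranking selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact procedure: the function name is hypothetical, and ties in the property value are broken by compound number, which the text does not specify. Note that taking strictly every third compound from 65 ranked compounds would give 22 test compounds rather than the 19 reported, so the published split presumably involved some additional handling that is not described here.

```python
def rank_based_split(properties, step=3):
    """Split compounds into training and test sets by activity ranking.

    `properties` maps a compound identifier to its property value.
    Compounds are ranked in ascending order of the property, and every
    `step`-th compound, starting from the first ranked one, goes to the test set.
    Ties are broken by identifier (an assumption made for this sketch).
    """
    ranked = sorted(properties, key=lambda cid: (properties[cid], cid))
    test = ranked[::step]
    test_ids = set(test)
    train = [cid for cid in ranked if cid not in test_ids]
    return train, test


# A small illustrative subset of property values from Table 1.
subset = {1: 1.602, 3: 2.252, 4: 2.796, 5: 3.097, 7: 2.481, 8: 2.398}
train, test = rank_based_split(subset)
print("training:", train)
print("test:", test)
```

Because the test compounds are drawn at regular intervals along the ranked list, both sets span the full range of the response, which is the property the PCA score plot is then used to check in descriptor space.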

2.4 Chemometric tools utilised for the present work

For the development of the QSPR models, two different

chemometric tools were employed, namely genetic

function approximation (GFA) method and the genetic

PLS (G/PLS) technique. Both techniques were utilised

for developing models with each of the two sets of

descriptors, viz. Mulliken charges and QTMS descriptors.

The GFA [49,50] technique is a combination of two

different algorithms: (i) Holland’s genetic algorithm and

(ii) Friedman’s multivariate adaptive regression splines

algorithm. In this technique, an initial population of

equations is generated by random selection of descriptors,

which is then followed by random crossover between pairs

of equations from the initial population, resulting in the

formation of new progeny equations. A parameter referred

to as the ‘lack of fit’ (LOF) value measures the fitness of

the developed model, and the models are ranked according

to this fitness value. Models of higher significance exhibit

lower LOF values.

$$\mathrm{LOF} = \frac{\mathrm{LSE}}{\left(1 - \dfrac{c + d \times p}{m}\right)^{2}}, \tag{1}$$

where LSE is the least-squares error, c is the number of

basis functions, d is the smoothing parameter that was set

at the default value of 1, p is the number of descriptors and

m is the number of observations in the training set. In

effect, ‘d’ is the user’s estimate of how much detail in the

training data set is worth modelling. Smaller equations are

obtained for larger values of ‘d’. The large number of

equations formed by this technique result in a range of

variations during crossover, thereby providing added

information on the quality of fit and importance of the

descriptors. GFA builds models not only with linear

polynomials, but also uses higher-order polynomials,

splines and other nonlinear functions.
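As a rough numerical illustration of Equation (1), the sketch below evaluates LOF for two candidate equations of different size. The LSE value of 1.5 and the function name `lack_of_fit` are assumptions made for the example; only the training set size (m = 46) and the default smoothing parameter (d = 1) come from this work.

```python
def lack_of_fit(lse, c, d, p, m):
    """Friedman's lack-of-fit score as written in Equation (1).

    lse: least square error, c: number of basis functions,
    d: smoothing parameter, p: number of descriptors,
    m: number of training observations.
    """
    denominator = 1.0 - (c + d * p) / m
    if denominator <= 0.0:
        raise ValueError("model is too large for the number of observations")
    return lse / denominator ** 2

# Same (hypothetical) LSE, but the second equation uses twice as many terms.
lof_small = lack_of_fit(lse=1.5, c=4, d=1.0, p=4, m=46)
lof_large = lack_of_fit(lse=1.5, c=8, d=1.0, p=8, m=46)
print(f"LOF (4 terms) = {lof_small:.3f}, LOF (8 terms) = {lof_large:.3f}")
```

For the same error, the larger equation receives the higher (worse) LOF, which is how the score discourages overfitting.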

The G/PLS [50,51] method was derived from a

combination of the two methods: GFA and PLS

regression. The GFA technique is employed for selecting

the appropriate basis functions, while the PLS regression

method serves as a fitting technique to weigh the relative

contributions of the basis functions for building the final

QSPR model. The PLS regression technique enables the use of numerous, highly correlated and noisy variables for building the models. In this method, latent variables (LVs) are generated, which are functions of the original variables, and thus this technique enables the building of larger QSPR models, avoiding chances of overfitting of data.

Figure 2. PCA score plot of the first three components (PC_1, PC_2 and PC_3) for the QTMS descriptor matrix, showing the training set and test set compounds.

Table 2. Descriptors used for the present work.

Category of descriptor | Descriptors used
Mulliken charges | C1, C2, C3, C4, C5, C6, O7, H8 (Mulliken charges on the eight common atoms of the phenolic derivatives)
Quantum topological molecular similarity indices (QTMS) | λ1, λ2, λ3, ε, ρ, ∇²ρ, K, G, distance (all the descriptors were calculated for each of the bonds connecting the common atoms)

2.5 Statistical analysis and model validation

The quality of the developed QSPR model is judged on the

basis of several statistical parameters. Thus, the fitness of

the model is determined using the following metrics:

(i) determination coefficient ($R^2$), (ii) explained variance ($R_a^2$), (iii) standard error of estimate (s) and (iv) variance

ratio (F) at specified degrees of freedom (df) [52]. In case

of a GFA model, an additional parameter viz. LOF is used

to assess the model fitness. However, analysis of the

statistical parameters does not always assess the predictive

power of the model, especially in cases where the model is

used for activity prediction of new untested molecules.

Adding more descriptors may increase the value of $R^2$, but such an increase does not necessarily mean an improvement in the predictive ability of the developed QSPR model. Thus,

further validation of the models is needed to analyse their

predictive potential. Subsequently, internal and external

predictive abilities of the models were determined based

on the internal and external validation techniques,

respectively. The validation strategies check the reliability

of the developed models for their possible application on a

new set of data, and thus the confidence of prediction can

be judged [53,54].

Leave-one-out cross-validation (LOO-CV) is a practical and reliable method of internal model validation [55]. In the LOO-CV method, one training set compound at a time is kept out of model development, while the model is developed based on the remaining data. The compound that has been held out is then predicted by the developed model and compared with the actual value. This procedure is repeated until every observation has been kept out once and only once. On the basis of the validation technique, two parameters [56,57] are calculated, viz. predicted residual sum of squares (PRESS) and cross-validated $R^2$ (LOO-$Q^2$), which are used as criteria of both robustness and predictive ability of the model. The higher the value of $Q^2$ (more than 0.5), the better is the model predictivity.

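\[ Q^{2} = 1 - \frac{\sum \left(Y_{\mathrm{obs(train)}} - Y_{\mathrm{pred(train)}}\right)^{2}}{\sum \left(Y_{\mathrm{obs(train)}} - \bar{Y}_{\mathrm{training}}\right)^{2}}, \tag{2} \]

where $Y_{\mathrm{obs(train)}}$ is the observed activity, $Y_{\mathrm{pred(train)}}$ is the LOO predicted activity and $\bar{Y}_{\mathrm{training}}$ is the mean observed activity of the training set compounds.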
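The LOO procedure and Equation (2) can be sketched as follows. This is a minimal illustration that assumes a single-descriptor linear model, whereas the models in this work use several descriptors; the data values and the helper names `fit_line` and `loo_q2` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_q2(x, y):
    """Return (PRESS, Q2) from leave-one-out cross-validation."""
    press = 0.0
    for i in range(len(x)):
        # Refit the model without compound i, then predict compound i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return press, 1.0 - press / ss_total

# Hypothetical descriptor and property values for six training compounds.
x_train = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
y_train = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

press, q2 = loo_q2(x_train, y_train)
print(f"PRESS = {press:.3f}, Q2 = {q2:.3f}")
```

Each compound is predicted by a model that never saw it, so PRESS accumulates genuine prediction errors rather than fitting residuals.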

External validation is an important tool for proper

selection of QSPR models. Since for this work, enough

new chemicals were unavailable for prediction purpose,

external validation has been performed on a fragment of

the original data set that has not been utilised for the

development of the QSPR model and that has been

selected as the test set. Thus, the antioxidant properties of

the test set molecules were predicted using the QSPR

model, developed with the training set, and were

subsequently compared with the observed antioxidant

property data. The outcome of the external validation

technique is another new parameter referred to as the

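predictive $R^2$ ($R^2_{\mathrm{pred}}$) [53] and is defined by the following equation:

\[ R^{2}_{\mathrm{pred}} = 1 - \frac{\sum \left(Y_{\mathrm{obs(test)}} - Y_{\mathrm{pred(test)}}\right)^{2}}{\sum \left(Y_{\mathrm{obs(test)}} - \bar{Y}_{\mathrm{training}}\right)^{2}}. \tag{3} \]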

In the above equation, Yobs(test) and Ypred(test) are the observed

and predicted antioxidant property data, respectively, of the

test set compounds. A value of R2pred (given by Equation (3))

greater than the stipulated value of 0.5 reflects an efficient

prediction of the antioxidant property for the test set

molecules by the developed model. Other parameters

calculated for judging the external predictive potential of

the developed QSPR models include the metrics developed

by Golbraikh and Tropsha [53]. These parameters refer to the

fact that for an ideal QSPR/QSAR model, the value of the

correlation coefficient (r) between the observed [Yobs(test)]

and predicted [Ypred(test)] activities of the test set compounds

should be close to 1. They showed that either of the squared correlation coefficients of the two regression lines, $Y_{\mathrm{obs(test)}}$ against $Y_{\mathrm{pred(test)}}$ and $Y_{\mathrm{pred(test)}}$ against $Y_{\mathrm{obs(test)}}$, passing through the origin, i.e. $r_0^2$ or $r_0'^2$, respectively, should be close to the value of $r^2$ for an ideal QSPR model. Here, $r^2$ and $r_0^2$ indicate the squared correlation coefficients between the observed and the predicted activity values with and without the intercept, respectively, while $r_0'^2$ provides the same information as $r_0^2$ does, but with inverted axes. Besides these, for an ideal QSPR model, regressions of the observed against predicted activity data or predicted against observed activity data through the origin should be characterised by either $k$ or $k'$ (slopes of the corresponding regression lines) being close to 1.

Thus, according to Golbraikh and Tropsha [53],

models satisfying the following conditions are considered

acceptable:

(i) $Q^2 > 0.5$;
(ii) $r^2 > 0.6$;
(iii) $r_0^2$ or $r_0'^2$ close to $r^2$, i.e. $(r^2 - r_0^2)/r^2 < 0.1$ or $(r^2 - r_0'^2)/r^2 < 0.1$ and
(iv) $0.85 \le k \le 1.15$ or $0.85 \le k' \le 1.15$.
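The external validation checks listed above can be sketched in a few lines. The snippet assumes one common convention for the through-origin statistics (slope from least squares without an intercept, and the squared coefficient computed against the mean-centred total sum of squares); the data and all function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def r2_pred(y_obs_test, y_pred_test, y_train_mean):
    """Predictive R2 of Equation (3)."""
    press = sum((o - p) ** 2 for o, p in zip(y_obs_test, y_pred_test))
    spread = sum((o - y_train_mean) ** 2 for o in y_obs_test)
    return 1.0 - press / spread

def squared_pearson(a, b):
    """Squared correlation coefficient (regression with intercept)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov ** 2 / (var_a * var_b)

def origin_fit(y, x):
    """Regress y on x through the origin; return (slope, r0 squared)."""
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    my = mean(y)
    ss_res = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, 1.0 - ss_res / ss_tot

def golbraikh_tropsha(y_obs, y_pred, q2):
    """Evaluate conditions (i) to (iv) for a test set."""
    r2 = squared_pearson(y_obs, y_pred)
    k, r0_sq = origin_fit(y_obs, y_pred)              # observed vs predicted
    k_prime, r0_sq_prime = origin_fit(y_pred, y_obs)  # axes inverted
    return {
        "q2": q2 > 0.5,
        "r2": r2 > 0.6,
        "r0": (r2 - r0_sq) / r2 < 0.1 or (r2 - r0_sq_prime) / r2 < 0.1,
        "slope": 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15,
    }

# Hypothetical observed and predicted test set values.
y_obs = [1.0, 1.5, 2.0, 2.5, 3.0]
y_pred = [1.1, 1.4, 2.1, 2.4, 3.1]

checks = golbraikh_tropsha(y_obs, y_pred, q2=0.8)
external_r2 = r2_pred(y_obs, y_pred, y_train_mean=2.0)
print(checks, round(external_r2, 3))
```

Returning one boolean per condition makes it easy to see which criterion a model fails, rather than collapsing everything into a single pass or fail flag.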

These stringent external validation parameters efficiently eliminate the probability of chance correlation, which may arise in the case of models with a large number of descriptors. Again, as Equation (3) suggests, the value of $R^2_{\mathrm{pred}}$ depends significantly on the value of the denominator in Equation (3). As the value of $\sum (Y_{\mathrm{obs(test)}} - \bar{Y}_{\mathrm{training}})^2$ increases, the value of predictive $R^2$ also improves. Again, the dependence of the denominator on the value of $\bar{Y}_{\mathrm{training}}$ reflects that the selection of the training set chiefly dominates the value of $R^2_{\mathrm{pred}}$, and hence it may not truly reflect the models' predictive capability for the test set (or a new data set) molecules. So, the squared correlation coefficient values between the observed and predicted values of the test set compounds with the intercept ($r^2$) and without the intercept ($r_0^2$) may be calculated to assess the performance of the prediction of the developed QSPR model. Moreover, the value of the squared correlation coefficient ($r^2$) between observed and predicted values of the test set compounds does not necessarily indicate that the predicted values are very near to the observed property data. Despite maintaining a good overall intercorrelation among them, there may be a considerable numerical difference between the observed and predicted property data. Thus, to obviate these problems and to better gauge the external predictive capacity of a model, the values of modified $r^2$ metrics ($r_m^2$) having a threshold of 0.5 were calculated [47,54,58,59].

\[ r_m^{2} = r^{2} \times \left(1 - \sqrt{r^{2} - r_0^{2}}\right). \tag{4} \]

For any QSPR model, the value of $r^2$ is always greater than (or equal to) the value of $r_0^2$. Thus, in case of good prediction, where the predicted property values lie in close proximity to the observed data, the $r^2$ value will be very near to the $r_0^2$ value. Consequently, an ideal prediction of the property is characterised by a value of $r_m^2$ equal to that of $r^2$. The $r_m^2$ metrics, being solely dependent on the observed and predicted property data of the molecules under analysis, offer improved reliability for the assessment of QSPR model predictivity. The $r^2_{m(\mathrm{LOO})}$ and $r^2_{m(\mathrm{test})}$ parameters are used for detecting the proximity of the predicted property data to the observed ones for the training and test sets, respectively. Besides these, the $r^2_{m(\mathrm{overall})}$ parameter [58,59] is also calculated, which ascertains the overall model predictivity based on the predicted property values of the whole data set (both training and test sets). The $r^2_{m(\mathrm{overall})}$ statistics may be used for the selection of the best predictive models from among comparable models. The parameter $r_m^2$ was used by different groups of authors to check the external predictability of QSAR models [60,61].
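The behaviour of the modified metric can be sketched as below. The example assumes that the through-origin coefficient is obtained by regressing observed on predicted values without an intercept; the data and the function name `r2m` are hypothetical. It shows that a constant offset in the predictions leaves the squared correlation coefficient at 1 but lowers the modified metric.

```python
import math

def r2m(y_obs, y_pred):
    """Modified r2 of Equation (4): r2 * (1 - sqrt(r2 - r0_squared))."""
    n = len(y_obs)
    mean_obs, mean_pred = sum(y_obs) / n, sum(y_pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(y_obs, y_pred))
    var_obs = sum((o - mean_obs) ** 2 for o in y_obs)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    r2 = cov ** 2 / (var_obs * var_pred)
    # Regression of observed on predicted values through the origin.
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    ss_res = sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred))
    r0_squared = 1.0 - ss_res / var_obs
    # max() guards against tiny negative values caused by rounding.
    return r2 * (1.0 - math.sqrt(max(r2 - r0_squared, 0.0)))

y_obs = [1.0, 2.0, 3.0, 4.0]
perfect = r2m(y_obs, y_obs)                    # predictions equal observations
shifted = r2m(y_obs, [2.0, 3.0, 4.0, 5.0])     # every prediction is 1 unit too high
print(f"perfect: {perfect:.3f}, shifted: {shifted:.3f}")
```

The shifted predictions are perfectly correlated with the observations, yet the metric drops to about 0.73, reflecting the numerical gap that the correlation coefficient alone ignores.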

2.6 Validation by randomisation

The randomisation technique provides a more robust method

for further validation of a QSPR model. This technique checks whether the model developed is the outcome of mere chance or a robust one with significant reproducibility. This

method involves permutation of the response parameter

(Y-column) with the descriptor matrix (X-columns) kept

unchanged, followed by the development of QSPR models

using the shuffled property values. For a robust QSPR model,

the value of the average correlation coefficient ($R_r$) thus calculated from the randomised models is less than the correlation coefficient ($R$) of the original model. The

randomisation technique constitutes two methods: process

randomisation and model randomisation. Process randomisation is performed using the total data set and scrambling the

property data based on the entire descriptor matrix in order to

assess the reliability of the model building process employed

for this work. However, model randomisation is done based

on the descriptors appearing in the corresponding QSPR

model developed, so as to judge the robustness of the model

obtained. In order to quantify the degree of difference between the values of $R_r$ and $R$, another parameter, ${}^{c}R_p^2$, was calculated which penalises the model $R^2$ for small differences between the values of $R^2$ and $R_r^2$ [62].

\[ {}^{c}R_p^{2} = R \times \sqrt{R^{2} - R_r^{2}}. \tag{5} \]

In an ideal case, the average value of $R^2$ for the randomised models should be zero, i.e. $R_r^2$ should be zero. Consequently, in such a case, the value of ${}^{c}R_p^2$ should be equal to that of $R^2$ for the developed QSAR model. Thus, models having an acceptable value for this parameter (>0.5) are considered to be robust enough and are not obtained merely by chance.
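A minimal sketch of the randomisation test and Equation (5) is given below. It assumes a single-descriptor linear model, for which the model $R^2$ equals the squared Pearson correlation; the data, the number of trials, the seed and the function names are all hypothetical choices made for illustration.

```python
import math
import random

def pearson_r2(x, y):
    """Squared Pearson correlation, i.e. R2 of a one-descriptor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)

def c_r2_p(x, y, n_trials=200, seed=0):
    """Return (cR2p, R2, mean randomised R2) after shuffling the Y-column."""
    rng = random.Random(seed)
    r2 = pearson_r2(x, y)
    shuffled = list(y)
    total = 0.0
    for _ in range(n_trials):
        rng.shuffle(shuffled)            # permute the response only
        total += pearson_r2(x, shuffled)
    r2_random = total / n_trials
    # Equation (5); max() avoids a negative square root argument.
    value = math.sqrt(r2) * math.sqrt(max(r2 - r2_random, 0.0))
    return value, r2, r2_random

# Hypothetical data: a nearly linear descriptor-property relationship.
x = [0.1 * i for i in range(1, 21)]
y = [2.0 * xi + 0.05 * (-1) ** i for i, xi in enumerate(x)]

c_value, r2, r2_random = c_r2_p(x, y)
print(f"R2 = {r2:.3f}, mean randomised R2 = {r2_random:.3f}, cR2p = {c_value:.3f}")
```

Because the descriptor column is left untouched while the response is permuted, any correlation surviving in the shuffled fits is attributable to chance alone.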

2.7 Applicability domain

Applicability domain (AD) [63,64] is a theoretical region

in the physico-chemical, structural or biological space

defined by the model descriptors and modelled response.

Based on the AD criterion, a QSAR/QSPR model is

developed, which can be utilised for activity prediction of

only those chemicals that lie within the specified domain.

The domain of applicability of the molecules estimates the

uncertainty in the prediction of a particular molecule,

based on how similar it is to the compounds used to build

the QSAR/QSPR model. For a compound highly

dissimilar to all other modelling compounds, reliable

prediction of its activity becomes unlikely. Thus, the concept of AD [65] makes it possible to avoid such unjustified extrapolation for activity predictions. In this work, the AD

of the phenolic derivatives has been analysed based on two

different methods: (i) the leverage approach [64] for the

GFA models and (ii) the distance to model (DModX) [51]

based calculations for the G/PLS models.

In case of the leverage approach, a leverage value (h) is

calculated for each of the test set molecules and is plotted

against the corresponding standardised residual for



obtaining a plot referred to as the Williams plot. For a

particular model, the value of h for the individual

molecules should be lower than the corresponding critical leverage value ($h^* = 3p'/n$, where $p'$ is the number of model variables plus 1, and $n$ is the number of the compounds used to calculate the model). For a molecule with a value of $h$ greater than $h^*$, activity prediction based on the respective QSAR/QSPR model should be considered unreliable. Moreover, compounds with standardised residuals greater than three standard deviation (SD) units (>3σ) are response outliers. The leverage

values have been calculated using the SPSS software [48].
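The leverage check can be sketched for the simplest possible case. The snippet assumes a model with one descriptor plus an intercept, for which the leverage has the closed form $h = 1/n + (x - \bar{x})^2 / \sum (x_j - \bar{x})^2$; the descriptor values and function names are hypothetical, and multi-descriptor models would need the full hat matrix instead.

```python
def leverages(x_train, x_query):
    """Leverage of each query compound for a one-descriptor model
    with an intercept, relative to the training set."""
    n = len(x_train)
    mean_x = sum(x_train) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x_train)
    return [1.0 / n + (xq - mean_x) ** 2 / sxx for xq in x_query]

def critical_leverage(n_descriptors, n_train):
    """h* = 3 p' / n, with p' = number of model variables plus 1."""
    return 3.0 * (n_descriptors + 1) / n_train

# Hypothetical descriptor values for ten training and two test compounds.
x_train = [0.30, 0.32, 0.35, 0.36, 0.38, 0.40, 0.41, 0.43, 0.45, 0.46]
x_test = [0.37, 0.70]

h_star = critical_leverage(n_descriptors=1, n_train=len(x_train))
h_test = leverages(x_train, x_test)
outside_domain = [h > h_star for h in h_test]
print(f"h* = {h_star:.2f}, leverages = {[round(h, 3) for h in h_test]}")
print("outside applicability domain:", outside_domain)
```

The second test compound lies far from the training descriptor range, so its leverage exceeds the critical value and its prediction would be treated as an extrapolation.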

Again, in case of AD calculation based on the DModX

approach, the residuals of Y and X are used as diagnostic

values for ensuring the quality of the model [51].

The residual SD of the X-residuals of the corresponding

row of the residual matrix E is proportional to the distance

between the data point and the model plane in X-space,

often called distance to the model in X-space (DModX).

Here, X is the matrix of predictor variables, of size N × K [where N is the number of objects (cases, observations) and k is the index of X-variables (k = 1, 2, ..., K)], Y is the matrix of response variables of size N × M [m is the index of Y-variables (m = 1, 2, ..., M)] and E is the N × K matrix of X-residuals. A DModX value larger than around

2.5 times the overall SD of the X-residuals (corresponding

to an F-value of 6.25) indicates that the observation

is outside the AD of the model [51]. The DModX values

for the G/PLS models were calculated using SIMCA

software [66].

3. Results and discussion

Several QSPR models were developed in this work based

on the two sets of descriptors: Mulliken charges and

QTMS descriptors. Two chemometric tools namely GFA

and G/PLS were employed for the development of the

models. The best models thus developed are summarised

in Table 3. Models developed from charges and QTMS

descriptors calculated at HF/6-31G(d) level of theory were

superior to models developed from corresponding

descriptors calculated at other two levels and hence only

the former models are reported here. The GFA models

were developed using 5000 iterations considering both

linear and spline options. The models thus developed are

nonlinear, and the spline terms are expressed as truncated

power splines and denoted with angular brackets. For example, <f(x) − a> is equal to zero if the value of f(x) − a is negative, else it is equal to f(x) − a. The constant 'a' is called the knot of the spline. G/PLS was

performed with 1000 iterations, scaled variables and with

the option of no fixed length of equation. The maximum

number of components or LVs fixed for variable selection

was 3. These components are the functions of the original

descriptors and they encode data as represented by the descriptors. Among all the developed QSPR models, two best models based on each set of descriptors are detailed below. The calculated/predicted antioxidant property values according to the discussed equations are shown in Table 1.

Table 3. Statistical qualities of four different QSPR models developed in this work.

Using Mulliken charges on the common atoms as descriptors

Statistical tool | Model no. | Eq. no. | Descriptors | LVs | n_training | s | R² | R²a | F | PRESS | Q² | r²m(LOO) | n_test | R²pred | r²m(test) | r²m(overall)
GFA spline | 1 | 6 | <C2 − 0.339472>, C4, <O7 + 0.782956>, <0.457283 − H8> | – | 46 | 0.184 | 0.846 | 0.831 | 56.39 | 1.682 | 0.813 | 0.648 | 19 | 0.867 | 0.778 | 0.697
G/PLS spline | 2 | 7 | <C2 − 0.346613>, C3, O7, <C4 + 0.267875>, <0.458961 − H8> | 2 | 46 | 0.186 | 0.835 | 0.827 | 108.59 | 1.797 | 0.801 | 0.787 | 19 | 0.861 | 0.750 | 0.802

Using QTMS descriptors

Statistical tool | Model no. | Eq. no. | Descriptors | LVs | n_training | s | R² | R²a | F | PRESS | Q² | r²m(LOO) | n_test | R²pred | r²m(test) | r²m(overall)
GFA linear | 3 | 8 | λ2_0107; λ3_0403; ε0405; dist0405 | – | 46 | 0.184 | 0.846 | 0.831 | 56.31 | 1.704 | 0.811 | 0.645 | 19 | 0.904 | 0.882 | 0.698
G/PLS spline | 4 | 9 | <λ2_0107 + 0.67042>; <0.26754 − λ3_0302>; λ3_0403; ε0405 | 2 | 46 | 0.191 | 0.826 | 0.818 | 102.39 | 1.855 | 0.794 | 0.779 | 19 | 0.853 | 0.800 | 0.786
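The truncated power spline terms mentioned above behave like a one-sided switch, which the following sketch illustrates. The knot 0.457283 is the value quoted in this work for the H8 spline term; the H8 charges used as inputs and the function names are hypothetical.

```python
def truncated_spline(value):
    """Truncated power spline <value>: zero when value is negative,
    otherwise the value itself."""
    return max(value, 0.0)

KNOT_H8 = 0.457283  # knot of the spline term <0.457283 - H8>

def h8_term(h8_charge):
    """Evaluate <0.457283 - H8> for a given Mulliken charge on H8."""
    return truncated_spline(KNOT_H8 - h8_charge)

below_knot = h8_term(0.40)   # charge below the knot: term is active
above_knot = h8_term(0.50)   # charge above the knot: term vanishes
print(f"H8 = 0.40 -> {below_knot:.6f}; H8 = 0.50 -> {above_knot:.6f}")
```

A spline term therefore contributes to the predicted property only on one side of its knot, which is why the interpretation of each spline descriptor depends on both the sign of its coefficient and the position of the knot.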

3.1 Models developed using the charge descriptors

The models have been developed based on the charge

descriptors calculated using the Hartree–Fock method at

the HF/6-31G(d) level of theory. The best two models

developed using the GFA and G/PLS techniques,

respectively, are reported below.

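\[ \log\left(\frac{10^{3}}{-\dfrac{d[\mathrm{O_2}]}{dt}}\right) = 2.72 - 29.2(\pm 3.983)\,\langle \mathrm{O7} + 0.782956 \rangle - 8.34(\pm 1.402)\,\langle \mathrm{C2} - 0.339472 \rangle + 0.807(\pm 0.144)\,\mathrm{C4} + 204(\pm 57.58)\,\langle 0.457283 - \mathrm{H8} \rangle \tag{6} \]

$n_{\mathrm{training}} = 46$; $s = 0.184$; $R^2 = 0.846$; $R_a^2 = 0.831$; $F = 56.39$ (df 4, 41); PRESS = 1.682; $Q^2 = 0.813$; $r^2_{m(\mathrm{LOO})} = 0.648$; $n_{\mathrm{test}} = 19$; $R^2_{\mathrm{pred}} = 0.867$; $r^2_{m(\mathrm{test})} = 0.778$; $r^2_{m(\mathrm{overall})} = 0.697$.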

Equation (6) has been modelled based on the GFA technique

of descriptor selection. In the above equation, ntraining and

ntest refer to the number of compounds in the training and test

sets, respectively. The values of $Q^2$ (0.813) and $R^2_{\mathrm{pred}}$ (0.867), being much higher than their stipulated values of 0.5, signify the statistical significance of the developed model. Besides these, acceptable values of $r_m^2$ metrics signify that the

antioxidant property values predicted using Equation (6) are

in close proximity to the corresponding observed data. Thus,

the model developed may be satisfactorily used for property

prediction of new molecules of this class. The descriptors

appearing in Equation (6) may be ranked according to the following order of their weightage: (i) <O7 + 0.782956>, (ii) <C2 − 0.339472>, (iii) C4 and (iv) <0.457283 − H8>. The O7 descriptor refers to the charge on the hydroxyl oxygen atom. A negative coefficient of the <O7 + 0.782956> descriptor signifies that any negative value of the O7 descriptor larger in magnitude than 0.782956 makes the contribution of the spline term zero, and hence facilitates an increase in the property profile of these molecules. Such an

increase in the negative charge over the hydroxyl oxygen is

achieved by substituting the phenolic nucleus with groups

having a positive mesomeric effect as well as those having

positive inductive effect (electron-donor groups). The

increased antioxidant property of compound nos. 58, 60,

61 and 64 may be attributed to the positive inductive effects

exerted by the alkylphenyl substituents on C2 and alkyl

substituents on C4 and C6. Again, the positive mesomeric effect in compound nos. 4 and 5 accounts for the increased negative charge on O7. Again, a negative coefficient of the spline term bearing the C2 descriptor (referring to the value of charge on C2) indicates that when C2 − 0.339472 is negative, the spline term exerts zero contribution on the property profile of these molecules, which brings about an enhancement in the antioxidant property data. The C2 descriptor bears both positive and negative values, and any value less than the knot of the spline (0.339472) is conducive for the antioxidant property of these phenolic derivatives. Now, if compound

nos. 2 and 57 are compared, it is observed that compound no.

2 having a large positive value (0.378243) for the C2

descriptor shows less antioxidant property than compound

no. 57 with a value of 0.023671 for the C2 descriptor. This

observation may be explained based on the inductive effect

of the substituents present at the C2 positions of compound

nos. 2 and 57. Compound no. 57 having an alkylphenyl substituent experiences a positive inductive effect resulting in an increase in the electron density over C2, and hence a decrease in the value of positive charge over C2. On the contrary, compound no. 2 having an electronegative atom bonded to C2 (–OCH2CH3) suffers from a negative inductive effect and subsequently the positive charge on C2 increases.

A positive coefficient of the C4 descriptor signifies that the

antioxidant property of the phenolic derivatives is favoured

by an increase in the value of positive charge on C4. Thus, a

desirable charge on C4 may be achieved by substituting the

position with groups having a negative inductive effect

(substituents with electronegative atoms). A comparison of

compound no. 8 with compound no. 12 reveals that

compound no. 8 having a methoxy substituent at C4 exhibits

increased antioxidant property, due to a negative inductive

effect exerted by the substituent. However, compound no.12,

bearing an alkyl substituent at C4, experiences a positive

inductive effect that in turn results in an increase in the

electron density over C4 and hence a decrease in its positive

charge leading to a reduction in its antioxidant property. A

positive coefficient for the spline term with the H8 descriptor

implies that the antioxidant property profile of these

molecules is favoured when the value of positive charge on

H8 is less than the value of 0.457283. Compound nos. 5–7, 27, 28 and 31 having positive values for the <0.457283 − H8> descriptor exhibit maximum to moderate antioxidant

property profiles. Thus, it may be inferred that phenolic

derivatives having an increased electron density over the

phenolic oxygen show an improvement in the antioxidant

property, since in such cases, the hydrogen ion becomes



easily available for interaction with the harmful free radicals.

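\[ \log\left(\frac{10^{3}}{-\dfrac{d[\mathrm{O_2}]}{dt}}\right) = -17.048 - 9.882\,\langle \mathrm{C2} - 0.346613 \rangle + 0.868\,\langle \mathrm{C4} + 0.267875 \rangle - 0.169 \times \mathrm{C3} - 24.810 \times \mathrm{O7} + 101.937\,\langle 0.458961 - \mathrm{H8} \rangle \tag{7} \]

$n_{\mathrm{training}} = 46$; $s = 0.186$; $R^2 = 0.835$; $R_a^2 = 0.827$; $F = 108.59$ (df 2, 43); PRESS = 1.797; $Q^2 = 0.801$; $r^2_{m(\mathrm{LOO})} = 0.787$; $n_{\mathrm{test}} = 19$; $R^2_{\mathrm{pred}} = 0.861$; $r^2_{m(\mathrm{test})} = 0.750$; $r^2_{m(\mathrm{overall})} = 0.802$.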



Equation (7) was developed based on the G/PLS technique

for variable selection and model development. Although

Equation (7) is as acceptable as Equation (6) based on the values of $Q^2$ (0.801) and $R^2_{\mathrm{pred}}$ (0.861), the significantly larger value of the $r^2_{m(\mathrm{overall})}$ (0.802) parameter for the former indicates increased predictive ability of the model. Thus,

based on overall predictive performance, it may be inferred

that Equation (7) may be more efficiently utilised for

antioxidant property prediction of new series of phenolic

derivatives. The variables appearing in Equation (7)

resembled significantly those of Equation (6). A variable

importance plot (VIP) (Figure 3) for the descriptors

appearing in Equation (7) shows that the descriptors can be

weighted according to the following order of significance: (i) O7, (ii) <C4 + 0.267875>, (iii) <C2 − 0.346613>, (iv) <0.458961 − H8> and (v) C3. Moreover, a spline term for the C4 descriptor appearing in Equation (7) provides an optimum value for the charge on C4, required for exhibiting significant antioxidant property. Since the spline term <C4 + 0.267875> is non-zero only when the charge on C4 exceeds −0.267875, the antioxidant property improves as the charge on C4 rises above this value, i.e. for negative charges smaller in magnitude than 0.267875 and for increasingly positive charges. Such a condition is accomplished by substituting C4 with functional groups having positive mesomeric effect and negative inductive effect. Additionally,

the C3 descriptor, implying the charge on C3, bears a

negative coefficient and hence signifies that the property

profile of the molecules is favoured with an increase in the

numerical value of negative charge on C3, together with a

decrease in the value of positive charge. More positive charge

on C3 for compound no. 33 accounts for its reduced

antioxidant property compared to compound no. 31. This

observation may be explained based on the inductive effect

of the substituents at C3. Compound no. 31 having an alkyl

substituent exerts a positive inductive effect on the phenolic

nucleus, resulting in an increase in the electron density on C3

and hence a decrease in the positive charge. On the contrary,

compound no. 33 having an electronegative atom attached at C3 (–OCH3) suffers from a negative inductive effect, where

the electron cloud is pulled away from the phenolic nucleus

towards the substituent, resulting in an increase in the

positive charge on C3. All the remaining descriptors

appearing in Equation (7), as explained above, signify the

importance of substitutions of the parent moiety, so as to

attain the desired antioxidant property. The mesomeric effect

and the inductive effect of the substituents play the key role

for achieving the desirable charge density over the respective

atoms. Thus, the charge density over the ring should be such

that the electron cloud over the hydroxyl oxygen is stabilised

and the proton becomes easily available for free radical

neutralisation.

In order to view the impact of electron-donor

substituents on the electronic delocalisation of the

phenolic oxygen, the electrostatic potential surface area

was determined for the most active compound (compound

no. 5) and one of the least active compounds (compound no. 26) bearing a ring deactivating substituent (–NO2).

Energy minimisation of both the compounds was

performed at the AM1 level using the MOPAC tool

present in the Chem 3D software [67]. Figures 4 and 5

show the wire mesh structures for the molecular

electrostatic potential surfaces of compound nos. 5 and

26 respectively, where the blue mesh indicates negative

charge, while the red mesh implies positive charge. It is

clear from the figures that the electron density over the

phenolic oxygen is much more in compound no. 5

compared to that in compound no. 26. In case of

compound no. 26, the electron cloud is shifted towards the

deactivating nitro group, thus reducing the electron density

over the hydroxyl oxygen. Thus, in compound no. 5, the

electron-donor group (a methoxy substituent) increases the

electron density over the hydroxyl oxygen and thus

reduces the availability of the lone pair of electrons for

aromatic delocalisation.

Figure 3. Variable importance plot (VIP) for the descriptors appearing in model 2 (Equation (7)). Plotted variables: O7, C3, <C4 + 0.267875>, <C2 − 0.346613> and <0.458961 − H8>; vertical axis: variable importance.


3.2 Models developed using the QTMS descriptors

These models were developed based on the BCP

descriptors calculated at the HF/6-31G(d) level of theory

using two different chemometric tools (GFA and G/PLS).

Thus, the descriptors appearing in these equations refer to

the characteristics exhibiting different electronic properties of different bonds of the phenolic nucleus, necessary

for exhibiting maximum antioxidant property.

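\[ \log\left(\frac{10^{3}}{-\dfrac{d[\mathrm{O_2}]}{dt}}\right) = -31.31 + 12.134(\pm 1.015)\,\lambda_{2\_0107} + 11.860(\pm 2.907) \times \lambda_{3\_0403} + 8.108(\pm 1.316) \times \varepsilon_{0405} + 13.708(\pm 4.692) \times \mathrm{dist}_{0405} \tag{8} \]

$n_{\mathrm{training}} = 46$; $s = 0.184$; $R^2 = 0.846$; $R_a^2 = 0.831$; $F = 56.31$ (df 4, 41); PRESS = 1.704; $Q^2 = 0.811$; $r^2_{m(\mathrm{LOO})} = 0.645$; $n_{\mathrm{test}} = 19$; $R^2_{\mathrm{pred}} = 0.904$; $r^2_{m(\mathrm{test})} = 0.882$; $r^2_{m(\mathrm{overall})} = 0.698$.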



The above model was developed based on the GFA

technique of descriptor selection and model development.

Acceptable values of $Q^2$ (0.811) and $R^2_{\mathrm{pred}}$ (0.904) for Equation (8) reflect the predictive potential of the model. Moreover, a significantly large value of $R^2_{\mathrm{pred}}$ indicates improved ability of the model to predict the property of new set of molecules of this class. Again, statistically significant results for all the $r_m^2$ metrics indicate that the predicted property values of all the compounds are close to the corresponding observed data. Although in Equation (8), the $R^2_{\mathrm{pred}}$ (0.904) parameter bears a larger value compared to $R^2$ (0.846) for the training set, it does not significantly affect the model predictivity. This is because the $r^2_{m(\mathrm{test})}$ (0.882) parameter compensates for the increased value of $R^2_{\mathrm{pred}}$ and bears little difference with the value of $R^2$. The $R^2_{\mathrm{pred}}$ parameter may overlook a large difference between the observed and predicted activity data of the test set molecules for a wide range of activity data, but the $r^2_{m(\mathrm{test})}$ metric efficiently determines the proximity between them. Thus, the $r^2_{m(\mathrm{test})}$ metric effectively denotes the external predictive ability of the developed model and being close to model $R^2$, it may be inferred that the above model is well

predictive in terms of both internal and external predictive

ability. Based on the regression analysis performed with the

standardised values of the descriptors, they may be ranked

as follows: (i) λ2_0107, (ii) ε0405, (iii) λ3_0403 and (iv) dist0405. In the above equation, a positive coefficient of the λ2_0107 descriptor implies that the activity of these molecules improves with a decrease in the π character of the C1–O7 bond, since the λ2_0107 descriptor values are negative. Such a decrease in π character of the C1–O7 bond occurs in the presence of electron-donor substituents on the parent phenolic moiety, thus making the lone pair of electrons on phenolic oxygen less available for aromatic delocalisation.

Figure 4. Molecular electrostatic potential surface for compound no. 5.

Figure 5. Molecular electrostatic potential surface for compound no. 26.

In case of compound nos. 19 and 26, the presence of deactivating functional groups such as –NO2 and –Cl signifies an increased probability of aromatic delocalisation of the lone pair of electrons of the phenolic oxygen and hence a reduction in antioxidant property values of the compounds. On the contrary, compound no. 8 having methoxy (–OCH3) substituent exhibits a reduction in the π character of the C1–O7 bond and hence improved antioxidant property, since the presence of an electron-donor substituent or an activating group reduces the π bond character of the bond between C1 and O7. Again, a positive coefficient of the λ3_0403 descriptor indicates that the antioxidant property of these molecules improves with an increase in the σ character of the C3–C2 bond. This in turn implies that a reduction in the π electron cloud over the benzene nucleus is conducive for an improved property profile of these molecules. Thus, the presence of electron-donor substituents on the phenolic moiety decreases the tendency of the oxygen atom to participate in the formation of the delocalised resonance structures of benzene and thus decreases the extent of delocalisation of the aromatic π electron cloud. Compared to compound no. 26, compound no. 8 shows an improvement in its antioxidant property profile and such an increase is attributed to the presence of electron-donor methoxy group (–OCH3) in compound no. 8 and electron-withdrawing nitro group (–NO2) in compound no. 26. Similarly, comparing compound nos. 28 and 39, the presence of ring-activating substituents in compound no. 28 accounts for an increased property profile of compound no. 28 compared to compound no. 39. The ε0405 descriptor (ε = (λ1/λ2) − 1) refers to the ellipticity of the C4–C5 bond, and a positive coefficient of this variable in the above equation signifies an increase in the antioxidant property of these molecules, with an increase in the value of ε0405. Moreover, such an increase in the value of the ε0405 descriptor is accomplished through a decrease in the π character of the C4–C5 bond. Thus, the ε0405 descriptor once again demonstrates that the presence of electron-donor substituents is necessary for improved antioxidant property of these molecules. Compound nos. 4–7 and 41 with electron-donor substituents at various positions of the phenolic nucleus bear higher values for the ε0405 descriptor and hence exhibit increased antioxidant property profile. Again, a positive coefficient for the dist0405 descriptor refers to the fact that the antioxidant property profile of these molecules improves with an increase in the length of the C4–C5 bond. The dist0405 descriptor in Equation (8) also implies that a decrease in π character of the C4–C5 bond causes an increase in the bond length and hence an improvement in the antioxidant property data of these molecules. Compound nos. 4–6

exerting maximum values for the antioxidant property bear

electron-donor substituents on the phenolic moiety and

hence the hydroxyl oxygen has more availability of

localised lone pair of electrons, resulting in easy

availability of the proton for interaction with the

neighbouring toxic free radicals.
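The ellipticity definition quoted above, ε = (λ1/λ2) − 1, can be illustrated with a minimal Python sketch. It assumes the usual convention that λ1 and λ2 are the two negative Hessian eigenvalues of the electron density at the bond critical point, ordered so that |λ1| ≥ |λ2|. The function name and the eigenvalues are hypothetical and are not taken from the data set of this work.

```python
def ellipticity(lam1: float, lam2: float) -> float:
    """Bond ellipticity at a bond critical point: eps = lam1 / lam2 - 1.

    lam1 and lam2 are assumed to be the two negative Hessian eigenvalues
    perpendicular to the bond path, ordered so that |lam1| >= |lam2|.
    """
    if lam1 >= 0 or lam2 >= 0:
        raise ValueError("lam1 and lam2 are expected to be negative")
    if abs(lam1) < abs(lam2):
        raise ValueError("expected |lam1| >= |lam2|")
    return lam1 / lam2 - 1.0


# Hypothetical eigenvalues (atomic units), for illustration only.
eps_example = ellipticity(-0.70, -0.55)
print(f"ellipticity = {eps_example:.4f}")
```

With this ordering the ellipticity is never negative, and it is zero when the two curvatures are equal (a cylindrically symmetric bond).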

$$
\log\left[10^{3}\left(-\frac{d[\mathrm{O_2}]}{dt}\right)\right] = -2.999 + 8.763\,\langle \lambda_{2\_0107} + 0.67042 \rangle - 12.621\,\langle 0.26754 - \lambda_{3\_0302} \rangle + 12.207 \times \lambda_{3\_0403} + 6.562 \times \varepsilon_{0405} \qquad (9)
$$

n_training = 46; s = 0.191; R² = 0.826; R²a = 0.818; F = 102.39 (df 2, 43); PRESS = 1.855; Q² = 0.794; r²m(LOO) = 0.779; n_test = 19; R²pred = 0.853; r²m(test) = 0.800; r²m(overall) = 0.786.
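As a consistency check on the statistics reported with Equation (9), the adjusted R² and the F value can be recomputed from R², the number of training compounds and the degrees of freedom. The sketch below is illustrative only: it assumes that the numerator degrees of freedom (2) play the role of the number of model terms p in the standard formulas, and the function names are hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2: float, n: int, p: int) -> float:
    """F = (R^2 / p) / ((1 - R^2) / (n - p - 1))."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Values reported with Equation (9); p = 2 is an assumption read from df (2, 43).
n_training, r2, p = 46, 0.826, 2

r2_adj = adjusted_r2(r2, n_training, p)
f_value = f_statistic(r2, n_training, p)
print(f"adjusted R2 = {r2_adj:.3f}, F = {f_value:.2f}")
```

Under this assumption the adjusted R² rounds to the reported 0.818. The recomputed F comes out close to, but not exactly at, the reported 102.39, which is the expected effect of starting from an R² rounded to three decimals.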

Equation (9) was developed based on the G/PLS technique, where the GFA tool was used for descriptor selection, while the PLS technique was utilised for model construction. This model exhibits statistically acceptable internal (Q² = 0.794) and external (R²pred = 0.853) validation parameters. Additionally, the model exhibits high values of the r²m metrics, implying that the predicted property data of these molecules are close to the corresponding observed data. According to the VIP plot (Figure 6), the descriptors occurring in Equation (9) may be arranged in the following order of significance: (i) <λ2_0107 + 0.67042>, (ii) <0.26754 − λ3_0302>, (iii) ε0405 and (iv) λ3_0403. The QTMS descriptors have been claimed [41,42] to have a diagnostic potential in revealing important fragments of molecules contributing to the response property. The present work shows the importance of the C–O bond [41] (and its π character) for the antioxidant property of phenols. Most of the descriptors appearing in Equation (8) reappear in Equation (9),

Figure 6. VIP plot for the descriptors appearing in model 4 (Equation (9)). Axes: variable importance (y) against the variables <λ2_0107 + 0.67042>, <0.26754 − λ3_0302>, ε0405 and λ3_0403 (x).


signifying the importance of these few descriptors for the antioxidant property profile of these molecules. Moreover, the occurrence of the spline term for the λ2_0107 descriptor provides a cut-off range for the value of this descriptor for exerting maximum impact on the antioxidant property profile of these molecules. Since a negative value within a spline term indicates zero contribution of the corresponding spline function, a reduced magnitude of the (negative) λ2_0107 value, i.e. a magnitude below the knot of the spline (0.67042), makes the spline term positive and accounts for an increased activity profile of this series of phenolic derivatives, as in the case of compound nos. 4, 5 and 58–61. Thus, the increased activity profile of these molecules may be attributed to the presence of electron-donor substituents at various positions of the phenolic nucleus. Again, a negative coefficient of the spline term bearing the λ3_0302 descriptor signifies that values of this descriptor above the optimum value of 0.26754 give zero contribution of the corresponding spline term and hence an improvement in the property profile of these molecules. Such an increase in the value of the λ3_0302 descriptor is achieved by substituting the parent moiety with electron-donor substituents. This results in a diminution of the π character of the C3–C2 bond, while the electron-donor substituent leads to an increase in the electron density over the phenolic oxygen. A comparison of compound nos. 28 and 39 reveals that compound no. 28, which has electron-donating substituents, exhibits a better antioxidant property than compound no. 39, which bears a deactivating group (–Cl). A similar phenomenon is observed for compound nos. 42 and 54, where compound no. 42, bearing an amino (–NH2) substituent, exerts an increased antioxidant property. The two remaining descriptors (ε0405 and λ3_0403) appearing in Equation (9) were also present in Equation (8) and imply that electron-donor or ring-activating substituents on the parent phenolic moiety exert a positive influence on the antioxidant property of these molecules.
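The behaviour of the two spline terms discussed above can be sketched in Python. The coefficients and knots are those read from Equation (9), and the angle brackets are taken to mean a truncated spline, <x> = max(x, 0). The function names and the descriptor values in the example are hypothetical and serve only to show when each spline term switches off.

```python
def spline(x: float) -> float:
    """Truncated spline term <x>: equals x when x is positive, otherwise zero."""
    return max(x, 0.0)


def predict_eq9(lam2_0107: float, lam3_0302: float,
                lam3_0403: float, eps0405: float) -> float:
    """Response predicted by Equation (9), coefficients as read from the text."""
    return (-2.999
            + 8.763 * spline(lam2_0107 + 0.67042)
            - 12.621 * spline(0.26754 - lam3_0302)
            + 12.207 * lam3_0403
            + 6.562 * eps0405)


# Hypothetical descriptor values, not taken from the data set.
# lam2_0107 = -0.60 has a magnitude below the knot, so the first spline is active;
# lam3_0302 = 0.30 lies above 0.26754, so the second spline contributes zero.
pred_example = predict_eq9(lam2_0107=-0.60, lam3_0302=0.30,
                           lam3_0403=0.28, eps0405=0.20)
print(f"predicted response = {pred_example:.3f}")
```

In this form the negative coefficient of the second spline can only lower the prediction, which is why values of λ3_0302 above 0.26754 (zero contribution) are favourable.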

Besides these, a comparison of the external validation parameters developed by Golbraikh and Tropsha [53] for all the developed QSPR models is summarised in Table 4. From Table 4, it can be concluded that for all four QSPR models, the values of r₀² and r′₀² are close to the value of r², while those of k and k′ lie within the desired range of 0.85–1.15. Moreover, the values of (r² − r₀²)/r² are well below the stipulated value of 0.1 for all the models, and those of (r² − r′₀²)/r² are below 0.1 for all the models except model 2 (0.119), which once again indicates the efficient predictive potential of the QSPR models developed in this work.
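The criteria described above can be checked directly against the entries of Table 4. The following Python sketch recomputes the two ratios from the tabulated r², r₀² and r′₀² values and tests the stated limits (ratios below 0.1; k and k′ between 0.85 and 1.15). The dictionary layout and function name are hypothetical, and the recomputed ratios may differ from the tabulated ones in the last digit because the inputs are already rounded.

```python
# Values transcribed from Table 4 (model number -> parameters).
TABLE_4 = {
    1: {"r2": 0.884, "r2_0": 0.870, "r2_0p": 0.808, "k": 1.018, "kp": 0.977},
    2: {"r2": 0.893, "r2_0": 0.867, "r2_0p": 0.787, "k": 1.022, "kp": 0.973},
    3: {"r2": 0.901, "r2_0": 0.900, "r2_0p": 0.885, "k": 1.004, "kp": 0.992},
    4: {"r2": 0.836, "r2_0": 0.830, "r2_0p": 0.828, "k": 1.016, "kp": 0.977},
}


def gt_check(row, ratio_limit=0.1, k_range=(0.85, 1.15)):
    """Evaluate the ratio and slope criteria for one model."""
    ratio = (row["r2"] - row["r2_0"]) / row["r2"]
    ratio_prime = (row["r2"] - row["r2_0p"]) / row["r2"]
    low, high = k_range
    return {
        "ratio": ratio,
        "ratio_prime": ratio_prime,
        "ratio_ok": ratio < ratio_limit,
        "ratio_prime_ok": ratio_prime < ratio_limit,
        "k_ok": low <= row["k"] <= high,
        "k_prime_ok": low <= row["kp"] <= high,
    }


results = {model: gt_check(row) for model, row in TABLE_4.items()}
for model, res in results.items():
    print(model, f"{res['ratio']:.3f}", f"{res['ratio_prime']:.3f}",
          res["ratio_ok"], res["ratio_prime_ok"], res["k_ok"], res["k_prime_ok"])
```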

3.3 Randomisation test employed for further model validation

The robustness of the developed models was checked using the randomisation technique. Both process and model randomisation tests were performed for this work at 90 and 99% confidence levels, respectively. Model randomisation was performed in order to ensure the reliability and predictive ability of the developed QSPR models, while process randomisation ascertains the acceptability of the process employed for the development of the respective models. The results of both types of randomisation tests, as summarised in Table 5, reveal that the average randomised correlation coefficient (Rr) values for all the models are much lower than the original correlation coefficients (R) of the corresponding QSPR models. Thus, the models are robust and well predictive on the basis of their randomised correlation coefficients. Moreover, the cR²p values for all the presented models were calculated [62]. The cR²p parameter penalises the model R² for small differences between the values of R² and R²r. Since the values of cR²p are higher than the threshold value of 0.5 for all the models, the models may be considered robust and not the outcome of mere chance. For the two types of QSPR models developed, models 1 and 3 show highly acceptable results for both process and model randomisation tests. However, the G/PLS models (models 2 and 4) developed using the two sets of descriptors exhibit the maximum values of cR²p for model randomisation.
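The penalised parameter can be illustrated with the entries of Table 5. The sketch below assumes the form cR²p = R × √(R² − R²r); this expression is not written out in this section, but it reproduces every tabulated cR²p value to within rounding. The variable and function names are hypothetical.

```python
import math


def c_r2_p(r: float, r2: float, rr2: float) -> float:
    """Assumed form of the penalised parameter: R * sqrt(R^2 - Rr^2)."""
    return r * math.sqrt(r2 - rr2)


# Rows transcribed from Table 5: (test type, model, R2, R, Rr2, reported cR2p).
TABLE_5 = [
    ("process", 1, 0.846, 0.920, 0.131, 0.778),
    ("process", 2, 0.835, 0.914, 0.292, 0.674),
    ("process", 3, 0.846, 0.920, 0.161, 0.761),
    ("process", 4, 0.826, 0.909, 0.372, 0.612),
    ("model", 1, 0.846, 0.920, 0.079, 0.806),
    ("model", 2, 0.835, 0.914, 0.004, 0.833),
    ("model", 3, 0.846, 0.920, 0.073, 0.809),
    ("model", 4, 0.826, 0.909, 0.006, 0.823),
]

recomputed = [c_r2_p(r, r2, rr2) for _, _, r2, r, rr2, _ in TABLE_5]
for (kind, model, *_rest, reported), value in zip(TABLE_5, recomputed):
    print(f"{kind:8s} model {model}: recomputed {value:.3f}, reported {reported:.3f}")
```

The closer the randomised R²r is to the model R², the smaller the square root becomes, so a model whose fit is nearly matched by randomised data falls towards or below the 0.5 threshold.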

Table 4. Validation using external validation parameters of Golbraikh and Tropsha [53].

Model no.           r²      r₀²     r′₀²    k       k′      (r² − r₀²)/r²   (r² − r′₀²)/r²
1 (Equation (6))    0.884   0.870   0.808   1.018   0.977   0.016           0.086
2 (Equation (7))    0.893   0.867   0.787   1.022   0.973   0.030           0.119
3 (Equation (8))    0.901   0.900   0.885   1.004   0.992   0.002           0.018
4 (Equation (9))    0.836   0.830   0.828   1.016   0.977   0.008           0.010

Table 5. Results obtained from the randomisation tests.

Model no.           R²      R       Rr      Rr²     cR²p
Process randomisation
1 (Equation (6))    0.846   0.920   0.362   0.131   0.778
2 (Equation (7))    0.835   0.914   0.540   0.292   0.674
3 (Equation (8))    0.846   0.920   0.401   0.161   0.761
4 (Equation (9))    0.826   0.909   0.610   0.372   0.612
Model randomisation
1 (Equation (6))    0.846   0.920   0.281   0.079   0.806
2 (Equation (7))    0.835   0.914   0.065   0.004   0.833
3 (Equation (8))    0.846   0.920   0.270   0.073   0.809
4 (Equation (9))    0.826   0.909   0.079   0.006   0.823



3.4 Test for AD

The leverage approach was employed for checking the AD of the test set molecules predicted using the two GFA models. Figure 7 shows a plot of standardised residuals vs. leverage values (Williams plot) [64] of the test set compounds for models 1 and 3. Since the critical value of leverage for both the models is 0.789, all 19 test set compounds are found to be within the AD of the models (i.e. there is no structurally influential chemical). Moreover, the standardised residuals of all 19 molecules are

Figure 7. Williams plots (standardised residual vs. leverage) for (a) model 1 and (b) model 3.

Figure 8. DModX values of the 19 test set compounds at 99% level for (a) model 2 and (b) model 4, with the thick horizontal lines signifying the critical DModX values.


located within the range of ±3σ, implying that none of the compounds behaves as an outlier. The AD of the two G/PLS models was checked using the DModX [51] approach. A bar diagram of the DModX values of all the test set compounds for models 2 and 4 is shown in Figure 8. For model 2, the DModX values of all the test compounds are below the critical value of 2.502 at the 99% confidence level. So, none of the compounds is outside the AD, and predictions for all 19 test compounds are acceptable. Similarly, for model 4, the DModX values of all 19 test compounds are below the critical value of 2.843 at the 99% confidence level. The results thus indicate that all the test set compounds are within the AD and provide reliable predictions.
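The leverage check behind a Williams plot can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this work: the leverage of a compound is taken as h = xᵀ(XᵀX)⁻¹x, with X the training descriptor matrix augmented by an intercept column, and a compound is accepted when h does not exceed the critical leverage and its standardised residual lies within ±3. The default critical value of 0.789 is the one quoted above; the function names and the toy training data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverage(train_rows, query_row):
    """h = x^T (X^T X)^-1 x, with an intercept column prepended (assumption)."""
    X = [[1.0] + list(row) for row in train_rows]
    x = [1.0] + list(query_row)
    p = len(x)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    z = solve(xtx, x)  # z = (X^T X)^-1 x
    return sum(xi * zi for xi, zi in zip(x, z))


def within_domain(h, std_residual, h_crit=0.789, resid_limit=3.0):
    """True when a compound is neither influential nor a response outlier."""
    return h <= h_crit and abs(std_residual) <= resid_limit


# Toy one-descriptor training set (hypothetical values).
train = [[0.0], [1.0], [2.0], [3.0]]
h_centre = leverage(train, [1.5])
h_edge = leverage(train, [3.0])
print(h_centre, h_edge, within_domain(h_edge, std_residual=1.2))
```

Compounds close to the centre of the training descriptor space receive low leverage, while those at its edge or beyond receive high leverage, which is what the horizontal axis of a Williams plot displays.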

4. Overview and conclusions

This work deals with the QSPR analysis of a series of 65 phenolic derivatives having an antioxidant property. The variation in the antioxidant property of these compounds is due to a variation in the position and type of substitutions on the parent moiety. The QSPR models thus developed provide an outline regarding the type and position of substitutions required for the molecules to exert an optimum antioxidant property. Quantum chemical descriptors belonging to two classes (Mulliken charges and QTMS) were calculated at the Hartree–Fock HF/6-31G(d) level for this work. Due to the lack of additional available data on the antioxidant properties of phenolic derivatives measured by an oxygen absorption method (involving autoxidation of styrene in the presence of the phenolic derivatives and their rates of peroxy radical trapping), true external validation of the developed QSPR models was not performed. However, validation of the developed models has been performed by splitting the entire data set into training and test sets and calculating several stringent validation metrics. This supports the predictive ability of the

developed models. Approximately 70% of the compounds of the whole data set (46 of 65) were utilised as the training set, while the remaining 19 compounds were held out as the test set.

Models were built using the GFA and G/PLS techniques. Based on the order of significance of the descriptors appearing in Equations (6) and (7), it can be inferred that the charge on O7 makes a significant contribution to the antioxidant property profile of the molecules. An increase in the electron density over O7, achieved through electron-donor groups substituted on the parent moiety, reduces the degree of aromatic delocalisation of the lone pair of electrons of the phenolic oxygen and facilitates easy interaction of the phenolic proton with the hazardous free radicals. Additionally, the charges over C2, C4 and H8 also influence the antioxidant property of the molecules to a considerable extent. Thus, substituents having a positive mesomeric effect are conducive to the property data of these molecules. However, those having a negative inductive effect (–OCH3) are favoured for substitution at C4, while groups with a positive inductive effect (alkyl groups) are favoured at C2. Again, the models developed with the QTMS descriptors signify that a decrease in the π character of the C1–O7 bond is a key factor for the phenolic derivatives to exhibit an efficient antioxidant property. Besides these, a decrease in aromatic delocalisation of the lone pair of electrons of the phenolic oxygen over the phenolic nucleus also adds to the antioxidant property profile of the phenolic derivatives. Thus, the QTMS descriptors also indicate that substituents with a positive mesomeric effect (electron-donor groups like methoxy) increase the electron density over the hydroxyl oxygen and enable a rapid reaction with the peroxy radicals. The conclusions drawn in this work are in accordance with the qualitative observations made in previous papers [25,37,38]. The model performance in this work was judged by internal and external (using the test set) validation measures. Acceptable values were obtained for both the internal and external validation parameters (Q² and R²pred) for all four models reported in this work. Besides these, the r²m metrics calculated for all the models imply that a good correlation is maintained between the observed and predicted activity data and that the values are in close proximity to each other. Further validation of the developed QSPR models was performed using the randomisation technique, and the results obtained support the reliability and robustness of the models. Finally, it may be concluded that the molecules are capable of achieving the required criteria for maximising the response variable, i.e. a high antioxidant property, through suitable substitution (as explained by the developed models) on the parent phenolic moiety. The QSPR models developed in this work may be utilised for further design and analysis of new molecules with an improved antioxidant property.

Acknowledgements

This research work was supported in the form of a major research project to K.R. and a senior research fellowship to I.M. by the Indian Council of Medical Research (ICMR), New Delhi.

References

[1] E. Koutsilieri, C. Scheller, E. Grunblatt, K. Nara, J. Li, and P. Riederer, Free radicals in Parkinson’s disease, J. Neurol. 249 (2002), pp. II/1–II/5.

[2] K.N. Prasad, W.C. Cole, and B. Kumar, Multiple antioxidants in the prevention and treatment of Parkinson’s disease, J. Am. Coll. Nutr. 18 (1999), pp. 413–423.

[3] S.L. Nuttall, M.J. Kendall, and U. Martin, Antioxidant therapy for the prevention of cardiovascular disease, Q. J. Med. 92 (1999), pp. 239–244.

[4] M. Dizdaroglu, P. Jaruga, M. Birincioglu, and H. Rodriguez, Free radical-induced damage to DNA: Mechanisms and measurement, Free Radic. Biol. Med. 32 (2002), pp. 1102–1115.



[5] T. Ozawa, in Understanding the Process of Aging, E. Cadenas and L. Packer, eds., Marcel Dekker, New York, 1999, pp. 265–292.

[6] V. Grishko, M. Solomon, G.L. Wilson, S.P. LeDoux, and M.N. Gillespie, Oxygen radical-induced mitochondrial DNA damage and repair in pulmonary vascular endothelial cell phenotypes, Am. J. Physiol. Lung Cell Mol. Physiol. 280 (2001), pp. L1300–L1308.

[7] M. Genestra, Oxyl radicals, redox-sensitive signalling cascades and antioxidants, Cell Signal. 19 (2007), pp. 1807–1809.

[8] R.C. Hubbard, F. Ogushi, G.A. Fells, A.M. Cantin, S. Jallat, M. Courtney, and R.G. Crystal, Oxidants spontaneously released by alveolar macrophages of cigarette smokers can inactivate the active site of alpha 1-antitrypsin, rendering it ineffective as an inhibitor of neutrophil elastase, J. Clin. Invest. 80 (1987), pp. 1289–1295.

[9] B.N. Ames, M.K. Shigenaga, and T.M. Hagen, Oxidants, antioxidants, and the degenerative diseases of aging, Proc. Natl Acad. Sci. USA 90 (1993), pp. 7915–7922.

[10] J.M.C. Gutteridge and B. Halliwell, Antioxidants in Nutrition, Health and Disease, Oxford University Press, Oxford, 1994.

[11] B. Halliwell, Antioxidants and human disease: A general introduction, Nutr. Rev. 55 (1997), pp. S44–S49.

[12] J.S. Wright, E.R. Johnson, and G.A. DiLabio, Predicting the activity of phenolic antioxidants: Theoretical method, analysis of substituent effects, and application to major families of antioxidants, J. Am. Chem. Soc. 123 (2001), pp. 1173–1183.

[13] A.P. Vafiadis and E.G. Bakalbassis, A DFT study on the deprotonation antioxidant . . . step of ortho-substituted phenolic cation radicals, Chem. Phys. 316 (2005), pp. 195–204.

[14] M. Musialik and G. Litwinienko, Scavenging of DPPH• radicals by vitamin E is accelerated by its partial ionization: The role of sequential proton loss electron transfer, Org. Lett. 7 (2005), pp. 4951–4954.

[15] J. Pokorny, Natural antioxidants for food use: A review, Trends Food Sci. Technol. 2 (1991), pp. 223–227.

[16] Z. Judith and C. John, The Cardiovascular Cure: How to Strengthen your Self-defense Against Heart Attack and Stroke, Broadway Books, New York, 2002.

[17] M. Serafini, J.A. Laranjinha, L.M. Almeida, and G. Maiani, Inhibition of human LDL lipid peroxidation by phenol-rich beverages and their impact on plasma total antioxidant capacity in humans, J. Nutr. Biochem. 11 (2000), pp. 585–590.

[18] M. Jang, L. Cai, G.O. Udeani, K.V. Slowing, C.F. Thomas, C.W.W. Beecher, H.H.S. Fong, N.R. Farnsworth, A.D. Kinghorn, R.G. Mehta, R.C. Moon, and J.M. Pezzuto, Cancer chemopreventive activity of resveratrol, a natural product derived from grapes, Science 275 (1997), pp. 218–220.

[19] O. Vieira, I. Escargueil-Blanc, O. Meilhac, J.P. Basile, J. Laranjinha, L. Almeida, R. Salvayre, and A. Negre-Salvayre, Effect of dietary phenolic compounds on apoptosis of human cultured endothelial cells induced by oxidized LDL, Br. J. Pharmacol. 123 (1998), pp. 565–573.

[20] P.J. Magalhaes, D.O. Carvalho, J.M. Cruz, L.F. Guido, and A.A. Barros, Fundamentals and health benefits of xanthohumol, a natural product derived from hops and beer, Nat. Prod. Commun. 4 (2009), pp. 591–610.

[21] E.N. Frankel, Lipid Oxidation, Oily Press, Bridgewater, UK, 2005.

[22] E.N. Frankel, Antioxidants in Food and Biology, Oily Press, Bridgewater, UK, 2007.

[23] I. Nagamine, H. Sakurai, H.T.T. Nguyen, M. Miyahara, J. Parkanyiova, Z. Reblova, and J. Pokorny, Vegetable oils and amino acids, Czech J. Food Sci. 22 (2004), pp. 155S–158S.

[24] J. Pokorny, in Natural Antioxidant Phenols, D. Boskou, ed., Synpost, Trivandrum, India, 2006.

[25] T. Kajiyama and Y. Ohkatsu, Effect of meta-substituents of phenolic antioxidants-proposal of secondary substituent effect, Polym. Degrad. Stab. 75 (2002), pp. 535–542.

[26] T. Vasileva, K. Stanulov, and S. Nenkova, Phenolic antioxidants for fuels, J. Univ. Chem. Technol. Metall. 43 (2008), pp. 65–68.

[27] P. Liu and W. Long, Current mathematical methods used in QSAR/QSPR studies, Int. J. Mol. Sci. 10 (2009), pp. 1978–1998.

[28] A.M. Helguera, R.D. Combes, M.P. Gonzalez, and M.N. Cordeiro, Applications of 2D descriptors in drug design: A DRAGON tale, Curr. Top. Med. Chem. 8 (2008), pp. 1628–1655.

[29] M. Zhao, Z. Li, Y. Wu, Y.R. Tang, C. Wang, Z. Zhang, and S. Peng, Studies on log P, retention time and QSAR of 2-substituted phenylnitronyl nitroxides as free radical scavengers, Eur. J. Med. Chem. 42 (2007), pp. 955–965.

[30] A.K. Calgarotto, S. Miotto, K.M. Honorio, A.B.F. Da Silva, S. Marangoni, J.L. Silva, M. Comar, Jr., K.M.T. Oliveirad, and S.L. Da Silva, A multivariate study on flavonoid compounds scavenging the peroxynitrite free radical, J. Mol. Struct.: Theochem. 808 (2007), pp. 25–33.

[31] A.I. Khlebnikov, I.A. Schepetkin, N.G. Domina, L.N. Kirpotina, and M.T. Quinn, Improved quantitative structure–activity relationship models to predict antioxidant activity of flavonoids in chemical, enzymatic, and cellular systems, Bioorg. Med. Chem. 15 (2007), pp. 1749–1770.

[32] S. Ray, C. Sengupta, and K. Roy, QSAR modeling of antiradical and antioxidant activities of flavonoids using electrotopological state atom (E-state) parameters, Cent. Eur. J. Chem. 5 (2007), pp. 1094–1113.

[33] S. Gupta, S. Matthew, P.M. Abreu, and J. Aires-de-Sousa, QSAR analysis of phenolic antioxidants using MOLMAP descriptors of local properties, Bioorg. Med. Chem. 14 (2006), pp. 1199–1206.

[34] I. Mitra, A. Saha, and K. Roy, QSAR modeling of antioxidant activities of hydroxybenzalacetones using quantum chemical, physicochemical and spatial descriptors, Chem. Biol. Drug Des. 73 (2009), pp. 526–536.

[35] I. Mitra, K. Roy, and A. Saha, QSAR of anti-lipid peroxidative activity of substituted benzodioxoles using chemometric tools, J. Comput. Chem. 30 (2009), pp. 2712–2722.

[36] I. Mitra, A. Saha, and K. Roy, Pharmacophore mapping of arylamino substituted benzo[b]thiophenes as free radical scavengers, J. Mol. Model. 16 (2010), pp. 1585–1596.

[37] T. Kajiyama and Y. Ohkatsu, Effect of para-substituents of phenolic antioxidants, Polym. Degrad. Stab. 71 (2001), pp. 445–452.

[38] T. Matsuura and Y. Ohkatsu, Phenolic antioxidants: Effect of o-benzyl substituents, Polym. Degrad. Stab. 70 (2000), pp. 59–63.

[39] GaussView 3.0, Semichem Inc., Gaussian Inc., Pittsburgh, PA, USA, 2003.

[40] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, J.A. Montgomery, Jr., T. Vreven, K.N. Kudin, J.C. Burant, J.M. Millam, S.S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G.A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J.E. Knox, H.P. Hratchian, J.B. Cross, C. Adamo, J. Jaramillo, R. Gomperts, R.E. Stratmann, O. Yazyev, A.J. Austin, R. Cammi, C. Pomelli, J.W. Ochterski, P.Y. Ayala, K. Morokuma, G.A. Voth, P. Salvador, J.J. Dannenberg, V.G. Zakrzewski, S. Dapprich, A.D. Daniels, M.C. Strain, O. Farkas, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J.V. Ortiz, Q. Cui, A.G. Baboul, S. Clifford, J. Cioslowski, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W. Wong, C. Gonzalez, and J.A. Pople, GAUSSIAN 03, Revision B.05, Gaussian Inc., Pittsburgh, PA, 2003.

[41] K. Roy and P.L.A. Popelier, Predictive QSPR modeling of acidic dissociation constant (pKa) of phenols in different solvents, J. Phys. Org. Chem. 22 (2009), pp. 186–196.

[42] K. Roy and P.L.A. Popelier, Exploring predictive QSAR models using quantum topological molecular similarity (QTMS) descriptors for toxicity of nitroaromatics to Saccharomyces cerevisiae, QSAR Comb. Sci. 27 (2008), pp. 1006–1012.

[43] P.L.A. Popelier, Quantum molecular similarity. 1. BCP space, J. Phys. Chem. A 103 (1999), pp. 2883–2890.

[44] R.F.W. Bader and H.J.T. Preston, The kinetic energy of molecular . . . molecular stability, Int. J. Quantum Chem. 3 (1969), pp. 327–347.

[45] S.T. Howard and O. Lamarche, Description of covalent bond orders using the charge density topology, J. Phys. Org. Chem. 16 (2003), pp. 133–141.

[46] R.F.W. Bader, T.S. Slee, D. Cremer, and E. Kraka, Description of conjugation and hyperconjugation in terms of electron distributions, J. Am. Chem. Soc. 105 (1983), pp. 5061–5068.


[47] J.T. Leonard and K. Roy, On selection of training and test sets for the development of predictive QSAR models, QSAR Comb. Sci. 25 (2006), pp. 235–251.

[48] SPSS, standard version 1998, is statistical software of SPSS Inc., Chicago, IL; software available at http://www.spss.com.

[49] D. Rogers and A.J. Hopfinger, Application of genetic function approximation to quantitative structure–activity relationship and quantitative structure–property relationship, J. Chem. Inf. Comput. Sci. 34 (1994), pp. 854–866.

[50] A. Fraser, Simulation of genetic systems by automatic digital computers. I. Introduction, Aust. J. Biol. Sci. 10 (1957), pp. 484–491.

[51] S. Wold, M. Sjostrom, and L. Eriksson, PLS-regression: A basic tool of chemometrics, Chemom. Intell. Lab. Syst. 58 (2001), pp. 109–130.

[52] G.W. Snedecor and W.G. Cochran, Statistical Methods, Oxford & IBH, New Delhi, 1967.

[53] A. Golbraikh and A. Tropsha, Beware of q2!, J. Mol. Graph. Model. 20 (2002), pp. 269–276.

[54] P.P. Roy and K. Roy, On some aspects of variable selection for partial least squares regression models, QSAR Comb. Sci. 27 (2008), pp. 302–313.

[55] S. Wold and L. Eriksson, Validation tools, in Chemometric Methods in Molecular Design, H. van de Waterbeemd, ed., VCH, Weinheim, 1995, pp. 312–317.

[56] A.K. Debnath, Quantitative structure–activity relationship (QSAR): A versatile tool in drug design, in Combinatorial Library Design and Evaluation, A.K. Ghose and V.N. Viswanadhan, eds., Marcel Dekker, New York, 2001, pp. 73–129.

[57] K. Roy, On some aspects of validation of predictive QSAR models, Expert Opin. Drug Discov. 2 (2007), pp. 1567–1577.

[58] P.P. Roy and K. Roy, Comparative QSAR studies of CYP1A2 inhibitor flavonoids using 2D and 3D descriptors, Chem. Biol. Drug Des. 72 (2008), pp. 370–382.

[59] P.P. Roy, S. Paul, I. Mitra, and K. Roy, On two novel parameters for validation of predictive QSAR models, Molecules 14 (2009), pp. 1660–1701.

[60] A.A. Toropov, A.P. Toropova, and E. Benfenati, QSPR modeling bioconcentration factor (BCF) by balance of correlations, Eur. J. Med. Chem. 44 (2009), pp. 2544–2551.

[61] P. Lu, X. Wei, and R. Zhang, CoMFA and CoMSIA 3D-QSAR studies on quionolone caroxylic acid derivatives inhibitors of HIV-1 integrase, Eur. J. Med. Chem. 45 (2010), pp. 3413–3419.

[62] I. Mitra, A. Saha, and K. Roy, Exploring quantitative structure–activity relationship (QSAR) studies of antioxidant phenolic compounds obtained from traditional Chinese medicinal plants, Mol. Simul. 36 (2010), pp. 1067–1079.

[63] P. Gramatica, Principles of QSAR models validation: Internal and external, QSAR Comb. Sci. 26 (2007), pp. 694–701.

[64] L. Eriksson, J. Jaworska, A.P. Worth, M.T. Cronin, R.M. McDowell, and P. Gramatica, Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs, Environ. Health Perspect. 111 (2003), pp. 1361–1375.

[65] L. Zhang, H. Zhu, T. Oprea, A. Golbraikh, and A. Tropsha, QSAR modeling of the blood–brain barrier permeability for diverse organic compounds, Pharm. Res. 25 (2008), pp. 1902–1914.

[66] SIMCA-P version 10.0.2.0, 2002, is a product of UMETRICS, Umea, Sweden; www.umetrics.com.

[67] ChemDraw Ultra version 5.0, 1999, is a product of CambridgeSoft Corporation, Cambridge, MA 02140, USA; available at http://www.camsoft.com/support.html.


