This article was downloaded by: [Moskow State Univ Bibliote]
On: 18 February 2014, At: 09:53
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Molecular Simulation
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gmos20
QSPR of antioxidant phenolic compounds using quantum chemical descriptors
Indrani Mitra^a, Achintya Saha^b & Kunal Roy^a
^a Division of Medicinal and Pharmaceutical Chemistry, Drug Theoretics and Cheminformatics Lab, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, 700032, India
^b Department of Chemical Technology, University College of Science and Technology, University of Calcutta, 92, A.P.C. Road, Kolkata, 700009, India
Published online: 13 Apr 2011.
To cite this article: Indrani Mitra, Achintya Saha & Kunal Roy (2011) QSPR of antioxidant phenolic compounds using quantum chemical descriptors, Molecular Simulation, 37:05, 394-413
To link to this article: http://dx.doi.org/10.1080/08927022.2010.543980
QSPR of antioxidant phenolic compounds using quantum chemical descriptors
Indrani Mitra^a, Achintya Saha^b and Kunal Roy^a*
^a Division of Medicinal and Pharmaceutical Chemistry, Drug Theoretics and Cheminformatics Lab, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India; ^b Department of Chemical Technology, University College of Science and Technology, University of Calcutta, 92, A.P.C. Road, Kolkata 700009, India
(Received 8 October 2010; final version received 11 November 2010)
Accelerated systemic free radical production poses a serious problem to healthy living. Phenolic antioxidants have long been studied for their ability to react with these toxic radicals. The present work deals with a series of substituted phenolic derivatives with a wide range of antioxidant property data. Quantitative structure–property relationship models have been developed correlating the antioxidant properties of these molecules with quantum chemical descriptors such as Mulliken charges of the common atoms and quantum topological molecular similarity indices of the common bonds. Models were developed based on the training set compounds and were subsequently validated externally using the test set molecules. The results suggest that substituents having a positive mesomeric effect increase the electron density over the phenolic oxygen and reduce the aromatic delocalisation of the lone pair of electrons on the phenolic oxygen, thereby enabling effective interaction of the phenolic proton with the free radicals. Moreover, in addition to the mesomeric effect, the inductive effect of the different substituents also plays a crucial role in maintaining the overall charge distribution on the phenolic nucleus. On the basis of their predictive power and interpretability, the models may be further utilised for the design of more potent antioxidant molecules.
Keywords: QTMS; QSPR; phenolic derivatives; antioxidants
1. Introduction
Noxious free radicals have been implicated in the exacerbation of a number of oxidative stress-related diseases such as Parkinson's disease [1], Alzheimer's disease [2], atherosclerosis and cardiovascular damage [3]. These free radicals readily react with vital molecules in the body, such as DNA, causing mutations in the sequence of the genetic material [4]. The accumulation of such changes is thought to contribute to ageing and to several degenerative diseases. Free radicals can damage nucleic acids, proteins or lipids, resulting in the breakage of DNA strands and in the production of toxic carbonyls. Lipid peroxidation of polyunsaturated fatty acids exposed to oxygen leads to rancidity in foods. In living animal cells, peroxidised membranes lose their permeability, becoming rigid, reactive and non-functional. The mitochondrial theory of ageing [5] postulates that damage to mitochondrial DNA and organelles by free radicals leads to loss of mitochondrial function and of cellular energy [6].
These free radicals are produced from both endogenous and exogenous sources. Endogenously, free radicals are produced constitutively during metabolic reactions [7]; they are also formed as by-products during the autooxidation of catecholamines, haemoglobin, myoglobin, reduced cytochrome C and thiols. Exogenous sources of free radicals include air pollution, to which industrial waste and cigarette smoke are major contributors. Cigarette smoke itself is laden with oxidants [8], while radiation and several trace metals (lead, mercury, iron and copper) constitute major sources of free radical generation [9].
In order to combat systemic free radicals, the body has its own defence mechanism, which comprises several endogenous antioxidant enzymes. Antioxidants [10] are molecules capable of neutralising free radicals by breaking the free radical chain reaction and by chelating the metal ions that catalyse these chain reactions. Chain reactions [11] involving free radicals proceed through the steps of initiation (reactions of free radicals with stable species to form more free radicals), propagation (the free radicals thus formed react with more unsaturated lipid molecules) and termination (two free radicals combine to form a stable molecule). Antioxidants accelerate termination, thereby inhibiting the propagation of the reaction. At the molecular level, antioxidants interact with free radicals by three primary mechanisms: (a) hydrogen atom transfer; (b) single-electron transfer followed by proton transfer; and (c) sequential proton loss electron transfer [12–14]. When the systemic antioxidants fail to manage the large pool of free radicals, external supplementation of antioxidants becomes essential. Fruits and vegetables are rich sources of antioxidants [15]. Researchers have shown that compounds belonging to a variety of chemical classes (phenolic compounds, flavonoids, lycopene, carotenoids, etc.) exhibit antioxidant properties. The free radical quenching ability of phenolic compounds arises from both their acidity (ability to donate protons) and the delocalised π-electrons characteristic of the benzene ring (ability to transfer electrons while remaining relatively stable). Vitamin E is the best-known physiological phenolic antioxidant. The most abundant polyphenols are the condensed tannins, which are found in virtually all families of plants and comprise up to 50% of the dry weight of leaves. Tea, cereal grains, nuts, grapes, pears, plums, berries, etc. are rich in phenolic compounds, including biphenyls, flavonoids and phenolic acids. The beneficial effects of polyphenols in higher animal species include a reduction in inflammation-related conditions such as coronary artery disease [16] and the down-regulation of oxidised low-density lipoprotein [17]. Resveratrol has been reported to inhibit the occurrence and/or growth of experimental tumours [18]. Another potential benefit of phenolic antioxidants is their anti-ageing effect, such as slowing skin wrinkling [19]. Polyphenols may also bind non-haem iron (e.g. from plant sources) in vitro in model systems [20], possibly reducing its absorption.

Apart from their biological importance, antioxidants are also useful for preserving food materials and for other purposes. Phenolic antioxidants react with lipid peroxy free radicals produced as a result of lipid oxidation [21] and thereby prevent further autooxidation of lipid molecules [22]. The inhibition of lipid autooxidation is necessary for the storage or heating of foods, as well as for reducing the oxidation of lipids after ingestion and absorption through the intestinal wall. In the course of reacting with free radicals, the phenolic antioxidant itself forms a free radical, which is more stable than a lipidoxy or lipid peroxy radical. The antioxidant free radical thus formed may undergo either of two reactions: (i) it reacts with another antioxidant free radical, forming a dimer that itself possesses antioxidant activity; or (ii) it forms a copolymer by reacting with a lipidoxy or lipid peroxy radical, or a biradical that rapidly isomerises into a quinone [23,24]. Similarly, chemicals such as petroleum products are unstable in the presence of oxygen and decompose through autooxidation [25]. The liquid-phase oxidation of hydrocarbons proceeds through a radical chain mechanism with chain propagation and branching. Thus, phenolic antioxidants with tertiary alkyl substituents at the 2- and 6-positions can also be used to improve the chemical stability of fuels and to inhibit the radical chain oxidation of hydrocarbons [26].

ISSN 0892-7022 print/ISSN 1029-0435 online
© 2011 Taylor & Francis
DOI: 10.1080/08927022.2010.543980
http://www.informaworld.com
*Corresponding author. Email: [email protected]
Molecular Simulation
Vol. 37, No. 5, April 2011, 394–413
Given the immense utility of antioxidants for human health, researchers have turned towards the design and development of synthetic chemical entities with potent antioxidant properties. In the design of active chemical moieties, the quantitative structure–property relationship (QSPR) technique plays a key role [27]. The QSPR method establishes a correlation between the structures of chemical compounds and their property data. Since the property of a chemical compound is a function of its structural attributes and physicochemical characteristics, a QSPR model can be represented by a mathematical equation relating the property data to the descriptors (numerical representations of molecular characteristics based on structure) [28]. Several such models, based on the antioxidant properties of a variety of chemical entities, have been reported by different authors. Zhao et al. [29] used the conventional Hansch method to develop quantitative structure–activity relationship (QSAR) models for 2-substituted phenylnitronyl nitroxides as free radical scavengers. Calgarotto et al. [30] performed a multivariate study on 24 flavonoid compounds for their peroxynitrite scavenging activity. Khlebnikov et al. [31] used the partial least squares (PLS) regression technique and partial 3D comparison of molecules by the frontal polygon method to develop a QSAR model of flavonoid antioxidants. Using electrotopological state (E-state) atom parameters, Ray et al. [32] modelled the antiradical and antioxidant activities of flavonoids. MOLMAP descriptors of local properties were used by Gupta et al. [33] for QSAR analysis of the antioxidant activity of phenolic compounds. Recently, Mitra et al. [34–36] reported several QSAR models for the antioxidant activities of different chemical entities, such as hydroxybenzalacetone derivatives [34], benzodioxoles [35] and benzothiophenes [36], based on a wide range of descriptors and various chemometric tools. The present work deals with a series of phenolic derivatives that have been reported to exert antioxidant action. QSPR models have been developed using quantum chemical descriptors and different statistical tools. The predictive performance of the models was judged using various validation measures, to ensure that the models can be used to predict the antioxidant property of new chemical entities belonging to the class of phenolic derivatives. However, because the models were developed on a congeneric data set of compounds sharing the phenolic nucleus, they should be used only to design and assess phenolic compounds for antioxidant property. The observations made in this study may be used in optimisation studies of compounds similar to those used in this work, with the aim of improving their antioxidant activity.
2. Materials and methods
2.1 The data set
The model data set used for this work was obtained by combining three different data sets reported by Kajiyama and Ohkatsu [25,37] and Matsuura and Ohkatsu [38]. The whole data set comprises 65 phenolic antioxidants with different substituents at the ortho, meta and para positions of the parent phenolic moiety. The antioxidant properties of these molecules were measured by an oxygen absorption method, in which the autooxidation of styrene was monitored in the presence of the phenolic derivatives and their rates of peroxy radical trapping were determined [38]. The antioxidant activity of these phenolic derivatives was assessed from their rate of oxygen uptake, which is inversely proportional to the concentration of the phenolic antioxidants [38]. For the development of the QSPR models, the oxygen uptake rates, (−d[O2]/dt)inh (×10^-6 M^-1 s^-1) (Table 1), were converted to the negative logarithmic scale, log[10^3/(−d[O2]/dt)inh], so that a lower oxygen uptake rate (stronger inhibition) gives a higher property value. The molecular structures of these compounds and their antioxidant property data are summarised in Table 1.

Although other data sets of phenolic derivatives with antioxidant activity are available, only the three data sets mentioned above [25,37,38] were combined, because similar experimental protocols were adopted in these reports for measuring the antioxidant activity of this series of phenolic derivatives. Owing to marked variations in end-point measurement techniques, other available data sets of antioxidant phenolic compounds were not included in this work.
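The conversion from oxygen uptake rate to the modelled property can be checked against Table 1 with a few lines of code. This is a minimal sketch: the function name `property_from_rate` is hypothetical, and the rates are taken to be in the units of Table 1.

```python
import math


def property_from_rate(rate):
    """Convert an inhibited oxygen-uptake rate (-d[O2]/dt)inh, in the
    units of Table 1, to the modelled property log10(10**3 / rate)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return math.log10(1e3 / rate)


# Rates of compounds 1, 3, 4 and 5 from Table 1
for rate in (25, 5.6, 1.6, 0.8):
    print(rate, round(property_from_rate(rate), 3))
# 25 1.602
# 5.6 2.252
# 1.6 2.796
# 0.8 3.097
```

Because the property is 3 − log10(rate), halving the uptake rate raises the property by log10(2) ≈ 0.301. For example, compounds 4 and 5 (rates 1.6 and 0.8) differ by exactly this amount.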
2.2 Descriptor calculation
Two types of QSPR models were developed in this work based on two categories of quantum chemical descriptors, viz. Mulliken charges of the common atoms (Figure 1) and quantum topological molecular similarity (QTMS) descriptors (Table 2). The QTMS descriptors capture the characteristics of a chemical bond that vary with the electron density of the participating atoms. For the calculation of the Mulliken atomic charges, the molecules were built using the GaussView 3.0 software [39], and their energy was subsequently minimised using the GAUSSIAN03W software [40]. Energy was calculated at three different levels of theory: (i) the semi-empirical AM1 method, (ii) the Hartree–Fock method at the HF/3-21G(d) level and (iii) the Hartree–Fock method at the HF/6-31G(d) level [40]. The output from each level was used as the input for the next level. The charges of the common atoms calculated at each level were then correlated with the antioxidant property data of the compounds involved in this study. Although charges were calculated at all three levels of theory, statistically significant QSPR models were obtained only with the charges calculated at the HF/6-31G(d) level; hence, only these charges were used for final QSPR model development. Similarly, the QTMS descriptors were calculated at the HF/6-31G(d) level [40] from the wave functions generated at this level, since the bond critical point (BCP) descriptors at this level gave better predictive models and more consistent results than those calculated at lower levels of theory.
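Mulliken charges come from a standard population analysis: q_A = Z_A − Σ_{μ∈A} (PS)_{μμ}, where P is the density matrix (including occupations), S is the overlap matrix and the sum runs over the basis functions centred on atom A. GAUSSIAN03W reports these charges directly, so the sketch below only illustrates the arithmetic. The helper name and the two-function H2 example (overlap s = 0.66) are hypothetical.

```python
import numpy as np


def mulliken_charges(P, S, Z, basis_atom):
    """Mulliken atomic charges q_A = Z_A - sum_{mu on A} (PS)_{mu,mu}.

    P: density matrix (n_basis x n_basis), including occupation factors
    S: overlap matrix (n_basis x n_basis)
    Z: nuclear charges of the atoms
    basis_atom: index of the atom on which each basis function is centred
    """
    PS_diag = np.diag(P @ S)                       # gross orbital populations
    populations = np.zeros(len(Z))
    np.add.at(populations, basis_atom, PS_diag)    # sum populations per atom
    return np.asarray(Z, dtype=float) - populations


# Hypothetical minimal-basis H2: one 1s function on each H, overlap s
s = 0.66
S = np.array([[1.0, s], [s, 1.0]])
# Doubly occupied bonding MO c*(phi1 + phi2) with c = 1/sqrt(2(1 + s)),
# so P = 2 c c^T = ones / (1 + s)
P = np.ones((2, 2)) / (1.0 + s)
q = mulliken_charges(P, S, Z=[1, 1], basis_atom=[0, 1])
print(q)  # ~[0, 0]: equal populations for a homonuclear molecule
```

For a heteronuclear molecule the same function gives non-zero charges. Mulliken charges are known to depend on the basis set, which is one reason why charges from the different levels of theory above need not give equally good models.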
Table 1. Structures of the phenolic derivatives with their observed and predicted/calculated antioxidant property data.

| Parent moiety | Sl. no. | Substituent | (−d[O2]/dt)inh (×10^-6 M^-1 s^-1) [25,37,38] | Property, log[10^3/(−d[O2]/dt)] | Property^a predicted/calculated | Property^b predicted/calculated | Property^c predicted/calculated | Property^d predicted/calculated |
|---|---|---|---|---|---|---|---|---|
| OH, R | 1^e | –OCH3 | 25 | 1.602 | 1.883 | 1.884 | 1.806 | 1.838 |
| | 2 | –OCH2CH3 | 25 | 1.602 | 1.846 | 1.832 | 1.804 | 1.844 |
| | 3 | –OCH(CH3)2 | 5.6 | 2.252 | 2.086 | 2.067 | 1.818 | 1.905 |
| OH, R | 4 | –OCH3 | 1.6 | 2.796 | 2.913 | 3.052 | 3.030 | 2.918 |
| | 5 | –OCH2CH3 | 0.8 | 3.097 | 2.968 | 3.087 | 3.041 | 2.924 |
| | 6 | –OCH(CH3)2 | 1.6 | 2.796 | 2.730 | 2.779 | 2.598 | 2.652 |
| | 7 | –OC(CH3)3 | 3.3 | 2.481 | 2.736 | 2.779 | 2.652 | 2.666 |
| OH, R, OCH3 | 8 | –OCH3 | 4.0 | 2.398 | 2.337 | 2.316 | 2.473 | 2.304 |
| | 9^e | –OCH2CH3 | 3.2 | 2.495 | 2.350 | 2.329 | 2.491 | 2.309 |
| | 10^e | –OCH(CH3)2 | 4.5 | 2.347 | 2.307 | 2.309 | 2.384 | 2.358 |
| | 11^e | –CH3 | 8.4 | 2.076 | 2.052 | 2.044 | 2.064 | 1.966 |
| | 12 | –CH2CH3 | 9.4 | 2.027 | 2.049 | 2.042 | 2.030 | 1.921 |
| | 13 | –CH(CH3)2 | 9.2 | 2.036 | 2.044 | 2.037 | 2.019 | 1.872 |
| | 14 | –C(CH3)3 | 10.1 | 1.996 | 2.068 | 2.062 | 2.097 | 1.923 |
| | 15 | –CH2CH=CH2 | 9.5 | 2.022 | 2.015 | 2.013 | 1.981 | 1.884 |
| | 16 | –CH=CHCH3 | 6.4 | 2.194 | 1.982 | 1.987 | 2.041 | 1.886 |
| | 17 | –CH2OH | 10.1 | 1.996 | 1.998 | 1.990 | 1.946 | 1.940 |
| | 18 | –CH=CH2 | 8.9 | 2.051 | 1.919 | 1.930 | 1.975 | 1.830 |
| | 19^e | –Cl | 9.9 | 2.004 | 1.682 | 1.681 | 1.862 | 2.054 |
| | 20^e | –CH=CHCOOCH3 | 11.4 | 1.943 | 1.788 | 1.837 | 1.974 | 1.710 |
| | 21 | –COCH3 | 25 | 1.602 | 1.632 | 1.682 | 1.566 | 1.576 |
| | 22^e | –COOCH3 | 25 | 1.602 | 1.629 | 1.676 | 1.632 | 1.672 |
| | 23 | –COOC2H5 | 25 | 1.602 | 1.637 | 1.682 | 1.635 | 1.668 |
| | 24 | –CN | 25 | 1.602 | 1.577 | 1.636 | 1.735 | 1.880 |
| | 25^e | –CHO | 25 | 1.602 | 1.610 | 1.677 | 1.455 | 1.546 |
| | 26 | –NO2 | 25 | 1.602 | 1.581 | 1.670 | 1.421 | 1.858 |
| OH, R | 27 | –N(CH3)2 | 5.5 | 2.260 | 2.288 | 2.145 | 2.081 | 2.186 |
| | 28 | –NH2 | 6.2 | 2.208 | 2.102 | 2.032 | 1.943 | 1.966 |
| | 29^e | –CH3 | 8.8 | 2.056 | 2.016 | 2.042 | 2.168 | 2.390 |
| | 30^e | –CH2CH3 | 5.5 | 2.260 | 2.030 | 2.051 | 2.164 | 2.358 |
| | 31 | –CH(CH3)2 | 5.4 | 2.268 | 2.046 | 2.053 | 2.167 | 2.313 |
| | 32^e | –C(CH3)3 | 6.1 | 2.215 | 2.104 | 2.086 | 2.192 | 2.296 |
| | 33 | –OCH3 | 25 | 1.602 | 1.707 | 1.670 | 1.769 | 1.621 |
| | 34^e | –OCH2CH3 | 25 | 1.602 | 1.718 | 1.687 | 1.801 | 1.667 |
| | 35 | –OCH(CH3)2 | 25 | 1.602 | 1.723 | 1.708 | 1.815 | 1.689 |
| | 36 | –OCH2C6H5 | 25 | 1.602 | 1.814 | 1.855 | 1.712 | 1.798 |
| | 37^e | –OC6H5 | 25 | 1.602 | 1.703 | 1.644 | 1.586 | 1.402 |
| | 38 | –COOCH3 | 25 | 1.602 | 1.834 | 1.841 | 1.878 | 2.009 |
| | 39 | –Cl | 25 | 1.602 | 1.641 | 1.672 | 1.505 | 1.396 |
| | 40^e | –NO2 | 25 | 1.602 | 1.651 | 1.633 | 1.315 | 1.326 |
| OH, R, OCH3 | 41 | –N(CH3)2 | 3.3 | 2.481 | 1.950 | 1.960 | 2.224 | 2.082 |
| | 42 | –NH2 | 4.6 | 2.337 | 1.962 | 1.952 | 2.115 | 2.013 |
| | 43 | –OCH2C6H5 | 8.5 | 2.071 | 1.887 | 1.929 | 2.226 | 2.207 |
| | 44 | –OCH3 | 8.6 | 2.066 | 2.013 | 2.049 | 2.088 | 2.059 |
| | 45 | –OCH2CH3 | 9.4 | 2.027 | 2.015 | 2.049 | 2.101 | 2.064 |
| | 46 | –OCH(CH3)2 | 8.6 | 2.066 | 2.021 | 2.052 | 2.131 | 2.073 |
| | 47 | –CH3 | 25 | 1.602 | 1.934 | 1.939 | 1.961 | 1.910 |
| | 48 | –C(CH3)3 | 25 | 1.602 | 1.900 | 1.893 | 2.041 | 1.885 |
| | 49^e | –H | 25 | 1.602 | 1.882 | 1.884 | 1.806 | 1.838 |
| | 50 | –CH2OH | 25 | 1.602 | 1.900 | 1.908 | 1.929 | 1.877 |
| | 51 | –CH=CHCOOCH3 | 25 | 1.602 | 1.761 | 1.726 | 1.863 | 1.577 |
| | 52^e | –COOCH3 | 25 | 1.602 | 1.695 | 1.663 | 1.755 | 1.667 |
| | 53 | –CHO | 25 | 1.602 | 1.676 | 1.654 | 1.600 | 1.544 |
| | 54 | –NO2 | 25 | 1.602 | 1.343 | 1.327 | 1.610 | 1.771 |
| OH, CH2 | 55 | | 2.2 | 2.658 | 2.544 | 2.507 | 2.647 | 2.684 |
| | 56^e | | 1.7 | 2.770 | 2.546 | 2.508 | 2.644 | 2.681 |
| | 57 | CH3 | 2.1 | 2.678 | 2.718 | 2.654 | 2.668 | 2.655 |
| | 58 | H3C | 2.0 | 2.699 | 2.538 | 2.502 | 2.640 | 2.681 |
| | 59 | OCH3 | 2.1 | 2.678 | 2.543 | 2.506 | 2.656 | 2.693 |
| | 60 | F | 1.9 | 2.721 | 2.760 | 2.690 | 2.668 | 2.644 |
| | 61 | F | 2.0 | 2.699 | 2.734 | 2.668 | 2.671 | 2.674 |
| | 62^e | Br | 1.5 | 2.824 | 2.760 | 2.693 | 2.650 | 2.616 |
| | 63 | Br | 3.1 | 2.509 | 2.568 | 2.525 | 2.628 | 2.657 |
| | 64 | Cl | 2.5 | 2.602 | 2.760 | 2.693 | 2.653 | 2.621 |

In the calculation of the BCP properties, seven types of descriptors (ρ, ∇²ρ, λ, ε, K, G and equilibrium bond length) were calculated for each of the bonds connecting adjacent common atoms. Details of the QTMS descriptors can be found in publications by Roy and Popelier [41,42]. In a nutshell, QTMS descriptors focus on BCPs, which occur where the gradient of the electron density vanishes (∇ρ = 0) at a point between two bonded nuclei. Between each pair of bonded atoms there exists a pathway of charge density called the bond path. The BCP is a saddle point on this path: the electron density is at a minimum along the bond path but at a maximum in the plane perpendicular to it. For each of the molecules sharing a common skeleton, properties are calculated at each BCP formed by the common atoms. At a BCP, the Hessian [43,44] of ρ has two negative eigenvalues (λ1 < λ2 < 0) and one positive eigenvalue (λ3 > 0). The eigenvalues express the local curvature of ρ at a point: the negative eigenvalues are the curvatures perpendicular to the bond, while the positive eigenvalue measures the curvature along the bond. If the positive eigenvalue λ3 dominates, electron density is accumulated along the bond path towards the nuclei. If, however, the negative eigenvalues dominate, electron density accumulation in the plane perpendicular to the bond path is prominent; this reflects a large charge build-up between the two bonded nuclei, which is reminiscent of covalent bonding.

The descriptor λ3 gives a measure of the σ character of a bond, while the degree of π character is measured by the sum λ1 + λ2 [45]. The Laplacian, denoted by ∇²ρ, is the sum of the eigenvalues and measures how much ρ is concentrated (∇²ρ < 0) or depleted (∇²ρ > 0) at a point. Another descriptor in this series is the ellipticity of a bond, defined as ε = (λ1/λ2) − 1, which also measures the degree of π character of a bond together with the susceptibility of ring bonds to rupture. The QTMS bond descriptor vector has two more components: the kinetic energy density K(r) and a more classical kinetic energy density G(r) [44]. Interpreting K(r) in chemical terms is not straightforward; however, useful formulas linking it to the Laplacian and to G(r) are available [44]. Additionally, the equilibrium bond length (Re) was used as a descriptor along with the other QTMS descriptors. BCP descriptors have been reported to translate the electronic effects predicted by orbital theories into observable consequences of variation in bond electron densities [46]. In particular, BCP properties detect conjugation, subtle delocalisation effects and hyperconjugation. Thus, each molecule is represented by just a handful of numbers, namely the components of the vectors describing the molecule's bonds. As a result, similarity measures are reduced to discrete distance-like measures in the BCP space without losing their quantum mechanical basis.
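To make the BCP quantities concrete, the sketch below derives a QTMS-style descriptor vector from the Hessian of ρ at a bond critical point. It computes the eigenvalues λ1 ≤ λ2 < 0 < λ3, the Laplacian ∇²ρ (the trace of the Hessian) and the ellipticity ε = λ1/λ2 − 1. It also obtains K from G through the standard QTAIM relation K = G − ¼∇²ρ (atomic units). The Hessians and the values of ρ, G and Re are hypothetical placeholders; in practice they would come from a topological analysis of the HF/6-31G(d) wave function. The Euclidean distance at the end is just one simple example of a distance-like measure in BCP space.

```python
import numpy as np


def bcp_descriptors(hessian, rho, G, bond_length):
    """QTMS-style descriptor vector for one bond critical point (atomic units).

    Returns [lambda1, lambda2, lambda3, ellipticity, rho, laplacian, K, G, Re].
    """
    # eigvalsh sorts eigenvalues in ascending order, so at a (3,-1)
    # critical point we get lambda1 <= lambda2 < 0 < lambda3
    lam1, lam2, lam3 = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    if not (lam2 < 0.0 < lam3):
        raise ValueError("Hessian does not describe a bond critical point")
    laplacian = lam1 + lam2 + lam3       # trace of the Hessian
    ellipticity = lam1 / lam2 - 1.0      # epsilon = lambda1/lambda2 - 1
    K = G - 0.25 * laplacian             # K(r) = G(r) - (1/4) Laplacian
    return np.array([lam1, lam2, lam3, ellipticity, rho, laplacian, K, G, bond_length])


# Hypothetical values for the "same" bond in two different molecules
bond_a = bcp_descriptors(np.diag([-0.70, -0.55, 0.40]), rho=0.31, G=0.10, bond_length=2.64)
bond_b = bcp_descriptors(np.diag([-0.68, -0.56, 0.42]), rho=0.30, G=0.11, bond_length=2.65)

# A simple distance-like similarity measure between the two bonds in BCP space
print(np.linalg.norm(bond_a - bond_b))
```

Because the components have very different magnitudes, they would usually be scaled before such distances are compared across a series of molecules.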
Table 1 (continued).

| Parent moiety | Sl. no. | Substituent | (−d[O2]/dt)inh (×10^-6 M^-1 s^-1) [25,37,38] | Property, log[10^3/(−d[O2]/dt)] | Property^a predicted/calculated | Property^b predicted/calculated | Property^c predicted/calculated | Property^d predicted/calculated |
|---|---|---|---|---|---|---|---|---|
| | 65^e | Cl | 2.1 | 2.678 | 2.561 | 2.520 | 2.624 | 2.658 |

Notes: ^a Property predicted/calculated according to Equation (6). ^b Property predicted/calculated according to Equation (7). ^c Property predicted/calculated according to Equation (8). ^d Property predicted/calculated according to Equation (9). ^e Test set compounds.

Figure 1. Parent phenolic moiety showing the common atoms.

2.3 Splitting of the data set

Any QSAR modelling should ultimately lead to statistically robust models capable of making reliable predictions of the activities of compounds. When QSAR/QSPR models are developed, it is important to validate each fitted model to check that its predictions carry over to fresh data not used in the model fitting exercise. Validation strategies check the reliability of the developed models for possible application to a new set of data, so that the confidence of prediction can be judged. Since truly external data points are often unavailable for prediction purposes, the original data set is commonly divided into training and test sets. This strategy was used in this work by splitting the data set of 65 compounds into a training set, used to develop the models, and a test set, used to check the goodness of the predictions from the derived models.

For the purpose of external validation, the data set was divided into a training set of 46 compounds (about 71% of the total number of compounds) and a test set of 19 compounds (about 29% of the whole set). This procedure is generally adopted when not enough new chemicals are available for examining the predictive ability and robustness of the developed model. In such cases, the training set is used for QSPR model development, while the test set compounds are used to establish the reliability of the developed model. The selection of the training set plays a key role in developing a statistically significant QSPR model, because the model captures the features of the training set molecules. A compound structurally similar to the training set molecules is predicted well, since it contains the features captured by the model. In contrast, a compound significantly dissimilar from the training set molecules suffers from poor prediction [47]. Thus, the split should be such that the training set compounds encompass the chemical features of the whole data set and span the entire descriptor space. To achieve a uniform distribution of molecules between the training and test sets, the training set for the present work was selected on the basis of activity ranking. The molecules were ranked in ascending order of their antioxidant property values, and every third compound, starting from the first-ranked compound, was assigned to the test set. Such activity-based ranking may ensure that each set contains molecules capturing all the different molecular features of the entire data set. To assess whether the activity-ranking method achieves such a similarity-based split, a principal component analysis (PCA) score plot was analysed, showing the distribution of the training and test set compounds in 3D space. The plot was obtained using the first three principal components of the QTMS descriptor matrix, calculated by the factor analysis approach in the SPSS software [48]. The plot (Figure 2) shows that each test set compound is located close to at least one training set compound in the 3D space, indicating that the test set lies within the chemical space of the modelling set. Thus, the data set has been appropriately divided from the standpoint of molecular similarity.
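The activity-ranking split and the PCA neighbourhood check can be sketched as follows. Several assumptions apply. The descriptor matrix and property vector are random placeholders rather than the data of Table 1. PCA is computed by singular value decomposition of standardised descriptors rather than by SPSS's factor-analysis routine. The helper names are hypothetical. Applied literally to 65 compounds, the every-third rule selects 22 test compounds rather than the 19 reported, so the published split cannot be reproduced from this rule alone.

```python
import numpy as np


def rank_split(y, step=3):
    """Activity-ranking split: sort by ascending property and send every
    `step`-th compound, starting with the first-ranked one, to the test set."""
    order = np.argsort(y, kind="stable")
    positions = np.arange(len(y))
    test = np.sort(order[positions % step == 0])
    train = np.sort(order[positions % step != 0])
    return train, test


def pca_scores(X, n_components=3):
    """PCA scores of standardised descriptors via SVD."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


def nearest_train_distance(scores, train, test):
    """Distance from each test compound to its closest training compound."""
    diff = scores[test][:, None, :] - scores[train][None, :, :]
    return np.linalg.norm(diff, axis=2).min(axis=1)


rng = np.random.default_rng(0)
y = rng.normal(size=65)          # placeholder property values
X = rng.normal(size=(65, 20))    # placeholder QTMS descriptor matrix

train, test = rank_split(y)
scores = pca_scores(X)
d = nearest_train_distance(scores, train, test)
print(len(train), len(test))     # 43 22
print(d.max())                   # largest test-to-training gap in PC space
```

A small maximum distance relative to the spread of the training scores corresponds to the visual check of Figure 2, where each test compound lies near at least one training compound.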
2.4 Chemometric tools utilised for the present work
For the development of the QSPR models, two different chemometric tools were employed, namely the genetic function approximation (GFA) method and the genetic partial least squares (G/PLS) technique. Both techniques were used to develop models with each of the two sets of descriptors, viz. Mulliken charges and QTMS descriptors.

The GFA [49,50] technique combines two different algorithms: (i) Holland's genetic algorithm and (ii) Friedman's multivariate adaptive regression splines algorithm. In this technique, an initial population of equations is generated by random selection of descriptors; random crossover between pairs of equations from this population then produces new progeny equations. A parameter referred to as the 'lack of fit' (LOF) measures the fitness of each model, and the models are ranked according to this value: models of higher significance exhibit lower LOF values.
$$\mathrm{LOF}=\frac{\mathrm{LSE}}{\left(1-\dfrac{c+d\times p}{m}\right)^{2}}, \qquad (1)$$
where LSE is the least-squares error, c is the number of basis functions, d is the smoothing parameter (set at its default value of 1), p is the number of descriptors and m is the number of observations in the training set. In effect, d is the user's estimate of how much detail in the training data set is worth modelling; larger values of d yield smaller equations. The large number of equations generated by this technique, together with the variation introduced during crossover, provides additional information on the quality of fit and the importance of the descriptors. GFA builds models not only with linear polynomials but also with higher-order polynomials, splines and other nonlinear functions.
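The LOF criterion of Equation (1), and the evolutionary search it drives, can be illustrated with a deliberately simplified sketch. The sketch rests on several assumptions. LSE is taken as the mean squared residual, and only linear basis functions are used, so c = p = the number of descriptors in a model. Every model has a fixed number of terms, and new models arise by crossover only. Real GFA implementations also vary model length, apply mutation and allow spline or polynomial terms. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def lof(lse, c, p, m, d=1.0):
    """Lack of fit, Equation (1): LSE / (1 - (c + d*p)/m)**2."""
    denom = 1.0 - (c + d * p) / m
    if denom <= 0.0:
        raise ValueError("too many terms for the number of observations")
    return lse / denom ** 2


def model_lof(X, y, cols, d=1.0):
    """Least-squares fit on the chosen descriptor columns, scored by LOF.
    Assumes LSE = mean squared residual and linear terms (c = p = len(cols))."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    lse = np.mean((y - A @ coef) ** 2)
    return lof(lse, len(cols), len(cols), len(y), d)


def genetic_selection(X, y, n_terms=3, pop_size=20, generations=150, seed=0):
    """Crossover-only genetic search over fixed-size descriptor subsets."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]
    score = lambda cols: model_lof(X, y, cols)
    # Initial population: random descriptor subsets ("equations")
    pop = [sorted(rng.choice(n_desc, n_terms, replace=False).tolist())
           for _ in range(pop_size)]
    for _ in range(generations):
        i, j = rng.choice(pop_size, 2, replace=False)
        # Crossover: the progeny draws its terms from both parents
        pool = np.union1d(pop[i], pop[j])
        child = sorted(rng.choice(pool, n_terms, replace=False).tolist())
        worst = max(range(pop_size), key=lambda k: score(pop[k]))
        # The progeny replaces the worst equation if its LOF is lower
        if child not in pop and score(child) < score(pop[worst]):
            pop[worst] = child
    return min(pop, key=score)  # lowest LOF = fittest model


rng = np.random.default_rng(42)
X = rng.normal(size=(46, 10))   # placeholder descriptors, 46 training compounds
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + X[:, 7] + 0.05 * rng.normal(size=46)
best = genetic_selection(X, y)
print(best, model_lof(X, y, best))
```

For a fixed LSE, the LOF grows as terms are added. The denominator therefore acts as the penalty that keeps the selected equations small, and larger d strengthens this penalty.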
Figure 2. PCA score plot of the first three components (PC_1, PC_2, PC_3) for the QTMS descriptor matrix, distinguishing training set and test set compounds.

Table 2. Descriptors used for the present work.

| Category of descriptor | Descriptors used |
|---|---|
| Mulliken charges | C1, C2, C3, C4, C5, C6, O7, H8 (Mulliken charges on the eight common atoms of the phenolic derivatives) |
| Quantum topological molecular similarity indices (QTMS) | λ1, λ2, λ3, ε, ρ, ∇²ρ, K, G, distance (all the descriptors were calculated for each of the bonds connecting the common atoms) |

The G/PLS [50,51] method combines two methods: GFA and PLS regression. The GFA technique is employed to select the appropriate basis functions, while PLS regression serves as the fitting technique that weighs the relative contributions of the basis functions in the final QSPR model. PLS regression allows numerous, highly correlated and noisy variables to be used in building the models. In this method, latent variables (LVs), which are linear combinations of the original variables, are generated; this enables larger QSPR models to be built while reducing the risk of overfitting the data.
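The role of the latent variables can be illustrated with a minimal PLS1 regression written with the NIPALS algorithm. This is a generic sketch, not the G/PLS implementation used in this work. Descriptor selection by GFA is omitted, the data are noise-free placeholders and `pls1_fit` is a hypothetical helper. With as many LVs as linearly independent descriptors, PLS1 generally reproduces the ordinary least-squares fit; it is the use of fewer LVs that restrains overfitting.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """Minimal PLS1 regression (NIPALS); returns a prediction function."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean      # centred data, deflated after each LV
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w /= np.linalg.norm(w)         # X weights
        t = E @ w                      # scores of the latent variable
        tt = t @ t
        p_load = E.T @ t / tt          # X loadings
        q_load = (f @ t) / tt          # y loading
        E = E - np.outer(t, p_load)    # remove what this LV explains
        f = f - q_load * t
        W.append(w)
        P.append(p_load)
        q.append(q_load)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Coefficients for the centred descriptors: B = W (P'W)^-1 q
    B = W @ np.linalg.solve(P.T @ W, q)
    return lambda X_new: (np.asarray(X_new, dtype=float) - x_mean) @ B + y_mean


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))                       # placeholder descriptors
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.5     # noise-free toy property
predict_2lv = pls1_fit(X, y, n_lv=2)
predict_4lv = pls1_fit(X, y, n_lv=4)
for name, predict in (("2 LVs", predict_2lv), ("4 LVs", predict_4lv)):
    rss = np.sum((y - predict(X)) ** 2)
    print(name, "residual sum of squares:", rss)
```

In a real application, the number of LVs would be chosen by cross-validation rather than by the training-set residuals printed here.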
2.5 Statistical analysis and model validation
The quality of the developed QSPR models was judged on the basis of several statistical parameters. The fitness of each model was determined using the following metrics: (i) the determination coefficient (R²), (ii) the explained variance (adjusted R², R²a), (iii) the standard error of estimate (s) and (iv) the variance ratio (F) at the specified degrees of freedom (df) [52]. For GFA models, the LOF score is used as an additional measure of model fitness. However, these fitting statistics do not necessarily reflect the predictive power of a model, particularly when the model is used to predict the activity of new, untested molecules. Adding more descriptors may increase the value of R², but such an increase does not necessarily mean an improvement in the predictive ability of the QSPR model. Further validation is therefore needed to analyse the predictive potential of the models. Accordingly, the internal and external predictive abilities of the models were determined using internal and external validation techniques, respectively. These validation strategies check the reliability of the developed models for application to a new set of data, and thus allow the confidence of prediction to be judged [53,54].
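The four fitting statistics listed above can be computed from the observed and fitted responses as in the following sketch, which assumes an equation with an intercept and p descriptors (or latent variables). The function names are illustrative. Applied to the reported R² = 0.846 with ntraining = 46 and four descriptors, the helpers give R²a ≈ 0.831 and F ≈ 56.3 (df 4, 41), consistent with the values reported for Equations (6) and (8).

```python
import numpy as np


def adjusted_r2(r2, n, p):
    """Adjusted R^2 (R^2_a) for n observations and p descriptors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_ratio(r2, n, p):
    """Variance ratio F with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def fit_statistics(y_obs, y_fit, p):
    """Return R^2, R^2_a, s and F for a fitted regression equation."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    n = y_obs.size
    rss = np.sum((y_obs - y_fit) ** 2)            # residual sum of squares
    tss = np.sum((y_obs - y_obs.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - rss / tss
    s = np.sqrt(rss / (n - p - 1))                # standard error of estimate
    return r2, adjusted_r2(r2, n, p), s, f_ratio(r2, n, p)


print(round(adjusted_r2(0.846, 46, 4), 3), round(f_ratio(0.846, 46, 4), 2))
```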
Leave-one-out cross-validation (LOO-CV) is a practical and reliable method of internal model validation [55]. In LOO-CV, one compound at a time is held out of the training set, the model is rebuilt on the remaining compounds, and the held-out compound is then predicted by this model and compared with its observed value. The procedure is repeated until every observation has been held out once and only once. Two parameters [56,57] are calculated from this procedure, viz. the predicted residual sum of squares (PRESS) and the cross-validated R² (LOO-Q²), which serve as criteria of both the robustness and the predictive ability of the model. A Q² value above 0.5 is generally considered acceptable, and the higher the value of Q², the better the predictivity of the model.
$$Q^2 = 1 - \frac{\sum \left(Y_{obs(train)} - Y_{pred(train)}\right)^2}{\sum \left(Y_{obs(train)} - \bar{Y}_{training}\right)^2} \qquad (2)$$
where Yobs(train) is the observed activity, Ypred(train) is the LOO-predicted activity and Ȳtraining is the mean observed activity of the training set compounds.
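The LOO procedure and Equation (2) can be sketched as follows. For simplicity the sketch refits an ordinary multiple linear regression with NumPy at each step; this is an assumption made for illustration, since the GFA and G/PLS models in this work were built with dedicated software. The function name is hypothetical.

```python
import numpy as np


def loo_press_q2(X, y):
    """Leave-one-out PRESS and Q^2 (Equation (2)) for an MLR model."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    A = np.column_stack([np.ones(n), X])      # intercept + descriptors
    y_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i              # hold out compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        y_loo[i] = A[i] @ coef                # predict the held-out compound
    press = np.sum((y - y_loo) ** 2)
    q2 = 1.0 - press / np.sum((y - y.mean()) ** 2)
    return press, q2, y_loo
```

The LOO-predicted values (`y_loo`) are returned as well, because they are also the inputs needed for metrics computed on the training set from cross-validated predictions.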
External validation is an important tool for the proper selection of QSPR models. Because sufficient new chemicals were not available for prediction in this work, external validation was performed on a subset of the original data set (the test set) that was not used in developing the QSPR model. The antioxidant properties of the test set molecules were predicted using the QSPR model developed with the training set, and were then compared with the observed antioxidant property data. The external validation yields another parameter, the predictive R² (R²pred) [53], defined by the following equation:
$$R^2_{pred} = 1 - \frac{\sum \left(Y_{obs(test)} - Y_{pred(test)}\right)^2}{\sum \left(Y_{obs(test)} - \bar{Y}_{training}\right)^2} \qquad (3)$$
In the above equation, Yobs(test) and Ypred(test) are the observed and predicted antioxidant property data, respectively, of the test set compounds. A value of R²pred greater than the stipulated value of 0.5 indicates efficient prediction of the antioxidant property of the test set molecules by the developed model. Other parameters calculated to judge the external predictive potential of the developed QSPR models include the metrics proposed by Golbraikh and Tropsha [53]. These metrics are based on the requirement that, for an ideal QSPR/QSAR model, the correlation coefficient (r) between the observed [Yobs(test)] and predicted [Ypred(test)] activities of the test set compounds should be close to 1. Golbraikh and Tropsha showed that at least one of the squared correlation coefficients of the two regression lines through the origin, Yobs(test) against Ypred(test) (r²0) and Ypred(test) against Yobs(test) (r′²0), should be close to r² for an ideal QSPR model. Here, r² and r²0 are the squared correlation coefficients between the observed and predicted activity values with and without the intercept, respectively, while r′²0 provides the same information as r²0 but with the axes inverted. In addition, for an ideal QSPR model, at least one of the regressions through the origin (observed against predicted, or predicted against observed activity) should have a slope, k or k′ respectively, close to 1.
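Equation (3) can be sketched as below; note that the denominator uses the mean of the training set responses, not that of the test set. The two example calls use hypothetical values: with identical prediction errors, R²pred is higher when the test compounds lie farther from the training mean, which is why the choice of training set can dominate this statistic.

```python
import numpy as np


def r2_pred(y_obs_test, y_pred_test, y_train):
    """Predictive R^2 (Equation (3)) for an external test set."""
    y_obs_test = np.asarray(y_obs_test, dtype=float)
    y_pred_test = np.asarray(y_pred_test, dtype=float)
    y_bar_train = np.mean(y_train)                      # training-set mean
    press_ext = np.sum((y_obs_test - y_pred_test) ** 2)
    ss_test = np.sum((y_obs_test - y_bar_train) ** 2)
    return 1.0 - press_ext / ss_test


# Hypothetical data: same test predictions, two different training sets
y_test, y_hat = [1.0, 2.0, 3.0], [1.5, 2.0, 2.5]
print(r2_pred(y_test, y_hat, y_train=[1.0, 3.0]))    # training mean 2.0
print(r2_pred(y_test, y_hat, y_train=[-1.0, 1.0]))   # training mean 0.0
```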
Thus, according to Golbraikh and Tropsha [53], models satisfying the following conditions are considered acceptable:
(i) Q² > 0.5;
(ii) r² > 0.6;
(iii) r²0 or r′²0 close to r², i.e. (r² − r²0)/r² < 0.1 or (r² − r′²0)/r² < 0.1; and
(iv) 0.85 ≤ k ≤ 1.15 or 0.85 ≤ k′ ≤ 1.15.
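The four conditions can be checked with a short function such as the one below. It is a sketch that assumes the regressions through the origin are defined as k = Σ(Yobs·Ypred)/ΣYpred² (observed against predicted) and k′ = Σ(Yobs·Ypred)/ΣYobs² (predicted against observed), with r²0 and r′²0 computed from the residuals about those lines. Q² must be supplied from internal validation; the function name is hypothetical.

```python
import numpy as np


def golbraikh_tropsha(y_obs, y_pred, q2):
    """Check the Golbraikh-Tropsha acceptance criteria for a test set."""
    y = np.asarray(y_obs, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y, yp)[0, 1] ** 2
    # Regression through the origin: observed against predicted (slope k)
    k = np.sum(y * yp) / np.sum(yp ** 2)
    r2_0 = 1.0 - np.sum((y - k * yp) ** 2) / np.sum((y - y.mean()) ** 2)
    # Regression through the origin: predicted against observed (slope k')
    k_prime = np.sum(y * yp) / np.sum(y ** 2)
    r2_0_prime = 1.0 - np.sum((yp - k_prime * y) ** 2) / np.sum((yp - yp.mean()) ** 2)
    checks = {
        "(i) Q2 > 0.5": q2 > 0.5,
        "(ii) r2 > 0.6": r2 > 0.6,
        "(iii) r2_0 or r'2_0 close to r2":
            (r2 - r2_0) / r2 < 0.1 or (r2 - r2_0_prime) / r2 < 0.1,
        "(iv) k or k' within [0.85, 1.15]":
            0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15,
    }
    return {"r2": r2, "r2_0": r2_0, "r2_0_prime": r2_0_prime,
            "k": k, "k_prime": k_prime,
            "passed": all(checks.values()), "checks": checks}
```

A prediction that is perfectly correlated with the observations but systematically scaled (for example, every prediction doubled) still fails condition (iv), which is exactly the kind of model that r² alone would not flag.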
These stringent external validation criteria help to rule out chance correlation, particularly in the case of models with a large number of descriptors. Moreover,
as Equation (3) shows, the value of R²pred depends strongly on its denominator: as Σ(Yobs(test) − Ȳtraining)² increases, the predictive R² also improves. The dependence of the denominator on Ȳtraining further means that the choice of training set largely governs the value of R²pred, which therefore may not truly reflect the predictive capability of the model for the test set (or a new data set). The squared correlation coefficients between the observed and predicted values of the test set compounds with the intercept (r²) and without the intercept (r²0) may therefore be calculated to assess the predictive performance of the developed QSPR model. However, a high squared correlation coefficient (r²) between the observed and predicted values of the test set compounds does not necessarily indicate that the predicted values are close to the observed property data: despite a good overall correlation between them, there may be a considerable numerical difference between the observed and predicted property data. Thus, to obviate these problems and to better gauge the external predictive capacity of a model, the modified r² metrics (r²m), with a threshold value of 0.5, were calculated [47,54,58,59].
$$r^2_m = r^2 \times \left(1 - \sqrt{r^2 - r^2_0}\right) \qquad (4)$$
For any QSPR model, the value of r² is always greater than or equal to that of r²0. Thus, in the case of good prediction, where the predicted property values lie close to the observed data, r² will be very near to r²0. Consequently, an ideal prediction of the property is characterised by a value of r²m equal to that of r². Because the r²m metrics depend solely on the observed and predicted property data of the molecules under analysis, they provide a more reliable assessment of QSPR model predictivity. The r²m(LOO) and r²m(test) parameters measure how closely the predicted property data match the observed data for the training and test sets, respectively. In addition, the r²m(overall) parameter [58,59] was calculated, which assesses the overall predictivity of the model on the basis of the predicted property values of the whole data set (both training and test sets). The r²m(overall) statistic may be used to select the best predictive model from among comparable models. The r²m parameter has been used by different groups of authors to check the external predictability of QSAR models [60,61].
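A sketch of Equation (4) follows. It assumes that r²0 comes from the regression of observed on predicted values through the origin, and, for r²m(overall), that LOO predictions are used for the training compounds and model predictions for the test compounds. The data and the function name are hypothetical.

```python
import numpy as np


def r2m(y_obs, y_pred):
    """Modified r^2 metric, r2m = r2 * (1 - sqrt(r2 - r2_0)) (Equation (4))."""
    y = np.asarray(y_obs, dtype=float)
    yp = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y, yp)[0, 1] ** 2                 # with intercept
    k = np.sum(y * yp) / np.sum(yp ** 2)               # slope through origin
    r2_0 = 1.0 - np.sum((y - k * yp) ** 2) / np.sum((y - y.mean()) ** 2)
    # r2 >= r2_0 holds exactly; clip tiny negative round-off before sqrt
    return r2 * (1.0 - np.sqrt(max(r2 - r2_0, 0.0)))


# Hypothetical data: perfect correlation but a constant offset of +1
y_obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(r2m(y_obs, y_obs))          # ideal prediction
print(r2m(y_obs, y_obs + 1.0))    # r2 = 1, yet r2m is penalised

# r2m(overall): pool LOO predictions (training) with test predictions, e.g.
# r2m(np.concatenate([y_train, y_test]), np.concatenate([y_loo, y_test_pred]))
```

The offset example illustrates the argument above: r² is 1 because the two series are perfectly correlated, but the predictions are numerically far from the observations, and r²m drops accordingly.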
2.6 Validation by randomisation
The randomisation technique provides a more robust means of further validating a QSPR model. It checks whether the developed model is robust and reproducible or merely the outcome of chance. The method involves permuting the response parameter (Y-column) while keeping the descriptor matrix (X-columns) unchanged, and then developing QSPR models using the shuffled property values. For a robust QSPR model, the average correlation coefficient of the randomised models (Rr) should be clearly lower than the correlation coefficient (R) of the original model. Two types of randomisation were performed: process randomisation and model randomisation. Process randomisation uses the total data set, scrambling the property data against the entire descriptor matrix, in order to assess the reliability of the model building process employed in this work. Model randomisation, in contrast, is based only on the descriptors appearing in the corresponding QSPR model, so as to judge the robustness of the model obtained. To quantify the difference between Rr and R, another parameter, cR²p, was calculated, which penalises the model R² for small differences between R² and R²r [62].
$$^{c}R^2_p = R \times \sqrt{R^2 - R^2_r} \qquad (5)$$
In an ideal case, the average value of R² for the randomised models (R²r) should be zero. In such a case, the value of cR²p would equal that of R² for the developed QSAR model. Thus, models with an acceptable value of this parameter (> 0.5) are considered sufficiently robust and not obtained merely by chance.
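A minimal sketch of model randomisation and Equation (5) is shown below. An ordinary least-squares MLR model stands in for the GFA and G/PLS models, the number of runs and the random seed are arbitrary choices, and the data are synthetic; all names are hypothetical.

```python
import numpy as np


def y_randomisation(X, y, n_runs=100, seed=0):
    """Refit an MLR model on shuffled responses; return R^2, mean R^2_r, cR^2_p."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(y.size), X])   # descriptor matrix kept fixed

    def r2_of(target):
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        return 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)

    r2 = r2_of(y)
    r2_rand = np.mean([r2_of(rng.permutation(y)) for _ in range(n_runs)])
    # Equation (5): cR2p = R * sqrt(R^2 - R^2_r), with R = sqrt(R^2)
    c_r2p = np.sqrt(r2) * np.sqrt(max(r2 - r2_rand, 0.0))
    return r2, r2_rand, c_r2p


rng = np.random.default_rng(42)
X_demo = rng.normal(size=(40, 2))                       # synthetic descriptors
y_demo = X_demo @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=40)
print(y_randomisation(X_demo, y_demo))
```

Since R²r is non-negative, cR²p can never exceed R²; it approaches R² only when the scrambled models explain essentially nothing.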
2.7 Applicability domain
The applicability domain (AD) [63,64] is a theoretical region in the physico-chemical, structural or biological space defined by the model descriptors and the modelled response. A QSAR/QSPR model can be used reliably to predict the activity of only those chemicals that lie within its domain. The AD is used to estimate the uncertainty in the prediction for a particular molecule, based on how similar it is to the compounds used to build the QSAR/QSPR model. For a compound that is highly dissimilar to all the modelling compounds, a reliable prediction of its activity is unlikely. The concept of AD [65] thus helps to avoid unjustified extrapolation in activity prediction. In this work, the AD of the phenolic derivatives was analysed using two different methods: (i) the leverage approach [64] for the GFA models and (ii) distance to model (DModX) [51] calculations for the G/PLS models.
In case of the leverage approach, a leverage value (h) is
calculated for each of the test set molecules and is plotted
against the corresponding standardised residual for
obtaining a plot referred to as the Williams plot. For a particular model, the value of h for each molecule should be lower than the critical leverage value, h* = 3 × p′/n, where p′ is the number of model variables plus 1 and n is the number of compounds used to calculate the model. For a molecule with h greater than h*, activity prediction by the respective QSAR/QSPR model should be considered unreliable. Moreover, compounds with standardised residuals greater than three standard deviation (SD) units (> 3 SD) are response outliers. The leverage values were calculated using the SPSS software [48].
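The leverage calculation behind the Williams plot can be sketched as follows (the work itself used SPSS). The sketch assumes the usual hat-matrix definition, h = xᵀ(XᵀX)⁻¹x, with an intercept column added to the training descriptor matrix so that p′ equals the number of descriptors plus 1. Function and variable names are illustrative.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverages of query compounds and the warning leverage h* = 3 p'/n."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    n = X_train.shape[0]
    A = np.column_stack([np.ones(n), X_train])                 # training matrix
    Q = np.column_stack([np.ones(X_query.shape[0]), X_query])  # query compounds
    xtx_inv = np.linalg.inv(A.T @ A)
    h = np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)   # diagonal of Q (X'X)^-1 Q'
    p_prime = A.shape[1]                          # model variables + 1
    h_star = 3.0 * p_prime / n
    return h, h_star


X = np.arange(10.0).reshape(-1, 1)                # one hypothetical descriptor
h_train, h_star = leverages(X, X)
h_far, _ = leverages(X, [[50.0]])                 # far outside the training range
print(h_train.sum(), h_star, h_far)
```

For the training compounds the leverages sum to p′ (the trace of the hat matrix), so h* = 3p′/n is three times the average training leverage.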
In the DModX approach to the AD, the residuals of Y and X are used as diagnostic values for assessing the quality of the model [51]. The residual SD of the X-residuals in the corresponding row of the residual matrix E is proportional to the distance between the data point and the model plane in X-space, often called the distance to the model in X-space (DModX). Here, X is the matrix of predictor variables, of size N × K [where N is the number of objects (cases, observations) and k is the index of the X-variables (k = 1, 2, ..., K)], Y is the matrix of response variables, of size N × M [where m is the index of the Y-variables (m = 1, 2, ..., M)], and E is the N × K matrix of X-residuals. A DModX value larger than about 2.5 times the overall SD of the X-residuals (corresponding to an F-value of 6.25) indicates that the observation lies outside the AD of the model [51]. The DModX values for the G/PLS models were calculated using the SIMCA software [66].
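A simplified DModX calculation is sketched below. It is an approximation and not SIMCA's implementation: a principal component model of the autoscaled training descriptors (obtained by SVD) stands in for the X-part of the PLS model, SIMCA's small-sample correction factors are omitted, and the cut-off of 2.5 is the one quoted above. All names and the synthetic data are hypothetical.

```python
import numpy as np


def dmodx(X_train, X_query, n_comp):
    """Normalised distance to the model in X-space (simplified DModX)."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std                       # autoscaled training X
    N, K = Z.shape
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    P = vt[:n_comp].T                                # K x A loading matrix

    def residuals(M):
        return M - (M @ P) @ P.T                     # E = X - T P^T, T = X P

    E_train = residuals(Z)
    s0 = np.sqrt(np.sum(E_train ** 2) / ((N - n_comp - 1) * (K - n_comp)))
    E_query = residuals((X_query - mean) / std)
    s_i = np.sqrt(np.sum(E_query ** 2, axis=1) / (K - n_comp))
    return s_i / s0                                  # in units of pooled SD


rng = np.random.default_rng(1)
P_true = np.array([[1.0, 0.5, 0.0, 1.0, 0.3],
                   [0.0, 1.0, 1.0, -0.5, 0.8]])
X_tr = rng.normal(size=(30, 2)) @ P_true + 0.05 * rng.normal(size=(30, 5))
x_in = np.array([[0.5, -0.3]]) @ P_true                 # lies on the model plane
x_out = x_in + np.array([[0.0, 0.0, 3.0, 0.0, 0.0]])    # pushed off the plane
d = dmodx(X_tr, np.vstack([x_in, x_out]), n_comp=2)
print(d, d > 2.5)                                       # True flags outside the AD
```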
3. Results and discussion
Several QSPR models were developed in this work based on the two sets of descriptors: Mulliken charges and QTMS descriptors. Two chemometric tools, GFA and G/PLS, were employed to develop the models. The best models thus developed are summarised in Table 3. Models developed from charge and QTMS descriptors calculated at the HF/6-31G(d) level of theory were superior to models developed from the corresponding descriptors calculated at the other two levels, and hence only the former are reported here. The GFA models were developed using 5000 iterations, considering both linear and spline options. The resulting models are nonlinear; the spline terms are expressed as truncated power splines and denoted with angular brackets. For example, <f(x) − a> is equal to zero if the value of f(x) − a is negative; otherwise, it is equal to f(x) − a. The
constant ‘a’ is called the knot of the spline. G/PLS was
performed with 1000 iterations, scaled variables and with
the option of no fixed length of equation. The maximum
number of components or LVs fixed for variable selection
was 3. These components are the functions of the original
descriptors and encode the data represented by these descriptors. Among all the QSPR models developed, the best two models for each set of descriptors are detailed below. The antioxidant property values calculated/predicted with the discussed equations are shown in Table 1.
Table 3. Statistical qualities of four different QSPR models developed in this work.
Statistical tool | Model no. | Eq. no. | Descriptors | LVs | ntraining | s | R² | R²a | F | PRESS | Q² | r²m(LOO) | ntest | R²pred | r²m(test) | r²m(overall)
Using Mulliken charges on the common atoms as descriptors
GFA spline | 1 | 6 | <C2 − 0.339472>, C4, <O7 + 0.782956>, <0.457283 − H8> | – | 46 | 0.184 | 0.846 | 0.831 | 56.39 | 1.682 | 0.813 | 0.648 | 19 | 0.867 | 0.778 | 0.697
G/PLS spline | 2 | 7 | <C2 − 0.346613>, C3, O7, <C4 + 0.267875>, <0.458961 − H8> | 2 | 46 | 0.186 | 0.835 | 0.827 | 108.59 | 1.797 | 0.801 | 0.787 | 19 | 0.861 | 0.750 | 0.802
Using QTMS descriptors
GFA linear | 3 | 8 | λ2_0107; λ3_0403; ε0405; dist0405 | – | 46 | 0.184 | 0.846 | 0.831 | 56.31 | 1.704 | 0.811 | 0.645 | 19 | 0.904 | 0.882 | 0.698
G/PLS spline | 4 | 9 | <λ2_0107 + 0.67042>; <0.26754 − λ3_0302>; λ3_0403; ε0405 | 2 | 46 | 0.191 | 0.826 | 0.818 | 102.39 | 1.855 | 0.794 | 0.779 | 19 | 0.853 | 0.800 | 0.786
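The truncated power spline notation used in Table 3 and in the equations of this section can be expressed as two small helper functions; this is a sketch and the names are illustrative. Using the C2 charges quoted in Section 3.1 for compound nos. 2 (0.378243) and 57 (0.023671), only compound no. 2 gives a non-zero <C2 − 0.339472> term.

```python
import numpy as np


def spline_plus(x, knot):
    """<x - knot>: zero when x - knot is negative, otherwise x - knot."""
    return np.maximum(np.asarray(x, dtype=float) - knot, 0.0)


def spline_minus(x, knot):
    """<knot - x>: zero when knot - x is negative, otherwise knot - x."""
    return np.maximum(knot - np.asarray(x, dtype=float), 0.0)


c2 = np.array([0.378243, 0.023671])        # compound nos. 2 and 57
print(spline_plus(c2, 0.339472))           # <C2 - 0.339472>
# A term such as <O7 + 0.782956> is spline_plus(O7, -0.782956)
```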
3.1 Models developed using the charge descriptors
The models have been developed based on the charge
descriptors calculated using the Hartree–Fock method at
the HF/6-31G(d) level of theory. The best two models
developed using the GFA and G/PLS techniques,
respectively, are reported below.
$$\log\left[10^{3}\left(-\frac{d[\mathrm{O_2}]}{dt}\right)\right] = 2.72 - 29.2(\pm 3.983)\langle O7 + 0.782956\rangle - 8.34(\pm 1.402)\langle C2 - 0.339472\rangle + 0.807(\pm 0.144)\,C4 + 204(\pm 57.58)\langle 0.457283 - H8\rangle \qquad (6)$$
ntraining = 46; s = 0.184; R² = 0.846; R²a = 0.831; F = 56.39 (df 4, 41); PRESS = 1.682; Q² = 0.813; r²m(LOO) = 0.648; ntest = 19; R²pred = 0.867; r²m(test) = 0.778; r²m(overall) = 0.697.
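To show how Equation (6) is applied, the sketch below evaluates it for one set of charges. The coefficients are those of Equation (6) as written above (standard errors omitted); the input charges are hypothetical and only illustrate how each spline term switches on and off around its knot. The function name is illustrative.

```python
def eq6_log_rate(o7, c2, c4, h8):
    """Evaluate Equation (6): predicted log[10^3 (-d[O2]/dt)]."""
    def plus(v):                       # truncated power spline <v>
        return max(v, 0.0)

    return (2.72
            - 29.2 * plus(o7 + 0.782956)
            - 8.34 * plus(c2 - 0.339472)
            + 0.807 * c4
            + 204.0 * plus(0.457283 - h8))


# Hypothetical charges: O7 below its knot, C2 above its knot, H8 above its knot
print(eq6_log_rate(o7=-0.80, c2=0.378243, c4=0.0, h8=0.46))
```

With these inputs only the C2 spline is active, so the prediction is lowered from the intercept by 8.34 × 0.038771.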
Equation (6) was developed using the GFA technique of descriptor selection. In the above equation, ntraining and ntest refer to the numbers of compounds in the training and test sets, respectively. The values of Q² (0.813) and R²pred (0.867), well above the stipulated value of 0.5, signify the statistical significance of the developed model. In addition, acceptable values of the r²m metrics show that the antioxidant property values predicted using Equation (6) are close to the corresponding observed data. The model may thus be used satisfactorily for property prediction of new molecules of this class. The descriptors
appearing in Equation (6) may be ranked in the following order of importance: (i) <O7 + 0.782956>, (ii) <C2 − 0.339472>, (iii) C4 and (iv) <0.457283 − H8>. The O7 descriptor refers to the charge on the hydroxyl oxygen atom. The negative coefficient of the <O7 + 0.782956> term signifies that any O7 value more negative than −0.782956 makes the spline term zero, thereby removing its negative contribution and favouring a higher property value for these molecules. Such an increase in the negative charge on the hydroxyl oxygen is achieved by substituting the phenolic nucleus with groups having a positive mesomeric effect as well as groups having a positive inductive effect (electron-donor groups). The increased antioxidant property of compound nos. 58, 60, 61 and 64 may be attributed to the positive inductive effects exerted by the alkylphenyl substituents on C2 and the alkyl substituents on C4 and C6, while the positive mesomeric effect of the substituents in compound nos. 4 and 5 accounts for their increased negative charge on O7. Likewise, the negative coefficient of the spline term containing the C2 descriptor (the charge on C2) indicates that C2 values below the knot make the term zero, and hence enhance the antioxidant property. The C2 descriptor takes both positive and negative values, and any value less than the knot of the spline (0.339472) is conducive to the antioxidant property of these phenolic derivatives. For example, comparing compound nos. 2 and 57, compound no. 2, with a large positive value (0.378243) for the C2 descriptor, shows a lower antioxidant property than compound no. 57, with a C2 value of 0.023671. This observation may be explained by the inductive effect of the substituents at the C2 positions of the two compounds. In compound no. 57, the alkylphenyl substituent exerts a positive inductive effect, increasing the electron density on C2 and hence decreasing its positive charge. In contrast, compound no. 2, which has an electronegative atom bonded to C2 (–OCH2CH3), experiences a negative inductive effect, and consequently the positive charge on C2 increases.
A positive coefficient of the C4 descriptor signifies that the antioxidant property of the phenolic derivatives is favoured by an increase in the positive charge on C4. A desirable charge on C4 may therefore be achieved by substituting this position with groups having a negative inductive effect (substituents with electronegative atoms). A comparison of compound no. 8 with compound no. 12 reveals that compound no. 8, with a methoxy substituent at C4, exhibits a higher antioxidant property owing to the negative inductive effect exerted by the substituent. In contrast, compound no. 12, bearing an alkyl substituent at C4, experiences a positive inductive effect, which increases the electron density on C4 and hence decreases its positive charge, leading to a lower antioxidant property. The positive coefficient of the spline term containing the H8 descriptor implies that the antioxidant property of these molecules is favoured when the positive charge on H8 is less than 0.457283. Compound nos. 5–7, 27, 28 and 31, which have positive values for the <0.457283 − H8> term, exhibit maximum to moderate antioxidant property. It may thus be inferred that phenolic derivatives with increased electron density on the phenolic oxygen show an improved antioxidant property, since in such cases the hydrogen ion becomes
easily available for interaction with the nocive free radicals.
$$\log\left[10^{3}\left(-\frac{d[\mathrm{O_2}]}{dt}\right)\right] = -17.048 - 9.882\langle C2 - 0.346613\rangle + 0.868\langle C4 + 0.267875\rangle - 0.169 \times C3 - 24.810 \times O7 + 101.937\langle 0.458961 - H8\rangle \qquad (7)$$
ntraining = 46; s = 0.186; R² = 0.835; R²a = 0.827; F = 108.59 (df 2, 43); PRESS = 1.797; Q² = 0.801; r²m(LOO) = 0.787; ntest = 19; R²pred = 0.861; r²m(test) = 0.750; r²m(overall) = 0.802.
Equation (7) was developed using the G/PLS technique for variable selection and model development. Although Equations (6) and (7) are equally acceptable on the basis of Q² (0.801 for Equation (7)) and R²pred (0.861), the notably larger value of r²m(overall) for Equation (7) (0.802 versus 0.697) indicates its better predictive ability. Thus, based on overall predictive performance, Equation (7) may be used more efficiently for antioxidant property prediction of new series of phenolic derivatives. The variables appearing in Equation (7) closely resemble those of Equation (6). A variable importance plot (VIP) (Figure 3) for the descriptors appearing in Equation (7) shows that they can be ranked in the following order of significance: (i) O7, (ii) <C4 + 0.267875>, (iii) <C2 − 0.346613>, (iv) <0.458961 − H8> and (v) C3. Moreover, the spline term for the C4 descriptor in Equation (7) provides an optimum value of the charge on C4 required for significant antioxidant property. Since the knot of this spline term corresponds to C4 = −0.267875 and the term has a positive coefficient, the antioxidant property improves as the charge on C4 rises above −0.267875, i.e. as C4 becomes less negative or more positive. Such a condition is accomplished by substituting C4 with functional groups having a positive mesomeric effect and a negative inductive effect. Additionally,
the C3 descriptor, representing the charge on C3, has a negative coefficient, signifying that the property profile of the molecules is favoured by an increase in the negative charge (or a decrease in the positive charge) on C3. The larger positive charge on C3 of compound no. 33 accounts for its reduced antioxidant property compared with compound no. 31. This observation may be explained by the inductive effect of the substituents at C3. In compound no. 31, the alkyl substituent exerts a positive inductive effect on the phenolic nucleus, increasing the electron density on C3 and hence decreasing its positive charge. In contrast, compound no. 33, with an electronegative atom attached at C3 (–OCH3), experiences a negative inductive effect, in which the electron cloud is pulled away from the phenolic nucleus towards the substituent, increasing the positive charge on C3. All the remaining descriptors in Equation (7), as explained above, signify the importance of substitution on the parent moiety for attaining the desired antioxidant property. The mesomeric and inductive effects of the substituents play the key role in achieving the desirable charge density on the respective atoms. The charge density over the ring should thus be such that the electron cloud on the hydroxyl oxygen is stabilised and the proton becomes easily available for free radical neutralisation.
In order to view the impact of electron-donor
substituents on the electronic delocalisation of the
phenolic oxygen, the electrostatic potential surface area
was determined for the most active compound (compound
no. 5) and one of the least active compounds (compound
no. 26) bearing a ring deactivating substituent (ZNO2).
Energy minimisation of both the compounds was
performed at the AM1 level using the MOPAC tool
present in the Chem 3D software [67]. Figures 4 and 5
show the wire mesh structures for the molecular
electrostatic potential surfaces of compound nos. 5 and
26 respectively, where the blue mesh indicates negative
charge, while the red mesh implies positive charge. It is
clear from the figures that the electron density over the
phenolic oxygen is much more in compound no. 5
compared to that in compound no. 26. In case of
compound no. 26, the electron cloud is shifted towards the
deactivating nitro group, thus reducing the electron density
over the hydroxyl oxygen. Thus, in compound no. 5, the
electron-donor group (a methoxy substituent) increases the
electron density over the hydroxyl oxygen and thus
reduces the availability of the lone pair of electrons for
aromatic delocalisation.
Figure 3. VIP plot for the descriptors appearing in model 2 (Equation (7)) (y-axis: variable importance; variables: O7, C3, <C4 + 0.267875>, <C2 − 0.346613>, <0.458961 − H8>).
3.2 Models developed using the QTMS descriptors
These models were developed from the BCP descriptors calculated at the HF/6-31G(d) level of theory using two different chemometric tools (GFA and G/PLS). The descriptors appearing in these equations therefore describe the electronic properties of different bonds of the phenolic nucleus that are necessary for maximum antioxidant property.
$$\log\left[10^{3}\left(-\frac{d[\mathrm{O_2}]}{dt}\right)\right] = -31.31 + 12.134(\pm 1.015)\,\lambda_{2\_0107} + 11.860(\pm 2.907) \times \lambda_{3\_0403} + 8.108(\pm 1.316) \times \varepsilon_{0405} + 13.708(\pm 4.692) \times \mathrm{dist}_{0405} \qquad (8)$$
ntraining = 46; s = 0.184; R² = 0.846; R²a = 0.831; F = 56.31 (df 4, 41); PRESS = 1.704; Q² = 0.811; r²m(LOO) = 0.645; ntest = 19; R²pred = 0.904; r²m(test) = 0.882; r²m(overall) = 0.698.
The above model was developed using the GFA technique of descriptor selection and model development. Acceptable values of Q² (0.811) and R²pred (0.904) for Equation (8) reflect the predictive potential of the model; the large value of R²pred in particular indicates a good ability to predict the property of new molecules of this class. Statistically significant values of all the r²m metrics further indicate that the predicted property values of all the compounds are close to the corresponding observed data. Although in Equation (8) R²pred (0.904) is larger than the training set R² (0.846), this does not significantly affect the model predictivity, because the r²m(test) value (0.882) is consistent with the high R²pred and differs little from R². The R²pred parameter may overlook a large difference between the observed and predicted activity data of the test set molecules when the activity range is wide, but the r²m(test) metric directly measures the proximity between them. Since r²m(test) effectively denotes the external predictive ability of the model and is close to the model R², it may be inferred that the above model performs well in terms of both internal and external predictivity. Based on regression analysis performed with the standardised values of the descriptors, the descriptors may be ranked as follows: (i) λ2_0107, (ii) ε0405, (iii) λ3_0403 and (iv) dist0405.
Figure 4. Molecular electrostatic potential surface for compound no. 5.
Figure 5. Molecular electrostatic potential surface for compound no. 26.
In the above equation, the positive coefficient of the λ2_0107 descriptor, whose values are negative, implies that the activity of these molecules improves with a decrease in the π character of the C1–O7 bond. Such a decrease in the π character of the C1–O7 bond occurs in the presence of electron-donor substituents on the parent phenolic moiety, which make the lone pair of electrons on the phenolic oxygen less available for aromatic delocalisation. In compound nos. 19 and 26, the presence of deactivating functional groups such as –NO2 and –Cl increases the probability of aromatic delocalisation of the lone pair of electrons of the phenolic oxygen and hence reduces the antioxidant property of these compounds. In contrast, compound no. 8, which has a methoxy (–OCH3) substituent, shows a reduction in the π character of the C1–O7 bond and hence an improved antioxidant property, since an electron-donor substituent or activating group reduces the π bond character of the C1–O7 bond. Again, a positive
coefficient of the λ3_0403 descriptor indicates that the antioxidant property of these molecules improves with an increase in the σ character of the C3–C4 bond. This in turn implies that a reduction in the π electron cloud over the benzene nucleus is conducive to an improved property profile of these molecules. The presence of electron-donor substituents on the phenolic moiety decreases the tendency of the oxygen atom to participate in the delocalised resonance structures of benzene and thus decreases the extent of delocalisation of the aromatic π electron cloud. Compound no. 8 shows a better antioxidant property profile than compound no. 26, which is attributed to the electron-donor methoxy group (–OCH3) in compound no. 8 and the electron-withdrawing nitro group (–NO2) in compound no. 26. Similarly, comparing compound nos. 28 and 39, the ring-activating substituents in compound no. 28 account for its higher property value relative to compound no. 39. The ε0405 descriptor (ε = λ1/λ2 − 1) refers to the ellipticity of the C4–C5 bond, and its positive coefficient in the above equation signifies that the antioxidant property of these molecules increases with an increase in ε0405. Moreover, such an increase in ε0405 is accomplished through a decrease in the π character of the C4–C5 bond. The ε0405 descriptor thus again demonstrates that electron-donor substituents are necessary for improved antioxidant property of these molecules. Compound nos. 4–7 and 41, with electron-donor substituents at various positions of the phenolic nucleus, have higher values of ε0405 and hence exhibit increased antioxidant property. Likewise, the positive coefficient of the dist0405 descriptor indicates that the antioxidant property of these molecules improves with an increase in the length of the C4–C5 bond. The dist0405 descriptor in Equation (8) also implies that a decrease in the π character of the C4–C5 bond increases the bond length and hence improves the antioxidant property of these molecules. Compound nos. 4–6, which show the highest antioxidant property values, bear electron-donor substituents on the phenolic moiety; the hydroxyl oxygen therefore retains a more localised lone pair of electrons, making the proton readily available for interaction with neighbouring toxic free radicals.
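The BCP quantities behind the λ and ε descriptors can be illustrated with the sketch below. It assumes the standard QTAIM convention that λ1 ≤ λ2 < 0 < λ3 are the eigenvalues of the Hessian of the electron density at the bond critical point, so that ∇²ρ = λ1 + λ2 + λ3 and ε = λ1/λ2 − 1. The Hessian used here is a hypothetical input; in practice it would come from the wavefunction analysis used to compute the QTMS descriptors. The function name is illustrative.

```python
import numpy as np


def bcp_descriptors(hessian):
    """Curvatures, Laplacian and ellipticity at a bond critical point."""
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
    lam1, lam2, lam3 = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    return {
        "lambda1": lam1,                       # most negative curvature
        "lambda2": lam2,
        "lambda3": lam3,                       # positive curvature along the bond
        "laplacian": lam1 + lam2 + lam3,       # nabla^2 rho
        "ellipticity": lam1 / lam2 - 1.0,      # epsilon = lambda1/lambda2 - 1
    }


# Hypothetical Hessian of the electron density at a BCP (atomic units)
print(bcp_descriptors(np.diag([-0.70, -0.50, 0.30])))
```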
$$\log\left[10^{3}\left(-\frac{d[\mathrm{O_2}]}{dt}\right)\right] = -2.999 + 8.763\langle \lambda_{2\_0107} + 0.67042\rangle - 12.621\langle 0.26754 - \lambda_{3\_0302}\rangle + 12.207 \times \lambda_{3\_0403} + 6.562 \times \varepsilon_{0405} \qquad (9)$$
ntraining = 46; s = 0.191; R² = 0.826; R²a = 0.818; F = 102.39 (df 2, 43); PRESS = 1.855; Q² = 0.794; r²m(LOO) = 0.779; ntest = 19; R²pred = 0.853; r²m(test) = 0.800; r²m(overall) = 0.786.
Equation (9) was developed with the G/PLS technique, in which the GFA tool was used for descriptor selection and the PLS technique for model construction. This model exhibits statistically acceptable internal (Q² = 0.794) and external (R²pred = 0.853) validation parameters. Additionally, the model has high values of the r²m metrics, implying that the predicted property values of these molecules are close to the observed data. According to the VIP plot (Figure 6), the descriptors in Equation (9) can be ranked in the following order of significance: (i) ⟨λ2_0107 + 0.67042⟩, (ii) ⟨0.26754 − λ3_0302⟩, (iii) 10405 and (iv) λ3_0403. QTMS descriptors have been claimed [41,42] to have diagnostic potential for revealing the molecular fragments that contribute to the response property. The present work shows the importance of the C–O bond [41] (and its π character) for the antioxidant property of phenols. Most of the descriptors appearing in Equation (8) reappear in Equation (9),
signifying the importance of these few descriptors for the antioxidant property profile of these molecules. Moreover, the spline term for the λ2_0107 descriptor provides a cut-off range of this descriptor for maximum impact on the antioxidant property profile of these molecules. Since a spline term contributes zero whenever the expression inside it is negative, reduced (i.e. more negative) values of the λ2_0107 descriptor below the knot of the spline (λ2_0107 = −0.67042) account for an increased activity profile of this series of phenolic derivatives, as in the case of compound nos. 4, 5 and 58–61. Thus, the increased activity profile of these molecules may be attributed to the presence of electron-donor substituents at various positions of the phenolic nucleus. Again, the negative coefficient of the spline term containing the λ3_0302 descriptor signifies that values of this descriptor above the optimum value of 0.26754 make the corresponding spline term contribute zero and hence improve the property profile of these molecules. Such an increase in the value of the λ3_0302 descriptor is achieved by substituting the parent moiety with electron-donor substituents, which diminishes the π character of the C3–C2 bond and increases the electron density over the phenolic oxygen. A comparison of compound nos. 28 and 39 reveals that compound no. 28, which has electron-donating substituents, exhibits a better antioxidant property than compound no. 39, which bears a deactivating group (–Cl). A similar phenomenon is observed for compound nos. 42 and 54, where compound no. 42, bearing an amino (–NH2) substituent, exerts an increased antioxidant property. The two remaining descriptors (10405 and λ3_0403) in Equation (9) were also present in Equation (8) and imply that electron-donor or ring-activating substituents on the parent phenolic moiety exert a positive influence on the antioxidant property of these molecules.

Figure 6. VIP plot for the descriptors appearing in model 4 (Equation (9)). [Variable importance (y-axis) of the variables ⟨λ2_0107 + 0.67042⟩, ⟨0.26754 − λ3_0302⟩, 10405 and λ3_0403 (x-axis).]
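The spline terms of Equation (9) are truncated linear functions, ⟨x⟩ = x when x > 0 and 0 otherwise, so each term switches off on one side of its knot. The short Python sketch below evaluates only the two spline contributions with the coefficients of Equation (9); the descriptor values passed in are hypothetical inputs chosen to show when each term switches off, not data from this study.

```python
def spline(x: float) -> float:
    """Truncated linear spline <x>: returns x if x is positive, otherwise 0."""
    return x if x > 0.0 else 0.0


def spline_contributions(lam2_0107: float, lam3_0302: float) -> tuple[float, float]:
    """Return the contributions of the two spline terms of Equation (9).

    term1 = 8.763 * <lam2_0107 + 0.67042>    (zero when lam2_0107 <= -0.67042)
    term2 = -12.621 * <0.26754 - lam3_0302>  (zero when lam3_0302 >= 0.26754)
    """
    term1 = 8.763 * spline(lam2_0107 + 0.67042)
    # Written as a subtraction so that a switched-off term is +0.0, not -0.0.
    term2 = 0.0 - 12.621 * spline(0.26754 - lam3_0302)
    return term1, term2


if __name__ == "__main__":
    # Hypothetical descriptor values, for illustration only.
    for lam2, lam3 in [(-0.70, 0.30), (-0.60, 0.25)]:
        t1, t2 = spline_contributions(lam2, lam3)
        print(f"lam2_0107={lam2:+.2f}, lam3_0302={lam3:.2f} -> {t1:+.4f}, {t2:+.4f}")
```

The first input pair lies beyond both knots, so both terms contribute zero; in the second pair both expressions inside the brackets are positive and the terms are active.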
Besides these, the external validation parameters proposed by Golbraikh and Tropsha [53] for all the developed QSPR models are summarised in Table 4. For all four QSPR models, the values of r0² and r′0² are close to the value of r², while those of k and k′ lie within the desired range of 0.85–1.15. Moreover, the values of (r² − r0²)/r² are well below the stipulated threshold of 0.1 for all the models, and the same holds for (r² − r′0²)/r² except for model 2, where it is marginally above the threshold (0.119). Since the criterion requires only one of these two ratios to fall below 0.1, all four models satisfy it, which again indicates the efficient predictive potential of the QSPR models developed in this work.
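The quantities in Table 4 can be computed from the observed and predicted test set values. The Python sketch below follows one common reading of Golbraikh and Tropsha [53]: k and r0² come from the regression of observed on predicted values through the origin, and k′ and r′0² from predicted on observed. Papers differ on which fit is primed, so that assignment is an assumption here; the function names are hypothetical. The pass/fail helper encodes the rule that at least one slope lies in 0.85–1.15 and at least one of the two ratios is below 0.1.

```python
def golbraikh_tropsha(y_obs: list[float], y_pred: list[float]) -> dict[str, float]:
    """Squared correlation and regression-through-origin statistics.

    Convention assumed here: k, r0_2 come from observed = k * predicted;
    k_prime, r0p_2 come from predicted = k_prime * observed.
    """
    n = len(y_obs)
    if n != len(y_pred) or n < 3:
        raise ValueError("need at least three paired values")
    mean_o = sum(y_obs) / n
    mean_p = sum(y_pred) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(y_obs, y_pred))
    s_oo = sum((o - mean_o) ** 2 for o in y_obs)
    s_pp = sum((p - mean_p) ** 2 for p in y_pred)
    r2 = s_op ** 2 / (s_oo * s_pp)  # squared Pearson correlation

    cross = sum(o * p for o, p in zip(y_obs, y_pred))
    k = cross / sum(p * p for p in y_pred)        # slope, observed on predicted
    k_prime = cross / sum(o * o for o in y_obs)   # slope, predicted on observed
    # Coefficients of determination of the two fits through the origin
    r0_2 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred)) / s_oo
    r0p_2 = 1.0 - sum((p - k_prime * o) ** 2 for o, p in zip(y_obs, y_pred)) / s_pp
    return {"r2": r2, "r0_2": r0_2, "r0p_2": r0p_2, "k": k, "k_prime": k_prime}


def passes_golbraikh_tropsha(r2: float, r0_2: float, r0p_2: float,
                             k: float, k_prime: float) -> bool:
    """At least one slope in [0.85, 1.15] and at least one ratio below 0.1."""
    slope_ok = 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15
    ratio_ok = (r2 - r0_2) / r2 < 0.1 or (r2 - r0p_2) / r2 < 0.1
    return slope_ok and ratio_ok


if __name__ == "__main__":
    # Model 2 row of Table 4: r2, r0^2, r'0^2, k, k'
    print(passes_golbraikh_tropsha(0.893, 0.867, 0.787, 1.022, 0.973))
```

For the model 2 row of Table 4 the helper returns True, because (r² − r0²)/r² ≈ 0.03 satisfies the ratio condition even though (r² − r′0²)/r² does not.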
3.3 Randomisation test employed for further model validation
The robustness of the developed models was checked with the randomisation technique. Both process and model randomisation tests were performed in this work, at the 90 and 99% confidence levels, respectively. Model randomisation was performed to ensure the reliability and predictive ability of the developed QSPR models, while process randomisation ascertains the acceptability of the process used to develop them. The results of both types of randomisation test, summarised in Table 5, show that the average randomised correlation coefficients (Rr) of all the models are much lower than the original correlation coefficients (R) of the corresponding QSPR models. Thus, based on their randomised correlation coefficients, all the models are robust and predictive. Moreover, cR²p values were calculated for all the presented models [62]. The cR²p metric penalises the model R² when the difference between R² and the randomised R²r is small. Since the values of cR²p are well above the threshold value of 0.5, all the models may be considered robust and not the outcome of mere chance. Of the two types of QSPR model developed, the GFA models (models 1 and 3) show highly acceptable results for both process and model randomisation tests, whereas the G/PLS models (models 2 and 4), developed using the two sets of descriptors, exhibit the highest values of cR²p for model randomisation.
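The values in Table 5 are consistent with cR²p = R × √(R² − Rr²), where Rr² is the square of the average randomised R; for example, model 1 under process randomisation gives 0.920 × √(0.846 − 0.131) ≈ 0.778. The Python sketch below assumes that formula and illustrates model randomisation for a single-descriptor least-squares model (descriptor fixed, responses shuffled). Process randomisation, which would repeat the descriptor selection on every scrambled response, is not reproduced, and the function names are illustrative only.

```python
import math
import random


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def model_randomisation(x: list[float], y: list[float],
                        trials: int = 100, seed: int = 0) -> float:
    """Average R of a one-descriptor linear model refitted to scrambled responses.

    The descriptor x stays fixed and a copy of y is reshuffled on every trial.
    For a single-descriptor least-squares fit, R equals |Pearson r|.
    """
    rng = random.Random(seed)
    y_shuffled = list(y)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(y_shuffled)
        total += abs(pearson_r(x, y_shuffled))
    return total / trials


def c_r2p(r2: float, rr2: float) -> float:
    """cR2p = R * sqrt(R^2 - Rr^2), with Rr^2 the squared average randomised R."""
    if rr2 > r2:
        raise ValueError("randomised Rr^2 exceeds the model R^2")
    return math.sqrt(r2) * math.sqrt(r2 - rr2)


if __name__ == "__main__":
    # Table 5, model 1, process randomisation: R^2 = 0.846, Rr^2 = 0.131
    print(round(c_r2p(0.846, 0.131), 3))
```

Running the example prints 0.778, the tabulated value; the other rows of Table 5 agree to within rounding of R.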
Table 4. Validation using external validation parameters of Golbraikh and Tropsha [53].

Model no.          r²     r0²    r′0²   k      k′     (r² − r0²)/r²   (r² − r′0²)/r²
1 (Equation (6))   0.884  0.870  0.808  1.018  0.977  0.016           0.086
2 (Equation (7))   0.893  0.867  0.787  1.022  0.973  0.030           0.119
3 (Equation (8))   0.901  0.900  0.885  1.004  0.992  0.002           0.018
4 (Equation (9))   0.836  0.830  0.828  1.016  0.977  0.008           0.010
Table 5. Results obtained from the randomisation tests.

Model no.          R²     R      Rr     Rr²    cR²p
Process randomisation
1 (Equation (6))   0.846  0.920  0.362  0.131  0.778
2 (Equation (7))   0.835  0.914  0.540  0.292  0.674
3 (Equation (8))   0.846  0.920  0.401  0.161  0.761
4 (Equation (9))   0.826  0.909  0.610  0.372  0.612
Model randomisation
1 (Equation (6))   0.846  0.920  0.281  0.079  0.806
2 (Equation (7))   0.835  0.914  0.065  0.004  0.833
3 (Equation (8))   0.846  0.920  0.270  0.073  0.809
4 (Equation (9))   0.826  0.909  0.079  0.006  0.823
3.4 Test for AD
The leverage approach was used to check whether the test set molecules predicted by the two GFA models lie within the AD. Figure 7 shows plots of standardised residuals vs. leverage values (Williams plots) [64] of the test set compounds for models 1 and 3. The critical leverage value for both models is 0.789, and since the leverages of all 19 test set compounds lie below it, all of them are within the AD of the models (i.e. there is no structurally influential chemical). Moreover, the standardised residuals of all 19 molecules are
located within the range of ±3σ, indicating that none of the compounds behaves as an outlier. The AD of the two G/PLS models was checked using the DModX approach [51]. Figure 8 shows a bar diagram of the DModX values of all the test set compounds for models 2 and 4. For model 2, the DModX values of all the test compounds are below the critical value of 2.502 at the 99% confidence level, so none of the compounds lies outside the AD, and the predictions for all 19 test compounds are acceptable. Similarly, for model 4, the DModX values of all 19 test compounds are below the critical value of 2.843 at the 99% confidence level. These results indicate that all the test set compounds are within the AD and are reliably predicted.

Figure 7. Williams plots (standardised residual vs. leverage) for (a) model 1 and (b) model 3.

Figure 8. DModX values of the 19 test set compounds at the 99% level for (a) model 2 and (b) model 4, with the thick horizontal lines signifying the critical DModX values (D-Crit, 0.01).
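Leverage checks of the kind used for models 1 and 3 are commonly computed as h_i = x_iᵀ(XᵀX)⁻¹x_i for each query compound, with a warning leverage h* = 3p′/n (p′ = number of model parameters including the intercept, n = number of training compounds), combined with a ±3 limit on standardised residuals. The pure-Python sketch below assumes that convention and an intercept column; the helper names are hypothetical, and the software and exact choice of n and p′ behind the reported critical value are not stated here.

```python
def _invert(matrix: list[list[float]]) -> list[list[float]]:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leverages(x_train: list[list[float]], x_query: list[list[float]]) -> list[float]:
    """Leverage h = x^T (X^T X)^-1 x of each query row w.r.t. the training matrix.

    Rows are descriptor vectors; an intercept column of ones is prepended.
    """
    X = [[1.0] + list(row) for row in x_train]
    m = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    inv = _invert(xtx)
    result = []
    for row in x_query:
        v = [1.0] + list(row)
        result.append(sum(v[i] * inv[i][j] * v[j] for i in range(m) for j in range(m)))
    return result


def critical_leverage(n_descriptors: int, n_train: int) -> float:
    """Warning leverage h* = 3 p' / n with p' = descriptors + 1 (intercept)."""
    return 3.0 * (n_descriptors + 1) / n_train


def in_domain(h: float, std_residual: float, h_star: float) -> bool:
    """Inside the Williams-plot domain: h <= h* and |standardised residual| <= 3."""
    return h <= h_star and abs(std_residual) <= 3.0
```

For a toy training set with descriptor values 0, 1 and 2, `leverages([[0.0], [1.0], [2.0]], [[1.0], [0.0]])` gives 1/3 and 5/6, and `critical_leverage(1, 3)` gives 2.0; a compound at the centre of the training data has the lowest leverage.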
4. Overview and conclusions
This work deals with the QSPR analysis of a series of 65
phenolic derivatives having an antioxidant property. The
variation in antioxidant property of these compounds is
due to a variation in the position and type of substitutions
on the parent moiety. The QSPR models thus developed
provide an outline regarding the type and position of
substitutions for the molecules to exert optimum
antioxidant property. Quantum chemical descriptors
belonging to two classes (Mulliken charges and QTMS)
were calculated at the Hartree–Fock HF/6-31G(d) level
for this work. Owing to the lack of additional data on the antioxidant properties of phenolic derivatives measured by the oxygen absorption method (involving autoxidation of styrene in the presence of the phenolic derivatives and their rates of peroxy radical trapping), true external validation of the developed QSPR models was not performed. Instead, the models were validated by splitting the entire data set into training and test sets and calculating several stringent validation metrics, which supports their predictive ability. Of the 65 compounds, 46 (about 71%) were used as the training set, while the remaining 19 (about 29%) were held out as the test set.
Models were built using the GFA and G/PLS techniques.
Based on the order of significance of the descriptors appearing in Equations (6) and (7), it can be inferred that the charge on O7 makes a significant contribution to the antioxidant property profile of the molecules. An increase in the electron density over O7, achieved through electron-donor groups substituted on the parent moiety, reduces the degree of aromatic delocalisation of the lone pair of electrons of the phenolic oxygen and facilitates interaction of the phenolic proton with the hazardous free radicals. Additionally, the charges on C2, C4 and H8 also influence the antioxidant property of the molecules to a considerable extent. Thus, substituents with a positive mesomeric effect favour the antioxidant property of these molecules. However, groups with a negative inductive effect (–OCH3) are favoured for substitution at C4, while groups with a positive inductive effect (alkyl groups) are favoured at C2. Again, the models developed with the QTMS descriptors indicate that a decrease in the π character of the C1–O7 bond is a key factor for an efficient antioxidant property of the phenolic derivatives. Besides this, reduced aromatic delocalisation of the lone pair of electrons of the phenolic oxygen over the phenolic nucleus also contributes to the antioxidant property profile of the phenolic derivatives.
Thus, the QTMS descriptors also suggest that substituents with a positive mesomeric effect (electron-donor groups such as methoxy) increase the electron density over the hydroxyl oxygen and enable a rapid reaction with the peroxy radicals. The conclusions drawn in this work agree with the qualitative observations made in previous papers [25,37,38]. Model performance in this work was judged by internal and external (test set) validation measures, and acceptable values of both the internal and external validation parameters (Q² and R²pred) were obtained for all four models reported. Besides these, the r²m metrics calculated for all the models imply that a good correlation is maintained between the observed and predicted property data and that the two are close to each other. Further validation of the developed QSPR models with the randomisation technique supports the reliability and robustness of the models. Finally, it may be concluded that the antioxidant property of these molecules can be maximised through suitable substitution on the parent phenolic moiety, as explained by the developed models. The QSPR models developed in this work may be useful for the further design and analysis of new molecules with an improved antioxidant property.
Acknowledgements
This research work was supported in the form of a major research project to K.R. and a senior research fellowship to I.M. by the Indian Council of Medical Research (ICMR), New Delhi.
References
[1] E. Koutsilieri, C. Scheller, E. Grunblatt, K. Nara, J. Li, and P. Riederer, Free radicals in Parkinson’s disease, J. Neurol. 249 (2002), pp. II/1–II/5.
[2] K.N. Prasad, W.C. Cole, and B. Kumar, Multiple antioxidants in the prevention and treatment of Parkinson’s disease, J. Am. Coll. Nutr. 18 (1999), pp. 413–423.
[3] S.L. Nuttall, M.J. Kendall, and U. Martin, Antioxidant therapy for the prevention of cardiovascular disease, Q. J. Med. 92 (1999), pp. 239–244.
[4] M. Dizdaroglu, P. Jaruga, M. Birincioglu, and H. Rodriguez, Free radical-induced damage to DNA: Mechanisms and measurement, Free Radic. Biol. Med. 32 (2002), pp. 1102–1115.
[5] T. Ozawa, in Understanding the Process of Aging, E. Cadenas and L. Packer, eds., Marcel Dekker, New York, 1999, pp. 265–292.
[6] V. Grishko, M. Solomon, G.L. Wilson, S.P. LeDoux, and M.N. Gillespie, Oxygen radical-induced mitochondrial DNA damage and repair in pulmonary vascular endothelial cell phenotypes, Am. J. Physiol. Lung Cell Mol. Physiol. 280 (2001), pp. L1300–L1308.
[7] M. Genestra, Oxyl radicals, redox-sensitive signalling cascades and antioxidants, Cell Signal. 19 (2007), pp. 1807–1809.
[8] R.C. Hubbard, F. Ogushi, G.A. Fells, A.M. Cantin, S. Jallat, M. Courtney, and R.G. Crystal, Oxidants spontaneously released by alveolar macrophages of cigarette smokers can inactivate the active site of alpha 1-antitrypsin, rendering it ineffective as an inhibitor of neutrophil elastase, J. Clin. Invest. 80 (1987), pp. 1289–1295.
[9] B.N. Ames, M.K. Shigenaga, and T.M. Hagen, Oxidants, antioxidants, and the degenerative diseases of aging, Proc. Natl Acad. Sci. USA 90 (1993), pp. 7915–7922.
[10] J.M.C. Gutteridge and B. Halliwell, Antioxidants in Nutrition, Health and Disease, Oxford University Press, Oxford, 1994.
[11] B. Halliwell, Antioxidants and human disease: A general introduction, Nutr. Rev. 55 (1997), pp. S44–S49.
[12] J.S. Wright, E.R. Johnson, and G.A. DiLabio, Predicting the activity of phenolic antioxidants: Theoretical method, analysis of substituent effects, and application to major families of antioxidants, J. Am. Chem. Soc. 123 (2001), pp. 1173–1183.
[13] A.P. Vafiadis and E.G. Bakalbassis, A DFT study on the deprotonation antioxidant . . . step of ortho-substituted phenolic cation radicals, Chem. Phys. 316 (2005), pp. 195–204.
[14] M. Musialik and G. Litwinienko, Scavenging of DPPH• radicals by vitamin E is accelerated by its partial ionization: The role of sequential proton loss electron transfer, Org. Lett. 7 (2005), pp. 4951–4954.
[15] J. Pokorny, Natural antioxidants for food use: A review, Trends Food Sci. Technol. 2 (1991), pp. 223–227.
[16] Z. Judith and C. John, The Cardiovascular Cure: How to Strengthen your Self-defense Against Heart Attack and Stroke, Broadway Books, New York, 2002.
[17] M. Serafini, J.A. Laranjinha, L.M. Almeida, and G. Maiani, Inhibition of human LDL lipid peroxidation by phenol-rich beverages and their impact on plasma total antioxidant capacity in humans, J. Nutr. Biochem. 11 (2000), pp. 585–590.
[18] M. Jang, L. Cai, G.O. Udeani, K.V. Slowing, C.F. Thomas, C.W.W. Beecher, H.H.S. Fong, N.R. Farnsworth, A.D. Kinghorn, R.G. Mehta, R.C. Moon, and J.M. Pezzuto, Cancer chemopreventive activity of resveratrol, a natural product derived from grapes, Science 275 (1997), pp. 218–220.
[19] O. Vieira, I. Escargueil-Blanc, O. Meilhac, J.P. Basile, J. Laranjinha, L. Almeida, R. Salvayre, and A. Negre-Salvayre, Effect of dietary phenolic compounds on apoptosis of human cultured endothelial cells induced by oxidized LDL, Br. J. Pharmacol. 123 (1998), pp. 565–573.
[20] P.J. Magalhaes, D.O. Carvalho, J.M. Cruz, L.F. Guido, and A.A. Barros, Fundamentals and health benefits of xanthohumol, a natural product derived from hops and beer, Nat. Prod. Commun. 4 (2009), pp. 591–610.
[21] E.N. Frankel, Lipid Oxidation, Oily Press, Bridgewater, UK, 2005.
[22] E.N. Frankel, Antioxidants in Food and Biology, Oily Press, Bridgewater, UK, 2007.
[23] I. Nagamine, H. Sakurai, H.T.T. Nguyen, M. Miyahara, J. Parkanyiova, Z. Reblova, and J. Pokorny, Vegetable oils and amino acids, Czech J. Food Sci. 22 (2004), pp. 155S–158S.
[24] J. Pokorny, in Natural Antioxidant Phenols, D. Boskou, ed., Synpost, Trivandrum, India, 2006.
[25] T. Kajiyama and Y. Ohkatsu, Effect of meta-substituents of phenolic antioxidants-proposal of secondary substituent effect, Polym. Degrad. Stab. 75 (2002), pp. 535–542.
[26] T. Vasileva, K. Stanulov, and S. Nenkova, Phenolic antioxidants for fuels, J. Univ. Chem. Technol. Metall. 43 (2008), pp. 65–68.
[27] P. Liu and W. Long, Current mathematical methods used in QSAR/QSPR studies, Int. J. Mol. Sci. 10 (2009), pp. 1978–1998.
[28] A.M. Helguera, R.D. Combes, M.P. Gonzalez, and M.N. Cordeiro, Applications of 2D descriptors in drug design: A DRAGON tale, Curr. Top. Med. Chem. 8 (2008), pp. 1628–1655.
[29] M. Zhao, Z. Li, Y. Wu, Y.R. Tang, C. Wang, Z. Zhang, and S. Peng, Studies on log P, retention time and QSAR of 2-substituted phenylnitronyl nitroxides as free radical scavengers, Eur. J. Med. Chem. 42 (2007), pp. 955–965.
[30] A.K. Calgarotto, S. Miotto, K.M. Honorio, A.B.F. Da Silva, S. Marangoni, J.L. Silva, M. Comar, Jr., K.M.T. Oliveirad, and S.L. Da Silva, A multivariate study on flavonoid compounds scavenging the peroxynitrite free radical, J. Mol. Struct.: Theochem. 808 (2007), pp. 25–33.
[31] A.I. Khlebnikov, I.A. Schepetkin, N.G. Domina, L.N. Kirpotina, and M.T. Quinn, Improved quantitative structure–activity relationship models to predict antioxidant activity of flavonoids in chemical, enzymatic, and cellular systems, Bioorg. Med. Chem. 15 (2007), pp. 1749–1770.
[32] S. Ray, C. Sengupta, and K. Roy, QSAR modeling of antiradical and antioxidant activities of flavonoids using electrotopological state atom (E-state) parameters, Cent. Eur. J. Chem. 5 (2007), pp. 1094–1113.
[33] S. Gupta, S. Matthew, P.M. Abreu, and J. Aires-de-Sousa, QSAR analysis of phenolic antioxidants using MOLMAP descriptors of local properties, Bioorg. Med. Chem. 14 (2006), pp. 1199–1206.
[34] I. Mitra, A. Saha, and K. Roy, QSAR modeling of antioxidant activities of hydroxybenzalacetones using quantum chemical, physicochemical and spatial descriptors, Chem. Biol. Drug Des. 73 (2009), pp. 526–536.
[35] I. Mitra, K. Roy, and A. Saha, QSAR of anti-lipid peroxidative activity of substituted benzodioxoles using chemometric tools, J. Comput. Chem. 30 (2009), pp. 2712–2722.
[36] I. Mitra, A. Saha, and K. Roy, Pharmacophore mapping of arylamino substituted benzo[b]thiophenes as free radical scavengers, J. Mol. Model. 16 (2010), pp. 1585–1596.
[37] T. Kajiyama and Y. Ohkatsu, Effect of para-substituents of phenolic antioxidants, Polym. Degrad. Stab. 71 (2001), pp. 445–452.
[38] T. Matsuura and Y. Ohkatsu, Phenolic antioxidants: Effect of o-benzyl substituents, Polym. Degrad. Stab. 70 (2000), pp. 59–63.
[39] GaussView 3.0, Semichem Inc., Gaussian Inc., Pittsburgh, PA, USA, 2003.
[40] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, J.A. Montgomery, Jr., T. Vreven, K.N. Kudin, J.C. Burant, J.M. Millam, S.S. Iyengar, J. Tomasi, V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G.A. Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klene, X. Li, J.E. Knox, H.P. Hratchian, J.B. Cross, C. Adamo, J. Jaramillo, R. Gomperts, R.E. Stratmann, O. Yazyev, A.J. Austin, R. Cammi, C. Pomelli, J.W. Ochterski, P.Y. Ayala, K. Morokuma, G.A. Voth, P. Salvador, J.J. Dannenberg, V.G. Zakrzewski, S. Dapprich, A.D. Daniels, M.C. Strain, O. Farkas, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J.V. Ortiz, Q. Cui, A.G. Baboul, S. Clifford, J. Cioslowski, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W. Wong, C. Gonzalez, and J.A. Pople, GAUSSIAN 03, Revision B.05, Gaussian Inc., Pittsburgh, PA, 2003.
[41] K. Roy and P.L.A. Popelier, Predictive QSPR modeling of acidic dissociation constant (pKa) of phenols in different solvents, J. Phys. Org. Chem. 22 (2009), pp. 186–196.
[42] K. Roy and P.L.A. Popelier, Exploring predictive QSAR models using quantum topological molecular similarity (QTMS) descriptors for toxicity of nitroaromatics to Saccharomyces cerevisiae, QSAR Comb. Sci. 27 (2008), pp. 1006–1012.
[43] P.L.A. Popelier, Quantum molecular similarity. 1. BCP space, J. Phys. Chem. A 103 (1999), pp. 2883–2890.
[44] R.F.W. Bader and H.J.T. Preston, The kinetic energy of molecular . . . molecular stability, Int. J. Quantum Chem. 3 (1969), pp. 327–347.
[45] S.T. Howard and O. Lamarche, Description of covalent bond orders using the charge density topology, J. Phys. Org. Chem. 16 (2003), pp. 133–141.
[46] R.F.W. Bader, T.S. Slee, D. Cremer, and E. Kraka, Description of conjugation and hyperconjugation in terms of electron distributions, J. Am. Chem. Soc. 105 (1983), pp. 5061–5068.
[47] J.T. Leonard and K. Roy, On selection of training and test sets for the development of predictive QSAR models, QSAR Comb. Sci. 25 (2006), pp. 235–251.
[48] SPSS, standard version 1998, is statistical software of SPSS Inc., Chicago, IL; software available at http://www.spss.com.
[49] D. Rogers and A.J. Hopfinger, Application of genetic function approximation to quantitative structure–activity relationship and quantitative structure–property relationship, J. Chem. Inf. Comput. Sci. 34 (1994), pp. 854–866.
[50] A. Fraser, Simulation of genetic systems by automatic digital computers. I. Introduction, Aust. J. Biol. Sci. 10 (1957), pp. 484–491.
[51] S. Wold, M. Sjostrom, and L. Eriksson, PLS-regression: A basic tool of chemometrics, Chemom. Intell. Lab. Syst. 58 (2001), pp. 109–130.
[52] G.W. Snedecor and W.G. Cochran, Statistical Methods, Oxford & IBH, New Delhi, 1967.
[53] A. Golbraikh and A. Tropsha, Beware of q2!, J. Mol. Graph. Model. 20 (2002), pp. 269–276.
[54] P.P. Roy and K. Roy, On some aspects of variable selection for partial least squares regression models, QSAR Comb. Sci. 27 (2008), pp. 302–313.
[55] S. Wold and L. Eriksson, Validation tools, in Chemometric Methods in Molecular Design, H. van de Waterbeemd, ed., VCH, Weinheim, 1995, pp. 312–317.
[56] A.K. Debnath, Quantitative structure–activity relationship (QSAR): A versatile tool in drug design, in Combinatorial Library Design and Evaluation, A.K. Ghose and V.N. Viswanadhan, eds., Marcel Dekker, New York, 2001, pp. 73–129.
[57] K. Roy, On some aspects of validation of predictive QSAR models, Expert Opin. Drug Discov. 2 (2007), pp. 1567–1577.
[58] P.P. Roy and K. Roy, Comparative QSAR studies of CYP1A2 inhibitor flavonoids using 2D and 3D descriptors, Chem. Biol. Drug Des. 72 (2008), pp. 370–382.
[59] P.P. Roy, S. Paul, I. Mitra, and K. Roy, On two novel parameters for validation of predictive QSAR models, Molecules 14 (2009), pp. 1660–1701.
[60] A.A. Toropov, A.P. Toropova, and E. Benfenati, QSPR modeling bioconcentration factor (BCF) by balance of correlations, Eur. J. Med. Chem. 44 (2009), pp. 2544–2551.
[61] P. Lu, X. Wei, and R. Zhang, CoMFA and CoMSIA 3D-QSAR studies on quionolone caroxylic acid derivatives inhibitors of HIV-1 integrase, Eur. J. Med. Chem. 45 (2010), pp. 3413–3419.
[62] I. Mitra, A. Saha, and K. Roy, Exploring quantitative structure–activity relationship (QSAR) studies of antioxidant phenolic compounds obtained from traditional Chinese medicinal plants, Mol. Simul. 36 (2010), pp. 1067–1079.
[63] P. Gramatica, Principles of QSAR models validation: Internal and external, QSAR Comb. Sci. 26 (2007), pp. 694–701.
[64] L. Eriksson, J. Jaworska, A.P. Worth, M.T. Cronin, R.M. McDowell, and P. Gramatica, Methods for reliability and uncertainty assessment and for applicability evaluations of classification- and regression-based QSARs, Environ. Health Perspect. 111 (2003), pp. 1361–1375.
[65] L. Zhang, H. Zhu, T. Oprea, A. Golbraikh, and A. Tropsha, QSAR modeling of the blood–brain barrier permeability for diverse organic compounds, Pharm. Res. 25 (2008), pp. 1902–1914.
[66] SIMCA-P version 10.0.2.0, 2002, is a product of UMETRICS, Umea, Sweden, www.umetrics.com.
[67] ChemDraw Ultra version 5.0, 1999, is a product of CambridgeSoft Corporation, Cambridge, MA 02140, USA; available at http://www.camsoft.com/support.html.