CHAPTER 1
INTRODUCTION
1.1 GENERAL INTRODUCTION
Medicinal chemistry is the science that deals with the discovery and
design of new therapeutic chemicals and their development into useful
medicines. It had its beginnings when chemists, pharmacists and physicians
isolated and purified the active principles of plant and animal tissues and, later,
of microorganisms and their fermentation products. During the latter decades of the
20th century, the traditional dividing lines between the biological, chemical and
physical sciences were erased, and new borderline fields such as
molecular biology, molecular pharmacology, biomedicine and others began to
capture the interest of medicinal scientists. Medicinal chemistry, which had its roots in
organic chemistry, biology and some areas of physics, extended new roots into
these emerging topics [1].
Drug design is a multidisciplinary activity involving chemists,
biologists, biochemists, pharmacologists and many others. The chemist's role is
central in inventing new compounds that exert a beneficial effect. However,
once a lead for a new active drug has been established, extensive
toxicological studies are undertaken to demonstrate its safety and efficacy
before clinical trials commence.
With the accidental discovery of penicillin came the screening of
microorganisms and a large number of antibiotics from bacterial and fungal
sources. Many of these antibiotics provide the prototype structures that the
medicinal chemist modifies to obtain antibacterial drugs with better therapeutic
profiles. Thousands of chemicals are prepared annually throughout the world, and
many of them are entered into pharmacological screening to determine whether
they have useful biological activity. This process of random screening
has been considered inefficient, but it has resulted in the identification of new
prototype compounds, whose structures have been optimized to produce clinical
agents. A lead may also emerge from clinical observation of the pharmacological
behavior of an existing drug.
The term “drug design” mainly represents efforts to develop new
drugs on a rational basis. The various approaches used in drug design include:
(i) Random screening of synthetic compounds or chemicals and
natural products by bioassay procedures.
(ii) Preparation of novel compounds based on the known structures of
biologically active natural substances of plant and animal
origin, i.e., a lead skeleton.
(iii) Preparation of structural analogues of the lead with increased
biological activity.
(iv) Application of the bio-isosteric principle.
In the course of drug design, the two major types of chemical
modification are achieved through the formation of analogues and prodrugs.
An analogue is normally accepted as being a modification that brings
about a carbon-skeletal transformation or substituent synthesis, e.g.,
Oxytetracycline, Demeclocycline with regard to Oestradiol. The term prodrug is
applied to an appropriate derivative of a drug that undergoes in vivo hydrolysis
to the parent drug, e.g., Testosterone propionate, Chloramphenicol palmitate.
More recently, automated High-Throughput Screening (HTS) systems
utilizing cell culture systems with linked enzyme assays and receptor molecules
derived from gene cloning have greatly increased the efficiency of random
screening. It is now practical to screen enormous libraries of peptides and
nucleic acids obtained from combinatorial chemistry procedures.
Rational design is another approach that is also flourishing.
Significant advances in X-ray crystallography and NMR have made it possible
to obtain detailed representations of enzymes and other drug receptors. The
techniques of molecular graphics and computational chemistry have provided
novel chemical structures that have led to new drugs with potent medicinal
activities. The development of HIV protease inhibitors and ACE inhibitors came
from an understanding of the geometric and chemical character of the
respective enzyme's active site. Even if the receptor structure is not known in
detail, rational approaches based on the physicochemical properties of lead
compounds can provide new drugs.
1.2 NUCLEUS INTRODUCTION
1.2.1 Gallic Acid
Chemical Name: 3,4,5-Trihydroxybenzoic acid
[Structure: gallic acid]
Molecular Formula : C6H2(OH)3COOH
Molecular Weight : 188.14
Description : White or pale powder
Solubility : Soluble in acetone and ethyl acetate but insoluble in
benzene.
Category : Antioxidant
Gallic acid is a naturally occurring polyphenol. It is known as
3,4,5-trihydroxybenzoic acid monohydrate. It is obtained by the hydrolysis
of tannic acid with sulphuric acid. Gallic acid is found in almost all plants.
Plants known for their high gallic acid content include
Areca nut (Areca catechu),
Barberry (Berberis vulgaris),
Blackberry (Rubus argutus),
Hot chocolate,
Common walnut (Juglans regia),
Mango peels and leaves (Mangifera indica),
Indian gooseberry (Phyllanthus emblica),
Clove (Syzygium aromaticum),
Golden root (Rhodiola rosea),
Witch hazel (Hamamelis virginiana).
It is widespread in plant foods and beverages such as tea and wine
and has been reported to be one of the anticarcinogenic polyphenols present in green
tea. In France, the consumption of a diet high in saturated fat coupled with gallic acid
has apparently been associated with a low incidence of coronary heart disease [2]. It is
a strong natural antioxidant, able to scavenge hypochlorous acid, and it also decreases
the peroxidation of brain phospholipids. The antioxidant capacity of galloyl esters
against hydroxyl, azide and superoxide radicals has been reported [3]. Gallic
acid acts as an antioxidant and helps to protect cells against oxidative
damage. It was found to show cytotoxicity against cancer cells without
harming healthy cells [4]. It is also present in red wine and has been found to have a
protective role against oxidation of low-density lipoproteins (LDL) [5].
Synthesis of gallic acid
1. With the elaboration of a high-yielding, high-titer synthesis of
3-dehydroshikimic acid from glucose using recombinant
Escherichia coli, oxidation of this hydroaromatic becomes a
potential route for the synthesis of gallic acid. Conversion of
3-dehydroshikimic acid into gallic acid likely proceeds via initial
enolization of an α-hydroxycarbonyl and oxidation of the resulting
enediol. 3-Dehydroshikimate enolization in water was catalyzed by
inorganic phosphate, while Zn2+ was used to catalyze enolization in
acetic acid. Enediol oxidation employed Cu2+ either as the
stoichiometric oxidant or as a catalyst in the presence of a
co-oxidant. Gallic acid was produced in a yield of 36% when
3-dehydroshikimic acid in phosphate-buffered water reacted for 35 h
with H2O2 and catalytic amounts of CuSO4 [6].
[Scheme: 3-Dehydroshikimic acid → Gallic acid (reagent labels in the original scheme: PO4 / ZnO)]
2. Gallic acid may result from the dehydration of 3-Dehydroshikimic
acid followed by hydroxylation of the intermediate protocatechuic
acid. 3-Dehydroshikimic acid is obtained from erythrose-4-phosphate
[7].
[Scheme: Erythrose-4-phosphate → Gallic acid; (i) dehydration, (ii) hydroxylation]
3. In tea seedlings, flavan-3-ols are produced by a naringenin-chalcone →
naringenin → dihydrokaempferol pathway. Dihydrokaempferol
is a branch point in the synthesis of (−)-epigallocatechin-3-O-gallate
and other flavan-3-ols, which can be formed by routes beginning with
either a flavonoid 3′-hydroxylase mediated conversion of the flavonol
to dihydroquercetin or a flavonoid 3′,5′-hydroxylase-catalysed
conversion to dihydromyricetin, with subsequent steps involving
sequential reactions catalysed by dihydroflavonol 4-reductase,
anthocyanidin synthase, anthocyanidin reductase and flavan-3-ol
gallate synthase [8].
[Scheme: Naringenin → Gallic acid via the dihydrokaempferol pathway]
4. A series of reactions was elaborated for the transformation of
1,2,3-trimethoxybenzene into gallic acid. The intermediate
1-bromo-3,4,5-trimethoxybenzene was prepared by nitration, reduction, diazotization
and decomposition of the diazonium salt in the presence of cuprous
bromide. The halogen-lithium exchange reaction of the aryl bromide with
butyllithium, reaction of the intermediate lithio derivative with CO2
and demethylation led to gallic acid [9].
[Scheme: 1,2,3-Trimethoxybenzene → Gallic acid (multistep reaction)]
5. In plants gallic acid is obtained by hydrolysis of tannins [10,11]
[Scheme: Tannin → Gallic acid; hydrolysis (acid / alkali)]
Some natural products possessing the gallic acid nucleus and its
derivatives are
[Structures: Mescaline, Myricetin, Sinapyl alcohol, Gallocatechin gallate,
Ellagic acid, Epicatechin gallate]
1.2.2 Thiazolidinones
Chemical Name: 4-Oxothiazolidine.
Thiazolidinones are derivatives of thiazolidine, which belongs to an
important group of heterocyclic compounds containing sulfur and nitrogen in a
five-membered ring. A lot of research work on thiazolidinones has been done in
the past. The nucleus is also known as a wonder nucleus because it gives rise to
derivatives with many different types of biological activity. The
3-unsubstituted-4-thiazolidinones are usually solids that often melt with
decomposition, but the attachment of an alkyl group to the nitrogen lowers the
melting point. The 4-thiazolidinones that do not contain aryl or higher alkyl
substituents are somewhat soluble in water.
Thiazolidinones are reported to possess a variety of pharmacological
activities such as anti-HIV [12-14], anticancer [15, 16], anticonvulsant [17],
anti-inflammatory [18], antimicrobial [19, 20] and follicle stimulating hormone
(FSH) receptor agonist activity [21].
[Structure: 4-thiazolidinone ring]
Some of the drugs with thiazolidinone nucleus are
[Structures: Ampicillin, Pioglitazone]
1.2.3 Azetidinones
[Structure: azetidin-2-one ring]
Chemical Name: Azetidin-2-one
The name lactam is given to cyclic amides. In older nomenclature, the
second carbon in an aliphatic carboxylic acid was designated as α, the third
as β and so on. Thus a β-lactam is a cyclic amide with four atoms in its ring.
The contemporary name for this ring system is azetidinone. β-Lactam came
to be a generic descriptor for the penicillin family. The ring ultimately proved to
be the main component of the pharmacophore, so the term possesses
medicinal as well as chemical significance.
The chemistry of β-lactams has taken an important place in organic
chemistry since the discovery of penicillin by Sir Alexander Fleming in 1928
and, shortly thereafter, cephalosporin, both of which were used as successful
antibiotics. The 2-azetidinone (β-lactam) ring is a common structural feature
of a number of broad-spectrum β-lactam antibiotics including penicillins,
cephalosporins, carbapenems, nocardicins and monobactams, which have been
widely used as chemotherapeutic agents to treat bacterial infections and
microbial diseases. These molecules operate by forming a covalent adduct
with membrane-bound bacterial transpeptidases, also known as
penicillin binding proteins (PBPs), which are involved in the biosynthesis of the cell wall
[22]. Apart from antibiotic activity, β-lactams also possess cholesterol
absorption inhibitory [23], antithrombotic [24], antiviral [25] and antifungal activities
[26].
Some of the drugs with azetidinone nucleus are
[Structures: Ezetimibe, Methicillin]
1.3 BIOLOGICAL ACTIVITIES
Biological screening is an important part of any drug research. The
modification of the pharmacophore is an important part of drug design. When a
drug is a complex chemical mixture, its activity is exerted by the substance's
active ingredient or pharmacophore but can be modified by other constituents.
Activity is generally dosage-dependent, and it is not uncommon for the effects of
one substance to range from beneficial to adverse when going from low to
high doses.
1.3.1 Antimicrobial activity
Microbial infections cause many diseases such as pneumonia, meningitis,
bacteraemia, otitis media, sinusitis, tuberculosis, plague, pertussis, cholera,
diphtheria, tetanus, leprosy and leptospirosis. The upsurge of
widespread multi-drug-resistant microorganisms such as Bacillus subtilis,
Staphylococcus aureus, Streptococcus mutans, Escherichia coli, Klebsiella
pneumoniae and Pseudomonas aeruginosa has been reported as a major threat
to human health. In view of this resistance to drugs currently in use and the
emergence of new diseases, there is a continuous need for the synthesis of new
organic compounds as potential antimicrobial agents using a fast and efficient
approach.
Fungal infections also cause a number of diseases such as athlete's foot,
candidiasis, mycosis, tinea, white nose syndrome and zeaspora. Primary and
opportunistic fungal infections continue to increase rapidly because of the
increased number of immunocompromised patients. The biochemical
similarity of human and fungal cells is a handicap for
selective activity, and easily acquired resistance is the main problem
encountered in developing safe and efficient antifungals. The ideal antifungal
agent should be fungicidal with a broad spectrum of activity, be suitable
for oral or intravenous administration and possess good pharmacodynamic
properties without development of resistance during therapy. At present, none
of the clinically used drugs satisfies all these criteria, so there is a need to
develop new antifungal drugs [27].
1.3.2 Antioxidant activity
Antioxidant compounds in food play an important role as a health
protecting factor. Scientific evidence suggests that antioxidants reduce the risk
for chronic diseases including cancer and heart disease. Primary sources of
naturally occurring antioxidants are whole grains, fruits and vegetables. Plant
sourced food antioxidants like vitamin C, vitamin E, carotenes, phenolic acids,
phytate and phytoestrogens have been recognized as having the potential to
reduce disease risk. Most of the antioxidant compounds in a typical diet are
derived from plant sources and belong to various classes of compounds with a
wide variety of physical and chemical properties. Some compounds, such as
gallates, have strong antioxidant activity, while others, such as mono-phenols
are weak antioxidants. The main characteristic of an antioxidant is its ability to
trap free radicals. Highly reactive free radicals and oxygen species are present
in biological systems from a wide variety of sources. These free radicals may
oxidize nucleic acids, proteins, lipids or DNA and can initiate degenerative
disease. Antioxidant compounds like phenolic acids, polyphenols and
flavonoids scavenge free radicals such as peroxide, hydroperoxide or lipid
peroxide and thus inhibit the oxidative mechanisms that lead to degenerative
diseases [28].
There are a number of clinical studies suggesting that the antioxidants
in fruits, vegetables, tea and red wine are the main factors for the observed
efficacy of these foods in reducing the incidence of chronic diseases including
heart disease and some cancers. The free radical scavenging activity of
antioxidants in food materials has been substantially investigated and reported
in the literature [29, 30]. Various antioxidant activity methods have been used
to monitor and compare the antioxidant activity of food. In recent years,
oxygen radical absorbance capacity assays and enhanced chemiluminescence
assays have been used to evaluate antioxidant activity of foods, serum and
other biological fluids. These methods require special equipment and technical
skills for the analysis. The different types of methods published in the literature
for the determinations of antioxidant activity of foods involve electron spin
resonance (ESR) and chemiluminescence methods. These analytical methods
measure the radical-scavenging activity of antioxidants against free radicals
like the 1,1-diphenyl-2-picrylhydrazyl (DPPH) radical, the superoxide anion
radical (O2•−), the hydroxyl radical (•OH) or the peroxyl radical (ROO•).
The various methods used to measure antioxidant activity of food
products can give varying results depending on the specific free radical being
used as a reactant. There are other methods which determine the resistance of
lipid or lipid emulsions to oxidation in the presence of the antioxidant being
tested. The malondialdehyde (MDA) or thiobarbituric acid-reactive-substance
(TBARS) assays have been used extensively since the 1950s to estimate the
peroxidation of lipids in membrane and biological systems. These methods are
time consuming, because they depend on the oxidation of a substrate which is
influenced by temperature, pressure, matrix etc. and may not be practical when
large numbers of samples are involved.
Antioxidant activity methods using free radical traps are relatively
straightforward to perform. The ABTS
[2,2′-azinobis(3-ethylbenzothiazoline-6-sulfonic acid)] radical cation [30] has
been used to screen the relative radical-scavenging abilities of flavonoids and
phenolics. The Oxygen Radical Absorbance Capacity (ORAC) procedure to determine
the antioxidant capacity of fruits and vegetables has also been reported [31].
Phenolic and polyphenolic
compounds constitute the main class of natural antioxidants present in plants,
food and beverages and are usually quantified employing Folin’s reagent. A
rapid, simple and inexpensive method to measure antioxidant capacity of food
involves the use of the free radical, DPPH [32].
1.3.3 Antitubercular activity
Tuberculosis (TB) is a common and often deadly infectious disease
caused by various strains of mycobacteria, usually Mycobacterium tuberculosis
in humans [33]. TB usually attacks the lungs but can also affect other parts of
the body. It is spread through the air when people who have the disease
cough, sneeze or spit [34]. Most infections in humans result in an
asymptomatic, latent infection and about one in ten latent infections eventually
progresses to active disease, which, if left untreated, kills more than 50% of its
victims.
The classic symptoms are chronic cough with blood-tinged sputum,
fever, night sweats and weight loss (the last giving rise to the formerly
prevalent colloquial term "consumption"). Infection of other organs causes a
wide range of symptoms. Diagnosis relies on radiology (commonly chest X-
rays), a tuberculin skin test, blood tests, as well as microscopic examination
and microbiological culture of bodily fluids. Treatment is difficult and requires
long courses of multiple antibiotics. Contacts are also screened and treated if
necessary. Antibiotic resistance is a growing problem in (extensively) multi-
drug-resistant tuberculosis. Prevention relies on screening programs and
vaccination, usually with Bacillus Calmette-Guérin vaccine.
One third of the world's population is thought to be infected with M.
tuberculosis, [35, 36] and new infections occur at a rate of about one per
second [37]. The proportion of people who become sick with tuberculosis each
year is stable or falling worldwide but, because of population growth, the
absolute number of new cases is still increasing. In 2007 there were an
estimated 13.7 million chronic active cases, 9.3 million new cases, and 1.8
million deaths, mostly in developing countries [38]. In addition, more people in
the developed world are infected with tuberculosis, because their immune
systems are compromised by immunosuppressive drugs, substance abuse or
AIDS. The distribution of tuberculosis is not uniform across the globe; about
80% of the population in many Asian and African countries test positive in
tuberculin tests, while only 5-10% of the US population test positive.
Treatment for TB uses antibiotics to kill the bacteria. Effective TB treatment is
difficult, due to the unusual structure and chemical composition of the
mycobacterial cell wall, which makes many antibiotics ineffective and hinders
the entry of drugs [39]. The two most commonly used drugs are Rifampicin
and Isoniazid. However, instead of the short course of antibiotics typically used
to cure other bacterial infections, TB requires much longer periods of treatment
(around 6 to 24 months) to entirely eliminate mycobacteria from the body.
Latent TB treatment usually uses a single antibiotic, while active TB disease is
best treated with combinations of several antibiotics, to reduce the risk of the
bacteria developing antibiotic resistance. People with latent infections are
treated to prevent them from progressing to active TB disease later in life.
Drug-resistant tuberculosis is transmitted in the same way as regular
TB. Primary resistance occurs in persons infected with a resistant strain of TB.
A patient with fully susceptible TB develops secondary resistance (acquired
resistance) during TB therapy because of inadequate treatment, not taking the
prescribed regimen appropriately or using low-quality medication [40]. Drug-
resistant TB is a public health issue in many developing countries, as treatment
is longer and requires more expensive drugs. Multi-drug-resistant tuberculosis
(MDR-TB) is defined as resistance to the two most effective first-line TB
drugs: Rifampicin and Isoniazid. Extensively drug-resistant TB (XDR-TB) is
also resistant to three or more of the six classes of second-line drugs [41]. So
there is an urgent need to develop drugs for treating tuberculosis.
1.4 QUANTITATIVE STRUCTURE ACTIVITY RELATIONSHIP
(QSAR)
QSAR represents an attempt to correlate structural descriptors of
compounds with activities. These structural descriptors, which include
parameters to account for hydrophobicity, topology, electronic properties and
steric effects, are determined empirically or, more recently, by computational
methods. Activities used in QSAR include chemical measurements and
biological assays. QSAR is currently being applied in many disciplines, with
many applications pertaining to drug design and environmental risk assessment. In the
1890s, Hans Horst Meyer of the University of Marburg and Charles Ernest
Overton of the University of Zurich, working independently, noted that the
toxicity of organic compounds depended on their lipophilicity [42, 43].
QSAR based on Hammett's relationship utilizes electronic properties
as the descriptors of structures. Difficulties were encountered when
investigators attempted to apply Hammett-type relationships to biological
systems, indicating that other structural descriptors were necessary.
Robert Muir, a botanist at Pomona College, California, studied
the biological activity of compounds that resembled indole acetic acid and
phenoxy acetic acid, which function as plant growth regulators. In his attempt
to correlate the structures of the compounds with their activities, he consulted
his colleague in chemistry, Corwin Hansch. Using Hammett sigma parameters
to account for the electronic effect of substituents did not lead to meaningful
QSAR [44]. However, Hansch recognized the importance of the lipophilicity,
expressed as the octanol-water partition coefficient, on biological activity [45].
We now recognize this parameter to provide a measure of the bioavailability of
compounds, which will determine, in part, the amount of the compound that
gets to the target site. Relationships were developed to correlate a structural
parameter (i.e., lipophilicity) with activity. In some cases, a univariate
relationship correlating structure and activity was adequate. The form of the
equation is:
\[ \log(1/C) = a \log P + b \tag{1.1} \]
where P is the octanol-water partition coefficient, a and b are constants, and
C is the molar concentration of compound that produces a standard
response (e.g., LD50, ED50).
Carbonic anhydrase catalyzes the reaction
\[ \mathrm{CO_2 + H_2O \rightleftharpoons HCO_3^- + H^+} \tag{1.2} \]
as well as the hydration of some aldehydes and ketones and the hydrolysis of alkyl and
aryl esters. It is a zinc-containing enzyme of about 30,000 daltons, and the
three-dimensional structure has been characterized by X-ray diffraction.
Physiologically, carbonic anhydrase is involved in gastric, urinary, pancreatic,
lacrimal, and cerebrospinal secretions. Inhibitors of carbonic anhydrase include
aromatic and heterocyclic sulfonamides, and some of these compounds have
found application as diuretics.
Both traditional QSAR and computer graphical methods have been
applied to the development of sulfonamides and other compounds as inhibitors
of carbonic anhydrase. For example, Hansch et al. [46] developed a QSAR
based on the binding constants of 29 phenylsulfonamides to the enzyme. The
equation that was derived was the following:
\[ \log K = 1.55\,\sigma + 0.64 \log P - 2.07\, I_1 - 3.28\, I_2 + 6.94 \tag{1.3} \]
where K is the binding constant, σ is the Hammett substituent constant of X,
I1 = 1 if X is meta and I1 = 0 if X is ortho or para,
and I2 = 1 if X is ortho and I2 = 0 if X is meta or para.
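As an illustration only, Eq. 1.3 can be evaluated for a single substituent as in the short Python sketch below. It assumes that the coefficient 1.55 multiplies the Hammett σ constant; the function name and the σ and log P values are hypothetical and are chosen only to show how the indicator variables I1 and I2 penalize meta and ortho substitution.

```python
def log_binding_constant(sigma, log_p, position):
    """Evaluate Eq. 1.3 for one substituent X.

    sigma    -- Hammett substituent constant (assumed meaning of the 1.55 term)
    log_p    -- log of the octanol-water partition coefficient
    position -- 'ortho', 'meta' or 'para'
    """
    if position not in ("ortho", "meta", "para"):
        raise ValueError("position must be 'ortho', 'meta' or 'para'")
    i1 = 1 if position == "meta" else 0   # indicator variable I1
    i2 = 1 if position == "ortho" else 0  # indicator variable I2
    return 1.55 * sigma + 0.64 * log_p - 2.07 * i1 - 3.28 * i2 + 6.94


# Hypothetical descriptor values, not taken from the cited study.
for pos in ("para", "meta", "ortho"):
    print(pos, round(log_binding_constant(0.2, 1.0, pos), 2))
```

With identical σ and log P, the para isomer gives the largest predicted log K, the meta isomer is lower by 2.07 and the ortho isomer by 3.28, which mirrors the steric interpretation of the indicator terms.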
The negative coefficients of I1 and I2 suggest that they account for
unfavorable steric effects when substituents are in the meta or ortho positions.
Binding is favored by electron-withdrawing substituents, which is consistent
with the hypothesis that the ionized form of -SO2NH2 binds to the zinc in the
active site of carbonic anhydrase [47]. Interactive computer graphics have also
been applied to better understand the interaction of carbonic anhydrase inhibitors with
the enzyme, as illustrated in Figure 1.1.
Fig. 1.1 Interactive computer graphics of carbonic anhydrase inhibitors
The active site is a cavity approximately 12 Angstroms deep with a
zinc atom (magenta) near the bottom of the cavity. The active site is divided
into a hydrophilic half (blue) and a hydrophobic half (red). In the complex, the
inhibitor appears to be bound such that the sulfonamide moiety occupies the
fourth coordination site of the zinc atom, with the other three sites being
occupied by histidine residues.
The QSAR approach uses parameters which have been assigned to
the various chemical groups that can be used to modify the structure of the
drug. The parameter is a measure of the potential contribution of its group to a
particular property of the parent drug. The selection of parameters is an
important step in QSAR study. The various parameters used in QSAR study are
as follows.
1.4.1 Thermodynamic Parameters
(i) Heat of Formation: The enthalpy of forming a molecule from its
constituent atoms is a measure of the relative thermal stability of a molecule. It
is calculated by quantum-chemical technique and has a wide range of
applicability in conformational analysis, intermolecular modeling and chemical
reaction modeling. The atom limit is 300 atoms or 300 atomic orbitals
(whichever is less) per molecule.
(ii) Partition Coefficient (Log P): Log P (the octanol/water partition
coefficient) and molar refractivity are molecular descriptors that can be used to
relate chemical structure to observed chemical behavior. Log P is related to the
hydrophobic character of the molecule. The molar refractivity of a
substituent is a combined measure of its size and polarizability.
\[ \log P_{\mathrm{oct/wat}} = \log \left( \frac{[\mathrm{Solute}]_{\mathrm{octanol}}}{[\mathrm{Solute}]_{\mathrm{water}}^{\mathrm{un\text{-}ionized}}} \right) \tag{1.4} \]
The partition coefficient is a ratio of concentrations of un-ionized
compound between the two solutions. To measure the partition coefficient of
ionizable solutes, the pH of the aqueous phase is adjusted such that, the
predominant form of the compound is un-ionized. The logarithm of the ratio of
the concentrations of the un-ionized solute in the solvents is called log P.
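The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming both concentrations refer to the un-ionized solute at equilibrium and are expressed in the same units; the function name and the example concentrations are hypothetical.

```python
import math


def log_p(conc_octanol, conc_water_unionized):
    """Return log10 of the octanol/water concentration ratio (Eq. 1.4).

    Both concentrations must be positive and in the same units.
    """
    if conc_octanol <= 0 or conc_water_unionized <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_octanol / conc_water_unionized)


# Hypothetical equilibrium concentrations (mmol/L) of an un-ionized solute.
print(log_p(50.0, 0.5))
```

A solute that is 100 times more concentrated in octanol than in water gives log P = 2, while a solute that prefers the aqueous phase gives a negative value.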
(iii) Melting Point: The melting point of a solid is the temperature at
which the vapor pressures of the solid and the liquid are equal. At the melting
point, the solid and liquid phases exist in equilibrium. When considered as the
temperature of the reverse change from liquid to solid, it is referred to as the
freezing point. When the "characteristic freezing point" of a substance is
determined, the actual methodology is almost always "the principle of
observing the disappearance rather than the formation of ice", that is, the
melting point.
(iv) Molar Refractivity (MR): The molar refractivity is a measure of
both the volume of a compound and how easily it is polarized. It is expressed
as:
\[ MR = \frac{(n^2 - 1)}{(n^2 + 2)} \cdot \frac{M}{d} \tag{1.5} \]
where n is the refractive index,
M is the molecular weight and
d is the density.
The term M/d defines a volume, while the term
(n² − 1)/(n² + 2) provides a correction factor by defining how easily the
substituent can be polarized. This is particularly significant if the substituent
has π electrons or lone pairs of electrons. A positive sign of MR in a QSAR
equation indicates that the substituent binds to a polar surface, while a negative
sign or non-linear relationship indicates steric hindrance at the binding site.
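The molar refractivity expression can be checked numerically with the sketch below. It assumes the Lorentz-Lorenz form, with n² + 2 in the denominator of the correction factor; the function name is hypothetical, and the benzene values are approximate literature values used only for illustration.

```python
def molar_refractivity(n, molecular_weight, density):
    """Molar refractivity from the Lorentz-Lorenz relation (assumed form).

    n                -- refractive index (dimensionless)
    molecular_weight -- g/mol
    density          -- g/cm^3
    Returns MR in cm^3/mol.
    """
    if density <= 0:
        raise ValueError("density must be positive")
    polarizability_factor = (n ** 2 - 1) / (n ** 2 + 2)
    molar_volume = molecular_weight / density
    return polarizability_factor * molar_volume


# Approximate literature values for benzene at 20 degrees C (illustrative only).
print(round(molar_refractivity(1.5011, 78.11, 0.8765), 1))
```

For these inputs the result is close to 26 cm³/mol, which is smaller than the molar volume of about 89 cm³/mol because the correction factor is always below 1.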
(v) Energy Stretching: Energy stretching is the bond stretching energy.
The value of E stretching for a pair of atoms joined by a single
bond can be estimated by considering the bond to be a mechanical spring that
obeys Hooke's law. If r is the stretched length of the bond and r0 is the ideal
bond length, then
\[ E_{stretching} = \tfrac{1}{2} K (r - r_0)^2 \tag{1.6} \]
where r0 is the ideal bond length,
r is the stretched bond length and
K is the force constant.
If a molecule consists of three atoms (a-b-c), then
\[ E_{stretching} = E_{a-b} + E_{b-c} = \tfrac{1}{2} K_{(a-b)} \left[ r_{(a-b)} - r_{0(a-b)} \right]^2 + \tfrac{1}{2} K_{(b-c)} \left[ r_{(b-c)} - r_{0(b-c)} \right]^2 \tag{1.7} \]
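A minimal Python sketch of the harmonic stretching term follows. It assumes that every bond contributes ½K(r − r0)² and that the total is a simple sum over bonds, as in the three-atom case above; the function names, force constants and bond lengths are hypothetical illustration values, not parameters from any particular force field.

```python
def stretch_energy(k, r, r0):
    """Hooke's-law stretching energy of one bond: 1/2 * K * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def total_stretch_energy(bonds):
    """Sum the stretching energy over an iterable of (K, r, r0) tuples."""
    return sum(stretch_energy(k, r, r0) for k, r, r0 in bonds)


# Hypothetical a-b-c molecule: (force constant, actual length, ideal length).
bonds_abc = [
    (600.0, 1.56, 1.52),  # bond a-b, stretched by 0.04
    (700.0, 1.40, 1.43),  # bond b-c, compressed by 0.03
]
print(total_stretch_energy(bonds_abc))
```

Because the displacement is squared, stretching and compressing a bond by the same amount cost the same energy, and a bond at its ideal length contributes nothing.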
(vi) Torsion Energy: E Torsion is the bond energy due to changes in the
conformation of the bond and is given by
\[ E_{Torsion} = \tfrac{1}{2} k \left[ 1 + \cos\left( m (\phi + \phi_{offset}) \right) \right] \tag{1.8} \]
where k is the energy barrier to rotation about the torsion angle φ, m is
the periodicity of the rotation and
φ_offset is the offset of the ideal torsion angle relative to a staggered
arrangement of the two atoms.
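The torsion term can be explored with the following sketch, which assumes the cosine form E = ½k[1 + cos(m(φ + φ_offset))] with angles given in degrees. The function name and the barrier height are hypothetical; a threefold periodicity (m = 3) is used because it is the usual illustration for rotation about a single bond between two tetrahedral atoms.

```python
import math


def torsion_energy(k, m, phi_deg, offset_deg=0.0):
    """Torsional energy 1/2 * k * [1 + cos(m * (phi + offset))].

    k          -- energy barrier to rotation
    m          -- periodicity of the rotation
    phi_deg    -- torsion angle in degrees
    offset_deg -- offset of the ideal torsion angle in degrees
    """
    angle = math.radians(phi_deg + offset_deg)
    return 0.5 * k * (1.0 + math.cos(m * angle))


# Hypothetical barrier of 3.0 energy units with threefold periodicity.
for phi in (0, 60, 120, 180):
    print(phi, round(torsion_energy(3.0, 3, phi), 6))
```

With zero offset the energy equals the full barrier k at 0° and 120° (eclipsed) and falls to zero at 60° and 180° (staggered), so the profile repeats m times per full rotation.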
(vii) Energy VDW: The van der Waals interaction energy of the
molecule with the receptor. EvdW is the total energy contribution due to van der
Waals forces and is calculated from the Lennard-Jones potential equation
\[ E_{vdW} = \varepsilon \left[ \left( \frac{r_{min}}{r} \right)^{12} - 2 \left( \frac{r_{min}}{r} \right)^{6} \right] \tag{1.9} \]
The (r_min/r)^6 term in this equation represents attractive forces, while the
(r_min/r)^12 term represents short-range repulsive forces between the atoms.
r_min is the distance between two atoms when the energy is at a minimum, and
ε is the depth of that energy minimum.
The actual distance between the atoms is represented as r.
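The behaviour of the two terms can be seen in the short sketch below. It assumes the Lennard-Jones form E = ε[(r_min/r)^12 − 2(r_min/r)^6], in which ε is the well depth; the function name and the numerical values of ε and r_min are hypothetical.

```python
def vdw_energy(epsilon, r_min, r):
    """Lennard-Jones energy: epsilon * [(r_min/r)^12 - 2 * (r_min/r)^6]."""
    if r <= 0:
        raise ValueError("interatomic distance must be positive")
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)


# Hypothetical atom pair: well depth 0.1 energy units, minimum at 3.8 Angstrom.
for r in (3.0, 3.8, 5.0, 8.0):
    print(r, round(vdw_energy(0.1, 3.8, r), 4))
```

At r = r_min the energy equals −ε; at shorter distances the twelfth-power repulsive term dominates and the energy rises steeply, while at longer distances the sixth-power attractive term decays towards zero.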
1.4.2 Electronic Parameters
(i) Energy Bend: E bend is the bond energy due to changes in the
bond angle and is estimated as:
\[ E_{bend} = \tfrac{1}{2} k (\theta - \theta_0)^2 \tag{1.10} \]
where θ is the actual bond angle and
θ0 is the ideal bond angle, that is, the minimum-energy position of the
three atoms.
(ii) Highest Occupied Molecular Orbital (HOMO) Energy: HOMO is
the highest energy level in the molecule that contains electrons. It is crucially
important in governing molecular reactivity and properties. When a molecule
acts as a Lewis base (an electron-pair donor) in bond formation, the electrons
are supplied from the molecule's HOMO. How readily this occurs is reflected
in the energy of the HOMO. Molecules with high-lying HOMOs are more able to
donate their electrons and are hence relatively reactive compared to molecules
with low-lying HOMOs; thus the HOMO descriptor measures the
nucleophilicity of a molecule.
(iii) Lowest Unoccupied Molecular Orbital (LUMO) Energy: LUMO
is the lowest energy level in the molecule that contains no electrons. It is
important in governing molecular reactivity and properties. When a molecule
acts as a Lewis acid (an electron-pair acceptor) in bond formation, incoming
electron pairs are received in its LUMO. Molecules with low-lying LUMOs are
more able to accept electrons than those with high-lying LUMOs; thus the
LUMO descriptor measures the electrophilicity of a molecule.
1.4.3 Steric Parameters
(i) Ovality: Ovality or non-circularity is the degree of deviation from
perfect circularity of the cross section of the core or cladding of the fibre.
Quantitatively, the ovality of either the core or cladding is expressed as
\[ \frac{2(a - b)}{(a + b)} \tag{1.11} \]
where a is the length of the major axis and
b is the length of the minor axis.
(ii) Dipole Moment: The dipole moment descriptor is a 3D electronic
descriptor that indicates the strength and orientation behavior of a molecule in
an electrostatic field. Both the magnitude and the components (X, Y and Z) of
the dipole moment are calculated. It is estimated by utilizing partial atomic
charges and atomic co-ordinates. The descriptor uses Debye units. Dipole
properties have been correlated to long-range ligand-receptor recognition and
subsequent binding.
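The estimate from partial atomic charges and atomic coordinates can be sketched as a charge-weighted sum of positions. The function name and the two-atom input are hypothetical. Charges are assumed to be in units of the elementary charge and coordinates in Angstrom, converted with 1 e*Angstrom of about 4.80320 Debye.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # approximate value of 1 e*Angstrom in Debye

def dipole_moment(charges, coords):
    """Return ([mu_x, mu_y, mu_z], magnitude) in Debye.

    charges: partial atomic charges in units of e
    coords:  (x, y, z) atomic coordinates in Angstrom
    """
    components = [
        E_ANGSTROM_TO_DEBYE * sum(q * r[axis] for q, r in zip(charges, coords))
        for axis in range(3)
    ]
    magnitude = math.sqrt(sum(c * c for c in components))
    return components, magnitude

# Hypothetical diatomic: charges of +0.2 e and -0.2 e placed 1.0 Angstrom apart on the x axis.
components, magnitude = dipole_moment([0.2, -0.2], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
print(components, magnitude)
```

For a neutral molecule the result does not depend on the choice of origin, which is why only the relative positions of the atoms matter here.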
(iii) Balaban Index: The Balaban Index ‘J’ is a graph index defined for a
graph on n nodes and m edges. This is a highly discriminating descriptor,
whose values do not substantially increase with molecule size and the number
of rings present. Its evaluation begins with the D-matrix modified as follows:
Each edge contributes length 1/b to overall path lengths, where
b is the edge (bond) order.
For aromatic bonds, b is set to 1.5 by definition
(thus contributing 2/3 to overall path lengths).
$J = \frac{m}{\mu + 1} \sum_{(i,j)} (D_i D_j)^{-1/2}$ (1.12)
where the sum runs over all edges (i, j) of the graph,
$\mu = m - n + 1$ is the circuit rank of the graph,
$D_i$ is the sum of all entries in the ith row (or column) of the graph
distance matrix, and
$D_j$ is the sum of all entries in the jth row (or column) of the graph
distance matrix.
The Balaban Index helps to differentiate molecules according to their
shape.
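The evaluation of J can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular package. The function names and the input format (a list of hypothetical `(i, j, bond_order)` tuples over hydrogen-suppressed atoms) are assumptions, and the sum is taken over edges as in the standard definition of the index.

```python
import math

def distance_matrix(n, bonds):
    """All-pairs shortest path lengths (Floyd-Warshall); each bond has length 1/order."""
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i, j, order in bonds:
        d[i][j] = d[j][i] = min(d[i][j], 1.0 / order)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def balaban_j(n, bonds):
    """Balaban index J for a connected graph with n atoms and the given bonds."""
    d = distance_matrix(n, bonds)
    row_sums = [sum(row) for row in d]   # D_i for every atom
    m = len(bonds)                       # number of edges
    mu = m - n + 1                       # circuit rank
    total = sum(1.0 / math.sqrt(row_sums[i] * row_sums[j]) for i, j, _ in bonds)
    return m / (mu + 1) * total

# Carbon skeletons only: propane (a 3-atom chain) and cyclohexane (a 6-membered ring).
propane = [(0, 1, 1), (1, 2, 1)]
cyclohexane = [(i, (i + 1) % 6, 1) for i in range(6)]
print(balaban_j(3, propane), balaban_j(6, cyclohexane))
```

An aromatic ring would be entered with a bond order of 1.5 on each ring bond, so that every ring edge contributes 2/3 to the path lengths.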
(iv) Connolly Solvent Accessible Area (Angstrom^2): The locus of the
center of a spherical probe as it is rolled over the molecular model. Connolly's
solvent accessible area, a steric descriptor, represents the surface area that is in
contact with the solvent. A negative coefficient for this descriptor in a QSAR model
suggests that an increase in the bulkiness of the substituents and in the molecular
solvent accessible surface area is not conducive to activity.
(v) Connolly Molecular Surface Area (Angstrom^2): The contact
surface created when a spherical probe is rolled over the molecular model. The
molecular surface (MS) is a continuous sheet consisting of two parts: the
contact surface and the reentrant surface. The contact surface is part of the van
der Waals surface that is accessible to a probe sphere. The reentrant surface is
the inward-facing surface of the probe when it touches two or more atoms.
Molecular surface is also called the Connolly surface.
(vi) Connolly Solvent Excluded Volume (Angstrom^3): The volume
contained within the contact molecular surface. The molecular surface is also
called the solvent-excluded surface (SES), which is the boundary of the union
of all possible probes which do not overlap with the molecule.
(vii) Principal Moment of Inertia (X, Y, Z): The moment of inertia of the
whole body with respect to one of the principal axes is known as a Principal
Moment of Inertia. The moments of inertia are computed for a series of straight
lines through the center of mass.
(viii) Wiener Index (W): The Wiener index is the sum of the numbers of chemical
bonds between all pairs of heavy atoms in the molecule. In graph-theoretical
terms, it is the sum of the lengths of the minimal paths between all pairs of
vertices representing heavy atoms. This is equal to half the sum of all D-matrix
entries:
$W = \frac{1}{2} \sum_{i} \sum_{j} a_{ij}$ (1.13)
where $a_{ij}$ is the ij element of the distance matrix of the molecule. The summation
is made over all the atoms i and j in the molecule.
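The definition in Eq. (1.13) can be checked on small molecules with a short sketch. The function name and the bond-list input (hypothetical `(i, j)` pairs over heavy atoms) are assumptions for the example, and the graph is assumed to be connected.

```python
from collections import deque

def wiener_index(n, bonds):
    """Wiener index: half the sum of all topological distances between heavy atoms."""
    adjacency = [[] for _ in range(n)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)
    total = 0
    for source in range(n):
        # Breadth-first search gives the number of bonds on the shortest path to every atom.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each pair was counted twice

n_butane = [(0, 1), (1, 2), (2, 3)]    # linear four-carbon chain
isobutane = [(0, 1), (0, 2), (0, 3)]   # central carbon bonded to three others
print(wiener_index(4, n_butane))   # 10
print(wiener_index(4, isobutane))  # 9
```

The branched isomer gives the smaller value, which shows how the index reflects molecular compactness.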
1.4.4 QSAR equations
QSAR equations determine the functional relationship between
activity and the selected descriptors; that is, they search for a mathematical
function f such that activity = f(descriptors) to a suitably high level of
accuracy. After the dependent and independent variables have been identified, a
suitable statistical method is used to generate a QSAR equation [48]. The
statistical methods can be broadly divided into two groups: linear and non-linear
methods. In each case a correlation is established between the dependent variable(s)
(biological activity) and the independent variable(s) (molecular descriptors).
The linear method fits a line between the selected descriptors and
activity, whereas the non-linear method fits a curve between the
selected descriptors and activity. The statistical method used to build a QSAR
model is chosen on the basis of the type of biological activity data.
The following are a few commonly used statistical methods:
Categorical Dependent Variable - Discriminant analysis, Logistic
regression, k-Nearest neighbour classification, Decision trees.
Continuous Dependent Variable - Multiple regression, Principal
component regression, Continuum regression, Partial least
squares regression, Canonical correlation analysis, k-Nearest
neighbour method, Neural networks.
Multiple regression is the most widely used method for building QSAR
models. A regression model is simple to interpret, because the contribution of
each descriptor can be seen from the magnitude and sign of its regression
coefficient. Multiple linear regression attempts to maximize the fit of the data
to a regression equation for the biological activity by adjusting each of the
parameters up or down. Successive regression equations are derived in
which parameters are either added or removed until the $r^2$ and S values are
optimized. The magnitudes of the coefficients derived in this manner indicate
the relative contribution of the associated parameters to bioactivity.
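A single multiple linear regression fit can be sketched as follows. This is only an illustration of the least-squares step via the normal equations, not the stepwise addition and removal of parameters described above. The function names, the two descriptors and the activity values are made up for the example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def fit_mlr(descriptors, activity):
    """Least-squares coefficients [intercept, b1, ..., bk] from the normal equations."""
    rows = [[1.0] + list(d) for d in descriptors]  # leading 1.0 models the intercept
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, activity)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Made-up data: two descriptors per compound; activity generated from a known equation.
descriptors = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 8.0)]
activity = [1.0 + 0.5 * x1 - 0.2 * x2 for x1, x2 in descriptors]
coefficients = fit_mlr(descriptors, activity)
print(coefficients)  # close to [1.0, 0.5, -0.2]
```

The sign of each fitted coefficient shows whether the descriptor raises or lowers the predicted activity. Magnitudes are directly comparable only when the descriptors are on similar scales.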
There are various statistical measures available for evaluating the
significance of the model; the following are the most commonly used [49].
n - number of molecules
k - number of descriptors in the model
df - degrees of freedom (n-k-1) (higher is better)
$r^2$ - coefficient of determination (> 0.7)
$Q^2$ - cross-validated $r^2$ (> 0.5)
pred_r2 - $r^2$ for the external test set (> 0.5)
SEE - standard error of estimate (smaller is better)
F-test - F-test for statistical significance of the model
(higher is better, for the same set of descriptors and
compounds)
Z score - Z score calculated by the randomization test (higher
is better)
SDEP - standard deviation of error of prediction.
Correlation Coefficient (r) and Coefficient of Determination ($r^2$): The
quantity r, called the linear correlation coefficient, measures the strength and the
direction of a linear relationship between two variables. The coefficient of
determination, $r^2$, is useful because it gives the proportion of the variance
(fluctuation) of one variable that is predictable from the other variable. It is a
measure that allows us to determine how certain one can be in making predictions
from a given model or graph. It can be calculated as:
$r^2 = 1 - \frac{\text{Sum of squares of the deviations from the regression line}}{\text{Sum of squares of the deviations from the mean}} = \frac{\text{Regression variance}}{\text{Original variance}}$ (1.14)
Regression variance is defined as the original variance minus the
variance around the regression line. The original variance is the sum of the squared
distances of the original data from the mean. If
0 < $r^2$ < 1, it indicates a partial (imperfect) linear correlation,
$r^2$ = 0, it shows that there is no linear correlation,
$r^2$ = 1, it means perfect correlation.
The higher the $r^2$ value, the less likely the relationship is due to
chance.
F or Variance Ratio: The F-statistic is the ratio between explained and
unexplained variance for a given number of degrees of freedom. The larger the
value of F, the greater the probability that the QSAR model is significant.
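The two measures just described can be computed from observed and predicted activities as sketched below. The function names and the activity values are hypothetical. The F-ratio is written in its usual form for a model with k descriptors and n compounds, which is an assumption about the exact expression intended here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total

def f_ratio(r2, n, k):
    """Explained over unexplained variance, each divided by its degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Hypothetical observed and model-predicted activities for five compounds.
observed = [5.1, 5.8, 6.4, 7.0, 7.9]
predicted = [5.0, 6.0, 6.5, 6.9, 7.8]
r2 = r_squared(observed, predicted)
print(r2, f_ratio(r2, n=len(observed), k=1))
```

A model that predicts every activity exactly gives a value of 1, while a model that always predicts the mean activity gives 0.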
Z-Score: The Z score can be defined as the absolute difference between the values
predicted by the model and the activity field, divided by the square root of the mean
square error of the data set. Any compound that shows a Z-score higher than
2.5 in a QSAR model is considered an outlier.
1.4.5 Validation of equation
Validation techniques are used to identify outliers (data that are not
modeled well by the equation). Graphical analysis and cross validation are used
to characterize the robustness of the QSAR model. No single method works
best for predictiveness, interpretability and computational efficiency at once.
Cross Validation Technique: As opposed to traditional regression methods,
cross validation [45] evaluates the validity of a model by how well it
predicts data rather than how well it fits data. The analysis uses the
Leave-One-Out (LOO) scheme: each compound in turn is left out of the model
derivation and then predicted. An indication of the performance of the model is
obtained from the cross-validated $r^2$, which is defined as
$r^2 = \frac{SD - PRESS}{SD}$ (1.15)
where SD is the sum of squares of the deviations of each activity from the mean, and
PRESS is the predictive sum of squares, that is, the sum of the squared differences
between the actual and predicted values.
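The Leave-One-Out scheme and Eq. (1.15) can be sketched for the simplest case of a single descriptor. The function names and the data are hypothetical. A one-descriptor straight-line fit stands in for whichever regression method is actually used.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_cross_validated_r2(x, y):
    """Cross-validated r^2 = (SD - PRESS) / SD using Leave-One-Out predictions."""
    mean_y = sum(y) / len(y)
    sd = sum((yi - mean_y) ** 2 for yi in y)
    press = 0.0
    for i in range(len(y)):
        # Refit the model without compound i, then predict the left-out compound.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return (sd - press) / sd

# Hypothetical descriptor values and activities for six compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
q2 = loo_cross_validated_r2(descriptor, activity)
print(q2)
```

Because every prediction comes from a model that never saw the compound being predicted, this value is normally lower than the conventional $r^2$ of the full fit.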
Once the model with the highest cross-validated $r^2$ has been developed,
it is used to derive the conventional QSAR equation and the conventional $r^2$ and
S values. The final model results are then visualized as contour maps of the
coefficients.
1.4.6 Prediction of Activity
From the QSAR equations obtained, the biological activity of new
compounds may be predicted.
QSAR methods are useful in elucidating the mechanism of chemical-biological
interactions in various biomolecules, particularly enzymes,
membranes, organelles and cells. They have also been utilized for the evaluation of
absorption, distribution, metabolism and excretion phenomena in organisms and
whole-animal studies. The potential use of QSAR models for screening chemical
databases or virtual libraries before synthesis appears equally attractive to
chemical manufacturers and pharmaceutical companies.