Quantitative Structure-Activity Relationships
Quantitative Structure-Property Relationships
QSAR & QSPR
Alexandre Varnek, Faculté de Chimie, ULP, Strasbourg, FRANCE
History of QSAR
Dmitry Mendeléev (1834 – 1907)
• Russian chemist who arranged the 63 known elements into a periodic table based on atomic mass, which he published in Principles of Chemistry in 1869. Mendeléev left space for new elements, and predicted three yet-to-be-discovered elements: Ga (1875), Sc (1879) and Ge (1886).
Discoverer of the Periodic Table, an early “Chemoinformatician”
Periodic Table
Chemical properties of elements vary gradually along the two axes
History of QSAR
• 1869, D. Mendeleev – The Periodic Table of Elements
• 1868, A. Crum-Brown and T.R. Fraser – suggested that the physiological activity of molecules depends on their constitution: Activity = F(structure). They studied a series of quaternized strychnine derivatives, some of which possess activity similar to curare in paralyzing muscle.
• 1869, B.J. Richardson – the narcotic effect of primary alcohols varies in proportion to their molecular weights.
History of QSAR
• 1893, C. Richet showed that the toxicities of some simple organic compounds (ethers, alcohols, ketones) were inversely related to their solubility in water.
• 1899, H. Meyer and 1901, E. Overton found that the potencies of narcotic compounds vary with LogP.
• 1904, J. Traube found a linear relation between narcosis and surface tension.
History of QSAR
• 1937, L.P. Hammett studied the chemical reactivity of substituted benzenes: Hammett equation, Linear Free Energy Relationship (LFER)
• 1939, J. Ferguson formulated a concept linking narcotic activity, logP and thermodynamics.
• 1952-1956, R.W. Taft devised a procedure for separating polar, steric and resonance effects.
History of QSAR
• 1964, C. Hansch and T. Fujita: the biologist’s Hammett equation.
• 1964, Free and Wilson, QSAR on fragments.
• 1970s – 1980s, development of 2D QSAR (descriptors, mathematical formalism).
• 1980s – 1990s, development of 3D QSAR (pharmacophores, CoMFA, docking).
• 1990s – present, virtual screening.
1934 - Hammett: ionization constants of substituted benzoic acids (Ka × 10^5)
R        H      CH3    OCH3   F      Cl     NO2
ortho    6.27   12.3   8.06   54.1   11.4   671
meta     6.27   5.35   8.17   13.6   14.8   32.1
para     6.27   4.24   3.38   7.22   10.5   37.0
1934 - Hammett: substituent constants σ
Substituent   σ(meta)   σ(para)      Substituent   σ(meta)   σ(para)
O⁻            -0.708    -1.00        F             +0.337    +0.062
OH            +0.121    -0.37        Cl            +0.373    +0.227
OCH3          +0.115    -0.268       CO2H          +0.355    +0.406
NH2           -0.161    -0.660       COCH3         +0.376    +0.502
CH3           -0.069    -0.170       CF3           +0.43     +0.54
(CH3)3Si      -0.121    -0.072       SO2Ph         +0.61     +0.70
C6H5          +0.06     -0.01        NO2           +0.710    +0.778
H              0.000     0.000       +N(CH3)3      +0.88     +0.82
SH            +0.25     +0.15        N2+           +1.76     +1.91
SCH3          +0.15      0.00        +S(CH3)2      +1.00     +0.90
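The two Hammett tables above can be connected with a short calculation. The sketch below assumes that the first table lists ionization constants of substituted benzoic acids (as Ka × 10^5, which the slide does not state explicitly) and that σ is defined on this reference reaction as σ = log10(K_X / K_H). The function name `hammett_sigma` is only illustrative.

```python
import math

# Values from the first table, read here as Ka x 10^5 of substituted
# benzoic acids (an assumption: the slide gives no units).
KA = {
    "ortho": {"H": 6.27, "CH3": 12.3, "OCH3": 8.06, "F": 54.1, "Cl": 11.4, "NO2": 671},
    "meta":  {"H": 6.27, "CH3": 5.35, "OCH3": 8.17, "F": 13.6, "Cl": 14.8, "NO2": 32.1},
    "para":  {"H": 6.27, "CH3": 4.24, "OCH3": 3.38, "F": 7.22, "Cl": 10.5, "NO2": 37.0},
}

def hammett_sigma(position, substituent):
    """Return sigma = log10(K_X / K_H) for the reference reaction."""
    row = KA[position]
    return math.log10(row[substituent] / row["H"])

# Compare with the tabulated sigma(meta) and sigma(para) values.
for pos in ("meta", "para"):
    for sub in ("CH3", "Cl", "NO2"):
        print(f"sigma_{pos}({sub}) = {hammett_sigma(pos, sub):+.3f}")
```

Under these assumptions the computed values fall close to the tabulated constants (for example CH3 para and NO2 meta); small differences are expected because tabulated σ values are averaged over several data sets.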
Taft quantified the steric (spatial) effects using the hydrolysis of esters. Here, the size of R affects the rate of reaction by blocking nucleophilic attack by water.
In this case, the steric effects were quantified by the Taft parameter Es:
$E_s = \log k_R - \log k_{ref}$
where k is the rate constant for ester hydrolysis (R: the substituent, ref: the reference substituent). This expression is analogous to the Hammett equation.
Steric effects
Compare some extreme values:
t-Bu   -2.78   large resistance to hydrolysis
Me     -1.24   little steric resistance to hydrolysis
H       0.00   the reference substituent in the Taft equation
Es Values for Various Substituents
Substituent   H      Me      Pr      t-Bu    F       Cl      Br      OH      SH      NO2     C6H5    CN      NH2
Es            0.0    -1.24   -1.60   -2.78   -0.46   -0.97   -1.16   -0.55   -1.07   -2.52   -3.82   -0.51   -0.61
Note: H is usually used as the reference substituent (Es = 0), but sometimes another group, such as methyl (Me), is used as the reference, as in the chemical equation above; the value for H then becomes +1.24.
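The change of reference described in the note can be sketched as a constant shift. This assumes that Es is a difference of logarithms of rate constants, so that moving to a new reference simply subtracts the Es value of that reference from every entry. The function name `rereference` is hypothetical.

```python
# Es values with H as the reference substituent (from the table above).
ES_H_REF = {
    "H": 0.0, "Me": -1.24, "Pr": -1.60, "t-Bu": -2.78, "F": -0.46,
    "Cl": -0.97, "Br": -1.16, "OH": -0.55, "SH": -1.07, "NO2": -2.52,
    "C6H5": -3.82, "CN": -0.51, "NH2": -0.61,
}

def rereference(es_values, new_reference):
    """Shift all Es values so that `new_reference` has Es = 0."""
    shift = es_values[new_reference]
    return {name: round(value - shift, 2) for name, value in es_values.items()}

es_me_ref = rereference(ES_H_REF, "Me")
print(es_me_ref["H"])     # Es of H on the methyl-referenced scale
print(es_me_ref["t-Bu"])  # Es of t-Bu on the methyl-referenced scale
```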
Organophosphates must be hydrolysed to be active and it is observed that their biological activity is directly related to the Taft steric parameter Es for the substituent R by the equation:
Es may be used in other chemical reactions and to explain biological activities, for example the hydrolysis of inhibitors of acetylcholine esterase.
Steric effects
Octanol/water partition coefficient
Usually, logP is used instead of P:
logP > 0: the compound prefers hydrophobic (nonpolar) media
logP < 0: the compound prefers polar media
Biological activity as a function of logP
Hansch Analysis
Biological Activity = log 1/C, where C is the drug concentration that causes a given effect (EC50, GI50, etc.)
EL (electronic descriptor): σ, Hammett constant (σm, σp, σp0, σp+, σp−, R, F)
HPh (hydrophobicity descriptor): π, hydrophobic substituent constant; log P, octanol/water partition coefficient
ST (steric descriptor): Taft steric constant
Biological Activity = f(EL, ST, HPh) + constant
$\log 1/C = a(\log P)^2 + b \log P + \rho\sigma + \delta E_s + \text{constant}$
Hansch, C.; Fujita, T. J. Am. Chem. Soc., 1964, 86, 1616.
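As an illustration of how the Hansch equation is used, the sketch below evaluates log 1/C for given descriptor values and locates the optimal log P of the parabolic term. All coefficient values are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the slides or from any published model.

```python
def hansch_activity(log_p, sigma, e_s, a, b, rho, delta, const):
    """log 1/C = a*(logP)^2 + b*logP + rho*sigma + delta*Es + const."""
    return a * log_p**2 + b * log_p + rho * sigma + delta * e_s + const

def optimal_log_p(a, b):
    """Vertex of the parabola in logP; a maximum only if a < 0."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2 * a)

# Hypothetical coefficients, for illustration only.
coeffs = dict(a=-0.25, b=1.0, rho=0.8, delta=0.3, const=2.0)

best = optimal_log_p(coeffs["a"], coeffs["b"])
print("optimal logP:", best)
print("log 1/C at optimum (sigma = Es = 0):",
      hansch_activity(best, 0.0, 0.0, **coeffs))
```

The negative quadratic coefficient encodes the common observation that activity first rises and then falls as compounds become more lipophilic.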
Hansch Analysis
Biological Activity = f(Physicochemical properties) + constant
• Physicochemical properties can be broadly classified into three general types:
  • Electronic
  • Steric
  • Hydrophobic
Descriptors
[Scheme: Molecular Structure → (Representation) → Descriptors → (Feature Selection & Mapping) → Activities]
Quantitative Structure Activity Relationship (QSAR)
Quantitative structure-activity relationships correlate, within congeneric series of compounds, their chemical or biological activities, either with certain structural features or with atomic, group or molecular descriptors.
Katritzky, A. R.; Lobanov, V. S.; Karelson, M. Chem. Soc. Rev. 1995, 24, 279-287
Definition of molecular descriptor
The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment.
Roberto Todeschini and Viviana Consonni
A complete description of all the molecular descriptors is given in:
Handbook of Molecular Descriptors, Roberto Todeschini and Viviana Consonni, WILEY-VCH, Weinheim, Germany, 2000.
Methods and Principles in Medicinal Chemistry, Volume 11. Edited by: H. Kubinyi, R. Mannhold, H. Timmerman
Descriptors from Codessa Pro
Descriptors: calculable molecular attributes that govern particular macroscopic properties
Descriptor Families:
• Topological
• Fragments
• Receptor surface
• Structural
• Information-content
• Spatial
• Electronic
• Thermodynamic
• Conformational
• Quantum mechanical
Plus Molecular and Quantum Methods
Molecular Descriptors
Classification based on the dimensionality of structure representation:
• 1D (atom counts, MW, number of functional groups, …)
• 2D (topological indices, BCUT, TPSA, Shannon entropy, …)
• 3D (geometrical parameters, molecular surfaces, parameters calculated in quantum chemistry programs, …)
Molecular Descriptors: 1D
Constitutional descriptors
• number of atoms
• absolute and relative numbers of C, H, O, S, N, F, Cl, Br, I, P atoms
• number of bonds (single, double, triple and aromatic bonds)
• number of benzene rings, number of benzene rings divided by the number of atoms
• molecular weight and average atomic weight
• number of rotatable bonds (all terminal H atoms are ignored)
• Hbond acceptor: number of hydrogen bond acceptors
• Hbond donor: number of hydrogen bond donors
These simple descriptors reflect only the molecular composition of the compound without using the geometry or electronic structure of the molecule.
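Several of the constitutional descriptors listed above need nothing more than the molecular formula. The following is a minimal sketch, assuming a simple formula string without parentheses or charges; the function name `constitutional_descriptors` and the rounded atomic masses are illustrative choices, not part of any descriptor package.

```python
import re

# Rounded standard atomic masses for the elements named on the slide.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}

def constitutional_descriptors(formula):
    """Atom counts, relative counts, MW and average atomic weight."""
    counts = {}
    # Each match is (element symbol, optional count), e.g. ("C", "2").
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    n_atoms = sum(counts.values())
    mw = sum(ATOMIC_MASS[s] * n for s, n in counts.items())
    return {
        "n_atoms": n_atoms,
        "counts": counts,
        "relative": {s: n / n_atoms for s, n in counts.items()},
        "mw": mw,
        "avg_atomic_weight": mw / n_atoms,
    }

ethanol = constitutional_descriptors("C2H6O")
print(ethanol["n_atoms"], round(ethanol["mw"], 3))
```

Bond counts, ring counts and hydrogen-bond donors or acceptors require the connection table and are not covered by this sketch.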
Molecular Descriptors: 2D
Topological Descriptors
Descriptors based on the molecular graph representation are widely used in QSPR and QSAR studies because they help to differentiate the molecules mostly according to their size, degree of branching, flexibility and overall shape.
TI based on the adjacency matrix
• Total adjacency index: $A = \frac{1}{2} \sum_{i,j=1}^{n} a_{ij}$
• For G1 and G2, A = 5.
• This TI can only distinguish between structures having a different number of cycles (for cyclohexane A = 6).
TI based on the adjacency matrix: Zagreb group indices
Zagreb group indices were introduced to characterize branching.
• $M_1 = \sum_{i=1}^{n} \delta_i^2$
• $M_2 = \sum_{(i,j)} \delta_i \delta_j$ (sum over all bonded pairs of atoms i, j)
where the vertex degree δi is the number of σ bonds involving atom i, excluding bonds to H atoms.
M1(G1) = 4*1^2 + 2*3^2 = 22      M2(G1) = 4*(1*3) + 1*(3*3) = 21
M1(G2) = 2*1^2 + 4*2^2 = 18      M2(G2) = 2*(1*2) + 3*(2*2) = 16
Randić’s molecular connectivity index
Randić introduced a connectivity index similar to M2:
$\chi_R = \sum_{(i,j)} (\delta_i \delta_j)^{-1/2}$
M. Randić, J. Am. Chem. Soc., 97, 6609 (1975).
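The adjacency-based indices above can be checked with a few lines of code. The drawings of G1 and G2 are not recoverable from the slides, so the sketch assumes hydrogen-suppressed graphs consistent with the quoted degree counts: G1 with four degree-1 and two degree-3 vertices (the skeleton of 2,3-dimethylbutane) and G2 with two degree-1 and four degree-2 vertices (the skeleton of n-hexane). All function names are illustrative.

```python
# Hydrogen-suppressed graphs as edge lists (assumed structures, see lead-in).
G1 = [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]  # branched skeleton
G2 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # linear chain

def vertex_degrees(edges):
    """Number of bonds at each vertex (H atoms are not in the graph)."""
    deg = {}
    for i, j in edges:
        deg[i] = deg.get(i, 0) + 1
        deg[j] = deg.get(j, 0) + 1
    return deg

def total_adjacency(edges):
    """A = (1/2) * sum of a_ij, which equals the number of edges."""
    return len(edges)

def zagreb_m1(edges):
    return sum(d * d for d in vertex_degrees(edges).values())

def zagreb_m2(edges):
    deg = vertex_degrees(edges)
    return sum(deg[i] * deg[j] for i, j in edges)

def randic(edges):
    deg = vertex_degrees(edges)
    return sum((deg[i] * deg[j]) ** -0.5 for i, j in edges)

for name, g in (("G1", G1), ("G2", G2)):
    print(name, total_adjacency(g), zagreb_m1(g), zagreb_m2(g), round(randic(g), 4))
```

With these graphs, A is the same for both structures while M1, M2 and the Randić index differ, which is the point made on the slides about sensitivity to branching.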
TI based on the Distance Matrix: the Wiener Index
The entry dij of the distance matrix indicates the number of edges in the shortest path between vertices i and j.
The Wiener index (the first TI!) accounts for the branching: W(G1) = 29, W(G2) = 35
Reference: H. Wiener, J. Am. Chem. Soc., 69, 17 (1947)
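A minimal sketch of the Wiener index follows, computing shortest-path distances by breadth-first search and summing them over all vertex pairs. As the drawings are not available, G1 and G2 are assumed to be the hydrogen-suppressed skeletons of 2,3-dimethylbutane and n-hexane, which reproduce the values quoted above; the function name `wiener_index` is illustrative.

```python
from collections import deque

# Assumed hydrogen-suppressed graphs (see lead-in).
G1 = [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]  # branched skeleton
G2 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # linear chain

def wiener_index(edges):
    """Sum of shortest-path distances d_ij over all vertex pairs i < j."""
    neighbours = {}
    for i, j in edges:
        neighbours.setdefault(i, []).append(j)
        neighbours.setdefault(j, []).append(i)
    total = 0
    for start in neighbours:
        # Breadth-first search gives edge-count distances from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in neighbours[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2  # every pair was counted twice

print("W(G1) =", wiener_index(G1))
print("W(G2) =", wiener_index(G2))
```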
TPSA - Topological Polar Surface Area
Peter Ertl, Bernhard Rohde, and Paul Selzer, J. Med. Chem. 2000, 43, 3714-3717
$\text{3D-PSA} = \sum_{i=1}^{N_{fragm}} n_i \, c(\text{fragment}_i)$
[Figure: 3D PSA vs TPSA for 34 810 molecules from the World Drug Index]
Geometrical descriptors
• Moments of inertia (rigid rotator approximation): the moments of inertia characterize the mass distribution in the molecule. $I = \sum_i m_i d_i^2$
• Area: molecular surface area descriptor; describes the van der Waals area of the molecule; related to binding, transport, and solubility
• Shadow indices ¹: surface area projections
• Radius of gyration: $RoG = \sqrt{\frac{\sum_i (x_i^2 + y_i^2 + z_i^2)}{N}}$
  N: number of atoms; x, y, z: the atomic coordinates relative to the center of mass
1. Rohrbaugh, R.H., Jurs, P.C., Anal. Chim. Acta, 1987, 199, 99-109.
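The two geometrical formulas can be sketched directly from Cartesian coordinates. The example assumes the square-root form of the radius of gyration with coordinates taken relative to the center of mass, and it computes the moment of inertia about a single axis (the z axis through the center of mass) rather than the full set of principal moments. Function names and the two-atom test geometry are illustrative.

```python
import math

def center_of_mass(masses, coords):
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )

def radius_of_gyration(masses, coords):
    """sqrt( sum(x^2 + y^2 + z^2) / N ), coordinates relative to the center of mass."""
    cx, cy, cz = center_of_mass(masses, coords)
    s = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords)
    return math.sqrt(s / len(coords))

def moment_of_inertia_z(masses, coords):
    """I = sum(m_i * d_i^2), d_i being the distance from the z axis through the center of mass."""
    cx, cy, _ = center_of_mass(masses, coords)
    return sum(m * ((x - cx) ** 2 + (y - cy) ** 2) for m, (x, y, z) in zip(masses, coords))

# Hypothetical two-atom system: unit masses at x = +1 and x = -1.
masses = [1.0, 1.0]
coords = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
print(radius_of_gyration(masses, coords), moment_of_inertia_z(masses, coords))
```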
Molecular Descriptors: 3D
Steric parameters
• Length-to-breadth ratio: L/B ¹
• Molecular thickness
• Ovality ² (ratio of the actual surface area to the minimum surface area):
  $\text{ovality} = \frac{\text{surface area}}{4\pi \left( \frac{3 \times \text{volume}}{4\pi} \right)^{2/3}}$
• Molecular volume
• Sterimol parameters ³
• Taft steric parameter Es
[Figures: length L and breadth B of a molecule; molecular thickness; Sterimol parameters B1, B2, B3, B4 defined relative to the L axis]
1. Janini, G.M.; Johnston, K.; Zielinski, W. L. Anal. Chem. 1975, 47, 670.
2. Verloop, A.; Tipker, J. In Biological Activity and Chemical Structure, Buisman, J. A. K. (editor), Elsevier, Amsterdam, Netherlands, 1977, p 63.
3. Kourounakis, A.; Bodor, N. Pharm. Res. 1995, 12(8), 1199.
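The ovality definition can be illustrated numerically. The sketch assumes the formula as reconstructed from the slide: the actual surface area divided by the surface area of a sphere with the same volume, 4π(3V/4π)^(2/3). The function name `ovality` and the test shapes are illustrative.

```python
import math

def ovality(surface_area, volume):
    """Actual surface area over the surface area of the equal-volume sphere."""
    min_area = 4 * math.pi * (3 * volume / (4 * math.pi)) ** (2 / 3)
    return surface_area / min_area

# A sphere of radius 2 should give an ovality of 1.
r = 2.0
sphere = ovality(4 * math.pi * r**2, 4 / 3 * math.pi * r**3)

# A unit cube has more surface than the equal-volume sphere, so ovality > 1.
cube = ovality(6.0, 1.0)

print(round(sphere, 6), round(cube, 4))
```

Any non-spherical shape gives a value above 1, which is why ovality is used as a simple measure of deviation from sphericity.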
Quantum Chemical Descriptors
• Quantitative values calculated in QUANTUM MECHANICS (semi-empirical, HF ab initio or DFT) calculations
– Atomic charges
– LUMO: lowest unoccupied molecular orbital energy
– HOMO: highest occupied molecular orbital energy
– DIPOLE: dipole moment
– Components of the dipole moment along the inertia axes (Dx, Dy, Dz)
– Hf: heat of formation
– Mean polarizability: $\alpha = \frac{1}{3}(\alpha_{xx} + \alpha_{yy} + \alpha_{zz})$
– EA: electron affinity
– IP: ionization potential
– ΔE: energy of protonation
– Electrostatic potential: $V(\mathbf{r}) = \sum_A \frac{Z_A}{|\mathbf{R}_A - \mathbf{r}|} - \int \frac{\rho(\mathbf{r}')\, d\mathbf{r}'}{|\mathbf{r}' - \mathbf{r}|}$
Lipophilic Descriptors (2D and 3D)
Octanol-water partition coefficient
• Hansch-Leo method (ClogP)
• Rekker's method
  $\log P = \sum_{n=1}^{N} a_n f_n + \sum_{m=1}^{M} b_m F_m$
• Ghose-Crippen method (calculated logP based on summing contributions of atom types)
logP(octanol-water), logP(alkane-water), logP(chloroform-water), logP(dichloroethane-water)
• Molecular lipophilicity potential (MLP)
  $MLP(j) = \sum_{i=1}^{n} \frac{f_i}{1 + d_{ij}}$
The MLP describes how lipophilicity is distributed over the different parts of a molecule (lipophilicity maps and determination of hydrophilic and lipophilic regions of a molecule).
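The molecular lipophilicity potential can be sketched as a distance-weighted sum of fragment contributions, assuming the form MLP(j) = Σ f_i / (1 + d_ij), where d_ij is the distance between fragment i and the point j in space. The fragment values and positions below are hypothetical and serve only to show the arithmetic.

```python
import math

def mlp(point, fragments):
    """MLP at `point`: sum of f_i / (1 + d_ij) over all fragments.

    `fragments` is a list of (f_i, (x, y, z)) pairs.
    """
    return sum(f / (1.0 + math.dist(point, position)) for f, position in fragments)

# Hypothetical molecule: one lipophilic and one hydrophilic fragment.
fragments = [
    (1.0, (0.0, 0.0, 0.0)),   # lipophilic contribution
    (-0.5, (3.0, 0.0, 0.0)),  # hydrophilic contribution
]

near_lipophilic = mlp((1.0, 0.0, 0.0), fragments)
near_hydrophilic = mlp((3.0, 1.0, 0.0), fragments)
print(round(near_lipophilic, 4), round(near_hydrophilic, 4))
```

Evaluating the potential on many surface points gives the lipophilicity maps mentioned above; points near the lipophilic fragment get more positive values than points near the hydrophilic one.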
Lipophilic Descriptors
Some LogPo/w Extremes in Therapy
Compound          LogPo/w
hexachlorophene    7.54
clopimozide        7.1
permethrin         6.5
sucrose           -3.7
inulin            -3.7
arginine          -4.2
[Figure: structures of the six compounds]
What do these Drugs have in Common?
Compound           LogPo/w
Irsogladine        1.97
Chloroform         1.97
Secobarbital       1.97
Trandolapril       1.97
Acetyldigitoxine   1.97
[Figure: structures of the five compounds]
3D Hydrophobicity
All molecules have the same logP (~1.5) but different 3D MLP patterns.
[Figure: MLP surfaces colored from hydrophobic to hydrophilic]
Example of oral administration:
– Drug is exposed to a large variety of pH values:
  • Saliva pH 6.4
  • Stomach pH 1.0 – 3.5
  • Duodenum pH 5 – 7.5
  • Jejunum pH 6.5 – 8
  • Colon pH 5.5 – 6.8
  • Blood pH 7.4
– “Liver first-pass effect”
www.3dscience.com
Lipophilic Descriptors
• Log D
• Log PN: logP of the neutral form
• Log PI: logP of the ionized form
$D_{\text{system pH}} = f_N \cdot P_N + f_I \cdot P_I$
(fN, fI: fractions of the neutral and ionized forms at the pH of the system)
logD – The Calculation
• LogD may simply be calculated from the predicted logP and pKa of the singly ionized species at a certain pH:
• For acids: $\log D(pH) = \log P - \log[1 + 10^{(pH - pK_a)}]$
• For bases: $\log D(pH) = \log P - \log[1 + 10^{(pK_a - pH)}]$
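The two logD expressions translate directly into code. The sketch below applies them to a hypothetical monoprotic acid (logP = 3.0, pKa = 4.5, values chosen only for illustration) at a few of the pH values met during oral administration; it assumes, as the formulas do, that only the neutral form partitions into octanol.

```python
import math

def log_d_acid(log_p, pka, ph):
    """logD(pH) = logP - log10(1 + 10^(pH - pKa)) for a monoprotic acid."""
    return log_p - math.log10(1.0 + 10.0 ** (ph - pka))

def log_d_base(log_p, pka, ph):
    """logD(pH) = logP - log10(1 + 10^(pKa - pH)) for a monoprotic base."""
    return log_p - math.log10(1.0 + 10.0 ** (pka - ph))

# Hypothetical acid, for illustration only.
LOG_P, PKA = 3.0, 4.5
for label, ph in (("stomach", 1.0), ("saliva", 6.4), ("blood", 7.4)):
    print(f"{label:8s} pH {ph:4.1f}  logD = {log_d_acid(LOG_P, PKA, ph):.2f}")
```

At pH well below the pKa the acid is almost fully neutral and logD approaches logP; each pH unit above the pKa lowers logD by about one unit.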
Fragment Descriptors
Descriptors: Cl, amide, COOH, Br, Phenyl
Cl = 1, amide = 1, COOH = 1, Br = 0, Phenyl = 0
[Figure: structure of the example molecule]
ISIDA Fragment descriptors
Type of Fragments
I. Sequences
   I(AB, 2-4): sequences of atoms and bonds, 2 to 4 atoms
   Example: C-N=C-H; C-N=C, N=C-N; C-N, N=C, C-H
II. Augmented Atoms
   II(Hy): hybridization of neighbours is taken into account
   II(A): no hybridization
[Figure: example molecule with sequence and augmented-atom fragments highlighted]
DataSet → ISIDA FRAGMENTOR → the Pattern matrix
C-C-C-C-C-C   C-C-C-N-C-C   C=O   C-C-C-N   C-N-C-C*C
0             10            1     5         0
0             8             1     4         0
0             4             1     2         4
Calculation of Descriptors
PATTERN MATRIX + PROPERTY VALUES (-0.222, 0.973, -0.066)
LEARNING STAGE: building of models → QSAR models
VALIDATION STAGE: QSAR models filtering → selection of the most predictive ones
Example: linear QSPR model
$\text{Property} = a_0 + \sum_{i=1}^{k} a_i D_i$
PROPERTYcalc = -0.36 * N(C-C-C-N-C-C) + 0.27 * N(C=O) + 0.12 * N(C-N-C*C) + …
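A linear fragment-based model of this kind is just a weighted sum of fragment counts. The sketch below uses the three coefficients shown in the example and leaves out the terms hidden behind the ellipsis; the intercept a0 is set to zero and the fragment counts are hypothetical, so the printed number is not a real prediction.

```python
# Coefficients quoted in the example model; the remaining terms ("…")
# are unknown and therefore omitted here.
COEFFICIENTS = {
    "C-C-C-N-C-C": -0.36,
    "C=O": 0.27,
    "C-N-C*C": 0.12,
}

def predict_property(fragment_counts, coefficients=COEFFICIENTS, a0=0.0):
    """Property = a0 + sum(a_i * D_i), with D_i the count of fragment i."""
    return a0 + sum(
        a * fragment_counts.get(fragment, 0) for fragment, a in coefficients.items()
    )

# Hypothetical row of a pattern matrix (fragment -> occurrence count).
molecule = {"C-C-C-N-C-C": 10, "C=O": 1, "C-N-C*C": 0}
print(round(predict_property(molecule), 2))
```

In the learning stage the coefficients would be fitted to the property values of the training set, and the validation stage would keep only the models that predict unseen compounds well.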
Software
DRAGON
The software DRAGON calculates 1664 molecular descriptors divided into 20 blocks.
CODESSA Pro
Calculates a large variety of molecular descriptors on the basis of the 3D geometrical structure and/or quantum-chemical parameters; develops (multi)linear and non-linear QSPR models.
ISIDA program
Calculates fragment descriptors; develops (multi)linear and non-linear QSPR models.