QSAR study on diketo acid and carboxamide derivatives as potent HIV-1 integrase inhibitors
Presented by Olayide Arodola (Master's student – Pharmaceutical Chemistry)
Aim of this study
The aim of this study is to assess how accurately the QSAR method predicts the activities of compounds compared with their experimentally measured biological activities.
To this end, a two-dimensional (2D) QSAR model was used to analyze 40 diketo acid and carboxamide-based compounds as potential HIV-1 integrase inhibitors.
KEY WORDS:
Diketo acid and carboxamide derivatives
2D-QSAR (two-dimensional quantitative structure-activity relationship)
GFA (genetic function approximation)
Integrase inhibitor
SOFTWARE USED IN THIS STUDY
ChemDraw Ultra 10.0 (to draw the 2D structures of the compounds)
Discovery Studio v3.5 (to perform the QSAR analysis)
About HIV-1 integrase
The integration of HIV-1 DNA into the host chromosome involves a series of DNA cutting and joining reactions. The first step in the integration process is 3′-end processing. In the second step, termed DNA strand transfer, the processed viral DNA end is inserted into the target DNA. Because integrase is essential for viral replication, it represents a potential target for antiretroviral drugs.
• First, a quick reminder: what do you understand by ‘drug’?
• A very broad definition of a drug would include “all chemicals other than food that affect living processes”. If it helps the body, it is a medicine; if it harms the body, it is a poison.
Nowadays, we face the problem of screening a huge number of molecules in order to determine:
• whether they are toxic to humans
• whether they have an effect on viruses, e.g. HIV, HPV (cervical cancer), H1N1 (flu), Ebola, etc.
• Such screening relies on laborious experiments.
• Researchers therefore developed a process that relates a series of molecular features to biological activity or chemical reactivity. This is expected to reduce the number of laborious and expensive experiments by selecting a small number of promising compounds for later synthesis.
QSAR
• QSAR is a mathematical relationship between the biological activity of a molecular system and its physical and chemical characteristics, i.e. QSAR is an attempt to correlate biological activity with the physicochemical properties of a set of molecules.
• In pharmacology, biological activity describes the beneficial or adverse effects of a drug on living matter.
• The physicochemical properties of a compound are simply its physical and chemical properties.
• The first application of QSAR is attributed to Hansch (1969), who developed an equation that related biological activity to certain physicochemical properties of a set of structures.
WHY QSAR
Placing 10 different groups at 4 positions of a benzene ring would require the synthesis of 10⁴ (10,000) compounds.
Solution: synthesize a small number of compounds and, from their data, derive rules to predict the biological activity of other compounds.
Compounds + biological activity → QSAR (correlate chemical structure with activity using a statistical approach) → new compounds with improved biological activity
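The 10⁴ figure is a simple product: each of the 4 positions can independently carry any of the 10 groups, giving 10 × 10 × 10 × 10 analogues. A minimal Python check, assuming repeated groups are allowed and ignoring symmetry-equivalent substitution patterns (the group labels are hypothetical placeholders):

```python
from itertools import product

# Hypothetical labels for the 10 substituent groups
groups = [f"G{i}" for i in range(10)]
n_positions = 4

# Each analogue assigns one group to each of the 4 ring positions
analogues = list(product(groups, repeat=n_positions))
print(len(analogues))  # 10000
```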
QSAR and Drug Design
BASIC PRINCIPLES
A QSAR normally takes the general form of a linear equation:
$$\text{Biological activity} = \text{Const} + C_1P_1 + C_2P_2 + C_3P_3 + \dots + C_nP_n$$
where the parameters $P_1$ through $P_n$ are computed for each molecule in the series, and the coefficients $C_1$ through $C_n$ are obtained by regression, i.e. by fitting the variation in the parameters to the variation in biological activity.
• $A = k_1d_1 + k_2d_2 + k_3d_3 + \dots + k_nd_n + \text{Const}$
A – biological activity
$d_i$ – structural properties (descriptors)
$k_i$ – regression coefficients
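The linear form above can be fitted by ordinary least squares (multiple linear regression). The sketch below is illustrative only: it fits $A = k_1d_1 + \dots + k_nd_n + \text{Const}$ with NumPy on a hypothetical, noise-free toy data set, so the fitted coefficients should recover the ones used to generate it.

```python
import numpy as np

def fit_linear_qsar(descriptors, activity):
    """Least-squares fit of A = k1*d1 + ... + kn*dn + Const.

    descriptors: (n_compounds, n_descriptors) array of d values
    activity:    (n_compounds,) array of A values (e.g. pIC50)
    Returns (k, const).
    """
    X = np.column_stack([descriptors, np.ones(len(activity))])  # last column -> Const
    coef, *_ = np.linalg.lstsq(X, activity, rcond=None)
    return coef[:-1], coef[-1]

def predict(descriptors, k, const):
    return descriptors @ k + const

# Hypothetical toy data: 6 compounds, 2 descriptors, noise-free activity
d = np.array([[1.0, 0.5], [2.0, 0.1], [3.0, 0.9], [4.0, 0.3], [5.0, 0.7], [6.0, 0.2]])
true_k, true_const = np.array([0.4, -1.2]), 0.25
A = d @ true_k + true_const

k, const = fit_linear_qsar(d, A)
print(np.round(k, 3), round(float(const), 3))
```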
Several statistical methods are used to develop QSAR models, including:
Multiple linear regression (MLR)
Principal component analysis (PCA)
Partial least squares (PLS)
Genetic function approximation (GFA)
Why GFA
GFA was used for variable selection in developing the QSAR models in this study. The purpose of variable selection is to keep the descriptors that contribute significantly to prediction and to discard the others, using a fitness function.
Ability to build multiple models rather than a single model
Incorporation of the lack-of-fit (LOF) error measure, which resists over-fitting
Automatic removal of outliers (e.g. the value 100 in the series 1, 3, 6, 9, 100)
Provision of additional information not available from other statistical regression analyses
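The GFA features listed above can be illustrated with a heavily simplified genetic search over descriptor subsets, scored by Friedman's lack-of-fit measure, LOF = LSE / (1 − (c + d·p)/M)², where M is the number of training compounds, c the number of non-constant terms, p the number of descriptors used and d a user-set smoothing parameter (assumed to be 1.0 here). This is a hypothetical sketch, not Discovery Studio's implementation: the crossover, mutation and tournament rules, the population size and the toy data are all assumptions, and no outlier removal is attempted. It does show two of the listed points: the LOF penalty discourages adding terms, and the result is a ranked population of models rather than a single model.

```python
import numpy as np

rng = np.random.default_rng(0)

def lof(X, y, subset, d=1.0):
    """Friedman lack-of-fit: LSE / (1 - (c + d*p)/M)**2 for a linear model on `subset`."""
    M = len(y)
    cols = sorted(subset)
    A = np.column_stack([X[:, cols], np.ones(M)])      # descriptor terms + constant
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    lse = np.mean((y - A @ coef) ** 2)                 # least-squares error
    c = p = len(cols)                                  # linear terms: one descriptor per term
    denom = 1.0 - (c + d * p) / M
    return np.inf if denom <= 0 else lse / denom ** 2

def crossover(a, b):
    """Child inherits a random subset of each parent's terms."""
    child = frozenset([t for t in a if rng.random() < 0.5] +
                      [t for t in b if rng.random() < 0.5])
    return child or frozenset([min(a)])                # never return an empty model

def mutate(s, n_desc, max_terms):
    """Drop a random term, or add a random descriptor."""
    s = set(s)
    if len(s) > 1 and (len(s) >= max_terms or rng.random() < 0.5):
        s.remove(int(rng.choice(sorted(s))))
    else:
        s.add(int(rng.integers(n_desc)))
    return frozenset(s)

def gfa(X, y, pop_size=30, generations=300, max_terms=4, p_mutate=0.3):
    n_desc = X.shape[1]
    pop = [frozenset(rng.choice(n_desc, size=int(rng.integers(1, max_terms + 1)),
                                replace=False).tolist())
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = [lof(X, y, s) for s in pop]
        # Two size-2 tournaments pick the parents (lower LOF wins)
        i, j = (min(rng.choice(pop_size, 2, replace=False), key=lambda k: scores[k])
                for _ in range(2))
        child = crossover(pop[i], pop[j])
        if rng.random() < p_mutate:
            child = mutate(child, n_desc, max_terms)
        if len(child) > max_terms or child in pop:
            continue
        worst = int(np.argmax(scores))
        if lof(X, y, child) < scores[worst]:
            pop[worst] = child                         # child replaces the worst model
    return sorted(pop, key=lambda s: lof(X, y, s))     # best (lowest LOF) first

# Hypothetical toy data: 40 compounds x 8 descriptors; activity depends on columns 0 and 2
X = rng.normal(size=(40, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.1, size=40)
models = gfa(X, y)
print(sorted(models[0]), round(float(lof(X, y, models[0])), 4))
```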
Cpd | Core | R1 | R2 | R3 | IC50 (μM) | pIC50* | Predicted pIC50 (equation 1)
1 | A | Pyrrole | 4'-F | - | 0.17 | 0.770 | 0.409
2 | A | o-Xylene | - | - | 5.67 | -0.754 | 0.105
3 | A | 1,2-(CH3)-1H-pyrrole | - | - | 0.22 | 0.658 | 0.377
4a | A | 2,3-(CH3) thiophene | - | - | 0.18 | 0.745 | 0.326
5 | A | 2,4-(CH3) thiophene | - | - | 0.16 | 0.796 | 0.498
6 | A | 1,3-(CH3)-1H-pyrrole | - | - | 0.5 | 0.301 | 0.616
7 | A | 2,5-(CH3) thiophene | - | - | 0.5 | 0.301 | 0.608
8a | B | 4'-Cl | - | - | 1.0 | 0.000 | 0.485
9 | B | 3'-F | - | - | 0.25 | 0.602 | 0.463
10 | B | - | 4'-OCH3 | - | 0.15 | 0.824 | 0.505
11 | B | - | 3'-OCH3 | - | 0.14 | 0.854 | 0.591
12a | C | 4'-F | - | - | 0.10 | 1.000 | 1.178
13 | C | H | - | - | 0.23 | 0.638 | 0.971
14 | C | 2'-Cl | - | - | 0.37 | 0.432 | 1.280
15 | C | 3'-Cl | - | - | 0.04 | 1.398 | 1.239
16a | C | 4'-Cl | - | - | 0.38 | 0.420 | 1.213
17 | C | 4'-F, 3'-Cl | - | - | 0.04 | 1.398 | 1.267
18 | C | 4'-F | CN | - | 0.02 | 1.699 | 1.580
19 | C | 4'-F | Br | - | 0.03 | 1.523 | 1.276
20a | C | 4'-F | I | - | 0.02 | 1.699 | 1.482
21 | D | N(CH3)3 | tetrahydro-2H-pyran | 1'-(CH3)-4'-F benzene | 0.002 | 2.699 | 2.495
22 | D | NH-CO-CH3 | CH3 | 1'-(CH3)-4'-F benzene | 0.007 | 2.155 | 1.681
23 | D | NH-SO2-CH3 | CH3 | 1'-(CH3)-4'-F benzene | 0.008 | 2.097 | 1.973
24a | D | NH-CO-N(CH3)2 | CH3 | 1'-(CH3)-4'-F benzene | 0.018 | 1.745 | 1.580
25 | D | NH-SO2-N(CH3)2 | CH3 | 1'-(CH3)-4'-F benzene | 0.012 | 1.921 | 1.957
26 | D | NH-CO-CO-N(CH3)2 | CH3 | 1'-(CH3)-4'-F benzene | 0.01 | 2.000 | 1.704
27 | D | NH-CO-CO-OCH3 | CH3 | 1'-(CH3)-4'-F benzene | 0.015 | 1.824 | 1.797
28a | D | NH-CO-CO-OH | CH3 | 1'-(CH3)-4'-F benzene | 0.004 | 2.398 | 1.594
29 | D | N(CH3)-CO-CO-N(CH3)2 | CH3 | 1'-(CH3)-4'-F benzene | 0.015 | 1.824 | 1.943
30 | D | NH-CO-CO-1,4-(CH3) morpholine | CH3 | 1'-(CH3)-4'-F benzene | 0.02 | 1.699 | 1.970
31 | D | NH-CO-CO-1,4-(CH3) piperazine | CH3 | 1'-(CH3)-4'-F benzene | 0.026 | 1.585 | 1.391
32a | D | NH-CO-CO-N(CH3)2 | CH3 | 1'-(C2H5)-2',3'-(OCH3) | 0.021 | 1.678 | 1.937
33 | D | NH-CO-CO-N(CH3)2 | CH3 | 1'-(C2H5)-3'-Cl-4'-F benzene | 0.009 | 2.046 | 1.739
34 | D | NH-CO-pyridine | CH3 | 1'-(CH3)-4'-F benzene | 0.02 | 1.699 | 2.020
35 | D | NH-CO-pyridazine | CH3 | 1'-(CH3)-4'-F benzene | 0.015 | 1.824 | 1.931
36a | D | NH-CO-pyrimidine | CH3 | 1'-(CH3)-4'-F benzene | 0.007 | 2.155 | 1.936
37 | D | NH-CO-oxazole | CH3 | 1'-(CH3)-4'-F benzene | 0.007 | 2.155 | 2.325
38 | D | NH-CO-thiazole | CH3 | 1'-(CH3)-4'-F benzene | 0.008 | 2.097 | 2.221
39 | D | NH-CO-1H-imidazole | CH3 | 1'-(CH3)-4'-F benzene | 0.006 | 2.222 | 2.357
40a | D | NH-CO-1,3,4-oxadiazole | CH3 | 1'-(CH3)-4'-F benzene | 0.015 | 1.824 | 2.656
a Test-set compound.
* pIC50 = -log10(IC50), with IC50 expressed in μM.
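The tabulated pIC50 values are consistent with −log10 of IC50 taken in μM (e.g. compound 1: −log10 0.17 ≈ 0.770). The conventional pIC50 uses molar units, which simply adds 6 to every value; a constant shift like this changes only the intercept of a linear QSAR equation, not the descriptor coefficients or R². A small sketch of both conversions (the function names are illustrative):

```python
import math

def pic50_from_um(ic50_um):
    """pIC50 as tabulated above: -log10 of IC50 expressed in micromolar."""
    return -math.log10(ic50_um)

def pic50_molar(ic50_um):
    """Conventional pIC50: -log10 of IC50 in mol/L (1 uM = 1e-6 M)."""
    return -math.log10(ic50_um * 1e-6)

# IC50 values (uM) of a few compounds from the table
for cpd, ic50 in [("1", 0.17), ("2", 5.67), ("8a", 1.0), ("21", 0.002)]:
    print(cpd, round(pic50_from_um(ic50), 3), round(pic50_molar(ic50), 3))
```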
Methods
Of the 40 compounds, 30 were used as a training set and 10 as a test set to evaluate the external predictivity of the QSAR equation.
The 2D structures were drawn with ChemDraw Ultra 10.0 and then converted to 3D structures with reasonable conformations using Discovery Studio v3.5.
A large number of descriptors were then calculated (e.g. ALogP, molecular weight, molar refractivity, dipole moment, heat of formation, radius of gyration, Wiener index and Zagreb index).
2D QSAR analysis was carried out using genetic function approximation (GFA).
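In the data table, the compounds flagged "a" are exactly the ten test-set compounds: 4, 8, 12, ..., 40, i.e. every fourth compound by number, which places test compounds in all four core series (A to D). The snippet below only reproduces that split; the "every fourth compound" rule is inferred from the table, since the study does not state how the test set was chosen.

```python
# Compounds flagged "a" in the data table form the test set
all_ids = range(1, 41)
test_ids = [i for i in all_ids if i % 4 == 0]   # every fourth compound: 4, 8, ..., 40
train_ids = [i for i in all_ids if i % 4 != 0]

print(f"training: {len(train_ids)} compounds, test: {len(test_ids)} compounds")
print("test set:", test_ids)
```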
RESULT
A QSAR model was generated for integrase inhibitory activity. To select the optimal set of descriptors, systematic variable selection was carried out in a stepwise forward manner with leave-one-out (LOO) cross-validation. The three best QSAR equations generated with the GFA approach and LOO validation are shown in the table below.
Equation | Model | R² | Q² | LOF | P-value
1 | Y = −11.65 − 0.0024929W + 0.088809Z + 0.01936M + 1.1879R | 0.820 | 0.558 | 0.193 | 5.174e-09
2 | Y = −12.896 − 0.0028585W + 0.077907Z + 0.020068M + 0.015681Ms | 0.812 | 0.470 | 0.202 | 9.270e-09
3 | Y = −9.6736 − 0.0020098W + 0.078883Z + 0.89779R | 0.790 | 0.620 | 0.190 | 5.641e-09
Y: pIC50; W, Z, M, R, Ms: descriptors; R²: squared correlation coefficient; Q²: cross-validated (LOO) R²; LOF: lack of fit; P-value: significance level
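Applying one of these equations to a compound is a weighted sum of its descriptor values plus the constant. The sketch below hard-codes the three equations from the table; the descriptor values in `example` are invented placeholders, and meaningful predictions would require W, Z, M, R and Ms computed in the same way as in the study (Discovery Studio v3.5).

```python
# Constant and descriptor coefficients of equations 1-3 (Y = pIC50), copied from the table
MODELS = {
    1: (-11.65,  {"W": -0.0024929, "Z": 0.088809, "M": 0.01936,  "R": 1.1879}),
    2: (-12.896, {"W": -0.0028585, "Z": 0.077907, "M": 0.020068, "Ms": 0.015681}),
    3: (-9.6736, {"W": -0.0020098, "Z": 0.078883, "R": 0.89779}),
}

def predict_pic50(model, descriptors):
    """Evaluate equation `model` for one compound given its descriptor values."""
    const, coefs = MODELS[model]
    missing = set(coefs) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptor values: {sorted(missing)}")
    return const + sum(c * descriptors[name] for name, c in coefs.items())

# Hypothetical descriptor values, for illustration only (not from the study)
example = {"W": 2500.0, "Z": 150.0, "M": 60.0, "R": 5.0, "Ms": 80.0}
for eq in MODELS:
    print(eq, round(predict_pic50(eq, example), 3))
```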
[Figure: 2D structures of compounds 17, 19, 30, 34 and 35, with IC50 values of 0.04, 0.03, 0.02, 0.02 and 0.015 μM, respectively]
Equation 1 (rounded): pIC50 = −11.65 − 0.0025W + 0.089Z + 0.019M + 1.188R
Training set (30 compounds)
Cmpd | pIC50 | Predicted1 | Residual1 | Predicted2 | Residual2 | Predicted3 | Residual3
1 | 0.770 | 0.409 | 0.361 | 0.393 | 0.377 | 0.274 | 0.496
2 | -0.754 | 0.105 | -0.859 | 0.407 | -1.161 | 0.335 | -1.089
3 | 0.658 | 0.377 | 0.281 | 0.397 | 0.261 | 0.261 | 0.397
5 | 0.796 | 0.498 | 0.298 | 0.618 | 0.178 | 0.228 | 0.568
6 | 0.301 | 0.616 | -0.315 | 0.536 | -0.235 | 0.422 | -0.121
7 | 0.301 | 0.608 | -0.307 | 0.398 | -0.097 | 0.512 | -0.211
9 | 0.602 | 0.463 | 0.139 | 0.330 | 0.272 | 0.602 | 0.000
10 | 0.824 | 0.505 | 0.319 | 0.563 | 0.261 | 0.692 | 0.132
11 | 0.854 | 0.591 | 0.263 | 0.900 | -0.046 | 0.725 | 0.129
13 | 0.638 | 0.971 | -0.333 | 0.676 | -0.038 | 1.017 | -0.379
14 | 0.432 | 1.280 | -0.848 | 1.316 | -0.884 | 1.276 | -0.844
15 | 1.398 | 1.239 | 0.159 | 1.166 | 0.232 | 1.260 | 0.138
17 | 1.398 | 1.267 | 0.131 | 1.401 | -0.003 | 1.340 | 0.058
18 | 1.699 | 1.580 | 0.119 | 1.311 | 0.388 | 1.559 | 0.139
19 | 1.523 | 1.276 | 0.247 | 1.464 | 0.059 | 1.362 | 0.160
21 | 2.699 | 2.495 | 0.204 | 2.796 | -0.097 | 2.334 | 0.365
22 | 2.155 | 1.681 | 0.474 | 1.672 | 0.483 | 1.713 | 0.442
23 | 2.097 | 1.973 | 0.124 | 2.034 | 0.063 | 1.989 | 0.108
25 | 1.921 | 1.957 | -0.036 | 1.998 | -0.077 | 1.975 | -0.054
26 | 2.000 | 1.704 | 0.296 | 1.724 | 0.276 | 1.777 | 0.223
27 | 1.824 | 1.797 | 0.027 | 1.707 | 0.117 | 1.867 | -0.043
29 | 1.824 | 1.943 | -0.119 | 1.851 | -0.027 | 1.883 | -0.059
30 | 1.699 | 1.970 | -0.271 | 1.926 | -0.227 | 1.929 | -0.230
31 | 1.585 | 1.391 | 0.194 | 1.499 | 0.086 | 1.594 | -0.009
33 | 2.046 | 1.739 | 0.307 | 1.845 | 0.201 | 1.860 | 0.186
34 | 1.699 | 2.020 | -0.321 | 1.809 | -0.110 | 2.154 | -0.455
35 | 1.824 | 1.931 | -0.107 | 1.787 | 0.037 | 2.017 | -0.193
37 | 2.155 | 2.325 | -0.170 | 2.302 | -0.147 | 2.090 | 0.065
38 | 2.097 | 2.221 | -0.124 | 2.243 | -0.146 | 2.109 | -0.012
39 | 2.222 | 2.357 | -0.135 | 2.219 | 0.002 | 2.133 | 0.089
Predicted n and Residual n refer to equation n; residual = observed pIC50 − predicted pIC50.
Test set (10 compounds)
Cmpd | pIC50 | Predicted1 | Residual1 | Predicted2 | Residual2 | Predicted3 | Residual3
4 | 0.745 | 0.326 | 0.419 | 0.287 | 0.458 | 0.282 | 0.463
8 | 0.000 | 0.485 | -0.485 | 0.761 | -0.761 | 0.587 | -0.587
12 | 1.000 | 1.178 | -0.178 | 0.836 | 0.164 | 1.215 | -0.215
16 | 0.420 | 1.212 | -0.792 | 1.259 | -0.839 | 1.233 | -0.813
20 | 1.699 | 1.482 | 0.217 | 1.784 | -0.085 | 1.473 | 0.226
24 | 1.745 | 1.580 | 0.165 | 1.471 | 0.274 | 1.634 | 0.111
28 | 2.398 | 1.594 | 0.804 | 1.500 | 0.898 | 1.706 | 0.692
32 | 1.678 | 1.937 | -0.260 | 1.877 | -0.199 | 1.961 | -0.283
36 | 2.155 | 1.936 | 0.219 | 1.765 | 0.390 | 2.096 | 0.059
40 | 1.824 | 2.656 | -0.832 | 2.360 | -0.536 | 2.371 | -0.547
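The residual columns are observed minus predicted pIC50. As a check on the table, the snippet below recomputes the equation-1 residuals for the 10 test compounds and summarises them with an RMSE and an external R². Here external R² is taken relative to the test-set mean; some QSAR authors use the training-set mean instead, which gives a slightly different value. The values are copied from the table; the choice of summary statistics is an assumption, since the study does not report them.

```python
import numpy as np

# Observed and equation-1 predicted pIC50 for the 10 test-set compounds (table above)
ids  = [4, 8, 12, 16, 20, 24, 28, 32, 36, 40]
obs  = np.array([0.745, 0.000, 1.000, 0.420, 1.699, 1.745, 2.398, 1.678, 2.155, 1.824])
pred = np.array([0.326, 0.485, 1.178, 1.212, 1.482, 1.580, 1.594, 1.937, 1.936, 2.656])

residual = obs - pred                          # residual = observed - predicted
rmse = np.sqrt(np.mean(residual ** 2))
# External R2 relative to the mean observed test activity
r2_ext = 1.0 - np.sum(residual ** 2) / np.sum((obs - obs.mean()) ** 2)

for i, r in zip(ids, residual):
    print(f"{i:>2}  {r:+.3f}")
print(f"RMSE = {rmse:.3f}, external R2 = {r2_ext:.3f}")
```

The recomputed residuals match the Residual1 column to within rounding (compound 32 gives −0.259 against the tabulated −0.260). The largest deviations are for compounds 16, 28 and 40, each with an absolute residual of about 0.8.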
Conclusion
From the above results, the radius of gyration, Zagreb index, Wiener index and minimized energy are statistically important descriptors: the model built on them gives R² = 0.8209 with a P-value of 5.174e-09, which is highly significant.
This QSAR method can be used to predict the activities of future HIV-1 integrase inhibitors.
My Current Research
Could FDA-approved anti-HIV drugs be promising anti-cancer agents? An answer from extensive molecular dynamics analyses
Acknowledgement
Dr Mahmoud Soliman (my supervisor) & the lab members
CHPC (technical support)
UKZN School of Health Sciences (financial support)
Thank you