Looking for the Best QSAR and Docking Methods
Guillermo Restrepo
Laboratorio de Química Teórica, Universidad de Pamplona, Pamplona, Colombia
Outline
o Ranking
o How we rank
o Ranking problems
o QSAR models
o Docking programs
o Conclusions
o Acknowledgements
“Good” “Bad”
We love rankings!
[Figure: La romería de San Isidro, Goya]
How do we rank?

     Beauty  Intelligence  Glamour
a       0         8           2
b       9        13          12
c       2         1           5
d      10        14          11
e       4         3           7

[Figure: different rankings of a to e, depending on priorities and subjectivities]
     q1   q2   q3
a     0    8    2
b     9   13   12
c     2    1    5
d    10   14   11
e     4    3    7

Comparable: x ≥ y if $q_i(x) \geq q_i(y)$ for every attribute i (x is strictly better than y if, in addition, $q_j(x) > q_j(y)$ for at least one attribute j).
Incomparable: if $q_j(x) < q_j(y)$ for at least one attribute j while $q_i(x) > q_i(y)$ for at least one other attribute i, x and y are incomparable.
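The comparison rule can be tried out on the a to e table with a few lines of Python. This is a minimal sketch, not code from the cited work: the function name `compare` and the `||` symbol for an incomparable pair are our own conventions, and larger attribute values are assumed to be better.

```python
# Toy data from the slide: attributes q1, q2, q3 for objects a to e
data = {
    "a": (0, 8, 2),
    "b": (9, 13, 12),
    "c": (2, 1, 5),
    "d": (10, 14, 11),
    "e": (4, 3, 7),
}


def compare(x, y):
    """Return '>' if x dominates y, '<' if y dominates x,
    '=' if all attributes are equal, '||' if x and y are incomparable."""
    qx, qy = data[x], data[y]
    if qx == qy:
        return "="
    if all(a >= b for a, b in zip(qx, qy)):
        return ">"
    if all(a <= b for a, b in zip(qx, qy)):
        return "<"
    return "||"


for x, y in [("b", "a"), ("c", "e"), ("b", "d"), ("a", "e")]:
    print(x, compare(x, y), y)
```

Running it prints `b > a`, `c < e`, `b || d` and `a || e`: e dominates c in all three attributes, while b wins on Glamour but loses on Beauty and Intelligence against d.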
Hasse diagram
[Figure: Hasse diagram of a to e and its total set of linear extensions, labelled A to F]
Brüggemann, R.; Restrepo, G.; Voigt, K. J. Chem. Inf. Model. 2006, 46, 894-902.
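For five objects the total set of linear extensions can be found by brute force: try all 5! = 120 orderings and keep those in which every object is ranked above each object it dominates. A minimal Python sketch under that assumption (the helper names are hypothetical, and the approach is only practical for small sets):

```python
from itertools import permutations

# Toy data from the slide: attributes q1, q2, q3 for objects a to e
data = {
    "a": (0, 8, 2),
    "b": (9, 13, 12),
    "c": (2, 1, 5),
    "d": (10, 14, 11),
    "e": (4, 3, 7),
}


def dominates(x, y):
    """True if x is at least as good as y in every attribute and not identical."""
    return data[x] != data[y] and all(a >= b for a, b in zip(data[x], data[y]))


def linear_extensions(objects):
    """Keep every ordering (rank 1 = lowest) that respects all dominance relations."""
    result = []
    for order in permutations(objects):
        rank = {obj: i + 1 for i, obj in enumerate(order)}
        if all(rank[x] > rank[y]
               for x in objects for y in objects if dominates(x, y)):
            result.append(order)
    return result


les = linear_extensions(sorted(data))
print(len(les))  # 6
for order in les:
    print(" < ".join(order))
```

It finds the six extensions A to F; in every one of them b and d occupy the two top ranks and c is ranked below e.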
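Occurrences r of each object at ranks 1 to 5 (rank 1 = lowest), counted over the linear extensions A to F:
     1  2  3  4  5
a    2  2  2  0  0
b    0  0  0  3  3
c    4  2  0  0  0
d    0  0  0  3  3
e    0  2  4  0  0

Ranking probabilities p at ranks 1 to 5:
     1      2      3      4    5
a    0.333  0.333  0.333  0    0
b    0      0      0      0.5  0.5
c    0.667  0.333  0      0    0
d    0      0      0      0.5  0.5
e    0      0.333  0.667  0    0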
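Ranking probability of having object n at rank m:
$p_{mn} = r_{mn} / |LE|$
where $r_{mn}$ is the number of occurrences of object n at rank m and $|LE|$ is the number of linear extensions.
Average rank of n:
$\mathrm{Av\,rk}_n = \sum_m m \, p_{mn}$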
[Figure: objects placed on the rank axis 1 to 5 by average rank: c, a, e, then b and d]
     Min rk  Av rk  Max rk  Var
a    1       2      3       2
b    4       4.5    5       1
c    1       1.333  2       1
d    4       4.5    5       1
e    2       2.667  3       1
(Var: Max rk − Min rk)
Restrepo, G.; Brüggemann, R.; Weckert, M.; Gerstmann, S.; Frank, H. MATCH Commun. Math. Comput. Chem. 2008, 59, 555-584.
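Since $r_{mn}$ is a plain count over the linear extensions, the rank probabilities and average ranks can be reproduced by enumeration. A Python sketch that rebuilds the toy tables (exact `Fraction` arithmetic is our choice; the final statistic is the spread Max rk − Min rk, which matches the Var column for these data):

```python
from fractions import Fraction
from itertools import permutations

# Toy data from the slide: attributes q1, q2, q3 for objects a to e
data = {
    "a": (0, 8, 2),
    "b": (9, 13, 12),
    "c": (2, 1, 5),
    "d": (10, 14, 11),
    "e": (4, 3, 7),
}


def dominates(x, y):
    return data[x] != data[y] and all(a >= b for a, b in zip(data[x], data[y]))


objects = sorted(data)
# All linear extensions, each listed from rank 1 (lowest) to rank n (highest)
les = [order for order in permutations(objects)
       if all(order.index(x) > order.index(y)
              for x in objects for y in objects if dominates(x, y))]

n = len(objects)
stats = {}
for obj in objects:
    # r[m]: number of linear extensions with obj at rank m + 1
    r = [sum(1 for order in les if order.index(obj) == m) for m in range(n)]
    p = [Fraction(rm, len(les)) for rm in r]          # p_mn = r_mn / |LE|
    av = sum((m + 1) * pm for m, pm in enumerate(p))  # Av rk_n = sum_m m * p_mn
    ranks = [m + 1 for m in range(n) if r[m]]
    stats[obj] = (min(ranks), av, max(ranks), max(ranks) - min(ranks))
    print(obj, r, f"{float(av):.3f}", stats[obj][0], stats[obj][2], stats[obj][3])
```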
Best QSAR methods
Case study:
o Mutagenicity
o 95 aromatic & heteroaromatic amines
o 13 QSAR models
o Two statistics: r² and s
Model label     r²     s      Method      Descriptors
Basak 1997      0.797  0.910  Linear      Topological and geometric
Basak 1998      0.790  0.920  Linear      Topological, geometric and quantum chemical
Maran 1999      0.834  0.811  Linear      #rings, γ-polarizability, HASA1 (SCF/AM1), HDSA (SCF/AM1), Etot(C-C), Etot(C-N)
Karelson 2000a  0.834  0.811  Linear      #rings, γ-polarizability, HASA1 (SCF/AM1), HDSA (SCF/AM1), Etot(C-C), Etot(C-N)
Karelson 2000b  0.895  1.333  Non-linear  Ic, 3κ, #H acceptor sites, max valence N, PNSA1, γ-polarizability
Basak 2001a     0.794  0.912  Linear      Expanded set of topological, geometric and quantum chemical
Basak 2001b     0.821  0.840  Linear      Expanded set of topological, geometric, quantum chemical and electrotopological
Cash 2001       0.767  0.979  Linear      Electrotopological
Toropov 2001    0.758  0.950  Linear      Graphs weighted with contributions of atomic orbitals
Vračko 2004a    0.793  0.840  Non-linear  Topostructural, topochemical and geometric
Vračko 2004b    0.793  0.840  Non-linear  Topostructural, topochemical, geometric and quantum chemical
Cash 2005a      0.760  0.950  Linear      Electrotopological
Cash 2005b      0.750  0.890  Linear      Electrotopological
594 linear extensions
[Figure: Hasse diagram of the 13 QSAR models based on r² and s]
o Maran 1999 & Karelson 2000a are better than 10 other models.
o It is not possible to state whether Karelson 2000b is better or worse than any other model.
o There are better models than Cash 2001 & Toropov 2001.
Restrepo, G.; Basak, S. C.; Mills, D. Curr. Comput.-Aided Drug Des. 2011, 7, 109-121.
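The two QSAR statistics point in opposite directions, so one way to apply the same dominance test (an assumption, but consistent with the conclusions above) is to treat a higher r² and a lower s as better, i.e. to negate s. A minimal Python sketch with the values from the QSAR table; the helper names are ours:

```python
# (r2, s) for the 13 mutagenicity QSAR models
models = {
    "Basak 1997": (0.797, 0.910),
    "Basak 1998": (0.790, 0.920),
    "Maran 1999": (0.834, 0.811),
    "Karelson 2000a": (0.834, 0.811),
    "Karelson 2000b": (0.895, 1.333),
    "Basak 2001a": (0.794, 0.912),
    "Basak 2001b": (0.821, 0.840),
    "Cash 2001": (0.767, 0.979),
    "Toropov 2001": (0.758, 0.950),
    "Vračko 2004a": (0.793, 0.840),
    "Vračko 2004b": (0.793, 0.840),
    "Cash 2005a": (0.760, 0.950),
    "Cash 2005b": (0.750, 0.890),
}


def oriented(stats):
    """Orient both statistics so that larger is better: keep r2, negate s."""
    r2, s = stats
    return (r2, -s)


def better(x, y):
    """Model x is better than y: at least as good in both statistics, not tied."""
    qx, qy = oriented(models[x]), oriented(models[y])
    return qx != qy and all(a >= b for a, b in zip(qx, qy))


for name in models:
    beaten = [other for other in models if better(name, other)]
    beaten_by = [other for other in models if better(other, name)]
    print(f"{name}: better than {len(beaten)}, worse than {len(beaten_by)}")
```

It reports that Maran 1999 and Karelson 2000a each beat 10 models and tie with each other, and that Karelson 2000b, with the highest r² but also the highest s, is comparable with no other model.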
                Min rk  Av rk    Max rk  Var
Basak 1997       6       8.2424   9       3
Basak 1998       4       5.0909   6       2
Maran 1999      10      10.909   11       1
Karelson 2000b   1       6       11      10
Basak 2001a      5       6.6667   8       3
Basak 2001b      9       9.8182  10       1
Cash 2001        1       2.5455   5       4
Toropov 2001     1       1.697    4       3
Vračko 2004a     6       7.7576   9       3
Cash 2005a       2       3.3939   5       3
Cash 2005b       1       3.8788   8       7
(Var: Max rk − Min rk)
[Figure: QSAR models placed on the rank axis 1 to 11 by average rank, from Toropov 2001 (lowest) to Maran 1999 and Karelson 2000a (highest)]
o Maran 1999, Karelson 2000a and Basak 2001b are the least variable models.
o Karelson 2000b & Cash 2005b are the most variable models.
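Brute force becomes impractical as the number of objects grows (11 distinct QSAR nodes already give 11! ≈ 4 × 10^7 orderings). A standard alternative, which we substitute here and which is not necessarily how the published figures were obtained, is dynamic programming over down-sets: f(S) counts the bottom-up orderings of a down-set S, g(S) counts the ways to complete it, and the number of extensions with object i at rank |S| + 1 is the sum of f(S)·g(S ∪ {i}). The sketch merges models with identical statistics into one node, as the Hasse diagram does, and assumes a higher r² and a lower s are better.

```python
from collections import defaultdict
from fractions import Fraction

# (r2, s) for the 13 mutagenicity QSAR models
models = {
    "Basak 1997": (0.797, 0.910), "Basak 1998": (0.790, 0.920),
    "Maran 1999": (0.834, 0.811), "Karelson 2000a": (0.834, 0.811),
    "Karelson 2000b": (0.895, 1.333), "Basak 2001a": (0.794, 0.912),
    "Basak 2001b": (0.821, 0.840), "Cash 2001": (0.767, 0.979),
    "Toropov 2001": (0.758, 0.950), "Vračko 2004a": (0.793, 0.840),
    "Vračko 2004b": (0.793, 0.840), "Cash 2005a": (0.760, 0.950),
    "Cash 2005b": (0.750, 0.890),
}

# Merge models with identical statistics into one node; s is negated so larger is better
groups = defaultdict(list)
for name, (r2, s) in models.items():
    groups[(r2, -s)].append(name)
keys = list(groups)
names = [" / ".join(groups[k]) for k in keys]
n = len(keys)


def better(a, b):
    return a != b and all(x >= y for x, y in zip(a, b))


# below[i]: bit mask of the nodes that must be ranked under node i
below = [sum(1 << j for j in range(n) if better(keys[i], keys[j])) for i in range(n)]
full = (1 << n) - 1


def addable(S, i):
    """Node i can be placed next if it is not in S and every node below it is."""
    return not (S >> i) & 1 and (below[i] & S) == below[i]


f = [0] * (full + 1)  # f[S]: bottom-up orderings of the down-set S
f[0] = 1
for S in range(full + 1):
    if f[S]:
        for i in range(n):
            if addable(S, i):
                f[S | 1 << i] += f[S]

g = [0] * (full + 1)  # g[S]: ways to place the remaining nodes above S
g[full] = 1
for S in range(full - 1, -1, -1):
    g[S] = sum(g[S | 1 << i] for i in range(n) if addable(S, i))

total = f[full]  # |LE|
count = [[0] * n for _ in range(n)]  # count[i][m]: extensions with node i at rank m + 1
for S in range(full + 1):
    if f[S]:
        m = bin(S).count("1")
        for i in range(n):
            if addable(S, i):
                count[i][m] += f[S] * g[S | 1 << i]

av_rank = {}
print("linear extensions:", total)
for i in range(n):
    ranks = [m + 1 for m in range(n) if count[i][m]]
    av_rank[names[i]] = Fraction(sum((m + 1) * c for m, c in enumerate(count[i])), total)
    print(f"{names[i]:<28} min {min(ranks):>2}  av {float(av_rank[names[i]]):7.4f}  max {max(ranks):>2}")
```

Under these assumptions the count comes out at 594 linear extensions, and the isolated Karelson 2000b gets the average rank (1 + 11)/2 = 6, since it can sit at any rank with equal probability.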
Best Docking methods
Case study:
o 10 docking programs: Dock4, DockIt, FlexX, Flo+, Fred, Glide, Gold, LigFit, MOE, MVP
o 8 protein targets
o Two main tasks:
  o prediction of conformations of small molecules bound to protein targets
  o virtual screening of compound databases to identify leads for a protein target
Warren, G. L.; Andrews, C. W.; Capelli, A-M.; Clarke, B.; LaLonde, J.; Lambert, M. H.; Lindvall, M.; Nevins, N.; Semus, S. F.; Senger, S.; Tedesco, G.; Wall, I. D.; Woolven, J. M.; Peishoff, C. E.; Head, M. S. J. Med. Chem. 2006, 49, 5912-5931.
Protein-ligand conformations
Percentage of compounds for which a docked pose was found within 2 Å of the crystal structure
136 protein/ligand conformations
          chk1  pdfs   mrs ppard   fxa  gyrb  hcvp
Dock4        7    25    19     2    10    29     0
DockIt      47    25     3     7    10     0     8
FlexX       73    75    39    37    40    43     0
Flo+        60    88    45    80    50     0    38
Fred        73    50    58     0    10     0     0
Glide       67    50    74    33    40    29     8
Gold        53    88    94    78    40    43    31
LigFit      40    63     0    15     0     0     0
MOE          0     0     0     0     0     0     0
MVP         87    38    42    41     0    43     0

Target classes: chk1, kinase; pdfs, polypeptide deformylase; mrs, synthetase; ppard, nuclear hormone receptor; fxa, serine protease; gyrb, isomerase; hcvp, polymerase.
o There are better programs than MOE
o No program is better than all the others
o Gold performs better than 4 other programs
[Figure: Hasse diagram of the docking programs for protein-ligand conformations]
Protein-ligand conformations
12,960 linear extensions
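The dominance statements above can be checked directly against the pose table. A minimal Python sketch, assuming higher percentages are better and using the label Flo+ for the Flo row; the helper names are ours:

```python
# Percentage of ligands docked within 2 Å of the crystal pose
# Columns: chk1, pdfs, mrs, ppard, fxa, gyrb, hcvp
pose = {
    "Dock4": (7, 25, 19, 2, 10, 29, 0),
    "DockIt": (47, 25, 3, 7, 10, 0, 8),
    "FlexX": (73, 75, 39, 37, 40, 43, 0),
    "Flo+": (60, 88, 45, 80, 50, 0, 38),
    "Fred": (73, 50, 58, 0, 10, 0, 0),
    "Glide": (67, 50, 74, 33, 40, 29, 8),
    "Gold": (53, 88, 94, 78, 40, 43, 31),
    "LigFit": (40, 63, 0, 15, 0, 0, 0),
    "MOE": (0, 0, 0, 0, 0, 0, 0),
    "MVP": (87, 38, 42, 41, 0, 43, 0),
}


def better(x, y):
    """x is better than y: at least as good on every target and not identical."""
    return pose[x] != pose[y] and all(a >= b for a, b in zip(pose[x], pose[y]))


for prog in pose:
    beaten = sorted(other for other in pose if better(prog, other))
    print(f"{prog:<7} better than: {', '.join(beaten) or '(none)'}")

best_overall = [p for p in pose if all(better(p, o) for o in pose if o != p)]
print("better than all others:", best_overall or "none")
```

Gold comes out better than Dock4, DockIt, LigFit and MOE; MOE is beaten by every other program; and no program beats all the others.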
        Min rk  Av rk   Max rk  Var
Dock4   2       3.5833   7      5
DockIt  2       3.5833   7      5
FlexX   4       7.75    10      6
Flo+    4       7.75    10      6
Fred    2       6       10      8
Glide   4       7.75    10      6
Gold    5       8       10      5
LigFit  2       3.5833   7      5
MOE     1       1        1      0
MVP     2       6       10      8
(Var: Max rk − Min rk)
[Figure: docking programs placed on the rank axis 1 to 10 by average rank, from bottom to top: MOE; Dock4, DockIt, LigFit; Fred, MVP; FlexX, Flo+, Glide; Gold]
o All programs have variable positions in the ranking, except MOE.
o The most suitable docking program to estimate protein-ligand conformations is Gold.
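The probability of being the best program, quoted in the conclusions, is the ranking probability $p_{mn}$ at the top rank. The sketch below applies down-set dynamic programming (our substitute for brute-force enumeration, not necessarily the software behind the slides) to the pose table, with higher percentages taken as better:

```python
from fractions import Fraction

# Percentage of ligands docked within 2 Å of the crystal pose
# Columns: chk1, pdfs, mrs, ppard, fxa, gyrb, hcvp
pose = {
    "Dock4": (7, 25, 19, 2, 10, 29, 0), "DockIt": (47, 25, 3, 7, 10, 0, 8),
    "FlexX": (73, 75, 39, 37, 40, 43, 0), "Flo+": (60, 88, 45, 80, 50, 0, 38),
    "Fred": (73, 50, 58, 0, 10, 0, 0), "Glide": (67, 50, 74, 33, 40, 29, 8),
    "Gold": (53, 88, 94, 78, 40, 43, 31), "LigFit": (40, 63, 0, 15, 0, 0, 0),
    "MOE": (0, 0, 0, 0, 0, 0, 0), "MVP": (87, 38, 42, 41, 0, 43, 0),
}
names = list(pose)
n = len(names)


def better(x, y):
    return pose[x] != pose[y] and all(a >= b for a, b in zip(pose[x], pose[y]))


# below[i]: bit mask of the programs that must be ranked under program i
below = [sum(1 << j for j in range(n) if better(names[i], names[j])) for i in range(n)]
full = (1 << n) - 1


def addable(S, i):
    """Program i can come next if it is not in S and everything below it is."""
    return not (S >> i) & 1 and (below[i] & S) == below[i]


f = [0] * (full + 1)  # f[S]: bottom-up orderings of the down-set S
f[0] = 1
for S in range(full + 1):
    if f[S]:
        for i in range(n):
            if addable(S, i):
                f[S | 1 << i] += f[S]

g = [0] * (full + 1)  # g[S]: ways to place the remaining programs above S
g[full] = 1
for S in range(full - 1, -1, -1):
    g[S] = sum(g[S | 1 << i] for i in range(n) if addable(S, i))

total = f[full]
count = [[0] * n for _ in range(n)]  # count[i][m]: extensions with program i at rank m + 1
for S in range(full + 1):
    if f[S]:
        m = bin(S).count("1")
        for i in range(n):
            if addable(S, i):
                count[i][m] += f[S] * g[S | 1 << i]

av_rank = {names[i]: Fraction(sum((m + 1) * c for m, c in enumerate(count[i])), total)
           for i in range(n)}
p_best = {names[i]: Fraction(count[i][n - 1], total) for i in range(n)}  # P(top rank)

print("linear extensions:", total)
for name in sorted(names, key=av_rank.get, reverse=True):
    print(f"{name:<7} av rk {float(av_rank[name]):6.4f}  P(best) {float(p_best[name]):.3f}")
```

Under these assumptions it gives 12,960 linear extensions, an average rank of exactly 8 for Gold and a probability of 28/135 ≈ 0.21 that Gold occupies the top rank.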
Docking as a virtual screening tool
Enrichment factor for actives (≤1 μM) found at 10% of the docking-score-ordered list

            chk1   fxa  gyrb  hcvp   mrs  Ecoli-pdf  Strep-pdf  ppard
Dock4        1.4   4.1   1.7   1.8   4.2        0.9        0.8    1.7
DockIt       4.2     2     2     1     1        0.2          0    3.2
FlexX          7   2.2   5.8   0.9   3.9        0.8        0.8    5.2
Flo+         5.6   2.7   2.3   3.4   1.7        1.5        0.8    3.6
Fred         2.9   4.1   1.9     2   0.6        3.2        1.2    1.1
Glide        6.3   3.4     1     1   5.3        0.6        0.4    4.8
Gold         0.1   4.1     4     0   0.8          1        0.1    5.5
LigandFit    3.3   1.9   2.8   1.8   2.9        2.9        1.7    1.2
MOEDock      3.9   0.6     0     0     1        2.1        0.6      0
MVP          7.2   5.8   5.3   3.6   6.4        6.7        6.9    3.9

Target classes: chk1, kinase; fxa, serine protease; gyrb, isomerase; hcvp, polymerase; mrs, synthetase; Ecoli-pdf and Strep-pdf, metalloprotease; ppard, nuclear hormone receptor.
[Figure: Hasse diagram of the docking programs for virtual screening]
Ability to correctly identify all active chemotypes from a population of decoy molecules
o MVP works better than 6 of the other programs
o DockIt behaves worse than Flo+ and MVP
o There is no program behaving better than all the others
o FlexX, Glide and Gold are incomparable with every other program: no program is better or worse than them
Docking as a virtual screening tool
259,200 linear extensions
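The same dominance test can be run on the enrichment-factor table. A minimal Python sketch, assuming higher enrichment is better and using the short labels LigFit and MOE from the Hasse diagram for the LigandFit and MOEDock rows:

```python
# Enrichment factors at 10% of the ranked list
# Columns: chk1, fxa, gyrb, hcvp, mrs, Ecoli-pdf, Strep-pdf, ppard
ef = {
    "Dock4": (1.4, 4.1, 1.7, 1.8, 4.2, 0.9, 0.8, 1.7),
    "DockIt": (4.2, 2, 2, 1, 1, 0.2, 0, 3.2),
    "FlexX": (7, 2.2, 5.8, 0.9, 3.9, 0.8, 0.8, 5.2),
    "Flo+": (5.6, 2.7, 2.3, 3.4, 1.7, 1.5, 0.8, 3.6),
    "Fred": (2.9, 4.1, 1.9, 2, 0.6, 3.2, 1.2, 1.1),
    "Glide": (6.3, 3.4, 1, 1, 5.3, 0.6, 0.4, 4.8),
    "Gold": (0.1, 4.1, 4, 0, 0.8, 1, 0.1, 5.5),
    "LigFit": (3.3, 1.9, 2.8, 1.8, 2.9, 2.9, 1.7, 1.2),
    "MOE": (3.9, 0.6, 0, 0, 1, 2.1, 0.6, 0),
    "MVP": (7.2, 5.8, 5.3, 3.6, 6.4, 6.7, 6.9, 3.9),
}


def better(x, y):
    """x beats y: at least as high an enrichment on every target, not identical."""
    return ef[x] != ef[y] and all(a >= b for a, b in zip(ef[x], ef[y]))


for prog in ef:
    beaten = sorted(o for o in ef if better(prog, o))
    print(f"{prog:<7} better than: {', '.join(beaten) or '(none)'}")

# Programs comparable with no other program
isolated = [p for p in ef if not any(better(p, o) or better(o, p) for o in ef)]
print("isolated:", isolated)
```

It lists the six programs beaten by MVP, the two programs (Flo+ and MVP) that beat DockIt, and FlexX, Glide and Gold as the isolated programs.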
        Min rk  Av rk   Max rk  Var
Dock4   1       4.8125   9      8
DockIt  1       3.2083   8      7
FlexX   1       5.5     10      9
Flo+    2       6.4167   9      7
Fred    1       4.8125   9      8
Glide   1       5.5     10      9
Gold    1       5.5     10      9
LigFit  1       4.8125   9      8
MOE     1       4.8125   9      8
MVP     7       9.625   10      3
(Var: Max rk − Min rk)
[Figure: docking programs placed on the rank axis 1 to 10 by average rank, from bottom to top: DockIt; Dock4, Fred, LigFit, MOE; FlexX, Glide, Gold; Flo+; MVP]
o All programs have quite variable positions in the ranking
o The most suitable docking program to identify active chemotypes is MVP
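Feeding the enrichment factors to the same down-set counting (again an illustrative substitute for the authors' software, with higher enrichment taken as better and the short labels LigFit and MOE) reproduces the virtual-screening figures:

```python
from fractions import Fraction

# Enrichment factors at 10% of the ranked list
# Columns: chk1, fxa, gyrb, hcvp, mrs, Ecoli-pdf, Strep-pdf, ppard
ef = {
    "Dock4": (1.4, 4.1, 1.7, 1.8, 4.2, 0.9, 0.8, 1.7),
    "DockIt": (4.2, 2, 2, 1, 1, 0.2, 0, 3.2),
    "FlexX": (7, 2.2, 5.8, 0.9, 3.9, 0.8, 0.8, 5.2),
    "Flo+": (5.6, 2.7, 2.3, 3.4, 1.7, 1.5, 0.8, 3.6),
    "Fred": (2.9, 4.1, 1.9, 2, 0.6, 3.2, 1.2, 1.1),
    "Glide": (6.3, 3.4, 1, 1, 5.3, 0.6, 0.4, 4.8),
    "Gold": (0.1, 4.1, 4, 0, 0.8, 1, 0.1, 5.5),
    "LigFit": (3.3, 1.9, 2.8, 1.8, 2.9, 2.9, 1.7, 1.2),
    "MOE": (3.9, 0.6, 0, 0, 1, 2.1, 0.6, 0),
    "MVP": (7.2, 5.8, 5.3, 3.6, 6.4, 6.7, 6.9, 3.9),
}
names = list(ef)
n = len(names)


def better(x, y):
    return ef[x] != ef[y] and all(a >= b for a, b in zip(ef[x], ef[y]))


# below[i]: bit mask of the programs that must be ranked under program i
below = [sum(1 << j for j in range(n) if better(names[i], names[j])) for i in range(n)]
full = (1 << n) - 1


def addable(S, i):
    """Program i can come next if it is not in S and everything below it is."""
    return not (S >> i) & 1 and (below[i] & S) == below[i]


f = [0] * (full + 1)  # f[S]: bottom-up orderings of the down-set S
f[0] = 1
for S in range(full + 1):
    if f[S]:
        for i in range(n):
            if addable(S, i):
                f[S | 1 << i] += f[S]

g = [0] * (full + 1)  # g[S]: ways to place the remaining programs above S
g[full] = 1
for S in range(full - 1, -1, -1):
    g[S] = sum(g[S | 1 << i] for i in range(n) if addable(S, i))

total = f[full]
count = [[0] * n for _ in range(n)]  # count[i][m]: extensions with program i at rank m + 1
for S in range(full + 1):
    if f[S]:
        m = bin(S).count("1")
        for i in range(n):
            if addable(S, i):
                count[i][m] += f[S] * g[S | 1 << i]

av_rank = {names[i]: Fraction(sum((m + 1) * c for m, c in enumerate(count[i])), total)
           for i in range(n)}
p_best = {names[i]: Fraction(count[i][n - 1], total) for i in range(n)}  # P(top rank)

print("linear extensions:", total)
for name in sorted(names, key=av_rank.get, reverse=True):
    print(f"{name:<7} av rk {float(av_rank[name]):6.4f}  P(best) {float(p_best[name]):.3f}")
```

It reports 259,200 linear extensions, an average rank of 9.625 for MVP and a probability of 7/10 that MVP occupies the top rank; the three isolated programs each get (1 + 10)/2 = 5.5.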
Conclusions
o With 2 statistics characterising QSAR models, we found 2 best models.
o … and 2 “worst” models.
o Gold is the docking program for protein-ligand conformations with the highest probability (21%) of being the best one.
o MVP has a 70% probability of being the best docking program for virtual screening.
Outlook
o Why not use more statistics for QSAR models?
o Instead of ordering Alice’s and Bob’s models, a task ahead is to order types of QSAR models, e.g. linear and non-linear ones.
o Some other attributes of QSAR methods need to be introduced, e.g. related to the applicability domain.
o Computational costs and other features of the docking programs may be included in the study.
Acknowledgements
Rainer Brüggemann
Subhash C. Basak
Thank you!