Discovery and Molecular Modeling of Small Molecule Inhibitors of the Histone Acetyltransferase PCAF
Virtual Screening and Drug Design
Diploma Thesis
Medicinal and Pharmaceutical Chemistry Computational Drug Design
Institute of Pharmacy
Naturwissenschaftliche Fakultät I
Martin-Luther-Universität Halle-Wittenberg
Pharmacist Suhaib Shekfeh
From Homs/Hama - Syria
Referees:
1. Prof. Dr. Wolfgang Sippl (MLU Halle-Wittenberg)
2. Prof. Dr. Manfred Jung (Albert-Ludwigs-Universität Freiburg)
List of Abbreviations
1 Introduction: Epigenetics and HAT
1.1 Histone Acetyltransferases: Role in Epigenetics, Structure, and Classification
1.2 Histone Lysine Acetyltransferase - Catalytic Mechanism
1.3 HAT Modulators: Chemical Regulation of Acetyltransferases
1.3.1 Non-peptidic Natural Product HAT Inhibitors
1.3.2 Irreversible HAT Inhibitors (Aryl and alkyl N-substituted Isothiazolones)
1.3.3 Other Synthetic HAT Inhibitors
1.4 Structural Overview of Serotonin Acetyltransferases AANAT
1.5 Structural Overview of PCAF HAT
2 Aim of the Work
3 Computational Methods - Docking and Virtual Screening
3.1 The Search Problem
3.2 The Scoring Problem
3.3 Solvation Effects
3.4 Solvation Effects and Scoring Functions
3.5 Effects of Rescoring Docking Hits using MM-GBSA or MM-PBSA Methods
3.6 Docking Programs and Rescoring Methods
3.6.1 PBSA Scoring using ZAP Library and AMBER-score
3.6.2 Cscore
3.7 Similarity Search
3.8 ZINC Compound Library
3.9 Fragment-based Drug Design
4 Implementation
4.1 Molecular Modeling
4.2 Dataset (Test Set) for Docking and Enrichment Studies
4.3 Docking Optimization
4.4 Isothiazolones - Covalent Docking
4.5 Fragment Docking (Fragments Derived from the ZINC Database)
4.6 PCAF in vitro Assay
5 Results and Discussion
5.1 Optimization of the GOLD Docking Procedure for AANAT
5.1.1 Reproducing the Binding Mode of AANAT Ligands
5.1.2 Scoring AANAT Inhibitors
5.1.3 Evaluation of Further Scoring Methods
5.2 Virtual Screening and Experimental Validation of Selected PCAF Hits
5.3 Covalent Docking of Isothiazolones
5.4 Fragment-based Drug Design
6 Conclusions
7 References
Declaration of Authorship
I hereby confirm that I have authored this thesis independently and without the use of resources other than those indicated. All passages that are taken literally or in a general manner from publications or other sources are marked as such.
Suhaib Shekfeh
Halle (Saale), Germany, 15 May 2009
Acknowledgement
I would like to thank:
- My advisor and the first referee of this work, Prof. Dr. habil. Wolfgang Sippl, for all his academic advice and for giving me the opportunity to work on such an interesting project.
- My second advisor, Prof. Dr. Manfred Jung (Freiburg University), for reading and correcting this work and for performing the in vitro assay of PCAF inhibition in his laboratory.
- My family, especially my mother and my grandmother in Syria, for all the support that they have given and always give me.
- All the colleagues in Prof. Sippl's group (Rene, Urszula, Ralf, German, Mark, Martin, Kanin) for all their help and for providing a friendly environment.
- All my friends in Germany, and especially in Halle (Saale).
List of Abbreviations
Ada = Adaptor protein
Ac-CoA = Acetyl coenzyme A
AANAT = Arylalkylamine N-acetyltransferase (serotonin N-acetyltransferase)
ASP = Astex Statistical Potential
BAX = Bcl2 associated X protein
BEAR = Binding Estimation After Refinement
bp = base pairs (of nucleotides)
CBP = CREB-binding protein
CDK = Cyclin-Dependent Kinase
CREB = cAMP Response Element Binding protein
Evdw, Eelec, Esolv = van der Waals Energy, Electrostatic Energy, Solvation Energy
GA = Genetic Algorithm
GOLD = Genetic Optimisation for Ligand Docking
GNAT = GCN5-related N-acetyltransferase, tGCN5 = Tetrahymena GCN5
GB = Generalized Born
HAT = Histone Acetyltransferase
HDAC = Histone deacetylase
HTD = High Throughput Docking
MD = Molecular Dynamics
MM-PBSA = Molecular Mechanics-Poisson-Boltzmann/Solvent-accessible Surface Area
MM = Molecular Mechanics
MC = Monte Carlo Simulation
MM-GBSA = Molecular Mechanics-Generalized Born/Solvent-accessible Surface Area
MOE = Molecular Operating Environment
MCSS = Multiple Copy Simultaneous Search
MYST = MOZ, Ybf2/Sas3, Sas2 and Tip60
NuA3 = Nucleosomal Acetyltransferase for H3
NuA4 = Nucleosomal Acetyltransferase for H4
PDB = Protein Data Bank, previously called Brookhaven PDB
PB = Poisson-Boltzmann
PCAF = p300/CBP-associated factor, hPCAF = human PCAF
PMF = Potential of Mean Force
ROF = Lipinski's Rule of Five
ROC curve = Receiver Operating Characteristic Curve
RMSD = Root Mean Square Deviation
Rhodanine = 2-Thioxo-4-thiazolidinone or 2-Thioxo-1,3-thiazolidin-4-one
SAR = Structure-Activity Relationship
SAGA = Spt, Ada, GCN5 Acetyltransferase
SANT = Swi3, Ada2, NcoR, TFIIIB
SAP = Sin-associated Protein
SAS = Solvent Accessible Surface
Sas2 = Something about Silencing 2
SBVS = Structure-Based Virtual Screening
Sin3 = Switch Independent 3
SMRT = Silencing Mediator of Retinoic Acid and Thyroid Hormone Receptor
SPC = Simple Point Charge
Spt = Suppressor of Transcription
SVL = Scientific Vector Language (Script Language of MOE)
TIF2 = Transcriptional Intermediary Factor 2
TrpNH2 = Tryptamine
VS = Virtual Screening
vdw = van der Waals potential/energy
Introduction 1
1 Introduction: Epigenetics and HAT
1.1 Histone Acetyltransferases: Role in Epigenetics, Structure, and Classification
The genetic material in the nucleus of eukaryotic cells is present in a tightly packed form,
chromatin, which functions as a dynamic structure and a basic contributor to the regulation of
various nuclear processes, including transcription, DNA replication and repair, mitosis and
apoptosis [1]. Core histones are small basic proteins that assemble into a well-defined structure
known as the nucleosome. There are four types of core histones: H2A, H2B, H3, and H4. The
nucleosome core consists of two copies of each of these histones, forming an octamer. Around
this octamer, 147 base pairs of DNA are wrapped in a left-handed turn. The linker histone H1 binds
the nucleosome at the entry and exit sites of the DNA, thus locking the DNA into place and
allowing the formation of a higher-order structure [2]. An important post-translational modification
of histones is the acetylation of ε-amino groups on conserved lysine residues. Acetylation
neutralizes the positively charged lysines and therefore affects interactions of the histones with
other proteins and/or with the DNA. Histone acetylation has long been associated with
transcriptionally active chromatin and has also been implicated in histone deposition during DNA
replication [3].
Histone acetyltransferases (HATs) can be classified into several families based on their sequence
conservation (Table 1) [5]. The human genome encodes up to 25 proteins that show lysine
acetyltransferase activity. At the primary structure level there is little similarity between the
different HATs, and even members of the same family usually display considerable sequence
diversity. Furthermore, there is no single homologous domain that is conserved in all HATs, although
many enzymes contain recognizable acetyl-coenzyme A (Ac-CoA) binding motifs and
bromodomains [6]. More similarities are observed at the tertiary structure level (Figures 1 and 2).
HATs display a conserved core domain which contains an L-shaped cleft, formed by the N- and
C-terminal segments of the core domain. This cleft contains the catalytic site, where Ac-CoA binds in
the short segment and the macromolecular substrate binds in the long segment. Beyond the core
domain, there is little structural similarity between the different HATs. In vitro assays indicated that
HATs have different substrate specificities, although the molecular mechanisms underlying the
binding specificities, as well as the true physiological specificities of HATs, remain poorly
understood [4].
Important and extensively investigated families of HATs are (see also Table 1):
- GNAT family (GCN5-related N-acetyltransferase): includes GCN5, PCAF
(p300/CBP-associated factor), other acetyltransferases like serotonin acetyltransferase (AANAT),
aminoglycoside N-acetyltransferases (AAC-3 and AAC-6), spermidine/spermine
N-acetyltransferase, the elongator subunit Elp3, and Hpa2. HAT1 can be classified either as a
GNAT member or as a separate family.
- MYST family (named after its founding members, which include MOZ, YBF2/SAS3, SAS2
and TIP60) [5].
- p300/CBP family [7-9].
Figure 1. Comparison of the three-dimensional structures of GCN5-related N-acetyltransferases: GCN5, PCAF, and
AANAT. (A) tGCN5: the ternary complex with CoA and an 11-residue peptide (in blue) is shown. The black line
indicates CoA or Ac-CoA. (B) PCAF, complexed with CoA. (C) AANAT: the complex with the bisubstrate analog is
shown (indole ring colored blue). The four conserved motifs of the GNAT superfamily C, D, A, and B are shown in
purple, green, yellow, and red, respectively (adapted from [5(b)]).
Over 40 transcription factors and 30 other nuclear, cytoplasmic, bacterial, and viral proteins have
been shown to be acetylated in vivo by HATs [8, 10]. For example, p300/CBP proteins are involved
in diverse physiological processes, such as proliferation, differentiation and apoptosis [11]. GCN5p
is the catalytic subunit of the two multi-protein complexes, ADA and SAGA, involved in
remodeling the chromatin structure and acetylation of histone tails at specific lysines. Table 2
presents a list of all known families of acetyltransferases.
Figure 2. Superposition of the putative active-site region of GCN5 (in yellow) and HAT1 (in red) with bound Ac-CoA
(shown in capped sticks, adapted from [15]).
Table 1: Main families of HATs, their substrates and their involvement in cancer mechanisms [5 (a)].

| Acetyltransferase | Family | Substrates | Involvement in cancer |
|---|---|---|---|
| GCN5 | GNAT | H2B, H4, cMyc | Critical regulator of cell cycle and cMyc |
| PCAF | GNAT | H3, H4, cMyc, p53, MyoD, E2F | Critical regulator of cell cycle, p53, E2F, and cMyc |
| CBP | CBP/p300 | H2A, H2B, H3, H4, pRb, E2F, p53, c-Myb, MyoD, AR, FoxO | Translocation: MOZ-, MORF-, and MLL-p300/CBP fusions. Mutation: biallelic mutations, p300 epithelial cancer. Inactivation: haematological malignancy |
| p300 | CBP/p300 | H2A, H2B, H3, H4, pRb, E2F, p53, c-Myb, MyoD, AR, FoxO | Translocation: MOZ-, MORF-, and MLL-p300/CBP fusions. Mutation: biallelic mutations, p300 epithelial cancer. Inactivation: haematological malignancy |
| TIP60 | MYST | H2A, H3, H4, cMyc, AR | Association with androgen receptor in prostate cancer |
| MOZ | MYST | H3, H4 | Fusion with p300/CBP and TIF2 |
| MORF | MYST | H3, H4 | Fusion with p300/CBP |
| ACTR | SRC | H3, H4 | Upregulation in breast cancer |
Table 2. Summary of acetyltransferase families; numbers in brackets are UniProt accession numbers [23].

| Family | Gene name | Synonyms | Gene product name and synonyms |
|---|---|---|---|
| HAT1 | HAT1 (O14929) | -- | Histone acetyltransferase type B catalytic subunit (HAT1) |
| MYST | HTATIP (Q92993) | TIP60 | 60 kDa HIV-1 Tat-interacting protein (Tip60) (NuA4/TRRAP complex component) |
| MYST | MYST1 (Q9H7Z6) | MOF, hMOF | Homolog of Drosophila males absent on the first (hMOF); component of human male-specific lethal complex (MSL) |
| MYST | MYST2 (O95251) | HBO1, HBOa | HAT binding to origin recognition complex (HBO1); component of inhibitor of growth complexes (ING4, ING5) |
| MYST | MYST3 (Q92794) | MOZ, RUNXBP2, ZNF220 | Monocytic leukaemia zinc finger protein (MOZ); Runt-related transcription factor-binding protein (RunxBP2); Zinc finger protein 220 kDa (ZNF220) (component of ING5 complex) |
| MYST | MYST4 (Q8WYB5) | MORF, MOZ2 | MOZ-related factor (MORF), MOZ2, Querkopf (component of ING5 complex) |
| GNAT | GCN5L2 (Q92830) | GCN5, HGCN5 | General control of nitrogen metabolism (GCN5)-like 2; homolog of yeast GCN5; STAF97 |
| GNAT | PCAF (Q92831) | -- | p300/CBP-associated factor (P/CAF) |
| p300/CBP | EP300 (Q09427) | p300 | E1A-associated protein 300 kDa (p300) |
| p300/CBP | CREBBP (Q92793) | CBP | CREB-binding protein (CBP) |
| SRC/p160 | NCOA1 (Q15788) | SRC1, RIP160 | Steroid receptor coactivator (SRC1); Nuclear receptor coactivator (NCOA1); 160-kDa receptor interacting protein (RIP160) |
| SRC/p160 | NCOA2 (Q15596) | TIF2 | Transcriptional intermediary factor (TIF2); Nuclear receptor coactivator 2 (NCOA2) |
| SRC/p160 | NCOA3 (Q9Y6Q9) | AIB1, ACTR, p/CIP, RAC3, TRAM1 | Nuclear receptor coactivator (NCOA3); Amplified in breast cancer (AIB1); Thyroid hormone receptor activator molecule (TRAM1); Receptor-associated coactivator (RAC3); Steroid receptor coactivator protein (SRC3); p300/CBP-interacting protein (p/CIP) |
| TFIIIC subunit 4 family | GTF3C4 (Q9UK98) | -- | General transcription factor 3C polypeptide 4 (GTF3C4); Transcription factor IIIC-delta subunit (TF3Cd); TFIIIC 90-kDa subunit (TFIIIC 90) |
| ATF | ATF2 (P15336) | CREB2, CREBP1 | Cyclic AMP-dependent transcription factor (CREB2); Activating transcription factor (ATF2); cAMP response element-binding protein (CREBP1); HB16 |
| CIITA | CIITA (P33076) | MHC2TA | MHC class II transactivator (CIITA) |
| TAF1 | TAF1 (P21675) | BA2R, CCG1, TAF2A | Transcription initiation factor (TFIID) subunit 1; TBP-associated factor (TAF1); TBP-associated factor 250 kDa (TAFII250); Cell-cycle gene 1 (CCG1) |
| CDY | CDY1 (Q9Y6F8) | -- | Testis-specific chromodomain protein Y1 (CDY1) |
| CDY | CDYL1 (Q9Y232), CDYL2 (Q8N8U2) | CDYL1, CDYL2 | Chromodomain Y-like protein (CDYL1, CDYL2) |
| TFIIB | GTF2B (Q00403) | TF2B, TFIIB | Transcription initiation factor (TFIIB); General transcription factor TFIIB (GTF2B) |
| MCM3AP | MCM3AP (O60318) | GANP, KIAA0572, MAP80, SAC3 | Mini chromosome maintenance 3-associated protein (MCM3AP); 80-kDa MCM3-associated protein (MAP80); Germinal centre-associated nuclear protein (GANP) |
| ESCO | ESCO1 (Q5FWF5) | EFO1, KIAA1911 | Establishment of cohesion 1 homolog 1 (ESCO1, ECO1); Establishment factor-like protein 1 (EFO1p, hEFO1); CTF7 homolog 1 |
| ESCO | ESCO2 (Q56NI9) | -- | Establishment of cohesion 1 homolog 2 (ESCO2); ECO1 homolog 2 |
| ARD1 | ARD1A (P41227) | hARD1, TE2, ARD2 | Arrest defective protein (ARD1); N-alpha acetyltransferase (retroposon-mediated gene duplication product) |
| CLOCK | CLOCK (O15506) | KIAA0334 | Circadian locomoter output cycles protein kaput (CLOCK) |
| NCOAT | MGEA5, NCOAT, HEXC (O60502) | -- | Meningioma-expressed antigen 5 (MGEA5); Nuclear cytoplasmic O-linked N-acetylglucosaminase and acetyltransferase (NCOAT) |
1.2 Histone Lysine Acetyltransferase - Catalytic Mechanism
Reported studies have proposed two different catalytic mechanisms for HATs [12]. Members of the
GNAT (GCN5-related N-acetyltransferase) family use an ordered sequential mechanism in which the
acetyl group is transferred from Ac-CoA directly to the Nε of the substrate lysine residue (Figure 3);
this was first indicated by structural and kinetic data [13]. Trievel et al. proposed such a ternary
complex mechanism for catalysis by GCN5 [14]. In the suggested mechanism, Ac-CoA binds first,
followed by the lysine substrate, to form a ternary complex. A glutamate residue (GLU173) is
positioned to abstract a proton from the amino group of the lysine residue, and the uncharged amino
group then performs a nucleophilic attack on the carbonyl carbon of the reactive thioester group of
Ac-CoA.
According to Trievel et al. [14], GLU173 of GCN5 is the likely candidate for the catalytic base
because it is close enough to the histone lysine; it was later found that mutation of GLU173 to
GLN abolishes the activity in vivo and in vitro [15].
In general, an active-site glutamate (GLU173 in GCN5/ScKAT2) is needed to activate the
ε-amine of lysine and so facilitate its direct nucleophilic attack on the carbonyl carbon of Ac-CoA [16].
The tetrahedral intermediate formed then collapses to the acetyl-lysine product and CoA (Figure 4).
Figure 3. Ordered sequential mechanism resulting in the formation of a ternary complex (adapted from [16]).
Figure 4. Comparison between the two proposed mechanisms: ternary complex formation and ping-pong mechanism
(adapted from [17]).
In the second mechanism, called the ping-pong (i.e. double displacement) catalytic
mechanism, a cysteine residue within the enzyme active site receives the acetyl moiety from
Ac-CoA in the first step, and in a second step the acetyl moiety is transferred to the substrate lysine
residue (Figure 4) [16]. It has been noted that all biochemically and structurally characterized
HATs have a conserved glutamate residue in the active site, which seems to have a similar function
of deprotonating the amino group of the target lysine substrate before the acetyl transfer. Currently
it is thought that all characterized HATs follow an ordered sequential bi-bi kinetic mechanism,
where differences between families may affect substrate specificity but not the overall mechanism
of catalysis [4].
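The kinetic distinction between the two mechanisms can be illustrated with the standard steady-state rate laws for two-substrate reactions. The following is a minimal sketch and not part of the work described here: the function names and all kinetic constants are hypothetical and chosen only for illustration, with A standing for Ac-CoA and B for the lysine substrate. In a double-reciprocal plot of 1/v against 1/[A], the slope depends on [B] for an ordered sequential mechanism (intersecting lines) but not for a ping-pong mechanism (parallel lines).

```python
def ordered_bi_bi_rate(a, b, vmax, ka, kb, kia):
    """Initial rate for an ordered sequential bi-bi (ternary complex) mechanism.

    a, b   : concentrations of A (binds first) and B
    ka, kb : Michaelis constants for A and B
    kia    : dissociation constant of the enzyme-A complex
    """
    return vmax * a * b / (kia * kb + kb * a + ka * b + a * b)


def ping_pong_rate(a, b, vmax, ka, kb):
    """Initial rate for a ping-pong (double displacement) bi-bi mechanism."""
    return vmax * a * b / (kb * a + ka * b + a * b)


def double_reciprocal_slope(rate_fn, b, a_low=1.0, a_high=10.0, **constants):
    """Slope of 1/v against 1/[A] at a fixed concentration of B.

    Both rate laws are linear in 1/[A], so two points give the exact slope.
    """
    inv_v_low = 1.0 / rate_fn(a_low, b, **constants)
    inv_v_high = 1.0 / rate_fn(a_high, b, **constants)
    return (inv_v_low - inv_v_high) / (1.0 / a_low - 1.0 / a_high)


# Hypothetical constants in arbitrary units (assumptions, not measured values).
SEQUENTIAL = dict(vmax=1.0, ka=2.0, kb=5.0, kia=3.0)
PING_PONG = dict(vmax=1.0, ka=2.0, kb=5.0)

if __name__ == "__main__":
    for b in (1.0, 10.0):
        s_seq = double_reciprocal_slope(ordered_bi_bi_rate, b, **SEQUENTIAL)
        s_pp = double_reciprocal_slope(ping_pong_rate, b, **PING_PONG)
        print(f"[B] = {b:4.1f}  sequential slope = {s_seq:.2f}  ping-pong slope = {s_pp:.2f}")
```

With these illustrative constants the sequential slope changes with [B], whereas the ping-pong slope stays at ka/vmax. This is the kind of pattern that initial-rate measurements use to tell the two mechanisms apart.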
A recent study on p300 demonstrated that p300 HAT is itself polyacetylated and contains an
activation loop that requires (auto)acetylation for full enzyme activation [18]. This is similar to the
situation with protein kinases, where activity is also regulated through an autoinhibitory switch
involving phosphorylation of an activation loop. In addition to their role in catalyzing reversible
post-translational modifications, the similarities between HATs and kinases also include how these
proteins are recruited to their target complexes. In the case of kinases this usually involves the SH2
and 14-3-3 domains that recognize phosphopeptide motifs. HATs frequently contain a
bromodomain that binds acetyl-lysine-containing sequence motifs in histones and other proteins
[19].
1.3 HAT Modulators: Chemical Regulation of Acetyltransferases
One of the direct structural insights into small-molecule-mediated inhibition of HAT proteins came
from a crystal structure of the Tetrahymena GCN5 (tGCN5) HAT domain bound to a modified
H3-CoA-20 inhibitor [20]. This bisubstrate inhibitor was prepared with an isopropionyl bridge between
CoA and the peptide to mimic the Ac-CoA-lysine intermediate [21]. To date, H3-CoA-20 is the
most potent inhibitor of GCN5/PCAF HATs identified, with an IC50 of 300 nM for tGCN5. On
the other hand, the bisubstrate inhibitor Lys-CoA, in which an acetyl bridge is introduced between the
amine group and CoA, is a potent p300 inhibitor (IC50 = 500 nM) but a weak PCAF inhibitor (IC50
of 200 µM) [17] (Figure 5). This suggested that the p300 enzyme family also uses a ternary
complex mechanism [22]. In contrast to Lys-CoA, neither the H3-CoA-20 nor the H4-CoA-20
peptide-CoA conjugate, in which CoA is linked to lysine 14 and 8 of the respective histone peptides,
is a potent p300 inhibitor (IC50 values above 10 µM), whereas H3-CoA-20 is a potent
PCAF inhibitor (IC50 = 360 nM) [17, 20]. These findings suggested that p300 might use a ternary
complex mechanism that differs somewhat from that of the GCN5/PCAF HAT proteins.
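The selectivity implied by these IC50 values can be made explicit with a short calculation. The sketch below uses only the values quoted above, reading the concentrations as micromolar where the unit prefix applies. The dictionary layout and the function name are illustrative assumptions, and the p300 value for H3-CoA-20 is only a lower bound, so the corresponding ratio is a minimum.

```python
# IC50 values quoted in the text, converted to nM.
IC50_NM = {
    "Lys-CoA": {"p300": 500.0, "PCAF": 200_000.0},
    # The p300 value is a lower bound (reported as above 10 uM).
    "H3-CoA-20": {"p300": 10_000.0, "PCAF": 360.0},
}


def fold_selectivity(compound, target, off_target):
    """Ratio IC50(off_target) / IC50(target); values above 1 favour the target."""
    values = IC50_NM[compound]
    return values[off_target] / values[target]


if __name__ == "__main__":
    lys_coa = fold_selectivity("Lys-CoA", target="p300", off_target="PCAF")
    h3_coa = fold_selectivity("H3-CoA-20", target="PCAF", off_target="p300")
    print(f"Lys-CoA, p300 over PCAF: {lys_coa:.0f}-fold")
    print(f"H3-CoA-20, PCAF over p300: at least {h3_coa:.0f}-fold")
```

On these numbers Lys-CoA is about 400-fold selective for p300, while H3-CoA-20 is at least about 28-fold selective for PCAF, which is the opposite preference.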
In spite of the great interest in HATs as therapeutic targets, only a few synthetic small-molecule
inhibitors of HATs (besides a few natural products) have been discovered to date. The most serious
disadvantages of the identified substrate-based inhibitors are their low cell permeability and
metabolic instability, which limit their suitability for in vivo investigations [23].
1.3.1 Non-peptidic Natural Product HAT Inhibitors
Figure 5. Molecular structures of HAT inhibitors (adapted from [23]).
Anacardic acid is a major component of cashew nutshell liquid and was identified in a natural
product screen as a noncompetitive inhibitor of the HATs p300 and PCAF [24]. It has poor membrane
permeability and therefore shows little effect on cells [25]. It acts as a weak, non-specific
inhibitor of p300/CBP and PCAF (IC50 = 8.5 and 5 µM, respectively). Interestingly, CTPB, an
amide derivative of anacardic acid, enhances the HAT activity of p300 fourfold, but not that of
PCAF [26]. Later, Mantelingu et al. [27] described the identification of chemical entities
essential for activating p300 HAT activity. Significantly, by employing surface-enhanced Raman
spectroscopy of the enzyme-inhibitor complexes, they showed that the activation of HAT
activity is achieved by alteration of the p300 structure.
Another natural product with HAT-inhibitory activity is Curcumin, a yellow pigment extracted
from the root of the turmeric herb Curcuma longa L. [28]. Curcumin has long been known to
possess interesting pharmacological properties: apart from its chemopreventive and antiproliferative
activities, it has been found to have antioxidative, anti-inflammatory, anti-infective and antiseptic
properties, and it is widely used in Indian medicine and culinary traditions [27]. Curcumin has been
reported to inhibit the HAT activity of p300/CBP but not that of PCAF [28]. The observed kinetics
of p300 enzyme inhibition by Curcumin were originally interpreted to mean that this compound does not
bind to the active site but acts as an allosteric inhibitor [28]. Subsequently, it was shown that
Curcumin is in fact a covalent inhibitor of p300 but not PCAF, presumably targeting some of the
amino acid residues by virtue of its electrophilic unsaturated ketone function [29].
Garcinol is a polyprenylated benzophenone natural product isolated from the edible fruit of Garcinia
indica. It was shown to be an active-site inhibitor of p300 and PCAF, with inhibition kinetics that
are uncompetitive with respect to Ac-CoA but competitive with respect to the
histone substrate [30]. Garcinol inhibits p300 (IC50 = 7 µM) and PCAF (IC50 = 5 µM) both in vitro
and in vivo [31, 32]. Recently, Mantelingu et al. synthesized and tested a set of Garcinol derivatives
(e.g., LTK-13, LTK-14, LTK-19) (Figure 7) that are selective for p300 (IC50 = 5-7 µM) and inactive
against PCAF [33]. However, these compounds tend to be poorly soluble and are unstable
because of facile oxidation of the isoprene moieties.
1.3.2 Irreversible HAT Inhibitors (Aryl and alkyl N-substituted Isothiazolones)
Aryl and alkyl N-substituted isothiazolone compounds (e.g., CCT077791) have been shown to
inhibit H3 and H4 acetylation by PCAF and p300 irreversibly [34]. Stimson et al. showed that a
series of isothiazolones, identified by high-throughput screening, inhibits HAT catalytic activity.
These compounds are also cell permeable and can reduce global acetylation as well as the
acetylation of specific histones (H3 and H4) and of non-histone proteins such as alpha-tubulin. In
this series of aryl and alkyl N-substituted isothiazolones, the inhibition is due to an irreversible
interaction with thiol groups (Figure 6). HAT inhibition by isothiazolones is abolished in the
presence of thiol-based reducing agents like dithiothreitol (DTT) or glutathione [34]. Furthermore,
HAT activity was not restored in experiments in which PCAF was incubated with the two
isothiazolones CCT077791 and CCT077792 and then dialyzed for 24 hours. The SAR study of this
series of compounds showed that their activity is related to the nature and the electron-withdrawing
or electron-donating properties of the substituents [34]. These properties strongly affect the kinetics
of cleavage of the sulfur-nitrogen bond in the isothiazolone ring. The compounds also seem to have
considerable off-target effects, which may be attributable to their high chemical reactivity towards
free thiol groups.
Figure 6. Proposed mechanism of the covalent binding of isothiazolones to thiol groups.
1.3.3 Other Synthetic HAT Inhibitors
Figure 7. Some structures of synthetic HAT inhibitors.
α-Methylene-γ-butyrolactones, like MB-3, are small-molecule HAT inhibitors of purely synthetic
origin (Figure 5). They were designed based on the known interactions between Ac-CoA and the
acetyl-acceptor Lys side chain of the macromolecular substrate [30]. Biel et al. developed MB-3, a
small, cell-permeable inhibitor of human GCN5. The compound contains an
α-methylene-γ-butyrolactone scaffold, which is a known substructure element in natural products. MB-3 shows
only weak inhibition of CBP (IC50 = 500 µM) and GCN5 (IC50 = 100 µM) [26]. Costi et al.
reported that cinnamoyl compounds are also inhibitors of p300 [35 (a)].
Recently, chemical modifications of Garcinol were used to develop p300-selective
inhibitors (Isogarcinol (IG) and LTK14) [35 (b)]. An SAR study in the same work examined
the binding to p300 and PCAF [35 (b)].
Trifluoromethyl phenyl benzamides have been found to modulate p300 [27].
Cycloalkylidene-(4-phenylthiazol-2-yl)hydrazone derivatives have been synthesized and have been identified as
capable of inhibiting growth of a GCN5 [35 (d)]. One of these derivatives, CTPH2, inhibited
GCN5. It has been confirmed that this compound targets the Gcn5p functional network
through an interacting protein [35 (e)].
Another way to inhibit HAT activity is to block the recognition of acetylated partners by targeting
the bromodomain [35 (f)]. The development of bromodomain inhibitors could be useful for
anti-HIV therapeutics. A series of selective ligands for the PCAF bromodomain has been
discovered recently [35 (g)].
1.4 Structural Overview of Serotonin Acetyltransferases AANAT
Figure 8. A view of the AANAT-inhibitor complex containing the four-stranded (β1-β4) β-sheet and showing the
bisubstrate analog bound in the active site. Side chains of the tryptamine-binding residues are displayed. GNAT motifs
C, D, A and B are color coded red, green, blue and magenta, respectively (adapted from [36]).
Melatonin is produced in the pineal gland on a circadian cycle and is involved in the regulation of
the biological clock in vertebrate organisms [37]. Circulating levels of melatonin rise and fall daily
under the control of an endogenous circadian clock. The biosynthesis of melatonin in the pineal
gland involves the conversion of 5-hydroxytryptamine (serotonin) to 5-hydroxy-N-acetyltryptamine
(N-acetylserotonin), catalyzed by the serotonin N-acetyltransferase (also named arylalkylamine N-
acetyltransferase AANAT). This step is followed by O-methylation to 5-methoxy-N-
acetyltryptamine (melatonin) catalyzed by 5-hydroxyindole O-methyltransferase (HIOMT). The
change in AANAT activity is the main factor that controls the rhythmic production of
melatonin. In contrast to AANAT, HIOMT is constitutively active and does not regulate the
circadian rhythm of melatonin.
AANAT belongs to the GCN5-related N-acetyltransferase (GNAT) family of proteins, which share
a common conserved structural domain [38]. In keeping with the similarity in function, the domain
originally evolved to bind CoA through conserved backbone regions and to facilitate acetyl
transfer to the substrate.
CoA binds between the backbone amides of the P-loop in the motifs A-D and a V-shaped cavity
created between two parallel strands (Figure 8). The exposed amide backbone within this cavity
binds to the alanylpantetheine backbone of CoA. Interestingly, the adenine moiety of the cofactor is
solvent-exposed and does not significantly contribute to the binding with the protein. In general, the
CoA binding site is buried within the protein and offers the possibility to bind small drug-like
molecules [36, 39].
Figure 9. Bisubstrate inhibitors that have been co-crystallized with AANAT (PDB codes 1KUY, 1KUV, and 1KUX
[36]).
Figure 10. Coenzyme A structure.
The pantetheine-pyrophosphate moiety forms extensive hydrogen-bonding contacts to main-chain
functional groups of residues LEU124, VAL126, and GLN132-SER137 of the conserved GNAT
motif A.
The two residues closest to the sulfur atom of CoA are TYR168 (3.1 Å from the cofactor's sulfur)
and GLU161 (3.8 Å from the cofactor's sulfur). The adenine and 3′-phosphate-ribose group of
CoA are present in two alternative conformations, which are stabilized by two different sets of
crystal contacts. The presence of two conformations is not surprising, because the adenine and
3′-phosphate moiety occur in various conformations in previously determined GNAT structures
(reviewed in [40]). The tryptamine moiety in the serotonin binding pocket is also found in two
alternative conformations (referred to as cis and trans), localized in the hydrophobic binding pocket
of serotonin formed by residues PHE56, PRO64, MET159, VAL183, and LEU186 [36, 39].
In AANAT, two histidine residues in β-strand 4 of motif A, HIS120 and HIS122, have been suggested [38]
to act as the general base in catalysis because of their proximity to the NH2
group of the bisubstrate analog (HIS120 is 7.5 Å and HIS122 is 8.7 Å away
from the substrate). However, site-directed mutagenesis showed that the Michaelis-Menten constant
(Km), but not the maximum rate of the catalytic reaction (Vmax), was affected by the HIS120 to GLN
and HIS122 to GLN mutations in ovine AANAT [39]. Another candidate for the role of catalytic
base in AANAT is GLU161 in the loop following strand S5. Mutation of this residue to alanine
does not affect enzymatic activity [41], which argues against this possibility. Thus, the
identity of the catalytic base in AANAT remains unknown. It is possible that several active-site
residues of AANAT can play the catalytic role, which would make site-directed mutagenesis results
difficult to interpret [41]. Furthermore, the pKa of the nucleophilic substrate amino group may be
lowered in the hydrophobic active site of AANAT, so that a catalytic base would be expected to have
only a small impact on the rate acceleration. A similar proposal has been made for ribosomal catalysis
of peptide bond formation [42].
Catalytic enhancement of the chemical step may result from stabilization of a tetrahedral complex
and/or activation of leaving group (CoA-SH) departure. Polarization of the thioester carbonyl group
as well as stabilization of a potential tetrahedral intermediate could be achieved by hydrogen
bonding of the thioester carbonyl group to the backbone of the hydrophobic residue localized in
beta-strand 4 (LEU124 in AANAT) [43].
Figure 11. Schematic representation of the interactions between AANAT and a bisubstrate inhibitor. The surrounding
residues in the cofactor and substrate binding pockets of AANAT are shown (PDB code 1KUX). Blue arrows denote
backbone hydrogen bonds, while green arrows denote side-chain hydrogen bonds. Blue areas indicate ligand
exposure, while residues with a light blue shadow indicate protein exposure.
Figure 12. Binding mode of the bisubstrate inhibitor 3 (see Figure 9) to the serotonin acetyltransferase AANAT (PDB
code 1KUX).
1.5 Structural Overview of PCAF HAT
Figure 13. Structure of the PCAF/CoA complex representing the general secondary structure of mammalian GNAT
family acetyltransferases and the location of the Ac-CoA binding site. The four domains of the protein are color-coded.
Motifs A-D and motif B (based on structural conservation) are colored blue and green, respectively. The N- and
C-terminal protein segments flanking the core are colored magenta and gold, respectively. CoA is colored red (adapted
from [44]).
In the PCAF crystal structure, CoA is bound in a conformation that forms an extensive set of protein
interactions, mediated predominantly by the pantetheine arm and the pyrophosphate group
[44], with motifs A-D and motif B (Figure 13). All but two groups of the 16-member pantetheine
arm-pyrophosphate chain make contacts with the protein. Most of the contacts are mediated
through either protein backbone hydrogen bonds or protein side-chain van der Waals contacts [44].
GNAT-conserved residues in PCAF motifs A and B interact extensively with CoA. Residues 580 and
582-587 in the β4-loop-α3 region of motif A make direct and water-mediated
hydrogen bonds with the pyrophosphate group [45]. THR587 makes a hydrogen bond to
a pyrophosphate oxygen. The aliphatic side chain of GLN581 and a CYS-ALA-VAL sequence
(residues 574-576) at the top of the β4-strand make van der Waals contacts with the aliphatic part
of the pantetheine arm [44] (see Figure 14 for details).
In addition, the backbone of CYS574 and VAL576 forms hydrogen bonds with the pantetheine arm.
Residues in the β5-loop-α4 region of GNAT motif B make van der Waals contacts with the
β-mercaptoethylamine segment of the pantetheine arm and thus play a major role in orienting the
reactive sulfhydryl group for the acetyl transfer [44] (Figure 13). Other protein residues involved in
the binding are ALA613, TYR616, and PHE617. TYR616 also makes van der Waals contacts with
the end of the pantetheine arm near the pyrophosphate group [44]. Residues GLN525 and LEU526,
which are located at the substrate-binding cleft, also make van der Waals contacts with the
pantetheine arm of coenzyme A. The proximity of these residues to the cofactor-substrate junction
suggests that they play an important role in substrate specificity and/or catalysis [16, 46] (Figure
14).
In the PCAF substrate-binding cleft, two residues are positioned to act as a general
base for catalysis via a ternary complex mechanism. These residues, GLU570 in the β4-strand
and ASP610 in the loop between the β5-strand and the α4-helix, are both located in the core domain
of PCAF and are strictly conserved within the GCN5/PCAF subfamily of histone acetyltransferases.
Mutational analysis strongly favors the catalytic involvement of GLU570, since mutation of the
corresponding residue in yeast GCN5 (GLU173) to alanine or glutamine debilitates
GCN5 activity in both transcriptional activation in vivo and histone acetylation in vitro [47,48]. In
contrast, mutation of the yeast counterpart of ASP610 in PCAF only slightly affects transcriptional
activation in vivo and histone acetylation in vitro [48, 49-51]. According to Clements et al. [44],
GLU570 exists in an ideal environment to play a catalytic role. Firstly, GLU570 is located
proximal to an acidic patch which forms an attractive surface for the basic lysine substrate.
Secondly, the carboxylate of GLU570 is surrounded by several hydrophobic residues
(PHE563, PHE568, ILE571, VAL572, LEU606, ILE637 and TYR640) that probably function to
raise the pKa of the glutamate side chain and thus facilitate proton extraction from the lysine
substrate. Thirdly, the carboxylate of GLU570 is only ~11.5 Å away from the putative position of
the reactive thioester of acetyl-coenzyme A [44] (Figure 15).
It was suggested that proton extraction may proceed directly through the carboxylate of
GLU570 or, alternatively, through a water molecule. This hypothesis is supported by the presence
of a water molecule tightly bound to the carboxylate oxygen of GLU570, close to the
coenzyme [44]. A further requirement for catalysis is the presence of a hydrogen bond
donor which stabilizes the tetrahedral intermediate. The potential hydrogen bond donor is the
backbone NH of CYS574, although in the presence of the bound substrate additional donors may
also exist (e.g. backbone NH groups of the histone or transcription factor substrate) [44].
Figure 14. Schematic representation of the interaction between Ac-CoA and the surrounding residues in the cofactor
binding pocket of PCAF (PDB code 1CM0). Blue arrows denote backbone hydrogen bonds, while green arrows denote
side-chain hydrogen bonds. Blue areas indicate ligand exposure, while residues with a light blue shadow
indicate protein exposure.
Figure 15. Binding mode of Ac-CoA at PCAF, showing residues that contribute to ligand binding (PDB code 1CM0).
2 Aim of the Work
In contrast to many inhibitors of nucleotide-dependent proteins, a small-molecule HAT inhibitor
does not need to mimic the adenine moiety, as the adenine ring is only loosely bound to the surface of
HATs. Therefore, there is less risk of non-selective binding to the multitude of nucleotide-binding
proteins (e.g. ATP-binding proteins). In addition, the conserved backbone interactions observed for
CoA may be exploited to achieve high-affinity binding, using a wide range of drug-like moieties such as
carboxylate, amide, or sulfonamide groups. The V-shaped cavity in HATs is buried and thus
provides a hydrophobic environment that is suitable for binding small drug-like molecules. The CoA
binding site of GNAT members is conserved, whereas considerable structural
differences are found in the substrate binding site. These regions have evolved to bind a broad
range of acetyl-group acceptors, including proteins and small-molecule substrates (histones,
cofactors, serotonin, etc.). Thus, to gain selectivity over other homologous proteins that bind
Ac-CoA, it is desirable for an inhibitor to span both sites or to interact with the substrate binding site.
In the current work, the focus was put on the docking analysis of known inhibitors of PCAF and the
related serotonin acetyltransferase AANAT, for which a series of potent inhibitors has been reported
recently. As all currently known PCAF inhibitors either have complex structures or are
natural products with an unknown binding mode, the rational discovery of drug-like inhibitors
still represents a challenge.
As there is high homology as well as structural similarity between the cofactor binding pockets of
PCAF and AANAT, the structures of recently identified AANAT inhibitors will be used as
templates to design novel PCAF inhibitors. To reach this goal, docking and virtual screening settings
will be tested to find optimal docking conditions for GNAT acetyltransferases. The knowledge
gained on AANAT will then be used to dock compounds identified by similarity searching into
the PCAF binding pocket.
A second focus will be given to the development of recently identified isothiazolones as irreversible
PCAF inhibitors. Different modelling techniques will be applied in order to get ideas to further
improve the activity of this series of compounds and to establish first structure-activity
relationships.
Besides the application of different computer-based methods to identify and develop small-molecule
PCAF inhibitors, a special focus will be placed on the evaluation of different docking and scoring
methods for the available ligand data sets. It is hoped that a systematic evaluation will improve the
quality of docking and virtual screening methods. These data could be helpful in further
optimizing PCAF inhibitor lead structures.
3 Computational Methods - Docking and Virtual Screening
Docking is a method which predicts the preferred orientation of one molecule relative to a
second one (usually a macromolecule) when they form a stable complex. Knowledge of the preferred
orientation may in turn be used to estimate the strength of association between the two molecules
using mathematical functions called scoring functions. In this way, docking plays an
important role in rational drug design.
Molecular docking can be used for three main purposes:
1) to predict the binding mode of a known active ligand.
2) to identify new ligands using virtual screening.
3) to predict the binding affinities of related compounds from a known series of actives.
The docking process can be divided into two parts: the search algorithm and the scoring
algorithm. These two algorithms address the two classical problems of the docking process,
the search problem and the scoring problem.
3.1 The Search Problem
The search algorithm should sample the degrees of freedom of the ligand/macromolecule
system sufficiently to include the true binding modes, while the scoring algorithm should
represent the thermodynamics of interaction to distinguish the true binding modes from all
others explored.
Treatment of ligand flexibility can be divided into three basic categories [52]:
- Systematic methods (incremental construction, conformational search, databases)
- Random or stochastic methods (Monte Carlo, genetic algorithms, tabu search)
- Simulation methods (molecular dynamics ab initio docking, energy minimization)
The evaluation and ranking of predicted ligand conformations is a
crucial step of structure-based virtual screening. Even when binding conformations are
correctly predicted, the calculations will not be successful if they cannot differentiate between
true binders and inactives.
3.2 The Scoring Problem
The scoring problem represents the second challenge for docking and virtual screening methods.
Virtual screening is used to identify new lead molecules. In every virtual screening run, molecules
must be docked into a protein site to obtain a predicted binding pose. The best-scoring pose
of each molecule is then selected to build a ranked hit list.
Scoring functions implemented in docking programs make various assumptions and
simplifications in the evaluation of modelled complexes and do not fully account for a number
of physical phenomena that determine molecular recognition, for example entropic effects.
Essentially, three types or classes of scoring functions are currently applied:
- Force-field-based scoring: (D-score [53], G-score [53], Gold [54], Autodock [55], Dock
[56]).
- Empirical scoring: (Ludi [57, 58], F-score [59], Chemscore [60], Score [61, 62], Fresno
[63], X-score [66]).
- Knowledge-based scoring: (PMF [67-69], DrugScore [67], SMoG [68]).
Consensus scoring combines information from different scores to balance errors in single
scores and improve the probability of identifying true ligands. An exemplary implementation
of consensus scoring is X-CSCORE [69, 70], which combines GOLD-like, DOCK-like,
Chemscore, PMF and FlexX scoring functions. However, the potential value of consensus
scoring might be limited, if terms in different scoring functions are significantly correlated,
which could amplify calculation errors, rather than balance them.
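The idea of combining several scores can be illustrated with a minimal rank-averaging sketch. This is not the X-CSCORE procedure; it is a generic "rank-by-rank" consensus, and the function names, compound names and score values are hypothetical.

```python
from statistics import mean

def rank_positions(scores, higher_is_better=True):
    """Map each compound to its 1-based rank under one scoring function."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {name: pos for pos, name in enumerate(ordered, start=1)}

def consensus_rank(score_tables):
    """Order compounds by their mean rank over several scoring functions.

    score_tables: list of (scores_dict, higher_is_better) pairs; every
    dict must contain the same compounds (assumption of this sketch).
    """
    ranks = [rank_positions(s, hib) for s, hib in score_tables]
    names = list(ranks[0])
    mean_rank = {n: mean(r[n] for r in ranks) for n in names}
    # Ties in mean rank are broken alphabetically to keep the result stable.
    return sorted(names, key=lambda n: (mean_rank[n], n))

# Hypothetical scores: a fitness score (higher is better), an energy-like
# score (lower is better) and a contact score (higher is better).
fitness = {"cpd_a": 62.1, "cpd_b": 55.0, "cpd_c": 48.3}
energy = {"cpd_a": -7.9, "cpd_b": -9.4, "cpd_c": -6.0}
contact = {"cpd_a": 30.0, "cpd_b": 41.0, "cpd_c": 35.0}
hit_list = consensus_rank([(fitness, True), (energy, False), (contact, True)])
```

If two of the three functions are strongly correlated, they effectively vote twice, which is the limitation noted above.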
In principle, fitness or scoring functions try to predict the free energy of binding of every
molecule being screened. In practice, the ranking we look for is the one that is
most compatible with the real binding energies. Docking results are often judged by the
enrichment of true hits among a larger number of molecules tested, which is determined
by the number of real actives in the hit list. The more true positives (real actives) and the fewer
false positives (decoys) appear in the top-scoring hit list, the better the enrichment of
this docking (virtual screening) run.
In any virtual screening, a few benchmarks and performance metrics should be
considered. Firstly, the root mean square deviation (RMSD) between a generated docking pose
and the experimental pose captured in the crystal structure should be considered. Usually the
absolute RMSD is used in docking to measure the distance between corresponding atom
pairs of two conformers. An acceptable docking run should reproduce
the experimental binding pose with an RMSD of less than 2 Å. Secondly, the suggested docking poses
should be inspected visually and judged rationally
by considering the quality of the interactions between the chemical groups of the ligands
and the important residues of the protein. In this step, molecular
surfaces with property maps (electrostatic energy map or van der Waals contacts) can be
essential. The molecular interaction fields for different chemical probes, created e.g. by the
GRID software, can be useful for selecting the best predicted binding mode.
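The RMSD criterion can be sketched as follows. This is a minimal illustration that assumes both poses list the same atoms in the same order and share the receptor coordinate frame, so no superposition is performed. Symmetry-equivalent atoms are not handled, and the function name and coordinates are hypothetical.

```python
import math

def pose_rmsd(coords_a, coords_b):
    """RMSD in Å between two poses given as lists of (x, y, z) tuples.

    Atoms are paired by position in the lists; the poses are compared
    in place, without any fitting of one pose onto the other.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must contain the same non-zero number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical three-atom ligand: the docked pose is the crystal pose
# shifted by 1 Å along x, so every atom pair is exactly 1 Å apart.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (2.5, 1.5, 0.0)]
reproduced = pose_rmsd(crystal, docked) < 2.0  # the 2 Å criterion from the text
```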
Subsequently, enrichment indexes can be calculated, such as the sensitivity (Se, true positive rate),
which is the ratio of the number of active molecules found by the virtual screening method to
the number of all active database compounds. The second index of enrichment is the
specificity (Sp, true negative rate), which is the ratio of the number of inactive
compounds that were not selected by the virtual screening method to the total number of
inactives in the whole database. One of the most widely used methods to describe
enrichment is the receiver operating characteristic (ROC) curve, which describes the sensitivity (Se) as a
function of (1-Sp). As Sp is the ratio of discarded inactives to the total inactives, 1-Sp is
the ratio of selected inactives, in other words the selected decoys (the false positive rate). The ROC curve is
plotted by using the scores of the actives as thresholds. For every threshold, the
numbers of decoys and actives within this cut-off are counted. The
ROC curve thus maps the distribution of actives and decoys according to their scores. This
method avoids the selection of an arbitrary threshold by considering all Se and Sp pairs for
each score threshold, which is an important advantage over other
enrichment indexes [71, 72].
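The construction of the ROC curve described above can be sketched in a few lines. As a simplifying assumption, every ranked compound (not only every active) is used as a threshold, which gives the same curve with additional horizontal steps. Tied scores are not treated specially. The function names and the example scores are hypothetical.

```python
def roc_points(scored, higher_is_better=True):
    """Return the ROC curve as a list of (1 - Sp, Se) points.

    scored: list of (score, is_active) pairs for all docked compounds.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=higher_is_better)
    n_actives = sum(1 for _, is_active in ranked if is_active)
    n_decoys = len(ranked) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, is_active in ranked:
        if is_active:
            true_pos += 1    # one more active selected: Se rises
        else:
            false_pos += 1   # one more decoy selected: 1 - Sp rises
        points.append((false_pos / n_decoys, true_pos / n_actives))
    return points

def roc_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical screen with three actives (True) and three decoys (False).
screen = [(9.1, True), (8.4, True), (7.7, False),
          (6.9, True), (5.2, False), (4.0, False)]
curve = roc_points(screen)
auc = roc_auc(curve)
```

An area of 0.5 corresponds to a random ranking and 1.0 to a perfect separation of actives from decoys.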
The most difficult challenge for docking is the accurate prediction of the binding affinities of
compounds, unless these compounds come from a single series. In the reported studies there was no
strong correlation between the ability of a docking program to produce a correct pose and its
success in a virtual screen. This difficulty can be attributed to the inherent danger of using a
single metric such as RMSD, as poses can be fundamentally correct despite a large
deviation in one part of the molecule. Another problem arises in cases
where the poses are barely in the correct binding site, or show a completely wrong binding mode,
and yet good enrichment is observed. Enrichment may be due to screening out compounds
that are wrong for the target rather than selecting those that are right. Clearly, the enrichment
indexes should be considered, but always together with visual inspection of the predicted binding mode
and its agreement with X-ray structures or with enzyme kinetic studies of the inhibition type
(competitive, non-competitive, or uncompetitive).
It is always a difficult task to obtain accurate predictions of binding affinities for a diverse set of
molecules. At its simplest level, this is a problem of subtracting large numbers,
inaccurately calculated, to get a small number. The large numbers are the interaction energy
between the ligand and protein on the one hand and the cost of bringing the two molecules out of
solvent and into an intimate complex on the other hand. The result of this subtraction is the
free energy of binding, which is the ultimate target in any drug design study [73]. The
problem arises from the condensed phases in which biology occurs and also from the many
degrees of freedom of biomolecules [74]. In water, and with highly flexible proteins and
ligands, accurate calculations are much more costly and error-prone. Additionally, as pointed
out by Tirado-Rives and Jorgensen [75], the window of activity, as they called it, is very
small. That means that there is only a small free energy difference, estimated to be just 4.5
kcal/mol, between the best possible ligand detected in a virtual screening study (potency ~
50 nM) and the experimental detection limit (potency approximately 100 µM). Among the
most accurate methods today are thermodynamic integration and free energy perturbation
methods, which can sometimes calculate the differences in affinity between related
molecules with an accuracy of about 1 kcal/mol [76, 77]. However, even these methods only compare
close analogues; they neither predict absolute binding affinities nor compare
affinities among diverse compounds.
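The size of this window can be checked with a short calculation. The sketch assumes that the quoted potencies can be treated as dissociation constants and that T = 298.15 K; neither assumption is stated in the text. The function name is hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)

def activity_window(k_best, k_limit, temperature=298.15):
    """Free energy gap R*T*ln(k_limit / k_best) in kcal/mol.

    k_best and k_limit are dissociation constants in the same units;
    using potencies in their place is an assumption of this sketch.
    """
    return R_KCAL * temperature * math.log(k_limit / k_best)

# 50 nM for the best detectable ligand versus a 100 µM detection limit.
window = activity_window(50e-9, 100e-6)
```

The ratio of the two potencies is 2000, and RT ln 2000 at room temperature is close to the 4.5 kcal/mol quoted above.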
3.3 Solvation Effects
Protein-ligand binding takes place in a salt-water environment. Such an environment has a strong
effect on the energetics of protein-ligand binding. Water has a dielectric constant of about 80,
whereas the dielectric constant of vacuum is 1. As a favorable interaction exists between a
charge and the high-dielectric environment, a one-body solvation energy arises for each atomic
charge [78]. As a consequence, there can be a substantial energy penalty for
moving the polar part of a ligand out of water and into the binding site.
Moreover, water screens the charge-charge interactions of fully
hydrated atoms by approximately 80-fold. However, atoms in a protein-ligand interface are
held apart from the solvent and therefore interact with an effective dielectric constant of less
than 80. In general, atoms that are further apart are more likely to interact
through solvent, and this idea led to a simple computational model, the crude
screening model.
The crude screening model consists of a distance-dependent dielectric. For two atoms i and j,
the dielectric is Dij = C Rij, where C is a constant often set to 4 and
Rij is the inter-atomic distance. This model captures
one chief effect of the solvent efficiently, and it is used in a number of ligand-protein
docking algorithms. However, this model is not sufficient to account for all solvation
effects [73].
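The effect of the distance-dependent dielectric on a single charge pair can be sketched as below. The conversion factor 332.0637 kcal Å/(mol e²) is the usual Coulomb constant for charges in units of e and distances in Å. The function names and the example charge pair are hypothetical.

```python
COULOMB_KCAL = 332.0637  # kcal*Å/(mol*e^2)

def coulomb_constant_dielectric(q_i, q_j, r_ij, dielectric=1.0):
    """Pairwise Coulomb energy (kcal/mol) with a fixed dielectric constant."""
    return COULOMB_KCAL * q_i * q_j / (dielectric * r_ij)

def coulomb_distance_dependent(q_i, q_j, r_ij, c=4.0):
    """Pairwise Coulomb energy (kcal/mol) with D_ij = C * R_ij.

    Because the dielectric grows with distance, the energy falls off
    as 1/R^2 instead of 1/R.
    """
    dielectric = c * r_ij
    return COULOMB_KCAL * q_i * q_j / (dielectric * r_ij)

# Hypothetical ion pair (+1 e and -1 e) separated by 4 Å.
e_vacuum = coulomb_constant_dielectric(1.0, -1.0, 4.0)
e_screened = coulomb_distance_dependent(1.0, -1.0, 4.0)
```

At 4 Å and C = 4 the effective dielectric is 16, so the attraction is sixteen times weaker than in vacuum, and the damping increases further with distance.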
In addition, the electrostatic interaction of two atoms is not only linked to their mutual
distance, it depends also on the positions of all the other protein and ligand atoms, because
these positions determine where the high-dielectric solvent can penetrate. Another important
effect of water is the hydrophobic effect, which is the tendency of water molecules to drive
non-polar solutes together [79]. This promotes the association of non-polar surfaces of the
ligand and the protein. The hydrophobic effect is often considered by an additional solvation
energy term that is proportional to molecular surface area, with a positive coefficient.
Two computational models have been developed to describe the electrostatic solvation effects
of water. The more precise model is the Poisson-Boltzmann (PB) model, while a faster but
less precise model is the Generalized Born (GB) approach. Combining the PB or GB
electrostatics model with a surface area term (to account for the hydrophobic effect) yields the
PBSA [80] and GBSA [81] solvation models, respectively.
These two models are called implicit solvent models because they do not treat any water
molecules explicitly during a simulation. The influence of solvent on binding can also be
treated with molecular dynamics (MD) or Monte Carlo (MC) simulations that include
thousands of explicit water molecules modeled with an empirical force field [82-84].
Dielectric screening, the solvation of polar groups, and the hydrophobic effect all emerge
automatically within this approach, but it is substantially more computationally demanding
than an implicit solvent model.
When ligand solvation is not considered in molecular docking, there is no penalty for placing
a charged ligand atom in a region where the receptor only weakly complements it. In this
situation, a highly charged molecule will be predicted to have a better interaction energy
than a true ligand, which bears less formal charge and is therefore estimated to have a
less favorable interaction energy with the receptor site [73].
When a charged molecule transfers from water to a binding site, it exchanges a high-dielectric
for a low-dielectric environment. When the cost of moving a charged species from a high- to a
low-dielectric environment is considered, the bias toward highly charged molecules is
eliminated [73].
In the same way, when non-polar solvation is not considered, larger molecules would
typically receive better scoring values than they should receive. In the docking poses, these
molecules often have fragments that are poorly complemented by the binding site. To solve
this problem, taking the hydrophobic effect into account and considering the non-polar
solvation (estimated by the loss of molecular surface) could result in better estimations. In this
case, molecules that make few favorable interactions with the enzyme would be disfavored
relative to molecules that are well complemented by the binding site. The non-polar solvation
term acts as a balance to the van der Waals term in the interaction energy, leading to
complexes with a higher proportion of interacting surfaces [73].
In summary, ignoring the electrostatic component of ligand solvation ranks
compounds with high formal charges above the known neutral inhibitors of the enzyme.
Likewise, ignoring the non-polar component of ligand solvation biases the results towards larger
compounds that do not complement the binding site as well as the known, smaller ligands [73].
3.4 Solvation Effects and Scoring Functions
The GOLD program [85-87] (Genetic Optimization of Ligand Docking) uses a genetic
algorithm (GA) to find an optimal ligand conformation for a given protein target and
evaluates poses with a fitness function (Goldscore, Chemscore or ASP score). The Goldscore
fitness function is force-field-based and includes a directional hydrogen-bonding (H-bonding) term, a
soft van der Waals (vdw) potential term, and an internal energy term. The interesting features
of this function are the additional H-bonding term, the indirect consideration of desolvation
through the H-bonding term, and the evaluation of internal energies.
The LUDI program differentiates between ionic bonding energy and H-bond energy and
also contains a term accounting for the entropic effect, i.e. the contribution due to freezing out
rotational degrees of freedom upon binding.
The FlexX software [59] later modified the LUDI scoring function by replacing the hydrophobic
interaction term (van der Waals forces, abbreviated as vdw) with two terms, one for
ligand-receptor aromatic contacts and another for other hydrophobic interactions. Additionally, the
coefficients of the other terms were re-calibrated using a set of 19 complexes [59]. Examples
of other empirical scoring functions are Chemscore [60], Fresno [63], Score [61], and the
scoring function of Hammerhead [88, 89]. These scores differ only in their weights or
geometric constraints (which affect the penalty function that accounts for deviations from
ideal H-bond geometry).
Finally, there is the scoring function of Autodock [55, 90], which combines
force-field-like and empirically based attributes. Its first three
terms are similar to those of a molecular-mechanics force field (vdw, H-bonds, and electrostatics), but in this
instance they are weighted by empirical factors. The last two terms represent the
entropic contribution and the protein-ligand solvation penalty.
An intermediate but practical approach to address solvation effects is to treat the solvent as a
continuum dielectric medium [91, 92]. Shoichet et al. chose to use continuum
electrostatics to evaluate the ligand solvation term, assuming that the ligand is completely
desolvated upon binding and that every ligand desolvates the protein equally [73]. They started
with the DOCK energy function and added separate electrostatic and non-polar corrections for
ligand solvation as determined by the program HYDREN [93]. In spite of all the approximations
in this method, this simple implementation had a considerable effect on the ranking of the
known actives and on the size and charge of the other ligands populating the top of the hit list [73].
The scoring functions discussed above aim to approximate the important contributions to the
free energy of binding in a manner consistent with the demands of high-throughput docking
(HTD). Most of the terms added to these functions to address the effects of solvation are
included to capture the qualitative effects in an easily implemented atom-based fashion (i.e.,
weight down pairwise Coulombic interactions, penalize buried polar groups, reward buried
hydrophobic interactions).
Recently, some researchers have used more rigorous, physics-based approaches to capture the
effects of solvent in an HTD scoring function (i.e. continuum electrostatics). However, to speed
up the calculations, these approaches still use an approximate continuum
electrostatics method such as Generalized Born (GB) and take algorithmic shortcuts. With the
availability of faster methods to calculate Poisson-Boltzmann (PB)-based electrostatics, it is
possible to use a full solvation-based HTD scoring function.
To get more precise results using the Poisson-Boltzmann/Surface Area (PBSA) implicit model,
electrostatic (Coulombic + solvation) energies can be calculated using ZAP,
OpenEye's library for PBSA implicit solvation calculations [94]. In this method,
solutions to the Poisson-Boltzmann equation are obtained using an exponentially switched
atomic Gaussian function to represent the dielectric boundary, such that the dielectric constant
varies smoothly from ε = 2 for the molecular region to ε = 80 for the solvent [95,96,97].
Atomic charges are calculated on a grid with 0.5 Å spacing. Electrostatic solvation energies
ΔGelec are then obtained by summing the product of every atom's charge and the potential
over all atoms and subtracting out the self-energy and Coulombic terms.
The apolar contribution to desolvation is calculated using ΔGap = γΔA, where ΔA is the total
loss of solvent-exposed surface area of the protein and ligand upon forming a complex (also
calculated using ZAP). The quantity γ (= 47 cal/mol/Å²) was chosen such that ΔGap
represents the difference (complex vs. protein + ligand) in transfer energy from a low-dielectric
environment (such as an alkane solvent or a binding site in a protein) with ε = 2 to
water with ε = 80 [98, 99]. The following equation can then be used, where ΔG_solv denotes the
electrostatic solvation energy ΔGelec defined above:
\[ \Delta G^{solv}_{bind} = E^{gas}_{vdw} + E^{gas}_{elec} + \Delta G_{solv} + \Delta G_{ap} \]
The last equation states that the binding energy in solvent equals the van der Waals and electrostatic
energies in the gas phase plus the electrostatic solvation contribution plus the solvent's apolar contribution.
Summing the electrostatic and area-loss contributions can enhance the correlation with the
observed potency. If the area-loss term is ignored, calculations comparing the binding affinities
of dissimilar ligands will be biased towards overly charged and overly large molecules.
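How the terms combine can be shown with a small numerical sketch. It assumes a sign convention in which the area change is taken as complex minus (protein + ligand), so that buried surface gives a negative apolar term; this convention and all numerical inputs are hypothetical and are not taken from the cited work. The energies would in practice come from a PBSA program such as ZAP, which is not called here.

```python
GAMMA_KCAL_PER_A2 = 0.047  # 47 cal/mol/Å^2, the value quoted in the text

def apolar_term(sasa_complex, sasa_protein, sasa_ligand,
                gamma=GAMMA_KCAL_PER_A2):
    """Apolar contribution gamma * dA in kcal/mol.

    dA = SASA(complex) - [SASA(protein) + SASA(ligand)] in Å^2; it is
    negative when surface is buried (assumed sign convention).
    """
    delta_area = sasa_complex - (sasa_protein + sasa_ligand)
    return gamma * delta_area

def binding_energy_in_solvent(e_vdw_gas, e_elec_gas, dg_solv_elec, dg_apolar):
    """Sum of the four terms of the binding energy expression (kcal/mol)."""
    return e_vdw_gas + e_elec_gas + dg_solv_elec + dg_apolar

# Hypothetical inputs: 700 Å^2 of surface buried on complex formation,
# favourable gas-phase terms and an unfavourable desolvation term.
dg_ap = apolar_term(sasa_complex=9400.0, sasa_protein=9600.0, sasa_ligand=500.0)
dg_bind = binding_energy_in_solvent(-45.0, -60.0, 75.0, dg_ap)
```

The example illustrates the cancellation discussed above: a large favourable gas-phase electrostatic term is offset by a desolvation penalty of similar size.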
3.5 Effects of Rescoring Docking Hits using MM-GBSA or MM-PBSA Methods
One of the first applications of molecular mechanics-Poisson-Boltzmann surface area
(MM-PBSA) scoring was the study of Wang et al., a hierarchical technique that used
an initial database screening followed by MM-PBSA rescoring to find HIV-1 reverse transcriptase
inhibitors [100]. An initial docking screen with subsequent rescoring by a molecular
mechanics-generalized Born surface area (MM-GBSA) method has recently been used to
improve the enrichment of known ligands for several enzymes [101-105].
MM-PBSA and MM-GBSA methods involve minimization and often dynamic sampling of
the protein-ligand complexes, and include ligand and receptor conformational energies and
strain. They evaluate the electrostatic and solvation components of the binding energy by PB
or GB methods, including the desolvation of both ligand and receptor. The MM-GBSA
binding energy is determined as E(complex) - E(receptor) - E(ligand), where E is an energy
estimate obtained with the GBSA solvation model [102]. With this implicit-solvent energy estimate,
solute configurational entropy effects are completely ignored.
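The energy difference can be written out as a minimal sketch. It assumes that the GBSA energies of complex, receptor and ligand have already been computed for one or more snapshots by an external program; averaging over snapshots is an assumption of the sketch, and the function name and energy values are hypothetical.

```python
from statistics import mean

def mmgbsa_binding_energy(e_complex, e_receptor, e_ligand):
    """E(complex) - E(receptor) - E(ligand) in kcal/mol.

    Each argument is a list of GBSA energies, one per snapshot; for
    rescoring a single minimized pose, pass one-element lists.
    No configurational entropy term is included.
    """
    return mean(e_complex) - mean(e_receptor) - mean(e_ligand)

# Hypothetical energies for two snapshots of one docked ligand.
dg_estimate = mmgbsa_binding_energy(
    e_complex=[-5230.0, -5226.0],
    e_receptor=[-5100.0, -5098.0],
    e_ligand=[-92.0, -90.0],
)
```

The result is a small difference between numbers in the thousands, which is why errors in the individual energies weigh so heavily.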
There are three main limitations in these methods:
1) The force fields and solvation energies are not uniformly accurate.
2) For reasons of computational efficiency, only a small part of configuration space near the
docking starting pose can actually be explored.
3) Configurational entropy effects are ignored.
In spite of these limitations, the MMGBSA rescoring methods represent a substantially
higher level of scoring methodologies than that applied by most docking programs and are
attractive alternatives to the more complete computationally-expensive methods of the energy
calculation like free-energy perturbation and thermodynamic integration [106-108]. The
Computational Methods 35
principal improvement conferred by MM-GBSA rescoring over docking is the inclusion of
receptor binding site relaxation and the induced fit of the docking solutions.
Consequently, this induced fit can improve the rank of larger ligands that would be
missed by rigid receptor docking.
The structural relaxation with MM-GBSA performed well when the initial docking geometry
resembled the crystallographic pose, but it could do little when ligand binding provoked large
protein conformational changes or when the docked binding mode was far from the
crystallographic binding mode. In most cases this relaxation led not only to improved
rankings but also to improved geometries. For many ligands, RMSD values between the
MM-GBSA predictions and the crystallographic results declined relative to those of the
docking predictions and, especially in hydrophilic or anionic cavities, many ligands refined
by MM-GBSA had improved hydrogen bonding to the site. However, this rescoring method
could not rescue the wrong docking solutions for some false negatives (missed hits) [102].
By allowing the receptor to respond to ligand binding, one allows for new and potentially
unfavorable receptor conformations. These must be distinguished by the MM-GBSA energy
functions from the true low-energy conformations that may be sampled in solution. This is a
challenging task, as the receptor conformational energies are large and the errors in these
calculations are typically of the same order as the net interaction energy of the
protein-ligand complex. Although some of the errors are cancelled by subtraction of the
internal energies before and after ligand binding, one is still subtracting two large numbers
with relatively large errors to find a small one, the net binding free energy. Consistent with
this view, ligands achieved their maximal advantage over decoys on rescoring when only a
5 Å region around the binding site was allowed to relax [102].
Nevertheless, relaxing the entire system is the more physically correct way to calculate these
energies [102]. Our own results indicate that rescoring with only minimal binding site
relaxation is less able to distinguish real actives from false positives (data not
shown).
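The problem of subtracting two large, uncertain numbers can be illustrated with standard error propagation. The sketch below is not part of the rescoring protocol: the energies, their uncertainties and the correlation coefficient are hypothetical, and the errors are assumed to be Gaussian.

```python
import math


def difference_with_error(a, sigma_a, b, sigma_b, correlation=0.0):
    """Return (a - b, uncertainty of a - b) for two uncertain quantities.

    correlation is the assumed correlation coefficient between the two errors;
    a positive value models the partial cancellation of errors on subtraction.
    """
    diff = a - b
    variance = sigma_a**2 + sigma_b**2 - 2.0 * correlation * sigma_a * sigma_b
    return diff, math.sqrt(max(variance, 0.0))


# Hypothetical totals in kcal/mol: bound state versus separated receptor + ligand.
bound, unbound = -7023.0, -6997.0

# Independent errors of 5 kcal/mol on each total.
print(difference_with_error(bound, 5.0, unbound, 5.0))

# Strongly correlated errors largely cancel in the difference.
print(difference_with_error(bound, 5.0, unbound, 5.0, correlation=0.9))
```

With independent errors the uncertainty of the difference (about 7 kcal/mol) is a sizeable fraction of the computed binding energy itself, whereas correlated errors shrink it to about 2 kcal/mol. Restricting relaxation to a small region is one way to keep the errors of the two states correlated.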
Additionally, some changes in polarity (due to certain substituents) could increase the
solvation cost, but this might not be captured by the GBSA model. The challenges of balancing
ligand electrostatic interaction energies and desolvation penalties were also apparent in an anionic
cavity [102]. Overall, the results of MM-GBSA rescoring of docking hit lists on the model
binding sites seem mixed. On the one hand, rescoring could:
1) rescue many docking false negatives,
2) improve the geometric fidelity of most of the predicted structures,
3) increase the diversity of the hit lists.
PBSA scoring as implemented in DOCK6 is currently considered one of the best rescoring
methods. This method has proven to be very efficient in increasing enrichment factors in
spite of the approximations that it contains. Our reliance on fixed ligand conformations is
another source of error in this work; the results could be improved by allowing ligand
conformational flexibility [109]. In addition, the PBSA scoring calculations do not
correct for lost degrees of rotational and translational freedom on binding, nor do they
consider gains in vibrational entropy of the system on ligand binding.
Moreover, it has not been investigated how the terms (van der Waals, electrostatics, and
surface loss) should be combined. In some cases, the desolvation penalty for hydrogen-bonding
groups may be inadequate [73]. The failure to adequately penalize neutral
polarity may also stem from the use of an inductive method for calculating partial atomic
charges [110]. Using quantum mechanically derived partial atomic charges may therefore
improve matters [73, 111].
Recently, several studies have shown good results in the application of MM-GBSA and MM-PBSA
rescoring methods [112-114]. Even in the absence of more intensive, detailed energy
evaluation schemes, it is clear that fairly simple considerations can dramatically improve the
ability to distinguish binders from non-binders. One possible improvement is to calculate
desolvation penalties that reflect the degree of burial for each orientation of each
ligand [73]. Correcting for solvation helps to recognize more true inhibitors and fewer
decoys in virtual screening against receptors of known structure.
3.6 Docking Programs and Rescoring Methods
GOLD 4.0 (Genetic Optimization for Ligand Docking) is an automated ligand docking
program that uses a genetic algorithm for flexible ligand docking to a fixed protein structure.
Three different fitness and scoring functions are available with GOLD: Goldscore,
Chemscore, and ASP score.
Another important feature of GOLD is the ability to apply several types of docking
constraints:
1) Distance constraint, for use with individual ligands.
2) Substructure based distance constraint, for use with multiple ligands that have a common
substructure or functional group.
3) Hydrogen bond constraint, for specifying a hydrogen bond between a particular ligand
atom and a particular atom in the protein.
4) Protein hydrogen bond constraint, for specifying that a particular protein atom should be
hydrogen-bonded to the ligand, but without specifying to which ligand atom.
5) Region (hydrophobic) constraint, for biasing the docking towards solutions in which
particular regions of the binding site are occupied by specific ligand atoms or types of ligand
atom.
6) Template similarity constraint, for biasing the conformation of docked ligands towards a
given solution, or template.
7) Scaffold constraint, to place a ligand fragment at an exact specified position in the binding
site.
The protein hydrogen bond constraint can be used efficiently to find a universal setting that
enables virtual screening runs with high enrichment factors and energetically
preferred binding modes.
3.6.1 PBSA Scoring using ZAP Library and AMBER-score
The ZAP library is a Poisson-Boltzmann toolkit provided by OpenEye. The Poisson equation in
this approach describes how electrostatic fields change in a medium of varying dielectric,
such as an organic molecule in water. The Boltzmann modification takes into consideration the
effect of mobile charges, e.g. salt. PB is an effective way to simulate the effects of water in
biological systems. It relies on a charge description of a molecule, the designation of low
(molecular) and high (solvent) dielectric regions and a description of an ion-accessible
volume, and produces a grid of electrostatic potentials. From this, transfer energies between
different solvents, binding energies, pKa shifts, pIs, solvent forces, electrostatic descriptors,
solvent dipole moments, surface potentials and dielectric focusing can be calculated. As
electrostatics is one of the two principal components of molecular interaction (the other, of
course, is shape complementarity), ZAP is OpenEye's attempt to calculate the
electrostatic energy as precisely as possible.
The AMBER-score combines AMBER molecular mechanics with implicit solvation, molecular
dynamics simulation, receptor flexibility, and conjugate gradient minimization. It implements
molecular mechanics implicit solvent simulations with the traditional all-atom AMBER force
field for protein atoms and the general AMBER force field (GAFF) for ligand atoms. The
interaction between the ligand and the receptor is represented by the sum of the electrostatic
and van der Waals energy terms; in addition, the solvation energy is calculated using a
Generalized Born (GB) solvation model. The user can choose one of the following GB models:
(i) the Hawkins, Cramer and Truhlar pairwise GB model with parameters described by Tsui and
Case (gb=1) [115], (ii) the Onufriev, Bashford and Case model, GB (OBC) (gb=2) [116], and
(iii) a modified GB (OBC) (gb=5) [75]. The surface area term is derived using a fast LCPO
algorithm [117].
The AMBER-score is calculated as:
E (Complex) - [E (Receptor) + E (Ligand)]
where E (Complex), E (Receptor), and E (Ligand) are, respectively, the internal energies of the
complex, receptor, and ligand (all solvated) as approximated by the AMBER force field with
GBSA solvation terms. The calculation of each of these three energies uses the same protocol:
minimization with a conjugate gradient method is followed by MD simulation (Langevin
molecular dynamics at constant temperature), another minimization, and a final energy
evaluation. The user can specify the number of pre-MD-minimization cycles, the number of
MD simulation steps, and the number of post-MD-minimization cycles in the dock input file.
During the final energy evaluation, a surface area term is included. The receptor energy is
determined once. The AMBER-score energy protocol is performed for every ligand and its
corresponding complex.
3.6.2 CScore
Scoring functions can be adapted from force field approaches, estimating the enthalpy of
binding via the pair-energy of the complex. Other functions estimate the entropy of
binding, incorporating terms for desolvation and loss of conformational flexibility.
While such functions are more chemically appealing, they require significantly more
statistical fitting than those based on force fields. FlexX is an example of this second
approach. Statistically-fit functions are dependent on their training set. Each author has
tried to make this as general as possible, but concerns remain as to the extensibility of
these functions to new systems. Since each scoring function has been derived from a
different set of crystal structures, it is reasonable to use multiple functions when
evaluating a protein-ligand pair.
According to the principle of consensus scoring, structures that are considered good fits
by multiple scoring functions can be examined further, while those that are not can be
dropped. The CScore approach is available in Sybyl 7.3 and 8.1 for consensus scoring in
virtual high-throughput screening [118]. CScore provides several functions:
- G_score, based on the work of Willett's group.
- D_score, based on the work of Kuntz et al.
- PMF_score, based on the work of Muegge and Martin.
- Chemscore, based on the work of Eldridge, Murray, Auton, Paolini, and Mee.
The consensus can be generated from any combination of these or other previously calculated
scores. The FlexX scoring function can be added to CScore if a FlexX license is
available.
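The consensus idea can be sketched as a simple rank vote. This is a generic illustration and not the algorithm used inside Sybyl: the function name, the top-fraction threshold and the scores are hypothetical, and every score is assumed to be oriented so that a lower value means a better fit.

```python
from collections import Counter


def consensus_votes(scores, top_fraction=0.5):
    """Count, per compound, how many scoring functions rank it in their top fraction.

    scores maps a scoring-function name to {compound_id: score}.
    Assumption: every score is oriented so that lower means better.
    """
    votes = Counter()
    for per_compound in scores.values():
        ranked = sorted(per_compound, key=per_compound.get)  # best (lowest) first
        n_top = max(1, int(len(ranked) * top_fraction))
        votes.update(ranked[:n_top])
    # Compounds that never reach a top fraction are listed with zero votes.
    for per_compound in scores.values():
        for compound in per_compound:
            votes.setdefault(compound, 0)
    return dict(votes)


# Hypothetical scores for four docked compounds.
scores = {
    "G_score": {"cpd1": -210.0, "cpd2": -150.0, "cpd3": -90.0, "cpd4": -60.0},
    "D_score": {"cpd1": -120.0, "cpd2": -80.0, "cpd3": -130.0, "cpd4": -40.0},
    "PMF_score": {"cpd1": -55.0, "cpd2": -20.0, "cpd3": -10.0, "cpd4": -60.0},
}

votes = consensus_votes(scores)
# Keep compounds favoured by at least two of the three functions.
keep = [compound for compound, n in votes.items() if n >= 2]
print(votes, keep)
```

In this toy data set only cpd1 is ranked in the top half by more than one function, so it is the only compound carried forward.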
3.7 Similarity Search
Molecular fingerprints can be used to search large compound databases for molecules that are
structurally related to a given search query. Several fingerprint systems are implemented in
Chemical Computing Group's Molecular Operating Environment (MOE) [119]. Each
fingerprint system supports a number of similarity metrics and uses a different
representation. The most important fingerprint systems are:
1) MACCS Structural Keys (feature list version). Each feature indicates the presence of
one of the 166 public MDL MACCS structural keys computed from the molecular
graph. The fingerprint is represented as a sparse list of keys present in the molecule.
2) Bit MACCS: MACCS Structural Keys (bit packed version). Each feature indicates
the presence of one of the 166 public MDL MACCS structural keys calculated from
the molecular graph. The fingerprint is a dense bit vector of feature bits 6 words long.
3) Protein Ligand Interactions Fingerprints: Each feature represents a protein-ligand
interaction type, e.g. hydrogen bond or ionic interaction.
4) piDAPH3: 3-point pharmacophore based fingerprint calculated from a 3D
conformation. Each atom is given one of 8 atom types computed from 3 atomic
properties: "in pi system", "is donor", "is acceptor". Anions and cations are not
represented. Then, all triplets of atoms are coded as features using the three inter-
atomic distances and three atom types of each triangle. The resulting fingerprint is
represented as a sparse feature list.
5) piDAPH4: 4-point pharmacophore based fingerprint calculated from a 3D
conformation. Each atom is given one of 8 atom types computed from 3 atomic
properties: "in pi system", "is donor", "is acceptor". Anions and cations are not
represented. Then, all quadruplets of atoms are coded as features using the six inter-
atomic distances, four atom types and chirality of each quadruplet. The resulting
fingerprint is represented as a sparse feature list.
6) GpiDAPH3: 3-point pharmacophore based fingerprint calculated from the 2D
molecular graph. Each atom is given one of 8 atom types computed from 3 atomic
properties: "in pi system", "is donor", "is acceptor". Anions and cations are not
represented. Then, all triplets of atoms are coded as features using the three graph
distances and three atom types of each triangle. The resulting fingerprint is represented
as a sparse feature list.
A Tanimoto similarity search can then be carried out using MOE. The Tanimoto similarity
module calculates the similarity value of each target molecule with respect to one or more
reference molecules using a molecular fingerprint system. The Tanimoto similarity coefficient
is defined by the expression:
Similarity = Nab / (Na + Nb + Nab)
where Nab is the number of fingerprint bits present in both the reference and the target
molecule, Na is the number of fingerprint bits present only in the reference molecule, and Nb
is the number of fingerprint bits present only in the target molecule. The Tanimoto similarity
index ranges from zero (no common bits) to one (exactly the same bits).
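The expression above translates directly into a few lines of code. The sketch below is independent of MOE and represents each fingerprint as the set of indices of its "on" bits. The function name and the example bit sets are hypothetical, and returning zero for two empty fingerprints is a convention chosen here, not something the definition prescribes.

```python
def tanimoto(reference_bits, target_bits):
    """Tanimoto similarity, Nab / (Na + Nb + Nab), for two sets of 'on' bits."""
    reference, target = set(reference_bits), set(target_bits)
    n_ab = len(reference & target)  # bits present in both molecules
    n_a = len(reference - target)   # bits present only in the reference
    n_b = len(target - reference)   # bits present only in the target
    denominator = n_a + n_b + n_ab
    # Convention (assumption): two empty fingerprints get similarity 0.
    return n_ab / denominator if denominator else 0.0


# Hypothetical fingerprints given as indices of the bits that are set.
reference = {1, 2, 3, 4}
target = {3, 4, 5}
print(tanimoto(reference, target))  # 2 shared bits / (2 + 1 + 2) = 0.4
```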
3.8 ZINC Compound Library
ZINC is a free database of commercially available compounds for virtual screening, provided
by the University of California, San Francisco. ZINC currently includes over 8 million
purchasable compounds in ready-to-dock, 3D
formats. ZINC 8 is currently available on-line for download (http://zinc.docking.org). It is
built from the catalogs of ten major compound vendors and is updated periodically
by deleting unavailable compounds, updating the vendor lists, or adding new
chemical vendors. Of these 8 million compounds, 5 million are
Lipinski compliant [120], with the caveat that Molinspiration's LogP has been used as a
surrogate for cLogP. Of these, 1.1 million are lead-like molecules, which are defined as
having molecular weight between 150 and 350, calculated LogP less than four, number of
hydrogen-bond donors less than or equal to three, and number of hydrogen-bond acceptors
less than or equal to six. A total of 63 thousand molecules are fragment-like, defined as having:
- calculated LogP values between -2 and 3
- fewer than three hydrogen-bond donors
- fewer than six hydrogen-bond acceptors
- fewer than three rotatable bonds
- molecular weight less than 250
3.9 Fragment-based Drug Design
Knowledge of how well a given fragment binds to a protein target allows us to optimize the
hits by growing the fragments, or even to find new leads by combining and linking
different fragments. The main benefit of using fragments rather than small molecules is the
notable reduction in the size of the search space, as fragments contain fewer atoms.
The fragment universe is much smaller in size than the chemical universe of small molecules.
The size of the chemical universe of compounds below 160 Da is estimated to be about 14
million compounds [121]. Screening a fragment library of 10,000 compounds therefore captures
a substantially larger share of chemical diversity space than a conventional high-throughput
screen.
An additional factor working in favour of fragment-based screening is the hypothesis
proposed by Hann and co-workers [122], which states that less complex molecules
should show higher hit rates against protein targets. As a result, even though a typical
fragment screen explores much less than 1% of the available low-molecular-mass
universe, the ability to find leads is substantially higher, which increases the value
of the screen. This theoretical model has recently been validated by the Novartis group [123],
who observed hit rates for fragment screens that were 10 to 1,000 times higher than those of
conventional high-throughput screens.
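The coverage figure quoted above follows from simple arithmetic. The sketch uses the two numbers given in the text (a library of 10,000 fragments and a universe of about 14 million compounds below 160 Da) and treats them as directly comparable, which is a simplification; the function name is hypothetical.

```python
def coverage_percent(library_size: int, universe_size: int) -> float:
    """Percentage of a chemical universe that a screening library samples."""
    return 100.0 * library_size / universe_size


# Numbers taken from the text: 10,000 fragments against ~14 million compounds.
print(f"{coverage_percent(10_000, 14_000_000):.3f} %")
```

The result is roughly 0.07%, consistent with the statement that a typical fragment screen explores much less than 1% of the low-molecular-mass universe.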
4 Implementation
4.1 Molecular Modeling
Seven X-ray crystal structures are reported in the Protein Data Bank for mammalian AANAT.
The three protein-ligand structures with the highest crystallographic resolution (PDB codes
1CJW, 1KUV and 1KUX) represent suitable targets for virtual screening purposes [35]. All
three structures contain a potent bi-substrate inhibitor of AANAT. 1KUX (resolution 1.8 Å)
has been selected for the current virtual screening study. The coordinates of the protein were
extracted from the corresponding pdb file. The inhibitor was removed, an