
Discovery and Molecular Modeling of Small Molecule Inhibitors of the Histone Acetyltransferase PCAF

Virtual Screening and Drug Design

Diploma Thesis

Medicinal and Pharmaceutical Chemistry Computational Drug Design

Institute of Pharmacy

Naturwissenschaftliche Fakultät I

Martin-Luther-Universität Halle-Wittenberg

Pharmacist Suhaib Shekfeh

From Homs/Hama - Syria

Referees:

1. Prof. Dr. Wolfgang Sippl (MLU Halle-Wittenberg)

2. Prof. Dr. Manfred Jung (Albert-Ludwigs-Universität Freiburg)

List of Abbreviations
1 Introduction: Epigenetics and HAT
    1.1 Histone Acetyltransferases: Role in Epigenetics, Structure, and Classification
    1.2 Histone Lysine Acetyltransferase - Catalytic Mechanism
    1.3 HAT Modulators: Chemical Regulation of Acetyltransferases
        1.3.1 Non-peptidic Natural Product HAT Inhibitors
        1.3.2 Irreversible HAT Inhibitors (Aryl and alkyl N-substituted Isothiazolones)
        1.3.3 Other Synthetic HAT Inhibitors
    1.4 Structural Overview of Serotonin Acetyltransferases AANAT
    1.5 Structural Overview of PCAF HAT
2 Aim of the Work
3 Computational Methods - Docking and Virtual Screening
    3.1 The Search Problem
    3.2 The Scoring Problem
    3.3 Solvation Effects
    3.4 Solvation Effects and Scoring Functions
    3.5 Effects of Rescoring Docking Hits using MM-GBSA or MM-PBSA Methods
    3.6 Docking Programs and Rescoring Methods
        3.6.1 PBSA Scoring using ZAP Library and AMBER-score
        3.6.2 Cscore
    3.7 Similarity Search
    3.8 ZINC Compound Library
    3.9 Fragment-based Drug Design
4 Implementation
    4.1 Molecular Modeling
    4.2 Dataset (Test Set) for Docking and Enrichment Studies
    4.3 Docking Optimization
    4.4 Isothiazolones - Covalent Docking
    4.5 Fragment Docking (Fragments Derived from the ZINC Database)
    4.6 PCAF in vitro Assay
5 Results and Discussion
    5.1 Optimization of the GOLD Docking Procedure for AANAT
        5.1.1 Reproducing the Binding Mode of AANAT Ligands
        5.1.2 Scoring AANAT Inhibitors
        5.1.3 Evaluation of Further Scoring Methods
    5.2 Virtual Screening and Experimental Validation of Selected PCAF Hits
    5.3 Covalent Docking of Isothiazolones
    5.4 Fragment-based Drug Design
6 Conclusions
7 References

Declaration of Authorship

I hereby confirm that I have authored this thesis independently and without the use of resources other than those indicated. All passages that are taken literally or in a general manner from publications or other sources are marked as such.

Suhaib Shekfeh

Halle (Saale), Germany, 15 May 2009

Acknowledgement

I would like to thank:

- My advisor and the first referee of this work, Prof. Dr. habil. Wolfgang Sippl, for providing academic advice and for giving me the opportunity to work on such an interesting project.

- My second advisor, Prof. Dr. Manfred Jung (Freiburg University), for reading and correcting this work and for performing the in vitro assay of PCAF inhibition in his laboratory.

- My family, especially my mother and my grandmother in Syria, for all the support they have given and continue to give me.

- All the colleagues in the group of Prof. Sippl (Rene, Urszula, Ralf, German, Mark, Martin, Kanin) for all their help and for providing a friendly environment.

- All my friends in Germany, and especially in Halle (Saale).

List of Abbreviations

Ada = Adaptor protein

Ac-CoA = Acetyl-Coenzyme A

AANAT = Arylalkylamine N-Acetyltransferase = Serotonin N-Acetyltransferase

ASP = Astex Statistical Potential

BAX = Bcl2 associated X protein

BEAR = Binding Estimation After Refinement

bp= base pairs (of nucleotides)

CBP = CREB-binding protein

CDK = Cyclin-Dependent Kinase

CREB = cAMP Response Element Binding protein

Evdw, Eelec, Esolv = van der Waals Energy, Electrostatic Energy, Solvation Energy

GA = Genetic Algorithm

GOLD = Genetic Optimisation for Ligand Docking

GNAT = GCN5-related N-Acetyltransferase, tGCN5 = Tetrahymena GCN5

GB = Generalized Born

HAT = Histone Acetyltransferase

HDAC = Histone deacetylase

HTD = High Throughput Docking

MD = Molecular Dynamics

MM-PBSA = Molecular Mechanics-Poisson-Boltzmann/Solvent-accessible Surface Area

MM = Molecular Mechanics

MC = Monte Carlo Simulation

MM-GBSA = Molecular Mechanics-Generalized Born/Solvent-accessible Surface Area

MOE = Molecular Operating Environment

MCSS = Multiple Copy Simultaneous Search

MYST = MOZ, Ybf2/Sas3, Sas2 and Tip60

NuA3 = Nucleosomal Acetyltransferase for H3


NuA4 = Nucleosomal Acetyltransferase for H4

PDB = Protein Data Bank, previously called Brookhaven PDB

PB = Poisson-Boltzmann

PCAF = p300/CBP-associated factor, hPCAF = human PCAF

PMF = Potential of Mean Force

ROF = Lipinski's Rule of Five

ROC curve = Receiver Operating Characteristic Curve

RMSD = Root Mean Square Deviation

Rhodanine = 2-Thioxo-4-thiazolidinone or 2-Thioxo-1,3-thiazolidin-4-one

SAR = Structure-Activity Relationship

SAGA = Spt, Ada, GCN5 Acetyltransferase

SANT = Swi3, Ada2, NcoR, TFIIIB

SAP = Sin-associated Protein

SAS = Solvent Accessible Surface

Sas2 = Something about Silencing 2

SBVS = Structure-Based Virtual Screening

Sin3 = Switch Independent 3

SMRT = Silencing Mediator of Retinoic Acid and Thyroid Hormone Receptor

SPC = Simple Point Charge

Spt = Suppressor of Transcription

SVL = Scientific Vector Language (Script Language of MOE)

TIF2 = Transcriptional Intermediary Factor 2

TrpNH2 = Tryptamine

VS = Virtual Screening

vdw = van der Waals potential/energy


1 Introduction: Epigenetics and HAT

1.1 Histone Acetyltransferases: Role in Epigenetics, Structure, and Classification

The genetic material in the nucleus of eukaryotic cells is present in a tightly packed form, which functions as a dynamic structure and a basic contributor to the regulation of various nuclear processes, including transcription, DNA replication and repair, mitosis, and apoptosis [1]. Core histones are small basic proteins that form a well-defined structure known as the nucleosome. Four types of core histones are known: H2A, H2B, H3, and H4. The nucleosome core consists of two copies of each of these histones, forming an octamer, around which 147 base pairs of DNA are wrapped in a left-handed turn. The linker histone H1 binds the nucleosome at the entry and exit sites of the DNA, thus locking the DNA into place and allowing the formation of a higher-order structure [2]. An important post-translational modification of histones is the acetylation of ε-amino groups on conserved lysine residues. Acetylation neutralizes the positively charged lysines and therefore affects interactions of the histones with other proteins and/or with the DNA. Histone acetylation has long been associated with transcriptionally active chromatin and has also been implicated in histone deposition during DNA replication [3].

Histone acetyltransferases (HATs) can be classified into several families based on their sequence conservation (Table 1) [5]. The human genome encodes up to 25 proteins that show lysine acetyltransferase activity. At the primary structure level there is little similarity between the different HATs, and even members of the same family usually display considerable sequence diversity. Furthermore, no single homologous domain is conserved in all HATs, although many enzymes contain recognizable acetyl-coenzyme A (Ac-CoA) binding motifs and bromodomains [6]. More similarities are observed at the tertiary structure level (Figures 1 and 2). HATs display a conserved core domain containing an L-shaped cleft formed by the N- and C-terminal segments of the core domain. This cleft contains the catalytic site, where Ac-CoA binds in the short segment and the macromolecular substrate binds in the long segment. Beyond the core domain, there is little structural similarity between the different HATs. In vitro assays indicate that HATs have different substrate specificities, although the molecular mechanisms underlying the binding specificities, as well as the true physiological specificities of HATs, remain poorly understood [4].

Important and extensively investigated families of HATs are (see also Table 1):

- GNAT family (GCN5-related N-acetyltransferase): includes GCN5, PCAF (p300/CBP-associated factor), and other acetyltransferases such as serotonin acetyltransferase (AANAT), aminoglycoside N-acetyltransferases (AAC-3 and AAC-6), spermidine/spermine N-acetyltransferase, the elongator subunit Elp3, and Hpa2. HAT1 can be classified either as a member of the GNAT family or as a separate family.

- MYST family (named after its founding members, which include MOZ, YBF2/SAS3, SAS2 and TIP60) [5].

- p300/CBP family [7-9].

Figure 1. Comparison of the three-dimensional structures of GCN5-related N-acetyltransferases: GCN5, PCAF, and AANAT. (A) tGCN5: the ternary complex with CoA and an 11-residue peptide (in blue) is shown. The black line indicates CoA or Ac-CoA. (B) PCAF complexed with CoA. (C) AANAT: the complex with the bisubstrate analog is shown (indole ring colored blue). The four conserved motifs of the GNAT superfamily, C, D, A, and B, are shown in purple, green, yellow, and red, respectively (adapted from [5(b)]).

Over 40 transcription factors and 30 other nuclear, cytoplasmic, bacterial, and viral proteins have been shown to be acetylated in vivo by HATs [8, 10]. For example, p300/CBP proteins are involved in diverse physiological processes, such as proliferation, differentiation and apoptosis [11]. GCN5p is the catalytic subunit of two multi-protein complexes, ADA and SAGA, which are involved in remodeling the chromatin structure and in the acetylation of histone tails at specific lysines. Table 2 presents a list of all known families of acetyltransferases.

Figure 2. Superposition of the putative active-site region of GCN5 (in yellow) and HAT1 (in red) with bound Ac-CoA

(shown in capped sticks, adapted from [15]).

Table 1: Main families of HATs, their substrates and their involvement in cancer mechanisms [5 (a)].

| Acetyltransferase | Family | Substrate | Involvement in cancer |
|---|---|---|---|
| GCN5 | GNAT | H2B, H4, cMyc | Critical regulator of cell cycle and cMyc |
| PCAF | GNAT | H3, H4, cMyc, p53, MyoD, E2F | Critical regulator of cell cycle, p53, E2F, and cMyc |
| CBP | CBP/p300 | H2A, H2B, H3, H4, pRb, E2F, p53, c-Myb, MyoD, AR, FoxO | Translocation: MOZ-, MORF-, and MLL-p300/CBP fusions. Mutation: biallelic mutations, p300 epithelial cancer. Inactivation: haematological malignancy |
| p300 | CBP/p300 | H2A, H2B, H3, H4, pRb, E2F, p53, c-Myb, MyoD, AR, FoxO | Translocation: MOZ-, MORF-, and MLL-p300/CBP fusions. Mutation: biallelic mutations, p300 epithelial cancer. Inactivation: haematological malignancy |
| TIP60 | MYST | H2A, H3, H4, cMyc, AR | Association with androgen receptor in prostate cancer |
| MOZ | MYST | H3, H4 | Fusion with p300/CBP and TIF2 |
| MORF | MYST | H3, H4 | Fusion with p300/CBP |
| ACTR | SRC | H3, H4 | Upregulation in breast cancer |

Table 2. Summary of acetyltransferase families; numbers in brackets are UniProt accession numbers [23].

| Family | Gene name | Synonyms | Gene product name and synonyms |
|---|---|---|---|
| HAT1 | HAT1 (O14929) | -- | Histone acetyltransferase type B catalytic subunit (HAT1) |
| MYST | HTATIP (Q92993) | TIP60 | 60 kDa HIV-1 Tat-interacting protein (Tip60); NuA4/TRRAP complex component |
| MYST | MYST1 (Q9H7Z6) | MOF, hMOF | Homolog of Drosophila males absent on the first (hMOF); component of human male-specific lethal complex (MSL) |
| MYST | MYST2 (O95251) | HBO1, HBOa | HAT binding to origin recognition complex (HBO1); component of inhibitor of growth complexes (ING4, ING5) |
| MYST | MYST3 (Q92794) | MOZ, RUNXBP2, ZNF220 | Monocytic leukaemia zinc finger protein (MOZ); Runt-related transcription factor-binding protein (RunxBP2); Zinc finger protein 220 kDa (ZNF220); component of ING5 complex |
| MYST | MYST4 (Q8WYB5) | MORF, MOZ2 | MOZ-related factor (MORF), MOZ2, Querkopf; component of ING5 complex |
| GNAT | GCN5L2 (Q92830) | GCN5, HGCN5 | General control of nitrogen metabolism (GCN5)-like 2; homolog of yeast GCN5; STAF97 |
| GNAT | PCAF (Q92831) | -- | p300/CBP-associated factor (P/CAF) |
| p300/CBP | EP300 (Q09472) | p300 | E1A-associated protein 300 kDa (p300) |
| p300/CBP | CREBBP (Q92793) | CBP | CREB-binding protein (CBP) |
| SRC/p160 | NCOA1 (Q15788) | SRC1, RIP160 | Steroid receptor coactivator (SRC1); Nuclear receptor coactivator (NCOA1); 160-kDa receptor interacting protein (RIP160) |
| SRC/p160 | NCOA2 (Q15596) | TIF2 | Transcriptional intermediary factor (TIF2); Nuclear receptor coactivator 2 (NCOA2) |
| SRC/p160 | NCOA3 (Q9Y6Q9) | AIB1, ACTR, p/CIP, RAC3, TRAM1 | Nuclear receptor coactivator (NCOA3); Amplified in breast cancer (AIB1); Thyroid hormone receptor activator molecule (TRAM1); Receptor-associated coactivator (RAC3); Steroid receptor coactivator protein (SRC3); p300/CBP-interacting protein (p/CIP) |
| TFIIIC subunit 4 family | GTF3C4 (Q9UK98) | -- | General transcription factor 3C polypeptide 4 (GTF3C4); Transcription factor IIIC-delta subunit (TF3Cd); TFIIIC 90-kDa subunit (TFIIIC 90) |
| ATF | ATF2 (P15336) | CREB2, CREBP1 | Cyclic AMP-dependent transcription factor (CREB2); Activating transcription factor (ATF2); cAMP response element-binding protein (CREBP1); HB16 |
| CIITA | CIITA (P33076) | MHC2TA | MHC class II transactivator (CIITA) |
| TAF1 | TAF1 (P21675) | BA2R, CCG1, TAF2A | Transcription initiation factor (TFIID) subunit 1; TBP-associated factor (TAF1); TBP-associated factor 250 kDa (TAFII250); Cell-cycle gene 1 (CCG1) |
| CDY | CDY1 (Q9Y6F8) | -- | Testis-specific chromodomain protein Y1 (CDY1) |
| CDY | CDYL1 (Q9Y232), CDYL2 (Q8N8U2) | CDYL1, CDYL2 | Chromodomain Y-like protein (CDYL1, CDYL2) |
| TFIIB | GTF2B (Q00403) | TF2B, TFIIB | Transcription initiation factor (TFIIB); General transcription factor TFIIB (GTF2B) |
| MCM3AP | MCM3AP (O60318) | GANP, KIAA0572, MAP80, SAC3 | Mini chromosome maintenance 3-associated protein (MCM3AP); 80-kDa MCM3-associated protein (MAP80); Germinal centre-associated nuclear protein (GANP) |
| ESCO | ESCO1 (Q5FWF5) | EFO1, KIAA1911 | Establishment of cohesion 1 homolog 1 (ESCO1, ECO1); Establishment factor-like protein 1 (EFO1p, hEFO1); CTF7 homolog 1 |
| ESCO | ESCO2 (Q56NI9) | -- | Establishment of cohesion 1 homolog 2 (ESCO2); ECO1 homolog 2 |
| ARD1 | ARD1A (P41227) | hARD1, TE2, ARD2 | Arrest defective protein (ARD1); N-alpha acetyltransferase (retroposon-mediated gene duplication product) |
| CLOCK | CLOCK (O15506) | KIAA0334 | Circadian locomoter output cycles protein kaput (CLOCK) |
| NCOAT | MGEA5, NCOAT, HEXC (O60502) | -- | Meningioma-expressed antigen 5 (MGEA5); Nuclear cytoplasmic O-linked N-acetylglucosaminase and acetyltransferase (NCOAT) |

1.2 Histone Lysine Acetyltransferase - Catalytic Mechanism

Reported studies have proposed two different catalytic mechanisms for HATs [12]. Members of the GNAT (GCN5-related N-acetyltransferase) family use an ordered sequential mechanism in which the acetyl group is transferred from Ac-CoA directly to the Nε atom of the substrate lysine residue (Figure 3). Initial structural and kinetic data for the GNAT family supported this ordered sequential mechanism of acetyl transfer [13], and Trievel et al. proposed a ternary complex mechanism for catalysis by GCN5 [14]. In the suggested mechanism, Ac-CoA binds first, followed by the lysine substrate, to form a ternary complex. A glutamate residue (GLU173) is then positioned to abstract a proton from the amino group of the lysine residue, and the uncharged amino group performs a nucleophilic attack on the carbonyl carbon of the reactive thioester group of Ac-CoA.

According to Trievel et al. [14], GLU173 of GCN5 is the likely candidate for the catalytic base because it is close enough to the histone lysine. It was later found that mutation of GLU173 to GLN abolishes the activity in vivo and in vitro [15].

In general, an active-site glutamate (GLU173 in GCN5/ScKAT2) is needed to activate the ε-amine of lysine and so facilitate its direct nucleophilic attack on the carbonyl carbon of Ac-CoA [16]. The tetrahedral intermediate that forms then collapses to the acetyl-lysine product and CoA (Figure 4).
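Why a catalytic base is needed can be illustrated with a back-of-the-envelope calculation. The sketch below uses the Henderson-Hasselbalch relation to estimate the fraction of a lysine ε-amino group that is neutral, and therefore nucleophilic, at a given pH. The pKa of 10.5 and the pH of 7.4 are assumed textbook-style values for a free lysine side chain, not numbers taken from this work, and the function name is purely illustrative.

```python
def neutral_amine_fraction(pka: float, ph: float) -> float:
    """Fraction of an amine in its neutral (deprotonated) form.

    Follows from the Henderson-Hasselbalch relation:
    [R-NH2] / ([R-NH2] + [R-NH3+]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Assumed illustrative values (not from the thesis).
LYSINE_SIDE_CHAIN_PKA = 10.5
PHYSIOLOGICAL_PH = 7.4

if __name__ == "__main__":
    fraction = neutral_amine_fraction(LYSINE_SIDE_CHAIN_PKA, PHYSIOLOGICAL_PH)
    print(f"Neutral lysine fraction at pH {PHYSIOLOGICAL_PH}: {fraction:.2e}")
```

Under these assumptions less than 0.1 % of the lysine side chain is unprotonated, which is consistent with the idea that the enzyme has to assist deprotonation before the nucleophilic attack can take place.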

Figure 3. Ordered sequential mechanism resulting in the formation of a ternary complex (adapted from [16]).


Figure 4. Comparison between the two proposed mechanisms: ternary complex formation and ping-pong mechanism

(adapted from [17]).

In the second mechanism, called the ping-pong (i.e., double displacement) catalytic mechanism, a cysteine residue within the enzyme active site receives the acetyl moiety from Ac-CoA in a first step, and in a second step the acetyl moiety is transferred to the substrate lysine residue (Figure 4) [16]. It has been noted that all biochemically and structurally characterized HATs have a conserved glutamate residue in the active site, which seems to have a similar function: deprotonating the amino group of the target lysine substrate before acetyl transfer. It is currently thought that all characterized HATs follow an ordered sequential bi-bi kinetic mechanism, in which differences between families may affect substrate specificity but not the overall mechanism of catalysis [4].
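The two proposals are usually told apart kinetically. As a minimal sketch, the textbook steady-state rate laws for an ordered sequential bi-bi mechanism and a ping-pong bi-bi mechanism are implemented below, together with the slope of the double-reciprocal (Lineweaver-Burk) plot. All kinetic constants are hypothetical values chosen for illustration, not measured HAT parameters; A stands for Ac-CoA and B for the lysine substrate.

```python
def ordered_bibi_rate(a, b, vmax, k_ia, k_a, k_b):
    """Initial rate for an ordered sequential (ternary complex) bi-bi mechanism."""
    return vmax * a * b / (k_ia * k_b + k_b * a + k_a * b + a * b)


def ping_pong_rate(a, b, vmax, k_a, k_b):
    """Initial rate for a ping-pong (double displacement) bi-bi mechanism."""
    return vmax * a * b / (k_b * a + k_a * b + a * b)


def double_reciprocal_slope(rate_fn, b, a_low=1.0, a_high=10.0):
    """Slope of 1/v versus 1/[A] at fixed [B], from two points on the line."""
    inv_v_low = 1.0 / rate_fn(a_low, b)
    inv_v_high = 1.0 / rate_fn(a_high, b)
    return (inv_v_low - inv_v_high) / (1.0 / a_low - 1.0 / a_high)


# Hypothetical constants in arbitrary concentration units (assumptions).
PARAMS = dict(vmax=1.0, k_a=2.0, k_b=5.0)
K_IA = 3.0

if __name__ == "__main__":
    for b in (1.0, 10.0):
        seq = double_reciprocal_slope(
            lambda a, b_: ordered_bibi_rate(a, b_, k_ia=K_IA, **PARAMS), b)
        pp = double_reciprocal_slope(
            lambda a, b_: ping_pong_rate(a, b_, **PARAMS), b)
        print(f"[B]={b:>4}: sequential slope={seq:.3f}, ping-pong slope={pp:.3f}")
```

In this model the ping-pong slope is independent of [B] (parallel lines in the double-reciprocal plot), whereas the sequential slope changes with [B] (intersecting lines); this is the classical diagnostic used to distinguish the two mechanisms.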

A recent study demonstrated that the p300 HAT is itself polyacetylated and contains an activation loop that requires (auto)acetylation for full enzyme activation [18]. This is similar to the situation with protein kinases, where activity is also regulated through an autoinhibitory switch involving phosphorylation of an activation loop. In addition to their shared role in catalyzing reversible post-translational modifications, HATs and kinases also resemble each other in how they are recruited to their target complexes. In the case of kinases this usually involves SH2 and 14-3-3 domains, which recognize phosphopeptide motifs. HATs frequently contain a bromodomain, which binds acetyl-lysine-containing sequence motifs in histones and other proteins [19].

1.3 HAT Modulators: Chemical Regulation of Acetyltransferases

One direct structural insight into small-molecule-mediated inhibition of HAT proteins came from a crystal structure of the Tetrahymena GCN5 (tGCN5) HAT domain bound to a modified H3-CoA-20 inhibitor [20]. This bisubstrate inhibitor was prepared with an isopropionyl bridge between CoA and the peptide to mimic the Ac-CoA-lysine intermediate [21]. To date, H3-CoA-20 is the most potent inhibitor of GCN5/PCAF HATs identified, with an IC50 of 300 nM for tGCN5. On the other hand, the bisubstrate inhibitor Lys-CoA, in which an acetyl bridge is introduced between the amine group and CoA, is a potent p300 inhibitor (IC50 = 500 nM) but a weak PCAF inhibitor (IC50 = 200 µM) [17] (Figure 5). This suggested that the p300 enzyme family also uses a ternary complex mechanism [22]. In contrast to Lys-CoA, neither the H3-CoA-20 nor the H4-CoA-20 peptide-CoA conjugate, in which CoA is linked to lysine 14 and lysine 8 of the respective histone peptide, is a potent p300 inhibitor (IC50 values above 10 µM), whereas H3-CoA-20 is a potent PCAF inhibitor (IC50 = 360 nM) [17, 20]. These findings suggested that p300 might use a ternary complex mechanism that differs in some respects from that of the GCN5/PCAF HAT proteins.
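The selectivity implied by these numbers can be made explicit with a small worked calculation. The sketch below converts the Lys-CoA IC50 values quoted above (500 nM for p300, and the PCAF value taken as 200 µM) into pIC50 values and a fold-selectivity ratio. The function and variable names are illustrative only, and the ratio of IC50 values is used here as a rough comparison, not as a rigorous measure of relative affinity.

```python
import math


def pic50(ic50_molar: float) -> float:
    """Negative decadic logarithm of an IC50 given in mol/L."""
    return -math.log10(ic50_molar)


def fold_selectivity(ic50_off_target: float, ic50_target: float) -> float:
    """How many times weaker the compound is on the off-target enzyme."""
    return ic50_off_target / ic50_target


# IC50 values for Lys-CoA as quoted in the text, converted to mol/L.
LYS_COA_IC50 = {"p300": 500e-9, "PCAF": 200e-6}

if __name__ == "__main__":
    for enzyme, ic50 in LYS_COA_IC50.items():
        print(f"{enzyme}: pIC50 = {pic50(ic50):.2f}")
    ratio = fold_selectivity(LYS_COA_IC50["PCAF"], LYS_COA_IC50["p300"])
    print(f"Lys-CoA prefers p300 over PCAF by about {ratio:.0f}-fold")
```

With these inputs the two pIC50 values are about 6.3 and 3.7, corresponding to a roughly 400-fold preference of Lys-CoA for p300 over PCAF.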

Despite the great interest in HATs as therapeutic targets, only a few synthetic small-molecule inhibitors of HATs, besides a few natural products, have been discovered to date. The most serious disadvantages of the identified substrate-based inhibitors are their low cell permeability and metabolic instability, which reduce their suitability for in vivo investigations [23].

1.3.1 Non-peptidic Natural Product HAT Inhibitors

Figure 5. Molecular structures of HAT inhibitors (adapted from [23]).

Anacardic acid is a major component of cashew nutshell liquid and was identified in a natural product screen as a noncompetitive inhibitor of the HATs p300 and PCAF [24]. It has poor membrane permeability and therefore shows little effect on cells [25]. It acts as a weak, non-specific inhibitor of p300/CBP and PCAF (IC50 = 8.5 and 5 µM, respectively). Interestingly, CTPB, an amide derivative of anacardic acid, enhances the HAT activity of p300 fourfold but not that of PCAF [26]. Mantelingu et al. [27] later described the identification of the chemical entities essential for activating p300 HAT activity. Significantly, using surface-enhanced Raman spectroscopy of the enzyme-inhibitor complexes, they showed that activation of HAT activity is achieved by alteration of the p300 structure.

Another natural product with HAT-inhibitory activity is Curcumin, a yellow pigment extracted from the root of the turmeric herb Curcuma longa L. [28]. Curcumin has long been known to possess interesting pharmacological properties: apart from its chemopreventive and antiproliferative activities, it has been found to have antioxidative, anti-inflammatory, anti-infective and antiseptic properties, and it is widely used in Indian medicine and culinary traditions [27]. Curcumin has been reported to inhibit the HAT activity of p300/CBP but not that of PCAF [28]. The observed kinetics of p300 inhibition by Curcumin were originally interpreted to mean that the compound does not bind to the active site but acts as an allosteric inhibitor [28]. Subsequently, it was shown that Curcumin is in fact a covalent inhibitor of p300 but not of PCAF, presumably targeting some of the amino acid residues by virtue of its electrophilic unsaturated ketone function [29].

Garcinol is a polyprenylated benzophenone natural product isolated from the edible fruit of Garcinia indica. It was shown to be an active-site inhibitor of p300 and PCAF, with inhibition kinetics that are uncompetitive with respect to Ac-CoA but competitive with respect to the histone substrate [30]. Garcinol inhibits p300 (IC50 = 7 µM) and PCAF (IC50 = 5 µM) both in vitro and in vivo [31, 32]. Recently, Mantelingu et al. synthesized and tested a set of Garcinol derivatives (e.g., LTK-13, LTK-14, LTK-19) (Figure 7) that are selective for p300 (IC50 = 5-7 µM) and inactive against PCAF [33]. However, these compounds tend to be poorly soluble and are unstable because of facile oxidation of the isoprene moieties.

1.3.2 Irreversible HAT Inhibitors (Aryl and alkyl N-substituted Isothiazolones)

Aryl and alkyl N-substituted isothiazolones have been shown to inhibit H3 and H4 acetylation by PCAF and p300 irreversibly (e.g., CCT077791) [34]. Stimson et al. showed that a series of isothiazolones identified by high-throughput screening inhibits HAT catalytic activity. The compounds are also cell permeable and can reduce global acetylation as well as the acetylation of specific histones (H3 and H4) and of non-histone proteins such as alpha-tubulin. In this series, inhibition is due to an irreversible interaction with thiol groups (Figure 6). HAT inhibition by isothiazolones is abolished in the presence of thiol-containing reducing agents such as dithiothreitol (DTT) or glutathione [34]. Furthermore, HAT activity was not restored when PCAF was incubated with the two isothiazolones CCT077791 and CCT077792 and then dialyzed for 24 hours. The SAR study of this series of compounds showed that their activity is related to the nature and the electron-withdrawing or electron-donating properties of the substituents [34]. These properties strongly affect the kinetics of cleavage of the sulfur-nitrogen bond in the isothiazolone ring. The compounds also seem to have considerable off-target effects, which may be attributable to their high chemical reactivity towards free thiol groups.

Figure 6. Proposed mechanism of the covalent binding of isothiazolones to thiol groups.

1.3.3 Other Synthetic HAT Inhibitors

Figure 7. Some structures of synthetic HAT inhibitors.


α-Methylene-γ-butyrolactones, such as MB-3, are small-molecule HAT inhibitors of purely synthetic origin (Figure 5). They were designed on the basis of the known interactions between Ac-CoA and the acetyl-acceptor Lys side chain of the macromolecular substrate [30]. Biel et al. developed MB-3, a small, cell-permeable inhibitor of human GCN5. The compound contains an α-methylene-γ-butyrolactone scaffold, which is a known substructure element in natural products. MB-3 shows only weak inhibition of CBP (IC50 = 500 µM) and GCN5 (IC50 = 100 µM) [26]. Costi et al. reported that cinnamoyl compounds are also inhibitors of p300 [35 (a)].

Recently, Garcinol has been chemically modified to develop p300-selective inhibitors (Isogarcinol (IG) and LTK14) [35 (b)]. An SAR study was carried out in the same work to understand the binding to p300 and PCAF [35 (b)].

Trifluoromethyl phenyl benzamides have been found to modulate p300 [27]. Cycloalkylidene-(4-phenylthiazol-2-yl)hydrazone derivatives have been synthesized and have been identified as capable of inhibiting growth of a GCN5 [35 (d)]. One of these derivatives, CTPH2, showed inhibition of GCN5. It has been confirmed that this compound targets the Gcn5p functional network through an interacting protein [35 (e)].

Another way to inhibit HAT activity is to block the recognition of acetylated partners by targeting the bromodomain [35 (f)]. The development of bromodomain inhibitors could be useful for anti-HIV therapeutics. A series of selective ligands for the PCAF bromodomain has been discovered recently [35 (g)].

1.4 Structural Overview of Serotonin Acetyltransferases AANAT

Figure 8. A view of the AANAT-inhibitor complex containing the four-stranded (1-4) β-sheet and showing the bisubstrate analog bound in the active site. Side chains of the tryptamine-binding residues are displayed. GNAT motifs C, D, A and B are color coded red, green, blue and magenta, respectively (adapted from [36]).

Melatonin is produced in the pineal gland on a circadian cycle and is involved in the regulation of the biological clock in vertebrates [37]. Circulating levels of melatonin rise and fall daily under the control of an endogenous circadian clock. The biosynthesis of melatonin in the pineal gland involves the conversion of 5-hydroxytryptamine (serotonin) to 5-hydroxy-N-acetyltryptamine (N-acetylserotonin), catalyzed by serotonin N-acetyltransferase (also named arylalkylamine N-acetyltransferase, AANAT). This step is followed by O-methylation to 5-methoxy-N-acetyltryptamine (melatonin), catalyzed by 5-hydroxyindole O-methyltransferase (HIOMT). The change in AANAT activity is the main factor controlling the rhythmic production of melatonin. In contrast to AANAT, HIOMT is constitutively active and does not regulate the circadian rhythm of melatonin.

AANAT belongs to the GCN5-related N-acetyltransferase (GNAT) family of proteins, which share a common conserved structural domain [38]. In keeping with this shared function, the domain originally evolved to bind CoA through conserved backbone regions and to facilitate acetyl transfer to the substrate.

CoA binds between the backbone amides of the P-loop in the motifs A-D and a V-shaped cavity created between two parallel strands (Figure 8). The exposed amide backbone within this cavity binds to the alanylpantetheine backbone of CoA. Interestingly, the adenine moiety of the cofactor is solvent-exposed and does not contribute significantly to binding to the protein. In general, the CoA binding site is buried within the protein and offers the possibility of binding small drug-like molecules [36, 39].

Figure 9. Bisubstrate inhibitors that have been co-crystallized with AANAT (PDB codes 1KUY, 1KUV, and 1KUX [36]).

Figure 10. Coenzyme A structure.

The pantetheine-pyrophosphate moiety forms extensive hydrogen-bonding contacts to main-chain

functional groups of residues LEU124, VAL126, and GLN132-SER137 of the conserved GNAT

motif A.

The two residues closest to the sulfur atom of CoA are TYR168 (3.1 Å from the cofactor's sulfur) and GLU161 (3.8 Å from the cofactor's sulfur). The adenine and 3'-phosphate-ribose group of CoA are present in two alternative conformations, which are stabilized by two different sets of crystal contacts. The presence of two conformations is not surprising, because the adenine and 3'-phosphate moiety occur in various conformations in previously determined GNAT structures (reviewed in [40]). The tryptamine moiety is also found in two alternative conformations (referred to as cis and trans), located in the hydrophobic serotonin binding pocket formed by residues PHE56, PRO64, MET159, VAL183, and LEU186 [36, 39].

In AANAT, two histidine residues in β-strand 4 of motif A, HIS120 and HIS122, have been suggested [38] to play the role of the general base in catalysis because of their proximity to the NH2 group of the bisubstrate analog (HIS120 is 7.5 Å and HIS122 is 8.7 Å from the substrate). However, site-directed mutagenesis showed that the Michaelis-Menten constant (Km), but not the maximum rate of the catalytic reaction (Vmax), was affected by the HIS120-to-GLN and HIS122-to-GLN mutations in ovine AANAT [39]. Another candidate for the role of catalytic base in AANAT is GLU161 in the loop following strand S5. Mutation of this residue to alanine does not affect enzymatic activity [41], providing evidence against this possibility. Thus, the

identity of the catalytic base in AANAT remains unknown. It is possible that several active-site residues of AANAT can play the catalytic role, making site-directed mutagenesis results difficult to interpret [41]. Furthermore, the pKa of the nucleophilic substrate amino group may be lowered in the hydrophobic active site of AANAT, so that a catalytic base would be expected to have only a small impact on the rate acceleration. A similar proposal has been made for ribosomal catalysis of peptide bond formation [42].

Catalytic enhancement of the chemical step may result from stabilization of a tetrahedral complex

and/or activation of leaving group (CoA-SH) departure. Polarization of the thioester carbonyl group

as well as stabilization of a potential tetrahedral intermediate could be achieved by hydrogen

bonding of the thioester carbonyl group to the backbone of the hydrophobic residue localized in

beta-strand 4 (LEU124 in AANAT) [43].

Figure 11. Schematic representation of the interactions between AANAT and a bisubstrate inhibitor. The surrounding residues in the cofactor and substrate binding pockets of AANAT are shown (PDB code 1KUX). Blue arrows refer to backbone hydrogen bonds, green arrows to side-chain hydrogen bonds. Blue areas indicate ligand exposure, while a light blue shadow around a residue indicates protein exposure.

Figure 12. Binding mode of the bisubstrate inhibitor 3 (see Figure 9) to AANAT serotonin acetyltransferase (PDB code 1KUX).

1.5 Structural Overview of PCAF HAT

Figure 13. Structure of the PCAF-CoA complex, representing the general secondary structure of mammalian GNAT family acetyltransferases and the location of the Ac-CoA binding site. The four domains of the protein are color-coded. Motifs A-D and motif B (based on structural conservation) are colored blue and green, respectively. The N- and C-terminal protein segments flanking the core are colored magenta and gold, respectively. CoA is colored red (adapted from [44]).

In the PCAF crystal structure, CoA is bound in a conformation that forms an extensive set of protein interactions, mediated predominantly by the pantetheine arm and the pyrophosphate group [44], with motif A-D and motif B (Figure 13). All but two groups of the 16-member pantetheine arm-pyrophosphate chain make contacts with the protein. Most of the contacts are mediated through either protein backbone hydrogen bonds or protein side chain van der Waals contacts [44].

GNAT conserved residues in PCAF motifs A and B interact extensively with CoA. Residues 580 and 582-587 in the β4-loop-α3 region of motif A make direct and water-mediated hydrogen bonds with the pyrophosphate group [45]. THR587 makes a hydrogen bond to the pyrophosphate oxygen. The aliphatic side chain of GLN581 and a CYS-ALA-VAL sequence (residues 574-576) at the top of the β4-strand make van der Waals contacts with the aliphatic part of the pantetheine arm [44] (see Figure 14 for details).

In addition, the backbones of CYS574 and VAL576 form hydrogen bonds with the pantetheine arm. Residues in the β5-loop-α4 region of GNAT motif B interact by van der Waals contacts with the β-mercaptoethylamine segment of the pantetheine arm and thus play a major role in orienting the reactive sulfhydryl group for the acetyl transfer [44] (Figure 13). Other protein residues involved in binding are ALA613, TYR616 and PHE617. TYR616 also makes van der Waals contacts with the end of the pantetheine arm near the pyrophosphate group [44]. Residues GLN525 and LEU526, which are located at the substrate-binding cleft, also make van der Waals contacts with the pantetheine arm of coenzyme A. The proximity of these residues to the cofactor-substrate junction suggests that they play an important role in substrate specificity and/or catalysis [16, 46] (Figure 14).

In the PCAF substrate-binding cleft, two residues are positioned to act as a general base for catalysis via a ternary complex mechanism. These residues, GLU570 in the β4-strand and ASP610 in the loop between the β5-strand and the α4-helix, are both located in the core domain of PCAF and are strictly conserved within the GCN5/PCAF subfamily of histone acetyltransferases. Mutational analysis strongly favors the catalytic involvement of GLU570, since mutation of the corresponding residue in yeast GCN5 (GLU173) to alanine or glutamine debilitates GCN5 activity in both transcriptional activation in vivo and histone acetylation in vitro [47, 48]. In contrast, mutation of the yeast counterpart of ASP610 in PCAF only slightly affects transcriptional activation in vivo and histone acetylation in vitro [48, 49-51]. According to Clements et al. [44], GLU570 exists in an ideal environment to play a catalytic role. First, GLU570 is located proximal to an acidic patch, which forms an attractive surface for the basic lysine substrate. Second, the carboxylate of GLU570 is surrounded by several hydrophobic residues (PHE563, PHE568, ILE571, VAL572, LEU606, ILE637 and TYR640) that probably function to raise the pKa of the glutamate side chain and thus facilitate proton abstraction from the lysine substrate. Third, the carboxylate of GLU570 is only ~11.5 Å away from the putative position of the reactive thioester of acetyl-coenzyme A [44] (Figure 15).

It was suggested that proton abstraction may proceed directly through the carboxylate of GLU570 or, alternatively, through a water molecule. This hypothesis is supported by the presence of a water molecule tightly bound to the carboxylate oxygen of GLU570, close to the coenzyme [44]. A further requirement for catalysis is the presence of a hydrogen bond donor that stabilizes the tetrahedral intermediate. The potential hydrogen bond donor is the backbone NH of CYS574, although in the presence of the bound substrate additional donors may also exist (e.g. backbone NH groups of the histone or transcription factor substrate) [44].

Figure 14. Schematic representation of the interactions between Ac-CoA and the surrounding residues in the cofactor binding pocket of PCAF (PDB code 1CM0). Blue arrows refer to backbone hydrogen bonds, green arrows to side-chain hydrogen bonds. Blue areas indicate ligand exposure, while a light blue shadow around a residue indicates protein exposure.

Figure 15. Binding mode of Ac-CoA at PCAF, showing residues that contribute to ligand binding (PDB code 1CM0).

2 Aim of the Work

In contrast to many inhibitors of nucleotide-dependent proteins, a small molecule HAT inhibitor does not need to mimic the adenine moiety, as the adenine ring is only loosely bound to the surface of HATs. Therefore, there is less risk of non-selective binding to the multitude of nucleotide-binding proteins (e.g. ATP-binding proteins). In addition, the conserved backbone interactions observed for CoA may be exploited to obtain high-affinity binding, utilizing a wide range of drug-like moieties such as carboxylate, amide, or sulfonamide groups. The V-shaped cavity in HATs is buried and thus provides a hydrophobic environment that is suitable for binding small drug-like molecules. The CoA

binding site of GNAT members is conserved and thus similar, while considerable structural

differences could be found in the substrate binding site. These regions have evolved to bind a broad

range of acetyl-group acceptors, including proteins and small-molecule substrates (histones,

cofactors, serotonin, etc.). Thus, to gain selectivity over other homologous proteins that bind Ac-

CoA, it is desirable for an inhibitor to span both sites or to interact with the substrate binding site.

In the current work, the focus was put on the docking analysis of known inhibitors of PCAF and of the related serotonin acetyltransferase AANAT, for which a series of potent inhibitors has been reported recently. As all currently known PCAF inhibitors either have complex structures or are natural products with unknown binding modes, the rational discovery of drug-like inhibitors still represents a challenge.

As there is high homology as well as structural similarity between the cofactor binding pockets of

PCAF and AANAT, the structures of recently identified AANAT inhibitors will be used as templates to design novel PCAF inhibitors. To reach this goal, docking and virtual screening settings

will be tested to find optimal docking conditions for GNAT acetyltransferases. The gained

knowledge on AANAT will then be used to dock compounds identified by similarity searching into

the PCAF binding pocket.

A second focus will be given to the development of recently identified isothiazolones as irreversible

PCAF inhibitors. Different modelling techniques will be applied in order to get ideas to further

improve the activity of this series of compounds and to establish first structure-activity

relationships.

Besides the application of different computer-based methods to identify and develop small molecule PCAF inhibitors, a special focus will be placed on the evaluation of different docking and scoring methods for the available ligand data sets. It is hoped that a systematic evaluation will improve the quality of the docking and virtual screening methods. These data could be helpful in further improving the optimization of PCAF inhibitor lead structures.

3 Computational Methods - Docking and Virtual Screening

Docking is a method which predicts the preferred orientation of one molecule relative to a

second one (usually a macromolecule) to form a stable complex. Knowledge of the preferred

orientation in turn may be used to estimate the strength of association between two molecules

using special mathematical functions called scoring functions. In this way, docking plays an important role in rational drug design.

Molecular docking can be used for three main purposes:

1) to predict the binding mode of a known active ligand.

2) to identify new ligands using virtual screening.

3) to predict the binding affinities of related compounds from a known series of actives.

The docking process can be divided into two parts: the search algorithm and the scoring

algorithm. These two algorithms address the two classical problems of the docking process,

the search problem and the scoring problem.

3.1 The Search Problem

The search algorithm should sample the degrees of freedom of the ligand/macromolecule

system sufficiently to include the true binding modes, while the scoring algorithm should

represent the thermodynamics of interaction to distinguish the true binding modes from all

others explored.

Treatment of ligand flexibility can be divided into three basic categories [52]:

- Systematic methods (incremental construction, conformational search, databases)

- Random or stochastic methods (Monte Carlo, genetic algorithms, tabu search)

- Simulation methods (molecular dynamics, ab initio docking, energy minimization)

The evaluation and ranking of predicted ligand conformations is a crucial step of structure-based virtual screening. Even when binding conformations are

correctly predicted, the calculations will not be successful if they cannot differentiate between

true binders and inactives.


3.2 The Scoring Problem

The scoring problem represents the second challenge for docking and virtual screening methods. Virtual screening is used to identify new lead molecules. In every virtual screening, molecules must be docked into a protein site to obtain a predicted pose of ligand binding. The best-scoring pose of each molecule is then selected to build a ranked hit list.

Scoring functions implemented in docking programs make various assumptions and

simplifications in the evaluation of modelled complexes and do not fully account for a number

of physical phenomena that determine molecular recognition, for example entropic effects.

Essentially, three types or classes of scoring functions are currently applied:

- Force-field-based scoring: (D-score [53], G-score [53], Gold [54], Autodock [55], Dock

[56]).

- Empirical scoring: (Ludi [57, 58], F-score [59], Chemscore [60], Score [61, 62], Fresno

[63], X-score [66]).

- Knowledge-based scoring: (PMF [67-69], DrugScore [67], SMoG [68]).

Consensus scoring combines information from different scores to balance errors in single

scores and improve the probability of identifying true ligands. An exemplary implementation

of consensus scoring is X-CSCORE [69, 70], which combines GOLD-like, DOCK-like,

Chemscore, PMF and FlexX scoring functions. However, the potential value of consensus

scoring might be limited, if terms in different scoring functions are significantly correlated,

which could amplify calculation errors, rather than balance them.
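The voting idea behind consensus scoring can be sketched as follows. This is a minimal illustration under stated assumptions and not the X-CSCORE implementation: the function name `consensus_votes`, the `top_fraction` cut-off, and the example compound names and score values are all hypothetical. Because sign conventions differ between scoring functions, the caller states for each function whether higher values are better.

```python
from typing import Dict


def consensus_votes(scores: Dict[str, Dict[str, float]],
                    higher_is_better: Dict[str, bool],
                    top_fraction: float = 0.5) -> Dict[str, int]:
    """Count, per compound, how many scoring functions place it in their top fraction."""
    votes: Dict[str, int] = {}
    for func_name, per_compound in scores.items():
        # Rank compounds from best to worst for this scoring function.
        ranked = sorted(per_compound, key=per_compound.get,
                        reverse=higher_is_better[func_name])
        n_top = max(1, int(len(ranked) * top_fraction))
        for compound in ranked[:n_top]:
            votes[compound] = votes.get(compound, 0) + 1
        for compound in ranked[n_top:]:
            votes.setdefault(compound, 0)
    return votes


# Hypothetical scores for four compounds from three scoring functions.
example_scores = {
    "force_field": {"cpd_a": -42.0, "cpd_b": -30.5, "cpd_c": -12.1, "cpd_d": -35.0},
    "empirical": {"cpd_a": -25.3, "cpd_b": -31.0, "cpd_c": -10.0, "cpd_d": -8.2},
    "knowledge": {"cpd_a": 88.0, "cpd_b": 40.0, "cpd_c": 35.0, "cpd_d": 91.0},
}
example_direction = {"force_field": False, "empirical": False, "knowledge": True}
example_votes = consensus_votes(example_scores, example_direction)
```

A compound that collects votes from all functions is a more robust candidate than one favored by a single score. As noted above, this only helps if the individual functions are not strongly correlated.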

In principle, fitness or scoring functions try to predict the free energy of binding of every molecule being screened. In practice, the best ranking is the one that agrees most closely with the experimental binding energies. Docking results are therefore often judged by the enrichment of true hits among a larger number of molecules tested, which is determined by the number of real actives in the hit list. The more true positives (real actives) and the fewer false positives (decoys) appear in the top-scoring hit list, the better the enrichment indexes of the docking (virtual screening) run.

In any virtual screening, several benchmarks and performance metrics should be considered. First, the root mean square deviation (RMSD) between a generated docking pose and the experimental pose captured in the crystal structure should be determined. Usually the absolute RMSD is used in docking to measure the distance between corresponding atom pairs of two conformers. An optimal docking run should reproduce the experimental binding pose with an RMSD of less than 2 Å. Second, the suggested docking poses should be inspected visually, and the predictions should be judged rationally by considering the quality of the interactions between the chemical groups of the ligands and the most important residues in the protein. In this step, molecular surfaces with property maps (electrostatic energy map or van der Waals contacts) can be essential. The molecular interaction fields for different chemical probes, created e.g. by the GRID software, can be useful for choosing the best predicted binding mode.
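The absolute RMSD criterion can be illustrated with a short sketch. It assumes that both poses are given in the same receptor coordinate frame, with atoms listed in the same order, and it performs no superposition and no symmetry correction. The function name `pose_rmsd` and the coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(docked: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Absolute RMSD (in the units of the input, e.g. Å) between matched atom pairs."""
    if len(docked) != len(reference) or not docked:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    squared_sum = sum(
        (a - b) ** 2
        for atom_docked, atom_ref in zip(docked, reference)
        for a, b in zip(atom_docked, atom_ref)
    )
    return math.sqrt(squared_sum / len(docked))


# Hypothetical two-atom example: the docked pose is shifted by 1 Å along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
reproduces_crystal_pose = pose_rmsd(docked_pose, crystal_pose) < 2.0
```

For ligands with symmetric groups, a symmetry-aware atom matching would be needed before this calculation, which the sketch deliberately omits.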

Subsequently, enrichment indexes can be calculated, such as the sensitivity (Se, true positive rate), which is the ratio of the number of active molecules found by the virtual screening method to the number of all active database compounds. The second index of enrichment is the specificity (Sp, true negative rate), which represents the ratio of the number of inactive compounds that were not selected by the virtual screening method to the total number of inactives in the whole database. One of the most widely used methods to describe enrichment is the receiver operating characteristic (ROC) curve, which describes the sensitivity (Se) as a function of (1-Sp). As Sp is the ratio of discarded inactives to the total inactives, 1-Sp is the ratio of selected inactives, in other words the selected decoys (the false positive rate). The ROC curve is plotted by taking the scores of the actives as thresholds. For every threshold, the number of decoys and the number of actives within this cut-off are counted. The ROC curve thus maps the distribution of actives and decoys according to their scores. This method avoids the selection of an arbitrary threshold by considering the Se and Sp pairs for all score thresholds, which is an important advantage over the other enrichment indexes [71, 72].
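The construction of the ROC curve from a ranked hit list can be sketched as follows. The sketch assumes that a higher score means a better rank, steps through the list one compound at a time (so tied scores are not treated specially), and integrates the curve with the trapezoidal rule. The function names `roc_points` and `roc_auc` and the example scores are hypothetical.

```python
from typing import List, Tuple


def roc_points(scored: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (1 - Sp, Se) pairs for a list of (score, is_active) entries."""
    n_actives = sum(1 for _, is_active in scored if is_active)
    n_decoys = len(scored) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")
    ranked = sorted(scored, key=lambda entry: entry[0], reverse=True)
    true_pos = false_pos = 0
    points = [(0.0, 0.0)]
    for _, is_active in ranked:
        if is_active:
            true_pos += 1
        else:
            false_pos += 1
        # x = selected decoys / all decoys (1 - Sp), y = found actives / all actives (Se)
        points.append((false_pos / n_decoys, true_pos / n_actives))
    return points


def roc_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical hit list: two actives and two decoys.
hit_list = [(9.0, True), (8.0, False), (7.0, True), (6.0, False)]
auc = roc_auc(roc_points(hit_list))
```

An area of 1.0 corresponds to a perfect separation of actives and decoys, whereas 0.5 corresponds to a random ranking.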

The most difficult challenge for docking is the accurate prediction of the binding affinities of compounds, unless these compounds belong to a single series. Studies have generally found no strong correlation between the ability of a docking program to produce a correct pose and its success in a virtual screen. This can partly be attributed to the inherent danger of using a single metric such as RMSD, as poses can be fundamentally correct despite a large deviation in one part of the molecule. Another problem arises in cases where the poses are barely in the correct binding site or show a completely wrong binding mode, and yet good enrichment is observed. Enrichment may be due to screening out compounds that are wrong for the target rather than selecting those that are right. Clearly, the enrichment indexes should be considered, but always together with visual inspection of the predicted binding mode and its agreement with X-ray structures or with enzyme kinetic studies of the inhibition type (competitive, non-competitive, and uncompetitive).

It is always a difficult task to get accurate prediction of binding affinities for a diverse set of

molecules. At its simplest level, this is a problem of subtraction of large numbers,

inaccurately calculated, to get a small number. The large numbers are the interaction energy

between the ligand and protein on one hand and the cost of bringing the two molecules out of

solvent and into an intimate complex on the other hand. The result of this subtraction is the

free energy of binding, which is the ultimate target in any drug design study [73]. The

problem arises from the condensed phases in which biology occurs and also from the many

degrees of freedom of biomolecules [74]. In water, and with highly flexible proteins and

ligands, accurate calculations are much more costly and error-prone. Additionally, as pointed out by Tirado-Rives and Jorgensen [75], the "window of activity", as they called it, is very small. That is, there is only a small free energy difference, estimated at just 4.5 kcal/mol, between the best ligand likely to be detected in a virtual screening study (potency of about 50 nM) and the experimental detection limit (potency of approximately 100 µM). Among the most accurate methods today are thermodynamic integration and free energy perturbation, which can sometimes calculate the differences in affinity between related molecules with an accuracy of about 1 kcal/mol [76, 77]. However, even these methods only compare close analogues; they neither predict absolute binding affinities nor compare affinities among diverse compounds.
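The size of this window of activity can be checked with a short calculation. The sketch assumes that the quoted potencies can be treated as dissociation constants, that the standard relation ΔG = RT ln(Kd) applies (with Kd relative to a 1 M standard state), and that the temperature is 298.15 K. The function names are hypothetical.

```python
import math

GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature: float = 298.15) -> float:
    """Binding free energy in kcal/mol from a dissociation constant in mol/l."""
    if kd_molar <= 0.0:
        raise ValueError("dissociation constant must be positive")
    return GAS_CONSTANT * temperature * math.log(kd_molar)


def activity_window(best_kd: float, weakest_kd: float,
                    temperature: float = 298.15) -> float:
    """Free energy difference between the weakest detectable and the best ligand."""
    return (binding_free_energy(weakest_kd, temperature)
            - binding_free_energy(best_kd, temperature))


# 50 nM (best expected hit) versus 100 µM (experimental detection limit)
window = activity_window(50e-9, 100e-6)  # about 4.5 kcal/mol
```

Under these assumptions the 2000-fold potency range corresponds to roughly 4.5 kcal/mol, which is of the same order as the typical error of fast scoring functions.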

3.3 Solvation Effects

Protein-ligand binding happens in a salt-water environment. Such an environment has a strong

effect on energetics of protein-ligand binding. Water has a dielectric constant of about 80,

whereas the dielectric constant of vacuum is 1. As a favorable interaction exists between a charge and the high-dielectric environment, a one-body solvation energy arises for each atomic charge [78]. As a consequence, there can be a substantial energy penalty for moving the polar part of a ligand out of water and into the binding site.


Moreover, water screens the charge-charge interactions of fully hydrated atoms by approximately 80-fold. However, atoms in a protein-ligand interface are held apart from the solvent and therefore interact with an effective dielectric constant of less than 80. In general, atoms that are further apart are more likely to interact through solvent, and this idea led to a simple computational model, the crude screening model.

The crude screening model consists of a distance-dependent dielectric. For two atoms i and j, the dielectric is $D_{ij} = C R_{ij}$, where C is a constant often set to 4 and $R_{ij}$ is the interatomic distance. This model captures one chief effect of the solvent efficiently, and it is used in a number of ligand-protein docking algorithms. However, it is not sufficient to account for all solvation effects [73].
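The effect of the distance-dependent dielectric on a pairwise Coulomb interaction can be sketched as follows. The sketch assumes partial charges in units of the elementary charge, distances in Å, and the usual conversion factor of about 332 kcal·Å/(mol·e²). The function names are hypothetical and do not correspond to any particular docking program.

```python
COULOMB_CONSTANT = 332.0637  # kcal*Å/(mol*e^2)


def coulomb_energy(q_i: float, q_j: float, r_ij: float, dielectric: float) -> float:
    """Coulomb energy (kcal/mol) of two point charges in a uniform dielectric."""
    if r_ij <= 0.0 or dielectric <= 0.0:
        raise ValueError("distance and dielectric must be positive")
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r_ij)


def coulomb_energy_distance_dependent(q_i: float, q_j: float, r_ij: float,
                                      c: float = 4.0) -> float:
    """Coulomb energy with the crude screening model D_ij = C * R_ij."""
    return coulomb_energy(q_i, q_j, r_ij, dielectric=c * r_ij)


# A +1/-1 ion pair at 4 Å: D_ij = 16, so the interaction is screened 16-fold.
screened = coulomb_energy_distance_dependent(1.0, -1.0, 4.0)
unscreened = coulomb_energy(1.0, -1.0, 4.0, dielectric=1.0)
```

With C = 4, the effective dielectric reaches the bulk water value of 80 at a separation of 20 Å, so distant charge pairs are treated as if they interacted through solvent.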

In addition, the electrostatic interaction of two atoms is not only linked to their mutual

distance, it depends also on the positions of all the other protein and ligand atoms, because

these positions determine where the high-dielectric solvent can penetrate. Another important

effect of water is the hydrophobic effect, which is the tendency of water molecules to drive

non-polar solutes together [79]. This promotes the association of non-polar surfaces of the

ligand and the protein. The hydrophobic effect is often considered by an additional solvation

energy term that is proportional to molecular surface area, with a positive coefficient.

Two computational models have been developed to describe the electrostatic solvation effects of water. The more precise model is the Poisson-Boltzmann (PB) model, while a faster but less precise model is the Generalized Born (GB) approach. Combining the PB or GB electrostatics model with a surface area term (to account for the hydrophobic effect) yields the PBSA [80] and GBSA [81] solvation models, respectively.

These two models are called implicit solvent models because they do not treat any water

molecules explicitly during a simulation. The influence of solvent on binding can also be

treated with molecular dynamics (MD) or Monte Carlo (MC) simulations that include

thousands of explicit water molecules modeled with an empirical force field [82-84]. Dielectric screening, the solvation of polar groups, and the hydrophobic effect all emerge automatically within this approach, but it is substantially more computationally demanding than an implicit solvent model.

When ligand solvation is not considered in molecular docking, there is no penalty for placing

a charged ligand atom in a region where the receptor only weakly complements it. In this situation, the interaction energy of a highly charged molecule is overestimated relative to that of a true ligand. The true ligand, bearing less formal charge, would be estimated to have a less favorable interaction energy with this receptor site [73].

When a charged molecule transfers from water to a binding site, it exchanges a high-dielectric

for a low dielectric environment. When the cost of moving a charged species from a high to a

low dielectric environment is considered, the bias toward highly charged molecules is

eliminated [73].

In the same way, when non-polar solvation is not considered, larger molecules would

typically receive better scoring values than they should receive. In the docking poses, these

molecules often have fragments that are poorly complemented by the binding site. To solve

this problem, taking the hydrophobic effect into account and considering the non-polar

solvation (estimated by the loss of molecular surface) could result in better estimations. In this

case, molecules that make few favorable interactions with the enzyme would be disfavored

relative to molecules that are well complemented by the binding site. The non-polar solvation

term acts as a balance to the van der Waals term in the interaction energy, leading to

complexes with a higher proportion of interacting surfaces [73].

In summary, ignoring the electrostatic component of ligand solvation results in compounds with high formal charges being ranked above the known neutral inhibitors of an enzyme. Likewise, ignoring the non-polar component of ligand solvation biases the results towards larger compounds that do not complement the binding site as well as the known, smaller ligands [73].

3.4 Solvation Effects and Scoring Functions

The GOLD program [85-87] (Genetic Optimization of Ligand Docking) utilizes a genetic

algorithm (GA) to find an optimal ligand conformation for a given protein target and evaluates poses with a fitness function (Goldscore, Chemscore or ASP score). The Goldscore fitness function is force-field-based and includes a directional hydrogen (H)-bonding term, a soft van der Waals (vdw) potential term, and an internal energy term. The interesting features of this function are the additional H-bonding term, the indirect consideration of desolvation through the H-bonding term, and the evaluation of internal energies.

The LUDI program differentiates between the energies of ionic bonds and H-bonds and also contains a term accounting for the entropic effect, that is, the contribution due to the freezing out of rotational degrees of freedom upon binding.

The FlexX software [59] later modified the LUDI scoring function, replacing the hydrophobic interaction term (van der Waals forces, abbreviated as vdw) with two terms: one for ligand-receptor aromatic contacts and another for other hydrophobic interactions. In addition, the coefficients of the other terms were recalibrated using a set of 19 complexes [59]. Examples of other empirical scoring functions are Chemscore [60], Fresno [63], Score [61], and the scoring function of Hammerhead [88, 89]. These scores differ only in their weights or geometric constraints (which affect the penalty function that accounts for deviations from ideal H-bond geometry).

Finally, there is the scoring function of Autodock [55, 90], which combines force-field-like and empirically based attributes. Its first three terms are similar to those of a molecular mechanics force field (vdw, H-bonds, and electrostatics), but here they are weighted by empirical factors. The last two terms represent the entropic contribution and the protein-ligand solvation penalty.

An intermediate but practical approach to address solvation effects is to treat the solvent as a

continuum dielectric medium [91, 92]. Shoichet et al. have chosen to use continuum

electrostatics to evaluate the ligand solvation term, assuming that the ligand is completely

desolvated upon binding, and that every ligand desolvates the protein equally [73]. They start

with the DOCK energy function and add separate electrostatic and non-polar corrections to

ligand solvation as determined by the program HYDREN [93]. Despite the approximations of this method, this simple implementation had a considerable effect on the ranking of the known actives and on the size and charge of the other ligands populating the top of the hit list [73].


The scoring functions discussed above aim to approximate the important contributions to the

free energy of binding in a manner consistent with the demands of high-throughput docking

(HTD). Most of the terms added to these functions to address the effects of solvation are

included to capture the qualitative effects in an easily implemented atom-based fashion (i.e.,

weight down pairwise Coulombic interactions, penalize buried polar groups, reward buried

hydrophobic interactions).

Recently some researchers have used more rigorous, physics-based approaches to capture the

effects of solvent in a HTD scoring function (i.e. continuum electrostatics). However, to speed

up the calculations, these more rigorous approaches still utilize an approximate continuum

electrostatics method such as Generalized Born (GB) and take algorithmic shortcuts. With the

availability of faster methods to calculate Poisson-Boltzmann (PB) -based electrostatics, it is

possible to use a full solvation-based HTD scoring function.

To get more precise results using the Poisson-Boltzmann/Surface Area (PBSA) implicit model, electrostatic (Coulombic + solvation) energies can be calculated using ZAP, an OpenEye library for PBSA implicit solvation calculations [94]. In this method, solutions to the Poisson-Boltzmann equation are obtained using an exponentially switched atomic Gaussian function to represent the dielectric boundary, such that the dielectric constant varies smoothly from ε = 2 for the molecular region to ε = 80 for the solvent [95, 96, 97]. Atomic charges are calculated on a grid with 0.5 Å spacing. Electrostatic solvation energies $\Delta G_{\mathrm{elec}}^{\mathrm{solv}}$ are then obtained by summing the product of every atom's charge and the potential over all atoms and subtracting out the self-energy and Coulombic terms.

The apolar contribution to desolvation is calculated using $\Delta G_{\mathrm{ap}} = \gamma A$, where A is the total loss of solvent-exposed surface area of the protein and ligand upon forming a complex (also calculated using ZAP). The quantity γ (= 47 cal/mol/Å²) was chosen such that $\Delta G_{\mathrm{ap}}$ represents the difference (complex vs. protein + ligand) in transfer energy from a low-dielectric environment (such as an alkane solvent or a binding site in a protein) with ε = 2 to water with ε = 80 [98, 99]. The following equation can be used:

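$$\Delta G_{\mathrm{bind}}^{\mathrm{solv}} = E_{\mathrm{elec}}^{\mathrm{gas}} + E_{\mathrm{vdw}}^{\mathrm{gas}} + \Delta G_{\mathrm{elec}}^{\mathrm{solv}} + \Delta G_{\mathrm{ap}}$$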


This equation states that the binding energy in solvent equals the gas-phase electrostatic and van der Waals energies plus the electrostatic solvation contribution plus the solvent's apolar contribution. The sum of the electrostatic and area-loss contributions can enhance the correlation with the observed potency. If the solvation terms are ignored, calculations comparing the binding affinities of dissimilar ligands will be biased towards overly charged and overly large molecules.
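The bookkeeping of this energy expression can be sketched in a few lines. The sketch assumes that the binding energy in solvent is the sum of the gas-phase electrostatic and van der Waals energies, the electrostatic solvation term, and the apolar term γA with γ = 47 cal/mol/Å². The sign convention of the area term is taken as written in the text and should be checked against the software actually used. All function names and numerical inputs are hypothetical, and no ZAP call is made.

```python
GAMMA_CAL_PER_MOL_A2 = 47.0  # cal/(mol*Å^2), value quoted in the text


def apolar_term(area_loss: float, gamma: float = GAMMA_CAL_PER_MOL_A2) -> float:
    """Apolar contribution gamma * A in kcal/mol for an area loss A in Å^2."""
    return gamma * area_loss / 1000.0  # convert cal/mol to kcal/mol


def binding_energy_in_solvent(e_elec_gas: float, e_vdw_gas: float,
                              dg_elec_solv: float, area_loss: float) -> float:
    """Sum of the four contributions, all in kcal/mol except area_loss (Å^2)."""
    return e_elec_gas + e_vdw_gas + dg_elec_solv + apolar_term(area_loss)


# Hypothetical component energies for one docked complex.
total = binding_energy_in_solvent(e_elec_gas=-50.0, e_vdw_gas=-30.0,
                                  dg_elec_solv=40.0, area_loss=500.0)
```

Keeping the four contributions separate makes it easy to see how strongly a ranking depends on the solvation terms, for example by recomputing the sum with `dg_elec_solv` or `area_loss` set to zero.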

3.5 Effects of Rescoring Docking Hits using MM-GBSA or MM-PBSA Methods

One of the first applications of molecular mechanics/Poisson-Boltzmann surface area (MM-PBSA) scoring was the study of Wang et al., a hierarchical technique that used an initial database screening followed by MM-PBSA rescoring to find HIV-1 reverse transcriptase inhibitors [100]. An initial docking screen with subsequent rescoring by a molecular mechanics/generalized Born surface area (MM-GBSA) method has recently been used to improve the enrichment of known ligands for several enzymes [101-105].

MM-PBSA and MM-GBSA methods involve minimization and often dynamic sampling of the protein-ligand complexes, and include ligand and receptor conformational energies and strain. They evaluate the electrostatic and solvation components of the binding energy by PB or GB methods, including the desolvation of both ligand and receptor. The MM-GBSA binding energy is determined as E(complex) - E(receptor) - E(ligand), where E is an energy estimate obtained with the GBSA solvation model [102]. With this implicit solvation scheme, solute configurational entropy effects are completely ignored.
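The rescoring step itself reduces to an energy difference per ligand followed by a re-ranking. The sketch below assumes that the three GBSA energies of each ligand (complex, receptor, ligand) have already been computed by an external program. The function names, ligand names and energy values are hypothetical.

```python
from typing import Dict, List, Tuple


def mmgbsa_binding_energy(e_complex: float, e_receptor: float,
                          e_ligand: float) -> float:
    """Binding energy as E(complex) - E(receptor) - E(ligand)."""
    return e_complex - e_receptor - e_ligand


def rerank_hits(energies: Dict[str, Tuple[float, float, float]]) -> List[Tuple[str, float]]:
    """Return (ligand, binding energy) pairs, most favorable energy first."""
    rescored = [(name, mmgbsa_binding_energy(*terms))
                for name, terms in energies.items()]
    return sorted(rescored, key=lambda item: item[1])


# Hypothetical precomputed energies in kcal/mol: (complex, receptor, ligand).
docking_hits = {
    "ligand_1": (-1050.0, -1000.0, -20.0),
    "ligand_2": (-1065.0, -1000.0, -25.0),
}
ranking = rerank_hits(docking_hits)
```

Because the receptor energy may differ between complexes once the binding site is relaxed, each ligand carries its own receptor term in this sketch.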

There are three main limitations in these methods:

1) The force fields and solvation energies are not uniformly accurate

2) For reasons of computational efficiency, only a small part of configuration space near the docking starting pose can actually be explored

3) Configurational entropy effects are ignored.

In spite of these limitations, MM-GBSA rescoring represents a substantially higher level of scoring than that applied by most docking programs, and it is an attractive alternative to more complete but computationally expensive energy calculations such as free-energy perturbation and thermodynamic integration [106-108]. The principal improvement conferred by MM-GBSA rescoring over docking is the inclusion of receptor binding site relaxation, which allows an induced fit of the docking solutions. Consequently, this induced fit can improve the rank of larger ligands that would be missed by rigid-receptor docking.

Structural relaxation with MM-GBSA performed well when the initial docking geometry resembled the crystallographic pose, but it could do little when ligand binding induced large protein conformational changes or when the docked binding mode was far from the crystallographic one. In most cases this relaxation led not only to improved rankings but also to improved geometries. For many ligands, RMSD values between the MM-GBSA predictions and the crystallographic results declined relative to those of the docking predictions and, especially in hydrophilic or anionic cavities, many ligands refined by MM-GBSA had improved hydrogen bonding to the site. However, this rescoring method could not rescue the wrong docking solutions for some false negatives (missed hits) [102].

By allowing the receptor to respond to ligand binding, one allows for new and potentially unfavorable receptor conformations. The MM-GBSA energy functions must distinguish these from the true low-energy conformations that may be sampled in solution. This is a challenging task, as the receptor conformational energies are large and the errors in these calculations are typically of the same order as the net interaction energy of the protein-ligand complex. Although some of the errors cancel when the internal energies before and after ligand binding are subtracted, one is still subtracting two large numbers with relatively large errors to find a small one, the net binding free energy. Consistent with this view, ligands achieved their maximal advantage over decoys on rescoring when only a 5 Å region around the binding site was allowed to relax [102].
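The problem of subtracting large numbers can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the errors of the three energies are independent and can be combined in quadrature; as noted above, part of the error cancels in practice, so this is a pessimistic simplification. All numbers are hypothetical.

```python
import math


def difference_with_error(e_complex, e_receptor, e_ligand):
    """Each argument is a (value, standard error) pair in kcal/mol.

    Returns E(complex) - E(receptor) - E(ligand) and its standard error,
    assuming the three errors are independent (an assumption of this sketch).
    """
    value = e_complex[0] - e_receptor[0] - e_ligand[0]
    error = math.sqrt(e_complex[1] ** 2 + e_receptor[1] ** 2 + e_ligand[1] ** 2)
    return value, error


# Hypothetical numbers: large total energies, each with a small relative error.
delta, sigma = difference_with_error((-5230.0, 4.0), (-5150.0, 4.0), (-40.0, 1.0))
print(f"net energy = {delta:.1f} +/- {sigma:.1f} kcal/mol")
```

With these invented values each total energy is known to better than 0.1%, yet the error of the net energy is a sizeable fraction of the net energy itself. Restricting the relaxation to a small region keeps the large receptor terms nearly identical before and after binding, so more of their error cancels.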

Still, relaxing the entire system is the more physically correct way to calculate these energies [102]. Our own results indicate that rescoring with only minimal binding site relaxation is less able to distinguish real actives from false positives (data not shown).

Additionally, some changes in polarity (due to certain substituents) could increase the solvation cost in a way that is not captured by the GBSA model. The challenges of balancing ligand electrostatic interaction energies and desolvation penalties were also apparent in anionic cavities [102]. Overall, the results of MM-GBSA rescoring of docking hit lists on the model binding sites seem mixed. On the one hand, rescoring could:

1) rescue many docking false negatives

2) improve the geometric fidelity of most of the predicted structures

3) and increase the diversity of the hit lists.

PBSA scoring as implemented in DOCK6 is currently considered one of the best rescoring methods. It has proven very efficient in increasing enrichment factors in spite of the approximations it contains. Our reliance on fixed ligand conformations is another source of error in this work; the results could be improved by allowing ligand conformational flexibility [109]. The calculations in PBSA scoring also do not correct for lost degrees of rotational and translational freedom on binding, nor do they consider gains in vibrational entropy of the system on ligand binding.
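The enrichment factor mentioned above is not defined in this section. A commonly used definition is the ratio between the fraction of actives found in the top-ranked part of the list and the fraction of actives in the whole database. The sketch below follows that definition; the function name and the toy ranking are hypothetical.

```python
def enrichment_factor(ranked_is_active, fraction):
    """Enrichment factor for the top `fraction` of a ranked list.

    ranked_is_active: list of booleans, best-scored compound first.
    """
    n_total = len(ranked_is_active)
    n_top = max(1, round(n_total * fraction))
    actives_total = sum(ranked_is_active)
    if actives_total == 0:
        raise ValueError("no actives in the list")
    actives_top = sum(ranked_is_active[:n_top])
    return (actives_top / n_top) / (actives_total / n_total)


# Toy list of 20 compounds with 4 actives, 3 of them ranked in the top 5.
ranked = [True, True, False, True, False] + [False] * 14 + [True]
ef_25 = enrichment_factor(ranked, 0.25)
print(f"EF(25%) = {ef_25:.2f}")
```

A value of 1 corresponds to random selection; rescoring is useful when it raises this value for the top of the list.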

Moreover, it has not been investigated how the terms (vdW, electrostatics, and surface loss) should be combined. In some cases the desolvation penalty for hydrogen-bonding groups may not be adequate [73]. The failure to adequately penalize neutral polarity may also stem from the use of an inductive method for calculating partial atomic charges [110]. Using quantum mechanically derived partial atomic charges may therefore improve matters [73, 111].

Recently, several studies have shown good results in the application of MM-GBSA and MM-PBSA rescoring methods [112-114]. Even in the absence of more intensive, detailed energy evaluation schemes, it is clear that fairly simple considerations can dramatically improve the ability to distinguish binders from non-binders. One possible improvement is to calculate desolvation penalties that reflect the degree of burial for each orientation of each ligand [73]. Correcting for solvation helps to recognize more true inhibitors and fewer decoys in virtual screening for receptors of known structure.

3.6 Docking Programs and Rescoring Methods

GOLD 4.0 (Genetic Optimization for Ligand Docking) is an automated ligand docking

program that uses a genetic algorithm for flexible ligand docking to a fixed protein structure.


Three different fitness and scoring functions are available with GOLD: Goldscore,

Chemscore, and ASP score.

An additional important feature of GOLD is the possibility to use docking constraints by

several methods:

1) Distance constraint, for use with individual ligands

2) Substructure based distance constraint, for use with multiple ligands that have a common

substructure or functional group.

3) Hydrogen bond constraint, for specifying a hydrogen bond between a particular ligand

atom and a particular atom in the protein.

4) Protein hydrogen bond constraint, for specifying that a particular protein atom should be

hydrogen-bonded to the ligand, but without specifying to which ligand atom.

5) Region (hydrophobic) constraint, for biasing the docking towards solutions in which

particular regions of the binding site are occupied by specific ligand atoms or types of ligand

atom.

6) Template similarity constraint, for biasing the conformation of docked ligands towards a

given solution, or template.

7) Scaffold constraint, to place a ligand fragment at an exact specified position in the binding

site.

The protein hydrogen bond constraint can be used to find a universal setting that allows a virtual screening run to achieve high enrichment factors and energetically preferred binding modes.

3.6.1 PBSA Scoring using ZAP Library and AMBER-score

The ZAP library is a Poisson-Boltzmann (PB) electrostatics toolkit provided by OpenEye. The Poisson equation describes how electrostatic fields change in a medium of varying dielectric, such as an organic molecule in water. The Boltzmann modification takes into consideration the effect of mobile charges, e.g. salt. PB is an effective way to simulate the effects of water in biological systems. It relies on a charge description of a molecule, the designation of low (molecular) and high (solvent) dielectric regions, and a description of an ion-accessible volume, and it produces a grid of electrostatic potentials. From this grid, transfer energies between different solvents, binding energies, pKa shifts, pI values, solvent forces, electrostatic descriptors, solvent dipole moments, surface potentials and dielectric focusing are calculated. As electrostatics is one of the two principal components of molecular interaction (the other, of course, is shape complementarity), ZAP is OpenEye's attempt to calculate the electrostatic energy as precisely as possible.

The AMBER-score combines AMBER molecular mechanics with implicit solvation, molecular dynamics simulation, receptor flexibility, and conjugate gradient minimization. It implements molecular mechanics implicit solvent simulations with the traditional all-atom AMBER force field for protein atoms and the general AMBER force field (GAFF) for ligand atoms. The interaction between the ligand and the receptor is represented by the sum of the electrostatic and van der Waals energy terms; in addition, the solvation energy is calculated using a Generalized Born (GB) solvation model. The user can choose one of the following GB models: (i) the Hawkins, Cramer and Truhlar pairwise GB model with parameters described by Tsui and Case (gb=1) [115], (ii) the Onufriev, Bashford and Case model, GB (OBC) (gb=2) [116], and (iii) a modified GB (OBC) (gb=5) [75]. The surface area term is derived using a fast LCPO algorithm [117].

The AMBER-score is calculated as:

E (Complex) - [E (Receptor) + E (Ligand)]

where E (Complex), E (Receptor), and E (Ligand) are, respectively, the internal energies of the complex, receptor, and ligand (all solvated) as approximated by the AMBER force field with GBSA solvation terms. The calculation of each of these three energies uses the same protocol:

minimization with a conjugate gradient method is followed by MD simulation (Langevin

molecular dynamics at constant temperature), another minimization, and a final energy

evaluation. The user can specify the number of pre-MD-minimization cycles, the number of

MD simulation steps, and the number of post-MD-minimization cycles in the dock input file.

During the final energy evaluation, a surface area term is included. The receptor energy is

determined once. The AMBER-score energy protocol is performed for every ligand and its

corresponding complex.

3.6.2 Cscore

Scoring functions can be adapted from force field approaches, estimating the enthalpy of

binding via the pair-energy of the complex. Other functions estimate the entropy of

binding, incorporating terms for desolvation and loss of conformational flexibility.

While such functions are more chemically appealing, they require significantly more


statistical fitting than those based on force fields. FlexX is an example of this second

approach. Statistically-fit functions are dependent on their training set. Each author has

tried to make this as general as possible, but concerns remain as to the extensibility of

these functions to new systems. Since each scoring function has been derived from a

different set of crystal structures, it is reasonable to use multiple functions when

evaluating a protein-ligand pair.

According to the consensus scoring principle, structures that are considered good fits by multiple scoring functions can be examined further, while those that are not can be dropped. The CScore approach can be used in Sybyl 7.3 and 8.1 for consensus scoring in virtual high-throughput screening [118]. CScore provides several functions:

- G_score, based on the work of Willett's group
- D_score, based on the work of Kuntz et al.
- PMF_score, based on the work of Muegge and Martin
- Chemscore, based on the work of Eldridge, Murray, Auton, Paolini, and Mee

The consensus can be generated from any combination of these or other previously calculated scores. The FlexX scoring function can be added to CScore if a FlexX license is available.
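The consensus idea can be sketched as a simple vote count. This is a generic illustration, not the Sybyl CScore algorithm: the 50% cut-off, the "lower score is better" convention, and all compound names and score values are assumptions made for the example.

```python
from typing import Dict


def consensus_counts(scores: Dict[str, Dict[str, float]],
                     top_fraction: float = 0.5) -> Dict[str, int]:
    """Count, per compound, how many scoring functions rank it in their top fraction.

    scores maps function name -> {compound: score}; lower scores are assumed
    to be better for every function (an assumption of this sketch).
    """
    counts: Dict[str, int] = {}
    for per_compound in scores.values():
        ranked = sorted(per_compound, key=per_compound.get)
        n_keep = max(1, int(len(ranked) * top_fraction))
        for compound in per_compound:
            counts.setdefault(compound, 0)
        for compound in ranked[:n_keep]:
            counts[compound] += 1
    return counts


# Hypothetical scores for four compounds from three scoring functions.
scores = {
    "G_score": {"c1": -200.0, "c2": -150.0, "c3": -90.0, "c4": -60.0},
    "D_score": {"c1": -120.0, "c2": -80.0, "c3": -130.0, "c4": -40.0},
    "PMF_score": {"c1": -55.0, "c2": -20.0, "c3": -30.0, "c4": -60.0},
}
counts = consensus_counts(scores)
# Keep compounds that at least two functions consider good fits.
selected = sorted(c for c, n in counts.items() if n >= 2)
print(counts, selected)
```

Counting votes instead of averaging raw scores avoids mixing functions with different units and scales.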

3.7 Similarity Search

Chemical fingerprints can be used to search large compound databases for molecules structurally related to a given search query. Several fingerprint systems are implemented in Chemical Computing Group's Molecular Operating Environment (MOE) [119]. Each fingerprint system supports a number of similarity metrics and uses a different representation. The most important fingerprint systems are:

1) MACCS Structural Keys (feature list version). Each feature indicates the presence of

one of the 166 public MDL MACCS structural keys computed from the molecular

graph. The fingerprint is represented as a sparse list of keys present in the molecule.

2) Bit MACCS: MACCS Structural Keys (bit packed version). Each feature indicates

the presence of one of the 166 public MDL MACCS structural keys calculated from

the molecular graph. The fingerprint is a dense bit vector of feature bits 6 words long.


3) Protein Ligand Interactions Fingerprints: Each feature represents a protein-ligand

interaction type, e.g. hydrogen bond or ionic interaction.

4) piDAPH3: 3-point pharmacophore based fingerprint calculated from a 3D conformation. Each atom is given one of 8 atom types computed from 3 atomic properties: "in pi system", "is donor", "is acceptor". Anions and cations are not represented. Then, all triplets of atoms are coded as features using the three inter-atomic distances and three atom types of each triangle. The resulting fingerprint is represented as a sparse feature list.

5) piDAPH4: 4-point pharmacophore based fingerprint calculated from a 3D conformation. Each atom is given one of 8 atom types computed from 3 atomic properties: "in pi system", "is donor", "is acceptor". Anions and cations are not represented. Then, all quadruplets of atoms are coded as features using the six inter-atomic distances, four atom types and chirality of each quadruplet. The resulting fingerprint is represented as a sparse feature list.

6) GpiDAPH3: 3-point pharmacophore based fingerprint calculated from the 2D molecular graph. Each atom is given one of 8 atom types computed from 3 atomic properties: "in pi system", "is donor", "is acceptor". Anions and cations are not represented. Then, all triplets of atoms are coded as features using the three graph distances and three atom types of each triangle. The resulting fingerprint is represented as a sparse feature list.

A Tanimoto similarity search can then be performed using MOE. The Tanimoto similarity module calculates the similarity of each target molecule to one or more reference molecules using a molecular fingerprint system. The Tanimoto similarity is defined by the expression:

Similarity = Nab / (Na + Nb + Nab)

where Nab is the number of fingerprint bits present in both the reference and the target molecule, Na is the number of fingerprint bits present only in the reference molecule, and Nb is the number of fingerprint bits present only in the target molecule. The Tanimoto similarity index ranges from zero (no common bits) to one (identical bits).
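As a minimal sketch of this expression, the fingerprints can be represented as sets of "on" bit indices, which matches the sparse feature-list representation described above. The function below is not MOE's implementation, and the bit indices are arbitrary example values.

```python
def tanimoto(reference: set, target: set) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    n_ab = len(reference & target)  # bits present in both molecules
    n_a = len(reference - target)   # bits present only in the reference
    n_b = len(target - reference)   # bits present only in the target
    if n_a + n_b + n_ab == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return n_ab / (n_a + n_b + n_ab)


# Arbitrary example fingerprints (bit indices are illustrative only).
reference_bits = {3, 17, 42, 80, 125}
target_bits = {3, 17, 42, 99}
similarity = tanimoto(reference_bits, target_bits)
print(similarity)
```

In this example three bits are shared, two occur only in the reference and one only in the target, so the similarity is 3 / (2 + 1 + 3).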

3.8 ZINC Compound Library

ZINC is a free database of commercially available compounds for virtual screening, provided by the University of California, San Francisco. ZINC currently contains over 8 million purchasable compounds in ready-to-dock 3D formats. ZINC 8 is currently available online for download (http://zinc.docking.org). It is built from the catalogs of ten major compound vendors and is updated periodically by deleting unavailable compounds, updating the vendors' lists, or adding new chemical vendors. Of these 8 million compounds, 5 million are Lipinski compliant [120], with the caveat that Molinspiration's LogP has been used as a surrogate for cLogP. Of these, 1.1 million are lead-like molecules, defined as having a molecular weight between 150 and 350, a calculated LogP of less than four, at most three hydrogen-bond donors, and at most six hydrogen-bond acceptors. A total of 63 thousand molecules are fragment-like, with:

- calculated LogP values between -2 and 3
- fewer than three hydrogen-bond donors
- fewer than six hydrogen-bond acceptors
- fewer than three rotatable bonds
- molecular weight less than 250
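The lead-like and fragment-like definitions above translate directly into property filters. The sketch below assumes that the descriptors have already been calculated by some other tool and that the "between" limits are inclusive; the `Descriptors` class and the example values are hypothetical and are not taken from ZINC.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one compound (values used below are illustrative)."""
    mol_weight: float
    logp: float
    hbd: int        # hydrogen-bond donors
    hba: int        # hydrogen-bond acceptors
    rot_bonds: int  # rotatable bonds


def is_lead_like(d: Descriptors) -> bool:
    """MW 150-350, LogP < 4, at most 3 donors, at most 6 acceptors."""
    return 150 <= d.mol_weight <= 350 and d.logp < 4 and d.hbd <= 3 and d.hba <= 6


def is_fragment_like(d: Descriptors) -> bool:
    """LogP -2 to 3, < 3 donors, < 6 acceptors, < 3 rotatable bonds, MW < 250."""
    return (-2 <= d.logp <= 3 and d.hbd < 3 and d.hba < 6
            and d.rot_bonds < 3 and d.mol_weight < 250)


small = Descriptors(mol_weight=180.2, logp=1.2, hbd=1, hba=3, rot_bonds=2)
medium = Descriptors(mol_weight=320.4, logp=3.5, hbd=2, hba=5, rot_bonds=6)
large = Descriptors(mol_weight=480.0, logp=5.1, hbd=4, hba=8, rot_bonds=9)
for name, d in [("small", small), ("medium", medium), ("large", large)]:
    print(name, is_lead_like(d), is_fragment_like(d))
```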

3.9 Fragment-based Drug Design

Knowledge of how well a given fragment binds to a protein target allows hits to be optimized by growing the fragments, or new leads to be found by combining and linking different fragments. The main benefit of using fragments rather than larger small molecules is the notable reduction in the size of the chemical space, as fragments contain fewer atoms.

The fragment universe is much smaller than the chemical universe of small molecules. The size of the chemical universe of compounds below 160 Da is estimated to be about 14 million compounds [121]. Screening a fragment library of 10,000 compounds therefore covers a substantially larger fraction of the relevant chemical diversity space than a conventional high-throughput screen.

An additional factor in favour of fragment-based screening is the hypothesis proposed by Hann and co-workers [122], which states that less complex molecules should show higher hit rates against protein targets. As a result, even though a typical fragment screen explores much less than 1% of the available low-molecular-mass universe, the ability to find leads is substantially higher, which increases the value of the screen. This theoretical model has recently been validated by the Novartis group [123], who observed hit rates for fragment screens that were 10-1,000 times higher than those of conventional high-throughput screens.
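As a rough check of the coverage quoted in this section, the library size of 10,000 fragments can be compared with the estimate of about 14 million compounds below 160 Da. This is only an order-of-magnitude illustration, since a fragment library is not restricted to that mass range:

```latex
\frac{10\,000}{14 \times 10^{6}} \approx 7.1 \times 10^{-4} \approx 0.07\,\%
```

This agrees with the statement that a typical fragment screen explores much less than 1% of the low-molecular-mass universe.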


4 Implementation

4.1 Molecular Modeling

Seven X-ray crystal structures are reported in the Protein Data Bank for mammalian AANAT.

The three protein-ligand structures with the highest crystallographic resolution (PDB codes

1CJW, 1KUV and 1KUX) represent suitable targets for virtual screening purposes [35]. All

three structures contain a potent bi-substrate inhibitor of AANAT. 1KUX (resolution 1.8 Å)

has been selected for the current virtual screening study. The coordinates of the protein were

extracted from the corresponding PDB file. The inhibitor was removed, an
