
University of Groningen

The application of molecular dynamics simulation techniques and free energy

Pieffet, Gilles


Document Version: Publisher's PDF, also known as Version of record

Publication date: 2005

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA): Pieffet, G. (2005). The application of molecular dynamics simulation techniques and free energy. Groningen: s.n.


https://www.rug.nl/research/portal/en/publications/the-application-of-molecular-dynamics-simulation-techniques-and-free-energy(530b6461-a0ef-4397-9532-3e692d3b33ce).html


The Application of Molecular Dynamics

Simulation Techniques and Free Energy

Calculations to Predict Protein-Protein

and Protein-Ligand Interactions

This Ph.D. study was carried out in the Groningen Biomolecular Sciences and Biotechnology Institute (Faculty of Mathematics and Natural Sciences, University of Groningen).

RIJKSUNIVERSITEIT GRONINGEN

The Application of Molecular Dynamics Simulation Techniques and Free Energy Calculations to Predict Protein-Protein and Protein-Ligand Interactions

Thesis

to obtain the degree of doctor in

Mathematics and Natural Sciences

at the University of Groningen

on the authority of the

Rector Magnificus, dr. F. Zwarts,

to be defended in public on

Monday 25 April 2005

at 14:45

by

Gilles Paul Pieffet

born on 31 December 1973

in Paris


Supervisor (promotor): Prof. dr. A.E. Mark

Assessment committee: Prof. dr. B.W. Dijkstra, Prof. dr. B. Poolman, Prof. dr. D.B. Janssen


There is more to life than proteins even if proteins seem to be life’s favourite mode of expression.


Contents

List of Figures

List of Tables

1 Introduction
   1.1 Proteins
   1.2 The protein folding problem
   1.3 Experimental structure determination
   1.4 Molecular dynamics
   1.5 Free energy
   1.6 Outline

2 Self association of the EPO mimetic peptide 1
   2.1 Introduction
   2.2 Methods
      2.2.1 Simulation parameters
      2.2.2 Aggregation simulations
      2.2.3 Dimerization simulations
      2.2.4 Conformational space search
      2.2.5 Effect of the temperature
      2.2.6 Analysis
   2.3 Results/Discussion
      2.3.1 Aggregation of the EPO mimetic peptide 1
      2.3.2 Dimerization of the EPO mimetic peptide 1
      2.3.3 Conformational space search, Binding modes and main clusters of conformations
      2.3.4 Effect of the temperature on the dimer conformation
   2.4 Conclusion

3 Free energy calculations of protein-ligand interactions: the binding of triphenoxypyridine derivatives to factor Xa and trypsin
   3.1 Introduction
   3.2 Methods
      3.2.1 Mutations
      3.2.2 Force Field
      3.2.3 Computational Details
      3.2.4 Free energy calculations
   3.3 Results
      3.3.1 Mutations in water
      3.3.2 Ligand-protein complexes
      3.3.3 Mutations in factor Xa and trypsin
      3.3.4 Experiment vs. calculation
   3.4 Discussion and conclusions

4 Free energy calculations of the relative stability of the SUC1 dimer upon mutation
   4.1 Introduction
   4.2 Method
   4.3 Results and discussion
   4.4 Conclusion

5 Sampling and convergence in free energy calculations: Suc1 as a case study
   5.1 Introduction
   5.2 Background
   5.3 Method
   5.4 Results and discussion
      5.4.1 Sampling error and statistical errors in free energy calculations
      5.4.2 Statistical error vs sampling error
      5.4.3 Convergence in free energy calculations
   5.5 Conclusion

6 Conclusion - Outlook
   6.1 Free energy calculations
   6.2 The sampling issue

Bibliography

Summary

Samenvatting

Acknowledgements


List of Figures

2.1 Crystallographic structure of the EBP-EMP1 complex and of the peptide dimer of EMP1.
2.2 Starting conformations used for studying the dimerization of EMP1.
2.3 Aggregation of EMP1 as a function of time at two different temperatures.
2.4 Structure of EMP1 before and after aggregation.
2.5 Variation of the solvent accessible surface as a function of time.
2.6 Distance between centers of mass as a function of time.
2.7 Intermolecular secondary structure elements of the simulation Pair1a as a function of time.
2.8 Intermolecular secondary structure elements for the six simulations Pair1a, 1b, 2a, 2b, 3a and 3b of 100 ns each.
2.9 Root mean square deviation matrix of the dimer for the 6 simulations.
2.10 Root mean square deviation matrix of the monomers during the 6 simulations.
2.11 Backbone representations of the 4 most populated clusters found for the monomer.
2.12 Backbone of representative structures of the 10 most populated clusters found for the dimer.
2.13 Size of the 20 most populated clusters of the dimer.
2.14 Root mean square deviation matrix of the dimer for 18 simulations of 100 ns.
2.15 Intermolecular secondary structure of pair1 and pair3 at 350 K, 400 K and 450 K as a function of time.
2.16 Root mean square deviation matrix of the dimer at 350, 400 and 450 K.

3.1 The generic chemical structure of the inhibitors, 2,4,6-triphenoxypyridine.
3.2 Mutations performed to transform inhibitors I1–I6 into each other.
3.3 Mutations performed to transform inhibitors I2, and I7–I10 into each other.
3.4 Thermodynamic cycles used to determine the difference in binding free energy between inhibitor X and inhibitor Y.
3.5 Schematic diagram of the mutations between the inhibitors in water.
3.6 The I2 → I5 mutation.
3.7 Free energy profiles of the mutation 2→5 and 2→5* over 150 ps and 6 ns.
3.8 Cumulative and local average of the free energy derivative at λ = 0.65 for the mutation I2 → I5 in water.
3.9 The binding site of factor Xa bound to I1.
3.10 The binding site of trypsin bound to I1.
3.11 Schematic diagram of the mutations between the inhibitors in factor Xa.
3.12 Schematic diagram of the mutations between the inhibitors in trypsin.
3.13 Schematic illustration of the inhibitors classified according to their relative affinities to trypsin and factor Xa.

4.1 Crystallographic structure of Suc1 in its monomeric and strand-exchanged dimeric form.
4.2 Root Mean Square Deviation plot of Monomer-nce and DimerB-D-nce during equilibration.
4.3 Calculated versus the experimental relative free energy of dissociation for mutants with respect to the wild type.
4.4 Free energy profiles of all the mutations performed on the monomer of Suc1.
4.5 Free energy profiles of the LA and VA mutations.
4.6 Evolution of the χ1 and χ2 angles of the Leu residues that are being mutated into Ala as a function of time.

5.1 Free energy profiles for both the monomer and the dimer of the mutations V89A, L74A and L95A.
5.2 Characteristics of the simulation of the mutation L74A performed at λ = 0.
5.3 Free energy derivative averages and evolution of the dihedrals χ1 and χ2 of Leu 74 for the mutation L74A performed at λ = 0.5.
5.4 Example of the effect of soft-core interactions at λ = 0.5.
5.5 Free energy derivative averages for the mutation L74A reverse at λ = 0.45 and L74A random at λ = 0.40.
5.6 Free energy profiles of the mutations L74A and L95A using extended simulations for specific λ points.


List of Tables

1.1 The 20 amino acids together with their three and one letter code.

2.1 Summary of the simulations performed to study the self association of EMP1.
2.2 Summary of the parameters used in the simulations.

3.1 The set of 10 inhibitors binding to factor Xa and trypsin.
3.2 The free energies for mutating inhibitor X into inhibitor Y in water.
3.3 The free energies for the mutations I2 → I5 and I2 → I5* in water using different sampling times.
3.4 Free energies of mutating one inhibitor into another along circular paths.
3.5 The relative binding free energy of the inhibitors to factor Xa and trypsin with respect to inhibitor I2.
3.6 Experimental results of the binding affinities of the inhibitors to factor Xa and trypsin.

4.1 List of the mutations performed on the dimer and monomer of Suc1.
4.2 Free energies calculated for the mutations of the Suc1 using thermodynamic integration and the corresponding experimental results.
4.3 Relative stability of the Ala95 mutant dimer with respect to the wild type dimer (Leu95) for different values of the soft-core parameter α.

5.1 List of the λ values for which extended simulations were performed for the L74A and L95A mutations.
5.2 The free energies obtained for the mutations V89A, L74A and L95A using different schemes.
5.3 Sampling characteristics of the mutation L74A of the monomer for simulations at λ = 0, 0.40, 0.45, 1.
5.4 List of the free energies obtained for the mutations L74A and L95A for extended simulations.


Abbreviations

CDK cyclin-dependent kinase

CKS cyclin-dependent kinase subunit

COM Center of Mass

EMP1 EPO Mimetic Peptide 1

EPO Erythropoietin

EM Energy Minimization

FE Free Energy

FEP Free Energy Perturbation

MD Molecular Dynamics

MC Monte Carlo

NMR Nuclear Magnetic Resonance

NOE Nuclear Overhauser Effect (or Enhancement)

NOESY NOE Spectroscopy

PDB Protein Data Bank

SUC1 Suppressor of cdc2

TI Thermodynamic Integration

TOCSY Total Correlation Spectroscopy

RMSD Root Mean Square Deviation

SPC Simple Point Charge

WT Wild Type


Chapter 1

Introduction

This chapter provides a general introduction to proteins, protein folding and the techniques, both experimental and computational, commonly used for their study. The protein folding problem is also posed. As all the simulations in this thesis are performed using molecular dynamics simulation techniques, a special emphasis is placed on this method. The concept of free energy is discussed within the framework of protein folding. The methods used to calculate relative free energies are also introduced.


Proteins are at the center of most, if not all, biological processes. Their range of activity spans from receptor activation and signal transduction to regulating cellular processes such as membrane fusion. One class of proteins called enzymes can be seen as molecular factories catalyzing very specific chemical reactions. Proteins can also be found in membranes where they regulate the transport of specific molecules such as ions and metabolites across cellular boundaries. A common feature in all these cases is that proteins carry out their function through binding and interaction with specific partners.

Our view of proteins continues to evolve over time. At the beginning of the last century, proteins were generally thought to be simple colloids. The first crystal structure of myoglobin, solved in 1958 [1], revealed that proteins could have a complex, well-defined 3D structure. The fact that all of the initial proteins studied in detail had structures solved by x-ray crystallography led to the dogma that all functional proteins had well-defined structures. However, numerous proteins have recently been found to lack intrinsic structure under physiological conditions [2]. It appears that they only become structured upon binding to a target molecule. Among other advantages, such a mechanism would confer the ability to bind, possibly in different conformations, to several different targets. This raises the question of the nature of the native structure. In early studies [3] it was already questioned whether the native structure of a protein (its functional form) corresponded to the thermodynamic equilibrium structure or whether the functional form was just transient and existed only for the period of time needed for the protein to perform its specific function.

In most cases, however, it does appear that the native structure of a protein is unique and directly related to its function. For this reason much effort has gone into understanding the nature of the protein structure. The structure adopted by a protein is the result of a complex molecular recognition mechanism that depends on the cooperative action of many weak non-bonding interactions (van der Waals and Coulombic). Since all the information necessary for a protein to fold to a unique structure is solely contained in the amino acid sequence [3], many methods have been developed which attempt to predict the folding behavior of a protein on the basis of its sequence only. No simple solution has yet been found to this extremely complex problem. In addition, the specific structure observed under certain conditions (pH, presence of salt...) depends not only on the sequence but also on the nature of the environment, increasing the complexity of structure prediction.

1.1 Proteins

Proteins are biological macromolecules. They consist of a chain of amino acids (or residues) linked by peptide bonds. There are 20 naturally occurring amino acids (Table 1.1). The residues are composed of two parts, a backbone and a side-chain. The backbone is identical for all residues with the exception of proline.


Table 1.1: The 20 amino acids found in nature together with their three and one letter code.

Residue        3-letter  1-letter    Residue        3-letter  1-letter
Alanine        Ala       A           Leucine        Leu       L
Arginine       Arg       R           Lysine         Lys       K
Asparagine     Asn       N           Methionine     Met       M
Aspartate      Asp       D           Phenylalanine  Phe       F
Cysteine       Cys       C           Proline        Pro       P
Glutamine      Gln       Q           Serine         Ser       S
Glutamate      Glu       E           Threonine      Thr       T
Glycine        Gly       G           Tryptophan     Trp       W
Histidine      His       H           Tyrosine       Tyr       Y
Isoleucine     Ile       I           Valine         Val       V

All the differences observed in the structures of different proteins are therefore determined by the side-chains.

Protein structure can be represented at different levels. The primary structure corresponds to the sequence of amino acids. The secondary structure refers to the formation of local structure [4] and describes the conformation of the backbone. Characteristic elements are α-helices and β-sheets, and the structure of a protein is often described in terms of these two elements of secondary structure. It should be noted that structures described by secondary structure elements are local in space and primarily involve sequential residues (α-helices) or complementary sequences (β-sheets). Tertiary structure refers to the 3D structure or the spatial arrangement of the elements of secondary structure.

1.2 The protein folding problem

As stated above, the key to understanding the function of a protein is to be able to determine its structure. The protein folding problem can generally be defined as knowing the relationship between the amino-acid sequence and the native structure. The problem may be more clearly understood if expressed in terms of two separate aspects. The first concerns the determination of the native structure of the protein from the amino-acid sequence only. The second is related to understanding the folding process itself. As will be seen later in this chapter, a method capable of correctly predicting the final structure does not necessarily yield any insight into the mechanism of folding itself, in the same way that knowing the fold/conformation of a protein does not allow one to predict the amino-acid sequence.

1.3 Experimental structure determination

The most important methods used to determine the structure of a protein are x-ray crystallography and NMR spectroscopy [5], and most of the structures available from public databases such as the Protein Data Bank were obtained using these methods. Each of them has strengths and weaknesses. X-ray crystallography yields high resolution but (almost) no dynamical information. NMR spectroscopy in contrast usually yields less precise structures (the time resolution of the method is such that the signal corresponds to a time and ensemble average of structures) but offers the possibility to extract some information on the dynamical properties of the system.

1.4 Molecular dynamics

Molecular dynamics is the method of choice when one wants to study the dynamical properties of a system in full atomic detail, provided that the properties are observable within the time scale accessible to simulations. Time scale is one of the two main limitations of the method, as will be discussed later. Molecular dynamics simulations are also useful when the system cannot be studied by the experimental methods mentioned above, for example when the protein cannot be crystallized or is too big or insoluble to be studied by NMR.

To calculate the dynamics of the system, that is, the position of each atom as a function of time, Newton’s classical equations of motion are solved iteratively for each atom:

$$\mathbf{F}_i = m_i \mathbf{a}_i = m_i \frac{d^2 \mathbf{r}_i}{dt^2} \tag{1.1}$$

The force on each atom is the negative of the derivative of the potential energy with respect to the position of the atom:

$$\mathbf{F}_i = -\frac{\partial V}{\partial \mathbf{r}_i} \tag{1.2}$$

If the potential energy of the system is known then, given the coordinates of a starting structure and a set of velocities, the force acting on each atom can be calculated and a new set of coordinates generated, from which new forces can be calculated. Repetition of the procedure will generate a trajectory corresponding to the evolution of the system in time.
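The iterative procedure described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a single particle in a one-dimensional harmonic potential and uses the velocity Verlet scheme, which is one common integrator and not necessarily the one used for the simulations in this thesis. The function names and all parameter values are hypothetical.

```python
import math


def force(x, k=1.0):
    """Force from V(x) = 0.5*k*x**2, i.e. F = -dV/dx (cf. Eq. 1.2)."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy, useful as a conservation check."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def velocity_verlet(x0, v0, mass=1.0, dt=0.01, n_steps=1000, k=1.0):
    """Integrate Newton's equation of motion (cf. Eq. 1.1) for one particle.

    Returns a list of (time, position, velocity) tuples: the trajectory.
    """
    x, v = x0, v0
    f = force(x, k)
    trajectory = [(0.0, x, v)]
    for step in range(1, n_steps + 1):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # new coordinates
        f = force(x, k)                    # new forces from new coordinates
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((step * dt, x, v))
    return trajectory


if __name__ == "__main__":
    t, x, v = velocity_verlet(x0=1.0, v0=0.0)[-1]
    print(f"t = {t:.2f}  x = {x:.5f}  analytic x = {math.cos(t):.5f}")
    print(f"total energy = {total_energy(x, v):.6f}")
```

For this toy system the exact solution is x(t) = cos(t), so the numerical trajectory and the conservation of total energy can be checked directly. In a real simulation the force would come from the full force field rather than a single harmonic term.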

The accuracy of the simulations is directly related to the potential energy function used to describe the interactions between particles. In molecular dynamics, a classical potential energy function is used that is defined as a function of the coordinates of each of the atoms. The potential energy function is separated into terms representing covalent interactions and non-covalent interactions. The covalent interactions may be described by the following terms:

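$$V_{\mathrm{bond}} = \sum_{i=1}^{N_b} \frac{1}{2} k_i^{b} \left(r_i - r_{0,i}\right)^2 \tag{1.3}$$

$$V_{\mathrm{angle}} = \sum_{i=1}^{N_\theta} \frac{1}{2} k_i^{\theta} \left(\theta_i - \theta_{0,i}\right)^2 \tag{1.4}$$

$$V_{\mathrm{dihedral}} = \sum_{i=1}^{N_\varphi} \frac{1}{2} k_i^{\varphi} \cos\!\left(n_i \left(\varphi_i - \varphi_{0,i}\right)\right) \tag{1.5}$$

$$V_{\mathrm{improper}} = \sum_{i=1}^{N_\xi} \frac{1}{2} k_i^{\xi} \left(\xi_i - \xi_{0,i}\right)^2 \tag{1.6}$$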

which correspond to two-, three-, four- and four-body interactions, respectively. These interactions are represented by harmonic potentials for the bond lengths ri, the bond angles θi and the improper dihedral (out-of-plane) angles ξi, and by a more complex potential for the dihedral angles φi. The non-covalent interactions, which correspond to interactions between particles separated by more than three covalent bonds, are usually described by Coulomb’s law

$$V_{\mathrm{Coulomb}} = \sum_{i<j} \frac{1}{4\pi\varepsilon_0\varepsilon_r} \frac{q_i q_j}{r_{ij}} \tag{1.7}$$

for the electrostatic interactions and by a Lennard-Jones potential

$$V_{\mathrm{LJ}} = \sum_{i<j} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right) \tag{1.8}$$

for the van der Waals interactions, where rij is the distance between particles i and j. The complete set of parameters used in the potentials (force constants, ideal bond lengths, bond angles, improper dihedral angles, dihedral angles, partial charges and van der Waals parameters) to describe the interactions between different particle types is called the force field.
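To make the functional forms above concrete, the following Python sketch evaluates a harmonic term (as in Eqs. 1.3, 1.4 and 1.6) and the two non-covalent pair terms (Eqs. 1.7 and 1.8). It is an illustration under stated assumptions, not the force field implementation used in this work: the unit system (kJ/mol, nm, elementary charges), the rounded conversion factor and all function names are choices made here, and the exclusion of atoms separated by three or fewer bonds is deliberately omitted for brevity.

```python
import math

# Assumed unit system: kJ/mol, nm, elementary charge.
# 1/(4*pi*eps0) in kJ mol^-1 nm e^-2, rounded to three decimals.
COULOMB_FACTOR = 138.935


def harmonic_energy(k, x, x0):
    """Harmonic term 0.5*k*(x - x0)**2, the form of Eqs. 1.3, 1.4 and 1.6."""
    return 0.5 * k * (x - x0) ** 2


def coulomb_energy(qi, qj, rij, eps_r=1.0):
    """Electrostatic pair energy (one term of Eq. 1.7)."""
    return COULOMB_FACTOR * qi * qj / (eps_r * rij)


def lennard_jones_energy(a_ij, b_ij, rij):
    """Lennard-Jones pair energy A/r^12 - B/r^6 (one term of Eq. 1.8)."""
    return a_ij / rij ** 12 - b_ij / rij ** 6


def nonbonded_energy(coords, charges, a, b, eps_r=1.0):
    """Sum Coulomb and Lennard-Jones terms over all pairs i < j.

    coords  : list of (x, y, z) tuples
    charges : list of partial charges
    a, b    : square tables of pair parameters A_ij and B_ij
    Note: bonded-neighbour exclusions are NOT applied in this sketch.
    """
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            rij = math.dist(coords[i], coords[j])
            total += coulomb_energy(charges[i], charges[j], rij, eps_r)
            total += lennard_jones_energy(a[i][j], b[i][j], rij)
    return total


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    charges = [1.0, -1.0]
    a = [[0.0, 1.0], [1.0, 0.0]]
    b = [[0.0, 2.0], [2.0, 0.0]]
    print(nonbonded_energy(coords, charges, a, b))
```

The double loop scales as the square of the number of particles, which is one reason why practical simulation codes use cut-offs and neighbour lists instead of the naive sum shown here.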

Molecular dynamics is a very useful tool. It can provide a wealth of detailed information on the structure and dynamics of proteins and peptides. However, it suffers from certain limitations. First, the method is computationally very demanding and, depending on the size of the system, simulation times are currently limited to hundreds of nanoseconds or a few microseconds at most. For example, the cumulated simulation time of the peptide studied in the second chapter, the EPO mimetic peptide 1, is approximately 2.4 µs. This is too short to observe, for instance, the complete folding of a protein, which occurs on a time scale ranging from milliseconds to seconds. Also, the form of the potential energy function must be kept simple for reasons of efficiency. The possibility to observe certain properties is directly related to the quality of the force field and to whether or not it has been parameterized for the system simulated. The quality of the force field is especially critical in the simulation of proteins. Proteins are in general only marginally stable. The difference in free energy between the folded and unfolded form is of the order of 10-20 kBT, which corresponds to the energy associated with the formation of a couple of hydrogen bonds in vacuum. The force field thus needs to be very accurate to discriminate between different conformations. However, it is questionable whether an empirical force field can achieve the required accuracy, especially when important effects such as polarization of the atoms by their environment are not taken into account by the electrostatic potential. The last limitation is that a classical description of the particles is used. This prohibits the study of quantum-mechanical phenomena such as electron transfer or bond breaking/formation. It should be noted that mixed QM/MM methods exist that can treat this type of phenomena but, due to the computational cost of including a quantum treatment for part of the system, the simulation times are restricted to hundreds of picoseconds. Such simulations are restricted to essentially quantum-mechanical processes such as, for example, electron transfer.

Before finishing this section on MD, a few other theoretical methods that can be used to study proteins should be mentioned. The Monte Carlo (MC) method [6, 7] was historically used before MD. Monte Carlo procedures also involve the evaluation of a potential energy but differ in that an ensemble of conformations is generated by performing random displacements of the atomic positions from one conformation to the other, accepting or rejecting these based on the Metropolis criterion. Its main advantage is that it allows crossing (hopping over might be a more accurate image) of high-energy barriers provided that they are narrow. The method is also very efficient in sampling low or medium density systems but not dense systems such as proteins in solution. The main disadvantage with respect to MD is that the dynamics of the system is lost and no insight can be gained, for instance, on folding pathways. Homology modeling can reliably predict the fold of a protein if its sequence is close enough (>25% identity) to the sequence of a protein with a known structure. However, even when the method is successful in predicting the correct fold, it still does not give any information on the nature of the interactions or on the pathways or dynamics leading to the folding of the protein, and therefore does not provide any insights on the physics involved in the folding process.
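The Metropolis acceptance step mentioned in the Monte Carlo description can be illustrated with a short Python sketch. It assumes a toy system (one coordinate in a harmonic well, reduced units with kT = 1) chosen only because the exact answer is known; the function names, step size and number of steps are hypothetical and unrelated to any simulation in this thesis.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis criterion: always accept downhill moves, accept uphill
    moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def sample_harmonic(n_steps=200000, kT=1.0, k=1.0, max_step=1.0, seed=0):
    """Sample x from exp(-0.5*k*x**2 / kT) with random displacements.

    Returns the ensemble average <x^2>, which should approach kT / k.
    """
    rng = random.Random(seed)
    x = 0.0
    energy = 0.5 * k * x * x
    sum_x2 = 0.0
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_step, max_step)  # random displacement
        trial_energy = 0.5 * k * trial * trial
        if metropolis_accept(trial_energy - energy, kT, rng):
            x, energy = trial, trial_energy
        # A rejected move counts the old configuration again.
        sum_x2 += x * x
    return sum_x2 / n_steps


if __name__ == "__main__":
    print(f"<x^2> from MC = {sample_harmonic():.3f} (exact value: 1.0)")
```

Note that the sequence of configurations carries no physical time, which is the loss of dynamical information referred to above: only ensemble averages such as the mean square displacement are meaningful.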


1.5 Free energy

The free energy is a thermodynamic function that determines the equilibrium of a system. It is related to many if not all the physical properties a chemist or a biochemist might find of interest, such as binding constants or conformational preferences. The free energy is in a way the key to the folding problem, as in most cases it is believed that the native state of a protein corresponds to its lowest free energy state. A very popular view of the general folding mechanism is that there is an overall bias in the free energy towards the native state, which is represented by a funnel in the free energy landscape when plotting the configurational entropy and the configurational energy as a function of a progress variable (for example the number of native contacts) [8, 9, 10]. This would explain why a protein would only visit a fraction of the conformational space before folding into its native state, solving Levinthal’s paradox [11].

However, it has been found using a simple model that a small penalty term applied to locally incorrect bond configurations can dramatically reduce the conformational space actually accessible to proteins [12]. In this model, the protein was described by the states of N bonds connecting N+1 amino acids, each bond being characterized by only two states, correct (c) or incorrect (i). Changes in the system were made through the conversion of a bond from c to i with a rate k0 or from i to c with a different rate k1. Applying a small energy penalty for making an incorrect bond, it was found that the lowest (free) energy conformation (when all the bonds are correct) could be obtained within a biologically relevant time scale. Using another simple model, it was also shown that the free energy landscape does not necessarily need to have a funnel-like shape or other properties that have been proposed by some to be relevant [13], such as a large energy gap between the native and the lowest non-native structure.

The free energy is usually expressed as the Helmholtz free energy, F, for an isothermal-isochoric system (the corresponding ensemble is referred to as the canonical ensemble) or as the Gibbs free energy, G, for an isothermal-isobaric system.

Using statistical mechanics, the Helmholtz free energy can be expressed in terms of the canonical partition function Z:

$$F = -k_B T \ln Z \tag{1.9}$$

where Z is defined as

$$Z = \frac{1}{h^{3N} N!} \iint e^{-H(\mathbf{p},\mathbf{r})/k_B T}\, d\mathbf{p}\, d\mathbf{r} \tag{1.10}$$

for a system of N indistinguishable particles. The 3N-dimensional vectors r and p respectively correspond to the coordinates and conjugate momenta of all the particles of the system. Each pair (r, p) represents one point in the phase space of the system defined by all possible values of r and p. It can be seen from the definition of the partition function Z that the absolute free energy can usually not be calculated, as it requires the sampling of the complete phase space of the system. What can be determined is the free energy difference between two states of a system.

The relative free energy between two states A and B of a system is given by:

$$\Delta F_{BA} = F(B) - F(A) = -k_B T \ln \frac{Z_B}{Z_A} \tag{1.11}$$

which is determined by the relative probability of finding the system in one state with respect to the other. Calculating the free energy in this way, by directly counting how often each state is visited, can be extremely inefficient depending on the type of process studied. In the case of the binding of two molecules, many association/dissociation events must be sampled in order to obtain reliable statistics on the process. Unfortunately, for strongly interacting systems, the rate of dissociation can be too slow to be simulated. This will be discussed in detail in relation to the dimerization of the EPO mimetic peptide 1 studied in the next chapter. However, the method can be successfully used to study the folding-unfolding thermodynamics of small peptides in rapid equilibrium, for which conformational preferences can be calculated [14].
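As a small worked illustration of Eq. 1.11, the ratio of partition functions can be estimated from the number of trajectory frames assigned to each state, since the ratio of the populations equals the ratio Z_B/Z_A. The Python sketch below assumes that such counts are already available and that the states are visited reversibly many times; the function name, the units (kJ/mol) and the example counts are hypothetical.

```python
import math

# Molar Boltzmann constant (gas constant) in kJ mol^-1 K^-1.
K_B = 0.0083144626


def free_energy_from_counts(n_a, n_b, temperature):
    """Estimate F(B) - F(A) = -kB*T*ln(P_B / P_A) from state populations.

    n_a, n_b    : number of sampled configurations assigned to states A and B
    temperature : temperature in kelvin
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both states must be sampled at least once")
    return -K_B * temperature * math.log(n_b / n_a)


if __name__ == "__main__":
    # Hypothetical example: 9000 frames in state A, 1000 frames in state B.
    delta_f = free_energy_from_counts(9000, 1000, 300.0)
    print(f"F(B) - F(A) = {delta_f:.2f} kJ/mol")
```

The estimate is only meaningful if many transitions between A and B have been observed. With a single transition the counts mostly reflect the length of the simulation rather than the equilibrium populations, which is the inefficiency pointed out above.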

The free energy difference between two states A and B of a system can also be calculated as the work done on the system to force the transition from one state to the other. Standard methods to calculate free energy are the Thermodynamic Integration (TI) method [15] and the free energy perturbation (FEP) method [16]. Both make use of the so-called coupling parameter approach where the state of the system is coupled to a parameter λ. More precisely, the Hamiltonian is defined as a function of this coupling parameter λ which connects both the initial and end states such that H(λA) = HA corresponds to state A and H(λB) = HB to state B.

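If the Hamiltonian is made a function of λ, the free energy also becomes a function of λ. In this case, the relative free energy between the two states A and B can be expressed as:

$$\Delta F_{BA} = F(\lambda_B) - F(\lambda_A) = \int_{\lambda_A}^{\lambda_B} \frac{\partial F(\lambda)}{\partial \lambda}\, d\lambda \tag{1.12}$$

$$\Delta F_{BA} = \int_{\lambda_A}^{\lambda_B} \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_{\lambda} d\lambda \tag{1.13}$$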

where ⟨...⟩λ represents an average over the ensemble at the corresponding λ value. Formula 1.13 is referred to as the thermodynamic integration formula [15]. TI calculations can be performed according to two different schemes. The integration can be performed continuously while slowly changing the coupling parameter λ from λA to λB during the course of the simulation (slow growth method). This scheme is usually not used as the system lags behind the changing Hamiltonian and never equilibrates appropriately [17]. The other scheme is to perform simulations at certain λ points and to evaluate the integral numerically. This way the convergence of the simulations can be checked independently at each λ point and extra λ points can be added if needed. This method is used in chapter 3 to evaluate the relative affinity of several ligands to two different serine proteases and in chapter 4 to evaluate the relative stability of a swapped dimer (SUC1) upon mutation. The method is very demanding as equilibrium simulations must be performed at intermediate states during which a representative ensemble must be sampled [18]. It should be noted that a complete sampling of the conformational space is not needed, even though results are still directly related to the extent of the phase space sampled.
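As an illustration of the second scheme, the sketch below integrates a set of ensemble averages ⟨∂H/∂λ⟩ over discrete λ points. The trapezoidal rule, the function name `ti_free_energy` and the numerical values are assumptions made for this example; the text does not specify which quadrature was used.

```python
def ti_free_energy(lambdas, dhdl_means):
    """Trapezoidal estimate of the integral of <dH/dlambda> over lambda (eq. 1.13)."""
    if len(lambdas) != len(dhdl_means) or len(lambdas) < 2:
        raise ValueError("need at least two matching lambda points")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        # Area of one trapezoid between two neighbouring lambda points.
        total += 0.5 * width * (dhdl_means[i] + dhdl_means[i + 1])
    return total


# Hypothetical ensemble averages in kJ/mol (illustrative numbers, not thesis data).
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
dhdl_means = [10.0, 6.0, 2.0, -2.0, -6.0]
delta_f = ti_free_energy(lambdas, dhdl_means)
print(f"Delta F = {delta_f:.3f} kJ/mol")
```

Because the points need not be evenly spaced, an extra λ point can be inserted wherever the integrand changes rapidly without altering the rest of the calculation.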

Combining equation 1.11 with the coupling parameter approach leads to:

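\begin{align}
\Delta F_{BA} = F(\lambda_B) - F(\lambda_A) &= -k_B T \ln \frac{Z(\lambda_B)}{Z(\lambda_A)} \tag{1.14} \\
&= -k_B T \ln \left\langle e^{-[H(\lambda_B) - H(\lambda_A)]/k_B T} \right\rangle_{\lambda_A} \tag{1.15}
\end{align}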

which has the form of an ensemble average over state A. Formula 1.15 is known as the (free energy) perturbation formula [16]. Although the method is formally exact (there is no assumption made as to the size of the perturbation and, as opposed to perturbation theories in statistical mechanics, there is no truncated expansion), it theoretically requires the sampling of the complete phase space of the reference state and therefore poses the same problem as the calculation of absolute free energy. In practice, however, convergence can be obtained if low energy conformations can be sampled both for the reference and the perturbed state, which means that conformations sampled in the reference state A also have a high probability in the perturbed state B. In order to have significant overlap of the low energy regions of both ensembles, the perturbation must be small. For this reason the change between A and B is usually expressed as a sum over a series of small changes from λ to λ + ∆λ:

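\begin{equation}
\Delta F_{BA} = -k_B T \sum_{\lambda=\lambda_A}^{\lambda_B} \ln \left\langle e^{-(H_{\lambda+\Delta\lambda} - H_{\lambda})/k_B T} \right\rangle_{\lambda} \tag{1.16}
\end{equation}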

which is usually referred to as the multi-window free energy perturbation method. Alternatively, in an effort to increase the efficiency of the method (make it less computationally demanding), another approach was derived where the sampling of the reference state was biased by the use of a soft core interaction site at positions where atoms were to be created or deleted [19]. This results in the extension of the phase space sampled in the reference state to relevant parts of the configuration space accessible to the system in the perturbed state. The resulting increase in the conformational overlap between the two states leads to the convergence of the ensemble average. Using this soft core potential, accurate estimates of the relative free energy can be obtained from the single ensemble of a reference state (single step perturbation) [19, 20, 21, 22].
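The multi-window perturbation estimate of equation 1.16 can be sketched as follows. The function names, the choice of kJ/mol as the energy unit and the sample values are assumptions made for illustration. Each window is assumed to supply a list of energy differences H(λ+∆λ) − H(λ) evaluated on configurations sampled at λ.

```python
import math

KB = 0.008314462618  # Boltzmann constant in kJ/(mol K)


def fep_window(delta_h, temperature):
    """Return -kBT ln <exp(-dH/kBT)> for one window of sampled energy differences."""
    kt = KB * temperature
    # Shift by the smallest difference so the exponentials cannot overflow.
    shift = min(delta_h)
    avg = sum(math.exp(-(d - shift) / kt) for d in delta_h) / len(delta_h)
    return shift - kt * math.log(avg)


def fep_total(windows, temperature):
    """Sum the per-window estimates, as in the multi-window formula."""
    return sum(fep_window(w, temperature) for w in windows)


# Hypothetical energy differences in kJ/mol for two windows (not thesis data).
windows = [[1.0, 1.2, 0.9, 1.1], [2.1, 1.8, 2.0, 2.2]]
print(f"Delta F = {fep_total(windows, 300.0):.3f} kJ/mol")
```

The exponential average is dominated by the lowest energy differences in each window, which is one way of seeing why poor overlap between neighbouring states makes the estimate converge slowly.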

1.6 Outline

Discovering the structure of a protein is usually an important step towards the understanding of its function and of the mechanism by which the function is carried out. However, as stated previously, knowing the structure of a protein does not give any insight into the way the protein folds. Further, its interactions with other proteins in order to carry out its function can only be extrapolated unless the structure of the different proteins in complex together is also available. An alternative is to gain a better understanding of the interactions between proteins and between proteins and ligands in order to be able to predict how proteins or some of their elements associate with one another. This thesis discusses how these interactions lead to the formation of secondary structure elements. It also addresses the prediction of the relative affinity between proteins and between proteins and ligands using free energy calculations.

The ultimate goal when studying proteins is of course the resolution of the protein folding problem. Due to the complexity of the problem, however, this has not been done yet and probably no single thesis would be able to bring a global solution to this problem. Instead, the subject of this thesis is the study of proteins and protein-protein interactions. A particular focus in the different chapters is put on the time scale needed to obtain statistically reliable information and properties.

Chapter 2. The EPO mimetic peptide 1 is used as a model system to help understand how β-sheets can rearrange by observing how the dimer can switch between different dimeric states. The interactions are the same as those involved in protein folding and the study provides insight into how secondary structure elements can find and recognize each other in the course of the folding of a protein. The study also provides insight into the time scale on which these processes happen.

Chapter 3. The importance of convergence in the case of free energy calculations using TI is assessed. It is shown that the simulation of specific intermediate points can require up to 20 ns to reach convergence in the case of simple ligands mutated in a water environment. The mutations performed on these ligands involve the simultaneous creation/deletion of many sites and illustrate both the power and limitations of these types of calculations.

Chapter 4. Reliability and applicability of TI free energy calculations when applied to protein-protein interactions are evaluated by determining the relative stability of the dimer of the Suc1 protein upon mutation. The mutations are performed on a swapped dimer and on the corresponding monomer in order to evaluate their effect on the relative stability of the dimers. Comparison with experiment gives insight into the current state of the method and the progress that still remains to be accomplished before it can be used in a practical manner to predict protein self-assembly.

Chapter 5. Factors that affect the convergence properties of free energy calculations are investigated. The sources of errors related to the convergence of the calculations (the sampling error and the statistical error) are evaluated using three mutations of the Suc1 protein. The ability to reliably determine the error associated with the calculations is critical to the meaning and applicability of the calculations; hence the two most popular methods used to calculate the statistical error are reviewed and their reliability assessed.

Chapter 6. Conclusion and outlook of this thesis.


Chapter 2

Self association of the EPO mimetic peptide 1

Proteins are known to fold on a time scale ranging from ms to seconds or even hours. Before the whole process can be accomplished, secondary structure elements of the protein have to recognize and form patterns which correspond to the most stable structure. However, it is not known how long it takes for these simple elements to self organize and achieve their lowest free energy conformation. In this chapter, the self association of the EPO mimetic peptide 1 is studied in detail. This system can be viewed as a model system to help understand the time scale on which β-sheets can recognize each other and rearrange to find the optimal pairing.


2.1 Introduction

Protein-protein interactions are at the center of many important problems in structural biology including protein folding, ligand binding and protein self-association. In fact protein-protein interactions are fundamental to overall cellular regulation by mediating cellular signaling pathways. The simplest form of protein-protein interaction, the association of a protein with itself to form a functional dimer or higher aggregate, has evolved as a mechanism of control in many systems. Self association can also lead to changes in the conformational stability or in the structural properties of proteins. These can in turn lead to the loss of functionality or even toxicity due to the accumulation of aggregates in the cell [23, 24]. The interactions that drive protein self-association are the same as those that drive protein folding and domain assembly. Self-association can therefore be used as a simple model to understand how partly folded proteins evolve to give the final native state.

In this work we investigate the self-association of the erythropoietin (EPO) mimetic peptide 1 (EMP1) as a model of protein-protein interactions in general. EMP1 is the most potent member of a series of cyclic peptides which were designed to mimic the effect of erythropoietin [25]. EMP1 shows comparable activity to the native hormone although it is completely unrelated in sequence. EPO is a growth hormone that stimulates the production of erythrocytes under certain stress conditions. When the oxygen level is low, EPO is produced and circulates in the blood stream where it targets EPO receptors, leading to their oligomerization which in turn leads to their activation. A dimer of EMP1 induces the dimerization of the receptors and thus can activate the EPO receptors (EMP1 competes with EPO in binding assays). EMP1 was also found to aggregate very strongly in solution. Its small size and the availability of experimental data (crystallographic structure [26], dissociation constant [27]) make it a good model system with which to study self-association and β-sheet formation. EMP1 is a 20 amino acid peptide (GGTYSCHFGPLTWVCKPQGG). It contains a β-turn consisting of the residues Gly9 - Pro10 - Leu11 - Thr12. A disulfide bridge between Cys6 and Cys15 stabilizes the two β-strands. The crystallographic structure of the EMP1 dimer forming a 2:2 complex with the binding domain of the EPO receptor (EBP) is illustrated in Fig. 2.1. The structure shows the almost perfect twofold symmetry of the receptor as well as the specific binding mode of the peptide dimer [26]. The crystallographic structure (pdb entry 1EBP) also shows the existence of a 4 stranded β-sheet characterized by a complex network of intra-molecular and intermolecular hydrogen bonds (see Fig. 2.1.b). A hydrophobic core is formed between the two monomers that involves the disulfide bridges and residues Tyr4, Phe8 and Trp13 (Fig. 2.1.c).

In order to study the self-association of EMP1 more closely, a series of Molecular Dynamics (MD) simulations have been performed. The aim of the work was to investigate the process of aggregation and to understand the forces driving assembly. The dimerization of the peptide was also extensively studied by a very thorough search of the conformational space to gain insight into the β-sheet formation and the influence of the temperature on the dimer and its binding modes. The simulations have allowed us to gather detailed information on the possible conformations of the EMP1 dimer over 2.4 µs, and to obtain insight into the most probable conformations accessible to the dimer and their relative stability.

Figure 2.1: (a) Cartoon representation of the crystallographic structure of the EBP-EMP1 complex [26] (pdb entry 1EBP). The two receptor molecules are shown in silver and surround the EMP1 dimer, shown in dark grey. (b) An all atom representation of the peptide dimer of EMP1 as found in the EBP-EMP1 complex. The backbone is drawn in thicker lines in order to highlight the formation of a four-stranded β-sheet. (c) The same as (b) but highlighting the hydrophobic core of the dimer formed by Trp13 (at the interface), Phe8, Cys6 and Cys15.

2.2 Methods

All simulations and analysis were performed using the Gromacs software package [28, 29] (version 2.1 for the aggregation and dimerization simulations, version 3.1 for the search of the dimer conformational space and the influence of the temperature). The Gromos96 force field [30] was used. In this force field the non polar hydrogen atoms are treated as united atoms together with the carbon to which they are attached. The peptides were solvated using the Simple Point Charge (SPC) water model [31].

2.2.1 Simulation parameters

Simulations were performed using periodic boundary conditions. The solvated peptides were energy minimized using a steepest descent algorithm and further relaxation of the system was obtained by equilibrating it for a period of time depending on its size (see Table 2.1 for details). The initial velocities used for equilibrating the system were taken from a Maxwell distribution at 300 K. The temperature was kept constant using weak coupling to a temperature bath [32] at 300 K and with a coupling constant of 0.1 ps. Protein and solvent were coupled independently to the temperature bath. The pressure was also controlled by weak coupling to a bath [32] of constant pressure (P0 = 1 bar, coupling time τ = 1 ps). All covalent bonds were constrained using the LINCS algorithm [33], which allows the use of a time step of 2 fs. The relative dielectric constant was set to 1 for simulations performed without a reaction field and to 78 for simulations performed with a reaction field. In each case, a twin range cut-off was used for the calculation of the non bonded interactions. The short-range cut-off radius was set to 0.8 nm for the simulations with a plain cut-off and to 0.9 nm for the simulations using a reaction field. The long-range cut-off was set to 1.4 nm for both electrostatic and van der Waals interactions. The cut-off values used in the simulations with no reaction field are the same as those used for the Gromos96 force field parameterization. Interactions within the short-range cut-off were updated every time step while interactions within the long-range cut-off were updated every five time steps together with the pair list.
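To give a feeling for what weak coupling with a 0.1 ps coupling constant and a 2 fs time step means in practice, the sketch below evaluates the velocity scaling factor of the weak-coupling (Berendsen) scheme, λ = [1 + (∆t/τ)(T0/T − 1)]^(1/2). The function name and the example temperatures are assumptions made for illustration; details of how a particular simulation package evaluates the instantaneous temperature are not reproduced here.

```python
import math


def berendsen_scale(t_current, t_ref=300.0, dt=0.002, tau=0.1):
    """Velocity scaling factor for weak coupling to a bath at t_ref (times in ps)."""
    return math.sqrt(1.0 + (dt / tau) * (t_ref / t_current - 1.0))


# Kinetic temperature scales with the square of the velocities, so one step
# moves the temperature a fraction dt/tau of the way towards the bath value.
t_now = 310.0  # hypothetical instantaneous temperature in K
scale = berendsen_scale(t_now)
t_next = t_now * scale ** 2
print(f"scale = {scale:.6f}, temperature after one step = {t_next:.2f} K")
```

With these parameters each step removes 2 % of the deviation from the reference temperature, which is why the coupling is described as weak.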


Table 2.1: A summary of the simulations performed to analyse the self association of the erythropoietin mimetic peptide 1 (EMP1). Listed are the simulation identifiers, the number of copies of the peptide, the number of water molecules, the temperature, the equilibration time and the total time the system was simulated. The R.F. column indicates whether a reaction field was used in the simulations in order to correct for the truncation of long range electrostatic interactions.

Simulation        Peptides  Water mol.  T (K)  Equil. (ns)  Length (ns)  R.F.

AggregationC1     12        14831       300    0.1          40           no
AggregationC2     4         17832       300    0.2          50           no
Pair1a, Pair1b    2         4629        300    0.15         100          no
Pair2a, Pair2b    2         4645        300    0.15         100          no
Pair3a, Pair3b    2         4838        300    0.15         100          no

Pair1c1           2         4049        300    0.2          100          yes
Pair1c2           2         4426        300    0.2          100          yes
Pair1c3           2         4451        300    0.2          100          yes
Pair1c4           2         4375        300    0.2          100          yes
Pair1c5           2         4058        300    0.2          100          yes
Pair1c6           2         4344        300    0.2          100          yes
Pair3c1           2         3984        300    0.2          100          yes
Pair3c2           2         3997        300    0.2          100          yes
Pair3c3           2         3995        300    0.2          100          yes
Pair3c4           2         3992        300    0.2          100          yes
Pair3c5           2         3982        300    0.2          100          yes
Pair3c6           2         3990        300    0.2          100          yes

Pair1 - T350      2         4049        350    0.2          100          yes
Pair1 - T400      2         4049        400    0.2          100          yes
Pair1 - T450      2         4049        450    0.2          100          yes
Pair3 - T350      2         4006        350    0.2          100          yes
Pair3 - T400      2         4006        400    0.2          100          yes
Pair3 - T450      2         4006        450    0.2          100          yes


Table 2.2: A summary of the parameters used in the energy minimizations (EM) and MD simulations.

MD simulations                               EM simulations

dt (ps)                  0.002               emtol     100
Tcoupling                yes/Berendsen       emsteps   0.01
tau_t (ps)               0.1
ref_t (K)                300
Pcoupling                isotropic/Berendsen
tau_p (ps)               1.0
compressibility (bar⁻¹)  4.6 × 10⁻⁵
ref_p (bar)              1.0

2.2.2 Aggregation simulations

The starting conformation of the peptide was taken from the crystallographic structure of the EMP1 peptide dimer complexed with the two EPO binding domains of the receptor (pdb entry 1EBP [26]). The Gly residues (four residues in total, two at the beginning and two at the end of the peptide) were disordered in the crystal and had to be added (using the spdbv [34] program). The structure of the first peptide of the dimer in the x-ray structure was chosen as the building block for the aggregation simulations. The peptide was solvated, energy minimized and equilibrated. Twelve copies of the equilibrated peptide were placed in a box of 470 nm³ (a rectangular box with vector lengths a = 6.3 nm, b = 6.9 nm and c = 10.9 nm). After solvation of the peptides the system contained 14831 water molecules, corresponding to a concentration of approximately 45 mM. Another system was also simulated at lower concentration (approximately 12 mM). The simulation box consisted in this case of four peptides surrounded by 17832 water molecules. Details of the simulation parameters can be found in Tables 2.1 and 2.2.
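The relation between the number of peptides, the box dimensions and the molar concentration can be checked with a few lines of arithmetic. The sketch below is a simple estimate under the assumption that the initial box vectors quoted above define the volume; it ignores any change in box size during pressure-coupled equilibration, and the function name is chosen for this example only.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
NM3_TO_LITRE = 1.0e-24    # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def concentration_mM(n_molecules, a_nm, b_nm, c_nm):
    """Molar concentration (in mM) of n_molecules in a rectangular box."""
    volume_litre = a_nm * b_nm * c_nm * NM3_TO_LITRE
    return n_molecules / (AVOGADRO * volume_litre) * 1000.0


# Twelve peptides in the 6.3 x 6.9 x 10.9 nm box described in the text.
c_high = concentration_mM(12, 6.3, 6.9, 10.9)
print(f"Estimated concentration: {c_high:.1f} mM")
```

For the twelve-peptide box this gives a value of about 42 mM, of the same order as the approximately 45 mM quoted for the simulation.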

2.2.3 Dimerization simulations

Six 100 ns MD simulations of EMP1 dimers were performed in water in order to assess whether the peptide would converge to a single well defined dimeric state within the accessible time scale. The 6 simulations consisted of three different initial conformations, each simulated using two different sets of velocities. Two of the initial structures (pair1 and pair2) corresponded to pairs formed during the spontaneous aggregation of EMP1 (see Fig. 2.2). The third conformation (pair3) was taken directly from the crystallographic structure. The concentration of the peptide was around 23-24 mM for this set of simulations. See Tables 2.1 and 2.2 for details of the simulation parameters.


Figure 2.2: The three starting conformations used for studying the dimerization of EMP1 (backbone representation): (a) Pair1, (b) Pair2, (c) Pair3. (a) and (b) correspond to pairs formed during the spontaneous aggregation of EMP1 performed at C = 45 mM. (c) was taken from the crystallographic structure.

2.2.4 Conformational space search

In order to investigate the conformational space accessible to the peptide dimer, a series of starting structures were generated based on structures obtained during the simulations on pair1 (conformation of the dimer after 40 ns in Pair1a) and on pair3 (the equilibrated structure). For each of these conformations, six additional starting conformations were systematically derived by rotating one peptide at a time by 90 degrees with respect to the previous conformation. The rotation was performed about an internal axis defined by two points located at the midpoint between residues 9 and 12 and residues 5 and 16 respectively. The peptide was rotated by 90, 180 and 270 degrees. Then the same rotations were applied to the other peptide of the dimer starting from the conformation where the previous peptide was rotated by 90 degrees. The rotations of the peptides about their internal axes were performed using the program VMD [35]. Simulations were performed at concentrations between 25 and 28 mM. See Tables 2.1 and 2.2 for details of the simulation parameters.
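The geometric operation described above, a rotation of one peptide about an internal axis through two midpoints, can be sketched with Rodrigues' rotation formula. This is an illustration only and not the VMD procedure used in the thesis; the function names and the toy coordinates are hypothetical.

```python
import math


def midpoint(p, q):
    """Midpoint of two 3D points, e.g. between two residue positions."""
    return tuple((a + b) / 2.0 for a, b in zip(p, q))


def rotate_about_axis(points, axis_start, axis_end, angle_deg):
    """Rotate 3D points by angle_deg about the line from axis_start to axis_end."""
    k = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in k))
    if norm == 0.0:
        raise ValueError("axis points must be distinct")
    k = [c / norm for c in k]
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for p in points:
        v = [a - s for a, s in zip(p, axis_start)]  # position relative to the axis
        dot = sum(a * b for a, b in zip(k, v))
        cross = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
        # Rodrigues: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
        rotated.append(tuple(
            axis_start[i] + v[i] * cos_t + cross[i] * sin_t + k[i] * dot * (1.0 - cos_t)
            for i in range(3)))
    return rotated


# Toy example: axis along z through the origin, one atom on the x axis.
axis_a = midpoint((-1.0, 0.0, 0.0), (1.0, 0.0, 0.0))   # stands in for residues 9 and 12
axis_b = midpoint((-1.0, 0.0, 2.0), (1.0, 0.0, 2.0))   # stands in for residues 5 and 16
quarter_turn = rotate_about_axis([(1.0, 0.0, 0.0)], axis_a, axis_b, 90.0)
print(quarter_turn)
```

Applying the 90 degree rotation repeatedly generates the 180 and 270 degree orientations, and a fourth application returns the starting coordinates.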

2.2.5 Effect of the temperature

To investigate the effect of temperature on the sampling of conformational space, simulations were performed at three additional temperatures (T = 350, 400, 450 K) for two cases. The initial structures for these calculations were the dimer of Pair1a after 40 ns and pair3 after equilibration. These are the same two structures that had been used previously for the conformational space search (section 2.2.4). The system was equilibrated using an initial set of velocities taken from a Maxwell distribution at the corresponding temperature. All the simulations were 100 ns in length.

2.2.6 Analysis

Evolution of the number of aggregates: To determine the state of aggregation, the distance between the centers of mass (COM) of each peptide was calculated for all possible pairs. When the distance between any pair of peptides was less than 1.6 nm the pair was considered to have aggregated. A peptide was considered to be part of an aggregate so long as any member of the aggregate was within 1.6 nm of the peptide in question.
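The criterion above amounts to finding the connected groups of peptides in which neighbours are linked whenever their COM distance is below the cut-off. A minimal sketch follows, assuming that COM coordinates are already available in nm and, as a simplification, that periodic images can be ignored; the function name and the union-find bookkeeping are choices made for this example.

```python
import math


def count_aggregates(coms, cutoff=1.6):
    """Count groups of peptides linked by COM distances below cutoff (nm).

    An isolated peptide counts as a group of one. Periodic boundary
    conditions are NOT handled here (simplifying assumption).
    """
    n = len(coms)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coms[i], coms[j]) < cutoff:
                parent[find(i)] = find(j)  # merge the two groups
    return len({find(i) for i in range(n)})


# Hypothetical COM positions: a chain of three peptides and one far away.
coms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(count_aggregates(coms))
```

In the example the first and third peptides are 2.0 nm apart yet belong to the same group because both are within 1.6 nm of the second, which mirrors the "any member of the aggregate" rule.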

Secondary structure assignment: The assignment of secondary structure was performed using the DSSP algorithm. DSSP assigns secondary structure based on hydrogen-bonding and geometrical criteria [4].

Cluster analysis: A cluster analysis was performed using the method described in [14]. For each pair of structures taken from the trajectories a least-squares translational and rotational fit on the backbone atoms (N, Cα, C) of residues 3 to 18 was performed and the atom-positional root-mean-square difference (RMSD) for this set of atoms calculated. Using as a similarity criterion an RMSD ≤ 0.15 nm for the monomer and an RMSD ≤ 0.30 nm for the dimer, the number of neighbors (structures satisfying the similarity criterion) was determined for each structure. The structure with the largest number of neighbors was considered together with its neighbors as a cluster. These structures were removed from the pool of structures. The procedure was repeated for the remaining structures until there were no structures left in the pool. The analysis was performed using structures every 100 ps in the trajectories of 100 ns and every 1 ns in the concatenated trajectory of 2.4 µs.
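The clustering procedure described above can be sketched as follows, assuming that the pairwise RMSD matrix has already been computed after fitting. The function name, the tie-breaking rule (lowest index wins) and the toy matrix are assumptions made for this illustration.

```python
def cluster_by_neighbours(rmsd, cutoff):
    """Greedy neighbour-count clustering on a symmetric RMSD matrix.

    Returns a list of (central_structure, members) tuples in the order
    the clusters were extracted.
    """
    pool = set(range(len(rmsd)))
    clusters = []
    while pool:
        # Neighbours are counted only among structures still in the pool.
        neighbours = {i: [j for j in pool if rmsd[i][j] <= cutoff] for i in pool}
        centre = max(sorted(pool), key=lambda i: len(neighbours[i]))
        members = sorted(neighbours[centre])
        clusters.append((centre, members))
        pool -= set(members)  # remove the cluster and repeat
    return clusters


# Toy data: five "structures" placed on a line, RMSD taken as their separation.
positions = [0.0, 0.1, 0.2, 1.0, 1.1]
rmsd = [[abs(a - b) for b in positions] for a in positions]
print(cluster_by_neighbours(rmsd, cutoff=0.15))
```

Each structure is its own neighbour because the diagonal of the matrix is zero, so the loop always removes at least one structure and terminates.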

2.3 Results/Discussion

2.3.1 Aggregation of the EPO mimetic peptide 1

Two simulations were performed at different concentrations (C = 45 mM for the simulation labeled AggregationC1 and C = 12 mM for the one labeled AggregationC2) in order to study the aggregation phenomena. Experimental evidence suggests that the peptide aggregates at low concentration (Kd ≈ 20 µM determined by analytical ultracentrifugation [27]). This makes it extremely difficult to get information on the conformation of EMP1 in solution using techniques such as NMR. Both simulations show a rapid aggregation (see Fig. 2.3) of all the peptides. This was expected as the simulations are performed at much higher concentrations than the dissociation constant. After 10 ns there is only one group remaining in the simulation AggregationC2 (lower concentration). In the simulation at 45 mM the system appears to fluctuate between one and two groups even after 10 ns. This can be explained by the method used to calculate the number of groups. A peptide is considered to be a part of a cluster when its distance to the center of mass of any member of the cluster is less than a given cut-off radius. For peptides at the surface of the cluster this criterion may not be fulfilled in a specific configuration even if the peptide never truly separates from the cluster. The probability of this happening is of course greater the larger the cluster. In fact it appears that once the peptides are associated together into a large aggregate, they never separate on the time scale of the simulation. The aggregates of peptides formed during the simulations do not show any higher form of organization such as the assembly into an extended β-sheet (see Fig. 2.4).

Figure 2.3: Aggregation of EMP1. The graphs show the evolution of the number of aggregates as a function of time (ns) at (a) C = 45 mM (12 monomers) and (b) C = 12 mM (4 monomers). Two peptides were considered to be in the same cluster if the distance between their respective centers of mass was < 1.6 nm.

The aggregation of the peptides shows some features expected in the folding of proteins. For example, Fig. 2.5 shows the evolution of the solvent accessible surface area of the hydrophobic and hydrophilic residues in both simulations as a function of time. As can be seen, the aggregation of the peptides is correlated with the burial of hydrophobic surface. The change in hydrophilic surface is much less than that of the hydrophobic surface, indicating some degree of orientation during the aggregation.

2.3.2 Dimerization of the EPO mimetic peptide 1

It is clear that EMP1 readily self-associates at high concentration to form non specific aggregates. The question is whether the formation of a specific dimer, which is assumed to be the biologically active form, can be simulated at lower concentration.


Figure 2.4: Structure of EPO mimetic peptide 1 evenly distributed at t = 0 ns (left) and after aggregation (right) at a concentration of 45 mM.

Figure 2.5: Variation of the solvent accessible surface area (nm²) as a function of time (ns) for both hydrophobic and hydrophilic residues at 2 different concentrations: (a) C = 45 mM (12 monomers) and (b) C = 12 mM (4 monomers).


It is not possible to simulate the self-association of the EMP1 peptide at a concentration (or peptide to water ratio) where the dimer is believed to be the dominant species (~10⁻⁶ M). This is both because of the size of the system that would be required and because of the time scale needed for two monomers to find each other by diffusion. For this reason, a series of simulations of isolated dimers were performed instead. The starting structures for these simulations are given in Fig. 2.2. They are characterized by different intermolecular interactions. Pair1 and pair2 initially do not contain any intermolecular β-sheets in their starting conformation. Pair3 in contrast contains an intermolecular β-sheet formed by interactions between the N-terminal (N-ter) parts of the monomers (residues 1-10).

The first characteristic that can be noticed is the stability of each of the dimers that have formed. This can be seen in Fig. 2.6 which shows the distance between the centers of mass of the two monomers as a function of time in each of the twelve simulations. In most of the simulations, the distance between the centers of mass (COM) decreases very quickly and after a few ns the value is less than 1 nm. In the simulations Pair2a and Pair2b pairing only occurs after 30 ns and 90 ns, respectively. Once the peptides come together, they do not separate at any time during the rest of the simulation with the exception of Pair2a. In this simulation the distance between COM increases from 0.8 nm to more than 1.7 nm after 70 ns before decreasing again after 90 ns to reach 0.9 nm at the end of the simulation. Even in this case, however, the peptides are always in close enough proximity for direct interaction between the peptides.

Fig. 2.7 shows the intermolecular elements of secondary structure present as a function of time for the simulation Pair1a. More precisely it shows the presence or absence of a β-sheet between the two peptides. The two EMP1 monomers can interact in four different orientations. These are N-terminal - N-terminal (N-ter - N-ter), C-terminal - C-terminal (C-ter - C-ter) and N-ter - C-ter or C-ter - N-ter. The latter two are in principle equivalent due to reasons of symmetry but are distinguishable in the simulation because the two monomers are treated as separate entities. What can be seen in Fig. 2.7 is that for Pair1a only C-ter - C-ter interactions are significant. In comparison no other combinations show extensive hydrogen bond formation as would be indicated by continuous red sections in Fig. 2.7. Fig. 2.8 gives the same information for all six simulations, but only combinations containing intermolecular elements of secondary structure are represented. For Pair1a and b, 2a and b and 3a one orientation is dominant. Pair3b in contrast shows a major transition between 30 - 40 ns. Fig. 2.9 shows a matrix of the root mean square deviation (RMSD) values of the dimer for the combined trajectories of the six simulations of the different pairs. The figure provides an overview of the overlap of the conformational space sampled by the different dimers during the simulation. The colour indicates the degree of similarity. Blue indicates a high degree of similarity and red a low degree of similarity. The absence of blue-green off diagonal elements clearly indicates that each simulation searches a different part of the conformational space available. The simulations Pair2a and Pair2b show some overlap at the beginning of the simulation. This is related to the fact that both simulations were started from the same structure.

Figure 2.6: Distance (nm) between centers of mass as a function of time (ns). (a,b,c) Simulations of pair1, pair2 and pair3 at 300 K for 2 sets of velocities. (d,e) Simulations of pair1 and pair3 at 3 different temperatures (T = 350, 400 and 450 K).

Figure 2.7: Intermolecular β-sheets of the simulation Pair1a as a function of time. The secondary structure elements are shown for all possible combinations of the two strands (N-ter and C-ter) of each peptide (N-ter - N-ter, N-ter - C-ter, C-ter - N-ter and C-ter - C-ter).

In the case of Pair2a, the COM of the two peptides are separated by only 0.8 nm for more than 60 ns and the RMSD indicates that the dimer adopts only one conformation during this period. This suggests that the dimer is at least in a meta-stable conformation. At the same time, Fig. 2.8 indicates that there is no intermolecular β-sheet and no hydrogen bond network between the monomers, implying that the interaction is primarily hydrophobic.

The fact that the distance between the COM remains constant or even decreases in the other simulations indicates strong intermolecular interactions. The decrease of the distance between the COM of the peptides of Pair1b from 1.0 nm to 0.8 nm after 50 ns corresponds to the formation of an intermolecular β-sheet. Similar behavior is observed for Pair2b where the distance between the COM decreases from 1.2 nm to 0.8 nm and an intermolecular β-sheet forms after 90 ns.

The matrix of the RMSD values of the dimer (Fig. 2.9) shows that each simulation samples a different region of the conformational space. In the equivalent plot for the individual monomers (Fig. 2.10), many (white) off diagonal elements are evident, indicating a high degree of overlap of the space sampled by the monomers in the different simulations. Only a very limited set of conformations is accessible to the monomer. The rigidity of the monomer can be explained by the presence of the intra-molecular disulfide bridge between Cys6 and Cys15. From the simulations, it appears that 600 ns (or less) are enough to completely sample the conformational space of the monomer. Given the rigidity of the monomer, it is clear that the different dimers arise from the different relative orientations of the monomers with respect to each other. However, even for this relatively simple system, the complete sampling of the conformational space for the dimer is not possible within 600 ns.

Figure 2.8: Intermolecular β-sheets as a function of time for the 6 simulations. Only combinations that show some formation of intermolecular hydrogen bonds are shown (Pair1a: C-ter - C-ter; Pair1b: C-ter - N-ter; Pair2a: C-ter - N-ter; Pair2b: C-ter - N-ter; Pair3a: N-ter - N-ter; Pair3b: N-ter - N-ter and C-ter - C-ter).

From the RMSD of the dimer in the simulation Pair1b (Fig. 2.9 between 110 and 150 ns) we can observe that the dimer can remain in the same conformation for 40 ns without having a well defined secondary structure (Fig. 2.8). In contrast, the simulation Pair3b shows that the dimer can still undergo major conformational change even after forming a well defined secondary structure. The dimer in Pair3b adopts a conformation containing an intermolecular N-ter - N-ter β-sheet for the first 30 ns. This structure is then lost and only one hydrogen bond from the β-sheet is retained for a few nanoseconds. The dimer then forms a hydrogen bond between the C-ter - C-ter chains before finally forming an intermolecular C-ter - C-ter β-sheet which is stable for the rest of the simulation.

2.3.3 Conformational space search, binding modes and main clusters of conformations

From the study of the spontaneous dimerization of the peptides it is clear that:

1. The peptides are very sticky. Non specific interactions between the monomers prevent the efficient sampling of the conformational space on the time scale accessible.

2. The monomers themselves are very rigid.

Given these two elements, an alternate approach to explore the different conformations available to the dimer was used. The approach was based on using a number of different starting structures that were closely related to each other in such a way that a global overview of the conformational phase space of the dimer could be obtained. In short, a set of 12 alternative dimers was generated by rotating the two monomers around various axes (see section 2.2.4), and each was simulated for 100 ns.
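The generation of alternative starting dimers by rotating one monomer relative to the other can be illustrated with a minimal sketch. The actual axes and angles are defined in section 2.2.4 and are not reproduced here; the function name `rotate_about_axis`, the choice of axis and the 90 degree angle below are purely hypothetical.

```python
import math


def rotate_about_axis(coords, axis, angle_deg, origin):
    """Rotate 3D points about an axis passing through `origin`.

    Uses Rodrigues' rotation formula:
        v_rot = v cos(t) + (k x v) sin(t) + k (k . v) (1 - cos(t))
    where k is the normalized rotation axis.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / norm for a in axis)
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for p in coords:
        # Shift so that the rotation axis passes through the origin.
        v = tuple(p[i] - origin[i] for i in range(3))
        dot = sum(k[i] * v[i] for i in range(3))
        cross = (
            k[1] * v[2] - k[2] * v[1],
            k[2] * v[0] - k[0] * v[2],
            k[0] * v[1] - k[1] * v[0],
        )
        rotated.append(tuple(
            v[i] * c + cross[i] * s + k[i] * dot * (1.0 - c) + origin[i]
            for i in range(3)
        ))
    return rotated


# Hypothetical example: rotate one point by 90 degrees about the z axis.
example = rotate_about_axis([(1.0, 0.0, 0.0)], (0.0, 0.0, 1.0), 90.0, (0.0, 0.0, 0.0))
```

In practice the rotation would be applied to all atoms of one monomer, with `origin` set for instance to its center of mass, while the other monomer is kept fixed.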

[Figure 2.9: RMSD matrix; axes: Time (ns) versus Time (ns), 0 to 600 ns; color scale: RMSD from 0 to 0.3 nm]

Figure 2.9: Root mean square deviation (RMSD) matrix of the dimer for the 6 simulations Pair1a, 1b, 2a, 2b, 3a and 3b of 100 ns each. The values were calculated based on the backbone atoms excluding the two Gly residues at the beginning and at the end of each peptide. An RMSD value of 0.30 nm was used as maximum limit to discriminate between different conformations.

[Figure 2.10: RMSD matrices, panels (a) and (b); axes: Time (ns) versus Time (ns), 0 to 600 ns; color scale: RMSD from 0 to 0.15 nm]

Figure 2.10: Root mean square deviation (RMSD) matrix of the monomer 1 (a) and 2 (b) during the 6 simulations of 100 ns each shown in Fig. 2.9. The values were calculated based on the backbone atoms excluding the two Gly residues at the beginning and at the end of each peptide. An RMSD value of 0.15 nm was used as maximum limit to discriminate between different conformations.

First, the rigidity of the monomer was confirmed by performing a cluster analysis using the 1.8 µs of the combined trajectories of this set of simulations plus the 6 simulations of the dimers described previously. From this we could show that the monomers sampled only a small range of conformations. The 4 dominant clusters were the same for both monomers and comprised 85.6 % of all conformations in the case of monomer 1 and 90.2 % in the case of monomer 2. The limited range of conformations of the monomer is illustrated in Fig. 2.11. The relative populations were similar for both monomers except in the case of cluster 2, which was only the third most populated cluster in monomer 1 while it was the second most populated cluster in monomer 2 with a relative population two times as high. This is due to the fact that cluster 1, cluster 2 and cluster 4 are still closely related and that many conformations could be part of any of these three clusters.

[Figure 2.11, panel labels: Cluster 1; Cluster 2; Cluster 3; Cluster 4]

Figure 2.11: Backbone representations of the 4 most populated clusters found for the monomer during 1.8 µs. Populations were not strictly identical for both monomers but were similar. Cluster 1: 56.2 % for monomer 1 and 52.8 % for monomer 2, Cluster 2: 9.7 % for monomer 1 and 20.6 % for monomer 2, Cluster 3: 13.4 % for monomer 1 and 14.0 % for monomer 2, Cluster 4: 6.3 % for monomer 1 and 2.8 % for monomer 2. The clustering was performed using as similarity criterion a backbone RMSD ≤ 0.15 nm.
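Clustering with an RMSD cutoff as similarity criterion (Fig. 2.11) can be illustrated with a small sketch. The clustering algorithm itself is not specified in this section, so the neighbour-counting scheme below is an assumption: the structure with the most neighbours within the cutoff becomes a cluster centre, it is removed together with its neighbours, and the procedure is repeated. The function name `cluster_by_cutoff` and the five-structure toy matrix are hypothetical.

```python
def cluster_by_cutoff(rmsd, cutoff):
    """Greedy neighbour-count clustering on a symmetric RMSD matrix.

    Assumed scheme (not confirmed by the text): the structure with the
    largest number of neighbours within `cutoff` defines a cluster, the
    cluster members are removed from the pool, and the search repeats.
    Returns a list of (centre_index, member_indices), largest first.
    """
    remaining = set(range(len(rmsd)))
    clusters = []
    while remaining:
        # Neighbours of each unassigned structure (the structure itself included).
        neighbours = {
            i: [j for j in remaining if rmsd[i][j] <= cutoff]
            for i in remaining
        }
        # Iterating in sorted order makes ties resolve to the lowest index.
        centre = max(sorted(remaining), key=lambda i: len(neighbours[i]))
        members = sorted(neighbours[centre])
        clusters.append((centre, members))
        remaining -= set(members)
    return clusters


# Toy data: five "structures" placed on a line, RMSD taken as their distance (nm).
positions = [0.00, 0.10, 0.12, 0.50, 0.55]
toy_rmsd = [[abs(a - b) for b in positions] for a in positions]
toy_clusters = cluster_by_cutoff(toy_rmsd, cutoff=0.15)
populations = [100.0 * len(members) / len(positions) for _, members in toy_clusters]
```

With such a scheme, conformations lying close to the boundary between two clusters are assigned to whichever cluster is extracted first, which is one way in which closely related clusters can exchange population.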

Many of the starting conformations only lead to the formation of very limited intermolecular β-sheets and/or β-strands (Pair1c1, Pair1c2, Pair1c5). Nevertheless these still corresponded to relatively stable structures. In the same way, the loss of an intermolecular β-sheet during the simulation did not necessarily result in a less stable structure (Pair3c6).

The result of the cluster analysis of the dimer from the combined 1.8 µs simulation time can be seen in Fig. 2.12 where the backbones of representative structures of the first 10 clusters are shown. It appears that the conformation of the most populated cluster is not that observed in the crystallographic structure. The dominant cluster has the form of a parallel intermolecular C-ter - C-ter β-sheet as opposed to the anti-parallel N-ter - N-ter β-sheet found experimentally. The simulations do not, however, give a clear suggestion of a preferred structure. Fig. 2.13 shows the relative populations of the first 20 clusters. The populations of the first 3 clusters are about the same, each with around 10 % of the total number of conformations sampled. None of the top 9 clusters is similar to the crystal structure. Only cluster 10 contains a N-ter - N-ter β-sheet but the central structure of this cluster has a backbone RMSD > 0.3 nm from the crystal structure.

[Figure 2.12, panel labels: Cluster 1 to Cluster 10]

Figure 2.12: Backbone of representative structures of the 10 most populated clusters found for the dimer during 1.8 µs. Color convention: black corresponds to the N-terminal of the peptides while white corresponds to the C-terminal. The cluster analysis was based on a similarity criterion of a backbone RMSD ≤ 0.30 nm.

[Figure 2.13: bar chart; x axis: Cluster no (1 to 20); y axis: Structures (%), 0 to 20]

Figure 2.13: Size of the 20 most populated clusters of the dimer for a combined simulation time of 1.8 µs. Clustering was performed using as similarity criterion a backbone RMSD < 0.30 nm.

Fig. 2.14 shows the RMSD matrix of the dimer for the combined trajectory of the 18 simulations. This global RMSD matrix clearly indicates the presence of (blue-green) off diagonal elements but only in very restricted areas. In what follows, the matrix elements are denoted (r, c) where r corresponds to the simulation time in ns in the row and c corresponds to the simulation time in the column (note that the matrix is symmetric, thus the order of r and c is irrelevant). The most significant off diagonal elements of the matrix are (1050, 250), (1350, 650), (1450, 650), (1400, 50) and (1550, 550) as these correspond to simulations with different starting structures. The elements (1150, 950), (1450, 1350) and (1750, 1650) show (clear) off diagonal elements. Although these pairs also had different starting structures, they were generated from the same structure (pair1 for (1150, 950) and (1450, 1350) and pair3 for (1750, 1650)).
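Because the global matrix concatenates 18 trajectories of 100 ns, a matrix element (r, c) can be translated back into the pair of simulations it compares. The sketch below assumes only the time ranges given in the caption of Fig. 2.14; the helper names `simulation_index`, `starting_family` and `describe` are hypothetical, and the index is simply the position of a simulation in the combined trajectory.

```python
SIM_LENGTH_NS = 100  # length of each individual simulation
N_SIMULATIONS = 18   # number of simulations in the combined trajectory


def simulation_index(time_ns):
    """1-based position of the simulation that contains `time_ns`."""
    if not 0 <= time_ns < N_SIMULATIONS * SIM_LENGTH_NS:
        raise ValueError("time outside the 0-1800 ns combined trajectory")
    return int(time_ns // SIM_LENGTH_NS) + 1


def starting_family(time_ns):
    """Group of starting structures, following the ranges of the Fig. 2.14 caption."""
    if time_ns < 600:
        return "pair1, pair2 and pair3 with two sets of velocities"
    if time_ns < 1200:
        return "generated from a Pair1a structure"
    return "generated from pair3"


def describe(element):
    """Map a matrix element (r, c) to simulation indices and starting families."""
    r, c = element
    return (
        (simulation_index(r), simulation_index(c)),
        (starting_family(r), starting_family(c)),
    )


indices_a, families_a = describe((1050, 250))
indices_b, families_b = describe((1150, 950))
```

For the element (1050, 250) the two families differ, whereas for (1150, 950) both simulations belong to the set generated from a Pair1a structure.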

Two main conclusions can be drawn from this global RMSD matrix and the cluster analysis. First, as noted previously, even for conformations without well defined β-sheets the dimers are relatively stable structures. Clearly the main force stabilizing the dimer is not the formation of intermolecular hydrogen bonds but the burial of hydrophobic surface. Second, the presence of off diagonal elements in only a few locations indicates that trajectories from unrelated starting structures show only limited overlap. In addition, not all possible dimer conformations will have been sampled. Much longer simulations would be needed to obtain reliable statistics on the relative populations of the different dimer conformations.

2.3.4 Effect of the temperature on the dimer conformation

A common method to facilitate sampling in simulations of molecular systems is simply to increase the temperature [36]. The system can then cross energy barriers more easily, thus allowing it to escape trapped conformations. In the case of the EMP1 dimer, increasing the temperature would facilitate dissociation and re-association events. It could also lead to the loss of secondary structure. In the case of a cyclic peptide such as EMP1, the presence of an intra-molecular disulfide bridge will prevent the monomer from adopting a significantly different structure.

Simulations of 100 ns in length were performed at 350, 400 and 450 K starting from a structure taken from Pair1a and from pair3 (the crystallographic structure). Despite the increase in temperature no dissociation events were observed in any of the simulations performed. On average, the distance between the COM of the peptides is similar in the case of pair1 irrespective of the temperature as indicated in Fig. 2.6d. In the case of pair3 the distance between COM is slightly higher at 400 K and 450 K than at 300 K and 350 K (Fig. 2.6e).

[Figure 2.14: RMSD matrix; axes: Time (ns) versus Time (ns), 0 to 1800 ns; color scale: RMSD from 0 to 0.3 nm]

Figure 2.14: Root mean square deviation (RMSD) matrix, showing the RMSD of the backbone atoms of the dimer (excluding the two Gly at the beginning and at the end of each peptide) for 18 simulations of 100 ns at 300 K. Values between 0 and 600 ns correspond to the simulations of pair1, pair2 and pair3 for two sets of velocities. Values between 600 and 1200 ns correspond to simulations performed using starting structures generated based on a structure from simulation Pair1a. Values between 1200 and 1800 ns correspond to simulations performed using starting structures generated based on pair3 (crystallographic structure after equilibration).

The number of transitions occurring during the course of these simulations performed at increased temperature seems to be directly related to the starting structure. While in the case of Pair1a and Pair1b (simulations performed at 300 K) the secondary structure of the dimer only shows the presence of one type of intermolecular β-sheet (Fig. 2.8), either between C-ter - C-ter for Pair1a or C-ter - N-ter for Pair1b, simulations of pair1 at higher temperatures show the occurrence of transitions from one type of intermolecular β-sheet to another. This is illustrated in Fig. 2.15 which shows the evolution of the secondary structure of pair1 and pair3 as a function of time during the simulations performed at 350, 400 and 450 K. At 350 K the dimer goes from a C-ter - C-ter intermolecular β-sheet to a N-ter - C-ter β-sheet (Fig. 2.15) through an intermediate conformation without intermolecular β-sheet as can be seen around 65 ns in Fig. 2.16, which shows the backbone RMSD matrix of the dimer of pair1 and pair3 simulated at 350, 400 and 450 K. At 400 K the same kind of transition is observed but this time the transition involved the flip of the other monomer (from a C-ter - C-ter β-sheet to a C-ter - N-ter β-sheet). The N-ter - C-ter β-sheet observed at 350 K and the C-ter - N-ter β-sheet observed at 400 K correspond to two different structures as the orientation of the peptides is different at 350 K and 400 K. At 450 K the dimer undergoes three transitions between three conformations with well defined secondary structure (Fig. 2.15). The first and second transitions occur after 20 and 75 ns, respectively. The dimer goes from a C-ter - C-ter β-sheet to a C-ter - N-ter β-sheet and then back to the C-ter - C-ter β-sheet. The third transition occurs between the C-ter - C-ter β-sheet and a N-ter - N-ter β-sheet at 85 ns.

Figure 2.15: Intermolecular secondary structure of pair1 and pair3 at 350 K (upper panels), 400 K (middle) and 450 K (lower panels) as a function of time for all possible combinations of the two strands (N-ter and C-ter) of each peptide.

[Figure 2.16: RMSD matrix; axes: Time (ns) versus Time (ns), 0 to 600 ns; color scale: RMSD from 0 to 0.3 nm]

Figure 2.16: Root mean square deviation (RMSD) values of the backbone atoms of the dimer for the 6 simulations of pair1 and pair3 performed at 350, 400 and 450 K. The calculations were performed excluding the two Gly at the beginning and at the end of each peptide. Values from 0 to 300 ns correspond to the 3 simulations of 100 ns each of pair1 at 350, 400 and 450 K. Values between 300 and 600 ns correspond to 3 similar simulations using pair3 (the crystallographic structure) as the starting structure.

The N-ter - N-ter β-sheets formed during the simulation of pair3 at increasing temperature appear surprisingly stable compared to the ones formed during the simulations of pair1. At 350 K the dimer forms a β-sheet which remains relatively stable for a short period after a change in the conformation of the dimer. At 400 K the initial N-ter - N-ter β-sheet remains stable for the full length of the simulation but rearrangements still occur in the dimer, which switches back and forth between the main and two alternative conformations (Fig. 2.16 between 400 and 500 ns). At 450 K the dimer forms a N-ter - N-ter β-sheet but still undergoes a transition from the main to an alternative conformation as indicated in Fig. 2.16 around 525 ns while retaining the β-sheet. Transitions to other conformations containing different β-sheet pairing also occur on three other occasions (transitions to C-ter - C-ter around 40 ns and to C-ter - N-ter around 80 ns and 95 ns in Fig. 2.15) but in two of them (transitions at 40 and 80 ns) the dimer goes back to a N-ter - N-ter β-sheet although Fig. 2.16 indicates that they do not correspond to the same conformation. The resistance of the N-ter - N-ter β-sheet to the increase in temperature suggests that this β-sheet is more stable than the other types of β-sheet. However, this was not found during the simulations performed at 300 K in which the N-ter - N-ter β-sheet was not the most populated cluster.

2.4 Conclusion

The simulations suggest that a wide variety of alternative dimers are metastable on a timescale of hundreds of ns. In fact, once a dimer had formed, no complete dissociation event was observed in any of the simulations. This makes any estimation of the rate of association/dissociation impossible. Because such a wide variety of alternative dimer conformations were relatively stable it was not possible to determine the preferred conformation of the dimer (if any) from the simulations performed. A wide variety of alternative hydrogen bonding patterns in the various dimers were observed. While the formation of an intermolecular β-sheet may stabilize the dimer, this was not the primary stabilizing factor. Instead, the burial of the hydrophobic surface appears to drive the formation of the dimer. The existence of many similar local energy minima makes the sampling of the conformational space difficult. Even after an extensive systematic search using many different starting structures only a fraction of the phase space potentially available could be sampled.

Simulations performed at higher temperature (350, 400 and 450 K) suggest that structures involving an intermolecular N-ter - N-ter β-sheet were more resistant to change than structures containing other types of β-sheet. This suggests that a N-ter - N-ter dimer as found in the crystal structure of the EMP1-EPO receptor complex might indeed be the most stable structure also in solution.


Chapter 3

Free energy calculations of protein-ligand interactions: the binding of triphenoxypyridine derivatives to factor Xa and trypsin

The calculation of free energy differences between alternate states of a system is of great importance as the rate and extent of many if not all chemical and biophysical processes are governed by the nature of the underlying free energy landscape. The preferential binding of a set of 10 triphenoxypyridine derivatives to 2 serine proteases, factor Xa and trypsin, has been evaluated using molecular dynamics simulations together with the thermodynamic integration method. A soft-core potential was used during the mutations to facilitate the creation and deletion of atoms. The inhibitors studied represent a severe challenge for explicit free energy calculations as the mutations from one compound to another involve up to 19 atoms and the creation or annihilation of a net charge. Parts of this chapter have been published in the article “Sampling and convergence in free energy calculations of protein-ligand interactions: the binding of triphenoxypyridine derivatives to factor Xa and trypsin” by A. Villa, R. Zangi, G. Pieffet and A. E. Mark in the Journal of Computer-Aided Molecular Design, Vol. 17, 673–686 (2003).



3.1 Introduction

Estimating differences in free energy is central to the process of rational molecular design. This is because all equilibrium properties of a system such as phase behavior, association-dissociation constants, solubilities, adsorption coefficients and conformational equilibria depend on differences in free energy between alternative states. Free energy differences are essentially related to the relative probability of finding a system in a given microscopic state. Many empirical approaches have been developed to estimate interaction or binding free energies between proteins and ligands. However, only by using an approach that samples an appropriate thermodynamic ensemble of states, such as Molecular Dynamics (MD) and Monte Carlo (MC) simulation techniques, from which it is possible to get thermal averages over microscopic configurations at an atomic level, can differences in free energy between two states of a system be estimated directly [37, 38, 39, 40, 41, 42, 43]. The difficulty is that the computational cost of obtaining sufficient sampling and converged results has made the routine application of free energy calculations for estimating binding free energies impractical. This situation is, however, rapidly changing. The use of modified intermediate potentials has been shown to improve sampling dramatically and the rapid advance of computer power means that the utility of free energy calculations in molecular design must be constantly re-evaluated.
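The link between free energy differences and relative probabilities mentioned above can be written explicitly. For two states A and B observed with probabilities P_A and P_B in the same equilibrium ensemble at temperature T, the standard statistical-mechanical relation (stated here as a reminder, not taken from this chapter) is:

```latex
\Delta G_{AB} = G_B - G_A = -k_B T \, \ln \frac{P_B}{P_A}
```

As an order-of-magnitude illustration, at 300 K a state that is ten times more populated than another is lower in free energy by k_B T ln 10, or approximately 5.7 kJ/mol.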

In this chapter, the relative binding affinities of a set of 10 inhibitors to two serine proteases, factor Xa and trypsin, that share sequence and structural homology, are evaluated. Factor Xa activates thrombin and plays a regulatory role in blood coagulation. Thus, it has been a target for the design of anti-thrombotic agents. However, because the active sites of many serine proteases are very similar, it is important that inhibitors selectively bind to factor Xa and not to other serine proteases, such as trypsin or thrombin.

The set of inhibitors studied in this work was initially proposed as part of the Critical Assessment of Techniques for Free Energy Evaluation (CATFEE) project. This competition, scheduled for summer 2001, was intended as a blind test of approaches to estimate free energies. However, although predictions were submitted by various groups around the world, neither the corresponding experimental data nor any objective assessment of the predictions was ever published by the organizers.

The inhibitors analyzed in this study are based on a triphenoxypyridine template (Fig. 3.1) but differ in the number and in the type of the substituents on two of the three phenyl groups (Table 3.1). The inhibitors differ significantly from each other and the transformation from one target into another involves mutating many more sites than in the cases normally dealt with in free energy calculations [39, 44, 45, 40, 46]. Therefore, the mutations are highly challenging and problems of sampling and convergence are of major concern in the calculations.

[Figure 3.1: chemical structure drawing showing the substituent positions R1 and R2 and the atom numbering]

Figure 3.1: The generic chemical structure of the inhibitors, 2,4,6-triphenoxypyridine. The substituents R1 and R2 for each inhibitor are specified in Table 3.1. The numbering of atoms was designed with a part common to all inhibitors: 2,4,6-triphenoxypyridine, atoms 1-14; benzylamidine ring, atoms 15-31. The aromatic ring with substituent R1 or R2 contained the mutated area and was different for each inhibitor. The numbering of atoms for the inhibitor I2 is given as an indication: aromatic ring with substituent R2, atoms 32-46; aromatic ring with substituent R1, atoms 47-58.

In this chapter issues related to sampling and convergence are considered. A special focus is placed on the simulations performed in water. We primarily looked at self consistency, exploiting the fact that the free energy is a state function so the difference in the free energy between two states is independent of the path chosen. In addition, a comparison with experimental results communicated to us by one of the organizers of the CATFEE competition is also presented and discussed.

3.2 Methods

In this study complexes of factor Xa and trypsin with a set of 10 inhibitors were investigated. All the inhibitors are derivatives of 2,4,6-triphenoxypyridine displayed in Fig. 3.1. The substituents R1 and R2 for each inhibitor are given in Table 3.1. No structural data on the complexes of factor Xa and trypsin with these inhibitors were available at the time the work was undertaken. For this reason the crystallographic structures of factor Xa and trypsin complexed with 2,6-diphenoxypyridine (PDB reference 1FJS for factor Xa [47] and 1QB1 for trypsin [48]) were used as templates to obtain initial structures. The structure of factor Xa comprises two polypeptide chains. However, only the primary chain, which contains the binding site, was included in the calculations. This was done in order to minimize the size of the system and because the minor chain is positioned well away from the inhibitor binding site.

Table 3.1: The set of 10 inhibitors used in this study to estimate binding free energies to factor Xa and trypsin. R1 and R2 are the substituents of 2,4,6-triphenoxypyridine shown in Fig. 3.1. Note that the substituent R2 is common to inhibitors I1–I6 while the substituent R1 is common to inhibitors I2 and I7–I10.

Inhibitor   R1                               R2
I1          2-OH-4-COO−                      1-methyl-2(2H)-imidazoline
I2          2,6-OCH3-4-COO−                  1-methyl-2(2H)-imidazoline
I3          2-Cl-4-COOCH2CH3                 1-methyl-2(2H)-imidazoline
I4          2,6-CH3-4-COOCH2CH3              1-methyl-2(2H)-imidazoline
I5          2-OCH2CH2CH2N(CH3)2-5-COO−       1-methyl-2(2H)-imidazoline
I6          2-Cl                             1-methyl-2(2H)-imidazoline
I7          2,6-OCH3-4-COO−                  NHC(NH)NH2
I8          2,6-OCH3-4-COO−                  (pyrrolidin-1-yl)(imino)methylamino
I9          2,6-OCH3-4-COO−                  (1H-imidazolin-2-yl)amino
I10         2,6-OCH3-4-COO−                  (1-methyl-1H-imidazolin-2-yl)amino

The initial structures of factor Xa and trypsin complexed with inhibitor 1 (I1) were constructed by superimposing the nitrogen atoms of the amino(imino)methyl group and the nitrogen atoms of the imidazolyl group of the inhibitor onto the corresponding atoms in the appropriate template structure. The conformation of I1 used for this procedure was taken from a simulation of the isolated inhibitor in water. The structures of factor Xa and trypsin complexed with inhibitors I2–I10 were derived from the I1 complexes by mutating the R1 and R2 substituents during the free energy calculations.

3.2.1 Mutations

The inhibitors (see Table 3.1) can be divided into two groups depending on the nature of the substituents. Inhibitors I1–I6 have the same R2 substituent while inhibitors I2 and I7–I10 have the same R1 substituent. Inhibitor I2 belongs to both groups and was used as a reference. The mutations were chosen in order to maximize the number of possible cycle closures while minimizing the size of the mutation itself (i.e. by only performing mutations within a given group). As the free energy for any closed cycle is zero by definition, cycle closure can provide an important check on the degree of convergence within the calculations. Figures 3.2 and 3.3 illustrate the pathways that were used to convert one inhibitor into another.
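A cycle-closure check of the kind described above amounts to summing the free energy changes along a closed path of mutations. The sketch below uses forward free energies in water copied from Table 3.2; the helper name `cycle_closure` and the selection of cycles are illustrative assumptions and do not necessarily correspond to the cycles analyzed by the author.

```python
# Forward free energies in water (kJ/mol), copied from Table 3.2.
DG_WATER = {
    ("I1", "I3"): 271.5,
    ("I3", "I6"): 52.6,
    ("I6", "I1"): -321.5,
    ("I3", "I4"): 9.7,
    ("I4", "I1"): -277.8,
    ("I2", "I7"): -97.9,
    ("I7", "I8"): -141.9,
    ("I8", "I2"): 238.8,
}


def cycle_closure(path, table):
    """Sum the free energies along a closed path such as ["I1", "I3", "I6"].

    The path is closed implicitly: the last state is mutated back into the
    first one. For fully converged calculations the sum would be zero.
    """
    legs = zip(path, path[1:] + path[:1])
    return sum(table[leg] for leg in legs)


closure_136 = cycle_closure(["I1", "I3", "I6"], DG_WATER)
closure_134 = cycle_closure(["I1", "I3", "I4"], DG_WATER)
closure_278 = cycle_closure(["I2", "I7", "I8"], DG_WATER)
```

With these values the three cycles close to within 2.6, 3.4 and -1.0 kJ/mol, respectively, which can be compared with the estimated errors of the individual legs.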


[Figure 3.2: chemical structure drawings of the R1-substituted rings of inhibitors I1 to I6, connected by the mutations performed]

Figure 3.2: Mutations performed to transform inhibitors I1–I6 into each other. These inhibitors differ in the aromatic substituents, R1, of the benzene ring in the para position (see Fig. 3.1).

Figure 3.3: Mutations performed to transform inhibitors I2 and I7–I10 into each other. The inhibitors differ in the aromatic meta-substituent, R2, of the benzene ring in the ortho position.

3.2.2 Force Field

The GROMOS96 (43a2) force field was used to describe both the protein and the inhibitors where possible [30, 49]. When no parameters were available in the standard force field to describe certain atomic interactions, parameters were obtained by fitting to ab initio calculations. The primary aim in developing additional parameters was to maintain compatibility with the rest of the force field. The parameters that were not available included the torsional potential between the phenoxy ring and the pyridine, and some atomic charges. The determination of these parameters was done by Alessandra Villa. The calculations were performed at the Restricted Hartree-Fock (RHF) level with the Gaussian94 [50] program using the 6-31G* basis set. The torsional potential defined by an aromatic carbon and an ether oxygen as central atoms was determined by fitting an empirical potential to the ab initio potential energy surface of the model system, p-phenylpyridine. Atomic charges were derived by scaling the atomic charges obtained by fitting the RHF/6-31G* molecular electrostatic potential of the corresponding small organic molecules using the method of Merz and Kollman [51].

3.2.3 Computational Details

The Molecular Dynamics simulations were performed using the GROMACS package version 3.0 [28, 29, 52] in explicit solvent and under periodic boundary conditions. The protein-inhibitor complexes were placed in a truncated octahedron box containing approximately 6300 water molecules. Simulations of the free inhibitor in water were performed in boxes containing approximately 850 water molecules. The Simple Point Charge (SPC) model was used to describe the water molecules [31]. The non-bonded interactions were evaluated using a twin range cutoff of 0.9 and 1.4 nm. Interactions within the shorter range cutoff were evaluated every step while interactions within the longer range cutoff were evaluated every 5 steps. To correct for the neglect of electrostatic interactions beyond the longer range cutoff, a reaction field (RF) correction with εRF = 78.0 was used. To maintain the temperature of the system at a constant value of 300 K, a Berendsen thermostat [32] was applied. The protein, the inhibitor and the solvent were each independently coupled to a temperature bath with a coupling time of 0.1 ps. The pressure was maintained by weak coupling to a reference pressure of 1 bar, with a coupling time of 1.0 ps and an isothermal compressibility of 4.6·10⁻⁵ bar⁻¹ [32]. The time step used for integrating the equations of motion was 0.002 ps. The bond lengths and angle of the water molecules were constrained using the SETTLE algorithm [53] while the bond lengths within the protein were constrained using the LINCS algorithm [33].
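The effect of the weak-coupling thermostat with the parameters quoted above (reference temperature 300 K, coupling time 0.1 ps, time step 0.002 ps) can be illustrated with the textbook form of the velocity scaling factor. This is a sketch of the general scheme only, not of the internals of the simulation package; the function name `berendsen_scale` and the starting temperature of 330 K are hypothetical, and the loop ignores all other contributions to the dynamics.

```python
import math


def berendsen_scale(t_inst, t_ref=300.0, dt=0.002, tau=0.1):
    """Velocity scaling factor of the weak-coupling thermostat.

    Textbook form: lambda = sqrt(1 + (dt / tau) * (T_ref / T - 1)),
    with dt and tau in ps and temperatures in K.
    """
    return math.sqrt(1.0 + (dt / tau) * (t_ref / t_inst - 1.0))


# Relaxation of a hypothetical temperature offset under the thermostat alone.
# Scaling velocities by lambda scales the kinetic temperature by lambda**2.
temperature = 330.0
for _ in range(50):  # 50 steps of 0.002 ps = one coupling time of 0.1 ps
    temperature *= berendsen_scale(temperature) ** 2
```

Each step removes a fraction dt/tau of the deviation from the reference temperature, so that after one coupling time the deviation has decayed to roughly 1/e of its initial value.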

The protein-I1 complex was equilibrated for 2 ns before the free energy calculations were performed. For the case of the free inhibitor in water the equilibration time was 200 ps.


3.2.4 Free energy calculations

The free energy difference between two states of a given system can be obtained using the coupling parameter approach [15]. The Hamiltonian of the system, H, is expressed as a function of a coupling parameter, λ, which describes the path taken from the initial to the final state such that when λ = λA the system corresponds to state A and when λ = λB the system corresponds to state B. The difference in the free energy between the two states, ∆GAB, is then given by the Thermodynamic Integration (TI) equation,

$$\Delta G_{AB} = G(\lambda_B) - G(\lambda_A) = \int_{\lambda_A}^{\lambda_B} \left( \frac{\partial G}{\partial \lambda} \right) d\lambda = \int_{\lambda_A}^{\lambda_B} \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda} d\lambda \tag{3.1}$$

In Eq. 3.1, ⟨...⟩λ denotes an ensemble average at a given value of λ. The integral in Eq. 3.1 may be evaluated numerically using a number of discrete λ-points.

Mutating one atom type into another with multiple steps requires the interpolation of the Lennard-Jones (LJ) and the Coulomb interactions between state A and state B. This procedure generates instabilities at points where atoms are created or annihilated. To circumvent this problem a soft-core potential may be applied where the singularity at the origin is substituted by a core of finite height [54, 19, 20, 55]. The interpolated interaction between atom i and atom j is described by,

$$V(r_{ij}, \lambda) = (1 - \lambda)\, V_A(r_A) + \lambda\, V_B(r_B) \tag{3.2}$$

where the modified distances rA and rB are,

$$r_A = \left( \alpha \sigma_A^{6} \lambda^{2} + r^{6} \right)^{1/6} \tag{3.3}$$

$$r_B = \left( \alpha \sigma_B^{6} (1 - \lambda)^{2} + r^{6} \right)^{1/6} \tag{3.4}$$

The soft-core parameter, α, controls the height of the potential around r = 0 and σ has its normal meaning as in the LJ potential function.
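The behaviour of the soft-core interpolation of Eqs. 3.2–3.4 can be checked numerically. The sketch below treats only the Lennard-Jones part and considers the annihilation of an atom (state B is a dummy atom with zero interaction). The values of σ, ε and α are hypothetical and chosen for illustration; the function names are not taken from any simulation package.

```python
import math


def lennard_jones(r, sigma, epsilon):
    """Standard 12-6 Lennard-Jones potential; zero for a dummy atom."""
    if epsilon == 0.0:
        return 0.0  # dummy atom: no non-bonded interaction
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def soft_core(r, lam, state_a, state_b, alpha):
    """Soft-core interpolation of Eqs. 3.2-3.4 (Lennard-Jones part only).

    state_a and state_b are (sigma, epsilon) tuples for the two end states.
    """
    sigma_a, eps_a = state_a
    sigma_b, eps_b = state_b
    r_a = (alpha * sigma_a ** 6 * lam ** 2 + r ** 6) ** (1.0 / 6.0)
    r_b = (alpha * sigma_b ** 6 * (1.0 - lam) ** 2 + r ** 6) ** (1.0 / 6.0)
    return ((1.0 - lam) * lennard_jones(r_a, sigma_a, eps_a)
            + lam * lennard_jones(r_b, sigma_b, eps_b))


# Hypothetical parameters: sigma in nm, epsilon in kJ/mol.
ATOM = (0.3, 0.5)    # interacting atom (state A)
DUMMY = (0.3, 0.0)   # dummy atom (state B)
ALPHA = 1.5          # soft-core parameter, illustrative value

end_state = soft_core(0.35, 0.0, ATOM, DUMMY, ALPHA)   # plain LJ at lambda = 0
overlap = soft_core(0.0, 0.5, ATOM, DUMMY, ALPHA)      # finite even at r = 0
```

At λ = 0 the modified distance reduces to r and the unmodified potential is recovered, while at intermediate λ the potential remains finite at r = 0, which is what allows atoms to be created or annihilated without numerical instabilities.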

The binding free energy, ∆Gb, is the work required to transfer the inhibitor from being free in solution to being bound to the protein. The relative binding free energy, ∆∆G_b^{X−Y}, represents the difference in the binding free energy between inhibitor X and inhibitor Y and was evaluated using the thermodynamic cycle shown in Fig. 3.4a,

$$\Delta\Delta G_b^{X-Y} = \Delta G_b^{Y} - \Delta G_b^{X} = \Delta G^{X-Y}(\mathrm{pro}) - \Delta G^{X-Y}(\mathrm{wat}) \tag{3.5}$$

where ∆G^{X−Y}(pro) and ∆G^{X−Y}(wat) are the work required to mutate inhibitor X to inhibitor Y in the protein and in water, respectively. A similar expression is obtained for calculating the difference in the binding free energy between inhibitor X and inhibitor Y with respect to two proteins, p1 and p2, as shown in Fig. 3.4b,

$$\Delta\Delta G_{p1-p2}^{X-Y} = \Delta G_{p1-p2}^{Y} - \Delta G_{p1-p2}^{X} = \Delta G_{p1}^{X-Y} - \Delta G_{p2}^{X-Y} \tag{3.6}$$
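The second equality in Eq. 3.5 follows directly from the fact that the free energy is a state function. Going around the cycle (X free in water, X bound, Y bound, Y free in water, and back to X free in water), the free energy changes must sum to zero. A short derivation, using the same symbols as Eq. 3.5, reads:

```latex
\begin{aligned}
0 &= \Delta G_b^{X} + \Delta G^{X-Y}(\mathrm{pro}) - \Delta G_b^{Y} - \Delta G^{X-Y}(\mathrm{wat}) \\
\Rightarrow \quad
\Delta G_b^{Y} - \Delta G_b^{X} &= \Delta G^{X-Y}(\mathrm{pro}) - \Delta G^{X-Y}(\mathrm{wat})
\end{aligned}
```

Eq. 3.6 is obtained in the same way by replacing the water leg of the cycle with the mutation performed in the second protein.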


In the calculations of ∆GX−Y one inhibitor was gradually mutated into another inhibitor. In cases where there was no direct correspondence between the atoms in the two molecules 'dummy atoms' were used. A dummy atom is an atom for which the non-bonded interactions with all other atoms are zero. Only bonded and non-bonded interactions were mutated during the calculations. The masses of the atoms were not altered. The non-bonded interactions between the initial and final states were interpolated using a soft-core potential [54] as implemented in the GROMACS simulation package [29, 56].

All mutations indicated in Fig. 3.2 and 3.3 were performed for factor Xa and trypsin complexes and for the inhibitors free in water. In this collective work, the simulations of the mutations of the ligands in factor Xa and trypsin were performed by A. Villa and R. Zangi, respectively. The free energy was evaluated using 18 λ-points.

The number of λ-points was increased in regions where the integrand in Eq. 3.1 exhibited a sharp discontinuity when plotted as a function of λ, which indicated a large perturbation of the system by the mutation.

At each value of λ the system was equilibrated for 50 ps. The derivative ∂H(λ)/∂λ was then averaged over 250 ps in the case of the protein-inhibitor complexes and over 150 ps in the case of the inhibitors in water. The average value of the derivative was calculated at each λ-point and the resulting free energy profile was then integrated using the trapezoidal method to obtain ∆GX−Y. The error in ⟨∂H(λ)/∂λ⟩λ was estimated using a block averaging procedure [57, 58] at each λ-point. The individual errors were then integrated to yield an estimate of the error in ∆GX−Y. All calculations were performed on a Linux cluster of 1.7 GHz Pentium-IV based machines. Each λ-point for the complex (300 ps) required approximately 12 CPU hours.
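The post-processing described above (block averaging at each λ-point, followed by trapezoidal integration of both the averages and the errors) can be sketched as follows. The data are synthetic stand-ins and not values from this work; the function names, the number of blocks and the literal reading of "integrating the errors" with the same trapezoidal rule are assumptions.

```python
import math


def block_error(samples, n_blocks):
    """Standard error of the mean estimated from non-overlapping blocks."""
    size = len(samples) // n_blocks
    means = [
        sum(samples[i * size:(i + 1) * size]) / size
        for i in range(n_blocks)
    ]
    grand = sum(means) / n_blocks
    variance = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return math.sqrt(variance / n_blocks)


def trapezoid(x, y):
    """Trapezoidal integration of y(x) on a possibly non-uniform grid."""
    return sum(
        0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
        for i in range(len(x) - 1)
    )


# Synthetic example with 18 lambda-points (illustrative numbers only).
lambdas = [i / 17 for i in range(18)]
dhdl_mean = [100.0 * lam for lam in lambdas]   # <dH/dlambda> in kJ/mol
dhdl_err = [2.0] * len(lambdas)                # error at each lambda-point

delta_g = trapezoid(lambdas, dhdl_mean)
delta_g_err = trapezoid(lambdas, dhdl_err)
```

Because `trapezoid` accepts a non-uniform grid, additional λ-points can be inserted in regions where the integrand changes rapidly without modifying the integration step.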

3.3 Results

3.3.1 Mutations in water

The free energy values of mutating one inhibitor X into another Y, ∆GX−Y, in water via the pathways shown in Fig. 3.2 and 3.3 are reported in Table 3.2 and Fig. 3.5. To check the dependency of the results on the direction of the mutation, the free energies of the reverse mutations, ∆GY−X, were also calculated. Except for I1 → I3 and for I2 → I5, the average discrepancy in the free energy between the forward and the reverse transformation is only 2.7 kJ/mol, just above the average estimated error. Thus, in most cases the results are insensitive to the direction of the mutation, i.e. are reversible. The two cases where reversibility was not observed are discussed below.
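The discrepancy between forward and reverse mutations can be recomputed from the entries of Table 3.2. The sketch below uses the values of that table for all mutations with both directions available; the variable and function names are hypothetical, and including the I2 → I5* row in the average is an assumption (the average rounds to the same value with or without it).

```python
# (forward, reverse) free energies in water (kJ/mol), copied from Table 3.2.
TABLE_3_2 = {
    "I1->I2": (77.3, -73.0),
    "I1->I3": (271.5, -258.1),
    "I3->I6": (52.6, -51.4),
    "I6->I1": (-321.5, 322.9),
    "I3->I4": (9.7, -5.7),
    "I4->I1": (-277.8, 271.6),
    "I2->I5": (-39.5, 56.5),
    "I2->I5*": (-51.9, 48.8),
    "I2->I7": (-97.9, 98.7),
    "I7->I8": (-141.9, 146.6),
    "I8->I2": (238.8, -239.0),
    "I9->I7": (-28.6, 32.9),
    "I2->I9": (-66.0, 65.9),
    "I9->I10": (2.8, -4.0),
    "I2->I10": (-61.0, 65.0),
}

# Mutations singled out in the text as not reversible.
EXCLUDED = {"I1->I3", "I2->I5"}


def hysteresis(forward, reverse):
    """Absolute value of the sum of forward and reverse free energies (D)."""
    return abs(forward + reverse)


d_values = {name: hysteresis(*pair) for name, pair in TABLE_3_2.items()}
kept = [d for name, d in d_values.items() if name not in EXCLUDED]
average_d = sum(kept) / len(kept)
```

The average over the remaining mutations evaluates to about 2.7 kJ/mol, while the two excluded mutations give 13.4 and 17.0 kJ/mol.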

In principle, the aromatic rings in the inhibitor are free to rotate. However,such rotations may not necessarily be observed on the time scale of the simulations.For cases where such rotation would yield a different state (i.e. for cases not axially


Figure 3.4: Thermodynamic cycles used to determine the difference in binding freeenergy between inhibitor X and inhibitor Y relative to (a) the unbound inhibitors inwater (b) the inhibitors bound to another protein.

Figure 3.5: Schematic diagram of the mutations between the inhibitors in water. The values shown are the free energy changes (kJ/mol). The numbers in parentheses are the estimated errors.


Table 3.2: The free energies (in kJ/mol) for mutating inhibitor X into inhibitor Y in water, ∆GX−Y, and for the corresponding reversed mutation, ∆GY−X. D is the absolute value of the sum of ∆GX−Y and ∆GY−X. The chemical transformations are described in Fig. 3.2 and 3.3.

Mutation       ∆GX−Y            ∆GY−X            D
I1 → I2        77.3 ± 1.3       -73.0 ± 1.6      4.3
I1* → I2       −−−−             -76.6 ± 1.4      –
I1 → I3        271.5 ± 1.6      -258.1 ± 7.4     13.4
I1* → I3*      270.5 ± 1.7      −−−−             –
I3 → I6        52.6 ± 0.6       -51.4 ± 0.6      1.2
I6 → I1        -321.5 ± 1.5     322.9 ± 1.7      1.4
I6* → I1*      −−−−             321.0 ± 1.9      –
I3 → I4        9.7 ± 1.3        -5.7 ± 1.1       4.0
I4 → I1        -277.8 ± 1.7     271.6 ± 1.9      6.2
I2 → I5        -39.5 ± 3.0      56.5 ± 4.8       17.0
I2 → I5*       -51.9 ± 3.1      48.8 ± 3.2       3.1
I2 → I7        -97.9 ± 1.3      98.7 ± 1.4       0.8
I7 → I8        -141.9 ± 1.8     146.6 ± 2.4      4.7
I8 → I2        238.8 ± 2.1      -239.0 ± 2.4     0.2
I9 → I7        -28.6 ± 1.4      32.9 ± 1.4       4.3
I2 → I9        -66.0 ± 1.2      65.9 ± 2.4       0.1
I9 → I10       2.8 ± 0.3        -4.0 ± 0.3       1.2
I2 → I10       -61.0 ± 1.1      65.0 ± 1.0       4.0
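The reversibility check behind the D column can be reproduced with a few lines. The sketch below copies the forward and reverse values from Table 3.2 for the mutations computed in both directions; the dictionary layout and the names `hysteresis` and `mean_d` are illustrative choices, not part of the original analysis.

```python
# Forward and reverse free energies (kJ/mol) copied from Table 3.2, for the
# mutations where both directions were computed.
table_3_2 = {
    "I1->I2": (77.3, -73.0),
    "I1->I3": (271.5, -258.1),
    "I3->I6": (52.6, -51.4),
    "I6->I1": (-321.5, 322.9),
    "I3->I4": (9.7, -5.7),
    "I4->I1": (-277.8, 271.6),
    "I2->I5": (-39.5, 56.5),
    "I2->I5*": (-51.9, 48.8),
    "I2->I7": (-97.9, 98.7),
    "I7->I8": (-141.9, 146.6),
    "I8->I2": (238.8, -239.0),
    "I9->I7": (-28.6, 32.9),
    "I2->I9": (-66.0, 65.9),
    "I9->I10": (2.8, -4.0),
    "I2->I10": (-61.0, 65.0),
}


def hysteresis(forward: float, reverse: float) -> float:
    """D = |dG(X->Y) + dG(Y->X)|; zero for a perfectly reversible mutation."""
    return abs(forward + reverse)


d_values = {name: hysteresis(f, r) for name, (f, r) in table_3_2.items()}

# The two mutations singled out in the text as not reversible.
outliers = {"I1->I3", "I2->I5"}
kept = [d for name, d in d_values.items() if name not in outliers]
mean_d = sum(kept) / len(kept)   # average discrepancy over the other cases
```

With the two outliers excluded, `mean_d` evaluates to about 2.7 kJ/mol, the average discrepancy quoted in the text.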


symmetric) the results may be dependent on the starting conformation. This could occur for inhibitors I1, I3, I5 and I6 (see Fig. 3.2). Therefore, the mutations were performed for both orientations. The alternative orientation is indicated by a star, '*'. In water, free rotation of the aromatic rings is expected, so the results should be independent of the initial conformation of the inhibitor. This is indeed the case, as can be seen from the values reported in Table 3.2 and Fig. 3.5. The difference in the free energy starting from the two different conformers is within the estimated error (0.3–3.2 kJ/mol). There is only one exception, the I2 → I5 transformation, which also showed a large discrepancy between the forward and backward directions. This mutation involves many atoms as well as the annihilation and creation of a charge. The pathways for mutating I2 to I5 or I5* are shown in Fig. 3.6. Simulations were conducted to examine the dependency of ∆G2→5(wat) and of ∆G2→5∗(wat) on the sampling time. The results are shown in Table 3.3. As the sampling time is increased, the difference in the free energy for the two conformations of I5 decreases overall, from 18.2 kJ/mol when the sampling time is 150 ps to 9.2 kJ/mol when the sampling time is 6 ns. Although the difference in free energy between the two states is still quite large, their free energy profiles (see Fig. 3.7) indicate that most of the difference is due to the discrepancy at one single λ point. At λ = 0.65 and after 6 ns, 〈∂H(λ)/∂λ〉 = 214.0 kJ/mol for mutating I2 to I5 and 〈∂H(λ)/∂λ〉 = 47.8 kJ/mol for mutating I2 to I5*. This clearly indicates that different sets of conformations were sampled during the simulations. Furthermore, the trend of the free energy profiles shown in Fig. 3.7 suggests that the value of the free energy derivative for the transformation I2 → I5* is too low.

In order to further study the variation of the free energy derivative at this λ point, the transformation I2 → I5* was simulated using a different set of initial velocities. After 6 ns, the free energy derivative converged to within 5 kJ/mol of the value calculated for the mutation of I2 to I5, with 〈∂H(λ)/∂λ〉 = 209.0 kJ/mol. The free energy difference of the transformation I2 → I5* calculated using this alternative result for λ = 0.65 was equal to 47.7 kJ/mol, which corresponds to a difference of only 1 kJ/mol with respect to the free energy difference calculated for the transformation I2 → I5.

In order to determine what simulation length would be needed to observe a spontaneous transition between the conformations sampled during the transformations I2 → I5 and I2 → I5* at λ = 0.65, the simulation time was increased to 20 ns. After 20 ns, the free energy derivative of the transformation I2 → I5 remained almost identical and was equal to 213.7 kJ/mol, while in the case of the transformation I2 → I5* the free energy derivative increased to 159.2 kJ/mol. The evolution of the cumulative and the local average of the free energy derivative as a function of time at λ = 0.65 for the transformations I2 → I5 and I2 → I5* is shown in Fig. 3.8. Figures 3.8.a and 3.8.b show that the free energy derivative appears to converge quickly and towards the same value in both cases, although starting from different conformations. Figure 3.8.c shows that after 6 ns the free energy derivative also appears to have converged, but to a different value. However, after 6.8 ns a large increase of the free energy derivative occurs, indicating a change of conformation. This implies that most relevant conformations could not be sampled within 6 ns.


Figure 3.6: The I2 → I5 mutation.

Table 3.3: The free energies (kJ/mol) for the mutations I2 → I5 and I2 → I5* in water obtained using different sampling times.

sampling time (ps)    ∆G2→5(wat)      ∆G2→5∗(wat)     ∆∆G
150                   -39.5 ± 3.0     -57.7 ± 3.1     18.2
2000                  -52.5 ± 2.6     -53.0 ± 9.6     0.5
4000                  -51.5 ± 1.9     -55.0 ± 6.3     3.5
6000                  -46.7 ± 4.9     -55.8 ± 4.5     9.2

The cumulative average also suggests that even 20 ns is still not enough for the free energy to converge. The dihedral angle describing the position of the pyridine ring with respect to the benzene ring of the substituent R1 is also shown in Fig. 3.8. Analysis of this dihedral angle shows that the conformations sampled in Fig. 3.8.a and 3.8.b are different even though their free energy derivatives are similar, confirming that only local convergence can be achieved in 6 ns.
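The two quantities plotted in Fig. 3.8 can be computed as sketched below. This is an illustrative pure-Python version with a synthetic time series, not the analysis code of this work; the function names and the window length are assumptions.

```python
from typing import List, Sequence


def cumulative_average(values: Sequence[float]) -> List[float]:
    """Running mean of the series from the first sample up to each sample."""
    out: List[float] = []
    total = 0.0
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total / n)
    return out


def local_average(values: Sequence[float], window: int) -> List[float]:
    """Means over consecutive non-overlapping windows (e.g. 50 ps each).

    A trailing partial window is dropped.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]


# Synthetic series: the derivative sits near 50 kJ/mol, then jumps to
# 200 kJ/mol after a conformational change late in the run.
series = [50.0] * 6 + [200.0] * 4
cumulative = cumulative_average(series)
local = local_average(series, window=2)
```

In this toy series the local average switches immediately from 50 to 200, whereas the cumulative average has only reached 110 by the end of the run. A flat cumulative average therefore does not by itself exclude a later jump of the kind seen in Fig. 3.8.c.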

In this case 150 ps, 6 ns or even 20 ns are insufficient to sample the rotation of the phenoxy ring. This is because the amine group in I5 can form strong intramolecular interactions. For inhibitors I1, I3 and I6 similar intramolecular interactions are not possible.

[Figure 3.7, panels a and b: 〈∂H/∂λ〉 (kJ mol⁻¹) plotted against λ for the mutations 2→5 and 2→5*.]

Figure 3.7: Free energy profiles of the mutations 2→5 and 2→5*. (a) The ensemble average at each λ point is calculated over 150 ps. The free energies associated with the mutations are ∆G2→5 = −39.5 ± 4.9 kJ/mol and ∆G2→5∗ = −57.7 ± 4.5 kJ/mol. (b) The ensemble average at each λ point is calculated over 6 ns. The free energies associated with the mutations are ∆G2→5 = −46.7 ± 4.9 kJ/mol and ∆G2→5∗ = −55.8 ± 4.5 kJ/mol.

Table 3.4: Free energies (kJ/mol) of mutating one inhibitor into another along circular paths as described schematically in Fig. 3.2 and 3.3 in water, in factor Xa and in trypsin.

circular path    water          factor Xa       trypsin
1-3-4-1          -2.2 ± 7.5     -12.1 ± 5.9     13.2 ± 6.6
1-3-6-1          -5.4 ± 6.7     6.0 ± 4.7       9.3 ± 6.2
2-9-7-2          1.6 ± 4.5      5.5 ± 6.2       -0.7 ± 6.0
2-9-10-2         0.4 ± 3.1      3.2 ± 3.1       -3.6 ± 5.2
2-7-8-2          -3.6 ± 5.6     5.3 ± 6.5       -4.9 ± 6.6

In Table 3.4 the free energy values for the closed paths shown schematically in Fig. 3.2 and 3.3 are reported. The magnitude of the error in evaluating the free energy along circular paths is comparable in the forward and reverse directions. The values reported in Table 3.4 correspond to the average of the forward and the backward transformations. In water, the deviations from zero are in the range 0.4–5.4 kJ/mol, which is smaller than the sum of the estimated errors of the individual steps, indicating that there is some cancellation of error within the cycles.
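As a worked example of the cycle closure test, the sketch below recomputes the 1-3-6-1 cycle in water from the Table 3.2 values. The way the errors are combined (mean of the forward and reverse errors for each leg, then summed linearly around the cycle) is an assumption inferred from the tabulated numbers, since the text does not state the rule explicitly; the names `legs`, `averaged_leg` and `cycle_closure` are illustrative.

```python
from typing import Dict, List, Tuple

# (forward dG, forward error, reverse dG, reverse error) in kJ/mol,
# copied from Table 3.2 (mutations in water).
legs: Dict[Tuple[str, str], Tuple[float, float, float, float]] = {
    ("I1", "I3"): (271.5, 1.6, -258.1, 7.4),
    ("I3", "I6"): (52.6, 0.6, -51.4, 0.6),
    ("I6", "I1"): (-321.5, 1.5, 322.9, 1.7),
}


def averaged_leg(fwd: float, fwd_err: float,
                 rev: float, rev_err: float) -> Tuple[float, float]:
    """Average the forward value with the sign-reversed reverse value."""
    dg = 0.5 * (fwd - rev)
    err = 0.5 * (fwd_err + rev_err)   # assumed error rule, see lead-in
    return dg, err


def cycle_closure(path: List[str]) -> Tuple[float, float]:
    """Sum the averaged free energies along a closed path of inhibitors."""
    total, total_err = 0.0, 0.0
    for start, end in zip(path, path[1:]):
        dg, err = averaged_leg(*legs[(start, end)])
        total += dg
        total_err += err
    return total, total_err


closure, closure_err = cycle_closure(["I1", "I3", "I6", "I1"])
```

For this cycle the sketch gives a closure of about −5.4 kJ/mol with an accumulated error of about 6.7 kJ/mol, matching the water entry for path 1-3-6-1 in Table 3.4.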

In the I1 → I3 mutation, the derivative, ∂H(λ)/∂λ, varied sharply as a function of λ in the region λ ≤ 0.4. Therefore, in an attempt to improve the convergence of the calculation for this mutation, 6 additional λ-points were added: λ = 0.04, 0.06, 0.08, 0.15, 0.25 and 0.35. However, the difference in the free energy compared to the 18 λ-point calculation was only 2.2 kJ/mol. The free energy profiles of the forward and backward mutation in the region λ ≥ 0.4 exhibited some degree of hysteresis. Therefore, the sampling time for all λ ≥ 0.4 was extended to 1 ns in


[Figure 3.8, panels a to c: cumulative average and local average over 50 ps of 〈∂H/∂λ〉 (kJ mol⁻¹), and the dihedral angle χ1 11-12-13-14 (degrees), plotted against time (ns). Panels a and b cover 0–6 ns, panel c covers 0–20 ns.]

Figure 3.8: Cumulative and local average of the free energy derivative at λ = 0.65 for the mutation I2 → I5 in water. The local average of the dihedral angle formed by the atoms 11-12-13-14 (see Fig. 3.1) is also represented. (a) The mutation I2 → I5. (b) The second simulation of the mutation I2 → I5*. (c) The first simulation of the mutation I2 → I5* extended to 20 ns.

Figure 3.9: The binding site of factor Xa bound to I1. The inhibitor and the residues Asp189 and Trp215, that form hydrogen bonds with the benzylamidine ring of I1, are emphasized.

order to increase convergence. Again, no significant improvement was observed. The difference in the free energy compared to the 150 ps sampling time was 0.3 kJ/mol. The transformation from I1 to I3 involves the annihilation of the charge of the para-carboxy group. It is not completely clear to us whether this could explain the large discrepancy observed in this mutation.

3.3.2 Ligand-protein complexes

A prerequisite for obtaining the correct relative free energy for a protein-inhibitor complex is that the inhibitor forms the proper interactions inside the binding site. Figures 3.9 and 3.10 show the structures after 2 ns of simulation of inhibitor I1 bound to factor Xa and to trypsin, respectively. Note that in both cases the numbering of the residues is the same as found in the X-ray PDB files. The benzylamidine ring resides in the S1 pocket, making hydrogen bonds with Asp189. The average distance between the amidine group of the inhibitor and the carbonyl oxygens of this residue is 2.1 Å for both factor Xa and trypsin. The hydroxyl group in the para position on the benzylamidine ring forms a hydrogen bond with Ser195. The average bond distances in factor Xa and in trypsin are 2.3 Å and 2.1 Å, respectively. The interactions in the S1 binding site in the simulations are consistent with results obtained from X-ray crystallographic studies of trypsin [48] and factor Xa [47] complexed with bisphenoxypyridine.


Figure 3.10: The binding site of trypsin bound to I1. The inhibitor and the residues Asp189 and Trp215, that form hydrogen bonds with the benzylamidine ring of I1, are emphasized.

The phenyl ring of inhibitor I1, carrying the substituent 1-methyl-2-(2H)-imidazoline, lies in the vicinity of Trp215 in the S4 pocket. The average distance between the centers of mass of these two groups is 6.0 Å in factor Xa and 5.0 Å in trypsin. In the X-ray structures of the bisphenoxypyridine complexes, the phenoxy group fits snugly into the S4 pocket and lies across the indole ring of Trp215 (4.0 Å).

3.3.3 Mutations in factor Xa and trypsin

Starting from the equilibrated structure of the protein-I1 complex, all mutations shown in Fig. 3.2 and 3.3 were performed, except for the transformation I2 → I5. For this mutation, the results in water (Table 3.3) had already indicated that a minimum sampling time at each λ value on the order of nanoseconds is required. Because even longer sampling times might be necessary to achieve convergence in the complex, and because simulating the protein-inhibitor system is computationally very demanding, this mutation was not performed in factor Xa or trypsin.

As shown earlier, the rotation of the aromatic ring in water is, in general, sampled adequately within 150 ps. However, inside the protein sampling is restricted and the aromatic rings may not adopt all possible orientations spontaneously. Therefore, for cases where rotation around the aromatic ring would yield different states, the calculations were performed in both orientations. The results of the

Figure 3.11: Schematic diagram of the mutations between the inhibitors in factor Xa. The values shown are the free energy changes (kJ/mol). The numbers in parentheses are the estimated errors.

mutations in factor Xa and in trypsin are reported in Fig. 3.11 and 3.12, respectively. The difference between the free energy of the same mutation but with a different starting conformation of the inhibitor ranged from 2.8 to 17.1 kJ/mol. Thus, 250 ps at each λ point is not sufficient to sample an equilibrium distribution of orientations of the benzene rings inside the binding pocket. In fact, it is probable that, due to steric hindrance, such rotations are not possible while the inhibitor is inside the binding pocket and that the conformations would have to be treated as independent irrespective of the sampling time. Thus, both orientations were considered separately and the preferred orientation was assumed to be the one that corresponds to the lower free energy.

As a test of convergence the results of the free energies along circular paths are reported in Table 3.4. The average deviation from zero is 6.4 kJ/mol, which is approximately equal to the accumulated error.

The binding free energies are obtained by subtracting the free energies of the mutations in water from the corresponding free energies in the protein. The relative free energy of binding of the two inhibitors inside the two proteins, ∆∆G^{X→Y}_{p1−p2}, was also obtained by subtracting the free energies of the mutations in one protein from the corresponding free energies in the other protein. It is convenient to have a single reference state for the inhibitor mutations; thus, in Table 3.5


Figure 3.12: Schematic diagram of the mutations between the inhibitors in trypsin. The values shown are the free energy changes (kJ/mol). The numbers in parentheses are the estimated errors.

Table 3.5: The difference of the average free energy (kJ/mol) of the mutation between the inhibitors referenced to inhibitor I2, ∆∆G2→Y, in water, factor Xa and trypsin.

Inhibitor    factor Xa - water    trypsin - water    trypsin - factor Xa
I1           2.2 ± 2.9            -0.7 ± 3.5         -2.9 ± 3.9
I2           0.0 ± 0.0            0.0 ± 0.0          0.0 ± 0.0
I3           -10.7 ± 5.2          -10.9 ± 7.2        -0.2 ± 9.3
I4           -1.0 ± 5.6           -0.8 ± 5.8         0.2 ± 5.6
I6           -4.8 ± 5.5           2.5 ± 7.0          7.3 ± 7.5
I7           -15.9 ± 3.4          0.6 ± 3.3          16.4 ± 3.8
I8           1.1 ± 4.6            -6.6 ± 4.7         -7.8 ± 5.4
I9           -10.1 ± 3.4          5.1 ± 3.3          15.1 ± 3.8
I10          -11.4 ± 3.5          6.0 ± 3.6          17.4 ± 4.1

the relative binding free energies are given with respect to I2, ∆∆G2→Y. Where there are two or more pathways by which the free energy could be calculated, the value corresponding to the pathway with the smallest accumulated error is given. The values provide an indication of the relative binding affinity of the inhibitors. Based on the results in Table 3.5 the ranking of the inhibitors between trypsin and factor Xa was determined and is shown in Fig. 3.13. A distinction between the inhibitors is only made if the difference in the binding free energies is larger than the associated error. The average error of ∆∆G^{2→Y}_{p1−p2}, estimated from the closure of the thermodynamic cycle, is 5.4 kJ/mol for the case of trypsin and factor Xa. Therefore, the inhibitors were divided into groups that differ by more than 10 kJ/mol. According to this ranking, I8 has the highest affinity for trypsin with respect to factor Xa, while inhibitors I7, I9 and I10 have the highest affinity for factor Xa with respect to trypsin. It should be emphasized that this does not mean that inhibitor I3 has more affinity for trypsin than for factor Xa in absolute terms but only with respect to the other inhibitors.
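The subtraction behind the last column of Table 3.5 can be illustrated as follows. The sketch takes the two "protein minus water" columns from Table 3.5 and forms their difference, in which the water contribution cancels; the variable names are illustrative, and small differences of about 0.1 kJ/mol from the tabulated column are presumably due to rounding of the published values.

```python
# Relative binding free energies (kJ/mol) with respect to I2, copied from
# the first two columns of Table 3.5.
xa_minus_water = {
    "I1": 2.2, "I3": -10.7, "I4": -1.0, "I6": -4.8,
    "I7": -15.9, "I8": 1.1, "I9": -10.1, "I10": -11.4,
}
trypsin_minus_water = {
    "I1": -0.7, "I3": -10.9, "I4": -0.8, "I6": 2.5,
    "I7": 0.6, "I8": -6.6, "I9": 5.1, "I10": 6.0,
}

# (trypsin - water) - (factor Xa - water) = trypsin - factor Xa.
# Negative values: relative preference for trypsin; positive: for factor Xa.
selectivity = {
    name: trypsin_minus_water[name] - xa_minus_water[name]
    for name in xa_minus_water
}

# Inhibitors ordered from most trypsin-selective to most factor Xa-selective,
# always relative to the reference inhibitor I2.
ranked = sorted(selectivity, key=selectivity.get)
```

In this ordering I8 comes first and I7, I9 and I10 come last, consistent with the grouping described in the text.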

3.3.4 Experiment vs. calculation

Almost 2 years after the work described here was performed and after our predictions had been published, experimental values of the binding of the inhibitors to the receptors factor Xa and trypsin were privately communicated to us as inhibition constants Ki by one of the organizers of the CATFEE contest. Inhibition constants Ki correspond to the concentration of inhibitor that results in half-maximal rate of the enzyme. Ki values are related to the dissociation of the enzyme-inhibitor complex and have the same meaning as dissociation constants. They are shown in Table 3.6 together with the corresponding relative binding free energies with respect to inhibitor I2, ∆∆G2→Y, and the ranks of the inhibitors binding to the receptors. Clearly, it appears from these results that the entire set of inhibitors


Figure 3.13: Schematic illustration of the inhibitors classified according to their relative affinities to trypsin and factor Xa. The resolution of the binding affinities between the different groups is approximately 10 kJ/mol.

binds much more strongly to factor Xa than to trypsin. The weakest inhibitor for factor Xa (i.e. I8) still binds more strongly than the strongest inhibitor for trypsin (i.e. I2). The strength of the binding varies greatly depending on the inhibitor in the case of factor Xa, with more than two orders of magnitude between the strongest and the weakest inhibitors. In contrast, all the inhibitors bind with a similar strength to trypsin: less than one order of magnitude separates the strongest and the weakest inhibitors. In comparison, the difference in binding strength calculated from the simulations was found to be around three orders of magnitude between the strongest and the weakest inhibitors in both proteins. Table 3.6 shows that inhibitor I2 binds much more strongly to both proteins relative to the other inhibitors than indicated by the simulations. Inhibitor I2 appears to be the strongest inhibitor for trypsin and one of the strongest inhibitors for factor Xa (it is separated from I1 by only 1 kJ/mol). Inhibitor I7 was correctly predicted to be one of the inhibitors with the highest affinity for factor Xa, but inhibitor I1, which experimentally has the highest affinity for factor Xa, ranks last in the results of the simulations. Results in trypsin are even more difficult to reproduce as the differences in free energy between the inhibitors are even smaller than in the case of factor Xa. Experimentally the difference in free energy between the strongest and the weakest inhibitor is only 4.7 kJ/mol. This is on the order of the average error in the calculations. Not surprisingly, the ranking of the inhibitors in trypsin could not be accurately determined.
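The conversion between inhibition constants and relative binding free energies used in this comparison follows ∆∆G = RT ln(Ki(Y)/Ki(I2)). The sketch below applies it to the Ki values of I8 and I2 listed in Table 3.6. The temperature of 298 K is an assumption, since the experimental temperature is not stated here, and the function names are illustrative.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def relative_binding_free_energy(ki: float, ki_ref: float,
                                 temperature: float = 298.0) -> float:
    """ddG = RT ln(Ki / Ki_ref) in kJ/mol; positive means weaker binding
    than the reference. Ki and Ki_ref must be in the same units."""
    return R_KJ * temperature * math.log(ki / ki_ref)


def orders_of_magnitude(ddg: float, temperature: float = 298.0) -> float:
    """Number of decades in Ki that a free energy difference corresponds to."""
    return ddg / (R_KJ * temperature * math.log(10.0))


# Ki values in nM from Table 3.6: I8 versus the reference inhibitor I2.
ddg_i8_xa = relative_binding_free_energy(35.0, 0.18)
ddg_i8_trypsin = relative_binding_free_energy(1200.0, 180.0)

# The 17 kJ/mol spread from the simulations expressed in decades of Ki.
decades_sim = orders_of_magnitude(17.0)
```

Under the assumed temperature these expressions give roughly 13 kJ/mol for I8 in factor Xa and 4.7 kJ/mol in trypsin, within a few tenths of a kJ/mol of the tabulated values, and the 17 kJ/mol spread from the simulations corresponds to about three orders of magnitude in Ki.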

Table 3.6: Experimental values of the inhibition constant Ki (nM), relative binding free energy (kJ/mol) of the inhibitors with respect to inhibitor I2, ∆∆G2→Y, in factor Xa and trypsin, their associated ranking and the relative free energy of binding of the inhibitors with respect to inhibitor I2 inside the two proteins.

             factor Xa                    trypsin                      trypsin - factor Xa
Inhibitor    Ki      ∆∆G2→Y    rank       Ki      ∆∆G2→Y    rank
I1           0.12    -1.0      1          280     1.1       2          2.1
I2           0.18    0         3          180     0         1          0
I3           5.8     8.5       6          760     3.6       7          -4.9
I4           6.3     8.7       7          960     4.1       8          -4.6
I5           0.68    3.3       4          290     1.2       3          -2.1
I6           0.98    4.2       5          380     1.9       4          -2.3
I7           0.17    -0.1      2          550     2.8       5          2.9
I8           35      12.9      10         1200    4.7       9          -8.2
I9           6.8     8.9       9          650     3.2       6          -5.7
I10          6.6     8.8       8          1200    4.7       9          -4.1

3.4 Discussion and conclusions

The accuracy of free energy calculations depends on two factors. The first is the accuracy of the force field, which can only be confirmed by comparison to experimental results. The second is the degree of sampling and convergence. This is in principle independent of whether we know the 'true' answer and was the main focus of this study. To quote a recent review by Chipot and Pearlman [43], “it is clear that in some cases the amazingly good agreement between theory and experiment.... must have been fortuitous”. For cases where the transformation involves a simple mutation (i.e. involves a small number of atoms and no change of charge) and the location of the compound inside the binding site is known experimentally, good agreement between the calculated binding affinities and experimental values is generally found. However, for large mutations that involve the annihilation and creation of adjacent groups of atoms, difficulties in sampling led to large errors when determining binding affinities. Note that even converged results can still deviate from experimental results due to simulation conditions which are not appropriate for the system investigated. For example, the treatment of the electrostatic interactions, such as the RF approach, is not ideal for the simulation of large proteins.

The calculations of the forward and the backward mutations in water show that reasonable convergence was obtained during the 150 ps sampling time at each λ point for all the transformations studied, except for I2 → I5. Slight improvements could be achieved by increasing the sampling time, as the average difference between the forward and the backward mutation was still slightly higher than the average estimated error. Another indication that the results had converged is that the mutations in water were insensitive to the initial or final conformation of the inhibitor. In particular, sufficient averaging over the different rotational states of the benzene rings was obtained. For I2 → I5, a mutation characterized by a long chain tertiary amine that can form favorable intramolecular interactions, approximately 6 ns at each λ value appeared to be enough to achieve convergence, except at λ = 0.65, for which 20 ns was still too short to sample the correct distribution of all the relevant conformations. Depending on the type of mutations performed, the free energy along closed cycles in water (that encompass 3 individual transformations) is in the range 0.4–5.4 kJ/mol. This range of error is smaller than the sum of the estimated errors of the individual transformations. This cycle closure is obtained when using the average values from the forward and the backward mutations, suggesting that a further increase in sampling time would certainly decrease the error.

The results of the free energy calculations involving the protein-ligand complexes show that, due to steric hindrance, full sampling of the rotational states of the aromatic rings does not occur on a time scale of a few hundred picoseconds. Thus, in order to evaluate the binding free energy in such cases, it is necessary to consider each of the possible orientations separately. For the three cases that we examined the difference between the free energies of the transformations with two different orientations ranged from ∼3 to ∼17 kJ/mol. Such differences are sufficient to distinguish between the two orientations of the inhibitor inside the binding pocket of the receptor.

The mutations performed in this study are very challenging in both type and complexity. The number of mutated sites, their proximity in space and the creation or annihilation of charges all contribute to the difficulty of obtaining converged results. Nevertheless, it has been possible to obtain closure of thermodynamic cycles containing 3 individual steps to within ±5 kJ/mol. This is a very high degree of convergence considering the mutations involved and should give confidence that, given an appropriate force field, nanosecond simulations may yield accurate estimates of the binding free energies in cases relevant to drug design.

As noted previously, no experimental data on the binding affinities of the different compounds was available at the time the work was performed and analyzed. Recently, however, some experimental data was communicated to us privately and in confidence by one of the organizers of the CATFEE contest. In light of these experimental results, it is possible to re-examine some of the conclusions of our work. The discrepancy between the results from experiment and simulations is striking. Two possible reasons for such a discrepancy are the accuracy of the force field and the degree of convergence. A third possibility is the accuracy of the experimental results. The force field might be responsible for not being able to correctly discriminate between inhibitors with similar binding free energies (with a difference on the order of a few kJ/mol), as is the case with the binding of the inhibitors to trypsin. However, when dealing with force field inaccuracies, errors should be systematic. Although the ranking of the inhibitors is expected to be different, the spread of the values should be similar in the experimental values and the values calculated from the simulations. The fact that in the case of trypsin the difference in binding free energy between the strongest and the weakest inhibitors is only 4.7 kJ/mol from experiment and 17 kJ/mol from simulations suggests other sources of error are also significant.

The results suggest clearly that in fact nanosecond simulations are too short to yield accurate estimates of the binding free energies in cases relevant to drug design and that closures of thermodynamic cycles are useful but by no means sufficient to ensure that convergence has been reached. They also suggest that the different orientations of the inhibitors inside the binding pocket of the receptor might need to be taken into account and not only the most favourable one. In this case the correct distribution would be required and much longer simulations would be needed.

Finally, despite the CATFEE competition fiasco, there is a clear need to objectively test methods to predict interaction free energies. The necessary data is already available in the proprietary data bases of various pharmaceutical companies. We can only hope that a small proportion of such data can be released so that true 'blind' tests can be performed and evaluated by the research community. This is certainly in the interest of everybody.


Chapter 4

Free energy calculations of the relative stability of the SUC1 dimer upon mutation

Free energy calculations remain a major challenge because of the extent of the conformational space that needs to be sampled, even in the case of free energy differences where only relevant conformations of the beginning and end states are required. However, the increase of computational power raises the question of the limitations of the methodology and whether the change in free energy associated with mutations in realistic systems such as proteins can be estimated. The relative stability of 17 mutants of the Suc1 protein dimer is evaluated using molecular dynamics simulations together with thermodynamic integration. Comparison with experiment gives insight into the current state of the method and allows one to examine the different factors influencing the accuracy of the results.



4.1 Introduction

The direction in which a chemical reaction proceeds and the position of the chemical equilibrium (the equilibrium constant) are both determined by the difference in free energy between different states of a system. In biological systems this means that important properties such as the partitioning behavior of functional groups or the response of a system to changes in external conditions (temperature, pressure, pH) are also governed by differences in free energy. The ability to accurately determine differences in free energy is therefore of great interest in biophysics and structural biology as it would allow the prediction of phenomena such as conformational changes or protein-ligand interactions. The prediction of the change in free energy associated even with simple processes such as the binding of a ligand to a receptor remains a major challenge. Free energy calculations are very demanding. They in principle require the sampling of the complete conformational space (in the case of the absolute free energy) or at the very least extensive sampling of relevant conformations of the system (in the case of free energy differences), which is still a formidable task even for simple systems.

Nevertheless, free energy calculations are increasingly used to calculate solvation free energies [20, 59, 56] and relative binding affinities of ligands to proteins [21, 60, 22]. With the rapid increase of computational resources and improved sampling techniques [21, 59], the question arises of what the limitations of the methodology are. For example, is it possible to estimate the change in free energy associated with the effect of amino-acid substitutions on protein-protein interactions [61, 62]?

In this chapter the effect of the mutation of a residue on the stability of the Suc1 (p13suc1) protein dimer is evaluated. The relative free energy of dissociation has been calculated for 17 mutants of the Suc1 protein using molecular dynamics (MD) simulation techniques together with the thermodynamic integration (TI) formula.

Suc1 is a member of the cks (cyclin-dependent kinase subunit) family and is involved in cell cycle regulation in eukaryotic cells [63]. It was the first member of the cks family of cell-cycle regulatory proteins to be isolated. The cks family contains a conserved sequence (HVPEPH in single-letter amino acid code) corresponding to a hinge loop that mediates dimerization through the exchange of a C-terminal β-strand. While the monomer has a standard globular form [64], the dimer forms a β-strand exchanged dimer (swap dimer [65, 66]) in which the exchanged β-strand is replaced by the equivalent β-strand of the other protein of the dimer [67]. As can be seen in Fig. 4.1.a, the monomer is composed of four β-sheet strands and three short helices. The dimer has an identical structure except for the C-terminal β-strand β4 (residues 94-101), which is exchanged, and the hinge loop (the loop preceding β4), which is extended (Fig. 4.1.b) whereas it folds back on itself to form a β-hairpin in the monomer. It has been proposed that conversion between the monomeric form and the β-strand exchanged dimer is important for the regulation of its biological function [68]. While the monomer can

Figure 4.1: Ribbon representation of the backbone of the crystallographic structure of Suc1 in its monomeric [64] and strand-exchanged dimeric form [67]. (a) The monomer has a standard globular form. The structure comprises a four stranded β-sheet capped at one end by three short helices. (b) The dimer forms the same structure as in the monomer except for the hinge region which connects the exchanged β-strand β4 with the rest of the protein and which is extended in the dimer. The structural elements are assigned as follows (both in the monomer and in the dimer): α helix 1, α1, residues 11-22; α2, 45-49; α3, 68-78; β-strand 1, β1, 25-33; β2, 36-43; β3, 82-85; β4, 94-101. The hinge region is located between β3 and β4 (residues 88-93).

bind to cdc2, a cdk (cyclin-dependent kinase) enzyme involved in the regulation of the mitotic checkpoints, dimerization would prevent the binding due to the burial of the binding region in the β-strand exchanged form.

Suc1 has attracted much attention from a structural as well as a functional perspective. In order to understand the mechanism by which domain swapping occurs and which factors could control it, the folding pathway [69, 70] and the effect of mutations [71] have been extensively studied. It has been proposed that conversion between the monomer and the dimer of Suc1 occurs via the denatured state and that partitioning between the monomer and dimer is controlled by two prolines in the hinge loop. In the current study, a large number of mutations have been selected from the vast array of mutations studied experimentally [71]. Our aim is to assess the applicability of free energy calculations to estimate changes in protein-protein interactions in a realistic system. The examples selected have allowed us to examine the influence of the type of mutation on the accuracy of the results. They have also allowed us to separate the effects of sampling from force field considerations, as many of the transformations involved the mutation of the same residue type located in different parts of the protein. It should also be noted that the mutations chosen for this study are non-trivial as all involve the creation/deletion of atomic sites.

4.2 Method

Free energy calculations

The difference in free energy associated with various single mutations was estimated using the coupling parameter approach. In this approach the Hamiltonian, and thus the free energy, is made a function of a coupling parameter λ. If the initial state A and the end state B of a system are described by the Hamiltonians H(A) and H(B), respectively, the free energy difference between the two states can be expressed as

\[
\Delta F_{BA} = F(\lambda_B) - F(\lambda_A) = \int_{\lambda_A}^{\lambda_B} \frac{\partial F(\lambda)}{\partial \lambda}\, d\lambda \tag{4.1}
\]

where $\lambda_A$ and $\lambda_B$ correspond to the states of the system such that $H(\lambda_A) = H(A)$ and $H(\lambda_B) = H(B)$. It can be shown [15] that

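\[
\frac{\partial F(\lambda)}{\partial \lambda} = \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_{\lambda} \tag{4.2}
\]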

where $\langle \ldots \rangle_\lambda$ represents an ensemble average at the corresponding λ value. The free energy difference between the states A and B is thus given by

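\[
\Delta F_{BA} = \int_{\lambda_A}^{\lambda_B} \left\langle \frac{\partial H(\lambda)}{\partial \lambda} \right\rangle_{\lambda} d\lambda \tag{4.3}
\]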

which corresponds to the thermodynamic integration formula [15]. The integration of equation 4.3 was performed by simulating the system at a number of fixed λ points and evaluating the integral numerically. This way the equilibration of the simulation at each λ point could be controlled and extra λ points added if needed. The error was calculated for each $\langle \partial H(\lambda)/\partial\lambda \rangle_\lambda$ using block averaging methods [72, 57], and an estimate of the error for $\Delta F_{BA}$ was obtained by integrating over the errors at individual λ values.
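The numerical integration described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: it assumes the trapezoidal rule (mentioned later in the chapter) and applies the same rule to the per-λ errors, which is one reading of "integrating over the errors". The function names and the three-point data set are hypothetical.

```python
def trapezoid(xs, ys):
    """Integrate ys over xs with the trapezoidal rule."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching points")
    total = 0.0
    for i in range(len(xs) - 1):
        total += 0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
    return total


def ti_free_energy(lambdas, dhdl_means, dhdl_errors):
    """Return (delta_F, error) from per-lambda averages of dH/dlambda.

    The error estimate integrates the per-lambda errors over lambda,
    which is an assumed reading of the procedure described in the text.
    """
    return trapezoid(lambdas, dhdl_means), trapezoid(lambdas, dhdl_errors)


# Hypothetical values, for illustration only (kJ/mol).
lambdas = [0.0, 0.5, 1.0]
means = [0.0, 10.0, 20.0]
errors = [1.0, 1.0, 1.0]
delta_f, delta_f_err = ti_free_energy(lambdas, means, errors)
```

Because the λ grid is passed explicitly, extra λ points can be added where the profile is steep without changing the code.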

Soft-core potentials

The mutations performed during the simulations involved the deletion of atoms, since residues other than Gly were mutated into Ala (see Table 4.1 for a list of the mutations). This is known to give rise to numerical instabilities [54, 40] due to a singularity at r = 0 in the Lennard-Jones and Coulomb potentials. In order to avoid this problem, soft-core potentials were used at sites where atoms are created or deleted, replacing the singularity $\lim_{r \to 0} V(r) = \infty$ by a core of finite height [54, 19]. Interactions of intermediate states between A and B were interpolated using

\[
V(r, \lambda) = (1 - \lambda)\, V_A(r_A) + \lambda\, V_B(r_B) \tag{4.4}
\]

where $V_A$ and $V_B$ correspond to the normal potentials in states A and B, respectively. The distances $r_A$ and $r_B$ were defined as follows

\[
r_A = \left( \alpha \sigma_A^6 \lambda^2 + r^6 \right)^{1/6} \tag{4.5}
\]

\[
r_B = \left( \alpha \sigma_B^6 (1 - \lambda)^2 + r^6 \right)^{1/6} \tag{4.6}
\]

The parameter σ has its normal meaning as in a Lennard-Jones potential. The soft-core parameter α controls the height of the potential around r = 0 and thus determines the softness of the potential. A value of 1.51 was used in all cases except for the set of simulations where Leu95 was mutated into Ala. In this case several different values of the soft-core parameter (1.21, 1.36, 1.51 and 1.70) were used in order to evaluate its influence on the convergence of the calculated free energy [20].
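As an illustration of equations 4.4 and 4.5, the sketch below evaluates the soft-core energy of a single atom that is deleted as λ goes from 0 to 1. It is a simplified example under stated assumptions: only the Lennard-Jones term is included (the Coulomb term is omitted), the 12-6 form with parameters σ and ε is assumed, state B is taken as non-interacting so that the V_B term vanishes, and reduced units (σ = ε = 1) are used. The function names are hypothetical.

```python
def lennard_jones(r, sigma, epsilon):
    """Plain 12-6 Lennard-Jones potential (assumed functional form)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def soft_core_lj(r, lam, alpha=1.51, sigma=1.0, epsilon=1.0):
    """Soft-core energy of an atom that is deleted as lam goes from 0 to 1.

    State A is the fully interacting atom; state B has no interaction,
    so the V_B term of equation 4.4 vanishes and only r_A is needed.
    """
    r_a = (alpha * sigma ** 6 * lam ** 2 + r ** 6) ** (1.0 / 6.0)
    if r_a == 0.0:
        return float("inf")  # lam = 0 and r = 0: the unmodified singularity
    return (1.0 - lam) * lennard_jones(r_a, sigma, epsilon)


# Energy at full overlap (r = 0) halfway through the transformation.
height_default = soft_core_lj(0.0, 0.5)            # alpha = 1.51
height_soft = soft_core_lj(0.0, 0.5, alpha=1.70)   # softer core
height_hard = soft_core_lj(0.0, 0.5, alpha=1.21)   # harder core
```

For any λ > 0 the energy at r = 0 is finite, and a larger α gives a lower core, in line with the description of α given above.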

Dimerization free energy

In order to relate the free energy associated with the mutation of one residue inside the protein to the change in stability observed experimentally in the dimer, the following thermodynamic cycle was used

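\[
\begin{array}{ccc}
\text{Dimer (WT)} & \xrightarrow{\;\Delta G_{\mathrm{diss}}(WT)\;} & \text{Mono (WT)} + \text{Mono (WT)} \\[4pt]
\Big\downarrow {\scriptstyle \Delta G^{\mathrm{dimer}}_{M-WT}} & & \Big\downarrow {\scriptstyle 2\Delta G^{\mathrm{monomer}}_{M-WT}} \\[4pt]
\text{Dimer (M)} & \xrightarrow{\;\Delta G_{\mathrm{diss}}(M)\;} & \text{Mono (M)} + \text{Mono (M)}
\end{array}
\]

where $\Delta G^{\mathrm{dimer}}_{M-WT}$ and $\Delta G^{\mathrm{monomer}}_{M-WT}$ correspond to the free energy associated with the mutation occurring in the dimer and the monomer, respectively. As the free energy is a state function, the change in free energy for any closed path along the thermodynamic cycle is 0. The relative stability of the wild type (WT) dimer with respect to the mutant (M) can therefore be expressed as

\[
\Delta\Delta G_{D-M} = \Delta G_{\mathrm{diss}}(M) - \Delta G_{\mathrm{diss}}(WT) \tag{4.7}
\]

\[
\Delta\Delta G_{D-M} = 2\Delta G^{\mathrm{monomer}}_{M-WT} - \Delta G^{\mathrm{dimer}}_{M-WT} \tag{4.8}
\]

where $\Delta\Delta G_{D-M} < 0$ indicates that the WT dimer is more stable than the M dimer.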
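The step from equation 4.7 to equation 4.8 can be written out explicitly. The following is a sketch that assumes the dissociation legs of the cycle run from dimer to monomers and the mutation legs from WT to M. Going once around the cycle, starting and ending at the WT dimer, the free energy changes sum to zero:

\[
\Delta G_{\mathrm{diss}}(WT) + 2\Delta G^{\mathrm{monomer}}_{M-WT} - \Delta G_{\mathrm{diss}}(M) - \Delta G^{\mathrm{dimer}}_{M-WT} = 0
\]

Rearranging gives

\[
\Delta G_{\mathrm{diss}}(M) - \Delta G_{\mathrm{diss}}(WT) = 2\Delta G^{\mathrm{monomer}}_{M-WT} - \Delta G^{\mathrm{dimer}}_{M-WT}
\]

The factor 2 appears because dissociation of one dimer releases two monomers, each of which carries the mutation.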

Experimental values used for comparison were taken from [71]. The effect of the mutation on the equilibrium between the monomer and dimer of Suc1, $\Delta\Delta G_{D-M}$, was directly calculated from the dissociation constants of the wild type and the mutants of Suc1:

\[
\Delta\Delta G_{D-M} = -RT \ln \left( \frac{K_d^{M}}{K_d^{WT}} \right)
\]

The dissociation constants were determined by size-exclusion chromatography at 298 K and pH 7.5 [71]. Samples were equilibrated at 323 K and then transferred to ice for 5 min before separation. The presence of a kinetic barrier between the monomer and the dimer at low temperature means that the measurements reflect the equilibrium at 323 K. As a consequence, the simulations were also performed at 323 K in order to sample the same statistical ensemble.
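The conversion from dissociation constants to a relative free energy is a one-line calculation. The sketch below is illustrative only: the function name is hypothetical, the temperature is passed explicitly because the source does not state which value was inserted into RT, and the ratio of dissociation constants is an invented example rather than a measured value.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ mol^-1 K^-1


def ddg_from_kd(kd_mutant, kd_wild_type, temperature):
    """Relative dissociation free energy (kJ/mol) from two dissociation constants."""
    return -R_KJ * temperature * math.log(kd_mutant / kd_wild_type)


# Hypothetical case: the mutant dimer has a ten times larger dissociation constant.
example = ddg_from_kd(10.0, 1.0, 323.0)
```

A ten-fold change in the dissociation constant corresponds to roughly 6 kJ/mol at 323 K, which gives a sense of scale for the experimental values discussed in this chapter.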

Simulation parameters

The molecular dynamics simulations were performed using the Gromacs software package (version 3.1) [28, 29] together with the GROMOS96 force field [30]. In this force field, non-polar hydrogen atoms are treated as united atoms together with the carbon atom to which they are attached. The calculation of the relative stability of the wild type dimer with respect to its mutants requires the simulation of the mutations of both the monomer and the dimer. For the simulation of the monomer, the protein was placed in a dodecahedral box containing approximately 5600 simple point charge (SPC) water molecules [31]. The dimer was placed in a dodecahedral box containing approximately 15000 water molecules. The covalent bonds of the protein were constrained using the LINCS algorithm [33], while the bonds and angle of the water molecules were constrained using the SETTLE algorithm [53]. The time step used to integrate the equations of motion was 2 fs. Simulations were performed using periodic boundary conditions. The system was weakly coupled to a heat bath and a pressure bath [32]. Protein and solvent were coupled separately to a temperature bath at 323 K with a coupling time of 0.1 ps. Pressure was controlled by coupling to a reference pressure of 1 bar with a coupling time of 1 ps. The isothermal compressibility was 4.6 × 10⁻⁵ bar⁻¹. A twin-range cut-off was used for the calculation of the non-bonded interactions. The short-range cut-off, within which interactions were calculated every time step, was set to 0.9 nm. The long-range cut-off was set to 1.4 nm for both the electrostatic and van der Waals interactions, which were calculated every 5 time steps during the neighbor-list update. A reaction field with a dielectric constant εRF = 78 was applied to account for the electrostatic interactions with the solvent beyond the long-range cut-off. Monomer and dimer were energy minimized using a steepest descent algorithm and equilibrated for 100 ps using position restraints on the atoms of the proteins. Monomer and dimer were further relaxed for 10 ns without position restraints before performing the free energy calculations.

For each mutation, the free energy was calculated using 18 λ-points. The starting structures were obtained by relaxing the system for 100 ps at a given λ-point and using the last conformation as the starting structure for the relaxation at the next λ-value. The structures were further equilibrated for 100 ps at each λ-point. The ensemble average $\langle \partial H/\partial\lambda \rangle_\lambda$ was calculated at each λ-point over 400 ps for the monomer and over 200 ps for the dimer. The resulting free energy profile was then integrated using the trapezoidal rule to obtain the mutation free energies $\Delta G^{\mathrm{monomer}}_{M-WT}$ and $\Delta G^{\mathrm{dimer}}_{M-WT}$. Each mutation involved an accumulated simulation time of 7.2 ns for the monomer and 3.6 ns for the dimer (data collection only).

Suc1 and the mutations

Suc1 is a 113-residue protein with an α-β fold. The crystal structures of the monomer [64] and the dimer [67] were used as starting points for the creation of the structures used in the simulations. The crystal structure of the dimer (PDB entry 1SCE) contains two independent, non-identical β-interchanged Suc1 dimers in the asymmetric unit, which will be referred to as DimerA-C and DimerB-D, where A, B, C and D simply indicate the successive chains in the PDB file. The residues 1-5 and 102-113 were disordered and did not appear in the crystal structures of the monomer and dimers. The absence of these residues in the crystal structures indicates that they are highly flexible and therefore are not likely to play an important role in the stabilization of the structure. These residues were not included in the structures used to perform the simulations, as modelling these residues would have resulted in additional uncertainties. However, the non-inclusion of the residues has the consequence that, at least in the dimer, the positively charged N-terminus (N-ter) of the protein could interact directly with the negatively charged C-terminus (C-ter).

In order to determine which dimer was most suitable for the free energy calculations, a set of 10 ns simulations was performed with each of the 4 starting structures. Two structures were created, with charged and with uncharged terminal residues, for each of the two independent dimers. An acetyl group (COCH3) was used as the uncharged N-terminal group, while an N-methyl amide group (NHCH3) was used as the uncharged C-terminal group. Results of the simulations indicated that DimerB-D with the uncharged terminal groups (DimerB-D-nce) was the most stable, showing the lowest root mean square deviation from the crystallographic structure. As a consequence, the structure of DimerB-D-nce was used as the starting structure for the simulations of the mutation of the Suc1 dimer. The residue numbering used in the text is the same as that used in the crystallographic structure.

Different types of mutations were performed in this study: some only involve the mutation of neutral groups and atoms (mutation of Leu and Val into Ala), some involve the mutation of polar atoms (mutation of Ser and Tyr into Ala) and some involve the mutation of positively or negatively charged residues (mutation of Lys and Glu into Ala). A complete list of all the mutations performed on the monomer and dimer, together with the size of the mutations, is given in Table 4.1. Mutations of the same type of residue but located at different positions in the protein (Leu is mutated into Ala in eight different locations) have allowed us to check the influence of sampling on the results independently of force field considerations.

Table 4.1: List of the mutations performed on the dimer and monomer of Suc1. In the first column, the first residue corresponds to the native residue while the second corresponds to the residue into which it is mutated. In the third column, the mutation size is given for the monomer. In the case of the dimer, the number of sites involved in the mutations is double the value given.

| Type of mutation | Residue mutated (residue No) | Mutation size (No of groups mutated / deleted) |
|---|---|---|
| Val - Ala | 41, 87, 89 | 3 / 2 |
| Leu - Ala | 10, 18, 43, 48, 63, 74, 95, 96 | 4 / 3 |
| Tyr - Ala | 38 | 9 / 8 |
| Ser - Ala | 13, 79 | 3 / 2 |
| Lys - Ala | 49, 98 | 8 / 7 |
| Glu - Ala | 86 | 5 / 4 |

4.3 Results and discussion

The root mean square positional deviation (RMSD) of all backbone atoms with respect to the starting crystal structure of the monomer and dimer as a function of time during the equilibration simulation is shown in Fig. 4.2. The RMSD of the monomer at the end of the simulation is 0.45 nm. Although 0.45 nm is relatively large, the global structure of the monomer is conserved. The largest deviation from the crystal structure involves the residues preceding the first α-helix α1 (residues 6 to 11). This is demonstrated in Fig. 4.2.a, which contains the RMSD of the backbone of the monomer both with and without this region. During the first 3.6 ns of the simulation the RMSD values fluctuate strongly and are identical with or without this region. After 3.6 ns the RMSD calculated excluding this region remains constant at around 0.3 nm, whereas the RMSD for the whole molecule continues to rise. This N-terminal region, representing only 5% of the sequence, dominates the deviation observed with respect to the crystal structure (33% of the total deviation) as well as the fluctuations observed after 3.6 ns. In general the elements of secondary structure are conserved and have the same relative positions as in the crystal structure. The local interactions outside the 6 N-terminal residues are thus expected to be similar to those in the crystal structure.

Overall, the dimer showed an even larger deviation than the monomer (Fig. 4.2.b). This was unexpected as the dimer was resolved at a higher resolution than the monomer. The RMSD also fluctuates significantly, with the value at the end of the simulation being 1.09 nm. However, in the case of a dimer, high RMSD values do not necessarily indicate conformational changes. Small differences in the relative orientations of the monomers can also result in high RMSD values. As can be seen in Fig. 4.2.b, the RMSD calculated for each of the monomers independently is much lower than that of the dimer, although still very high, with 0.74 nm for monomer 1 and 0.65 nm for monomer 2. From visual inspection it was evident that the region of the hinge loop (starting at residue 88) and the exchanged C-terminal β-strand β4 deviate the most with respect to the crystal structure in both monomers. Excluding this region, the RMSD of monomer 1 drops to 0.54 nm and that of monomer 2 to 0.34 nm. It should be noted that the secondary structure of the dimer is still conserved despite the high RMSD values. The exchanged β-strand of each monomer is still fully part of the four-stranded β-sheet of the other monomer. The RMSD of the dimer is not affected by the exclusion of residues 89-101, indicating that the high value of the RMSD in the dimer is mainly due to changes in the relative orientation of the monomers in the absence of crystal packing effects.

[Figure 4.2 plot. Panel (a), Monomer-nce: backbone RMSD (nm) versus time (ns, 0 to 10) for residues 11-101 and residues 6-101. Panel (b), DimerB-D-nce: backbone RMSD (nm) versus time (ns, 0 to 10) for the dimer, monomer 1 and monomer 2, each for residues 6-88 and residues 6-101.]

Figure 4.2: Root mean square deviation (RMSD) of the backbone atoms of Monomer-nce (a) and DimerB-D-nce (b) as a function of time for the 10 ns equilibration simulation, with respect to their respective crystal structures. Different lengths of backbone are used in order to show the influence of different segments on the RMSD value.
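The observation that a change in the relative orientation of two otherwise rigid monomers inflates the RMSD of the dimer can be reproduced with a toy model. The sketch below is a two-dimensional illustration only, not the analysis performed in the study: the coordinates are invented, the "monomers" are squares of four points, and the optimal superposition uses the closed-form rotation angle available in two dimensions rather than a full three-dimensional fit.

```python
import math


def centered(points):
    """Shift 2D points so that their centroid is at the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return [(x - cx, y - cy) for x, y in points]


def fitted_rmsd(mobile, reference):
    """RMSD after optimal translation and rotation (2D toy version)."""
    a, b = centered(mobile), centered(reference)
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(cross, dot)  # rotation angle that minimises the RMSD
    c, s = math.cos(theta), math.sin(theta)
    sq = 0.0
    for (ax, ay), (bx, by) in zip(a, b):
        rx, ry = c * ax - s * ay, s * ax + c * ay
        sq += (rx - bx) ** 2 + (ry - by) ** 2
    return math.sqrt(sq / len(a))


def rotate(points, angle, pivot):
    """Rotate points rigidly by angle (radians) about pivot."""
    c, s = math.cos(angle), math.sin(angle)
    px, py = pivot
    return [(px + c * (x - px) - s * (y - py), py + s * (x - px) + c * (y - py))
            for x, y in points]


# Two rigid, hypothetical "monomers" (arbitrary units).
mono1 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
mono2 = [(2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (2.0, 1.0)]
# Rotate monomer 2 as a rigid body about a hinge point; monomer 1 is unchanged.
mono2_moved = rotate(mono2, math.radians(40.0), pivot=(1.5, 0.5))

rmsd_dimer = fitted_rmsd(mono1 + mono2_moved, mono1 + mono2)
rmsd_mono2 = fitted_rmsd(mono2_moved, mono2)
```

In this toy case the RMSD of monomer 2 fitted on its own is zero to numerical precision, while the RMSD of the whole assembly is clearly non-zero, even though neither unit has changed internally.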

The free energies associated with the mutations listed in Table 4.1 for the monomer and dimer are reported in Table 4.2. Also listed in Table 4.2 is the relative stability of each of the mutant dimers with respect to the wild type, calculated from the simulations and determined from experiment. The statistical errors are shown in parentheses.

Table 4.2: List of the free energies obtained for the mutations of the Suc1 protein ($\Delta G_{WT \to M}$) both for the monomer and the dimer using thermodynamic integration. The relative stability of the mutant dimers with respect to the wild type dimer ($\Delta\Delta G_{DM}$) calculated from the simulations of the monomer and the dimer is also reported, together with the values obtained from experiment. Statistical errors are indicated in parentheses.

| Mutation | $\Delta G_{WT \to M}$ mono (kJ/mol) | $\Delta G_{WT \to M}$ dimer (kJ/mol) | $\Delta\Delta G_{DM}$ simulation (kJ/mol) | $\Delta\Delta G_{DM}$ experiment (kJ/mol) |
|---|---|---|---|---|
| V41A | 1.5 (0.6) | 8.1 (1.7) | -5.1 (2.3) | -2.7 |
| V87A | -1.4 (1.3) | -7.7 (2.0) | 4.9 (3.3) | -0.2 |
| V89A | -5.2 (1.3) | 17.1 (2.3) | -27.5 (3.6) | -2.3 |
| L10A | 2.9 (1.1) | 21.6 (1.9) | -15.8 (3.0) | -5.2 |
| L18A | 11.6 (2.9) | 25.4 (1.8) | -2.2 (4.7) | -2.4 |
| L43A | 9.4 (1.2) | 28.7 (2.0) | -9.9 (3.2) | 1.4 |
| L48A | 5.7 (1.3) | 26.8 (1.8) | -15.4 (3.1) | 2.9 |
| L63A | 14.6 (1.4) | 21.1 (1.6) | 8.1 (3.0) | -0.2 |
| L74A | 11.8 (1.4) | 13.9 (2.4) | 9.7 (3.8) | -4.4 |
| L95A | 4.6 (1.4) | 28.1 (2.0) | -18.9 (3.4) | 0.4 |
| L96A | 13.9 (1.0) | 12.6 (2.3) | 15.2 (3.3) | 2.7 |
| Y38A | 79.5 (1.2) | 153.3 (2.8) | 5.7 (4.0) | 5.3 |
| S13A | 29.5 (1.1) | 56.7 (1.2) | 2.3 (2.3) | -2.6 |
| S79A | 27.4 (0.9) | 55.3 (1.0) | -0.5 (1.9) | -2.8 |
| K49A | 193.1 (1.4) | 381.3 (2.4) | 4.9 (3.8) | -3.5 |
| K98A | 216.3 (2.5) | 450.6 (3.2) | -18.0 (5.7) | 3.6 |
| E86A | 276.6 (1.2) | 586.9 (4.2) | -33.7 (5.4) | -7.0 |

The results are also summarized in Fig. 4.3, in which the calculated values (on the x-axis) are plotted against the experimental values (on the y-axis). Clearly, only a small subset of the calculated values match the experimental values. While the experimental values range between -7.0 and +5.3 kJ/mol, the calculated values range from -33.7 to +15.2 kJ/mol. The average deviation between the values from the simulations and from experiment is 11.3 kJ/mol. Even predicting the relative stability of the dimer for comparatively simple mutations, such as the 8 mutants involving the transformation of a Leu residue into an Ala, proved problematic. The mutation of Leu to Ala is still relatively simple, only involving the deletion of three interaction sites with no change in charge. With the exception of the mutation L18A, for which the difference with the experimental value is only 0.2 kJ/mol, all Leu to Ala mutations show large deviations from experiment, with the average deviation being 11.8 kJ/mol. Clearly, given the spread of the values observed, any match between the calculations and experiments is purely coincidental.

The mutation of a Val residue into Ala is even simpler, only involving the deletion of two interaction sites. This appeared to give better results. Of the three mutations of Val into Ala that were performed, two gave reasonable results: the mutations V41A and V87A, for which the differences between simulation and experiment were 2.4 kJ/mol and 5.2 kJ/mol, respectively. However, the mutation V89A showed a difference of 25.2 kJ/mol. This casts doubt on the reliability of the results obtained for the mutations V41A and V87A. The results for the two mutations of Ser into Ala were also reasonable. The deviation from experiment for the mutation S79A was only 2.3 kJ/mol and for S13A it was 4.9 kJ/mol. This mutation involves the deletion of two interaction centers and some charge rearrangement. It is interesting to note that the mutation of the residue Tyr38 into Ala matched closely with experiment, with a difference of 0.4 kJ/mol, even though it involved the largest number of interaction sites deleted (12 groups deleted) and changes in the charge distribution. In this case the presence of an aromatic ring in the side-chain limits its flexibility, which might help convergence. The mutations of Lys and Glu into Ala yielded poor estimates of the relative binding affinity for the different dimers with respect to the wild type. This was, however, not surprising, as the change in net charge associated with these mutations results in a large amount of work against the system, as can be seen in Table 4.2 from the free energy $\Delta G_{WT \to M}$ of the mutations K49A, K98A and E86A for the monomer and the dimer. The relative affinities $\Delta\Delta G_{DM}$ are therefore the result of a small difference between large numbers, and obtaining high accuracy is difficult.
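The remark about small differences between large numbers can be made concrete with the values reported in Table 4.2. The sketch below recomputes the simulated relative stability for a few mutations from the monomer and dimer free energies (equation 4.8) and then shows how a 1% change in a single large term propagates. The 1% perturbation is a hypothetical figure chosen for illustration, and the variable names are not from the source.

```python
# (mono, dimer) mutation free energies in kJ/mol, copied from Table 4.2.
table = {
    "V41A": (1.5, 8.1),
    "L18A": (11.6, 25.4),
    "Y38A": (79.5, 153.3),
    "E86A": (276.6, 586.9),
}


def ddg(mono, dimer):
    """Equation 4.8: two monomer mutations minus one dimer mutation."""
    return 2.0 * mono - dimer


results = {name: round(ddg(m, d), 1) for name, (m, d) in table.items()}

# Sensitivity check: shift the dimer value of E86A by a hypothetical 1%.
mono, dimer = table["E86A"]
shift = ddg(mono, dimer * 1.01) - ddg(mono, dimer)
```

For E86A a 1% change in the dimer free energy moves the relative stability by almost 6 kJ/mol, which is comparable to the whole range of the experimental values.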

From Fig. 4.3, it is clear that even for identical mutations the relative free energies vary significantly. The $\Delta\Delta G_{DM}$ values calculated for the Leu to Ala mutations range from very negative (L95A, L10A and L48A) to very positive (L96A). This raises the question of how different the individual free energy profiles are. Fig. 4.4 shows the free energy profiles of all the mutations performed for the Suc1 monomer. As can be seen, the profiles for each type of mutation are in fact quite similar, indicating that the shape of the profiles is dominated by terms related to the nature of the mutation as opposed to the precise nature of the local environment. In the cases of the mutations KA and EA, a positive and a negative charge are destroyed, respectively. This results in a large amount of work being performed against the system, as can be seen from the high values of 〈∂H/∂λ〉. In contrast, the mutations VA and LA only involve the mutation of neutral atoms and thus the values of 〈∂H/∂λ〉 are much lower.

[Figure 4.3 plot. Scatter plot of the experimental $\Delta\Delta G_{DM}$ (y-axis, kJ mol⁻¹, -35 to 20) against the simulated $\Delta\Delta G_{DM}$ (x-axis, kJ mol⁻¹, -35 to 20). Each point is labelled with its mutation and grouped as LA, VA, YA, SA, KA or EA mutations.]

Figure 4.3: Calculated versus experimental relative free energy of dissociation for mutants with respect to the wild type of the Suc1 protein ($\Delta\Delta G_{DM}$). The diagonal line (slope = 1) corresponds to a perfect match between experimental values and values calculated from the simulations.

[Figure 4.4 plot. 〈dH/dλ〉 (kJ mol⁻¹, -200 to 800) versus λ (0 to 1) for the VA, LA, YA, SA, KA and EA mutations.]

Figure 4.4: Free energy profiles of all the mutations performed on the monomer of Suc1. The plots represent the derivative of the free energy with respect to λ as a function of λ. Identical mutations are represented with the same color. Equivalent mutations have very similar free energy profiles.

Fig. 4.5 shows the free energy profiles for all of the mutations of Leu and Val residues into Ala performed in both the monomer and the dimer. The shape of the profiles in the dimer is identical to that of the monomer; only the magnitude is a factor 2 larger. The free energy profiles of the two types of mutations are very similar. Fig. 4.5 shows, however, that identical mutations can have large differences in their free energy profiles, leading to large differences in the relative free energies. The differences are in general greatest for λ values between 0.4 and 0.6, but this is mutation dependent. For example, at λ = 0.5 the difference between 〈∂H/∂λ〉 of L10A and L63A is ∼ 95 kJ/mol. In contrast, the values of 〈∂H/∂λ〉 at λ = 0 for the same mutations are very close. The difference between the two most extreme values of the LA mutations is 12 kJ/mol. The statistical error for λ = 0 is extremely low, usually less than 1 kJ/mol. The magnitude of the statistical error is dependent on the degree of sampling and the magnitude of 〈∂H/∂λ〉.

• If there is a complete sampling of all the representative conformations during the time scale of the simulation, the statistical error calculated will be small and the result reliable.

• However, if only one energy minimum is sampled during the simulation, the statistical error will be small but the result is probably not reliable.

• Alternatively, several conformations may be partly sampled during the simulations. Unless these have close or identical free energy derivatives 〈∂H/∂λ〉, the estimate of the statistical error will be large and the results uncertain.
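The three cases listed above can be illustrated with a simple block-averaging estimate. This is a minimal sketch under stated assumptions: it uses equal-length blocks and the standard error of the block means, which may differ in detail from the block averaging methods cited in the text, and the two time series are invented numbers.

```python
import math


def block_average_error(samples, n_blocks):
    """Mean and standard error estimated from the scatter of block means.

    Samples left over after dividing into equal blocks are ignored.
    """
    block_len = len(samples) // n_blocks
    if n_blocks < 2 or block_len < 1:
        raise ValueError("need at least two non-empty blocks")
    means = []
    for b in range(n_blocks):
        block = samples[b * block_len:(b + 1) * block_len]
        means.append(sum(block) / block_len)
    mean = sum(means) / n_blocks
    var = sum((m - mean) ** 2 for m in means) / (n_blocks - 1)
    return mean, math.sqrt(var / n_blocks)


# Hypothetical dH/dlambda traces (kJ/mol): one trapped in a single state,
# one that switches to a second state halfway through the run.
trapped = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.8]
switching = [10.0, 10.2, 9.8, 10.1, 29.9, 30.0, 30.2, 29.8]
_, err_trapped = block_average_error(trapped, 4)
_, err_switching = block_average_error(switching, 4)
```

The trapped trace yields a very small error estimate although it says nothing about the state it never visited, whereas the partly sampled two-state trace yields a large one.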

Statistical error will be examined in much greater detail in the next chapter.

Fig. 4.6 shows the evolution of the χ1 and χ2 angles as a function of time. These angles describe the rotamers of all the Leu residues that have been mutated into Ala during the simulation of the monomer at λ = 0 and λ = 0.4. At λ = 0, very few transitions are observed for χ1 while more transitions occur for χ2. At λ = 0.4 many transitions occur both for χ1 and χ2. This is because close to λ = 0.5 the soft-core potential allows atoms to pass through one another. This greatly increases the configurational space accessible to the system. As can be seen from Fig. 4.5, the statistical error calculated for λ = 0.4 is much larger than for λ = 0. This suggests that much longer simulation times at intermediate λ values would be needed to obtain convergence. The small statistical error at λ = 0 does not necessarily imply that the results are more reliable, only that a limited range of conformational states could be sampled. However, the fact that the different LA mutations using different starting structures have similar 〈∂H/∂λ〉 suggests that different conformations have similar 〈∂H/∂λ〉 at λ = 0.

[Figure 4.5 plot. 〈dH/dλ〉 (kJ mol⁻¹) versus λ (0 to 1) with error bars. Panel (a): L10A, L18A, L43A, L48A, L63A, L74A, L95A and L96A, monomer (left, -100 to 100) and dimer (right, -200 to 200). Panel (b): V41A, V87A and V89A, monomer (left, -100 to 100) and dimer (right, -200 to 200).]

Figure 4.5: Free energy profiles of the LA and VA mutations. The graphs show the derivative of the free energy with respect to λ as a function of λ for the monomer (left) and the dimer (right). The error bars represent the statistical error obtained at each λ point using block averaging methods. Free energy profiles obtained for (a) the mutations L10A, L18A, L43A, L48A, L63A, L74A, L95A and L96A and (b) the mutations V41A, V87A and V89A.

[Figure 4.6 plot. χ1 and χ2 (degrees, 0 to 360) versus time (ps, 0 to 400) for LA10, LA18, LA43, LA48, LA63, LA74, LA95 and LA96, at λ = 0.00 and λ = 0.40.]

Figure 4.6: Evolution of the dihedral angles χ1 and χ2 of the Leu residues that are being mutated into Ala as a function of time. The plots correspond to the simulations of the monomer for λ = 0 and λ = 0.4.

The statistical error for λ = 1 was also extremely low, usually around 1 kJ/mol. This can be explained by the fact that λ = 1 corresponds to the Ala residue, which has no alternative rotamer states. However, should λ = 1 correspond to a different residue with a side chain capable of alternative conformations, the situation would be the same as for λ = 0.

Different values of the soft-core parameter α were investigated in order to evaluate the influence of α on the convergence of the free energy. Increasing α lowers the soft-core height (making the potential softer). Note that changing the α parameter does not alter the end states of the mutation and so does not affect the overall change in free energy. Changing α does, however, change the nature and distribution of the states sampled at intermediate values of λ. In Table 4.3 the relative stability of the L95A mutant dimer with respect to the wild type dimer, calculated using different values of the soft-core parameter α, is given.

Table 4.3: Relative stability of the Ala95 mutant dimer with respect to the wild type dimer (Leu95) calculated with different values of the soft-core parameter α. The value 1.51 for the α parameter corresponds to the default value used in the simulations. The results are to be compared with the experimental value $\Delta\Delta G_{DM}$ = 0.4 kJ/mol.

| α | $\Delta G_{WT \to M}$ mono (kJ/mol) | $\Delta G_{WT \to M}$ dimer (kJ/mol) | $\Delta\Delta G_{DM}$ (kJ/mol) |
|---|---|---|---|
| 1.21 | 2.7 (2.1) | 26.2 (1.3) | -20.8 (3.4) |
| 1.36 | 7.2 (2.1) | 22.3 (1.5) | -7.9 (3.6) |
| 1.51 | 4.6 (1.4) | 28.1 (2.0) | -18.9 (3.4) |
| 1.70 | 8.0 (1.4) | 24.7 (2.1) | -8.7 (3.5) |

Changing the soft-core parameter α does not affect the convergence significantly. The result closest to experiment was obtained using α = 1.36 and corresponded to a value of $\Delta\Delta G_{DM}$ = −7.9 kJ/mol. Although this was significantly closer to the experimental value of 0.4 kJ/mol than the result obtained using the default value of α, no systematic relationship between α and the degree of convergence is evident. The next closest value to experiment was obtained for α = 1.70.
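The statement that increasing α lowers the soft-core height can be checked directly from equation 4.5. The following worked expression is a sketch that assumes a 12-6 Lennard-Jones potential written as $V_A(r) = 4\varepsilon_A[(\sigma_A/r)^{12} - (\sigma_A/r)^{6}]$; the symbol $\varepsilon_A$ and this functional form are not given in the text. At $r = 0$, equation 4.5 gives $r_A^6 = \alpha \sigma_A^6 \lambda^2$, so that

\[
V_A(r_A)\big|_{r=0} = 4\varepsilon_A \left[ \frac{1}{(\alpha\lambda^2)^2} - \frac{1}{\alpha\lambda^2} \right]
\]

For $\alpha\lambda^2 < 2$, which holds for all values of α used here, this expression decreases as α increases. At λ = 0.5, for example, it evaluates to approximately $30.5\,\varepsilon_A$ for α = 1.21 and $12.7\,\varepsilon_A$ for α = 1.70, before multiplication by the factor $(1 - \lambda)$ of equation 4.4.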

The discrepancy between the values obtained from experiment and the values calculated from the simulations can have different origins. The force field is potentially the primary source of error. The accuracy of the force field can vary depending on the type of residue under consideration [56]. For this reason it is not straightforward to quantify the accuracy of a force field in a global sense [59]. Furthermore, the results observed in various tests can be greatly influenced by the protocols used to perform the simulations, and these protocols might be different from the one used initially for the force field parameterization.

In the case of the calculation of the (non-physical) free energy associated with the mutation of a residue into another one, it is clear that an accurate description of both the wild type and the mutant protein is required to obtain the correct answer. The solvation free energies for analogs of several of the residues involved in the mutations considered in this study have been shown to be quite accurate in different environments [56] using the GROMOS96 force field. The solvation free energies of analogs of hydrophobic amino acids such as Ala, Val and Leu are within 2 kJ/mol of the experimental value in both water and cyclohexane. Thus, from a force field perspective, the mutations of Val or Leu into Ala were expected to perform well irrespective of whether the residues are exposed to solvent or buried inside the protein. However, the results for the 8 mutations of a Leu into Ala show discrepancies with experiment much larger than 2 kJ/mol, the average deviation being 11.8 kJ/mol. Furthermore, as can be seen from Fig. 4.3, the values obtained for the identical mutations of Leu to Ala are randomly distributed, clearly indicating that the major source of error is not directly related to the accuracy of the force field for these specific amino acids.


The accuracy of the force field can also have indirect effects on the result. The force field determines which conformations are sampled during the simulations and therefore determines the local environment around the mutation site. The force field must be able to yield the correct structure if the free energy associated with a given mutation is to be estimated correctly.

The accuracy of the starting structures used for the simulations is also a crucial factor affecting the reliability of the calculations. This is especially true since the relaxation of the protein towards its lowest free energy conformations is not possible within the time scale of the simulation of the mutation. Both the monomer and the dimer were equilibrated for 10 ns. Thermal equilibration occurs on a much shorter time scale. The extended equilibration time was intended to give the structures the possibility to relax within the force field and to adjust to the removal of crystal packing effects. If the starting structure was incorrect it would be expected to deviate significantly during the course of an MD simulation. Since the results of free energy calculations are directly related to the local interactions around the mutation site, it is extremely important that the structure remains close enough to the native structure so that the local environment around the mutation site is appropriate. The resolution of the crystal structure of the dimer (2.2 Å) was higher than that of the monomer (2.7 Å). After the equilibration period of 10 ns both the structure of the monomer and that of the dimer had deviated from the starting crystal structure, with the monomer deviating to a much lesser extent than the dimer. Most of the deviation observed in the structure of the dimer was due to a change in the relative orientation of the monomers. The secondary structure of the monomers was retained. The difficulty is knowing whether these changes reflect problems in the force field or an inappropriate starting structure dominated, for example, by packing constraints.

Statistical errors in the calculations were estimated using a block averaging procedure. While this approach is widely used, it is based on the assumption that all thermally relevant conformations have been sampled appropriately. If only a few local minima are sampled during the course of a simulation, the statistical error will be underestimated. This can be tested by increasing the length of a simulation or by altering the initial conditions. If all the relevant regions of the conformational space have been sampled, increasing the simulation time should decrease the statistical error. An increase in the statistical error indicates that new conformations have been sampled. In the case of $\Delta\Delta G_{DM}$, statistical errors were calculated by summing the statistical errors obtained for $\Delta G_{WT \to M}$ of the monomer and the dimer. The value of the statistical error estimated in this way should correspond to an upper limit, as internal compensation of errors would be expected to lead to a lower value. The statistical errors calculated for $\Delta\Delta G_{DM}$ of the different mutations are given in Table 4.2 and indicated by parentheses. The values range from 1.9 kJ/mol to 5.7 kJ/mol. The average statistical error is 3.5 kJ/mol. Only the mutations L18A and Y38A give results that are correct within the statistical error, while the mutations V41A and S79A are also very close. In most of the cases, however, the statistical error does not account for the discrepancy between simulation and experiment. On average the difference between simulation and experiment is 11.3 kJ/mol while the statistical error is only 3.5 kJ/mol. Clearly the statistical error calculated using the block averaging procedure drastically underestimates the true uncertainty in the results. This is also indicated by the fact that small changes in the protocol, such as using a different value of α, lead to variations in the free energy much greater than the apparent statistical error within one calculation.
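The summary figures quoted above can be recomputed from the rounded entries of Table 4.2. The sketch below does this with plain Python; it is a bookkeeping check rather than part of the original analysis, and small differences from the quoted numbers could arise because the table values are rounded to one decimal place.

```python
# (simulation, statistical error, experiment) in kJ/mol, copied from Table 4.2.
ddg_table = {
    "V41A": (-5.1, 2.3, -2.7), "V87A": (4.9, 3.3, -0.2), "V89A": (-27.5, 3.6, -2.3),
    "L10A": (-15.8, 3.0, -5.2), "L18A": (-2.2, 4.7, -2.4), "L43A": (-9.9, 3.2, 1.4),
    "L48A": (-15.4, 3.1, 2.9), "L63A": (8.1, 3.0, -0.2), "L74A": (9.7, 3.8, -4.4),
    "L95A": (-18.9, 3.4, 0.4), "L96A": (15.2, 3.3, 2.7), "Y38A": (5.7, 4.0, 5.3),
    "S13A": (2.3, 2.3, -2.6), "S79A": (-0.5, 1.9, -2.8), "K49A": (4.9, 3.8, -3.5),
    "K98A": (-18.0, 5.7, 3.6), "E86A": (-33.7, 5.4, -7.0),
}

# Absolute deviation between simulation and experiment for each mutation.
deviations = {k: abs(sim - exp) for k, (sim, err, exp) in ddg_table.items()}
mean_dev = sum(deviations.values()) / len(deviations)
mean_err = sum(err for _, err, _ in ddg_table.values()) / len(ddg_table)

# Mutations whose deviation from experiment lies within the statistical error.
within = [k for k, (sim, err, exp) in ddg_table.items() if abs(sim - exp) <= err]

# Average deviation restricted to the eight Leu to Ala mutations.
leu = [k for k in ddg_table if k.startswith("L")]
mean_dev_leu = sum(deviations[k] for k in leu) / len(leu)
```

With the tabulated values the mean deviation is about 11.3 kJ/mol, the mean statistical error about 3.5 kJ/mol, and only L18A and Y38A fall within their error estimates, in agreement with the figures quoted in the text.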

Numerical errors can also arise from the integration over a small number of discrete values of the free energy derivative. Integrating using the trapezoidal rule, which corresponds to a linear interpolation scheme, gives results similar to those obtained with a cubic spline interpolation scheme. The biggest difference observed between the two schemes is 1.7 kJ/mol and corresponds to the mutation of Glu 86 into Ala in the dimer. This corresponds to 0.3 % of the value of the free energy obtained for this mutation. On average, however, the difference is extremely small: 0.5 kJ/mol calculated over all the mutations studied.
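As a minimal sketch of the trapezoidal integration referred to here, the function below integrates a set of <dH/dλ> values over an unevenly spaced λ grid. The grid and the derivative values are hypothetical and chosen only for illustration; they are not data from this work, and the cubic spline variant is not shown.

```python
def trapezoid(lambdas, dhdl):
    """Integrate <dH/dλ> over λ with the trapezoidal rule (linear interpolation)."""
    if len(lambdas) != len(dhdl) or len(lambdas) < 2:
        raise ValueError("need two or more matching (lambda, dH/dlambda) points")
    total = 0.0
    for i in range(1, len(lambdas)):
        width = lambdas[i] - lambdas[i - 1]          # spacing may be uneven
        total += 0.5 * width * (dhdl[i] + dhdl[i - 1])
    return total

# Hypothetical derivative values (kJ/mol) on a hypothetical λ grid.
lambdas = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]
dhdl = [35.0, 10.0, -20.0, -45.0, -10.0, 30.0, 58.0]

delta_g = trapezoid(lambdas, dhdl)
print(f"ΔG ≈ {delta_g:.2f} kJ/mol")
```

Because the rule only uses neighbouring pairs of points, the result is sensitive to how densely the region of rapidly varying <dH/dλ> is covered, which is why a comparison against a smoother interpolation scheme is a useful check.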

The source of error that most likely explains the large discrepancies observed between simulation and experiment is simply poor sampling. The difficulty the system has in crossing large energy barriers, mainly associated with the rotation of dihedrals, effectively limits the conformations that can be sampled. The time scale on which the simulations are performed is simply too short, whether the side chains described by the dihedrals are buried or supposedly free in solution. There are multiple minima that the system cannot explore. Different simulations will sample different conformations, leading to “structural hysteresis”. This can be quantified by performing simulations in the reverse direction, thus starting from a different conformation, and testing whether the simulations have converged.

4.4 Conclusion

Free energy calculations in proteins using the thermodynamic integration method still remain extremely difficult even for very simple mutations such as the transformation of Leu into Ala. While small mutations involving the deletion of up to two groups with or without charge rearrangements (mutations of Val or Ser into Ala respectively) gave reasonable results, the results of the series of Leu to Ala mutations appear to be randomly distributed when plotted against the experimental results. Although the size of the mutation seems to be related to the discrepancy observed between simulation and experiment, the results obtained for the mutation of Tyr into Ala suggest that other factors such as the rigidity of the side chain might help convergence. Mutations of residues associated with a change in net charge, whether positive or negative (mutations of Lys or Glu into Ala), do not give accurate results due to the large amount of work done against the system. In order to explain the results observed for the series of Leu to Ala mutations, several sources of error were evaluated, from the quality of the starting structure to the quality of the force field or even the integration scheme used to evaluate the results. Due to the random character of the results obtained for the LA mutation series, the force field could be ruled out as the main cause of the discrepancy between simulation and experiment. The main source of error is believed to be the sampling error. Simulations starting from different conformations can be used to confirm this assumption and will be the focus of the next chapter.


Chapter 5

Sampling and convergence in free energy calculations: Suc1 as a case study

Convergence is a critical problem in free energy calculations. The accuracy and reliability of the calculations depend on it. In this chapter, factors that affect the convergence properties of free energy calculations and the related sources of error, both sampling and statistical, are investigated using three mutations of the Suc1 protein (L74A, L95A and V89A) described in the previous chapter as examples. In particular, two common methods used to calculate statistical errors in free energy calculations are reviewed and their reliability assessed.



5.1 Introduction

To be able to accurately estimate differences in free energy between different states of a system from computer simulations is one of the ultimate goals of molecular modeling. However, as we have shown, predicting the effect of amino acid substitutions on protein-protein interactions is still extremely difficult even for relatively simple mutations. In the previous chapter, a series of free energy calculations were performed in order to study the effect of various mutations on the stability of the dimer of the protein Suc1. Agreement with experiment was extremely poor. Different possible sources of error were evaluated. It was concluded that the main reason for the discrepancy between simulation and experiment was likely insufficient sampling. It was simply not possible to sample a representative proportion of the phase space accessible to the molecule. Many studies have addressed the problem of convergence in free energy calculations in order to better characterize their accuracy and reliability [72, 57, 73, 74, 61, 62]. Convergence is a major consideration in free energy calculations, as it has been found that the amount of sampling needed to obtain reliable results is often significantly greater than that frequently used to perform the calculations. Many studies have explored the convergence behavior of free energy calculations in a very detailed manner, generally overcoming the sampling problem by performing long simulations of very simple systems [72, 57, 73, 74]. Other studies have focused on more complicated systems such as peptides or proteins [61, 62], in which the lengths of the simulations were typically orders of magnitude too short to obtain relevant statistics on the different conformations of the systems under study. With poor sampling, different parts of the conformational space are sampled from different starting structures. The “conformational hysteresis” resulting from the distinct regions of conformational space being sampled by the system will be referred to here as the sampling error. In principle, the sampling error arises from the fact that the system is non-ergodic on the timescale of the simulation. Another source of apparent error in free energy calculations stems from the inability to sufficiently explore the local region of phase space. Here this will be referred to as the statistical error.

In order to evaluate the sampling error, simulations of the mutations L74A, L95A and V89A were performed according to three different schemes. 1) The starting configurations corresponding to different values of λ were generated sequentially in ascending order, as in chapter 4. 2) The starting configurations corresponding to different values of λ were generated in decreasing order, corresponding to the reverse mutation. 3) The order in which configurations for specific values of λ were generated was randomized.

Another focus of this chapter is the evaluation of the statistical errors. The two principal methods used to calculate the statistical error are reviewed and their relevance and limitations examined in the light of the results obtained for the different mutations. The first method is based on a block averaging procedure while the second method explicitly takes into account the correlation in the data to calculate the statistical error.

In the final part of this chapter the connection between sampling error and statistical error is discussed. The convergence of the free energy as a function of the simulation time is also investigated for the mutations L74A and L95A by extending the length of the simulations by more than an order of magnitude for specific λ points.

5.2 Background

For a stationary (equilibrated) series of data (for example an ensemble of conformations generated during a simulation), the mean value of a property X is given by

\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X(i) \tag{5.1}
\]

where n is the number of data points collected. The variance of the individual points of the series is given by

\[
\sigma^2(X_i) = \frac{1}{n-1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2 \tag{5.2}
\]

which is nothing other than an estimate of the width or the variability of the distribution around the average value. The variance σ² of the mean value of the series is given by

\[
\sigma^2(\bar{X}) = \frac{\sigma^2(X_i)}{n} \tag{5.3}
\]

The standard deviation σ(X̄) is usually used as a measure of the error in X̄. However, equation 5.3 is only valid when the data are uncorrelated. This is usually not the case in MD computer simulations, where configurations are by their nature correlated in time. Two approaches are commonly used to account for the correlation in the data.

In the first approach the variance is calculated from the deviation in the averages over subseries that are considered uncorrelated. The n data points are split into m subblocks of p conformations each, such that n = m · p. The mean of each of the m subblocks is determined by

\[
\bar{X}_l = \frac{1}{p} \sum_{k=(l-1)p+1}^{lp} X(k) \tag{5.4}
\]

with l = 1, ..., m. The overall average is obtained by averaging the means of the m subblocks,

\[
\bar{X} = \frac{1}{m} \sum_{l=1}^{m} \bar{X}_l \tag{5.5}
\]

The variance σ² of the overall mean, estimated from the means of the subseries, is given by

\[
\sigma^2(\bar{X}) = \frac{\sigma^2(\bar{X}_l)}{m} \tag{5.6}
\]

where σ²(X̄_l) is the variance of the m subblock means. It can easily be seen that equation 5.6 has the same form as equation 5.3, with the subblock means taking the place of the individual points, and that equation 5.6 is only valid if p is sufficiently large that the m subblocks are independent. In practice, the minimum size for which the subblocks are independent is not known. To overcome this difficulty the variance of the mean can be calculated for subseries of increasing size until a plateau is reached. However, for large values of p there may be too few subblocks to obtain accurate statistics. In such cases the variance may be extrapolated from an analytical function fitted to the block averages calculated for small p values.
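The block averaging procedure of equations 5.4 to 5.6 can be sketched as follows. This is only an illustration under stated assumptions: the time series is synthetic first-order autoregressive noise standing in for a dH/dλ trajectory, the function name `block_error` is hypothetical, and points left over after the last complete block are simply discarded.

```python
import math
import random

def block_error(series, p):
    """Standard error of the mean from blocks of p points (eqs. 5.4 to 5.6)."""
    m = len(series) // p                       # number of complete blocks
    if m < 2:
        raise ValueError("need at least two blocks")
    means = [sum(series[l * p:(l + 1) * p]) / p for l in range(m)]   # eq. 5.4
    grand = sum(means) / m                                            # eq. 5.5
    var_blocks = sum((x - grand) ** 2 for x in means) / (m - 1)
    return math.sqrt(var_blocks / m)                                  # eq. 5.6

# Synthetic correlated data (an assumption, not simulation output).
rng = random.Random(0)
x, series = 0.0, []
for _ in range(20000):
    x = 0.95 * x + rng.gauss(0.0, 1.0)
    series.append(x)

# The estimate grows with block size until the blocks become independent.
for p in (1, 10, 100, 1000):
    print(p, round(block_error(series, p), 4))
```

With p = 1 the function reduces to equation 5.3 and, for correlated data, returns a value that is too small; the plateau reached at larger p is the quantity normally reported.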

In the second approach the correlation in the data is treated explicitly by including the autocorrelation function of the data series in the calculation of the statistical error. Whereas the variance of the mean of the series is given by equation 5.3 in the case of uncorrelated data, the variance of the mean for a series of correlated data can be approximated as

\[
\sigma^2(\bar{X}) = \frac{\sigma^2(X_i)}{n/(1 + 2\tau)} \tag{5.7}
\]

where σ²(X̄) and σ²(X_i) have the same meaning as previously and τ is the “correlation length” of the series:

\[
\tau = \sum_{k=1}^{n-1} \left( 1 - \frac{k}{n} \right) \rho_k \tag{5.8}
\]

where ρ_k is the autocorrelation function for two data points separated by k − 1 data points. τ can be seen as a weighted sum of the autocorrelation function ρ_k. If the number n of data points collected is much larger than the value of k at which ρ_k falls to 0, then τ can be approximated by

\[
\tau = \sum_{k=1}^{n-1} \rho_k \tag{5.9}
\]

When the data points are uncorrelated, τ = 0 and σ²(X̄) = σ²(X_i)/n as given previously. The effect of correlation is to reduce the effective number of data points by a factor (1 + 2τ), as only one in every (1 + 2τ) points can be considered truly independent. The quantity (1 + 2τ) is called the “sampling ratio” and determines how frequently a system has to be sampled in order to obtain a single statistically independent point.
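A minimal sketch of the autocorrelation-based estimate (equations 5.7 and 5.9) is given below. Two choices are assumptions made for the example rather than part of the method as described: the sum over ρ_k is truncated at the first non-positive value, a common heuristic for noisy data, and the input is synthetic autoregressive noise. The function names are hypothetical.

```python
import math
import random

def autocorrelation(series, k):
    """Normalised autocorrelation coefficient rho_k for points k steps apart."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[i] - mean) * (series[i + k] - mean)
              for i in range(n - k)) / (n - k)
    return cov / var

def correlated_error(series, max_lag):
    """Return (sampling ratio 1 + 2*tau, standard error of the mean)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / (n - 1)
    tau = 0.0
    for k in range(1, min(max_lag, n - 1) + 1):
        rho = autocorrelation(series, k)
        if rho <= 0.0:      # heuristic cut-off once rho_k has decayed to zero
            break
        tau += rho          # eq. 5.9: plain sum of rho_k
    ratio = 1.0 + 2.0 * tau
    return ratio, math.sqrt(var * ratio / n)   # eq. 5.7

# Synthetic correlated series (an assumption, not simulation output).
rng = random.Random(0)
x, series = 0.0, []
for _ in range(5000):
    x = 0.9 * x + rng.gauss(0.0, 1.0)
    series.append(x)

ratio, error = correlated_error(series, max_lag=200)
_, naive_error = correlated_error(series, max_lag=0)   # tau = 0 gives eq. 5.3
print(f"sampling ratio 1+2τ ≈ {ratio:.1f}")
print(f"error ignoring correlation: {naive_error:.3f}; with correlation: {error:.3f}")
```

Setting `max_lag=0` recovers the uncorrelated estimate, which makes the factor by which correlation inflates the error easy to read off.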


5.3 Method

The free energy calculations were performed as described in chapter 4 using molecular dynamics simulations together with the thermodynamic integration (TI) method (see the method section of chapter 4 for the details of the simulation parameters). Two additional series of simulations were performed. The aim of the first series of simulations was to generate new starting conformations at each value of λ for the mutations V89A, L74A and L95A. In principle the value of <dH/dλ>_λ should be independent of the starting structure so long as the starting conformation is a representative conformation of the system at equilibrium and the system is sampled sufficiently long to ensure that the average obtained is representative of an equilibrium ensemble. However, because the simulations are performed sequentially, i.e. λ = 0.0, λ = 0.1, λ = 0.2 ... λ = 1.0, the starting structures are correlated. This means that a similar region of conformational space may be sampled at each of the different λ points. Two different schemes were used to evaluate the degree to which the free energy values were dependent on the starting structures and the order in which the λ values were sampled.

1. Simulations at specific λ values were performed in the reverse direction to that used in chapter 4, starting from λ = 1 (corresponding to the Ala mutant) and proceeding to λ = 0 (corresponding to wild type Suc1). This meant that interaction sites were created as opposed to being deleted. The free energy of each mutation was again calculated using 18 λ-points. The starting structure used for the reverse mutation (at λ = 1) was the last conformation obtained from the forward mutation at λ = 1, that is the conformation of Suc1 at 400 ps for the monomer and at 200 ps for the dimer. The starting structures used for the simulations at the remaining λ-points were obtained sequentially as described in chapter 4. The system was relaxed for 100 ps at a given λ value and the last conformation was used as the starting structure for the next λ value.

2. Simulations were performed from λ = 0 to λ = 1 but with intermediate λ values simulated in a non-sequential order. The order of the λ values was randomized in order to reduce the correlation between the starting structures. The simulations were performed in the following order: λ = 0.00, 0.30, 0.05, 0.70, 0.40, 0.50, 0.80, 0.45, 0.02, 0.20, 0.65, 0.95, 0.55, 0.98, 0.10, 0.60, 1.00 and 0.90.
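The three orderings (forward, reverse, randomized) can be written down compactly as below. This is a sketch under one assumption: that the forward and reverse schemes use the same 18 λ values as the randomized list, visited in ascending and descending order respectively. The helper names `schedule` and `mean_jump` are hypothetical.

```python
# Order used for the randomized scheme, copied from the list in the text.
RANDOM_ORDER = [0.00, 0.30, 0.05, 0.70, 0.40, 0.50, 0.80, 0.45, 0.02,
                0.20, 0.65, 0.95, 0.55, 0.98, 0.10, 0.60, 1.00, 0.90]

def schedule(scheme):
    """Return the order in which the λ windows are visited for a scheme."""
    ascending = sorted(RANDOM_ORDER)
    if scheme == "forward":
        return ascending
    if scheme == "reverse":
        return ascending[::-1]
    if scheme == "random":
        return list(RANDOM_ORDER)
    raise ValueError(f"unknown scheme: {scheme}")

def mean_jump(order):
    """Average |Δλ| between consecutive windows, a crude measure of how
    different successive set-ups are."""
    return sum(abs(b - a) for a, b in zip(order, order[1:])) / (len(order) - 1)

for name in ("forward", "reverse", "random"):
    print(name, round(mean_jump(schedule(name)), 3))
```

The much larger average step in λ for the randomized order is what is meant to reduce the correlation between consecutive starting structures.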

The aim of the second series of simulations was to examine the convergence at specific λ values by performing extended simulations for the mutations L74A and L95A (in the forward and reverse direction and using the random scheme for the λ values). The simulation time was increased for λ values at which <dH/dλ> showed the greatest variation between different trials (mutations in the forward, reverse or random direction). The simulation length was extended from 400 ps to 6 ns for the monomer and from 200 ps to 4 ns for the dimer. The values for which extended simulations were performed are indicated in Table 5.1.

Table 5.1: List of the λ values for which extended simulations were performed for the L74A and L95A mutations both for the monomer and the dimer.

| mutation | λ (monomer) | λ (dimer) |
|---|---|---|
| L74A | 0.45, 0.50, 0.55, 0.60 | 0.45, 0.50, 0.55, 0.60 |
| L95A | 0.40, 0.45, 0.50, 0.55 | 0.30, 0.40, 0.45, 0.50 |

5.4 Results and discussion

The free energies associated with the mutations V89A, L74A and L95A of the monomer and dimer of Suc1 using different starting structures and schemes are reported in Table 5.2. The statistical error was calculated using a block averaging procedure and is given in parentheses. Also listed in Table 5.2 is the relative stability of the mutant dimer with respect to the wild type dimer (∆∆G_DM for each mutation) calculated for the different schemes. For each mutation, the largest difference between the ∆∆G_DM values calculated from different schemes can be interpreted as an estimate of the sampling error. While the statistical error based on the fluctuations in <dH/dλ>_λ ranges between 2.5 and 6.1 kJ/mol, the differences between the ∆∆G_DM values calculated for the same mutation using different schemes are much larger. This indicates that the individual calculations of these mutations have not converged. The greatest difference in ∆∆G_DM is observed in the case of the mutation V89A, where it is equal to 59.7 kJ/mol (difference between the forward and the reverse simulations). This is roughly a factor of 17 larger than the apparent statistical error calculated for this mutation (3.6 kJ/mol). The smallest sampling error is obtained for the mutation L74A (13.6 kJ/mol), which is still a factor of 2 larger than the largest apparent statistical error (6.1 kJ/mol) for this mutation. Note that the largest difference in ∆∆G_DM does not always occur between the simulations performed in the forward and the reverse direction. While this was the case for mutations V89A and L74A, for mutation L95A the largest difference was between the reverse and the random scheme.

To fully understand the factors that affect the convergence of the calculated ∆∆G_DM values, analysing the final value of ∆∆G_DM calculated for the different schemes is insufficient. For one thing, compensation of errors between the free energies associated with the transformation of wild type Suc1 into the corresponding mutant (∆G_WT→M) for the monomer and the dimer may give rise to apparent convergence even if the two systems are far from equilibrium. One needs to examine the contributions of ∆G_WT→M of the monomer and the dimer separately, and ultimately the contributions at individual λ values. Note that in the dimers two sites are mutated simultaneously. In order to make a direct comparison between the monomer and dimer, results for the monomer should be multiplied by a factor of 2. Table 5.2 shows that simulations of the monomers are not significantly more converged than the simulations of the dimer on this timescale. The one exception is the mutation V89A, for which the sampling error is 54.3 kJ/mol for the dimer and 5.7 kJ/mol for the monomer. In this case the sampling error of the dimer completely dominates the sampling error of ∆∆G_DM and illustrates the variation that can be obtained with slight differences in starting structures. In the case of the mutation L74A the results for the dimer show less spread than for the monomer. Again the apparent statistical error is too small to account for the discrepancies between the different schemes.

Table 5.2: The free energies obtained for the mutations V89A, L74A and L95A of Suc1 (∆G_WT→M) when performing the simulations in the forward and reverse direction and also in a random order. The results for the monomer and the dimer are expressed as ∆G_WT→M for ease of comparison. The relative stability of the mutant dimer with respect to the wild type dimer (∆∆G_DM) calculated from the simulations of the monomer and the dimer is also reported together with the values obtained from experiment. Statistical errors are indicated in parentheses.

| mutation | ∆G_WT→M monomer (kJ/mol) | ∆G_WT→M dimer (kJ/mol) | ∆∆G_DM simulation (kJ/mol) | ∆∆G_DM experiment (kJ/mol) |
|---|---|---|---|---|
| V89A | -5.2 (1.3) | 17.1 (2.3) | -27.5 (3.6) | -2.3 |
| V89A reverse | -2.5 (1.0) | -37.2 (1.7) | 32.2 (2.7) | |
| V89A random | 0.5 (0.7) | -5.9 (1.8) | 6.9 (2.5) | |
| L74A | 11.8 (1.4) | 13.9 (2.4) | 9.7 (3.8) | -4.4 |
| L74A reverse | 1.8 (3.2) | 7.5 (2.9) | -3.9 (6.1) | |
| L74A random | 11.3 (3.3) | 21.6 (2.1) | 1.0 (5.4) | |
| L95A | 4.6 (1.4) | 28.1 (2.0) | -18.9 (3.4) | 0.4 |
| L95A reverse | -0.4 (1.6) | 3.4 (1.4) | -4.2 (3.0) | |
| L95A random | -3.5 (1.4) | 24.2 (2.8) | -31.2 (4.2) | |
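The ∆∆G_DM values and sampling errors quoted for Table 5.2 can be checked with a few lines of arithmetic. The relation ∆∆G_DM = 2·∆G_WT→M(monomer) − ∆G_WT→M(dimer) used below is inferred from the tabulated numbers and from the factor of 2 mentioned for the monomer; it is an assumption of this sketch, as are the function names.

```python
# (ΔG_WT→M monomer, ΔG_WT→M dimer) in kJ/mol, copied from Table 5.2.
TABLE_5_2 = {
    "V89A": {"forward": (-5.2, 17.1), "reverse": (-2.5, -37.2), "random": (0.5, -5.9)},
    "L74A": {"forward": (11.8, 13.9), "reverse": (1.8, 7.5), "random": (11.3, 21.6)},
    "L95A": {"forward": (4.6, 28.1), "reverse": (-0.4, 3.4), "random": (-3.5, 24.2)},
}

def ddg_dm(dg_monomer, dg_dimer):
    """ΔΔG_DM = 2·ΔG(monomer) − ΔG(dimer); relation inferred from the table."""
    return 2.0 * dg_monomer - dg_dimer

def sampling_error(values):
    """Largest difference between the results of the different schemes."""
    values = list(values)
    return max(values) - min(values)

for mutation, schemes in TABLE_5_2.items():
    ddg = {name: ddg_dm(*pair) for name, pair in schemes.items()}
    spread = sampling_error(ddg.values())
    print(mutation, {k: round(v, 1) for k, v in ddg.items()},
          "sampling error:", round(spread, 1))
```

Applying `sampling_error` to the monomer or dimer columns alone gives the per-system spreads discussed in the text, for example 54.3 kJ/mol for the V89A dimer.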

∆G_WT→M results from the integration over a number of λ values. Another approach to assess convergence is to compare the free energy profiles directly. Figure 5.1 shows the free energy profiles for the mutations V89A, L74A and L95A in both the monomer and the dimer using different schemes. As can be seen, the profiles show considerable variation. Compensation within the free energy profiles in some cases leads to similar values of ∆G even though the free energy profiles differ significantly. Clearly, intermediate <dH/dλ> values have not converged. This can be seen most clearly in Fig. 5.1.b for the mutation L74A in the monomer, which shows large differences between the forward and the random schemes even though ∆G_WT→M for the two schemes is very similar (11.8 kJ/mol and 11.3 kJ/mol for the forward and the random schemes respectively). Figure 5.1 also shows that, apart from the mutation V89A of the monomer, there is significant hysteresis between the forward and reverse simulations in all cases.

5.4.1 Sampling error and statistical errors in free energy calculations

Sampling error varied greatly depending on the λ value. For λ points close to 0 and 1, the ensemble average <dH/dλ> appears to converge significantly faster than for λ points around 0.5. For λ = 0 and 1, the variation in <dH/dλ> was on average 2.5 and 4.7 kJ/mol respectively in the case of the monomer, whereas for λ = 0.5 the average variation between the different schemes was 31.3 kJ/mol. The statistical errors were also much smaller around 0 and 1 than for λ values around 0.5. Figure 5.1 shows the statistical error at each λ point for the mutations V89A, L74A and L95A.

Convergence around λ = 0 or 1: small sampling error and smaller statistical error.

Table 5.3 shows the values of the statistical error in <dH/dλ> calculated by the two different methods at several λ values for the mutation L74A performed using different schemes. <dH/dλ> seems to have converged to similar values with small statistical errors for λ values close to 0 and 1 despite differences in starting structure. This suggests, first, that the conformational freedom is very restricted at these λ values and, second, that different conformations have very similar <dH/dλ> values. Table 5.3 also shows the variance of the series and the “sampling ratio” 1+2τ calculated at several λ values for the mutation L74A. Interestingly, although <dH/dλ> seems to be almost converged for λ around 0 and 1, the statistical error still cannot account for the differences between the different schemes.

Table 5.3 shows that <dH/dλ> for the mutation L74A converged to within 4.5 kJ/mol for the forward and the reverse schemes at λ = 0. However, the apparent statistical error calculated using the block averaging procedure is approximately an order of magnitude smaller (0.4 and 0.8 kJ/mol for the forward and the reverse mutations respectively). The apparent statistical error based on the autocorrelation function was slightly smaller still, being 0.3 kJ/mol in both cases.

Figure 5.2.a shows the local average of <dH/dλ> calculated over 2 ps blocks for the mutation L74A at λ = 0 together with the cumulative average and the final average over the whole period. The fluctuations of the local average are centered around the final average, which suggests that the fluctuations correspond to a single conformation. This is confirmed by Fig. 5.2.b, which shows the evolution of the dihedrals χ1 and χ2 of Leu 74 as a function of the simulation time. The value of χ1 fluctuates around 190 degrees and χ2 fluctuates around 60 degrees during most of the simulation, indicating that only one conformation is sampled during that period. After 380 ps there is a sharp increase in the values of both χ1 and χ2, indicating a possible transition. <dH/dλ> also increases around that time (see Fig. 5.2.a).

Figure 5.1: Free energy profiles for both the monomer and the dimer of the mutations (a) V89A, (b) L74A and (c) L95A using different starting structures. [Plots of <dH/dλ> (kJ mol^-1) against λ; each panel shows the forward, reverse and random schemes.]

Table 5.3: Sampling characteristics of the mutation L74A of the monomer of Suc1 for simulations at λ = 0, 0.40, 0.45, 0.50 and 1 for different starting structures. <dH/dλ> and σ(X̄) are in kJ/mol and the sampling ratio 1+2τ is in ps. The statistical error σ(X̄) was calculated both by treating the correlation of the data explicitly (Autocorr.) and using the block averaging procedure (B.A.).

| scheme | λ | <dH/dλ> | σ²(X_i) | 1+2τ | σ(X̄) Autocorr. | σ(X̄) B.A. |
|---|---|---|---|---|---|---|
| fwd | 0.00 | 35.2 | 29.3 | 1.5 | 0.3 | 0.4 |
| rev | 0.00 | 39.7 | 28.8 | 1.0 | 0.3 | 0.8 |
| rand | 0.00 | same as for fwd | | | | |
| fwd | 0.40 | -28.2 | 1.77 × 10³ | 6.4 | 5.3 | 6.9 |
| rev | 0.40 | -40.1 | 2.42 × 10³ | 17.3 | 10.2 | 16.4 |
| rand | 0.40 | -51.4 | 2.61 × 10³ | 25.4 | 12.9 | 36.3 |
| fwd | 0.45 | -29.5 | 1.56 × 10³ | 0.2 | 1.0 | 4.3 |
| rev | 0.45 | -61.8 | 3.99 × 10³ | 33.8 | 18.4 | 48.1 |
| rand | 0.45 | -37.4 | 2.20 × 10³ | 3.8 | 4.6 | 6.9 |
| fwd | 0.50 | -45.4 | 2.04 × 10³ | 0.5 | 1.7 | 5.8 |
| rev | 0.50 | -113.7 | 5.62 × 10³ | 3.0 | 6.4 | 12.0 |
| rand | 0.50 | -20.1 | 1.92 × 10³ | 10.1 | 7.0 | 12.1 |
| fwd | 1.00 | 58.8 | 35.1 | 18.3 | 1.3 | 1.5 |
| rev | 1.00 | 55.4 | 44.0 | 42.3 | 2.2 | 3.3 |
| rand | 1.00 | 50.2 | 35.0 | 5.7 | 0.7 | 1.2 |

Figure 5.2: (a) Free energy derivative of the mutation L74A as a function of the simulation time at λ = 0. The local average calculated over 2 ps, the cumulative average and the total average are plotted on the same graph. (b) Evolution of the dihedral angles χ1 and χ2 of the Leu 74 residue as a function of time. (c) Evolution of the statistical error as a function of the block size, showing σ from block averaging and the extrapolated σ. (d) Autocorrelation function C(t) as a function of the lag between the points.

Figure 5.2.c shows the statistical error calculated using block averaging for blocks of increasing size. As in effect only one conformation is sampled during the simulation, the sub-averages are calculated over similar structures and hence the variance of the subseries is very small.

Figure 5.2.d shows the autocorrelation function as a function of the lag between the points. It can be seen that the correlation decreases extremely quickly. The autocorrelation coefficient is less than 0.2 after only 0.1 ps. The “sampling ratio” calculated over the complete simulation is 1.2 ps, implying that points separated by 1.2 ps are independent. As a consequence the statistical error, which is calculated under the assumption that many independent points have been sampled, is small. In reality the points are all correlated. The problem is that the behaviour of the autocorrelation function is dominated by large, rapid fluctuations in dH/dλ. These large fluctuations come from the local motions of the mutated group and are rapidly averaged, leading to the small value calculated for the statistical error. The effect of lower-amplitude slow modes corresponding to conformational changes is not reflected in the autocorrelation function.

In fact the statistical error calculated may be correct but only reflects sampling of a local conformation. If the “real” statistical error of <dH/dλ> is to be calculated, all the structures sampled are to be considered correlated and the number of effective data points n/(1+2τ) is equal to 1. The estimate of the statistical error in this case corresponds to the standard deviation of the series: σ(X̄) = σ(X_i) = 5.4 kJ/mol. Although this would explain the discrepancy between the forward and reverse schemes for <dH/dλ>, the calculation of a statistical error based on a single independent point has in principle little meaning. However, in cases where the data are highly correlated, the statistical error calculated by the other two methods will be greatly underestimated. In these cases simply using the standard deviation of the series would provide a more realistic estimate. While not strictly correct, it can be seen as an estimate of the upper bound for the statistical error, as the standard deviation gives the maximum fluctuation over the period under study.
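As a worked instance of this limiting case, inserting a single effective data point into equation 5.7 and using the variance listed in Table 5.3 for the forward simulation at λ = 0 (29.3, taken here to be in (kJ/mol)²) gives the value quoted above, which can be set against the 4.5 kJ/mol difference between the forward and reverse schemes:

```latex
\[
\sigma(\bar{X})
  = \sqrt{\frac{\sigma^2(X_i)}{n/(1+2\tau)}}
  \;\overset{n/(1+2\tau)\,=\,1}{=}\;
  \sigma(X_i)
  = \sqrt{29.3\ (\mathrm{kJ/mol})^2}
  \approx 5.4\ \mathrm{kJ/mol}
  \;>\; 4.5\ \mathrm{kJ/mol}
\]
```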

Convergence around λ = 0.5: large statistical error and larger sampling error.

As can be seen from Fig. 5.1, the greatest discrepancies between different schemes are observed for λ values between 0.4 and 0.6. From Table 5.3 it can be seen that the difference in <dH/dλ> between the reverse and the random scheme of the mutation L74A for λ = 0.5 is 93.6 kJ/mol. The apparent statistical error calculated using block averaging at λ = 0.5 is at most 12.1 kJ/mol, and less when calculated using the autocorrelation function. Although quite large, the statistical error still cannot account for the discrepancies. Figures 5.3.a, b and c show the evolution of the angles χ1 and χ2 as a function of time for the mutation L74A in the different schemes and indicate that several conformations are sampled during each of the simulations.

Figure 5.3: Free energy derivative averages and evolution of the dihedral angles χ1 and χ2 of Leu 74 as a function of the simulation time for the mutation L74A performed at λ = 0.5 in the case of (a) the forward direction, (b) the reverse direction and (c) the random scheme. Local averages are calculated over 2 ps subblocks. (d), (e), (f) Autocorrelation coefficients C(t) as a function of the lag between the points for the corresponding simulation on the left side.

The large increase in sampling around λ = 0.5 is due to the use of a soft-core potential to perform the mutation. Soft-core potentials are used to avoid the numerical instabilities that occur when annihilating or growing a particle, which arise from the singularity in the Lennard-Jones potential when the interparticle distance is 0. Figure 5.4 shows the soft-core interactions at λ = 0.5 as a function of the distance r between interacting groups for different values of the soft-core parameter α. When the interparticle distance r approaches 0 the soft-core interactions become constant, as opposed to infinite in the standard Lennard-Jones interaction (α = 0). With the value of the soft-core parameter used in the simulations (α = 1.51), λ values around 0.5 correspond to the region of the potential where atoms that are being annihilated just start passing through each other. As a consequence, new conformations are sampled for the first time. At λ = 0 the interaction corresponds to the full potential, the atoms are fixed in position by surrounding groups and the position itself has relatively little influence on the free energy derivative. At λ = 1 atoms can pass freely through each other as the interactions are turned off. They can therefore occupy any position but their contribution to the free energy derivative is very small.
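The qualitative behaviour described for the soft-core interaction, a finite energy at r = 0 for α > 0 and the ordinary Lennard-Jones form for α = 0, can be illustrated with the sketch below. The functional form is an assumption: it is the GROMACS-style soft-core expression with a λ power of 2, which this chapter does not state explicitly. The parameters C6 = C12 = 1 for both end states follow the caption of Figure 5.4, so the curve illustrates the shape of the potential rather than the actual Leu to Ala mutation.

```python
def lennard_jones(r, c6=1.0, c12=1.0):
    """Standard 12-6 Lennard-Jones energy."""
    return c12 / r**12 - c6 / r**6

def soft_core(r, lam, alpha, c6=1.0, c12=1.0, power=2):
    """Soft-core Lennard-Jones energy with identical A and B parameters.

    Assumed form (not taken from this chapter):
        r_A = (alpha * sigma^6 * lam^power       + r^6)^(1/6)
        r_B = (alpha * sigma^6 * (1 - lam)^power + r^6)^(1/6)
        V   = (1 - lam) * V_LJ(r_A) + lam * V_LJ(r_B)
    with sigma^6 = C12 / C6.
    """
    sigma6 = c12 / c6
    r_a = (alpha * sigma6 * lam**power + r**6) ** (1.0 / 6.0)
    r_b = (alpha * sigma6 * (1.0 - lam)**power + r**6) ** (1.0 / 6.0)
    return (1.0 - lam) * lennard_jones(r_a, c6, c12) + lam * lennard_jones(r_b, c6, c12)

# Energy at λ = 0.5 for several α values; r = 0 is finite whenever α > 0.
for alpha in (0.5, 1.0, 1.51):
    print(alpha, [round(soft_core(r, 0.5, alpha), 3) for r in (0.0, 0.5, 1.0, 1.5)])
```

For α = 0 the function reduces to `lennard_jones` at any r > 0 and diverges at r = 0, which is the singularity the soft-core form is designed to remove.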

The autocorrelation coefficients calculated for the three different schemes are presented in Figs. 5.3.d, e and f. They decrease as quickly at λ = 0.5 as for simulations at λ = 0. This indicates that again only contributions due to fast modes are captured by this method. As a consequence, the statistical errors indicated in Table 5.3 calculated using the autocorrelation function are vastly underestimated for λ = 0.5.

In these cases too, the standard deviation of the series provides a better estimate of the true error than the other two methods. For λ = 0.4 and 0.45 the average standard deviations calculated over the different schemes are 47 kJ/mol and 50 kJ/mol respectively, while the sampling errors are 23 kJ/mol and 32 kJ/mol respectively. For λ = 0.5 the standard deviation (55 kJ/mol) cannot account for the discrepancy between the different schemes (94 kJ/mol) but still provides a much better estimate of the real statistical error.
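The averages quoted in this paragraph follow directly from the variances and <dH/dλ> values in Table 5.3, as the short check below shows. The dictionary layout and the function name `summarise` are only conveniences of this sketch.

```python
import math

# (<dH/dλ> in kJ/mol, σ²(X_i)) per scheme, copied from Table 5.3 (L74A, monomer).
TABLE_5_3 = {
    0.40: {"fwd": (-28.2, 1.77e3), "rev": (-40.1, 2.42e3), "rand": (-51.4, 2.61e3)},
    0.45: {"fwd": (-29.5, 1.56e3), "rev": (-61.8, 3.99e3), "rand": (-37.4, 2.20e3)},
    0.50: {"fwd": (-45.4, 2.04e3), "rev": (-113.7, 5.62e3), "rand": (-20.1, 1.92e3)},
}

def summarise(rows):
    """Return (mean standard deviation, sampling error) for one λ value."""
    averages = [avg for avg, _ in rows.values()]
    stds = [math.sqrt(var) for _, var in rows.values()]
    return sum(stds) / len(stds), max(averages) - min(averages)

for lam, rows in TABLE_5_3.items():
    avg_std, spread = summarise(rows)
    print(f"λ = {lam:.2f}: mean σ(X_i) = {avg_std:.0f} kJ/mol, "
          f"sampling error = {spread:.0f} kJ/mol")
```

The three lines printed reproduce the pairs 47 and 23, 50 and 32, and 55 and 94 kJ/mol given in the text.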

Special cases:

As opposed to the previous cases, the statistical errors for the mutation L74A calculated with the block averaging procedure for the random scheme at λ = 0.40 and the reverse scheme at λ = 0.45 do not appear to be underestimated. Table 5.3 shows that σ(X̄) reaches 36.3 and 48.1 kJ/mol for the random and the reverse schemes respectively. These values can account for the differences between the different schemes, but how reliable are they?

Figure 5.4: Soft-core interactions at λ = 0.5, with the parameters of the Lennard-Jones potential set to C6^A = C12^A = C6^B = C12^B = 1.

Figures 5.5.a and 5.5.b show the free energy derivative for these two simulations as a function of time. They show that two distinct states are sampled, connected by only one transition in both cases. The subblocks calculated before the transition will contain only the first state whereas the subblocks calculated after the transition will contain the other state. As a result, when the size of the subblocks approaches the lifetime of the first state, the apparent statistical error provides a realistic estimate of the error in the series as the subblocks become truly independent.
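The effect described here can be reproduced with a toy series containing two states and a single transition. All numbers below (state averages, noise level, transition point) are hypothetical and are not taken from the simulations; the point of the sketch is only that the block error stays small for short blocks and jumps to a realistic value once the block length approaches the lifetime of a state.

```python
import math
import random

def block_error(series, p):
    """Standard error of the mean estimated from blocks of p points."""
    m = len(series) // p
    means = [sum(series[l * p:(l + 1) * p]) / p for l in range(m)]
    grand = sum(means) / m
    return math.sqrt(sum((x - grand) ** 2 for x in means) / (m - 1) / m)

# Hypothetical two-state series: 200 points around -40, then 200 around -120.
rng = random.Random(1)
series = ([rng.gauss(-40.0, 10.0) for _ in range(200)] +
          [rng.gauss(-120.0, 10.0) for _ in range(200)])

for p in (1, 10, 50, 100, 200):
    print(p, round(block_error(series, p), 1))
```

With blocks of 200 points each block contains exactly one state, so the error is about half the separation between the two state averages, far larger than the estimate obtained from single points.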

However only two states are sampled during the course of these two simulations and although the statistical error calculated accurately describes the fluctuation in the series, the high value of the error mainly tells us that there is a clear lack of statistics for the system.
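The behaviour of the block averaging procedure in this special case can be sketched on a synthetic series. The series below is hypothetical: 400 points, a first state around -150 and a second around -50 with a single transition at the midpoint, plus Gaussian noise. The function name `block_error` and all numbers are assumptions chosen to resemble the situation described above, not data from the simulations.

```python
import math
import random


def block_error(series, block_size):
    """Error of the mean estimated from non-overlapping block averages."""
    n_blocks = len(series) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two blocks")
    means = [
        sum(series[b * block_size:(b + 1) * block_size]) / block_size
        for b in range(n_blocks)
    ]
    grand = sum(means) / n_blocks
    var = sum((m - grand) ** 2 for m in means) / (n_blocks - 1)
    return math.sqrt(var / n_blocks)


# Hypothetical two-state series with one transition at t = 200.
rng = random.Random(1)
series = [
    (-150.0 if t < 200 else -50.0) + rng.gauss(0.0, 10.0)
    for t in range(400)
]

errors = {size: block_error(series, size) for size in (1, 10, 50, 200)}
for size, err in errors.items():
    print(f"block size {size:3d}: apparent error {err:5.1f}")
```

In this sketch the apparent error keeps growing with the block size and only reaches about half the separation between the two states when each block spans the lifetime of one state. With only two such blocks, the estimate says little more than that the series contains two states and one transition.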

5.4.2 Statistical error vs sampling error

Although they have been discussed separately, the statistical error and the sampling error both reflect the extent of sampling of the system and in principle represent the same thing. However, because of the way the calculations are performed, they have in practice different meanings. The statistical error represents the spread of the values in the distribution of the data. It corresponds to the deviation of the average values of different series of data, which occurs because the ensemble of states sampled is never exactly the same in the different series. For it to be meaningful, the series must contain most if not all of the relevant states of the system.

Figure 5.5: Free energy derivative averages as a function of the simulation time for (a) the mutation L74A reverse at λ = 0.45 and (b) the mutation L74A random at λ = 0.40. Local averages calculated over 2 ps subblocks, cumulative averages and total averages are also represented. [Both panels plot <dH/dλ> (kJ mol-1) against time (ps), from 0 to 400 ps.]

The sampling error on the other hand represents the fact that each series of data samples only part of the different states available to the system. Each series therefore represents a different state of the system and unless these different states have identical values the average of each series will be different, the difference corresponding to the sampling error.
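The distinction drawn above can be made concrete with a small sketch. Three hypothetical series stand in for the forward, reverse and random schemes, each trapped in a different state. The state values, the noise level and the use of uncorrelated noise are simplifying assumptions for illustration only.

```python
import math
import random
import statistics


def standard_error(series):
    """Apparent statistical error of the mean of a single series."""
    return statistics.stdev(series) / math.sqrt(len(series))


# Hypothetical runs, each sampling only one state of the system.
rng = random.Random(2)
state_means = {"forward": -120.0, "reverse": -60.0, "random": -90.0}
runs = {
    name: [mu + rng.gauss(0.0, 15.0) for _ in range(400)]
    for name, mu in state_means.items()
}

averages = {name: statistics.fmean(values) for name, values in runs.items()}
statistical = {name: standard_error(values) for name, values in runs.items()}
# Sampling error taken here as the largest difference between series averages.
sampling = max(averages.values()) - min(averages.values())

for name in runs:
    print(f"{name:8s} average {averages[name]:7.1f} +/- {statistical[name]:.2f}")
print(f"sampling error (spread between series): {sampling:.1f}")
```

Each series on its own reports an error below 1, while the averages differ by about 60. The within-series number describes the states that were visited; only the comparison between series reveals the states that were missed.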

Limited sampling will always remain a problem in free energy calculations, especially in protein systems. Fortunately, in many cases we are only interested in the free energy of a specific state of a system. In fact the application of a thermodynamic cycle to determine the difference in free energy between the monomeric and dimeric states requires that the dimer remains stable for the duration of the simulations. The problem is that often in free energy calculations only the apparent statistical error calculated from a single series of simulations using a block averaging procedure or using the correlation function approach is reported. This statistical error only reflects the spread of the data corresponding to those states that have been sampled. It is frequently meaningless as it does not represent the real spread of the data. Worse, it can give a misleading impression of the precision of the calculations and may lead to the conclusion that failure to reproduce the available experimental results is due to reasons other than insufficient sampling.

5.4.3 Convergence in free energy calculations

Table 5.4 shows the results obtained for ∆GWT→M (and ∆∆GDM) when extending the simulation time of the λ points listed in Table 5.1 for the mutations L74A and L95A. Table 5.4 shows a dramatic increase of the degree of convergence of ∆GWT→M in the case of the monomers. The sampling error, as indicated by the difference between the results for different schemes, falls from 10.0 to 2.5 kJ/mol for the mutation L74A and from 8.1 to 3.6 kJ/mol for the mutation L95A. The convergence is clearly evident in the free energy profiles of the different schemes of the mutations L74A and L95A shown in Fig. 5.6.

Table 5.4: List of the free energies obtained for the mutations L74A and L95A of Suc1 (∆GWT→M) for extended simulations performed in the forward and reverse direction but also in a random order. Simulations of the monomers and the dimers were performed for 6 ns and 4 ns respectively. Statistical errors are indicated in parentheses.

mutations       ∆GWT→M (kJ/mol)              ∆∆GDM (kJ/mol)
                monomer       dimer          simulation     experiment
L74A            9.7 (1.2)     16.7 (2.0)     2.7 (3.2)      -4.4
L74A reverse    7.2 (1.6)     7.7 (2.8)      6.7 (4.4)
L74A random     8.3 (3.3)     17.3 (1.7)     -0.7 (5.0)
L95A            3.6 (1.3)     27.6 (1.8)     -20.4 (3.1)    0.4
L95A reverse    1.0 (1.2)     3.0 (1.3)      -1.0 (2.5)
L95A random     0.0 (1.0)     20.8 (2.4)     -20.8 (3.4)
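The sampling errors quoted for the extended simulations can be recovered from the ∆GWT→M values of Table 5.4, assuming that "the difference between the results for different schemes" means the largest difference (maximum minus minimum) over the forward, reverse and random schemes. That reading is an assumption of this sketch, as are the names `TABLE_5_4` and `sampling_error`; the numbers are those of the table.

```python
# Delta G (WT -> M) in kJ/mol from Table 5.4, ordered (forward, reverse, random).
TABLE_5_4 = {
    ("L74A", "monomer"): (9.7, 7.2, 8.3),
    ("L74A", "dimer"): (16.7, 7.7, 17.3),
    ("L95A", "monomer"): (3.6, 1.0, 0.0),
    ("L95A", "dimer"): (27.6, 3.0, 20.8),
}


def sampling_error(values):
    """Largest difference between schemes, rounded to 0.1 kJ/mol."""
    return round(max(values) - min(values), 1)


sampling_errors = {key: sampling_error(vals) for key, vals in TABLE_5_4.items()}
for (mutation, system), err in sampling_errors.items():
    print(f"{mutation} {system}: {err} kJ/mol")
```

Under this assumption the table yields 2.5 and 3.6 kJ/mol for the L74A and L95A monomers and 9.6 and 24.6 kJ/mol for the corresponding dimers, matching the values quoted in the text.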

In the case of the monomer the forward and the random schemes of the mutation L74A at λ = 0.5 have converged within 6 kJ/mol after 6 ns while there was a difference of 25 kJ/mol after 400 ps. The improvement for the reverse scheme is even more marked, as the difference between the reverse and the random scheme decreased from 90 kJ/mol after 400 ps to 20 kJ/mol after 6 ns. It is unlikely that increasing the simulation length for other λ points would have resulted in the same increase of the convergence. This is because the λ points selected showed the greatest discrepancy between the different schemes and corresponded to the region of the free energy profile where transitions between alternative states are most facilitated by the use of a soft core potential. However, although a large decrease in the differences between the various schemes occurs, these differences remain greater than the apparent statistical error, indicating that the calculations have still not completely converged on this timescale.
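To see how a discrepancy in <dH/dλ> at a single λ point feeds into the integrated free energy, the sketch below computes the weight that each λ point carries in a trapezoidal estimate of the thermodynamic integration integral. The trapezoidal rule and the evenly spaced grid of 21 points (spacing 0.05) are assumptions made for illustration; this section does not state which grid or integration scheme was used. The 90 and 20 kJ/mol differences are the values quoted above for λ = 0.5.

```python
def trapezoid_weights(lambdas):
    """Weight of each lambda point in a trapezoidal estimate of the integral."""
    n = len(lambdas)
    weights = []
    for i in range(n):
        left = lambdas[i] - lambdas[i - 1] if i > 0 else 0.0
        right = lambdas[i + 1] - lambdas[i] if i < n - 1 else 0.0
        weights.append(0.5 * (left + right))
    return weights


def integrate(lambdas, derivatives):
    """Trapezoidal estimate of the integral of <dH/dlambda> over lambda."""
    weights = trapezoid_weights(lambdas)
    return sum(w * d for w, d in zip(weights, derivatives))


# Hypothetical grid: 21 evenly spaced lambda points from 0 to 1.
lambdas = [i / 20 for i in range(21)]
weights = trapezoid_weights(lambdas)
index = lambdas.index(0.5)

# Contribution to Delta G of the quoted differences in <dH/dlambda> at 0.5.
shift_400ps = 90.0 * weights[index]
shift_6ns = 20.0 * weights[index]
print(f"weight at lambda = 0.5: {weights[index]:.3f}")
print(f"shift in Delta G: {shift_400ps:.1f} kJ/mol (400 ps), {shift_6ns:.1f} kJ/mol (6 ns)")
```

On such a grid a single interior point carries a weight of 0.05, so a 90 kJ/mol discrepancy at λ = 0.5 alone would shift the integrated free energy by several kJ/mol, which is why extending only a few λ points can improve the overall convergence noticeably.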

The degree of convergence of ∆GWT→M for the dimer is only marginally improved (sampling error decreasing from 14.1 kJ/mol to 9.6 kJ/mol for L74A and only from 24.7 to 24.6 kJ/mol for L95A). As a consequence the ∆∆GDM values calculated for the different schemes converge only slightly. This can be observed in Fig. 5.6 where the free energy profiles of the dimers show greater variation than those of the monomers.
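The link between the monomer and dimer columns of Table 5.4 and the ∆∆GDM column can be checked numerically. The tabulated values are consistent with ∆∆GDM = 2 ∆G(monomer) - ∆G(dimer), with the errors in parentheses added linearly. Both relations are inferred here from the numbers alone and are not stated in this section, so they should be read as an observation rather than as the definition used in the thesis. The names in the sketch are hypothetical.

```python
# Rows of Table 5.4 in kJ/mol:
# (monomer dG, monomer error, dimer dG, dimer error, tabulated ddG, tabulated error)
ROWS = {
    "L74A": (9.7, 1.2, 16.7, 2.0, 2.7, 3.2),
    "L74A reverse": (7.2, 1.6, 7.7, 2.8, 6.7, 4.4),
    "L74A random": (8.3, 3.3, 17.3, 1.7, -0.7, 5.0),
    "L95A": (3.6, 1.3, 27.6, 1.8, -20.4, 3.1),
    "L95A reverse": (1.0, 1.2, 3.0, 1.3, -1.0, 2.5),
    "L95A random": (0.0, 1.0, 20.8, 2.4, -20.8, 3.4),
}


def ddg(monomer, dimer):
    """Inferred relation: two monomer mutations minus one dimer mutation."""
    return round(2.0 * monomer - dimer, 1)


def summed_error(monomer_error, dimer_error):
    """Inferred relation: errors added linearly."""
    return round(monomer_error + dimer_error, 1)


consistent = all(
    ddg(m, d) == t and summed_error(me, de) == te
    for (m, me, d, de, t, te) in ROWS.values()
)
spread = {
    mutation: round(
        max(row[4] for name, row in ROWS.items() if name.startswith(mutation))
        - min(row[4] for name, row in ROWS.items() if name.startswith(mutation)),
        1,
    )
    for mutation in ("L74A", "L95A")
}
print("table consistent with inferred relations:", consistent)
print("spread of ddG between schemes (kJ/mol):", spread)
```

If the inferred relation holds, any sampling error in the dimer value passes directly into ∆∆GDM, which would explain why the ∆∆GDM values of the different schemes remain far apart even though the monomer values have largely converged.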

5.5 Conclusion

In this chapter we have employed a somewhat artificial separation between sampling error and statistical error in order to show clearly that the statistical error calculated using either the block averaging procedure or the explicit correlation in the data in general vastly underestimates the true uncertainty in free energy calculations and is therefore of little use. On the other hand, comparing the sampling error and the statistical error provides a simple means to check whether the calculations have converged. Performing a series of simulations and demonstrating that the statistical error is similar to the sampling error is a minimum requirement for demonstrating convergence in the calculations.

Figure 5.6: Free energy profiles of the mutations L74A and L95A calculated using different schemes for both the monomer and the dimer and using extended simulations for the λ points listed in Table 5.1. [Four panels, labelled monomer and dimer in pairs; each plots <dH/dλ> (kJ mol-1) against λ from 0 to 1.]

The two methods most widely used to calculate statistical errors failed completely to give reasonable estimates of the statistical error except in very specific cases. However, the problem lies not so much in shortcomings of the methods themselves as in their misuse. These methods all assume that there is enough sampling (i.e. independent data points have been collected). When simulating proteins this is usually far from being the case. The properties of interest are sensitive to small changes in conformation, and transitions between alternative rotamer states are relatively slow on the timescale of the simulations.

For simulations performed around λ = 0 or 1, the number of conformations sampled is very small (and therefore the number of independent data points as well). In this case a better estimate of the true sampling error is obtained by using directly the standard deviation of the series rather than attempting to account for correlations in the data, for example by using a block averaging approach. For other λ points, although the number of conformations sampled is greater thanks to the use of the soft core potential, the statistical error is still significantly underestimated using standard methods and again a better estimate of the sampling error is obtained by simply using the standard deviation of the series.

In the case of the monomer the simulations were approaching a sufficient length. Extending the simulation time by an order of magnitude resulted in a dramatic improvement in the convergence of the mutation free energy calculated for the monomer. In particular, better convergence could be achieved by increasing the simulation length for just a few λ points. In the case of the dimer, convergence of the results was not significantly improved, if at all, by increasing the simulation time, due to the extra degrees of freedom corresponding to the relative positions of the monomers. Much longer simulations would be needed to reach the same degree of convergence as achieved for the monomer.

Finally the extent of the sampling error found for the calculation of the relative stability of the mutant dimer with respect to the wild type dimer is such that it can explain for the most part the discrepancies observed in Chapter 4 between results from simulations and experiment. Unfortunately, this implies that problems with sampling still dominate the calculations and that it is not possible to use these calculations either to address questions related to the accuracy of the force field or structures, or to propose possible new substitutions for genetic engineering studies.


Chapter 6. Conclusion - Outlook


The main focus of this thesis has been the study of protein-protein, peptide-peptide and protein-ligand interactions. We have attempted to analyze such interactions by directly simulating the process of recognition starting from isolated molecules as well as using simulation techniques to calculate differences in free energy between alternative states. The power of molecular dynamics simulation techniques is that they can be used to obtain detailed information either on systems or on properties that are otherwise not amenable to experimental studies. The challenge is to ensure that the results obtained are a true reflection of the properties of the system.

6.1 Free energy calculations

While potentially very powerful, free energy calculations are still only rarely applied to protein systems. In order to use free energy calculations in a predictive way, one must first demonstrate that free energy differences can be determined with accuracy. In order to assess whether free energy calculations could be used to predict protein-ligand and protein-protein interactions, we first investigated the relative binding free energy of a set of ligands to two proteins using the TI method and then calculated the relative stability of a protein dimer upon mutation with respect to the wild type. The main problem that arose when performing these free energy calculations was a failure to converge within the time scale of the simulations. Another problem was that sometimes they appeared to have converged although in reality they had not. As a consequence, the results from the simulations generally did not agree with the experiments due to the lack of sampling of the system.

During the calculation of the relative binding free energy of a series of ligands to trypsin and factor Xa, it was found that using the closure of thermodynamic cycles to check the convergence of the calculations was useful but not sufficient. The random character of the discrepancy between results from experiment and simulation found for these calculations and for the calculations of the relative stability of the Suc1 dimer upon mutation suggested that the discrepancy was more likely due to the sampling error than to other sources of error such as the quality of the force field. This was confirmed by the study of the convergence properties of free energy calculations investigated using three mutations of the Suc1 protein. A new criterion was given to check the convergence of the calculation and the two most widely used methods to calculate the statistical error (which should be the same as the sampling error) were assessed and found inadequate. Clearly more work needs to be done, both on enhanced sampling techniques and on criteria against which free energy calculations can be assessed, before they can be routinely applied to protein systems.


6.2 The sampling issue

The issue of sampling is not only relevant in the case of free energy calculations. The study of the EMP1 dimer in Chapter 2 is a perfect example of a system for which only a fraction of the phase space could be sampled even though an extensive search was performed. This example shows that the size of the conformational space dramatically increases when simulating a dimer with respect to a monomer. This is true even for a system such as EMP1 in which the monomer has a very rigid backbone.

Non-convergence in the calculations is related to the characteristics of the physical model used to perform the simulations. Indeed molecular dynamics simulations sample local low energy conformations. This means that the system tends to remain trapped in local minima of the energy landscape and cannot easily cross energy barriers. New methods, such as the replica exchange approach [75], which has also been proposed for use in free energy calculations [76], can significantly improve the sampling of the system. It would be interesting to see how such approaches would perform in these cases and to what extent they would help to solve the sampling problem. However, while these methods would probably facilitate convergence in the case of the mutation of a residue inside a monomer, it is unlikely that such an approach alone would be sufficient to obtain reliable results for the dimer.
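For readers unfamiliar with the replica exchange approach mentioned above, the following is a minimal sketch of the standard Metropolis criterion for swapping configurations between two replicas held at different temperatures. It is given for orientation only; no replica exchange simulations were performed in this work. The function name, the example energies and the temperatures are hypothetical; energies are taken in kJ/mol.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of swapping configurations between two replicas.

    Replica i has potential energy energy_i at temperature temp_i,
    replica j has potential energy energy_j at temperature temp_j.
    """
    beta_i = 1.0 / (KB * temp_i)
    beta_j = 1.0 / (KB * temp_j)
    delta = (beta_j - beta_i) * (energy_i - energy_j)
    return 1.0 if delta <= 0.0 else math.exp(-delta)


# Hypothetical pair of neighbouring replicas.
p_down = exchange_probability(-1000.0, 300.0, -990.0, 310.0)
p_up = exchange_probability(-990.0, 300.0, -1000.0, 310.0)
print(f"cold replica lower in energy: p = {p_down:.2f}")
print(f"cold replica higher in energy: p = {p_up:.2f}")
```

A swap that hands the lower-energy configuration to the colder replica is always accepted, while the opposite swap is accepted with a probability that falls off with the energy and temperature gap. Configurations that crossed a barrier at high temperature can in this way reach the temperature of interest.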

However, even in the eventuality of a complete sampling of the system, the accuracy of the calculation would still be directly related to the accuracy of the model used to describe the system (i.e. the force field). The quality of the result will at best reflect the quality of the force field within the limit of the precision of the calculation given by the statistical error. For molecular dynamics simulations to play an increasing role in biological sciences, it is important that a balance between these different sources of error is achieved.


Bibliography

[1] J. C. Kendrew. A three dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature, 181:662–666, 1958.

[2] H. J. Dyson and P. E. Wright. Coupling of folding and binding for unstructured proteins. Curr. Opin. Struct. Biol., 12:54–60, 2002.

[3] C. B. Anfinsen. Principles that govern the folding of protein chains. Science, 181:223–230, 1973.

[4] W. Kabsch and C. Sander. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22:2577–2637, 1983.

[5] K. Wuthrich. NMR of Proteins and Nucleic Acids. Wiley-Interscience, New York, 1986.

[6] N. Metropolis and S. Ulam. The Monte Carlo method. J. Am. Stat. Ass., 44:334–341, 1949.

[7] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087–1092, 1953.

[8] C. M. Dobson, A. S. Sali, and M. Karplus. Protein folding: A perspective from theory and experiment. Angew. Chem. Int. Ed., 37:868–893, 1998.

[9] T. Lazaridis and M. Karplus. New view of protein folding reconciled with the old through multiple unfolding simulations. Science, 278:1928–1931, 1997.

[10] C. L. Brooks III. Simulations of protein folding and unfolding. Curr. Opin. Struct. Biol., 8:222–226, 1998.

[11] C. Levinthal. Are there pathways for protein folding? J. Chim. Phys., 65:44–45, 1968.


[12] R. Zwanzig, A. Szabo, and B. Bagchi. Levinthal’s paradox. Proc. Natl. Acad. Sci. USA, 89:20–22, 1992.

[13] G. M. Crippen and Y. Z. Ohkubo. Statistical mechanics of protein folding by exhaustive enumeration. Proteins: Struct. Funct. Gen., 32:425–437, 1998.

[14] X. Daura, W. F. van Gunsteren, and A. E. Mark. Folding-unfolding thermodynamics of a β-heptapeptide from equilibrium simulations. Proteins: Struct. Funct. Gen., 34:269–280, 1999.

[15] J. G. Kirkwood. Statistical mechanics of fluid mixtures. J. Chem. Phys., 3:300–313, 1935.

[16] R. W. Zwanzig. High-temperature equation of state by a perturbation method. I. Nonpolar gases. J. Chem. Phys., 22:1420–1426, 1954.

[17] A. E. Mark, S. P. van Helden, P. E. Smith, L. H. M. Janssen, and W. F. van Gunsteren. Convergence properties of free energy calculations: α-cyclodextrin complexes as a case study. J. Am. Chem. Soc., 116:6293–6302, 1994.

[18] W. F. van Gunsteren, T. C. Beutler, F. Fraternali, P. M. King, A. E. Mark, and P. E. Smith. Computation of free energy in practice: choice of approximations and accuracy limiting factors. In W. F. van Gunsteren, P. K. Weiner, and A. J. Wilkinson, editors, Computer simulation of biomolecular systems: theoretical and experimental applications, volume 2, pages 315–348, Leiden, 1993. ESCOM Science.

[19] H. Liu, A. E. Mark, and W. F. van Gunsteren. Estimating the relative free energy of different molecular states with respect to a single reference state. J. Phys. Chem., 100:9485–9494, 1996.

[20] H. Schafer, W. F. van Gunsteren, and A. E. Mark. Estimating relative free energies from a single ensemble: Hydration free energies. J. Comp. Chem., 20:1604–1617, 1999.

[21] C. Oostenbrink and W. F. van Gunsteren. Single-step perturbations to calculate free energy differences from unphysical reference states: limits on size, flexibility, and character. J. Comp. Chem., 24:1730–1739, 2003.

[22] C. Oostenbrink and W. F. van Gunsteren. Free energies of binding of polychlorinated biphenyls to the estrogen receptor from a single simulation. Proteins: Struct. Funct. Bioinf., 54:237–246, 2004.

[23] R. W. Carrell and B. Gooptu. Conformational changes and disease - serpins, prions and Alzheimer’s. Curr. Opin. Struct. Biol., 8:799–809, 1998.


[24] D. Thirumalai, D. K. Klimov, and R. I. Dima. Emerging ideas on the molecular basis of protein and peptide aggregation. Curr. Opin. Struct. Biol., 13:146–159, 2003.

[25] J. A. Wells. Hormone mimicry. Science, 273:449–450, 1996.

[26] O. Livnah, E. A. Stura, D. L. Johnson, S. A. Middleton, L. S. Mulcahy, N. C. Wrighton, W. J. Dower, L. K. Jolliffe, and I. A. Wilson. Functional mimicry of a protein hormone by a peptide agonist: the EPO receptor complex at 2.8 Å. Science, 273:464–471, 1996.

[27] O. Livnah, D. L. Johnson, E. A. Stura, F. X. Farrell, F. P. Barbone, Y. You, K. D. Liu, M. A. Goldsmith, W. He, C. D. Krause, S. Pestka, L. K. Jolliffe, and I. A. Wilson. An antagonist peptide-EPO receptor complex suggests that receptor dimerization is not sufficient for activation. Nature Structural Biology, 5:993–1004, 1998.

[28] H. J. C. Berendsen, D. van der Spoel, and R. van Drunen. GROMACS: A message-passing parallel molecular dynamics implementation. Comp. Phys. Comm., 91:43–56, 1995.

[29] E. Lindahl, B. Hess, and D. van der Spoel. Gromacs 3.0: A package for molecular simulation and trajectory analysis. J. Mol. Mod., 7:306–317, 2001.

[30] W. F. van Gunsteren, S. R. Billeter, A. A. Eising, P. H. Hunenberger, P. Kruger, A. E. Mark, W. R. P. Scott, and I. G. Tironi. Biomolecular Simulation: GROMOS96 Manual and User Guide. BIOMOS b.v., Zurich, Groningen, 1996.

[31] H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, and J. Hermans. Interaction models for water in relation to protein hydration. In B. Pullman, editor, Intermolecular Forces, pages 331–342, Dordrecht, 1981. D. Reidel Publishing Company.

[32] H. J. C. Berendsen, J. P. M. Postma, W. F. van Gunsteren, A. DiNola, and J. R. Haak. Molecular dynamics with coupling to an external bath. J. Chem. Phys., 81:3684–3690, 1984.

[33] B. Hess, H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije. LINCS: A linear constraint solver for molecular simulations. J. Comp. Chem., 18:1463–1472, 1997.

[34] N. Guex and M. C. Peitsch. SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modeling. Electrophoresis, 18:2714–2723, 1997. http://www.expasy.org/spdbv/.

[35] W. Humphrey, A. Dalke, and K. Schulten. VMD - Visual Molecular Dynamics. J. Molec. Graphics, 14.1:33–38, 1996. http://www.ks.uiuc.edu/Research/vmd.


[36] A. Di Nola, H. J. C. Berendsen, and O. Edholm. Free energy determination of polypeptides conformations generated by molecular dynamics. Macromolecules, 17:2044–2050, 1984.

[37] W. F. van Gunsteren and H. J. C. Berendsen. Computer simulation of molecular dynamics: Methodology, applications, and perspectives in chemistry. Angew. Chem. Int. Ed. Engl., 29:992–1023, 1990.

[38] P. A. Kollman. Free energy calculations: Applications to chemical and biochemical phenomena. Chem. Rev., 93:2395–2417, 1993.

[39] A. E. Mark, S. P. van Helden, P. E. Smith, L. H. M. Janssen, and W. F. van Gunsteren. Convergence properties of free energy calculations: α-cyclodextrin complexes as a case study. J. Am. Chem. Soc., 116:6293–6302, 1994.

[40] A. E. Mark. Free energy perturbation calculations. In P. V. R. Schleyer, N. L. Allinger, T. Clark, J. Gasteiger, P. A. Kollman, H. F. Schaefer III, and P. R. Schreiner, editors, Encyclopedia of computational chemistry, pages 1070–1083, Chichester, 1998. Wiley and Sons.

[41] S. B. Dixit and C. Chipot. Can absolute free energies of association be estimated from molecular mechanical simulations? The biotin-streptavidin system revisited. J. Phys. Chem. A, 105:9795–9799, 2001.

[42] W. F. van Gunsteren, X. Daura, and A. E. Mark. Computation of free energy. Helvetica Chimica Acta, 85:3113–3129, 2002.

[43] C. Chipot and D. A. Pearlman. Free energy calculations. The long and winding gilded road. Mol. Sim., 28:1–12, 2002.

[44] R. J. Radmer and P. A. Kollman. The application of three approximate free energy calculations methods to structure based ligand design: Trypsin and its complex with inhibitors. J. Comput. Aided Mol. Des., 12:215–227, 1998.

[45] J. W. Essex, D. L. Severance, J. Tirado-Rives, and W. L. Jorgensen. Monte Carlo simulations for proteins: Binding affinities for trypsin-benzamidine complexes via free-energy perturbations. J. Phys. Chem. B, 101:9663–9669, 1997.

[46] J. Ramon Blas, Manuel Marquez, Jonathan L. Sessler, F. Javier Luque, and Modesto Orozco. Theoretical study of anion binding to calix[4]pyrrole: the effects of solvent, fluorine substitution, cosolute, and water. J. Am. Chem. Soc., 124:12796–12805, 2002.

[47] M. Adler, D. D. Davey, G. B. Phillips, S. H. Kim, J. Jancarik, G. Rumennik, D. R. Light, and M. Whitlow. Preparation, characterization, and crystal structure of the inhibitor zk-807834 (ci-1031) complexed with factor xa. Biochemistry, 39:12534–12542, 2000.


[48] M. Whitlow, D. O. Arnaiz, B. O. Buckman, D. D. Davey, B. Griedel, W. J. Guilford, S. K. Koovakkat, A. Liang, R. Mohan, G. B. Phillips, M. Seto, K. J. Shaw, W. Xu, Z. Zhao, D. R. Light, and M. M. Morrissey. Crystallographic analysis of potent and selective factor xa inhibitor complexed to bovine trypsin. Act. Cryst. D., D55:1395–1404, 1999.

[49] L. D. Schuler and W. F. van Gunsteren. On the choice of dihedral angle potential energy functions for n-alkanes. Mol. Sim., 25:301–319, 2000.

[50] M. J. Frisch, G. W. Trucks, H. B. Schlegel, P. M. W. Gill, B. G. Johnson, M. A. Robb, J. R. Cheeseman, T. A. Keith, G. A. Petersson, J. A. Montgomery, K. Raghavachari, M. A. Al-Laham, V. G. Zakrzewski, J. V. Ortiz, J. B. Foresman, J. Cioslowski, B. B. Stefanov, A. Nanayakkara, M. Challacombe, C. Y. Peng, P. Y. Ayala, W. Chen, M. W. Wong, J. L. Andres, E. S. Replogle, R. Gomperts, R. L. Martin, D. J. Fox, J. S. Binkley, D. J. Defrees, J. Baker, J. P. Stewart, M. Head-Gordon, C. Gonzalez, and J. A. Pople. Gaussian 94, Revision A.1. Gaussian, Inc., Pittsburgh PA, 1995.

[51] B. H. Besler, K. M. Merz Jr., and P. A. Kollman. Atomic charges derived from semiempirical methods. J. Comp. Chem., 11:431–439, 1990.

[52] D. van der Spoel, A. R. van Buuren, E. Apol, P. J. Meulenhoff, D. P. Tieleman, A. L. T. M. Sijbers, B. Hess, K. A. Feenstra, E. Lindahl, R. van Drunen, and H. J. C. Berendsen. Gromacs User Manual version 3.0. Nijenborgh 4, 9747 AG Groningen, The Netherlands. Internet: http://www.gromacs.org, 2001.

[53] S. Miyamoto and P. A. Kollman. SETTLE: An analytical version of the SHAKE and RATTLE algorithms for rigid water models. J. Comp. Chem., 13:952–962, 1992.

[54] T. C. Beutler, A. E. Mark, R. C. van Schaik, P. R. Gerber, and W. F. van Gunsteren. Avoiding singularities and numerical instabilities in free energy calculations based on molecular simulations. Chem. Phys. Lett., 222:529–539, 1994.

[55] J. W. Pitera and W. F. van Gunsteren. One-step perturbation methods for solvation free energies of polar solutes. J. Phys. Chem. B, 105:11264–11274, 2001.

[56] A. Villa and A. E. Mark. Calculation of free energy of solvation for neutral analogs of amino acid side chains. J. Comp. Chem., 23:548–553, 2002.

[57] M. Bishop and S. Frinks. Error analysis in computer simulations. J. Chem. Phys., 87:3675–3676, 1987.

[58] M. P. Allen and D. J. Tildesley. Computer Simulation of Liquids. Oxford Science Publications, Oxford, 1987.


[59] M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande. Extremely precise free energy calculations of amino acid side chain analogs: Comparison of common molecular mechanics force fields for proteins. J. Chem. Phys., 119:5740–5761, 2003.

[60] A. Villa, R. Zangi, G. Pieffet, and A. E. Mark. Sampling and convergence in free energy calculations of protein-ligand interactions: the binding of triphenoxypyridine derivatives to factor xa and trypsin. J. Comp. Aid. Mol. Design, 17:673–686, 2003.

[61] T. Simonson and A. Brunger. Thermodynamics of protein-peptide interactions in the ribonuclease-s system studied by molecular dynamics and free energy calculations. Biochemistry, 31:8661–8674, 1992.

[62] A. Di Nola and A. T. Brunger. Free energy calculations in globular proteins: Methods to reduce errors. J. Comp. Chem., 19:1229–1240, 1998.

[63] J. Hayles, D. Beach, B. Durkacz, and P. Nurse. The fission yeast cell cycle control gene cdc2: isolation of a sequence suc1 that suppresses cdc2 mutant function. Mol. Cell. Genet., 202:291–293, 1986.

[64] J. A. Endicott, M. E. Noble, E. F. Garman, N. Brown, B. Rasmussen, P. Nurse, and L. N. Johnson. The crystal structure of p13suc1, a p34cdc2-interacting cell cycle control protein. EMBO J., 14:1004–1014, 1995.

[65] M. J. Bennett, S. Choe, and D. Eisenberg. Domain swapping: entangling alliances between proteins. Proc. Natl. Acad. Sci. USA, 91:3127–3131, 1994.

[66] M. J. Bennett, M. P. Schlunegger, and D. Eisenberg. 3d domain swapping: a mechanism for oligomer assembly. Protein Sci., 4:2455–2468, 1995.

[67] Y. Bourne, A. S. Arvai, S. L. Bernstein, M. H. Watson, S. I. Reed, J. E. Endicott, M. E. Noble, L. N. Johnson, and J. A. Tainer. Crystal structure of the cell-regulatory protein suc1 reveals a β-hinge conformational switch. Proc. Natl. Acad. Sci. USA, 92:10232–10236, 1995.

[68] J. Pines. Reaching for a role for the cks proteins. Curr. Biol., 6:1399–1402, 1996.

[69] D. O. V. Alonso, E. Alm, and V. Daggett. Characterization of the unfolding pathway of the cell-cycle protein p13suc1 by molecular dynamics simulations: implications for domain swapping. Struct. Folding Des., 8:101–110, 2000.

[70] J. W. H. Schymkowitz, F. Rousseau, L. R. Irvine, and L. S. Itzhaki. The folding pathway of the cell-cycle regulatory protein p13suc1: clues for the mechanism of domain swapping. Struct. Folding Des., 8:89–100, 2000.


[71] F. Rousseau, J. W. H. Schymkowitz, H. R. Wilkinson, and L. S. Itzhaki. Three-dimensional domain swapping in p13suc1 occurs in the unfolded state and is controlled by conserved proline residues. Proc. Natl. Acad. Sci. USA, 98:5596–5601, 2001.

[72] T. P. Straatsma, H. J. C. Berendsen, and S. J. Stam. Estimation of statistical errors in molecular simulation calculation. Molec. Phys., 57:89–95, 1986.

[73] T. P. Straatsma and J. A. McCammon. Multiconfiguration thermodynamic integration. J. Chem. Phys., 95:1175–1188, 1991.

[74] D. A. Pearlman. Free energy derivatives: a new method for probing the convergence problem in free energy calculations. J. Comp. Chem., 15:105–123, 1994.

[75] Y. Sugita and Y. Okamoto. Replica-exchange molecular dynamics methods for protein folding. Chem. Phys. Lett., 314:141–151, 1999.

[76] Y. Sugita, A. Kitao, and Y. Okamoto. Multidimensional replica-exchange method for free energy calculation. J. Chem. Phys., 113:6042–6051, 2000.

[77] M. J. Mitchell and J. A. McCammon. Free energy difference calculations by thermodynamic integration: difficulties in obtaining a precise value. J. Comp. Chem., 12:271–275, 1991.

[78] W. Yang, R. Bitetti-Putzer, and M. Karplus. Free energy simulations: Use of reverse cumulative averaging to determine the equilibrated region and the time required for convergence. J. Chem. Phys., 120:2618–2628, 2004.

[79] C. F. Wong. Systematic sensitivity analysis in free energy perturbation calculations. J. Am. Chem. Soc., 113:3208–3209, 1991.


Summary

Proteins are the media through which genetic information is expressed. They are involved in most if not all biological processes, from receptor activation to cellular process regulation or even chemical reaction catalysis. This diversity of activities is made possible because proteins come in all sizes and “shapes” (their structures). In most cases, each protein has one unique structure (the native structure) which is directly related to its function. Knowing the structure of a protein is therefore of great importance for the understanding of its function or of the mechanism by which the function is carried out. However the structure of a protein cannot be determined solely from its sequence. Instead, the structure can be obtained experimentally for certain proteins either by x-ray crystallography or by NMR spectroscopy. These experimental methods cannot provide detailed information on the dynamical properties of a protein and therefore only provide very limited insights on the folding process itself. Molecular dynamics (MD) simulations can be used to study the dynamical properties of a system in full atomic detail. Thus MD simulations can be used to gain a better understanding of the interactions between proteins and between proteins and ligands in order to predict how proteins or some of their elements associate with one another to achieve their lowest free energy conformation.

The ability to accurately determine differences in free energy is therefore of great practical interest in biophysics and structural biology as it would allow the prediction of phenomena such as conformational changes or protein-ligand interactions. Free energy differences can be calculated from numerical simulations using a variety of statistical mechanical approaches. The accuracy of such calculations is primarily limited by two factors, the nature of the underlying model or force field and the extent of the sampling during the simulation.

The research in this project concentrated on problems of sampling and convergence when using free energy calculations in the elucidation of protein-protein and protein-ligand interactions. Three different systems were used to study different aspects of the problem:

1) The self association of the EPO mimetic peptide 1 (EMP1) was studied as a model for protein association in general. EMP1 was used to help understand how β-sheets can rearrange by observing how the dimer can switch between different dimeric states. The study provides insight into how secondary structure elements can find and recognize each other in the course of the folding of a protein. From the simulations it appeared that the burial of the hydrophobic core was the force driving the aggregation and was the main stabilizing factor of the different dimers observed. The alternative dimers were very slow to interconvert on the MD time scale, making it impossible to calculate the free energy associated with the dimerization from the number of association-dissociation events.

2) The binding of a series of triphenoxypyridine derivatives to factor Xa and trypsin was studied as a model for understanding the use of free energy calculations in predicting protein-ligand interactions. This work aimed at analysing factors related to sampling and convergence in free energy calculations based on the Thermodynamic Integration (TI) method. The inhibitors studied represented a severe challenge for explicit free energy calculations: the mutation from one compound to another involved up to 19 atoms, the creation and annihilation of net charge, and several alternate binding modes. The results clearly suggested that nanosecond simulations were too short to yield accurate estimates of the binding free energies, and that closure of thermodynamic cycles was useful but by no means sufficient to ensure that convergence had been reached.

3) The effect of mutation on the dimerization of SUC1 was investigated to assess the utility of free energy calculations for predicting changes in protein-protein interactions. The relative free energy of dissociation was calculated for 17 mutants of the SUC1 protein using MD simulations together with the TI formula. Of all the mutations performed, only a small subset of the calculated values matched the experimental values. Even predicting the relative stability of the dimer for comparatively simple mutations, such as the 8 mutants involving the transformation of a Leu residue into an Ala, proved problematic. By the number and type of mutations performed, the work represented one of the first attempts to truly determine the applicability of free energy calculations to mutations within a protein. The results clearly suggest that earlier work claiming good agreement with experiment was simply fortuitous. Finally, a detailed analysis of the sampling and convergence properties of free energy calculations was made in order to understand why the calculated and experimental values differed so widely even though the small statistical errors usually obtained suggested converged results.
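
To make the TI-based quantities mentioned in studies 2 and 3 more concrete, the following is a minimal Python sketch, not the procedure used in the thesis. It assumes that the ensemble averages of dH/dλ have already been obtained at a set of λ values, integrates them with the trapezoidal rule, forms a relative binding free energy from the bound and free legs, and sums the legs of a closed thermodynamic cycle. All function names and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def ti_free_energy(lambdas: Sequence[float], dhdl_means: Sequence[float]) -> float:
    """Trapezoidal estimate of the TI integral of <dH/dlambda> over lambda."""
    if len(lambdas) != len(dhdl_means) or len(lambdas) < 2:
        raise ValueError("need at least two lambda points with matching averages")
    total = 0.0
    for i in range(len(lambdas) - 1):
        width = lambdas[i + 1] - lambdas[i]
        total += 0.5 * (dhdl_means[i] + dhdl_means[i + 1]) * width
    return total


def relative_binding_free_energy(dg_bound: float, dg_free: float) -> float:
    """ddG of binding for A -> B: mutation in the complex minus mutation in solvent."""
    return dg_bound - dg_free


def cycle_closure_error(legs: Sequence[float]) -> float:
    """Sum of the free energy changes around a closed cycle (ideally zero)."""
    return sum(legs)


# Made-up illustration data (NOT from the thesis): 11 evenly spaced lambda points.
lambdas = [i / 10 for i in range(11)]
dhdl_bound = [12.0 * lam for lam in lambdas]  # hypothetical <dH/dlambda>, kJ/mol
dhdl_free = [8.0 * lam for lam in lambdas]    # hypothetical <dH/dlambda>, kJ/mol

dg_bound = ti_free_energy(lambdas, dhdl_bound)
dg_free = ti_free_energy(lambdas, dhdl_free)
ddg = relative_binding_free_energy(dg_bound, dg_free)

# Hypothetical closed cycle A -> B -> C -> A, one free energy change per leg.
closure = cycle_closure_error([1.5, -0.5, -1.0])
print(f"ddG_bind = {ddg:.2f} kJ/mol, cycle closure = {closure:.2f} kJ/mol")
```

A closure value near zero only shows that the legs are mutually consistent. As the results summarized above indicate, it does not by itself demonstrate that each ensemble average was sampled well enough to have converged.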

As computational power steadily increases, so do expectations that methods for calculating free energy differences, in some cases devised decades ago, will finally be applicable to systems of biological interest. There is little doubt that this will eventually happen, thanks also to the systematic development of more efficient methods. Nevertheless, the complexity of proteins is such that free energy calculations cannot currently be applied routinely and successfully, especially if the system is composed of not one but two proteins. Novel methods to improve sampling certainly exist, but even these will need to be coupled to methods that restrict the system to only the most relevant part of its available conformational space if free energy differences of proteins or peptides are to be determined reliably in the near future.

Summary in Dutch (Samenvatting)

Genetic information is expressed in proteins. Proteins are involved in most, if not all, biological processes, from the activation of receptors to the regulation of cellular processes and the catalysis of chemical reactions. This diversity of activities is made possible by the fact that proteins occur in all kinds of sizes and “shapes” (that is, structures). In most cases each protein has a unique structure (the native structure) that is directly related to its function. Knowledge of the structure of a protein is therefore of great importance for understanding its function or the mechanism that leads to its function. The structure of a protein cannot as yet be derived from the sequence of the amino acids of which it consists. The three-dimensional structure can be elucidated experimentally by X-ray diffraction and NMR spectroscopy. Detailed information on the dynamical properties of a protein cannot, however, be obtained with these experimental methods, and they therefore give only very limited insight into the process of protein folding. Molecular dynamics (MD) simulations can be used to study the dynamical properties of a system in full atomic detail. MD simulations can thus provide better insight into the interaction between proteins and ligands. One can also predict how proteins or parts of proteins associate to reach their lowest free energy configuration.

The ability to determine differences in free energy accurately is of great importance in biophysics and structural biology, since this knowledge allows the prediction of phenomena such as conformational change and protein-ligand binding. Free energy differences can be obtained from numerical simulations using a variety of statistical mechanical approaches. The accuracy of such calculations is determined mainly by two factors: the nature of the underlying model, or force field, and the extent of the sampling during the simulation.

The research described in this thesis focuses mainly on questions concerning sampling and convergence in free energy calculations of protein-protein and protein-ligand interactions. Three different systems were used to study different aspects of the problem:

1) The association of the EPO mimetic peptide 1 (EMP1) with itself was studied as a model for protein association in general. EMP1 was used to understand how β-sheets can rearrange by following how the dimer switches between different states. The study provides insight into how secondary structure elements find and can recognize each other during the process of protein folding. The simulations showed that the formation of the hydrophobic core is the driving force behind the association process and that it is the main stabilizing factor of the different dimer structures observed. The interconversion of the different dimer structures was, however, so slow on the MD time scale that calculating the free energy of dimerization was impossible, since this calculation relies on association-dissociation events occurring frequently.

2) The binding of a series of triphenoxypyridine derivatives to factor Xa and to trypsin was studied as an example of the use of free energy calculations for predicting protein-ligand interactions. The aim of this work was to analyse sampling and convergence in free energy calculations performed with the Thermodynamic Integration (TI) method. The set of derivatives studied represents a stringent test for the use of explicit free energy calculations. The derivatives differed strongly from one another in chemical structure. The difference amounted to up to 19 atoms, with further differences in total charge and diverse binding modes. From the results it could be concluded that simulations of several nanoseconds are too short to give sufficiently accurate values of the binding free energy, and that closing the thermodynamic cycle is useful but wholly insufficient to ensure convergence.

3) The effect of mutation on the dimerization of SUC1 was investigated to assess the usefulness of free energy calculations for predicting changes in protein interaction. The relative free energy of dissociation was calculated for 17 mutants of SUC1 using MD in combination with the TI method. Of all the mutations investigated, only a small fraction of the calculated values agreed with the experimentally determined values. Even predicting the relative stability of the dimer for comparatively simple mutations, such as the 8 mutants in which a leucine residue was changed into an alanine residue, proved problematic. Because of the number and type of mutations investigated, this study is one of the first attempts to determine the applicability of free energy calculations to mutations of proteins. The results strongly indicate that the good agreement with experimental values found in earlier numerical studies was fortuitous. Because a small statistical error was obtained, which usually indicates that the results have converged, a detailed analysis of the sampling and of the convergence properties of free energy calculations was carried out in order to understand the large discrepancy between the calculated and experimental values.

The continuing increase in the computing power of computers supports the expectation that methods for calculating free energy differences, some of which were developed several decades ago, can be applied to biologically relevant systems. There is little doubt that this will eventually happen, thanks in part to the systematic development of more efficient methods. The complexity of proteins is, however, so great that free energy calculations cannot at present be applied routinely and successfully, especially when systems consist not of one but of two or more proteins. New methods that considerably improve sampling certainly exist, but even these will have to be coupled to methods that restrict the freedom of the system to the most relevant part of conformational space if free energy differences of proteins or peptides are to be predicted reliably in the near future.

Acknowledgements

I would first like to thank those without whom I would not be here today and who have always supported me when I needed it. These are of course my mum, my dad, my brother and the rest of the family (grandma, my uncles, aunts and cousins), even though I have made myself scarce over these last years. I would also like to thank my friends who stayed in France, who were always able to welcome me, advise me and comfort me in difficult moments: Jean-Seb, Cristelle, Marie-Alix, Laurence, Lucky, Marilyne and Cedric C.

The Netherlands is not so far away from France, so I was bound to meet some fellow countrymen. Many thanks to Cyril, Joyce, Mylene, Laurent, Gregory, Aymeric, Ilir, Oriane and Eloise for making life here much easier. Living together is not necessarily the easiest thing. Special mention therefore goes to Cedric T. and Benoit, who somehow managed to put up with me, unless it was the other way around, when we lived at Stoeldraaierstraat 13 (and this despite the lack of pressure in the shower and the occasional difficulty in getting hot water).

Italians may or may not be more fun than the French, but they are definitely noisier. Thanks to Rosanna, Martina, Francesca, Eleonora and Cristiano for the nice moments spent together.

I don’t know who told me that these Balkan people were completely crazy, but whoever told me that forgot to mention that they were also amazingly noisy (yes, more than Italians... I know, it’s hard to believe but they are). Many thanks to Sonja, Andrija, Branislava, Diana and eStasinos for some very animated nights.

These acknowledgements are looking more and more like a pile of stereotypes, and this paragraph is not going to change that impression. I would like to thank the large South American community in Groningen for its warm spirit and dancing mood, despite weather that is rather unfavourable to them most of the year, and more particularly Paolita (obviously), Carolita aka Sissy, don Cesar, Andres, Patricio, Anita and Francisco.

Many Sundays would not have been so much fun without the volleyball group, so many thanks to Simon, Simone, Pavel, Michele, Valeria, Christian, Daniele, Stefania, Roberto and Graeme.

Of course I would like to thank my colleagues in the MD group for the nice working atmosphere: Anton, Marc, Berk, Emile, Gerrit, Ronen, Patricia, Hariclia, Danilo, Giorgio, Siewert-Jan, Alex, Fan Hao, Tjerk, Xavier, Sergei, Jelger, Angel, Erik, Volker and Jolanda, and in the NMR group: Ruut, Klaas, Rene and Franziska. Special thanks to my supervisor Alan for the help, patience and advice provided throughout those years.

These acknowledgements would not be complete if I did not mention Charmaine, Tony, Lenny, Isabel and Frans van H. Sorry to put you here, guys, but you did not fit into any of the previous categories.
