
Freie Universität Berlin
Department of Mathematics and Computer Science
Bioinformatics program

Master’s thesis

Accuracy, stability, convergence of rigorous thermodynamic sampling methods

Alexander Riemer∗

2006/08/18

supervised by Dr. Frank Cordes† and Prof. Dr. Paul Wrede‡

∗ Freie Universität Berlin
† Konrad-Zuse-Zentrum für Informationstechnik, Computational Drug Design Group
‡ Charité Universitätsmedizin Berlin, Institute for Molecular Biology and Bioinformatics

Contents

1. Introduction
  1.1. Outline

2. Basics
  2.1. Canonical Ensemble
  2.2. Markov Chain Monte Carlo
  2.3. Molecular Dynamics
  2.4. Hybrid Monte Carlo
  2.5. Conformational space

3. Sampling strategies
  3.1. Overview
  3.2. ZIBgridfree
    3.2.1. Soft-characteristic molecular conformations
    3.2.2. Partitioning by membership basis functions
    3.2.3. The algorithm (outline)
    3.2.4. Presampling
    3.2.5. Choice of nodes
    3.2.6. Sampling of partial densities
    3.2.7. Computation of thermodynamic weights
    3.2.8. Transition and overlap matrix and conformation analysis
    3.2.9. Convergence criterion
    3.2.10. Efficiency of ZIBgridfree
  3.3. Replica Exchange
    3.3.1. Efficiency of the Replica Exchange method
  3.4. ConfJump
    3.4.1. Jump Proposition Matrix
    3.4.2. ConfJump as a rigorous sampling method
    3.4.3. The ConfJump Algorithm
    3.4.4. Efficiency of the ConfJump strategy

4. Convergence diagnostics
  4.1. The Gelman-Rubin Criterion
  4.2. Comparing Sampling Results
  4.3. Symmetry criterion for convergence
    4.3.1. Applicability of the symmetry criterion
    4.3.2. Automatic detection of molecule symmetries

5. Numerical Experiments
  5.1. Performance measure for sampling runs
  5.2. Molecules used for this study
  5.3. Simulation details and choice of parameters

6. Results
  6.0.1. L-Benzylsuccinate
  6.0.2. Trimethoprim
  6.0.3. BSI
  6.0.4. Performance comparison

7. Conclusion

A. Algorithm for automatic detection of molecule symmetries

Bibliography

1. Introduction

Ever since the advent of computers, the behavior of microscopic systems of particles has been a primary subject of computer simulations. The basic methodology for such simulations, most notably Monte Carlo methods [42, 43, 44] and molecular dynamics [36], had already been developed by the 1950s. Computer simulations are expected to provide insight into structural or dynamic properties of molecular systems that cannot be observed directly. Of special interest are biochemical molecular systems, where large molecules or clusters of molecules, primarily proteins, act like molecular machines which perform a multitude of different functions in metabolism, transport processes, coordinated movement, immune defense, and signal transduction [38]. With the computational power available in massively parallel computer systems today, it has become possible to explore the structure, function and dynamics of ever larger and more complex biochemical systems in a mathematically rigorous way. This has led to the emergence of the discipline of computational drug design, the aim of which is to identify novel drug molecules which bind to a given receptor molecule, thus providing new impulses for pharmaceutical research.

The biochemical function of a molecule is essentially determined by its 3-dimensional structure. The interaction between two biomolecules, e.g. of a small molecule called a ligand with a protein, its target, is only possible when the ligand sterically “fits” into the target’s binding site. Typical ligands are highly flexible biomolecules that can switch between several metastable conformations, each of which has a different 3-dimensional shape. In order to predict in silico whether a ligand binds to a target or not, the ligand’s main conformations have to be known. So-called “3D structure generators” such as CONCORD [50] and CORINA [24] apply different heuristics to databases of known structures in order to quickly generate representatives of molecular conformations. However, there is no way to estimate the error of such methods, and they return no information at all about the statistical distribution of the generated representatives. Information about the statistical weights of the different conformations, the steric variance within a conformation, and the transition probabilities between conformations allows predicting the dynamics of the intermolecular interaction under study, and it can only be obtained from thermodynamic simulations. A transition from one conformation to another can be brought about by (possibly simultaneous and correlated) rotations around single bonds within the molecule. Additionally, the interaction between two biomolecules, which is a dynamic process, induces conformational changes in each of the reactants. This allows them to recruit further interaction partners, which can lead to cascades of interactions, as they are found in the signalling pathways of all eukaryotic organisms [3]. In this thesis, however, the focus will be on exploring the static thermodynamic distribution of biomolecules, and transition processes will be of minor concern.

While biomolecules can be very flexible, they do not assume all states in conformational space with equal probability. The probability of a transition from one molecule configuration to another is determined by the difference in total energy between the two configurations. Thus, the energy landscape associated with the conformational space defines a statistical distribution which favors low-energy states while strongly suppressing physically forbidden states (e.g. two atoms can neither overlap nor move too far away from each other while connected by a chemical bond).

In order to determine the metastable conformations of a biomolecule, a cluster analysis is performed on a large sample of molecule configurations, which is generated in a sampling phase according to the thermodynamically correct distribution at the desired temperature.

The goal of this work is to compare three methods for exploring a molecule’s statistical distribution in conformational space with respect to the following questions:

• How fast does a method converge to the “true” distribution?

• How sensitive is it to the choice of initial configurations?

• How closely does a method approximate the “true” distribution in a given time?

• What is the computational cost of each method?

The three methods under consideration are all based on the hybrid Monte Carlo method but use a variety of approaches to accelerate convergence compared to a simple hybrid Monte Carlo approach.

ZIBgridfree uses a meshless partitioning of the conformational space. The method was developed by Marcus Weber in his doctoral thesis [73]; it was originally implemented as “HuMFree” by Holger Meyer from April 2004 to February 2005 in the course of his master’s thesis [45]. The Replica Exchange method was added by Alexander Riemer from August to November 2005. It uses independent sampling runs at different temperatures which are allowed to exchange positions at certain intervals. The sampling strategy “ConfJump” [71] uses known minima of the potential energy surface to accelerate sampling by randomly introducing jumps from the proximity of one minimum to the proximity of another one, thus effectively escaping “trapping” within the basin of attraction of one local minimum. It was developed by Lionel Walter and Marcus Weber and implemented by Lionel Walter from October 2005 to June 2006. The three techniques have been implemented within a common framework that allows them to be combined easily [47].

This thesis aims at more than just a comparison of different sampling methods. Methods for monitoring the convergence of a Markov chain Monte Carlo sampling and for comparing the quality of different sampling runs needed to be developed in the first place. Almost as a byproduct of this thesis, a graph-theoretic recursive algorithm has been developed to find all rotationally symmetric functional groups in arbitrary biomolecules.

A metric is proposed for measuring the difference between two sampling results (cf. 4.2) in order to be able to compare sampling methods. Further, a variant of this metric is used to define a new semi-empirical convergence indicator based on molecule symmetries (cf. 4.3). The performance of each of the three sampling techniques under consideration is assessed in a series of numerical experiments conducted on three increasingly complex biomolecules (see chapter 5).

1.1. Outline

The following chapter gives an overview of the basics of statistical mechanics and conformation analysis. Both molecular dynamics and Markov chain Monte Carlo methods are presented in chapter 2, along with the hybrid Monte Carlo approach, which combines the two.

The three sampling techniques under consideration, ZIBgridfree, Replica Exchange, and ConfJump, will be presented in chapter 3. Theoretical considerations concerning the efficiency of each method compared to pure hybrid Monte Carlo will be given as well.

Chapter 4 deals with the issue of convergence diagnostics, i.e. algorithms for estimating whether the thermodynamic distribution sampled by a molecular simulation is sufficiently close to the molecule’s “true” distribution. In addition, methods for comparing the different sampling strategies are developed in the same chapter: In section 4.2, a metric for measuring the difference between two sampling results is developed, which is based on histograms over 1-dimensional sampled distributions. Based on this metric, a semi-empirical convergence criterion that employs knowledge about rotational symmetries in the molecule under consideration is developed in section 4.3. In connection with this symmetry criterion, a graph-theoretic algorithm for the automatic detection of molecule symmetries is developed in section 4.3.2.

The numerical simulations that were performed for assessing the performance of the different sampling methods are described in chapter 5. A measure for the performance of a sampling technique is developed in section 5.1, based on the metric developed in section 4.2.

Chapter 6 presents the results of the numerical experiments.

Finally, a conclusion is given in chapter 7 along with an outlook, especially regarding the future goal of simulating large molecular systems.


2. Basics

The goal of a conformation analysis is to divide a molecule’s conformational space into metastable regions, i.e. to find a partition of the conformational space with the following property: For a given period of time, the transition probability from one region to itself is high, while transition probabilities between any two different metastable regions are minimal. Transition here means physically feasible transitions within that period of time according to the system’s dynamics, as usually simulated by molecular dynamics (cf. 2.3).

A conformation analysis that divides conformations based on differences in free energy consists of two phases, sampling and clustering. During the sampling phase, molecule configurations are generated according to the correct thermodynamic distribution of the molecule at a given sampling temperature. Afterwards, these configurations are clustered into metastable regions.

The molecule to be analyzed is given by its N atoms and the bonds between them, as well as atom and bond types. A specific 3-dimensional molecule configuration is described as a position state q of the system, which is a 3N-dimensional vector of atom coordinates $q \in \mathbb{R}^{3N} = \Omega$. Let $p \in \mathbb{R}^{3N}$, analogously to q, be the collective momentum vector and let M, a 3N × 3N matrix with

$$
M_{ij} = \begin{cases} m_{\lfloor (i-1)/3 \rfloor + 1}, & i = j,\\ 0, & \text{else}, \end{cases} \tag{2.1}
$$

be the mass matrix of the molecule, where $m_k$ is the mass of atom k.

In classical mechanics, the total energy of a microstate (q, p) of the system is described by a separable Hamiltonian

$$
H(q, p) = V(q) + K(p) = V(q) + \tfrac{1}{2}\, p^\top M^{-1} p, \tag{2.2}
$$

which is the sum of the potential energy V and the kinetic energy K. K depends only on the momenta p, while V can be calculated from the positions q alone. The kinetic energy can be calculated directly from atom masses and momenta, whereas the potential V is approximated by a molecular force field, which describes V as the sum of energy terms for bonded and non-bonded interactions between atoms. All methods discussed in this thesis have been implemented using the Merck molecular force field (MMFF) [31].
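A minimal NumPy sketch of equations 2.1 and 2.2 may help to fix the notation. It assumes a diagonal mass matrix stored as a dense array for clarity and arbitrary units, and it replaces the MMFF potential by a hypothetical harmonic toy function `toy_v`; all function names are illustrative and not part of the software described in this thesis.

```python
import numpy as np


def mass_matrix(atom_masses):
    """Diagonal 3N x 3N mass matrix of eq. (2.1): the mass m_k of atom k
    is repeated for its x, y and z coordinates."""
    return np.diag(np.repeat(np.asarray(atom_masses, dtype=float), 3))


def kinetic_energy(p, M):
    """K(p) = 1/2 p^T M^{-1} p, exploiting that M is diagonal."""
    return 0.5 * float(np.sum(p * p / np.diag(M)))


def hamiltonian(q, p, M, potential):
    """Separable Hamiltonian H(q, p) = V(q) + K(p), eq. (2.2).
    `potential` is any callable q -> V(q); a force field would go here."""
    return potential(q) + kinetic_energy(p, M)


def toy_v(x):
    """Hypothetical harmonic stand-in for the force field potential V(q)."""
    return 0.5 * float(x @ x)


# Usage: two atoms with masses 1 and 2 (arbitrary units).
M = mass_matrix([1.0, 2.0])
q = np.zeros(6)
p = np.array([1.0, 0.0, 0.0, 2.0, 0.0, 0.0])
print(hamiltonian(q, p, M, toy_v))     # 0.5 * (1/1 + 4/2) = 1.5
```

Storing only the diagonal of M would be the natural choice for large molecules; the dense matrix is used here only to mirror equation 2.1 literally.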


2.1. Canonical Ensemble

Rather than simulating the behavior of an individual molecule over time, molecules are simulated within a statistical ensemble, which assigns a probability measure to any point (q, p) in the molecule’s phase space $\Gamma = \Omega \times \mathbb{R}^{3N}$. It can be thought of as a large (possibly infinite) number of realizations of a random experiment, in this case observing the microstate of the molecule at an arbitrary point in time [21]. Sampling then consists of generating the ensemble according to the underlying probability distribution, i.e. drawing samples from that distribution. In conformation dynamics, we are interested in ensembles that are in thermodynamic equilibrium, i.e. stationary ensembles, in which the underlying probability density function is time-independent.

The quantities of interest are the expected values of observables over the statistical ensemble. An observable is any function $A : \Gamma \to \mathbb{R}$ that assigns a real number to every point (q, p) in phase space. Examples of observables are the total energy H, the potential energy V, the kinetic energy K, geometric properties such as the value of a particular torsion angle in the molecule (cf. 2.5), but also the degree of membership for a certain metastable conformation (see section 3.2.1).

In the canonical or NVT ensemble, a molecule is considered a subsystem of fixed volume V that is embedded in an infinitely large thermal bath with a constant temperature T, with which it continuously exchanges energy, while its mean kinetic energy remains constant in the limit. Since chemical reactions are not allowed in the simulation, the number of particles N in the system is also constant [23].

Phase space microstates (q, p), which describe the system’s positions q and momenta p, are distributed according to a Boltzmann distribution:

$$
\pi(q, p) = \frac{1}{Q} \exp\bigl(-\beta H(q, p)\bigr), \tag{2.3}
$$

where

$$
Q = \int_\Gamma \exp\bigl(-\beta H(q, p)\bigr)\, dq\, dp \tag{2.4}
$$

is a normalization factor used to make π a probability distribution. Since it is an integral over all possible states of a 6N-dimensional system, this partition function can only be calculated analytically for the simplest systems. The temperature enters the equation in the form of the inverse temperature $\beta = 1/(k_B T)$, where $k_B = 1.38065 \cdot 10^{-23}\,\mathrm{J/K}$ is Boltzmann’s constant.
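As a small worked example of the role of β, the sketch below computes the relative Boltzmann weight exp(−β ΔE) of a state lying ΔE above a reference state; because only a ratio is needed, the partition function cancels. It assumes, for illustration, energies given per mole in kJ/mol (as is common for force fields) and converts them to energies per molecule with Avogadro's constant; the SI value of k_B used here agrees with the value quoted above to the given precision. The helper names are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant in J/K (exact SI value)
N_A = 6.02214076e23     # Avogadro constant in 1/mol (exact SI value)


def beta(temperature_k):
    """Inverse temperature beta = 1 / (k_B T) in 1/J."""
    return 1.0 / (K_B * temperature_k)


def boltzmann_ratio(delta_e_kj_per_mol, temperature_k):
    """Relative Boltzmann weight exp(-beta * dE) of a state lying dE above
    a reference state. Assumption: dE is given in kJ/mol."""
    delta_e_joule = delta_e_kj_per_mol * 1e3 / N_A   # energy per molecule in J
    return math.exp(-beta(temperature_k) * delta_e_joule)


# A state 5 kJ/mol higher in energy is roughly 7 times less likely at 300 K.
print(boltzmann_ratio(5.0, 300.0))
```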

Since, in the canonical ensemble, the system is in constant thermal contact with the environment, there is no limitation on the total energy of an actual state of the system. Thus, any microstate (q, p) is in principle reachable, and every open subset of Γ has a non-zero probability.

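Substituting equation 2.2 into equation 2.3 yields

$$
\pi(q, p) = \frac{1}{Q_q Q_p} \exp\bigl(-\beta V(q)\bigr) \exp\bigl(-\beta K(p)\bigr) = \rho(q) \cdot \eta(p), \tag{2.5}
$$

where $Q_q = \int_\Omega \exp(-\beta V(q))\, dq$ and $Q_p = \int_{\mathbb{R}^{3N}} \exp(-\beta K(p))\, dp$, so that $Q = Q_q Q_p$, $\rho(q) = \exp(-\beta V(q))/Q_q$ and $\eta(p) = \exp(-\beta K(p))/Q_p$.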
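Equation 2.5 implies that the momentum distribution η is Gaussian: with K(p) = ½ pᵀM⁻¹p and a diagonal M, each momentum component is independently normally distributed with mean 0 and variance m_k/β. This is what makes drawing momenta (as needed later for hybrid Monte Carlo) straightforward. The sketch assumes arbitrary units and β = 2 purely for illustration; `sample_momenta` is a hypothetical helper.

```python
import numpy as np


def sample_momenta(atom_masses, beta, rng):
    """Draw p ~ eta(p), eta proportional to exp(-beta * p^T M^{-1} p / 2).
    For diagonal M this factorizes into Gaussians with variance m_k / beta."""
    m_diag = np.repeat(np.asarray(atom_masses, dtype=float), 3)
    return rng.normal(0.0, np.sqrt(m_diag / beta))


rng = np.random.default_rng(0)
masses = [1.0, 16.0]                      # two atoms, arbitrary units (assumption)
beta = 2.0                                # illustrative inverse temperature
samples = np.array([sample_momenta(masses, beta, rng) for _ in range(20_000)])
print(samples.var(axis=0))                # close to [0.5]*3 + [8.0]*3
```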


The thermodynamic distribution can be split into independent distributions ρ of positions and η of momenta. In fact, in a conformation analysis, one is only interested in observables in q. Therefore, it is sufficient to sample from the position distribution ρ. The expected value of an observable $A : \Omega \to \mathbb{R}$ is the integral

$$
\langle A \rangle_\rho = \int_\Omega A(q)\, \rho(q)\, dq = \frac{1}{Q_q} \int_\Omega A(q) \exp\bigl(-\beta V(q)\bigr)\, dq \tag{2.6}
$$

over the whole position space Ω. Let $(q_1, \ldots, q_n)$ be an independent sequence of molecule configurations distributed according to ρ. Then it follows from the law of large numbers that the sample mean

$$
\bar{A} = \frac{1}{n} \sum_{i=1}^{n} A(q_i) \tag{2.7}
$$

converges almost surely to the expected value $\langle A \rangle_\rho$ for $n \to \infty$ [55]. In addition, it follows from the central limit theorem that with increasing n the sampling error, i.e. the deviation of $\bar{A}$ from $\langle A \rangle_\rho$, decreases asymptotically as $O(1/\sqrt{n})$.
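How the statistical error of the sample mean shrinks with n can be illustrated with independent samples from a distribution whose expectation is known. The toy setting below is an assumption made for illustration only: q is drawn from a standard normal distribution and A(q) = q², so that ⟨A⟩ = 1. The estimated standard error s/√n shrinks by a factor of about 10 for every 100-fold increase in n.

```python
import numpy as np


def sample_mean_with_error(values):
    """Sample mean (eq. 2.7) and its estimated standard error s / sqrt(n).
    The error estimate assumes independent samples."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)


rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    # Toy observable (assumption): A(q) = q^2 with q ~ N(0, 1), so <A> = 1.
    mean, err = sample_mean_with_error(rng.normal(size=n) ** 2)
    print(f"n = {n:>9}: mean = {mean:.4f}, standard error = {err:.5f}")
```

For correlated Markov chain samples the same estimator of the mean remains usable, but s/√n underestimates the error unless the autocorrelation of the chain is taken into account.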

However, as the partition function $Q_q$ is unknown and hard to compute, it is not possible to directly draw samples from ρ. The Markov chain Monte Carlo approach (MCMC) generates samples from a probability distribution ρ by constructing an ergodic Markov chain that has ρ as its unique stationary distribution.

2.2. Markov Chain Monte Carlo

The MCMC method was developed in the late 1940s and early 1950s by Metropolis, Ulam, Fermi, von Neumann, Teller et al. for studying the diffusion of neutrons in fissile material and also already for molecular simulations. This work led to the Metropolis algorithm, which was published in 1953 [43]. As stated above, the idea is to generate a dependent sequence $(q^{(n)})$ of random vectors $q^{(n)} \in \Omega$ that are distributed according to ρ for $n \to \infty$.

This Markov chain must be ergodic and satisfy the detailed balance condition

$$
\rho(q)\, P(q \to \tilde q) = \rho(\tilde q)\, P(\tilde q \to q) \tag{2.8}
$$

in order for its unique stationary distribution to be the thermodynamically correct equilibrium distribution ρ. The sampling error still decreases as $O(1/\sqrt{n})$, just as for independent random vectors, although the constant factor grows with the autocorrelation of the chain [21]. Detailed balance is a restatement of the constraint of thermodynamic equilibrium, i.e. in the limit there is no net flow between any two open subsets A and B of the position space Ω. It is also called microscopic reversibility.


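Substituting ρ from equation 2.5 into equation 2.8 yields

$$
\frac{P(q \to \tilde q)}{P(\tilde q \to q)} = \frac{\exp\bigl(-\beta V(\tilde q)\bigr)}{\exp\bigl(-\beta V(q)\bigr)} = \exp(-\beta \Delta V), \qquad \Delta V = V(\tilde q) - V(q). \tag{2.9}
$$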

The ratio of the probabilities of a transition and its reversal depends only on the potential at the positions q and $\tilde q$, since the partition function $Q_q$ cancels. It is thus directly computable with only two evaluations of the force field (which, in the case of MMFF and many other force field models, has a computational cost of $O(N^2)$).

The Metropolis algorithm is one of the most popular MCMC strategies in use today. It splits the transition from a state q to a state $\tilde q$ into two steps: a trial step and an acceptance step, in which the new state $\tilde q$ is accepted with probability $P_{acc}$ and rejected in favor of remaining in q with probability $1 - P_{acc}$:

$$
P(q \to \tilde q) = P_{gen}(q \to \tilde q) \cdot P_{acc}(q \to \tilde q). \tag{2.10}
$$

The trial step must ensure that every state $\tilde q \in \Omega$ is in principle reachable, i.e. any open subset of Ω must have a non-zero probability. If, in addition, the trial step is chosen symmetrically, i.e. $P_{gen}(q \to \tilde q) = P_{gen}(\tilde q \to q)$, equation 2.9 becomes

$$
\frac{P_{acc}(q \to \tilde q)}{P_{acc}(\tilde q \to q)} = \exp(-\beta \Delta V). \tag{2.11}
$$

By choosing

$$
P_{acc}(q \to \tilde q) = \min\left\{1, \frac{\rho(\tilde q)}{\rho(q)}\right\} = \min\bigl\{1, \exp(-\beta \Delta V)\bigr\}, \tag{2.12}
$$

this equation is easily satisfied. This choice of acceptance probability is called the Metropolis criterion.

The resulting Markov chain is irreducible because, in the trial generation algorithm, any position state is in principle reachable from any other, so that every open subset of Ω has a non-zero probability, i.e. all the states communicate. Due to the possibility of rejecting a trial, the Markov chain is also aperiodic. A Markov chain that is both irreducible and aperiodic is ergodic [2]. Therefore, the unique stationary distribution exists. Detailed balance, the algorithm used for generating trials, and the acceptance criterion ensure that it is the Boltzmann distribution at the sampling temperature T.

Constructing an ergodic Markov chain whose transitions are split into trial and acceptance steps, with the additional constraint of a symmetric trial step and the Metropolis acceptance criterion, allows generating random variables distributed according to ρ without knowledge of the partition function $Q_q$.
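The detailed balance argument can be checked numerically on a toy example. The sketch below assumes a finite state space of three states with given potential energies (instead of the continuous space Ω) and a uniform proposal among the other states, which is symmetric. It builds the resulting Metropolis transition matrix and verifies equation 2.8 and the stationarity of the Boltzmann distribution; the helper name is hypothetical.

```python
import numpy as np


def metropolis_transition_matrix(energies, beta):
    """Transition matrix of a Metropolis chain on a finite toy state space.
    Proposal (assumption): one of the other k - 1 states uniformly (symmetric).
    Acceptance: min(1, exp(-beta * dV)), eq. (2.12)."""
    v = np.asarray(energies, dtype=float)
    k = v.size
    P = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                P[i, j] = min(1.0, np.exp(-beta * (v[j] - v[i]))) / (k - 1)
        P[i, i] = 1.0 - P[i].sum()     # rejected trials remain in state i
    return P


energies = np.array([0.0, 1.0, 2.5])
beta = 1.0
P = metropolis_transition_matrix(energies, beta)
rho = np.exp(-beta * energies)
rho /= rho.sum()                        # Boltzmann distribution on three states
flux = rho[:, None] * P                 # flux[i, j] = rho_i * P(i -> j)
print(np.allclose(flux, flux.T))        # detailed balance, eq. (2.8): True
print(np.allclose(rho @ P, rho))        # rho is stationary: True
```

Stationarity follows from detailed balance by summing equation 2.8 over all source states, which is exactly what the second check confirms numerically.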


The Metropolis Algorithm

Starting from an initial configuration $q^{(0)}$, repeat the following:

1. From the current state $q^{(i)} = q$, generate a trial $\tilde q$ by a perturbation technique with a symmetric proposal probability, $P_{gen}(q \to \tilde q) = P_{gen}(\tilde q \to q)$, that can in principle reach every state.

2. Calculate the acceptance probability

$$
P_{acc}(q \to \tilde q) = \min\bigl\{1, \exp(-\beta \Delta V)\bigr\}.
$$

3. Generate a uniformly distributed random number ζ ∈ [0, 1).

4. Set the new configuration

$$
q^{(i+1)} := \begin{cases} \tilde q, & \zeta < P_{acc},\\ q, & \text{else}. \end{cases} \tag{2.13}
$$

This is done either for a fixed number of times n or until some error measure indicates convergence. See chapter 4 for a discussion of convergence monitors.
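The steps of the Metropolis algorithm can be sketched in a few lines of Python. The trial step used here, a symmetric Gaussian random walk with a fixed step size, is an assumption of this sketch and not the perturbation used in this thesis; the function name and the harmonic toy potential are likewise hypothetical.

```python
import math
import numpy as np


def metropolis(potential, q0, beta, n_steps, step_size, rng):
    """Metropolis algorithm as listed above. Trial step (assumption of this
    sketch): symmetric Gaussian random walk q_tilde = q + step_size * N(0, I).
    Returns the chain and the acceptance ratio of eq. (2.14)."""
    q = np.array(q0, dtype=float)
    v_q = potential(q)
    chain = np.empty((n_steps, q.size))
    accepted = 0
    for i in range(n_steps):
        q_trial = q + step_size * rng.normal(size=q.shape)     # step 1
        v_trial = potential(q_trial)
        d = beta * (v_trial - v_q)
        p_acc = 1.0 if d <= 0.0 else math.exp(-d)             # step 2, eq. (2.12)
        if rng.random() < p_acc:                              # steps 3 and 4
            q, v_q = q_trial, v_trial
            accepted += 1
        chain[i] = q
    return chain, accepted / n_steps


# Usage (toy assumption): V(q) = |q|^2 / 2 at beta = 1, whose Boltzmann
# distribution is the standard normal distribution.
rng = np.random.default_rng(0)
chain, acc_ratio = metropolis(lambda x: 0.5 * float(x @ x), [3.0], 1.0, 20_000, 2.4, rng)
print(acc_ratio, chain[1000:].mean(), chain[1000:].var())
```

The step size directly controls the trade-off discussed in the next paragraph: larger steps move further but are accepted less often.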

The main problem of this algorithm is that, in order to be efficient, it has to propose a new configuration $\tilde q$ that is substantially different from q but also has a high probability of being accepted, i.e. $\tilde q$ must be of similar or lower potential energy than q but as far away from q as possible, so that the algorithm covers a large region of space in a short time. The efficiency of any sampling strategy that is based on the Metropolis algorithm or its generalization, the Metropolis-Hastings algorithm [33], depends on two quantities:

• the computational cost of the trial step and

• the average acceptance probability or, alternatively, the

$$
\text{acceptance ratio} = \frac{\#\,\text{accepted steps}}{n}. \tag{2.14}
$$

The hybrid Monte Carlo approach employs molecular dynamics to generate trials with a high acceptance ratio at an acceptable computational cost.

2.3. Molecular Dynamics

Molecular dynamics (MD) [21, 23, 37, 55] simulates the behavior of a molecular system over time as a many-body system in terms of classical mechanics, i.e. it solves Newton’s or, equivalently, Hamilton’s equations of motion for the given system by numerical integration. Quantum effects such as induced changes in the electronic density of a molecule are ignored. In contrast to Monte Carlo approaches, molecular dynamics is a deterministic method.

MD simulates the motion of the system under the influence of a specified force field (the potential energy function V). Given an ideal integrator, MD reproduces the correct physical behavior of the system over time, within the limitations of classical mechanics and the force field used to describe interactions between atoms.

For a system of N atoms, the equations of motion can be written as a set of two differential equations:

$$
v(t) = \dot q(t), \qquad F(q) = M \dot v(t) = -\nabla V\bigl(q(t)\bigr). \tag{2.15}
$$

The velocity v of a particle, i.e. its momentum p divided by its mass m, is the derivative of that particle’s position with respect to time. The force F acting on a particle is the negative gradient of the potential at the particle’s position. Additional terms are sometimes added to the force F to simulate interactions with the environment.

Since analytic solutions for this system of differential equations are known only for very simple systems, it is necessary to employ numerical integrators. A very popular integrator used in molecular dynamics is the velocity Verlet integrator [23, 65]. Like the Euler integrator and other Verlet-type integrators, it is derived from a Taylor expansion of the trajectory q(t). Verlet integrators are based on a second-order Taylor approximation in which the third-order terms cancel, thus leaving a local error in position of $O(\tau^4)$, where τ is the length of the integration step [69]. The velocity Verlet integrator updates position q and velocity $\dot q = v$ according to the following equations:

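$$
\begin{aligned}
q(t+\tau) &= q(t) + \tau\, \dot q(t) + \frac{\tau^2}{2} M^{-1} F(t),\\
\dot q(t+\tau) &= \dot q(t) + \frac{\tau}{2} M^{-1} \bigl(F(t) + F(t+\tau)\bigr).
\end{aligned} \tag{2.16}
$$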

The time step length τ is typically on the order of $1\,\mathrm{fs} = 10^{-15}\,\mathrm{s}$ so as to be able to correctly simulate high-frequency processes such as bond vibrations or bond-angle oscillations. By repeatedly applying equations 2.16, starting from some initial configuration $(q(0), \dot q(0))$, a trajectory is generated which describes the change of the dynamic variables with time.
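Equations 2.16 translate almost line by line into code. The sketch below assumes a diagonal mass matrix (passed as the diagonal of M⁻¹) and a user-supplied force function; the 1D harmonic oscillator in the usage part is a toy assumption, not a molecular system. Besides the approximate energy conservation, it checks numerically the time reversibility that hybrid Monte Carlo relies on (section 2.4): integrating back from the final state with negated velocity recovers the start, up to rounding errors.

```python
import numpy as np


def velocity_verlet(q, v, force, inv_mass, tau, n_steps):
    """Velocity Verlet integrator, eq. (2.16).
    force(q) must return F = -grad V(q); inv_mass holds the diagonal of M^{-1}."""
    q = np.array(q, dtype=float)
    v = np.array(v, dtype=float)
    f = force(q)
    for _ in range(n_steps):
        q = q + tau * v + 0.5 * tau**2 * inv_mass * f     # position update
        f_new = force(q)
        v = v + 0.5 * tau * inv_mass * (f + f_new)        # velocity update
        f = f_new
    return q, v


# Usage (toy assumption): 1D harmonic oscillator V(q) = q^2 / 2 with unit mass.
force = lambda x: -x
inv_m = np.array([1.0])
q1, v1 = velocity_verlet([1.0], [0.0], force, inv_m, 0.01, 1000)
energy = 0.5 * (v1 @ v1 + q1 @ q1)          # stays close to the initial 0.5
# Time reversibility: integrate back from (q1, -v1) to recover the start.
q0, v0 = velocity_verlet(q1, -v1, force, inv_m, 0.01, 1000)
print(energy, q0, -v0)                      # approximately 0.5, [1.0], [0.0]
```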

In contrast to the MCMC approach, which samples the canonical ensemble, a molecular dynamics trajectory samples a part of the microcanonical or NVE ensemble, in which the number of particles N, the volume V and the total energy E = H are constant. Since states of constant energy are not necessarily connected, an MD trajectory is not an ergodic Markov chain. Moreover, an integrator that exactly conserves energy is theoretically impossible [21]. However, symplectic integrators such as varieties of the Verlet integrator conserve the total energy of the system on average. A variety of approaches exists for molecular dynamics in other ensembles such as the NVT ensemble, e.g. rescaling momenta or adding correcting terms to the force F in equation 2.16 [23, 55].

The average of an observable $A : \Omega \to \mathbb{R}$ on an MD trajectory of n time steps starting at time $t_0 = 0$ is calculated as the time average over the trajectory:

$$
\bar{A} = \frac{1}{n} \sum_{i=0}^{n-1} A\bigl(q(i\tau)\bigr). \tag{2.17}
$$

The ergodic hypothesis [21, 49, 55], which is one of the fundamental axioms of statistical mechanics, posits that a molecular system will assume all possible microstates (q, p) within some ergodic component $\tilde\Omega \subseteq \Omega$ (which contains all points that are compatible with the constraint of conservation of energy, or of conservation of energy on average) for $t \to \infty$ ($n \to \infty$). Therefore, the time average of an observable A exists and is unique on $\tilde\Omega$. The ergodic hypothesis states further that $\bar{A}$, as calculated by equation 2.17, converges to the expected value of A over the microcanonical ensemble,

$$
\bar{A}_\infty = \langle A \rangle_{\rho_{NVE}}. \tag{2.18}
$$

In practice, the ergodic hypothesis usually cannot be proven and may even be false in special cases.

While molecular dynamics has a number of advantages, such as simulating “true” dynamics, which allows estimating kinetic properties of the system, it also has severe disadvantages, especially when applied to conformation dynamics:

• numerical errors are strongly amplified, which effectively rules out simulations over long periods of time,

• the time step τ is very short, so that an MD trajectory can only cover a small region of phase space in a given period of time, and

• since MD simulations model the system’s true dynamics, they tend to get trapped within basins of attraction of local minima (metastabilities) for long times.

2.4. Hybrid Monte Carlo

The hybrid Monte Carlo strategy (HMC) [7, 12, 19, 21] combines Markov chain Monte Carlo and molecular dynamics in order to efficiently generate samples from the canonical ensemble of the molecule at the given temperature T. HMC is a Markov chain Monte Carlo method based on the Metropolis-Hastings algorithm [33], which, in contrast to the Metropolis algorithm, does not require a symmetric trial step. The Metropolis-Hastings algorithm satisfies equation 2.9, and thus detailed balance, by choosing the following acceptance criterion:

$$
P_{acc}(q \to \tilde q) = \min\left\{1, \frac{\rho(\tilde q)\, P_{gen}(\tilde q \to q)}{\rho(q)\, P_{gen}(q \to \tilde q)}\right\} = \min\left\{1, \exp(-\beta \Delta V)\, \frac{P_{gen}(\tilde q \to q)}{P_{gen}(q \to \tilde q)}\right\}. \tag{2.19}
$$

In hybrid Monte Carlo, the trial step consists of a short MD trajectory. Generating trials in this way has a moderate computational cost but also a high probability of acceptance in the subsequent acceptance step, since MD conserves the system’s total energy on average and thus only generates physically meaningful configurations. The method requires that MD simulations be performed with an integrator that is both time-reversible and volume-preserving in phase space [21]. The symplectic velocity Verlet integrator (given in equations 2.16) has both properties.

For every step in the Markov chain, a short MD trajectory is computed, starting from the current position state q and a randomly generated momentum state p, which is distributed according to the Boltzmann distribution η (see equation 2.5). Since molecular dynamics is deterministic, the outcome $(\tilde q, \tilde p)$ of the MD simulation depends only on the initial state (q, p). As the initial position state q is given, the trial probability in equation 2.19 depends only on the distribution of the initial momenta p:

$$
P_{gen}(q \to \tilde q) = \eta(p) = \frac{1}{Q_p} \exp\bigl(-\beta K(p)\bigr). \tag{2.20}
$$

The integrator used in the MD simulation is reversible, i.e. if the state $(\tilde q, \tilde p)$ is generated from (q, p) in l iterations, then l integration steps starting from $(\tilde q, -\tilde p)$ will generate the state (q, −p). Therefore, the probability of generating q from $\tilde q$ depends only on the distribution of the start momenta $-\tilde p$:

$$
P_{gen}(\tilde q \to q) = \eta(-\tilde p) = \frac{1}{Q_p} \exp\bigl(-\beta K(-\tilde p)\bigr). \tag{2.21}
$$

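The kinetic energy K is a quadratic function of the momenta p (see equation 2.2). Therefore, $K(-\tilde p) = K(\tilde p)$. Thus, equation 2.19 becomes

$$
P_{acc}(q \to \tilde q) = \min\left\{1, \exp(-\beta \Delta V)\, \frac{\exp\bigl(-\beta K(\tilde p)\bigr)}{\exp\bigl(-\beta K(p)\bigr)}\right\} = \min\bigl\{1, \exp(-\beta \Delta H)\bigr\}, \tag{2.22}
$$

with $\Delta H = H(\tilde q, \tilde p) - H(q, p)$.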

The acceptance probability of a hybrid Monte Carlo step therefore depends on the total energy H. However, the trial momentum $\tilde p$ is discarded (as is the whole MD trajectory), and only the next position (q or $\tilde q$) needs to be stored. It is worth noting that if the system’s total energy H is exactly conserved along the trajectory, the trial is accepted with probability 1.


The HMC Algorithm

Starting from an initial configuration $q^{(0)}$, repeat the following:

1. Draw a random collective momentum vector p from the Boltzmann distribution η for the simulation temperature T.

2. Let $q = q^{(i)}$ denote the current position state. Run a short MD simulation of a fixed length l starting from (q, p). Let $(\tilde q, \tilde p)$ denote the microstate after l iterations.

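3. Calculate the acceptance probability

$$
P_{acc}(q \to \tilde q) = \min\bigl\{1, \exp(-\beta \Delta H)\bigr\}.
$$

4. Generate a uniformly distributed random number ζ ∈ [0, 1).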
5. Set the new configuration

$$
q^{(i+1)} := \begin{cases} \tilde q, & \zeta < P_{acc},\\ q, & \text{else}. \end{cases} \tag{2.23}
$$

Again, this is done either for a fixed number of times n or until convergence is detected.
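Putting the pieces together, the HMC algorithm can be sketched for a diagonal mass matrix. The sketch works with momenta rather than velocities (p = Mv, which turns equations 2.16 into the common "kick-drift-kick" form) and uses a harmonic toy potential in the usage part; the function name, its parameters and the toy potential are assumptions for illustration, whereas a molecular application would supply the MMFF energy and gradient instead.

```python
import math
import numpy as np


def hmc(potential, grad_potential, q0, masses, beta, n_samples, n_md_steps, tau, rng):
    """Hybrid Monte Carlo following the HMC algorithm above (sketch).
    masses holds one mass per coordinate, i.e. the diagonal of M."""
    q = np.array(q0, dtype=float)
    m = np.asarray(masses, dtype=float)
    chain = np.empty((n_samples, q.size))
    accepted = 0
    for i in range(n_samples):
        p = rng.normal(0.0, np.sqrt(m / beta))        # step 1: p ~ eta
        q_t, p_t = q.copy(), p.copy()
        f = -grad_potential(q_t)
        for _ in range(n_md_steps):                   # step 2: velocity Verlet in (q, p)
            p_t = p_t + 0.5 * tau * f
            q_t = q_t + tau * p_t / m
            f = -grad_potential(q_t)
            p_t = p_t + 0.5 * tau * f
        h_old = potential(q) + 0.5 * np.sum(p * p / m)
        h_new = potential(q_t) + 0.5 * np.sum(p_t * p_t / m)
        d = beta * (h_new - h_old)                    # step 3: P_acc = min(1, exp(-d))
        if d <= 0.0 or rng.random() < math.exp(-d):   # steps 4 and 5
            q = q_t
            accepted += 1
        chain[i] = q                                  # p_t and the trajectory are discarded
    return chain, accepted / n_samples


# Usage (toy assumption): V(q) = |q|^2 / 2, unit mass, beta = 1.
rng = np.random.default_rng(0)
chain, acc_ratio = hmc(lambda x: 0.5 * float(x @ x), lambda x: x,
                       [3.0], [1.0], 1.0, 5000, 10, 0.2, rng)
print(acc_ratio, chain[100:].mean(), chain[100:].var())
```

Because the integrator nearly conserves H, the acceptance ratio in this toy setting is close to 1, in line with the remark that exact energy conservation would lead to acceptance with probability 1.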


2.5. Conformational space

After generating a sufficient number of samples from the canonical ensemble of a molecule, the metastabilities in the molecule’s position space have to be identified. Generally, metastabilities are almost invariant subsets of the state space, i.e. non-equilibrium states which are stable for longer periods of time. When considering the dynamics of the system for some given period of time, the transition probability from any metastable region to itself is high, while transitions between two different metastable regions occur with low probability; for a formal definition of almost invariant subsets see [57]. In order to facilitate the metastability analysis, the conformational space has to be defined in a meaningful way.

A molecule consisting of N atoms has 3N − 6 internal degrees of freedom. However, a molecule’s metastabilities can usually be described in terms of very few degrees of freedom. Since in metastability analysis one is interested in slow transition processes, bond-angle and bond-length oscillations can be neglected due to their very high frequencies. Thus, it is sufficient to define a molecule’s conformational space in terms of a selection of its dihedral angles, i.e. in terms of rotations around chemical bonds.

Let $a_1$, $a_2$, $a_3$, and $a_4$ be four atoms in the molecule under consideration which are connected by the chemical bonds $(a_1, a_2)$, $(a_2, a_3)$ and $(a_3, a_4)$ (and possibly others).


2. Basics

The dihedral angle defined by the dihedral $(a_1, a_2, a_3, a_4)$ is the angle between the planes spanned by the triangles $(a_1, a_2, a_3)$ and $(a_2, a_3, a_4)$ (see fig. 2.1). Rotations around the bond $(a_2, a_3)$ lead to different values of the dihedral angle, which is therefore also called a torsion angle.
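The angle between the two planes can be computed directly from the atom coordinates. The following Python sketch uses the common atan2 formulation $\phi = \operatorname{atan2}\big(\lVert b_2\rVert\, b_1 \cdot (b_2 \times b_3),\ (b_1 \times b_2)\cdot(b_2 \times b_3)\big)$ with bond vectors $b_1 = a_2 - a_1$, $b_2 = a_3 - a_2$, $b_3 = a_4 - a_3$, and maps the result to $[0, 2\pi)$ to match the convention used for the conformational space below. The function name and the plain-tuple coordinate format are assumptions made for illustration.

```python
import math

def _sub(u, v):
    return [u[0] - v[0], u[1] - v[1], u[2] - v[2]]

def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def _cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dihedral(x1, x2, x3, x4):
    """Torsion angle of the atom positions x1..x4 (3-vectors), mapped to [0, 2*pi)."""
    b1, b2, b3 = _sub(x2, x1), _sub(x3, x2), _sub(x4, x3)
    n1 = _cross(b1, b2)                  # normal of the plane (a1, a2, a3)
    n2 = _cross(b2, b3)                  # normal of the plane (a2, a3, a4)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x) % (2.0 * math.pi)
```

With the first three atoms fixed at (1, 0, 0), (0, 0, 0) and (0, 0, 1), placing the fourth atom at (−1, 0, 1) gives the trans value $\pi$, and placing it at (1, 0, 1) gives the cis value 0. Translating all four atoms leaves the result unchanged.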

Dihedral coordinates are invariant to rotation and translation of the whole system, which is desirable for conformation analysis, since absolute atom positions are irrelevant. The conformational space can be further reduced by omitting those dihedral angles that have no potential to define metastabilities, i.e. dihedrals that are either completely rigid or extremely flexible. Consequently, a dihedral $(a_1, a_2, a_3, a_4)$ is excluded if

• a1 or a4 is hydrogen (such a dihedral’s flexibility is almost unrestricted), or

• the bond (a2, a3) is not a single bond (only single bonds are rotatable).

In addition, no two dihedrals that describe the same single bond are used for defining the conformational space. The set of dihedrals obtained by removing the ultra-flexible and inflexible dihedrals, restricted to one dihedral per single bond, will be called the set of important or “heavy” dihedrals throughout this thesis. Figure 2.1 illustrates the concept on the butane molecule.
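The three exclusion rules translate into a simple filter. The sketch below assumes a hypothetical input format: candidate dihedrals as 4-tuples of atom indices, a list of element symbols, and a dictionary of bond orders keyed by `frozenset` atom pairs. When several dihedrals share a central bond, it keeps the first one encountered; the thesis does not specify which representative is chosen, so that rule is an assumption.

```python
def heavy_dihedrals(dihedrals, elements, bond_order):
    """Select the 'heavy' dihedrals from candidate (a1, a2, a3, a4) index tuples.

    elements:   element symbol per atom index, e.g. ["C", "C", "H", ...]
    bond_order: dict mapping frozenset({i, j}) -> bond order (1 = single bond)
    """
    selected = []
    used_bonds = set()
    for a1, a2, a3, a4 in dihedrals:
        if elements[a1] == "H" or elements[a4] == "H":
            continue                      # terminal hydrogen: almost unrestricted rotation
        central = frozenset((a2, a3))
        if bond_order.get(central) != 1:
            continue                      # only single bonds are rotatable
        if central in used_bonds:
            continue                      # at most one dihedral per single bond
        used_bonds.add(central)
        selected.append((a1, a2, a3, a4))
    return selected
```

On a butane-like toy input, only the C–C–C–C dihedral survives, in line with figure 2.1; if the central bond is made a double bond, nothing is selected.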

Figure 2.1.: The only “heavy” dihedral angle in the butane molecule. Note that the dihedral angle is defined as the angle between the planes spanned by atoms 1, 2, 3 and atoms 2, 3, 4, respectively.

Since torsion angles are a cyclic measure, the ordinary Euclidean distance between raw angle values is not a metric on torsion angle space, because it depends on where the angles are cut at 0 and $2\pi$ [45]. Let $\phi, \psi \in [0, 2\pi)^d$ be two configurations given as points in the conformational space defined by the molecule's $d$ heavy dihedrals. The Euclidean distance on the torus between $\phi$ and $\psi$ is

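$$\delta(\phi, \psi) = \sqrt{\sum_{i=1}^{d} (\phi_i \ominus \psi_i)^2}, \tag{2.24}$$

with

$$\phi_i \ominus \psi_i = \begin{cases} 2\pi - (\phi_i - \psi_i), & \phi_i - \psi_i > \pi, \\ 2\pi + (\phi_i - \psi_i), & \phi_i - \psi_i < -\pi, \\ \phi_i - \psi_i, & \text{else}. \end{cases} \tag{2.25}$$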

With this intuitive metric, two angles can differ by no more than $\pi$ (180°). Consequently, the distance between two points in conformational space can be no greater than $\sqrt{\pi^2 d} = \pi\sqrt{d}$.
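Equations 2.24 and 2.25 can be sketched in Python as follows, assuming all angles are given in radians in $[0, 2\pi)$; the function names are illustrative. Only the absolute value of $\phi_i \ominus \psi_i$ matters, since it is squared.

```python
import math

def angle_diff(a, b):
    """Periodic difference of two angles in [0, 2*pi), following eq. 2.25."""
    d = a - b
    if d > math.pi:
        return 2.0 * math.pi - d
    if d < -math.pi:
        return 2.0 * math.pi + d
    return d

def torus_distance(phi, psi):
    """Euclidean distance on the torus [0, 2*pi)^d, following eq. 2.24."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(phi, psi)))
```

For example, the angles 0.1 and $2\pi - 0.1$ are only 0.2 rad apart on the circle, although their raw difference is almost $2\pi$; two points that differ by $\pi$ in each of $d = 2$ angles attain the maximum distance $\pi\sqrt{2}$.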


3. Sampling strategies

3.1. Overview

The potential energy surface $V$ of a biomolecule is usually very rough, i.e. regions with a low potential energy are separated by high energy barriers. In a Markov chain Monte Carlo sampling of the canonical ensemble at physiologically relevant temperatures (around 300 K), this hinders transitions between different low-energy regions, because the acceptance criterion for an MCMC step depends on the potential energy $V$ (or the total energy $H = V + K$ in the case of HMC). Metastabilities in configuration space, the very phenomenon that is examined by conformation dynamics, make the sampling process slow by causing a “trapping” effect: the sampling generates configurations from within the basin of attraction of one local minimum for a long time, while the interesting transitions between different local minima, which correspond to conformational changes, are observed very rarely. This effect is known as “broken ergodicity” and can lead to a very slow convergence of the hybrid Monte Carlo method.
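A quick order-of-magnitude calculation shows why such barriers matter. The Boltzmann factor $\exp(-\beta \Delta E)$ is the ratio of the equilibrium density at a barrier top of height $\Delta E$ to the density at the bottom of the well. The sketch below evaluates it using the molar gas constant $R \approx 1.9872 \times 10^{-3}$ kcal/(mol K), so energies are in kcal/mol; the barrier heights are arbitrary illustrative values, not data from this thesis.

```python
import math

R_KCAL = 1.9872e-3  # molar gas constant in kcal/(mol K)

def boltzmann_factor(delta_e, temperature=300.0):
    """exp(-beta * delta_e) with beta = 1 / (R T); delta_e in kcal/mol."""
    return math.exp(-delta_e / (R_KCAL * temperature))

for barrier in (2.0, 5.0, 10.0):
    print(f"{barrier:5.1f} kcal/mol: {boltzmann_factor(barrier):.1e}")
```

At 300 K, a 10 kcal/mol barrier gives a factor of roughly $5 \times 10^{-8}$, whereas at 600 K the same barrier gives roughly $2 \times 10^{-4}$. This is why raising the temperature makes barrier regions far easier to reach.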

In order to overcome an energy barrier between two adjacent metastable regions, the HMC algorithm must by chance generate a vector of initial momenta $p$ which both

• “points towards” an energy barrier that is close enough for a short MD trajectory to pass, and

• infuses the system with enough energy to allow the MD trajectory to actually pass through the region of high potential energy rather than being diverted.

Finally, the end point of the MD trajectory has to be accepted.

Generating random momenta that carry the system in one HMC step (or very few steps with some accepted high-energy states on the path) from the basin of attraction of one local minimum of the potential energy surface to that of another becomes more and more unlikely with increasing size and complexity of the molecule. In fact, a molecule's complexity (in terms of containing certain “complex” structures) is far more important than its size, as is illustrated by cyclohexane, which has only 18 atoms and whose configurations can be described in terms of only three torsion angles (disregarding high-frequency oscillations as well as translation and rotation of the whole molecule). However, cyclohexane's metastabilities are separated by very high energy barriers, and it is extremely difficult to accurately sample its Boltzmann distribution at 300 K. Figure 3.1 shows the major conformations of the molecule and their thermodynamic distribution in conformational space.

Figure 3.1.: Thermodynamic distribution of cyclohexane in its conformational space at 300 K. Also shown are four conformations (left-most and right-most: ’chair’ conformations; center: two ’twist’ conformations) corresponding to different metastable regions in conformational space. The two chair conformations have a combined thermodynamic weight of more than 99%.

Of course, a sampling should need as few simulation steps as possible, i.e. generate a minimum amount of redundant data. On the other hand, all major local minima of the potential energy surface (metastabilities) have to be found and assigned thermodynamically correct weights.

In order to accelerate the sampling, different so-called umbrella strategies [66, 67, 68] can be employed to systematically modify the probability distribution to be sampled (e.g. by lowering energy barriers) so as to make the Markov chains “mix” faster. These modifications are designed in such a way that their effect can be eliminated from the resulting trajectories by reweighting, which allows estimating the original, unmodified probability distribution.

Replica Exchange and ConfJump use different systematic potential modifications that facilitate sampling by “flattening” or “smoothing” the thermodynamic distribution, while ZIBgridfree uses a soft meshless partitioning of the conformational space, where ideally each subset of the conformational space does not assign a high weight to more than one major local minimum. Using a soft partitioning, i.e. restricting the sampling to certain subsets of the conformational space by erecting artificial energy barriers, allows estimating transition probabilities between the partitions of conformational space. These, in turn, allow correct reweighting of the different subsamplings to form a combined estimate of the Boltzmann distribution at the sampling temperature.
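The reweighting idea can be illustrated with the standard importance-sampling identity. If samples are drawn from the density proportional to $\exp(-\beta V_{\mathrm{mod}})$, each sample $q$ receives the weight $\exp(-\beta(V(q) - V_{\mathrm{mod}}(q)))$, and weighted averages then estimate expectations under $\exp(-\beta V)$. The sketch below is a generic illustration of this identity, not the specific reweighting scheme of any of the methods discussed in this chapter; the function name is hypothetical.

```python
import math

def reweighted_mean(observable, v_orig, v_mod, beta):
    """Estimate <A> under exp(-beta*V) from samples drawn under exp(-beta*V_mod).

    observable, v_orig, v_mod: per-sample values A(q_k), V(q_k), V_mod(q_k).
    """
    log_w = [-beta * (v - vm) for v, vm in zip(v_orig, v_mod)]
    shift = max(log_w)                      # avoids overflow; cancels after normalization
    w = [math.exp(lw - shift) for lw in log_w]
    return sum(wk * a for wk, a in zip(w, observable)) / sum(w)
```

Consider a two-state toy system with $V(0) = 0$, $V(1) = 1$ and a completely flattened $V_{\mathrm{mod}} \equiv 0$. Samples alternating between the two states reweight to $\langle x \rangle = 1/(1 + e)$ at $\beta = 1$, which is the exact Boltzmann probability of state 1.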


3.2. ZIBgridfree

In large molecular systems with high-dimensional conformational spaces, the potential energy surface is very rough. It is desirable to discretize the space in a way that does not entail an exponential computational cost and to sample the thermodynamic density on each partition separately. ZIBgridfree [45, 73, 75] uses a meshless discretization of the conformational space. By sampling different subsets of the conformational space separately, fewer metastabilities occur within each subset, and thus the HMC sampling of each subset converges fast. Instead of a crisp partitioning of the conformational space (as e.g. a Voronoi tessellation), ZIBgridfree employs a partitioning that is function-based (“fuzzy”) rather than set-based. This is achieved by adding softly limiting functions to the potential energy $V$. These potential modifications do not smooth the potential; rather, they (softly) restrict the sampling to certain regions in conformational space, so that it is easier to sample all physically relevant regions of the conformational space, i.e. all regions with a high statistical weight. The potential modifications are defined adaptively so as to cover as much of the physically relevant regions as possible; these regions are identified by a presampling.

Rather than using one Markov chain to sample the unmodified potential, ZIBgridfree subdivides the conformational space by defining a potential modification for each partition and then launches one Markov chain for each modified potential energy function. ZIBgridfree pursues an uncoupling-coupling strategy [22]. In an uncoupling step, the conformational space is partitioned, and subsequently, for each partition of the space, a distribution is sampled that has a lower variance than the original distribution because it contains fewer local minima. Due to the lowered variance, each sampling converges fast. Afterwards, the samplings of the different partitions of the conformational space are reweighted and combined in the coupling step, so that the resulting linear combination of the sampled partial densities is an approximation of the target distribution.

3.2.1. Soft-characteristic molecular conformations

ZIBgridfree is based on the concept of conformation dynamics as described by Deuflhard, Schütte et al. [16, 17, 59]. Conformations are defined in terms of almost-characteristic membership functions rather than classical sets in conformational space. The goal then is to identify a set of $C$ conformations defined by membership functions $\chi_1, \ldots, \chi_C : \Omega \to [0, 1]$ (see [18]). The functions $\chi_i$ are non-negative, i.e. for all $q \in \Omega$:

$$\chi_i(q) \geq 0, \quad i = 1, \ldots, C, \tag{3.1}$$

and form a partition of unity,

$$\forall q \in \Omega: \quad \sum_{i=1}^{C} \chi_i(q) = 1. \tag{3.2}$$


Conformations defined on the basis of membership functions $\chi_i$ have overlapping partial density functions $\rho_i$ associated with them:

$$\rho_i(q) = \frac{\chi_i(q)\,\rho(q)}{w_i}, \tag{3.3}$$

where the partition functions $w_i = \int_\Omega \chi_i(q)\,\rho(q)\,dq$ are the thermodynamic weights, and $\rho$ is the spatial Boltzmann distribution (see equation 2.5). Note that the membership functions $\chi_i$ are by their definition observables over the canonical ensemble under consideration, as each function $\chi_i$ assigns a real number to every point $q \in \Omega$. The thermodynamic weight $w_i$ is then the expected value of $\chi_i$ under the distribution $\rho$ (see equation 2.6). While the integral

$$Z_i = \int_\Omega \chi_i(q)\,dq \tag{3.4}$$

measures the portion of the conformational space “covered” by conformation $i$, $w_i$ is the fraction of the thermodynamic density over $\Omega$ that conformation $i$ accounts for. Note that in the case of a set-based approach with conformations $S_1, \ldots, S_C \subset \Omega$ with characteristic functions

$$\xi_i(q) = \begin{cases} 1, & q \in S_i, \\ 0, & \text{else}, \end{cases} \tag{3.5}$$

replacing the membership functions $\chi_i$, the “coverage” integral $Z_i$ is equal to the volume of $S_i$, and $w_i$ is the same as the integral of $\rho$ over $S_i$. Figure 3.2 illustrates the difference between a crisp and a soft discretization of conformational space on a 1-dimensional example. Figure 3.3 shows the decomposition of a Boltzmann distribution over $\Omega$ into partial density functions $\rho_i$ by the soft partitioning functions shown in figure 3.2.

Figure 3.2.: Partitioning of a set Ω into three “subsets” either via soft-characteristic functions (lines) or a crisp partitioning into classical sets (boxes). (Plot: χ over Ω.)

Figure 3.3.: A Boltzmann distribution (dashed black line) over Ω and the partial density functions derived from it via the three soft partitioning functions from figure 3.2. (Plot: ρ over Ω.)

The expected value of a spatial observable $A : \Omega \to \mathbb{R}$ can be calculated separately for each function-based conformation $\chi_i$ under the partial density $\rho_i$ of that conformation (see equation 3.3):

$$\langle A \rangle_{\rho,\chi_i} = \frac{1}{w_i}\,\langle A\,\chi_i \rangle_\rho = \frac{1}{w_i} \int_\Omega A(q)\,\chi_i(q)\,\rho(q)\,dq. \tag{3.6}$$

The soft-characteristic conformations $\chi_i$ can be interpreted as macrostates in configuration space that are fully described by modified potential energy functions $V_i$ with

$$V_i(q) = V(q) - \frac{1}{\beta} \ln\big(\chi_i(q)\big). \tag{3.7}$$

This follows from an interpretation of the partial density functions obtained by substituting $\rho$ from equation 2.5 into equation 3.3 (see also [73]):

$$\frac{1}{w_i}\,\chi_i(q)\,\rho(q) = \frac{1}{w_i Q_q}\,\chi_i(q) \exp\big(-\beta V(q)\big) = \frac{1}{w_i Q_q} \exp\Big(-\beta\big(V(q) - \tfrac{1}{\beta}\ln(\chi_i(q))\big)\Big). \tag{3.8}$$

3.2.2. Partitioning by membership basis functions

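A central concept in ZIBgridfree is the approximation of the unknown conformation membership functions $\chi_l$ from a function basis $\phi_1, \ldots, \phi_s : \Omega \to [0, 1]$. If this function basis has the same properties as the membership functions $\chi_1, \ldots, \chi_C$, namely non-negativity (equation 3.1) and partition of unity (equation 3.2), then each conformation $\chi_l$ can be expressed as a linear combination of the basis functions $\phi_i$ with non-negative coefficients (see [73]):

$$\chi_l = \sum_{i=1}^{s} \chi_{\mathrm{disc}}(i, l)\,\phi_i, \quad l = 1, \ldots, C, \tag{3.9}$$

where $\chi_{\mathrm{disc}} \in \mathbb{R}^{s \times C}$ is the non-negative matrix of linear combination factors, which is row-stochastic, i.e.

$$\sum_{l=1}^{C} \chi_{\mathrm{disc}}(i, l) = 1, \quad i = 1, \ldots, s. \tag{3.10}$$

Together with the partition of unity of the basis functions, equation 3.10 ensures that the $\chi_l$ again form a partition of unity, since $\sum_l \chi_l = \sum_i \phi_i \sum_l \chi_{\mathrm{disc}}(i, l) = \sum_i \phi_i = 1$. The number of basis functions $s$ is chosen sufficiently greater than the anticipated number of conformations $C$.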

The basis functions form a soft partitioning of $\Omega$ as well but are not necessarily metastable. Consequently, the concepts of thermodynamic weights (defined analogously to equation 3.3) and potential modifications as defined in equation 3.7 apply to the basis functions as well. From here onward, $V_i$ will denote the modified potential corresponding to the basis function $\phi_i$, and $w_i$ will denote the thermodynamic weight of $\phi_i$. $\rho_i$ will denote the partial density function corresponding to $\phi_i$.

The goal of cluster analysis will be to identify both the correct number of clusters $C$ and the matrix $\chi_{\mathrm{disc}}$ of linear combination factors from samplings of the partial densities $\rho_i$ associated with the basis functions $\phi_i$, so as to obtain the set of membership functions $\chi_l$ by applying equation 3.9. The membership basis functions are also referred to as shape functions in meshless methods.

ZIBgridfree defines the membership basis functions $\phi_i$ by means of a set of defining nodes $\{k_1, \ldots, k_s\} \subset \Omega$. Nodes are placed equidistantly in the relevant part of configuration space, which is identified beforehand in a presampling at a high temperature (cf. 3.2.4). As most of configuration space is physically “forbidden” due to extremely high potential energy in regions where atoms either overlap or are too far away from each other to maintain chemical bonds, the amount of “relevant” (i.e. physically allowed) space that has to be covered is hoped not to grow exponentially with the number of atoms $N$ [45, 73]. The definition of basis functions as

$$\phi_i := \frac{W_i}{\sum_{j=1}^{s} W_j}, \quad i = 1, \ldots, s, \tag{3.11}$$

follows the partition of unity method of Shepard [62]. With radial basis functions $W_i$ with

$$W_i(q) = \exp\big(-\alpha\,\delta^2(q, k_i)\big), \quad i = 1, \ldots, s, \tag{3.12}$$

the basis functions $\phi_i$ are unimodal, non-negative, and continuously differentiable and form a partition of unity [73]. $\delta^2(q, k_i)$ is the squared distance of the projections of $q$ and $k_i$ into the space of heavy dihedrals as defined in section 2.5. The shape parameter $\alpha$ is chosen depending on the number of nodes $s$ and the given node distance $\theta$. The meshfree discretization using soft-characteristic basis functions $\phi_i$ is a generalized Voronoi tessellation, which converges towards a Voronoi tessellation for $\alpha \to \infty$. The basis functions $\phi_i$ have their maximum at the defining node $k_i$ and decrease exponentially with growing distance from $k_i$. Consequently, the modified potential $V_i$ is close to $V$ in the vicinity of $k_i$, while the penalty

$$V_i(q) - V(q) = -\frac{1}{\beta}\ln \phi_i(q) = \frac{1}{\beta}\Big(\alpha\,\delta^2(q, k_i) + \ln \sum_{j=1}^{s} W_j(q)\Big)$$

increases with the distance from $k_i$.
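Equations 3.11, 3.12 and 3.7 can be combined into a short Python sketch that evaluates all basis functions at a configuration $q$ and the modified potential $V_i(q)$. The distance function `dist` is passed in (for example, the torus distance of equation 2.24); the log-sum-exp shift is a standard numerical safeguard and not part of the original formulation, and all names are illustrative.

```python
import math

def shepard_basis(q, nodes, alpha, dist):
    """phi_i(q) = W_i(q) / sum_j W_j(q) with W_i(q) = exp(-alpha * dist(q, k_i)**2)."""
    log_w = [-alpha * dist(q, k) ** 2 for k in nodes]
    shift = max(log_w)                    # cancels in the ratio, avoids underflow of all W_i
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]

def modified_potential(v_q, phi_i_q, beta):
    """V_i(q) = V(q) - ln(phi_i(q)) / beta; infinite where phi_i underflows to zero."""
    if phi_i_q <= 0.0:
        return math.inf
    return v_q - math.log(phi_i_q) / beta
```

As $\alpha$ grows, the basis approaches a crisp Voronoi partition: with nodes 0, 1, 2 on a line and $\alpha = 1000$, the configuration 0.4 belongs almost entirely to the node at 0.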


ZIBgridfree samples the Boltzmann distributions corresponding to the modified potentials $V_i$ separately, which can even be done in parallel, as each $V_i$ can be evaluated at every position $q \in \Omega$ independently of all $V_j$ with $j \neq i$. The current implementation of ZIBgridfree [47] supports both serial and (massively) parallel sampling. The algorithm is described in more detail in the following sections.

3.2.3. The algorithm (outline)

1. Perform a (relatively short) presampling at a high temperature on the original potential $V$ (cf. section 3.2.4). Let $Q$ denote the set of generated configurations.

2. Place nodes $k_1, \ldots, k_s \in Q$ approximately equidistantly within relevant regions of $\Omega$ only (cf. 3.2.5).

3. Define a meshless soft discretization by constructing basis functions $\phi_1, \ldots, \phi_s : \Omega \to [0, 1]$ from $k_1, \ldots, k_s$ as described in section 3.2.2.

4. Perform HMC sampling of each partial density $\rho_i$, which is induced by the modified potential $V_i$ corresponding to the basis function $\phi_i$ (cf. 3.2.6).

5. Accumulate the transition matrix $P$ and the overlap matrix $S$ from the trajectories generated in step 4.

6. Calculate thermodynamic weights of the partial densities $\rho_i$ (cf. 3.2.7).

7. Determine the number of conformations $C$ and the matrix of linear combination factors $\chi_{\mathrm{disc}}$ by Robust Perron Cluster Analysis in order to obtain conformation membership functions $\chi_1, \ldots, \chi_C$ from the membership basis $\phi_1, \ldots, \phi_s$ (see 3.2.8).

3.2.4. Presampling

In the first step of the algorithm, a presampling at a high temperature is performed on the unmodified potential. The sampled distribution, which differs from the Boltzmann distribution at temperature $T$ only in the parameter $\beta$ (see section 2.1), shares all important minima and maxima with the target distribution while being generally more variable. Therefore, the Markov chains exploring this distribution in HMC sampling are expected to mix better. The increased variability stems from the increased temperature, which causes the random initial momenta for the HMC steps to be higher on average than when sampling at temperature $T$. The presampling effectively yields a rough overview of the potential energy landscape, which allows identification of the low-energy regions. Convergence of the presampling is monitored by a Gelman-Rubin criterion (see section 4.1) that is more tolerant than the one used for the regular sampling. After all, the goal is not estimation of the high-temperature distribution but merely finding all relevant regions in conformational space.


3.2.5. Choice of nodes

In order to reflect only physically relevant parts of the conformational space, nodes are chosen from the presampling trajectory. As energetically forbidden and very unlikely states are not visited during presampling, nodes are only generated within the relevant regions of conformational space. This results in a problem-adaptive discretization of the conformational space, as the modified potentials $V_i$ defined by the nodes $k_i$ will assign very high values to all regions that have never been visited during presampling. Therefore, it is crucial that the presampling discovers all low-energy regions. Meyer [45] proposed an iterative strategy for finding the optimal presampling temperature. In practice, it suffices to choose the presampling temperature high enough, as no error results from the inclusion of regions that have a low statistical weight at temperature $T$ but are visited easily at the presampling temperature. The computational overhead from sampling a few low-weight partial densities is probably small overall, especially when compared to the overhead resulting from multiple presamplings at different temperatures. The node selection algorithm of ZIBgridfree (see [45, 73]) chooses nodes from the presampling trajectory that are spaced approximately equidistantly and no closer to each other than the given minimum node distance $\theta$:

Let $Q$ denote the set of molecule configurations generated in presampling and $Q^*$ the list of nodes, which is initially empty. Further, let $L$ be another list of molecule configurations, which is also initially empty.

1. Pick an arbitrary configuration k1 ∈ Q and add k1 to Q∗.

2. Calculate the distances of all geometries $q \in Q$ to $k_1$ (in heavy dihedral space; cf. section 2.5).

3. Add all configurations q ∈ Q to L, sorted by their distance from k1.

4. Repeat:

a) Let $k$ denote the configuration that was added to $Q^*$ most recently. Remove from $L$ all geometries $q$ with $\delta(q, k) < \theta$.

b) If L is not empty, add the first element of L to Q∗ and remove it from L.

until L is empty.

Note that step 4a does not require testing all elements of $L$ for their distance to the node $k$ that was added in the previous step of the algorithm. The reason for this is as follows: Let $a = \delta(k, k_1)$ be the distance of the node $k$ that was added to $Q^*$ in the previous step to the initial node $k_1$. For all molecule configurations $q \in Q$ with $\delta(q, k_1) > \theta + a$, it follows that $\delta(q, k) > \theta$, because $\delta$ is a metric and the triangle inequality

$$\delta(k_1, k) + \delta(k, q) = a + \delta(k, q) \geq \delta(k_1, q) \tag{3.13}$$

holds for all $k, k_1, q \in \Omega$. Consequently, in step 4a, the search in the list $L$, which is sorted by the distance from $k_1$, can be stopped when the first element with $\delta(q, k_1) > \theta + a$ is encountered. Therefore, the total computational cost of node selection is the sum of $O(|Q| \log |Q|)$ for sorting all $|Q|$ configurations and $O(|Q|)$ for the loop in step 4, which per iteration removes at least one element from $L$ and examines each element at most twice. This yields a total computational cost of $O(|Q| \log |Q|)$ for node generation.
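The node selection loop, including the early termination of step 4a, can be sketched in Python as follows. The configurations and the metric `dist` are supplied by the caller (in ZIBgridfree the metric would be the heavy-dihedral distance of equation 2.24). Using the first configuration as $k_1$ and rebuilding $L$ as a new list in each iteration are simplifications of this sketch, not details taken from the actual implementation.

```python
def select_nodes(configs, theta, dist):
    """Greedy node selection with minimum node distance theta (steps 1-4)."""
    k1 = configs[0]                                   # step 1: arbitrary first node
    nodes = [k1]
    # steps 2-3: list L of (distance to k1, configuration), sorted by that distance
    L = sorted(((dist(q, k1), q) for q in configs), key=lambda item: item[0])
    while True:
        k = nodes[-1]                                 # most recently added node
        a = dist(k, k1)
        kept = []
        for idx, (d1, q) in enumerate(L):             # step 4a
            if d1 > theta + a:
                # triangle inequality: dist(q, k) >= d1 - a > theta for all later entries
                kept.extend(L[idx:])
                break
            if dist(q, k) >= theta:
                kept.append((d1, q))
        L = kept
        if not L:                                     # until L is empty
            return nodes
        nodes.append(L.pop(0)[1])                     # step 4b: first remaining element
```

On 50 equally spaced points with spacing 0.1 and $\theta = 0.35$, every fourth point becomes a node. All nodes are at least $\theta$ apart, and every configuration lies within $\theta$ of some node.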

This moderately low computational cost (as the presampling is kept fairly short) allows controlling the overall computational cost of the sampling phase by limiting the number of nodes from the outset. This is done by repeating the node selection algorithm for different values of $\theta$, which are generated by a binary search, until the target number of nodes is (approximately) reached. Compared to the sampling or even the presampling, the cost of node selection is negligible.
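The binary search over $\theta$ can be written independently of the node selection itself. The following Python sketch assumes a caller-supplied function `count_nodes(theta)` (for instance, the length of the node list returned by a selection routine) that is roughly non-increasing in $\theta$, together with a bracketing interval and a relative tolerance. These parameters are illustrative choices, not values from the thesis.

```python
def tune_theta(count_nodes, target, theta_lo, theta_hi, rel_tol=0.05, max_iter=50):
    """Bisect on the minimum node distance theta until about `target` nodes result.

    Assumes count_nodes(theta) is (roughly) non-increasing in theta and that
    theta_lo yields too many nodes while theta_hi yields too few.
    """
    theta = 0.5 * (theta_lo + theta_hi)
    for _ in range(max_iter):
        theta = 0.5 * (theta_lo + theta_hi)
        n = count_nodes(theta)
        if abs(n - target) <= rel_tol * target:
            break
        if n > target:
            theta_lo = theta      # too many nodes: increase the minimum distance
        else:
            theta_hi = theta      # too few nodes: decrease the minimum distance
    return theta
```

For the toy model `count_nodes(θ) = ⌊10/θ⌋` and a target of 40 nodes, the search returns a $\theta$ close to 0.25.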

3.2.6. Sampling of partial densities

ZMFree uses different kinds of sampling in order to obtain information about

a) the overlap between two basis functions $\phi_i$ and $\phi_j$,

$$S(i, j) = \int_\Omega \phi_i(q)\,\phi_j(q)\,\rho(q)\,dq, \tag{3.14}$$

and

b) the transition probabilities

$$P(i, j) = \frac{\int_\Omega \phi_i(q)\,\big(P^\tau \phi_j\big)(q)\,\rho(q)\,dq}{\int_\Omega \phi_i(q)\,\rho(q)\,dq}, \tag{3.15}$$

where $P^\tau$ is the Markov operator that describes the propagation of the system in the canonical ensemble in time span $\tau$; see [58]. As the partition membership functions $\phi_i$ are soft-characteristic functions, equation 3.15 describes a “fuzzy” concept of transition probabilities as well.

Both the overlap and the transition matrix can be used for reweighting the partial densities (cf. 3.2.8).

Figure 3.4 shows the sampling scheme used by ZIBgridfree for each partial density $\rho_i$. A regular HMC sampling is performed to generate a sequence of configurations $q_1, \ldots, q_n$, and from every state $q_j$, a short MD trajectory is launched on the original, unmodified potential to obtain a new position state $q'_j$. The sequence $(q_1, \ldots, q_n)$ is a realization of a Markov chain, as it is generated by HMC sampling. The sequence $(q'_1, \ldots, q'_n)$, however, is also a realization of a Markov chain, because molecular dynamics is deterministic. Thus, the transition from $q'_j$ to $q'_{j+1}$ is determined solely by the transition from $q_j$ to $q_{j+1}$ (and the random initial momenta drawn in steps $j$ and $j + 1$), as it can be realized by (deterministically) moving from $q'_j$ to $q_j$, then from $q_j$ to $q_{j+1}$, which is the HMC step, and finally from $q_{j+1}$ to $q'_{j+1}$, deterministically once more. The “horizontal” chain (states $q_j$) can be used to calculate the overlap between partial densities, while the “vertical” chain (states $q'_j$) is used to estimate the Markov operator $P^\tau$ and accumulate transition probabilities between partial densities. This sampling approach is presented in more detail in [73] and [45].

Figure 3.4.: The sampling process employed by ZIBgridfree for every modified distribution. Configurations $q_1, \ldots, q_n$ are generated by HMC sampling of a modified potential (horizontal chain $q_1 \to q_2 \to q_3 \to \cdots$). The sequence $(q'_j)$ is generated from the first sequence “on the fly” by drawing random momenta $p$ distributed according to the Boltzmann distribution $\eta$ for every state $q_j$ and launching a short MD trajectory from $(q_j, p)$ (vertical arrows $q_j \to q'_j$). These MD simulations are performed on the unmodified potential.

3.2.7. Computation of thermodynamic weights

The first step in conformation analysis based on trajectories generated by the sampling approach outlined in section 3.2.6 is the computation of a matrix $M$, which contains information about the degree of membership of configurations sampled from the partial density $\rho_i$ in all partitions $\phi_j$ of the conformational space:

$$M_{ij} = \frac{1}{n_i} \sum_{k=1}^{n_i} \phi_j\big(q_k^{(i)}\big). \tag{3.16}$$

If the basis functions $\phi_i$ are interpreted as abstract states in a Markov chain, then the stochastic matrix $M$ (its rows sum to 1 because the $\phi_j$ form a partition of unity) is an estimate of the transition matrix of that Markov chain, i.e. $M_{ij}$ is the probability of moving from the fuzzy set $\phi_i$ to the fuzzy set $\phi_j$,

$$M_{ij} = \langle \phi_j \rangle_{\rho_i}. \tag{3.17}$$

Alternatively, the degrees of membership of the configurations $q'^{(i)}_k$ of the vertical chain in $\phi_j$ can be used in equation 3.16.
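Equation 3.16 amounts to averaging the membership vector over each subsampling. In the Python sketch below, `phi` is assumed to be a callable that returns the list $(\phi_1(q), \ldots, \phi_s(q))$ for a configuration $q$, and `trajectories[i]` holds the samples of $\rho_i$; both conventions are assumptions made for illustration.

```python
def membership_matrix(trajectories, phi):
    """M[i][j] = (1/n_i) * sum_k phi_j(q_k^(i)), cf. equation 3.16."""
    M = []
    for traj in trajectories:
        memberships = [phi(q) for q in traj]          # one list of phi values per sample
        n_i = len(memberships)
        # zip(*...) iterates over the columns j of the membership table
        M.append([sum(column) / n_i for column in zip(*memberships)])
    return M
```

Passing the vertical-chain configurations $q'^{(i)}_k$ instead of $q^{(i)}_k$ yields the alternative estimate mentioned above.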

This Markov chain is ergodic (see [45]); therefore, $M$ has a unique stationary distribution $\pi$. Moreover, detailed balance holds in the canonical ensemble, so the basis functions $\phi_i$ must be weighted against each other in such a way that the net flow between any two of them is zero:

$$\pi_i M_{ij} = \pi_j M_{ji}. \tag{3.18}$$

Indeed, with $\pi_i = w_i$ this holds for the exact matrix of equation 3.17, since $w_i \langle \phi_j \rangle_{\rho_i} = \int_\Omega \phi_i(q)\,\phi_j(q)\,\rho(q)\,dq$ is symmetric in $i$ and $j$. Therefore, the components $\pi_i$ of the stationary distribution are, in fact, the thermodynamic weights $w_i$ (see theorem 4.11 in [73]) and can be computed by eigenvalue iteration, as

$$w^\top = \pi^\top = \lim_{n \to \infty} \alpha^\top M^n \tag{3.19}$$

with an arbitrary initial distribution $\alpha$.

The weights $w_i$ are the linear combination factors that are used to reconstruct the Boltzmann distribution $\rho$ on $\Omega$ from the partial densities $\rho_i$:

$$\rho = \sum_{i=1}^{s} w_i\,\rho_i. \tag{3.20}$$
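Equation 3.19 can be evaluated by plain power iteration on the row vector $\alpha^\top$. The Python sketch below assumes that $M$ is given as a row-stochastic list of lists and that the chain is aperiodic, which holds here because $M_{ii} > 0$ whenever the $\phi_i$ are strictly positive; the tolerance and iteration limit are illustrative.

```python
def stationary_distribution(M, tol=1e-12, max_iter=100_000):
    """Power iteration pi^T <- pi^T M for a row-stochastic matrix M (equation 3.19)."""
    s = len(M)
    pi = [1.0 / s] * s                          # arbitrary initial distribution alpha
    for _ in range(max_iter):
        new = [sum(pi[i] * M[i][j] for i in range(s)) for j in range(s)]
        total = sum(new)
        new = [x / total for x in new]          # guard against rounding drift
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

For $M = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}$ the result is $w = (0.6, 0.4)$, which satisfies detailed balance (3.18), since $0.6 \cdot 0.2 = 0.4 \cdot 0.3$.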

Note that the weights are computed by evaluating the degree of membership of every sampling point from the sampling of each partial density $\rho_i$ in every soft partition $\phi_j$. This means that if the sampling of one partial density $\rho_i$ does not converge, all weights will be flawed. ZIBgridfree must therefore accurately sample the distribution over every soft partition $\phi_i$, even those whose thermodynamic weights are intrinsically low due to a generally high level of potential energy.

3.2.8. Transition and overlap matrix and conformation analysis

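Afterwards, the estimated thermodynamic weights $w_i$ are used to compute the overlap integral matrix $S$ and the transition matrix $P$ (see equations 3.14 and 3.15). Using $\sum_j \phi_j = 1$ and $\phi_j \rho = w_j \rho_j$, $S$ is constructed as

$$S(i, k) = \langle \phi_i, \phi_k \rangle_\rho = \sum_{j=1}^{s} w_j \langle \phi_i, \phi_k \rangle_{\rho_j} = \sum_{j=1}^{s} w_j\,S_j(i, k), \tag{3.21}$$

approximated from individual summands

$$S_j(i, k) \approx \frac{1}{n_j} \sum_{l=1}^{n_j} \phi_i\big(q_l^{(j)}\big)\,\phi_k\big(q_l^{(j)}\big), \tag{3.22}$$

each of which is built from a trajectory $\big(q_1^{(j)}, \ldots, q_{n_j}^{(j)}\big)$ that is a sampling of the partial density $\rho_j$. Analogously, the matrix $P$ is constructed as

$$P(i, k) = \langle \phi_i, P^\tau \phi_k \rangle_\rho = \sum_{j=1}^{s} w_j \langle \phi_i, P^\tau \phi_k \rangle_{\rho_j} = \sum_{j=1}^{s} w_j\,P_j(i, k), \tag{3.23}$$

where the individual summands for each trajectory $\big(q_1^{(j)}, \ldots, q_{n_j}^{(j)}\big)$ are

$$P_j(i, k) \approx \frac{1}{n_j} \sum_{l=1}^{n_j} \phi_i\big(q_l^{(j)}\big)\,\phi_k\big(q_l'^{(j)}\big), \tag{3.24}$$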



where $q_l'^{(j)}$ are the configurations generated by “vertical” sampling (cf. 3.2.6). $P$ is then made row-stochastic by dividing each row by its row sum (in exact arithmetic, the row sums equal the weights $w_i$, cf. equation 3.15). For more details, see [45, 73].

If metastable conformations exist, the overlap integral matrix $S$ is almost block-structured (see figure 3.5) after a suitable permutation. The same holds true for the transition matrix $P$. Robust Perron Cluster Analysis [18, 72] is used to find this permutation and thus the matrix of linear combination factors $\chi_{\mathrm{disc}}$ that transforms the vector of basis functions $(\phi_1, \ldots, \phi_s)$ into a vector of conformation membership functions $(\chi_1, \ldots, \chi_C)$.
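The assembly of $S$ and $P$ from per-trajectory summands (equations 3.21–3.24), followed by the row normalization of $P$, can be sketched in Python as follows. As before, `phi(q)` is assumed to return the list of all basis function values; `trajectories[j]` holds the horizontal chain of $\rho_j$ and `vertical[j]` the matching configurations $q'^{(j)}_l$. These names are hypothetical. The nested loops make the $O(ns^3)$ analysis cost discussed in section 3.2.10 visible.

```python
def summand(traj, traj_end, phi):
    """(1/n_j) * sum_l phi_i(q_l) * phi_k(q_end_l) for one subsampling j (eqs. 3.22, 3.24)."""
    n_j = len(traj)
    s = len(phi(traj[0]))
    out = [[0.0] * s for _ in range(s)]
    for q, q_end in zip(traj, traj_end):
        a, b = phi(q), phi(q_end)
        for i in range(s):
            for k in range(s):
                out[i][k] += a[i] * b[k] / n_j
    return out

def assemble(weights, trajectories, vertical, phi):
    """Return S (eq. 3.21) and the row-stochastic transition matrix P (eq. 3.23)."""
    s = len(weights)
    S = [[0.0] * s for _ in range(s)]
    P = [[0.0] * s for _ in range(s)]
    for w_j, traj, traj_v in zip(weights, trajectories, vertical):
        S_j = summand(traj, traj, phi)        # overlap: both factors from the HMC chain
        P_j = summand(traj, traj_v, phi)      # transitions: second factor from q'
        for i in range(s):
            for k in range(s):
                S[i][k] += w_j * S_j[i][k]
                P[i][k] += w_j * P_j[i][k]
    row_sums = [sum(row) for row in P]
    P = [[x / r for x in row] for row, r in zip(P, row_sums)]   # make P row-stochastic
    return S, P
```

Because the $\phi_i$ form a partition of unity, the entries of $S$ sum to $\sum_j w_j = 1$ when the weights are normalized, and $S$ is symmetric by construction.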

Figure 3.5.: After suitable permutation, the overlap matrix $S$ or the transition matrix $P$ of a metastable system has a block structure, where basis functions within each block communicate, while transitions to (or overlap with) basis functions outside a block are seldom (or weak). (Sketch: diagonal blocks; off-diagonal entries ≈ 0.)

3.2.9. Convergence criterion

ZMFree allows more control over the sampling error than any other sampling technique, because it estimates that error directly. Theoretically, only with an infinite number of points $n$ could one be sure that the transition matrix $M$ has been estimated correctly. Weber, Kube et al. [74] pick up the idea of Weber [73] to estimate the sampling error $\lVert E \rVert_\infty = \lVert M - M_{\mathrm{tr}} \rVert_\infty$, the difference between the true matrix $M$ and the estimate $M_{\mathrm{tr}}$ obtained from finitely many sampling points. Since the matrix $\infty$-norm is the maximum absolute row sum, the condition $\lVert E \rVert_\infty \leq \epsilon$ is equivalent to the statement that the $\lVert \cdot \rVert_1$-norms of all row vectors of $E$ are small,

$$\lVert E(i, :) \rVert_1 \leq \epsilon, \quad i = 1, \ldots, s. \tag{3.25}$$

The rows $i$ of $E$ correspond to different subsamplings $\big(q_1^{(i)}, \ldots, q_{n_i}^{(i)}\big)$. The normE convergence indicator computes the $i$th row of $M_{\mathrm{tr}}$ for each subsampling according to equation 3.17. The differences between these row estimates are measured in the vector $\lVert \cdot \rVert_1$-norm, and the maximum difference is compared to a given $\epsilon$.
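The equivalence behind equation 3.25 and the spirit of the normE indicator can be illustrated in Python. The text does not specify which row estimates are compared, so the sketch below assumes that each subsampling is split into a few consecutive blocks, computes the row of equation 3.16 from each block, and reports the largest pairwise $\lVert\cdot\rVert_1$ difference. The block splitting, the function names and `n_blocks` are assumptions, not the published normE procedure.

```python
def inf_norm(E):
    """Matrix infinity-norm: the largest 1-norm of a row (cf. equation 3.25)."""
    return max(sum(abs(x) for x in row) for row in E)

def block_rows(traj, phi, n_blocks):
    """Row estimates (mean of phi(q)) from consecutive blocks; needs len(traj) >= n_blocks."""
    size = len(traj) // n_blocks
    rows = []
    for b in range(n_blocks):
        block = [phi(q) for q in traj[b * size:(b + 1) * size]]
        rows.append([sum(col) / len(block) for col in zip(*block)])
    return rows

def norm_e_indicator(trajectories, phi, n_blocks=2):
    """Largest 1-norm difference between block row estimates over all subsamplings."""
    worst = 0.0
    for traj in trajectories:
        rows = block_rows(traj, phi, n_blocks)
        for a in range(n_blocks):
            for b in range(a + 1, n_blocks):
                diff = sum(abs(x - y) for x, y in zip(rows[a], rows[b]))
                worst = max(worst, diff)
    return worst
```

A run would then be declared converged when `norm_e_indicator(...) <= epsilon` for the chosen $\epsilon$.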

3.2.10. Efficiency of ZIBgridfree

The ZIBgridfree approach has to produce $O(ns)$ sampling points by the HMC method, which is only better than pure hybrid Monte Carlo if the sampling of each partial density $\rho_i$ converges $s$ times as fast as that of HMC on the unmodified potential. However, there is no theoretical limit to the mean time that a molecular system spends in one metastable conformation; consider diamond, which is very hard to change experimentally into graphite despite the lower potential energy of graphite [30]. It is expected that beyond some threshold for the size or complexity of the molecule, no sampling strategy can hope to sample the unmodified Boltzmann density in a reasonable time, and ZIBgridfree becomes more efficient and also more reliable than other strategies. $O(ns)$ is also the computational cost of ZIBgridfree sampling in terms of memory usage, as every point in every subsampling has to be stored. By making use of parallelization, at least the time cost can be lowered considerably. It is also worth noting that the computational cost of the sampling analysis is on the order of $O(ns^3)$ if $S$ and $P$ are calculated from the summands $S_j$ and $P_j$ as given by equations 3.21–3.24, since each of the $s$ subsamplings contributes $n$ points to each of the $s^2$ matrix entries.

3.3. Replica Exchange

Ideally, the sampling would consist of a random walk in energy space rather than in position space. This would allow a fast discovery of all local minima of the potential energy surface. Unfortunately, there is no direct way to construct a random walk in energy space, so that usually it cannot be done efficiently. The Replica Exchange method is a generalized-ensemble approach that consists of a random walk in temperature space, which in turn induces a random walk in energy space [64], thus allowing the simulation to jump out of local minima more easily. Replica Exchange simulations have successfully been applied to macromolecules [10, 39, 51, 54].

Figure 3.6.: Replica Exchange method for 5 replicas (temperatures $T_1, \ldots, T_5$). Non-interacting copies of the system at different temperatures $T_i$ are allowed to exchange positions (or temperatures) at regular intervals.

The principle of the Replica Exchange method is shown in figure 3.6. The basic idea is to consider $M$ independent copies or replicas of the system to be simulated, on which non-interacting simulations at $M$ different temperatures are performed. At periodic intervals, positions $q$ are exchanged between replicas according to a Monte Carlo acceptance criterion. At high simulation temperatures it is easier to pass energy barriers since, due to higher momenta, the effectively sampled probability density function is generally flatter, including the barriers. However, sampling at a higher temperature means generating samples from a different thermodynamic distribution. While the sampling points created in this way can be reweighted to the Boltzmann distribution at temperature $T$, this is only a heuristic and in no way equivalent to drawing samples from the target distribution in a mathematically rigorous way. For this reason, a single simulation at a high temperature is not sufficient. In a Replica Exchange simulation, the replicas at high temperatures provide new start positions for the HMC chain at the relevant (low) sampling temperature $T$, which allows jumps out of the basin of attraction of a local minimum. This is illustrated for a 1-dimensional potential energy function and two HMC chains at different temperatures in figure 3.7. The subsequent conformation analysis is based only on the chain at $T$; all sampling data at higher temperatures are discarded.

Figure 3.7.: Replica exchange. The potential energy surface $V$ over conformation space is sampled by a high-temperature (red) and a low-temperature (blue) HMC chain: (A) before and (B) after a replica exchange step. Replica exchange avoids trapping of the low-temperature chain in local minima.

In practice, $M$ non-interacting hybrid Monte Carlo chains are started at $M$ different temperatures. At every time step there exists a one-to-one mapping between chains and temperatures. All chains are propagated simultaneously using hybrid Monte Carlo sampling as described in section 2.4. A replica exchange step is performed after every $t_{RE}$ simulation steps. Technically, the replica exchange is realized as an exchange of temperatures rather than positions between two chains. This reduces the amount of data that actually has to be moved in memory, which is especially useful when the chains are propagated truly in parallel on different CPUs. Suppose that, prior to an exchange step, HMC chain $i$ is at temperature $m$ and chain $j$ is at temperature $n$. The exchange step then corresponds to the state transition

$$x = (\dots, q^{[i]}_m, \dots, q^{[j]}_n, \dots) \;\to\; x' = (\dots, q^{[i]}_n, \dots, q^{[j]}_m, \dots). \tag{3.26}$$

For transitions between states in this generalized ensemble [32], detailed balance is assumed as well, which means

$$\pi_{GE}(x)\, P_{xchg}(x \to x') = \pi_{GE}(x')\, P_{xchg}(x' \to x) \quad \text{with} \tag{3.27}$$

$$\pi_{GE}(x) = \frac{1}{Q_{GE}} \exp\left( -\sum_{i=1}^{M} \beta_{m(i)}\, V(q^{[i]}) \right), \tag{3.28}$$

where $m(i)$ is the index of the temperature of chain $i$. $Q_{GE}$ is again a normalization factor that turns $\pi_{GE}$ into a probability distribution. Detailed balance is a sufficient condition for $\pi_{GE}$, and with it the Boltzmann distribution at each temperature, to be an invariant measure of the Markov operator associated with the generalized ensemble [32].

Substituting equation 3.28 into equation 3.27 yields

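$$\begin{aligned} \frac{P_{xchg}(x \to x')}{P_{xchg}(x' \to x)} &= \frac{\pi_{GE}(x')}{\pi_{GE}(x)} \\ &= \exp\left[ -\beta_m V(q^{[j]}) - \beta_n V(q^{[i]}) + \beta_m V(q^{[i]}) + \beta_n V(q^{[j]}) \right] \\ &= \exp\left[ (\beta_n - \beta_m) \left( V(q^{[j]}) - V(q^{[i]}) \right) \right] =: \exp(-\Delta). \end{aligned} \tag{3.29}$$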

The detailed balance constraint can thus easily be met by choosing the acceptance criterion

$$P_{xchg}(x \to x') = \min\{1, \exp(-\Delta)\}. \tag{3.30}$$
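The following is a minimal sketch of the exchange step of equations 3.29 and 3.30, written in Python with the standard library only. It is not the thesis implementation. It assumes that the potential energies $V(q^{[i]})$ of the current chain states are already known. Like the implementation described above, it swaps temperature indices rather than positions. The names `attempt_exchange` and `temp_of_chain` are hypothetical.

```python
import math
import random


def exchange_delta(beta_m, beta_n, v_i, v_j):
    """Delta of eq. (3.29): before the swap chain i is at beta_m and chain j at beta_n."""
    return (beta_m - beta_n) * (v_j - v_i)


def exchange_probability(beta_m, beta_n, v_i, v_j):
    """Acceptance probability min{1, exp(-Delta)} of eq. (3.30)."""
    delta = exchange_delta(beta_m, beta_n, v_i, v_j)
    # For delta <= 0 the swap is always accepted; this also avoids overflow in exp.
    return 1.0 if delta <= 0.0 else math.exp(-delta)


def attempt_exchange(temp_of_chain, i, j, betas, energies, rng=random):
    """Try to swap the temperatures (not the positions) of chains i and j.

    temp_of_chain[c]: index of the temperature currently assigned to chain c.
    betas[t]: inverse temperature with index t.
    energies[c]: potential energy V(q^[c]) of the current state of chain c.
    Returns True and updates temp_of_chain in place if the swap is accepted.
    """
    m, n = temp_of_chain[i], temp_of_chain[j]
    p = exchange_probability(betas[m], betas[n], energies[i], energies[j])
    if rng.random() < p:
        temp_of_chain[i], temp_of_chain[j] = n, m
        return True
    return False
```

Only two integers change hands, so the cost of a swap does not depend on the molecule size. This is the motivation given above for exchanging temperatures instead of positions.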

3.3.1. Efficiency of the Replica Exchange method

The resulting acceptance ratio decreases exponentially as the distance $|\beta_n - \beta_m|$ between the inverse temperatures increases. Therefore, replica exchange is only attempted between chains at adjacent temperatures. In [64], Sugita and Okamoto formulate the following criteria for evaluating the efficiency of the Replica Exchange method:

(a) The simulation temperatures should be chosen from the interval $[T, T_{max}]$ in such a way that the acceptance ratios are approximately equal for all pairs of temperatures under consideration.

(b) The number of temperatures (and chains) $M$ should be chosen so that the acceptance ratios for all pairs of temperatures under consideration are higher than 10%.

(c) The maximum simulation temperature $T_{max}$ should be chosen high enough to avoid trapping in local minima, i.e. all major local minima have to be found within an acceptable period of time.



The first two criteria are easy to test. Criterion (b) can be ensured simply by starting short test runs with different numbers of chains and measuring the acceptance ratios. Different algorithms exist to calculate optimal simulation temperatures with respect to obtaining equal acceptance ratios for all pairs of temperatures. For small to medium-sized molecules, however, choosing temperatures with exponentially increasing distance already yields very good results in terms of criterion (a). In this approximation, the sampling temperatures are closer to one another near the relevant temperature $T$ than in the high-temperature region. A set of temperatures with this property is generated by

$$T_i = T \cdot a^i \quad \text{with} \quad a = \left( \frac{T_{max}}{T} \right)^{\frac{1}{M-1}}, \qquad i = 0, \dots, M-1. \tag{3.31}$$
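The following is a short sketch of equation 3.31; the function name is hypothetical. Take the illustrative values $T = 300$ K, $T_{max} = 600$ K and $M = 5$. The ratio between neighboring temperatures is then $a = 2^{1/4}$, and the gaps widen towards $T_{max}$.

```python
def temperature_ladder(t_low, t_max, m):
    """Temperatures T_i = T * a**i, i = 0..M-1, with a = (T_max / T)**(1 / (M - 1)), eq. (3.31)."""
    if m < 2:
        raise ValueError("at least two temperatures are required")
    a = (t_max / t_low) ** (1.0 / (m - 1))
    return [t_low * a ** i for i in range(m)]


ladder = temperature_ladder(300.0, 600.0, 5)
print([round(t, 1) for t in ladder])  # [300.0, 356.8, 424.3, 504.5, 600.0]
```

The spacing grows from about 57 K near $T$ to about 95 K near $T_{max}$, which is the property described in the paragraph above.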

In contrast to the first two efficiency indicators, criterion (c) can only be evaluated empirically or estimated indirectly. The former requires knowledge of the “true” conformations. Experimental data on the molecule's conformations are usually not available, so statistical methods have to be employed. In fact, the question whether the maximum temperature in Replica Exchange has been chosen high enough is related to the question whether an MCMC simulation has been run long enough. It must also be noted that some systems are stabilized primarily by hydrogen bonds or van der Waals forces. For such systems, arbitrarily high simulation temperatures cannot be used, as they would destabilize the system under consideration.

A Replica Exchange sampling has to propagate $M$ chains of $n$ steps each, so its computational cost is $O(Mn)$. Pure hybrid Monte Carlo at temperature $T$ costs $O(n)$. To be efficient, a Replica Exchange sampling therefore has to converge at least $M$ times as fast, in terms of steps per chain, as an HMC sampling at temperature $T$ only.

3.4. ConfJump

The ConfJump strategy [71] employs a priori knowledge of the shape of the potential energy surface, and thus of the Boltzmann distribution to be sampled. It facilitates transitions between different low-energy regions by introducing artificial jumps from one low-energy region to another into the sampling process. HMC sampling is still used to obtain physically correct transition probabilities. At the same time, occasional “jump steps” considerably shorten the average time the simulation spends within the basin of attraction of one local minimum of the potential energy, so trapping is actively avoided. The combined Markov process, which uses two different transition operators, is ergodic and satisfies detailed balance (cf. 2.2). It therefore samples the thermodynamically correct distribution. The ConfJump method is closely related to Smart Darting Monte Carlo [1] and the Jump Between Wells approach [60, 61].

ConfJump needs a preprocessing step in which a minimization algorithm generates representatives of all important low-energy regions of the potential energy surface. Let $\mathcal{M} = \{m_1, \dots, m_C\}$ denote the set of these representatives. In the current implementation, they are generated using the ConFlow algorithm by Holger Meyer, which is based on the RPROP algorithm [52]. This method is very fast. It has been found empirically to identify all important conformations of a wide variety of small to medium-sized biomolecules [46].

The information about low-energy regions can then be used in a standard Metropolis Monte Carlo approach, as described in section 2.2. It serves to propose jumps from the proximity of one local minimum of the potential energy to a point in the proximity of another. More precisely, ConfJump first determines the configuration $m_j \in \mathcal{M}$ that is closest to the current position state $q \in \Omega$. It then randomly chooses another configuration $m_k \in \mathcal{M}$. Finally, it proposes a new configuration $\tilde q \in \Omega$ whose position relative to $m_k$ is determined by the position of $q$ relative to $m_j$. Throughout this work, the following intuitive algorithm is used to obtain $\tilde q$ from $q$, $m_j$, and $m_k$. Let $x$ be the Z-matrix representation of $q$. Then $\tilde x$ is obtained from $x$ by adding the difference vector $(m_k - m_j)$ to $x$. Transforming $\tilde x$ back to Cartesian coordinates yields $\tilde q$. Z-matrix coordinates are a very popular form of internal coordinates. They are invariant to translation and rotation of the molecule and otherwise describe a molecule's position state accurately [37].
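The jump construction can be sketched in Python as follows. This is a simplification under a stated assumption: it works directly on a vector of dihedral angles in radians instead of a full Z-matrix, so the back-transformation to Cartesian coordinates is omitted. All function names are hypothetical. Angles are wrapped to $[0, 2\pi)$ after the shift. The nearest representative is found with a cyclic Euclidean distance, as in step (a) of the algorithm in section 3.4.3.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(angle):
    """Map an angle (radians) to [0, 2*pi)."""
    return angle % TWO_PI


def cyclic_distance(x, y):
    """Euclidean distance between dihedral vectors, measuring each angle on the circle."""
    total = 0.0
    for a, b in zip(x, y):
        d = abs(a - b) % TWO_PI
        d = min(d, TWO_PI - d)
        total += d * d
    return math.sqrt(total)


def nearest(x, minima):
    """Index of the representative in `minima` closest to x."""
    return min(range(len(minima)), key=lambda c: cyclic_distance(x, minima[c]))


def propose_jump(x, minima, k):
    """Return (j, x_new) with x_new = x + (m_k - m_j), m_j being nearest to x."""
    j = nearest(x, minima)
    x_new = [wrap(a + mk - mj) for a, mk, mj in zip(x, minima[k], minima[j])]
    return j, x_new
```

Because the shift is applied component-wise, the proposal preserves the offset of $x$ from $m_j$. That offset is the "relative position" the text refers to.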

Trials generated in this way are subsequently accepted with probability $P_{acc}(q \to \tilde q) = \min\{1, \exp(-\beta \Delta V)\}$, where $\Delta V = V(\tilde q) - V(q)$. This is the usual Metropolis acceptance criterion, which requires a symmetric trial step. Consider a jump from a point $q \in \Omega$ in the proximity of a low-energy configuration $m_j$ to a point $\tilde q$ whose nearest neighbor in $\mathcal{M}$ is $m_k$. This jump must be proposed with the same probability as the reverse jump. Due to this symmetry constraint, the trial $\tilde q$ must also be rejected if its nearest neighbor in $\mathcal{M}$ is not $m_k$. The Metropolis acceptance criterion depends on the difference in potential energy between $q$ and $\tilde q$. The acceptance probability can therefore be expected to improve if the probability of proposing $m_k$ is based on the potential energy difference between $m_j$ and $m_k$. Here $m_j$ is the nearest neighbor of $q$ in $\mathcal{M}$. The alternative would be to propose all low-energy configurations $m_k \in \mathcal{M}$ with the same probability.

3.4.1. Jump Proposition Matrix

Therefore, in a second preprocessing step a jump proposition matrix $A$ is calculated. Its entries $A_{jk}$ are the probabilities of proposing a configuration $m_k$ from a point whose nearest neighbor in $\mathcal{M}$ is $m_j$. Consequently, $A$ must be a stochastic matrix, i.e.

$$\sum_{k=1}^{C} A_{jk} = 1, \qquad j = 1, \dots, C. \tag{3.32}$$

In order for the Metropolis algorithm to be applicable, the detailed balance condition given by equation 2.8 must be satisfied. Choosing $A$ symmetric, i.e.

$$A_{jk} = A_{kj}, \qquad j, k = 1, \dots, C, \tag{3.33}$$

ensures detailed balance, as the proposed new position $\tilde q$ depends only on the position of $m_k$ and the position of $q$ relative to $m_j$.

Let $\tilde A$ be a $C \times C$ matrix with

$$\tilde A_{jk} := \begin{cases} \exp\left(-\beta\, |V(m_k) - V(m_j)|\right), & j \neq k, \\ 0, & j = k. \end{cases} \tag{3.34}$$

The doubly stochastic symmetric matrix $A$ is computed by scaling the symmetric, non-stochastic matrix $\tilde A$ using Ruiz's algorithm [53]. Using the jump proposition matrix $A$ for trial generation is hoped to yield a high acceptance ratio, as

• the acceptance probability depends on $V(\tilde q) - V(q)$, and due to spatial proximity $q$ and $\tilde q$ are expected to be close in potential energy to $m_j$ and $m_k$, respectively, and

• it is hoped that regions of similar potential energy have similar shapes as well.

The second point is very important, as a trial $\tilde q$ whose nearest neighbor in $\mathcal{M}$ is not $m_k$ has to be rejected as well. Figure 3.8 illustrates on a 2-dimensional conformation space how the acceptance ratio can decrease when the given low-energy regions have different geometric shapes. For $j \neq k$, $\tilde A_{jk}$ depends only on $|V(m_k) - V(m_j)|$, and the scaled matrix $A$ remains symmetric. Consequently, the jump from $m_j$ to $m_k$ and the reverse jump are proposed with equal probability. Setting $\tilde A_{jj} = A_{jj} = 0$ skips some unnecessary computations. Accepting $m_k = m_j$ has the same effect as rejecting the jump (and staying near $m_j$).
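The following sketch covers the preprocessing step and assumes that the energies $V(m_k)$ of the representatives are given. It first builds $\tilde A$ from equation 3.34 and then scales it symmetrically. The scaling is modeled on Ruiz's iteration in the 1-norm: it divides each entry by the square roots of the corresponding row sums until all row sums equal 1. It is an illustrative reimplementation, not the code used in this thesis, and the function names are hypothetical.

```python
import math


def raw_jump_matrix(energies, beta):
    """Unscaled symmetric matrix of eq. (3.34): exp(-beta*|V(m_k) - V(m_j)|), zero diagonal."""
    c = len(energies)
    return [[0.0 if j == k else math.exp(-beta * abs(energies[k] - energies[j]))
             for k in range(c)] for j in range(c)]


def ruiz_scale(a, tol=1e-10, max_iter=1000):
    """Scale a symmetric non-negative matrix (C >= 2) towards a doubly stochastic one.

    Every sweep divides entry (j, k) by sqrt(r_j * r_k), where r holds the current
    row sums. For a symmetric matrix row and column sums coincide, so the result
    stays symmetric, and zero entries (the diagonal) stay zero.
    """
    a = [row[:] for row in a]
    for _ in range(max_iter):
        r = [sum(row) for row in a]
        if max(abs(x - 1.0) for x in r) < tol:
            break
        s = [1.0 / math.sqrt(x) for x in r]
        a = [[s[j] * a[j][k] * s[k] for k in range(len(a))] for j in range(len(a))]
    return a
```

The scaling preserves the zero diagonal. For $C = 3$, the constraints on a symmetric, doubly stochastic matrix with zero diagonal already force every off-diagonal entry to $1/2$. The energies therefore only influence the proposal probabilities once $C \ge 4$.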

Figure 3.8.: Jump steps proposed by ConfJump in three different scenarios (panel labels: representatives $m_i$ and $m_j$, points $x$ and $\tilde x$, shift $\Delta$, iso-energy contours). Coordinate axes represent two different internal coordinates. The left panel shows a jump step that is accepted: the low-energy regions represented by $m_i$ and $m_j$ have a very similar shape. If $m_i$ and $m_j$ differ strongly in their positions relative to the regions they represent, as shown in the central panel, the acceptance ratio is low. The same holds true if the two regions have very different shapes, as shown on the right.

3.4.2. ConfJump as a rigorous sampling method

A Metropolis Monte Carlo sampling using only the jump method would not be ergodic. From a starting point $q^{(0)}$, only a small part of the configuration space $\Omega$ is reachable by a series of jump steps. However, the method can be combined with a regular HMC approach. The result is a Markov chain Monte Carlo method in which the configuration $q^{(i+1)}$ is determined from the current configuration $q^{(i)}$ by attempting a jump step with a fixed low probability $P_{jump}$ and an HMC step with probability $1 - P_{jump}$. The jumping Metropolis Monte Carlo is not ergodic and thus does not have a unique stationary distribution. However, by construction it satisfies detailed balance with respect to the thermodynamically correct distribution. Mathematically, the Boltzmann distribution at temperature $T$ is one possible invariant measure of the non-ergodic Markov operator of the jumping Metropolis Monte Carlo. In contrast, hybrid Monte Carlo is an ergodic Markov process. Its underlying Markov operator has the target distribution as its unique invariant measure, i.e. as the unique stationary distribution [21]. Combining the two, with a jump step with probability $P_{jump}$ and an HMC step with probability $1 - P_{jump}$, yields an ergodic Markov process. Its unique stationary distribution is the Boltzmann distribution at temperature $T$.
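The invariance part of this argument is a one-line computation. Write $K_J$ and $K_{HMC}$ for the transition kernels of the jump step and the HMC step (notation introduced only here) and $\pi$ for the Boltzmann distribution at temperature $T$. Both kernels leave $\pi$ invariant, i.e. $\pi K_J = \pi$ and $\pi K_{HMC} = \pi$:

$$K = P_{jump}\, K_J + (1 - P_{jump})\, K_{HMC}, \qquad \pi K = P_{jump}\, \pi K_J + (1 - P_{jump})\, \pi K_{HMC} = P_{jump}\, \pi + (1 - P_{jump})\, \pi = \pi.$$

Uniqueness comes from the HMC component. Since $1 - P_{jump} > 0$, every region reachable by HMC remains reachable by the combined chain.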

3.4.3. The ConfJump Algorithm

Let $\mathcal{M} = \{m_1, \dots, m_C\}$ be a set of $C$ low-energy configurations given in internal coordinates (Z-matrix representation), obtained from some minimization algorithm on the potential energy function $V$. Further, let $P_{jump}$ denote the fixed probability of making a jump step rather than an HMC step.

Preprocessing: Compute the jump proposition matrix A (cf. 3.4.1).

Starting from an initial configuration $q^{(0)} \in \Omega$, repeat the following:

1. Generate a uniformly distributed random number $\zeta \in [0, 1)$.

2. If $\zeta > P_{jump}$, perform an HMC step (cf. 2.4).

3. Else, perform a jump step:

Let $q = q^{(i)}$ denote the current state, and let $x$ be the Z-matrix representation of $q$.

(a) Find the nearest low-energy configuration $m_j \in \mathcal{M}$ to $x$. This is done based on the cyclic Euclidean distance in the space of important dihedral angles (cf. 2.5).

(b) Select a second low-energy configuration $m_k$ with probability $A_{jk}$.

(c) Compute $\tilde x = x + (m_k - m_j)$. Let $\tilde q$ be $\tilde x$ transformed into Cartesian coordinates.

(d) Find the nearest low-energy configuration $X \in \mathcal{M}$ to $\tilde x$.

(e) If $X \neq m_k$, set $q^{(i+1)} := q$.

(f) Else, accept $\tilde q$ according to the Metropolis acceptance criterion (cf. 2.2), i.e. generate a uniformly distributed random number $\xi \in [0, 1)$ and set the new configuration

$$q^{(i+1)} := \begin{cases} \tilde q, & \xi < \min\{1, \exp(-\beta \Delta V)\}, \\ q, & \text{else}, \end{cases} \qquad \Delta V = V(\tilde q) - V(q). \tag{3.35}$$

This is repeated either a fixed number of times $n$ or until convergence is detected. Afterwards, conformations can be identified from the trajectory $(q^{(i)})$ by successive Perron Cluster Analysis as described by Cordes et al. in [13].
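The loop body above can be condensed into one function. For simplicity, this sketch assumes that a dihedral vector in radians stands in for the Z-matrix, so no Cartesian transformation is performed. The HMC step and the energy function are caller-supplied callables. `confjump_step`, `hmc_step` and `energy` are hypothetical names, not part of any published code.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def cyclic_distance(x, y):
    """Euclidean distance between dihedral vectors, measuring each angle on the circle."""
    total = 0.0
    for a, b in zip(x, y):
        d = abs(a - b) % TWO_PI
        total += min(d, TWO_PI - d) ** 2
    return math.sqrt(total)


def nearest(x, minima):
    """Index of the representative in `minima` closest to x."""
    return min(range(len(minima)), key=lambda c: cyclic_distance(x, minima[c]))


def confjump_step(q, energy, minima, A, beta, p_jump, hmc_step, rng=random):
    """One iteration of the ConfJump loop (steps 1 to 3 of section 3.4.3).

    q: current dihedral vector (radians); energy(q) returns V(q);
    minima: representatives m_1..m_C; A: jump proposition matrix (rows sum to 1);
    hmc_step(q): caller-supplied HMC step returning the next state.
    """
    zeta = rng.random()                                      # step 1
    if zeta > p_jump:                                        # step 2
        return hmc_step(q)
    j = nearest(q, minima)                                   # step 3(a)
    k = rng.choices(range(len(minima)), weights=A[j])[0]     # step 3(b)
    q_new = [(a + mk - mj) % TWO_PI                          # step 3(c)
             for a, mk, mj in zip(q, minima[k], minima[j])]
    if nearest(q_new, minima) != k:                          # steps 3(d), 3(e)
        return q
    delta_v = energy(q_new) - energy(q)                      # step 3(f)
    if delta_v <= 0.0 or rng.random() < math.exp(-beta * delta_v):
        return q_new
    return q
```

Checking `delta_v <= 0.0` first is equivalent to comparing against $\min\{1, \exp(-\beta \Delta V)\}$. It also keeps `math.exp` from overflowing for large energy decreases.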

3.4.4. Efficiency of the ConfJump strategy

A point that is only briefly discussed in [71] is the efficiency of the ConfJump strategy. In fact, only a “proof of concept” using numerical examples is provided. Some considerations regarding the theoretical efficiency of ConfJump are presented here.

Compared to pure HMC, the ConfJump strategy has very little computational overhead as long as the acceptance ratio for jump steps is reasonably high. Its use of a jump proposition matrix for trial generation is hoped to improve the efficiency over the similar Jump Between Wells method [60, 61]. Computing this matrix adds only a small preprocessing cost. Each iteration of Ruiz's algorithm on the $C \times C$ matrix costs $O(C^2)$, and the algorithm usually converges within a few iterations.

However, ConfJump relies fundamentally on precomputed information about the low-energy regions of the potential energy $V$, which is a very rough high-dimensional function. A global search strategy has to be employed in order to find all minima in conformational space. Any such algorithm is necessarily affected by the “curse of dimensionality”. Even if only a projection of $V$ into some lower-dimensional space (e.g. the space of heavy dihedral angles) is explored, the search algorithm will still have a computational cost that is exponential in the dimension of the search space. In fact, the number of local minima tends to increase exponentially with the size of the system under consideration [29]. Additionally, little a priori information about $V$ can be employed, as it is a multimodal nonconvex function [5]. Therefore, the only available option when searching for all local minima is a systematic search. Furthermore, Wille and Vennik [76] proved that searching for all minima of the Lennard-Jones part of the potential¹ alone is already an NP-hard problem.

It must also be stressed that the low-energy regions of high-dimensional, very rough potential energy landscapes can hardly be expected to all have a similar shape, which a good acceptance rate of jump steps requires. Rather, we would expect the shape of low-energy regions to become more and more irregular. This poses a serious problem for ConfJump, as the direction of jumps is determined exclusively by the relative positions of $q$, $m_j$, and $m_k$ (see step 3c of the algorithm).

¹The Lennard-Jones potential is the additive part of $V$ that describes the non-covalent, non-electrostatic interactions between pairs of atoms. These are repulsion between atoms whose electron orbitals overlap, and van der Waals attraction [23, 37, 55].

For these reasons, the applicability of ConfJump is limited from the outset to small to medium-sized molecules. In practice, the ConFlow algorithm is able to identify the low-energy regions of a wide variety of drug-sized molecules.


4. Convergence diagnostics

When discussing Markov chain Monte Carlo algorithms in chapter 2, one very important question has been left open: For how many steps should the algorithm be iterated until the sampled distribution is a reasonably good approximation of the “true” distribution? Some criterion is needed to determine when improvements in the quality of the approximation of the target distribution can no longer be expected from continuing the simulation. A related question that is no less important is: How do we differentiate between “good” and “bad” sampling runs? Given two or more sampling results, which one approximates the physically correct distribution best? This requires a distance measure, preferably a metric, on a suitably defined space of sampling results.

Any rigorously conducted MCMC sampling converges almost surely towards the thermodynamically correct distribution of the system under consideration as the number of simulation steps $n$ goes to infinity, with a statistical error of order $O(1/\sqrt{n})$ [21]. However, this is only statistical convergence, and in practice it is very hard to tell when a given upper bound for the sampling error has been reached. Therefore, heuristics must be used. While generally unable to detect true convergence, they can at least give a necessary condition for convergence. All convergence criteria are necessarily unreliable for slowly mixing Markov chains [15], i.e. for chains whose state space is divided into subsets between which transitions are rare. This is an intrinsic property of MCMC algorithms applied to molecular systems in a physically meaningful statistical ensemble, where the sampled high-dimensional probability density functions are very rough [70]. No generally applicable convergence criterion for ergodic Markov processes can reliably distinguish between true convergence and local convergence within some metastable region in conformational space [9]. If a convergence monitor signals convergence, it is always possible that as yet undiscovered regions with high statistical weight exist, separated from the sampled subset by high energy barriers [45]. Using more than one convergence diagnostic can therefore give a stricter criterion of (global) convergence. This increases the probability of detecting mere local convergence.

A wide variety of approaches is in use for convergence diagnostics; for an overview see [9] or [15]. Most methods are based on analyses of the properties of Markov processes and are applicable to a wide range of problems. The most commonly used of these is presented in section 4.1. In addition, knowledge of the properties of the systems under consideration can (and should) be employed. Section 4.3 develops a semi-empirical convergence criterion for molecules that contain rotational symmetries. It can give an independent necessary condition for convergence of the MCMC method. Generalization of this criterion leads to a histogram-based method for comparing the results of two different sampling runs, which is presented in section 4.2.

4.1. The Gelman-Rubin Criterion

The Gelman-Rubin statistic [8, 26] is one of the most widely used convergence indicators in practice. Gelman and Rubin's algorithm has a low computational cost, is applicable to any type of ergodic Markov process, and is very easy to implement. It requires multiple independent Markov chains launched from different starting points drawn from an overdispersed distribution. It differentiates between true convergence and trapping in some subset of the conformational space. To do so, it compares the variance within each chain with the variance between chains for some set of one-dimensional real-valued observables.

For $m$ independent Markov chains with a length of $n$ steps each, the average of the $m$ within-chain variances for an observable $\theta$ is given by

$$W = \frac{1}{m(n-1)} \sum_{j=1}^{m} \sum_{i=1}^{n} \left( \theta_{ij} - \bar\theta_j \right)^2, \tag{4.1}$$

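where $\theta_{ij}$ denotes the value of $\theta$ at step $i$ in chain $j$ and $\bar\theta_j$ is the mean of chain $j$. The variance between the $m$ chain means $\bar\theta_j$ is

$$\frac{1}{n} B = \frac{1}{m-1} \sum_{j=1}^{m} \left( \bar\theta_j - \bar\theta \right)^2, \tag{4.2}$$

where $\bar\theta$ denotes the mean over all chains.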

From $W$ and $B$, the total variance of the observable $\theta$ can be estimated:

$$\hat\sigma^2 = \left( 1 - \frac{1}{n} \right) W + \frac{1}{n} B. \tag{4.3}$$

If the MCMC simulation has converged, $W$ and $\hat\sigma^2$ are almost equal, since $W$ and $B$ converge (statistically) to the same value. Suppose, however, that one chain gets trapped within a local subset of the conformational space which it never leaves, while other chains also generate a significant number of samples from other regions. Then $W$ will be lower than $B$. Given overdispersed starting points, $\hat\sigma^2$ tends to overestimate the total variance. Taking the sampling variance of both $\hat\mu = \bar\theta$ and $\hat\sigma^2$ into account yields a pooled posterior variance estimate

$$\hat V = \hat\sigma^2 + \frac{B}{mn}. \tag{4.4}$$

The Gelman-Rubin statistic, also called potential scale reduction,

$$\sqrt{\hat R} = \sqrt{\frac{\hat V}{W}}, \tag{4.5}$$

the square root of the ratio between the pooled and the within-chain variance estimate, is then used as a measure of how much closer the sampled distribution might come to the stationary distribution if the simulation were run longer. $\sqrt{\hat R}$ is typically greater than 1. For a converged simulation, $\hat R$ approaches $1 + \frac{1}{mn}$, which tends to 1 as $n \to \infty$. Gelman and Rubin suggest running the simulation at least until $\sqrt{\hat R}$ is less than 1.1 or 1.2 [25, 27].

Starting points from an “overdispersed” distribution are generated by a short disperse sampling. In it, $m$ independent Markov chains are launched from the same arbitrary starting point at a high temperature $T_{disperse}$. This leads to a Boltzmann distribution that covers the distribution at temperature $T$: it shares all important minima and maxima with it while being generally more variable. A subsequent very short burn-in sampling at a temperature $T_{burn\text{-}in} \le T$ ensures that the $m$ Markov chains used for the actual sampling start at points within important regions of the distribution at temperature $T$. These are usually near local minima of the potential energy surface.

For all experiments described in this thesis, the implementation by Holger Meyer [47] is used. It monitors convergence in all $d$ important dihedral angles (cf. 2.5) by using both the sine and the cosine of each torsion angle $\varphi_i$ as linearized observables. This results in $2d$ observables, for each of which a potential scale reduction factor is computed at regular intervals. Approximate convergence is assumed when the maximum of these factors drops below a fixed threshold (usually 1.01).
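The following is a minimal sketch of equations 4.1 to 4.5 and of the monitoring scheme just described, in Python with the standard library only. It assumes the trajectories are stored as nested lists (chain, step, dihedral) and that no chain is constant ($W > 0$). The function names are hypothetical, and this is not the implementation of [47].

```python
import math


def psrf(chains):
    """Potential scale reduction sqrt(R) of one observable, eqs. (4.1)-(4.5).

    chains: m lists of n >= 2 observable values each (no chain may be constant).
    """
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    w = sum(sum((x - mu) ** 2 for x in c) for c, mu in zip(chains, means)) / (m * (n - 1))
    b = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)   # eq. (4.2) times n
    sigma2 = (1.0 - 1.0 / n) * w + b / n                       # eq. (4.3)
    v = sigma2 + b / (m * n)                                   # eq. (4.4)
    return math.sqrt(v / w)                                    # eq. (4.5)


def max_psrf_dihedrals(traj):
    """Maximum PSRF over sin and cos of every dihedral; traj[j][i] = Phi of chain j, step i."""
    d = len(traj[0][0])
    values = []
    for a in range(d):
        for f in (math.sin, math.cos):
            values.append(psrf([[f(phi[a]) for phi in chain] for chain in traj]))
    return max(values)
```

Convergence would then be assumed once `max_psrf_dihedrals(traj)` falls below the chosen threshold, e.g. 1.01. Using sine and cosine avoids the artificial jump of a raw angle at $0 = 2\pi$.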

Monitoring convergence by the Gelman-Rubin method has a low additional computational cost, since the samples generated in all $m$ chains are used equally for the subsequent cluster analysis. This is possible because, due to ergodicity, running multiple moderately long Markov chains is equivalent to running one very long chain in the limit [28]. Only a small computational overhead of $O(m)$ arises from the $m$ short disperse and burn-in samplings.

The Replica Exchange method (cf. 3.3), which inherently uses multiple independent Markov chains, is directly accessible to Gelman and Rubin's method. Consider $m$ chains at different temperatures that are allowed to exchange temperatures at frequent intervals. For these chains, the within-chain variance and the between-chain variance calculated by equations 4.1 and 4.2 also converge to the same value. However, these “switching” chains sample a combined distribution on a generalized ensemble (see figure 3.6), which differs from the target distribution. The target distribution is in fact only sampled by the Markov chain obtained by piecing together the segments at the sampling temperature $T$. Therefore, a greater total variance is expected from the combined distribution. Still, the Gelman-Rubin convergence monitor remains applicable and is no less reliable than on $m$ Markov chains sampling the Boltzmann distribution at temperature $T$. This would not be the case if the RE method were implemented with $m$ chains at different temperatures that exchange positions rather than temperatures, since then each chain would sample a different distribution.


4.2. Comparing Sampling Results

The primary goal of this thesis is to compare the performance of the three HMC-based sampling methods presented in chapter 3 on a certain set of molecules. This requires comparisons of different sampling runs. Different ideas were considered. The aim was that comparisons can be performed easily and independently of the sampling technique that generated the results, while incorporating as much as possible of the information generated in the sampling process. The idea of comparing clustering results was quickly discarded, as there is no way to define an informed distance measure on sets of clusters in a high-dimensional space. Further, it was felt that the sampling results should be compared in a more direct way. Thus, a metric on approximated statistical distributions, which are directly computable from any sampling result, was developed; it is presented in this section. A variant of this metric has been developed which is able to monitor convergence during sampling. The resulting symmetry criterion is presented in the following section.

The result of an HMC sampling run with a total length of $n$ steps is a time series (trajectory) of molecule configurations $(q_1, \dots, q_n)$. Projected into the conformational space defined by “heavy” dihedrals (cf. 2.5), it becomes a time series $(\Phi_1, \dots, \Phi_n)$ whose data points are vectors of dihedral angles $\Phi = (\varphi_1, \dots, \varphi_d)$, where $d$ is the number of dihedral angles used to define the conformational space. This projection discards information from those degrees of freedom which do not define metastabilities.

When comparing two sampling results, it makes no sense to look at individual molecule configurations. Rather, we want to compare the distributions sampled by the two simulation runs. In doing so, all information about the order in which the sampling points were generated (and thus about transition probabilities) is discarded. The two $d$-dimensional sampled probability density functions are not compared directly. Instead, the sampled distribution is projected into each of the $d$ one-dimensional subspaces of the conformational space, and the comparison is based on these projections, i.e. we look at the sampled distribution in each dihedral angle separately. A sampling result $\mathcal{S}$ is thus interpreted as a tuple $\mathcal{S} = (\rho_1, \dots, \rho_d)$ of approximated one-dimensional statistical distributions, one for each dihedral angle, each defined on the interval $[0, 2\pi)$. Being density functions, the functions $\rho_i$ are non-negative with

$$\int_0^{2\pi} \rho_i(\varphi)\, d\varphi = 1.$$

These distributions can easily be approximated discretely as one histogram per torsion angle $\varphi_i$, by binning all configurations $\Phi_j$ according to their value of $\varphi_i$ into $z$ bins of equal width.
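Binning a projected trajectory can be sketched as below, assuming angles in radians. Values outside $[0, 2\pi)$ are wrapped first. The function name and the data layout (a list of dihedral vectors $\Phi_j$) are illustrative assumptions.

```python
import math

TWO_PI = 2.0 * math.pi


def dihedral_histograms(traj, z):
    """Return d normalized histograms with z equal-width bins on [0, 2*pi).

    traj: list of n dihedral vectors Phi = (phi_1, ..., phi_d), in radians.
    """
    d = len(traj[0])
    hists = [[0] * z for _ in range(d)]
    for phi in traj:                      # every point is processed exactly once
        for i, angle in enumerate(phi):
            b = int((angle % TWO_PI) / TWO_PI * z)
            hists[i][min(b, z - 1)] += 1  # min() guards against rounding up to z
    n = len(traj)
    return [[count / n for count in h] for h in hists]
```

Dividing by $n$ makes each histogram sum to 1, so it approximates $\rho_i$ in the sense of the normalization above.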

One-dimensional projections are used instead of the original sampled distribution because comparing two $d$-dimensional distributions is simply not practicable. Comparing two $d$-dimensional functions using a discretization of $z$ bins in every dimension has a computational cost of $z^d$, and storing the resulting histogram requires the same number of memory cells. This is only feasible in reasonable time for very low values of $d$. The determining factor for the cost of comparing sampling results should be the number of sampling points $n$. It has a linear influence on the computational cost, as every point has to be processed exactly once when accumulating a histogram of the density of the sampling points. Comparing sets of $d$ one-dimensional histograms instead has a cost of $z \cdot d$ (both in time and memory), which is clearly preferable. The same idea of looking at the $d$ degrees of freedom separately rather than discretizing the original $d$-dimensional space is used in [13] for cluster analysis.
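The following purely illustrative numbers (not taken from the experiments in this thesis) put the two costs into perspective. With $z = 36$ bins of $10^\circ$ and $d = 10$ dihedrals:

$$z^d = 36^{10} \approx 3.7 \cdot 10^{15} \qquad \text{versus} \qquad z \cdot d = 36 \cdot 10 = 360.$$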

Suppose a metric defined on the space of $d$ one-dimensional projections of the $d$-dimensional sampled distribution indicates a distance of zero between two sampling results. This is only a necessary, not a sufficient, condition for the two original approximated $d$-dimensional distributions being identical. However, both distributions result from samplings of the same molecule exploring the same potential energy landscape. Correlations between dihedral angles are therefore expected to have the same effect in both sampling results. The difference between the original sampled distributions is thus not expected to be significantly greater than the difference measured between the sets of their one-dimensional projections. Moreover, if that difference lies below some threshold so that the sampling results would be considered “similar”, the distance between the original $d$-dimensional distributions is expected to be “low” as well.

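Let $\mathcal{S}_1 = (\rho_1, \dots, \rho_d)$ and $\mathcal{S}_2 = (\sigma_1, \dots, \sigma_d)$ be two sampling results given by the approximated distributions $\rho_i$ and $\sigma_i$, respectively, for each dihedral angle $\varphi_i$. An informative measure for the difference between $\mathcal{S}_1$ and $\mathcal{S}_2$ within one torsion angle $\varphi_i$ is the $L^1$ metric in function space:

$$\delta_i(\mathcal{S}_1, \mathcal{S}_2) = \frac{1}{2} \int_0^{2\pi} \left| \rho_i(\varphi) - \sigma_i(\varphi) \right| d\varphi. \tag{4.6}$$

The factor $\frac{1}{2}$ ensures that the values of $\delta_i$ always lie in the interval $[0, 1]$:

$$\int_0^{2\pi} \left| \rho_i(\varphi) - \sigma_i(\varphi) \right| d\varphi \le \int_0^{2\pi} \left( \rho_i(\varphi) + \sigma_i(\varphi) \right) d\varphi = 2. \tag{4.7}$$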

The value 2 is not only an upper bound for the difference integral; the integral can actually take on this value, namely if $\rho_i(\varphi) = 0$ for every point $\varphi$ with $\sigma_i(\varphi) > 0$ and vice versa.

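The metric defined in equation 4.6 can be extended to a metric over tuples of dihedral distributions by simply averaging over all dihedrals:

$$\delta(\mathcal{S}_1, \mathcal{S}_2) = \frac{1}{d} \sum_{i=1}^{d} \delta_i(\mathcal{S}_1, \mathcal{S}_2) = \frac{1}{2d} \sum_{i=1}^{d} \left( \int_0^{2\pi} \left| \rho_i(\varphi) - \sigma_i(\varphi) \right| d\varphi \right). \tag{4.8}$$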

This metric is highly informative, since it uses information from every point in every histogram. It is therefore well suited for measuring distances between sampling results.


Let $w = (w_1, \dots, w_d) \in [0, 1]^d$ be a weight vector with $\sum_{i=1}^{d} w_i = 1$. Then the weighted average over the metric's values for each dihedral,

$$\tilde\delta(\mathcal{S}_1, \mathcal{S}_2) = \frac{1}{2} \sum_{i=1}^{d} w_i \cdot \left( \int_0^{2\pi} \left| \rho_i(\varphi) - \sigma_i(\varphi) \right| d\varphi \right), \tag{4.9}$$

is a metric as well. Note that $\delta$ is the special case of $\tilde\delta$ with $w_i = 1/d$, $i = 1, \dots, d$. $\tilde\delta$ allows weighting the histograms of the different dihedral angles against each other. This can be used to put more emphasis on major metastabilities than on minor ones. Major metastabilities are separated from the rest of conformational space by high potential energy barriers, whereas minor metastabilities correspond to shallower local minima in conformational space. As an example, the weights can be derived from the second eigenvalues $\lambda_2$ of each dihedral's transition matrix $T$ obtained by successive Perron Cluster Analysis [13]. This means weighting the dihedrals by their degree of metastability.

In practice, the density functions $\rho_i$ are approximated by histograms $H_i$. Each histogram consists of $z$ bins $H_i^1, \dots, H_i^z$ of equal width that form a discretization of the interval $[0, 2\pi)$. The histograms are normalized, i.e. for all $i = 1, \dots, d$:

$$\sum_{j=1}^{z} H_i^j = 1.$$

The difference between two sampling results given as sets of histograms $H$ and $J$ is then calculated as

$$\delta(H, J) = \frac{1}{2d} \sum_{i=1}^{d} \sum_{j=1}^{z} \left| H_i^j - J_i^j \right|, \tag{4.10}$$

which is the average bin-wise difference of all $d$ pairs of histograms. Again, histograms for different dihedrals can be weighted against each other in a way analogous to equation 4.9.
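The following sketch implements equation 4.10 together with the weighted variant analogous to equation 4.9. Histograms are assumed to be normalized lists of equal length, and the function name is hypothetical.

```python
def histogram_distance(H, J, weights=None):
    """Distance of eq. (4.10) between two sets of d normalized z-bin histograms.

    With weights=None every dihedral gets weight 1/d; otherwise weights[i] plays
    the role of w_i in the weighted variant (weights should sum to 1).
    """
    d = len(H)
    if weights is None:
        weights = [1.0 / d] * d
    return 0.5 * sum(
        w * sum(abs(h - j) for h, j in zip(h_i, j_i))
        for w, h_i, j_i in zip(weights, H, J)
    )
```

Two results whose histograms have disjoint support in every dihedral are at distance 1, the maximum permitted by equation 4.7. Identical results are at distance 0.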

The histogram-based metric thus defined is a very widely applicable method for comparing different sampling results, as it depends only on the sampling points themselves. Even the results of ZIBgridfree (cf. 3.2) can be compared to each other and to sampling results from other techniques. These results consist of a set of $s$ sampling results with different weights $w_1, \dots, w_s \in [0, 1]$. The total histogram $H$ for one ZIBgridfree sampling run is computed from the normalized histograms $H_1, \dots, H_s$ of the samplings of the $s$ partial densities (see 3.2.2). Each $H_i$ is weighted by the thermodynamic weight calculated in the sampling analysis:

$$H = \sum_{i=1}^{s} w_i H_i. \tag{4.11}$$

Similarly, the method can be applied to results of sampling techniques that assign individual weights to all sampling points. These point weights can easily be taken into account when accumulating the histograms: When a sampling point with weight w
is determined to fall into a bin b, the counter for b is not increased by 1 but by w.
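Point weights and the weighted combination of equation 4.11 fit into the same scheme. The sketch below handles one dihedral per call and assumes that the point weights are a NumPy array and that the thermodynamic weights w_1, ..., w_s sum to 1 (otherwise the combined histogram would have to be renormalized); the function names are hypothetical.

```python
import numpy as np

def weighted_histogram(angles, point_weights, z=72):
    """Histogram of one dihedral in which a sampling point falling into
    bin b increases the counter of b by its weight instead of by 1."""
    angles = np.mod(np.asarray(angles, dtype=float), 2 * np.pi)
    bins = np.minimum((angles / (2 * np.pi) * z).astype(int), z - 1)
    counts = np.bincount(bins, weights=point_weights, minlength=z)
    return counts / counts.sum()

def combine_partial_histograms(partial_hists, thermo_weights):
    """Equation 4.11: H = sum_i w_i H_i over the s partial densities.

    partial_hists: array (s, z) of normalized histograms H_1, ..., H_s.
    thermo_weights: array (s,) of thermodynamic weights w_1, ..., w_s.
    """
    return np.asarray(thermo_weights) @ np.asarray(partial_hists)
```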

With this distance measure the quality of different sampling runs can be judged by comparing the different sampling results to a reference. Unfortunately, there exists no general method for assessing the quality of a sampling run, and thus it is in general impossible to create a reliable reference run. However, one can at least use as reference a sampling run in which one has great confidence, e.g. because of

• having run the simulation for a very long time,

• obtaining very similar results (as measured by the metric presented in this section) from very long simulations with different sampling methods, e.g. Replica Exchange and ConfJump, from different starting points, and/or

• finding many features of the sampling result in accordance with chemical intuition and possibly expert knowledge.

As mentioned in section 3.4, the ConfJump approach has been found to be reliable for typical drug-like molecules of small to medium size for which the ConFlow algorithm can reliably identify representatives of all important low-energy regions. Therefore, for the numerical experiments conducted for this thesis, reference runs were created by running several long ConfJump simulations (5 HMC chains at 200000 steps each), verifying that all pair-wise distances were below a threshold of 0.03, and taking as reference the simulation result that had the lowest distance to all others. This reference was then verified by performing long simulations using the Replica Exchange method and comparing the results to the reference run.
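The selection of a reference run described above can be sketched as follows. The sketch assumes that the pairwise distances δ between the candidate runs (equation 4.10) have already been computed, and it reads "lowest distance to all others" as the smallest sum of distances; that reading and the function name are assumptions.

```python
import numpy as np

def choose_reference(dist, threshold=0.03):
    """Index of the run with the smallest summed distance to all other
    runs, or None if some pair of runs differs by threshold or more.

    dist: symmetric (m x m) matrix of pairwise distances delta(H, J).
    """
    dist = np.asarray(dist, dtype=float)
    off_diagonal = dist[~np.eye(len(dist), dtype=bool)]
    if np.any(off_diagonal >= threshold):
        return None  # the runs disagree; no trustworthy reference
    return int(np.argmin(dist.sum(axis=1)))
```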


4.3. Symmetry criterion for convergence

When assessing the convergence behavior of different HMC-based sampling methods, it is clearly not advisable to rely on Gelman and Rubin’s statistic alone, as

• no convergence indicator is able to reliably discern true convergence on the whole conformational space from local convergence within some metastable region (see [9] and also page 43 in this thesis), and

• when dealing with very rough high-dimensional functions such as the Boltzmann distributions of biomolecules, it is quite probable that some region in conformational space with a high statistical weight is never reached by any of the Markov chains, a case in which the Gelman-Rubin statistic falsely indicates convergence.

Therefore, it has been one of the goals of this thesis from the outset to develop a new convergence criterion, to be used in addition to Gelman and Rubin’s method, which would incorporate knowledge about the system to be simulated. The idea for the criterion presented in this section stems from the observation that many biomolecules contain rotational symmetries which should, of course, be reproduced in sampling. If, e.g., the molecule under consideration contains a symmetric planar ring that is connected to the rest of the molecule by one single bond¹, the distribution of the torsion angle corresponding to that single bond over all generated molecule configurations should be periodic with a period of π (see fig. 4.1). A configuration with the torsion angle at a value of ψ and the configuration that has the same torsion angle set to π + ψ but is otherwise identical to the first one behave physically and chemically in the same way and are therefore generated with equal probability in sampling.

A measure for the sampling error in a rotationally symmetric dihedral is defined on the basis of the metric for comparing histograms developed in section 4.2. The criterion proposed here is only applicable to molecules containing rotational symmetries. However, a cursory look at the ligand structures stored in the Protein Data Bank [4] reveals such symmetries, particularly symmetric planar ring structures, to be an abundant feature of drug-like molecules. It is worth noting that the symmetry criterion is applicable to a large fraction of the class of peptide ligands (see e.g. [14, 20, 34]), as the amino acids phenylalanine and tyrosine each contain a symmetric aromatic ring.

[Figure: rotated ring structures and a plot of the potential energy V against the torsion angle from 0 to 2π]

Figure 4.1.: Rotation of a symmetric planar ring and its effect on the potential energy.

In order to define a measure of sampling error based on this, a histogram is created for each rotationally symmetric torsion angle of the molecule. This is done by binning the configurations generated by sampling according to their value for the symmetric torsion angle. Throughout this work a fixed bin width of 5° (π/36) was used. Then the sections of the histogram that are expected to be identical due to molecule symmetries are compared to each other. This is done by applying the error measure for comparing histograms presented in section 4.2 to all pairs of symmetric histogram sections (see fig. 4.2). The symmetry error measure is derived from equation 4.6. It is defined as the mean bin-wise difference between all pairs of symmetric sections of the (normalized) histogram H (as defined on page 48) for a symmetric torsion angle ϕ.

¹ This connecting bond must lie on a symmetry axis of the ring.



For 180° rotational symmetry the symmetry error is calculated as

$$E_{\mathrm{sym}}(\varphi) = \sum_{i=1}^{z/2} \left| H_i - H_{z/2+i} \right|. \tag{4.12}$$

Using a bin width of π/36 yields a number of bins z = 72. In the case of 120° rotational symmetry the average difference between three pairs of histogram sections is used:

$$E_{\mathrm{sym}}(\varphi) = \frac{1}{2} \sum_{i=1}^{z/3} \left( \left| H_i - H_{z/3+i} \right| + \left| H_i - H_{2z/3+i} \right| + \left| H_{z/3+i} - H_{2z/3+i} \right| \right). \tag{4.13}$$

The factor 1/2 again ensures that the error lies in the interval [0, 1]. It is calculated as 1/3, for averaging over the 3 pairs of histogram sections, divided by 2/3, which is the maximum average difference between these pairs.

Figure 4.2.: The symmetry error for a single bond with 180° rotational symmetry is measured as the average bin-wise distance between periodic sections of the corresponding histogram.

Thus, we obtain an informative measure for the sampling error which yields a necessary condition for convergence: if the sampling error is still above some fixed threshold, the MCMC sampling has not converged yet. The convergence criterion gets stricter with every rotationally symmetric single bond in the molecule since, as with the Gelman-Rubin indicator, the maximum of all symmetry errors is used as the convergence monitor.
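Equations 4.12 and 4.13, together with the maximum-based monitor, can be sketched as below. The histogram H is assumed to be normalized, with a number of bins z divisible by the order of the symmetry (z = 72 for a bin width of π/36); the default threshold of 0.04 is the value used in the experiments of section 5.3, and the function names are hypothetical.

```python
import numpy as np

def symmetry_error(H, fold):
    """E_sym for a normalized histogram H of a torsion with 2-fold (180°)
    or 3-fold (120°) rotational symmetry; the result lies in [0, 1]."""
    H = np.asarray(H, dtype=float)
    z = H.size
    if fold not in (2, 3) or z % fold:
        raise ValueError("fold must be 2 or 3 and divide the number of bins")
    # Row k holds the bins of the k-th periodic section of the histogram.
    sections = H.reshape(fold, z // fold)
    if fold == 2:  # equation 4.12
        return float(np.abs(sections[0] - sections[1]).sum())
    a, b, c = sections  # equation 4.13
    return float(0.5 * (np.abs(a - b) + np.abs(a - c) + np.abs(b - c)).sum())

def symmetry_converged(histograms, folds, threshold=0.04):
    """Necessary convergence condition: the largest symmetry error over
    all monitored torsions must be below the threshold."""
    worst = max(symmetry_error(H, f) for H, f in zip(histograms, folds))
    return worst < threshold, worst
```

A uniform histogram has symmetry error 0, while a histogram with all mass in a single bin reaches the maximum value 1 for both orders of symmetry.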

4.3.1. Applicability of the symmetry criterion

The symmetry criterion is applicable to all MCMC methods that sample a Boltzmann distribution, such as ConfJump and Replica Exchange. When performing Replica Exchange, either the combined distribution of all chains (which is not a Boltzmann distribution but preserves symmetric behavior) or a Markov chain that is composed of all segments that are at the sampling temperature T can be used. The latter approach was chosen for the simulations performed for this thesis, as it was expected to give a stricter convergence criterion, since only the low-temperature data are used in the cluster analysis.

ZIBgridfree samples a series of different distributions based on modified potential energy functions which do not necessarily assign the same energy value to two symmetric configurations. Reconstructing the overall sampled distribution requires reweighting the sampling results under each modified potential against each other, which has a computational cost of O(n s³), where s is the number of modified potential energy functions and n is the number of sampling steps per individual sampling run (cf. 3.2.6). Moreover, the weights change as the sampling progresses. Thus, building the histograms for the symmetric torsion angles requires looking at all time steps of the samplings under each potential modification. Therefore, a convergence monitor based on symmetry errors should not be used for ZIBgridfree due to its prohibitive computational cost. However, it is easy to calculate symmetry errors during cluster analysis after the correct weights have been calculated. The histogram for a symmetric torsion angle is then built by adding the histograms of the sampling runs for each potential modification, each multiplied by the weight of the corresponding partial density function.

In an RE or ConfJump simulation with an interval of t_test steps between convergence tests, the histogram at time t can be reused when estimating the distribution at time t + t_test. Therefore, each convergence test based on the symmetry criterion only needs to look at the last t_test sampling steps. This is, in fact, less than the computational cost of the Gelman-Rubin convergence monitor, which has to look at all t + t_test steps. Figure 4.3 shows the symmetry error decreasing in a typical sampling run using the Replica Exchange method for the molecule L-benzylsuccinic acid (BZS) shown in figure 5.1, which contains one 180° rotationally symmetric bond.
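Reusing the histogram between convergence tests amounts to keeping raw bin counts and adding only the newly generated samples. A minimal sketch, assuming the torsion values of the last t_test steps are passed in as an array of radians; the class name is hypothetical.

```python
import numpy as np

class IncrementalTorsionHistogram:
    """Running histogram of one symmetric torsion angle: every convergence
    test only bins the samples produced since the previous test."""

    def __init__(self, z=72):
        self.z = z
        self.counts = np.zeros(z)

    def update(self, new_torsions):
        """Add the torsion values (radians) of the last t_test steps and
        return the normalized histogram of all samples seen so far."""
        angles = np.mod(np.asarray(new_torsions, dtype=float), 2 * np.pi)
        bins = np.minimum((angles / (2 * np.pi) * self.z).astype(int), self.z - 1)
        self.counts += np.bincount(bins, minlength=self.z)
        total = self.counts.sum()
        return self.counts / total if total > 0 else self.counts.copy()
```

The cost of each update is proportional to the number of new samples, not to the length of the whole trajectory.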

The fact that multiple chains are sampled as a requirement of Gelman and Rubin’s method is useful for the symmetry criterion as well. By using 5 chains, a number divisible by neither 2 nor 3, we know that at least one chain must sample the transition between the 2 (or 3) symmetric parts of a monitored dihedral’s distribution in order for an approximately equal number of points to be generated from all symmetric parts. If, e.g., none of the 5 chains ever crossed the barrier between the two periodic regions in the distribution of a 180° rotationally symmetric torsion angle, there would be at best 3/5 of the sampling points in one region and 2/5 in the other. The symmetry criterion is thus able to recognize this type of local convergence.
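As a rough numerical illustration, assume (for illustration only) that all chains have the same length and that the points within each symmetric region follow the same normalized profile f_1, ..., f_{z/2}. The best possible 3 : 2 split then gives, by equation 4.12,

$$E_{\mathrm{sym}}(\varphi) = \sum_{i=1}^{z/2} \left| \tfrac{3}{5} f_i - \tfrac{2}{5} f_i \right| = \frac{1}{5} \sum_{i=1}^{z/2} f_i = 0.2,$$

which is five times the threshold of 0.04 used in section 5.3. For 120° symmetry the best possible split of 5 non-crossing chains is 2 : 2 : 1, and equation 4.13 likewise yields ½ · (0 + 1/5 + 1/5) = 0.2.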



[Plot: symmetry error Esym (0 to 0.5) against the number of simulation steps n (0 to 60000)]

Figure 4.3.: The symmetry error decreases with growing number of simulation steps.

4.3.2. Automatic detection of molecule symmetries

Of course, it is desirable that the user of conformation analysis software such as the ZIBgridfree program [47] should not need to input information about rotational symmetries. Rather, it should be possible to identify such symmetries automatically without any input from the user. Therefore, the following algorithm has been developed for examining the symmetry properties of a molecule based on its topology.

Rotational symmetry of single bonds in a molecule is a property of the molecule’s topology rather than its geometry. If two “branches” of a molecule are considered symmetric based on a topological analysis, the only geometric property that remains to be tested is whether there exist chirality centers in the branches. Therefore, rotational symmetries in a molecule are mainly determined based on a graph representation of the molecule.

Let a graph G = (V, E) with a set of nodes V and a set of undirected edges E ⊆ V × V be a graph representation of the given molecule, i.e. each node v ∈ V represents one atom by storing the atom’s unique index and its atomic number, and each edge e = (u, v) = (v, u) ∈ E represents a bond between atom u and atom v. Then a node v with degree g = deg(v) ≥ 3 is a symmetry center if, after removing one edge (u, v) from the graph, the component of the remaining graph that contains v and its remaining neighbors v_1, ..., v_{g−1} can be split into g − 1 non-overlapping isomorphic subgraphs so that no two edges (v, v_i) and (v, v_j), i ≠ j, are part of the same subgraph. Such a graph partitioning is shown schematically in figure 4.4. Then the edge (u, v) is rotationally symmetric if it is a single bond. Since we are interested in biomolecules, it is sufficient to consider nodes with a degree of 3 or 4, i.e. 2 symmetric branches resulting in a 180° rotational symmetry or 3 symmetric branches resulting in a 120° rotational symmetry.

[Diagram: edge (u, v) and node v with its remaining neighbors v_1, v_2, ..., v_{g−1}]

Figure 4.4.: An edge is rotationally symmetric if the connected component bordering on one node v of the edge can be split into deg(v) − 1 isomorphic subgraphs.

All symmetry centers (and consequently all rotationally symmetric single bonds) for 180° rotational symmetry can be found efficiently by the following recursive algorithm. It is assumed implicitly that no atom has more than four binding partners. Full pseudocode can be found in appendix A.

Recursive algorithm for identifying symmetry centers

Start from an empty list of symmetric dihedrals.

For every node v ∈ V with deg(v) = 3 that is adjacent to a single bond described by a “heavy” dihedral (cf. 2.5), repeat the following:

• For each neighbor u of v do:

1. Mark u and v as visited, all other nodes as unvisited.

2. Let l, r be the other two neighbors of v. Call function compareSubgraphs(v, l, v, r) to determine whether the branches starting with the directed edges v → l and v → r are isomorphic.

3. If the result of step 2 is True, the bond described by the edge (u, v) is rotationally symmetric. Identify the dihedral that describes the bond (u, v) and add it to the list of symmetric dihedrals if each of the isomorphic branches contains more than one atom.

compareSubgraphs(from1, to1, from2, to2)
Test the following cases in the order given:

Case 1: to1 = to2, i.e. a ring closes.


Figure 4.5.: Recursion ends if two branches meet.

a) to1 has ≤ 3 neighbors (at most one unvisited neighbor):

Return True.

b) to1 has four neighbors (two unvisited neighbors, l and r):


Figure 4.6.: Two branches meet at a new branching point.

Mark to1 = to2 as visited.
If compareSubgraphs(to1, l, to1, r), then return True.
Else, unmark to1 and return False.

Case 2: Nodes to1 and to2 are of different types of atoms or have a different number of neighbors.
⊲ Atoms are incompatible ⇒ backtrack.
Return False.

Case 3: to1 has exactly one neighbor (the one we came from).


Figure 4.7.: Recursion ends at matching terminal atoms.

Return True.

Case 4: to1 and to2 have a different number of unvisited neighbors.
⊲ One branch is growing into the other ⇒ backtrack.
Return False.

Case 5: to1 has no unvisited neighbors.

Return True.


Figure 4.8.: Recursion ends with a ring closing on each branch.

Else: Recurse into branches.
Set result ← False.
Mark to1 and to2 as visited.

a) to1 has 1 unvisited neighbor:


Figure 4.9.: Only one path to pursue on each branch.

Set result ← compareSubgraphs(to1, a, to2, A).

b) to1 has 2 unvisited neighbors:


Figure 4.10.: Branching point with 2 branches on each side.

⊲ The pairs of isomorphic subbranches are either (a, A) and (b, B) or (a, B) and (b, A).
Call compareSubgraphs recursively to identify isomorphic pairs of subbranches.
Set result accordingly.

c) to1 has 3 unvisited neighbors:

⊲ Try to find a permutation of (A, B, C) that is isomorphic to (a, b, c).
Call compareSubgraphs recursively to identify isomorphic pairs of subbranches.
Set result accordingly.


Figure 4.11.: Branching point with 3 branches on each side.

If result = True, check for chirality.


Figure 4.12.: Two branches are not symmetric if they contain chiral atoms.

If the atom to1 is a chirality center, set result ← False.

⊲ See “A note on chirality” on page 57.

If result = False, unmark to1 and to2.
Return result.

The algorithm is very similar for 120° rotational symmetry. This variation is omitted here.
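The 180° variant of the algorithm can be turned into a short runnable sketch. It is a simplification under stated assumptions, not the implementation from appendix A: the molecule is given as a dictionary of element symbols plus an adjacency dictionary (hydrogens included), every bond is treated as a rotatable single bond, the “heavy” dihedral test is replaced by the hypothetical stand-in that u must not be a terminal atom, the chirality check is omitted, and a failed attempt restores the visited set from a snapshot so that an alternative pairing of subbranches starts from a clean state.

```python
from itertools import permutations

def compare_subgraphs(elem, adj, from1, to1, from2, to2, visited):
    """True if the branches entered via from1->to1 and from2->to2 are
    isomorphic. from1/from2 mirror the pseudocode; they are already in
    `visited`. On failure `visited` is restored to its state on entry."""
    # Case 1: the two branches meet in the same atom (a ring closes).
    if to1 == to2:
        rest = sorted(n for n in adj[to1] if n not in visited)
        if len(adj[to1]) <= 3 or len(rest) < 2:
            return True
        visited.add(to1)
        if compare_subgraphs(elem, adj, to1, rest[0], to1, rest[1], visited):
            return True
        visited.discard(to1)
        return False
    # Case 2: different elements or different numbers of neighbours.
    if elem[to1] != elem[to2] or len(adj[to1]) != len(adj[to2]):
        return False
    # Case 3: matching terminal atoms.
    if len(adj[to1]) == 1:
        return True
    # Case 4: one branch is growing into the other.
    open1 = [n for n in adj[to1] if n not in visited]
    open2 = [n for n in adj[to2] if n not in visited]
    if len(open1) != len(open2):
        return False
    # Case 5: a ring closes on each branch.
    if not open1:
        return True
    # Else: recurse into the subbranches, trying every pairing.
    saved = set(visited)
    visited.update((to1, to2))
    open1 = sorted(n for n in adj[to1] if n not in visited)
    open2 = sorted(n for n in adj[to2] if n not in visited)
    result = False
    if len(open1) == len(open2):
        for perm in permutations(open2):
            trial = set(visited)
            if all(compare_subgraphs(elem, adj, to1, a, to2, b, visited)
                   for a, b in zip(open1, perm)):
                result = True
                break
            visited.clear()
            visited.update(trial)
    # (The chirality test of the full algorithm would go here.)
    if not result:
        visited.clear()
        visited.update(saved)
    return result

def symmetric_bonds(elem, adj):
    """Bonds (u, v) with 180° rotational symmetry about the centre v."""
    found = []
    for v in sorted(adj):
        if len(adj[v]) != 3:
            continue
        for u in sorted(adj[v]):
            if len(adj[u]) == 1:
                continue  # stand-in for the heavy-dihedral test
            l, r = sorted(n for n in adj[v] if n != u)
            visited = {u, v}
            # Both branches must contain more than one atom.
            if len(adj[l]) > 1 and len(adj[r]) > 1 and \
                    compare_subgraphs(elem, adj, v, l, v, r, visited):
                found.append((u, v))
    return found
```

For toluene (ring carbons 0 to 5, methyl carbon 6, hydrogens 7 to 14) the sketch reports only the bond between the methyl carbon and the ring, and replacing one ortho hydrogen by fluorine removes that symmetry.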

A note on chirality

Chirality (or “handedness”) is a form of stereoisomerism which in organic chemistry occurs with certain carbon atoms [48]. When a carbon atom is connected to 4 different functional groups, these can be arranged in two different ways that represent nonsuperimposable mirror images of each other. Two molecules that differ from each other only in the configuration at one chiral center can differ in their physical and chemical properties.

When looking for 120° symmetry, three branches that contain chirality centers can only be accepted as symmetric if the chirality is the same on all branches. Otherwise the resulting structure is not rotationally symmetric.

When trying to identify 180° rotational symmetry, however, two branches of a molecule that are identified as symmetric based on topology must not contain any chiral carbon atoms (see figure 4.12 for an illustration). Therefore, whenever a node with three unvisited neighbors is discovered in both branches, the branches are considered not symmetric if the functional groups to which the four neighbors belong are all different. This property is tested in case 5c of the algorithm by applying a variant of compareSubgraphs to pairs of the branches that start with from1, a, b, and c, respectively, until an isomorphic pair is found or all pairs have been found to be non-isomorphic.


5. Numerical Experiments

The three sampling methods presented in chapter 3, ZIBgridfree, Replica Exchange, and ConfJump, were compared by estimating the thermodynamically correct distribution at 300 K for the three model systems presented in section 5.2.

5.1. Performance measure for sampling runs

The most important quantities for comparing different sampling methods are the time requirement, the mean accuracy, and possibly the memory requirement of each strategy. It was decided to deal with the latter only theoretically and to incorporate only the former two quantities into a measure of performance. The reciprocal of the product of a time measure and a measure of sampling error is well suited as a performance measure, as both time and sampling error will be close to zero in the optimal case and large for “bad” sampling runs. It is impossible to normalize the two measures against each other, as no general statement is possible about the ratio between one unit of time and one unit of error (in fact, this ratio is being examined in this study). This leads to the problem that it is impossible to tell which case is worse, that of a high sampling error after a short sampling time or that of a long sampling yielding a low sampling error, where the product of the two is the same in both cases. However, it is not the goal of this thesis to compare arbitrary sampling runs, and samplings are always run either until all convergence criteria signal convergence or until a fixed, relatively high number of time steps is reached. Therefore, the sampling error can be expected to be approximately of the same order of magnitude for all samplings, while the actual number of iterations needed by each sampling run can differ strongly. At any rate, the constellation of a high sampling error at a low sampling time is actively prevented. Thus, the performance measure proposed here can be thought of as a measure of time until convergence augmented by a penalty factor for sampling error.

As all sampling techniques under consideration are based on the hybrid Monte Carlo approach (cf. section 2.4), it is sufficient to measure the time of each sampling run in terms of total HMC steps. Corrections are necessary only for the preprocessing step of ConfJump, in which representatives from all low-energy regions are generated (see 3.4). No correction was used for the presampling phase of ZIBgridfree, as its number of steps is low compared to the total number of time steps in 100 subsamplings (cf. 5.3).

As the identification of all low-energy regions in conformational space is not based on HMC, a correction factor ν is introduced, which is the average number of HMC steps per second performed by a simulation on the same computer on which the preprocessing was performed. Thus, the time needed for detecting all low-energy regions in the preprocessing can be expressed in HMC steps by multiplication by ν.

The sampling error is measured as the distance between a sampling result and the result of a reference run by the metric developed in section 4.2. Reference runs were generated for each molecule by performing several very long runs (1.2 · 10^6 steps overall) of ConfJump and Replica Exchange, respectively, and choosing the one which had the least distance to all others after removing obvious outliers. The sampling runs used for creating the reference were discarded afterwards. Visual inspection of the distributions in all heavy dihedrals also shows almost no flaws in the reference runs for every model system used (see chapter 6). Due to the uncertainty inherent in the generation of reference runs, which is explained in more detail in section 4.2, the sampling error must, of course, be considered zero if it lies below some fixed threshold. The unweighted form of the sampling error (calculated by equation 4.10) is used for evaluating the quality of all simulations. It has been found empirically that when comparing pairs of random histograms, the average bin-wise difference is 15.0%. When evaluating sampling error alone, samplings that have an average bin-wise difference from the reference of less than 1% are considered equal. Values above 1% are considered very low up to 3%, low between 3% and 6%, medium between 6% and 11%, and high if they lie above 11%. The same scale applies to the symmetry error.

The performance of a sampling run S is measured against a reference run S_ref in all practical experiments as

$$G(S) = \frac{1}{n \cdot \delta(S, S_{\mathrm{ref}})}, \tag{5.1}$$

where n is the total number of HMC steps performed during sampling, corrected as described above.
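The performance measure and the verbal error scale can be written down directly. This is a sketch with hypothetical function names: the correction assumes, as in table 5.2, that ν is given in HMC steps per second, so that the preprocessing time is converted by multiplication, and the category boundaries treat each upper limit as inclusive, which the text does not specify.

```python
def corrected_steps(hmc_steps, preprocessing_seconds=0.0, steps_per_second=0.0):
    """Total cost n in HMC steps; for ConfJump the ConFlow preprocessing
    time is converted to HMC steps via nu (steps per second)."""
    return hmc_steps + preprocessing_seconds * steps_per_second

def performance(sampling_error, n_steps):
    """Equation 5.1: G(S) = 1 / (n * delta(S, S_ref))."""
    return 1.0 / (n_steps * sampling_error)

def error_category(e):
    """Verbal scale used for sampling errors and symmetry errors."""
    if e < 0.01:
        return "equal"  # indistinguishable from the reference
    if e <= 0.03:
        return "very low"
    if e <= 0.06:
        return "low"
    if e <= 0.11:
        return "medium"
    return "high"
```

For example, performance(0.0897, 11.47e6) ≈ 9.72 · 10^−7, matching run 1 of table 6.1, and corrected_steps(0, 470, 83.1) ≈ 39057, matching c ≈ 39000 for BZS in table 5.2.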

5.2. Molecules used for this study

The performance of the sampling methods was measured for three different ligand molecules which were extracted from the Protein Data Bank (PDB) [4]. The particular choice of ligand molecules used here was inspired by Boström [6]. Only molecules that contain 180° rotational symmetries were chosen, so as to be able to use the semi-empirical convergence criterion developed in section 4.3. Table 5.1 shows that the three molecules chosen from the PDB differ considerably with respect to their size and complexity. The structural formulas of the molecules are shown in figures 5.1, 5.2, and 5.3. The single bonds that correspond to heavy dihedrals are labeled with numbers.

L-Benzylsuccinate (found in the PDB under its “HET ID” BZS) is an inhibitor of carboxypeptidase A [40]. It consists of 25 atoms of which 10 are hydrogen. The molecule is shown in figure 5.1 and has one rotationally symmetric single bond, which connects the aromatic ring to the rest of the molecule. Its conformations are described in terms of 5 “heavy” dihedral angles. BZS is the smallest and least complex system considered in this work.

HET ID   atoms   H atoms   d   d_sym
BZS        25       10     5     1
TOP        39       18     5     2
BSI        46       18     7     2

Table 5.1.: The molecules used for this study. d is the number of heavy dihedrals, and d_sym is the number of rotationally symmetric dihedrals (180° rotational symmetry).

[Structural formula of BZS with heavy dihedrals labeled 1–5]

Figure 5.1.: L-Benzylsuccinate (BZS). Numbers 1–5 indicate heavy dihedrals.

[Structural formula of TOP with heavy dihedrals labeled 1–5]

Figure 5.2.: Trimethoprim (TOP), an antibiotic.

Trimethoprim (HET ID: TOP) is an antibiotic that works by inhibiting bacterial dihydrofolate reductases [11]. The molecule consists of 39 atoms of which 18 are hydrogen and is more complex than L-benzylsuccinate. Of the 5 heavy dihedrals in the trimethoprim molecule, two are rotationally symmetric, namely the two bonds that lie on the symmetry axis of the aromatic ring shown on the left-hand side in figure 5.2, one facing the greater part of the molecule, the other facing the central (-OCH3) group.

BSI (2-(Biphenyl-4-sulfonyl)-1,2,3,4-tetrahydro-isoquinoline-3-carboxylate) is an inhibitor of the enzyme neutrophil collagenase, which is also called matrix metalloproteinase 8 (MMP-8) [41]. At 46 atoms of which 18 are hydrogen, BSI is not only the largest but also the most complex molecule under consideration in this work. As clearly visible in figure 5.3, the molecule contains a non-aromatic ring (top left), which is why a behavior similar to that of cyclohexane (see figure 3.1) can be expected from BSI. It is expected that this ring can assume two very different configurations that are separated by extremely high energy barriers, which makes the sampling very difficult. Of the 8 heavy dihedrals spanning the molecule’s conformational space, 2 describe rotationally symmetric single bonds, namely the bond connecting the two aromatic rings on the right-hand side and the bond that connects one of these rings to the sulfur atom.

[Structural formula of BSI with heavy dihedrals labeled 1–8]

Figure 5.3.: 2-(Biphenyl-4-sulfonyl)-1,2,3,4-tetrahydro-isoquinoline-3-carboxylate (BSI).

5.3. Simulation details and choice of parameters

5 experiments were conducted for each molecule and each sampling strategy under consideration, which yields a total of 45 simulation runs. The 5 simulations for one molecule using the same technique were run with the same set of parameters except for the initial state of the random number generator, which was chosen differently for each simulation run. All computer simulations were performed using the ZIBgridfree framework [47]. All methods developed for this thesis, most importantly the symmetry criterion (cf. 4.3) and the algorithm for automatic detection of molecule symmetries (cf. 4.3.2), have been implemented within this framework, which already contained all three simulation techniques compared in this thesis. The ZIBgridfree program is written in C++ and uses libraries from amira [63] and amiraMol [56].

All experiments were run at a temperature of 300 K, which is close to typical physiological temperatures. Every individual HMC sampling run (including the presampling phase of ZIBgridfree) was started with a disperse phase at a temperature of 2000 K for 300 HMC steps and a burn-in phase at 300 K for 10 steps in order to ensure that the Markov chains start in different regions of Ω. All disperse and burn-in sampling steps are discarded (cf. section 4.1). Each HMC proposal was generated by 60 integration steps of molecular dynamics. The length of an MD step was chosen as 1.3 fs. All three methods were used with the parameters set to values that were found to be suitable in earlier experiments, see e.g. [45, 71].

ZIBgridfree simulations were restricted to 100 nodes, resulting in 100 partial distributions to be sampled. The maximum number of HMC steps for the sampling of each partial density was set to 20000 per chain. 60 MD steps were performed for trial generation for HMC in the “horizontal” sampling (see section 3.2), while 30 steps of MD were used for generating the configurations q′_j in the “vertical” sampling. In presampling the maximum number of HMC steps allowed was 18000. 5 Markov chains were launched per simulation both in presampling and in sampling, resulting in a total upper bound of approximately 11 · 10^6 HMC steps depending on the actual number of nodes used. The presampling was performed at a temperature of 2500 K, and convergence was detected by a Gelman-Rubin statistic (see section 4.1) using a threshold of 1.05. The convergence of the sampling was monitored by ... Convergence checks were performed every 500 HMC steps. In order to accelerate calculations a cutoff value of 10^−6 was used, below which the value of a basis function φ_i was set to zero.

Replica Exchange simulations for all three molecules were performed with 10 chains at temperatures of 300 K, 387.46 K, 500.43 K, 646.33 K, 834.77 K, 1078.14 K, 1392.48 K, 1798.45 K, 2322.79 K, and 3000 K, which were determined by equation 3.31. The maximum number of steps allowed per chain was set to 100000, which amounts to a total upper bound of 10^6 HMC steps. Convergence was monitored by a combination of the Gelman-Rubin statistic with a threshold of 1.01 and the symmetry criterion (cf. section 4.3) using a threshold of 0.04. Convergence tests were performed at intervals of 500 HMC steps.
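The ten temperatures listed above are consistent with a geometric ladder between 300 K and 3000 K, T_k = T_min (T_max/T_min)^(k/(K−1)) for k = 0, ..., K−1. Whether equation 3.31 has exactly this form is an assumption here, but the sketch below (function name hypothetical) reproduces every listed value to two decimals.

```python
import numpy as np

def geometric_temperatures(t_min, t_max, n_chains):
    """Temperatures with a constant ratio between neighbouring replicas."""
    k = np.arange(n_chains)
    return t_min * (t_max / t_min) ** (k / (n_chains - 1))

temps = geometric_temperatures(300.0, 3000.0, 10)
print(np.round(temps, 2))
```

With a constant ratio of 10^(1/9) ≈ 1.29 between neighbouring temperatures, all adjacent replica pairs are spaced equally on a logarithmic scale.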

For the ConfJump simulations the same convergence criteria were used as for Replica Exchange (although the interpretation of the value of the Gelman-Rubin statistic changes slightly for Replica Exchange, see section 4.1). The same total upper bound of 10^6 HMC steps was chosen (mainly due to memory limitations), which corresponds to 200000 maximally allowed steps in each of 5 Markov chains. The same sets of precomputed representatives of the low-energy regions of the potential energy as in [71] were used for all three molecules. As the simulations performed in [71] indicate that ConfJump simulations converge fast, convergence was checked every 200 steps. The probability of jump steps was set to P_jump = 0.2 for all simulations (see 3.4).

Correction factors ν (cf. 5.1), which are used to express the time for generating representatives of all local minima of the molecule in HMC steps, were calculated for all three molecules from ConfJump simulations of 1000000 steps in length (similar to the actual simulation runs in this setup), as shown in table 5.2.

HET ID   t_ConFlow   ν      c
BZS        470 s     83.1   39000
TOP       1700 s     47.0   80000
BSI       2700 s     34.5   93000

Table 5.2.: Time of generating representatives of low-energy regions t_ConFlow (in s), number of HMC steps per second ν (in a ConfJump run), and c, the product of the two, for the three model systems (approximate values).
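The last column of table 5.2 is the preprocessing time expressed in HMC steps. For BZS, for example,

$$c = t_{\mathrm{ConFlow}} \cdot \nu = 470\,\mathrm{s} \cdot 83.1\,\mathrm{s}^{-1} \approx 3.9 \cdot 10^{4}\ \text{HMC steps},$$

and likewise 1700 · 47.0 ≈ 8.0 · 10^4 for TOP and 2700 · 34.5 ≈ 9.3 · 10^4 for BSI. This value is added to the number of HMC steps of a ConfJump run before equation 5.1 is evaluated.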


6. Results

In the following tables ‘error’ is the sampling error, ‘steps’ is the total number of HMC steps performed, the column ‘corrected’ contains the corrected number of steps in the case of ConfJump, ‘performance’ is the performance as calculated by equation 5.1, while ‘Esym’ and ‘G-R’ contain the final values of the symmetry and Gelman-Rubin criterion, respectively. µ denotes the mean over the values for the 5 respective sampling results, while σ is the estimated standard deviation.

6.0.1. L-Benzylsuccinate

Figure 6.1 shows the 1-dimensional projections of the Boltzmann distribution of L-benzylsuccinate at 300 K sampled by the reference run, a ConfJump simulation with a length of 120000 steps per chain, performed with overly strict convergence criteria but with the other parameters set to the values given in section 5.3. The three dihedral angles corresponding to rotationally symmetric single bonds, namely the bond next to the aromatic ring (top left panel in figure 6.1) and the two bonds that are adjacent to the carboxyl groups (bottom panels), show nearly perfect symmetry on visual inspection. The symmetry error for the monitored dihedral (1) is very low at 2.06%. The distribution of dihedral 2 (see figure 5.1; top center in figure 6.1) shows two peaks with different weights. Remarkably, dihedral 3 (top right panel) shows only a single peak, which is probably due to the strong repulsion between the two negatively charged carboxyl groups.

The ZIBgridfree strategy achieves a very low to medium sampling error except for one outlier, which differs strongly from the reference at 20.5% average bin-wise difference (see table 6.1). Excluding this outlier (line 4 in table 6.1), the average is low at 5.83% with a standard deviation of 3.02%. Therefore, it can be concluded that the ZIBgridfree method can reproduce the Boltzmann distribution at T = 300 K with a fairly low sampling error when using a meshless discretization into s = 100 partial densities. The symmetry error is very low to medium, again except for simulation run 4, with values between 2.9% and 8.6%. It must be noted that the results produced by ZIBgridfree have a high standard deviation, which is almost as high as the mean with respect to sampling error, symmetry error and overall performance relative to the reference run.

The method needs many time steps, and frequently the sampling of a partial density function does not converge (according to the criterion that was used, see 5.3). Therefore, the average simulation time (measured in HMC steps) is high at about 11.5 · 10^6 with a standard deviation of 322000. Consequently, the performance is low at values between 4 · 10^−7 and 4 · 10^−6.


Figure 6.1.: The sampled distributions of the five heavy dihedrals of L-benzylsuccinate. Numbers below the diagrams refer to the dihedrals marked in figure 5.1.

run   error    steps          performance      Esym
1     0.0897   11.47 · 10^6   9.721 · 10^−7    0.0857
2     0.0458   11.75 · 10^6   1.857 · 10^−6    0.0364
3     0.0754   11.72 · 10^6   1.132 · 10^−6    0.0288
4     0.2054   10.96 · 10^6   4.444 · 10^−7    0.1592
5     0.0223   11.60 · 10^6   3.873 · 10^−6    0.0428
µ     0.0877   11.50 · 10^6   1.656 · 10^−6    0.0706
σ     0.0708   322000         1.339 · 10^−6    0.0542

Table 6.1.: Results for BZS using ZIBgridfree.
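The performance values in tables 6.1 to 6.8 are consistent with performance = 1 / (sampling error × number of HMC steps). For ConfJump, the corrected step count is used. The formal definition is given in section 5, so the formula below is inferred from the tabulated numbers rather than quoted. The sketch recomputes the µ and σ rows of table 6.1 under that assumption. It uses the sample standard deviation (denominator n − 1), which reproduces the tabulated σ values up to rounding of the inputs.

```python
from statistics import mean, stdev

# Rows of table 6.1 (BZS, ZIBgridfree): (sampling error, HMC steps)
RUNS = [(0.0897, 11.47e6), (0.0458, 11.75e6), (0.0754, 11.72e6),
        (0.2054, 10.96e6), (0.0223, 11.60e6)]


def performance(error: float, steps: float) -> float:
    """Assumed definition: reciprocal of (sampling error x simulation time)."""
    return 1.0 / (error * steps)


errors = [e for e, _ in RUNS]
perfs = [performance(e, s) for e, s in RUNS]

print(f"error: mean {mean(errors):.4f}, sd {stdev(errors):.4f}")
print(f"performance: mean {mean(perfs):.3e}, sd {stdev(perfs):.3e}")

# Dropping the outlier (run 4, index 3) gives the figures quoted in the text.
kept = [e for i, e in enumerate(errors) if i != 3]
print(f"without run 4: mean {mean(kept):.4f}, sd {stdev(kept):.4f}")
```

The same two helper calls apply unchanged to the other tables. Only the step column differs, and for ConfJump it is the corrected one.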

Replica Exchange reproduces the sampling result of the reference run very well (see table 6.2). The mean sampling error is low at 3.6% average bin-wise difference to the reference. In fact, all runs except the fourth in table 6.2 yielded values below this average; that run has evidently not converged (see the right-most column, ‘G-R’). Excluding the outlier gives a very low mean error of 2.79% and a standard deviation of 0.42%. The mean symmetry error is low at 5.8%, with a standard deviation of 2.35%.

Replica Exchange converges fairly well according to Gelman and Rubin’s convergence monitor within the maximally allowed number of HMC steps; this upper bound is reached in 3 of the 5 sampling runs. The average simulation time is 946000 HMC steps, with a standard deviation of 358000. The average performance of the RE method on BZS is 4.238 · 10⁻⁵ (5.002 · 10⁻⁵ after removing the outlier). The corresponding standard deviations are 2.880 · 10⁻⁵ and 2.677 · 10⁻⁵. This is about 26 times the performance of ZIBgridfree.
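Gelman and Rubin’s monitor [26] compares the variance between parallel chains with the variance within them. Values close to 1 indicate that the chains have forgotten their different starting points. The sketch computes the basic potential scale reduction factor for one scalar observable, without the degrees-of-freedom correction. The exact variant used in this thesis (section 5.3) may differ. For circular observables such as dihedral angles, the values would first have to be unwrapped or otherwise transformed, which this sketch does not do.

```python
import math
from statistics import mean, variance
from typing import Sequence


def psrf(chains: Sequence[Sequence[float]]) -> float:
    """Basic Gelman-Rubin potential scale reduction factor (R-hat).

    chains: at least two chains of equal length n >= 2 for one scalar observable."""
    if len(chains) < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length >= 2")
    # B/n: sample variance of the chain means
    b_over_n = variance([mean(c) for c in chains])
    # W: average within-chain sample variance
    w = mean([variance(c) for c in chains])
    # Pooled estimate of the target variance
    v_hat = (n - 1) / n * w + b_over_n
    return math.sqrt(v_hat / w)


mixed = [[0.1, 0.3, 0.2, 0.4], [0.3, 0.1, 0.4, 0.2]]
stuck = [[0.1, 0.2, 0.1, 0.2], [3.1, 3.2, 3.1, 3.2]]
print(psrf(mixed))  # below 1.01: the chains are indistinguishable
print(psrf(stuck))  # far above 1.01: chains trapped in different basins
```

The second example is the situation of run 4 in table 6.2. The chains sample different regions, so the between-chain term dominates and R̂ stays well above the threshold.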

run   error     HMC steps     performance      Esym     G-R
1     0.0293    1.2 · 10⁶     2.848 · 10⁻⁵     0.0642   1.02
2     0.0319    1.2 · 10⁶     2.611 · 10⁻⁵     0.0948   1.05
3     0.0284    445000        7.912 · 10⁻⁵     0.0363   1.008
4     0.0704    1.2 · 10⁶     1.183 · 10⁻⁵     0.0530   1.36
5     0.0220    685000        6.636 · 10⁻⁵     0.0398   1.008
µ     0.03640   946000        4.238 · 10⁻⁵     0.0576   1.09
σ     0.01937   358000        2.880 · 10⁻⁵     0.0235   0.152

Table 6.2.: Results for BZS using Replica Exchange.

In the simulations of L-benzylsuccinate, ConfJump yielded the best results on average, with a very low mean sampling error of 2.17% (see table 6.3). Highly accurate results are obtained with high reliability, as the standard deviation of the sampling error is only 0.97%. The ConfJump sampling converged in every case within the limit set for the number of HMC steps. That is, the symmetry error dropped below 0.04 and the Gelman-Rubin indicator fell below a threshold of 1.01. In fact, the sampling converged after fewer than 100000 steps (20000 steps per chain) in 4 of 5 runs.

The number of sampling steps was corrected by adding the correction value c = 39000 from table 5.2 for BZS. The resulting approximate total simulation time was 167800 steps on average (128800 before correction). The standard deviation is very high at 192200 because of the outlier in row 5 of table 6.3. The performance calculated from these corrected values was 4.639 · 10⁻⁴ on average, with a standard deviation of 1.369 · 10⁻⁴. Thus, for L-benzylsuccinate the mean performance of ConfJump was about 11 times that of Replica Exchange and about 280 times that of ZIBgridfree.
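The correction and the comparison of mean performances can be retraced from the entries of table 6.3 and the µ rows of tables 6.1 and 6.2. The sketch assumes, as the tabulated values suggest, that performance is 1 / (sampling error × corrected number of HMC steps). It also assumes that the correction c = 39000 is added to every run.

```python
from statistics import mean

C_BZS = 39_000  # correction value for BZS from table 5.2, added to every run

# Table 6.3 (BZS, ConfJump): (sampling error, uncorrected HMC steps)
CONFJUMP = [(0.0351, 16_000), (0.0237, 34_000), (0.0207, 80_000),
            (0.0213, 44_000), (0.0078, 470_000)]

corrected = [steps + C_BZS for _, steps in CONFJUMP]
perf = [1.0 / (err * t) for (err, _), t in zip(CONFJUMP, corrected)]

print(mean(corrected))       # mean corrected simulation time: 167800
print(f"{mean(perf):.3e}")   # mean performance: 4.639e-04

# Ratios of mean performances; denominators are the mu rows of tables 6.2 and 6.1
print(mean(perf) / 4.238e-5)  # ConfJump vs. Replica Exchange
print(mean(perf) / 1.656e-6)  # ConfJump vs. ZIBgridfree
```

The ratio of the means is used here. Averaging per-run ratios would weight the fast runs differently and give other numbers.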

run   error    HMC steps   corrected   performance      Esym      G-R
1     0.0351   16000       55000       5.181 · 10⁻⁴     0.0393    1.007
2     0.0237   34000       73000       5.791 · 10⁻⁴     0.0393    1.009
3     0.0207   80000       119000      4.059 · 10⁻⁴     0.0389    1.007
4     0.0213   44000       83000       5.651 · 10⁻⁴     0.0397    1.005
5     0.0078   470000      509000      2.513 · 10⁻⁴     0.0400    1.0007
µ     0.0217   128800      167800      4.639 · 10⁻⁴     0.03945   1.006
σ     0.0097   192200      192200      1.369 · 10⁻⁴     0.0004    0.003

Table 6.3.: Results for BZS using ConfJump.

6.0.2. Trimethoprim

The sampled distributions over the five heavy dihedrals of Trimethoprim are shown in figure 6.2. Again, the reference run was created by the ConfJump strategy running for 120000 steps per chain. The reference run reflects the rotational symmetry of dihedral 1 somewhat imperfectly (see top left panel). The deviation is still within acceptable limits, at a symmetry error of approximately 4.65%. Using this particular run as the reference is justified because, of all Replica Exchange and ConfJump runs, it has the lowest mean distance to all others.

Dihedral 2 is situated on the opposite side of the ring on the left-hand side in figure 5.2. The reference run reproduces its rotational symmetry considerably better, as a visual examination of the top center panel reveals. The distributions of the dihedrals of the two methoxy groups at the sides of the symmetric ring (bottom panels) are nearly identical, except for a shift by π. This is expected: the ring is symmetric and the functional groups are equal, so they act chemically and physically in the same way.
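The rule used to pick the reference run, the run with the lowest mean distance to all other runs, amounts to choosing a medoid. The sketch makes several assumptions. Each run is summarized by one normalized histogram, with identical binning for all runs; for several dihedrals one would average the distances. The distance is the mean absolute bin-wise difference, which is not necessarily the exact sampling-error definition of chapter 4. The helper names and the toy histograms are hypothetical.

```python
from typing import List, Sequence

Histogram = Sequence[float]  # normalized bin weights, same binning for every run


def binwise_distance(p: Histogram, q: Histogram) -> float:
    """Mean absolute bin-wise difference between two histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def pick_reference(runs: List[Histogram]) -> int:
    """Index of the run with the lowest mean distance to all other runs."""
    def mean_distance(i: int) -> float:
        dists = [binwise_distance(runs[i], r) for j, r in enumerate(runs) if j != i]
        return sum(dists) / len(dists)
    return min(range(len(runs)), key=mean_distance)


# Toy histograms over three bins (not data from this thesis)
runs = [[0.6, 0.4, 0.0], [0.5, 0.4, 0.1], [0.4, 0.4, 0.2], [0.5, 0.5, 0.0]]
print(pick_reference(runs))  # 1
```

Choosing the medoid rather than, for instance, the longest run keeps the reference close to the consensus of all methods. As discussed for BSI below, it cannot protect against all runs sharing the same bias.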

Figure 6.2.: The sampled distributions of the five heavy dihedrals of Trimethoprim. Numbers under the panels refer to the numbering of heavy dihedrals in figure 5.2.

ZIBgridfree produces consistently very low sampling errors with respect to the reference for Trimethoprim, at an average of 2.66% with a standard deviation of 0.64% (see table 6.4). Unfortunately, the symmetry error was not measured in these simulations. However, on visual inspection the dihedral distributions of all 5 simulation runs look very similar to those in figure 6.2 (not shown). ZIBgridfree’s overall performance on Trimethoprim is considerably better than on L-benzylsuccinate. The sampling needs on average 9.5 · 10⁶ steps, with a standard deviation of only 165000. This results in a mean performance of 4.2 · 10⁻⁶, with a standard deviation of 1.25 · 10⁻⁶.

run   error     HMC steps      performance
1     0.0290    9.715 · 10⁶    3.548 · 10⁻⁶
2     0.0345    9.510 · 10⁶    3.045 · 10⁻⁶
3     0.0244    9.445 · 10⁶    4.332 · 10⁻⁶
4     0.0276    9.545 · 10⁶    3.802 · 10⁻⁶
5     0.0172    9.260 · 10⁶    6.280 · 10⁻⁶
µ     0.0266    9.495 · 10⁶    4.201 · 10⁻⁶
σ     0.00638   165000         1.2515 · 10⁻⁶

Table 6.4.: Results for Trimethoprim using ZIBgridfree.

The Replica Exchange technique produces sampling results for Trimethoprim with a low average sampling error of 3.88% at a standard deviation of 1.16% (see table 6.5), i.e. the results are reliably good. The symmetry error is, however, in the medium range at 10%, with a standard deviation of 3.31%. This value is the maximum of two symmetry errors, one for each symmetric dihedral. All simulation runs are considered to have converged by the Gelman-Rubin criterion after the maximally allowed number of 10⁶ HMC steps. As all 5 simulations have been run for the same time, the average performance depends solely on the sampling error, which is very low in 4 of 5 cases. The mean performance is thus 27.5 · 10⁻⁶, with a standard deviation of 7.48 · 10⁻⁶. This is 6.5 times as high as that of ZIBgridfree.
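Every run has the same length T = 10⁶ HMC steps. Assume, as the tabulated values suggest, that performance is 1 / (sampling error × steps). Then the mean performance is 1/T times the mean reciprocal error, which is not the same as the reciprocal of the mean error. With the errors e₁, …, e₅ of table 6.5:

```latex
\bar{P} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{e_i\,T}
        = \frac{1}{T}\cdot\frac{1}{n}\sum_{i=1}^{n}\frac{1}{e_i}
        = \frac{1}{10^{6}}\cdot\frac{1}{5}\left(\frac{1}{0.0274}+\frac{1}{0.0347}+\frac{1}{0.0558}+\frac{1}{0.0450}+\frac{1}{0.0312}\right)
        \approx 2.75\cdot 10^{-5},
\qquad\text{whereas}\qquad
\frac{1}{T\,\bar{e}} = \frac{1}{10^{6}\cdot 0.0388} \approx 2.58\cdot 10^{-5}.
```

By Jensen’s inequality, the mean of 1/eᵢ is always at least 1/ē. A few very accurate runs therefore raise the mean performance more than a few poor runs lower it.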

run   error     HMC steps   performance      Esym      G-R
1     0.0274    10⁶         3.654 · 10⁻⁵     0.1085    1.002
2     0.0347    10⁶         2.882 · 10⁻⁵     0.0749    1.005
3     0.0558    10⁶         1.791 · 10⁻⁵     0.1109    1.007
4     0.0450    10⁶         2.222 · 10⁻⁵     0.1456    1.004
5     0.0312    10⁶         3.206 · 10⁻⁵     0.0615    1.003
µ     0.0388    10⁶         2.751 · 10⁻⁵     0.1003    1.004
σ     0.01155   0           7.484 · 10⁻⁶     0.03309   0.0017

Table 6.5.: Results for Trimethoprim using Replica Exchange.

The ConfJump strategy was more successful in reproducing the reference result than Replica Exchange, but less so than ZIBgridfree (see table 6.6). The mean sampling error is low at 3.46%, with a standard deviation of 1.65%. As with the RE simulations, the symmetry errors differ strongly between sampling runs. This suggests that rotation around the single bond corresponding to dihedral 1 is strongly hindered, possibly due to electrostatic attraction between partial charges of opposite sign in the two rings. The mean symmetry error is 9.7%, with a standard deviation of 3.89%.

All simulations have been run for 10⁶ steps, 120000 in each chain, so the performance depends only on the sampling error. A correction value of c = 80000 was added to the simulation time to account for identifying representatives of all low-energy regions in conformational space (see table 5.2). The mean performance was 3.332 · 10⁻⁵, with a standard deviation of 1.812 · 10⁻⁵. For Trimethoprim, the mean performance of ConfJump is 1.2 times that of Replica Exchange and 7.9 times that of ZIBgridfree.

run   error         HMC steps   corrected     performance      Esym          G-R
1     0.0152998     10⁶         1.08 · 10⁶    6.052 · 10⁻⁵     0.066784      1.0244
2     0.0219898     10⁶         1.08 · 10⁶    4.211 · 10⁻⁵     0.0724        1.06272
3     0.0332182     10⁶         1.08 · 10⁶    2.787 · 10⁻⁵     0.078386      1.02529
4     0.0521558     10⁶         1.08 · 10⁶    1.775 · 10⁻⁵     0.105084      1.02876
5     0.0505242     10⁶         1.08 · 10⁶    1.833 · 10⁻⁵     0.161262      1.02625
µ     0.03463756    10⁶         1.08 · 10⁶    3.332 · 10⁻⁵     0.0967832     1.033484
σ     0.016546929   0           0             1.812 · 10⁻⁵     0.038920972   0.016424458

Table 6.6.: Results for Trimethoprim using ConfJump.

6.0.3. BSI

Figure 6.3 illustrates the Boltzmann distribution of BSI at 300 K sampled by the reference run, a ConfJump simulation of 10⁶ steps, projected onto 5 of its 8 heavy dihedral angles. The top left and top center panels show the distributions of the two rotationally symmetric dihedrals. In figure 5.3, these correspond to the bond between the two aromatic 6-rings on the right-hand side (left panel) and to the bond between the sulfonyl group and the adjacent planar ring (center panel). Both distributions show only minor flaws on visual inspection. The single bond adjacent to the carboxyl group, which corresponds to dihedral 4, is also rotationally symmetric. The reference run reproduces this symmetry well, as the bottom left panel of figure 6.3 shows.

The distribution of dihedral 3, situated between the S and the N atom, shows two peaks with very different statistical weights (top right panel). A similar behavior can be seen for dihedral 8, which lies inside the non-aromatic ring of the molecule. The three heavy dihedrals that are not shown have only one visible peak each. If BSI behaves like cyclohexane, with a large conformational change induced by a “flip” of the non-aromatic ring, the simulation at least does not reproduce this behavior well. However, it is also conceivable that the large functional groups surrounding that ring force it into one of its two possible conformations most of the time. This speculation is supported by the fact that none of the 15 sampling runs evaluated below assigned a higher statistical weight to the small peak seen in dihedral 8. In fact, most sampling runs failed to reproduce that peak at all.

The Replica Exchange simulations of BSI all produced results with a very low sampling error, except for one outlier with a low sampling error (see table 6.7). The mean sampling error is 3.15%, with a standard deviation of 1.45%. However, the symmetry error is high, at 24% to 28.5%. As none of the simulations converged within the maximally allowed number of time steps, all simulations ran for 10⁶ time steps, and the performance thus depends only on the sampling error. The mean performance is 3.58 · 10⁻⁵, while the estimated standard deviation is 1.15 · 10⁻⁵.

Figure 6.3.: The sampled distributions of five of the eight heavy dihedrals of BSI. The numbers under each diagram correspond to the dihedrals marked in figure 5.3.

run   error     HMC steps   performance       Esym      G-R
1     0.0203    10⁶         4.927 · 10⁻⁵      0.2723    1.04
2     0.0257    10⁶         3.887 · 10⁻⁵      0.285     1.03
3     0.0274    10⁶         3.645 · 10⁻⁵      0.2429    1.02
4     0.0272    10⁶         3.678 · 10⁻⁵      0.2476    1.03
5     0.0570    10⁶         1.755 · 10⁻⁵      0.2410    1.19
µ     0.0315    10⁶         3.578 · 10⁻⁵      0.2578    1.06
σ     0.01452   0           1.1460 · 10⁻⁵     0.01973   0.074

Table 6.7.: Results for BSI using Replica Exchange.

For BSI, the ConfJump approach also produced results with a low sampling error (see table 6.8). The mean was 0.83%, while the standard deviation was 0.33%. These values are extremely low compared to the other methods, especially Replica Exchange. Together with a high symmetry error of very low variance, this gives some reason to doubt the validity of the reference run. As in the case of RE, none of the simulations converged within the maximally allowed number of time steps. All simulations therefore ran for 10⁶ time steps, and the performance depends only on the sampling error. 93000 steps, corresponding to the time for preprocessing, have been added to the sampling time. The mean performance is 1.23 · 10⁻⁴, while the estimated standard deviation is 3.975 · 10⁻⁵.


run   error     HMC steps   corrected      performance      Esym     G-R
1     0.00534   10⁶         1.093 · 10⁶    1.714 · 10⁻⁴     0.2516   1.08
2     0.00634   10⁶         1.093 · 10⁶    1.443 · 10⁻⁴     0.2525   1.10
3     0.00714   10⁶         1.093 · 10⁶    1.282 · 10⁻⁴     0.2521   1.04
4     0.01369   10⁶         1.093 · 10⁶    6.684 · 10⁻⁵     0.2487   1.04
5     0.00875   10⁶         1.093 · 10⁶    1.046 · 10⁻⁴     0.2528   1.12
µ     0.00825   10⁶         1.093 · 10⁶    1.231 · 10⁻⁴     0.2516   1.07
σ     0.00329   0           0              3.975 · 10⁻⁵     0.0017   0.036

Table 6.8.: Results for BSI using ConfJump.

Unfortunately, due to technical difficulties, the results obtained with ZIBgridfree for BSI could not be evaluated for this thesis.

6.0.4. Performance comparison

Figure 6.4 plots accuracy (1 − sampling error) against time in HMC steps for all ConfJump and Replica Exchange simulations of the three model systems. The corrected times are used for ConfJump. ConfJump produces a lower sampling error (higher accuracy) than Replica Exchange for all three molecules. Overall, very few simulations converged, due to the strict choice of convergence criteria. In the case of BZS, however, ConfJump beat Replica Exchange on both accounts, producing a better result in a much shorter time. Surprisingly, both methods fare best on BSI, the most complex molecule. However, there are reasons to doubt the validity of the reference run in that case.

Figure 6.5 shows the mean results of ZIBgridfree for BZS and Trimethoprim for comparison. ZIBgridfree has a large computational overhead compared to ConfJump and Replica Exchange, which is why its results appear far to the right in figure 6.5. ZIBgridfree was able to sample the distributions of Trimethoprim and BZS sufficiently well. In the case of Trimethoprim, it even produced the lowest sampling error of all three methods.

While ZIBgridfree needs a more thorough sampling of the conformational space than the other two methods, it also gains more information than ConfJump and RE. It is the only one of the three methods able to compute transition probabilities between the conformations. The high number of basis functions needed for an accurate sampling can in part be dealt with by parallelization. As mentioned in section 3.2, the current implementation already uses parallelization: in every sampling, three partial density functions were sampled at the same time.


Figure 6.4.: Mean accuracy (y-axis) vs. time (x-axis) for the ConfJump and Replica Exchange simulations.

Figure 6.5.: Mean accuracy (y-axis) vs. time (x-axis) for the ZIBgridfree simulations in comparison to the values from the ConfJump and Replica Exchange simulations.


7. Conclusion

In this thesis, a method for comparing sampling results from different Markov chain Monte Carlo methods was developed. It was applied to samplings of three typical drug-like molecules using three different sampling methods: ZIBgridfree, Replica Exchange and ConfJump.

It has been shown that a method is generally more stable the less information it needs about high-energy transition regions in conformational space.

• ConfJump employs knowledge about the shape of the potential and is thus able to bypass high-energy regions altogether.

• In Replica Exchange, high-temperature chains must pass through high-energy regions in order to discover different low-energy regions. The chain at the sampling temperature then samples these regions accurately. However, Replica Exchange does not need to sample high-energy regions accurately.

• ZIBgridfree, on the other hand, requires accurate sampling of transition regions in order to weight different low-energy regions correctly against each other.

For small to medium-sized molecules, where it is affordable to generate representatives of all low-energy regions in conformational space, the ConfJump approach seems to be both the most accurate and the most stable method. It switches between HMC steps and jump steps that carry the system swiftly from one metastable region to the next. By actively avoiding the problem of broken ergodicity in this way, ConfJump can greatly accelerate the sampling. However, as the dimension of the conformational space grows, ConfJump will invariably become less efficient, less stable and ultimately also less accurate than other methods, for two reasons.

On the one hand, low-energy regions in a high-dimensional, rough potential energy landscape will be more irregularly shaped than in lower dimensions, which is a critical problem for the efficiency of ConfJump. The jump vector is determined independently of the shape of the target region, solely on the basis of one representative of that region, but is only accepted if it “hits” the target. On the other hand, identifying all low-energy regions in the d-dimensional conformational space has a computational cost that is exponential in d, which soon becomes prohibitive as d grows. By relying on precomputed representatives of low-energy regions, ConfJump gives up the crucial advantage of Monte Carlo methods over, e.g., numerical integration. That advantage is the ability to approximate high-dimensional statistical distributions with an accuracy that depends on the number of samples generated rather than directly on the dimension of the problem.
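The point about the jump vector can be illustrated with a one-dimensional jump move in the style of smart darting [1]. This is a simplified sketch, not the ConfJump implementation. It assumes disjoint low-energy regions given as indicator functions, one representative per region, a uniformly chosen target region and a Metropolis test; all names are hypothetical. The jump vector depends only on the two representatives, so whether a proposal lands inside the target region depends on how well the shapes of the two regions match.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Region = Callable[[float], bool]


def region_of(x: float, regions: Sequence[Region]) -> Optional[int]:
    """Index of the (assumed disjoint) low-energy region containing x, if any."""
    for i, inside in enumerate(regions):
        if inside(x):
            return i
    return None


def jump_step(x: float, reps: Sequence[float], regions: Sequence[Region],
              energy: Callable[[float], float], beta: float,
              rng: random.Random) -> float:
    """One darting-style jump between low-energy regions (illustrative only).

    The jump vector reps[j] - reps[i] depends on the two representatives
    alone, not on the shape of the target region; a proposal that misses
    region j is rejected, otherwise the Metropolis test decides."""
    i = region_of(x, regions)
    if i is None or len(reps) < 2:
        return x  # not inside any region: no jump possible
    j = rng.choice([k for k in range(len(reps)) if k != i])
    y = x + (reps[j] - reps[i])
    if not regions[j](y):
        return x  # the jump did not "hit" the target region
    delta = energy(y) - energy(x)
    if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
        return y
    return x


def energy(x: float) -> float:
    return (x * x - 1.0) ** 2  # double well with minima at -1 and +1


reps = [-1.0, 1.0]
regions: List[Region] = [lambda x: abs(x + 1.0) < 0.3,   # narrow left region
                         lambda x: abs(x - 1.0) < 0.5]   # wide right region
rng = random.Random(0)
print(jump_step(-1.0, reps, regions, energy, 1.0, rng))  # 1.0: equal energies, accepted
print(jump_step(1.4, reps, regions, energy, 1.0, rng))   # 1.4: misses the narrow region
```

In this toy setting, every jump from the narrow left region lands inside the wide right region. In the reverse direction, only starting points within 0.3 of the right representative can hit the narrow left region. In high dimensions, such mismatches between region shapes become the rule rather than the exception.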


In the numerical experiments, Replica Exchange approximated the Boltzmann distributions of drug-sized molecules almost as well as ConfJump, with a slightly higher average sampling error and a somewhat higher variance. This means that it is numerically less stable than ConfJump. Exchanging temperatures between replicas at certain intervals cannot avoid trapping in basins of attraction of local minima as well as the ConfJump approach does. The reason is that no information is available on the destination of the “jump” associated with a replica exchange.

One problem of the Replica Exchange method is the enormous amount of redundant data it generates: when sampling at ten temperatures, only one tenth of the data can be used for conformation analysis. In order to reach all regions of conformational space in an acceptable time on average, the maximum temperature must be chosen high enough. However, the hybrid Monte Carlo method, and especially the molecular dynamics integration, are bound to encounter numerical difficulties at very high temperatures. Worse, some systems cannot be simulated at high temperatures at all. DNA or clusters of lipids or proteins are stabilized by weak molecular interactions such as van der Waals forces and hydrogen bonds, which high temperatures would break, so the system being studied would simply fall apart.

Another problem for Replica Exchange, independent of this consideration, concerns the acceptance probability of two replicas exchanging temperatures. This probability depends on the potential energy difference between the position states of the two chains. In large systems, the interesting low-energy regions will likely be far away from each other, which decreases the acceptance ratio. It is then less likely that two chains are at similar energy levels, and a high-temperature chain is likely to be in a high-energy region.
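The energy dependence of the exchange step follows from the standard Metropolis criterion for swapping the temperatures of replicas i and j: P = min(1, exp[(βᵢ − βⱼ)(Uᵢ − Uⱼ)]), with β = 1/(k_B T). The sketch assumes energies in kJ/mol. The two temperatures in the example are illustrative and not the temperature ladder used in chapter 5.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def swap_probability(t_i: float, t_j: float, u_i: float, u_j: float) -> float:
    """Metropolis probability of exchanging the temperatures of replicas i and j.

    t_i, t_j: temperatures in K; u_i, u_j: potential energies in kJ/mol of the
    configurations currently held by the replicas at t_i and t_j."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    exponent = (beta_i - beta_j) * (u_i - u_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


# The colder replica sits 5 or 50 kJ/mol below the hotter one:
print(swap_probability(300.0, 330.0, -5.0, 0.0))   # about 0.83
print(swap_probability(300.0, 330.0, -50.0, 0.0))  # about 0.16
```

The probability decays exponentially in the energy gap, and the decay is faster for widely spaced temperatures. This is why large systems, whose chains tend to sit at very different energies, need many closely spaced replicas.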

ZIBgridfree has also been found to get very close to a given reference run in most cases. However, the method is not very robust with respect to initial conditions and occasionally generates samplings with a large sampling error. The reason is that it must weight all pairs of “adjacent” sampled partial densities correctly against each other. This requires a high sampling accuracy also in high-energy transition regions, which are seldom visited during sampling.

Nevertheless, ZIBgridfree must be considered the most promising strategy when the goal is to simulate large systems. No other technique discussed here is, in principle, able to deal with very rough potential energy surfaces on high-dimensional conformational spaces. It seems inevitable to discretize very complex Boltzmann distributions and to look at uncoupled partial densities separately. ZIBgridfree’s current approach to weighting partial densities against each other relies on accurate sampling of high-energy transition regions, and it is very likely bound to fail on very rough potential energy surfaces. However, better methods for weighting the different partitions against each other are already being discussed and will be the subject of further research. It might be possible to use the ConfJump method to explore transition probabilities between partial densities that contain low-energy regions quickly and accurately, without detailed information about the transition regions in between. Additionally, ZIBgridfree is the only method that yields transition probabilities between metastable regions, thus allowing examination of the dynamics of the system under consideration.

The semi-empirical convergence indicator for Markov chain Monte Carlo methods developed in this thesis can supplement convergence monitors that are based solely on properties of Markov chains. This indicator is widely applicable, as symmetric planar rings and other rotationally symmetric groups are abundant in biomolecules and occur particularly frequently in the class of peptide ligands. The numerical experiments conducted for this thesis produced cases where the Gelman-Rubin statistic indicated convergence while the symmetry error was still high, as well as the reverse situation. Therefore, a convergence criterion that combines the Gelman-Rubin statistic and the symmetry error is more powerful than either method alone. The combined method costs less than twice as much as Gelman-Rubin alone, because the symmetry monitor reuses histograms. The method owes some of its easy applicability to the graph-theoretic algorithm for finding rotationally symmetric groups in molecules that was developed in this thesis.
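The sketch combines the two monitors and makes several assumptions. The symmetry error of a dihedral with 180° rotational symmetry is taken as the mean absolute difference between its normalized histogram and the same histogram rotated by π. The exact definition in this thesis may normalize differently. The Gelman-Rubin value R̂ is passed in from a separate computation. The default thresholds (1.01 and 0.04) are placeholders, not necessarily the settings of section 5.3. The histogram can be the same one that is compared with the reference, which is where the saving in cost comes from.

```python
import math
from typing import List, Sequence


def histogram(angles: Sequence[float], n_bins: int) -> List[float]:
    """Normalized histogram of dihedral angles (radians) over [-pi, pi)."""
    counts = [0] * n_bins
    width = 2.0 * math.pi / n_bins
    for a in angles:
        wrapped = (a + math.pi) % (2.0 * math.pi)  # map to [0, 2*pi)
        counts[min(int(wrapped / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def symmetry_error(hist: Sequence[float]) -> float:
    """Mean absolute difference between a histogram and its copy rotated by pi.

    Requires an even number of bins so that pi corresponds to a whole number of bins."""
    n = len(hist)
    if n % 2:
        raise ValueError("need an even number of bins")
    half = n // 2
    return sum(abs(hist[k] - hist[(k + half) % n]) for k in range(n)) / n


def converged(r_hat: float, e_sym: float,
              r_max: float = 1.01, e_max: float = 0.04) -> bool:
    """Combined criterion: both monitors must indicate convergence."""
    return r_hat < r_max and e_sym < e_max


symmetric = [0.5, 0.5 - math.pi, 2.0, 2.0 - math.pi]  # every angle has a partner at +pi
biased = [0.5, 0.6, 0.4, 2.0 - math.pi]               # one basin over-sampled
print(symmetry_error(histogram(symmetric, 12)))  # 0.0
print(symmetry_error(histogram(biased, 12)))
print(converged(1.005, symmetry_error(histogram(symmetric, 12))))  # True
```

A chain can pass the Gelman-Rubin test while every chain is stuck on the same side of a symmetric dihedral. The second example mimics this case. Its symmetry error stays high even though R̂ alone would not flag it.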

Acknowledgments

First, I would like to deeply thank Dr. Marcus Weber for helpful suggestions and fruitful discussions throughout all stages of this work. Marcus Weber is also acknowledged for bringing the problem of automatic detection of molecule symmetries to my attention.

I would like to thank Lionel Walter and Dr. Frank Cordes for interesting discussions and suggestions, Susanna Kube and Marcus Weber for proofreading parts of this thesis, and Prof. Dr. Paul Wrede for pointing out 3D structure generators.

I would further like to thank Johannes Schmidt-Ehrenberg for help with amira and amiraMol and Wolfgang Pyszkalski for swift technical assistance in the final stages of this work.

Image Credits

• Figure 2.1, created using amira [78].

• Figure 3.1, simulation and image courtesy M. Weber and H. Meyer, taken with permission from [75].

• Figure 3.2 and fig:partialdens, based on plots created with MATLAB [35].

• Figure 3.8, courtesy L. Walter, taken with permission from [71].

• Figure 4.2, based on a diagram created with gnuplot [77].

• Figure 4.3, based on a diagram created with gnuplot [77].


• Figures 5.1, 5.2, and 5.3, extracted from visualizations created by the Protein Data Bank’s “ligand summary” viewer [4].

• Figures 6.4 and 6.5, created with Microsoft Excel.


A. Algorithm for automatic detection of molecule symmetries

The following recursive algorithm is used to detect all single bonds with 180° rotational symmetry. It operates on a graph representation G = (V, E) of the molecule. The nodes v ∈ V represent atoms (by storing index and atomic number), and the undirected edges e = (u, v) = (v, u) ∈ E ⊆ V × V represent bonds between atoms. This has been implemented as an adjacency list in which each node stores the indices of its neighbors. It is assumed that no atom has more than four binding partners; atoms with more are extremely rare in biomolecules.

An array discovered of N = |V| binary flags is used to mark which atoms have already been visited. discovered[i] = True means that node i lies on one of the two branches that are being compared at that moment. The discovered flags prevent branches from growing into themselves or into each other: each atom can belong to only one branch, in which it also cannot occur twice.

The core of the algorithm is the function compareSubgraphs, which determines whether two branches starting with the directed edges from1 → to1 and from2 → to2 are isomorphic. To do so, it removes the edges (from1, to1) and (from2, to2) from the graph. It then recursively tries to split the component of the remaining graph that contains to1 and to2 into two isomorphic subgraphs, with to1 in one branch and to2 in the other.

Initialize list of symmetric dihedrals symList ← [].
for all nodes v with 3 neighbors do
    if v is adjacent to a single bond in a ‘heavy’ dihedral then    ⊲ see section 2.5 for the definition of ‘heavy’ dihedrals
        for all neighbors u of v do
            for all nodes i ∈ V do
                discovered[i] ← False
            end for
            discovered[u] ← discovered[v] ← True
            Let l, r be the other 2 neighbors of v.
            if compareSubgraphs(v, l, v, r) then
                Identify the dihedral D that describes the bond (u, v)
                symList.append(D)
            end if
        end for
    end if
end for

function compareSubgraphs(from1, to1, from2, to2)
    ⊲ Checks whether the two branches starting with the directed edges from1 → to1 and from2 → to2 are isomorphic.
    if to1 = to2 then    ⊲ a ring closes
        if nNeighbors(to1) ≤ 3 then
            ⊲ a single ring appendage is part of both branches ⇒ skip
            return True
        else    ⊲ 4 neighbors ⇒ recurse into the non-ring neighbors of the ring link
            Let l, r be the non-ring neighbors of to1 = to2.
            discovered[to1] ← True
            if compareSubgraphs(to1, l, to1, r) then
                return True
            else    ⊲ backtrack
                discovered[to1] ← False
                return False
            end if
        end if
    else if (atomType[to1] ≠ atomType[to2]) or (nNeighbors(to1) ≠ nNeighbors(to2)) then
        ⊲ atoms to1 and to2 are incompatible ⇒ backtrack
        return False
    else if nNeighbors(to1) = 1 then
        ⊲ reached (compatible) terminal atoms
        return True
    end if

    ⊲ Build lists of undiscovered neighbors
    neighbors1 ← neighbors2 ← []
    for all neighbors w of to1 do
        if not discovered[w] then
            neighbors1.append(w)
        end if
    end for
    for all neighbors w of to2 do
        if not discovered[w] then
            neighbors2.append(w)
        end if
    end for
    if neighbors1.size() ≠ neighbors2.size() then
        ⊲ one branch grows into the other ⇒ backtrack
        return False
    else if neighbors1.size() = 0 then
        ⊲ a side ring closes on each branch; already checked
        return True
    end if

    ⊲ Recurse through branches
    a ← neighbors1[0], b ← neighbors1[1], c ← neighbors1[2]    ⊲ (provided these exist)
    A ← neighbors2[0], B ← neighbors2[1], C ← neighbors2[2]
    discovered[to1] ← discovered[to2] ← True
    result ← False
    if neighbors1.size() = 1 then
        ⊲ only one path to pursue on each branch
        result ← compareSubgraphs(to1, a, to2, A)
    else if neighbors1.size() = 2 then
        ⊲ either A corresponds to a and B to b, or A corresponds to b and B to a
        result ← (compareSubgraphs(to1, a, to2, A) and compareSubgraphs(to1, b, to2, B))
                 or (compareSubgraphs(to1, a, to2, B) and compareSubgraphs(to1, b, to2, A))
    else if neighbors1.size() = 3 then
        ⊲ check all possible combinations
        if compareSubgraphs(to1, a, to2, A) then
            result ← (compareSubgraphs(to1, b, to2, B) and compareSubgraphs(to1, c, to2, C))
                     or (compareSubgraphs(to1, b, to2, C) and compareSubgraphs(to1, c, to2, B))
        else if (not result) and compareSubgraphs(to1, a, to2, B) then
            result ← (compareSubgraphs(to1, b, to2, A) and compareSubgraphs(to1, c, to2, C))
                     or (compareSubgraphs(to1, b, to2, C) and compareSubgraphs(to1, c, to2, A))
        else if (not result) and compareSubgraphs(to1, a, to2, C) then
            result ← (compareSubgraphs(to1, b, to2, A) and compareSubgraphs(to1, c, to2, B))
                     or (compareSubgraphs(to1, b, to2, B) and compareSubgraphs(to1, c, to2, A))
        end if
        ⊲ check chirality of corresponding triples of atoms
        if result then
            result ← checkChirality(to1, from1, neighbors1[0], neighbors1[1], neighbors1[2])
            ⊲ (using a variant of compareSubgraphs on a second discovered array)
        end if
    end if
    if not result then
        discovered[to1] ← discovered[to2] ← False
    end if
    return result
end function

The algorithm can easily be adapted for 120° symmetry (not shown).
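For experimenting with the idea, here is a compact Python sketch of the branch comparison. It is not a line-by-line translation of the listing above. The discovered set is an immutable frozenset that is returned on success, so a failed attempt leaves no marks and backtracking needs no explicit reset. All pairings of up to three neighbors are tried with itertools.permutations. The chirality check (checkChirality) is omitted entirely, so mirror-image arrangements in the three-neighbor case are not distinguished. Molecules are given as adjacency lists of heavy atoms plus a dict of element symbols. The function names (build, match, symmetric_bonds) are hypothetical.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Graph = Dict[int, List[int]]


def build(edges: Sequence[Tuple[int, int]]) -> Graph:
    """Undirected adjacency lists from a list of bonds."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def match(graph: Graph, types: Dict[int, str], disc: FrozenSet[int],
          to1: int, to2: int) -> Optional[FrozenSet[int]]:
    """Try to map the branch entered at to1 onto the branch entered at to2.

    Returns the enlarged set of discovered atoms on success and None on failure."""
    if to1 in disc or to2 in disc:
        # Reached again through a side ring: consistent only if both were.
        return disc if (to1 in disc and to2 in disc) else None
    if to1 == to2:
        # Both branches meet in one shared atom: a ring closes.
        rest = [w for w in graph[to1] if w not in disc]
        disc = disc | {to1}
        if len(rest) == 2:
            # Two substituents on the ring link must match each other.
            return match(graph, types, disc, rest[0], rest[1])
        return disc  # zero or one substituent, shared by both branches
    if types[to1] != types[to2] or len(graph[to1]) != len(graph[to2]):
        return None  # incompatible atoms
    disc = disc | {to1, to2}
    n1 = [w for w in graph[to1] if w not in disc]
    n2 = [w for w in graph[to2] if w not in disc]
    if len(n1) != len(n2):
        return None  # one branch grows into the other
    # Try every pairing of the undiscovered neighbours (at most 3! = 6).
    for perm in permutations(n2):
        state: Optional[FrozenSet[int]] = disc
        for a, b in zip(n1, perm):
            state = match(graph, types, state, a, b)
            if state is None:
                break
        if state is not None:
            return state
    return None


def symmetric_bonds(graph: Graph, types: Dict[int, str],
                    bonds: Sequence[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Bonds (u, v) about which the group on v's side has 180-degree symmetry."""
    found = []
    for u, v in bonds:
        others = [w for w in graph[v] if w != u]
        if len(others) == 2 and match(graph, types, frozenset({u, v}),
                                      others[0], others[1]) is not None:
            found.append((u, v))
    return found


# Heavy atoms of a benzyl fragment: ring atoms 0-5, CH2 carbon 6 bonded to atom 0.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
carbons = {i: "C" for i in range(7)}
print(symmetric_bonds(build(ring), carbons, [(6, 0)]))  # [(6, 0)]
# A chlorine at ring atom 2 (meta position) breaks the symmetry.
print(symmetric_bonds(build(ring + [(2, 7)]), {**carbons, 7: "Cl"}, [(6, 0)]))  # []
```

A para substituent (at ring atom 3) leaves the bond symmetric, because the shared ring link carries only one appendage. A carboxylate carbon with two oxygen neighbors is detected the same way.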


Bibliography

[1] I. Andricioaei, J. Straub, and A. Voter. Smart Darting Monte Carlo. J. Chem. Phys., 114(16):6994–7000, 2001.

[2] Ehrhard Behrends. Introduction to Markov Chains. Vieweg Verlagsgesellschaft, 1st edition, 2000.

[3] Jeremy M. Berg, John L. Tymoczko, and Lubert Stryer. Biochemistry. Palgrave Macmillan, 5th edition, 2002.

[4] H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne. The Protein Data Bank. Nucleic Acids Research, 28:235–242, 2000. http://www.rcsb.org/pdb.

[5] J.F. Bonnans, J.Ch. Gilbert, C. Lemaréchal, and C.A. Sagastizábal. Numerical Optimization – Theoretical and Practical Aspects. Universitext. Springer-Verlag, Berlin, 2003.

[6] Jonas Boström. Reproducing the conformations of protein-bound ligands: A critical evaluation of several popular conformational searching tools. J. Comput.-Aided Mol. Des., 15(12):1137–1152, 2001.

[7] A. Brass, B.J. Pendleton, Y. Chen, and B. Robson. Hybrid Monte Carlo simulations theory and initial comparison with molecular dynamics. Biopolymers, 33(8):1307–1315, 1993.

[8] S.P. Brooks and A. Gelman. General Methods for Monitoring Convergence of Iterative Simulations. J. Comput. Graph. Stat., 7(4):434–455, 1998.

[9] S.P. Brooks and G.O. Roberts. Assessing Convergence of Markov Chain Monte Carlo Algorithms. Technical report, University of Cambridge, 1997.

[10] M. Cecchini, F. Rao, M. Seeber, and A. Caflisch. Replica exchange molecular dynamics simulations of amyloid peptide aggregation. J. Chem. Phys., 121:10748–10756, 2004.

[11] J.N. Champness, A. Achari, S.P. Ballantine, P.K. Bryant, C.J. Delves, and D.K. Stammers. The structure of Pneumocystis carinii dihydrofolate reductase to 1.9 Å resolution. Structure, 2(10):915–924, 1994.


[12] M.E. Clamp, P.G. Baker, C.J. Stirling, and A. Brass. Hybrid Monte Carlo: An efficient algorithm for condensed matter simulation. J. Comput. Chem., 15(8):838–846, 1994.

[13] Frank Cordes, Marcus Weber, and Johannes Schmidt-Ehrenberg. Metastable Conformations via successive Perron-Cluster Cluster Analysis of dihedrals. Technical report 02-40, Zuse Institute Berlin, 2002.

[14] J. Couet, S. Li, T. Okamoto, T. Ikezu, and M.P. Lisanti. Identification of Peptide and Protein Ligands for the Caveolin-scaffolding Domain. Implications for the Interaction of Caveolin with Caveolae-Associated Proteins. J. Biol. Chem., 272(10):6525–6533, 1997.

[15] M.K. Cowles and B.P. Carlin. Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review. J. Am. Stat. Assoc., 91(434):883–904, 1996.

[16] Peter Deuflhard. From Molecular Dynamics to Conformational Dynamics in Drug Design. In M. Kirkilionis, S. Krömker, R. Rannacher, and F. Tomi, editors, Trends in Nonlinear Analysis, pages 269–288. Springer-Verlag, Berlin, 2003.

[17] Peter Deuflhard and Christof Schütte. Molecular Conformation Dynamics and Computational Drug Design. In J.M. Hill and R. Moore, editors, Applied Mathematics Entering the 21st Century. Invited Talks from the ICIAM 2003 Congress, Sydney, Australia, 2004.

[18] Peter Deuflhard and Marcus Weber. Robust Perron Cluster Analysis in Conformation Dynamics. In M. Dellnitz, S. Kirkland, M. Neumann, and C. Schütte, editors, Lin. Alg. Appl. – Special Issue on Matrices and Mathematical Biology, volume 398C, pages 161–184. 2005.

[19] S. Duane, A.D. Kennedy, B.J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Phys. Lett. B, 195:216–222, 1987.

[20] M. Filter, M. Eichler-Mertens, A. Bredenbeck, F.O. Losch, T. Sharav, A. Givehchi, P. Walden, and P. Wrede. A Strategy for the Identification of Canonical and Non-canonical MHCI-binding Epitopes Using an ANN-based Epitope Prediction Algorithm. QSAR & Comb. Sci., 25(4):350–358, 2006.

[21] Alexander Fischer. Die Hybride Monte–Carlo Methode in der Molekülphysik. Master’s thesis, Freie Universität Berlin, 1997. In German.

[22] Alexander Fischer. An Uncoupling-Coupling Technique for Markov Chain Monte Carlo Methods. Technical report 00-04, Zuse Institute Berlin, 2006.

[23] Daan Frenkel and Berend Smit. Understanding Molecular Simulation. Academic Press, 2nd edition, 2002.

[24] Johann Gasteiger and Jens Sadowski. From Atoms and Bonds to Three-Dimensional Atomic Coordinates: Automatic Model Builders. Chem. Rev., 93:2567–2581, 1993.

[25] A. Gelman. Inference and monitoring convergence. In W. Gilks, S. Richardson, and D.J. Spiegelhalter, editors, Practical Markov Chain Monte Carlo, pages 131–143. Chapman & Hall, London, UK, 1996.

[26] A. Gelman and D. Rubin. Inference from Iterative Simulation using Multiple Sequences. Statist. Sci., 7:457–511, 1992.

[27] A. Gelman and D. Rubin. Markov chain Monte Carlo Methods in Biostatistics. Stat. Meth. Med. Res., 5:339–355, 1996.

[28] W.R. Gilks, S. Richardson, and D.J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman & Hall, London, UK, 1996.

[29] Stefan Goedecker. Minima hopping: An efficient search method for the global minimum of the potential energy surface of complex molecular systems. J. Chem. Phys., 120(21):9911–9917, 2004.

[30] Y.G. Gogotsi, A. Kailer, and K.G. Nickel. Transformation of diamond to graphite. Nature, 401:663–664, 1999.

[31] Thomas A. Halgren. Merck molecular force field. I–V. J. Comput. Chem., 17(5–6):490–641, 1996.

[32] U.H.E. Hansmann and Y. Okamoto. Generalized-ensemble Monte Carlo method for systems with rough energy landscape. Phys. Rev. E, 56(2):2228–2233, 1997.

[33] W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.

[34] K. Inaba, S. Turley, T. Iyoda, F. Yamaide, S. Shimoyama, C.R. e Sousa, R.N. Germain, I. Mellman, and R.M. Steinman. The Formation of Immunogenic Major Histocompatibility Complex Class II-Peptide Ligands in Lysosomal Compartments of Dendritic Cells Is Regulated by Inflammatory Stimuli. J. Exp. Med., 191(6):927–936, 2000.

[35] The MathWorks Inc. MATLAB® 6.5.0, 1984–2002.

[36] Martin Karplus. Molecular dynamics of biological macromolecules: A brief history and perspective. Biopolymers, 68(3):350–358, 2002.

[37] Andrew R. Leach. Molecular Modelling: Principles and Applications. Prentice Hall, 2002.

[38] Georg Löffler and Petro E. Petrides. Biochemie und Pathobiochemie. Springer-Verlag, 5th edition, 1997. In German.

[39] Z. Lu, H. Hu, W. Yang, and P.E. Marszalek. Simulating Force-Induced Conformational Transitions in Polysaccharides with the SMD Replica Exchange Method. Biophys. J., 2006.

[40] S. Mangani, P. Carloni, and P. Orioli. Crystal structure of the complex between carboxypeptidase A and the biproduct analog inhibitor L-benzylsuccinate at 2.0 Å resolution. J. Mol. Biol., 223(2):573–578, 1992.

[41] H. Matter, W. Schwab, D. Barbier, G. Billen, B. Haase, B. Neises, M. Schudok, W. Thorwart, H. Schreuder, V. Brachvogel, P. Lonze, and K.U. Weithmann. Quantitative structure-activity relationship of human neutrophil collagenase (MMP-8) inhibitors using comparative molecular field analysis and X-ray structure analysis. J. Med. Chem., 42(11):1908–1920, 1999.

[42] N. Metropolis. The Beginning of the Monte Carlo Method. Los Alamos Science, Special Issue, pages 125–130, 1987.

[43] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21(6):1087–1092, 1953.

[44] N. Metropolis and S. Ulam. The Monte Carlo method. J. Am. Stat. Assoc., 44:335–341, 1949.

[45] Holger Meyer. Die Implementierung und Analyse von HuMFree – einer gitterfreien Methode zur Konformationsanalyse von Wirkstoffmolekülen. Master’s thesis, Freie Universität Berlin, 2005. In German.

[46] Holger Meyer, Frank Cordes, and Marcus Weber. ConFlow: A new space-based Application for complete Conformational Analysis of Molecules. Technical report 06-31, Zuse Institute Berlin, 2006. In preparation.

[47] Holger Meyer, Marcus Weber, Alexander Riemer, and Lionel Walter. ZIBgridfree, 2004–2006. Software package for HMC simulation and conformation analysis based upon C++ classes of amiraMol [56] using the Merck Molecular Force Field [31] implemented by T. Baumeister and parametrized by F. Cordes. Robust Perron Cluster Analysis implemented by M. Weber and J. Schmidt-Ehrenberg. Status: August 2006. Software owned by Zuse Institute Berlin.

[48] David L. Nelson and Michael M. Cox. Lehninger Principles of Biochemistry, chapter 1, pages 16–21. W.H. Freeman, New York, NY, USA, 4th edition, 2004.

[49] Adrian Patrascioiu. The Ergodic Hypothesis: A Complicated Problem in Mathematics and Physics. Los Alamos Science, Special Issue, pages 263–279, 1987.

[50] R.S. Pearlman. Rapid generation of high quality approximate 3D molecular structures. Chem. Des. Auto. News, 2:1–7, 1987.

[51] J.W. Pitera and W. Swope. Understanding folding and design: Replica-exchange simulations of “trp-cage” miniproteins. PNAS, 100(13):7587–7592, 2003.

[52] Martin Riedmiller and Heinrich Braun. A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. In Proceedings of the IEEE International Conference on Neural Networks, pages 586–591. IEEE Press, 1993.

[53] Daniel Ruiz. A Scaling Algorithm to Equilibrate both Rows and Columns in Matrices. Technical report RAL-TR-2001-034, Rutherford Appleton Laboratory, 2001.

[54] K.Y. Sanbonmatsu and A.E. Garcia. Structure of met-enkephalin in explicit aqueous solution using replica exchange molecular dynamics. Proteins, 46:225–234, 2002.

[55] Tamar Schlick. Molecular Modeling and Simulation: An Interdisciplinary Guide. Springer-Verlag, New York, NY, USA, 2002.

[56] Johannes Schmidt-Ehrenberg, Daniel Baum, and Hans-Christian Hege. Visualizing dynamic molecular conformations. In IEEE Visualization 2002, pages 235–242. IEEE Computer Society Press, 2002.

[57] Ch. Schütte, A. Fischer, W. Huisinga, and P. Deuflhard. A Direct Approach to Conformational Dynamics based on Hybrid Monte Carlo. J. Comput. Phys., Special Issue on Computational Biophysics, 151:146–168, 1999.

[58] Christof Schütte. Conformational Dynamics: Modelling, Theory, Algorithm and Application to Biomolecules. Habilitation thesis, Freie Universität Berlin, 1998.

[59] Christof Schütte and Wilhelm Huisinga. Biomolecular Conformations can be Identified as Metastable Sets of Molecular Dynamics, volume X, pages 699–744. North-Holland, 2003.

[60] H. Senderowitz, F. Guarnieri, and W.C. Still. A Smart Monte Carlo Technique for Free Energy Simulations of Multiconformal Molecules. Direct Calculation of the Conformational Population of Organic Molecules. J. Am. Chem. Society, 117:8211–8219, 1995.

[61] H. Senderowitz and W.C. Still. Simple but smart Monte Carlo algorithm for free energy simulations of multiconformational molecules. J. Comput. Chem., 19(15):1736–1745, 1998.

[62] D. Shepard. A two-dimensional interpolation function for irregularly spaced data. In Proc. 23rd ACM Nat. Conf., pages 517–524, 1968.

[63] D. Stalling, M. Westerhoff, and H.C. Hege. Amira: A Highly Interactive System for Visual Data Analysis. In C.D. Hansen and C.R. Johnson, editors, The Visualization Handbook, chapter 38, pages 749–767. Elsevier, 2005.

[64] Yuji Sugita and Yuko Okamoto. Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett., 314(1–2):141–151, 1999.

[65] W.C. Swope, H.C. Andersen, P.H. Berens, and K.R. Wilson. A computer simulation method for the calculation of equilibrium constants for the formation of physical clusters of molecules: Application to small water clusters. J. Chem. Phys., 76(1):637–649, 1982.

[66] G.M. Torrie and J.P. Valleau. Monte Carlo free energy estimates using non-Boltzmann sampling: Application to the sub-critical Lennard-Jones fluid. Chem. Phys. Lett., 28(4):578–581, 1974.

[67] G.M. Torrie and J.P. Valleau. Monte Carlo study of a phase-separating liquid mixture by umbrella sampling. J. Chem. Phys., 66(4):1402–1408, 1977.

[68] G.M. Torrie and J.P. Valleau. Nonphysical sampling distributions in Monte Carlo free-energy estimation: Umbrella sampling. J. Comput. Phys., 23:187–199, 1977.

[69] L. Verlet. Computer “Experiments” on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules. Phys. Rev., 159:98–103, 1967.

[70] David Wales. Energy Landscapes: Applications to Clusters, Biomolecules and Glasses (Cambridge Molecular Science). Cambridge University Press, 2004.

[71] Lionel Walter and Marcus Weber. ConfJump: A fast biomolecular sampling method which drills tunnels through high mountains. Technical report 06-26, Zuse Institute Berlin, 2006.

[72] Marcus Weber. Clustering by using a simplex structure. Technical report 04-03, Zuse Institute Berlin, 2004.

[73] Marcus Weber. Meshless Methods in Conformation Dynamics. PhD thesis, Freie Universität Berlin, 2006.

[74] Marcus Weber, Susanna Kube, Lionel Walter, and Peter Deuflhard. Well-conditioned computation of probability densities for metastable conformations. Technical report 06-39, Zuse Institute Berlin, 2006. In preparation.

[75] Marcus Weber and Holger Meyer. ZIBgridfree – Adaptive Conformation Analysis with qualified Support of Transition States and Thermodynamic Weights. Technical report 05-17, Zuse Institute Berlin, 2006.

[76] L.T. Wille and J. Vennik. Computational complexity of the ground-state determination of atomic clusters. J. Phys. A, 18(8):L419–L422, 1985.

[77] Thomas Williams and Colin Kelley. Gnuplot, version 4.0, 1986–1993, 1998, 2004. http://www.gnuplot.info.

[78] ZIB and Mercury Computer Systems, Berlin. amira and amiraMol, 1999–2004.
