An improved hybrid Monte Carlo method for conformational sampling of proteins

Post on 10-Feb-2016


An improved hybrid Monte Carlo method for conformational sampling of proteins. Jesús A. Izaguirre and Scott Hampton, Department of Computer Science and Engineering, University of Notre Dame. March 5, 2003.


1

An improved hybrid Monte Carlo method for conformational sampling of proteins

Jesús A. Izaguirre and Scott Hampton

Department of Computer Science and Engineering

University of Notre Dame

March 5, 2003

This work is partially supported by two NSF grants (CAREER and BIOCOMPLEXITY) and two grants from University of Notre Dame

2

Overview

1. Motivation: sampling conformational space of proteins

2. Methods for sampling (MD, HMC)

3. Evaluation of new Shadow HMC

4. Future applications

3

Protein: The Machinery of Life

NH2-Val-His-Leu-Thr-Pro-Glu-Glu-Lys-Ser-Ala-Val-Thr-Ala-Leu-Trp-Gly-Lys-Val-Asn-Val-Asp-Glu-Val-Gly-Gly-Glu-…

4

Protein Structure

5

Why protein folding?

- Huge gap between sequence data and 3D structure data:
  - EMBL/GENBANK, DNA (nucleotide) sequences: 15 million sequences, 15,000 million base pairs
  - SWISSPROT, protein sequences: 120,000 entries
  - PDB, 3D protein structures: 20,000 entries
- Bridging the gap through prediction
- Aim of structural genomics: “Structurally characterize most of the protein sequences by an efficient combination of experiment and prediction,” Baker and Sali (2001)
- Thermodynamics hypothesis: native state is at the global free energy minimum, Anfinsen (1973)

6

Questions related to folding I

- Long time kinetics: dynamics of folding
  - only statistical correctness possible
  - ensemble dynamics, e.g., folding@home
- Short time kinetics
  - strong correctness possible
  - e.g., transport properties, diffusion coefficients

7

Questions related to folding II

- Sampling: compute equilibrium averages by visiting all (most) of the “important” conformations
- Examples:
  - Equilibrium distribution of solvent molecules in vacancies
  - Free energies
  - Characteristic conformations (misfolded and folded states)

8

Overview

1. Motivation: sampling conformational space of proteins

2. Methods for sampling (MD, HMC)

3. Evaluation of new Shadow HMC

4. Future applications

9

Classical molecular dynamics

- Newton’s equations of motion: $M q'' = -\nabla U(q) = F(q)$ (1)
- Atoms
- Molecules
- CHARMM force field (Chemistry at Harvard Molecular Mechanics)
- Bonds, angles and torsions

10

What is a Forcefield?

The forcefield is a collection of equations and associated constants designed to reproduce molecular geometry and selected properties of tested structures.

In molecular dynamics a molecule is described as a series of charged points (atoms) linked by springs (bonds).

11

Energy Terms Described in the CHARMm forcefield

[Figure panels: Bond, Angle, Dihedral, Improper]

12

Energy Functions

- U_bond = oscillations about the equilibrium bond length
- U_angle = oscillations of 3 atoms about an equilibrium angle
- U_dihedral = torsional rotation of 4 atoms about a central bond
- U_nonbond = non-bonded energy terms (electrostatics and Lennard-Jones)

13

Molecular Dynamics: what does it mean?

MD = change in conformation over time using a forcefield

[Figure: energy vs. conformational change. Annotations: "Energy supplied to the minimized system at the start of the simulation"; "Conformation impossible to access through MD"]

14

MD, MC, and HMC in sampling

- Molecular Dynamics takes long steps in phase space, but it may get trapped
- Monte Carlo makes a random walk (short steps); it may escape minima due to randomness
- Can we combine these two methods?

[Figure labels: MC, MD, HMC]

15

Hybrid Monte Carlo

- We can sample from a distribution with density p(x) by simulating a Markov chain with the following transitions:
  - From the current state, x, a candidate state x’ is drawn from a proposal distribution S(x,x’)
  - The proposed state is accepted with probability min[1, (p(x’) S(x’,x)) / (p(x) S(x,x’))]
- If the proposal distribution is symmetric, S(x’,x) = S(x,x’), then the acceptance probability depends only on p(x’) / p(x)
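The symmetric-proposal case can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names, the uniform proposal, and the unit Gaussian target are all assumptions chosen to keep the example short.

```python
import math
import random

def metropolis_sample(log_p, x0, n_steps, step=1.5, seed=0):
    """Metropolis sampling with a symmetric uniform proposal (hypothetical helper)."""
    rng = random.Random(seed)
    x = x0
    samples = []
    n_accepted = 0
    for _ in range(n_steps):
        # Symmetric proposal: S(x, x') = S(x', x), so the S terms cancel.
        x_new = x + rng.uniform(-step, step)
        log_ratio = log_p(x_new) - log_p(x)  # log of p(x') / p(x)
        # Accept with probability min(1, p(x') / p(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = x_new
            n_accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, n_accepted / n_steps

def log_gauss(x):
    """Log-density of a unit Gaussian, up to an additive constant (assumed target)."""
    return -0.5 * x * x

samples, acceptance = metropolis_sample(log_gauss, 0.0, 20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance={acceptance:.2f} mean={mean:.2f} var={var:.2f}")
```

Working with log p avoids underflow when p(x) is very small, which is the usual situation for Boltzmann weights of molecular systems.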

16

Hybrid Monte Carlo II

- Proposal functions must be reversible: if x’ = s(x), then x = s(x’)
- Proposal functions must preserve volume: Jacobian must have absolute value one
- Valid proposal: x’ = -x
- Invalid proposals:
  - x’ = 1 / x (Jacobian not 1)
  - x’ = x + 5 (not reversible)
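The two conditions can be checked numerically for the three proposals on this slide. The sketch below is illustrative only: the helper names are hypothetical, the test point x = 2 is arbitrary, and the Jacobian is estimated by a central finite difference rather than computed analytically.

```python
def is_reversible(s, x, tol=1e-9):
    """True if applying the proposal twice returns to x, i.e. x = s(s(x))."""
    return abs(s(s(x)) - x) < tol

def abs_jacobian(s, x, h=1e-6):
    """Absolute value of ds/dx, estimated by a central finite difference."""
    return abs((s(x + h) - s(x - h)) / (2.0 * h))

proposals = {
    "negate": lambda x: -x,            # x' = -x
    "reciprocal": lambda x: 1.0 / x,   # x' = 1 / x
    "shift": lambda x: x + 5.0,        # x' = x + 5
}

x = 2.0
report = {
    name: (is_reversible(s, x), abs_jacobian(s, x))
    for name, s in proposals.items()
}
for name, (reversible, jac) in report.items():
    print(f"{name:10s} reversible={reversible} |J|={jac:.4f}")
```

At x = 2 the reciprocal map is reversible but has |J| = 1/x² = 0.25, while the shift has |J| = 1 but maps x to x + 10 when applied twice; only the negation passes both checks.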

17

Hybrid Monte Carlo III

- Hamiltonian dynamics preserve volume in phase space
- Hamiltonian dynamics conserve the Hamiltonian H(q,p)
- Reversible symplectic integrators for Hamiltonian systems preserve volume in phase space
- Conservation of the Hamiltonian depends on the accuracy of the integrator
- Hybrid Monte Carlo: use a reversible symplectic integrator for MD to generate the next proposal in MC

18

HMC Algorithm

Perform the following steps:

1. Draw random values for the momenta p from a normal distribution; use the given positions q
2. Perform cyclelength steps of MD, using a symplectic reversible integrator with timestep Δt, generating (q’,p’)
3. Compute the change in total energy ΔH = H(q’,p’) - H(q,p)
4. Accept the new state based on exp(-ΔH)
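The four steps can be sketched as follows. This is a toy version under stated assumptions, not the ProtoMol implementation: one degree of freedom, unit mass, energies measured in units of kT, a harmonic potential as the target, leapfrog (velocity Verlet) as the symplectic reversible integrator, and the acceptance rule read as min(1, exp(-ΔH)). All function and parameter names are hypothetical.

```python
import math
import random

def leapfrog(q, p, grad_U, dt, n_steps):
    """Velocity Verlet / leapfrog: symplectic and time-reversible."""
    p -= 0.5 * dt * grad_U(q)          # initial half kick
    for i in range(n_steps):
        q += dt * p                    # drift
        if i < n_steps - 1:
            p -= dt * grad_U(q)        # full kick between drifts
    p -= 0.5 * dt * grad_U(q)          # final half kick
    return q, p

def hmc(U, grad_U, q0, n_samples, dt=0.2, cycle_length=10, seed=0):
    rng = random.Random(seed)
    q = q0
    samples = []
    n_accepted = 0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                                  # step 1
        h_old = 0.5 * p * p + U(q)
        q_new, p_new = leapfrog(q, p, grad_U, dt, cycle_length)  # step 2
        delta_h = 0.5 * p_new * p_new + U(q_new) - h_old         # step 3
        # Step 4: accept with probability min(1, exp(-delta_h)).
        if delta_h <= 0 or rng.random() < math.exp(-delta_h):
            q = q_new
            n_accepted += 1
        samples.append(q)
    return samples, n_accepted / n_samples

# Assumed toy target: U(q) = q^2 / 2, whose Boltzmann density is a unit Gaussian.
samples, rate = hmc(lambda q: 0.5 * q * q, lambda q: q, q0=0.0, n_samples=5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"acceptance={rate:.2f} mean={mean:.2f} var={var:.2f}")
```

Because the momenta are redrawn at every iteration, only the positions need to be stored between iterations; the rejected branch simply keeps the old q.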

19

Hybrid Monte Carlo IV

Advantages of HMC:

- HMC can propose and accept distant points in phase space, provided the accuracy of the MD integrator is high enough
- HMC can move in a biased way, rather than in a random walk (distance k vs sqrt(k))
- HMC can quickly change the probability density

20

Hybrid Monte Carlo V

- As the number of atoms increases, the total error in H(q,p) increases. The error is related to the time step used in MD
- Analysis of N replicas of multivariate Gaussian distributions shows that HMC takes $O(N^{5/4})$ with time step $\Delta t = O(N^{-1/4})$, Kennedy & Pendleton, 91

System size N    Max Δt
66               0.5
423              0.25
868              0.1
5143             0.05
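One way to see where the 5/4 exponent comes from is to count work per trajectory. The short derivation below is a sketch under two assumptions that the slide does not state: the trajectory length is held fixed as N grows, and one MD step (one force evaluation) costs O(N).

```latex
\begin{align*}
\text{steps per trajectory} &\propto \frac{1}{\Delta t} = O\!\left(N^{1/4}\right)
  && \text{since } \Delta t = O\!\left(N^{-1/4}\right), \\
\text{cost per step} &= O(N)
  && \text{(assumed linear-cost force evaluation)}, \\
\text{cost per trajectory} &= O(N) \cdot O\!\left(N^{1/4}\right) = O\!\left(N^{5/4}\right).
\end{align*}
```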

21

Hybrid Monte Carlo VI

- The key problem in scaling is the accuracy of the MD integrator
- More accurate methods could help scaling
- Creutz and Gocksch 89 proposed higher order symplectic methods for HMC
- In MD, however, these methods are more expensive than the scaling gain. They need more force evaluations per step

22

Overview

1. Motivation: sampling conformational space of proteins

2. Methods for sampling (MD, HMC)

3. Evaluation of new Shadow HMC

4. Future applications

23

Improved HMC

- Symplectic integrators conserve exactly (within roundoff error) a modified Hamiltonian that for short MD simulations (such as in HMC) stays close to the true Hamiltonian, Sanz-Serna & Calvo 94
- Our idea is to use highly accurate approximations to the modified Hamiltonian in order to improve the scaling of HMC
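The exact conservation of a modified Hamiltonian can be seen on a toy problem. For a unit-mass harmonic oscillator integrated with velocity Verlet, the quantity p²/2 + ω²(1 - (ωΔt)²/4) q²/2 is invariant under each step, while the true energy p²/2 + ω²q²/2 oscillates. The sketch below checks this numerically; the oscillator, the parameter values, and the function names are assumptions for illustration, and this closed form is specific to the harmonic case (it is not the general shadow Hamiltonian construction).

```python
def velocity_verlet_step(q, p, omega, dt):
    """One velocity Verlet step for H = p^2/2 + omega^2 q^2 / 2 (unit mass)."""
    p -= 0.5 * dt * omega**2 * q
    q += dt * p
    p -= 0.5 * dt * omega**2 * q
    return q, p

def true_energy(q, p, omega):
    return 0.5 * p * p + 0.5 * omega**2 * q * q

def modified_energy(q, p, omega, dt):
    """Quantity conserved exactly by velocity Verlet for this oscillator."""
    return 0.5 * p * p + 0.5 * omega**2 * (1.0 - (omega * dt) ** 2 / 4.0) * q * q

omega, dt = 1.0, 0.5
q, p = 1.0, 0.0
h0 = true_energy(q, p, omega)
m0 = modified_energy(q, p, omega, dt)
max_h_err = 0.0
max_m_err = 0.0
for _ in range(1000):
    q, p = velocity_verlet_step(q, p, omega, dt)
    max_h_err = max(max_h_err, abs(true_energy(q, p, omega) - h0))
    max_m_err = max(max_m_err, abs(modified_energy(q, p, omega, dt) - m0))

print(f"max |H - H0|   = {max_h_err:.3e}")   # oscillates at order dt^2
print(f"max |HM - HM0| = {max_m_err:.3e}")   # stays at roundoff level
```

An acceptance test based on the modified quantity would therefore see almost no energy change along the trajectory, which is the intuition behind replacing H with an accurate approximation to the modified Hamiltonian.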

24

Shadow Hamiltonian

- Work by Skeel and Hardy, 2001, shows how to compute an arbitrarily accurate approximation to the modified Hamiltonian, called the Shadow Hamiltonian
- Hamiltonian: $H = \frac{1}{2} p^T M^{-1} p + U(q)$
- Modified Hamiltonian: $H_M = H + O(\Delta t^{p})$
- Shadow Hamiltonian: $SH_{2p} = H_M + O(\Delta t^{2p})$
  - Arbitrary accuracy
  - Easy to compute
  - Stable energy graph
- Example: $SH_4 = H - f(q_{n-1}, q_{n-2}, p_{n-1}, p_{n-2}, \beta_{n-1}, \beta_{n-2})$

25

See comparison of SHADOW and ENERGY

26

Shadow HMC

- Replace total energy ΔH with shadow energy: $\Delta SH_{2m} = SH_{2m}(q', p') - SH_{2m}(q, p)$
- Nearly linear scalability of sampling rate
- Computational cost of SHMC: $N^{1 + 1/(2m)}$, where m is accuracy order of integrator
- Extra storage (m copies of q and p)
- Moderate overhead (25% for small proteins)

27

Example Shadow Hamiltonian

28

ProtoMol: a framework for MD

- Front-end: libfrontend
- Middle layer: libintegrators
- Back-end: libbase, libtopology, libparallel, libforces

Modular design of ProtoMol (Prototyping Molecular dynamics). Available at http://www.cse.nd.edu/~lcls/protomol

Matthey, et al, ACM Trans. Math. Software (TOMS), submitted

29

SHMC implementation

- Shadow Hamiltonian requires propagation of β
- Can work for any integrator

30

Systems tested

31

Sampling Metric 1

- Generate a plot of dihedral angle vs. energy for each angle
- Find local maxima
- Label ‘bins’ between maxima
- For each dihedral angle, print the label of the energy bin that it is currently in
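A possible reading of this metric is sketched below. The details are assumptions, since the slide gives only the outline: the energy profile is taken as already sampled on a regular angle grid, the profile is treated as periodic, and the threefold torsion profile 1 + cos(3φ) is a made-up example. The function names are hypothetical.

```python
import bisect
import math

def local_maxima(angles, energies):
    """Angles at which a periodic energy profile has a strict local maximum."""
    n = len(energies)
    return [
        angles[i]
        for i in range(n)
        if energies[i] > energies[i - 1] and energies[i] > energies[(i + 1) % n]
    ]

def bin_label(angle, maxima):
    """Index of the bin between consecutive maxima that contains `angle`.

    Assumes at least one maximum. Angles past the last maximum wrap around
    to bin 0, because the dihedral angle is periodic.
    """
    return bisect.bisect_right(maxima, angle) % len(maxima)

# Assumed example profile: threefold torsion, sampled every 10 degrees.
grid = list(range(-180, 180, 10))
profile = [1.0 + math.cos(3.0 * math.radians(a)) for a in grid]
maxima = local_maxima(grid, profile)

# A conformation is then labelled by the bins of all its dihedral angles.
conformation = [-60.0, 60.0, 175.0]
labels = tuple(bin_label(a, maxima) for a in conformation)
print(maxima, labels)
```

Two snapshots that stay inside the same energy wells get the same label tuple, so thermal fluctuations within a well are not counted as new conformations.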

32

Sampling Metric 2

- Round each dihedral angle to the nearest degree
- Print label according to degree
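This metric is simple enough to sketch directly. The version below is an illustration under assumptions not stated on the slide: angles are given in degrees, the rounded value is reduced modulo 360 so that -180 and 180 receive the same label, and a "conformation" is the tuple of labels over all dihedral angles in a snapshot. The function names and the sample trajectory are made up.

```python
def conformation_label(dihedrals_deg):
    """Round each dihedral to the nearest degree; wrap into [0, 360)."""
    return tuple(round(a) % 360 for a in dihedrals_deg)

def count_distinct_conformations(trajectory):
    """Number of distinct labels seen over a list of snapshots."""
    seen = set()
    for snapshot in trajectory:
        seen.add(conformation_label(snapshot))
    return len(seen)

# Hypothetical trajectory: three snapshots of two dihedral angles each.
trajectory = [
    [60.2, -179.6],
    [59.9, 179.7],   # same label as the first snapshot after rounding
    [-60.4, 180.0],
]
n_distinct = count_distinct_conformations(trajectory)
print(n_distinct)
```

Compared with the bin-based metric, this one is much finer grained: a change of one degree in any angle produces a new label.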

33

Acceptance Rates

34

More Acceptance Rates

35

Sampling rate for decaalanine (dt = 2 fs)

36

Sampling rate for 2mlt

37

Sampling rate comparison

- Cost per conformation is total simulation time divided by number of new conformations discovered (2mlt, dt = 0.5 fs)
  - HMC: 122 s/conformation
  - SHMC: 16 s/conformation
- HMC discovered 270 conformations in 33000 seconds
- SHMC discovered 2340 conformations in 38000 seconds
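The cost figures follow directly from the run totals on this slide. The short check below only reproduces that arithmetic; the variable names are arbitrary.

```python
# (total simulation time in seconds, new conformations discovered), from the slide
runs = {"HMC": (33000, 270), "SHMC": (38000, 2340)}

# Cost per conformation = total time / number of new conformations.
cost = {name: seconds / found for name, (seconds, found) in runs.items()}
ratio = cost["HMC"] / cost["SHMC"]

for name, value in cost.items():
    print(f"{name}: {value:.1f} s/conformation")
print(f"HMC cost / SHMC cost = {ratio:.1f}")
```

The quotients round to the 122 and 16 s/conformation quoted above, so for this system SHMC is between 7 and 8 times cheaper per new conformation.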

38

Conclusions

- SHMC has a much higher acceptance rate, particularly as system size and timestep increase
- SHMC discovers new conformations more quickly
- SHMC requires extra storage and moderate overhead
- SHMC works best at relatively large timesteps

39

Future work

- Multiscale problems for rugged energy surface
  - Multiple time stepping algorithms plus constraining
  - Temperature tempering and multicanonical ensemble
  - Potential smoothing
- System size
  - Parallel Multigrid O(N) electrostatics
- Applications
  - Free energy estimation for drug design
  - Folding and metastable conformations
  - Average estimation

40

Acknowledgments

- Dr. Thierry Matthey, co-developer of ProtoMol, University of Bergen, Norway
- Graduate students: Qun Ma, Alice Ko, Yao Wang, Trevor Cickovski
- Students in CSE 598K, “Computational Biology,” Spring 2002
- Dr. Robert Skeel, Dr. Ruhong Zhou, and Dr. Christoph Schutte for valuable discussions
- Dr. Radford Neal’s presentation “Markov Chain Sampling Using Hamiltonian Dynamics” (http://www.cs.utoronto.ca)
- Dr. Klaus Schulten’s presentation “An introduction to molecular dynamics simulations” (http://www.ks.uiuc.edu)