Parameterisation of a custom amino-acid, PCA

Post on 08-Jan-2016


Contents

• Force-fields and how they work
– What kind of interactions do we need to parameterise?
• Quantum mechanical calculations
– Where force-fields obtain basic information
• CHARMM parameterisation process
– The goal of developers => follow the same goal
• Worked example of pyroglutamic acid

What this talk will cover

• It will:
– Teach the necessary background
– Advise on starting points for parameterisation, by analogy with existing compounds
– Follow the process for a CHARMM-style strategy, using pyroglutamic acid under CHARMM27
– We will be following a published process from the developers
• It will not:
– Teach you how to create a force-field from scratch
– Full parameterisation is a long process, with many potential pitfalls
– This applies to new atoms and geometries not seen in existing force-fields
– If your molecule falls under this category, it is best to ask the experts whether someone else has already done it

NB: read Vanommeslaeghe's home page and tutorials: http://dogmans.umaryland.edu/~kenno/#CGenFF

1: Force fields

• All force-fields have a purpose
– There are several force-fields, each with its own goals (e.g. following the developers' research interests)
– They are thus not fully compatible with each other
• All force-fields are made from simple components
– They are not really black boxes
– Each term can be fully understood
– Therefore, researchers like us can make force-fields
• I will describe the CHARMM force-fields with these points in mind

Purpose of force-fields

• Force-fields are used to replicate chemical or biological environments.
– This means they should produce reasonable results compared to experimental data
• Force-fields must also be derived from quantum mechanical (QM) calculations for molecular accuracy
– i.e. FFs must approximate the behaviour of electrons and atomic bonds
– FFs must choose basic functions with which to do this
• FF developers must choose what kind of environment or interaction they will replicate
• Thus, FFs must make choices
– AMBER targets protein conformations
– OPLS-AA targets organic liquid environments
– CHARMM targets aqueous solvent interactions and energies
• ∴ Many force-fields can be used for task X, but some will perform better than others.

Purpose of force-fields

Due to these differences:

• Force-fields have an optimal range of performance, and limits
– E.g. GROMOS works particularly well for studies of lipid behaviour
– Results may be surprising if force-fields are used in exotic conditions
• Force-fields should not be mixed with each other
– The detailed mechanics are different

Analogy: TIP3 water does not melt or boil at the expected temperatures; it was never meant to!

CHARMM-CGenFF force field

• Basis: General force-field for all organic molecules
– Match MM properties to QM calculations (MP2/6-31G*)
– Minimum necessary work, to allow parameters to be shared across many different molecules
– Can be used for, e.g., drug-binding studies in MD
• Overall process for parametrisation
– Charges <= Dipole moments, interaction with TIP3 water
– Bond and angle equilibrium values <= QM equilibrium geometry
– Bond and angle force constants <= Vibrational modes
– Dihedral force constants <= Potential energy scans
– Van der Waals <= Heat of solvation, molecular volume

CHARMM force field for biomolecules

• Basis: Energetically accurate set of parameters
– Replicate observed experimental quantities specific to proteins, nucleic acids, etc.
– i.e. the aim is for CHARMM proteins/DNA to act exactly like their counterparts in real life
• Overall process for parametrisation
– Fit parameters as in CGenFF.
– Modify to fit relevant experimental observations. This means heat of solvation, solvent/solute interactions, etc.
– Operating conditions: this force-field is optimised for room-temperature, liquid-phase experiments
– Similar process: CGenFF can be combined with CHARMM

Comparison: AMBER’s protein FF

• Basis: To replicate protein behaviour by modelling amino acid conformations accurately
– (subtle difference to CHARMM)
• Process:
– Point charges fit to higher-level QM calculations (B3LYP/cc-pVTZ//HF/6-31G**) at dielectric constant = 4
– Bonds and angles matched to X-ray crystal structures and vibrational spectra
– Torsions fit to reproduce peptide conformations and phi-psi energy surfaces.
• Performs similarly to CHARMM by different means
– Complete amino acids are parameterised, whereas CHARMM uses fragments with known experimental data
– More dependent on QM and biochemical observations, less on strict chemical data

Inner workings of CHARMM

• CHARMM and CGenFF (from Duan et al., 2003):
– Harmonic bonds
– Harmonic angles (modified)
– Harmonic impropers
– Cosine dihedrals
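
For reference, the terms listed above are commonly written together as the CHARMM potential energy function shown below. This is a sketch of the standard form rather than something taken from the slides: it assumes that "(modified)" refers to the Urey-Bradley 1-3 term added to the harmonic angles, and it omits the CMAP backbone correction. Note that the harmonic terms carry no factor of 1/2.

```latex
\begin{aligned}
U ={}& \sum_{\text{bonds}} K_b (b - b_0)^2
     + \sum_{\text{angles}} K_\theta (\theta - \theta_0)^2
     + \sum_{\text{Urey-Bradley}} K_{UB} (S - S_0)^2 \\
   & + \sum_{\text{dihedrals}} K_\phi \left[ 1 + \cos(n\phi - \delta) \right]
     + \sum_{\text{impropers}} K_\omega (\omega - \omega_0)^2 \\
   & + \sum_{i<j} \left\{ \varepsilon_{ij} \left[ \left( \frac{R^{\min}_{ij}}{r_{ij}} \right)^{12}
     - 2 \left( \frac{R^{\min}_{ij}}{r_{ij}} \right)^{6} \right]
     + \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}} \right\}
\end{aligned}
```

Each equilibrium value (b_0, θ_0, ω_0), force constant (K) and charge (q) in this expression is one of the quantities that the parameterisation steps later in the talk set out to fit.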

Inner workings of CHARMM

Compositions
• Basic interactions between bonded atoms are harmonic potentials
– This reduces computational cost by using simple functions and eliminating electrons from the equation
• Electrostatics and vdW forces are preserved
– They are essential to molecular systems!

Observations
• Intra-molecular terms are very simple in nature
• They must be fitted to approximate real covalently bonded molecules
• QM calculations form the basis of these target data
• During parameterisation, we won’t usually need to modify vdW terms

QM calculations: Gaussian

• CHARMM developers use Gaussian to produce target data
– The process is documented for Gaussian
– However, one can use other QM programs
• I’ll explain what calculations are required and how to do them

Using Gaussian with GaussView

Gaussian: Scripting

• Textual interface provides all necessary input
– One file for one simulation
• GaussView (GUI) is provided to assist the process
• The QM level for parameterisation is MP2/6-31G*
– Enough to describe common interactions
– Probably not sufficient for certain organometallic interactions

Gaussian: Scripting

[Figure: annotated Gaussian input file, with these parts labelled]
• Conditions and general purpose of this simulation
• Title
• Coordinates: different representations possible
• Detailed commands, follow-on simulations, etc.
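
The labelled parts above correspond to the sections of a plain-text Gaussian input file: an optional Link 0 line, the route section (conditions and job type), a title, and the molecule specification, each closed by a blank line. The sketch below writes such a file for a geometry optimisation at the MP2/6-31G* level mentioned earlier. The function name, the checkpoint file name and the water coordinates are illustrative assumptions, not taken from the talk.

```python
def gaussian_input(route, title, charge, multiplicity, atoms, checkpoint=None):
    """Return the text of a minimal Gaussian input file.

    atoms is a list of (element, x, y, z) tuples, Cartesian coordinates in Angstrom.
    """
    lines = []
    if checkpoint:
        lines.append(f"%chk={checkpoint}")      # optional Link 0 line
    lines.append(f"# {route}")                  # route section: method, basis set, job type
    lines.append("")                            # blank line closes the route section
    lines.append(title)                         # free-text title
    lines.append("")                            # blank line closes the title section
    lines.append(f"{charge} {multiplicity}")    # net charge and spin multiplicity
    for element, x, y, z in atoms:
        lines.append(f"{element:<2s} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                            # blank line closes the molecule specification
    return "\n".join(lines) + "\n"


# Hypothetical example: approximate water geometry, used only to show the layout.
water = [
    ("O", 0.000, 0.000, 0.117),
    ("H", 0.000, 0.757, -0.469),
    ("H", 0.000, -0.757, -0.469),
]
text = gaussian_input("MP2/6-31G* Opt", "Water geometry optimisation",
                      charge=0, multiplicity=1, atoms=water,
                      checkpoint="water.chk")
print(text)
```

Additional input for follow-on jobs, where needed, would go after the final blank line; that part is left out of this sketch.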

QM calculations required:

1: Equilibrium geometry

2: Vibrational spectra

3: Dihedral energy surfaces

NB: CHARMM website provides a fully worked tutorial

CHARMM parametrisation process

• Priorities of different parameters
– The energetics ultimately need to be replicated to < 1 kcal/mol
– Some parameters have widespread effects, others fill in important details
• Flowchart
– Parameterisation is iterative, and will take time
• Illustrate the process with pyroglutamic acid

CHARMM parametrisation process

• The priority of different parameters:
– Charges
– Equilibrium bond and angle values
– Bond and angle force constants
– Torsions
• This order runs throughout parameterisation
• Each set of parameters depends on everything above it
• Hence, refinement of parameters follows this loop:
– Create/modify data
– Refine, check results
– Repeat.
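
Because each set of parameters depends on everything above it, a change at one level means that level and every level below it must be checked again. A minimal sketch of that bookkeeping follows; the stage names and the function are illustrative and are not part of any CHARMM tool.

```python
# Stage order taken from the priority list above; the names are illustrative.
STAGES = [
    "charges",
    "equilibrium bonds/angles",
    "bond/angle force constants",
    "torsions",
]


def stages_to_recheck(changed_stage):
    """Return the changed stage plus every stage below it in the priority list."""
    start = STAGES.index(changed_stage)
    return STAGES[start:]


print(stages_to_recheck("equilibrium bonds/angles"))
```

Changing the charges therefore triggers a pass through the whole list, which is why it pays to settle the upper levels first.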

Flowchart and notations

• The rest of this talk proceeds as follows:
• 1 – Prepare entries and coordinates
• 2 – Optimise charges
• 3 – Optimise bonds and angles
– 3.1 – Equilibrium values by QM geometry
– 3.2 – Force constants by molecular vibrations
• 4 – Optimise dihedrals
– 4.1 – Generate potential energy scans
– 4.2 – Dihedral constants by matching and chemical knowledge
• 5 – Validation

Individual Processes

[Flowchart: three fitting loops, each repeated until the MM and QM results match]

• Charges and vdW: Set initial topology and geometry => QM-minimised geometry with water molecules => Set charges and vdW => MM-minimised geometry with water molecules => Do they match? => Charge and vdW fits complete
• Bonds and angles: Set initial geometry => QM-minimised geometry => Modify bonds and angles => MM-minimised geometry => Do they match? => Bond and angle fits complete
• Force constants: Calculate QM vib. spec. in CHARMM => Modify bond/angle force constants => Calculate MM vib. spec. in CHARMM => Do they match? => Bond/angle force constants complete
• ...etc.

Validation

• After the last fit to dihedrals is complete, all outputs need to be verified
– Starting again from partial charges and water interaction simulations...
• Any change is likely to propagate downwards through the flowchart
– This is the main reason why parameterisation is time-consuming

[Flowchart: First parametrisation => Do charges => Do bonds and angles => Do dihedrals => Validate all data => All complete]

Worked example:

• Pyroglutamic acid
– As its name suggests, an amino acid
– N-terminal only; formed by cyclisation of glutamic acid or by enzymatic activity
– Used in protein-ligand simulations with scorpion toxins.

1.1: prepare topology (idea)

• Observe existing molecules in the database
– Which chemical groups make up your molecule?
• Cut and paste sections together, borrowing charges and groupings

1.1: prepare topology (details)

• Adapt from existing residues.
– PRO as the base residue, with reference to GLN and backbone values.
• Consider its use and clashes with existing parameters
• Right now, backbone angle and torsion parameters will be used (a bad idea for a cyclic molecule)
• Change atom types to allow modification of the important bonds and angles
• NB: Creating new atom types is not preferred, as it makes future work more difficult

1.1: atom-typings

• From the existing ff:
• Neutral Ns in CHARMM
– NH2 is primary amine
– NH1 is secondary amine
– N is tertiary amine
• Carboxyl Cs in CHARMM
– CC/CD are proteins
– CE1/CE2 are elementary alkenes
• Use NH1/CC typing; no need to modify O and H.
• Topology done! Return to ICs later.

1.1: prepare parameters

• Work out the required bonds, angles and dihedrals
– Running CHARMM can help, as it stops with a warning when bond and angle parameters are missing
• Borrow known bond and angle values
– PRO provides most of the existing ones
– CGenFF has the same philosophy in making bonded parameters
• 2PDO is a very similar molecule in CGenFF. Use its values for dihedrals as an initial guess.
• TIP: Establish what you *need* to parameterise now, and change only those in future edits.

1.2: Create molecule for Gaussian

• The starting geometry will affect results
– We grab some experimental coordinates from, e.g., DrugBank or PDB
– These coordinates can also be created with chemical software
• NB: CHARMM uses IC tables to generate the residues when coordinates are not given
– Either paste by analogy or transfer from the minimised geometry

1.2: QM Minimisation

• Gaussian outputs in formats offering more precision than PDB
– Outputs all bond, angle and dihedral data
– File conversions may be necessary for visualisation

1.3: Create IC table

• CHARMM manual entry(!): Each entry in the IC table (see below) lists 4 connected atoms, I, J, K and L; for a normal IC table entry, the I-J bond length, R(IJ); the I-J-K bond angle, T(IJK); the dihedral angle I-J-K-L, PHI; the bond angle T(JKL); and the K-L bond length, R(KL) are listed. Improper dihedral angles, which are used to keep sp2 atoms planar and sp3 atoms in a tetrahedral geometry, are marked with a star. The center atom of an improper dihedral angle is marked with a star. For an improper dihedral angle entry, the I-K bond length, R(IK); the I-K-J bond angle, T(IKJ); the dihedral angle I-J-K-L, PHI; the J-K-L bond angle T(JKL); and the K-L bond length, R(KL) are listed. The atom entry "-99" indicates an undefined atom.

• CHARMM begins with a seed of three atoms, then defines the rest of the molecule from these three
– Protein conventions follow backbone N-CA-C
• When you create an IC table, remember to base your first entries on these and ‘grow’ the molecule from there

1.3: Create IC table

Normal entries:

Atom set      Bond I-J   Angle I-J-K   Dihedral I-J-K-L   Angle J-K-L   Bond K-L
N  CA C   O   1.441      111.58        22.93              120.82        1.23
CB CA C   O   1.545      110.65        -172.00            120.82        1.23

Improper entries (central atom starred):

Atom set      Bond I-K   Angle I-K-J   Dihedral I-J-K-L   Angle J-K-L   Bond K-L
N  C  *CA HA  1.441      110.86        -122.40            109.09        1.102
N  C  *CA CB  1.441      110.86        113.74             110.65        1.545

1.3: IC table notes

• CHARMM does not need every possible entry in a given IC table
– Only the dihedrals are necessary
• The command “IC PARAM” will fill in bonds/angles using existing parameters
– This will save you a lot of time
• Number of entries: about 1 less than the total number of atoms.

2: Optimise charges

• In the CHARMM ff, protein charges are not parameterised by QM calculations alone.
• Consistent behaviour with other amino acids means that PCA should take similar charges, rather than QM data per se
– Charges obtained by analogy are “good enough” when compared with QM data
– There are no unique chemical groups that would require charge optimisation
• This step is thus skipped for PCA
• I will show you a mock example

2.1: Optimise charges

• Begin with charges from Gaussian/analogy

• (can read from output)

2.1: Optimise charges

• Run the water interaction simulation and also check the molecular dipole
– i.e. imitate H-bonding interactions with charges
• Modify charges to fit both sets of data
• NB: The MM dipole needs to overestimate the QM dipole by 30-50%, since the QM data is in vacuum
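
The dipole check can be done by hand from the point charges. The sketch below computes the dipole magnitude of a set of point charges and the percentage by which an MM dipole exceeds a QM one. The function names are illustrative; TIP3P water is used as the example only because its charges and geometry are widely published, and for a neutral molecule the result does not depend on the choice of origin.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320   # 1 e*Angstrom expressed in Debye


def dipole_debye(charges, coords):
    """Dipole magnitude (Debye) of point charges (e) at positions (Angstrom)."""
    mu = [sum(q * r[axis] for q, r in zip(charges, coords)) for axis in range(3)]
    return E_ANGSTROM_TO_DEBYE * math.sqrt(sum(c * c for c in mu))


def overestimate_percent(mm_dipole, qm_dipole):
    """Percentage by which the MM dipole exceeds the QM dipole."""
    return 100.0 * (mm_dipole - qm_dipole) / qm_dipole


# TIP3P water: O-H = 0.9572 Angstrom, H-O-H = 104.52 degrees,
# q(O) = -0.834 e, q(H) = +0.417 e.
half_angle = math.radians(104.52 / 2.0)
r_oh = 0.9572
tip3p_coords = [
    (0.0, 0.0, 0.0),
    (r_oh * math.sin(half_angle), r_oh * math.cos(half_angle), 0.0),
    (-r_oh * math.sin(half_angle), r_oh * math.cos(half_angle), 0.0),
]
tip3p_charges = [-0.834, 0.417, 0.417]

print(dipole_debye(tip3p_charges, tip3p_coords))   # roughly 2.35 D
```

For a new residue, the same two functions would be applied to the trial MM charges and compared with the dipole read from the Gaussian output.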

3: Optimise bonds and angles

• Begin by setting equilibrium values to QM or crystal values
• Compare results and modify equilibrium values
– Using an IC table built by analogy gives CHARMM the wrong ring conformation
– I constructed an IC table to start PCA near the other minimum. CHARMM then finds the equatorial conformer
• It is important to check that your conformation still agrees after the dihedrals are fitted

3.1: Bond/angle equilibrium values

• CHARMM developers used 0.2 Å and 3° as the upper limits
– We can usually do much better
– Developers seek to use the same parameters to describe many ligands; we do not need to
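
A simple way to apply these limits is to tabulate the MM and QM internal coordinates and flag any that differ by more than the chosen tolerance. In the sketch below, the default limits are the values quoted above, while the function name, the data layout (tuples of atom names as keys) and the numbers in the example are hypothetical.

```python
def out_of_tolerance(qm, mm, bond_limit=0.2, angle_limit=3.0):
    """Return {atoms: MM - QM} for every coordinate outside its limit.

    Keys are tuples of atom names: 2 names for a bond (Angstrom),
    3 names for an angle (degrees).
    """
    failures = {}
    for atoms, qm_value in qm.items():
        limit = bond_limit if len(atoms) == 2 else angle_limit
        deviation = mm[atoms] - qm_value
        if abs(deviation) > limit:
            failures[atoms] = deviation
    return failures


# Hypothetical numbers, for illustration only.
qm_geometry = {("N", "CA"): 1.45, ("CA", "C"): 1.53, ("N", "CA", "C"): 111.0}
mm_geometry = {("N", "CA"): 1.46, ("CA", "C"): 1.52, ("N", "CA", "C"): 115.5}

print(out_of_tolerance(qm_geometry, mm_geometry))
```

Since a single residue can usually be fitted more tightly than the developers' limits, passing smaller values for `bond_limit` and `angle_limit` is a reasonable choice.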

3.2: Bond/angle force constants

• Method: Comparison of QM and MM vibrational spectra
• Analogous residues are good starting points for force constants
• However, perfect agreement is impossible
– This is due to differences between the MM and QM minima, and mixing with torsion parameters
– May need to modify again during validation
– A “general” agreement (about 10%) is good enough when all parameterisation is finished
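
The "about 10%" criterion can be checked with a short script once both sets of frequencies are available. The sketch below pairs the frequencies in sorted order, which is a simplifying assumption: a careful comparison matches modes by their character (which bonds and angles move), not just by position in the list. The function names and the frequencies in the example are hypothetical.

```python
def frequency_deviations(qm_freqs, mm_freqs):
    """Percentage deviation of each MM frequency from its QM partner.

    Frequencies are paired in ascending order (a simplification).
    """
    if len(qm_freqs) != len(mm_freqs):
        raise ValueError("QM and MM spectra must contain the same number of modes")
    pairs = zip(sorted(qm_freqs), sorted(mm_freqs))
    return [100.0 * (mm - qm) / qm for qm, mm in pairs]


def within_tolerance(qm_freqs, mm_freqs, percent=10.0):
    """True if every paired frequency agrees to within the given percentage."""
    return all(abs(d) <= percent for d in frequency_deviations(qm_freqs, mm_freqs))


# Hypothetical frequencies in cm^-1, for illustration only.
qm_spectrum = [1000.0, 1500.0, 3000.0]
mm_spectrum = [1050.0, 1400.0, 3100.0]

print(frequency_deviations(qm_spectrum, mm_spectrum))
print(within_tolerance(qm_spectrum, mm_spectrum))
```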

3.2: Vibrational spectra

• Vibrations can be collected into components
– A change in dipole determines IR absorption
• Components are defined by motions of collective parts
– Bond stretches and angle bending
– Rocking, scissoring, wagging motions
• Tip: revise IR and Raman spectroscopy

[Figure: vibrational spectrum with labelled regions: dihedrals and out-of-plane motion; N-C stretch; C=O stretch; C-H stretch; N-H, O-H stretch]

3.2: Vibrational notation

• In CHARMM, you need to convert motions of atoms into bonds, angles and dihedrals (internal coordinates, IC)
– These form a basis set of all the degrees of freedom
– # Vibrations = DoF = 3N-6
– CHARMM then uses these to fit vibrational spectra
• Then you need to convert these ICs into vibrational modes
– Read Pulay et al. (1979)

NB: I wrote a relatively simple tutorial on the CHARMM forums to run users through the Pulay conversion for water and propane.
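
The 3N-6 count tells you how many independent internal coordinates the basis set must contain. A small sketch, applied to the two tutorial molecules named above: the function name is illustrative, and the 3N-5 case for linear molecules is a standard result that the slides do not mention.

```python
def vibrational_modes(n_atoms, linear=False):
    """Number of vibrational degrees of freedom: 3N-6, or 3N-5 if linear."""
    if n_atoms < 2:
        raise ValueError("a molecule needs at least two atoms to vibrate")
    return 3 * n_atoms - (5 if linear else 6)


print(vibrational_modes(3))    # water, H2O: 3 atoms
print(vibrational_modes(11))   # propane, C3H8: 11 atoms
```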

[Figure panels: QM, MM]

• Visualising these vibrations with GaussView and others will help you identify important vibrations

3.2: Bond/angle force constants

• NB: Remember there is a distinction between fitting to QM calculations, and fitting to experimental spectra

4: Optimise dihedrals

• Potential energy scans involve: fixing a dihedral at discrete points, minimising the rest of the geometry, and calculating the absolute energy.
• This determines the conformational preferences of residues, which is especially important for packing

4: Optimise dihedrals

• PCA with the initial dihedrals favours the opposite ring conformer (top)
• One will need to work out which parameters affect the rotations you need
• After a series of fits and rationales for given parameters, a closer agreement can be obtained (bottom)

Some rationales for the PCA case:

• Fit only dihedrals that contain re-typed atoms
– This includes NH1 and CC
• Carboxyl backbone prefers equatorial over axial orientation w.r.t. the ring
– Fit a single 1-fold or 2-fold dihedral to CD-N-CA-C to express this
• Keep all the 2PDO dihedrals as they are, except where necessary
– So, the planar amide retains 2-fold only, due to symmetry
– Ring dihedrals contain only 3-fold
– Modify numbers to produce the correct energy surface
• The rest is heuristic searching

• Pointers for a good fit:
– Single parameters of 1-fold, 2-fold, 3-fold, 6-fold.
– Accuracy to about 0.2 kcal/mol
– Attention to barrier heights and the positions of relative minima

• Using multiple dihedral parameters and no restrictions on phase, one can achieve a fit like this.

• The energy surface is very close to QM. However, the parameters used are arguably unphysical
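
To see how multiplicity and phase shape an energy surface, the sketch below evaluates the cosine dihedral term K(1 + cos(nφ − δ)) for a list of terms and scores a profile against a target scan by RMSD. It is an illustration under simplifying assumptions: in a real fit the MM profile comes from constrained minimisations with the full force-field, not from the dihedral term alone, and all names and numbers here are hypothetical.

```python
import math


def dihedral_energy(phi_deg, terms):
    """Sum of K * (1 + cos(n*phi - delta)) over terms given as (K, n, delta_deg)."""
    phi = math.radians(phi_deg)
    return sum(k * (1.0 + math.cos(n * phi - math.radians(delta)))
               for k, n, delta in terms)


def relative_profile(angles_deg, terms):
    """Energies along a scan, shifted so that the lowest point is zero."""
    energies = [dihedral_energy(a, terms) for a in angles_deg]
    lowest = min(energies)
    return [e - lowest for e in energies]


def rmsd(profile_a, profile_b):
    """Root-mean-square difference between two profiles of equal length."""
    squared = [(a - b) ** 2 for a, b in zip(profile_a, profile_b)]
    return math.sqrt(sum(squared) / len(squared))


scan_angles = list(range(-180, 180, 15))

# Hypothetical parameters: a restrained single 2-fold term, and a
# multi-term set with unrestricted phases of the kind warned about above.
single_term = [(1.0, 2, 180.0)]
multi_term = [(0.8, 2, 170.0), (0.3, 3, 25.0), (0.1, 1, 60.0)]

target = relative_profile(scan_angles, multi_term)      # stand-in for a QM scan
trial = relative_profile(scan_angles, single_term)
print(rmsd(trial, target))
```

Keeping the phases at 0° or 180° and using as few multiplicities as possible usually costs a little RMSD but yields parameters that are easier to justify chemically.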

5: Iteration

• Now that you have your molecule, re-run all the tests and check whether discrepancies have arisen during the process
• The molecular vibrational spectrum should look better
– Dihedrals factor into the lower vibrational modes
• Adjust as necessary.

5: Comments

• You can spend as much time as you wish tweaking the numbers, but keep in mind that:
– Simulation errors are going to be larger than the residual errors in your parameters
– Even experimental accuracy is 1 kcal/mol
• If it is convenient, post the major results and check with the developers (be nice)

Finished product

• Once you are satisfied with the result, it’s time to test the new molecule in MD
• As force-fields are always in development...
– If you did your parameterisation well, CHARMM developers may add it to the collection.
• Good luck!