Structural bioinformatics for glycobiology

Structural glycoinformatics approaches

• Structural modeling
– Comparative modeling of glycoproteins
– Complex modeling: glycoprotein replacement

• Modeling of the complexes of glycans with glycan-binding proteins (GBPs) and glycosyltransferases (GTs)
– Docking
– Analysis of interaction specificities

• Key residues vs. specific glycan conformations

• Molecular dynamics
– Modeling the dynamics of the recognition of glycans by GBPs
– Modeling the enzymology of GTs: quantum mechanical calculations

Approaches to predicting protein structures

[Flowchart: obtain sequence (target) → fold assignment (sequence-sequence alignment or sequence-structure alignment) → comparative modeling (high identity, long alignment) or ab initio modeling (low identity, fragment alignment) → build, assess model]

Comparative modeling of proteins

• Definition: prediction of the three-dimensional structure of a target protein from its amino acid sequence (primary structure), using a homologous (template) protein for which an X-ray or NMR structure is available.

• Why a model: a model is desirable when neither X-ray crystallography nor NMR spectroscopy can determine the structure of a protein in time, or at all. The built model provides a wealth of information on how the protein functions at the level of individual residues, e.g. its interactions with ligands, such as those of GBPs/GTs with glycans.

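Comparative modeling (or homology modeling)

[Figure: a target sequence of unknown structure ("??") and a homologous sequence of known structure share similar sequences; the known structure is used as the template to build the model. PDB entries shown: 8lyz, 1alc.]

Target sequence (structure unknown):
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE

Homologous sequence:
KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL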

Homology models have RMSDs less than 2Å more than 70% of the time.

Homology models can be very smart!

Sequence similarity implies structural similarity?

[Plot: percentage sequence identity/similarity (0 to 100) against number of residues aligned (0 to 250). Regions labeled "Sequence identity implies structural similarity" and "Don't know region". (B. Rost, Columbia, New York)]

Step 1: Fold identification
Aim: to find one or more template structures in the Protein Data Bank (PDB).

• Pairwise sequence alignment finds high-homology sequences: BLAST
• Improved multiple sequence alignment methods improve sensitivity to remote homologs: PSI-BLAST, CLUSTAL
• Fold recognition programs find low-homology sequences: threading, profile-profile alignment
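To make the idea of pairwise alignment and percentage identity concrete, here is a minimal sketch of a global (Needleman-Wunsch) alignment. It is not how BLAST works internally. The unit match/mismatch scores, the linear gap penalty and the function names are assumptions chosen for illustration; real tools use substitution matrices and affine gap penalties. Identity is counted here over aligned, non-gap columns, which is only one of several definitions in use.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Percentage of identical residues over aligned (non-gap) columns."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # First 20 residues of the two sequences shown earlier in the slides.
    target, template = "KQFTKCELSQNLYDIDGYGR", "KVFGRCELAAAMKRHGLDNY"
    aln_t, aln_p = needleman_wunsch(target, template)
    print(aln_t)
    print(aln_p)
    print(percent_identity(aln_t, aln_p))
```

The identity value, together with the number of residues aligned, is what places a target/template pair on the identity-versus-length plot.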

Step 2: Model construction
Aim: to build the three-dimensional (3D) structure of the target protein, i.e. the coordinates of every atom, from the homologous template.

Approach 1: protein structure buildup: cores, loops and sidechains.

Approach 2: whole-protein modeling: constraint-based optimization.

Commonly used programs:
• Modeller (http://salilab.org/modeller/)
• Swiss-model (http://swissmodel.expasy.org/)
• Geno3D (http://geno3d-pbil.ibcp.fr/)
• …

Step 3: Model Construction

Modeling of glycan-protein complexes

• Template: glycan-protein complex
– Case 1: same glycan, different protein
    • Glycoprotein replacement: comparative modeling of protein structure
    • Energy minimization, allowing structural flexibility of glycans
– Case 2: same protein, different glycan
    • Flexible docking of glycans
– Case 3: different protein and different glycan
    • Comparative modeling of proteins
    • Flexible docking of glycan
    • Can also be applied without a template of the complex

Flexible docking

• Semi-flexible (rigid protein, flexible ligand)
– Useful for drug screening
– >150 programs: Dock, AutoDock, FlexX/FlexE, …
• Flexible protein: mainly sidechains (hard)
• Two elements of semi-flexible docking algorithms
– Ligand sampling methods
    • Pattern matching: Genetic Algorithm, Molecular Dynamics, Monte Carlo…
– Treatment of intermolecular forces
    • Simplified scoring functions: empirical, knowledge-based and molecular mechanics, e.g. AMBER, CHARMM, GROMOS, ...
    • Very simple treatment of solvation and entropy, or completely ignored!

Flexible docking of glycans to proteins

• Glycan structure sampling
– Automatic generation / sampling of 3D glycan structures: Sweet II (http://www.dkfz-heidelberg.de/spec/sweet2)
• Docking of each glycan conformation to the GBP: scoring schemes
– Empirical scores
– Forcefield
    • GLYCAM: modified AMBER forcefield / MD tools for glycans (R. Woods group)
– Challenge: water molecules

Flexibility of molecules

• Atoms connected by covalent bonds

• Bond lengths and bond angles are rigid

• Torsion (dihedral) angles are flexible

Frequently used definitions of glycosidic torsion angles

Angle             | NMR style     | C − 1 crystallographic style | C + 1 crystallographic style
ϕ                 | H1—C1—O—C′x   | O5—C1—O—C′x                  | O5—C1—O—C′x
ψ                 | C1—O—C′x—H′x  | C1—O—C′x—C′x−1               | C1—O—C′x—C′x+1
ψ [(1–6)-linkage] | C1—O—C′6—C′5  | C1—O—C′6—C′5                 | C1—O—C′6—C′5
ω [(1–6)-linkage] | O—C′6—C′5—H′5 | O—C′6—C′5—C′4                | O—C′6—C′5—O′5
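Each torsion angle defined above is a dihedral angle over four atoms. The following is a minimal sketch of that calculation from Cartesian coordinates. The function name and the coordinates in the usage example are hypothetical; which four atoms are passed depends on the convention chosen (e.g. O5, C1, O, C′x for ϕ in the crystallographic style).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    0 degrees is cis (eclipsed), +/-180 degrees is trans; the sign is
    positive for a clockwise rotation when looking from p1 towards p2.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane p0-p1-p2
    n2 = _cross(b2, b3)          # normal of the plane p1-p2-p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len   # proportional to sin(angle)
    x = _dot(n1, n2)                        # proportional to cos(angle)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical coordinates standing in for O5, C1, O, C'x.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Sampling glycan conformations then amounts to varying these angles while bond lengths and bond angles are held fixed.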

ASN

sweet2: http://www.dkfz-heidelberg.de/spec/sweet2/

Induced fit? Rigid receptor hypothesis

Preferred torsion angles of glycans

Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs

Chandrasekaran et al., Nature Biotechnology 26, 107–113 (2008)

M. E. Taylor and K. Drickamer, Glycobiology 2009 19(11):1155-1162

Combining structural analysis with glycan array analysis provides structural insights.


Ligand binding by the scavenger receptor C-type lectin (SRCL) and LSECtin


Binding of multiple classes of ligands to DC-SIGN and the macrophage galactose receptor. Model of the binding site in the macrophage galactose receptor with a bound GalNAc residue, based on the structure of the galactose-binding mutant of mannose-binding protein that was created by insertion of key binding site residues from the galactose-binding receptor.


Mechanisms of mannose-binding protein interaction with ligands.

Molecular Dynamics: simulation of molecular motions

• Energy model of conformation
• Two main approaches:
– Monte Carlo: stochastic
– Molecular dynamics: deterministic
• Understand molecular function and interactions
– Catalysis of enzymes
• Complementary to experiments
• Obtain a movie of the interacting molecules

Basic Concepts of simulation of molecular motion

1. Compute energy for the interaction between all pairs of atoms.
2. Move atoms to the next state.
3. Repeat.

Energy Function

• Target function that MD uses to govern the motion of molecules (atoms)
• Describes the interaction energies of all atoms and molecules in the system
• Always an approximation
– Closer to real physics means more realistic but more computation time (i.e. smaller time steps and more interactions increase accuracy)

Scale in Simulations

[Figure: simulation methods placed by length scale (10^-10 m to 10^-4 m) and time scale (10^-12 s to 10^-6 s). Labels: domain; quantum chemistry; molecular dynamics; Monte Carlo; mesoscale; continuum; F = MA; exp(-E/kT).]

Taken from Grant D. Smith, Department of Materials Science and Engineering, Department of Chemical and Fuels Engineering, University of Utah, http://www.che.utah.edu/~gdsmith/tutorials/tutorial1.ppt

The energy model

http://cmm.cit.nih.gov/modeling/guide_documents/molecular_mechanics_document.html

The NIH Guide to Molecular Modeling

• Proposed by Linus Pauling in the 1930s
• Bond angles and lengths are almost always the same
• Energy model broken up into two parts:
– Covalent terms
    • Bond distances (1-2 interactions)
    • Bond angles (1-3)
    • Dihedral angles (1-4)
– Non-covalent terms
    • Forces at a distance between all non-bonded atoms

The energy equation

Energy = Stretching Energy + Bending Energy + Torsion Energy + Non-Bonded Interaction Energy

These equations, together with the data (parameters) required to describe the behavior of different kinds of atoms and bonds, are called a force field.

Bond Stretching Energy

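$$E_{\text{stretch}} = \sum_{\text{bonds}} k_b (r - r_0)^2$$

$k_b$ is the spring constant of the bond.

$r_0$ is the bond length at equilibrium.

Unique $k_b$ and $r_0$ are assigned to each bonded pair of atoms based on their types (e.g. C-C, O-H).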

Bending Energy

$$E_{\text{bend}} = \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2$$

$k_\theta$ is the spring constant of the bend.

$\theta_0$ is the bond angle at equilibrium.

Unique parameters for angle bending are assigned to each bonded triplet of atoms based on their types (e.g. C-C-C, C-O-C, C-C-H, etc.)

Torsion Energy

$$E_{\text{torsion}} = \sum_{\text{torsions}} A \left[ 1 + \cos(n\tau - \phi) \right]$$

$A$ controls the amplitude of the curve.

$n$ controls its periodicity.

$\phi$ shifts the entire curve along the rotation angle axis ($\tau$).

The parameters are determined from curve fitting.

Unique parameters for torsional rotation are assigned to each bonded quartet of atoms based on their types (e.g. C-C-C-C, C-O-C-N, H-C-C-H, etc.)

Non-bonded Energy

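$$E_{\text{non-bonded}} = \sum_i \sum_j \left( -\frac{A_{ij}}{r_{ij}^{6}} + \frac{B_{ij}}{r_{ij}^{12}} \right) + \sum_i \sum_j \frac{q_i q_j}{r_{ij}}$$

$A$ determines the degree of attractiveness.

$B$ determines the degree of repulsion.

$q$ is the charge.

$r_{ij}$ is the distance between atoms $i$ and $j$.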
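The four energy terms can be sketched in code as follows. This assumes the usual molecular-mechanics functional forms: harmonic stretching and bending, a cosine torsion term, and a 6-12 van der Waals term plus Coulomb term for non-bonded pairs. Function names, the `coulomb_const` parameter and all numerical values are hypothetical, and units are left abstract; a real force field such as AMBER or GLYCAM supplies the parameters per atom type.

```python
import math


def stretch_energy(r, kb, r0):
    """Harmonic bond stretching: kb * (r - r0)^2."""
    return kb * (r - r0) ** 2


def bend_energy(theta, k_theta, theta0):
    """Harmonic angle bending: k_theta * (theta - theta0)^2 (radians)."""
    return k_theta * (theta - theta0) ** 2


def torsion_energy(tau, A, n, phi):
    """Periodic torsion term: A * [1 + cos(n * tau - phi)] (radians)."""
    return A * (1.0 + math.cos(n * tau - phi))


def nonbonded_energy(r, A, B, qi, qj, coulomb_const=1.0):
    """Attractive -A/r^6, repulsive +B/r^12, plus Coulomb qi*qj/r."""
    return -A / r ** 6 + B / r ** 12 + coulomb_const * qi * qj / r


def total_energy(bonds, angles, torsions, pairs):
    """Sum all terms; each argument is a list of parameter tuples."""
    return (sum(stretch_energy(*b) for b in bonds)
            + sum(bend_energy(*a) for a in angles)
            + sum(torsion_energy(*t) for t in torsions)
            + sum(nonbonded_energy(*p) for p in pairs))


if __name__ == "__main__":
    # Toy system with made-up parameters, for illustration only.
    print(total_energy(bonds=[(1.6, 100.0, 1.5)],
                       angles=[(1.95, 50.0, 1.91)],
                       torsions=[(math.pi, 2.0, 3, 0.0)],
                       pairs=[(3.5, 500.0, 600000.0, 0.4, -0.4)]))
```

The non-bonded sum runs over all atom pairs, which is why it dominates the cost of evaluating the energy for large systems.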

Simulating in a solvent

• The smaller the system, the larger the fraction of particles on the surface
– 1000-atom cubic crystal: 49% on surface
– 10^6-atom cubic crystal: 6% on surface
• Would like to simulate infinite bulk surrounding the N-particle system
• Two approaches:
– Implicitly
– Explicitly
• Periodic boundary conditions

Schematic representation of periodic boundary conditions.

http://www.ccl.net/cca/documents/molecular-modeling/node9.html
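A short sketch of two points from this slide: the surface fraction of a cubic crystal, and the minimum-image convention commonly used with periodic boundary conditions. It assumes a simple cubic lattice and a cubic box of side `box`; all function names are hypothetical.

```python
import math


def surface_fraction(n_side):
    """Fraction of atoms on the surface of an n_side^3 simple cubic crystal."""
    total = n_side ** 3
    interior = max(n_side - 2, 0) ** 3   # atoms not on any face
    return (total - interior) / total


def wrap(x, box):
    """Map a coordinate back into the primary box [0, box)."""
    return x % box


def minimum_image(dx, box):
    """Shortest separation along one axis, counting periodic images."""
    return dx - box * round(dx / box)


def pbc_distance(p, q, box):
    """Distance between points p and q in a cubic periodic box."""
    return math.sqrt(sum(minimum_image(a - b, box) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    print(surface_fraction(10))    # 1000-atom crystal
    print(surface_fraction(100))   # 10^6-atom crystal
    print(pbc_distance((0.5, 0.0, 0.0), (9.5, 0.0, 0.0), 10.0))
```

With the minimum-image convention, two atoms near opposite faces of the box interact as close neighbours, so no atom sees an artificial surface.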

Parameters for MD: Forcefield

• Derived from direct experimental measurements on small molecules (~10 atoms)
• Commonly used: AMBER, CHARMM, GROMOS, etc.
– GLYCAM for MD of glycoconjugates (derived from the AMBER forcefield)

Monte Carlo

Explore the energy surface by randomly probing the configuration space with a Markov chain approach.

Metropolis method (can escape local minima):

1. Specify the initial atom coordinates.
2. Select atom i randomly and move it by a random displacement.
3. Calculate the change in potential energy, ΔE, corresponding to this displacement.
4. If ΔE < 0, accept the new coordinates and go to step 2.
5. Otherwise, if ΔE ≥ 0, select a random R in the range [0,1] and:
    1. If e^(−ΔE/kT) > R, accept and go to step 2.
    2. If e^(−ΔE/kT) ≤ R, reject and go to step 2.
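The Metropolis acceptance rule can be sketched on a toy problem. For brevity this assumes a single coordinate x in a one-dimensional energy function rather than atoms in 3D; the function names, step size and seed are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e < 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def metropolis_sample(energy, x0, kT, n_steps, step_size, seed=0):
    """Run a Metropolis Markov chain and return the visited coordinates."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)   # random displacement
        e_new = energy(x_new)
        if metropolis_accept(e_new - e, kT, rng):
            x, e = x_new, e_new                          # move accepted
        samples.append(x)                                # rejected moves repeat x
    return samples


if __name__ == "__main__":
    # Harmonic well E(x) = x^2 sampled at kT = 1.
    xs = metropolis_sample(lambda x: x * x, x0=0.0, kT=1.0,
                           n_steps=20000, step_size=1.0)
    print(sum(x * x for x in xs) / len(xs))
```

Because uphill moves are sometimes accepted, the chain can leave a local minimum, and the visited states follow the Boltzmann weight exp(-E/kT).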

Deterministic Approach

• Provides us with a trajectory of the system.
– From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.
– Integrating these infinitesimal steps yields the trajectory of the system for any desired time range.
• Typical simulations of small proteins, including the surrounding solvent, run on the picosecond time scale.

$$F_i = -\frac{\partial E}{\partial x_i}, \qquad F = ma$$

Deterministic / MD methodology

• From atom positions, velocities, and accelerations, calculate atom positions and velocities at the next time step.

• Integrating these infinitesimal steps yields the trajectory of the system for any desired time range.

• There are efficient methods for integrating these elementary steps with Verlet and leapfrog algorithms being the most commonly used.

MD algorithm

• Initialize system
– Ensure particles do not overlap in initial positions (can use lattice)
– Randomly assign velocities
• Move and integrate

{r(t), v(t)} → {r(t+Δt), v(t+Δt)}

Leapfrog algorithm
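A minimal sketch of leapfrog integration for a single particle in one dimension. It is written in the synchronized kick-drift-kick form (equivalent to velocity Verlet), so positions and velocities are reported at the same time points. The function name and the harmonic-oscillator test system are assumptions for illustration, not taken from the slides.

```python
def leapfrog(x, v, force, mass, dt, n_steps):
    """Integrate F = m*a with leapfrog; return the list of (x, v) states."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half kick: velocity to t + dt/2
        x = x + dt * v_half                # drift: position to t + dt
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # half kick: velocity to t + dt
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    # Harmonic oscillator with k = 1, m = 1: F(x) = -x, period 2*pi.
    traj = leapfrog(x=1.0, v=0.0, force=lambda x: -x, mass=1.0,
                    dt=0.01, n_steps=628)
    x_end, v_end = traj[-1]
    print(x_end, v_end)
    print(0.5 * v_end ** 2 + 0.5 * x_end ** 2)   # total energy, started at 0.5
```

The scheme needs only one force evaluation per step and keeps the total energy close to its initial value over long runs, which is one reason Verlet-type integrators are the usual choice in MD.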

MD studies of Prion proteins

• Prion protein (PrP) is associated with an unusual class of neurodegenerative diseases
– Scrapie (sheep); bovine spongiform encephalopathy (BSE) in cattle; kuru, Creutzfeldt-Jakob disease (CJD), Gerstmann-Sträussler-Scheinker syndrome (GSS), and fatal familial insomnia (FFI) in humans

• Protein-only hypothesis (Prusiner, 1982): the disease is caused by an abnormal form of the 250 amino acid PrP, which accumulates in plaques in the brain.

• The abnormal form (PrPSc) differs from the normal cellular form (PrPC) only in its 3-D structure; FTIR and CD spectra indicate that it has a significantly increased content of β-sheet conformation compared with PrPC.

• Glycosylation appears to protect prion protein (PrPC) from the conformational transition to the disease-associated scrapie form (PrPSc).

PrP is a glyco-protein

• Available NMR structures are for non-glycosylated PrPC only

• Glycosylation appears to protect prion protein (PrPC) from the conformational transition to the disease-associated scrapie form (PrPSc)

• Objective: study of the influence of two N-linked glycans (Asn181 and Asn197) and of the GPI anchor attached to Ser230

Zuegg et al., Glycobiology, 2000, 10(10):959-974.

MD simulations

• Molecular dynamics simulations of the C-terminal region of human prion protein HuPrP(90–230), with and without the three glycans
• AMBER94 force field in a periodic box model with explicit water molecules, considering all long-range electrostatic interactions
• HuPrP(127–227) is stabilized overall by addition of the glycans, specifically by extension of two helices and reduced flexibility of the linking turn containing Asn197
• The stabilization appears indirect, by reducing the mobility of the surrounding water molecules, and not from specific interactions such as H bonds or ion pairs.
– Asn197 has a stabilizing role, while Asn181 lies within a region with already stable secondary structure

Zuegg et al., Glycobiology, 2000, 10(10):959-974.

Cone-like (left) and umbrella-like (right) topologies of α2-3 and α2-6 sialylated glycans binding to influenza viral HAs

Chandrasekaran et al., Nature Biotechnology 26, 107–113 (2008)

A retrospective analysis

MD simulation of glycan binding of influenza HAs

• A combined approach (MD + sequences) to predict ligand-binding mutants of H5N1 influenza HA
– Modeling the ligand-bound state of H5N1 HA using the isolate VN1194 bound to α2,3-sialyllactose as previously crystallized
– Excess mutual information was computed between each residue of each monomer and the corresponding bound ligand, using the average mutual information between the residue and all residues as an estimate of the “background” mutual information.
– Combine these results with sequence analysis of H5N1 mutational data to predict clusters of residues that undergo coordinated mutation, which have some capacity to vary but are subject to selective pressure relating mutation. These residues may be richer targets to change ligand specificity than residues absolutely conserved or residues that display uncorrelated mutations (involved in immune escape).

Kasson et al., JACS, 2009, 131 (32), pp 11338–11340

Experimentally identified ligand-binding mutations in red, the top 5% of residues by dynamics scoring in cyan (overlap of these two in magenta), and the six mutation sites identified by both dynamics and sequence analysis in yellow.

The top three mutations from the ligand dissociation analyses in yellow. A modeled α2,3-sialyllactose is shown in orange.

Prediction of dissociation rate for HA mutants (in silico mutagenesis)

• Bayesian analysis methods to predict dissociation rates based on extensive simulation of each mutant and evaluate whether a mutant has a faster dissociation rate than the influenza clinical isolate that we use as a wild-type reference.

• These simulations were used to estimate the dissociation rate for each mutation.

• The mutation sites predicted by analysis of the molecular dynamics data include both residues immediately contacting the bound glycan and residues located farther away on the globular head of the hemagglutinin molecule.

Date post:	31-Dec-2015
Category:	Documents
Upload:	calvin-whitehead