Date post: | 02-Apr-2015 |
Upload: | anahi-morell |
Michael Schroeder, Biotechnological Center, TU Dresden (Biotec)
Protein Structure
Lesk, chapter 5
Details on SCOP and CATH can be found in Structural Bioinformatics, Bourne/Weissig, chapters 12 and 13
By Michael Schroeder, Biotec, 2
Folding
Proteins are linear polymer mainchains with different amino acid side chains
Proteins fold spontaneously, reaching a state of minimal energy
Side and main chains interact with one another and with solvent
Example movie
Jones, D.T. (1997) Successful ab initio prediction of the tertiary structure of NK-Lysin using multiple sequences and recognized supersecondary structural motifs. PROTEINS. Suppl. 1, 185-191
Examining Proteins
Specialised tools with different views of structure
Corey, Pauling, Koltun (CPK): diameter of sphere ~ atomic radius
Hydrogen white, carbon grey, nitrogen blue, oxygen red, sulphur yellow
Cartoon, Wire, Balls
Examining Proteins
Protein Folding
Residue
Image taken from www.expasy.org/swissmod/course
Conformation of residue
Rotation around N-Cα bond: φ (phi)
Rotation around Cα-C bond: ψ (psi)
Rotation around peptide bond: ω (omega)
Peptide bond tends to be planar and in one of two states: trans, 180° (usually), and cis, 0° (rarely, and mostly proline)
Sasisekharan-Ramakrishnan-Ramachandran plot
Solid line = energetically preferred
Outside dotted line = disallowed
Most amino acids fall into the αR region (right-handed alpha helix) or the β region (beta-strand)
Glycine has additional conformations (e.g. left-handed alpha helix = αL region, and conformations in the lower right panel)
Image taken from www.expasy.org/swissmod/course
Ramachandran plot
Plot for a protein with mostly beta-sheets
Example for conformations
Image taken from www.expasy.org/swissmod/course
Helices and Strands
Consecutive residues in alpha or beta conformation generate alpha-helices and beta-strands, respectively
Such secondary structure elements are stabilised by weak hydrogen bonds
They are connected by turns or loops, regions in which the chain alters direction
Turns are often surface exposed and tend to contain charged or polar residues
Alpha Helix
Residue j is hydrogen-bonded to residue j+4
3.6 residues per turn
1.5 Å rise per residue
Repeat every 3.6 × 1.5 Å = 5.4 Å
φ = -60°, ψ = -45°
Image taken from www.expasy.org/swissmod/course
Beta strand
Image taken from www.expasy.org/swissmod/course
Beta Sheets
Image taken from www.expasy.org/swissmod/course
Turn
Residue j is bonded to residue j+3
Often proline and glycine
Image taken from www.expasy.org/swissmod/course
How to Fold a Structure
All residues must have stereochemically allowed conformations
Buried polar atoms must be hydrogen-bonded
If a few are missed, it might be energetically preferable to bond these to solvent
Enough hydrophobic surface must be buried and the interior must be sufficiently densely packed
There is evidence that folding occurs hierarchically: first secondary structure elements, then super-secondary, …
This justifies a hierarchic approach when simulating folding
Structure Alignment
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
Structure Alignment
Structure Alignment
In the same way that we align sequences, we wish to align structures
Let’s start simple: how to score an alignment
Sequences: e.g. percentage of matching residues
Structures: rmsd (root mean square deviation)
Root Mean Square Deviation
What is the distance between two points a with coordinates (x_a, y_a, z_a) and b with coordinates (x_b, y_b, z_b)? Euclidean distance:
$d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2 + (z_a - z_b)^2}$
Root Mean Square Deviation
In a structure alignment the score measures how far the aligned atoms are from each other on average
Given the distances $d_i$ between n aligned atoms, the root mean square deviation is defined as
$\mathrm{rmsd} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$
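The definition can be sketched in a few lines of Python. This is only an illustration: it assumes the two structures are already superimposed and that atoms are paired by list position; the function names and the toy coordinates are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Distance between two 3D points given as (x, y, z) tuples."""
    return math.sqrt(sum((ca - cb) ** 2 for ca, cb in zip(a, b)))

def rmsd(atoms_a, atoms_b):
    """Root mean square deviation over n aligned atom pairs.

    Assumes both lists have the same length and that atom i of
    atoms_a is aligned with atom i of atoms_b.
    """
    if len(atoms_a) != len(atoms_b) or not atoms_a:
        raise ValueError("need two non-empty lists of equal length")
    squared = [euclidean_distance(a, b) ** 2 for a, b in zip(atoms_a, atoms_b)]
    return math.sqrt(sum(squared) / len(squared))

# Toy coordinates in Ångstroms (made up): every pair is exactly 1 Å apart.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(rmsd(structure_a, structure_b))
```

Real tools first search for the superposition that minimises this value; that optimisation step is not shown here.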
Quality of Alignment and Example
Unit of RMSD => e.g. Ångstroms
Identical structures => RMSD = 0
Similar structures => RMSD is small (1 – 3 Å)
Distant structures => RMSD > 3 Å
Structural superposition of gamma-chymotrypsin and Staphylococcus aureus epidermolytic toxin A
Pitfalls of RMSD
all atoms are treated equally (e.g. residues on the surface have a higher degree of freedom than those in the core)
best alignment does not always mean minimal RMSD
significance of RMSD is size dependent
From www.uwyo.edu/molecbio/LectureNotes/ MOLB5650
Alternative RMSDs
aRMSD = best root-mean-square deviation calculated over all aligned alpha-carbon atoms
bRMSD = the RMSD over the highest scoring residue pairs
wRMSD = weighted RMSD
Source: W. Taylor (1999), Protein Science, 8: 654-665. http://www.prosci.uci.edu/Articles/Vol8/issue3/8272/8272.html#relat
From www.uwyo.edu/molecbio/LectureNotes/ MOLB5650
Computing Structural Alignments
DALI (Distance-matrix ALIgnment) is one of the first tools for structural alignment
How does it work?
Atoms: Given two structures’ atomic coordinates
Compute two distance matrices: compute for each structure all pairwise inter-atom distances
This step is done because the computed distances are independent of the coordinate system
The two original atomic coordinate sets cannot be compared directly, but the two distance matrices can
Align two distance matrices: find small (e.g. 6x6) sub-matrices along the diagonal that match
Extend these matches to form the overall alignment
This method is a bit similar to how BLAST works.
SSAP (double dynamic programming) in term 3.
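The reason for working with distance matrices can be checked with a small sketch: rotating and translating a structure changes its coordinates but not its internal distances. The code below only illustrates this first step and is not DALI itself; the helper names and the toy coordinates are assumptions.

```python
import math

def distance_matrix(coords):
    """All pairwise distances between 3D points (list of (x, y, z) tuples)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def rotate_z_and_shift(coords, angle, shift):
    """Rotate points around the z-axis by `angle` (radians), then translate."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
        for x, y, z in coords
    ]

def max_abs_difference(m1, m2):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))

# Toy structure (made-up coordinates) and the same structure in another frame.
original = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 2.0)]
moved = rotate_z_and_shift(original, angle=0.7, shift=(10.0, -3.0, 5.0))

difference = max_abs_difference(distance_matrix(original), distance_matrix(moved))
print(difference)  # tiny: the distance matrices agree up to rounding
```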
DALI Example
The regions of common fold, as determined by the program DALI by L. Holm and C. Sander, in the TIM-barrel proteins mouse adenosine deaminase [1fkx] (black) and Pseudomonas diminuta phosphotriesterase [1pta] (red):
Protein zinc finger (4znf)
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
Superimposed 3znf and 4znf
30 CA atoms: RMS = 0.70 Å
248 atoms: RMS = 1.42 Å
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
Lys30
Superimposed 3znf and 4znf backbones
30 CA atoms: RMS = 0.70 Å
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
RMSD vs. Sequence Similarity
At low sequence identity, good structural alignments are possible
Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt
Structure Classification
Why classify structures?
Structure similarity is a good indicator for homology, therefore classify structures
Classification at different levels
Similar general folding patterns (structures not necessarily related)
Possibly low sequence similarity, but similar structure and function implies very likely homology
High sequence similarity implies similar structures and homology
Classification can be used to investigate evolutionary relationships and possibly infer function
Structure Classification
SCOP: Structural Classification of Proteins
Hand curated (Alexei Murzin, Cambridge) with some automation
CATH: Class, Architecture, Topology, Homology
Automated, where possible, some checks by hand
FSSP: Fold classification based on Structure-Structure alignment of Proteins
Fully automated
Reasonable correspondence (>80%)
Evolutionary Relation
Strong sequence similarity is assumed to be sufficient to infer homology
Close structural and functional similarity together are also considered sufficient to infer homology
Similar structure alone is not sufficient, as proteins may have converged on a structure due to physicochemical necessity
Similar function alone is not sufficient, as proteins may have developed it due to functional selection
In general, structure is more conserved than sequence
Beware: Descendants of an ancestor may have different function, structure, and sequence! These are difficult to detect
What is a domain? Single and Multi-Domain Proteins
What is a domain?
Functional: Domain is “independent” functional unit, which occurs in more than one protein
Physicochemical: Domain has a hydrophobic core
Topological: Intra-domain distances of atoms are minimal, inter-domain distances maximal
Difficult to exactly define a domain
Difficult to agree on exact domain borders
Domains re-occur
A domain re-occurs in different structures and possibly in the context of different other domains
P-loop domain in
1goj: Structure Of A Fast Kinesin: Implications For ATPase Mechanism and Interactions With Microtubules, Motor Protein (single domain)
1ii6: Crystal Structure Of The Mitotic Kinesin Eg5 In Complex With Mg-ADP, Cell Cycle (two domains)
Domains re-occur
1in5: interaction of P-loop domain (green & orange) and winged helix DNA binding domain
1a5t: interaction of P-loop domain (green & orange) and DNA polymerase III domain
Domains have hydrophobic core
Kyte J., Doolittle R.F, J. Mol. Biol. 157:105-132(1982).
[Figure: Hydrophobicity plot for 1GOJ kinesin motor; x-axis: Residue (1 to 301), y-axis: Hydrophobicity (-3 to 3)]
Ala: 1.800, Arg: -4.500, Asn: -3.500, Asp: -3.500, Cys: 2.500
Gln: -3.500, Glu: -3.500, Gly: -0.400, His: -3.200, Ile: 4.500
Leu: 3.800, Lys: -3.900, Met: 1.900, Phe: 2.800, Pro: -1.600
Ser: -0.800, Thr: -0.700, Trp: -0.900, Tyr: -1.300, Val: 4.200
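A hydrophobicity plot of this kind is commonly produced by averaging the per-residue values over a sliding window. The sketch below uses the values listed on this slide, mapped to one-letter amino acid codes; the window length of 9 and the example sequence are assumptions, not taken from the slides.

```python
# Values from the slide, keyed by one-letter amino acid code.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """Average hydrophobicity over each window of consecutive residues.

    Returns one value per window position (len(sequence) - window + 1
    values); the window length is an illustrative choice.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical sequence: a hydrophobic stretch flanked by charged residues.
example = "KKDEEILLVVAIIGLKKDE"
profile = hydropathy_profile(example, window=9)
print([round(value, 2) for value in profile])
```

Stretches with high averages are candidates for the buried hydrophobic core of a domain.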
Intra-domain distances minimal
Distances between atoms within domain are minimal
Distances between atoms of two different domains are maximal
PDB, Proteins, and Domains
Ca. 20,000 structures in PDB
50% single domain
50% multiple domain
90% have fewer than 5 domains
[Figure: Distribution of Number of Domains; x-axis: Number of Domains, y-axis: Frequency]
Dom# Freq.
1 8464
2 4358
3 926
4 1888
5 148
6 624
7 42
8 491
9 22
10 58
…
…
30 7
31 1
32 16
36 1
40 8
42 1
48 3
49 1
A structure with 49 domains
1AON, Asymmetric Chaperonin Complex Groel/Groes/(ADP)7
SCOP: Structural Classification of Proteins
[Figure: SCOP hierarchy from top to family, with example entries at each level]
CLASS: All alpha (218), All beta (144), Alpha/beta (136), Alpha+beta (279)
FOLD: Trypsin-like serine proteases (1), Immunoglobulin-like (23)
SUPERFAMILY: Transglutaminase (1), Immunoglobulin (6)
FAMILY: C1 set domains (antibody constant), V set domains (antibody variable)
Class
All alpha (possibly small beta adornments)
All beta (possibly small alpha adornments)
Class
Alpha/beta (alpha and beta) = single beta sheet with alpha helices joining the C-terminus of one strand to the N-terminus of the next
subclass: beta sheet forming barrel surrounded by alpha helices
subclass: central planar beta sheet
Alpha+beta (alpha plus beta) = alpha and beta units are largely separated
Strands joined by hairpins leading to antiparallel sheets
Class
Multi-domain proteins
have domains placed in different classes
domains have not been observed elsewhere
E.g. 1hle
Class
Membrane (few and mostly unique) and cell surface proteins
E.g. Aquaporin 1ih5
Class
Small Proteins
E.g. Insulin, 1pid
Class
Coiled coil proteins
E.g. 1i4d, Arfaptin-Rac binding fragment
Class
Low-resolution structures, peptides, designed proteins
E.g. 1cis, a designed protein, hybrid protein between chymotrypsin inhibitor CI-2 and helix E from subtilisin Carlsberg from Barley (Hordeum vulgare), hiproly strain
Fold, Superfamily, Family
Fold: Common core structure, i.e. same secondary structure elements in the same arrangement with the same topological structure
Superfamily: Very similar structure and function
Family: Sequence identity (>30%) or extremely similar structure and function
Distribution (2007)
Class Fold Superfamily Family
All alpha 259 459 772
All beta 165 331 679
Alpha/beta 141 232 736
Alpha+beta 334 488 897
Multidomain 53 53 74
Membrane and cell surface 50 92 104
Small proteins 85 122 202
Total 1086 1777 3464
Uses of SCOP
Automatic classification
Understanding of protein enzymatic function
Use superfamily and fold to study distantly related proteins
Study sequence and structure variability
Derive substitution matrices for sequence comparison
Extract structural principles for design
Study decomposition of multi domain proteins
Estimate total number of folds
Derived databases
PDB, Proteins, Domains revisited
80% of PDB have only one type of SCOP superfamily
15% of PDB have two different SCOP superfamilies
[Figure: Frequency of Number of SCOP Superfamilies; x-axis: Number of Superfamilies, y-axis: Frequency]
#Superfamilies Freq.
1 13960
2 2721
3 495
4 178
5 33
6 25
7 1
9 4
20 9
21 1
22 1
23 6
A structure with 23 different superfamilies
1k9m Co Crystal Structure Of Tylosin Bound To The 50S Ribosomal Subunit Of Haloarcula Marismortui Ribosome
The 20 Most Frequently Occurring Superfamilies
Superfamily SCOP ID #PDB
Immunoglobulin b.1.1 823
Lysozyme-like d.2.1 777
Trypsin-like serine proteases b.47.1 649
P-loop containing nucleotide triphosphate hydrolases c.37.1 521
NAD(P)-binding Rossmann-fold domains c.2.1 384
Globin-like a.1.1 384
(Trans)glycosidases c.1.8 332
Acid proteases b.50.1 288
Concanavalin A-like lectins/glucanases b.29.1 230
Thioredoxin-like c.47.1 217
EF-hand a.39.1 212
alpha/beta-Hydrolases c.69.1 195
Cupredoxins b.6.1 178
Ribonuclease H-like c.55.3 178
PLP-dependent transferases c.67.1 176
Periplasmic binding protein-like II c.94.1 171
Carbonic anhydrase b.74.1 169
Metalloproteases ("zincins"), catalytic domain d.92.1 169
FAD/NAD(P)-binding domain c.3.1 162
Cytochrome c a.3.1 161
CATH
Class: secondary structure composition
Architecture: orientation in 3D
Topology: connectivity
Homology: grouped by evidence for homology (sequence, structure and function)
Generating CATH
1. Identify close relatives by pairwise sequence alignment
2. Detect more distant relatives using
2a. sequence profiles and
2b. structure alignment
3. Structures still unclassified after 1. and 2. are examined by hand to detect domain boundaries
4. Try 2. and 3. again
5. If still unclassified, assign manually
CATH step 1: Sequence-based Identification of Homologous Structures
> 30% sequence similarity implies similar structure
Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage
Reminder…
Hierarchical Clustering
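Initial distance matrix:
        1   2   3   4   5
1       0   2   6  10   9
2           0   5   9   8
3               0   4   5
4                   0   3
5                       0
After merging 1 and 2:
        (1,2)   3   4   5
(1,2)   0       5   9   8
3               0   4   5
4                   0   3
5                       0
After merging 4 and 5:
        (1,2)   3   (4,5)
(1,2)   0       5   8
3               0   4
(4,5)               0
After merging 3 and (4,5):
            (1,2)   (3,(4,5))
(1,2)       0       5
(3,(4,5))           0
[Figure: dendrogram over items 1 2 3 4 5, height axis from 0 to 5]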
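The merge sequence in these matrices can be reproduced with a short single-linkage sketch. The function name and the data layout (a dictionary keyed by unordered pairs) are illustrative choices, and the naive search over all cluster pairs is only meant for small examples.

```python
def single_linkage(distances, labels):
    """Agglomerative clustering with single linkage.

    distances maps frozenset({a, b}) to the distance between items a and b.
    Returns the merges in order as (cluster_1, cluster_2, height) tuples,
    where clusters are tuples of item labels.
    """
    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance of the closest pair of members.
                d = min(
                    distances[frozenset((a, b))]
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Initial distance matrix from the slide.
slide_distances = {
    frozenset((1, 2)): 2, frozenset((1, 3)): 6, frozenset((1, 4)): 10,
    frozenset((1, 5)): 9, frozenset((2, 3)): 5, frozenset((2, 4)): 9,
    frozenset((2, 5)): 8, frozenset((3, 4)): 4, frozenset((3, 5)): 5,
    frozenset((4, 5)): 3,
}

merges = single_linkage(slide_distances, [1, 2, 3, 4, 5])
for first, second, height in merges:
    print(first, second, height)
```

The merge heights (2, 3, 4, 5) correspond to the levels at which branches join in the dendrogram.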
Hierarchical Clustering: How to define distance between clusters?
Single linkage: Minimum
Example: Distance (A,B) to C is 1
Complete linkage: Maximum
Example: Distance (A,B) to C is 2
Average linkage: Average
Example: Distance (A,B) to C is 1.5
Are dendrograms always the same independent of the linkage method?
Distance matrix for the examples:
    A   B   C
A   0   1   2
B       0   1
C           0
[Figure: two dendrograms over leaves A, B, C]
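The three example values follow directly from the pairwise distances d(A,B) = 1, d(A,C) = 2 and d(B,C) = 1. A minimal sketch, in which the helper name `cluster_distance` is hypothetical:

```python
from statistics import mean

# Pairwise distances from the slide's matrix.
pair_distance = {
    frozenset("AB"): 1,
    frozenset("AC"): 2,
    frozenset("BC"): 1,
}

def cluster_distance(cluster_1, cluster_2, linkage):
    """Combine all member-to-member distances with the given linkage rule."""
    pairwise = [
        pair_distance[frozenset((a, b))]
        for a in cluster_1
        for b in cluster_2
    ]
    return linkage(pairwise)

single = cluster_distance("AB", "C", min)     # minimum of 2 and 1
complete = cluster_distance("AB", "C", max)   # maximum of 2 and 1
average = cluster_distance("AB", "C", mean)   # mean of 2 and 1
print(single, complete, average)
```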
Hierarchical Clustering: Chaining
Beware of chaining when using single linkage
Because the nearest neighbour is selected, it appears that all members of the cluster are very similar to each other, when in fact A and Z are very different
A B C D … Z
A 0 1 2 3 … 25
B 0 1 2 … 24
C 0 1 … 23
D 0 … 22
… …
Z 0
A B C D … Z
CATH and single linkage
It is argued that structural data is quite sparse, hence it cannot be expected that all cluster members will be very similar (in terms of sequence) to each other, so that the chaining effect is even useful
CATH step 2a:
Profile-based methods such as PSI-BLAST are used to detect distant relatives
Build profiles using all sequence data available (rather than only sequences for which structure exists)
This increases quality of profiles dramatically
51% of distant relatives retrieved using profiles based on sequences with known structure only
82% of distant relatives retrieved using profiles based on all sequences
CATH step 2b: Structure-based methods to detect distant relatives
For ca. 15% of structures, the sequence-based method does not work
Example: For globins sequence similarity can fall below 10%, yet structure and function (oxygen-binding) are preserved
Use SSAP, the Sequential Structure Alignment Program
Clustering Result of Structure Alignment
Relatives identified using pairwise alignment are clustered using hierarchical clustering with single linkage
Improving Efficiency: GRATH
Screening large structures (>300 residues) against the database can take days
Idea of GRATH (Graphical Representation of CATH): improve efficiency by filtering at a higher level before doing detailed comparison
Represent protein as graph where
Nodes are secondary structure elements represented by their midpoint, tilt, and rotation
Edges are distances between midpoints of secondary structure elements
Use an algorithm to determine subgraph isomorphism (i.e. does one graph occur in another one)
If yes, then do a detailed comparison using SSAP
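As a rough illustration of the graph representation only (not of GRATH's actual data model or its subgraph matching), the sketch below builds nodes and distance-labelled edges from secondary structure midpoints. Tilt and rotation are omitted, and all names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def build_sse_graph(elements):
    """Build a graph from secondary structure elements.

    elements: list of (name, kind, midpoint) with kind "helix" or "strand"
    and midpoint an (x, y, z) tuple.
    Returns (nodes, edges): nodes maps name to kind, edges maps an
    unordered pair of names to the distance between their midpoints.
    """
    nodes = {name: kind for name, kind, _ in elements}
    edges = {
        frozenset((a[0], b[0])): math.dist(a[2], b[2])
        for a, b in combinations(elements, 2)
    }
    return nodes, edges

# Made-up protein with three secondary structure elements.
toy_elements = [
    ("H1", "helix", (0.0, 0.0, 0.0)),
    ("E1", "strand", (3.0, 4.0, 0.0)),
    ("E2", "strand", (3.0, 4.0, 12.0)),
]
nodes, edges = build_sse_graph(toy_elements)
print(nodes)
print(edges[frozenset(("H1", "E1"))])
```

Comparing such small graphs is much cheaper than comparing all atoms, which is the motivation for filtering before the detailed SSAP comparison.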
Structure Prediction and Modelling
Structure Prediction: Four Main Problem Areas
Given a sequence with unknown structure, predict its structure
Secondary structure prediction: predict regions of helices and strands
Homology modelling: predict structure from known structures of one or more related proteins
Fold recognition: given a library of structures, determine which one (if any) is the fold of the given sequence
Prediction of novel folds: a-priori and knowledge-based methods
Structure Prediction of Novel Folds: Two Approaches
A priori: Most approaches aim to reproduce inter-atomic interactions by defining an energy function and trying to find its global minimum
Problem: Inadequacy of the energy function; algorithms get stuck in local minima
Knowledge-based: Find similarities to known structures or sub-structures
Secondary Structure Prediction
A successful tool for secondary structure prediction is PROF
PROF uses a neural network to learn secondary structure from known structures
¾ of PROF’s predictions are correct
At CASP 2000 it predicted e.g. the following
                    |10       |20       |30       |40       |50
Sequence   ALVEDPPLKVSEGGLIREGYDPDLDALRAAHREGVAYFLELEERERERTG
Prediction HH------------EEE------HHHHHHHHHH-HHHHHHHHHHHHHHH-
Experiment -E-------------E-----HHHHHHHHHHHHHHHHHHHHHHHHHHHH-
                    |60       |70       |80       |90       |100
Sequence   IPTLKVGYNAVFGYYLEVTRPYYERVPKEYRPVQTLKDRQRYTLPEMKEK
Prediction --EEEEEEEEEEEEEEEE-----------EEEEEEEE--EEEE-HHHHHH
Experiment ----EEEEE---EEEEEEEHHHHHH-----EEEEE---EEEEE-HHHHHH
                    |110      |120
Sequence   EREVYRLEALIRRREEEVFLEVRERAKRQ
Prediction HHHHHHHHHHHHHHHHHHHHHHHHHHHH-
Experiment HHHHHHHHHHHHHHHHHHHHHHHHHHH--
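One common way to quantify how much of such a prediction is correct is the fraction of residues whose predicted state (H, E or -) matches the experimental one. The sketch below shows that calculation on short made-up strings; whether the ¾ figure quoted for PROF was computed exactly this way is an assumption.

```python
def per_residue_accuracy(predicted, observed):
    """Fraction of positions where predicted and observed states agree.

    Both arguments are strings over H (helix), E (strand) and - (other),
    one character per residue.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty strings of equal length")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(predicted)

# Hypothetical 10-residue example: 9 of 10 states agree.
toy_prediction = "HHHHEE--EE"
toy_experiment = "HHH-EE--EE"
accuracy = per_residue_accuracy(toy_prediction, toy_experiment)
print(accuracy)
```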
PROF’s prediction
The regions predicted by the PROF server of Rost to be helical are shown as wider ribbons. The prediction missed only a short helix, at the top left of the picture
Homology modelling
Define the model of an unknown structure by making minimal changes to a relative with known structure
Align amino acid sequences of target and one or more known structures
Insertions and deletions should be in loop regions
Determine mainchain segments to represent the regions containing insertions and deletions and stitch these into the known structure
Replace the sidechains of the residues that have been mutated
Examine the model (by hand and computationally) to detect collisions between atoms
Refine the model by limited energy minimisation
Accuracy of Homology Modelling
Works for >40-50% sequence similarity
Example: SWISS-MODEL prediction of neurotoxin of red scorpion (1DQ7) from neurotoxin of yellow scorpion (1PTX)
Fold Recognition: 3D Profiles
Given a sequence, determine which (if any) fold is most similar
Can we build profiles to represent structures of similar fold (similar to sequence profiles)?
3D profiles: classify the environment of each residue
Secondary structure: is it part of helix, sheet or other (determined by mainchain hydrogen bonding interactions)
Surface exposure: <40 Å², 40-114 Å², or >114 Å² accessible surface area
Polar or non-polar nature of environment
Total of 18 residue classes (3 × 3 × 2), one of which each residue is part of
Sequence of these residue classes is the 3D profile
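The count of 18 classes comes from combining three secondary structure states, three exposure ranges and two polarity states. The sketch below enumerates them and assigns a class to one residue; the class labels and the treatment of the exact boundary values (40 and 114 Å²) are assumptions for illustration.

```python
from itertools import product

SECONDARY_STRUCTURES = ("helix", "sheet", "other")
EXPOSURES = ("buried", "intermediate", "exposed")
POLARITIES = ("polar", "nonpolar")

def exposure_bin(accessible_area):
    """Map accessible surface area in Å² to one of three exposure bins."""
    if accessible_area < 40.0:
        return "buried"
    if accessible_area <= 114.0:
        return "intermediate"
    return "exposed"

def environment_class(secondary_structure, accessible_area, polar_environment):
    """Return the residue environment class as a 3-tuple of labels."""
    if secondary_structure not in SECONDARY_STRUCTURES:
        raise ValueError("unknown secondary structure state")
    polarity = "polar" if polar_environment else "nonpolar"
    return (secondary_structure, exposure_bin(accessible_area), polarity)

ALL_CLASSES = list(product(SECONDARY_STRUCTURES, EXPOSURES, POLARITIES))
print(len(ALL_CLASSES))
print(environment_class("helix", 30.0, polar_environment=False))
```

Applying `environment_class` to every residue of a structure yields the sequence of classes that the slide calls the 3D profile.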
3D Profiles and Alignments
Structure-Structure Alignment: 3D profiles of two known structures can be aligned against each other
Sequence-Structure Alignment: Based on existing 3D profiles, probability can be determined for a residue occurring in a residue class
Using this probability, we can assign a 3D profile to a sequence
And hence align the sequence 3D profile to a structure 3D profile
For correctly determined protein structures, the structure 3D profile fits the sequence 3D profile well
However, other proteins may score even better
If a structure does not match its own 3D profile well it is likely that there is an error in the structure determination
Threading
Pull query sequence through known structure and rate the score
Necessary:
Method to score the models to select the best one
Method to calibrate the scores to decide which of the best is correct
Homology modelling vs. Threading:
Identify homologues vs. Try all possible parents
Determine optimal alignment vs. Try many alignments
Optimize one model vs. Evaluate many rough models
Scoring for Threading
Empirical patterns of residue neighbours derived from known structures
Observe distribution of inter-residue distances for all 20 x 20 residue pairs
Derive probability distribution as function of distance in space and on sequence
Boltzmann equation relates probability and energy
Reverse this and derive energy function from probability distribution
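Reversing the Boltzmann relation, where probability is proportional to exp(-E/kT), gives an energy of the form E = -kT ln(p). In practice the observed probability is usually compared with a reference probability; that reference state, the function name and the example numbers below are assumptions not taken from the slides.

```python
import math

def pseudo_energy(p_observed, p_reference, kT=1.0):
    """Inverse Boltzmann: energy from an observed and a reference probability.

    Contacts seen more often than in the reference get negative (favourable)
    energies, rarer ones get positive energies. kT is in arbitrary units.
    """
    if p_observed <= 0.0 or p_reference <= 0.0:
        raise ValueError("probabilities must be positive")
    return -kT * math.log(p_observed / p_reference)

# Hypothetical probabilities for one residue pair at one distance bin.
favourable = pseudo_energy(p_observed=0.4, p_reference=0.2)
neutral = pseudo_energy(p_observed=0.2, p_reference=0.2)
unfavourable = pseudo_energy(p_observed=0.1, p_reference=0.2)
print(favourable, neutral, unfavourable)
```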
Threading the sequence
template
Target
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
“Threaded” sequence
Yellow = adrenergic receptor sequence
Blue = adrenergic receptor (PDB 1F88)
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
Modeled structure
Gaps
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
Corrected Model
Slides from Hanekamp, University of Wyoming, www.uwyo.edu
Ab initio Structure Prediction
Molecular dynamics
Structure prediction = place atoms so that interactions between them create a unique state of maximum stability
Problem: Model of inter-atomic distances is not complete
Computational scale:
Large number of variables and massive search space
Non-linearities
Rough energy surface with many local minima
Conformational energy calculations
Bond stretching
Bond angle bend
Torsion angle (e.g. φ, ψ, ω)
Van der Waals interactions: short-range repulsion ~R^-12 and long-range attraction ~R^-6, where R is the inter-atom distance
Hydrogen bond: weak chemical/electrostatic interaction, ~R^-12 and ~R^-10
Electrostatics: charges on atoms
Solvent: interactions with water, salt, sugar, etc.
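The van der Waals term with R^-12 repulsion and R^-6 attraction is usually written as a Lennard-Jones 12-6 potential. The sketch below uses that standard form; the parameters epsilon (well depth) and sigma (distance at which the energy is zero) and their default values are assumptions for illustration.

```python
def lennard_jones(distance, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 energy for two atoms at the given distance.

    The (sigma/R)^12 term models short-range repulsion and the
    (sigma/R)^6 term models long-range attraction.
    """
    if distance <= 0.0:
        raise ValueError("distance must be positive")
    ratio6 = (sigma / distance) ** 6
    return 4.0 * epsilon * (ratio6 ** 2 - ratio6)

# The energy minimum lies at R = 2^(1/6) * sigma and equals -epsilon.
r_min = 2.0 ** (1.0 / 6.0)
for r in (0.9, 1.0, r_min, 1.5, 2.5):
    print(round(r, 3), round(lennard_jones(r), 4))
```

Below the minimum the repulsive term dominates and the energy rises steeply; far beyond it the energy approaches zero from below.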
Rosetta
Predicts structure by first generating structures of fragments (3-9 residues) using known structures
Combines fragments using Monte Carlo simulation with an energy function with terms for
Paired beta-sheets
Burial of hydrophobic residues
Carries out 1000 simulations
Results are clustered and the centre of the largest cluster is presented as prediction
Demo
ROSETTA
The program ROSETTA, by D. Baker and colleagues, can predict the structures of proteins for which no complete domain of similar folding pattern appears in the database.
Prediction by ROSETTA of H. influenzae hypothetical protein. Black lines, experimental structure; red lines, prediction
Rosetta
Prediction by ROSETTA of the N-terminal half of domain 1 of human DNA repair protein Xrcc4. This figure shows a selected substructure of Xrcc4 containing the N-terminal 55 out of 116 residues. Black lines, experimental structure; red lines, prediction
LINUS
Another programme with a similar idea
Prediction by LINUS (program by G.D. Rose and R. Srinivasan) of C-terminal domain of rat endoplasmic reticulum protein ERp29. Black lines, experimental structure; red lines, prediction
Monte Carlo Simulation
Objective: Find conformation with minimal energy
Problem: Avoid local minima
Algorithm:
1. Generate a random initial conformation x
2. Perturb conformation x to generate a neighbouring conformation x’
3. Calculate the energies E(x) and E(x’), resp., for conformations x and x’
4. If E(x)>E(x’) (i.e. x’ is an improvement, we go downhill from x to x’) then accept x’ as new conformation and go to 2.
5. If E(x)<E(x’) (i.e. x’ is no improvement, we go uphill from x to x’) then accept x’ as new conformation with probability p
6. The probability p to accept uphill moves is reduced with every step
7. Go to step 2.
Steps 1.-4. make sure that we “walk” downhill towards a minimum
Steps 5.-7. make sure that if we are in a local minimum there is a chance to get out of it by accepting an uphill move. It’s important that this probability decreases so that we become less and less likely to walk uphill
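The numbered steps can be sketched on a one-dimensional toy problem, where the "conformation" is a single number and the energy is a double-well function with one local and one global minimum. Everything specific here is an assumption for illustration: the energy function, the step size, the starting acceptance probability and its geometric decay, and the fixed number of iterations that replaces the endless loop of step 7.

```python
import random

def monte_carlo_minimise(energy, x_start, steps=2000, step_size=0.5,
                         p_start=0.5, decay=0.995, seed=0):
    """Monte Carlo search following the steps on the slide.

    Returns the lowest-energy conformation visited during the walk.
    """
    rng = random.Random(seed)
    x = x_start
    best = x_start
    p = p_start
    for _ in range(steps):
        # Step 2: perturb x to get a neighbouring conformation.
        x_new = x + rng.uniform(-step_size, step_size)
        # Step 3: compare the two energies.
        if energy(x_new) < energy(x):
            x = x_new                    # step 4: always accept downhill moves
        elif rng.random() < p:
            x = x_new                    # step 5: accept uphill with probability p
        p *= decay                       # step 6: uphill moves become less likely
        if energy(x) < energy(best):
            best = x
    return best

def double_well(x):
    """Toy energy surface with a local and a global minimum."""
    return x ** 4 - 3 * x ** 2 + x

start = random.Random(1).uniform(-2.0, 2.0)   # step 1: random initial conformation
best = monte_carlo_minimise(double_well, start)
print(start, double_well(start))
print(best, double_well(best))
```

Keeping track of the best conformation seen so far is an addition to the slide's algorithm; it guarantees that the returned energy is never higher than the starting energy.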
Summary
You should know now
What helices, strands, sheets are
What a Ramachandran plot is
How to score a structural alignment (rmsd)
How to compute a structural alignment
How a domain can be characterised
Why structure classification is useful
What the main structure classes are
How classifications can be generated automatically
What the problems are
What secondary structure prediction, homology modelling, threading, ab-initio and knowledge-based structure prediction of novel folds are
Visit PDB, SCOP and CATH websites and read chapter 5