Date post: | 18-Jan-2016 |
Upload: | buddy-newman |
Protein Structure Prediction
Protein Structure
Amino-acid chains can fold to form 3-dimensional structures
Proteins are sequences that have a (more or less) stable 3-dimensional configuration
Why Is Structure Important?
The structure a protein takes is crucial for its function:
• Forms “pockets” that can recognize an enzyme substrate
• Situates side chains of specific groups so that they co-locate to form areas with desired chemical/electrical properties
• Creates firm structures such as collagen, keratins, fibroins
Determining Structure
X-ray and NMR methods make it possible to determine the structure of proteins and protein complexes
These methods are expensive and difficult: it could take several months of work to process one protein
A centralized database (PDB) contains all solved protein structures
• XYZ coordinates of atoms within a specified precision
• ~19,000 solved structures
Growth of the Protein Data Bank
Structure is Sequence Dependent
Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence
• Force the protein to lose its structure by introducing agents that change the environment
• After the protein is put back in water, the original conformation/activity is restored
However, for complex proteins, there are cellular processes that “help” in folding
Amino Acids
What Forces Hold the Structure?
Structure is supported by several types of chemical bonds/forces:
• Hydrogen bonds
• Charge-charge interactions: positively charged groups prefer to be situated against negatively charged groups
• Disulfide bonds: S-S bonds between cysteine residues; these form during folding
• Hydrophobic effect
Levels of structure
Secondary Structure
α-helix, β-strands
Hydrogen Bonds in α-Helices
β-Strands Form Sheets
Parallel, anti-parallel
These sheets are held together by hydrogen bonds across strands
Angular Coordinates
Secondary structures force specific angles between residues
Ramachandran Plot
We can relate angles to types of structures
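To make the link between coordinates, angles, and structure types concrete, here is a minimal Python sketch. It computes a signed dihedral angle from four 3-D points (the same calculation used for backbone phi/psi angles) and assigns a very rough Ramachandran region. The function names and the region boundaries in rama_region are illustrative assumptions, not values taken from these slides.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3-D points."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)          # central bond
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Remove the components along the central bond, then measure the
    # angle between what is left of the two outer bonds.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

def rama_region(phi, psi):
    """Rough region label; the box boundaries are illustrative assumptions."""
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"
    return "other"

angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1), rama_region(-57, -47), rama_region(-120, 130))
```

In practice, phi and psi for each residue would be computed from consecutive backbone atoms read from a PDB entry; the boxes above only approximate the densely populated regions of a Ramachandran plot.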
Labeling Secondary Structure
Using both hydrogen bond patterns and angles, we can assign secondary structure labels from the XYZ coordinates of the amino acids
These do not lead to an absolute definition of secondary structure
Prediction of Secondary Structure
Input: amino-acid sequence
Output: annotation sequence of three classes:
• alpha
• beta
• other (sometimes called coil/turn)
Measure of success: percentage of residues that were correctly labeled
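The success measure above (the three-class accuracy reported later in these slides as Q3) can be sketched in a few lines of Python. The one-letter labels H (alpha), E (beta), and C (other) and the two example strings are assumptions chosen for illustration.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-class label was predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("label sequences must have the same length")
    if not observed:
        raise ValueError("label sequences must not be empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Hypothetical labels: H = alpha, E = beta, C = other (coil/turn).
observed = "CHHHHEEECC"
predicted = "CHHHCEEEEC"
print(q3_accuracy(predicted, observed))  # 8 of 10 residues agree -> 80.0
```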
Protein Folds: sequential, spatial and topological arrangement of secondary structures
The Globin fold
Approaches for structure prediction
• Homology modeling (25-30% identity as a predictor)
• Fold recognition (remote homology)
• Ab initio prediction (heavy computations)
Newly Determined Structures-Fraction of New Folds
Fraction of new folds (PDB new entries in 1998)
Koppensteiner et al., 2000, JMB 296:1139-1152.
A Finite Number of Protein Folds
Aim: recognize the fold that “matches” a given sequence
Approaches:
• PSI-BLAST, profile HMMs, etc.
• Threading
Threading: Essential components
• Structural template
• Neighbor definition
• Energy function
Pairwise energy table E_ab for residue types a, b:
    A   C   D   E  ...
A  -3  -1   0   0  ...
C  -1  -4   1   2  ...
D   0   1   5   6  ...
E   0   2   6   7  ...
...
Total energy, summed over neighboring template positions i, j occupied by residues a, b:
$E = \sum_{i,j} E_{ab}$
Example: the sequence ACCECADAAC placed on template positions 1-10 (A, C, C, E, C, A, D, A, A, C) gives
-3 -1 -4 -4 -1 -4 -3 -3 = -23
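The scoring step of threading, using the pairwise energy table and the ACCECADAAC example from this slide, can be sketched as follows. The table values come from the slide; the contact list is hypothetical, because the figure that defined the real neighbor pairs is not recoverable. It was chosen so that its eight pair types match the eight terms of the slide's sum.

```python
# Pairwise energies E_ab from the slide (symmetric, residues A, C, D, E only).
PAIR_ENERGY = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}

def pair_energy(a, b):
    """Look up E_ab regardless of the order of the two residue types."""
    return PAIR_ENERGY[(a, b)] if (a, b) in PAIR_ENERGY else PAIR_ENERGY[(b, a)]

def threading_energy(sequence, contacts):
    """Sum E_ab over template contacts given as 1-based position pairs."""
    return sum(pair_energy(sequence[i - 1], sequence[j - 1]) for i, j in contacts)

# Hypothetical neighbor definition for a 10-position template (an assumption).
HYPOTHETICAL_CONTACTS = [(1, 2), (2, 3), (3, 5), (5, 10),
                         (9, 10), (8, 9), (6, 8), (1, 6)]

print(threading_energy("ACCECADAAC", HYPOTHETICAL_CONTACTS))  # -23
```

A full threading method would also search over alignments of the sequence to the template; this sketch only scores one fixed placement.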
Fold recognition (threading)
Find best fold for a protein sequence:
Query sequence: MAHFPGFGQSLLFGYPVYVFGD...
Potential folds: 1) ... 56) ... n)
Scores: -10 ... -123 ... 20.5
Essentials of GenTHREADER (Jones, 1999, JMB 287:797-815)
For each template:
• provide an MSA
• align the query sequence with the MSA
• assess the alignment by sequence alignment score
• assess the alignment by pairwise potentials
• assess the alignment by solvation function
• record lengths of: alignment, query, template
Ab-initio Structure Recognition
Goal: Predict structure from “first principles”
Benefits:
• Works for novel folds
• Shows that we understand the process
Approaches to Ab-initio Prediction
Molecular Dynamics
• Simulates the forces that govern the protein within water
• Since proteins fold naturally, this would lead to the solved structure
Problems:
• Thousands of atoms
• Huge number of time steps to reach the folded protein
• Intractable problem
Approaches to Ab-initio Prediction
Minimal Energy
Assumption: the folded form is the minimal-energy conformation of the protein
Decomposition:
• Define an energy function
• Search for the 3-D conformation that minimizes the energy
Energy Function
Account for the forces that act on the molecule:
• Van der Waals forces
• Covalent bonds
• Hydrogen bonds
• Charges
• Hydrophobic effects
Issues:
• Estimating parameters
• Cost of computing it: O((# atoms)^2)
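The quadratic cost comes from visiting every pair of atoms. As a sketch of one term only, the code below sums a Lennard-Jones 12-6 potential, a common model for van der Waals interactions, over all pairs. The choice of this potential and the parameter values epsilon and sigma are assumptions for illustration; the slides do not specify a functional form.

```python
import math
from itertools import combinations

def lennard_jones_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum a 12-6 Lennard-Jones term over all atom pairs: O(n^2) pairs."""
    total = 0.0
    for a, b in combinations(coords, 2):   # n * (n - 1) / 2 pairs
        r = math.dist(a, b)
        sr6 = (sigma / r) ** 6
        total += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total

# Two atoms at the potential minimum, r = 2^(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
print(lennard_jones_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]))
```

At the minimum-energy separation the pair contributes approximately -epsilon, and at r = sigma it contributes zero. A full energy function would add the other terms listed above, each with its own parameters to estimate.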
Simplified Energy Functions
Different levels of granularity:
• Residue-residue energy function (bead model)
• Partial model: backbone as a bead; side chain as a rigid body that can move with respect to the backbone
• Many other variants
Search Strategy
High dimensional search problem
How do we represent partial solutions?
• Position of each atom (too detailed!)
• Position of each residue (too coarse!)
• Intermediate solutions (e.g., backbone and side chain)
Search Strategy
Representation tradeoffs
X,Y,Z coordinates
• Easy to compute distances between residues
• Might represent infeasible solutions
Angles between successive residues
• Easy to ensure a “legal” protein
• Harder to compute distances
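The tradeoff can be illustrated with a deliberately simplified 2-D chain, which is an assumption made to keep the example short; real backbones need 3-D dihedral geometry. With an angle representation every bond automatically has the right length, but any residue-residue distance requires first walking the chain to rebuild coordinates.

```python
import math

def angles_to_coords(turn_angles, bond_length=1.0):
    """Rebuild 2-D coordinates from turn angles (degrees) between successive bonds."""
    coords = [(0.0, 0.0), (bond_length, 0.0)]
    heading = 0.0
    for turn in turn_angles:
        heading += math.radians(turn)
        x, y = coords[-1]
        coords.append((x + bond_length * math.cos(heading),
                       y + bond_length * math.sin(heading)))
    return coords

chain = angles_to_coords([90.0, 90.0])
# Bond lengths are correct by construction (a "legal" chain) ...
bonds = [math.dist(a, b) for a, b in zip(chain, chain[1:])]
# ... but the end-to-end distance is only available after the conversion.
end_to_end = math.dist(chain[0], chain[-1])
print(bonds, end_to_end)
```

With explicit coordinates the situation is reversed: distances are a single call, but nothing stops a move from stretching a bond to an impossible length.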
Search Strategy
Typical approach:
• Secondary structure prediction
• Attempts at different conformations, keeping secondary structure fixed
• Finer moves, relaxing secondary structure
Use:
• Greedy search
• Simulated annealing
• …
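As a sketch of the simulated annealing option, the generic loop below accepts every downhill move and accepts uphill moves with a probability that shrinks as the temperature is lowered. The toy energy (zero when every angle equals -60 degrees), the move generator, and the cooling schedule are all hypothetical stand-ins, not a real protein energy function.

```python
import math
import random

def simulated_annealing(energy, initial, propose, steps=5000,
                        t_start=2.0, t_end=0.01, seed=0):
    """Minimize energy(state); returns the best state seen and its energy."""
    rng = random.Random(seed)
    current, e_cur = initial, energy(initial)
    best, e_best = current, e_cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        candidate = propose(current, rng)
        delta = energy(candidate) - e_cur
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, e_cur = candidate, e_cur + delta
            if e_cur < e_best:
                best, e_best = current, e_cur
    return best, e_best

def toy_energy(angles):
    """Hypothetical energy with its minimum at -60 degrees for every angle."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)

def perturb(angles, rng):
    """Move one randomly chosen angle by a small Gaussian step."""
    i = rng.randrange(len(angles))
    moved = list(angles)
    moved[i] += rng.gauss(0.0, 20.0)
    return tuple(moved)

start = (120.0,) * 5
best, e_best = simulated_annealing(toy_energy, start, perturb)
print(toy_energy(start), e_best)
```

Greedy search corresponds to the same loop with uphill moves never accepted, which makes it faster but more likely to get stuck in a local minimum.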
Rosetta Method
Idea:
• “Structural” signatures recur within protein structures
• Use these as cues during structure search
Local structure motifs
• Diverging type-2 turn
• Serine hairpin
• Type-I hairpin
• Frayed helix
• Proline helix C-cap
• Alpha-alpha corner
• Glycine helix N-cap
I-sites Library = a catalog of local sequence-structure correlations
Example: Non-polar Alpha-helix
Example: Non-polar beta-strand
Example: Gly alpha-C-cap Type 1
Construction of I-sites library
• Construct profiles (PSI-BLAST like) for each solved structure
• Collect all possible segments of fixed length (len = 3, 9, 15)
• Perform k-means clustering of segments
• Check each cluster for a “coherent” structure (in terms of dihedral angles)
• Prune incoherent structures
• Iteratively refine remaining clusters by removing structurally different segments, redefining cluster membership, etc.
All proteins can be constructed from fragments
Recent experiment:
For representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.
Fragment insertion Monte Carlo
Rosetta: a folding simulation program
The state is the set of backbone torsion angles. Each iteration, in order:
• Choose a fragment (from the fragments library)
• Change backbone angles
• Convert to 3D
• Evaluate (energy function)
• Accept or reject
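The loop on this slide (choose a fragment, change backbone angles, convert to coordinates, evaluate, accept or reject) can be sketched as below. This is a toy under stated assumptions and not Rosetta's implementation: the chain is 2-D, the fragment library is hypothetical, and the energy simply rewards compactness and penalizes clashes.

```python
import math
import random

def to_coords(angles):
    """2-D stand-in for 'convert to 3D': unit bonds, turn angles in degrees."""
    coords, heading = [(0.0, 0.0), (1.0, 0.0)], 0.0
    for turn in angles:
        heading += math.radians(turn)
        x, y = coords[-1]
        coords.append((x + math.cos(heading), y + math.sin(heading)))
    return coords

def toy_energy(coords):
    """Hypothetical energy: compactness plus a penalty for overlapping beads."""
    n = len(coords)
    cx = sum(x for x, _ in coords) / n
    cy = sum(y for _, y in coords) / n
    compact = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in coords) / n
    clashes = sum(1 for i in range(n) for j in range(i + 2, n)
                  if math.dist(coords[i], coords[j]) < 0.5)
    return compact + 10.0 * clashes

# Hypothetical fragment library: short runs of turn angles.
LIBRARY = [(60.0, 60.0, 60.0), (0.0, 0.0, 0.0),
           (90.0, -90.0, 90.0), (-60.0, -60.0, -60.0)]

def fragment_insertion_mc(n_angles, library, steps=2000, temperature=0.5, seed=0):
    """Return the lowest-energy angle list seen and its energy."""
    rng = random.Random(seed)
    angles = [0.0] * n_angles                      # start fully extended
    e_cur = toy_energy(to_coords(angles))
    best, e_best = list(angles), e_cur
    for _ in range(steps):
        frag = rng.choice(library)                 # choose a fragment
        start = rng.randrange(n_angles - len(frag) + 1)
        trial = angles[:start] + list(frag) + angles[start + len(frag):]
        e_trial = toy_energy(to_coords(trial))     # convert and evaluate
        delta = e_trial - e_cur
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            angles, e_cur = trial, e_trial         # accept, otherwise reject
            if e_cur < e_best:
                best, e_best = list(angles), e_cur
    return best, e_best

best_angles, best_energy = fragment_insertion_mc(12, LIBRARY)
print(best_energy)
```

Because every move copies angles from the library, the search only visits conformations built from locally plausible pieces, which is the point of using fragments as cues.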
Rosetta’s energy function
Sequence-dependent features: residue-residue contact energies, evaluated on the current structure, are derived from the database.
Sequence-independent features: the energy score for a contact between secondary structures is summed using database statistics (vector representation; probabilities from the database).
Rosetta prediction results
61% “topologically correct”
60% “locally correct”
73% secondary structure (Q3) correct
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
Evaluation of partially correct predictions
Tertiary structure
[Plot: RMSD along the sequence for window sizes L=30, L=20, L=8, with a cutoff at 6.0Å]
Tertiary structure %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0Å
Local structure
[Plot: MDA along the sequence (L = window size), with a cutoff at 90°]
mda = maximum deviation in backbone angles over an 8-residue window.
Local structure %correct is the fraction of the sequence that has mda < 90°.
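The local-structure measure can be sketched in Python as follows. Two details are assumptions, since the slides do not spell them out: a residue is counted as correct when it lies in at least one 8-residue window whose mda is below the cutoff (by analogy with the tertiary-structure definition), and angle differences are wrapped so that 170° and -170° differ by 20°. The example angles are made up.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def local_percent_correct(pred, true, window=8, cutoff=90.0):
    """Percent of residues inside some window whose mda is below the cutoff.

    pred and true are lists of (phi, psi) pairs in degrees.
    """
    n = len(true)
    if len(pred) != n or n < window:
        raise ValueError("need equal-length inputs of at least one window")
    # Per-residue deviation: the worse of the phi and psi errors.
    dev = [max(angle_diff(p[0], t[0]), angle_diff(p[1], t[1]))
           for p, t in zip(pred, true)]
    covered = [False] * n
    for s in range(n - window + 1):
        if max(dev[s:s + window]) < cutoff:        # mda of this window
            for k in range(s, s + window):
                covered[k] = True
    return 100.0 * sum(covered) / n

# Hypothetical 12-residue helix with one badly predicted psi at index 3.
true_angles = [(-60.0, -45.0)] * 12
pred_angles = list(true_angles)
pred_angles[3] = (-60.0, 120.0)
print(local_percent_correct(pred_angles, true_angles))
```

Here only the window starting at index 4 avoids the bad residue, so 8 of the 12 residues count as locally correct. The tertiary-structure measure has the same windowed form but requires superimposing each window before computing RMSD, which is not shown.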
T0116 262-322 (61 residues)
prediction true structure
Topologically correct (rmsd=5.9Å) but helix is mis-predicted as loop.
T0121 126-199 (66 residues)
prediction true structure
Topologically correct (rmsd=5.9Å) but loop is mis-predicted as helix.
T0122 57-153 (97 residues)
...contains a 53 residue stretch with max deviation = 96°
prediction true structure
T0112 153-213
Low rmsd (5.6Å) and all angles correct (mda = 84°), but topologically wrong! (This is rare.)
prediction true structure