Date post: 18-Jan-2016
Upload: buddy-newman
Transcript
Page 1: . Protein Structure Prediction. Protein Structure u Amino-acid chains can fold to form 3-dimensional structures u Proteins are sequences that have (more.

Protein Structure Prediction

Page 2:

Protein Structure

Amino-acid chains can fold to form 3-dimensional structures

Proteins are sequences that have a (more or less) stable 3-dimensional configuration

Page 3:

Why Is Structure Important?

The structure a protein takes is crucial for its function:

It forms “pockets” that can recognize an enzyme substrate

It situates the side chains of specific groups so that they co-locate and form areas with desired chemical/electrical properties

It creates firm structures such as collagen, keratins, and fibroins

Page 4:

Determining Structure

X-ray and NMR methods make it possible to determine the structure of proteins and protein complexes

These methods are expensive and difficult: solving a single protein can take several months of work

A centralized database, the Protein Data Bank (PDB), contains all solved protein structures:

XYZ coordinates of atoms, within a specified precision

~19,000 solved structures
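As a rough illustration of what "XYZ coordinates of atoms" means in practice, the sketch below reads one ATOM record in the fixed-column PDB text format. The record is made up for this example (it is not taken from a real entry), and `make_atom_line` and `parse_atom_line` are hypothetical helper names.

```python
def make_atom_line(serial, atom, res_name, chain, res_seq, x, y, z):
    """Build a PDB-style ATOM record with fixed column widths (illustrative helper)."""
    return "{:<6}{:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}".format(
        "ATOM", serial, atom, res_name, chain, res_seq, x, y, z)


def parse_atom_line(line):
    """Extract atom name, residue, chain, residue number and XYZ from one ATOM record."""
    return {
        "atom": line[12:16].strip(),      # columns 13-16: atom name
        "residue": line[17:20].strip(),   # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "xyz": (                          # columns 31-54: coordinates in angstroms
            float(line[30:38]),
            float(line[38:46]),
            float(line[46:54]),
        ),
    }


# Made-up record: backbone nitrogen of a methionine at residue 1 of chain A.
line = make_atom_line(1, " N", "MET", "A", 1, 27.340, 24.430, 2.614)
atom = parse_atom_line(line)
print(atom["residue"], atom["res_seq"], atom["atom"], atom["xyz"])
```

The three decimal places in the coordinate fields are what limits the stored precision in this format.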

Page 5:

Growth of the Protein Data Bank

Page 6:

Structure is Sequence Dependent

Experiments show that for many proteins, the 3-dimensional structure is a function of the sequence:

Force the protein to lose its structure by introducing agents that change the environment

After the protein is put back in water, the original conformation/activity is restored

However, for complex proteins, there are cellular processes that “help” in folding

Page 7:

Amino Acids

Page 8:

What Forces Hold the Structure?

Structure is supported by several types of chemical bonds/forces

Hydrogen Bonds

Page 9:

What Forces Hold the Structure?

Charge-charge interactions

Positively charged groups prefer to be situated against negatively charged groups

Page 10:

What Forces Hold the Structure?

Disulfide bonds

S-S bonds between cysteine residues

These form during folding

Page 11:

What Forces Hold the Structure?

Hydrophobic effect

Page 12:

Levels of structure

Page 13:

Secondary Structure

α-helix

β-strands

Page 14:

Hydrogen Bonds in α-Helices

Page 15:

β-Strands Form Sheets

Parallel

Anti-parallel

These sheets are held together by hydrogen bonds across strands

Page 16:

Angular Coordinates

Secondary structures force specific angles between residues

Page 17:

Ramachandran Plot

We can relate angles to types of structures
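A minimal sketch of how backbone angles can be computed and related to structure types. `dihedral` returns the torsion angle defined by four atom positions (for φ of residue i these would be C(i-1), N(i), CA(i), C(i); for ψ they would be N(i), CA(i), C(i), N(i+1)). The rectangular regions in `classify` are crude thresholds chosen for illustration only; they are an assumption of this example, not boundaries given in the slides.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees, in (-180, 180], around the bond p1-p2."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def classify(phi, psi):
    """Very rough Ramachandran regions (illustrative thresholds only)."""
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"
    return "other"


# A planar zig-zag of four points is the "trans" arrangement (180 degrees).
print(dihedral((1, 1, 0), (1, 0, 0), (0, 0, 0), (0, -1, 0)))
print(classify(-60, -45), classify(-120, 130), classify(60, 45))
```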

Page 18:

Labeling Secondary Structure

Using both hydrogen-bond patterns and angles, we can assign secondary-structure labels from the XYZ coordinates of the amino acids

These criteria do not lead to an absolute definition of secondary structure

Page 19:

Prediction of Secondary Structure

Input: amino-acid sequence

Output: annotation sequence of three classes:

alpha

beta

other (sometimes called coil/turn)

Measure of success: percentage of residues that were correctly labeled
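The success measure above can be sketched in a few lines. The one-letter encoding (H for alpha, E for beta, C for other) and the two example strings are assumptions made for this illustration.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-class label was predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up labels for a 16-residue chain: H = alpha, E = beta, C = other.
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHCCCCEEEEEC"
print(q3(predicted, observed))  # 13 of 16 residues agree -> 81.25
```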

Page 20:

Protein Folds: sequential, spatial and topological arrangement of secondary structures

The Globin fold

Page 21:

Approaches for structure prediction

Homology modeling (25-30% identity as a predictor)

Fold recognition: remote homology

Ab initio prediction: heavy computations

Page 22:

Newly Determined Structures-Fraction of New Folds

Page 23:

Fraction of new folds (PDB new entries in 1998)

Koppensteiner et al., 2000, JMB 296:1139-1152.

Page 24:

A Finite Number of Protein Folds

Aim: recognize the fold that “matches” a given sequence

Approaches:

PSI-BLAST, profile HMMs, etc.

Threading

Page 25:

Threading: Essential components

Structural template

Neighbor definition

Energy function

[Figure: the sequence ACCECADAAC threaded onto a structural template with positions 1-10]

Pairwise energy table $E_{ab}$:

       A    C    D    E   ...
A     -3   -1    0    0   ...
C     -1   -4    1    2   ...
D      0    1    5    6   ...
E      0    2    6    7   ...
...

Total energy, summed over neighboring positions $i, j$, where $a_i$ and $b_j$ are the residue types placed at those positions:

$E = \sum_{\text{positions } i,j} E_{a_i b_j}$

For ACCECADAAC: $-3 - 1 - 4 - 4 - 1 - 4 - 3 - 3 = -23$
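The three components can be put together in a small sketch: a pairwise energy table, a neighbor (contact) list for the template, and a sum over contacts. The table entries are the ones shown on this slide. The contact list, however, is invented for this example: the template's actual neighbor pairs were in the figure and are not recoverable from the transcript, so `HYPOTHETICAL_CONTACTS` was simply chosen to reproduce the slide's total of -23.

```python
# Pairwise energies from the slide's table (symmetric, so each pair is stored once).
PAIR_ENERGY = {
    ("A", "A"): -3, ("A", "C"): -1, ("A", "D"): 0, ("A", "E"): 0,
    ("C", "C"): -4, ("C", "D"): 1, ("C", "E"): 2,
    ("D", "D"): 5, ("D", "E"): 6,
    ("E", "E"): 7,
}

# Assumption: eight neighboring position pairs (1-based) in the template.
HYPOTHETICAL_CONTACTS = [
    (1, 6), (1, 8), (6, 9),      # A-A contacts
    (2, 5), (2, 10), (3, 10),    # C-C contacts
    (1, 5), (6, 10),             # A-C contacts
]


def pair_energy(a, b):
    """Look up E_ab regardless of the order of the two residue types."""
    return PAIR_ENERGY[(a, b)] if (a, b) in PAIR_ENERGY else PAIR_ENERGY[(b, a)]


def threading_energy(sequence, contacts):
    """Sum pairwise energies over all neighboring template positions."""
    return sum(pair_energy(sequence[i - 1], sequence[j - 1]) for i, j in contacts)


print(threading_energy("ACCECADAAC", HYPOTHETICAL_CONTACTS))  # -23
```

Threading a different sequence onto the same template only changes the residue types looked up at each contact; the contact list stays fixed.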

Page 26:

Fold recognition (threading)

Find the best fold for a protein sequence:

Query sequence: MAHFPGFGQSLLFGYPVYVFGD...

Potential folds: 1) ... 56) ... n)

Scores: -10 ... -123 ... 20.5

Page 27:

GenTHREADER (Jones, 1999, JMB 287:797-815)

For each template:

Provide an MSA

Align the query sequence with the MSA

Assess the alignment by sequence alignment score

Assess the alignment by pairwise potentials

Assess the alignment by solvation function

Record lengths of: alignment, query, template

Page 28:

Essentials of GenTHREADER

Page 29:

Ab-initio Structure Recognition

Goal: Predict structure from “first principles”

Benefits:

Works for novel folds

Shows that we understand the process

Page 30:

Approaches to Ab-initio Prediction

Molecular Dynamics

Simulates the forces that govern the protein within water

Since proteins fold naturally, this would lead to the solved structure

Problems:

Thousands of atoms

Huge number of time steps to reach the folded protein

Intractable problem

Page 31:

Approaches to Ab-initio Prediction

Minimal Energy

Assumption: the folded form is the minimal-energy conformation of the protein

Decomposition:

Define an energy function

Search for the 3-D conformation that minimizes the energy

Page 32:

Energy Function

Account for the forces that act on the molecule:

Van der Waals forces

Covalent bonds

Hydrogen bonds

Charges

Hydrophobic effects

Issues:

Estimating parameters

How do we compute it: evaluating all atom pairs costs O((# atoms)^2)
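To make the quadratic cost concrete, here is a minimal sketch that sums a single pairwise term over every pair of atoms. A Lennard-Jones potential is used as a stand-in for the Van der Waals term; the parameters `epsilon` and `sigma` are arbitrary placeholders, not fitted values, and a real energy function would add the other terms listed above.

```python
import math
from itertools import combinations


def pairwise_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum a Lennard-Jones term over all atom pairs; returns (energy, pairs visited)."""
    energy = 0.0
    n_pairs = 0
    for a, b in combinations(coords, 2):      # N * (N - 1) / 2 pairs
        r = math.dist(a, b)
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
        n_pairs += 1
    return energy, n_pairs


# Four atoms on a line, one unit apart: 6 pairs are evaluated.
atoms = [(float(i), 0.0, 0.0) for i in range(4)]
print(pairwise_energy(atoms))
```

Doubling the number of atoms roughly quadruples the number of pairs, which is why each energy evaluation becomes expensive for a full-atom protein model.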

Page 33:

Simplified Energy Functions

Different levels of granularity:

Residue-residue energy function (bead model)

Partial model: backbone as a bead, side chain as a rigid body that can move with respect to the backbone

Many other variants

Page 34:

Search Strategy

High-dimensional search problem

How do we represent partial solutions?

Position of each atom (too detailed!)

Position of each residue (too coarse!)

Intermediate solutions (e.g., backbone and side chain)

Page 35:

Search Strategy

Representation tradeoffs

X,Y,Z coordinates:

Easy to compute distances between residues

Might represent infeasible solutions

Angles between successive residues:

Easy to ensure a “legal” protein

Harder to compute distances
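The tradeoff can be seen in a toy 2-D chain. With an angle representation every bond has the right length by construction, but a distance between two residues is only available after converting the whole chain to coordinates. The planar geometry, the function name `chain_from_angles`, and the default bond length of 3.8 (roughly the spacing of consecutive alpha carbons, in angstroms) are simplifying assumptions of this sketch.

```python
import math


def chain_from_angles(turn_angles_deg, bond_length=3.8):
    """Convert turning angles (degrees) into 2-D residue coordinates.

    N turning angles describe a chain of N + 2 residues: the first bond
    points along the x-axis, and each angle changes the heading before
    the next bond is laid down.
    """
    x, y, heading = 0.0, 0.0, 0.0
    coords = [(x, y)]
    for turn in [0.0] + list(turn_angles_deg):
        heading += math.radians(turn)
        x += bond_length * math.cos(heading)
        y += bond_length * math.sin(heading)
        coords.append((x, y))
    return coords


coords = chain_from_angles([90, 90], bond_length=1.0)
# Every consecutive pair is exactly one bond length apart ("legal" by construction).
print([round(math.dist(a, b), 6) for a, b in zip(coords, coords[1:])])
# The end-to-end distance needs the full conversion first.
print(round(math.dist(coords[0], coords[-1]), 6))
```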

Page 36:

Search Strategy

Typical approach:

Secondary structure prediction

Attempts at different conformations, keeping secondary structure fixed

Finer moves, relaxing secondary structure

Use:

Greedy search

Simulated annealing

…

Page 37:

Rosetta Method

Idea:

“Structural” signatures recur within protein structures

Use these as cues during structure search

Page 38:

Local structure motifs

Diverging type-2 turn

Serine hairpin

Type-I hairpin

Frayed helix

Proline helix C-cap

Alpha-alpha corner

Glycine helix N-cap

I-sites Library = a catalog of local sequence-structure correlations

Page 39:

Example: Non-polar Alpha-helix

Page 40:

Example: Non-polar beta-strand

Page 41:

Example: Gly alpha-C-cap Type 1

Page 42:

Construction of I-sites library

Construct profiles (PSI-BLAST-like) for each solved structure

Collect all possible segments of fixed length (len = 3, 9, 15)

Perform k-means clustering of segments

Check each cluster for a “coherent” structure (in terms of dihedral angles)

Prune incoherent structures

Iteratively refine remaining clusters by removing structurally different segments, redefining cluster membership, etc.

Page 43:

All proteins can be constructed from fragments

Recent experiment:

For representative proteins, backbones were assembled from a library of 1000 different 5-residue fragments.

Page 44:

Rosetta: a folding simulation program

Fragment insertion Monte Carlo

[Figure: Monte Carlo loop that draws fragments and updates the backbone torsion angles]

Choose a fragment

Change backbone angles

Convert to 3D

Evaluate the energy function

Accept or reject
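The loop on this slide can be sketched as follows. This is a toy under stated assumptions, not Rosetta's implementation: the conformation is a list of (phi, psi) pairs, the two-entry fragment library is made up, the "convert to 3D" step is skipped because the stand-in `toy_energy` works directly on angles, and acceptance uses a Metropolis rule at a fixed, arbitrary temperature.

```python
import math
import random


def toy_energy(angles, target):
    """Stand-in energy: squared angular distance to a target conformation."""
    return sum((p - tp) ** 2 + (s - ts) ** 2
               for (p, s), (tp, ts) in zip(angles, target)) / 1000.0


def fragment_insertion_mc(n_res, fragments, energy, steps=2000,
                          temperature=1.0, seed=0):
    """Fragment-insertion Monte Carlo over backbone torsion angles."""
    rng = random.Random(seed)
    frag_len = len(fragments[0])
    current = [(180.0, 180.0)] * n_res           # start from an extended chain
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        # Choose a fragment and a position, then change the backbone angles.
        fragment = rng.choice(fragments)
        start = rng.randrange(n_res - frag_len + 1)
        trial = current[:start] + list(fragment) + current[start + frag_len:]
        # Evaluate, then accept or reject (Metropolis criterion).
        e_trial = energy(trial)
        delta = e_trial - e_current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


# Hypothetical library: one helix-like and one strand-like 3-residue fragment.
FRAGMENTS = [[(-60.0, -45.0)] * 3, [(-120.0, 130.0)] * 3]
TARGET = [(-60.0, -45.0)] * 9
best, e_best = fragment_insertion_mc(9, FRAGMENTS, lambda a: toy_energy(a, TARGET))
print(round(e_best, 3))
```

Because moves replace whole fragments, each trial stays within locally plausible angle combinations, which is the point of using fragments as cues during the search.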

Page 45:

Rosetta’s energy function

Sequence-dependent features

Residue-residue contact energies are derived from the database

Page 46:

Rosetta’s energy function

Sequence-independent features

The energy score for a contact between secondary structures is summed using database statistics.

[Figure: current structure, its vector representation, and probabilities from the database]

Page 47:

Rosetta prediction results

61% “topologically correct”

60% “locally correct”

73% secondary structure (Q3) correct

http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php

Page 48:

Evaluation of partially correct predictions

Tertiary structure %correct is the fraction of the sequence that is in a 30-residue window with RMSD < 6.0Å

[Figure: tertiary structure panel; axes: Sequence, RMSD; window sizes L=30, L=20, L=8 (L = window size); threshold 6.0Å]

mda = maximum deviation in backbone angles over an 8-residue window.

Local structure %correct is the fraction of the sequence that has mda < 90°.

[Figure: local structure panel; axes: Sequence, MDA; threshold 90°]
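A minimal sketch of the local-structure measure, under one possible reading of the definition: a residue counts as correct if it lies in at least one 8-residue window in which every predicted phi and psi is within 90° of the true value. That reading, the function names, and the example angles are assumptions of this illustration.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def local_percent_correct(predicted, true, window=8, threshold=90.0):
    """Percentage of residues covered by a window whose mda is below the threshold."""
    n = len(true)
    # Per-residue deviation: the worse of the phi and psi errors.
    deviation = [max(angle_diff(pp, tp), angle_diff(ps, ts))
                 for (pp, ps), (tp, ts) in zip(predicted, true)]
    correct = [False] * n
    for start in range(n - window + 1):
        mda = max(deviation[start:start + window])
        if mda < threshold:
            for k in range(start, start + window):
                correct[k] = True
    return 100.0 * sum(correct) / n


# Made-up 16-residue example: one badly predicted phi at residue index 8.
true = [(-60.0, -45.0)] * 16
predicted = list(true)
predicted[8] = (120.0, -45.0)
print(local_percent_correct(predicted, true))  # only residues 0-7 are covered -> 50.0
```

The example shows how a single large angular error spoils every window that contains it, so the measure penalizes local mistakes more heavily than a per-residue count would.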

Page 49:

T0116 262-322 (61 residues)

[Figure: prediction and true structure]

Topologically correct (rmsd=5.9Å) but helix is mis-predicted as loop.

Page 50:

T0121 126-199 (66 residues)

[Figure: prediction and true structure]

Topologically correct (rmsd=5.9Å) but loop is mis-predicted as helix.

Page 51:

T0122 57-153 (97 residues)

[Figure: prediction and true structure]

...contains a 53-residue stretch with max deviation = 96°

Page 52:

T0112 153-213

[Figure: prediction and true structure]

Low rmsd (5.6Å) and all angles correct (mda = 84°), but topologically wrong! (This is rare.)

