Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Date post: 06-Jan-2016
Page 1: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

JM - http://folding.chmcc.org 1

Introduction to Bioinformatics: Lecture XI. Computational Protein Structure Prediction

Jarek Meller

Division of Biomedical Informatics, Children’s Hospital Research Foundation & Department of Biomedical Engineering, UC

Page 2: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Outline of the lecture

Protein structure and complexity of conformational search: from similarity based methods to de novo structure prediction

Multiple sequence alignment and family profiles

Secondary structure and solvent accessibility prediction

Matching sequences with known structures: threading and fold recognition

Ab initio folding simulations

Page 3: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Polypeptide chains: backbone and side-chains

C-ter

N-ter

Page 4: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Distinct chemical nature of amino acid side-chains

ARG

PHE

GLU

VAL

CYS

C-ter

N-ter

Page 5: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Hydrogen bonds and secondary structures

helix

strand

Page 6: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Tertiary structure and long range contacts: annexin

Page 7: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Quaternary structure and protein-protein interactions: annexin hexamer

Page 8: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Domains, interactions, complexes: cyclin D and Cdk

Cyclin Box

Page 9: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Domains, interactions, complexes: VHL

HIF-1

Elongin B

Elongin C

VHL

Page 10: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Protein folding problem

The protein folding problem consists of predicting the three-dimensional structure of a protein from its amino acid sequence

Hierarchical organization of protein structures helps to break the problem into secondary structure, tertiary structure and protein-protein interaction predictions

Computational approaches for protein structure prediction: similarity based and de novo methods

Page 11: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Polypeptide chains: backbone and rotational degrees of freedom

        H     O           R2
        |     ||          |
NH3+ -- Ca -- C  -- N  -- Ca -- C -- O-
        |           |     |      \\
        R1          H     H        O

The equilibrium length of the peptide bond (C -- N) is about 1.3 [Ang]. The average Ca - Ca distance in a polypeptide chain is about 3.8 [Ang]. The angle of rotation around the N - Ca bond is called φ, and the angle around the Ca - C bond is called ψ. These two angles define the overall conformation of polypeptide chains. Simplifying, there are three discrete states (rotations) for each of these single bonds, i.e. 3 x 3 = 9 states per residue, implying $9^N$ possible backbone conformations for a chain of N residues.
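To get a feeling for how quickly the number of backbone conformations grows with chain length, the count can be evaluated directly. The sketch below is only an illustration of the three-states-per-bond simplification; the function names and the sampling rate of 10^12 conformations per second are assumptions made for this example, not values from the lecture.

```python
import math

# 3 discrete states for phi times 3 for psi (the simplification used on this slide)
STATES_PER_RESIDUE = 9


def backbone_conformations(n_residues: int) -> int:
    """Number of discrete backbone conformations for a chain of n_residues."""
    return STATES_PER_RESIDUE ** n_residues


def enumeration_time_years(n_residues: int, rate_per_second: float = 1e12) -> float:
    """Years needed to visit every conformation at a hypothetical sampling rate."""
    seconds = backbone_conformations(n_residues) / rate_per_second
    return seconds / (365.25 * 24 * 3600)


if __name__ == "__main__":
    for n in (10, 50, 100):
        exponent = math.log10(backbone_conformations(n))
        print(f"N={n}: about 10^{exponent:.1f} conformations, "
              f"{enumeration_time_years(n):.3g} years to enumerate")
```

Even for a short chain, exhaustive enumeration is out of reach, which is why the conformational search has to rely on heuristics or on similarity to known structures.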

Page 12: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Scoring alternative conformations with empirical force fields (folding potentials)

misfolded

native

E

Ideally, each misfolded structure should have an energy higher than the native energy, i.e.:

$E_{\mathrm{misfolded}} - E_{\mathrm{native}} > 0$
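The condition above can be checked mechanically for a set of alternative (decoy) structures once their energies are available. The following is a minimal sketch; the function names and the energy values in the usage example are made up for illustration and do not come from any real force field.

```python
def energy_gaps(native_energy, decoy_energies):
    """Return E_misfolded - E_native for every misfolded (decoy) structure."""
    return [e - native_energy for e in decoy_energies]


def native_is_recognized(native_energy, decoy_energies):
    """True if every decoy has a strictly higher energy than the native structure."""
    return all(gap > 0 for gap in energy_gaps(native_energy, decoy_energies))


if __name__ == "__main__":
    # Hypothetical energies in arbitrary units
    native = -42.0
    decoys = [-35.5, -40.1, -12.3]
    print(energy_gaps(native, decoys))
    print(native_is_recognized(native, decoys))
```

A single decoy with an energy below the native one is enough to make the check fail, which is one way the inaccuracy of a scoring function shows up in practice.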

Page 13: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Ab initio (or de novo) folding simulations

When dealing with a new fold, the similarity based methods cannot be applied

Ab initio folding simulations consist of conformational search with an empirical scoring function (“force field”) to be maximized (or minimized)

Computational bottleneck: exponential search space and sampling problem (global optimization!)

Fundamental problem: inaccuracy of empirical force fields

Importance of mixed protocols, such as Rosetta by D. Baker and colleagues (more when Monte Carlo protocols for global optimization are introduced)
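As a rough illustration of conformational search with a scoring function, the sketch below runs a Metropolis Monte Carlo search over the nine discrete backbone states per residue. It is a toy under stated assumptions: the energy function (number of residues that differ from a hidden target conformation), the temperature, and all names are hypothetical, and this is not the Rosetta protocol.

```python
import math
import random

N_STATES = 9  # discrete (phi, psi) states per residue, as in the 9^N estimate


def toy_energy(conformation, target):
    """Hypothetical scoring function: number of residues not in their target state."""
    return sum(1 for s, t in zip(conformation, target) if s != t)


def metropolis_search(target, n_steps=5000, temperature=0.5, seed=0):
    """Minimize toy_energy with single-residue moves and the Metropolis criterion."""
    rng = random.Random(seed)
    n = len(target)
    current = [rng.randrange(N_STATES) for _ in range(n)]
    current_e = toy_energy(current, target)
    initial_e = current_e
    best, best_e = list(current), current_e
    for _ in range(n_steps):
        i = rng.randrange(n)
        old_state = current[i]
        current[i] = rng.randrange(N_STATES)  # propose a new state for residue i
        delta = toy_energy(current, target) - current_e
        # Accept downhill moves always, uphill moves with Boltzmann probability
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_e += delta
            if current_e < best_e:
                best, best_e = list(current), current_e
        else:
            current[i] = old_state  # reject the move and restore the old state
    return best, best_e, initial_e


if __name__ == "__main__":
    best, best_e, initial_e = metropolis_search([4] * 20)
    print(f"initial energy {initial_e}, best energy found {best_e}")
```

With a realistic force field the energy landscape is rugged rather than funnel-like as in this toy, so the sampling problem is much harder than the example suggests.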

Page 14: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Similarity based approaches to structure prediction: from sequence alignment to fold recognition

High level of redundancy in biology: sequence similarity is often sufficient to use the “guilt by association” rule: if similar sequence then similar structure and function

Multiple alignments and family profiles can detect evolutionary relatedness at much lower levels of sequence similarity, which are hard to detect with pairwise sequence alignments: PSI-BLAST by S. Altschul et al.

For sufficiently close proteins one may superimpose the backbones using sequence alignment and then perform conformational search (with the backbone fixed) to find the optimal geometry (according to an atomistic empirical force field) of the side-chains: homology modeling (e.g. Modeller by A. Sali et al.)

Many structures are already known (see PDB) and one can match sequences directly with structures to enhance structure recognition: fold recognition

For both fold recognition and de novo simulation, prediction of intermediate attributes such as secondary structure or solvent accessibility helps to achieve better sensitivity and specificity

Page 15: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Protein families and domains

PFAM (7246 families as of April 2004): http://www.sanger.ac.uk/Software/Pfam/

PRODOM: http://prodes.toulouse.inra.fr/prodom/current/html/home.php

CDD: http://www.ncbi.nlm.nih.gov/Structure/cdd/cddsrv.cgi

Check: pfam00134.11, Cyclin_N

The notion of protein family is derived from evolutionary considerations: members of the same family are related, perform the same function and are assumed to have diverged from the same ancestor.

The notion of domain is derived from structural considerations: “A domain is defined as an autonomous structural unit, or a reusable sequence unit that may be found in multiple protein contexts”, Bateman et al.

Page 16: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Multiple alignment and PSSM

Page 17: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Multiple alignment, clustering and families

DP search gives the optimal solution, but its cost scales exponentially with the number of sequences K, $O(n^K)$ for sequences of length n, so it is not practical for more than 3 or 4 sequences.

Standard heuristics start from pairwise alignments (e.g. PSI-BLAST, ClustalW)

Hidden Markov Model approach to family profiles (profile HMM) as an alternative with pre-fixed parameters, trained separately for each family. Some initial multiple alignments necessary for training (next lecture).
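The exponential scaling of exact dynamic programming for multiple alignment can be made concrete by counting the cells of the K-dimensional DP lattice. The sketch below assumes K sequences of equal length n and the standard formulation in which each cell is reached from up to 2^K - 1 predecessor cells; the function names are chosen for this example only.

```python
def dp_cells(n: int, k: int) -> int:
    """Cells in the K-dimensional DP lattice for K sequences of length n."""
    return (n + 1) ** k


def predecessors_per_cell(k: int) -> int:
    """Each cell is reached from up to 2^K - 1 neighbours (every non-empty
    subset of sequences may advance by one residue in an alignment column)."""
    return 2 ** k - 1


def dp_work(n: int, k: int) -> int:
    """Rough operation count for exact multiple alignment by DP."""
    return dp_cells(n, k) * predecessors_per_cell(k)


if __name__ == "__main__":
    for k in (2, 3, 4, 6):
        print(f"K={k}, n=300: about {dp_work(300, k):.2e} cell updates")
```

For K = 2 this reduces to the familiar quadratic pairwise alignment, while a handful of sequences already pushes the count beyond what is practical, which motivates the progressive heuristics and profile HMMs listed above.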

Page 18: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Predicting 1D protein profiles from sequences: secondary structures and solvent accessibility

SABLE server: http://sable.cchmc.org

POLYVIEW server: http://polyview.cchmc.org

a) Multiple alignment and family profiles improve prediction of local structural propensities.

b) Use of advanced machine learning techniques, such as Neural Networks or Support Vector Machines, improves results as well.

B. Rost and C. Sander were first to achieve more than 70% accuracy in three state (H, E, C) classification, applying a) and b).

Page 19: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Predicting 1D protein profiles from sequences: secondary structures and solvent accessibility

Page 20: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Predicting transmembrane domains

Page 21: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

“Hydropathy” profiles and membrane domains prediction

Problem: Design a simple algorithm for finding putative trans-membrane regions based on “hydropathy” (or hydrophobicity) profiles. Consider an extension based on prototypes and k-NN.
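One possible starting point for the problem above is a sliding-window average of per-residue hydropathy values. The sketch below uses the Kyte-Doolittle scale with a window of 19 residues and a threshold of 1.6, which are commonly quoted defaults for this scale; these choices and the function names are assumptions of the example, not a prescribed solution, and the k-NN extension is not covered.

```python
# Kyte-Doolittle hydropathy scale (one-letter amino acid codes)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every window; entry i covers residues i..i+window-1.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if len(values) < window:
        return []
    running = sum(values[:window])
    profile = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one residue
        profile.append(running / window)
    return profile


def putative_tm_regions(sequence, window=19, threshold=1.6):
    """Merge all windows scoring above threshold into (start, end) ranges
    (0-based residue indices, end exclusive)."""
    regions = []
    for start, score in enumerate(hydropathy_profile(sequence, window)):
        if score > threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend the previous region
            else:
                regions.append((start, end))
    return regions


if __name__ == "__main__":
    # Artificial sequence: a hydrophobic stretch flanked by charged residues
    toy = "D" * 25 + "L" * 20 + "D" * 25
    print(putative_tm_regions(toy))
```

Because whole windows are reported, the predicted region extends a few residues beyond the actual hydrophobic stretch; trimming the boundaries or comparing window profiles against known membrane-segment prototypes would be natural refinements.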

Page 22: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Predicting transmembrane domains

Page 23: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Going beyond sequence similarity: threading and fold recognition

When sequence similarity is not detectable, use a library of known structures to match your query with target structures.

As in the case of de novo folding, one needs a scoring function that measures compatibility between sequences and structures.

Page 24: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Why “fold recognition”?

Divergent (common ancestor) vs. convergent (no common ancestor) evolution

PDB: virtually all pairs of proteins with more than 30% sequence identity have similar structures; however, most pairs of similar structures share only up to 10% sequence identity!

www.columbia.edu/~rost/Papers/1997_evolution/paper.html (B. Rost)

www.bioinfo.mbb.yale.edu/genome/foldfunc/ (H. Hegyi, M. Gerstein)

Page 25: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Simple contact model for protein structure prediction

Each amino acid is represented by a point in 3D space and two amino acids are said to be in contact if their distance is smaller than a cutoff distance, e.g. 7 [Ang].
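The contact definition above translates directly into code. In this minimal sketch each residue is one point (for instance its Ca atom), and the `min_separation` parameter, which skips trivial contacts between chain neighbours, is an assumption added for the example rather than part of the definition on the slide.

```python
import math


def contact_map(coords, cutoff=7.0, min_separation=2):
    """Return residue pairs (i, j), i < j, closer than cutoff (in Angstroms).

    coords: list of (x, y, z) points, one per residue.
    min_separation: minimal |i - j| along the chain for a pair to be considered.
    """
    contacts = []
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


if __name__ == "__main__":
    # Four hypothetical residue positions spaced 3.8 Angstroms apart
    points = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
    print(contact_map(points))
```

The resulting list of pairs is the only structural information a contact model uses, which is what makes it possible to treat a structure as a string of "structural sites".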

Page 26: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Sequence-to-structure matching with contact models

Generalized string matching problem: aligning a string of amino acids against a string of “structural sites” characterized by other residues in contact

Finding an optimal alignment with gaps using inter-residue pairwise models, $E = \sum_{k<l} \epsilon_{kl}$, is NP-hard because of the non-local character of scores at a given structural site (identity of the interaction partners may change depending on location of gaps in the alignment)

R.H. Lathrop, Protein Eng. 7 (1994)
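The non-local character of pairwise scores can be seen in a small example: the energy of a threading is a sum over contacts between structural sites, and moving a single gap changes which residues occupy the interacting sites. The sketch below is an illustration under assumptions; the two-letter hydrophobic/polar (H/P) energy, the contact list, and the alignments are all invented for the example.

```python
def threading_energy(sequence, alignment, contacts, pair_energy):
    """Sum pair energies over all contacts whose two sites are both occupied.

    alignment[site] is the index of the residue placed at that structural
    site, or None if the alignment has a gap there.
    contacts is a list of (k, l) site pairs with k < l.
    """
    total = 0.0
    for k, l in contacts:
        a, b = alignment[k], alignment[l]
        if a is not None and b is not None:
            total += pair_energy(sequence[a], sequence[b])
    return total


def hp_energy(x, y):
    """Toy hydrophobic contact model: only H-H contacts are rewarded."""
    return -1.0 if x == "H" and y == "H" else 0.0


if __name__ == "__main__":
    seq = "HPHPP"
    site_contacts = [(0, 2), (1, 4)]        # hypothetical template with 6 sites
    gap_at_end = [0, 1, 2, 3, 4, None]      # residues fill sites 0..4
    gap_at_start = [None, 0, 1, 2, 3, 4]    # same sequence shifted by one site
    print(threading_energy(seq, gap_at_end, site_contacts, hp_energy))
    print(threading_energy(seq, gap_at_start, site_contacts, hp_energy))
```

Because the score at one site depends on what the alignment placed at its partner sites, the usual dynamic programming recursion, which assumes independent column scores, no longer yields the optimum.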

Page 27: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Hydrophobic contact model and sequence-to-structure alignment

HPHPP-

Solutions to this, yet another instance of the global optimization problem:

a) Heuristic (e.g. frozen environment approximation)

b) “Profile” or local scoring functions (folding potentials)

Page 28: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Using sequence similarity, predicted secondary structures and contact potentials: fold recognition protocols

In practice fold recognition methods are often mixtures of sequence matching and threading (e.g., with compatibility between a sequence and a structure measured by contact potentials, and predicted secondary structures compared to the secondary structure of a template).

D. Fischer and D. Eisenberg, Curr. Opinion in Struct. Biol. 1999, 9: 208

Page 29: Introduction to Bioinformatics: Lecture XI Computational Protein Structure Prediction

Some fold recognition servers

PSI-BLAST (Altschul SF et al., Nucl. Acids Res. 25: 3389)

LiveBench evaluation (http://BioInfo.PL/LiveBench/1/):

1. FFAS (L. Rychlewski, L. Jaroszewski, W. Li, A. Godzik (2000), Protein Science 9: 232): seq. profile against profile

2. 3D-PSSM (Kelley LA, MacCallum RM, Sternberg MJE, JMB 299: 499): 1D-3D profile combined with secondary structures and solvation potential

3. GenTHREADER (Jones DT, JMB 287: 797): seq. profile combined with pairwise interactions and solvation potential

LOOPP: annotations of remote homologs

http://www.tc.cornell.edu/CBIO/loopp

