
Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Date post: 23-Dec-2015
Page 1: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Protein Structure Prediction

Xiaole Shirley Liu and Jun Liu

STAT115

Page 2: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Protein Structure Prediction

Ram Samudrala

University of Washington

Page 3: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Outline

• Motivations and introduction

• Protein 2nd structure prediction

• Protein 3D structure prediction
– CASP
– Homology modeling
– Fold recognition
– ab initio prediction
– Manual vs automation

• Structural genomics

Page 4: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Protein Structure

• Sequence determines structure, structure determines function

• Most proteins can fold by themselves very quickly

• Folded structure: lowest energy state

Page 5: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Protein Structure

• Main forces to consider
– Steric complementarity
– Secondary structure preferences (satisfy H bonds)
– Hydrophobic/polar patterning
– Electrostatics

Page 6: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Rationale for understanding protein structure and function

• Protein sequence
– Large numbers of sequences, including whole genomes

• Protein structure
– Three dimensional
– Complicated
– Mediates function

• Protein function
– Rational drug design and treatment of disease
– Protein and genetic engineering
– Build networks to model cellular pathways
– Study organismal function and evolution

• From sequence to structure: structure determination, structure prediction

• From structure to function: homology, rational mutagenesis, biochemical analysis, model studies

• From sequence directly to function: ?

Page 7: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


Protein Databases

• SwissProt: protein knowledgebase

• PDB: Protein Data Bank, 3D structure

Page 8: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

View Protein Structure

• Free interactive viewers

• Download 3D coordinate file from PDB

• Quick and dirty:
– VRML
– Rasmol
– Chime

• More powerful:
– Swiss-PdbViewer

Page 9: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Compare Protein Structures

• Structure is more conserved than sequence

• Why compare?
– Detect evolutionary relationships
– Identify recurring structural motifs
– Predict function based on structure
– Assess predicted structures

• Protein structure comparison and classification
– Manual: SCOP
– Automated: DALI

Page 10: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Compare protein structures

• Need ways to determine if two protein structures are related and to compare predicted models to experimental structures

• Commonly used measure is the root mean square deviation (RMSD) of the Cartesian coordinates of corresponding atoms in two structures after optimal superposition (McLachlan, 1979):

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(dx_i^2 + dy_i^2 + dz_i^2\right)}$$

• Usually use Cα atoms

[Figure: NK-lysin (1nkl), Bacteriocin T102/as48 (1e68), and the T102 best model; RMSD labels 3.6 Å and 2.9 Å]

• Other measures include contact maps and torsion angle RMSDs
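The RMSD measure on this slide can be sketched in a few lines. This is a minimal illustration that assumes the two structures are already optimally superposed and that atoms are paired by list position; the function name `rmsd` and the toy coordinates are made up for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        dx, dy, dz = xa - xb, ya - yb, za - zb
        total += dx * dx + dy * dy + dz * dz  # dx_i^2 + dy_i^2 + dz_i^2
    return math.sqrt(total / len(coords_a))


# Toy coordinates (hypothetical): the model is the native shifted by 1 Å in z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
native = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(model, native))
```

A real comparison would first superpose the two structures (for example with a least-squares fit) before calling a function like this.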

Page 11: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

SCOP

• Compare protein structure, identify recurring structural motifs, predict function

• A. Murzin et al, 1995

– Manual classification

– A few folds are highly populated

– 5 folds contain 20% of all homologous superfamilies

– Some folds are multifunctional

Page 12: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Determine Protein Structure

• X-ray crystallography (gold standard)

– Grow crystals, rate limiting, relies on the repeating structure of a crystalline lattice

– Collect a diffraction pattern

– Map to real space electron density, build and refine structural model

– Painstaking and time consuming

Page 13: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Protein Structure Prediction

• Since AA sequence determines structure, can we predict protein structure from its AA sequence? This amounts to predicting the three backbone torsion angles (φ, ψ, ω) of every residue: unlimited degrees of freedom (DoF)!

• Physical properties that determine fold

– Rigidity of the protein backbone

– Interactions among amino acids, including
  • Electrostatic interactions
  • van der Waals forces
  • Volume constraints
  • Hydrogen, disulfide bonds

– Interactions of amino acids with water

Page 14: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

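Protein folding landscape

• Large multi-dimensional space of changing conformations

[Figure: free energy versus folding reaction coordinate, running from the unfolded state through the molten globule (timescale label 10^-8 s) to the native state (timescale label 10^-3 s), with the barrier height marked ΔG**]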

Page 15: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Protein primary structure

• Twenty types of amino acids

[Figure: chemical structure of a single amino acid, with amino group, Cα carrying side chain R, and carboxyl group]

• Two amino acids join by forming a peptide bond

[Figure: chemical structure of a polypeptide chain built from repeated N–Cα–C units with side chains R]

• Each residue in the amino acid main chain has two degrees of freedom (φ and ψ)

• The amino acid side chains can have up to four degrees of freedom (χ1-4)

Page 16: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

2nd Structure Prediction

• α helix, β sheet, turn/loop

Page 17: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

2nd Structure Prediction

• Chou-Fasman 1974

• Based on 15 proteins (2473 AAs) of known conformation, determine Pα, Pβ from 0.5-1.5

$$P_i^s = \frac{f_i^s}{\sum_j f_j^s / 20}$$

• Empirical rules for 2nd struct nucleation
– α helix: 4 H or h out of 6 AA, extends to both dir, ⟨Pα⟩ > 1.03, Pα > Pβ, no breakers
– β sheet: 3 H or h out of 5 AA, extends to both dir, ⟨Pβ⟩ > 1.05, Pβ > Pα, no breakers

• Have ~50-60% accuracy

Page 18: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Pα and Pβ

Page 19: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

2nd Structure Prediction

• Garnier, Osguthorpe, Robson, 1978

• Assumption: each AA influenced by flanking positions

• GOR scoring tables (problem: limited dataset)

• Add scores, assign the 2nd structure with the highest score
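The "add scores, pick the best state" step could look roughly like the sketch below. The window half-width, the state labels, and the score tables are toy assumptions made up for this example; they are not the published GOR parameters.

```python
STATES = ("H", "E", "C")  # helix, sheet (strand), coil


def predict(sequence, tables, half_window):
    """Assign each position the state with the highest summed window score.

    tables[state] maps (offset, amino_acid) to a score; missing entries count as 0.
    """
    prediction = []
    for i in range(len(sequence)):
        totals = {}
        for state in STATES:
            score = 0.0
            for offset in range(-half_window, half_window + 1):
                j = i + offset
                if 0 <= j < len(sequence):  # ignore positions past the ends
                    score += tables[state].get((offset, sequence[j]), 0.0)
            totals[state] = score
        prediction.append(max(STATES, key=lambda s: totals[s]))
    return "".join(prediction)


# Toy tables (hypothetical): A favors helix, V favors sheet, G favors coil.
toy_tables = {
    "H": {(o, "A"): 1.0 for o in (-1, 0, 1)},
    "E": {(o, "V"): 1.0 for o in (-1, 0, 1)},
    "C": {(o, "G"): 1.0 for o in (-1, 0, 1)},
}
print(predict("AAAVVVGGG", toy_tables, half_window=1))
```

Because each position is scored from its neighbors as well as itself, a single out-of-place residue tends to be absorbed into the surrounding state.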

Page 20: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

2nd Structure Prediction

• D. Eisenberg, 1986
– Plot hydrophobicity as function of sequence position, look for periodic repeats
– Period = 3-4 AA: α helix (3.6 AA per turn)
– Period = 2 AA: β sheet

• Best overall: JPRED by Geoffrey Barton, which uses many different approaches and takes the consensus
– Overall accuracy: 72.9%
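One way to look for periodic hydrophobicity is to sum per-residue hydrophobicity values as vectors rotated by a fixed angle per residue: about 100° per residue for a 3.6-residue helical turn and 180° for a period of 2. The sketch below assumes that formulation; the two-letter hydrophobicity scale is a made-up toy, not a published scale.

```python
import math


def periodicity_strength(values, degrees_per_residue):
    """Magnitude of the vector sum of values, rotating by a fixed angle per residue."""
    step = math.radians(degrees_per_residue)
    sin_sum = sum(h * math.sin(n * step) for n, h in enumerate(values))
    cos_sum = sum(h * math.cos(n * step) for n, h in enumerate(values))
    return math.hypot(sin_sum, cos_sum)


# Toy scale (hypothetical): +1 for a hydrophobic residue, -1 for a polar one.
toy_scale = {"V": 1.0, "S": -1.0}
strand_like = [toy_scale[aa] for aa in "VSVSVSVS"]  # period of 2 residues

print(periodicity_strength(strand_like, 180.0))  # strong signal at period 2
print(periodicity_strength(strand_like, 100.0))  # weak signal at helical period
```

A segment would be called sheet-like or helix-like depending on which rotation angle gives the larger value.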

Page 21: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

3D Protein Structure Prediction

• CASP contest: Critical Assessment of Structure Prediction

• Biennial meeting since 1994 at Asilomar, CA

• Experimentalists: before CASP, submit sequence of to-be-solved structure to central repository

• Predictors: download sequence and minimal information, make predictions in three categories

• Assessors: automatic programs and experts to evaluate prediction quality

Page 22: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

CASP Category I

• Homology Modeling (sequences with high homology to sequences of known structure)

• Given a sequence with > 25-30% sequence identity to a known structure in PDB, use the known structure as starting point to create a model of the 3D structure of the sequence

• Takes advantage of knowledge of a closely related protein. Use sequence alignment techniques to establish correspondences between known “template” and unknown.

Page 23: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

CASP Category II

• Fold recognition (sequences with low sequence identity (<= 30%) to sequences of known structure)

• Given the sequence, and a set of folds observed in PDB, see if the sequence could adopt one of the known folds

• Takes advantage of knowledge of existing structures, and principles by which they are stabilized (favorable interactions)

Page 24: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

CASP Category III

• Ab initio prediction (no known homology with any sequence of known structure)

• Given only the sequence, predict the 3D structure from “first principles”, based on energetic or statistical principles

• Secondary structure prediction and multiple alignment techniques are used to predict features of these molecules. A separate method is then needed to assemble the 3D structure.

Page 25: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Structure Prediction Evaluation

• Hydrophobic core similar?

• 2nd struct identified?

• Energy: minimized? H-bond contacts?

• Compare with solved crystal structure: gold standard

$$\mathrm{RMSD}(N; x, y) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\lVert x_i - y_i\rVert^2}$$

Page 26: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Comparative modelling of protein structure

• Scan

• Align

[Figure: example pairwise sequence alignment with gaps, conserved positions marked with asterisks]

• Build initial model

• Construct non-conserved side chains and main chains

• Refine

Page 27: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Homology Modeling Results

• When sequence homology is > 70%, high resolution models are possible (< 3 Å RMSD)

• MODELLER (Sali et al)
– Find homologous proteins with known structure and align
– Collect distance distributions between atoms in known protein structures
– Use these distributions to compute positions for equivalent atoms in alignment
– Refine using energetics

Page 28: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Homology Modeling Results

• Many places can go wrong:
– Bad template: it doesn’t have the same structure as the target after all
– Bad alignment (a very common problem)
– Good alignment to good template still gives wrong local structure
– Bad loop construction
– Bad side chain positioning

Page 29: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


Homology Modeling Results

• Use of sensitive multiple alignment (e.g. PSI-BLAST) techniques helped get best alignments

• Sophisticated energy minimization techniques do not dramatically improve upon initial guess

Page 30: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Fold Recognition Results

• Also called protein threading

• Given new sequence and library of known folds, find best alignment of sequence to each fold, return the most favorable one

Page 31: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


Fold Recognition with Dynamic Programming

• Environmental class for each AA based on known folds (buried status, polarity, 2nd struct)

Page 32: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Protein Folding with Dynamic Programming

• D. Eisenberg 1994

• Align sequence to each fold (a string of environmental classes)

• Advantages: fast and works pretty well

• Disadvantages: does not consider AA contacts
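Aligning a sequence to a string of environmental classes can be sketched as an ordinary global alignment by dynamic programming. The environment labels ("B" buried, "E" exposed), the compatibility scores, the gap penalty, and the two-fold library below are all hypothetical toy values chosen for illustration.

```python
def thread_score(sequence, fold_envs, profile, gap=-1.0):
    """Best global alignment score of a sequence against a string of environment classes.

    profile maps (environment_class, amino_acid) to a compatibility score.
    """
    n, m = len(sequence), len(fold_envs)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = profile.get((fold_envs[j - 1], sequence[i - 1]), 0.0)
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair,  # residue i placed in environment j
                dp[i - 1][j] + gap,       # residue i left unaligned
                dp[i][j - 1] + gap,       # environment j left unfilled
            )
    return dp[n][m]


def best_fold(sequence, fold_library, profile):
    """Name of the fold whose environment string gives the highest threading score."""
    return max(fold_library, key=lambda name: thread_score(sequence, fold_library[name], profile))


# Toy profile (hypothetical): L prefers buried positions, K prefers exposed ones.
toy_profile = {("B", "L"): 2.0, ("B", "K"): -2.0, ("E", "K"): 2.0, ("E", "L"): -2.0}
toy_library = {"alternating": "BEBE", "all_buried": "BBBB"}
print(best_fold("LKLK", toy_library, toy_profile))
```

Each position is scored independently of its spatial neighbors, which is why this kind of alignment is fast and also why it ignores AA contacts.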

Page 33: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


Fold Recognition Results

• Each predictor can submit N top hits

• Every predictor does well on something

• Common folds (more examples) are easier to recognize

• Fold recognition was the surprise performer at CASP1. Incremental progress at CASP2, CASP3, CASP4…

Page 34: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


Fold Recognition Results

• Alignment (seq to fold) is a big problem

Page 35: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


ab initio

• Predict interresidue contacts and then compute structure (mild success)

• Simplified energy term + reduced search space (phi/psi or lattice) (moderate success)

• Creative ways to memorize sequence structure correlations in short segments from the PDB, and use these to model new structures: ROSETTA

Page 36: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Ab initio prediction of protein structure

• Sample conformational space such that native-like conformations are found
– Astronomically large number of conformations: 5 states per residue over 100 residues = 5^100 ≈ 10^70

• Select
– Hard to design functions that are not fooled by non-native conformations (“decoys”)
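The size of the search space quoted on this slide is easy to check. The short sketch below just redoes the arithmetic for 5 states per residue and 100 residues; the variable names are illustrative.

```python
import math

states_per_residue = 5
residues = 100

# Exact count of conformations, and the same number as a power of ten.
conformations = states_per_residue ** residues
exponent = residues * math.log10(states_per_residue)

print(f"5^100 has {len(str(conformations))} digits, about 10^{exponent:.1f}")
```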

Page 37: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Sampling conformational space – continuous approaches

• Most work in the field
– Molecular dynamics
– Continuous energy minimization (follow a valley)
– Monte Carlo simulation
– Genetic algorithms

• Like real polypeptide folding process

• Cannot be sure if native-like conformations are sampled

Page 38: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Molecular dynamics

• Force = -dU/dx (slope of potential U); acceleration follows from force = m × a(t)

• All atoms are moving so forces between atoms are complicated functions of time

• Analytical solution for x(t) and v(t) is impossible; numerical solution is trivial

• Atoms move for very short time steps δt of 10^-15 seconds, or 0.001 picoseconds (ps)

• New position from old position, old velocity, and acceleration:

$$x(t+\delta t) = x(t) + v(t)\,\delta t + \left[4a(t) - a(t-\delta t)\right]\frac{\delta t^2}{6}$$

• New velocity from old velocity and acceleration:

$$v(t+\delta t) = v(t) + \left[2a(t+\delta t) + 5a(t) - a(t-\delta t)\right]\frac{\delta t}{6}$$

• Kinetic energy, where n is the number of coordinates (not atoms):

$$U_{kinetic} = \tfrac{1}{2}\sum_i m_i v_i(t)^2 = \tfrac{1}{2}\, n\, k_B T$$

• Total energy (Upotential + Ukinetic) must not change with time
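The position and velocity updates on this slide can be tried on a one-dimensional harmonic oscillator, where the exact answer is known. This is a minimal sketch with unit mass and unit force constant; the function name `integrate` and the way the first a(t − δt) is estimated are assumptions made for the example.

```python
import math


def integrate(x, v, accel, dt, steps):
    """Advance x and v using the slide's update rules:

    x(t+dt) = x(t) + v(t) dt + [4 a(t) - a(t-dt)] dt^2 / 6
    v(t+dt) = v(t) + [2 a(t+dt) + 5 a(t) - a(t-dt)] dt / 6
    """
    a = accel(x)
    a_prev = accel(x - v * dt)  # rough estimate of a(t - dt) to start the recursion
    for _ in range(steps):
        x_new = x + v * dt + (4.0 * a - a_prev) * dt * dt / 6.0
        a_new = accel(x_new)
        v_new = v + (2.0 * a_new + 5.0 * a - a_prev) * dt / 6.0
        x, v, a_prev, a = x_new, v_new, a, a_new
    return x, v


# Harmonic oscillator with m = 1 and U = x^2 / 2, so a(x) = -x and x(t) = cos(t).
x_end, v_end = integrate(1.0, 0.0, lambda x: -x, dt=0.01, steps=1000)
energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2  # kinetic + potential, starts at 0.5
print(x_end, math.cos(10.0), energy)
```

Watching the total energy is a quick sanity check on the time step: if it drifts, δt is too large.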

Page 39: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


Energy minimization

For a given protein, the energy depends on thousands of x,y,z Cartesian atomic coordinates; reaching a deep minimum is not trivial

Furthermore, we want to minimize the free energy, not just the potential energy.

[Figure: energy versus number of steps, descending from the starting conformation to a deep minimum]
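Why a deep minimum is hard to reach can be seen even in one dimension. The sketch below runs plain steepest descent on a made-up double-well potential; the potential, step size, and starting points are illustrative assumptions, and a real protein energy function has thousands of coordinates rather than one.

```python
def potential(x):
    """Toy double-well energy: a shallow well near x = 1 and a deeper one near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative dU/dx of the toy potential."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def steepest_descent(x, step=0.01, max_steps=10000, tol=1e-8):
    """Follow the negative gradient until it is nearly zero."""
    for _ in range(max_steps):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


right = steepest_descent(1.5)   # ends in the shallow well
left = steepest_descent(-1.5)   # ends in the deeper well
print(right, potential(right))
print(left, potential(left))
```

The minimizer only follows the valley it starts in, so the result depends on the starting conformation.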

Page 40: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Monte Carlo Simulation

• Propose moves in torsion or Cartesian conformation space

• Evaluate energy after every move, compute ΔE

• Accept the new conformation based on

$$P = \exp\left(-\frac{\Delta E}{kT}\right)$$

• If run infinite time, the simulated conformation follows the Boltzmann distribution

$$\pi(C) \propto \exp\left(-\frac{E(C)}{kT}\right)$$

• Many variations, including simulated annealing and other heuristic approaches.
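The acceptance rule can be checked on a system small enough to solve by hand. The sketch below assumes the usual Metropolis form, where downhill moves are always accepted and uphill moves are accepted with probability exp(−ΔE/kT); the two-state "conformation", the energies, and the function name are toy assumptions.

```python
import math
import random


def metropolis_two_state(energies, kT, steps, seed=0):
    """Sample a two-state system and return the fraction of steps spent in each state."""
    rng = random.Random(seed)
    state = 0
    visits = [0, 0]
    for _ in range(steps):
        proposal = 1 - state  # the only possible move: flip to the other state
        delta_e = energies[proposal] - energies[state]
        # Accept downhill moves always, uphill moves with probability exp(-dE/kT).
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / kT):
            state = proposal
        visits[state] += 1
    return [count / steps for count in visits]


occupancy = metropolis_two_state([0.0, 1.0], kT=1.0, steps=100000)
boltzmann_0 = 1.0 / (1.0 + math.exp(-1.0))  # expected fraction in the low-energy state
print(occupancy[0], boltzmann_0)
```

With enough steps the observed occupancy approaches the Boltzmann weights; simulated annealing reuses the same loop while lowering kT over time.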

Page 41: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


Scoring/energy functions

• Need a way to select native-like conformations from non-native ones

• Physics-based functions: electrostatics, van der Waals, solvation, bond/angle terms.

• Knowledge-based scoring functions:
– Derive information about atomic properties from a database of experimentally determined conformations
– Common parameters include pairwise atomic distances and amino acid burial/exposure.

Page 42: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Rosetta

• D. Baker, U. Wash

• Break sequence into short segments (7-9 AA)

• Sample 3D from library of known segment structures, parallel computation

• Use simulated annealing (Metropolis-type algorithm) for global optimization
– Propose a change, if better energy, take; otherwise take at smaller probability

• Create 1000 structures, cluster and choose one representative from each cluster to submit

Page 43: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Manual Improvements and Automation

• Very often manual examination could improve prediction
– Catch errors
– Need domain knowledge
– A. Murzin’s success at CASP2

• CAFASP: Critical Assessment of Fully Automated Structure Prediction
– Murzin can’t play!!

• MetaServers: combine different methods to get consensus

Page 44: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


CAFASP Evaluation

Page 45: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Structural Genomics

• With more and more solved structures and novel folds, computational protein structure prediction is going to improve

• Structural genomics:
– Worldwide initiative to determine many protein structures in high throughput
– Especially, solve structures that have no homology to known structures

Page 46: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.

Summary

• Protein structures: 1st, 2nd, 3rd, 4th
– Different DB: SwissProt, PDB and SCOP
– Determine structure: X-ray crystallography

• Protein structure prediction:
– 2nd structure prediction
– Homology modeling
– Fold recognition
– Ab initio
– Evaluation: energy, RMSD, etc
– CASP and CAFASP contest

• Manual improvement and combination of computational approaches work better

• Structural Genomics, still very difficult problem…

Page 47: Protein Structure Prediction Xiaole Shirley Liu And Jun Liu STAT115.


Acknowledgement

• Amy Keating

• Michael Yaffe

• Mark Craven

• Russ Altman

