Protein Structure Prediction - MMTSB · 2006-08-02


MMTSB/CTBP Summer Workshop © Michael Feig, 2006.

Protein Structure Prediction

Michael Feig, MMTSB/CTBP

2006 Summer Workshop


From Sequence to Structure

SEALGDTIVKNA…


Ab initio Structure Prediction Protocol

Conformational Sampling… to generate native-like structures

Scoring & Clustering… to identify the most native-like structures

Amino Acid Sequence

3D Structure


Folding with All-Atom Models

CHARMM force field, implicit solvent replica exchange simulations (8 replicas, 10 ns/replica)

AAQAAAAQAAAAQAA
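The replica exchange run above swaps configurations between temperatures. A minimal sketch of the standard exchange criterion, p = min(1, exp[(β_i - β_j)(E_i - E_j)]), is below. The geometric temperature ladder, the 300-500 K range, the neighbor-pair scheme, and the energies in the usage lines are illustrative assumptions, not the settings of the simulation shown; the MD propagation between exchanges is omitted.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def temperature_ladder(t_min, t_max, n_replicas):
    """Geometric temperature ladder from t_min to t_max (assumed, common choice)."""
    if n_replicas == 1:
        return [t_min]
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio**i for i in range(n_replicas)]


def swap_probability(e_i, e_j, t_i, t_j):
    """min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) for exchanging the
    configurations currently held at temperatures t_i and t_j."""
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def exchange_step(energies, temps, rng, offset=0):
    """Attempt swaps between neighboring temperatures (k, k+1), k = offset, offset+2, ...

    energies[k] is the potential energy (kcal/mol) of the configuration at temps[k].
    Returns perm, where perm[k] is the configuration that ends up at temps[k].
    """
    perm = list(range(len(temps)))
    for k in range(offset, len(temps) - 1, 2):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            perm[k], perm[k + 1] = perm[k + 1], perm[k]
    return perm


# Assumption: 8 replicas between 300 K and 500 K with made-up energies.
temps = temperature_ladder(300.0, 500.0, 8)
energies = [-1200.0 + 15.0 * k for k in range(8)]
perm = exchange_step(energies, temps, random.Random(1), offset=0)
```

Alternating `offset` between 0 and 1 on successive exchange attempts lets every neighboring pair be tried.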


Folding with Low-Resolution Model

SICHO model, MONSSTER simulated annealing run

EQQNAFYEILHLPNLNEEQRNGFIQSLKDDPSQSANLLAEAKKLNDAQA


SICHO Lattice Model

Kolinski & Skolnick: Proteins 32, 475 (1998)

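Monte Carlo simulations:

> Attempt move

> Compute ΔE

> Accept with probability p:

$$p = \begin{cases} 1 & \Delta E \le 0 \\ \exp(-\Delta E / k_B T) & \Delta E > 0 \end{cases}$$

Simulated annealing

Constant Temperature

Replica Exchange Sampling

[Figure: lattice representation of the chain, with residues labeled Leu, Thr, Asp, Phe]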
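The Monte Carlo loop and acceptance rule above can be sketched as follows. This is a generic Metropolis sampler with a geometric cooling schedule, applied to a made-up one-dimensional double-well energy rather than to the SICHO lattice chain; the move generator, schedule, and parameters are assumptions for illustration.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: p = 1 if delta_e <= 0, else exp(-delta_e / kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def simulated_annealing(energy, x0, propose, kT_start, kT_end, n_steps, seed=0):
    """Metropolis Monte Carlo with kT lowered geometrically from kT_start to kT_end.

    Returns the lowest-energy state seen and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    cooling = (kT_end / kT_start) ** (1.0 / max(1, n_steps - 1))
    kT = kT_start
    for _ in range(n_steps):
        x_new = propose(x, rng)                     # attempt move
        e_new = energy(x_new)                       # compute the new energy
        if metropolis_accept(e_new - e, kT, rng):   # accept with probability p
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        kT *= cooling                               # cool down (cooling = 1 is constant T)
    return best_x, best_e


# Assumption: toy 1D double-well energy with minima at x = +/-2 (not a protein model).
def double_well(x):
    return (x * x - 4.0) ** 2


def step(x, rng):
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = simulated_annealing(double_well, 0.0, step, 5.0, 0.01, 5000)
```

Setting `kT_start == kT_end` turns the same loop into constant-temperature sampling.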


SICHO Energy Function: Knowledge-Based Terms

Excluded volume

Side chain burial propensity

follows the Kyte-Doolittle hydropathy scale (e.g. Ile 4.5, Ala 1.8, Asp -3.5, Arg -4.5)

Centrosymmetric bias

$$r_g = 2.2\,N_{\mathrm{res}}^{0.38} \ \text{Å}$$
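As a rough illustration of the two terms above, the sketch below averages Kyte-Doolittle values over a sequence and compares a structure's radius of gyration with the target r_g = 2.2 N_res^0.38. How SICHO turns these quantities into energies is not shown on the slide; the helper names here are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average Kyte-Doolittle value of a one-letter amino acid sequence (hypothetical helper)."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def expected_rg(n_res):
    """Target radius of gyration in Å: r_g = 2.2 * N_res ** 0.38."""
    return 2.2 * n_res**0.38


def radius_of_gyration(coords):
    """Radius of gyration of equally weighted (x, y, z) points, e.g. C-alpha atoms."""
    n = len(coords)
    center = [sum(p[d] for p in coords) / n for d in range(3)]
    msd = sum(sum((p[d] - center[d]) ** 2 for d in range(3)) for p in coords) / n
    return math.sqrt(msd)
```

A centrosymmetric term could then penalize structures whose `radius_of_gyration` departs from `expected_rg`; the exact functional form is not given here.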


SICHO Energy Function: Statistical Terms

[Figure: GLU-GLU statistical energy as a function of r(i,i+4) in Å, with helix and extended regions marked]

Potential of mean force (PMF):

$$\Delta E = -k_B T \ln p$$

$$\frac{p_i}{p_j} = e^{-\Delta E_{ij} / k_B T}$$
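The PMF relation above turns observed frequencies into energies (Boltzmann inversion). A minimal sketch, assuming distances have already been binned into a histogram; the histogram values are invented, and practical statistical potentials also divide by a reference-state distribution, which is omitted here.

```python
import math


def pmf_from_counts(counts, kT=0.592):
    """Boltzmann inversion: E_i = -kT * ln(p_i), with p_i = counts[i] / total.

    kT defaults to about 0.592 kcal/mol (k_B * 298 K). Empty bins get +inf.
    Assumption: no reference-state correction is applied.
    """
    total = sum(counts)
    energies = []
    for c in counts:
        energies.append(float("inf") if c == 0 else -kT * math.log(c / total))
    return energies


# Hypothetical histogram of GLU-GLU r(i,i+4) distances in 1 Å bins (made-up counts).
counts = [0, 0, 0, 0, 5, 40, 25, 10, 12, 20, 8]
energies = pmf_from_counts(counts)
```

Energy differences between bins follow directly from the probability ratio, E_i - E_j = -k_B T ln(p_i / p_j), so the absolute offset of the curve does not matter.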


Conformational Sampling with SICHO: Protein A

[Figure: all-atom energy (kcal/mol) vs. RMSD from native (Å) for sampled conformations]


Ab initio Structure Prediction Protocol

Efficient Sampling (e.g. MONSSTER/SICHO)

All-Atom Reconstruction

Scoring & Clustering (e.g. MMGB/SA, DFIRE)

Secondary Structure Prediction (PSIPRED et al.)

Amino Acid Sequence

3D Structure


Secondary Structure Prediction

…GDPIVKNAKLDSRLANKEALRLL…

?


Secondary Structure Propensities

Chou & Fasman (1974)

[Table: propensities for α-helix, β-sheet, and turn]
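Propensity-based methods in the spirit of Chou & Fasman score each residue by averaging helix and sheet propensities over a local window. The sketch below shows only that window-averaging idea, not the full Chou-Fasman nucleation and extension rules, and the propensity tables are hypothetical placeholders rather than the published values.

```python
def window_average(seq, propensity, half_width=3):
    """Average propensity over residues i-half_width..i+half_width (clipped at chain ends)."""
    scores = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half_width), min(len(seq), i + half_width + 1)
        window = [propensity[aa] for aa in seq[lo:hi]]
        scores.append(sum(window) / len(window))
    return scores


def assign_states(seq, p_helix, p_sheet, half_width=3, threshold=1.0):
    """Label each residue H (helix), E (sheet) or C (other) from window averages.

    Assumption: simple threshold rule, helix preferred when it scores at least as high.
    """
    helix = window_average(seq, p_helix, half_width)
    sheet = window_average(seq, p_sheet, half_width)
    states = []
    for h, e in zip(helix, sheet):
        if h >= threshold and h >= e:
            states.append("H")
        elif e >= threshold:
            states.append("E")
        else:
            states.append("C")
    return "".join(states)


# Hypothetical propensity tables for three residue types, for illustration only.
P_HELIX = {"A": 1.4, "G": 0.6, "V": 1.0}
P_SHEET = {"A": 0.8, "G": 0.7, "V": 1.7}
```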


Secondary Structure Prediction Methods

Method | Accuracy
C+F | 50-60%
PHD | 72.3%
PSSP | 75.1%
SAM-T99-sec | 76.1%
PSIPRED | 76.2%
SABLE | 77.6%

C+F: Chou & Fasman; GOR: Garnier, Osguthorpe, Robson


Secondary Structure Prediction: Per-Residue Accuracy
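Per-residue accuracy is commonly reported as Q3, the percentage of residues whose predicted three-state label (H, E, C) matches the observed one. A short sketch, assuming both strings already use the same three-letter alphabet (the reduction from eight DSSP states to three is not shown):

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted H/E/C state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def per_state_accuracy(predicted, observed, states="HEC"):
    """Accuracy (%) for each observed state; states absent from `observed` are skipped."""
    result = {}
    for s in states:
        positions = [i for i, o in enumerate(observed) if o == s]
        if positions:
            correct = sum(predicted[i] == s for i in positions)
            result[s] = 100.0 * correct / len(positions)
    return result
```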


Realistic ab initio Structure Prediction

[Figure: CHARMM22/GBMV energy vs. Cα RMSD in Å for sampled structures of a 62-residue protein]

Sampling, Scoring, Clustering

[Figure panels: CHARMM22/GBMV energy vs. Cα RMSD in Å (62 residues)]
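The slides do not state which clustering algorithm is used. As one common choice, the sketch below performs greedy neighbor-count clustering on a precomputed pairwise RMSD matrix; the cutoff, the matrix, and the idea of taking the center of the largest cluster as the prediction are assumptions for illustration.

```python
def greedy_cluster(dist, cutoff):
    """Greedy clustering on a symmetric pairwise distance (e.g. RMSD) matrix.

    Repeatedly picks the unassigned structure with the most unassigned neighbors
    within `cutoff` as a cluster center, assigns those neighbors to it, and removes
    them. Returns a list of (center, members); cluster sizes never increase.
    """
    unassigned = set(range(len(dist)))
    clusters = []

    def neighbors(i):
        # dist[i][i] == 0, so a center is always its own member
        return sorted(j for j in unassigned if dist[i][j] <= cutoff)

    while unassigned:
        # ties go to the lowest index because max() keeps the first maximum
        center = max(sorted(unassigned), key=lambda i: len(neighbors(i)))
        members = neighbors(center)
        clusters.append((center, members))
        unassigned -= set(members)
    return clusters


# Hypothetical 4x4 RMSD matrix (Å): structures 0/1 and 2/3 are similar.
rmsd = [
    [0.0, 1.0, 8.0, 8.0],
    [1.0, 0.0, 8.0, 8.0],
    [8.0, 8.0, 0.0, 1.5],
    [8.0, 8.0, 1.5, 0.0],
]
clusters = greedy_cluster(rmsd, cutoff=2.0)
```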

Ab initio Predictions: DNase fragmentation factor

Best-scoring prediction 7.4 Å RMSD

NMR structure 1KOY


Scoring Functions

Knowledge-based/statistical
> derived from known protein structures
> limited by training data
> usually fast
> e.g. DFIRE, RAPDF, prosaII

Force field based
> model the physical energy landscape
> more robust and transferable
> often expensive (requires minimization)
> e.g. MMPB(GB)/SA, UNRES


Scoring Function Comparison: MMGB/SA vs. DFIRE

[Figure, three panels plotted against Cα RMSD in Å: radius of gyration, MMGB/SA score, DFIRE score]
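One simple way to compare scoring functions like those plotted above is to check how well each score correlates with Cα RMSD and how close to native its best-scoring structure is. The sketch below computes both; the choice of these two metrics is an assumption, not the analysis behind the figure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_scored_rmsd(scores, rmsds):
    """RMSD of the structure with the lowest (most favorable) score.

    Assumption: lower scores are better, as for energies.
    """
    best = min(range(len(scores)), key=lambda i: scores[i])
    return rmsds[best]
```

A positive correlation between score and RMSD means lower-energy structures tend to be closer to native, which is what a useful scoring function should show.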


Sampling with Restraints

Secondary structure bias

Secondary structure prediction

NMR shift data

Distance restraints

Experimental restraints (disulfides, NMR, EPR)

Side chain contacts from analogous structures

Shape restraints

cryoEM data, small-angle X-ray scattering
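Experimental distance restraints (e.g. from NMR or disulfides) are often applied as flat-bottom potentials that cost nothing inside an allowed range. A minimal sketch, assuming restraints given as (i, j, lower, upper) in Å and a single force constant k; the functional form is a common choice, not necessarily the one used here.

```python
import math


def flat_bottom(r, lower, upper, k):
    """Zero inside [lower, upper], harmonic k * violation**2 outside."""
    if r < lower:
        return k * (lower - r) ** 2
    if r > upper:
        return k * (r - upper) ** 2
    return 0.0


def restraint_energy(coords, restraints, k=1.0):
    """Sum flat-bottom penalties over (i, j, lower, upper) distance restraints.

    Assumption: one force constant k for all restraints.
    """
    total = 0.0
    for i, j, lower, upper in restraints:
        r = math.dist(coords[i], coords[j])
        total += flat_bottom(r, lower, upper, k)
    return total
```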


… but the solution may lie elsewhere.


Sequence Homology

SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVA
: :. .. .: ..::: : :::::::: :.. .....:.. . .
MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFL-

KLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA
....:. .: . .. .::. .::.:. ::: .: : :: :.:. :.
EVDVDDCQDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV

Human thioredoxin (1AUC)

E. Coli thioredoxin (1THO)
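The percent identity of an alignment like the one above can be computed column by column. This sketch counts identical residues over columns where neither sequence has a gap; other conventions divide by the shorter sequence length or the full alignment length, so reported numbers can differ.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap.

    Assumption: identity is normalized by the number of gap-free columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


# First aligned block of the thioredoxin alignment above.
top = "SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVA"
bottom = "MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFL-"
identity = percent_identity(top, bottom)
```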


Comparative Modeling

Human thioredoxin (1AUC)

E. Coli thioredoxin (1THO)

Assumption: Proteins with similar sequence have similar structure


Structural Templates from Homology

Challenges:

> Correct alignment
> Loop modeling
> Side chain rebuilding

PGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA
.: . .. .::. .::.:. ::: .: : :: :.:. :.
QDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV
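Getting the alignment right is the first challenge listed. For illustration, a minimal Needleman-Wunsch global alignment with a simple match/mismatch score and a linear gap penalty; real template searches typically use substitution matrices such as BLOSUM62, affine gap penalties, and sequence profiles, so the scores here are assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b).

    Assumption: simple match/mismatch scores and a linear gap penalty.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,     # gap in b
                              score[i][j - 1] + gap)     # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```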


Accuracy of Predictions by Homology

[Figure: % of predictions with <2 Å and <5 Å RMSD as a function of % sequence identity]


Prediction through Fold Recognition

Assumption: Proteins with similar secondary structure share a common fold

1N91 1JRM


Templates through Fold Recognition

Challenges:

> Wrong templates
> Alignment uncertain
> Fragment modeling
> Refinement needed


Ab initio Sampling in Template-based Structure Prediction

Template provides known protein structure

Ab initio sampling of unknown fragments in the context of template


Template Restraints Near Flexible Part

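[Figure: template regions near the flexible part, labeled with restraint scaling factors f = 0.0, 0.1, 0.2, 0.4, 0.7, 1.0]

Restraint potential:

$$U = f \cdot k\,(r - r_0)^2$$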
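The restraint potential above can be applied per residue with a scaling factor f that weakens restraints close to the flexible part. In the sketch below, r and r_0 are taken to be current and template positions, and the mapping from sequence distance to f (via a caller-supplied `ramp`, e.g. the values 0.0, 0.1, 0.2, 0.4, 0.7 from the figure) is a hypothetical choice.

```python
import math


def ramp_factors(n_res, flex_start, flex_end, ramp):
    """Per-residue scaling f: ramp[d] at sequence distance d from the flexible
    segment [flex_start, flex_end] (d = 0 inside it), and 1.0 beyond the ramp.

    Assumption: hypothetical distance-to-f mapping, for illustration only.
    """
    factors = []
    for i in range(n_res):
        if i < flex_start:
            d = flex_start - i
        elif i > flex_end:
            d = i - flex_end
        else:
            d = 0
        factors.append(ramp[d] if d < len(ramp) else 1.0)
    return factors


def template_restraint_energy(coords, ref_coords, k, factors):
    """U = sum_i f_i * k * |r_i - r0_i|**2 over restrained positions."""
    return sum(f * k * math.dist(r, r0) ** 2
               for r, r0, f in zip(coords, ref_coords, factors))
```

With f = 0 inside the flexible segment, those residues are sampled freely while the rest of the template is held progressively more tightly.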


Loop Sampling Methods

Program | Sampling | # Residues
MMTSB Tool Set | All-Atom Reconstruction | 1-2
Rosetta (Baker) | Fragment-based | 2-100
MMTSB Tool Set | Multi-Scale | 5-30
Modeller (Sali) | Torsional Space MC/MD | 2-12
(not given) | Exhaustive Search | 1-3


Structural Genomics Efforts


Structure Refinement

[Figure: predicted structure vs. native (NMR) structure]

?