MMTSB/CTBP Summer Workshop © Michael Feig, 2006.
Protein Structure Prediction
Michael Feig, MMTSB/CTBP
2006 Summer Workshop
From Sequence to Structure
SEALGDTIVKNA…
Ab initio Structure Prediction Protocol
Amino Acid Sequence
→ Conformational Sampling: generate native-like structures
→ Scoring & Clustering: identify the most native-like structures
→ 3D Structure
Folding with All-Atom Models
CHARMM force field
Implicit solvent replica exchange simulations
8 replicas, 10 ns/replica
AAQAAAAQAAAAQAA
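Replica exchange runs several copies of the system at different temperatures. It periodically tries to swap configurations between neighbouring temperatures, accepting a swap with p = min(1, exp[(β_i - β_j)(E_i - E_j)]). The sketch below shows only that exchange step, not the CHARMM dynamics between exchanges.

The geometric temperature ladder, the 300-600 K range, and the example energies are illustrative assumptions, not settings from this simulation.

```python
import math
import random

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def geometric_ladder(t_min, t_max, n):
    """Return n temperatures spaced geometrically between t_min and t_max (assumed scheme)."""
    if n == 1:
        return [t_min]
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** i for i in range(n)]


def swap_probability(e_i, e_j, t_i, t_j):
    """Acceptance probability for exchanging the configurations held at T_i and T_j.

    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) with beta = 1 / (k_B T).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_neighbor_swaps(energies, temps, rng):
    """One sweep of swap attempts between neighbouring temperatures.

    energies[c] is the current energy of configuration c; configuration k starts
    at temps[k]. Returns order, where order[k] is the configuration at temps[k]
    after the sweep.
    """
    order = list(range(len(temps)))
    for k in range(len(temps) - 1):
        a, b = order[k], order[k + 1]
        p = swap_probability(energies[a], energies[b], temps[k], temps[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order


# 8 replicas as on the slide; temperature range and energies are made up.
temps = geometric_ladder(300.0, 600.0, 8)
energies = [-1200.0 + 15.0 * k for k in range(8)]
print(attempt_neighbor_swaps(energies, temps, random.Random(1)))
```

A swap that moves the lower-energy configuration to the colder replica is always accepted. The opposite move is accepted only with a Boltzmann-like probability.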
Folding with Low-Resolution Model
SICHO model, MONSSTER simulated annealing run
EQQNAFYEILHLPNLNEEQRNGFIQSLKDDPSQSANLLAEAKKLNDAQA
SICHO Lattice Model
Kolinski & Skolnick: Proteins 32, 475 (1998)
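Monte Carlo simulations:
> Attempt move
> Compute ΔE
> Accept with probability p:
$$p = \begin{cases} 1 & \Delta E \le 0 \\ \exp(-\Delta E / k_B T) & \Delta E > 0 \end{cases}$$
Simulated annealing
Constant Temperature
Replica Exchange Sampling
[Figure: SICHO lattice model with residues labeled Leu, Thr, Asp, Phe]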
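The three Monte Carlo steps and the acceptance rule above can be sketched in a model-independent way. The sketch adds a geometric cooling schedule to turn them into simulated annealing; setting kT_start equal to kT_end gives constant-temperature sampling.

The energy function and move are hypothetical stand-ins: a 1D double well replaces the SICHO lattice energy and lattice moves.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Metropolis criterion: p = 1 if dE <= 0, otherwise p = exp(-dE / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def simulated_annealing(energy, propose, x0, kT_start, kT_end, n_steps, seed=0):
    """Metropolis Monte Carlo with geometric cooling (the schedule is an assumption)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    cooling = (kT_end / kT_start) ** (1.0 / max(n_steps - 1, 1))
    kT = kT_start
    for _ in range(n_steps):
        x_new = propose(x, rng)                  # 1. attempt move
        e_new = energy(x_new)
        delta_e = e_new - e                      # 2. compute dE
        if metropolis_accept(delta_e, kT, rng):  # 3. accept with probability p
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        kT *= cooling
    return best_x, best_e


def toy_energy(x):
    """Hypothetical tilted double well standing in for the SICHO energy."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def toy_move(x, rng):
    """Hypothetical move: uniform random displacement."""
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = simulated_annealing(toy_energy, toy_move, x0=2.0,
                                     kT_start=2.0, kT_end=0.01, n_steps=5000)
print(best_x, best_e)
```

Starting in the higher well near x = 1, the high initial temperature lets the walker cross the barrier. The search then settles into the lower well near x = -1.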
SICHO Energy Function: Knowledge-Based Terms
Excluded volume
Side chain burial propensity
follows the Kyte-Doolittle scale (Ile 4.5, Ala 1.8, Asp -3.5, Arg -4.5)
Centrosymmetric bias
$r_g = 2.2\,N_{\mathrm{res}}^{0.38}$
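The sketch below evaluates the two quantities behind these terms: the Kyte-Doolittle value of each residue and the target radius of gyration for a chain of N_res residues. It does not reproduce how SICHO combines them into an energy.

Assumptions: equal masses in the r_g calculation, and r_g in Å.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "ILE": 4.5, "VAL": 4.2, "LEU": 3.8, "PHE": 2.8, "CYS": 2.5,
    "MET": 1.9, "ALA": 1.8, "GLY": -0.4, "THR": -0.7, "SER": -0.8,
    "TRP": -0.9, "TYR": -1.3, "PRO": -1.6, "HIS": -3.2, "GLU": -3.5,
    "GLN": -3.5, "ASP": -3.5, "ASN": -3.5, "LYS": -3.9, "ARG": -4.5,
}


def expected_rg(n_res):
    """Target radius of gyration r_g = 2.2 * N_res**0.38 (assumed to be in Angstrom)."""
    return 2.2 * n_res ** 0.38


def radius_of_gyration(coords):
    """Radius of gyration of an (N, 3) coordinate array, treating all points as equal mass."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))


def hydropathy_profile(residues):
    """Map three-letter residue names to Kyte-Doolittle values."""
    return [KYTE_DOOLITTLE[res] for res in residues]


print(round(expected_rg(62), 2))  # target r_g for a 62-residue chain
```

A structure whose r_g is far from expected_rg(N_res) is either too compact or too extended. A centrosymmetric bias pushes the chain back toward the target.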
SICHO Energy Function: Statistical Terms
[Figure: energy vs. GLU-GLU distance r(i,i+4) in Å, with helix and extended regions marked]
Potential of mean force (PMF):
$$\Delta E = -k_B T \ln p$$
$$\frac{p_i}{p_j} = e^{-\Delta E_{ij}/k_B T}$$
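Inverting ΔE = -k_B T ln p turns a histogram of observed distances, such as GLU-GLU r(i,i+4), into an energy profile. This is a minimal sketch.

Assumptions:
- No explicit reference state; energies are shifted so the most populated bin is zero.
- A tiny pseudocount keeps empty bins finite.
- kT defaults to about 0.59 kcal/mol (298 K).

```python
import math

K_B = 0.0019872  # Boltzmann constant in kcal/(mol K)


def pmf_from_counts(counts, kT=K_B * 298.15, pseudocount=1e-6):
    """Convert histogram counts into E_i = -kT ln p_i, shifted so min(E) = 0."""
    total = float(sum(counts))
    n_bins = len(counts)
    probs = [(c + pseudocount) / (total + pseudocount * n_bins) for c in counts]
    energies = [-kT * math.log(p) for p in probs]
    e_min = min(energies)
    return [e - e_min for e in energies]


# Hypothetical counts per distance bin.
print(pmf_from_counts([100, 50, 25], kT=1.0))
```

Halving the population of a bin raises its energy by k_B T ln 2. This is the relation p_i / p_j = exp(-ΔE_ij / k_B T) read in reverse.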
Conformational Sampling with SICHO: Protein A
[Figure: all-atom energy in kcal/mol vs. RMSD from native in Å]
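RMSD from native, used on the x axes throughout these slides, is normally computed after optimal superposition of the two coordinate sets. The sketch uses the Kabsch algorithm via NumPy's SVD and assumes the two arrays hold corresponding atoms (for example Cα) in the same order.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)            # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                     # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    corr = np.diag([1.0, 1.0, d])       # avoid improper rotations (reflections)
    rot = vt.T @ corr @ u.T             # rotation mapping p_c onto q_c
    diff = (rot @ p_c.T).T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(0)
model = rng.normal(size=(10, 3))
print(kabsch_rmsd(model, model + 1.0))  # pure translation: ~0
```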
Ab initio Structure Prediction Protocol
Amino Acid Sequence
→ Secondary Structure Prediction (PSIPRED et al.)
→ Efficient Sampling (e.g. MONSSTER/SICHO)
→ All-Atom Reconstruction
→ Scoring & Clustering (e.g. MMGB/SA, DFIRE)
→ 3D Structure
Secondary Structure Prediction
…GDPIVKNAKLDSRLANKEALRLL…
?
Secondary Structure Propensities
Chou & Fasman (1974)
[Table: Chou & Fasman propensities for α-helix, β-sheet, and turn]
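Propensity-based prediction averages per-residue propensities over a short window and assigns the state with the highest average. The original Chou & Fasman method adds nucleation and extension rules; this sketch is only a simplified window average.

The propensity tables below are hypothetical toy values, not the published Chou & Fasman numbers.

```python
def window_propensity(sequence, propensity, half_width=3):
    """Average propensity over residues i-half_width .. i+half_width (clipped at the termini).

    Residues missing from the table count as 1.0, i.e. neutral.
    """
    n = len(sequence)
    scores = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = [propensity.get(aa, 1.0) for aa in sequence[lo:hi]]
        scores.append(sum(window) / len(window))
    return scores


def assign_states(sequence, tables, half_width=3):
    """Label each residue with the state whose window-averaged propensity is highest."""
    profiles = {state: window_propensity(sequence, table, half_width)
                for state, table in tables.items()}
    states = list(profiles)
    return "".join(max(states, key=lambda s: profiles[s][i])
                   for i in range(len(sequence)))


# Hypothetical propensity tables (NOT the published Chou & Fasman values).
TOY_TABLES = {
    "H": {"A": 1.4, "E": 1.5, "L": 1.2},
    "E": {"V": 1.7, "I": 1.6, "Y": 1.5},
    "T": {"G": 1.6, "P": 1.5, "N": 1.5},
}

print(assign_states("AEALAEGPNGVIVIVY", TOY_TABLES))
```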
Secondary Structure Prediction Methods
C+F: Chou & Fasman; GOR: Garnier, Osguthorpe, Robson
Method         Accuracy
C+F            50-60%
PHD            72.3%
PSSP           75.1%
SAM-T99-sec    76.1%
PSIPRED        76.2%
SABLE          77.6%
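Accuracies such as those in the table are usually reported per residue over three states (helix, strand, coil), often called Q3. The slide does not state the exact definition, so this sketch is an assumption. It computes Q3 and a per-state breakdown from two equal-length H/E/C strings.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state (H/E/C) assignment matches."""
    if len(predicted) != len(observed):
        raise ValueError("assignments must have equal length")
    if not observed:
        raise ValueError("empty assignment")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def per_state_accuracy(predicted, observed):
    """For each observed state, the percentage of its residues predicted correctly."""
    result = {}
    for state in set(observed):
        idx = [i for i, o in enumerate(observed) if o == state]
        result[state] = 100.0 * sum(predicted[i] == state for i in idx) / len(idx)
    return result


print(q3_accuracy("HHHEEC", "HHEEEC"))
print(per_state_accuracy("HHHEEC", "HHEEEC"))
```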
Secondary Structure Prediction: Per-Residue Accuracy
Realistic ab initio Structure Prediction
[Figure: two panels of CHARMM22/GBMV energy vs. Cα RMSD in Å (62 residues)]
Sampling, Scoring, Clustering
[Figure: sequence of panels of CHARMM22/GBMV energy vs. Cα RMSD in Å (62 residues)]
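Clustering groups decoys by structural similarity, so that densely sampled regions stand out from isolated low-energy outliers. The sketch takes a precomputed pairwise RMSD matrix and uses a simple greedy scheme: repeatedly pick the structure with the most neighbours within a cutoff and remove its cluster.

This is one common approach, chosen for illustration. It is not necessarily the clustering used for these results, and the cutoff is a free parameter.

```python
import numpy as np


def cluster_by_neighbors(rmsd_matrix, cutoff):
    """Greedy clustering on a symmetric pairwise RMSD matrix.

    Returns a list of (center, members) tuples in the order they were found.
    """
    d = np.asarray(rmsd_matrix, dtype=float)
    remaining = set(range(d.shape[0]))
    clusters = []
    while remaining:
        idx = sorted(remaining)
        neighbors = d[np.ix_(idx, idx)] <= cutoff        # within-cutoff mask
        counts = neighbors.sum(axis=1)
        center = idx[int(np.argmax(counts))]             # most populated center
        members = [j for j in idx if d[center, j] <= cutoff]
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


# Hypothetical RMSD matrix (Å): decoys 0-2 are similar, decoy 3 is an outlier.
rmsd = [[0.0, 1.0, 1.0, 10.0],
        [1.0, 0.0, 1.0, 10.0],
        [1.0, 1.0, 0.0, 10.0],
        [10.0, 10.0, 10.0, 0.0]]
print(cluster_by_neighbors(rmsd, cutoff=2.0))
```

A final prediction can then be chosen, for example, as the best-scoring member of the largest cluster.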
Ab initio Predictions: DNase fragmentation factor
Best-scoring prediction: 7.4 Å RMSD
NMR structure: 1KOY
Scoring Functions
Knowledge-based/statistical
> derived from known protein structures
> limited by training data
> usually fast
> e.g. DFIRE, RAPDF, prosaII
Force field based
> model the physical energy landscape
> more robust and transferable
> often expensive (requires minimization)
> e.g. MMPB(GB)/SA, UNRES
Scoring Function Comparison: MMGB/SA vs. DFIRE
[Figure: three panels vs. Cα RMSD in Å: radius of gyration, MMGB/SA score, DFIRE score]
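A scoring function can be compared on a decoy set by how well its scores track RMSD. Useful measures are the correlation between score and RMSD, and the RMSD of the decoy it ranks best.

The sketch computes Pearson and Spearman correlations plus that best-scored RMSD. The choice of these particular metrics is an illustrative assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    return pearson(ranks(x), ranks(y))


def best_scored_rmsd(scores, rmsds):
    """RMSD of the decoy with the lowest (best) score."""
    best = min(range(len(scores)), key=scores.__getitem__)
    return rmsds[best]


# Hypothetical decoy set: scores and Cα RMSDs in Å.
scores = [-9500.0, -9900.0, -9700.0, -9100.0]
rmsds = [7.5, 6.0, 6.8, 8.2]
print(spearman(scores, rmsds), best_scored_rmsd(scores, rmsds))
```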
Sampling with Restraints
Secondary structure bias
Secondary structure prediction
NMR shift data
Distance restraints
Experimental restraints (disulfides, NMR, EPR)
Side chain contacts from analogous structures
Shape restraints
cryoEM data, small-angle X-ray scattering
… but the solution may lie elsewhere.
Sequence Homology
E. coli thioredoxin (1THO)  SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVA
Human thioredoxin (1AUC)    MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFL-

E. coli thioredoxin (1THO)  KLNIDQNPGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA
Human thioredoxin (1AUC)    EVDVDDCQDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV
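How similar two aligned sequences are is usually summarized as percent sequence identity. Definitions differ in the denominator; this sketch counts identical residues over the alignment columns where neither sequence has a gap, which is one common convention among several.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two aligned sequences over ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# First alignment block of the thioredoxin example above.
ecoli = "SDKIIHLTDDSFDTDVLKADGAILVDFWAEWCGPCKMIAPILDEIADEYQGKLTVA"
human = "MVKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKYSNVIFL-"
print(round(percent_identity(ecoli, human), 1))
```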
Comparative Modeling
[Figure: structures of human thioredoxin (1AUC) and E. coli thioredoxin (1THO)]
Assumption: Proteins with similar sequence have similar structure
Structural Templates from Homology
Challenges:
> Correct alignment
> Loop modeling
> Side chain rebuilding
PGTAPKYGIRGIPTLLLFKNGEVAATKVGALSKGQLKEFLDAN---LA
QDVASECEVKCMPTFQFFKKGQ----KVGEFS-GANKEKLEATINELV
Accuracy of Predictions by Homology
[Figure: % of predictions with <2 Å and <5 Å RMSD vs. % sequence identity]
Prediction through Fold Recognition
Assumption: Proteins with similar secondary structure share a fold
1N91 1JRM
Templates through Fold Recognition
Challenges:
> Wrong templates
> Alignment uncertain
> Fragment modeling
> Refinement needed
Ab initio Sampling in Template-based Structure Prediction
Template provides known protein structure
Ab initio sampling of unknown fragments in the context of template
Template Restraints Near Flexible Part
[Figure: restraint scaling factors f of 0.0, 0.1, 0.2, 0.4, 0.7 and 1.0 around the flexible part]
Restraint potential:
$$U = f \cdot k\,(r - r_0)^2$$
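The restraint potential sums a harmonic penalty over template-derived distances r_0, each scaled by a factor f. The sketch evaluates that sum.

The helper that assigns f from a residue's sequence distance to the flexible segment is hypothetical. It reuses the factors shown on the slide (0.0 inside the flexible part, rising to 1.0 far from it), but the mapping itself is an assumption.

```python
import numpy as np


def restraint_energy(r, r0, k, f):
    """U = sum_i f_i * k * (r_i - r0_i)**2 over template-derived distance restraints."""
    r = np.asarray(r, dtype=float)
    r0 = np.asarray(r0, dtype=float)
    f = np.asarray(f, dtype=float)
    return float(np.sum(f * k * (r - r0) ** 2))


def scaling_by_separation(seq_sep, schedule=(0.0, 0.1, 0.2, 0.4, 0.7, 1.0)):
    """Hypothetical mapping: separation 0 (inside the flexible part) -> f = 0.0,
    larger separations -> larger f, capped at the last schedule entry."""
    idx = np.minimum(np.asarray(seq_sep, dtype=int), len(schedule) - 1)
    return np.asarray(schedule, dtype=float)[idx]


f = scaling_by_separation([0, 2, 8])
print(restraint_energy([4.0, 6.5, 5.0], [5.0, 6.0, 5.5], k=1.0, f=f))
```

Lowering f near the flexible part lets those residues move away from the template while the rest of the structure stays restrained.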
Loop Sampling Methods
Program           Sampling                   # Residues
MMTSB Tool Set    All-Atom Reconstruction    1-2
                  Exhaustive Search          1-3
Modeller (Sali)   Torsional Space MC/MD      2-12
MMTSB Tool Set    Multi-Scale                5-30
Rosetta (Baker)   Fragment-based             2-100
Structural Genomics Efforts
Structure Refinement
[Figure: predicted structure vs. native (NMR) structure]