Protein Folding Programs
By Asım OKUR
CSE 549
November 14, 2002
Protein Structure
DNA Sequence → Protein Sequence → Structure → (Mis)function
It is believed that all the information necessary to determine the structure of a protein is present in its primary sequence.
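The first arrow in the chain above (DNA sequence to protein sequence) can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code and a coding strand that is already in frame; the function name `translate` and the example sequence are made up for the example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, with codons enumerated in the
# order TTT, TTC, TTA, TTG, TCT, ... ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame coding strand (5'->3') up to the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))  # prints MAK
```

The later steps (sequence to structure to function) are the hard part, and the rest of the slides are about them.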
Protein Folding Programs
• Protein folding is one of the biggest computational challenges
• Different types of folding and structure prediction programs
   • Simulations
   • Homology Modeling Approaches
Simulations
• Simulate the real behavior of proteins
• High detail, short time scales
• 2 main simulation types
   • Molecular Dynamics
   • Monte Carlo
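As a rough illustration of the Monte Carlo approach, here is a minimal Metropolis sampler on a one-dimensional toy energy. It is a sketch only: a real protein simulation moves thousands of coordinates and uses a force-field energy, and the names `metropolis` and `double_well`, the step size, and the temperature `kT` are all assumptions made for this example.

```python
import math
import random


def metropolis(energy, x0, n_steps, step_size=0.5, kT=1.0, seed=0):
    """Sample a 1-D coordinate with the Metropolis acceptance rule."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step_size, step_size)  # random trial move
        e_new = energy(x_new)
        # Downhill moves are always accepted; uphill moves are accepted
        # with Boltzmann probability exp(-dE / kT).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)
    return samples


def double_well(x):
    """Toy energy with two minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


samples = metropolis(double_well, x0=2.0, n_steps=5000)
print(min(samples), max(samples))
```

Molecular Dynamics differs in that it integrates the equations of motion step by step instead of proposing random moves, which is why it gives time-resolved trajectories but only over short time scales.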
The Energy Function
• Calculate energies for each particle
• Since long-range interactions are important, the pair-wise interactions should be calculated for each pair of particles
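$$
E_{\text{Pair}} = \sum_{\text{bonds}} K_r (r - r_{eq})^2
+ \sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2
+ \sum_{\text{torsions}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]
+ \sum_{i<j} \left[ \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\epsilon R_{ij}} \right]
$$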
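To make the cost of the pair-wise part concrete, the sketch below evaluates a harmonic bond term and a 12-6 plus Coulomb term over every pair of particles. It assumes a single pair of coefficients `a` and `b` for all pairs and charges in units where no conversion constant is needed; real force fields use per-pair parameters and exclude bonded neighbors. All names and numbers are illustrative.

```python
import math
from itertools import combinations


def bond_energy(coords, bonds):
    """Harmonic bond term: sum of K_r * (r - r_eq)^2 over (i, j, K_r, r_eq)."""
    return sum(
        k_r * (math.dist(coords[i], coords[j]) - r_eq) ** 2
        for i, j, k_r, r_eq in bonds
    )


def nonbonded_energy(coords, charges, a, b, eps=1.0):
    """Sum A/R^12 - B/R^6 + q_i*q_j/(eps*R) over all pairs i < j.

    Assumption: one (a, b) pair is shared by every particle pair.
    """
    total = 0.0
    for i, j in combinations(range(len(coords)), 2):  # N*(N-1)/2 pairs
        r = math.dist(coords[i], coords[j])
        total += a / r**12 - b / r**6 + charges[i] * charges[j] / (eps * r)
    return total


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
charges = [0.4, -0.4, 0.0]
print(bond_energy(coords, [(0, 1, 100.0, 1.0)]))
print(nonbonded_energy(coords, charges, a=1.0, b=2.0))
```

Because the pair loop grows with the square of the number of particles, this term dominates the run time of a simulation.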
Homology Modeling
• Template Selection and Fold Assignment
• Target – Template Alignment
• Model Building
   • Loop Modeling
   • Sidechain Modeling
• Model Evaluation
Fold Assignment and Template Selection
• Identify all protein structures with sequences related to the target, then select templates
• 3 main classes of comparison methods
   • Compare the target sequence with each database sequence independently: pair-wise sequence – sequence comparison, BLAST and FASTA
   • Multiple sequence comparisons to improve sensitivity, PSI-BLAST
   • Threading or 3-D template matching methods
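The first class of methods, pair-wise comparison against each database sequence, can be sketched with a Smith-Waterman local alignment score. Note the substitution: BLAST and FASTA are fast heuristics that approximate this kind of dynamic-programming search, and they use substitution matrices and affine gap penalties, while the sketch assumes a flat match/mismatch score and a linear gap penalty. The names `smith_waterman_score` and `rank_templates` and the toy sequences are hypothetical.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (linear gap penalty)."""
    prev = [0] * (len(seq_b) + 1)
    best = 0
    for ca in seq_a:
        curr = [0]
        for j, cb in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def rank_templates(target, database):
    """Score the target against each database sequence independently."""
    scored = [(smith_waterman_score(target, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)


database = {"template_1": "MKVLAAGIV", "template_2": "GGGGGGG"}
print(rank_templates("MKVLAAG", database))
```

Profile methods such as PSI-BLAST and threading replace the per-residue score with position-specific or structure-derived scores, which is what makes them more sensitive for distant relatives.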
Target – Template Alignment
• Most important step in Homology Modeling
• A specialized method should be used for alignment
   • Over 40% identity the alignment is likely to be correct.
   • Regions of low local sequence similarity become common when overall sequence identity is under 40%. (Saqi et al., Protein Eng. 1999)
   • The alignment becomes difficult below 30% sequence identity. (Rost, Protein Eng. 1999)
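The identity thresholds above can be applied to a finished alignment with a short helper. This sketch assumes identity is counted over columns where neither sequence has a gap (other conventions for the denominator exist), and the function names and labels are made up for the example.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent identical residues over aligned (non-gap) columns."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":  # skip columns with a gap in either sequence
            continue
        aligned += 1
        identical += a == b
    return 100.0 * identical / aligned if aligned else 0.0


def alignment_outlook(identity):
    """Map a percent identity onto the thresholds quoted on the slide."""
    if identity > 40.0:
        return "alignment likely correct"
    if identity >= 30.0:
        return "regions of low local similarity are common"
    return "alignment is difficult"


identity = percent_identity("ACDE-FG", "ACDQWFG")
print(round(identity, 1), alignment_outlook(identity))
```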
Model Building
• Construct a 3-D model of the target sequence based on its alignment on template structures
• Three different model building approaches
   • Modeling by rigid body assembly
   • Modeling by segment matching
   • Modeling by satisfaction of spatial restraints
• Accuracies of these models are similar
• Template selection and alignment have larger impact on the model
SWISS-MODEL Web Server
Screenshots from the Homology Modeling Server Swiss-Model
• Construct a framework using known protein structures
• Generate the location of the target amino acids on the framework
• If loop regions are not determined, use an additional database search or short simulations
Procedure of the MODELLER program
• After obtaining restraints, run a geometry optimization or real-space optimization to satisfy them
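The idea of optimizing coordinates until a set of restraints is satisfied can be shown on a toy problem. The sketch below is not MODELLER's algorithm or objective function: it assumes plain distance restraints with a squared-violation penalty and minimizes it by simple gradient descent. The function names, step size, and the three-point example are all illustrative.

```python
import math


def restraint_violation(coords, restraints):
    """Sum of squared deviations from the target distances (i, j, d)."""
    return sum((math.dist(coords[i], coords[j]) - d) ** 2
               for i, j, d in restraints)


def satisfy_restraints(coords, restraints, n_steps=2000, lr=0.05):
    """Move points by gradient descent until distance restraints are met."""
    coords = [list(p) for p in coords]
    for _ in range(n_steps):
        grads = [[0.0, 0.0, 0.0] for _ in coords]
        for i, j, d in restraints:
            delta = [a - b for a, b in zip(coords[i], coords[j])]
            dist = math.sqrt(sum(c * c for c in delta)) or 1e-12
            coeff = 2.0 * (dist - d) / dist  # d/dx_i of (dist - d)^2
            for k in range(3):
                grads[i][k] += coeff * delta[k]
                grads[j][k] -= coeff * delta[k]
        for point, grad in zip(coords, grads):
            for k in range(3):
                point[k] -= lr * grad[k]
    return coords


start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
restraints = [(0, 1, 3.0), (1, 2, 5.0), (0, 2, 4.0)]  # a 3-4-5 triangle
model = satisfy_restraints(start, restraints)
print(restraint_violation(start, restraints), restraint_violation(model, restraints))
```

In a real homology model the restraints come from the template alignment and from stereochemistry, and there are far more of them than coordinates, so they can only be satisfied approximately.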
Errors in Homology Models
a. Errors in sidechain packing
b. Distortions and shifts in correctly aligned regions
c. Errors in regions without a template
d. Errors due to misalignment
e. Incorrect templates
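One common way to put a number on such errors is the root-mean-square deviation (RMSD) between corresponding atoms of the model and the experimental structure. The sketch below assumes the two coordinate sets are already superimposed and listed in the same atom order; the function name and coordinates are illustrative.

```python
import math


def rmsd(model, reference):
    """Root-mean-square deviation between two pre-superimposed coordinate sets."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(model, reference))
    return math.sqrt(squared / len(model))


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(shifted, reference))
```

A single RMSD value hides where the error is, so per-residue deviations are usually inspected as well to tell a local loop error from a global misalignment.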
Model Building Programs
Name        Type  URL
COMPOSER    P     www-cryst.bioc.cam.ac.uk
CONGEN      P     www.congenomics.com/congen/congen.html
CPH models  S     www.cbs.dtu.dk/services/CPHmodels/
DRAGON      P     www.nimr.mrc.ac.uk/~mathbio/a-aszodi/dragon.html
ICM         P     www.molsoft.com
InsightII   P     www.msi.com
MODELLER    P     guitar.rockefeller.edu/modeller/modeller.html
LOOK        P     www.mag.com
QUANTA      P     www.msi.com
SYBYL       P     www.tripos.com
SCWRL       P     www.cmpharm.ucsf.edu/~bower/scrwl/scrwl.html
SWISS-MOD   S     www.expasy.ch/swissmod
WHAT IF     P     www.sander.embl-heidelberg.de/whatif/
Type: P = program, S = web server
Applications
Critical Assessment of protein Structure Prediction (CASP)
Venclovas et al. Proteins, 2001
Conclusions
• Computer simulations are powerful for showing detailed motions, but they cannot cover long enough time spans to simulate folding for large systems
• Homology modeling techniques can be successful if the target protein has a known fold
   • The higher the sequence similarity, the more likely the model will be successful
   • With the implementation of better techniques, the errors in fold assignment, alignment, and sidechain and loop modeling are decreasing
   • Theoretically, if at least one member of every possible fold is known, it is possible to predict the structure of every coding sequence to within a certain accuracy