Lecture 5: Protein Modeling
June 4, 2008
Protein Structure Prediction
3D Protein Structure
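[Figure: chain of residues ALA, LEU, PRO, VAL, ARG, each with a central carbon (C), with the backbone and side chain labeled; the 3D structure is shown as unknown (? ? ?)]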
The Protein Folding Problem
we know that the function of a protein is determined in large part by its 3D shape (fold, conformation)
can we predict the 3D shape of a protein given only its amino-acid sequence?
Motivation
Want to identify the function of the genes we find, and what different mutations/alleles do
One gene = one protein (sort of)
Function of protein = function of gene
Function can be determined in many ways
    Gene expression, knockouts, etc.
    But these take time and are prone to mistakes
Goal: if we can determine the structure of every protein, learning their functions isn't too far away
Thornton et al 2000 (Nature)
ai.stanford.edu/~serafim/CS262_2006/Slides
Protein Architecture
proteins are polymers consisting of amino acids linked by peptide bonds
each amino acid consists of
    a central carbon atom
    an amino group
    a carboxyl group
    a side chain
differences in side chains distinguish different amino acids
[Figure: an amino acid, with the amino group (H2N) and carboxyl group (COOH) labeled]
3D Protein Structure
[Figure labels: backbone, side chain, C-alpha]
Peptide Bonds
[Figure labels: amino group, carboxyl group, side chain, alpha carbon (common reference point for coordinates of a structure)]
Amino Acid Side Chains
side chains vary in: shape, size, charge, polarity
Levels of Description
protein structure is often described at four different scales
    primary structure
    secondary structure
    tertiary structure
    quaternary structure
Levels of Description
Secondary Structure
secondary structure refers to certain common repeating structures
it is a local description of structure
two common secondary structures
    α helices
    β strands/sheets
a third category, called coil or loop, refers to everything else
α Helices
[Figure labels: carbon, hydrogen bond, individual amino acid]
β Sheets
Ribbon Diagram Showing Secondary Structures
Levels of Description
What Determines Conformation?
in general, the amino-acid sequence of a protein determines the 3D shape of a protein [Anfinsen et al., 1950s]
but some exceptions
    all proteins can be denatured
    some proteins are inherently disordered (i.e. lack a regular structure)
    some proteins get folding help from chaperones
    there are various mechanisms through which the conformation of a protein can be changed in vivo
        post-translational modifications such as phosphorylation
        prions
        etc.
What Determines Conformation?
what physical properties of the protein determine its fold?
    rigidity of the protein backbone
    interactions among amino acids, including
        electrostatic interactions
        van der Waals forces
        volume constraints
        hydrogen, disulfide bonds
    interactions of amino acids with water
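As a rough illustration of two of the interaction terms listed above, the sketch below evaluates a Coulomb (electrostatic) term and a Lennard-Jones (van der Waals) term for a single pair of atoms. The functional forms are the usual textbook ones; the function names and the default parameter values (`epsilon`, `sigma`, `dielectric`) are illustrative assumptions, not values from the lecture.

```python
# Approximate conversion factor so that charges in units of e and distances
# in Angstroms give energies in kcal/mol (value commonly quoted as ~332).
COULOMB_CONST = 332.06

def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Electrostatic energy of two point charges q1, q2 separated by r Angstroms."""
    return COULOMB_CONST * q1 * q2 / (dielectric * r)

def lennard_jones_energy(r, epsilon=0.1, sigma=3.4):
    """12-6 Lennard-Jones energy; epsilon is the well depth, sigma the zero crossing."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

# Opposite charges attract (negative energy); like charges repel (positive energy).
attraction = coulomb_energy(1.0, -1.0, 3.0)
repulsion = coulomb_energy(1.0, 1.0, 3.0)

# The Lennard-Jones minimum sits at r = 2**(1/6) * sigma with depth -epsilon.
r_min = 2 ** (1 / 6) * 3.4
well_depth = lennard_jones_energy(r_min)
```

Real force fields sum terms like these over all atom pairs and add bonded terms and solvent effects; this sketch only shows the shape of two pairwise terms.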
Determining Protein Structures
protein structures can be determined experimentally (in many cases) by
    x-ray crystallography
    nuclear magnetic resonance (NMR)
Myoglobin
From www.inst.bnl.gov/GasDetectorLab/x-rays/SRI94.htm
Myoglobin
S.E.V. Phillips. "Structure and refinement of oxymyoglobin at 1.6 Å resolution", J. Mol. Biol. 1980, 142, 531.
X-ray Crystallography
[Figure: an x-ray beam passes through a protein crystal; the diffraction pattern is recorded on a collection plate and converted into an electron density map (3D picture)]
Electron Density Map Interpretation
GIVEN: 3D Electron Density Map
Electron Density Map Interpretation
FIND: All-atom Protein Model
NMR
Nuclear Magnetic Resonance Spectroscopy
Unlike X-ray crystallography, cannot handle large proteins
Exploits the chemical environment to return distances between atoms
Can use knowledge of restraints to identify positions of atoms that produce peaks
Wüthrich K. Protein structure determination in solution by NMR spectroscopy. J Biol Chem. 1990 December 25;265(36):22059-62
Experimental Methods
Very expensive and time-consuming
Computational methods can help with time
Many protein structures still cannot be determined in this manner
More motivation
there is a large sequence-structure gap
    300K protein sequences in SwissProt database
    50K protein structures in PDB database
key question: can we predict structures by computational means instead?
Approaches to Protein Structure Prediction
prediction in 1D
    secondary structure
    solvent accessibility (which residues are exposed to water, which are buried)
    transmembrane helices (which residues span membranes)
prediction in 2D
    inter-residue/strand contacts
prediction in 3D
    homology modeling
    fold recognition (e.g. via threading)
    ab initio prediction (e.g. via molecular dynamics)
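To make the 2D target concrete, here is a minimal sketch of how inter-residue contacts can be read off a known structure: two residues are called "in contact" when their alpha carbons lie within a distance cutoff. The 8 Å cutoff is a commonly used convention and is an assumption here, as are the function name and the toy coordinates.

```python
import math

def contact_map(ca_coords, cutoff=8.0):
    """Return a symmetric boolean matrix; entry [i][j] is True when the
    alpha carbons of residues i and j are within `cutoff` Angstroms."""
    n = len(ca_coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts[i][j] = contacts[j][i] = True
    return contacts

# Toy coordinates (hypothetical): three residues in a row, one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
cmap = contact_map(coords)
```

A 2D predictor tries to produce a matrix like `cmap` from the sequence alone, without access to the coordinates.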
Prediction in 1D, 2D and 3D
Figure from B. Rost, Protein Structure in 1D, 2D, and 3D, The Encyclopaedia of Computational Chemistry, 1998
predicted secondary structure and solvent accessibility
known secondary structure (E = beta strand) and solvent accessibility
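One simple way to compare a predicted secondary-structure string with the known one is the fraction of residues assigned the same state (often reported as Q3 when three states are used). The sketch below assumes a three-letter alphabet in which E is a beta strand, as in the figure, and H (helix) and C (coil) are the conventional remaining labels; the example strings are made up.

```python
def q3_accuracy(predicted, known):
    """Fraction of positions where the predicted state equals the known state."""
    if len(predicted) != len(known):
        raise ValueError("predicted and known strings must have the same length")
    if not known:
        raise ValueError("empty secondary-structure strings")
    matches = sum(p == k for p, k in zip(predicted, known))
    return matches / len(known)

# Hypothetical 12-residue example: positions 1 and 11 (0-based) disagree.
predicted = "CCEEEECCHHHH"
known     = "CEEEEECCHHHC"
accuracy = q3_accuracy(predicted, known)
```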
2D Prediction Approaches
use secondary structure predictions to predict short-range contacts (e.g. hydrogen bonds in helices)
use secondary structure predictions to predict strand alignments
use correlated mutations to predict contacts
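One common way to quantify correlated mutations is the mutual information between two columns of a multiple sequence alignment: columns that change together score high, and a conserved column scores zero. This is a sketch of that one measure, not necessarily the one the lecture had in mind; the function name and the toy alignment are assumptions.

```python
import math
from collections import Counter

def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Toy alignment (hypothetical): columns 0 and 1 always change together,
# column 2 is fully conserved.
alignment = ["AKL", "AKL", "DEL", "DEL"]
mi_covarying = column_mutual_information(alignment, 0, 1)
mi_conserved = column_mutual_information(alignment, 0, 2)
```

High-scoring column pairs are candidate contacts, although in practice shared ancestry among the aligned sequences also inflates such scores.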
Prediction in 3D
homology modeling
    given: a query sequence Q, a database of protein structures
    do:
        find a protein P that has high sequence similarity to Q
        return P's structure as an approximation to Q's structure
fold recognition (threading)
    given: a query sequence Q, a database of known folds
    do:
        find a fold F such that Q can be aligned with F in a highly compatible manner
        return F as an approximation to Q's structure
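The homology modeling recipe above can be sketched as a template lookup. This toy version assumes the database sequences are already aligned to the query (equal length, `-` for gaps) and uses percent identity as the similarity measure; the names `sequence_identity` and `best_template`, the 30% threshold, and the example sequences are all illustrative assumptions.

```python
def sequence_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)

def best_template(query, database, min_identity=0.30):
    """Return (name, identity) of the most similar database entry,
    or (None, identity) when nothing reaches min_identity."""
    best_name, best_identity = None, 0.0
    for name, aligned_seq in database.items():
        identity = sequence_identity(query, aligned_seq)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_identity < min_identity:
        return None, best_identity
    return best_name, best_identity

# Hypothetical database of pre-aligned sequences with known structures.
database = {"P1": "ACDEFGHIKV", "P2": "ACDWWWWWWW", "P3": "WWWWWWWWWW"}
template, identity = best_template("ACDEFGHIKL", database)
```

A real pipeline would run an alignment search (for example BLAST or PSI-BLAST) instead of assuming pre-aligned input, and would then build the model from the chosen template's coordinates.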
Prediction in 3D
fragment assembly (Rosetta)
    given: a query sequence Q, a database of structure fragments
    do:
        find a set of fragments that Q can be aligned with in a highly compatible manner
        return the combined fragments as an approximation to Q's structure
molecular dynamics
    given: a query sequence Q
    do:
        use laws of physics to simulate folding of Q
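To give a flavor of what "use laws of physics" means computationally, the sketch below integrates Newton's equations with the velocity Verlet scheme, a standard molecular dynamics integrator. It is a toy under strong assumptions: one particle in 1D attached to a harmonic spring, in arbitrary units. Real protein simulations apply the same update to thousands of atoms with a full force field.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps of size dt under force(x)."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v

def spring_force(x, k=1.0):
    """Harmonic restoring force, a stand-in for a single covalent bond."""
    return -k * x

# Start stretched and at rest; the exact solution is x(t) = cos(t).
x_end, v_end = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                               mass=1.0, dt=0.01, n_steps=1000)
total_energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
```

The total energy stays close to its initial value of 0.5, which is the property that makes this integrator popular for long simulations.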
Homology Modeling
[Figure: pairwise sequence identity axis from 0% to 100%, with marks at 20% and 30%; regions labeled "probably unrelated", "remote homologs", and "homologs"]
most pairs of proteins with similar structure are remote homologs (< 25% sequence identity)
homology modeling usually doesn't work for remote homologs; most pairs of proteins with < 25% sequence identity are unrelated
Homology-based Prediction
Raw model
Loop modeling
Side chain placement
Refinement
The SCOP Database
Structural Classification Of Proteins
FAMILY: proteins that are >30% similar, or >15% similar and have similar known structure/function
SUPERFAMILY: proteins whose families have some sequence and function/structure similarity suggesting a common evolutionary origin
COMMON FOLD: superfamilies that have the same secondary structures in the same arrangement, probably resulting from physics and chemistry
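The FAMILY rule of thumb stated above can be written as a one-line predicate. This only encodes the thresholds as given on the slide; the function and parameter names are hypothetical, and it is not a description of how SCOP entries are actually curated.

```python
def same_scop_family(similarity, similar_structure_or_function):
    """Family rule from the slide: >30% similar, or >15% similar
    with similar known structure/function. `similarity` is a fraction in [0, 1]."""
    return similarity > 0.30 or (similarity > 0.15 and similar_structure_or_function)

# Hypothetical pairs of proteins.
clear_case = same_scop_family(0.45, False)       # above 30%: family on sequence alone
supported_case = same_scop_family(0.20, True)    # 15-30% plus structural evidence
unsupported_case = same_scop_family(0.20, False) # 15-30% without further evidence
```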
Examples of Fold Classes
Threading
Ab initio Prediction: ROSETTA
1. PSI-BLAST homology search
    Discard sequences with >25% homology
2. PHD
    For each 3-long and each 9-long sequence fragment, get 25 structure fragments that match well
3. Markov Chain Monte Carlo
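The fragment-picking step above (every 3-long and 9-long window of the query, 25 well-matching structure fragments each) can be sketched as a sliding window plus a top-k selection. The scoring function here, a count of identical residues, is a deliberately crude placeholder and not Rosetta's actual scoring; the fragment library format and all names are hypothetical.

```python
import heapq

def sequence_windows(sequence, length):
    """All (start, subsequence) windows of the given length."""
    return [(start, sequence[start:start + length])
            for start in range(len(sequence) - length + 1)]

def identity_score(window, fragment):
    """Placeholder score: number of identical residues between the query
    window and the fragment's sequence."""
    frag_seq, _frag_id = fragment
    return sum(a == b for a, b in zip(window, frag_seq))

def top_fragments(window, library, score=identity_score, k=25):
    """The k library fragments that best match the window, best first."""
    return heapq.nlargest(k, library, key=lambda frag: score(window, frag))

query = "ACDEFGHIKL"
windows_3 = sequence_windows(query, 3)
windows_9 = sequence_windows(query, 9)

# Hypothetical library of (sequence, structure fragment id) pairs.
library = [("ACD", "frag1"), ("ACW", "frag2"), ("WWW", "frag3")]
best = top_fragments("ACD", library, k=2)
```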
Ab initio Prediction: CASP results
Summary of current state of the art
Open Ended
Ab initio prediction is the goal, but we are far from it
Sidechain prediction
Contact map prediction
Search space reduction
Parallelization (GPUs)
Surface accessibility
Other areas
Protein-protein interaction
Drug design
Protein engineering
Ligand docking/inhibition
Function prediction