http://creativecommons.org/licenses/by-sa/2.0/
From Protein Sequence to Protein Structure
Prof: Rui [email protected]
973702406
Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08
Website of the Course: http://web.udl.es/usuaris/pg193845/Courses/Bioinformatics_2007/
Course: http://10.100.14.36/Student_Server/
Outline
• Fundamentals of protein structure
• From protein sequence to secondary structure
• Protein tertiary structure
• Predicting protein structure
Predicting protein sequence from DNA sequence
• Protein sequence can be predicted by translating the cDNA and using the genetic code.
MQTLSERLKKRRIALKMTQTELATKAGVKQQSIQLIEAGVTKRPRFLFEIAMALNCDPVWLQYGTKRGKAA
atgcaaactctttctgaacgcctcaagaagaggcgaattgcgttaaaaatgacgcaaaccgaactggcaaccaaagccggtgttaaacagcaatcaattcaactgattgaagctggagtaaccaagcgaccgcgcttcttgtttgagattgctatggcgcttaactgtgatccggtttggttacagtacggaactaaacgcggtaaagccgcttaa
augcaaacucuuucugaacgccucaagaagaggcgaauugcguuaaaaaugacgcaaaccgaacuggcaaccaaagccgguguuaaacagcaaucaauucaacugauugaagcuggaguaaccaagcgaccgcgcuucuuguuugagauugcuauggcgcuuaacugugauccgguuugguuacaguacggaacuaaacgcgguaaagccgcuuaa
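The translation step above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and reading frame 1; the function names `transcribe` and `translate` are chosen here for illustration and are not part of the course material.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the mRNA that has the same sequence as the coding DNA strand."""
    return dna.upper().replace("T", "U")


def translate(nucleotides):
    """Translate DNA or RNA in frame 1, stopping at the first stop codon."""
    dna = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# First seven codons of the cDNA shown above.
print(transcribe("atgcaaactctttctgaacgc"))
print(translate("atgcaaactctttctgaacgc"))  # MQTLSER
```

Applied to the full cDNA on this slide, the same function should reproduce the protein sequence listed above it.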
Proteins are the primary functional manifestation of genomes
DNA sequence → (transcription) → RNA sequence → (translation) → protein sequence → protein structure → protein function
Being able to predict the protein sequence from the gene sequence allows us to predict structure, which in turn helps us understand how the protein does what it does
Amino acids are the primary building blocks of proteins
• The sequence of AAs is the primary structure of proteins
• Sequence determines structure
• Amino acids don’t fall neatly into classes
• How we casually speak of them can affect the way we think about their behavior. For example, if you think of Cys as a polar residue, you might be surprised to find it in the hydrophobic core of a protein unpaired to any other polar group. But this does happen.
• The properties of a residue type can also vary with conditions/environment
Grouping the amino acids by properties
Livingstone & Barton, CABIOS, 9, 745-756, 1993.
Proteins are made by controlled polymerization of amino acids
[Figure: H2N-CH(R1)-CO-OH + H2N-CH(R2)-CO-OH → H2N-CH(R1)-CO-NH-CH(R2)-CO-OH + HOH. A peptide bond is formed between residue 1 and residue 2, and water is eliminated. Residue 1 carries the N or amino terminus; residue 2 carries the C or carboxy terminus.]
Two amino acids condense to form a dipeptide. If there are more, it becomes a polypeptide. Short polypeptide chains are usually called peptides, while longer ones are called proteins.
Repeating torsion angles
φ/ψ angles characterize the secondary structure
Secondary structure elements in proteins
beta-strand (nonlocal interactions)
alpha-helix (local interactions)
A secondary structure element is a contiguous region of a protein sequence characterized by a repeating pattern of main-chain hydrogen bonds and backbone phi/psi angles
These elements reflect the tendency of the backbone to hydrogen bond with itself in a semi-ordered fashion when compacted
Principal types of secondary structure found in proteins
Repeating (φ,ψ) values

Structure                      φ        ψ
α-helix (1←5), right-handed    -63°     -42°
3₁₀ helix (1←4)                -57°     -30°
Parallel β-sheet               -119°    +113°
Antiparallel β-sheet           -139°    +135°
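As a rough illustration of how these repeating values can be used, the sketch below assigns a single (φ,ψ) pair to the nearest of the four canonical conformations listed above. The function names and the nearest-point rule are assumptions made for this example. A real assignment also requires the repeating hydrogen-bond pattern, so one residue falling near a canonical point does not by itself make a helix or a sheet.

```python
import math

# Canonical repeating (phi, psi) values in degrees, taken from the list above.
CANONICAL = {
    "alpha-helix": (-63.0, -42.0),
    "3-10 helix": (-57.0, -30.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "antiparallel beta-sheet": (-139.0, 135.0),
}


def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees (wraps at +/-180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def nearest_conformation(phi, psi):
    """Return the canonical conformation closest to (phi, psi) in torsion-angle space."""
    def distance(name):
        ref_phi, ref_psi = CANONICAL[name]
        return math.hypot(angle_difference(phi, ref_phi),
                          angle_difference(psi, ref_psi))
    return min(CANONICAL, key=distance)


print(nearest_conformation(-60.0, -45.0))   # alpha-helix
print(nearest_conformation(-140.0, 140.0))  # antiparallel beta-sheet
```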
The alpha-helix: repeating i,i+4 h-bonds
[Figure: an α-helix with residues 1-12 labeled and a hydrogen bond marked, next to a phi-psi plot (both axes from -180° to 180°) with the right-handed helical region highlighted. Right-handed α-helix (1←5): φ = -63°, ψ = -42°.]
By DSSP definitions, which of residues 1-12 are in the helix? Does this coincide with the residues in the helical region of phi-psi space?
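To make the DSSP question concrete, here is a simplified sketch of the DSSP minimal-helix rule as defined by Kabsch and Sander: an n-turn exists at residue i when the C=O of residue i is hydrogen bonded to the N-H of residue i+n, and two consecutive n-turns at i-1 and i mark residues i to i+n-1 as helix. The hydrogen-bond list is assumed to be given; real DSSP derives it from an electrostatic energy criterion, which is not reproduced here. The function name `helix_residues` is illustrative.

```python
def helix_residues(hbonds, n=4):
    """Return residues assigned to a helix under a DSSP-like minimal-helix rule.

    hbonds is a set of (i, j) pairs meaning: the C=O of residue i accepts a
    hydrogen bond from the N-H of residue j. With n=4 this is the alpha-helix.
    """
    # An n-turn starts at i whenever there is an H-bond from i to i+n.
    turns = {i for (i, j) in hbonds if j - i == n}
    helix = set()
    for i in turns:
        # Two consecutive turns (at i-1 and i) define helix residues i .. i+n-1.
        if i - 1 in turns:
            helix.update(range(i, i + n))
    return sorted(helix)


# Hypothetical H-bond list: 1->5, 2->6, 3->7.
print(helix_residues({(1, 5), (2, 6), (3, 7)}))  # [2, 3, 4, 5, 6]
```

Note that under this rule the first hydrogen-bonded residue is not itself labeled as helix, which is one reason a DSSP assignment and the helical region of phi-psi space need not coincide exactly.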
β-strands/sheets
[Figure: a β-sheet with residues between 49 and 57 labeled, next to a phi-psi plot (both axes from -180° to 180°) with the beta-strand region highlighted. Parallel β-sheet: φ = -119°, ψ = +113°.]
Is this a parallel or anti-parallel sheet?
By DSSP definitions, which of res 49-57 are in the sheet? Does this coincide with the residues in the beta-strand region of phi-psi space?
Contact maps of protein structures
1avg: structure of triabin
[Figure: map of C-C distances < 6 Å, next to a rainbow ribbon diagram colored blue to red from N to C]
- both axes are the sequence of the protein
near diagonal: local contacts in the sequence
off-diagonal: long-range (nonlocal) contacts
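A contact map like this one can be computed directly from atomic coordinates. The sketch below is a minimal version, assuming one representative atom per residue (for example the Cα) and the 6 Å cutoff quoted above. The `min_separation` threshold that splits local from long-range contacts and the toy coordinates are hypothetical choices for illustration, not values from the slide.

```python
import math


def contact_map(coords, cutoff=6.0):
    """Boolean matrix: True where two residues are closer than cutoff (in Å)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) < cutoff for j in range(n)]
            for i in range(n)]


def split_contacts(cmap, min_separation=4):
    """Split contacts into near-diagonal (local) and off-diagonal (long-range) pairs."""
    local, long_range = [], []
    n = len(cmap)
    for i in range(n):
        for j in range(i + 1, n):
            if cmap[i][j]:
                (local if j - i < min_separation else long_range).append((i, j))
    return local, long_range


# Toy hairpin: residues 0-3 run one way, residues 4-7 run back alongside them.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0),
          (11.4, 3.8, 0.0), (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
local, long_range = split_contacts(contact_map(coords))
print("local:", local)
print("long-range:", long_range)
```

In this toy hairpin the two ends of the chain (residues 0 and 7) are in contact, which shows up far from the diagonal, as antiparallel strand pairs do in a real map.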
What is secondary structure and what does it teach?
• Secondary structure is the sequence of fold elements in a protein (α-β-loop)
  – The number and order of secondary structures in the sequence (connectivity) and their arrangement in space define a protein’s fold or topology
• If one can predict secondary structure from the primary structure, this may help in predicting protein function, via evolutionary relationships with known folds
Predicting the secondary structure of your protein
Tertiary structure in proteins
• Single polypeptide chain
• The number and order of secondary structures in the sequence (connectivity) and their arrangement in space define a protein’s fold or topology
• The pattern of contacts between side chains/backbone is also an aspect of tertiary structure
• Outer surface and interior
Obvious interactions in native protein structures
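[Figure: a folded chain with side chains R1, R2 and R3, illustrating three kinds of interaction]
• Disulfide crosslinks (S-S)
• Polar interactions (hydrogen bond, O···H-N; salt bridge, CO2/NH3)
• Hydrophobic interactions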
The Protein Data Bank
The Protein Data Bank (PDB) is a central repository of protein structures
http://www.rcsb.org/pdb/home/home.do
Major structure classification systems
SCOP (Structural Classification of Proteins)
CATH (Class-Architecture-Topology-Homology)
DALI/FSSP (Fold classification based on Structure-Structure Alignment)
SCOP and CATH are quite similar and generally combine automated and manual aspects. They are both “curated” by human experts.
The nuts and bolts behind fold prediction
• Start from a database of known structures and a database of corresponding sequences (ACDEFGTYAEE……), with each residue assigned to α-helix, coil or β-strand
• Split both into a training set (known structures and corresponding sequences) and a test set (known structures and corresponding sequences)
• From the training set, tabulate the probability of each amino acid in each secondary structure:

        p(α-helix)   p(coil)   p(β-strand)
    A   0.23         0.28      0.5

• Tabulate the corresponding probabilities p(aa1-coil), p(aa1-helix), p(aa1-strand), …:

        p(α-helix)    p(coil)        p(β-strand)
        A…C…          A…C..          A…C…
    A   0.1…0.03      0.04…0.002     0.1…0.21

• Predict secondary structure for the test set and compare with the known structures
• Bad predictions: reshuffle training set and test set and repeat until predictions are correct
• Good predictions: method ready for secondary structure prediction of new sequences
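The train, predict and compare loop described here can be sketched as follows. This is a deliberately simple illustration that uses only single-residue probabilities of the kind shown in the first table; it is not the method of any particular server, and real predictors also use neighbouring residues and sequence profiles. The state letters (H, C, E), the function names and the toy training data are assumptions made for this example.

```python
from collections import Counter, defaultdict

STATES = "HCE"  # H = alpha-helix, C = coil, E = beta-strand


def train(sequences, structures):
    """Estimate p(state | amino acid) by counting over the training set."""
    counts = defaultdict(Counter)
    for seq, ss in zip(sequences, structures):
        for aa, state in zip(seq, ss):
            counts[aa][state] += 1
    return {aa: {s: c[s] / sum(c.values()) for s in STATES}
            for aa, c in counts.items()}


def predict(sequence, probabilities):
    """Give each residue its most probable state; unseen residues default to coil."""
    return "".join(
        max(STATES, key=lambda s: probabilities[aa][s]) if aa in probabilities else "C"
        for aa in sequence
    )


def accuracy(predicted, observed):
    """Fraction of residues whose predicted state matches the known structure."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical toy data: one training pair and one test pair.
probabilities = train(["AAGV"], ["HHCE"])
prediction = predict("AGVA", probabilities)
print(prediction, accuracy(prediction, "HCEC"))
```

If the accuracy on the test set is poor, the procedure above reshuffles the training and test sets and repeats; if it is good, `predict` can be applied to a new sequence.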
How does a fold prediction server work?
• Input: YOUR SEQUENCE
• The server uses a database of known structures, a database of corresponding sequences, a database of probabilities of aa in secondary structure, and a homology-based helix-coil-strand profile folds database
• Strong homology → … → fold prediction
• Weak/no homology → helix-coil-strand profile prediction → … → fold prediction
Predicting protein folding
Predicting protein structure
• Homology Modeling
  – Phyre, 3D-JIGSAW, SWISSMODEL
• Ab initio Modeling
  – ROBETTA
Predicting protein structure by homology
How does a homology modeling server work?
• Input to the server/program: your sequence (…YDVRSEQVENCE…), a database of known structures and a database of corresponding sequences
• Find strong homologues
• Build the best possible alignment (sequence + structure):
    …YDVR-SEQVENCE…
    …YDVRMSD-VDNCD…
• Thread the sequence to predict over the known structure according to the alignment
• Optimization via energy minimization, etc.
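The threading step depends entirely on the alignment: each residue of the sequence to predict is placed on the template residue it is aligned with. The sketch below shows that bookkeeping for the gapped alignment on this slide. The function name and the convention of returning `None` for residues with no template counterpart are assumptions for illustration; in practice such residues have to be built separately before optimization.

```python
def map_target_to_template(aligned_target, aligned_template):
    """For each target residue, return the index of the aligned template residue.

    Both inputs are rows of the same alignment, with '-' marking gaps.
    Target residues aligned to a gap in the template map to None.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    mapping = []
    template_index = 0
    for target_char, template_char in zip(aligned_target, aligned_template):
        if target_char != "-":
            mapping.append(template_index if template_char != "-" else None)
        if template_char != "-":
            template_index += 1
    return mapping


# Alignment from the slide: target on top, template of known structure below.
print(map_target_to_template("YDVR-SEQVENCE", "YDVRMSD-VDNCD"))
# [0, 1, 2, 3, 5, 6, None, 7, 8, 9, 10, 11]
```

A shifted gap changes every index after it, which is why model quality depends so strongly on alignment quality.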
Predicting protein structure
• Homology Modeling
  – 3D-JIGSAW, SWISSMODEL
• Ab initio Modeling
  – ROSETTA
Predicting protein structure by ab initio methods
• Input to the server/program: your sequence (…YDVRSEQVENCE…) and a database of corresponding sequences
• NO homologues
• Search a database of structures for smaller amino acid runs:
    …YDVR-SEQ        …YDVR-SEQ
    …YDVRMSD-…       …YPVRMSD-…
    …VENCE…          …VENCE…
    …YDNCD…          …VEQCE…
• Assemble
• Energy minimization & optimization
Accuracy of modelling
• Accuracy varies widely.
• The quality of the model is VERY dependent on the quality of the alignment
• Globular proteins are more accurately predicted
• Membrane proteins are still a big problem
• Homology modelling is “bad” if homology < 30%
• CASP is a biennial meeting where the accuracy of the different methods is assessed
  – The Baker group is usually and consistently more accurate than others
http://www.predictioncenter.org/
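The 30% rule of thumb can be checked on any pairwise alignment. The sketch below assumes that "homology" here means percent sequence identity computed over the aligned (non-gap) positions; other definitions of identity exist and give somewhat different numbers. The function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over positions where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def is_usable_template(aligned_a, aligned_b, threshold=30.0):
    """Apply the rule of thumb: below the threshold, homology models are unreliable."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Example alignment (illustrative): 8 identities over 11 aligned positions.
identity = percent_identity("YDVR-SEQVENCE", "YDVRMSD-VDNCD")
print(round(identity, 1), is_usable_template("YDVR-SEQVENCE", "YDVRMSD-VDNCD"))
```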
BLAST Algorithm
• Sequences are split into words (default n=3)
  – Speed, computational efficiency
• Scoring of matches done using scoring matrices
• HSP = high scoring segment pair
  – BLAST algorithm extends the initial “seed” hit into an HSP
• Local optimal alignment
• More than one HSP can be found
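The seed-and-extend idea can be sketched as below. This is a simplified illustration, not the BLAST implementation: it seeds on exact word matches only, scores with +1/-1 instead of a scoring matrix, and extends without gaps until the score drops a fixed amount below the best seen. The function names, the scoring values and the `x_drop` parameter are assumptions for this example.

```python
def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) for every exact word shared by both sequences."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size=3, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions into a high scoring segment pair.

    Returns (score, query_start, subject_start, length).
    """
    def extend(qi, sj, step):
        best = score = best_len = length = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= x_drop:
                break  # score fell too far below the best: stop extending
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = extend(i + word_size, j + word_size, 1)
    left_score, left_len = extend(i - 1, j - 1, -1)
    score = word_size * match + left_score + right_score
    return score, i - left_len, j - left_len, word_size + left_len + right_len


query, subject = "GGYDVRSE", "TTYDVRSEKK"  # hypothetical sequences
seeds = find_seeds(query, subject)
print(seeds)
print(extend_seed(query, subject, *seeds[0]))  # (6, 2, 2, 6)
```

Several seeds can lead to the same or to different segments, which is how more than one HSP can be reported for a pair of sequences.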
Sequence-Structure Hybrid alignments
ACEFGHIKLMNPQRSTVWYAALII….
ACDYGHIKLCQANRSTVWY ALII….
ACDYGHIKLCQANRSTVWY -ALII….
aaaaaaaaa l l l l l aaaaaaaaaa….
aaaaaaaaaaaaaaaaaaaaaaaa….
Using a probability model to predict secondary structure we can align the secondary structures
If 3D structures are available for homologues, then structure can be used to improve alignment. STRAP does that:
http://www.charite.de/bioinf/strap/
Summary
• DNA sequence to protein sequence
• From protein sequence to secondary structure
• Protein tertiary structure
• Predicting protein structure
To Do
• Second task: Use your genes from the first task and obtain the protein sequences of all real genes. Characterize the proteins physico-chemically, and predict or find their localization and their post-translational modifications. Finish by creating structural models of each of your proteins. Write a small paper describing all your procedures and results in fewer than 8 pages, double spaced, in Times New Roman font no smaller than 12 points. Tables (maximum 2) and figures (maximum 5) are allowed and are not included in the page limit. Organize your paper in the following way: introduction, methods, results, conclusions and discussion, bibliography, tables, figures with figure captions.