
creativecommons/licenses/by-sa/2.0

Date post: 18-Jan-2016
Upload: bracha
Description:
http://creativecommons.org/licenses/by-sa/2.0/. From Protein Sequence to Protein Structure. Prof:Rui Alves [email protected] 973702406 Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08 Website of the Course: http://web.udl.es/usuaris/pg193845/Courses/Bioinformatics_2007/ - PowerPoint PPT Presentation
Transcript
Page 1: creativecommons/licenses/by-sa/2.0

http://creativecommons.org/licenses/by-sa/2.0/

Page 2: creativecommons/licenses/by-sa/2.0

From Protein Sequence to Protein Structure

Prof: Rui Alves

[email protected]

973702406

Dept Ciencies Mediques Basiques, 1st Floor, Room 1.08

Website of the Course: http://web.udl.es/usuaris/pg193845/Courses/Bioinformatics_2007/

Course: http://10.100.14.36/Student_Server/

Page 3: creativecommons/licenses/by-sa/2.0

• Fundamentals of protein structure

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Outline

Page 4: creativecommons/licenses/by-sa/2.0

Predicting protein sequence from DNA sequence

• Protein sequence can be predicted by translating the cDNA and using the genetic code.

Page 5: creativecommons/licenses/by-sa/2.0

MQTLSERLKKRRIALKMTQTELATKAGVKQQSIQLIEAGVTKRPRFLFEIAMALNCDPVWLQYGTKRGKAA

atgcaaactctttctgaacgcctcaagaagaggcgaattgcgttaaaaatgacgcaaaccgaactggcaaccaaagccggtgttaaacagcaatcaattcaactgattgaagctggagtaaccaagcgaccgcgcttcttgtttgagattgctatggcgcttaactgtgatccggtttggttacagtacggaactaaacgcggtaaagccgcttaa

augcaaacucuuucugaacgccucaagaagaggcgaauugcguuaaaaaugacgcaaaccgaacuggcaaccaaagccgguguuaaacagcaaucaauucaacugauugaagcuggaguaaccaagcgaccgcgcuucuuguuugagauugcuauggcgcuuaacugugauccgguuugguuacaguacggaacuaaacgcgguaaagccgcuuaa

Proteins are the primary functional manifestation of genomes

DNA sequence → (transcription) → RNA sequence → (translation) → protein sequence → protein structure → protein function

Being able to predict the protein sequence from the gene sequence allows us to predict structure, which in turn helps us understand how the protein does what it does
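The transcription and translation steps on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code, reading frame 1, and translation that stops at the first stop codon; the function names `transcribe` and `translate` are made up for this example. The test input is the first 21 nucleotides of the DNA sequence shown on the slide.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Return the RNA version of a coding-strand DNA sequence (t -> u)."""
    return dna.lower().replace("t", "u")


def translate(seq):
    """Translate DNA or RNA in frame 1, stopping at the first stop codon."""
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("atgcaaactctttctgaacgc"))  # MQTLSER
```

Real gene prediction also has to choose the strand and reading frame and, in eukaryotes, deal with introns, which is why the slide starts from cDNA.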

Page 6: creativecommons/licenses/by-sa/2.0

• The sequence of AAs is the primary structure of proteins

• Sequence determines structure

• Amino acids don’t fall neatly into classes

• How we casually speak of them can affect the way we think about their behavior. For example, if you think of Cys as a polar residue, you might be surprised to find it in the hydrophobic core of a protein unpaired to any other polar group. But this does happen.

• The properties of a residue type can also vary with conditions/environment

Amino acids are the primary building blocks of proteins

Page 7: creativecommons/licenses/by-sa/2.0

Grouping the amino acids by properties

Livingstone & Barton, CABIOS, 9, 745-756, 1993.

Page 8: creativecommons/licenses/by-sa/2.0

Proteins are made by controlled polymerization of amino acids

[Figure: two amino acids, H2N-CH(R1)-C(=O)-OH (residue 1) and H2N-CH(R2)-C(=O)-OH (residue 2), condense to form H2N-CH(R1)-C(=O)-NH-CH(R2)-C(=O)-OH + HOH. Residue 1 carries the N or amino terminus; residue 2 carries the C or carboxy terminus.]

Two amino acids condense to form a dipeptide: a peptide bond is formed and water is eliminated.

If there are more it becomes a polypeptide. Short polypeptide chains are usually called peptides while longer ones are called proteins.

Page 9: creativecommons/licenses/by-sa/2.0

• Fundamentals of protein structure

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Outline

Page 10: creativecommons/licenses/by-sa/2.0

Repeating torsion angles

φ/ψ angles characterize the secondary structure

Page 11: creativecommons/licenses/by-sa/2.0

Secondary structure elements in proteins

beta-strand (nonlocal interactions)

alpha-helix (local interactions)

A secondary structure element is a contiguous region of a protein sequence characterized by a repeating pattern of main-chain hydrogen bonds and backbone phi/psi angles

Secondary structure elements reflect the tendency of the backbone to hydrogen bond with itself in a semi-ordered fashion when compacted

Page 12: creativecommons/licenses/by-sa/2.0

Principal types of secondary structure found in proteins

Repeating (φ,ψ) values

α-helix(15) (right-handed): φ = -63°, ψ = -42°

3₁₀ helix(14): φ = -57°, ψ = -30°

Parallel β-sheet: φ = -119°, ψ = +113°

Antiparallel β-sheet: φ = -139°, ψ = +135°
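As a rough illustration of how the four repeating angle pairs above can be used, the sketch below assigns an observed pair of backbone torsion angles to the nearest of the four reference points, treating angles as periodic. This is only a toy classifier written for these notes (the names `CANONICAL`, `angle_diff` and `nearest_type` are made up); it is not how DSSP works, since DSSP assigns secondary structure from hydrogen-bond patterns.

```python
import math

# Reference (phi, psi) pairs in degrees, taken from the slide.
CANONICAL = {
    "alpha-helix (right-handed)": (-63.0, -42.0),
    "3-10 helix": (-57.0, -30.0),
    "parallel beta-sheet": (-119.0, 113.0),
    "antiparallel beta-sheet": (-139.0, 135.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def nearest_type(phi, psi):
    """Name of the reference point closest to (phi, psi)."""
    def distance(name):
        ref_phi, ref_psi = CANONICAL[name]
        return math.hypot(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))
    return min(CANONICAL, key=distance)


print(nearest_type(-60.0, -45.0))
print(nearest_type(-140.0, 130.0))
```

Because the α-helix and 3₁₀ helix reference points lie close together, a nearest-point rule separates them poorly; a whole run of consecutive residues has to be considered in practice.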

Page 13: creativecommons/licenses/by-sa/2.0

The alpha-helix: repeating i,i+4 h-bonds

[Figure: an α-helix with residues 1-12 numbered and the i,i+4 hydrogen bonds marked, beside a phi-psi plot (both axes from -180 to 180) highlighting the right-handed helical region; α-helix(15) (right-handed): -63°, -42°]

By DSSP definitions, which of residues 1-12 are in the helix? Does this coincide with the residues in the helical region of phi-psi space?

Page 14: creativecommons/licenses/by-sa/2.0

β-strands/sheets

[Figure: a β-sheet with residues in the range 49-57 numbered, beside a phi-psi plot (both axes from -180 to 180) highlighting the beta-strand region; parallel β-sheet: -119°, +113°]

Is this a parallel or anti-parallel sheet?

By DSSP definitions, which of res 49-57 are in the sheet? Does this coincide with the residues in the beta-strand region of phi-psi space?

Page 15: creativecommons/licenses/by-sa/2.0

Contact maps of protein structures

1avg: structure of triabin

map of C-C distances < 6 Å

rainbow ribbon diagram, blue to red: N to C

both axes of the map are the sequence of the protein

near diagonal: local contacts in the sequence

off-diagonal: long-range (nonlocal) contacts
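A contact map like the one described above can be computed directly from atomic coordinates. The sketch below is a minimal example with made-up coordinates for a six-residue hairpin (one point per residue); the 6 Å cutoff comes from the slide, while the sequence-separation threshold used to call a contact "nonlocal" (`min_separation=4`) is an assumption chosen only for illustration.

```python
import math


def contact_map(coords, cutoff=6.0):
    """Boolean matrix: True where two residues are closer than cutoff (Å)."""
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) < cutoff for j in range(n)]
        for i in range(n)
    ]


def nonlocal_contacts(cmap, min_separation=4):
    """Contacts (i, j) with j - i >= min_separation, i.e. far off the diagonal."""
    n = len(cmap)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if cmap[i][j]
    ]


# Hypothetical coordinates: three residues out, three residues back.
hairpin = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
cmap = contact_map(hairpin)
print(nonlocal_contacts(cmap))  # [(0, 4), (0, 5), (1, 5)]
```

In the toy hairpin the first and last residues are far apart in sequence but close in space, which is exactly the kind of off-diagonal signal that a β-sheet produces in a real map.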

Page 16: creativecommons/licenses/by-sa/2.0

• Secondary structure is the sequence of fold elements in a protein (α-β-loop)

• The number and order of secondary structures in the sequence (connectivity) and their arrangement in space define a protein’s fold or topology

• If, from the primary structure one can predict secondary structure, then this may help in predicting protein function, via evolutionary relationships with known folds

What is secondary structure and what does it teach?

Page 17: creativecommons/licenses/by-sa/2.0

Predicting the secondary structure of your protein

Page 18: creativecommons/licenses/by-sa/2.0

• Fundamentals of protein structure

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Outline

Page 19: creativecommons/licenses/by-sa/2.0

Tertiary structure in proteins

• Single polypeptide chain

• The number and order of secondary structures in the sequence (connectivity) and their arrangement in space define a protein’s fold or topology

• Pattern of contacts between side chains/backbone also an aspect of tertiary structure

• Outer surface and interior

Page 20: creativecommons/licenses/by-sa/2.0

Obvious interactions in native protein structures

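[Figure: examples of each interaction type, drawn with side chains R1, R2 and R3]

• disulfide crosslinks (S-S)

• polar interactions (hydrogen bond/salt bridge), e.g. between CO2 and NH3 groups or between backbone O and NH

• hydrophobic interactions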

Page 21: creativecommons/licenses/by-sa/2.0

The protein databank

The protein databank is a central repository of protein structures

http://www.rcsb.org/pdb/home/home.do

Page 22: creativecommons/licenses/by-sa/2.0

Major structure classification systems

SCOP (Structural Classification of Proteins)

CATH (Class-Architecture-Topology-Homology)

DALI/FSSP (Fold classification based on Structure-Structure Alignment)

SCOP and CATH are quite similar and generally combine automated and manual aspects. They are both “curated” by human experts.

Page 23: creativecommons/licenses/by-sa/2.0

• Fundamentals of protein structure

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Outline

Page 24: creativecommons/licenses/by-sa/2.0

The nuts and bolts behind fold prediction

Database of known structures and database of corresponding sequences (ACDEFGTYAEE……, with each residue labelled α-helix, coil or β-strand)

The databases are split into:

• Training set of known structures and training set of corresponding sequences

• Test set of known structures and test set of corresponding sequences

Probabilities tabulated from the training set:

      p(α-helix)    p(coil)       p(β-strand)
A     0.23          0.28          0.5

      p(α-helix)    p(coil)       p(β-strand)
      A…C…          A…C..         A…C…
A     0.1…0.03      0.04…0.002    0.1…0.21

p(aa1-coil), p(aa1-helix), p(aa1-strand), …

Predict 2ary structure of the test set and compare with the known structures

Bad predictions: reshuffle training set and test set and repeat until predictions are correct

Good predictions: method ready for secondary structure prediction of new sequences
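The train/test loop on this slide can be illustrated with a deliberately naive predictor. The sketch below assumes a three-state alphabet (H = helix, C = coil, E = strand), estimates single-residue probabilities from a training set, and predicts the most probable state for each residue of a test sequence. The training data are invented for the example, and the function names are hypothetical; real predictors use sequence context and much larger databases.

```python
from collections import Counter, defaultdict

STATES = "HCE"  # helix, coil, strand


def train(sequences, structures):
    """Estimate p(state | amino acid) from aligned sequence/structure strings."""
    counts = defaultdict(Counter)
    for seq, ss in zip(sequences, structures):
        for aa, state in zip(seq, ss):
            counts[aa][state] += 1
    return {
        aa: {s: c[s] / sum(c.values()) for s in STATES}
        for aa, c in counts.items()
    }


def predict(seq, probs):
    """Pick the most probable state per residue; unseen residues become coil."""
    predicted = []
    for aa in seq:
        if aa in probs:
            predicted.append(max(STATES, key=lambda s: probs[aa][s]))
        else:
            predicted.append("C")
    return "".join(predicted)


def accuracy(predicted, known):
    """Fraction of residues whose predicted state matches the known one."""
    matches = sum(p == k for p, k in zip(predicted, known))
    return matches / len(known)


# Invented toy training set, for illustration only.
probs = train(["AAEEL", "VVIIG"], ["HHHHH", "EEEEC"])
test_prediction = predict("AEV", probs)
print(test_prediction, accuracy(test_prediction, "HHC"))
```

Comparing `accuracy` on a held-out test set, as the slide describes, is what tells you whether the tabulated probabilities generalize or merely memorize the training set.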

Page 25: creativecommons/licenses/by-sa/2.0

How does a fold prediction server work?

Databases used by the server:

• Database of known structures

• Database of corresponding sequences

• Database of probabilities of aa in 2ndary structure

• Homology-based helix-coil-strand profile folds database

YOUR SEQUENCE → Server

• Strong homology → … → fold prediction

• Weak/no homology → helix-coil-strand profile prediction → … → fold prediction

Page 26: creativecommons/licenses/by-sa/2.0

Predicting protein folding

Page 27: creativecommons/licenses/by-sa/2.0

Predicting protein structure

• Homology Modeling: Phyre, 3D-JIGSAW, SWISSMODEL

• Ab initio Modeling: ROBETTA

Page 28: creativecommons/licenses/by-sa/2.0

Predicting protein structure by homology

Page 29: creativecommons/licenses/by-sa/2.0

How does a homology modeling server work?

Database of known structures and database of corresponding sequences

…YDVRSEQVENCE… → Server/Program → strong homologues

Best possible alignment (sequence + structure):

…YDVR-SEQVENCE…

…YDVRMSD-VDNCD…

Thread the sequence to predict over the known structure according to the alignment

… Optimization via energy minimization, etc…
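The threading step can be pictured as copying template coordinates onto the query wherever the alignment pairs two residues. The sketch below uses the alignment from the slide (without the ellipses) and made-up template coordinates; `thread` is a hypothetical helper written for these notes. Query residues aligned to a gap get no coordinates and would have to be built by loop modelling before energy minimization.

```python
def thread(query_aln, template_aln, template_coords):
    """Map each query residue to the coordinates of its aligned template residue.

    Returns a list of (residue, coords) pairs; coords is None where the
    query residue faces a gap in the template.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    placed = []
    t = 0  # index into template_coords (counts non-gap template residues)
    for q, s in zip(query_aln, template_aln):
        coords = None
        if s != "-":
            coords = template_coords[t]
            t += 1
        if q != "-":
            placed.append((q, coords))
    return placed


query_aln = "YDVR-SEQVENCE"
template_aln = "YDVRMSD-VDNCD"
# Hypothetical coordinates: one point per template residue, 3.8 Å apart.
template_coords = [(3.8 * i, 0.0, 0.0) for i in range(12)]

model = thread(query_aln, template_aln, template_coords)
for residue, coords in model:
    print(residue, coords)
```

The quality of the resulting model depends entirely on the alignment: shifting a gap by one column moves every downstream residue onto the wrong template position.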

Page 30: creativecommons/licenses/by-sa/2.0

Predicting protein structure

• Homology Modeling: 3D-JIGSAW, SWISSMODEL

• Ab initio Modeling: ROSETTA

Page 31: creativecommons/licenses/by-sa/2.0

Predicting protein structure by ab initio methods

Database of corresponding sequences

…YDVRSEQVENCE… → Server/Program → NO homologues

Database of structures for smaller amino acid runs; each piece of the sequence is matched against short fragments of known structure:

…YDVR-SEQ… against …YDVRMSD-… and …YPVRMSD-…

…VENCE… against …YDNCD… and …VEQCE…

… Assemble → energy minimization & optimization

Page 32: creativecommons/licenses/by-sa/2.0

Accuracy of modelling

• Accuracy varies widely.

• The quality of the model is VERY dependent on the quality of the alignment

• Globular proteins are more accurately predicted

• Membrane proteins are still a big problem

• Homology modelling is “bad” if sequence identity < 30%

• CASP is a biennial meeting where the accuracy of the different methods is assessed

– Baker group is usually more accurate than others

http://www.predictioncenter.org/
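The 30% rule of thumb above is usually read as percent sequence identity between the query and the template. Definitions of percent identity differ in what they use as the denominator; the sketch below assumes identity is counted over aligned columns in which neither sequence has a gap, and the function name `percent_identity` is made up for this example. The test alignment is the one from the homology modeling slide.

```python
def percent_identity(aln_a, aln_b):
    """Percent of gap-free aligned columns in which both residues are identical."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    identical = sum(a == b for a, b in aligned)
    return 100.0 * identical / len(aligned)


pid = percent_identity("YDVR-SEQVENCE", "YDVRMSD-VDNCD")
print(round(pid, 1))  # 72.7
```

A short toy alignment like this one easily scores above 30%; for full-length proteins, values below that threshold are where alignment errors start to dominate model quality.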

Page 33: creativecommons/licenses/by-sa/2.0

BLAST Algorithm

• Sequences are split into words (default n=3)

– Speed, computational efficiency

• Scoring of matches done using scoring matrices

• HSP = high scoring segment pair

– BLAST algorithm extends the initial “seed” hit into an HSP

• Local optimal alignment

• More than one HSP can be found
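The seed-and-extend idea can be sketched as follows. This is a heavy simplification written for these notes, not the BLAST implementation: seeds here are exact word matches and extension continues only while residues are identical, whereas BLAST scores words and extensions with a substitution matrix and stops extending when the score drops too far. The function names are hypothetical; the two sequences are taken from the hybrid alignment slide.

```python
def words(seq, n=3):
    """All overlapping words of length n, with their start positions."""
    return [(i, seq[i:i + n]) for i in range(len(seq) - n + 1)]


def seed_hits(query, subject, n=3):
    """Positions (i, j) where a query word exactly matches a subject word."""
    index = {}
    for j, word in words(subject, n):
        index.setdefault(word, []).append(j)
    return [(i, j) for i, word in words(query, n) for j in index.get(word, [])]


def extend(query, subject, i, j, n=3):
    """Extend a seed in both directions without gaps while residues match."""
    start_i, start_j = i, j
    while start_i > 0 and start_j > 0 and query[start_i - 1] == subject[start_j - 1]:
        start_i -= 1
        start_j -= 1
    end_i, end_j = i + n, j + n
    while end_i < len(query) and end_j < len(subject) and query[end_i] == subject[end_j]:
        end_i += 1
        end_j += 1
    return query[start_i:end_i]


query = "ACEFGHIKLMNPQRSTVWY"
subject = "ACDYGHIKLCQANRSTVWY"
hits = seed_hits(query, subject)
segments = sorted({extend(query, subject, i, j) for i, j in hits})
print(segments)  # ['GHIKL', 'RSTVWY']
```

The two segments found correspond to the slide's point that more than one HSP can be found for a single pair of sequences.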

Page 34: creativecommons/licenses/by-sa/2.0

Sequence-Structure Hybrid alignments

ACEFGHIKLMNPQRSTVWYAALII….

ACDYGHIKLCQANRSTVWY ALII….

ACDYGHIKLCQANRSTVWY -ALII….

aaaaaaaaa l l l l l aaaaaaaaaa….

aaaaaaaaaaaaaaaaaaaaaaaa….

Using a probability model to predict secondary structure, we can align the secondary structures

If 3D structures are available for homologues, then structure can be used to improve alignment. STRAP does that:

http://www.charite.de/bioinf/strap/

Page 35: creativecommons/licenses/by-sa/2.0

• DNA sequence to protein sequence

• From protein sequence to secondary structure

• Protein tertiary structure

• Predicting protein structure

Summary

Page 36: creativecommons/licenses/by-sa/2.0

To Do

• Second task: Use your genes from the first task and obtain the protein sequence of all real genes. Characterize the proteins physico-chemically and predict/find their localization and their post-translational modifications. Finish by creating structural models of each of your proteins. Write a small paper describing all your procedures and results in less than 8 pages, double spaced and in Times New Roman font, no smaller than 12 points. Tables (maximum 2) and figures (maximum 5) are allowed and are not included in the page limit. Organize your paper in the following way: introduction, methods, results, conclusions and discussion, bibliography, tables, figures with figure captions.

