/105© Burkhard Rost
title: Computational Biology 1 - Protein structure: Intro into protein structure
short title: cb1_intro2_structure
lecture: Protein Prediction 1 - Protein structure Computational Biology 1 - TUM Summer 2016
Videos: YouTube / www.rostlab.org/talks
THANKS: .
EXERCISES:
Special lectures:
• TBN
No lecture:
• 04/26 Security check Rostlab (exercise WILL take place)
• 05/01 May Day (also no exercise)
• 05/10 Ascension Day (also no exercise)
• 05/22 Whitsun holiday (also no exercise)
• 05/31 Corpus Christi (also no exercise)
• 06/21 no lecture (but exercise)
LAST lecture: before: Jul 14, after: Jul xx
Exam: x(!!) J TBA
• Makeup: Oct 16 (TBC)
CONTACT: [email protected]
Dmitrij Nechaev
Your Name
Lothar Richter
Michael Heinzinger
http://www.rostlab.org
Questions about last lecture?
??
???
© Wikipedia
Recap
Recap: genes - proteins
• common to life: DNA/cells
• DNA->RNA->proteins = machinery of life
Central dogma
slide: Andrea Schafferhans
Structure
Function
Codon wheel
© yourgenome.org slide: Andrea Schafferhans
Illumina: MiSeq
Illumina - Early 2012
• run: 6 hours
• full capacity: ~5-10 TB data / day
Recap: genes - proteins
• common to life: DNA/cells
• DNA->RNA->proteins = machinery of life
• proteins made up of amino acids (20 different: like pearl chains with pearls of 20 different sizes & “colors”, i.e. biophysical features)
• ~20k proteins in human
• ~11M protein sequences known
• protein length (number of amino acids): 35-30K
1ej9 Topoisomerase
1i6h RNA Polymerase
1ais TATA-binding Protein/Transcription Factor IIB
1cgp Catabolite Gene Activator Protein
1tau DNA Polymerase
1lbh+1efa lac Repressor
1aoi Nucleosome
1b7t Myosin
1atn Actin
1tub Microtubule
1bkv Collagen
© David Goodsell
scale reduced
Cells: outside & inside
Illustration of Mycoplasma genitalium by David S. Goodsell, the Scripps Research Institute, UCSD, USA
TOC today
Previous lecture
• Organisms, genes, central dogma
TODAY: Protein introduction
• Amino acids
• Protein structure
• Bonds & energies
• Domains
• 3D comparisons
NEXT lectures
• Sequence comparisons/alignments
Protein Prediction I: Beginners
1 Introduction
1.2 - Proteins/domains
1.3 - 3D comparisons
Reality and images
slide: Marco Punta
Georges Braque - Houses at L'Estaque
slide: Marco Punta
Where is that?
Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA
Mycoplasma genitalium
Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA
Eukaryotic cell
Illustration by David S. Goodsell, the Scripps Research Institute, UCSD, USA
slide: Marco Punta
Doyle et al. (1998) Science 280:69-77 - The structure of the potassium channel: molecular basis of K+ conduction and selectivity
slide: Marco Punta
http://www.proteopedia.org/wiki/images/7/7b/1htb2.png
Alcohol dehydrogenase (ADH)
PC Sanghani, H Robinson, WF Bosron, TD Hurley (2002) Biochemistry 41:10778-86
http://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Protein_ADH5_PDB_1m6h.png/800px-Protein_ADH5_PDB_1m6h.png
homodimer ADH5
slide: Marco Punta
Umberto Boccioni - Dynamism of a soccer player
slide: Marco Punta
Photograph: Filippo Monteforte/AFP/Getty Images
slide: Marco Punta
Umberto Boccioni - Dynamism of a soccer player
Different levels of abstraction
Wu et al. unpublished
Photograph: Filippo Monteforte/AFP/Getty Images
slide: Marco Punta
Umberto Boccioni - Dynamism of a soccer player
Constituents of proteins: amino acids
Amino acid
[Figure: a single amino acid; labeled parts: backbone (H, NH, CH, C, O, OH) and side-chain (R)]
slide: Marco Punta
Joining amino acids into proteins
[Figure: isolated amino acid with backbone and side-chain (R) labeled]
slide: Marco Punta
Joining amino acids into proteins
[Figure: a dipeptide; two amino acids joined through their backbones, each carrying its own side-chain (R)]
From Wikipedia
www.webchem.net/notes/chemical_bonding/covalent_bonding.htm
slide: Marco Punta
Joining amino acids into proteins
[Figure: polypeptide chain; amino acids joined along the backbone, each with a side-chain (R); the two chain ends are labeled N’ and C’]
Rationalizing biophysical features of constituents
Side chain properties
Positively charged amino acids (+): R, K
Negatively charged amino acids (-): D, E
Polar amino acids: N, Q, S, T, Y
amino acids
“components” of protein structure: domains
Domain from introns?
© Wikipedia
RNA splicing
Gene product = protein
Domain merger
prokaryote P, protein A
prokaryote P, protein B
prokaryote P2, protein C
3D modules
Multiple 3D alignment identifies consensus secondary structure
© Christine Orengo
Guessing domains from sequence
protein A, protein B, protein C, protein D, protein E, protein F
domain 1, domain 2
Most proteins multi-domain
Liu, Hegyi, Acton, Montelione & Rost 2003 Proteins 56:188-200
Liu & Rost 2004 Proteins 55:678-686
Single-domain proteins:
• 61% in PDB
• 28% in 62 proteomes
3D comparisons: principal idea, how to?
Matching shapes
© http://www.whattoexpect.com/toddler/photo-gallery/best-toys-for-toddlers.aspx#/slide-3
How to match?
Differences for corresponding points
[Figure: two shapes with corresponding points numbered 1-8]
Difference
= d1 + d2 + d3 + ... + d8
= |r1a-r1b| + ... + |r8a-r8b|
RMSD (root mean square deviation)
= SQRT[ ( (r1a-r1b)^2 + ... + (r8a-r8b)^2 ) / 8 ]
$$\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia}-r_{ib}\right)^{2}}$$
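The two measures on this slide can be written down in a few lines. The following is a minimal sketch, not the lecture's own code: the function names and the toy 8-point shapes are assumptions made for illustration, and the points are assumed to be already paired up (point i in shape a corresponds to point i in shape b).

```python
import math

def summed_difference(points_a, points_b):
    """Sum of distances d1 + ... + dN between corresponding points."""
    return sum(math.dist(pa, pb) for pa, pb in zip(points_a, points_b))

def rmsd(points_a, points_b):
    """Root mean square deviation over N corresponding points."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty point lists of equal length")
    squared = sum(math.dist(pa, pb) ** 2 for pa, pb in zip(points_a, points_b))
    return math.sqrt(squared / len(points_a))

# Hypothetical toy data: an 8-point outline and a copy shifted by 0.5 in x.
shape_a = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
shape_b = [(x + 0.5, y) for x, y in shape_a]

print(summed_difference(shape_a, shape_b))
print(rmsd(shape_a, shape_b))
```

Unlike the plain sum, the RMSD is normalized by the number of points, so values for shapes with different numbers of points stay comparable.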
Actual algorithm inverted
1st: find corresponding points
2nd: superimpose
$$\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia}-r_{ib}\right)^{2}}$$
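The superposition step can be sketched for the 2D shapes used on these slides. This is an illustrative sketch under stated assumptions, not the algorithm of any particular tool: the correspondence between points is assumed to be known already, only rotation and translation are allowed (no mirroring, no scaling), and all function names are made up here. In 2D the least-squares rotation angle has a closed form; for 3D protein coordinates a matrix-based method would be needed instead.

```python
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def superimpose_2d(mobile, target):
    """Rotate and translate `mobile` onto `target` (paired points, 2D only)."""
    cm, ct = centroid(mobile), centroid(target)
    # Move both point sets to the origin.
    m = [(x - cm[0], y - cm[1]) for x, y in mobile]
    t = [(x - ct[0], y - ct[1]) for x, y in target]
    # Closed-form angle that minimizes the summed squared distances.
    dot = sum(mx * tx + my * ty for (mx, my), (tx, ty) in zip(m, t))
    cross = sum(mx * ty - my * tx for (mx, my), (tx, ty) in zip(m, t))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the centered mobile points, then shift onto the target centroid.
    return [(c * x - s * y + ct[0], s * x + c * y + ct[1]) for x, y in m]

def rmsd(a, b):
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

# Hypothetical toy data: a rectangle and a copy rotated by 90 degrees and shifted.
target = [(0, 0), (2, 0), (2, 1), (0, 1)]
mobile = [(-y + 5, x + 3) for x, y in target]

fitted = superimpose_2d(mobile, target)
print(rmsd(mobile, target))  # before superposition
print(rmsd(fitted, target))  # after superposition
```

The RMSD before superposition mostly reflects where the copy happens to sit; after superposition it reflects only the difference in shape.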
fit now?
Scaling easy for simple shapes
x^2+y^2=r^2
Proteins: points are defined -> no scaling
[Figure: points numbered 1-8]
Doyle et al (1998) Science 280:69-77 - The structure of the potassium channel
Global vs. local comparisons
global solution 1:
global solution 2:
cut into “units”
trouble: where to stop?
valid “unit” for comparison?
How to decide what is a valid unit?
Decision upon validity
valid “unit” for comparison?
Valid or not?
Scientifically significant: some expert says
How can a machine decide what is a valid unit?
Valid or not?
Scientifically significant: some expert says
Statistically significant: background
Figure labels: background, signal
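One common way to let a machine judge "signal vs. background" is to ask how far an observed score lies from the scores of unrelated comparisons. The sketch below uses a z-score for that. It is an assumption chosen for illustration, not a method stated on the slide, and the background values are invented toy numbers.

```python
import statistics

def z_score(observed, background):
    """Distance of `observed` from the background mean, in standard deviations."""
    mu = statistics.mean(background)
    sigma = statistics.stdev(background)
    return (observed - mu) / sigma

# Invented toy numbers: RMSD values (in Å) from comparing unrelated units.
background_rmsd = [9.0, 10.0, 11.0, 10.0, 9.5, 10.5]

# Lower RMSD means more similar, so a strongly negative z-score suggests signal.
print(z_score(2.0, background_rmsd))
print(z_score(10.0, background_rmsd))
```

A match whose RMSD looks like the background gets a z-score near zero; how negative a z-score must be to count as significant remains a choice the user has to make.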
Cut, match, compare by RMSD
$$\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia}-r_{ib}\right)^{2}}$$
Only Cartesian RMSD comparison?
[Figure: two shapes, each with points numbered 1-8]
$$\mathrm{RMSD}(a,b) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(r_{ia}-r_{ib}\right)^{2}}$$
2D: difference matrix
[Figure: shape with points numbered 1-8 and its matrix of differences, points 1-8 along rows and columns]
Comparison 2D: differences of differences
Total of 8 x 8 differences
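The "differences of differences" idea can be sketched as follows: build the 8 x 8 matrix of distances between all points within each shape, then compare the two matrices entry by entry. This is a minimal sketch under assumptions: the function names and the toy shapes are made up, and the mean absolute difference is only one possible way to summarize the 8 x 8 differences.

```python
import math

def distance_matrix(points):
    """All-against-all distances between the points of one shape."""
    return [[math.dist(p, q) for q in points] for p in points]

def matrix_difference(points_a, points_b):
    """Mean absolute difference between the two intra-shape distance matrices."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty point lists of equal length")
    da, db = distance_matrix(points_a), distance_matrix(points_b)
    n = len(points_a)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)

# Hypothetical toy data: an 8-point outline, a rotated and shifted copy,
# and a copy in which one point has been pulled outwards.
shape_a = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
shape_b = [(-y + 5, x + 3) for x, y in shape_a]
shape_c = shape_a[:4] + [(3, 3)] + shape_a[5:]

print(matrix_difference(shape_a, shape_b))
print(matrix_difference(shape_a, shape_c))
```

Because distances within a shape do not change under rotation or translation, this comparison needs no superposition step, in contrast to the Cartesian RMSD.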
3D comparisons: biology
Structure alignment
Slides taken from Patrice Koehl, UC Davis
Structure alignment: two steps
1. Identify equivalent positions (residues that match in 3D)
2. Find superposition independent of domain movements
Patrice Koehl, UC Davis: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf
© Patrice Koehl, UC Davis
Root mean square displacement (rmsd)
Step 1: find corresponding points in proteins A and B
d(i) are the distances between all corresponding points (typically: Calpha, or all atoms)
$$\mathrm{rmsd}(A,B) = \sqrt{\frac{1}{N}\sum_{i=1}^{N} d_i^{2}}$$
RMSD is not a metric
A similar B, B similar C: NOT implying A similar C
Structures: A, B, C
cRMSD = 2.8 Å = 0.28 nm
cRMSD = 2.85 Å = 0.285 nm
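One way to read the statement above: if "similar" means "RMSD below some cutoff", then similarity does not carry over from A-B and B-C to A-C. The toy sketch below illustrates this. The cutoff of 3.0, the 1D coordinates, and the function names are all assumptions made for illustration, and the coordinates are treated as already superimposed.

```python
import math

def rmsd(a, b):
    """RMSD between paired 1D coordinates (assumed already superimposed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

CUTOFF = 3.0  # hypothetical similarity threshold

def similar(a, b):
    return rmsd(a, b) < CUTOFF

# Invented toy coordinates: B lies between A and C.
A = [0.0, 0.0, 0.0, 0.0]
B = [2.8, -2.8, 2.8, -2.8]
C = [5.6, -5.6, 5.6, -5.6]

print(similar(A, B), similar(B, C), similar(A, C))
```

Each step from A to B and from B to C stays under the cutoff, while the direct comparison of A and C does not.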
© Burkhard Rost
DALI 3D alignment
Holm & Sander
Notation: protein structure 1D, 2D, 3D
[Figure: one protein in three notations. 1D: sequence PQITLWQRPLVTIKIGGQLKEALLDTGADDTVL with strand (E) annotation; 2D: residue-by-residue matrix, residues 1-90 on both axes, scale 0 to -5 kcal/mol; 3D: structure]
Structural alignment: DALI
L Holm & C Sander (1993) JMB 233:123-38
Distance matrix Alignment
Algorithm: Monte Carlo on all-against-all for hexapeptides (5)
L Holm & C Sander (1993) JMB 233:123-38: Fig. 1
© Burkhard Rost
Vorolign 3D alignment
Birzele & Zimmer
Structural alignment: VOROLIGN
Fabian Birzele, Jan E Gewehr, Gergely Csaba & Ralf Zimmer (2006) Bioinformatics 23:e205-11
Dynamic programming on Voronoi environments
F Birzele, JE Gewehr, G Csaba & R Zimmer (2006) Bioinformatics 23:e205-11: Fig. 2
© Burkhard Rost
3D comparisons: local vs. global
Two forms of calcium-bound Calmodulin:
• Ligand free
• Complexed with trifluoperazine
Global alignment: RMSD = 15 Å / 143 residues
Local alignment: RMSD = 0.9 Å / 62 residues
Patrice Koehl, UC Davis: http://nook.cs.ucdavis.edu:8080/~koehl/Classes/CSB/lecture6.pdf
© Patrice Koehl, UC Davis
© Burkhard Rost
Many other 3D alignment methods exist
© Burkhard Rost
Recap: proteins/cells
Cells: outside & inside
Illustration of Mycoplasma genitalium by David S. Goodsell, the Scripps Research Institute, UCSD, USA
A gallery of proteins
1ej9 Topoisomerase
1i6h RNA Polymerase
1ais TATA-binding Protein/Transcription Factor IIB
1cgp Catabolite Gene Activator Protein
1tau DNA Polymerase
1lbh+1efa lac Repressor
1aoi Nucleosome
1b7t Myosin
1atn Actin
1tub Microtubule
1bkv Collagen
© David Goodsell
scale reduced
HIV-1 and a Human T-cell
Figure labels: gp120, CD4, CCR5, HIV-1 envelope glycoprotein, T-cell surface
slide: Natasha Wood, Cape Town
HIV-1 and a Human T-cell
IMAGE: http://www.sciencemag.org/content/320/5877/760/F3.large.jpg
V3 loop
slide: Natasha Wood, Cape Town
3D modules
Multiple 3D alignment identifies consensus secondary structure
© Christine Orengo
Lecture plan (CB1 structure: INF)
01: 04/10 Tue: No lecture
02: 04/12 Thu: No lecture
03: 04/17 Tue: No lecture
04: 04/19 Thu: Intro 1: organization of lecture: intro into cells & biology
05: 04/24 Tue: Intro 2: amino acids, protein structure (comparison), domains (today)
06: 04/26 Thu: No lecture
07: 05/01 Tue: SKIP: May Day
08: 05/03 Thu: Alignment 1
09: 05/08 Tue: Alignment 2
10: 05/10 Thu: SKIP: Ascension Day
11: 05/15 Tue: Comparative modeling & exp structure determination & secondary structure assignment
12: 05/17 Thu: 1D: Secondary structure prediction 1
13: 05/22 Tue: SKIP: Whitsun holiday
14: 05/24 Thu: 1D: Secondary structure prediction 2
15: 05/29 Tue: 1D: Secondary structure prediction 3
16: 05/31 Thu: SKIP: Corpus Christi
17: 06/05 Tue: 1D: Transmembrane structure prediction 1
18: 06/07 Thu: 1D: Transmembrane structure prediction 2 / Solvent accessibility prediction
19: 06/12 Tue: 1D: Transmembrane structure prediction 3 / Solvent accessibility prediction
20: 06/14 Thu: 1D: Disorder prediction
21: 06/19 Tue: 2D prediction / 3D prediction
22: 06/21 Thu: No lecture
23: 06/26 Tue: recap 1
24: 06/28 Thu: recap 2
25: 07/03 Tue: TBA
26: 07/05 Thu: TBA
27: 07/10 Tue: TBA
28: 07/12 Thu: TBA