/134© Burkhard Rost
1
title: Computational Biology 1 - Protein structure: Sequence alignments 2
short title: cb1_alignments2
lecture: Protein Prediction 1 - Protein structure Computational Biology 1 - TUM Summer 2016
Videos: YouTube / www.rostlab.org
THANKS / EXERCISES:
Special lectures:
• 07/xx Predrag Radivojac - Indiana Univ.
• 06/xx Yana Bromberg - Rutgers Univ.
No lecture:
• 05/09 no lecture
• 05/15 Ascension day
• 05/23 Student assembly (SVV)
• 06/06 Whitsun holiday
• 06/15 Corpus Christi
LAST lecture: bef: Jul 11, after: Jul 28
Exam: WEDNESDAY(!!) July 12: 18:00-19:30 TBA
• Makeup: TBC: Oct 17 & 19, 2017 - lecture time
CONTACT: Lothar Richter [email protected]
Dmitrij Nechaev
Lothar Richter
Christian Dallago
http://www.rostlab.org
© Burkhard Rost (TUM Munich)
Recap: 3D comparison
Notation: protein structure 1D, 2D, 3D
[Figure: one protein shown in 1D (sequence PQITLWQRPLVTIKIGGQLKEALLDTGADDTVL with strand segments marked E), 2D (residue-by-residue map, axes 1-90, scale 0 to -5 kcal/mol), and 3D]
Dynamic programming: Global
GGQLAKEEALEGQPVEVL
GGQLAKEEAL EGQPVEVL
GGQLAKEEAL..EGQPVEVL
Dynamic programming concept no gaps
[Figure: dynamic-programming matrix, GGQLAKEEAL (columns) against EGQPVEVLP (rows), cell values 1 and 2]
Allowing for gaps
GGQLAKEEAL GQ..PVEVL
GGQLAKEEAL GQPVEVL
no gap with gap
Dynamic programming concept gaps
[Figure: dynamic-programming matrix, GGQLAKEEAL (columns) against EGQPVEVLP (rows), cell values 1 to 4]
$$\text{Score} = \sum \delta_{ij} - G_o \cdot N_{gap}$$
GQPVE•V•L
GQLAKEEAL
GQ
GQ
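The score above can be maximised with dynamic programming. The following is a minimal sketch, not the lecture's implementation: it assumes δ = 1 for identical residues and 0 otherwise, and it charges the penalty Go for every gap position (the function name and the default value are hypothetical).

```python
def global_alignment_score(a, b, gap_open=1.0):
    """Best global score: sum of identities minus gap_open per gap position."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Leading gaps: aligning a prefix against nothing costs one penalty per residue.
    for i in range(1, rows):
        score[i][0] = -gap_open * i
    for j in range(1, cols):
        score[0][j] = -gap_open * j
    for i in range(1, rows):
        for j in range(1, cols):
            delta = 1.0 if a[i - 1] == b[j - 1] else 0.0
            score[i][j] = max(
                score[i - 1][j - 1] + delta,   # align a[i-1] with b[j-1]
                score[i - 1][j] - gap_open,    # gap in b
                score[i][j - 1] - gap_open,    # gap in a
            )
    return score[-1][-1]
```

Each cell only looks at its three neighbours, so the cost grows with the product of the two sequence lengths.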
Dynamic programming: optimal alignment
$$S_W = \sum_{k=1}^{L_{ali}} M_{U_k T_k} - G_o \cdot N_{gap} - G_e \cdot (L_{gap} - N_{gap})$$
• Global/no gap:
SB Needleman and CD Wunsch 1970 J Mol Biol 48, 443-53
• Local/Gap:
TF Smith and MS Waterman 1981 J Mol Biol 147, 195-197
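To make the gap terms concrete, the sketch below evaluates the score of an alignment that is already given, with a gap-open penalty Go per gap and a gap-extension penalty Ge per additional gap position. The function name, the gap characters, and the identity scoring used in the usage note are assumptions for illustration.

```python
def alignment_score(u, t, match_score, gap_open, gap_extend, gap_chars=".-"):
    """Score two aligned strings: sum of M[u_k][t_k] - Go*Ngap - Ge*(Lgap - Ngap)."""
    if len(u) != len(t):
        raise ValueError("aligned strings must have equal length")
    total = 0.0
    n_gap = 0          # number of separate gaps (Ngap)
    l_gap = 0          # total number of gap positions (Lgap)
    open_in = None     # which sequence the currently open gap is in
    for x, y in zip(u, t):
        if x in gap_chars or y in gap_chars:
            where = "u" if x in gap_chars else "t"
            l_gap += 1
            if where != open_in:   # a new gap starts here
                n_gap += 1
            open_in = where
        else:
            total += match_score(x, y)
            open_in = None
    return total - gap_open * n_gap - gap_extend * (l_gap - n_gap)
```

With identity scoring (`lambda x, y: 1.0 if x == y else 0.0`), Go = 1 and Ge = 0.5, the alignment GQPVE.V.L / GQLAKEEAL has three identities and two gaps of length one, giving a score of 1.0.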
Alignments: scoring matrix
synonyms:
[scoring|substitution|exchange|log-odds]
[matrix|metric|table]
one particular: BLOSUM65
S Henikoff & J Henikoff (1992) PNAS 89:10915-9
Many more substitution matrices exist today
Most used: BLOSUM62
Interactive software tool
Ignacio Ibarra & Francisco Melo: Interactive software tool to comprehend the calculation of optimal sequence alignments with dynamic programming. Bioinformatics 2010, 26:1664-5
http://melolab.org/sat
BLAST: fast matching of single ‘words’
TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK
TIYKLILQGRTIKAELITEGVDGATGEKVYKQYGNQNAVDAEYTYDNATRTFTITQK
Default “word” size for “seeds” = 3
#1 seed=3
#2 extend
TTYKLIL AATAEKVFKQYA WTYDDATKTF
TIYKLIL GATGEKVYKQYG YTYDNATRTF
? ?
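The two steps, seeding with words of size 3 and extending, can be sketched as follows. This is a deliberately simplified illustration: it only seeds on identical words and extends while residues stay identical, whereas BLAST itself also accepts similar words scored with a substitution matrix. Both function names are hypothetical.

```python
from collections import defaultdict

def find_seeds(query, target, word_size=3):
    """Return (query_pos, target_pos) pairs where identical words start."""
    index = defaultdict(list)              # hash every word of the target once
    for j in range(len(target) - word_size + 1):
        index[target[j:j + word_size]].append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds

def extend_ungapped(query, target, i, j, word_size=3):
    """Grow a seed left and right while residues are identical.

    Returns (query_start, target_start, length) of the extended match.
    """
    start_q, start_t = i, j
    while start_q > 0 and start_t > 0 and query[start_q - 1] == target[start_t - 1]:
        start_q -= 1
        start_t -= 1
    end_q, end_t = i + word_size, j + word_size
    while end_q < len(query) and end_t < len(target) and query[end_q] == target[end_t]:
        end_q += 1
        end_t += 1
    return start_q, start_t, end_q - start_q
```

Hashing the words means each query word is looked up in constant time instead of being compared against every target position.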
Word searches: challenge: statistics
The major challenge for word search algorithms is to get the statistics right.
Significance of match (e.g. BLAST E-values)
Hits
Sequence comparisons:
multiple alignment
How accurate are pairwise alignments?
Zones
[Figure: sequence similar -> structure similar. Zones: Daylight Zone, Twilight Zone, Midnight Zone. Methods: sequence - sequence, sequence - profile, profile - profile]
B Rost (1997) Fold Des 2:S19-24
B Rost (1999) Protein Eng 12:85-94
Schematic: structural similarity in Twilight zone
not true data: just to illustrate the idea of the twilight zone
True Positives = pairs of proteins with similar structure
[Figure: percentage of true positives (accuracy -or- specificity), y-axis ticks 40-100, x-axis ticks 15-35]
Schematic: true and false in Twilight zone
True Positives = pairs of proteins with similar structure
False Positives = pairs of proteins with DIFFERENT structure
[Figure: number of pairs (log scale, 1 to 1,000), x-axis ticks 15-35]
All-vs-all: PDB
3D = structural alignment
1D = sequence alignment
0.5nm rmsd
SAME 3D
ignore
DIFFER in 3D
PDB all-against-all ok?
Databases biased: MUST remove bias!
[Figure: Daylight Zone, Twilight Zone, Midnight Zone]
redundancy-reduced vs. redundancy-reduced?
Sequence conservation of protein structure
B Rost 1999 Prot Engin 12, 85-94 C Sander & R Schneider 1991 Proteins 9:56-69
Raw data: density? lesson learned?
B Rost 1999 Prot Engin 12, 85-94 C Sander & R Schneider 1991 Proteins 9:56-69
How to estimate performance from the curves?
[Figure labels: groups]
Distance from new HSSP-curve
B Rost 1999 Prot Engin 12, 85-94 C Sander & R Schneider 1991 Proteins 9:56-69
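The distance of an alignment from the HSSP-curve can be computed from its percentage sequence identity and its length. The sketch below uses the curve as I recall it from Rost (1999), 480·L^(-0.32·(1+e^(-L/1000))) between 11 and 450 aligned residues; the constants and cut-offs should be checked against the paper before use, and the function names are hypothetical.

```python
import math

def hssp_curve(n_aligned):
    """Percentage identity on the curve for a given alignment length (assumed constants)."""
    if n_aligned <= 11:
        return 100.0
    if n_aligned > 450:
        return 19.5
    exponent = -0.32 * (1.0 + math.exp(-n_aligned / 1000.0))
    return 480.0 * n_aligned ** exponent

def hssp_distance(percent_identity, n_aligned):
    """Positive: above the curve (likely similar structure); negative: below."""
    return percent_identity - hssp_curve(n_aligned)
```

Short alignments need a much higher identity to land above the curve than long ones, which is the reason for measuring distance from the curve instead of raw identity.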
Twilight zone = true positives explode
[Figure: number of protein pairs (log scale, 10^1 to 10^6) against distance from HSSP threshold (-15 to +10) and percentage sequence identity (10 to 35). Inset: % sequence identity (0-100) against number of residues aligned (50-400), with curves at distances +25 to -20]
B Rost 1999 Prot Engin 12, 85-94
Twilight zone = false positives bazoom!!
[Figure: number of protein pairs (log scale, 10^1 to 10^6) against distance from HSSP threshold (-15 to +10) and percentage sequence identity (10 to 35). Inset: % sequence identity (0-100) against number of residues aligned (50-400), with curves at distances +25 to -20]
B Rost 1999 Prot Engin 12, 85-94
Multiple alignment:
simple hacks
Multiple alignments
Dynamic programming? For 3 sequences: O(N1 x N2 x N3); for an arbitrary number of sequences the problem is NP-complete (L Wang & T Jiang (1994) JCB 1: 337-48)
[Figure: dynamic-programming matrix, GGQLAKEEAL (columns) against EGQPVEVLP (rows), cell values 1 to 4, extended to 3D]
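A quick way to see why full dynamic programming does not scale to many sequences is to count the cells of the k-dimensional matrix. The sketch below is only this bookkeeping; the function names are hypothetical.

```python
from math import prod

def dp_cells(lengths):
    """Number of cells in the dynamic-programming matrix for sequences of these lengths."""
    return prod(n + 1 for n in lengths)   # +1 for the empty-prefix row/column/layer

def predecessors_per_cell(k):
    """Each cell of a k-dimensional matrix has 2**k - 1 predecessor cells."""
    return 2 ** k - 1
```

For two sequences of length 10 and 7 this gives 88 cells with 3 predecessors each; three sequences of length 10 already need 1331 cells with 7 predecessors each.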
claim (no source, unsure): on a computer, feasible for up to 6 sequences, needing ~60 TB main memory
hack 1: dynamic programming: pairwise; only the space in the vicinity of the intersection is searched n-wise
– H Carrillo & DJ Lipman (1988) SIAM J Applied Math. 48: 1073-82
– DJ Lipman, SF Altschul, JC Kececioglu (1989) PNAS 86:4412-5
hack 2: map to tree / pairwise Russell Doolittle, UCSD
Russell Doolittle
Shapers and Shakers
Multiple alignment: progressive 1
Pairwise sequence identity (%):
     A    B    C    D
A         90   80   70
B              90   80
C                   90
D
A GGQLAKEEAL
B GGQLAKDEAL
C GGQIAKDEAL
D GGQIAKDEAI
Step 1: GGQLAKEEAL + GGQLAKDEAL -> ggqlakeeal
Step 2: GGQIAKDEAL + GGQIAKDEAI -> ggqiakdeal
Step 3: ggqlakeeal + ggqiakdeal
Multiple alignment: progressive 2
(same identity matrix and sequences A-D)
Step 1: GGQLAKEEAL + GGQLAKDEAL -> ggqlakeeal
Step 2: ggqlakeeal + GGQIAKDEAL -> ggqlakeeal
Step 3: ggqlakeeal + GGQIAKDEAI
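The first steps of the progressive scheme can be sketched for the four example sequences. The code assumes the sequences are already aligned without gaps (true for this toy example), and the tie rule in the consensus (first residue seen wins) is an assumption chosen for illustration; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations

SEQS = {"A": "GGQLAKEEAL", "B": "GGQLAKDEAL", "C": "GGQIAKDEAL", "D": "GGQIAKDEAI"}

def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)

def identity_matrix(seqs):
    """Pairwise identities keyed by (name1, name2)."""
    return {(m, n): percent_identity(seqs[m], seqs[n]) for m, n in combinations(seqs, 2)}

def consensus(members):
    """Most frequent residue per column, lower case; ties go to the first residue seen."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*members)).lower()

matrix = identity_matrix(SEQS)
closest_pair = max(matrix, key=matrix.get)               # step 1 starts with this pair
step1 = consensus([SEQS[name] for name in closest_pair])
```

The identity matrix decides the order in which sequences are merged; every merge replaces its members by one profile-like consensus that the next step aligns against.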
Profiles: concept
Pairwise vs. multiple: built-up
pairwise:
GGQQLAKEEAL
GGQQLAKDEAL

G.GQQLAKEEAL
GGAQGLAKDEAL
unaligned:
GGQQLAKEEAL
GGQQLAKDEAL
GGAQGLAKDEAL
multiple:
GG.QQLAKEEAL
GG.QQLAKDEAL
GGAQGLAKDEAL
Profiles profit from relation of “families”
Computationally: motifs
Transcription factor binding sites
K Robison et al (1998) JMB 284, 241-54, retrieved from http://weblogo.berkeley.edu/examples.html
Following slides: thanks and kudos to Theresa Wirth
Sequence motifs
Representation of a sequence with more than one possible amino acid (AA) or nucleic acid (NA) at a single position:
G-[LI]-L-M-S-A-{RK}-X(1,3)
• One-letter code: G, L, M, S, A
• Two or more possible AA: [LI]
• Disallowed AA: {RK}
• Any AA: X
• Repetition range of X: X(n,m)
Matching sequences e.g.
GLLMSACVVV
GILMSAYPP
GLLMSAES
Fig. 7: M Zvelebil & JO Baum (2008) Understanding Bioinformatics, Garland
slide: Theresa Wirth
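A motif in this notation maps directly onto a regular expression. The translator below is a minimal sketch that only covers the elements shown on the slide (single letters, [..], {..}, X and a repetition range); the function name is hypothetical.

```python
import re

# One motif element: a letter, a [..] set or a {..} exclusion, plus an optional (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")

def motif_to_regex(pattern):
    """Translate e.g. 'G-[LI]-L-M-S-A-{RK}-X(1,3)' into a compiled regular expression."""
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"cannot parse motif element: {element!r}")
        core, low, high = m.groups()
        if core in ("X", "x"):
            regex = "."                           # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"       # disallowed residues
        else:
            regex = core                          # literal letter or [..] set
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return re.compile("".join(parts))

motif = motif_to_regex("G-[LI]-L-M-S-A-{RK}-X(1,3)")
```

`motif.fullmatch("GLLMSACVVV")` succeeds, while a sequence with K or R at the seventh position is rejected.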
Recap: substitution matrix (BLOSUM)
GENERIC (for all proteins)
S Henikoff & J Henikoff (1992) PNAS 89:10915-9
Scoring matrix: generic vs. specific
slide: Theresa Wirth
Generic (here BLOSUM62)
Position-specific scoring matrix (PSSM)
PSSM - Position specific substitution matrix: concept
PSSM: Position-specific scoring matrix
• Matrix of numbers with scores for each residue or nucleotide at each position
Building PSSM:
◦ Absolute frequencies
◦ Add pseudo-counts if necessary
◦ Relative frequencies
◦ Log likelihoods
Starting point:
123456
ATGCTA
ATTGCT
TCTGAG
GTTGAG
CCATCC
slide: Theresa Wirth
PSSM - Position specific substitution matrix: one solution
Absolute frequency (position 1): A:2 T:1 G:1 C:1
Relative frequency (position 1): A:0.4 T:0.2 G:0.2 C:0.2 (Σ=1)
Relative frequencies:
    1     2     3     4     5     6
A   0.4   0     0.2   0     0.4   0.2
T   0.2   0.6   0.6   0.2   0.2   0.2
G   0.2   0     0.2   0.6   0     0.4
C   0.2   0.4   0     0.2   0.4   0.2
Log odds (the values correspond to ln(relative frequency / 0.25); a frequency of 0 gives -∞):
    1      2      3      4      5      6
A   0.47   -∞     -0.22  -∞     0.47   -0.22
T   -0.22  0.88   0.88   -0.22  -0.22  -0.22
G   -0.22  -∞     -0.22  0.88   -∞     0.47
C   -0.22  0.47   -∞     -0.22  0.47   -0.22
slide: Theresa Wirth
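The four building steps (absolute frequencies, optional pseudo-counts, relative frequencies, log odds) can be sketched for the five starting sequences. The uniform background of 0.25 and the natural logarithm are assumptions that reproduce the values 0.47 and -0.22 above; the function name and parameters are hypothetical.

```python
import math

SEQS = ["ATGCTA", "ATTGCT", "TCTGAG", "GTTGAG", "CCATCC"]

def build_pssm(seqs, alphabet="ATGC", pseudocount=0.0):
    """Return {symbol: [log-odds score per position]} for equal-length sequences."""
    background = 1.0 / len(alphabet)              # assumed uniform background
    total = len(seqs) + pseudocount * len(alphabet)
    pssm = {symbol: [] for symbol in alphabet}
    for pos in range(len(seqs[0])):
        column = [seq[pos] for seq in seqs]
        for symbol in alphabet:
            count = column.count(symbol) + pseudocount       # absolute frequency
            freq = count / total                             # relative frequency
            score = math.log(freq / background) if freq > 0 else float("-inf")
            pssm[symbol].append(score)
    return pssm

pssm = build_pssm(SEQS)
```

Without pseudo-counts, a residue never seen at a position scores minus infinity; calling `build_pssm(SEQS, pseudocount=1.0)` keeps all scores finite.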
PSSM - Position specific substitution matrix: example
Olga Zhaxybayeva http://carrot.mcb.uconn.edu/~olgazh/bioinf2010/class10.html
slide: Theresa Wirth
PSI-BLAST
Position-Specific Iterated Basic Local Alignment Search Tool
Profile-based comparison
           1                                                   50
fyn_human  VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYI
yrk_chick  VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYI
fgr_human  VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCI
yes_chick  VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYI
src_avis2  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_aviss  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_avisr  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_chick  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
stk_hydat  VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYI
src_rsvpa  .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYI
hck_human  ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYI
blk_mouse  ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYV
hck_mouse  .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYI
lyn_human  ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFI
lck_human  ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFI
ss81_yeast .....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGII
abl_mouse  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
abl1_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
src1_drome ..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLI
mysd_dicdi .....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKV
yfj4_yeast ....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIF
abl2_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWV
tec_human  .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYI
abl1_caeel ..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWV
txk_human  .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLI
yha2_yeast VRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIF
abp1_sacex .....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
Alignment scores use generic scoring matrix
Generic scoring matrix
(here BLOSUM62)
BLOSUM: S Henikoff & J Henikoff (1992) PNAS 89:10915-9
Idea: replace generic by specific scoring
BLOSUM: S Henikoff & J Henikoff (1992) PNAS 89:10915-9
Generic scoring matrix
(here BLOSUM62)
[Alignment: the same 27-sequence alignment (fyn_human ... abp1_sacex, positions 1-50) as on the slide "Profile-based comparison"]
concept of PSI-BLAST
PSI-BLAST in steps
1. fast hashing
Like BLAST match ‘words’
TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTTYKLILLLLLLLLLLLLLLLLAWTVEKAFKTFAAAAAAAAAWTVEKAFKTFAAAAA
TTYKLIL
TTYKLIL
WTYDDATKTF
WTVEKAFKTF
AATAEKVFKQYA
AWTVEKAFKTFA
? ?
Default “word” size for “seeds” = 3
#1 seed=3
#2 extend
PSI-BLAST in steps
1. fast hashing
2. dynamic programming extension between matches
BLAST + Smith-Waterman
TTYKLILNGKTLKGETTTEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEKTTYKLILLLLLLLLLLLLLLLLAWTVEKAFKTFAAAAAAAAAWTVEKAFKTFAAAAA
TTYKLIL
TTYKLIL
WTYDDATKTF
WTVEKAFKTF
AATAEKVFKQYA
AWTVEKAFKTFA
#1 seed=3
#2 extend
dynamic programming to extend
Why is this fast?
PSI-BLAST in steps
1. fast hashing
2. dynamic programming extension between matches
3. compile statistics
EVAL - Expectation values
Hits
PSI-BLAST in steps
1. fast hashing
2. dynamic programming extension between matches
3. compile statistics
4. collect all pairs and build profile
Sequence-profile comparison
[Alignment: the same 27-sequence alignment (fyn_human ... abp1_sacex, positions 1-50) as on the slide "Profile-based comparison"]
YDFHGVGEDDISIKRG
PSI-BLAST SF Altschul 1997 Nucl Acids Res 25 3389-3402
PS- position specific
PSI-BLAST in steps
1. fast hashing
2. dynamic programming extension between matches
3. compile statistics
4. collect all pairs and build profile
5. ?
???
[Alignment: the same 27-sequence alignment (fyn_human ... abp1_sacex, positions 1-50) as on the slide "Profile-based comparison"]
YDFHGVGEDDISIKRG
PSI-BLAST in steps
1. fast hashing
2. dynamic programming extension between matches
3. compile statistics
4. collect all pairs and build profile
5. iterate
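The "build profile, then iterate" part of these steps can be illustrated with a toy search. This is not PSI-BLAST: it assumes ungapped sequences of equal length, scores a sequence by summing the relative column frequencies of its residues, and uses a fixed score threshold instead of E-values. All names and the threshold are hypothetical.

```python
from collections import Counter

def column_frequencies(members):
    """Relative residue frequencies per column of equal-length sequences."""
    n = len(members)
    return [{res: c / n for res, c in Counter(col).items()} for col in zip(*members)]

def profile_score(profile, seq):
    """Sum over positions of the profile frequency of the residue found in seq."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, seq))

def iterative_profile_search(query, database, threshold, max_iter=5):
    """Return the list of hits found in each iteration."""
    members = [query]
    history = []
    for _ in range(max_iter):
        profile = column_frequencies(members)          # build/update the profile
        hits = [s for s in database if profile_score(profile, s) >= threshold]
        history.append(hits)
        new_members = [query] + hits
        if new_members == members:                     # converged: nothing new found
            break
        members = new_members
    return history

DATABASE = ["GGQLAKDEAL", "GGQIAKDEAL", "GGQIAKDEAI", "PPPPPPPPPP"]
history = iterative_profile_search("GGQLAKEEAL", DATABASE, threshold=7.5)
```

In this example the first iteration, which only knows the query, misses GGQIAKDEAI (score 7.0); the profile built from the first hits scores it at 8.0, so the second iteration finds it.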
Sequence-profile comparison
[Alignment: the same 27-sequence alignment (fyn_human ... abp1_sacex, positions 1-50) as on the slide "Profile-based comparison"]
YDFHGVGEDDISIKRG
PSI-BLAST SF Altschul 1997 Nucl Acids Res 25 3389-3402
PSI- position specific iteration
Steps involved for profile-based alignments
sequence-sequence: query database using substitution metric
results -> profile
profile-sequence: query database using PSSM/profile
results -> update profile
iterate
Sequence-profile methods
• PSI-BLAST: fast, partial dynamic programming. SF Altschul (1997) NAR 25:3389-3402
• ClustalW/ClustalX: slow, dynamic programming, for experts. JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
Clustal (ClustalW, ClustalX)
• all against all (pairs) by dynamic programming (varying substitution matrices)
• build phylogenetic tree
     A    B    C    D
A         90   80   70
B              90   80
C                   90
D
[Figure: tree with leaves A, B, C, D]
JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
Sequence-profile methods
• PSI-BLAST: fast, partial dynamic programming. SF Altschul (1997) NAR 25:3389-3402
• ClustalW/ClustalX: slow, dynamic programming, for experts. JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
• MaxHom: relatively slow, dynamic programming, good first guess. C Sander & R Schneider (1991) Proteins 9:56-69
Sequence-profile methods
• PSI-BLAST: fast, partial dynamic programming. SF Altschul (1997) NAR 25:3389-3402
• ClustalW/ClustalX: slow, dynamic programming, for experts. JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
• SAM/HMMer: slow, need preprocess, HMM (statistics), very accurate. R Hughey & A Krogh (1996) CABIOS 12:95-107; S Eddy (1998) Bioinformatics 14:755-63
Zones
[Figure: sequence similar -> structure similar. Zones: Daylight Zone, Twilight Zone, Midnight Zone. Methods: sequence - sequence, sequence - profile, profile - profile]
B Rost (1997) Fold Des 2:S19-24
B Rost (1999) Protein Eng 12:85-94
Another way to do math: HMM - Hidden Markov Models
HMM: Hidden Markov Models
SR Eddy (1998) Bioinformatics 14:755-63
Fig. 1. A toy HMM, modeling sequences of as and bs as two regions of potentially different residue composition. The model is drawn (top) with circles for states and arrows for state transitions. A possible state sequence generated from the model is shown, followed by a possible symbol sequence. The joint probability P(x,π|HMM) of the symbol sequence and the state sequence is a product of all the transition and emission probabilities. Notice that another state sequence (1-2-2) could have generated the same symbol sequence, though probably with a different total probability. This is the distinction between HMMs and a standard Markov model with nothing to hide: in an HMM, the state sequence (e.g. the biologically meaningful alignment) is not uniquely determined by the observed symbol sequence, but must be inferred probabilistically from it.
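The joint probability described in the caption, a product of all transition and emission probabilities along one state path, can be sketched as below. The two-state model and all of its numbers are hypothetical values chosen for illustration; they are not taken from the figure.

```python
def joint_probability(symbols, path, start, transition, emission):
    """P(symbols, path | HMM) as a product of transition and emission probabilities."""
    prob = start[path[0]] * emission[path[0]][symbols[0]]
    for prev, state, sym in zip(path, path[1:], symbols[1:]):
        prob *= transition[prev][state] * emission[state][sym]
    return prob

# Hypothetical toy model: two states with different a/b composition.
START = {1: 1.0, 2: 0.0}
TRANSITION = {1: {1: 0.9, 2: 0.1}, 2: {1: 0.0, 2: 1.0}}
EMISSION = {1: {"a": 0.25, "b": 0.75}, 2: {"a": 0.5, "b": 0.5}}

p_112 = joint_probability("aba", (1, 1, 2), START, TRANSITION, EMISSION)
p_122 = joint_probability("aba", (1, 2, 2), START, TRANSITION, EMISSION)
```

Both paths emit the same symbols "aba" but with different probabilities (0.0084375 versus 0.00625 for these made-up numbers), which is exactly why the state path has to be inferred and cannot be read off the symbols.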
ProfileHMM: example football
FC Augsburg 2013/14 Bundesliga Fixtures
Date Status Home Score Away Attendance Competition
Aug 10 FT FC Augsburg 0-4 Borussia Dort. 30,660 Bundesliga
Aug 17 FT Werder Bremen 1-0 FC Augsburg 40,112 Bundesliga
Aug 25 FT FC Augsburg 2-1 VfB Stuttgart 30,030 Bundesliga
Aug 31 FT Nurnberg 0-1 FC Augsburg 37,239 Bundesliga
Sep 14 FT FC Augsburg 2-1 SC Freiburg 28,453 Bundesliga
Sep 21 FT Hannover 96 2-1 FC Augsburg 39,200 Bundesliga
Sep 27 FT FC Augsburg 2-2 Borussia Mon. 30,352 Bundesliga
Oct 5 FT Schalke 04 4-1 FC Augsburg 60,731 Bundesliga
Oct 20 FT FC Augsburg 1-2 VfL Wolfsburg 27,554 Bundesliga
Oct 26 FT Bayer Leverk 2-1 FC Augsburg 27,811 Bundesliga
Nov 3 FT FC Augsburg 2-1 Mainz 28,007 Bundesliga
Nov 9 FT Bayern Munich 3-0 FC Augsburg 71,000 Bundesliga
slide: Marco Punta
ProfileHMM: example football
FC Augsburg 2013/14 Bundesliga Fixtures, with Augsburg's result (W = win, D = draw, L = loss)
Date    Status  Home           Score  Away            Attendance  Competition  Result
Aug 10  FT      FC Augsburg    0-4    Borussia Dort.  30,660      Bundesliga   L
Aug 17  FT      Werder Bremen  1-0    FC Augsburg     40,112      Bundesliga   L
Aug 25  FT      FC Augsburg    2-1    VfB Stuttgart   30,030      Bundesliga   W
Aug 31  FT      Nurnberg       0-1    FC Augsburg     37,239      Bundesliga   W
Sep 14  FT      FC Augsburg    2-1    SC Freiburg     28,453      Bundesliga   W
Sep 21  FT      Hannover 96    2-1    FC Augsburg     39,200      Bundesliga   L
Sep 27  FT      FC Augsburg    2-2    Borussia Mon.   30,352      Bundesliga   D
Oct 5   FT      Schalke 04     4-1    FC Augsburg     60,731      Bundesliga   L
Oct 20  FT      FC Augsburg    1-2    VfL Wolfsburg   27,554      Bundesliga   L
Oct 26  FT      Bayer Leverk   2-1    FC Augsburg     27,811      Bundesliga   L
Nov 3   FT      FC Augsburg    2-1    Mainz           28,007      Bundesliga   W
Nov 9   FT      Bayern Munich  3-0    FC Augsburg     71,000      Bundesliga   L
slide: Marco Punta
ProfileHMM: example football
slide: Marco Punta
[Figure: states W, D, L connected by transitions]
Probabilistic
ProfileHMM: example football
slide: Marco Punta
Our model for Augsburg’s Bundesliga results:
3 states: W, D, L
S(t)=F(S(t-1))
States S connected by probabilities
$$p_{ij} \ge 0; \quad \sum_j p_{ij} = 1$$
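One way to fill in the transition probabilities of such a three-state model is to count transitions in the observed results. The sketch below derives W/D/L from the scores in the fixtures table and normalises the counts per state; estimating probabilities from twelve games is only an illustration, and all names are hypothetical.

```python
from collections import Counter, defaultdict

# (Augsburg played at home?, score as "home-away"), in the order of the fixtures table.
FIXTURES = [
    (True, "0-4"), (False, "1-0"), (True, "2-1"), (False, "0-1"),
    (True, "2-1"), (False, "2-1"), (True, "2-2"), (False, "4-1"),
    (True, "1-2"), (False, "2-1"), (True, "2-1"), (False, "3-0"),
]

def result(is_home, score):
    """W, D or L from Augsburg's point of view."""
    home, away = (int(goals) for goals in score.split("-"))
    ours, theirs = (home, away) if is_home else (away, home)
    return "W" if ours > theirs else "L" if ours < theirs else "D"

def transition_probabilities(states):
    """Estimate p_ij as the fraction of times state i was followed by state j."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1
    return {s: {t: c / sum(row.values()) for t, c in row.items()}
            for s, row in counts.items()}

results = "".join(result(is_home, score) for is_home, score in FIXTURES)
p = transition_probabilities(results)
```

Each row of `p` sums to 1, matching the constraint on the p_ij; transitions that were never observed (for example D -> W) are simply absent and would need pseudo-counts in a real model.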
ProfileHMM: example football
FC Augsburg 2013/14 Bundesliga Fixtures, with Augsburg's result (W, D, L) and venue (H = home, A = away)
Date    Status  Home           Score  Away            Attendance  Competition  Result  Venue
Aug 10  FT      FC Augsburg    0-4    Borussia Dort.  30,660      Bundesliga   L       H
Aug 17  FT      Werder Bremen  1-0    FC Augsburg     40,112      Bundesliga   L       A
Aug 25  FT      FC Augsburg    2-1    VfB Stuttgart   30,030      Bundesliga   W       H
Aug 31  FT      Nurnberg       0-1    FC Augsburg     37,239      Bundesliga   W       A
Sep 14  FT      FC Augsburg    2-1    SC Freiburg     28,453      Bundesliga   W       H
Sep 21  FT      Hannover 96    2-1    FC Augsburg     39,200      Bundesliga   L       A
Sep 27  FT      FC Augsburg    2-2    Borussia Mon.   30,352      Bundesliga   D       H
Oct 5   FT      Schalke 04     4-1    FC Augsburg     60,731      Bundesliga   L       A
Oct 20  FT      FC Augsburg    1-2    VfL Wolfsburg   27,554      Bundesliga   L       H
Oct 26  FT      Bayer Leverk   2-1    FC Augsburg     27,811      Bundesliga   L       A
Nov 3   FT      FC Augsburg    2-1    Mainz           28,007      Bundesliga   W       H
Nov 9   FT      Bayern Munich  3-0    FC Augsburg     71,000      Bundesliga   L       A
slide: Marco Punta
ProfileHMM: example football
slide: Marco Punta
[Figure: model with W, D, L and H, A]
ProfileHMM: formalism
slide: Marco Punta
HMMs are probabilistic models defined by:
• A finite set S of states
• A discrete alphabet A of symbols (observed objects)
• A probability transition matrix T=(tij), i,j states
• A probability emission matrix E=(eix), i state, x symbol
[Figure: chain of states linked by transitions tij; each state emits symbols with probabilities eix]
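Given S, A, T and E, the probability of an observed symbol sequence summed over all state paths can be computed with the forward algorithm. The sketch below assumes a start distribution over states and no explicit end state; the two-state model and its numbers are hypothetical.

```python
def forward_probability(symbols, states, start, transition, emission):
    """P(symbols | HMM), summed over all state paths (forward algorithm)."""
    # forward[s] = probability of the symbols seen so far, ending in state s
    forward = {s: start[s] * emission[s][symbols[0]] for s in states}
    for sym in symbols[1:]:
        forward = {
            s: emission[s][sym] * sum(forward[r] * transition[r][s] for r in states)
            for s in states
        }
    return sum(forward.values())

# Hypothetical model: T = (t_ij), E = (e_ix) as nested dictionaries.
STATES = ("s1", "s2")
START = {"s1": 0.5, "s2": 0.5}
T = {"s1": {"s1": 0.9, "s2": 0.1}, "s2": {"s1": 0.2, "s2": 0.8}}
E = {"s1": {"a": 0.25, "b": 0.75}, "s2": {"a": 0.75, "b": 0.25}}

p_ab = forward_probability("ab", STATES, START, T, E)
```

The dictionary `forward` is updated once per symbol, so the cost is proportional to the sequence length times the square of the number of states, instead of enumerating every path.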
ProfileHMM - intro
Profile-HMMs are probabilistic models…
Model -> simulates a system
Probabilistic -> produces outcomes based on probabilities
ProfileHMM: example 2: traffic light
[Figure: states R, Y, G]
Deterministic
96
ProfileHMM: example 3: weather
/134© Burkhard Rost
97
ProfileHMM: example 3: weather
S C R
/134© Burkhard Rost
98
ProfileHMM: example 3: weather
S C R
Probabilistic
*In fact, chaotic, deterministic
*
/134© Burkhard Rost
99
ProfileHMM: example 3: weather
Probabilistic
pijTransition probability from symbol i to symbol j
*
*In fact, chaotic, deterministic
Probabilistic, 1st order Markov model
Example 3: The weather
$$P(X_{n+1}=x \mid X_1=x_1,\dots,X_n=x_n) = P(X_{n+1}=x \mid X_n=x_n)$$
HMM for alignment
Generic Profile-HMM for alignment
[Figure: START -> match states M1-M4, insert states I0-I4, delete states D1-D4 -> END]
◦ Captures matches, insertions and deletions
◦ Transition and emission probabilities
◦ Gap penalty handled by variation of transition probabilities
◦ Calculation of probability by multiplication of path variables
M Zvelebil & JO Baum (2008) Understanding Bioinformatics, Garland
slide: Theresa Wirth
Entropy in alignment
consider residue position i BEFORE any amino acid is aligned,
[Figure: sequence logo; y-axis: bits saved (0-2); x-axis: alignment positions 1-37]
© Kevin Karplus UCSC
Entropy in alignment
• Consider residue position i BEFORE any amino acid is aligned: we expect a particular amino acid according to some prior or background probability, P0, with entropy H0.
• Now consider the same column AFTER alignment: posterior probability Pi (observed residues + priors), with entropy Hi.
• If position i is conserved: Hi -> 0; if position i is completely varied: Hi -> H0.
• H0 - Hi reflects the “bits saved” by the alignment.
[Figure: sequence logos, "bits saved" (y-axis, 0-2 bits) per alignment position (x-axis, 1-37), residue letters stacked per column]
© Kevin Karplus UCSC
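As a rough illustration of the "bits saved" idea, the sketch below computes the Shannon entropy H = -sum p log2 p of one alignment column and subtracts it from the background entropy. It assumes a uniform background over the 20 amino acids and an optional pseudocount that mixes the background into the observed counts; both choices are simplifications for this example and are not necessarily what the logos in the lecture use.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def bits_saved(column, background=None, pseudocount=0.0):
    """H0 - Hi for one alignment column (gap characters are ignored)."""
    if background is None:
        # Assumption: uniform background; real priors are not uniform.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    counts = Counter(c for c in column if c in background)
    total = sum(counts.values()) + pseudocount
    if total == 0:
        raise ValueError("column has no residues and no pseudocount")
    posterior = [
        (counts.get(aa, 0) + pseudocount * background[aa]) / total
        for aa in AMINO_ACIDS
    ]
    h0 = entropy(background.values())
    hi = entropy(posterior)
    return h0 - hi


if __name__ == "__main__":
    print(bits_saved("LLLLLLLL"))      # conserved column: Hi -> 0
    print(bits_saved(AMINO_ACIDS))     # fully varied column: Hi -> H0
    print(bits_saved("LL", pseudocount=10.0))  # few members: priors dominate
```

With a large pseudocount and only a few sequences, the posterior stays close to the background and few bits are saved, which is the small-family effect.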
Alignment entropy for small families
• few members / little divergence: entropy dominated by priors -> the background signal dominates
[Figure: three sequence logos, "bits saved" per alignment position (y-axis 0-2 bits for positions 1-37 and for positions 1-42; y-axis 0-4 bits for positions 1-42), residue letters stacked per column]
© Kevin Karplus UCSC
SAM-T98: Build alignment
SAM-T98 Alignment Building
Start: a single sequence
(Iterations 1 - 3)
• Build a model from the sequence or alignment
• Use the model to search for additional homologs
• Reestimate the alignment with the new homologs
End: a SAM-T98 alignment (Iteration 4)
© Kevin Karplus UCSC
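The build / search / re-estimate loop can be sketched with a deliberately simplified stand-in. The code below is not SAM-T98: it replaces the HMM with a column-frequency profile, assumes ungapped sequences of equal length, and uses a made-up score and threshold. It only shows how homologs found in one iteration widen the model used in the next.

```python
from collections import Counter


def build_profile(sequences):
    """Build a model: column-wise residue frequencies (toy stand-in for an HMM)."""
    profile = []
    for column in zip(*sequences):
        counts = Counter(column)
        profile.append({aa: n / len(column) for aa, n in counts.items()})
    return profile


def score(profile, sequence):
    """Toy score in [0, 1]: mean profile frequency of the sequence's residues."""
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, sequence)) / len(profile)


def iterative_search(seed, database, threshold=0.6, max_iterations=4):
    """Start from a single sequence and repeatedly add newly found homologs."""
    family = [seed]
    for _ in range(max_iterations):
        profile = build_profile(family)                        # build a model
        hits = [s for s in database
                if s not in family and score(profile, s) >= threshold]  # search
        if not hits:
            break
        family.extend(hits)                                    # re-estimate next round
    return family


if __name__ == "__main__":
    database = ["ACDF", "ACGF", "WWWW"]  # hypothetical sequences
    print(iterative_search("ACDE", database))
```

In this toy run "ACGF" scores below the threshold against the seed alone and is only picked up after "ACDF" has been added to the family.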
Sequence-profile methods
• PSI-BLAST: fast, partial dynamic programming. SF Altschul (1997) NAR 25:3389-3402
• ClustalW/ClustalX: slow, dynamic programming, for experts. JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
• SAM/HMMer: slow, needs preprocessing, HMM (statistics), very accurate. R Hughey & A Krogh (1996) CABIOS 12:95-107; S Eddy (1998) Bioinformatics 14:755-63
• T-Coffee: much slower, requires preprocessing, Genetic Algorithm. Cedric Notredame, DG Higgins, Jaap Heringa (2000) JMB 302:205-17
Genetic Algorithm for alignment
Independence assumption
           1                                                   50
fyn_human  VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYI
yrk_chick  VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYI
fgr_human  VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCI
yes_chick  VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYI
src_avis2  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_aviss  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_avisr  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
src_chick  VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYI
stk_hydat  VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYI
src_rsvpa  .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYI
hck_human  ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYI
blk_mouse  ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYV
hck_mouse  .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYI
lyn_human  ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFI
lck_human  ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFI
ss81_yeast .....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGII
abl_mouse  ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
abl1_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWV
src1_drome ..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLI
mysd_dicdi .....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKV
yfj4_yeast ....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIF
abl2_human ..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWV
tec_human  .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYI
abl1_caeel ..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWV
txk_human  .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLI
yha2_yeast VRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIF
abp1_sacex .....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
Methods (sequence-sequence or sequence-profile):
• dynamic programming (Smith-Waterman)
• PSI-BLAST
• SAM
• HMMer
ALL assume that the alignment at position i is independent of the alignment at position j
[Figure: two residue columns highlighted in the alignment: E E R E E D and K K K E E R]
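One way to see what the independence assumption leaves out is to measure how strongly two alignment columns co-vary. The sketch below computes the mutual information between two columns; column-independent scoring implicitly treats this quantity as zero. Mutual information is used here only as an illustrative measure, and the example columns are hypothetical, not taken from the alignment above.

```python
import math
from collections import Counter


def mutual_information(column_i, column_j):
    """Mutual information (bits) between two alignment columns of equal length."""
    if len(column_i) != len(column_j):
        raise ValueError("columns must come from the same set of sequences")
    n = len(column_i)
    count_i = Counter(column_i)
    count_j = Counter(column_j)
    count_ij = Counter(zip(column_i, column_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # Compare the joint frequency with the product expected under independence.
        mi += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical columns: a charge swap at i is always compensated at j.
    print(mutual_information("EEKK", "KKEE"))
    # Hypothetical columns with no relationship between i and j.
    print(mutual_information("EEKK", "EKEK"))
```

A method that scores each column on its own gives the same result for both pairs of columns, although only the first pair carries a coupling signal.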
Genetic algorithm does not make the independence assumption
Genetic algorithm operates on segments
Genetic algorithm - concept
[Figure: the 2006 NASA ST5 spacecraft antenna. This complicated shape was found through a genetic algorithm optimizing the radiation pattern. It is known as an evolved antenna. © Wikipedia]
http://en.wikipedia.org/wiki/Space_Technology_5
http://en.wikipedia.org/wiki/Evolved_antenna
START
• Initialize population
• Compute fitness
• STOP? yes -> END; no -> next generation:
  • roulette parents
  • cross-over to children
  • mutate children
  • compute fitness
END
[Figure: crossover (parents -> child) and mutation]
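The loop in the flowchart can be sketched on a toy problem. The code below evolves bit strings towards all ones ("OneMax"), which has nothing to do with alignments; it only illustrates roulette selection, crossover and mutation. The fitness function, the parameter values and the elitism step (carrying the best individual over unchanged) are assumptions added for this sketch.

```python
import random


def fitness(individual):
    """Toy fitness: number of ones in the bit string."""
    return sum(individual)


def roulette(population, rng):
    """Pick two parents with probability proportional to fitness (+1 avoids zero weights)."""
    weights = [fitness(ind) + 1 for ind in population]
    return rng.choices(population, weights=weights, k=2)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(individual, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in individual]


def evolve(length=20, pop_size=30, generations=50, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=fitness)        # compute fitness
        history.append(fitness(best))
        if fitness(best) == length:                # STOP?
            break
        children = [best]                          # elitism (an added assumption)
        while len(children) < pop_size:
            parent_a, parent_b = roulette(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rate, rng))
        population = children
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print("best fitness per generation:", history)
```

For alignments, an individual would be a candidate alignment and the fitness an alignment score, but those operators are considerably more involved than this bit-string version.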
T-Coffee Genetic algorithm (GA)
Begin with a “library” of local and global pairwise alignments
© Cedric Notredame CRG
T-Coffee: Mix local and global alignment
Local Alignment + Global Alignment -> Extension -> Multiple Sequence Alignment
© Cedric Notredame, CRG Barcelona
T-Coffee: Use more information
Local Alignment + Global Alignment + Multiple Alignment + Structural + Specialist -> Multiple Sequence Alignment
© Cedric Notredame, CRG Barcelona
Sequence-profile methods
• PSI-BLAST: fast, partial dynamic programming. SF Altschul (1997) NAR 25:3389-3402
• ClustalW/ClustalX: slow, dynamic programming, for experts. JD Thompson, DG Higgins, TJ Gibson (1994) NAR 22:4673-80
• MaxHom: relatively slow, dynamic programming, good first guess. C Sander & R Schneider (1991) Proteins 9:56-69
• T-Coffee: much slower, requires preprocessing, Genetic Algorithm. Cedric Notredame, DG Higgins, Jaap Heringa (2000) JMB 302:205-17
• SSEARCH/PSI-Search: SSEARCH is similar to MaxHom with SW only; PSI-Search is iterated SSEARCH. W Liu, H McWilliam, M Goujon, A Cowley, R Lopez, WR Pearson (2013) Bioinformatics 28:1650-1
Sequence-profile comparison
[Figure: the 27-sequence alignment (fyn_human ... abp1_sacex, positions 1-50) shown under "Independence assumption", used as a profile and compared with the single sequence below]
YDFHGVGEDDISIKRG
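A sequence-profile comparison can be sketched as scoring a sequence against position-specific log-odds derived from alignment columns. The example below builds such a matrix from the first ten columns of three sequences of the alignment and scores ungapped sequences of the same length. The uniform background, the pseudocount value and the absence of gaps are simplifying assumptions; real tools such as PSI-BLAST use more refined weighting and allow gaps.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Position-specific log-odds (bits) against a uniform background (an assumption)."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount * background) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, sequence):
    """Ungapped sequence-profile score: sum of per-position log-odds."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


if __name__ == "__main__":
    # First ten columns of fyn_human, yrk_chick and yes_chick from the alignment.
    alignment = ["VTLFVALYDY", "VTLFIALYDY", "VTVFVALYDY"]
    pssm = build_pssm(alignment)
    print(score(pssm, "VTLFVALYDY"))  # family member
    print(score(pssm, "GGGGGGGGGG"))  # unrelated toy sequence
```

Because the score is a sum over positions, this sketch also makes the independence assumption discussed earlier: each column contributes on its own.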
Zones
[Figure: sequence similar -> structure similar]
Zones: Daylight Zone, Twilight Zone, Midnight Zone
Methods: sequence - sequence, sequence - profile, profile - profile
B Rost (1997) Fold Des 2:S19-24
B Rost (1999) Protein Eng 12:85-94
evolution of alignment methods
pairwise -> multiple -> sequence-profile -> ?
Profile-profile alignments
Profile-profile comparison
[Figure: six copies of the 27-sequence alignment (fyn_human ... abp1_sacex, positions 1-50), illustrating the comparison of one profile with another]
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF
1 50fyn_human VTLFVALYDY EARTEDDLSF HKGEKFQILN SSEGDWWEAR SLTTGETGYIyrk_chick VTLFIALYDY EARTEDDLSF QKGEKFHIIN NTEGDWWEAR SLSSGATGYIfgr_human VTLFIALYDY EARTEDDLTF TKGEKFHILN NTEGDWWEAR SLSSGKTGCIyes_chick VTVFVALYDY EARTTDDLSF KKGERFQIIN NTEGDWWEAR SIATGKTGYIsrc_avis2 VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_aviss VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_avisr VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIsrc_chick VTTFVALYDY ESRTETDLSF KKGERLQIVN NTEGDWWLAH SLTTGQTGYIstk_hydat VTIFVALYDY EARISEDLSF KKGERLQIIN TADGDWWYAR SLITNSEGYIsrc_rsvpa .......... ESRIETDLSF KKRERLQIVN NTEGTWWLAH SLTTGQTGYIhck_human ..IVVALYDY EAIHHEDLSF QKGDQMVVLE ES.GEWWKAR SLATRKEGYIblk_mouse ..FVVALFDY AAVNDRDLQV LKGEKLQVLR .STGDWWLAR SLVTGREGYVhck_mouse .TIVVALYDY EAIHREDLSF QKGDQMVVLE .EAGEWWKAR SLATKKEGYIlyn_human ..IVVALYPY DGIHPDDLSF KKGEKMKVLE .EHGEWWKAK SLLTKKEGFIlck_human ..LVIALHSY EPSHDGDLGF EKGEQLRILE QS.GEWWKAQ SLTTGQEGFIss81_yeast.....ALYPY DADDDdeISF EQNEILQVSD .IEGRWWKAR R.ANGETGIIabl_mouse ..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVabl1_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YnnGEWCEAQ ..TKNGQGWVsrc1_drome..VVVSLYDY KSRDESDLSF MKGDRMEVID DTESDWWRVV NLTTRQEGLImysd_dicdi.....ALYDF DAESSMELSF KEGDILTVLD QSSGDWWDAE L..KGRRGKVyfj4_yeast....VALYSF AGEESGDLPF RKGDVITILK ksQNDWWTGR V..NGREGIFabl2_human..LFVALYDF VASGDNTLSI TKGEKLRVLG YNQNGEWSEV RSKNG.QGWVtec_human .EIVVAMYDF QAAEGHDLRL ERGQEYLILE KNDVHWWRAR D.KYGNEGYIabl1_caeel..LFVALYDF HGVGEEQLSL RKGDQVRILG YNKNNEWCEA RlrLGEIGWVtxk_human .....ALYDF LPREPCNLAL RRAEEYLILE KYNPHWWKAR D.RLGNEGLIyha2_yeastVRRVRALYDL TTNEPDELSF RKGDVITVLE QVYRDWWKGA L..RGNMGIFabp1_sacex.....AEYDY EAGEDNELTF AENDKIINIE FVDDDWWLGE LETTGQKGLF