Edward N. Trifonov University of Haifa and Masaryk University, Brno

Page 1

Edward N. Trifonov University of Haifa and

Masaryk University, Brno

Thrill of linking polymer statistics and sequence space

with protein structure and function

Oak Ridge, 2009

Page 2

Two related sequences, aligned

33% match

Q816J5 DVNLPKFDGFYWCRQIRHESTCPIIFISARAGEMEQIMAIESGADDYITKPFHYDVVMAKIKGQLRR
       |||||-|||----|--|--|----------------------||||---|||------|-----|||
Q7DCC5 DVNLPGIDGWDLLRRLRERSSARVMMLTGHGRLTDKVRGLDLGADDFMVKPFQFPELLARVRSLLRR

Page 3

Methyltransferases

LEVALALSQADIIVRDALVS Q8UBQ7
|  | || ||| || ||||
LHAANALRQADVIVHDALVN Q92P47
| |   |  ||||||||||
LRAQRVLMEADVIVHDALVP Q8YEV9
||| | ||||||||||||||
LRAHRLLMEADVIVHDALVP Q98GP6
|   ||| |||||
LKGQRLLQEADVILYADSLV Q8DLD2
 ||||   ||||| || |||
IKGQRIVKEADVIIYAGSLV Q8REX7
 ||||      |||||||||
VKGQRLIRQCPVIIYAGSLV Q88HF0
| |  ||  |||  ||||||
VRGRDLIAACPVCLYAGSLV Q8UBQ5

Page 4

LEVALALSQADIIVRDALVS Q8UBQ7
VRGRDLIAACPVCLYAGSLV Q8UBQ5

No-match relatives
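The walk on the two Methyltransferases slides can be checked numerically. This is a minimal sketch, assuming that a "match" means an identical residue at the same position of two ungapped 20-residue fragments (the slides do not spell out the scoring); `identity` and `WALK` are illustrative names, not part of any published tool.

```python
# Methyltransferase fragments from the slide, in walk order.
WALK = [
    "LEVALALSQADIIVRDALVS",  # Q8UBQ7
    "LHAANALRQADVIVHDALVN",  # Q92P47
    "LRAQRVLMEADVIVHDALVP",  # Q8YEV9
    "LRAHRLLMEADVIVHDALVP",  # Q98GP6
    "LKGQRLLQEADVILYADSLV",  # Q8DLD2
    "IKGQRIVKEADVIIYAGSLV",  # Q8REX7
    "VKGQRLIRQCPVIIYAGSLV",  # Q88HF0
    "VRGRDLIAACPVCLYAGSLV",  # Q8UBQ5
]


def identity(a: str, b: str) -> float:
    """Fraction of positions with identical residues (ungapped, equal length)."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Identity between each fragment and the next one along the walk.
steps = [identity(a, b) for a, b in zip(WALK, WALK[1:])]
# Identity between the two ends of the walk.
ends = identity(WALK[0], WALK[-1])

if __name__ == "__main__":
    print("step identities:", ", ".join(f"{s:.0%}" for s in steps))
    print("end-to-end identity:", f"{ends:.0%}")
```

Every step of the walk keeps at least 45% identity (9 of 20 residues), yet the first and last fragments share no residue at any position.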

Page 5

Response regulators

CPIIFISARAGEMEQIMAIE Q816J5
 ||||||||   | | ||||
VPIIFISARDSDMDQVMAIE Q97IX4
|| ||||||| | | |   |
VPVIFISARDADIDRVLGLE O32192
||  | ||||  ||||||||
VPILFLSARDEEIDRVLGLE Q89D26
 ||  | || || | |||||
IPIIMLTARSEEFDKVLGLE Q8R9H7
  | ||||||   ||| |||
SRIMMLTARSRLADKVRGLE Q88RT2
 | ||||   || ||||||
ARVMMLTGHGRLTDKVRGLD Q7DCC5

No-match relatives

CPIIFISARAGEMEQIMAIE Q816J5
ARVMMLTGHGRLTDKVRGLD Q7DCC5

Page 6

Even the most advanced existing sequence alignment techniques

(e.g., BLAST) would not be able to recognize

such fully dissimilar sequence fragments as relatives

unless many intermediate sequences are analyzed

(which amounts to a whole research project)

Page 7

To be related

the sequences

do not have to be similar

(even up to a complete mismatch)

Page 8

single walk

network (of relatives)

Page 9

One can make long

walks from fragment to fragment in the

formatted protein sequence space (sequence fragments of the same length, 20 residues,

gathered from all or many proteomes).

Pairwise-connected matching fragments also form

networks.
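Such a network can be sketched as a graph whose vertices are fragments and whose edges join pairs that match at or above a threshold; a network is then a connected component of that graph. The sketch below assumes ungapped position-by-position identity as the match score and uses union-find to collect the components, which is one convenient choice and not necessarily the authors' procedure. It reuses the response-regulator fragments of Page 5.

```python
from itertools import combinations

# Response-regulator fragments from Page 5 (Q816J5 ... Q7DCC5).
FRAGMENTS = [
    "CPIIFISARAGEMEQIMAIE",
    "VPIIFISARDSDMDQVMAIE",
    "VPVIFISARDADIDRVLGLE",
    "VPILFLSARDEEIDRVLGLE",
    "IPIIMLTARSEEFDKVLGLE",
    "SRIMMLTARSRLADKVRGLE",
    "ARVMMLTGHGRLTDKVRGLD",
]


def identity(a: str, b: str) -> float:
    """Fraction of identical residues between two equal-length fragments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def networks(fragments: list[str], threshold: float) -> list[list[str]]:
    """Connected components of the graph linking fragments with identity >= threshold."""
    parent = list(range(len(fragments)))

    def find(i: int) -> int:
        # Follow parent links to the component root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair that matches at or above the threshold.
    for i, j in combinations(range(len(fragments)), 2):
        if identity(fragments[i], fragments[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, fragment in enumerate(fragments):
        groups.setdefault(find(i), []).append(fragment)
    return list(groups.values())


if __name__ == "__main__":
    for t in (0.65, 0.80):
        print(f"threshold {t:.0%}:", [len(g) for g in networks(FRAGMENTS, t)])
```

At a 65% threshold the seven fragments form a single network even though the first and last share no residue; at 80% every fragment is isolated, which is why the networks on the following slides depend on the chosen match threshold.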

Page 10
Page 11

Networks of fragments of aa-tRNA synthetases

at various thresholds of sequence match

Panels: A tyr, trp; B met; C arg, trp; D cys; E leu; F met, leu, ile, val; G ile; H lepA

Aa-tRNA synthetase module of lepA

Page 12

Network of GTP binding proteins

Sequence fragments with the same function are found in the same network

←GTP-binding module of lepA

Page 13

1mh1 Rac (GTP-binding) (Homo sapiens)
 2 QAIKCVVVGDGAVGKTCLLISYTTN 26
           |    ||   |
31 AGDVISIIGSSGSGKSTFLRCINFL 55
1b0ua ATP-binding subunit of the histidine permease {Salmonella typhimurium}

Page 14

1 Putative peptidoglycan bound protein
2 Collagen adhesion protein
3 Ribosomal protein L11
4 Penicillin-binding protein 2x
5 Penicillin-binding protein 1
6 Penicillin-binding protein 2A
7 D-alanyl-D-alanine carboxypeptidase
8 Cytochrome
9 Beta-lactamase
10 Mannitol-1-phosphate 5-dehydrogenase
11 Glutaminase
12 Beta-lactamase
13 Esterase EstB

Fragments of the same network have essentially the same structure. Peripheral fragments may be different

Page 15
Page 16

New definition of sequence relatedness:

Sequence fragments of the same network in the sequence space

are relatives

They may be rather different sequence-wise. Yet their functions and structures

are essentially the same

Page 17

Every fragment is tagged (protein, species).

It is also uniquely located in its family network.

The size of the network tells

how many relatives the fragment has.

Thus, one can take a sequence, find the network of each of its fragments

and plot the network sizes along the sequence
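A minimal sketch of that procedure, assuming a precomputed dictionary `fragment_network` that maps each fragment of the sequence space to a network identifier (a hypothetical structure; the slides do not say how the networks are stored). Network size is taken here as the number of distinct fragments assigned to a network, and windows with no entry are treated as having no relatives (size 0). The toy data in the usage lines are invented.

```python
from collections import Counter


def network_size_profile(sequence: str, fragment_network: dict[str, int], k: int = 20) -> list[int]:
    """Size of the network of every k-residue window of `sequence` (0 = no relatives)."""
    # Network id -> number of member fragments in the precomputed space.
    sizes = Counter(fragment_network.values())
    profile = []
    for start in range(len(sequence) - k + 1):
        network_id = fragment_network.get(sequence[start:start + k])
        profile.append(sizes[network_id] if network_id is not None else 0)
    return profile


if __name__ == "__main__":
    # Toy space of 3-residue fragments (hypothetical data).
    toy_space = {"ABC": 1, "BCD": 1, "CDE": 3, "XYZ": 2}
    print(network_size_profile("ABCDEF", toy_space, k=3))  # [2, 2, 1, 0]
```

Plotting such a profile along a real protein, with k = 20, gives the module maps shown on the following slides; peaks correspond to fragments with many relatives.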

Page 18

This generates maps of the modules from which the protein is built,

for example: modules of histidine permease, ATP-binding subunit

(ABC transporter)

Page 19

ABC transporters

GPS (Aleph) LTA (Dalet) LSG, LAD (Beth) IYV (Zayin)

(36) GPSGSGKsTmL (38) fVFQqfnLiPlLTALENV (40) QLSGGQQQRVAIARAL(6)iLADEPTgALD (22) vvVTHDi (30) 1F3O

(32-72)GPSGSGKTTLL(29-41)MVFQNYALFPHLTALENV(31-42)QLSGGQQQRVAIARAL(6)LLADEPTSALD(21-22)IYVTHDQ(28-263) consensus
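The spacer notation of the consensus line, `(a-b)` residues between conserved segments, maps naturally onto a regular expression. The sketch below (hypothetical helper names) turns the consensus into a pattern and tests whether a whole sequence has the conserved segments at the given spacings. It accepts only exact copies of the consensus segments, whereas real family members such as 1F3O deviate at several positions (the lowercase letters above), so it illustrates the module layout rather than serving as a practical search tool.

```python
import re

# ABC-transporter consensus from the slide; (a-b) = spacer of a to b residues.
CONSENSUS = (
    "(32-72)GPSGSGKTTLL(29-41)MVFQNYALFPHLTALENV(31-42)"
    "QLSGGQQQRVAIARAL(6)LLADEPTSALD(21-22)IYVTHDQ(28-263)"
)


def consensus_to_regex(consensus: str) -> str:
    """Replace '(a-b)' and '(n)' spacers with regex length ranges '.{a,b}'."""
    def spacer(match):
        low = match.group(1)
        high = match.group(2) or low  # '(n)' means exactly n residues
        return f".{{{low},{high}}}"
    return re.sub(r"\((\d+)(?:-(\d+))?\)", spacer, consensus)


MODULE_LAYOUT = re.compile(consensus_to_regex(CONSENSUS))


def has_module_layout(sequence: str) -> bool:
    """True if the whole sequence fits the consensus segments and spacer lengths."""
    return MODULE_LAYOUT.fullmatch(sequence) is not None


if __name__ == "__main__":
    print(MODULE_LAYOUT.pattern)
```

A tolerant scan would replace each literal segment by a scoring step that allows mismatches; the fixed-spacer skeleton stays the same.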

Page 20

Modules of TIM-barrel protein

Page 21

Modules of chemotaxis protein cheY

Page 22

Modules of cytidylate kinase

Page 23

When long sequences are compared, it is worth first identifying the more informative segments.

This is done by mapping the modules.

Page 24

Specific functions of individual module types are largely not known yet.

Since, however, they represent widespread, conserved and thus functional motifs,

their individual roles will eventually have to be elucidated.

One peculiar class of modules is the so-called "silent modules", which have few relatives in the sequence space, if any.

Page 25

  A (Aleph)                   silent modules 1-3            D (Dalet)
  IVLLVGPSGSGKTTLLRALAGLLGPDGG                              RRGIGMVFQEYALFPHLTVLENVALGL
1 VISIIGSSGSGKSTFLRCINFLEKPSEGSIVVNGQTINLVRDKDGQLKVADKNQLRLLRTRLTMVFQHFNLWSHMTVLENVMEAP
2 FMILLGPSGCGKTTTLRMIAGLEEPSRG---QIYIGDRLVADPEKGIFVPPK------DRDIAMVFQSYALYPHMTVYDNIAFPL
3 FVVFVGPSGCGKSTLLRMIAGLETITSG---------DLFIGEKRMNDTPPA------ERGVGMVFQSYALYPHLSVAENMSFGL

[Figure: 3D structures with the A (Aleph) and D (Dalet) modules and silent modules 1, 2 and 3 marked]

The silent modules appear to maintain 3D structural relationships between functional modules

Page 26

The list of modules revealed in the map for a given protein sequence,

with reference to the corresponding (characterized) networks

of the precalculated sequence space,

provides a full annotation of the protein

Page 27

Protein sequence characterization via networks in the sequence space

requires neither

gap penalties, nor substitution matrices, nor alignment statistics.

Every sequence fragment of interest may belong to one and only one network.

Page 28

Descriptive definition of protein modules:

Their sequences are represented by networks in the protein sequence space, a separate network (or group of related networks) for each module.

Each module has its own unique structure. Typically, these are closed loops with a contour length of 25-30 residues.

Apart from the general activity ascribed to the protein that harbors a given module,

each module type has its own specific function.

Individual modules, even of the same type, often differ in sequence.

Their evolution from ancestral prototypes may be traced along walks and networks in the sequence space.

Page 29

Examples of evolutionary paths

Page 30

KVALVGRSGSGKTTVTSLLM
FIAVEGIDGAGKTTLAKSLS

GxxxxGKT - Walker A motif (NTP binding)
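The motif notation translates directly into a regular expression. A minimal sketch of the motif exactly as written on the slide (GxxxxGKT); note that the Walker A motif is also commonly written GxxxxGK[T/S], a broader form this pattern does not cover. `find_walker_a` is an illustrative name.

```python
import re

# Walker A motif as written on the slide: G, any four residues, then GKT.
WALKER_A = re.compile(r"G.{4}GKT")

FRAGMENTS = ["KVALVGRSGSGKTTVTSLLM", "FIAVEGIDGAGKTTLAKSLS"]


def find_walker_a(fragment: str):
    """Return (start, matched text) of the first GxxxxGKT occurrence, or None."""
    match = WALKER_A.search(fragment)
    return (match.start(), match.group()) if match else None


if __name__ == "__main__":
    for fragment in FRAGMENTS:
        print(fragment, find_walker_a(fragment))
```

Both fragments carry the motif at the same offset (GRSGSGKT and GIDGAGKT), although the rest of their sequences differ.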

Page 31

MOST COMMON PROTEIN SEQUENCE MODULES (PROTOTYPES)

Aleph GEIVLLVGPSGSGKTTLLRALAGLLGPDGG

Beth LSGGQRQRVAIARALALEPKLLLLDEPTSALD

Gimel DVVVIGAGGAGLAAALALARAGAKVVVVE

Dalet RRGIGMVFQEYALFPHLTVLENVALGL

Heh PVIMLTARGDEEDRVEALLEAGADDYLTKPF

Vav LLGLSKKEARERALELLELVGLEEKADRYP

Zayin LLLKLLKELGLTVLLVTHDLEEA

Berezovsky et al. 2000-2003

The underlined motifs are omnipresent

Page 32

Omnipresent 6- to 9-mers of 15 prokaryotes from different phyla

ALEPH ATP/GTP binding

1 HVDHGKTTL 2 GPPGTGKT 3 GHVDHGKT 4 GSGKTTLL 5 IDTPGHV 6 GPSGSGK 7 PTGSGKT 8 NGSGKTT 9 GKSTLLN 10 SGSGKT 11 TGSGKS 12 PGVGKT 13 PNVGKS 14 GVGKTT 15 GTGKTT 16 DHGKST 17 GKTTLA 18 GKTTLV 19 KSTLLK

BETH ATPases of ABC transporters

20 QRVAIARAL 21 LSGGQQQRV 22 LADEPT 23 TLSGGE

Other omni: 24 FIDEID 25 KMSKSL 26 WTTTPWT 27 NADFDGD

Omnipresence is a new measure of sequence conservation. These elements are the most conserved ones,

presumably coming from the last common ancestor
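Omnipresence, as used here, can be computed as the intersection of the k-mer sets of all proteomes examined. A minimal sketch, assuming each proteome is given as a list of protein sequences and that a motif counts as present if it occurs exactly at least once; the toy proteomes in the usage lines are invented for illustration.

```python
def omnipresent_kmers(proteomes: list[list[str]], k: int) -> set[str]:
    """k-mers that occur at least once in every proteome."""
    common = None
    for proteins in proteomes:
        # All k-mers of all proteins in this proteome.
        kmers = {p[i:i + k] for p in proteins for i in range(len(p) - k + 1)}
        common = kmers if common is None else common & kmers
    return common if common is not None else set()


if __name__ == "__main__":
    # Invented toy proteomes, for illustration only.
    toy = [
        ["MGSGKTTLLA", "QRVAIARAL"],
        ["AAGSGKTTLLW"],
        ["GSGKTTLLK"],
    ]
    print(omnipresent_kmers(toy, 8))  # {'GSGKTTLL'}
```

For real data one would run this for each k from 6 to 9 over whole proteomes; the set intersection keeps memory proportional to the k-mers of one proteome at a time.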

Page 33

Many of the 27 omnipresent elements do not match one another

(e.g., WTTTPWT and QRVAIARAL);

yet they turn out to belong to the same network.

Page 34

ALEPH and BETH, reconstructed

from overlapping omnipresent motifs, turn out to be relatives, though they do not match:

IDTPGHVDHGKTTLLN ALEPH
    |
TLSGGQQQRVAIARAL BETH

They both belong to the 10% monster network.

All 27 omnipresent elements belong to the same network

Page 35

10% MONSTER network (107 fragments)

Page 36

Sequence space based evolutionary tree of omnipresent elements

Page 37

All 27 omnipresent LUCA motifs originate from one prototype sequence, which is:

(here skipping two separate two-hour lectures)

Ala Ala Ala Ala Gly Ala Ala Gly Gly Ala Gly Gly Gly Gly

encoded in

GCC GCC GCC GCC GGC GCC GCC GGC GGC GCC GGC GGC GGC GGC

which is self-complementary:

GCC GCC GCC GCC GGC GCC GCC GGC GGC GCC GGC GGC GGC GGC (one strand, 5' to 3')

GCC GCC GCC GCC GGC GCC GCC GGC GGC GCC GGC GGC GGC GGC (complementary strand, 5' to 3')
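Self-complementarity here means that the reverse complement of the strand is identical to the strand itself, so both strands carry the same codons and encode the same peptide. A short check, with a codon table restricted to the two codons that occur in this gene (a simplification made to keep the sketch small):

```python
# Only the two codons that occur in this gene are needed here (simplification).
CODON_TO_AA = {"GCC": "Ala", "GGC": "Gly"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

GENE = "GCC GCC GCC GCC GGC GCC GCC GGC GGC GCC GGC GGC GGC GGC".replace(" ", "")


def reverse_complement(dna: str) -> str:
    """Complementary strand, read in its own 5'-to-3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> list[str]:
    """Translate codon by codon using the minimal table above."""
    return [CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3)]


if __name__ == "__main__":
    other_strand = reverse_complement(GENE)
    print(other_strand == GENE)  # True: the duplex is self-complementary
    print(" ".join(translate(GENE)))
    print(" ".join(translate(other_strand)))
```

Because GCC and GGC are reverse complements of each other, and the order of Ala and Gly in the peptide is itself mirror-complementary, reading the opposite strand yields the same 14-residue peptide.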

Page 38

The very first gene was a short duplex,

encoding the same thing in both strands

Page 39

TO CONCLUDE:

Proteins are made of standard-size modules of many types.

Each type has its own unique structure and function, but a highly variable sequence.

All of current protein science turns inside out: the protein world is a world of modules.

Page 40

Every breakthrough that opens new vistas also removes the ground from under the feet of other scientists.

The scientific joy of those who have seen the new light is accompanied by the dismay of those whose way of life has been changed for ever.

Fersht A, Nature Rev Mol Cell Biol, 2008

Page 41

Major references:

Papers of Igor N. Berezovsky, Zakharia M. Frenkel, Yehoshua Sobolevsky and E.N.T. 2006-2009

Page 42
Page 43

THANKS TO

Networks - Zacharia M. Frenkel, University of Haifa

Omnipresent motifs - Yehoshua Sobolevsky, University Minas Gerais, Brazil

Modules – closed loops – Igor N. Berezovsky, University of Bergen, Norway

AND TO THE AUDIENCE

Support by: Israeli Science Foundation, Center of Complexity Science, and Masaryk University, Brno

Page 44

Changing gears:

Reconstruction of the evolutionary history of the triplet code (Trifonov 2000-2003) suggests that the earliest protein sequences could be represented in a binary alphabet of two types of amino acids:

those encoded by xYx triplets (Ala family, A) and those encoded by xRx triplets (Gly family, G), where Y and R denote a pyrimidine and a purine, respectively, in the middle position of the triplet.
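The two families can be read off a codon table by looking only at the middle nucleotide of each codon (T/C = pyrimidine Y, A/G = purine R). The sketch below applies this rule to the modern standard genetic code (NCBI translation table 1, written with T for U); using today's code is an assumption of the sketch, since the slides concern the early code. It recovers the two alphabets used for the rearranged substitution matrices on the following slides, with serine falling into both families because it has both UCN and AGY codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in the order
# TTT, TTC, TTA, TTG, TCT, ..., GGG, i.e. third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def families(code: dict[str, str]) -> tuple[set[str], set[str]]:
    """Amino acids with a pyrimidine (xYx) or a purine (xRx) in the middle position."""
    xyx, xrx = set(), set()
    for codon, aa in code.items():
        if aa == "*":
            continue  # stop codons encode no amino acid
        if codon[1] in "TC":
            xyx.add(aa)
        else:
            xrx.add(aa)
    return xyx, xrx


xyx, xrx = families(CODE)
ala_only = xyx - xrx  # encoded only by xYx codons
gly_only = xrx - xyx  # encoded only by xRx codons
both = xyx & xrx      # encoded by codons of both types

if __name__ == "__main__":
    print("Ala family:", "".join(sorted(ala_only)))  # AFILMPTV
    print("Gly family:", "".join(sorted(gly_only)))  # CDEGHKNQRWY
    print("both:", "".join(sorted(both)))            # S
```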

Page 45

EVOLUTION OF THE TRIPLET CODE (E. N. Trifonov, December 2007, Chart 101)

Consensus temporal order of amino acids: Gly Ala Asp Val Ser Pro Glu Leu Thr Arg Ser TRM Arg Ile Gln Leu TRM Asn Lys His Phe Cys Met Tyr Trp Sec Pyl (codon groups marked in the chart: UCX CUX CGX AGY UGX AGR UUY UAX)

Consecutive assignment of the 64 triplets as 32 complementary pairs, in chart order (lowercase as in the chart): 1 GGC-GCC, 2 GAC-GUC, 3 GGA-UCC, 4 GGG-CCC, 5 GAG-CUC, 6 GGU-ACC, 7 GCG-CGC, 8 GCU-AGC, 9 GCA-ugc, 10 CCG-CGG, 11 CCU-AGG, 12 CCA-ugg, 13 UCG-CGA, 14 UCU-AGA, 15 UCA-UGA, 16 ACG-CGU, 17 ACU-AGU, 18 ACA-ugu, 19 GAU-AUC, 20 GUG-cac, 21 CUG-CAG, 22 aug-cau, 23 GAA-uuc, 24 GUA-uac, 25 CUA-UAG, 26 GUU-AAC, 27 CUU-AAG, 28 CAA-UUG, 29 AUA-uau, 30 AUU-AAU, 31 UUA-UAA, 32 uuu-AAA

Chart axes: codon capture; amino acid "age".
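Each row of the chart pairs a codon with its reverse complement. The sketch below, with the 32 pairs transcribed from the chart (all in upper case), checks that every pair is complementary and that together the pairs cover all 64 triplets exactly once.

```python
# The 32 codon pairs of the chart, in order of assignment (all upper case here).
PAIRS = """
GGC-GCC GAC-GUC GGA-UCC GGG-CCC GAG-CUC GGU-ACC GCG-CGC GCU-AGC
GCA-UGC CCG-CGG CCU-AGG CCA-UGG UCG-CGA UCU-AGA UCA-UGA ACG-CGU
ACU-AGU ACA-UGU GAU-AUC GUG-CAC CUG-CAG AUG-CAU GAA-UUC GUA-UAC
CUA-UAG GUU-AAC CUU-AAG CAA-UUG AUA-UAU AUU-AAU UUA-UAA UUU-AAA
""".split()

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(codon: str) -> str:
    """Codon on the opposite strand, read 5' to 3'."""
    return codon.translate(RNA_COMPLEMENT)[::-1]


pairs = [tuple(token.split("-")) for token in PAIRS]
all_complementary = all(reverse_complement(a) == b for a, b in pairs)
covered = {codon for pair in pairs for codon in pair}

if __name__ == "__main__":
    print(len(pairs), all_complementary, len(covered))  # 32 True 64
```

Since no triplet is its own reverse complement (the middle base would have to pair with itself), 32 disjoint complementary pairs are exactly what a full 64-codon assignment requires.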

Page 46

[Table: substitution scores with rows and columns regrouped as A F I L M P T V (Ala alphabet) | C D E G H K N Q R W Y (Gly alphabet)]

Rearranged PAM120 substitution matrix (original matrix in Altschul SF, JMB 219, 555, 1991)

The conclusion about two alphabets is strongly supported by corresponding

rearrangements of substitution matrices:

Page 47

[Table: substitution scores with rows and columns regrouped as A F I L M P T V (Ala alphabet) | C D E G H K N Q R W Y (Gly alphabet)]

Rearranged BLOSUM substitution matrix (original matrix in Henikoff S, Henikoff JG, PNAS 89, 10915, 1992)

Page 48

Rewriting a modern amino acid sequence in the binary form

would suggest what the ancestral form of that sequence was,

all the way back to the original alanines and glycines only

Page 49

ALEPH AGAAGGAGGGGAAAAG
        ++-+-+++++++++
BETH    AASGGGGGGAAAAGAA

In binary form ALEPH and BETH are rather similar

Compare to

IDTPGHVDHGKTTLLN + TLSGGQQQRVAIARAL
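Rewriting in the binary form amounts to a lookup in the two alphabets. A minimal sketch, assuming the alphabets listed on the rearranged matrices (A F I L M P T V and C D E G H K N Q R W Y) and keeping serine, which has codons of both types, as an ambiguous S, as the slide does:

```python
ALA_ALPHABET = set("AFILMPTV")     # xYx-encoded amino acids
GLY_ALPHABET = set("CDEGHKNQRWY")  # xRx-encoded amino acids


def to_binary(protein: str) -> str:
    """Map each residue to A (Ala family) or G (Gly family); Ser stays 'S'."""
    letters = []
    for residue in protein:
        if residue in ALA_ALPHABET:
            letters.append("A")
        elif residue in GLY_ALPHABET:
            letters.append("G")
        elif residue == "S":
            letters.append("S")  # Ser has codons of both types
        else:
            raise ValueError(f"unexpected residue {residue!r}")
    return "".join(letters)


aleph = to_binary("IDTPGHVDHGKTTLLN")
beth = to_binary("TLSGGQQQRVAIARAL")

if __name__ == "__main__":
    print(aleph)  # AGAAGGAGGGGAAAAG
    print(beth)   # AASGGGGGGAAAAGAA
    # Shift BETH by two positions and count identical binary letters.
    print(sum(a == b for a, b in zip(aleph[2:], beth)))  # 12
```

With BETH shifted by two positions against ALEPH, 12 of the 14 overlapping positions agree; the other two are the ambiguous S and one A/G mismatch, matching the `++-+-+++++++++` line.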

Page 50

According to the same theory (reconstruction of the evolutionary history of the triplet code),

the earliest proteins were encoded in both strands of duplex genes, so that the xYx codons of one strand would be complementary to the xRx codons of the other strand.

Remarkably, the above ALEPH and BETH are, indeed, complementary:

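ALEPH           AGAAGGAGGGGAAAAG
                    |||||||||||-
BETH, reversed      AAGAAAAGGGGGGSAA

(BETH AASGGGGGGAAAAGAA is written backwards because the two strands of a duplex are antiparallel.)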

Page 51

The two most widespread modules, ALEPH and BETH, apparently represent the earliest duplex gene, which in the distant past encoded two vitally important activities involved in energy supply (ATP binding and ATPase activity).

Today the module ALEPH is located in a variety of enzymes that require ATP, including the most ancient ones:

1. ABC transporters,
2. cell division proteins (proteases),
3. translation initiation factors and
4. translation elongation factors.

The other most ancient enzymes are

5. RNA polymerase and
6. aminoacyl-tRNA synthetase.

Page 52

Proteases (cell division proteins FtsH)

GPP (Aleph) FVE FID

(197) LLVGPPGTGKTLLARAVAGEA(7)SGSDFVELFVGVGAARVRD(9)PCIVFIDEIDAVGR (10) 2CEA

(146-463)LLVGPPGTGKTLLARAVAGEA(7)SGSDFVEMFVGVGASRVRD(9)PCIIFIDEIDAVGR(7-11) consensus

DER RPG

DEREQTLNQLLVEMDGF(8)MAATNRPDILDPALLRPGRFDKK (297) 2CEA

DEREQTLNQLLVEMDGF(8)IAATNRPDxLDPALLRPGRFDRQ (95-415) consensus

- another example of the omnipresent cassette

Page 53

Omnipresent cassette of RNA polymerases

FAT NEK NLL

(529) VDGGRFATSDLNDLYRRLINRNNRLK (12) RNEKRMLQEAVDAL (27) GKQGRFRQNLLGKRVDYSGRSVIVVGP 2A6E
(224-518)LDGGRFATSDLNDLYRRVINRNNRLK (12) RNEKRMLQEAVDAL(25-27)GKQGRFRQNLLGKRVDYSGRSVIVVGP consensus

VLL NAD

(62) KVVLLNRAPTLHRLGIQAF (18) AFNADFDGDQMAVH (776) 2A6E
(59-84)HPVLLNRAPTLHRLGIQAF (18) AFNADFDGDQMAVH (131-961) consensus

Page 54

60% match threshold networks:

320,000 proteins from 120 prokaryotes, ~100,000,000 fragments

The largest (monster) network 9,368,905 sequence fragments (~10% of all)

Next largest 2,535 fragments

Networks of sizes 120 to 2,535 fragments (several thousand, 3.8% of all fragments)

Small networks cover 86% of the space

35% of fragments are single, no relatives

