Complementarity of network and sequence information in homologous proteins
Vesna Memišević², Tijana Milenković², and Nataša Pržulj¹
¹ Department of Computing, Imperial College London, London, UK
² Department of Computer Science, University of California, Irvine, USA
International Symposium on Integrative Bioinformatics
March, 2010
Motivation
• Genetic sequences – revolutionized understanding of biology
• Non-sequence based data of importance, e.g.:
– Secondary & tertiary structure of RNA play the dominant role in RNA function (tRNA: Gautheret et al., Comput. Appl. Biosci., 1990; rRNA: Woese et al., Microbiological Reviews, 1983)
– Secondary structure-based approach – more effective at finding new functional RNAs than sequence-based alignments (Webb et al., Science, 2009)
• What about patterns of interconnections in PPI networks?
– Can they complement the knowledge learned from genomic sequence?
– Do wiring patterns of duplicated proteins in the PPI network give insights into evolutionary distance?
– Does the information about homologues captured by PPI network topology differ from that captured by their sequence?
Nataša Prž[email protected]
.uk
Background
• Homologs – descend from a common ancestor:
1. Paralogs: in the same species, evolve through gene duplication events
2. Orthologs: in different species, evolve through speciation events
Background
• Sequence-based homology data from:
1. Clusters of Orthologous Groups – COG[1]
2. KEGG Orthology System[2]
[1] Tatusov et al., BMC Bioinformatics, 4(41), 2003.
[2] Kanehisa et al., Nucleic Acids Res., 28:27–30, 2000.
Background
• Sequence-based homology data from:
1. Clusters of Orthologous Groups – COG[1]
  • Proteins in different genomes – sequences compared for the best hits (BeTs)
  • The graph of BeTs constructed
  • Triangles in it found
  • Triangles sharing a side merged into the groups of orthologs and paralogs
  • No dependence on the absolute level of similarity between compared proteins
2. KEGG Orthology System[2]
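
The COG steps listed above (build the graph of BeTs, find triangles, merge triangles that share a side) can be sketched roughly as follows. This is an illustrative sketch only, not the actual COG pipeline: the BeT graph is assumed to be given already as a list of undirected edges, and the protein names and the helper functions `find_triangles` and `merge_triangles` are hypothetical.

```python
from itertools import combinations

def find_triangles(edges):
    """Return all triangles (frozensets of 3 nodes) in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = set()
    for u, v in edges:
        # Any common neighbour of u and v closes a triangle.
        for w in adj[u] & adj[v]:
            triangles.add(frozenset((u, v, w)))
    return triangles

def merge_triangles(triangles):
    """Merge triangles that share a side (two nodes) into groups of nodes."""
    tris = list(triangles)
    parent = list(range(len(tris)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(tris)), 2):
        if len(tris[i] & tris[j]) == 2:  # the two triangles share a side
            parent[find(i)] = find(j)

    groups = {}
    for i, t in enumerate(tris):
        groups.setdefault(find(i), set()).update(t)
    return sorted(sorted(g) for g in groups.values())

# Toy BeT graph (hypothetical protein names).
bet_edges = [("a1", "b1"), ("b1", "c1"), ("a1", "c1"),
             ("b1", "d1"), ("c1", "d1"), ("a2", "b2"), ("x", "a1")]
groups = merge_triangles(find_triangles(bet_edges))
print(groups)
```

In this toy graph the two triangles share the side b1-c1 and are merged into one group; the edges a2-b2 and x-a1 belong to no triangle and so join no group. Note that grouping depends only on the pattern of best hits, not on how strong any hit is.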
Background
• Sequence-based homology data from:
1. Clusters of Orthologous Groups – COG[1]
2. KEGG Orthology System[2]
  • Sequences aligned
  • If alignment score < 10^-8, then 1 assigned as “similarity bit”
  • Otherwise, 0 assigned as “similarity bit”
  • “Bit vectors” constructed for each protein, over all proteins
  • Graph constructed with protein sequences as nodes and correlation coefficients of the nodes’ bit vectors as edges
  • Cliques found in the graph = orthology groups
  • Again, no dependence on absolute level of similarity
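
The KEGG-style steps above can be sketched as below. This is a minimal illustration under stated assumptions, not the KEGG implementation: the score matrix is a made-up toy input, the 10^-8 threshold is taken from the slide, and the correlation cutoff `MIN_CORR` used to decide which edges to keep is hypothetical (the slide does not state one). All function names are illustrative.

```python
from math import sqrt

THRESHOLD = 1e-8  # similarity-bit threshold, as stated on the slide
MIN_CORR = 0.9    # hypothetical cutoff for keeping an edge (not from the slide)

def bit_vectors(scores, threshold=THRESHOLD):
    """One bit vector per protein: 1 where the score is below the threshold."""
    return [[1 if s < threshold else 0 for s in row] for row in scores]

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration of maximal cliques."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

def orthology_groups(scores):
    """Bit vectors -> correlation graph -> cliques with at least two proteins."""
    bits = bit_vectors(scores)
    n = len(bits)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(bits[i], bits[j]) >= MIN_CORR:
                adj[i].add(j)
                adj[j].add(i)
    return [c for c in maximal_cliques(adj) if len(c) > 1]

# Toy scores: proteins 0-2 hit each other, proteins 3-4 hit each other.
lo, hi = 1e-20, 1.0
toy_scores = [
    [lo, lo, lo, hi, hi],
    [lo, lo, lo, hi, hi],
    [lo, lo, lo, hi, hi],
    [hi, hi, hi, lo, lo],
    [hi, hi, hi, lo, lo],
]
print(orthology_groups(toy_scores))
```

Because each score is reduced to a single bit before the correlation is taken, the grouping again does not depend on the absolute level of similarity once the threshold is passed.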
Background
• Sequence-based homology data from:
1. Clusters of Orthologous Groups – COG[1]
2. KEGG Orthology System[2]
• We examine yeast proteins only:
  • Extract all possible pairs of them in COG and KEGG groups = “orthologous pairs”
  • There are 9,643 such unique pairs
• What are their topological similarities within the PPI network?
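
Extracting all unique within-group pairs, as described above, might look like the following sketch. The group contents and protein names are hypothetical placeholders; only the idea of pooling pairs across COG and KEGG groups and de-duplicating them comes from the slide.

```python
from itertools import combinations

def orthologous_pairs(groups):
    """All unique unordered pairs of proteins that co-occur in some group."""
    pairs = set()
    for members in groups:
        # Sorting makes (a, b) and (b, a) the same pair.
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs

# Hypothetical yeast proteins grouped by two sources.
cog_groups = [["P1", "P2", "P3"], ["P4", "P5"]]
kegg_groups = [["P2", "P3"], ["P4", "P5"]]
pairs = orthologous_pairs(cog_groups + kegg_groups)
print(sorted(pairs))
```

Pairs that appear in both sources are counted once, which is the sense in which the pairs are unique.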
Background
• Sequence-based homology data from:
1. Clusters of Orthologous Groups – COG[1]
2. KEGG Orthology System[2]
• Previous network-topology assisted approaches:
  • Network-alignment-based (IsoRank)
  • Yosef, Sharan & Noble, Bioinformatics, 2008 (hybrid Rankprop)
  • Both rely heavily on sequence information
  • Both use only a limited amount of network topology
Our Method
• PPI networks are noisy
  • We analyze the high-confidence part of yeast PPI network by Collins et al.[3]: 9,074 edges amongst 1,621 proteins
  • Focus on proteins with degree > 3 to avoid noisy PPIs
  • There are 175 orthologous pairs amongst 181 proteins
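
The degree filter above can be sketched as follows. It is assumed here that a pair is kept only when both of its proteins have degree greater than 3 in the PPI network; the slide does not spell this out. The edge list and protein names are toy placeholders, not the Collins et al. data.

```python
def filter_pairs_by_degree(edges, pairs, degree_cutoff=3):
    """Keep pairs whose two proteins both have degree > degree_cutoff."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    kept = {
        (a, b)
        for a, b in pairs
        if degree.get(a, 0) > degree_cutoff and degree.get(b, 0) > degree_cutoff
    }
    proteins = {p for pair in kept for p in pair}
    return kept, proteins

# Toy network: A and B each have 4 interaction partners, C has 1.
toy_edges = (
    [("A", f"n{i}") for i in range(4)]
    + [("B", f"m{i}") for i in range(4)]
    + [("C", "k0")]
)
toy_pairs = {("A", "B"), ("A", "C")}
kept, proteins = filter_pairs_by_degree(toy_edges, toy_pairs)
print(kept, proteins)
```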
[3] Collins et al., Molecular and Cellular Proteomics, 6(3):439–450, 2007.
• Does PPI network topology contain homology information? Are similarly wired proteins homologous?
• Does homology information obtained from network topology differ from that obtained from sequence?
N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.
Graphlets:
• Induced subgraphs
• Of any frequency
• Graphlet degrees generalize node degree
N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.
T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4, pg. 257-273, 2008.
Graphlet Degree (GD) vectors, or “node signatures”
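
To illustrate what a node signature records, the sketch below computes only the first four entries of a GD vector, those for the 2- and 3-node graphlets: orbit 0 (the ordinary degree), orbit 1 (end of a 3-node path), orbit 2 (middle of a 3-node path) and orbit 3 (member of a triangle). The full signatures in the cited papers cover graphlets with up to five nodes; this reduced version and the function name `gdv_small` are illustrative assumptions.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def gdv_small(adj, v):
    """Counts for orbits 0-3 of node v (induced 2- and 3-node graphlets)."""
    nv = adj[v]
    deg = len(nv)
    # Orbit 3: pairs of neighbours that are themselves connected (triangles).
    tri = sum(1 for a, b in combinations(sorted(nv), 2) if b in adj[a])
    # Orbit 2: pairs of neighbours that are not connected (v is the path middle).
    mid = deg * (deg - 1) // 2 - tri
    # Orbit 1: v-u-w paths where w is neither v nor a neighbour of v.
    end = sum(len(adj[u] - nv - {v}) for u in nv)
    return [deg, end, mid, tri]

# Toy graph: a triangle a-b-c with a pendant node d attached to c.
adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
for node in sorted(adj):
    print(node, gdv_small(adj, node))
```

Nodes a and b get the same vector because they occupy symmetric positions in the toy graph, while c and d differ from them even though the graph is tiny. This is the sense in which graphlet degrees generalize node degree.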
Similarity measure between nodes’ Graphlet Degree vectors
Signature Similarity Measure
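
The formula itself did not survive on this slide. The sketch below follows the form of the measure as given in the cited Milenković and Pržulj (2008) paper: a weighted, log-scaled distance per orbit, averaged and subtracted from 1. The paper assigns each orbit its own weight; uniform weights are used by default here, which is a simplifying assumption, and the function name is illustrative.

```python
from math import log

def signature_similarity(u, v, weights=None):
    """Similarity in (0, 1] between two graphlet degree vectors u and v."""
    if len(u) != len(v):
        raise ValueError("signatures must have the same length")
    if weights is None:
        weights = [1.0] * len(u)  # assumption: uniform orbit weights
    total = 0.0
    for ui, vi, wi in zip(u, v, weights):
        # Log scaling keeps orbits with very large counts from dominating.
        total += wi * abs(log(ui + 1) - log(vi + 1)) / log(max(ui, vi) + 2)
    return 1.0 - total / sum(weights)

# Two short toy signatures (orbits 0-3 only).
sig_c = [3, 0, 2, 1]
sig_d = [1, 2, 0, 0]
print(signature_similarity(sig_c, sig_c))
print(signature_similarity(sig_c, sig_d))
```

Identical signatures give a similarity of exactly 1, and the value decreases as the per-orbit counts diverge.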
Results
Network Topology
• Orthologous pairs often perform the same or similar function.
• Does GD vector similarity (GDS) imply shared biological function?
• Note: most GO annotations were obtained from sequences
  – Similar topology ~ similar sequence ~ similar function
• Orthologous proteins have high GD vector similarities (p-value < 0.05)
• > 20% of orthologous pairs have GDS > 85%
Network Topology – Robustness
• PPI networks are noisy
• Random edge additions, deletions and rewirings in the PPI network
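
One way to generate such noisy versions of a network is sketched below. The slides do not specify the exact randomization procedure, so this is an assumption: a fixed fraction of edges is added, deleted, or rewired (here, rewiring means deleting an edge and adding a random non-edge, so the edge count is preserved). The function name and parameters are hypothetical.

```python
import random

def perturb(edges, nodes, fraction, mode, seed=0):
    """Return a noisy copy of an undirected edge set.

    mode is "add", "delete" or "rewire"; fraction is relative to the
    number of original edges. Assumes the graph is sparse enough that
    random non-edges are easy to find.
    """
    if mode not in ("add", "delete", "rewire"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    node_list = sorted(nodes)
    k = round(fraction * len(edge_set))

    if mode in ("delete", "rewire"):
        # Sort first so the sample is reproducible for a given seed.
        removed = rng.sample(sorted(edge_set, key=sorted), k)
        edge_set.difference_update(removed)
    if mode in ("add", "rewire"):
        added = 0
        while added < k:
            candidate = frozenset(rng.sample(node_list, 2))
            if candidate not in edge_set:
                edge_set.add(candidate)
                added += 1
    return edge_set

# Toy network: a ring of 10 nodes, perturbed at a 20% noise level.
ring_nodes = range(10)
ring_edges = [(i, (i + 1) % 10) for i in range(10)]
for mode in ("add", "delete", "rewire"):
    print(mode, len(perturb(ring_edges, ring_nodes, 0.2, mode)))
```

Signature similarities of the orthologous pairs could then be recomputed on each perturbed network and compared with those from the original network.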
Sequence
• Sequence identities for the 175 orthologous pairs
• ~20% of orthologous pairs have sequence identity > 90%
• “Twilight zone” for homology: 20-35% sequence identity
• ~70% of orthologous pairs have sequence identity < 35%
  – COG and KEGG do not depend on the absolute level of similarity, but on triangles in the graph of best matches
Comparison:
• ~20% of orthologous pairs have signature similarities above 85% (35 pairs)
• ~30% of orthologous pairs have sequence identities above 35% (53 pairs)
• Overlap: 22 pairs (~60% of the smaller set)
• Sequence and network topology capture somewhat complementary slices of homology information
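
The percentages in this comparison follow directly from the pair counts quoted on the slides. The short check below reproduces them; the variable names are illustrative.

```python
total_pairs = 175  # orthologous pairs analyzed
high_gds = 35      # pairs with signature similarity above 85%
high_seq = 53      # pairs with sequence identity above 35%
overlap = 22       # pairs in both sets

frac_gds = high_gds / total_pairs                 # share with high topology similarity
frac_seq = high_seq / total_pairs                 # share with high sequence identity
frac_overlap = overlap / min(high_gds, high_seq)  # overlap relative to the smaller set

only_topology = high_gds - overlap  # high signature similarity, low sequence identity
only_sequence = high_seq - overlap  # high sequence identity, low signature similarity

print(f"{frac_gds:.1%} {frac_seq:.1%} {frac_overlap:.1%}")
print(only_topology, only_sequence)
```

The 13 pairs found only by topology and the 31 pairs found only by sequence are what makes the two sources complementary rather than redundant.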
Examples
• 59 of the yeast ribosomal proteins – retained two genomic copies
• Are duplicated proteins functionally redundant?
  • No: they have different genetic requirements for their assembly and localization, so they are functionally distinct
  • Also note: average sequence identity of structurally similar proteins is ~8-10%
• Two pairs with identical sequence:
  – 100% sequence identity, 50% signature similarity; degrees 25 and 5
  – 100% sequence identity, 65% signature similarity; degrees 54 and 9
Conclusions
• Homology information captured by PPI network topology differs from that captured by sequence
• Complementary sources for identifying homologs
Future work:
• Could topological similarity be used to identify orthologs from best-hits graph analysis, as done for sequences?
Acknowledgements
This project was supported by the NSF CAREER IIS-0644424 grant.