Complementarity of network and sequence information in homologous proteins


March, 2010

1 Department of Computing, Imperial College London, London, UK
2 Department of Computer Science, University of California, Irvine, USA

International Symposium on Integrative Bioinformatics

Vesna Memišević2, Tijana Milenković2, and Nataša Pržulj1

Motivation

• Genetic sequences – revolutionized understanding of biology
• Non-sequence based data of importance, e.g.:
– Secondary & tertiary structure of RNA have the dominant role in RNA function (tRNA: Gautheret et al., Comput. Appl. Biosci., 1990) (rRNA: Woese et al., Microbiological Reviews, 1983)
– Secondary structure-based approach – more effective at finding new functional RNAs than sequence-based alignments (Webb et al., Science, 2009)
• What about patterns of interconnections in PPI networks?
– Can they complement the knowledge learned from genomic sequence?
– Do wiring patterns of duplicated proteins in the PPI network give insights into evolutionary distance?

– Does the information about homologues captured by PPI network topology differ from that captured by their sequence?


Background

• Homologs – descend from a common ancestor:

1. Paralogs: in the same species, evolve through gene duplication events

2. Orthologs: in different species, evolve through speciation events


• Sequence-based homology data from:
1. Clusters of Orthologous Groups – COG[1]
2. KEGG Orthology System[2]


[1] Tatusov et al., BMC Bioinformatics, 4(41), 2003.
[2] Kanehisa et al., Nucleic Acids Res., 28:27–30, 2000.

• Clusters of Orthologous Groups – COG[1]:

• Protein sequences from different genomes are compared to find the best hits (BeTs)
• A graph of BeTs is constructed






• Triangles are found in the BeT graph


• Triangles that share a side are merged into groups of orthologs and paralogs


• The grouping does not depend on the absolute level of similarity between the compared proteins
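The triangle-merging idea can be sketched in a few lines. This is a minimal illustration only, assuming the BeT graph is given as a plain undirected edge list; the function names `find_triangles` and `merge_triangles` and the protein labels are hypothetical, and the real COG procedure also tracks which genome each protein comes from.

```python
from itertools import combinations

def find_triangles(edges):
    """Return all triangles (frozensets of 3 nodes) in an undirected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    triangles = set()
    for a, b in edges:
        # Any common neighbor of both endpoints closes a triangle.
        for c in adj[a] & adj[b]:
            triangles.add(frozenset((a, b, c)))
    return triangles

def merge_triangles(triangles):
    """Merge triangles that share a side (two nodes) into groups of nodes."""
    triangles = list(triangles)
    parent = list(range(len(triangles)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) == 2:
            parent[find(i)] = find(j)

    groups = {}
    for i, tri in enumerate(triangles):
        groups.setdefault(find(i), set()).update(tri)
    return sorted(sorted(g) for g in groups.values())
```

Because only the presence of best-hit edges matters, no similarity threshold appears anywhere in the sketch.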


• KEGG Orthology System[2]:
• Sequences are aligned
• If the alignment score is < 10^-8, then 1 is assigned as the "similarity bit"
• Otherwise, 0 is assigned as the "similarity bit"
• A "bit vector" is constructed for each protein, over all proteins
• A graph is constructed in which nodes are protein sequences and edges are correlation coefficients of the nodes' bit vectors
• Cliques found in the graph are the orthology groups

• Again, there is no dependence on the absolute level of similarity
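The bit-vector procedure can be sketched as follows. This is a hedged illustration, not the KEGG implementation: the `scores` dictionary, the default score of 1.0 for unlisted pairs, the self-similarity bit of 1, and the `min_corr` threshold for drawing an edge are all assumptions made for the example.

```python
import math

def similarity_bit(scores, p, q, cutoff):
    """1 if the pair scores below the cutoff (or p is q), else 0."""
    if p == q:
        return 1  # assumption: a protein is similar to itself
    score = min(scores.get((p, q), 1.0), scores.get((q, p), 1.0))
    return 1 if score < cutoff else 0

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration of maximal cliques."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

def orthology_groups(scores, proteins, cutoff=1e-8, min_corr=0.9):
    """Bit vectors -> correlation graph -> cliques of size > 1."""
    vectors = {p: [similarity_bit(scores, p, q, cutoff) for q in proteins]
               for p in proteins}
    adj = {p: set() for p in proteins}
    for i, p in enumerate(proteins):
        for q in proteins[i + 1:]:
            if pearson(vectors[p], vectors[q]) >= min_corr:
                adj[p].add(q)
                adj[q].add(p)
    return [c for c in maximal_cliques(adj) if len(c) > 1]
```

Two proteins end up linked when they are similar to the same set of other proteins, which is why the grouping depends on the pattern of hits rather than on any single score.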



• We examine yeast proteins only:
– Extract all possible pairs of them in COG and KEGG groups = "orthologous pairs"
– There are 9,643 unique such pairs
• What are their topological similarities within the PPI network?
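Extracting the pairs can be sketched as below, assuming the groups are available as a mapping from a group identifier to its member proteins. The function name `orthologous_pairs`, the group identifiers and the protein labels are placeholders, not real COG or KEGG entries.

```python
from itertools import combinations

def orthologous_pairs(groups, yeast_proteins):
    """All unique unordered pairs of yeast proteins that share a group."""
    pairs = set()
    for members in groups.values():
        # Keep only yeast members; sorting makes each pair canonical (a < b),
        # so a pair found in both a COG and a KEGG group is counted once.
        yeast_members = sorted(set(members) & yeast_proteins)
        for a, b in combinations(yeast_members, 2):
            pairs.add((a, b))
    return pairs
```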




• Previous network-topology-assisted approaches:
– Network-alignment-based (IsoRank)
– Yosef, Sharan & Noble, Bioinformatics, 2008 (hybrid Rankprop)
• These rely heavily on sequence information
• These use only a limited amount of network topology


Our Method




• PPI networks are noisy
• We analyze the high-confidence part of the yeast PPI network by Collins et al.[3]: 9,074 edges amongst 1,621 proteins
• Focus on proteins with degree > 3 to avoid noisy PPIs
• There are 175 orthologous pairs amongst 181 proteins
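The degree filter can be sketched as below, assuming the network is an undirected edge list and the orthologous pairs are tuples of protein names. The function name `filter_pairs_by_degree` is hypothetical; the strict "degree > 3" cutoff follows the bullet above.

```python
def filter_pairs_by_degree(edges, pairs, min_degree=3):
    """Keep pairs whose two proteins both have degree > min_degree."""
    adj = {}
    for a, b in edges:
        if a != b:  # ignore self-interactions when counting degree
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    kept = {p for p, nbrs in adj.items() if len(nbrs) > min_degree}
    return {(a, b) for a, b in pairs if a in kept and b in kept}
```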


[3] Collins et al., Molecular and Cellular Proteomics, 6(3):439–450, 2007.


• Does PPI network topology contain homology information? Are similarly wired proteins homologous?

• Does homology information obtained from network topology differ from that obtained from sequence?


N. Przulj, D. G. Corneil, and I. Jurisica, “Modeling Interactome: Scale Free or Geometric?,” Bioinformatics, vol. 20, num. 18, pg. 3508-3515, 2004.


• Graphlets are small induced subgraphs, of any frequency


• Graphlet degrees generalize node degree

N. Przulj, “Biological Network Comparison Using Graphlet Degree Distribution,” ECCB, Bioinformatics, vol. 23, pg. e177-e183, 2007.


T. Milenkovic and N. Przulj, “Uncovering Biological Network Function via Graphlet Degree Signatures”, Cancer Informatics, vol. 4, pg. 257-273, 2008.

Graphlet Degree (GD) vectors, or “node signatures”
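To make the idea of a node signature concrete, the sketch below counts only the first four entries of such a vector, those arising from 2-node and 3-node graphlets: plain degree, end of a 3-node path, middle of a 3-node path, and membership in a triangle. The full vector covers graphlets with up to five nodes (73 entries), which this illustration does not attempt. The function names and the orbit ordering used here are assumptions.

```python
from itertools import combinations

def build_adjacency(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def small_graphlet_degrees(adj, v):
    """Counts for node v over the 2- and 3-node graphlets (induced)."""
    nbrs = adj[v]
    degree = len(nbrs)
    # v is the end of an induced path v-u-w: w is not v and not adjacent to v.
    path_end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
    path_middle = 0
    triangle = 0
    for a, b in combinations(nbrs, 2):
        if b in adj[a]:
            triangle += 1      # a, b, v are mutually adjacent
        else:
            path_middle += 1   # a-v-b is an induced path with v in the middle
    return [degree, path_end, path_middle, triangle]
```

The first entry is the ordinary degree, which is the sense in which graphlet degrees generalize node degree.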



Similarity measure between nodes’ Graphlet Degree vectors




Signature Similarity Measure
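A minimal sketch of one way to compare two GD vectors, assuming a log-scaled, weighted per-entry distance that is averaged and subtracted from 1. The exact normalization and the per-entry weights used by the authors are not given here, so the uniform default weights and the function name `signature_similarity` are assumptions.

```python
import math

def signature_similarity(u, v, weights=None):
    """Similarity in [0, 1] between two equal-length count vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(u)  # assumption: all entries weighted equally
    total = 0.0
    for ui, vi, wi in zip(u, v, weights):
        # Log scaling keeps large counts from dominating the distance.
        diff = abs(math.log(ui + 1) - math.log(vi + 1))
        total += wi * diff / math.log(max(ui, vi) + 2)
    distance = total / sum(weights)
    return 1.0 - distance
```

Identical vectors give a similarity of 1, and the value decreases as the counts diverge.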


Results


Network Topology

• Orthologous pairs often perform the same or similar function.
• Does GD vector similarity (GDS) imply shared biological function?
• Note: most GO annotations were obtained from sequences
• Similar topology ~ similar sequence ~ similar function


• Orthologous proteins have high GD vector similarities

[Figure annotations: p-value < 0.05; 85%]

• > 20% of orthologous pairs have GDS > 85%

Network Topology – Robustness

• PPI networks are noisy
• Robustness is examined under random edge additions, deletions and rewirings in the PPI network
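The three kinds of random perturbation can be sketched as below. This is an illustration under stated assumptions: the `fraction` of edges to perturb, the fixed `seed`, and the definition of rewiring as "delete k random edges, then add k random non-edges" are choices made for the example, and the function name `perturb` is hypothetical.

```python
import random

def perturb(edges, nodes, fraction, mode, seed=0):
    """Return a perturbed copy of an undirected edge set.

    mode is "add", "delete" or "rewire"; fraction is relative to the
    number of original edges.
    """
    if mode not in ("add", "delete", "rewire"):
        raise ValueError("unknown mode: %r" % mode)
    rng = random.Random(seed)
    current = {frozenset(e) for e in edges}
    nodes = sorted(nodes)
    k = int(round(fraction * len(current)))

    if mode in ("delete", "rewire"):
        # Sort first so the sample is reproducible for a given seed.
        removed = rng.sample(sorted(current, key=sorted), k)
        current.difference_update(removed)

    if mode in ("add", "rewire"):
        max_edges = len(nodes) * (len(nodes) - 1) // 2
        if len(current) + k > max_edges:
            raise ValueError("not enough non-edges to add")
        added = 0
        while added < k:
            a, b = rng.sample(nodes, 2)
            e = frozenset((a, b))
            if e not in current:
                current.add(e)
                added += 1

    return {tuple(sorted(e)) for e in current}
```

Recomputing the GD vector similarities on each perturbed network then shows how stable the topological signal is under noise.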


Sequence

• Sequence identities for the 175 orthologous pairs:
– ~70% of orthologous pairs have sequence identity < 35%
– ~20% of orthologous pairs have sequence identity > 90%
– The "twilight zone" for homology is 20-35% sequence identity
• Such low-identity pairs are still grouped because COG & KEGG do not depend on the absolute similarity, but on triangles in the graph of best matches
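Sequence identity for a pair can be computed from a pairwise alignment. The sketch below assumes the two sequences are already aligned to equal length with "-" as the gap character, and it divides by the number of gap-free columns; other denominators are in common use, so this definition is an assumption, as is the function name `percent_identity`.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)
```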

Comparison:

• ~20% of orthologous pairs have signature similarities above 85% (35 pairs)
• ~30% of orthologous pairs have sequence identities above 35% (53 pairs)
• Overlap: 22 pairs (~60% of the smaller set)
• Sequence and network topology: somewhat complementary slices of homology information
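The overlap figure is simple set arithmetic: with 35 pairs above the signature threshold, 53 above the sequence threshold and 22 in both, the overlap relative to the smaller set is 22/35, or about 63%. A small sketch, with the function name `overlap_summary` and integer pair identifiers used purely as placeholders:

```python
def overlap_summary(high_topology, high_sequence):
    """Size of the intersection and its fraction of the smaller set."""
    shared = high_topology & high_sequence
    smaller = min(len(high_topology), len(high_sequence))
    fraction = len(shared) / smaller if smaller else 0.0
    return len(shared), fraction
```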

Examples

• 59 of the yeast ribosomal proteins have retained two genomic copies
• Are duplicated proteins functionally redundant?
– No: they have different genetic requirements for their assembly and localization, so they are functionally distinct
• Also note: the average sequence identity of structurally similar proteins is ~8-10%
• Two pairs with identical sequence:
– 100% sequence identity, 50% signature similarity, degrees 25 and 5
– 100% sequence identity, 65% signature similarity, degrees 54 and 9


Conclusions

• Homology information captured by PPI network topology differs from that captured by sequence

• Complementary sources for identifying homologs

Future work:
• Could topological similarity be used to identify orthologs from best-hits graph analysis, as is done for sequences?

Acknowledgements

This project was supported by the NSF CAREER IIS-0644424 grant.

