Date post: 03-Jan-2016
Optimal Network Alignment with Graphlet Degree Vectors Tijana Milenković (Department of Computing, Imperial College London && Department of Computer Science, University of California) Weng Leong Ng (Department of Computer Science, University of California), Wayne Hayes (Department of Computer Science, University of California && Department of Mathematics, Imperial College London) Nataša Pržulj (Department of Computing, Imperial College London) Cancer Informatics 2010 Presented by: Lila Shnaiderman
Transcript
Page 1: Optimal Network Alignment with Graphlet Degree Vectors Tijana Milenković (Department of Computing, Imperial College London && Department of Computer Science,

Optimal Network Alignment with Graphlet Degree Vectors

Tijana Milenković (Department of Computing, Imperial College London && Department of Computer Science, University of California)

Weng Leong Ng (Department of Computer Science, University of California)

Wayne Hayes (Department of Computer Science, University of California && Department of Mathematics, Imperial College London)

Nataša Pržulj (Department of Computing, Imperial College London)

Cancer Informatics 2010

Presented by: Lila Shnaiderman

Motivation

• Recent advances in experimental techniques:
  – yeast two-hybrid assays,
  – mass spectrometry of purified complexes,
  – genome-wide chromatin immunoprecipitation,
  – etc.

• As a result, increasing amounts of biological network data are becoming available.

• Comparative analyses of biological networks are expected to have as large an impact as comparative genomics on our understanding of:
  – biology
  – evolution
  – disease

• Meaningful network comparison across species has therefore become one of the foremost problems in evolutionary and systems biology.

Background

• Subgraph isomorphism problem:
  – Does one graph exist as an exact subgraph of another graph?
  – The problem is NP-complete.
  – So exact network comparisons are computationally infeasible for large networks.

• Network alignment:
  – The most common network comparison method.
  – A more general problem: find the best way to “fit” one graph into another graph (not necessarily as an exact subgraph).

• Unclear:
  – how to guide the alignment process
  – how to measure the “goodness” of an inexact fit
  – So heuristic strategies must be sought.

Background – alignment types

• Local alignment:
  – The majority of existing methods.
  – Matches a small subnetwork from one network to one or more subnetworks in another network.
  – Can be ambiguous.

• Global alignment:
  – Measures the overall similarity between two networks.
  – Aligns every node in the smaller network to exactly one node in the larger network.
  – Most existing methods incorporate some a priori information external to network topology, such as protein sequence similarities in PPI networks.

• Best known global network alignment algorithm based solely on network topology:
  – GRAph ALigner (GRAAL): uses a heuristic search strategy to quickly find approximate alignments.

Current solution: H-GRAAL

• Hungarian-algorithm-based GRAAL.

• More computationally expensive than GRAAL's heuristic search.

• Guaranteed to find optimal alignments relative to any fixed, deterministic cost function.

• Relies solely and explicitly on a strong and direct measure of network topological similarity.

• Applicable to any type of network.

• Allows knowledge to be transferred between aligned networks.

Graphlet degree vectors (1)

• Graphlet: a small connected induced subgraph of a larger network.

[Figure: the nine graphlets on 2 to 4 nodes, G0 to G8, with their 15 automorphism orbits numbered 0 to 14.]

Graphlet degree vectors (2)

• Graphlet degree vector (GDV) of node v: counts, for each orbit, the number of graphlets that touch v at that orbit (for all graphlets on 2 to 5 nodes).

Orbit    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
GDV(v)   4  2  5  1  0  4  0  2  1  0  0  2  0  0  0

[Figure: example network around node v, highlighting orbit 0.]
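To make the counting concrete, here is a minimal Python sketch that computes only the first four entries of a graphlet degree vector (orbits 0 to 3, which come from the 2- and 3-node graphlets). The function name `low_orbit_counts` and the toy graph are hypothetical and are not taken from the slides; a full 73-orbit count needs a dedicated graphlet counter.

```python
from itertools import combinations

def low_orbit_counts(adj, v):
    """Return [orbit0, orbit1, orbit2, orbit3] for node v.

    adj maps every node to the set of its neighbours (undirected graph).
    """
    nbrs = adj[v]
    # Orbit 0: v is an endpoint of an edge, so this is simply the degree.
    orbit0 = len(nbrs)
    # Orbit 3: v sits in a triangle (two neighbours that are adjacent).
    orbit3 = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # Orbit 2: v is the middle of an induced 3-node path
    # (two neighbours that are NOT adjacent).
    orbit2 = sum(1 for a, b in combinations(nbrs, 2) if b not in adj[a])
    # Orbit 1: v is the end of an induced path v-a-b
    # (b is a neighbour of a, b is not v, and b is not adjacent to v).
    orbit1 = sum(1 for a in nbrs for b in adj[a] if b != v and b not in nbrs)
    return [orbit0, orbit1, orbit2, orbit3]

# Hypothetical toy graph: triangle v-a-b plus a pendant node c attached to b.
toy = {
    "v": {"a", "b"},
    "a": {"v", "b"},
    "b": {"v", "a", "c"},
    "c": {"b"},
}
print(low_orbit_counts(toy, "v"))
print(low_orbit_counts(toy, "b"))
```

Note that orbit 2 plus orbit 3 always equals the number of neighbour pairs, which matches the slide's table: with degree 4 there are 6 pairs, and 5 + 1 = 6.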

Graphlet degree vectors (3) to (6)

[Figures: the same example network and GDV(v) table as on the previous slide, highlighting in turn the orbits that node v touches: orbits 1 and 2, then orbits 3, 4 and 5, then orbits 6, 7 and 8.]
Graphlet degree vectors (7)

[Figure: the same example network and GDV(v) table, highlighting orbits 9, 10 and 11 of node v.]

• What is the degree of node v (according to the vector)? It is the orbit-0 entry, here 4.

• There are 73 different orbits across all 2-5-node graphlets.

• The full vector of orbit counts is called the signature of node v.

Degree Vector - Signature

• Many real-world networks have a small-world nature.

• So the graphlet degree vector is an effective measure:
  – It looks at a network distance of up to 4 around a node.
  – It captures a large portion of the network topology.

• Thus, comparing two signatures gives a highly constraining measure of local topological similarity between nodes.

Signature similarity

• For a node u ∈ G, u_i denotes the ith coordinate of its signature vector.

• Distance between the ith orbits of nodes u and v:

$D_i(u,v) = w_i \times \frac{|\log(u_i + 1) - \log(v_i + 1)|}{\log(\max\{u_i, v_i\} + 2)}$

• w_i is the weight of orbit i:
  – It accounts for dependencies between orbits.
  – Higher weights are assigned to orbits that are not affected by many other orbits.

• Questions:
  – Why log?
  – Why “+1”?

Distance and Similarity

• Total distance, where D_i(u,v) is the distance for orbit i and w_i is the weight of orbit i:

$D(u,v) = \frac{\sum_{i=0}^{72} D_i(u,v)}{\sum_{i=0}^{72} w_i}$

  – D(u,v) is in [0,1)
  – 0 means the signatures of u and v are identical

• Similarity: S(u,v) = 1 − D(u,v)
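The distance and similarity can be sketched in a few lines of Python. This is an illustration, assuming the per-orbit distance D_i(u,v) = w_i × |log(u_i + 1) − log(v_i + 1)| / log(max{u_i, v_i} + 2) from the graphlet degree signature literature. The function names are hypothetical, and the uniform default weights are a placeholder assumption: the real w_i depend on orbit dependencies that the slides do not list.

```python
import math

def orbit_distance(ui, vi, wi):
    """Weighted distance between the counts of one orbit for two nodes."""
    # The "+1" keeps log() defined when a count is zero; the log damps
    # counts that can differ by orders of magnitude.
    return wi * abs(math.log(ui + 1) - math.log(vi + 1)) / math.log(max(ui, vi) + 2)

def signature_similarity(sig_u, sig_v, weights=None):
    """Return S(u,v) = 1 - D(u,v) for two equal-length signatures."""
    if weights is None:
        # Assumption for illustration only: every orbit weighted equally.
        weights = [1.0] * len(sig_u)
    total = sum(orbit_distance(a, b, w) for a, b, w in zip(sig_u, sig_v, weights))
    distance = total / sum(weights)
    return 1.0 - distance

# First 15 signature entries of the example node v from the slides,
# compared with itself and with a hypothetical isolated node.
gdv_v = [4, 2, 5, 1, 0, 4, 0, 2, 1, 0, 0, 2, 0, 0, 0]
isolated = [0] * 15
print(signature_similarity(gdv_v, gdv_v))
print(signature_similarity(gdv_v, isolated))
```

Identical signatures give a similarity of exactly 1, and the denominator log(max + 2) is always larger than the numerator, so each per-orbit term stays below its weight.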

H-GRAAL algorithm - definitions

• G1 and G2 are networks:
  – |V(G1)| < |V(G2)|

• Alignment of G1 to G2:
  – A set of ordered pairs (u,v), with u ∈ V(G1) and v ∈ V(G2).
  – No two ordered pairs share the same G1-node or the same G2-node.
  – Each pair is called an aligned pair.

• Maximum alignment:
  – Every G1-node is in some aligned pair.
  – From now on: alignment = maximum alignment.

H-GRAAL algorithm

• H-GRAAL:
  – Hungarian-algorithm-based GRAph ALigner

• Produces an alignment:
  – of minimum total cost between the networks
  – total cost: summed over all aligned pairs
  – aligned pair cost: based on signature similarity

• The cost of aligning u and v:
  – Favors alignment of the densest parts of the networks.
  – Is reduced as the degrees of both nodes increase: higher-degree nodes with similar signatures provide a tighter constraint.
  – α ∈ [0, 1] weights the contribution of the signature similarity between u and v to the cost function.
  – 1 − α weights the contribution of the node degrees.

Alignment Cost

$C(u,v) = 2 - \left((1-\alpha)\, T(u,v) + \alpha\, S(u,v)\right)$, where $T(u,v) = \frac{\deg(u) + \deg(v)}{\mathrm{max\_deg}(G_1) + \mathrm{max\_deg}(G_2)}$

• Cost = 0: a pair of topologically identical nodes u and v.
• Cost close to 2: a pair of topologically very different nodes.

• Any problem with this formula?
• T(u,v) is very low for most nodes:
  – There are only a small number of hubs (highly linked nodes),
  – so max_deg(G1) and max_deg(G2) are much larger than deg(u) and deg(v).
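A short Python sketch makes the hub effect visible. It assumes the cost has the form C(u,v) = 2 − ((1 − α) × T(u,v) + α × S(u,v)) with T(u,v) = (deg(u) + deg(v)) / (max_deg(G1) + max_deg(G2)); the function name and the sample degrees are hypothetical.

```python
def alignment_cost(deg_u, deg_v, max_deg_g1, max_deg_g2, similarity, alpha=0.5):
    """Cost of aligning u (in G1) with v (in G2).

    similarity is the signature similarity S(u,v) in (0, 1];
    alpha weights S, and 1 - alpha weights the degree term T.
    """
    t = (deg_u + deg_v) / (max_deg_g1 + max_deg_g2)
    return 2.0 - ((1.0 - alpha) * t + alpha * similarity)

# Two low-degree nodes with identical signatures in networks whose hubs
# have degree 100: the degree term contributes almost nothing.
print(alignment_cost(1, 1, 100, 100, similarity=1.0))
# The two hubs themselves, also with identical signatures.
print(alignment_cost(100, 100, 100, 100, similarity=1.0))
```

With these illustrative numbers, the low-degree pair gets T = 0.01, so its cost is driven almost entirely by the signature term, which is the concern raised on the slide.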

Hungarian Algorithm

• Solves the assignment problem in polynomial time:
  – Build a bipartite graph whose two parts are V(G1) and V(G2).
  – Each edge (u,v) from V(G1) to V(G2) is labeled with the node alignment cost.
  – Find a minimum-cost matching that covers every node of V(G1).

• More than one optimal alignment is possible:
  – The particular alignment found depends heavily on the implementation details of the underlying Hungarian algorithm.
  – For example: the order in which the nodes are presented to the algorithm.
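As an illustration of the matching step, below is a compact Python sketch of the classic O(n²m) potentials formulation of the Hungarian algorithm for a rectangular cost matrix with no more rows than columns. It is a textbook variant, not the implementation used by H-GRAAL, and the function name `hungarian` and the sample matrix are hypothetical.

```python
def hungarian(cost):
    """Min-cost assignment of every row to a distinct column (rows <= cols).

    Returns (match, total) where match[i] is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (m + 1)      # column potentials (1-based)
    p = [0] * (m + 1)        # p[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached a free column: augment
                break
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            match[p[j] - 1] = j - 1
    total = sum(cost[i][match[i]] for i in range(n))
    return match, total

# Rows are G1-nodes, columns are G2-nodes, entries are alignment costs.
costs = [[4, 1, 3],
         [2, 0, 5],
         [3, 2, 2]]
print(hungarian(costs))
```

When several assignments tie for the minimum, which one is returned depends on details such as row order, which is the implementation dependence the slide warns about.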

Finding a Few Optimal Alignments

• Lets us learn about all possible optimal matchings.

• Making H-GRAAL produce more alignments:
  – “Remove” (u,v): raise the alignment cost of a node pair (u,v) in the optimal alignment A0 to +∞.
  – Run H-GRAAL again.
  – If the alignment found has a higher cost than A0, “remove” a different edge instead.
  – If all edges have been tried and no other alignment of optimal cost was found, no more optimal alignments exist.

• This process has too high a complexity:
  – $O(|V(G_1)|^3 \times |E(G_1)|)$
  – A faster variant exists, $O(|V(G_1)|^2 \times |E(G_1)|)$, based on the dynamic Hungarian algorithm.
  – My remark: still very slow (can take months…)

Few Optimal Alignments - algorithm

• Optimizing aligned pair:
  – A pair that appears in at least one optimal alignment.

• The set of optimizing pairs:
  – Can be computed in at worst $O(n^4)$ time.
  – Can be easily parallelized.

My remark: too slow…

Few Optimal Alignments - Analysis

• Significance of an aligned pair:
  – Judged by the number of optimizing pairs per node u.
  – If (u,v) is the only optimizing pair for u, then every optimal alignment contains (u,v), i.e., (u,v) is highly significant.

• Core alignment:
  – The set of all such special optimizing pairs.
  – A large core alignment means a stable alignment.
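The definitions of optimizing pairs and the core alignment can be checked on a tiny instance by brute force. The Python sketch below enumerates every assignment, so it is only usable for a handful of nodes and is not the O(n⁴) procedure mentioned on the slides; the function name and the cost matrix are hypothetical.

```python
from itertools import permutations

def optimizing_pairs(cost, tol=1e-9):
    """Return (best_cost, optimizing_pairs, core) for a small cost matrix.

    Rows are G1-nodes and columns are G2-nodes (rows <= cols).
    """
    n, m = len(cost), len(cost[0])
    best, optimal = float("inf"), []
    for cols in permutations(range(m), n):
        total = sum(cost[i][cols[i]] for i in range(n))
        if total < best - tol:
            best, optimal = total, [cols]
        elif abs(total - best) <= tol:
            optimal.append(cols)
    # A pair is optimizing if it appears in at least one optimal alignment.
    pairs = {(i, c) for cols in optimal for i, c in enumerate(cols)}
    # Core alignment: pairs (u,v) where v is the only optimizing partner of u.
    partners = {}
    for i, c in pairs:
        partners.setdefault(i, set()).add(c)
    core = {(i, c) for i, c in pairs if len(partners[i]) == 1}
    return best, pairs, core

# Rows 0 and 1 are interchangeable, row 2 has a single good partner.
costs = [[1, 1, 5],
         [1, 1, 5],
         [5, 5, 1]]
best, pairs, core = optimizing_pairs(costs)
print(best, sorted(pairs), sorted(core))
```

In this example two optimal alignments exist, and only the pair (2, 2) is shared by both, so it is the whole core.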

Measures of alignment quality (1)

• Edge correctness (EC):
  – Percentage of edges in one graph that are aligned to edges in the other graph.

The following measures can only be computed when the “true alignment” is known:

• Node correctness (NC):
  – Percentage of nodes in one network that are correctly aligned to nodes in the other network.

• Interaction correctness (IC):
  – Percentage of interactions that are aligned correctly.

• IC is stricter than EC:
  – EC does not require that the alignment partners are the correct ones.
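Edge correctness is simple to compute once an alignment is available. The Python sketch below assumes EC is measured over the edges of the smaller network G1 (the slide only says "one graph"); the function name and the toy data are hypothetical.

```python
def edge_correctness(edges_g1, edges_g2, alignment):
    """Percentage of G1 edges whose aligned endpoints form an edge in G2.

    alignment maps G1-nodes to G2-nodes; edges are undirected pairs.
    """
    g2 = {frozenset(e) for e in edges_g2}
    conserved = sum(
        1
        for a, b in edges_g1
        if a in alignment and b in alignment
        and frozenset((alignment[a], alignment[b])) in g2
    )
    return 100.0 * conserved / len(edges_g1)

g1_edges = [("a", "b"), ("b", "c"), ("a", "c")]
g2_edges = [(1, 2), (2, 3), (3, 4)]
mapping = {"a": 1, "b": 2, "c": 3}
print(edge_correctness(g1_edges, g2_edges, mapping))
```

Here two of the three G1 edges land on G2 edges; the edge (a, c) maps to the node pair (1, 3), which is not an edge in G2.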

Measures of alignment quality (2)

• Usually the “true alignment” is not known:
  – So only EC can be measured.
  – Two alignments can have similar ECs even though one is “good” and the other is “bad”, so EC alone is not enough.

• To uncover regions of similar topology:
  – The aligned edges must cluster together and form large and dense connected subgraphs.

• Common connected subgraph (CCS):
  – A connected subgraph that appears in both networks.

• A good alignment has:
  – large and dense CCSs
  – a large EC
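One way to look for large common connected subgraphs is to keep only the conserved edges and take connected components. The Python sketch below does that under the assumption that a CCS need not be an induced subgraph; the function name and toy networks are hypothetical.

```python
from collections import defaultdict

def largest_ccs(edges_g1, edges_g2, alignment):
    """Return (nodes, edge_count) of the largest component of conserved edges.

    A G1 edge is conserved when its aligned endpoints form an edge in G2.
    Components are ranked by their number of conserved edges.
    """
    g2 = {frozenset(e) for e in edges_g2}
    adj = defaultdict(set)
    for a, b in edges_g1:
        if a in alignment and b in alignment \
                and frozenset((alignment[a], alignment[b])) in g2:
            adj[a].add(b)
            adj[b].add(a)
    best_nodes, best_edges, seen = set(), 0, set()
    for start in list(adj):
        if start in seen:
            continue
        # Depth-first search over conserved edges only.
        comp, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
        seen |= comp
        n_edges = sum(len(adj[x]) for x in comp) // 2
        if n_edges > best_edges:
            best_nodes, best_edges = comp, n_edges
    return best_nodes, best_edges

g1_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f")]
g2_edges = [(1, 2), (2, 3), (5, 6), (3, 9)]
mapping = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
print(largest_ccs(g1_edges, g2_edges, mapping))
```

Two alignments with the same EC can differ sharply here: scattered conserved edges give many tiny components, while clustered ones give a single large component.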

Statistical Significance

• Random alignment of real-world networks:
  – The probability of obtaining a given or better EC at random.

• Null model of random alignment:
  – Random mapping g: E1 → V2 × V2.
  – n1 = |V1|, n2 = |V2|, m1 = |E1|, and m2 = |E2|.
  – p = n2(n2 − 1)/2: the number of node pairs in G2.
  – EC = x%: the edge correctness of the given alignment.
  – k = [m1 × x]: the number of edges from G1 that are aligned to edges in G2.

• P: the probability of successfully aligning k or more edges by chance (the tail of the hypergeometric distribution):

$P = 1 - \sum_{i=0}^{k-1} \frac{\binom{m_2}{i}\binom{p - m_2}{m_1 - i}}{\binom{p}{m_1}}$
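The hypergeometric tail can be evaluated exactly with integer arithmetic. The Python sketch below assumes the form P = 1 − Σ_{i=0}^{k−1} C(m2, i) C(p − m2, m1 − i) / C(p, m1) with p = n2(n2 − 1)/2; the function name and the tiny numbers are hypothetical.

```python
from fractions import Fraction
from math import comb

def ec_p_value(n2, m1, m2, k):
    """Probability of aligning k or more of the m1 G1-edges to G2-edges by chance.

    n2: number of nodes in G2, m2: number of edges in G2.
    """
    p = n2 * (n2 - 1) // 2          # node pairs available in G2
    below = sum(
        Fraction(comb(m2, i) * comb(p - m2, m1 - i), comb(p, m1))
        for i in range(k)           # probability of fewer than k hits
    )
    return float(1 - below)

# G2 has 4 nodes (6 node pairs) and 3 edges; G1 has 2 edges.
print(ec_p_value(n2=4, m1=2, m2=3, k=1))
print(ec_p_value(n2=4, m1=2, m2=3, k=2))
```

Exact fractions avoid the rounding problems that appear when the binomial coefficients become very large, which they do for real PPI networks.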

More Statistical Significance Metrics

• H-GRAAL's alignment of random model networks:
  – Checks the significance of the alignment in comparison with alignments of random networks:
    • align two PPI networks,
    • align them with random networks,
    • compare the results.

• Biological validation:
  – Find the number of aligned protein pairs sharing a Gene Ontology (GO) term.
  – Compute its statistical significance.

• Significance of functional enrichments:
  – Align metabolic networks of different species.
  – Generate phylogenetic trees based on H-GRAAL's ECs.
  – Compute their statistical significance.

Results (1)

• H-GRAAL always produces better alignments than GRAAL, for all values of α.

• Using only degrees (α = 0) gives bad results.
  – So graphlet-based signatures are far more valuable than a measure based on degree alone.

Results (2)

• The largest common connected subgraph in the alignment of the yeast and human PPI networks:
  – It consists of 1,290 interactions amongst 317 proteins.
  – This network appears, in its entirety, in the PPI networks of both species.

Results (3)

• Statistics of H-GRAAL’s core yeast-human alignment for α = 0.5.

• The percentage of yeast proteins, out of 2,390 of them, that participate in n “optimizing pairs”.

• Shows the quality of H-GRAAL!

Results (4)

• Comparison of the phylogenetic trees for protists and fungi.

• H-GRAAL’s and GRAAL’s trees are slightly different from the sequence-based one.

• Sequence-based trees are built from:
  – multiple alignments of gene sequences
  – whole-genome alignments

Results (5)

• Multiple alignments have a few problems:
  – They can be misleading due to gene rearrangements, inversions, transpositions, and translocations (at the substring level).
  – Different species might have an unequal number of genes or genomes of vastly different lengths.

• Whole-genome alignments can be misleading:
  – Noncontiguous copies of a gene or non-decisive gene order.
  – The trees are built incrementally from smaller pieces that are “patched” together probabilistically, so probabilistic errors are expected.

• H-GRAAL’s and GRAAL’s trees have none of these problems, but:
  – there are noise problems;
  – PPI networks are incomplete.

• There is no reason to believe that the sequence-based tree or GRAAL’s one should a priori be considered the correct one.

Conclusions

• Presented the H-GRAAL algorithm for global alignment between networks.

• Presented different statistics to evaluate the quality of an alignment.

• Experimented with different PPI networks, and not only PPI networks.

• Showed that H-GRAAL is the best known global alignment algorithm.

• H-GRAAL can have a huge influence on research into biological networks!

Thank you for your attention!

