Discovering functional interaction patterns in Protein-Protein Interaction Networks
Authors: Mehmet E. Turanalp, Tolga Can
Presented By: Sandeep Kumar
Background
Availability of genome-scale protein networks
Understanding topological organization
Identification of conserved subnetworks across different species
Discover modules of interaction
Predict functions of uncharacterized proteins
Improve the accuracy of currently available networks
Aim of study
Use available functional annotations of proteins in a PPI network to look for overrepresented patterns of interaction in the network
Present a new frequent pattern identification technique, PPISpan
Yeast as a model
Why yeast genomics?
A model eukaryote organism …
Well known PPI network
Saccharomyces cerevisiae
PPI Network
A protein-protein interaction is shown by an edge between two proteins, indicating physical association in the form of modification, transport, or complex formation
Interesting conserved interaction patterns among species
Patterns correspond to specific biological processes
Frequent sub-graphs
A graph (subgraph) is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold
Example: Frequent Subgraphs
[Figure: a graph dataset containing graphs (A), (B), and (C), and the frequent patterns (1) and (2) found with a minimum support of 2]
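The definition of a frequent subgraph can be illustrated with the simplest case, a pattern made of a single labeled edge. The sketch below is a minimal illustration, not the slide's figure: the toy graphs, the labels, and the function names `edge_pattern` and `frequent_edge_patterns` are all hypothetical. Support is taken to be the number of graphs in the dataset that contain the pattern at least once.

```python
from collections import Counter


def edge_pattern(label_u, label_v):
    """Canonical form of a labeled 1-edge pattern (order-independent)."""
    return tuple(sorted((label_u, label_v)))


def frequent_edge_patterns(graphs, min_support):
    """Return 1-edge patterns that occur in at least min_support graphs.

    Each graph is a (labels, edges) pair: labels maps vertex -> label,
    and edges is an iterable of (u, v) vertex pairs.
    """
    support = Counter()
    for labels, edges in graphs:
        # Use a set so a pattern counts once per graph, however many
        # times it occurs inside that graph.
        seen = {edge_pattern(labels[u], labels[v]) for u, v in edges}
        support.update(seen)
    return {p: s for p, s in support.items() if s >= min_support}


# Toy dataset, made up for illustration only.
graphs = [
    ({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),
    ({1: "A", 2: "B"}, [(1, 2)]),
    ({1: "B", 2: "C", 3: "C"}, [(1, 2), (2, 3)]),
]
result = frequent_edge_patterns(graphs, min_support=2)
```

With a minimum support of 2, the A-B and B-C edges are frequent, while the C-C edge, which appears in only one graph, is not.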
The Algorithm - PPISpan
Based on gSpan
Modified to suit PPI networks
Candidate generation
Frequency counting
Algorithm: PPISpan (G, L, minSup)
1. Set the vertex labels in G with GO terms from the desired GO level L
2. S <- all frequent 1-edge graphs in G in frequency-based lexicographical order
3. for each edge e in S (in ascending order of frequency) do
4.     Subgraphs (e, minSup, e)
5.     Remove e from G
Algorithm: Subgraphs (s, minSup, ext)
1. if (feasible (s, ext))
2.     if DFS code of s != its minimum DFS code
3.         return
4.     C <- generate all children of s (by growing an edge, ext)
5.     maximal <- true
6.     for each c in C (in DFS lexicographical order) do
7.         if support (c) >= minSup
8.             Subgraphs (c, minSup, c.ext)
9.             maximal <- false
10.     if (maximal)
11.         output s
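Steps 1 and 2 of PPISpan (labeling vertices and collecting the frequent 1-edge graphs) can be sketched as follows. This is a simplified illustration and not the authors' C++ implementation: the function names, the protein identifiers, and the `"unknown"` fallback label are assumptions, and frequency is approximated by the raw number of edges carrying a label pair rather than by counting non-overlapping embeddings. The recursive pattern growth in Subgraphs, including the minimum DFS code check, is not covered here.

```python
from collections import Counter


def label_vertices(proteins, go_terms):
    """Step 1: map each protein to its GO term ('unknown' if unannotated)."""
    return {p: go_terms.get(p, "unknown") for p in proteins}


def frequent_one_edge_patterns(edges, labels, min_sup):
    """Step 2: labeled 1-edge patterns with frequency >= min_sup.

    Returned in ascending order of frequency, ties broken
    lexicographically by the label pair.
    """
    counts = Counter(
        tuple(sorted((labels[u], labels[v]))) for u, v in edges
    )
    frequent = [(p, n) for p, n in counts.items() if n >= min_sup]
    return sorted(frequent, key=lambda item: (item[1], item[0]))


# Hypothetical toy network; P5 has no annotation.
proteins = ["P1", "P2", "P3", "P4", "P5"]
go_terms = {
    "P1": "hydrolase",
    "P2": "transferase",
    "P3": "hydrolase",
    "P4": "transferase",
}
edges = [
    ("P1", "P2"), ("P3", "P4"), ("P1", "P4"),
    ("P2", "P5"), ("P4", "P5"), ("P1", "P3"),
]

labels = label_vertices(proteins, go_terms)
seeds = frequent_one_edge_patterns(edges, labels, min_sup=2)
```

Processing seeds from least to most frequent, and removing each seed edge from the graph once it has been explored, keeps the search from rediscovering the same pattern from a more common starting edge.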
Datasets used
1. Database of Interacting Proteins (DIP): data constructed from high-throughput experiments
2. STRING database: confidence-weighted predicted data
3. WI-PHI: weighted yeast interactome enriched for direct physical interactions
Gene Ontology annotations
o Used to assign functional category labels to the proteins in PPI network
o Collaborative effort to address the need for consistent descriptions of gene products across different databases
o Provides descriptions of biological processes, cellular components, and molecular functions
GO slim terms
Provides a broad overview of the functional categories in GO
GO Slim Molecular Function Terms for S. cerevisiae
Term ID     Definition
GO:3674     molecular function unknown
GO:16787    hydrolase activity
GO:16740    transferase activity
GO:5515     protein binding
…
Total of 22 broad functional categories
Research Steps
o Label the nodes with functional categories from GO annotations
o Consider the molecular function hierarchy
o Focus on functional interaction patterns in arbitrary topologies
o Find non-overlapping embeddings using PPISpan
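One way to read the last step is that the support of a pattern is the number of its embeddings that do not overlap one another. The sketch below assumes that "non-overlapping" means sharing no protein, and it uses a simple greedy pass that keeps the first embedding it sees and skips any later one that reuses a protein. Both the interpretation and the greedy strategy are assumptions made for illustration; the slides do not say how PPISpan selects embeddings, and a greedy pass does not always find the largest possible set.

```python
def non_overlapping(embeddings):
    """Greedily keep embeddings that share no protein with those kept so far."""
    used = set()
    kept = []
    for emb in embeddings:
        proteins = set(emb)
        if used.isdisjoint(proteins):
            kept.append(emb)
            used |= proteins
    return kept


# Hypothetical embeddings of one 3-protein pattern.
embeddings = [
    ("P1", "P2", "P3"),
    ("P3", "P4", "P5"),  # reuses P3, skipped
    ("P6", "P7", "P8"),
    ("P2", "P6", "P9"),  # reuses P2 and P6, skipped
]
kept = non_overlapping(embeddings)
support = len(kept)
```

Here four embeddings are found but only two are mutually disjoint, so the pattern's support under this reading is 2.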
Problems faced
o Noise in PPI network
o False positives
o False negatives
o Accuracy and specificity of annotations of proteins
Supporting embedding
o Specific instance of the functional pattern realized by certain proteins in the PPI network
Experiment details
o Implemented in C++
o Searched for frequent interaction patterns of support >= 15
Pattern frequency in different datasets
Number of patterns found
Observation
Most of the patterns are trees
Star topology most abundant
Cycles rare
Comparison with known molecular complexes and pathways
Ignore topology and treat patterns as sets of proteins for comparison
Molecular complexes from MIPS (Munich Information Center for Protein Sequences) complex catalogue database
Signaling, transport, and regulatory pathways from KEGG database
Use high quality complexes
cpcount
o Average number of different complexes or pathways that the embeddings of a frequent interaction pattern overlap with
o To speculate on the location of interacting patterns
cpoverlap
o Quantifies the overlap between proteins in an embedding and known complexes and pathways
o Ratio of proteins in an embedding that are members of known functional modules
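The two measures, cpcount and cpoverlap, can be sketched directly from the definitions above. The code treats an embedding as a set of proteins and a known complex or pathway as a named set of proteins. The function signatures, the module names, and the toy data are hypothetical, and the exact formulas used in the study may differ in detail (for example, in how values are averaged across embeddings).

```python
def cpoverlap(embedding, modules):
    """Fraction of the embedding's proteins found in at least one module."""
    members = set().union(*modules.values())
    proteins = set(embedding)
    return len(proteins & members) / len(proteins)


def cpcount(embeddings, modules):
    """Average number of distinct modules that each embedding overlaps with."""
    hits = [
        sum(1 for m in modules.values() if set(emb) & m)
        for emb in embeddings
    ]
    return sum(hits) / len(hits)


# Hypothetical known modules (complexes or pathways) and embeddings.
modules = {
    "complex_A": {"P1", "P2", "P3"},
    "pathway_B": {"P3", "P4"},
}
embeddings = [
    ("P1", "P3", "P9"),
    ("P4", "P7", "P8", "P10"),
]

overlaps = [cpoverlap(emb, modules) for emb in embeddings]
avg_count = cpcount(embeddings, modules)
```

A high cpoverlap with a low cpcount suggests that a pattern sits inside a single complex or pathway, whereas a higher cpcount suggests that its embeddings span several modules.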
Observations from comparison
o For some of the observed patterns, topology is more important than underlying functional annotations
o Comparison of all the patterns with random patterns in terms of overlap with MIPS complexes
o Comparison of all the patterns with random patterns in terms of overlap with transport and signaling pathways
Analysis of patterns with MIPS complexes
o Selected patterns from DIP and WI-PHI networks
o Selected patterns from the STRING network
o cpoverlap of selected patterns with respect to MIPS complexes
o cpcount of selected patterns with respect to MIPS complexes
Analysis of patterns with KEGG pathways
o Selected patterns from DIP, STRING and WI-PHI networks
o cpoverlap of selected patterns with respect to transport and signaling pathways
o cpcount of selected patterns with respect to transport and signaling pathways
Some interesting Functional interaction patterns
o A frequent functional interaction pattern in the DIP network
o A frequent functional interaction pattern in the WI-PHI network
o A functional interaction pattern related to the MAPK signaling pathway
o A functional interaction pattern related to the SNARE interactions in vesicular transport
Conclusions
o Proposed new frequent pattern identification technique, PPISpan
o Utilized molecular function Gene Ontology annotations to assign non-unique labels to proteins of a PPI network
o Identified significantly frequent functional interaction patterns
o Frequent patterns offer a new perspective into the modular organization of protein-protein interaction networks
QUESTIONS?
THANK YOU