Date posted: 11-Apr-2017
Category: Science
Uploaded by: rsg-luxembourg
22 March 2017
Background & Aim
• There is more and more (genome-wide) data available that is still not optimally used
• Genome-wide networks are too big and complex to be interpreted in a meaningful way
• Knowledge-based networks are in general non-specific: e.g. canonical pathways, PPI networks…
Develop a flexible method to identify context-specific subnetworks
Approach
• Model the flow of information using chains of interactions
• Chains = simple paths: sequences of interactions (e.g. protein modifications) that connect one start point and one end point
• Multiple chains can exist between a pair of start and end proteins: which subnetwork is the most meaningful?
• Prioritization of the chains based on many possible scores: gene expression, functional module identification, …
• Here they present a general tool for combining multiple types of biological information as chain scores: ChainRank
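To make the "chains = simple paths" idea concrete, here is a minimal sketch of enumerating chains by depth-first search. It is not taken from ChainRank itself: the function name `simple_chains`, the `max_len` cap, the directed adjacency-dict representation, and the toy node names are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Set

# Assumed representation: directed adjacency, node -> set of successors.
Graph = Dict[str, Set[str]]


def simple_chains(graph: Graph, start: str, end: str, max_len: int) -> Iterator[List[str]]:
    """Yield every simple path (no repeated node) from start to end
    that uses at most max_len interactions (edges)."""
    path = [start]
    on_path = {start}

    def extend(node: str) -> Iterator[List[str]]:
        if node == end:
            yield list(path)  # copy, because path is mutated while backtracking
            return
        if len(path) - 1 >= max_len:
            return  # edge budget used up before reaching the end node
        for nxt in sorted(graph.get(node, ())):
            if nxt in on_path:
                continue  # revisiting a node would break the "simple path" rule
            path.append(nxt)
            on_path.add(nxt)
            yield from extend(nxt)
            path.pop()
            on_path.remove(nxt)

    yield from extend(start)


# Toy network with hypothetical node names: R = start protein, T = end protein.
toy = {
    "R": {"A", "B"},
    "A": {"B", "T"},
    "B": {"T"},
    "T": set(),
}
chains = list(simple_chains(toy, "R", "T", max_len=3))
short_chains = list(simple_chains(toy, "R", "T", max_len=2))
```

With `max_len=3` the toy network yields three chains between R and T; tightening the cap to 2 drops the longest one, which is why a length limit matters on networks with thousands of nodes.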
Methods
1. Search for all chains between user-defined start and end nodes in the network
2. Annotate the nodes with scores in order to calculate chain scores and p-values
Subnetwork
Restrict the network by heuristic breadth-first search from the fixed initial proteins to the final one, with 2 criteria:
1. Maximal length allowed = length of the shortest path between initial and final node
2. Prefer the integration of highly connected proteins (canonical signaling interactors)
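The two criteria above can be sketched as follows. The slides do not say how the preference for highly connected proteins is applied, so this sketch assumes one possible reading: keep only nodes lying on a shortest start-to-end path, and optionally keep the `beam` highest-degree nodes per breadth-first layer. The names `restrict`, `bfs_dist`, `reverse`, and `beam` are hypothetical, and the network is assumed to be directed.

```python
from collections import deque
from typing import Dict, Optional, Set

Graph = Dict[str, Set[str]]  # assumed: directed adjacency, node -> successors


def bfs_dist(graph: Graph, source: str) -> Dict[str, int]:
    """Breadth-first distances (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def reverse(graph: Graph) -> Graph:
    """Return the graph with every edge direction flipped."""
    rev: Graph = {node: set() for node in graph}
    for node, succs in graph.items():
        for nxt in succs:
            rev.setdefault(nxt, set()).add(node)
    return rev


def restrict(graph: Graph, start: str, end: str, beam: Optional[int] = None) -> Graph:
    """Keep nodes on shortest start->end paths; per BFS layer, optionally keep
    only the `beam` nodes with the highest degree (ties broken by name)."""
    from_start = bfs_dist(graph, start)
    if end not in from_start:
        return {}
    rev = reverse(graph)
    to_end = bfs_dist(rev, end)
    limit = from_start[end]  # criterion 1: shortest-path length is the cap
    degree = {n: len(graph.get(n, ())) + len(rev.get(n, ())) for n in rev}
    layers: Dict[int, list] = {}
    for node, d in from_start.items():
        if node in to_end and d + to_end[node] == limit:
            layers.setdefault(d, []).append(node)
    kept: Set[str] = set()
    for nodes in layers.values():
        nodes.sort(key=lambda n: (-degree[n], n))  # criterion 2: prefer hubs
        kept.update(nodes if beam is None else nodes[:beam])
    # Keep only edges that advance exactly one layer between kept nodes.
    return {
        n: {m for m in graph.get(n, ())
            if m in kept and from_start[m] == from_start[n] + 1}
        for n in kept
    }


# Toy network with hypothetical node names.
toy = {
    "R": {"A", "B", "C"},
    "A": {"T", "B"},
    "B": {"T", "E"},
    "C": {"D"},
    "D": {"T"},
    "E": set(),
    "T": set(),
}
full = restrict(toy, "R", "T")
hubs_only = restrict(toy, "R", "T", beam=1)
```

In the toy network, C and D are dropped because R-C-D-T is longer than the shortest path, and with `beam=1` only B survives in the middle layer because it has the highest degree. A per-layer cut like this can disconnect the result on real data, so it should be read as an illustration of the heuristic rather than a safe default.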
Scoring scheme
• Chain score = [formula not preserved]
• Node scores used
1. Localisation: mean expression variability across studied tissue vs. mean expression variability across all others -> gene expression
2. Relevance: occurrence of each protein among the significant ones across studies -> gene expression, protein modifications, metabolism…
3. Connectivity: degree centrality -> topology
• Combination of scores
1. Weighted product of normalized scores
2. Filtering: pre-filter chains by score S1 and rank them by score S2
3. Intersection: keep only chains that pass the filter on all scores
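The three combination strategies listed above can be sketched in a few lines. This is an illustration under assumptions, not the tool's actual implementation: chain scores are taken to be already normalized to [0, 1] with higher meaning better, thresholds are applied as "score >= threshold", and the function names, weights, and example values are all hypothetical.

```python
from math import prod
from typing import Dict, List

# Assumed layout: chain id -> {score name -> normalized value in [0, 1]}.
Scores = Dict[str, Dict[str, float]]


def weighted_product(scores: Scores, weights: Dict[str, float]) -> Dict[str, float]:
    """Strategy 1: combine scores as a product, each raised to its weight."""
    return {
        chain: prod(values[name] ** w for name, w in weights.items())
        for chain, values in scores.items()
    }


def filter_then_rank(scores: Scores, filter_by: str, threshold: float,
                     rank_by: str) -> List[str]:
    """Strategy 2: pre-filter on score S1, then rank survivors by score S2."""
    kept = [c for c, values in scores.items() if values[filter_by] >= threshold]
    return sorted(kept, key=lambda c: scores[c][rank_by], reverse=True)


def intersection(scores: Scores, thresholds: Dict[str, float]) -> List[str]:
    """Strategy 3: keep only chains that pass the threshold on every score."""
    return [
        c for c, values in scores.items()
        if all(values[name] >= t for name, t in thresholds.items())
    ]


# Made-up example values for three hypothetical chains.
example = {
    "c1": {"localisation": 0.9, "relevance": 0.2, "connectivity": 0.5},
    "c2": {"localisation": 0.6, "relevance": 0.8, "connectivity": 0.4},
    "c3": {"localisation": 0.3, "relevance": 0.9, "connectivity": 0.9},
}
combined = weighted_product(example, {"localisation": 1.0, "relevance": 1.0})
ranked = filter_then_rank(example, "localisation", 0.5, "relevance")
passing = intersection(example, {"localisation": 0.5, "relevance": 0.5})
```

The example shows how the strategies can disagree: c1 has the best localisation but survives filtering only to be ranked last by relevance, and it fails the intersection entirely.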
Results
• Application to chronic obstructive pulmonary disease (COPD)
• Network used: experimental interactions from different public databases + COPD knowledge base (10k nodes, 62k interactions)
• Significance: comparison to chains in random networks
• Evaluation: enrichment of the top-ranked chains in gold standard pathway proteins
• Improvement metric: [formula not preserved]
Localisation: expression variability across studied tissue vs. across all others
Relevance: occurrence of each protein among the significant ones across COPD-related studies
Connectivity: degree centrality
Combination by weighted product: no improvement
Filtering: connectivity < 0.05, ranked by localisation
Intersection: connectivity and localisation
Filtering: top quartile localisation, ranked by relevance
Intersection: localisation and relevance
IGF-Akt proximity subnetwork
MAPK proximity subnetwork
Results for the best 50 chains
Other methods: recall 50-85%, precision 18-42%
Here (max): recall 67%, precision 30%
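For readers who want to see how recall and precision over the best-ranked chains could be computed, here is a minimal sketch. The slides do not state exactly which proteins are counted, so this assumes that every protein appearing in the top k chains (including start and end nodes) counts as a prediction. The function name and the example data are hypothetical.

```python
from typing import List, Set, Tuple


def precision_recall(ranked_chains: List[List[str]], gold: Set[str],
                     k: int) -> Tuple[float, float]:
    """Precision and recall of the proteins covered by the top-k chains,
    measured against a gold standard set of pathway proteins."""
    predicted: Set[str] = set()
    for chain in ranked_chains[:k]:
        predicted.update(chain)
    hits = predicted & gold
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


# Made-up ranking and gold standard, for illustration only.
ranked_example = [
    ["R", "A", "T"],
    ["R", "B", "T"],
    ["R", "C", "D", "T"],
]
gold_example = {"A", "T", "X"}
precision, recall = precision_recall(ranked_example, gold_example, k=2)
```

Here the top two chains cover four proteins, two of which are in the gold standard, so precision is 2/4 and recall is 2/3.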
Conclusions and claims
• 50% improvement in finding gold standard proteins (compared to random), and combining scores is even better (x2.5)
• 11% improvement of the AUC (compared to random)
• Generic tool applicable to different network types (GRN, metabolic networks)
• Importance of selected scores based on scientific question
• Applications
  • Causal, mechanistic connection?
  • Common mechanisms driving different diseases
  • Reduce the computational models
  • Synthetic lethality