Integrating protein-protein interaction data: navigating the maze Shoshana J. Wodak VIB Structural Biology Research Center, VUB, Brussels Belgium [email protected] Omics data integration, Gent Nov 19, 2018
Transcript
Page 1: Integrating protein-protein interaction data: navigating ... · Omics data integration, Gent Nov 19, 2018 . Genome-scale protein interaction (PPI) networks: an embarrassment of riches

Integrating protein-protein interaction data: navigating the maze

Shoshana J. Wodak

VIB Structural Biology Research Center, VUB, Brussels Belgium

[email protected]

Omics data integration, Gent Nov 19, 2018

Page 2

Genome-scale protein interaction (PPI) networks: an embarrassment of riches

Hairy monster: typical PPI network (yeast, human, fly...)

Over 30 PPI networks derived from experiments (yeasts, human, E. coli, D. melanogaster, C. elegans, P. falciparum and more)

25 PPI networks (and counting) inferred by computational methods

In the last 15 years

Page 3

Predict protein function

Model evolutionary processes

Predict disease associations

Interpret information on mutations

Interpret information on phenotype perturbations

Use as restraints in multiscale modeling

Build 3D models

No information on stoichiometry; limited or absent temporal, spatial and functional information. MUST MAKE MEANINGFUL USE OF THE DATA

Interactions explain everything, do they?

Page 4

$$A + B \underset{k_d}{\overset{k_a}{\rightleftharpoons}} C$$

$$K_d = \frac{k_d}{k_a} = \frac{[A][B]}{[C]}$$

$K_d$: equilibrium dissociation constant (molar units)

$\Delta G_d = -RT \ln(K_d / c^\circ)$: Gibbs free energy of dissociation

($RT$: thermal energy; standard state $c^\circ = 1\,\mathrm{M}$)

$K_d$ and $\Delta G_d$ quantify the binding affinity. Their values determine whether the complex is formed given the component concentrations.

The dynamics and time scales are governed by the rate constants $k_a$ (bimolecular) and $k_d$ (monomolecular):

•  it takes $\tau_a = 1/(k_a [A])$ to form a complex ([A] in excess)

•  the complex has a life-time $\tau_d = 1/k_d$

Adapted from J. Janin, 2014

Binding affinities and rates

Genome-wide studies give a YES / NO answer to the question: do proteins A and B form a complex?

Yet PPIs are dynamic and subject to the law of mass action!
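
The relations on this slide can be tried out numerically. The following is a minimal sketch, not taken from the talk: the function names and the example rate constants are illustrative assumptions, and the complex concentration is obtained by solving the mass-action equation Kd = [A][B]/[C] for given total concentrations.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_summary(ka, kd, conc_a, temperature=298.15, c0=1.0):
    """Summarise a binding reaction A + B <-> C.

    ka: association rate constant (M^-1 s^-1)
    kd: dissociation rate constant (s^-1)
    conc_a: concentration of A in M, assumed to be in excess
    """
    Kd = kd / ka                                    # equilibrium dissociation constant (M)
    dG_d = -R_KJ * temperature * math.log(Kd / c0)  # free energy of dissociation (kJ/mol)
    tau_a = 1.0 / (ka * conc_a)                     # time to form a complex (s)
    tau_d = 1.0 / kd                                # life-time of the complex (s)
    return {"Kd": Kd, "dG_d": dG_d, "tau_a": tau_a, "tau_d": tau_d}


def complex_concentration(total_a, total_b, Kd):
    """Equilibrium [C] from mass action: Kd = (At - C) * (Bt - C) / C.

    This is the smaller root of C^2 - (At + Bt + Kd) * C + At * Bt = 0.
    """
    s = total_a + total_b + Kd
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0


# Illustrative (assumed) values: ka = 1e6 M^-1 s^-1, kd = 1e-3 s^-1, [A] = 1 uM
summary = binding_summary(ka=1e6, kd=1e-3, conc_a=1e-6)
bound = complex_concentration(total_a=1e-6, total_b=1e-6, Kd=summary["Kd"])
print(summary, bound)
```

The second function makes the mass-action point explicit: the same pair of proteins can be mostly bound or mostly free depending on their concentrations, which a YES / NO record cannot express.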

Page 5

Kd (measurable range):   1 M | 1 mM | 1 µM | 1 nM | 1 pM

τd:   < microsecond | millisecond | second | hour | days

Interaction:   random | short-lived | transient | stable | permanent

Specificity:   non-specific → specific

Types of interaction placed along the scale: crystal packing, cell adhesion, assembly, redox complexes, signal transduction, antigen-antibody, enzyme-substrate, enzyme-inhibitor, weak dimers, oligomeric proteins

The functional role of a PPI depends on Kd and the life-time τd = 1/kd

PPI in the cell: wide range of binding affinities & life-times

Adapted from J. Janin, 2014

Page 6

Experimentally derived genome-scale PPI datasets: prominent examples

[Figure: schematic of the experimental methods. Co-complex: AP/MS (labels Y, A, B, C, D, E; LC/MS) and Co-fractionation/MS + massive data integration. Binary (A, B): Y2H, Split Ubiquitin (Membrane Y2H) and PCA. Other labels: nucleus, cytosol, ≤80 Å.]

Rolland et al. | human | 2014 | Y2H | 4300 | 14000 | NA
Hein et al. | human | 2015 | AP-MS | 5400 | 28500 | 195
Yang et al. | human | 2016 | Y2H | (248-if & 381) 629 | 1043 | NA

Page 7

[Figure: four Venn diagrams comparing yeast networks, by interactions and by proteins. Region counts are listed per diagram.]

Y2H (union) / PCA (2008) / AP-MS (Babu et al., 2012): 758, 268, 380, 310, 778, 120, 1407

Y2H (union) / BioGRID HC / AP-MS (Babu et al., 2012): 523, 360, 966, 858, 230, 355, 821

Y2H (union) / PCA (2008) / AP-MS (Babu et al., 2012): 2394, 2264, 207, 42, 248, 21, 12846

Y2H (union) / BioGRID HC / AP-MS (Babu et al., 2012): 2074, 2690, 3119, 268, 22, 341, 9934

Limited overlap of interaction networks from different experimental methods (yeast)

Page 8

Why is the overlap so limited?

Different methods probe complementary subspaces of the interactome: does AP-MS probe mainly ‘stable’ interactions, and Y2H/PCA more transient ones? Are there biases for proteins in different cellular processes?

Network quality and coverage vary for different methods: does AP-MS have a higher rate of false positives (FP), and Y2H a higher rate of false negatives (FN)?

Co-complex associations (AP-MS) ≠ binary interactions (Y2H/PCA...)

Is there a sampling problem? If so, why?

Vlasblom et al., Curr. Opin. Struct. Biol. (2013); Pu et al., J. Proteomics (2015)
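
Overlap between networks is usually counted on undirected protein pairs, so that A-B and B-A are the same interaction. The sketch below shows one way to do this; the network contents are toy data and the function names are assumptions made for illustration, not the analysis used in the talk.

```python
from itertools import combinations


def as_edge_set(pairs):
    """Turn (protein, protein) tuples into a set of undirected edges."""
    return {frozenset(pair) for pair in pairs}


def pairwise_overlap(networks):
    """Count shared edges for every pair of named networks."""
    overlap = {}
    for (name1, edges1), (name2, edges2) in combinations(networks.items(), 2):
        overlap[(name1, name2)] = len(edges1 & edges2)
    return overlap


# Toy data only: letters stand for proteins.
networks = {
    "Y2H": as_edge_set([("A", "B"), ("B", "C")]),
    "PCA": as_edge_set([("C", "B"), ("C", "D")]),
    "AP-MS": as_edge_set([("B", "A"), ("C", "D"), ("D", "E")]),
}
print(pairwise_overlap(networks))
```

Note that a count like this treats co-complex associations and binary interactions as if they were the same kind of edge, which is exactly the caveat raised on this slide.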

Page 9

The challenge of deriving the network (AP-MS)

Raw co-complex data (~700,000 PPI)

Scoring methods: a plethora of methods (HGScore, SAINT, PE, ComPASS, HART, Dice, etc.)

High Confidence (HC) Co-complex Network (~13,000 PPI)

(Soluble PPI, yeast)

[Figure: AP/MS schematic with labels Y, A, B, C, D, E and LC/MS]
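
To give a feel for what a scoring step does, here is a simplified sketch of a Dice-style co-purification score followed by a cutoff. It is an illustration under stated assumptions, not the published implementation of any of the methods named above: the input layout, the function names and the threshold value are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dice_scores(purifications):
    """Score protein pairs by how consistently they co-purify.

    purifications: dict mapping a purification id to the set of proteins seen in it.
    Score for (p, q) = 2 * (purifications with both) / (purifications with p + with q).
    """
    seen_in = defaultdict(set)
    for pur_id, proteins in purifications.items():
        for protein in proteins:
            seen_in[protein].add(pur_id)

    scores = {}
    for p, q in combinations(sorted(seen_in), 2):
        shared = len(seen_in[p] & seen_in[q])
        if shared:
            scores[(p, q)] = 2.0 * shared / (len(seen_in[p]) + len(seen_in[q]))
    return scores


def high_confidence(scores, threshold):
    """Keep only pairs whose score reaches the (assumed) cutoff."""
    return {pair for pair, score in scores.items() if score >= threshold}


# Toy purification data: letters stand for proteins.
raw = {
    "pur1": {"A", "B", "C"},
    "pur2": {"A", "B"},
    "pur3": {"C", "X"},
    "pur4": {"A", "X"},
}
scores = dice_scores(raw)
print(high_confidence(scores, threshold=0.6))
```

Even in this toy case most raw pairs fall below the cutoff, which mirrors the large reduction from raw co-complex data to the HC network.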

Page 10

‘Quality’ assessment of a PPI network (yeast membrane, 2012)

Babu et al. Nature 2012

[Figure panels: comparison to gold standard PPI (GO annotations), with labels TAP-MS, Y2H and Random; correlation of mRNA expression profiles; experimental verification by other methods]

Yeast integrated PPI network, Babu et al. Nature 2012

Page 11

[Figure: log10(Relative Annotation Frequency) for Y2H, PCA and AP-MS; density versus log10(protein abundance)]

Biases of different methods

Biases towards different cellular processes, or in sampling co-complex associations, can be rationalized.

The bias towards high-abundance proteins (PCA & AP-MS) is expected in the raw data (long history of contaminants), but not in the HC networks! It is by far the most consequential, since abundant proteins are more likely to form non-specific interactions.

Wodak et al., Curr. Opin. Struct. Biol. (2013)

Page 12

Different PPI networks may yield different results.

PPIs of yeast soluble proteins

[Figure: networks compared: HC-Yeast BioGRID network and integrated HC-Yeast HTP network; node labels: Hub, End, ?]

Mauricio Macossay et al. Submitted

Page 13

Over 100 databases specialize in curating information on functional and physical interactions from publications describing small-scale and large-scale studies:
- Contain unique as well as redundant information
- May focus on different areas of biology
- Different coverage of the literature
- Differences in cross-referencing genes & proteins
- Different conventions for representing interactions

How can one obtain a comprehensive view of currently known interactions?

Literature curated protein-protein interaction data

Page 14

iRefWeb: consolidated PPI data

Source databases (labels in figure): OPHID, CORUM, InnateDB, MatrixDB, MPIDB

iRefIndex consolidation: Ian Donaldson, UK (VIB-Bioinfo core)

iRefWeb portal (URL: Wodaklab.org/irefweb)

iRefWeb (iRefIndex V13)

Interactions: Total: 509,876; Human: 222,465

Proteins: Total: 91,645; Human: 18,841

- Tracks source DBs and PubMed IDs for each PPI
- Matches proteins on the basis of amino acid sequence + taxon
- Total of 81,132 PubMed IDs
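
Matching on sequence plus taxon means that records from different databases are treated as the same protein when they carry the same amino acid sequence in the same organism, whatever accession each database uses. A minimal sketch of that idea follows. The key construction (a SHA-1 digest of the sequence with the taxon identifier appended) and all names and records are assumptions made for illustration; the slide does not spell out the exact procedure.

```python
import base64
import hashlib
from collections import defaultdict


def protein_key(sequence, taxon_id):
    """Build a database-independent key from sequence and taxon (assumed scheme)."""
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=") + str(taxon_id)


def group_records(records):
    """Group (accession, sequence, taxon_id) records that describe the same protein."""
    groups = defaultdict(list)
    for accession, sequence, taxon_id in records:
        groups[protein_key(sequence, taxon_id)].append(accession)
    return dict(groups)


# Hypothetical records: accessions, sequences and taxon ids are placeholders.
records = [
    ("dbX:0001", "MKTAYIAKQR", 1111),
    ("dbY:abcd", "mktayiakqr", 1111),   # same sequence and taxon, other database
    ("dbZ:9999", "MKTAYIAKQR", 2222),   # same sequence, different taxon
]
for key, accessions in group_records(records).items():
    print(key, accessions)
```

With such a key, consolidation reduces to grouping records by key, while each group can still list the source databases it came from.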

Page 15

PSICQUIC: ‘real time’ database federator

iRefIndex V15

Page 16

The importance of standards for data representation

Page 17

PSI-MITAB 2.5 format
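
PSI-MITAB 2.5 is a tab-delimited text format with 15 columns per binary interaction; multiple values in a column are separated by "|" and an empty column is written as "-". A minimal parsing sketch is given below. The short column names are labels chosen here for readability, the example line uses made-up identifiers, and the naive split on "|" assumes no quoted value contains that character.

```python
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_id_a", "alt_id_b", "alias_a", "alias_b",
    "detection_method", "first_author", "publication_id",
    "taxid_a", "taxid_b", "interaction_type", "source_db",
    "interaction_id", "confidence",
]


def parse_mitab25_line(line):
    """Parse one MITAB 2.5 line into a dict of column name -> list of values."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(MITAB25_COLUMNS):
        raise ValueError(f"expected {len(MITAB25_COLUMNS)} columns, got {len(fields)}")
    record = {}
    for name, value in zip(MITAB25_COLUMNS, fields):
        # "-" marks an empty column; "|" separates multiple values (naive split)
        record[name] = [] if value == "-" else value.split("|")
    return record


# Made-up example line: identifiers and publication are placeholders.
example = "\t".join([
    "example:protA", "example:protB", "-", "-", "-", "-",
    'psi-mi:"MI:0018"(two hybrid)', "Doe et al. (2000)", "pubmed:00000000",
    "taxid:559292(yeast)", "taxid:559292(yeast)",
    'psi-mi:"MI:0915"(physical association)', 'psi-mi:"MI:0469"(IntAct)',
    "-", "-",
])
record = parse_mitab25_line(example)
print(record["id_a"], record["id_b"], record["detection_method"])
```

Keeping every column as a list makes records from different databases directly comparable, since some sources fill a column with several cross-references and others with one or none.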

Page 18

How consistent is the information curated by different databases?

Turinsky A. et al. Donaldson I. and Wodak SJ. Database (Oxford) 2010

Page 19

Measuring consistency between pairwise co-citations

Sorensen-Dice Similarity Coefficient: size of overlap over average set size, $S = 2\,|X \cap Y| / (|X| + |Y|)$ for two sets $X$ and $Y$

Example: sets of annotated protein-protein interactions curated from the same publication

DB1: A-B, A-C
DB2: A-B, D-C

PPI overlap: A-B, so Sppi = 2·1 / (2 + 2) = 1/2

Protein overlap: A, B, C (DB1: A, B, C; DB2: A, B, C, D), so Sprot = 2·3 / (3 + 4) = 6/7

Analyzed 15,471 shared publications co-curated by two or more of 9 major public PPI DBs.

When curating the same publication, on average two databases fully agree on 42% of the interactions and 62% of the proteins.

Big variation of agreement levels for different organism categories.
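
The two values on this slide, Sppi = 1/2 and Sprot = 6/7, can be reproduced with a few lines of code. This is a small sketch using exact fractions; the function names are chosen here for illustration, and the assignment of the two interaction sets to DB1 and DB2 is an assumption (the coefficient is symmetric, so it does not affect the result).

```python
from fractions import Fraction


def sorensen_dice(x, y):
    """Sorensen-Dice similarity: size of overlap over average set size."""
    x, y = set(x), set(y)
    if not x and not y:
        return Fraction(1)  # two empty sets are treated as identical
    return Fraction(2 * len(x & y), len(x) + len(y))


def proteins_of(ppis):
    """Collect the proteins that take part in a set of interactions."""
    return {protein for pair in ppis for protein in pair}


# Interactions two databases curated from the same publication (slide example)
db1 = {frozenset("AB"), frozenset("AC")}
db2 = {frozenset("AB"), frozenset("DC")}

s_ppi = sorensen_dice(db1, db2)
s_prot = sorensen_dice(proteins_of(db1), proteins_of(db2))
print(s_ppi, s_prot)
```

The example shows why protein-level agreement is typically higher than interaction-level agreement: two databases can list nearly the same proteins and still pair them differently.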

Page 20

Agreement and overlap between databases

Turinsky et al., Nat. Biotech, 2011

Page 21

[Figure: two panels, "Both proteins from same organism" and "One protein from other organisms"; each shows interactions curated from shared publications]

The Babel tower of organism assignment

Turinsky et al., Nat. Biotech, 2011

Page 22

Inconsistencies in Recording PPIs From HTP Studies

Page 23

Disagreement between databases: main factors

- Problems with mapping protein/gene IDs, and divergent assignments of splice isoforms: ~10% of the data
- Divergent assignments of organisms: ~21% of the data
- Different ways of representing protein complexes: ~12% of the data
- Inconsistent curation of HTP data: ~1-2% of the data

Most of these factors can be attributed to different curation policies by DBs

(Issues being addressed by PSI standards & IMEx consortium)

Page 24

PPI data consumers, beware!

- Not all PPI data are created equal
- Different methods probe different types of interactions (e.g. binary/co-complex)
- Double-check data quality claims
- Literature-curated PPIs are a mixed bag: filtering needs to be applied, and there are no global reliability scores!

Page 25

Acknowledgements

Andrei Turinsky (HSC, Toronto), Brian Turner (HSC, Toronto), Shuye Pu (HSC, Toronto), James Vlasblom (HSC, Toronto), Systems Support team (HSC, Toronto)

Andrew Emili (UoT), Jack Greenblatt (UoT), Edyta Marcon (UoT), Sadhna Phanse (UoT), Ruth Isserlin (UoT), Jonathan Olsen (UoT), Mohan Babu (UoT), Hyungwon Choi (NUS), Anne-Claude Gingras (SLRI, Toronto), Mathew E. Sowa (Harvard), Emmanuel Levy (Weizmann), Joel Janin (Orsay)

Funding Sources:

http://wodaklab.org

