
SWATH-MS co-expression profiles reveal paralogue interference in protein complex evolution

Luzia Stalder1,*, Amir Banaei-Esfahani1, Rodolfo Ciuffa1, Joshua L Payne2,3, Ruedi Aebersold1,4,*

1 Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland.

2 Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland.

3 Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.

4 Faculty of Science, University of Zurich, 8057 Zurich, Switzerland.

* Authors for correspondence: [email protected] and [email protected]

Abstract

Understanding the conservation and evolution of protein complexes is critical for decoding their function in physiological and pathological processes. One prominent proposal posits gene duplication as a potential mechanism for protein complex evolution. In this study we take advantage of large-scale proteome expression datasets to systematically investigate the role of paralogues, and specifically self-interacting paralogues, in shaping the evolutionary trajectories of protein complexes. First, we show that protein co-expression derived from quantitative proteomic matrices is a good indicator for complex membership and is conserved across species. Second, we suggest that paralogues are commonly strongly co-expressed and that, for the subset of paralogues that show diverging co-expression patterns, the divergent co-expression patterns reflect both sequence and functional divergence. Finally, on this basis, we show that homomeric paralogues known to be part of protein complexes display a unique co-expression pattern distribution, with a subset of them being highly diverging. These findings support the idea that homomeric paralogues can avoid cross-interference by diversifying their expression patterns, and corroborate the role of this mechanism as a force shaping protein complex evolution and specialization.

bioRxiv preprint doi: https://doi.org/10.1101/2020.09.08.287334; this version posted September 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Acknowledgments

We thank Marija Buljan for help with the project supervision. This project was supported by the Swiss National Science Foundation through grant # SNSF 31003A_166435 to R.A. and by the European Research Council (ERC) through grant ERC-20140AdG 670821 to R.A. R.C. was supported in part by the IMI project ULTRA-DD (FP07/2007-2013, grant no. 115766). J.L.P. acknowledges support from Swiss National Science Foundation Grant PP00P3_170604.

Author contributions

L.S. conceived the study, performed the analysis and wrote the manuscript. R.C. helped with manuscript writing. R.A. and J.L.P. supervised the project and provided feedback on the manuscript. A.B.E. helped with project supervision.

Declaration of interests

The authors declare no competing interests.




Introduction

Protein complexes, described as stable protein assemblies that can be isolated by biochemical means, are one of the main modes of proteome organization and fundamental functional entities of the cell. The dramatic increase in structural knowledge of these complexes, as well as the advent of proteome-wide profiling methods (most prominently via mass spectrometry), has allowed the interrogation of general principles governing protein complex formation, function and evolution. For instance, bioinformatics analyses of the Protein Data Bank (PDB) and of large-scale proteomics datasets have identified, respectively, core symmetries that complexes obey and the extent to which the co-regulation of their subunits is constrained1,2. An especially fertile area of investigation relates to the evolution of protein complexes.

It has been demonstrated that a possible mechanism for protein complex core formation starts with the duplication of self-interacting proteins (homomeric paralogues; Figure 1A)3,4. Importantly, recent research on the fate of paralogues pointed out that duplication of genes for obligate homomeric proteins can lead to interference between the resulting paralogues: mutations in only one of the paralogues can “poison” the oligomer and affect the ancestral function5. It has been suggested that this constrains the evolution of paralogues on the one hand, but also promotes additional regulatory complexity on the other5. Specifically, if sequence divergence of the homomeric paralogues impairs the ancestral function, the mutated paralogue acts as a highly specific competitive inhibitor for the ancestral protein. This is referred to as paralogue interference5. To prevent this type of interference, the paralogues will be under selection pressure to develop mechanisms that prevent their cross-interaction, for example by mutations that drive expressional separation. Importantly, if paralogues that are separated by expression give rise to protein complexes, these complexes will contain either the ancestral or the duplicated paralogous pair, but not both at the same time. Expressional separation after duplication is a common pattern on the RNA level5. However, expressional separation has not been analyzed on the protein level in a general way and its role in protein complex evolution remains elusive5.

A main limitation in studying protein complex evolution has been the difficulty of measuring protein complex states in a high-throughput manner. Here we address this by using data-independent acquisition and analysis of mass spectrometry data, particularly by the sequential windowed acquisition of all theoretical mass spectra (SWATH-MS)6. We study complex states at the proteome level in human, mouse, Drosophila melanogaster and Saccharomyces cerevisiae7–9. We demonstrate that protein co-expression profiles determined by SWATH-MS are substantially conserved across species and that they can be used to map the diversification of protein paralogues, as measured for instance by divergence in sequence or interactors. Critically, we find that protein paralogues that are known to be part of a protein complex are enriched in homomer-derived paralogues that separated in their expression levels. These are indeed the proteins shown to be affected by interference, and our finding therefore supports the notion that this mechanism can propel protein complex diversification and expression divergence. In conclusion, our study indicates that quantitative proteomic data can be used to infer protein complex relationships and identifies paralogue interference as a constraint on their evolution.




Results

SWATH-MS co-expression profiles recover protein complexes and are conserved across species

To analyze the role of paralogue diversification in the evolution of protein complexes, we first set out to define a suitable methodological framework. In recent years, a number of studies have taken advantage of the rapidly accumulating body of data on protein-protein interactions and protein expression to analyze constraints on, and evolution of, protein complexes1–4,10–13. Some of these studies have shown that the expression levels of protein complex subunits generally covary, and that, conversely, co-expression patterns can be used as a proxy for functional, interaction and complex relatedness. Here we used abundance profiles derived from SWATH-MS data across four species: human, mouse, Drosophila melanogaster and Saccharomyces cerevisiae7–9 (Figure 1B). The proteome dataset of each species contains 40 to 112 samples. Overall, between 1610 and 3171 proteins were consistently quantified per species dataset across samples (Table S1 and Figure S1). To examine covariance of protein pairs, we used Spearman's rank correlation coefficient (rs). First, we aimed to show that covariation patterns in our data preferentially recall known protein complexes, and to detail their conservation across species. We used manually curated catalogues of protein complexes as a benchmark, specifically CORUM14,15 for human and mouse complexes, DroID16 for Drosophila melanogaster complexes and CYC200817 for Saccharomyces cerevisiae complexes. The results showed that in the datasets of all four species complex members have significantly higher rs compared to those not annotated as members of the same complex (Figure S2, Wilcoxon signed-rank test p-values < 0.001, number of pairs human = 17453, mouse = 1739, Drosophila melanogaster = 2261, Saccharomyces cerevisiae = 1807). To further characterize conserved covariance profiles, we performed functional enrichment analysis with DAVID18,19. We found that protein pairs with highly covarying abundance profiles in human and mouse (rs > 0.8) are enriched in mitochondrial processes. Interestingly, when we relaxed the cutoff to rs > 0.6 we noted that protein pairs functionally annotated with “splicing” and “signaling pathways regulating metabolism”, “proliferation”, “cell-cell adhesion” and “immune responses” were additionally enriched (Table S2, EASE score, a modified Fisher exact test p-value < 0.05). Figure 1C illustrates this point by showing that the correlation network with edges rs > 0.8 recovers the ATP synthase complex and the NADH dehydrogenase complex, whereas the correlation network with edges rs > 0.6 additionally recovers complexes in the tricarboxylic acid cycle and the spliceosome. Next, we asked whether protein pairs with highly correlated abundance in one species are also highly correlated in another species. For this, we selected the most and least correlating orthologues in one species (top and bottom 0.5 percentile) and tested whether these most correlating pairs also showed higher covariance than the least correlating pairs in a second species. We found that this was the case for all species combinations, indicating that covariance profiles are conserved across species (Figure 1D and Figure S3, Wilcoxon signed-rank test p-values < 0.001, number of mouse orthologue pairs that are highly correlated in human/ number of human orthologue pairs that are highly correlated in mouse = 1429). Consequently, proteins that belong to a protein complex in one species also correlate higher in a second species (Figure 1D and Figure S3, Wilcoxon signed-rank test p-values < 0.001, number of pairs in mouse that are annotated as CORUM complex pairs in human = 1807, number of pairs in human that are annotated as CORUM complex pairs in mouse = 331). Taken together, our analyses indicate that protein complex members exhibit coordinated expression, and that such coordinated expression is conserved across species.
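The selection step of the cross-species comparison can be illustrated with a minimal Python sketch. It is not the code used in this study (the analyses were run in R); the function names, the toy correlation values and the use of medians as a summary are assumptions made for illustration, and the Wilcoxon test itself is omitted.

```python
from statistics import median

def extreme_pairs(corr, fraction):
    """Return (top, bottom) lists of pair keys ranked by their correlation."""
    ranked = sorted(corr, key=corr.get)          # ascending r_s
    k = max(1, int(len(ranked) * fraction))      # at least one pair per tail
    return ranked[-k:], ranked[:k]

def conservation_summary(corr_a, corr_b, fraction=0.005):
    """Median r_s in species B for the top and bottom pairs of species A.

    corr_a, corr_b: dicts mapping an orthologous pair to its r_s.
    fraction=0.005 corresponds to the top and bottom 0.5 percentile.
    """
    # Only pairs quantified in both species can be compared.
    shared = {pair: r for pair, r in corr_a.items() if pair in corr_b}
    top, bottom = extreme_pairs(shared, fraction)
    return median(corr_b[p] for p in top), median(corr_b[p] for p in bottom)

# Toy values (hypothetical, not from the datasets used here).
corr_human = {("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.1, ("A", "D"): -0.2,
              ("B", "D"): 0.0, ("C", "D"): -0.6, ("A", "E"): 0.3, ("B", "E"): -0.7}
corr_mouse = {("A", "B"): 0.7, ("A", "C"): 0.5, ("B", "C"): 0.2, ("A", "D"): 0.0,
              ("B", "D"): 0.1, ("C", "D"): -0.3, ("A", "E"): 0.2, ("B", "E"): -0.5}

# A large fraction is used only because the toy example has eight pairs.
top_median, bottom_median = conservation_summary(corr_human, corr_mouse, fraction=0.25)
```

In a real analysis the two groups of second-species correlations would be passed to a rank-based test rather than summarized by their medians.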

SWATH-MS correlation profiles reflect evolutionary trajectories of paralogues

We next asked whether our framework is able to capture important principles driving the evolution of protein complexes. To this end, we focused on protein paralogues, because paralogue diversification has been proposed as a significant factor of complex evolution4,5. First, we wanted to verify that our protein abundance matrices recapitulated paralogue divergence over time, as well as diversification of protein interactions. We identified paralogues using Ensembl 92 (ref 27) and we classified them into paralogue families, whereby a family was defined as the genes that emerged from a single ancestral gene by duplication (Figure S4 and Table S3, number of paralogue families human = 73, mouse = 114, Drosophila melanogaster = 3, Saccharomyces cerevisiae = 9, mean size of paralogue families human = 9, mouse = 9, Drosophila melanogaster = 11, Saccharomyces cerevisiae = 9). On a general scale, we found that paralogous proteins exhibit a stronger degree of covariance than non-paralogous proteins in all species examined (Figure 2A, Wilcoxon signed-rank test p-values < 0.001, number of paralogous pairs human = 1302, mouse = 1964, Drosophila melanogaster = 119 and Saccharomyces cerevisiae = 191). To assess whether paralogue covariance patterns recapitulated sequence diversification, we tested whether a higher frequency of differentiating mutations between paralogous pairs corresponded to a decrease in protein covariance. To do so, we quantified all pairwise correlations among paralogue family members and determined for each pair the rate of synonymous and non-synonymous sequence changes, i.e. the number of nucleotide changes between the two paralogues that do not alter, or do alter, the encoded amino acid, respectively, relative to the paralogue length. As expected, the co-expression of paralogous pairs within a paralogue family was negatively associated with the rate of synonymous and non-synonymous nucleotide changes (Figure 2B left and center, respectively; due to limited numbers of observations we could not examine the Drosophila melanogaster and Saccharomyces cerevisiae datasets in this and subsequent analyses). Furthermore, the association of covariance with non-synonymous changes was stronger than with synonymous changes, in line with the intuition that non-synonymous mutations generally have a greater phenotypic effect (paired sample t-test p-values non-synonymous changes: human = 0.005 and mouse = 0.009, synonymous changes: human = 0.04 and mouse = 0.05; number of paralogue groups: human = 23 and mouse = 64). Finally, we reasoned that, if covariance is a good proxy for sequence diversification, paralogues with more strongly diverging interactomes (i.e. a smaller fraction of shared interactors) should also exhibit more strongly diverging expression patterns, as interactome rewiring is likely a consequence of sequence change. To test whether the paralogue covariance correlates with the diversity of protein interactions, we calculated for each paralogous pair the Jaccard index of interaction partners, defined as the intersection of the interaction partners divided by the union of the interaction partners of each pair. By these means, we found that lower covariance within a paralogue family was associated with more strongly diverging interactomes (Figure 2B right, paired sample t-test p-values human = 0.008 and mouse = 0.09; number of paralogue groups in human = 35 and mouse = 7). Taken together, the data show that protein covariance recapitulates paralogue divergence over time as well as diversification of protein interactions that drives the evolution of new protein complexes.
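As a worked illustration of the Jaccard index defined above, the following Python sketch computes the fraction of shared interaction partners for one paralogous pair. The function name, the partner identifiers and the handling of pairs without any annotated partner are assumptions for illustration, not part of the original analysis.

```python
def jaccard_index(partners_a, partners_b):
    """Intersection of two interactor sets divided by their union."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        # Assumption: a pair without any annotated partner scores 0.
        return 0.0
    return len(a & b) / len(union)

# Hypothetical interactors of two paralogues: two shared out of four distinct.
shared_fraction = jaccard_index({"P1", "P2", "P3"}, {"P2", "P3", "P4"})
```

A value near 1 indicates largely overlapping interactomes, whereas a value near 0 indicates that the two paralogues have almost no partners in common.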

Negatively correlating SWATH-MS profiles from homomer-derived paralogues are functionally divergent and prominent in complex members

Since correlation of quantitative proteomics data can inform us about the divergence of evolutionary trajectories, we used it to assess the notion of paralogue interference (Figure 2C), first at the whole-proteome level and then focusing specifically on known protein complexes. We first reasoned that, if the divergence in protein abundance can serve as a mechanism to escape interference, then homomeric paralogues should be differentially abundant to a greater extent than monomer-derived paralogues. To test that, we retrieved homo-, hetero- and monomer annotations from InterEvol32 and classified paralogous pairs either as homomer-derived if at least one member was annotated as homomer, or as monomer-derived when none of the members was annotated as either homo- or heteromer. In line with our expectations, we found that among negatively correlating pairs, homomer-derived paralogues showed an enrichment factor of 6.9 in human and 1.8 in mouse, respectively, over monomer-derived paralogues (Figure S5; negative correlation rs < -0.7; number of human homomeric pairs = 540, human monomeric pairs = 620, mouse homomeric pairs = 615, mouse monomeric pairs = 1121, Fisher’s exact test p-values = 0.04 and 0.2, respectively). Of note, we also found that among all negatively correlating paralogous pairs, homomer-derived pairs were more likely to be of different abundance in tissues than monomer-derived paralogues, as defined by the tissue-specific expression analysis of the Human Proteome Map20 (HPM) (Figure 2D and Figure S6, number of pairs human = 21 and mouse = 36). This indicates that homomeric paralogues are more prone to be affected by spatial separation of expression, and gives credence to the notion that this separation has evolved in response to protein interference.
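The enrichment test described above can be sketched as a one-sided Fisher's exact test on a 2x2 table. The Python example below is only a sketch under stated assumptions: the table layout (homomer- versus monomer-derived pairs in rows, negatively correlating versus other pairs in columns), the function name and the counts are hypothetical and are not taken from the study.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the probability
    of seeing at least `a` row-1 items in column 1 given fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / total

# Hypothetical counts:
#                    r_s < -0.7   other
# homomer-derived        3          1
# monomer-derived        1          3
p_value = fisher_exact_greater(3, 1, 1, 3)
```

With real data the four cells would be filled from the classified paralogous pairs of one species, and the test would be repeated per species.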

Finally, we asked what impact the mechanism of paralogue interference has had on the diversification of protein complexes. If interference from homomeric paralogues has played any role in the organization of complexome diversity, then there must be a subset of complexes whose homomer-derived paralogue members exhibit highly divergent expression patterns; and, as a corollary, such divergence should not be observed in the case of monomer-derived paralogues. We therefore plotted the distribution of correlations of protein abundance for all paralogues present in CORUM complexes across the two classes listed above. Strikingly, we found that protein complexes containing homomer-derived paralogues contained two discrete subsets of highly positively and highly negatively correlating paralogue members (Figure 2E and Figure S7). In support of the homomer-derived paralogue specificity of such a pattern, we found no evidence for a similar distribution for the monomer-derived paralogues. Furthermore, by calculating the abundance correlation for all CORUM complex members, irrespective of them being paralogous or non-paralogous proteins, we found that homomer-derived paralogues are strongly enriched among the negatively correlating complex pairs (Figure S8; negative correlation rs < -0.7; number of human homomeric CORUM pairs = 5645, human CORUM pairs = 11820, mouse homomeric CORUM pairs = 457, mouse CORUM pairs = 1572, Fisher’s exact test p-values < 0.001 and 0.05, respectively). This leads us to suggest a classification of protein complexes containing homomer-derived paralogues into two distinct groups: those where paralogues are not interfering with each other’s function, so that the need to minimize their spatiotemporal co-existence is alleviated; and those that have diverged in their abundance levels under the pressure of negative interference. To further corroborate our findings and conceptualization, we manually curated the whole set of protein complexes in the ‘escaped’, i.e. negatively correlating, class. Consistent with our proposal, we found for 84% of the human and 67% of the mouse complexes in this category, respectively, literature evidence supporting mutual exclusivity/complementarity (number of complexes with negatively correlating pairs human = 13 and mouse = 6; for the complete list of paralogue correlations see Table S4 and Figure S9-10). In the human dataset, for example, negatively correlating, homomer-derived CORUM paralogous pairs included two subcomplexes of the emerin complex, one with lamin A and the other with lamin B1. Lamin A has been shown to regulate nuclear mechanisms and is also associated with several diseases, including Emery–Dreifuss muscular dystrophy. In contrast, lamin B1 is involved in intermediate filaments from the cytoskeleton, but not in nuclear mechanisms21. Another example was the homomer-derived paralogues Hspa5 and Hspa8, which are part of the HCF-1 complex involved in cell cycle and transcriptional regulation21. Whereas Hspa5 localizes in the ER lumen, Hspa8 resides in the nucleolus and the cell membrane21. In both the human and the mouse dataset, the paralogues were among the most negatively correlated pairs. Further examples from the mouse dataset include negatively correlating homomeric paralogous pairs of the ubiquitin E3 ligase complex, that is Cul1 and Cul2, as well as Cul2 and Cul3. This is consistent with studies that established that the paralogues Cul1, Cul2 and Cul3 are involved in three distinct subcomplexes22. Additionally, the mouse dataset showed two negatively correlating members of the ubiquitin-proteasome complex, UBQLN1 and UBQLN2. Only UBQLN2 was shown to be able to translocate to the nucleus, and it has been shown that after heat stress, the two proteins are in distinct subcellular locations23. Taken together, our data indicate that the class of proteins postulated to be more prone to protein interference, that is, homomer-derived paralogues, and especially those that are part of protein complexes, exhibit a stronger divergence in protein abundance than other paralogues, as well as subcellular specialization. By this means, our correlation studies pinpoint interference escape as an important mechanism of protein complex evolution.




Discussion

In this study we show that rigorous statistical analysis of sets of protein abundance maps across species and tissues can inform us about the evolution of protein complexes. We used this framework to address the role of paralogue interference and diversification in protein complex evolution and specialization. Besides demonstrating that protein complex members tend to display highly correlated expression profiles, and that these profiles are conserved across species, we also indicate that protein abundance matrices recapitulate paralogue divergence over time, as well as diversification of protein interactions. Our study culminates with the observation that homomeric paralogues that are part of protein complexes show highly divergent expression patterns. This supports the notion that expression divergence is a mechanism by which protein interference among homomeric paralogues is avoided and complexes are diversified.

While many general aspects of protein complex architectural principles and evolution have been addressed in previous studies, to the best of our knowledge this is the first time that the escape of paralogue interference, specifically by separation of expression, has been analyzed on the proteome level and across species. Typically, positive co-variance of expression has been used to study conservation of protein complexes. Here we show that, since divergence of co-expression patterns seems to scale to some extent with functional and structural diversification, negative correlation may pinpoint specific evolutionary constraints in maintaining separation of homomer-derived paralogues. In fact, our data suggest that anti-correlating complex pairs are enriched in homomer-derived paralogues (Figure 2E). This is in agreement with the suggestion that homomer-derived paralogues are the most likely to be affected by protein-protein interference, which can be resolved by separating expression. We find only a fraction of homomer-derived paralogues to exhibit such anticorrelation, while others strongly correlate. We therefore suggest that complex paralogues can be divided into two main groups: first, highly correlating co-evolved subunits, and second, negatively correlating paralogues which, by being expressed in complementary fashions, minimize the risk of interference. We indicate several examples of paralogues belonging to the latter class, for example lamin A and B1, which could represent suitable targets for follow-up studies.

At present, the scope and generalizability of our conclusions are limited by several factors. First, protein complex formation and evolution are likely to have many determinants, which can mask the effect of paralogue interference or compensate for it in ways other than expression divergence. Second, our analyses rest on resources curating protein complexes and distinguishing homomer- from monomer-derived paralogues. Broadening and improvement of such curation and annotation will allow more extensive and statistically more robust analyses and conclusions. This has a clear impact on comparative studies, where the extent of annotation between species varies greatly. Finally, the correlational nature of this, and many other methodologically related studies, must be stressed. We show that proteome-wide, cross-tissue and cross-species analyses are capable of capturing patterns that would otherwise be indiscernible. In this respect, we identified specific trends behind complex evolution which give support to specific proposals, such as complex formation driven by homomer-derived paralogues and interference escape. However, targeted studies should decode the mechanisms underlying these trends. Such an investigation, together with correlation analyses covering larger sets of conditions, is what in our view holds the greatest promise to refine our understanding of the forces shaping protein complex evolution.




Material and methods

SWATH-MS datasets and co-expression measures

We obtained all protein abundance data from publicly available SWATH-MS datasets. For human we used the data of Guo et al. 2019 (ref 7), for mouse the data of Williams et al. 2018 (ref 9), for fly the data of Okada et al. 2016 (ref 8) and for yeast the SWATH-MS dataset of the yeast strains described in Zhu et al., 2008 (ref 24) and Brem et al., 2002 (ref 25) (manuscript in preparation). For further description of the datasets see Table S1. All analyses were based on the available protein matrix with relative protein intensities.

For each protein pair, we calculated the Spearman correlation of raw protein abundances across all samples. We used the cor function of the R package stats v3.4.4 (ref 26) with the option pairwise.complete.obs to compute the correlation between each pair of proteins using all complete pairs of observations on those proteins.
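The pairwise-complete Spearman computation described above can be illustrated in Python. This is a minimal sketch of the same idea rather than the R call used in the study; the function names, the use of None for missing values and the toy abundances are assumptions for illustration.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_pairwise_complete(x, y):
    """Spearman r_s using only samples where both proteins were quantified."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))

# Toy abundance profiles across five samples; None marks a missing value.
protein_a = [1.0, 2.0, None, 4.0, 5.0]
protein_b = [10.0, 20.0, 30.0, None, 50.0]
r_s = spearman_pairwise_complete(protein_a, protein_b)
```

The sketch does not guard against pairs with no shared samples or with constant abundance; a production version would need to return a missing value in those cases.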

Orthologue identification

Orthologue mapping was conducted with BioMart Ensembl 92 (ref 27; release April 2018). We considered only genes with a “one2one” mapping, i.e. when the gene in one species has only one defined orthologue in another species. To translate protein and gene identifiers, we used the R package biomaRt v2.34.2 (refs 28,29).

Paralogue identification and analysis

We identified paralogues and paralogue families with Ensembl 92 (release April 2018; ref 27). We defined a paralogue family as all paralogues connected via direct pairs. We determined the rate of non-synonymous changes and synonymous changes between the paralogous pairs using Ensembl 92 (ref 27). We next assessed the similarity of interaction partners among paralogous pairs. We obtained the interaction partners of each paralogue from BioGRID v3.5 (refs 30,31) and we calculated the Jaccard similarity index, that is, the intersection divided by the union of distinct interaction partners.
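One way to read the family definition above is that families are the connected components of a graph whose edges are the annotated paralogous pairs. The Python sketch below illustrates this reading; the function name and gene identifiers are hypothetical, and whether the original analysis was implemented this way is an assumption.

```python
from collections import defaultdict

def paralogue_families(pairs):
    """Group genes into families: connected components over direct pairs."""
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, families = set(), []
    for gene in sorted(neighbours):      # sorted for a reproducible order
        if gene in seen:
            continue
        stack, family = [gene], set()
        while stack:                     # depth-first traversal of one component
            current = stack.pop()
            if current in family:
                continue
            family.add(current)
            stack.extend(neighbours[current] - family)
        seen |= family
        families.append(family)
    return families

# G1-G2 and G2-G3 are direct pairs, so G1 and G3 join the same family.
families = paralogue_families([("G1", "G2"), ("G2", "G3"), ("G4", "G5")])
```

Under this reading, two genes can belong to the same family without being annotated as a direct paralogous pair, as long as a chain of direct pairs links them.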

To determine whether a paralogous pair is derived from an ancestral protein that either formed homomers or was only present as a monomer, we used the InterEvol database (release 2010), designed for the analysis of co-evolution events at the structural interfaces of hetero- and homo-oligomers32. We considered a paralogous pair as homomer-derived if at least one member was annotated as homomer, and a pair as monomer-derived when none of the members was annotated as either homo- or heteromer. For manual paralogue annotation, we additionally considered the UniProt database (release November 2018)21.
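The classification rule above can be written as a short decision function. The Python sketch below follows a literal reading of the rule; the label strings, the function name and the "unclassified" category for pairs with only heteromer annotations are assumptions, and the treatment of proteins lacking any annotation is not specified in the text.

```python
def classify_pair(annotations_a, annotations_b):
    """Classify a paralogous pair from the oligomeric annotations of its members.

    Each argument is a set of labels drawn from
    {"homomer", "heteromer", "monomer"} (hypothetical encoding).
    """
    labels = set(annotations_a) | set(annotations_b)
    if "homomer" in labels:
        # At least one member is annotated as homomer.
        return "homomer-derived"
    if "heteromer" not in labels:
        # Neither member is annotated as homo- or heteromer.
        return "monomer-derived"
    # Assumption: pairs with heteromer but no homomer annotation are set aside.
    return "unclassified"

example = classify_pair({"homomer", "heteromer"}, {"monomer"})
```

Note that under this literal reading a pair in which neither member carries any annotation would be labelled monomer-derived; requiring an explicit monomer annotation would be a stricter alternative.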

Functional annotation

For functional enrichment analysis we used DAVID v6.8 (ref 18) with the following parameters: Annotation categories: GOTERM_BP_DIRECT; GO_Kappa similarity: Similarity term overlap = 3, similarity threshold = 0.5; Classification: Initial group membership = 2, final group membership = 2, multiple linkage threshold = 0.5; Enrichment thresholds: EASE = 0.05. We retrieved lists of protein complexes from CORUM v2.0 (refs 14,15). Tissue-specific expression data were retrieved from the Human Proteome Map (HPM) portal (ref 20). To compare expression between tissues, the data were normalized using the normalizeBetweenArrays function from the R package limma v3.40.6 (ref 33).

Statistics and visualization

We conducted all statistical analysis with R v3.4.4 (ref 26). For the Fisher's exact tests and the Wilcoxon tests we used a one-sided alternative hypothesis if applicable. We drew density graphs, bar- and boxplots with the R package ggplot2 v3.1.0 (ref 34). Boxplots were drawn in default settings (lower and upper hinges correspond to the first and third quartiles, whiskers extend up to 1.5 x the inter-quartile range, i.e. the distance between the first and the third quartile). In addition, we used the Van de Peer’s webtool to draw the Venn diagrams (ref 35). For network representations, we used Cytoscape v3.6.1 (ref 36).




References

1. Romanov, N. et al. Disentangling Genetic and Environmental Effects on the Proteotypes of Individuals. Cell 177, 1308-1318.e10 (2019).

2. Ahnert, S. E., Marsh, J. A., Hernandez, H., Robinson, C. V. & Teichmann, S. A. Principles of assembly reveal a periodic table of protein complexes. Science 350, 2245–2245 (2015).

3. Marsh, J. A. & Teichmann, S. A. Structure, Dynamics, Assembly, and Evolution of Protein Complexes. Annu. Rev. Biochem. 84, 551–575 (2015).

4. Pereira-Leal, J. B., Levy, E. D., Kamp, C. & Teichmann, S. A. Evolution of protein complexes by duplication of homomeric interactions. Genome Biol. 8 (2007).

5. Kaltenegger, E. & Ober, D. Paralogue Interference Affects the Dynamics after Gene Duplication. Trends Plant Sci. 20, 814–821 (2015).

6. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: A new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, 1–17 (2012).

7. Guo, T. et al. Quantitative Proteome Landscape of the NCI-60 Cancer Cell Lines. iScience 21, 664–680 (2019).

8. Okada, H., Ebhardt, H. A., Vonesch, S. C., Aebersold, R. & Hafen, E. Proteome-wide association studies identify biochemical modules associated with a wing-size phenotype in Drosophila melanogaster. Nat. Commun. 7, 1–11 (2016).

9. Williams, E. G. et al. Quantifying and localizing the mitochondrial proteome across five tissues in a mouse population. Mol. Cell. Proteomics 17, 1766–1777 (2018).

10. Wan, C. et al. Panorama of ancient metazoan macromolecular complexes. Nature 525, 339–344 (2015).

11. Skinnider, M. A. et al. An Atlas of Protein-Protein Interactions Across Mammalian Tissues. SSRN Electron. J. (2018). doi:10.2139/ssrn.3219264

12. Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, 1–22 (2019).

13. Snider, J. et al. Fundamentals of protein interaction network mapping. Mol. Syst. Biol. 11, 848 (2015).

14. Giurgiu, M. et al. CORUM: The comprehensive resource of mammalian protein complexes - 2019. Nucleic Acids Res. 47, D559–D563 (2019).

15. Ruepp, A. et al. CORUM: The comprehensive resource of mammalian protein complexes - 2009. Nucleic Acids Res. 38, 497–501 (2009).

16. Murali, T. et al. DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila. Nucleic Acids Res. 39, D736-43 (2011).

17. Pu, S., Wong, J., Turner, B., Cho, E. & Wodak, S. J. Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 37, 825–831 (2009).

18. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).

19. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).

20. Kim, M.-S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).

21. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46, 2699–2699 (2018).

22. Bosu, D. R. & Kipreos, E. T. Cullin-RING ubiquitin ligases: Global regulation and activation cycles. Cell Division 3, 7 (2008).

23. Hjerpe, R. et al. UBQLN2 Mediates Autophagy-Independent Protein Aggregate Clearance by the Proteasome. Cell 166, 935–949 (2016).

24. Zhu, J. et al. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat. Genet. 40, 854–861 (2008).

25. Brem, R. B., Yvert, G., Clinton, R. & Kruglyak, L. Genetic dissection of transcriptional regulation in budding yeast. Science 296, 752–755 (2002).

26. R Core Team. R: A Language and Environment for Statistical Computing (2019). Available at: https://www.r-project.org/.

27. Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46, (2018).

28. Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).

29. Durinck, S. et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439–3440 (2005).

30. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535-9 (2006).

31. Oughtred, R. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 47, D529–D541 (2019).

32. Faure, G., Andreani, J. & Guerois, R. InterEvol database: Exploring the structure and evolution of protein complex interfaces. Nucleic Acids Res. 40, 847–856 (2012).

33. Ritchie, M. E. et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

34. Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag New York, 2016).

35. Van de Peer, Y. Calculate and draw custom Venn diagrams. (2018).

36. Shannon, P. et al. Cytoscape: A software Environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).



[Figure 1 image. Panel A: schematic of homomeric dimers A and A', the paralogous heteromer A A' (development of an inactive heteromer), and routes that prevent cross-interaction (coevolution, expressional separation). Panel B: study design listing species (human, mouse, Drosophila melanogaster, Saccharomyces cerevisiae), samples (60, 40, 60, 112) and proteins (3171, 3648, 1610, 1862). Panel C: Spearman correlation networks at rs > 0.8 and rs > 0.6. Panel D: density plots of rs for mouse and human orthologues, high versus low rs and complex versus non-complex pairs (x-axis −1.0 to 1.0).]


Figure 1. SWATH-MS co-expression profiles of protein complex modules are conserved across species.

(A) Schematic drawing illustrating possible evolutionary routes from homomer-derived paralogues into protein complexes.

(B) Study design. Protein co-expression profiles were acquired by SWATH-MS from 272 samples covering the proteome of four species. Spearman's rank correlation (rs) was used to measure profile similarity.

(C) Conserved correlation networks of human and mouse. Nodes represent proteins; edges indicate high correlation in both the human and mouse datasets. Red edges are annotated in CORUM. In the panels on the left, a stringent correlation cutoff of rs > 0.8 was chosen; on the right, a more relaxed cutoff of rs > 0.6 was used. For comparison, correlation values of human and mouse were quantile normalized.

Left: Proteins of the ATP-synthase complex (top) and proteins of the NADH dehydrogenase complex (bottom).

Right: Proteins involved in the tricarboxylic acid cycle (top) and proteins involved in the spliceosome (bottom).

(D) (I) Mouse Spearman correlation for mouse orthologues that are highly correlated in human (top 0.5 percentile, rs > 0.8, npairs = 1429) and for mouse orthologues that do not correlate in human (bottom 0.5 percentile, npairs = 1429). Pairs that highly correlate in human also correlate significantly higher in mouse (median (md) 0.2 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).

(II) Mouse Spearman correlation for pairs that are annotated as CORUM complex pairs in human (npairs = 1807) and for pairs that are not annotated in CORUM (npairs = 712803). Human CORUM pairs correlate significantly higher in mouse (md 0.35 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).

(III) Human Spearman correlation for human orthologues that are highly correlated in mouse (top 0.5 percentile, rs > 0.8, npairs = 1429) and for human orthologues that do not correlate in mouse (bottom 0.5 percentile, npairs = 1429). Pairs that highly correlate in mouse also correlate significantly higher in human (md 0.2 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).

(IV) Human Spearman correlation for pairs that are annotated as CORUM complex pairs in mouse (npairs = 331) and for pairs that are not annotated in CORUM (npairs = 714279). Mouse CORUM pairs correlate significantly higher in human (md 0.3 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).
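The profile similarity measure named in (B) and the edge definition used in (C) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code of this study. The function names, protein labels and intensity values are hypothetical. The sketch assumes that each protein has a non-constant profile and that orthologues already share one label, and it omits the quantile normalization of correlation values mentioned in (C).

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def conserved_edges(profiles_a, profiles_b, cutoff=0.8):
    """Protein pairs whose rs exceeds the cutoff in both datasets."""
    shared = sorted(set(profiles_a) & set(profiles_b))
    return [
        (p, q)
        for p, q in combinations(shared, 2)
        if spearman(profiles_a[p], profiles_a[q]) > cutoff
        and spearman(profiles_b[p], profiles_b[q]) > cutoff
    ]


# Hypothetical intensity profiles (one value per sample) for three proteins.
human = {"P1": [1, 2, 3, 4], "P2": [2, 4, 6, 9], "P3": [4, 3, 2, 1]}
mouse = {"P1": [5, 6, 7, 8], "P2": [1, 2, 3, 5], "P3": [1, 3, 2, 4]}
print(conserved_edges(human, mouse, cutoff=0.8))
```

Lowering `cutoff` to 0.6 corresponds to the more relaxed threshold used for the right-hand networks.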



[Figure 2 image. Panel A: density of rs for paralogues versus non-paralogues in human, mouse, fly and yeast. Panel B: rs of the top third versus the bottom third of paralogous pairs by non-synonymous changes, synonymous changes and interactome divergence (human and mouse). Panel C: percent overlap of top expressed tissues between paralogous pairs, homomer-derived versus monomer-derived. Panel D: rs distributions of homomer-derived and monomer-derived paralogues (CORUM versus not CORUM), with the most negatively and most positively correlating dodeciles marked.]


Figure 2. Co-expression profiles of paralogues are conserved across species and reflect sequence divergence within the paralogue family. Negatively correlating homomer-derived paralogues are functionally divergent and prominent in complexes.

(A) Spearman correlation of paralogous pairs in yeast, fly, mouse and human compared to all measured non-paralogous pairs (number of paralogous pairs: human = 1302, mouse = 1964, fly = 119 and yeast = 191). Paralogues correlate significantly higher in all species (Wilcoxon signed-rank test p-values < 0.001; md 0.12 vs 0.03, md 0.25 vs 0.01, md 0.34 vs 0.08 and md 0.17 vs 0.04, respectively).

(B) Left: Spearman correlation coefficients of paralogous pairs within the top and bottom third of non-synonymous sequence changes within their paralogue family. Families that have a range in Spearman correlation of < 0.4 and a range in non-synonymous changes of < 0.2 were excluded. Paralogous pairs with fewer non-synonymous changes tend to correlate higher (paired sample t-test p-values human = 0.005 and mouse = 0.009; number of paralogue families in human = 23 and mouse = 64).

Middle: Spearman correlation coefficients of paralogous pairs within the top and bottom third of synonymous sequence changes within their paralogue family. Families that have a range in Spearman correlation of < 0.4 and a range in synonymous changes of < 0.2 were excluded. Paralogous pairs with fewer synonymous changes tend to correlate higher (paired sample t-test p-values human = 0.04 and mouse = 0.05; number of paralogue families in human = 28 and mouse = 78).

Right: Spearman correlation coefficients of paralogous pairs within the top and bottom third of interactome divergence within their paralogue family. To quantify interactome divergence, the Jaccard index, defined as the intersection of the interaction partners divided by the union of the interaction partners of each pair, was calculated. All interaction partners from BioGRID were considered. Families that have a range of the Jaccard index of < 0.2 and a range in Spearman correlation of < 0.2 were excluded. Paralogous pairs with more similar interaction partners tend to correlate higher (paired sample t-test p-values human = 0.008 and mouse = 0.09; number of paralogue families in human = 35 and mouse = 7).

(C) The overlap of the four most expressed tissues between paralogous pairs (as defined by HPM) is compared between strongly negatively correlating homomer-derived and monomer-derived paralogous pairs (rs < -0.5, human npairs = 21, mouse npairs = 36). Negatively correlating homomer-derived paralogues tend to be expressed more diversely across different tissues compared to monomer-derived paralogues.

(D) Left: Spearman correlation of homomer-derived paralogues that are annotated as CORUM complex pairs (npairs = 42) and for pairs that are not annotated in CORUM (npairs = 474). Homomer-derived paralogues are enriched in the bottom and top dodecile of the correlation distribution (Fisher's exact test p-values 0.06 and 0.1).

Right: Spearman correlation of monomer-derived paralogues that are annotated as CORUM complex pairs (npairs = 51) and for pairs that are not annotated in CORUM (npairs = 569). Monomer-derived paralogues are not enriched in the bottom dodecile but are enriched in the top dodecile of the correlation distribution (Fisher's exact test p-values 0.5 and 0.06).

*** indicates p-values ≤ 0.001, ** ≤ 0.01, * ≤ 0.05, + ≤ 0.1.
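Two quantities used in this legend, the Jaccard index for interactome divergence in (B) and the Fisher's exact test for dodecile enrichment in (D), can be illustrated with a short Python sketch. It is a hedged example, not the code used in this study. The function names, partner sets and counts are hypothetical, the one-sided form of the test is an assumption (the legend does not state the alternative), and treating a pair without any annotated partners as having an index of 0 is likewise an assumption.

```python
from math import comb


def jaccard_index(partners_a, partners_b):
    """Shared interaction partners divided by all partners of the pair."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        return 0.0  # assumption: no annotated partners gives an index of 0
    return len(a & b) / len(union)


def fisher_exact_enrichment(in_a, out_a, in_b, out_b):
    """One-sided Fisher's exact test p-value for a 2x2 table.

    Rows are two groups (e.g. annotated vs. not annotated pairs), columns
    are inside vs. outside a bin. Returns the probability of seeing at
    least `in_a` members of group A inside the bin with fixed margins.
    """
    row_a = in_a + out_a
    row_b = in_b + out_b
    col_in = in_a + in_b
    total = row_a + row_b
    denominator = comb(total, col_in)
    upper = min(row_a, col_in)
    # Sum hypergeometric probabilities for tables at least as extreme.
    tail = sum(
        comb(row_a, k) * comb(row_b, col_in - k)
        for k in range(in_a, upper + 1)
    )
    return tail / denominator


# Hypothetical interaction partners of two paralogues.
print(jaccard_index({"X1", "X2", "X3"}, {"X2", "X3", "X4"}))  # 0.5

# Hypothetical counts: 3 of 3 group-A pairs and 0 of 3 group-B pairs in the bin.
print(fisher_exact_enrichment(3, 0, 0, 3))  # 0.05
```

A Jaccard index close to 1 means the two paralogues share most of their interaction partners, so low values correspond to high interactome divergence.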


