SWATH-MS co-expression profiles reveal paralogue interference in protein complex evolution

Luzia Stalder1,*, Amir Banaei-Esfahani1, Rodolfo Ciuffa1, Joshua L Payne2,3, Ruedi Aebersold1,4,*
1 Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, 8093 Zurich, Switzerland.
2 Institute of Integrative Biology, ETH Zurich, 8092 Zurich, Switzerland.
3 Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.
4 Faculty of Science, University of Zurich, 8057 Zurich, Switzerland.
* Authors for correspondence: [email protected] and [email protected]

Abstract
Understanding the conservation and evolution of protein complexes is of critical value to
decode their function in physiological and pathological processes. One prominent proposal
posits gene duplication as a potential mechanism for protein complex evolution. In this study,
we take advantage of large-scale proteome expression datasets to systematically investigate the
role of paralogues, and specifically self-interacting paralogues, in shaping the evolutionary
trajectories of protein complexes. First, we show that protein co-expression derived from
quantitative proteomic matrices is a good indicator of complex membership and is conserved
across species. Second, we suggest that paralogues are commonly strongly co-expressed and
that, for the subset of paralogues that show diverging co-expression patterns, the divergence
reflects both sequence and functional divergence. Finally, on this basis, we show that
homomeric paralogues known to be part of protein complexes display a unique co-expression
pattern distribution, with a subset of them being highly diverging. These findings support the
idea that homomeric paralogues can avoid cross-interference by diversifying their expression
patterns, and corroborate the role of this mechanism as a force shaping protein complex
evolution and specialization.

bioRxiv preprint doi: https://doi.org/10.1101/2020.09.08.287334; this version posted September 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Acknowledgments
We thank Marija Buljan for help with the project supervision. This project was supported by
the Swiss National Science Foundation through grant SNSF 31003A_166435 to R.A. and by
the European Research Council (ERC) through grant ERC-20140AdG 670821 to R.A. R.C.
was supported in part by the IMI project ULTRA-DD (FP07/2007-2013, grant no. 115766).
J.L.P. acknowledges support from Swiss National Science Foundation Grant PP00P3_170604.

Author contributions
L.S. conceived the study, performed the analysis and wrote the manuscript. R.C. helped with
manuscript writing. R.A. and J.L.P. supervised the project and provided feedback on the
manuscript. A.B.-E. helped with project supervision.

Declaration of interests
The authors declare no competing interests.

Introduction
Protein complexes, described as stable protein assemblies that can be isolated by biochemical
means, are one of the main modes of proteome organization and fundamental functional
entities of the cell. The dramatic increase in structural knowledge of these complexes, as well
as the advent of proteome-wide profiling methods (most prominently via mass spectrometry),
has allowed the interrogation of general principles governing protein complex formation,
function and evolution. For instance, bioinformatic analyses of the Protein Data Bank (PDB)
and of large-scale proteomics datasets have identified, respectively, core symmetries that
complexes obey and the extent to which the co-regulation of their subunits is
constrained1,2. An especially fertile area of investigation relates to the evolution of protein
complexes.

It has been shown that a possible mechanism for protein complex core formation starts
with the duplication of self-interacting proteins (homomeric paralogues; Figure 1A)3,4.
Importantly, recent research on the fate of paralogues pointed out that duplication of genes for
obligate homomeric proteins can lead to interference between the resulting paralogues:
mutations in only one of the paralogues can “poison” the oligomer and affect the ancestral
function5. It has been suggested that this constrains the evolution of paralogues on the one hand,
but also promotes additional regulatory complexity on the other5. Specifically, if sequence
divergence of the homomeric paralogues impairs the ancestral function, the mutated paralogue
acts as a highly specific competitive inhibitor of the ancestral protein. This is referred to as
paralogue interference5. To prevent this type of interference, the paralogues will be under
selection pressure to develop mechanisms that prevent their cross-interaction, for example by
mutations that drive expressional separation. Importantly, if paralogues that are separated by
expression give rise to protein complexes, these complexes will contain either the ancestral or
the duplicated member of the paralogous pair, but not both at the same time. Expressional
separation after duplication is a common pattern on the RNA level5. However, expressional
separation has not been analyzed on the protein level in a general way and its role in protein
complex evolution remains elusive5.

A main limitation in studying protein complex evolution has been the difficulty of measuring
protein complex states in a high-throughput manner. Here we address this by using
data-independent acquisition and analysis of mass spectrometry data, particularly by the
sequential windowed acquisition of all theoretical mass spectra (SWATH-MS)6. We study complex states
at the proteome level in human, mouse, Drosophila melanogaster and Saccharomyces
cerevisiae7–9. We demonstrate that protein co-expression profiles determined by SWATH-MS
are substantially conserved across species and that they can be used to map the diversification
of protein paralogues, as measured for instance by divergence in sequence or interactors.
Critically, we find that protein paralogues that are known to be part of a protein complex are
enriched in homomer-derived paralogues that have separated in their expression levels. These
are precisely the proteins shown to be affected by interference, and our finding therefore
supports the notion that this mechanism can propel protein complex diversification and
expression divergence. In conclusion, our study indicates that quantitative proteomic data can
be used to infer protein complex relationships and identifies paralogue interference as a
constraint on their evolution.

Results
SWATH-MS co-expression profiles recover protein complexes and are conserved across species
To analyze the role of paralogue diversification in the evolution of protein complexes, we first
set out to define a suitable methodological framework. In recent years, a number of studies have
taken advantage of the rapidly accumulating body of data on protein-protein interactions and
protein expression to analyze constraints on, and evolution of, protein complexes1–4,10–13. Some
of these studies have shown that the expression levels of protein complex subunits generally
covary, and that, conversely, co-expression patterns can be used as a proxy for functional,
interaction and complex relatedness. Here we used abundance profiles derived from SWATH-MS
data across four species: human, mouse, Drosophila melanogaster and Saccharomyces
cerevisiae7–9 (Figure 1B). The proteome dataset of each species contains 40 to 112 samples.
Overall, between 1610 and 3171 proteins were consistently quantified per species dataset across
samples (Table S1 and Figure S1). To examine covariance of protein pairs, we used Spearman's
rank correlation coefficient (rs). First, we aimed to show that covariation patterns in our data
preferentially recall known protein complexes, and to detail their conservation across species.
We used manually curated catalogues of protein complexes as a benchmark, specifically
CORUM (refs 14,15) for human and mouse complexes, DroID (ref 16) for Drosophila melanogaster
complexes and CYC2008 (ref 17) for Saccharomyces cerevisiae complexes. The results showed that,
in the datasets of all four species, members of the same complex have significantly higher rs
than protein pairs not annotated as members of the same complex (Figure S2, Wilcoxon
signed-rank test p-values < 0.001, number of pairs human = 17453, mouse = 1739, Drosophila
melanogaster = 2261, Saccharomyces cerevisiae = 1807). To further characterize conserved
covariance profiles, we performed functional enrichment analysis with DAVID (refs 18,19). We
found that protein pairs with highly covarying abundance profiles in human and mouse
(rs > 0.8) are enriched in mitochondrial processes. Interestingly, when we relaxed the cutoff to
rs > 0.6, protein pairs functionally annotated with “splicing”, “signaling pathways regulating
metabolism”, “proliferation”, “cell-cell adhesion” and “immune responses” were additionally
enriched (Table S2, EASE score, a modified Fisher exact test p-value < 0.05). Figure 1C
illustrates this point by showing that the correlation network with edges rs > 0.8 recovers the
ATP synthase complex and the NADH dehydrogenase complex, whereas the correlation
network with edges rs > 0.6 additionally recovers complexes in the tricarboxylic acid cycle and
the spliceosome. Next, we asked whether protein pairs with highly correlated abundance in one
species are also highly correlated in another species. For this, we selected the most and least
correlating orthologue pairs in one species (top and bottom 0.5 percentile) and tested whether
the most correlating pairs also showed higher covariance than the least correlating pairs in a
second species. We found that this was the case for all species combinations, indicating that
covariance profiles are conserved across species (Figure 1D and Figure S3, Wilcoxon signed-rank
test p-values < 0.001, number of mouse orthologue pairs that are highly correlated in human /
number of human orthologue pairs that are highly correlated in mouse = 1429). Consequently,
proteins that belong to a protein complex in one species also correlate more strongly in a second
species (Figure 1D and Figure S3, Wilcoxon signed-rank test p-values < 0.001, number of pairs
in mouse that are annotated as CORUM complex pairs in human = 1807, number of pairs in
human that are annotated as CORUM complex pairs in mouse = 331). Taken together, our
analyses indicate that protein complex members exhibit coordinated expression, and that such
coordinated expression is conserved across species.
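
The cross-species selection step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. It assumes that “top and bottom 0.5 percentile” means the 0.5% most and least correlated pairs, that correlations are stored in dictionaries keyed by orthologue pair (a hypothetical layout), and it compares medians in place of the Wilcoxon test reported in the text. The function names and the toy values are made up for illustration.

```python
from statistics import median


def split_extremes(corr_by_pair, fraction=0.005):
    """Return (top, bottom): the most and least correlated `fraction` of pairs."""
    ranked = sorted(corr_by_pair, key=corr_by_pair.get)  # ascending r_s
    k = max(1, int(len(ranked) * fraction))              # at least one pair per tail
    return ranked[-k:], ranked[:k]


def compare_in_second_species(corr_a, corr_b, fraction=0.005):
    """Median r_s in species B for the top and bottom pairs selected in species A."""
    # Only pairs quantified in both species can be compared.
    shared = {pair: r for pair, r in corr_a.items() if pair in corr_b}
    top, bottom = split_extremes(shared, fraction)
    return median(corr_b[p] for p in top), median(corr_b[p] for p in bottom)


# Toy, made-up correlations keyed by orthologue pair (assumption for illustration).
human = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "C"): -0.5, ("C", "D"): 0.7}
mouse = {("A", "B"): 0.8, ("A", "C"): 0.2, ("B", "C"): -0.3, ("C", "D"): 0.6}
print(compare_in_second_species(human, mouse, fraction=0.25))  # (0.8, -0.3)
```

With real data, the two groups of second-species correlations would be passed to a rank-based test rather than summarized by their medians.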

SWATH-MS correlation profiles reflect evolutionary trajectories of paralogues
We next asked whether our framework is able to capture important principles driving the
evolution of protein complexes. To this end, we focused on protein paralogues, because
paralogue diversification has been proposed as a significant factor in complex evolution4,5.
First, we wanted to verify that our protein abundance matrices recapitulated paralogue
divergence over time, as well as diversification of protein interactions. We identified paralogues
using Ensembl 92 (ref 27) and classified them into paralogue families, whereby a family was
defined as the set of genes that emerged from a single ancestral gene by duplication (Figure S4
and Table S3, number of paralogue families human = 73, mouse = 114, Drosophila
melanogaster = 3, Saccharomyces cerevisiae = 9, mean size of paralogue families human = 9,
mouse = 9, Drosophila melanogaster = 11, Saccharomyces cerevisiae = 9). On a general scale, we
found that paralogous proteins exhibit a stronger degree of covariance than non-paralogous
proteins in all species examined (Figure 2A, Wilcoxon signed-rank test p-values < 0.001,
number of paralogous pairs human = 1302, mouse = 1964, Drosophila melanogaster = 119 and
Saccharomyces cerevisiae = 191). To assess whether paralogue covariance patterns
recapitulated sequence diversification, we tested whether a higher frequency of differentiating
mutations between paralogous pairs corresponded to a decrease in protein covariance. To do
so, we quantified all pairwise correlations among paralogue family members and determined
for each pair the rates of synonymous and non-synonymous changes, i.e. the number of
nucleotide changes between the two paralogues that do not alter, or that alter, the encoded
amino acid, respectively, relative to the paralogue length. As expected, the co-expression of
paralogous pairs within a paralogue family was negatively associated with the rate of
synonymous and non-synonymous nucleotide changes (Figure 2B left and center,
respectively; due to limited numbers of observations we could not examine the Drosophila
melanogaster and Saccharomyces cerevisiae datasets in this and subsequent analyses).
Furthermore, the association of covariance with non-synonymous changes was stronger than
with synonymous changes, in line with the intuition that non-synonymous mutations generally
have a greater phenotypic effect (paired sample t-test p-values non-synonymous changes:
human = 0.005 and mouse = 0.009, synonymous changes: human = 0.04 and mouse = 0.05;
number of paralogue groups: human = 23 and mouse = 64). Finally, we reasoned that, if
covariance is a good proxy for sequence diversification, paralogues with more strongly
diverging interactomes, i.e. a smaller fraction of shared interactors, should also exhibit more
strongly diverging expression patterns, as interactome rewiring is likely a consequence of
sequence change. To test whether paralogue covariance correlates with the diversity of protein
interactions, we calculated for each paralogous pair the Jaccard index of interaction partners,
defined as the size of the intersection of the two interaction partner sets divided by the size of
their union. By these means, we found that lower covariance within a paralogue family
was associated with more strongly diverging interactomes (Figure 2B right, paired sample
t-test p-values human = 0.008 and mouse = 0.09; number of paralogue groups in human = 35 and
mouse = 7). Taken together, the data show that protein covariance recapitulates paralogue
divergence over time as well as the diversification of protein interactions that drives the
evolution of new protein complexes.
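
The Jaccard index of interaction partners used above can be written down directly from its definition. The following is a minimal sketch, not the authors' implementation. The function name, the partner names and the choice to return None when neither protein has any annotated partner are assumptions made for illustration.

```python
def jaccard_index(partners_a, partners_b):
    """Shared interaction partners divided by all distinct partners of a pair."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        # Undefined when neither paralogue has annotated partners (assumption).
        return None
    return len(a & b) / len(union)


# Hypothetical partner sets for two paralogues: two shared, four distinct partners.
print(jaccard_index({"X", "Y", "Z"}, {"Y", "Z", "W"}))  # 0.5
```

A value of 1 means identical interactomes, and values close to 0 indicate strongly diverged interactomes.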

Negatively correlating SWATH-MS profiles from homomer-derived paralogues are functionally divergent and prominent in complex members
Since correlation of quantitative proteomics data can inform us about the divergence of
evolutionary trajectories, we used it to assess the notion of paralogue interference
(Figure 2C), first on a whole-proteome level and then focusing specifically on
known protein complexes. We first reasoned that, if the divergence in protein abundance can
serve as a mechanism to escape interference, then homomeric paralogues should be
differentially abundant to a greater extent than monomer-derived paralogues. To test this, we
retrieved homo-, hetero- and monomer annotations from InterEvol (ref 32) and classified
paralogous pairs either as homomer-derived if at least one member was annotated as a homomer,
or as monomer-derived when none of the members was annotated as either a homo- or heteromer.
In line with our expectations, we found that among negatively correlating pairs, homomer-derived
paralogues showed an enrichment factor of 6.9 in human and 1.8 in mouse over
monomer-derived paralogues (Figure S5; negative correlation rs < -0.7; number of human
homomeric pairs = 540, human monomeric pairs = 620, mouse homomeric pairs = 615, mouse
monomeric pairs = 1121, Fisher’s exact test p-values = 0.04 and 0.2, respectively). Of note, we
also found that among all negatively correlating paralogous pairs, homomer-derived pairs were
more likely to differ in abundance across tissues than monomer-derived pairs, as defined
by the tissue-specific expression analysis of the Human Proteome Map (HPM; ref 20) (Figure 2D
and Figure S6, number of pairs human = 21 and mouse = 36). This indicates that homomeric
paralogues are more prone to be affected by spatial separation of expression, and gives credence
to the notion that this separation has evolved in response to protein interference.
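
To make the enrichment analysis above concrete, the sketch below computes a one-sided Fisher's exact test and an enrichment factor for a 2x2 table of pair counts. It is a hedged illustration under stated assumptions: the text does not define the enrichment factor, so it is taken here as the ratio of the fractions of negatively correlating pairs in the two classes, and the counts in the example are invented and are not the study's data. The function names are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows: homomer-derived, monomer-derived pairs.
    Columns: negatively correlating, all other pairs.
    Returns P(top-left cell >= a) under the hypergeometric null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


def enrichment_factor(a, b, c, d):
    """Fraction of negative pairs in row 1 divided by that in row 2 (assumed definition)."""
    return (a / (a + b)) / (c / (c + d))


# Invented toy counts: 3 of 4 homomer-derived and 1 of 4 monomer-derived pairs are negative.
print(enrichment_factor(3, 1, 1, 3))     # 3.0
print(fisher_exact_greater(3, 1, 1, 3))  # 17/70, about 0.243
```

The one-sided alternative matches the statement in the methods that one-sided tests were used where applicable.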

Finally, we asked what impact the mechanism of paralogue interference has had on the
diversification of protein complexes. If interference from homomeric paralogues has played
any role in the organization of complexome diversity, then there must be a subset of complexes
whose homomer-derived paralogue members exhibit highly divergent expression patterns; and,
as a corollary, such divergence should not be observed in the case of monomer-derived
paralogues. We therefore plotted the distribution of correlations of protein abundance for all
paralogues present in CORUM complexes across the two classes listed above. Strikingly, we
found that protein complexes containing homomer-derived paralogues contained two discrete
subsets of highly positively and highly negatively correlating paralogue members (Figure 2E
and Figure S7). In support of the specificity of this pattern to homomer-derived paralogues, we
found no evidence for a similar distribution among the monomer-derived paralogues.
Furthermore, by calculating the abundance correlation for all pairs of CORUM complex members,
irrespective of whether they are paralogous or non-paralogous proteins, we found that
homomer-derived paralogues are strongly enriched among the negatively correlating complex
pairs (Figure S8; negative correlation rs < -0.7; number of human homomeric CORUM
pairs = 5645, human CORUM pairs = 11820, mouse homomeric CORUM pairs = 457, mouse CORUM
pairs = 1572, Fisher’s exact test p-values < 0.001 and 0.05, respectively). This leads us to
suggest a classification of
protein complexes containing homomer-derived paralogues into two distinct groups: those where
paralogues do not interfere with each other’s function, so that the need to minimize
their spatiotemporal co-existence is alleviated; and those that have diverged in their abundance
levels under the pressure of negative interference. To further corroborate our findings and
conceptualization, we manually curated the whole set of protein complexes in the ‘escaped’,
i.e. negatively correlating, class. Consistent with our proposal, we found literature evidence
supporting mutual exclusivity/complementarity for 84% of the human and 67% of the mouse
complexes in this category (number of complexes with negatively correlating pairs
human = 13 and mouse = 6; for the complete list of paralogue correlations see Table S4 and
Figures S9 and S10). In the human dataset, for example, negatively correlating, homomer-derived
CORUM paralogous pairs included two subcomplexes of the emerin complex, one with
lamin A and the other with lamin B1. Lamin A has been shown to regulate nuclear mechanisms
and is also associated with several diseases, including Emery–Dreifuss muscular dystrophy. In
contrast, lamin B1 is involved in intermediate filaments of the cytoskeleton, but not in
nuclear mechanisms21. Another example is the pair of homomer-derived paralogues Hspa5 and
Hspa8, which are part of the HCF-1 complex involved in cell cycle and transcriptional
regulation21. Whereas Hspa5 localizes to the ER lumen, Hspa8 resides in the nucleolus and the
cell membrane21. In both the human and the mouse dataset, these paralogues were among the
most negatively correlated pairs. Further examples from the mouse dataset include negatively
correlating homomeric paralogous pairs of the ubiquitin E3 ligase complex, namely Cul1 and
Cul2, as well as Cul2 and Cul3. This is consistent with studies that established that the
paralogues Cul1, Cul2 and Cul3 are involved in three distinct subcomplexes22. Additionally,
the mouse dataset showed two negatively correlating members of the ubiquitin-proteasome
complex, UBQLN1 and UBQLN2. Only UBQLN2 was shown to be able to translocate to the
nucleus, and it has been shown that after heat stress, the two proteins are in distinct subcellular
locations23. Taken together, our data indicate that the class of proteins postulated to be more
prone to protein interference, that is, homomer-derived paralogues, and especially those that
are part of protein complexes, exhibit a stronger divergence in protein abundance than other
paralogues, as well as subcellular specialization. By this means, our correlation studies pinpoint
interference escape as an important mechanism of protein complex evolution.

Discussion
In this study we show that rigorous statistical analysis of sets of protein abundance maps across
species and tissues can inform us about the evolution of protein complexes. We used this
framework to address the role of paralogue interference and diversification in protein complex
evolution and specialization. Besides demonstrating that protein complex members tend to
display highly correlated expression profiles, and that these profiles are conserved across
species, we also show that protein abundance matrices recapitulate paralogue divergence
over time, as well as diversification of protein interactions. Our study culminates with the
observation that homomeric paralogues that are part of protein complexes show highly
divergent expression patterns. This supports the notion that expression divergence is a
mechanism by which protein interference among homomeric paralogues is avoided and complexes
are diversified.

While many general aspects of protein complex architectural principles and evolution have been
addressed in previous studies, to the best of our knowledge this is the first time that the escape
from paralogue interference, specifically by separation of expression, has been analyzed on the
proteome level and across species. Typically, positive covariance of expression has been used
to study conservation of protein complexes. Here we show that, since divergence of
co-expression patterns seems to scale to some extent with functional and structural
diversification, negative correlation may pinpoint specific evolutionary constraints in
maintaining separation of homomer-derived paralogues. In fact, our data suggest that
anti-correlating complex pairs are enriched in homomer-derived paralogues (Figure 2E). This is
in agreement with the suggestion that homomer-derived paralogues are the most likely to be
affected by protein-protein interference, which can be resolved by separating expression. We
find that only a fraction of homomer-derived paralogues exhibit such anti-correlation, while
others strongly correlate. We therefore suggest that complex paralogues can be divided into two
main groups: first, highly correlating, co-evolved subunits and, second, negatively correlating
paralogues which, by being expressed in a complementary fashion, minimize the risk of
interference. We highlight several examples of paralogues belonging to the latter class, for
example lamin A and B1, which could represent suitable targets for follow-up studies.

At present, the scope and generalizability of our conclusions are limited by several factors.
First, protein complex formation and evolution are likely to have many determinants, which can
mask the effect of paralogue interference or compensate for it in ways other than expression
divergence. Second, our analyses rest on resources curating protein complexes and
distinguishing homomer- from monomer-derived paralogues. Broader and improved
curation and annotation will allow more extensive and statistically more robust analyses
and conclusions. This has a clear impact on comparative studies, where the extent of annotation
varies greatly between species. Finally, the correlational nature of this and many other
methodologically related studies must be stressed. We show that proteome-wide, cross-tissue
and cross-species analyses are capable of capturing patterns that would otherwise be
indiscernible. In this respect, we identified specific trends behind complex evolution which
support specific proposals, such as complex formation driven by homomer-derived paralogues
and interference escape. However, targeted studies are needed to decode the mechanisms
underlying these trends. Such investigations, together with correlation analyses covering larger
sets of conditions, hold in our view the greatest promise to refine our understanding of the
forces shaping protein complex evolution.

Material and methods
SWATH-MS datasets and co-expression measures
We obtained all protein abundance data from publicly available SWATH-MS datasets. For
human we used the data of Guo et al. 2019 (ref 7), for mouse the data of Williams et al. 2018
(ref 9), for fly the data of Okada et al. 2016 (ref 8) and for yeast the SWATH-MS dataset of the
yeast strains described in Zhu et al. 2008 (ref 24) and Brem et al. 2002 (ref 25) (manuscript in
preparation). For further description of the datasets see Table S1. All analyses were based on
the available protein matrix with relative protein intensities.

For each protein pair, we calculated the Spearman correlation of raw protein abundances across
all samples. We used the cor function of the R package stats v3.4.4 (ref 26) with the option
pairwise.complete.obs to compute the correlation between each pair of proteins using all
complete pairs of observations on those proteins.
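
The co-expression measure described above can be illustrated with a standard-library Python sketch. This is not the R code used in the study. It assumes that missing quantifications are encoded as None and that ranks are computed after dropping samples in which either protein is missing. All function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i..j, shifted to start at 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_pairwise_complete(x, y):
    """Spearman's r_s over samples in which both proteins were quantified."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy abundances across five samples; None marks a missing value (assumed encoding).
protein_1 = [1.0, 2.0, None, 4.0, 5.0]
protein_2 = [2.0, 1.0, 3.0, None, 6.0]
print(spearman_pairwise_complete(protein_1, protein_2))  # 0.5
```

Because each protein pair uses its own set of complete samples, correlations for different pairs may be based on different numbers of observations.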

Orthologue identification
Orthologue mapping was conducted with BioMart Ensembl 92 (ref 27; release April 2018). We
considered only genes with a “one2one” mapping, i.e. when the gene in one species has only
one defined orthologue in another species. To translate protein and gene identifiers, we used the
R package biomaRt v2.34.2 (refs 28,29).
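
As an illustration of the “one2one” criterion, the sketch below keeps only orthologue pairs in which each gene maps to exactly one partner in the other species. It is a simplified stand-in that applies the definition given above to a plain list of gene pairs. It does not query Ensembl or BioMart, and the gene identifiers and the function name are hypothetical.

```python
from collections import Counter


def one_to_one(pairs):
    """Keep (gene_a, gene_b) pairs where both genes occur in exactly one pair."""
    pairs = set(pairs)  # ignore duplicated records
    a_counts = Counter(a for a, _ in pairs)
    b_counts = Counter(b for _, b in pairs)
    return sorted((a, b) for a, b in pairs if a_counts[a] == 1 and b_counts[b] == 1)


# Hypothetical mapping: hB has two mouse orthologues, and hC and hD share one.
mapping = [("hA", "mA"), ("hB", "mB1"), ("hB", "mB2"), ("hC", "mC"), ("hD", "mC")]
print(one_to_one(mapping))  # [('hA', 'mA')]
```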

Paralogue identification and analysis
We identified paralogues and paralogue families with Ensembl 92 (ref 27; release April 2018).
We defined a paralogue family as all paralogues connected via direct pairs. We determined the
rates of non-synonymous and synonymous changes between the paralogous pairs using
Ensembl 92 (ref 27). We next assessed the similarity of interaction partners among paralogous
pairs. We obtained the interaction partners of each paralogue from BioGrid v3.5 (refs 30,31) and
calculated the Jaccard similarity index, that is, the intersection divided by the union of distinct
interaction partners.
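
The family definition given at the start of this subsection (all paralogues connected via direct pairs) corresponds to finding connected components in a graph whose edges are paralogous pairs. The following union-find sketch shows one way to do this. It is an assumption about how such grouping could be implemented, not the authors' procedure, and the gene names are placeholders.

```python
def paralogue_families(pairs):
    """Group genes into families: connected components of the paralogue-pair graph."""
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving keeps trees shallow
            gene = parent[gene]
        return gene

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two families

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())


# Placeholder pairs: g1-g2 and g2-g3 are linked, so g1 and g3 end up in one family.
print(paralogue_families([("g1", "g2"), ("g2", "g3"), ("g4", "g5")]))
# [['g1', 'g2', 'g3'], ['g4', 'g5']]
```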

To determine whether a paralogous pair is derived from an ancestral protein that either formed
homomers or was only present as a monomer, we used the InterEvol database (release 2010),
designed for the analysis of co-evolution events at the structural interfaces of hetero- and
homo-oligomers (ref 32). We considered a paralogous pair as homomer-derived if at least one
member was annotated as a homomer, and as monomer-derived when none of the members was
annotated as either a homo- or heteromer. For manual paralogue annotation, we additionally
considered the UniProt database (release November 2018)21.
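
The classification rule above can be expressed as a small function. This sketch applies the stated rule literally. The label strings, the set-based encoding of annotations and the "unclassified" fallback for the remaining pairs are assumptions made for illustration, and the text does not specify how proteins without any annotation were treated.

```python
def classify_pair(annot_a, annot_b):
    """Classify a paralogous pair from the annotation labels of its two members.

    Each argument is a set of labels such as {"homomer"}, {"heteromer"} or
    {"monomer"} (hypothetical encoding of the database annotations).
    """
    combined = set(annot_a) | set(annot_b)
    if "homomer" in combined:
        return "homomer-derived"   # at least one member annotated as homomer
    if "heteromer" not in combined:
        return "monomer-derived"   # no member annotated as homo- or heteromer
    return "unclassified"          # heteromer-only pairs fall outside both classes


print(classify_pair({"homomer"}, {"monomer"}))    # homomer-derived
print(classify_pair({"monomer"}, {"monomer"}))    # monomer-derived
print(classify_pair({"heteromer"}, {"monomer"}))  # unclassified
```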
Functional annotation
For functional enrichment analysis we used DAVID v6.8 (ref. 18) with the following parameters:
Annotation categories: GOTERM_BP_DIRECT; GO_Kappa similarity: similarity term
overlap = 3, similarity threshold = 0.5; Classification: initial group membership = 2, final
group membership = 2, multiple linkage threshold = 0.5; Enrichment thresholds: EASE = 0.05.
We retrieved lists of protein complexes from CORUM v2.0 (refs 14, 15). Tissue-specific
expression data were retrieved from the Human Proteome Map (HPM) portal (ref. 20). To compare
expression between tissues, the data were normalized using the normalizeBetweenArrays
function from the R package limma v3.40.6 (ref. 33).
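The source does not state which method was passed to normalizeBetweenArrays. Assuming the quantile method, which the limma documentation describes as the default for single-channel data, the idea can be sketched in plain Python as below. The sketch is not limma's implementation: it assumes no missing values and breaks ties by sort order instead of averaging them. The function name and the toy matrix are hypothetical.

```python
def quantile_normalize(matrix):
    """Quantile-normalize the columns of a matrix (rows: proteins, columns: tissues).

    Assumptions of this sketch: no missing values; ties are not averaged.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across columns at each rank.
    rank_means = [
        sum(col[i] for col in sorted_cols) / n_cols for i in range(n_rows)
    ]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = rank_means[rank]  # replace value by its rank mean
    return result


# Toy intensities: two tissues measured on very different scales.
toy = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
print(quantile_normalize(toy))
```

After normalization both columns contain the same set of values (6, 12 and 18), so differences between tissues reflect rank within the tissue and not its overall intensity scale.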
Statistics and visualization
We conducted all statistical analyses with R v3.4.4 (ref. 26). For the Fisher's exact tests and
the Wilcoxon tests we used a one-sided alternative hypothesis where applicable. We drew density
graphs, bar plots and boxplots with the R package ggplot2 v3.1.0 (ref. 34). Boxplots were drawn
with default settings (lower and upper hinges correspond to the first and third quartiles;
whiskers extend up to 1.5 × the inter-quartile range, i.e. the distance between the first and
the third quartile). In addition, we used Van de Peer’s web tool to draw the Venn
diagrams (ref. 35). For network representations, we used Cytoscape v3.6.1 (ref. 36).
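For readers unfamiliar with the one-sided Fisher's exact test, a minimal sketch of the "greater" alternative follows. It sums hypergeometric probabilities for a 2 × 2 table and is not a substitute for the R function the authors used. The function name and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided p-value for the table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the table's margins fixed.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)  # largest value the top-left cell can take
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, col1)


# Hypothetical counts: rows = in / not in the tested group,
# columns = inside / outside the extreme part of the distribution.
p_value = fisher_exact_greater(3, 1, 1, 3)
print(round(p_value, 4))
```

With these counts the tail sum is (16 + 1) / 70, so an overlap at least this large would be expected by chance in roughly a quarter of cases.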
References
1. Romanov, N. et al. Disentangling Genetic and Environmental Effects on the Proteotypes of Individuals. Cell 177, 1308-1318.e10 (2019).
2. Ahnert, S. E., Marsh, J. A., Hernandez, H., Robinson, C. V. & Teichmann, S. A. Principles of assembly reveal a periodic table of protein complexes. Science (80-. ). 350, 2245–2245 (2015).
3. Marsh, J. A. & Teichmann, S. A. Structure, Dynamics, Assembly, and Evolution of Protein Complexes. Annu. Rev. Biochem. 84, 551–575 (2015).
4. Pereira-Leal, J. B., Levy, E. D., Kamp, C. & Teichmann, S. A. Evolution of protein complexes by duplication of homomeric interactions. Genome Biol. 8, (2007).
5. Kaltenegger, E. & Ober, D. Paralogue Interference Affects the Dynamics after Gene Duplication. Trends Plant Sci. 20, 814–821 (2015).
6. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: A new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, 1–17 (2012).
7. Guo, T. et al. Quantitative Proteome Landscape of the NCI-60 Cancer Cell Lines. iScience 21, 664–680 (2019).
8. Okada, H., Ebhardt, H. A., Vonesch, S. C., Aebersold, R. & Hafen, E. Proteome-wide association studies identify biochemical modules associated with a wing-size phenotype in Drosophila melanogaster. Nat. Commun. 7, 1–11 (2016).
9. Williams, E. G. et al. Quantifying and localizing the mitochondrial proteome across five tissues in a mouse population. Mol. Cell. Proteomics 17, 1766–1777 (2018).
10. Wan, C. et al. Panorama of ancient metazoan macromolecular complexes. Nature 525, 339–344 (2015).
11. Skinnider, M. A. et al. An Atlas of Protein-Protein Interactions Across Mammalian Tissues. SSRN Electron. J. (2018). doi:10.2139/ssrn.3219264
12. Heusel, M. et al. Complex-centric proteome profiling by SEC-SWATH-MS. Mol. Syst. Biol. 15, 1–22 (2019).
13. Snider, J. et al. Fundamentals of protein interaction network mapping. Mol. Syst. Biol. 11, 848 (2015).
14. Giurgiu, M. et al. CORUM: The comprehensive resource of mammalian protein complexes - 2019. Nucleic Acids Res. 47, D559–D563 (2019).
15. Ruepp, A. et al. CORUM: The comprehensive resource of mammalian protein complexes-2009. Nucleic Acids Res. 38, 497–501 (2009).
16. Murali, T. et al. DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila. Nucleic Acids Res. 39, D736-43 (2011).
17. Pu, S., Wong, J., Turner, B., Cho, E. & Wodak, S. J. Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 37, 825–831 (2009).
18. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
19. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).
20. Kim, M.-S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).
21. UniProt Consortium, T. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46, 2699–2699 (2018).
22. Bosu, D. R. & Kipreos, E. T. Cullin-RING ubiquitin ligases: Global regulation and activation cycles. Cell Division 3, 7 (2008).
23. Hjerpe, R. et al. UBQLN2 Mediates Autophagy-Independent Protein Aggregate Clearance by the Proteasome. Cell 166, 935–949 (2016).
24. Zhu, J. et al. Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks. Nat. Genet. 40, 854–861 (2008).
25. Brem, R. B., Yvert, G., Clinton, R. & Kruglyak, L. Genetic dissection of transcriptional regulation in budding yeast. Science (80-. ). 296, 752–755 (2002).
26. R Core Team. R: A Language and Environment for Statistical Computing (2019). Available at: https://www.r-project.org/.
27. Zerbino, D. R. et al. Ensembl 2018. Nucleic Acids Res. 46, (2018).
28. Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).
29. Durinck, S. et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439–3440 (2005).
30. Stark, C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 34, D535-9 (2006).
31. Oughtred, R. et al. The BioGRID interaction database: 2019 update. Nucleic Acids Res. 47, D529–D541 (2019).
32. Faure, G., Andreani, J. & Guerois, R. InterEvol database: Exploring the structure and evolution of protein complex interfaces. Nucleic Acids Res. 40, 847–856 (2012).
33. Ritchie, M. E. et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
34. Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag New York, 2016).
35. Van de Peer, Y. Calculate and draw custom Venn diagrams. (2018).
36. Shannon, P. et al. Cytoscape: A software Environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
[Figure 1 graphic. Labels recoverable from the extracted image text: (A) homomeric dimers A and A′, paralogous heteromer, development of an inactive heteromer, and two routes that prevent cross-interaction (coevolution, expressional separation); (B) samples (60, 40, 60, 112) and proteins (3171, 3648, 1610, 1862) listed for human, mouse, Drosophila melanogaster and Saccharomyces cerevisiae, in that order; (C) Spearman correlation networks at rs > 0.8 and rs > 0.6, covering ATP synthase subunits, NADH dehydrogenase subunits, tricarboxylic acid cycle enzymes and spliceosome-associated proteins; (D) four density plots of rs for mouse and human orthologues (high rs vs low rs pairs, complex vs non-complex pairs), each comparison marked ***.]
Figure 1. SWATH-MS co-expression profiles of protein complex modules are conserved
across species.
(A) Schematic drawing illustrating possible evolutionary routes from homomer-derived
paralogues into protein complexes.
(B) Study design. Protein co-expression profiles were acquired by SWATH-MS from 272
samples covering the proteomes of four species. Spearman’s rank correlation (rs) was used to
measure profile similarity.
(C) Conserved correlation networks of human and mouse. Nodes represent proteins; edges
indicate high correlation in both the human and the mouse dataset. Red edges are annotated in
CORUM. In the panels on the left, a stringent correlation cutoff of rs > 0.8 was chosen; in the
panels on the right, a more relaxed cutoff of rs > 0.6 was used. For comparison, correlation
values of human and mouse were quantile normalized.
Left: Proteins of the ATP-synthase complex (top) and proteins of the NADH dehydrogenase
complex (bottom).
Right: Proteins involved in the tricarboxylic acid cycle (top) and proteins involved in the
spliceosome (bottom).
(D) (I) Mouse Spearman correlation for mouse orthologues that are highly correlated in human
(top 0.5 percentile, rs > 0.8, npairs = 1429) and for mouse orthologues that do not correlate in
human (bottom 0.5 percentile, npairs = 1429). Pairs that correlate highly in human also correlate
significantly higher in mouse (md 0.2 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).
(II) Mouse Spearman correlation for pairs that are annotated as CORUM complex pairs in
human (npairs = 1807) and for pairs that are not annotated in CORUM (npairs = 712803). Human
CORUM pairs correlate significantly higher in mouse (md 0.35 vs 0.0, Wilcoxon signed-rank
test p-value < 0.001).
(III) Human Spearman correlation for human orthologues that are highly correlated in mouse
(top 0.5 percentile, rs > 0.8, npairs = 1429) and for human orthologues that do not correlate in
mouse (bottom 0.5 percentile, npairs = 1429). Pairs that correlate highly in mouse also correlate
significantly higher in human (md 0.2 vs 0.0, Wilcoxon signed-rank test p-value < 0.001).
(IV) Human Spearman correlation for pairs that are annotated as CORUM complex pairs in
mouse (npairs = 331) and for pairs that are not annotated in CORUM (npairs = 714279). Mouse
CORUM pairs correlate significantly higher in human (md 0.3 vs 0.0, Wilcoxon signed-rank
test p-value < 0.001).
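The profile similarity measure named in panel (B), Spearman's rank correlation, can be sketched as the Pearson correlation of rank-transformed profiles, with tied values given their average rank. This is a minimal illustration and not the authors' R code. The function names and the intensity values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError for constant input."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rs: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical intensities of two proteins across five samples.
protein_a = [1.0, 2.5, 3.1, 4.8, 9.9]
protein_b = [10.0, 20.0, 35.0, 50.0, 400.0]
print(spearman(protein_a, protein_b))
```

Because only ranks enter the calculation, the outlying last value of the second profile does not lower the coefficient; any strictly increasing relationship yields rs = 1.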
[Figure 2 graphic. Labels recoverable from the extracted image text: (A) density plots of rs for paralogues vs non-paralogues in human, mouse, fly and yeast, each comparison marked ***; (B) rs of the top third vs the bottom third of paralogous pairs ranked by non-synonymous changes, synonymous changes and interactome divergence, with in-plot annotations (p-value; median rs top/bottom; number of paralogue groups): human 0.005; 0.23/0.39; 23, human 0.039; 0.23/0.32; 28, human 0.008; 0.24/0.18; 35, mouse 0.009; 0.34/0.39; 64, mouse 0.059; 0.34/0.35; 78, mouse 0.097; 0.41/0.25; 7; (C) percent of overlapping top expressed tissues between paralogous pairs, homomer-derived vs monomer-derived; (D) density of rs for homomer-derived and monomer-derived paralogues, CORUM vs not CORUM, with the percent of pairs in the most negatively and most positively correlating dodeciles.]
Figure 2. Co-expression profiles of paralogues are conserved across species and reflect
sequence divergence within the paralogue family. Negatively correlating homomer-derived
paralogues are functionally divergent and prominent in complexes.
(A) Spearman correlation of paralogous pairs in yeast, fly, mouse and human compared to all
measured non-paralogous pairs (number of paralogous pairs: human = 1302, mouse = 1964,
fly = 119 and yeast = 191). Paralogues correlate significantly higher in all species (Wilcoxon
signed-rank test p-values < 0.001; md 0.12 vs 0.03, md 0.25 vs 0.01, md 0.34 vs 0.08 and
md 0.17 vs 0.04, respectively).
(B) Left: Spearman correlation coefficients of paralogous pairs within the top and the
bottom third of non-synonymous sequence changes within their paralogue family. Families that
have a range in Spearman correlation of < 0.4 and a range in non-synonymous changes of < 0.2
were excluded. Paralogous pairs with fewer non-synonymous changes tend to correlate higher
(paired sample t-test p-values: human = 0.005 and mouse = 0.009; number of paralogue families:
human = 23 and mouse = 64).
Middle: Spearman correlation coefficients of paralogous pairs within the top and the
bottom third of synonymous sequence changes within their paralogue family. Families that have
a range in Spearman correlation of < 0.4 and a range in synonymous changes of < 0.2 were
excluded. Paralogous pairs with fewer synonymous changes tend to correlate higher (paired
sample t-test p-values: human = 0.04 and mouse = 0.05; number of paralogue families:
human = 28 and mouse = 78).
Right: Spearman correlation coefficients of paralogous pairs within the top and the
bottom third of interactome divergence within their paralogue family. To quantify interactome
divergence, the Jaccard index was calculated, defined as the intersection of the interaction
partners divided by the union of the interaction partners of each pair. All interaction partners
from BioGRID were considered. Families that have a range of the Jaccard index of < 0.2 and a
range in Spearman correlation of < 0.2 were excluded. Paralogous pairs with more similar
interaction partners tend to correlate higher (paired sample t-test p-values: human = 0.008 and
mouse = 0.09; number of paralogue families: human = 35 and mouse = 7).
(C) The overlap of the four most expressed tissues between paralogous pairs (as defined by
HPM) is compared between strongly negatively correlating homomer-derived and
monomer-derived paralogous pairs (rs < -0.5, human npairs = 21, mouse npairs = 36). Negatively
correlating homomer-derived paralogues tend to be expressed more diversely across different
tissues compared to monomer-derived paralogues.
(D) Left: Spearman correlation of homomer-derived paralogues that are annotated as CORUM
complex pairs (npairs = 42) and for pairs that are not annotated in CORUM (npairs = 474).
Homomer-derived paralogues are enriched in the bottom and top dodeciles of the correlation
distribution (Fisher’s exact test p-values 0.06 and 0.1).
Right: Spearman correlation of monomer-derived paralogues that are annotated as CORUM
complex pairs (npairs = 51) and for pairs that are not annotated in CORUM (npairs = 569).
Monomer-derived paralogues are not enriched in the bottom dodecile but are enriched in the top
dodecile of the correlation distribution (Fisher’s exact test p-values 0.5 and 0.06).
*** indicates p-values ≤ 0.001, ** ≤ 0.01, * ≤ 0.05, + ≤ 0.1.
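The within-family comparison of panel (B) can be sketched as follows. This is an illustration under several assumptions that the caption does not settle: the exclusion rule is read literally (a family is dropped only if both ranges fall below their thresholds), each third contains floor(n/3) pairs, families with fewer than three pairs are skipped, and each third is summarized by its mean rs. The sketch stops at the paired t statistic and does not compute a p-value. All names and values are hypothetical.

```python
from statistics import mean, stdev


def tertile_summary(family, min_rs_range=0.4, min_dn_range=0.2):
    """family: list of (dn, rs) tuples, one per paralogous pair.

    Returns (mean rs of the third with most changes,
             mean rs of the third with fewest changes), or None if skipped.
    """
    if len(family) < 3:  # assumption: thirds need at least three pairs
        return None
    dn_values = [dn for dn, _ in family]
    rs_values = [rs for _, rs in family]
    low_rs_range = max(rs_values) - min(rs_values) < min_rs_range
    low_dn_range = max(dn_values) - min(dn_values) < min_dn_range
    if low_rs_range and low_dn_range:  # assumption: literal reading of "and"
        return None
    ranked = sorted(family, key=lambda pair: pair[0])
    k = len(ranked) // 3
    fewest = [rs for _, rs in ranked[:k]]
    most = [rs for _, rs in ranked[-k:]]
    return mean(most), mean(fewest)  # assumption: mean as per-family summary


def paired_t_statistic(differences):
    """t statistic of a paired t-test from per-family differences."""
    n = len(differences)
    return mean(differences) / (stdev(differences) / n ** 0.5)


# Hypothetical families: (non-synonymous change rate, rs) per pair.
family_1 = [(0.1, 0.8), (0.3, 0.5), (0.6, 0.1)]
family_2 = [(0.05, 0.30), (0.10, 0.35), (0.15, 0.32)]  # too little variation
print(tertile_summary(family_1))
print(tertile_summary(family_2))
print(paired_t_statistic([1.0, 2.0, 3.0]))
```

The paired design compares the two thirds within each family, so differences in overall co-expression level between families do not enter the test.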