Unravelling microalgal molecular interactions using evolutionary and structural bioinformatics
Dimitrios Vlachakis, Athanasia Pavlopoulou, Dorothea Kazazi, Sophia Kossida
PII: S0378-1119(13)00928-1
DOI: 10.1016/j.gene.2013.07.039
Reference: GENE 38831
To appear in: Gene
Accepted date: 18 July 2013
Please cite this article as: Vlachakis, Dimitrios, Pavlopoulou, Athanasia, Kazazi, Dorothea, Kossida, Sophia, Unravelling microalgal molecular interactions using evolutionary and structural bioinformatics, Gene (2013), doi: 10.1016/j.gene.2013.07.039
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT
Unravelling microalgal molecular interactions using evolutionary
and structural bioinformatics
Dimitrios Vlachakis, Athanasia Pavlopoulou, Dorothea Kazazi, and Sophia Kossida*
Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, Athens 11527, Greece
*Correspondence to: Sophia Kossida, Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, Athens 11527, Greece Tel: + 30 210 6597 199, Fax: +30 210 6597 545 E-mail: [email protected]
Highlights:
We analysed 7 microalgae organisms, carefully selected to belong to diverse groups.
We identified one fission and four fusion events that are considered genuine.
Protein interactions and functional links were identified in the 7 microalgae.
We investigated their evolutionary links via protein phylogenetic profiling.
The 3D structures of the identified proteins were modelled to study their function.
ABSTRACT
Microalgae are unicellular microorganisms indispensable for environmental stability and
life on earth, because they produce approximately half of the atmospheric oxygen while
simultaneously feeding on the harmful greenhouse gas carbon dioxide. Using gene fusion
analysis, a series of five fusion/fission events was identified, that provided the basis for
critical insights to their evolutionary history. Moreover, the three-dimensional structures
of both the fused and the component proteins were predicted, allowing us to envisage
putative protein-protein interactions that are invaluable for the efficient usage, handling
and exploitation of microalgae. Collectively, our proposed approach on the five
fusion/fission algae protein events contributes towards the expansion of the microalgae
knowledgebase, bridging protein evolution of the ancient microalgae species and the
rapidly evolving, modern, bioinformatics field.
Keywords: gene fusion, gene fission, homology modelling, protein association,
microalgae
INTRODUCTION
The demand for sustainable energy reserves and for increased environmental control has
escalated in the last few years, and this trend seems set to continue (Ndimba et al.,
2013). A new weapon in the race to face this challenge is the exploitation of
microalgae, which until now have been under experimental investigation mostly for their
utilization as biofuels. Recently, their capacity to mitigate CO2 emissions led to their
exploitation as essential components of bio-adaptive facades of eco-friendly buildings
that generate renewable energy and produce oxygen. Nevertheless, much work remains
to be done in their basic biology.
Bioinformatics analysis methods comprise a Swiss army knife that can aid the elucidation
of the molecular mechanisms in microalgae and therefore facilitate their full exploitation.
In particular, virtual protein interactomics represents a rapidly developing scientific area
on the boundary line of bioinformatics and molecular biology and comprises an
instrumental tool for the prediction, simulation and modelling of protein complex
interactions, as well as providing insights into transient intracellular signaling pathways
and protein evolution. Bioinformatics approaches can now identify putative protein-
protein interactions purely from genome sequences (Enright et al., 1999; Marcotte et al.,
1999), complementing labour-intensive and time-consuming conventional experimental
methods such as mass spectrometry (Ewing et al., 2007), unlinked non-complementing
mutant detection (Phizicky and Fields, 1995) and the widely used yeast two-hybrid assay
(Fields and Song, 1989).
Recombination by fusion is one of the main evolutionary mechanisms to produce more
complex and stable protein structures (Kummerfeld and Teichmann, 2005). Therefore,
bioinformatics analysis based on gene fusion and fission constitutes a powerful prediction
method in the study of protein interactions. This analysis is based on the principle that
two component proteins A and B in one organism are likely to have physical interaction or
functional association (involvement in the same protein complex, metabolic pathway or
biological process) (Snel et al., 2000; Enright and Ouzounis, 2001) if their homologs in
another organism are fused together to a single composite protein AB (otherwise known
as “Rosetta stone” protein) (Enright et al., 1999; Marcotte et al., 1999). Conversely, a
fission event is considered to have occurred when a composite protein is found split into
its component proteins in a reference genome (Enright and Ouzounis, 2001). Gene fusion
analysis has been applied to a number of eukaryotic and prokaryotic organisms (Enright
et al., 1999; Marcotte et al., 1999; Snel et al., 2000; Yanai et al., 2001; Kummerfeld and
Teichmann, 2005; Dimitriadis et al., 2011). However, the proteomes of microalgae species
have not yet been fully explored for such fusion events.
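The "Rosetta stone" principle described above can be illustrated with a minimal Python sketch. It is not the SAFE implementation: the `Hit` record, the `candidate_fusions` function, the zero-overlap threshold and the toy coordinates are all assumptions made for illustration. The sketch only shows the core test, namely that two non-homologous component proteins align to different, non-overlapping regions of the same composite protein.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One local alignment of a component protein onto a composite protein (hypothetical record)."""
    component: str
    composite: str
    start: int  # 1-based start of the aligned region on the composite
    end: int    # inclusive end of the aligned region on the composite


def overlap(a: Hit, b: Hit) -> int:
    """Number of composite residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def candidate_fusions(hits, homologous_pairs, max_overlap=0):
    """Return (component A, component B, composite) triples in which A and B
    align to different regions of the same composite and are not homologous
    to each other. max_overlap is an assumed tolerance, not a SAFE parameter."""
    events = []
    for i, a in enumerate(hits):
        for b in hits[i + 1:]:
            if a.composite != b.composite or a.component == b.component:
                continue
            if frozenset((a.component, b.component)) in homologous_pairs:
                continue  # homologous components would match the same region
            if overlap(a, b) <= max_overlap:
                first, second = sorted((a, b), key=lambda h: h.start)
                events.append((first.component, second.component, a.composite))
    return events


# Toy data: A and B cover distinct regions of AB, C straddles both.
hits = [
    Hit("A", "AB", 1, 200),
    Hit("B", "AB", 210, 450),
    Hit("C", "AB", 150, 300),
]
print(candidate_fusions(hits, homologous_pairs=set()))
```

In this toy input only the pair A and B is reported, because C overlaps both of the other aligned regions. A real analysis would add score thresholds and manual filtering of the kind the study describes.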
In the present study, we utilized the potential of protein fusion analysis and recently
developed computational software in order to identify potential protein interactions and
functional links in seven microalgal species. SAFE is the only software currently available
in the public domain which has been developed specifically for the automated detection,
filtering and visualization of fusion events (Tsagrasoulis et al., 2012). Most importantly, a
performance comparison of the software against a previous benchmark study of gene
fusions showed that the results by SAFE agree with other methods, while the software
can also be highly selective.
The evolutionary fate of these fusion and fission events was investigated via protein
phylogenetic profiling, which reflects protein evolution and sheds light on the time
frame in which these events occurred.
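As a rough illustration of what a phylogenetic profile is, the sketch below encodes the presence or absence of a protein across taxonomic groups as a binary vector and lists the groups in which two profiles differ. The function names and the coarse taxon list are assumptions; the presence pattern used for alpha 1,2 mannosidase and Fra10Ac1 follows the distribution reported in the Results of this study.

```python
# Coarse taxonomic groups used only for this illustration.
TAXA = ["green algae", "diatoms", "red algae", "eubacteria", "archaea"]


def profile(present_in, taxa=TAXA):
    """Binary presence/absence vector of one protein across the taxa."""
    return tuple(int(t in present_in) for t in taxa)


def differing_taxa(p, q, taxa=TAXA):
    """Taxa in which exactly one of the two proteins was detected."""
    return [t for t, a, b in zip(taxa, p, q) if a != b]


mannosidase = profile({"green algae", "diatoms", "red algae"})
fra10ac1 = profile({"green algae", "diatoms"})

print(mannosidase)                            # (1, 1, 1, 0, 0)
print(fra10ac1)                               # (1, 1, 0, 0, 0)
print(differing_taxa(mannosidase, fra10ac1))  # ['red algae']
```

Proteins with similar profiles are candidates for functional association, and the taxa where the profiles diverge help to bracket when an event may have occurred.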
Furthermore, the three-dimensional structure of the identified component proteins or
complexes was predicted by employing homology modelling, in order to gain insight into
the protein molecular organization and putative function. Protein homology modelling is
currently recognized as the most accurate method for 3D structure prediction, yielding
models suitable for a wide spectrum of applications, such as structure based molecular
design, docking simulations and mechanism investigation.
The current study focused on the analysis of seven microalgae organisms, selected to
belong to diverse evolutionary lineages, including green algae (Volvox carteri,
Chlamydomonas reinhardtii, Chlorella variabilis, Ostreococcus lucimarinus), red algae
(Cyanidioschyzon merolae) and diatoms (Phaeodactylum tricornutum and Thalassiosira
pseudonana). The algal species that were investigated for putative fusion and fission
events in the current study, along with the taxonomic group to which they belong, are
described in Table 1.
RESULTS and DISCUSSION
The present study utilized the potential of gene fusion analysis in order to identify
potential protein interactions and functional links in seven microalgal species which
belong to diverse groups, including green algae (Volvox carteri, Chlamydomonas
reinhardtii, Chlorella variabilis, Ostreococcus lucimarinus), red algae (Cyanidioschyzon
merolae) and diatoms (Phaeodactylum tricornutum and Thalassiosira pseudonana).
Identification of putative fusion and fission events was achieved by comparison of the
seven complete, annotated proteomes against each other in an all-against-all analysis; a
total of 42 analyses were performed. Moreover, in order to extract the maximum amount
of information possible from our analysis, in case a fusion event was not detected in the
proteome of one of the seven organisms under study, then the available proteome of its
phylogenetically closest organism and/or strain was investigated by BLAST search. For
instance, in the case of Chlorella variabilis NC64A, Chlorella vulgaris was examined
instead, and in the case of Ostreococcus lucimarinus CCE9901, Ostreococcus tauri
(Supplementary Tables 1 and 2).
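The figure of 42 analyses is what an all-against-all design gives if each analysis is an ordered pair of distinct proteomes (7 × 6 = 42). That reading is an assumption, since the text does not spell out the pairing, but it matches the reported count. A short Python sketch enumerating the pairs:

```python
from itertools import permutations

species = [
    "Volvox carteri",
    "Chlamydomonas reinhardtii",
    "Chlorella variabilis",
    "Ostreococcus lucimarinus",
    "Cyanidioschyzon merolae",
    "Phaeodactylum tricornutum",
    "Thalassiosira pseudonana",
]

# Ordered (query proteome, reference proteome) pairs, excluding self-comparisons.
comparisons = list(permutations(species, 2))

print(len(comparisons))  # 42
```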
The current study identified four fusion events and one fission event that were
considered genuine, based both on our strict parameter settings and a subsequent
thorough manual analysis (see Methods).
Two of these events have been confirmed experimentally and three of them were found
to be species-specific. The five fusion and fission events were:
1) Putative fusion of alpha 1,2 mannosidase and Fra10Ac1 homologs in the green alga
Volvox carteri
2) Putative fusion of DOT1 and riboflavin synthase in the green alga Volvox carteri
3) Putative fission proteins COX2A and COX2B, in the green algae Volvox carteri and
Chlamydomonas reinhardtii
4) Putative fusion of G6PDH and 6PGDH in the diatom Phaeodactylum tricornutum
5) Putative fusion of TIM and GAPDH in the diatoms Phaeodactylum tricornutum and
Thalassiosira pseudonana
The domain organization of the fused proteins was found to correspond to the domain
organization of two individual proteins in one or more other microalgae species, as
shown in Table 2. In order to display the comparison between the domain organisation of
the fused and the individual proteins, one of those species was chosen as representative
and the diagrammatic representations are visible in Figure 1. It should be noted that the
individual fusion or fission C. reinhardtii proteins in the first two fusion and single fission
events are encoded by non-homologous genes residing in different
chromosomes/scaffolds/contigs. This also applies to the individual fusion proteins in T.
pseudonana and C. merolae for the fourth and fifth detected events. The predicted fusion
events are discussed in detail below.
Identified putative interactions and functionally associated proteins
Two fusion events were detected in Volvox carteri. The first fused protein
(XP_002957696.1) was found to have a domain organisation that corresponded to the
domain organization of the split Chlamydomonas reinhardtii proteins alpha 1,2
mannosidase (IPR001382, Figure 1A), an enzyme implicated in the processing of Asn-
linked oligosaccharides (Lal et al., 1994) and shown to have lytic activity on harmful
marine microalgae, and Fra10Ac1 (IPR019129), a protein of nuclear localization and
unknown function in Homo sapiens, found expressed in human brain, heart, skeletal
muscle, kidney and liver (Sarafidou et al., 2004). Similar split protein pairs were also
detected in Ostreococcus lucimarinus, Chlorella variabilis, Thalassiosira pseudonana and
Phaeodactylum tricornutum. It should be noted here that all GenBank accession numbers
commencing with XP represent computer-automated predictions, which have not been
manually curated, annotated or experimentally confirmed.
The second fused protein identified in Volvox (XP_002949156.1) had a domain
organization that corresponded to the domain rearrangement of the proteins DOT1
(IPR013110) and riboflavin synthase (IPR017938) in C. reinhardtii (Figure 1B), while
similar protein pairs were found in Chlamydomonas reinhardtii, Ostreococcus lucimarinus,
Thalassiosira pseudonana and Phaeodactylum tricornutum. The function of the split
protein DOT1 (Disruptor of Telomeric silencing) is to modulate gene expression in yeast
by methylating histone H3-lysine 79 (Singer et al., 1998; Feng et al., 2002; van Leeuwen et
al., 2002), while the enzyme riboflavin synthase catalyzes the synthesis of riboflavin from
two molecules of 6,7-dimethyl-8-(1’-D-ribityl)-lumazine (DMRL) (Wacker et al., 1964).
Interestingly, one fission event was detected in both the green algae Volvox carteri
(XP_002950066 & XP_002948528) and Chlamydomonas reinhardtii (EDP00208.1 &
EDP09974.1) (Table 3). Analysis of the domain organization in the split Volvox carteri and
Chlamydomonas reinhardtii proteins, COX2A and COX2B, showed a correspondence to the
composite protein COX2 in Cyanidioschyzon merolae (BAA34656.1) (Figure 1C). The C.
reinhardtii proteins EDP00208.1 and EDP09974.1 are annotated as cytochrome c oxidase
subunit II, transmembrane domain (COX2A) (IPR011759) and cytochrome c oxidase
subunit II C-terminal (COX2B) (IPR002429), respectively. COX2A and COX2B proteins
correspond to the N- and C-terminal region of the second subunit of the cytochrome c
oxidase, a component of the electron transport chain of aerobic respiration, which is
involved in the transfer of electrons from cytochrome c to reduce molecular oxygen. The
cytochrome c oxidase enzyme complex is located in the inner mitochondrial membrane in
eukaryotes, and in the plasma membrane in bacteria (Tsukihara et al., 1996; Ostermeier
et al., 1997; Muramoto et al., 2010). Most importantly, this fission event has been
experimentally verified (Perez-Martinez et al., 2001). It was
demonstrated that the proteins COX2A and COX2B are encoded by two distinct genes,
namely cox2a and cox2b, in the Chlamydomonad algae C. reinhardtii and Polytomella sp.
(Perez-Martinez et al., 2001).
In the diatom Phaeodactylum tricornutum a single fusion event, XP_002185945.1, was
detected, where the domain organization of the fused Phaeodactylum protein,
G6PDH/6PGDH, was shown to correspond to the domain organization of the split
protein pair in Thalassiosira pseudonana, glucose-6-phosphate 1-dehydrogenase
(G6PDH) (IPR001282) and 6-phosphogluconate dehydrogenase (6PGDH)
(IPR006113) (Figure 1D). These two enzymes are implicated in the pentose phosphate
pathway. G6PDH catalyses the first step in the pentose phosphate pathway, which is the
oxidation of glucose-6-phosphate to 6-phosphogluconolactone in the presence of
NADP, releasing NADPH (Fouts et al., 1988; Martini and Ursini, 1996). 6PGDH catalyses
the conversion of 6-phosphogluconate to ribulose 5-phosphate in the presence of NADP,
producing NADPH (Adams et al., 1983; Broedel and Wolf, 1990). Similar split protein pairs
were detected in Volvox carteri, Chlamydomonas reinhardtii, and Ostreococcus
lucimarinus (Table 2).
SAFE analysis identified one fusion event in the diatoms Phaeodactylum tricornutum
(XP_002177987) and Thalassiosira pseudonana (EED92326.1, Table 2). The domain
arrangement of the fused diatom protein TIM/GAPDH was found to correspond to the
domain arrangement of the split protein pairs in the green algae Volvox carteri,
Chlamydomonas reinhardtii, Chlorella variabilis, and in the red alga Cyanidioschyzon
merolae (Table 2). The C. merolae proteins BAC67674.1 and BAC67669.1 are annotated as
triosephosphate isomerase (TIM) (IPR000652) and glyceraldehyde-3-phosphate
dehydrogenase (GAPDH) (IPR006424), respectively (Figure 1E). These enzymes are
implicated in successive steps of glycolysis, the major carbohydrate metabolic pathway in
eukaryotes (Fothergill-Gilmore, 1986). TIM catalyzes the isomerization of D-
glyceraldehyde 3-phosphate (G3P) and dihydroxyacetone phosphate (DHAP) (Bloom and
Topper, 1956; Jogl et al., 2003). TIM is active as a homodimer (Alber et al., 1981; Lolis et
al., 1990) with a notable exception in archaeobacteria where it is active as a tetramer
(Kohlhoff et al., 1996). On the other hand, GAPDH catalyzes the sixth step of the glycolytic
pathway which is the conversion of G3P to 1,3-diphospho-glycerate (Dugaiczyk et al.,
1983; Martin et al., 1993). All known active GAPDH enzymes are homotetramers (Banner
et al., 1975; Skarzynski and Wonacott, 1988).
3D structures of identified component proteins and complexes
Homology modelling was employed to predict the three-dimensional structure of the
identified component proteins or complexes for all the fusion and fission events, and the
resulting 3D structures are shown in Figure 2. The evolutionary history of the identified
gene fusion/fission events was also investigated. The ultimate goal of this analysis was to
determine whether those events are due to gene fusion or fission. To this end,
the conservation of both the fused protein and the individual component proteins across
the main eukaryotic and prokaryotic taxonomic divisions was examined (Figure 5). Based
both on experimental evidence, as well as the position of the "reference" organism within
the species tree in Figure 5, an event was assigned as either fusion or fission (i.e. if a
protein was found to be split in a single taxon and composite in the other taxonomic
groups, then this protein was regarded as the product of a fission event).
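The assignment rule in parentheses above can be written down as a small decision function. The sketch below is only an illustration under stated assumptions: the fission branch follows the rule as given in the text, while the mirrored fusion branch and the "undetermined" and "ambiguous" outcomes are assumptions added to make the function total. The study additionally relied on experimental evidence and on the position of the reference organism in the species tree, which this sketch does not model.

```python
def assign_event(states):
    """Classify an event from per-taxon observations.

    states maps a taxon name to 'split', 'composite' or 'absent'.
    Only the fission rule comes from the text; the rest is assumed.
    """
    split = [t for t, s in states.items() if s == "split"]
    composite = [t for t, s in states.items() if s == "composite"]
    if not split or not composite:
        return "undetermined"  # both forms are needed to call an event
    if len(split) == 1 and len(composite) > 1:
        return "fission"       # split in a single taxon, composite elsewhere
    if len(composite) == 1 and len(split) > 1:
        return "fusion"        # assumed mirror image of the fission rule
    return "ambiguous"


# COX2 distribution as reported later in this study.
cox2 = {
    "Chlamydomonadales": "split",
    "Rhodophyta": "composite",
    "Mamiellophyceae": "composite",
    "diatoms": "composite",
}
print(assign_event(cox2))  # fission
```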
Putative fusion of alpha 1,2 mannosidase and Fra10Ac1 homologs in Volvox carteri
While the crystal structure of alpha 1,2 mannosidase has been available for species like
Saccharomyces cerevisiae and 3D structure prediction studies have been performed for
other organisms, the three-dimensional structure of Fra10Ac1 has not been available.
We have produced a model of alpha 1,2 mannosidase complexed with Fra10Ac1, based
on the great structural similarity of the former protein to the crystal structure of
adenylylsulfate reductase from Desulfovibrio gigas (RCSB entry: 3GYX). Specifically, the
crystal structure of the latter protein consists of six copies of a heterodimer that is made
up of a large α-helical barrel-like conformation and a smaller molecule in an extended
coil and β-sheet conformation that wraps around the larger component. After structurally
superposing the main subunit of adenylylsulfate reductase on alpha 1,2 mannosidase, a
model of the smaller extended molecule was prepared using the sequence of Fra10Ac1
(Figure 2A, left). Our results suggest that alpha 1,2 mannosidase and Fra10Ac1 may be
functionally associated in the species where they were detected as heterodimers. The
complex was subjected to exhaustive molecular dynamics simulations for a total of 20
nanoseconds. The explicitly solvated, periodic molecular system quickly reached
equilibrium and remained there for the remainder of the simulation time, indicating that
the protein complex (Figure 2A, right) was stable.
Putative fusion of DOT1 and riboflavin synthase in Volvox carteri
We suggest that the proteins DOT1 and riboflavin synthase (Figure 2B, left) may interact
in species where they were found to be split. Although there is as yet no evidence to
support interactions between these two proteins, the results from our homology
modelling study indicate that the homotrimer complex of riboflavin synthase creates a
concave surface in a three way asymmetrical conformation among the three monomers,
which bears just enough space to accommodate a single molecule of human DOT1. Our
docking results revealed a multiple coil-coil interaction pattern between the trimeric
riboflavin synthase complex and the human DOT1 molecule (Figure 2B, right) that is
supported by numerous hydrophobic interactions.
Putative fission proteins COX2A and COX2B, in Volvox carteri and Chlamydomonas
reinhardtii
The three-dimensional homology model of the COXIIa and COXIIb complex is shown in
Figure 2C, left. Electrostatic potential surfaces were calculated in order to analyze and
compare the charge distribution of the produced 3D model to its template structure
(Figure 3). The two complexes exhibited almost identical electrostatic surfaces, sharing
common features that were not disturbed by the addition of the two extra alpha-helices
on the homology model. There is a hydrophobic, uncharged region in the mid-section of
the two complexes (depicted by white boxes) that is vital to their function and has been
conserved, despite the addition of the insert structures. This observation supported the
validity of the model, which was found to share an electrostatic surface of almost
identical intensity with its X-ray determined template structure.
An intriguing finding came to light after further bioinformatics investigation into the
fusion site. Sequence alignment of the two microalgal component proteins against their
chosen templates revealed that there is an insert of 64 amino acids at the fusion site.
More specifically, there are 21 amino acids before and 43 amino acids after the
fusion site. Due to the lack of coordinates for that insert from the template structures,
the 64 amino acid sequence was searched with BLAST against the full PDB database. Strikingly, the
structure of a protein fragment from the marine bacterium Thermotoga maritima was
identified for the 43 residue fragment right after the fusion site. In particular, the insert
appeared at chain A of the crystal structure of a trigger factor chaperone with
promiscuous substrate recognition in folding and assembly from the Thermotoga
maritima bacterium (RCSB entry: 3GU0). Through careful investigation and a series of
molecular dynamics simulations on homology-built models bearing or lacking the insert
structure from the bacterium, it was concluded that the insert structure is vital to the
optimal folding of the algal protein and consequently the survival of the algae species. 3D
modelling in silico studies demonstrate that when the parent protein was split into two
component ones, the residues adjacent to the fusion site on both sides acquire an extended coil
conformation that is highly exposed to the solvent. The component proteins bearing the
bacterial insert would not have been able to acquire a stable structure without it, as the
solvent-exposed coils are very unstable. Both bacterial inserts consist of a small
coil conformation, which eventually leads to structurally robust α-helices. Molecular
dynamics simulations of the component protein missing the insert α-helical structures led to
the conclusion that the exposed coil has too many degrees of freedom and renders the
whole molecular system rather unstable. Notably, the molecular system, which was
subjected to a five nanosecond molecular dynamics simulation, never reached
equilibrium. On the contrary, the homology model of the same protein bearing the
bacterial insert structures that end in α-helical conformation quickly reached equilibrium
(within approximately 150 ps) and remained there for the rest of the simulation time. Judging by structural
features, the bacterial α-helical insert as well as the 21 amino acid α-helical fragment fit
nicely into their environment by joining in a multiple α-helix bundle conformation, next to the
pack of α-helices already present in the core of the protein. The smaller 21 amino acid
long insert was modelled as an α-helix following the application of secondary structure
prediction algorithms that pointed this way.
There is more evidence that lends support to our horizontal gene transfer (HGT)
hypothesis of the protein insert from microalgae to Thermotoga maritima. The sequence
identity and similarity between the insert and the bacterial protein is 37 and 59 percent
respectively. The bacterium Thermotoga maritima is a hyperthermophilic organism that
inhabits the sediments of marine geothermal areas such as hot springs and hydrothermal
vents. The ideal water environmental temperature of the bacterium is around 80 °C.
Currently Thermotoga maritima is the only known bacterium species capable of surviving
at such high temperatures. Importantly, Thermotoga maritima inhabits the same
environment as the algae species under study. Algae and members of the Archaea
have been well known to live in such hostile environments. For many years it has
been suggested that Thermotoga maritima is a very ancient organism too. This is firstly
due to its hyperthermophilic abilities and secondly due to its unique deep lineage, based
on phylogenetic analysis of its ribosomal RNA material. Therefore, we speculate that both
algae and Thermotoga maritima had the evolutionary time required for such gene
transfers. Secondly, they both live in extreme environments, which are well known to
accelerate evolution. Finally, it is quite common for bacteria to integrate genes from
neighbouring organisms.
In particular, looking into the composition of its genome more carefully led to the striking
observation that more than 24% of its full genome is most similar to that of members of the
Archaea. This is the highest genome overlap observed among bacterial species.
In conclusion, our findings suggest that horizontal gene transfer between Thermotoga
maritima and Archaea or other neighbouring species may have helped this bacterium to
survive in high temperature water. Gene transfer and internalization of exogenous genetic
material between species is highly promoted by the abundant energy in their
environment and the constant evolutionary push.
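The sequence identity and similarity percentages quoted above for the insert and the bacterial protein are computed from a pairwise alignment. The sketch below shows one common way to do this in Python. The residue grouping used to decide "similar" and the choice of ungapped aligned columns as the denominator are assumptions, because the text does not state which substitution matrix or convention was used, and the toy sequences are not the actual insert.

```python
# Hypothetical grouping of residues counted as "similar"; an assumption,
# not the scoring scheme used in the study.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def identity_and_similarity(aligned_a, aligned_b):
    """Percent identity and similarity over ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    group_of = {res: i for i, grp in enumerate(SIMILAR_GROUPS) for res in grp}
    columns = identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        columns += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif x in group_of and group_of.get(y) == group_of[x]:
            similar += 1
    if columns == 0:
        raise ValueError("no ungapped columns to compare")
    return 100 * identical / columns, 100 * similar / columns


# Toy alignment for illustration only.
print(identity_and_similarity("ACDE-K", "ACEQ-R"))  # (40.0, 80.0)
```

Other conventions divide by the full alignment length or by the shorter sequence, which changes the percentages, so reported values are only comparable when the convention is stated.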
Putative fusion of G6PDH and 6PGDH in Phaeodactylum tricornutum
The homology model for the complex of proteins G6PDH and 6PGDH from the predicted
fusion event in Phaeodactylum tricornutum proved quite stable upon molecular
dynamics analysis. There is a set of polar residues in extended coil conformation on both
enzymes that aids the establishment of strong electrostatic interactions.
Putative fusion of TIM and GAPDH in diatoms Phaeodactylum tricornutum and
Thalassiosira pseudonana
For the Thalassiosira pseudonana fusion protein hypothesis, the crystal structure of
rabbit muscle triosephosphate isomerase (RCSB entry: 1R2R) and the crystal structure of
the A4 isoform of photosynthetic glyceraldehyde-3-phosphate dehydrogenase complexed
with NAD (RCSB entry: 1NBO) were used as templates for TIM
and GAPDH respectively (Figure 2E).
Putative fusion of alpha 1,2 mannosidase and Fra10Ac1 homologs in Volvox carteri
Evolutionary analysis of the fused protein revealed that it is present as a single composite
only in Volvox, as a heterodimer in green algae and diatoms, whereas only an alpha 1,2
mannosidase homolog was detected in the red alga C. merolae (Figure 4A). Orthologs of
the two component proteins were not found in eubacteria and archaea (Figure 5).
The identified fusions of the genes alpha 1,2 mannosidase and Fra10Ac1, as well as DOT1
and riboflavin synthase were unique to Volvox carteri. This microalga, after its divergence
from its unicellular relatives 200 million years ago, has evolved into a highly complex
multicellular organism, where a number of developmental changes have taken place. It is
suggested that in the case of metazoa (e.g. Cnidaria) (Putnam et al., 2007), novel protein
domains and/or combinations of domains contributed to the transition from
unicellularity to multicellularity. Therefore, it would be intriguing to speculate that the
identified fusion events, which resulted in two novel fused proteins in Volvox, could have
contributed to the multicellularity of this organism. Notably, this definition does not
necessarily define the boundaries between unicellular and multicellular organisms.
Despite the lack of evidence to support either direct or indirect association between the
two pairs of component proteins, we were able to identify by homology modelling
conserved protein interaction sites.
Putative fusion of DOT1 and riboflavin synthase in Volvox carteri
Investigation of the evolutionary fate of the second fused Volvox protein revealed that it
is present as a heterodimer in the Chlorophyceae, Mamiellophyceae and diatoms,
whereas only a riboflavin synthase homolog was detected in the Trebouxiophyceae C.
variabilis and the rhodophyte C. merolae (Figure 4B). Despite thorough database
searches, the fused protein in Volvox was found to exist as a single composite only in
Volvox (Figure 5).
Putative fission proteins COX2A and COX2B, in Volvox carteri and Chlamydomonas
reinhardtii
The heterodimeric protein identified in V. carteri and C. reinhardtii is present as a single
composite protein in Rhodophyta, Chlorophyceae, Mamiellophyceae and diatoms (Figure
4C). Despite thorough searches across diverse eukaryotic and prokaryotic taxonomic
groups, these two proteins were detected as heterodimers only in the algae of the order
Chlamydomonadales (Figure 5). It has been suggested that over the course of evolution,
the gene cox2 was split into two mitochondrial genes in Chlamydomonadales which were
later transferred to the nucleus (Perez-Martinez et al., 2001). Based both on our
evolutionary analysis and previous findings (Perez-Martinez et al., 2001), we propose that
the cox2 division took place after the divergence of Chlamydomonadales from the other
orders of Chlorophyta.
Putative fusion of G6PDH and 6PGDH in Phaeodactylum tricornutum
The fused protein identified in P. tricornutum is present as a heterodimer in the centric
diatom, and also in Chlorophyceae, Mamiellophyceae and Rhodophyta, whereas only one
G6PDH ortholog was detected in Trebouxiophyceae; this is probably due to incomplete
genomic studies (Figure 4D).
Upon examination of the evolutionary fate of this fusion event across eukaryotes and
prokaryotes, we observe that the fused protein G6PDH/6PGDH is species-specific since it
was found exclusively in P. tricornutum (Figure 5). It would be tempting to hypothesize
that there must have been evolutionary pressure for the G6PDH and 6PGDH genes to
fuse during the course of evolution. This fusion event might have taken place in order to
decrease the metabolic load in the Phaeodactylum cell. We propose that the
G6PDH/6PGDH fusion should have occurred after the divergence of pennate diatoms (P.
tricornutum) from the centric diatoms (T. pseudonana), less than approximately 90
million years ago (Sims, 2006).
Putative fusion of TIM and GAPDH in the diatoms Phaeodactylum tricornutum and
Thalassiosira pseudonana
Investigation into the evolutionary fate of the fusion event between TIM and GAPDH by
sequence analysis revealed that the fused protein identified in the diatoms
Phaeodactylum tricornutum and Thalassiosira pseudonana is present as a heterodimer in
red algae and Chlorella, whereas only one TIM ortholog was detected in Mamiellophyceae
(Figure 4E); this is probably attributable to incomplete genomic studies. Orthologs of the
TIM/GAPDH fusion protein were also found in the photosynthetic brown algae and the
non-photosynthetic oomycetes (Figure 5), which belong, together with the
photosynthetic diatoms Phaeodactylum and Thalassiosira, to stramenopiles, a
heterogeneous group of heterokonts. We propose that the TIM and GAPDH fusion may
have taken place after the secondary endosymbiosis (Gray, 1999; Falkowski et al., 2004)
since the TIM/GAPDH fused protein was found to be split in green and red algae.
CONCLUSIONS
Features that make microalgae attractive are their CO2 abatement capacity, their ability to
grow at various CO2 concentrations, the ability of some of them to grow
phototrophically or heterotrophically, their rapid growth and scalability and the ease of
genetic manipulation in order to introduce genes of interest into the nuclear,
chloroplastic or mitochondrial genome.
However, the molecular biology of the microalgae has not been fully explored yet. We
argue that the implementation of novel bioinformatic techniques will help to elucidate
microalgal molecular mechanisms with implications for their exploitability. The present
virtual protein interactomics study applied such novel bioinformatics methods to the
proteomes of seven diverse microalgal organisms, including green algae (Volvox carteri,
Chlamydomonas reinhardtii, Chlorella variabilis, Ostreococcus lucimarinus), red algae
(Cyanidioschyzon merolae) and diatoms (Phaeodactylum tricornutum and Thalassiosira
pseudonana). Overall we have identified five fusion and fission events, thereby obtaining
important information on putative novel protein interactions. Interestingly, three of the
five events are involved in metabolic pathways. Moreover, by employing homology
modelling we predicted the three-dimensional structures of the identified component
proteins or complexes. Comparative analysis of the evolutionary fate of the fusion and
fission events allowed us to propose hypotheses regarding the timing of these events. We
also identified an incident of horizontal gene transfer in the bacterium Thermotoga
maritima. There is an urgent need for an in-depth understanding of the molecular
mechanisms of microalgal species for practical applications and the translation of this
knowledge into anthropogenic as well as complex natural ecosystems which are on the
verge of imbalance. The solution to this key challenge of the post-genomic era can be
envisioned through the application of systems biology approaches that enrich
knowledge in the microalgal field. The gap between virtual interactomics and
structural bioinformatics on the one hand and experimental findings in microalgae on the
other can be bridged through high-throughput profiling data and in silico modelling, and
we envisage their integration with observations at the cellular scale in order to extend our
understanding of microalgal species beyond the analysis of experimental observations.
Using bioinformatic approaches in conjunction with classical
experimental procedures can deepen insight into the underpinnings of
microalgal molecular functions, with an immediate effect on practical applications that have
socio-economic facets, and can enable finer control of natural and
anthropogenic terrestrial and space ecosystems.
MATERIALS AND METHODS
Proteome sequences retrieval
The complete proteome sequences of the seven species were obtained from the NCBI
database (Sayers et al., 2009; Benson et al., 2009). These sequences were derived from a
computer-automated pipeline and stored as preliminary data. The species names,
taxonomic classification and proteome size are summarized in Table 1.
Identification of fusion events
The entire proteome of each of the seven organisms was compared against each of the
other six proteomes, which were used as references, by employing SAFE (Software for the
Analysis of Fusion Events), a computational platform for the automated detection,
filtering and visualization of fusion events (Tsagrasoulis et al., 2012). To reduce false
positives and obtain reliable results, we used the following set of
parameter values:
The proteins of the same organism that shared more than 85% identity over their entire
length were considered as duplicates and were removed from the subsequent steps of
the analysis, to avoid redundancy.
Two component proteins in a query organism were considered to be fused in the proteome
of the reference organism only if they had a minimum domain length of 70 amino acids.
The minimum percentage of identity between two orthologous domains is set by default to
27% in SAFE. However, given that the 7 algae species under study are phylogenetically
closely related (e.g. V. carteri and C. reinhardtii, T. pseudonana and P. tricornutum), the
parameter value was set to 35%.
A pair of proteins was considered to be fused if the corresponding component protein
domains aligned with a minimum protein coverage of 70% to the composite protein
sequence in the reference organism.
To increase the robustness of our analysis, due to the short evolutionary distance between
the organisms under investigation, the threshold for the E-value was set at 10^-5 instead of
the default 10^-3.
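The thresholds listed above can be summarised as a simple filter. The following is only an illustrative sketch, not SAFE's actual interface: all names (`ComponentHit`, `passes_thresholds`, `is_candidate_fusion`, `is_duplicate`) are hypothetical, coverage is interpreted here as the percentage of the component protein that aligns to the composite protein, and the E-value cutoff is assumed to be inclusive.

```python
from dataclasses import dataclass

# Threshold values as stated in the text; the constant names are hypothetical
# and do not correspond to SAFE option names.
MAX_DUPLICATE_IDENTITY = 85.0  # % identity above which same-organism proteins count as duplicates
MIN_DOMAIN_LENGTH = 70         # minimum domain length in amino acids
MIN_IDENTITY = 35.0            # minimum % identity between orthologous domains
MIN_COVERAGE = 70.0            # minimum % coverage (assumed: of the component protein)
MAX_EVALUE = 1e-5              # E-value threshold (assumed inclusive)


@dataclass
class ComponentHit:
    """Alignment of one component (query) protein to a composite (reference) protein."""
    aligned_length: int  # aligned domain length in amino acids
    identity: float      # percent identity
    coverage: float      # percent coverage
    evalue: float


def is_duplicate(identity_over_full_length: float) -> bool:
    """Same-organism proteins sharing more than 85% identity are treated as duplicates."""
    return identity_over_full_length > MAX_DUPLICATE_IDENTITY


def passes_thresholds(hit: ComponentHit) -> bool:
    """Apply the length, identity, coverage and E-value cutoffs to one component."""
    return (
        hit.aligned_length >= MIN_DOMAIN_LENGTH
        and hit.identity >= MIN_IDENTITY
        and hit.coverage >= MIN_COVERAGE
        and hit.evalue <= MAX_EVALUE
    )


def is_candidate_fusion(first: ComponentHit, second: ComponentHit) -> bool:
    """A pair is retained only if both components pass every cutoff."""
    return passes_thresholds(first) and passes_thresholds(second)
```

With made-up values, `is_candidate_fusion(ComponentHit(120, 42.0, 85.0, 1e-20), ComponentHit(95, 36.5, 72.0, 1e-8))` is accepted, whereas shortening the second domain to 60 amino acids would cause the pair to be rejected.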
Verification of the predicted fusion events
The results of the automated analysis were subjected to further verification by manual
analysis:
The so-called ‘promiscuous’ domains which occur frequently in many otherwise unrelated
proteins, such as ATP-binding cassettes, actin binding domains, WD repeats and SH3
domains (Marcotte et al., 1999) were removed, to reduce errors.
All proteins (fused and heterodimeric) identified in our study were searched against
InterPro (which combines diverse information about protein families and domains from
multiple databases) (Hunter et al., 2011) for the full annotation of the individual protein
domains. The InterPro accession number for each protein domain is indicated in the text
by the three-letter code IPR followed by six digits.
The predicted reference fused protein was split into its component proteins and then
checked by reverse BLAST (Altschul et al., 1997) to assess whether these two proteins
returned the initial two query proteins as their best BLAST hit.
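The reverse BLAST check described above amounts to a best-hit test. A minimal sketch follows, assuming the BLAST results have already been parsed into per-query lists of (subject, bit score) pairs. The data layout, the function names and the protein identifiers are hypothetical, and ranking hits by bit score is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: query id -> list of (subject id, bit score) from a reverse BLAST run.
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: Hits, query_id: str) -> Optional[str]:
    """Return the subject with the highest bit score for a query, or None if there are no hits."""
    candidates = hits.get(query_id, [])
    if not candidates:
        return None
    return max(candidates, key=lambda hit: hit[1])[0]


def reverse_blast_confirms(reverse_hits: Hits, expected: Dict[str, str]) -> bool:
    """Check that each half of the split composite protein recovers its original query.

    `expected` maps the identifier of each split half to the identifier of the
    query protein it should return as its best hit.
    """
    return all(best_hit(reverse_hits, part) == query for part, query in expected.items())


# Made-up identifiers, for illustration only.
reverse_hits = {
    "composite_N_half": [("query_A", 310.0), ("other_1", 95.0)],
    "composite_C_half": [("query_B", 280.0), ("other_2", 120.0)],
}
expected = {"composite_N_half": "query_A", "composite_C_half": "query_B"}
confirmed = reverse_blast_confirms(reverse_hits, expected)
```

In this toy case both halves return their original query proteins as best hits, so the predicted fusion event would be retained.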
Examination of the evolutionary fate of the predicted fusion events
The distribution of the 5 fusion/fission events across the major eukaryotic and
prokaryotic taxonomic divisions was investigated. To this end, we used both the
composite protein sequences and the protein sequence pairs to search the available
NCBI, UniProtKB (Magrane and Consortium, 2011) and Cyanidioschyzon merolae (Matsuzaki
et al., 2004) databases for homologs by applying BLASTp (Altschul et al., 1997). The best
BLAST hit within each taxon or taxonomic group was considered as the best candidate
ortholog.
The results of our search are categorized as follows:
o A single composite protein (fused protein) homolog was detected.
o The query protein was found split into two component protein domains in the reference
proteome (heterodimeric protein).
o A single reference protein homologous to either one of the two query component proteins
was detected. This is probably due to incomplete genomic studies.
o No protein homologs were found in the reference proteome (protein not available/missing
data). This is probably attributed to incomplete genomic studies or lack of data availability
in the genomic databases.
The results of this search were also mapped to the respective leaves of a species tree in
order to trace the evolutionary fate of each of the fusion/fission events. The NCBI
Taxonomy species tree was constructed using iTOL (Letunic and Bork, 2011) and visualized
with Dendroscope (Huson et al., 2007).
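The four result categories listed above can be expressed as a small decision function. This is a sketch under stated assumptions: the names `Fate` and `classify_taxon` are hypothetical, and giving a detected composite homolog precedence over component homologs is our assumption rather than a rule stated in the text.

```python
from enum import Enum


class Fate(Enum):
    """Result categories for one taxon, following the list in the text."""
    FUSED = "single composite (fused) protein homolog detected"
    SPLIT = "two component proteins detected (heterodimeric protein)"
    SINGLE_COMPONENT = "homolog of only one component protein detected"
    MISSING = "no homologs found (protein not available/missing data)"


def classify_taxon(has_composite: bool, has_component_a: bool, has_component_b: bool) -> Fate:
    """Assign a taxon to one of the four categories.

    Assumption: a composite homolog takes precedence over component homologs.
    """
    if has_composite:
        return Fate.FUSED
    if has_component_a and has_component_b:
        return Fate.SPLIT
    if has_component_a or has_component_b:
        return Fate.SINGLE_COMPONENT
    return Fate.MISSING
```

Each leaf of the species tree could then be annotated with the category returned for that taxon.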
Secondary Structure Prediction
Secondary structure predictions were performed using the NPS (Network Protein
Sequence Analysis) web-server.
Homology Modelling and Model Evaluation
The homology modelling of the five algae enzymes was carried out using the MOE
(2004.03) package and its built-in homology modelling application. The produced models
were initially evaluated within the MOE package by a residue packing quality function,
which depends on the number of buried non-polar side chain groups and on hydrogen
bonding. The sequence identity scores of all homology models discussed in this study
were high enough to allow conventional homology modelling techniques to be used.
The homology modelling method of MOE comprises the following steps. First, an initial partial
geometry specification, where an initial partial geometry for each target sequence is
copied from regions of one or more template chains. Second, the insertions and
deletions task, where residues that still have no assigned backbone coordinates are
modelled. Those residues may be in loops (insertions in the model with respect to the
template), they may be outgaps (residues in a model sequence which are aligned before
the N-terminus or after the C-terminus of its template) or they may be deletions (regions
where the template has an insertion with respect to the model). In this study, however,
outgaps were not included in the homology modelling process. The third step is loop
selection and side-chain packing, where a collection of independent models is created.
The last step is final model selection and refinement, where the final models are
scored and ranked after they have been stereochemically checked for persisting errors.
Furthermore, the PROCHECK suite was employed to evaluate the quality of each
one of the five algae enzyme models.
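To make the distinction between copied regions, loops, outgaps and deletions concrete, the sketch below classifies the columns of a pairwise target-template alignment. It is not MOE code. The function name and the toy sequences are hypothetical, and outgaps are interpreted here as target residues that extend beyond either end of the template.

```python
from collections import Counter


def classify_columns(target: str, template: str) -> Counter:
    """Count alignment columns by the modelling task they would fall under.

    Both strings are rows of one pairwise alignment, with '-' marking gaps.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    template_columns = [i for i, residue in enumerate(template) if residue != "-"]
    if not template_columns:
        raise ValueError("template row contains no residues")
    first, last = template_columns[0], template_columns[-1]

    counts = Counter()
    for i, (t, s) in enumerate(zip(target, template)):
        if t != "-" and s != "-":
            counts["copied"] += 1      # backbone geometry can be copied from the template
        elif t != "-":
            # Target residue without a template partner: beyond the template
            # ends it is an outgap, otherwise it is a loop (insertion).
            counts["outgap" if i < first or i > last else "loop"] += 1
        elif s != "-":
            counts["deletion"] += 1    # template residue with no target partner
    return counts


# Toy alignment with made-up sequences.
toy = classify_columns("MKTAYLLGGSAVD--KLQE", "--TAYL--GSAVDPPKL--")
```

For this toy alignment the function reports 11 copied positions, 2 loop residues, 2 deleted template positions and 4 outgap residues (2 at each end).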
Molecular electrostatic potential (MEP)
Electrostatic potential surfaces were calculated by solving the nonlinear Poisson–
Boltzmann equation using the finite difference method as implemented in the PyMOL
software. The potential was calculated on a grid of 65 × 65 × 65 points and the ‘grid fill
by solute’ parameter was set to 80%. The dielectric constants of the solvent and the
solute were set to 80.0 and 2.0, respectively. An ionic exclusion radius of 2.0 Å, a solvent
radius of 1.4 Å and a solvent ionic strength of 0.145 M were applied. AMBER99 charges
and atomic radii were used for this calculation.
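As a rough plausibility check on the solvent parameters above, the electrostatic screening length implied by the stated ionic strength can be estimated. The sketch below computes the Debye length for a 1:1 electrolyte. It is an illustration only and not part of the reported calculation. The temperature of 298.15 K is our assumption, since no temperature is given for this step, and the function name is hypothetical.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19    # elementary charge, C
N_A = 6.02214076e23           # Avogadro constant, 1/mol


def debye_length_m(ionic_strength_molar: float,
                   relative_permittivity: float,
                   temperature_k: float) -> float:
    """Debye screening length in metres for a 1:1 electrolyte."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # mol/L -> mol/m^3
    numerator = relative_permittivity * EPSILON_0 * K_B * temperature_k
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength_si
    return math.sqrt(numerator / denominator)


# Solvent dielectric constant (80.0) and ionic strength (0.145 M) from the text;
# the temperature is an assumed value.
length_nm = debye_length_m(0.145, 80.0, 298.15) * 1e9
```

Under these assumptions the screening length comes out at about 0.8 nm, which means that charges in the solvent are screened over a distance of a few water molecules.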
Model Optimization
Energy minimisation was done in MOE (Molecular Operating Environment suite), initially
using the Amber99 force field implemented in the same package, up to an RMS gradient
of 0.0001 to remove the geometrical strain. The model was subsequently solvated with
SPC water using a truncated octahedron box extending 7 Å from the model, and
molecular dynamics were then performed for 200 nanoseconds at 300 K and 1 atm with a
2 fs step size, using the NVT ensemble in a canonical environment. NVT stands for
Number of atoms, Volume and Temperature, which remain constant throughout the
calculation. The results of the molecular dynamics simulation were collected into a
database by MOE and can be further analysed.
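The simulation length and step size stated above fix the number of integration steps. The short calculation below makes this explicit. The function name is hypothetical, and integer arithmetic in femtoseconds is used to avoid floating-point rounding.

```python
FS_PER_NS = 1_000_000  # femtoseconds per nanosecond


def md_step_count(simulation_ns: int, timestep_fs: int) -> int:
    """Number of integration steps needed to cover the simulation time."""
    total_fs = simulation_ns * FS_PER_NS
    if total_fs % timestep_fs != 0:
        raise ValueError("simulation time is not a whole multiple of the time step")
    return total_fs // timestep_fs


# 200 ns at a 2 fs step size, as stated in the text.
steps = md_step_count(200, 2)
```

For 200 ns at 2 fs this gives 100,000,000 steps.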
Docking studies and protein-protein interactions
The docking studies amongst the various constructed models were executed using ZDOCK
version 3.0. RDOCK was then used to minimize the ZDOCK complex outputs
and re-rank them based on their re-estimated binding free energies. After the docking
experiments, all molecular systems were subjected to extensive energy minimisation up
to a gradient G < 0.0001, using the CHARMM27 force field as implemented in the
GROMACS 4.5.5 suite, through our in-house developed graphical interface (Sellis et al., 2009).
REFERENCES
Adams, M.J., Archibald, I.G., Bugg, C.E., Carne, A., Gover, S., Helliwell, J.R., Pickersgill,
R.W. and White, S.W. The three dimensional structure of sheep liver 6-
phosphogluconate dehydrogenase at 2.6 A resolution. EMBO J 2 (1983), pp. 1009-14.
Alber, T., Banner, D.W., Bloomer, A.C., Petsko, G.A., Phillips, D., Rivers, P.S. and Wilson,
I.A. On the three-dimensional structure and catalytic mechanism of triose phosphate
isomerase. Philos Trans R Soc Lond B Biol Sci 293 (1981), pp. 159-71.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman,
D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs. Nucleic Acids Res 25 (1997), pp. 3389-402.
Armbrust, E.V. The life of diatoms in the world's oceans. Nature 459 (2009), pp. 185-92.
Armbrust, E.V., Berges, J.A., Bowler, C., Green, B.R., Martinez, D., Putnam, N.H., Zhou, S.,
Allen, A.E., Apt, K.E., Bechner, M., Brzezinski, M.A., Chaal, B.K., Chiovitti, A.,
Davis, A.K., Demarest, M.S., Detter, J.C., Glavina, T., Goodstein, D., Hadi, M.Z.,
Hellsten, U., Hildebrand, M., Jenkins, B.D., Jurka, J., Kapitonov, V.V., Kroger, N.,
Lau, W.W., Lane, T.W., Larimer, F.W., Lippmeier, J.C., Lucas, S., Medina, M.,
Montsant, A., Obornik, M., Parker, M.S., Palenik, B., Pazour, G.J., Richardson, P.M.,
Rynearson, T.A., Saito, M.A., Schwartz, D.C., Thamatrakoln, K., Valentin, K., Vardi,
A., Wilkerson, F.P. and Rokhsar, D.S. The genome of the diatom Thalassiosira
pseudonana: ecology, evolution, and metabolism. Science 306 (2004), pp. 79-86.
Banner, D.W., Bloomer, A.C., Petsko, G.A., Phillips, D.C., Pogson, C.I., Wilson, I.A.,
Corran, P.H., Furth, A.J., Milman, J.D., Offord, R.E., Priddle, J.D. and Waley, S.G.
Structure of chicken muscle triose phosphate isomerase determined
crystallographically at 2.5 angstrom resolution using amino acid sequence data.
Nature 255 (1975), pp. 609-14.
Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J. and Sayers, E.W. GenBank. Nucleic
Acids Res 37 (2009), pp. D26-31.
Blanc, G., Duncan, G., Agarkova, I., Borodovsky, M., Gurnon, J., Kuo, A., Lindquist, E.,
Lucas, S., Pangilinan, J., Polle, J., Salamov, A., Terry, A., Yamada, T., Dunigan, D.D.,
Grigoriev, I.V., Claverie, J.M. and Van Etten, J.L. The Chlorella variabilis NC64A
genome reveals adaptation to photosymbiosis, coevolution with viruses, and cryptic
sex. Plant Cell 22 (2010), pp. 2943-55.
Bloom, B. and Topper, Y.J. Mechanism of action of aldolase and phosphotriose isomerase.
Science 124 (1956), pp. 982-3.
Bowler, C., Allen, A.E., Badger, J.H., Grimwood, J., Jabbari, K., Kuo, A., Maheswari, U.,
Martens, C., Maumus, F., Otillar, R.P., Rayko, E., Salamov, A., Vandepoele, K.,
Beszteri, B., Gruber, A., Heijde, M., Katinka, M., Mock, T., Valentin, K., Verret, F.,
Berges, J.A., Brownlee, C., Cadoret, J.P., Chiovitti, A., Choi, C.J., Coesel, S., De
Martino, A., Detter, J.C., Durkin, C., Falciatore, A., Fournet, J., Haruta, M., Huysman,
M.J., Jenkins, B.D., Jiroutova, K., Jorgensen, R.E., Joubert, Y., Kaplan, A., Kroger,
N., Kroth, P.G., La Roche, J., Lindquist, E., Lommer, M., Martin-Jezequel, V., Lopez,
P.J., Lucas, S., Mangogna, M., McGinnis, K., Medlin, L.K., Montsant, A., Oudot-Le
Secq, M.P., Napoli, C., Obornik, M., Parker, M.S., Petit, J.L., Porcel, B.M., Poulsen,
N., Robison, M., Rychlewski, L., Rynearson, T.A., Schmutz, J., Shapiro, H., Siaut,
M., Stanley, M., Sussman, M.R., Taylor, A.R., Vardi, A., von Dassow, P., Vyverman,
W., Willis, A., Wyrwicz, L.S., Rokhsar, D.S., Weissenbach, J., Armbrust, E.V., Green,
B.R., Van de Peer, Y. and Grigoriev, I.V. The Phaeodactylum genome reveals the
evolutionary history of diatom genomes. Nature 456 (2008), pp. 239-44.
Broedel, S.E., Jr. and Wolf, R.E., Jr. Genetic tagging, cloning, and DNA sequence of the
Synechococcus sp. strain PCC 7942 gene (gnd) encoding 6-phosphogluconate
dehydrogenase. J Bacteriol 172 (1990), pp. 4023-31.
Derelle, E., Ferraz, C., Rombauts, S., Rouze, P., Worden, A.Z., Robbens, S., Partensky, F.,
Degroeve, S., Echeynie, S., Cooke, R., Saeys, Y., Wuyts, J., Jabbari, K., Bowler, C.,
Panaud, O., Piegu, B., Ball, S.G., Ral, J.P., Bouget, F.Y., Piganeau, G., De Baets, B.,
Picard, A., Delseny, M., Demaille, J., Van de Peer, Y. and Moreau, H. Genome
analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique
features. Proc Natl Acad Sci U S A 103 (2006), pp. 11647-52.
Dimitriadis, D., Koumandou, V.L., Trimpalis, P. and Kossida, S. Protein functional links in
Trypanosoma brucei, identified by gene fusion analysis. BMC Evol Biol 11 (2011), p.
193.
Dugaiczyk, A., Haron, J.A., Stone, E.M., Dennison, O.E., Rothblum, K.N. and Schwartz, R.J.
Cloning and sequencing of a deoxyribonucleic acid copy of glyceraldehyde-3-
phosphate dehydrogenase messenger ribonucleic acid isolated from chicken muscle.
Biochemistry 22 (1983), pp. 1605-13.
Enright, A.J., Iliopoulos, I., Kyrpides, N.C. and Ouzounis, C.A. Protein interaction maps for
complete genomes based on gene fusion events. Nature 402 (1999), pp. 86-90.
Enright, A.J. and Ouzounis, C.A. Functional associations of proteins in entire genomes by
means of exhaustive detection of gene fusions. Genome Biol 2 (2001), p.
RESEARCH0034.
Ewing, R.M., Chu, P., Elisma, F., Li, H., Taylor, P., Climie, S., McBroom-Cerajewski, L.,
Robinson, M.D., O'Connor, L., Li, M., Taylor, R., Dharsee, M., Ho, Y., Heilbut, A.,
Moore, L., Zhang, S., Ornatsky, O., Bukhman, Y.V., Ethier, M., Sheng, Y., Vasilescu,
J., Abu-Farha, M., Lambert, J.P., Duewel, H.S., Stewart, II, Kuehl, B., Hogue, K.,
Colwill, K., Gladwish, K., Muskat, B., Kinach, R., Adams, S.L., Moran, M.F., Morin,
G.B., Topaloglou, T. and Figeys, D. Large-scale mapping of human protein-protein
interactions by mass spectrometry. Mol Syst Biol 3 (2007), p. 89.
Falkowski, P.G., Katz, M.E., Knoll, A.H., Quigg, A., Raven, J.A., Schofield, O. and Taylor,
F.J. The evolution of modern eukaryotic phytoplankton. Science 305 (2004), pp. 354-
60.
Feng, Q., Wang, H., Ng, H.H., Erdjument-Bromage, H., Tempst, P., Struhl, K. and Zhang, Y.
Methylation of H3-lysine 79 is mediated by a new family of HMTases without a SET
domain. Curr Biol 12 (2002), pp. 1052-8.
Field, C.B., Behrenfeld, M.J., Randerson, J.T. and Falkowski, P. Primary production of the
biosphere: integrating terrestrial and oceanic components. Science 281 (1998), pp.
237-40.
Fields, S. and Song, O. A novel genetic system to detect protein-protein interactions. Nature
340 (1989), pp. 245-6.
Fothergill-Gilmore, L.A. The evolution of the glycolytic pathway. Trends Biochem Sci 11
(1986), pp. 47-51.
Fouts, D., Ganguly, R., Gutierrez, A.G., Lucchesi, J.C. and Manning, J.E. Nucleotide
sequence of the Drosophila glucose-6-phosphate dehydrogenase gene and comparison
with the homologous human gene. Gene 63 (1988), pp. 261-75.
Gray, M.W. Evolution of organellar genomes. Curr Opin Genet Dev 9 (1999), pp. 678-87.
Grossman, A.R. Paths toward algal genomics. Plant Physiol 137 (2005), pp. 410-27.
Herron, M.D., Hackett, J.D., Aylward, F.O. and Michod, R.E. Triassic origin and early
radiation of multicellular volvocine algae. Proc Natl Acad Sci U S A 106 (2009), pp.
3254-8.
Hunter, S., Jones, P., Mitchell, A., Apweiler, R., Attwood, T.K., Bateman, A., Bernard, T.,
Binns, D., Bork, P., Burge, S., de Castro, E., Coggill, P., Corbett, M., Das, U.,
Daugherty, L., Duquenne, L., Finn, R.D., Fraser, M., Gough, J., Haft, D., Hulo, N.,
Kahn, D., Kelly, E., Letunic, I., Lonsdale, D., Lopez, R., Madera, M., Maslen, J.,
McAnulla, C., McDowall, J., McMenamin, C., Mi, H., Mutowo-Muellenet, P.,
Mulder, N., Natale, D., Orengo, C., Pesseat, S., Punta, M., Quinn, A.F., Rivoire, C.,
Sangrador-Vegas, A., Selengut, J.D., Sigrist, C.J., Scheremetjew, M., Tate, J.,
Thimmajanarthanan, M., Thomas, P.D., Wu, C.H., Yeats, C. and Yong, S.Y. InterPro
in 2011: new developments in the family and domain prediction database. Nucleic
Acids Res 40 (2011), pp. D306-12.
Huson, D.H., Richter, D.C., Rausch, C., Dezulian, T., Franz, M. and Rupp, R. Dendroscope:
An interactive viewer for large phylogenetic trees. BMC Bioinformatics 8 (2007), p.
460.
Jogl, G., Rozovsky, S., McDermott, A.E. and Tong, L. Optimal alignment for enzymatic
proton transfer: structure of the Michaelis complex of triosephosphate isomerase at
1.2-A resolution. Proc Natl Acad Sci U S A 100 (2003), pp. 50-5.
Kohlhoff, M., Dahm, A. and Hensel, R. Tetrameric triosephosphate isomerase from
hyperthermophilic Archaea. FEBS Lett 383 (1996), pp. 245-50.
Kummerfeld, S.K. and Teichmann, S.A. Relative rates of gene fusion and fission in multi-
domain proteins. Trends Genet 21 (2005), pp. 25-30.
Kuroiwa, T. The primitive red algae Cyanidium caldarium and Cyanidioschyzon merolae as
model system for investigating the dividing apparatus of mitochondria and plastids.
BioEssays 20 (1998), pp. 344–354.
Lal, A., Schutzbach, J.S., Forsee, W.T., Neame, P.J. and Moremen, K.W. Isolation and
expression of murine and rabbit cDNAs encoding an alpha 1,2-mannosidase involved
in the processing of asparagine-linked oligosaccharides. J Biol Chem 269 (1994), pp.
9872-81.
Letunic, I. and Bork, P. Interactive Tree Of Life v2: online annotation and display of
phylogenetic trees made easy. Nucleic Acids Res 39 (2011), pp. W475-8.
Lolis, E., Alber, T., Davenport, R.C., Rose, D., Hartman, F.C. and Petsko, G.A. Structure of
yeast triosephosphate isomerase at 1.9-A resolution. Biochemistry 29 (1990), pp.
6609-18.
Magrane, M. and Consortium, U. UniProt Knowledgebase: a hub of integrated protein data.
Database (Oxford) 2011 (2011), p. bar009.
Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, T.O. and Eisenberg, D.
Detecting protein function and protein-protein interactions from genome sequences.
Science 285 (1999), pp. 751-3.
Martin, W., Brinkmann, H., Savonna, C. and Cerff, R. Evidence for a chimeric nature of
nuclear genomes: eubacterial origin of eukaryotic glyceraldehyde-3-phosphate
dehydrogenase genes. Proc Natl Acad Sci U S A 90 (1993), pp. 8692-6.
Martini, G. and Ursini, M.V. A new lease of life for an old enzyme. Bioessays 18 (1996), pp.
631-7.
Matsuzaki, M., Misumi, O., Shin, I.T., Maruyama, S., Takahara, M., Miyagishima, S.Y.,
Mori, T., Nishida, K., Yagisawa, F., Yoshida, Y., Nishimura, Y., Nakao, S., Kobayashi,
T., Momoyama, Y., Higashiyama, T., Minoda, A., Sano, M., Nomoto, H., Oishi, K.,
Hayashi, H., Ohta, F., Nishizaka, S., Haga, S., Miura, S., Morishita, T., Kabeya, Y.,
Terasawa, K., Suzuki, Y., Ishii, Y., Asakawa, S., Takano, H., Ohta, N., Kuroiwa, H.,
Tanaka, K., Shimizu, N., Sugano, S., Sato, N., Nozaki, H., Ogasawara, N., Kohara, Y.
and Kuroiwa, T. Genome sequence of the ultrasmall unicellular red alga
Cyanidioschyzon merolae 10D. Nature 428 (2004), pp. 653-7.
Merchant, S.S., Prochnik, S.E., Vallon, O., Harris, E.H., Karpowicz, S.J., Witman, G.B.,
Terry, A., Salamov, A., Fritz-Laylin, L.K., Marechal-Drouard, L., Marshall, W.F., Qu,
L.H., Nelson, D.R., Sanderfoot, A.A., Spalding, M.H., Kapitonov, V.V., Ren, Q.,
Ferris, P., Lindquist, E., Shapiro, H., Lucas, S.M., Grimwood, J., Schmutz, J., Cardol,
P., Cerutti, H., Chanfreau, G., Chen, C.L., Cognat, V., Croft, M.T., Dent, R., Dutcher,
S., Fernandez, E., Fukuzawa, H., Gonzalez-Ballester, D., Gonzalez-Halphen, D.,
Hallmann, A., Hanikenne, M., Hippler, M., Inwood, W., Jabbari, K., Kalanon, M.,
Kuras, R., Lefebvre, P.A., Lemaire, S.D., Lobanov, A.V., Lohr, M., Manuell, A.,
Meier, I., Mets, L., Mittag, M., Mittelmeier, T., Moroney, J.V., Moseley, J., Napoli, C.,
Nedelcu, A.M., Niyogi, K., Novoselov, S.V., Paulsen, I.T., Pazour, G., Purton, S., Ral,
J.P., Riano-Pachon, D.M., Riekhof, W., Rymarquis, L., Schroda, M., Stern, D., Umen,
J., Willows, R., Wilson, N., Zimmer, S.L., Allmer, J., Balk, J., Bisova, K., Chen, C.J.,
Elias, M., Gendler, K., Hauser, C., Lamb, M.R., Ledford, H., Long, J.C., Minagawa,
J., Page, M.D., Pan, J., Pootakham, W., Roje, S., Rose, A., Stahlberg, E., Terauchi,
A.M., Yang, P., Ball, S., Bowler, C., Dieckmann, C.L., Gladyshev, V.N., Green, P.,
Jorgensen, R., Mayfield, S., Mueller-Roeber, B., Rajamani, S., Sayre, R.T., Brokstein,
P., et al. The Chlamydomonas genome reveals the evolution of key animal and plant
functions. Science 318 (2007), pp. 245-50.
Muramoto, K., Ohta, K., Shinzawa-Itoh, K., Kanda, K., Taniguchi, M., Nabekura, H.,
Yamashita, E., Tsukihara, T. and Yoshikawa, S. Bovine cytochrome c oxidase
structures enable O2 reduction with minimization of reactive oxygens and provide a
proton-pumping gate. Proc Natl Acad Sci U S A 107 (2010), pp. 7740-5.
Ndimba, B.K., Ndimba, R.J., Johnson, T.S., Waditee-Sirisattha, R., Baba, M., Sirisattha, S.,
Shiraiwa, Y., Agrawal, G.K. and Rakwal, R. Biofuels as a sustainable energy source: An update
of the applications of proteomics in bioenergy crops and algae. J Proteomics (2013), S1874-
3919(13)00332-1.
Ostermeier, C., Harrenga, A., Ermler, U. and Michel, H. Structure at 2.7 A resolution of the
Paracoccus denitrificans two-subunit cytochrome c oxidase complexed with an
antibody FV fragment. Proc Natl Acad Sci U S A 94 (1997), pp. 10547-53.
Palenik, B., Grimwood, J., Aerts, A., Rouze, P., Salamov, A., Putnam, N., Dupont, C.,
Jorgensen, R., Derelle, E., Rombauts, S., Zhou, K., Otillar, R., Merchant, S.S., Podell,
S., Gaasterland, T., Napoli, C., Gendler, K., Manuell, A., Tai, V., Vallon, O., Piganeau,
G., Jancek, S., Heijde, M., Jabbari, K., Bowler, C., Lohr, M., Robbens, S., Werner, G.,
Dubchak, I., Pazour, G.J., Ren, Q., Paulsen, I., Delwiche, C., Schmutz, J., Rokhsar,
D., Van de Peer, Y., Moreau, H. and Grigoriev, I.V. The tiny eukaryote Ostreococcus
provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci
U S A 104 (2007), pp. 7705-10.
Parkinson, J. and Gordon, R. Beyond micromachining: the potential of diatoms. Trends
Biotechnol 17 (1999), pp. 190-6.
Perez-Martinez, X., Antaramian, A., Vazquez-Acevedo, M., Funes, S., Tolkunova, E.,
d'Alayer, J., Claros, M.G., Davidson, E., King, M.P. and Gonzalez-Halphen, D.
Subunit II of cytochrome c oxidase in Chlamydomonad algae is a heterodimer
encoded by two independent nuclear genes. J Biol Chem 276 (2001), pp. 11302-9.
Phizicky, E.M. and Fields, S. Protein-protein interactions: methods for detection and analysis.
Microbiol Rev 59 (1995), pp. 94-123.
Prochnik, S.E., Umen, J., Nedelcu, A.M., Hallmann, A., Miller, S.M., Nishii, I., Ferris, P.,
Kuo, A., Mitros, T., Fritz-Laylin, L.K., Hellsten, U., Chapman, J., Simakov, O.,
Rensing, S.A., Terry, A., Pangilinan, J., Kapitonov, V., Jurka, J., Salamov, A., Shapiro,
H., Schmutz, J., Grimwood, J., Lindquist, E., Lucas, S., Grigoriev, I.V., Schmitt, R.,
Kirk, D. and Rokhsar, D.S. Genomic analysis of organismal complexity in the
multicellular green alga Volvox carteri. Science 329 (2010), pp. 223-6.
Putnam, N.H., Srivastava, M., Hellsten, U., Dirks, B., Chapman, J., Salamov, A., Terry, A.,
Shapiro, H., Lindquist, E., Kapitonov, V.V., Jurka, J., Genikhovich, G., Grigoriev, I.V.,
Lucas, S.M., Steele, R.E., Finnerty, J.R., Technau, U., Martindale, M.Q. and Rokhsar,
D.S. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic
organization. Science 317 (2007), pp. 86-94.
Sarafidou, T., Kahl, C., Martinez-Garay, I., Mangelsdorf, M., Gesk, S., Baker, E., Kokkinaki,
M., Talley, P., Maltby, E.L., French, L., Harder, L., Hinzmann, B., Nobile, C.,
Richkind, K., Finnis, M., Deloukas, P., Sutherland, G.R., Kutsche, K., Moschonas,
N.K., Siebert, R. and Gecz, J. Folate-sensitive fragile site FRA10A is due to an
expansion of a CGG repeat in a novel gene, FRA10AC1, encoding a nuclear protein.
Genomics 84 (2004), pp. 69-81.
Sayers, E.W., Barrett, T., Benson, D.A., Bryant, S.H., Canese, K., Chetvernin, V., Church, D.M.,
DiCuccio, M., Edgar, R., Federhen, S., Feolo, M., Geer, L.Y., Helmberg, W., Kapustin, Y.,
Landsman, D., Lipman, D.J., Madden, T.L., Maglott, D.R., Miller, V., Mizrachi, I., Ostell, J.,
Pruitt, K.D., Schuler, G.D., Sequeira, E., Sherry, S.T., Shumway, M., Sirotkin, K., Souvorov, A.,
Starchenko, G., Tatusova, T.A., Wagner, L., Yaschenko, E. and Ye, J. Database resources
of the National Center for Biotechnology Information. Nucleic Acids Res 37 (2009), pp.
D5-15.
Sellis, D., Vlachakis, D. and Vlassi, M. Gromita: a fully integrated graphical user interface to
gromacs 4. Bioinform Biol Insights 3 (2009), pp. 99-102.
Sims, P., Mann, D., and Medlin, L. Evolution of the diatoms: insights from fossil, biological
and molecular data. Phycologia 45 (2006), pp. 361–402.
Singer, M.S., Kahana, A., Wolf, A.J., Meisinger, L.L., Peterson, S.E., Goggin, C., Mahowald,
M. and Gottschling, D.E. Identification of high-copy disruptors of telomeric silencing
in Saccharomyces cerevisiae. Genetics 150 (1998), pp. 613-32.
Skarzynski, T. and Wonacott, A.J. Coenzyme-induced conformational changes in
glyceraldehyde-3-phosphate dehydrogenase from Bacillus stearothermophilus. J Mol
Biol 203 (1988), pp. 1097-118.
Snel, B., Bork, P. and Huynen, M. Genome evolution. Gene fusion versus gene fission.
Trends Genet 16 (2000), pp. 9-11.
Takeda, H. Classification of Chlorella strains by cell wall sugar composition. Phytochemistry
27 (1988), pp. 3823–3826.
Tsagrasoulis, D., Danos, V., Kissa, M., Trimpalis, P., Koumandou, V.L., Karagouni, A.D.,
Tsakalidis, A. and Kossida, S. SAFE Software and FED Database to Uncover Protein-
Protein Interactions using Gene Fusion Analysis. Evol Bioinform Online 8 (2012), pp.
47-60.
Tsukihara, T., Aoyama, H., Yamashita, E., Tomizaki, T., Yamaguchi, H., Shinzawa-Itoh, K.,
Nakashima, R., Yaono, R. and Yoshikawa, S. The whole structure of the 13-subunit
oxidized cytochrome c oxidase at 2.8 A. Science 272 (1996), pp. 1136-44.
van Leeuwen, F., Gafken, P.R. and Gottschling, D.E. Dot1p modulates silencing in yeast by
methylation of the nucleosome core. Cell 109 (2002), pp. 745-56.
Wacker, H., Harvey, R.A., Winestock, C.H. and Plaut, G.W. 4-(1'-D-Ribitylamino)-5-Amino-
2,6-Dihydroxypyrimidine, the Second Product of the Riboflavin Synthetase Reaction.
J Biol Chem 239 (1964), pp. 3493-7.
Yanai, I., Derti, A. and DeLisi, C. Genes linked by fusion events are generally of the same
functional category: a systematic analysis of 30 microbial genomes. Proc Natl Acad
Sci U S A 98 (2001), pp. 7940-5.
Yoon, H.S., Hackett, J.D., Ciniglia, C., Pinto, G. and Bhattacharya, D. A molecular timeline
for the origin of photosynthetic eukaryotes. Mol Biol Evol 21 (2004), pp. 809-18.
FIGURE LEGENDS
Figure 1. Schematic representations displaying the alignment of identified fusion (A, B, D and
E) or fission (C) proteins in green microalgal species (A, B and C) and diatoms (D and E) with
their respective split or composite proteins. The amino acid positions that correspond to
the beginning and the end of the alignment are indicated, as well as the boundaries of the
conserved domains relative to the full-length protein. Asterisks beside the microalgal
species names below denote that the particular species was chosen as a representative
amongst others for the purposes of the alignment comparison.
A: Alignment of fusion protein alpha 1,2 mannosidase/Fra10Ac1 in the green alga V. carteri
with the respective split proteins in the green alga C. reinhardtii*.
B: Alignment of fusion protein DOT1/riboflavin synthase in the green alga V. carteri with
the respective split proteins in the green alga C. reinhardtii*.
C: Alignment of fission proteins COX2A and COX2B in the green alga V. carteri* with the
respective composite protein in the red alga C. merolae.
D: Alignment of fusion protein G6PDH/6PGDH in the diatom P. tricornutum with the
respective split proteins in the diatom T. pseudonana*.
E: Alignment of fusion protein TIM/GAPDH in the diatom Phaeodactylum tricornutum* with
the respective split proteins in the red alga C. merolae.
Figure 2. Ribbon representations of the three dimensional homology models for fusion and
individual fission and split proteins.
A. Left: Ribbon representation of the X-ray crystal structure of 3GYX, which was used as
template. 3GYX consists of six copies of a heterodimer that is made up of a large α-helical
barrel-like conformation (in orange color) and a smaller molecule in an extended coil and
β-sheet conformation that wraps around the larger component (in blue color).
Right: The 3D homology modelled molecular system of the alpha 1,2 mannosidase (in
green color) and the Fra10Ac1 molecule (in red color) in complexed conformation.
B. Left: Ribbon representation of the 3D homology model for the riboflavin synthase
model superposed on its X-ray crystal template structure. The theoretical model is in red,
whereas the template X-ray crystal structure is shown in green color.
Right: Following the spatial organization of the homotrimeric template complex structure,
the 3D model of the complex was established. This trimeric model (shown per monomer
in green, red and blue color and wire representation) was subsequently subjected to
docking algorithms in the presence of a single DOT1 molecule (shown in yellow ribbon),
also established via computer-aided homology modelling techniques. Herein is a snapshot
of the final complex conformation.
C. Left: Ribbon representation of the produced 3D homology model of the COXIIa and
COXIIb complex including the modelled inserts of the fusion sites. COXIIa is shown in red,
COXIIb in magenta, the α-helical insert predicted by secondary structure analysis in blue and
the α-helical structure from the bacterium Thermotoga maritima in green color.
Right: Using the previous conventions (C, left), the 3D homology model of COXIIa-b is
modelled on the crystal structure of the complete two-subunit cytochrome c oxidase from
Paracoccus denitrificans (RCSB entry: 1AR1), which was used as template.
D. Ribbon representation of the 3D homology model for the complex of the predicted
fusion event in Phaeodactylum tricornutum. The G6PDH model is in red, whereas the
component 6PGDH model is shown in green.
E. Ribbon representation of the 3D model produced for the hypothesised Thalassiosira
pseudonana fusion protein. The GAPDH model is in red, whereas the TIM model is shown in
green.
Figure 3. Electrostatic potential surfaces were calculated in order to analyze and compare
the charge distribution of the produced 3D model of the COXIIa and COXIIb complex with that
of the template structure on which it was based. The two complexes exhibited almost identical
electrostatic surfaces, sharing common features that were not disturbed by the addition of
the two extra α-helices to the homology model. A hydrophobic, uncharged region in the
mid-section of the two complexes (depicted by white boxes), which is vital to their
function, has been conserved despite the addition of the insert structures. This
observation supports the validity of the model, whose electrostatic surface is similar to,
and of almost identical intensity to, that of its X-ray determined template
structure. The 3D position of the two extra α-helices is indicated by the square black boxes.
Figure 4. Dendrograms of identified fusion and fission proteins in microalgal species.
A: Dendrogram of fusion protein alpha 1,2 mannosidase/Fra10Ac1
B: Dendrogram of fusion protein DOT1/riboflavin synthase
C: Dendrogram of COX2
D: Dendrogram of fusion protein G6PDH/6PGDH
E: Dendrogram of fusion protein TIM/GAPDH
Figure 5. NCBI-extracted dendrogram illustrating the phylogenetic distribution of the
predicted fusion/fission events in the main eukaryotic and prokaryotic taxa. The
conventions/symbols are the same as in Figure 4.
TABLE LEGENDS
Table 1: List and description of the organisms analyzed in the present study.
Table 2: Predicted fusion events and the corresponding fused and heterodimeric proteins.
Table 3: Predicted fission event and the corresponding split and composite proteins.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Table 1
Taxonomy / Species Description
1. Chlorophyta (Green algae) They constitute a large diverse group of photosynthetic eukaryotes from which the land plants (Streptophytes) descended one billion years ago (Yoon et al., 2004). Green algae originated from primary endosymbiosis, whereby a non-photosynthetic eukaryote acquired a chloroplast by engulfing a primary cyanobacterium (Gray, 1999; Falkowski et al., 2004). They play an important role in global energy and biomass production (Grossman, 2005).
1.1. Chlorophyceae
1.1.1. Chlamydomonadales
Volvox carteri f. nagariensis Multicellular chlorophycean alga (Prochnik et al., 2010) which diverged from its unicellular ancestors approximately 200 million years ago (Herron et al., 2009). V. carteri is extensively used for studying multicellularity and cellular differentiation (Herron et al., 2009; Prochnik et al., 2010).
Chlamydomonas reinhardtii C. reinhardtii (Merchant et al., 2007) is a unicellular prasinophyte which is found in diverse aquatic environments. It provides a model organism for studying eukaryotic photosynthesis, cellular metabolism and sexual reproduction due to its well-established genetic background.
1.2. Trebouxiophyceae
Chlorella variabilis NC64A Unicellular organism, found in both aquatic and terrestrial environments (Takeda, 1988). C. variabilis is used as a model organism for studying adaptation to photosymbiosis and viral-algal interactions (Blanc et al., 2010).
1.3. Mamiellophyceae
Ostreococcus lucimarinus CCE9901 One of the smallest known unicellular marine eukaryotes (about 1 μm diameter) (Piganeau et al., 2011). It is found in the phytoplankton of diverse marine environments. This alga, believed to be the last common ancestor of the green lineage from which all other green algae and terrestrial plants have emerged, is used in evolutionary and genomic studies (Derelle et al., 2006; Palenik et al., 2007).
2. Rhodophyta (Red algae) Red algae, like green algae, are also believed to have descended from an ancestral cyanobacterial endosymbiont (Gray, 1999; Falkowski et al., 2004). They are proposed to have evolved before the divergence of green algae from terrestrial plants (Merchant et al., 2007).
Cyanidioschyzon merolae 10D C. merolae is a small (2 μm diameter) unicellular red alga with a compact genome (about 16 Mb) (Matsuzaki et al., 2004), found in acidic hot springs. The cell contains a single nucleus, a single plastid and a single mitochondrion. Due to its simple gene composition, the genome of C. merolae is used as a model system for studying the origin, evolution and fundamental traits of eukaryotic cells (Kuroiwa, 1998).
3. Bacillariophyta (Diatoms) Diatoms are unicellular, photosynthetic algae. They are distributed in almost all water bodies throughout the world, and play an important role in the global ecosystem since they are responsible for about one-fifth of global carbon fixation (Field et al., 1998; Armbrust, 2009). A characteristic feature of diatoms is their intricate silicified cell wall, or frustule, which is exploited by nanotechnologists (Parkinson and Gordon, 1999)
3.1. Coscinodiscophyceae (Centric diatoms)
Thalassiosira pseudonana CCMP1335 T. pseudonana (Armbrust et al., 2004) is a marine centric diatom, the origin of which is traced to 180 million years ago (Sims, 2006). It has been used as a model organism for studying diatom physiology (Armbrust et al., 2004).
3.2. Bacillariophyceae (Pennate diatoms)
Phaeodactylum tricornutum CCAP 1055/1 P. tricornutum is a pennate diatom with a small genome size (about 30 Mb). It diverged from T. pseudonana 90 million years ago, and is found in both pelagic and benthic environments (Sims, 2006). It has served as a model system for studying diatom genomics (Bowler et al., 2008).
Table 2
Fused protein columns: Species | Protein | Accession
Heterodimeric protein columns: Species | Protein | Accession | Locus

Fused protein: Volvox carteri | alpha-1,2-mannosidase | XP_002957696
Heterodimeric proteins:
Chlamydomonas reinhardtii | alpha-1,2-mannosidase | EDO98388 | Contig ABCN01005783.1
Chlamydomonas reinhardtii | predicted protein | EDO98391 | Contig ABCN01005784.1
Ostreococcus lucimarinus | predicted protein | XP_001421581 | Chromosome 15
Ostreococcus lucimarinus | predicted protein | XP_001417927 | Chromosome 5
Chlorella variabilis | hypothetical protein CHLNCDRAFT_33654 | EFN60136 | Contig ADIC01000169.1
Chlorella variabilis | hypothetical protein CHLNCDRAFT_22127 | EFN56799 | Contig ADIC01001059.1
Thalassiosira pseudonana | mannosyl-oligosaccharide 1,2-alpha-mannosidase | EED93215.1 | Chromosome 4
Thalassiosira pseudonana | predicted protein | EED88579 | Chromosome 15
Phaeodactylum tricornutum | mannosyl-oligosaccharide alpha-1,2-mannosidase | XP_002182479 | Chromosome 16
Phaeodactylum tricornutum | predicted protein | XP_002180997 | Chromosome 10

Fused protein: Volvox carteri | hypothetical protein VOLCADRAFT_120730 | XP_002949156
Heterodimeric proteins:
Chlamydomonas reinhardtii | predicted protein | EDP03930 | Contig ABCN01002604.1
Chlamydomonas reinhardtii | riboflavin synthase | EDP05053 | Contig ABCN01002077.1
Ostreococcus lucimarinus | predicted protein | XP_001417943 | Chromosome 5
Ostreococcus lucimarinus | predicted protein | XP_001418747 | Chromosome 7
Thalassiosira pseudonana | predicted protein | EED96376 | Chromosome 1
Thalassiosira pseudonana | predicted protein | EED95235 | Chromosome 2
Phaeodactylum tricornutum | predicted protein | XP_002177531 | Chromosome 1
Phaeodactylum tricornutum | predicted protein | XP_002182351 | Chromosome 15

Fused protein: Phaeodactylum tricornutum | G6PDH/6PGDH fusion protein | XP_002185945
Heterodimeric proteins:
Thalassiosira pseudonana | glucose-6-phosphate 1-dehydrogenase | EED92550 | Chromosome 5
Thalassiosira pseudonana | 6-phosphogluconate dehydrogenase | EED93357 | Chromosome 4
Ostreococcus lucimarinus | predicted protein | XP_001417868 | Chromosome 5
Ostreococcus lucimarinus | predicted protein | XP_001418779 | Chromosome 7
Volvox carteri | hypothetical protein VOLCADRAFT_82038 | XP_002953022 | scaffold VOLCAscaffold_33
Volvox carteri | hypothetical protein VOLCADRAFT_109207 | XP_002953515 | scaffold VOLCAscaffold_36
Chlamydomonas reinhardtii | glucose-6-phosphate-1-dehydrogenase | EDP00500.1 | Contig ABCN01004386.1
Chlamydomonas reinhardtii | 6-phosphogluconate dehydrogenase, decarboxylating | EDP00572.1 | Contig ABCN01004298.1

Fused proteins:
1. Thalassiosira pseudonana | triosephosphate isomerase/glyceraldehyde-3-phosphate dehydrogenase precursor | EED92326
2. Phaeodactylum tricornutum | triosephosphate isomerase/glyceraldehyde-3-phosphate dehydrogenase precursor | XP_002177987
Heterodimeric proteins:
Chlamydomonas reinhardtii | triose-phosphate isomerase | EDP09773 | Contig ABCN01000165.1
Chlamydomonas reinhardtii | glyceraldehyde 3-phosphate dehydrogenase, dominant splicing variant | EDO96575 | Contig ABCN01007563.1
Volvox carteri | hypothetical protein VOLCADRAFT_109969 | XP_002955427 | scaffold VOLCAscaffold_51
Volvox carteri | glyceraldehyde 3-phosphate dehydrogenase | XP_002956882 | scaffold VOLCAscaffold_66
Chlorella variabilis | triosephosphate isomerase cytoplasmic type | EFN53775 | Contig ADIC01002012.1
Chlorella variabilis | hypothetical protein CHLNCDRAFT_36383 | EFN53819 | Contig ADIC01002038.1
Cyanidioschyzon merolae | triose-phosphate isomerase | BAC67674 | -
Cyanidioschyzon merolae | Glyceraldehyde 3 phosphate dehydrogenase | BAC67669 | -
Table 3
Fission protein columns: Species | Protein | Accession | Locus
Composite protein columns: Species | Protein | Accession

Fission proteins:
1. Volvox carteri | hypothetical protein VOLCADRAFT_74497 | XP_002950066 | Scaffold VOLCAscaffold_17
1. Volvox carteri | cytochrome c oxidase subunit II | XP_002948528 | Scaffold VOLCAscaffold_10
2. Chlamydomonas reinhardtii | cytochrome c oxidase subunit II, protein IIa of split subunit | EDP00208.1 | Contig ABCN01004598.1
2. Chlamydomonas reinhardtii | cytochrome c oxidase subunit II, protein IIb of split subunit | EDP09974.1 | Contig ABCN01000296.1
Composite protein:
Cyanidioschyzon merolae | cytochrome c oxidase polypeptide II | BAA34656.1
ABBREVIATIONS
CO2 Carbon dioxide
SAFE Software for the Analysis of Fusion Events
3D Three-dimensional
BLAST Basic Local Alignment Search Tool
Fra10Ac1 Fragile site, folic acid type, rare, fra(10)(q23.3) or fra(10)(q24.2) candidate 1
DOT1 Disruptor of Telomeric silencing
COX2A Cytochrome c oxidase subunit II, transmembrane domain
COX2B Cytochrome c oxidase subunit II C-terminal
G6PDH Glucose-6-phosphate 1-dehydrogenase
TIM Triosephosphate isomerase
6PGDH 6-Phosphogluconate Dehydrogenase
GAPDH Glyceraldehyde-3-phosphate dehydrogenase
DMRL 6,7-dimethyl-8-(1'-D-ribityl)-lumazine
G3P D-glyceraldehyde 3-phosphate
DHAP Dihydroxyacetone phosphate
PDB Protein Data Bank
RNA Ribonucleic acid
NCBI National Center for Biotechnology Information
ATP Adenosine triphosphate
WD Trp-Asp
SH3 Src Homology 3 Domain
NPS Network Protein Sequence Analysis
MOE Molecular Operating Environment
MEP Molecular electrostatic potential
NVT Number of atoms, Volume and Temperature
RMSd Root-mean-square deviation