Unraveling microalgal molecular interactions using evolutionary and structural bioinformatics

Dimitrios Vlachakis, Athanasia Pavlopoulou, Dorothea Kazazi, Sophia Kossida

PII: S0378-1119(13)00928-1
DOI: 10.1016/j.gene.2013.07.039
Reference: GENE 38831
To appear in: Gene
Accepted date: 18 July 2013

Please cite this article as: Vlachakis, Dimitrios, Pavlopoulou, Athanasia, Kazazi, Dorothea, Kossida, Sophia, Unravelling microalgal molecular interactions using evolutionary and structural bioinformatics, Gene (2013), doi: 10.1016/j.gene.2013.07.039

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Unravelling microalgal molecular interactions using evolutionary and structural bioinformatics

Dimitrios Vlachakis, Athanasia Pavlopoulou, Dorothea Kazazi, and Sophia Kossida*

Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, Athens 11527, Greece

*Correspondence to: Sophia Kossida, Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, Athens 11527, Greece Tel: + 30 210 6597 199, Fax: +30 210 6597 545 E-mail: [email protected]

Highlights:

We analysed 7 microalgae organisms, carefully selected to belong to diverse groups.

We identified one fission and four fusion events that are considered genuine.

Protein interactions and functional links were identified in the 7 microalgae.

We investigated their evolutionary links via protein phylogenetic profiling.

The 3D structures of the identified proteins were modelled to study their function.

ABSTRACT

Microalgae are unicellular microorganisms indispensable for environmental stability and life on earth, because they produce approximately half of the atmospheric oxygen while simultaneously consuming the harmful greenhouse gas carbon dioxide. Using gene fusion analysis, a series of five fusion/fission events was identified, which provided the basis for critical insights into their evolutionary history. Moreover, the three-dimensional structures of both the fused and the component proteins were predicted, allowing us to envisage putative protein-protein interactions that are invaluable for the efficient usage, handling and exploitation of microalgae. Collectively, our proposed approach to the five fusion/fission algal protein events contributes towards the expansion of the microalgae knowledgebase, bridging protein evolution of the ancient microalgae species and the rapidly evolving, modern bioinformatics field.

Keywords: gene fusion, gene fission, homology modelling, protein association, microalgae

INTRODUCTION

The demand for sustainable energy reserves and for increased environmental control has escalated in the last few years, and this trend seems set to continue (Ndimba et al., 2013). A new asset in the race to face this challenge is the exploitation of microalgae, which until now have been under experimental investigation mostly as a source of biofuels. Recently, their capacity to mitigate CO2 emissions led to their exploitation as essential components of bio-adaptive facades of eco-friendly buildings that generate renewable energy and produce oxygen. Nevertheless, much work remains to be done on their basic biology.

Bioinformatics analysis methods constitute a Swiss army knife that can aid the elucidation of the molecular mechanisms in microalgae and therefore facilitate their full exploitation. In particular, virtual protein interactomics represents a rapidly developing scientific area on the boundary between bioinformatics and molecular biology. It constitutes an instrumental tool for the prediction, simulation and modelling of protein complex interactions, and provides insights into transient intracellular signaling pathways and protein evolution. Bioinformatics approaches can now identify putative protein-protein interactions purely from genome sequences (Enright et al., 1999; Marcotte et al., 1999), complementing labour-intensive and time-consuming conventional experimental methods such as mass spectrometry (Ewing et al., 2007), unlinked non-complementing mutant detection (Phizicky and Fields, 1995) and the widely used yeast two-hybrid assay (Fields and Song, 1989).

Recombination by fusion is one of the main evolutionary mechanisms for producing more complex and stable protein structures (Kummerfeld and Teichmann, 2005). Therefore, bioinformatics analysis based on gene fusion and fission constitutes a powerful prediction method in the study of protein interactions. This analysis is based on the principle that two component proteins A and B in one organism are likely to interact physically or to be functionally associated (involvement in the same protein complex, metabolic pathway or biological process) (Snel et al., 2000; Enright and Ouzounis, 2001) if their homologs in another organism are fused together into a single composite protein AB (otherwise known as a “Rosetta stone” protein) (Enright et al., 1999; Marcotte et al., 1999). Conversely, a fission event is considered to have occurred when a composite protein is found split into its component proteins in a reference genome (Enright and Ouzounis, 2001). Gene fusion analysis has been applied to a number of eukaryotic and prokaryotic organisms (Enright et al., 1999; Marcotte et al., 1999; Snel et al., 2000; Yanai et al., 2001; Kummerfeld and Teichmann, 2005; Dimitriadis et al., 2011). However, the proteomes of microalgal species have not yet been fully explored for such fusion events.
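The "Rosetta stone" principle described above can be illustrated with a minimal Python sketch. It is not the SAFE algorithm; it only assumes that similarity hits of component proteins onto a candidate composite protein are already available as residue ranges. The `Hit` record, the `rosetta_stone_pairs` function and the `max_overlap` tolerance are hypothetical names introduced for illustration, and the sketch omits the further filtering a real pipeline would need (for example, excluding components that are homologous to each other).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One local alignment of a component protein onto a candidate composite."""
    component: str  # identifier of the component protein
    composite: str  # identifier of the candidate composite protein
    start: int      # 1-based start of the aligned region on the composite
    end: int        # 1-based, inclusive end of the aligned region


def overlap(a: Hit, b: Hit) -> int:
    """Number of composite residues covered by both hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def rosetta_stone_pairs(hits, max_overlap=10):
    """Return (first, second, composite) triples in which two different
    components align to (nearly) non-overlapping regions of one composite."""
    by_composite = {}
    for h in hits:
        by_composite.setdefault(h.composite, []).append(h)

    pairs = set()
    for composite, group in by_composite.items():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a.component != b.component and overlap(a, b) <= max_overlap:
                    # Order the pair by position along the composite protein.
                    first, second = sorted((a, b), key=lambda h: h.start)
                    pairs.add((first.component, second.component, composite))
    return sorted(pairs)


# Toy data: A and B cover distinct halves of AB; C overlaps A heavily.
hits = [
    Hit("A", "AB", 1, 300),
    Hit("B", "AB", 310, 620),
    Hit("C", "AB", 20, 280),
]
print(rosetta_stone_pairs(hits))
```

In this toy input, A and B are reported as a candidate interacting pair, as are C and B, whereas A and C are rejected because they map to the same region of the composite.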

In the present study, we utilized the potential of protein fusion analysis and recently developed computational software in order to identify potential protein interactions and functional links in seven microalgal species. SAFE is the only software currently available in the public domain that has been developed specifically for the automated detection, filtering and visualization of fusion events (Tsagrasoulis et al., 2012). Most importantly, a performance comparison of the software against a previous benchmark study of gene fusions showed that the results obtained with SAFE agree with those of other methods, while the software can also be highly selective.

The evolutionary fate of these fusion and fission events was investigated via protein phylogenetic profiling, which reflects protein evolution and sheds light on the time frame in which these events occurred.

Furthermore, the three-dimensional structure of the identified component proteins or complexes was predicted by employing homology modelling, in order to gain insight into the protein molecular organization and putative function. Protein homology modelling is currently recognized as the most accurate method for 3D structure prediction, yielding models suitable for a wide spectrum of applications, such as structure-based molecular design, docking simulations and mechanism investigation.

The current study focused on the analysis of seven microalgae organisms, selected to belong to diverse evolutionary lineages, including green algae (Volvox carteri, Chlamydomonas reinhardtii, Chlorella variabilis, Ostreococcus lucimarinus), red algae (Cyanidioschyzon merolae) and diatoms (Phaeodactylum tricornutum and Thalassiosira pseudonana). The algal species that were investigated for putative fusion and fission events in the current study, along with the taxonomic group to which they belong, are described in Table 1.

RESULTS and DISCUSSION

The present study utilized the potential of gene fusion analysis in order to identify potential protein interactions and functional links in seven microalgal species which belong to diverse groups, including green algae (Volvox carteri, Chlamydomonas reinhardtii, Chlorella variabilis, Ostreococcus lucimarinus), red algae (Cyanidioschyzon merolae) and diatoms (Phaeodactylum tricornutum and Thalassiosira pseudonana).

Identification of putative fusion and fission events was achieved by comparing the seven complete, annotated proteomes against each other in an all-against-all analysis; a total of 42 analyses were performed (corresponding to the 7 × 6 ordered pairs of proteomes). Moreover, in order to extract the maximum amount of information from our analysis, whenever a fusion event was not detected in the proteome of one of the seven organisms under study, the available proteome of its phylogenetically closest organism and/or strain was investigated by BLAST search. For instance, Chlorella vulgaris was examined in the case of Chlorella variabilis NC64A, and Ostreococcus tauri in the case of Ostreococcus lucimarinus CCE9901 (Supplementary Tables 1 and 2).
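The count of 42 analyses follows from treating each comparison as an ordered pair of distinct proteomes, which is an assumption made here for illustration (the text does not spell out the pairing). A short Python sketch under that assumption enumerates the pairs:

```python
from itertools import permutations

# The seven species named in the text.
proteomes = [
    "Volvox carteri",
    "Chlamydomonas reinhardtii",
    "Chlorella variabilis",
    "Ostreococcus lucimarinus",
    "Cyanidioschyzon merolae",
    "Phaeodactylum tricornutum",
    "Thalassiosira pseudonana",
]

# Every ordered (query, reference) pair of distinct proteomes.
comparisons = list(permutations(proteomes, 2))
print(len(comparisons))  # 7 * 6 = 42
```

Each species therefore appears six times as the query proteome and six times as the reference proteome.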

The current study identified four fusion events and one fission event that were considered genuine, based both on our strict parameter settings and on a subsequent thorough manual analysis (see Methods).

Two of these events have been confirmed experimentally and three of them were found to be species-specific. The five fusion and fission events were:

1) Putative fusion of alpha 1,2 mannosidase and Fra10Ac1 homologs in the green alga Volvox carteri

2) Putative fusion of DOT1 and riboflavin synthase in the green alga Volvox carteri

3) Putative fission proteins COX2A and COX2B in the green algae Volvox carteri and Chlamydomonas reinhardtii

4) Putative fusion of G6PDH and 6PGDH in the diatom Phaeodactylum tricornutum

5) Putative fusion of TIM and GAPDH in the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana

The domain organization of the fused proteins was found to correspond to the domain organization of two individual proteins in one or more other microalgal species, as shown in Table 2. To display the comparison between the domain organization of the fused and the individual proteins, one of those species was chosen as a representative, and the diagrammatic representations are shown in Figure 1. It should be noted that the individual C. reinhardtii proteins in the first two fusion events and in the single fission event are encoded by non-homologous genes residing on different chromosomes/scaffolds/contigs. This also applies to the individual proteins in T. pseudonana and C. merolae for the fourth and fifth detected events. The predicted fusion events are discussed in detail below.

Identified putative interactions and functionally associated proteins

Two fusion events were detected in Volvox carteri. The first fused protein (XP_002957696.1) was found to have a domain organization that corresponded to that of two split Chlamydomonas reinhardtii proteins (Figure 1A): alpha 1,2 mannosidase (IPR001382), an enzyme implicated in the processing of Asn-linked oligosaccharides (Lal et al., 1994) and shown to have lytic activity on harmful marine microalgae, and Fra10Ac1 (IPR019129), a protein of nuclear localization and unknown function in Homo sapiens that is expressed in human brain, heart, skeletal muscle, kidney and liver (Sarafidou et al., 2004). Similar split protein pairs were also detected in Ostreococcus lucimarinus, Chlorella variabilis, Thalassiosira pseudonana and Phaeodactylum tricornutum. It should be noted here that all GenBank accession numbers commencing with XP represent computer-automated predictions, which have not been manually curated, annotated or experimentally confirmed.

The second fused protein identified in Volvox (XP_002949156.1) had a domain organization that corresponded to the domain arrangement of the proteins DOT1 (IPR013110) and riboflavin synthase (IPR017938) in C. reinhardtii (Figure 1B), while similar protein pairs were also found in Ostreococcus lucimarinus, Thalassiosira pseudonana and Phaeodactylum tricornutum. The function of the split protein DOT1 (Disruptor of Telomeric silencing) is to modulate gene expression in yeast by methylating histone H3 lysine 79 (Singer et al., 1998; Feng et al., 2002; van Leeuwen et al., 2002), while the enzyme riboflavin synthase catalyzes the synthesis of riboflavin from two molecules of 6,7-dimethyl-8-(1’-D-ribityl)-lumazine (DMRL) (Wacker et al., 1964).

Interestingly, one fission event was detected in both the green algae Volvox carteri (XP_002950066 & XP_002948528) and Chlamydomonas reinhardtii (EDP00208.1 & EDP09974.1) (Table 3). Analysis of the domain organization in the split Volvox carteri and Chlamydomonas reinhardtii proteins, COX2A and COX2B, showed a correspondence to the composite protein COX2 in Cyanidioschyzon merolae (BAA34656.1) (Figure 1C). The C. reinhardtii proteins EDP00208.1 and EDP09974.1 are annotated as cytochrome c oxidase subunit II, transmembrane domain (COX2A) (IPR011759) and cytochrome c oxidase subunit II C-terminal (COX2B) (IPR002429), respectively. The COX2A and COX2B proteins correspond to the N- and C-terminal regions of the second subunit of cytochrome c oxidase, a component of the electron transport chain of aerobic respiration, which is involved in the transfer of electrons from cytochrome c to reduce molecular oxygen. The cytochrome c oxidase enzyme complex is located in the inner mitochondrial membrane in eukaryotes, and in the plasma membrane in bacteria (Tsukihara et al., 1996; Ostermeier et al., 1997; Muramoto et al., 2010). Most importantly, this fission event has been experimentally verified by Perez-Martinez et al. (2001). It was demonstrated that the proteins COX2A and COX2B are encoded by two distinct genes, namely cox2a and cox2b, in the Chlamydomonad algae C. reinhardtii and Polytomella sp. (Perez-Martinez et al., 2001).

In the diatom Phaeodactylum tricornutum, a single fusion event (XP_002185945.1) was detected, where the domain organization of the fused Phaeodactylum protein, G6PDH/6PGDH, was shown to correspond to the domain organization of the split protein pair in Thalassiosira pseudonana, glucose-6-phosphate 1-dehydrogenase (G6PDH) (IPR001282) and 6-phosphogluconate dehydrogenase (6PGDH) (IPR006113) (Figure 1D). These two enzymes are implicated in the pentose phosphate pathway. G6PDH catalyses the first step in the pentose phosphate pathway, which is the oxidation of glucose-6-phosphate to 6-phosphogluconolactone in the presence of NADP, releasing NADPH (Fouts et al., 1988; Martini and Ursini, 1996). 6PGDH catalyses the conversion of 6-phosphogluconate to ribulose 5-phosphate in the presence of NADP, producing NADPH (Adams et al., 1983; Broedel and Wolf, 1990). Similar split protein pairs were detected in Volvox carteri, Chlamydomonas reinhardtii, and Ostreococcus lucimarinus (Table 2).

SAFE analysis identified one fusion event in the diatoms Phaeodactylum tricornutum (XP_002177987) and Thalassiosira pseudonana (EED92326.1, Table 2). The domain arrangement of the fused diatom protein TIM/GAPDH was found to correspond to the domain arrangement of the split protein pairs in the green algae Volvox carteri, Chlamydomonas reinhardtii and Chlorella variabilis, and in the red alga Cyanidioschyzon merolae (Table 2). The C. merolae proteins BAC67674.1 and BAC67669.1 are annotated as triosephosphate isomerase (TIM) (IPR000652) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) (IPR006424), respectively (Figure 1E). These enzymes are implicated in successive steps of glycolysis, the major carbohydrate metabolic pathway in eukaryotes (Fothergill-Gilmore, 1986). TIM catalyzes the interconversion of D-glyceraldehyde 3-phosphate (G3P) and dihydroxyacetone phosphate (DHAP) (Bloom and Topper, 1956; Jogl et al., 2003). TIM is active as a homodimer (Alber et al., 1981; Lolis et al., 1990), with a notable exception in archaebacteria, where it is active as a tetramer (Kohlhoff et al., 1996). GAPDH, on the other hand, catalyzes the sixth step of the glycolytic pathway, which is the conversion of G3P to 1,3-diphosphoglycerate (Dugaiczyk et al., 1983; Martin et al., 1993). All known active GAPDH enzymes are homotetramers (Banner et al., 1975; Skarzynski and Wonacott, 1988).

3D structures of identified component proteins and complexes

Homology modelling was employed to predict the three-dimensional structure of the identified component proteins or complexes for all the fusion and fission events, and the resulting 3D structures are shown in Figure 2. The evolutionary history of the identified gene fusion/fission events was also investigated. The ultimate goal of this analysis was to determine whether those events are due to gene fusion or fission. To this end, the conservation of both the fused protein and the individual component proteins across the main eukaryotic and prokaryotic taxonomic divisions was examined (Figure 5). Based both on experimental evidence and on the position of the "reference" organism within the species tree in Figure 5, an event was assigned as either fusion or fission (i.e. if a protein was found to be split in a single taxon and composite in the other taxonomic groups, then this protein was regarded as the product of a fission event).
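The assignment rule stated above can be written down as a small decision function. The sketch below is only an illustration: the fission branch follows the rule in the text, while the mirrored fusion branch (composite in a single taxon, split elsewhere) is an assumption added for symmetry. The function name `classify_event`, the state labels and the toy profiles are hypothetical and only loosely modelled on the events described in this section; the experimental evidence that the authors also considered is not represented.

```python
def classify_event(profile, reference_taxon):
    """Classify an event from a phylogenetic profile.

    profile maps a taxon name to 'composite', 'split' or 'absent'.
    Taxa marked 'absent' are ignored.
    """
    split = sorted(t for t, state in profile.items() if state == "split")
    composite = sorted(t for t, state in profile.items() if state == "composite")

    # Rule from the text: split in a single taxon, composite elsewhere.
    if split == [reference_taxon] and composite:
        return "fission"
    # Mirrored rule (assumption): composite in a single taxon, split elsewhere.
    if composite == [reference_taxon] and split:
        return "fusion"
    return "undetermined"


# Toy profiles, for illustration only.
cox2_profile = {
    "Chlamydomonadales": "split",
    "Rhodophyta": "composite",
    "Mamiellophyceae": "composite",
    "diatoms": "composite",
}
mannosidase_profile = {
    "Volvox carteri": "composite",
    "other green algae": "split",
    "diatoms": "split",
    "red algae": "absent",
}

print(classify_event(cox2_profile, "Chlamydomonadales"))
print(classify_event(mannosidase_profile, "Volvox carteri"))
```

A profile in which several taxa are split and several are composite is returned as "undetermined", since the single-taxon rule alone cannot settle the direction of the event.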

Putative fusion of alpha 1,2 mannosidase and Fra10Ac1 homologs in Volvox carteri

While the crystal structure of alpha 1,2 mannosidase is available for species such as Saccharomyces cerevisiae and 3D structure prediction studies have been performed for other organisms, the three-dimensional structure of Fra10Ac1 is not yet available.

We have produced a model of alpha 1,2 mannosidase complexed with Fra10Ac1, based on the high structural similarity of the former protein to the crystal structure of adenylylsulfate reductase from Desulfovibrio gigas (RCSB entry: 3GYX). Specifically, the crystal structure of the latter protein consists of six copies of a heterodimer that is made up of a large α-helical barrel-like component and a smaller molecule in an extended coil and β-sheet conformation that wraps around the larger component. After structurally superposing the main subunit of adenylylsulfate reductase on alpha 1,2 mannosidase, a model of the smaller extended molecule was prepared using the sequence of Fra10Ac1 (Figure 2A, left). Our results suggest that alpha 1,2 mannosidase and Fra10Ac1 may be functionally associated in the species where they were detected as heterodimers. The complex was subjected to exhaustive molecular dynamics simulations for a total of 20 nanoseconds. The explicitly solvated, periodic molecular system quickly reached equilibrium and remained there for the remainder of the simulation time, indicating that the protein complex (Figure 2A, right) was stable.

Putative fusion of DOT1 and riboflavin synthase in Volvox carteri

We suggest that the proteins DOT1 and riboflavin synthase (Figure 2B, left) may interact in species where they were found to be split. Although there is as yet no evidence to support interactions between these two proteins, the results from our homology modelling study indicate that the homotrimer complex of riboflavin synthase creates a concave surface in a three-way asymmetrical conformation among the three monomers, which bears just enough space to accommodate a single molecule of human DOT1. Our docking results revealed a multiple coil-coil interaction pattern between the trimeric riboflavin synthase complex and the human DOT1 molecule (Figure 2B, right) that is supported by numerous hydrophobic interactions.

Putative fission proteins COX2A and COX2B in Volvox carteri and Chlamydomonas reinhardtii

The three-dimensional homology model of the COX2A and COX2B complex is shown in Figure 2C, left. Electrostatic potential surfaces were calculated in order to analyze and compare the charge distribution of the produced 3D model with that of its template structure (Figure 3). The two complexes exhibited almost identical electrostatic surfaces, sharing common features that were not disturbed by the addition of the two extra α-helices in the homology model. There is a hydrophobic, uncharged region in the mid-section of the two complexes (depicted by white boxes) that is vital to their function and has been conserved, despite the addition of the insert structures. This observation supports the validity of the model, which was found to share a similar electrostatic surface, of almost identical intensity, with its X-ray determined template structure.

An intriguing finding came to light after further bioinformatics investigation into the fusion site. Sequence alignment of the two microalgal component proteins against their chosen templates revealed an insert of 64 amino acids at the fusion site. More specifically, there are 21 amino acids prior to and 43 amino acids posterior to the fusion site. Because the template structures provide no coordinates for that insert, the 64 amino acid sequence was searched with BLAST against the full PDB database. Strikingly, the structure of a protein fragment from the marine bacterium Thermotoga maritima was identified for the 43-residue fragment right after the fusion site. In particular, the insert appeared in chain A of the crystal structure of a trigger factor chaperone with promiscuous substrate recognition in folding and assembly from Thermotoga maritima (RCSB entry: 3GU0).

Through careful investigation and a series of molecular dynamics simulations on homology-built models bearing or lacking the insert structure from the bacterium, it was concluded that the insert structure is vital to the optimal folding of the algal protein and consequently to the survival of the algal species. In silico 3D modelling studies demonstrate that when the parent protein was split into two component proteins, the residues adjacent to the fusion site on both sides acquired an extended coil conformation that is highly exposed to the solvent. The component proteins bearing the bacterial insert would not have been able to acquire a stable structure without it, as the solvent-exposed coils are very unstable. Both bacterial inserts consist of a small coil conformation, which eventually leads to structurally robust α-helices. Molecular dynamics simulations of the component protein missing the insert α-helical structures led to the conclusion that the exposed coil has too many degrees of freedom and renders the whole molecular system rather unstable. Notably, the molecular system, which was subjected to a five-nanosecond molecular dynamics simulation, never reached equilibrium. In contrast, the homology model of the same protein bearing the bacterial insert structures that end in an α-helical conformation quickly reached equilibrium (after approximately 150 ps) and remained there for the rest of the simulation time. Judging by structural features, the bacterial α-helical insert, as well as the 21 amino acid α-helical fragment, fits nicely into its environment by joining in a multiple α-helix bundle conformation, next to the pack of α-helices already present in the core of the protein. The smaller, 21 amino acid long insert was modelled as an α-helix, as suggested by secondary structure prediction algorithms.

There is more evidence that lends support to our horizontal gene transfer (HGT) hypothesis of the protein insert from microalgae to Thermotoga maritima. The sequence identity and similarity between the insert and the bacterial protein are 37 and 59 percent, respectively. The bacterium Thermotoga maritima is a hyperthermophilic organism that inhabits the sediments of marine geothermal areas such as hot springs and hydrothermal vents. The ideal environmental water temperature for the bacterium is around 80 °C. Currently, Thermotoga maritima is the only known bacterial species capable of surviving at such high temperatures. Importantly, Thermotoga maritima inhabits the same environment as the algal species under study. Algae and members of the Archaea have long been known to live in such hostile environments. For many years, it has also been suggested that Thermotoga maritima is a very ancient organism. This is due firstly to its hyperthermophilic abilities and secondly to its unique deep lineage, based on phylogenetic analysis of its ribosomal RNA. Therefore, we speculate that both algae and Thermotoga maritima had the evolutionary time required for such gene transfers. Moreover, both live in extreme environments, which are well known to accelerate evolution. Finally, it is quite common for bacteria to integrate genes from neighbouring organisms.
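Percent identity and similarity figures such as those quoted above are computed from a pairwise alignment. The following Python sketch shows one common way to do this, assuming two already aligned sequences of equal length, with gapped columns excluded from the denominator. The alignment, the denominator convention and the `SIMILAR_GROUPS` residue classes are assumptions for illustration; the source does not state which scoring scheme was used.

```python
# Hypothetical conservative residue classes, used only for this illustration.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def _ungapped_columns(aligned_a, aligned_b, gap="-"):
    """Return the alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]


def percent_identity(aligned_a, aligned_b):
    """Percentage of ungapped columns with identical residues."""
    columns = _ungapped_columns(aligned_a, aligned_b)
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def percent_similarity(aligned_a, aligned_b):
    """Percentage of ungapped columns that are identical or in the same class."""
    columns = _ungapped_columns(aligned_a, aligned_b)
    if not columns:
        return 0.0

    def similar(x, y):
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)

    return 100.0 * sum(similar(x, y) for x, y in columns) / len(columns)


# Toy alignment, not taken from the study.
a = "MKV-LSD"
b = "MRVALTE"
print(percent_identity(a, b), percent_similarity(a, b))
```

Similarity is always at least as high as identity under this definition, because every identical column also counts as similar.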

In particular, a closer look at the composition of its genome led to the striking observation that more than 24% of its full genome is identical to that of members of the Archaea. This is the highest genome overlap ever observed in any bacterial species. In conclusion, our findings suggest that horizontal gene transfer between Thermotoga maritima and Archaea or other neighbouring species may have helped this bacterium to survive in high-temperature water. Gene transfer and internalization of exogenous genetic material between species is highly promoted by the abundant energy in their environment and the constant evolutionary push.

Putative fusion of G6PDH and 6PGDH in Phaeodactylum tricornutum

The homology model for the complex of proteins G6PDH and 6PGDH from the predicted fusion event in Phaeodactylum tricornutum proved quite stable upon molecular dynamics analysis. There is a set of polar residues in an extended coil conformation on both enzymes that aid the establishment of strong electrostatic interactions.

Putative fusion of TIM and GAPDH in diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana

For the Thalassiosira pseudonana fusion protein, the crystal structure of rabbit muscle triosephosphate isomerase (RCSB entry: 1R2R) and the crystal structure of the A4 isoform of photosynthetic glyceraldehyde-3-phosphate dehydrogenase complexed with NAD (RCSB entry: 1NBO) were used as templates for TIM and GAPDH, respectively (Figure 2E).

Putative fusion of alpha 1,2 mannosidase and Fra10Ac1 homologs in Volvox carteri

Evolutionary analysis of the fused protein revealed that it is present as a single composite only in Volvox and as a heterodimer in green algae and diatoms, whereas only an alpha 1,2 mannosidase homolog was detected in the red alga C. merolae (Figure 4A). Orthologs of the two component proteins were not found in eubacteria and archaea (Figure 5).

The identified fusions of the genes alpha 1,2 mannosidase and Fra10Ac1, as well as DOT1 and riboflavin synthase, were unique to Volvox carteri. This microalga, after its divergence from its unicellular relatives 200 million years ago, has evolved into a highly complex multicellular organism, in which a number of developmental changes have taken place. It has been suggested that in the case of metazoa (e.g. Cnidaria) (Putnam et al., 2007), novel protein domains and/or combinations of domains contributed to the transition from unicellularity to multicellularity. Therefore, it is intriguing to speculate that the identified fusion events, which resulted in two novel fused proteins in Volvox, could have contributed to the multicellularity of this organism. Notably, this does not necessarily define the boundaries between unicellular and multicellular organisms. Despite the lack of evidence to support either direct or indirect association between the two pairs of component proteins, we were able to identify conserved protein interaction sites by homology modelling.

Putative fusion of DOT1 and riboflavin synthase in Volvox carteri

Investigation of the evolutionary fate of the second fused Volvox protein revealed that it

is present as a heterodimer in the Chlorophyceae, Mamiellophyceae and diatoms,

whereas only a riboflavin synthase homolog was detected in the trebouxiophyte C. variabilis and the rhodophyte C. merolae (Figure 4B). Despite thorough database searches, the fused protein was found to exist as a single composite only in Volvox (Figure 5).

Putative fission proteins COX2A and COX2B in Volvox carteri and Chlamydomonas reinhardtii

The heterodimeric protein identified in V. carteri and C. reinhardtii is present as a single

composite protein in Rhodophyta, Chlorophyceae, Mamiellophyceae and diatoms (Figure

4C). Despite thorough searches across diverse eukaryotic and prokaryotic taxonomic

groups, these two proteins were detected as heterodimers only in the algae of the order

Chlamydomonadales (Figure 5). It has been suggested that over the course of evolution,

the gene cox2 was split into two mitochondrial genes in Chlamydomonadales which were

later transferred to the nucleus (Perez-Martinez et al., 2001). Based both on our

evolutionary analysis and previous findings (Perez-Martinez et al., 2001), we propose that

the cox2 division took place after the divergence of Chlamydomonadales from the other

orders of Chlorophyta.

Putative fusion of G6PDH and 6PGDH in Phaeodactylum tricornutum

The fused protein identified in P. tricornutum is present as a heterodimer in the centric

diatom T. pseudonana, and also in Chlorophyceae, Mamiellophyceae and Rhodophyta, whereas only a G6PDH ortholog was detected in Trebouxiophyceae (Figure 4D); this is probably due to incomplete genomic studies.

Upon examination of the evolutionary fate of this fusion event across eukaryotes and prokaryotes, we observed that the fused protein G6PDH/6PGDH is species-specific, since it was found exclusively in P. tricornutum (Figure 5). It is tempting to hypothesize that there was evolutionary pressure for the G6PDH and 6PGDH genes to fuse during the course of evolution. This fusion event might have taken place in order to decrease the metabolic load in the Phaeodactylum cell. We propose that the G6PDH/6PGDH fusion occurred after the divergence of the pennate diatoms (P. tricornutum) from the centric diatoms (T. pseudonana), less than approximately 90 million years ago (Sims et al., 2006).

Putative fusion of TIM and GAPDH in the diatoms Phaeodactylum tricornutum and

Thalassiosira pseudonana

Investigation into the evolutionary fate of the fusion event between TIM and GAPDH by sequence analysis revealed that the fused protein identified in the diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana is present as a heterodimer in red algae and Chlorella, whereas only a TIM ortholog was detected in Mamiellophyceae (Figure 3E); this is probably due to incomplete genomic studies. Orthologs of the TIM/GAPDH fusion protein were also found in the photosynthetic brown algae and the non-photosynthetic oomycetes (Figure 5), which belong, together with the

photosynthetic diatoms Phaeodactylum and Thalassiosira, to the stramenopiles, a heterogeneous group of heterokonts. We propose that the TIM and GAPDH fusion may have taken place after the secondary endosymbiosis (Gray, 1999; Falkowski et al., 2004), since the TIM/GAPDH fused protein was found to be split in green algae and red algae.

CONCLUSIONS

Features that make microalgae attractive are their CO2 abatement capacity, their ability to grow at various CO2 concentrations, the ability of some of them to grow phototrophically or heterotrophically, their rapid growth and scalability, and the ease of genetic manipulation for introducing genes of interest into the nuclear, chloroplast or mitochondrial genome.

However, the molecular biology of microalgae has not yet been fully explored. We argue that the implementation of novel bioinformatic techniques will help to elucidate microalgal molecular mechanisms, with implications for their exploitability. The present virtual protein interactomics study applied such novel bioinformatics methods to the proteomes of seven diverse microalgal organisms, including green algae (Volvox carteri, Chlamydomonas reinhardtii, Chlorella variabilis, Ostreococcus lucimarinus), red algae (Cyanidioschyzon merolae) and diatoms (Phaeodactylum tricornutum and Thalassiosira pseudonana). Overall, we identified five fusion and fission events, thereby obtaining important information on putative novel protein interactions. Interestingly, three of the five events are involved in metabolic pathways. Moreover, by employing homology modelling we predicted the three-dimensional structures of the identified component proteins or complexes. Comparative analysis of the evolutionary fate of the fusion and fission events allowed us to propose hypotheses regarding the timing of these events. We also identified an incident of horizontal gene transfer in the bacterium Thermotoga maritima. There is an urgent need for an in-depth understanding of the molecular mechanisms of microalgal species for practical applications and the translation of this

knowledge into anthropogenic as well as complex natural ecosystems which are on the verge of imbalance. The solution to this key challenge of the post-genomic era can be envisioned through the application of systems biology approaches that enrich knowledge in the microalgal field. The gap between virtual interactomics and structural bioinformatics on the one hand and experimental findings in microalgae on the other can be bridged through high-throughput profiling data and in silico modelling. Integrating these with observations at the cellular scale would extend our understanding of microalgal species beyond the analysis of experimental observations alone. Bioinformatic methods used in conjunction with classical experimental procedures can deepen insight into the underpinnings of microalgal molecular functions, with an immediate effect on practical applications that have socio-economic facets, and can enable finer control of natural and anthropogenic terrestrial and space ecosystems.

MATERIALS AND METHODS

Proteome sequences retrieval

The complete proteome sequences of the seven species were obtained from the NCBI database (Sayers et al., 2009; Benson et al., 2009). These sequences were derived from a computer-automated pipeline and stored as preliminary data. The species names, taxonomic classification and proteome size are summarized in Table 1.

Identification of fusion events

The entire proteome of each of the seven organisms was compared against each of the

other 6 proteomes which were used as references by employing SAFE (Software for the

Analysis of Fusion Events), a computational platform for the automated detection,

filtering and visualization of fusion events (Tsagrasoulis et al., 2012). To reduce false

positives and ensure that we obtain reliable results, we used the following set of

parameter values:

The proteins of the same organism that shared more than 85% identity over their entire length were considered as duplicates and were removed from the subsequent steps of the analysis, to avoid redundancy.

Two component proteins in a query organism were considered to be fused in the proteome

of the reference organism only if they had a minimum domain length of 70 amino acids.

The minimum percentage of identity between two orthologous domains is set by default to

27% in SAFE. However, given that the 7 algae species under study are phylogenetically

closely related (e.g. V. carteri and C. reinhardtii, T. pseudonana and P. tricornutum), the

parameter value was set to 35%.

A pair of proteins was considered to be fused if the corresponding component protein

domains aligned with a minimum protein coverage of 70% to the composite protein

sequence in the reference organism.

To increase the robustness of our analysis, given the short evolutionary distance between the organisms under investigation, the threshold for the E-value was set at 10^-5 instead of the default 10^-3.
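The parameter set above can be read as a simple filter over candidate alignments. The following is a minimal sketch of that reading, not SAFE's actual implementation: the record layout, the field names and the function names are hypothetical, and the check that the two components map to non-overlapping regions of the composite is omitted for brevity.

```python
from dataclasses import dataclass

# Threshold values quoted in the text; all names below are illustrative only.
MIN_DOMAIN_LENGTH = 70   # amino acids
MIN_IDENTITY = 35.0      # percent (the text gives 27% as the SAFE default)
MIN_COVERAGE = 70.0      # percent coverage required in the alignment
MAX_EVALUE = 1e-5        # the text gives 1e-3 as the default


@dataclass
class ComponentHit:
    """Alignment of one component protein to a candidate composite protein."""
    component_id: str
    composite_id: str
    domain_length: int
    identity: float
    coverage: float
    evalue: float


def passes_thresholds(hit: ComponentHit) -> bool:
    """True when a single component alignment satisfies every threshold."""
    return (hit.domain_length >= MIN_DOMAIN_LENGTH
            and hit.identity >= MIN_IDENTITY
            and hit.coverage >= MIN_COVERAGE
            and hit.evalue <= MAX_EVALUE)


def is_candidate_fusion(hit_a: ComponentHit, hit_b: ComponentHit) -> bool:
    """Two distinct components that both align to the same composite and both pass."""
    return (hit_a.composite_id == hit_b.composite_id
            and hit_a.component_id != hit_b.component_id
            and passes_thresholds(hit_a)
            and passes_thresholds(hit_b))
```

Keeping the thresholds as named constants makes it easy to compare a run with the stricter values used here against a run with the default values.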

Verification of the predicted fusion events

The results of the automated analysis were subjected to further verification by manual

analysis:

The so-called ‘promiscuous’ domains which occur frequently in many otherwise unrelated

proteins, such as ATP-binding cassettes, actin binding domains, WD repeats and SH3

domains (Marcotte et al., 1999) were removed, to reduce errors.

All proteins (fused and heterodimeric) identified in our study were searched against

InterPro (which combines diverse information about protein families and domains from

multiple databases) (Hunter et al., 2011) for the full annotation of the individual protein

domains. The InterPro accession number for each protein domain is indicated in the text

by the three-letter code IPR followed by six digits.

The predicted reference fused protein was split into its component proteins and then

checked by reverse BLAST (Altschul et al., 1997) to assess whether these two proteins

returned the initial two query proteins as their best BLAST hit.
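The reverse BLAST check described above amounts to a reciprocal best-hit test. The sketch below illustrates the logic only and assumes the BLAST results have already been parsed into (subject identifier, bit score) pairs; the function names and the input layout are hypothetical and do not come from the original pipeline.

```python
def best_hit(hits):
    """Return the subject id with the highest bit score, or None if there are no hits."""
    if not hits:
        return None
    return max(hits, key=lambda hit: hit[1])[0]


def confirms_fusion(reverse_hits_by_part, query_pair):
    """Check that the two split parts recover the two original query proteins.

    reverse_hits_by_part maps each part of the split composite protein to a list
    of (subject_id, bit_score) tuples obtained by searching that part against the
    query proteome. query_pair holds the ids of the two original query proteins.
    """
    if len(reverse_hits_by_part) != 2:
        return False
    best = {best_hit(hits) for hits in reverse_hits_by_part.values()}
    return best == set(query_pair)
```

A part with no hits yields `None`, so the comparison fails and the predicted fusion is not confirmed.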

Examination of the evolutionary fate of the predicted fusion events

The distribution of the 5 fusion/fission events across the major eukaryotic and prokaryotic taxonomic divisions was investigated. To this end, we used both the composite protein sequences and the protein sequence pairs to search the available NCBI, UniProtKB (Magrane and Consortium, 2011) and Cyanidioschyzon merolae (Matsuzaki et al., 2004) databases for homologs by applying BLASTp (Altschul et al., 1997). The best BLAST hit within each taxon or taxonomic group was considered as the best candidate ortholog.
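Selecting the best BLAST hit within each taxon can be sketched as follows. This is an illustration under stated assumptions: the text does not say how "best" was ranked, so the sketch assumes the lowest E-value wins, with the higher bit score breaking ties. The tuple layout and the function name are hypothetical.

```python
def best_hit_per_taxon(hits):
    """Pick one candidate ortholog per taxon.

    hits is an iterable of (taxon, protein_id, evalue, bit_score) tuples.
    Assumed ranking: lowest E-value first, then highest bit score.
    """
    best = {}
    for taxon, protein_id, evalue, bit_score in hits:
        rank = (evalue, -bit_score)
        if taxon not in best or rank < best[taxon][0]:
            best[taxon] = (rank, protein_id)
    return {taxon: protein_id for taxon, (_, protein_id) in best.items()}
```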

The results of our search are categorized as follows:

o A single composite protein (fused protein) homolog was detected.

o The query protein was found split into two component protein domains in the reference

proteome (heterodimeric protein).

o A single reference protein homologous to either one of the two query component proteins

was detected. This is probably due to incomplete genomic studies.

o No protein homologs were found in the reference proteome (protein not available/missing

data). This is probably attributed to incomplete genomic studies or lack of data availability

in the genomic databases.
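The four categories listed above can be expressed as a small decision function. This is a sketch of one possible encoding: the enum and argument names are hypothetical, and the precedence (a detected composite homolog takes priority over component homologs) is an assumption rather than something stated in the text.

```python
from enum import Enum


class Outcome(Enum):
    FUSED = "single composite protein homolog"
    SPLIT = "two component proteins (heterodimeric)"
    SINGLE_COMPONENT = "homolog of only one component"
    MISSING = "no homolog found"


def classify(composite_found, component_a_found, component_b_found):
    """Map the search results for one reference proteome to one of the four categories."""
    if composite_found:
        return Outcome.FUSED
    if component_a_found and component_b_found:
        return Outcome.SPLIT
    if component_a_found or component_b_found:
        return Outcome.SINGLE_COMPONENT
    return Outcome.MISSING
```

One such label per taxon is what would then be attached to the corresponding leaf of the species tree.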

The results of this search were also mapped to the respective leaves of a species tree in order to trace the evolutionary fate of each of the fusion/fission events. The NCBI Taxonomy species tree was constructed using iTOL (Letunic and Bork, 2011) and visualized with Dendroscope (Huson et al., 2007).

Secondary Structure Prediction

Secondary structure predictions were performed using the NPS (Network Protein

Sequence Analysis) web-server.

Homology Modelling and Model Evaluation

The homology modelling of the five algal enzymes was carried out using the MOE (2004.03) package and its built-in homology modelling application. The produced models were initially evaluated within the MOE package by a residue packing quality function, which depends on the number of buried non-polar side chain groups and on hydrogen bonding. The sequence identity scores of all homology models discussed in this study were high enough to allow conventional homology modelling techniques to be used. The homology modelling method of MOE comprises the following steps. The first step is the initial partial geometry specification, where an initial partial geometry for each target sequence is copied from regions of one or more template chains. The second step is the insertions and deletions task, where residues that still have no assigned backbone coordinates are modelled. Those residues may be in loops (insertions in the model with respect to the template), they may be outgaps (residues in a model sequence which are aligned before the N-terminus or after the C-terminus of its template) or they may be deletions (regions where the template has an insertion with respect to the model). In this study, however, outgaps were not included in the homology modelling process. The third step is loop selection and side-chain packing, where a collection of independent models is created. The last step is final model selection and refinement, where the final models are

scored and ranked, after they have been stereochemically checked for persisting errors. Furthermore, the PROCHECK suite was employed to further evaluate the quality of each of the five algal enzyme models.

Molecular electrostatic potential (MEP)

Electrostatic potential surfaces were calculated by solving the nonlinear Poisson–Boltzmann equation using the finite difference method as implemented in the PyMOL software. The potential was calculated on a grid of 65 × 65 × 65 points and the ‘grid fill by solute’ parameter was set to 80%. The dielectric constants of the solvent and the solute were set to 80.0 and 2.0, respectively. An ionic exclusion radius of 2.0 Å, a solvent radius of 1.4 Å and a solvent ionic strength of 0.145 M were applied. AMBER99 charges and atomic radii were used for this calculation.
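For orientation, the nonlinear Poisson–Boltzmann equation referred to above is commonly written in the general form below. This is the textbook statement, given here only to show where the quoted parameters enter; it is not a transcription of the solver's internal formulation, and the symbols are introduced for illustration.

```latex
\nabla \cdot \left[ \varepsilon(\mathbf{r}) \, \nabla \phi(\mathbf{r}) \right]
  = -\rho^{f}(\mathbf{r})
    - \sum_{i} c_{i}^{\infty} \, z_{i} \, q \, \lambda(\mathbf{r})
      \exp\!\left( -\frac{z_{i} \, q \, \phi(\mathbf{r})}{k_{B} T} \right)
```

Here \phi is the electrostatic potential, \varepsilon(\mathbf{r}) is the position-dependent permittivity (corresponding to the dielectric constants of 2.0 for the solute and 80.0 for the solvent), \rho^{f} is the fixed charge density of the solute (derived from the AMBER99 charges), c_{i}^{\infty} and z_{i} are the bulk concentration and valence of ion species i (determined by the 0.145 M ionic strength), q is the elementary charge, k_{B} T is the thermal energy, and \lambda(\mathbf{r}) is an ion accessibility factor that vanishes inside the solute and within the ionic exclusion radius.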

Model Optimization

Energy minimisation was done in MOE (Molecular Operating Environment suite), initially using the AMBER99 force field as implemented in the same package, up to an RMS gradient of 0.0001 to remove the geometrical strain. The model was subsequently solvated with SPC water using a truncated octahedron box extending to 7 Å from the model, and molecular dynamics were then performed for 200 nanoseconds at 300 K and 1 atm with a 2 fs step size, using the NVT ensemble in a canonical environment. NVT stands for Number of atoms, Volume and Temperature, which remain constant throughout the calculation. The results of the molecular dynamics simulation were collected into a database by MOE and can be further analysed.
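As a quick check of scale, the simulation length and step size quoted above imply the number of integration steps computed below. This is plain arithmetic on the stated values; the function name is hypothetical and the snippet is unrelated to MOE's own interface.

```python
def md_step_count(total_ns: float, timestep_fs: float) -> int:
    """Number of integration steps needed to cover total_ns nanoseconds."""
    fs_per_ns = 1_000_000  # 1 ns = 10^6 fs
    return round(total_ns * fs_per_ns / timestep_fs)


# Values quoted in the text: 200 ns of dynamics with a 2 fs step size.
steps = md_step_count(200, 2)
```

With these values the run corresponds to 10^8 integration steps.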

Docking studies and protein-protein interactions

The docking studies amongst the various constructed models were executed using ZDOCK version 3.0. RDOCK was then used to minimize the ZDOCK complex outputs and re-rank them based on their re-estimated binding free energies. Following the docking experiments, all molecular systems were subjected to extensive energy minimisation up to a gradient G < 0.0001, using the CHARMM27 force field as implemented in the GROMACS 4.5.5 suite, via our in-house developed graphical interface (Sellis et al., 2009).

REFERENCES

Adams, M.J., Archibald, I.G., Bugg, C.E., Carne, A., Gover, S., Helliwell, J.R., Pickersgill,

R.W. and White, S.W. The three dimensional structure of sheep liver 6-

phosphogluconate dehydrogenase at 2.6 A resolution. EMBO J 2 (1983), pp. 1009-14.

Alber, T., Banner, D.W., Bloomer, A.C., Petsko, G.A., Phillips, D., Rivers, P.S. and Wilson,

I.A. On the three-dimensional structure and catalytic mechanism of triose phosphate

isomerase. Philos Trans R Soc Lond B Biol Sci 293 (1981), pp. 159-71.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman,

D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search

programs. Nucleic Acids Res 25 (1997), pp. 3389-402.

Armbrust, E.V. The life of diatoms in the world's oceans. Nature 459 (2009), pp. 185-92.

Armbrust, E.V., Berges, J.A., Bowler, C., Green, B.R., Martinez, D., Putnam, N.H., Zhou, S.,

Allen, A.E., Apt, K.E., Bechner, M., Brzezinski, M.A., Chaal, B.K., Chiovitti, A.,

Davis, A.K., Demarest, M.S., Detter, J.C., Glavina, T., Goodstein, D., Hadi, M.Z.,

Hellsten, U., Hildebrand, M., Jenkins, B.D., Jurka, J., Kapitonov, V.V., Kroger, N.,

Lau, W.W., Lane, T.W., Larimer, F.W., Lippmeier, J.C., Lucas, S., Medina, M.,

Montsant, A., Obornik, M., Parker, M.S., Palenik, B., Pazour, G.J., Richardson, P.M.,

Rynearson, T.A., Saito, M.A., Schwartz, D.C., Thamatrakoln, K., Valentin, K., Vardi,

A., Wilkerson, F.P. and Rokhsar, D.S. The genome of the diatom Thalassiosira

pseudonana: ecology, evolution, and metabolism. Science 306 (2004), pp. 79-86.

Banner, D.W., Bloomer, A.C., Petsko, G.A., Phillips, D.C., Pogson, C.I., Wilson, I.A.,

Corran, P.H., Furth, A.J., Milman, J.D., Offord, R.E., Priddle, J.D. and Waley, S.G.

Structure of chicken muscle triose phosphate isomerase determined

crystallographically at 2.5 angstrom resolution using amino acid sequence data.

Nature 255 (1975), pp. 609-14.

Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J. and Sayers, E.W. GenBank. Nucleic Acids Res 37 (2009), pp. D26-31.

Blanc, G., Duncan, G., Agarkova, I., Borodovsky, M., Gurnon, J., Kuo, A., Lindquist, E.,

Lucas, S., Pangilinan, J., Polle, J., Salamov, A., Terry, A., Yamada, T., Dunigan, D.D.,

Grigoriev, I.V., Claverie, J.M. and Van Etten, J.L. The Chlorella variabilis NC64A

genome reveals adaptation to photosymbiosis, coevolution with viruses, and cryptic

sex. Plant Cell 22 (2010), pp. 2943-55.

Bloom, B. and Topper, Y.J. Mechanism of action of aldolase and phosphotriose isomerase.

Science 124 (1956), pp. 982-3.

Bowler, C., Allen, A.E., Badger, J.H., Grimwood, J., Jabbari, K., Kuo, A., Maheswari, U.,

Martens, C., Maumus, F., Otillar, R.P., Rayko, E., Salamov, A., Vandepoele, K.,

Beszteri, B., Gruber, A., Heijde, M., Katinka, M., Mock, T., Valentin, K., Verret, F.,

Berges, J.A., Brownlee, C., Cadoret, J.P., Chiovitti, A., Choi, C.J., Coesel, S., De

Martino, A., Detter, J.C., Durkin, C., Falciatore, A., Fournet, J., Haruta, M., Huysman,

M.J., Jenkins, B.D., Jiroutova, K., Jorgensen, R.E., Joubert, Y., Kaplan, A., Kroger,

N., Kroth, P.G., La Roche, J., Lindquist, E., Lommer, M., Martin-Jezequel, V., Lopez,

P.J., Lucas, S., Mangogna, M., McGinnis, K., Medlin, L.K., Montsant, A., Oudot-Le

Secq, M.P., Napoli, C., Obornik, M., Parker, M.S., Petit, J.L., Porcel, B.M., Poulsen,

N., Robison, M., Rychlewski, L., Rynearson, T.A., Schmutz, J., Shapiro, H., Siaut,

M., Stanley, M., Sussman, M.R., Taylor, A.R., Vardi, A., von Dassow, P., Vyverman,

W., Willis, A., Wyrwicz, L.S., Rokhsar, D.S., Weissenbach, J., Armbrust, E.V., Green,

B.R., Van de Peer, Y. and Grigoriev, I.V. The Phaeodactylum genome reveals the

evolutionary history of diatom genomes. Nature 456 (2008), pp. 239-44.

Broedel, S.E., Jr. and Wolf, R.E., Jr. Genetic tagging, cloning, and DNA sequence of the

Synechococcus sp. strain PCC 7942 gene (gnd) encoding 6-phosphogluconate

dehydrogenase. J Bacteriol 172 (1990), pp. 4023-31.

Derelle, E., Ferraz, C., Rombauts, S., Rouze, P., Worden, A.Z., Robbens, S., Partensky, F.,

Degroeve, S., Echeynie, S., Cooke, R., Saeys, Y., Wuyts, J., Jabbari, K., Bowler, C.,

Panaud, O., Piegu, B., Ball, S.G., Ral, J.P., Bouget, F.Y., Piganeau, G., De Baets, B.,

Picard, A., Delseny, M., Demaille, J., Van de Peer, Y. and Moreau, H. Genome

analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique

features. Proc Natl Acad Sci U S A 103 (2006), pp. 11647-52.

Dimitriadis, D., Koumandou, V.L., Trimpalis, P. and Kossida, S. Protein functional links in

Trypanosoma brucei, identified by gene fusion analysis. BMC Evol Biol 11 (2011), p.

193.

Dugaiczyk, A., Haron, J.A., Stone, E.M., Dennison, O.E., Rothblum, K.N. and Schwartz, R.J.

Cloning and sequencing of a deoxyribonucleic acid copy of glyceraldehyde-3-

phosphate dehydrogenase messenger ribonucleic acid isolated from chicken muscle.

Biochemistry 22 (1983), pp. 1605-13.

Enright, A.J., Iliopoulos, I., Kyrpides, N.C. and Ouzounis, C.A. Protein interaction maps for

complete genomes based on gene fusion events. Nature 402 (1999), pp. 86-90.

Enright, A.J. and Ouzounis, C.A. Functional associations of proteins in entire genomes by

means of exhaustive detection of gene fusions. Genome Biol 2 (2001), p.

RESEARCH0034.

Ewing, R.M., Chu, P., Elisma, F., Li, H., Taylor, P., Climie, S., McBroom-Cerajewski, L.,

Robinson, M.D., O'Connor, L., Li, M., Taylor, R., Dharsee, M., Ho, Y., Heilbut, A.,

Moore, L., Zhang, S., Ornatsky, O., Bukhman, Y.V., Ethier, M., Sheng, Y., Vasilescu,

J., Abu-Farha, M., Lambert, J.P., Duewel, H.S., Stewart, II, Kuehl, B., Hogue, K.,

Colwill, K., Gladwish, K., Muskat, B., Kinach, R., Adams, S.L., Moran, M.F., Morin,

G.B., Topaloglou, T. and Figeys, D. Large-scale mapping of human protein-protein

interactions by mass spectrometry. Mol Syst Biol 3 (2007), p. 89.

Falkowski, P.G., Katz, M.E., Knoll, A.H., Quigg, A., Raven, J.A., Schofield, O. and Taylor,

F.J. The evolution of modern eukaryotic phytoplankton. Science 305 (2004), pp. 354-

60.

Feng, Q., Wang, H., Ng, H.H., Erdjument-Bromage, H., Tempst, P., Struhl, K. and Zhang, Y.

Methylation of H3-lysine 79 is mediated by a new family of HMTases without a SET

domain. Curr Biol 12 (2002), pp. 1052-8.

Field, C.B., Behrenfeld, M.J., Randerson, J.T. and Falkowski, P. Primary production of the

biosphere: integrating terrestrial and oceanic components. Science 281 (1998), pp.

237-40.

Fields, S. and Song, O. A novel genetic system to detect protein-protein interactions. Nature

340 (1989), pp. 245-6.

Fothergill-Gilmore, L.A. The evolution of the glycolytic pathway. Trends Biochem Sci 11

(1986), pp. 47-51.

Fouts, D., Ganguly, R., Gutierrez, A.G., Lucchesi, J.C. and Manning, J.E. Nucleotide

sequence of the Drosophila glucose-6-phosphate dehydrogenase gene and comparison

with the homologous human gene. Gene 63 (1988), pp. 261-75.

Gray, M.W. Evolution of organellar genomes. Curr Opin Genet Dev 9 (1999), pp. 678-87.

Grossman, A.R. Paths toward algal genomics. Plant Physiol 137 (2005), pp. 410-27.

Herron, M.D., Hackett, J.D., Aylward, F.O. and Michod, R.E. Triassic origin and early

radiation of multicellular volvocine algae. Proc Natl Acad Sci U S A 106 (2009), pp.

3254-8.

Hunter, S., Jones, P., Mitchell, A., Apweiler, R., Attwood, T.K., Bateman, A., Bernard, T.,

Binns, D., Bork, P., Burge, S., de Castro, E., Coggill, P., Corbett, M., Das, U.,

Daugherty, L., Duquenne, L., Finn, R.D., Fraser, M., Gough, J., Haft, D., Hulo, N.,

Kahn, D., Kelly, E., Letunic, I., Lonsdale, D., Lopez, R., Madera, M., Maslen, J.,

McAnulla, C., McDowall, J., McMenamin, C., Mi, H., Mutowo-Muellenet, P.,

Mulder, N., Natale, D., Orengo, C., Pesseat, S., Punta, M., Quinn, A.F., Rivoire, C.,

Sangrador-Vegas, A., Selengut, J.D., Sigrist, C.J., Scheremetjew, M., Tate, J.,

Thimmajanarthanan, M., Thomas, P.D., Wu, C.H., Yeats, C. and Yong, S.Y. InterPro

in 2011: new developments in the family and domain prediction database. Nucleic

Acids Res 40 (2011), pp. D306-12.

Huson, D.H., Richter, D.C., Rausch, C., Dezulian, T., Franz, M. and Rupp, R. Dendroscope:

An interactive viewer for large phylogenetic trees. BMC Bioinformatics 8 (2007), p.

460.

Jogl, G., Rozovsky, S., McDermott, A.E. and Tong, L. Optimal alignment for enzymatic

proton transfer: structure of the Michaelis complex of triosephosphate isomerase at

1.2-A resolution. Proc Natl Acad Sci U S A 100 (2003), pp. 50-5.

Kohlhoff, M., Dahm, A. and Hensel, R. Tetrameric triosephosphate isomerase from

hyperthermophilic Archaea. FEBS Lett 383 (1996), pp. 245-50.

Kummerfeld, S.K. and Teichmann, S.A. Relative rates of gene fusion and fission in multi-

domain proteins. Trends Genet 21 (2005), pp. 25-30.

Kuroiwa, T. The primitive red algae Cyanidium caldarium and Cyanidioschyzon merolae as

model system for investigating the dividing apparatus of mitochondria and plastids.

BioEssays 20 (1998), pp. 344–354.

Lal, A., Schutzbach, J.S., Forsee, W.T., Neame, P.J. and Moremen, K.W. Isolation and

expression of murine and rabbit cDNAs encoding an alpha 1,2-mannosidase involved

in the processing of asparagine-linked oligosaccharides. J Biol Chem 269 (1994), pp.

9872-81.

Letunic, I. and Bork, P. Interactive Tree Of Life v2: online annotation and display of

phylogenetic trees made easy. Nucleic Acids Res 39 (2011), pp. W475-8.

Lolis, E., Alber, T., Davenport, R.C., Rose, D., Hartman, F.C. and Petsko, G.A. Structure of

yeast triosephosphate isomerase at 1.9-A resolution. Biochemistry 29 (1990), pp.

6609-18.

Magrane, M. and Consortium, U. UniProt Knowledgebase: a hub of integrated protein data.

Database (Oxford) 2011 (2011), p. bar009.

Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, T.O. and Eisenberg, D.

Detecting protein function and protein-protein interactions from genome sequences.

Science 285 (1999), pp. 751-3.

Martin, W., Brinkmann, H., Savonna, C. and Cerff, R. Evidence for a chimeric nature of

nuclear genomes: eubacterial origin of eukaryotic glyceraldehyde-3-phosphate

dehydrogenase genes. Proc Natl Acad Sci U S A 90 (1993), pp. 8692-6.

Martini, G. and Ursini, M.V. A new lease of life for an old enzyme. Bioessays 18 (1996), pp.

631-7.

Matsuzaki, M., Misumi, O., Shin, I.T., Maruyama, S., Takahara, M., Miyagishima, S.Y.,

Mori, T., Nishida, K., Yagisawa, F., Yoshida, Y., Nishimura, Y., Nakao, S., Kobayashi,

T., Momoyama, Y., Higashiyama, T., Minoda, A., Sano, M., Nomoto, H., Oishi, K.,

Hayashi, H., Ohta, F., Nishizaka, S., Haga, S., Miura, S., Morishita, T., Kabeya, Y.,

Terasawa, K., Suzuki, Y., Ishii, Y., Asakawa, S., Takano, H., Ohta, N., Kuroiwa, H.,

Tanaka, K., Shimizu, N., Sugano, S., Sato, N., Nozaki, H., Ogasawara, N., Kohara, Y.

and Kuroiwa, T. Genome sequence of the ultrasmall unicellular red alga

Cyanidioschyzon merolae 10D. Nature 428 (2004), pp. 653-7.

Merchant, S.S., Prochnik, S.E., Vallon, O., Harris, E.H., Karpowicz, S.J., Witman, G.B.,

Terry, A., Salamov, A., Fritz-Laylin, L.K., Marechal-Drouard, L., Marshall, W.F., Qu,

L.H., Nelson, D.R., Sanderfoot, A.A., Spalding, M.H., Kapitonov, V.V., Ren, Q.,

Ferris, P., Lindquist, E., Shapiro, H., Lucas, S.M., Grimwood, J., Schmutz, J., Cardol,

P., Cerutti, H., Chanfreau, G., Chen, C.L., Cognat, V., Croft, M.T., Dent, R., Dutcher,

S., Fernandez, E., Fukuzawa, H., Gonzalez-Ballester, D., Gonzalez-Halphen, D.,

Hallmann, A., Hanikenne, M., Hippler, M., Inwood, W., Jabbari, K., Kalanon, M.,

Kuras, R., Lefebvre, P.A., Lemaire, S.D., Lobanov, A.V., Lohr, M., Manuell, A.,

Meier, I., Mets, L., Mittag, M., Mittelmeier, T., Moroney, J.V., Moseley, J., Napoli, C.,

Nedelcu, A.M., Niyogi, K., Novoselov, S.V., Paulsen, I.T., Pazour, G., Purton, S., Ral,

J.P., Riano-Pachon, D.M., Riekhof, W., Rymarquis, L., Schroda, M., Stern, D., Umen,

J., Willows, R., Wilson, N., Zimmer, S.L., Allmer, J., Balk, J., Bisova, K., Chen, C.J.,

Elias, M., Gendler, K., Hauser, C., Lamb, M.R., Ledford, H., Long, J.C., Minagawa,

J., Page, M.D., Pan, J., Pootakham, W., Roje, S., Rose, A., Stahlberg, E., Terauchi,

A.M., Yang, P., Ball, S., Bowler, C., Dieckmann, C.L., Gladyshev, V.N., Green, P.,

Jorgensen, R., Mayfield, S., Mueller-Roeber, B., Rajamani, S., Sayre, R.T., Brokstein,

P., et al. The Chlamydomonas genome reveals the evolution of key animal and plant

functions. Science 318 (2007), pp. 245-50.

Muramoto, K., Ohta, K., Shinzawa-Itoh, K., Kanda, K., Taniguchi, M., Nabekura, H.,

Yamashita, E., Tsukihara, T. and Yoshikawa, S. Bovine cytochrome c oxidase

structures enable O2 reduction with minimization of reactive oxygens and provide a

proton-pumping gate. Proc Natl Acad Sci U S A 107 (2010), pp. 7740-5.

Ndimba, B.K., Ndimba, R.J., Johnson, T.S., Waditee-Sirisattha, R., Baba, M., Sirisattha, S., Shiraiwa, Y., Agrawal, G.K. and Rakwal, R. Biofuels as a sustainable energy source: An update of the applications of proteomics in bioenergy crops and algae. J Proteomics (2013), S1874-3919(13)00332-1.

Ostermeier, C., Harrenga, A., Ermler, U. and Michel, H. Structure at 2.7 A resolution of the

Paracoccus denitrificans two-subunit cytochrome c oxidase complexed with an

antibody FV fragment. Proc Natl Acad Sci U S A 94 (1997), pp. 10547-53.

Palenik, B., Grimwood, J., Aerts, A., Rouze, P., Salamov, A., Putnam, N., Dupont, C.,

Jorgensen, R., Derelle, E., Rombauts, S., Zhou, K., Otillar, R., Merchant, S.S., Podell,

S., Gaasterland, T., Napoli, C., Gendler, K., Manuell, A., Tai, V., Vallon, O., Piganeau,

G., Jancek, S., Heijde, M., Jabbari, K., Bowler, C., Lohr, M., Robbens, S., Werner, G.,

Dubchak, I., Pazour, G.J., Ren, Q., Paulsen, I., Delwiche, C., Schmutz, J., Rokhsar,

D., Van de Peer, Y., Moreau, H. and Grigoriev, I.V. The tiny eukaryote Ostreococcus

provides genomic insights into the paradox of plankton speciation. Proc Natl Acad Sci

U S A 104 (2007), pp. 7705-10.

Parkinson, J. and Gordon, R. Beyond micromachining: the potential of diatoms. Trends

Biotechnol 17 (1999), pp. 190-6.

Perez-Martinez, X., Antaramian, A., Vazquez-Acevedo, M., Funes, S., Tolkunova, E.,

d'Alayer, J., Claros, M.G., Davidson, E., King, M.P. and Gonzalez-Halphen, D.

Subunit II of cytochrome c oxidase in Chlamydomonad algae is a heterodimer

encoded by two independent nuclear genes. J Biol Chem 276 (2001), pp. 11302-9.

Phizicky, E.M. and Fields, S. Protein-protein interactions: methods for detection and analysis.

Microbiol Rev 59 (1995), pp. 94-123.

Prochnik, S.E., Umen, J., Nedelcu, A.M., Hallmann, A., Miller, S.M., Nishii, I., Ferris, P.,

Kuo, A., Mitros, T., Fritz-Laylin, L.K., Hellsten, U., Chapman, J., Simakov, O.,

Rensing, S.A., Terry, A., Pangilinan, J., Kapitonov, V., Jurka, J., Salamov, A., Shapiro,

H., Schmutz, J., Grimwood, J., Lindquist, E., Lucas, S., Grigoriev, I.V., Schmitt, R.,

Kirk, D. and Rokhsar, D.S. Genomic analysis of organismal complexity in the

multicellular green alga Volvox carteri. Science 329 (2010), pp. 223-6.

Putnam, N.H., Srivastava, M., Hellsten, U., Dirks, B., Chapman, J., Salamov, A., Terry, A.,

Shapiro, H., Lindquist, E., Kapitonov, V.V., Jurka, J., Genikhovich, G., Grigoriev, I.V.,

Lucas, S.M., Steele, R.E., Finnerty, J.R., Technau, U., Martindale, M.Q. and Rokhsar,

D.S. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic

organization. Science 317 (2007), pp. 86-94.

Sarafidou, T., Kahl, C., Martinez-Garay, I., Mangelsdorf, M., Gesk, S., Baker, E., Kokkinaki,

M., Talley, P., Maltby, E.L., French, L., Harder, L., Hinzmann, B., Nobile, C.,

Richkind, K., Finnis, M., Deloukas, P., Sutherland, G.R., Kutsche, K., Moschonas,

N.K., Siebert, R. and Gecz, J. Folate-sensitive fragile site FRA10A is due to an

expansion of a CGG repeat in a novel gene, FRA10AC1, encoding a nuclear protein.

Genomics 84 (2004), pp. 69-81.

Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM,

DiCuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y,

Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Mizrachi I, Ostell J,

Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Shumway M, Sirotkin K, Souvorov A,

Starchenko G, Tatusova TA, Wagner L, Yaschenko E, Ye J. Database resources

of the National Center for Biotechnology Information. Nucleic Acids Res 37 (2009), pp. D5-15.

Sellis, D., Vlachakis, D. and Vlassi, M. Gromita: a fully integrated graphical user interface to

gromacs 4. Bioinform Biol Insights 3 (2009), pp. 99-102.

Sims, P., Mann, D., and Medlin, L. Evolution of the diatoms: insights from fossil, biological

and molecular data. Phycologia 45 (2006), pp. 361–402.

Singer, M.S., Kahana, A., Wolf, A.J., Meisinger, L.L., Peterson, S.E., Goggin, C., Mahowald,

M. and Gottschling, D.E. Identification of high-copy disruptors of telomeric silencing

in Saccharomyces cerevisiae. Genetics 150 (1998), pp. 613-32.

Skarzynski, T. and Wonacott, A.J. Coenzyme-induced conformational changes in

glyceraldehyde-3-phosphate dehydrogenase from Bacillus stearothermophilus. J Mol

Biol 203 (1988), pp. 1097-118.

Snel, B., Bork, P. and Huynen, M. Genome evolution. Gene fusion versus gene fission.

Trends Genet 16 (2000), pp. 9-11.

Takeda, H. Classification of Chlorella strains by cell wall sugar composition. Phytochemistry

27 (1988), pp. 3823–3826.

Tsagrasoulis, D., Danos, V., Kissa, M., Trimpalis, P., Koumandou, V.L., Karagouni, A.D.,

Tsakalidis, A. and Kossida, S. SAFE Software and FED Database to Uncover Protein-

Protein Interactions using Gene Fusion Analysis. Evol Bioinform Online 8 (2012), pp.

47-60.

Tsukihara, T., Aoyama, H., Yamashita, E., Tomizaki, T., Yamaguchi, H., Shinzawa-Itoh, K.,

Nakashima, R., Yaono, R. and Yoshikawa, S. The whole structure of the 13-subunit

oxidized cytochrome c oxidase at 2.8 Å. Science 272 (1996), pp. 1136-44.

van Leeuwen, F., Gafken, P.R. and Gottschling, D.E. Dot1p modulates silencing in yeast by

methylation of the nucleosome core. Cell 109 (2002), pp. 745-56.

Wacker, H., Harvey, R.A., Winestock, C.H. and Plaut, G.W. 4-(1'-D-Ribitylamino)-5-Amino-

2,6-Dihydroxypyrimidine, the Second Product of the Riboflavin Synthetase Reaction.

J Biol Chem 239 (1964), pp. 3493-7.

Yanai, I., Derti, A. and DeLisi, C. Genes linked by fusion events are generally of the same

functional category: a systematic analysis of 30 microbial genomes. Proc Natl Acad

Sci U S A 98 (2001), pp. 7940-5.

Yoon, H.S., Hackett, J.D., Ciniglia, C., Pinto, G. and Bhattacharya, D. A molecular timeline

for the origin of photosynthetic eukaryotes. Mol Biol Evol 21 (2004), pp. 809-18.

FIGURE LEGENDS

Figure 1. Schematic representations displaying the alignment of identified fusion (A, B, D and

E) or fission (C) proteins in green microalgal species (A to C) and diatoms (D and E) with

their respective split or composite proteins. The amino acid positions that correspond to

the beginning and the end of the alignment are indicated, as well as the boundaries of the

conserved domains relative to the full-length protein. Asterisks beside the microalgal

species names below denote that the particular species was chosen as a representative

amongst others for the purposes of the alignment comparison.

A: Alignment of fusion protein alpha 1,2 mannosidase/Fra10Ac1 in the green alga V. carteri

with the respective split proteins in the green alga C. reinhardtii*.

B: Alignment of fusion protein DOT1/riboflavin synthase in the green alga V. carteri with

the respective split proteins in the green alga C. reinhardtii*.

C: Alignment of fission proteins COX2A and COX2B in the green alga V. carteri* with the

respective composite protein in the red alga C. merolae.

D: Alignment of fusion protein G6PDH/6PGDH in the diatom P. tricornutum with the

respective split proteins in the diatom T. pseudonana*.

E: Alignment of fusion protein TIM/GAPDH in the diatom Phaeodactylum tricornutum* with

the respective split proteins in the red alga C. merolae.

Figure 2. Ribbon representations of the three-dimensional homology models for fusion and

individual fission and split proteins.

A. Left: Ribbon representation of the X-ray crystal structure of 3GYX, which was used as

template. 3GYX consists of six copies of a heterodimer that is made up of a large α-helical

barrel-like conformation (in orange color) and a smaller molecule in an extended coil and

β-sheet conformation that wraps around the larger component (in blue color).

Right: The 3D homology modelled molecular system of the alpha 1,2 mannosidase (in

green color) and the Fra10Ac1 molecule (in red color) in complexed conformation.

B. Left: Ribbon representation of the 3D homology model of riboflavin synthase

superposed on its X-ray crystal template structure. The theoretical model is in red,

whereas the template X-ray crystal structure is shown in green color.

Right: Following the spatial organization of the homotrimeric template complex structure, the 3D model of the complex was established. This trimeric model (shown per monomer in green, red and blue, in wire representation) was subsequently subjected to docking in the presence of a single DOT1 molecule (shown as a yellow ribbon), which was also built by computer-aided homology modelling. A snapshot of the final complex conformation is shown.

C. Left: Ribbon representation of the 3D homology model of the COXIIa and COXIIb complex, including the modelled inserts at the fusion sites. COXIIa is shown in red, COXIIb in magenta, the α-helical insert predicted from secondary structure in blue and the α-helical structure from the bacterium Thermotoga maritima in green.

Right: Using the same conventions as in C, left, the 3D homology model of COXIIa-b is shown modelled on the crystal structure of the complete two-subunit cytochrome c oxidase from Paracoccus denitrificans (RCSB entry: 1AR1), which was used as template.

D. Ribbon representation of the 3D homology model for the complex of the predicted

fusion event in Phaeodactylum tricornutum. The G6PDH model is in red, whereas the

component 6PGDH model is shown in green.

E. Ribbon representation of the produced Thalassiosira pseudonana fusion protein

hypothesis 3D model. The GAPDH model is in red, whereas the TIM model is shown in

green.

Figure 3. Electrostatic potential surfaces were calculated to compare the charge distribution of the 3D model of the COXIIa and COXIIb complex with that of the template structure on which it was based. The two complexes exhibited almost identical electrostatic surfaces, sharing common features that were not disturbed by the addition of the two extra α-helices to the homology model. A hydrophobic, uncharged region in the midsection of the two complexes (depicted by white boxes), which is vital to function, has been conserved despite the addition of the insert structures. This observation supports the validity of the model, whose electrostatic surface is similar to, and of almost identical intensity as, that of its X-ray determined template structure. The 3D positions of the two extra α-helices are indicated by the square black boxes.

Figure 4. Dendrograms of identified fusion and fission proteins in microalgal species.

A: Dendrogram of fusion protein alpha 1,2 mannosidase/Fra10Ac1

B: Dendrogram of fusion protein DOT1/riboflavin synthase

C: Dendrogram of COX2

D: Dendrogram of fusion protein G6PDH/6PGDH

E: Dendrogram of fusion protein TIM/GAPDH

Figure 5. NCBI-extracted dendrogram illustrating the phylogenetic distribution of the

predicted fusion/fission events in the main eukaryotic and prokaryotic taxa. The

conventions/symbols are the same as in Figure 4.

TABLE LEGENDS

Table 1: List and description of the organisms analyzed in the present study.

Table 2: Predicted fusion events and the corresponding fused and heterodimeric proteins.

Table 3: Predicted fission event and the corresponding split and composite proteins.

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Table 1

Taxonomy / Species Description

1. Chlorophyta (Green algae) They constitute a large diverse group of photosynthetic eukaryotes from which the land plants (Streptophytes) descended one billion years ago (Yoon et al., 2004). Green algae originated from primary endosymbiosis, whereby a non-photosynthetic eukaryote acquired a chloroplast by engulfing a primary cyanobacterium (Gray, 1999; Falkowski et al., 2004). They play an important role in global energy and biomass production (Grossman, 2005).

1.1.Chlorophyceae

1.1.1. Chlamydomonadales

Volvox carteri f. nagariensis Multicellular chlorophycean alga (Prochnik et al., 2010) that diverged from its unicellular ancestors approximately 200 million years ago (Herron et al., 2009). V. carteri is extensively used for studying multicellularity and cellular differentiation (Herron et al., 2009; Prochnik et al., 2010).

Chlamydomonas reinhardtii C. reinhardtii (Merchant et al., 2007) is a unicellular chlorophycean alga found in diverse aquatic environments. It serves as a model organism for studying eukaryotic photosynthesis, cellular metabolism and sexual reproduction owing to its well-established genetic background.

1.2. Trebouxiophyceae

Chlorella variabilis NC64A Unicellular organism found in both aquatic and terrestrial environments (Takeda, 1988). C. variabilis is used as a model organism for studying adaptation to photosymbiosis and viral-algal interactions (Blanc et al., 2010).

1.3. Mamiellophyceae

Ostreococcus lucimarinus CCE9901 One of the smallest known unicellular marine eukaryotes (about 1 μm in diameter) (Piganeau et al., 2011). It is found in the phytoplankton of diverse marine environments. This alga, believed to be the last common ancestor of the green lineage from which all other green algae and terrestrial plants have emerged, is used in evolutionary and genomic studies (Derelle et al., 2006; Palenik et al., 2007).

2. Rhodophyta (Red algae) Red algae, like green algae, are also believed to have descended from an ancestral cyanobacterial endosymbiont (Gray, 1999; Falkowski et al., 2004). They are proposed to have evolved before the divergence of green algae from terrestrial plants (Merchant et al., 2007).

Cyanidioschyzon merolae 10D C. merolae is a small (2 μm in diameter) unicellular red alga with a compact genome (about 16 Mb) (Matsuzaki et al., 2004), found in acidic hot springs. The cell contains a single nucleus, a single plastid and a single mitochondrion. Owing to its simple gene composition, the genome of C. merolae is used as a model system for studying the origin, evolution and fundamental traits of eukaryotic cells (Kuroiwa, 1998).

3. Bacillariophyta (Diatoms) Diatoms are unicellular, photosynthetic algae. They are distributed in almost all water bodies throughout the world, and play an important role in the global ecosystem since they are responsible for about one-fifth of global carbon fixation (Field et al., 1998; Armbrust, 2009). A characteristic feature of diatoms is their intricate silicified cell wall, or frustule, which is exploited by nanotechnologists (Parkinson and Gordon, 1999)

3.1. Coscinodiscophyceae (Centric diatoms)

Thalassiosira pseudonana CCMP1335 T. pseudonana (Armbrust et al., 2004) is a marine centric diatom, the origin of which is traced to 180 million years ago (Sims et al., 2006). It has been used as a model organism for studying diatom physiology (Armbrust et al., 2004).

3.2. Bacillariophyceae (Pennate diatoms)

Phaeodactylum tricornutum CCAP 1055/1 P. tricornutum is a pennate diatom with a small genome (about 30 Mb). It diverged from T. pseudonana 90 million years ago, and is found in both pelagic and benthic environments (Sims et al., 2006). It has served as a model system for studying diatom genomics (Bowler et al., 2008).

Table 2

| Fused protein: Species | Fused protein: Protein | Fused protein: Accession | Heterodimeric protein: Species | Heterodimeric protein: Protein | Heterodimeric protein: Accession | Heterodimeric protein: Locus |
|---|---|---|---|---|---|---|
| Volvox carteri | alpha-1,2-mannosidase | XP_002957696 | Chlamydomonas reinhardtii | alpha-1,2-mannosidase | EDO98388 | Contig ABCN01005783.1 |
| | | | | predicted protein | EDO98391 | Contig ABCN01005784.1 |
| | | | Ostreococcus lucimarinus | predicted protein | XP_001421581 | Chromosome 15 |
| | | | | predicted protein | XP_001417927 | Chromosome 5 |
| | | | Chlorella variabilis | hypothetical protein CHLNCDRAFT_33654 | EFN60136 | Contig ADIC01000169.1 |
| | | | | hypothetical protein CHLNCDRAFT_22127 | EFN56799 | Contig ADIC01001059.1 |
| | | | Thalassiosira pseudonana | mannosyl-oligosaccharide 1,2-alpha-mannosidase | EED93215.1 | Chromosome 4 |
| | | | | predicted protein | EED88579 | Chromosome 15 |
| | | | Phaeodactylum tricornutum | mannosyl-oligosaccharide alpha-1,2-mannosidase | XP_002182479 | Chromosome 16 |
| | | | | predicted protein | XP_002180997 | Chromosome 10 |
| Volvox carteri | hypothetical protein VOLCADRAFT_120730 | XP_002949156 | Chlamydomonas reinhardtii | predicted protein | EDP03930 | Contig ABCN01002604.1 |
| | | | | riboflavin synthase | EDP05053 | Contig ABCN01002077.1 |
| | | | Ostreococcus lucimarinus | predicted protein | XP_001417943 | Chromosome 5 |
| | | | | predicted protein | XP_001418747 | Chromosome 7 |
| | | | Thalassiosira pseudonana | predicted protein | EED96376 | Chromosome 1 |
| | | | | predicted protein | EED95235 | Chromosome 2 |
| | | | Phaeodactylum tricornutum | predicted protein | XP_002177531 | Chromosome 1 |
| | | | | predicted protein | XP_002182351 | Chromosome 15 |
| Phaeodactylum tricornutum | G6PDH/6PGDH fusion protein | XP_002185945 | Thalassiosira pseudonana | glucose-6-phosphate 1-dehydrogenase | EED92550 | Chromosome 5 |
| | | | | 6-phosphogluconate dehydrogenase | EED93357 | Chromosome 4 |
| | | | Ostreococcus lucimarinus | predicted protein | XP_001417868 | Chromosome 5 |
| | | | | predicted protein | XP_001418779 | Chromosome 7 |
| | | | Volvox carteri | hypothetical protein VOLCADRAFT_82038 | XP_002953022 | scaffold VOLCAscaffold_33 |
| | | | | hypothetical protein VOLCADRAFT_109207 | XP_002953515 | scaffold VOLCAscaffold_36 |
| | | | Chlamydomonas reinhardtii | glucose-6-phosphate-1-dehydrogenase | EDP00500.1 | Contig ABCN01004386.1 |
| | | | | 6-phosphogluconate dehydrogenase, decarboxylating | EDP00572.1 | Contig ABCN01004298.1 |
| 1. Thalassiosira pseudonana | triosephosphate isomerase/glyceraldehyde-3-phosphate dehydrogenase precursor | EED92326 | Chlamydomonas reinhardtii | triose-phosphate isomerase | EDP09773 | Contig ABCN01000165.1 |
| 2. Phaeodactylum tricornutum | triosephosphate isomerase/glyceraldehyde-3-phosphate dehydrogenase precursor | XP_002177987 | | glyceraldehyde 3-phosphate dehydrogenase, dominant splicing variant | EDO96575 | Contig ABCN01007563.1 |
| | | | Volvox carteri | hypothetical protein VOLCADRAFT_109969 | XP_002955427 | scaffold VOLCAscaffold_51 |
| | | | | glyceraldehyde 3-phosphate dehydrogenase | XP_002956882 | scaffold VOLCAscaffold_66 |
| | | | Chlorella variabilis | triosephosphate isomerase cytoplasmic type | EFN53775 | Contig ADIC01002012.1 |
| | | | | hypothetical protein CHLNCDRAFT_36383 | EFN53819 | Contig ADIC01002038.1 |
| | | | Cyanidioschyzon merolae | triose-phosphate isomerase | BAC67674 | - |
| | | | | Glyceraldehyde 3 phosphate dehydrogenase | BAC67669 | - |

Table 3

| Fission protein: Species | Fission protein: Protein | Fission protein: Accession | Composite protein: Species | Composite protein: Protein | Composite protein: Accession |
|---|---|---|---|---|---|
| 1. Volvox carteri | hypothetical protein VOLCADRAFT_74497 | XP_002950066, Scaffold VOLCAscaffold_17 | Cyanidioschyzon merolae | cytochrome c oxidase polypeptide II | BAA34656.1 |
| | cytochrome c oxidase subunit II | XP_002948528, Scaffold VOLCAscaffold_10 | | | |
| 2. Chlamydomonas reinhardtii | cytochrome c oxidase subunit II, protein IIa of split subunit | EDP00208.1, Contig ABCN01004598.1 | | | |
| | cytochrome c oxidase subunit II, protein IIb of split subunit | EDP09974.1, Contig ABCN01000296.1 | | | |

ABBREVIATIONS

CO2 Carbon dioxide

SAFE Software for the Analysis of Fusion Events

3D Three-dimensional

BLAST Basic Local Alignment Search Tool

Fra10Ac1 Fragile site, folic acid type, rare, fra(10)(q23.3) or fra(10)(q24.2) candidate 1

DOT1 Disruptor of Telomeric silencing

COX2A Cytochrome c oxidase subunit II, transmembrane domain

COX2B Cytochrome c oxidase subunit II C-terminal

G6PDH Glucose-6-phosphate 1-dehydrogenase

TIM Triosephosphate isomerase

6PGDH 6-Phosphogluconate Dehydrogenase

GAPDH Glyceraldehyde-3-phosphate dehydrogenase

DMRL 6,7-dimethyl-8-(1’-D-ribityl)-lumazine

G3P D-glyceraldehyde 3-phosphate

DHAP Dihydroxyacetone phosphate

PDB Protein Data Bank

RNA Ribonucleic acid

NCBI National Center for Biotechnology Information

ATP Adenosine triphosphate

WD Trp-Asp

SH3 Src Homology 3 Domain

NPS Network Protein Sequence Analysis

MOE Molecular Operating Environment

MEP Molecular electrostatic potential

NVT Number of atoms, Volume and Temperature

RMSd Root-mean-square deviation

