Phylogenomics Reveals an Anomalous Distribution of USP Genes in Metazoans

Sylvain Forêt,†,1 François Seneca,†,1 Danielle de Jong,1,2 Annette Bieller,3 Georg Hemmrich,4 Rene Augustin,4 David C. Hayward,2 Eldon E. Ball,2 Thomas C.G. Bosch,4 Kiyokazu Agata,5 Monika Hassel,3 and David J. Miller*,1

1 ARC Centre of Excellence in Coral Reef Studies and Comparative Genomics Centre, James Cook University, Townsville, Queensland, Australia
2 Evolution, Ecology and Genetics Group, Research School of Biology, Australian National University, Canberra ACT, Australia
3 Department of Biology, Philipps University, Marburg, Germany
4 Zoological Institute, Christian-Albrechts-University Kiel, Kiel, Germany
5 Department of Biophysics, Kyoto University, Kitashirakawa-Oiwake, Sakyo-ku, Kyoto, Japan

†These authors contributed equally to this work.

*Corresponding author: E-mail: [email protected].

Associate editor: Claudia Kappen

Abstract

Members of the universal stress protein (USP) family were originally identified in stressed bacteria on the basis of a shared domain, which has since been reported in a phylogenetically diverse range of prokaryotes, fungi, protists, and plants. Although not previously characterized in metazoans, here we report that USP genes are distributed in animal genomes in a unique pattern that reflects frequent independent losses and independent expansions. Multiple USP loci are present in urochordates as well as all Cnidaria and Lophotrochozoa examined, but none were detected in any of the available ecdysozoan or non-urochordate deuterostome genome data. The vast majority of the metazoan USPs are short, single-domain proteins and are phylogenetically distinct from the prokaryotic, plant, protist, and fungal members of the protein family. Whereas most of the metazoan USP genes contain introns, with few exceptions those in the cnidarian Hydra are intronless and cluster together in phylogenetic analyses. Expression patterns were determined for several cnidarian USPs, including two genes belonging to the intronless clade, and these imply diverse functions. The apparent paradox of implied diversity of roles despite high overall levels of sequence (and implied structural) similarity parallels the situation in bacteria. The absence of USP genes in ecdysozoans and most deuterostomes may be a consequence of functional redundancy or specialization in taxon-specific roles.

Key words: universal stress protein, Cnidaria, USP, gene loss, Urmetazoa.

Introduction

The universal stress protein A (USPA) domain, originally identified in the product of the Escherichia coli uspA gene, is the archetype of what is now a family of prokaryotic and plant proteins defined as COG0589 and Pfam PF00582 (Kvint et al. 2003). Proteins containing the USP domain were originally identified in the context of the bacterial stress response; E. coli contains six such genes (uspA, C–G), which are expressed in response to a wide variety of stress states, including nutrient starvation and exposure to heat, acid, heavy metals, oxidative agents, osmotic stress, antibiotics, and uncouplers of oxidative phosphorylation (Nystrom and Neidhardt 1992, 1993, 1994). In Mycobacterium smegmatis, three USPs are induced in response to oxygen starvation (O’Toole et al. 2003), and in Pseudomonas aeruginosa, USPs are required for anaerobic growth (Schreiber et al. 2006). Although some USPs are clearly involved in bacterial stress responses, mutagenesis studies imply other primary functions as well. In E. coli, UspC and UspE are implicated in cell adhesion as well as the production of flagella; these two proteins decrease adhesion and promote motility, whereas UspF and UspG have the opposite effect (Nachin et al. 2005). In addition to being required for defense against superoxide-generating agents, E. coli UspD functions in intracellular iron homeostasis (Nachin et al. 2005). Both plants and fungi also have proteins containing the USP domain, and some of the plant USP genes are stress induced; for example, tomato ER6 is induced by ethylene (Zegzouti et al. 1999), a plant hormone often associated with stress (Druege 2006).

The USP domain is an alpha-beta-alpha fold (fig. 1), and the USP family belongs to the adenine nucleotide alpha hydrolase superfamily, which also includes the electron transport flavoprotein family, the N-type ATP protein phosphatases, and ATP sulfhydrylases. Although the USP fold is associated with ATP binding, not all proteins in the USP family bind ATP. The crystal structures of a number of USPs have been solved, including those from Methanococcus jannaschii (Zarembinski et al. 1998) and Haemophilus influenzae (Sousa and McKay 2001). Although these structural features imply a fundamental distinction between the bacterial UspF/UspG types, which bind ATP, and the UspA type, which does not, the functional significance of this is unclear.

© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

Research article. Mol. Biol. Evol. 28(1):153–161. 2011 doi:10.1093/molbev/msq183 Advance Access publication July 21, 2010

Understanding USP function is also complicated by the fact that many of these proteins can form homo- as well as heterodimers (Nachin et al. 2008). Moreover, although most bacterial USPs are small proteins containing either one (14–15 kD) or two (ca. 30 kD) USP domains, the domain also occurs in multidomain proteins from bacteria, Archaea, and plants (Kvint et al. 2003). In bacteria, the domain occurs in a family of osmosensitive K+ channel histidine kinases, and in both Archaea and bacteria, in a family of Na+/K+ antiporters. In plants, the USP domain occurs in a number of serine/threonine kinases.

Although known from the other kingdoms of life, USPs have been thought of as "nonmetazoan" genes and have probably been underreported in expressed sequence tag (EST) data sets as suspected contaminants. However, in the process of characterizing EST data sets for two anthozoan cnidarians, we identified ESTs encoding clear members of the USP family in both the coral Acropora millepora and the sea anemone Nematostella vectensis (Technau et al. 2005). These were the first metazoan USPs to be reported; however, clearly related sequences were also detected in Schistosoma japonicum (Technau et al. 2005). Here, we report the presence of extensive USP gene families in the genomes of a phylogenetically diverse range of animals, including lophotrochozoans, cnidarians, and urochordates. USP genes were detected neither in any of the ecdysozoans for which whole-genome data are available nor in vertebrates or other non-urochordate deuterostomes. The pattern of distribution of USP genes, which implies at least five independent losses and multiple independent expansions during animal evolution, is so far unique. The metazoan USPs grouped together in phylogenetic analyses, consistent with the hypothesis that these represent ancient genes present in the common metazoan ancestor and were not acquired later via lateral gene transfer. Molecular phylogenetics indicates that metazoan USPs have evolved via taxon-specific duplications from a small ancestral repertoire, and expression data for several cnidarian USP genes imply diverse roles. The patchy phylogenetic distribution of USP genes is consistent with the idea of widespread gene loss across the Metazoa and suggests that other molecules may be capable of fulfilling their roles or that these functions are no longer necessary in the species where they are missing. Alternatively, the diversity of these proteins in some clades and their absence in others might indicate that they have evolved to fulfill taxon-specific roles.

Methods

Sequences

Sequences were obtained for a number of eukaryotes, focusing primarily on those with fully sequenced and annotated genomes. Due to the very large number of bacterial USP sequences, only those available from the protein data bank (pdb.org) with a resolved 3D structure were used in our analyses. Scanning for USP domains was carried out using HMMER (Eddy 1998) version 2.3.2 with the USP profiles available on PFAM (version 23.0). The gene models of the predicted proteins containing a USP domain were then inspected to confirm domain structures and intron–exon boundaries. Details of the database versions used and sequence names identified are provided in supplementary table 1, Supplementary Material online. We adhered to the standard nomenclature practice: the name of each USP starts with the first letter of the genus name, followed by a three-letter reduction of the species name, followed by a number. When several protein sequences from the same species were identical, a single representative was used for further analysis, hence the use of some nonsequential identifiers. Genes containing two USP domains were split, with "a" and "b" appended to the first and second domain, respectively (e.g., Sman5a is the first domain of Schistosoma mansoni USP gene number 5).

FIG. 1. The USP domain. This alignment includes all the Hydra magnipapillata sequences and the bacterial 1MJH. The two Hydra sequences indicated in red are those lacking introns. The secondary structure is shown above the alignment, and the residues involved in ATP binding are indicated by a "*" under the alignment. The color coding of residues is based on that used in the Jalview implementation of ClustalX (see www.jalview.org/help/help.html) as follows: blue, A, I, L, M, F, W, V, C; red, R, K; green, N, Q, S, T; pink, C; magenta, E, D; orange, G; cyan, H, Y; yellow, P. Thresholds for color coding were optimized to clearly delineate the boundaries of secondary structure features.
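The naming convention described under Sequences can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name usp_identifier, its parameters, and the use of the first three letters of the species name as the "three-letter reduction" are all assumptions (identifiers such as Capi and Cins in fig. 3 suggest that the reductions were not always formed this way). The zero-padding width is left adjustable because both Hmag01 and Sman5a appear in the text.

```python
from typing import Optional


def usp_identifier(genus: str, species: str, number: int,
                   domain: Optional[str] = None, width: int = 2) -> str:
    """Build a USP name such as 'Hmag01' (hypothetical helper).

    Assumption: the three-letter reduction is simply the first three
    letters of the species name; the text does not define the rule.
    """
    if domain not in (None, "a", "b"):
        raise ValueError("domain must be None, 'a' or 'b'")
    prefix = genus[0].upper() + species[:3].lower()
    # 'a' / 'b' mark the first / second domain of a two-domain gene.
    return f"{prefix}{number:0{width}d}{domain or ''}"


print(usp_identifier("Hydra", "magnipapillata", 1))
print(usp_identifier("Schistosoma", "mansoni", 5, domain="a", width=1))
```

Keeping the domain suffix separate from the gene number means a two-domain gene yields two identifiers that still sort next to each other.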

Phylogenetics

Sequences were aligned with MAFFT 6.717b (Katoh et al. 2005), using the accurate L-INS-I method. Positions containing over 95% gaps were removed from the alignment. A colored version of the alignment with the intron positions is shown in supplementary figure 7, Supplementary Material online. Maximum likelihood trees were inferred with PhyML 3.0 (Guindon and Gascuel 2003) using the LG amino acid substitution model (Le and Gascuel 2008), with four substitution rate categories approximating a gamma distribution whose rate was estimated and an invariant category. The starting trees were computed using BioNJ, and the topologies were optimized by nearest neighbor interchange and subtree pruning and regrafting. The branch support was estimated using approximate likelihood tests (Shimodaira and Hasegawa 1999) and with the bootstrap procedure, using 100 replicates.
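The gap-filtering step mentioned above (removing positions with over 95% gaps) can be illustrated as follows. This is a sketch under stated assumptions, not the tool the authors used: the function name drop_gappy_columns, the dictionary representation of the alignment, and the use of "-" as the only gap character are hypothetical choices made for the example.

```python
from typing import Dict


def drop_gappy_columns(alignment: Dict[str, str],
                       max_gap_fraction: float = 0.95) -> Dict[str, str]:
    """Remove alignment columns whose gap fraction exceeds the threshold."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if not rows:
        return {}
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    # Keep a column unless MORE than max_gap_fraction of its cells are gaps.
    keep = [
        col for col in range(length)
        if sum(row[col] == "-" for row in rows) / len(rows) <= max_gap_fraction
    ]
    return {name: "".join(row[col] for col in keep)
            for name, row in zip(names, rows)}


# Toy alignment (invented): only the all-gap third column is dropped.
toy = {"seq1": "MK-LV", "seq2": "M--LV", "seq3": "M--IV"}
print(drop_gappy_columns(toy))
```

With only three toy sequences, a column must be entirely gaps to exceed 95%; in an alignment of many sequences the same threshold also removes columns where a single sequence carries an insertion.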

Phylogenetic trees were also inferred with a Bayesian approach using MrBayes 3.2-cvs (Ronquist and Huelsenbeck 2003) that we modified to incorporate the LG model. The amino acid substitution model was chosen by optimization and converged rapidly to the LG model. The program was run for 100,000,000 generations, sampling every 1,000 generations, using two runs and four chains per run. The heterogeneity in rates was modeled by a gamma distribution with four categories and one invariant category. The criteria for convergence were an average standard deviation of split frequencies lower than 0.05 and potential scale reduction factor for the estimated parameters between 0.995 and 1.005. The first 25% of observations were removed as burn-in. Quartet puzzling was carried out with Tree-Puzzle 5.2 (Schmidt et al. 2002) in likelihood mapping mode.
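The sampling arithmetic and the convergence criteria stated above can be written out explicitly. The sketch below is illustrative only: the function names sample_counts and has_converged are hypothetical, the diagnostic values would in practice be read from the sampler's summary output, and the count ignores any sample recorded at generation 0.

```python
from typing import Iterable, Tuple


def sample_counts(generations: int, sample_every: int,
                  burnin_fraction: float = 0.25) -> Tuple[int, int, int]:
    """Return (samples per run, samples discarded as burn-in, samples kept)."""
    total = generations // sample_every
    burnin = int(total * burnin_fraction)
    return total, burnin, total - burnin


def has_converged(asdsf: float, psrf_values: Iterable[float],
                  asdsf_max: float = 0.05,
                  psrf_low: float = 0.995, psrf_high: float = 1.005) -> bool:
    """Apply the two criteria from the text: the average standard deviation
    of split frequencies is below asdsf_max, and every potential scale
    reduction factor lies within [psrf_low, psrf_high]."""
    return asdsf < asdsf_max and all(
        psrf_low <= value <= psrf_high for value in psrf_values
    )


# Run length and sampling interval as given in the Methods.
print(sample_counts(100_000_000, 1_000))
```

Under these settings each run yields 100,000 samples, of which 25,000 are discarded and 75,000 retained.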

In Situ Hybridization

For assessment of gene expression patterns in Hydra, whole-mount in situ hybridization was carried out as previously described (Augustin et al. 2006).

Embryos, planula larvae, and postmetamorphic specimens of Acropora were fixed as described in Anctil et al. (2007). Prior to the hybridization procedure, the specimens were cleared in xylene for 2 h before being rehydrated to phosphate buffered saline (PBS) containing 0.1% Triton X-100 (PBS-T). Remaining lipids were removed by treating specimens in the RIPA detergent cocktail (Rosen and Beddington 1993), overnight at 4 °C, followed by rinses in PBS-T. Whole-mount hybridization proceeded as described in Kucharski et al. (2000). Hybridization was carried out at 55 °C for 72 h. The templates for runoff transcription of antisense RNA probes were generated from cloned cDNAs by polymerase chain reaction. Control specimens, prehybridized with an excess of unlabeled runoff antisense transcript, failed to show staining, demonstrating the specificity of the observed patterns. Following dehydration and clearing through a graded glycerol series, specimens were mounted in 90% glycerol. Digital images were obtained using a SPOT digital camera mounted on a Wild PhotoMakroskop M400.

Results

The Phylogenetic Distribution of USP Genes

The results of scanning the available whole-genome data using a hidden Markov model for the USP domain are summarized in figure 2, with the numbers of USP genes indicated for representative taxa. USP genes are present in slime molds, fungi, and the choanoflagellate Monosiga, but relatively few loci were detected in each case. In the animal kingdom, the distribution of USPs is patchy: none were detected in the placozoan Trichoplax, in any ecdysozoans for which whole-genome sequences are available, or in the vast majority of deuterostomes. However, several USPs were detected in urochordates (nine in Ciona intestinalis) as well as in all the cnidarians and lophotrochozoans examined. In these latter cases, the numbers of USP genes were higher than in all the nonmetazoan unikonts (eukaryotes that are either amoeboid or bear a single cilium) examined. Preliminary analyses indicate that sponge genomes also encode USPs, but the publicly available data do not yet permit estimation of the numbers of genes present.

The distribution pattern of USPs within Metazoa requires multiple losses: at least five independent losses have occurred during animal evolution (fig. 2).

Use of the OrthoMCL database (Li et al. 2003) allowed the identification of USPs as one of 13 orthologous groups of genes with similar phylogenetic distribution in Metazoa, but most of these are metazoan specific. Adding the further constraint that the domain also be present in the range of nonmetazoan groups in which it is known illustrates the unique nature of the USP distribution pattern: requiring that the cluster be present in a choanoflagellate reduced the number of groups found to five, and adding the requirement for presence in bacteria reduced the number identified to two, USPs and amidohydrolase. Requiring that the group also be present in Archaea (i.e., the real distribution) made the USP cluster unique. This domain therefore has a highly unusual (unique to date) phyletic distribution pattern; it has a very ancient origin and has been lost on many independent occasions in the Metazoa.
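The successive filtering described above amounts to intersecting presence requirements on phyletic profiles. The sketch below shows the logic only: the function name groups_present_in is hypothetical, and the four profiles are invented toy data, not OrthoMCL output, so the group names and counts do not correspond to the 13, 5, 2, and 1 groups reported in the text.

```python
from typing import Dict, Iterable, List, Set


def groups_present_in(profiles: Dict[str, Set[str]],
                      required: Iterable[str]) -> List[str]:
    """Return the groups whose profile contains every required taxon."""
    needed = set(required)
    return sorted(name for name, taxa in profiles.items() if needed <= taxa)


# Invented toy profiles (NOT real OrthoMCL results). Every group shares the
# patchy metazoan distribution; they differ in nonmetazoan presence.
metazoan = {"Cnidaria", "Lophotrochozoa", "Urochordata"}
profiles = {
    "group_A": metazoan,
    "group_B": metazoan | {"Choanoflagellata"},
    "group_C": metazoan | {"Choanoflagellata", "Bacteria"},
    "group_D": metazoan | {"Choanoflagellata", "Bacteria", "Archaea"},
}

# Add one constraint at a time, as in the text.
constraints: List[str] = []
for taxon in ["Choanoflagellata", "Bacteria", "Archaea"]:
    constraints.append(taxon)
    print(taxon, groups_present_in(profiles, constraints))
```

Each added requirement can only shrink the candidate set, which is why the order in which constraints are applied does not affect the final result.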


General Characteristics of the Metazoan USPs

With very few exceptions, the predicted metazoan USPs are short, single-domain proteins. By contrast, in plants, a substantial proportion (approximately half in Arabidopsis; Kerk et al. 2003) of proteins containing the USP domain also contain a protein kinase domain. The majority of the metazoan USPs are predicted to have the hydrophobic beta 5 region (fig. 1 and supplementary fig. 7, Supplementary Material online) and are thus presumably capable of dimerization. In both animals and fungi, the main exceptions to the single-domain general pattern are genes encoding two (i.e., duplicated) USP domains. Based on the presence of key residues implicated in ATP binding in USPA from Methanocaldococcus jannaschii (1MJH; Zarembinski et al. 1998), it is likely that at least some animal USPs bind ATP (fig. 1 and supplementary fig. 7, Supplementary Material online).

Phylogenetic Analysis

Phylogenetic analysis of the sequences of the full USP complement from a number of species was conducted using maximum likelihood and Bayesian approaches. Figure 3 shows the results of the Bayesian inference of all the animal sequences identified in fully sequenced genomes and in the coral A. millepora; the results of other inference methods are provided as supplementary figures 1–5, Supplementary Material online. Although there are some (mostly minor) disagreements between the trees resulting from the application of different methods and the level of support varies, consistent trends can be identified.

The evolution of the eukaryotic USP superfamily has been characterized by many lineage-specific expansions. Most of the land plant sequences grouped together, whereas those from Ostreococcus and Chlamydomonas were found in paraphyletic groups with fungal, slime mold, and ciliate sequences (supplementary figs. 3–5, Supplementary Material online).

With the exception of Nvec08 (see below), the metazoan USPs formed a monophyletic clade regardless of the method of phylogenetic analysis employed. The posterior probability and SH-like likelihood ratio tests (Shimodaira and Hasegawa 1999) strongly support monophyly of the animal USP sequences. Although this node was not well supported in terms of bootstrap support, it was strongly supported by quartet puzzling (supplementary fig. 6, Supplementary Material online) in which 79.1% of 10,000 random quartets favored monophyly of animal USPs.

FIG. 2. Phylogenomic distribution of USP genes. For each clade, the name of a representative species is indicated and colored in red if the genome of that species encodes USPs and in black if it does not. The number of USP genes found in each genome is given to the right of the species name. Branches where losses of the entire USP family have occurred are highlighted by red dots. The Cnidaria are highlighted in blue, the Lophotrochozoa in purple, the Ecdysozoa in green, and the Deuterostomia in brown. LUCA, last universal common ancestor; LECA, last eukaryotic common ancestor.

The N. vectensis sequence Nvec08 is almost certainly a contaminant, as it clusters within the prokaryotic clade with high support, its best Blast hit is a sequence from Flavobacteria bacterium, and the genomic scaffold that it is located on is made of a single short contig, containing a single other gene (gi:5496368), which is also intronless and most similar to another F. bacterium sequence (hypothetical protein FBBAL38_06985). Three USP sequences were found in the genome of the choanoflagellate Monosiga brevis, but these sequences did not cluster within the animal group.

With the exception of those from S. mansoni, most of the sequences from the lophotrochozoans Lottia gigantea (a gastropod mollusc), Capitella teleta (a polychaete annelid), and Helobdella robusta (a hirudinean annelid) clustered together with moderate support in both maximum likelihood and Bayesian analyses (fig. 3 and supplementary figs. 1–2, Supplementary Material online). Within this clade, two main expansions were consistently identified: one consisting exclusively of sequences from Lottia and the other containing only annelid sequences. Most of the deuterostome (Ciona) sequences also formed a single clade. Many shallow nodes of the tree were well supported by all the methods of analysis. These represent either orthologs between species belonging to the same class or phylum or paralogous expansions. The most striking example of such an expansion is found in Hydra (see below).

FIG. 3. Phylogenetic analysis of animal USP sequences. The analyses were based on the complete USP complements of the animals with fully sequenced genomes (fig. 1) plus the coral Acropora millepora. The tree shown is the result of Bayesian analysis, with posterior probabilities of the nodes indicated. Keys to identifiers: Amil, A. millepora (Cnidaria); Nvec, Nematostella vectensis (Cnidaria); Hmag, Hydra magnipapillata (Cnidaria); Sman, Schistosoma mansoni (Platyhelminthes); Lgig, Lottia gigantea (Mollusca); Capi, Capitella teleta (Annelida); Hrob, Helobdella robusta (Annelida); Cins, Ciona intestinalis (Urochordata). The well-resolved clade indicated by the blue background contains all but two of the Hydra USP sequences, and every member of this clade lacks introns. The two Hydra sequences containing introns are indicated as a green branch; these group with two other cnidarian sequences, both of which also contain introns.

Most of the metazoan USP loci are typical eukaryotic genes in that they contain a number of introns, one of which is characteristically at approximately the same position (supplementary fig. 7, Supplementary Material online) in plant and animal genes but is not present in fungal or protist genes. In common with the other cnidarian USP loci, two of the Hydra sequences contain introns (including one at the conserved site), whereas the remaining 22 Hydra USP loci are devoid of them. The fact that these intronless genes form a well-supported monophyletic clade suggests that they are the products of a single retrotransposition event that occurred after the anthozoan/hydrozoan divergence.

Heterogeneity of USP Expression Patterns

Although many of the metazoan USPs are represented in EST data sets, in situ expression data are available in only a few cases. In order to investigate their possible functions, the expression patterns of selected cnidarian USPs were determined by in situ hybridization. Given that the majority of Hydra USP genes are likely to be the result of a retrotransposition event, the expression patterns of representatives of this clade were of particular interest. Each of three Hydra USP genes examined gave a distinct expression pattern in adult polyps (fig. 4A–C); two of the genes for which expression data were obtained (Hmag01 and Hmag10/teba1) are intronless, whereas the third (Hmag05) contains introns. Hmag01 was expressed throughout the body column endoderm (fig. 4A), but no signal was detected in the tentacles. Hmag05, on the other hand, is expressed in a narrow ring of endodermal epithelial cells very close to the basal disk (fig. 4B). In the case of Hmag10/teba1, in situ analyses of whole polyps detected expression of messenger RNA (mRNA) in endodermal cells of the proximal part of tentacles (fig. 4C). This gene, named teba1 (tentacle base 1) because it is the first to be expressed in the tentacle base, is expressed both during bud evagination and during head regeneration.

For comparative purposes, the expression pattern of one of the Acropora USP genes was determined (fig. 4D). During the settlement process, Amil10 is expressed in the region that will form the basal plate (fig. 4D1). Following metamorphosis, Amil10 mRNA becomes progressively more restricted in distribution, with expression initially in the calicoblast cells forming the basal plate and later associated with developing mesenteries of the polyp, presumably in the calcifying cells that are producing the septa (fig. 4D2–D6).

Discussion

USP genes are likely to have been present in the genome of Urmetazoa (the common ancestor of all animals), and phylogenetics supports the idea that the pattern of presence and absence of these genes in bilaterians with fully sequenced genomes reflects gene loss rather than lateral gene transfer. The genomes of the urochordates, cnidarians, and lophotrochozoans examined contain 8–26 USP genes, but none are present in the placozoan Trichoplax or in any ecdysozoan or non-urochordate deuterostome. All the predicted metazoan USPs are small, single-domain proteins, whereas many of the plant and some of the bacterial USPs contain additional domains (summarized in Kvint et al. 2003). Many of the USPs in flowering plants also contain a protein kinase (PK) domain; for example, 48 USPs are present in Arabidopsis and 23 of these are the USP/PK type (Kerk et al. 2003). The diverse expression patterns, and implied diversity of roles, of the single-domain Hydra proteins present an apparent paradox in that these proteins are short and very similar throughout their lengths (fig. 1). For example, the highly divergent expression patterns of Hmag01 and Hmag10 (fig. 4A and C) suggest distinct functions, but the proteins have 43% identity and 59% similarity overall. Although these issues were not explored in the corresponding papers, analyses of published microarray data also imply heterogeneous roles for USPs; in corals, different USP genes respond differently both during development (Grasso et al. 2008; Voolstra et al. 2009) and in adults exposed to thermal stress (Desalvo et al. 2008; Seneca F unpublished data). This situation parallels that in bacteria, however, where USPs are implicated in a similarly wide range of processes, including cell adhesion and motility as well as being modulators of stress responses (Nachin et al. 2005, 2008). The USP domain appears to permit a wide range of functions, but these may be redundant. We note that despite diverse spatial expression patterns, all three of the Hydra USP genes for which we present expression data are expressed in the endodermal epithelium, a highly potent chemical barrier for protection against intruding microbes (Bosch et al. 2009). Are USPs contributing to this defensive barrier?

FIG. 4. Expression patterns of Hydra and Acropora USP genes. (A) Expression of Hmag01 in Hydra magnipapillata. This gene is expressed throughout the trunk endoderm, but transcripts are absent from the tentacle. (B) In H. magnipapillata, Hmag05 is strongly expressed in an endodermal stripe across the base of the polyp. (C) Expression of Hmag10/teba1 in Hydra vulgaris. This gene is strongly expressed in the endoderm at the tentacle base. Note that H. vulgaris and H. magnipapillata are closely related sister taxa (Hemmrich et al. 2007), and the (H. magnipapillata) Hmag10 and (H. vulgaris) teba1 proteins are identical. (D) Expression of the Acropora millepora USP gene Amil10. (D1) This gene is first expressed as the planula is settling in the region that will form the basal plate. (D2) A recently settled polyp viewed from the aboral surface (the side against the substratum) shows a ring of strong expression near the periphery of the base surrounding a zone of weaker expression, which would overlie the forming basal plate. (D3) A slightly older polyp shows the basal expression fading as expression along the protosepta appears (arrows). (D4–D6) Expression along the septa (arrows) in older polyps viewed from oral (D4, D6) and aboral (D5). Arrowheads in (D6) mark tissue associated with the synapticular ring connecting the septa.

Most of the Hydra USP genes lack introns and are likely to be derived from a single retrotransposition event. By contrast with processed pseudogenes, most or all of the intronless Hydra USP genes are transcribed (ESTs have been identified in most cases) and code for proteins; the subset studied are expressed in specific patterns during growth and development. Independent expansions are common in evolution, and there are many known from Hydra; for example, the PPOD family of peroxidases (Thomsen and Bosch 2006) and NLR proteins (Lange C, Hemmrich G, Klostermeier U, Miller DJ, Rahn T, Weiss Y, Bosch T, P. Rosentiel, submitted) have undergone extensive duplication in Hydra. Likely precedents for retrotransposition in Hydra include the HvirAPX1 ascorbate peroxidase (Habetha and Bosch 2005). Although the significance of the expression patterns for two other genes is not clear, one of the Hydra USPs, teba1, is an early marker for tentacle development and its expression precedes any obvious morphological changes (not shown). The tentacle base of Hydra is a region in which epithelial cells start to undergo dramatic changes in shape and function (Bode et al. 1986), and it may be that teba1 is involved in this process.

The implication of the data presented here is that Urmetazoa (the common animal ancestor) shared one or a few USP genes with members of the other kingdoms of life. One outstanding question is why USP genes are abundant in the genomes of some animals but poorly represented or absent from others. Although there are many examples of loss of single genes, there are few direct precedents for the kind of distribution reported here, where an ancestral domain is absent from all members of one of the three bilaterian lineages (Ecdysozoa) and the majority of another (Deuterostomia) but with an expanded representation in other animals. The Lophotrochozoa are assumed to have undergone fewer gene losses than have Ecdysozoa (see, e.g., Moroz et al. 2006), so the absence of USP genes from members of the latter superphylum is not surprising. There are examples of domain distribution that follow the expected pattern of greater loss from ecdysozoans. For example, the metazoan RAG1 core and N-terminal domains are both present in at least some lophotrochozoans (Moroz et al. 2006), as well as cnidarians and many deuterostomes, but lacking in the Ecdysozoa (Kapitonov and Jurka 2005). The absence of USP genes from vertebrates and most other deuterostomes is unexpected given that the vertebrate gene complement has undergone relatively few losses during evolution.

Gene loss is ubiquitous, and the evidence suggests that very few genes are indispensable. In a comparison based on insects and vertebrates, more than one third (40%) of ancient orthologous genes were shown to have been lost in at least one of the ten species examined (Wyder et al. 2007), and often genes or whole pathways presumed to be essential are missing: Drosophila manages perfectly well without CpG methylation and Caenorhabditis without either CpG methylation or hedgehog signaling. Conversely, genes initially assumed to be taxon specific often turn out not to be when more whole-genome sequences become available. For example, proteins related to the green fluorescent protein of the jellyfish Aequorea victoria were assumed to be restricted to cnidarians but have recently been identified in copepods (Shagin et al. 2004) and amphioxus (Deheyn et al. 2007), and the perforin domain protein apextrin was first identified as a taxonomically restricted gene specific to echinoderms (Haag et al. 1999) but has subsequently been identified in some (but not all) cnidarians (Miller et al. 2007) and lophotrochozoans (Moroz et al. 2006; Takahashi et al. 2009) as well as deuterostomes. The most likely evolutionary scenario is that one or a few USP genes were present in the urmetazoan genome, and this small ancestral complement has independently undergone expansion in a number of lineages. These expansions potentially enable the genes to acquire a diverse range of functions, many of which may be taxon specific, and the diversity of expression patterns seen in cnidarians is consistent with this. Few expression data are available for other metazoan USPs, however. The Aniseed database (http://aniseed-ibdm.univ-mrs.fr/) includes in situ data for one of the Ciona USPs (gene ID: ci 0100151159), which is expressed at the tadpole stage in the anterior and posterior sensory vesicles, the neck, and the visceral ganglia. It remains to be seen whether other animal groups display USP expression patterns as diverse as those reported here for Hydra.

Although individual gene losses have occurred in each species examined to date (Foret et al. 2010), cnidarians appear to have maintained much of the ancestral metazoan gene complement (Kortschak et al. 2003; Technau et al. 2005; Putnam et al. 2007) and there are a number of examples of genes and pathways shared by nonmetazoans and cnidarians but present only in a few bilaterians. For example, the enzymes involved in oxylipin biosynthesis, whose products are the jasmonates and volatile compounds resulting in characteristic smells of many fruits and vegetables, are also present in cnidarians, placozoans, and amphioxus but have been lost from all other animals examined to date (Lee et al. 2008). There is also evidence that cnidarians use plant-like signaling molecules such as abscisic acid (Puce et al. 2004).

Comparative analyses such as these are consistent with a genetically complex common ancestor and underscore both the significance of gene loss during evolution and the informative nature of cnidarians in terms of the ancestral gene complement. The emerging picture is that there are no universal rules, only trends to which there are always exceptions. Explanations for the absence of USPs in most deuterostomes and ecdysozoans include that they either may be functionally redundant or may fulfill primarily taxon-specific roles in those organisms that have retained them. Expression data presented here for the coral USP gene Amil10 are consistent with the idea of a taxon-specific role, but comprehensive expression analyses in a range of animals (lophotrochozoans are of particular interest) will be required to address this issue.

Conclusions

One or a few USP genes are likely to have been present in the common metazoan ancestor and, rather than their presence in some lineages being a consequence of lateral gene transfer, their absence from ecdysozoans and most deuterostomes reflects gene loss. Within the animal kingdom, there have been a number of independent lineage-specific expansions of the USP gene family, most clearly in Hydra where 22 of the 24 USP genes originate from a single retrotransposition event. The diversity of observed expression patterns implies a corresponding diversity of roles for the metazoan USPs despite these being short, single-domain proteins with moderate-to-high sequence similarity. This situation parallels that in bacteria. Explanations for the absence of USP genes in ecdysozoans and most deuterostomes include functional redundancy or the genes being recruited primarily to taxon-specific roles in those taxa that have retained them. Alternatively, the losses and expansions in this gene family could simply be the product of a stochastic birth and death process. To test this in a rigorous quantitative framework (De Bie et al. 2006) will require more data on rates of gene loss and gain in Cnidaria and Lophotrochozoa.
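The stochastic birth and death alternative mentioned above can be illustrated with a minimal simulation sketch. This is not the CAFE method of De Bie et al. (2006) and uses no data from this study; it simply simulates a linear birth-death process in which every gene copy duplicates or is lost at constant rates. The function names, the rates, the starting copy number, and the time span are all hypothetical values chosen for illustration.

```python
import random


def simulate_family_size(n0, birth, death, t_end, rng):
    """Gillespie-style simulation of a linear birth-death process.

    Each gene copy duplicates at rate `birth` and is lost at rate `death`
    (hypothetical per-copy rates, arbitrary time units). Returns the family
    size at time `t_end`; a size of 0 is absorbing (the family is lost).
    """
    if birth + death <= 0:
        return n0  # no events can occur
    n, t = n0, 0.0
    while n > 0:
        total_rate = n * (birth + death)
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            break
        # Choose the event type in proportion to its rate.
        if rng.random() < birth / (birth + death):
            n += 1  # duplication
        else:
            n -= 1  # loss
    return n


def summarize(n_lineages=2000, n0=2, birth=0.5, death=0.5, t_end=3.0, seed=1):
    """Simulate independent lineages that all start with `n0` copies.

    Returns the fraction of lineages that lost the family, the fraction
    whose family grew to at least 4 * n0 copies, and the list of sizes.
    """
    rng = random.Random(seed)
    sizes = [simulate_family_size(n0, birth, death, t_end, rng)
             for _ in range(n_lineages)]
    lost = sum(1 for s in sizes if s == 0) / n_lineages
    expanded = sum(1 for s in sizes if s >= 4 * n0) / n_lineages
    return lost, expanded, sizes


if __name__ == "__main__":
    lost, expanded, _ = summarize()
    print(f"lost: {lost:.2f}, expanded at least 4-fold: {expanded:.2f}")
```

Under these assumed settings (equal duplication and loss rates), a purely neutral process already yields a mixture of lineages that have lost the family entirely and lineages in which it has expanded, which is the qualitative pattern a formal test would need to distinguish from selection-driven retention. For equal rates, the expected fraction of lost lineages is (λt/(1+λt))^n0, which gives a simple check on the simulation.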

Supplementary Material

Supplementary table 1 and figures 1–7 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

The work was supported in Australia by grants from the Australian Research Council (ARC) via the ARC Centre of Excellence for Coral Reef Studies, the ARC Centre for the Molecular Genetics of Development, and the Discovery Grants Program (grant DP1095343), and in Germany by the Deutsche Forschungsgemeinschaft (grant DFG SFB 617-A1) and via the DFG Cluster of Excellence programs "The Future Ocean" and "Inflammation at Interfaces" (to T.C.G.B.). D.J.M. also gratefully acknowledges the receipt of a Japan Society for the Promotion of Science short-term visiting fellowship.

References

Anctil M, Hayward DC, Miller DJ, Ball EE. 2007. Sequence and expression of four coral G protein-coupled receptors distinct from all classifiable members of the rhodopsin family. Gene 392:14–21.

Augustin R, Franke A, Khalturin K, Kiko R, Siebert S, Hemmrich G, Bosch TC. 2006. Dickkopf related genes are components of the positional value gradient in Hydra. Dev Biol. 296:62–70.

Bode HR, Dunne J, Heimfeld L, Huang L, Javois L, Koizumi O, Westerfield J, Yaross M. 1986. Transdifferentiation occurs continuously in adult Hydra. Curr Top Dev Biol. 20:257–280.

Bosch TC, Augustin R, Anton-Erxleben F, et al. (17 co-authors). 2009. Uncovering the evolutionary history of innate immunity: the simple metazoan Hydra uses epithelial cells for host defence. Dev Comp Immunol. 33:559–569.

De Bie T, Cristianini N, Demuth JP, Hahn MW. 2006. CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22:1269–1271.

Deheyn DD, Kubokawa K, McCarthy JK, Murakami A, Porrachia M, Rouse GW, Holland ND. 2007. Endogenous green fluorescent protein (GFP) in amphioxus. Biol Bull. 213:95–100.

Desalvo MK, Voolstra CR, Sunagawa S, Schwarz JA, Stillman JH, Coffroth MA, Szmant AM, Medina M. 2008. Differential gene expression during thermal stress and bleaching in the Caribbean coral Montastraea faveolata. Mol Ecol. 17:3952–3971.

Druege U. 2006. Ethylene and plant responses to abiotic stress. In: Khan NA, editor. Ethylene action in plants. Berlin: Springer. p. 81–118.

Eddy SR. 1998. Profile hidden Markov models. Bioinformatics 14:755–763.

Foret S, Knack B, Houliston E, Momose T, Manuel M, Hayward DC, Ball EE, Miller DJ. 2010. New tricks with old genes: the genetic bases of novel cnidarian traits. Trends Genet. 26:154–158.

Grasso LC, Maindonald J, Rudd S, Hayward DC, Saint R, Miller DJ, Ball EE. 2008. Microarray analysis identifies candidate genes for key roles in coral development. BMC Genomics. 9:540.

Guindon S, Gascuel O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 52:696–704.

Haag ES, Sly BJ, Andrews ME, Raff RA. 1999. Apextrin, a novel extracellular protein associated with larval ectoderm evolution in Heliocidaris erythrogramma. Dev Biol. 211:77–87.

Habetha M, Bosch TC. 2005. Symbiotic Hydra express a plant-like peroxidase gene during oogenesis. J Exp Biol. 208:2157–2165.

Hemmrich G, Anokhin B, Zacharias H, Bosch TCG. 2007. Molecular phylogenetics in Hydra, a classical model in evolutionary developmental biology. Mol Phylogenet Evol. 44:281–290.

Kapitonov VV, Jurka J. 2005. RAG1 core and V(D)J recombination signal sequences were derived from Transib transposons. PLoS Biol. 3:e181.

Katoh K, Kuma K, Toh H, Miyata T. 2005. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 33:511–518.

Kerk D, Bulgrien J, Smith DW, Gribskov M. 2003. Arabidopsis proteins containing similarity to the universal stress protein domain of bacteria. Plant Physiol. 131:1209–1219.

Kortschak RD, Samuels G, Saint R, Miller DJ. 2003. EST analysis of the cnidarian, Acropora millepora, reveals extensive gene loss and rapid sequence divergence in the model invertebrates. Curr Biol. 13:2190–2195.


Kucharski R, Ball EE, Hayward DC, Maleszka R. 2000. Molecular cloning and expression analysis of a cDNA encoding a glutamate transporter in the honeybee brain. Gene 242:399–405.

Kvint K, Nachin L, Diez A, Nystrom T. 2003. The bacterial universal stress protein: function and regulation. Curr Opin Microbiol. 6:140–145.

Le SQ, Gascuel O. 2008. An improved general amino acid replacement matrix. Mol Biol Evol. 25:1307–1320.

Lee D-S, Nioche P, Hamberg M, Raman CS. 2008. Structural insights into the evolutionary paths of oxylipin biosynthetic enzymes. Nature 455:363–370.

Li L, Stoeckert CJ Jr, Roos DS. 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13:2178–2189.

Miller DJ, Hemmrich G, Ball EE, Hayward DC, Khalturin K, Funayama N, Agata K, Bosch TCG. 2007. The innate immune repertoire in Cnidaria: ancestral complexity and stochastic gene loss. Genome Biol. 8:R59.

Moroz LL, Edwards JR, Puthanveettil SV, et al. (23 co-authors). 2006. Neuronal transcriptome of Aplysia: neuronal compartments and circuitry. Cell 127:1453–1467.

Nachin L, Brive L, Persson KC, Svensson P, Nystrom T. 2008. Heterodimer formation within universal stress protein classes revealed by an in silico and experimental approach. J Mol Biol. 380:340–350.

Nachin L, Nannmark U, Nystrom T. 2005. Differential roles of the universal stress proteins of Escherichia coli in oxidative stress resistance, adhesion, and motility. J Bacteriol. 187:6265–6272.

Nystrom T, Neidhardt FC. 1992. Cloning, mapping and nucleotide sequencing of a gene encoding a universal stress protein in Escherichia coli. Mol Microbiol. 6:3187–3198.

Nystrom T, Neidhardt FC. 1993. Isolation and properties of a mutant of Escherichia coli with an insertional inactivation of the uspA gene, which encodes a universal stress protein. J Bacteriol. 175:3949–3956.

Nystrom T, Neidhardt FC. 1994. Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Mol Microbiol. 11:537–544.

O'Toole R, Smeulders MJ, Blokpoel MC, Kay EJ, Lougheed K, Williams HD. 2003. A two-component regulator of universal stress protein expression and adaptation to oxygen starvation in Mycobacterium smegmatis. J Bacteriol. 185:1543–1554.

Puce S, Basile G, Bavestrello G, Bruzzone S, Cerrano C, Giovine M, Arillo A, Zocchi E. 2004. Abscisic acid signaling through cyclic ADP-ribose in hydroid regeneration. J Biol Chem. 279:39783–39788.

Putnam NH, Srivastava M, Hellsten U, et al. (19 co-authors). 2007. Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317:86–94.

Ronquist F, Huelsenbeck JP. 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574.

Rosen B, Beddington RS. 1993. Whole-mount in situ hybridization in the mouse embryo: gene expression in three dimensions. Trends Genet. 9:162–167.

Schmidt HA, Strimmer K, Vingron M, von Haeseler A. 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502–504.

Schreiber K, Boes N, Eschbach M, Jaensch L, Wehland J, Bjarnsholt T, Givskov M, Hentzer M, Schobert M. 2006. Anaerobic survival of Pseudomonas aeruginosa by pyruvate fermentation requires an Usp-type stress protein. J Bacteriol. 188:659–668.

Shagin DA, Barsova EV, Yanushevich YG, et al. (13 co-authors). 2004. GFP-like proteins as ubiquitous metazoan superfamily: evolution of functional features and structural complexity. Mol Biol Evol. 21:841–850.

Shimodaira H, Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 16:1114–1116.

Sousa MC, McKay DB. 2001. Structure of the universal stress protein of Haemophilus influenzae. Structure 9:1135–1141.

Takahashi T, McDougall C, Troscianko J, Chen WC, Jayaraman-Nagarajan A, Shimeld SM, Ferrier DE. 2009. An EST screen from the annelid Pomatoceros lamarckii reveals patterns of gene loss and gain in animals. BMC Evol Biol. 9:240.

Technau U, Rudd S, Maxwell P, et al. (12 co-authors). 2005. Maintenance of ancestral complexity and non-metazoan genes in two basal cnidarians. Trends Genet. 21:633–639.

Thomsen S, Bosch TC. 2006. Foot differentiation and genomic plasticity in Hydra: lessons from the PPOD gene family. Dev Genes Evol. 216:57–68.

Voolstra CR, Schnetzer J, Peshkin L, Randall CJ, Szmant AM, Medina M. 2009. Effects of temperature on gene expression in embryos of the coral Montastrea faveolata. BMC Genomics. 10:627.

Wyder S, Kriventseva EV, Schroder R, Kadowaki T, Zdobnov EM. 2007. Quantification of ortholog losses in insects and vertebrates. Genome Biol. 8:R242.

Zarembinski TI, Hung LW, Mueller-Dieckmann HJ, Kim KK, Yokota H, Kim R, Kim SH. 1998. Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics. Proc Natl Acad Sci U S A. 95:15189–15193.

Zegzouti H, Jones B, Frasse P, Marty C, Maitre B, Latch A, Pech JC, Bouzayen M. 1999. Ethylene-regulated gene expression in tomato fruit: characterization of novel ethylene-responsive and ripening-related genes isolated by differential display. Plant J. 18:589–600.
