Genome Sequence of an Extremely Halophilic Archaeon

Date post: 22-Jan-2017
From: Microbial Genomes
Edited by: C. M. Fraser, T. D. Read, and K. E. Nelson © Humana Press Inc., Totowa, NJ

21

Genome Sequence of an Extremely Halophilic Archaeon

Shiladitya DasSarma

INTRODUCTION

Extreme halophiles are novel microorganisms that require 5–10 times the salinity of

seawater (ca. 3–5M NaCl) for optimal growth (1,2). They include diverse prokaryotic

species, both archaeal and bacterial, and some eukaryotic organisms. Extreme halo-

philes are found in hypersaline environments near the sea or salt deposits of marine or

nonmarine origin. Two of the largest hypersaline lakes supporting a variety of halo-

philic species are the Great Salt Lake in the western United States and the Dead Sea in

the Middle East. Some of the most interesting hypersaline environments are small arti-

ficial solar salterns used for producing salt from the sea, which are distributed through-

out the world. Many hypersaline environments exhibit gradients of increasing salinity

temporally and produce sequential growth of progressively more halophilic species,

including complex microbial mats and spectacular blooms of bright red and red-orange

colored species. These environments are important ecologically, frequently supporting

entire populations of such exotic birds as pink flamingoes, which obtain their color from

the pigmented halophilic microorganisms. A critical feature of halophilic microbes that

prevents cell lysis in hypersaline environments is their high internal concentration of

compatible solutes (e.g., amino acids, polyols, and salts), which act as osmoprotectants.

Although a wide variety of halophiles has been cultured, the genome of only a single

extreme halophile, Halobacterium sp NRC-1, has been completely sequenced thus far

(3,4). This species is a typical halophile commonly found in many hypersaline environ-

ments, including the Great Salt Lake and solar salterns. Phylogenetically, it is classified

as an archaeon, a member of the third branch of life (Fig. 1). It has a growth optimum

of 4.5M NaCl, close to the saturation point, and a high internal concentration of K+ salts.
Halobacterium NRC-1 is a mesophilic archaeon, with a temperature optimum of 42°C for
growth. Although Halobacterium species are thought to have limited physiological
capabilities, strain NRC-1 is metabolically quite versatile, growing aerobically,

anaerobically, and phototrophically. Phototrophic growth is mediated by the light-driven

proton pumping of bacteriorhodopsin, which forms a two-dimensional crystalline lattice

in the purple membrane. Halobacterium NRC-1 is also highly resistant to ultraviolet and
γ-radiation and displays sophisticated motility responses, including phototaxis, chemotaxis,

and gas vesicle-mediated flotation. One of the most notable features of Halobacterium

NRC-1, revealed by genome sequencing, is a highly acidic proteome, which is likely

essential to maintain protein solubility and function in high salinity. Significantly, this

organism is amenable to analysis using well-developed genetic methodology, including

gene knockouts, expression vectors, and complementation systems, which make Halo-

bacterium NRC-1 a good model for functional genomic studies among extremophiles

and archaea (2).

In addition to Halobacterium NRC-1, several other halophiles are the subject of

ongoing genome projects. The most notable among these are two Dead Sea archaea,

Haloarcula marismortui and Haloferax volcanii (1), which are slightly less halophilic

than Halobacterium NRC-1, with an optimum salinity of 2–3M NaCl and a high mag-

nesium ion tolerance, reflecting the salt composition of their environment. They also

display metabolic capability for growth in media containing simple sugars and carbo-

hydrates as carbon and energy sources. Several other interesting categories of halo-

philes worthy of genomic studies include alkaliphilic halophiles, which grow in soda

lakes with pH of 9.0–11.0; psychrotrophic halophiles, which grow at freezing tempera-

tures in Antarctic lakes; bacterial halophiles, which tolerate a wide range of salinity;

and eukaryotic halophiles, such as the green alga Dunaliella salina. Finally, sequencing
of a haloarchaeal strain with a chromosome nearly identical to that of strain NRC-1 is also

in progress. A listing of current genome projects on halophiles is maintained on the Halo-

phile Genomes Web site at the University of Maryland Biotechnology Institute, Center

of Marine Biotechnology (http://zdna2.umbi.umd.edu).

Fig. 1. Whole genome tree of selected archaeal organisms. Gene content phylogeny done by

neighbor-joining using the SHOT web server (19) indicates that Halobacterium is located at

the base of the archaeal branch of the phylogenetic tree.

THE HALOBACTERIUM GENOME

The genomes of Halobacterium species were originally studied a half-century ago;

they are composed of two components, a major fraction that is G+C-rich and a rela-

tively A+T-rich (58% G+C) satellite (5). Subsequent studies showed that the satellite

deoxyribonucleic acid (DNA) corresponded mainly to large heterogeneous extrachro-

mosomal replicons containing many transposable insertion sequence (IS) elements (6).

For Halobacterium NRC-1, extensive mapping revealed the presence of three replicons:

pNRC100, about 200 kbp; pNRC200, nearly twice the size of pNRC100; and a 2-Mbp

chromosome (Fig. 2) (7,8). The pNRC100 replicon was found to be partly identical to

pNRC200 and to exist as inversion isomers (7). The chromosomes of strain NRC-1 and

another wild-type strain, GRB, were compared by restriction mapping, which showed

extensive regions of similarity and a few regions with differences, including a large

inversion and an insertion. Ordered cosmid libraries representing the genomes of

Halobacterium species GRB and H. volcanii were also constructed and compared by

hybridization, which indicated the lack of any detectable conserved gene organization

(9). These and other mapping projects suggest that significant diversity exists within

the genomes of halophilic archaea.

Genome Sequencing and Analysis

Because of the high G+C composition and the large number of IS elements, the Halo-

bacterium NRC-1 genome was sequenced in two stages. Initially, the pNRC100 replicon

was sequenced by a combination of random shotgun sequencing of libraries made from

purified covalently closed circular DNA and directed sequencing of cloned and mapped

HindIII fragments (3,7). This approach permitted the assembly of an unstable replicon

that undergoes frequent DNA rearrangements, including inversion isomerization, and

contains many IS elements. Subsequently, whole genome random shotgun sequencing

was performed, providing 7.5× coverage of the relatively stable large chromosome (4).

Remaining lower-quality regions were sequenced using polymerase chain reaction frag-

ments and by primer walking. The NRC-1 genome was assembled using the Phred, Phrap,

and Consed programs, initially masking all the known and putative new IS elements,

to avoid the formation of chimeric contigs (4,10).
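As a rough illustration of what 7.5× coverage implies, the sketch below applies the idealized Lander-Waterman (Poisson) model. That model is an assumption introduced here, not something the chapter states was used, and it presumes reads fall uniformly at random, which repeated IS elements violate. The function names, read count, and read length are hypothetical.

```python
import math

CHROMOSOME_BP = 2_014_239  # large-chromosome size reported in the chapter


def coverage_from_reads(n_reads: int, read_len: int, genome_len: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome length."""
    return n_reads * read_len / genome_len


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson probability that a given base is covered by zero reads."""
    return math.exp(-coverage)


# Hypothetical numbers: 30,000 reads of 500 bp on a 2-Mbp replicon give 7.5x.
fold = coverage_from_reads(30_000, 500, 2_000_000)
frac = expected_uncovered_fraction(7.5)
print(f"fold coverage: {fold}")
print(f"expected uncovered fraction at 7.5x: {frac:.2e}")
print(f"expected uncovered bases on the chromosome: {frac * CHROMOSOME_BP:.0f}")
```

Under these idealized assumptions only a small number of bases would be expected to remain unsequenced, which is consistent with the text's description of finishing the remaining lower-quality regions by directed methods.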

The complete genome sequence of Halobacterium NRC-1 revealed a 2,571,010-bp

genome, including the 2,014,239-bp G+C-rich chromosome, and two smaller circles,

191,346-bp pNRC100, and 365,425-bp pNRC200 (Table 1; Fig. 2) (3,4). Interestingly,

pNRC100 and pNRC200 contained a 145,428-bp region of 100% identity, including

33- to 39-kb inverted repeats, which mediate inversion isomerization; the small single

copy region; and a part of the large single copy regions (Fig. 2) (7). The unique regions

of the large single copy region contained 45,918 bp for pNRC100 and 219,997 bp for

pNRC200. Glimmer (Gene Locator and Interpolated Markov Modeler) was used to iden-

tify 2,630 likely genes in the genome, of which 64% coded for proteins with significant

matches to the databases (4). In addition, 52 ribonucleic acid (RNA) genes were identi-

fied. About 40 genes in pNRC100 and pNRC200 coded for proteins likely to be essential

or important for cell viability, such as a DNA polymerase, TBP and TFB transcription

factors, and the arginyl–tRNA (transfer RNA) synthetase, suggesting that these repli-

cons should be classified as minichromosomes rather than megaplasmids (3,4).
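The replicon sizes quoted above can be cross-checked with simple arithmetic. The following minimal sketch uses only figures stated in the text; the variable names are illustrative.

```python
# Replicon sizes in bp, as reported in the text.
REPLICONS = {"chromosome": 2_014_239, "pNRC100": 191_346, "pNRC200": 365_425}

# Region shared at 100% identity by the two pNRC replicons, and the
# unique part of the large single copy region in each.
SHARED_BP = 145_428
UNIQUE_BP = {"pNRC100": 45_918, "pNRC200": 219_997}

total = sum(REPLICONS.values())
print(f"total genome: {total:,} bp")

for name, unique in UNIQUE_BP.items():
    # shared + unique should reproduce the full replicon size
    print(name, SHARED_BP + unique, REPLICONS[name])

pnrc_fraction = (REPLICONS["pNRC100"] + REPLICONS["pNRC200"]) / total
print(f"pNRC replicons as a share of the genome: {pnrc_fraction:.1%}")
```

The three replicons sum to the reported 2,571,010 bp, and the shared region plus each unique region reproduces the size of the corresponding pNRC replicon.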

Proteome Analysis

One of the most dramatic results of genome sequencing of Halobacterium NRC-1

was the finding of an extremely acidic complement of encoded proteins, which is likely

directly related to protein function in its hypersaline (>4M KCl) cytoplasm (11). Cal-

culated isoelectric points (pIs) for predicted proteins showed an average pI of approx 5,

a prediction confirmed by proteomic analysis (Fig. 3). Similarly, acidic proteomes were

predicted from partial genome sequences of two other halophiles, H. marismortui and

H. volcanii. In contrast, the average pIs of nearly all other proteomes are close to neu-

tral. Notable exceptions are Methanobacterium thermoautotrophicum, which also con-

tains both an acidic proteome and a relatively high (~1M) internal concentration of K+

ions, and three hyperthermophiles (Pyrobaculum aerophilum, Pyrococcus furiosus, and

Sulfolobus solfataricus), which have relatively basic proteomes. Homology modeling

has shown that the acidic pI of Halobacterium NRC-1 proteins is correlated with a high

concentration of surface negative charge (11). For example, a transcription factor (TbpE)

and a topoisomerase subunit (GyrA) showed a marked increase in surface negative charge

when compared to their homologs in nonhalophilic organisms (11).
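To make the idea of a calculated pI concrete, here is a minimal sketch that estimates the isoelectric point of a sequence by bisection on the Henderson-Hasselbalch net charge. The pKa values are one assumed textbook-style set (published tables differ slightly), the toy peptides are hypothetical, and this is not claimed to be the method used in the cited analysis.

```python
# Assumed side-chain and terminal pKa values; other tables differ slightly.
PKA_POS = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEG = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_NTERM, PKA_CTERM = 9.0, 2.0


def net_charge(seq: str, ph: float) -> float:
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    pos = 1.0 / (1.0 + 10 ** (ph - PKA_NTERM))   # protonated N-terminus
    neg = 1.0 / (1.0 + 10 ** (PKA_CTERM - ph))   # deprotonated C-terminus
    for aa in seq.upper():
        if aa in PKA_POS:
            pos += 1.0 / (1.0 + 10 ** (ph - PKA_POS[aa]))
        elif aa in PKA_NEG:
            neg += 1.0 / (1.0 + 10 ** (PKA_NEG[aa] - ph))
    return pos - neg


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14]."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid   # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical toy peptides: one rich in Asp/Glu, one rich in Lys/Arg.
print(round(isoelectric_point("DDEEDDEEAG"), 2))
print(round(isoelectric_point("KKRRKKAG"), 2))
```

Averaging such values over every predicted protein would give a proteome-wide pI profile of the kind shown in Fig. 3; a sequence with many acidic residues yields a low pI, in line with the acidic proteome described above.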

G+C Composition and IS Elements

Common characteristics of halophile genomes are their high G+C composition major

fraction, low G+C satellite fraction, and a preponderance of IS elements (6). For Halo-

bacterium NRC-1, the two pNRC replicons, which represent only 22% of the genome,

are substantially less G+C rich (58–59% G+C) than the large chromosome (68% G+C)

and contain a majority (69/91 or 76%) of the IS elements in the genome (Fig. 2). In addi-

tion, two regions of the chromosome are less G+C rich than average, with one 270-kbp

region (region I) containing 65% G+C and 13 IS elements and a second 150-kbp region

(region II) with 66% G+C and 4 IS elements (Fig. 2) (11). Interestingly, a 15-kbp region
on the pNRC inverted repeats is higher in G+C content (64%) than pNRC100 as a whole
(58%) and lacks any IS elements (3), indicating the occurrence of genomic regions with
diverse character in all three replicons. All together, there are 91 IS elements, which
represent 12 families in the NRC-1 genome (Table 1) (4). These findings suggest the
involvement of IS elements in DNA exchange between the replicons of Halobacterium NRC-1.

Fig. 2. (A) Circular map of the Halobacterium NRC-1 large chromosome and (B) aligned
linear genetic maps of pNRC100 and pNRC200 replicons. (A) The circular map of the large
chromosome shows the locations of IS elements (outer scale), χ-squared analysis (red line),
and G+C composition of open reading frames (black line). Colored bars associated with the
outermost circle indicate the position of the chromosomal IS elements (ISH1, beige; ISH2,
purple; ISH3, green; ISH4, yellow; ISH6, pink; ISH8, blue; ISH10, red). Roman numerals I
and II indicate AT-rich islands. (B) The circular replicons are depicted in linear forms,
with the genes and IS elements represented as blocks. The two replicons contain 145,428 bp
of identity and either 45,918 bp or 219,997 bp of unique DNA for pNRC100 and pNRC200,
respectively (3,4). The 33- to 39-kb inverted repeats are shown in yellow (conserved in all
copies) and orange (conserved in some, but not all, copies); the small single copy regions
are in purple; the common large single copy regions are in bright green; and the unique
large single copy regions are in tan (pNRC100) and light green (pNRC200). The IS elements
are shown in dark orange (ISH2), brown (ISH3), indigo (ISH5), blue (ISH7), dark green
(ISH8), teal (ISH9), red (ISH10), and blue-gray (ISH11). The pNRC replicons contain 69 IS
elements (44 unique), 29 on pNRC100 and 40 on pNRC200; with 6 elements in the inverted
repeats (repeated twice in both pNRC100 and pNRC200 each), 4 elements in the SSC region in
both pNRC100 and pNRC200, 7 elements in the common large single copy region in both
pNRC100 and pNRC200; and 23 elements in the unique large single copy regions, 6 in pNRC100
and 17 in pNRC200. (Figure 2A reproduced with permission from Cold Spring Harbor
Laboratory Press, ref. 11.)

The high G+C composition of Halobacterium NRC-1 is likely an adaptation to survi-

val under intense solar radiation (e.g., to minimize targets for thymine dimer formation).

Statistically, the number of thymine dimer sites is expected to be nearly 60% lower for

the NRC-1 large chromosome compared to a comparable size replicon of 50% G+C.

However, dinucleotide analysis indicated even fewer sites, by an additional 20%, than

predicted from the G+C content (11). The high G+C composition also results in an

extreme third-position G+C bias in the codon usage (86% G+C vs 70% and 46% in the

first two positions) (11).
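The "nearly 60% lower" estimate can be reproduced with a simple null model, sketched below. It assumes A and T are equally frequent and that neighboring bases are independent, so the expected frequency of a TT dinucleotide is the square of the T frequency; this simplification is an assumption made here, not necessarily the calculation in ref. 11. A second helper, also illustrative, computes G+C content by codon position for a coding sequence.

```python
def expected_tt_fraction(gc: float) -> float:
    """Expected TT dinucleotide frequency, assuming A = T and independent neighbors."""
    t = (1.0 - gc) / 2.0
    return t * t


def tt_reduction(gc: float, reference_gc: float = 0.5) -> float:
    """Fractional drop in expected TT sites relative to a reference G+C content."""
    return 1.0 - expected_tt_fraction(gc) / expected_tt_fraction(reference_gc)


def gc_by_codon_position(cds: str) -> tuple:
    """G+C fraction at codon positions 1, 2, and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore any trailing partial codon
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    n_codons = usable // 3
    if n_codons == 0:
        return (0.0, 0.0, 0.0)
    return tuple(c / n_codons for c in counts)


# Chromosome G+C of 67.9% (Table 1) versus a 50% G+C replicon of the same size.
print(f"expected reduction in TT sites: {tt_reduction(0.679):.1%}")

# Hypothetical three-codon example: ATG GCC GAC.
print(gc_by_codon_position("ATGGCCGAC"))
```

Under this null model the reduction comes out just under 60%, matching the statement in the text; the further 20% deficit reported from dinucleotide analysis is an observation that the model alone does not predict.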

ANNOTATION OF THE HALOBACTERIUM GENOME

The Halobacterium Genome Consortium, an international group representing 12

institutions, conducted annotation of the NRC-1 genome from summer 1999 to summer

2000. Data were released periodically, starting at 3× coverage and continuing until completion,
with a workshop held in Amherst, Massachusetts, in January 2000. This effort led to a thorough

analysis of this first halophile sequence and made it maximally useful to the commu-

nity. In the subsequent 2-year period, numerous additional genes have been identified.

The high points of the current annotation are summarized here, and a comprehensive

database is available at the Halophile Genomes web site (http://zdna2.umbi.umd.edu).

Table 1. Halobacterium NRC-1 Genome Statistics

                            Total       Chromosome  pNRC200   pNRC100
Size (bp)                   2,571,010   2,014,239   365,425   191,346
G+C composition (%)         65.9        67.9        59.2      57.9
Number of predicted genes   2,682       2,111       374       197
Coding (%)                  84          87          76        71
Number of IS elements       91          22          40        29
  ISH1                      1           1           0         0
  ISH2                      13          4           5         4
  ISH3                      23          5           10        8
  ISH4                      2           1           0         1
  ISH5                      6           0           4         2
  ISH6                      2           1           1         0
  ISH7                      4           0           2         2
  ISH8                      21          5           10        6
  ISH9                      4           0           2         2
  ISH10                     6           2           2         2
  ISH11                     7           2           3         2
  ISH12                     2           1           1         0
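
The IS element counts in Table 1 can be tallied to confirm the per-replicon totals and the share carried by the pNRC replicons. This is a minimal sketch using only the table values; the dictionary layout is illustrative.

```python
# IS element counts from Table 1: (chromosome, pNRC200, pNRC100).
IS_COUNTS = {
    "ISH1": (1, 0, 0),
    "ISH2": (4, 5, 4),
    "ISH3": (5, 10, 8),
    "ISH4": (1, 0, 1),
    "ISH5": (0, 4, 2),
    "ISH6": (1, 1, 0),
    "ISH7": (0, 2, 2),
    "ISH8": (5, 10, 6),
    "ISH9": (0, 2, 2),
    "ISH10": (2, 2, 2),
    "ISH11": (2, 3, 2),
    "ISH12": (1, 1, 0),
}

# Column sums give the number of IS elements on each replicon.
per_replicon = [sum(col) for col in zip(*IS_COUNTS.values())]
total_is = sum(per_replicon)
pnrc_share = (per_replicon[1] + per_replicon[2]) / total_is

print("chromosome, pNRC200, pNRC100:", per_replicon)
print("total IS elements:", total_is)
print(f"share on pNRC replicons: {pnrc_share:.0%}")
```

The column sums reproduce the 22, 40, and 29 elements listed in the table, for 91 in total across 12 families, with 69 of them on the two pNRC replicons.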

DNA Replication

The Halobacterium NRC-1 genome codes for a heterodimeric family D DNA polymerase
found in Archaea; many eukaryotic-like replication proteins; 2 family B DNA

polymerases, one coded by pNRC200; origin recognition and helicase recruiters (10

Orc1/Cdc6); replicative helicase (MCM); ssDNA binding proteins (6 Rfa); primases

(2 Pri); clamp loaders (RfcABC); processivity clamp (2 proliferating cell nuclear antigen

homologs); type I topoisomerase (TopA); type II topoisomerases (Top6A and Top6B);

RNA primer removal (Rad2 and RNaseH); and a few bacterial genes involved in repli-

cation, a primase (DnaG) and topoisomerase (GyrA and GyrB). Interestingly, multiple

copies of genes coding for eukaryotic origin recognition complex proteins Orc1/Cdc6

were found, including 3 scattered on the large chromosome, suggesting the possibility

of multiple replication origins (11). When analyzed for strand-specific G+C nucleo-

tide variation or G+C skew, the large chromosome of Halobacterium NRC-1 was found

to contain 4 inflection points. Two of the three orc1/cdc6 genes were located near the

inflection points, suggesting that Halobacterium NRC-1 has a novel replication system

with two separate origins of replication on the large chromosome (11).
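A common way to look for such inflection points is to compute (G − C)/(G + C) in consecutive windows along one strand and note where the sign changes. The sketch below assumes that definition of G+C skew and a fixed, non-overlapping window; the window size and function names are hypothetical, and the chapter does not specify the exact procedure used in ref. 11.

```python
def gc_skew_windows(seq: str, window: int = 1000) -> list:
    """(G - C) / (G + C) for consecutive non-overlapping windows of one strand."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_skew(skews: list) -> list:
    """Running sum of window skews; its extrema mark candidate origins and termini."""
    total, out = 0.0, []
    for s in skews:
        total += s
        out.append(total)
    return out


def sign_changes(skews: list) -> list:
    """Window indices where the skew flips sign (candidate inflection points)."""
    return [i for i in range(1, len(skews)) if skews[i - 1] * skews[i] < 0]


# Toy sequence: a G-rich half followed by a C-rich half.
toy = "G" * 20 + "C" * 20
skews = gc_skew_windows(toy, window=10)
print(skews)
print(cumulative_skew(skews))
print(sign_changes(skews))
```

On a real chromosome the windowed skew is noisy, so inflection points are usually read from the smoothed or cumulative curve and then compared with the positions of candidate origin genes such as orc1/cdc6.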

DNA Repair

The Halobacterium NRC-1 genome contains many DNA repair genes (Fig. 4), likely

necessary to repair DNA damage resulting from intense solar radiation in its environ-

ment (12). Consistent with expectations, NRC-1 displays high levels of resistance to

both ultraviolet and γ-radiation. Photoreactivation is a very efficient process in
Halobacterium, and two photolyase/cryptochrome homologs are encoded in the genome,
one of which probably functions in DNA repair. Base excision repair is likely carried
out by the Ogg, AlkA, MutY, and Nth homologs and probably by XthA, a homolog of the
endonuclease IV family of AP endonucleases, and by Ogt, a possible methylation damage
repair methylase. Halobacterium NRC-1 also encodes homologs of the bacterial excision
repair complex UvrABCD. Interestingly, the presence of some genes coding for
homologs of the eukaryotic form of excision repair (Rad2, Rad3, Rad25, and ERCC4)
suggests the existence of duplicate repair systems in NRC-1. Mismatch repair proteins
MutS1, MutS2, and MutL are found in Halobacterium NRC-1. Also present are RadA1 and
RadA2, homologs of RecA/Rad51 that likely act as recombinases; MRE11; and a Holliday
junction resolvase likely involved in homologous recombination and recombinational
repair. A homolog of the bacterial UmuC polymerase for damage bypass is found in the
Halobacterium NRC-1 genome, as is a eukaryotic adenosine triphosphate (ATP)-type DNA ligase.

Fig. 3. Average pI profiles of proteomes predicted from genome sequences: Halobacterium
sp. NRC-1 (NRC1), Haloarcula marismortui (Hma), Haloferax volcanii (Hvo), Archaeoglobus
fulgidus (Afu), Methanosarcina acetivorans (Mac), Methanococcus jannaschii (Mja),
Methanobacterium thermoautotrophicum (Mth), Pyrobaculum aerophilum (Pae), Pyrococcus
furiosus (Pfu), Sulfolobus solfataricus (Sso), Thermoplasma acidophilum (Tac), Bacillus
subtilis (Bsu), Escherichia coli K12 (Eco), Saccharomyces cerevisiae (Sce).

Transcription

Like other archaea, a simplified version of a eukaryotic RNA polymerase II–like tran-

scription system is found in Halobacterium NRC-1; it contains Rpo subunits A, C, B', B",

E', E", H, K, L, N, and M (4). In addition, a surprising finding was that the NRC-1 genome

codes for 13 copies of TBP and TFB transcription factor genes, including 5 complete and

1 partial tbp genes (4 located on pNRC100, 1 on pNRC200, and 1 on the large chromo-

some) and 7 tfb genes (2 on pNRC200 and 5 on the large chromosome) (13). These results

suggested the possibility of a novel mechanism for gene regulation using alternate TBP–TFB
combinations for promoter selection. Consistent with this hypothesis, analysis of
the genome sequence and saturation mutagenesis of the bop promoter provided evidence
for alternate TATA box sequences (14). Nearly 100 transcriptional regulators, mostly
bacterial type, have also been identified.

Fig. 4. Integrated view of the genome of Halobacterium NRC-1 (4). Aspects of energy
production, nutrient uptake, membrane assembly, cation and anion transport, and signal
transduction are depicted. ATP synthesis by chemiosmotic coupling of proton transport
by the respiratory chain and by light-driven proton pumping by bacteriorhodopsin (BR;
purple oval) or chloride transport by halorhodopsin (HR; blue oval) is shown. Below, the
semiphosphorylated Entner–Doudoroff pathway is shown, and the presence of fatty acid
oxidation and the citric acid cycle is indicated. Enzymes not yet identified are marked
with asterisks. A variety of nutrient uptake systems (represented by yellow or brown
structures) coded by the genome, including glycerol 3-phosphate (UgpABCE) and sugar
(RbsAC) ABC transporters, a lactate (LctP) transporter, formate–oxalate antiporter (OxiT),
spermidine and putrescine uptake ABC transporter (PotABCD), and amino acid (PutP, Cat)
and dipeptide (DppABCDF) transporters, are shown. Other amino acid uptake systems,
represented by a generic ABC transporter, are also likely to exist. Components of the
protein translocation machinery (SecDEFY, SRP19, SRP54, SRα) (in black) are shown.
Carotenoid and retinal (Ret) biosynthesis is shown. Cation transporters (in green) shown
are for K⁺ (TrkAH and KdpABC), Na⁺ (NhaC), Cd²⁺ (ZntX and Cd efflux ATPase), Co²⁺
(CbiNOQ), Cu²⁺ (NosFY), Fe³⁺ (iron permease and HemUV), and Zn²⁺ (ZurMA). Anion
transporters shown (in red) are for SO₄²⁻ (CysAT), PO₄³⁻ (PstABC and phosphate permease),
Cl⁻ (chloride channel), and arsenate (ArsABC). A complex system of photoreceptors and
signal transduction components is shown, including 2 sensory receptors (SRI shown in blue
and SRII shown in orange) and 17 transducers (Htr I–Htr X, Htr XII–Htr XVIII) responding
to light (hν), O₂, or amino acids, as indicated. Transmission of the motility signal to the
flagellar motor via CheAW and CheY is shown by arrows. A flagellum is depicted as a wavy
line. Single examples of sensor kinases (membrane bound [white rhombus] or cytoplasmic)
and response regulators are identified. Gas vesicles (white ovals) and DNA repair systems
are indicated within the cell.

Protein Synthesis

The translation system of Halobacterium NRC-1 has hybrid eukaryotic and bacterial

character, but like other Archaea, all of its ribosomal proteins have eukaryotic homo-

logs. Interestingly, the ribosomal protein genes of Halobacterium NRC-1 are organized

into multigene clusters that resemble operons of bacteria. In addition to the 52 RNAs

(16S, 23S, and 5S rRNAs, 47 tRNAs [transfer RNAs], 7S RNA, and RNaseP), NRC-1

has 18 different aminoacyl–tRNA synthetases coded in the genome plus the GatABC

amidotransferases for charging with glutamine and asparagine (4). Interestingly, one

aminoacyl–tRNA synthetase, ArgRS, closely related to the bacterial and yeast mito-

chondrial enzymes, is coded by pNRC200.

For protein secretion, the Halobacterium NRC-1 general secretory (Sec) machinery

is a hybrid of eukaryotic and bacterial systems. Sec61α, Sec61γ, SRP54, SRP19, and the

7S RNA are related to the corresponding eukaryotic factors, while FtsY, SecD, and SecF

(but not SecA) are related to the bacterial factors (4). In addition to the Sec system,

recent bioinformatic analysis has suggested that the twin-arginine (Tat) protein export

pathway used for secretion of mainly redox proteins in bacteria is also present in NRC-1
and may be commonly used in this archaeon (15,16).

Cell Envelope

Halobacterium NRC-1 cells are surrounded by a single lipid bilayer membrane and

an S layer assembled from the cell surface glycoprotein. The cytoplasm is in osmotic

equilibrium with the hypersaline environment, with a correspondingly high intracel-

lular K+ concentration that may be equivalent to the external Na+ concentration. Like

other Archaea, the polar lipids are based on archaeol, a glycerol diether lipid containing

phytanyl chains derived from C20 isoprenoids. The Halobacterium NRC-1 genome con-

tained all of the key enzyme genes of isoprenoid synthesis, including HMG–coenzyme

A reductase (MvaA), the target of the growth inhibitor mevinolin (4). To maintain the

ionic balance, NRC-1 encodes multiple K+ transporters, including KdpABC, an ATP-driven

K+ transport system, and TrkAH, a low-affinity K+ transporter driven by the membrane

potential (Fig. 4). Active Na+ efflux is likely mediated by NhaC proteins, which are
unidirectional Na+/H+ antiporters. Interestingly, genes coding for KdpABC and copies of TrkA
(three of five) and NhaC (one of three) are found on pNRC200. In addition, active
transporters were identified for uptake of cationic amino acids (Cat), proline (PutP),
dipeptides (DppABCDF), oligopeptides (AppACF), and sugars (Rbs); for removal of
heavy metals (arsenite and cadmium) and other toxic compounds (multidrug resistance
homologs); and for phosphate, with multiple copies of the PstABC system and phosphate
permease.

Purple Membrane

Halobacterium NRC-1 contains purple membrane, a two-dimensional crystalline lattice

of the light-driven proton pump, bacteriorhodopsin, a complex of a protein, bacterio-opsin,
and a chromophore, retinal (Fig. 4). Under high-illumination conditions, cells

can grow phototrophically, a capability recently recognized in planktonic bacteria (12).

Five purple membrane regulon genes, which are clustered on the chromosome and coor-

dinately regulated, were identified, including bop, specifying bacteriorhodopsin; crtB1

and brp, coding for the first and last committed steps of retinal synthesis, respectively; blp, a

gene of unknown function; and bat, the sensor–regulator (14). The bat gene product

(Bat) is a member of a small gene family, containing a GAF (cGMP-binding) domain,

PAS/PAC (redox-sensing) domain, and DNA-binding helix-turn-helix motif, which

likely binds a UAS (upstream activator sequence) for gene activation. The bop

gene TATA box sequence deviates from the consensus archaeal promoter sequence, sug-

gesting the involvement of novel factors, such as alternate TBP and TFB proteins, in its

transcription (14).

Taxis and Signal Transduction

Halobacterium species are highly chemotactic and phototactic, with both chemical

gradients and gradients of light intensity or color modulating their swimming behavior.

A large number of taxis genes have been identified, including sopI and sopII, coding

for the phototaxis receptors SRI and SRII, which are in the bacteriorhodopsin family
(which also includes halorhodopsin, a chloride pump) (Fig. 4) (12). SRI mediates attractant
responses to orange light and repellent responses to near-ultraviolet light, while SRII

is a blue light repellent photoreceptor. Interestingly, homologs of haloarchaeal rhodopsins

have recently been found in the genomes of fungi, algae, marine bacteria, and cyanobac-

teria (12). A total of 17 htr genes coding for integral membrane proteins homologous to

bacterial chemotaxis receptors were found, as was a complete set of che genes encoding
chemotaxis determinants. There are 6 flagellin genes and an archaeal-type flagellar

apparatus (16). A large gene cluster, flaD-K, codes the archaeal flagellar apparatus,

with flaD, flaE, flaG, flaH, flaI, and flaJ similar to other archaea and only flaK resem-

bling a bacterial flagellar regulator. Two-component regulatory systems are evident in

the Halobacterium NRC-1 genome, including 6 response regulator genes and 14 histi-

dine kinases. The Halobacterium NRC-1 genome revealed the presence of several possi-

ble circadian photoregulators, including a eukaryotic cryptochrome and a cyanobacterial

KaiC-like protein, consistent with a circadian rhythm in this phototrophic microbe (12).

Gas Vesicles

Halobacterium species, like many photosynthetic aquatic prokaryotes, possess the

ability to regulate buoyancy by the synthesis of gas-filled vesicles (Fig. 4). The require-

ments for gas vesicle formation have been extensively studied in NRC-1 by genetic analy-

sis (17). A cluster of genes, gvpMLKJIHGFEDACN(O), present on both pNRC100 and

pNRC200 in NRC-1 was shown to be necessary and sufficient for wild-type gas vesicle

synthesis. Interestingly, the genome sequence of Halobacterium NRC-1 also revealed

a silent, but nearly complete, gvp gene cluster, lacking only gvpM, on pNRC200 (4,12).

Carotenoids and Retinal

Halobacterium produces red-orange carotenoids that are essential for phototransduction
and protection against photodamage, the most abundant being bacterioruberins
(Fig. 4). Genes encoding bacterial-type phytoene synthases, crtB1 and crtB2, have been
identified in Halobacterium NRC-1, and the subsequent desaturation steps are likely
coded by crtI1, crtI2, and crtI3 (4). Genes coding for the subsequent conversion
to bacterioruberin have not yet been identified. In a branch of the carotenoid
pathway, lycopene is cyclized by the crtY gene product to form β-carotene, which is
oxidatively cleaved to form retinal by the brp and blh gene products (Fig. 4) (18). For certain

steps of the carotenoid biosynthetic pathway, multiple genes may exist in Halobacter-

ium NRC-1, and these may be differentially regulated by light or oxygen.

Energy Metabolism

Halobacterium NRC-1 can grow chemoorganotrophically, either aerobically or

anaerobically, and has phototrophic capability using bacteriorhodopsin. Halobacterium requires

all but 5 of the 20 amino acids for growth, and several amino acids may be used as a

source of energy. Aerobically, arginine and aspartate can be used via the citric acid cycle;

anaerobically, arginine can be used via the arginine deiminase pathway, coded by the

arcRACB genes on pNRC200 (Fig. 4) (3). Genes for a gluconeogenic pathway for car-

bohydrate synthesis during growth on amino acids and nearly all genes for a reverse

Embden–Meyerhof glycolytic pathway are present. Although Halobacterium is reported

to be unable to metabolize sugars, a sugar uptake transporter and genes coding for glucose

dehydrogenase and 2-keto-3-deoxygluconate kinase, enzymes of a semi-phosphorylated

Entner–Doudoroff pathway, are present in Halobacterium NRC-1. The genes for gluconeogenesis

and catabolism of glyceraldehyde 3-phosphate (produced by glucose catabolism) to

pyruvate are also present. Halobacterium NRC-1 also possesses genes encoding enzymes of

the bacterial-like fatty acid β-oxidation pathway and a 2-oxoacid dehydrogenase complex.

EVOLUTION AND LATERAL GENE TRANSFERS

Halobacterium NRC-1 is an organism of evolutionary interest that is distantly related

to some methanogens and is classified as a euryarchaeote based on the 16S rRNA

sequence. After complete sequencing, the Halobacterium NRC-1 genome was com-

pared to 11 other complete genomes by gene content analysis using the DARWIN suite

of programs (4). The results confirmed the archaeal status of NRC-1, with the closest

relatives being Archaeoglobus fulgidus and Methanococcus jannaschii. Interestingly,

however, similarities were also noted to the Gram-positive bacterium, Bacillus subtilis,

and the radiation-resistant bacterium, Deinococcus radiodurans. More recently, whole

genome analysis using a larger number of completed genomes showed Halobacterium

NRC-1 to branch at the root of the archaeal tree (Fig. 1) (19). The discrepancy between

the 16S rRNA and whole genome trees requires a more detailed investigation because

it suggests the possibility of the appearance of halophiles at a very early point in

evolution. However, an additional possibility is that the position of NRC-1 in whole genome

trees is distorted, with Halobacterium pulled away from the other archaea and toward

the bacteria as a consequence of many lateral gene transfers from bacteria.
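
The idea behind a gene content comparison can be illustrated with a minimal sketch. This is not the DARWIN procedure used in (4); it simply assumes that each genome has been reduced to a set of ortholog family identifiers and scores a pair of genomes by the fraction of families they share (Jaccard similarity). The genome names and family identifiers below are hypothetical placeholders, not real data.

```python
def gene_content_similarity(families_a, families_b):
    """Jaccard similarity between two sets of ortholog family IDs."""
    if not families_a and not families_b:
        return 0.0
    shared = families_a & families_b
    combined = families_a | families_b
    return len(shared) / len(combined)


def rank_relatives(query, genomes):
    """Rank all other genomes by gene content similarity to the query."""
    scores = [
        (name, gene_content_similarity(genomes[query], families))
        for name, families in genomes.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: family IDs are invented for illustration only.
genomes = {
    "halophile": {"fam1", "fam2", "fam3", "fam4"},
    "archaeon_A": {"fam1", "fam2", "fam3", "fam5"},
    "bacterium_B": {"fam3", "fam4", "fam6", "fam7"},
}

for name, score in rank_relatives("halophile", genomes):
    print(f"{name}: {score:.2f}")
```

In a measure of this kind, every laterally acquired bacterial gene family raises the score between the recipient and bacterial genomes, which is one way the position of an organism in a gene content tree could be pulled toward the bacteria.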

A comprehensive analysis of gene histories of Halobacterium NRC-1 has recently

been conducted (S. P. Kennedy and S. DasSarma, unpublished). Detailed phylogenetic

analysis of proteins catalogued as having bacterial phylogenies in the National Center

for Biotechnology Information Clusters of Orthologous Groups database was carried


out. In addition, bacterial-like genes clustered together in the genome and coding for

specific metabolic pathways were also subjected to phylogenetic analysis. Based on this

analysis, several hundred proteins, including biosynthetic, transport, and energy systems

(e.g., histidine utilization, purine metabolism, glycerol utilization) and components of

the electron transport chain were found to display clear bacterial histories. These genes

are likely to have been acquired in this halophile by lateral gene transfers. Surprisingly,

no physical link was observed with IS elements for these bacterial genes, suggesting that

the genes were acquired at an early point in evolution, and any vestige of the

recombinational activity underlying their acquisition has since been lost. Although the mechanisms

responsible for interdomain genetic exchanges are unknown, the finding of hundreds

of bacterial genes in NRC-1 likely reflects the long-term opportunity for exchanges

between halophilic bacteria and archaea cohabiting hypersaline environments over evo-

lutionary time. In this respect, NRC-1 is similar to some other mesophilic archaea (20)

and hyperthermophilic bacteria (21) in having large numbers of horizontally acquired

genes in its genome.

Acquisition of Respiratory Chain Components

Two of the most interesting cases of possible lateral gene transfers into Halobacter-

ium NRC-1 are the genes encoding electron transport chain factors and biosynthetic

proteins (11). Ten nuo genes, encoding subunits of NADH dehydrogenase, along with 3

cox genes, encoding subunits of cytochrome-c oxidase, are clustered together into prob-

able operons, as are 6 men genes, for menaquinone biosynthesis. Interestingly, the nuo

gene order is conserved with respect to Escherichia coli, with closest branching to Syne-

chocystis sp PCC6803; the men gene order is conserved with respect to both E. coli and

D. radiodurans, with closest branching to B. subtilis. Moreover, the G+C analysis of

these two groups of genes showed they were distinguishable from the average chromo-

somal genes (64 or 73% compared with 68%). These results point to the interesting pos-

sibility that adaptation of halophiles to an oxidizing atmosphere occurred via the acqui-

sition of electron transport chain components from aerobic bacteria through lateral transfer

events. Further analysis is necessary to determine whether such transfers of respiratory

genes have occurred once or repeatedly in the evolution of the diversity of modern

halophiles.
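
The G+C comparison mentioned above can be sketched as follows. This is a minimal illustration, not the analysis performed in (11): it assumes gene sequences are available as plain DNA strings and compares the pooled G+C percentage of a gene group with a reference value such as the 68% chromosomal average quoted in the text. The function names and the short example sequences are hypothetical.

```python
def gc_content(seq):
    """Percent G+C of a DNA sequence, ignoring characters other than A, C, G, T."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def gc_deviation(gene_seqs, reference_percent):
    """Pooled (length-weighted) G+C percent of a gene group minus a reference value."""
    pooled = "".join(gene_seqs)
    return gc_content(pooled) - reference_percent


# Hypothetical toy sequences, far shorter than real genes.
example_cluster = ["GGCCGCGA", "GCGCATGC"]
chromosomal_average = 68.0  # percent, as quoted in the text

print(f"cluster G+C: {gc_content(''.join(example_cluster)):.1f}%")
print(f"deviation from average: {gc_deviation(example_cluster, chromosomal_average):+.1f}")
```

A gene group whose pooled G+C content differs clearly from the chromosomal average, in either direction, is a candidate for a foreign origin, although composition alone does not establish lateral transfer.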

Evolution of Purple Membrane

Retinal-containing chromoproteins like bacteriorhodopsin in purple membrane and

sensory rhodopsins have recently been discovered in diverse bacteria and eukaryotes

and are therefore present in all three branches of life, Archaea, Bacteria, and Eukarya

(12,22). Although the evolutionary origin of retinal chromoproteins is unclear at present,

their wide distribution in nature is consistent with horizontal transmission. An interest-

ing further speculation is that primordial rhodopsins were an early evolutionary inven-

tion and may have been responsible for the original dominant form of phototrophy in the

sea, pre-dating chlorophyll-based photosynthesis. Such early phototrophs, with the rela-

tively simple capacity for coupling transmembrane light-driven proton pumping to adeno-

sine triphosphate synthesis (22,23), could have arisen in a reducing atmosphere (although

a small quantity of oxygen would have been necessary for the synthesis of retinal). Evolu-

tion of organisms with more complex chlorophyll-based photosynthetic systems operating


with great efficiency could subsequently have displaced purple membrane–containing

organisms from most environments. Interestingly, the complementarity of the spectra

for purple membrane, with a peak at 568 nm, and photosynthetic membranes, with a

trough at this same wavelength, is striking (Fig. 5) and is consistent with coevolution

of the two types of membranes. Moreover, both chlorophyll-based cyanobacteria and

purple membrane–based haloarchaea still coexist in modern hypersaline environments,

with the former dominating at relatively lower salinity and the latter dominating at

saturating salinity.

Evolution of pNRC Replicons

The genome organization of Halobacterium NRC-1, with a large chromosome and

two related extrachromosomal replicons, is both complex and intriguing. One possible

reason for the maintenance of multiple replicons, including pNRC100 and pNRC200,

is that they have captured some essential genes and are therefore required for viability.

The compatibility between these related replicons may be explained by the presence of

multiple origins of replication of different compatibility groups (3,24). Because dozens

of copies of IS elements are present on these replicons, the transposable elements are

likely responsible for frequently promoting exchanges of DNA between them. Moreover,

once an extrachromosomal replicon is established in two or more copies, continued

DNA exchanges between individual copies of the smaller replicons and the large

chromosome could result in generation of additional genomic diversity.

Fig. 5. Ultraviolet-visible spectra of Halobacterium NRC-1 purple membrane (PM), red

membrane (RM), and photosynthetic membrane (PM). Purple membrane and red membrane were

separated on a sucrose gradient, and spectra were plotted with photosynthetic membrane. The

complementarity of purple and photosynthetic membrane spectra is apparent, consistent with

coevolution of the two membranes.

Such a possible scheme has been proposed for the evolution of pNRC100, including

multiple replicon fusions of precursor plasmids, followed by the acquisition of chromo-

somal genes, with both processes mediated by IS elements (3). The duplication of a por-

tion of a pNRC100 precursor replicon through unequal crossing over of two IS element

pairs would have resulted in the formation of inverted repeats, which subsequently would

serve to stabilize the region within the repeats and create inversion isomers. Through

such processes, essential genes may have been captured from the chromosome and

stabilized on the pNRC100 and pNRC200 replicons, resulting in their achievement of

minichromosome status.

The existence of multiple minichromosome replicons with the capability to acquire

new genes and harboring multiple essential genes is a highly novel character of the Halo-

bacterium NRC-1 genome. As a result, the NRC-1 genomic condition may be one of a

competitive dynamic equilibrium between several essential replicons in the genome.

Such a condition may arise from time to time in evolution and subside for intervening

periods through reduction in numbers by replicon fusions. The heterogeneity of mini-

chromosomes among Halobacterium strains is testament to such underlying dynamic

processes (25). Given these findings in Halobacterium NRC-1, it is not inconceivable

that competition between replicons is a general phenomenon in evolution and may play

an important role in shaping the long-term evolution of prokaryotic genomes, includ-

ing the evolution of new chromosomes from plasmids.

FUTURE PROSPECTS

The complete sequence of Halobacterium NRC-1 has provided an excellent platform

for evolutionary and comparative genomic analysis of an extremely halophilic archaeon

(4,11). In NRC-1, one of the few sequenced mesophilic archaea, which coinhabits a dynamic

environment populated by a multitude of bacteria, hundreds of genes with bacterial or

uncertain histories have been uncovered. Additional genomic studies of diverse halo-

philes (1) (e.g., marine haloarchaea) are necessary to provide a significantly better under-

standing of the evolutionary position of these novel microorganisms. The finding of

large dynamic extrachromosomal replicons, containing both essential genes and a large

number of IS elements, has suggested the occurrence of multiple chromosomes that may

compete for genes (3).

In addition to evolutionary insights, the ease of culture and the wide range of biolog-

ical responses of halophiles promise significant opportunities in functional genomics

and biotechnology. DNA arrays, proteomics, and gene knockouts are all approaches

available for further studies of Halobacterium biology (2,26). The recent use of a whole

genome microarray to study purple membrane expression illustrates the power of

functional genomic approaches and reminds us of the need to adhere to established rigorous

genetic practices in the postgenomic era (27,28). Significantly, halophilic archaea serve

as excellent models for fundamental aspects of eukaryotic biology (e.g., DNA replica-

tion, transcription, and translation). Finally, halophilic proteins and complexes, many

of which are extremely novel, provide genuine future opportunities for biotechnology,

including the development of new vaccines and antibiotics (29,30).


ACKNOWLEDGMENTS

Studies of haloarchaeal genomics in my laboratory have been generously supported

by the National Science Foundation. I wish to thank many current and former students

and associates and collaborators in the Halobacterium Genome Consortium who pro-

vided much of the information collected in this chapter. Special thanks are given to Dr.

Philip Harriman for support and encouragement.

REFERENCES

1. DasSarma S, Arora P. Halophiles. In: Encyclopedia of Life Sciences. London: Macmillan, 2000,

pp. 458–466.

2. DasSarma S, Robb FT, Place AR, et al. (eds). Archaea: A Laboratory Manual—Halophiles.

Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press, 1995.

3. Ng W-L, Ciufo SA, Smith TM, et al. Snapshot of a large dynamic replicon from a halophilic

archaeon: megaplasmid or minichromosome? Genome Res 1998; 8:1131–1141.

4. Ng WV, Kennedy SP, Mahairas GG, et al. Genome sequence of Halobacterium species NRC-1.

Proc Natl Acad Sci USA 2000; 97:12,176–12,181.

5. Joshi JG, Guild WR, Handler P. The presence of two species of DNA in some halobacteria.

J Mol Biol 1963; 6:34–38.

6. Charlebois RL, Doolittle WF. Transposable elements and genome structure in halobacteria. In:

Berg DE, Howe MM (eds). Mobile DNA. Washington, DC: American Society for Microbiology,

1989, pp. 297–307.

7. Ng W-L, Kothakota S, DasSarma S. Structure of the large gas vesicle plasmid in Halobacterium

halobium: inversion isomers, inverted repeats, and insertion sequences. J Bacteriol 1991; 173:

1958–1964.

8. Hackett NR, Bobovnikova Y, Heyrovska N. Conservation of chromosomal arrangement among

three strains of the genetically unstable archaeon Halobacterium species. J Bacteriol 1994; 176:

7711–7718.

9. St Jean A, Charlebois RL. Comparative genomic analysis of the Haloferax volcanii DS2 and

Halobacterium sp GRB contig maps reveals extensive rearrangement. J Bacteriol 1996; 178:

3860–3868.

10. Gordon D, Abajian C, Green P. Consed: a graphical tool for sequence finishing. Genome Res

1998; 8:195–202.

11. Kennedy SP, Ng WV, Salzberg SL, Hood L, DasSarma S. Understanding the adaptation of Halo-

bacterium species NRC-1 to its extreme environment through computational analysis of its genome

sequence. Genome Res 2001; 11:1641–1650.

12. DasSarma S, Kennedy SP, Berquist B, et al. Genomic perspective on the photobiology of Halo-

bacterium species NRC-1, a phototrophic, phototactic, and UV-tolerant haloarchaeon. Photosyn

Res 2001; 70:3–17.

13. Baliga NS, Goo YA, Ng WV, Hood L, Daniels CJ, DasSarma S. Is gene expression in Halo-

bacterium NRC-1 regulated by multiple TBP and TFB transcription factors? Mol Microbiol 2000;

36:1184–1185.

14. Baliga NS, Kennedy SP, Ng WV, Hood L, DasSarma S. Genomic and genetic dissection of an

archaeal regulon. Proc Natl Acad Sci USA 2001; 98:2521–2525.

15. Bolhuis A. Protein transport in the halophilic archaeon Halobacterium sp NRC-1: a major role

for the twin-arginine translocation pathway? Microbiology 2002; 148:3335–3346.

16. Patenge N, Berendes A, Engelhardt H, Schuster SC, Oesterhelt D. The fla gene cluster is

involved in the biogenesis of flagella in Halobacterium. Mol Microbiol 2001; 41:653–663.


17. DasSarma S, Arora P. Genetic analysis of the gas vesicle gene cluster in haloarchaea. FEMS

Microbiol Lett 1997; 153:1–10.

18. Peck RF, Echavarri-Erasun C, Johnson EA, et al. brp and blh are required for synthesis of the

retinal cofactor of bacteriorhodopsin in Halobacterium. J Biol Chem 2001; 276:5739–5744.

19. Korbel JO, Snel B, Huynen MA, Bork P. SHOT: a web server for the construction of genome

phylogenies. Trends Genet 2002; 18:158–162.

20. Deppenmeier U, Johann A, Hartsch T, et al. The genome of Methanosarcina mazei: evidence for

lateral gene transfer between bacteria and archaea. J Mol Microbiol Biotechnol 2002; 4:453–461.

21. Nelson KE, Clayton RA, Gill SR, et al. Evidence for lateral gene transfer between archaea and

bacteria from genome sequence of Thermotoga maritima. Nature 1999; 399:323–329.

22. Beja O, Aravind L, Koonin EV, et al. Bacterial rhodopsin: evidence for a new type of photo-

trophy in the sea. Science 2000; 289:1902–1906.

23. Racker E, Stoeckenius W. Reconstitution of purple membrane vesicles catalyzing light-driven

proton uptake and adenosine triphosphate formation. J Biol Chem 1974; 249:662–663.

24. Ng WL, DasSarma S. Minimal replication origin of the 200-kilobase Halobacterium plasmid

pNRC100. J Bacteriol 1993; 175:4584–4596.

25. Ng W-L, Arora P, DasSarma S. Large deletions in class III gas-vesicles deficient mutants of

Halobacterium. Sys Appl Microbiol 1994; 16:560–568.

26. Peck RF, DasSarma S, Krebs MP. Homologous gene knockout in the archaeon Halobacterium

with ura3 as a counterselectable marker. Mol Microbiol 2000; 35:667–676.

27. Baliga NS, Pan M, Goo YA, et al. Coordinate regulation of energy transduction modules in

Halobacterium sp analyzed by a global systems approach. Proc Natl Acad Sci USA 2002; 99:

14,913–14,918.

28. DasSarma S. Biology reports Ltd. faculty of 1000 commentary. Available at:

http://www.facultyof1000.com/article/12403819. Accessed January 8, 2003.

29. Stuart ES, Morshed F, Sremac M, DasSarma S. Antigen presentation using novel particulate

organelles from halophilic archaea. J Biotechnol 2001; 88:119–128.

30. Hansen JL, Ippolito JA, Ban N, Nissen P, Moore PB, Steitz TA. The structures of four macro-

lide antibiotics bound to the large ribosomal subunit. Mol Cell 2002; 10:117–128.

