Chapter 2: PRNP and PrP
In this Chapter, I first discuss features of the prion protein gene PRNP and of vertebrate prion proteins. I then describe the background and logic of my strategy for discovering PRNP homologues. Finally, I outline the arguments for the Kangaroo Genome Project and the reasons why I used the tammar wallaby PRNP in comparative genomic analysis.
2.1 Vertebrate Prion Proteins
The known members of the vertebrate prion protein family range from fish to mammals. The NCBI Entrez Protein database contains 79 mammalian PrPs (78 eutherian and 1 marsupial), 14 bird PrPs, 2 reptile PrPs, 1 amphibian PrP and 4 fish PrP homologue sequences (Figure 2.1).
The general sequence features of vertebrate PrPs are conserved. Prion proteins can be provisionally divided into four regions with distinct amino acid composition: the basic region (region 1), the repeats or low-complexity sequence (region 2), the hydrophobic region (region 3) and the C-terminal region (region 4). However, the primary sequences of these regions differ between vertebrate classes (Chapter 6.3).
Most notably, PrPs conserve the middle hydrophobic sequence, one disulfide bond and two N-glycosylation sites in the C-terminal domain, and the N- and C-terminal signal sequences for extracellular export and GPI-anchor attachment. By contrast, the N-terminal repeat region is variable in both repeat motif length and sequence, and is entirely absent in frog PrP.
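The N-glycosylation sites referred to above are conventionally predicted from the sequon Asn-X-Ser/Thr, in which X is any residue except proline. A sequon is necessary but not sufficient for glycosylation, so a scan of this kind yields only candidate sites. The Python sketch below illustrates such a scan under that assumption; the function name n_glyc_sequons and the toy sequence are hypothetical and do not come from the studies cited in this chapter.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead (?=...) lets overlapping sequons (e.g. "NNST") both be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy sequence: NIT qualifies, NPS is excluded by the proline rule, NFT qualifies.
print(n_glyc_sequons("MKNITQNPSANFT"))  # [3, 11]
```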
The species barrier in prion transmission is determined in part by the sequence similarity between host PrPC and exogenous PrPSc (Chapter 1.2). The pattern of variability and conservation among prion protein sequences is therefore important for assessing the risk of prion transmission between species.
Figure 2.1. Overall structures of PrP, stPrP, PrP-like and Sho proteins showing: S, signal sequence; B, basic region; H, hydrophobic region; R/PGH, PGH-rich repeats; R/GH, GH-rich repeats; B,R/RG, RG-rich basic repeats; B,R, basic repeats; N, N-glycosylation site; S-S, disulfide bridge; GPI, glycosylphosphatidylinositol anchor; GY and GYH, GY- and GYH-rich regions. Regions and attachment positions are approximately to scale. Numbers indicate the first residue of each section and the last residue of each protein. A, Mammalian, avian and reptilian PrPs; numbers refer to human; the additional N site in avian PrPs is in italics. B, Xenopus laevis PrP. C, Fugu, Tetraodon and salmon stPrP; numbers refer to PrP-461. D, Fugu and Tetraodon PrP-like; numbers refer to Fugu. E, Zebrafish PrP-like. F, Zebrafish and Fugu Sho (Chapter 4); the arrow indicates the insertion region in Fugu; numbers refer to zebrafish. G, Mammalian Sho (Chapter 4); numbers refer to human.
The discovery and characteristics of mammalian PrPs are described in Chapter 1.2. Here I summarize the history of the discovery of PrPs in other species and outline major analyses of PrPs.
Harris et al. (1991) isolated a cDNA coding for chicken PrP. This protein showed 33% identity with mouse PrP, but the middle hydrophobic sequence, glycosylation sites (with a third site unique to birds), disulphide bridge and GPI anchor were conserved. The proximal repeats of bird and mammalian PrPs, however, showed marked differences (Chapter 6.3). Chicken PRNP mRNA levels in brain increased during postnatal development, in parallel with the levels of choline acetyltransferase mRNA.
To better understand the species barrier that determines prion transmission between humans and other primates, Schätzl et al. (1995) compared human PrP with 25 monkey and ape PrPs. The most prominent difference among these PrPs was in the number of proximal repeats: relative to human PrP, the orang-utan, African green monkey and spider monkey PrPs had one fewer repeat, whereas the squirrel monkey PrP had one additional repeat. Variation in residues 90-130 (the PrPC-PrPSc interface; Chapter 1.2.5) could influence transmission of human prions to apes. However, this analysis also indicated that residues outside this region are involved in the species barrier. When the bovine, sheep, mink, rat, mouse, Armenian hamster, Chinese hamster and Syrian hamster PrPs were included in the alignment, approximately one third of amino acid positions differed. The genomic organization of the PRNP gene was identical in all of these species (Chapter 2.2).
Windl et al. (1995) sequenced the first marsupial PrP. It showed overall conservation with other mammalian prion proteins (80% identity), but the composition of its proximal repeats differed (Chapter 6.3).
Wopfner et al. (1999) expanded the number of known mammalian PrPs to 46 and the number of avian PrPs to 9, and then analysed the regions of PrP that control the species barrier. Structural regions (Chapter 1.3) were conserved among mammals, as were functional positions, including the two glycosylation sites, the two cysteines and a serine residue (amino acid 231 in human) that is the attachment site for the GPI anchor. Even minor differences between PrPs could strongly affect disease transmission: dog and cat PrPs differ at only two residues, as do ferret and mink PrPs, yet dog and ferret are resistant whereas cat and mink are susceptible to prion infection. PrPs were also highly conserved among bird species (roughly 90% identity), but avian PrPs showed only 30% overall identity with mammalian PrPs. The only PrP region invariably conserved between mammals and birds was the middle hydrophobic sequence spanning residues 110-128 in human PrP (Chapter 1.2.5). The proximal repeats of avian and mammalian PrPs differ, perhaps reflecting different evolutionary pressures.
Van Rheede et al. (2003) studied the molecular evolution of the eutherian prion protein. To include representatives of major clades from all 18 eutherian orders in the analysis, they sequenced 26 new eutherian PrPs. Glycosylation sites, disulphide bridge, hydrophobic region, elements of secondary structure and signal peptides are all conserved among eutherian PrPs (Figure 2.1). The number of repeats in eutherian PrPs varies from two in squirrel (and in lemur; Gilch et al., 2000) to seven in gymnure and leaf-nosed bat. Deviations from the repeat consensus sequence were observed, as well as repeat homogenization. Not all of the repeat histidines implicated in copper binding (Chapter 2.5.2) are conserved across eutherians. Expansion and contraction of repeats is a frequent mutational process in eutherian PRNP; I show in Chapter 6.3 that this also holds for marsupials.
Simonic et al. (2000) cloned a turtle PRNP cDNA. The encoded 270-residue protein showed 40% identity with mammalian and 58% identity with avian PrPs. Its N-terminal part contained ten tandem hexarepeats whose composition differed from that of the bird and mammalian repeats (Chapter 6). Homology modelling of the turtle PrPC C-terminal region suggested that turtle PrP could adopt the same fold as mammalian PrP.
Strumbo et al. (2001) reported the 216-residue sequence of X. laevis PrP (Figure 2.1). This amphibian PrP showed greater identity with avian and turtle PrPs (more than 44%) than with mammalian PrP (about 28%). The major surprise was the absence of repeats in the N-terminal part of the protein. The conserved hydrophobic sequence was four residues shorter than in other PrPs.
Suzuki et al. (2002) reported a new gene in three fish species encoding a protein with similarities to PrP (Figure 2.1). A cDNA cloned from Fugu rubripes coded for a protein of 180 amino acids. This protein was named PrP-like because it shared the conserved middle hydrophobic sequence and other features of PrP, including its basic nature, a predicted N-terminal signal sequence and a GPI-anchor attachment site. As in mammalian PRNP, the complete coding region lies within a single exon (Chapter 2.2). However, Fugu PrP-like lacked the repeats, disulfide bridge and glycosylation sites of other vertebrate PrPs, and had a different C-terminal domain. Two other fish PrP-like sequences were discovered in Tetraodon nigroviridis and zebrafish.
Two more fish genes were later reported to encode proteins with structural features similar to PrP (Rivera-Milla et al., 2003; Oidtmann et al., 2003). Firstly, Rivera-Milla et al. (2003) reported a cDNA from Fugu rubripes encoding a protein of 461 amino acids (PrP461) that contained the conserved hydrophobic region and a C-terminal domain similar to those of other vertebrate PrPs, including the disulfide bridge and N-glycosylation sites, one of which is conserved with other PrPs. However, it had a greatly expanded repeat region. Sequence similarity between Fugu PrP461 and mammalian PrPs was 22%. The same Fugu rubripes protein, discovered independently by Oidtmann et al. (2003), was named stPrP-1. Its length differs (450 amino acids) because of the inclusion of an additional small (30 bp) intron in its ORF (Chapter 2.2). A 605-amino-acid orthologue from Atlantic salmon (Salmo salar), longer because of an expanded repeat region, was also described by Oidtmann et al. (2003). A homologous stPrP-1 from Tetraodon was found in public genomic data (Rivera-Milla et al., 2003; Oidtmann et al., 2003).
Secondly, Oidtmann et al. (2003) reported a cDNA from a third related Fugu gene, which they named stPrP-2. This protein is closely related to stPrP-1 and has the same sequence features: a hydrophobic region disrupted by charged residues, a C-terminal domain with the disulfide bridge and three N-glycosylation sites, and an expanded repeat region. Fugu stPrP-1, Fugu stPrP-2, salmon stPrP-1 and Fugu PrP-like were estimated to show 24.8%, 21.3%, 17.7% and 16.3% identity, respectively, with human PrP 27-30 (residues 90-230).
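Identity percentages such as these depend on the convention used to compute them. One common convention, assumed here, counts identical residues over the aligned positions where neither sequence has a gap, within a chosen region of the reference (for example residues 90-230 of human PrP); other conventions divide by the length of the shorter sequence or by all alignment columns, and give different values. The Python sketch below implements only this scoring step on a hypothetical, already-aligned toy pair; the function name percent_identity is illustrative, the code does not perform the alignment itself, and it does not reproduce the published figures.

```python
def percent_identity(aligned_ref: str, aligned_query: str,
                     ref_start: int, ref_end: int) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap),
    restricted to reference residues ref_start..ref_end (1-based, inclusive).

    Convention assumed here: identical residues / aligned columns in the
    region where neither sequence has a gap.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    ref_pos = 0      # residue number in the ungapped reference
    compared = 0     # gap-free columns inside the region
    identical = 0    # identical residues among those columns
    for r, q in zip(aligned_ref.upper(), aligned_query.upper()):
        if r == "-":
            continue             # insertion in the query: no reference residue here
        ref_pos += 1
        if not ref_start <= ref_pos <= ref_end or q == "-":
            continue             # outside the region, or deleted in the query
        compared += 1
        if r == q:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical toy alignment (not real PrP sequences).
ref = "MANLGCWML"
query = "MA-LGSWML"
print(percent_identity(ref, query, 1, 9))   # 87.5
print(percent_identity(ref, query, 4, 9))   # region restricted to reference residues 4-9
```

Restricting the region changes the result even for the same alignment, which is one reason identity values computed over a PrP fragment such as PrP 27-30 are not directly comparable with full-length values.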
In addition to the problem of prion transmission among mammals and to humans, highlighted by the recent BSE crisis, the discovery of PrP homologues in fish raises new issues: the possible spread of prions to farmed fish (e.g. from meat and bone meal feedstuff derived from farm animals), and vice versa. Oidtmann et al. (2003) considered it unlikely that fish could accumulate mammalian prions, but this possibility should not be excluded, as the factors contributing to the species barrier are not fully understood.
2.2 PRNP and Its Homologues
The mammalian prion protein gene and its fish homologues have similar characteristics, including their exon/intron structure and a complete ORF within the 3’ terminal exon.
2.2.1 Mammalian PRNP
There is only a single PRNP gene in the mammalian genome. The structure and regulation of this gene have attracted intense interest because they dictate both the amino acid sequence of host PrPC and its level of expression, features that determine genetic resistance or predisposition to prion diseases (Prusiner and Scott, 1997; Prusiner, 1998).
The first PRNP gene studied was that of the Syrian hamster (Basler et al., 1986). Analysis showed that it has two exons (56-82 bp and 2 kb) separated by one intron (10 kb), and that the entire ORF resides within the larger exon. Multiple transcription start sites were observed within a 25 bp stretch of the upstream promoter. The promoter region contained three Sp1-binding sites but no TATA box, features of housekeeping genes that are consistent with the ubiquitous expression of PRNP (Chapter 2.3). Li and Bolton (1997) showed that there is an additional non-coding exon (99 bp) within the intron. In different brain regions, the transcript containing all three exons was expressed at 30-50% of the level of the transcript containing only exons 1 and 3. Notably, the first full-length PrP sequence was deduced by translating the Syrian hamster PRNP DNA sequence (Chapter 1.2.2).
Human PRNP has the same gene structure as Syrian hamster PRNP (Puckett et al., 1991): a short proximal exon coding for the 5’ untranslated region of the mRNA (136 bp), a single intron (13 kb) and a distal exon (2.3 kb) containing the ORF. Although no trace of exon 2 was found in human cDNAs, the gene contains an exon 2-like sequence (Lee et al., 1998). The proximal promoter is GC-rich, typical of housekeeping genes. I analyse the characteristics of human PRNP in Chapter 6.
Lee et al. (1998) compared human PRNP with the mouse and ovine genes. The mouse Prnp gene (21 kb) encompasses three exons. The short exons 1 (47 bp) and 2 (98 bp) encode the 5’ UTR (Westaway et al., 1994a), and the complete ORF lies within exon 3 (2 kb). The splice donor and acceptor sites flanking exon 2 differ from consensus sequences, suggesting that splicing of exon 2 may not be obligatory. The two mouse Prnp alleles associated with different incubation times after prion infection, Prnpa and Prnpb, differ in length by approximately 6 kb within the second intron, but this length difference does not itself affect incubation times. There are multiple transcription start sites across a 25 bp promoter region. The promoters of both Prnpa and Prnpb contain binding sites for the transcription factors Sp1 and AP-1. Four motifs lie within 250 bp upstream of the transcription start site: CTTTCATTTTCTC, CCATTAt/cGTAACG, TAAAGATGATTTTTA and TCAGGGAG. These motifs are conserved in the mouse, Syrian hamster, sheep and human promoters, but their functional significance is unclear.
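Locating these four motifs in a promoter sequence is a simple pattern search. The Python sketch below shows one way to do it, assuming that the lowercase t/c in the second motif denotes a position that may be either T or C. The labels M1-M4, the function names and the toy test sequence are hypothetical, and the toy sequence is not a real PRNP promoter. Both strands are scanned, because a conserved motif may lie on either strand.

```python
import re

# Motifs conserved in mouse, Syrian hamster, sheep and human PRNP promoters
# (Lee et al., 1998). "t/c" in the second motif is assumed to mean T or C.
MOTIFS = {
    "M1": "CTTTCATTTTCTC",
    "M2": "CCATTA[TC]GTAACG",
    "M3": "TAAAGATGATTTTTA",
    "M4": "TCAGGGAG",
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motifs(promoter: str) -> list[tuple[str, str, int]]:
    """Return (motif label, strand, 0-based forward-strand start) for each match."""
    seq = promoter.upper()
    rc = reverse_complement(seq)
    hits = []
    for label, pattern in MOTIFS.items():
        regex = re.compile(f"(?=({pattern}))")  # lookahead: overlapping matches are kept
        for m in regex.finditer(seq):
            hits.append((label, "+", m.start()))
        for m in regex.finditer(rc):
            # A match at rc[i:i+L] covers forward positions len(seq)-i-L .. len(seq)-i-1
            hits.append((label, "-", len(seq) - m.start() - len(m.group(1))))
    return sorted(hits, key=lambda h: h[2])


# Hypothetical toy promoter fragment, not a real PRNP sequence.
toy = "GG" + "CCATTACGTAACG" + "AAAA" + "TCAGGGAG" + "T"
print(find_motifs(toy))  # [('M2', '+', 2), ('M4', '+', 19)]
```

Exact string matching is deliberately strict; a comparative analysis across species would more likely allow mismatches or use position weight matrices.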
The 20 kb sheep PRNP gene also has three exons (52, 98 and 4028 bp) (Westaway et al., 1994b). Its coding exon 3 is longer than those of the rodent and human PRNP genes. The promoter contains neither Sp1- nor AP-1-binding sites, but it does have an AP-2-binding site.
Hills et al. (2001) reported the full genomic sequence of the 20 kb bovine PRNP (Figure 2.2). The gene structure is the same as that of the sheep PRNP, with three exons (53, 98 and 4092 bp) and two introns (2442 and 13552 bp).
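As a consistency check, these exon and intron sizes can be laid end to end, assuming the usual 5’-to-3’ order E1, intron 1, E2, intron 2, E3 (the order drawn in Figure 2.2). The Segment type and layout function in the Python sketch below are hypothetical names used only for illustration; the sketch sums the segments and gives 1-based coordinates for each.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str    # "exon" or "intron"
    name: str
    length: int  # bp


# Bovine PRNP sizes from Hills et al. (2001), in the assumed 5' to 3' order.
BOVINE_PRNP = [
    Segment("exon", "E1", 53),
    Segment("intron", "I1", 2442),
    Segment("exon", "E2", 98),
    Segment("intron", "I2", 13552),
    Segment("exon", "E3", 4092),
]


def layout(segments: list[Segment]) -> tuple[int, dict[str, tuple[int, int]]]:
    """Return total length and 1-based inclusive (start, end) coordinates of each segment."""
    coords = {}
    pos = 1
    for seg in segments:
        coords[seg.name] = (pos, pos + seg.length - 1)
        pos += seg.length
    return pos - 1, coords


total, coords = layout(BOVINE_PRNP)
exonic = sum(s.length for s in BOVINE_PRNP if s.kind == "exon")
print(total)                            # 20237
print(coords["E3"])                     # (16146, 20237)
print(round(100 * exonic / total, 1))   # 21.0
```

The total of 20,237 bp agrees with the reported 20 kb gene length, and exons account for about 21% of it, most of that being the long terminal exon.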
Comparative genomic analysis of the human, mouse and sheep PRNP genes showed that they have accumulated transposable elements extensively and independently (Lee et al., 1998). The transposable element content was estimated at 40% in human and mouse, and 57% in sheep. The 6 kb difference between mouse Prnpa and Prnpb is due to insertion of an intracisternal A-particle (IAP) transposable element into intron 2. The three-species comparisons identified conserved non-coding sequences in intron 1 and in the 3’ UTR of the terminal exon (Chapter 6.5). The bovine and sheep genes have longer terminal exons because of integration of the Bov-B, Bov-tA and Mariner transposable elements into the 3’ UTR.
Mammalian PRNP lies adjacent to one or two related genes (Chapter 5). The gene immediately distal to PRNP in eutherian mammals is PRND, which encodes the doppel (Dpl) protein (Chapter 2.2.2) and is thought to have arisen by duplication of PRNP (Mastrangelo and Westaway, 2001). Next to PRND lies PRNT (Makrinou et al., 2002), detected so far only in humans and absent from mouse, which appears to be a pseudogene that arose from a duplication of PRND (Chapter 5.5). Further distal to PRNT in human, and to Prnd in mouse, lie RASSF2, encoding Ras association domain family 2 protein, and SLC23A1, encoding solute carrier family 23 member 1; both genes are conserved in the human and mouse genomes (Chapter 5.5).
The PRNP gene is located on human chromosome 20p13, and in syntenic regions on
mouse chromosome 2F3, rat chromosome 3q36, dog chromosome 24 (Ensembl), bovine
chromosome 13q17, river buffalo chromosome 14q15, sheep chromosome 13q15, goat
chromosome 13q15 (Iannuzzi et al., 1998) and chicken chromosome 22 (Ensembl).
In summary, these analyses have identified GC richness and the lack of a TATA box, typical of housekeeping genes, as conserved features of eutherian PRNP promoters. Gene structure and the regulation of gene expression differ somewhat between species. PRNP genes contain either three or two exons. There is a single transcription start site in PRNP except in rodents, which have multiple transcription start sites. Whereas the hamster, human, bovine and mouse promoters contain Sp1- and AP-1-binding sites, the sheep promoter does not, but instead includes an AP-2-binding site.
Figure 2.2: Structure of the mammalian PRNP and fish PrP-like genes. (A) A typical mammalian PRNP has two short non-coding exons (E1 and E2), two introns, and the complete ORF within the longer terminal exon (E3). E2 is missing in the human PRNP. The sizes of exons and introns correspond to the bovine PRNP. (B) Fugu PrP-like contains one non-coding exon (E1), and the complete ORF is within the terminal exon (E2). The two rulers indicate size in kb. Exons are depicted as black rectangles; the ORF is shown as a white rectangle.
2.2.2 PRND: A Mammalian Paralogue of the PRNP Gene
The first mammalian paralogue of the prion protein gene was discovered in mouse by sequencing genomic DNA 16 kb downstream of Prnp (Moore et al., 1999). Prnd encodes a GPI-anchored glycoprotein of 179 amino acids, dubbed “doppel” (Dpl; German for “double”), which shows roughly 25% identity with mammalian PrP. However, Dpl lacks both the middle hydrophobic region that is critical for PrP function and the proximal repeats (Chapter 1.2). In human, PRND is 27 kb distal to PRNP. Human and rat Dpl are 76% and 90% identical, respectively, to mouse Dpl. Unlike Prnp, Prnd is expressed minimally in adult mouse brain and highly in testis. Prnd expression is upregulated in the brains of the Prnp0/0 mouse lines Ngsk Prnp0/0 and Rcm Prnp0/0, which exhibit ataxia and neurodegeneration (Chapter 2.5.1).
The solution structure of recombinant mouse Dpl (amino acids 26-157) was very similar to that of PrPC (Chapter 1.3), despite limited sequence similarity (Figure 2.3) (Mo et al., 2001). Its globular domain contained three helices and little β-structure. Two disulfide bonds were found, one between Cys-109 and Cys-143 and the other between Cys-95 and Cys-148. Regions of secondary structure occurred at roughly the same positions in both proteins, but Dpl differs in having a kink in helix αB, a shorter helix αC, shorter β-strands and a differently oriented β-sheet.
Prnd knockout mice develop normally (Behrens et al., 2002), but males, though not females, are sterile. The spermatids from Prnd knockout males were immotile and malformed, their number was reduced, and they were unable to fertilize oocytes in vitro. Acrosomal defects observed in Prnd knockout sperm could account for the infertility, perhaps through an inability of the sperm to penetrate the zona pellucida. The transformation of round spermatids into testicular spermatozoa was also abnormal, as was the regional separation of spermiogenic differentiation stages. Prnd is thus implicated in male gametogenesis. PrPC, although expressed in testis, could not compensate for the loss of Dpl, indicating that Prnp and Prnd have non-redundant functions, at least in the male reproductive tract. Consistent with this, Prnp/Prnd double knockout mice showed no additional phenotype (Chapter 2.5.1).
Figure 2.3: Comparison of the backbone topology of recombinant mouse Dpl and PrP. αA, α helix A; αB, α helix B; αC, α helix C (reproduced from Mo et al., 2001).
The proximity of PRNP and PRND, and the sequence and structural similarities of their products, indicate that the two genes arose by tandem duplication. After duplication of an ancestral gene, the two copies evolved distinct and unrelated functions (divergent evolution). Although the two proteins retain similar architectures, with slightly different topologies, their divergent amino acid sequences dictate different functions.
2.2.3 Fish PRNP Homologues
Features of the fish PRNP homologues were first defined in the Fugu genome. Suzuki et al. (2002) reported the PrP-like gene and analysed its structure and local genomic environment. Its structure resembles that of mammalian PRNP: a short exon 1 (39 bp) and a long exon 2 (932 bp) harbouring the complete ORF are separated by an intron (1.5 kb) (Figure 2.2). The Fugu PrP-like transcript was expressed in skin, eyes and brain, an expression pattern different from that of mammalian PRNP (Chapter 2.3). Suzuki et al. noted that PrP-like resides in the same genomic region as mammalian PRNP genes, proximal to RASSF2 and SLC23A1 in both Fugu and mammals. They placed PrP-like between these two genes, suggesting an evolutionary relationship between the fish PrP-like and tetrapod PRNP genes (Chapter 5.5).
Oidtmann et al. (2003) provided more details about the PRNP-related genes in Fugu. They determined that stPrP-2 lies 2 kb proximal to PrP-like, and that RASSF2 and SLC23A1 lie distal to them (Chapter 5.5). This fish genomic region did not contain a PRND orthologue; PRND has been reported only in mammals.
The other PRNP homologue, stPrP-1, was found in a different genomic context. Unlike the other members of the PRNP gene family that had been described, it contains a small intron within its ORF (Oidtmann et al., 2003). Rivera-Milla et al. (2003) demonstrated expression of the PrP461 (stPrP-1) transcript in brain and liver, and Oidtmann et al. (2003) showed that stPrP-1 mRNA, but not stPrP-2 mRNA, is expressed in Fugu brain. The salmon stPrP-1 transcript is expressed ubiquitously (muscle, liver, skin, gills, kidney, spleen, heart, brain), but most prominently in brain, an expression pattern similar to that of PRNP in mammals (Chapter 2.3).
Suzuki et al. (2002) first suggested an evolutionary link between fish PrP-like and tetrapod PRNP, based on similar protein sequence features (extracellular, GPI-anchored proteins with repeats and a middle hydrophobic region) and shared genomic context (proximity to the RASSF2 and SLC23A1 genes). However, Oidtmann et al. (2003) considered stPrP-2 to be more closely related to tetrapod PrPs, because its C-terminal region is more similar to mammalian PrP than is that of PrP-like. Of all the fish PrP homologues, stPrP-1 showed the highest similarity to other PrPs, although its gene lies in a different genomic context. I analyse these competing hypotheses in Chapter 5.
2.3 Expression of PRNP and PrPC
Mammalian PRNP is a housekeeping gene and is expressed in a heterogeneous set of cells. This was first demonstrated by Oesch et al. (1985), who cloned a partial cDNA coding for Syrian hamster PrP. The mRNA levels were the same in normal and prion-infected brain, and the transcript was also detected in a range of other tissues: heart, lung, pancreas, liver, spleen, testis and kidney.
Regulated expression of PRNP during Syrian hamster brain development was demonstrated by Northern analysis (McKinley et al., 1987). The mRNA level was low from one to ten days after birth, rose to a maximum between days 10 and 20 after birth, and remained constant thereafter throughout life. PrPC expression increased from a low at day 2 to a maximum at day 10 after birth. These changes correlate with morphological changes during mammalian brain development, including neuronal differentiation and increases in the rates of synaptogenesis and myelination after postnatal day six, suggesting that PRNP is involved in neuronal maturation (Chapters 2.5.2 and 6.5).
Caughey et al. (1988) found that PRNP is expressed in normal and scrapie-infected mouse and hamster brain, liver and spleen. In situ hybridisation (Brown et al., 1990) revealed PRNP expression in neurons and in non-neuronal cells of mouse brain (ependymal cells, choroid plexus epithelium, astrocytes, pericytes, endothelial cells and meninges). Transcription was also found in microglial cells, in alveolar lining and septal interstitial cells of the lung, and in myocardium, but not in spleen.
PRNP is also expressed in a number of cell lines from mouse (epithelial cell line C127, neuroblastoma Neuro 2A cells, erythroid cell line AA60, embryo fibroblast B6-3T3 cells, B cell lymphoma cell line 1593), Syrian hamster (ovary-derived CCL61 cells), human (astrocytoma HTB14 cells, neuroblastoma HTB10 cells) and rat (glioma-derived C6Bu3 cells). No PRNP mRNA was found in the mouse myeloid cell lines 5402 (differentiated) and 7320 (undifferentiated), or in the human T cell lymphoma cell line MBL-2 (Caughey et al., 1998).
Human lymphocytes and lymphoid cell lines (but not erythrocytes or granulocytes) transcribe PRNP and express PrPC (Cashman et al., 1990). Activation of T lymphocytes increased the abundance of PrPC on the cell surface. Polyclonal antibodies to PrPC suppressed concanavalin A-induced activation of lymphocytes, suggesting that PrPC may participate in T lymphocyte activation (Chapter 2.5.2).
Manson et al. (1992) studied PRNP expression during embryonic mouse development using in situ hybridisation. By embryonic day 13.5 or 16.5, transcripts were found throughout the developing brain and spinal cord, and also in the peripheral nervous system (ganglia and nerve trunks of the sympathetic nervous system, and neural cells of sensory organs). At this stage, PRNP expression was also detected in differentiating non-neuronal cells of the dental lamina and kidney. In extra-embryonic tissue, PRNP transcripts were found in the maternal cells of the placenta, and in the amnion, umbilical cord and mesodermal layer of the yolk sac.
The distribution of tissues expressing PrPC was studied in the Syrian hamster (Bendheim et al., 1992). Immunohistochemical analysis localized PrPC in the nervous system to neurons and the surrounding neuropil in the hippocampus, septal, caudate and thalamic nuclei, dorsal root ganglia and dorsal root axons. PrPC was most concentrated within the hippocampus, including the CA1, CA3 and CA4 subfields, fimbria, pyramidal cells, dentate formation and the intervening neuropil. Cortex, fornix, caudate, thalamus, brainstem and spinal cord expressed less PrPC. Among non-neuronal tissues, circulating leukocytes, heart, myocardium, lung (bronchial epithelium), stomach (parietal and glandular neuroepithelial cells), intestines, spleen, testis and ovary all expressed PrPC.
Askanas et al. (1993) demonstrated that PrPC is concentrated at the postsynaptic domain of normal human neuromuscular junctions (NMJs). At the NMJ, the molecular composition of the extracellular matrix and of the immediately postsynaptic cytoplasmic domain differs from that of the nonsynaptic regions of the muscle fibre.
Ford et al. (2002a) developed antibodies recognising PrPC in glutaraldehyde-fixed tissue and used them to study PrPC expression in the brain. PrPC expression was predominantly neuronal, with GABA-immunoreactive neurones showing the highest levels. Dopaminergic neurones and glia, on the other hand, showed no PrPC expression. However, all neurones expressed PRNP mRNA, indicating the importance of post-transcriptional control of PRNP expression (Chapter 6.5).
PrPC is expressed in a heterogeneous set of mouse tissues outside the brain (Ford et al., 2002b), including peripheral nerves and Schwann cells, sympathetic ganglia and nerves, the parasympathetic and enteric nervous systems, antigen-presenting and -processing cells, populations of lymphocytes and the neuroendocrine system. Outside the brain, mRNA and protein levels correlated well.
Barmada et al. (2004) generated transgenic mice in which the PRNP promoter drives expression of a PrP-EGFP fusion protein (EGFP, enhanced green fluorescent protein). PrP-EGFP was expressed within synapse-rich regions of the brain. In the hippocampus, fluorescence was found in synapse-rich layers such as the strata oriens, radiatum, lacunosum-moleculare and lucidum, and in the alveus, subiculum, fimbria and hilus. PrP-EGFP was found throughout the neocortex. In the cerebellum, fluorescence was detected at high levels in the molecular layer, and at lower levels in the granule cell layer and white matter.
Morel et al. (2004) analysed expression of PrPC in normal human intestinal tissues. PrPC
was expressed in enterocytes, the dominant cell population of the intestinal epithelium,
and also in the vascular epithelia. The enterocytic cell line caco-2/TC7 also expresses
PrPC.
2.4 Cell Biological Features of PrPC
The metabolism of PrPC determines both its normal role and its contribution to prion
disease pathogenesis.
Mammalian PrPC is a membrane protein that cycles constitutively between the cell membrane and early endosomes (reviewed in Harris, 2003). The biosynthetic pathway of PrPC is similar to that of other secreted and membrane proteins. It is first synthesized in the endoplasmic reticulum (ER), then post-translationally modified (cleavage of signal peptides, N-linked glycosylation and addition of the GPI anchor) in the ER and Golgi, and finally it reaches the cell surface. PrPC molecules cycle constitutively through the cell with a transit time of approximately 1 hr: the t1/2 for internalisation and the t1/2 for return to the cell surface are both roughly 20 min, so that the protein is equally divided between the two compartments (Shyng et al., 1993). Most molecules are recycled intact to the cell surface, but a small percentage is proteolytically cleaved in the middle of the protein. Roughly 10-30% of membrane-anchored molecules are released into the extracellular milieu. The t1/2 for degradation of PrPC in lysosomes is 3-6 hr (Taraboulos et al., 1992).
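The cycling figures above can be checked with a simple model. Assuming, for illustration only, that internalisation and return are both first-order processes (an assumption of this sketch, not a statement of Shyng et al.'s analysis), each rate constant is k = ln 2 / t1/2, the steady-state fraction of PrPC at the surface is k_ret / (k_in + k_ret), and the mean round-trip time is 1/k_in + 1/k_ret. With both half-lives at 20 min this gives an equal split between the compartments and a round trip of about 58 min, in line with the approximately 1 hr transit time. The function names below are hypothetical.

```python
import math


def rate_constant(t_half_min: float) -> float:
    """First-order rate constant (per min) from a half-life (min)."""
    return math.log(2) / t_half_min


def cycling_summary(t_half_internalise: float, t_half_return: float) -> dict:
    """Steady-state surface fraction and mean round-trip time of a two-compartment cycle."""
    k_in = rate_constant(t_half_internalise)   # surface -> endosome
    k_ret = rate_constant(t_half_return)       # endosome -> surface
    # Steady state of dS/dt = -k_in*S + k_ret*(1 - S)
    surface_fraction = k_ret / (k_in + k_ret)
    # Mean residence time at the surface (1/k_in) plus inside the cell (1/k_ret)
    round_trip = 1 / k_in + 1 / k_ret
    return {"surface_fraction": surface_fraction, "round_trip_min": round_trip}


def surface_fraction_at(t_min: float, t_half_internalise: float,
                        t_half_return: float, s0: float = 1.0) -> float:
    """Exact surface fraction at time t, starting from fraction s0 at the surface."""
    k_in = rate_constant(t_half_internalise)
    k_ret = rate_constant(t_half_return)
    s_ss = k_ret / (k_in + k_ret)
    return s_ss + (s0 - s_ss) * math.exp(-(k_in + k_ret) * t_min)


summary = cycling_summary(20, 20)
print(round(summary["surface_fraction"], 2))   # 0.5
print(round(summary["round_trip_min"], 1))     # 57.7
```

Making internalisation faster than return (for example half-lives of 10 and 30 min) lowers the steady-state surface fraction to 0.25, which shows how directly the equal split depends on the two half-lives being similar.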
Most of the protein resides in membrane “rafts”, detergent-resistant domains enriched in sphingolipids that are foci for signal transduction events. Internalisation of PrPC may occur via clathrin-coated vesicles, mediated by the N-terminal part of the protein, or alternatively through a caveolae-mediated endosomal pathway. Copper binding stimulates PrPC endocytosis.
Peters et al. (2003) analysed PrPC trafficking using cryoimmunogold electron microscopy. They found PrPC enriched in caveolae, stable membrane microdomains (“rafts”) that are anchored by the actin cytoskeleton, enriched in caveolin, cholesterol and glycosphingolipids, and mediate key cellular processes such as signal transduction. PrPC was delivered to late endosomes/lysosomes via nonclassical, caveolae-containing early endocytic structures (“caveosomes”). GPI-anchored proteins may cycle between the cell surface and the trans-Golgi network via this pathway, and inhibitors of this endocytic route may be of therapeutic interest.
Early studies of PrPC localization gave conflicting results: PrPC was reported to be predominantly in the soma with a minor signal in the neuropil (Bendheim et al., 1992; Ford et al., 2002b), but also to be predominant in the neuropil. It was also reported to be concentrated in the synaptosomal plasma membrane but absent from synaptic vesicles and cytosol (Herms et al., 1999).
Mironov et al. (2003) investigated the ultrastructural localization of PrPC in the cornu
ammonis 1 (CA1) and dentate gyrus areas of the mouse hippocampus. They demonstrated
a ubiquitous cellular distribution of extracellular PrPC. Consistent with its GPI-anchored
membrane association, this suggests that PrPC diffuses along the cell membrane. PrPC was
associated predominantly with the neuropil and was present at the same concentration
within synaptic specializations and perisynaptically. It was also present at the same
concentration in the presynaptic and postsynaptic membranes and within the synapse,
but no PrP was found in synaptic vesicles. Besides PrPC associated with biosynthetic and
endocytic membranous structures, a cytosolic PrP was also identified in subpopulations
of unidentified neurons in the hippocampus, neocortex and thalamus (CPrP cells). This
cytosolic PrP could be a novel PrP entity, with a structure and function different from
those of extracellular PrPC.
Barmada et al. (2004) confirmed the existence of cytosolic PrP. Further, they showed
that PrP-EGFP (Chapter 2.3) is localized primarily along axons and in presynaptic
terminals. This distribution is consistent with retrograde and anterograde transport of
PrP along axons (Moya et al., 2004) and with the preferential sorting of some GPI-anchored
proteins in neurons to the axonal surface. There was less PrP-EGFP on dendrites in the
hippocampus and cerebellum.
In enterocytes, PrPC is also localized in raft microdomains (Morel et al., 2004;
Chapter 2.3). It is mainly concentrated in the lateral membrane, associated with the
junctional complexes, and this localization depends on cell-cell contacts
(Chapter 2.5.2). PrPC was not found on the apical membrane.
Three topological forms of PrP are known: the extracellular GPI-anchored form
(PrPC), comprising roughly 50% of total PrP, and two transmembrane forms (CtmPrP
and NtmPrP), which span the cell membrane in opposite orientations and comprise about
10% and 40% of total PrP, respectively (Hegde et al., 1998). Two adjacent regions act in
concert to generate the transmembrane forms: the stop transfer effector (STE; L104-M112
in human PrP) and TM1 (A113-S135 in human PrP). Aberrant regulation of PrP biogenesis
and topology may cause neurodegeneration. CtmPrP caused severe neurodegeneration in
mice and is a key component of the GSS disease pathway caused by the A117V
mutation. NtmPrP could have a normal physiological role.
Both PrP isoforms have two variably occupied N-glycosylation sites, Asn181 and Asn197
in human PrP (reviewed in Rudd et al., 2002). More than 50 different glycans were found
at one or both sites in PrPC. PrPSc from scrapie-infected hamster brain also carried
glycans at these sites, but with a higher proportion of tri- and tetra-antennary complex
glycans. The glycans stabilise the folded part of PrPC, so altering its sugars could have
functional consequences; for example, conversion to PrPSc occurs more readily if PrPC
is unglycosylated. The oligosaccharides are also required for the intracellular trafficking
of PrPC (Chapter 1.2.6). Furthermore, the glycans are large relative to the PrPC
polypeptide. Molecular dynamics simulations (Zuegg and Gready, 2000) showed that the
folded domain of PrPC is stabilized by an indirect effect of glycosylation, and that the
glycans make the surface electrostatic field more negative, which could inhibit the
association of PrPC with the membrane.
Both PrP isoforms also contain a glycosylphosphatidylinositol (GPI) anchor that attaches
them to the outer leaflet of the cell membrane (Stahl et al., 1988). In fact, all vertebrate
homologues of PrP are predicted to contain this GPI-anchor (Chapter 2.1). The anchor is
readily cleaved by the bacterial enzyme phosphatidylinositol-specific phospholipase C
(PI-PLC; Chapter 1.2.7), releasing PrPC from the cell membrane. Molecular dynamics
simulations indicated that the GPI-anchor is flexible and holds the protein 9-13 Å from
the cell membrane (Zuegg and Gready, 2000). In general, GPI-anchored proteins are
involved in signal transduction and cell activation (e.g. acetylcholinesterase in the
synaptic cleft), and they diffuse rapidly within the membrane (Medof et al., 1996). They
may also be promiscuous, reincorporating into other membranes in trans while remaining
fully functional (protein “painting”). Such intermembrane transfer of mouse GPI-anchored
complement restriction factors from erythrocytes to the endothelium was shown to occur
in vivo under physiological conditions (Kooyman et al., 1995). By analogy with these
observations, this feature of GPI-linked proteins may enable spreading of prions from
neuron to neuron.
Indeed, Liu et al. (2002) demonstrated that PrPC could be transferred from cell to cell by
a GPI-dependent process in vitro. This process is tightly regulated, as it occurred only
after the donor cells, the recipient cells, or both were activated by the protein kinase C
(Chapter 2.5.2) activator phorbol 12-myristate 13-acetate (PMA). The transfer was also
dependent on direct cell-to-cell contact.
Exosomes are membrane vesicles released into the extracellular milieu. Follicular
dendritic cells, which are implicated in the peripheral pathogenesis of prion disease
(Chapter 1.2.9), release and exchange exosomes with other cells (Fevrier and Raposo,
2004). Exosomes are released after exocytic fusion of multivesicular endosomes with the
plasma membrane, and could act as carriers for the intercellular exchange of PrPC and
PrPSc (Fevrier et al., 2004).
A fraction of infectious PrPSc was released from scrapie-infected Mov and Rov cells in
association with exosomes, and native PrPC is released in the same manner. The protein
composition of the PrP-carrying exosomes was evaluated by mass spectrometry. The
proteins identified included those involved in adhesion, membrane fusion and exosome
biogenesis, indicating that the PrP-carrying vesicles are bona fide exosomes. Exosomes
represent a recently discovered mode of intercellular communication. They are released
by many cell types, including B cells and intestinal epithelial cells. They are enriched in
cell-type-specific proteins (e.g. MHC class I and II in B cells), in ubiquitous proteins
involved in exosome biogenesis and adhesion to target cells, in membrane raft
components, and in GPI-anchored proteins. This finding is in agreement with the results
of Peters et al. (2003): one fate of caveosomes is exocytic fusion and release into the
extracellular environment.
Yedidia et al. (2001) showed that roughly 10% of newly synthesized PrPC is degraded by
the ERAD-proteasome pathway, which is responsible for clearing misfolded proteins
(Chapter 1.4.1). During this process, PrP molecules are retrotranslocated to the cytosol,
deglycosylated, ubiquitinated and degraded.
The ubiquitous expression of PRNP suggests that it contributes to the function of many
cell types. Glycosylated, GPI-anchored extracellular PrPC diffuses along the cell
membrane. It resides in membrane microdomains that mediate signal transduction, cycles
constitutively between the cell surface and intracellular compartments, and is degraded
in lysosomes.
2.5 Normal Function of PRNP
The normal function of the prion protein gene remains elusive, and a number of
hypotheses have been proposed.
2.5.1 Prnp Knock-Out Mice
Prnp knock-out mice were constructed to illuminate the normal function of Prnp.
Prnp knock-out mice generated conservatively, by disrupting only the Prnp ORF, have
no obvious phenotype (Bueler et al., 1992; Manson et al., 1994; Weissmann and Flechsig,
2003). No major anatomical abnormalities, infertility, differences in immunological
status, or changes in learning or behaviour were found.
A more radical Prnp knock-out (which, as well as disrupting the ORF, removed the splice
acceptor site of exon 3) produced ataxia and loss of Purkinje cells later in life
(Sakaguchi et al., 1996). However, this phenotype was a consequence of up-regulation of
the Prnd gene, leading to its high, non-physiological expression in brain (Chapter 2.2.2).
There are several possible explanations for the lack of phenotype in the conservative
Prnp knock-out mice. The knock-out phenotype could be so subtle that a selective
disadvantage emerges only after many generations, for example under stressful
conditions. Alternatively, functional redundancy, or compensation by other molecule(s),
may mask the loss of Prnp. Another possibility is that the protein has recently lost its
function (Bueler et al., 1992). Finally, the knock-out phenotype may simply not be
apparent in laboratory settings.
Nevertheless, more subtle phenotypic changes have been reported in Prnp knock-out
mice. Collinge et al. (1994) reported that CA1 hippocampal slices from Prnp knock-out
mice show weakened GABAA receptor-mediated fast inhibition and impaired long-term
potentiation, although other laboratories could not confirm this observation (Lasmezas,
2003). Colling et al. (1997) reported aberrant mossy fibers in the CA2 and dentate gyrus
regions of the Prnp0/0 mouse hippocampus, similar to the morphological abnormalities
seen following epileptic seizures.
Mice devoid of Prnp exhibited alterations in circadian activity rhythms and sleep
(Tobler et al., 1996), indicating that Prnp is involved in the regulation of sleep. The
period lengths of the circadian activity rhythms were longer in the null mice than in
wild-type mice, and the Prnp0/0 mice were less active in the first half of the dark period.
The null mice also showed an altered distribution of non-rapid eye movement (NREM)
sleep and waking during the dark period, and more fragmented sleep. These phenotypes
were rescued by re-introduction of the Prnp gene. Evaluation of behavioural parameters
in Prnp0/0 mice showed normal fear-motivated memory, anxiety and exploratory
behaviour, but slightly increased locomotor activity (Roesler et al., 1999).
Results consistent with the mild knock-out phenotype were obtained using a
tetracycline-controlled transactivator to repress PrPC expression in adult mice. Tremblay
et al. (1998) found no deleterious effects. After administration of doxycycline (an
analogue of tetracycline) to adult mice, expression of PrPC in brain was repressed by
90% after seven days of treatment; when doxycycline was withdrawn, it took seven
days for PrPC expression to return to its normal level. Doxycycline-treated mice were
not susceptible to exogenous prions. Because this experimental system precluded
developmental compensation and adaptation, the absence of systemic or CNS
dysfunction upon PrPC repression also argues in favour of redundancy between PrPC
and other molecule(s).
Using the Cre-loxP system to knock out the Prnp gene in 9-week-old mice, Mallucci et al.
(2002) found that the mice remained healthy and showed no evidence of
neurodegeneration. However, a significant reduction of afterhyperpolarization in CA1
cells was found, indicating that PrPC may modulate neuronal excitability by affecting
afterhyperpolarization. Because developmental compensatory mechanisms were
bypassed, the absence of detrimental effects again suggests functional redundancy
between Prnp and another gene(s).
Coitinho et al. (2003) studied behavioural parameters in 3- and 9-month-old Prnp0/0
mice. Behavioural parameters were also compared after administration of anti-PrPC
antibodies into the CA1 region of the dorsal hippocampus in normal 3- and 9-month-old
rats. Memory performance normally declines with age, starting at 9-12 months in
rodents. No difference from normal mice was observed in 3-month-old Prnp0/0 mice. In
contrast, both short- and long-term memory were impaired in 9-month-old Prnp0/0 mice
compared with normal mice. The same was true of 9-month-old rats that received
anti-PrPC antibodies, compared with untreated rats. Decreased locomotor activity in an
open-field test was observed in 9-month-old Prnp0/0 mice, whereas anxiety was normal
in both age groups of Prnp0/0 mice. These observations may be explained by the loss (or
modification) of the physiological functions of PrPC in the adult hippocampus.
The Prnd gene (Chapter 2.2.2) is dispensable for prion disease pathogenesis. Its normal
function must include a role in reproduction, since male Prnd knock-outs are infertile.
Overexpression of Prnd in the brain causes neurodegeneration that can be rescued by
the expression of Prnp. Mice in which both paralogues, Prnp and Prnd, were inactivated
showed no new phenotype (Genoud et al., 2004): double knock-out mice had no
morphologic or immunologic abnormalities apart from male infertility. This analysis
showed that there is no functional redundancy between the Prnp and Prnd genes.
Therefore, if Prnp is functionally redundant, the redundancy is likely to involve its other
homologue(s) (Chapters 4-6).
The Prnp homologue(s) with redundant function are unknown. The shadow of prion
protein gene, SPRN, is currently the only human candidate (Chapters 4-7).
2.5.2 Hypotheses about the Function of PRNP
Many hypotheses have been proposed for PRNP function, including involvement in
copper transport, copper buffering, redox signalling, neuroprotection, cell-cell
interactions, lymphocyte activation, nucleic acid metabolism, molecular memory and
signal transduction. Here I briefly outline eight of these hypotheses and describe in full
the ninth, which is supported by my work (Chapter 6).
2.5.2.1 PrPC Transports Copper
The endocytic trafficking of PrPC suggests a possible role in the uptake or efflux of an
extracellular ligand (Harris, 2003). Mammalian PrPC binds copper cooperatively at five
to six sites, with affinity in the low micromolar range, and in a pH-dependent manner that
is optimal at physiological pH (Brown et al., 1997). For comparison, the total copper
concentration is 16-20 µM in blood, 0.5-2.5 µM in cerebrospinal fluid and 15 µM in the
synapse. The residues involved in copper binding are histidines in the proximal
octarepeats and further histidines towards the C-terminus (His96, His111 or His140 in
human PrP; Chapter 2.1). Deletion of the proximal repeats in chicken PrPC also affected
copper binding (Pauly and Harris, 1998).
Prnp0/0 mice showed reduced copper content in membrane-enriched brain and liver
extracts, and increased serum copper. Tenfold reductions in copper content were also
found in synaptosomal and endosome-enriched brain fractions, indicating that
PrPC-deficient cell membranes are also deficient in copper. Further, reduced activity of
copper/zinc superoxide dismutase (SOD-1) and altered electrophysiological responses to
excess copper were observed in PrPC-deficient cells.
Thus, PrPC is a cuproprotein whose low-affinity copper binding may allow it to
exchange copper with other molecules. In this respect, PrPC may resemble other
cuproproteins implicated in neurodegenerative disease: monoamine oxidase in
Parkinson’s disease, amyloid precursor protein (APP) in Alzheimer’s disease and SOD-1
in familial amyotrophic lateral sclerosis.
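How low-affinity, cooperative binding would let PrPC load copper where it is abundant and release it where it is scarce can be illustrated with the standard Hill binding equation. The Python sketch below is purely illustrative: the apparent dissociation constant `KD_UM` (5 µM, standing in for "low micromolar") and the Hill coefficient `HILL_N` (2, standing in for "cooperative") are assumed values, not parameters reported by Brown et al. (1997), and the total copper concentrations quoted above are treated as if they were free copper, which overstates the copper actually available because most copper in blood and cerebrospinal fluid is bound to other ligands.

```python
"""Fractional copper occupancy of PrPC under a Hill binding model.

Illustrative sketch only: KD_UM and HILL_N are ASSUMED values, and total
copper concentrations are treated as if they were free Cu2+.
"""

KD_UM = 5.0   # assumed apparent dissociation constant (µM), "low micromolar"
HILL_N = 2.0  # assumed Hill coefficient; n > 1 models cooperative binding


def fractional_occupancy(cu_um: float, kd_um: float = KD_UM, n: float = HILL_N) -> float:
    """Return the fraction of copper sites occupied at concentration cu_um (µM).

    Hill equation: theta = [Cu]^n / (Kd^n + [Cu]^n)
    """
    if cu_um < 0:
        raise ValueError("concentration must be non-negative")
    return cu_um**n / (kd_um**n + cu_um**n)


# Copper concentrations quoted in the text (µM); ranges reduced to midpoints.
CONDITIONS = {
    "cerebrospinal fluid (0.5-2.5 µM)": 1.5,
    "synapse, basal (15 µM)": 15.0,
    "blood (16-20 µM)": 18.0,
    "synapse after release (up to 250 µM)": 250.0,
}

if __name__ == "__main__":
    for label, cu in CONDITIONS.items():
        print(f"{label:40s} occupancy = {fractional_occupancy(cu):.2f}")
```

With these assumed parameters, occupancy rises from about 8% at a cerebrospinal-fluid-like concentration (1.5 µM) to about 90% at 15 µM and is essentially saturated at 250 µM. Other assumed values of `KD_UM` or `HILL_N` shift and steepen the curve, but a binder in the low micromolar range still loads and unloads copper across the concentrations quoted above.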
It is unclear how copper and PrPC could be functionally related. Bound copper may
serve as a cofactor for an enzymatic activity of PrPC; alternatively, PrPC may act as a
sink that chelates extracellular copper ions, or as a carrier protein for copper uptake and
delivery to intracellular targets. Pauly and Harris (1998) showed that copper rapidly and
reversibly stimulates endocytosis of PrPC from the cell surface. Incubation of N2a
mouse neuroblastoma cells expressing either mouse or chicken PrPC with excess CuSO4
(200 or 500 µM) rapidly stimulated internalisation of both PrPCs, and removal of the
metal restored the original PrPC distribution.
Two models for the role of PrPC in copper trafficking have been proposed (Harris, 2003).
Firstly, PrPC could serve as a receptor for uptake of copper ions from the extracellular
milieu. It could bind copper on the plasma membrane via the proximal repeats and
deliver it by endocytosis to acidic endosomal compartments, where the copper ions
dissociate at low pH and are then transported to the cytoplasm. PrPC could then return
to the cell surface to begin a new cycle. Alternatively, PrPC could facilitate cellular
efflux of copper via the secretory pathway by binding copper ions in the Golgi
compartments.
2.5.2.2 PrPC Buffers Copper from the Synapse
As PrPC is concentrated at the synapse both presynaptically and postsynaptically, its
copper binding may have an anti-oxidant effect that is important for synaptic
homeostasis (reviewed by Brown, 2001). At the cellular level, PrP-deficient cells are
more susceptible to oxidative damage and toxicity, and show increased sensitivity to
various kinds of stress, implying a protective role for PrPC. Synaptic release of copper
may increase its local concentration to as much as 250 µM. This copper is usually bound
to peptides or amino acids and must be taken up rapidly by neurons. Excess copper can
catalyse the interconversion of various reactive oxygen species, or even generate
hydroxyl radicals from hydrogen peroxide. Sequestering it from the synapse is therefore
important to protect the cell from oxidative damage. PrPC-deficient cells do take up
copper, but to a lesser extent than PrPC-containing cells.
Brown et al. (2001) showed that the protection of cells against oxidative stress by PrPC
is proportional to the amount of copper it binds. Both purified PrPC and recombinant
PrP exhibited superoxide dismutase (SOD)-like activity in a formazan formation assay.
The SOD-like activity increased with the number of copper atoms incorporated and
depended on the copper concentration, suggesting that copper binding facilitates
changes in the secondary structure of the protein. The SOD-like activity was inhibited
when PrPs were incubated with the PrP106-126 peptide (Chapter 1.2.5). Cells grown in
excess copper also showed increased resistance to oxidative stress, but not when PrPC
had been removed from the cell surface with PI-PLC. Expression of PrPC with bound
copper boosted cellular resistance to oxidative stress.
Cui et al. (2003) investigated which regions of the prion protein are required for the
SOD-like activity. The repeats and the hydrophobic region (Chapter 2.1) were
indispensable for this activity, and the C-terminus was also important.
Several studies argue against a copper-transporting role for PrPC; for example, the in
vitro study of Rachidi et al. (2003) indicated that PrPC was not involved in copper
delivery at a physiological copper concentration (1.6 µM).
2.5.2.3 PrPC Contributes to Redox Signalling
Another suggestion is that PrPC could be a copper-sensitive stress sensor able to initiate
signal transduction cascades. After sensing stimuli such as copper and/or free radicals,
PrPC could trigger intracellular calcium signals that contribute to the modulation of
synaptic transmission and the maintenance of neuronal integrity (reviewed by Vassallo
and Herms, 2003). PrPC may efficiently buffer copper at the synapse in order to maintain
copper concentrations in the presynaptic cytosol and protect synapses from oxidative
insult. These complementary activities would also contribute to the preservation of
normal neuronal electrophysiology. Copper may then be taken back into the cell by other
transporters at the cell surface.
Some features of PrPC, like its neuroprotective effect against oxidative stress, suggest
that it is involved in free radical pathways, as these overlap with systems controlling the
homeostasis of redox-active metals such as copper. One scenario is that PrPC modulates
calcium flux in response to copper, because copper enables redox signalling. In this
scenario, copper binds to PrPC when its local concentration increases, enabling PrPC to
participate in redox reactions (such as its SOD-like activity), which in turn trigger
membrane kinases and activate Ca2+-mediated signalling cascades. PrPC may therefore
act as a sensor for strong copper/reactive oxygen species (ROS) stimuli and, by
generating a signal through redox chemistry, turn on Ca2+-mediated signalling.
2.5.2.4 PrPC has a Neuroprotective Role
Several lines of evidence implicate PrPC in the prevention of apoptotic cell death.
Using the yeast two-hybrid system, Kurschner and Morgan (1995) identified Bcl-2 as a
binding partner of PrPC. Bcl-2 specifically suppresses apoptosis in a number of cell
types, and it can bind proteins from its own and from other protein families. A peptide
comprising the C-terminal 183 amino acids of mouse PrPC (residues 72-254) interacted
with the Bcl-2 region that contains the BH2 domain (residues 174-236). Through this
association, PrPC could sequester Bcl-2 from its intracellular organelle pools; depletion
of these pools during prion disease, together with accumulation of PrPSc, may
contribute to apoptosis.
Kuwahara et al. (1999) established hippocampal cell lines from Prnp0/0 and Prnp+/+
mice. A stress insult (serum removal from the cell culture) caused apoptosis in the
Prnp0/0 cells but not in the Prnp+/+ cells. Transduction of the Prnp0/0 cells with either
PrPC- or Bcl-2-coding constructs prevented apoptosis of the cells under the serum-free
conditions. Prnp0/0 cells had shorter neurites than Prnp+/+ cells, but this difference was
also abolished by the expression of PrPC in Prnp0/0 cells. This study strongly suggested
the involvement of PrPC in the prevention of cell death.
Human PrPC protected neurons against apoptosis mediated by the Bax protein (Bounhar
et al., 2001). Inhibition of apoptosis depended on the proximal octarepeats but not on
the GPI-anchor. Bax is not pro-apoptotic unless it is induced by an insult or by
overexpression; however, co-overexpression of PrPC with Bax prevented apoptosis in
human primary neurons. Conversely, an antisense PrPC cDNA potentiated the effect of
Bax overexpression. Trafficking of PrPC past the cis-Golgi was required for
neuroprotection, and the PrP mutations D178N (FFI) and T183A abolished the protective
effect of PrPC. Thus, PrPC could be a potent natural neuroprotector.
Activation of PrPC in vitro induced neuroprotection (Chiarini et al., 2002). An
immunogenic PrPC-binding peptide (PrR; Martins et al., 1997) that binds the mouse
PrPC between residues 113-128 activated the cAMP/protein kinase A (PKA) and ERK
pathways, partially preventing apoptosis in retinal explants from neonatal rats or
wild-type neonatal mice, but not from Prnp0/0 mice. Incubation of cells with PrR
increased the intracellular level of cAMP, the activity of PKA and the activation of ERKs.
Addition of the PrP106-126 peptide disrupted the interaction between PrPC and PrR,
blocking the neuroprotective effect. Inhibitors of PKA, but not of ERKs, blocked
neuroprotection, suggesting involvement of a cAMP/PKA-dependent pathway in
PrPC-mediated neuroprotection. Further, antibodies to PrPC that increased cAMP levels
also enhanced neuroprotection, indicating that the activation of PrPC transduces
neuroprotective signals through a cAMP/PKA-dependent pathway and affects sensitivity
to induced apoptosis.
A cDNA microarray analysis was used to determine which genes are over- or
under-expressed in a human breast cancer cell line resistant to the cytotoxic action of
tumor necrosis factor α (TNF) (Diarra-Mehrpour et al., 2004). A TNF-resistant clone
showed seventeen-fold overexpression of PRNP mRNA, as well as overexpression of PrPC.
Furthermore, overexpression of PrPC was able to convert TNF-sensitive cells into TNF-
resistant cells. The protective effect of PrPC on tumor cells could be a consequence of
its interaction with laminin 2 and activation of the PI3K/Akt pathway.
2.5.2.5 PrPC Mediates Intercellular Contacts
PrPC binds molecules in the extracellular matrix and on the cell membrane that mediate
cell-cell interactions.
The 37-kDa laminin receptor precursor (LRP) was identified as an interacting partner of
PrPC using the yeast two-hybrid system in S. cerevisiae (Rieger et al., 1997). PrPC binds
the same domain of LRP (residues 161-180) as does laminin. Laminin is a glycoprotein
involved in cell attachment, differentiation, movement and growth. The interaction
between PrPC and LRP was confirmed by re-transformation and by co-transfection in
insect (Sf9) and mammalian (COS-7) cells. The LRP level was higher in scrapie-infected
N2a cells and in the brains of scrapie-infected mice and hamsters. The LRP,
located on the cell surface, binds elastin and laminin and mediates their actions. PrPC
and LRP may therefore interact on the cell surface.
Graner et al. (2000) also demonstrated an interaction between laminin and PrPC: PrPC
bound laminin in a specific and saturable manner. In brain, laminin promotes neuronal
differentiation, neuronal migration and neuronal regeneration, and also acts
anti-apoptotically. Because laminins are major components of the extracellular matrix,
these effects are mediated by cell membrane receptors. For example, the interaction
between laminin and amyloid precursor protein promotes neurite outgrowth. The
PrPC-laminin interaction was also involved in neuritogenesis induced by NGF and
laminin in PC-12 cells, suggesting a role for PrPC in neuronal plasticity. This hypothesis
is supported by the observations that anti-PrPC antibodies inhibited neuritogenesis and
that NGF treatment of PC-12 cells increased PrPC expression by 25%. Laminin is a large
(800 kDa) heterotrimeric molecule with many known isoforms. PrPC bound
preferentially to the well-conserved C-terminal domain of the laminin γ-1 chain, which
stimulates neurite outgrowth, and neuritogenesis stimulated by the γ-1 chain was
abrogated in Prnp0/0 cells. PrPC may therefore act as a laminin receptor.
In caveolae-like membrane microdomains (“rafts”), PrPC was identified as part of
protein complexes together with three splice variants of the neural cell-adhesion
molecule (N-CAM) (Schmitt-Ulms et al., 2001). N-CAMs belong to the immunoglobulin
superfamily and mediate cell-cell interactions by triggering cytosolic signals. The
PrPC-N-CAM interaction occurred through amino-acid side chains. The interacting face
of PrPC, comprising its N-terminal part, the first helix and the adjacent loop, bound
β-strands C and C’ within two adjacent N-CAM fibronectin type III modules. The
partners may associate early, during their joint passage through the secretory pathway.
Knock-out mice lacking N-CAM were susceptible to prions, indicating that N-CAM is not
protein X (Chapter 1.2.4). However, the PrPC/N-CAM association may provide an
alternative signalling route from PrPC to Fyn tyrosine kinase (see below).
In order to identify proteins that reside near PrPC in the cell, Schmitt-Ulms et al. (2004)
used time-controlled transcardiac perfusion cross-linking (tcTPC), a method that
combines transcardiac perfusion with mild formaldehyde cross-linking. More than 20
proteins were identified; most of these were either integral membrane proteins or
proteins that reside near the cell membrane, and some were components of the
secretory pathway. Six of the identified proteins are GPI-anchored (PrPC, N-CAM 1,
N-CAM 2, myelin-associated glycoprotein, contactin-1 and limbic system-associated
membrane protein), and two were previously identified partners of PrPC: the chaperone
BiP (Chapter 1.4.1) and APP-like proteins. Most of the identified proteins are involved in
cell adhesion and neuritic outgrowth. Although not all of them are necessarily genuine
interacting partners of PrPC, this analysis confirmed that PrPC is embedded within
specialized membrane microdomains (“rafts”) together with a defined subset of other
GPI-anchored molecules.
Morel et al. (2004) showed co-localization of PrPC and Src kinase at the junctional
complexes on the lateral membrane of enterocytes. A pool of Src also co-precipitated
with anti-PrPC antibodies, and vice versa. Thus, PrPC could play a role in intercellular
signalling and/or sensing of neighbouring cells through an interaction with Src kinases
(Fyn tyrosine kinase is a member of the Src family; see below).
2.5.2.6 PrPC is Involved in Lymphocyte Activation
PrPC is expressed at high levels in T cells, B cells, monocytes and dendritic cells (Li et
al., 2001), and several observations suggest that it is involved in T cell activation. The
composition of N-linked glycans on PrPC from these cells differs from that on PrPC from
brain or neuroblastoma cells. The level of PrPC on the surface of T cells increased upon
cellular activation (Chapter 2.3), and memory T cells express more PrPC than naïve
T cells. Anti-PrPC antibodies inhibited the proliferation of T cells in vitro. Thus PrPC
may be involved in the activation of T cells.
PrPC is closely associated with Fyn in lymphoblastoid T cells (Mattei et al., 2004).
PrPC clustered within glycosphingolipid-enriched membrane microdomains (GEMs),
where it interacted strongly with the gangliosides GM1 and GM3. GM3 is the main
constituent of GEMs, where it modulates signal transduction. The protein tyrosine
kinase ZAP-70 was also found to interact with PrPC after T cell activation mediated by
CD28 and CD3. ZAP-70 has a key role in the GEM-associated signalling pathways
leading to T cell activation, so PrPC could be a component of the signalling complex
leading to T cell activation.
Finally, after hypothermal stimulation of the human lymphocyte cell line Jurkat E6.1,
PrPC co-localized with the CD3 and GM1 in the lipid rafts (Wurm et al., 2004).
Thus, PrPC could be involved in activation of T cells.
2.5.2.7 PrPC Participates in Nucleic Acid Metabolism
PrPC has nucleocapsid protein-like properties (Gabus et al., 2001). Human PrPC
mimicked the chaperone properties of the HIV-1 nucleocapsid protein NCp7 by actively
assisting the annealing of complementary nucleic acid strands, viral RNA dimerization,
hybridisation of the replication primer tRNALys to the primer binding site near the
5' end of the HIV-1 genome, and initiation of reverse transcription by reverse
transcriptase. The
transmembrane or the cytoplasmic PrP entities (Chapter 2.4) could interact with cellular
and/or viral nucleic acids.
2.5.2.8 PrPs are Memory Molecules
PrP conformations other than PrPC and PrPSc could exist (Tompa and Friedrich,
1998). The self-sustaining autocatalytic propagation of these states may determine the
normal PrP function. A kinetic model was proposed in which PrP forms a bi-stable
molecular switch that can structurally encode and stably store information. Such a
mechanism could control a range of physiological processes, including the formation of
memory. The mechanism of long-term synaptic stabilization mediated by the neuronal
isoform of CPEB from the sea hare Aplysia shows similarities with this model
(Chapter 1.5.2).
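The idea of a bi-stable molecular switch can be made concrete with a generic positive-feedback system. The Python sketch below is not the kinetic model of Tompa and Friedrich (1998); it is a textbook-style toy in which `x` is the amount (arbitrary units) of PrP in a hypothetical self-propagating conformation, formed autocatalytically at a sigmoidal rate and lost at a first-order rate. The functions `dxdt` and `simulate`, all parameter values and the stimulus pulses are assumptions, chosen so that the system has two stable states (x = 0 and x = 2) separated by an unstable threshold at x = 0.5.

```python
"""Toy bistable switch driven by autocatalytic positive feedback.

Generic illustration only; NOT the kinetic model of Tompa and Friedrich (1998).
x: amount (arbitrary units) of PrP in a hypothetical self-propagating conformation.
All parameter values below are assumptions.
"""


def dxdt(x: float, stimulus: float = 0.0,
         b: float = 1.0, k: float = 1.0, d: float = 0.4) -> float:
    """Rate of change of x.

    stimulus           : external drive towards the alternative conformation
    b*x^2/(k^2 + x^2)  : sigmoidal autocatalytic conversion (positive feedback)
    d*x                : first-order reversion/clearance
    With b=1, k=1, d=0.4 and no stimulus, the fixed points are x = 0 (stable),
    x = 0.5 (unstable threshold) and x = 2 (stable).
    """
    return stimulus + b * x**2 / (k**2 + x**2) - d * x


def simulate(pulse: float, t_end: float = 100.0, dt: float = 0.01,
             pulse_start: float = 10.0, pulse_len: float = 5.0,
             x0: float = 0.0) -> float:
    """Forward-Euler integration with a transient stimulus; returns x at t_end."""
    x = x0
    for i in range(int(round(t_end / dt))):
        t = i * dt
        # The stimulus is applied only during the window [pulse_start, pulse_start + pulse_len).
        s = pulse if pulse_start <= t < pulse_start + pulse_len else 0.0
        x += dt * dxdt(x, s)
    return x


if __name__ == "__main__":
    for pulse in (0.0, 0.05, 0.5):
        print(f"pulse={pulse:<4} final state x = {simulate(pulse):.3f}")
```

A weak transient stimulus (`pulse=0.05`) never pushes `x` past the threshold, so the system relaxes back to x ≈ 0 once the stimulus ends; a stronger one (`pulse=0.5`) carries `x` beyond 0.5, after which the system settles at x ≈ 2 and stays there without further input. That persistence after the stimulus has gone is the sense in which such a switch can "store" information.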
2.5.2.9 PrPC is a Signal Transduction Protein
The final hypothesis I will discuss is that PrPC could be a signal transduction protein.
This is supported by findings of interactions between PrPC and proteins involved in
signal transduction. Antibody cross-linking of PrPC in mouse 1C11 neuronal cells
triggers activation of the Fyn tyrosine kinase (Mouillet-Richard et al., 2000). In the
mouse hippocampus, Fyn contributes to the molecular mechanisms for induction of
long-term potentiation (a long-lasting enhancement of synaptic transmission thought to
be the cellular basis for learning and memory) (Kojima et al., 1997). The 1C11 cell line
is a neuroectodermal progenitor that, depending on the inducers, differentiates into
either 1C11*/5-HT serotonergic cells or 1C11**/NE noradrenergic cells. PrPC is
expressed in both progenitor and differentiated cells. PrPC cross-linking did not trigger
a response in the progenitor cells. In the differentiated cells, however, Fyn was
dephosphorylated and its kinase activity increased 10 min after ligation of the anti-PrPC
antibodies 1A8 and SAF61. Progenitor and differentiated cells have similar amounts of
PrPC, but PrPC-dependent signalling required differentiation and full acquisition of
neuron-associated functions. In the differentiated cells, PrPC co-immunoprecipitated
with caveolin-1. Antibodies against caveolin-1 inhibited the PrPC-mediated activation of
Fyn, indicating involvement of caveolin-1 in the coupling of PrPC with Fyn. The
physiological extracellular signal leading to the activation of PrPC is unknown.
Although PrPC was abundant in both cell bodies and neurite extensions, Fyn activation
was mostly attributable to neuritic PrPC. Thus, PrPC may be involved in the modulation
of neuronal functions.
Recombinant bovine PrP (residues 25-242) interacts with the catalytic α/α’ subunits of protein kinase CK2 (Meggio et al., 2000), a pleiotropic protein kinase that is abundant in brain. CK2 phosphorylates more than 200 substrates, most of which are involved in signal transduction and gene expression. The association between CK2 and PrP stimulated CK2 phosphotransferase activity. Both the N-terminal and C-terminal parts of recombinant PrP were involved in this activation, but the N-terminus was more important. A fraction of CK2 is found outside the cell (ecto-CK2), so CK2 could contact PrPC on the outer side of the cell membrane, where PrPC might stabilize the active conformation of CK2.
Recombinant mouse PrP (residues 23-231) was used as bait to screen a mouse brain cDNA expression library in the yeast two-hybrid system (Spielhaupter and Schätzl, 2001), leading to identification of the neuronal phosphoprotein synapsin Ib, the adaptor protein Grb2 and the uncharacterized prion interactor Pint1 as potential partners of PrPC. These interactions were confirmed by co-immunoprecipitation assays. Synapsin Ib and Grb2 interacted with both the N- and C-terminal parts of PrP, but Pint1 interacted with the C-terminal part only. PrPC co-fractionated with synapsin Ib and Grb2 in microsomal preparations, indicating that these proteins interact in intracellular (presumably Golgi) vesicles. Pint1 is a newly discovered protein with homologues in human and C. elegans. Synapsins reversibly attach synaptic vesicles to the cytoskeleton and regulate their release, so the interaction between synapsin Ib and PrPC may contribute to the regulation of cell-cell contact and extracellular signalling. Grb2 is an adaptor involved in intracellular signal transduction that links signals from extracellular proteins to their intracellular effectors. Interactions between PrPC and these signalling proteins suggest a role for PrPC in signal transduction.
When PrPC was stimulated with various anti-PrPC antibodies in 1C11 progenitor and differentiated cells, neurohypothalamic GT1-7 cells and T lymphoid BW5147 cells (Schneider et al., 2004), it triggered production of NADPH oxidase-dependent reactive oxygen species (ROS) and phosphorylation of extracellular signal-regulated kinases 1 and 2 (ERK1 and ERK2), two mitogen-activated protein kinases (MAPKs). PrPC activation led to phosphorylation of the p47phox subunit of NADPH oxidase, a substrate of protein kinase C (PKC). Inhibition of NADPH oxidase with diphenyleneiodonium (DPI) abolished ROS production following PrPC activation, indicating that NADPH oxidase-dependent ROS production is part of PrPC-mediated signalling. ROS act as chemical mediators in many signalling processes, such as regulation of transcription factors and activation of kinases, including the MAPK family. After PrPC activation, the ERKs, but not the other MAPKs, c-Jun NH2-terminal kinase or p38MAPK, were phosphorylated (activated). In the neuronal context, ERKs modulate long-term synaptic facilitation (synaptic plasticity): they activate CREB-1-mediated gene transcription (Martin et al., 1997; Si et al., 2003a; Chapter 6.5). Phosphorylation of ERKs is regulated by ROS production in 1C11 progenitor, GT1-7 and BW5147 cells, even though GT1-7 and BW5147 cells lack caveolin-1. In the differentiated 1C11 cells, but not in the other cells, both ROS production and phosphorylation of ERKs were specifically controlled by activation of the Fyn tyrosine kinase. Thus, PrPC contributes to signalling networks in neuronal, neuroendocrine and lymphoid cells (Figure 2.4).
Using Affymetrix oligonucleotide microarrays, Mody et al. (2001) analysed patterns of gene expression in the developing mouse hippocampus. Of 11000 genes, 1926 showed dynamic changes across five timepoints marking major developmental events: embryonic day 16 (E16), corresponding to the proliferation of neurons, and postnatal days 1, 7, 16 and 30 (P1, P7, P16, P30), corresponding to the outgrowth and differentiation of neurons (P1, P7), formation of synapses (P16) and maturation of synaptic function (P30). The genes showed 16 different expression patterns (c0-c15) of four major types: type I, showing overall age-dependent down-regulation (c0, c1, c5); type II, showing general age-dependent up-regulation to peak levels at P16 or P30 (c10, c11, c14, c15); type III, showing peak expression at either P1 or P7 (c4, c8, c9, c12, c13); and type IV, showing minimal expression at either P1 or P7 (c2, c3, c6). This clustering correlated with the major developmental changes. For instance, the c1 genes, highly expressed at E16, were switched off after birth. The Prnp gene belonged to the type II cluster c15, showing the highest expression at P30, when hippocampal synapses become more active and begin to exhibit increased synaptic plasticity. The other genes sharing the Prnp expression profile were related to the maturation of synaptic function, including genes involved in synaptic function, signal transduction, control of transcription and translation, glucose and oxidative metabolism and membrane regulation of ionic concentration. Of particular note here is that the genes encoding the PKC subunit βII and MEK protein kinase, which are involved in PrPC-induced signalling (Figure 2.4), clustered together with Prnp within c15.
This clustering of Prnp, and of genes in its signalling pathway, with genes contributing to mature synaptic function suggests that PRNP is involved in synaptic plasticity (Chapter 6).
[Figure 2.4 diagram: nodes PrP, Cav, Fyn, NOx, PKC, Shc, Grb2, Ras, Raf, MEK and ERK, with an unknown upstream signal ("Signal(s) ?") and unknown steps ("?"); separate panels for 1C11, GT1-7 and BW5147 cells and for 1C11*/5-HT and 1C11**/NE cells.]
Figure 2.4: Model of the proposed PrPC-associated signalling pathways. 1C11, progenitor neuroectodermal cells; GT1-7, hypothalamic cell line; BW5147, T lymphocyte cell line; 1C11*/5-HT, differentiated serotonergic cells; 1C11**/NE, differentiated noradrenergic cells; PKC, protein kinase C; NOx, NADPH oxidase; MEK, MEK1 and MEK2 kinases; ERK, ERK1 and ERK2 kinases; Cav, caveolin 1b; Fyn, Fyn tyrosine kinase; Shc, Grb2, Ras, Raf, Shc-Grb2/SOS-Ras-Raf signalling cascade (modified from Schneider et al., 2004).
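The four expression types described by Mody et al. (2001) can be made concrete with a small rule-based classifier. This is a hypothetical sketch, not the clustering procedure used in that study: it assigns a profile measured at E16, P1, P7, P16 and P30 to a type from the timepoints of its maximum and minimum. The order of the checks (types III, IV, II, then I) is an assumption needed to resolve profiles that fit more than one description, and the function name and input values are illustrative.

```python
# Hypothetical rule-based assignment of a five-point expression profile to
# the four broad types (I-IV) described by Mody et al. (2001).
# Illustrative only: this is NOT the clustering method used in that study.
TIMEPOINTS = ["E16", "P1", "P7", "P16", "P30"]


def expression_type(profile):
    """Return 'I', 'II', 'III' or 'IV' for one value per timepoint."""
    if len(profile) != len(TIMEPOINTS):
        raise ValueError("expected one expression value per timepoint")
    peak = TIMEPOINTS[profile.index(max(profile))]
    trough = TIMEPOINTS[profile.index(min(profile))]
    if peak in ("P1", "P7"):
        return "III"  # transient peak during neurite outgrowth
    if trough in ("P1", "P7"):
        return "IV"   # transient minimum at P1 or P7
    if peak in ("P16", "P30"):
        return "II"   # age-dependent up-regulation
    return "I"        # highest at E16: age-dependent down-regulation


# A Prnp-like profile rising to a maximum at P30 (made-up values).
print(expression_type([1.0, 1.5, 2.0, 3.0, 4.0]))  # II
```

Checking the peak before the trough means a profile with a transient maximum at P1 or P7 is always type III, whatever its minimum.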
Comparative genomics is one strategy for understanding gene function. In Chapter 6, I use this approach to analyse the elusive function of the PRNP gene, and my analysis best supports the signal transduction hypothesis.
2.6 Genomes: Digging Out the Gems
A major impetus for sequencing the human genome was its potential for discovery of new human genes related to known disease-associated genes. Study of such genes may shed light on the function of their disease-causing counterparts, reveal the basis for related diseases, uncover potential drug targets and provide new insights into mechanisms of disease pathogenesis. Further, genomic sequence allows rapid in silico discovery of paralogues of classic drug target proteins. There are also numerous similar applications in basic physiology and cell biology.
By August 2004, in addition to the human genome, 167 genomes had been completely sequenced, including only five other vertebrates: mouse, rat, Fugu, chicken and chimpanzee (Genome News Network; Table 2.1). More than 30 genomes were completed in 2004 up to 19 August (Genome News Network). Genomic sequences are deposited in public biological databases, and comparison of genomes is a strategy to discover new genes, define gene regulatory elements and understand genome evolution and gene function.
2.6.1 The Human Genome
The human genome provides evidence of our evolutionary history (Lander et al., 2001).
Clues about human development, physiology and evolution are all encrypted within the
2.9 Gb of DNA. Basic features of the broad genome landscape are the gene content,
distribution of GC content and CpG islands, distribution of repeats and recombination
rate.
Table 2.1: 167 sequenced genomes (Genome News Network, 24 August 2004)
Aeropyrum pernix; Agrobacterium tumefaciens; Anabaena; Anopheles gambiae; Apis mellifera; Aquifex aeolicus; Arabidopsis thaliana; Archaeoglobus fulgidus; Ashbya gossypii; Bacillus anthracis; Bacillus cereus; Bacillus halodurans; Bacillus subtilis; Bacteroides thetaiotaomicron; Bartonella henselae; Bartonella quintana; Bdellovibrio bacteriovorus; Bifidobacterium longum; Blochmannia floridanus; Bordetella bronchiseptica; Bordetella parapertussis; Bordetella pertussis; Borrelia burgdorferi; Bradyrhizobium japonicum; Brucella melitensis; Brucella suis; Buchnera aphidicola; Caenorhabditis briggsae; Caenorhabditis elegans; Campylobacter jejuni; Candida glabrata; Caulobacter crescentus; Chlamydia muridarum; Chlamydia trachomatis; Chlamydophila caviae; Chlamydophila pneumoniae; Chlorobium tepidum; Chromobacterium violaceum; Ciona intestinalis; Clostridium acetobutylicum; Clostridium perfringens; Clostridium tetani; Corynebacterium diphtheriae; Corynebacterium efficiens; Coxiella burnetii; Cyanidioschyzon merolae; Debaryomyces hansenii; Deinococcus radiodurans; Desulfovibrio vulgaris; Drosophila melanogaster; Encephalitozoon cuniculi; Enterococcus faecalis; Erwinia carotovora; Escherichia coli; Fugu rubripes; Fusobacterium nucleatum;
Gallus gallus; Geobacter sulfurreducens; Gloeobacter violaceus; Guillardia theta; Haemophilus ducreyi; Haemophilus influenzae; Halobacterium; Helicobacter hepaticus; Helicobacter pylori; Homo sapiens; Kluyveromyces waltii; Lactobacillus johnsonii; Lactobacillus plantarum; Lactococcus lactis; Leptospira interrogans; Listeria innocua; Listeria monocytogenes; Magnaporthe grisea; Mesorhizobium loti; Methanobacterium thermoautotrophicum; Methanococcoides burtonii; Methanococcus jannaschii; Methanococcus maripaludis; Methanogenium frigidum; Methanopyrus kandleri; Methanosarcina acetivorans; Methanosarcina mazei; Mus musculus; Mycobacterium bovis; Mycobacterium leprae; Mycobacterium paratuberculosis; Mycobacterium tuberculosis; Mycoplasma gallisepticum; Mycoplasma genitalium; Mycoplasma mycoides; Mycoplasma penetrans; Mycoplasma pneumoniae; Mycoplasma pulmonis; Mycoplasma mobile; Nanoarchaeum equitans; Neisseria meningitidis; Neurospora crassa; Nitrosomonas europaea; Oceanobacillus iheyensis; Onion yellows phytoplasma; Oryza sativa; Pan troglodytes; Pasteurella multocida; Phanerochaete chrysosporium; Photorhabdus luminescens; Picrophilus torridus; Plasmodium falciparum; Plasmodium yoelii yoelii; Porphyromonas gingivalis; Prochlorococcus marinus; Protochlamydia amoebophila;
Pseudomonas aeruginosa; Pseudomonas putida; Pseudomonas syringae; Pyrobaculum aerophilum; Pyrococcus abyssi; Pyrococcus furiosus; Pyrococcus horikoshii; Pyrolobus fumarii; Ralstonia solanacearum; Rattus norvegicus; Rhodopirellula baltica; Rhodopseudomonas palustris; Rickettsia conorii; Rickettsia prowazekii; Rickettsia sibirica; Saccharomyces cerevisiae; Saccharopolyspora erythraea; Salmonella enterica; Salmonella typhimurium; Schizosaccharomyces pombe; Shewanella oneidensis; Shigella flexneri; Sinorhizobium meliloti; Staphylococcus aureus; Staphylococcus epidermidis; Streptococcus agalactiae; Streptococcus mutans; Streptococcus pneumoniae; Streptococcus pyogenes; Streptomyces avermitilis; Streptomyces coelicolor; Sulfolobus solfataricus; Sulfolobus tokodaii; Synechococcus; Synechocystis; Thermoanaerobacter tengcongensis; Thermoplasma acidophilum; Thermoplasma volcanium; Thermosynechococcus elongatus; Thermotoga maritima; Thermus thermophilus; Treponema denticola; Treponema pallidum; Tropheryma whipplei; Ureaplasma urealyticum; Vibrio cholerae; Vibrio parahaemolyticus; Vibrio vulnificus; Wigglesworthia glossinidia; Wolbachia pipientis; Wolinella succinogenes; Xanthomonas axonopodis; Xanthomonas campestris; Xylella fastidiosa; Yarrowia lipolytica; Yersinia pestis.
The six vertebrate genomes are Fugu rubripes, Gallus gallus, Homo sapiens, Mus musculus, Pan troglodytes and Rattus norvegicus.
2.6.1.1 Gene and Protein Content
Early estimates of the human gene number ranged from 30000 to 100000. Although the final count proved to lie near the lower end of this range, the average human gene is complex. The mean number of exons per gene is 8.8, and the mean size of internal exons is 145 bp. An average gene spans 27 kb. The mean intron size is 3365 bp, and the mean sizes of the 3’UTR and 5’UTR are 770 and 300 bp, respectively. The mean coding sequence is 1340 bp long, encoding a protein of about 447 amino acids. Approximately 35% of human genes were estimated to be alternatively spliced, with on average 3 distinct transcripts per gene. Gene density ranges from 6.4 genes/Mb (chromosome Y) to 26.8 genes/Mb (chromosome 19).
Protein-coding genes in the human genome were predicted from three lines of evidence: direct evidence of transcription (mRNAs, ESTs), indirect evidence (homology to previously identified genes and proteins) and ab initio prediction using software that recognizes functional signals in genes. Ab initio gene prediction methods correctly predict about 70% of individual exons but only 20% of individual genes in human. The first step of the gene prediction strategy used the Ensembl prediction system, which starts with ab initio prediction (the Genscan program) and confirms gene predictions by assessing similarity to known proteins, mRNAs, ESTs and protein motifs from any organism. The protein matches were then extended using the GeneWise program. This system yielded 35500 gene and 44860 transcript predictions. Frequent mistakes with this system are fragmentation, merging and overlapping of genes.
In the second step, Genie program predictions were combined with the Ensembl gene predictions. Genie starts with mRNA and EST matches and then uses hidden Markov model statistics for ab initio prediction to extend these matches in both the 3’ and 5’ directions. This strategy yielded fewer fragmented genes than the Ensembl system, merging 15437 Ensembl gene predictions into 9526 clusters.
In the final step, known genes from the RefSeq, SWISSPROT and TrEMBL databases
were incorporated into the results, producing a final estimate of 31000 coding genes in
the human genome, only about twice as many as in worm or fly. This includes about 15000 known genes and about 17000 gene predictions, which form a collection of anonymous genes and a rich resource for targeted gene discovery (Chapter 4). From this estimate, on average about 1.5% of the human genome is coding sequence. There are also several thousand non-coding genes in the human genome (tRNAs, rRNAs, spliceosomal RNAs, telomeric RNAs, snoRNAs, microRNAs, siRNAs and other non-coding genes of unknown function). Overall, about 30% of the genome is thought to be transcribed.
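The 1.5% figure can be checked with a back-of-envelope calculation from the mean values given above (31000 genes, a mean coding sequence of 1340 bp and a 2.9 Gb genome). The sketch below assumes that every gene contributes exactly one mean-length coding sequence, so it ignores alternative splicing and gene-to-gene variation; the function names are illustrative. It gives about 1.4%, close to the quoted estimate.

```python
# Back-of-envelope check: fraction of the human genome that is coding sequence,
# using the mean values quoted in this section (Lander et al., 2001).
GENOME_BP = 2.9e9     # genome size
GENE_COUNT = 31_000   # estimated protein-coding genes
MEAN_CDS_BP = 1_340   # mean coding sequence length per gene


def coding_fraction(genes, cds_bp, genome_bp):
    """Fraction of the genome occupied by coding sequence."""
    return genes * cds_bp / genome_bp


def codons(cds_bp):
    """Approximate number of codons in a coding sequence of cds_bp bases."""
    return round(cds_bp / 3)


frac = coding_fraction(GENE_COUNT, MEAN_CDS_BP, GENOME_BP)
print(f"coding fraction ~ {frac:.2%}")             # coding fraction ~ 1.43%
print(f"mean protein ~ {codons(MEAN_CDS_BP)} aa")  # mean protein ~ 447 aa
```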
The full set of known human proteins is more complex than those of invertebrates owing to vertebrate-specific protein domains and motifs: 7% of the InterPro families are vertebrate-specific, representing 70 protein families and 24 domain families. Vertebrates have arranged pre-existing protein components into a richer collection of domain architectures. Specifically, in comparison with worm and fly, the human genome contains more genes, domains, protein families, paralogues, multidomain proteins with multiple functions, and domain architectures.
2.6.1.2 GC Content and CpG Islands
There are GC-rich and GC-poor regions in the human genome. The genome-wide average GC content is 41%, ranging from 36% to 47.1% on a large scale (>10 Mb) and from 33.1% to 59% on a smaller scale. There is a strong positive correlation between GC content and gene density. The human genome contains 28890 CpG islands, short genomic regions with high GC content and an elevated frequency of CpG dinucleotides that are often associated with the 5’ ends of genes.
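CpG islands are usually identified by scanning the sequence in windows and testing GC content and the observed/expected CpG ratio. The sketch below uses the classic Gardiner-Garden and Frommer (1987) thresholds (windows of at least 200 bp, GC content of at least 50% and an observed/expected CpG ratio of at least 0.6); these thresholds are an assumption for illustration, not necessarily the exact criteria used to count the 28890 human CpG islands, and the function names are hypothetical.

```python
# Sliding-window CpG island scan using the classic Gardiner-Garden and
# Frommer (1987) thresholds: window >= 200 bp, GC >= 50%, CpG obs/exp >= 0.6.
# The thresholds are an illustrative assumption.


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg * len(seq) / (c * g)


def cpg_island_windows(seq, window=200, step=1, min_gc=0.5, min_oe=0.6):
    """Yield (start, end) for every window that meets both criteria."""
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        if gc_content(w) >= min_gc and cpg_obs_exp(w) >= min_oe:
            yield start, start + window


print(len(list(cpg_island_windows("CG" * 150))))  # 101 windows qualify
print(len(list(cpg_island_windows("AT" * 150))))  # 0
```

Overlapping qualifying windows would normally be merged into a single island; that step is omitted here for brevity.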
2.6.1.3 Repeat Content
Repeat sequences account for more than 50% of the human genome. The repeats are evidence of the evolutionary events and forces that have shaped the genome. As passive entities, they are markers for studies of mutation and selection. As active entities, they have reshaped the genome by causing rearrangements, forming new genes, reshuffling existing genes and modulating GC content.
Transposable elements comprise 45% of the genome. The currently recognized long interspersed elements (LINEs), short interspersed elements (SINEs), long terminal repeat (LTR) retrotransposons and DNA transposons comprise 20%, 13%, 8% and 3% of the human genome, respectively. Overall transposon activity has declined over the past 35-50 million years, with the possible exception of the 61 LINEs with intact ORFs. Repeat content varies remarkably across the genome, from less than 2% across the four 100 kb homeobox gene clusters to 89% across 525 kb of the X chromosome in region Xp11. The absence of repeats in a genomic region may indicate many cis-regulatory elements that cannot tolerate interruption by insertions.
Simple sequence repeats (SSRs) are perfect or imperfect tandem repeats of a particular k-mer. Microsatellites have a short k (1-13 bp) and minisatellites a longer k (14-500 bp). SSRs arise through DNA polymerase slippage and comprise 3% of the genome, with a frequency of about one SSR per 2 kb.
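Perfect tandem repeats of this kind can be located with a regular expression that uses a backreference to demand consecutive exact copies of a unit. The sketch below is a minimal illustration, not the method used to annotate the human genome: the unit-length limit (1-6 bp rather than the full 1-13 bp microsatellite range), the minimum of four copies and the function names are assumptions, and imperfect repeats are not detected.

```python
import re

# Minimal perfect-microsatellite finder: units of 1-6 bp repeated at least
# four times in tandem. Thresholds are illustrative assumptions; imperfect
# repeats are not detected.


def is_primitive(unit):
    """True unless unit is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(unit)
    return not any(n % p == 0 and unit[:p] * (n // p) == unit for p in range(1, n))


def find_microsatellites(seq, max_unit=6, min_copies=4):
    """Return sorted (start, end, unit, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # ([ACGT]{k}) captures a unit; \2{n,} demands n further exact copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(2)
            if is_primitive(unit):
                hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // k))
    return sorted(hits)


print(find_microsatellites("TTG" + "CA" * 6 + "GTT"))  # [(3, 15, 'CA', 6)]
```

The `is_primitive` check stops a homopolymer run such as AAAAAAAA from also being reported as an (AA)4 dinucleotide repeat.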
Segmental duplications, transfers of 1-200 kb blocks of genomic sequence, occur as interchromosomal duplications, when segments are distributed to nonhomologous chromosomes, and as intrachromosomal duplications, when duplications occur within a particular chromosome. These regions comprise 3.3% of the genome. Chromosomal regions near centromeres and telomeres consist almost entirely of interchromosomal duplicated segments.
2.6.1.4 Recombination Rate
The overall frequency of single nucleotide polymorphisms (SNPs) is roughly 1 in 1900 bp. Recombination rate varies across the genome. In general, it is higher in the distal regions of chromosomes (within about 20 Mb of the telomere) and on the shorter chromosome arms, consistent with at least one crossover per chromosome arm per meiosis. Recombination is suppressed near the centromeres.
2.6.1.5 Quality Assessment of the Human Genome Sequence
The agreed standards for the fidelity of the human genome sequence state that there should be fewer than one error per 10000 bases (99.99% accuracy) and that the sequence should be without gaps. Schmutz et al. (2004) performed a detailed evaluation of a 34 Mb sample of the human reference sequence. Sequence accuracy was above 99.99%, with an overall error rate of 1/73369. There was one significant error (a single error that causes 50 contiguous base pairs to be incorrect) per 2630005 base pairs.
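The relationship between the quoted error rate and the 99.99% standard is simple arithmetic. The sketch below, with a hypothetical helper name, converts an error rate of one error per N bases into accuracy and errors per megabase and checks it against the standard of fewer than one error per 10000 bases.

```python
# Convert "one error per N bases" into accuracy and errors per megabase,
# and compare with the standard of fewer than 1 error per 10,000 bases.
STANDARD_MAX_ERROR_RATE = 1 / 10_000  # 99.99% accuracy


def summarise(bases_per_error):
    """Hypothetical helper: summarise an error rate of 1 per bases_per_error."""
    rate = 1 / bases_per_error
    return {
        "accuracy": 1 - rate,
        "errors_per_Mb": rate * 1e6,
        "meets_standard": rate < STANDARD_MAX_ERROR_RATE,
    }


result = summarise(73_369)  # overall rate reported by Schmutz et al. (2004)
print(f"{result['accuracy']:.4%}, {result['errors_per_Mb']:.1f} errors/Mb, "
      f"meets standard: {result['meets_standard']}")
# 99.9986%, 13.6 errors/Mb, meets standard: True
```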
2.6.2 The Mouse Genome
Mouse is a key experimental tool for biomedical research (Waterston et al., 2002). The mouse genome is also important for comparative genomics: roughly 75 million years of independent evolution separate the human and mouse genomes, which now differ by nearly one substitution per two nucleotides in neutrally evolving sequence.
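As a rough consistency check, the neutral substitution rates quoted later in this section for human (2.2 x 10^-9 per year) and mouse (4.5 x 10^-9 per year; Table 2.2) can be combined with the 75 million year split. Substitutions accumulate independently along both lineages, so the expected divergence is the sum of the two rates multiplied by the time since the split. This simple model, with illustrative names, ignores multiple substitutions at the same site and changes in rate over time, so it is only an order-of-magnitude check of "nearly one substitution per two nucleotides".

```python
# Expected neutral divergence between two lineages since their split:
# d = (rate_a + rate_b) * t, in substitutions per site.
# Ignores multiple hits at the same site and rate changes over time.
HUMAN_RATE = 2.2e-9   # neutral substitutions per site per year
MOUSE_RATE = 4.5e-9
SPLIT_YEARS = 75e6


def expected_divergence(rate_a, rate_b, years):
    """Substitutions per site accumulated along both lineages since the split."""
    return (rate_a + rate_b) * years


d = expected_divergence(HUMAN_RATE, MOUSE_RATE, SPLIT_YEARS)
print(f"{d:.2f} substitutions per site")  # 0.50 substitutions per site
```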
The mouse genome (2.6 Gb) is 14% smaller than the human genome (Table 2.2), owing to a higher deletion rate in mouse. The mouse has a higher overall GC content (42%) and a tighter GC distribution. There are fewer CpG islands in the mouse genome (15500) than in human (28890).
Only 37.5% of the mouse genome can be recognized as transposon-derived, compared with 45% of the human genome. This is attributed to the higher nucleotide substitution rate in mouse, which makes ancient repeat sequences difficult to recognize. The neutral substitution rate in mouse (4.5 x 10^-9 per year) is twice that of human (2.2 x 10^-9 per year), perhaps because of differences in population size, body size or generation time. Repeats can therefore be recognized further back in time in human (150-200 million years) than in mouse (100-120 million years). Lineage-specific repeats account for 32.4% of the mouse genome, compared with 24.4% in human. The rate of transposition is constant in
Table 2.2: Vertebrate genomes in numbers (NA, not available)
Feature | Human | Mouse | Dog | Rat | Fugu
Genome size | 2.9 Gb | 2.6 Gb | 2.4 Gb | 2.7 Gb | 365 Mb
Gene number | 31000 | 31000 | NA | 31000 | 39000
Gene density | 6.4-26.8 genes/Mb | NA | NA | NA | 1 gene/10.9 kb
Average gene size | 27 kb | NA | NA | NA | NA
GC content | 41% | 42% | NA | 43% | 44.1-53.5%
Transposons | 45% | 37.5% | 31% | 40% | 2.7%
Substitution rate (per year) | 2.2 x 10^-9 | 4.5 x 10^-9 | (2.2 x 10^-9) | 4.9 x 10^-9 | NA
SNP frequency | 1/1900 bp | 1/600 bp | 1/1500 bp | NA | NA
mouse, although it has declined in humans. About 3000 individual LINEs, four SINE lineages and three LTR lineages are potentially active in the mouse genome. LINEs are biased toward AT-rich genome regions and SINEs toward GC-rich regions.
The SNP frequency in mouse is 1 per 500-700 bp. Mouse has roughly four-fold more short SSRs (1-5 bp units) than human.
Both the mouse and human genomes contain about 30000 protein-coding genes. About 80% of mouse genes have a single identifiable orthologue in human, and fewer than 1% of genes are unique to either genome. At the nucleotide level, the two genomes can be aligned across 40% of their lengths; these are orthologous sequences from the common ancestor that have been retained in both lineages. Over 90% of the human and mouse genomes can be partitioned into regions of conserved synteny (orthologous gene loci on the same chromosome in both species, regardless of gene order and the presence of intervening genes). In these genomic regions, gene order from the most recent common ancestor has been conserved in both species.
Approximately 5% of the mammalian genome is under purifying selection, more than its coding fraction (1.5%). This suggests that UTRs (1%), regulatory elements, non-protein-coding genes and chromosomal structural elements are also under functional selection.
The mammalian genome evolves in a non-uniform manner. All three forces that shape the genome (nucleotide substitution, deletion and insertion) vary substantially across it. Neutral substitution rate is correlated with recombination rate genome-wide.
Two general mechanisms drive protein innovation in eukaryotes: domains can be combined into new architectures, and gene families can expand in a lineage-specific manner. Many local gene family expansions have occurred in the mouse lineage, for example in genes involved in reproduction, immunity, development and olfaction.
The two-genome comparison between human and mouse allowed estimation of the rate of protein evolution. Measures of protein sequence evolution are percentage identity and the ratio between the rate of non-synonymous substitutions per non-synonymous site (KA) and the rate of synonymous substitutions per synonymous site (KS). In general, KA / KS < 1 indicates purifying selection, KA / KS = 1 neutral evolution, and KA / KS > 1 positive selection. For the 12845 pairs of mouse-human 1:1 orthologues, the median amino acid identity was 78.5%, and the median KA / KS ratio was 0.115. The major determinant of the KA / KS ratio was variation in KA; KS clustered tightly around 0.6 synonymous substitutions per synonymous site, indicating a similar neutral substitution rate among all proteins. Domains are under greater selective pressure than protein regions without domains, and catalytic domains are under greater selective pressure than non-catalytic domains. Domains in secreted proteins are typically under less purifying selection than those in nuclear or cytoplasmic proteins. Protein domain families involved in immunity and gene transcription showed the highest median KA / KS ratios.
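The conventional interpretation of KA / KS described above can be written as a small classification function. In this sketch the tolerance band around 1 is an arbitrary assumption (in practice, departures from neutrality are assessed statistically), and the function names are illustrative. The example combines the median KS (about 0.6) and the median KA / KS (0.115) only to show the order of magnitude of KA; the median of a ratio is not in general the ratio of medians.

```python
# Conventional reading of the KA/KS ratio. The tolerance band around 1 is an
# arbitrary assumption; real analyses use statistical tests instead.


def ka_ks_ratio(ka, ks):
    """KA/KS from non-synonymous (ka) and synonymous (ks) substitution rates."""
    if ks <= 0:
        raise ValueError("KS must be positive")
    return ka / ks


def selection_regime(omega, tolerance=0.05):
    """Map a KA/KS value to purifying, neutral or positive selection."""
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if omega < 1.0 else "positive"


# With KS ~ 0.6, a KA/KS of 0.115 implies KA ~ 0.069.
omega = ka_ks_ratio(0.069, 0.6)
print(round(omega, 3), selection_regime(omega))  # 0.115 purifying
```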
2.6.3 The Rat Genome
Rat is an important model in experimental medicine and drug discovery (Gibbs et al., 2004). It is separated from mouse by 12-24 million years and from human by about 75 million years. As the third mammalian genome to be sequenced, it allowed three-way comparisons that resolved new details of mammalian evolution.
The rat genome (2.7 Gb) is smaller than the human genome but larger than the mouse genome (Table 2.2). The size difference between the two rodents is due to differences in repeat content and in the proportion of segmental duplications.
The number of genes encoded by the rat genome is similar to that in mouse and human (about 30000). Most genes (90%) have had no deletion or duplication since the last common ancestor, and intron structures have been conserved as well. Coding density is about 1.7%. There are 435 tRNA genes and 454 other known non-coding RNA genes defined in the rat genome, and 15975 CpG islands. The rat GC content (about 43%; Table 2.2) is 0.35% higher than that of mouse, owing to a higher rate of A to G transitions than of T to C transitions. There is also an excess of G+T over C+A on the coding strand (strand asymmetry).
In protein-coding sequences there is an overall excess of small deletions over insertions. Based on three-species comparisons, the rates of indel accumulation in nuclear, accumulated/secreted, mitochondrial and cytoplasmic proteins, enzymes and ligand-binding proteins are 4 x 10^-4, 3.9 x 10^-4, 3.1 x 10^-4, 2.4 x 10^-4, 2.1 x 10^-4 and 1.4 x 10^-4, respectively. Whereas transmembrane protein regions were the most refractory to indel accumulation, low-complexity protein regions were three-fold enriched in indels.
Almost all human disease-associated genes have 1:1 orthologues in the rat genome and are unlikely to have diverged, been duplicated or been lost. However, their rates of synonymous substitution are higher than those of the remaining genes. Some rat-specific genes arose through expansion of gene families, including genes encoding pheromones, immunity-related proteins and proteins involved in chemosensation and detoxification.
About 3% of the genome lies in large segmental duplications, associated primarily with pericentromeric and subtelomeric regions. These regions harbour many recently expanded gene families. In rat, intrachromosomal duplications occur three times more frequently than interchromosomal duplications.
Roughly 40% of the rat genome aligns with both human and mouse, and this fraction contains the vast majority of exons and regulatory elements. A portion of this eutherian core, amounting to 5-6% of the genome, is under selective constraint. About 28% of the rat genome aligns only with mouse. This fraction contains rodent-specific repeats (40%), and the rest may be single-copy DNA deleted in the human lineage. Nearly one third (29%) of the rat genome aligns with neither human nor mouse. Half of this sequence consists of rat-specific repeats, and about a third consists of rodent-specific repeats deleted in mouse.
There have been 250 genome rearrangements in the rodent lineage since the evolutionary split between rodents and human. The neutral substitution rate appears to be three times higher in rodents than in human, with that in rat 5-10% higher than in mouse. Microdeletions occur at a two-fold higher rate in rodents than in human. In the rat genome, the local rates of microinsertions, microdeletions, transposable element insertions and nucleotide substitutions are correlated.
Males show a two-fold excess of nucleotide substitutions and small indels (<50 bp), mirroring the ratio of the numbers of cell divisions in the male and female germlines.
About 40% of the rat genome is derived from transposable elements. The LINEs
comprise 22% of the genome, with the L1 family still active. Two SINE families, B2
and ID, are also active, as well as all three classes of LTR retroviral elements. The DNA
transposons are inactive.
2.6.4 The Fugu rubripes Genome
The tiger pufferfish, Fugu rubripes, has one of the smallest vertebrate genomes, but its gene repertoire is similar to that of mammals (Venkatesh et al., 2000). It could therefore be a useful reference genome for the discovery of genes and conserved regulatory elements.
Although the compact genome of Fugu rubripes is only 365 Mb (Aparicio et al., 2002), its number of protein-coding genes is comparable to that of human (Table 2.2). A total of 31059 genes were predicted in the Fugu genome, with the upper bound of gene loci expected to reach 38000-40000. Because few cDNAs were available, genes were predicted mostly from homology evidence.
Only 2.7% of the genome matched interspersed repeats, but this is probably a significant underestimate owing to the incompleteness of the Fugu repeat database. Rapid deletion of nonfunctional sequences may account for the repeat structure of Fugu. On the other hand, transposable elements in Fugu appear to be active: at least 40 families of transposable elements have accumulated fewer than 5% substitutions, suggesting recent activity.
The compactness of the Fugu genome is due to a reduction in the size of introns and intergenic regions. Roughly 75% of introns are <425 bp long, yet the number of introns (161536) is very similar to that in human. Both gain and loss of introns were observed in the Fugu lineage. "Giant" genes were also noted in Fugu. The average gene density was estimated at one gene locus per 10.9 kb, and gene loci occupy one third of the genome.
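The Fugu gene-density figures can be cross-checked with simple arithmetic. Dividing the 365 Mb genome by one locus per 10.9 kb implies roughly 33500 gene loci, between the 31059 predicted genes and the expected upper bound of 38000-40000; if loci occupy one third of the genome, the implied mean locus span is about 3.6 kb. This back-of-envelope sketch assumes uniform gene density across the genome.

```python
# Rough consistency check of the Fugu gene-density figures quoted above.
# Assumes gene loci are spread uniformly across the genome.
GENOME_BP = 365e6        # Fugu genome size
BP_PER_LOCUS = 10.9e3    # one gene locus per 10.9 kb
LOCUS_FRACTION = 1 / 3   # gene loci occupy about one third of the genome

loci = GENOME_BP / BP_PER_LOCUS
mean_locus_bp = GENOME_BP * LOCUS_FRACTION / loci

print(f"implied gene loci: {loci:,.0f}")                        # 33,486
print(f"implied mean locus span: {mean_locus_bp / 1e3:.1f} kb")  # 3.6 kb
```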
The overall Fugu GC content varies much less (44.1-53.5%) than that of human.
With windows of 1, 0.5 and >1 kb, roughly 0.15, 1.3 and 5% of the Fugu genome, respectively, contained duplicated segments, indicating that large duplications are not a recent feature of the Fugu genome. However, the existence of paralogous segments provides evidence for ancient duplications.
Most human peptides (75%) have some match in Fugu, while about 6000 Fugu proteins have no match in human. Predicted protein classifications are broadly concordant between human and Fugu. Exceptions include an excess of potassium channel subunits and kinases in Fugu, and an excess of C2H2 zinc finger proteins in human. Olfactory receptors show a clear expansion of different families in Fugu.
Many short genomic segments have remained conserved between human and Fugu over 450 million years of independent evolution. However, human-Fugu comparisons also often revealed scrambled gene order, to an extent depending on chromosome length.
2.6.5 The Dog Genome
Dog is an attractive choice for genetic comparisons because the characteristics of about 300 breeds are maintained by restricted gene flow between breeds. The dog genome was sequenced to 1.5-fold coverage, yielding 6.22 million reads and covering about 50% of the 4.8 Gb diploid genome (Kirkness et al., 2003). This limited depth of sequencing permits some initial analyses.
The dog genome is estimated to be 2.3-2.4 Gb (Table 2.2). The 6.22 million reads were assembled into 522011 contigs with a mean span of 8.6 kb and random sequence coverage of about 77%.
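The 77% random coverage is close to what the Lander-Waterman (Poisson) model of shotgun sequencing predicts for 1.5-fold coverage: the expected fraction of bases covered by at least one read is 1 - e^(-c) for mean coverage c, which gives about 77.7%. The sketch below also derives the mean read length implied by 6.22 million reads at 1.5-fold coverage, assuming coverage is measured against a 2.4 Gb haploid genome; both calculations are idealised, ignore repeats and cloning bias, and use illustrative function names.

```python
import math

# Lander-Waterman (Poisson) expectation for random shotgun sequencing: with
# mean coverage c, the expected fraction of bases covered at least once is
# 1 - exp(-c). This idealised model ignores repeats and cloning bias.


def covered_fraction(coverage):
    """Expected fraction of the genome covered by at least one read."""
    return 1.0 - math.exp(-coverage)


def implied_read_length(coverage, genome_bp, n_reads):
    """Mean read length from coverage = n_reads * read_length / genome size."""
    return coverage * genome_bp / n_reads


print(f"{covered_fraction(1.5):.1%}")                   # 77.7%
print(round(implied_read_length(1.5, 2.4e9, 6.22e6)))   # 579 (bp)
```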
Roughly 31% of the sequence is repeat-derived (compared with 45% in human and 38% in mouse), but the dog repeat libraries may be less complete than those for human and mouse. Substitution levels were similar in dog and human.
The dog-human alignments covered 18473 genes. Compared with human, dog appears to have much larger complements of olfactory receptor genes and of genes involved in peptide metabolism.
The SNP frequency in dog was estimated to be about 1/1500 bp.
Many sequences in the dog genome differed in the presence or absence of a SINE insertion, and such polymorphisms were verified in a number of dog breeds. Approximately 7% of the 23000 SINE_cF elements are dimorphic in the sequenced poodle, and these are a valuable resource for phylogenetic studies. Such insertional dimorphisms may have dramatic phenotypic effects (e.g. canine narcolepsy), contributing to the phenotypic diversity among dog breeds.
At present, besides the published human, mouse, rat, dog and Fugu genome analyses, sequences of the chimpanzee and chicken genomes are also available (Ensembl). Sequencing and assembly of the Tetraodon and zebrafish genomes are nearly complete (Genoscope; Ensembl). Further, sequencing of the cow, pig and Brazilian opossum genomes is underway (NCBI). The National Human Genome Research Institute (NHGRI, USA) has approved funding to sequence the genomes of the African savannah elephant, the European common shrew, the guinea pig, two species of hedgehog, the nine-banded armadillo, the rabbit, the cat and the orang-utan (Genome News Network). These vertebrate genomes are an invaluable resource for the discovery of genes and the definition of gene regulatory sequences.
2.6.6 Annotation of Genomic Sequences
Automatic annotation is a major strategy for annotating genomic sequences (Chapter 2.6.1), but it remains a work in progress. The main problems are errors arising from automatic annotation and the inability of current programs to predict UTRs and non-protein-coding genes. Furthermore, the collections of transcripts and ESTs available as supporting evidence are limited.
Guigo et al. (2003) developed a two-stage procedure for predicting multi-exon genes that exploits the availability of both human and mouse genomic sequences. In the first stage, gene-prediction programs (TWINSCAN, SGP2) that combine genome alignment with the detection of statistical patterns in DNA are run. In the second stage, the multi-exon genes predicted in human and mouse are compared, and a prediction is retained only if the predicted proteins in both species align and have at least one predicted intron at the same position. A total of 1019 new genes were predicted with this method. The success rate of these predictions was 76%, as estimated by RT-PCR and direct sequencing of a single exon pair from a sample of the predicted genes. Analysis of gene expression patterns suggested that this approach may be particularly sensitive to genes with tissue-restricted expression.
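The second-stage filter described above can be sketched in Python as follows. This is a simplified illustration, not Guigo et al.'s implementation: it assumes the human and mouse predicted proteins have already been aligned (gap character `-`), records intron positions at residue rather than nucleotide resolution (intron phase is ignored), and uses a hypothetical 50% identity cut-off to decide that the proteins "align". The class and function names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class PredictedGene:
    """Hypothetical record for one multi-exon prediction (e.g. from SGP2 or TWINSCAN)."""
    name: str
    # 0-based index of the protein residue immediately before each predicted intron
    intron_after_residue: list[int] = field(default_factory=list)

def residue_columns(aligned_protein: str) -> dict[int, int]:
    """Map each residue index of the ungapped protein to its alignment column."""
    columns, residue = {}, 0
    for column, aa in enumerate(aligned_protein):
        if aa != "-":
            columns[residue] = column
            residue += 1
    return columns

def identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of identical residues among columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0

def retain(human: PredictedGene, mouse: PredictedGene,
           aligned_human: str, aligned_mouse: str,
           min_identity: float = 0.5) -> bool:
    """Keep a human/mouse prediction pair only if the proteins align and share
    at least one intron at the same alignment column (phase is ignored here)."""
    if len(aligned_human) != len(aligned_mouse):
        raise ValueError("aligned sequences must have equal length")
    if identity(aligned_human, aligned_mouse) < min_identity:
        return False
    h_cols = residue_columns(aligned_human)
    m_cols = residue_columns(aligned_mouse)
    h_introns = {h_cols[r] for r in human.intron_after_residue if r in h_cols}
    m_introns = {m_cols[r] for r in mouse.intron_after_residue if r in m_cols}
    return bool(h_introns & m_introns)
```

For example, with the aligned proteins `MKV-LLAG` (human) and `MKVQLLSG` (mouse), a human intron after residue 3 and a mouse intron after residue 4 fall in the same alignment column, so `retain` returns `True`; moving the mouse intron to residue 2 makes it return `False`.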
Transcripts and ESTs are still missing from the human collections. Ota et al. (2004) sequenced 21243 full-length human cDNAs, of which 14490 were unique. Of these, 5416 were protein-coding cDNAs, and 1999 of the protein-coding clusters had not been predicted by computational methods. The GC content of this unpredicted category peaks at 58%, suggesting that current protein-coding gene predictions may be biased against GC-rich transcripts. The remaining cDNAs contained no ORF and may represent non-protein-coding genes.
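The GC-content distribution mentioned above is straightforward to compute. The minimal Python sketch below assumes transcripts are given as plain nucleotide strings and uses a hypothetical 2% bin width; it is not Ota et al.'s pipeline, and the function names are illustrative.

```python
from collections import Counter

def gc_counts(seq: str) -> tuple[int, int]:
    """Return (G+C count, A+C+G+T count); ambiguous bases such as N are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    total = gc + s.count("A") + s.count("T")
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc, total

def gc_content(seq: str) -> float:
    """GC fraction of a sequence, ignoring ambiguous bases."""
    gc, total = gc_counts(seq)
    return gc / total

def modal_gc_bin(seqs, bin_width: int = 2) -> int:
    """Lower edge (in whole percent) of the most populated GC-content bin.

    Integer arithmetic avoids floating-point rounding at bin edges.
    """
    bins = Counter()
    for seq in seqs:
        gc, total = gc_counts(seq)
        percent = 100 * gc // total
        bins[percent - percent % bin_width] += 1
    if not bins:
        raise ValueError("no sequences given")
    return bins.most_common(1)[0][0]
```

Binning on integer percentages rather than on floating-point fractions keeps a transcript with exactly 58% GC in the 58% bin instead of letting rounding push it into the bin below.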
Manual curation is at present the most reliable way to annotate genomic sequence. For example, the Vertebrate Genome Annotation database (VEGA; http://vega.sanger.ac.uk/) is a central repository for manual annotation of finished genome sequences from different vertebrates. Expert annotators correct mistakes arising from automatic gene prediction by integrating ab initio gene predictions, direct evidence, homology-based evidence and comparisons across multiple genomes.
This strategy can also be used in a targeted fashion to search multiple genomes for genes of interest and to compile supporting evidence. Using it to search genomic databases for predicted PRNP paralogues, I discovered a new human PRNP paralogue, dubbed the Shadow of prion protein gene (SPRN). I compiled direct evidence, ab initio gene predictions and homology-based evidence for this gene in mammals and fish (Chapter 4).
2.7 Comparative Genomic Analysis
With the availability of many genomic sequences, it is now possible to decipher the information encoded within DNA strands. Comparative genomic analysis is emerging as a major strategy for understanding genomes.
Functional sequences tend to evolve more slowly than non-functional sequences (Frazer et al., 2002). By comparing genomic sequences it is therefore possible to distinguish conserved, functional sequences (coding and non-coding) from the background of non-conserved, non-functional sequence. The depth of comparative analysis depends on the evolutionary distance between the sequences being compared. For instance, human-mouse comparisons (75 million years of divergence) revealed many conserved coding and non-coding regions, but could not discriminate which of the conserved non-coding regions are functional (Waterston et al., 2002). When more species are included in the comparison (e.g. human, mouse and cow), non-coding sequences conserved in all species are more likely to be functional. At the other extreme are comparisons between human and fish (450 million years). These primarily reveal conserved coding sequences, although conserved regulatory sequences can also be found.
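To make the idea of separating conserved sequence from background concrete, the sketch below scans a pairwise alignment with a sliding window and reports regions whose identity reaches a threshold. The 100 bp window and 70% identity cut-off are assumptions, chosen because they are a commonly used criterion in VISTA-style conservation plots; the function names are illustrative and are not part of any published tool.

```python
from itertools import accumulate

def window_identity(aligned_a: str, aligned_b: str, window: int = 100):
    """Yield (start_column, identity) for every window of a pairwise alignment.

    Gap columns count as mismatches; prefix sums make each window O(1).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = [int(a == b and a != "-")
               for a, b in zip(aligned_a.upper(), aligned_b.upper())]
    prefix = [0, *accumulate(matches)]
    for start in range(len(matches) - window + 1):
        yield start, (prefix[start + window] - prefix[start]) / window

def conserved_regions(aligned_a: str, aligned_b: str,
                      window: int = 100, min_identity: float = 0.70):
    """Merge overlapping windows at or above `min_identity` into
    half-open (start, end) column intervals."""
    regions = []
    for start, ident in window_identity(aligned_a, aligned_b, window):
        if ident < min_identity:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)  # extend the current region
        else:
            regions.append([start, end])               # start a new region
    return [tuple(r) for r in regions]
```

For a 300-column alignment whose middle 100 columns all mismatch, `conserved_regions` returns `[(0, 130), (170, 300)]`: windows are kept as long as at least 70 of their 100 columns fall in a matching flank.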
Computational tools have been developed for comparing and analysing genomic sequences (Frazer et al., 2003). There are two basic types of programs for aligning long genomic sequences: global and local. Global alignments are designed to produce an optimal similarity score over the entire lengths of the sequences compared. I used the global alignment tool VISTA in my work (Mayor et al., 2000; Chapter 3). The VISTA server implements the AVID algorithm, which first finds maximal exact matches between two sequences using a suffix tree and then identifies the best anchor points based on the length of the exact matches and the similarity of their flanking regions. Local alignments, on the other hand, produce optimal similarity scores between subregions of the sequences. I used the PipMaker program for local alignments in my analyses (Schwartz et al., 2000; Chapter 3). Its underlying algorithm, BLASTZ, is a gapped BLAST-type program that starts by finding short exact matches and then extends them into alignments that include gaps.
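The seed-and-extend idea behind BLASTZ can be illustrated with a deliberately simplified Python sketch: exact k-mer seeds are found with a hash index, and each seed is extended without gaps until the running score drops more than an X-drop threshold below its best value. Real BLASTZ is more elaborate (for example, it uses spaced seeds and gapped extension), so the scoring values, thresholds and function names here are hypothetical.

```python
from collections import defaultdict

def find_seeds(query: str, target: str, k: int = 12):
    """Return (query_pos, target_pos) for every exact k-mer shared by both sequences."""
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    return [(i, j)
            for i in range(len(query) - k + 1)
            for j in index.get(query[i:i + k], ())]

def extend_ungapped(query, target, i, j, k, match=1, mismatch=-1, x_drop=5):
    """Extend an exact seed right, then left, without gaps (X-drop rule).

    Returns (query_start, query_end, target_start, score); ends are exclusive.
    """
    best = running = k * match  # the seed itself is an exact match
    right = r = 0
    while i + k + r < len(query) and j + k + r < len(target):
        running += match if query[i + k + r] == target[j + k + r] else mismatch
        r += 1
        if running > best:
            best, right = running, r
        elif best - running > x_drop:
            break
    running, left, l = best, 0, 0  # restart from the best right extension
    while i - l - 1 >= 0 and j - l - 1 >= 0:
        running += match if query[i - l - 1] == target[j - l - 1] else mismatch
        l += 1
        if running > best:
            best, left = running, l
        elif best - running > x_drop:
            break
    return i - left, i + k + right, j - left, best

def local_hits(query, target, k=12, min_score=20):
    """Seed, extend and report hits as (q_start, q_end, t_start, t_end, score).

    Seeds lying inside an already extended hit on the same diagonal are skipped.
    """
    hits, covered = [], set()
    for i, j in find_seeds(query, target, k):
        diagonal = i - j
        if (diagonal, i) in covered:
            continue
        qs, qe, ts, score = extend_ungapped(query, target, i, j, k)
        covered.update((diagonal, p) for p in range(qs, qe))
        if score >= min_score:
            hits.append((qs, qe, ts, ts + (qe - qs), score))
    return hits
```

With a shared 20-base core flanked by unrelated bases (`"GGGG" + core + "GGGG"` against `"CCCCCC" + core + "CCCCCC"`), the nine overlapping seeds in the core collapse into a single hit covering exactly the core.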
Kellis et al. (2003) compared the genomes of four yeast species (S. cerevisiae, S. paradoxus, S. mikatae and S. bayanus) that diverged 5-20 million years ago. This comparative analysis allowed identification of genes and determination of gene structures, and also revealed gene regulatory elements. Furthermore, genes and genomic regions exhibiting fast or slow evolutionary change were identified.
Thomas et al. (2003) compared a 1.8 Mb region of human chromosome 7 harbouring 10 genes with the orthologous genomic regions from 12 other species: chimpanzee, baboon, cat, dog, cow, pig, rat, mouse, chicken, Fugu, Tetraodon and zebrafish. Together with human, these species span 450 million years of evolution. Conservation among these sequences reflected both functional constraints and neutral mutational events. Short genomic regions (58 bp on average) conserved across these sequences, called multi-species conserved sequences (MCSs), are candidates for functional elements. About 32% of MCSs correspond to coding sequences or UTRs, and the remaining 68% lie outside known exons, almost none of them corresponding to currently known regulatory elements; about 2% of MCSs overlap ancestral repeats. Many of the conserved non-coding sequences identified by this strategy had not been detectable in pairwise sequence comparisons. Human-fish comparisons detected conservation largely confined to coding sequences, yet almost a third of human coding exons did not align with fish sequence. Eliminating chimpanzee and baboon did not affect the specificity of MCS detection, but eliminating the non-human primates, chicken and fish reduced the number of MCSs by 17%. Chicken sequence alone detected 40% of MCS bases (94% of the coding but only 29% of the non-coding MCS bases).
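Thomas et al. identified MCSs with a statistical scoring procedure that is not reproduced here. As a simplified illustration only, the Python sketch below marks alignment columns where at least a chosen number of other species share the reference (human) base and then reports runs of such columns above a minimum length. The thresholds, the toy alignment in the test and the function names are hypothetical.

```python
def conserved_columns(alignment: dict[str, str], reference: str,
                      min_species: int) -> list[bool]:
    """Flag columns where at least `min_species` non-reference sequences
    carry the same base as the reference (gaps never count as conserved)."""
    ref = alignment[reference].upper()
    others = [seq.upper() for name, seq in alignment.items() if name != reference]
    if any(len(seq) != len(ref) for seq in others):
        raise ValueError("all aligned sequences must have equal length")
    flags = []
    for col, base in enumerate(ref):
        if base == "-":
            flags.append(False)
            continue
        shared = sum(1 for seq in others if seq[col] == base)
        flags.append(shared >= min_species)
    return flags

def conserved_runs(flags: list[bool], min_length: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) runs of conserved columns at least `min_length` long."""
    runs, start = [], None
    for col, ok in enumerate(flags + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = col
        elif not ok and start is not None:
            if col - start >= min_length:
                runs.append((start, col))
            start = None
    return runs
```

Requiring agreement from every non-reference species (rather than a subset) is the strictest setting; relaxing `min_species` mimics, very crudely, the effect of dropping species from the comparison.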
I used the public genomic data for mammals (human, mouse, rat) and fish (zebrafish, Fugu, Tetraodon) as a basis for gene discovery and for the comparative genomic analysis with which I determined the evolutionary trajectories of the PRNP and SPRN genes (Chapter 5).
2.8 The Tammar Wallaby: an Alternative Mammalian Experimental Model and
Kangaroo Genome Project
The number of available vertebrate genomic sequences limits the depth of comparative genomic analysis. O’Brien et al. (2001) discussed the current limitations of comparative genomics and listed mammalian species that are priorities for sequencing. The criteria for sequencing priority include phylogenetic position, relevance to understanding human biology or medicine, economic importance, genomic characteristics, developmental features and species diversity among mammalian orders. Of the 4600-4800 mammalian species, all but 270 are eutherian (“placental”) mammals. The eighteen eutherian orders cluster into four principal clades. Human, mouse and rat all belong to clade III, so there is a need to sequence representatives of the other three clades. Livestock, together with cat and dog, belong to clade IV. Representatives of the remaining clade II (sloths, anteaters, armadillos) and clade I (Afrotheria) should also be considered, as well as marsupials and monotremes.
Graves and Westerman (2002) presented the case for a Kangaroo Genome Project. Marsupials, found only in Australasia and the Americas, are mammals: they have fur and suckle their young with milk. Yet 180 million years of independent evolution since their separation from their eutherian relatives has sculpted different (but not inferior) mammals with quite distinct characteristics.
Three distantly related marsupial species have been of major experimental interest: the tammar wallaby (Macropus eugenii), the fat-tailed dunnart (Sminthopsis crassicaudata) and the Brazilian opossum (Monodelphis domestica) (Graves and Westerman, 2002).
All mammals (Figure 2.5) are equally related to birds and reptiles (about 310 million years of separation) and to fish (roughly 450 million years). Marsupials (Metatheria) and Eutheria diverged about 180 million years ago. These therian mammals diverged from the egg-laying monotremes (Prototheria) roughly 210 million years ago. Early marsupials radiated in the Americas more than 65 million years ago, and during the time of the supercontinent Gondwana they colonized Antarctica and Australia. After the separation of the Americas and Australia 38-84 million years ago, Australian marsupials evolved separately. The oldest marsupial fossils found in Australia are dated to 55 million years ago. The evolutionary distance between the tammar wallaby (Australia) and the Brazilian opossum (South America) is similar to that between human and mouse (75 million years).
The marsupial genome is roughly the same size as eutherian genomes but is usually divided into fewer, larger chromosomes. A basal 2n=14 karyotype, represented in all marsupial superfamilies, represents the ancestral diploid marsupial karyotype. The diploid karyotype of the tammar wallaby contains 16 chromosomes, and that of the Brazilian opossum 18 chromosomes.
Comparative gene mapping has been used to study the relationships between mammalian genomes. Comparisons between human and other eutherians showed, for instance, that X chromosome content is largely conserved among all eutherian mammals. However, most genes on the short arm of the human X are autosomal (chromosome 5) in the tammar wallaby, as well as in monotremes, implying that they were added to the eutherian X after the divergence of marsupials. The relatively recent evolutionary origin of this region could explain why many genes on the short arm of the human X escape X inactivation. The marsupial X is also subject to inactivation, but the mechanism seems to be simpler than that in eutherians and may be ancestral. Several genes involved in eutherian sex determination have been isolated and analysed in marsupials.
Figure 2.5: Evolutionary relationships among vertebrates (Graves and Westerman, 2002). My, million years.
The depth of comparative genomic analysis depends on the number of species compared and the evolutionary span they cover. Because marsupials fill the large evolutionary gap between the eutherian radiation (roughly 105 million years ago) and the divergence of the bird/reptile branch (about 310 million years ago), this lineage is a logical choice for sequencing. Lying at this intermediate evolutionary distance from human, marsupials hold the greatest promise, as an alternative mammalian experimental system, for identifying conserved genes and conserved regulatory sequences.
The potential of such analyses of the kangaroo genome was discussed by Wakefield and Graves (2003). Judging from the contributions that the Australian tammar wallaby (Macropus eugenii) has already made to biology, genetics and genomics, sequencing of the kangaroo genome will add a new dimension to comparative genomic analysis. A comparison of the XPCT gene in human, mouse and tammar wallaby revealed a high ratio of conservation signal to random noise. This reduced noise level could be particularly useful for identifying gene regulatory regions.
The kangaroo genome project (Figure 2.6; http://kangaroo.genome.org.au) is an international project aiming at a draft-quality sequence of the tammar wallaby genome. The project includes mapping of the tammar genome, DNA sequencing and analysis of gene expression. Initial funding for the project was approved in March 2004.
In the following sections, I outline some major discoveries that have emerged from comparisons across mammals and that argue in favour of the kangaroo genome project.
Figure 2.6: Kangaroo genome project logo.
2.8.1 The Mammalian Testis-Determining Gene
A testis-determining gene is encoded by the Y chromosome in mammals. This gene defines maleness by inducing the development of a testis from the undifferentiated gonad. The Y-borne zinc-finger gene (ZFY) was an early candidate for the testis-determining gene. It maps to the eutherian Y and has a homologue (ZFX) on the short arm of the human X. Using human ZFY as a probe on tammar wallaby and fat-tailed dunnart chromosome spreads, Sinclair et al. (1988) surprisingly found that it mapped to neither the Y nor the X. In marsupials, the ZFY homologue is autosomal, indicating that ZFY is not the primary mammalian sex-determining gene.
2.8.2 Discovery of New Human Genes
It was proposed that there are two classes of Y-chromosome-associated genes: single-copy genes that are present on both the Y and the X and widely expressed, and multicopy Y-specific genes expressed in the testis. The human RBMY (RNA-binding motif gene, Y chromosome) was thought to be one such testis-specific gene, and RBMY genes were reported to have no X homologue in eutherians. However, Delbridge et al. (1999) first found that RBMY has a homologue on the marsupial X, and subsequently demonstrated a homologue on the human X by cloning, sequencing and fluorescence in situ hybridisation. Thus the new human locus, RBMX, was found through comparison between marsupials and human. This human gene is now being investigated for a role in mental retardation, since its position on the human X falls within a deletion interval containing several X-linked mental retardation genes.
2.8.3 Detection of Regulatory Elements
Chapman et al. (2003) used marsupial sequence for phylogenetic footprinting (Chapter 6.6). They isolated a BAC clone from the stripe-faced dunnart (Sminthopsis macroura) harbouring the lymphoblastic leukemia-1 (LYL1) gene. LYL1 is a member of the stem-cell leukemia gene family, identified on the basis of translocations in T-cell acute leukemia. By aligning the LYL1 promoter of human, mouse and dunnart, Chapman et al. found conserved putative transcription factor-binding sites.
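The core idea of phylogenetic footprinting, keeping only putative binding sites that occur at the same aligned position in every species, can be sketched in Python as below. This is not the procedure used by Chapman et al.: it assumes a gap-padded multiple alignment of the promoter is already available, scans the forward strand only, and uses the GATA consensus WGATAR purely as an illustrative motif. The function names and the example sequences are invented.

```python
# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches_motif(site: str, motif: str) -> bool:
    """True if every base of `site` is allowed by the IUPAC code at that motif position.

    Gap characters never match, so gapped sites are rejected.
    """
    return len(site) == len(motif) and all(
        base in IUPAC[code] for base, code in zip(site.upper(), motif.upper()))

def conserved_sites(alignment: dict[str, str], motif: str) -> list[int]:
    """Alignment start columns at which the motif occurs, ungapped, in every species
    (forward strand only)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have equal length")
    width, length = len(motif), lengths.pop()
    return [start for start in range(length - width + 1)
            if all(matches_motif(seq[start:start + width], motif)
                   for seq in alignment.values())]
```

For example, with the invented alignment human `CCAGATAAGTTTGATAGC`, mouse `CCTGATAGGTTTGATCGC` and dunnart `CCAGATAAG-TTGATAGC`, `conserved_sites(..., "WGATAR")` returns `[2]`: the human and dunnart site at column 11 is discarded because the mouse sequence reads TGATCG there.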
Following the same rationale, I isolated and characterized the prion protein gene from the tammar wallaby. In a comparative genomic analysis that also included PRNP from four eutherian species (human, mouse, cow, sheep), I identified gene regions conserved across mammals and potential regulatory elements. I discussed these findings with respect to current hypotheses about the function of PRNP (Chapter 6). This study demonstrated the utility of marsupial sequence in the analysis of a human disease-related gene.
2.9 The Present Study
The original aim of this study was to analyse the evolution and function of the prion protein gene. Elucidating its normal function is essential for a better understanding of its role in prion diseases, and for developing strategies for the therapy and prevention of these diseases.
The project gained another dimension when I discovered the new human SPRN gene and defined a new family of vertebrate Shadoo proteins (Chapter 4).
I then analysed the evolution of the PRNP and SPRN genes and showed that these two mammalian genes followed different evolutionary trajectories. The more conserved evolution of SPRN indicates that it has a more prominent, and perhaps more important, function than PRNP, and suggests that it could substitute for the loss of PRNP in knock-out mice (Chapter 5).
Finally, PRNP gene comparisons across the eutherian-marsupial distance enabled me to identify conserved gene regions that represent potential regulatory elements. I related this information to the hypotheses on the normal function of PRNP and concluded that my analysis best supports the signal transduction hypothesis (Chapter 6).