
Genomics 88 (2006) 762–771
www.elsevier.com/locate/ygeno

A 360-kb interchromosomal duplication of the human HYDIN locus

Norman A. Doggett a,⁎, Gary Xie a, Linda J. Meincke a, Robert D. Sutherland a, Mark O. Mundt a, Nicolas S. Berbari b, Brian E. Davy b, Michael L. Robinson b,1, M. Katharine Rudd c, James L. Weber d, Raymond L. Stallings e, Cliff Han a

a DOE Joint Genome Institute and Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA

b Division of Molecular and Human Genetics, Children's Research Institute, Ohio State University, 700 Children's Drive, Columbus, OH 43205, USA

c Division of Human Biology, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, C3-168, Seattle, WA 98109, USA

d Center for Medical Genetics, Marshfield Medical Research Foundation, 1000 North Oak Avenue, Marshfield, WI 54449, USA

e Children's Cancer Research Institute, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, TX 78229, USA

Received 13 April 2006; accepted 19 July 2006
Available online 30 August 2006

Abstract

The HYDIN gene located in human chromosome band 16q22.2 is a large gene encompassing 423 kb of genomic DNA that has been suggested as a candidate for an autosomal recessive form of congenital hydrocephalus. We have found that the human HYDIN locus has been very recently duplicated, with a nearly identical 360-kb paralogous segment inserted on chromosome 1q21.1. The duplication, among the largest interchromosomal segmental duplications described in humans, is not accounted for in the current human genome assembly and appears to be part of a greater than 550-kb contig that must lie within 1 of the 11 sequence gaps currently remaining in 1q21.1. Both copies of the HYDIN gene are expressed in alternatively spliced transcripts. Elucidation of the role of HYDIN in human disease susceptibility will require careful discrimination among the paralogous copies.

© 2006 Elsevier Inc. All rights reserved.

Keywords: Segmental duplication; Interchromosomal duplication; Gene duplication; HYDIN gene; Human chromosome 16; Human chromosome 1

⁎ Corresponding author. Fax: +1 505 665 3024. E-mail address: [email protected] (N.A. Doggett).

1 Current address: Zoology Department, 256 Pearson Hall, Miami University, Oxford, OH 45056, USA.

0888-7543/$ - see front matter © 2006 Elsevier Inc. All rights reserved. doi:10.1016/j.ygeno.2006.07.012

The human HYDIN gene was suggested as a candidate for congenital hydrocephalus (an accumulation of cerebrospinal fluid within the ventricles of the brain) based on the finding of mutations in the mouse Hydin gene, which were found to be causative for congenital hydrocephalus in hy3 mice [1]. The human homologue of Hydin, located in chromosome band 16q22.2, contains conserved exons spanning 423 kb of genomic DNA. While mutations in human HYDIN have not yet been established for congenital hydrocephalus, there has been one case of hydrocephalus associated with a reciprocal translocation very near the cytogenetic location of the HYDIN gene in 16q22, t(4;16)(q35;q22.1) [2]. Pulsed-field gel analysis of this region further refined the translocation breakpoint to an altered 1.2-Mb NotI fragment with a CALB2 probe [3], which can now be deduced to encompass the HYDIN locus. CALB2 is 160 kb distal to HYDIN in the finished sequence spanning this region and both are contained in a single 1,145,723-bp NotI fragment (chromosome 16: 69,291,728–70,437,450; NCBI build 35).
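The NotI fragment length quoted here follows directly from the build 35 coordinates. The sketch below reproduces that arithmetic and checks that the duplicated chromosome 16 segment given in the Fig. 1 caption falls inside the fragment. It assumes the coordinates are 1-based and inclusive; the function names are illustrative and not part of the original analysis.

```python
def inclusive_length(start: int, end: int) -> int:
    """Length in bp of an interval whose start and end are both included."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def contains(outer: tuple, inner: tuple) -> bool:
    """True when the inner interval lies entirely within the outer interval."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Chromosome 16 coordinates (NCBI build 35) as quoted in the text and Fig. 1.
NOTI_FRAGMENT = (69_291_728, 70_437_450)
DUPLICATED_SEGMENT = (69_402_781, 69_760_074)

noti_length = inclusive_length(*NOTI_FRAGMENT)
duplication_inside_fragment = contains(NOTI_FRAGMENT, DUPLICATED_SEGMENT)

print(noti_length)                  # 1145723
print(duplication_inside_fragment)
```

Under the inclusive convention the computed length matches the 1,145,723 bp stated for the NotI fragment.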

Congenital hydrocephalus, an accumulation of cerebrospinal fluid within the ventricles of the brain, affects an estimated 1 in 1000 live births [4]. The X-linked congenital hydrocephalus conditions, congenital stenosis of the aqueduct of Sylvius (HSAS, OMIM 307000) and MASA syndrome (MASA, OMIM 303350), have been shown to be caused by mutations in the L1CAM gene located at Xq28 [5–7] and represent the most common inherited forms of the disease. Evidence exists for autosomal recessive forms of the disease but this has not been proven by the association of gene mutations with disease. Despite the identification of HYDIN as a candidate for human congenital hydrocephalus, a link between this locus and disease has not been established. We now show that most of the HYDIN gene has been duplicated onto chromosome 1, which complicates the identification of mutations that could be associated with disease.

N.A. Doggett et al. / Genomics 88 (2006) 762–771

Results

Identification of 16q22.2–1q21.1 paralogy

The duplication of the HYDIN locus from 16q22.2 to 1q21.1 was identified during finishing of chromosome 16 [8]. Initially we noticed that in the BAC contig (AC027281→AC109135→AC130459→AC099495) we had assembled at 16q22.2, the percentage sequence identity in the overlap between some clones (∼99.5%) was a little lower than is usually found between clones of different haplotypes (∼99.9%). We then generated additional clone coverage from the RP11 library to recover clones with the same haplotype from this region. Two clone paths, representing each haplotype, were assembled after sequencing several additional clones from the RP11 library, with sequence identity between overlapping clones nearly 100% (Fig. 1). Surprisingly, two ends of one path (AC092369→AC109135→AC136634) appeared to extend into the middle of a sequenced clone, AL049742, belonging to chromosome 1q21.1. The other clone path (AC027281→AC138625→AC099495) fit perfectly into the chromosome 16 map and sequence. The paralogous region between the two paths is 357.3 kb in length with an average sequence identity of 99.2%. Comparison with mouse chromosome sequences that are syntenic to human 1q21.1 and 16q22.2 regions revealed only sequences homologous to the chromosome 16 location, consistent with this being the original copy of the duplication.

Fig. 1. Physical map of human chromosome 16 and human chromosome 1 encompassing the HYDIN duplication. Accession numbers of sequenced BAC clones and aligned mRNAs are indicated on the left. Horizontal arrows illustrate transcripts produced primarily from chromosome 16 (red) or chromosome 1 (blue). Six transcripts from chromosome 16 traverse the boundary of the duplicon, causing their truncation in the chromosome 1 copy. Note that the extent of the duplication (chromosome 16:69,402,781–69,760,074, NCBI build 35) does not include the entire coding region of the HYDIN gene model. Dashed gray box defines the boundaries of paralogy between the 16q22.2 and the 1q21.1 contigs. Chromosome band 1q21.1 is shown at the bottom aligned to its base pair position. Black boxes within this band denote locations of sequence and clone gaps. Possible sites for insertion of the HYDIN duplication contig into gaps within chromosome 1q21.1 are shown by dashed lines extending to gaps. Solid lines extending to two gaps represent the most likely sites for insertion of the duplication, extending off of contig NT_034398 in the current assembly.

The alignment between the paralogues has one major interruption due to the insertion of a long interspersed nuclear element-1 (LINE-1) of 6089 bp in the chromosome 1 paralogue (Fig. 2). This insertion was accompanied by a duplication of 15 bp, “AAGAAGGGAAAACAC,” which flanks the insertion site on the chromosome 1 paralogue. There are also two minor microdeletions of mostly repetitive DNA that have occurred within the duplicons, one of 1679 bp on chromosome 16 (bases 10,543–12,222 within the chromosome 1 duplicon) and another of 2092 bp on chromosome 1 (bases 132,520–134,612 within the chromosome 16 duplicon). Most other differences between paralogues are small indel changes occurring in simple sequence repeats. The high level of sequence identity (99.2%) and minimal amount of rearrangement between the paralogous copies suggested a very recent duplication event.

Fig. 2. Schematic representation of the HYDIN duplicon insertion into chromosome 1. The duplicon spans 357,302 bp of genomic DNA on chromosome 16 flanked by Alu and LTR repeats and 364,611 bp of genomic DNA on chromosome 1 flanked by LINE-1 and LTR repeats. The orientations of repeats around the boundaries of the duplicon are indicated by horizontal arrows. Average sequence identity between the paralogues is 99.2%. Alignment is interrupted by insertion of 6.1 kb of a LINE-1 into the chromosome 1 duplicon and two small microdeletions, one of 1.7 kb within the chromosome 16 duplicon and the other of 2.1 kb within the chromosome 1 duplicon.

To determine how recently the duplication occurred in primate evolution we designed two pairs of primers to detect the presence of the duplication with PCR by targeting the insertion site. One pair of primers (FLK) flanked the duplication insertion site on chromosome 1 and the second pair of primers (JCT) spanned one junction of the duplication on chromosome 1. PCR was performed against a panel of primate DNAs including lemur, spider monkey, macaque, orangutan, gorilla, chimpanzee, and human (Fig. 3). PCR amplification with the JCT primers, detecting the HYDIN insertion on chromosome 1q21.1, occurred only in the human DNA, indicating that the HYDIN duplication is specific to humans. Amplification with the FLK primers, detecting the flanking sequence without the duplication, occurred in all primates tested. We note that amplification with the FLK primers in humans results from flanking sequences that are shared in clone AL049742 (also within 1q21.1 but not part of the HYDIN contig). In fact, sequence identity between clone AL049742 and sequences flanking the HYDIN duplication is extensive, indicating that additional segmental duplications are involved. We found a similar sequence extending 41 kb at 99.1% identity between AL049742 and drafted clone AC092369 immediately adjacent to the 5′ end of the HYDIN duplication sequence and another 42 kb of sequence shared at 99.6% identity between AL049742 and AC137783 immediately adjacent to the 3′ end of the HYDIN duplication. These additional duplications on either side of the HYDIN duplication made it impossible to design flanking primers that would be unique to the HYDIN duplication contig.

Fig. 3. Detection of the HYDIN duplication in primates. (A) PCR primer design flanking the insertion duplication site on chromosome 1 (FLK) and spanning a junction of the duplication site on chromosome 1 (JCT). The shaded region represents the site of the HYDIN duplication insertion. FLK primers should produce a fragment of 248 bp in the absence of the inserted duplication or from additional duplicated flanking sequence present in AL049742. JCT primers produce a fragment of 685 bp only when the duplication is present. (B) Agarose gel image of PCR products using FLK and JCT primers against a panel of primate DNAs: lanes 1, lemur; 2, spider monkey; 3, macaque; 4, orangutan; 5, gorilla; 6, chimpanzee; and 7, human. Marker is 100 bp ladder.

To confirm the PCR results, which indicated that nonhuman primates did not contain the HYDIN duplication, we performed FISH with BAC clone RP11-424M24 (AC027281) labeled with biotin and detected with FITC–avidin against a normal human cell line, GM08729, and two Pan troglodytes cell lines, PT5 and TANK (Supplemental Fig. 2). The human cell line had signal on both 16q22 and 1q21. Both chimpanzee cell lines had FITC signals on 16q22 but no signal on 1q21, confirming that the HYDIN duplication is confined to humans.

Fig. 4. FISH results on human chromosomes with DNA of RP11-424M24 (AC027281). All 26 individuals contained signals on both chromosome arms at 16q22 and 1q21. A representative FISH image is shown.

To ascertain the frequency of the HYDIN duplication in humans we performed FISH and additional PCR typing experiments. FISH was performed with DNA from BAC clone RP11-424M24 (AC027281) on chromosomes from 26 individuals, all of whom were found to contain the duplication homozygously on both 1q21 and 16q22 (Fig. 4). To perform typing within a larger panel of humans, we designed two pairs of PCR primers from within the HYDIN duplication that would detect length differences that would differentiate between the paralogous copies. These primer pairs are approximately 40 kb apart and present in the finished sequence of two different clones from chromosome 16 (AC027281 and AC138625) and each within two different clones covering the chromosome 1 copy (Table 1). Primer pair HYDIN-indel-61 detects a 61-bp difference between paralogous copies and primer pair HYDIN-indel-12 detects a 12-bp difference. These markers were typed through about 190 individuals from the Human Diversity Panel (part of the Human Variation Collections of the NIGMS Repository at the Coriell Cell Repositories, Camden, NJ, USA). These individuals were Pakistani, African, East Asian, and Native American. All individuals carried both fragment sizes with each marker. Also, the intensity ratio for the two bands for each marker, by eye, did not seem to vary at all among these individuals. These results showed that among a reasonably large and diverse sample of humans all contained both chromosome 16 and chromosome 1 copies of the HYDIN duplication and, further, that there was no evidence for a polymorphic presence of the chromosome 1 copy.

Table 1
PCR primers detecting length differences within the HYDIN duplication

Primer pair | Chromosome 16 size and accession number | Chromosome 1 size and accession numbers
HYDIN-indel-61 | 153 bp, AC027281 | 214 bp, AC092369, AC109135
HYDIN-indel-12 | 175 bp, AC138625 | 163 bp, AC109135, AC130459
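The typing logic behind the two indel markers can be sketched as a lookup from observed fragment sizes to paralogous copies, using the expected product sizes in Table 1. This is an illustration only: the `tolerance` parameter and the function names are hypothetical, and the original typing was read from gel bands rather than computed.

```python
# Expected PCR product sizes in bp, taken from Table 1.
MARKERS = {
    "HYDIN-indel-61": {"chr16": 153, "chr1": 214},
    "HYDIN-indel-12": {"chr16": 175, "chr1": 163},
}


def copies_detected(marker, observed_sizes, tolerance=2):
    """Return the paralogous copies whose expected product size matches
    at least one observed fragment. `tolerance` (bp) is an assumed
    allowance for sizing error, not a value from the paper."""
    expected = MARKERS[marker]
    return {
        copy
        for copy, size in expected.items()
        if any(abs(obs - size) <= tolerance for obs in observed_sizes)
    }


def carries_both_copies(marker, observed_sizes):
    """True when fragments from both chromosome 16 and chromosome 1 are seen."""
    return copies_detected(marker, observed_sizes) == {"chr16", "chr1"}


# Length differences that distinguish the copies for each marker.
size_differences = {
    name: abs(sizes["chr1"] - sizes["chr16"]) for name, sizes in MARKERS.items()
}
```

For example, `carries_both_copies("HYDIN-indel-61", [153, 214])` evaluates to `True`, which is the pattern reported for every individual typed.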

Genes expressed from the duplication regions

Overall GC content of 41.57% across the duplicon is relatively low, suggesting it is located in a gene-poor region, consistent with its 16q22.2 cytogenetic location. The insertion site in chromosome 1 occurred within a 20-kb window of 38.7% GC content. To annotate the duplication region for gene content we first searched the UniGene database from NCBI with the Blastn program. Thirteen entries were found in the duplication region (Fig. 1). One mouse gene, Hydin, AY173049, was found located on mouse chromosome 8, in the region that is homologous with the chromosome 16 location of the duplication. Alignments generated using Pipmaker indicated that the duplicon encompasses seven mRNA sequences (Accession Nos. AB058767, AK027571, AL834340, AL833826, AK057467, AL122038/AL133042, and CR611620) and includes most of the exons of six additional mRNAs (Accession Nos. AK026688, AK074472, AL137259, BC043273, AK022933, BC028351) as well as most exons of the predicted human HYDIN gene. Sufficient sequence divergence exists between the paralogous copies of the HYDIN gene to enable specification of cDNA origin as being from chromosome 16 or chromosome 1. Sequences of four mRNAs (AB058767, AK027571, AL834340, and AL833826) align more closely to the chromosome 1 copy of the duplication, suggesting these are expressed from this site, while nine mRNAs (AK026688, AK074472, AL137259, BC043273, AK022933, AK057467, AL122038/AL133042, BC028351, and CR611620) appear expressed from the chromosome 16 locus based on sequence identity. The mRNA transcripts appear to represent alternative splicing forms or partial sequences of a larger HYDIN gene. A list of the chromosomal origin and tissue source for each of the HYDIN transcripts is shown in Table 2. Transcripts arising from the chromosome 16 HYDIN locus were isolated from lung, testis, NT2 neuronal precursor cells, and Jurkat leukemic T cells, while transcripts arising from the chromosome 1 paralogue were derived only from brain or neuronal cells (brain, amygdala, and NT2 neuronal precursor cells), suggesting a more limited and perhaps specialized pattern of expression of the chromosome 1 HYDIN transcripts.
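Assigning a cDNA to one paralogue or the other on the basis of sequence identity can be illustrated with a minimal mismatch count. The sketch assumes the cDNA has already been aligned, without gaps, to the corresponding stretch of each genomic copy. The three sequences are invented toy data, not HYDIN sequence, and the function names are hypothetical.

```python
def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length, pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def assign_origin(cdna: str, chr16_ref: str, chr1_ref: str) -> str:
    """Assign the cDNA to whichever paralogue it matches more closely."""
    d16 = mismatches(cdna, chr16_ref)
    d1 = mismatches(cdna, chr1_ref)
    if d16 < d1:
        return "chromosome 16"
    if d1 < d16:
        return "chromosome 1"
    return "ambiguous"


# Toy references differing at two positions (illustrative only).
CHR16_REF = "ACGTTGCAAGGCTTACGGAT"
CHR1_REF = "ACGTTGCATGGCTTACGCAT"

origin_of_chr1_like = assign_origin(CHR1_REF, CHR16_REF, CHR1_REF)
origin_of_mixed = assign_origin("ACGTTGCAAGGCTTACGCAT", CHR16_REF, CHR1_REF)
```

With paralogues that are about 99.2% identical, a real assignment depends on a read or transcript spanning enough diagnostic positions; short fragments may remain ambiguous, as the second toy case shows.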

One of the transcripts expressed from chromosome 1, AB058767, contains a 15-bp deletion (ACGGAGAAGGAGCGC) compared to the chromosome 16 gene structure (deletion is within exon 12 of the chromosome 16 HYDIN gene model). The last 13 bases of the deletion are identical to 13 bases that occur immediately upstream of the deletion, suggesting this occurred by a local DNA recombination error. This variation permitted us to test whether differential expression patterns may exist between paralogous transcripts from within the HYDIN duplication. The chromosome 1 transcript AB058767 (mRNA for KIAA1864 protein) was isolated from brain tissue. To determine whether a paralogue to AB058767 could be expressed from chromosome 16, we aligned this mRNA sequence to the chromosome 16 copy of the duplication and designed two pairs of PCR primers, each with one primer overlapping the 15 bp deleted from chromosome 1 (and thus unique to chromosome 16) and the paired primer flanking upstream or downstream from this site. PCR was performed against a human cDNA panel, including tissue from heart, brain, placenta, lung, liver, skeletal muscle, kidney, and pancreas and cDNA from testis. PCR amplification occurred only in human testis cDNA and in none of the other cDNA libraries, suggesting that the paralogue to AB058767 may have a pattern of expression different from that of its chromosome 1 counterpart or is poorly represented in the cDNA libraries tested (data not shown).
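The statement that one primer of each pair overlaps the 15-bp segment can be checked against the primer sequences listed in Table 3. The sketch below is a consistency check rather than a description of how the primers were designed: it measures how far the 3′ end of chr16.orf2.1.F reaches into the deleted 15-mer, and how much of the 15-mer is covered by the reverse complement of chr16.orf2.2.R. The helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def suffix_prefix_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


DELETED_15MER = "ACGGAGAAGGAGCGC"      # absent from the chromosome 1 transcript
FWD_PRIMER = "AAGGAGCGCACGGAGAAG"      # chr16.orf2.1.F (Table 3)
REV_PRIMER = "CTCTCCAGGCGCTCCTTCT"     # chr16.orf2.2.R (Table 3)

# Bases of the 15-mer covered by the 3' end of the forward primer.
fwd_overlap = suffix_prefix_overlap(FWD_PRIMER, DELETED_15MER)

# Bases of the 15-mer covered by the reverse primer, read on the top strand.
rev_on_top_strand = reverse_complement(REV_PRIMER)
rev_overlap = suffix_prefix_overlap(DELETED_15MER, rev_on_top_strand)
```

Both overlaps are nonzero, so each primer extends into the 15 bases that are missing from AB058767, consistent with the design described in the text.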

HYDIN gene model and expression

We established a predicted genomic structure of the human HYDIN gene by sequence comparison between the mouse Hydin sequence and the genomic sequence of chromosome 16. This human HYDIN gene model spans over 423 kb of genomic DNA and contains 86 exons. Seventy-two exons are identical with exons from existing mRNAs (Supplemental Fig. 1) and 14 exons are supported with only mouse–human sequence identity. The resulting full-length transcript is 15,732 bp long and encodes a putative protein of 5120 amino acids (HYDIN Supplemental Gene Model). This predicted human HYDIN shows 80% DNA sequence identity and 77% protein sequence identity with mouse Hydin. The 357.3-kb duplicon on chromosome 1 encompasses 79 exons (6 through 84) of the predicted HYDIN gene and thus contains only a partial sequence.

Table 2
Chromosomal origin and tissue source of HYDIN transcripts

mRNA | Location | Source
AK026688 | Chromosome 16 | Human lung
AK074472 | Chromosome 16 | Human lung
AL137259 | Chromosome 16 | Testis
BC043273 | Chromosome 16 | Testis
AB058767 | Chromosome 1 | Brain
AK027571 | Chromosome 1 | Retinoic acid-induced NT2 neuronal precursor cells
AL834340 | Chromosome 1 | Amygdala
AL833826 | Chromosome 1 | Amygdala
AK022933 | Chromosome 16 | Retinoic acid-induced NT2 neuronal precursor cells
AK057467 | Chromosome 16 | Testis
AL122038 | Chromosome 16 | Testis
AL133042 | Chromosome 16 | Testis
BC028351 | Chromosome 16 | Testis
CR611620 | Chromosome 16 | Jurkat leukemic T cell line

Sequence alignments reveal that the mRNAs are clustered in roughly three locations along the predicted human HYDIN gene (Fig. 1 and Supplemental Fig. 1). Transcripts located near the 5′ end of HYDIN contain poly(A) tails that indicate ends of genes. We designed five pairs of PCR primers to test whether there exists a single transcript expressed as human HYDIN: three primer pairs from within the clusters of expressed transcripts and two pairs of primers bridging these three clusters (Fig. 5A). We first evaluated a panel of commercially available cDNAs prepared from heart, brain, placenta, lung, liver, skeletal muscle, kidney, and pancreas. Positive PCR results were observed from within the clustered transcripts in heart, brain, and lung but only very faint bands were detected with bridging primers in brain, placenta, and liver (Fig. 5B). We were concerned that the faintness of bands with the bridging primers might be due to the quality of the cDNA libraries and repeated these experiments using human total testicular RNA in RT-PCR (Fig. 5C). PCR amplifications were observed with all primers, indicating that transcripts exist in testis that span the three clusters of mRNAs, confirming the existence of full-length HYDIN transcripts in human. These results in combination with the various mRNAs that exist for partial HYDIN transcripts indicate that HYDIN exhibits a complex expression of various isoforms, which are subject to alternative splicing. Northern blot analysis of human testis RNA revealed a large HYDIN transcript that is significantly larger than 9 kb but smaller than the observed mouse Hydin transcript (Fig. 5D). These results do not preclude the presence of even larger HYDIN transcripts in brain or other tissues.

HYDIN gene conservation

While the predicted HYDIN gene product shares significant similarity (77%) to the mouse Hydin protein, it is not similar to any known proteins. The function of the mouse Hydin gene product is still unknown; however, it was reported to contain a 314-amino-acid domain that is similar to the actin binding protein caldesmon and, due to its expression pattern within ciliated ependymal cells lining the third and fourth ventricles in newborn mice, was suggested to have a role associated with cilia or ciliated structures [1]. Despite its large size, only a few conserved SCOP domains have been identified in the putative human HYDIN protein. Amino acids 193–272, 4319–4417, and 4527–4602 share similarity with PapD, a fimbrial chaperone protein (E values of 7.7E−10, 3.7E−10, and 1.4E−5, respectively), and residues 2270–2421 share similarity with apolipophorin-III (E value of 8E−6). Weak but significant similarity to a major sperm protein (Motile_Sperm) and putative metallopeptidase (DUF335) for residues 198–292 and 2681–2816, respectively, was found using Pfam (E value of 8.20E−2 and 8.60E−2, respectively). Major sperm proteins are involved in sperm motility and oligomerize to form filaments.

Fig. 5. PCR results against a human cDNA panel with primers designed from the predicted human HYDIN gene. (A) Primer design. Two pairs of primers, 1 and 2, cover regions of the predicted HYDIN gene between existing mRNA transcripts, and three pairs of primers, 3, 4, and 5, are designed within the sequences covered by mRNA entries in GenBank. (B) PCR results with primer pairs 1, 2, 3, 4, and 5. Templates: lanes A, heart; B, brain; C, placenta; D, lung; E, liver; F, skeletal muscle; G, kidney; and H, pancreas. (C) PCR results with primer pairs 1, 2, 3, 4, and 5 using human total testicular RNA. Marker lanes are 100 bp ladder. (D) Northern blot of human testis RNA with a Hydin probe. The largest human transcript is well over 9 kb, but shorter than the mouse Hydin transcript. The human sample also shows a prominent smaller transcript around 6 kb.

Since HYDIN lacked similarity to previously known proteins, we used tBlastn to compare the HYDIN protein to six-frame translations of a variety of recently drafted genomes. Conserved homologues were found in Fugu rubripes (42.6% identity over 3157 residues), Ciona intestinalis (49.6% identity over 4106 residues), and Chlamydomonas reinhardtii (36.0% identity over 3374 residues) (Supplemental Figs. 3–5). Putative homologues were also found in genome sequences of Giardia lamblia, Plasmodium falciparum, and Plasmodium yoelii yoelii. Ci. intestinalis is an ascidian (sea squirt) that possesses a flagellar radial spoke, sperm flagella, and abundant stigmatal cilia in the branchial basket that generate the feeding current and abundant frontal cilia in the pharynx that transport mucus over the walls. Ch. reinhardtii is a unicellular green alga (Chlorophyta) that swims with two flagella. G. lamblia (intestinalis) is a protozoan that moves with the aid of five flagella. Pl. falciparum and Pl. yoelii yoelii, the causative agents of malaria in humans and mice, are sporozoans that lack cilia or flagella but may contain inactive genes for cilia or flagella from their closely related protozoans. Thus, the HYDIN protein is remarkably well conserved in the primitive chordate, Ci. intestinalis, and among flagellated Protists and Chlorophyta.

Discussion

The duplication of the HYDIN locus is among the largest interchromosomal duplications in the human genome. The largest segmental duplications occur on the sex chromosomes and predominately involve intrachromosomal duplications within the Y chromosome. Among the autosomes, there are only three interchromosomal duplications larger than 300 kb described in the current genome assembly (Supplemental Table 1). Only the largest of these, a 372-kb segmental duplication between chromosomes 18 and 21, is bigger than the HYDIN duplication. There is only one intrachromosomal duplication among the autosomes that is larger than the HYDIN duplication, which is a 431-kb intraduplication on chromosome 10. Excluding the HYDIN duplication itself, the chromosomal band 16q22.2 containing the HYDIN gene is quite low in segmental duplications, with only 1.8% (36.6 kb) of its 2-Mb length identified as segmental duplications. In contrast, the paralogous site of the HYDIN duplication on chromosome 1q21.1 is among the most duplicated regions in the genome, with 63.6% of its finished sequence contained in segmental duplications. Chromosome band 1q21.1 is estimated to be 4.9 Mb and contains 3,998,164 bp of finished sequence, of which 2,543,532 bp is contained in segmental duplications. Of that, 1,917,429 bp are exclusively intrachromosomal duplications, 485,494 bp are both intra- and interchromosomal duplications, and 140,609 bp are exclusively interchromosomal duplications. Eleven clone gaps account for the approximately 902 kb of missing sequence in 1q21.1. Inclusion of the HYDIN duplication and its contig increases the amount of segmental duplications in this band to approximately 68%. An abundance of interchromosomal duplications within pericentromeric regions is well documented (reviewed in [9]). However, chromosome band 1q21.1 is adjacent to the constitutive heterochromatin band 1q12, which is composed of classical satellite DNA and not the centromere. Similar large constitutive heterochromatic bands consisting predominately of satellite II or III DNA are located at 9q12, 16q11.2, and Yq12. Of these, 9q12 and Yq12 contain abundant segmental duplications in adjacent bands 9q13 and Yq11.23, respectively.
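The segmental duplication figures quoted for band 1q21.1 can be reproduced with a few lines of arithmetic. All base-pair counts below come from the text. The one assumption, flagged in the code, is that the HYDIN contig contributes about 550 kb (the lower bound given in the abstract) and that all of it counts as segmental duplication, which is only an approximation of how the 68% figure may have been derived.

```python
# Values for chromosome band 1q21.1 as quoted in the text (bp).
BAND_SIZE_BP = 4_900_000
FINISHED_BP = 3_998_164
INTRA_ONLY_BP = 1_917_429
INTRA_AND_INTER_BP = 485_494
INTER_ONLY_BP = 140_609

# ASSUMPTION: contig size taken as the ">550 kb" lower bound from the
# abstract and treated as entirely duplicated sequence.
HYDIN_CONTIG_BP = 550_000

duplicated_bp = INTRA_ONLY_BP + INTRA_AND_INTER_BP + INTER_ONLY_BP
duplicated_fraction = duplicated_bp / FINISHED_BP
missing_bp = BAND_SIZE_BP - FINISHED_BP
fraction_with_hydin = (duplicated_bp + HYDIN_CONTIG_BP) / (
    FINISHED_BP + HYDIN_CONTIG_BP
)

print(f"duplicated: {duplicated_bp} bp ({duplicated_fraction:.1%})")
print(f"missing sequence: {missing_bp} bp")
print(f"with HYDIN contig: {fraction_with_hydin:.1%}")
```

The three categories sum to the stated 2,543,532 bp, and the missing sequence comes to roughly 902 kb, in line with the eleven clone gaps described.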

The sequence of the chromosome 1 clone contig in which the paralogous copy of the HYDIN gene resides is not entirely finished. The end containing the insertion site near the 5′ start of the paralogous HYDIN transcripts is finished, with two finished clones (AC136634 and AC137783) and an additional drafted clone (AC138879) confirming the sequence of this insertion junction (Fig. 1). Beyond the paralogous HYDIN sequence at this end of the contig are numerous other copies of previously identified segmental duplications from 1q21.1. The largest of these are Segmental Duplications Nos. 361, 447, 449, 455, 552, 572, 574, and 1211, most of which contain numerous other smaller duplications (see Segmental Dups Track in the UCSC Genome Browser). The sequence at this end cannot be consistently aligned with other finished sequences from chromosome 1q21.1 and therefore must extend into 1 of the 11 sequence gaps of this band. The other end of the contig, containing the 3′ end of the HYDIN paralogous copy, is represented by two drafted clones (BX546456 and AC092369). Beyond the HYDIN duplication at this end are also numerous other copies of previously identified segmental duplications from 1q21.1, including Segmental Duplications 361, 363, 456, 472, 486, 491, 522, 523, 551, 970, and 13399 and smaller duplications contained within most of these (see Segmental Dups Track in the UCSC Genome Browser). This end of the contig appears to overlap with the genomic contig NT_034398 from 1q21.1, but confirmation of this possibility will require finishing of clones BX546456 and AC092369.

Alu repeat clusters have been shown to play a role as mediators of recurrent chromosomal rearrangements [10,11]. While repetitive elements could be found near the breakpoints of the 16q22.2–1q21.1 paralogy (Alu (Jo)/LTR repetitive elements at the chromosome 16 duplicon junction and LINE-1/LTR repetitive elements at the chromosome 1 junction), these did not share sequence similarity, and thus the molecular mechanism by which this duplication occurred is unlikely to have involved homologous recombination and remains, for the moment, unclear.

Duplications within chromosomes (intrachromosomal) may subsequently lead to deletions or rearrangements due to unequal recombination events during meiosis between nearly identical copies of the duplicon. This is observed in genomic mutation disorders such as Charcot–Marie–Tooth 1A, DiGeorge/velocardiofacial syndrome, Prader–Willi and Angelman syndromes, Smith–Magenis syndrome, Williams–Beuren syndrome, and neurofibromatosis type 1 (reviewed in [12]). A polymorphic interstitial duplication on human chromosome 15q24–q26 has been shown to be a susceptibility factor for panic and phobic disorders [13]. Polymorphic interchromosomal duplications of members of the olfactory receptor gene family occur in the subtelomeric region of several human chromosomes [14], and polymorphic segmental duplications of the defensin gene cluster have been reported at 8p23.1 [15].

Unequal crossing-over events (either intrachromosomal or interchromosomal) occurring between repeat arrays during mitosis can lead to mosaicism as has been observed for the 15q24–q26 duplicon, which is flanked by chromosome 15-specific low-copy repeat sequences [13], and for subtelomeric interchromosomal repeats on chromosomes 4 and 10 [16]. At the present time, we do not know whether the duplicated copy of the HYDIN locus on chromosome 1 can contribute toward altered susceptibility to hydrocephalus or other conditions. Due to the high level of sequence identity between the two loci, however, it will remain a challenge to differentiate positively between the chromosome 16 and the chromosome 1 copies in any mutational or expression analysis. Only the first 5 and the last 2 exons of the predicted 86-exon HYDIN gene model are unique to chromosome 16 and it is not likely possible to generate primers that enable specific amplification of the remaining chromosome 16 HYDIN exons for mutation screening of patients with autosomal congenital hydrocephalus. Amplified gene fragments would need to be cloned and several individual clone isolates sequenced to generate sequencing reads representing both paralogues.
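The exon bookkeeping behind this screening problem can be made explicit. The sketch below partitions the 86 exons of the gene model into those inside the duplicon (exons 6 through 84, as stated in the Results) and those unique to chromosome 16, and attaches the screening approach the text suggests for each class. The function name and the wording of the returned strings are illustrative.

```python
TOTAL_EXONS = 86
FIRST_DUPLICATED_EXON = 6
LAST_DUPLICATED_EXON = 84

all_exons = range(1, TOTAL_EXONS + 1)
duplicated_exons = [
    e for e in all_exons if FIRST_DUPLICATED_EXON <= e <= LAST_DUPLICATED_EXON
]
unique_exons = [e for e in all_exons if e not in duplicated_exons]


def screening_strategy(exon: int) -> str:
    """Suggest how an exon might be screened, following the reasoning in the text."""
    if exon not in all_exons:
        raise ValueError(f"exon {exon} is outside the 86-exon gene model")
    if exon in unique_exons:
        return "direct PCR and sequencing (unique to chromosome 16)"
    return "clone amplicons and sequence several isolates (both paralogues amplify)"


print(len(duplicated_exons), unique_exons)
```

Only 7 of the 86 exons fall outside the duplicon, which is why nearly all mutation screening of HYDIN has to contend with co-amplification of the chromosome 1 copy.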

Recent studies have found evidence of a significant number of polymorphic duplications within the genomes of normal individuals. Sebat et al. [17] found evidence for 76 large-scale copy number polymorphisms in the human genome using representational oligonucleotide microarray analysis of 20 individuals. They reported large-scale polymorphic losses of 862,373 and 209,899 bp in 1q21. The largest of these, which is linked to Accession Nos. AC138775 and AL583842 in NCBI build 33, is not placed in the current NCBI build 35 assembly. The smaller polymorphic loss can be placed in the current assembly at chromosome 1, 142,374,952–142,584,851, within NT_004434. Iafrate et al. [18] identified 255 large-scale polymorphisms by array-based comparative genomic hybridization of 55 individuals and none of these involved the HYDIN locus, agreeing with our findings that the HYDIN duplication shows no evidence of structural polymorphism (in the absence of disease).

Recent segmental duplications often serve as “nurseries” forthe evolution of human-specific genes [19]. The duplicatedcopies of genes escape the evolutionary pressure of their parentgene and are more freely able to acquire new functions. TheHYDIN segmental duplication described here provides evi-dence for gene evolution through gene duplication, pointmutations, alternate splicing, and gene splitting. While a full-length HYDIN gene is not possible from the chromosome 1 sitedue to truncation at the 5′ and 3′ ends relative to chromosome16, at least four transcripts are expressed from the chromosome1 paralogue of the HYDIN duplication. These transcripts mayhave a more limited range of tissue expression than transcriptsarising from chromosome 16; however, elucidation of any novelor specialized functions of chromosome 1 HYDIN transcriptswill require further analysis. By comparison, in the mouse,

Table 3
PCR primers used in this study

Primer name         Sequence (5′ to 3′)

FLK.F               GAGTGCCTCCGTAAGCTGAG
FLK.R               CATGTACACGAGGTGGTTGG
JCT.F               CAAGAGCAAAACCCTGCTTC
JCT.R               AGACCCACAAACTGGTCCTG
HYDIN-indel-61.F    GGAAGAATTCAAGGGGAAAAA
HYDIN-indel-61.R    GGCCGAGGTGATAGTGAAGA
HYDIN-indel-12.F    TTGATGAATGCAGCCTGGTA
HYDIN-indel-12.R    CCAAATGCTGCTAAGTGAAGC
HYDIN1.F            AGGCATTATTCCAGCCCTTT
HYDIN1.R            TCGGAATATGGAAGGCACTC
HYDIN2.F            AGTCCCTTGTGAATGGTTCG
HYDIN2.R            CTTGCACGTTGGACTTCTCA
HYDIN3.F            AAAGCCAGGAAACACATTGG
HYDIN3.R            CAGGCTGTTGTTCATGATGG
HYDIN4.F            CTTTCCATCTGGGCATCACT
HYDIN4.R            TCTCCTTCCTGCTTTGGTGT
HYDIN5.F            GTAAGAAGGGCCGGGTTAAG
HYDIN5.R            GGAGAGCTGGATGAGAGGTG
chr16.orf2.1.F      AAGGAGCGCACGGAGAAG
chr16.orf2.1.R      ATCCACCGGAAGTGATTCAG
chr16.orf2.2.F      TGGAGACAATCGAAAGGAAAA
chr16.orf2.2.R      CTCTCCAGGCGCTCCTTCT

N.A. Doggett et al. / Genomics 88 (2006) 762–771

the single-copy Hydin is expressed in every tissue known to have 9+2 cilia. The incidence of congenital hydrocephalus appears to vary among different populations (reviewed in [20,21]). Further population studies including genomic and gene expression level analysis are needed to determine any role of the chromosome 1 duplication of the HYDIN locus in disease susceptibility.

On the basis of the June 2002 (NCBI build 30) draft human genome assembly, a total of 107.4 Mb (3.53%) of the human genome content was found to be involved in recent segmental duplications [22]. Subsequent analysis of the finished genome sequence (NCBI build 35) revealed that segmental duplications accounted for about 5.3% of the genome [23]. While the HYDIN duplication is not represented in the most recent genome assembly for chromosome 1 (NCBI build 35), its duplication was detected in Accession Nos. AC027281, AC092369, AC109135, and AC099495 in the Segmental Duplication Database (http://humanparalogy.gs.washington.edu/) based on comparison of Celera whole-genome shotgun sequence data versus BAC clone sequences (described in [24]) (see also WSSD Duplication Track in the UCSC Genome Browser). However, the extent of the duplication was not precisely defined, and it remained unknown where the additional copy of the duplicon existed. Upon further analysis of the Whole Genome Shotgun Sequence Detection (WSSD) Database and comparison with the build 35 human genome assembly, we can account for a total of approximately 5 Mb of genomic duplications, including the one in this paper, detected by a high depth of Celera whole-genome shotgun sequence reads but not identified as segmental duplications (i.e., present in the WSSD Duplication Track but absent from the Segmental Dups Track). The missing copies of these duplications may reside within remaining sequence gaps in the current genome assembly or possibly represent structural polymorphisms. Additional studies are needed to resolve these remaining genomic questions.
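The comparison described above, in which regions present in one annotation track but absent from another are totaled, amounts to an interval subtraction. The sketch below illustrates the idea on toy coordinates. It is not the procedure actually used for this analysis: the interval lists are hypothetical, a single chromosome is assumed, and intervals are treated as half-open (start, end) pairs.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def subtract(a, b):
    """Return the parts of intervals in `a` not covered by any interval in `b`."""
    b = merge(b)
    result = []
    for start, end in merge(a):
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor:
                continue          # this b interval lies entirely to the left
            if b_start >= end:
                break             # remaining b intervals lie to the right
            if b_start > cursor:
                result.append((cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def total_bp(intervals):
    """Sum of interval lengths in base pairs."""
    return sum(end - start for start, end in intervals)


# Toy coordinates only (hypothetical, not real track data).
depth_track = [(100, 500), (800, 1200)]     # stands in for the WSSD track
segdup_track = [(150, 300), (1100, 1500)]   # stands in for the Segmental Dups track

depth_only = subtract(depth_track, segdup_track)
print(depth_only)
print(total_bp(depth_only))
```

Applied genome-wide to the two real tracks, a total of this kind would correspond to the approximately 5 Mb figure given in the text.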

Methods

De novo gene structure prediction

Genscan (http://genes.mit.edu/GENSCAN.html) [25] and GrailEXP (http://grail.lsd.ornl.gov/grailexp/) [26] were used for ab initio gene finding. Disparities were observed in the results of gene prediction programs, including different use of splicing sites or exons between different gene models. These ab initio prediction programs did not reveal any new genes compared with previously known human or mouse genes, and neither was successful in predicting the human HYDIN gene, because of its giant size.

Genes identified during annotation

For previously known human mRNA sequences, the genomic structures were established by sequence comparison and alignment between the human mRNA sequences and the genomic DNA sequence of chromosome 16. The position of putative exons in the sequence was determined with the Sim4 program (http://globin.cse.psu.edu/globin/html/) [27]. The primary location of mRNAs was determined by using the Sim4 and Blastn programs [28] to align and compare identity with the chromosome 16 or chromosome 1 paralogues.

For human HYDIN gene annotation, a tBlastn search was performed by using the mouse Hydin protein sequence against the human genomic DNA. The putative human HYDIN protein sequence was extracted after parsing the Blast result. Further sequence analysis using tBlastn (putative human HYDIN protein sequence as the query) allowed identification of the predicted HYDIN gene structure on chromosome 16. Its splice sites were confirmed using the Splice Site Prediction program (http://www.fruitfly.org/seq_tools/splice.html), and all 86 exons were computationally validated using the Genewise program (http://www.ebi.ac.uk/Wise2/) [29].

Duplicon junction identification

Blast2 alignment (http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html) [30] was used to compare the 360 kb of paralogous sequence between chromosome 16 and chromosome 1. Sequence alignments were performed with Clustalw software [31]. After all repetitive elements were determined using the RepeatMasker program (http://www.repeatmasker.org/), Pipmaker analysis (http://www.bx.psu.edu/miller_lab/) [32] was performed to plot 360 kb of chromosome 16 sequence against its paralogous sequence at chromosome 1.
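For readers unfamiliar with how identity between two paralogous segments is summarized, the following is a minimal sketch of a percent-identity calculation over a gapped pairwise alignment. It is not the Blast2 or Clustalw computation itself: the toy sequences are hypothetical, and gapped columns are excluded from the denominator, which is only one of several conventions in use.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: one gap column and one mismatch.
chr16_fragment = "ACGTTGCA-GGCTA"
chr1_fragment = "ACGTTGCATGGCTT"

identity = percent_identity(chr16_fragment, chr1_fragment)
print(f"{identity:.2f}% identity")
```

Whether indel columns are counted as mismatches or skipped changes the reported value, so the convention should be stated alongside any identity figure.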

Sequence analysis of predicted genes

Deduced amino acid sequences were analyzed for N-terminal signal sequences and transmembrane domains using Psort (http://psort.ims.u-tokyo.ac.jp/) [33]. InterPRO (http://www.ebi.ac.uk/interpro/index.html) [34] was used for profile, pattern, and motif searching. SMART software (http://smart.embl-heidelberg.de/) [35] was used for signaling domain identification. Divergent homologous sequences were determined by Psi-Blast searching [28].

PCR amplifications

PCR primer sequences are listed in Table 3. A human multiple-tissue cDNA panel was purchased from Clontech. Primate and human diversity DNA panels were acquired from the Coriell Cell Repositories. PCRs with FLK and JCT primers were performed in 10-μl reactions, with 10 mM Tris–HCl, pH 8.3, 50 mM KCl, 2 mM MgCl2, 480 mM dNTPs, 0.7 μM primers, 1.5 U AmpliTaq Gold, and 500 ng DNA. Thermal cycling was performed in an ABI 9700 under the following conditions: 95°C for 5 min initial denaturation, 35 cycles (95°C for 30 s, 58°C for 30 s, 68°C for 1 min), and 68°C for 10 min final elongation. PCRs of the multiple-tissue cDNA panel were performed in 50 μl containing 20 mM Tris–HCl, pH 8.5, 50 mM KCl, 1.5 mM MgCl2, 200 μM dNTPs, 0.2 μM primers, and 0.75 μl Taq DNA polymerase (Clontech). These reactions were run in an MJ thermocycler under the following conditions: 94°C for 2 min initial denaturation, 30 cycles (94°C for 10 s, 60°C for 30 s, 68°C for 1 min), and 68°C for 7 min final elongation. The methods for DNA typing using the human diversity panel were performed as described [36].
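The two thermal cycling programs above can be written down as simple data structures, which makes the programmed hold time of each run easy to compare. This is a sketch for illustration only: the class and field names are assumptions, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int    # block temperature in degrees Celsius
    seconds: int   # programmed hold time


@dataclass(frozen=True)
class Program:
    initial: Step
    cycle: tuple   # steps repeated n_cycles times
    n_cycles: int
    final: Step

    def hold_seconds(self):
        """Total programmed hold time, excluding temperature ramps."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.n_cycles * per_cycle + self.final.seconds


# FLK/JCT genomic PCR conditions as given in the text.
genomic = Program(
    initial=Step(95, 5 * 60),
    cycle=(Step(95, 30), Step(58, 30), Step(68, 60)),
    n_cycles=35,
    final=Step(68, 10 * 60),
)

# Multiple-tissue cDNA panel conditions as given in the text.
cdna = Program(
    initial=Step(94, 2 * 60),
    cycle=(Step(94, 10), Step(60, 30), Step(68, 60)),
    n_cycles=30,
    final=Step(68, 7 * 60),
)

for name, program in (("genomic", genomic), ("cDNA", cdna)):
    print(name, program.hold_seconds() / 60, "min")
```

Under these assumptions the genomic program holds for 85 min and the cDNA program for 59 min; actual run times will be longer once ramping is included.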

Fluorescence in situ hybridization

BAC DNA was isolated using the Plasmid Maxi Prep Kit (Qiagen Ltd., Ireland) method and quantified using a fluorimeter with known standards. Exactly 1 μg of purified DNA was then fluorescently labeled using either spectrum-green–dUTP (Vysis, UK) or biotin-14–dATP (Invitrogen, USA) by standard nick-translation reaction mixtures as recommended by the supplier. The DNA was then ethanol precipitated in the presence of 50× human Cot-1 DNA (to block highly repetitive sequences) and resuspended in 22.5 μl of hybridization buffer (50% formamide, 1× SSC, 10% dextran sulfate) overnight at 37°C. The hybridization protocol recommended by Vysis, UK, was then carried out on normal human chromosome preparations from individuals within the Irish population as well as on the human cell line GM08729 (Coriell) and two chimpanzee cell lines, PT5 and TANK (ATCC, Manassas, VA, USA). Slides hybridized with the biotin-labeled BAC, RPCI-11 424M24, were hybridized, washed, and detected with avidin–FITC as described [37]. Following the hybridization and washing steps, the slides were counterstained with 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI-Antifade from Vysis). Images were acquired with a CCD camera coupled to an Olympus BX60 microscope. Image processing was carried out using Cytovision 3.52 software from Applied Imaging.

Acknowledgments

We thank Judy Tesmer and Meghan Doyal, Los Alamos National Laboratory, for technical assistance. We are also grateful to Barbara Trask, Fred Hutchinson Cancer Research Center, for providing the support and facilities for M.K.R. to perform FISH on primate cell lines in her laboratory. This work was supported by the U.S. Department of Energy Office of Biological and Environmental Research under Contract W-7405-ENG-36.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ygeno.2006.07.012.

References

[1] B.E. Davy, M.L. Robinson, Congenital hydrocephalus in hy3 mice is caused by a frameshift mutation in Hydin, a large novel gene, Hum. Mol. Genet. 12 (2003) 1163–1170.

[2] D.F. Callen, E.G. Baker, S.A. Lane, Re-evaluation of GM2346 from del(16)(q22) to t(4;16)(q35;q22.1), Clin. Genet. 38 (1990) 466–468.

[3] N. Saguragawa, Y. Yokoyama, Clinical and molecular genetics of inherited hydrocephalus, Congenit. Anom. 34 (1994) 303–310.

[4] P.H. Schurr, C.E. Polkey (Eds.), Hydrocephalus, Oxford Univ. Press, New York, 1993.

[5] A. Rosenthal, M. Jouet, S. Kenwrick, Aberrant splicing of neural cell adhesion molecule L1 mRNA in a family with X-linked hydrocephalus, Nat. Genet. 2 (1992) 107–112.

[6] M. Jouet, et al., X-linked spastic paraplegia (SPG1), MASA syndrome and X-linked hydrocephalus result from mutations in the neural cell adhesion gene L1CAM, Nat. Genet. 7 (1994) 402–407.

[7] L. Vits, et al., MASA syndrome is due to mutations in the neural cell adhesion gene L1CAM, Nat. Genet. 7 (1994) 408–413.

[8] J. Martin, et al., The sequence and analysis of duplication-rich human chromosome 16, Nature 432 (2004) 988–994.

[9] E.E. Eichler, Recent duplication, domain accretion and the dynamic mutation of the human genome, Trends Genet. 17 (2001) 661–669.

[10] E. Kolomietz, M.S. Meyn, A. Pandita, J.A. Squire, The role of Alu repeat clusters as mediators of recurrent chromosomal aberrations in tumors, Genes Chromosomes Cancer 35 (2002) 97–112.

[11] J.A. Bailey, G. Liu, E.E. Eichler, An Alu transposition model for the origin and expansion of human segmental duplications, Am. J. Hum. Genet. 73 (2003) 823–834.

[12] J.R. Lupski, Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits, Trends Genet. 14 (1998) 417–422.

[13] M. Gratacòs, et al., A polymorphic genomic duplication on human chromosome 15 is a susceptibility factor for panic and phobic disorders, Cell 106 (2001) 367–379.

[14] B.J. Trask, et al., Members of the olfactory receptor gene family are contained in large blocks of DNA duplicated polymorphically near the ends of human chromosomes, Hum. Mol. Genet. 7 (1998) 13–26.

[15] E.J. Hollox, J.A.L. Armour, J.C.K. Barber, Extensive normal copy number variation of a beta-defensin antimicrobial-gene cluster, Am. J. Hum. Genet. 73 (2003) 591–600.

[16] P.G. van Overveld, et al., Interchromosomal repeat array interactions between chromosomes 4 and 10: a model for subtelomeric plasticity, Hum. Mol. Genet. 9 (2000) 2879–2884.

[17] J. Sebat, et al., Large-scale copy number polymorphism in the human genome, Science 305 (2004) 525–528.

[18] A.J. Iafrate, et al., Detection of large-scale variation in the human genome, Nat. Genet. 36 (2004) 949–951.

[19] J.L. Nahon, Birth of ‘human-specific’ genes during primate evolution, Genetica 118 (2003) 193–208.

[20] C. Schrander-Stumpel, J.-P. Fryns, Congenital hydrocephalus: nosology and guidelines for clinical approach and genetic counseling, Eur. J. Pediatr. 157 (1998) 355–362.

[21] F. Haverkamp, et al., Congenital hydrocephalus internus and aqueduct stenosis: aetiology and implications for genetic counseling, Eur. J. Pediatr. 158 (1999) 474–478.

[22] J. Cheung, et al., Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence, Genome Biol. 4 (2003) R25.

[23] International Human Genome Sequencing Consortium, Finishing the euchromatic sequence of the human genome, Nature 431 (2004) 931–945.

[24] J.A. Bailey, et al., Recent segmental duplications in the human genome, Science 297 (2002) 1003–1007.

[25] C. Burge, S. Karlin, Prediction of complete gene structures in human genomic DNA, J. Mol. Biol. 268 (1997) 78–94.

[26] Y. Xu, R.J. Mural, E.C. Uberbacher, Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags, in: T. Gaasterland, P. Karp, K. Karplus, C. Ouzounis, C. Sander, A. Valencia (Eds.), Proceedings of the Fifth International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, CA, 1997, pp. 344–353.

[27] L. Florea, G. Hartzell, Z. Zhang, G.M. Rubin, W. Miller, A computer program for aligning a cDNA sequence with a genomic DNA sequence, Genome Res. 8 (1998) 967–974.

[28] S.F. Altschul, et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402.

[29] E. Birney, M. Clamp, R. Durbin, GeneWise and genomewise, Genome Res. 14 (2004) 988–995.

[30] T.A. Tatusova, T.L. Madden, BLAST 2 sequences, a new tool for comparing protein and nucleotide sequences, FEMS Microbiol. Lett. 174 (1999) 247–250.

[31] J.D. Thompson, D.G. Higgins, T.J. Gibson, CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic Acids Res. 22 (1994) 4673–4680.

[32] S. Schwartz, et al., PipMaker: a Web server for aligning two genomic DNA sequences, Genome Res. 10 (2000) 577–586.

[33] K. Nakai, M. Kanehisa, Expert system for predicting protein localization sites in gram-negative bacteria, Proteins Struct. Funct. Genet. 11 (1991) 95–110.

[34] N.J. Mulder, et al., The InterPro database, brings increased coverage and new features, Nucleic Acids Res. 31 (2003) 315–318.

[35] J. Schultz, F. Milpetz, P. Bork, C.P. Ponting, SMART, a simple modular architecture research tool: identification of signaling domains, Proc. Natl. Acad. Sci. U. S. A. 95 (1998) 5857–5864.

[36] J.L. Weber, et al., Human diallelic insertion/deletion polymorphisms, Am. J. Hum. Genet. 71 (2002) 854–862.

[37] B.J. Trask, B. Birren, E. Green, P. Hieter, R. Myers (Eds.), Genome Analysis: A Laboratory Manual, vol. 4, Cold Spring Harbor Laboratory Press, New York, 1998, pp. 303–413.

Web site references

http://bioweb.pasteur.fr/seqanal/interfaces/clustalw.html: ClustalW multiple alignments.
http://www.ebi.ac.uk/Wise2/: Genewise protein to genomic DNA sequence comparison.
http://genes.mit.edu/GENSCAN.html: Genscan gene structure prediction.
http://grail.lsd.ornl.gov/grailexp/: GRAILEXP gene discovery suite.
http://www.ebi.ac.uk/interpro/index.html: InterPRO database.
http://www.ncbi.nlm.nih.gov/blast/: NCBI BLAST pages.
http://www.bx.psu.edu/miller_lab/: PipMaker alignments program.
http://psort.ims.u-tokyo.ac.jp/: PSORT protein localization prediction.
http://www.repeatmasker.org/: RepeatMasker program.
http://globin.cse.psu.edu/globin/html/: Sim4 program.
http://smart.embl-heidelberg.de/: SMART Simple Modular Architecture Research Tool.
http://www.fruitfly.org/seq_tools/splice.html: Splice site prediction by neural network.
http://genome.cse.ucsc.edu/cgi-bin/hgGateway: UCSC Genome Browser.
http://humanparalogy.gs.washington.edu/: Segmental Duplication Database.

