Structure and Expression of the Human Immunoglobulin λ Genes

By Thomas J. Vasicek and Philip Leder

From the Department of Genetics, Harvard Medical School; and the Howard Hughes Medical Research Institute, Boston, Massachusetts 02115

Summary

We determined the DNA sequence of two large regions of chromosome 22: 33.7 kb containing the Cλ complex; and 5.2 kb 5' of the functionally rearranged λ gene from the human myeloma, U266. Analysis of these sequences reveals the complete structure of the human Cλ complex and a previously undescribed seventh Cλ region that may encode the Ke⁺Oz⁻ λ protein. The seven constant regions are organized in a tandem array, and each is preceded by a single Jλ region. λ1, λ2, λ3, and λ7 are apparently active genes, while λ4, λ5, and λ6 are pseudogenes. There are no other Jλ or Cλ regions within a 60-kb region surrounding the Cλ complex; however, there are at least four other λ-like genes and λ pseudogenes in the human genome. The λ genes appear to have evolved via a series of gene duplication events resulting from unequal crossing over or gene conversion between the highly conserved Cλ regions on mispaired chromosomes. The lack of Alu sequences in this large segment of DNA suggests that the Cλ complex resulted from a recent amplification of a smaller Alu-free segment of DNA. Illegitimate recombination between repeated sequences containing λ2 and λ3 may be responsible for variable amplification of the λ genes. We also found a 1,377-bp open reading frame (ORF) located on the opposite strand in the region containing λ7. While this ORF is flanked by potential RNA splicing signals, we have no evidence that it is part of a functional gene. We also discovered a Vλ pseudogene, called ψVλ1, 3 kb upstream of the U266 λ gene. Using primer extension analysis to map the transcription start in the human λ gene, we have identified its initiation point 41 bp upstream of the initiation codon. Analysis of the λ promoter reveals that it contains a TATAA box at position -29 relative to the transcription initiation site and an octamer sequence at -67. Computer analysis of 40 kb of DNA sequences surrounding the human λ locus has revealed no sequences resembling the κ or IgH transcriptional enhancers, nor have in vitro analyses for function revealed enhancer activity. A comparison of these results with those obtained in separate studies with transgenic mice points to a complex, developmentally linked mechanism of transcriptional activation.

J. Exp. Med. © The Rockefeller University Press · 0022-1007/90/08/0609/12 $2.00. Volume 172, August 1990, 609-620.
Downloaded from http://rupress.org/jem/article-pdf/172/2/609/1394554/609.pdf by guest on 23 May 2021

The Ig genes are among the most intensely studied and clinically relevant loci in man. Ig proteins are encoded at three independent loci: IgH, Igκ, and Igλ. Each locus consists of a complex of V, J, and C gene segments, along with diversity segments in the IgH locus, which require rearrangement during B cell development to produce active genes (for review, see reference 1). The detailed structures of the human κ and IgH genes have been determined, and control sequences such as transcriptional enhancers and conserved promoter elements required for Ig gene activation have been delineated. Because of the importance of the Ig genes, we have set out to characterize the third of these loci, the human λ locus.

λ L chains are present in ~40% of human Igs (2). Four λ isotypes, termed Mcg⁺, Ke⁻Oz⁻, Ke⁻Oz⁺, and Ke⁺Oz⁻, have been described on the basis of their reactivity with the Oz (3), Kern (4), and Mcg (5) antisera, which were raised against λ Bence Jones proteins isolated from patients with multiple myeloma (6). The Oz, Kern, and Mcg antisera bind to specific epitopes on the Cλ regions that result from specific amino acid substitutions. In addition to the four well-known λ isotypes, peptide sequence analysis of human λ proteins reveals 10 additional, distinct Cλ sequences (7). Whether these unique λ proteins represent as yet uncharacterized Cλ genes or discrete polymorphic alleles of known genes has yet to be determined. This contrasts with the human κ gene, which is represented as a single Cκ region sequence with two polymorphic alleles.

In their initial characterization of the human λ locus, Hieter et al. (8) cloned and partially characterized the Cλ gene complex and predicted that the human λ locus contained a long tandem array of at least six Cλ regions spanning >30 kb of DNA. The DNA sequence of the first three Cλ regions (numbered from 5' to 3') revealed that they potentially encoded the Mcg⁺, Ke⁻Oz⁻, and Ke⁻Oz⁺ polypeptides, respectively (8). Dariavach et al. (9) concluded, from partial sequences of Cλ4, Cλ5, and Cλ6, that λ4 and λ5 were pseudogenes and, contrary to our findings, that λ6 could encode the Ke⁺Oz⁻ protein. The location of the Ke⁺Oz⁻ gene is of particular interest, since, in addition to the main Cλ complex on chromosome 22 (10, 11), several other unlinked, related sequences are detected by crosshybridization to labeled Cλ probes (8). Therefore, the existence of another unlinked λ gene complex could not be ruled out. Further characterization of two of these unlinked sequences revealed that they are pseudogenes, while two others may encode λ-like proteins (12, 13). These λ-like genes may be related to similar sequences in the mouse, termed λ5, that may be important in early B cell differentiation (14).

To determine the genetic basis of human λ protein complexity and the mechanisms of λ gene expression control, we sequenced 33.7 kb of DNA containing the entire human Cλ complex. We also sequenced an 8.5-kb DNA segment containing the promoter and rearranged Vλ-Jλ sequences of an active λ gene. As a result, we can now definitively describe the organization of the human Cλ complex, which includes a previously uncharacterized seventh Cλ gene that is capable of encoding the Ke⁺Oz⁻ protein. We show that the DNA sequence conservation between the Cλ genes extends beyond the coding exons and may contribute to the evolution of the λ locus. Furthermore, analysis of the sequence data confirms that several of the Cλ regions are pseudogenes and reveals a long open reading frame (ORF) in the opposite orientation to the λ genes. We also searched these sequences for potential control elements similar to the known Ig enhancers and mapped the transcription start site in the λ promoter. Finally, we describe a Vλ pseudogene upstream of an active, rearranged Vλ.

Materials and Methods

Cloning of Vλ and Cλ Genes.

Recombinant phage clones containing the Cλ genes were those reported previously by Hieter et al. (8). These were cloned from Charon phage libraries prepared from either human fetal liver DNA partially digested with Sau3A (15) or EcoRI-digested DNA from white blood cells of a patient with chronic myelogenous leukemia. Cosmid 49a-4 was isolated from a library of partial MboI-digested spleen DNA from an α-thalassemia patient in the BamHI-digested pJB8 vector (16), and was provided by C. C. Morton and J. Sarid. The active λ gene, rearranged into λ2, was cloned from size-fractionated, EcoRI-digested DNA prepared from the IgE-λ human myeloma U266 cell line (17). Fragments of the phage and cosmid clones were subcloned as smaller plasmids by standard techniques for further manipulation (15).

DNA Sequencing.

Dideoxy chain termination sequencing (18) was performed on double-stranded plasmids, cosmids, and λ phage using Sequenase (United States Biochemical Corp., Cleveland, OH). Primers were either synthesized on a DNA synthesizer (380A; Applied Biosystems Inc., Foster City, CA) or purchased from New England Biolabs (Beverly, MA) (pBR322 and pUC primers) or Promega Biotech (Madison, WI) (T7 and Sp6 promoter primers). All coding region and pseudogene sequences were confirmed by sequencing on both strands with dITP in the place of dGTP (19), as were most of the intronic and intergenic regions, which were sequenced on only one strand. Sequences were analyzed on 4.0 or 5.0%, denaturing, electrolyte gradient (20) gels, 100 cm long by 0.4 mm thick, run in an Ephortec (Haake/Buckler, Saddle Brook, NJ) gel apparatus. This arrangement allowed resolution of 600-850 bases from each primer.

Abbreviation used in this paper: ORF, open reading frame.

Primer Extension Analysis.

An antisense strand deoxyoligonucleotide primer, 5'-CTGTGCCCTG AGTGAGGAGG GTGAGGATGA-3', complementary to the 3'-most bases of the first exon (the Vλ leader coding exon; see Fig. 9) was end labeled with T4 polynucleotide kinase and used to prime cDNA synthesis with AMV reverse transcriptase (Life Sciences, Tampa, FL) on total cellular RNA (15) from U266 myeloma cells. The resulting cDNAs were subjected to electrophoresis on an 8% sequencing gel with adjacent lanes containing chain termination sequencing reactions.
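Because the primer is antisense, its reverse complement gives the sense-strand sequence it anneals to at the 3' end of the leader exon. The following is a minimal sketch of that conversion; the function name and constant names are illustrative and are not taken from the paper, and only the 30-nt primer sequence itself comes from the text above.

```python
# Reverse-complement the antisense primer to recover the sense-strand
# target it should anneal to. Names here are hypothetical helpers.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

# Primer as printed in the text, with the spacing removed.
PRIMER = "CTGTGCCCTGAGTGAGGAGGGTGAGGATGA"

# Sense-strand sequence expected at the 3' end of the leader exon.
TARGET = reverse_complement(PRIMER)

if __name__ == "__main__":
    print(len(PRIMER), TARGET)
```

Applying the function twice returns the original primer, which is a quick sanity check when copying oligonucleotide sequences by hand.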

Computer-aided DNA Sequence Analysis.

Computer analyses were performed using the University of Wisconsin Genetics Computer Group (UWGCG) software (21), except for the evolutionary tree, which was generated by the progressive alignment program described by Feng and Doolittle (22).

Results

DNA Sequence of the Cλ Complex.

To define the structural features of the Cλ complex and establish the mechanism for its evolution, as well as search for potential enhancer sequences, we determined the nucleotide sequence of the entire Cλ complex. The DNA sequencing strategy for the Cλ genes is shown in Fig. 1. We subcloned each of the λ genes separately and performed plasmid-directed, chain termination sequencing reactions with primers that specifically hybridize to either the Jλ or the Cλ regions to determine the sequence of large regions surrounding each Jλ and Cλ gene. In addition, we used commercially available plasmid primers to sequence inward from the ends of each cloned fragment and obtained the complete sequence of each plasmid insert by extending the known regions with additional synthetic primers. We sequenced across the subclone boundaries and small EcoRI restriction fragments by priming directly on fragments cloned in cosmids and λ phage. In all, 42.2 kb of DNA were sequenced, largely on both strands.

Structure of the Human Cλ Complex.

Analysis of the complete sequence of the human Cλ complex reveals that it consists of a tandem array of seven Cλ genes, each preceded by a single Jλ region (Fig. 1). We determined that no additional λ genes exist within 10 kb 5' of Jλ1 or within 17 kb 3' of Cλ7 by probing Southern blots of cloned DNA with labeled Jλ and Cλ fragments (data not shown). Southern blot analysis with Jλ and Cλ probes and partial DNA sequence analysis of the Cλ complex led previous investigators to propose that the λ locus contained a long array of six Jλ-Cλ gene pairs arranged in tandem (8, 9). λ7 may have been missed in previous studies because the distance between λ6 and λ7 is much smaller than that between the other λ genes. The distance between each Cλ region polyadenylation site and the next Jλ region is as follows: Cλ1 to Jλ2, 3,800 bp; Cλ2 to Jλ3, 3,600 bp; Cλ3 to Jλ4, 3,780 bp; Cλ4 to Jλ5, 3,080 bp; Cλ5 to Jλ6, 2,110 bp; Cλ6 to Jλ7, 1,420 bp. In addition, the Jλ-Cλ intron length decreases from 1,540 bp in λ1 to 1,145 bp in λ7; λ2, λ3, λ5, and λ6 are all ~1,310 bp. Furthermore, the Jλ region in λ4 was not previously recognized, probably because of its unusual location. λ4 contains a large deletion beginning 220 bp 3' from Jλ4 and ending at bp 65 of the Cλ region. This deletion removes most of the intron, placing Jλ4 anomalously close to Cλ4. While a computer search of the entire Cλ complex sequence reveals no additional Jλ or Cλ regions, the similarity between the genes extends beyond the coding sequences (see below).

Sequence Conservation Flanking the Jλ and Cλ Genes.

We performed a dot matrix comparison of the Cλ complex sequence to itself (Fig. 2) to determine the extent of identity between the seven λ genes. Direct repeats within the sequence are revealed by diagonals offset from the identity line, but parallel to it. The longest diagonals are partially due to two direct repeats ~3 kb long in which are embedded λ2 and λ3. These repeated sequences are shown as the shaded boxes in Fig. 1 and share >95% identity over their entire lengths. The repeats extend through the Jλ-Cλ intron and 1.4 kb 5' of the Jλ regions. An additional kilobase 5' of the Jλ regions is >85% identical, but 3' of the Cλ regions, the sequences diverge rapidly and are <60% identical.

Comparison of the remaining Jλ and Cλ regions on the dot matrix results in very short diagonals except in a few cases. The plot shows that sequences 5' of Jλ2, Jλ3, Jλ4, and Jλ5, 3' of Cλ[?] and Cλ[?], and in the introns of λ6 and λ7 share persistent identity extending well beyond the coding regions.
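The windowed self-comparison behind such a dot matrix can be sketched in a few lines. This is only an illustration of the general technique: it assumes that a dot is placed wherever at least `stringency` of the `window` aligned positions match, which may differ from the exact scoring used by the UWGCG compare program, and the function name and toy sequence are hypothetical.

```python
def dot_matrix(seq_a, seq_b, window=99, stringency=50):
    """Return (i, j) window start pairs whose windows share at least
    `stringency` identical positions (simple match counting; an
    assumption, not necessarily the UWGCG scoring scheme)."""
    a, b = seq_a.upper(), seq_b.upper()
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(a[i:i + window], b[j:j + window]))
            if matches >= stringency:
                dots.append((i, j))
    return dots

if __name__ == "__main__":
    # Toy sequence with one direct repeat of GATTACA (positions 0 and 11).
    toy = "GATTACACCCCGATTACA"
    for point in dot_matrix(toy, toy, window=7, stringency=7):
        print(point)
```

In a self-comparison every window matches itself, which yields the identity line; the direct repeat in the toy sequence adds a pair of off-diagonal dots, the small-scale analogue of the diagonals produced by the λ2 and λ3 repeats. The quadratic loop is adequate for short sequences but would be slow on tens of kilobases.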

Figure 1. Structure and organization of the human Cλ complex. The Jλ and Cλ exons, labeled λ1-λ7, are shown as black boxes; the ragged left edge on Cλ4 indicates that a portion of the C region is deleted (see text). The dotted line indicates the physical continuity of the locus. Restriction sites shown are BglII, Bg; EcoRI, E; and HindIII, H. Horizontal arrows beneath the restriction map depict some of the individual sequencing gel runs. Representative plasmids, λ phages, and one cosmid are shown beneath the ruler, which is labeled 0-34 kb. The numbers in parentheses indicate the total length of the clone inserts. The shaded boxes at λ2 and λ3 mark direct repeats that share >95% identity. The box under λ7 shows the location, and the white arrow indicates the orientation, of a long ORF in the opposite orientation from the λ genes. These sequence data have been submitted to the EMBL/GenBank Data Libraries under the accession number X51755.

Figure 2. Dot matrix comparison of human Cλ complex DNA sequence to itself. The solid diagonal line bisecting the matrix is called the identity line. The relative positions of the Jλ and Cλ regions are shown along the axes (0-34 kb), although their sizes are exaggerated for illustrative purposes. The matrix was generated using the University of Wisconsin Genetics Computer Group version 5 compare and dotplot programs with a window size of 99 and a stringency of 50.

Notably, computer searches of the sequenced regions of the λ locus reveal no Alu repeat sequences. On average, Alu sequences occur every 5-6 kb in the human genome (23), and we might expect six such sequences within the λ locus. Alu sequences have been implicated in genomic rearrangements at other loci (24); however, it appears that they have not been involved with the duplications observed in the λ locus. Indeed, lack of Alu sequences in a region this large suggests that it arose from recent amplification of a sequence that itself lacked an Alu sequence.
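The expectation of roughly six Alu elements follows from dividing the sequenced length by the average spacing. The sketch below reproduces that arithmetic for the 33.7-kb Cλ complex and, as an added assumption that is not part of the paper, uses a Poisson model to gauge how surprising a complete absence would be if Alu elements were scattered at random. Function names are illustrative.

```python
import math

def expected_count(region_kb, spacing_kb):
    """Expected number of elements in a region given an average spacing."""
    return region_kb / spacing_kb

def poisson_p_zero(expected):
    """Probability of observing no elements under a Poisson model.
    The Poisson assumption is ours; Alu elements are not truly
    distributed at random in the genome."""
    return math.exp(-expected)

if __name__ == "__main__":
    region_kb = 33.7  # length of the sequenced C-lambda complex
    for spacing_kb in (5.0, 6.0):  # "every 5-6 kb" from the text
        n = expected_count(region_kb, spacing_kb)
        print(f"spacing {spacing_kb} kb: expect {n:.1f}, P(none) = {poisson_p_zero(n):.4f}")
```

Under this rough model the chance of finding no Alu element in a region of this size is well below 1%, which is consistent with the argument that the region was amplified from a smaller Alu-free segment.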

Figure 3. Comparison of flanking sequences surrounding the Jλ and Cλ regions. The complete sequences for λ2 are shown. Identities in the other genes are represented by dashes, and deletions are represented by asterisks. The amino acids are numbered as in Kabat et al. (40). (A) Heptamer-nonamer recombination signals 5' of the Jλ regions. (B) 3' flanking sequences showing the translation terminator (ter) and the AATAAA polyadenylation signals. (C) Splice donor and acceptor sequences in the Jλ-Cλ intron. Add 60 to the numbers between the arrows to calculate the total intron lengths.

Distinguishing between Functional and Pseudogenes.

Having precisely located all of the remaining Jλ and Cλ genes in the λ complex, we set out to determine which of them were functional. All active Ig J regions are preceded by conserved nonamer (G₂T₅GT) and heptamer (CACTGTG) sequences separated by either 12 or 23 unconserved base pairs. The reverse complement of these sequences (CACAGTG and ACA₅C₂) separated by 23 or 12 bp is found downstream of each corresponding V region (or D in the IgH locus) (1). These recombination signals are necessary and sufficient for accurate V-J rearrangements in normal pre-B cells (25, 26). Mutation of the recombination signals is one potential mechanism for inactivating an Ig gene. Fig. 3 A shows that each human Jλ has an intact copy of the consensus nonamer and heptamer sequences separated by 12 bases. The recombination signals in λ2 and λ3 exactly match the consensus, and the substitutions present in the others are known to have no effect on activity (26). Thus, V-J rearrangements with all of these Jλ regions are theoretically possible.
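A search for the J-side recombination signal described above (nonamer, 12- or 23-bp spacer, heptamer) can be sketched with a regular expression. This minimal example finds exact consensus matches only; as the text notes, functional signals tolerate some substitutions, so a real scan would need mismatch-tolerant scoring. The function name and the toy sequence are hypothetical.

```python
import re

NONAMER = "GGTTTTTGT"  # G2T5GT consensus quoted in the text
HEPTAMER = "CACTGTG"   # heptamer consensus quoted in the text

def find_j_side_signals(seq, spacers=(12, 23)):
    """Return sorted (start, spacer_length) pairs for every exact
    nonamer-spacer-heptamer match. Exact matching is a simplification."""
    seq = seq.upper()
    hits = []
    for spacer in spacers:
        # Lookahead so that overlapping candidates are all reported.
        pattern = re.compile(f"(?={NONAMER}[ACGT]{{{spacer}}}{HEPTAMER})")
        hits.extend((m.start(), spacer) for m in pattern.finditer(seq))
    return sorted(hits)

if __name__ == "__main__":
    # Toy sequence: consensus signal with a 12-bp spacer, starting at index 4.
    toy = "AAAA" + NONAMER + "ACGTACGTACGT" + HEPTAMER + "TTTT"
    print(find_j_side_signals(toy))
```

The V-side signal could be found the same way by searching for the reverse-complement order (heptamer, spacer, nonamer) with the complementary sequences.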

Another potential mechanism for inactivating a gene is mutation of the RNA processing signals for polyadenylation or splicing. Each of the seven λ genes has at least one copy of the consensus polyadenylation site (AATAAA) in its 3' untranslated region (Fig. 3 B). Two polyadenylation sites are present in λ2, λ3, λ5, and λ6, but the 5' site is mutated in λ1 and λ4, and is completely removed from λ7 by a 20-bp deletion. The other signals critical to RNA processing are those required for splicing. Nearly all eukaryotic introns begin with the AGGT(A/G)AGT splice donor and end with (T/C)₁₀NCAGG splice acceptor sequences (27). However, only λ1, λ2, λ3, and λ7 contain these conserved mRNA splice signals (Fig. 3 C). Note that the 5' splice donors in λ4, λ5, and λ6 all lack the GT dinucleotide, and the deletion in the λ4 gene completely eliminates the 3' splice acceptor sequence (the GT and AG dinucleotides are invariant and define the beginning and the end of the intron, respectively). Thus, it is unlikely that transcripts from λ4, λ5, and λ6 could be appropriately spliced.
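The two RNA processing checks described here reduce to simple string tests: does a candidate intron start with GT and end with AG, and where does AATAAA occur in a 3' untranslated region? The sketch below shows both under the assumption that the intron and UTR sequences have already been extracted; the function names and example sequences are illustrative and are not taken from the paper's data.

```python
def has_invariant_splice_dinucleotides(intron):
    """True if the intron begins with GT and ends with AG, the invariant
    dinucleotides of the splice donor and acceptor."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

def polyadenylation_signal_positions(utr, signal="AATAAA"):
    """Return 0-based start positions of every copy of the signal."""
    utr = utr.upper()
    return [i for i in range(len(utr) - len(signal) + 1) if utr.startswith(signal, i)]

if __name__ == "__main__":
    # Made-up sequences for illustration only.
    print(has_invariant_splice_dinucleotides("GTAAGTCCCTTTTTTTTTTCCAG"))
    print(has_invariant_splice_dinucleotides("CTAAGTCCCTTTTTTTTTTCCAG"))
    print(polyadenylation_signal_positions("CCAATAAAGGAATAAATT"))
```

These tests check only the minimal invariant features. A gene can pass both and still be a pseudogene, which is why the coding sequences are examined separately.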

The Coding Regions: λ7 Encodes the Ke⁺Oz⁻ C Region.

Only λ1, λ2, λ3, and λ7 can direct synthesis of a functional mRNA containing a complete Jλ-Cλ ORF. Hieter et al. (8) showed that λ1, λ2, and λ3 could encode the Mcg⁺, Ke⁻Oz⁻, and Ke⁻Oz⁺ proteins, but could not determine which of the remaining λ genes encoded the Ke⁺Oz⁻ protein. The nucleotide sequence data presented in Figs. 3 and 4 indicate that the newly discovered λ7 has all the hallmarks of a functional λ gene segment. It has the consensus signals for V-J rearrangement, mRNA splicing, and polyadenylation, as well as an open reading frame that could encode a protein much like the published Ke⁺Oz⁻ λ chain (28). It has been shown that the Gly residue at position 152 is required for reactivity with the Kern antiserum and, indeed, λ7 contains a Gly at this position (Fig. 5). However, λ7 differs from published amino acid sequences of λ proteins at three other positions: 157 (Val), 195 (Arg), and 212 (Ala). To determine whether this discrepancy is due to a cloning artifact or polymorphism, we cloned and sequenced this same region from two additional individuals and found them to be identical to our original sequence (data not shown). Therefore, the possibilities remain that the published Ke⁺Oz⁻ amino acid sequence is encoded on an as yet uncharacterized gene, that additional, rare polymorphic forms of the characterized λ genes exist in the human population, or that the published Ke⁺Oz⁻ amino acid sequence is incorrect at these positions.

Figure 4. Comparison of the Jλ and Cλ regions of the 11 known human λ genes, including seven genes from the main λ complex on chromosome 22 and four dispersed λ genes. The genes are arranged to keep the most similar sequences closest together. The dispersed λ genes are: ψλ1, a processed pseudogene (12) located on chromosome 18 (O. W. McBride, personal communication); and λ14.1, λ16.1, and ψλ18.1, whose chromosomal locations have not yet been reported (13). The Jλ and Cλ region sequences are spliced together, with the Jλ regions boxed in the upper left, to show the Jλ-Cλ ORF. Only the complete DNA and amino acid sequences of λ3 are shown, with the nucleotide differences in the other genes indicated. Identities between λ3 and the other genes are represented by dashes, and deletions are shown with asterisks. The amino acid sequence is numbered as in Kabat et al. (40) and is discontinuous due to the insertion of 106a after number 106, and deletion of numbers 169, 201, and 202. To optimize the alignment, bases were deleted from two points in λ6 (7): a four-base duplication, AGCT, between positions 246 and 247, and an additional G between positions 348 and 349. The Jλ regions are enclosed in the box.

Figure 5. Amino acid differences between the λ genes. The residues are numbered as in Kabat et al. (40), and only the differences that determine the isotype of the active genes and those that occur in more than one gene are shown. The sequences are arranged as in Fig. 4 to place the most similar sequences together. The predicted reactivity of λ1, 2, 3, and 7 to the Mcg, Ke, and Oz antisera is shown on the left, and the relevant amino acids are boxed.

λ4, λ5, and λ6 Are Pseudogenes.

Multiple defects in λ4, λ5, and λ6 render them incapable of coding for functional λ proteins. Most dramatically, λ4 contains a deletion of ~1,150 bp with respect to λ3 and λ5. Fig. 1 shows that the Jλ and Cλ regions are much closer together in λ4 than in the other genes. The bases missing from λ4 make up most of the intron and the 3' mRNA splice site (see line 4 in Fig. 3 C), plus 64 bp of Cλ coding sequence in the λ3 gene (see Fig. 4, ψλ4 3' of the Jλ). In addition, a three-base deletion at positions 226-228 in λ4 eliminates one codon, and a single-base deletion at position 270 causes a frameshift. λ5 cannot code for a functional protein due to a deletion of 11 bp in Cλ5 (positions 151-161; Fig. 4), resulting in a shift to a missense frame that would not be terminated until the polyadenylation site. λ6 has three frameshift mutations in its coding sequences, and Jλ6 has a four-base deletion that would result in premature termination four codons later. Moreover, Cλ6 also contains a four-base duplication at position 247 and a single-base insertion at position 349. Therefore, while λ4, λ5, and λ6 have the appropriate signals for Vλ-Jλ recombination, rearrangements with these genes cannot produce functional proteins.
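Whether an insertion or deletion disrupts the reading frame depends only on whether its length is a multiple of three. The short sketch below applies that rule to the indel lengths reported above; the function name and the dictionary labels are illustrative, while the lengths and positions are those given in the text.

```python
def indel_effect(length_bp):
    """Classify an insertion or deletion by its effect on the reading frame."""
    if length_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    return "in-frame" if length_bp % 3 == 0 else "frameshift"

# Indel lengths (bp) as reported in the text for the three pseudogenes.
REPORTED_INDELS = {
    "lambda4 deletion, positions 226-228": 3,
    "lambda4 deletion, position 270": 1,
    "Clambda5 deletion, positions 151-161": 11,
    "Jlambda6 deletion": 4,
    "Clambda6 duplication, position 247": 4,
    "Clambda6 insertion, position 349": 1,
}

if __name__ == "__main__":
    for label, length in REPORTED_INDELS.items():
        print(f"{label}: {length} bp, {indel_effect(length)}")
```

Only the three-base deletion in λ4 preserves the frame, which matches the description of it as removing a single codon; every other listed change shifts the frame.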

Evolution ofthe Ig 11 Genes.

While the X evolutionary treeand the structure of the mouse X genes (Fig. 6, A and C)suggest that the mouse X2-4 and X3-1 clusters arose byduplication of an ancestral cluster (29), similar descriptionof the origins of the human X genes is complicated by inter-genic exchanges that appear to have taken place among them .This effect is revealed on examination of the sequences shownin Figs . 3 and 4 . For example, as shown in Fig. 4, Cß,3 andCß,7 are identical from nucleotide positions 41-174, except

Dow

nloaded from http://rupress.org/jem

/article-pdf/172/2/609/1394554/609.pdf by guest on 23 May 2021

Page 6: Structure andExpression ofthe Human Immunoglobulin XGenes … · Structure andExpression ofthe Human Immunoglobulin XGenes By ThomasJ. Vasicek and Philip Leder From the DepartmentofGenetics,

B

C

12X3

X14.1X16.1

yjX1_Xs

_

X,- 14

C X2

v2

X2 X4

v1

X3 X11

11 II

1 II IIIs11

tokn

WXS WX18 .1

- Mouse

Figure 6.

Evolution of the human and mouse X genes for whichDNA sequences have been published. The Jr sequences (where avail-able) were spliced to the Ca sequences for calculation of evolutionaryrelationships. The sequences for human XI-X7 are from Fig. 4; X14.1,X16 .1, and 1GX18 .1 are from Chang et al . (13); and 1GX1 is fromHollis et al . (12) . The mouse Jr-C1, region sequences are from Milleret al . (52) and Selsing et al. (29), and the Xs sequence is from Kudoet al . (14) . (A) Evolutionary relationship of the 11 human and fivemouse X genes. Horizontal distances correspond to the degree ofdivergence between sequences as quantitated using the progressivealignment program described by Feng and Doolittle (22) . (B) Physicalorganization of the human X genes. The CX complex, X1-X7, mapsto 22gii ; ,OXI is on chromosome 18 (O.W McBride, personal com-munication) ; and X14.1, X16 .1, and OX18 .1 map positions have notbeen reported . The human V1, regions are to the left (centromeric) ofthe C complex (53) . (C) Physical organization of the mouse Vr andCa genes (54), all located on chromosome 16 (55), although X5 mapsoutside of the main Ig cluster (56) . The clusters are drawn roughlyto scale, and 10-kb scale bars are shown .

for a G substitution in Cλ7 at position 127. This identity continues 5' of the Cλ regions into the IVS (Fig. 3 C). From positions 175 to 301 (Fig. 4), however, Cλ1 and Cλ7 are identical at all but three nucleotide positions, and they share four substitutions that are absent from Cλ3. For the remainder of the coding region (nucleotides 302-360), Cλ1 and Cλ3 are identical and Cλ7 has four substitutions. This situation reverses again in the 3' untranslated region (Fig. 3 B), where Cλ1 and Cλ7 share eight substitutions that are absent from Cλ3. It is possible that there is differential representation of these exchanges at the level of alleles. For example, a similar exchange has occurred in certain alleles of the human Ig Cα1 and Cα2 genes (30). The results of Udey and Blumberg (31) also suggest that exchanges are present in some alleles of the Cλ complex and not in others.

A Long ORF Overlaps Cλ7.

In addition to the λ C region coding exons, several ORFs were found in the computer analysis of the Cλ complex sequence using the UWGCG Frames program. One strikingly long ORF (1,377 bp) is encoded in the region of Cλ7 on the strand opposite that encoding the λ genes (Fig. 1). The probability that a 458-codon ORF would occur in a random DNA sequence ($[61/64]^{458} = 2.8 \times 10^{-10}$) is very small. While there are several ATG codons within this ORF (Fig. 7), none match the initiator consensus (32) and, therefore, are unlikely to be capable of efficient translation initiation. However, there are several consensus splice acceptor sequences near the beginning of this ORF, as well as splice donor sequences near its end, suggesting that it could be spliced to other exons at either or both ends. The 2 kb we have sequenced upstream of this ORF contains no sequences resembling transcriptional promoters; moreover, there are no polyadenylation sites within 10 kb downstream of this region. Comparison of the deduced amino acid sequence of the long ORF to the sequence data banks reveals no similarities to published sequences. To determine whether this region is transcribed, we analyzed RNA purified from fetal liver, adult PBL, three Burkitt lymphoma cell lines, and a myeloma cell line in an RNase protection assay (data not shown). No protected fragments were detected using 40-µg samples of total cellular RNA from these tissues (the lower limit of detection is 0.1 pg of a 700-base fragment). Thus, we were unable to detect transcripts from this ORF in the tissues analyzed, but it may be transcribed in some other tissue, or in these tissues at a different developmental stage. While we cannot demonstrate that this ORF is part of an active gene, its discovery remains an intriguing observation.

The Search for a Potential Human λ Enhancer

Although cis-acting enhancer elements have been identified for the κ and IgH genes, such regions have not been described in the human λ locus. To initiate a search for the potential human λ enhancer, we isolated the active, rearranged λ gene (Fig. 8) from the U266 cell line (17). Using this clone, we attempted to demonstrate tissue-specific expression when the gene was transfected into a variety of cultured lymphoid cells. However, we found, as others have reported, that cloned λ genes direct very weak transcription in transfected and retrovirus-infected tissue culture cells (33, 34; Vasicek et al., manuscript in preparation). The inability to detect substantial transcriptional activity from the human λ gene in tissue culture systems suggests that either sequences necessary for its expression are simply not present on the large clone or that λ expression cannot be achieved using DNA transfection. In studies to be published elsewhere (Vasicek et al., manuscript in preparation), we have shown that this cloned, rearranged λ gene is expressed in a tissue-specific manner in transgenic mice, suggesting a more complex regulatory sequence


involving activation of the gene in B cell precursors (see Discussion).

BichThuy and Queen (35) have reported that a 4-kb DNA segment 3' of the mouse λ1 gene (see Fig. 6 C) can activate transcription of a reporter gene construct, the λ promoter driving the bacterial chloramphenicol acetyl transferase gene, in B cells expressing their endogenous λ genes. We have not found similar activity in analogous regions from the human λ locus, where we have tested fragments including as much as 2.1 kb 3' of λ7 (see Discussion).

In further experiments, we analyzed DNA fragments from


Figure 7. DNA sequence of the long ORF overlapping Cλ7. The reverse complement of the Cλ7 region is underlined and its ends are marked. Potential splice acceptor and donor sequences (27) are also underlined, and the first and last base of the exons that would result from splicing into these sequences are marked with vertical lines.
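The random-sequence probability quoted in the text for the long ORF can be checked with a short calculation. The sketch below assumes, as the expression (61/64)^458 implies, that each codon is drawn independently and uniformly from the 64 possible triplets, 3 of which are stop codons. The function name `prob_open_frame` is illustrative and does not come from the paper.

```python
def prob_open_frame(n_codons: int, n_stop: int = 3, n_total: int = 64) -> float:
    """Chance that n_codons consecutive random codons contain no stop codon.

    Assumption: codons are independent and all 64 triplets are equally likely.
    """
    if n_codons < 0:
        raise ValueError("n_codons must be non-negative")
    return ((n_total - n_stop) / n_total) ** n_codons


# 458 sense codons, as stated in the text for the 1,377-bp ORF.
p = prob_open_frame(458)
print(f"{p:.1e}")  # 2.8e-10
```

Genomic DNA is not compositionally uniform, so this value is best read as an order-of-magnitude argument rather than an exact probability.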

DNase I-treated U266 nuclei to investigate the chromatin structure in the vicinity of the active λ locus. DNase I-sensitive chromatin regions have been shown to correlate strongly with regions having cis-regulatory function (36, 37). The chromatin in the vicinity of DNase I hypersensitive sites, found in association with enhancer sequences in the κ locus and other genes, is apparently in an "open configuration" either to promote the binding of transcription factors or because such factors are bound. Several DNase I-hypersensitive sites were found in the vicinity of the functionally rearranged U266λ gene (data not shown). Because we could not demonstrate

Figure 8. Restriction map and sequencing strategy for the active λ gene from the U266 human myeloma cell line. V regions are shaded, and the J and C regions are black. The rearranged Vλ region, U266λ, is shown rearranged into the Jλ region of λ2, and a pseudogene, ψVλ1, is shown 3.7 kb 5' of the active gene. The restriction sites shown are EcoRI, E; XbaI, X; BamHI, B; and BglII, Bg. The short arrows indicate some of the individual sequencing gel runs, and the scale in kilobases is shown by the ruler on the bottom. These sequence data have been submitted to the EMBL/GenBank Data Libraries under the accession number X51754.


expression of λ genes containing these regions, we analyzed the DNA sequence of the λ promoter to detect sequence patterns that would indicate the presence of potential enhancer sequences. We have performed extensive computer searches for enhancer-like sequences in DNA surrounding the Cλ complex and the functionally rearranged U266λ gene. While, as expected in a sequence of this size, these searches have revealed numerous enhancer core-like and potential transcription factor binding sites, their relevance to expression, if any, is difficult to evaluate.

Mapping of the λ Transcription Start Site.

While the mechanism of λ gene activation is still unknown, some features of the λ promoter are similar to analogous regions in the κ and IgH loci. The λ promoter contains the conserved octamer sequence, ATTTGCAT, and a TATAA box (Fig. 9), often GATAA in λ genes (38, 39). To map the exact transcription start site in the U266λ gene, we annealed synthetic oligonucleotide primers complementary to the leader coding region of the λ message with total cellular RNA from U266 cells and used AMV reverse transcriptase to synthesize cDNA molecules corresponding to the 5' end of the λ mRNA. Fig.


10 shows a DNA sequencing gel with adjacent lanes containing the cDNA and a DNA sequencing ladder. These data indicate that the U266λ gene transcript begins at a G nucleotide 29 bp downstream of the TATAA box and 41 bp upstream of the translation initiation codon (see Fig. 9).

A Vλ Pseudogene 3.7 kb Upstream of the Active Gene

While searching for enhancer sequences in the λ promoter region, we discovered a λ pseudogene, ψVλ1, 3.7 kb upstream of U266λ (Fig. 8). The DNA sequences of ψVλ1 and U266λ are compared in Fig. 9. ψVλ1 has most of the sequence characteristics of active genes: upstream octamer and TATAA (GATAA) sequences, conserved mRNA splice donor and acceptor sequences, and appropriately spaced heptamer and nonamer sequences for Vλ-Jλ rearrangement. However, it also has a 10-bp deletion, from position 135 to 144 in the Vλ-coding region, that would render it inactive. Interestingly, ψVλ1 and U266λ are members of different Vλ region families. The U266λ protein is 94% similar to the Mcg Vλ region, a subgroup II protein, and the predicted amino acid sequence of the hypothetical ψVλ1 protein is 83% similar to the DEL Vλ region of subgroup III (40). These two

Figure 9. Comparison of the U266λ and ψVλ1 V region sequences. The entire sequence of U266λ is shown but only the differences in ψVλ1 are shown. The deduced amino acid sequence, numbered as in Kabat et al. (40), of the active gene is indicated below the DNA sequences. The octamer sequence 67 bp upstream of the transcription start site, and the "TATAA" box (GATAA in λ genes) 28 bp upstream, are doubly underlined. The transcription start is indicated with an arrow 41 bp upstream of the translation initiation codon. The heptamer and nonamer Vλ-Jλ rearrangement sequences are underlined in the pseudogene sequence and the Jλ2 sequences, into which the active gene is rearranged, are shown.
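The promoter geometry described in this caption (octamer 67 bp and GATAA box 28 bp upstream of the transcription start, which itself lies 41 bp upstream of the ATG) can be illustrated with a simple motif search. The sketch below runs on a synthetic toy sequence, not on the U266 gene sequence, and assumes that distances are counted from the first base of each motif to the start site; the paper does not state its counting convention. All function names are hypothetical.

```python
OCTAMER = "ATTTGCAT"   # conserved octamer named in the text
TATA_LIKE = "GATAA"    # TATAA-box variant named in the text
START_CODON = "ATG"


def build_toy_promoter(tss: int = 100, length: int = 150):
    """Build a synthetic sequence with motifs at the caption's spacings."""
    seq = ["C"] * length  # neutral filler so motifs occur only where placed

    def place(start: int, motif: str) -> None:
        seq[start:start + len(motif)] = motif

    place(tss - 67, OCTAMER)
    place(tss - 28, TATA_LIKE)
    place(tss, "G")              # transcript begins at a G nucleotide
    place(tss + 41, START_CODON)
    return "".join(seq), tss


def upstream_distance(seq: str, motif: str, tss: int):
    """bp from the first base of the nearest upstream motif to the start site."""
    pos = seq.rfind(motif, 0, tss)
    return None if pos == -1 else tss - pos


def downstream_distance(seq: str, motif: str, tss: int):
    """bp from the start site to the first base of the next downstream motif."""
    pos = seq.find(motif, tss)
    return None if pos == -1 else pos - tss


toy, tss = build_toy_promoter()
print(upstream_distance(toy, OCTAMER, tss),
      upstream_distance(toy, TATA_LIKE, tss),
      downstream_distance(toy, START_CODON, tss))
```

On a real promoter the same search would need to tolerate sequence variants of each motif, which is why the box is treated here as a configurable string rather than a fixed consensus.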


genes are adjacent and yet they share a nucleotide similarity of only 66%.
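Two quantitative points in this passage lend themselves to a brief illustration. A deletion spanning positions 135 to 144 removes 10 bp, which is not a multiple of three and would therefore be expected to shift the reading frame, and the similarity figures are pairwise comparisons of aligned sequences. The sketch below assumes a simple identity measure over gap-free aligned columns; the paper does not say how its percentages were computed, and the strings used are toy examples.

```python
def deletion_length(first: int, last: int) -> int:
    """Number of bases removed when positions first..last (inclusive) are deleted."""
    return last - first + 1


def shifts_reading_frame(n_deleted: int) -> bool:
    """A deletion preserves the frame only if its length is a multiple of 3."""
    return n_deleted % 3 != 0


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical positions between two pre-aligned sequences.

    Assumption: columns with a gap in either sequence are excluded.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


n = deletion_length(135, 144)
print(n, shifts_reading_frame(n))        # 10 True
print(percent_identity("ACGT", "ACGA"))  # 75.0
```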

Discussion

Figure 10. Primer extension experiment showing the transcription start in the active U266 λ gene. The primer used was a 30-base oligonucleotide complementary to the 3' end of the leader exon. The same primer was used to produce the sequence ladder size marker. The G at position +1 is indicated by an arrow.

The complexity of the λ proteins is reflected in the complexity of the λ gene family. While the λ genes share a number of structural similarities with the IgH and κ genes, they exhibit differences as well. The human λ locus consists of many Vλ regions, as do the κ and IgH loci, but, unlike the other Ig loci, it encodes a cluster of Jλ-Cλ pairs. Although the mouse λ locus has only two functional Vλ regions, it also contains three functional Jλ-Cλ pairs. This conservation of multiple Jλ-Cλ pairs over evolutionary time, also seen in the shark (41), suggests that it has an important role in the immune response. Obviously, these multiple genes contribute to L chain diversity, a function that is achieved with more economy by multiple J regions in the κ and IgH loci in the absence of duplicated C regions.

The structure of the Cλ complex allows us to assign the four λ isotypes to genes within the main λ complex. This complex contains a tandem array of seven Jλ-Cλ gene pairs that we call λ1-λ7, 5' to 3'. In accordance with previous studies (8), we find that the first three genes are functional and code for the most common isotypic forms of the Cλ region, Mcg, Ke-Oz-, and Ke-Oz+, respectively. We also find a new λ gene, λ7, that appears to be functional and is capable of producing a protein similar to the Ke+Oz- chain that is present in all human sera (4). We have found that λ4, λ5, and λ6 are pseudogenes, in contrast to the report by Dariavach et al. (9), in which they describe λ6 as the Ke+Oz- gene. It is possible, though unlikely, that this conflict results from a polymorphism. The sequence of Cλ6 presented by Dariavach et al. (9) resembles a combination of our sequences for Cλ6 and Cλ7. Such a hybrid gene may have resulted from recombination due to unequal crossing over between λ6 and λ7, deleting the intervening 3 kb. However, we have found three alleles of λ6 and λ7 to contain sequences identical to those shown in Figs. 3 and 4 (data not shown). Furthermore, Taub et al. (42) examined >100 individuals by Southern blot without detecting any alleles containing such a 3-kb deletion. Thus, the result of Dariavach et al. (9) may be due to cloning artifacts or sequencing errors, or they may have cloned a rare, deleted allele.


In addition to the seven λ genes in the Cλ complex, several unlinked DNA fragments crosshybridize with Cλ probes (8) (Fig. 6, A and B). Two of these fragments contain pseudogenes; one, ψλ1, is a processed pseudogene (12) located on chromosome 18 (O.W. McBride, personal communication) and the other, which has not been mapped to a specific chromosomal location, contains a defective Cλ region called ψλ18.1 (13). Two other dispersed λ sequences, called λ14.1 and λ16.1, may be functional genes; they are closely related to the λ genes in the main λ complex and share >86% nucleotide sequence similarity with the active Ig λ genes (13). Hollis et al. (43) showed that the λ14.1 gene, and perhaps the λ16.1 gene, are transcribed in pre-B cells. The structure of the λ14.1/λ16.1 genes, and their expression pattern, led Hollis et al. (43) to suggest that they are human analogues of the mouse λ5 gene (Fig. 6 C). L chain-like proteins that associate with H chains in human (44) and mouse (45) pre-B cells may be encoded by the λ14.1 and λ5 genes, respectively.

The λ gene family appears increasingly complex; for, in addition to the seven λ genes in the Cλ complex we have described, three polymorphic forms of the λ locus may encode distinct λ proteins (42). These polymorphic alleles apparently result from duplication of the region containing λ2 and λ3 and concurrently increase the size of the 8.4-kb EcoRI fragment on which they are normally found. Alleles carrying these polymorphic forms appear to have one, two, or three additional λ genes that have not yet been characterized in detail, though they are likely to be closely related to λ2 and λ3, as they share identical restriction maps. This amplification of the λ genes appears to be due to duplication of a 5.4-kb region containing λ2 and λ3 (Fig. 1). λ2 and λ3 are themselves located within two repeated sequences that share >95% sequence identity for 3.2 kb containing Jλ-Cλ2 and Jλ-Cλ3 (Fig. 1, shaded boxes); an additional 1.0 kb 5' of these regions shares >85% identity. Since analogous positions on these repeats are 5.4 kb apart, mispairing between two chromosomes 22 and unequal crossing over or sister chromatid exchange could result in additional copies of the repeat unit. The resulting duplication would yield a 5.4-kb increase in the distance between the two EcoRI sites flanking λ2 and λ3 at the expense of the sister chromatid. Both unequal crossing over, and precise excision of the direct repeats, could also result in the loss of λ gene copies, but λ loci with fewer than two genes in this region have not been found; this suggests that there may be specific selection against such alleles in the population. In any case, individuals in the population could have four to seven potentially active λ genes per haploid genome.

While λ2 and λ3 are the most similar of the λ genes, and the similarity between them is the most extensive, the other five λ genes also share a very high degree of DNA sequence similarity extending beyond the coding regions (Fig. 2). The similarity among the genes suggests a role for recent gene duplication events and/or exchange of genetic information by gene conversion. The lack of Alu sequences within the locus is consistent with this interpretation.

While the λ locus has a number of gross structural differences from the κ and IgH loci, it also differs in the more subtle structure of its transcriptional control elements. The IgH gene has a powerful, position- and orientation-independent, tissue-specific enhancer element in the J-C intron (46). The κ locus has a similar intron enhancer (47), although its activity appears to be not as strong.
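The variable-amplification model discussed earlier in this section reduces to simple arithmetic: each extra copy of the 5.4-kb repeat unit lengthens the 8.4-kb EcoRI fragment by 5.4 kb. The sketch below additionally assumes four potentially active genes in the unamplified complex and one further gene per extra repeat. That reading is consistent with the one to three additional genes and the four-to-seven range given in the text, but it is an interpretation rather than a rule the authors state, and the names used are illustrative.

```python
BASE_ECORI_FRAGMENT_KB = 8.4   # unamplified EcoRI fragment (from the text)
REPEAT_UNIT_KB = 5.4           # size of the duplicated unit (from the text)
BASE_ACTIVE_GENES = 4          # assumption: active genes without amplification


def ecori_fragment_kb(extra_repeats: int) -> float:
    """Predicted EcoRI fragment size after a number of extra repeat units."""
    if extra_repeats < 0:
        raise ValueError("extra_repeats must be non-negative")
    return round(BASE_ECORI_FRAGMENT_KB + extra_repeats * REPEAT_UNIT_KB, 1)


def potentially_active_genes(extra_repeats: int) -> int:
    """Assumed gene count per haploid genome: one added gene per extra repeat."""
    if extra_repeats < 0:
        raise ValueError("extra_repeats must be non-negative")
    return BASE_ACTIVE_GENES + extra_repeats


for n in range(4):
    print(n, ecori_fragment_kb(n), potentially_active_genes(n))
```

Predicted fragment sizes of this kind are what a Southern blot survey of the locus would be compared against.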

To develop a functional assay for the mechanisms of transcriptional control of the λ genes, we cloned the active λ gene (rearranged into λ2) from the U266 human myeloma cell line. We and others (33, 34, 48) have found that cloned λ genes are only weakly transcribed in transfected tissue culture cells, even in cells that are actively expressing endogenous λ genes. Furthermore, Neuberger et al. (49) and Hagman et al. (50) found that expression of certain mouse λ clones in transgenic mice required attachment of IgH chain enhancer sequences. However, we have found that the 8.5-kb U266λ clone, with 5.2 kb of 5' flanking sequences, is specifically transcribed at high levels in the lymphoid tissues of three transgenic mouse lines, without addition of heterologous enhancer sequences (Vasicek et al., manuscript in preparation). This result suggests that sequences required for tissue-specific activation of the human λ gene in transgenic mice are located within the 8.5-kb U266λ clone. This does not rule out the possibility that additional enhancer elements might be located further outside of the structural genes, but it suggests that, to become activated, the gene must be present in the cell during some stage of development. Such conditions are obviously difficult, if not impossible, to recapitulate in tissue culture cells.

BichThuy and Queen (35) recently found that several nonoverlapping restriction fragments up to 4 kb downstream of the mouse λ1 gene (the most 3' λ gene; see Fig. 6 C) are capable of activating the λ promoter and other promoters. Furthermore, Meyer and Neuberger (51) recently demonstrated the presence of a second, more powerful enhancer 9 kb downstream of the mouse κ C region. Since V region rearrangement into one of the λ genes deletes the upstream sequences, one might predict a downstream position for the λ enhancer. We have not detected enhancer activity in sequences as much as 2.1 kb downstream of λ7 (data not shown). We cannot rule out the possibility that the human λ locus contains additional enhancer sequences further 3' of the Cλ complex, but it seems that the 8.5-kb U266λ clone contains the necessary sequences for high level, tissue-specific expression in vivo. Since these sequences are only active in transgenic mice, however, further characterization will require utilization of this in vivo model.

We thank Peter Gentile for oligonucleotide synthesis, and Cynthia Morton and Jacob Sarid for the 49a-4 cosmid. We are also grateful to Judy Swain and Robert Replogle for their early efforts in the project.

A portion of this work was supported by grants from E.I. Dupont, Inc., the Markey Foundation, and the American Business Foundation for Cancer Research.

Address correspondence to Philip Leder, Department of Genetics, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115.

Received for publication 22 February 1990 and in revised form 16 May 1990.

References

1. Max, E.E. 1989. Immunoglobulins: molecular genetics. In Fundamental Immunology, 2nd ed. W.E. Paul, editor. Raven Press Ltd., New York. 235-290.

2. Hood, L., W.R. Gray, B.G. Sanders, and W.J. Dreyer. 1967. Light chain evolution: antibodies. Cold Spring Harbor Symp. Quant. Biol. 32:133.

3. Ein, D. 1968. Nonallelic behavior of the Oz groups in human λ immunoglobulin chains. Proc. Natl. Acad. Sci. USA. 60:982.

4. Hess, M., N. Hilschmann, L. Rivat, C. Rivat, and C. Ropartz. 1971. Isotypes in human immunoglobulin λ chains. Nature (Lond.). 234:58.

5. Fett, J.W., and H.F. Deutsch. 1975. A new λ chain gene. Immunochemistry. 12:643.

6. Hilschmann, N., and L.C. Craig. 1965. Amino acid sequence studies with Bence-Jones proteins. Proc. Natl. Acad. Sci. USA. 53:1403.

7. Frangione, B., T. Moloshok, F. Prelli, and A. Solomon. 1985. Human λ light chain constant region gene CMorλ: the primary structure of λ VI Bence Jones protein Mor. Proc. Natl. Acad. Sci. USA. 82:3415.

8. Hieter, P.A., G.F. Hollis, S.J. Korsmeyer, T.A. Waldmann, and P. Leder. 1981. Clustered arrangement of immunoglobulin λ constant region genes in man. Nature (Lond.). 294:536.

9. Dariavach, P., G. Lefranc, and M.-P. Lefranc. 1987. Human immunoglobulin Cλ6 gene encodes the Kern+Oz- λ chain, and Cλ4 and Cλ5 are pseudogenes. Proc. Natl. Acad. Sci. USA. 84:9074.

10. Erickson, J., J. Martinis, and C.M. Croce. 1981. Assignment of the genes for human λ immunoglobulin chains to chromosome 22. Nature (Lond.). 294:173.

11. McBride, O.W., P.A. Hieter, G.F. Hollis, D. Swan, M.C. Otey, and P. Leder. 1982. Chromosomal location of human κ and λ immunoglobulin light chain constant region genes. J. Exp. Med. 155:1480.

12. Hollis, G.F., P.A. Hieter, O.W. McBride, D. Swan, and P. Leder. 1982. Processed genes: a dispersed human immunoglobulin gene bearing evidence of RNA-type processing. Nature (Lond.). 296:321.

13. Chang, H., E. Dmitrovsky, P.A. Hieter, K. Mitchell, P. Leder, L. Turoczi, I.R. Kirsch, and G.F. Hollis. 1986. Identification of three new Ig λ-like genes in man. J. Exp. Med. 163:425.

14. Kudo, A., N. Sakaguchi, and F. Melchers. 1987. Organization of the murine Ig-related λ5 gene transcribed selectively in pre-B lymphocytes. EMBO (Eur. Mol. Biol. Organ.) J. 6:103.

15. Maniatis, T., E.F. Fritsch, and J. Sambrook. 1982. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York. 545 pp.

16. Ish-Horowicz, D., and J.K. Burke. 1981. Rapid and efficient cosmid vector cloning. Nucleic Acids Res. 9:2989.

17. Nilsson, K., H. Bennich, S.G.O. Johansson, and J. Ponten. 1970. Established immunoglobulin producing myeloma (IgE) and lymphoblastoid (IgG) cell lines from an IgE myeloma patient. Clin. Exp. Immunol. 7:477.

18. Sanger, F., S. Nicklen, and A.R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA. 74:5463.

19. Tabor, S., and C.C. Richardson. 1987. DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. USA. 84:4767.

20. Sheen, J., and B. Seed. 1988. Electrolyte gradient gels for DNA sequencing. Biotechniques. 6:942.

21. Devereux, J., P. Haeberli, and O. Smithies. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387.

22. Feng, D.-F., and R.F. Doolittle. 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25:351.

23. Sharp, P.A. 1983. Conversion of RNA to DNA in mammals: Alu-like elements and pseudogenes. Nature (Lond.). 301:471.

24. Hess, J.F., M. Fox, C. Schmidt, and C.-K.J. Shen. 1983. Molecular evolution of the human adult α-globin-like gene region: insertion and deletion of Alu family repeats and non-Alu DNA sequences. Proc. Natl. Acad. Sci. USA. 80:5970.

25. Akira, S., K. Okazaki, and H. Sakano. 1987. Two pairs of recombination signals are sufficient to cause immunoglobulin V(D)-J joining. Science (Wash. DC). 238:1134.

26. Hesse, J.E., M.R. Lieber, K. Mizuuchi, and M. Gellert. 1989. V(D)J recombination: a functional definition of the joining signals. Genes & Dev. 3:1053.

27. Shapiro, M.B., and P. Senapathy. 1987. RNA splice junctions of different classes of eukaryotes: sequence statistics and functional implications in gene expression. Nucleic Acids Res. 15:7155.

28. Ponstingl, V.H., M. Hess, and N. Hilschmann. 1968. Die vollständige Aminosäure-Sequenz des Bence-Jones-Proteins Kern. Eine neue Untergruppe der Immunglobulin-L-Ketten vom λ-Typ. Hoppe-Seyler's Z. Physiol. Chem. 349:867.

29. Selsing, E., J. Miller, R. Wilson, and U. Storb. 1982. Evolution of mouse immunoglobulin λ genes. Proc. Natl. Acad. Sci. USA. 79:4681.

30. Flanagan, J.G., M.-P. Lefranc, and T.H. Rabbitts. 1984. Mechanisms of divergence and convergence of the human immunoglobulin α1 and α2 constant region gene sequences. Cell. 36:681.

31. Udey, J.A., and B.B. Blumberg. 1988. Intergenic exchange maintains identity between two human λ light chain immunoglobulin gene intron sequences. Nucleic Acids Res. 16:2959.

32. Kozak, M. 1986. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell. 44:283.

33. Cone, R.D., E.B. Reilly, H.N. Eisen, and R.C. Mulligan. 1987. Tissue-specific expression of functionally rearranged λ1 Ig gene through a retrovirus vector. Science (Wash. DC). 236:954.

34. Picard, D., and W. Schaffner. 1983. Correct transcription of a cloned mouse immunoglobulin gene in vivo. Proc. Natl. Acad. Sci. USA. 80:417.

35. BichThuy, L.t., and C. Queen. 1989. An enhancer associated with the mouse immunoglobulin λ1 gene is specific for λ light chain producing cells. Nucleic Acids Res. 17:5307.

36. Stalder, J., A. Larsen, J.D. Engel, M. Dolan, M. Groudine, and H. Weintraub. 1980. Tissue-specific DNA cleavages in the globin chromatin domain introduced by DNAase I. Cell. 20:451.

37. Parslow, T.G., and D.K. Granner. 1983. Structure of a nuclease-sensitive region inside the immunoglobulin kappa gene: evidence for a role in gene regulation. Nucleic Acids Res. 11:4775.

38. Falkner, F.G., and H.G. Zachau. 1984. Correct transcription of an immunoglobulin κ gene requires an upstream fragment containing conserved sequence elements. Nature (Lond.). 310:71.

39. Parslow, T.G., D.L. Blair, W.J. Murphy, and D.K. Granner. 1984. Structure of the 5' ends of immunoglobulin genes: a novel conserved sequence. Proc. Natl. Acad. Sci. USA. 81:2650.

40. Kabat, E.A., T.T. Wu, M. Reid-Miller, H.M. Perry, and K.S. Gottesman. 1987. Sequences of proteins of immunological interest. U.S. Department of Health and Human Services. U.S. Govt. Printing Office No. 165-462. 804 pp.

41. Schluter, S.F., V.S. Hohman, A.B. Edmundson, and J.J. Marchalonis. 1989. Evolution of immunoglobulin light chains: cDNA clones specifying sandbar shark constant regions. Proc. Natl. Acad. Sci. USA. 86:9661.

42. Taub, R.A., G.F. Hollis, P.A. Hieter, S.J. Korsmeyer, T.A. Waldmann, and P. Leder. 1983. Variable amplification of immunoglobulin λ light-chain genes in human populations. Nature (Lond.). 304:172.

43. Hollis, G.F., R.J. Evans, J.M. Stafford-Hollis, S.J. Korsmeyer, and J.P. McKearn. 1989. Immunoglobulin λ light-chain-related genes 14.1 and 16.1 are expressed in pre-B cells and may encode the human immunoglobulin ω light-chain protein. Proc. Natl. Acad. Sci. USA. 86:5552.

44. Kerr, W.G., M.D. Cooper, L. Feng, P.D. Burrows, and L.M. Hendershot. 1989. Mu heavy chains can associate with a pseudo-light chain complex (ψL) in human pre-B cell lines. Int. Immunol. 1:355.

45. Pillai, S., and D. Baltimore. 1987. Formation of disulphide-linked μ2ω2 tetramers in pre-B cells by the 18K ω-immunoglobulin light chain. Nature (Lond.). 329:172.

46. Banerji, J., L. Olson, and W. Schaffner. 1983. A lymphocyte-specific cellular enhancer is located downstream of the joining region in immunoglobulin heavy chain genes. Cell. 33:729.

47. Picard, D., and W. Schaffner. 1984. A lymphocyte-specific enhancer in the mouse immunoglobulin κ gene. Nature (Lond.). 307:80.

48. Picard, D., and W. Schaffner. 1985. Cell-type preference of immunoglobulin κ and λ gene promoters. EMBO (Eur. Mol. Biol. Organ.) J. 4:2831.

49. Neuberger, M.S., H.M. Caskey, S. Pettersson, H.T. Williams, and M.A. Surani. 1989. Isotype exclusion and transgene down-regulation in immunoglobulin-λ transgenic mice. Nature (Lond.). 338:350.

50. Hagman, J., D. Lo, L.T. Doglio, J. Hackett, Jr., C.M. Rudin, D. Haasch, R. Brinster, and U. Storb. 1989. Inhibition of immunoglobulin gene rearrangement by the expression of a λ transgene. J. Exp. Med. 169:1911.

51. Meyer, K.B., and M.S. Neuberger. 1989. The immunoglobulin κ locus contains a second, stronger B-cell-specific enhancer which is located downstream of the constant region. EMBO (Eur. Mol. Biol. Organ.) J. 8:1959.

52. Miller, J., E. Selsing, and U. Storb. 1982. Structural alterations in J regions of mouse immunoglobulin lambda chains are associated with differential gene expression. Nature (Lond.). 295:428.

53. Emanuel, B.S., P.C. Nowell, C. McKeon, C.M. Croce, and M.A. Israel. 1986. Translocation breakpoint mapping: molecular and cytogenetic studies of chromosomes 22. Cancer Genet. Cytogenet. 19:81.

54. Storb, U., D. Haasch, B. Arp, P. Sanchez, P.-A. Cazenave, and J. Miller. 1989. Physical linkage of mouse λ genes by pulsed-field gel electrophoresis suggests that the rearrangement process favors proximate target sequences. Mol. Cell. Biol. 9:711.

55. D'Eustachio, P., A.L.M. Bothwell, T.K. Takaro, D. Baltimore, and F.H. Ruddle. 1981. Chromosomal location of the structural genes encoding murine immunoglobulin λ light chains. J. Exp. Med. 153:793.

56. Kudo, A., and F. Melchers. 1987. A second gene, VpreB in the λ5 locus of the mouse, which appears to be selectively expressed in pre-B lymphocytes. EMBO (Eur. Mol. Biol. Organ.) J. 6:2267.
