Proc. Natl. Acad. Sci. USA
Vol. 77, No. 1, pp. 541-545, January 1980
Genetics

Isolation and sequence of the gene for iso-2-cytochrome c in Saccharomyces cerevisiae

(yeast cytochrome c genes/sequence comparison/silent nucleotide substitutions/molecular cloning)

DONNA L. MONTGOMERY*†, DAVID W. LEUNG‡, MICHAEL SMITH‡, PETER SHALIT*, GERARD FAYE*, AND BENJAMIN D. HALL*

*Department of Genetics, SK-50, University of Washington, Seattle, Washington 98195; and ‡Department of Biochemistry, University of British Columbia, Vancouver, British Columbia, Canada V6T 1W5

Communicated by Emanuel Margoliash, October 29, 1979

† Present address: Department of Biochemistry, The University of Texas Health Science Center, San Antonio, TX 78284.

Abbreviations: kb, kilobase(s); G3PDH, glyceraldehyde-3-phosphate dehydrogenase.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

ABSTRACT The two apocytochrome c proteins of yeast are coded for by separate genes. Iso-2-cytochrome c differs from the iso-1 protein at 17 positions within a homologous sequence of 108 amino acids. The previously cloned iso-1-cytochrome c coding sequence has been used to identify λ-yeast recombinant phage containing the gene for iso-2-cytochrome c. The latter protein contains the dipeptide Ala-Ala, which is coded for by the nucleic acid sequence G-C-N-G-C-N. The recognition specificity of restriction endonuclease Fnu4HI for G-C-N-G-C provided a rapid means of locating the region of the cloned fragment which codes for iso-2-cytochrome c. The DNA sequence of this gene has been determined and compared with that of the iso-1-cytochrome c locus. There is no intervening sequence within the gene for iso-2-cytochrome c. At 45 of the 91 positions for which iso-1- and iso-2-cytochrome c have the same amino acid, the codons differ. Such third position variation does not occur within the region coding for amino acids 70-80, the protein sequence that is also most conserved among all eukaryotic cytochromes c.

Comparisons of the amino acid sequences of the cytochrome c apoproteins of different eukaryotic organisms have provided valuable information about evolutionary relationships between organisms as well as insight into the structural basis of cytochrome c function (1, 2). One conclusion of these structural studies, that much of the cytochrome c sequence is highly conserved during evolution, has provided us with a means for isolating the genes that code for cytochrome c and analyzing their structure.

In previous work we cloned the gene for yeast iso-1-cytochrome c by screening a pool of λ-yeast recombinants with a molecular hybridization probe constructed in accordance with genetic data (3). The complete DNA sequence is now known for the iso-1-cytochrome c coding locus (CYC1) (4). The sequence homology between the yeast iso-1- and iso-2-cytochrome c genes is expected to be at least 62% and could be as high as 94%, if the same codons are used at positions where both proteins contain the same amino acid. A degree of homology near the latter value would be sufficient for the formation of stable heteroduplexes between these two genes. We have observed such homology and have used the DNA of the isolated CYC1 gene as a molecular hybridization probe to identify and isolate recombinant DNA clones containing the coding sequence for yeast iso-2-cytochrome c.

The strategy we have used to clone the iso-2-cytochrome c gene may have general applicability for isolation of cytochrome c genes. The iso-1- and iso-2-cytochrome c proteins differ in 17 of 108 amino acid residues (5), whereas cytochrome c molecules from other fungi and from invertebrates differ from yeast iso-1-cytochrome c in 26-45 amino acid residues. Despite this greater degree of sequence divergence, the strong evolutionary conservation of limited parts of the cytochrome c gene sequence should make possible cytochrome c gene cloning with interspecific DNA hybridization probes.

MATERIALS AND METHODS

Yeast Strains. The CYC1+ CYC7+ strain D311-3A, cyc1-1 CYC7+ strain D234-4D, and cyc1-9 CYC7+ strain B596 have been described (3).

DNA Methods. DNA preparation, labeling, cloning, restriction mapping, and sequence determination were carried out as described (3, 4). All experiments involving recombinant DNA were done in accordance with the official guidelines operative in the United States and Canada.

RESULTS

Physical Identification of the Gene for Iso-2-Cytochrome c. DNA of recombinant plasmid pYeCYC1(0.6), carrying the CYC1 coding sequence, hybridized to two bands in an EcoRI digest of DNA from a CYC1+ yeast strain (Fig. 1, lane b). The upper of these bands was shifted by a mutation in the EcoRI site within CYC1 and disappeared when cyc1-1 (CYC1 deletion) DNA was used for hybridization; the lower band was the same size, 2.2 kilobases (kb), in all three strains. These were the only hybridization bands observed with the pYeCYC1(0.6) probe. Because of the high degree of sequence homology between the iso-1- and iso-2-cytochrome c apoproteins (5), this second band is assumed to correspond to the iso-2-cytochrome c structural gene (CYP3, CYC7) (6, 7). According to this view, the sequence homology responsible for the band at 2.2 kb involves the 321 base pairs of CYC1 coding sequence in the pYeCYC1(0.6) probe and not the 276 base pairs of 3' flanking DNA sequences. This interpretation was confirmed by the demonstration that an EcoRI/Taq I fragment excised from pYeCYC1(0.6) also hybridized to the 2.2-kb EcoRI fragment in total yeast DNA (data not shown). In addition to the coding region, this EcoRI/Taq I probe contains only 20 base pairs of 3' distal DNA (4).

Cloning the Iso-2-Cytochrome c Gene. The gene for iso-2-cytochrome c was first cloned from strain D234-4D (8), which contains the cyc1-1 deletion mutation and hence lacks the CYC1 coding sequence (3). pYeCYC1(0.6) DNA was used to screen a pool of EcoRI fragments of D234-4D DNA cloned in λgt (9). A clone was obtained which contained a 2.2-kb EcoRI fragment, assumed to correspond to the lower band in Fig. 1.


FIG. 1. Hybridization of the iso-1-cytochrome c gene to high molecular weight yeast DNA. Southern filter transfers of EcoRI-digested DNA from yeast strains B596 (cyc1-9, CYC7+) (lane a), D311-3A (CYC1+, CYC7+) (lane b), and D234-4D (cyc1-1, CYC7+) (lane c) were hybridized to nick-translation-labeled DNA from pYeCYC1(0.6), an iso-1-cytochrome c gene clone. Bands A and B correspond to the EcoRI fragment containing CYC1 from B596 and D311-3A, respectively. Band C is present in all three strains at a lower intensity than bands A and B.

This cloning strategy was simpler than attempting to clone the iso-2-cytochrome c gene from a strain that also possessed the iso-1-cytochrome c gene because it eliminates the possibility of obtaining primarily iso-1-cytochrome c clones. The gene was subsequently transferred to pBR322 as a 1.55-kb EcoRI/Pst I fragment. This clone, pYeCYC7(1.5)b, was used for restriction enzyme mapping and for locating the CYC7 gene on the restriction map.

For the most informative comparison of iso-1- and iso-2-cytochrome c genes, it was thought desirable to clone both genes from the same wild-type strain and determine their sequences. Therefore, the iso-2-cytochrome c gene was recloned from D311-3A, the same yeast strain previously used for cloning and sequencing the CYC1 gene (3, 4). It was cloned first as a 3.5-kb HindIII/HindIII fragment in the vector λ 590 (10) by using pYeCYC1(0.6) DNA as probe for screening λ-yeast recombinant clones. The gene was subsequently transferred to pBR322 as a 1.55-kb EcoRI/Pst I fragment. This hybrid plasmid, pYeCYC7(1.5)a, was the source of DNA for restriction enzyme mapping and for sequencing.

Restriction Map of the 1.55-kb Fragment. The mapping of restriction endonuclease cleavage sites was done by the method of Smith and Birnstiel (11), with 5'-end labeling of the EcoRI site. Cleavage sites were found for Ava II, Alu I, Bgl II, Fnu4HI, Hae III, Hha I, HinfI, Hpa II, Hph I, Kpn I, Mbo II, Mnl I, and Sau3A (FnuEI). A partial map of the fragment is shown in Fig. 2.
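The logic of end-label mapping can be illustrated in silico. In a partial digest of a fragment labeled at one end, each cleavage site gives one labeled product whose length is approximately the distance of that site from the labeled end. The sketch below is only an illustration under stated assumptions: the test fragment is invented, the function names are hypothetical, the recognition sequences used (GGTACC for Kpn I, AGATCT for Bgl II) are the standard ones, and the exact cut position within each site is ignored.

```python
def site_positions(seq, recognition):
    """Return 0-based start positions of every occurrence of a recognition sequence."""
    seq, recognition = seq.upper(), recognition.upper()
    positions = []
    start = seq.find(recognition)
    while start != -1:
        positions.append(start)
        start = seq.find(recognition, start + 1)
    return positions


def labeled_partial_products(seq, recognition):
    """Approximate lengths of end-labeled fragments from a partial digest.

    The label is assumed to sit at position 0. Each site contributes one
    labeled product; the uncut fragment (full length) is listed last.
    """
    return site_positions(seq, recognition) + [len(seq)]


# Invented 51-base test fragment (NOT the 1.55-kb yeast fragment).
toy_fragment = ("GAATTC" + "A" * 10 + "GGTACC" + "T" * 8 +
                "AGATCT" + "A" * 5 + "GGTACC" + "C" * 4)

kpn_products = labeled_partial_products(toy_fragment, "GGTACC")
bgl_products = labeled_partial_products(toy_fragment, "AGATCT")
print("Kpn I labeled products:", kpn_products)
print("Bgl II labeled products:", bgl_products)
```

Sorting the labeled product lengths for all enzymes on one scale gives the order of sites along the fragment, which is the information summarized in a restriction map.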

Locating the CYC7 gene on the fragment was facilitated by the observation that the CYC7 gene had to contain a Fnu4HI site. Fnu4HI recognizes the sequence G-C-N-G-C (13), a sequence that is present, independent of third position choice, when two alanines occur next to each other in the protein sequence. In the CYC7 protein, this sequence is found at amino acids 101 and 102 (Fig. 3). Therefore, we knew the gene had to span at least one of the Fnu4HI sites.

The exact position was determined by cutting pYeCYC7(1.5)b with both Kpn I and Pst I and hybridizing to the 600-base-pair fragment containing the CYC1 gene from pYeCYC1(0.6). Both of the resulting fragments hybridized to the CYC1 gene, thus indicating that the CYC7 gene spanned the Kpn I site as well.

A Kpn I site exists in the coding region of the CYC1 gene in an area of amino acid homology with iso-2-cytochrome c. This corresponding site in the iso-2-cytochrome c sequence is the same distance from the Ala-Ala (G-C-N-G-C-N) as the Kpn I site is from the Fnu4HI site on the restriction map. Therefore, we assigned the gene position as shown in Fig. 2.
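The argument that an Ala-Ala dipeptide always creates a G-C-N-G-C site follows from the fact that all four alanine codons have the form GCN. A minimal sketch of that check is given below; the function names and the short test sequence are illustrative assumptions, not part of the original analysis.

```python
from itertools import product

BASES = "ACGT"


def matches_gcngc(site):
    """True if a 5-base string fits the pattern G-C-N-G-C (N = any base)."""
    return len(site) == 5 and site[0:2] == "GC" and site[3:5] == "GC"


def ala_ala_dna_variants():
    """All 16 DNA sequences that encode Ala-Ala (GCN followed by GCN)."""
    return ["GC" + n1 + "GC" + n2 for n1, n2 in product(BASES, repeat=2)]


def find_gcngc(seq):
    """0-based start positions of every G-C-N-G-C match in a sequence."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 4) if matches_gcngc(seq[i:i + 5])]


variants = ala_ala_dna_variants()
# Every Ala-Ala coding sequence carries the site in its first five bases,
# whatever the third-position choice in either codon.
every_variant_has_site = all(matches_gcngc(v[:5]) for v in variants)
print(len(variants), every_variant_has_site)
```

The converse does not hold: G-C-N-G-C can also arise in other reading frames or outside coding sequence, which is why the text concludes only that the gene must span at least one of the mapped sites.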

Determination of the DNA Sequence. The scheme used for determining the DNA sequence is shown in the lower half of Fig. 2. In obtaining the sequence, both the Maxam-Gilbert (12) and enzymatic terminator (14) methods were used with modifications as described (4).

All of the synthetic oligodeoxyribonucleotides used as primers for sequencing the CYC1 gene (4) were tried as primers on the CYC7 gene. Only one, pC3A3GA3, gave specific priming on the 1.55-kb fragment. This primer was used to sequence the region from the Kpn I site to 80 base pairs downstream from the 3' end of the coding region. The rest of the sequence was obtained by the Maxam-Gilbert method.

The sequence of the iso-2-cytochrome c gene is shown as the top line in Fig. 3. (The bottom line is the CYC1 sequence for comparison.) This determination is supported by extensive overlaps between the various experiments and by repetitions of those experiments in which little overlap was obtained. In addition, the known amino acid sequence of iso-2-cytochrome c provides an independent check for >50% of the sequence.
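A two-line comparison of aligned coding sequences, such as the one in Fig. 3, can be summarized by counting differences at each codon position and noting which differing codons still encode the same amino acid. The sketch below shows one way to do this with the standard genetic code. The two short sequences are invented for illustration and are not taken from the CYC1 or CYC7 genes; the function name and the output keys are assumptions.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def tally_differences(seq_a, seq_b):
    """Count base differences by codon position for two aligned coding sequences.

    Also counts codons that differ in sequence but encode the same amino acid.
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal length, and in frame")
    counts = {"first": 0, "second": 0, "third": 0, "silent_codons": 0}
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        for pos, name in enumerate(("first", "second", "third")):
            if codon_a[pos] != codon_b[pos]:
                counts[name] += 1
        if codon_a != codon_b and CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            counts["silent_codons"] += 1
    return counts


# Invented example: Met-Ala-Lys-Glu-Leu versus Met-Ala-Lys-Asp-Leu.
toy_a = "ATGGCTAAAGAATTG"
toy_b = "ATGGCCAAGGATCTG"
toy_counts = tally_differences(toy_a, toy_b)
print(toy_counts)
```

Note that this sketch counts silent codons rather than silent base substitutions; for codon pairs that differ at a single position the two measures coincide.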

FIG. 2. Restriction map and sequence strategy. The upper diagram shows part of the restriction map of the 1.55-kb EcoRI/Pst I fragment from pYeCYC7(1.5)b, obtained after 5'-end labeling the EcoRI end. The lower diagram shows the sequencing experiments used on DNA from pYeCYC7(1.5)a to compile the data for iso-2-cytochrome c given in Fig. 3. Experiments A, B, C, D, and E were carried out by the method of Maxam and Gilbert (12) after 5'-end labeling at the restriction sites indicated. Experiment F used the terminator method (14) with priming with the synthetic oligodeoxyribonucleotide pC3A3GA3 (4).

DISCUSSION

Sequence of the Coding Region. Between the ATG initiation codon and the TAG terminator, the DNA sequence contains an uninterrupted sequence of 112 triplet codons corresponding to the predicted coding sequence for the iso-2-cytochrome c apoprotein (5). In the central section of Fig. 3, the iso-1- and iso-2-cytochrome c coding sequences are compared, in an alignment that places the methionine initiator codon of iso-1-cytochrome c opposite the fifth codon of iso-2-cytochrome c. Within the 327 residues of coding sequence that are compared, there are 78 differences: 11 differences are in codon first positions, 9 are in second positions, and 58 are in third positions. Of the latter, 45 correspond to silent base substitutions; different synonymous codons are used in the two proteins to code for the same amino acid. The distribution of these third-position changes within the coding region is highly nonrandom, with a low degree of substitution in the region coding for amino acids 66-95 and a much higher level elsewhere. This contrast can be seen clearly by comparing the iso-1- and iso-2-cytochrome c sequences coding for regions 70-79 (no third-position substitutions in 10 conserved residues) and -6 to 57 (34 third-position substitutions in 53 conserved residues). The latter value agrees exactly with that expected if, in this region, the amino acid coding function were the only constraint acting upon the two coding sequences and if all codons for a given amino acid (except CGN arginine codons) (4, 15) were used with equal likelihood. Because of the high degree of third-position substitution elsewhere in the two genes, the strong iso-1/iso-2 homology in the coding sequences for amino acids 66-95 suggests that some special constraint operates within this region.

A useful paradigm for understanding this constraint is provided by DNA sequence comparisons for the small icosahedral bacteriophages φX174 and G4. Their DNAs are >30% different in base composition although the derived proteins have identical functions (16-18). Parts of the coding regions of the DNA are highly conserved in a fashion analogous to that of the conserved regions of the cytochrome c genes of yeast. In φX174 and G4 DNA, the conserved regions have a specific function in addition to coding for one protein. These functions include the origin of viral DNA replication (in gene A), two promoters (in gene A and gene C), and overlapping coding regions (gene B in gene A, gene K overlapping genes A and C, and gene E in gene D). It is tempting to speculate that dual function may be responsible for the conservation of sequence in the yeast cytochrome c genes. The second function might be like one of those found in φX174 and G4 or might be some other type of function such as the binding site of a regulatory protein, a signal for transcription termination or RNA processing, or a sequence related to chromosome structure. Clearly it is desirable to test these possibilities. If, in fact, there is a dual function for this region of the gene, then conservation of the corresponding amino acid sequence among different eukaryotic cytochromes c may reflect this second function rather than a rigid requirement for one particular amino acid sequence (1). Also, as a practical benefit, the sequence constraints on this region may allow the yeast cytochrome c genes to be used as general probes for cytochrome c genes from other organisms.

Sequences Preceding the Iso-1- and Iso-2-Cytochrome c Coding Regions. When the 5'-proximal sequences before the iso-1- and iso-2-cytochrome c genes are aligned at ATG (Fig. 3, top portion), direct sequence homologies are not evident; nonetheless, there are many similarities between the two: both sequences are A+T-rich (≈70% A+T); both consist mainly of alternating pyrimidine and purine sequences between -180 and -160; both have one long and several short pyrimidine-rich sequences in the region -115 to -40; and both have exactly the same nucleotide composition, A15C6T5, in the 26 nucleotides preceding the initiator ATG. Within the 90 base pairs immediately upstream from ATG, there are numerous short sequence homologies between the iso-1- and iso-2-cytochrome c flanking sequences (Fig. 4a). These homologies, like those between the iso-1- and iso-2-cytochrome c coding sequences, may well reflect the common evolutionary origin of the two genes.
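Compositional statements of this kind (percentage A+T, or a formula such as A15C6T5 for a 26-nucleotide stretch) are simple to compute from a sequence. The following sketch shows one possible way; the helper names are hypothetical, and the test string is an artificial 26-mer built only to have the stated composition. It is not the actual sequence preceding either ATG.

```python
from collections import Counter


def composition_formula(seq):
    """Summarize base counts as a formula such as 'A15C6T5' (zero counts omitted)."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts[base]}" for base in "ACGT" if counts[base])


def at_fraction(seq):
    """Fraction of positions that are A or T."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


# Artificial 26-mer with composition A15C6T5 (an assumption for illustration).
toy_leader = "A" * 15 + "C" * 6 + "T" * 5
print(composition_formula(toy_leader), round(at_fraction(toy_leader), 3))
```

Two sequences can share a composition formula while differing completely in base order, which is consistent with the observation that direct homologies are not evident when the two 5' regions are aligned at ATG.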

Those sequences common to both the iso-1- and iso-2-cytochrome c 5'-proximal regions are likely to include signals for the initiation of transcription by yeast RNA polymerase II and for ribosome binding. In this connection, a further comparison can be made with the 5'-linked sequences preceding the structural gene for yeast glyceraldehyde-3-phosphate dehydrogenase (G3PDH) (19). From -1 to -25, these G3PDH-proximal sequences bear a general resemblance to those preceding iso-1- and iso-2-cytochrome c; moreover, the sequence A-C-A-C-A-C-A is G3PDH-proximal in essentially the same place as it is before iso-1-cytochrome c and the sequence A-T-A-A-A-C-A-A-A (found at -11 to -3 for iso-2-cytochrome c) occurs just before the G3PDH initiator ATG.

FIG. 3. Sequences of the iso-2- (top lines) and iso-1-cytochrome c genes. The DNA sequence of the iso-1-cytochrome c gene and its derived amino acid sequence are taken from Smith et al. (4). The amino acid sequence of iso-2-cytochrome c is derived from the DNA sequence and agrees with that previously published (5) after correction of a typographical error (Lys rather than Val) at position 44 (E. Margoliash, personal communication). The boxed-in areas indicate homologous DNA sequences within the coding regions of the iso-1- and iso-2-cytochrome c genes.

Differences between the 5' flanking sequences of these three genes may bear a direct relationship to their respective levels of in vivo expression. G3PDH is one of the most abundant yeast proteins, comprising as much as 5% of cellular dry weight (19), whereas iso-1- and iso-2-cytochrome c make up 0.05 and 0.003%, respectively, of total dry weight when fully derepressed (20). The regions outside the genes for iso-1-cytochrome c and G3PDH possess the common sequence G-T-A-T-A-T-A-A-A (at -124 for iso-1-cytochrome c, and at -144 for G3PDH), whereas the closest analogous iso-2-cytochrome c 5' flanking sequence is A-C-A-T-A-C-A-A-G at -68. By using these three nonanucleotide sequences as points of reference, it can be seen that the three genes differ greatly in their pyrimidine content in the region between this common sequence and the A-rich sequence commencing at -25. In the nontranscribed strand (same polarity as mRNA), G3PDH has eight pyrimidine clusters of length ≥4, iso-1-cytochrome c has four (one of length 16), and iso-2-cytochrome c has only two, both 4 nucleotides long. These comparisons suggest that, for various yeast genes, the level of gene expression may be positively correlated with the length of DNA between the ATG initiation codon and a T-A-T-A-A-A (or similar sequence) preceding the gene. Within this region, the presence of a high content of clustered pyrimidine residues also correlates with high gene activity. In order to determine what functional role (if any) these distances and sequences may have, it will be necessary, for a number of yeast genes, to determine the transcription start point and the relative transcriptional and translational efficiencies.
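Pyrimidine clusters of the kind counted above can be located as uninterrupted runs of C and T. The sketch below uses a regular expression for this purpose. The minimum run length of 4 is an assumption taken from the shortest clusters mentioned in the text, and the test sequence is invented rather than drawn from any of the three genes.

```python
import re


def pyrimidine_clusters(seq, min_len=4):
    """Return (start, run) for every uninterrupted C/T run of at least min_len bases."""
    pattern = re.compile("[CT]{%d,}" % min_len)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


# Invented 21-base test sequence with one 4-base and one 8-base cluster.
toy_flank = "AGCTTCAGATCCTTTCTAGGA"
clusters = pyrimidine_clusters(toy_flank)
for start, run in clusters:
    print(start, len(run), run)
```

Applied to the nontranscribed strand between the T-A-T-A-like sequence and the A-rich region, the number and lengths of the returned runs would give the cluster counts compared in the text.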

Sequences Following the Coding Region. From 20 to 140 nucleotides after the iso-2-cytochrome c translation stop, the nontranscribed strand has a very high T content (57%). T-rich sequences are dispersed throughout this region, in contrast to the discrete clustering of T residues following nucleotides 10-18, 125-135, and 160-191 after the iso-1-cytochrome c gene. No direct sequence homologies are evident between the 3' flanking sequences of the two genes when they are aligned with coding termini in register (Fig. 3, bottom). However, there is significant homology between a region close to the iso-2-cytochrome c termination codon and one located 85 nucleotides downstream from the iso-1-cytochrome c translation stop (Fig. 4b). Within these regions, the elements common to both genes are a GGA-containing purine cluster followed, 24 or 32 nucleotides later, by a closely homologous T-rich sequence. Neither of these sequence elements corresponds to any of the known or hypothesized signals affecting gene expression (21, 22). For the iso-1-cytochrome c gene, this region is present on the mature mRNA molecule, 40-100 nucleotides proximal to poly(A) (J. Boss and R. Zitomer, personal communication), suggesting that the G-G-A-G-G-A and T-T-T-T-T-T-A-(C/T)-A-G-T-T-A-T elements may specify steps in transcription termination or 3'-terminal processing. Further mapping and sequencing studies of the iso-1- and iso-2-cytochrome c mRNA 3' termini should help to disclose what function (if any) these sequences have in gene expression.

(b)
iso-2 coding region - TAG - 11 base pairs - GGAGGAGA - 24 base pairs - TTTTTTACAGTTAT
iso-1 coding region - TAA - 84 base pairs - GAAAAGGAAGGAG - 32 base pairs - TTTTTTATAGTTAT

FIG. 4. Sequence homologies outside the coding regions of the iso-2- and iso-1-cytochrome c genes. Flanking sequences from the 5' (a) and the 3' (b) ends of the iso-1 and iso-2 coding regions were compared side-by-side in various shifted orientations. The most noticeable out-of-register homologies are those shown.
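A side-by-side comparison in shifted orientations amounts to sliding one sequence past the other and recording runs of identical bases at each offset. The sketch below is one simple way to carry out such a search by computer; it is not the procedure used to prepare Fig. 4. The function name, the minimum run length, and the two test sequences are all assumptions made for illustration.

```python
def shifted_matches(seq_a, seq_b, min_len=6):
    """Find runs of identical bases between two sequences at every relative shift.

    Returns (shift, start_in_a, run) tuples, where position i in seq_a is
    compared with position i - shift in seq_b.
    """
    a, b = seq_a.upper(), seq_b.upper()
    hits = []
    for shift in range(-(len(b) - 1), len(a)):
        lo = max(0, shift)                # first overlapping position in a
        hi = min(len(a), shift + len(b))  # one past the last overlapping position
        run_start = None
        for i in range(lo, hi + 1):
            same = i < hi and a[i] == b[i - shift]
            if same and run_start is None:
                run_start = i
            elif not same and run_start is not None:
                if i - run_start >= min_len:
                    hits.append((shift, run_start, a[run_start:i]))
                run_start = None
    return hits


# Invented sequences sharing one 7-base element at different positions.
toy_a = "AAAA" + "GATTACA" + "AAAA"
toy_b = "CC" + "GATTACA" + "CC"
homologies = shifted_matches(toy_a, toy_b)
print(homologies)
```

Because only exact runs are reported, an element interrupted by a single mismatch, like the two T-rich sequences in Fig. 4b, would appear as two shorter runs at the same shift; lowering min_len or merging adjacent runs would be needed to recover it as one unit.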

We thank Jeremy Boss and Richard Zitomer for discussions of their unpublished data on cyc1 mRNA structure. This research was supported by the Medical Research Council of Canada, by a Research Grant from the National Institutes of Health, by a National Institutes of Health Postdoctoral Fellowship to D.L.M., a Killam Postdoctoral Fellowship to D.W.L., a National Science Foundation Predoctoral Fellowship to P.S., and a North Atlantic Treaty Organization Fellowship to G.F.; M.S. is a Career Investigator of the Medical Research Council of Canada.

1. Margoliash, E., Ferguson-Miller, S., Kang, C. H. & Brautigan, D. L. (1976) Fed. Proc. Fed. Am. Soc. Exp. Biol. 35, 2124-2130.
2. Fitch, W. M. & Margoliash, E. (1967) Science 155, 279-284.
3. Montgomery, D. L., Hall, B. D., Gillam, S. & Smith, M. (1978) Cell 14, 673-680.
4. Smith, M., Leung, D. W., Gillam, S., Astell, C. R., Montgomery, D. L. & Hall, B. D. (1979) Cell 16, 753-761.
5. Borden, D. & Margoliash, E. (1976) in Handbook of Biochemistry, ed. Fasman, G. (Chemical Rubber Company Press, Cleveland, OH), 3rd Ed., Vol. 3, pp. 268-279.
6. Petrochilo, E. & Verdiere, J. (1977) Biochem. Biophys. Res. Commun. 79, 364-371.
7. Downie, J. A., Stewart, J. W., Brockman, N., Schweingruber, A. M. & Sherman, F. (1977) J. Mol. Biol. 113, 369-384.
8. Sherman, F., Stewart, J. W., Parker, J. H., Inhaber, E., Shipman, N. A., Putterman, G. J., Gordisky, R. L. & Margoliash, E. (1968) J. Biol. Chem. 243, 5446-5456.
9. Cameron, J. R., Panasenko, S. M., Lehman, I. R. & Davis, R. W. (1975) Proc. Natl. Acad. Sci. USA 72, 3416-3420.
10. Murray, N. E., Brammar, W. J. & Murray, K. (1977) Mol. Gen. Genet. 150, 53-61.
11. Smith, H. O. & Birnstiel, M. L. (1976) Nucleic Acids Res. 3, 2387-2398.
12. Maxam, A. M. & Gilbert, W. (1977) Proc. Natl. Acad. Sci. USA 74, 560-564.
13. Leung, D. W., Lui, A. C. P., Merilees, H., McBride, B. C. & Smith, M. (1979) Nucleic Acids Res. 6, 17-25.
14. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
15. Kafatos, F. C., Efstratiadis, A., Forget, B. G. & Weissman, S. M. (1977) Proc. Natl. Acad. Sci. USA 74, 5618-5622.
16. Sanger, F., Coulson, A. R., Friedmann, T., Air, G. M., Barrell, B. G., Brown, N. L., Fiddes, J. C., Hutchinson, C. A., III, Slocombe, P. M. & Smith, M. (1978) J. Mol. Biol. 125, 225-246.
17. Godson, G. N., Barrell, B. G., Staden, R. & Fiddes, J. C. (1978) Nature (London) 276, 236-247.
18. Godson, G. N., Fiddes, J. C., Barrell, B. G. & Sanger, F. (1978) in The Single-Stranded DNA Phages, eds. Denhardt, D. T., Dressler, D. & Ray, D. S. (Cold Spring Harbor Laboratory, Cold Spring Harbor, NY), pp. 51-86.
19. Holland, M. J. & Holland, J. P. (1979) J. Biol. Chem. 254, 9839-9845.
20. Sherman, F. & Stewart, J. W. (1971) Ann. Rev. Genet. 5, 257-296.
21. Konkel, D. A., Tilghman, S. M. & Leder, P. (1978) Cell 15, 1125-1132.
22. Hagenbuchle, O., Santer, M., Steitz, J. A. & Mans, R. J. (1978) Cell 13, 551-563.
