Proc. Natl. Acad. Sci. USA
Vol. 80, pp. 1897-1901, April 1983
Botany

Complete nucleotide sequence of a French bean storage protein gene: Phaseolin

(DNA sequence/introns/protein structure)

JERRY L. SLIGHTOM*†, SAMUEL M. SUN†‡, AND TIMOTHY C. HALL*†

*Agrigenetics Advanced Research Laboratory, 5649 East Buckeye Road, Madison, Wisconsin 53716; and †Department of Genetics, University of Wisconsin, Madison, Wisconsin 53706

Communicated by Sydney Brenner, December 15, 1982

ABSTRACT The complete nucleotide sequences of the gene and the mRNA coding for a specific phaseolin type French bean major storage protein have been determined. Comparison of these sequences reveals a phaseolin gene structure consisting of 80 base pairs (bp) of 5' untranslated DNA, 1,263 bp of protein-encoding DNA which is interrupted by five intervening sequences (IVS1, 72 bp; IVS2, 88 bp; IVS3, 124 bp; IVS4, 128 bp; and IVS5, 103 bp), and 135 bp of 3' untranslated DNA. Sequences characteristic of eukaryotic promoters, "CCAAT" and "TATA," are present in the 5' flanking DNA, and the eukaryotic poly(A) addition signal A-A-T-A-A-A occurs 16 bp before the first nucleotide of poly(A). The derived amino acid sequence yields an amino acid composition and a molecular weight compatible with those found for the α-type phaseolin protein. Two regions that probably serve as carbohydrate-peptide linkage recognition sites have been identified. A region of highly hydrophobic amino acids at the NH2 terminus of the protein suggests the presence of a signal peptide in the newly synthesized phaseolin protein.

"Phaseolin" is the name of a group of polypeptides which make up the major storage glycoprotein in the seeds of French bean (Phaseolus vulgaris L.), representing about 50% of the total protein in mature seeds (1). One-dimensional NaDodSO4/polyacrylamide gel electrophoresis of phaseolin isolated from cotyledons of the cultivar Tendergreen resolves three polypeptide bands, α, β, and γ, of 51-53, 47-48, and 43-46 kilodaltons (kDa), respectively (2, 3). All three polypeptides are encoded in 16S mRNA species, and these proteins accumulate rapidly in the developing seed cotyledon, beginning when the cotyledons are about 7 mm in length and continuing until the cotyledons reach 17-19 mm in length (4). Two-dimensional gel electrophoretic separation of phaseolin resolves five polypeptides, indicating charge and molecular weight heterogeneity in the phaseolin protein pool (2). Peptide mapping of these phaseolin proteins after proteolytic and chemical cleavages shows that all of these proteins are highly homologous (5, 6), suggesting that they may be encoded in a multigene family. Genomic blot analysis using a cloned phaseolin gene as probe confirms that phaseolin is indeed encoded in a multigene family (unpublished data).

We have previously reported (7) the isolation and partial nucleotide sequence of the phaseolin genomic clone AG-APVPh177.4 (A177.4) and of a cloned phaseolin cDNA, AG-cpPVPh1 (cDNA1), which contains about 40% of a phaseolin mRNA transcript (7).

In this paper we report the complete nucleotide sequence determination of the phaseolin genomic clone A177.4 and the sequence of a recently isolated cDNA clone, AG-cpPVPh31 (cDNA31), which contains a full-length copy of a phaseolin mRNA transcript.

MATERIALS AND METHODS

Materials. Restriction endonucleases EcoRI, BamHI, Pst I, Ava II, Bgl I, Bgl II, HindIII, HincII, Sac I, and Xba I were from BioTec (Madison, WI); Dde I, Cfo I, and Sau96I were from Bethesda Research Laboratories; and Msp I, HinfI, and Rsa I were from New England BioLabs. Terminal transferase and T4 polynucleotide kinase were from P-L Biochemicals, and calf intestine alkaline phosphatase was from Boehringer Mannheim. [γ-³²P]ATP (2,000-3,000 Ci/mmol; 1 Ci = 3.7 × 10¹⁰ Bq) and T4 ligase were from New England Nuclear. Chemicals used for DNA sequence analysis reactions were from vendors recommended by Maxam and Gilbert (8). X-ray film X-Omat AR-5 was from Kodak.

DNA Cloning and Isolation. Cloning and screening for the phaseolin Charon 24A clone A177.4 and its subclone AG-pPVPh7.2 (p7.2) have been described (7). Additional pBR322 subclones from A177.4 (used to facilitate DNA sequence determinations) are the 3.0-kilobase pair (kbp) EcoRI-BamHI subclone AG-pPVPh3.0 and the 3.8-kbp Bgl II-BamHI subclone AG-pPVPh3.8 (see Fig. 1, clone 177.4 for location of these subcloned regions). These subclones were constructed by using the method described by Slightom et al. (9). cDNA31 was constructed from purified phaseolin poly(A)+ mRNA by using a modification of the procedures described by Land et al. (10). DNA from A177.4 was purified by the method described by Slightom et al. (9) and plasmid DNAs were purified by using the alkaline extraction procedure described by Birnboim and Doly (11) for cultures as large as 1 liter.

DNA Sequence Determination. The reactions were those described by Maxam and Gilbert (8), except that formic acid was used for the A+G reaction. DNA sequencing gels 35.6 cm wide, 43.2 cm long, and 0.4 mm thick were used initially; however, when cDNA31 was analyzed, we used gels that were 21.6 cm wide, 104.1 cm long, and 0.2 mm thick. Long gel plates were treated as described by Garoff and Ansorge (12) so that the acrylamide used to form the gel matrix was bonded directly onto the face plate. Long gels were poured vertically by using a 50-ml syringe fitted with a 16-gauge needle to control the acrylamide flow rate. In order to obtain maximal analytical runs on these long gels [up to 600 base pairs (bp)] the reaction times were decreased: to 20-30 sec at 20°C for the G reaction with dimethyl sulfate and 1 min at 20°C for the A+G, C+T, and C reactions.

Abbreviations: kDa, kilodalton(s); A177.4, AG-APVPh 177.4; cDNA31, AG-cpPVPh31; kbp, kilobase pair(s); bp, base pair(s).
‡ Present address: Arco Plant Cell Research Inst., 6560 Trinity Court, Dublin, CA 94566.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

RESULTS AND DISCUSSION

Phaseolin mRNA and Gene Nucleotide Sequences. The restriction endonuclease sites and nucleotide sequence analysis strategies used for cDNA31 and A177.4 clones are shown in Fig. 1. This figure also presents the structural outline of the phaseolin genomic gene, showing intron and exon locations derived from the nucleotide sequence comparison presented in Fig. 2. The structure of this particular phaseolin gene includes a total of 1,990 bp of DNA distributed as follows: 80 bp of 5' untranslated DNA, 1,263 bp of protein-encoding DNA interrupted by five introns (a total of 515 bp), and 135 bp of 3' untranslated DNA. Thus, the original mRNA transcript of 1,990 bp must be processed by five or more RNA splicing reactions to yield a mature 1,475-bp mRNA molecule. We refer to these introns, 5' to 3', as IVS1, -2, -3, -4, and -5 (Figs. 1 and 2).
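The intron bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: the names `INTRONS`, `intron_total`, and `mature_length` are assumptions introduced here, and the numbers are the ones stated in the text.

```python
# Intron lengths (bp) and primary transcript length as stated in the text.
INTRONS = {"IVS1": 72, "IVS2": 88, "IVS3": 124, "IVS4": 128, "IVS5": 103}
PRIMARY_TRANSCRIPT_BP = 1990  # stated length of the unspliced transcript


def intron_total(introns):
    """Sum of all intron lengths in base pairs."""
    return sum(introns.values())


def mature_length(primary_bp, introns):
    """Length left after every intron has been spliced out."""
    return primary_bp - intron_total(introns)


print(intron_total(INTRONS))                          # 515
print(mature_length(PRIMARY_TRANSCRIPT_BP, INTRONS))  # 1475
```

Both printed values agree with the 515-bp intron total and the 1,475-bp mature mRNA quoted above.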

5' Flanking and Untranslated DNA. Assignment of the 5' untranslated DNA follows the identification of the 5' end of the mRNA transcript and the translation initiator codon AUG. The nucleotide sequence of cDNA31 extends to position 1, an adenine nucleotide in Fig. 2, after which the sequence shows the G-C tails used to clone the synthesized cDNA molecule into the Pst I site of pBR322. Additional evidence showing that this nucleotide represents the start of the phaseolin mRNA transcript, and would have the protecting "Cap" structure [presumably m⁷Gppp (13)] attached, was obtained by S1 nuclease digestions (14) and by the identification of identical adenine nucleotide termini in four other phaseolin full-length cDNA clones (unpublished data). Assignment of the initiator codon is somewhat arbitrary because the two consecutive NH2-terminal AUG codons are equally favored to be an initiator codon when compared to the initiator consensus sequence described by Kozak (15), A-X-X-A-U-G-G (Fig. 2). We have arbitrarily selected the first methionine codon as the initiator codon for this phaseolin gene and hence the 5' untranslated region would be 80 bp long.

Also included in Fig. 2 is 100 bp of 5' flanking DNA in which several DNA sequences believed important in regulating the transcription of eukaryotic genes are located: the TATA and CCAAT box sequences (16). Three TATA box sequences are located upstream from the mRNA cap, at positions -28, -37, and -39 (single overlined nucleotides in Fig. 2). The TATA box sequence is generally located about -30 bp from the mRNA cap in most eukaryotic genes. Thus, we expect that the sequence at position -28 would be the most important in regulating the expression of this gene. Two CCAAT box-like sequences are located in this flanking DNA, the first at position -67 (CCAT) and a second at position -74 (CCAAAT) (double overlined nucleotides in Fig. 2). In most

[Figure 1 diagram: tracks labeled cDNA clone 31 (with pBR322 G-C tails), exon number, intron number, amino acid position, nucleotide position (0 to 2,000), phaseolin gene structure (with 5' and 3' flanking DNA), and Charon 24A clone 177.4 (left and right arms).]

FIG. 1. Restriction endonuclease map, sequence analysis strategy, and structure of the phaseolin gene. Restriction enzyme sites that were useful are shown by vertical arrows: B, Bgl I; C, Sac I; E, EcoRI; F, HinfI; G, Bgl II; H, HincII; K, Kpn I; M, BamHI; P, Pst I; R, Rsa I; S, Sau96I; U, Sau3A; and X, Xba I. The direction and distance analyzed from the sites are shown by horizontal arrows. Comparison of the nucleotide sequences from phaseolin cDNA31 and A177.4 clones (Fig. 2) revealed the phaseolin gene structure shown in the center. Both phaseolin untranslated regions (5' and 3') and coding sequences are shown as heavy lines, with coding region lines being twice as heavy as those for untranslated regions. Intron and exon numbers and sizes (bp) are given between corresponding dashed lines. Amino acid positions are given below the structural gene with respect to NH2 and COOH termini and intron locations.


[Figure 2 sequence alignment: rows labeled 177.4 (genomic clone) and cDNA31, with the derived amino acid sequence beneath the cDNA31 row and a nucleotide scale marked every 100 positions. Marked features: CAP, 5' untranslated region, IVS1 (72 bp), IVS2 (88 bp), IVS3 (124 bp), IVS4 (128 bp), IVS5 (103 bp), TER, and poly(A).]

FIG. 2. Nucleotide sequences of phaseolin clones cDNA31 and A177.4. Nucleotide sequence numbering begins with position 1 corresponding to the first adenine in cDNA31. The nucleotide sequences of these two clones show complete homology to the point where poly(A) is added, except for the five introns, indicated by arrows and gaps in the sequence of cDNA31. Nucleotides that may have biological importance (see text) have single overlines (TATA boxes), double overlines (CCAAT boxes), and single underline [A-A-T-A-A-A poly(A) addition signal]. The amino acid sequence of phaseolin has been derived from these sequences and is shown below the cDNA sequence line. The initiator codon is believed to be the first methionine codon and the terminator codon is identified by TER. Underlined amino acid residues denote possible N-glycoside attachment sites.

eukaryotic genes this sequence is located -77 ± 10 bp upstream from the mRNA cap (16). Thus, either or both of these CCAAT box-like sequences could play a role in controlling transcription.

TATA and CCAAT box-like sequences have been located in DNA flanking the maize major storage protein gene, zein (17). The zein TATA box is located at the same distance from the cap site as is the first TATA box found in phaseolin, -28 bp. However, a CCAT sequence is located at -112 bp upstream of the zein cap site, 35 bp further away than is found for phaseolin and many other eukaryotic genes. TATA box-like sequences have also been located in similar positions for soybean leghemoglobin genes Lbc and Lba (18) and Lbc2 and Lbc3 (19, 20) and in a soybean actin gene (21).


Donor site (exon positions -10 to -1, intron positions +1 to +10; splice point between -1 and +1)

Position  -10  -9  -8  -7  -6  -5  -4  -3  -2  -1 |  +1  +2  +3  +4  +5  +6  +7  +8  +9 +10
A           1   2   5   6   1   7   8   2   6   0 |   0   0  17  13   2   5  11   7   7   9
G           2   1   0   6   7   2   2   6   0  14 |  20   0   1   0  16   1   0   0   1   0
C           4   9   4   4   6   2   8   5   0   4 |   0   0   0   1   0   0   2   2   4   5
T          13   8  11   4   6   9   2   7  14   2 |   0  20   2   6   2  14   7  11   8   6

Acceptor site (intron positions -15 to -1, exon positions +1 to +10; splice point between -1 and +1)

Position  -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1 |  +1  +2  +3  +4  +5  +6  +7  +8  +9 +10
A           7   0   1   7   7   8  10  11   6   4   6   8   3  20   0 |   2   5   7   5   1   3   5   9   9   1
G           2   1   4   2   1   4   5   1   1   3   0  10   0   0  20 |  18   3   8   7   7   3   8   5   1  10
C           2   2   1   1   1   0   1   0   1   2   1   0   5   0   0 |   0   1   1   4   6   0   3   1   5   1
T           9  17  14  10  11   8   4   8  12  11  13   2  12   0   0 |   0  11   4   4   6  14   4   5   5   8

Rows giving the plant consensus and the overall consensus sequences accompany each panel in the figure.

FIG. 3. Donor- and acceptor-site DNA sequences from 20 plant splice junctions. The total number of times a given nucleotide occurs at a specific position is shown. The locations of positions in introns and exons are indicated by horizontal arrows and numbers; the G-T/A-G rule is shown by the vertical arrow. X in consensus position sequences indicates that there is no nucleotide preference for that position. The plant genes analyzed were: French bean phaseolin (Fig. 2), soybean leghemoglobins (Lba, Lbc, Lbc2, and Lbc3) (18-20), and soybean actin (21).

Comparison of the 3' nucleotide sequences from clones cDNA31 and A177.4 reveals a TGA translation terminator codon (position 1,855, Fig. 2), and 135 bp of 3' untranslated DNA after which the sequence of cDNA31 shows a long stretch of poly(A). The hexanucleotide A-A-T-A-A-A, which has been suggested to be a signal for poly(A) addition (22), is located 16 bp 5' to the first nucleotide of poly(A). A consensus sequence and location for the poly(A) addition signal in plant genes has not yet been derived, in part because several other hexanucleotide sequences have been suggested to be the poly(A) addition signal for soybean leghemoglobin genes (18, 19).
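A measurement like "16 bp 5' to the first nucleotide of poly(A)" can be made programmatically. The sketch below is a minimal illustration under two assumptions that are not from the paper: the distance is taken as the number of bases between the end of the hexanucleotide and the poly(A) attachment point, and `tail` is a short illustrative 3' end, not an authoritative copy of the phaseolin sequence. The function name `polya_signal_gap` is hypothetical.

```python
def polya_signal_gap(three_prime_end, signal="AATAAA"):
    """Count bases between the last occurrence of `signal` and the end of
    the string, where the end of the string is taken to be the point at
    which poly(A) is added. Returns None if the signal is absent."""
    seq = three_prime_end.upper()
    start = seq.rfind(signal)
    if start == -1:
        return None
    return len(seq) - (start + len(signal))


# Illustrative 3' end (assumption): signal followed by 16 further bases.
tail = "ATGAATAAACAAAGGATGTTATGAT"
print(polya_signal_gap(tail))
```

Using `rfind` picks the signal copy nearest the poly(A) site, which is the one of interest when the hexanucleotide occurs more than once in a 3' untranslated region.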

Intervening Sequences. The nucleotide sequence comparison in Fig. 2 shows that the phaseolin structural gene in A177.4 is interrupted by five small introns, creating six protein-encoding DNA segments (exons). These introns are quite small compared to those found in many split genes from nonplant species (23). The phaseolin introns are as follows: IVS1, 72 bp; IVS2, 88 bp; IVS3, 124 bp; IVS4, 128 bp; and IVS5, 103 bp. Our previously published nucleotide sequence for IVS4 (IVS-C in ref. 7) contained several errors, including one extra nucleotide. Assigning the second methionine as codon number 1, we find that IVS1 splits codon 104; IVS2 is between codons 167 and 168; IVS3 is between codons 194 and 195; IVS4 is between codons 271 and 272; and IVS5 splits codon 358.

All five phaseolin introns conform to the universal G-T/A-G splicing rule (24) observed for the 3' and 5' intron boundaries. Fig. 3 compares 20 plant intron donor- and acceptor-site sequences. The donor and acceptor consensus sequences for plant introns are similar, but not identical, to the consensus sequences derived from other split genes (23, 25). The consensus plant donor sequence shows a preference for T rather than A in position -2 and shows no preference in position -3. The consensus plant acceptor sequence shows a strong preference for either G or A rather than no preference (X) in position -4. In positions -5 to -15, plant introns show a strong preference for either T or A, which may be a result of their high A+T content, whereas nonplant acceptor sequences prefer either T or C. Also, at position +1 the acceptor plant sequence shows a much higher preference for G (90%) than that observed for nonplant species (47%). These differences in the plant splice site donor and acceptor sequences probably do not require an RNA splicing mechanism different from that used in nonplant species.
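The preferences described in this paragraph follow from simple per-position frequencies over the 20 splice junctions. The sketch below recomputes two of them. It is a hedged illustration: the counts are transcribed from the Fig. 3 base-frequency table, the donor A count at +4 is inferred here from the requirement that every column totals 20, and the names (`DONOR`, `ACCEPTOR`, `fraction`, `column_totals`) are assumptions of this example.

```python
BASES = "AGCT"

# Donor site: exon positions -10..-1, then intron positions +1..+10.
DONOR_POS = list(range(-10, 0)) + list(range(1, 11))
DONOR = {
    # The 13 at +4 is inferred so that the column sums to 20.
    "A": [1, 2, 5, 6, 1, 7, 8, 2, 6, 0, 0, 0, 17, 13, 2, 5, 11, 7, 7, 9],
    "G": [2, 1, 0, 6, 7, 2, 2, 6, 0, 14, 20, 0, 1, 0, 16, 1, 0, 0, 1, 0],
    "C": [4, 9, 4, 4, 6, 2, 8, 5, 0, 4, 0, 0, 0, 1, 0, 0, 2, 2, 4, 5],
    "T": [13, 8, 11, 4, 6, 9, 2, 7, 14, 2, 0, 20, 2, 6, 2, 14, 7, 11, 8, 6],
}

# Acceptor site: intron positions -15..-1, then exon positions +1..+10.
ACCEPTOR_POS = list(range(-15, 0)) + list(range(1, 11))
ACCEPTOR = {
    "A": [7, 0, 1, 7, 7, 8, 10, 11, 6, 4, 6, 8, 3, 20, 0,
          2, 5, 7, 5, 1, 3, 5, 9, 9, 1],
    "G": [2, 1, 4, 2, 1, 4, 5, 1, 1, 3, 0, 10, 0, 0, 20,
          18, 3, 8, 7, 7, 3, 8, 5, 1, 10],
    "C": [2, 2, 1, 1, 1, 0, 1, 0, 1, 2, 1, 0, 5, 0, 0,
          0, 1, 1, 4, 6, 0, 3, 1, 5, 1],
    "T": [9, 17, 14, 10, 11, 8, 4, 8, 12, 11, 13, 2, 12, 0, 0,
          0, 11, 4, 4, 6, 14, 4, 5, 5, 8],
}


def column_totals(counts):
    """Number of junctions counted at each position (should all be 20)."""
    return [sum(col) for col in zip(*(counts[b] for b in BASES))]


def fraction(counts, positions, pos, base):
    """Fraction of junctions carrying `base` at position `pos`."""
    i = positions.index(pos)
    return counts[base][i] / sum(counts[b][i] for b in BASES)


print(fraction(ACCEPTOR, ACCEPTOR_POS, 1, "G"))  # G at acceptor +1
print(fraction(DONOR, DONOR_POS, -2, "T"))       # T at donor -2
```

The acceptor +1 value reproduces the 90% preference for G quoted above, and the invariant G-T and A-G dinucleotides show up as fractions of 1.0 at donor +1/+2 and acceptor -2/-1.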

Analysis of the nucleotide composition of the phaseolin introns reveals sequences that are very rich in A+T: IVS1, 71%; IVS2, 68%; IVS3, 71%; IVS4, 74%; and IVS5, 77%. The average A+T composition is 72.4%, 17.1% more A+T than found for the exons (average, 55.3%). Calculations of the A+T compositions in other plant gene introns reveal a similar preference. Introns in soybean actin (21) are 69.3% A+T, and in soybean leghemoglobins Lba, Lbc, Lbc2, and Lbc3 (18-20), A+T compositions average 76.8%. The A+T composition of these plant introns is considerably higher than that found in introns from nonplant species; for example, human fetal globin genes average 55% A+T (26). An explanation of why plant introns have a high A+T content is not yet apparent.
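A+T composition is a straightforward base count. The sketch below shows one way it could be computed and restates the intron/exon difference from the averages given in the text. The function name `at_fraction` and the toy sequence are assumptions for illustration; the toy sequence is not a phaseolin intron.

```python
def at_fraction(seq):
    """Fraction of A and T among the unambiguous bases of `seq`."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T")
    return sum(b in "AT" for b in counted) / len(counted)


# Averages quoted in the text (percent A+T).
INTRON_MEAN_AT = 72.4
EXON_MEAN_AT = 55.3

toy_intron = "GTAAGTAATTGCTACTGGTATCAG"  # hypothetical example sequence
print(f"{100 * at_fraction(toy_intron):.1f}% A+T in the toy sequence")
print(f"{INTRON_MEAN_AT - EXON_MEAN_AT:.1f} points more A+T in introns")
```

Ignoring characters other than A, C, G, and T keeps gaps or ambiguity codes from skewing the percentage.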

Phaseolin Protein Sequence. The phaseolin protein encoded by clones A177.4 and cDNA31 contains 420 amino acids, not including the initiator codon. Analysis of amino acid sequences derived from nucleotide sequences for P. vulgaris lectin and maize zein reveals the presence of signal peptides of at least 20 amino acids for lectin (27) and 21 amino acids for zein (17). The NH2-terminal region of phaseolin contains 24 hydrophobic and 2 hydrophilic amino acid residues (residues 1-26, Fig. 2) and is followed by a highly hydrophilic region (residues 27-35). The presence of this hydrophobic region combined with the findings for zein and lectin strongly suggests that phaseolin does contain a signal peptide. The two hydrophilic arginine residues at positions 2 and 4 do not weaken this argument because positively charged amino acids have been found in similar locations in the signal peptides from many other eukaryotic and prokaryotic secretory proteins (28). In the absence of NH2-terminal amino acid data for native phaseolin we suggest that the phaseolin signal peptide is most likely 21-26 amino acids in length, with cleavage occurring somewhere after residue 21 (serine) and certainly before residue 27 (arginine), after which is located a highly hydrophilic region of amino acids. For the sake of arguments given below, we will at this time assume that the signal peptide is 21 amino acid residues long and that this native phaseolin protein contains 399 amino acids.
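The residue counts in this section tie back to the 1,263 bp of protein-encoding DNA. The arithmetic below is a small sketch of that link; the variable names are assumptions, and the 21-residue signal peptide is the working assumption the authors adopt in the text.

```python
CODING_BP = 1263              # protein-encoding DNA, from the text
SIGNAL_PEPTIDE_RESIDUES = 21  # length assumed by the authors

codons = CODING_BP // 3                      # includes the initiator codon
residues_excluding_initiator = codons - 1    # numbering starts at the 2nd Met
mature_residues = residues_excluding_initiator - SIGNAL_PEPTIDE_RESIDUES

print(codons, residues_excluding_initiator, mature_residues)  # 421 420 399
```

The 1,263 bp correspond to 421 sense codons, which gives the 420 residues stated above once the initiator is set aside and 399 residues after removal of a 21-residue signal peptide.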

Phaseolin is known to be glycosylated (29), and a search for the glycosylation recognition sequence Asn-X-(Ser or Thr) (30) reveals the presence of two N-glycoside attachment sites (underlined in Fig. 2), one at nucleotides 1,115-1,123 (amino acids 251-253) and the other at nucleotides 1,510-1,518 (amino acids 340-342).
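A search for the Asn-X-(Ser or Thr) recognition sequence amounts to a three-residue pattern match over the derived protein sequence. The sketch below is one way to express it, with a hypothetical function name and a toy peptide that is not the phaseolin sequence. It implements the pattern exactly as stated in the text; a commonly used refinement that excludes proline at position X is not applied here.

```python
import re

# Asn-X-(Ser or Thr), as stated in the text. The lookahead lets
# overlapping sites (for example "NNST") be reported separately.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(protein: str):
    """Return (start, end, tripeptide) with 1-based inclusive residue numbers."""
    return [(m.start() + 1, m.start() + 3, m.group(1))
            for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide, not the phaseolin sequence.
toy = "MKNVSLLANGTQ"
for start, end, tri in find_sequons(toy):
    print(f"{tri} at residues {start}-{end}")
```

Each hit spans three residues, and therefore nine nucleotides of coding sequence, which matches the nine-nucleotide ranges given for the two sites above.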

Comparison of tryptophan and methionine residue locations predicted by phaseolin peptide mapping (5) with their locations found in the derived amino acid sequence shows reasonable agreement. Both analyses show only one tryptophan residue, located 4.5 kDa from the NH2 terminus of phaseolin by peptide mapping and 2.8 kDa by the derived amino acid sequence. After inversion of the methionine peptide map (5), the suggested locations for three methionine residues are 18, 29-31, and 43 kDa from the NH2 terminus; in the derived amino acid sequence we find corresponding methionine residues at 14.4, 24.3, and 41.7 kDa from the NH2 terminus.

Table 1. Comparison of phaseolin amino acid compositions

                Composition found in    Composition derived from
Amino acid      average protomer        nucleotide sequence in
                                        cDNA31 and A177.4
Lys             24                      24
His             10                      10
Arg             16                      17
Asp & Asn       55                      51
Thr             13                      14
Ser             33                      35
Glu & Gln       70                      60
Pro             16                      14
1/2 Cys          2                      -
Gly             23                      24
Ala             20                      21
Val             25                      27
Met              3                       3
Ile             22                      24
Leu             39                      37
Tyr             11                      12
Phe             22                      25
Trp              5                       1

Total*          409                     399

The amino acid analysis (31) was performed on a phaseolin mixture containing α, β, and γ phaseolin protomers. For this comparison we have assumed that the average phaseolin protomer has a molecular weight of 47,000 (nonglycosylated). With 115 as the average molecular weight of an amino acid residue, this average phaseolin protomer would consist of 409 amino acids.
* Coefficient of correlation = 0.98.
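The arithmetic behind Table 1 can be checked directly from its two columns. The sketch below is a hedged illustration: it assumes that the "coefficient of correlation" is the ordinary Pearson coefficient, which the paper does not state, and it treats the dash for 1/2 Cys in the derived column as 0. The names `TABLE_1` and `pearson` are hypothetical.

```python
from math import sqrt

# Residue counts transcribed from Table 1 as (found, derived). The dash for
# 1/2 Cys in the derived column is treated as 0, which is an assumption.
TABLE_1 = {
    "Lys": (24, 24), "His": (10, 10), "Arg": (16, 17), "Asp & Asn": (55, 51),
    "Thr": (13, 14), "Ser": (33, 35), "Glu & Gln": (70, 60), "Pro": (16, 14),
    "1/2 Cys": (2, 0), "Gly": (23, 24), "Ala": (20, 21), "Val": (25, 27),
    "Met": (3, 3), "Ile": (22, 24), "Leu": (39, 37), "Tyr": (11, 12),
    "Phe": (22, 25), "Trp": (5, 1),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


found = [f for f, _ in TABLE_1.values()]
derived = [d for _, d in TABLE_1.values()]

# 47,000 Da protomer divided by 115 Da per residue, rounded to whole residues.
protomer_residues = round(47_000 / 115)
r = pearson(found, derived)
print(sum(found), sum(derived), protomer_residues, round(r, 2))
```

Under these assumptions the column totals reproduce the 409 and 399 residues given in the table, and the coefficient comes out close to the reported 0.98. An exact match is not guaranteed, since the rounding and the treatment of the Cys entry used by the authors are not described.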

Table 1 compares the amino acid composition derived from the 399 amino acids for phaseolin with the amino acid composition found by amino acid analysis of the phaseolin protomers (31). These amino acid compositions show excellent agreement, with a coefficient of correlation of 0.98. The calculated size for the nonglycosylated phaseolin protein encoded in clones A177.4 and cDNA31 is 45.2 kDa. If we assume that each oligosaccharide side chain adds about 2 kDa (30), the phaseolin protein identified here, with its two glycosylation sites, will have an apparent size of about 49.2 kDa, which is closest to the observed size for a β-type phaseolin polypeptide, 47-48 kDa (2, 3).

We thank Dr. O. Smithies (National Institutes of Health Grant GM-20069) for use of his laboratory during the initial stages of this project, Drs. Mike Murray and Les Hoffman for cDNA cloning of phaseolin mRNA and useful discussions of results, Dr. Yu Ma for useful discussion concerning phaseolin peptide mapping and glycosylation studies, and Roger Drong and Rod Klassy for technical help in screening cDNA clones and preparation of plasmid DNAs. We are especially grateful to Richard Barker for training us to use long DNA sequencing gel technology. This paper is Number 2623 from the Laboratory of Genetics, University of Wisconsin.

1. Ma, Y. & Bliss, F. A. (1978) Crop Sci. 17, 431-437.
2. Brown, J. W. S., Ma, Y., Bliss, F. A. & Hall, T. C. (1981) Theor. Appl. Genet. 59, 83-88.
3. Hall, T. C., McLeester, R. C. & Bliss, F. A. (1977) Plant Physiol. 59, 1122-1124.
4. Sun, S. M., Mutschler, M. A., Bliss, F. A. & Hall, T. C. (1978) Plant Physiol. 61, 918-923.
5. Ma, Y., Bliss, F. A. & Hall, T. C. (1980) Plant Physiol. 66, 897-902.
6. Bollini, R. & Vitale, A. (1981) Physiol. Plant. 52, 96-100.
7. Sun, S. M., Slightom, J. L. & Hall, T. C. (1981) Nature (London) 289, 37-41.
8. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
9. Slightom, J. L., Blechl, A. E. & Smithies, O. (1980) Cell 21, 627-638.
10. Land, H., Grez, M., Hauser, H., Lindenmaier, W. & Schütz, G. (1981) Nucleic Acids Res. 9, 2251-2266.
11. Birnboim, H. C. & Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523.
12. Garoff, H. & Ansorge, W. (1981) Anal. Biochem. 115, 450-457.
13. Shatkin, A. J. (1976) Cell 9, 645-653.
14. Berk, A. J. & Sharp, P. A. (1977) Cell 12, 721-732.
15. Kozak, M. (1981) Nucleic Acids Res. 9, 5233-5252.
16. Efstratiadis, A., Posakony, J. W., Maniatis, T., Lawn, R. M., O'Connell, C., Spritz, R. A., DeRiel, J. K., Forget, B. G., Weissman, S. M., Slightom, J. L., Blechl, A. E., Smithies, O., Baralle, F. E., Shoulders, C. C. & Proudfoot, N. J. (1980) Cell 21, 653-668.
17. Pedersen, K., Devereux, J., Wilson, D. R., Sheldon, E. & Larkins, B. A. (1982) Cell 29, 1015-1026.
18. Hyldig-Nielsen, J. J., Jensen, E. Ø., Paludan, K., Wiborg, O., Garrett, R., Jørgensen, P. & Marcker, K. A. (1982) Nucleic Acids Res. 10, 689-701.
19. Wiborg, O., Hyldig-Nielsen, J. J., Jensen, E. Ø., Paludan, K. & Marcker, K. A. (1982) Nucleic Acids Res. 10, 3487-3494.
20. Brisson, N. & Verma, D. P. S. (1982) Proc. Natl. Acad. Sci. USA 79, 4055-4059.
21. Shah, D. M., Hightower, R. C. & Meagher, R. B. (1982) Proc. Natl. Acad. Sci. USA 79, 1022-1026.
22. Proudfoot, N. J. & Brownlee, G. G. (1976) Nature (London) 263, 211-214.
23. Breathnach, R. & Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383.
24. Breathnach, R., Benoist, C., O'Hare, K., Gannon, F. & Chambon, P. (1978) Proc. Natl. Acad. Sci. USA 75, 4853-4857.
25. Mount, S. M. (1982) Nucleic Acids Res. 10, 459-472.
26. Smithies, O., Engels, W. R., Devereux, J. R., Slightom, J. L. & Shen, S. (1981) Cell 26, 345-353.
27. Hoffman, L. M., Ma, Y. & Barker, R. F. (1982) Nucleic Acids Res. 10, 7819-7828.
28. Inouye, M. & Halegoua, S. (1980) CRC Crit. Rev. Biochem. 1, 339-371.
29. Hall, T. C., Ma, Y., Buchbinder, B. U., Pyne, J. W., Sun, S. M. & Bliss, F. A. (1978) Proc. Natl. Acad. Sci. USA 75, 3196-3200.
30. Sharon, N. & Lis, H. (1979) Biochem. Soc. Trans. 7, 783-799.
31. Sun, S. M. (1974) Dissertation (Univ. Wisconsin, Madison).
