
Vol. 169, No. 11

Spiroplasma Virus 4: Nucleotide Sequence of the Viral DNA, Regulatory Signals, and Proposed Genome Organization

JOEL RENAUDIN, MARIE-CLAIRE PASCAREL, AND JOSEPH-MARIE BOVÉ*
Laboratoire de Biologie Cellulaire et Moléculaire, I.N.R.A. et Université de Bordeaux II, Domaine de la Grande Ferrade, 33140 Pont de la Maye, France

Received 10 February 1987/Accepted 23 July 1987

The replicative form (RF) of spiroplasma virus 4 (SpV4) has been cloned in Escherichia coli, and the cloned RF has been shown to be infectious by transfection (M. C. Pascarel-Devilder, J. Renaudin, and J.-M. Bové, Virology 151:390-393, 1986). The cloned SpV4 RF was randomly subcloned and fully sequenced by the dideoxy chain termination technique, using the M13 cloning and sequencing system. The SpV4 genome contains 4,421 nucleotides and has a G+C content of 32 mol%. The triplet TGA is not a termination codon but, as in Mycoplasma capricolum (F. Yamao, A. Muto, Y. Kawauchi, M. Iwami, S. Iwagami, Y. Azumi, and S. Osawa, Proc. Natl. Acad. Sci. USA 82:2306-2309, 1985), probably codes for tryptophan. With this assumption, nine open reading frames (ORFs) were identified. Each of the nine is characterized by an ATG or GTG initiation codon, one or several termination codons, and a Shine-Dalgarno sequence upstream of the initiation codon. The nine ORFs are distributed among all three reading frames. One of them (ORF1) corresponds to the gene for the 60,000-dalton capsid protein. Analysis of codon usage showed that T- and A-terminated codons are preferentially used, reflecting the low G+C content (32 mol%) of the SpV4 genome. The viral DNA contains two G+C-rich inverted repeat sequences; one could be involved in transcription termination and the other in initiation of complementary DNA strand synthesis. The SpV4 genome was found to contain at least three promoterlike sequences quasi-identical to those of eubacteria. These results fully support the bacterial origin of spiroplasmas.

Spiroplasma virus 4 (SpV4) is an isometric virus with single-stranded circular DNA that produces a lytic infection of the helical mollicute Spiroplasma melliferum (22). The 4.4-kilobase viral DNA is one of the smallest genomes of procaryotic DNA viruses. Possible bacterial equivalents of SpV4 are the coliphages G4 and φX174. The genomes of these phages are only slightly larger than that of SpV4 and code for at least 10 proteins, so the SpV4 DNA might also code for a relatively large number of proteins despite its small size. SpV4 DNA therefore seemed a good candidate for nucleotide sequencing to obtain information on gene structure and regulatory signals in the spiroplasmas; such data are still very scarce for the mollicutes in general and the spiroplasmas in particular. We have recently cloned the double-stranded replicative form (RF) of SpV4. The cloned RF proved to be infectious by transfection, indicating that no sequences were lost during cloning (20). We report here the full sequence of the cloned SpV4 DNA. The viral genome has nine open reading frames (ORFs), provided that TGA is not taken as a termination codon. The regulatory signals are very similar to those of eubacterial sequences, in agreement with recent views on the phylogeny of the mollicutes, which are regarded as a coherent phylogenetic group derived by regressive evolution from low-G+C gram-positive bacteria (26).

MATERIALS AND METHODS

Bacteria and bacteriophage. Escherichia coli HB101 was used for propagating recombinant plasmids containing SpV4 RF DNA, and E. coli TG1 was used as the host for bacteriophage M13. (E. coli HB101 and TG1 as well as phage M13mp8 were kindly supplied by S. Wain Hobson [Institut Pasteur, Paris].)

Enzymes and chemicals. Restriction endonucleases, DNA polymerase I (Klenow fragment), calf intestine phosphatase, T4 DNA ligase, isopropyl-β-D-thiogalactopyranoside, and 5-bromo-4-chloro-3-indolyl-β-D-galactoside (X-gal) were purchased from Boehringer GmbH (Mannheim, Federal Republic of Germany). A nick translation kit, an M13 sequencing kit, and the labeled nucleotides [α-32P]dCTP (110 TBq/mmol) and [α-35S]dATPαS (22 TBq/mmol) were purchased from Amersham Corp. (Arlington Heights, Ill.). N,N'-Methylene-bisacrylamide and acrylamide were obtained from Bio-Rad Laboratories (Richmond, Calif.). Urea was from E. Merck AG (Darmstadt, Federal Republic of Germany). Agarose and low-melting-point agarose were purchased from Bethesda Research Laboratories, Inc. (Gaithersburg, Md.).

SpV4 RF DNA. Culture of S. melliferum G1 and propagation of SpV4 in this spiroplasma have been described previously (22). Purification of the SpV4 RF DNA and its cloning in E. coli have been described recently (20).

* Corresponding author.

JOURNAL OF BACTERIOLOGY, Nov. 1987, p. 4950-4961
0021-9193/87/114950-12$02.00/0
Copyright © 1987, American Society for Microbiology

FIG. 1. Sequencing strategy. I, ClaI-linearized restriction map of SpV4 RF DNA with unique restriction sites (ClaI, AccI, ScaI, BclI). II, Directions, alignments, and numbers of the sequenced DNA fragments. V, Viral DNA strand; C, complementary strand. kbp, Kilobase pairs.

Shotgun cloning and dideoxy sequencing of SpV4 RF DNA with bacteriophage M13. SpV4 RF DNA fragments were randomly generated by sonication (4), cloned in E. coli TG1 after insertion into the RF of phage M13mp8 used as a vector (17), and sequenced by the dideoxy chain termination method (30) as follows. Recombinant plasmid pESV4-13, containing the full-size SpV4 RF inserted at the ClaI site of the E. coli plasmid vector pBR328 (20), was sonicated at 10 W for 45 s. The 400- to 800-base-pair fragments were purified by electrophoresis on a 1% low-melting-point agarose gel and blunt ended by a fill-in reaction with DNA polymerase I (Klenow fragment) and all four deoxyribonucleotide 5'-triphosphates (1). The blunt-ended fragments were ligated to the dephosphorylated SmaI-linearized M13mp8 RF vector. The ligation mixture was used to transform E. coli TG1 cells by the method of Hanahan (10). Among the recombinant phages giving colorless plaques, those containing SpV4 DNA were further selected by in situ hybridization (15) with an SpV4-specific probe made by nick translation (25) of the SpV4 RF. A total of 114 hybridization-positive subclones were obtained.
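As a rough illustration of why about a hundred random M13 subclones can cover a 4.4-kb genome, the sketch below uses the classical Lander-Waterman approximation (expected covered fraction 1 - e^(-c), with coverage c = N*L/G) together with a small simulation on a circular genome. The read length of 250 nt and the even split of the 114 subclones between the two strands are assumptions made only for illustration; the text gives neither, and the function names are hypothetical.

```python
import math
import random

GENOME_LEN = 4421  # SpV4 RF length in base pairs (from the text)


def lander_waterman_covered(n_reads: int, read_len: int, genome_len: int) -> float:
    """Expected fraction of the genome covered: 1 - exp(-c), where c = N * L / G."""
    coverage = n_reads * read_len / genome_len
    return 1.0 - math.exp(-coverage)


def simulate_covered(n_reads: int, read_len: int, genome_len: int, seed: int = 0) -> float:
    """Monte Carlo estimate on a circular genome with uniformly random read starts."""
    rng = random.Random(seed)
    covered = bytearray(genome_len)
    for _ in range(n_reads):
        start = rng.randrange(genome_len)
        for k in range(min(read_len, genome_len)):
            covered[(start + k) % genome_len] = 1  # wrap around the circular genome
    return sum(covered) / genome_len


if __name__ == "__main__":
    reads_per_strand = 114 // 2  # assumption: insert orientation is random
    read_len = 250               # assumption: read length is not stated in the text
    print(lander_waterman_covered(reads_per_strand, read_len, GENOME_LEN))
    print(simulate_covered(reads_per_strand, read_len, GENOME_LEN))
```

Under these assumptions each strand is expected to be roughly 96% covered by random clones alone, which is why such a model is only a planning aid and not a substitute for checking actual overlaps.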

In addition, two HincII restriction fragments (139 and 390 base pairs) of SpV4 RF were separately cloned in both orientations, using the same dephosphorylated SmaI-linearized M13mp8 RF vector.

Preparation of single-stranded DNA templates from the recombinant phages, annealing of the forward 17-mer universal primer to the templates, and sequencing reactions were performed following the M13 Cloning and Sequencing Handbook (1), except that for the sequencing reaction, the concentrations of the ddATP and ddTTP working solutions were lowered to 0.015 and 0.05 mM, respectively. [α-35S]dATPαS (22 TBq/mmol) was used as the labeled nucleotide.

Sequencing gel electrophoresis. Sequencing reaction mixtures were loaded onto a 0.4-mm-thick, 50-cm-long polyacrylamide gel containing 7 M urea and 6.5% acrylamide in Tris-borate-EDTA buffer (pH 8.3). Electrophoresis was performed at 36 W constant power for 4 h (short run) or 8 h (long run). Gels were fixed in a mixture of 10% acetic acid and 10% methanol for 20 min before being dried under vacuum. They were autoradiographed overnight at room temperature with Du Pont Cronex 4 X-ray films.

Sequence analysis. Computer analysis of the nucleotide sequence was performed by using the alignment program NUCALN of Wilbur and Lipman (33) and the translational program NUMSEQ of Fristensky et al. (5). Hydropathy profiles of putative polypeptides were displayed by the method of Kyte and Doolittle (14).
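The Kyte and Doolittle method averages a per-residue hydropathy index over a sliding window along the polypeptide. The following is a minimal sketch, assuming the standard Kyte-Doolittle index values and a caller-chosen window length (the window used by the authors is not stated); `hydropathy_profile` is a hypothetical name, not part of NUMSEQ or any other program cited here.

```python
# Kyte & Doolittle hydropathy index per amino acid (one-letter codes).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def hydropathy_profile(protein: str, window: int = 9) -> list[float]:
    """Mean hydropathy of each window; positive values are hydrophobic."""
    if not 1 <= window <= len(protein):
        raise ValueError("window must be between 1 and the protein length")
    scores = [KD[aa] for aa in protein.upper()]
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        # Slide the window one residue: add the new residue, drop the oldest one.
        total += scores[i] - scores[i - window]
        profile.append(total / window)
    return profile


if __name__ == "__main__":
    # N-terminus of the capsid protein (Met-Lys-Lys-Lys-Met-Ser), reported in Results.
    print(hydropathy_profile("MKKKMS", window=3))
```

A 3-residue window is used in the example only because the peptide is so short; on full-length ORF translations a longer window gives smoother profiles.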

Determination of NH2-terminal amino acid sequence of SpV4 capsid protein. Proteins were purified by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and electroblotted onto glass fiber sheets coated with Polybrene (32). The immobilized proteins were subjected to automatic gas-phase sequence analysis essentially as described by Hewick et al. (11).

RESULTS

Nucleotide sequence of SpV4 DNA. The double-stranded RF DNA was found to contain 4,421 base pairs. The DNA sequencing strategy is outlined in Fig. 1. Sequence data for both strands were obtained for all 4,421 base pairs, with overlaps across the junctions. Hybridization experiments showed that the II-V sequences of Fig. 1 correspond to the viral DNA strand of the RF and that the II-C sequences correspond to the complementary strand. The nucleotide sequence of the single-stranded circular viral DNA (V strand) is shown in Fig. 2. Because the circular RF was initially linearized with restriction endonuclease ClaI for cloning (20), and because ClaI cuts the sequence 5'-ATCGAT-3' between T and C, the C was arbitrarily chosen as nucleotide number 1 and the T as nucleotide number 4,421.

The base composition of SpV4 DNA is 34% A, 33.9% T, 11.8% C, and 20.2% G. The G+C content is 32 mol%, slightly higher than that of the spiroplasma host DNA (26 mol%) (3).
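The numbering convention and the composition figures above can be applied mechanically. The sketch below (hypothetical helper names) rotates a circular sequence so that nucleotide 1 is the C immediately after the ClaI cut in 5'-AT^CGAT-3', assuming a single ClaI site as in the restriction map, and then computes the percentage of each base and the G+C content. Applied to the full 4,421-nt V-strand sequence it should return the values reported above; the example uses a toy sequence.

```python
CLAI_SITE = "ATCGAT"  # ClaI recognition site; the cut falls between T and C (AT^CGAT)


def number_from_clai(circular_seq: str) -> str:
    """Rotate a circular sequence so position 1 is the C just after the ClaI cut.

    The T immediately before the cut then becomes the last nucleotide.
    Raises ValueError unless exactly one ClaI site is present.
    """
    s = circular_seq.upper()
    # Append the first 5 nt so that a site spanning the arbitrary origin is still found.
    wrapped = s + s[: len(CLAI_SITE) - 1]
    hits = [i for i in range(len(s)) if wrapped.startswith(CLAI_SITE, i)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one ClaI site, found {len(hits)}")
    cut = (hits[0] + 2) % len(s)  # index of the C in ATCGAT
    return s[cut:] + s[:cut]


def base_composition(seq: str) -> dict:
    """Percentage of A, T, G, and C, plus the G+C content (mol%)."""
    s = seq.upper()
    comp = {base: 100.0 * s.count(base) / len(s) for base in "ATGC"}
    comp["G+C"] = comp["G"] + comp["C"]
    return comp


if __name__ == "__main__":
    toy = "GGATCGATTT"  # hypothetical toy circle with one ClaI site
    print(number_from_clai(toy))  # CGATTTGGAT
    print(base_composition(toy))
```

Because ATCGAT is palindromic, scanning one strand is enough to find the site.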

Distribution of ORFs on SpV4 genome. The SpV4 DNA sequence was analyzed with the NUMSEQ translational program. Figure 3 summarizes the positions of termination codons on the viral DNA strand (V) and the complementary strand (C) in two cases: when all three termination codons are used (III) and when TAA and TAG but not TGA are represented (II). Only when termination codon TGA was omitted (II) could an ORF large enough to encode the 60,000-dalton capsid protein be identified on the viral DNA strand (II-V), in reading frame 2. The reason for not considering TGA as a termination codon derives from the finding of Yamao et al. (36) that in Mycoplasma capricolum TGA is not a termination codon but codes for tryptophan (see Discussion). In the following results, TGA is assumed not to be a termination codon, and only the ORFs corresponding to part II of Fig. 3 are considered.

The positions of the methionine codon ATG on the viral DNA strand (V) and the complementary strand (C) are represented as vertical bars in Fig. 4. The ORFs of Fig. 3, panel II, that are larger than 120 nucleotides are indicated as boxes in Fig. 4. Only some of these ORFs have an ATG initiation codon at the 5' end (arrowheads in Fig. 4), and almost all of these are located on the viral DNA strand (V).

A bacterial coding ORF possesses, 5 to 10 nucleotides upstream of the initiation codon, a Shine-Dalgarno sequence complementary to the 3'-OH end of the 16S rRNA (31). This site, approximately six nucleotides long, often contains the sequence AGGA and is involved in ribosome binding. The nucleotide sequence at the 3'-OH end of S. melliferum 16S rRNA is not known. A search for Shine-Dalgarno sequences on the SpV4 DNA was therefore done by comparing the sequences upstream of the initiation codons of the ORFs with the 3'-OH ends of three other 16S rRNAs: those of E. coli, Bacillus subtilis, and Spiroplasma citri (Fig. 5). S. citri is serologically related to S. melliferum, and these two spiroplasmas have 65% DNA homology (2). The sequence of the 15 terminal nucleotides of S. citri 16S rRNA is the same as that of the following mollicutes and gram-positive bacteria: M. capricolum, Mycoplasma sp. strain PG50, Acholeplasma laidlawii, Clostridium innocuum, and B. subtilis (6, 12, 18, 34). This 3'-OH terminal sequence is thus highly conserved; it differs from that of E. coli only at the very 3'-OH end, where 3'-UCU is replaced by 3'-A in E. coli. It is therefore very likely that the 3'-OH end of S. melliferum 16S rRNA is identical to that of S. citri. Figure 5 shows the nine ORFs that have a Shine-Dalgarno sequence complementary to S. citri 16S rRNA. These nine ORFs have been numbered 1 to 9 from the largest to the smallest; they are indicated by their number in Fig. 4, except ORF9, which is shorter than 120 nucleotides. ORF1, the largest, has the size expected for the 60,000-dalton capsid protein, the only viral protein identified so far. The Shine-Dalgarno sequences are up to three nucleotides longer when evaluated against the 3'-UCU-terminated 16S rRNA of S. citri or B. subtilis than against the 3'-A-terminated 16S rRNA of E. coli (Fig. 5).

FIG. 2. Nucleotide sequence of SpV4 DNA and derived amino acid sequences of the putative polypeptides. The -10 and -35 regions of promoterlike sequences P1, P2, and P3 are underlined. [Sequence not reproduced.]

FIG. 3. Potential protein-coding regions. I, Same as in Fig. 1. II, Positions of termination codons TAA and TAG are shown as vertical bars in all three reading frames of the viral DNA strand (V: 1, 2, 3) and of the complementary strand (C: 4, 5, 6). III, Same as II, but in addition to TAA and TAG, termination codon TGA is also positioned. The reading frames are defined as follows: the first nucleotide of the first codon of frame 1 is nucleotide number 1; in frame 2 the first nucleotide is nucleotide number 2; in frame 3 it is nucleotide number 3. kbp, Kilobase pairs.

FIG. 4. Summary of methionine codons in reading frames 1, 2, and 3 of the viral DNA strand (V) and frames 4, 5, and 6 of the complementary strand (C). I, Same as in Fig. 1. Positions of the methionine codon are indicated by vertical bars. ORFs larger than 120 nucleotides are represented as boxes. The numbered boxes are ORFs with a ribosome-binding site upstream of the initiation codon, shown by an arrowhead. kbp, Kilobase pairs.

FIG. 5. Shine-Dalgarno sequences associated with ORFs of SpV4 DNA. Nine ORFs have, upstream of the initiation codon, sequences that are complementary to the 3'-OH end of 16S rRNA of S. citri (in boxes), B. subtilis, and E. coli. The initiation codon is underlined. The number of codons includes the termination codon(s).
Number of codons: ORF1, 554; ORF2, 321; ORF3, 150; ORF4, 134; ORF5, 85; ORF6, 74; ORF7, 49; ORF8, 39; ORF9, 29.
3'-OH end of 16S rRNA (written 3' to 5'): S. citri, OH-UCUUUCCUCCACUAG; B. subtilis, OH-UCUUUCCUCCACUAG; E. coli, OH-AUUCCUCCACUAG.
[Upstream ORF sequences not reproduced.]
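The three criteria used above (an ATG or GTG initiation codon, an in-frame termination codon with TGA read as tryptophan, and an upstream stretch complementary to the 3' end of S. citri 16S rRNA) can be combined in a small scanner. This is a minimal sketch, not the NUMSEQ program used by the authors: it scans the three frames of one strand as a linear sequence, and the 20-nt upstream window, the 4-nt minimum match, and the minimum ORF length are illustrative assumptions. The TAA/TAG-only stop set corresponds to what NCBI now lists as genetic code table 4; all function names are hypothetical.

```python
STARTS = {"ATG", "GTG"}
STOPS_TGA_IS_TRP = {"TAA", "TAG"}        # TGA read as Trp, as assumed for SpV4
STOPS_STANDARD = {"TAA", "TAG", "TGA"}   # universal code, for comparison

# 3' end of S. citri 16S rRNA written 5'->3' (Fig. 5 prints it 3'->5': OH-UCUUUCCUCCACUAG).
S_CITRI_16S_3_END = "GAUCACCUCCUUUCU"


def revcomp(dna: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    pair = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pair[b] for b in reversed(dna))


# DNA sequence that would pair perfectly with the rRNA 3' end: "AGAAAGGAGGTGATC".
SD_TARGET = revcomp(S_CITRI_16S_3_END.replace("U", "T"))


def longest_sd_match(upstream: str) -> str:
    """Longest stretch of `upstream` that is complementary to the rRNA 3' end."""
    best = ""
    for i in range(len(upstream)):
        for j in range(i + len(best) + 1, len(upstream) + 1):
            if upstream[i:j] in SD_TARGET:
                best = upstream[i:j]
            else:
                break  # no longer stretch starting at i can match either
    return best


def find_orfs(seq, stops=STOPS_TGA_IS_TRP, min_codons=40, sd_min=4, upstream_len=20):
    """Return (start_index, codons_including_stop, sd_match) for each qualifying ORF.

    An ORF runs from an ATG/GTG to the first in-frame stop; it is kept only if it is
    long enough and has an SD match of at least `sd_min` nt in the `upstream_len` nt
    before the start codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] not in STARTS:
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in stops:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the end of this linear sequence
            n_codons = (j - i) // 3 + 1
            sd = longest_sd_match(seq[max(0, i - upstream_len):i])
            if n_codons >= min_codons and len(sd) >= sd_min:
                hits.append((i, n_codons, sd))
            i = j + 3  # resume scanning after the stop codon
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy sequence: SD-like stretch, ATG, 10 Lys, TGA, 5 Lys, TAA.
    demo = "TTTAAGGAGGTTTAAT" + "ATG" + "AAA" * 10 + "TGA" + "AAA" * 5 + "TAA" + "TTT"
    print(find_orfs(demo, min_codons=10))                        # [(16, 18, 'AAGGAGGT')]
    print(find_orfs(demo, stops=STOPS_STANDARD, min_codons=10))  # [(16, 12, 'AAGGAGGT')]
```

On the toy sequence, reading TGA as Trp yields an 18-codon ORF, whereas the standard code stops it after 12 codons; at genome scale, the same effect is what made a capsid-sized ORF visible only in panel II of Fig. 3.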

In summary, nine putative coding ORFs have been identified on the SpV4 viral DNA. Each ORF is bordered by an initiation codon and at least one termination codon, and possesses, upstream of the initiation codon, a Shine-Dalgarno sequence. ORF1 has the expected size for the 60,000-dalton capsid protein gene. The initiation codon of ORFs 1, 2, 3, 4, 5, 6, 7, and 9 is ATG, and that of ORF8 is GTG. The GTG codon of ORF8 is part of the sequence TTGTG (Fig. 5). The codon formed by the first three nucleotides (TTG) has been described as an initiation codon in gram-positive bacteria and phages (16). Hence, TTG could also be the start of an ORF, albeit one shorter than ORF8 starting with GTG. The first termination codon is TAG for ORFs 2, 8, and 9 and TAA for the other six ORFs. ORFs 5 and 6 are terminated by two and three adjacent TAA codons, respectively. The nine putative coding ORFs involve all three reading frames (Fig. 6). There are three overlapping regions: between ORFs 5 and 3, 3 and 1, and 2 and 7. ORF2 fully overlaps ORF7. The nucleotide sequences of the overlapping regions are shown in Fig. 7.

FIG. 6. Proposed SpV4 genome organization.

FIG. 7. Nucleotide and putative amino acid sequences of ORF overlapping regions.
ORF5/ORF3 overlap: AAAGATGAAAATATATACTCAAAGAAATAAT
  ORF5 (reading frame 3): Lys Asp Glu Asn Ile Tyr Ser Lys Lys TER
  ORF3 (reading frame 1): MET Lys Ile Tyr Thr Gln Arg Asn Asn
ORF3/ORF1 overlap: GAAGATGAAAAAGAAAATGAGTAAATTG
  ORF3 (reading frame 1): Glu Asp Glu Lys Glu Asn Glu TER
  ORF1 (reading frame 2): MET Lys Lys Lys Met Ser Lys Leu
ORF2/ORF7 overlap: CGATATGATA ... GTTACTTTAACT
  ORF2 (reading frame 1): Arg Tyr Asp ... Val Thr Leu Thr
  ORF7 (reading frame 2): MET Ile ... Leu Leu TER

ORF of capsid protein. The size of the capsid protein of SpV4 was found to be 60,000 daltons by polyacrylamide gel electrophoresis (22). ORF1, which extends from nucleotide 1736 to nucleotide 3397, has a coding capacity of 63,900 daltons of protein and is the only ORF large enough to accommodate the capsid protein. The N-terminal amino acid sequence of the capsid protein, determined as described in Materials and Methods, was found to be Met-Lys-Lys-Lys-Met-Ser... This is precisely the amino acid sequence predicted from the nucleotide sequence of ORF1 (Fig. 2). It is therefore very likely that ORF1 is indeed the gene for the SpV4 capsid protein.

Codon usage. The codon usage for the capsid protein has been determined from the nucleotide sequence of its gene (ORF1). For codons specifying the same amino acid and differing only in the nucleotide at position 3, those with A or T in the third position are used much more frequently than those ending in C or G, and those ending in T are used more frequently than those ending in A (Table 1). Among the six codons for arginine, AGA and AGG, starting with A, are used more frequently than CGA and CGG, respectively, which start with C. A similar situation occurs with leucine, for which the two codons starting with T are preferred over those starting with C. The preferential usage of codons having A or T in the first or third position, as described above for ORF1, also holds for the other eight ORFs.

Hydropathy profiles. From the amino acid sequences deduced from their nucleotide sequences, we determined the hydropathy (14) curves of ORF1 (capsid protein) and the other eight ORFs. The hydropathy profiles for the putative proteins of ORFs 2, 3, 4, 6, and 8 show alternating hydrophilic and hydrophobic regions. The capsid protein profile is similar but shows a pronounced hydrophobic peak in the region corresponding to nucleotide 2940. The putative polypeptides corresponding to ORFs 7 and 9 are peculiar in that ORF7 has only hydrophobic regions, while ORF9 is fully hydrophilic. For ORF5, the profile shows two large hydrophobic regions and a small hydrophilic region at the C-terminal end.
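The codon-usage statements above can be checked directly against the counts in Table 1. The sketch below transcribes those counts, tallies the third-position nucleotide, and translates the codons with NCBI genetic code table 4 (the standard code except that TGA encodes Trp), which is the code generally used for Mycoplasma and Spiroplasma today; that table name is not from the paper, and the helper names are hypothetical.

```python
from collections import Counter

# Codon counts for ORF1 (capsid protein gene), transcribed from Table 1.
TABLE1_COUNTS = {
    "TTT": 25, "TTC": 2, "TTA": 23, "TTG": 14, "TAT": 20, "TAC": 1, "TAA": 1, "TAG": 0,
    "CTT": 3, "CTC": 0, "CTA": 1, "CTG": 0, "CAT": 14, "CAC": 2, "CAA": 19, "CAG": 7,
    "ATT": 26, "ATC": 1, "ATA": 7, "ATG": 14, "AAT": 30, "AAC": 1, "AAA": 25, "AAG": 7,
    "GTT": 23, "GTC": 1, "GTA": 9, "GTG": 0, "GAT": 32, "GAC": 3, "GAA": 14, "GAG": 6,
    "TCT": 8, "TCC": 0, "TCA": 14, "TCG": 1, "TGT": 1, "TGC": 1, "TGA": 9, "TGG": 1,
    "CCT": 28, "CCC": 0, "CCA": 5, "CCG": 2, "CGT": 15, "CGC": 0, "CGA": 1, "CGG": 1,
    "ACT": 23, "ACC": 1, "ACA": 7, "ACG": 3, "AGT": 16, "AGC": 1, "AGA": 8, "AGG": 2,
    "GCT": 20, "GCC": 0, "GCA": 12, "GCG": 7, "GGT": 23, "GGC": 0, "GGA": 13, "GGG": 0,
}

# Genetic code table 4, codons ordered T, C, A, G at each position; '*' marks stops.
BASES = "TCAG"
TABLE4_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE4 = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), TABLE4_AA))


def third_position_tally(counts: dict) -> Counter:
    """Number of codons ending in each nucleotide."""
    tally = Counter()
    for codon, n in counts.items():
        tally[codon[2]] += n
    return tally


def amino_acid_composition(counts: dict, code: dict = CODE4) -> Counter:
    """Residue counts implied by the codon counts ('*' collects stop codons)."""
    comp = Counter()
    for codon, n in counts.items():
        comp[code[codon]] += n
    return comp


if __name__ == "__main__":
    tally = third_position_tally(TABLE1_COUNTS)
    total = sum(tally.values())
    print(dict(tally), total)
    print((tally["A"] + tally["T"]) / total)
    aa = amino_acid_composition(TABLE1_COUNTS)
    print(aa["W"], total - aa["*"])
```

For the Table 1 counts this gives 307 T-, 168 A-, 65 G-, and 14 C-ending codons out of 554 (about 86% A or T in the third position), 553 amino acid residues, and 10 tryptophans, 9 of them encoded by TGA.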

Regulatory signals. Three promoterlike sequences (P1, P2, P3) were identified (Fig. 8). Their positions on the SpV4 genome are indicated in Fig. 6. Sequence P1 is close to the consensus sequence of bacterial promoters recognized by E. coli RNA polymerase carrying sigma factor σ70 or by the B. subtilis holoenzyme with sigma factor σ43 (24). It has a perfect Pribnow (TATPuATPu) box (21) located at position -10 relative to a CAT box. The A residue of CAT could be the start (+1) of mRNA transcription. Also, the -35 region of P1 differs by only two nucleotides from the most conserved sequence in that region, GTTGACA. Finally, at -43, there is an A+T-rich region, characteristic of a generalized promoter. Promoterlike sequence P2 has regions similar to the -10, -35, and -43 regions of the consensus bacterial promoter, but the initiation point of mRNA transcription does not involve a CAT box and remains ambiguous. The start of mRNA transcription from promoterlike sequence P3 is also ambiguous; the region at -35 has a good consensus sequence, but in the region at -10 the A's of the TATPuATPu box are replaced by T's. Preliminary Northern blot (RNA blot) analyses of viral mRNAs (data not shown)

.1

A-Trich T G T T G A C A A T T T 12-14 bp T A T Pu A T Pu 4-7 bp C A T

544A-Trich A G T T G T C T T G GGG Sb TATAATA --C A T

12bp 5 bp ~~~~1292A-Trich T A T T G T T C A A A C _2bp TATAATT 5GbBAA

3954A-Trich CATTGTCAAAAA T TTTTT TTA G A T

*Consensus

Pi

P2

P3

FIG. 8. Comparison of the three promoterlike sequences of SpV4 DNA (P1, P2, and P3) with a consensus promoter sequence (28).Underlined residues are those that agree with the consensus sequence. bp, Base pairs.

VOL. 169, 1987

-.! I 0 reg ion-1I

Page 9: Spiroplasma Virus 4: Nucleotide Sequence of theViral DNA ...

4958 RENAUDIN ET AL.

TABLE 1. Codon usage for SpV4 capsid protein

Codon Usage (%) Codon Usage (%)

TTT-Phe..... 25 (4.5) TAT-Tyr...... 20 (3.6)TTC-Phe..... 2 (0.4) TAC-Tyr...... 1 (0.2)TTA-Leu..... 23 (4.2) TAA-TER...... 1 (0.2)TTG-Leu..... 14 (2.5) TAG-TER...... 0 (0.0)

CTT-Leu..... 3 (0.5) CAT-His...... 14 (2.5)CTC-Leu..... 0 (0.0) CAC-His...... 2 (0.4)CTA-Leu..... 1 (0.2) CAA-Gln...... 19 (3.4)CTG-Leu..... 0 (0.0) CAG-Gln...... 7 (1.3)

ATT-Ile..... 26 (4.7) AAT-Asn...... 30 (5.4)ATC-Ile..... 1 (0.2) AAC-Asn...... 1 (0.2)ATA-Ile..... 7 (1.3) AAA-Lys...... 25 (4.5)ATG-Met..... 14 (2.5) AAG-Lys...... 7 (1.3)

GTT-Val..... 23 (4.2) GAT-Asp...... 32 (5.8)GTC-Val..... 1 (0.2) GAC-Asp...... 3 (0.5)GTA-Val..... 9 (1.6) GAA-Glu...... 14 (2.5)GTG-Val..... 0 (0.0) GAG-Glu...... 6 (1.1)

TCT-Ser..... 8 (1.4) TGT-Cys...... 1 (0.2)TCC-Ser..... 0 (0.0) TGC-Cys...... 1 (0.2)TCA-Ser..... 14 (2.5) TGA-Trp...... 9 (1.6)TCG-Ser..... 1 (0.2) TGG-Trp...... 1 (0.2)

CCT-Pro..... 28 (5.1) CGT-Arg...... 15 (2.7)CCC-Pro..... 0 (0.0) CGC-Arg...... 0 (0.0)CCA-Pro..... 5 (0.9) CGA-Arg...... 1 (0.2)CCG-Pro..... 2 (0.4) CGG-Arg...... 1 (0.2)

ACT-Thr..... 23 (4.2) AGT-Ser...... 16 (2.9)ACC-Thr..... 1 (0.2) AGC-Ser...... 1 (0.2)ACA-Thr..... 7 (1.3) AGA-Arg...... 8 (1.4)ACG-Thr..... 3 (0.5) AGG-Arg...... 2 (0.4)

GCT-Ala..... 20 (3.6) GGT-Gly...... 23 (4.2)GCC-Ala..... 0 (0.0) GGC-Gly...... 0 (0.0)GCA-Ala..... 12 (2.2) GGA-Gly...... 13 (2.3)GCG-Ala..... 7 (1.3) GGG-Gly...... 0 (0.0)

seem to indicate that P1, P2, and P3 are functional but thatP1 is more efficient. Two additional promoterlike sequenceswere localized around nucleotides 905 and 2117, respec-tively. No evidence for their involvement in transcriptionwas obtained.Two inverted repeat sequences were located on the SpV4

DNA, one around nucleotide 3932 and the other aroundnucleotide 528. The sequence at nucleotide 3932 is shownin Fig. 10 as double-stranded DNA; the inverted repeats,11 nucleotides long, are underlined. The putative RNAtranscript from this double-stranded DNA has the ability toform a hairpin structure (Fig. 10). This structure has aUUUUUUUA-3'-OH sequence, typical of a transcriptionterminator independent of factor rho, and could be involvedin transcription termination. It should be noticed that, asshown in Fig. 10, the sequence around nucleotide 3932 ispart of the promoterlike region P3 (Fig. 8). A similar situa-tion has also been described for phage 4X174, in which thepromoter of gene A overlaps the main transcription termi-nator (9).The inverted repeat sequence at nucleotide 528 is located

in the untranslated region between ORF2 and ORF8 (Fig. 6)and can form a secondary structure as shown in Fig. 9. With

seven G * C base pairs of which five are in a row, thishairpin structure should be quite stable. The hairpin struc-ture of SpV4 is reminiscent of similar structures on thesingle-stranded DNAs of phages M13, G4, and 4X174 (9).For M13 and G4, there is a unique origin of cDNA strandsynthesis. The hairpin structure of M13 DNA, located in theuntranslated region between genes II and IV, and that of G4DNA, located in the intergenic space between genes F andG, are involved in the transcription of the RNA primerrequired for complementary strand synthesis. For 4X174,the origins of complementary strand synthesis are multipleand almost randomly located. The hairpin structure betweengenes F and G is involved in the formation of the preprimo-some (13).
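The promoter comparison of Fig. 8 amounts to matching short motifs against a consensus. The sketch below assumes the -10 consensus TATPuATPu given in the text (written with the IUPAC symbol R for a purine) and the -35 core GTTGACA; the function names and the test strings are hypothetical illustrations, not sequences quoted from the SpV4 genome.

```python
import re

# -10 (Pribnow) consensus TATPuATPu, with R = purine (A or G).
# The lookahead makes overlapping occurrences visible to finditer.
PRIBNOW = re.compile(r"(?=(TAT[AG]AT[AG]))")
MINUS35_CORE = "GTTGACA"


def find_pribnow_boxes(seq: str) -> list:
    """Return 0-based start positions of perfect TATPuATPu matches."""
    return [m.start() for m in PRIBNOW.finditer(seq.upper())]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def best_minus35(seq: str, core: str = MINUS35_CORE) -> tuple:
    """Return (position, mismatches) of the window closest to the -35 core.

    Ties are resolved in favor of the leftmost window.
    """
    n = len(core)
    if len(seq) < n:
        raise ValueError("sequence shorter than the consensus core")
    return min(
        ((i, mismatches(seq[i:i + n], core)) for i in range(len(seq) - n + 1)),
        key=lambda hit: hit[1],
    )


if __name__ == "__main__":
    print(find_pribnow_boxes("GGTATAATAGG"))  # hypothetical test string
    print(best_minus35("AGTTGTCTTGGGG"))      # hypothetical test string
```

A fuller scan would also require the -35 and -10 matches to be separated by the 12-14 bp spacer shown in the consensus, and would tolerate mismatches in the -10 box, as needed for P2 and P3; both refinements are omitted here.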

DISCUSSION

With 4,421 nucleotide residues, the single-stranded circular DNA of S. melliferum SpV4 is one of the smallest procaryotic viral DNAs known. Despite its small size, it seems to possess nine ORFs. The identification of these ORFs is based on the assumption that TGA is not a termination codon. No ORFs could be detected when all three termination codons were considered as such. The finding that in the mollicute S. melliferum TGA does not seem to function as a termination codon is similar to that in the mollicute M. capricolum, for which Yamao et al. (36) have shown that TGA codes for tryptophan. It is likely that in spiroplasmas TGA also codes for tryptophan, as we have recently proposed (23), for the following reason. The gene for spiralin, the major membrane protein of S. citri, has been cloned in E. coli HB101, and the recombinant bacterial clone expresses spiralin (19). The replicative form of SpV4 has also been cloned in E. coli HB101 and found to be infectious in spiroplasmas by transfection (20), indicating that the cloned RF contained, in particular, a biologically active capsid protein gene. However, in the recombinant E. coli clones, expression of the capsid protein did not occur, whatever the cloning strategy used. These results are easily explained if it is assumed that TGA codes for tryptophan. Indeed, spiralin contains no tryptophan (35), hence no TGA codon, and can thus be expressed in the bacterium. In contrast, the gene for the capsid protein (ORF1) has nine TGA codons (Fig. 2; Table 1) and cannot be fully expressed in E. coli, since the ribosomes will stop at the first UGA codon encountered on the mRNA. The absence of TGA codons from the spiralin gene has recently been confirmed by the sequence determination of this gene (C. Chevalier, C. Saillard, and J.-M. Bové, unpublished data).

[Figure: hairpin formed by the inverted repeat around nucleotide 528; the base-paired drawing is not recoverable from the extraction.]
FIG. 9. Secondary structure of untranslated region between ORF2 and ORF8.

The universal codon for tryptophan is TGG. The gene for the capsid protein contains one TGG codon besides the nine TGA codons. Under the assumption that TGA codes for tryptophan in spiroplasmas, the fact that there are nine TGA codons for one TGG codon is only a particular case of a more general phenomenon, namely, the preferred usage of A- (or T-) terminated codons in SpV4 ORFs (Table 1). The predominance of A- or T-terminated codons probably reflects the high A+T content of SpV4 DNA: 68 mol%, compared with 55.2% for φX174 DNA and 54.3% for G4 DNA. Even though SpV4 DNA contains as much T (33.9 mol%) as A (34 mol%), codons ending in T are more frequent than those ending in A. The high frequency of T in the third position was also noted for φX174 codons and, to a lesser extent, for G4 codons, but it is more pronounced for SpV4 DNA. The more frequent use of T rather than A as the last codon base allows more options for anticodons reading by wobble.

The promoterlike regions identified in SpV4 DNA had sequences very similar to those of the bacterial promoters used by E. coli or B. subtilis RNA polymerases when σ70 or σ43, respectively, is the sigma factor. Also, each of the nine viral ORFs has a bacterial Shine-Dalgarno sequence similar to that found in gram-positive bacteria (16).
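The TGA argument above (E. coli ribosomes stop at the first UGA codon of ORF1, whereas spiroplasmas read it through) can be illustrated by translating the same reading frame under two codon tables: the standard code, in which TGA is a stop, and a code in which TGA is read as tryptophan, as proposed for spiroplasmas. The latter corresponds to NCBI translation table 4 (the Mycoplasma/Spiroplasma code). The 18-nucleotide ORF and the function names are hypothetical, not SpV4 sequence.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order (first base slowest).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(tga_is_trp: bool) -> dict:
    """Map each DNA codon to a one-letter amino acid ('*' marks a stop)."""
    table = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
    if tga_is_trp:
        table["TGA"] = "W"  # UGA read as tryptophan instead of stop
    return table


def translate(orf: str, tga_is_trp: bool) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    table = codon_table(tga_is_trp)
    protein = []
    for i in range(0, len(orf) - 2, 3):  # ignore a trailing partial codon
        aa = table[orf[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    orf = "ATGAAATGATGGCCTTAA"  # hypothetical: ATG AAA TGA TGG CCT TAA
    print("standard code (E. coli):", translate(orf, tga_is_trp=False))
    print("TGA = Trp (spiroplasma):", translate(orf, tga_is_trp=True))
```

Under the standard code the product is truncated to two residues (MK), while with TGA read as tryptophan the full five-residue peptide (MKWWP) is made; the same truncation would affect ORF1, with its nine TGA codons, in E. coli.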

DNA around nucleotide 3932:
+ strand  5'-CATTGTCAAAAAGGGTTAAATAACCCTTTTTTTAGAAAG-3'
- strand  3'-GTAACAGTTTTTCCCAATTTATTGGGAAAAAAATCTTTC-5'
RNA       5'-CAUUGUCAAAAAGGGUUAAAUAACCCUUUUUUUA-3'-OH
[The -35 and -10 regions of P3 are marked on the DNA in the original figure; the hairpin drawing of the RNA is not recoverable from the extraction.]
FIG. 10. DNA sequence and RNA structure of SpV4 putative transcription terminator.
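The hairpin of Fig. 10 can be located computationally by searching for the longest run of consecutive Watson-Crick pairs that closes a loop. The brute-force sketch below assumes perfect pairing only (no G·U wobble in the RNA) and a minimum loop of 3 nucleotides; the function names are hypothetical, and this is a simple longest-stem search, not a thermodynamic folding method.

```python
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def stem_length(seq: str, i: int, j: int, min_loop: int) -> int:
    """Count consecutive Watson-Crick pairs closing inward from bases i and j."""
    k = 0
    # Pair (i + k, j - k) is accepted only if at least min_loop bases remain inside it.
    while (j - k) - (i + k) - 1 >= min_loop and (seq[i + k], seq[j - k]) in PAIRS:
        k += 1
    return k


def find_hairpin(seq: str, min_loop: int = 3) -> tuple:
    """Return (i, j, stem) for the longest perfect stem.

    i and j are the 0-based outer bases of the stem; the first stem of
    maximal length found (smallest i, then smallest j) is kept.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA
    best = (0, 0, 0)
    for i in range(len(seq)):
        for j in range(i + min_loop + 1, len(seq)):
            n = stem_length(seq, i, j, min_loop)
            if n > best[2]:
                best = (i, j, n)
    return best


if __name__ == "__main__":
    plus_strand = "CATTGTCAAAAAGGGTTAAATAACCCTTTTTTTAGAAAG"  # Fig. 10, + strand
    i, j, n = find_hairpin(plus_strand)
    print("5' arm:", plus_strand[i:i + n])
    print("loop:  ", plus_strand[i + n:j - n + 1])
    print("3' arm:", plus_strand[j - n + 1:j + 1])
    print("3' tail:", plus_strand[j + 1:])
```

On the + strand of Fig. 10 this search returns a 10-bp stem (AAAAAGGGTT paired with AACCCTTTTT) closing the loop AAAT, with the T run continuing past the stem, the arrangement expected for the rho-independent terminator described in the text. Because the two 11-nucleotide repeats would otherwise close a 2-nucleotide loop, a 3-nucleotide minimum loop leaves 10 of their bases paired.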


It is quite probable that the host spiroplasma genome has the same bacteriumlike promoters and Shine-Dalgarno sequences as the SpV4 genome. This seems to be so, since spiroplasma genes such as the spiralin gene can be expressed in E. coli from their own promoters (19). This implies that the same spiroplasma promoters are recognized by two RNA polymerases: the one from spiroplasma and the one from E. coli. This is not surprising, since we have recently shown that the RNA polymerase from S. melliferum has a subunit structure of the type ββ'α2 (7), very similar to that of the eubacterial enzyme. In addition, Rogers et al. (27) have obtained from sequencing data direct evidence for the occurrence in S. melliferum of at least one bacteriumlike promoter upstream of a tRNA gene cluster.

The spiroplasmal virus SpV4 and the bacterial viruses G4 and φX174 have several similarities. The three viruses are isometric and lytic and contain single-stranded circular DNA, of 4,421 nucleotides for SpV4 and 5,386 and 5,577 nucleotides for φX174 and G4, respectively (8, 29). Despite their small sizes, they code for approximately the same number of proteins, 11 for G4 and φX174 and 9 for SpV4. However, SpV4 has only one capsid protein, while G4 and φX174 have four. The genes or ORFs are located in all three reading frames, with several overlapping regions.

The only identified SpV4 protein is the capsid protein. ORF1 has been identified as the gene for this protein. With 1,662 nucleotide residues, this gene represents more than one-third of the genome. The genes of the four capsid proteins of G4 and φX174 occupy more than one-half of the genome.
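Table 1 provides an independent check on the 1,662-nucleotide figure: the capsid-protein codon counts, including the stop codon, should sum to one-third of the gene length. The sketch below re-enters the Table 1 counts and also tallies the third codon position, the basis of the A/T-ending preference discussed above; the variable and function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Table 1 counts for the SpV4 capsid protein gene (ORF1), in TCAG order:
# first base slowest, third base fastest (TTT, TTC, TTA, TTG, TCT, ...).
TABLE1_COUNTS = [
    25, 2, 23, 14,  8, 0, 14, 1,   20, 1, 1, 0,   1, 1, 9, 1,   # T..
    3, 0, 1, 0,     28, 0, 5, 2,   14, 2, 19, 7,  15, 0, 1, 1,  # C..
    26, 1, 7, 14,   23, 1, 7, 3,   30, 1, 25, 7,  16, 1, 8, 2,  # A..
    23, 1, 9, 0,    20, 0, 12, 7,  32, 3, 14, 6,  23, 0, 13, 0,  # G..
]
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
USAGE = dict(zip(CODONS, TABLE1_COUNTS))


def third_position_counts(usage: dict) -> Counter:
    """Sum codon counts by the base in the third (wobble) position."""
    totals = Counter()
    for codon, n in usage.items():
        totals[codon[2]] += n
    return totals


if __name__ == "__main__":
    total = sum(USAGE.values())
    print("codons:", total, "-> nucleotides:", 3 * total)
    third = third_position_counts(USAGE)
    for base in "TACG":
        print(base, third[base], f"{100 * third[base] / total:.1f}%")
    print("ORF1 share of the 4,421-nt genome:", f"{3 * total / 4421:.1%}")
```

The counts sum to 554 codons, that is 1,662 nucleotides (about 37.6% of the genome), and 475 of the 554 codons (about 86%) end in A or T, with T (307) clearly ahead of A (168).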

ACKNOWLEDGMENTS

We thank J. Vandekerckhove, M. Puype, and J. Van Damme for sequencing the NH2-terminal end of SpV4 capsid protein.

LITERATURE CITED

1. Amersham International plc. 1984. M13 cloning and sequencing handbook. Amersham International plc., London.

2. Bové, J.-M., C. Saillard, P. Junca, J. R. Degorce-Dumas, B. Ricard, A. Nhami, R. F. Whitcomb, D. Williamson, and J. G. Tully. 1982. Guanine-plus-cytosine content, hybridization percentages and EcoRI restriction enzyme profiles of spiroplasmal DNA. Rev. Infect. Dis. 4:S129-S135.

3. Clark, T. B., R. F. Whitcomb, J. G. Tully, C. Mouches, C. Saillard, J.-M. Bové, H. Wroblewski, P. Carle, D. L. Rose, R. B. Henegar, and D. Williamson. 1985. Spiroplasma melliferum, a new species from the honeybee (Apis mellifera). Int. J. Syst. Bacteriol. 35:296-308.

4. Deininger, P. L. 1983. Random subcloning of sonicated DNA: application to shotgun DNA sequence analysis. Anal. Biochem. 129:219-223.

5. Fristensky, B., J. Lis, and R. Wu. 1982. Portable microcomputer software for nucleotide sequence analysis. Nucleic Acids Res. 10:6451-6463.

6. Frydenberg, J., and C. Christiansen. 1985. The sequence of 16S rRNA from mycoplasma strain PG50. DNA 4:127-137.

7. Gadeau, A. P., C. Mouches, and J.-M. Bové. 1986. Probable insensitivity of mollicutes to rifampin and characterization of spiroplasmal DNA-dependent RNA polymerase. J. Bacteriol. 166:824-828.

8. Godson, G. N., B. G. Barrell, R. Staden, and J. C. Fiddes. 1978. Nucleotide sequence of bacteriophage G4 DNA. Nature (London) 276:236-247.

9. Godson, G. N., J. C. Fiddes, B. G. Barrell, and F. Sanger. 1978. Comparative DNA sequence analysis of the G4 and φX174 genomes, p. 51-83. In D. T. Denhardt, D. Dressler, and D. S. Ray (ed.), The single stranded DNA phages. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

10. Hanahan, D. 1983. Studies on transformation of Escherichia coli with plasmids. J. Mol. Biol. 166:557-580.

11. Hewick, R. M., M. W. Hunkapiller, L. E. Hood, and W. J. Dreyer. 1981. A gas-liquid solid phase peptide and protein sequenator. J. Biol. Chem. 256:7990-7997.

12. Iwami, M., A. Muto, F. Yamao, and S. Osawa. 1984. Nucleotide sequence of rrnB 16S ribosomal RNA gene from Mycoplasma capricolum. Mol. Gen. Genet. 196:317-322.

13. Kornberg, A. (ed.). 1982. Supplement to DNA replication. W. H. Freeman & Co., San Francisco.

14. Kyte, J., and R. F. Doolittle. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105-132.

15. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning, a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

16. McLaughlin, J. R., C. L. Murray, and D. L. Thomas. 1981. Unique features in the ribosome binding site sequence of the gram-positive Staphylococcus aureus β-lactamase gene. J. Biol. Chem. 256:11283-11291.

17. Messing, J., and J. Vieira. 1982. A new pair of M13 vectors for selecting either DNA strand of double-digest restriction fragments. Gene 19:269-276.

18. Moran, C. P., L. Naomi, S. F. J. Legrice, G. Lee, M. Stephens, A. L. Sonenshein, J. Pero, and R. Losick. 1982. Nucleotide sequences that signal the initiation of transcription and translation in Bacillus subtilis. Mol. Gen. Genet. 186:339-346.

19. Mouches, C., T. Candresse, G. Barroso, C. Saillard, H. Wroblewski, and J.-M. Bové. 1985. Gene for spiralin, the major membrane protein of the helical mollicute Spiroplasma citri: cloning and expression in Escherichia coli. J. Bacteriol. 164:1094-1099.

20. Pascarel-Devilder, M. C., J. Renaudin, and J.-M. Bové. 1986. The spiroplasma virus 4 replicative form cloned in Escherichia coli transfects spiroplasmas. Virology 151:390-393.

21. Pribnow, D. 1975. Nucleotide sequence of an RNA polymerase binding site at an early T4 promoter. Proc. Natl. Acad. Sci. USA 72:343-361.

22. Renaudin, J., M. C. Pascarel, M. Garnier, P. Carle-Junca, and J.-M. Bové. 1984. SpV4, a new spiroplasma virus with circular, single stranded DNA. Ann. Virol. 135E:343-361.

23. Renaudin, J., M. C. Pascarel, C. Saillard, C. Chevalier, and J.-M. Bové. 1986. Chez les spiroplasmes le codon UGA n'est pas non sens et semble coder pour le tryptophane. [In spiroplasmas, the UGA codon is not a nonsense codon and appears to code for tryptophan.] C. R. Acad. Sci. Paris Ser. III 303:539-540.

24. Reznikoff, W. S., D. A. Siegele, D. W. Cowing, and C. A. Gross. 1985. The regulation of transcription initiation in bacteria. Annu. Rev. Genet. 19:355-387.

25. Rigby, P. W. J., M. Dieckmann, L. Rhodes, and P. Berg. 1977. Labelling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113:237.

26. Rogers, M. J., J. Simmons, R. T. Walker, W. G. Weisburg, C. R. Woese, R. J. Tanner, I. M. Robinson, D. A. Stahl, G. Olsen, R. H. Leach, and J. Maniloff. 1985. Construction of the mycoplasma evolutionary tree from 5S rRNA sequence data. Proc. Natl. Acad. Sci. USA 82:1160-1164.

27. Rogers, M. J., A. A. Steinmetz, and R. T. Walker. 1986. The nucleotide sequence of a tRNA gene cluster from Spiroplasma melliferum. Nucleic Acids Res. 14:3145.

28. Rosenberg, M., and D. Court. 1979. Regulatory sequences involved in the promotion and termination of RNA transcription. Annu. Rev. Genet. 13:319-353.

29. Sanger, F., G. M. Air, B. G. Barrell, N. L. Brown, A. R. Coulson, J. C. Fiddes, C. A. Hutchinson III, P. M. Slocombe, and M. Smith. 1977. Nucleotide sequence of bacteriophage φX174 DNA. Nature (London) 265:68.

30. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.

31. Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71:1343-1346.

32. Vandekerckhove, J., G. Bauw, M. Puype, J. Van Damme, and M. Van Montagu. 1985. Protein-blotting on Polybrene-coated glass-fiber sheets. Eur. J. Biochem. 152:9-19.

33. Wilbur, W. J., and D. J. Lipman. 1983. Rapid similarity searches of nucleic acid and protein data banks. Proc. Natl. Acad. Sci. USA 80:726-730.

34. Woese, C. R., J. Maniloff, and L. B. Zablen. 1980. Phylogenetic analysis of the mycoplasmas. Proc. Natl. Acad. Sci. USA 77:494-498.

35. Wroblewski, H., D. Robic, D. Thomas, and A. Blanchard. 1984. Comparison of the amino acid compositions and antigenic properties of spiralins purified from the plasma membranes of different spiroplasmas. Ann. Microbiol. (Paris) 135A:73-82.

36. Yamao, F., A. Muto, Y. Kawauchi, M. Iwami, S. Iwagani, Y. Azumi, and S. Osawa. 1985. UGA is read as tryptophan in Mycoplasma capricolum. Proc. Natl. Acad. Sci. USA 82:2306-2309.

